Preprocessing module for large histological images

jopo666/HistoPrep

Preprocessing large medical images for machine learning made easy!

Description | Installation | Usage | API Documentation | Citation

Description

HistoPrep makes it easy to prepare your histological slide images for deep learning models. You can easily cut large slide images into smaller tiles and then preprocess those tiles (remove tiles with shitty tissue, finger marks etc).

Installation

Install OpenSlide on your system and then install histoprep with pip!

pip install histoprep

Usage

A typical workflow for training deep learning models with histological images is the following:

  1. Cut each slide image into smaller tile images.
  2. Preprocess the smaller tile images by removing tiles with bad tissue, staining artifacts etc.
  3. Overfit a pretrained ResNet50 model, report 100% validation accuracy and publish it in Nature like everyone else.

With HistoPrep, steps 1. and 2. are as easy as accidentally drinking too much at the research group Christmas party and proceeding to work remotely until June.

Let's start by cutting a slide from the PANDA Kaggle challenge into small tiles.

from histoprep import SlideReader

# Read slide image.
reader = SlideReader("./slides/slide_with_ink.jpeg")
# Detect tissue.
threshold, tissue_mask = reader.get_tissue_mask(level=-1)
# Extract overlapping tile coordinates with less than 50% background.
tile_coordinates = reader.get_tile_coordinates(
    tissue_mask, width=512, overlap=0.5, max_background=0.5
)
# Save tile images with image metrics for preprocessing.
tile_metadata = reader.save_regions(
    "./train_tiles/", tile_coordinates, threshold=threshold, save_metrics=True
)

slide_with_ink: 100%|██████████| 390/390 [00:01<00:00, 295.90it/s]
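To make the `width`, `overlap` and `max_background` parameters concrete, here is a minimal pure-Python sketch of how overlapping tile coordinates could be picked from a binary tissue mask. It is not HistoPrep's implementation: the function name `sketch_tile_coordinates`, the `(x, y, w, h)` tuple layout and the choice to keep tiles whose background fraction is at most `max_background` are all assumptions made for illustration.

```python
def sketch_tile_coordinates(mask, width, overlap=0.5, max_background=0.5):
    """Pick square tiles from a 2D mask of 0/1 values (1 = tissue).

    Hypothetical helper for illustration only, not part of HistoPrep.
    Returns a list of (x, y, w, h) tuples.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    mask_height, mask_width = len(mask), len(mask[0])
    # With 50% overlap, neighbouring tiles start half a tile apart.
    stride = max(1, round(width * (1 - overlap)))
    coordinates = []
    for y in range(0, mask_height, stride):
        for x in range(0, mask_width, stride):
            # Pixels that fall outside the mask count as background,
            # because slicing simply returns fewer values there.
            tissue = sum(v for row in mask[y:y + width] for v in row[x:x + width])
            background = 1 - tissue / (width * width)
            if background <= max_background:
                coordinates.append((x, y, width, width))
    return coordinates


# Toy 4x4 mask with tissue only in the top-left corner.
toy_mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(sketch_tile_coordinates(toy_mask, width=2, overlap=0.0))
```

A larger `overlap` shrinks the stride and so produces more tiles, while a smaller `max_background` discards more of the tiles near the tissue border.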

Let's take a look at the output and visualise the thumbnails.

jopo666@~$ tree train_tiles
train_tiles
└── slide_with_ink
    ├── metadata.parquet       # tile metadata
    ├── properties.json        # tile properties
    ├── thumbnail.jpeg         # thumbnail image
    ├── thumbnail_tiles.jpeg   # thumbnail with tiles
    ├── thumbnail_tissue.jpeg  # thumbnail of the tissue mask
    └── tiles [390 entries exceeds filelimit, not opening dir]

[Images: Prostate biopsy sample | Tissue mask | Thumbnail with tiles]

That was easy, but it can be annoying to whip up a new Python script every time you want to cut slides, and thus it is recommended to use the HistoPrep CLI program!

# Repeat the above code for all images in the PANDA dataset!
jopo666@~$ HistoPrep --input './train_images/*.tiff' --output ./tiles --width 512 --overlap 0.5 --max-background 0.5
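For comparison, the same batch job can be sketched in plain Python by looping over a glob pattern and repeating the `SlideReader` calls from the example above for each slide. The helper names `cut_slide` and `cut_all` are hypothetical, and this is only a rough equivalent: whatever else the CLI does beyond this loop is not reproduced here.

```python
from glob import glob
from pathlib import Path


def cut_slide(path, output_dir):
    """Cut one slide using the same calls as the single-slide example."""
    # Imported lazily so the module loads even without histoprep installed.
    from histoprep import SlideReader

    reader = SlideReader(path)
    threshold, tissue_mask = reader.get_tissue_mask(level=-1)
    tile_coordinates = reader.get_tile_coordinates(
        tissue_mask, width=512, overlap=0.5, max_background=0.5
    )
    return reader.save_regions(
        output_dir, tile_coordinates, threshold=threshold, save_metrics=True
    )


def cut_all(pattern, output_dir, cut_fn=cut_slide):
    """Apply cut_fn to every file matching pattern, keyed by slide name."""
    results = {}
    for path in sorted(glob(pattern)):
        results[Path(path).stem] = cut_fn(path, output_dir)
    return results


# Usage (requires histoprep and the slide files to exist):
# all_metadata = cut_all("./train_images/*.tiff", "./tiles")
```

Passing the per-slide function as `cut_fn` keeps the loop easy to try out with a stub before pointing it at real slides.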

As we can see from the above images, histological slide images often contain areas that we would not like to include in our training data. Removing them might seem like a daunting task, but let's try it out!

from histoprep.utils import OutlierDetector

# Let's wrap the tile metadata with a helper class.
detector = OutlierDetector(tile_metadata)
# Cluster tiles based on image metrics.
clusters = detector.cluster_kmeans(num_clusters=4, random_state=666)
# Visualise first cluster.
reader.get_annotated_thumbnail(
    image=reader.read_level(-1), coordinates=detector.coordinates[clusters == 0]
)

Tiles in cluster 0

I said it was gonna be easy! Now we can mark tiles in cluster 0 as outliers and start overfitting our neural network! This was a simple example, but the same code can be used to cluster all of the several million tiles extracted from the PANDA dataset and discard outliers simultaneously!
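The idea of clustering tiles on image metrics and then dropping one cluster can be sketched without any dependencies. The block below is a plain Lloyd's k-means with deterministic farthest-point seeding, not what `OutlierDetector.cluster_kmeans` does internally; the function names, the two-value metric vectors and the tile names are all made up for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_labels(points, num_clusters, iterations=100):
    """Lloyd's k-means; returns one cluster label per point.

    Hypothetical helper: seeds deterministically with the first point,
    then repeatedly adds the point farthest from the chosen centers.
    """
    centers = [tuple(points[0])]
    while len(centers) < num_clusters:
        centers.append(
            tuple(max(points, key=lambda p: min(squared_distance(p, c) for c in centers)))
        )
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest center for every point.
        labels = [
            min(range(num_clusters), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        updated = []
        for k in range(num_clusters):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centers[k])  # keep the center of an empty cluster
        if updated == centers:
            break
        centers = updated
    return labels


# Made-up (brightness, sharpness) metrics for six tiles.
tiles = ["t0", "t1", "t2", "t3", "t4", "t5"]
metrics = [(0.90, 0.10), (0.92, 0.12), (0.88, 0.11),
           (0.20, 0.80), (0.22, 0.78), (0.18, 0.82)]
labels = kmeans_labels(metrics, num_clusters=2)

# Suppose visual inspection showed that the cluster of tile "t0" is junk.
outlier_cluster = labels[0]
kept = [tile for tile, label in zip(tiles, labels) if label != outlier_cluster]
print(kept)
```

Which cluster counts as an outlier still has to be decided by looking at the annotated thumbnail, since the cluster numbers themselves carry no meaning.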

Citation

If you use HistoPrep to process the images for your publication, please cite the GitHub repository.

@misc{histoprep,
  author = {Pohjonen, Joona and Ariotta, Valeria},
  title = {HistoPrep: Preprocessing large medical images for machine learning made easy!},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {https://github.com/jopo666/HistoPrep},
}
