enigarv/Breast-Cancer-Detection
Applying deep learning algorithms to whole-slide pathological images can potentially improve diagnostic accuracy and efficiency. Complete digitization of the microscopic evaluation of stained tissue sections in histopathology has become feasible in recent years, thanks to advances in slide-scanning technology and cost reductions in digital archiving. The benefits of digital pathology include remote diagnostics, immediate availability of archival cases, and easier consultations with expert pathologists.
The aim of the project is to develop deep learning models, both from scratch and through fine-tuning, that can distinguish lymph node cells affected by metastases originating from breast cancer from healthy ones.
The developed models will then be tested on an ad hoc collection of lymph node cell images. These images will be extracted from scientific publications on breast cancer metastases available in the NCBI database.
The dataset used was published on GitHub in 2018, and a version without duplicates was later made available on Kaggle. The reduced version consists of
A dataset of lymph node cell images was generated from the NCBI database, to be used as a test set for the developed models. From the PubMed database we downloaded all publications matching the search query «lymph node metastasis breast cancer». Using the PMID, the unique identifier of each publication, we downloaded the corresponding PDF through the pubmed2pdf library. All images were then extracted from the PDFs, and the ones of interest were selected by analysing pixel color and count: a large number of pixels in the purple range indicated that an image was a good candidate for the collection.
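The purple-pixel selection rule can be sketched as follows. This is a minimal illustration, not the project's code: it assumes pixels arrive as 8-bit (R, G, B) tuples (for example, decoded with an imaging library such as Pillow) and classifies a pixel as purple by its hue in HSV space. The hue band, the saturation/value cut-offs and the minimum purple fraction are hypothetical values, since the README does not state the thresholds that were actually used.

```python
import colorsys
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]

# Hypothetical thresholds: the README does not give the exact purple range
# or the minimum number/fraction of purple pixels used by the authors.
PURPLE_HUE_RANGE = (0.70, 0.92)  # hue in [0, 1): roughly violet to magenta
MIN_SATURATION = 0.15            # skip near-grey and white background pixels
MIN_VALUE = 0.20                 # skip near-black pixels
MIN_PURPLE_FRACTION = 0.30       # share of purple pixels needed to keep an image


def is_purple(pixel: RGB) -> bool:
    """Return True if an 8-bit RGB pixel falls in the assumed purple HSV band."""
    r, g, b = (c / 255.0 for c in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    lo, hi = PURPLE_HUE_RANGE
    return lo <= h <= hi and s >= MIN_SATURATION and v >= MIN_VALUE


def purple_fraction(pixels: Iterable[RGB]) -> float:
    """Fraction of pixels classified as purple (0.0 for an empty image)."""
    total = 0
    purple = 0
    for p in pixels:
        total += 1
        if is_purple(p):
            purple += 1
    return purple / total if total else 0.0


def is_candidate(pixels: Iterable[RGB]) -> bool:
    """Keep an extracted figure if enough of its pixels are purple."""
    return purple_fraction(pixels) >= MIN_PURPLE_FRACTION
```

Using a fraction rather than an absolute count keeps the rule independent of figure size; an absolute pixel count, as the README's wording might suggest, would be an equally simple variant.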
We tested two different approaches:
- implementation of a CNN from scratch, subsequently optimized with Bayesian optimization;
- fine-tuning of two well-known networks, MobileNet and EfficientNet.
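To illustrate what Bayesian optimization does in the first approach, here is a minimal, dependency-free sketch: a Gaussian-process surrogate with an RBF kernel, plus an expected-improvement acquisition function evaluated on a 1-D grid. It assumes Bayesian optimization was used to tune a single hyperparameter of the from-scratch CNN. The README does not say which hyperparameters, search ranges or library were used, so the search variable (log10 of a learning rate), the toy objective standing in for validation loss, and all names and constants below are hypothetical. In practice a library such as KerasTuner's `BayesianOptimization` tuner would typically handle this.

```python
import math
from typing import Callable, List, Tuple

Matrix = List[List[float]]

# Hypothetical settings: the README does not say which hyperparameters were
# tuned, over which ranges, or with which Bayesian optimization library.
LENGTH_SCALE = 1.0  # RBF kernel length scale, in units of the search variable
JITTER = 1e-6       # small diagonal term keeping the kernel matrix positive definite
XI = 0.01           # exploration margin used by expected improvement


def rbf(a: float, b: float) -> float:
    """Squared-exponential (RBF) kernel with unit prior variance."""
    return math.exp(-((a - b) ** 2) / (2 * LENGTH_SCALE ** 2))


def cholesky(m: Matrix) -> Matrix:
    """Lower-triangular L with L L^T == m, for a symmetric positive-definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(m[i][i] - s, 1e-12))
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def forward(low: Matrix, b: List[float]) -> List[float]:
    """Solve L z = b by forward substitution."""
    z: List[float] = []
    for i in range(len(b)):
        z.append((b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z


def backward(low: Matrix, z: List[float]) -> List[float]:
    """Solve L^T x = z by back substitution."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def expected_improvement(x: float, xs: List[float], low: Matrix, alpha: List[float],
                         y_mean: float, best: float) -> float:
    """Expected improvement (for minimization) under the GP posterior at x."""
    kx = [rbf(x, p) for p in xs]
    mu = y_mean + sum(k * a for k, a in zip(kx, alpha))  # posterior mean
    v = forward(low, kx)
    sigma = math.sqrt(max(1.0 - sum(t * t for t in v), 1e-12))  # posterior std
    gain = best - mu - XI
    z = gain / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return gain * cdf + sigma * pdf


def bayes_opt(objective: Callable[[float], float], lo: float, hi: float,
              n_iter: int = 10, n_grid: int = 81) -> Tuple[float, float]:
    """Minimize a 1-D objective on [lo, hi]; returns (best x, best value)."""
    xs = [lo, hi]
    ys = [objective(x) for x in xs]
    grid = [lo + i * (hi - lo) / (n_grid - 1) for i in range(n_grid)]
    for _ in range(n_iter):
        y_mean = sum(ys) / len(ys)  # centre targets for the zero-mean GP prior
        k = [[rbf(a, b) + (JITTER if i == j else 0.0) for j, b in enumerate(xs)]
             for i, a in enumerate(xs)]
        low = cholesky(k)
        alpha = backward(low, forward(low, [y - y_mean for y in ys]))  # K^-1 (y - mean)
        best = min(ys)
        x_next = max(grid, key=lambda x: expected_improvement(x, xs, low, alpha, y_mean, best))
        xs.append(x_next)
        ys.append(objective(x_next))  # e.g. train the CNN and measure validation loss
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]
```

For example, `bayes_opt(lambda log_lr: (log_lr + 3) ** 2 + 0.1, -5, -1)` should return a point close to log_lr = -3, the minimum of this toy objective. The point of the surrogate is that each real evaluation (a full training run) is expensive, so candidates are chosen by maximizing the cheap acquisition function instead.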
The models were trained for up to 50 epochs, with early stopping interrupting training when no improvement was observed.
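The early-stopping rule can be sketched as a small function that replays a sequence of per-epoch validation losses. Only the 50-epoch cap comes from the README; the monitored quantity (validation loss), the patience and the minimum improvement are hypothetical, and the function name is illustrative. It mirrors the usual patience rule rather than any specific library implementation.

```python
from typing import List, Tuple

MAX_EPOCHS = 50  # from the README
PATIENCE = 5     # hypothetical: the README does not state the patience value
MIN_DELTA = 0.0  # hypothetical: minimum decrease that counts as an improvement


def early_stopping_run(val_losses: List[float], max_epochs: int = MAX_EPOCHS,
                       patience: int = PATIENCE,
                       min_delta: float = MIN_DELTA) -> Tuple[int, int]:
    """Return (epochs actually run, index of the best epoch).

    Training stops once `patience` consecutive epochs fail to lower the best
    validation loss by more than `min_delta`.
    """
    best = float("inf")
    best_epoch = -1
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch + 1, best_epoch
    return min(len(val_losses), max_epochs), best_epoch
```

In Keras the same behaviour is normally obtained by passing `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=...)` to `model.fit`; `restore_best_weights=True` additionally rolls the model back to the best epoch.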
The first CNN has obtained a
About
Breast cancer detection using small pathology images, applied to images downloaded from PDFs of scientific papers.