Histopathological image classification method based on color deconvolution and self-attention model

Technical Field
The invention belongs to the field of medical tissue cell image processing, and particularly relates to a tissue cell image classification method combining color deconvolution and a self-attention model.
Background
According to data published by the International Agency for Research on Cancer of the World Health Organization, there were about 19.3 million new cancer cases worldwide in 2020, with about 10 million deaths. New breast cancer cases numbered 2.26 million, accounting for 11.7% of all new cancer cases and exceeding the 2.2 million new lung cancer cases, making breast cancer the most common cancer worldwide. In the diagnosis of breast cancer, examination of breast tissue sections by a pathologist remains the gold standard for clinical diagnosis. The development of digital pathology has caused the number of digital histopathological images to increase explosively, and with growing health awareness and the increasing prevalence of breast cancer screening, there is an urgent need for automated methods that can rapidly analyze histopathological images.
In histopathology, the most commonly used stains are hematoxylin and eosin. Hematoxylin binds to nucleic acids, staining the nucleus dark blue or purple, while eosin adheres to proteins in the tissue, staining the cytoplasm and extracellular matrix pink. Through a color deconvolution operation, the contribution of each stain to an RGB image of hematoxylin-eosin stained tissue can be computed from the absorbance of that stain, thereby separating the individual stains.
In the past, histopathological image classification has often relied on manually extracted image features or classical machine learning; however, such methods involve complicated steps and suffer from low efficiency and low accuracy, making them difficult to use for classifying actual tissue cell images.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a histopathological image classification method based on color deconvolution and a self-attention model, which performs a color deconvolution operation on tissue cell images and then classifies the deconvolved images with a self-attention model, thereby improving the accuracy of histopathological image classification.
The histopathological image classification method based on color deconvolution and a self-attention model adopts the following scheme: the method uses either an offline image color deconvolution method or an online image color deconvolution method.
The offline image color deconvolution method comprises the following steps:
S100, acquiring a hematoxylin-eosin stained histopathological image dataset, wherein the histopathological image in the dataset is a standard RGB three-channel color image, and pixels of the RGB three-channel color image form a matrix Ht;
S200, setting every pixel with value 0 in the matrix Ht to 1×10⁻⁷, and then normalizing each pixel value to form a new matrix Ht';
S300, multiplying the new matrix Ht' by the color deconvolution standard matrix to obtain a matrix Dt, i.e. the deconvolved version of the pixel matrix Ht;
S400, normalizing the matrix Dt to obtain a matrix Dt', wherein all elements of the matrix Dt' lie in the interval [0,1];
S500, multiplying each element in the matrix Dt' by 255 and rounding to an integer to obtain an HED color space image;
S600, dividing the HED color space images into a training set, a verification set and a test set in a certain proportion for training, validating and evaluating the subsequent model respectively, scaling all the images, and performing online data enhancement on the training set images;
S700, normalizing and standardizing the enhanced images;
S800, using a Swin Transformer model pre-trained on the ImageNet dataset, modifying the last fully connected layer into a single MLP (multilayer perceptron) layer whose number of output neurons is 2, and feeding the training set images standardized in S700 into the model for fine-tuning;
and S900, during training, validating with the verification set every fixed number of iterations, selecting the model with the highest classification accuracy on the verification set over all iterations, and then evaluating it with the test set to obtain the final model classification accuracy.
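The deconvolution part of the steps above (S200 to S500) amounts to the classical optical-density color deconvolution and can be sketched in a few lines of numpy. The matrix values below are approximate Ruifrok-Johnston HED-from-RGB values, and the function name is illustrative:

```python
import numpy as np

# Approximate HED-from-RGB deconvolution matrix (illustrative values,
# inverse of the Ruifrok-Johnston RGB-from-HED stain matrix)
M = np.array([[ 1.878, -1.008, -0.556],
              [-0.066,  1.135, -0.136],
              [-0.602, -0.480,  1.574]])

def rgb_to_hed_image(rgb):
    """Sketch of steps S200-S500: uint8 RGB image -> uint8 HED image."""
    x = rgb.astype(np.float64)
    x[x == 0] = 1e-7                          # S200: avoid taking log of 0
    od = -np.log10(x / 255.0)                 # S200: per-channel normalization
    d = od @ M                                # S300: per-pixel deconvolution
    d = (d - d.min()) / (d.max() - d.min())   # S400: min-max scale to [0, 1]
    return np.rint(d * 255).astype(np.uint8)  # S500: back to an 8-bit image

rgb = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
hed = rgb_to_hed_image(rgb)
```

Because the min-max scaling in S400 is computed over the whole matrix Dt, the relative intensities of the H, E and D channels are preserved in the resulting image.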
The online image color deconvolution method comprises the following steps:
T100, acquiring a standard RGB three-channel histopathological image dataset subjected to hematoxylin-eosin staining;
T200, dividing the images in the dataset into a training set, a verification set and a test set in a certain proportion for training, validating and evaluating the subsequent model respectively, scaling all the images to size w×w, and performing online data enhancement on the training set images;
T300, performing the same normalization operation as the step S400 and the same normalization operation as the step S700 on the enhanced image;
T400, using the Swin Transformer self-attention model pre-trained on the ImageNet dataset, changing the last fully connected layer of the model into a single MLP layer whose number of output neurons is 2, and adding an input negation operation at the head of the model, i.e. outputting −x for each input value x;
T500, after the negation operation, adding a convolution layer, denoted conv1, whose convolution kernel is 1×1 with 3 input channels and 3 output channels and no bias parameter; the modified model is denoted the de-swt model;
T600, splitting the color deconvolution standard matrix column-wise into three 3×1 matrices and loading them as the weight parameters of the conv1 layer of the de-swt model;
T700, inputting the training set standardized in step T300 into the de-swt model and fine-tuning with a specific learning rate lr; to prevent vanishing gradients when the de-swt model back-propagates to the conv1 layer, the learning rate of the conv1 layer is set to p×lr, where p is a learning rate amplification factor that enlarges the learning rate of the conv1 layer so that its parameters change as training iterates over the training set;
T800, during the training of step T700, validating the model with the verification set every certain number of iterations, selecting a model with a better effect on the verification set, extracting the parameters of the model's conv1 layer, and combining them column-wise into a new image color deconvolution matrix, denoted N;
T900, loading each column of the color deconvolution matrix N into the conv1 layer parameters of the de-swt model, reloading Swin Transformer parameters pre-trained on the ImageNet dataset into the other corresponding layers of the de-swt model, and setting p to 1 so that the learning rate of the conv1 layer is reset to lr;
and T1000, retraining the de-swt model obtained in step T900 with the training set, validating the model with the verification set every certain number of iterations, finally selecting the model with the highest accuracy on the verification set, and testing its classification with the test set to obtain the final model classification accuracy.
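The head added to the de-swt model in steps T400 to T600 (a negation followed by a bias-free 1×1 convolution whose kernels are the columns of the deconvolution matrix) is, per pixel, just a linear map. A numpy sketch with an illustrative matrix M shows the equivalence:

```python
import numpy as np

M = np.array([[ 1.878, -1.008, -0.556],
              [-0.066,  1.135, -0.136],
              [-0.602, -0.480,  1.574]])  # illustrative deconvolution matrix

def de_swt_head(x, M):
    """Negation followed by a bias-free 1x1 convolution whose c-th kernel
    is the c-th column of M (the head built in steps T400-T600)."""
    neg = -x  # input negation at the model head
    # a 1x1 conv with 3 input/output channels is a per-pixel matrix product
    return np.stack([np.tensordot(neg, M[:, c], axes=([-1], [0]))
                     for c in range(3)], axis=-1)

x = np.random.rand(4, 4, 3)  # a small hypothetical 3-channel input
out = de_swt_head(x, M)
```

Per pixel this computes −x·M, so initializing conv1 with the columns of the standard matrix makes the head start out as (the linear part of) the color deconvolution, which training can then refine.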
Further, in step S200, each pixel with value 0 in the dataset is set to 1×10⁻⁷, and then the pixel value xki,j in the i-th row and j-th column of the k-th channel of Ht is transformed by:

x'ki,j = −log10(xki,j / 255)

where k indexes the R, G, B channels; the normalized pixel values x'ki,j obtained in this way form the new matrix Ht'.
Further, in step S300, the matrix Ht' is multiplied by a color deconvolution standard matrix M, where M is the inverse of the Ruifrok-Johnston hematoxylin-eosin-DAB stain matrix, approximately:

M = [  1.878  -1.008  -0.556 ]
    [ -0.066   1.135  -0.136 ]
    [ -0.602  -0.480   1.574 ]

Let Mk,c (1 ≤ k, c ≤ 3) be the element in the k-th row and c-th column of M, and let yci,j (c = 1, 2, 3, representing the three channels of the image; 1 ≤ i ≤ wt, 1 ≤ j ≤ ht) be the pixel in the i-th row and j-th column of channel c of the pixel matrix Dt; then:

yci,j = Σ(k=1..3) x'ki,j · Mk,c
These values form the matrix Dt, i.e. the result of color deconvolution of the original image pixel matrix Ht, wherein the three color channels of Dt are denoted H, E, D and represent the staining information of hematoxylin, eosin and diaminobenzidine respectively; the matrix Dt thus obtained is the HED color space matrix.
Further, in step S400, the matrix Dt is normalized to obtain Dt':

(yci,j)' = (yci,j − (Dt)min) / ((Dt)max − (Dt)min)

wherein (Dt)max and (Dt)min represent the maximum and minimum values, respectively, of the elements in the matrix Dt, so that 0 ≤ (yci,j)' ≤ 1.
Further, in step S700, the same normalization operation as in step S400 is performed on the enhanced images, after which the per-channel pixel means (μ1, μ2, μ3) = (0.485, 0.456, 0.406) and standard deviations (σ1, σ2, σ3) = (0.229, 0.224, 0.225) of the ImageNet dataset are selected, and each image is standardized by the formula:

x'ki,j = (xki,j − μk) / σk
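This per-channel standardization can be sketched as follows (assuming images already scaled to [0, 1]; the statistics are the standard ImageNet values):

```python
import numpy as np

# ImageNet channel statistics assumed by the pre-trained model
mu = np.array([0.485, 0.456, 0.406])
sigma = np.array([0.229, 0.224, 0.225])

def standardize(img):
    """Per-channel standardization of an (H, W, 3) image in [0, 1]."""
    return (img - mu) / sigma  # broadcasts over the channel axis

z = standardize(np.random.rand(224, 224, 3))
```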
Further, in steps S600 and T200, data enhancement is performed on the training set, including random rotation about the image center, horizontal flipping, vertical flipping, scaling, and random changes to the brightness, contrast, saturation and hue of the image; the blank regions at the edges after center rotation are filled with 0-valued pixels, the brightness, contrast and saturation of the image are randomly changed to 80%-120%, 90%-110% and 90%-110% of the original respectively, and the hue is shifted within the range −0.1 to 0.1.
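A toy numpy version of the photometric part of this enhancement is sketched below; it is illustrative only (a practical implementation would typically use a transform library such as torchvision), and the function name is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_jitter(img, rng):
    """Toy photometric jitter in the ranges stated above:
    brightness 80%-120%, contrast 90%-110% (illustrative sketch)."""
    b = rng.uniform(0.8, 1.2)                  # brightness factor
    c = rng.uniform(0.9, 1.1)                  # contrast factor
    out = img.astype(np.float64) * b           # scale brightness
    out = (out - out.mean()) * c + out.mean()  # scale contrast about the mean
    return np.clip(out, 0, 255).astype(np.uint8)

aug = random_jitter(np.full((8, 8, 3), 128, dtype=np.uint8), rng)
```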
Further, in step T600, the deconvolution matrix is split column-wise into three 3×1 column vectors, denoted z1, z2, z3, where zc = (M1,c, M2,c, M3,c)T for c = 1, 2, 3. These three column vectors are loaded as the initial weight parameters of the conv1 layer of the de-swt model; during the forward computation of the de-swt model, each of z1, z2 and z3 convolves the input data to produce the deconvolved data of the corresponding channel, finally yielding the H, E, D channel information respectively.
Further, in step T800, the model with a better effect on the verification set is chosen as a model whose classification accuracy on the verification set is slightly below the highest accuracy obtained during iteration (e.g. the second highest).
Further, an attention mechanism is used within the Swin Transformer model in both the offline and the online image color deconvolution methods.
Compared with the prior art, the invention has the following beneficial effects:
1. In step S200, pixel values of 0 are set to 1×10⁻⁷, which avoids taking the logarithm of 0 (negative infinity); other small numbers, such as 1×10⁻⁶, may be used instead without affecting subsequent classification;
2. In step S500, each normalized element in the matrix Dt' is multiplied by 255 and rounded to an integer, yielding an HED color space image; the H, E, D channels can each be extracted to observe the separation result of the corresponding stain, which can be used to check the stain separation effect;
3. Only the scaling in steps S600 and T200 changes the height and width of the image, and the scaled size suits the input of the subsequent model; the deconvolution and image enhancement operations do not change the image size;
4. The training set is enhanced with changes to the brightness, contrast, saturation and hue of its images, making the model robust to staining differences in histopathological images and less prone to misclassification caused by variations in tissue staining; the verification set and test set are not enhanced, so the classification accuracies on them reflect the true classification performance of the model;
5. In step T800, a model with a better, but not the highest, effect on the verification set is selected, to prevent the conv1 layer parameters from overfitting the verification set data;
6. The Swin Transformer model uses an attention mechanism, and combining the color deconvolution operation with self-attention achieves a good image classification effect;
7. Because model parameters pre-trained on the ImageNet dataset are used in steps S800 and T900, the pixel mean and standard deviation values of the ImageNet dataset are used to standardize the data in steps S700 and T300, so that the model classifies better.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, illustrate embodiments of the application and, together with the description, serve to explain the application, wherein:
FIG. 1 is a flow chart of two embodiments of the present invention;
FIG. 2 is a detailed flow chart of two embodiments of the present invention;
FIG. 3 is a sample of BreakHis datasets of benign and malignant specimens;
FIG. 4 is a BreakHis dataset color deconvolution and channel separation sample;
FIG. 5 is a BreakHis dataset training set image data enhancement sample;
FIG. 6 is a graph of the change in the classification accuracy of the training set and the verification set during the training of the offline image color deconvolution model;
FIG. 7 is a graph showing the classification accuracy rate change of the training set and the verification set during training in the online image color deconvolution model step T800;
Fig. 8 is a training set and verification set classification accuracy change curve in the training process of the online image color deconvolution model step T1000.
Detailed Description
Referring to figs. 1 and 2, the embodiments include two implementations: offline image color deconvolution and online image color deconvolution.
The method for implementing the offline image color deconvolution comprises the following steps:
Referring to fig. 3, S100, a hematoxylin-eosin stained histopathological image dataset is acquired; in this example the BreakHis breast cancer histopathological image dataset is selected, comprising 7909 images acquired from 82 patients: 2480 benign images from 24 benign patients and 5429 malignant images from 58 malignant patients, with benign and malignant image samples shown in fig. 3. The dataset contains four magnifications: 40×, 100×, 200× and 400×. The images are 700×460 in size and are scaled to 224×224; the pixels of each image form a matrix Ht of size 224×224×3, where 3 corresponds to the R, G, B channels of the image;
S200, let xki,j (k = 1, 2, 3, representing the R, G, B channels of the image; 1 ≤ i, j ≤ 224) be an element of Ht, whose value lies in the standard RGB pixel range, i.e. 0 ≤ xki,j ≤ 255. When xki,j = 0 it is replaced by 1×10⁻⁷, so that xki,j > 0, and then:

x'ki,j = −log10(xki,j / 255)

is calculated, obtaining the normalized value x'ki,j; these values form the matrix Ht';
S300, the matrix Ht' is multiplied by the color deconvolution standard matrix M, where M is approximately:

M = [  1.878  -1.008  -0.556 ]
    [ -0.066   1.135  -0.136 ]
    [ -0.602  -0.480   1.574 ]

Let Mk,c (1 ≤ k, c ≤ 3) be the element in the k-th row and c-th column of M, and let yci,j (c = 1, 2, 3, representing the three channels of the image; 1 ≤ i, j ≤ 224) be the pixel in the i-th row and j-th column of channel c of the pixel matrix Dt; then:

yci,j = Σ(k=1..3) x'ki,j · Mk,c
These values form the matrix Dt, the result of color deconvolution of the original image pixel matrix Ht, wherein the three color channels of Dt, denoted H, E, D, represent the staining information of hematoxylin, eosin and diaminobenzidine (DAB) respectively; the matrix Dt thus obtained is the HED color space matrix;
S400, using the formula:

(yci,j)' = (yci,j − (Dt)min) / ((Dt)max − (Dt)min)

the matrix Dt is normalized to obtain Dt', wherein (Dt)max and (Dt)min represent the maximum and minimum values of the elements in the matrix Dt respectively, so that 0 ≤ (yci,j)' ≤ 1;
Referring to fig. 4, S500, each element in the matrix Dt' obtained in step S400 is multiplied by 255 and rounded to an integer, giving the HED color space image after color deconvolution; the H, E, D channels of Dt' are then extracted separately to obtain stain-separated images, and the stain separation effect can be verified by inspecting these 3 channel images, which show the result of color deconvolution and HED channel separation;
S600, the HED color space images obtained in step S500 are divided into a training set, a verification set and a test set in a ratio of approximately 7:1.5:1.5, used respectively for training, validating and evaluating the subsequent model. All images of each patient are assigned to only one of the training, verification and test sets, ensuring that images belonging to the same patient never appear in any two of the three sets. Finally, 5769 images from 59 patients are taken as the training set, 1063 images from 11 patients as the verification set, and 1077 images from the remaining 12 patients as the test set;
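A patient-wise split of this kind can be sketched in plain Python; the helper name and the exact rounding of the ratios are illustrative assumptions:

```python
import random

def patient_wise_split(images_by_patient, ratios=(0.7, 0.15, 0.15), seed=0):
    """Assign whole patients (never individual images) to train/val/test,
    so no patient's images appear in more than one set."""
    patients = sorted(images_by_patient)
    random.Random(seed).shuffle(patients)
    n = len(patients)
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    train = patients[:n_train]
    val = patients[n_train:n_train + n_val]
    test = patients[n_train + n_val:]
    return train, val, test

# hypothetical BreakHis-like index: 82 patients, a few images each
data = {f"p{i}": [f"p{i}_img{j}" for j in range(3)] for i in range(82)}
train, val, test = patient_wise_split(data)
```

Splitting by patient rather than by image is what guarantees that the verification and test accuracies are not inflated by near-duplicate images of the same tissue.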
Referring to fig. 5, S700, online random data enhancement is performed on the training set: the probabilities of horizontal and vertical flipping are set to 0.5, images are randomly rotated about the center by up to 30° with the blank edge regions after rotation filled with 0-valued pixels, the brightness, contrast and saturation of the image are randomly changed to 80%-120%, 90%-110% and 90%-110% of the original respectively, and the hue is shifted within the range −0.1 to 0.1;
S800, the same normalization operation as in step S400 is performed on the enhanced images, after which the per-channel means (μ1, μ2, μ3) = (0.485, 0.456, 0.406) and standard deviations (σ1, σ2, σ3) = (0.229, 0.224, 0.225) are selected, and each image is standardized by the formula:

x'ki,j = (xki,j − μk) / σk;
S900, using the Swin Transformer model pre-trained on the ImageNet dataset, the last fully connected layer is modified into a single MLP (multilayer perceptron) layer whose number of output neurons is 2, and the standardized training set images are fed into the model for fine-tuning. The batch size is set to 32 and the whole training set is trained for 20 epochs, so the model is trained for a total of ⌊5769/32⌋ × 20 = 3600 iterations. The initial learning rate is set to 1×10⁻⁵ and is reduced by a factor of 0.5 every 5 epochs; a cross-entropy loss function is used to compute the loss value and an Adam optimizer is used for optimization;
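The step-decay schedule described here (learning rate halved every 5 epochs) can be written as a one-line helper; the function name is illustrative:

```python
def learning_rate(epoch, lr0=1e-5, decay=0.5, step=5):
    """Step-decay schedule: multiply the initial learning rate lr0 by
    `decay` once every `step` epochs (halved every 5 epochs here)."""
    return lr0 * decay ** (epoch // step)
```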
Referring to fig. 6, S1000, during model training the trained model is validated with the verification set every 72 iterations, finally yielding 50 classification accuracies on the verification set, of which the highest is 93.59% (marked by a circle in fig. 6). The model with the highest classification accuracy on the verification set is selected and evaluated with the test set. The final test set classification accuracy is 93.51%, and the test set classification results are shown in table 1 below.
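The checkpoint-selection rule of step S1000 (keep whichever of the 50 periodic validations scores highest, ties broken by the earliest) can be sketched as:

```python
def select_best(val_accuracies):
    """Index of the checkpoint with the highest validation accuracy;
    ties are broken in favor of the earlier checkpoint."""
    return max(range(len(val_accuracies)),
               key=lambda i: (val_accuracies[i], -i))

# hypothetical accuracies from periodic validation runs
accs = [0.90, 0.9359, 0.93, 0.9359]
best = select_best(accs)
```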
Table 1: Swin Transformer model classification results
The online image color deconvolution implementation method comprises the following steps:
Referring to fig. 3, T100, the same as step S100 of the offline image color deconvolution embodiment: the images of the BreakHis dataset are acquired and scaled to 224×224, and the training, test and verification sets are divided;
T200, performing the corresponding online data enhancement on the training set images of the BreakHis dataset as in step S700 of the offline image color deconvolution scheme;
T300, through the formula:

x'ki,j = (xki,j − (Dt)min) / ((Dt)max − (Dt)min)

each pixel xki,j of each image Dt in BreakHis is normalized, where (Dt)max and (Dt)min represent the maximum and minimum values, respectively, of the elements in the matrix Dt; the pixels of Dt are then standardized by:

x''ki,j = (x'ki,j − μk) / σk

wherein (μ1, μ2, μ3) = (0.485, 0.456, 0.406) and (σ1, σ2, σ3) = (0.229, 0.224, 0.225);
T400, using the Swin Transformer model pre-trained on the ImageNet dataset, the last fully connected layer of the model is changed into a single MLP layer whose number of output neurons is 2, and a negation operation is added at the head of the model, i.e. any input x is output as −x. A convolution layer, denoted conv1, is then added, whose convolution kernel is 1×1 with 3 input channels and 3 output channels and no bias parameter; the modified model is named the de-swt (deconvolution Swin Transformer) model;
T500, the deconvolution matrix M is split column-wise into the three 3×1 column vectors

z1 = (M1,1, M2,1, M3,1)T, z2 = (M1,2, M2,2, M3,2)T, z3 = (M1,3, M2,3, M3,3)T,

and these three column vectors are loaded as the initial weight parameters of the conv1 layer of the de-swt model;
T600, the training set standardized in step T300 is input into the de-swt model, with the initial model learning rate lr set to 1×10⁻⁵. The learning rate magnification factor p of the conv1 layer is set to 10, i.e. the initial learning rate of conv1 is 1×10⁻⁴ while the remaining layers use 1×10⁻⁵. The batch size is set to 32 and the whole training set is trained for 20 epochs, so the model is trained for a total of ⌊5769/32⌋ × 20 = 3600 iterations. The learning rate lr is reduced by a factor of 0.5 every 5 epochs, and an Adam optimizer is used for optimization;
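The per-layer learning-rate magnification used here (p = 10 for conv1, 1 elsewhere) simply scales each layer's update; a minimal numpy sketch with illustrative parameter names:

```python
import numpy as np

def sgd_step(params, grads, lr, lr_mult):
    """One plain gradient step where each named layer has its own
    multiplier on the base learning rate lr (conv1 uses p = 10 here)."""
    return {name: params[name] - lr * lr_mult.get(name, 1.0) * grads[name]
            for name in params}

params = {"conv1": np.ones(3), "mlp": np.ones(3)}
grads = {"conv1": np.ones(3), "mlp": np.ones(3)}
new = sgd_step(params, grads, lr=1e-5, lr_mult={"conv1": 10.0})
```

In a framework such as PyTorch the same effect would typically be achieved with per-parameter-group learning rates; the sketch above only illustrates the arithmetic.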
Referring to fig. 7, T700, during model training the model is validated with the verification set every 72 iterations, finally yielding 50 classification accuracies on the verification set. The highest verification accuracy, 94.52%, occurs at the 14th validation; the second highest, 93.59%, occurs at the 15th validation (marked by a circle in fig. 7). The model with the 93.59% verification accuracy is selected, its conv1 layer parameters are extracted as three 3×1 vectors, and combining these vectors column by column yields a new color deconvolution matrix M';
T800, the Swin Transformer part of de-swt is reloaded with the parameters pre-trained on the ImageNet dataset, while the learning rate magnification factor p of the conv1 layer is set to 1 so that the learning rates of all layers of the model are consistent. The model is then retrained with the batch size set to 32 and the whole training set trained for 20 epochs, for a total of ⌊5769/32⌋ × 20 = 3600 iterations. The initial learning rate is set to 1×10⁻⁵ and is reduced by a factor of 0.5 every 5 epochs; a cross-entropy loss function is used and an Adam optimizer is used for optimization;
Referring to fig. 8, during the retraining of the model in step T800, the trained model is validated with the verification set every 72 iterations, finally yielding 50 classification accuracies on the verification set. The model with the highest classification accuracy on the verification set, 95.36% (marked by a circle in fig. 8), is selected and evaluated with the test set, giving a test set classification accuracy of 94.17%; the classification results are shown in table 2 below.
Table 2: de-swt model classification results
The above is a preferred embodiment of the present invention, and the results of the embodiments show that combining color deconvolution with a self-attention model achieves a good image classification effect. Both offline and online image color deconvolution use the Swin Transformer model because it uses an attention mechanism, but the models usable here are not limited to the Swin Transformer.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.