Method for Constructing a Self-Supervised Pancreas Segmentation Model for CT Images Based on Contrastive Learning

Technical Field
The present application relates to the technical field of image segmentation, and in particular to a method for constructing a self-supervised pancreas segmentation model for CT images based on contrastive learning.
Background
Accurate segmentation of the pancreas in CT images is a fundamental and important task in pancreatic pathology diagnosis, disease treatment, and cancer radiotherapy planning. An accurately segmented pancreas provides valuable information for a physician's clinical diagnosis, such as the size, location, boundary condition, and spatial relationship to neighboring organs of a diseased pancreas. Furthermore, pancreas segmentation plays an irreplaceable role in the subsequent clinical course of treatment, in particular in radiotherapy for cancers and tumors, where accurate segmentation of the diseased organ can effectively mitigate potential harm to healthy organs near the cancerous area. In clinical practice, segmentation of the pancreas is typically performed manually by oncologists or radiologists. This relies on professional medical personnel carefully labeling and examining the scan slice by slice, and manually segmenting the images of a single patient often takes several hours, making it a very time-consuming and error-prone process. Moreover, owing to differences in imaging quality across examination devices and in pancreatic anatomy across patients, quickly and accurately segmenting the pancreas in images from a large number of patients is a very challenging task, especially for less experienced physicians.
In recent years, with the development of deep learning, automatic pancreas segmentation based on deep networks has received a great deal of attention. However, accurate segmentation of the pancreas remains difficult for several reasons. First, the pancreas is small: it occupies less than 0.5% of a whole-abdomen CT volume, which causes a severe class imbalance and makes effective features hard to extract during training. Second, the pancreas exhibits large morphological variation in shape, orientation, and aspect ratio, and is difficult to segment effectively using low-level features. In addition, the many organs and abundant soft tissue surrounding the pancreas create strong background interference that makes the blurred organ boundaries hard to discern. Finally, the performance of the mainstream fully supervised learning methods in this field depends heavily on very expensive manual annotation, so existing methods are often limited by the scale of the labeled data and struggle to extract enough effective features.
Disclosure of Invention
In order to overcome at least one of the above defects in the prior art, the present application provides a method for constructing a self-supervised pancreas segmentation model for CT images based on contrastive learning.
In a first aspect, a method for constructing a self-supervised pancreas segmentation model for CT images based on contrastive learning is provided, including:
performing self-supervised pre-training on an encoder based on an original training dataset to obtain a trained encoder, where the samples in the original training dataset comprise contrast-enhanced abdominal CT image slices of normal subjects and of abnormal subjects (patients);
migrating the trained encoder parameters into the encoder of an encoder-decoder structure, and fine-tuning the encoder-decoder structure based on an annotated dataset to obtain a fine-tuned encoder-decoder structure, where the samples in the annotated dataset are contrast-enhanced abdominal CT image slices of abnormal subjects together with their annotations; the fine-tuned encoder-decoder structure is the contrastive learning-based self-supervised pancreas segmentation model for CT images.
In one embodiment, performing self-supervised pre-training on the encoder based on the original training dataset to obtain the trained encoder comprises:
performing data enhancement on the samples in the original training dataset to convert each sample into a spatially transformed image and a color-transformed image;
inputting the spatially transformed image and the color-transformed image into the encoder to obtain a first feature representation and a second feature representation;
inputting the first feature representation and the second feature representation into an MLP to obtain a third feature representation and a fourth feature representation;
and constructing a contrastive learning loss function based on the third and fourth feature representations, and performing self-supervised pre-training based on the contrastive learning loss function to obtain the trained encoder.
In one embodiment, the contrastive learning loss function is:

$$L_{cons} = -\frac{1}{M}\sum_{k=1}^{M}\log\frac{\exp\left(\mathrm{sim}(z_k^{(3)}, z_k^{(4)})/t\right)}{\sum_{m=1}^{M}\exp\left(\mathrm{sim}(z_k^{(3)}, z_m^{(4)})/t\right)}$$

wherein $L_{cons}$ is the contrastive learning loss function, $z_k^{(3)}$ is the third feature representation of the k-th sample, $z_k^{(4)}$ is the fourth feature representation of the k-th sample, $\mathrm{sim}(\cdot,\cdot)$ is the dot product, k is the sample index, M is the number of samples in the original training dataset, and t is the normalized temperature scale parameter.
In one embodiment, fine-tuning the encoder-decoder structure based on the annotated dataset to obtain the fine-tuned encoder-decoder structure comprises:
inputting the samples in the annotated dataset into the encoder, and inputting the output of the encoder into the decoder;
and constructing a fine-tuning loss function based on the output of the decoder, and performing model fine-tuning based on the fine-tuning loss function to obtain the fine-tuned encoder-decoder structure.
In one embodiment, the fine-tuning loss function is:

$$L = L_{WCE} + \alpha L_{Focal}$$

$$L_{WCE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} w_c\, y_{i,c}\log\left(p_{i,c}\right)$$

$$L_{Focal} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} \left(1-p_{i,c}\right)^{\gamma} y_{i,c}\log\left(p_{i,c}\right)$$

wherein L is the fine-tuning loss function, $L_{WCE}$ is the weighted cross-entropy loss function, $L_{Focal}$ is the Focal Loss function, and α is the contribution weight of the Focal Loss term; N is the number of samples in the annotated dataset, C is the number of categories, $w_c$ is the weight of category c, $y_{i,c}$ is the true label of sample i for category c, $p_{i,c}$ is the decoder output for sample i and category c, and γ is the modulation parameter.
In a second aspect, an apparatus for constructing a self-supervised pancreas segmentation model for CT images based on contrastive learning is provided, including:
a self-supervised pre-training module configured to perform self-supervised pre-training on an encoder based on an original training dataset to obtain a trained encoder, where the samples in the original training dataset comprise contrast-enhanced abdominal CT image slices of normal subjects and of abnormal subjects (patients);
a fine-tuning module configured to migrate the trained encoder parameters into the encoder of an encoder-decoder structure, and fine-tune the encoder-decoder structure based on an annotated dataset to obtain a fine-tuned encoder-decoder structure, where the samples in the annotated dataset are contrast-enhanced abdominal CT image slices of abnormal subjects together with their annotations; the fine-tuned encoder-decoder structure is the constructed pancreas segmentation model.
In one embodiment, the self-supervised pre-training module is further configured to:
perform data enhancement on the samples in the original training dataset to convert each sample into a spatially transformed image and a color-transformed image;
input the spatially transformed image and the color-transformed image into the encoder to obtain a first feature representation and a second feature representation;
input the first feature representation and the second feature representation into an MLP to obtain a third feature representation and a fourth feature representation;
and construct a contrastive learning loss function based on the third and fourth feature representations, and perform self-supervised pre-training based on the contrastive learning loss function to obtain the trained encoder.
In one embodiment, the contrastive learning loss function is:

$$L_{cons} = -\frac{1}{M}\sum_{k=1}^{M}\log\frac{\exp\left(\mathrm{sim}(z_k^{(3)}, z_k^{(4)})/t\right)}{\sum_{m=1}^{M}\exp\left(\mathrm{sim}(z_k^{(3)}, z_m^{(4)})/t\right)}$$

wherein $L_{cons}$ is the contrastive learning loss function, $z_k^{(3)}$ is the third feature representation of the k-th sample, $z_k^{(4)}$ is the fourth feature representation of the k-th sample, $\mathrm{sim}(\cdot,\cdot)$ is the dot product, k is the sample index, M is the number of samples in the original training dataset, and t is the normalized temperature scale parameter.
In one embodiment, the fine-tuning module is further configured to:
input the samples in the annotated dataset into the encoder, and input the output of the encoder into the decoder;
and construct a fine-tuning loss function based on the output of the decoder, and perform model fine-tuning based on the fine-tuning loss function to obtain the fine-tuned encoder-decoder structure.
In one embodiment, the fine-tuning loss function is:

$$L = L_{WCE} + \alpha L_{Focal}$$

$$L_{WCE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} w_c\, y_{i,c}\log\left(p_{i,c}\right)$$

$$L_{Focal} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} \left(1-p_{i,c}\right)^{\gamma} y_{i,c}\log\left(p_{i,c}\right)$$

wherein L is the fine-tuning loss function, $L_{WCE}$ is the weighted cross-entropy loss function, $L_{Focal}$ is the Focal Loss function, and α is the contribution weight of the Focal Loss term; N is the number of samples in the annotated dataset, C is the number of categories, $w_c$ is the weight of category c, $y_{i,c}$ is the true label of sample i for category c, $p_{i,c}$ is the decoder output for sample i and category c, and γ is the modulation parameter.
In a third aspect, a method for contrastive learning-based self-supervised pancreas segmentation of CT images is provided, including:
inputting a pancreatic CT image to be segmented into a contrastive learning-based self-supervised pancreas segmentation model for CT images to obtain a segmentation result;
where the pancreas segmentation model is the contrastive learning-based self-supervised pancreas segmentation model for CT images obtained according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the above method for constructing a contrastive learning-based self-supervised pancreas segmentation model for CT images, or implements the above method for contrastive learning-based self-supervised pancreas segmentation of CT images.
In a fifth aspect, a computer program product is provided, including a computer program/instructions which, when executed by a processor, implement the above method for constructing a contrastive learning-based self-supervised pancreas segmentation model for CT images, or implement the above method for contrastive learning-based self-supervised pancreas segmentation of CT images.
Compared with the prior art, the present application has the following beneficial effects: the pancreas segmentation model constructed by the model construction method addresses problems that, in the prior art, are hard to handle effectively because of the limited scale of annotated data, namely the imbalance of segmentation classes in CT images, the large morphological variation of the pancreas, and blurred organ boundaries. By adopting a self-supervised learning scheme, it alleviates the dependence of current mainstream deep networks on expensive data acquisition and data annotation, and further improves the segmentation performance for the pancreas in CT images.
Drawings
The present application may be better understood by reference to the following detailed description taken in conjunction with the accompanying drawings, which are incorporated in and form a part of this specification. In the drawings:
FIG. 1 shows a flow diagram of a method for constructing a contrastive learning-based self-supervised pancreas segmentation model for CT images, according to an embodiment of the present application;
FIG. 2 shows a block diagram of an apparatus for constructing a contrastive learning-based self-supervised pancreas segmentation model for CT images, according to an embodiment of the present application;
FIG. 3 shows a visual comparison of the segmentation results of different methods and the method of the present application on the test data, wherein (a) is the ground-truth label; (b) is the segmentation result of the 3DUnet model; (c) is the segmentation result of the Models Genesis method; (d) is the segmentation result of the Rubik's Cube++ method; (e) is the segmentation result of the nnUNet model; (f) is the segmentation result of the DQN model; and (g) is the segmentation result of the method of the present application.
Detailed Description
Exemplary embodiments of the present application will be described hereinafter with reference to the accompanying drawings. In the interest of clarity and conciseness, not all features of an actual embodiment are described in the specification. It will of course be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions may be made to achieve the developers' specific goals, and that these decisions may vary from one implementation to another.
It should be noted that, in order to avoid obscuring the present application with unnecessary details, only the device structures closely related to the solution according to the present application are shown in the drawings, and other details not greatly related to the present application are omitted.
It should be understood that the present application is not limited to the embodiments described below with reference to the drawings. In this context, where possible, embodiments may be combined with each other, features may be replaced or borrowed between different embodiments, and one or more features may be omitted from an embodiment.
An embodiment of the present application provides a method for constructing a contrastive learning-based self-supervised pancreas segmentation model for CT images. FIG. 1 shows a flow diagram of the method according to an embodiment of the present application. Referring to FIG. 1, the method includes:
step S1, performing self-supervision pre-training on an encoder based on an original training data set to obtain a trained encoder; the samples in the raw training dataset include normal human abdomen-enhancing CT image slices and abnormal human abdomen-enhancing CT image slices.
Here, the NIH public pancreas dataset is first collected. The dataset is then organized: the contrast-enhanced abdominal CT data of 82 normal subjects are processed at a resolution of 512x512 pixels with a slice thickness of 1.5 mm to 2.5 mm; the CT image data of abnormal subjects (patients) are all stored in DICOM format, and all label data are stored in the .nii data format. The contrast-enhanced abdominal CT image slices of normal and abnormal subjects obtained through this processing form the original training dataset.
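For illustration only, the following is a minimal sketch of how such volumes might be loaded and resampled, assuming the SimpleITK library; the function names, file layout, and the 2.0 mm target thickness are illustrative assumptions, not part of the original disclosure.

```python
import SimpleITK as sitk

def load_ct_volume(dicom_dir: str) -> sitk.Image:
    """Read one abdominal CT series stored as DICOM slices."""
    reader = sitk.ImageSeriesReader()
    reader.SetFileNames(reader.GetGDCMSeriesFileNames(dicom_dir))
    return reader.Execute()

def load_label_volume(nii_path: str) -> sitk.Image:
    """Read the corresponding pancreas label stored in .nii format."""
    return sitk.ReadImage(nii_path)

def resample_slice_thickness(image: sitk.Image, thickness_mm: float = 2.0) -> sitk.Image:
    """Resample along z so the slice thickness lands in the stated 1.5-2.5 mm range,
    keeping the in-plane 512x512 grid unchanged."""
    sx, sy, sz = image.GetSpacing()
    nx, ny, nz = image.GetSize()
    new_nz = int(round(nz * sz / thickness_mm))
    return sitk.Resample(image, (nx, ny, new_nz), sitk.Transform(),
                         sitk.sitkLinear, image.GetOrigin(),
                         (sx, sy, thickness_mm), image.GetDirection())
```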
Step S2: migrating the trained encoder parameters into the encoder of an encoder-decoder structure, and fine-tuning the encoder-decoder structure based on the annotated dataset to obtain a fine-tuned encoder-decoder structure. The samples in the annotated dataset are contrast-enhanced abdominal CT image slices of abnormal subjects together with their annotations. The fine-tuned encoder-decoder structure is the contrastive learning-based self-supervised pancreas segmentation model for CT images. Here, the encoder-decoder may adopt a 3DUnet model.
The pancreas segmentation model constructed by this method addresses problems that, in the prior art, are hard to handle effectively because of the limited scale of annotated data: the imbalance of segmentation classes in CT images, the large morphological variation of the pancreas, and blurred organ boundaries. The self-supervised learning scheme alleviates the dependence of current mainstream deep networks on expensive data acquisition and annotation, and further improves pancreas segmentation performance in CT images. The model construction process comprises two stages: an upstream task (self-supervised pre-training) and a downstream task (pancreas segmentation). The upstream task performs self-supervised pre-training on the input original three-dimensional CT images, with the main objective of mining the latent high-level semantic features of the abdominal organs in the raw data.
This embodiment adopts a 3DUnet encoder-decoder structure. As a classic three-dimensional model in the field of medical image segmentation, the 3DUnet mainly consists of three parts: downsampling (encoder), upsampling (decoder), and skip connections. Because it fuses rich information at different scales, the 3DUnet is well suited to medical image segmentation, where data volumes are small and features are blurred. The output of the encoder-decoder network is a feature map of the same size as the original image. After the feature map is obtained, a three-dimensional weak label of the pancreas is generated by a maximum bounding box algorithm, and several three-dimensional boxes of the same size are generated around the pancreas weak label, from near to far. Feature representations of positive and negative sample pairs are thereby generated on the feature map. Finally, the obtained positive and negative sample pair representations are mapped into a latent space, and a contrastive loss function is applied in that space to perform contrastive learning training, as sketched below.
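The disclosure gives no code for this box-based pair sampling, so the following is a minimal sketch of one plausible reading, in which the positive sample is the feature box on the pancreas weak label and the negatives are same-size boxes shifted from near to far around it. All names, box sizes, and offsets here are hypothetical, and the boxes are assumed to stay inside the volume.

```python
import torch

def crop_box(feat: torch.Tensor, center, size):
    """Crop a (C, D, H, W) feature map around a voxel center; size is (d, h, w)."""
    z, y, x = center
    d, h, w = size
    return feat[:, z - d // 2: z + d // 2,
                   y - h // 2: y + h // 2,
                   x - w // 2: x + w // 2]

def sample_pairs(feat, pancreas_center, box_size, offsets):
    """Positive: the box on the pancreas weak label.
    Negatives: same-size boxes shifted 'from near to far' around it."""
    anchor = crop_box(feat, pancreas_center, box_size)
    negatives = [
        crop_box(feat, tuple(c + o for c, o in zip(pancreas_center, off)), box_size)
        for off in offsets
    ]
    return anchor, negatives
```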
Through the training of the upstream task, 3DUnet self-supervised pre-training parameters that fully capture the local high-level semantic features of different abdominal regions in CT images can be obtained. After the upstream task is trained, the parameters of the self-supervised pre-trained encoder-decoder network are migrated into the 3DUnet model of the downstream task. The downstream task then fine-tunes the 3DUnet model on the annotated dataset. After training, the resulting model serves as the pancreas segmentation model, which is used to complete accurate segmentation of the pancreas. A sketch of the parameter migration follows.
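As a hedged illustration of the parameter migration, the sketch below copies encoder weights from a self-supervised checkpoint into the downstream 3DUnet using PyTorch; the "encoder." name prefix and the checkpoint path are assumptions about how the modules are organized.

```python
import torch

def migrate_encoder(downstream_model, checkpoint_path="ssl_pretrain.pth"):
    """Copy self-supervised encoder weights into the downstream 3DUnet.

    Assumes encoder submodules share the 'encoder.' name prefix in both
    networks; strict=False leaves the decoder randomly initialized so it
    can be trained on the annotated dataset.
    """
    state = torch.load(checkpoint_path, map_location="cpu")
    encoder_only = {k: v for k, v in state.items() if k.startswith("encoder.")}
    downstream_model.load_state_dict(encoder_only, strict=False)
    return downstream_model
```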
The upstream task is built on the idea of contrastive learning. Through a contrastive learning paradigm with strong semantic feature extraction ability, the local high-level semantic features of the pancreas in the raw data are fully mined in a self-supervised manner, forcing the model to deeply understand the difference between background regions and pancreas regions in CT images, so that it accurately recognizes background interference and distinguishes the blurred organ boundary of the pancreas. The rich features hidden in the raw data, obtained by the self-supervised pre-trained model, help the model identify the pancreas in CT images through high-level semantic features and thus adapt better to the large morphological variation of the pancreas. The "useful information" acquired through self-supervised pre-training in the upstream task greatly strengthens the downstream task's ability to extract pancreatic features, so the model can better separate pancreas regions from the complex background of CT images.
In one embodiment, the self-supervised pre-training of the encoder based on the original training dataset in step S1 to obtain the trained encoder may include:
Step S11: performing data enhancement on the samples in the original training dataset to convert each sample into a spatially transformed image and a color-transformed image;
Step S12: inputting the spatially transformed image and the color-transformed image into the encoder to obtain a first feature representation and a second feature representation;
Step S13: inputting the first feature representation and the second feature representation into an MLP (Multilayer Perceptron) to obtain a third feature representation and a fourth feature representation;
Step S14: constructing a contrastive learning loss function based on the third and fourth feature representations, and performing self-supervised pre-training based on the contrastive learning loss function to obtain the trained encoder. A sketch of steps S11-S13 follows.
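The following is a minimal PyTorch sketch of steps S11-S13; the projection head dimensions, the augmentation callables, and the assumption that the encoder returns pooled per-sample feature vectors are illustrative, not taken from the disclosure.

```python
import torch
from torch import nn

class ProjectionHead(nn.Module):
    """MLP mapping encoder features to the latent space used by the loss."""
    def __init__(self, in_dim: int, hidden_dim: int = 512, out_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def pretrain_views(encoder, head, batch, spatial_aug, color_aug):
    """S11-S13: two augmented views -> encoder -> MLP.

    Assumes encoder(...) returns an (M, in_dim) vector per sample,
    e.g. after global pooling of the 3D feature map.
    """
    v_sp, v_col = spatial_aug(batch), color_aug(batch)  # S11: two transformed views
    f1, f2 = encoder(v_sp), encoder(v_col)              # S12: first/second representations
    z3, z4 = head(f1), head(f2)                         # S13: third/fourth representations
    return z3, z4                                       # consumed by the loss in S14
```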
Specifically, the contrastive learning loss function is:

$$L_{cons} = -\frac{1}{M}\sum_{k=1}^{M}\log\frac{\exp\left(\mathrm{sim}(z_k^{(3)}, z_k^{(4)})/t\right)}{\sum_{m=1}^{M}\exp\left(\mathrm{sim}(z_k^{(3)}, z_m^{(4)})/t\right)}$$

wherein $L_{cons}$ is the contrastive learning loss function, $z_k^{(3)}$ is the third feature representation of the k-th sample, $z_k^{(4)}$ is the fourth feature representation of the k-th sample, $\mathrm{sim}(\cdot,\cdot)$ is the dot product, k is the sample index, M is the number of samples in the original training dataset, and t is the normalized temperature scale parameter.
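The sketch below is one standard PyTorch instantiation of this loss, consistent with the symbols defined above (positives lie on the diagonal of the similarity matrix); the L2 normalization of the feature vectors and the default temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z3: torch.Tensor, z4: torch.Tensor, t: float = 0.1) -> torch.Tensor:
    """NT-Xent-style loss over M samples; z3 and z4 are (M, dim).

    sim() is a dot product as in the formula above; the vectors are
    L2-normalized first so that t acts as a temperature scale.
    """
    z3 = F.normalize(z3, dim=1)
    z4 = F.normalize(z4, dim=1)
    logits = z3 @ z4.t() / t                              # (M, M): sim(z3_k, z4_m) / t
    targets = torch.arange(z3.size(0), device=z3.device)  # positives on the diagonal
    return F.cross_entropy(logits, targets)               # -1/M sum_k log(softmax)_kk
```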
In one embodiment, fine-tuning the encoder-decoder structure based on the annotated dataset in step S2 to obtain the fine-tuned encoder-decoder structure includes:
Step S21: inputting the samples in the annotated dataset into the encoder, and inputting the output of the encoder into the decoder;
Step S22: constructing a fine-tuning loss function based on the output of the decoder, and performing model fine-tuning based on the fine-tuning loss function to obtain the fine-tuned encoder-decoder structure.
Specifically, the fine-tuning loss function is:

$$L = L_{WCE} + \alpha L_{Focal}$$

$$L_{WCE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} w_c\, y_{i,c}\log\left(p_{i,c}\right)$$

$$L_{Focal} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} \left(1-p_{i,c}\right)^{\gamma} y_{i,c}\log\left(p_{i,c}\right)$$

wherein L is the fine-tuning loss function, $L_{WCE}$ is the weighted cross-entropy loss function, $L_{Focal}$ is the Focal Loss function, and α is the contribution weight of the Focal Loss term; N is the number of samples in the annotated dataset, C is the number of categories and is set to 2, $w_c$ is the weight of category c, $y_{i,c}$ is the true label of sample i for category c, and $p_{i,c}$ is the decoder output for sample i and category c, which gives the probability that each voxel in each sample belongs to the target (pancreas); γ is a modulation parameter that can be set empirically.
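A minimal PyTorch sketch of this combined loss follows; the default values of α and γ are illustrative placeholders, since the disclosure only states that they are set empirically.

```python
import torch

def fine_tune_loss(probs, labels, class_weights, alpha=1.0, gamma=2.0, eps=1e-8):
    """L = L_WCE + alpha * L_Focal over per-voxel class probabilities.

    probs and labels are one-hot tensors of shape (N, C, D, H, W) with C = 2
    (background, pancreas); class_weights is a (C,) tensor giving w_c.
    """
    log_p = torch.log(probs.clamp_min(eps))
    # Broadcast w_c over the batch and spatial dimensions.
    w = class_weights.view(1, -1, *([1] * (probs.dim() - 2)))
    l_wce = -(w * labels * log_p).sum(dim=1).mean()
    l_focal = -(((1.0 - probs) ** gamma) * labels * log_p).sum(dim=1).mean()
    return l_wce + alpha * l_focal
```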
Here, the model was trained using an AdamW optimizer with a warmup cosine scheduler with 500 warmup iterations. The pre-training experiments used a batch size of 4 per GPU, an initial learning rate of 4e-4 for 450K iterations, a momentum of 0.9, and a decay of 1e-5. A configuration sketch follows.
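The sketch below configures an optimizer and scheduler matching the stated hyperparameters; reading "momentum of 0.9" as AdamW's beta1 and "decay of 1e-5" as weight decay is an assumption, as is the linear-warmup form of the schedule.

```python
import math
import torch

def build_optimizer(model: torch.nn.Module,
                    total_iters: int = 450_000,
                    warmup_iters: int = 500):
    """AdamW + warmup cosine schedule with the hyperparameters stated above."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4,
                                  betas=(0.9, 0.999), weight_decay=1e-5)

    def warmup_cosine(step: int) -> float:
        if step < warmup_iters:                      # linear warmup for 500 iterations
            return step / warmup_iters
        progress = (step - warmup_iters) / (total_iters - warmup_iters)
        return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay to zero

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, warmup_cosine)
    return optimizer, scheduler
```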
Based on the same inventive concept as the method for constructing a contrastive learning-based self-supervised pancreas segmentation model for CT images, this embodiment also provides a corresponding construction apparatus. FIG. 2 shows a block diagram of the apparatus according to an embodiment of the present application, which includes:
a self-supervised pre-training module 21 configured to perform self-supervised pre-training on an encoder based on an original training dataset to obtain a trained encoder, where the samples in the original training dataset comprise contrast-enhanced abdominal CT image slices of normal subjects and of abnormal subjects (patients);
a fine-tuning module 22 configured to migrate the trained encoder parameters into the encoder of an encoder-decoder structure, and fine-tune the encoder-decoder structure based on the annotated dataset to obtain a fine-tuned encoder-decoder structure, where the samples in the annotated dataset are contrast-enhanced abdominal CT image slices of abnormal subjects together with their annotations; the fine-tuned encoder-decoder structure is the constructed pancreas segmentation model.
The apparatus in this embodiment shares the same inventive concept as the above method for constructing a contrastive learning-based self-supervised pancreas segmentation model for CT images, so its specific implementation can be found in the foregoing examples of the construction method; its technical effects correspond to those of the method and are not repeated here.
An embodiment of the present application also provides a method for contrastive learning-based self-supervised pancreas segmentation of CT images, including:
inputting a pancreatic CT image to be segmented into the contrastive learning-based self-supervised pancreas segmentation model for CT images to obtain a segmentation result, where the pancreas segmentation model is obtained according to the above model construction method. Here, the segmentation result separates the target region from the background region in the pancreatic CT image to be segmented, as sketched below.
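For illustration, a minimal inference sketch follows, assuming the fine-tuned model takes a normalized CT volume and outputs per-voxel class scores; the input layout is an assumption.

```python
import torch

@torch.no_grad()
def segment_pancreas(model: torch.nn.Module, ct_volume: torch.Tensor) -> torch.Tensor:
    """Run the fine-tuned encoder-decoder on one CT volume.

    ct_volume: (1, 1, D, H, W) normalized intensities. Returns a voxel-wise
    label map separating the target (pancreas) region from the background.
    """
    model.eval()
    logits = model(ct_volume)    # (1, C=2, D, H, W) class scores
    return logits.argmax(dim=1)  # (1, D, H, W) predicted labels
```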
To verify the effectiveness of the contrastive learning-based self-supervised pancreas segmentation method for CT images, comparative experiments were conducted against existing methods to verify its superiority. In addition, the results were compared with the strategy of training the base segmentation model 3DUnet from scratch, to verify the effectiveness of the method of the present application.
The results of the pancreas segmentation comparison experiments are shown in Table 1. As can be seen from the table, the proposed contrastive learning-based self-supervised pancreas segmentation method for CT images achieved the best DICE coefficient of 87.96% on 24 cases of test data. Compared with nnUNet and DQN, currently the most advanced fully supervised pancreas semantic segmentation methods, the self-supervised segmentation method of the present application improves the pancreas segmentation result by 2.02% and 1.03%, respectively. Compared with the segmentation results of Models Genesis and Rubik's Cube++, self-supervised methods that achieve advanced pancreas segmentation performance, the method improves the results by 4.73% and 3.88%, respectively. These comparison results demonstrate the superiority of the method, which further improves pancreas segmentation accuracy without relying on additional data or labels. In addition, for the base segmentation model 3DUnet used in the upstream and downstream tasks, the self-supervised pre-training strategy of the present application significantly improves segmentation performance: compared with the conventional strategy of training from scratch, the 3DUnet model acquires a large number of important feature representations of the pancreas through the self-supervised upstream task and provides this pre-training information to the downstream task through parameter migration, improving the DICE coefficient of pancreas segmentation by 7.21%. These experimental results fully demonstrate the correctness and effectiveness of the method.
TABLE 1 Results of the pancreas segmentation comparison experiments

| Method | Average DICE coefficient |
|--------|--------------------------|
| 3DUnet | 0.8075 |
| Models Genesis | 0.8323 |
| Rubik's Cube++ | 0.8408 |
| nnUNet | 0.8594 |
| DQN | 0.8693 |
| The method of the present application | 0.8796 |
FIG. 3 shows a visual comparison of the segmentation results of different methods and the method of the present application on the test data, wherein (a) is the ground-truth label; (b) is the segmentation result of the 3DUnet model; (c) is the segmentation result of the Models Genesis method; (d) is the segmentation result of the Rubik's Cube++ method; (e) is the segmentation result of the nnUNet model; (f) is the segmentation result of the DQN model; and (g) is the segmentation result of the method of the present application. The green area in the figure represents the pancreas. As shown in FIG. 3, the prior-art methods clearly segment parts of the pancreas region inaccurately. Compared with the other methods, the proposed self-supervised segmentation method performs better on the test data and accurately distinguishes the complex background from the pancreas region in the CT images, demonstrating the effectiveness and superiority of the method.
An embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method for constructing a contrastive learning-based self-supervised pancreas segmentation model for CT images, or the above method for contrastive learning-based self-supervised pancreas segmentation of CT images.
An embodiment of the present application provides a computer program product including a computer program/instructions which, when executed by a processor, implement the above method for constructing a contrastive learning-based self-supervised pancreas segmentation model for CT images, or the above method for contrastive learning-based self-supervised pancreas segmentation of CT images.
The foregoing describes only various embodiments of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions that can readily be conceived by a person skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.