Pulmonary nodule classification method, medium and electronic device
Technical Field
The present invention relates to the field of machine learning and CT image processing, and in particular, to a lung nodule classification method, medium, and electronic device that combine deep learning and imaging features.
Background
Lung cancer is currently one of the leading causes of death from malignant tumors, and its morbidity and mortality are increasing year by year. 'Early detection, early diagnosis and early treatment' is the key to improving the survival rate of lung cancer patients: the survival rate of patients diagnosed at an early stage is much higher than that of patients diagnosed at an advanced stage. Early lung cancer generally presents as lung nodules, so the detection of lung nodules is key to reducing mortality in lung cancer patients.
In clinical practice, doctors analyze lesion information according to features such as the shape, position and texture of nodules in a patient's CT images. However, because the volume of image data is huge, manual identification is time-consuming, laborious and high-intensity work. In addition, radiologists differ in experience, so the identification results are strongly subjective and their accuracy cannot be guaranteed. To improve the accuracy of lung nodule detection, researchers have studied a number of factors that affect it. The extraction and identification of these features in CT images are the key steps.
The process of identifying the risk category of lung nodules based on traditional imaging methods mainly comprises lung nodule segmentation, feature extraction, feature optimization and classification by a classifier. In general, classification accuracy depends on three factors: (1) the accuracy of lung nodule segmentation; (2) the representativeness of the extracted lung nodule features; (3) the performance of the classifier. Therefore, the classification accuracy of benign and malignant pulmonary nodules is usually improved by optimizing these factors.
In recent years, deep learning has advanced significantly in the field of medical image analysis, and many high-performing network models have been proposed for the classification of benign and malignant pulmonary nodules. For example, CN110534192A discloses a deep-learning-based method for identifying benign and malignant lung nodules, which offers strong real-time performance, high accuracy and robustness compared with manual methods, but classifies poorly when the amount of data is insufficient.
Disclosure of Invention
The present invention aims to overcome the above drawbacks of the prior art by providing a lung nodule classification method, medium and electronic device that combine deep learning and imaging features, so as to obtain more accurate identification features and improve classification accuracy.
The object of the invention can be achieved by the following technical solution:
A lung nodule classification method combining deep learning and imaging features comprises the following steps:
acquiring CT original image data and a corresponding lung nodule region labeling file;
segmenting a lung nodule image block according to the CT original image data and the lung nodule region labeling file;
extracting the imaging features of the lung nodule image block;
processing the lung nodule image block by adopting a trained three-dimensional convolutional neural network, and extracting CNN (convolutional neural network) features;
combining the imaging features and the CNN features to obtain final features;
and obtaining a classification result based on the final features by adopting a classifier.
Further, the imaging features include texture features, grayscale features, and shape features.
Further, training samples for training the three-dimensional convolutional neural network are obtained by:
pixel blocks centered on lung nodules are cropped from a plurality of CT original images, each pixel block is transformed in three dimensions to augment the data, and the augmented data are down-sampled to form the training samples.
Further, the three-dimensional convolutional neural network comprises a Conv3D convolutional layer, a MaxPooling3D pooling layer and a Dense fully-connected layer.
Further, the final features are obtained by a feature selection method based on a random forest model.
Further, the lung nodule region labeling file is an XML file.
Further, the classifier comprises a support vector machine classifier or a random forest classifier.
Further, 512 CNN features are extracted based on the three-dimensional convolutional neural network.
The present invention also provides a computer-readable storage medium comprising one or more programs for execution by one or more processors of an electronic device, the one or more programs including instructions for performing a lung nodule classification method that combines deep learning and imaging features as described.
The present invention also provides an electronic device comprising:
one or more processors;
a memory; and
one or more programs stored in the memory, the one or more programs including instructions for performing a lung nodule classification method that combines deep learning and imaging features as described.
Compared with the prior art, the invention has the following beneficial effects:
1. When the CT image is processed, the lung nodule region is segmented by using the lung nodule region labeling file, so the step of designing an algorithm to segment lung nodules is omitted, which gives high efficiency and higher reliability.
2. The classification method combines the CNN features and the imaging features during classification, which greatly enriches the representativeness of the nodule features, overcomes the defect that traditional imaging features cannot sufficiently reflect the information of the lesion area, overcomes the defect that a 3D CNN classifies poorly when the amount of data is insufficient, and effectively improves the classification accuracy.
3. The invention adopts a feature selection method based on a random forest model to screen features during feature combination, retaining the features with high weight values and removing a large number of irrelevant and redundant features, which preserves accuracy while reducing the computational cost of classification.
4. The three-dimensional convolutional neural network adopted by the invention is an Inception-ResNet model, which can extract effective feature maps and converges quickly.
5. Experiments prove that the method can achieve higher classification accuracy when applied to a support vector machine classifier or a random forest classifier.
Drawings
FIG. 1 is a block flow diagram of the method of the present invention;
FIG. 2 is a diagram of the 3D-Inception-ResNet network framework employed in the present invention;
FIG. 3 is a feature set visualization.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
Example 1
As shown in fig. 1, the present embodiment provides a lung nodule classification method combining deep learning and imaging features, including the following steps:
Step 1: acquire the CT original image data, which are DICOM files acquired by CT equipment.
Step 2: acquire the lung nodule region labeling file corresponding to the CT original image data; the labeling file is an XML file obtained from annotation by a doctor. The lung nodule image block is then segmented according to the CT original image data and the lung nodule region labeling file.
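As a rough illustration of this segmentation step, the sketch below cuts a cube centered on an annotated nodule out of a CT volume held as a NumPy array. The function name `crop_nodule_block`, the (z, y, x) center convention, the cubic block shape and the padding value of -1000 are assumptions made for illustration; the source does not specify how the block is cut, and parsing of the DICOM and XML files is omitted.

```python
import numpy as np

def crop_nodule_block(volume, center, size=64, pad_value=-1000):
    """Cut a size^3 cube centered on `center` (z, y, x) out of a CT volume.

    Regions falling outside the scan are filled with `pad_value`
    (an assumed air-like intensity, not taken from the source).
    """
    half = size // 2
    # Pad every axis so that cubes near the border remain full-sized.
    padded = np.pad(volume, half, mode="constant", constant_values=pad_value)
    z, y, x = (int(c) + half for c in center)  # shift into padded coordinates
    return padded[z - half:z + half, y - half:y + half, x - half:x + half]

# Synthetic stand-in for a CT volume; a real one would be built from DICOM slices.
rng = np.random.default_rng(0)
volume = rng.integers(-1000, 400, size=(120, 256, 256)).astype(np.int16)
block = crop_nodule_block(volume, center=(10, 128, 250))
print(block.shape)  # (64, 64, 64)
```

Padding before slicing keeps the block size fixed even when the annotated nodule lies close to the edge of the scan.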
Step 3: extract the imaging features of the lung nodule image block, including texture features, gray-level features and shape features. In this embodiment there are 103 imaging features: the texture features include the gray-level co-occurrence matrix, the gray-level run-length matrix and the like; the gray-level features include gray-level entropy, energy, skewness, variance and the like; and the shape features include elongation, sphericity, surface area, volume and the like.
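A minimal sketch of a few of the gray-level features named above (entropy, energy, skewness, variance) is given below. The exact definitions, the histogram bin count of 32 and the function name `gray_level_features` are assumptions; the source does not state which formulas or toolkit were used for its 103 features.

```python
import numpy as np

def gray_level_features(block, mask=None, bins=32):
    """First-order intensity statistics of the voxels selected by `mask`."""
    voxels = block[mask] if mask is not None else block.ravel()
    voxels = voxels.astype(np.float64)
    mean, var = voxels.mean(), voxels.var()
    std = np.sqrt(var)
    # Skewness: third standardized moment (defined as 0 for a constant region).
    skew = ((voxels - mean) ** 3).mean() / std ** 3 if std > 0 else 0.0
    # Shannon entropy of the discretized intensity histogram.
    counts, _ = np.histogram(voxels, bins=bins)
    p = counts[counts > 0] / counts.sum()
    entropy = float(-(p * np.log2(p)).sum())
    energy = float((voxels ** 2).sum())  # sum of squared intensities
    return {"variance": var, "skewness": skew, "entropy": entropy, "energy": energy}

# Toy region: half of the voxels are 0 and half are 1.
toy = np.array([0, 0, 1, 1])
feats = gray_level_features(toy, bins=2)
print(feats["variance"], feats["entropy"])  # 0.25 1.0
```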
Step 4: process the lung nodule image block with a trained three-dimensional convolutional neural network and extract the CNN features. The three-dimensional convolutional neural network comprises a Conv3D convolutional layer, a MaxPooling3D pooling layer and a Dense fully-connected layer.
In this embodiment, a 3D-Inception-ResNet classification model is built and used as the three-dimensional convolutional neural network to extract the CNN features. The training samples for training the three-dimensional convolutional neural network are obtained as follows:
A 64 x 64 pixel block centered on a lung nodule is cropped from each of a plurality of CT original images, each pixel block is transformed in three dimensions to augment the data, and the augmented data are down-sampled to 48 x 48 pixel blocks, which eliminates the influence of the data augmentation operations on the boundary values of the nodule blocks and forms the training samples.
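One possible reading of this augmentation procedure is sketched below. It assumes cubic blocks (64 voxels per side reduced to 48), uses flips and 90-degree rotations as the "transformation operations", and treats the reduction to 48 as a central crop that discards border voxels. All three choices are assumptions; the source names neither the transformations nor the resampling method.

```python
import numpy as np

def augment_block(block):
    """Return the block plus flipped and 90-degree-rotated copies."""
    out = [block]
    for axis in range(3):
        out.append(np.flip(block, axis=axis))          # mirror along each axis
    for axes in [(0, 1), (0, 2), (1, 2)]:
        out.append(np.rot90(block, k=1, axes=axes))    # rotate in each plane
    return out

def center_crop(block, size=48):
    """Keep the central size^3 region, discarding the border voxels."""
    starts = [(s - size) // 2 for s in block.shape]
    return block[tuple(slice(st, st + size) for st in starts)]

block = np.arange(64 ** 3, dtype=np.float32).reshape(64, 64, 64)
samples = [center_crop(b) for b in augment_block(block)]
print(len(samples), samples[0].shape)  # 7 (48, 48, 48)
```

Cropping after the transformation means that any interpolation or padding artifacts introduced at the block boundary do not reach the network input.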
The 3D-Inception-ResNet model is trained with the augmented data, and the features of the layer preceding the classification output of the 3D-Inception-ResNet model are extracted, giving 512 convolutional neural network features in total.
The Inception-ResNet model comprises 3 convolutional layers, 5 pooling layers, 3 Inception-ResNet modules and 1 output layer in total, and is provided with a shortcut structure. Each module has 4 convolution branches operating on the input data, which emulate receptive fields of different sizes. Stacked 3 x 3 convolution kernels are used in place of larger convolution kernels, which reduces the number of model parameters. Finally, the feature maps obtained by the convolution branches are concatenated, a 1 x 1 convolution is used to align the dimensions, and the result is added to the input data to give the output of the Inception-ResNet module. The model converges quickly.
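The parameter saving from stacking small kernels can be checked with simple arithmetic. The sketch below compares two stacked 3 x 3 x 3 convolutions, which together cover a 5 x 5 x 5 receptive field, with a single 5 x 5 x 5 convolution. The cubic 3D kernel shape and the channel width of 32 are hypothetical values chosen for illustration, not figures from the source.

```python
def conv3d_weights(kernel, c_in, c_out):
    """Weight count of a 3D convolution with a cubic kernel (bias ignored)."""
    return kernel ** 3 * c_in * c_out

channels = 32  # hypothetical channel width, not taken from the source
stacked = 2 * conv3d_weights(3, channels, channels)  # two 3x3x3 layers
single = conv3d_weights(5, channels, channels)       # one 5x5x5 layer
print(stacked, single)  # 55296 128000
```

Under these assumptions the stacked pair needs fewer than half the weights of the single large kernel for the same receptive field.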
Step 5: combine the imaging features and the CNN features to obtain the final features.
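A minimal sketch of the combination step follows, assuming it amounts to concatenating the 103 imaging features and the 512 CNN features of each sample into one vector. The per-column standardization and the random stand-in data are assumptions added for illustration; the source does not say whether the features are rescaled before being combined.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 4                                   # illustrative batch size
radiomic = rng.normal(size=(n_samples, 103))    # stand-in for the imaging features
cnn = rng.normal(size=(n_samples, 512))         # stand-in for the CNN features

def zscore(features):
    """Standardize each column; constant columns are mapped to zero."""
    std = features.std(axis=0)
    std[std == 0] = 1.0
    return (features - features.mean(axis=0)) / std

combined = np.hstack([zscore(radiomic), zscore(cnn)])
print(combined.shape)  # (4, 615)
```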
In this embodiment, the final features are obtained by a feature selection method based on a random forest model. First, for each decision tree in the random forest, the out-of-bag error (errOOB1) is calculated using the corresponding out-of-bag (OOB) data. Second, noise interference is randomly added to feature X of all samples in the out-of-bag data, and the out-of-bag error (errOOB2) is calculated again. Finally, the importance of feature X is
importance(X) = Σ(errOOB2 - errOOB1) / Ntree,
where the sum runs over all decision trees and Ntree is the total number of decision trees in the random forest. The value of this expression serves as the importance measure of the corresponding feature. The image features with weight values higher than 0.01 are then selected to form the feature subset.
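The selection rule can be sketched as follows, assuming the importance of a feature is the increase in out-of-bag error averaged over all trees, as in the usual random forest formulation. The function names and the toy error values are hypothetical; training the forest and perturbing the features are omitted.

```python
import numpy as np

def oob_importance(err_oob1, err_oob2):
    """Mean increase in out-of-bag error per feature.

    err_oob1: shape (n_trees,), OOB error of each tree on intact data.
    err_oob2: shape (n_trees, n_features), OOB error of each tree after
              noise has been added to each feature in turn.
    """
    err_oob1 = np.asarray(err_oob1, dtype=float)
    err_oob2 = np.asarray(err_oob2, dtype=float)
    return (err_oob2 - err_oob1[:, None]).mean(axis=0)

def select_features(importance, threshold=0.01):
    """Indices of the features whose importance exceeds the threshold."""
    return np.flatnonzero(importance > threshold)

# Toy values for 3 trees and 3 features (made up for illustration).
err1 = [0.10, 0.12, 0.08]
err2 = [[0.10, 0.20, 0.11],
        [0.12, 0.25, 0.12],
        [0.08, 0.15, 0.09]]
imp = oob_importance(err1, err2)
print(select_features(imp))  # [1]
```

In the toy data only the second feature raises the error by more than 0.01 on average when perturbed, so it is the only one kept.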
Step 6: obtain a classification result based on the final features by using a classifier.
The reliability and stability of the model can be assessed by testing the above method on lung nodule data with known benign and malignant labels from the LIDC database.
In the experiment of this embodiment, the DICOM raw data acquired by the CT equipment and the XML files containing the doctors' labeling information are read first, and lung nodule samples are then extracted by a pre-written program. The experiment uses 680 lung nodule samples, of which 544 form the training set and 136 form the test set; the classification results are evaluated by the accuracy on the test set. The test results are shown in Table 1.
Table 1. Classification results for benign and malignant pulmonary nodules
From the above experiments, the following conclusions can be drawn: (1) The pulmonary nodules are segmented based on the gold standard delineated by doctors, so the step of designing an algorithm to segment the pulmonary nodules is omitted and the reliability is higher. (2) 103 traditional imaging features and 512 CNN features are extracted, the two kinds of features are combined, and random-forest-based feature selection removes a large number of irrelevant and redundant features. The combined algorithm makes full use of the advantages of both the traditional imaging features and the high-level abstract features of the convolutional neural network, overcoming the inability of traditional imaging features to fully reflect the information of the lesion area and the poor classification performance of the 3D CNN when the amount of data is insufficient. (3) In this embodiment, the classification method is combined with various classifiers; the best classification results are obtained with the SVM and random forest classifiers, reaching 92.75% and 94.89% respectively. The experimental results show that the classification model of the method can accurately judge whether lung nodules are benign or malignant.
Compared with the classification method based on traditional imaging features alone, the classification method that fuses the convolutional neural network features markedly improves every evaluation index. This shows that the 3D-Inception-ResNet model extracts CNN features that effectively reflect lesion region information, thereby further improving the classification accuracy of benign and malignant pulmonary nodules.
The above functions, if implemented in the form of software functional units and sold or used as a separate product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
Example 2
The present embodiment provides an electronic device comprising one or more processors, a memory, and one or more programs stored in the memory, the one or more programs including instructions for performing the lung nodule classification method combining deep learning and imaging features as described in Example 1.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.