Detailed Description
The following describes embodiments of the present application in detail with reference to the drawings.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The term "and/or" is herein merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship. Further, "a plurality" herein means two or more than two.
In order to enable those skilled in the art to better understand the technical solution of the present invention, the airway classification method provided by the present invention is described in further detail below with reference to the accompanying drawings and the detailed description.
Referring to Fig. 1, Fig. 1 is a flow chart of an embodiment of an airway classification method according to the present invention. In this embodiment, an airway classification method is provided, and the airway classification method includes the following steps.
S1: training the deep learning network.
Referring to Fig. 2, Fig. 2 is a flowchart of an embodiment of step S1 in the airway classification method provided in Fig. 1. Specifically, the deep learning network includes a deep learning feature extraction network model and a deep learning classification network model, each having three-dimensional convolution kernels. Specifically, a feature extraction network model is first obtained by pre-training on a public medical CT image dataset, and then the deep learning feature extraction network model and the deep learning classification network model are obtained by training on an airway image dataset, so that the deep learning feature extraction network model can extract features more accurately, thereby completing the training of the deep learning network. The airway image dataset is an airway scan image dataset.
S11: pre-training the deep learning feature extraction network model.
Referring to Fig. 3, Fig. 3 is a schematic structural diagram of an embodiment of a deep learning network according to the present invention. Specifically, a segmentation network is constructed, the segmentation network including an initial feature extraction network model and a decoder. The initial feature extraction network model comprises an input layer, a first feature extraction layer, a pooling layer, a second feature extraction layer and a third feature extraction layer. In another alternative embodiment, the initial feature extraction network model further comprises a fifth feature extraction layer and a sixth feature extraction layer, which are disposed in sequence after the third feature extraction layer. In a specific embodiment, the initial feature extraction network model is modified based on the ResNet-50 network. The first feature extraction layer has 2 input channels, 64 output channels and a step size s=(2, 2, 2); the pooling layer has 64 input channels, 64 output channels and a step size s=(2, 2, 2); the second feature extraction layer has 64 input channels, 256 output channels and a step size s=(1, 1, 1); the third feature extraction layer has 128 input channels, 512 output channels and a step size s=(1, 2, 2); the fifth feature extraction layer has 256 input channels, 1024 output channels and a step size s=(1, 2, 2); the sixth feature extraction layer has 512 input channels, 2048 output channels and a step size s=(1, 2, 2).
The decoder includes a transposed convolution layer, a seventh feature extraction layer, an eighth feature extraction layer, a ninth feature extraction layer and a first output layer that are sequentially disposed. The transposed convolution layer has 2048 input channels, 256 output channels, a step size s=(2, 8, 8) and a 3×3×3 convolution kernel; the seventh feature extraction layer has 512 input channels, 32 output channels and a step size s=(1, 1, 1); the eighth feature extraction layer has 32 input channels, 32 output channels, a step size s=(1, 1, 1) and a 3×3×3 convolution kernel; the ninth feature extraction layer has 32 input channels, 2 output channels, a step size s=(1, 1, 1) and a 1×1×1 convolution kernel. The initial feature extraction network model is used for extracting features, and the decoder is used for labeling the category of each pixel in the feature map.
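By way of illustration only, the segmentation network described above may be sketched as follows. This is a minimal sketch assuming PyTorch; the single-convolution stage definition stands in for the modified ResNet-50 stages, the bottleneck-width channel counts stated in the text are represented here by the actual tensor channel counts between stages, and the size alignment before the skip connection is an assumption of the sketch rather than part of the claimed structure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def stage(c_in, c_out, stride):
    # Simplified stand-in for one modified 3D ResNet-50 stage: a single
    # 3x3x3 convolution with the stated stride, batch norm and ReLU.
    return nn.Sequential(
        nn.Conv3d(c_in, c_out, 3, stride=stride, padding=1),
        nn.BatchNorm3d(c_out),
        nn.ReLU(inplace=True),
    )

class SegmentationNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Initial feature extraction network model (encoder).
        self.first = stage(2, 64, (2, 2, 2))
        self.pool = nn.MaxPool3d(3, stride=(2, 2, 2), padding=1)
        self.second = stage(64, 256, (1, 1, 1))
        self.third = stage(256, 512, (1, 2, 2))
        self.fifth = stage(512, 1024, (1, 2, 2))
        self.sixth = stage(1024, 2048, (1, 2, 2))
        # Decoder: transposed convolution for up-sampling, then three
        # feature extraction layers ending in a per-pixel 2-class output.
        self.up = nn.ConvTranspose3d(2048, 256, 3, stride=(2, 8, 8),
                                     padding=1, output_padding=(1, 7, 7))
        self.seventh = stage(512, 32, (1, 1, 1))  # 256 up-sampled + 256 skip
        self.eighth = stage(32, 32, (1, 1, 1))
        self.ninth = nn.Conv3d(32, 2, 1)          # 1x1x1 kernel, 2 classes

    def forward(self, x):                         # x: (N, 2, Z, H, W)
        f2 = self.second(self.pool(self.first(x)))
        f6 = self.sixth(self.fifth(self.third(f2)))
        u = self.up(f6)
        # Align sizes to the second feature map before the skip connection
        # (an assumption of this sketch), then concatenate along channels.
        u = F.interpolate(u, size=f2.shape[2:])
        u = torch.cat([u, f2], dim=1)
        return self.ninth(self.eighth(self.seventh(u)))
```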
Referring to Fig. 4, Fig. 4 is a flowchart illustrating an embodiment of step S11 in the airway classification method provided in Fig. 2. The training step for the deep learning feature extraction network model specifically comprises the following steps.
S111: a first training sample set is acquired, the first training sample set comprising a plurality of first image samples from a public medical CT image dataset.
In particular, the public medical CT image dataset may comprise public datasets such as the KiTS, LiTS and MSD datasets. The first image sample comprises scanned image data and a real mask region image corresponding to the scanned image data, wherein the same scanned image data is stacked into two-channel data. In a specific embodiment, the scanned image data may be three-dimensional scanned image data.
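As a minimal sketch of the stacking described above, assuming NumPy arrays and an illustrative scan size:

```python
import numpy as np

# Hypothetical example: a single three-dimensional scan of shape (Z, H, W).
scan = np.random.rand(16, 64, 64).astype(np.float32)

# During pre-training, the same scanned image data is stacked into
# two channels, giving shape (2, Z, H, W) for the network input.
two_channel = np.stack([scan, scan], axis=0)
assert two_channel.shape == (2, 16, 64, 64)
```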
S112: inputting the two-channel data into the initial feature extraction network model to perform feature extraction to obtain a feature map.
Specifically, the two-channel data is input to the input layer, which passes it to the first feature extraction layer; the first feature extraction layer performs feature extraction on the two-channel data and inputs the extracted first feature map to the pooling layer, and the pooling layer performs a pooling operation on the first feature map so as to reduce its size. The pooled first feature map is input into the second feature extraction layer for feature extraction to obtain a second feature map, the second feature map is input into the third feature extraction layer to obtain a third feature map, the third feature map is input into the fifth feature extraction layer to obtain a fifth feature map, and the fifth feature map is input into the sixth feature extraction layer to obtain a sixth feature map, which is the feature map output by the initial feature extraction network model. Because the input is subjected to feature extraction a plurality of times, the feature map extracted by the initial feature extraction network model is more accurate, and the learning capacity of the initial feature extraction network model is improved.
S113: the feature map is input into a decoder to be segmented to obtain a prediction mask region image of the scanned image data.
Specifically, the initial feature extraction network model inputs the sixth feature map into the transposed convolution layer for up-sampling to improve the resolution of the sixth feature map; the processed sixth feature map is connected with the second feature map in the initial feature extraction network model and then input into the seventh feature extraction layer for feature extraction to obtain a seventh feature map. In a specific embodiment, the data to be input into the seventh feature extraction layer is obtained by connecting the second feature map with the sixth feature map after the resolution enhancement. The eighth feature extraction layer performs feature extraction on the seventh feature map to obtain an eighth feature map, and the ninth feature extraction layer performs feature extraction on the eighth feature map to obtain a ninth feature map, thereby realizing the category labeling of each pixel in the feature map, that is, the segmentation; the first output layer outputs the ninth feature map to obtain the prediction mask region image.
S114: a first loss function is constructed from the predicted mask region image and the real mask region image.
Specifically, the error value between the predicted mask region image and the real mask region image is calculated using a cross-entropy loss function. In a specific embodiment, the first loss function is a cross-entropy loss.
S115: iteratively training the initial feature extraction network model using the first loss function to obtain the feature extraction network model.
Specifically, the initial feature extraction network model is iteratively trained through the error value between the predicted mask region image and the real mask region image to obtain the feature extraction network model.
In an alternative embodiment, the loss of the initial feature extraction network model is back-propagated, and the weights of the initial feature extraction network model are corrected according to the loss value fed back by the first loss function. In an alternative embodiment, the parameters in the initial feature extraction network model may also be modified to implement the training of the initial feature extraction network model.
The scanned image data is input into the initial feature extraction network model, and the initial feature extraction network model predicts the mask region image in the scanned image data. When the error value between the predicted mask region image and the real mask region image is smaller than a preset threshold, which can be set as needed, for example, 1% or 5%, the training of the initial feature extraction network model is stopped and the feature extraction network model is obtained.
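A minimal training-loop sketch of steps S112 to S115 follows, assuming PyTorch and the SegmentationNet sketch above; the optimizer, learning rate and stopping threshold are illustrative assumptions, and the real mask region images are assumed to be provided at the decoder's output resolution.

```python
import torch

def pretrain(model, loader, threshold=0.05, lr=1e-4, max_epochs=100):
    """Iteratively train the segmentation network until the loss between
    predicted and real mask region images falls below a preset threshold
    (for example 1% or 5%); all hyper-parameters here are illustrative."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()      # the first loss function
    for epoch in range(max_epochs):
        total = 0.0
        for two_channel, real_mask in loader:
            pred = model(two_channel)          # (N, 2, Z', H', W') class scores
            loss = loss_fn(pred, real_mask)    # real_mask: (N, Z', H', W') labels
            opt.zero_grad()
            loss.backward()                    # back-propagate the loss
            opt.step()                         # correct the model weights
            total += loss.item()
        if total / len(loader) < threshold:    # error below preset threshold
            break
    return model
```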
Obtaining the feature extraction network model through segmentation network training increases the generalization of the network, encourages the feature extraction network model to learn robust high-level features, and improves the accuracy of feature extraction.
S12: training the feature extraction network model and the deep learning classification network model.
Referring to Fig. 5, Fig. 5 is a flowchart illustrating an embodiment of step S12 in the airway classification method provided in Fig. 2. Specifically, an initial classification network model is constructed. The initial classification network model comprises a fourth feature extraction layer, a filling layer, a remodeling layer, a linear layer and a second output layer. The fourth feature extraction layer has 2048 input channels, 32 output channels and a step size s=(1, 1, 1). The fourth feature extraction layer performs feature extraction and size conversion on the feature map; the filling layer supplements the number of the converted feature maps so that the number of airway feature maps input into the linear layer equals the maximum slice number; the linear layer identifies the input feature map to obtain the predicted category of the airway. After feature extraction is performed on the airway scan image data by the feature extraction network model obtained through pre-training, the initial classification network model is trained. The airway scan image data is input into the feature extraction network model, and the parameters of the feature extraction network model are further trained while the initial classification network model is trained, so that the obtained deep learning feature extraction network model can extract features more accurately. Specifically, training the feature extraction network model and the deep learning classification network model comprises the following steps.
S121: a second training sample set is acquired, the second training sample set comprising a plurality of second image samples.
Specifically, the second image sample includes airway scan image data, airway region image data corresponding to the airway scan image data, and a real category of the airway in the airway scan image data, wherein the airway scan image data is connected with the airway region image data. In one embodiment, the airway scan image data is connected with the airway region image to obtain two-channel data. In one embodiment, the airway scan image data is a three-dimensional CT airway scan image, which includes a plurality of CT airway slice images. In an embodiment, referring to Fig. 6, Fig. 6 is a flowchart illustrating an embodiment of step S121 in the airway classification method according to the present invention. Before the airway scan image data is connected with the airway region image data, the airway scan image data also needs to be pre-processed, which comprises the following steps.
S1211: screening the effective data points in the airway scan image data.
Specifically, invalid values in the airway scan image data can be removed, so as to reduce abnormal values in some pixels of the airway scan image caused by factors such as noise and metal. In one embodiment, all pixel data in the CT airway slice images are sorted, and the pixel points whose values fall within the [0.005, 0.995] quantile range are taken as effective data points. The value range of the effective data points can be set as needed.
S1212: calculating the mean and variance of the effective data points and carrying out normalization processing.
Specifically, the mean and variance of the effective data points obtained in the above step are calculated, and the airway scan image data is then normalized using the calculated mean and variance. In another alternative embodiment, to ensure the integrity of the airway scan image data, the invalid data points in the CT airway slice images are replaced by standard normal distribution random numbers, so that the data points are complemented and the data integrity of the CT airway slice images is maintained.
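Steps S1211 and S1212 may be sketched as follows, assuming NumPy; the quantile bounds follow the embodiment, while the epsilon guard on the standard deviation is an added assumption.

```python
import numpy as np

def preprocess(scan, lo=0.005, hi=0.995):
    """Screen effective data points within the [0.005, 0.995] quantile
    range, normalize with their mean and variance, and replace invalid
    points with standard normal random numbers (a sketch of steps
    S1211-S1212; the quantile bounds may be set as needed)."""
    low, high = np.quantile(scan, [lo, hi])
    valid = (scan >= low) & (scan <= high)      # effective data points
    mean, std = scan[valid].mean(), scan[valid].std()
    out = (scan - mean) / (std + 1e-8)          # normalization
    # Keep data integrity: fill invalid points with N(0, 1) samples.
    out[~valid] = np.random.randn((~valid).sum())
    return out
```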
S1213: connecting the normalized airway scan image data with the airway region image data to obtain two-channel data.
Specifically, the normalized airway scan image data obtained in step S1212 is connected to an airway region image corresponding to the airway scan image data to obtain two-channel data.
S1214: carrying out data enhancement processing on the two-channel data.
Specifically, the data enhancement processing includes at least one of a data translation processing, a data rotation processing and a data scaling processing. In a specific embodiment, the two-channel data may be translated within 10% of the original size, rotated within [-5°, 5°], and scaled within [0.8, 1.2] times the original size to increase the image samples.
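A sketch of the data enhancement processing follows, assuming NumPy and SciPy; the parameter ranges follow the embodiment, while applying translation and rotation only in the slice plane is an assumption of the sketch.

```python
import numpy as np
from scipy import ndimage

def augment(two_channel):
    """Randomly translate (within 10% of the size), rotate (within
    [-5 deg, 5 deg]) and scale (within [0.8, 1.2] times) two-channel
    data of shape (2, Z, H, W); the ranges follow the embodiment and
    the implementation details are illustrative."""
    _, z, h, w = two_channel.shape
    # Translation: no shift along the channel or slice axes (assumption).
    shift = [0, 0,
             np.random.uniform(-0.1, 0.1) * h,
             np.random.uniform(-0.1, 0.1) * w]
    out = ndimage.shift(two_channel, shift, order=1, mode='nearest')
    # Rotation in the H-W plane.
    angle = np.random.uniform(-5.0, 5.0)
    out = ndimage.rotate(out, angle, axes=(2, 3), reshape=False,
                         order=1, mode='nearest')
    # Scaling in the H-W plane (zooming changes the array size).
    factor = np.random.uniform(0.8, 1.2)
    return ndimage.zoom(out, (1, 1, factor, factor), order=1)
```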
The size of a common head and neck three-dimensional CT airway scan image is typically Z×H×W, where H and W represent the length and width of the image and Z represents the number of slices. When data of size Z×H×W is input into the feature extraction network model, the feature map output by the feature extraction network model has a size of Z'×H'×W', where Z'=Z/4, H'=H/32 and W'=W/32. For three-dimensional CT airway scan images of different patients, H and W are the same, while the number of slices Z differs according to the size of the patient and the position of the patient in the scanner, so the feature maps obtained by the feature extraction network model differ for different three-dimensional CT airway scan images. In an alternative embodiment, the maximum number of slices of the airway scan image data in the second training sample set is determined. That is, the maximum number Zmax of CT airway slice images over all three-dimensional CT airway scan images in the second training sample set is determined, and Zmax is taken as the input size of the linear layer, since the input size of the linear layer must be a fixed value. Each three-dimensional CT airway scan image comprises a plurality of CT airway slice images.
S122: inputting the second image sample into the feature extraction network model to perform feature extraction to obtain an airway feature map.
Specifically, the two-channel data obtained by connecting the airway scan image data in the second image sample with the airway region image is input into the feature extraction network model, and the feature extraction network model performs feature extraction on the two-channel data to obtain an airway feature map. The feature extraction network model is obtained by the pre-training in step S11, and its feature extraction on the two-channel data in the second image sample is the same as step S112, which is not repeated here.
S123: inputting the airway feature map into the initial classification network model for classification to obtain the predicted category of the airway.
Specifically, the airway feature map is input into the fourth feature extraction layer for feature extraction to obtain a fourth feature map, and the filling layer pads the size of the fourth feature map to the size set by the linear layer, which avoids an increase in computational complexity and a loss of important spatial information. In a specific embodiment, the airway feature map is converted into a fixed size, that is, the airway feature map is adaptively padded to a predefined size Z'max, where Z'max=Zmax/4 and Zmax is the maximum number of CT airway slice images over the three-dimensional CT airway scan images in the second training sample set. The remodeling layer tiles the padded fourth feature map into a one-dimensional tensor, the linear layer linearly combines the tiled one-dimensional tensor and then identifies it, and the second output layer outputs the identified predicted category of the airway. In a specific embodiment, the predicted categories of airways include difficult airway and non-difficult airway.
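By way of illustration, the initial classification network model may be sketched as follows, assuming PyTorch; the concrete values of Zmax and of the feature map's spatial size (corresponding to 512×512 slices) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassificationHead(nn.Module):
    """Sketch of the initial classification network model. z_max is the
    maximum slice count over the second training sample set, and h_feat,
    w_feat are the fixed spatial sizes of the airway feature map; the
    default values are assumptions for illustration."""
    def __init__(self, z_max=64, h_feat=16, w_feat=16, num_classes=2):
        super().__init__()
        self.z_pad = z_max // 4            # Z'max = Zmax / 4
        # Fourth feature extraction layer: 2048 -> 32 channels, stride (1,1,1).
        self.fourth = nn.Conv3d(2048, 32, 3, stride=1, padding=1)
        # Linear layer with a fixed input size, then the second output layer.
        self.linear = nn.Linear(32 * self.z_pad * h_feat * w_feat, num_classes)

    def forward(self, feat):               # feat: (N, 2048, Z', H', W')
        x = self.fourth(feat)
        # Filling layer: pad the slice dimension up to Z'max so that
        # inputs with different slice counts share one linear layer.
        pad = self.z_pad - x.shape[2]
        x = F.pad(x, (0, 0, 0, 0, 0, pad))
        x = torch.flatten(x, 1)            # remodeling layer: tile to a 1-D tensor
        return self.linear(x)              # predicted category scores
```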
S124: a second loss function is constructed from the real category of the airway and the predicted category of the airway.
Specifically, a cross-entropy loss function is used to calculate the error value between the real category of the airway and the predicted category of the airway. In a specific embodiment, the second loss function is a cross-entropy loss.
S125: iteratively training the feature extraction network model and the initial classification network model using the second loss function to obtain the deep learning feature extraction network model and the deep learning classification network model.
Specifically, the feature extraction network model and the initial classification network model are iteratively trained through the error value between the real category of the airway and the predicted category of the airway to obtain the deep learning feature extraction network model and the deep learning classification network model.
In an alternative embodiment, the loss of the initial classification network model is back-propagated, and the weights of the feature extraction network model and the initial classification network model are corrected according to the loss value fed back by the second loss function. In an alternative embodiment, the parameters in the feature extraction network model and the initial classification network model may also be modified to implement the training of the two models.
The second image sample is input into the feature extraction network model for feature extraction, and the category of the airway in the airway scan image data is predicted by the initial classification network model. When the error value between the real category of the airway and the predicted category of the airway is smaller than a preset threshold, which can be set as needed, for example, 1% or 5%, the training of the parameters in the feature extraction network model and the initial classification network model is stopped, and the deep learning feature extraction network model and the deep learning classification network model are obtained.
The initial feature extraction network model and the initial classification network model are trained in sequence through the above steps to obtain the deep learning feature extraction network model and the deep learning classification network model.
S2: airway image data is acquired.
Specifically, the airway scan image data to be detected is acquired. The airway scan image data comprises CT airway slice images.
S3: the airway image data is segmented to obtain airway region image data.
Referring to Fig. 7, Fig. 7 is a flowchart illustrating an embodiment of step S3 in the airway classification method provided in Fig. 1. The airway region image data can be obtained from the airway scan image data, which specifically comprises the following steps.
S31: setting a threshold value, and performing binarization processing on the airway image data by using threshold segmentation to extract an initial airway region image.
Specifically, the CT value in a CT airway slice image is determined by the linear absorption coefficients of various tissues for X-rays. A large amount of air exists in the airway, and the CT value of air is about -1000 HU. In one embodiment, the threshold is set to -200 HU: regions with CT values less than or equal to -200 HU are set to 1, and regions with CT values greater than -200 HU are set to 0. The CT airway slice image is then segmented by the -200 HU threshold to obtain an initial airway region image. At this point, the initial airway region image is a black-and-white image, in which white is the airway region and black is the non-airway region.
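A one-line sketch of the threshold segmentation of step S31, assuming NumPy and a volume in Hounsfield units:

```python
import numpy as np

def threshold_airway(ct_volume, threshold=-200):
    """Binarize a CT volume in HU: regions with CT value <= -200 HU are
    set to 1 (candidate airway, i.e. air), the rest to 0 (a sketch of
    step S31; the threshold may be set as needed)."""
    return (ct_volume <= threshold).astype(np.uint8)
```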
S32: the initial airway region image is preprocessed, including hole filling and small region removal.
Specifically, due to the influence of noise, a large number of black holes and small white regions exist in the initial airway region image obtained after threshold segmentation. In a specific embodiment, the initial airway region image obtained after threshold segmentation is processed by an opening operation, so that the contour of the airway in the initial airway region image becomes smooth, narrow necks are broken, and thin protrusions are eliminated; the initial airway region image is then processed by a closing operation, which bridges narrow discontinuities and elongated gaps, eliminates small holes, and fills cracks in the contour lines.
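A sketch of the opening and closing operations of step S32, assuming SciPy; the default structuring element and iteration count are illustrative assumptions:

```python
import numpy as np
from scipy import ndimage

def clean_mask(binary_mask, iterations=1):
    """Smooth the initial airway region image: an opening operation breaks
    narrow necks and removes thin protrusions and small white regions, and
    a closing operation bridges narrow breaks and fills small black holes
    (a sketch of step S32 with illustrative structuring parameters)."""
    opened = ndimage.binary_opening(binary_mask, iterations=iterations)
    closed = ndimage.binary_closing(opened, iterations=iterations)
    return closed.astype(np.uint8)
```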
S33: an airway template is constructed and seed points are determined in the initial airway region image from the airway template.
Specifically, an airway template is constructed; the airway template is matched with the connected domain of each voxel point in the initial airway region image; and the voxel point whose connected domain has the highest degree of coincidence with the airway template is determined as the seed point.
The shape of the airway varies from patient to patient and from location to location in the same patient, but the airway is generally an elongated cylindrical structure. In one embodiment, the airway template is constructed by simulating a cylinder with a diameter of 32 and a height of 16 within a cuboid whose length, width and height are 64, 64 and 16, respectively. The unit of the airway template is the voxel, and the cross section of the airway template is circular.
In one embodiment, the three-dimensional CT airway scan image generally extends from the nostrils to above the lungs. Since the sinuses and other parts also contain a large amount of air, the slices containing the nostrils have varied shapes after thresholding, while the thresholded slices above the lungs are relatively tubular with a circular cross section. The initial airway region image contains a plurality of voxel points; the connected domain of diameter 32 and height 16 centered on each voxel point is matched with the airway template, the similarity between each connected domain and the airway template is evaluated, and the voxel point corresponding to the connected domain with the highest similarity to the airway template is determined as the seed point.
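The template construction and seed-point matching may be sketched as follows, assuming NumPy and SciPy; scoring every voxel by FFT-based correlation is an efficiency substitution for the per-voxel connected-domain matching described above, and is an assumption of this sketch.

```python
import numpy as np
from scipy import signal

def cylinder_template(diameter=32, height=16, cube=64):
    """Airway template: a cylinder of the given diameter and height,
    centered in a cuboid of cube x cube x height voxels (step S33)."""
    r = diameter / 2.0
    _, yy, xx = np.mgrid[0:height, 0:cube, 0:cube]
    circle = (yy - cube / 2.0) ** 2 + (xx - cube / 2.0) ** 2 <= r ** 2
    return circle.astype(np.float32)       # shape (height, cube, cube)

def find_seed(mask, template):
    """Score every voxel by the overlap between its neighbourhood and the
    airway template, and return the voxel with the highest coincidence as
    the seed point (FFT-based correlation; the symmetric template makes
    convolution and correlation equivalent)."""
    score = signal.fftconvolve(mask.astype(np.float32), template, mode='same')
    return np.unravel_index(np.argmax(score), mask.shape)
```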
S34: determining whether the connected domain of the seed point is the airway region image according to the proportion of the voxel points in the connected domain of the seed point among the voxel points of the airway image data.
Specifically, the voxel points in the connected domain where the seed point is located are counted, and the proportion of these voxel points among the voxel points of the three-dimensional CT airway scan image is calculated. The proportion of the airway in the three-dimensional CT airway scan image is generally in the range of 0.2% to 0.8%. In a specific embodiment, it is judged whether the proportion of the voxel points in the connected domain of the seed point among the voxel points of the three-dimensional CT airway scan image is in the range of 0.2% to 0.8%. If the proportion is within this range, the connected domain of the seed point is determined to be the airway region image; otherwise, the connected domain of the seed point is determined not to be the airway region image.
In a specific embodiment, if it is determined that the connected domain of the seed point is not the airway region image, the diameter of the airway template needs to be adjusted: a new airway template is regenerated after adding 2 to the diameter of the original airway template, and steps S33 and S34 are executed again until the connected domain where the airway is located is found. The connected domain is then displayed, so that the airway region image data is obtained. This method can accurately segment the airway from the three-dimensional CT airway scan image.
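A sketch of the ratio check and template adjustment of step S34 follows, reusing the cylinder_template and find_seed sketches above; the upper bound on the template diameter is an added assumption.

```python
import numpy as np
from scipy import ndimage

def extract_airway(mask, lo=0.002, hi=0.008, diameter=32, max_diameter=64):
    """Accept the connected domain of the seed point as the airway region
    only if its voxel ratio lies in [0.2%, 0.8%] of the whole volume;
    otherwise enlarge the template diameter by 2 and repeat steps S33-S34
    (a sketch; max_diameter is an illustrative stopping bound)."""
    total = mask.size
    while diameter <= max_diameter:
        seed = find_seed(mask, cylinder_template(diameter=diameter))
        labels, _ = ndimage.label(mask)            # label connected domains
        region = labels == labels[seed]
        if labels[seed] != 0 and lo <= region.sum() / total <= hi:
            return region.astype(np.uint8)         # airway region image
        diameter += 2                              # adjust template diameter
    return None
```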
S4: the airway image data and the airway region image data are connected to obtain two-channel data.
Specifically, in order to enable the deep learning network to focus on the airway region in the airway scan image while retaining its learning ability for the other parts of the airway scan image, thereby improving the extraction accuracy of the airway feature map, the CT airway slice image data is connected with the airway region image data to obtain two-channel data. No enhancement processing of the two-channel data is required in this step.
S5: the two-channel data is input to a pre-trained deep learning network to confirm whether the airway in the airway image is a difficult airway.
Referring to Fig. 8, Fig. 8 is a flowchart illustrating an embodiment of step S5 in the airway classification method provided in Fig. 1. Specifically, the two-channel data obtained in step S4 is input into the deep learning feature extraction network model to obtain an airway feature map, and the airway feature map is identified by the deep learning classification network model to determine whether the airway in the airway scan image is a difficult airway. Determining whether the airway in the airway scan image is a difficult airway from the two-channel data specifically comprises the following steps.
S51: inputting the two-channel data into a pre-trained deep learning feature extraction network model for feature extraction to obtain an airway feature map.
Specifically, the two-channel data is input into the deep learning feature extraction network model for feature extraction. In a specific embodiment, the step of feature extraction on the two-channel data by the deep learning feature extraction network model is the same as the specific operation of step S112 and is not repeated here, except that the parameters of the first feature extraction layer, the pooling layer, the second feature extraction layer, the third feature extraction layer, the fifth feature extraction layer and the sixth feature extraction layer in the deep learning feature extraction network model differ from those in step S112, since they have been updated by training.
S52: the airway feature map is input into a pre-trained deep learning classification network model to identify whether the airway is a difficult airway.
Specifically, the airway feature map obtained in step S51 is input into the deep learning classification network model to identify the airway feature map, so as to classify the airway in the airway feature map. In a specific embodiment, the step of identifying and classifying the airway feature map by the deep learning classification network model is the same as the specific operation in step S123 and is not repeated here, except that the parameters of the fourth feature extraction layer, the filling layer, the remodeling layer and the linear layer in the deep learning classification network model differ from those in the initial classification network model. The input size of the linear layer in this embodiment is determined according to the maximum number of CT airway slice images in the three-dimensional CT airway scan images, so that the deep learning network can focus on the airway region in the CT airway scan image and is applicable to three-dimensional CT airway scan images with different slice numbers, which improves the classification accuracy.
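Putting steps S2 to S5 together, an end-to-end inference sketch might look as follows, reusing the sketches above; whether the normalization of step S1212 is also applied at inference time, and the mapping of class index 1 to the difficult airway, are assumptions of this sketch.

```python
import numpy as np
import torch

def classify_airway(ct_volume, encoder, head):
    """End-to-end inference sketch for steps S2-S5. preprocess,
    threshold_airway, clean_mask and extract_airway are the sketches
    above; encoder and head stand for the trained deep learning feature
    extraction and classification network models."""
    vol = preprocess(ct_volume.astype(np.float32))                    # step S2
    region = extract_airway(clean_mask(threshold_airway(ct_volume)))  # step S3
    if region is None:
        raise ValueError("no airway region found")   # sketch-level guard
    two_channel = np.stack([vol, region.astype(np.float32)])          # step S4
    x = torch.from_numpy(two_channel).unsqueeze(0)   # (1, 2, Z, H, W)
    with torch.no_grad():                                             # step S5
        logits = head(encoder(x))
    is_difficult = logits.argmax(1).item() == 1      # assumed index mapping
    return "difficult airway" if is_difficult else "non-difficult airway"
```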
According to the airway classification method provided by this embodiment, airway scan image data is acquired, airway region image data is obtained from the airway scan image data, the airway scan image data and the airway region image data are connected to generate two-channel data, and the two-channel data is input into the deep learning network model to determine whether the airway in the airway scan image is a difficult airway. By inputting the two-channel data into the deep learning network, the deep learning network can effectively utilize the airway information in the airway region image data to identify whether the airway is a difficult airway, while retaining its ability to learn from the parts of the airway scan image data other than the airway information, thereby improving the accuracy of airway classification. The method is simple, and the classification result is reliable.
Referring to Fig. 9, Fig. 9 is a schematic block diagram of an embodiment of an airway classification device according to the present invention. As shown in Fig. 9, the airway classification device 80 in this embodiment includes a processor 81, a memory 82, and a computer program stored in the memory 82 and executable on the processor 81; when the processor 81 executes the computer program, the steps of the above airway classification method are implemented, which are not repeated herein.
Referring to Fig. 10, Fig. 10 is a schematic block diagram of an embodiment of a computer readable storage medium provided by the present invention.
An embodiment of the present application further provides a computer readable storage medium 90. The computer readable storage medium 90 stores a computer program 901, the computer program 901 includes program instructions, and a processor executes the program instructions to implement the steps in any airway classification method provided in the embodiments of the present application.
The computer readable storage medium 90 may be an internal storage unit of the computer device of the foregoing embodiments, for example, a hard disk or a memory of the computer device. The computer readable storage medium 90 may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device.
The foregoing describes only embodiments of the present invention, and the patent protection scope of the present invention is therefore not limited thereto; all equivalent structures or equivalent process changes made using the content of the present specification and the accompanying drawings, or direct or indirect applications in other related technical fields, are likewise included in the patent protection scope of the present invention.