Disclosure of Invention
The invention aims to provide an intelligent system that automatically segments and grades thyroid nodules based on deep learning. The technical solution is as follows:
an intelligent system for automatic thyroid nodule segmentation and classification, comprising:
a thyroid ultrasound image database, constructed by collecting ultrasound images of thyroid nodule patients together with the corresponding pathological results, delineating the peripheral contour of each thyroid nodule, and converting the delineation into a binary image;
a thyroid ultrasound image preprocessing module, used to preprocess the thyroid ultrasound image database and to divide it into a training set and a test set;
a thyroid nodule feature extraction module, which extracts features from the preprocessed thyroid ultrasound images using a U-Net segmentation model with ResNet34 as its backbone; the model comprises a down-sampling module, a feature fusion module, and an up-sampling module, as follows:
the down-sampling module extracts features from the input image as follows: first, 64 convolution kernels of size 7 × 7 with stride 2 extract shallow features over a large receptive field, changing the input from 512 × 512 × 1 to 256 × 256 × 64; the ResNet34 network is then divided into four Blocks, with pooling operations as the boundaries. Each Block consists of a series of convolutions whose activation functions are all ReLU; from top to bottom the Blocks contain 6, 8, 12, and 6 convolutional layers, each layer uses 3 × 3 kernels, and the number of kernels (also called the channel number) per Block is 128, 256, 896, and 1920 respectively. Between adjacent Blocks, a max pooling layer with a 3 × 3 kernel reduces the feature map to one quarter of its area, halving both length and width, so that the feature map sizes progress through 512 × 512 × 1, 256 × 256 × 64, 128 × 128 × 128, 64 × 64 × 256, 32 × 32 × 896, and 16 × 16 × 1920;
the up-sampling module restores the image and classifies it at the pixel level using the feature maps obtained by down-sampling, as follows: after the 16 × 16 × 1920 feature map is obtained, it is up-sampled four times by transposed convolution; each of the four transposed convolutions uses a 3 × 3 kernel with ReLU activation, and after each one the feature map quadruples in area, going from 16 × 16 through 32 × 32, 64 × 64, and 128 × 128 to 256 × 256. A final 3 × 3 transposed convolution with a Sigmoid activation then restores the image to 512 × 512 × 2, where the 2 represents the class of each pixel;
the feature fusion module is realized by skip connections, connecting each feature map obtained during down-sampling to the feature map of the same size in the up-sampling process;
a thyroid nodule segmentation module, which performs semantic segmentation on the feature-extracted image to produce a thyroid nodule segmentation map, as follows: the network is trained on the training set with a loss function that fuses the Dice loss and the cross-entropy loss; the resulting network model is evaluated on the test set, which also guides the saving of model parameters, and training stops when the model's loss on the test set is minimal and stable. The segmentation performance of the trained network is then verified on a validation set, and the up-sampled semantic segmentation result is output.
Furthermore, in establishing the thyroid ultrasound image database, ultrasound images of thyroid nodule patients and the corresponding pathological results are collected; the thyroid nodules are delineated with the annotation tool labelme and the results are stored in JSON format; finally, the label images are converted into binary images to form the corresponding masks.
Further, the thyroid ultrasound image preprocessing module is configured to normalize the original thyroid nodule ultrasound images and the corresponding masks to a size of 512 × 512, remove noise with bilateral filtering, normalize pixel values to the range 0-1, and divide the data into training, validation, and test sets.
Further, the intelligent system further comprises a thyroid nodule malignancy risk rating module implemented with a multitask deep convolutional neural network (MCNN), the MCNN comprising:
CONV_1: a convolutional layer with 75 kernels of size 7 × 7, ReLU, 3 × 3 max pooling, and 5 × 5 normalization.
CONV_2: a convolutional layer with 200 kernels of size 5 × 5, ReLU, 3 × 3 max pooling, and 5 × 5 normalization.
CONV_3: a convolutional layer with 300 kernels of size 3 × 3, ReLU, 5 × 5 max pooling, and 5 × 5 normalization.
FC: a fully connected part, consisting of a first fully connected layer with output 500, ReLU, 50% dropout, a second fully connected layer with output 2, and a sigmoid activation function.
Detailed Description
The present invention will be further described with reference to the accompanying drawings.
1. Thyroid ultrasound image database
In this example, the data originated from the General Hospital of Tianjin Medical University and comprised 4172 ultrasound images. During collection, three professionals manually delineated and annotated the thyroid nodules with the annotation tool labelme, and the results were stored in JSON format. The label images were finally converted into binary images to form the corresponding masks.
The specific procedure is as follows: ultrasound images of past cases with puncture pathology results were collected from the hospital database, and the nodule portion of each case was delineated (the region of interest, ROI, was drawn) by three highly experienced imaging physicians. The annotation software is labelme; the ultrasound images are in JPG format and the annotation files in JSON format. Each JSON file contains the sets of coordinate points of the thyroid nodule boundaries drawn by the three physicians. To ensure the accuracy of the labels, the intersection of the nodule lesion areas drawn by the three physicians is taken to obtain the final, accurate nodule label file.
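As an illustration of this step, the following minimal Python sketch rasterizes the three physicians' polygons from one JSON file and takes their intersection to produce the binary mask. It assumes each physician's delineation appears as one entry under labelme's standard "shapes" key; the field names and the use of OpenCV are illustrative choices, not prescribed by the invention.

import json
import numpy as np
import cv2

def polygons_to_mask(json_path, height, width):
    """Rasterize the three physicians' labelme polygons and intersect them."""
    with open(json_path, "r", encoding="utf-8") as f:
        data = json.load(f)

    masks = []
    for shape in data["shapes"]:                      # assumed: one entry per physician
        pts = np.array(shape["points"], dtype=np.int32)
        m = np.zeros((height, width), dtype=np.uint8)
        cv2.fillPoly(m, [pts], 1)                     # fill the delineated ROI
        masks.append(m)

    # Intersection of the three delineations gives the final nodule label.
    final = masks[0]
    for m in masks[1:]:
        final = np.logical_and(final, m).astype(np.uint8)
    return final                                      # binary mask (0 = background, 1 = nodule)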
2. Thyroid ultrasound image preprocessing module
The thyroid ultrasound images in the data set vary in size, so to keep the network input uniform they are normalized to 512 × 512. To ensure that image features are not distorted by this normalization, each image is first padded to 1024 × 1024 with the blank area set to pixel value 0, and then scaled down to 512 × 512. The images are denoised with bilateral filtering, a filter that removes noise while preserving boundary information. The value of each pixel after bilateral filtering is:
g(i,j) = Σ_(k,l) f(k,l) ω(i,j,k,l) / Σ_(k,l) ω(i,j,k,l)
wherein g(i, j) is the filtered value of the current pixel, f(k, l) are the values of the pixels around the selected pixel, and ω(i, j, k, l) is the weight of the selected pixel, determined jointly by a pixel-position term d(i, j, k, l) and a pixel-value term r(i, j, k, l), calculated as follows:
ω(i,j,k,l)=d(i,j,k,l)×r(i,j,k,l)
After bilateral filtering, the white noise of the ultrasound image is reduced while the contrast at boundaries is enhanced, making the boundaries more distinct and easier to identify. The pixel values are then normalized from (0-255) to (0-1) for subsequent training of the network model. The label data (masks) of the data set are obtained by intersecting the annotations of the three physicians, which makes them more accurate and reliable. The data are divided into training, validation, and test sets at a ratio of 6:2:2, giving 2503, 834, and 834 images respectively.
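The preprocessing described above can be sketched as follows. The bilateral filter parameters (d, sigmaColor, sigmaSpace) and the use of scikit-learn for the 6:2:2 split are illustrative choices, not values given in the text.

import numpy as np
import cv2
from sklearn.model_selection import train_test_split

def preprocess(image, mask):
    """Pad to 1024x1024, resize to 512x512, denoise, normalize to [0, 1]."""
    h, w = image.shape[:2]
    canvas = np.zeros((1024, 1024), dtype=image.dtype)       # blank area pixel value 0
    canvas[:h, :w] = image                                   # the text states originals fit in 1024x1024
    mask_canvas = np.zeros((1024, 1024), dtype=mask.dtype)
    mask_canvas[:h, :w] = mask

    img = cv2.resize(canvas, (512, 512), interpolation=cv2.INTER_LINEAR)
    msk = cv2.resize(mask_canvas, (512, 512), interpolation=cv2.INTER_NEAREST)

    # Bilateral filter: removes speckle noise while keeping nodule boundaries.
    # These filter parameters are illustrative, not taken from the source.
    img = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)

    img = img.astype(np.float32) / 255.0                     # normalize pixel values to 0-1
    return img, msk

# 6:2:2 split of the 4172 image/mask pairs (indices only, for illustration).
indices = np.arange(4172)
train_idx, rest = train_test_split(indices, train_size=0.6, random_state=0)
val_idx, test_idx = train_test_split(rest, test_size=0.5, random_state=0)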
3. Thyroid nodule segmentation module
Automatic segmentation of thyroid nodules is achieved with a convolutional neural network. The model input is 512 × 512 × 1 and the output is 512 × 512 × 2; a threshold of 0.5 is then applied so that each pixel is assigned 0 or 1, where 0 represents the background and 1 represents the thyroid nodule region.
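A minimal sketch of this thresholding step; the assumption that the second output channel carries the per-pixel nodule probability (and the first the background) is illustrative, since the text only states that the output has two channels.

import torch

def output_to_mask(probs: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    # probs: network output after Sigmoid, shape (512, 512, 2) or (B, 2, 512, 512).
    nodule_prob = probs[..., 1] if probs.shape[-1] == 2 else probs[:, 1]
    return (nodule_prob > threshold).long()   # 1 = thyroid nodule, 0 = background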
Referring to FIG. 2, a convolutional layer with a 7 × 7 kernel, 64 channels, and stride 2 first processes the input image; its activation function is ReLU, defined as:
ReLU(x) = max(0, x)
The resulting feature map size is 256 × 256 × 64.
The feature map then undergoes multi-scale feature extraction with ResNet34. The first Block contains 6 convolutional layers with 3 × 3 kernels, 128 channels, and stride 1, all with ReLU activations. Every two convolutional layers form a group, and the input of each group is connected directly to its output, realizing residual learning in the convolutional network; the mapping to be learned becomes the residual
F(x) = H(x) − x, so that the group outputs H(x) = F(x) + x.
This allows the gradient to propagate effectively back to the shallow convolutional layers and avoids the degradation problem that arises when the network is deep. After the first Block, a max pooling layer with a 3 × 3 kernel and stride 2 reduces the feature map from 256 × 256 × 128 to 128 × 128 × 128. The second Block follows, consisting of 8 convolutional layers with 3 × 3 kernels, 256 channels, and stride 1, again grouped in pairs for residual learning; its output feature map is 128 × 128 × 256, which the next max pooling layer converts to 64 × 64 × 256. The third Block consists of 12 convolutional layers in 6 groups with 896 channels, and its output feature map is 32 × 32 × 896. After a max pooling layer converts the feature map to 16 × 16 × 896, the fourth Block follows, consisting of 6 convolutional layers in 3 groups with 1920 channels, and the resulting feature map is 16 × 16 × 1920.
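A PyTorch sketch of this down-sampling path is given below. The channel counts follow the text rather than the standard torchvision ResNet34; the 1 × 1 projection on the shortcut when the channel count changes, the padding values, and the omission of batch normalization are assumptions made only to keep the sketch runnable. The intermediate sizes stated in the text are not fully consistent around the Block 3/Block 4 boundary, so the sketch follows the overall 512 → 16 progression and the 32 × 32 × 896 input to Block 4 mentioned in the feature-fusion discussion.

import torch
import torch.nn as nn

class ResidualPair(nn.Module):
    """Two 3x3 convolutions with a shortcut; the 1x1 projection used when the
    channel count changes is an assumption, since the text does not say how
    the shortcut handles the channel jump."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.proj = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + self.proj(x))           # H(x) = F(x) + x

def make_block(in_ch, out_ch, n_layers):
    """One Block: n_layers 3x3 convolutions arranged in residual pairs."""
    pairs = [ResidualPair(in_ch, out_ch)]
    pairs += [ResidualPair(out_ch, out_ch) for _ in range(n_layers // 2 - 1)]
    return nn.Sequential(*pairs)

class Encoder(nn.Module):
    """Down-sampling path following the layer counts and channels in the text."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(                      # 512x512x1 -> 256x256x64
            nn.Conv2d(1, 64, 7, stride=2, padding=3), nn.ReLU(inplace=True))
        self.pool = nn.MaxPool2d(3, stride=2, padding=1)
        self.block1 = make_block(64, 128, 6)            # -> 256x256x128
        self.block2 = make_block(128, 256, 8)           # -> 128x128x256
        self.block3 = make_block(256, 896, 12)          # -> 64x64x896
        self.block4 = make_block(896, 1920, 6)          # -> 32x32x1920

    def forward(self, x):
        f1 = self.block1(self.stem(x))                  # 256x256x128
        f2 = self.block2(self.pool(f1))                 # 128x128x256
        f3 = self.block3(self.pool(f2))                 # 64x64x896
        f4 = self.pool(self.block4(self.pool(f3)))      # 16x16x1920
        return f1, f2, f3, f4                           # kept for the skip connections

encoder = Encoder()
feats = encoder(torch.randn(1, 1, 512, 512))
print([f.shape for f in feats])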
After the down-sampling process built from ResNet34, the resulting feature maps must undergo feature fusion and up-sampling. The feature fusions are represented by the skip connections in FIG. 2. During up-sampling, in addition to the new feature map produced by up-sampling the deeper feature map, the intermediate feature map obtained during down-sampling is connected to it: the connection first concatenates the two maps along the channel dimension and then applies a convolutional layer with the specified number of channels to obtain the final new feature map. Taking the 32 × 32 × 896 feature map on the right side of FIG. 2 as an example, it is produced by up-sampling the 16 × 16 × 1920 feature map on the left side to 32 × 32 × 896 and fusing it with the 32 × 32 × 896 feature map at the input of Block 4 through the skip connection. Multi-scale feature fusion is used because nodule segmentation is a small-object segmentation problem that depends heavily on both shallow features and deep semantic features; the skip connections reduce the loss of shallow feature information caused by the max pooling operations.
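One feature-fusion step can be sketched as follows: the deeper decoder map is up-sampled by transposed convolution (described further below), concatenated channel-wise with the same-resolution encoder map, and passed through a convolution that restores the target channel count. The module and parameter names are illustrative.

import torch
import torch.nn as nn

class FusionStep(nn.Module):
    """One skip connection: concatenate along channels, then convolve."""
    def __init__(self, up_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(up_ch, out_ch, 3, stride=2,
                                     padding=1, output_padding=1)   # doubles H and W
        self.relu = nn.ReLU(inplace=True)
        self.fuse = nn.Conv2d(out_ch + skip_ch, out_ch, 3, padding=1)

    def forward(self, decoder_feat, encoder_feat):
        x = self.relu(self.up(decoder_feat))            # e.g. 16x16x1920 -> 32x32x896
        x = torch.cat([x, encoder_feat], dim=1)         # channel-wise concatenation
        return self.relu(self.fuse(x))                  # back to the target channel count

# Example from the text: the 16x16x1920 map fused with the 32x32x896 encoder map.
step = FusionStep(up_ch=1920, skip_ch=896, out_ch=896)
fused = step(torch.randn(1, 1920, 16, 16), torch.randn(1, 896, 32, 32))
print(fused.shape)   # torch.Size([1, 896, 32, 32])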
The up-sampling is performed with transposed convolution, which can be viewed as reversing the convolution operation. Writing the convolution as:
y = xω^T
where x is the input and ω is the weight expansion matrix of the convolution kernel, the transposed convolution operation is:
x = y(ω^T)^(-1)
the advantage of using transposed convolution for upsampling is that the upsampling process is learning trained from the data with greater accuracy.
The loss function used in training combines the Dice loss with the cross-entropy loss:
L_Dice = 1 − 2|Y_t ∩ Y_g| / (|Y_t| + |Y_g|)
L_CE = −(1/(N×M)) Σ_(i,j) [ y_(i,j) log p_(i,j) + (1 − y_(i,j)) log(1 − p_(i,j)) ]
wherein Y_t is the predicted segmentation result, Y_g is the label data, N and M give the size of the image, p is the probability that each pixel is predicted as the nodule region, and y is the label data.
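A sketch of the fused loss in PyTorch; summing the two terms with equal weight is an assumption, since the text only states that the Dice and cross-entropy losses are combined.

import torch
import torch.nn.functional as F

def dice_ce_loss(pred_prob: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """Dice loss plus binary cross entropy on per-pixel nodule probabilities.

    pred_prob: (B, H, W) probability that each pixel is nodule (after Sigmoid).
    target:    (B, H, W) binary label mask.
    """
    intersection = (pred_prob * target).sum(dim=(1, 2))
    dice = 1.0 - (2.0 * intersection + eps) / (
        pred_prob.sum(dim=(1, 2)) + target.sum(dim=(1, 2)) + eps)
    ce = F.binary_cross_entropy(pred_prob, target.float(), reduction="mean")
    return dice.mean() + ce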
The network is optimized with the SGD algorithm and a batch size of 32; after 273 rounds of training the loss value stabilizes. The Dice coefficient on the test set is 0.876. A model test example is shown in FIG. 4.
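A minimal training-loop sketch with SGD and batch size 32, saving the parameters whenever the loss on the held-out set improves; the learning rate, momentum, epoch cap, and checkpoint file name are illustrative assumptions, and any loss such as the fused Dice/cross-entropy loss above can be passed in as the criterion.

import torch
from torch.utils.data import DataLoader

def train(model, criterion, train_set, test_set, device="cuda", max_epochs=300):
    train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
    test_loader = DataLoader(test_set, batch_size=32)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    best_loss = float("inf")

    for epoch in range(max_epochs):
        model.train()
        for images, masks in train_loader:
            images, masks = images.to(device), masks.to(device)
            loss = criterion(model(images), masks)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Evaluate on the held-out set and keep the best parameters.
        model.eval()
        with torch.no_grad():
            test_loss = sum(criterion(model(x.to(device)), y.to(device)).item()
                            for x, y in test_loader) / len(test_loader)
        if test_loss < best_loss:
            best_loss = test_loss
            torch.save(model.state_dict(), "best_model.pt")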
4. Thyroid nodule malignancy risk rating module
The thyroid nodules are rated for malignancy risk with the multitask network MCNN (multi-task CNN), whose structure is shown in the figure. CONV_1 is a convolutional layer with 75 kernels of size 7 × 7, followed by ReLU, 3 × 3 max pooling, and 5 × 5 normalization. CONV_2 is a convolutional layer with 200 kernels of size 5 × 5, followed by ReLU, 3 × 3 max pooling, and 5 × 5 normalization. CONV_3 is a convolutional layer with 300 kernels of size 3 × 3, followed by ReLU, 5 × 5 max pooling, and 5 × 5 normalization. These convolutions extract features at different scales; fully connected layers then integrate the five features (solid composition, hypoechogenicity or marked hypoechogenicity, microlobulated or irregular margins, microcalcifications, and aspect ratio ≥ 1) and softmax outputs the corresponding probabilities, completing the malignancy risk rating of the thyroid nodule. With 0.5 as the probability threshold, the accuracies for the five features are 0.9722, 0.983, 0.956, 0.958, and 0.984 respectively.
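A PyTorch sketch of the MCNN following the layer sizes above. Interpreting the "5 × 5 normalization" as local response normalization, using pooling stride 2, adding adaptive pooling before the fully connected part, and giving each of the five malignancy features its own two-way head are assumptions made to keep the sketch concrete; the text mentions both sigmoid and softmax on the two-way output, and the sketch uses softmax.

import torch
import torch.nn as nn

class MCNN(nn.Module):
    """Multitask CNN sketch for the five malignancy features."""
    def __init__(self, in_channels=1, n_tasks=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 75, 7), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2), nn.LocalResponseNorm(5),
            nn.Conv2d(75, 200, 5), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2), nn.LocalResponseNorm(5),
            nn.Conv2d(200, 300, 3), nn.ReLU(inplace=True),
            nn.MaxPool2d(5, stride=2), nn.LocalResponseNorm(5),
            nn.AdaptiveAvgPool2d(1),          # collapse spatial dims (an assumption)
        )
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(300, 500), nn.ReLU(inplace=True),
                          nn.Dropout(0.5), nn.Linear(500, 2))
            for _ in range(n_tasks))

    def forward(self, x):
        feat = self.features(x).flatten(1)                      # (B, 300)
        # One present/absent probability per malignancy feature.
        return torch.stack([torch.softmax(h(feat), dim=1)[:, 1]
                            for h in self.heads], dim=1)        # (B, 5)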
5. Output scores and corresponding malignancy features
Using this network, the five features of the Kwak TI-RADS grading report are identified automatically, and after the nodule has been risk-rated, a diagnostic report for thyroid malignancy risk grading is formed from the semantic segmentation result and the classification probability results. An example report is shown in FIG. 5.
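The report-forming step could look like the sketch below, which turns the five feature probabilities into a list of present features and a Kwak TI-RADS category. The mapping from the number of suspicious features to a category (0 → 3, 1 → 4a, 2 → 4b, 3 or 4 → 4c, 5 → 5) follows the commonly cited Kwak scheme and is an assumption here, as the text does not spell out the scoring table.

FEATURES = ["solid composition",
            "hypoechogenicity or marked hypoechogenicity",
            "microlobulated or irregular margins",
            "microcalcifications",
            "taller than wide (aspect ratio >= 1)"]

def kwak_report(probs, threshold=0.5):
    """Build a simple grading report from the five feature probabilities."""
    present = [name for name, p in zip(FEATURES, probs) if p > threshold]
    n = len(present)
    category = {0: "3", 1: "4a", 2: "4b", 3: "4c", 4: "4c", 5: "5"}[n]  # assumed mapping
    return {"suspicious_features": present,
            "feature_count": n,
            "kwak_tirads_category": category}

print(kwak_report([0.9, 0.2, 0.7, 0.1, 0.8]))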
In summary, the technical scheme of the invention is as follows:
Thyroid ultrasound image database: used to collect ultrasound images of thyroid nodule patients and the corresponding pathological results, delineate the peripheral contours of the thyroid nodules, and convert the delineations into binary images.
Thyroid ultrasound image preprocessing module: used to preprocess the data in the thyroid ultrasound image database by normalizing the image size, removing noise with bilateral filtering, and normalizing pixel values to between 0 and 1, and to divide the data into a training set and a test set.
Thyroid nodule automatic segmentation module: used to perform feature extraction and semantic segmentation on the preprocessed image with a U-Net segmentation model whose backbone is ResNet34; within this U-Net, ResNet34 forms the down-sampling (feature extraction) path, while the up-sampling path built from transposed convolutions restores the image, producing the thyroid nodule segmentation map.
Thyroid nodule malignancy risk rating module: performs graded discrimination of thyroid nodules on the resulting semantic segmentation map based on the multitask deep convolutional neural network and outputs the classification probability results.
The U-Net segmentation model with ResNet34 as its backbone comprises a down-sampling module, a feature fusion module, and an up-sampling module, specifically:
The down-sampling module extracts features from the input image as follows:
First, 64 convolution kernels of size 7 × 7 with stride 2 extract shallow features over a large receptive field, changing the input from 512 × 512 × 1 to 256 × 256 × 64. The ResNet34 network is then divided into four Blocks, with pooling operations as the boundaries. Each Block consists of a series of convolutions whose activation functions are all ReLU; from top to bottom the Blocks contain 6, 8, 12, and 6 convolutional layers, each layer uses 3 × 3 kernels, and the number of kernels (also called the channel number) per Block is 128, 256, 896, and 1920 respectively. Between adjacent Blocks, a max pooling layer with a 3 × 3 kernel reduces the feature map to one quarter of its area, halving both length and width, so that the feature map sizes progress through 512 × 512 × 1, 256 × 256 × 64, 128 × 128 × 128, 64 × 64 × 256, 32 × 32 × 896, and 16 × 16 × 1920. The up-sampling module restores the image and classifies it at the pixel level using the feature maps obtained by down-sampling, as follows: after the 16 × 16 × 1920 feature map is obtained, it is up-sampled four times by transposed convolution; each transposed convolution uses a 3 × 3 kernel with ReLU activation, and after each one the feature map quadruples in area, going from 16 × 16 through 32 × 32, 64 × 64, and 128 × 128 to 256 × 256. A final 3 × 3 transposed convolution with a Sigmoid activation then restores the image to 512 × 512 × 2, where the 2 represents the class of each pixel.
Feature fusion module:
since the lesion area of the thyroid nodule is small relative to the entire image, each pooling operation results in a loss of a portion of the characteristic information. Therefore, in order to improve the accuracy of the model for segmenting the lesion region, feature fusion is an essential part. The feature fusion part of the model is realized by skip connection (skip connection). And performing jump connection on each feature map obtained by down sampling and the feature map with the same size in the up sampling process. On one hand, the loss of characteristics caused by the pooling operation is compensated; on the other hand, the method is beneficial to the propagation of the gradient, shortens the distance from the gradient to the upper layer convolution in the backward propagation process, and is beneficial to the training of the upper layer convolution.
The network is trained on the training set with a loss function that fuses the Dice loss and the cross-entropy loss; its performance is checked on the test set, which also guides the saving of model parameters (training stops when the model's loss on the test set is minimal and stable). The segmentation performance of the trained network is then verified on a validation set, and the up-sampled semantic segmentation result is output.
Thyroid nodule malignancy risk rating is performed with the multitask network MCNN (multi-task CNN), in which:
CONV_1 is a convolutional layer with 75 kernels of size 7 × 7, followed by ReLU, 3 × 3 max pooling, and 5 × 5 normalization.
CONV_2 is a convolutional layer with 200 kernels of size 5 × 5, followed by ReLU, 3 × 3 max pooling, and 5 × 5 normalization.
CONV_3 is a convolutional layer with 300 kernels of size 3 × 3, followed by ReLU, 5 × 5 max pooling, and 5 × 5 normalization.
FC is the fully connected part, consisting of a first fully connected layer (output 500), ReLU, 50% dropout, a second fully connected layer (output 2), and a sigmoid activation function.
The Kwak TI-RADS rating report grades nodules by five thyroid nodule malignancy features: 1. solid composition; 2. hypoechogenicity or marked hypoechogenicity; 3. microlobulated or irregular margins; 4. microcalcifications; 5. aspect ratio ≥ 1. The network automatically identifies these five features of the Kwak TI-RADS rating report and rates the risk of the nodule.
Output of the score and the corresponding malignancy features, including: outputting the Kwak score of the thyroid nodule and the corresponding malignancy features according to the semantic segmentation result obtained by up-sampling and the classification probability results from the multitask deep convolutional neural network.