Disclosure of Invention
The invention aims to provide an intelligent system that automatically segments and grades thyroid nodules based on deep learning. The technical solution is as follows:
an intelligent system for automatic thyroid nodule segmentation and classification, comprising:
a thyroid ultrasound image database, constructed by collecting ultrasound images of thyroid nodule patients together with the corresponding pathological results, delineating the peripheral contour of each thyroid nodule, and converting the delineation into a binary image;
a thyroid ultrasound image preprocessing module, used to preprocess the thyroid ultrasound image database and to divide it into a training set and a test set;
a thyroid nodule feature extraction module, which extracts features from the preprocessed thyroid ultrasound images using a U-Net segmentation model with ResNet34 as its backbone; the model comprises a down-sampling module, a feature fusion module, and an up-sampling module, as follows:
the down-sampling module extracts features from the input image as follows: first, 64 convolution kernels of size 7 × 7 with stride 2 extract shallow features over a large receptive field, changing the input from 512 × 512 × 1 to 256 × 256 × 64; the ResNet34 network is then divided into four Blocks, with pooling operations as the boundaries. Each Block consists of a series of convolutions whose activation functions are all ReLU; from top to bottom the Blocks contain 6, 8, 12, and 6 convolutional layers, each layer uses 3 × 3 kernels, and the number of kernels (also called the channel number) per Block is 128, 256, 896, and 1920 respectively. Between adjacent Blocks, a max pooling layer with a 3 × 3 kernel reduces the feature map to one quarter of its area, halving both length and width, so that the feature map sizes progress through 512 × 512 × 1, 256 × 256 × 64, 128 × 128 × 128, 64 × 64 × 256, 32 × 32 × 896, and 16 × 16 × 1920;
the up-sampling module restores the image and classifies it at the pixel level using the feature maps obtained by down-sampling, as follows: after the 16 × 16 × 1920 feature map is obtained, it is up-sampled four times by transposed convolution; each of the four transposed convolutions uses a 3 × 3 kernel with ReLU activation, and after each one the feature map quadruples in area, going from 16 × 16 through 32 × 32, 64 × 64, and 128 × 128 to 256 × 256. A final 3 × 3 transposed convolution with a Sigmoid activation then restores the image to 512 × 512 × 2, where the 2 represents the class of each pixel;
the feature fusion module is realized by skip connections, connecting each feature map obtained during down-sampling to the feature map of the same size in the up-sampling process;
a thyroid nodule segmentation module, which performs semantic segmentation on the feature-extracted image to produce a thyroid nodule segmentation map, as follows: the network is trained on the training set with a loss function that fuses the Dice loss and the cross-entropy loss; the resulting network model is evaluated on the test set, which also guides the saving of model parameters, and training stops when the model's loss on the test set is minimal and stable. The segmentation performance of the trained network is then verified on a validation set, and the up-sampled semantic segmentation result is output.
Furthermore, in establishing the thyroid ultrasound image database, ultrasound images of thyroid nodule patients and the corresponding pathological results are collected; the thyroid nodules are delineated with the annotation tool labelme and the results are stored in JSON format; finally, the label images are converted into binary images to form the corresponding masks.
Further, the thyroid ultrasound image preprocessing module is configured to normalize the original thyroid nodule ultrasound images and the corresponding masks to a size of 512 × 512, remove noise with bilateral filtering, normalize pixel values to the range 0-1, and divide the data into training, validation, and test sets.
Further, the intelligent system further comprises a thyroid nodule malignancy risk rating module implemented with a multitask deep convolutional neural network (MCNN), the MCNN comprising:
CONV_1: a convolutional layer with 75 kernels of size 7 × 7, ReLU, 3 × 3 max pooling, and 5 × 5 normalization.
CONV_2: a convolutional layer with 200 kernels of size 5 × 5, ReLU, 3 × 3 max pooling, and 5 × 5 normalization.
CONV_3: a convolutional layer with 300 kernels of size 3 × 3, ReLU, 5 × 5 max pooling, and 5 × 5 normalization.
FC: a fully connected part, consisting of a first fully connected layer with output 500, ReLU, 50% dropout, a second fully connected layer with output 2, and a sigmoid activation function.
Detailed Description
The present invention will be further described with reference to the accompanying drawings.
1. Thyroid ultrasound image database
In this example, the data originated from the General Hospital of Tianjin Medical University and comprised 4172 ultrasound images. During collection, three professionals manually delineated and annotated the thyroid nodules with the annotation tool labelme, and the results were stored in JSON format. The label images were finally converted into binary images to form the corresponding masks.
The specific procedure is as follows: ultrasound images of past cases with puncture pathology results were collected from the hospital database, and the nodule portion of each case was delineated (the region of interest, ROI, was drawn) by three highly experienced imaging physicians. The annotation software is labelme; the ultrasound images are in JPG format and the annotation files in JSON format. Each JSON file contains the sets of coordinate points of the thyroid nodule boundaries drawn by the three physicians. To ensure the accuracy of the labels, the intersection of the nodule lesion areas drawn by the three physicians is taken to obtain the final, accurate nodule label file.
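As an illustration of this step, the following minimal Python sketch rasterizes the three physicians' polygons from one JSON file and takes their intersection to produce the binary mask. It assumes each physician's delineation appears as one entry under labelme's standard "shapes" key; the field names and the use of OpenCV are illustrative choices, not prescribed by the invention.

import json
import numpy as np
import cv2

def polygons_to_mask(json_path, height, width):
    """Rasterize the three physicians' labelme polygons and intersect them."""
    with open(json_path, "r", encoding="utf-8") as f:
        data = json.load(f)

    masks = []
    for shape in data["shapes"]:                      # assumed: one entry per physician
        pts = np.array(shape["points"], dtype=np.int32)
        m = np.zeros((height, width), dtype=np.uint8)
        cv2.fillPoly(m, [pts], 1)                     # fill the delineated ROI
        masks.append(m)

    # Intersection of the three delineations gives the final nodule label.
    final = masks[0]
    for m in masks[1:]:
        final = np.logical_and(final, m).astype(np.uint8)
    return final                                      # binary mask (0 = background, 1 = nodule)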
2. Thyroid ultrasound image preprocessing module
The thyroid ultrasound images in the data set vary in size, so to keep the network input uniform they are normalized to 512 × 512. To ensure that image features are not distorted by this normalization, each image is first padded to 1024 × 1024 with the blank area set to pixel value 0, and then scaled down to 512 × 512. The images are denoised with bilateral filtering, a filter that removes noise while preserving boundary information. The value of each pixel after bilateral filtering is:
g(i,j) = Σ_(k,l) f(k,l) ω(i,j,k,l) / Σ_(k,l) ω(i,j,k,l)
wherein g(i, j) is the filtered value of the current pixel, f(k, l) are the values of the pixels around the selected pixel, and ω(i, j, k, l) is the weight of the selected pixel, determined jointly by a pixel-position term d(i, j, k, l) and a pixel-value term r(i, j, k, l), calculated as follows:
ω(i,j,k,l)=d(i,j,k,l)×r(i,j,k,l)
After bilateral filtering, the white noise of the ultrasound image is reduced while the contrast at boundaries is enhanced, making the boundaries more distinct and easier to identify. The pixel values are then normalized from (0-255) to (0-1) for subsequent training of the network model. The label data (masks) of the data set are obtained by intersecting the annotations of the three physicians, which makes them more accurate and reliable. The data are divided into training, validation, and test sets at a ratio of 6:2:2, giving 2503, 834, and 834 images respectively.
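The preprocessing described above can be sketched as follows. The bilateral filter parameters (d, sigmaColor, sigmaSpace) and the use of scikit-learn for the 6:2:2 split are illustrative choices, not values given in the text.

import numpy as np
import cv2
from sklearn.model_selection import train_test_split

def preprocess(image, mask):
    """Pad to 1024x1024, resize to 512x512, denoise, normalize to [0, 1]."""
    h, w = image.shape[:2]
    canvas = np.zeros((1024, 1024), dtype=image.dtype)       # blank area pixel value 0
    canvas[:h, :w] = image                                   # the text states originals fit in 1024x1024
    mask_canvas = np.zeros((1024, 1024), dtype=mask.dtype)
    mask_canvas[:h, :w] = mask

    img = cv2.resize(canvas, (512, 512), interpolation=cv2.INTER_LINEAR)
    msk = cv2.resize(mask_canvas, (512, 512), interpolation=cv2.INTER_NEAREST)

    # Bilateral filter: removes speckle noise while keeping nodule boundaries.
    # These filter parameters are illustrative, not taken from the source.
    img = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)

    img = img.astype(np.float32) / 255.0                     # normalize pixel values to 0-1
    return img, msk

# 6:2:2 split of the 4172 image/mask pairs (indices only, for illustration).
indices = np.arange(4172)
train_idx, rest = train_test_split(indices, train_size=0.6, random_state=0)
val_idx, test_idx = train_test_split(rest, test_size=0.5, random_state=0)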
3. Thyroid nodule segmentation module
Automatic segmentation of thyroid nodules is achieved with a convolutional neural network. The model input is 512 × 512 × 1 and the output is 512 × 512 × 2; a threshold of 0.5 is then applied so that each pixel is assigned 0 or 1, where 0 represents the background and 1 represents the thyroid nodule region.
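A minimal sketch of this thresholding step; the assumption that the second output channel carries the per-pixel nodule probability (and the first the background) is illustrative, since the text only states that the output has two channels.

import torch

def output_to_mask(probs: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    # probs: network output after Sigmoid, shape (512, 512, 2) or (B, 2, 512, 512).
    nodule_prob = probs[..., 1] if probs.shape[-1] == 2 else probs[:, 1]
    return (nodule_prob > threshold).long()   # 1 = thyroid nodule, 0 = background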
Referring to FIG. 2, a convolutional layer with a 7 × 7 kernel, 64 channels, and stride 2 first processes the input image; its activation function is ReLU, defined as:
ReLU(x) = max(0, x)
The resulting feature map size is 256 × 256 × 64.
The feature map then undergoes multi-scale feature extraction with ResNet34. The first Block contains 6 convolutional layers with 3 × 3 kernels, 128 channels, and stride 1, all with ReLU activations. Every two convolutional layers form a group, and the input of each group is connected directly to its output, realizing residual learning in the convolutional network; the mapping to be learned becomes the residual
F(x) = H(x) − x, so that the group outputs H(x) = F(x) + x.
This allows the gradient to propagate effectively back to the shallow convolutional layers and avoids the degradation problem that arises when the network is deep. After the first Block, a max pooling layer with a 3 × 3 kernel and stride 2 reduces the feature map from 256 × 256 × 128 to 128 × 128 × 128. The second Block follows, consisting of 8 convolutional layers with 3 × 3 kernels, 256 channels, and stride 1, again grouped in pairs for residual learning; its output feature map is 128 × 128 × 256, which the next max pooling layer converts to 64 × 64 × 256. The third Block consists of 12 convolutional layers in 6 groups with 896 channels, and its output feature map is 32 × 32 × 896. After a max pooling layer converts the feature map to 16 × 16 × 896, the fourth Block follows, consisting of 6 convolutional layers in 3 groups with 1920 channels, and the resulting feature map is 16 × 16 × 1920.
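A PyTorch sketch of this down-sampling path is given below. The channel counts follow the text rather than the standard torchvision ResNet34; the 1 × 1 projection on the shortcut when the channel count changes, the padding values, and the omission of batch normalization are assumptions made only to keep the sketch runnable. The intermediate sizes stated in the text are not fully consistent around the Block 3/Block 4 boundary, so the sketch follows the overall 512 → 16 progression and the 32 × 32 × 896 input to Block 4 mentioned in the feature-fusion discussion.

import torch
import torch.nn as nn

class ResidualPair(nn.Module):
    """Two 3x3 convolutions with a shortcut; the 1x1 projection used when the
    channel count changes is an assumption, since the text does not say how
    the shortcut handles the channel jump."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.proj = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + self.proj(x))           # H(x) = F(x) + x

def make_block(in_ch, out_ch, n_layers):
    """One Block: n_layers 3x3 convolutions arranged in residual pairs."""
    pairs = [ResidualPair(in_ch, out_ch)]
    pairs += [ResidualPair(out_ch, out_ch) for _ in range(n_layers // 2 - 1)]
    return nn.Sequential(*pairs)

class Encoder(nn.Module):
    """Down-sampling path following the layer counts and channels in the text."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(                      # 512x512x1 -> 256x256x64
            nn.Conv2d(1, 64, 7, stride=2, padding=3), nn.ReLU(inplace=True))
        self.pool = nn.MaxPool2d(3, stride=2, padding=1)
        self.block1 = make_block(64, 128, 6)            # -> 256x256x128
        self.block2 = make_block(128, 256, 8)           # -> 128x128x256
        self.block3 = make_block(256, 896, 12)          # -> 64x64x896
        self.block4 = make_block(896, 1920, 6)          # -> 32x32x1920

    def forward(self, x):
        f1 = self.block1(self.stem(x))                  # 256x256x128
        f2 = self.block2(self.pool(f1))                 # 128x128x256
        f3 = self.block3(self.pool(f2))                 # 64x64x896
        f4 = self.pool(self.block4(self.pool(f3)))      # 16x16x1920
        return f1, f2, f3, f4                           # kept for the skip connections

encoder = Encoder()
feats = encoder(torch.randn(1, 1, 512, 512))
print([f.shape for f in feats])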
After the down-sampling process built from ResNet34, the resulting feature maps must undergo feature fusion and up-sampling. The feature fusions are represented by the skip connections in FIG. 2. During up-sampling, in addition to the new feature map produced by up-sampling the deeper feature map, the intermediate feature map obtained during down-sampling is connected to it: the connection first concatenates the two maps along the channel dimension and then applies a convolutional layer with the specified number of channels to obtain the final new feature map. Taking the 32 × 32 × 896 feature map on the right side of FIG. 2 as an example, it is produced by up-sampling the 16 × 16 × 1920 feature map on the left side to 32 × 32 × 896 and fusing it with the 32 × 32 × 896 feature map at the input of Block 4 through the skip connection. Multi-scale feature fusion is used because nodule segmentation is a small-object segmentation problem that depends heavily on both shallow features and deep semantic features; the skip connections reduce the loss of shallow feature information caused by the max pooling operations.
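One feature-fusion step can be sketched as follows: the deeper decoder map is up-sampled by transposed convolution (described further below), concatenated channel-wise with the same-resolution encoder map, and passed through a convolution that restores the target channel count. The module and parameter names are illustrative.

import torch
import torch.nn as nn

class FusionStep(nn.Module):
    """One skip connection: concatenate along channels, then convolve."""
    def __init__(self, up_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(up_ch, out_ch, 3, stride=2,
                                     padding=1, output_padding=1)   # doubles H and W
        self.relu = nn.ReLU(inplace=True)
        self.fuse = nn.Conv2d(out_ch + skip_ch, out_ch, 3, padding=1)

    def forward(self, decoder_feat, encoder_feat):
        x = self.relu(self.up(decoder_feat))            # e.g. 16x16x1920 -> 32x32x896
        x = torch.cat([x, encoder_feat], dim=1)         # channel-wise concatenation
        return self.relu(self.fuse(x))                  # back to the target channel count

# Example from the text: the 16x16x1920 map fused with the 32x32x896 encoder map.
step = FusionStep(up_ch=1920, skip_ch=896, out_ch=896)
fused = step(torch.randn(1, 1920, 16, 16), torch.randn(1, 896, 32, 32))
print(fused.shape)   # torch.Size([1, 896, 32, 32])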
The up-sampling is performed with transposed convolution, which can be viewed as reversing the convolution operation. Writing the convolution as:
y = xω^T
where x is the input and ω is the weight expansion matrix of the convolution kernel, the transposed convolution operation is:
x = y(ω^T)^(-1)
the advantage of using transposed convolution for upsampling is that the upsampling process is learning trained from the data with greater accuracy.
The loss function used in training combines the Dice loss with the cross-entropy loss:
L_Dice = 1 − 2|Y_t ∩ Y_g| / (|Y_t| + |Y_g|)
L_CE = −(1/(N×M)) Σ_(i,j) [ y_(i,j) log p_(i,j) + (1 − y_(i,j)) log(1 − p_(i,j)) ]
wherein Y_t is the predicted segmentation result, Y_g is the label data, N and M give the size of the image, p is the probability that each pixel is predicted as the nodule region, and y is the label data.
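A sketch of the fused loss in PyTorch; summing the two terms with equal weight is an assumption, since the text only states that the Dice and cross-entropy losses are combined.

import torch
import torch.nn.functional as F

def dice_ce_loss(pred_prob: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """Dice loss plus binary cross entropy on per-pixel nodule probabilities.

    pred_prob: (B, H, W) probability that each pixel is nodule (after Sigmoid).
    target:    (B, H, W) binary label mask.
    """
    intersection = (pred_prob * target).sum(dim=(1, 2))
    dice = 1.0 - (2.0 * intersection + eps) / (
        pred_prob.sum(dim=(1, 2)) + target.sum(dim=(1, 2)) + eps)
    ce = F.binary_cross_entropy(pred_prob, target.float(), reduction="mean")
    return dice.mean() + ce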
The network is optimized with the SGD algorithm and a batch size of 32; after 273 rounds of training the loss value stabilizes. The Dice coefficient on the test set is 0.876. A model test example is shown in FIG. 4.
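A minimal training-loop sketch with SGD and batch size 32, saving the parameters whenever the loss on the held-out set improves; the learning rate, momentum, epoch cap, and checkpoint file name are illustrative assumptions, and any loss such as the fused Dice/cross-entropy loss above can be passed in as the criterion.

import torch
from torch.utils.data import DataLoader

def train(model, criterion, train_set, test_set, device="cuda", max_epochs=300):
    train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
    test_loader = DataLoader(test_set, batch_size=32)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    best_loss = float("inf")

    for epoch in range(max_epochs):
        model.train()
        for images, masks in train_loader:
            images, masks = images.to(device), masks.to(device)
            loss = criterion(model(images), masks)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Evaluate on the held-out set and keep the best parameters.
        model.eval()
        with torch.no_grad():
            test_loss = sum(criterion(model(x.to(device)), y.to(device)).item()
                            for x, y in test_loader) / len(test_loader)
        if test_loss < best_loss:
            best_loss = test_loss
            torch.save(model.state_dict(), "best_model.pt")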
4. Thyroid nodule malignancy risk rating module
The thyroid nodules are rated for malignancy risk with the multitask network MCNN (multi-task CNN), whose structure is shown in the figure. CONV_1 is a convolutional layer with 75 kernels of size 7 × 7, followed by ReLU, 3 × 3 max pooling, and 5 × 5 normalization. CONV_2 is a convolutional layer with 200 kernels of size 5 × 5, followed by ReLU, 3 × 3 max pooling, and 5 × 5 normalization. CONV_3 is a convolutional layer with 300 kernels of size 3 × 3, followed by ReLU, 5 × 5 max pooling, and 5 × 5 normalization. These convolutions extract features at different scales; fully connected layers then integrate the five features (solid composition, hypoechogenicity or marked hypoechogenicity, microlobulated or irregular margins, microcalcifications, and aspect ratio ≥ 1) and softmax outputs the corresponding probabilities, completing the malignancy risk rating of the thyroid nodule. With 0.5 as the probability threshold, the accuracies for the five features are 0.9722, 0.983, 0.956, 0.958, and 0.984 respectively.
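A PyTorch sketch of the MCNN following the layer sizes above. Interpreting the "5 × 5 normalization" as local response normalization, using pooling stride 2, adding adaptive pooling before the fully connected part, and giving each of the five malignancy features its own two-way head are assumptions made to keep the sketch concrete; the text mentions both sigmoid and softmax on the two-way output, and the sketch uses softmax.

import torch
import torch.nn as nn

class MCNN(nn.Module):
    """Multitask CNN sketch for the five malignancy features."""
    def __init__(self, in_channels=1, n_tasks=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 75, 7), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2), nn.LocalResponseNorm(5),
            nn.Conv2d(75, 200, 5), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2), nn.LocalResponseNorm(5),
            nn.Conv2d(200, 300, 3), nn.ReLU(inplace=True),
            nn.MaxPool2d(5, stride=2), nn.LocalResponseNorm(5),
            nn.AdaptiveAvgPool2d(1),          # collapse spatial dims (an assumption)
        )
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(300, 500), nn.ReLU(inplace=True),
                          nn.Dropout(0.5), nn.Linear(500, 2))
            for _ in range(n_tasks))

    def forward(self, x):
        feat = self.features(x).flatten(1)                      # (B, 300)
        # One present/absent probability per malignancy feature.
        return torch.stack([torch.softmax(h(feat), dim=1)[:, 1]
                            for h in self.heads], dim=1)        # (B, 5)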
5. Output scores and corresponding malignancy features
Using this network, the five features of the Kwak TI-RADS grading report are identified automatically, and after the nodule has been risk-rated, a diagnostic report for thyroid malignancy risk grading is formed from the semantic segmentation result and the classification probability results. An example report is shown in FIG. 5.
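The report-forming step could look like the sketch below, which turns the five feature probabilities into a list of present features and a Kwak TI-RADS category. The mapping from the number of suspicious features to a category (0 → 3, 1 → 4a, 2 → 4b, 3 or 4 → 4c, 5 → 5) follows the commonly cited Kwak scheme and is an assumption here, as the text does not spell out the scoring table.

FEATURES = ["solid composition",
            "hypoechogenicity or marked hypoechogenicity",
            "microlobulated or irregular margins",
            "microcalcifications",
            "taller than wide (aspect ratio >= 1)"]

def kwak_report(probs, threshold=0.5):
    """Build a simple grading report from the five feature probabilities."""
    present = [name for name, p in zip(FEATURES, probs) if p > threshold]
    n = len(present)
    category = {0: "3", 1: "4a", 2: "4b", 3: "4c", 4: "4c", 5: "5"}[n]  # assumed mapping
    return {"suspicious_features": present,
            "feature_count": n,
            "kwak_tirads_category": category}

print(kwak_report([0.9, 0.2, 0.7, 0.1, 0.8]))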
In summary, the technical scheme of the invention is as follows:
Thyroid ultrasound image database: used to collect ultrasound images of thyroid nodule patients and the corresponding pathological results, delineate the peripheral contours of the thyroid nodules, and convert the delineations into binary images.
Thyroid ultrasound image preprocessing module: used to preprocess the data in the thyroid ultrasound image database by normalizing the image size, removing noise with bilateral filtering, and normalizing pixel values to between 0 and 1, and to divide the data into a training set and a test set.
Thyroid nodule automatic segmentation module: used to perform feature extraction and semantic segmentation on the preprocessed image with a U-Net segmentation model whose backbone is ResNet34; within this U-Net, ResNet34 forms the down-sampling (feature extraction) path, while the up-sampling path built from transposed convolutions restores the image, producing the thyroid nodule segmentation map.
Thyroid nodule malignancy risk rating module: performs graded discrimination of thyroid nodules on the resulting semantic segmentation map based on the multitask deep convolutional neural network and outputs the classification probability results.
The U-Net segmentation model with ResNet34 as its backbone comprises a down-sampling module, a feature fusion module, and an up-sampling module, specifically:
The down-sampling module extracts features from the input image as follows:
First, 64 convolution kernels of size 7 × 7 with stride 2 extract shallow features over a large receptive field, changing the input from 512 × 512 × 1 to 256 × 256 × 64. The ResNet34 network is then divided into four Blocks, with pooling operations as the boundaries. Each Block consists of a series of convolutions whose activation functions are all ReLU; from top to bottom the Blocks contain 6, 8, 12, and 6 convolutional layers, each layer uses 3 × 3 kernels, and the number of kernels (also called the channel number) per Block is 128, 256, 896, and 1920 respectively. Between adjacent Blocks, a max pooling layer with a 3 × 3 kernel reduces the feature map to one quarter of its area, halving both length and width, so that the feature map sizes progress through 512 × 512 × 1, 256 × 256 × 64, 128 × 128 × 128, 64 × 64 × 256, 32 × 32 × 896, and 16 × 16 × 1920. The up-sampling module restores the image and classifies it at the pixel level using the feature maps obtained by down-sampling, as follows: after the 16 × 16 × 1920 feature map is obtained, it is up-sampled four times by transposed convolution; each transposed convolution uses a 3 × 3 kernel with ReLU activation, and after each one the feature map quadruples in area, going from 16 × 16 through 32 × 32, 64 × 64, and 128 × 128 to 256 × 256. A final 3 × 3 transposed convolution with a Sigmoid activation then restores the image to 512 × 512 × 2, where the 2 represents the class of each pixel.
Feature fusion module:
since the lesion area of the thyroid nodule is small relative to the entire image, each pooling operation results in a loss of a portion of the characteristic information. Therefore, in order to improve the accuracy of the model for segmenting the lesion region, feature fusion is an essential part. The feature fusion part of the model is realized by skip connection (skip connection). And performing jump connection on each feature map obtained by down sampling and the feature map with the same size in the up sampling process. On one hand, the loss of characteristics caused by the pooling operation is compensated; on the other hand, the method is beneficial to the propagation of the gradient, shortens the distance from the gradient to the upper layer convolution in the backward propagation process, and is beneficial to the training of the upper layer convolution.
The network is trained on the training set with a loss function that fuses the Dice loss and the cross-entropy loss; its performance is checked on the test set, which also guides the saving of model parameters (training stops when the model's loss on the test set is minimal and stable). The segmentation performance of the trained network is then verified on a validation set, and the up-sampled semantic segmentation result is output.
Thyroid nodule malignancy risk rating is performed with the multitask network MCNN (multi-task CNN), in which:
CONV_1 is a convolutional layer with 75 kernels of size 7 × 7, followed by ReLU, 3 × 3 max pooling, and 5 × 5 normalization.
CONV_2 is a convolutional layer with 200 kernels of size 5 × 5, followed by ReLU, 3 × 3 max pooling, and 5 × 5 normalization.
CONV_3 is a convolutional layer with 300 kernels of size 3 × 3, followed by ReLU, 5 × 5 max pooling, and 5 × 5 normalization.
FC is the fully connected part, consisting of a first fully connected layer (output 500), ReLU, 50% dropout, a second fully connected layer (output 2), and a sigmoid activation function.
The Kwak TI-RADS rating report grades nodules by five thyroid nodule malignancy features: 1. solid composition; 2. hypoechogenicity or marked hypoechogenicity; 3. microlobulated or irregular margins; 4. microcalcifications; 5. aspect ratio ≥ 1. The network automatically identifies these five features of the Kwak TI-RADS rating report and rates the risk of the nodule.
Output of the score and the corresponding malignancy features, including: outputting the Kwak score of the thyroid nodule and the corresponding malignancy features according to the semantic segmentation result obtained by up-sampling and the classification probability results from the multitask deep convolutional neural network.