



Technical Field
The present invention relates to the technical field of deep-learning medical image segmentation, and in particular to an improved lung CT image segmentation algorithm based on PSPNet.
Background Art
Traditional medical image segmentation methods include threshold segmentation, region growing, edge detection and other image segmentation algorithms. However, these algorithms place extremely high demands on the samples: they segment manually from simple image cues such as grayscale and contrast, and perform poorly on images of complex scenes. With the continuous development and application of deep learning in many fields, convolutional neural networks have gradually been applied to image processing. Convolutional neural networks perform well in image recognition and feature extraction and have greatly improved the precision and accuracy of traditional algorithms in medical image segmentation. Among them, PSPNet is an improved convolutional neural network that uses a pyramid pooling module and a pyramid scene parsing network to aggregate context information from different regions, improving its ability to capture global context. MobileNet is a family of lightweight neural networks targeted at mobile and embedded devices; after several years of development it has proved effective at improving feature extraction accuracy while reducing running time, so introducing it into PSPNet yields good segmentation results.
Summary of the Invention
To improve the segmentation ability of existing neural networks in medical image segmentation, the present invention provides an improved lung CT image segmentation algorithm based on PSPNet, which improves segmentation performance on medical images while reducing prediction time.
To achieve the above purpose, the improved lung CT image segmentation algorithm based on PSPNet provided by the present invention comprises the following steps:
Step 1: build a lung CT image dataset and, using a dataset preprocessing script, divide it into training and test sets in different proportions;
Step 2: input the preprocessed lung CT image samples into MobileNetV3, a lightweight deep neural network, for feature extraction; after four rounds of downsampling a global feature map (Feature Map) is obtained;
Step 3: divide the extracted feature layer into regions of different sizes: the incoming feature layer is partitioned into 6×6, 3×3, 2×2 and 1×1 regions, and average pooling is applied within each region to obtain local feature layers;
Step 4: adjust the local feature layers of different dimensions with 1×1 convolutions and upsample them to obtain four feature layers of the same dimensions as the global feature layer from Step 2, then stack the global and local feature layers;
Step 5: fuse the feature layer obtained in Step 4 with a 3×3 convolution, adjust the channels with a 1×1 convolution to 2 classes, output the prediction result, and finally upsample by resizing so that the final output layer has the same width and height as the input image.
Preferably, the improved lung CT image segmentation algorithm based on PSPNet of the present invention is trained with MobileNetV3 pre-trained weights, so that the prediction remains unaffected on CT images with complex backgrounds and a relatively good accuracy can still be achieved.
Preferably, in Step 1, existing public medical imaging datasets may be used, or one may cooperate with medical institutions and have professional doctors or experts manually segment and annotate the lung regions, followed by image preprocessing to produce the dataset.
Preferably, in Step 1, an image preprocessing method called letterbox_image is adopted: gray bars are added above and below input images of different sizes so that the resized images are not distorted.
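A minimal sketch of such an undistorted letterbox resize, assuming PIL and a gray fill value of (128, 128, 128); the actual letterbox_image implementation may differ in details:

```python
from PIL import Image

def letterbox_image(image, target_size=(473, 473)):
    """Resize an image while keeping its aspect ratio, padding the rest with gray bars."""
    iw, ih = image.size
    w, h = target_size
    scale = min(w / iw, h / ih)                 # scale so the image fits inside the target
    nw, nh = int(iw * scale), int(ih * scale)

    resized = image.resize((nw, nh), Image.BICUBIC)
    canvas = Image.new("RGB", target_size, (128, 128, 128))  # gray background
    canvas.paste(resized, ((w - nw) // 2, (h - nh) // 2))    # center the resized image
    return canvas
```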
Preferably, when the image is fed into MobileNetV3 for feature extraction in Step 2, the inverted residual block structure is used: it first expands the channel dimension, then performs a 3×3 depthwise convolution, passes the result into an SE module (whose hidden channel count is 1/4 of the expansion-layer channels), uses this lightweight attention model to reweight each channel, and finally adjusts the channel dimension back with a 1×1 convolution, outputting the global Feature Map through a linear unit.
Preferably, in the construction of the MobileNetV3 module in Step 2, an auxiliary loss is drawn from the output of the third-from-last block and propagated together with the total loss to jointly optimize the parameters, which effectively speeds up convergence.
Preferably, in Step 3 the PSP module fuses features at four different pyramid scales: the level processed into a single bin uses global pooling to generate one output, and the other three levels are pooled features at different scales. To preserve the weight of the global feature, if the pyramid has N levels, a 1×1 convolution is applied after each level to reduce its channel count to 1/N of the original, the result is restored to its pre-pooling size by bilinear interpolation, and finally all levels are concatenated; the resulting feature therefore contains both global and local context information.
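As an illustration of this pyramid pooling scheme, the following is a PyTorch sketch under the assumption of the four bin sizes named above and the 1/N channel reduction; it is not the exact module of the invention:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """Pools the input feature map into 1x1, 2x2, 3x3 and 6x6 bins, reduces each
    level to in_channels // N channels with a 1x1 convolution, upsamples back to
    the input size, and concatenates everything with the original features."""
    def __init__(self, in_channels, bin_sizes=(1, 2, 3, 6)):
        super().__init__()
        reduced = in_channels // len(bin_sizes)   # 1/N of the input channels per level
        self.stages = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(size),
                nn.Conv2d(in_channels, reduced, kernel_size=1, bias=False),
                nn.BatchNorm2d(reduced),
                nn.ReLU(inplace=True),
            )
            for size in bin_sizes
        ])

    def forward(self, x):
        h, w = x.shape[2:]
        pooled = [F.interpolate(stage(x), size=(h, w), mode="bilinear",
                                align_corners=False) for stage in self.stages]
        return torch.cat([x] + pooled, dim=1)     # global + local context
```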
The present invention can achieve the following beneficial effects:
(1) The present invention adopts MobileNetV3 as the backbone feature-extraction network of PSPNet, which reduces the number of parameters and the prediction time while maintaining accuracy;
(2) The present invention partitions the extracted global feature map with a pyramid adaptive average pooling module and pools within the different regions to obtain local feature maps, so that the resulting feature maps contain both local and global features.
Description of the Drawings
FIG. 1 is a flowchart of the improved lung CT image segmentation algorithm based on PSPNet of the present invention;
FIG. 2 is a schematic diagram of the MobileNetV3 network structure of the present invention;
FIG. 3 is a schematic diagram of the overall network structure of the improved lung CT image segmentation algorithm based on PSPNet of the present invention;
FIG. 4 is a schematic diagram of the recognition results of the improved lung CT image segmentation algorithm based on PSPNet of the present invention.
Detailed Description of the Embodiments
To make the technical problems to be solved, the technical solutions and the advantages of the present invention clearer, they are described in detail below with reference to the accompanying drawings and specific embodiments.
Aiming at the existing problems, the present invention proposes an improved lung CT image segmentation method based on PSPNet. The present invention is a neural-network-based semantic segmentation method that segments the lung regions in CT images with an improved PSPNet network model. The image is fed into the improved PSPNet model; in the encoder module, convolution layers of different sizes gradually reduce the size of the feature map and extract high-level semantic information, and the decoder module then gradually restores the feature map size through upsampling and other operations to complete the extraction of spatial information, yielding a prediction that segments the lungs from clinical 3D computed tomography. The improved lung CT image segmentation method based on PSPNet of the present invention comprises the following steps:
Step 1: prepare a lung CT image dataset. A dataset publicly available on Kaggle may be used, or lung CT scans may be obtained from a medical institution and the lung regions annotated manually with tools such as Labelme, LabelImage or Photoshop. Put the processed samples into the Sample folder and the manually annotated label files into the Label folder, run the Image_annotation.py file, and split the data into training and validation sets at a ratio of 9:1.
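The contents of Image_annotation.py are not reproduced here; a minimal sketch of the 9:1 split it is described as performing, with hypothetical output file names, might look like this:

```python
import os
import random

def split_dataset(sample_dir="Sample", train_ratio=0.9, seed=0):
    """Randomly split the sample file names into a training list and a validation list."""
    names = sorted(os.listdir(sample_dir))
    random.Random(seed).shuffle(names)
    n_train = int(len(names) * train_ratio)
    train, val = names[:n_train], names[n_train:]
    with open("train.txt", "w") as f:
        f.write("\n".join(os.path.splitext(n)[0] for n in train))
    with open("val.txt", "w") as f:
        f.write("\n".join(os.path.splitext(n)[0] for n in val))
    return train, val
```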
Step 2: use the letterbox_image function to resize the lung CT images without distortion. In the present invention the input image size is set to 473×473 with 3 channels. Then set num_classes according to the number of segmentation classes required, here 2, and change the value of self.color, adjusting the pixel values of the background and of the segmented region to [255, 255, 255] and [0, 0, 0] according to the format of the dataset. Set the downsampling factor downsample_factor according to the computer configuration, which determines the size of the extracted feature map, and adjust other parameters such as the number of iterations Epoch, batch_size, the learning rate learning_rate, and so on.
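Purely for illustration, these hyperparameters could be gathered into a configuration dictionary such as the one below; apart from the input size, number of classes and colors stated above, the values are assumptions rather than the settings actually used:

```python
config = {
    "input_shape": (473, 473, 3),              # undistorted resize target, RGB
    "num_classes": 2,                          # lung region vs. background
    "colors": [(255, 255, 255), (0, 0, 0)],    # pixel values for background / segmented region
    "downsample_factor": 16,                   # assumed value; set according to hardware
    "epochs": 100,                             # assumed value
    "batch_size": 8,                           # assumed value
    "learning_rate": 1e-4,                     # assumed value
}
```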
Step 3: build the bneck structure unique to MobileNetV3. A 1×1 convolution first adjusts the number of channels, followed by normalization and an activation function. The activation used in the first 6 bneck structures is ReLU6, of the form:
ReLU6(x) = min(max(x, 0), 6),
that is, when the input exceeds 6 the output is clamped to 6; as a nonlinear function this keeps subsequent computation robust. The last 8 bneck structures use the h-swish function, of the form:

h-swish(x) = x · ReLU6(x + 3) / 6.
The h-swish function lowers the computational cost, reduces the number of parameters, and gives better pixel-wise classification performance.
Depthwise separable convolution is used for feature extraction, and after normalization an attention mechanism is applied. First, the squeeze operation

z_c = (1 / (H × W)) Σ_{i=1..H} Σ_{j=1..W} u_c(i, j)

performs global average pooling over each channel u_c of the H × W feature map to obtain a feature vector z. Then the excitation operation

s = σ(W2 · δ(W1 · z)),

where δ is the ReLU activation and σ the gating function, passes z through two fully connected layers to obtain a second feature vector s; multiplying s channel-wise with the original feature map completes the construction of the attention mechanism. A 1×1 convolution then performs dimensionality reduction and normalization, and it is checked whether a residual connection is used; if so, the residual branch must be added to the returned features. This completes the bneck structure. The network is then assembled according to the MobileNetV3-Large structure, and the features produced by the third-from-last bneck structure are taken as the auxiliary training branch aux_branch: after normalization and dropout, the features are resized and compared with the labels to compute a loss for training.
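A simplified PyTorch sketch of one such bneck block follows, assuming a standard squeeze-and-excitation module with a 1/4 reduction ratio and h-swish activations; strides and channel counts would follow the MobileNetV3-Large table, which is not reproduced here:

```python
import torch
import torch.nn as nn

class SEModule(nn.Module):
    """Squeeze-and-excitation: global average pool, two fully connected layers, channel reweighting."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):
        b, c, _, _ = x.shape
        z = x.mean(dim=(2, 3))                                  # squeeze: one value per channel
        s = torch.sigmoid(self.fc2(torch.relu(self.fc1(z))))    # excitation: two FC layers + gate
        return x * s.view(b, c, 1, 1)                           # reweight each channel

class Bneck(nn.Module):
    """Inverted residual block: 1x1 expand -> 3x3 depthwise -> SE -> 1x1 project."""
    def __init__(self, in_ch, expand_ch, out_ch, stride=1, use_se=True):
        super().__init__()
        self.use_residual = stride == 1 and in_ch == out_ch
        self.expand = nn.Sequential(
            nn.Conv2d(in_ch, expand_ch, 1, bias=False),
            nn.BatchNorm2d(expand_ch), nn.Hardswish())
        self.depthwise = nn.Sequential(
            nn.Conv2d(expand_ch, expand_ch, 3, stride, 1, groups=expand_ch, bias=False),
            nn.BatchNorm2d(expand_ch), nn.Hardswish())
        self.se = SEModule(expand_ch) if use_se else nn.Identity()
        self.project = nn.Sequential(
            nn.Conv2d(expand_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch))                              # linear output, no activation

    def forward(self, x):
        out = self.project(self.se(self.depthwise(self.expand(x))))
        return x + out if self.use_residual else out
```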
Step 4: feed the image into the constructed MobileNetV3 for global feature extraction to obtain the global feature map. This feature layer is then divided into regions of different sizes: the incoming global feature layer is pooled into 6×6, 3×3, 2×2 and 1×1 regions to generate the local feature layers of each part. The local feature layers of different dimensions are adjusted with 1×1 convolutions and upsampled to obtain four feature layers of the same dimensions as the global feature layer, and finally the global and local feature layers are stacked along the channel dimension. The stacked feature layer is fused with a 3×3 convolution, a 1×1 convolution then adjusts the channels to 2 classes and outputs the prediction result, and finally resizing is used for upsampling so that the final output layer has the same width and height as the input image.
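Continuing the PyramidPooling sketch given earlier, the head described in this step (3×3 fusion convolution, 1×1 convolution down to 2 classes, and a resize back to the input resolution) could be sketched as follows; this is illustrative only, and the 512 fusion channels are an assumption:

```python
import torch.nn as nn
import torch.nn.functional as F

class PSPHead(nn.Module):
    """Pyramid pooling followed by a 3x3 conv for fusion and a 1x1 conv to num_classes."""
    def __init__(self, in_channels, num_classes=2):
        super().__init__()
        self.psp = PyramidPooling(in_channels)       # class from the earlier sketch
        fused = in_channels * 2                      # original + 4 levels of in_channels // 4
        self.fuse = nn.Sequential(
            nn.Conv2d(fused, 512, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(512), nn.ReLU(inplace=True), nn.Dropout2d(0.1))
        self.classify = nn.Conv2d(512, num_classes, kernel_size=1)

    def forward(self, features, input_size):
        x = self.classify(self.fuse(self.psp(features)))
        # Upsample the prediction back to the width/height of the input picture.
        return F.interpolate(x, size=input_size, mode="bilinear", align_corners=False)
```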
Step 5: run the train.py file either with the MobileNetV3 pre-trained weight file published online, or with model_path set to an empty string to train from scratch. Training uses Cross Entropy Loss and Dice Loss. Cross Entropy Loss is the cross-entropy function:

L_CE = - Σ_i y_i · log(p_i),

where y is the label value and p is the predicted value. The formula of Dice Loss is:

Dice Loss = 1 - 2 Σ_i (y_i · p_i) / (Σ_i y_i + Σ_i p_i).
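A sketch of combining the two losses in PyTorch, using the standard formulations above; the 1:1 weighting between the two terms is an assumption:

```python
import torch
import torch.nn.functional as F

def ce_dice_loss(logits, target, smooth=1e-5):
    """logits: (N, 2, H, W) raw network output; target: (N, H, W) integer labels in {0, 1}."""
    ce = F.cross_entropy(logits, target)

    probs = torch.softmax(logits, dim=1)[:, 1]            # probability of the foreground class
    target_f = (target == 1).float()
    intersection = (probs * target_f).sum()
    dice = (2 * intersection + smooth) / (probs.sum() + target_f.sum() + smooth)
    dice_loss = 1 - dice

    return ce + dice_loss                                  # total loss used for training
```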
The total loss computed with the above formulas is used for training. Put the generated .pth weight file into the PSPNet.py file, run it, then run the predict.py file and enter the path of the image to be predicted to obtain the lung image segmented by the algorithm.
Step 6: run the get_miou file. The mean intersection over union is computed according to the formula

MIoU = (1 / k) Σ_i TP_i / (TP_i + FP_i + FN_i),

averaged over the k classes, where TP (true positive) means the prediction is the positive class and the ground truth is also positive; FP (false positive) means the prediction is positive but the ground truth is negative; FN (false negative) means the prediction is negative but the ground truth is positive; and TN (true negative) means both the prediction and the ground truth are negative. According to the formula

MPA = (1 / k) Σ_i TP_i / (TP_i + FN_i),

the mean pixel accuracy is calculated. Examine these evaluation indicators of the training run, optimize and adjust the parameters according to the indicators, and find the most suitable parameters.
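An illustrative NumPy sketch of computing MIoU and MPA from a confusion matrix, following the formulas above; the actual get_miou file may be organized differently:

```python
import numpy as np

def segmentation_metrics(pred, label, num_classes=2):
    """pred, label: integer arrays of the same shape with values in [0, num_classes)."""
    mask = (label >= 0) & (label < num_classes)
    hist = np.bincount(num_classes * label[mask] + pred[mask],
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)

    tp = np.diag(hist)                        # true positives per class
    fp = hist.sum(axis=0) - tp                # predicted as class i but actually another class
    fn = hist.sum(axis=1) - tp                # actually class i but predicted as another class

    iou = tp / np.maximum(tp + fp + fn, 1)
    pa = tp / np.maximum(tp + fn, 1)          # per-class pixel accuracy
    return iou.mean(), pa.mean()              # MIoU, MPA
```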
Testing shows that, compared with the original PSPNet network, this embodiment of the present invention achieves good lung CT image segmentation on the Kaggle dataset, with a class-averaged mean pixel accuracy (MPA) of 92.68%, giving better semantic segmentation of lung CT images.
The present invention designs an improved lung CT image segmentation method based on PSPNet, streamlines the PSPNet model parameters, and selects the lightweight neural network MobileNetV3 for feature extraction. The improved lightweight neural network algorithm can be deployed on mobile terminals or small embedded devices to assist professional doctors in making timely diagnoses and to improve diagnostic efficiency.