




Technical Field
The present invention relates to the field of image segmentation, and in particular to an image segmentation method based on full-scale fusion and flow field attention.
Background Art
In recent years, deep-learning-based segmentation methods have made remarkable progress on image segmentation tasks. Their success stems from the powerful image feature extraction capability of deep neural networks, which process image feature information to produce refined segmentation results. Deep neural network models include convolutional neural networks (CNNs), multilayer perceptrons (MLPs), and Transformers. Convolutional neural networks have already achieved notable results in image segmentation, for example the fully convolutional network (FCN), U-Net, and SegNet. Among them, U-Net is a representative image segmentation network: in the encoding stage it extracts image features with a hierarchical CNN, and in the decoding stage it uses deconvolution and skip connections to fuse encoder and decoder features interactively, yielding a good segmentation prediction map. Building on U-Net, researchers have proposed Attention U-Net, which incorporates an attention mechanism, and DFM, which adds feature refinement; both improve segmentation accuracy. However, the inherent locality of the convolution operation leaves CNN-based methods unable to model global feature correlations. Transformers and multilayer perceptrons can capture global image information through self-attention and fully connected networks, extracting features that are difficult for CNNs to obtain. For example, TransUNet replaces the deepest features of the U-Net encoder with features extracted by a ViT, and UNetXt, which integrates convolution and multilayer perceptrons in a medical image segmentation network, improves prediction accuracy.
However, most current deep-learning-based image segmentation methods still have shortcomings. First, they impose restrictions on the size of the input image: most Transformer-based methods require training and test images of regular, square size, whereas widely collected real image data are mostly of arbitrary size, and resizing them introduces deformation and distortion that degrade the segmentation result. Second, the image feature information mined by a single type of network in the encoding and decoding stages may be insufficient: multilayer perceptrons extract global feature information, while the strength of convolutional networks lies in local information, so reasonably combining different types of networks can yield more comprehensive feature information and better network performance. Third, the simple skip connection structure of U-Net struggles to fuse effectively the coarse-grained and fine-grained information carried by the skip features at different scales, and the semantic differences between these features limit the segmentation performance of the network.
An image segmentation method based on full-scale fusion and flow field attention is therefore urgently needed to solve the above problems.
Summary of the Invention
The purpose of the present invention is to provide an image segmentation method based on full-scale fusion and flow field attention, so as to improve the applicability of image segmentation to different tasks and the segmentation accuracy, while also improving the robustness of image segmentation.
To solve the above technical problems, the present invention provides an image segmentation method based on full-scale fusion and flow field attention, comprising the following steps:
acquiring an image dataset and preprocessing the data images;
building an image segmentation model with U-Net as the backbone network;
inputting the training images of the dataset into the image segmentation model for training;
tuning the model to its best performance by selecting suitable parameters and a loss function, and saving it;
inputting the validation images of the dataset into the trained image segmentation model to obtain the segmentation prediction results.
Further, the image segmentation model comprises a feature encoder, a full-scale feature fusion module, and a feature decoder connected in sequence.
Further, the data processing of the image segmentation model comprises the following steps:
the feature encoder extracts hierarchical feature information and global feature information of the image;
the full-scale feature fusion module interactively fuses the feature information of each level extracted by the feature encoder with the global feature information;
the feature decoder refines the feature maps of each scale to obtain the segmentation result.
Further, the feature encoder comprises a U-Net backbone network and a convolutional MLP module connected in sequence.
Further, the U-Net backbone network is used to obtain five feature maps F1 to F5 of the input image at different scales, and the convolutional MLP module is used to extract the global feature information of the input image and concatenate it with the bottom-level feature map F1 to obtain the fused feature T1.
Further, the full-scale feature fusion module performs feature fusion along the feature height, width, and channel dimensions to generate the fused feature T_i of each branch, where the fused feature of the i-th layer is generated as
T_i = F_i + C_{θ_i}(Cat(uc(F_1), ..., uc(F_{i-1}), F_i, dc(F_{i+1}), ..., dc(F_5)))
where C_{θ_i} denotes the i-th layer convolution operation used to adjust the number of channels of the feature map, dc and uc denote convolutional downsampling and convolutional upsampling, respectively, and Cat denotes channel-wise concatenation.
Further, the feature decoder takes the output T1 of the feature encoder and the outputs T2 to T5 of the full-scale feature fusion module as inputs stage by stage, and outputs progressively refined feature maps P1 to P5.
Further, the processing flow of the feature decoder comprises the following steps:
upsampling the feature map P_{i-1} to the same size as the fused feature T_i, concatenating the two, and obtaining a feature flow field φ by a convolution operation to guide the warping of the feature map P_{i-1};
concatenating the warped feature map P_{i-1} with T_i, feeding the result into the feature decoder, and outputting the feature P_i;
mapping the number of channels of the feature map P5 to the number of segmentation classes to obtain the final segmentation result.
Further, the loss function is constructed by fusing the GDL loss function and the cross-entropy loss function:
L = L_CE + 1.1 L_GDL;
where L_CE is the cross-entropy loss function and L_GDL is the GDL loss function.
Further, the GDL loss function and the cross-entropy loss function are, respectively:
L_GDL = 1 - 2 (Σ_{m=1}^{M} w_m Σ_n g_mn p_mn) / (Σ_{m=1}^{M} w_m Σ_n (g_mn + p_mn)),
L_CE = -(1/N) Σ_n Σ_{m=1}^{M} g_mn log p_mn;
where w_m denotes the weight of the m-th class among the M classes, g_mn denotes the ground-truth value of class m at the pixel in the n-th position, and p_mn denotes the corresponding predicted value.
Compared with the prior art, the present invention has at least the following beneficial effects:
The present invention improves the encoder and decoder structures of the traditional network. By combining network structures with specific functions and improving the network structure, it compensates for the general U-shaped network's limited ability to capture global image feature information and for the distortion introduced during the upsampling of image features, thereby improving the adaptability of the method to different segmentation tasks and the accuracy of image segmentation.
Meanwhile, the full-scale feature fusion module proposed by the present invention fuses coarse-grained and fine-grained features on the skip connections at all levels, which reduces the semantic differences between features of different scales and highlights the key feature information of the image, so that both the performance and the robustness of the network are significantly improved.
Brief Description of the Drawings
Fig. 1 is a flowchart of the image segmentation method based on full-scale fusion and flow field attention according to the present invention;
Fig. 2 is a schematic diagram of the network structure of the image segmentation model of the method;
Fig. 3 is a schematic diagram of the structure of the convolutional MLP module of the method;
Fig. 4 is a schematic diagram of a single branch of the full-scale feature fusion module of the method;
Fig. 5 is a schematic diagram of the structure of the flow field attention module of the method.
Detailed Description of the Embodiments
The image segmentation method based on full-scale fusion and flow field attention of the present invention will be described in more detail below with reference to the schematic drawings, which show preferred embodiments of the present invention. It should be understood that those skilled in the art can modify the invention described here while still achieving its advantageous effects. The following description should therefore be understood as being widely known to those skilled in the art, and not as a limitation of the present invention.
In the following paragraphs, the present invention is described more specifically by way of example with reference to the accompanying drawings. The advantages and features of the present invention will become clearer from the following description and the claims. It should be noted that the drawings are all in a greatly simplified form and use imprecise scales, and are only intended to assist in illustrating the embodiments of the present invention conveniently and clearly.
As shown in Fig. 1, an embodiment of the present invention provides an image segmentation method based on full-scale fusion and flow field attention, comprising the following steps:
acquiring an image dataset and preprocessing the data images;
building an image segmentation model with U-Net as the backbone network;
inputting the training images of the dataset into the image segmentation model for training;
tuning the model to its best performance by selecting suitable parameters and a loss function, and saving it;
inputting the validation images of the dataset into the trained image segmentation model to obtain the segmentation prediction results.
A preferred embodiment of the image segmentation method based on full-scale fusion and flow field attention applied to medical image segmentation is given below to clearly illustrate the content of the present invention. It should be clear that the content of the present invention is not limited to the following embodiment; other improvements made by those of ordinary skill in the art using conventional technical means are also within the scope of the present invention.
S100: acquire a medical image dataset and preprocess the data images.
Specifically, if the medical image dataset consists of three-dimensional images, the images can be sliced along the axial direction at 1 mm intervals to form two-dimensional slices; if the dataset consists of two-dimensional images, no slicing is performed; if the acquired brain medical images contain the skull, the skull can be removed by algorithmic processing. After preprocessing, the data images are normalized so that the input image pixels have a mean of 0 and a variance of 1, and the amount of data is expanded by random rotation, random flipping, and similar operations. The whole dataset is divided into a training set and a test set at a ratio of 6:4.
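By way of illustration only, the following is a minimal NumPy sketch of the preprocessing described above. The function names, the use of NumPy, and the restriction of rotations to multiples of 90 degrees are assumptions made for the example and are not specified in this embodiment.

```python
import numpy as np

def normalize_slice(img: np.ndarray) -> np.ndarray:
    """Normalize a 2D slice to zero mean and unit variance, as described above."""
    img = img.astype(np.float32)
    return (img - img.mean()) / (img.std() + 1e-8)

def augment(img: np.ndarray, label: np.ndarray, rng: np.random.Generator):
    """Expand the data with a random rotation and random flips."""
    k = int(rng.integers(0, 4))                  # random rotation by k * 90 degrees
    img, label = np.rot90(img, k), np.rot90(label, k)
    if rng.random() < 0.5:                       # random horizontal flip
        img, label = np.fliplr(img), np.fliplr(label)
    if rng.random() < 0.5:                       # random vertical flip
        img, label = np.flipud(img), np.flipud(label)
    return img.copy(), label.copy()

def split_dataset(items, seed=0):
    """Split the whole dataset into training and test sets at a 6:4 ratio."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(items))
    cut = int(0.6 * len(items))
    return [items[i] for i in idx[:cut]], [items[i] for i in idx[cut:]]
```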
S200: with U-Net as the backbone network, construct an image segmentation model containing the convolutional MLP module, the full-scale feature fusion module, and the flow field attention decoding module.
Specifically, as shown in Fig. 2, the image segmentation model of this method is divided into three processing stages. In the first stage, a feature encoder integrating the U-Net backbone network and the convolutional MLP module extracts the hierarchical feature information and global feature information of the image. In the second stage, the full-scale feature fusion module interactively fuses the feature information of each level extracted in the first stage and generates the feature maps used for the skip connections at each level. The third stage uses a feature decoder composed of flow field attention modules (Flow and Attention Decoding Unit, FADU) that combine a flow field transformation with an attention mechanism; the decoder refines the feature maps input at each scale and obtains the predicted segmentation result through a channel mapping.
The feature encoder consists of two parts: the U-Net backbone network and the convolutional MLP module. Given an input image I, five feature maps F1 to F5 at different scales are obtained through U-Net. At the same time, the convolutional MLP module is used to extract the global feature information of the input image I, which is concatenated with the bottom-level feature map F1 to obtain T1. As shown in Fig. 3, the convolutional MLP module contains one convolutional block composed of 3×3 convolutional downsampling, 3×3 depthwise separable convolution, and pooling downsampling, followed by three convolutional MLP blocks each containing an MLP and convolutional downsampling. The feature encoder thus extracts the local spatial feature information of the image while retaining its global feature information.
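The following is a minimal PyTorch sketch of the convolutional MLP branch described above. The channel widths, the realization of the MLP as two 1×1 convolutions with GELU, the residual connection, and the exact number of downsampling steps are assumptions made for illustration; the embodiment only fixes the composition of the blocks.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, ch):
        super().__init__()
        self.dw = nn.Conv2d(ch, ch, 3, padding=1, groups=ch)
        self.pw = nn.Conv2d(ch, ch, 1)
    def forward(self, x):
        return self.pw(self.dw(x))

class ConvMLPBlock(nn.Module):
    """One conv-MLP block: a channel MLP (two 1x1 convs) followed by strided-conv downsampling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(in_ch, in_ch * 2, 1), nn.GELU(),
            nn.Conv2d(in_ch * 2, in_ch, 1),
        )
        self.down = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)
    def forward(self, x):
        return self.down(x + self.mlp(x))        # channel mixing, then downsample

class ConvMLP(nn.Module):
    """Global-feature branch: one convolutional block (strided 3x3 conv, depthwise
    separable conv, pooling downsampling) followed by three conv-MLP blocks."""
    def __init__(self, in_ch=1, dims=(32, 64, 128, 256)):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, dims[0], 3, stride=2, padding=1),   # 3x3 conv downsampling
            DepthwiseSeparableConv(dims[0]),                     # 3x3 depthwise separable conv
            nn.MaxPool2d(2),                                     # pooling downsampling
        )
        self.blocks = nn.Sequential(
            ConvMLPBlock(dims[0], dims[1]),
            ConvMLPBlock(dims[1], dims[2]),
            ConvMLPBlock(dims[2], dims[3]),
        )
    def forward(self, x):
        return self.blocks(self.stem(x))         # global features, later concatenated with F1
```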
The full-scale feature fusion module consists of four independent branches, each performing feature fusion at a specific feature height, width, and number of channels to generate the fused feature of its branch, i.e. the feature maps used for the skip connections at each level. The workflow of a single branch is illustrated with reference to Fig. 4: down and up denote downsampling and upsampling a feature map to the height and width of the feature F2, respectively, and null denotes no resampling. C denotes a convolution operation, here uniformly a 3×3 convolution with 64 output channels. After the Cat operation, the five input feature maps are concatenated along the channel dimension into a single feature map with 320 channels; the convolution C_{θ_2} then adjusts the number of channels to the channel size of the corresponding feature map F2, and the result is finally added to the feature map F2 to obtain the skip-connection feature T2. The fused feature T_i of the i-th layer is therefore generated as:
T_i = F_i + C_{θ_i}(Cat(uc(F_1), ..., uc(F_{i-1}), F_i, dc(F_{i+1}), ..., dc(F_5)))    (1)
where C_{θ_i} in formula (1) denotes the i-th layer convolution operation used to adjust the number of channels of the feature map to the channel size of the corresponding feature map F_i, dc and uc denote convolutional downsampling and convolutional upsampling, respectively, and Cat denotes the channel-wise concatenation operation.
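A minimal PyTorch sketch of one fusion branch is given below. The use of bilinear interpolation for resizing (the embodiment specifies convolutional up- and downsampling), the argument names, and the requirement that out_ch equal the channel count of F_i are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FullScaleFusionBranch(nn.Module):
    """One branch of the full-scale fusion module: fuses all five encoder maps
    into the skip feature T_i at the resolution and channel width of F_i."""
    def __init__(self, level, in_chs, out_ch):
        super().__init__()
        self.level = level                        # index of the target scale (0-based)
        self.reduce = nn.ModuleList(
            [nn.Conv2d(c, 64, 3, padding=1) for c in in_chs]   # 3x3 convs, 64 channels each
        )
        self.adjust = nn.Conv2d(64 * len(in_chs), out_ch, 3, padding=1)  # C_theta_i

    def forward(self, feats):
        target = feats[self.level]
        h, w = target.shape[-2:]
        resized = []
        for conv, f in zip(self.reduce, feats):
            f = conv(f)                           # reduce to 64 channels
            if f.shape[-2:] != (h, w):            # resample to the size of F_i
                f = F.interpolate(f, size=(h, w), mode="bilinear", align_corners=False)
            resized.append(f)
        fused = self.adjust(torch.cat(resized, dim=1))   # Cat (5 x 64 = 320 channels) + adjust
        return target + fused                     # residual addition gives T_i
```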
The feature decoder is composed of flow field attention modules (FADU). The decoder takes the output T1 of the feature encoder and the outputs T2 to T5 of the full-scale feature fusion module as inputs stage by stage and outputs progressively refined feature maps P1 to P5, achieving feature refinement while avoiding information redundancy. As shown in Fig. 5, the flow field attention decoding module works as follows. First, the input feature P_{i-1} is upsampled to the same size as T_i; the two are concatenated and a convolution operation produces the feature flow field φ, which guides the warping of the feature P_{i-1}. The warping formula is:
warp(P_{i-1})(p) = P_{i-1}(p_x + φ(p)_x, p_y + φ(p)_y)    (2)
where in formula (2) the subscripts x and y of p are the coordinates of the pixel; the flow field warping reduces the distortion of the feature map during upsampling. The warped feature P_{i-1} is then concatenated with T_i and fed into a 3×3 convolutional block, which outputs the feature P_i′. The convolution operation is:
P_i′ = σ(C(W_P P_{i-1}) + C(W_T T_i))    (3)
where in formula (3) σ denotes the ReLU activation function, C denotes the convolution operation, and W_P and W_T denote the weight matrices of the hidden state P_{i-1} and the skip-connection feature T_i, respectively. Since simply fusing high-level and shallow features usually introduces information redundancy and confusion, the feature P_i′ is fed into a convolutional block attention module (CBAM) to obtain P_i; the attention mechanism improves the module's efficiency in exploiting useful information. Overall, P_i can be generated layer by layer from the inputs P_{i-1} and T_i as:
P_i = FADU(P_{i-1}, T_i; φ)    (4)
where in formula (4) i takes values from 1 to 5 and P_0 is a tensor initialized to zero. At the end of the decoding process, a 3×3 convolution maps the number of channels of the feature map P5 to the number of segmentation classes, giving the final segmentation result.
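A minimal PyTorch sketch of the FADU is given below. The bilinear upsampling, the channel widths, and the simplified channel-plus-spatial attention standing in for CBAM are assumptions made for the example; the embodiment only fixes the flow-field warping of formula (2), the fusion of formula (3), and the use of CBAM.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def flow_warp(x, flow):
    """Warp feature map x with a per-pixel offset field `flow` (B, 2, H, W), in pixels (formula (2))."""
    b, _, h, w = x.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=x.device, dtype=x.dtype),
        torch.arange(w, device=x.device, dtype=x.dtype),
        indexing="ij",
    )
    grid_x = xs + flow[:, 0]                      # p_x + phi(p)_x
    grid_y = ys + flow[:, 1]                      # p_y + phi(p)_y
    grid = torch.stack(                           # normalise to [-1, 1] for grid_sample
        (2.0 * grid_x / max(w - 1, 1) - 1.0, 2.0 * grid_y / max(h - 1, 1) - 1.0), dim=-1
    )
    return F.grid_sample(x, grid, mode="bilinear", padding_mode="border", align_corners=True)

class SimpleCBAM(nn.Module):
    """Simplified convolutional block attention: channel attention then spatial attention."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())
    def forward(self, x):
        x = x * self.channel(x)
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * self.spatial(s)

class FADU(nn.Module):
    """Flow and Attention Decoding Unit: upsample P_{i-1}, predict a flow field from
    [P_{i-1}, T_i], warp P_{i-1}, fuse with T_i by two 3x3 convs, then apply attention."""
    def __init__(self, p_ch, t_ch, out_ch):
        super().__init__()
        self.flow = nn.Conv2d(p_ch + t_ch, 2, 3, padding=1)   # feature flow field phi
        self.conv_p = nn.Conv2d(p_ch, out_ch, 3, padding=1)   # C(W_P P)
        self.conv_t = nn.Conv2d(t_ch, out_ch, 3, padding=1)   # C(W_T T)
        self.attn = SimpleCBAM(out_ch)
    def forward(self, p_prev, t):
        p_prev = F.interpolate(p_prev, size=t.shape[-2:], mode="bilinear", align_corners=False)
        phi = self.flow(torch.cat([p_prev, t], dim=1))
        p_warp = flow_warp(p_prev, phi)                        # formula (2)
        p = F.relu(self.conv_p(p_warp) + self.conv_t(t))       # formula (3)
        return self.attn(p)                                    # P_i, formula (4)
```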
S300: input the training images of the dataset into the image segmentation model for training.
Specifically, during training the model uses the Adam optimization algorithm to update the network parameters by driving the loss function toward its minimum. The initial learning rate is set to 0.0006 and the weight decay to 0.0005. The batch size during training is set to 1 and the total number of iterations is 30000.
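A minimal training-loop sketch with these settings is given below; it assumes a dataset yielding (image, label) pairs and a criterion implementing the fused loss of step S400, both passed in as arguments.

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_dataset, criterion, total_iters=30000, device="cpu"):
    """Training loop following the settings above: Adam, initial learning rate 6e-4,
    weight decay 5e-4, batch size 1, 30000 iterations."""
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=6e-4, weight_decay=5e-4)
    loader = DataLoader(train_dataset, batch_size=1, shuffle=True)
    it = 0
    while it < total_iters:
        for image, label in loader:
            image, label = image.to(device), label.to(device)
            optimizer.zero_grad()
            loss = criterion(model(image), label)   # L = L_CE + 1.1 * L_GDL
            loss.backward()
            optimizer.step()
            it += 1
            if it >= total_iters:
                return model
    return model
```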
S400: tune the model to its best performance by selecting suitable parameters and a loss function, and save it.
Specifically, the overall performance of the image segmentation model depends not only on the design of the network structure; the loss function also plays a key role. The loss function of the network is constructed by fusing the cross-entropy loss function and the GDL loss function. The cross-entropy loss function is
L_CE = -(1/N) Σ_n Σ_{m=1}^{M} g_mn log p_mn
where g_mn denotes the ground-truth value of class m at the pixel in the n-th position and p_mn denotes the corresponding predicted value, and the GDL loss function is
L_GDL = 1 - 2 (Σ_{m=1}^{M} w_m Σ_n g_mn p_mn) / (Σ_{m=1}^{M} w_m Σ_n (g_mn + p_mn))
where w_m denotes the weight of the m-th class among the M classes. The final fused loss function is L = L_CE + 1.1 L_GDL.
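The following is a minimal PyTorch sketch of this fused loss. The class-weight choice w_m = 1 / (Σ_n g_mn)^2 is the common generalized-Dice convention and is an assumption here, since this embodiment only states that w_m is the weight of the m-th class; the epsilon terms are added for numerical stability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusedLoss(nn.Module):
    """Fusion of cross-entropy and generalized Dice loss: L = L_CE + 1.1 * L_GDL."""
    def __init__(self, gdl_weight=1.1, eps=1e-6):
        super().__init__()
        self.gdl_weight = gdl_weight
        self.eps = eps

    def forward(self, logits, target):
        # logits: (B, M, H, W); target: (B, H, W) with integer class labels
        ce = F.cross_entropy(logits, target)                       # L_CE

        num_classes = logits.shape[1]
        probs = torch.softmax(logits, dim=1)                       # p_mn
        onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()  # g_mn

        dims = (0, 2, 3)                                           # sum over batch and pixels
        w = 1.0 / (onehot.sum(dims) ** 2 + self.eps)               # class weights w_m (assumed form)
        intersect = (w * (probs * onehot).sum(dims)).sum()
        union = (w * (probs + onehot).sum(dims)).sum()
        gdl = 1.0 - 2.0 * intersect / (union + self.eps)           # L_GDL

        return ce + self.gdl_weight * gdl
```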
S500: input the validation images of the dataset into the trained image segmentation model to obtain the segmentation prediction results.
Compared with the prior art, the present invention has at least the following beneficial effects:
The present invention improves the encoder and decoder structures of the traditional network. By combining network structures with specific functions and improving the network structure, it compensates for the general U-shaped network's limited ability to capture global image feature information and for the distortion introduced during the upsampling of image features, thereby improving the adaptability of the method to different segmentation tasks and the accuracy of image segmentation.
Meanwhile, the full-scale feature fusion module proposed by the present invention fuses coarse-grained and fine-grained features on the skip connections at all levels, which reduces the semantic differences between features of different scales and highlights the key feature information of the image, so that both the performance and the robustness of the network are significantly improved.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include them.