Technical Field

The present invention relates to the field of image processing technology, and in particular to an image segmentation method and device based on multi-scale feature extraction.
Background Art

At present, when the state of a certain body part needs to be analyzed, an image of that body part can be obtained by photographing it, and the body part (the target) can be segmented from the remaining irrelevant portions (the background) of the image to obtain region information and edge information of the body part, so that its state can be analyzed more accurately. For example, when the state of teeth needs to be analyzed, the teeth can be photographed by X-ray radiography to obtain an X-ray dental image, and the tooth portion and the non-tooth portion of the image can be segmented to obtain tooth region information and tooth edge information, so that the state of the teeth can be analyzed more accurately. It can be seen that the accuracy of image segmentation plays a vital role in analyzing the state of body parts.

However, it has been found in practice that if the image to be analyzed has characteristics such as low contrast and non-uniform intensity distribution, segmenting it with existing techniques is prone to problems such as targets adhering to one another and target edges that cannot be segmented accurately. It is therefore particularly important to propose a technical solution capable of improving the accuracy of image segmentation.
Summary of the Invention

The technical problem to be solved by the present invention is to provide an image segmentation method and device based on multi-scale feature extraction that are capable of improving the accuracy of image segmentation.

In order to solve the above technical problem, a first aspect of the present invention discloses an image segmentation method based on multi-scale feature extraction, the method comprising:

inputting an original image into a data input layer of a pre-trained U2Net++ network for image size reduction processing to obtain a target image, wherein the image size of the target image is a target size, and the data input layer is used to reduce the image size of the original image to the target size;

inputting the target image into a feature extraction layer of the U2Net++ network for multi-scale feature extraction to obtain a candidate feature map set, wherein the feature extraction layer comprises a plurality of levels, each level comprising at least one multi-scale feature extraction module; when a level comprises at least two multi-scale feature extraction modules, an attention module is arranged between every two adjacent multi-scale feature extraction modules in that level, each attention module being used to identify a region of interest of the feature maps input into it and to input the region of interest into the multi-scale feature extraction module immediately following that attention module; and the candidate feature map set comprises at least two candidate feature map subsets, the feature maps contained in each candidate feature map subset being used for feature fusion to obtain a segmentation feature map; and

inputting the candidate feature map set into a multi-side output fusion layer of the U2Net++ network for feature fusion processing to obtain a target segmentation feature map.
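The three claimed stages (size reduction, multi-scale extraction, multi-side fusion) can be illustrated with a minimal NumPy sketch. All names here are hypothetical: block averaging stands in for the data input layer's size reduction, and the extraction and fusion stages are placeholders rather than the network structure described below.

```python
import numpy as np

def reduce_to_target(image, target_size):
    """Data-input-layer stand-in: shrink a square image to target_size x
    target_size by block averaging (the factor must divide the input size)."""
    h, w = image.shape
    fh, fw = h // target_size, w // target_size
    return image.reshape(target_size, fh, target_size, fw).mean(axis=(1, 3))

def segment(original, target_size, extract, fuse):
    """Hypothetical end-to-end flow mirroring the three claimed steps."""
    target = reduce_to_target(original, target_size)  # step 1: data input layer
    candidate_sets = extract(target)                  # step 2: feature extraction layer
    return fuse(candidate_sets)                       # step 3: multi-side output fusion layer

original = np.arange(16.0).reshape(4, 4)
out = segment(original, 2,
              extract=lambda t: [t, t * 0.5],          # placeholder candidate subsets
              fuse=lambda sets: np.mean(sets, axis=0))  # placeholder fusion
print(out.shape)  # (2, 2)
```

The placeholder `extract` and `fuse` callables are refined by the optional implementations below; only the data flow between the three layers is fixed by the claim.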
As an optional implementation, in the first aspect of the present invention, when a level comprises only one multi-scale feature extraction module, that module serves as an encoder; when a level comprises two multi-scale feature extraction modules, the module in the first position of the level serves as an encoder and the module in the last position of the level serves as a decoder, with a skip path arranged between the encoder and the decoder of that level; when a level comprises more than two multi-scale feature extraction modules, the modules of that level other than the encoder and the decoder serve as dense residual modules, and skip paths are arranged between the encoder of the level and each dense residual module of the level, and between each dense residual module of the level and the decoder of the level; wherein the encoder is used to extract multi-scale features of the image input into it, the decoder is used to restore the resolution of the image input into it, and each dense residual module is used to integrate information from deeper levels into the level in which it is located;

wherein the more multi-scale feature extraction modules a level contains, the shallower that level is, the shallowest level being the level containing the largest number of multi-scale feature extraction modules; and the fewer multi-scale feature extraction modules a level contains, the deeper that level is, the deepest level being the level containing only one multi-scale feature extraction module;

a downsampling path is arranged between the encoder of each level and the encoder of the adjacent deeper level; an upsampling path is arranged between each multi-scale feature extraction module and a first adjacent module of the adjacent shallower level, wherein the position of the first adjacent module within its level immediately follows the position of that multi-scale feature extraction module within its own level; and

for each attention module, the feature maps input into that attention module comprise a first input feature map and a second input feature map; wherein, taking the multi-scale feature extraction module that receives the region of interest output by the attention module as a position reference module, the first input feature map is the feature map output by the multi-scale feature extraction module immediately preceding the position reference module, and the second input feature map is the feature map output by a second adjacent module of the adjacent deeper level, the position of the second adjacent module within its level immediately preceding the position of the position reference module within its own level.
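The patent specifies the attention module's two inputs and its region-of-interest output, but not its internal computation. A common choice is additive attention gating (an assumption here, not the patent's construction), sketched below for single-channel maps; the weights `w_x` and `w_g` are illustrative scalars standing in for learned convolutions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(first_input, second_input, w_x=1.0, w_g=1.0):
    """Hypothetical attention module: combines the first input feature map
    (from the immediately preceding module in the same level) with the
    second input feature map (from the adjacent deeper level) into a
    spatial attention mask, then passes the masked region of interest to
    the following multi-scale feature extraction module."""
    alpha = sigmoid(np.maximum(w_x * first_input + w_g * second_input, 0.0))
    return first_input * alpha  # region of interest fed forward

x = np.array([[0.2, 4.0], [-3.0, 0.0]])  # first input feature map
g = np.array([[0.1, 2.0], [-1.0, 0.5]])  # second input feature map
roi = attention_gate(x, g)
print(roi.shape)  # (2, 2)
```

Because the gate is multiplicative, responses where both inputs agree are kept near full strength while mismatched regions are attenuated toward half strength, which is one way to realize the "identify a region of interest" behaviour named above.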
As an optional implementation, in the first aspect of the present invention, inputting the target image into the feature extraction layer of the U2Net++ network for multi-scale feature extraction to obtain the candidate feature map set comprises:

inputting the target image into the encoder of the shallowest level in the feature extraction layer of the U2Net++ network, and performing downsampling operations on the target image along the downsampling paths down to the level adjacent to the deepest level, to obtain multi-scale downsampled feature maps;

performing, by the encoder of the deepest level, dilated convolution processing on the multi-scale downsampled feature map output by the encoder of the level adjacent to the deepest level, to obtain a dilated feature map;

for each multi-scale feature extraction module other than the encoders, performing, on the basis of the received region of interest, upsampling operations on all the feature maps input into that module, to obtain multi-scale upsampled feature maps;

determining all the multi-scale upsampled feature maps in the shallowest level as a first candidate feature map subset; and

determining the dilated feature map and the multi-scale upsampled feature maps output by the decoders of the levels other than the shallowest level as a second candidate feature map subset.
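The encoder-side walk down the downsampling paths can be sketched as follows. The number of levels and the use of 2x2 max pooling as the downsampling operation are illustrative assumptions; the patent fixes only that downsampling proceeds from the shallowest level to the level adjacent to the deepest, whose output is then passed to the deepest-level (dilated-convolution) encoder.

```python
import numpy as np

def downsample(x):
    """One downsampling path, realized here as 2x2 max pooling (assumed)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Walk the downsampling paths from the shallowest level downward; the
# deepest level is handled separately by dilated convolution.
target = np.random.rand(64, 64)          # target image from the data input layer
encoder_outputs = [target]
for _ in range(4):                       # 5 encoder levels assumed for illustration
    encoder_outputs.append(downsample(encoder_outputs[-1]))
print([m.shape for m in encoder_outputs])
# [(64, 64), (32, 32), (16, 16), (8, 8), (4, 4)]
```

The last entry is what the level adjacent to the deepest level hands to the deepest-level encoder for dilated convolution processing.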
As an optional implementation, in the first aspect of the present invention, when the original image is an X-ray dental image, the target segmentation feature map is a segmentation feature map taking teeth as the segmentation target; and the types of the multi-scale feature extraction modules include a first feature extraction type and a second feature extraction type;

each multi-scale feature extraction module comprises a plurality of sub-levels, each sub-level comprising at least one multi-scale feature extraction submodule; when a sub-level comprises a multi-scale feature extraction submodule for receiving an input feature map, that sub-level is the shallowest sub-level, the submodule for receiving the input feature map being an input convolution submodule, and the shallowest sub-level further comprising a fusion submodule, namely a multi-scale feature extraction submodule used to fuse features and output a fused feature map; and when a sub-level comprises only one multi-scale feature extraction submodule, that sub-level is the deepest sub-level;

when a sub-level comprises at least two multi-scale feature extraction submodules, an attention submodule is arranged between every two adjacent submodules in that sub-level, and skip sub-paths are arranged between the submodules of that sub-level; for each attention submodule, the attention submodule receives the feature map output by the submodule immediately preceding it and the feature map output by the preceding submodule of the adjacent deeper sub-level, and the submodule immediately following the attention submodule receives the region of interest output by that attention submodule; and

a first feature extraction sub-path is arranged between the first submodule of each sub-level and the first submodule of the adjacent deeper sub-level; and a second feature extraction sub-path is arranged between the last submodule of each sub-level and the last submodule of the adjacent shallower sub-level.

As an optional implementation, in the first aspect of the present invention, when the type of the multi-scale feature extraction module is the first feature extraction type, the first feature extraction sub-path is a downsampling sub-path, and the submodules on the downsampling sub-path other than those of the shallowest sub-level are convolution submodules or downsampling submodules; the second feature extraction sub-path is an upsampling sub-path, and the submodules on the upsampling sub-path other than those of the shallowest sub-level are convolution submodules or upsampling submodules; and the remaining submodules in each sub-level other than the upsampling submodules and the downsampling submodules are dense convolution submodules.

As an optional implementation, in the first aspect of the present invention, when the type of the multi-scale feature extraction module is the second feature extraction type, the multi-scale feature extraction submodules in each sub-level of that module other than the shallowest sub-level are all dilated convolution submodules;

wherein the type of the encoder of the deepest level is the second feature extraction type.
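The dilated convolution submodules above enlarge the receptive field without further downsampling. A single-channel sketch (kernel and dilation rate are illustrative, not values fixed by the patent):

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """Single-channel 2-D dilated (atrous) convolution with 'valid' padding.
    A kernel of size k with dilation rate r covers an effective window of
    (k - 1) * r + 1 pixels per axis, sampling every r-th pixel."""
    kh, kw = kernel.shape
    eff_h = (kh - 1) * rate + 1
    eff_w = (kw - 1) * rate + 1
    oh = x.shape[0] - eff_h + 1
    ow = x.shape[1] - eff_w + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i:i + eff_h:rate, j:j + eff_w:rate]  # dilated sampling
            out[i, j] = np.sum(patch * kernel)
    return out

x = np.arange(36.0).reshape(6, 6)
k = np.ones((3, 3)) / 9.0                # averaging kernel, for illustration
y = dilated_conv2d(x, k, rate=2)
print(y.shape)  # (2, 2)
```

With rate 2, the 3x3 kernel spans a 5x5 window, which is why the deepest-level encoder can aggregate wide context from the small feature map it receives without shrinking it through further pooling.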
As an optional implementation, in the first aspect of the present invention, inputting the candidate feature map set into the multi-side output fusion layer of the U2Net++ network for feature fusion processing to obtain the target segmentation feature map comprises:

performing, on the basis of a preset activation function, feature fusion processing on all the candidate feature maps in the first candidate feature map subset to obtain a first segmentation feature map; wherein the first segmentation feature map is calculated as:

Y0,5 = σ(C(Y0,1, Y0,2, Y0,3, Y0,4))

where C(·) denotes the concatenation operation, σ denotes the preset activation function, Y0,5 is the first segmentation feature map, and Y0,1, Y0,2, Y0,3 and Y0,4 are the feature maps in the first candidate feature map subset;

performing, on the basis of the preset activation function, feature fusion processing on all the candidate feature maps in the second candidate feature map subset to obtain a second segmentation feature map; wherein the second segmentation feature map is calculated as:

Y5,0 = σ(C(Y1,0, Y2,0, Y3,0, Y4,0))

where Y5,0 is the second segmentation feature map, and Y1,0, Y2,0, Y3,0 and Y4,0 are the feature maps in the second candidate feature map subset; and

performing, on the basis of the preset activation function, feature fusion processing on the first segmentation feature map and the second segmentation feature map to obtain the target segmentation feature map; wherein the target segmentation feature map is calculated as:

Y5,5 = σ(C(Y0,5, Y5,0))

where Y5,5 is the target segmentation feature map.
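The three fusion steps above can be sketched as concatenation followed by an activation. The sigmoid activation and the 1x1 mixing weights below are assumptions (the patent names only a preset activation function and a concatenation operation), and the placeholder maps are given a common size, glossing over the upsampling that the second subset's maps would need in practice.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(feature_maps, weights=None):
    """One multi-side fusion step: concatenate maps along a channel axis,
    mix them with a 1x1 convolution (here a per-channel weighted sum with
    illustrative uniform weights), and apply the activation."""
    stacked = np.stack(feature_maps)                 # the concatenation C(...)
    if weights is None:
        weights = np.full(len(feature_maps), 1.0 / len(feature_maps))
    mixed = np.tensordot(weights, stacked, axes=1)   # 1x1 conv over channels
    return sigmoid(mixed)                            # preset activation

# Y0,5 from the first subset, Y5,0 from the second, Y5,5 from both.
first_subset = [np.zeros((4, 4)) for _ in range(4)]   # placeholders for Y0,1..Y0,4
second_subset = [np.zeros((4, 4)) for _ in range(4)]  # placeholders for Y1,0..Y4,0
y_0_5 = fuse(first_subset)
y_5_0 = fuse(second_subset)
y_5_5 = fuse([y_0_5, y_5_0])
print(y_5_5.shape)  # (4, 4)
```

Fusing the two subsets separately before the final fusion mirrors the two-branch structure of the formulas: one branch aggregates the shallowest-level outputs, the other the dilated map and deeper decoder outputs.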
A second aspect of the present invention discloses an image segmentation device based on multi-scale feature extraction, the device comprising:

an image size reduction unit, configured to input an original image into a data input layer of a pre-trained U2Net++ network for image size reduction processing to obtain a target image, wherein the image size of the target image is a target size, and the data input layer is used to reduce the image size of the original image to the target size;

a multi-scale feature extraction unit, configured to input the target image into a feature extraction layer of the U2Net++ network for multi-scale feature extraction to obtain a candidate feature map set, wherein the feature extraction layer comprises a plurality of levels, each level comprising at least one multi-scale feature extraction module; when a level comprises at least two multi-scale feature extraction modules, an attention module is arranged between every two adjacent multi-scale feature extraction modules in that level, each attention module being used to identify a region of interest of the feature maps input into it and to input the region of interest into the multi-scale feature extraction module immediately following that attention module; and the candidate feature map set comprises at least two candidate feature map subsets, the feature maps contained in each candidate feature map subset being used for feature fusion to obtain a segmentation feature map; and

a multi-side output fusion unit, configured to input the candidate feature map set into a multi-side output fusion layer of the U2Net++ network for feature fusion processing to obtain a target segmentation feature map.
As an optional implementation, in the second aspect of the present invention, when a level comprises only one multi-scale feature extraction module, that module serves as an encoder; when a level comprises two multi-scale feature extraction modules, the module in the first position of the level serves as an encoder and the module in the last position of the level serves as a decoder, with a skip path arranged between the encoder and the decoder of that level; when a level comprises more than two multi-scale feature extraction modules, the modules of that level other than the encoder and the decoder serve as dense residual modules, and skip paths are arranged between the encoder of the level and each dense residual module of the level, and between each dense residual module of the level and the decoder of the level; wherein the encoder is used to extract multi-scale features of the image input into it, the decoder is used to restore the resolution of the image input into it, and each dense residual module is used to integrate information from deeper levels into the level in which it is located;

wherein the more multi-scale feature extraction modules a level contains, the shallower that level is, the shallowest level being the level containing the largest number of multi-scale feature extraction modules; and the fewer multi-scale feature extraction modules a level contains, the deeper that level is, the deepest level being the level containing only one multi-scale feature extraction module;

a downsampling path is arranged between the encoder of each level and the encoder of the adjacent deeper level; an upsampling path is arranged between each multi-scale feature extraction module and a first adjacent module of the adjacent shallower level, wherein the position of the first adjacent module within its level immediately follows the position of that multi-scale feature extraction module within its own level; and

for each attention module, the feature maps input into that attention module comprise a first input feature map and a second input feature map; wherein, taking the multi-scale feature extraction module that receives the region of interest output by the attention module as a position reference module, the first input feature map is the feature map output by the multi-scale feature extraction module immediately preceding the position reference module, and the second input feature map is the feature map output by a second adjacent module of the adjacent deeper level, the position of the second adjacent module within its level immediately preceding the position of the position reference module within its own level.
As an optional implementation, in the second aspect of the present invention, the specific manner in which the multi-scale feature extraction unit inputs the target image into the feature extraction layer of the U2Net++ network for multi-scale feature extraction to obtain the candidate feature map set comprises:

inputting the target image into the encoder of the shallowest level in the feature extraction layer of the U2Net++ network, and performing downsampling operations on the target image along the downsampling paths down to the level adjacent to the deepest level, to obtain multi-scale downsampled feature maps;

performing, by the encoder of the deepest level, dilated convolution processing on the multi-scale downsampled feature map output by the encoder of the level adjacent to the deepest level, to obtain a dilated feature map;

for each multi-scale feature extraction module other than the encoders, performing, on the basis of the received region of interest, upsampling operations on all the feature maps input into that module, to obtain multi-scale upsampled feature maps;

determining all the multi-scale upsampled feature maps in the shallowest level as a first candidate feature map subset; and

determining the dilated feature map and the multi-scale upsampled feature maps output by the decoders of the levels other than the shallowest level as a second candidate feature map subset.
As an optional implementation, in the second aspect of the present invention, when the original image is an X-ray dental image, the target segmentation feature map is a segmentation feature map taking teeth as the segmentation target; and the types of the multi-scale feature extraction modules include a first feature extraction type and a second feature extraction type;

each multi-scale feature extraction module comprises a plurality of sub-levels, each sub-level comprising at least one multi-scale feature extraction submodule; when a sub-level comprises a multi-scale feature extraction submodule for receiving an input feature map, that sub-level is the shallowest sub-level, the submodule for receiving the input feature map being an input convolution submodule, and the shallowest sub-level further comprising a fusion submodule, namely a multi-scale feature extraction submodule used to fuse features and output a fused feature map; and when a sub-level comprises only one multi-scale feature extraction submodule, that sub-level is the deepest sub-level;

when a sub-level comprises at least two multi-scale feature extraction submodules, an attention submodule is arranged between every two adjacent submodules in that sub-level, and skip sub-paths are arranged between the submodules of that sub-level; for each attention submodule, the attention submodule receives the feature map output by the submodule immediately preceding it and the feature map output by the preceding submodule of the adjacent deeper sub-level, and the submodule immediately following the attention submodule receives the region of interest output by that attention submodule; and

a first feature extraction sub-path is arranged between the first submodule of each sub-level and the first submodule of the adjacent deeper sub-level; and a second feature extraction sub-path is arranged between the last submodule of each sub-level and the last submodule of the adjacent shallower sub-level.

As an optional implementation, in the second aspect of the present invention, when the type of the multi-scale feature extraction module is the first feature extraction type, the first feature extraction sub-path is a downsampling sub-path, and the submodules on the downsampling sub-path other than those of the shallowest sub-level are convolution submodules or downsampling submodules; the second feature extraction sub-path is an upsampling sub-path, and the submodules on the upsampling sub-path other than those of the shallowest sub-level are convolution submodules or upsampling submodules; and the remaining submodules in each sub-level other than the upsampling submodules and the downsampling submodules are dense convolution submodules.

As an optional implementation, in the second aspect of the present invention, when the type of the multi-scale feature extraction module is the second feature extraction type, the multi-scale feature extraction submodules in each sub-level of that module other than the shallowest sub-level are all dilated convolution submodules;

wherein the type of the encoder of the deepest level is the second feature extraction type.
As an optional implementation, in the second aspect of the present invention, the specific manner in which the multi-side output fusion unit inputs the candidate feature map set into the multi-side output fusion layer of the U2Net++ network for feature fusion processing to obtain the target segmentation feature map comprises:

performing, on the basis of a preset activation function, feature fusion processing on all the candidate feature maps in the first candidate feature map subset to obtain a first segmentation feature map; wherein the first segmentation feature map is calculated as:

Y0,5 = σ(C(Y0,1, Y0,2, Y0,3, Y0,4))

where C(·) denotes the concatenation operation, σ denotes the preset activation function, Y0,5 is the first segmentation feature map, and Y0,1, Y0,2, Y0,3 and Y0,4 are the feature maps in the first candidate feature map subset;

performing, on the basis of the preset activation function, feature fusion processing on all the candidate feature maps in the second candidate feature map subset to obtain a second segmentation feature map; wherein the second segmentation feature map is calculated as:

Y5,0 = σ(C(Y1,0, Y2,0, Y3,0, Y4,0))

where Y5,0 is the second segmentation feature map, and Y1,0, Y2,0, Y3,0 and Y4,0 are the feature maps in the second candidate feature map subset; and

performing, on the basis of the preset activation function, feature fusion processing on the first segmentation feature map and the second segmentation feature map to obtain the target segmentation feature map; wherein the target segmentation feature map is calculated as:

Y5,5 = σ(C(Y0,5, Y5,0))

where Y5,5 is the target segmentation feature map.
A third aspect of the present invention discloses another image segmentation device based on multi-scale feature extraction, the device comprising:

a memory storing executable program code; and

a processor coupled to the memory;

wherein the processor calls the executable program code stored in the memory to execute the image segmentation method based on multi-scale feature extraction disclosed in the first aspect of the present invention.

A fourth aspect of the present invention discloses a computer storage medium storing computer instructions which, when called, are used to execute the image segmentation method based on multi-scale feature extraction disclosed in the first aspect of the present invention.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects:

In the embodiments of the present invention, an original image is input into the data input layer of a pre-trained U2Net++ network for image size reduction processing to obtain a target image, the image size of the target image being a target size and the data input layer being used to reduce the image size of the original image to the target size; the target image is input into the feature extraction layer of the U2Net++ network for multi-scale feature extraction to obtain a candidate feature map set, the feature extraction layer comprising a plurality of levels, each level comprising at least one multi-scale feature extraction module, and, when a level comprises at least two multi-scale feature extraction modules, an attention module being arranged between every two adjacent multi-scale feature extraction modules in that level, each attention module being used to identify a region of interest of the feature maps input into it and to input the region of interest into the multi-scale feature extraction module immediately following it, the candidate feature map set comprising at least two candidate feature map subsets, the feature maps contained in each subset being used for feature fusion to obtain a segmentation feature map; and the candidate feature map set is input into the multi-side output fusion layer of the U2Net++ network for feature fusion processing to obtain a target segmentation feature map.

It can be seen that implementing the present invention reduces the image size of the original image input into the U2Net++ network to the target size through the data input layer to obtain the target image, performs multi-scale feature extraction on the target image through the feature extraction layer to obtain the candidate feature map set, and then performs feature fusion processing on the candidate feature map set through the multi-side output fusion layer to obtain the target segmentation feature map. This improves the efficiency and accuracy of image feature extraction; performing target detection on the basis of a more accurate feature extraction result and taking the detected target as the basis for image segmentation improves the accuracy with which the segmentation basis is determined, which is conducive to improving the efficiency and accuracy of the segmentation result and, in turn, the accuracy of any analysis performed on that result. In addition, arranging attention modules to identify key features in the image improves the accuracy with which those key features are extracted, thereby improving the reliability of the segmentation basis and further improving the precision of the segmentation result.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
为了更清楚地说明本发明实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其它的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings required for use in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings in the following description are only some embodiments of the present invention. For ordinary technicians in this field, other accompanying drawings can be obtained based on these accompanying drawings without paying creative work.
图1是本发明实施例公开的一种基于多尺度特征提取的图像分割方法的流程示意图;FIG1 is a schematic diagram of a flow chart of an image segmentation method based on multi-scale feature extraction disclosed in an embodiment of the present invention;
图2是本发明实施例公开的另一种基于多尺度特征提取的图像分割方法的流程示意图;FIG2 is a flow chart of another image segmentation method based on multi-scale feature extraction disclosed in an embodiment of the present invention;
图3是本发明实施例公开的一种U2Net++网络的架构示意图;FIG3 is a schematic diagram of the architecture of a U2 Net++ network disclosed in an embodiment of the present invention;
图4是本发明实施例公开的一种U2Net++网络的特征提取层的结构示意图;FIG4 is a schematic diagram of the structure of a feature extraction layer of a U2 Net++ network disclosed in an embodiment of the present invention;
图5是本发明实施例公开的一种U2Net++网络的特征提取层的多尺度特征提取模块的结构示意图;FIG5 is a schematic diagram of the structure of a multi-scale feature extraction module of a feature extraction layer of a U2 Net++ network disclosed in an embodiment of the present invention;
图6是本发明实施例公开的另一种U2Net++网络的特征提取层的多尺度特征提取模块的结构示意图;FIG6 is a schematic diagram of the structure of a multi-scale feature extraction module of a feature extraction layer of another U2 Net++ network disclosed in an embodiment of the present invention;
图7是本发明实施例公开的一种U2Net++网络的数据输入层的结构示意图;FIG7 is a schematic diagram of the structure of a data input layer of a U2 Net++ network disclosed in an embodiment of the present invention;
图8是本发明实施例公开的一种基于多尺度特征提取的图像分割方法的图像分割结果图;FIG8 is a diagram showing an image segmentation result of an image segmentation method based on multi-scale feature extraction disclosed in an embodiment of the present invention;
图9是本发明实施例公开的另一种基于多尺度特征提取的图像分割方法的图像分割结果图;FIG9 is a diagram showing an image segmentation result of another image segmentation method based on multi-scale feature extraction disclosed in an embodiment of the present invention;
图10是本发明实施例公开的一种基于多尺度特征提取的图像分割装置的结构示意图;FIG10 is a schematic diagram of the structure of an image segmentation device based on multi-scale feature extraction disclosed in an embodiment of the present invention;
图11是本发明实施例公开的另一种基于多尺度特征提取的图像分割装置的结构示意图。FIG11 is a schematic diagram of the structure of another image segmentation device based on multi-scale feature extraction disclosed in an embodiment of the present invention.
具体实施方式DETAILED DESCRIPTION
为了使本技术领域的人员更好地理解本发明方案,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其它实施例,都属于本发明保护的范围。In order to enable those skilled in the art to better understand the scheme of the present invention, the technical scheme in the embodiments of the present invention will be clearly and completely described below in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments are only part of the embodiments of the present invention, not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by ordinary technicians in this field without creative work are within the scope of protection of the present invention.
本发明的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别不同对象,而不是用于描述特定顺序。此外,术语“包括”和“具有”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或子模块的过程、方法、装置、产品或端没有限定于已列出的步骤或子模块,而是可选地还包括没有列出的步骤或子模块,或可选地还包括对于这些过程、方法、产品或端固有的其它步骤或子模块。The terms "first", "second", etc. in the specification and claims of the present invention and the above-mentioned drawings are used to distinguish different objects, rather than to describe a specific order. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusions. For example, a process, method, device, product or terminal including a series of steps or submodules is not limited to the listed steps or submodules, but optionally also includes steps or submodules that are not listed, or optionally also includes other steps or submodules inherent to these processes, methods, products or terminals.
在本文中提及“实施例”意味着,结合实施例描述的特定特征、结构或特性可以包含在本发明的至少一个实施例中。在说明书中的各个位置出现该短语并不一定均是指相同的实施例,也不是与其它实施例互斥的独立的或备选的实施例。本领域技术人员显式地和隐式地理解的是,本文所描述的实施例可以与其它实施例相结合。Reference to "embodiments" herein means that a particular feature, structure, or characteristic described in conjunction with the embodiments may be included in at least one embodiment of the present invention. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor is it an independent or alternative embodiment that is mutually exclusive with other embodiments. It is explicitly and implicitly understood by those skilled in the art that the embodiments described herein may be combined with other embodiments.
本发明公开了一种基于多尺度特征提取的图像分割方法及装置,能够通过U2Net++网络的数据输入层对输入至U2Net++网络的原始图像的图像尺寸缩小至目标尺寸,得到目标图像,并基于U2Net++网络的特征提取层对目标图像进行多尺度特征提取,得到候选特征图集合,然后通过U2Net++网络的多侧输出融合层对候选特征图集合进行特征融合处理,得到目标分割特征图,能够提高图像特征提取结果的提取效率和提取准确性,基于准确度更高的图像特征提取结果进行目标检测,以检测到的目标作为图像分割依据,能够提高图像分割依据的确定准确性,有利于提高图像分割结果的分割效率和分割准确性,从而有利于提高基于图像分割结果进行分析的准确性;以及,通过设置注意力模块,以识别图像中的关键特征,能够提高图像的关键特征的提取准确性,从而提高图像分割依据的确定可靠性,有利于进一步提高图像分割结果的精准性。以下分别进行详细说明。The present invention discloses an image segmentation method and device based on multi-scale feature extraction, which can reduce the image size of the original image input to the U2 Net++ network to the target size through the data input layer of the U2 Net++ network to obtain the target image, and perform multi-scale feature extraction on the target image based on the feature extraction layer of the U2 Net++ network to obtain a candidate feature map set, and then perform feature fusion processing on the candidate feature map set through the multi-side output fusion layer of the U2 Net++ network to obtain a target segmentation feature map, which can improve the extraction efficiency and extraction accuracy of the image feature extraction result, perform target detection based on the image feature extraction result with higher accuracy, and use the detected target as the image segmentation basis, which can improve the determination accuracy of the image segmentation basis, which is conducive to improving the segmentation efficiency and segmentation accuracy of the image segmentation result, thereby facilitating improving the accuracy of analysis based on the image segmentation result; and, by setting an attention module to identify key features in the image, the extraction accuracy of the key features of the image can be improved, thereby improving the determination reliability of the image segmentation basis, which is conducive to further improving the accuracy of the image segmentation result. The following are detailed descriptions.
实施例一Embodiment 1
请参阅图1,图1是本发明实施例公开的一种基于多尺度特征提取的图像分割方法的流程示意图。其中,图1所描述的基于多尺度特征提取的图像分割方法可以应用于基于多尺度特征提取的图像分割装置中,该装置可以包括计算设备、计算终端、计算系统和服务器中的一种,其中,服务器包括本地服务器或云服务器,本发明实施例不做限定。如图1所示,该基于多尺度特征提取的图像分割方法可以包括以下操作:Please refer to FIG. 1, which is a flowchart of an image segmentation method based on multi-scale feature extraction disclosed in an embodiment of the present invention. The image segmentation method based on multi-scale feature extraction described in FIG. 1 can be applied to an image segmentation device based on multi-scale feature extraction, which may include a computing device, a computing terminal, a computing system, and a server, wherein the server includes a local server or a cloud server, which is not limited in the embodiment of the present invention. As shown in FIG. 1, the image segmentation method based on multi-scale feature extraction may include the following operations:
101、将原始图像输入至预先训练好的U2Net++网络的数据输入层进行图像尺寸缩小处理,得到目标图像。101. Input the original image into the data input layer of the pre-trained U2 Net++ network to reduce the image size and obtain the target image.
本发明实施例中,U2Net++网络是一个端到端可训练的图像分割网络,且U2Net++网络由数据输入层、特征提取层和多侧输出融合层组成。原始图像可以为生物医学图像、自然图像和其它需要进行目标检测以及图像分割的图像中的一种,本发明实施例不做限定;其中,生物医学图像可以为X光图像,进一步的,X光图像可以为X光牙齿图像或针对其它部位的X光图像,其中,X光牙齿图像可用于分析牙齿问题,本发明实施例不做限定;此外,原始图像可以为拍摄后直接获取到的图像,也可以为经过增强处理后得到的图像,本发明实施例不做限定。目标图像的图像尺寸为目标尺寸,且数据输入层用于将原始图像的图像尺寸缩小至目标尺寸,即降低输入至U2Net++网络的原始图像的分辨率和复杂度,以提高后续对图像进行特征提取的效率。其中,U2Net++网络的架构示意图可以如图3所示。In the embodiment of the present invention, the U2 Net++ network is an end-to-end trainable image segmentation network, and the U2 Net++ network is composed of a data input layer, a feature extraction layer, and a multi-side output fusion layer. The original image can be one of a biomedical image, a natural image, and other images that require target detection and image segmentation, which is not limited in the embodiment of the present invention; wherein the biomedical image can be an X-ray image, and further, the X-ray image can be an X-ray dental image or an X-ray image of other parts, wherein the X-ray dental image can be used to analyze dental problems, which is not limited in the embodiment of the present invention; in addition, the original image can be an image directly obtained after shooting, or an image obtained after enhancement processing, which is not limited in the embodiment of the present invention. The image size of the target image is the target size, and the data input layer is used to reduce the image size of the original image to the target size, that is, to reduce the resolution and complexity of the original image input to the U2 Net++ network, so as to improve the efficiency of subsequent feature extraction of the image. Among them, the architecture diagram of the U2 Net++ network can be shown in Figure 3.
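As an illustrative sketch only (the patent does not fix the concrete resizing operator of the data input layer), the size-reduction step can be modeled as an average-pooling downscale; the 512→256 sizes and the pooling factor below are assumptions, not values taken from the embodiments:

```python
import numpy as np

def reduce_image_size(image: np.ndarray, factor: int) -> np.ndarray:
    """Downscale an H x W image by integer `factor` using average pooling.
    Hypothetical stand-in for the data input layer's size reduction."""
    h, w = image.shape
    h2, w2 = h - h % factor, w - w % factor  # crop so the factor divides evenly
    cropped = image[:h2, :w2]
    return cropped.reshape(h2 // factor, factor, w2 // factor, factor).mean(axis=(1, 3))

# A 512x512 "original image" reduced to a 256x256 "target image" (assumed sizes).
original = np.random.rand(512, 512)
target = reduce_image_size(original, 2)
print(target.shape)  # (256, 256)
```

Any resolution-lowering operator (strided convolution, bilinear resize) could play the same role; the point is only that the target image entering the feature extraction layer has the reduced target size.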
102、将目标图像输入至U2Net++网络的特征提取层进行多尺度特征提取,得到候选特征图集合。102. Input the target image into the feature extraction layer of the U2 Net++ network to perform multi-scale feature extraction to obtain a set of candidate feature maps.
本发明实施例中,特征提取层包括多个层级,每个层级包括至少一个多尺度特征提取模块;当某一层级包括至少两个多尺度特征提取模块时,该层级中每相邻两个多尺度特征提取模块之间均设置有注意力模块,每个注意力模块用于识别输入至该注意力模块的特征图的感兴趣区域并将感兴趣区域输入至该注意力模块的在后相邻的多尺度特征提取模块;候选特征图集合包括至少两个候选特征图子集合,每个候选特征图子集合包括至少一个特征图,且每个候选特征图子集合所包含的特征图用于特征融合以获得分割特征图;示例性的,候选特征图集合包括两个候选特征图子集合,每个候选特征图子集合包括四个特征图。其中,注意力模块为引入注意力机制的模块,注意力机制可以为通道注意力机制、空间注意力机制和结合通道注意力和空间注意力的机制,进一步的,注意力模块具体可以为SE(Squeeze-and-Excitation,压缩与激活)模块、CBAM(Convolutional Block Attention Module,卷积块注意力模块)、ECA(Efficient Channel Attention,高效通道注意力)模块和其它具体类型的注意力模块中的一种,本发明实施例不做限定。In an embodiment of the present invention, the feature extraction layer includes multiple levels, each level includes at least one multi-scale feature extraction module; when a level includes at least two multi-scale feature extraction modules, an attention module is arranged between each two adjacent multi-scale feature extraction modules in the level, and each attention module is used to identify the region of interest of the feature map input to the attention module and input the region of interest to the subsequent adjacent multi-scale feature extraction module of the attention module; the candidate feature map set includes at least two candidate feature map subsets, each candidate feature map subset includes at least one feature map, and the feature maps contained in each candidate feature map subset are used for feature fusion to obtain a segmentation feature map; exemplarily, the candidate feature map set includes two candidate feature map subsets, each candidate feature map subset includes four feature maps. Among them, the attention module is a module that introduces the attention mechanism. The attention mechanism can be a channel attention mechanism, a spatial attention mechanism, or a mechanism combining channel attention and spatial attention. Further, the attention module can specifically be an SE (Squeeze-and-Excitation) module, a CBAM (Convolutional Block Attention Module), an ECA (Efficient Channel Attention) module, or one of other specific types of attention modules, which is not limited in the embodiments of the present invention.
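As a minimal sketch of one of the attention options named above (the SE channel-attention module), written in NumPy for illustration; the reduction ratio, channel count, and random weights below are assumptions, not the patent's trained parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_attention(feat: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Squeeze-and-Excitation: reweight the channels of a (C, H, W) feature map."""
    squeeze = feat.mean(axis=(1, 2))                       # global average pool -> (C,)
    excite = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))   # FC-ReLU-FC-sigmoid -> (C,)
    return feat * excite[:, None, None]                    # scale each channel

rng = np.random.default_rng(0)
c, r = 8, 2                                    # channels and reduction ratio (assumed)
feat = rng.standard_normal((c, 16, 16))
w1 = rng.standard_normal((c // r, c)) * 0.1    # squeeze FC weights (random stand-ins)
w2 = rng.standard_normal((c, c // r)) * 0.1    # excitation FC weights
out = se_attention(feat, w1, w2)
print(out.shape)  # (8, 16, 16)
```

CBAM would add a spatial-attention branch on top of this channel reweighting, and ECA would replace the two FC layers with a 1-D convolution; the output shape in every case equals the input shape, so the module can sit between any two adjacent multi-scale feature extraction modules.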
103、将候选特征图集合输入至U2Net++网络的多侧输出融合层进行特征融合处理,得到目标分割特征图。103. Input the candidate feature map set into the multi-side output fusion layer of the U2 Net++ network for feature fusion processing to obtain the target segmentation feature map.
本发明实施例中,多侧输出融合层采用了MSOF策略(Multiple Side-Output Fusion,多侧输出融合)进行深度监督。当U2Net++网络的架构示意图如图3所示时,目标分割特征图为图3中的Y5,5。In the embodiment of the present invention, the multi-side output fusion layer adopts the MSOF (Multiple Side-Output Fusion) strategy for deep supervision. When the architecture diagram of the U2Net++ network is as shown in FIG3, the target segmentation feature map is Y5,5 in FIG3.
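The MSOF strategy is only named above; as a hedged sketch, multi-side output fusion can be modeled as upsampling each side-output map to a common resolution and fusing them with a weighted sum followed by a sigmoid. The four side-output sizes and the equal fusion weights (standing in for a learned 1×1 convolution) are assumptions:

```python
import numpy as np

def upsample_nearest(m: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour upsample of a square map to size x size."""
    rep = size // m.shape[0]
    return np.repeat(np.repeat(m, rep, axis=0), rep, axis=1)

def msof_fuse(side_outputs, weights=None) -> np.ndarray:
    """Fuse several side outputs into one segmentation map (illustrative only)."""
    size = max(m.shape[0] for m in side_outputs)
    ups = [upsample_nearest(m, size) for m in side_outputs]
    if weights is None:                        # equal weights stand in for a 1x1 conv
        weights = [1.0 / len(ups)] * len(ups)
    fused = sum(w * u for w, u in zip(weights, ups))
    return 1.0 / (1.0 + np.exp(-fused))        # sigmoid -> probability map

sides = [np.random.rand(s, s) for s in (8, 16, 32, 64)]  # side outputs at 4 scales
seg = msof_fuse(sides)
print(seg.shape)  # (64, 64)
```

Deep supervision then attaches a loss to each upsampled side output as well as to the fused map, which is what makes every decoder column of the nested network directly trainable.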
可见,实施本发明实施例所描述的方法能够通过U2Net++网络的数据输入层对输入至U2Net++网络的原始图像的图像尺寸缩小至目标尺寸,得到目标图像,并基于U2Net++网络的特征提取层对目标图像进行多尺度特征提取,得到候选特征图集合,然后通过U2Net++网络的多侧输出融合层对候选特征图集合进行特征融合处理,得到目标分割特征图,能够提高图像特征提取结果的提取效率和提取准确性,基于准确度更高的图像特征提取结果进行目标检测,以检测到的目标作为图像分割依据,能够提高图像分割依据的确定准确性,有利于提高图像分割结果的分割效率和分割准确性,从而有利于提高基于图像分割结果进行分析的准确性;以及,通过设置注意力模块,以识别图像中的关键特征,能够提高图像的关键特征的提取准确性,从而提高图像分割依据的确定可靠性,有利于进一步提高图像分割结果的精准性。It can be seen that the method described in the embodiment of the present invention can reduce the image size of the original image input to the U2 Net++ network to the target size through the data input layer of the U2 Net++ network to obtain the target image, and perform multi-scale feature extraction on the target image based on the feature extraction layer of the U2 Net++ network to obtain a set of candidate feature maps, and then perform feature fusion processing on the candidate feature map set through the multi-side output fusion layer of the U2 Net++ network to obtain a target segmentation feature map, which can improve the extraction efficiency and extraction accuracy of the image feature extraction result, perform target detection based on the image feature extraction result with higher accuracy, and use the detected target as the image segmentation basis, which can improve the determination accuracy of the image segmentation basis, which is conducive to improving the segmentation efficiency and segmentation accuracy of the image segmentation result, thereby facilitating improving the accuracy of analysis based on the image segmentation result; and, by setting an attention module to identify key features in the image, the extraction accuracy of the key features of the image can be improved, thereby improving the determination reliability of the image segmentation basis, which is conducive to further improving the accuracy of the image segmentation result.
在一个可选的实施例中,当某一层级仅包括一个多尺度特征提取模块时,该多尺度特征提取模块用作编码器;当某一层级包括两个多尺度特征提取模块时,在该层级位置最先的多尺度特征提取模块用作编码器,在该层级位置最后的多尺度特征提取模块用作解码器,且该层级的编码器与该层级的解码器之间设置有跳跃路径;当某一层级包括两个以上多尺度特征提取模块时,该层级除编码器和解码器以外的其余多尺度特征提取模块用作密集残差模块,且该层级的编码器与该层级的每个密集残差模块之间、该层级的每个密集残差模块与该层级的解码器之间均设置有跳跃路径;其中,编码器用于提取输入至该编码器的图像的多尺度特征;解码器用于恢复输入至该解码器的图像的分辨率;密集残差模块用于将深层级的信息集成到该密集残差模块所处的层级;In an optional embodiment, when a certain level includes only one multi-scale feature extraction module, the multi-scale feature extraction module is used as an encoder; when a certain level includes two multi-scale feature extraction modules, the multi-scale feature extraction module at the first position of the level is used as an encoder, and the multi-scale feature extraction module at the last position of the level is used as a decoder, and a jump path is set between the encoder of the level and the decoder of the level; when a certain level includes more than two multi-scale feature extraction modules, the remaining multi-scale feature extraction modules of the level except the encoder and the decoder are used as dense residual modules, and a jump path is set between the encoder of the level and each dense residual module of the level, and between each dense residual module of the level and the decoder of the level; wherein the encoder is used to extract multi-scale features of an image input to the encoder; the decoder is used to restore the resolution of the image input to the decoder; the dense residual module is used to integrate deep-level information into the level where the dense residual module is located;
其中,若某一层级所包含的多尺度特征提取模块的数量越多,则该层级越浅,最浅层级为包含多尺度特征提取模块的数量最多的层级;若某一层级所包含的多尺度特征提取模块的数量越少,则该层级越深,最深层级为仅包含一个多尺度特征提取模块的层级;Among them, if a certain level contains more multi-scale feature extraction modules, the level is shallower, and the shallowest level is the level containing the largest number of multi-scale feature extraction modules; if a certain level contains fewer multi-scale feature extraction modules, the level is deeper, and the deepest level is the level containing only one multi-scale feature extraction module;
每一层级的编码器与相邻深层级的编码器之间设置有下采样路径;每一多尺度特征提取模块与相邻浅层级的第一相邻模块之间设置有上采样路径,其中,第一相邻模块在所处层级中的位置相较于该多尺度特征提取模块在所处层级中的位置在后相邻;A downsampling path is provided between the encoder of each level and the encoder of the adjacent deep level; an upsampling path is provided between each multi-scale feature extraction module and the first adjacent module of the adjacent shallow level, wherein the position of the first adjacent module in the level is adjacent to the position of the multi-scale feature extraction module in the level;
对于每一注意力模块,输入至该注意力模块的特征图包括第一输入特征图和第二输入特征图;其中,以接收该注意力模块所输出的感兴趣区域的多尺度特征提取模块作为位置基准模块,位置基准模块的在先相邻的多尺度特征提取模块所输出的特征图为第一输入特征图;位置基准模块的相邻深层级的第二相邻模块所输出的特征图为第二输入特征图,第二相邻模块在所处层级中的位置相较于位置基准模块在所处层级中的位置在先相邻。For each attention module, the feature map input to the attention module includes a first input feature map and a second input feature map; wherein, the multi-scale feature extraction module that receives the area of interest output by the attention module is used as a position reference module, and the feature map output by the prior adjacent multi-scale feature extraction module of the position reference module is the first input feature map; the feature map output by the second adjacent module of the adjacent deep level of the position reference module is the second input feature map, and the position of the second adjacent module in the level is prior to the position of the position reference module in the level.
需要说明的是,在多尺度特征提取模块用作编码器时,多尺度特征提取模块可用于提取图像特征以及捕获并保留更多的上下文信息;在多尺度特征提取模块用作解码器时,每个解码器会级联上一级(深层级)的解码器的上采样特征图以及同一级(同一层级)的编码器的特征图,能够在解码时从编码器中获取更多的上下文信息和低级别的特征,以及通过级联这些特征图,从而捕获不同尺度和语义级别的信息。在多尺度特征提取模块用作密集残差模块时,密集残差模块能够将深层级的信息集成到当前层级,从而捕捉不同尺度和抽象级别的特征,增强编码器所输出的特征图的语义表达能力,使得这些特征图与解码器所需的特征图更加接近,从而提高网络的表现力和精度。It should be noted that when the multi-scale feature extraction module is used as an encoder, the multi-scale feature extraction module can be used to extract image features and capture and retain more contextual information; when the multi-scale feature extraction module is used as a decoder, each decoder will cascade the upsampled feature map of the decoder at the previous level (deep level) and the feature map of the encoder at the same level (same layer), so that more contextual information and low-level features can be obtained from the encoder during decoding, and information of different scales and semantic levels can be captured by cascading these feature maps. When the multi-scale feature extraction module is used as a dense residual module, the dense residual module can integrate the deep level information into the current level, so as to capture features of different scales and abstract levels, enhance the semantic expression ability of the feature map output by the encoder, and make these feature maps closer to the feature maps required by the decoder, thereby improving the expressiveness and accuracy of the network.
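The cascade described above (a decoder concatenating the upsampled feature map of the deeper-level decoder with the same-level encoder and dense residual outputs) can be sketched at the tensor level; the channel counts and resolutions below are arbitrary illustrative values:

```python
import numpy as np

def upsample2x(feat: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsample of a (C, H, W) feature map."""
    return np.repeat(np.repeat(feat, 2, axis=1), 2, axis=2)

def decoder_node(same_level_feats, deeper_feat):
    """Concatenate same-level feature maps with the upsampled deeper-level feature,
    mimicking the skip-path cascade described above (illustrative only)."""
    return np.concatenate(list(same_level_feats) + [upsample2x(deeper_feat)], axis=0)

enc = np.random.rand(4, 32, 32)      # same-level encoder output (assumed shape)
dense = np.random.rand(4, 32, 32)    # same-level dense residual module output
deep = np.random.rand(8, 16, 16)     # deeper-level feature at half resolution
cascaded = decoder_node([enc, dense], deep)
print(cascaded.shape)  # (16, 32, 32): 4 + 4 + 8 channels at the shallow resolution
```

A convolution would follow the concatenation in the real module; the sketch only shows why the decoder simultaneously sees low-level (encoder), intermediate (dense residual) and high-level (upsampled deeper) information.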
在本发明实施例中,示例性的,U2Net++网络的特征提取层可以包括五级编码器、四级解码器和一系列嵌套的密集残差模块,此时,特征提取层的结构示意图可以如图4所示。其中,上述多尺度特征提取模块可以为图4中的RSUP(ReSidual U-blocks Plus)模块。RSUP模块通过融合不同尺度的接收域,可以从不同尺度捕获更多的上下文信息。以及,RSUP模块与其它残差模块相比,RSUP基于Unet++结构进行卷积,其它残差模块采用单层卷积;RSUP利用由权重层组成的局部特征代替原始特征替换,以使U2Net++网络可以从每个RSUP模块的多个尺度中直接提取特征,从而更准确地捕捉到目标特征。在多尺度特征提取模块用作编码器时,RSUP-L模块用于提取图像特征,L用于表示RSUP模块的深度,且L是根据输入特征提取层的特征图的尺寸大小确定得到的;当RSUP模块的深度L越大,网络越深,感知区域也越大,能够感知到更广泛的局部和全局特征,RSUP-L模块可以如图4中所示的RSUP7、RSUP6、RSUP5和RSUP4;而RSUP4F模块则用于捕获并保留更多的上下文信息。In an embodiment of the present invention, exemplarily, the feature extraction layer of the U2 Net++ network may include a five-level encoder, a four-level decoder and a series of nested dense residual modules. In this case, the structural schematic diagram of the feature extraction layer may be shown in FIG4 . Among them, the multi-scale feature extraction module may be the RSUP (ReSidual U-blocks Plus) module in FIG4 . The RSUP module can capture more contextual information from different scales by fusing receptive fields of different scales. In addition, compared with other residual modules, the RSUP module performs convolution based on the Unet++ structure, while other residual modules use single-layer convolution; RSUP uses local features composed of weight layers instead of original feature replacement, so that the U2 Net++ network can directly extract features from multiple scales of each RSUP module, thereby more accurately capturing the target features. When the multi-scale feature extraction module is used as an encoder, the RSUP-L module is used to extract image features, L is used to represent the depth of the RSUP module, and L is determined according to the size of the feature map of the input feature extraction layer; when the depth L of the RSUP module is larger, the network is deeper, the perception area is larger, and it can perceive a wider range of local and global features. 
The RSUP-L module can be RSUP7, RSUP6, RSUP5 and RSUP4 as shown in Figure 4; and the RSUP4F module is used to capture and retain more contextual information.
可见,该可选的实施例能够通过设置多层级的多尺度特征提取模块,以及将多尺度特征提取模块用作编码器、解码器和密集残差模块,并通过从不断下采样的特征图中提取多个尺度的特征,然后通过逐步上采样、合并与卷积的方式将多尺度特征解码为高分辨率的特征图,实现了同时捕捉全局背景信息和局部细节信息,能够提高图像特征的提取全面性和提取精准性,从而提高图像特征提取结果的全面性和可靠性,在基于更为可靠的图像特征提取结果进行图像分割,有利于提高图像分割结果的准确性;以及,通过设置嵌套的密集残差模块和跳跃路径,能够将浅层级的特征和深层级的特征相融合,能够有效的减少梯度消失和网络退化问题,能够提高U2Net++网络的稳定性,与此同时,还能够保留原始图像中的特征,能够提高图像特征提取结果的完整性和可靠性,有利于提高图像分割结果的精准性。It can be seen that this optional embodiment can achieve simultaneous capture of global background information and local detail information by setting up a multi-level multi-scale feature extraction module, and using the multi-scale feature extraction module as an encoder, a decoder and a dense residual module, and by extracting features of multiple scales from the continuously downsampled feature map, and then decoding the multi-scale features into a high-resolution feature map by gradually upsampling, merging and convolution. It can improve the comprehensiveness and accuracy of image feature extraction, thereby improving the comprehensiveness and reliability of image feature extraction results, and performing image segmentation based on more reliable image feature extraction results, which is conducive to improving the accuracy of image segmentation results; and, by setting nested dense residual modules and skip paths, shallow-level features and deep-level features can be integrated, which can effectively reduce the problems of gradient disappearance and network degradation, and can improve the stability of theU2Net ++ network. At the same time, it can also retain the features in the original image, which can improve the integrity and reliability of the image feature extraction results, and is conducive to improving the accuracy of the image segmentation results.
在该可选的实施例中,可选的,当原始图像为X光牙齿图像时,目标分割特征图为以牙齿作为分割目标的分割特征图;多尺度特征提取模块的类型包括第一特征提取类型和第二特征提取类型;In this optional embodiment, optionally, when the original image is an X-ray tooth image, the target segmentation feature map is a segmentation feature map with teeth as the segmentation target; the type of the multi-scale feature extraction module includes a first feature extraction type and a second feature extraction type;
多尺度特征提取模块包括多个子层级,每一子层级包括至少一个多尺度特征提取子模块;当某一子层级中包括用于接收输入特征图的多尺度特征提取子模块时,该子层级为最浅子层级,用于接收输入特征图的多尺度特征提取子模块为输入卷积子模块,最浅子层级还包括融合子模块,融合子模块为用作融合特征并输出融合特征图的多尺度特征提取子模块;当某一子层级中仅包括一个多尺度特征提取子模块时,该子层级为最深子层级;The multi-scale feature extraction module includes multiple sub-levels, each of which includes at least one multi-scale feature extraction sub-module; when a sub-level includes a multi-scale feature extraction sub-module for receiving an input feature map, the sub-level is the shallowest sub-level, and the multi-scale feature extraction sub-module for receiving an input feature map is an input convolution sub-module, and the shallowest sub-level also includes a fusion sub-module, which is a multi-scale feature extraction sub-module used for fusion features and outputting a fusion feature map; when a sub-level includes only one multi-scale feature extraction sub-module, the sub-level is the deepest sub-level;
当某一子层级包括至少两个多尺度特征提取子模块时,该层级中每相邻两个多尺度特征提取子模块之间均设置有注意力子模块,且该层级中各个多尺度特征提取子模块之间设置有跳跃子路径;对于每一注意力子模块,该注意力子模块接收在先相邻的多尺度特征提取子模块所输出的特征图以及相邻深子层级的位置在先的多尺度特征提取子模块所输出的特征图,且该注意力子模块的在后相邻的多尺度特征提取子模块接收该注意力子模块所输出的感兴趣区域;When a certain sub-level includes at least two multi-scale feature extraction sub-modules, an attention sub-module is provided between each two adjacent multi-scale feature extraction sub-modules in the level, and a jump sub-path is provided between each multi-scale feature extraction sub-module in the level; for each attention sub-module, the attention sub-module receives a feature map output by a previous adjacent multi-scale feature extraction sub-module and a feature map output by a previous multi-scale feature extraction sub-module in an adjacent deep sub-level, and a subsequent adjacent multi-scale feature extraction sub-module of the attention sub-module receives a region of interest output by the attention sub-module;
每一子层级的位置最先的多尺度特征提取子模块与相邻深层级的位置最先的多尺度特征提取子模块之间设置有第一特征提取子路径;每一子层级的位置最后的多尺度特征提取子模块与相邻浅层级的位置最后的多尺度特征提取子模块之间设置有第二特征提取子路径。A first feature extraction subpath is set between the first multi-scale feature extraction submodule at each sub-level and the first multi-scale feature extraction submodule at the adjacent deep level; a second feature extraction subpath is set between the last multi-scale feature extraction submodule at each sub-level and the last multi-scale feature extraction submodule at the adjacent shallow level.
需要说明的是,输入卷积子模块对输入特征图进行卷积计算,并输出一个通道数与输出特征图的通道数相等的特征图。多尺度特征提取子模块可以用作卷积子模块、上采样子模块、下采样子模块、密集卷积子模块和膨胀卷积子模块中的一种,可以根据需求对多尺度特征提取子模块的类型进行确定;可选的,卷积子模块可以包括Conv+BN(Batch Normalization,批归一化)+ReLU(线性整流函数,是一种激活函数)的运算,上采样子模块可以包括上采样+卷积+BN+ReLU的运算,下采样子模块可以包括下采样+卷积+BN+ReLU的运算,膨胀卷积子模块可以包括Conv2d+BN+ReLU的运算,本发明实施例不做限定。It should be noted that the input convolution submodule performs convolution calculation on the input feature map and outputs a feature map with the same number of channels as the output feature map. The multi-scale feature extraction submodule can be used as one of a convolution submodule, an upsampling submodule, a downsampling submodule, a dense convolution submodule and a dilated convolution submodule, and the type of the multi-scale feature extraction submodule can be determined according to demand; optionally, the convolution submodule may include the operation of Conv+BN (Batch Normalization)+ReLU (a rectified linear activation function), the upsampling submodule may include the operation of upsampling+convolution+BN+ReLU, the downsampling submodule may include the operation of downsampling+convolution+BN+ReLU, and the dilated convolution submodule may include the operation of Conv2d+BN+ReLU, which is not limited in the embodiment of the present invention.
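The Conv+BN+ReLU unit named above can be sketched with a naive NumPy convolution; the kernel shapes are illustrative, and the normalization below is a single-sample stand-in for batch normalization (true BN would use batch statistics and learned scale/shift):

```python
import numpy as np

def conv3x3(feat: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Naive 3x3 'same' convolution: (Cin,H,W) with (Cout,Cin,3,3) -> (Cout,H,W)."""
    cin, h, w = feat.shape
    cout = kernels.shape[0]
    padded = np.pad(feat, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((cout, h, w))
    for o in range(cout):
        for i in range(cin):
            for dy in range(3):
                for dx in range(3):
                    out[o] += kernels[o, i, dy, dx] * padded[i, dy:dy + h, dx:dx + w]
    return out

def bn_relu(feat: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Per-channel normalization (single-sample stand-in for BN) followed by ReLU."""
    mean = feat.mean(axis=(1, 2), keepdims=True)
    var = feat.var(axis=(1, 2), keepdims=True)
    return np.maximum((feat - mean) / np.sqrt(var + eps), 0.0)

rng = np.random.default_rng(1)
x = rng.standard_normal((2, 8, 8))           # input feature map, Cin = 2 (assumed)
k = rng.standard_normal((4, 2, 3, 3)) * 0.1  # 3x3 kernels, Cout = 4 (assumed)
y = bn_relu(conv3x3(x, k))                   # the Conv+BN+ReLU convolution sub-module
print(y.shape)  # (4, 8, 8)
```

The up/downsampling sub-modules differ only in prepending a resolution change, and the dilated variant spaces the 3×3 taps by the dilation rate d to enlarge the receptive field without extra parameters.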
可见,该可选的实施例还能够通过在多尺度特征提取模块中设置有多个子层级的多尺度特征提取子模块,增加了整个网络的深度,以及通过融合不同尺度的接收域,从不同尺度捕获更多的上下文信息,能够提高扩大感受野范围,有利于提取更多、更丰富的图像特征,从而提高图像特征提取结果的全面性。It can be seen that this optional embodiment can also increase the depth of the entire network by setting a multi-scale feature extraction sub-module with multiple sub-levels in the multi-scale feature extraction module, and by fusing receptive fields of different scales, capture more contextual information from different scales, thereby improving and expanding the receptive field range, which is conducive to extracting more and richer image features, thereby improving the comprehensiveness of the image feature extraction results.
在该可选的实施例中,进一步可选的,当多尺度特征提取模块的类型为第一特征提取类型时,第一特征提取子路径为下采样子路径,且除最浅子层级以外的在下采样路径上的多尺度特征提取子模块为卷积子模块或下采样子模块;第一特征提取子路径为上采样子路径,且除最浅子层级以外的在上采样路径上的多尺度特征提取子模块为卷积子模块或上采样子模块;每一子层级中除上采样子模块和下采样子模块以外的其余多尺度特征提取子模块为密集卷积子模块。In this optional embodiment, further optionally, when the type of the multi-scale feature extraction module is the first feature extraction type, the first feature extraction sub-path is a downsampling sub-path, and the multi-scale feature extraction sub-module on the downsampling path except the shallowest sub-level is a convolution sub-module or a downsampling sub-module; the first feature extraction sub-path is an upsampling sub-path, and the multi-scale feature extraction sub-module on the upsampling path except the shallowest sub-level is a convolution sub-module or an upsampling sub-module; the remaining multi-scale feature extraction sub-modules in each sub-level except the upsampling sub-module and the downsampling sub-module are dense convolution sub-modules.
其中,除最深层级的编码器以外,在特征提取层中的其余所有多尺度特征提取模块的类型均为第一特征提取类型。Among them, except for the encoder at the deepest level, the types of all other multi-scale feature extraction modules in the feature extraction layer are the first feature extraction type.
需要说明的是,在第一特征提取类型的多尺度特征提取模块中,输入卷积子模块将输入特征图X(尺寸为H×W×Cin)进行卷积计算,生成一个通道数为Cout的中间图F1(x)。在下采样路径中,通过中间特征图F1(x)作为输入,使用RB(F1(x))来提取和编码多尺度上下文信息,并将局部特征和多尺度特征通过求和操作进行融合,融合公式如下:It should be noted that in the multi-scale feature extraction module of the first feature extraction type, the input convolution submodule performs convolution calculation on the input feature map X (size is H×W×Cin ) to generate an intermediate map F1 (x) with a channel number Cout . In the downsampling path, the intermediate feature map F1 (x) is used as input, RB (F1 (x)) is used to extract and encode multi-scale context information, and the local features and multi-scale features are fused through a summation operation. The fusion formula is as follows:
HRSU(x)=F1(x)+RB(F1(x))
式中,F1(x)为中间特征图(局部特征),RB(F1(x))为多尺度特征。In the formula, F1 (x) is the intermediate feature map (local feature), and RB(F1 (x)) is the multi-scale feature.
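As a toy illustration of the summation fusion HRSU(x)=F1(x)+RB(F1(x)) above, the sketch below uses hypothetical scalar stand-ins for the input convolution sub-module F1 and the multi-scale branch RB; it demonstrates only the fusion structure, not the actual sub-modules:

```python
def f1(x):
    # hypothetical stand-in for the input convolution sub-module F1(.)
    return 0.5 * x

def rb(f):
    # hypothetical stand-in for the multi-scale branch RB(.)
    return f + 1.0

def hrsu(x):
    local = f1(x)         # local features F1(x)
    multi = rb(local)     # multi-scale features RB(F1(x))
    return local + multi  # summation fusion F1(x) + RB(F1(x))
```

The key point is that the local features are added to, rather than replaced by, the multi-scale branch output.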
密集卷积子模块能够表示编码器输出的特征图中的语义信息与解码器所需的特征图之间的联系,其中,多尺度特征提取子模块的特征映射栈的计算公式如下:The dense convolution submodule can represent the connection between the semantic information in the feature map output by the encoder and the feature map required by the decoder. The calculation formula of the feature map stack of the multi-scale feature extraction submodule is as follows:
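The stack formula referenced here is not reproduced in the text; a plausible reconstruction, consistent with the symbol definitions that follow and the usual nested dense-skip form, is:

```latex
X(i,j) = H\big(\big[\,[X(i,k)]_{k=0}^{j-1},\; u(X(i+1,\,j-1))\,\big]\big), \quad j > 0
```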
式中,H(-)表示带有激活函数的卷积操作,u(-)表示上采样,[-]表示连接层,多尺度特征提取子模块的输出表示为X(i,j),i表示多尺度特征提取子模块所位于的层级,j表示多尺度特征提取子模块在所处层级中的位置。Where H(-) represents the convolution operation with activation function, u(-) represents upsampling, [-] represents the connection layer, the output of the multi-scale feature extraction submodule is represented as X(i,j), i represents the level where the multi-scale feature extraction submodule is located, and j represents the position of the multi-scale feature extraction submodule in its level.
在引入了注意力子模块(可以为CBAM机制的注意力子模块)后,多尺度特征提取子模块的特征映射堆栈的计算公式被修改为:After the introduction of the attention submodule (which can be the attention submodule of the CBAM mechanism), the calculation formula of the feature map stack of the multi-scale feature extraction submodule is modified as follows:
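The attention-modified formula is likewise not reproduced; inserting the attention module Ag before each same-level input gives the plausible form:

```latex
X(i,j) = H\big(\big[\,[A_g(X(i,k))]_{k=0}^{j-1},\; u(X(i+1,\,j-1))\,\big]\big), \quad j > 0
```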
式中,H(-)表示带有激活函数的卷积操作,u(-)表示上采样,[-]表示连接层,Ag表示注意力子模块。Where H(-) represents the convolution operation with activation function, u(-) represents upsampling, [-] represents the connection layer, and Ag represents the attention submodule.
本发明实施例中,示例性的,第一特征提取类型的多尺度特征提取模块可以为RSUP-L模块,此时,多尺度特征提取模块的结构示意图可以如图5所示,其中:可选的,在下采样路径中标有Cin,3×3,Cout和Cout,3×3,M的两个多尺度特征提取子模块均为卷积子模块,以及标有Cout,3×3,d=2,M的多尺度特征提取子模块以及上采样路径中与最深子层级相邻的标有M×2,3×3,M的多尺度特征提取子模块均为卷积子模块,在下采样路径中的其余多尺度特征提取子模块均为下采样子模块,在上采样路径中的其余多尺度特征提取子模块均为上采样子模块。In an embodiment of the present invention, exemplarily, the multi-scale feature extraction module of the first feature extraction type may be an RSUP-L module. In this case, the structural schematic diagram of the multi-scale feature extraction module may be as shown in FIG. 5, wherein: optionally, the two multi-scale feature extraction sub-modules labeled Cin,3×3,Cout and Cout,3×3,M in the downsampling path are both convolution sub-modules, and the multi-scale feature extraction sub-module labeled Cout,3×3,d=2,M and the multi-scale feature extraction sub-module labeled M×2,3×3,M adjacent to the deepest sub-level in the upsampling path are both convolution sub-modules; the remaining multi-scale feature extraction sub-modules in the downsampling path are all downsampling sub-modules, and the remaining multi-scale feature extraction sub-modules in the upsampling path are all upsampling sub-modules.
可见,该可选的实施例还能够在第一特征提取类型的多尺度特征提取模块中设置下采样和上采样的操作,提供了更深的网络结构和更多的注意力机制,能够提取更为丰富的图像特征,从而进一步提高图像特征提取结果的全面性,有利于进一步提高图像分割结果的精确性;以及,根据特征图的尺寸设置不同深度的多尺度特征提取模块,获取不同图像尺寸对应的不同规模的特征信息,能够提高图像特征的提取灵活性,从而提高图像特征提取结果的准确性。It can be seen that this optional embodiment can also set downsampling and upsampling operations in the multi-scale feature extraction module of the first feature extraction type, provide a deeper network structure and more attention mechanisms, and be able to extract richer image features, thereby further improving the comprehensiveness of the image feature extraction results, which is conducive to further improving the accuracy of the image segmentation results; and, according to the size of the feature map, multi-scale feature extraction modules of different depths are set to obtain feature information of different scales corresponding to different image sizes, which can improve the flexibility of image feature extraction and thus improve the accuracy of the image feature extraction results.
在该可选的实施例中,进一步可选的,当多尺度特征提取模块的类型为第二特征提取类型时,在多尺度特征提取模块中,除最浅子层级以外的其余每一子层级中的多尺度特征提取子模块均为膨胀卷积子模块;In this optional embodiment, further optionally, when the type of the multi-scale feature extraction module is the second feature extraction type, in the multi-scale feature extraction module, the multi-scale feature extraction submodules in each sub-level except the shallowest sub-level are all dilated convolution submodules;
其中,最深层级的编码器的类型为第二特征提取类型。Among them, the type of the deepest level encoder is the second feature extraction type.
需要说明的是,与第一特征提取类型的多尺度特征提取模块相比,第二特征提取类型的多尺度特征提取模块通过使用膨胀卷积代替下采样和上采样的操作,以此捕获并保留更广泛的上下文信息,并且有助于提高模型在特征图分辨率相对较低的情况下的分割精度。其中,膨胀卷积能够增大感受野与提升卷积神经网络的性能。其中,膨胀卷积扩张后的卷积核大小的计算公式如下:It should be noted that compared with the multi-scale feature extraction module of the first feature extraction type, the multi-scale feature extraction module of the second feature extraction type uses dilated convolution instead of downsampling and upsampling operations to capture and retain a wider range of contextual information, and helps to improve the segmentation accuracy of the model when the feature map resolution is relatively low. Among them, the dilated convolution can increase the receptive field and improve the performance of the convolutional neural network. Among them, the calculation formula for the convolution kernel size after the dilated convolution is as follows:
w=k+(k-1)(a-1)w=k+(k-1)(a-1)
式中,k为卷积核的大小,a为卷积扩张率,w为经扩张后实际卷积核的大小。以及,膨胀卷积后得到的图像尺寸大小的计算公式如下所示:In the formula, k is the size of the convolution kernel, a is the convolution expansion rate, and w is the actual size of the convolution kernel after expansion. And the calculation formula for the image size obtained after the expansion convolution is as follows:
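The output-size equation is not reproduced in the text; the standard form consistent with the symbol definitions that follow is:

```latex
v = \left\lfloor \frac{x + 2P - \big(k + (k-1)(a-1)\big)}{S} \right\rfloor + 1
```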
式中,v为膨胀卷积后得到的图像尺寸,k为卷积核的大小,a为卷积扩张率,x为输入图像尺寸,P为Padding,S为步长。Where v is the image size obtained after dilated convolution, k is the size of the convolution kernel, a is the convolution expansion rate, x is the input image size, P is Padding, and S is the step size.
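A minimal sketch of the two size calculations above, assuming the standard dilated-convolution output-size formula (floor division; P and S as defined above):

```python
def effective_kernel(k, a):
    # effective kernel size after dilation: w = k + (k - 1)(a - 1)
    return k + (k - 1) * (a - 1)

def output_size(x, k, a, P, S):
    # standard output-size formula for a dilated convolution (an assumption,
    # since the patent's own equation image is not reproduced here)
    w = effective_kernel(k, a)
    return (x + 2 * P - w) // S + 1
```

For example, with k=3, a=2, P=2, S=1 the spatial size is preserved, matching the statement below that the intermediate feature map keeps the same resolution as the input.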
经过膨胀卷积操作后的中间特征图可以保持与输入特征图相同的空间分辨率,因此可以避免在特征提取过程中出现分辨率下降的问题,同时在第二特征提取类型的多尺度特征提取模块中保留更多的上下文信息。The intermediate feature map after the dilated convolution operation can maintain the same spatial resolution as the input feature map, thus avoiding the problem of resolution degradation during the feature extraction process, while retaining more contextual information in the multi-scale feature extraction module of the second feature extraction type.
本发明实施例中,示例性的,第二特征提取类型的多尺度特征提取模块可以为RSUP4F模块,此时,多尺度特征提取模块的结构示意图可以如图6所示。In the embodiment of the present invention, exemplarily, the multi-scale feature extraction module of the second feature extraction type may be a RSUP4F module. In this case, the structural diagram of the multi-scale feature extraction module may be as shown in FIG. 6 .
可见,该可选的实施例还能够通过设置第二特征提取类型的多尺度特征提取模块,膨胀卷积子模块能够在保持输入特征图的分辨率的同时扩大感受野,能够在提高分割网络稳定性的同时保留更多的图像细节信息且捕获到更广泛的上下文信息,提高了图像特征提取结果的可靠性,有利于提高图像分割结果的可靠性。It can be seen that this optional embodiment can also set a multi-scale feature extraction module of the second feature extraction type. The dilated convolution submodule can expand the receptive field while maintaining the resolution of the input feature map, and can retain more image detail information and capture a wider range of contextual information while improving the stability of the segmentation network, thereby improving the reliability of the image feature extraction results and helping to improve the reliability of the image segmentation results.
实施例二Embodiment 2
请参阅图2,图2是本发明实施例公开的一种基于多尺度特征提取的图像分割方法的流程示意图。其中,图2所描述的基于多尺度特征提取的图像分割方法可以应用于基于多尺度特征提取的图像分割装置中,该装置可以包括计算设备、计算终端、计算系统和服务器中的一种,其中,服务器包括本地服务器或云服务器,本发明实施例不做限定。如图2所示,该基于多尺度特征提取的图像分割方法可以包括以下操作:Please refer to FIG. 2, which is a flowchart of an image segmentation method based on multi-scale feature extraction disclosed in an embodiment of the present invention. The image segmentation method based on multi-scale feature extraction described in FIG. 2 can be applied to an image segmentation device based on multi-scale feature extraction, which may include a computing device, a computing terminal, a computing system, and a server, wherein the server includes a local server or a cloud server, which is not limited in the embodiment of the present invention. As shown in FIG. 2, the image segmentation method based on multi-scale feature extraction may include the following operations:
201、将原始图像输入至预先训练好的U2Net++网络的数据输入层进行图像尺寸缩小处理,得到目标图像。201. Input the original image into the data input layer of the pre-trained U2 Net++ network to reduce the image size and obtain the target image.
202、将目标图像输入至U2Net++网络的特征提取层中最浅层级的编码器,并基于下采样路径对目标图像进行下采样操作直至最深层级的相邻浅层级,得到多尺度下采样特征图。202. Input the target image into the shallowest encoder in the feature extraction layer of the U2 Net++ network, and perform a downsampling operation on the target image based on the downsampling path until the adjacent shallow layer of the deepest layer, so as to obtain a multi-scale downsampled feature map.
203、基于最深层级的编码器,对最深层级的相邻浅层级的编码器所输出的多尺度下采样特征图进行膨胀卷积处理,得到膨胀特征图。203. Based on the deepest level encoder, a dilated convolution process is performed on the multi-scale down-sampled feature map output by the encoder of the adjacent shallow level of the deepest level to obtain a dilated feature map.
204、对于除所有编码器之外的每一多尺度特征提取模块,基于接收到的感兴趣区域,对输入至该多尺度特征提取模块的所有特征图进行上采样操作,得到多尺度上采样特征图。204. For each multi-scale feature extraction module except all encoders, based on the received region of interest, up-sample all feature maps input to the multi-scale feature extraction module to obtain a multi-scale up-sampled feature map.
本发明实施例中,输入至该多尺度特征提取模块的所有特征图包括基于跳跃路径输入至该多尺度特征提取模块的特征图以及基于上采样路径输入至该多尺度特征提取模块的特征图。In an embodiment of the present invention, all feature maps input to the multi-scale feature extraction module include feature maps input to the multi-scale feature extraction module based on a skip path and feature maps input to the multi-scale feature extraction module based on an upsampling path.
205、将最浅层级中的所有多尺度上采样特征图确定为第一候选特征图子集合。205. Determine all multi-scale upsampled feature maps in the shallowest level as a first candidate feature map subset.
本发明实施例中,当U2Net++网络的架构示意图如图3所示时,第一候选特征图子集合包括RSUP7(0,1)输出的Y0,1、RSUP7(0,2)输出的Y0,2、RSUP7(0,3)输出的Y0,3和RSUP7(0,4)输出的Y0,4。In the embodiment of the present invention, when the architecture diagram of the U2Net++ network is as shown in FIG. 3, the first candidate feature map subset includes Y0,1 output by RSUP7(0,1), Y0,2 output by RSUP7(0,2), Y0,3 output by RSUP7(0,3) and Y0,4 output by RSUP7(0,4).
206、将膨胀特征图以及除最浅层级之外的其余每一层级的解码器所输出的多尺度上采样特征图确定为第二候选特征图子集合。206. Determine the dilated feature map and the multi-scale up-sampled feature maps output by the decoders of each level except the shallowest level as a second candidate feature map subset.
本发明实施例中,当U2Net++网络的架构示意图如图3所示时,第二候选特征图子集合包括RSUP6(1,3)输出的Y1,0、RSUP5(2,2)输出的Y2,0、RSUP4(3,1)输出的Y3,0和RSUP4F(4,0)输出的Y4,0。In the embodiment of the present invention, when the architecture diagram of the U2 Net++ network is as shown in FIG3 , the second candidate feature map subset includes Y1,0 output by RSUP6(1,3), Y2,0 output by RSUP5(2,2), Y3,0 output by RSUP4(3,1) and Y4,0 output by RSUP4F(4,0).
207、将候选特征图集合输入至U2Net++网络的多侧输出融合层进行特征融合处理,得到目标分割特征图。207. Input the candidate feature map set into the multi-side output fusion layer of the U2 Net++ network for feature fusion processing to obtain the target segmentation feature map.
在本发明实施例中,示例性的,当U2Net++网络的架构示意图如图3所示时,则跳跃路径中的特征映射堆栈的计算公式为:In the embodiment of the present invention, exemplarily, when the schematic diagram of the architecture of the U2 Net++ network is shown in FIG3 , the calculation formula of the feature map stack in the jump path is:
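The formula referenced here is not reproduced; consistent with the symbol definitions that follow, a plausible form is:

```latex
X(i,j) = \mathrm{RSUP}(i,j)\big(\big[\,[X(i,k)]_{k=0}^{j-1},\; u(X(i+1,\,j-1))\,\big]\big), \quad j > 0
```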
式中,RSUP(i,j)表示多尺度特征提取模块内的卷积运算,u(-)表示上采样,[-]表示连接层,i表示多尺度特征提取子模块所位于的层级,j表示多尺度特征提取子模块在所处层级中的位置。Where RSUP(i,j) represents the convolution operation in the multi-scale feature extraction module, u(-) represents upsampling, [-] represents the connection layer, i represents the level where the multi-scale feature extraction submodule is located, and j represents the position of the multi-scale feature extraction submodule in its level.
以及,示例性的,最浅层级的多尺度特征提取模块内的带有激活函数的卷积运算的计算公式如下:And, exemplarily, the calculation formula of the convolution operation with activation function in the shallowest level multi-scale feature extraction module is as follows:
X0,0=Θ[Input]X0,0=Θ[Input]
X0,1=Θ[Ag(X0,0),UP(X1,0)]X0,1=Θ[Ag(X0,0),UP(X1,0)]
X0,2=Θ[Ag(X0,0),Ag(X0,1),UP(X1,1)]X0,2=Θ[Ag(X0,0),Ag(X0,1),UP(X1,1)]
X0,3=Θ[Ag(X0,0),Ag(X0,1),Ag(X0,2),UP(X1,2)]X0,3=Θ[Ag(X0,0),Ag(X0,1),Ag(X0,2),UP(X1,2)]
X0,4=Θ[Ag(X0,0),Ag(X0,1),Ag(X0,2),Ag(X0,3),UP(X1,3)]X0,4=Θ[Ag(X0,0),Ag(X0,1),Ag(X0,2),Ag(X0,3),UP(X1,3)]
式中,Xi,j代表多尺度特征提取模块,i表示多尺度特征提取子模块所位于的层级,j表示多尺度特征提取子模块在所处层级中的位置,Up(-)代表上采样,符号Θ(-)表示带有激活函数的卷积运算,Ag代表注意力模块。Where Xi,j represents the multi-scale feature extraction module, i represents the level where the multi-scale feature extraction submodule is located, j represents the position of the multi-scale feature extraction submodule in its level, Up(-) represents upsampling, the symbol Θ(-) represents the convolution operation with activation function, and Ag represents the attention module.
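The concatenation pattern of the shallowest-level formulas above can be sketched symbolically; the function below (a hypothetical helper, not part of the network) lists which maps feed node X(i,j) for j > 0:

```python
def node_inputs(i, j):
    # all same-level predecessors pass through the attention module Ag,
    # plus one upsampled map from the adjacent deeper level
    gated = ["Ag(X{},{})".format(i, k) for k in range(j)]
    upsampled = ["UP(X{},{})".format(i + 1, j - 1)]
    return gated + upsampled
```

For node (0,4) this reproduces the five inputs in the formula for X0,4 above.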
本发明实施例中,针对U2Net++网络的特征提取层的其它详细描述,请参照实施例一中针对U2Net++网络的特征提取层的详细描述,本发明实施例不再赘述。For other detailed descriptions of the feature extraction layer of the U2 Net++ network in the embodiment of the present invention, please refer to the detailed description of the feature extraction layer of the U2 Net++ network in the first embodiment, which will not be described in detail in the embodiment of the present invention.
本发明实施例中,针对步骤201-步骤207的其它详细描述,请参照实施例一中针对步骤101-步骤103的详细描述,本发明实施例不再赘述。In the embodiment of the present invention, for other detailed descriptions of step 201 to step 207, please refer to the detailed description of step 101 to step 103 in the first embodiment, which will not be repeated here.
可见,实施本发明实施例所描述的方法能够通过U2Net++网络的数据输入层对输入至U2Net++网络的原始图像的图像尺寸缩小至目标尺寸,得到目标图像,并基于U2Net++网络的特征提取层对目标图像进行多尺度特征提取,得到候选特征图集合,然后通过U2Net++网络的多侧输出融合层对候选特征图集合进行特征融合处理,得到目标分割特征图,能够提高图像特征提取操作的效率和准确性,从而提高目标检测的准确性,进而提高图像分割的准确性,有利于提高基于图像分割结果进行分析的准确性;以及,通过设置注意力模块,能够识别感兴趣区域,以突出分割目标和背景之间的区别,能够提高图像的局部特征的提取准确性和效率,从而提高对分割目标的特征提取的可靠性,进而提高U2Net++网络对分割目标的识别能力,有利于进一步提高图像分割的精准性。此外,还能够基于下采样路径对目标图像进行下采样操作,得到多尺度下采样特征图,并且基于最深层级的编码器,对相邻的多尺度下采样特征图进行膨胀卷积处理,得到膨胀特征图,以及对于除所有编码器之外的每一多尺度特征提取模块,基于接收到的感兴趣区域对输入至该多尺度特征提取模块的所有特征图进行上采样操作,得到多尺度上采样特征图,然后将最浅层级中的所有多尺度上采样特征图确定为第一候选特征图子集合,以及将膨胀特征图以及除最浅层级之外的其余每一层级的解码器所输出的多尺度上采样特征图确定为第二候选特征图子集合,能够提高候选特征图的确定准确性和确定全面性,从而有利于提高图像特征融合结果的准确性,进而有利于提高图像分割结果的分割准确性和分割可靠性。It can be seen that the method described in the embodiment of the present invention can reduce the image size of the original image input to the U2 Net++ network to the target size through the data input layer of the U2 Net++ network to obtain the target image, and perform multi-scale feature extraction on the target image based on the feature extraction layer of the U2 Net++ network to obtain a set of candidate feature maps, and then perform feature fusion processing on the candidate feature map set through the multi-side output fusion layer of the U2 Net++ network to obtain a target segmentation feature map, which can improve the efficiency and accuracy of the image feature extraction operation, thereby improving the accuracy of target detection, and then improve the accuracy of image segmentation, which is conducive to improving the accuracy of analysis based on image segmentation results; and, by setting the attention module, it is possible to identify the region of interest to highlight the difference between the segmentation target and the background, which can improve the accuracy and efficiency of extracting local features of the image, thereby improving the reliability of feature extraction of the segmentation target, and then improve the recognition ability of the U2 Net++ network for the segmentation 
target, which is conducive to further improving the accuracy of image segmentation. In addition, it is also possible to perform a downsampling operation on the target image based on the downsampling path to obtain a multi-scale downsampling feature map, and based on the encoder at the deepest level, perform dilated convolution processing on adjacent multi-scale downsampling feature maps to obtain a dilated feature map, and for each multi-scale feature extraction module except all encoders, perform an upsampling operation on all feature maps input to the multi-scale feature extraction module based on the received region of interest to obtain a multi-scale upsampling feature map, and then determine all multi-scale upsampling feature maps in the shallowest level as a first candidate feature map subset, and determine the dilated feature map and the multi-scale upsampling feature maps output by the decoder of each level except the shallowest level as a second candidate feature map subset, which can improve the determination accuracy and comprehensiveness of the candidate feature maps, thereby facilitating improving the accuracy of the image feature fusion results, and further facilitating improving the segmentation accuracy and segmentation reliability of the image segmentation results.
在一个可选的实施例中,将候选特征图集合输入至U2Net++网络的多侧输出融合层进行特征融合处理,得到目标分割特征图,可以包括以下操作:In an optional embodiment, the candidate feature map set is input into the multi-side output fusion layer of the U2 Net++ network for feature fusion processing to obtain the target segmentation feature map, which may include the following operations:
基于预设激活函数,对第一候选特征图子集合中的所有候选特征图进行特征融合处理,得到第一分割特征图;其中,第一分割特征图的计算公式如下:Based on the preset activation function, feature fusion processing is performed on all candidate feature maps in the first candidate feature map subset to obtain a first segmentation feature map; wherein the calculation formula of the first segmentation feature map is as follows:
式中,表示连接操作,Y0,5为第一分割特征图,Y0,1、Y0,2、Y0,3和Y0,4均为第一候选特征图子集合中的特征图;In the formula, represents a connection operation, Y0,5 is the first segmentation feature map, Y0,1 , Y0,2 , Y0,3 and Y0,4 are all feature maps in the first candidate feature map subset;
基于预设激活函数,对第二候选特征图子集合中的所有候选特征图进行特征融合处理,得到第二分割特征图;其中,第二分割特征图的计算公式如下:Based on the preset activation function, feature fusion processing is performed on all candidate feature maps in the second candidate feature map subset to obtain a second segmentation feature map; wherein the calculation formula of the second segmentation feature map is as follows:
式中,Y5,0为第二分割特征图,Y1,0、Y2,0、Y3,0和Y4,0均为第二候选特征图子集合中的特征图;Wherein, Y5,0 is the second segmentation feature map, Y1,0 , Y2,0 , Y3,0 and Y4,0 are all feature maps in the second candidate feature map subset;
基于预设激活函数,对第一分割特征图和第二分割特征图进行特征融合处理,得到目标分割特征图;其中,目标分割特征图的计算公式如下:Based on the preset activation function, the first segmentation feature map and the second segmentation feature map are subjected to feature fusion processing to obtain a target segmentation feature map; wherein the calculation formula of the target segmentation feature map is as follows:
式中,Y5,5为目标分割特征图。Where Y5,5 is the target segmentation feature map.
其中,预设激活函数可以为Sigmoid函数,也可以为其它激活函数,本发明实施例不做限定。The preset activation function may be a Sigmoid function or other activation functions, which is not limited in the embodiment of the present invention.
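The three fusion formulas above are not reproduced in the text; a plausible common form, with σ the preset activation (e.g. Sigmoid), [·] the connection operation, and C a hypothetical fusion convolution, would be:

```latex
Y^{0,5} = \sigma\big(C\big(\big[\,Y^{0,1}, Y^{0,2}, Y^{0,3}, Y^{0,4}\,\big]\big)\big),\qquad
Y^{5,0} = \sigma\big(C\big(\big[\,Y^{1,0}, Y^{2,0}, Y^{3,0}, Y^{4,0}\,\big]\big)\big),\qquad
Y^{5,5} = \sigma\big(C\big(\big[\,Y^{0,5}, Y^{5,0}\,\big]\big)\big)
```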
可见,该可选的实施例能够基于预设激活函数,分别对第一候选特征图子集合和第二候选特征图子集合中各自的所有候选特征图进行融合处理,分别得到第一分割特征图和第二分割特征图,然后对第一分割特征图和第二分割特征图进行融合,得到目标分割特征图,有助于丰富图像特征融合依据的数量,提高了图像特征融合结果的可靠性,从而提高图像分割依据的确定准确性,进而有利于提高图像分割结果的准确性,同时还能够提高U2Net++网络的收敛性,以提高U2Net++网络的稳定性,有利于提高图像分割结果的可靠性。It can be seen that this optional embodiment can, based on a preset activation function, fuse all candidate feature maps in the first candidate feature map subset and the second candidate feature map subset, respectively, to obtain a first segmentation feature map and a second segmentation feature map, respectively, and then fuse the first segmentation feature map and the second segmentation feature map to obtain a target segmentation feature map, which helps to enrich the number of image feature fusion bases and improve the reliability of the image feature fusion results, thereby improving the accuracy of determining the image segmentation bases, and further helping to improve the accuracy of the image segmentation results. At the same time, it can also improve the convergence of the U2 Net++ network to improve the stability of the U2 Net++ network, which is beneficial to improving the reliability of the image segmentation results.
在另一个可选的实施例中,U2Net++网络的数据输入层包括输入层、5个卷积层、4个池化层和2个归一化层;其中,在输入层之后其余层的排布顺序为:2个卷积核为7*7的卷积层、2个池化区域为3*3的池化层、1个归一化层、1个卷积核为1*1的卷积层、2个卷积核为3*3的卷积层、1个归一化层和2个池化区域为3*3的池化层;以及,最后一个池化层输出的图像为目标图像。In another optional embodiment, the data input layer of the U2Net++ network includes an input layer, 5 convolution layers, 4 pooling layers and 2 normalization layers; wherein the arrangement order of the remaining layers after the input layer is: 2 convolution layers with 7*7 convolution kernels, 2 pooling layers with a 3*3 pooling area, 1 normalization layer, 1 convolution layer with a 1*1 convolution kernel, 2 convolution layers with 3*3 convolution kernels, 1 normalization layer and 2 pooling layers with a 3*3 pooling area; and the image output by the last pooling layer is the target image.
可选的,池化层可以为最大化池化层,归一化层可以为局部归一化层。Optionally, the pooling layer may be a maximum pooling layer, and the normalization layer may be a local normalization layer.
本发明实施例中,示例性的,U2Net++网络的数据输入层的结构示意图可以如图7所示,其中,Input为输入层,Conv为卷积层,MaxPool为最大化池化层,LocalRespNorm为局部归一化层。In the embodiment of the present invention, exemplarily, the structural diagram of the data input layer of the U2 Net++ network may be shown in FIG7 , wherein Input is the input layer, Conv is the convolution layer, MaxPool is the maximum pooling layer, and LocalRespNorm is the local normalization layer.
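The stated layer ordering can be written out as a simple list (a structural sketch only, with names following the figure labels):

```python
# data input layer of the U2Net++ network, per the arrangement described above
data_input_layers = (
    ["Conv 7x7"] * 2
    + ["MaxPool 3x3"] * 2
    + ["LocalRespNorm"]
    + ["Conv 1x1"]
    + ["Conv 3x3"] * 2
    + ["LocalRespNorm"]
    + ["MaxPool 3x3"] * 2
)
```

The counts recover the stated totals (5 convolution, 4 pooling, 2 normalization layers), and the last layer is the pooling layer whose output is the target image.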
可见,该可选的实施例能够通过卷积和池化操作将输入的原始图像转换为低分辨率的特征图,能够降低输入特征提取层的图像尺寸和复杂度,从而提高了图像特征提取的效率,进而有利于提高图像分割的效率。It can be seen that this optional embodiment can convert the input original image into a low-resolution feature map through convolution and pooling operations, which can reduce the image size and complexity of the input feature extraction layer, thereby improving the efficiency of image feature extraction, and further helping to improve the efficiency of image segmentation.
在本发明实施例中,示例性的,当U2Net++网络的架构示意图如图3所示时,在基于U2Net++网络以及其它现有技术对X光牙齿图像进行分割后可以得到分别如图8和图9的图像分割结果,其中,图8为整体偏亮的X光牙齿图像的图像分割结果,图9为整体偏暗的X光牙齿图像的图像分割结果。In the embodiment of the present invention, exemplarily, when the architecture diagram of the U2 Net++ network is shown in FIG3 , after segmenting the X-ray tooth image based on the U2 Net++ network and other existing technologies, image segmentation results as shown in FIGS. 8 and 9 can be obtained, respectively, wherein FIG8 is the image segmentation result of the X-ray tooth image that is brighter overall, and FIG9 is the image segmentation result of the X-ray tooth image that is darker overall.
通过比较图8和图9中的各分割结果可知,在整体偏亮的X光牙齿图像中,使用UNet网络进行X光牙齿图像分割后的图像具有一些问题,例如分割不完整、特征丢失和过度分割等。针对这些牙齿特征不够明显的图像,图像分割的结果会表现得更不理想。在整体偏暗的X光牙齿图像中,UNet网络的分割结果表明网络可以大致分割出五颗牙齿,但是上侧第二颗牙齿的边缘与第一颗牙齿重叠,导致边界不清晰。使用UNet++网络对X光牙齿图像进行分割时,整体分割性能优于UNet网络,而U2Net网络则进一步改善了分割结果。虽然U2Net网络提高了分割的准确度,但是分割结果中存在边缘模糊的情况,比如在第一张X光牙齿图像中,最上方的牙齿边界略有模糊,个别地方存在不清晰的情况,但是分割效果优于UNet++与UNet。Deep snake网络的分割结果从左到右的四颗牙齿与手动标注图相近,但第五颗牙齿存在过度分割的情况。By comparing the segmentation results in Figures 8 and 9, it can be seen that in the overall bright X-ray tooth images, the images segmented by the UNet network have some problems, such as incomplete segmentation, feature loss and over-segmentation. For these images where the tooth features are not obvious, the image segmentation results will be even less ideal. In the overall dark X-ray tooth images, the segmentation results of the UNet network show that the network can roughly segment five teeth, but the edge of the second tooth on the upper side overlaps with the first tooth, resulting in unclear boundaries. When using the UNet++ network to segment the X-ray tooth images, the overall segmentation performance is better than the UNet network, and the U2Net network further improves the segmentation results. Although the U2Net network improves the accuracy of segmentation, there are blurred edges in the segmentation results. For example, in the first X-ray tooth image, the boundary of the top tooth is slightly blurred, and there are unclear places, but the segmentation effect is better than UNet++ and UNet. The segmentation results of the Deep Snake network are similar to the manually labeled images for the four teeth from left to right, but the fifth tooth is over-segmented.
The U2 Net++ network proposed in the embodiment of the present invention can effectively segment each tooth in the X-ray tooth image, and its segmentation result is closer to the manually annotated mask image. Therefore, through the observation of the segmentation experiment results, it can be concluded that the U2 Net++ network has a certain improvement in the accuracy of X-ray tooth image segmentation, and the tooth edges are clearer.
为了更客观地评价X光牙齿图像分割结果,采用准确率(ACC)、Dice相似系数(DSC)和交并比(IOU)指标进行评价。这些指标可以更全面地反映出模型的分割性能,在下表中列出了实验各项评价指标的具体数值,其中实验数值取自不同测试数据的最大值。In order to more objectively evaluate the X-ray tooth image segmentation results, the accuracy (ACC), Dice similarity coefficient (DSC) and intersection over union (IOU) indicators are used for evaluation. These indicators can more comprehensively reflect the segmentation performance of the model. The specific values of each evaluation indicator of the experiment are listed in the following table, where the experimental values are taken from the maximum values of different test data.
通过对上表的分析,发现使用UNet网络分割X光牙齿图像,其ACC、DSC和IOU的评价指标值均为最低的,说明UNet网络的分割效果与其它网络相比是最差的。采用UNet网络进行图像分割后,所得到的各项指标值普遍较低,表明分割效果不够理想。此外,相较于Deep mask分割结果,UNet++网络分割的准确率有所下降,其ACC、DSC和IOU值略高于UNet。采用U2Net网络分割X光牙齿图像时,其各项指标均优于其它网络分割结果。而使用Deep mask和Deep snake网络分割X光牙齿图像时,其指标值略优于UNet网络,但仍然不如U2Net网络分割结果。本发明实施例所提出的U2Net++网络各项指标均高于其它网络的各项指标。因此,U2Net++网络对X光牙齿图像的分割的精确度优于其它分割网络。Through the analysis of the above table, it is found that when the UNet network is used to segment X-ray dental images, its evaluation index values of ACC, DSC and IOU are all the lowest, indicating that the segmentation effect of the UNet network is the worst compared with other networks. After using the UNet network for image segmentation, the values of the various indicators obtained are generally low, indicating that the segmentation effect is not ideal. In addition, compared with the Deep mask segmentation result, the accuracy of the UNet++ network segmentation has decreased, and its ACC, DSC and IOU values are slightly higher than UNet. When the U2Net network is used to segment X-ray dental images, its various indicators are better than the segmentation results of other networks. When the Deep mask and Deep snake networks are used to segment X-ray dental images, their index values are slightly better than the UNet network, but still not as good as the U2Net network segmentation results. The indicators of the U2Net++ network proposed in the embodiment of the present invention are all higher than those of other networks. Therefore, the segmentation accuracy of the U2Net++ network on X-ray dental images is better than that of other segmentation networks.
实施例三Embodiment 3
请参阅图10,图10是本发明实施例公开的一种基于多尺度特征提取的图像分割装置的结构示意图。其中,图10所描述的基于多尺度特征提取的图像分割装置可以包括计算设备、计算终端、计算系统和服务器中的一种,其中,服务器包括本地服务器或云服务器,本发明实施例不做限定。如图10所示,该基于多尺度特征提取的图像分割装置可以包括:Please refer to FIG. 10, which is a schematic diagram of the structure of an image segmentation device based on multi-scale feature extraction disclosed in an embodiment of the present invention. The image segmentation device based on multi-scale feature extraction described in FIG. 10 may include one of a computing device, a computing terminal, a computing system, and a server, wherein the server includes a local server or a cloud server, which is not limited in the embodiment of the present invention. As shown in FIG. 10, the image segmentation device based on multi-scale feature extraction may include:
图像尺寸缩小单元301,用于将原始图像输入至预先训练好的U2Net++网络的数据输入层进行图像尺寸缩小处理,得到目标图像,目标图像的图像尺寸为目标尺寸,且数据输入层用于将原始图像的图像尺寸缩小至目标尺寸;The image size reduction unit 301 is used to input the original image into the data input layer of the pre-trained U2 Net++ network to perform image size reduction processing to obtain a target image, the image size of the target image is the target size, and the data input layer is used to reduce the image size of the original image to the target size;
多尺度特征提取单元302,用于将目标图像输入至U2Net++网络的特征提取层进行多尺度特征提取,得到候选特征图集合;特征提取层包括多个层级,每个层级包括至少一个多尺度特征提取模块;当某一层级包括至少两个多尺度特征提取模块时,该层级中每相邻两个多尺度特征提取模块之间均设置有注意力模块,每个注意力模块用于识别输入至该注意力模块的特征图的感兴趣区域并将感兴趣区域输入至该注意力模块的在后相邻的多尺度特征提取模块;候选特征图集合包括至少两个候选特征图子集合,每个候选特征图子集合所包含的特征图用于特征融合以获得分割特征图;The multi-scale feature extraction unit 302 is used to input the target image into the feature extraction layer of the U2 Net++ network for multi-scale feature extraction to obtain a set of candidate feature maps; the feature extraction layer includes multiple levels, each level includes at least one multi-scale feature extraction module; when a certain level includes at least two multi-scale feature extraction modules, an attention module is provided between each two adjacent multi-scale feature extraction modules in the level, each attention module is used to identify the region of interest of the feature map input to the attention module and input the region of interest to the multi-scale feature extraction module adjacent to the attention module; the set of candidate feature maps includes at least two subsets of candidate feature maps, and the feature maps contained in each subset of candidate feature maps are used for feature fusion to obtain a segmentation feature map;
多侧输出融合单元303,用于将候选特征图集合输入至U2Net++网络的多侧输出融合层进行特征融合处理,得到目标分割特征图。The multi-side output fusion unit 303 is used to input the candidate feature map set into the multi-side output fusion layer of the U2 Net++ network for feature fusion processing to obtain a target segmentation feature map.
可见,实施本发明实施例所描述的装置能够通过U2Net++网络的数据输入层对输入至U2Net++网络的原始图像的图像尺寸缩小至目标尺寸,得到目标图像,并基于U2Net++网络的特征提取层对目标图像进行多尺度特征提取,得到候选特征图集合,然后通过U2Net++网络的多侧输出融合层对候选特征图集合进行特征融合处理,得到目标分割特征图,能够提高图像特征提取结果的提取效率和提取准确性,基于准确度更高的图像特征提取结果进行目标检测,以检测到的目标作为图像分割依据,能够提高图像分割依据的确定准确性,有利于提高图像分割结果的分割效率和分割准确性,从而有利于提高基于图像分割结果进行分析的准确性;以及,通过设置注意力模块,以识别图像中的关键特征,能够提高图像的关键特征的提取准确性,从而提高图像分割依据的确定可靠性,有利于进一步提高图像分割结果的精准性。It can be seen that the device described in the embodiment of the present invention can reduce the image size of the original image input to the U2 Net++ network to the target size through the data input layer of the U2 Net++ network to obtain the target image, and perform multi-scale feature extraction on the target image based on the feature extraction layer of the U2 Net++ network to obtain a set of candidate feature maps, and then perform feature fusion processing on the set of candidate feature maps through the multi-side output fusion layer of the U2 Net++ network to obtain a target segmentation feature map, which can improve the extraction efficiency and extraction accuracy of the image feature extraction result, perform target detection based on the image feature extraction result with higher accuracy, and use the detected target as the image segmentation basis, which can improve the determination accuracy of the image segmentation basis, which is conducive to improving the segmentation efficiency and segmentation accuracy of the image segmentation result, thereby facilitating improving the accuracy of analysis based on the image segmentation result; and, by setting an attention module to identify key features in the image, the extraction accuracy of the key features of the image can be improved, thereby improving the determination reliability of the image segmentation basis, which is conducive to further improving the accuracy of the image segmentation result.
在一个可选的实施例中,当某一层级仅包括一个多尺度特征提取模块时,该多尺度特征提取模块用作编码器;当某一层级包括两个多尺度特征提取模块时,在该层级位置最先的多尺度特征提取模块用作编码器,在该层级位置最后的多尺度特征提取模块用作解码器,且该层级的编码器与该层级的解码器之间设置有跳跃路径;当某一层级包括两个以上多尺度特征提取模块时,该层级除编码器和解码器以外的其余多尺度特征提取模块用作密集残差模块,且该层级的编码器与该层级的每个密集残差模块之间、该层级的每个密集残差模块与该层级的解码器之间均设置有跳跃路径;其中,编码器用于提取输入至该编码器的图像的多尺度特征;解码器用于恢复输入至该解码器的图像的分辨率;密集残差模块用于将深层级的信息集成到该密集残差模块所处的层级;In an optional embodiment, when a certain level includes only one multi-scale feature extraction module, the multi-scale feature extraction module is used as an encoder; when a certain level includes two multi-scale feature extraction modules, the multi-scale feature extraction module at the first position of the level is used as an encoder, and the multi-scale feature extraction module at the last position of the level is used as a decoder, and a jump path is set between the encoder of the level and the decoder of the level; when a certain level includes more than two multi-scale feature extraction modules, the remaining multi-scale feature extraction modules of the level except the encoder and the decoder are used as dense residual modules, and a jump path is set between the encoder of the level and each dense residual module of the level, and between each dense residual module of the level and the decoder of the level; wherein the encoder is used to extract multi-scale features of an image input to the encoder; the decoder is used to restore the resolution of the image input to the decoder; the dense residual module is used to integrate deep-level information into the level where the dense residual module is located;
其中,某一层级所包含的多尺度特征提取模块的数量越多,则该层级越浅,最浅层级为包含多尺度特征提取模块的数量最多的层级;某一层级所包含的多尺度特征提取模块的数量越少,则该层级越深,最深层级为仅包含一个多尺度特征提取模块的层级;Among them, the more multi-scale feature extraction modules a level contains, the shallower the level is, and the shallowest level is the level containing the largest number of multi-scale feature extraction modules; the fewer multi-scale feature extraction modules a level contains, the deeper the level is, and the deepest level is the level containing only one multi-scale feature extraction module;
每一层级的编码器与相邻深层级的编码器之间设置有下采样路径;每一多尺度特征提取模块与相邻浅层级的第一相邻模块之间设置有上采样路径,其中,第一相邻模块在所处层级中的位置相较于该多尺度特征提取模块在所处层级中的位置在后相邻;A downsampling path is provided between the encoder of each level and the encoder of the adjacent deeper level; an upsampling path is provided between each multi-scale feature extraction module and the first adjacent module of the adjacent shallower level, wherein the position of the first adjacent module within its level is immediately after the position of the multi-scale feature extraction module within its level;
对于每一注意力模块,输入至该注意力模块的特征图包括第一输入特征图和第二输入特征图;其中,以接收该注意力模块所输出的感兴趣区域的多尺度特征提取模块作为位置基准模块,位置基准模块的在先相邻的多尺度特征提取模块所输出的特征图为第一输入特征图;位置基准模块的相邻深层级的第二相邻模块所输出的特征图为第二输入特征图,第二相邻模块在所处层级中的位置相较于位置基准模块在所处层级中的位置在先相邻。For each attention module, the feature maps input to the attention module include a first input feature map and a second input feature map; wherein, taking the multi-scale feature extraction module that receives the region of interest output by the attention module as a position reference module, the feature map output by the immediately preceding multi-scale feature extraction module of the position reference module is the first input feature map, and the feature map output by the second adjacent module in the adjacent deeper level of the position reference module is the second input feature map, where the position of the second adjacent module within its level immediately precedes the position of the position reference module within its level.
可见,实施该可选的实施例所描述的装置能够通过设置多层级的多尺度特征提取模块,以及将多尺度特征提取模块用作编码器、解码器和密集残差模块,并通过从不断下采样的特征图中提取多个尺度的特征,然后通过逐步上采样、合并与卷积的方式将多尺度特征解码为高分辨率的特征图,实现了同时捕捉全局背景信息和局部细节信息,能够提高图像特征的提取全面性和提取精准性,从而提高图像特征提取结果的全面性和可靠性,再基于更为可靠的图像特征提取结果进行图像分割,有利于提高图像分割结果的准确性;以及,通过设置嵌套的密集残差模块和跳跃路径,能够将浅层级的特征和深层级的特征相融合,能够有效地减少梯度消失和网络退化问题,能够提高U2Net++网络的稳定性,与此同时,还能够保留原始图像中的特征,能够提高图像特征提取结果的完整性和可靠性,有利于提高图像分割结果的精准性。It can be seen that the device described in this optional embodiment can achieve simultaneous capture of global background information and local detail information by setting up multi-level multi-scale feature extraction modules, using the multi-scale feature extraction modules as encoders, decoders and dense residual modules, extracting features of multiple scales from the continuously downsampled feature maps, and then decoding the multi-scale features into high-resolution feature maps by gradual upsampling, merging and convolution. This improves the comprehensiveness and accuracy of image feature extraction, thereby improving the comprehensiveness and reliability of the image feature extraction results, and performing image segmentation based on more reliable image feature extraction results is conducive to improving the accuracy of the image segmentation results. Moreover, by setting nested dense residual modules and skip paths, shallow-level features and deep-level features can be integrated, which effectively alleviates gradient vanishing and network degradation and improves the stability of the U2Net++ network; at the same time, the features of the original image are retained, which improves the integrity and reliability of the image feature extraction results and is conducive to improving the accuracy of the image segmentation results.
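作为理解参考,下述NumPy草图示意了一种加性注意力门控的计算方式(仅为假设性示例,函数名、权重形状与具体运算均为示意,并非本发明的限定实现):As a reference for understanding, the following NumPy sketch illustrates one way an additive attention gate can be computed (a hypothetical example only; the function names, weight shapes and exact operations are assumptions, not the claimed implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, w_x, w_g, psi):
    """Additive attention gate: ROI = x * sigmoid(psi @ ReLU(W_x x + W_g g)).

    x   : first input feature map (from the preceding module), shape (C, H, W)
    g   : second (deeper-level) input feature map, already resized to (C, H, W)
    w_x, w_g : 1x1-convolution-style projection weights, shape (C_int, C)
    psi : 1x1 weight collapsing to a single attention channel, shape (1, C_int)
    """
    c, h, w = x.shape
    q = np.maximum(w_x @ x.reshape(c, -1) + w_g @ g.reshape(c, -1), 0.0)  # ReLU
    alpha = sigmoid(psi @ q).reshape(1, h, w)  # attention coefficients in (0, 1)
    return x * alpha                           # suppress background, keep the ROI

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
g = rng.standard_normal((4, 8, 8))
roi = attention_gate(x, g, rng.standard_normal((6, 4)),
                     rng.standard_normal((6, 4)), rng.standard_normal((1, 6)))
```

输出的感兴趣区域与第一输入特征图同形,每个位置按(0,1)内的系数加权。The output region of interest has the same shape as the first input feature map, with each position weighted by a coefficient in (0, 1).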
在该可选的实施例中,可选的,多尺度特征提取单元302将目标图像输入至U2Net++网络的特征提取层进行多尺度特征提取,得到候选特征图集合的具体方式包括:In this optional embodiment, the multi-scale feature extraction unit 302 optionally inputs the target image into the feature extraction layer of the U2Net++ network to perform multi-scale feature extraction, and the specific method of obtaining the candidate feature map set includes:
将目标图像输入至U2Net++网络的特征提取层中最浅层级的编码器,并基于下采样路径对目标图像进行下采样操作直至最深层级的相邻浅层级,得到多尺度下采样特征图;Input the target image to the encoder of the shallowest level in the feature extraction layer of the U2Net++ network, and perform downsampling operations on the target image along the downsampling paths until the level adjacent to the deepest level, to obtain multi-scale downsampled feature maps;
基于最深层级的编码器,对最深层级的相邻浅层级的编码器所输出的多尺度下采样特征图进行膨胀卷积处理,得到膨胀特征图;Based on the deepest level encoder, a dilated convolution process is performed on the multi-scale down-sampled feature map output by the encoder of the adjacent shallow level of the deepest level to obtain a dilated feature map;
对于除所有编码器之外的每一多尺度特征提取模块,基于接收到的感兴趣区域,对输入至该多尺度特征提取模块的所有特征图进行上采样操作,得到多尺度上采样特征图;For each multi-scale feature extraction module except all encoders, based on the received region of interest, upsampling all feature maps input to the multi-scale feature extraction module to obtain a multi-scale up-sampled feature map;
将最浅层级中的所有多尺度上采样特征图确定为第一候选特征图子集合;Determine all multi-scale upsampled feature maps in the shallowest level as a first candidate feature map subset;
将膨胀特征图以及除最浅层级之外的其余每一层级的解码器所输出的多尺度上采样特征图确定为第二候选特征图子集合。The dilated feature map and the multi-scale up-sampled feature maps output by the decoders of each level except the shallowest level are determined as the second candidate feature map subset.
可见,实施该可选的实施例所描述的装置还能够基于下采样路径对目标图像进行下采样操作,得到多尺度下采样特征图,并且基于最深层级的编码器,对相邻浅层级所输出的多尺度下采样特征图进行膨胀卷积处理,得到膨胀特征图,以及对于除所有编码器之外的每一多尺度特征提取模块,基于接收到的感兴趣区域对输入至该多尺度特征提取模块的所有特征图进行上采样操作,得到多尺度上采样特征图,然后将最浅层级中的所有多尺度上采样特征图确定为第一候选特征图子集合,以及将膨胀特征图以及除最浅层级之外的其余每一层级的解码器所输出的多尺度上采样特征图确定为第二候选特征图子集合,能够提高候选特征图的确定准确性和确定全面性,从而有利于提高图像特征融合结果的准确性,进而有利于提高图像分割结果的分割准确性和分割可靠性。It can be seen that the device described in this optional embodiment can also perform a downsampling operation on the target image based on the downsampling path to obtain multi-scale downsampled feature maps, perform, based on the encoder of the deepest level, a dilated convolution process on the multi-scale downsampled feature map output by the adjacent shallower level to obtain a dilated feature map, and, for each multi-scale feature extraction module except all encoders, perform an upsampling operation on all feature maps input to the multi-scale feature extraction module based on the received region of interest to obtain multi-scale upsampled feature maps, and then determine all the multi-scale upsampled feature maps in the shallowest level as a first candidate feature map subset, and determine the dilated feature map and the multi-scale upsampled feature maps output by the decoders of each level except the shallowest level as a second candidate feature map subset, which can improve the accuracy and comprehensiveness with which the candidate feature maps are determined, thereby facilitating the accuracy of the image feature fusion results and, further, the segmentation accuracy and reliability of the image segmentation results.
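上述"下采样—最深层级膨胀卷积—上采样并经跳跃路径合并"的流程可用如下NumPy草图示意(仅为假设性示例,以均值池化、最近邻上采样和3×3膨胀卷积代表相应模块,并非本发明的限定实现):The above flow of "downsampling, dilated convolution at the deepest level, then upsampling with skip-path merging" can be sketched in NumPy as follows (a hypothetical example only; average pooling, nearest-neighbour upsampling and a 3x3 dilated convolution stand in for the corresponding modules and are not the claimed implementation):

```python
import numpy as np

def downsample(x):
    """2x2 average pooling; stands in for one step along the downsampling path."""
    h, w = x.shape
    return x[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """Nearest-neighbour upsampling; stands in for one step along the upsampling path."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def dilated_conv(x, k, rate):
    """Same-size 3x3 dilated convolution (cross-correlation form) with zero padding."""
    xp = np.pad(x, rate)
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * xp[i * rate:i * rate + x.shape[0],
                                j * rate:j * rate + x.shape[1]]
    return out

target = np.random.default_rng(1).random((32, 32))
enc = [target]                       # encoders of successively deeper levels
for _ in range(3):
    enc.append(downsample(enc[-1]))
# Deepest level: dilation enlarges the receptive field without shrinking the map.
dilated = dilated_conv(enc[-1], np.full((3, 3), 1.0 / 9.0), rate=2)
dec, up_maps = dilated, []
for e in reversed(enc[:-1]):         # upsampling path back to the shallowest level
    dec = upsample(dec)[:e.shape[0], :e.shape[1]] + e  # skip path merges same level
    up_maps.append(dec)
```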
在该可选的实施例中,可选的,当原始图像为X光牙齿图像时,目标分割特征图为以牙齿作为分割目标的分割特征图;多尺度特征提取模块的类型包括第一特征提取类型和第二特征提取类型;In this optional embodiment, optionally, when the original image is an X-ray tooth image, the target segmentation feature map is a segmentation feature map with teeth as the segmentation target; the type of the multi-scale feature extraction module includes a first feature extraction type and a second feature extraction type;
多尺度特征提取模块包括多个子层级,每一子层级包括至少一个多尺度特征提取子模块;当某一子层级中包括用于接收输入特征图的多尺度特征提取子模块时,该子层级为最浅子层级,用于接收输入特征图的多尺度特征提取子模块为输入卷积子模块,最浅子层级还包括融合子模块,融合子模块为用于融合特征并输出融合特征图的多尺度特征提取子模块;当某一子层级中仅包括一个多尺度特征提取子模块时,该子层级为最深子层级;The multi-scale feature extraction module includes multiple sub-levels, each of which includes at least one multi-scale feature extraction sub-module; when a sub-level includes a multi-scale feature extraction sub-module for receiving an input feature map, the sub-level is the shallowest sub-level, the multi-scale feature extraction sub-module for receiving the input feature map is an input convolution sub-module, and the shallowest sub-level also includes a fusion sub-module, which is a multi-scale feature extraction sub-module used to fuse features and output a fused feature map; when a sub-level includes only one multi-scale feature extraction sub-module, the sub-level is the deepest sub-level;
当某一子层级包括至少两个多尺度特征提取子模块时,该子层级中每相邻两个多尺度特征提取子模块之间均设置有注意力子模块,且该子层级中各个多尺度特征提取子模块之间设置有跳跃子路径;对于每一注意力子模块,该注意力子模块接收在先相邻的多尺度特征提取子模块所输出的特征图以及相邻深子层级的位置在先的多尺度特征提取子模块所输出的特征图,且该注意力子模块的在后相邻的多尺度特征提取子模块接收该注意力子模块所输出的感兴趣区域;When a certain sub-level includes at least two multi-scale feature extraction sub-modules, an attention sub-module is provided between each two adjacent multi-scale feature extraction sub-modules in the sub-level, and skip sub-paths are provided between the multi-scale feature extraction sub-modules in the sub-level; for each attention sub-module, the attention sub-module receives the feature map output by the immediately preceding multi-scale feature extraction sub-module and the feature map output by the preceding multi-scale feature extraction sub-module of the adjacent deeper sub-level, and the immediately following multi-scale feature extraction sub-module of the attention sub-module receives the region of interest output by the attention sub-module;
每一子层级的位置最先的多尺度特征提取子模块与相邻深子层级的位置最先的多尺度特征提取子模块之间设置有第一特征提取子路径;每一子层级的位置最后的多尺度特征提取子模块与相邻浅子层级的位置最后的多尺度特征提取子模块之间设置有第二特征提取子路径。A first feature extraction sub-path is set between the first multi-scale feature extraction sub-module of each sub-level and the first multi-scale feature extraction sub-module of the adjacent deeper sub-level; a second feature extraction sub-path is set between the last multi-scale feature extraction sub-module of each sub-level and the last multi-scale feature extraction sub-module of the adjacent shallower sub-level.
可见,实施该可选的实施例所描述的装置还能够通过在多尺度特征提取模块中设置有多个子层级的多尺度特征提取子模块,增加了整个网络的深度,以及通过融合不同尺度的接收域,从不同尺度捕获更多的上下文信息,能够扩大感受野范围,有利于提取更多、更丰富的图像特征,从而提高图像特征提取结果的全面性。It can be seen that the device described in this optional embodiment can also increase the depth of the entire network by providing multi-scale feature extraction sub-modules at multiple sub-levels within the multi-scale feature extraction module, and, by fusing receptive fields of different scales, capture more contextual information from different scales and expand the receptive field, which is conducive to extracting more and richer image features, thereby improving the comprehensiveness of the image feature extraction results.
在该可选的实施例中,可选的,当多尺度特征提取模块的类型为第一特征提取类型时,第一特征提取子路径为下采样子路径,且除最浅子层级以外的在下采样路径上的多尺度特征提取子模块为卷积子模块或下采样子模块;第二特征提取子路径为上采样子路径,且除最浅子层级以外的在上采样路径上的多尺度特征提取子模块为卷积子模块或上采样子模块;每一子层级中除上采样子模块和下采样子模块以外的其余多尺度特征提取子模块为密集卷积子模块。In this optional embodiment, optionally, when the type of the multi-scale feature extraction module is the first feature extraction type, the first feature extraction sub-path is a downsampling sub-path, and the multi-scale feature extraction sub-modules on the downsampling path other than those at the shallowest sub-level are convolution sub-modules or downsampling sub-modules; the second feature extraction sub-path is an upsampling sub-path, and the multi-scale feature extraction sub-modules on the upsampling path other than those at the shallowest sub-level are convolution sub-modules or upsampling sub-modules; the remaining multi-scale feature extraction sub-modules in each sub-level, other than the upsampling sub-modules and the downsampling sub-modules, are dense convolution sub-modules.
可见,实施该可选的实施例所描述的装置还能够在第一特征提取类型的多尺度特征提取模块中设置下采样和上采样的操作,提供了更深的网络结构和更多的注意力机制,能够提取更为丰富的图像特征,从而进一步提高图像特征提取结果的全面性,有利于进一步提高图像分割结果的精确性;以及,根据特征图的尺寸设置不同深度的多尺度特征提取模块,获取不同图像尺寸对应的不同规模的特征信息,能够提高图像特征的提取灵活性,从而提高图像特征提取结果的准确性。It can be seen that the device described in this optional embodiment can also provide downsampling and upsampling operations in the multi-scale feature extraction module of the first feature extraction type, offering a deeper network structure and more attention mechanisms and extracting richer image features, thereby further improving the comprehensiveness of the image feature extraction results, which is conducive to further improving the accuracy of the image segmentation results; and, by providing multi-scale feature extraction modules of different depths according to the size of the feature map, feature information of different scales corresponding to different image sizes can be obtained, which improves the flexibility of image feature extraction and thus the accuracy of the image feature extraction results.
在该可选的实施例中,可选的,当多尺度特征提取模块的类型为第二特征提取类型时,在多尺度特征提取模块中,除最浅子层级以外的其余每一子层级中的多尺度特征提取子模块均为膨胀卷积子模块;In this optional embodiment, optionally, when the type of the multi-scale feature extraction module is the second feature extraction type, in the multi-scale feature extraction module, the multi-scale feature extraction submodules in each sub-level except the shallowest sub-level are all dilated convolution submodules;
其中,最深层级的编码器的类型为第二特征提取类型。Among them, the type of the deepest level encoder is the second feature extraction type.
可见,实施该可选的实施例所描述的装置还能够通过设置第二特征提取类型的多尺度特征提取模块,使其中的膨胀卷积子模块能够在保持输入特征图的分辨率的同时扩大感受野,能够在提高分割网络稳定性的同时保留更多的图像细节信息且捕获到更广泛的上下文信息,提高了图像特征提取结果的可靠性,有利于提高图像分割结果的可靠性。It can be seen that the device described in this optional embodiment can also provide a multi-scale feature extraction module of the second feature extraction type, in which the dilated convolution sub-modules expand the receptive field while maintaining the resolution of the input feature map, so that more image detail information is retained and a wider range of contextual information is captured while the stability of the segmentation network is improved, which improves the reliability of the image feature extraction results and is conducive to improving the reliability of the image segmentation results.
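第二特征提取类型中"以膨胀卷积保持分辨率"的思路可用如下NumPy草图示意(仅为假设性示例,膨胀率取1、2、4、8,残差与跳跃子路径的组织方式均为示意,并非本发明的限定实现):The idea in the second feature extraction type of "keeping the resolution with dilated convolutions" can be sketched in NumPy as follows (a hypothetical example only; the dilation rates 1, 2, 4, 8 and the arrangement of the residual and skip sub-paths are illustrative, not the claimed implementation):

```python
import numpy as np

def dilated_conv(x, k, rate):
    """Same-size 3x3 dilated convolution (cross-correlation form) with zero padding."""
    xp = np.pad(x, rate)
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * xp[i * rate:i * rate + x.shape[0],
                                j * rate:j * rate + x.shape[1]]
    return out

def dilated_block(x):
    """Sub-levels use growing dilation rates instead of resampling, so the
    resolution never changes; skip sub-paths fuse same-sub-level features and
    a residual connection retains the original input features."""
    k = np.full((3, 3), 1.0 / 9.0)
    h1 = dilated_conv(x, k, 1)
    h2 = dilated_conv(h1, k, 2)
    h3 = dilated_conv(h2, k, 4)
    h4 = dilated_conv(h3, k, 8)       # deepest sub-level, widest receptive field
    d3 = dilated_conv(h4 + h3, k, 4)  # skip sub-path fuses h3
    d2 = dilated_conv(d3 + h2, k, 2)  # skip sub-path fuses h2
    d1 = dilated_conv(d2 + h1, k, 1)  # skip sub-path fuses h1
    return d1 + x                     # residual: keep original features

x = np.random.default_rng(2).random((32, 32))
y = dilated_block(x)
```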
在该可选的实施例中,可选的,多侧输出融合单元303将候选特征图集合输入至U2Net++网络的多侧输出融合层进行特征融合处理,得到目标分割特征图的具体方式包括:In this optional embodiment, the multi-side output fusion unit 303 optionally inputs the candidate feature map set into the multi-side output fusion layer of the U2Net++ network for feature fusion processing, and the specific method of obtaining the target segmentation feature map includes:
基于预设激活函数,对第一候选特征图子集合中的所有候选特征图进行特征融合处理,得到第一分割特征图;其中,第一分割特征图的计算公式如下:Based on the preset activation function, feature fusion processing is performed on all candidate feature maps in the first candidate feature map subset to obtain a first segmentation feature map; wherein the calculation formula of the first segmentation feature map is as follows:
Y0,5 = σ(Y0,1 ⊕ Y0,2 ⊕ Y0,3 ⊕ Y0,4)
式中,⊕表示连接操作,σ为预设激活函数,Y0,5为第一分割特征图,Y0,1、Y0,2、Y0,3和Y0,4均为第一候选特征图子集合中的特征图;In the formula, ⊕ represents the connection operation, σ is the preset activation function, Y0,5 is the first segmentation feature map, and Y0,1, Y0,2, Y0,3 and Y0,4 are all feature maps in the first candidate feature map subset;
基于预设激活函数,对第二候选特征图子集合中的所有候选特征图进行特征融合处理,得到第二分割特征图;其中,第二分割特征图的计算公式如下:Based on the preset activation function, feature fusion processing is performed on all candidate feature maps in the second candidate feature map subset to obtain a second segmentation feature map; wherein the calculation formula of the second segmentation feature map is as follows:
Y5,0 = σ(Y1,0 ⊕ Y2,0 ⊕ Y3,0 ⊕ Y4,0)
式中,Y5,0为第二分割特征图,Y1,0、Y2,0、Y3,0和Y4,0均为第二候选特征图子集合中的特征图;Wherein, Y5,0 is the second segmentation feature map, and Y1,0, Y2,0, Y3,0 and Y4,0 are all feature maps in the second candidate feature map subset;
基于预设激活函数,对第一分割特征图和第二分割特征图进行特征融合处理,得到目标分割特征图;其中,目标分割特征图的计算公式如下:Based on the preset activation function, the first segmentation feature map and the second segmentation feature map are subjected to feature fusion processing to obtain a target segmentation feature map; wherein the calculation formula of the target segmentation feature map is as follows:
Y5,5 = σ(Y0,5 ⊕ Y5,0)
式中,Y5,5为目标分割特征图。Where Y5,5 is the target segmentation feature map.
可见,实施该可选的实施例所描述的装置还能够基于预设激活函数,分别对第一候选特征图子集合和第二候选特征图子集合中各自的所有候选特征图进行融合处理,分别得到第一分割特征图和第二分割特征图,然后对第一分割特征图和第二分割特征图进行融合,得到目标分割特征图,有助于丰富图像特征融合依据的数量,提高了图像特征融合结果的可靠性,从而提高图像分割依据的确定准确性,进而有利于提高图像分割结果的准确性,同时还能够提高U2Net++网络的收敛性,以提高U2Net++网络的稳定性,有利于提高图像分割结果的可靠性。It can be seen that the device described in this optional embodiment can also fuse, based on the preset activation function, all candidate feature maps in the first candidate feature map subset and in the second candidate feature map subset to obtain the first segmentation feature map and the second segmentation feature map respectively, and then fuse the first segmentation feature map and the second segmentation feature map to obtain the target segmentation feature map, which enriches the basis for image feature fusion and improves the reliability of the image feature fusion results, thereby improving the accuracy with which the image segmentation basis is determined and, further, the accuracy of the image segmentation results; at the same time, it can also improve the convergence and stability of the U2Net++ network, which is beneficial to improving the reliability of the image segmentation results.
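上述多侧输出融合可用如下NumPy草图示意(仅为假设性示例:以沿通道轴堆叠代表连接操作,以1×1卷积式权重加Sigmoid代表"预设激活函数"下的融合,权重取值均为示意):The above multi-side output fusion can be sketched in NumPy as follows (a hypothetical example only: stacking along a channel axis stands in for the connection operation, and 1x1-convolution-style weights followed by a sigmoid stand in for fusion under the "preset activation function"; the weight values are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse(maps, w):
    """Concatenate candidate maps along a channel axis, collapse them with
    1x1-convolution-style weights w, then apply the activation function."""
    stacked = np.stack(maps)  # channel-wise concatenation
    return sigmoid(np.tensordot(w, stacked, axes=1))

rng = np.random.default_rng(3)
subset1 = [rng.standard_normal((16, 16)) for _ in range(4)]  # first candidate subset
subset2 = [rng.standard_normal((16, 16)) for _ in range(4)]  # second candidate subset
y05 = fuse(subset1, np.full(4, 0.25))    # first segmentation feature map
y50 = fuse(subset2, np.full(4, 0.25))    # second segmentation feature map
y55 = fuse([y05, y50], np.full(2, 0.5))  # target segmentation feature map
```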
实施例四Embodiment 4
请参阅图11,图11是本发明实施例公开的又一种基于多尺度特征提取的图像分割装置的结构示意图。如图11所示,该基于多尺度特征提取的图像分割装置可以包括:Please refer to Figure 11, which is a schematic diagram of the structure of another image segmentation device based on multi-scale feature extraction disclosed in an embodiment of the present invention. As shown in Figure 11, the image segmentation device based on multi-scale feature extraction may include:
存储有可执行程序代码的存储器401;A memory 401 storing executable program codes;
与存储器401耦合的处理器402;a processor 402 coupled to the memory 401;
处理器402调用存储器401中存储的可执行程序代码,执行本发明实施例一或本发明实施例二所描述的基于多尺度特征提取的图像分割方法中的步骤。The processor 402 calls the executable program code stored in the memory 401 to execute the steps of the image segmentation method based on multi-scale feature extraction described in the first embodiment of the present invention or the second embodiment of the present invention.
实施例五Embodiment 5
本发明实施例公开了一种计算机存储介质,该计算机存储介质存储有计算机指令,该计算机指令被调用时,用于执行本发明实施例一或本发明实施例二所描述的基于多尺度特征提取的图像分割方法中的步骤。An embodiment of the present invention discloses a computer storage medium, which stores computer instructions. When the computer instructions are called, they are used to execute the steps in the image segmentation method based on multi-scale feature extraction described in Embodiment 1 or Embodiment 2 of the present invention.
实施例六Embodiment 6
本发明实施例公开了一种计算机程序产品,该计算机程序产品包括存储了计算机程序的非瞬时性计算机可读存储介质,且该计算机程序可操作来使计算机执行实施例一或实施例二中所描述的基于多尺度特征提取的图像分割方法中的步骤。An embodiment of the present invention discloses a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to enable a computer to execute the steps in the image segmentation method based on multi-scale feature extraction described in Embodiment 1 or Embodiment 2.
以上所描述的装置实施例仅是示意性的,其中所述作为分离部件说明的模块可以是或者也可以不是物理上分开的,作为模块显示的部件可以是或者也可以不是物理模块,即可以位于一个地方,或者也可以分布到多个网络模块上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性的劳动的情况下,即可以理解并实施。The device embodiments described above are only illustrative, wherein the modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical modules, that is, they may be located in one place, or they may be distributed on multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment. Those of ordinary skill in the art may understand and implement it without paying creative labor.
通过以上的实施例的具体描述,本领域的技术人员可以清楚地了解到各实施方式可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件。基于这样的理解,上述技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品可以存储在计算机可读存储介质中,存储介质包括只读存储器(Read-Only Memory,ROM)、随机存储器(Random Access Memory,RAM)、可编程只读存储器(Programmable Read-Only Memory,PROM)、可擦除可编程只读存储器(Erasable Programmable Read-Only Memory,EPROM)、一次可编程只读存储器(One-time Programmable Read-Only Memory,OTPROM)、电子抹除式可复写只读存储器(Electrically-Erasable Programmable Read-Only Memory,EEPROM)、只读光盘(Compact Disc Read-Only Memory,CD-ROM)或其它光盘存储器、磁盘存储器、磁带存储器、或者能够用于携带或存储数据的计算机可读的任何其它介质。Through the specific description of the above embodiments, those skilled in the art can clearly understand that each implementation method can be implemented by means of software plus a necessary general hardware platform, and of course, it can also be implemented by hardware. Based on such an understanding, the above technical solution can be essentially or partly contributed to the prior art in the form of a software product, and the computer software product can be stored in a computer-readable storage medium, and the storage medium includes a read-only memory (ROM), a random access memory (RAM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), a one-time programmable read-only memory (OTPROM), an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
最后应说明的是:本发明实施例公开的一种基于多尺度特征提取的图像分割方法及装置所揭露的仅为本发明较佳实施例而已,仅用于说明本发明的技术方案,而非对其限制;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解;其依然可以对前述各项实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或替换,并不使相应的技术方案的本质脱离本发明各项实施例技术方案的精神和范围。Finally, it should be noted that the image segmentation method and device based on multi-scale feature extraction disclosed in the embodiments of the present invention are only preferred embodiments of the present invention, and are only used to illustrate the technical solution of the present invention, rather than to limit it. Although the present invention has been described in detail with reference to the aforementioned embodiments, it should be understood by those skilled in the art that the technical solutions described in the aforementioned embodiments can still be modified, or some of the technical features therein can be replaced by equivalents. However, these modifications or replacements do not deviate the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the embodiments of the present invention.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311003985.9ACN117152193A (en) | 2023-08-09 | 2023-08-09 | Image segmentation method and device based on multi-scale feature extraction |
| Publication Number | Publication Date |
|---|---|
| CN117152193A | 2023-12-01 |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10482603B1 (en)* | 2019-06-25 | 2019-11-19 | Artificial Intelligence, Ltd. | Medical image segmentation using an integrated edge guidance module and object segmentation network |
| CN114119638A (en)* | 2021-12-02 | 2022-03-01 | 上海理工大学 | Medical image segmentation method integrating multi-scale features and attention mechanism |
| CN114821069A (en)* | 2022-05-27 | 2022-07-29 | 昆明理工大学 | Building semantic segmentation method for double-branch network remote sensing image fused with rich scale features |
| CN116168051A (en)* | 2023-01-10 | 2023-05-26 | 北京长木谷医疗科技有限公司 | Bone identification method, device, electronic equipment and medium |
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||