CN113033570B - An Image Semantic Segmentation Method Based on Improved Atrous Convolution and Multi-level Feature Information Fusion - Google Patents

An Image Semantic Segmentation Method Based on Improved Atrous Convolution and Multi-level Feature Information Fusion

Info

Publication number
CN113033570B
Authority
CN
China
Prior art keywords
image
feature
convolution
output
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110344461.0A
Other languages
Chinese (zh)
Other versions
CN113033570A (en)
Inventor
高世伟
张长柱
张皓
王祝萍
黄超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University
Priority to CN202110344461.0A
Publication of CN113033570A
Application granted
Publication of CN113033570B
Status: Active (current)
Anticipated expiration


Abstract

The invention relates to an image semantic segmentation method based on improved atrous convolution and multi-level feature information fusion, comprising the following steps: extracting image features in a deep convolutional neural network using an improved atrous convolution method; cascading and fusing the extracted deep feature images with shallow feature images to compensate for the loss of spatial information; learning boundary information from the multi-stage feature images through boundary refinement, then fusing them and restoring the original image resolution to generate a predicted segmentation map; and training the network with a cross-entropy loss function and evaluating model performance with mIoU. The invention improves on existing uses of atrous convolution and designs a deformable spatial pyramid structure, thereby improving the model's image feature extraction. It also designs a multi-level feature information fusion structure for image resolution restoration that fully exploits the local and global information contained in different levels, and introduces boundary refinement, effectively improving the accuracy of image semantic segmentation.

Description

Translated from Chinese
An Image Semantic Segmentation Method Based on Improved Atrous Convolution and Multi-level Feature Information Fusion

Technical Field

The invention relates to the field of computer vision and intelligent pattern recognition systems, and in particular to an image semantic segmentation method based on improved atrous convolution and multi-level feature information fusion.

Background

Automated scene understanding is an important goal of modern computer vision. Image semantic segmentation is a fundamental scene understanding task in computer vision: it takes raw data (e.g., planar images) as input and converts it into masks with highlighted regions of interest, dividing the image into multiple regions that carry different semantic information. In recent years, owing to the excellent performance of deep convolutional neural networks on semantic segmentation tasks, segmentation quality has improved significantly compared with traditional methods such as GrabCut and N-Cut. Good segmentation algorithms are crucial for many practical applications, for example autonomous driving, medical image processing, computational photography, image search engines, and augmented reality. All of these applications require highly accurate pixel-wise predictions.

However, current semantic segmentation methods based on deep convolutional neural networks cannot achieve high prediction and classification accuracy in their segmentation results, owing to problems such as reduced image resolution and loss of global context information caused by repeated pooling and downsampling.

Summary of the Invention

The purpose of the present invention is to provide an image semantic segmentation method based on improved atrous convolution and multi-level feature information fusion. The method effectively improves the information utilization and effectiveness of feature extraction, while also enriching shallow semantic information and learning global image context, thereby improving the accuracy of semantic segmentation of two-dimensional images.

The structure based on the improved atrous convolution method and multi-level feature information fusion improves the image segmentation effect without significantly increasing the computational load of the system. Compared with simply stacking convolutional layers, designing more suitable structures and methods for image feature extraction and spatial information compensation reduces the loss of feature information during downsampling, effectively improves pixel prediction accuracy, and enhances the image semantic segmentation effect.

To achieve the above objective, the technical solution adopted by the present invention is as follows:

An image semantic segmentation method based on improved atrous convolution and multi-level feature information fusion, comprising the following steps:

S1: extract image features in a deep convolutional neural network using an improved atrous convolution method;

S2: cascade and fuse the extracted deep feature images with shallow feature images to compensate for the loss of spatial information;

S3: learn boundary information from the multi-stage feature images through boundary refinement, fuse them and restore the original image resolution to generate a predicted segmentation map;

S4: train the network with a cross-entropy loss function and evaluate model performance with mIoU.

The specific implementation of S1 comprises the following steps:

S1.1: take ResNet-101 as the backbone network and attach the improved skip-connected atrous convolution module after the third sampling module. The module contains three consecutive atrous convolution layers whose dilation rates are changed according to the resolution of the input image, with forward skip connections established between the different convolution layers; this further enlarges the receptive field without further shrinking the image and reduces information loss;

S1.2: feed the image processed by the skip-connected atrous convolution module into the improved deformable spatial pyramid pooling module. This combines the advantages of deformable convolution, whose receptive field adapts to changes in target scale and which aggregates information flexibly, with the advantages of standard multi-scale atrous convolution sampling, which can effectively classify arbitrary regions of the image, improving the model's ability to learn target deformations at the cost of a small increase in model complexity;

S1.3: retain the different levels of feature information contained in feature images of different resolutions at different stages of the downsampling process.

The specific implementation of S2 comprises the following steps:

S2.1: pass the feature layer processed by the skip-connected atrous convolution module through a 1×1 convolution and combine it with the feature image extracted at the deepest layer to supplement the semantic information of this shallow feature image; pass the output feature map through a 1×1 convolution as the output of this layer;

S2.2: combine the feature map output in S2.1 with the feature image output by the previous module to supplement the semantic information of this shallow feature image; pass the output feature map through a 1×1 convolution as the output of this layer;

S2.3: upsample the feature image output in S2.2 by a factor of two through bilinear interpolation and combine it with the feature image output by the previous module to supplement the semantic information of this shallow feature image; pass the output feature map through a 1×1 convolution as the output of this layer;

S2.4: upsample the feature image output in S2.3 by a factor of two through bilinear interpolation and combine it with the feature image output by the previous module to supplement the semantic information of this shallow feature image; pass the output feature map through a 1×1 convolution as the output of this layer.

The specific implementation of S3 comprises the following steps:

S3.1: upsample the deepest output feature image by a factor of four through bilinear interpolation;

S3.2: upsample the feature image output by the layer in S2.1 by a factor of four through bilinear interpolation;

S3.3: upsample the feature image output by the layer in S2.2 by a factor of four through bilinear interpolation;

S3.4: upsample the feature image output by the layer in S2.3 by a factor of two through bilinear interpolation;

S3.5: refine the boundaries of the feature images output in S2.4, S3.1, S3.2, S3.3 and S3.4 with the BR module and fuse them; after the two further steps of a 3×3 convolution and four-fold bilinear upsampling, restore the original image resolution to obtain the final predicted segmentation map.

The specific implementation of S4 comprises the following steps:

S4.1: compute the cross-entropy loss between the predicted segmentation map and the ground-truth segmentation map of the dataset, update the parameter weights of the model using the backpropagation algorithm, and obtain the final semantic segmentation model after training on the training set;

S4.2: test the predictive performance of the model on the test set of the dataset using the mIoU metric.

Owing to the above technical solution, the present invention has the following advantages and effects compared with the prior art:

The present invention fully considers the benefits and drawbacks of atrous convolution for semantic segmentation, improves on the existing uses of atrous convolution, and designs a deformable spatial pyramid structure, improving the model's image feature extraction. Moreover, compared with common upsampling methods, a multi-level feature information fusion structure is designed for image resolution restoration that fully exploits the local and global information contained in different levels, and boundary refinement is introduced, effectively improving the accuracy of image semantic segmentation.

Brief Description of the Drawings

Fig. 1 is a flowchart of the overall semantic segmentation method proposed by the present invention;

Fig. 2 is the network model diagram of the overall semantic segmentation algorithm proposed by the present invention;

Fig. 3 shows the skip-connected atrous convolution module in the network structure of the present invention;

Fig. 4 shows visualization results of the algorithm of the present invention on the Cityscapes dataset.

Detailed Description of Embodiments

The present invention is further described below with reference to the accompanying drawings and an embodiment:

An image semantic segmentation method based on improved atrous convolution and multi-level feature information fusion comprises the following steps, as shown in Fig. 1:

S1: extract image features in a deep convolutional neural network using the improved atrous convolution method, as shown in the dashed box labeled "S1" in Fig. 2:

S1.1: first, take ResNet-101 as the backbone network and attach the improved skip-connected atrous convolution module after the third sampling module, where "Conv" stands for "Convolution" and denotes a convolutional layer. Fig. 3 shows the structure of this module: it contains three consecutive atrous convolution layers whose dilation rates are changed according to the resolution of the input image; the dilation rates of the three layers in Fig. 3 are 2, 4 and 8 in turn, with forward skip connections established between the different convolution layers. This further enlarges the receptive field without further shrinking the image and reduces information loss;
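For illustration, the following minimal sketch gives one plausible reading of this module, written with tf.keras to match the TensorFlow framework used in the experiments below; the filter count, the ReLU activations, and the use of elementwise addition to merge the skip connections are assumptions not fixed by the patent text:

```python
import tensorflow as tf

def skip_connected_atrous_block(x, filters=256, rates=(2, 4, 8)):
    """Three consecutive atrous convolutions (rates 2, 4, 8 as in Fig. 3)
    with forward skip connections between the layers."""
    # Channel alignment so the skip connections can be summed
    # (assumed merge operation; the patent only states that forward
    # skip connections are established between the layers).
    x = tf.keras.layers.Conv2D(filters, 1, padding="same")(x)
    outputs = [x]
    for rate in rates:
        # Each atrous layer sees the merged outputs of all earlier layers,
        # enlarging the receptive field without shrinking the image.
        inp = outputs[0] if len(outputs) == 1 else tf.keras.layers.Add()(outputs)
        y = tf.keras.layers.Conv2D(filters, 3, padding="same",
                                   dilation_rate=rate, activation="relu")(inp)
        outputs.append(y)
    return outputs[-1]
```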

S1.2: feed the image processed by the skip-connected atrous convolution module into the improved deformable spatial pyramid pooling module, composed in Fig. 2 of three atrous convolution layers, one deformable convolution layer and a max-pooling layer. This combines the advantages of deformable convolution, whose receptive field adapts to changes in target scale and which aggregates information flexibly, with the advantages of standard multi-scale atrous convolution sampling, which can effectively classify arbitrary regions of the image, improving the model's ability to learn target deformations at the cost of a small increase in model complexity;
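A hedged sketch of how this module could be assembled from the parts named in Fig. 2 (three atrous branches, one deformable branch, one pooling branch). The dilation rates 6/12/18 and the branch width are assumptions, and since core TensorFlow ships no deformable convolution, that branch is stubbed with an ordinary convolution that a real deformable layer would replace:

```python
def deformable_spatial_pyramid_pooling(x, filters=256):
    """Parallel multi-scale branches, concatenated and projected by 1x1 conv."""
    # Three atrous convolution branches (rates are assumptions).
    b1 = tf.keras.layers.Conv2D(filters, 3, padding="same", dilation_rate=6)(x)
    b2 = tf.keras.layers.Conv2D(filters, 3, padding="same", dilation_rate=12)(x)
    b3 = tf.keras.layers.Conv2D(filters, 3, padding="same", dilation_rate=18)(x)
    # Deformable branch: placeholder only; core TensorFlow provides no
    # deformable convolution, so a custom or third-party layer goes here.
    b4 = tf.keras.layers.Conv2D(filters, 3, padding="same")(x)
    # Resolution-preserving max-pooling branch.
    b5 = tf.keras.layers.MaxPooling2D(pool_size=3, strides=1, padding="same")(x)
    b5 = tf.keras.layers.Conv2D(filters, 1, padding="same")(b5)
    merged = tf.keras.layers.Concatenate()([b1, b2, b3, b4, b5])
    return tf.keras.layers.Conv2D(filters, 1, padding="same")(merged)
```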

S1.3: retain the different levels of feature information contained in feature images of different resolutions at different stages of the downsampling process.

S2: cascade and fuse the extracted deep feature images with shallow feature images to compensate for the loss of spatial information, as shown in the dashed box labeled "S2" in Fig. 2;

S2.1: as shown in Fig. 2, pass the feature layer processed by the skip-connected atrous convolution module through a 1×1 convolution and combine it with the feature image extracted at the deepest layer of the network model. In the figure, "C" stands for "Concatenate" and denotes the fusion of feature maps from different levels, used to supplement the semantic information of the shallow feature image; pass the output feature map through a 1×1 convolution as the output of this layer;

S2.2: combine the feature map output in S2.1 with the feature image output by the previous module to supplement the semantic information of this shallow feature image; pass the output feature map through a 1×1 convolution as the output of this layer;

S2.3: upsample the feature image output in S2.2 by a factor of two through bilinear interpolation (i.e., "upsample by 2") and combine it with the feature image output by the previous module to supplement the semantic information of this shallow feature image; pass the output feature map through a 1×1 convolution as the output of this layer;

S2.4: upsample the feature image output in S2.3 by a factor of two through bilinear interpolation and combine it with the feature image output by the previous module to supplement the semantic information of this shallow feature image; pass the output feature map through a 1×1 convolution as the output of this layer.
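Steps S2.1 to S2.4 share one pattern: optionally upsample the deeper map by two, concatenate with the shallower map (the "C" node in Fig. 2), and project with a 1×1 convolution. A minimal sketch of that pattern, with the channel width and the wiring of the backbone stages (f_skip, f_deep, f3, f2, f1) assumed for illustration:

```python
def fusion_stage(deep, shallow, filters=256, upsample=False):
    """One S2 fusion stage: (optional) bilinear x2 upsampling,
    concatenation with the shallower feature map, then 1x1 conv."""
    if upsample:
        deep = tf.keras.layers.UpSampling2D(size=2,
                                            interpolation="bilinear")(deep)
    merged = tf.keras.layers.Concatenate()([deep, shallow])
    return tf.keras.layers.Conv2D(filters, 1, padding="same")(merged)

# Hypothetical wiring: f_skip is the skip-connected atrous output,
# f_deep the pyramid-pooling output, f3/f2/f1 shallower backbone stages.
# s21 = fusion_stage(f_deep, tf.keras.layers.Conv2D(256, 1)(f_skip))   # S2.1
# s22 = fusion_stage(s21, f3)                                          # S2.2
# s23 = fusion_stage(s22, f2, upsample=True)                           # S2.3
# s24 = fusion_stage(s23, f1, upsample=True)                           # S2.4
```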

S3: learn boundary information from the multi-stage feature images through boundary refinement, fuse them and restore the original image resolution to generate a predicted segmentation map, as shown in the dashed box labeled "S3" in Fig. 2;

S3.1: upsample the deepest output feature image by a factor of four through bilinear interpolation;

S3.2: upsample the feature image output by the layer in S2.1 by a factor of four through bilinear interpolation;

S3.3: upsample the feature image output by the layer in S2.2 by a factor of four through bilinear interpolation;

S3.4: upsample the feature image output by the layer in S2.3 by a factor of two through bilinear interpolation;

S3.5: refine the boundaries of the feature images output in S2.4, S3.1, S3.2, S3.3 and S3.4 with the BR (Boundary Refinement) module and fuse them; after the two further steps of a 3×3 convolution and four-fold bilinear upsampling, restore the original image resolution to obtain the final predicted segmentation map.
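The patent does not spell out the internals of the BR block. The sketch below assumes the common residual design for boundary refinement (identity plus two stacked 3×3 convolutions) and a 21-class output head; both are assumptions:

```python
def boundary_refinement(x):
    """Assumed BR block: residual correction that sharpens boundaries."""
    filters = x.shape[-1]
    r = tf.keras.layers.Conv2D(filters, 3, padding="same",
                               activation="relu")(x)
    r = tf.keras.layers.Conv2D(filters, 3, padding="same")(r)
    return tf.keras.layers.Add()([x, r])

def prediction_head(branches, num_classes=21):
    """S3.5: refine each branch (S2.4 plus the upsampled S3.1-S3.4
    outputs), fuse, then 3x3 conv and x4 bilinear upsampling back to
    the original image resolution."""
    refined = [boundary_refinement(b) for b in branches]
    fused = tf.keras.layers.Concatenate()(refined)
    logits = tf.keras.layers.Conv2D(num_classes, 3, padding="same")(fused)
    return tf.keras.layers.UpSampling2D(size=4,
                                        interpolation="bilinear")(logits)
```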

S4: train the network with a cross-entropy loss function and evaluate model performance with mIoU.

S4.1: compute the cross-entropy loss between the predicted segmentation map and the ground-truth segmentation map of the dataset, update the parameter weights of the model using the backpropagation algorithm, and obtain the final semantic segmentation model after training on the training set;

S4.2: test the predictive performance of the model on the test set of the dataset using pixel accuracy and mIoU.
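A hedged sketch of the S4 training and evaluation loop. Here `model`, `train_ds` and `val_ds` are hypothetical placeholders for the network above and tf.data pipelines of (image, label-map) pairs; the optimizer settings and epoch count are assumptions, and 21 classes matches PASCAL VOC 2012 (20 classes plus background):

```python
NUM_CLASSES = 21  # PASCAL VOC 2012: 20 classes plus one background class

def mean_iou(y_true, y_pred_labels, num_classes=NUM_CLASSES):
    """mIoU from a confusion matrix over integer label maps (S4.2)."""
    cm = tf.math.confusion_matrix(tf.reshape(y_true, [-1]),
                                  tf.reshape(y_pred_labels, [-1]),
                                  num_classes=num_classes, dtype=tf.float64)
    tp = tf.linalg.diag_part(cm)
    fp = tf.reduce_sum(cm, axis=0) - tp
    fn = tf.reduce_sum(cm, axis=1) - tp
    return tf.reduce_mean(tp / (tp + fp + fn + 1e-10))

def train_and_evaluate(model, train_ds, val_ds, epochs=50):
    # S4.1: cross-entropy loss; fit() runs the backpropagation updates.
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
    model.fit(train_ds, epochs=epochs)
    # S4.2: per-batch mIoU averaged over the evaluation set
    # (accumulating one global confusion matrix would be the exact variant).
    scores = [mean_iou(y, tf.argmax(model(x), axis=-1)) for x, y in val_ds]
    return tf.reduce_mean(tf.stack(scores))
```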

Experiments conducted according to the method of the present invention are described below to illustrate its predictive performance.

Test environment: Ubuntu 16.04; NVIDIA GTX 1080 Ti GPU; Python 3.5; TensorFlow framework.

Test dataset: the selected dataset is PASCAL VOC 2012, an image dataset for image segmentation in computer vision tasks. It covers four categories (vehicles, household, animals, person), further subdivided into 20 subcategories (plus one background class). The dataset contains 1464 training images, 1449 validation images and 1456 test images.

Test metric: the present invention uses mIoU as its performance evaluation metric. mIoU is the ratio of the intersection of the predicted region and the ground-truth region to their union, averaged over classes. Comparing this metric across existing algorithms demonstrates the good results achieved by the present invention in image semantic segmentation.
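Written out per class c over C classes, with TP_c, FP_c and FN_c the per-class true-positive, false-positive and false-negative pixel counts, this definition reads:

mIoU = (1/C) * Σ_{c=1}^{C} TP_c / (TP_c + FP_c + FN_c)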

The test results are as follows:

Table 1. Performance comparison of the present invention with different atrous convolution dilation rates in the deformable spatial pyramid pooling module; the comparison shows that appropriate parameter settings improve network performance.

[Table 1 is reproduced as an image in the original document.]

Table 2. Performance comparison of the present invention with the multi-level feature information fusion and boundary refinement modules added, demonstrating the effectiveness of the network design.

[Table 2 is reproduced as an image in the original document.]

Table 3. Performance comparison between the present invention and other algorithms on the PASCAL VOC 2012 dataset.

[Table 3 is reproduced as an image in the original document.]

The above comparative data show that the mIoU of the present invention is significantly improved over existing algorithms.

It should be emphasized that the examples described herein are illustrative; their purpose is to enable those skilled in the art to understand the content of the present invention and implement it accordingly, and the present invention includes but is not limited to the examples described in the detailed embodiments. All equivalent changes or modifications made according to the spirit and essence of the present invention shall fall within the protection scope of the present invention.

Claims (4)

Translated from Chinese
1. An image semantic segmentation method based on improved atrous convolution and multi-level feature information fusion, characterized by comprising the following steps:

S1: extract image features in a deep convolutional neural network using an improved atrous convolution method;

S2: cascade and fuse the extracted deep feature images with shallow feature images to compensate for the loss of spatial information;

S3: learn boundary information from the multi-stage feature images through boundary refinement, fuse them and restore the original image resolution to generate a predicted segmentation map;

S4: train the network with a cross-entropy loss function and evaluate model performance with mIoU;

wherein the specific implementation of S1 comprises the following steps:

S1.1: take ResNet-101 as the backbone network and attach the improved skip-connected atrous convolution module after the third sampling module, the module containing three consecutive atrous convolution layers whose dilation rates are changed according to the resolution of the input image, with forward skip connections established between the different convolution layers, further enlarging the receptive field without further shrinking the image and reducing information loss;

S1.2: feed the image processed by the skip-connected atrous convolution module into the improved deformable spatial pyramid pooling module, combining the advantages of deformable convolution, whose receptive field adapts to changes in target scale and which aggregates information flexibly, with the advantages of standard multi-scale atrous convolution sampling, which can effectively classify arbitrary regions of the image, improving the model's ability to learn target deformations at the cost of a small increase in model complexity;

S1.3: retain the different levels of feature information contained in feature images of different resolutions at different stages of the downsampling process.

2. The image semantic segmentation method based on improved atrous convolution and multi-level feature information fusion according to claim 1, characterized in that the specific implementation of S2 comprises the following steps:

S2.1: pass the feature layer processed by the skip-connected atrous convolution module through a 1×1 convolution and combine it with the feature image extracted at the deepest layer to supplement the semantic information of this shallow feature image; pass the output feature map through a 1×1 convolution as the output of this layer;

S2.2: combine the feature map output in S2.1 with the feature image output by the previous module to supplement the semantic information of this shallow feature image; pass the output feature map through a 1×1 convolution as the output of this layer;

S2.3: upsample the feature image output in S2.2 by a factor of two through bilinear interpolation and combine it with the feature image output by the previous module to supplement the semantic information of this shallow feature image; pass the output feature map through a 1×1 convolution as the output of this layer;

S2.4: upsample the feature image output in S2.3 by a factor of two through bilinear interpolation and combine it with the feature image output by the previous module to supplement the semantic information of this shallow feature image; pass the output feature map through a 1×1 convolution as the output of this layer.

3. The image semantic segmentation method based on improved atrous convolution and multi-level feature information fusion according to claim 1, characterized in that, in order to fuse feature images from different levels carrying different spatial and semantic information at the same resolution, the multi-level output feature images are uniformly brought to one quarter of the original image resolution through bilinear interpolation, the specific implementation of S3 comprising the following steps:

S3.1: upsample the deepest output feature image by a factor of four through bilinear interpolation;

S3.2: upsample the feature image output by the layer in S2.1 by a factor of four through bilinear interpolation;

S3.3: upsample the feature image output by the layer in S2.2 by a factor of four through bilinear interpolation;

S3.4: upsample the feature image output by the layer in S2.3 by a factor of two through bilinear interpolation;

S3.5: refine the boundaries of the feature images output in S2.4, S3.1, S3.2, S3.3 and S3.4 with the BR module and fuse them; after a 3×3 convolution and four-fold bilinear upsampling, restore the original image resolution to obtain the final predicted segmentation map.

4. The image semantic segmentation method based on improved atrous convolution and multi-level feature information fusion according to claim 1, characterized in that the specific implementation of S4 comprises the following steps:

S4.1: compute the cross-entropy loss between the predicted segmentation map and the ground-truth segmentation map of the dataset, update the parameter weights of the model using the backpropagation algorithm, and obtain the final semantic segmentation model after training on the training set;

S4.2: test the predictive performance of the model on the test set of the dataset using the mIoU metric.
CN202110344461.0A | 2021-03-29 | 2021-03-29 | An Image Semantic Segmentation Method Based on Improved Atrous Convolution and Multi-level Feature Information Fusion (Active; granted as CN113033570B (en))

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110344461.0A | 2021-03-29 | 2021-03-29 | An Image Semantic Segmentation Method Based on Improved Atrous Convolution and Multi-level Feature Information Fusion

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110344461.0A | 2021-03-29 | 2021-03-29 | An Image Semantic Segmentation Method Based on Improved Atrous Convolution and Multi-level Feature Information Fusion

Publications (2)

Publication Number | Publication Date
CN113033570A (en) | 2021-06-25
CN113033570B (en) | 2022-11-11

Family

ID=76452856

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202110344461.0A | An Image Semantic Segmentation Method Based on Improved Atrous Convolution and Multi-level Feature Information Fusion (Active; granted as CN113033570B (en)) | 2021-03-29 | 2021-03-29

Country Status (1)

Country | Link
CN (1) | CN113033570B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113506310B (en) * | 2021-07-16 | 2022-03-01 | 首都医科大学附属北京天坛医院 | Medical image processing method, device, electronic device and storage medium
CN113658200B (en) * | 2021-07-29 | 2024-01-02 | 东北大学 | Edge perception image semantic segmentation method based on self-adaptive feature fusion
CN113762476B (en) * | 2021-09-08 | 2023-12-19 | 中科院成都信息技术股份有限公司 | Neural network model for text detection and text detection method thereof
CN113762396A (en) * | 2021-09-10 | 2021-12-07 | 西南科技大学 | A method for semantic segmentation of two-dimensional images
CN113920099B (en) * | 2021-10-15 | 2022-08-30 | 深圳大学 | Polyp segmentation method based on non-local information extraction and related components
CN113936006A (en) * | 2021-10-29 | 2022-01-14 | 天津大学 | Segmentation method and device for processing high-noise low-quality medical image
CN114187442A (en) * | 2021-12-14 | 2022-03-15 | 深圳致星科技有限公司 | Image processing method, storage medium, electronic device, and image processing apparatus
CN114511766B (en) * | 2022-01-26 | 2025-01-24 | 西南民族大学 | Image recognition method and related device based on deep learning
CN114419449B (en) * | 2022-03-28 | 2022-06-24 | 成都信息工程大学 | A Semantic Segmentation Method for Remote Sensing Images Based on Self-Attention Multi-scale Feature Fusion
CN115375652B (en) * | 2022-08-22 | 2025-09-19 | 深圳大学 | Method, device, equipment and medium for detecting welding defects of new energy battery pole
CN115829980B (en) * | 2022-12-13 | 2023-07-25 | 深圳核韬科技有限公司 | Image recognition method, device and equipment for fundus photo and storage medium
CN117211758B (en) * | 2023-11-07 | 2024-04-02 | 克拉玛依市远山石油科技有限公司 | Intelligent drilling control system and method for shallow hole coring
CN117809043B (en) * | 2024-03-01 | 2024-04-30 | 华东交通大学 | Foundation cloud picture segmentation and classification method

Citations (9)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN108876793A (en) * | 2018-04-13 | 2018-11-23 | 北京迈格威科技有限公司 | Semantic segmentation methods, devices and systems and storage medium
CN109190626A (en) * | 2018-07-27 | 2019-01-11 | 国家新闻出版广电总局广播科学研究院 | A kind of semantic segmentation method of the multipath Fusion Features based on deep learning
CN109190752A (en) * | 2018-07-27 | 2019-01-11 | 国家新闻出版广电总局广播科学研究院 | The image, semantic dividing method of global characteristics and local feature based on deep learning
CN109325534A (en) * | 2018-09-22 | 2019-02-12 | 天津大学 | A Semantic Segmentation Method Based on Bidirectional Multiscale Pyramid
CN109461157A (en) * | 2018-10-19 | 2019-03-12 | 苏州大学 | Image, semantic dividing method based on multi-stage characteristics fusion and Gauss conditions random field
CN110232394A (en) * | 2018-03-06 | 2019-09-13 | 华南理工大学 | A kind of multi-scale image semantic segmentation method
CN110706239A (en) * | 2019-09-26 | 2020-01-17 | 哈尔滨工程大学 | Scene segmentation method fusing full convolution neural network and improved ASPP module
CN111242138A (en) * | 2020-01-11 | 2020-06-05 | 杭州电子科技大学 | RGBD significance detection method based on multi-scale feature fusion
CN112446890A (en) * | 2020-10-14 | 2021-03-05 | 浙江工业大学 | Melanoma segmentation method based on void convolution and multi-scale fusion

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109711413B (en) * | 2018-12-30 | 2023-04-07 | 陕西师范大学 | Image semantic segmentation method based on deep learning
CN110188817B (en) * | 2019-05-28 | 2021-02-26 | 厦门大学 | A real-time high-performance semantic segmentation method for street view images based on deep learning
CN111369563B (en) * | 2020-02-21 | 2023-04-07 | 华南理工大学 | A Semantic Segmentation Method Based on Pyramid Atrous Convolutional Network

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110232394A (en) * | 2018-03-06 | 2019-09-13 | 华南理工大学 | A kind of multi-scale image semantic segmentation method
CN108876793A (en) * | 2018-04-13 | 2018-11-23 | 北京迈格威科技有限公司 | Semantic segmentation methods, devices and systems and storage medium
CN109190626A (en) * | 2018-07-27 | 2019-01-11 | 国家新闻出版广电总局广播科学研究院 | A kind of semantic segmentation method of the multipath Fusion Features based on deep learning
CN109190752A (en) * | 2018-07-27 | 2019-01-11 | 国家新闻出版广电总局广播科学研究院 | The image, semantic dividing method of global characteristics and local feature based on deep learning
CN109325534A (en) * | 2018-09-22 | 2019-02-12 | 天津大学 | A Semantic Segmentation Method Based on Bidirectional Multiscale Pyramid
CN109461157A (en) * | 2018-10-19 | 2019-03-12 | 苏州大学 | Image, semantic dividing method based on multi-stage characteristics fusion and Gauss conditions random field
CN110706239A (en) * | 2019-09-26 | 2020-01-17 | 哈尔滨工程大学 | Scene segmentation method fusing full convolution neural network and improved ASPP module
CN111242138A (en) * | 2020-01-11 | 2020-06-05 | 杭州电子科技大学 | RGBD significance detection method based on multi-scale feature fusion
CN112446890A (en) * | 2020-10-14 | 2021-03-05 | 浙江工业大学 | Melanoma segmentation method based on void convolution and multi-scale fusion

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DeepStrip: High Resolution Boundary Refinement; Peng Zhou et al.; arXiv; 2020-03-25; full text *
ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation; Sachin Mehta et al.; SpringerLink; 2018-12-31; full text *
Rethinking Atrous Convolution for Semantic Image Segmentation; Liang-Chieh Chen et al.; arXiv; 2017-12-05; full text *

Also Published As

Publication number | Publication date
CN113033570A (en) | 2021-06-25

Similar Documents

Publication | Publication Date | Title
CN113033570B (en) An Image Semantic Segmentation Method Based on Improved Atrous Convolution and Multi-level Feature Information Fusion
CN109740465B (en) A Lane Line Detection Algorithm Based on Instance Segmentation Neural Network Framework
CN112396607B (en) A Deformable Convolution Fusion Enhanced Semantic Segmentation Method for Street View Images
CN112132844B (en) Image segmentation method based on lightweight recursive non-local self-attention
CN109583340B (en) A video object detection method based on deep learning
CN108710830A (en) A kind of intensive human body 3D posture estimation methods for connecting attention pyramid residual error network and equidistantly limiting of combination
CN108985250A (en) Traffic scene analysis method based on multitask network
CN110188685A (en) A target counting method and system based on double-attention multi-scale cascade network
CN112733919B (en) Image semantic segmentation method and system based on void convolution and multi-scale and multi-branch
CN113724271B (en) Semantic segmentation model training method for understanding complex environment mobile robot scene
CN111476249B (en) Construction method of multi-scale large receptive field convolutional neural network
CN113673590A (en) Rain removal method, system and medium based on multi-scale hourglass densely connected network
CN115035295B (en) Remote sensing image semantic segmentation method based on shared convolution kernel and boundary loss function
CN113870335A (en) Monocular depth estimation method based on multi-scale feature fusion
CN112052783A (en) High-resolution image weak supervision building extraction method combining pixel semantic association and boundary attention
CN112465801B (en) An Instance Segmentation Method for Extracting Mask Features at Different Scales
CN116645598B (en) A remote sensing image semantic segmentation method based on channel attention feature fusion
CN113256546A (en) Depth map completion method based on color map guidance
CN113255675B (en) Image semantic segmentation network structure and method based on dilated convolution and residual path
CN109447897B (en) Real scene image synthesis method and system
CN112329801A (en) Convolutional neural network non-local information construction method
CN114299101A (en) Method, apparatus, apparatus, medium and program product for acquiring target area of image
CN113378704B (en) A multi-target detection method, device and storage medium
CN112634289B (en) A Fast Feasible Domain Segmentation Method Based on Asymmetric Atrous Convolution
CN114638408A (en) A Pedestrian Trajectory Prediction Method Based on Spatio-temporal Information

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
