CN118898718B - A semantic segmentation method with enhanced boundary perception - Google Patents

A semantic segmentation method with enhanced boundary perception

Info

Publication number
CN118898718B
Authority
CN
China
Prior art keywords
feature
module
convolution
representing
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411004607.7A
Other languages
Chinese (zh)
Other versions
CN118898718A (en)
Inventor
焦文华
田玉宇
周旭
蔡晓异
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Mining and Technology Beijing CUMTB
Original Assignee
China University of Mining and Technology Beijing CUMTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Mining and Technology Beijing CUMTB
Priority to CN202411004607.7A
Publication of CN118898718A
Application granted
Publication of CN118898718B
Legal status: Active (current)
Anticipated expiration

Abstract

Translated from Chinese

The present invention discloses a semantic segmentation method with enhanced boundary perception, belonging to the technical field of semantic segmentation. The method mainly comprises an encoding path and a decoding path. The encoding path is composed of five encoding modules: each encoder encodes the multi-level semantic information of the target area in the whole image, convolution operations of different scales within each encoding module obtain multi-scale information of the target area, and pooling operations effectively aggregate contextual semantics. The decoding path is mainly composed of four modules: each decoding module, under the guidance of the attention embedding module AEM, aggregates and refines the information flow of the different branches, the graph convolution module captures the feature information of large-scale irregular areas, and the attention embedding module generates complementary spatial details to better model the encoded features. By adopting the above semantic segmentation method with enhanced boundary perception, the present invention achieves a more accurate segmentation effect.

Description

Semantic segmentation method for enhancing boundary perception
Technical Field
The invention relates to the technical field of semantic segmentation, in particular to a semantic segmentation method for enhancing boundary perception.
Background
Semantic segmentation in computer vision aims to correctly classify the targets present in an image and segment them to obtain accurate outline shapes. It is a classical task in the field of computer vision, has important application value in fields such as the intelligent information industry, industrial intelligence and autonomous driving, and is an important prerequisite for subsequent vision tasks. With the rapid development of deep learning technology, the semantic segmentation task has progressively broken through into new fields. In reality, however, acquired images generally suffer from complex structure, insufficient target saliency and low image quality, so extracting rich and accurate target features from the whole image is crucial.
With the development of image processing technology, image analysis has gone through three development stages: image processing techniques, machine learning and deep learning. Methods based on image processing techniques typically include three steps: image preprocessing, feature extraction and target region extraction. Image preprocessing includes graying, filtering, binarization and the like, and aims to remove noise in the image and enhance image contrast, so that the segmented target area can then be obtained with methods such as threshold analysis, edge detection and Hough transformation. Machine-learning-based methods, on the other hand, achieve target segmentation by training a classifier. These methods typically include three steps: collecting a dataset, extracting features and training the classifier. Common classifiers are the support vector machine (Support Vector Machine, SVM) and the convolutional neural network (Convolutional Neural Network, CNN), and common feature extraction methods are image segmentation and feature selection. However, given the inherent difficulty of pixel-level image labeling, and the complex semantic structure and class imbalance exhibited by most images, detailed analysis of images by conventional data-driven deep learning methods remains a challenge.
Most existing deep-learning-based semantic segmentation methods are suited to natural images, but for images with low illumination and weak contrast they cannot accurately separate the boundaries between targets and background, resulting in poor segmentation accuracy. Furthermore, the underlying CNN can only operate within fixed-size square regions of structured features, whereas most target areas exhibit irregular, unstructured shapes. Effectively extracting foreground objects from such images therefore remains a difficult task.
Disclosure of Invention
The invention aims to provide a semantic segmentation method for enhancing boundary perception, which has a more accurate segmentation effect.
In order to achieve the above object, the present invention provides a semantic segmentation method for enhancing boundary perception, including an encoding path and a decoding path;
Wherein the encoding path comprises the sub-steps of:
S1, extracting features from the whole image with an initial feature extraction module to obtain a convolution feature map;
S2, sequentially inputting the convolution feature map obtained in step S1 into four coding modules to obtain the module feature maps of the four coding modules;
S3, sequentially embedding the convolution feature map obtained in step S1 into the four module feature maps to obtain four module total feature maps;
The decoding path comprises the following sub-steps:
S4, inputting the total feature map of the 4th module output in step S3 into the 3rd decoding module and obtaining a refined extraction feature map through convolution, wherein the pixel points of the refined extraction feature map form the nodes of a topology graph and the connections between them form an adjacency matrix;
S5, reconstructing and refining the feature maps generated by the attention embedding module AEM into a group of dimension-reduction feature maps and a group of projection matrices, so as to meet the input requirements of the multi-scale graph reasoning module MsGRM and the requirements of feature aggregation over the graph nodes;
S6, using the topology graph and adjacency matrix obtained in step S4 and the group of dimension-reduction feature maps and group of projection matrices obtained in step S5 as inputs, and reconstructing the feature maps with the graph convolution and attention embedding module AEM;
S7, the reconstructed feature maps obtained in step S6 are transmitted in turn into decoding module 2, decoding module 1 and decoding module 0, the other decoding modules working in the same way.
The invention comprises a 3rd decoding module, a 2nd decoding module, a 1st decoding module and a 0th decoding module; in each decoding module, a reconstructed feature map is obtained by the method of step S6 and transmitted in a densely connected manner.
Preferably, the coding path consists of an initial feature extraction module and four identical coding modules;
The initial feature extraction module comprises a convolution module with a convolution kernel of 7×7 and two convolution modules with a convolution kernel of 3×3.
Preferably, in step S1, the initial feature extraction module extracts features in the whole image to obtain a convolution feature map, and the calculation process is shown in formula (1):
Wherein f0 represents the features after the first layer of the initial feature extraction module, f1 represents the features after the second and third layers, and f′0 represents the features after the fourth layer; σ(·) represents the activation function and σs(·) the ReLU activation function; BN(·) represents the batch normalization layer; Conv3×3(·) and Conv7×7(·) represent 3×3 and 7×7 convolution operations, respectively; the addition operator represents element-wise matrix addition; and MaxPool3×3(·) represents a 3×3 max pooling operation.
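Formula (1) itself is not reproduced here; purely as an illustration, a minimal PyTorch-style sketch of such a stem (one 7×7 convolution, two 3×3 convolutions, element-wise addition and a 3×3 max pooling) is given below, with all layer names, channel counts and strides being assumptions rather than values taken from the patent:

```python
import torch
import torch.nn as nn

class InitialFeatureExtraction(nn.Module):
    """Illustrative stem: one 7x7 conv, two 3x3 convs, element-wise addition, 3x3 max pooling."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        self.conv7 = nn.Sequential(nn.Conv2d(in_ch, out_ch, 7, stride=2, padding=3),
                                   nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.conv3a = nn.Sequential(nn.Conv2d(out_ch, out_ch, 3, padding=1),
                                    nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.conv3b = nn.Sequential(nn.Conv2d(out_ch, out_ch, 3, padding=1),
                                    nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.pool = nn.MaxPool2d(3, stride=2, padding=1)

    def forward(self, x):
        f0 = self.conv7(x)                  # features after the first (7x7) layer
        f1 = self.conv3b(self.conv3a(f0))   # features after the second and third (3x3) layers
        f0p = self.pool(f0 + f1)            # fourth layer: element-wise addition, then 3x3 max pooling
        return f0, f0p
```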
Preferably, the specific operation of step S2 is as follows:
Inputting the convolution feature map obtained in step S1 into the first coding module; each coding module then inputs the feature map output by the previous layer into a group of consecutive 3×3 convolutions to obtain a first-scale feature map of the image;
The feature map output by the previous layer is input in parallel into a 1×1 convolution to obtain a feature map of a second scale, and the feature maps of the two different scales are then fused. The fusion method is as follows:
wherein ffus represents the fused feature map; UP2(·) represents a bilinear interpolation up-sampling operation; FC(·) represents the fully connected layer; a 1×1 convolution is applied to fv, the feature map from the previous layer; σr(·) represents the Leaky ReLU activation function; a further 1×1 convolution performs dimension reduction; θρ represents the attention coefficient; H′ represents the reconstructed height, W′ the reconstructed width, H the original height, Hρ the squeezed height dimension and Wρ the squeezed width dimension; v represents the scale set, l the low scale and h the high scale;
After the fused feature map ffus passes through an activation layer, the information flow is transmitted to the next layer through a skip connection in residual form, as calculated in formula (3):
Wherein f′fus represents the squeeze-fusion feature map obtained by a convolution operation on ffus, fs represents the module feature map obtained by combining f′fus and ffus in a residual structure, and δ(·) represents two consecutive 3×3 convolution operations.
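As an illustrative sketch only, an encoding module with the two parallel branches and the residual skip described above might be written as follows in PyTorch; the fusion of formula (2) is replaced by a simple addition here, and all channel counts are assumed:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EncodingModule(nn.Module):
    """Illustrative encoding module: a 3x3-conv branch and a parallel 1x1-conv branch,
    fused and passed on through a residual skip connection (wiring assumed)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branch3x3 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.branch1x1 = nn.Conv2d(in_ch, out_ch, 1)
        self.squeeze = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch))

    def forward(self, fv):
        a = self.branch3x3(fv)                       # first-scale (local) features
        b = self.branch1x1(fv)                       # second-scale (detail-preserving) features
        f_fus = F.relu(a + b)                        # simple fusion stand-in for formula (2)
        return F.relu(self.squeeze(f_fus) + f_fus)   # residual skip connection to the next layer
```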
Preferably, the specific operation of step S3 is as follows:
Embedding f0 extracted by the initial feature extraction module in step S1 into each coding module, the module total feature maps are finally generated in turn through the four coding modules. The embedding process of f0 is shown in formula (4):
Wherein the left-hand term represents the module total feature map transformed by the feature transformation module ConvTM, Cat(·) represents feature concatenation, and s indexes the current coding module.
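For illustration, the embedding of f0 into a coding module by concatenation (Cat) followed by a ConvTM-style transform could be sketched as below; the resizing step and the channel counts are assumptions, not values taken from formula (4):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def embed_initial_features(fs, f0, conv_tm):
    """Illustrative stand-in for formula (4): resize the stem features f0 to the spatial size of
    the module feature map fs, concatenate them, and let a small ConvTM-style convolution
    produce the module total feature map (all details assumed)."""
    f0_resized = F.interpolate(f0, size=fs.shape[-2:], mode="bilinear", align_corners=False)
    return conv_tm(torch.cat([fs, f0_resized], dim=1))  # Cat(.) followed by a feature transform

# usage sketch with placeholder channel counts
conv_tm = nn.Conv2d(64 + 128, 128, kernel_size=1)
fs = torch.randn(1, 128, 64, 64)
f0 = torch.randn(1, 64, 256, 256)
f_total = embed_initial_features(fs, f0, conv_tm)
```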
Preferably, the decoding path is composed of four decoding modules with the same structure, and each decoding module comprises a feature transformation module ConvTM composed of convolution, an attention embedding module AEM and a multi-scale graph reasoning module MsGRM;
The feature transformation module ConvTM transforms and up- or down-samples the module total feature maps of different scales output by the corresponding encoding modules, ensuring that the feature maps input to the decoder are of consistent size; the transformation is shown in formula (5):
Wherein the four transformed terms represent, respectively, the first-scale feature map after down-sampling, the second-scale feature map, the third-scale feature map after down-sampling, and the fourth-scale feature map after convolution, activation and up-sampling; DownS(·) represents a down-sampling operation and UP(·) represents a nearest-neighbor interpolation up-sampling operation.
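A minimal sketch of such size alignment is given below for illustration; the choice of adaptive max pooling for DownS(·) is an assumption, while nearest-neighbor interpolation for UP(·) follows the text:

```python
import torch
import torch.nn.functional as F

def conv_tm_align(feature_maps, target_hw):
    """Illustrative ConvTM-style size alignment (assumed): bring encoder outputs of different
    scales to a common spatial size so a decoding module can consume them together."""
    aligned = []
    for f in feature_maps:
        if f.shape[-2:] == tuple(target_hw):
            aligned.append(f)                                          # already the target size
        elif f.shape[-2] > target_hw[0]:
            aligned.append(F.adaptive_max_pool2d(f, target_hw))        # DownS(.): down-sample larger maps
        else:
            aligned.append(F.interpolate(f, size=target_hw, mode="nearest"))  # UP(.): nearest-neighbor up-sampling
    return aligned

maps = [torch.randn(1, 64, 128, 128), torch.randn(1, 128, 64, 64), torch.randn(1, 256, 32, 32)]
aligned = conv_tm_align(maps, target_hw=(64, 64))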
The attention embedding module AEM is used to establish interaction between high-level and low-level semantic features. The input feature set is {f1, …, fN}, where N is the number of features, and the attention embedding module AEM obtains the attention fusion feature fa as shown in formula (6):
Wherein α represents the attention mechanism, Att(·) represents the attention operation, fa represents the generated attention fusion feature, n represents the current feature index, fn represents the current feature, and fN represents all features.
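As a sketch only, one possible form of such an attention fusion is shown below; the per-feature gating used here (channel attention followed by summation) is an assumed stand-in for the Att(·) operation of formula (6):

```python
import torch
import torch.nn as nn

class AttentionEmbedding(nn.Module):
    """Illustrative AEM-style fusion (assumed form): per-feature attention weights gate each
    input, and the gated features are summed into one attention fusion feature f_a."""
    def __init__(self, channels, num_inputs):
        super().__init__()
        self.gates = nn.ModuleList(
            [nn.Sequential(nn.AdaptiveAvgPool2d(1),
                           nn.Conv2d(channels, channels, 1),
                           nn.Sigmoid()) for _ in range(num_inputs)])

    def forward(self, feats):
        # feats: list of N feature maps with identical shape (B, C, H, W)
        fa = torch.zeros_like(feats[0])
        for f, gate in zip(feats, self.gates):
            fa = fa + gate(f) * f        # Att(f_n) applied to f_n, accumulated over n
        return fa

aem = AttentionEmbedding(channels=64, num_inputs=3)
fa = aem([torch.randn(1, 64, 32, 32) for _ in range(3)])
```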
the multi-scale graph inference module MsGRM obtains global semantics and spatial details of the target using node information transfer and convergence functions, and the multi-scale graph inference module MsGRM collects and transfers node features of different scales using a topology graph method, forms interactions between the nodes, and builds long-term dependencies to strengthen the features.
Preferably, in step S4, the topology graph is defined by an edge set ε = {ε1, …, εM}, representing the edges between nodes, and a set of graph nodes; the adjacency matrix A describes the connections of the constructed topology graph, as shown in formula (7):
Where vi and vj represent graph nodes. When nodes vi and vj are associated, or vi and vj are the same node (a self-loop), the adjacency matrix entry aij is defined as 1; when there is no association between the nodes, the edge weight is set to 0, i.e. aij = 0.
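For illustration, formula (7) can be realized as in the following sketch, where the edge list and node count are hypothetical inputs:

```python
import torch

def build_adjacency(edges, num_nodes):
    """Illustrative construction of the adjacency matrix of formula (7) (interface assumed):
    a_ij = 1 if nodes v_i and v_j are associated or i == j (self-loop), else 0."""
    A = torch.eye(num_nodes)       # self-loops: a_ii = 1
    for i, j in edges:             # edges: iterable of associated node index pairs
        A[i, j] = 1.0
        A[j, i] = 1.0              # undirected association
    return A

A = build_adjacency(edges=[(0, 1), (1, 2)], num_nodes=4)
```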
Preferably, the specific operation of step S5 is:
Reconstructing and refining the feature maps generated by the attention embedding module AEM: the two-scale feature maps generated by the AEM are operated on with convolutions of different scales to reconstruct a group of dimension-reduction feature maps and a group of projection matrices. The reconstruction process is shown in formula (8):
Wherein the four sets denote, respectively, the dimension-reduction feature maps of the first scale, the dimension-reduction feature maps of the second scale, the reconstructed projection feature maps of the first scale and those of the second scale; δ(·) represents the dimension-reduction reconstruction function, and the projection function produces the projection matrices. Each dimension-reduction feature map is matrix-multiplied with the corresponding projection matrix to obtain the node feature map. Meanwhile, the adjacency matrix A of step S4 is optimized using Laplace regularization so as to match the node feature mappings of different scales; the adjacency matrix A is optimized as follows:
Wherein the first term represents the feature-mapping set of nodes at different scales, I is an identity matrix, and the last term represents the adjacency matrix after Laplace regularization.
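A minimal sketch of the Laplace regularization step is given below; it assumes the standard renormalization with self-loops and the degree matrix, which the text does not spell out here:

```python
import torch

def laplace_regularize(A):
    """Illustrative Laplace-style renormalization of the adjacency matrix (assumed to follow the
    standard GCN trick): add self-loops with the identity I, then symmetrically normalize with
    the degree matrix, A_hat = D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = A + torch.eye(A.size(0))
    deg = A_tilde.sum(dim=1)                   # node degrees of the self-looped graph
    d_inv_sqrt = torch.diag(deg.pow(-0.5))
    return d_inv_sqrt @ A_tilde @ d_inv_sqrt

A = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
A_hat = laplace_regularize(A)
```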
Preferably, the specific operation of step S6 is:
For the generated node feature maps, a graph convolution (GCN) module is provided to realize node feature aggregation;
First, the multi-scale graph reasoning module MsGRM reconstructs again the two-scale feature maps generated by the attention embedding module AEM in step S5 to obtain a back-projection matrix;
Second, the back-projection matrix is multiplied with the graph-convolution feature mapping, converting it back to the original hidden space, and the result is fused by a convolution layer to obtain a feature map;
Finally, the generated feature map is added, with a pixel-by-pixel addition strategy, to the feature map initially generated by the attention embedding module AEM, producing a new graph-convolution feature; the calculation is shown in formula (10):
Wherein the terms represent, respectively, the features corresponding to different scales, the features generated by the AEM, the adjacency matrix after Laplace adjustment and the degree matrix of the adjacency matrix A; Θs represents the weighting matrix. The resulting new graph-convolution feature is generated as shown in formula (11):
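Formulas (10) and (11) are not reproduced here; purely for illustration, the projection, graph convolution, back-projection and pixel-wise addition described above could be sketched as follows, with the projection mechanism and all shapes being assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphReasoning(nn.Module):
    """Illustrative MsGRM-style step (assumed form): project pixels to graph nodes, run one
    GCN layer Z = LeakyReLU(A_hat @ X @ Theta), back-project to the pixel grid, fuse with a
    1x1 convolution, and add pixel-wise to the AEM feature."""
    def __init__(self, channels, num_nodes):
        super().__init__()
        self.project = nn.Conv2d(channels, num_nodes, 1)          # projection / back-projection weights
        self.theta = nn.Linear(channels, channels, bias=False)    # weighting matrix Theta_s
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, f_aem, A_hat):
        B, C, H, W = f_aem.shape
        P = torch.softmax(self.project(f_aem).flatten(2), dim=-1)        # (B, K, H*W) projection
        X = torch.bmm(P, f_aem.flatten(2).transpose(1, 2))               # node features (B, K, C)
        Z = F.leaky_relu(torch.bmm(A_hat.expand(B, -1, -1), self.theta(X)))  # graph convolution
        back = torch.bmm(P.transpose(1, 2), Z).transpose(1, 2).reshape(B, C, H, W)  # back-projection
        return f_aem + self.fuse(back)                                   # pixel-wise addition to the AEM feature

grm = GraphReasoning(channels=64, num_nodes=16)
out = grm(torch.randn(2, 64, 32, 32), A_hat=torch.eye(16))
```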
Preferably, in step S7, a priori knowledge is introduced to strengthen the representation of low-level semantics. On the decoding path, the information flow of the decoding modules is not only connected layer by layer but also provided with cross-layer connections, and the cross-layer transmission of the information flow adopts residual connections. The total feature map fMsGRM finally generated by the multi-scale graph reasoning module MsGRM is expressed as:
Wherein the first two terms represent the feature maps of the first and second scales, and the total feature map of the decoding modules is assembled from the four scales generated by the four decoding modules.
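As an illustrative sketch of the cross-layer residual fusion, the outputs of the four decoding modules could be combined as below; the resolutions and the simple additive fusion are assumptions:

```python
import torch
import torch.nn.functional as F

def fuse_decoder_outputs(decoder_feats, target_hw):
    """Illustrative cross-layer fusion (assumed): the feature maps produced by the four decoding
    modules are resized to a common resolution and combined residually into f_MsGRM."""
    fused = None
    for f in decoder_feats:
        f = F.interpolate(f, size=target_hw, mode="bilinear", align_corners=False)
        fused = f if fused is None else fused + f   # residual-style accumulation across layers
    return fused

feats = [torch.randn(1, 64, s, s) for s in (16, 32, 64, 128)]
f_msgrm = fuse_decoder_outputs(feats, target_hw=(128, 128))
```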
Therefore, the semantic segmentation method for enhancing boundary perception has the following technical effects:
(1) The U-shaped network structure allows feature extraction at more scales, and the carefully constructed graph convolution makes the network friendlier to irregular target areas. When the framework is applied to image analysis, the unstructured features in the image can be well accommodated, so that relevant targets are conveniently extracted and irrelevant background is eliminated;
(2) In the model, richer image features can be extracted through the scale transformation of the coding modules and transmitted to each decoding module through dense connections, enhancing the reusability of the features;
(3) The invention deploys a graph convolution network in the decoding path to analyse large irregular areas and combines it with SCSE (Concurrent Spatial and Channel Squeeze and Excitation) attention mechanisms to capture complementary details, thereby producing a more robust and accurate decoder;
(4) Experiments prove that the model maintains state-of-the-art results on the planned datasets while remaining as lightweight as possible;
(5) Compared with existing advanced segmentation algorithms on the same datasets, the method achieves more accurate segmentation on images with insufficient target saliency and low image quality.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is a network overall structure diagram of a semantic segmentation method for enhancing boundary perception, wherein (a) in FIG. 1 is an algorithm overall structure diagram, (b) in FIG. 1 is a coding module structure diagram, and (c) in FIG. 1 is a decoding module structure diagram;
Fig. 2 is a view showing the results of algorithm reasoning.
Detailed Description
The technical scheme of the invention is further described below through the attached drawings and the embodiments.
Example 1
The invention provides a semantic segmentation method with enhanced boundary perception. It takes U-Net as the backbone structure, improves feature reuse and propagation, and uses convolution operations of different sizes in the encoding part to enrich the feature scales; notably, the network adds a graph convolution to replace one convolution layer in the decoding path. An Attention Embedding Module (AEM) is added to each decoding module to aggregate and refine the information streams of the different branches. Without any pre-training or post-processing, the network model still shows excellent performance and great potential.
The method mainly comprises an encoding path and a decoding path, as shown in fig. 1 (a). The method specifically comprises the following steps:
s1, the coding path consists of 5 coding modules, each coder codes multi-level semantic information of a target area in the whole image as shown in fig. 1 (b), and convolution operations of different scales in the coding modules obtain multi-scale information of the target area, namely a characteristic diagram. The encoding path effectively aggregates context semantics using pooling operations. The decoding path is mainly composed of four modules (as shown in detail in fig. 1 (c)).
S11, the coding path consists of 5 coding modules and is composed of an initial feature extraction module and four identical encoder group layers, wherein the initial feature module comprises a convolution module with a convolution kernel of 7 multiplied by 7 and two convolution modules with convolution kernels of 3 multiplied by 3, a convolution feature diagram is obtained, and the specific calculation process is shown in a formula (1):
Wherein f0 represents features of a first layer through the initial feature extraction module, f1 represents features of a second and third layers through the initial feature extraction module, f'0 represents features of a fourth layer through the initial feature extraction module, σ (·) represents an activation function, σs (·) represents a ReLU activation function, BN (·) represents a normalization layer of the batch process, conv3×3 ()) and Conv7×7 ()) represent convolution operations of 3×3 and 7×7, respectively; representing the addition of the matrix and MaxPool3×3 ()'s representing a maximum pooling operation of 3 x 3.
S12, inputting the convolution feature map obtained in step S11 into the first coding module. As shown in the internal structure of each coding module in FIG. 1 (b), the feature map output by the previous layer is then input into a group of consecutive 3×3 convolutions to obtain local features of the image and highlight the differences between the target area and the background; meanwhile, to prevent finer-detail features from being ignored, the feature map output by the previous layer is input in parallel into a 1×1 convolution, and the features of the two different scales are then fused. The fusion method is shown in the following formula:
After the feature map output by the fusion of the different-scale features passes through an activation layer, in order to prevent gradient explosion or vanishing caused by an excessively deep network structure, the information stream is transmitted to the next layer through a skip connection in residual form; the specific calculation is shown in formula (3), where δ(·) represents a group of consecutive 3×3 convolution operations and σ(·) represents the ReLU activation function:
S13, considering that abundant bottom semantic information is favorable for representing physical attributes such as target size and appearance, embedding the large-scale features f0 extracted by the initial feature module into each coding module, prompting the network to acquire more detailed coding features, and finally sequentially generating useful coding features fs through four coding modules. The embedding process of the large-scale features is shown as a formula (4):
Wherein the left-hand term represents the module total feature map transformed by the feature transformation module ConvTM, Cat(·) represents feature concatenation, and s indexes the current coding module.
S2, each decoding module aggregates and refines the multi-scale information under the guidance of the Attention Embedding Module (AEM). The graph convolution module captures the feature information of large-scale irregular areas, and the attention embedding module generates complementary spatial details to better model the encoded features.
The decoding path consists of four decoders with the same structure, each comprising a feature transformation module (Convolution Transform Module, ConvTM) composed of convolutions, an attention embedding module (Attention Embedding Module, AEM) and a multi-scale graph reasoning module (Multi-scale Graph Reasoning Module, MsGRM).
(1) ConvTM: ConvTM is mainly used for transforming and up- or down-sampling the different-scale features fs output by the corresponding coding modules, ensuring that the feature maps input to the decoder are of consistent size and effectively preventing the loss of detailed information and the gradient explosion or vanishing caused by excessive up- and down-sampling in the network. For example, the transformation of the third decoding module is shown in the following formula (5), and the detailed structure of the third decoding module is shown in fig. 1 (c):
Wherein the four transformed terms represent, respectively, the first-scale feature map after down-sampling, the second-scale feature map, the third-scale feature map after down-sampling, and the fourth-scale feature map after convolution, activation and up-sampling; DownS(·) represents a down-sampling operation and UP(·) represents a nearest-neighbor interpolation up-sampling operation;
(2) AEM is used to establish interactions between high-level and low-level semantic features. In short, low-level semantic information is embedded in high-level features, and high-level discrimination features are embedded in the low-level features. Let the input feature set be { f1…fN }, where N is the number of features. The fusion fa of the AEM provided by the invention is characterized by the following formula (6):
Where α represents the attention mechanism, att (·) represents the attention operation, fa represents the generated attention fusion feature, n represents the current feature index, fn represents the current feature, and fN represents all features. In this way, the underlying features embed context information while preserving spatial detail, improving feature representation and facilitating segmentation of target regions in the image.
(3) MsGRM: MsGRM mainly uses node information transfer and aggregation functions to obtain the global semantics and spatial details of the target. A traditional convolution algorithm can only operate in a square area of fixed size and cannot effectively extract the features of an irregular target in an image. MsGRM uses the topology-graph method to collect and transfer node features of different scales, form interactions between the nodes, and establish long-term dependencies to strengthen the features. The module adopts topology reconstruction and a method of embedding multi-scale features into the nodes, which helps improve the network's ability to decide the attributes of the boundary pixels of the target region and makes learning through the graph nodes more effective. Global and contextual spatial details further improve the feature representation.
The specific decoding path includes:
S21, nodes of constructed topological graph
The feature map output in step S13 is input into the decoder and convolved to obtain a smaller feature map; each pixel point of this smaller feature map forms a node of the topology graph. ε = {ε1, …, εM} represents the set of edges between nodes, i.e. the relationships between elements in the image, and the node set collects the graph nodes. The adjacency matrix A describes the connections of the constructed topology graph, as shown in formula (7):
Where vi and vj represent graph nodes. When nodes vi and vj are associated, or vi and vj are the same node (a self-loop), the adjacency matrix entry aij is defined as 1. When there is no association between the nodes, the edge weight is set to 0, i.e. aij = 0.
S22, reconstructing and refining the feature map generated by the AEM to meet the input requirement of MsGRM and the requirement of feature aggregation of the graph nodes
MsGRM implements global semantic extraction by aggregating and passing these node features. The feature maps generated by the AEM are reconstructed and refined to meet the input requirements of MsGRM and the requirements of feature aggregation over the graph nodes. As shown in FIG. 1 (c), the set of feature maps generated by the AEM is operated on with convolutions of different scales to reconstruct a group of dimension-reduction feature maps and a group of projection matrices. The reconstruction process is shown in the following formula (8):
Wherein the four sets denote, respectively, the dimension-reduction feature maps of the first scale, the dimension-reduction feature maps of the second scale, the reconstructed projection feature maps of the first scale and those of the second scale; δ(·) represents the dimension-reduction reconstruction function, and the projection function produces the projection matrices. Each dimension-reduction feature map is matrix-multiplied with the corresponding projection matrix to obtain the node feature map. Meanwhile, the adjacency matrix A of step S4 is optimized using Laplace regularization so as to match the node feature mappings of different scales; the adjacency matrix A is optimized as follows:
Wherein the first term represents the feature-mapping set of nodes at different scales, I is an identity matrix, and the last term represents the adjacency matrix after Laplace regularization.
S23, reconstructing the feature map by using the map convolution and AEM
For the generated node feature maps, a graph convolution (GCN) module is provided to realize node feature aggregation and obtain better global and contextual semantic details, comprising the following steps:
First, the feature maps generated by the AEM in step S22 are reconstructed again to obtain a back-projection matrix;
Second, the back-projection matrix is multiplied with the graph-convolution feature mapping, converting it back to the original hidden space, and the result is fused with a convolution layer to obtain a feature map;
Finally, to obtain a better feature representation, the generated feature map is added, with a pixel-by-pixel addition strategy, to the feature map initially generated by the AEM, producing a new multi-scale graph-convolution feature; the calculation is shown in formula (10):
Wherein the activation function is LeakyReLU, which adds a small negative slope to ReLU so that, for negative inputs, the activation is not 0 and has a small derivative, avoiding dead neurons and accelerating the convergence of the neural network; the degree matrix corresponds to the adjacency matrix A, and Θ represents the weighting matrix. The final new graph-convolution feature is generated as shown in the following formula (11):
S24, information flow cross-layer connection of decoding module
MsGRM obtains multi-scale node features by constructing topology graphs of different scales, makes full use of the multi-scale information, and better handles global and contextual details through the node transfer function. Furthermore, introducing a priori knowledge strengthens the representation of low-level semantics. On the decoding path, the information flow of the decoding modules can be connected across layers, ensuring stable transmission of detailed information to the greatest extent. However, in order to reduce redundant information and alleviate problems such as gradient vanishing, the cross-layer transmission of the information stream adopts residual connections. The feature map fMsGRM finally generated by MsGRM can be expressed as:
Wherein the first two terms represent the feature maps of the first and second scales, and the total feature map of the decoding modules is assembled from the four scales generated by the four decoding modules.
Table 1 comparison of different segmentation methods on the same dataset
Method                Backbone network   FLOPs↓   aACC↑   mIOU↑
APCNet                ResNet101          5.1      75.22   13.90
DeepLabv3+            ResNet101          1.8      81.17   14.46
MobileNetv3           MobileNetV3        2.1      77.24   16.54
UNet                  FCN                1.0      82.42   17.41
OCRNet                HRNetV2p-W48       4.7      77.32   29.25
UperNet               Swin-B             6.9      80.04   27.23
SETR Naive            ViT-L              3.8      78.66   18.65
Segformer             MIT-B5             13.9     80.78   16.06
U-GRNets (invention)  ResNet34           6.8      93.68   64.16
Therefore, the semantic segmentation method with enhanced boundary perception achieves a more accurate segmentation effect. The final effect of the algorithm is shown in fig. 2, which presents, from left to right, original image I, result I, original image II, result II, original image III and result III.
It should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or substituted by the same, and the modified or substituted technical solution may not deviate from the spirit and scope of the technical solution of the present invention.

Claims (10)

CN202411004607.7A · Priority 2024-07-25 · Filed 2024-07-25 · A semantic segmentation method with enhanced boundary perception · Active · CN118898718B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202411004607.7A (CN118898718B) | 2024-07-25 | 2024-07-25 | A semantic segmentation method with enhanced boundary perception

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202411004607.7A (CN118898718B) | 2024-07-25 | 2024-07-25 | A semantic segmentation method with enhanced boundary perception

Publications (2)

Publication Number | Publication Date
CN118898718A (en) | 2024-11-05
CN118898718B (en) | 2025-04-18

Family

ID=93264585

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202411004607.7A (CN118898718B, Active) | A semantic segmentation method with enhanced boundary perception | 2024-07-25 | 2024-07-25

Country Status (1)

Country | Link
CN (1) | CN118898718B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN116205927A* | 2023-02-24 | 2023-06-02 | 西安电子科技大学 | Image segmentation method based on boundary enhancement
CN117237645A* | 2023-11-15 | 2023-12-15 | 中国农业科学院农业资源与农业区划研究所 | Training method, device and equipment of semantic segmentation model based on boundary enhancement

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
EP3171297A1* | 2015-11-18 | 2017-05-24 | CentraleSupélec | Joint boundary detection image segmentation and object recognition using deep learning
CN113221969A* | 2021-04-25 | 2021-08-06 | 浙江师范大学 | Semantic segmentation system and method based on Internet of things perception and based on dual-feature fusion
CN116681983B* | 2023-06-02 | 2024-06-11 | 中国矿业大学 | A narrow and long target detection method based on deep learning
CN117078930B* | 2023-08-11 | 2025-07-22 | 河南大学 | Medical image segmentation method based on boundary sensing and attention mechanism
CN117274608B* | 2023-11-23 | 2024-02-06 | 太原科技大学 | Semantic segmentation method of remote sensing images based on spatial detail perception and attention guidance


Also Published As

Publication number | Publication date
CN118898718A (en) | 2024-11-05

Similar Documents

Publication | Publication Date | Title
Li et al. Multitask semantic boundary awareness network for remote sensing image segmentation
CN110428428B (en) An image semantic segmentation method, electronic device and readable storage medium
Zhou et al. Contextual ensemble network for semantic segmentation
CN109859190B (en) Target area detection method based on deep learning
CN112101410B (en) A method and system for image pixel semantic segmentation based on multimodal feature fusion
CN110276765B (en) Image panorama segmentation method based on multi-task learning deep neural network
CN110929665B (en) Natural scene curve text detection method
CN110619369A (en) Fine-grained image classification method based on feature pyramid and global average pooling
WO2022000426A1 (en) Method and system for segmenting moving target on basis of twin deep neural network
Seyedhosseini et al. Semantic image segmentation with contextual hierarchical models
CN108509978A (en) The multi-class targets detection method and model of multi-stage characteristics fusion based on CNN
CN111242288A (en) A multi-scale parallel deep neural network model building method for lesion image segmentation
CN118781596A (en) Semantic segmentation method of remote sensing images based on semantic adaptive edge enhancement network
CN115631369A (en) A fine-grained image classification method based on convolutional neural network
CN111583285A (en) A Semantic Segmentation Method of Liver Image Based on Edge Attention Strategy
Jiang et al. Forest-CD: Forest change detection network based on VHR images
CN113657560A (en) Weak supervision image semantic segmentation method and system based on node classification
Ma et al. An attention-based progressive fusion network for pixelwise pavement crack detection
CN110059769A (en) The semantic segmentation method and system rebuild are reset based on pixel for what streetscape understood
CN110866938B (en) A fully automatic video moving object segmentation method
Wang et al. CWC-transformer: a visual transformer approach for compressed whole slide image classification
CN107133579A (en) Face identification method based on CSGF(2D)2-PCANet convolutional networks
CN118840553A (en) Weak annotation remote sensing image semantic segmentation method based on double learning mechanism
CN116012658B (en) A self-supervised pre-training target detection method, system, device and storage medium
CN117746045A (en) A medical image segmentation method and system based on Transformer and convolution fusion

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
