CN115115938B - A method for salient object detection in remote sensing images - Google Patents

A method for salient object detection in remote sensing images

Info

Publication number
CN115115938B
Authority
CN
China
Prior art keywords
remote sensing
sensing image
features
feature
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210879580.0A
Other languages
Chinese (zh)
Other versions
CN115115938A (en)
Inventor
夏鲁瑞
蔺崎辉
李森
陈雪旗
卢妍
张占月
王鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Original Assignee
Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Priority to CN202210879580.0A
Publication of CN115115938A
Application granted
Publication of CN115115938B
Legal status: Active (current)
Anticipated expiration

Abstract

Translated from Chinese


The present invention discloses a remote sensing image salient target detection method comprising the following steps: S1, obtain remote sensing image data including a training set and a test set, and construct a remote sensing image salient target detection model including a detection feature encoder and a cascade feature decoder; S2, introduce an attention mechanism, a feature flow mechanism and a cascade decoding mechanism, and train the salient target detection model on the training-set remote sensing image data, stopping once the preset loss function converges, to obtain a trained model; S3, use the trained model to predict salient targets on the test-set remote sensing image data and output the corresponding saliency maps. The invention decodes features with a cascade structure, which alleviates missed and false detections of small targets in remote sensing images, raises the prediction confidence of salient regions, and yields more accurate salient target boundaries.

Description

Method for detecting salient targets in remote sensing images
Technical Field
The invention relates to the technical field of remote sensing image applications, and in particular to a method for detecting salient targets in remote sensing images.
Background
With the explosive growth of remote sensing image data volume, conventional utilization methods based on manual visual interpretation can no longer meet practical demand, so intelligent interpretation methods for remote sensing images need to be developed. As an important preprocessing step in computer vision, salient object detection has achieved good results in natural scenes. However, owing to the characteristics of remote sensing scenes, such as varied shooting angles, diverse ground object types and complex backgrounds, methods for detecting salient targets in remote sensing images remain scarce. Meanwhile, in detecting salient targets in remote sensing images, existing methods perform poorly on the edge regions of salient targets and are prone to false and missed detections of small targets, leaving a considerable gap to practical application.
Disclosure of Invention
In view of the above, the present invention aims to provide a method for detecting salient targets in remote sensing images, which adopts an encoder-decoder structure, introduces an attention mechanism, a feature flow mechanism and a cascade decoding mechanism, and designs a new loss function to train the target detection model; the trained model then detects salient targets in remote sensing images, effectively improving the detection of salient target edges and alleviating missed and false detections of small targets.
The invention discloses a method for detecting salient targets in remote sensing images, which comprises the following steps:
S1, acquiring remote sensing image data comprising a training set and a test set, and constructing a remote sensing image salient target detection model comprising a detection feature encoder and a cascade feature decoder;
S2, introducing an attention mechanism, a feature flow mechanism and a cascade decoding mechanism, training the remote sensing image salient target detection model on the training-set remote sensing image data until the preset loss function converges, then stopping training to obtain a trained model;
S3, performing salient target prediction on the test-set remote sensing image data with the trained model and outputting the corresponding saliency maps.
Further, the detection feature encoder is a dense attention flow encoder obtained by modifying a VGG16 backbone network: the last three fully connected layers of the VGG16 network are removed, and the network is truncated before its final pooling layer, yielding the dense attention flow encoder.
Further, the specific implementation manner of the step S2 includes:
S21, introducing an attention mechanism, extracting the output characteristics of the last layer of each part from the improved VGG16 network, merging output characteristic dimensions based on a preset spatial pixel relation matrix to construct an operation matrix among pixels, and further realizing the representation of the relation among pixels;
s22, carrying out normalization processing based on an operation matrix among pixels to obtain attention weights, and multiplying the output characteristics after dimension combination by the attention weights to obtain characteristics after spatial self-attention weighting;
S23, adding the output features to the spatial-attention-weighted features through a residual connection, and obtaining the output deep features through a subsequent channel attention mechanism; the process is formulated as:
F = CA(f + δ · (f ∗ Re^{-1}(Re(f) ⊙ R)))
where Re^{-1} denotes the inverse of the dimension-merging operation applied to the output features, R the pixel relation matrix, ∗ element-wise multiplication, δ a learnable coefficient, CA(·) the channel attention mechanism, and f the initial features output by the backbone network;
S24, upsampling the deep features and applying a 1×1 convolution so that their size and channel count match the current features;
S25, based on a preset progressive splicing module, splicing the upsampled and 1×1-convolved deep features with the current features, starting from the layer below the current features, in shallow-to-deep order;
S26, adjusting the channel count of the spliced features to that of the deep features output by the detection feature encoder, and feeding the spliced features into the cascade feature decoder for decoding;
and S27, activating the final output of the cascade feature decoder by using a Sigmoid function, and further completing training of a remote sensing image salient target detection model.
Further, the preset spatial pixel relation matrix in step S21 is expressed as:
M = {(Re(f))^T ⊙ Re(f)}^T
where Re(·) denotes the operation of merging the last two dimensions of the output features into one, ⊙ denotes matrix multiplication, and T denotes transposition.
Further, the normalization of the inter-pixel operation matrix in step S22 is formulated as:
r(x, y) = e^{m(x, y)} / Σ_x e^{m(x, y)}
where r(x, y) represents how strongly pixel x influences pixel y, m(x, y) is an element of the pixel relation matrix, and e is the natural constant.
Further, the method also comprises a step S23', in which multi-level pyramid fused multi-scale spatial attention extracts information from the output features. Specifically, the output features are downsampled by factors of 2 and 4 to form three channels of different resolutions; the multi-level pyramid fused multi-scale spatial attention refines the features at each scale; the refined features are fused with the output features through a residual structure; the three levels are then fused in order of resolution from low to high, yielding deep features weighted by the multi-level pyramid fused multi-scale spatial attention; finally, the deep features output by step S23 are merged with these weighted deep features.
Further, in step S25, the upsampled and 1×1-convolved deep features are spliced with the current features, formulated as:
F_k = Conc(Conv(Up(F_5)), ..., Conv(Up(F_{k-1})), F_k)
where Up(·) denotes upsampling that aligns the deep features with the current features, F_k denotes the k-th level features fed into the cascade decoder, F_5 denotes the 5th-level features fed into the cascade decoder, and Conv denotes a convolutional layer.
Further, the preset loss function is a combined loss function with different weight coefficients, formulated as:
L = ω_1 L_P + ω_2 L_R + ω_3 L_MAE + ω_4 L_S
where L_P, L_R, L_MAE and L_S denote the precision loss term, the recall loss term, the mean absolute error loss term and the structural similarity loss term, respectively, and ω_1, ω_2, ω_3, ω_4 denote their weight coefficients, with:
L_S = 1 - S_measure
S_measure = α × S_o + (1 - α) × S_r
where N is the total number of samples, n the sample index, j the pixel index along the image height, i the pixel index along the image width, ε a preset constant, W and H the width and height of the remote sensing image, s(i, j) ∈ S the predicted value of each pixel, g(i, j) ∈ G the true value of each pixel, S the saliency prediction result, G the true label, S_r the region-oriented similarity measure, S_o the object-structure-oriented similarity measure, and α a hyper-parameter balancing the region-oriented and object-structure-oriented similarity measures.
Further, the method also comprises a step S4: comparing the output saliency maps with the truth maps to measure how well the remote sensing image salient target model generates saliency maps.
Further, step S4 is implemented by measuring the saliency maps generated by the remote sensing image salient target model against preset indices: the PR curve, the F value, the mean absolute error, and the S value.
Compared with the prior art, the remote sensing image salient target detection method of the invention has the following advantages:
(1) The invention decodes features with a cascade structure so that higher-level semantic features guide the feature decoding process, effectively alleviating missed and false detections of small targets in remote sensing images.
(2) The invention designs a new loss function for training the remote sensing image salient target detection model, which raises the prediction confidence of salient regions and lets the model predict more accurate salient target boundaries.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention. In the drawings:
FIG. 1 is a flow chart of the remote sensing image salient target detection method of the present invention;
FIG. 2 is a schematic diagram of the remote sensing image salient target detection method of the present invention;
FIG. 3 is a schematic diagram of the self-attention mechanism of the present invention;
FIG. 4 is a graph of P-R curve results for an embodiment of the present invention;
FIG. 5 shows results for improved missed detection of small targets, wherein (a) is the remote sensing image, (b) is the salient target truth map, (c) is the original saliency map, and (d) is the saliency map generated by the present method;
FIG. 6 shows results for improved prediction of salient target boundary regions, wherein (a) is the remote sensing image, (b) is the salient target truth map, (c) is the original saliency map, and (d) is the saliency map generated by the present method.
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other. The invention will be described in detail below with reference to the drawings in connection with embodiments.
Referring to FIGS. 1 to 6, the remote sensing image salient target detection method of the present invention comprises the following steps:
S1, acquiring remote sensing image data comprising a training set and a test set, and constructing a remote sensing image salient target detection model comprising a detection feature encoder and a cascade feature decoder;
In this step, the detection feature encoder is a dense attention flow encoder obtained by modifying a VGG16 backbone network: the last three fully connected layers of the VGG16 network are removed, and the network is truncated before its final pooling layer. The feature dimensions of the first four stages of the modified VGG16 network are therefore W/2^k × H/2^k, where W and H are the width and height of the remote sensing image and k is the backbone stage index; because the final pooling layer is removed, the feature dimensions of the last stage are consistent with those of the fourth stage. Meanwhile, after each feature extraction stage, the current-level features are selected and refined before being sent to the next level for extraction.
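As an illustration of this truncation, the following is a minimal sketch assuming PyTorch and torchvision (neither framework is named in the patent): torchvision's VGG16 exposes its convolutional stages as `features`, whose last element is the final pooling layer, and its three fully connected layers live in the separate `classifier` head, so dropping the classifier and slicing off the last `features` element reproduces the described modification.

```python
# A minimal sketch (assumes PyTorch/torchvision; not the patent's own code).
import torch
import torchvision

vgg16 = torchvision.models.vgg16(weights=None)  # load pretrained weights in practice
# vgg16.features is an nn.Sequential of 31 layers; index 30 is the final MaxPool2d.
# Keeping only `features` discards the three fully connected layers, and slicing
# off the last element truncates the network before its final pooling layer.
backbone = torch.nn.Sequential(*list(vgg16.features.children())[:-1])

x = torch.randn(1, 3, 224, 224)   # dummy remote sensing image
print(backbone(x).shape)          # torch.Size([1, 512, 14, 14])
```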
In this embodiment, EORSSD is used as the remote sensing image dataset; 1400 remote sensing images are randomly selected from it as the training set and 600 as the test set.
S2, introducing an attention mechanism, a feature flow mechanism and a cascade decoding mechanism, training the remote sensing image salient target detection model on the training-set remote sensing image data until the preset loss function converges, then stopping training to obtain a trained model;
This step specifically comprises the following:
S21, introducing an attention mechanism, extracting the output characteristics of the last layer of each part from the improved VGG16 network, merging output characteristic dimensions based on a preset spatial pixel relation matrix to construct an operation matrix among pixels, and further realizing the representation of the relation among pixels;
Wherein the preset spatial pixel relation matrix is expressed as follows by a formula:
M = {(Re(f))^T ⊙ Re(f)}^T
where Re(·) denotes the operation of merging the last two dimensions of the output features into one, ⊙ denotes matrix multiplication, and T denotes transposition;
S22, carrying out normalization based on the inter-pixel operation matrix to obtain attention weights, multiplying the dimension-merged output features by the attention weights, and then restoring the original dimensions of the output features, yielding spatially self-attention-weighted features that carry global information;
wherein the normalization based on the inter-pixel operation matrix is formulated as:
r(x, y) = e^{m(x, y)} / Σ_x e^{m(x, y)}
where r(x, y) represents how strongly pixel x influences pixel y, m(x, y) is an element of the pixel relation matrix, and e is the natural constant;
S23, adding the output features to the spatial-attention-weighted features through a residual connection, and obtaining the output deep features through a subsequent channel attention mechanism; the process is formulated as:
F = CA(f + δ · (f ∗ Re^{-1}(Re(f) ⊙ R)))
where Re^{-1} denotes the inverse of the dimension-merging operation applied to the output features, R the pixel relation matrix, ∗ element-wise multiplication, δ a learnable coefficient, CA(·) the channel attention mechanism, and f the initial features output by the backbone network;
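A minimal sketch of steps S21-S23 follows, assuming PyTorch. The softmax normalization and the squeeze-and-excitation style block standing in for CA(·) are assumptions: the patent defines the relation matrix and the residual form but leaves the normalization and channel attention internals to the figures.

```python
# A minimal sketch of the spatial self-attention of steps S21-S23 (assumptions:
# PyTorch; softmax normalization; SE-style channel attention for CA(·)).
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.delta = nn.Parameter(torch.zeros(1))      # learnable coefficient δ
        self.ca = nn.Sequential(                       # assumed CA(·) block
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid(),
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        b, c, h, w = f.shape
        flat = f.flatten(2)                                # Re(f): (B, C, HW)
        m = (flat.transpose(1, 2) @ flat).transpose(1, 2)  # M = {(Re f)^T ⊙ Re f}^T
        r = torch.softmax(m, dim=-1)                       # normalized weights r(x, y)
        attended = (flat @ r).view(b, c, h, w)             # Re^{-1}(Re(f) ⊙ R)
        out = f + self.delta * (f * attended)              # residual, element-wise ∗
        return out * self.ca(out)                          # channel attention weighting

f = torch.randn(2, 64, 28, 28)
print(SpatialSelfAttention(64)(f).shape)                   # torch.Size([2, 64, 28, 28])
```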
In this embodiment, after feature extraction and refinement of the remote sensing image by the modified VGG16 network, five features of different scales are finally formed; the deeper of these features contain more semantic information, while the shallower ones retain more detail.
S24, upsampling the deep features and applying a 1×1 convolution so that their size and channel count match the current features;
S25, based on a preset progressive splicing module, splicing the upsampled and 1×1-convolved deep features with the current features, starting from the layer below the current features, in shallow-to-deep order; the splicing is formulated as:
F_k = Conc(Conv(Up(F_5)), ..., Conv(Up(F_{k-1})), F_k)
where Up(·) denotes upsampling that aligns the deep features with the current features, F_k denotes the k-th level features fed into the cascade decoder, F_5 denotes the 5th-level features fed into the cascade decoder, and Conv denotes a convolutional layer;
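A sketch of this progressive splicing under stated assumptions (PyTorch; bilinear upsampling; the reading that every feature deeper than level k is upsampled, passed through a 1×1 convolution, and concatenated with F_k, with a final 1×1 convolution restoring the encoder channel count as in step S26):

```python
# A minimal sketch of the progressive splicing module (assumptions as above).
import torch
import torch.nn as nn
import torch.nn.functional as F

def splice(current: torch.Tensor, deeper: list[torch.Tensor],
           convs: nn.ModuleList, fuse: nn.Conv2d) -> torch.Tensor:
    """Concatenate upsampled, 1x1-convolved deeper features with the current one."""
    h, w = current.shape[2:]
    aligned = [conv(F.interpolate(d, size=(h, w), mode='bilinear',
                                  align_corners=False))
               for conv, d in zip(convs, deeper)]      # Up(·) then Conv 1×1
    spliced = torch.cat(aligned + [current], dim=1)    # Conc(...)
    return fuse(spliced)                               # restore channel count (S26)

# Example: current level-3 feature plus deeper level-4 and level-5 features.
cur = torch.randn(1, 256, 56, 56)
deep = [torch.randn(1, 512, 28, 28), torch.randn(1, 512, 14, 14)]
convs = nn.ModuleList([nn.Conv2d(512, 256, 1), nn.Conv2d(512, 256, 1)])
fuse = nn.Conv2d(256 * 3, 256, 1)                      # back to encoder channels
print(splice(cur, deep, convs, fuse).shape)            # torch.Size([1, 256, 56, 56])
```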
In this embodiment, to extract image features completely, multi-level features are fused with a shallow-to-deep attention fusion scheme. Taking the GCA4 module as an example, it receives and splices the output features of the GCA1, GCA2 and GCA3 modules, then adjusts the channel count back to 1 to form the final attention map, formulated as:
A_4 = Conv(Conc(A_1, A_2, A_3, A_4))
The attention map is then multiplied with the refined features, and the deep features fed into the cascade feature decoder are generated through a residual connection.
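A minimal sketch of this attention fusion, assuming PyTorch, single-channel attention maps already resized to a common resolution, and a 3×3 fusion convolution (the GCA internals are not given in the text):

```python
# A minimal sketch of A_4 = Conv(Conc(A_1, A_2, A_3, A_4)) plus residual
# re-weighting (assumptions: single-channel maps, pre-aligned resolutions).
import torch
import torch.nn as nn

fuse_conv = nn.Conv2d(4, 1, kernel_size=3, padding=1)  # adjust channels back to 1

def fuse_attention(a1, a2, a3, a4, refined):
    a = torch.sigmoid(fuse_conv(torch.cat([a1, a2, a3, a4], dim=1)))  # final map A_4
    return refined + a * refined                        # residual re-weighting

maps = [torch.randn(1, 1, 28, 28) for _ in range(4)]    # A_1..A_4, pre-aligned
refined = torch.randn(1, 512, 28, 28)
print(fuse_attention(*maps, refined).shape)             # torch.Size([1, 512, 28, 28])
```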
S26, adjusting the channel count of the spliced features to that of the deep features output by the detection feature encoder, and feeding the spliced features into the cascade feature decoder for decoding;
and S27, activating the final output of the cascade feature decoder by using a Sigmoid function, and further completing training of a remote sensing image salient target detection model.
In this embodiment, the deepest features carry the richest semantics and can guide decoding at every level; moreover, any deeper feature carries more semantic information than shallower ones, so with the cascade feature decoder the deep features obtained at each encoder level, not just the deepest global features, can guide the shallower decoders, which facilitates generation of the final saliency map. Each decoder unit receives the output of the previous-level decoder together with the spliced deep features, and the output of the last decoder is activated with a Sigmoid function to obtain the final predicted saliency map.
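The decoder unit internals are not specified in the text; the sketch below assumes a simple convolution-plus-upsampling unit in PyTorch purely to show the cascade wiring: each unit consumes the previous unit's output together with the spliced deep features for its level, and the last output passes through a Sigmoid.

```python
# A minimal sketch of the cascade feature decoder (unit internals are assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderUnit(nn.Module):
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1)

    def forward(self, prev: torch.Tensor, spliced: torch.Tensor) -> torch.Tensor:
        prev = F.interpolate(prev, size=spliced.shape[2:], mode='bilinear',
                             align_corners=False)       # align with this level
        return F.relu(self.conv(torch.cat([prev, spliced], dim=1)))

units = nn.ModuleList([DecoderUnit(512, 512, 256), DecoderUnit(256, 256, 128)])
head = nn.Conv2d(128, 1, 1)
x = torch.randn(1, 512, 14, 14)                         # deepest encoder output
skips = [torch.randn(1, 512, 28, 28), torch.randn(1, 256, 56, 56)]  # spliced feats
for unit, skip in zip(units, skips):
    x = unit(x, skip)                                   # previous output + splice
saliency = torch.sigmoid(head(x))                       # Sigmoid-activated map
print(saliency.shape)                                   # torch.Size([1, 1, 56, 56])
```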
S3, performing salient target prediction on the remote sensing image data of the test set by using the trained remote sensing image salient target detection model, and further outputting a corresponding salient map.
In this embodiment, salient target prediction is performed on the test-set remote sensing image data with the trained model, yielding saliency maps with more accurate salient target boundaries.
The preset loss function is a combined loss function with different weight coefficients, formulated as:
L = ω_1 L_P + ω_2 L_R + ω_3 L_MAE + ω_4 L_S
where L_P, L_R, L_MAE and L_S denote the precision loss term, the recall loss term, the mean absolute error loss term and the structural similarity loss term, respectively, and ω_1, ω_2, ω_3, ω_4 denote their weight coefficients, with:
L_S = 1 - S_measure
S_measure = α × S_o + (1 - α) × S_r
where N is the total number of samples, n the sample index, j the pixel index along the image height, i the pixel index along the image width, ε a preset constant, W and H the width and height of the remote sensing image, s(i, j) ∈ S the predicted value of each pixel, g(i, j) ∈ G the true value of each pixel, S the saliency prediction result, G the true label, S_r the region-oriented similarity measure, S_o the object-structure-oriented similarity measure, and α a hyper-parameter balancing the region-oriented and object-structure-oriented similarity measures.
In this embodiment, structural similarity compares structural information between images, and the resulting difference measure agrees better with human visual perception; adopting the combined loss function with different weight coefficients therefore addresses the weak edge-detection ability of a plain cross-entropy loss in salient target detection.
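A sketch of the combined loss follows, assuming PyTorch; the exact forms of L_P and L_R are not reproduced in the source, so soft differentiable precision/recall surrogates with a small stabilizing constant are assumed, and the S-measure value is taken as given.

```python
# A minimal sketch of L = ω1·L_P + ω2·L_R + ω3·L_MAE + ω4·L_S (surrogate terms
# for L_P and L_R are assumptions; they are not spelled out in the source).
import torch

def combined_loss(s: torch.Tensor, g: torch.Tensor, s_measure: torch.Tensor,
                  w=(1.0, 1.0, 1.0, 1.0), eps: float = 1e-6) -> torch.Tensor:
    """s: predicted saliency in [0,1]; g: binary truth; s_measure: S-measure value."""
    tp = (s * g).sum()
    l_p = 1 - (tp + eps) / (s.sum() + eps)   # soft precision loss term (assumed form)
    l_r = 1 - (tp + eps) / (g.sum() + eps)   # soft recall loss term (assumed form)
    l_mae = (s - g).abs().mean()             # mean absolute error loss term
    l_s = 1 - s_measure                      # structural similarity loss term
    return w[0] * l_p + w[1] * l_r + w[2] * l_mae + w[3] * l_s

s = torch.rand(1, 1, 64, 64)
g = (torch.rand(1, 1, 64, 64) > 0.5).float()
print(combined_loss(s, g, torch.tensor(0.8)))
```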
In another embodiment, the method also comprises a step S4: comparing the output saliency maps with the truth maps to measure how well the remote sensing image salient target model generates saliency maps; specifically, the saliency maps generated by the model are measured against preset indices: the PR curve, the F value, the mean absolute error, and the S value.
In this embodiment, the saliency map generated by the model is compared with the truth map to quantitatively measure the quality of saliency map generation. Four indices are used for evaluation: the PR curve, the F value, the mean absolute error (MAE), and the S value.
Precision is the fraction of predicted positive samples that are correct; Recall is the fraction of labeled positive samples that are recovered. Sweeping the threshold over (0, 1) yields all (Precision, Recall) pairs, which, connected in order, form the Precision-Recall (PR) curve; the closer the PR curve approaches the point (1, 1), the better the model performs. FIG. 4 shows the PR curve of the remote sensing image salient target detection method in this embodiment.
The F value is defined as:
F_β = ((1 + β²) × Precision × Recall) / (β² × Precision + Recall)
where β² is set to 0.3 to emphasize the importance of Precision;
Mean absolute error (MAE) measures the absolute error between the saliency prediction and the truth map, formulated as:
MAE = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} |s(i, j) - g(i, j)|
where S denotes the saliency prediction result and G denotes the true label.
The S_measure value measures the generated saliency map at the level of structural similarity, expressed as:
S_measure = α × S_o + (1 - α) × S_r
where S_r is the region-oriented similarity measure, S_o is the object-structure-oriented similarity measure, and α is a hyper-parameter.
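A sketch of the F value and MAE computations, assuming NumPy and a 255-step threshold sweep with the max-F convention (both are assumptions; the S-measure internals are omitted since the text gives only its α-weighted form):

```python
# A minimal sketch of the F value and MAE evaluation in step S4 (NumPy assumed).
import numpy as np

def f_measure(pred: np.ndarray, gt: np.ndarray, beta2: float = 0.3) -> float:
    """Maximum F over thresholds, with beta^2 = 0.3 emphasizing Precision."""
    best = 0.0
    for t in np.linspace(0, 1, 255):
        binary = pred >= t
        tp = np.logical_and(binary, gt).sum()
        precision = tp / max(binary.sum(), 1)
        recall = tp / max(gt.sum(), 1)
        if precision + recall > 0:
            f = (1 + beta2) * precision * recall / (beta2 * precision + recall)
            best = max(best, f)
    return best

def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute error between prediction S and truth G."""
    return float(np.abs(pred - gt.astype(float)).mean())

pred = np.random.rand(64, 64)           # dummy saliency prediction
gt = np.random.rand(64, 64) > 0.5       # dummy binary truth map
print(f_measure(pred, gt), mae(pred, gt))
```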
In this embodiment, salient target detection is performed on the test set with the remote sensing image salient target detection method; the detection results are shown in Table 1, FIG. 5 and FIG. 6.
Table 1 Salient target detection results on the remote sensing images

Evaluation index    F        MAE      S
Value               0.9031   0.0048   0.9189
As can be seen from FIG. 5 and FIG. 6, the remote sensing image salient target detection method not only predicts salient targets accurately but also predicts their boundary regions accurately; prediction in small target scenes is likewise accurate, reducing missed and false detections.
In another embodiment, the method also comprises a step S23', in which multi-level pyramid fused multi-scale spatial attention extracts information from the output features. Specifically, the output features are downsampled by factors of 2 and 4 to form three channels of different resolutions; the multi-level pyramid fused multi-scale spatial attention refines the features at each scale; the refined features are fused with the output features through a residual structure; the three levels are then fused in order of resolution from low to high, yielding deep features weighted by the multi-level pyramid fused multi-scale spatial attention; finally, the deep features output by step S23 are merged with these weighted deep features.
In this embodiment, besides the attention among individual pixels, multi-scale attention over the whole image space can also extract useful information; specifically, after deriving the self-attention feature output, the GCA module applies the multi-level pyramid to fuse multi-scale spatial attention.
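A minimal sketch of step S23' follows, assuming PyTorch, average pooling for the 2× and 4× downsampling, and a single 3×3 convolution as the per-scale refinement (only the scales, the residual fusion, and the low-to-high fusion order come from the text):

```python
# A minimal sketch of the multi-level pyramid multi-scale spatial attention
# (pooling and refinement choices are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidSpatialAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.refine = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(3))

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        scales = [f,
                  F.avg_pool2d(f, 2),                  # 2x downsampling
                  F.avg_pool2d(f, 4)]                  # 4x downsampling
        refined = [conv(x) + x                         # refine + residual fusion
                   for conv, x in zip(self.refine, scales)]
        out = refined[2]                               # fuse low -> high resolution
        for level in (refined[1], refined[0]):
            out = level + F.interpolate(out, size=level.shape[2:],
                                        mode='bilinear', align_corners=False)
        return out

f = torch.randn(1, 64, 32, 32)
print(PyramidSpatialAttention(64)(f).shape)            # torch.Size([1, 64, 32, 32])
```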
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (8)

Translated from Chinese
1. A remote sensing image salient target detection method, characterized in that the method comprises the following steps:
S1, acquiring remote sensing image data comprising a training set and a test set, and constructing a remote sensing image salient target detection model comprising a detection feature encoder and a cascade feature decoder;
S2, introducing an attention mechanism, a feature flow mechanism and a cascade decoding mechanism, training the remote sensing image salient target detection model on the training-set remote sensing image data until the preset loss function converges, then stopping training to obtain a trained model;
S3, performing salient target prediction on the test-set remote sensing image data with the trained model and outputting the corresponding saliency maps;
the detection feature encoder is a dense attention flow encoder obtained by modifying a VGG16 backbone network: the last three fully connected layers of the VGG16 network are removed, and the network is truncated before its final pooling layer;
step S2 is implemented as follows:
S21, introducing the attention mechanism, extracting the output features of the last layer of each part of the modified VGG16 network, and merging the output feature dimensions based on a preset spatial pixel relation matrix to construct an inter-pixel operation matrix, thereby representing the relations between pixels;
S22, carrying out normalization based on the inter-pixel operation matrix to obtain attention weights, and multiplying the dimension-merged output features by the attention weights to obtain spatially self-attention-weighted features;
S23, adding the output features to the spatial-attention-weighted features through a residual connection, and obtaining the output deep features through a subsequent channel attention mechanism, formulated as:
F = CA(f + δ · (f ∗ Re^{-1}(Re(f) ⊙ R)))
where Re^{-1} denotes the inverse of the dimension-merging operation applied to the output features, R the pixel relation matrix, ∗ element-wise multiplication, δ a learnable coefficient, CA(·) the channel attention mechanism, and f the initial features output by the backbone network;
S24, upsampling the deep features and applying a 1×1 convolution so that their size and channel count match the current features;
S25, based on a preset progressive splicing module, splicing the upsampled and 1×1-convolved deep features with the current features, starting from the layer below the current features, in shallow-to-deep order;
S26, adjusting the channel count of the spliced features to that of the deep features output by the detection feature encoder, and feeding the spliced features into the cascade feature decoder for decoding;
S27, activating the final output of the cascade feature decoder with a Sigmoid function, thereby completing the training of the remote sensing image salient target detection model.
2. The remote sensing image salient target detection method according to claim 1, characterized in that the preset spatial pixel relation matrix in step S21 is formulated as:
M = {(Re(f))^T ⊙ Re(f)}^T
where Re(·) denotes the operation of merging the last two dimensions of the output features into one, ⊙ denotes matrix multiplication, and T denotes transposition.
3. The remote sensing image salient target detection method according to claim 2, characterized in that the normalization based on the inter-pixel operation matrix in step S22 is formulated as:
r(x, y) = e^{m(x, y)} / Σ_x e^{m(x, y)}
where r(x, y) represents how strongly pixel x influences pixel y, m(x, y) is an element of the pixel relation matrix, and e is the natural constant.
4. The remote sensing image salient target detection method according to claim 3, characterized by further comprising a step S23' of extracting information from the output features with multi-level pyramid fused multi-scale spatial attention, specifically: the output features are downsampled by factors of 2 and 4 to form three channels of different resolutions; the multi-level pyramid fused multi-scale spatial attention refines the features at each scale; the refined features are fused with the output features through a residual structure; the three levels are then fused in order of resolution from low to high, yielding deep features weighted by the multi-level pyramid fused multi-scale spatial attention; finally, the deep features output by step S23 are merged with these weighted deep features.
5. The remote sensing image salient target detection method according to claim 4, characterized in that the splicing of the upsampled and 1×1-convolved deep features with the current features in step S25 is formulated as:
F_k = Conc(Conv(Up(F_5)), ..., Conv(Up(F_{k-1})), F_k)
where Up(·) denotes upsampling that aligns the deep features with the current features, F_k denotes the k-th level features fed into the cascade decoder, F_5 denotes the 5th-level features fed into the cascade decoder, and Conv denotes a convolutional layer.
6. The remote sensing image salient target detection method according to claim 5, characterized in that the preset loss function is a combined loss function with different weight coefficients, formulated as:
L = ω_1 L_P + ω_2 L_R + ω_3 L_MAE + ω_4 L_S
where L_P, L_R, L_MAE and L_S denote the precision loss term, the recall loss term, the mean absolute error loss term and the structural similarity loss term, respectively, and ω_1, ω_2, ω_3, ω_4 denote their weight coefficients, with:
L_S = 1 - S_measure
S_measure = α × S_o + (1 - α) × S_r
where N is the total number of samples, n the sample index, j the pixel index along the image height, i the pixel index along the image width, ε a preset constant, W and H the width and height of the remote sensing image, s(i, j) ∈ S the predicted value of each pixel, g(i, j) ∈ G the true value of each pixel, S the saliency prediction result, G the true label, S_r the region-oriented similarity measure, S_o the object-structure-oriented similarity measure, and α a hyper-parameter balancing the region-oriented and object-structure-oriented similarity measures.
7. The remote sensing image salient target detection method according to claim 6, characterized by further comprising a step S4 of comparing the output saliency maps with the truth maps to measure how well the remote sensing image salient target model generates saliency maps.
8. The remote sensing image salient target detection method according to claim 7, characterized in that step S4 is implemented by measuring the saliency maps generated by the remote sensing image salient target model against preset indices: the PR curve, the F value, the mean absolute error, and the S value.
CN202210879580.0A | 2022-07-25 (priority) | 2022-07-25 (filed) | A method for salient object detection in remote sensing images | Active | CN115115938B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210879580.0A | 2022-07-25 | 2022-07-25 | A method for salient object detection in remote sensing images (CN115115938B)

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202210879580.0A | 2022-07-25 | 2022-07-25 | A method for salient object detection in remote sensing images (CN115115938B)

Publications (2)

Publication Number | Publication Date
CN115115938A | 2022-09-27
CN115115938B | 2025-04-15

Family

Family ID: 83334609

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210879580.0A (Active; CN115115938B (en)) | A method for salient object detection in remote sensing images | 2022-07-25 | 2022-07-25

Country Status (1)

Country | Link
CN | CN115115938B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115620163A (en)* | 2022-10-28 | 2023-01-17 | 西南交通大学 | A semi-supervised learning method for intelligent recognition of deep valleys based on remote sensing images
CN116994137B (en)* | 2023-08-04 | 2025-01-28 | 哈尔滨工业大学 | A target detection method based on multi-scale deformation modeling and region fine extraction
CN118015332A (en)* | 2024-01-03 | 2024-05-10 | 河海大学 | A method for salient object detection in remote sensing images

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109800629A (en)* | 2018-12-05 | 2019-05-24 | 天津大学 | A kind of Remote Sensing Target detection method based on convolutional neural networks
CN111179217A (en)* | 2019-12-04 | 2020-05-19 | 天津大学 | A multi-scale target detection method in remote sensing images based on attention mechanism

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN112347859B (en)* | 2020-10-15 | 2024-05-24 | 北京交通大学 | Method for detecting significance target of optical remote sensing image
CN112861795A (en)* | 2021-03-12 | 2021-05-28 | 云知声智能科技股份有限公司 | Method and device for detecting salient target of remote sensing image based on multi-scale feature fusion

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109800629A (en)* | 2018-12-05 | 2019-05-24 | 天津大学 | A kind of Remote Sensing Target detection method based on convolutional neural networks
CN111179217A (en)* | 2019-12-04 | 2020-05-19 | 天津大学 | A multi-scale target detection method in remote sensing images based on attention mechanism

Also Published As

Publication number | Publication date
CN115115938A (en) | 2022-09-27

Similar Documents

Publication | Publication Date | Title
CN115115938B (en) A method for salient object detection in remote sensing images
CN110909673A (en) Pedestrian re-identification method based on natural language description
CN117095277A (en) Edge-guided multi-attention RGBD underwater salient object detection method
CN118864865B (en) A remote sensing image water body segmentation method based on contrastive learning and multimodal fusion
CN118762364A (en) A method for infrared small target detection based on scene text information guidance
CN116823664A (en) Remote sensing image cloud removal method and system
CN114926826A (en) Scene text detection system
CN117173594A (en) Remote sensing image change detection method based on deformable attention network
CN116229106A (en) Video significance prediction method based on double-U structure
CN116486183B (en) SAR image building area classification method based on multiple attention weight fusion characteristics
CN116245861A (en) Cross multi-scale-based non-reference image quality evaluation method
CN117809198A (en) Remote sensing image significance detection method based on multi-scale feature aggregation network
CN118247323A (en) Scene depth estimation model training method, scene depth estimation method and device
CN119339075A (en) Image segmentation method and device combining feature difference recognition and detail enhancement
CN117152630A (en) A deep learning-based optical remote sensing image change detection method
CN116721314A (en) Small target detection method based on smooth interactive compression network
CN119919782A (en) A remote sensing target detection method and system based on selective feature space fusion
CN118230076B (en) Multi-label classification method for remote sensing images based on semantic and label structure mining
CN118334362B (en) Heterogeneous image matching method and system based on contrast learning
CN119600291A (en) Infrared dim target detection method based on double-branch attention mechanism
CN117830115B (en) A design method for single-lens computational imaging system for depth estimation
CN118736433A (en) Multi-scale building and construction waste extraction method based on high-resolution remote sensing images
CN116524208B (en) Interactive saliency mining method for RGB-D (red, green and blue-white) salient target detection
CN116993639A (en) Visible and infrared image fusion method based on structural reparameterization
CN116630637A (en) optical-SAR image joint interpretation method based on multi-modal contrast learning

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
