技术领域Technical Field
本发明属于雷达目标检测技术领域,具体涉及一种基于有监督注意力机制的无锚框SAR目标检测方法。The present invention belongs to the technical field of radar target detection, and in particular relates to an anchor-free SAR target detection method based on a supervised attention mechanism.
背景技术Background Art
合成孔径雷达(Synthetic Aperture Radar,SAR)由于其能够在任何气候条件下不分昼夜地作业,具有全天时、全天候的特点,已经被广泛应用在军事侦查、资源勘探、环境保护、灾害预防和科学研究等各种领域。SAR图像自动目标识别(Automatic Target Recognition,ATR)技术致力于对复杂SAR场景中的目标进行定位和识别,是SAR图像应用的核心方向之一,在军事和民用领域都处于至关重要的地位。美国林肯实验室最早开展此方面的研究工作,并于20世纪80年代末提出了著名的SAR ATR三级处理流程:SAR目标检测、鉴别和识别,它是SAR图像解译最常见的处理流程。其中的SAR目标检测任务是SAR图像解译的重要内容。Synthetic Aperture Radar (SAR) can operate day and night under any weather conditions, giving it all-day, all-weather capability, and has therefore been widely applied in fields such as military reconnaissance, resource exploration, environmental protection, disaster prevention and scientific research. SAR automatic target recognition (ATR) technology aims to locate and recognize targets in complex SAR scenes; it is one of the core directions of SAR image application and plays a vital role in both military and civilian domains. Lincoln Laboratory in the United States was the first to carry out research in this area and, in the late 1980s, proposed the well-known three-stage SAR ATR processing flow of SAR target detection, discrimination and recognition, which remains the most common processing flow for SAR image interpretation. The SAR target detection task within it is an important part of SAR image interpretation.
自从合成孔径雷达问世以来,有关SAR目标检测的技术也在飞速发展。在技术发展早期阶段,大量学者主要对传统目标检测方法进行研究,其中最为著名的方法是由美国林肯实验室于20世纪90年代提出的双参数恒虚警(Constant False Alarm Rate,CFAR)检测方法,该方法成功将CFAR检测器扩展到了二维的SAR图像目标检测领域中。由于双参数CFAR算法在一些简单场景中检测性能优越,大量研究都开始围绕恒虚警算法展开,提出了最小选择CFAR、最大选择CFAR、单元平均CFAR、序贯统计CFAR等检测器。虽然传统目标检测方法能够在一些简单场景中取得较好的目标检测性能,但是其仍具有许多缺点:(1)通常涉及大量超参数并需要设置阈值来完成目标检测任务,所以在实际中需要根据使用场景手动调参,流程繁琐,难以实现自适应检测,并且目标检测精确率较低;(2)传统目标检测方法通常是逐像素检测,检测时间较长,并且在一些复杂场景中难以取得较好的目标检测性能。因此,传统目标检测方法已逐渐满足不了实际的使用需要。Since the advent of synthetic aperture radar, the technology of SAR target detection has also developed rapidly. In the early stage of technology development, a large number of scholars mainly studied traditional target detection methods. The most famous method is the dual-parameter constant false alarm rate (CFAR) detection method proposed by the Lincoln Laboratory in the United States in the 1990s. This method successfully extended the CFAR detector to the field of two-dimensional SAR image target detection. Since the dual-parameter CFAR algorithm has excellent detection performance in some simple scenes, a large number of studies have begun to focus on the constant false alarm algorithm, and detectors such as minimum selection CFAR, maximum selection CFAR, unit average CFAR, and sequential statistical CFAR have been proposed. Although traditional target detection methods can achieve good target detection performance in some simple scenes, they still have many disadvantages: (1) They usually involve a large number of hyperparameters and need to set thresholds to complete the target detection task. Therefore, in practice, they need to manually adjust the parameters according to the usage scenario. The process is cumbersome, it is difficult to achieve adaptive detection, and the target detection accuracy is low; (2) Traditional target detection methods are usually pixel-by-pixel detection, which takes a long time to detect and is difficult to achieve good target detection performance in some complex scenes. Therefore, traditional target detection methods have gradually failed to meet actual usage needs.
近年来,随着深层网络在光学图像目标检测领域中大获成功,基于深度学习的SAR目标检测方法成为了众多学者研究的热点,目前已经在SAR目标检测任务中得到了广泛的应用,得到了远远优于传统目标检测方法的SAR目标检测性能。例如,专利CN202210269829.6提供了一种基于CFAR指导的双流SSD SAR图像目标检测方法,其通过SAR幅度特征和CFAR指示特征在特征空间上进行融合,来充分利用SAR图像中目标的强散射特性,增强目标检测性能;然后利用CFAR二值指示图来让检测器更加关注难分负样本和正样本的学习;最后提出了AR-NMS算法改进了传统的NMS算法,提高了SAR目标检测性能。In recent years, with the great success of deep networks in the field of optical image target detection, SAR target detection methods based on deep learning have become a hot topic for many scholars. At present, they have been widely used in SAR target detection tasks, and have achieved SAR target detection performance that is far superior to traditional target detection methods. For example, patent CN202210269829.6 provides a dual-stream SSD SAR image target detection method based on CFAR guidance, which fully utilizes the strong scattering characteristics of targets in SAR images and enhances target detection performance by fusing SAR amplitude features and CFAR indicator features in feature space; then, the CFAR binary indicator map is used to make the detector pay more attention to the learning of difficult-to-distinguish negative samples and positive samples; finally, the AR-NMS algorithm is proposed to improve the traditional NMS algorithm and improve the SAR target detection performance.
但是,对于SAR地面目标数据,由于其数据较少并且待检测场景较为复杂,通常包含大量与目标特性十分相似的自然杂波和人造杂波,这使得地面目标检测任务十分困难,目前有关SAR地面目标检测的研究也相对较少,主流的基于深层网络的SAR地面目标检测方法大多数是基于有锚框的目标检测方法,这些目标检测框架通常需要预先铺设大量锚框,其中涉及大量超参数的设置和复杂的IoU计算,并且存在严重的正负样本不平衡问题,这不利于SAR目标检测任务。另外,现有方法没有深入考虑SAR图像本身特有的几何畸变、辐射畸变和阴影遮挡等特点和SAR图像本身的特性对SAR目标检测的帮助,导致目前有关复杂场景下SAR目标检测方法的检测性能不高。However, for SAR ground target data, due to the small amount of data and the complex scenes to be detected, it usually contains a large amount of natural clutter and artificial clutter that are very similar to the target characteristics, which makes the ground target detection task very difficult. At present, there are relatively few studies on SAR ground target detection. Most of the mainstream SAR ground target detection methods based on deep networks are based on target detection methods with anchor boxes. These target detection frameworks usually require a large number of anchor boxes to be laid in advance, which involves a large number of hyperparameter settings and complex IoU calculations, and there is a serious imbalance problem between positive and negative samples, which is not conducive to SAR target detection tasks. In addition, the existing methods do not deeply consider the unique geometric distortion, radiation distortion and shadow occlusion of SAR images themselves and the help of the characteristics of SAR images themselves to SAR target detection, resulting in the low detection performance of the current SAR target detection methods in complex scenes.
发明内容Summary of the invention
为了解决现有技术中存在的上述问题,本发明提供了一种基于有监督注意力机制的无锚框SAR目标检测方法。本发明的技术思路是:通过对训练样本分别使用指数加权平均(Ratio of Exponentially Weighted Averages,ROEWA)边缘检测算法和双参数CFAR算法,得到每个训练样本的梯度信息和CFAR信息,并一同送入到构建好的基于梯度信息、CFAR信息融合和有监督注意力机制的无锚框SAR目标检测网络中进行训练,然后对测试样本进行同样的处理并输入到训练好的网络模型中得到最终的目标检测结果。本发明要解决的技术问题通过以下技术方案实现:In order to solve the above-mentioned problems existing in the prior art, the present invention provides an anchor-free SAR target detection method based on a supervised attention mechanism. The technical idea of the present invention is: by using the exponentially weighted average (Ratio of Exponentially Weighted Averages, ROEWA) edge detection algorithm and the dual-parameter CFAR algorithm for the training samples respectively, the gradient information and CFAR information of each training sample are obtained, and sent together to the constructed anchor-free SAR target detection network based on gradient information, CFAR information fusion and supervised attention mechanism for training, and then the test sample is processed in the same way and input into the trained network model to obtain the final target detection result. The technical problem to be solved by the present invention is achieved by the following technical solutions:
一种基于有监督注意力机制的无锚框SAR目标检测方法,包括:A method for anchor-free SAR target detection based on a supervised attention mechanism, comprising:
步骤1:基于原始SAR图像获得原始训练集和原始测试集;利用ROEWA边缘检测算法和双参数CFAR算法分别对原始训练集和原始测试集中获取梯度信息和CFAR信息,并据此构建新的训练集和新的测试集;Step 1: Obtain the original training set and the original test set based on the original SAR image; use the ROEWA edge detection algorithm and the dual-parameter CFAR algorithm to obtain the gradient information and CFAR information from the original training set and the original test set respectively, and construct a new training set and a new test set accordingly;
步骤2:构建基于梯度信息、CFAR信息融合和有监督注意力机制的无锚框SAR目标检测网络;其中,所述目标检测网络包括特征提取模块、特征处理模块和网络预测模块;Step 2: construct an anchor-free SAR target detection network based on gradient information, CFAR information fusion and supervised attention mechanism; wherein the target detection network includes a feature extraction module, a feature processing module and a network prediction module;
步骤3:利用所述新的训练集对所述目标检测网络进行训练,得到训练好的目标检测网络;Step 3: Using the new training set to train the target detection network to obtain a trained target detection network;
步骤4:将所述新的测试集输入到训练好的目标检测网络中,得到初步的目标检测结果;Step 4: Input the new test set into the trained target detection network to obtain preliminary target detection results;
步骤5:将所述初步的目标检测结果对应到测试图像上,并进行NMS操作以去除重叠的目标检测框,得到最终的目标检测结果。Step 5: Correspond the preliminary target detection result to the test image, and perform NMS operation to remove overlapping target detection frames to obtain the final target detection result.
本发明的有益效果:Beneficial effects of the present invention:
1、本发明提供的基于有监督注意力机制的无锚框SAR目标检测方法从增强目标特征的角度出发,引入了梯度幅度信息和CFAR信息,构建了基于梯度信息、CFAR信息融合的无锚框目标检测网络,实现了复杂场景下地面SAR目标检测任务,能够有效缓解有锚框目标检测网络本身存在的计算复杂、正负样本不平衡等问题,提升了SAR目标检测性能;1. The anchor-free SAR target detection method based on the supervised attention mechanism provided by the present invention introduces gradient amplitude information and CFAR information from the perspective of enhancing target features, constructs an anchor-free target detection network based on the fusion of gradient information and CFAR information, realizes the ground SAR target detection task in complex scenes, and can effectively alleviate the problems of computational complexity and imbalance of positive and negative samples existing in the anchor-free target detection network itself, thereby improving the SAR target detection performance;
2、本发明设计的目标检测网络在特征融合方面,采用了基于交互式注意力机制的三支路交互式通道-空间注意力融合(Triple-Interactive Channel-Spatial Attention Fusion,T-ICSAF)模块;同时,提出了基于Ground Truth(GT)二值标签监督的有监督空间注意力机制和SE通道注意力机制结合(Combining Supervised-Spatial And SE-Channel Attention Mechanism,CSSCAM)模块来有效地抑制背景杂波特征,解决了由于传统特征引入所带来的额外背景杂波特征造成虚警框增加的问题,进一步增强目标特征,让特征提取后的目标特征更具有目标性,从而获得更好的SAR目标检测性能;2. For feature fusion, the target detection network designed by the present invention adopts a Triple-Interactive Channel-Spatial Attention Fusion (T-ICSAF) module based on an interactive attention mechanism; at the same time, a module Combining a Supervised Spatial and an SE Channel Attention Mechanism (CSSCAM), supervised by Ground Truth (GT) binary labels, is proposed to effectively suppress background clutter features, solving the problem of additional false-alarm boxes caused by the extra background clutter features introduced along with the traditional features, further enhancing target features and making the extracted target features more target-oriented, thereby achieving better SAR target detection performance;
3、本发明根据SAR图像本身具有几何畸变、辐射畸变、遮挡阴影等特点,设计了一个基于GT二值标签监督的Attention二分类分支取代了原始无锚框目标检测网络FCOS中的Centerness分支,让其更加适用于SAR车辆目标检测任务,并能联合本发明提出的CSSCAM模块使用,从而进一步提升SAR目标检测性能。3. According to the characteristics of SAR images such as geometric distortion, radiation distortion, occlusion and shadow, the present invention designs an Attention binary classification branch based on GT binary label supervision to replace the Centerness branch in the original anchor-free target detection network FCOS, making it more suitable for SAR vehicle target detection tasks, and can be used in conjunction with the CSSCAM module proposed in the present invention, thereby further improving the SAR target detection performance.
以下将结合附图及实施例对本发明做进一步详细说明。The present invention will be further described in detail below with reference to the accompanying drawings and embodiments.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
图1是本发明实施例提供的基于有监督注意力机制的无锚框SAR目标检测方法的一种流程示意图;FIG1 is a flow chart of a method for detecting SAR targets without anchor frames based on a supervised attention mechanism according to an embodiment of the present invention;
图2是本发明实施例提供的基于有监督注意力机制的无锚框SAR目标检测方法的另一种流程示意图;FIG2 is another schematic diagram of a flow chart of a method for anchor-free SAR target detection based on a supervised attention mechanism provided by an embodiment of the present invention;
图3是本发明实施例提供的无锚框SAR目标检测网络框架图;FIG3 is a network framework diagram of anchor-free SAR target detection provided by an embodiment of the present invention;
图4是本发明实施例提供的T-ICSAF特征融合模块的网络框架图;FIG4 is a network framework diagram of a T-ICSAF feature fusion module provided in an embodiment of the present invention;
图5是本发明实施例提供的CSSCAM模块的网络框架图;FIG5 is a network framework diagram of a CSSCAM module provided in an embodiment of the present invention;
图6-11是本发明实验所使用的MiniSAR数据图像。Figures 6-11 are MiniSAR data images used in the experiments of the present invention.
具体实施方式DETAILED DESCRIPTION
下面结合具体实施例对本发明做进一步详细的描述,但本发明的实施方式不限于此。The present invention is further described in detail below with reference to specific embodiments, but the embodiments of the present invention are not limited thereto.
实施例一Embodiment 1
请联合参见图1-2,图1是本发明实施例提供的基于有监督注意力机制的无锚框SAR目标检测方法的一种流程示意图,图2是本发明实施例提供的基于有监督注意力机制的无锚框SAR目标检测方法的另一种流程示意图。本发明提供的基于有监督注意力机制的无锚框SAR目标检测方法具体包括:Please refer to Figures 1-2, Figure 1 is a flowchart of a method for anchor-free SAR target detection based on a supervised attention mechanism provided by an embodiment of the present invention, and Figure 2 is another flowchart of a method for anchor-free SAR target detection based on a supervised attention mechanism provided by an embodiment of the present invention. The method for anchor-free SAR target detection based on a supervised attention mechanism provided by the present invention specifically includes:
步骤1:基于原始SAR图像获得原始训练集和原始测试集;利用ROEWA边缘检测算法和双参数CFAR算法分别对原始训练集和原始测试集中获取梯度信息和CFAR信息,并据此构建新的训练集和新的测试集。Step 1: Obtain the original training set and the original test set based on the original SAR image; use the ROEWA edge detection algorithm and the dual-parameter CFAR algorithm to obtain the gradient information and CFAR information from the original training set and the original test set respectively, and construct a new training set and a new test set accordingly.
一、构建新的训练集1. Construct a new training set
首先,选取若干幅原始SAR图像作为训练图像,并对其进行切片,以得到若干训练切片作为原始训练集ψ;Firstly, several original SAR images are selected as training images and sliced to obtain several training slices as the original training set ψ;
然后,对选取的训练图像分别通过ROEWA边缘检测算法和双参数CFAR算法得到梯度幅度训练图像和CFAR二值训练图像;Then, the selected training images are respectively subjected to the ROEWA edge detection algorithm and the dual-parameter CFAR algorithm to obtain the gradient amplitude training image and the CFAR binary training image;
最后,对梯度幅度训练图像和CFAR二值训练图像进行切片,得到与原始训练集ψ对应的梯度幅度训练切片和CFAR二值训练切片,并与原始训练集ψ一起形成新的训练集ψ'。Finally, the gradient-magnitude training image and the CFAR binary training image are sliced to obtain the gradient-magnitude training slices and the CFAR binary training slices corresponding to the original training set ψ, which together with the original training set ψ form the new training set ψ'.
可选的,作为一种实现方式,本实施例可以从MiniSAR数据集中选取五幅图像作为训练图像,以45为步长,800×1333为切片大小得到3536张训练切片构成原始训练集ψ;然后对选取的训练图像分别通过ROEWA边缘检测算法和双参数CFAR算法得到梯度幅度训练图像和CFAR二值训练图像,并以45为步长,800×1333为切片大小得到与原始训练集ψ对应的梯度幅度训练切片和CFAR二值训练切片最后与原始训练集ψ一起构成最终的训练集ψ'。Optionally, as an implementation method, this embodiment can select five images from the MiniSAR data set as training images, with a step size of 45 and a slice size of 800×1333 to obtain 3536 training slices to form an original training set ψ; then, the selected training images are respectively subjected to the ROEWA edge detection algorithm and the dual-parameter CFAR algorithm to obtain a gradient amplitude training image and a CFAR binary training image, and with a step size of 45 and a slice size of 800×1333 to obtain a gradient amplitude training slice corresponding to the original training set ψ and CFAR binary training slices Finally, together with the original training set ψ, they form the final training set ψ'.
二、构建新的测试集2. Build a new test set
首先,选取一幅原始SAR图像作为测试图像,并对其进行切片,以得到若干测试切片作为原始测试集T;First, an original SAR image is selected as a test image and sliced to obtain several test slices as the original test set T;
然后,对选取的测试图像分别通过ROEWA边缘检测算法和双参数CFAR算法得到梯度幅度测试图像和CFAR二值测试图像;Then, the selected test image is subjected to the ROEWA edge detection algorithm and the dual-parameter CFAR algorithm to obtain the gradient amplitude test image and the CFAR binary test image;
最后,对梯度幅度测试图像和CFAR二值测试图像进行切片,得到与原始测试集T对应的梯度幅度测试切片和CFAR二值测试切片,并与原始测试集T一起形成新的测试集T'。Finally, the gradient-magnitude test image and the CFAR binary test image are sliced to obtain the gradient-magnitude test slices and the CFAR binary test slices corresponding to the original test set T, which together with the original test set T form the new test set T'.
可选的,作为一种实现方式,从MiniSAR数据集中选取一幅图像作为测试图像,以150为步长,800×1333为切片大小得到63张测试切片构成原始测试集T,并记录每个测试切片与测试图像对应的位置关系Loc,然后对选取的测试图像分别通过ROEWA边缘检测算法和双参数CFAR算法得到梯度幅度测试图像和CFAR二值测试图像,并以150为步长,800×1333为切片大小得到与原始测试集T对应的梯度幅度测试切片和CFAR二值测试切片最后与原始测试集T一起构成最终的测试集T'。Optionally, as an implementation method, an image is selected from the MiniSAR data set as a test image, and 63 test slices are obtained with a step size of 150 and a slice size of 800×1333 to form an original test set T, and the positional relationship Loc corresponding to each test slice and the test image is recorded, and then the selected test image is respectively subjected to the ROEWA edge detection algorithm and the dual-parameter CFAR algorithm to obtain a gradient amplitude test image and a CFAR binary test image, and the gradient amplitude test slice corresponding to the original test set T is obtained with a step size of 150 and a slice size of 800×1333. and CFAR binary test slices Finally, together with the original test set T, they form the final test set T'.
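For illustration, the following NumPy sketch shows one possible implementation of the slicing described above, including recording each slice's top-left position (the Loc record); the function name, return format and zero-filled placeholder image are assumptions rather than part of the original disclosure.

```python
import numpy as np

def slice_image(image, slice_h=800, slice_w=1333, stride=150):
    """Cut a large SAR image into overlapping slices with a fixed stride and record
    each slice's top-left corner so detections can later be mapped back (Loc)."""
    H, W = image.shape[:2]
    slices, locs = [], []
    for top in range(0, max(H - slice_h, 0) + 1, stride):
        for left in range(0, max(W - slice_w, 0) + 1, stride):
            slices.append(image[top:top + slice_h, left:left + slice_w])
            locs.append((top, left))
    return slices, locs

# Test image: stride 150; training images would use stride 45 instead.
test_image = np.zeros((2000, 3000), dtype=np.float32)   # placeholder amplitude image
test_slices, test_locs = slice_image(test_image, stride=150)
```

Applying the same routine with identical parameters to the amplitude image, the gradient-magnitude image and the CFAR binary image keeps the three sets of slices aligned.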
进一步的,在构建训练集和测试集的过程中,均需要利用ROEWA边缘检测算法获取SAR图像的梯度幅度图像,其操作过程相同,具体步骤如下:Furthermore, in the process of constructing the training set and the test set, the ROEWA edge detection algorithm is needed to obtain the gradient amplitude image of the SAR image. The operation process is the same, and the specific steps are as follows:
1)输入原始SAR图像;1) Input the original SAR image;
2)对于输入SAR图像中的任一像素点(i,j),首先使用ROEWA算法分别计算该像素点在水平方向上的水平梯度∇H(i,j)和在垂直方向上的垂直梯度∇V(i,j),然后利用以下公式计算像素点(i,j)的梯度幅度Gi,j:2) For any pixel (i, j) in the input SAR image, the ROEWA algorithm is first used to compute its horizontal gradient ∇H(i,j) and its vertical gradient ∇V(i,j); the gradient magnitude Gi,j of pixel (i, j) is then computed as:

$$G_{i,j}=\sqrt{\nabla_H(i,j)^2+\nabla_V(i,j)^2}$$
具体的,在实施例中,使用ROEWA算法计算任一像素点(i,j)的水平梯度和垂直梯度的具体过程如下:Specifically, in the embodiment, the ROEWA algorithm is used to calculate the horizontal gradient of any pixel point (i, j) and vertical gradient The specific process is as follows:
a、计算水平梯度a. Calculate horizontal gradient
对于任一像素点(i,j),首先计算该像素点左右两侧(4σ+1)×2σ范围内的像素点幅度值的指数加权均值ML和MR,然后将ML和MR作商并取对数,从而得到水平梯度∇H(i,j),其中σ为指数加权因子。水平梯度的计算公式如下所示:For any pixel (i, j), the exponentially weighted means ML and MR of the pixel amplitudes within the (4σ+1)×2σ windows on the left and right sides of the pixel are computed first, and the horizontal gradient ∇H(i,j) is obtained by taking the logarithm of their ratio, where σ is the exponential weighting factor. The horizontal gradient is computed as:

$$\nabla_H(i,j)=\log\frac{M_L(i,j)}{M_R(i,j)}$$
其中I(·)表示SAR图像中像素点的幅度值。Where I(·) represents the amplitude value of the pixel in the SAR image.
b、计算垂直梯度b. Calculate vertical gradient
对于任一像素点(i,j),首先计算该像素点上下两侧(4σ+1)×2σ范围内的像素点幅度值的指数加权均值MT和MB,然后将MT和MB作商并取对数,从而得到垂直梯度∇V(i,j),其中σ为指数加权因子。垂直梯度的计算公式如下所示:For any pixel (i, j), the exponentially weighted means MT and MB of the pixel amplitudes within the (4σ+1)×2σ windows above and below the pixel are computed first, and the vertical gradient ∇V(i,j) is obtained by taking the logarithm of their ratio, where σ is the exponential weighting factor. The vertical gradient is computed as:

$$\nabla_V(i,j)=\log\frac{M_T(i,j)}{M_B(i,j)}$$
其中I(·)表示SAR图像中像素点的幅度值。Where I(·) represents the amplitude value of the pixel in the SAR image.
3)重复2)操作,得到输入SAR图像中每个像素点的梯度幅度,从而得到原始SAR图像对应的梯度幅度图像。3) Repeat operation 2) to obtain the gradient amplitude of each pixel in the input SAR image, thereby obtaining the gradient amplitude image corresponding to the original SAR image.
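A simplified reference implementation of steps 1)–3) is sketched below; the exact exponential weighting and border handling of the original ROEWA detector are not fully specified in the text, so the weight function and the parameter sigma are assumptions.

```python
import numpy as np

def _exp_weighted_mean(img, i, j, di_range, dj_range, sigma):
    """Exponentially weighted mean of the amplitudes in a rectangular window around
    (i, j); weights decay with the offset from the centre pixel."""
    H, W = img.shape
    num = den = 0.0
    for di in di_range:
        for dj in dj_range:
            r, c = i + di, j + dj
            if 0 <= r < H and 0 <= c < W:
                w = np.exp(-(abs(di) + abs(dj)) / sigma)
                num += w * img[r, c]
                den += w
    return num / max(den, 1e-12)

def roewa_gradient(img, sigma=2, eps=1e-6):
    """Per-pixel ROEWA gradient magnitude: the horizontal/vertical gradients are the
    log-ratios of the exponentially weighted means on the two sides of the pixel."""
    H, W = img.shape
    full = range(-2 * sigma, 2 * sigma + 1)                     # (4*sigma + 1) rows or columns
    neg, pos = range(-2 * sigma, 0), range(1, 2 * sigma + 1)    # 2*sigma to each side
    grad = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            m_l = _exp_weighted_mean(img, i, j, full, neg, sigma)   # left window
            m_r = _exp_weighted_mean(img, i, j, full, pos, sigma)   # right window
            m_t = _exp_weighted_mean(img, i, j, neg, full, sigma)   # top window
            m_b = _exp_weighted_mean(img, i, j, pos, full, sigma)   # bottom window
            g_h = np.log((m_l + eps) / (m_r + eps))
            g_v = np.log((m_t + eps) / (m_b + eps))
            grad[i, j] = np.sqrt(g_h ** 2 + g_v ** 2)
    return grad
```

The double loop is written for clarity; a practical implementation would vectorise the weighted means with separable exponential filters.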
更进一步的,在构建训练集和测试集的过程中,还需要利用双参数CFAR算法获取SAR图像的CFAR二值图像,其操作步骤如下:Furthermore, in the process of constructing the training set and the test set, it is also necessary to use the dual-parameter CFAR algorithm to obtain the CFAR binary image of the SAR image. The operation steps are as follows:
1)输入原始SAR图像;1) Input the original SAR image;
2)对于任一像素点(i,j),利用其幅度Ii,j定义CFAR检测统计量:2) For any pixel (i, j), the CFAR detection statistic is defined from its amplitude Ii,j as:

$$D_{i,j}=\frac{I_{i,j}-\hat{\mu}}{\hat{\sigma}}$$

其中μ̂和σ̂是高斯分布均值和标准差的最大似然估计,是利用像素点(i,j)附近N个背景杂波像素点的幅度值xi计算得到的:where μ̂ and σ̂ are the maximum-likelihood estimates of the mean and standard deviation of the Gaussian distribution, computed from the amplitudes xi of the N background-clutter pixels around pixel (i, j):

$$\hat{\mu}=\frac{1}{N}\sum_{i=1}^{N}x_i,\qquad \hat{\sigma}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i-\hat{\mu}\right)^2}$$
3)将2)计算得到的CFAR检测统计量Di,j与CFAR检测阈值T进行比较,如果Di,j>T,则将该像素点被判为目标,CFAR二值检测结果为1,即将CFAR二值图像中的对应位置设置为1;如果Di,j<T,则将该像素点被判为背景,CFAR二值检测结果为0,即将CFAR二值图像中的对应位置设置为0;3) Compare the CFAR detection statistic Di,j calculated in 2) with the CFAR detection threshold T. If Di,j > T, the pixel is judged as a target, and the CFAR binary detection result is 1, that is, the corresponding position in the CFAR binary image is set to 1; if Di,j < T, the pixel is judged as a background, and the CFAR binary detection result is 0, that is, the corresponding position in the CFAR binary image is set to 0;
4)重复上述2)、3)操作,得到输入SAR图像中每个像素点对应的CFAR二值检测结果,从而得到原始SAR图像对应的CFAR二值图像。4) Repeat the above 2) and 3) operations to obtain the CFAR binary detection result corresponding to each pixel in the input SAR image, thereby obtaining the CFAR binary image corresponding to the original SAR image.
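The following sketch illustrates the two-parameter CFAR binarisation of steps 1)–4); the guard-window and background-window sizes and the threshold value are illustrative assumptions, since the text does not fix them.

```python
import numpy as np

def cfar_binary(img, guard=4, background=8, threshold=3.0):
    """Two-parameter CFAR: for every pixel, estimate the mean and standard deviation of
    the surrounding background clutter (a ring outside a guard window), form the
    statistic D = (I - mu) / sigma and compare it with the detection threshold."""
    H, W = img.shape
    out = np.zeros((H, W), dtype=np.uint8)
    r_out = guard + background
    for i in range(H):
        for j in range(W):
            i0, i1 = max(i - r_out, 0), min(i + r_out + 1, H)
            j0, j1 = max(j - r_out, 0), min(j + r_out + 1, W)
            window = img[i0:i1, j0:j1]
            # mask out the guard region so the target itself does not bias the statistics
            mask = np.ones(window.shape, dtype=bool)
            gi0, gi1 = max(i - guard, i0), min(i + guard + 1, i1)
            gj0, gj1 = max(j - guard, j0), min(j + guard + 1, j1)
            mask[gi0 - i0:gi1 - i0, gj0 - j0:gj1 - j0] = False
            clutter = window[mask]
            mu, sigma = clutter.mean(), clutter.std() + 1e-12
            out[i, j] = 1 if (img[i, j] - mu) / sigma > threshold else 0
    return out
```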
步骤2:构建基于梯度信息、CFAR信息融合和有监督注意力机制的无锚框SAR目标检测网络。Step 2: Construct an anchor-free SAR target detection network based on gradient information, CFAR information fusion and supervised attention mechanism.
请参见图3,图3是本发明实施例提供的无锚框SAR目标检测网络框架图,该目标检测网络包括特征提取模块、特征处理模块和网络预测模块。Please refer to FIG. 3 , which is a framework diagram of an anchor-free SAR target detection network provided by an embodiment of the present invention. The target detection network includes a feature extraction module, a feature processing module and a network prediction module.
下面分别对上述三个模块进行详细介绍。The above three modules are introduced in detail below.
(一)特征提取模块(I) Feature extraction module
如图3所示,特征提取模块包括结构完全相同但参数不共享的三个特征提取子网络A、B、C,分别为幅度特征提取网络A、梯度特征提取网络B和CFAR特征提取网络C;其中,As shown in Figure 3, the feature extraction module includes three feature extraction sub-networks A, B, and C with exactly the same structure but no shared parameters, namely, amplitude feature extraction network A, gradient feature extraction network B, and CFAR feature extraction network C;
三个特征提取子网络A、B、C均包括一个以ResNet-18为骨架的特征提取模块和一个FPN多尺度特征融合模块;The three feature extraction sub-networks A, B, and C each include a feature extraction module with ResNet-18 as the skeleton and an FPN multi-scale feature fusion module;
三个特征提取模块分别用于对训练切片ψ、和进行特征提取,得到对应的输出特征层和i=A,B,C;The three feature extraction modules are used to extract the training slices ψ, and Perform feature extraction to obtain the corresponding output feature layer and i=A,B,C;
三个FPN多尺度特征融合模块分别用于对输出特征层和进行多尺度特征融合,得到整个特征提取模块的输出特征层和The three FPN multi-scale feature fusion modules are used to output feature layers and Perform multi-scale feature fusion to obtain the output feature layer of the entire feature extraction module and
在本实施例中,三个特征提取子网络A、B、C的特征提取网络所使用的ResNet-18的网络结构、参数设置及对应关系如下:In this embodiment, the network structure, parameter settings and corresponding relationships of ResNet-18 used in the feature extraction networks of the three feature extraction subnetworks A, B, and C are as follows:
所使用到的ResNet-18的网络结构主要包括五个layer:conv1、conv2_x、conv3_x、conv4_x和conv5_x,输入网络的图像大小为H×W×3:The network structure of ResNet-18 used mainly includes five layers: conv1, conv2_x, conv3_x, conv4_x and conv5_x. The image size of the input network is H×W×3:
conv1:包括一个卷积层L1,其卷积核窗口大小为7×7,滑动步长为2,填充属性Padding为3,用于输出64个大小的特征图j表示第j个特征图。conv1的输出为conv2_x的输入;conv1: includes a convolution layerL1 , whose convolution kernel window size is 7×7, sliding step size is 2, padding attribute Padding is 3, used to output 64 Feature map of size j represents the jth feature map. The output of conv1 is the input of conv2_x;
conv2_x:包括一个池化层P1、两个卷积模块M1、M2和两个残差模块R1、R2。具体地,P1采用最大池化的方式,其输入为conv1输出的特征层X1,卷积核窗口大小为3×3,滑动步长为2,填充属性Padding为1,用于输出64个大小的特征图j表示第j个特征图。P1的输出为M1和R1的输入;M1包含两个卷积层L2和L3,用于输出64个大小的特征图j表示第j个特征图。其中L2和L3的卷积核窗口大小为3×3,滑动步长为1,填充属性Padding为1,M1的输出为R1的另一个输入;R1包含一个将P1的输出特征层和M1的输出特征层的对应元素进行加和的操作,用于输出64个大小的特征图j表示第j个特征图。R1的输出为M2和R2的输入;M2同样包含两个卷积层L4和L5,用于输出64个大小的特征图j表示第j个特征图。其中L4和L5的卷积核窗口大小为3×3,滑动步长为1,填充属性Padding为1,M2的输出为R2的另一个输入;R2包含一个将R1的输出特征层和M2的输出特征层的对应元素进行加和的操作,用于输出64个大小的特征图j表示第j个特征图。R2的输出为conv3_x的输入;conv2_x: includes a pooling layer P1 , two convolution modules M1 , M2 and two residual modules R1 , R2 . Specifically, P1 uses the maximum pooling method, its input is the feature layer X1 output by conv1, the convolution kernel window size is 3×3, the sliding step is 2, and the padding attribute Padding is 1, which is used to output 64 Feature map of size j represents the jth feature map. The output of P1 is the input of M1 and R1 ; M1 contains two convolutional layers L2 and L3 , which are used to output 64 Feature map of size j represents the jth feature map. The convolution kernel window size ofL2 andL3 is 3×3, the sliding step is 1, the padding attribute Padding is 1, and the output ofM1 is another input ofR1 ;R1 contains an operation that adds the corresponding elements of the output feature layer ofP1 and the output feature layer ofM1 to output 64 Feature map of size j represents the jth feature map. The output of R1 is the input of M2 and R2 ; M2 also contains two convolutional layers L4 and L5 , which are used to output 64 Feature map of size j represents the jth feature map. The convolution kernel window size ofL4 andL5 is 3×3, the sliding step is 1, the padding attribute is 1, and the output ofM2 is another input ofR2 ;R2 contains an operation that adds the corresponding elements of the output feature layer ofR1 and the output feature layer ofM2 to output 64 Feature map of size j represents the jth feature map. The output of R2 is the input of conv3_x;
conv3_x:包括两个卷积模块M3、M4和两个残差模块R3、R4。具体地,M3包含两个卷积层L6和L7,用于输出128个大小的特征图j表示第j个特征图。其中L6的卷积核窗口大小为3×3,滑动步长为2,填充属性Padding为1,L7的卷积核窗口大小为3×3,滑动步长为1,填充属性Padding为1,M3的输出为R3的一个输入;R3的另一个输入为R2输出的特征层X6,R3首先将X6通过一个卷积核窗口大小为1×1,滑动步长为2,填充属性Padding为0的卷积层L8,用于输出128个大小的特征图j表示第j个特征图,然后将L8输出的特征层X8和M3的输出特征层X7进行对应元素的加和操作,用于输出128个大小的特征图R3的输出为R4和M4的输入;M4包含两个卷积层L9和L10,用于输出128个大小的特征图j表示第j个特征图。其中L9和L10的卷积核窗口大小为3×3,滑动步长为1,填充属性Padding为1,M4的输出为R4的另一个输入;R4包含一个将R3的输出特征层和M4的输出特征层的对应元素进行加和的操作,用于输出128个大小的特征图j表示第j个特征图。R4的输出为conv4_x的输入,另外R4的输出即为A、B、C中特征提取网络输出的特征层(i=A,B,C),同时它也是与之对应的FPN多尺度特征融合模块的一个输入;conv3_x: includes two convolution modules M3 , M4 and two residual modules R3 , R4 . Specifically, M3 contains two convolution layers L6 and L7 to output 128 Feature map of size j represents the jth feature map. The convolution kernel window size of L6 is 3×3, the sliding step size is 2, and the padding attribute is 1. The convolution kernel window size of L7 is 3×3, the sliding step size is 1, and the padding attribute is 1. The output of M3 is an input of R3 ; the other input of R3 is the feature layer X6 output by R2. R3 first passes X6 through a convolution layer L8 with a convolution kernel window size of 1×1, a sliding step size of 2, and a padding attribute of 0, which is used to output 128 Feature map of size j represents the jth feature map, and then the feature layerX8 output byL8 and the feature layerX7 output byM3 are summed up to output 128 Feature map of size The output of R3 is the input of R4 and M4 ; M4 contains two convolutional layers L9 and L10 , which are used to output 128 Feature map of size j represents the jth feature map. The convolution kernel window size ofL9 andL10 is 3×3, the sliding step is 1, the padding attribute is 1, and the output ofM4 is another input ofR4 ;R4 contains an operation that adds the corresponding elements of the output feature layer ofR3 and the output feature layer ofM4 to output 128 Feature map of size j represents the jth feature map. The output of R4 is the input of conv4_x. In addition, the output of R4 is the output of the feature extraction network in A, B, and C. Feature layer (i = A, B, C), which is also an input of the corresponding FPN multi-scale feature fusion module;
conv4_x:包括两个卷积模块M5、M6和两个残差模块R5、R6。具体地,M5包括两个卷积层L11和L12,用于输出256个大小的特征图j表示第j个特征图。其中L11的卷积核窗口大小为3×3,滑动步长为2,填充属性Padding为1,L12的卷积核窗口大小为3×3,滑动步长为1,填充属性Padding为1,M5的输出为R5的一个输入;R5的另一个输入为R4输出的特征层X11,R5首先将X11通过一个卷积核窗口大小为1×1,滑动步长为2,填充属性Padding为0的卷积层L13,用于输出256个大小的特征图j表示第j个特征图,然后将L13输出的特征层X13和M5的输出特征层X12进行对应元素的加和操作,用于输出256个大小的特征图j表示第j个特征图。R5的输出为R6和M6的输入;M6包含两个卷积层L14和L15,用于输出256个大小的特征图j表示第j个特征图。其中L14和L15的卷积核窗口大小为3×3,滑动步长为1,填充属性Padding为1的卷积操作,M6的输出为R6的另一个输入;R6包含一个将R5的输出特征层和M6的输出特征层的对应元素进行加和的操作,用于输出256个大小的特征图j表示第j个特征图。R6的输出为conv5_x的输入,另外R6的输出即为A、B、C中特征提取网络输出的特征层(i=A,B,C),同时它也是与之对应的FPN多尺度特征融合模块的一个输入;conv4_x: includes two convolution modules M5 , M6 and two residual modules R5 , R6 . Specifically, M5 includes two convolution layers L11 and L12 for outputting 256 Feature map of size j represents the jth feature map. The convolution kernel window size ofL11 is 3×3, the sliding step size is 2, and the padding attribute is 1. The convolution kernel window size ofL12 is 3×3, the sliding step size is 1, and the padding attribute is 1. The output ofM5 is an input ofR5 ; the other input ofR5 is the feature layerX11 output byR4.R5 first passesX11 through a convolution layerL13 with a convolution kernel window size of 1×1, a sliding step size of 2, and a padding attribute of 0, which is used to output 256 Feature map of size j represents the jth feature map, and then the feature layerX13 output byL13 and the feature layerX12 output byM5 are summed up to output 256 Feature map of size j represents the jth feature map. The output of R5 is the input of R6 and M6 ; M6 contains two convolutional layers L14 and L15 , which are used to output 256 Feature map of size j represents the jth feature map. The convolution kernel window size ofL14 andL15 is 3×3, the sliding step size is 1, and the padding attribute is 1. The output ofM6 is another input ofR6 ;R6 contains an operation that adds the corresponding elements of the output feature layer ofR5 and the output feature layer ofM6 to output 256 Feature map of size j represents the jth feature map. The output of R6 is the input of conv5_x. In addition, the output of R6 is the output of the feature extraction network in A, B, and C. Feature layer (i = A, B, C), which is also an input of the corresponding FPN multi-scale feature fusion module;
conv5_x:包括两个卷积模块M7、M8和两个残差模块R7和R8。M7包括两个卷积层L16和L17,用于输出512个大小的特征图j表示第j个特征图。其中L16的卷积核窗口大小为3×3,滑动步长为2,填充属性Padding为1,L17的卷积核窗口大小为3×3,滑动步长为1,填充属性Padding为1,M7的输出为R7的输入;R7的另一个输入为R6输出特征层X16,R7首先将X16通过一个卷积核窗口大小为1×1,滑动步长为2,填充属性Padding为0的卷积层L18,用于输出512个大小的特征图j表示第j个特征图,然后将L18输出的特征层X18和M7输出的特征层X17进行对应元素的加和操作,用于输出512个大小的特征图R7的输出为R8和M8的输入;M8包含两个卷积层L19和L20,用于输出512个大小的特征图j表示第j个特征图。其中L19和L20的卷积核窗口大小为3×3,滑动步长为1,填充属性Padding为1,M8的输出为R8的另一个输入;R8包含一个将R7的输出特征层和M8的输出特征层的对应元素进行加和的操作,用于输出512个大小的特征图j表示第j个特征图。R8的输出即为A、B、C中特征提取网络输出的特征层(i=A,B,C),同时它也是与之对应的FPN多尺度特征融合模块的一个输入。conv5_x: includes two convolution modulesM7 ,M8 and two residual modulesR7 andR8 .M7 includes two convolution layersL16 andL17 to output 512 Feature map of size j represents the jth feature map. The convolution kernel window size of L16 is 3×3, the sliding step size is 2, and the padding attribute is 1. The convolution kernel window size of L17 is 3×3, the sliding step size is 1, and the padding attribute is 1. The output of M7 is the input of R7 ; another input of R7 is the output feature layer X16 of R6. R7 first passes X16 through a convolution kernel window size of 1×1, a sliding step size of 2, and a padding attribute of 0. The convolution layer L18 is used to output 512 Feature map of size j represents the jth feature map, and then the feature layerX18 output byL18 and the feature layerX17 output byM7 are summed up to output 512 Feature map of size The output ofR7 is the input ofR8 andM8 ;M8 contains two convolutional layersL19 andL20 , which are used to output 512 Feature map of size j represents the jth feature map. The convolution kernel window size ofL19 andL20 is 3×3, the sliding step is 1, the padding attribute is 1, and the output ofM8 is another input ofR8 ;R8 contains an operation that adds the corresponding elements of the output feature layer ofR7 and the output feature layer ofM8 to output 512 Feature map of size j represents the jth feature map. The output of R8 is the output of the feature extraction network in A, B, and C. Feature layer (i = A, B, C), and it is also an input to the corresponding FPN multi-scale feature fusion module.
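The backbone described above is a standard ResNet-18. A minimal PyTorch sketch (using torchvision, assumed available) that exposes the conv3_x, conv4_x and conv5_x outputs fed to the FPN is shown below; instantiating three copies gives the structurally identical but parameter-unshared branches A, B and C.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ResNet18Backbone(nn.Module):
    """Standard ResNet-18 trunk returning the conv3_x, conv4_x and conv5_x outputs
    (128, 256 and 512 channels at 1/8, 1/16 and 1/32 resolution)."""
    def __init__(self):
        super().__init__()
        net = resnet18(weights=None)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)  # conv1 + P1
        self.conv2_x, self.conv3_x = net.layer1, net.layer2
        self.conv4_x, self.conv5_x = net.layer3, net.layer4

    def forward(self, x):
        x = self.conv2_x(self.stem(x))
        c3 = self.conv3_x(x)
        c4 = self.conv4_x(c3)
        c5 = self.conv5_x(c4)
        return c3, c4, c5

# Three structurally identical backbones with unshared parameters:
# amplitude branch A, gradient branch B and CFAR branch C.
backbone_a, backbone_b, backbone_c = (ResNet18Backbone() for _ in range(3))
```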
进一步的,三个特征提取子网络A、B、C使用FPN模块进行多尺度特征融合。FPN多尺度特征融合模块的实现方式、参数设置及对应关系如下:Furthermore, the three feature extraction sub-networks A, B, and C use the FPN module for multi-scale feature fusion. The implementation method, parameter setting, and corresponding relationship of the FPN multi-scale feature fusion module are as follows:
A、B、C中FPN多尺度特征融合模块的一个输入为与之对应的特征提取网络的输出特征层和以特征提取子网络A为例,其对应的FPN模块的输入为和One input of the FPN multi-scale feature fusion module in A, B, and C is the output feature layer of the corresponding feature extraction network and Taking feature extraction subnetwork A as an example, the input of its corresponding FPN module is and
使用的FPN多尺度特征融合模块主要包括三个部分:FPN1模块、FPN2模块和FPN3模块:The FPN multi-scale feature fusion module used mainly consists of three parts: FPN1 module, FPN2 module and FPN3 module:
FPN1模块:FPN1模块的输入为特征层,其特征层大小为该模块的具体实现方式如下:首先将进行一次卷积核的窗口大小为1×1,滑动步长为1,填充属性Padding为0的卷积操作,得到256个大小的特征图然后将得到的特征层X22进行一次上采样操作,得到256个大小的特征图即为FPN1模块的输出。另外FPN1模块的输出为FPN2模块的一个输入;FPN1 module: The input of FPN1 module is The feature layer has a feature size of The specific implementation of this module is as follows: First, Perform a convolution operation with a convolution kernel window size of 1×1, a sliding step size of 1, and a padding attribute of 0, and get 256 Feature map of size Then the obtained feature layer X22 is upsampled once to obtain 256 Feature map of size This is the output of the FPN1 module. In addition, the output of the FPN1 module is an input to the FPN2 module;
FPN2模块:FPN2模块的输入为特征层和FPN1模块的输出特征层X23,和X23的特征层大小均为该模块的具体实现方式如下:首先将进行一次卷积核的窗口大小为1×1,滑动步长为1,填充属性Padding为0的卷积操作,得到256个大小的特征图然后将得到的特征层X24和X23直接相加融合,得到256个大小的特征图最后将得到的特征层X25进行一次上采样操作,得到256个大小的特征图即为FPN2模块的输出。另外FPN2模块的输出为FPN3模块的一个输入;FPN2 module: The input of FPN2 module is Feature layer and the output feature layer of FPN1 module X23 , The feature layer sizes ofX23 are The specific implementation of this module is as follows: First, Perform a convolution operation with a convolution kernel window size of 1×1, a sliding step size of 1, and a padding attribute of 0, and get 256 Feature map of size Then the obtained feature layersX24 andX23 are directly added and fused to obtain 256 Feature map of size Finally, the obtained feature layer X25 is upsampled once to obtain 256 Feature map of size This is the output of the FPN2 module. In addition, the output of the FPN2 module is an input to the FPN3 module;
FPN3模块:FPN3模块的输入特征层为特征层和FPN2模块的输出特征层X26,和X26的大小分别为和该模块的具体实现方式如下:首先将进行一次卷积核的窗口大小为1×1,滑动步长为1,填充属性Padding为0的卷积操作,得到256个大小的特征图然后将得到的特征层X27和X26直接相加融合,得到256个大小的特征图最后将得到的特征层X28再进行一次卷积核的窗口大小为3×3,滑动步长为1,填充属性Padding为1的卷积操作,得到256个大小的特征图即为FPN3模块的输出。另外FPN3模块的输出即为整个FPN模块的输出特征层FPN3 module: The input feature layer of the FPN3 module is Feature layer and the output feature layer of the FPN2 module X26 , and X26 are and The specific implementation of this module is as follows: First, Perform a convolution operation with a convolution kernel window size of 1×1, a sliding step size of 1, and a padding attribute of 0, and get 256 Feature map of size Then the obtained feature layersX27 andX26 are directly added and fused to obtain 256 Feature map of size Finally, the obtained feature layerX28 is subjected to another convolution operation with a convolution kernel window size of 3×3, a sliding step size of 1, and a padding attribute of 1, and 256 Feature map of size This is the output of the FPN3 module. In addition, the output of the FPN3 module is the output feature layer of the entire FPN module.
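A compact sketch of this three-stage FPN is given below; the layer names are illustrative, and the nearest-neighbour upsampling mode is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Top-down FPN as described above: 1x1 lateral convolutions bring the three backbone
    levels to 256 channels, deeper levels are upsampled and added to shallower ones,
    and a final 3x3 convolution produces the single fused output."""
    def __init__(self, in_channels=(128, 256, 512), out_channels=256):
        super().__init__()
        self.lateral3 = nn.Conv2d(in_channels[0], out_channels, kernel_size=1)
        self.lateral4 = nn.Conv2d(in_channels[1], out_channels, kernel_size=1)
        self.lateral5 = nn.Conv2d(in_channels[2], out_channels, kernel_size=1)
        self.smooth = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, c3, c4, c5):
        p5 = self.lateral5(c5)                                           # FPN1: 1x1 lateral conv
        p4 = self.lateral4(c4) + F.interpolate(p5, size=c4.shape[-2:])   # FPN1 upsample + FPN2 add
        p3 = self.lateral3(c3) + F.interpolate(p4, size=c3.shape[-2:])   # FPN2 upsample + FPN3 add
        return self.smooth(p3)                                           # FPN3: final 3x3 conv

fpn = SimpleFPN()
c3, c4, c5 = torch.randn(1, 128, 100, 167), torch.randn(1, 256, 50, 84), torch.randn(1, 512, 25, 42)
fused = fpn(c3, c4, c5)   # (1, 256, 100, 167)
```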
(二)特征处理模块(II) Feature processing module
如图3所示,在本实施例中,特征处理模块包括一个基于交互式注意力机制的T-ICSAF特征融合模块和一个基于GT二值标签监督的空间注意力机制和通道注意力机制结合的CSSCAM模块;其中,As shown in FIG3 , in this embodiment, the feature processing module includes a T-ICSAF feature fusion module based on an interactive attention mechanism and a CSSCAM module combining a spatial attention mechanism and a channel attention mechanism based on GT binary label supervision; wherein,
T-ICSAF特征融合模块用于对特征提取模块的输出特征层和进行融合,得到融合特征F3;The T-ICSAF feature fusion module is used to fusion the output feature layer of the feature extraction module. and Perform fusion to obtain fusion feature F3 ;
CSSCAM模块用于对融合特征F3进行处理,得到目标特征被增强、背景杂波特征被抑制的特征层A3,以作为特征处理模块的输出。The CSSCAM module is used to process the fused feature F3 to obtain a feature layer A3 in which the target feature is enhanced and the background clutter feature is suppressed, as the output of the feature processing module.
请参见图4,图4是本发明实施例提供的T-ICSAF特征融合模块的网络框架图,其中,T-ICSAF特征融合模块主要包括四个子模块,分别为特征预处理子模块、交互式通道注意子模块、交互式空间注意子模块和注意力融合子模块。Please refer to Figure 4, which is a network framework diagram of the T-ICSAF feature fusion module provided by an embodiment of the present invention, wherein the T-ICSAF feature fusion module mainly includes four sub-modules, namely, a feature preprocessing sub-module, an interactive channel attention sub-module, an interactive spatial attention sub-module and an attention fusion sub-module.
在本实施例中,特征预处理子模块用于将特征提取部分的输出特征层和分别进行卷积操作和BN批标准化操作,对应得到三个特征层X30、X31和X32。In this embodiment, the feature preprocessing submodule is used to convert the output feature layer of the feature extraction part into and Convolution operations and BN batch normalization operations are performed respectively, and three feature layers X30 , X31 and X32 are obtained accordingly.
具体的,特征预处理子模块主要采用如下方式实现:Specifically, the feature preprocessing submodule is mainly implemented in the following ways:
将特征提取部分的输出特征层和分别进行一次卷积核的窗口大小为3×3,滑动步长为1,填充属性Padding为1的卷积操作和一次BN批标准化操作,得到3个大小的特征层X30、X31和X32,即为特征预处理子模块的输出。另外特征预处理子模块输出的X30、X31和X32特征层为交互式通道注意子模块和交互式空间注意子模块的输入,特征预处理子模块输出的X30特征层为注意力融合子模块的一个输入。The output feature layer of the feature extraction part and Perform a convolution operation with a convolution kernel window size of 3×3, a sliding step size of 1, a padding attribute of 1, and a BN batch normalization operation to obtain 3 The feature layers X30 , X31 and X32 of different sizes are the outputs of the feature preprocessing submodule. In addition, the feature layers X30 , X31 and X32 output by the feature preprocessing submodule are the inputs of the interactive channel attention submodule and the interactive spatial attention submodule, and the feature layer X30 output by the feature preprocessing submodule is an input of the attention fusion submodule.
进一步的,交互式通道注意子模块用于对特征层X30、X31和X32依次进行全局平均池化、加和、以及Sigmoid归一化操作,得到通道注意权重Fc。Furthermore, the interactive channel attention submodule is used to perform global average pooling, summing, and sigmoid normalization operations on the feature layers X30 , X31 , and X32 in sequence to obtain the channel attention weight Fc .
在本实施例中,交互式通道注意子模块主要采用如下方式实现:In this embodiment, the interactive channel attention submodule is mainly implemented in the following manner:
首先将特征预处理子模块输出的X30、X31和X32特征层分别进行一次空间维度的全局平均池化操作,得到3个1×1×256大小的特征向量X33、X34和X35;然后将得到的X33、X34和X35进行逐元素的加和操作并进行一次Sigmoid归一化操作,得到一个1×1×256大小的通道注意权重Fc,即为交互式通道注意子模块的输出。另外交互式通道注意子模块的输出为注意力融合子模块的一个输入。First, the feature layersX30 ,X31 andX32 output by the feature preprocessing submodule are subjected to a global average pooling operation in the spatial dimension to obtain three feature vectorsX33 ,X34 andX35 of size 1×1×256; then the obtainedX33 ,X34 andX35 are added element by element and subjected to a Sigmoid normalization operation to obtain a channel attention weightFc of size 1×1×256, which is the output of the interactive channel attention submodule. In addition, the output of the interactive channel attention submodule is an input of the attention fusion submodule.
进一步的,交互式空间注意子模块用于对特征层X30、X31和X32依次进行通道维度上的全局平均池化、堆叠(Concat)、卷积以及Sigmoid归一化操作,得到空间注意权重Fs。Furthermore, the interactive spatial attention submodule is used to perform global average pooling, stacking (Concat), convolution and Sigmoid normalization operations on the feature layers X30 , X31 and X32 in the channel dimension in sequence to obtain the spatial attention weight Fs .
在本实施例中,交互式空间注意子模块主要采用如下方式实现:In this embodiment, the interactive spatial attention submodule is mainly implemented in the following manner:
首先将特征预处理子模块输出的X30、X31和X32特征层分别进行一次通道维度上的全局平均池化操作,得到3个大小的特征图X36、X37和X38;然后将得到的X36、X37和X38进行一次Concat操作,得到1个大小的特征图X39;再将得到的X39进行一次卷积核的窗口大小为3×3,滑动步长为1,填充属性Padding为1的卷积操作并进行一次Sigmoid归一化操作,得到一个大小的空间注意权重Fs,即为交互式空间注意子模块的输出。另外交互式空间注意子模块的输出为注意力融合子模块的一个输入。First, theX30 ,X31 , andX32 feature layers output by the feature preprocessing submodule are subjected to a global average pooling operation in the channel dimension to obtain three The feature maps ofsize X36 , X37 and X38 are then concat-operated to obtain1 The size of the feature map X39 is obtained; then the obtained X39 is convolved with a kernel window size of 3×3, a sliding step size of 1, a padding attribute of 1, and a Sigmoid normalization operation, and a The spatial attention weightFs of the size is the output of the interactive spatial attention submodule. In addition, the output of the interactive spatial attention submodule is an input of the attention fusion submodule.
此外,注意力融合子模块用于将通道注意权重Fc与特征预处理子模块输出的特征层X30的每个通道对应相乘,得到特征层F'3;然后将空间注意权重Fs与特征层F'3每个通道的像素逐一对应相乘,得到融合特征层F3,即为T-ICSAF特征融合模块的输出。In addition, the attention fusion submodule is used to multiply the channel attention weightFc with each channel of the feature layerX30 output by the feature preprocessing submodule to obtain the feature layerF'3 ; then the spatial attention weightFs is multiplied with the pixels of each channel of the feature layerF'3 one by one to obtain the fused feature layerF3 , which is the output of the T-ICSAF feature fusion module.
在本实施例中,注意力融合子模块主要采用如下方式实现:In this embodiment, the attention fusion submodule is mainly implemented in the following manner:
首先将交互式通道注意子模块输出的通道注意权重Fc与特征预处理子模块输出的X30的每个通道对应相乘,得到一个大小的特征层F'3;然后将交互式空间注意子模块输出的空间注意权重Fs与特征层F'3每个通道的像素逐一对应相乘,最终得到一个大小的融合特征层F3,即为T-ICSAF特征融合模块的输出。另外T-ICSAF特征融合模块的输出为特征处理模块CSSCAM模块的输入。First, the channel attention weightFc output by the interactive channel attention submodule is multiplied by each channel ofX30 output by the feature preprocessing submodule to obtain a Then the spatial attention weightFs output by the interactive spatial attention submodule is multiplied one by one by the pixels of each channel ofthe feature layerF'3 , and finally a The fusion feature layer F3 of the size is the output of the T-ICSAF feature fusion module. In addition, the output of the T-ICSAF feature fusion module is the input of the feature processing module CSSCAM module.
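The following PyTorch sketch condenses the four T-ICSAF submodules described above into a single module; the tensor layout is (batch, channel, height, width), and the module and variable names are illustrative.

```python
import torch
import torch.nn as nn

class TICSAF(nn.Module):
    """Triple-branch interactive channel-spatial attention fusion: each input is
    pre-processed by a 3x3 conv + BN; the three branches jointly yield a channel
    weight Fc (global average pooling -> sum -> sigmoid) and a spatial weight Fs
    (channel-wise mean -> concat -> 3x3 conv -> sigmoid), and both weights are
    applied to the pre-processed amplitude branch X30."""
    def __init__(self, channels=256):
        super().__init__()
        self.pre = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                          nn.BatchNorm2d(channels))
            for _ in range(3)])
        self.spatial_conv = nn.Conv2d(3, 1, kernel_size=3, padding=1)

    def forward(self, f_amp, f_grad, f_cfar):
        x = [pre(f) for pre, f in zip(self.pre, (f_amp, f_grad, f_cfar))]   # X30, X31, X32
        # interactive channel attention: a 1 x 1 x C weight shared over all positions
        fc = torch.sigmoid(sum(xi.mean(dim=(2, 3), keepdim=True) for xi in x))
        # interactive spatial attention: an H x W x 1 weight shared over all channels
        fs = torch.sigmoid(self.spatial_conv(
            torch.cat([xi.mean(dim=1, keepdim=True) for xi in x], dim=1)))
        return x[0] * fc * fs                                               # fused feature F3

fusion = TICSAF()
f3 = fusion(*(torch.randn(1, 256, 100, 167) for _ in range(3)))
```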
进一步的,请参见图5,图5是本发明实施例提供的CSSCAM模块的网络框架图,其中,CSSCAM模块包括三个子模块,分别为GT二值标签监督的空间注意力机制子模块、SE通道注意力机制子模块和注意力融合子模块;其中,Further, please refer to FIG. 5 , which is a network framework diagram of the CSSCAM module provided by an embodiment of the present invention, wherein the CSSCAM module includes three submodules, namely, a spatial attention mechanism submodule for GT binary label supervision, a SE channel attention mechanism submodule, and an attention fusion submodule; wherein,
GT二值标签监督的空间注意力机制子模块用于对T-ICSAF特征融合模块输出的融合特征层F3依次进行通道维度上的全局平均池化操作和通道维度上的全局最大池化操作、Concat操作、卷积以及Sigmoid归一化操作,得到空间注意权重As。The spatial attention mechanism submodule of GT binary label supervision is used to perform global average pooling operation on the channel dimension, global maximum pooling operation on the channel dimension, Concat operation, convolution and Sigmoid normalization operation on the fusion feature layerF3 output by the T-ICSAF feature fusion module to obtain the spatial attention weightAs .
具体的,GT二值标签监督的空间注意力机制子模块首先将T-ICSAF特征融合模块输出的融合特征F3分别进行一次通道维度上的全局平均池化操作和一次通道维度上的全局最大池化操作,得到2个大小的特征图X40、X41;然后将得到的X40、X41进行一次Concat操作,得到1个大小的特征图X42;再将得到的X42进行一次卷积核的窗口大小为7×7,滑动步长为1,填充属性Padding为3的卷积操作并进行一次Sigmoid归一化操作,得到一个大小的空间注意权重As,即为GT二值标签监督的空间注意力机制子模块的输出。Specifically, the spatial attention mechanism submodule of the GT binary label supervision first performs a global average pooling operation on the channel dimension and a global maximum pooling operation on the channel dimension on the fusion featureF3 output by the T-ICSAF feature fusion module, and obtains two The feature maps ofsize X40 and X41 are then concat- operated to obtain 1 The feature map of sizeX42 is obtained; then the obtainedX42 is convolved with a convolution kernel window size of 7×7, a sliding step size of 1, a padding attribute of 3, and a Sigmoid normalization operation, and a The spatial attention weight Asof size is the output of the spatial attention mechanism submodule of GT binary label supervision.
SE通道注意力机制子模块用于对T-ICSAF特征融合模块输出的融合特征层F3依次进行空间维度的全局平均池化操作和全局最大池化操作、特征向量压缩、映射和解压缩的操作、加和、以及Sigmoid归一化操作,得到通道注意权重Ac。The SE channel attention mechanism submodule is used to perform global average pooling and global maximum pooling operations in the spatial dimension, feature vector compression, mapping and decompression operations, addition, and Sigmoid normalization operations on the fused feature layerF3 output by the T-ICSAF feature fusion module to obtain the channel attention weight Ac .
具体的,SE通道注意力机制模块首先将T-ICSAF特征融合模块输出的融合特征F3分别进行一次空间维度的全局平均池化操作和一次空间维度的全局最大池化操作,得到2个1×1×256大小的特征向量X43、X44;然后将得到的X43、X44分别通过一个有16个神经元的全连接层L21、一个relu激活函数层和一个有256个神经元的全连接层L22进行对特征向量压缩、映射和解压缩的操作,得到通道注意后的2个1×1×256大小的特征向量X45、X46,其中全连接层L21将输入的特征向量压缩成1×1×16大小的特征向量,全连接层L22将输入的特征向量解压缩回1×1×256大小的特征向量;最后将得到的X45、X46进行逐元素的加和操作并进行一次Sigmoid归一化操作,得到一个1×1×256大小的通道注意权重Ac,即为SE通道注意力机制子模块的输出。Specifically, the SE channel attention mechanism module first performs a global average pooling operation in the spatial dimension and a global maximum pooling operation in the spatial dimension on the fusion featureF3 output by the T-ICSAF feature fusion module to obtain two 1×1×256 feature vectorsX43 andX44 ; then the obtainedX43 andX44 are respectively compressed, mapped and decompressed through a fully connected layerL21 with 16 neurons, a relu activation function layer and a fully connected layerL22 with 256 neurons to obtain two 1×1×256 feature vectorsX45 andX46 after channel attention, where the fully connected layerL21 compresses the input feature vector into a feature vector of 1×1×16, and the fully connected layerL22 decompresses the input feature vector back to a feature vector of 1×1×256; finally, the obtainedX45 and X46 are46 performs an element-by-element addition operation and a Sigmoid normalization operation to obtain a channel attention weight Ac of size 1×1×256, which is the output of the SE channel attention mechanism submodule.
注意力融合子模块用于将空间注意权重As与T-ICSAF特征融合模块输出的融合特征层F3的每个通道的像素逐一对应相乘,得到特征层A'3,然后将通道注意权重Ac与特征层A'3的每个通道对应相乘,得到一个目标特征被增强、背景杂波特征被抑制的特征层A3,即为CSSCAM模块的输出。The attention fusion submodule is used to multiply the spatial attention weightAs with the pixels of each channel of the fused feature layerF3 output by the T-ICSAF feature fusion module one by one to obtain the feature layerA'3 , and then multiply the channel attention weightAc with each channel of the feature layerA'3 to obtain a feature layerA3 in which the target features are enhanced and the background clutter features are suppressed, which is the output of the CSSCAM module.
具体的,首先将GT二值标签监督的空间注意力机制子模块输出的空间注意权重As与T-ICSAF特征融合模块输出的融合特征F3的每个通道的像素逐一对应相乘,得到一个大小的特征层A'3;然后再将SE通道注意力机制子模块输出的通道注意权重Ac与特征层A3'的每个通道对应相乘,得到一个目标特征被增强、背景杂波特征被抑制的大小的特征层A3,即为CSSCAM模块的输出。另外CSSCAM模块的输出为网络预测部分的输入。Specifically, first, the spatial attention weight As output by the spatial attention mechanism submodule of the GT binary label supervision is multiplied one by one by thepixels of each channel of the fusion featureF3 output by the T-ICSAF feature fusion module to obtain a Then, the channel attentionweight Acoutput by the SE channel attention mechanism submodule is multiplied by each channel of the feature layerA3 ' to obtain a target feature that is enhanced and background clutter features that are suppressed. The feature layer A3 of size is the output of the CSSCAM module. In addition, the output of the CSSCAM module is the input of the network prediction part.
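A corresponding sketch of the CSSCAM module follows; whether the two pooled vectors pass through shared or separate fully connected layers is not stated in the text, so the shared-weight squeeze used here is an assumption.

```python
import torch
import torch.nn as nn

class CSSCAM(nn.Module):
    """Combined supervised-spatial and SE-channel attention: the spatial weight As is
    produced from channel-wise average/max pooling, a 7x7 convolution and a sigmoid
    (and is supervised by the GT binary label during training); the channel weight Ac
    is an SE-style squeeze (256 -> 16 -> 256) of the spatially average/max pooled
    vectors. Both weights are applied to the fused feature F3."""
    def __init__(self, channels=256, reduction=16):
        super().__init__()
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        self.fc = nn.Sequential(nn.Linear(channels, channels // reduction),
                                nn.ReLU(inplace=True),
                                nn.Linear(channels // reduction, channels))

    def forward(self, f3):
        pooled = torch.cat([f3.mean(dim=1, keepdim=True),
                            f3.amax(dim=1, keepdim=True)], dim=1)       # X40, X41 -> X42
        a_s = torch.sigmoid(self.spatial_conv(pooled))                  # spatial weight As
        squeeze = self.fc(f3.mean(dim=(2, 3))) + self.fc(f3.amax(dim=(2, 3)))
        a_c = torch.sigmoid(squeeze).unsqueeze(-1).unsqueeze(-1)        # channel weight Ac
        a3 = f3 * a_s * a_c                                             # enhanced feature A3
        return a3, a_s   # As is returned so it can be supervised by the GT binary label

cs = CSSCAM()
a3, a_s = cs(torch.randn(1, 256, 100, 167))
```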
此外,需要说明的是,本发明构建的目标检测网络中,还包括一监督标签生成模块,用于生成GT二值标签。In addition, it should be noted that the target detection network constructed by the present invention also includes a supervised label generation module for generating GT binary labels.
在本实施例中,GT二值标签的构造方法如下:In this embodiment, the construction method of the GT binary label is as follows:
a)对训练图像进行真实标注,将目标像素标注为1,背景像素标注为0,得到与训练图像相对应的标注图像;a) Perform true annotation on the training image, annotate the target pixel as 1 and the background pixel as 0, and obtain the annotated image corresponding to the training image;
b)以45为步长,800×1333为大小得到3536张与原始训练集ψ相对应的二值标注切片GT;b) With a step size of 45 and a size of 800×1333, 3536 binary labeled slices GT corresponding to the original training set ψ are obtained;
c)将b)中得到3536张二值标注切片GT下采样三次,得到与送入CSSCAM模块和网络预测部分的特征层同样大小的3536张二值标注切片GT',并保存与二值标注切片GT'对应的.mat文件作为最终的GT二值标签。c) Downsample the 3536 binary labeled slices GT obtained in b) three times to obtain 3536 binary labeled slices GT' of the same size as the feature layer sent to the CSSCAM module and the network prediction part, and save the .mat file corresponding to the binary labeled slice GT' as the final GT binary label.
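As an illustration of step c), the sketch below downsamples one binary annotation slice to the feature-layer resolution and stores it as a .mat file; the target feature size (a 1/8-scale map) and the use of max pooling as the downsampling operator are assumptions, since the text only states that the mask is downsampled three times.

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.io import savemat

def make_gt_binary_label(mask_slice, feat_size=(100, 167)):
    """Downsample a binary annotation slice (target = 1, background = 0) to the spatial
    size of the feature layer fed to CSSCAM and the prediction heads; max pooling keeps
    small vehicle targets from vanishing."""
    m = torch.from_numpy(mask_slice.astype(np.float32))[None, None]   # (1, 1, 800, 1333)
    gt = F.adaptive_max_pool2d(m, output_size=feat_size)
    return gt[0, 0].numpy()

mask = np.zeros((800, 1333), dtype=np.uint8)
mask[300:320, 500:540] = 1                        # a hypothetical vehicle target region
gt_label = make_gt_binary_label(mask)
savemat("gt_0001.mat", {"GT": gt_label})          # saved as the GT binary label file
```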
(三)网络预测模块(III) Network prediction module
请继续参见图3,其中,网络预测模块包括用于目标检测任务的分类分支子模块、回归分支子模块和基于GT二值标签监督的Attention二分类分支子模块;Please continue to refer to Figure 3, where the network prediction module includes a classification branch submodule for target detection tasks, a regression branch submodule, and an Attention binary classification branch submodule based on GT binary label supervision;
分类分支子模块和回归分支子模块分别用于对特征处理模块的输出特征层A3进行预测,对应得到分类得分和边界框回归参数;The classification branch submodule and the regression branch submodule are used to predict the output feature layerA3 of the feature processing module, and obtain the classification score and bounding box regression parameters respectively;
Attention二分类分支子模块用于对用于预测分类得分的特征层CP和用于预测边界框回归参数的特征层RP进行处理,得到二分类得分。The Attention binary classification branch submodule is used to process the feature layer CP used to predict the classification score and the feature layer RP used to predict the bounding box regression parameters to obtain the binary classification score.
在本实施例中,分类分支子模块采用如下方式实现:In this embodiment, the classification branch submodule is implemented in the following manner:
首先对特征处理部分CSSCAM模块输出的特征层A3进行四次卷积操作,得到一个用于预测分类得分的特征层CP;然后对得到的特征层CP依次进行卷积和Sigmoid归一化操作,得到分类分支子模块预测的分类得分图X47。First, the feature layer A3 output by the feature processing part CSSCAM module is convolved four times to obtain a feature layer CP for predicting classification scores; then the obtained feature layer CP is convolved and Sigmoid normalized in turn to obtain the classification score map X47 predicted by the classification branch submodule.
具体的,先将特征处理部分CSSCAM模块输出的特征层A3进行四次卷积核的窗口大小为3×3,滑动步长为1,填充属性Padding为1的卷积操作,得到一个用于预测分类得分的大小的特征层CP;再将得到的特征层CP再进行一次卷积核的窗口大小为3×3,滑动步长为1,填充属性Padding为1的卷积操作并进行一次Sigmoid归一化操作,得到一个大小的分类分支子模块预测的分类得分图X47。Specifically, the feature layerA3 output by the CSSCAM module of the feature processing part is first subjected to four convolution operations with a window size of 3×3, a sliding step size of 1, and a padding attribute of 1, and a convolution operation is obtained for predicting the classification score. The feature layer CP of size is obtained; the obtained feature layer CP is subjected to a convolution operation with a convolution kernel window size of 3×3, a sliding step size of 1, a padding attribute of 1, and a Sigmoid normalization operation, and a Classification score graph of size classification branch submodule prediction X47 .
在本实施例中,回归分支子模块采用如下方式实现:In this embodiment, the regression branch submodule is implemented in the following manner:
首先对特征处理部分CSSCAM模块输出的特征层A3进行四次卷积操作,得到一个用于预测边界框回归参数的特征层RP;然后对得到的特征层RP进行一次卷积操作,得到回归分支子模块预测的边界框回归参数X48。First, the feature layerA3 output by the feature processing part CSSCAM module is convolved four times to obtain a feature layerRp for predicting bounding box regression parameters; then the obtained feature layerRp is convolved once to obtain the bounding box regression parametersX48 predicted by the regression branch submodule.
具体的,先将特征处理部分CSSCAM模块输出的特征层A3进行四次卷积核的窗口大小为3×3,滑动步长为1,填充属性Padding为1的卷积操作,得到一个用于预测边界框回归参数的大小的特征层RP;再将得到的特征层RP再进行一次卷积核的窗口大小为3×3,滑动步长为1,填充属性Padding为1的卷积操作,得到一个大小的回归分支子模块预测的边界框回归参数X48。Specifically, the feature layerA3 output by the CSSCAM module of the feature processing part is first subjected to four convolution operations with a window size of 3×3, a sliding step size of 1, and a padding attribute of 1, and a convolution operation is obtained for predicting the bounding box regression parameters. The feature layer RP of size ; then the obtained feature layer RP is convolved again with a convolution kernel window size of 3×3, a sliding step size of 1, and a padding attribute of 1, and a convolution operation is obtained. The bounding box regression parameters X48 predicted by the regression branch submodule of size.
在本实施例中,Attention二分类分支子模块采用如下方式实现:In this embodiment, the Attention binary classification branch submodule is implemented in the following manner:
首先将分类分支子模块中用于预测分类得分的特征层CP和回归分支子模块中用于预测边界框回归参数的特征层RP进行一次Concat操作,并对得到的特征图X49进行卷积操作,得到特征层AP;最后对特征层AP进行卷积操作和Sigmoid归一化操作,得到Attention二分类分支子模块预测的二分类得分图X50。First, a concat operation is performed on the feature layer CP used to predict the classification score in the classification branch submodule and the feature layer RP used to predict the bounding box regression parameters in the regression branch submodule, and a convolution operation is performed on the obtained feature map X49 to obtain the feature layer AP ; finally, a convolution operation and a Sigmoid normalization operation are performed on the feature layer AP to obtain the binary classification score map X50 predicted by the Attention binary classification branch submodule.
具体的,先将分类分支子模块中用于预测分类得分的特征层CP和回归分支子模块中用于预测边界框回归参数的特征层RP进行一次Concat操作,得到1个大小的特征图X49;再将得到的X49进行一次卷积核的窗口大小为3×3,滑动步长为1,填充属性Padding为1的卷积操作,得到一个大小的特征层AP,最后将得到的特征层AP进行一次卷积核窗口大小为3×3,滑动步长为1填充属性Padding为1的卷积操作并进行一次Sigmoid归一化操作,得到一个大小的Attention二分类分支子模块预测的二分类得分图X50。Specifically, the feature layer CP used to predict the classification score in the classification branch submodule and the feature layer RP used to predict the bounding box regression parameters in the regression branch submodule are concat- ed to obtain 1 The feature map of size X49 is obtained; then the obtained X49 is convolved with a convolution kernel window size of 3×3, a sliding step size of 1, and a padding attribute of 1 to obtain a Finally, the obtained feature layerAP is subjected to a convolutionoperation with a convolution kernel window size of 3×3, a sliding step size of 1, and a padding attribute of 1, and a Sigmoid normalization operation, and a Binary classification score map predicted by the Attention binary classification branch submodule of size X50 .
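The three prediction branches can be sketched as follows; the activations between the tower convolutions and the single foreground class are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

def conv_tower(channels=256, depth=4):
    """Four 3x3 convolutions, as used by both the classification and regression branches."""
    layers = []
    for _ in range(depth):
        layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class PredictionHeads(nn.Module):
    """Classification scores, bounding-box regression parameters and the GT-supervised
    Attention binary branch, which consumes the concatenated tower features CP and RP."""
    def __init__(self, channels=256, num_classes=1):
        super().__init__()
        self.cls_tower, self.reg_tower = conv_tower(channels), conv_tower(channels)
        self.cls_out = nn.Conv2d(channels, num_classes, 3, padding=1)
        self.reg_out = nn.Conv2d(channels, 4, 3, padding=1)
        self.att_conv = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.att_out = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, a3):
        cp, rp = self.cls_tower(a3), self.reg_tower(a3)
        cls_score = torch.sigmoid(self.cls_out(cp))        # classification score map X47
        bbox_reg = self.reg_out(rp)                        # regression parameters X48
        ap = self.att_conv(torch.cat([cp, rp], dim=1))     # Concat -> AP
        att_score = torch.sigmoid(self.att_out(ap))        # binary score map X50
        return cls_score, bbox_reg, att_score

heads = PredictionHeads()
cls_score, bbox_reg, att_score = heads(torch.randn(1, 256, 100, 167))
```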
本发明设计的目标检测网络在特征融合方面,采用了基于三支路交互式注意力机制的特征融合模块T-ICSAF;同时,提出了基于GT二值标签监督的注意力机制模块CSSCAM来有效地抑制背景杂波特征,解决了由于传统特征引入所带来的额外背景杂波特征造成虚警框增加的问题,进一步增强目标特征,让特征提取后的目标特征更具有目标性,从而获得更好的SAR目标检测性能。In terms of feature fusion, the target detection network designed by the present invention adopts a feature fusion module T-ICSAF based on a three-branch interactive attention mechanism. At the same time, an attention mechanism module CSSCAM based on GT binary label supervision is proposed to effectively suppress background clutter features, solve the problem of increased false alarm frames caused by additional background clutter features brought about by the introduction of traditional features, further enhance target features, and make the target features after feature extraction more targeted, thereby obtaining better SAR target detection performance.
此外,本发明根据SAR图像本身具有几何畸变、辐射畸变、遮挡阴影等特点,设计了一个基于GT二值标签监督的Attention二分类分支取代了原始无锚框目标检测网络FCOS中的Centerness分支,让其更加适用于SAR车辆目标检测任务,并能联合本发明提出的CSSCAM模块使用,从而进一步提升SAR目标检测性能。In addition, according to the characteristics of SAR images themselves, such as geometric distortion, radiation distortion, occlusion and shadow, the present invention designs an Attention binary classification branch based on GT binary label supervision to replace the Centerness branch in the original anchor-free target detection network FCOS, making it more suitable for SAR vehicle target detection tasks, and can be used in conjunction with the CSSCAM module proposed in the present invention, thereby further improving the SAR target detection performance.
步骤3:利用新的训练集对目标检测网络进行训练,得到训练好的目标检测网络ψ'。Step 3: Use the new training set to train the target detection network to obtain the trained target detection network ψ'.
Specifically, during network training, the CSSCAM module uses the GT binary label GT' to supervise the spatial attention weight As of the GT-binary-label-supervised spatial attention mechanism submodule, and the loss function uses Focal Loss;
分类分支子模块使用根据无锚框正负样本选择策略得到的正负样本标签进行监督学习,损失函数使用Focal Loss;The classification branch submodule uses the positive and negative sample labels obtained according to the anchor-free positive and negative sample selection strategy for supervised learning, and the loss function uses Focal Loss;
The regression branch submodule uses the positive sample labels obtained according to the anchor-free positive and negative sample selection strategy for supervised learning, and the loss function uses GIoU Loss;
Attention二分类分支子模块使用与根据无锚框正负样本选择策略得到的正样本相同位置处的GT二值标签GT'进行监督学习,其损失函数使用BCE Loss。The Attention binary classification branch submodule uses the GT binary label GT' at the same position as the positive sample obtained according to the anchor-free positive and negative sample selection strategy for supervised learning, and its loss function uses BCE Loss.
具体的,Focal Loss损失函数的表达式如下:Specifically, the expression of the Focal Loss loss function is as follows:
FL(p_t) = -(1 - p_t)^γ log(p_t)
where p_t denotes the probability predicted by the network for the corresponding class label c*, and γ denotes the modulation (focusing) factor, usually set to 2;
For two bounding boxes A and B, the GIoU Loss is expressed as follows:
GIoU = IoU - |C \ (A ∪ B)| / |C|,   L_GIoU = 1 - GIoU
where C denotes the smallest bounding box enclosing both A and B, and IoU denotes the intersection-over-union of A and B;
The BCE Loss is expressed as follows:
L_BCE = -[c* log(p) + (1 - c*) log(1 - p)]
where p denotes the probability predicted by the network that the position is a target (c* = 1).
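The sketch below illustrates the three loss terms used by the branches above. The tensor layouts, the box format (x0, y0, x1, y1), the numerical clamping and the mean reduction are assumptions added for illustration; only the mathematical forms follow the expressions given in the text.

```python
import torch

def focal_loss(p_t: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    # FL(p_t) = -(1 - p_t)^gamma * log(p_t), averaged over samples
    return (-(1.0 - p_t) ** gamma * torch.log(p_t.clamp(min=1e-6))).mean()

def giou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # boxes as (x0, y0, x1, y1); L_GIoU = 1 - GIoU
    x0 = torch.max(pred[:, 0], target[:, 0]); y0 = torch.max(pred[:, 1], target[:, 1])
    x1 = torch.min(pred[:, 2], target[:, 2]); y1 = torch.min(pred[:, 3], target[:, 3])
    inter = (x1 - x0).clamp(min=0) * (y1 - y0).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / union.clamp(min=1e-6)
    # smallest enclosing box C
    cx0 = torch.min(pred[:, 0], target[:, 0]); cy0 = torch.min(pred[:, 1], target[:, 1])
    cx1 = torch.max(pred[:, 2], target[:, 2]); cy1 = torch.max(pred[:, 3], target[:, 3])
    area_c = (cx1 - cx0) * (cy1 - cy0)
    giou = iou - (area_c - union) / area_c.clamp(min=1e-6)
    return (1.0 - giou).mean()

def bce_loss(p: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
    # L_BCE = -[c* log(p) + (1 - c*) log(1 - p)]
    p = p.clamp(1e-6, 1.0 - 1e-6)
    return (-(label * torch.log(p) + (1.0 - label) * torch.log(1.0 - p))).mean()
```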
Further, in this embodiment, the positive and negative sample labels are obtained with the following anchor-free positive/negative sample selection strategy (a code sketch of this strategy is given after the list):
a) For any position (x, y) on the prediction feature layer, map it back onto the original image according to the downsampling stride s, obtaining the corresponding position (x', y') on the original image (in the FCOS convention, x' = ⌊s/2⌋ + x·s and y' = ⌊s/2⌋ + y·s);
b) Determine whether (x', y') falls inside any target horizontal box. For each of the M target horizontal boxes in the original image (M is the number of targets), first compute the minimum of the distances from (x', y') to its four sides. For any target horizontal box B_t = (x_t^0, y_t^0, x_t^1, y_t^1), given by its top-left and bottom-right corners, the distances l_t, r_t, t_t, b_t from (x', y') to the four sides of B_t are computed as:
l_t = x' - x_t^0,   t_t = y' - y_t^0,   r_t = x_t^1 - x',   b_t = y_t^1 - y';
(x', y') falls inside B_t only if min(l_t, r_t, t_t, b_t) > 0.
Then the position (x, y) on the prediction feature layer corresponding to (x', y') is assigned to the positive or negative samples, and its labels are defined, according to the following three cases:
b1) If min(l_t, r_t, t_t, b_t) ≤ 0 for every target horizontal box, i.e. (x', y') does not fall inside any target horizontal box, then the corresponding position (x, y) on the prediction feature layer is assigned to the negative samples, its class label is defined as 0 and its regression label is defined as -1.
b2) If there is exactly one target horizontal box B_i with min(l_i, r_i, t_i, b_i) > 0, then (x', y') falls inside only that box. It is then further judged, according to the scale regression range of this prediction feature layer, whether the corresponding position (x, y) should predict B_i: first compute the maximum of the distances from (x', y') to the four sides of B_i, m_i = max(l_i, r_i, t_i, b_i), and compare it with the size regression range [s_min, s_max] set for this prediction feature layer. If s_min ≤ m_i ≤ s_max, the corresponding position (x, y) is assigned to the positive samples, its class label is defined as the category of B_i and its regression label is defined as the distances (l_i, t_i, r_i, b_i) to the four sides of B_i; otherwise, the corresponding position (x, y) is assigned to the negative samples, i.e. position (x, y) is considered not to predict B_i, its class label is defined as 0 and its regression label is defined as -1.
b3) If there are N target horizontal boxes with min(l_t, r_t, t_t, b_t) > 0, then (x', y') falls inside N target horizontal boxes. Similar to case b2), it is further judged according to the scale regression range of this prediction feature layer which of these boxes, if any, the corresponding position (x, y) should predict: first compute, for each of the N boxes, the maximum of the distances from (x', y') to its four sides, and compare these maxima with the size regression range [s_min, s_max] set for this prediction feature layer. If t (t ≥ 2) of the boxes satisfy the range condition, the corresponding position (x, y) is associated with t target horizontal boxes. Since the fully convolutional one-stage object detection (FCOS) algorithm assumes that target horizontal boxes of larger area should be predicted by positions on deeper-scale prediction feature layers, position (x, y) is finally associated with the box B_j of smallest area among the t boxes: (x, y) is assigned to the positive samples of B_j, its class label is defined as the category of B_j, and its regression label is defined as the distances to the four sides of B_j. If exactly one box B_k satisfies the range condition, the corresponding position (x, y) is associated with B_k only: (x, y) is assigned to the positive samples of B_k, its class label is defined as the category of B_k, and its regression label is defined as the distances to the four sides of B_k. Otherwise, the corresponding position (x, y) is assigned to the negative samples, i.e. it is considered not to predict any target horizontal box, its class label is defined as 0 and its regression label is defined as -1.
c)重复a)、b)操作,定义任一特征层上所有位置处的正负样本及其标签;c) Repeat operations a) and b) to define positive and negative samples and their labels at all positions on any feature layer;
d)重复a)、b)、c)操作,定义所有特征层上所有位置处的正负样本及其标签;d) Repeat operations a), b), and c) to define positive and negative samples and their labels at all positions on all feature layers;
需要说明的是,本实施例仅使用特征处理部分CSSCAM融合模块输出的A3进行网络预测,所以不需要进行d)的操作。另外,本实施例将A3预测特征层的尺寸回归范围设置为[-1,128]。It should be noted that this embodiment only uses A3 output by the CSSCAM fusion module of the feature processing part for network prediction, so there is no need to perform operation d). In addition, this embodiment sets the size regression range of the A3 prediction feature layer to [-1,128].
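The following is an illustrative sketch of the anchor-free positive/negative sample assignment of steps a)–d) for a single prediction feature layer. The stride-based mapping back to the image, the box format (x0, y0, x1, y1) and the "smallest-area box wins" tie-break follow the FCOS convention referred to above; the function and variable names are illustrative assumptions.

```python
import numpy as np

def assign_labels(h, w, stride, boxes, classes, s_min, s_max):
    """boxes: (M, 4) horizontal GT boxes (x0, y0, x1, y1); classes: (M,) class ids (>0)."""
    cls_labels = np.zeros((h, w), dtype=np.int64)        # 0 = negative sample
    reg_labels = -np.ones((h, w, 4), dtype=np.float32)   # -1 = no regression target
    for y in range(h):
        for x in range(w):
            # a) map the feature-layer position back onto the original image
            px = stride // 2 + x * stride
            py = stride // 2 + y * stride
            candidates = []
            for i, (x0, y0, x1, y1) in enumerate(boxes):
                # b) distances to the four sides of the box
                l, t = px - x0, py - y0
                r, b = x1 - px, y1 - py
                if min(l, t, r, b) <= 0:          # b1) position lies outside this box
                    continue
                if s_min <= max(l, t, r, b) <= s_max:   # b2)/b3) scale regression range
                    area = (x1 - x0) * (y1 - y0)
                    candidates.append((area, i, (l, t, r, b)))
            if candidates:                         # b3) smallest-area box wins ties
                _, i, ltrb = min(candidates)
                cls_labels[y, x] = classes[i]
                reg_labels[y, x] = ltrb
    return cls_labels, reg_labels
```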
步骤4:将新的测试集输入到训练好的目标检测网络ψ'中,得到初步的目标检测结果。Step 4: Input the new test set into the trained object detection network ψ' to obtain preliminary object detection results.
41) An amplitude test slice from the new test set, together with its corresponding gradient amplitude test slice and CFAR binary test slice, is fed into the three feature extraction subnetworks of the trained target detection network for testing, yielding the classification score, the bounding box regression parameters and the binary classification score at each position of the feature layer (see the sketch after this list); the amplitude test slice is one of the test slices in the original test set T;
42)将每个位置的分类得分和二分类得分相乘并开根号,以作为该位置最终的目标检测得分,并与预设的得分阈值进行比较:42) Multiply the classification score and the binary classification score of each position and take the square root to obtain the final target detection score of the position, and compare it with the preset score threshold:
若特征层中任意位置处的目标检测得分小于预设得分阈值,则丢弃该位置处所预测的检测框;If the target detection score at any position in the feature layer is less than the preset score threshold, the predicted detection box at that position is discarded;
否则,将特征层上的该位置根据下采样倍数映射回该测试切片原图上作为最终检测框的中心,并综合边界框回归参数,以得到该位置的目标检测结果;Otherwise, the position on the feature layer is mapped back to the original image of the test slice according to the downsampling multiple as the center of the final detection box, and the bounding box regression parameters are integrated to obtain the target detection result at this position;
43)将特征层中的每个位置重复步骤42)的操作,得到初步的目标检测结果。43) Repeat step 42) for each position in the feature layer to obtain preliminary target detection results.
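The sketch below illustrates steps 41)–43): the classification score and the binary classification score are fused as the square root of their product, positions below the score threshold are discarded, and a box is decoded at each remaining position from the regression parameters (l, t, r, b). The stride value and the threshold value are assumptions added for illustration.

```python
import numpy as np

def decode_detections(cls_score, attn_score, reg, stride=8, score_thr=0.5):
    """cls_score, attn_score: (H, W); reg: (H, W, 4) with (l, t, r, b)."""
    detections = []
    score = np.sqrt(cls_score * attn_score)      # final target detection score
    ys, xs = np.nonzero(score >= score_thr)      # keep positions above the threshold
    for y, x in zip(ys, xs):
        px = stride // 2 + x * stride            # map the position back onto the slice
        py = stride // 2 + y * stride
        l, t, r, b = reg[y, x]
        detections.append((px - l, py - t, px + r, py + b, score[y, x]))
    return detections
```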
步骤5:将初步的目标检测结果对应到测试图像上,并进行NMS操作以去除重叠的目标检测框,得到最终的目标检测结果。Step 5: Map the preliminary target detection results to the test image and perform NMS operation to remove overlapping target detection boxes to obtain the final target detection results.
具体的,根据步骤1构建测试集时得到的Loc将每个测试切片的目标检测结果对应回测试图像上,并进行NMS操作去除重叠的目标检测框。重复步骤4和5的操作,从而得到最终的目标检测结果。Specifically, the target detection result of each test slice is mapped back to the test image according to the Loc obtained when constructing the test set in step 1, and the NMS operation is performed to remove the overlapping target detection boxes. The operations of steps 4 and 5 are repeated to obtain the final target detection result.
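A standard greedy NMS sketch of the kind used in step 5 to remove overlapping detection boxes is given below; the IoU threshold value is an assumption.

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    """boxes: (N, 4) as (x0, y0, x1, y1); scores: (N,). Returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        x0 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y0 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x1 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y1 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / np.clip(area_i + area_r - inter, 1e-6, None)
        order = rest[iou <= iou_thr]   # drop boxes that overlap the kept box too much
    return keep
```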
Starting from the perspective of enhancing target features, the anchor-free SAR target detection method based on a supervised attention mechanism provided by the present invention introduces gradient amplitude information and CFAR information and constructs an anchor-free target detection network fusing gradient information and CFAR information, accomplishing the ground SAR target detection task in complex scenes. It effectively alleviates the computational complexity and the imbalance between positive and negative samples inherent to anchor-based target detection networks, and improves SAR target detection performance.
为了进一步验证本发明提出的基于有监督注意力机制的无锚框SAR目标检测方法的有效性,本实施例还将其在MiniSAR数据图像上进行了检测。In order to further verify the effectiveness of the anchor-free SAR target detection method based on the supervised attention mechanism proposed in the present invention, this embodiment also detects it on the MiniSAR data image.
请参见图6-11,图6-11是本发明实验所使用的MiniSAR数据图像;表1给出了本发明所提方法与现阶段目标检测性能较好的CFAR-Guided-EfficientDet SAR图像目标检测方法(简称CFAR-Guided-EfficientDet,出自论文《SAR图像目标检测与鉴别方法研究》,西安电子科技大学博士论文,王宁,2021)和基于CFAR指导的双流SSD SAR图像目标检测方法(简称ICSAF-CFAR-SSD,出自论文《结合恒虚警检测与深层网络的SAR目标检测研究》,西安电子科技大学硕士论文,唐天顾,2022)在图6-11所示的MiniSAR数据图像上的车辆目标检测性能指标。Please refer to Figures 6-11, which are the MiniSAR data images used in the experiments of the present invention; Table 1 gives the vehicle target detection performance indicators of the method proposed in the present invention and the CFAR-Guided-EfficientDet SAR image target detection method with better target detection performance at this stage (referred to as CFAR-Guided-EfficientDet, from the paper "Research on SAR Image Target Detection and Identification Methods", doctoral dissertation of Xidian University, Wang Ning, 2021) and the CFAR-guided dual-stream SSD SAR image target detection method (referred to as ICSAF-CFAR-SSD, from the paper "Research on SAR Target Detection Combining Constant False Alarm Detection and Deep Network", master's thesis of Xidian University, Tang Tiangu, 2022) on the MiniSAR data images shown in Figures 6-11.
表1不同检测网络在图6-11所示的MiniSAR数据图像上的目标检测性能对比Table 1 Comparison of target detection performance of different detection networks on the MiniSAR data images shown in Figures 6-11
In Table 1, Pre denotes precision, i.e. the percentage of detected boxes that correspond to true targets; Rec denotes recall, i.e. the percentage of true targets that are correctly detected; F1-Score is the harmonic mean of precision and recall, and AP is the average precision; F1-Score and AP are overall indicators that jointly account for the precision Pre and the recall Rec.
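For reference, the standard definitions of these indicators in terms of true positives (TP), false positives (FP) and false negatives (FN) are given below; AP is computed as the area under the precision-recall curve.

$$\mathrm{Pre}=\frac{TP}{TP+FP},\qquad \mathrm{Rec}=\frac{TP}{TP+FN},\qquad \mathrm{F1}=\frac{2\cdot\mathrm{Pre}\cdot\mathrm{Rec}}{\mathrm{Pre}+\mathrm{Rec}}$$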
As can be seen from Table 1, the SAR target detection performance of the proposed method on the six MiniSAR data images shown in Figures 6-11 is better than that of the CFAR-Guided-EfficientDet detection network in every case; on Image 1, Image 2, Image 3 and Image 4 the proposed method outperforms the ICSAF-CFAR-SSD detection network on all indicators; on Image 5, except that AP is comparable to that of the ICSAF-CFAR-SSD detection network, all other indicators are clearly higher; and on Image 6, although Pre and F1-Score are lower than those of the ICSAF-CFAR-SSD detection network, both Rec and AP are higher.
总的来看,本发明所提方法在召回率方面明显优于现阶段目标检测性能相对较好的两种SAR目标检测方法在复杂场景下车辆目标上的检测性能,能够检测出更多的目标,同时还能保证目标检测精确率、F1-Score和AP也相对更好。所以,通过上述试验分析,本发明所提方法获得了比现阶段目标检测性能相对较好的SAR目标检测方法更好的SAR目标检测性能,充分验证了本发明所提方法的有效性和优越性。In general, the method proposed in the present invention is significantly better than the two SAR target detection methods with relatively good target detection performance at the current stage in terms of recall rate on vehicle targets in complex scenes, and can detect more targets while ensuring that the target detection accuracy, F1-Score and AP are also relatively better. Therefore, through the above experimental analysis, the method proposed in the present invention has achieved better SAR target detection performance than the SAR target detection methods with relatively good target detection performance at the current stage, which fully verifies the effectiveness and superiority of the method proposed in the present invention.
以上内容是结合具体的优选实施方式对本发明所作的进一步详细说明,不能认定本发明的具体实施只局限于这些说明。对于本发明所属技术领域的普通技术人员来说,在不脱离本发明构思的前提下,还可以做出若干简单推演或替换,都应当视为属于本发明的保护范围。The above contents are further detailed descriptions of the present invention in combination with specific preferred embodiments, and it cannot be determined that the specific implementation of the present invention is limited to these descriptions. For ordinary technicians in the technical field to which the present invention belongs, several simple deductions or substitutions can be made without departing from the concept of the present invention, which should be regarded as falling within the scope of protection of the present invention.