技术领域Technical Field
本发明属于雷达目标检测技术领域,具体涉及一种基于有监督注意力机制的无锚框SAR目标检测方法。The present invention belongs to the technical field of radar target detection, and in particular relates to an anchor-free SAR target detection method based on a supervised attention mechanism.
背景技术Background Art
合成孔径雷达(Synthetic Aperture Radar,SAR)由于其能够在任何气候条件下不分昼夜地作业,具有全天时、全天候的特点,已经被广泛应用在军事侦查、资源勘探、环境保护、灾害预防和科学研究等各种领域。SAR图像自动目标识别(Automatic Target Recognition,ATR)技术致力于对复杂SAR场景中的目标进行定位和识别,是SAR图像应用的核心方向之一,在军事和民用领域都处于至关重要的地位。美国林肯实验室最早开展此方面的研究工作,并于20世纪80年代末提出了著名的SAR ATR三级处理流程:SAR目标检测、鉴别和识别,它是SAR图像解译最常见的处理流程。其中的SAR目标检测任务是SAR图像解译的重要内容。Synthetic Aperture Radar (SAR) can operate day and night under any weather conditions, giving it all-day, all-weather capability, and has therefore been widely applied in fields such as military reconnaissance, resource exploration, environmental protection, disaster prevention and scientific research. SAR automatic target recognition (ATR) technology aims to locate and recognize targets in complex SAR scenes; it is one of the core directions of SAR image application and plays a vital role in both military and civilian domains. Lincoln Laboratory in the United States was the first to carry out research in this area and, in the late 1980s, proposed the well-known three-stage SAR ATR processing flow of SAR target detection, discrimination and recognition, which remains the most common processing flow for SAR image interpretation. The SAR target detection task within it is an important part of SAR image interpretation.
自从合成孔径雷达问世以来,有关SAR目标检测的技术也在飞速发展。在技术发展早期阶段,大量学者主要对传统目标检测方法进行研究,其中最为著名的方法是由美国林肯实验室于20世纪90年代提出的双参数恒虚警(Constant False Alarm Rate,CFAR)检测方法,该方法成功将CFAR检测器扩展到了二维的SAR图像目标检测领域中。由于双参数CFAR算法在一些简单场景中检测性能优越,大量研究都开始围绕恒虚警算法展开,提出了最小选择CFAR、最大选择CFAR、单元平均CFAR、序贯统计CFAR等检测器。虽然传统目标检测方法能够在一些简单场景中取得较好的目标检测性能,但是其仍具有许多缺点:(1)通常涉及大量超参数并需要设置阈值来完成目标检测任务,所以在实际中需要根据使用场景手动调参,流程繁琐,难以实现自适应检测,并且目标检测精确率较低;(2)传统目标检测方法通常是逐像素检测,检测时间较长,并且在一些复杂场景中难以取得较好的目标检测性能。因此,传统目标检测方法已逐渐满足不了实际的使用需要。Since the advent of synthetic aperture radar, the technology of SAR target detection has also developed rapidly. In the early stage of technology development, a large number of scholars mainly studied traditional target detection methods. The most famous method is the dual-parameter constant false alarm rate (CFAR) detection method proposed by the Lincoln Laboratory in the United States in the 1990s. This method successfully extended the CFAR detector to the field of two-dimensional SAR image target detection. Since the dual-parameter CFAR algorithm has excellent detection performance in some simple scenes, a large number of studies have begun to focus on the constant false alarm algorithm, and detectors such as minimum selection CFAR, maximum selection CFAR, unit average CFAR, and sequential statistical CFAR have been proposed. Although traditional target detection methods can achieve good target detection performance in some simple scenes, they still have many disadvantages: (1) They usually involve a large number of hyperparameters and need to set thresholds to complete the target detection task. Therefore, in practice, they need to manually adjust the parameters according to the usage scenario. The process is cumbersome, it is difficult to achieve adaptive detection, and the target detection accuracy is low; (2) Traditional target detection methods are usually pixel-by-pixel detection, which takes a long time to detect and is difficult to achieve good target detection performance in some complex scenes. Therefore, traditional target detection methods have gradually failed to meet actual usage needs.
近年来,随着深层网络在光学图像目标检测领域中大获成功,基于深度学习的SAR目标检测方法成为了众多学者研究的热点,目前已经在SAR目标检测任务中得到了广泛的应用,得到了远远优于传统目标检测方法的SAR目标检测性能。例如,专利CN202210269829.6提供了一种基于CFAR指导的双流SSD SAR图像目标检测方法,其通过SAR幅度特征和CFAR指示特征在特征空间上进行融合,来充分利用SAR图像中目标的强散射特性,增强目标检测性能;然后利用CFAR二值指示图来让检测器更加关注难分负样本和正样本的学习;最后提出了AR-NMS算法改进了传统的NMS算法,提高了SAR目标检测性能。In recent years, with the great success of deep networks in the field of optical image target detection, SAR target detection methods based on deep learning have become a hot topic for many scholars. At present, they have been widely used in SAR target detection tasks, and have achieved SAR target detection performance that is far superior to traditional target detection methods. For example, patent CN202210269829.6 provides a dual-stream SSD SAR image target detection method based on CFAR guidance, which fully utilizes the strong scattering characteristics of targets in SAR images and enhances target detection performance by fusing SAR amplitude features and CFAR indicator features in feature space; then, the CFAR binary indicator map is used to make the detector pay more attention to the learning of difficult-to-distinguish negative samples and positive samples; finally, the AR-NMS algorithm is proposed to improve the traditional NMS algorithm and improve the SAR target detection performance.
但是,对于SAR地面目标数据,由于其数据较少并且待检测场景较为复杂,通常包含大量与目标特性十分相似的自然杂波和人造杂波,这使得地面目标检测任务十分困难,目前有关SAR地面目标检测的研究也相对较少,主流的基于深层网络的SAR地面目标检测方法大多数是基于有锚框的目标检测方法,这些目标检测框架通常需要预先铺设大量锚框,其中涉及大量超参数的设置和复杂的IoU计算,并且存在严重的正负样本不平衡问题,这不利于SAR目标检测任务。另外,现有方法没有深入考虑SAR图像本身特有的几何畸变、辐射畸变和阴影遮挡等特点和SAR图像本身的特性对SAR目标检测的帮助,导致目前有关复杂场景下SAR目标检测方法的检测性能不高。However, for SAR ground target data, due to the small amount of data and the complex scenes to be detected, it usually contains a large amount of natural clutter and artificial clutter that are very similar to the target characteristics, which makes the ground target detection task very difficult. At present, there are relatively few studies on SAR ground target detection. Most of the mainstream SAR ground target detection methods based on deep networks are based on target detection methods with anchor boxes. These target detection frameworks usually require a large number of anchor boxes to be laid in advance, which involves a large number of hyperparameter settings and complex IoU calculations, and there is a serious imbalance problem between positive and negative samples, which is not conducive to SAR target detection tasks. In addition, the existing methods do not deeply consider the unique geometric distortion, radiation distortion and shadow occlusion of SAR images themselves and the help of the characteristics of SAR images themselves to SAR target detection, resulting in the low detection performance of the current SAR target detection methods in complex scenes.
发明内容Summary of the invention
为了解决现有技术中存在的上述问题,本发明提供了一种基于有监督注意力机制的无锚框SAR目标检测方法。本发明的技术思路是:通过对训练样本分别使用指数加权平均(Ratio of Exponentially Weighted Averages,ROEWA)边缘检测算法和双参数CFAR算法,得到每个训练样本的梯度信息和CFAR信息,并一同送入到构建好的基于梯度信息、CFAR信息融合和有监督注意力机制的无锚框SAR目标检测网络中进行训练,然后对测试样本进行同样的处理并输入到训练好的网络模型中得到最终的目标检测结果。本发明要解决的技术问题通过以下技术方案实现:In order to solve the above-mentioned problems existing in the prior art, the present invention provides an anchor-free SAR target detection method based on a supervised attention mechanism. The technical idea of the present invention is: by using the exponentially weighted average (Ratio of Exponentially Weighted Averages, ROEWA) edge detection algorithm and the dual-parameter CFAR algorithm for the training samples respectively, the gradient information and CFAR information of each training sample are obtained, and sent together to the constructed anchor-free SAR target detection network based on gradient information, CFAR information fusion and supervised attention mechanism for training, and then the test sample is processed in the same way and input into the trained network model to obtain the final target detection result. The technical problem to be solved by the present invention is achieved by the following technical solutions:
一种基于有监督注意力机制的无锚框SAR目标检测方法,包括:A method for anchor-free SAR target detection based on a supervised attention mechanism, comprising:
步骤1:基于原始SAR图像获得原始训练集和原始测试集;利用ROEWA边缘检测算法和双参数CFAR算法分别对原始训练集和原始测试集中获取梯度信息和CFAR信息,并据此构建新的训练集和新的测试集;Step 1: Obtain the original training set and the original test set based on the original SAR image; use the ROEWA edge detection algorithm and the dual-parameter CFAR algorithm to obtain the gradient information and CFAR information from the original training set and the original test set respectively, and construct a new training set and a new test set accordingly;
步骤2:构建基于梯度信息、CFAR信息融合和有监督注意力机制的无锚框SAR目标检测网络;其中,所述目标检测网络包括特征提取模块、特征处理模块和网络预测模块;Step 2: construct an anchor-free SAR target detection network based on gradient information, CFAR information fusion and supervised attention mechanism; wherein the target detection network includes a feature extraction module, a feature processing module and a network prediction module;
步骤3:利用所述新的训练集对所述目标检测网络进行训练,得到训练好的目标检测网络;Step 3: Using the new training set to train the target detection network to obtain a trained target detection network;
步骤4:将所述新的测试集输入到训练好的目标检测网络中,得到初步的目标检测结果;Step 4: Input the new test set into the trained target detection network to obtain preliminary target detection results;
步骤5:将所述初步的目标检测结果对应到测试图像上,并进行NMS操作以去除重叠的目标检测框,得到最终的目标检测结果。Step 5: Correspond the preliminary target detection result to the test image, and perform NMS operation to remove overlapping target detection frames to obtain the final target detection result.
本发明的有益效果:Beneficial effects of the present invention:
1、本发明提供的基于有监督注意力机制的无锚框SAR目标检测方法从增强目标特征的角度出发,引入了梯度幅度信息和CFAR信息,构建了基于梯度信息、CFAR信息融合的无锚框目标检测网络,实现了复杂场景下地面SAR目标检测任务,能够有效缓解有锚框目标检测网络本身存在的计算复杂、正负样本不平衡等问题,提升了SAR目标检测性能;1. The anchor-free SAR target detection method based on the supervised attention mechanism provided by the present invention introduces gradient amplitude information and CFAR information from the perspective of enhancing target features, constructs an anchor-free target detection network based on the fusion of gradient information and CFAR information, realizes the ground SAR target detection task in complex scenes, and can effectively alleviate the problems of computational complexity and imbalance of positive and negative samples existing in the anchor-free target detection network itself, thereby improving the SAR target detection performance;
2、本发明设计的目标检测网络在特征融合方面,采用了基于交互式注意力机制的三支路交互式通道-空间注意力融合(Triple-Interactive Channel-Spatial Attention Fusion,T-ICSAF)模块;同时,提出了基于Ground Truth(GT)二值标签监督的有监督空间注意力机制和SE通道注意力机制结合(Combining Supervised-Spatial And SE-Channel Attention Mechanism,CSSCAM)模块来有效地抑制背景杂波特征,解决了由于传统特征引入所带来的额外背景杂波特征造成虚警框增加的问题,进一步增强目标特征,让特征提取后的目标特征更具有目标性,从而获得更好的SAR目标检测性能;2. For feature fusion, the target detection network designed by the present invention adopts a Triple-Interactive Channel-Spatial Attention Fusion (T-ICSAF) module based on an interactive attention mechanism; at the same time, a module Combining a Supervised Spatial and an SE Channel Attention Mechanism (CSSCAM), supervised by Ground Truth (GT) binary labels, is proposed to effectively suppress background clutter features, solving the problem of additional false-alarm boxes caused by the extra background clutter features introduced along with the traditional features, further enhancing target features and making the extracted target features more target-oriented, thereby achieving better SAR target detection performance;
3、本发明根据SAR图像本身具有几何畸变、辐射畸变、遮挡阴影等特点,设计了一个基于GT二值标签监督的Attention二分类分支取代了原始无锚框目标检测网络FCOS中的Centerness分支,让其更加适用于SAR车辆目标检测任务,并能联合本发明提出的CSSCAM模块使用,从而进一步提升SAR目标检测性能。3. According to the characteristics of SAR images such as geometric distortion, radiation distortion, occlusion and shadow, the present invention designs an Attention binary classification branch based on GT binary label supervision to replace the Centerness branch in the original anchor-free target detection network FCOS, making it more suitable for SAR vehicle target detection tasks, and can be used in conjunction with the CSSCAM module proposed in the present invention, thereby further improving the SAR target detection performance.
以下将结合附图及实施例对本发明做进一步详细说明。The present invention will be further described in detail below with reference to the accompanying drawings and embodiments.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
图1是本发明实施例提供的基于有监督注意力机制的无锚框SAR目标检测方法的一种流程示意图;FIG1 is a flow chart of a method for detecting SAR targets without anchor frames based on a supervised attention mechanism according to an embodiment of the present invention;
图2是本发明实施例提供的基于有监督注意力机制的无锚框SAR目标检测方法的另一种流程示意图;FIG2 is another schematic diagram of a flow chart of a method for anchor-free SAR target detection based on a supervised attention mechanism provided by an embodiment of the present invention;
图3是本发明实施例提供的无锚框SAR目标检测网络框架图;FIG3 is a network framework diagram of anchor-free SAR target detection provided by an embodiment of the present invention;
图4是本发明实施例提供的T-ICSAF特征融合模块的网络框架图;FIG4 is a network framework diagram of a T-ICSAF feature fusion module provided in an embodiment of the present invention;
图5是本发明实施例提供的CSSCAM模块的网络框架图;FIG5 is a network framework diagram of a CSSCAM module provided in an embodiment of the present invention;
图6-11是本发明实验所使用的MiniSAR数据图像。Figures 6-11 are MiniSAR data images used in the experiments of the present invention.
具体实施方式DETAILED DESCRIPTION
下面结合具体实施例对本发明做进一步详细的描述,但本发明的实施方式不限于此。The present invention is further described in detail below with reference to specific embodiments, but the embodiments of the present invention are not limited thereto.
实施例一Embodiment 1
请联合参见图1-2,图1是本发明实施例提供的基于有监督注意力机制的无锚框SAR目标检测方法的一种流程示意图,图2是本发明实施例提供的基于有监督注意力机制的无锚框SAR目标检测方法的另一种流程示意图。本发明提供的基于有监督注意力机制的无锚框SAR目标检测方法具体包括:Please refer to Figures 1-2, Figure 1 is a flowchart of a method for anchor-free SAR target detection based on a supervised attention mechanism provided by an embodiment of the present invention, and Figure 2 is another flowchart of a method for anchor-free SAR target detection based on a supervised attention mechanism provided by an embodiment of the present invention. The method for anchor-free SAR target detection based on a supervised attention mechanism provided by the present invention specifically includes:
步骤1:基于原始SAR图像获得原始训练集和原始测试集;利用ROEWA边缘检测算法和双参数CFAR算法分别对原始训练集和原始测试集中获取梯度信息和CFAR信息,并据此构建新的训练集和新的测试集。Step 1: Obtain the original training set and the original test set based on the original SAR image; use the ROEWA edge detection algorithm and the dual-parameter CFAR algorithm to obtain the gradient information and CFAR information from the original training set and the original test set respectively, and construct a new training set and a new test set accordingly.
一、构建新的训练集1. Construct a new training set
首先,选取若干幅原始SAR图像作为训练图像,并对其进行切片,以得到若干训练切片作为原始训练集ψ;Firstly, several original SAR images are selected as training images and sliced to obtain several training slices as the original training set ψ;
然后,对选取的训练图像分别通过ROEWA边缘检测算法和双参数CFAR算法得到梯度幅度训练图像和CFAR二值训练图像;Then, the selected training images are respectively subjected to the ROEWA edge detection algorithm and the dual-parameter CFAR algorithm to obtain the gradient amplitude training image and the CFAR binary training image;
最后,对梯度幅度训练图像和CFAR二值训练图像进行切片,得到与原始训练集ψ对应的梯度幅度训练切片和CFAR二值训练切片,并与原始训练集ψ一起形成新的训练集ψ'。Finally, the gradient-magnitude training image and the CFAR binary training image are sliced to obtain the gradient-magnitude training slices and the CFAR binary training slices corresponding to the original training set ψ, which together with the original training set ψ form the new training set ψ'.
可选的,作为一种实现方式,本实施例可以从MiniSAR数据集中选取五幅图像作为训练图像,以45为步长,800×1333为切片大小得到3536张训练切片构成原始训练集ψ;然后对选取的训练图像分别通过ROEWA边缘检测算法和双参数CFAR算法得到梯度幅度训练图像和CFAR二值训练图像,并以45为步长,800×1333为切片大小得到与原始训练集ψ对应的梯度幅度训练切片和CFAR二值训练切片最后与原始训练集ψ一起构成最终的训练集ψ'。Optionally, as an implementation method, this embodiment can select five images from the MiniSAR data set as training images, with a step size of 45 and a slice size of 800×1333 to obtain 3536 training slices to form an original training set ψ; then, the selected training images are respectively subjected to the ROEWA edge detection algorithm and the dual-parameter CFAR algorithm to obtain a gradient amplitude training image and a CFAR binary training image, and with a step size of 45 and a slice size of 800×1333 to obtain a gradient amplitude training slice corresponding to the original training set ψ and CFAR binary training slices Finally, together with the original training set ψ, they form the final training set ψ'.
二、构建新的测试集2. Build a new test set
首先,选取一幅原始SAR图像作为测试图像,并对其进行切片,以得到若干测试切片作为原始测试集T;First, an original SAR image is selected as a test image and sliced to obtain several test slices as the original test set T;
然后,对选取的测试图像分别通过ROEWA边缘检测算法和双参数CFAR算法得到梯度幅度测试图像和CFAR二值测试图像;Then, the selected test image is subjected to the ROEWA edge detection algorithm and the dual-parameter CFAR algorithm to obtain the gradient amplitude test image and the CFAR binary test image;
最后,对梯度幅度测试图像和CFAR二值测试图像进行切片,得到与原始测试集T对应的梯度幅度测试切片和CFAR二值测试切片,并与原始测试集T一起形成新的测试集T'。Finally, the gradient-magnitude test image and the CFAR binary test image are sliced to obtain the gradient-magnitude test slices and the CFAR binary test slices corresponding to the original test set T, which together with the original test set T form the new test set T'.
可选的,作为一种实现方式,从MiniSAR数据集中选取一幅图像作为测试图像,以150为步长,800×1333为切片大小得到63张测试切片构成原始测试集T,并记录每个测试切片与测试图像对应的位置关系Loc,然后对选取的测试图像分别通过ROEWA边缘检测算法和双参数CFAR算法得到梯度幅度测试图像和CFAR二值测试图像,并以150为步长,800×1333为切片大小得到与原始测试集T对应的梯度幅度测试切片和CFAR二值测试切片最后与原始测试集T一起构成最终的测试集T'。Optionally, as an implementation method, an image is selected from the MiniSAR data set as a test image, and 63 test slices are obtained with a step size of 150 and a slice size of 800×1333 to form an original test set T, and the positional relationship Loc corresponding to each test slice and the test image is recorded, and then the selected test image is respectively subjected to the ROEWA edge detection algorithm and the dual-parameter CFAR algorithm to obtain a gradient amplitude test image and a CFAR binary test image, and the gradient amplitude test slice corresponding to the original test set T is obtained with a step size of 150 and a slice size of 800×1333. and CFAR binary test slices Finally, together with the original test set T, they form the final test set T'.
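For illustration, the following NumPy sketch shows one possible implementation of the slicing described above, including recording each slice's top-left position (the Loc record); the function name, return format and zero-filled placeholder image are assumptions rather than part of the original disclosure.

```python
import numpy as np

def slice_image(image, slice_h=800, slice_w=1333, stride=150):
    """Cut a large SAR image into overlapping slices with a fixed stride and record
    each slice's top-left corner so detections can later be mapped back (Loc)."""
    H, W = image.shape[:2]
    slices, locs = [], []
    for top in range(0, max(H - slice_h, 0) + 1, stride):
        for left in range(0, max(W - slice_w, 0) + 1, stride):
            slices.append(image[top:top + slice_h, left:left + slice_w])
            locs.append((top, left))
    return slices, locs

# Test image: stride 150; training images would use stride 45 instead.
test_image = np.zeros((2000, 3000), dtype=np.float32)   # placeholder amplitude image
test_slices, test_locs = slice_image(test_image, stride=150)
```

Applying the same routine with identical parameters to the amplitude image, the gradient-magnitude image and the CFAR binary image keeps the three sets of slices aligned.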
进一步的,在构建训练集和测试集的过程中,均需要利用ROEWA边缘检测算法获取SAR图像的梯度幅度图像,其操作过程相同,具体步骤如下:Furthermore, in the process of constructing the training set and the test set, the ROEWA edge detection algorithm is needed to obtain the gradient amplitude image of the SAR image. The operation process is the same, and the specific steps are as follows:
1)输入原始SAR图像;1) Input the original SAR image;
2)对于输入SAR图像中的任一像素点(i,j),首先使用ROEWA算法分别计算该像素点在水平方向上的水平梯度∇H(i,j)和在垂直方向上的垂直梯度∇V(i,j),然后利用以下公式计算像素点(i,j)的梯度幅度Gi,j:2) For any pixel (i, j) in the input SAR image, the ROEWA algorithm is first used to compute its horizontal gradient ∇H(i,j) and its vertical gradient ∇V(i,j); the gradient magnitude Gi,j of pixel (i, j) is then computed as:

$$G_{i,j}=\sqrt{\nabla_H(i,j)^2+\nabla_V(i,j)^2}$$
具体的,在实施例中,使用ROEWA算法计算任一像素点(i,j)的水平梯度和垂直梯度的具体过程如下:Specifically, in the embodiment, the ROEWA algorithm is used to calculate the horizontal gradient of any pixel point (i, j) and vertical gradient The specific process is as follows:
a、计算水平梯度a. Calculate horizontal gradient
对于任一像素点(i,j),首先计算该像素点左右两侧(4σ+1)×2σ范围内的像素点幅度值的指数加权均值ML和MR,然后将ML和MR作商并取对数,从而得到水平梯度∇H(i,j),其中σ为指数加权因子。水平梯度的计算公式如下所示:For any pixel (i, j), the exponentially weighted means ML and MR of the pixel amplitudes within the (4σ+1)×2σ windows on the left and right sides of the pixel are computed first, and the horizontal gradient ∇H(i,j) is obtained by taking the logarithm of their ratio, where σ is the exponential weighting factor. The horizontal gradient is computed as:

$$\nabla_H(i,j)=\log\frac{M_L(i,j)}{M_R(i,j)}$$
其中I(·)表示SAR图像中像素点的幅度值。Where I(·) represents the amplitude value of the pixel in the SAR image.
b、计算垂直梯度b. Calculate vertical gradient
对于任一像素点(i,j),首先计算该像素点上下两侧(4σ+1)×2σ范围内的像素点幅度值的指数加权均值MT和MB,然后将MT和MB作商并取对数,从而得到垂直梯度∇V(i,j),其中σ为指数加权因子。垂直梯度的计算公式如下所示:For any pixel (i, j), the exponentially weighted means MT and MB of the pixel amplitudes within the (4σ+1)×2σ windows above and below the pixel are computed first, and the vertical gradient ∇V(i,j) is obtained by taking the logarithm of their ratio, where σ is the exponential weighting factor. The vertical gradient is computed as:

$$\nabla_V(i,j)=\log\frac{M_T(i,j)}{M_B(i,j)}$$
其中I(·)表示SAR图像中像素点的幅度值。Where I(·) represents the amplitude value of the pixel in the SAR image.
3)重复2)操作,得到输入SAR图像中每个像素点的梯度幅度,从而得到原始SAR图像对应的梯度幅度图像。3) Repeat operation 2) to obtain the gradient amplitude of each pixel in the input SAR image, thereby obtaining the gradient amplitude image corresponding to the original SAR image.
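A simplified reference implementation of steps 1)–3) is sketched below; the exact exponential weighting and border handling of the original ROEWA detector are not fully specified in the text, so the weight function and the parameter sigma are assumptions.

```python
import numpy as np

def _exp_weighted_mean(img, i, j, di_range, dj_range, sigma):
    """Exponentially weighted mean of the amplitudes in a rectangular window around
    (i, j); weights decay with the offset from the centre pixel."""
    H, W = img.shape
    num = den = 0.0
    for di in di_range:
        for dj in dj_range:
            r, c = i + di, j + dj
            if 0 <= r < H and 0 <= c < W:
                w = np.exp(-(abs(di) + abs(dj)) / sigma)
                num += w * img[r, c]
                den += w
    return num / max(den, 1e-12)

def roewa_gradient(img, sigma=2, eps=1e-6):
    """Per-pixel ROEWA gradient magnitude: the horizontal/vertical gradients are the
    log-ratios of the exponentially weighted means on the two sides of the pixel."""
    H, W = img.shape
    full = range(-2 * sigma, 2 * sigma + 1)                     # (4*sigma + 1) rows or columns
    neg, pos = range(-2 * sigma, 0), range(1, 2 * sigma + 1)    # 2*sigma to each side
    grad = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            m_l = _exp_weighted_mean(img, i, j, full, neg, sigma)   # left window
            m_r = _exp_weighted_mean(img, i, j, full, pos, sigma)   # right window
            m_t = _exp_weighted_mean(img, i, j, neg, full, sigma)   # top window
            m_b = _exp_weighted_mean(img, i, j, pos, full, sigma)   # bottom window
            g_h = np.log((m_l + eps) / (m_r + eps))
            g_v = np.log((m_t + eps) / (m_b + eps))
            grad[i, j] = np.sqrt(g_h ** 2 + g_v ** 2)
    return grad
```

The double loop is written for clarity; a practical implementation would vectorise the weighted means with separable exponential filters.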
更进一步的,在构建训练集和测试集的过程中,还需要利用双参数CFAR算法获取SAR图像的CFAR二值图像,其操作步骤如下:Furthermore, in the process of constructing the training set and the test set, it is also necessary to use the dual-parameter CFAR algorithm to obtain the CFAR binary image of the SAR image. The operation steps are as follows:
1)输入原始SAR图像;1) Input the original SAR image;
2)对于任一像素点(i,j),利用其幅度Ii,j定义CFAR检测统计量:2) For any pixel (i, j), the CFAR detection statistic is defined from its amplitude Ii,j as:

$$D_{i,j}=\frac{I_{i,j}-\hat{\mu}}{\hat{\sigma}}$$

其中μ̂和σ̂是高斯分布均值和标准差的最大似然估计,是利用像素点(i,j)附近N个背景杂波像素点的幅度值xi计算得到的:where μ̂ and σ̂ are the maximum-likelihood estimates of the mean and standard deviation of the Gaussian distribution, computed from the amplitudes xi of the N background-clutter pixels around pixel (i, j):

$$\hat{\mu}=\frac{1}{N}\sum_{i=1}^{N}x_i,\qquad \hat{\sigma}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i-\hat{\mu}\right)^2}$$
3)将2)计算得到的CFAR检测统计量Di,j与CFAR检测阈值T进行比较,如果Di,j>T,则将该像素点被判为目标,CFAR二值检测结果为1,即将CFAR二值图像中的对应位置设置为1;如果Di,j<T,则将该像素点被判为背景,CFAR二值检测结果为0,即将CFAR二值图像中的对应位置设置为0;3) Compare the CFAR detection statistic Di,j calculated in 2) with the CFAR detection threshold T. If Di,j > T, the pixel is judged as a target, and the CFAR binary detection result is 1, that is, the corresponding position in the CFAR binary image is set to 1; if Di,j < T, the pixel is judged as a background, and the CFAR binary detection result is 0, that is, the corresponding position in the CFAR binary image is set to 0;
4)重复上述2)、3)操作,得到输入SAR图像中每个像素点对应的CFAR二值检测结果,从而得到原始SAR图像对应的CFAR二值图像。4) Repeat the above 2) and 3) operations to obtain the CFAR binary detection result corresponding to each pixel in the input SAR image, thereby obtaining the CFAR binary image corresponding to the original SAR image.
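The following sketch illustrates the two-parameter CFAR binarisation of steps 1)–4); the guard-window and background-window sizes and the threshold value are illustrative assumptions, since the text does not fix them.

```python
import numpy as np

def cfar_binary(img, guard=4, background=8, threshold=3.0):
    """Two-parameter CFAR: for every pixel, estimate the mean and standard deviation of
    the surrounding background clutter (a ring outside a guard window), form the
    statistic D = (I - mu) / sigma and compare it with the detection threshold."""
    H, W = img.shape
    out = np.zeros((H, W), dtype=np.uint8)
    r_out = guard + background
    for i in range(H):
        for j in range(W):
            i0, i1 = max(i - r_out, 0), min(i + r_out + 1, H)
            j0, j1 = max(j - r_out, 0), min(j + r_out + 1, W)
            window = img[i0:i1, j0:j1]
            # mask out the guard region so the target itself does not bias the statistics
            mask = np.ones(window.shape, dtype=bool)
            gi0, gi1 = max(i - guard, i0), min(i + guard + 1, i1)
            gj0, gj1 = max(j - guard, j0), min(j + guard + 1, j1)
            mask[gi0 - i0:gi1 - i0, gj0 - j0:gj1 - j0] = False
            clutter = window[mask]
            mu, sigma = clutter.mean(), clutter.std() + 1e-12
            out[i, j] = 1 if (img[i, j] - mu) / sigma > threshold else 0
    return out
```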
步骤2:构建基于梯度信息、CFAR信息融合和有监督注意力机制的无锚框SAR目标检测网络。Step 2: Construct an anchor-free SAR target detection network based on gradient information, CFAR information fusion and supervised attention mechanism.
请参见图3,图3是本发明实施例提供的无锚框SAR目标检测网络框架图,该目标检测网络包括特征提取模块、特征处理模块和网络预测模块。Please refer to FIG. 3 , which is a framework diagram of an anchor-free SAR target detection network provided by an embodiment of the present invention. The target detection network includes a feature extraction module, a feature processing module and a network prediction module.
下面分别对上述三个模块进行详细介绍。The above three modules are introduced in detail below.
(一)特征提取模块(I) Feature extraction module
如图3所示,特征提取模块包括结构完全相同但参数不共享的三个特征提取子网络A、B、C,分别为幅度特征提取网络A、梯度特征提取网络B和CFAR特征提取网络C;其中,As shown in Figure 3, the feature extraction module includes three feature extraction sub-networks A, B, and C with exactly the same structure but no shared parameters, namely, amplitude feature extraction network A, gradient feature extraction network B, and CFAR feature extraction network C;
三个特征提取子网络A、B、C均包括一个以ResNet-18为骨架的特征提取模块和一个FPN多尺度特征融合模块;The three feature extraction sub-networks A, B, and C each include a feature extraction module with ResNet-18 as the skeleton and an FPN multi-scale feature fusion module;
三个特征提取模块分别用于对训练切片ψ、和进行特征提取,得到对应的输出特征层和i=A,B,C;The three feature extraction modules are used to extract the training slices ψ, and Perform feature extraction to obtain the corresponding output feature layer and i=A,B,C;
三个FPN多尺度特征融合模块分别用于对输出特征层和进行多尺度特征融合,得到整个特征提取模块的输出特征层和The three FPN multi-scale feature fusion modules are used to output feature layers and Perform multi-scale feature fusion to obtain the output feature layer of the entire feature extraction module and
在本实施例中,三个特征提取子网络A、B、C的特征提取网络所使用的ResNet-18的网络结构、参数设置及对应关系如下:In this embodiment, the network structure, parameter settings and corresponding relationships of ResNet-18 used in the feature extraction networks of the three feature extraction subnetworks A, B, and C are as follows:
所使用到的ResNet-18的网络结构主要包括五个layer:conv1、conv2_x、conv3_x、conv4_x和conv5_x,输入网络的图像大小为H×W×3:The network structure of ResNet-18 used mainly includes five layers: conv1, conv2_x, conv3_x, conv4_x and conv5_x. The image size of the input network is H×W×3:
conv1:包括一个卷积层L1,其卷积核窗口大小为7×7,滑动步长为2,填充属性Padding为3,用于输出64个大小的特征图j表示第j个特征图。conv1的输出为conv2_x的输入;conv1: includes a convolution layerL1 , whose convolution kernel window size is 7×7, sliding step size is 2, padding attribute Padding is 3, used to output 64 Feature map of size j represents the jth feature map. The output of conv1 is the input of conv2_x;
conv2_x:包括一个池化层P1、两个卷积模块M1、M2和两个残差模块R1、R2。具体地,P1采用最大池化的方式,其输入为conv1输出的特征层X1,卷积核窗口大小为3×3,滑动步长为2,填充属性Padding为1,用于输出64个大小的特征图j表示第j个特征图。P1的输出为M1和R1的输入;M1包含两个卷积层L2和L3,用于输出64个大小的特征图j表示第j个特征图。其中L2和L3的卷积核窗口大小为3×3,滑动步长为1,填充属性Padding为1,M1的输出为R1的另一个输入;R1包含一个将P1的输出特征层和M1的输出特征层的对应元素进行加和的操作,用于输出64个大小的特征图j表示第j个特征图。R1的输出为M2和R2的输入;M2同样包含两个卷积层L4和L5,用于输出64个大小的特征图j表示第j个特征图。其中L4和L5的卷积核窗口大小为3×3,滑动步长为1,填充属性Padding为1,M2的输出为R2的另一个输入;R2包含一个将R1的输出特征层和M2的输出特征层的对应元素进行加和的操作,用于输出64个大小的特征图j表示第j个特征图。R2的输出为conv3_x的输入;conv2_x: includes a pooling layer P1 , two convolution modules M1 , M2 and two residual modules R1 , R2 . Specifically, P1 uses the maximum pooling method, its input is the feature layer X1 output by conv1, the convolution kernel window size is 3×3, the sliding step is 2, and the padding attribute Padding is 1, which is used to output 64 Feature map of size j represents the jth feature map. The output of P1 is the input of M1 and R1 ; M1 contains two convolutional layers L2 and L3 , which are used to output 64 Feature map of size j represents the jth feature map. The convolution kernel window size ofL2 andL3 is 3×3, the sliding step is 1, the padding attribute Padding is 1, and the output ofM1 is another input ofR1 ;R1 contains an operation that adds the corresponding elements of the output feature layer ofP1 and the output feature layer ofM1 to output 64 Feature map of size j represents the jth feature map. The output of R1 is the input of M2 and R2 ; M2 also contains two convolutional layers L4 and L5 , which are used to output 64 Feature map of size j represents the jth feature map. The convolution kernel window size ofL4 andL5 is 3×3, the sliding step is 1, the padding attribute is 1, and the output ofM2 is another input ofR2 ;R2 contains an operation that adds the corresponding elements of the output feature layer ofR1 and the output feature layer ofM2 to output 64 Feature map of size j represents the jth feature map. The output of R2 is the input of conv3_x;
conv3_x:包括两个卷积模块M3、M4和两个残差模块R3、R4。具体地,M3包含两个卷积层L6和L7,用于输出128个大小的特征图j表示第j个特征图。其中L6的卷积核窗口大小为3×3,滑动步长为2,填充属性Padding为1,L7的卷积核窗口大小为3×3,滑动步长为1,填充属性Padding为1,M3的输出为R3的一个输入;R3的另一个输入为R2输出的特征层X6,R3首先将X6通过一个卷积核窗口大小为1×1,滑动步长为2,填充属性Padding为0的卷积层L8,用于输出128个大小的特征图j表示第j个特征图,然后将L8输出的特征层X8和M3的输出特征层X7进行对应元素的加和操作,用于输出128个大小的特征图R3的输出为R4和M4的输入;M4包含两个卷积层L9和L10,用于输出128个大小的特征图j表示第j个特征图。其中L9和L10的卷积核窗口大小为3×3,滑动步长为1,填充属性Padding为1,M4的输出为R4的另一个输入;R4包含一个将R3的输出特征层和M4的输出特征层的对应元素进行加和的操作,用于输出128个大小的特征图j表示第j个特征图。R4的输出为conv4_x的输入,另外R4的输出即为A、B、C中特征提取网络输出的特征层(i=A,B,C),同时它也是与之对应的FPN多尺度特征融合模块的一个输入;conv3_x: includes two convolution modules M3 , M4 and two residual modules R3 , R4 . Specifically, M3 contains two convolution layers L6 and L7 to output 128 Feature map of size j represents the jth feature map. The convolution kernel window size of L6 is 3×3, the sliding step size is 2, and the padding attribute is 1. The convolution kernel window size of L7 is 3×3, the sliding step size is 1, and the padding attribute is 1. The output of M3 is an input of R3 ; the other input of R3 is the feature layer X6 output by R2. R3 first passes X6 through a convolution layer L8 with a convolution kernel window size of 1×1, a sliding step size of 2, and a padding attribute of 0, which is used to output 128 Feature map of size j represents the jth feature map, and then the feature layerX8 output byL8 and the feature layerX7 output byM3 are summed up to output 128 Feature map of size The output of R3 is the input of R4 and M4 ; M4 contains two convolutional layers L9 and L10 , which are used to output 128 Feature map of size j represents the jth feature map. The convolution kernel window size ofL9 andL10 is 3×3, the sliding step is 1, the padding attribute is 1, and the output ofM4 is another input ofR4 ;R4 contains an operation that adds the corresponding elements of the output feature layer ofR3 and the output feature layer ofM4 to output 128 Feature map of size j represents the jth feature map. The output of R4 is the input of conv4_x. In addition, the output of R4 is the output of the feature extraction network in A, B, and C. Feature layer (i = A, B, C), which is also an input of the corresponding FPN multi-scale feature fusion module;
conv4_x:包括两个卷积模块M5、M6和两个残差模块R5、R6。具体地,M5包括两个卷积层L11和L12,用于输出256个大小的特征图j表示第j个特征图。其中L11的卷积核窗口大小为3×3,滑动步长为2,填充属性Padding为1,L12的卷积核窗口大小为3×3,滑动步长为1,填充属性Padding为1,M5的输出为R5的一个输入;R5的另一个输入为R4输出的特征层X11,R5首先将X11通过一个卷积核窗口大小为1×1,滑动步长为2,填充属性Padding为0的卷积层L13,用于输出256个大小的特征图j表示第j个特征图,然后将L13输出的特征层X13和M5的输出特征层X12进行对应元素的加和操作,用于输出256个大小的特征图j表示第j个特征图。R5的输出为R6和M6的输入;M6包含两个卷积层L14和L15,用于输出256个大小的特征图j表示第j个特征图。其中L14和L15的卷积核窗口大小为3×3,滑动步长为1,填充属性Padding为1的卷积操作,M6的输出为R6的另一个输入;R6包含一个将R5的输出特征层和M6的输出特征层的对应元素进行加和的操作,用于输出256个大小的特征图j表示第j个特征图。R6的输出为conv5_x的输入,另外R6的输出即为A、B、C中特征提取网络输出的特征层(i=A,B,C),同时它也是与之对应的FPN多尺度特征融合模块的一个输入;conv4_x: includes two convolution modules M5 , M6 and two residual modules R5 , R6 . Specifically, M5 includes two convolution layers L11 and L12 for outputting 256 Feature map of size j represents the jth feature map. The convolution kernel window size ofL11 is 3×3, the sliding step size is 2, and the padding attribute is 1. The convolution kernel window size ofL12 is 3×3, the sliding step size is 1, and the padding attribute is 1. The output ofM5 is an input ofR5 ; the other input ofR5 is the feature layerX11 output byR4.R5 first passesX11 through a convolution layerL13 with a convolution kernel window size of 1×1, a sliding step size of 2, and a padding attribute of 0, which is used to output 256 Feature map of size j represents the jth feature map, and then the feature layerX13 output byL13 and the feature layerX12 output byM5 are summed up to output 256 Feature map of size j represents the jth feature map. The output of R5 is the input of R6 and M6 ; M6 contains two convolutional layers L14 and L15 , which are used to output 256 Feature map of size j represents the jth feature map. The convolution kernel window size ofL14 andL15 is 3×3, the sliding step size is 1, and the padding attribute is 1. The output ofM6 is another input ofR6 ;R6 contains an operation that adds the corresponding elements of the output feature layer ofR5 and the output feature layer ofM6 to output 256 Feature map of size j represents the jth feature map. The output of R6 is the input of conv5_x. In addition, the output of R6 is the output of the feature extraction network in A, B, and C. Feature layer (i = A, B, C), which is also an input of the corresponding FPN multi-scale feature fusion module;
conv5_x:包括两个卷积模块M7、M8和两个残差模块R7和R8。M7包括两个卷积层L16和L17,用于输出512个大小的特征图j表示第j个特征图。其中L16的卷积核窗口大小为3×3,滑动步长为2,填充属性Padding为1,L17的卷积核窗口大小为3×3,滑动步长为1,填充属性Padding为1,M7的输出为R7的输入;R7的另一个输入为R6输出特征层X16,R7首先将X16通过一个卷积核窗口大小为1×1,滑动步长为2,填充属性Padding为0的卷积层L18,用于输出512个大小的特征图j表示第j个特征图,然后将L18输出的特征层X18和M7输出的特征层X17进行对应元素的加和操作,用于输出512个大小的特征图R7的输出为R8和M8的输入;M8包含两个卷积层L19和L20,用于输出512个大小的特征图j表示第j个特征图。其中L19和L20的卷积核窗口大小为3×3,滑动步长为1,填充属性Padding为1,M8的输出为R8的另一个输入;R8包含一个将R7的输出特征层和M8的输出特征层的对应元素进行加和的操作,用于输出512个大小的特征图j表示第j个特征图。R8的输出即为A、B、C中特征提取网络输出的特征层(i=A,B,C),同时它也是与之对应的FPN多尺度特征融合模块的一个输入。conv5_x: includes two convolution modulesM7 ,M8 and two residual modulesR7 andR8 .M7 includes two convolution layersL16 andL17 to output 512 Feature map of size j represents the jth feature map. The convolution kernel window size of L16 is 3×3, the sliding step size is 2, and the padding attribute is 1. The convolution kernel window size of L17 is 3×3, the sliding step size is 1, and the padding attribute is 1. The output of M7 is the input of R7 ; another input of R7 is the output feature layer X16 of R6. R7 first passes X16 through a convolution kernel window size of 1×1, a sliding step size of 2, and a padding attribute of 0. The convolution layer L18 is used to output 512 Feature map of size j represents the jth feature map, and then the feature layerX18 output byL18 and the feature layerX17 output byM7 are summed up to output 512 Feature map of size The output ofR7 is the input ofR8 andM8 ;M8 contains two convolutional layersL19 andL20 , which are used to output 512 Feature map of size j represents the jth feature map. The convolution kernel window size ofL19 andL20 is 3×3, the sliding step is 1, the padding attribute is 1, and the output ofM8 is another input ofR8 ;R8 contains an operation that adds the corresponding elements of the output feature layer ofR7 and the output feature layer ofM8 to output 512 Feature map of size j represents the jth feature map. The output of R8 is the output of the feature extraction network in A, B, and C. Feature layer (i = A, B, C), and it is also an input to the corresponding FPN multi-scale feature fusion module.
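The backbone described above is a standard ResNet-18. A minimal PyTorch sketch (using torchvision, assumed available) that exposes the conv3_x, conv4_x and conv5_x outputs fed to the FPN is shown below; instantiating three copies gives the structurally identical but parameter-unshared branches A, B and C.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ResNet18Backbone(nn.Module):
    """Standard ResNet-18 trunk returning the conv3_x, conv4_x and conv5_x outputs
    (128, 256 and 512 channels at 1/8, 1/16 and 1/32 resolution)."""
    def __init__(self):
        super().__init__()
        net = resnet18(weights=None)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)  # conv1 + P1
        self.conv2_x, self.conv3_x = net.layer1, net.layer2
        self.conv4_x, self.conv5_x = net.layer3, net.layer4

    def forward(self, x):
        x = self.conv2_x(self.stem(x))
        c3 = self.conv3_x(x)
        c4 = self.conv4_x(c3)
        c5 = self.conv5_x(c4)
        return c3, c4, c5

# Three structurally identical backbones with unshared parameters:
# amplitude branch A, gradient branch B and CFAR branch C.
backbone_a, backbone_b, backbone_c = (ResNet18Backbone() for _ in range(3))
```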
进一步的,三个特征提取子网络A、B、C使用FPN模块进行多尺度特征融合。FPN多尺度特征融合模块的实现方式、参数设置及对应关系如下:Furthermore, the three feature extraction sub-networks A, B, and C use the FPN module for multi-scale feature fusion. The implementation method, parameter setting, and corresponding relationship of the FPN multi-scale feature fusion module are as follows:
A、B、C中FPN多尺度特征融合模块的一个输入为与之对应的特征提取网络的输出特征层和以特征提取子网络A为例,其对应的FPN模块的输入为和One input of the FPN multi-scale feature fusion module in A, B, and C is the output feature layer of the corresponding feature extraction network and Taking feature extraction subnetwork A as an example, the input of its corresponding FPN module is and
使用的FPN多尺度特征融合模块主要包括三个部分:FPN1模块、FPN2模块和FPN3模块:The FPN multi-scale feature fusion module used mainly consists of three parts: FPN1 module, FPN2 module and FPN3 module:
FPN1模块:FPN1模块的输入为特征层,其特征层大小为该模块的具体实现方式如下:首先将进行一次卷积核的窗口大小为1×1,滑动步长为1,填充属性Padding为0的卷积操作,得到256个大小的特征图然后将得到的特征层X22进行一次上采样操作,得到256个大小的特征图即为FPN1模块的输出。另外FPN1模块的输出为FPN2模块的一个输入;FPN1 module: The input of FPN1 module is The feature layer has a feature size of The specific implementation of this module is as follows: First, Perform a convolution operation with a convolution kernel window size of 1×1, a sliding step size of 1, and a padding attribute of 0, and get 256 Feature map of size Then the obtained feature layer X22 is upsampled once to obtain 256 Feature map of size This is the output of the FPN1 module. In addition, the output of the FPN1 module is an input to the FPN2 module;
FPN2模块:FPN2模块的输入为特征层和FPN1模块的输出特征层X23,和X23的特征层大小均为该模块的具体实现方式如下:首先将进行一次卷积核的窗口大小为1×1,滑动步长为1,填充属性Padding为0的卷积操作,得到256个大小的特征图然后将得到的特征层X24和X23直接相加融合,得到256个大小的特征图最后将得到的特征层X25进行一次上采样操作,得到256个大小的特征图即为FPN2模块的输出。另外FPN2模块的输出为FPN3模块的一个输入;FPN2 module: The input of FPN2 module is Feature layer and the output feature layer of FPN1 module X23 , The feature layer sizes ofX23 are The specific implementation of this module is as follows: First, Perform a convolution operation with a convolution kernel window size of 1×1, a sliding step size of 1, and a padding attribute of 0, and get 256 Feature map of size Then the obtained feature layersX24 andX23 are directly added and fused to obtain 256 Feature map of size Finally, the obtained feature layer X25 is upsampled once to obtain 256 Feature map of size This is the output of the FPN2 module. In addition, the output of the FPN2 module is an input to the FPN3 module;
FPN3模块:FPN3模块的输入特征层为特征层和FPN2模块的输出特征层X26,和X26的大小分别为和该模块的具体实现方式如下:首先将进行一次卷积核的窗口大小为1×1,滑动步长为1,填充属性Padding为0的卷积操作,得到256个大小的特征图然后将得到的特征层X27和X26直接相加融合,得到256个大小的特征图最后将得到的特征层X28再进行一次卷积核的窗口大小为3×3,滑动步长为1,填充属性Padding为1的卷积操作,得到256个大小的特征图即为FPN3模块的输出。另外FPN3模块的输出即为整个FPN模块的输出特征层FPN3 module: The input feature layer of the FPN3 module is Feature layer and the output feature layer of the FPN2 module X26 , and X26 are and The specific implementation of this module is as follows: First, Perform a convolution operation with a convolution kernel window size of 1×1, a sliding step size of 1, and a padding attribute of 0, and get 256 Feature map of size Then the obtained feature layersX27 andX26 are directly added and fused to obtain 256 Feature map of size Finally, the obtained feature layerX28 is subjected to another convolution operation with a convolution kernel window size of 3×3, a sliding step size of 1, and a padding attribute of 1, and 256 Feature map of size This is the output of the FPN3 module. In addition, the output of the FPN3 module is the output feature layer of the entire FPN module.
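A compact sketch of this three-stage FPN is given below; the layer names are illustrative, and the nearest-neighbour upsampling mode is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Top-down FPN as described above: 1x1 lateral convolutions bring the three backbone
    levels to 256 channels, deeper levels are upsampled and added to shallower ones,
    and a final 3x3 convolution produces the single fused output."""
    def __init__(self, in_channels=(128, 256, 512), out_channels=256):
        super().__init__()
        self.lateral3 = nn.Conv2d(in_channels[0], out_channels, kernel_size=1)
        self.lateral4 = nn.Conv2d(in_channels[1], out_channels, kernel_size=1)
        self.lateral5 = nn.Conv2d(in_channels[2], out_channels, kernel_size=1)
        self.smooth = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, c3, c4, c5):
        p5 = self.lateral5(c5)                                           # FPN1: 1x1 lateral conv
        p4 = self.lateral4(c4) + F.interpolate(p5, size=c4.shape[-2:])   # FPN1 upsample + FPN2 add
        p3 = self.lateral3(c3) + F.interpolate(p4, size=c3.shape[-2:])   # FPN2 upsample + FPN3 add
        return self.smooth(p3)                                           # FPN3: final 3x3 conv

fpn = SimpleFPN()
c3, c4, c5 = torch.randn(1, 128, 100, 167), torch.randn(1, 256, 50, 84), torch.randn(1, 512, 25, 42)
fused = fpn(c3, c4, c5)   # (1, 256, 100, 167)
```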
(二)特征处理模块(II) Feature processing module
如图3所示,在本实施例中,特征处理模块包括一个基于交互式注意力机制的T-ICSAF特征融合模块和一个基于GT二值标签监督的空间注意力机制和通道注意力机制结合的CSSCAM模块;其中,As shown in FIG3 , in this embodiment, the feature processing module includes a T-ICSAF feature fusion module based on an interactive attention mechanism and a CSSCAM module combining a spatial attention mechanism and a channel attention mechanism based on GT binary label supervision; wherein,
T-ICSAF特征融合模块用于对特征提取模块的输出特征层和进行融合,得到融合特征F3;The T-ICSAF feature fusion module is used to fusion the output feature layer of the feature extraction module. and Perform fusion to obtain fusion feature F3 ;
CSSCAM模块用于对融合特征F3进行处理,得到目标特征被增强、背景杂波特征被抑制的特征层A3,以作为特征处理模块的输出。The CSSCAM module is used to process the fused feature F3 to obtain a feature layer A3 in which the target feature is enhanced and the background clutter feature is suppressed, as the output of the feature processing module.
请参见图4,图4是本发明实施例提供的T-ICSAF特征融合模块的网络框架图,其中,T-ICSAF特征融合模块主要包括四个子模块,分别为特征预处理子模块、交互式通道注意子模块、交互式空间注意子模块和注意力融合子模块。Please refer to Figure 4, which is a network framework diagram of the T-ICSAF feature fusion module provided by an embodiment of the present invention, wherein the T-ICSAF feature fusion module mainly includes four sub-modules, namely, a feature preprocessing sub-module, an interactive channel attention sub-module, an interactive spatial attention sub-module and an attention fusion sub-module.
在本实施例中,特征预处理子模块用于将特征提取部分的输出特征层和分别进行卷积操作和BN批标准化操作,对应得到三个特征层X30、X31和X32。In this embodiment, the feature preprocessing submodule is used to convert the output feature layer of the feature extraction part into and Convolution operations and BN batch normalization operations are performed respectively, and three feature layers X30 , X31 and X32 are obtained accordingly.
具体的,特征预处理子模块主要采用如下方式实现:Specifically, the feature preprocessing submodule is mainly implemented in the following ways:
将特征提取部分的输出特征层和分别进行一次卷积核的窗口大小为3×3,滑动步长为1,填充属性Padding为1的卷积操作和一次BN批标准化操作,得到3个大小的特征层X30、X31和X32,即为特征预处理子模块的输出。另外特征预处理子模块输出的X30、X31和X32特征层为交互式通道注意子模块和交互式空间注意子模块的输入,特征预处理子模块输出的X30特征层为注意力融合子模块的一个输入。The output feature layer of the feature extraction part and Perform a convolution operation with a convolution kernel window size of 3×3, a sliding step size of 1, a padding attribute of 1, and a BN batch normalization operation to obtain 3 The feature layers X30 , X31 and X32 of different sizes are the outputs of the feature preprocessing submodule. In addition, the feature layers X30 , X31 and X32 output by the feature preprocessing submodule are the inputs of the interactive channel attention submodule and the interactive spatial attention submodule, and the feature layer X30 output by the feature preprocessing submodule is an input of the attention fusion submodule.
进一步的,交互式通道注意子模块用于对特征层X30、X31和X32依次进行全局平均池化、加和、以及Sigmoid归一化操作,得到通道注意权重Fc。Furthermore, the interactive channel attention submodule is used to perform global average pooling, summing, and sigmoid normalization operations on the feature layers X30 , X31 , and X32 in sequence to obtain the channel attention weight Fc .
在本实施例中,交互式通道注意子模块主要采用如下方式实现:In this embodiment, the interactive channel attention submodule is mainly implemented in the following manner:
首先将特征预处理子模块输出的X30、X31和X32特征层分别进行一次空间维度的全局平均池化操作,得到3个1×1×256大小的特征向量X33、X34和X35;然后将得到的X33、X34和X35进行逐元素的加和操作并进行一次Sigmoid归一化操作,得到一个1×1×256大小的通道注意权重Fc,即为交互式通道注意子模块的输出。另外交互式通道注意子模块的输出为注意力融合子模块的一个输入。First, the feature layersX30 ,X31 andX32 output by the feature preprocessing submodule are subjected to a global average pooling operation in the spatial dimension to obtain three feature vectorsX33 ,X34 andX35 of size 1×1×256; then the obtainedX33 ,X34 andX35 are added element by element and subjected to a Sigmoid normalization operation to obtain a channel attention weightFc of size 1×1×256, which is the output of the interactive channel attention submodule. In addition, the output of the interactive channel attention submodule is an input of the attention fusion submodule.
进一步的,交互式空间注意子模块用于对特征层X30、X31和X32依次进行通道维度上的全局平均池化、堆叠(Concat)、卷积以及Sigmoid归一化操作,得到空间注意权重Fs。Furthermore, the interactive spatial attention submodule is used to perform global average pooling, stacking (Concat), convolution and Sigmoid normalization operations on the feature layers X30 , X31 and X32 in the channel dimension in sequence to obtain the spatial attention weight Fs .
在本实施例中,交互式空间注意子模块主要采用如下方式实现:In this embodiment, the interactive spatial attention submodule is mainly implemented in the following manner:
首先将特征预处理子模块输出的X30、X31和X32特征层分别进行一次通道维度上的全局平均池化操作,得到3个大小的特征图X36、X37和X38;然后将得到的X36、X37和X38进行一次Concat操作,得到1个大小的特征图X39;再将得到的X39进行一次卷积核的窗口大小为3×3,滑动步长为1,填充属性Padding为1的卷积操作并进行一次Sigmoid归一化操作,得到一个大小的空间注意权重Fs,即为交互式空间注意子模块的输出。另外交互式空间注意子模块的输出为注意力融合子模块的一个输入。First, theX30 ,X31 , andX32 feature layers output by the feature preprocessing submodule are subjected to a global average pooling operation in the channel dimension to obtain three The feature maps ofsize X36 , X37 and X38 are then concat-operated to obtain1 The size of the feature map X39 is obtained; then the obtained X39 is convolved with a kernel window size of 3×3, a sliding step size of 1, a padding attribute of 1, and a Sigmoid normalization operation, and a The spatial attention weightFs of the size is the output of the interactive spatial attention submodule. In addition, the output of the interactive spatial attention submodule is an input of the attention fusion submodule.
此外,注意力融合子模块用于将通道注意权重Fc与特征预处理子模块输出的特征层X30的每个通道对应相乘,得到特征层F'3;然后将空间注意权重Fs与特征层F'3每个通道的像素逐一对应相乘,得到融合特征层F3,即为T-ICSAF特征融合模块的输出。In addition, the attention fusion submodule is used to multiply the channel attention weightFc with each channel of the feature layerX30 output by the feature preprocessing submodule to obtain the feature layerF'3 ; then the spatial attention weightFs is multiplied with the pixels of each channel of the feature layerF'3 one by one to obtain the fused feature layerF3 , which is the output of the T-ICSAF feature fusion module.
在本实施例中,注意力融合子模块主要采用如下方式实现:In this embodiment, the attention fusion submodule is mainly implemented in the following manner:
首先将交互式通道注意子模块输出的通道注意权重Fc与特征预处理子模块输出的X30的每个通道对应相乘,得到一个大小的特征层F'3;然后将交互式空间注意子模块输出的空间注意权重Fs与特征层F'3每个通道的像素逐一对应相乘,最终得到一个大小的融合特征层F3,即为T-ICSAF特征融合模块的输出。另外T-ICSAF特征融合模块的输出为特征处理模块CSSCAM模块的输入。First, the channel attention weightFc output by the interactive channel attention submodule is multiplied by each channel ofX30 output by the feature preprocessing submodule to obtain a Then the spatial attention weightFs output by the interactive spatial attention submodule is multiplied one by one by the pixels of each channel ofthe feature layerF'3 , and finally a The fusion feature layer F3 of the size is the output of the T-ICSAF feature fusion module. In addition, the output of the T-ICSAF feature fusion module is the input of the feature processing module CSSCAM module.
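The following PyTorch sketch condenses the four T-ICSAF submodules described above into a single module; the tensor layout is (batch, channel, height, width), and the module and variable names are illustrative.

```python
import torch
import torch.nn as nn

class TICSAF(nn.Module):
    """Triple-branch interactive channel-spatial attention fusion: each input is
    pre-processed by a 3x3 conv + BN; the three branches jointly yield a channel
    weight Fc (global average pooling -> sum -> sigmoid) and a spatial weight Fs
    (channel-wise mean -> concat -> 3x3 conv -> sigmoid), and both weights are
    applied to the pre-processed amplitude branch X30."""
    def __init__(self, channels=256):
        super().__init__()
        self.pre = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                          nn.BatchNorm2d(channels))
            for _ in range(3)])
        self.spatial_conv = nn.Conv2d(3, 1, kernel_size=3, padding=1)

    def forward(self, f_amp, f_grad, f_cfar):
        x = [pre(f) for pre, f in zip(self.pre, (f_amp, f_grad, f_cfar))]   # X30, X31, X32
        # interactive channel attention: a 1 x 1 x C weight shared over all positions
        fc = torch.sigmoid(sum(xi.mean(dim=(2, 3), keepdim=True) for xi in x))
        # interactive spatial attention: an H x W x 1 weight shared over all channels
        fs = torch.sigmoid(self.spatial_conv(
            torch.cat([xi.mean(dim=1, keepdim=True) for xi in x], dim=1)))
        return x[0] * fc * fs                                               # fused feature F3

fusion = TICSAF()
f3 = fusion(*(torch.randn(1, 256, 100, 167) for _ in range(3)))
```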
进一步的,请参见图5,图5是本发明实施例提供的CSSCAM模块的网络框架图,其中,CSSCAM模块包括三个子模块,分别为GT二值标签监督的空间注意力机制子模块、SE通道注意力机制子模块和注意力融合子模块;其中,Further, please refer to FIG. 5 , which is a network framework diagram of the CSSCAM module provided by an embodiment of the present invention, wherein the CSSCAM module includes three submodules, namely, a spatial attention mechanism submodule for GT binary label supervision, a SE channel attention mechanism submodule, and an attention fusion submodule; wherein,
GT二值标签监督的空间注意力机制子模块用于对T-ICSAF特征融合模块输出的融合特征层F3依次进行通道维度上的全局平均池化操作和通道维度上的全局最大池化操作、Concat操作、卷积以及Sigmoid归一化操作,得到空间注意权重As。The spatial attention mechanism submodule of GT binary label supervision is used to perform global average pooling operation on the channel dimension, global maximum pooling operation on the channel dimension, Concat operation, convolution and Sigmoid normalization operation on the fusion feature layerF3 output by the T-ICSAF feature fusion module to obtain the spatial attention weightAs .
具体的,GT二值标签监督的空间注意力机制子模块首先将T-ICSAF特征融合模块输出的融合特征F3分别进行一次通道维度上的全局平均池化操作和一次通道维度上的全局最大池化操作,得到2个大小的特征图X40、X41;然后将得到的X40、X41进行一次Concat操作,得到1个大小的特征图X42;再将得到的X42进行一次卷积核的窗口大小为7×7,滑动步长为1,填充属性Padding为3的卷积操作并进行一次Sigmoid归一化操作,得到一个大小的空间注意权重As,即为GT二值标签监督的空间注意力机制子模块的输出。Specifically, the spatial attention mechanism submodule of the GT binary label supervision first performs a global average pooling operation on the channel dimension and a global maximum pooling operation on the channel dimension on the fusion featureF3 output by the T-ICSAF feature fusion module, and obtains two The feature maps ofsize X40 and X41 are then concat- operated to obtain 1 The feature map of sizeX42 is obtained; then the obtainedX42 is convolved with a convolution kernel window size of 7×7, a sliding step size of 1, a padding attribute of 3, and a Sigmoid normalization operation, and a The spatial attention weight Asof size is the output of the spatial attention mechanism submodule of GT binary label supervision.
SE通道注意力机制子模块用于对T-ICSAF特征融合模块输出的融合特征层F3依次进行空间维度的全局平均池化操作和全局最大池化操作、特征向量压缩、映射和解压缩的操作、加和、以及Sigmoid归一化操作,得到通道注意权重Ac。The SE channel attention mechanism submodule is used to perform global average pooling and global maximum pooling operations in the spatial dimension, feature vector compression, mapping and decompression operations, addition, and Sigmoid normalization operations on the fused feature layerF3 output by the T-ICSAF feature fusion module to obtain the channel attention weight Ac .
具体的,SE通道注意力机制模块首先将T-ICSAF特征融合模块输出的融合特征F3分别进行一次空间维度的全局平均池化操作和一次空间维度的全局最大池化操作,得到2个1×1×256大小的特征向量X43、X44;然后将得到的X43、X44分别通过一个有16个神经元的全连接层L21、一个relu激活函数层和一个有256个神经元的全连接层L22进行对特征向量压缩、映射和解压缩的操作,得到通道注意后的2个1×1×256大小的特征向量X45、X46,其中全连接层L21将输入的特征向量压缩成1×1×16大小的特征向量,全连接层L22将输入的特征向量解压缩回1×1×256大小的特征向量;最后将得到的X45、X46进行逐元素的加和操作并进行一次Sigmoid归一化操作,得到一个1×1×256大小的通道注意权重Ac,即为SE通道注意力机制子模块的输出。Specifically, the SE channel attention mechanism module first performs a global average pooling operation in the spatial dimension and a global maximum pooling operation in the spatial dimension on the fusion featureF3 output by the T-ICSAF feature fusion module to obtain two 1×1×256 feature vectorsX43 andX44 ; then the obtainedX43 andX44 are respectively compressed, mapped and decompressed through a fully connected layerL21 with 16 neurons, a relu activation function layer and a fully connected layerL22 with 256 neurons to obtain two 1×1×256 feature vectorsX45 andX46 after channel attention, where the fully connected layerL21 compresses the input feature vector into a feature vector of 1×1×16, and the fully connected layerL22 decompresses the input feature vector back to a feature vector of 1×1×256; finally, the obtainedX45 and X46 are46 performs an element-by-element addition operation and a Sigmoid normalization operation to obtain a channel attention weight Ac of size 1×1×256, which is the output of the SE channel attention mechanism submodule.
注意力融合子模块用于将空间注意权重As与T-ICSAF特征融合模块输出的融合特征层F3的每个通道的像素逐一对应相乘,得到特征层A'3,然后将通道注意权重Ac与特征层A'3的每个通道对应相乘,得到一个目标特征被增强、背景杂波特征被抑制的特征层A3,即为CSSCAM模块的输出。The attention fusion submodule is used to multiply the spatial attention weightAs with the pixels of each channel of the fused feature layerF3 output by the T-ICSAF feature fusion module one by one to obtain the feature layerA'3 , and then multiply the channel attention weightAc with each channel of the feature layerA'3 to obtain a feature layerA3 in which the target features are enhanced and the background clutter features are suppressed, which is the output of the CSSCAM module.
具体的,首先将GT二值标签监督的空间注意力机制子模块输出的空间注意权重As与T-ICSAF特征融合模块输出的融合特征F3的每个通道的像素逐一对应相乘,得到一个大小的特征层A'3;然后再将SE通道注意力机制子模块输出的通道注意权重Ac与特征层A3'的每个通道对应相乘,得到一个目标特征被增强、背景杂波特征被抑制的大小的特征层A3,即为CSSCAM模块的输出。另外CSSCAM模块的输出为网络预测部分的输入。Specifically, first, the spatial attention weight As output by the spatial attention mechanism submodule of the GT binary label supervision is multiplied one by one by thepixels of each channel of the fusion featureF3 output by the T-ICSAF feature fusion module to obtain a Then, the channel attentionweight Acoutput by the SE channel attention mechanism submodule is multiplied by each channel of the feature layerA3 ' to obtain a target feature that is enhanced and background clutter features that are suppressed. The feature layer A3 of size is the output of the CSSCAM module. In addition, the output of the CSSCAM module is the input of the network prediction part.
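A corresponding sketch of the CSSCAM module follows; whether the two pooled vectors pass through shared or separate fully connected layers is not stated in the text, so the shared-weight squeeze used here is an assumption.

```python
import torch
import torch.nn as nn

class CSSCAM(nn.Module):
    """Combined supervised-spatial and SE-channel attention: the spatial weight As is
    produced from channel-wise average/max pooling, a 7x7 convolution and a sigmoid
    (and is supervised by the GT binary label during training); the channel weight Ac
    is an SE-style squeeze (256 -> 16 -> 256) of the spatially average/max pooled
    vectors. Both weights are applied to the fused feature F3."""
    def __init__(self, channels=256, reduction=16):
        super().__init__()
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        self.fc = nn.Sequential(nn.Linear(channels, channels // reduction),
                                nn.ReLU(inplace=True),
                                nn.Linear(channels // reduction, channels))

    def forward(self, f3):
        pooled = torch.cat([f3.mean(dim=1, keepdim=True),
                            f3.amax(dim=1, keepdim=True)], dim=1)       # X40, X41 -> X42
        a_s = torch.sigmoid(self.spatial_conv(pooled))                  # spatial weight As
        squeeze = self.fc(f3.mean(dim=(2, 3))) + self.fc(f3.amax(dim=(2, 3)))
        a_c = torch.sigmoid(squeeze).unsqueeze(-1).unsqueeze(-1)        # channel weight Ac
        a3 = f3 * a_s * a_c                                             # enhanced feature A3
        return a3, a_s   # As is returned so it can be supervised by the GT binary label

cs = CSSCAM()
a3, a_s = cs(torch.randn(1, 256, 100, 167))
```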
此外,需要说明的是,本发明构建的目标检测网络中,还包括一监督标签生成模块,用于生成GT二值标签。In addition, it should be noted that the target detection network constructed by the present invention also includes a supervised label generation module for generating GT binary labels.
在本实施例中,GT二值标签的构造方法如下:In this embodiment, the construction method of the GT binary label is as follows:
a)对训练图像进行真实标注,将目标像素标注为1,背景像素标注为0,得到与训练图像相对应的标注图像;a) Perform true annotation on the training image, annotate the target pixel as 1 and the background pixel as 0, and obtain the annotated image corresponding to the training image;
b)以45为步长,800×1333为大小得到3536张与原始训练集ψ相对应的二值标注切片GT;b) With a step size of 45 and a size of 800×1333, 3536 binary labeled slices GT corresponding to the original training set ψ are obtained;
c)将b)中得到3536张二值标注切片GT下采样三次,得到与送入CSSCAM模块和网络预测部分的特征层同样大小的3536张二值标注切片GT',并保存与二值标注切片GT'对应的.mat文件作为最终的GT二值标签。c) Downsample the 3536 binary labeled slices GT obtained in b) three times to obtain 3536 binary labeled slices GT' of the same size as the feature layer sent to the CSSCAM module and the network prediction part, and save the .mat file corresponding to the binary labeled slice GT' as the final GT binary label.
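As an illustration of step c), the sketch below downsamples one binary annotation slice to the feature-layer resolution and stores it as a .mat file; the target feature size (a 1/8-scale map) and the use of max pooling as the downsampling operator are assumptions, since the text only states that the mask is downsampled three times.

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.io import savemat

def make_gt_binary_label(mask_slice, feat_size=(100, 167)):
    """Downsample a binary annotation slice (target = 1, background = 0) to the spatial
    size of the feature layer fed to CSSCAM and the prediction heads; max pooling keeps
    small vehicle targets from vanishing."""
    m = torch.from_numpy(mask_slice.astype(np.float32))[None, None]   # (1, 1, 800, 1333)
    gt = F.adaptive_max_pool2d(m, output_size=feat_size)
    return gt[0, 0].numpy()

mask = np.zeros((800, 1333), dtype=np.uint8)
mask[300:320, 500:540] = 1                        # a hypothetical vehicle target region
gt_label = make_gt_binary_label(mask)
savemat("gt_0001.mat", {"GT": gt_label})          # saved as the GT binary label file
```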
(三)网络预测模块(III) Network prediction module
请继续参见图3,其中,网络预测模块包括用于目标检测任务的分类分支子模块、回归分支子模块和基于GT二值标签监督的Attention二分类分支子模块;Please continue to refer to Figure 3, where the network prediction module includes a classification branch submodule for target detection tasks, a regression branch submodule, and an Attention binary classification branch submodule based on GT binary label supervision;
分类分支子模块和回归分支子模块分别用于对特征处理模块的输出特征层A3进行预测,对应得到分类得分和边界框回归参数;The classification branch submodule and the regression branch submodule are used to predict the output feature layerA3 of the feature processing module, and obtain the classification score and bounding box regression parameters respectively;
Attention二分类分支子模块用于对用于预测分类得分的特征层CP和用于预测边界框回归参数的特征层RP进行处理,得到二分类得分。The Attention binary classification branch submodule is used to process the feature layer CP used to predict the classification score and the feature layer RP used to predict the bounding box regression parameters to obtain the binary classification score.
在本实施例中,分类分支子模块采用如下方式实现:In this embodiment, the classification branch submodule is implemented in the following manner:
首先对特征处理部分CSSCAM模块输出的特征层A3进行四次卷积操作,得到一个用于预测分类得分的特征层CP;然后对得到的特征层CP依次进行卷积和Sigmoid归一化操作,得到分类分支子模块预测的分类得分图X47。First, the feature layer A3 output by the feature processing part CSSCAM module is convolved four times to obtain a feature layer CP for predicting classification scores; then the obtained feature layer CP is convolved and Sigmoid normalized in turn to obtain the classification score map X47 predicted by the classification branch submodule.
具体的,先将特征处理部分CSSCAM模块输出的特征层A3进行四次卷积核的窗口大小为3×3,滑动步长为1,填充属性Padding为1的卷积操作,得到一个用于预测分类得分的大小的特征层CP;再将得到的特征层CP再进行一次卷积核的窗口大小为3×3,滑动步长为1,填充属性Padding为1的卷积操作并进行一次Sigmoid归一化操作,得到一个大小的分类分支子模块预测的分类得分图X47。Specifically, the feature layerA3 output by the CSSCAM module of the feature processing part is first subjected to four convolution operations with a window size of 3×3, a sliding step size of 1, and a padding attribute of 1, and a convolution operation is obtained for predicting the classification score. The feature layer CP of size is obtained; the obtained feature layer CP is subjected to a convolution operation with a convolution kernel window size of 3×3, a sliding step size of 1, a padding attribute of 1, and a Sigmoid normalization operation, and a Classification score graph of size classification branch submodule prediction X47 .
在本实施例中,回归分支子模块采用如下方式实现:In this embodiment, the regression branch submodule is implemented in the following manner:
首先对特征处理部分CSSCAM模块输出的特征层A3进行四次卷积操作,得到一个用于预测边界框回归参数的特征层RP;然后对得到的特征层RP进行一次卷积操作,得到回归分支子模块预测的边界框回归参数X48。First, the feature layerA3 output by the feature processing part CSSCAM module is convolved four times to obtain a feature layerRp for predicting bounding box regression parameters; then the obtained feature layerRp is convolved once to obtain the bounding box regression parametersX48 predicted by the regression branch submodule.
具体的,先将特征处理部分CSSCAM模块输出的特征层A3进行四次卷积核的窗口大小为3×3,滑动步长为1,填充属性Padding为1的卷积操作,得到一个用于预测边界框回归参数的大小的特征层RP;再将得到的特征层RP再进行一次卷积核的窗口大小为3×3,滑动步长为1,填充属性Padding为1的卷积操作,得到一个大小的回归分支子模块预测的边界框回归参数X48。Specifically, the feature layerA3 output by the CSSCAM module of the feature processing part is first subjected to four convolution operations with a window size of 3×3, a sliding step size of 1, and a padding attribute of 1, and a convolution operation is obtained for predicting the bounding box regression parameters. The feature layer RP of size ; then the obtained feature layer RP is convolved again with a convolution kernel window size of 3×3, a sliding step size of 1, and a padding attribute of 1, and a convolution operation is obtained. The bounding box regression parameters X48 predicted by the regression branch submodule of size.
在本实施例中,Attention二分类分支子模块采用如下方式实现:In this embodiment, the Attention binary classification branch submodule is implemented in the following manner:
首先将分类分支子模块中用于预测分类得分的特征层CP和回归分支子模块中用于预测边界框回归参数的特征层RP进行一次Concat操作,并对得到的特征图X49进行卷积操作,得到特征层AP;最后对特征层AP进行卷积操作和Sigmoid归一化操作,得到Attention二分类分支子模块预测的二分类得分图X50。First, a concat operation is performed on the feature layer CP used to predict the classification score in the classification branch submodule and the feature layer RP used to predict the bounding box regression parameters in the regression branch submodule, and a convolution operation is performed on the obtained feature map X49 to obtain the feature layer AP ; finally, a convolution operation and a Sigmoid normalization operation are performed on the feature layer AP to obtain the binary classification score map X50 predicted by the Attention binary classification branch submodule.
具体的,先将分类分支子模块中用于预测分类得分的特征层CP和回归分支子模块中用于预测边界框回归参数的特征层RP进行一次Concat操作,得到1个大小的特征图X49;再将得到的X49进行一次卷积核的窗口大小为3×3,滑动步长为1,填充属性Padding为1的卷积操作,得到一个大小的特征层AP,最后将得到的特征层AP进行一次卷积核窗口大小为3×3,滑动步长为1填充属性Padding为1的卷积操作并进行一次Sigmoid归一化操作,得到一个大小的Attention二分类分支子模块预测的二分类得分图X50。Specifically, the feature layer CP used to predict the classification score in the classification branch submodule and the feature layer RP used to predict the bounding box regression parameters in the regression branch submodule are concat- ed to obtain 1 The feature map of size X49 is obtained; then the obtained X49 is convolved with a convolution kernel window size of 3×3, a sliding step size of 1, and a padding attribute of 1 to obtain a Finally, the obtained feature layerAP is subjected to a convolutionoperation with a convolution kernel window size of 3×3, a sliding step size of 1, and a padding attribute of 1, and a Sigmoid normalization operation, and a Binary classification score map predicted by the Attention binary classification branch submodule of size X50 .
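The three prediction branches can be sketched as follows; the activations between the tower convolutions and the single foreground class are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

def conv_tower(channels=256, depth=4):
    """Four 3x3 convolutions, as used by both the classification and regression branches."""
    layers = []
    for _ in range(depth):
        layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class PredictionHeads(nn.Module):
    """Classification scores, bounding-box regression parameters and the GT-supervised
    Attention binary branch, which consumes the concatenated tower features CP and RP."""
    def __init__(self, channels=256, num_classes=1):
        super().__init__()
        self.cls_tower, self.reg_tower = conv_tower(channels), conv_tower(channels)
        self.cls_out = nn.Conv2d(channels, num_classes, 3, padding=1)
        self.reg_out = nn.Conv2d(channels, 4, 3, padding=1)
        self.att_conv = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.att_out = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, a3):
        cp, rp = self.cls_tower(a3), self.reg_tower(a3)
        cls_score = torch.sigmoid(self.cls_out(cp))        # classification score map X47
        bbox_reg = self.reg_out(rp)                        # regression parameters X48
        ap = self.att_conv(torch.cat([cp, rp], dim=1))     # Concat -> AP
        att_score = torch.sigmoid(self.att_out(ap))        # binary score map X50
        return cls_score, bbox_reg, att_score

heads = PredictionHeads()
cls_score, bbox_reg, att_score = heads(torch.randn(1, 256, 100, 167))
```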
本发明设计的目标检测网络在特征融合方面,采用了基于三支路交互式注意力机制的特征融合模块T-ICSAF;同时,提出了基于GT二值标签监督的注意力机制模块CSSCAM来有效地抑制背景杂波特征,解决了由于传统特征引入所带来的额外背景杂波特征造成虚警框增加的问题,进一步增强目标特征,让特征提取后的目标特征更具有目标性,从而获得更好的SAR目标检测性能。In terms of feature fusion, the target detection network designed by the present invention adopts a feature fusion module T-ICSAF based on a three-branch interactive attention mechanism. At the same time, an attention mechanism module CSSCAM based on GT binary label supervision is proposed to effectively suppress background clutter features, solve the problem of increased false alarm frames caused by additional background clutter features brought about by the introduction of traditional features, further enhance target features, and make the target features after feature extraction more targeted, thereby obtaining better SAR target detection performance.
此外,本发明根据SAR图像本身具有几何畸变、辐射畸变、遮挡阴影等特点,设计了一个基于GT二值标签监督的Attention二分类分支取代了原始无锚框目标检测网络FCOS中的Centerness分支,让其更加适用于SAR车辆目标检测任务,并能联合本发明提出的CSSCAM模块使用,从而进一步提升SAR目标检测性能。In addition, according to the characteristics of SAR images themselves, such as geometric distortion, radiation distortion, occlusion and shadow, the present invention designs an Attention binary classification branch based on GT binary label supervision to replace the Centerness branch in the original anchor-free target detection network FCOS, making it more suitable for SAR vehicle target detection tasks, and can be used in conjunction with the CSSCAM module proposed in the present invention, thereby further improving the SAR target detection performance.
步骤3:利用新的训练集对目标检测网络进行训练,得到训练好的目标检测网络ψ'。Step 3: Use the new training set to train the target detection network to obtain the trained target detection network ψ'.
Specifically, during network training, the CSSCAM module uses the GT binary label GT' to supervise the spatial attention weight As of the GT-binary-label-supervised spatial attention mechanism submodule, and the loss function uses Focal Loss;
分类分支子模块使用根据无锚框正负样本选择策略得到的正负样本标签进行监督学习,损失函数使用Focal Loss;The classification branch submodule uses the positive and negative sample labels obtained according to the anchor-free positive and negative sample selection strategy for supervised learning, and the loss function uses Focal Loss;
The regression branch submodule uses the positive sample labels obtained according to the anchor-free positive and negative sample selection strategy for supervised learning, and the loss function uses GIoU Loss;
Attention二分类分支子模块使用与根据无锚框正负样本选择策略得到的正样本相同位置处的GT二值标签GT'进行监督学习,其损失函数使用BCE Loss。The Attention binary classification branch submodule uses the GT binary label GT' at the same position as the positive sample obtained according to the anchor-free positive and negative sample selection strategy for supervised learning, and its loss function uses BCE Loss.
具体的,Focal Loss损失函数的表达式如下:Specifically, the expression of the Focal Loss loss function is as follows:
FL(p_t) = -(1 - p_t)^γ log(p_t)
where p_t denotes the probability predicted by the network for the corresponding class label c*, and γ denotes the modulation (focusing) factor, usually set to 2;
For two bounding boxes A and B, the GIoU Loss is expressed as follows:
GIoU = IoU - |C \ (A ∪ B)| / |C|,   L_GIoU = 1 - GIoU
where C denotes the smallest bounding box enclosing both A and B, and IoU denotes the intersection-over-union of A and B;
The BCE Loss is expressed as follows:
L_BCE = -[c* log(p) + (1 - c*) log(1 - p)]
where p denotes the probability predicted by the network that the position is a target (c* = 1).
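The sketch below illustrates the three loss terms used by the branches above. The tensor layouts, the box format (x0, y0, x1, y1), the numerical clamping and the mean reduction are assumptions added for illustration; only the mathematical forms follow the expressions given in the text.

```python
import torch

def focal_loss(p_t: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    # FL(p_t) = -(1 - p_t)^gamma * log(p_t), averaged over samples
    return (-(1.0 - p_t) ** gamma * torch.log(p_t.clamp(min=1e-6))).mean()

def giou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # boxes as (x0, y0, x1, y1); L_GIoU = 1 - GIoU
    x0 = torch.max(pred[:, 0], target[:, 0]); y0 = torch.max(pred[:, 1], target[:, 1])
    x1 = torch.min(pred[:, 2], target[:, 2]); y1 = torch.min(pred[:, 3], target[:, 3])
    inter = (x1 - x0).clamp(min=0) * (y1 - y0).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / union.clamp(min=1e-6)
    # smallest enclosing box C
    cx0 = torch.min(pred[:, 0], target[:, 0]); cy0 = torch.min(pred[:, 1], target[:, 1])
    cx1 = torch.max(pred[:, 2], target[:, 2]); cy1 = torch.max(pred[:, 3], target[:, 3])
    area_c = (cx1 - cx0) * (cy1 - cy0)
    giou = iou - (area_c - union) / area_c.clamp(min=1e-6)
    return (1.0 - giou).mean()

def bce_loss(p: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
    # L_BCE = -[c* log(p) + (1 - c*) log(1 - p)]
    p = p.clamp(1e-6, 1.0 - 1e-6)
    return (-(label * torch.log(p) + (1.0 - label) * torch.log(1.0 - p))).mean()
```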
Further, in this embodiment, the positive and negative sample labels are obtained with the following anchor-free positive/negative sample selection strategy (a code sketch of this strategy is given after the list):
a) For any position (x, y) on the prediction feature layer, map it back onto the original image according to the downsampling stride s, obtaining the corresponding position (x', y') on the original image (in the FCOS convention, x' = ⌊s/2⌋ + x·s and y' = ⌊s/2⌋ + y·s);
b) Determine whether (x', y') falls inside any target horizontal box. For each of the M target horizontal boxes in the original image (M is the number of targets), first compute the minimum of the distances from (x', y') to its four sides. For any target horizontal box B_t = (x_t^0, y_t^0, x_t^1, y_t^1), given by its top-left and bottom-right corners, the distances l_t, r_t, t_t, b_t from (x', y') to the four sides of B_t are computed as:
l_t = x' - x_t^0,   t_t = y' - y_t^0,   r_t = x_t^1 - x',   b_t = y_t^1 - y';
(x', y') falls inside B_t only if min(l_t, r_t, t_t, b_t) > 0.
Then the position (x, y) on the prediction feature layer corresponding to (x', y') is assigned to the positive or negative samples, and its labels are defined, according to the following three cases:
b1) If min(l_t, r_t, t_t, b_t) ≤ 0 for every target horizontal box, i.e. (x', y') does not fall inside any target horizontal box, then the corresponding position (x, y) on the prediction feature layer is assigned to the negative samples, its class label is defined as 0 and its regression label is defined as -1.
b2) If there is exactly one target horizontal box B_i with min(l_i, r_i, t_i, b_i) > 0, then (x', y') falls inside only that box. It is then further judged, according to the scale regression range of this prediction feature layer, whether the corresponding position (x, y) should predict B_i: first compute the maximum of the distances from (x', y') to the four sides of B_i, m_i = max(l_i, r_i, t_i, b_i), and compare it with the size regression range [s_min, s_max] set for this prediction feature layer. If s_min ≤ m_i ≤ s_max, the corresponding position (x, y) is assigned to the positive samples, its class label is defined as the category of B_i and its regression label is defined as the distances (l_i, t_i, r_i, b_i) to the four sides of B_i; otherwise, the corresponding position (x, y) is assigned to the negative samples, i.e. position (x, y) is considered not to predict B_i, its class label is defined as 0 and its regression label is defined as -1.
b3) If there are N target horizontal boxes with min(l_t, r_t, t_t, b_t) > 0, then (x', y') falls inside N target horizontal boxes. Similar to case b2), it is further judged according to the scale regression range of this prediction feature layer which of these boxes, if any, the corresponding position (x, y) should predict: first compute, for each of the N boxes, the maximum of the distances from (x', y') to its four sides, and compare these maxima with the size regression range [s_min, s_max] set for this prediction feature layer. If t (t ≥ 2) of the boxes satisfy the range condition, the corresponding position (x, y) is associated with t target horizontal boxes. Since the fully convolutional one-stage object detection (FCOS) algorithm assumes that target horizontal boxes of larger area should be predicted by positions on deeper-scale prediction feature layers, position (x, y) is finally associated with the box B_j of smallest area among the t boxes: (x, y) is assigned to the positive samples of B_j, its class label is defined as the category of B_j, and its regression label is defined as the distances to the four sides of B_j. If exactly one box B_k satisfies the range condition, the corresponding position (x, y) is associated with B_k only: (x, y) is assigned to the positive samples of B_k, its class label is defined as the category of B_k, and its regression label is defined as the distances to the four sides of B_k. Otherwise, the corresponding position (x, y) is assigned to the negative samples, i.e. it is considered not to predict any target horizontal box, its class label is defined as 0 and its regression label is defined as -1.
c)重复a)、b)操作,定义任一特征层上所有位置处的正负样本及其标签;c) Repeat operations a) and b) to define positive and negative samples and their labels at all positions on any feature layer;
d)重复a)、b)、c)操作,定义所有特征层上所有位置处的正负样本及其标签;d) Repeat operations a), b), and c) to define positive and negative samples and their labels at all positions on all feature layers;
需要说明的是,本实施例仅使用特征处理部分CSSCAM融合模块输出的A3进行网络预测,所以不需要进行d)的操作。另外,本实施例将A3预测特征层的尺寸回归范围设置为[-1,128]。It should be noted that this embodiment only uses A3 output by the CSSCAM fusion module of the feature processing part for network prediction, so there is no need to perform operation d). In addition, this embodiment sets the size regression range of the A3 prediction feature layer to [-1,128].
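The following is an illustrative sketch of the anchor-free positive/negative sample assignment of steps a)–d) for a single prediction feature layer. The stride-based mapping back to the image, the box format (x0, y0, x1, y1) and the "smallest-area box wins" tie-break follow the FCOS convention referred to above; the function and variable names are illustrative assumptions.

```python
import numpy as np

def assign_labels(h, w, stride, boxes, classes, s_min, s_max):
    """boxes: (M, 4) horizontal GT boxes (x0, y0, x1, y1); classes: (M,) class ids (>0)."""
    cls_labels = np.zeros((h, w), dtype=np.int64)        # 0 = negative sample
    reg_labels = -np.ones((h, w, 4), dtype=np.float32)   # -1 = no regression target
    for y in range(h):
        for x in range(w):
            # a) map the feature-layer position back onto the original image
            px = stride // 2 + x * stride
            py = stride // 2 + y * stride
            candidates = []
            for i, (x0, y0, x1, y1) in enumerate(boxes):
                # b) distances to the four sides of the box
                l, t = px - x0, py - y0
                r, b = x1 - px, y1 - py
                if min(l, t, r, b) <= 0:          # b1) position lies outside this box
                    continue
                if s_min <= max(l, t, r, b) <= s_max:   # b2)/b3) scale regression range
                    area = (x1 - x0) * (y1 - y0)
                    candidates.append((area, i, (l, t, r, b)))
            if candidates:                         # b3) smallest-area box wins ties
                _, i, ltrb = min(candidates)
                cls_labels[y, x] = classes[i]
                reg_labels[y, x] = ltrb
    return cls_labels, reg_labels
```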
步骤4:将新的测试集输入到训练好的目标检测网络ψ'中,得到初步的目标检测结果。Step 4: Input the new test set into the trained object detection network ψ' to obtain preliminary object detection results.
41) An amplitude test slice from the new test set, together with its corresponding gradient amplitude test slice and CFAR binary test slice, is fed into the three feature extraction subnetworks of the trained target detection network for testing, yielding the classification score, the bounding box regression parameters and the binary classification score at each position of the feature layer (see the sketch after this list); the amplitude test slice is one of the test slices in the original test set T;
42)将每个位置的分类得分和二分类得分相乘并开根号,以作为该位置最终的目标检测得分,并与预设的得分阈值进行比较:42) Multiply the classification score and the binary classification score of each position and take the square root to obtain the final target detection score of the position, and compare it with the preset score threshold:
若特征层中任意位置处的目标检测得分小于预设得分阈值,则丢弃该位置处所预测的检测框;If the target detection score at any position in the feature layer is less than the preset score threshold, the predicted detection box at that position is discarded;
否则,将特征层上的该位置根据下采样倍数映射回该测试切片原图上作为最终检测框的中心,并综合边界框回归参数,以得到该位置的目标检测结果;Otherwise, the position on the feature layer is mapped back to the original image of the test slice according to the downsampling multiple as the center of the final detection box, and the bounding box regression parameters are integrated to obtain the target detection result at this position;
43)将特征层中的每个位置重复步骤42)的操作,得到初步的目标检测结果。43) Repeat step 42) for each position in the feature layer to obtain preliminary target detection results.
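The sketch below illustrates steps 41)–43): the classification score and the binary classification score are fused as the square root of their product, positions below the score threshold are discarded, and a box is decoded at each remaining position from the regression parameters (l, t, r, b). The stride value and the threshold value are assumptions added for illustration.

```python
import numpy as np

def decode_detections(cls_score, attn_score, reg, stride=8, score_thr=0.5):
    """cls_score, attn_score: (H, W); reg: (H, W, 4) with (l, t, r, b)."""
    detections = []
    score = np.sqrt(cls_score * attn_score)      # final target detection score
    ys, xs = np.nonzero(score >= score_thr)      # keep positions above the threshold
    for y, x in zip(ys, xs):
        px = stride // 2 + x * stride            # map the position back onto the slice
        py = stride // 2 + y * stride
        l, t, r, b = reg[y, x]
        detections.append((px - l, py - t, px + r, py + b, score[y, x]))
    return detections
```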
步骤5:将初步的目标检测结果对应到测试图像上,并进行NMS操作以去除重叠的目标检测框,得到最终的目标检测结果。Step 5: Map the preliminary target detection results to the test image and perform NMS operation to remove overlapping target detection boxes to obtain the final target detection results.
具体的,根据步骤1构建测试集时得到的Loc将每个测试切片的目标检测结果对应回测试图像上,并进行NMS操作去除重叠的目标检测框。重复步骤4和5的操作,从而得到最终的目标检测结果。Specifically, the target detection result of each test slice is mapped back to the test image according to the Loc obtained when constructing the test set in step 1, and the NMS operation is performed to remove the overlapping target detection boxes. The operations of steps 4 and 5 are repeated to obtain the final target detection result.
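A standard greedy NMS sketch of the kind used in step 5 to remove overlapping detection boxes is given below; the IoU threshold value is an assumption.

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    """boxes: (N, 4) as (x0, y0, x1, y1); scores: (N,). Returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        x0 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y0 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x1 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y1 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / np.clip(area_i + area_r - inter, 1e-6, None)
        order = rest[iou <= iou_thr]   # drop boxes that overlap the kept box too much
    return keep
```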
Starting from the perspective of enhancing target features, the anchor-free SAR target detection method based on a supervised attention mechanism provided by the present invention introduces gradient amplitude information and CFAR information and constructs an anchor-free target detection network fusing gradient information and CFAR information, accomplishing the ground SAR target detection task in complex scenes. It effectively alleviates the computational complexity and the imbalance between positive and negative samples inherent to anchor-based target detection networks, and improves SAR target detection performance.
为了进一步验证本发明提出的基于有监督注意力机制的无锚框SAR目标检测方法的有效性,本实施例还将其在MiniSAR数据图像上进行了检测。In order to further verify the effectiveness of the anchor-free SAR target detection method based on the supervised attention mechanism proposed in the present invention, this embodiment also detects it on the MiniSAR data image.
请参见图6-11,图6-11是本发明实验所使用的MiniSAR数据图像;表1给出了本发明所提方法与现阶段目标检测性能较好的CFAR-Guided-EfficientDet SAR图像目标检测方法(简称CFAR-Guided-EfficientDet,出自论文《SAR图像目标检测与鉴别方法研究》,西安电子科技大学博士论文,王宁,2021)和基于CFAR指导的双流SSD SAR图像目标检测方法(简称ICSAF-CFAR-SSD,出自论文《结合恒虚警检测与深层网络的SAR目标检测研究》,西安电子科技大学硕士论文,唐天顾,2022)在图6-11所示的MiniSAR数据图像上的车辆目标检测性能指标。Please refer to Figures 6-11, which are the MiniSAR data images used in the experiments of the present invention; Table 1 gives the vehicle target detection performance indicators of the method proposed in the present invention and the CFAR-Guided-EfficientDet SAR image target detection method with better target detection performance at this stage (referred to as CFAR-Guided-EfficientDet, from the paper "Research on SAR Image Target Detection and Identification Methods", doctoral dissertation of Xidian University, Wang Ning, 2021) and the CFAR-guided dual-stream SSD SAR image target detection method (referred to as ICSAF-CFAR-SSD, from the paper "Research on SAR Target Detection Combining Constant False Alarm Detection and Deep Network", master's thesis of Xidian University, Tang Tiangu, 2022) on the MiniSAR data images shown in Figures 6-11.
表1不同检测网络在图6-11所示的MiniSAR数据图像上的目标检测性能对比Table 1 Comparison of target detection performance of different detection networks on the MiniSAR data images shown in Figures 6-11
In Table 1, Pre denotes precision, i.e. the percentage of detected boxes that correspond to true targets; Rec denotes recall, i.e. the percentage of true targets that are correctly detected; F1-Score is the harmonic mean of precision and recall, and AP is the average precision; F1-Score and AP are overall indicators that jointly account for the precision Pre and the recall Rec.
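For reference, the standard definitions of these indicators in terms of true positives (TP), false positives (FP) and false negatives (FN) are given below; AP is computed as the area under the precision-recall curve.

$$\mathrm{Pre}=\frac{TP}{TP+FP},\qquad \mathrm{Rec}=\frac{TP}{TP+FN},\qquad \mathrm{F1}=\frac{2\cdot\mathrm{Pre}\cdot\mathrm{Rec}}{\mathrm{Pre}+\mathrm{Rec}}$$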
As can be seen from Table 1, the SAR target detection performance of the proposed method on the six MiniSAR data images shown in Figures 6-11 is better than that of the CFAR-Guided-EfficientDet detection network in every case; on Image 1, Image 2, Image 3 and Image 4 the proposed method outperforms the ICSAF-CFAR-SSD detection network on all indicators; on Image 5, except that AP is comparable to that of the ICSAF-CFAR-SSD detection network, all other indicators are clearly higher; and on Image 6, although Pre and F1-Score are lower than those of the ICSAF-CFAR-SSD detection network, both Rec and AP are higher.
总的来看,本发明所提方法在召回率方面明显优于现阶段目标检测性能相对较好的两种SAR目标检测方法在复杂场景下车辆目标上的检测性能,能够检测出更多的目标,同时还能保证目标检测精确率、F1-Score和AP也相对更好。所以,通过上述试验分析,本发明所提方法获得了比现阶段目标检测性能相对较好的SAR目标检测方法更好的SAR目标检测性能,充分验证了本发明所提方法的有效性和优越性。In general, the method proposed in the present invention is significantly better than the two SAR target detection methods with relatively good target detection performance at the current stage in terms of recall rate on vehicle targets in complex scenes, and can detect more targets while ensuring that the target detection accuracy, F1-Score and AP are also relatively better. Therefore, through the above experimental analysis, the method proposed in the present invention has achieved better SAR target detection performance than the SAR target detection methods with relatively good target detection performance at the current stage, which fully verifies the effectiveness and superiority of the method proposed in the present invention.
以上内容是结合具体的优选实施方式对本发明所作的进一步详细说明,不能认定本发明的具体实施只局限于这些说明。对于本发明所属技术领域的普通技术人员来说,在不脱离本发明构思的前提下,还可以做出若干简单推演或替换,都应当视为属于本发明的保护范围。The above contents are further detailed descriptions of the present invention in combination with specific preferred embodiments, and it cannot be determined that the specific implementation of the present invention is limited to these descriptions. For ordinary technicians in the technical field to which the present invention belongs, several simple deductions or substitutions can be made without departing from the concept of the present invention, which should be regarded as falling within the scope of protection of the present invention.