CN116363504A - Anchor-free SAR target detection method based on a supervised attention mechanism

Anchor-free SAR target detection method based on a supervised attention mechanism

Info

Publication number
CN116363504A
Authority
CN
China
Prior art keywords
feature
module
target detection
attention
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310153318.2A
Other languages
Chinese (zh)
Other versions
CN116363504B (en)
Inventor
王英华
邹树岭
黄瀚洋
刘宏伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202310153318.2A
Publication of CN116363504A
Application granted
Publication of CN116363504B
Status: Active
Anticipated expiration

Abstract

Translated from Chinese

The invention discloses an anchor-free SAR target detection method based on a supervised attention mechanism, comprising: obtaining an original training set and an original test set from original SAR images; extracting gradient information and CFAR information from the original training set and the original test set respectively, and constructing a new training set and a new test set from them; building an anchor-free SAR target detection network based on gradient information, CFAR information fusion and a supervised attention mechanism; training the target detection network with the new training set; feeding the new test set into the trained target detection network to obtain preliminary target detection results; and mapping the preliminary detection results back onto the test image and performing an NMS operation to remove overlapping detection boxes, yielding the final target detection results. The method effectively alleviates the computational complexity and the positive/negative sample imbalance inherent in anchor-based target detection networks and improves SAR target detection performance.

Description

Translated from Chinese
Anchor-free SAR target detection method based on a supervised attention mechanism

Technical Field

The present invention belongs to the technical field of radar target detection, and in particular relates to an anchor-free SAR target detection method based on a supervised attention mechanism.

Background Art

Synthetic Aperture Radar (SAR) can operate day and night under any weather conditions, and with this all-day, all-weather capability it has been widely applied in fields such as military reconnaissance, resource exploration, environmental protection, disaster prevention and scientific research. SAR automatic target recognition (ATR) technology aims to locate and identify targets in complex SAR scenes; it is one of the core directions of SAR image application and plays a vital role in both military and civilian domains. The Lincoln Laboratory in the United States was the first to carry out research in this area and, in the late 1980s, proposed the well-known three-stage SAR ATR processing flow of target detection, discrimination and recognition, which remains the most common processing flow for SAR image interpretation. The SAR target detection task is therefore an important part of SAR image interpretation.

Since the advent of synthetic aperture radar, SAR target detection techniques have developed rapidly. In the early stage, most researchers studied traditional detection methods, the best known being the two-parameter constant false alarm rate (CFAR) detection method proposed by the Lincoln Laboratory in the 1990s, which successfully extended the CFAR detector to two-dimensional SAR image target detection. Because the two-parameter CFAR algorithm performs well in some simple scenes, a large body of work grew up around constant false alarm detection, producing detectors such as smallest-of CFAR, greatest-of CFAR, cell-averaging CFAR and order-statistic CFAR. Although traditional target detection methods can achieve good performance in some simple scenes, they still have major drawbacks: (1) they usually involve many hyperparameters and require thresholds to be set, so in practice the parameters must be tuned manually for each scene; the process is cumbersome, adaptive detection is hard to achieve, and detection precision is low; (2) traditional methods usually detect pixel by pixel, so detection is slow and performance in complex scenes is poor. Traditional target detection methods therefore no longer meet practical needs.

In recent years, with the great success of deep networks in optical image target detection, deep-learning-based SAR target detection methods have become a research hotspot and are now widely used in SAR target detection tasks, achieving performance far superior to traditional methods. For example, patent CN202210269829.6 provides a CFAR-guided dual-stream SSD SAR image target detection method: SAR amplitude features and CFAR indicator features are fused in feature space to fully exploit the strong scattering characteristics of targets in SAR images and enhance detection performance; a CFAR binary indicator map is then used to make the detector focus on learning hard negative samples and positive samples; finally, an AR-NMS algorithm is proposed to improve the traditional NMS algorithm and further raise SAR target detection performance.

For SAR ground target data, however, the amount of data is small and the scenes to be detected are complex, usually containing a large amount of natural and man-made clutter that closely resembles the target characteristics, which makes ground target detection very difficult; research on SAR ground target detection is correspondingly scarce. Most mainstream deep-network SAR ground target detection methods are anchor-based: they require a large number of anchor boxes to be laid out in advance, involve many hyperparameter settings and complex IoU calculations, and suffer from a severe imbalance between positive and negative samples, all of which is unfavorable for SAR target detection. In addition, existing methods do not take into account the geometric distortion, radiometric distortion and shadow occlusion peculiar to SAR images, nor how the intrinsic characteristics of SAR images can help detection, so current SAR target detection methods perform poorly in complex scenes.

Summary of the Invention

To solve the above problems in the prior art, the present invention provides an anchor-free SAR target detection method based on a supervised attention mechanism. The technical idea of the invention is as follows: the Ratio of Exponentially Weighted Averages (ROEWA) edge detection algorithm and the two-parameter CFAR algorithm are applied to the training samples to obtain the gradient information and CFAR information of each training sample, which are fed together into the constructed anchor-free SAR target detection network based on gradient information, CFAR information fusion and a supervised attention mechanism for training; the test samples are then processed in the same way and input into the trained network model to obtain the final target detection results. The technical problem to be solved by the present invention is achieved through the following technical solutions:

A method for anchor-free SAR target detection based on a supervised attention mechanism, comprising:

Step 1: obtaining an original training set and an original test set from original SAR images; using the ROEWA edge detection algorithm and the two-parameter CFAR algorithm to extract gradient information and CFAR information from the original training set and the original test set respectively, and constructing a new training set and a new test set from them;

Step 2: constructing an anchor-free SAR target detection network based on gradient information, CFAR information fusion and a supervised attention mechanism, wherein the target detection network comprises a feature extraction module, a feature processing module and a network prediction module;

Step 3: training the target detection network with the new training set to obtain a trained target detection network;

Step 4: inputting the new test set into the trained target detection network to obtain preliminary target detection results;

Step 5: mapping the preliminary target detection results onto the test image and performing an NMS operation to remove overlapping target detection boxes to obtain the final target detection results.
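For illustration only, the overlap removal in Step 5 can be sketched as a standard greedy NMS over axis-aligned boxes; the box format [x1, y1, x2, y2], the IoU threshold and the NumPy implementation below are assumptions rather than the patent's exact procedure.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; boxes are [x1, y1, x2, y2] arrays."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]            # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]                          # current highest-scoring box
        keep.append(int(i))
        # intersection of box i with the remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]  # drop boxes that overlap too much
    return keep
```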

Beneficial effects of the present invention:

1. Starting from the perspective of enhancing target features, the anchor-free SAR target detection method based on a supervised attention mechanism provided by the present invention introduces gradient magnitude information and CFAR information and constructs an anchor-free target detection network based on the fusion of gradient information and CFAR information. It accomplishes ground SAR target detection in complex scenes, effectively alleviates the computational complexity and positive/negative sample imbalance inherent in anchor-based target detection networks, and improves SAR target detection performance.

2. For feature fusion, the target detection network designed by the present invention adopts a Triple-Interactive Channel-Spatial Attention Fusion (T-ICSAF) module based on an interactive attention mechanism. At the same time, a module Combining Supervised-Spatial And SE-Channel Attention Mechanisms (CSSCAM), supervised by Ground Truth (GT) binary labels, is proposed to effectively suppress background clutter features. This solves the problem of additional false alarm boxes caused by the extra background clutter features introduced along with the traditional features, further enhances the target features and makes the extracted features more target-oriented, thereby achieving better SAR target detection performance.

3. Considering that SAR images exhibit geometric distortion, radiometric distortion and shadow occlusion, the present invention designs an Attention binary classification branch supervised by GT binary labels to replace the Centerness branch of the original anchor-free target detection network FCOS, making the network more suitable for SAR vehicle target detection; this branch can be used jointly with the proposed CSSCAM module, further improving SAR target detection performance.

The present invention is described in further detail below with reference to the accompanying drawings and embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow chart of the anchor-free SAR target detection method based on a supervised attention mechanism provided by an embodiment of the present invention;

FIG. 2 is another flow chart of the anchor-free SAR target detection method based on a supervised attention mechanism provided by an embodiment of the present invention;

FIG. 3 is a framework diagram of the anchor-free SAR target detection network provided by an embodiment of the present invention;

FIG. 4 is a network framework diagram of the T-ICSAF feature fusion module provided by an embodiment of the present invention;

FIG. 5 is a network framework diagram of the CSSCAM module provided by an embodiment of the present invention;

FIGS. 6-11 are MiniSAR data images used in the experiments of the present invention.

DETAILED DESCRIPTION

The present invention is described in further detail below with reference to specific embodiments, but the embodiments of the present invention are not limited thereto.

Embodiment 1

Referring to FIGS. 1-2, FIG. 1 is a flow chart of the anchor-free SAR target detection method based on a supervised attention mechanism provided by an embodiment of the present invention, and FIG. 2 is another flow chart of the same method. The method provided by the present invention specifically comprises the following steps.

Step 1: obtain an original training set and an original test set from original SAR images; use the ROEWA edge detection algorithm and the two-parameter CFAR algorithm to extract gradient information and CFAR information from the original training set and the original test set respectively, and construct a new training set and a new test set from them.

1. Constructing the new training set

First, several original SAR images are selected as training images and sliced to obtain a number of training slices, which form the original training set ψ.

Then, the ROEWA edge detection algorithm and the two-parameter CFAR algorithm are applied to the selected training images to obtain gradient magnitude training images and CFAR binary training images.

Finally, the gradient magnitude training images and the CFAR binary training images are sliced to obtain gradient magnitude training slices and CFAR binary training slices corresponding to the original training set ψ, which together with ψ form the new training set ψ'.

Optionally, as one implementation, this embodiment selects five images from the MiniSAR data set as training images and, with a step of 45 and a slice size of 800×1333, obtains 3536 training slices that form the original training set ψ. The ROEWA edge detection algorithm and the two-parameter CFAR algorithm are then applied to the selected training images to obtain gradient magnitude training images and CFAR binary training images, which are sliced with the same step of 45 and slice size of 800×1333 to obtain the gradient magnitude training slices and CFAR binary training slices corresponding to ψ. Together with the original training set ψ, these form the final training set ψ'.
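As a rough sketch of how the overlapping 800×1333 slices could be produced, a sliding-window cutter is shown below; the function name, the edge handling and the NumPy usage are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def slice_image(image, slice_h=800, slice_w=1333, step=45):
    """Cut a large SAR image into overlapping slices with a fixed step.

    Returns the slices and their top-left positions, which are needed later
    to map slice-level detections back onto the full image.
    """
    H, W = image.shape[:2]
    slices, positions = [], []
    # if the image is smaller than the slice in one dimension, a single (smaller) slice is taken
    for top in range(0, max(H - slice_h, 0) + 1, step):
        for left in range(0, max(W - slice_w, 0) + 1, step):
            slices.append(image[top:top + slice_h, left:left + slice_w])
            positions.append((top, left))
    return slices, positions
```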

2. Constructing the new test set

First, one original SAR image is selected as the test image and sliced to obtain a number of test slices, which form the original test set T.

Then, the ROEWA edge detection algorithm and the two-parameter CFAR algorithm are applied to the selected test image to obtain a gradient magnitude test image and a CFAR binary test image.

Finally, the gradient magnitude test image and the CFAR binary test image are sliced to obtain gradient magnitude test slices and CFAR binary test slices corresponding to the original test set T, which together with T form the new test set T'.

Optionally, as one implementation, one image is selected from the MiniSAR data set as the test image and, with a step of 150 and a slice size of 800×1333, 63 test slices are obtained to form the original test set T, and the positional relationship Loc between each test slice and the test image is recorded. The ROEWA edge detection algorithm and the two-parameter CFAR algorithm are then applied to the selected test image to obtain a gradient magnitude test image and a CFAR binary test image, which are sliced with the same step of 150 and slice size of 800×1333 to obtain the gradient magnitude test slices and CFAR binary test slices corresponding to T. Together with the original test set T, these form the final test set T'.

Furthermore, in constructing both the training set and the test set, the ROEWA edge detection algorithm is used to obtain the gradient magnitude image of a SAR image. The procedure is the same in both cases and consists of the following steps:

1) Input the original SAR image.

2) For any pixel (i, j) in the input SAR image, first use the ROEWA algorithm to compute its horizontal gradient in the horizontal direction and its vertical gradient in the vertical direction, and then combine the two directional gradients to obtain the gradient magnitude Gi,j of pixel (i, j).

Specifically, in this embodiment, the ROEWA algorithm computes the horizontal gradient and the vertical gradient of any pixel (i, j) as follows:

a. Computing the horizontal gradient

For any pixel (i, j), first compute the exponentially weighted means ML and MR of the amplitude values I(·) of the pixels within the (4σ+1)×2σ windows on the left and right sides of the pixel, where σ is the exponential weighting factor and I(·) denotes the amplitude value of a pixel in the SAR image; the horizontal gradient is then obtained by taking the quotient of ML and MR and taking the logarithm.

b. Computing the vertical gradient

For any pixel (i, j), first compute the exponentially weighted means MT and MB of the amplitude values I(·) of the pixels within the (4σ+1)×2σ windows above and below the pixel, where σ is the exponential weighting factor; the vertical gradient is then obtained by taking the quotient of MT and MB and taking the logarithm.

3) Repeat operation 2) for every pixel of the input SAR image to obtain its gradient magnitude, thereby obtaining the gradient magnitude image corresponding to the original SAR image.
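A simplified illustration of this ROEWA-style gradient magnitude computation is sketched below; the one-dimensional exponential weighting, the wrap-around border handling via np.roll and the root-sum-of-squares combination of the two directional gradients are simplifying assumptions and do not reproduce the patent's exact formulas.

```python
import numpy as np

def roewa_gradient_magnitude(img, sigma=2):
    """Simplified ROEWA-style gradient magnitude for a single-channel SAR amplitude image."""
    img = img.astype(np.float64)
    eps = 1e-6
    offsets = np.arange(1, 2 * sigma + 1)   # distances covered on each side of the pixel
    weights = np.exp(-offsets / sigma)      # exponential weights (assumed form)

    def one_sided_mean(axis, direction):
        """Exponentially weighted mean of the neighbours on one side (direction = +1 or -1)."""
        acc = np.zeros_like(img)
        for w, d in zip(weights, offsets):
            # np.roll wraps around at the image border; a real implementation would pad instead
            acc += w * np.roll(img, -direction * d, axis=axis)
        return acc / weights.sum()

    m_left, m_right = one_sided_mean(1, -1), one_sided_mean(1, +1)
    m_top, m_bottom = one_sided_mean(0, -1), one_sided_mean(0, +1)

    gh = np.abs(np.log((m_left + eps) / (m_right + eps)))   # horizontal gradient (log-ratio)
    gv = np.abs(np.log((m_top + eps) / (m_bottom + eps)))   # vertical gradient (log-ratio)
    return np.sqrt(gh ** 2 + gv ** 2)                       # combined gradient magnitude
```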

Furthermore, in constructing the training set and the test set, the two-parameter CFAR algorithm is also used to obtain the CFAR binary image of a SAR image, with the following steps:

1) Input the original SAR image.

2) For any pixel (i, j), its amplitude Ii,j is used to define the CFAR detection statistic Di,j = (Ii,j − μ̂) / σ̂, where μ̂ and σ̂ are the maximum likelihood estimates of the mean and standard deviation of the Gaussian distribution, computed from the amplitude values xi of the background clutter pixels around pixel (i, j).

3) Compare the CFAR detection statistic Di,j computed in 2) with the CFAR detection threshold T. If Di,j > T, the pixel is judged to be a target and the CFAR binary detection result is 1, i.e. the corresponding position in the CFAR binary image is set to 1; otherwise the pixel is judged to be background and the CFAR binary detection result is 0, i.e. the corresponding position in the CFAR binary image is set to 0.

4) Repeat operations 2) and 3) for every pixel of the input SAR image to obtain the CFAR binary detection result of each pixel, thereby obtaining the CFAR binary image corresponding to the original SAR image.
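A minimal sliding-window sketch of this two-parameter CFAR detection is given below; the guard/background window sizes, the threshold value and the brute-force per-pixel looping are illustrative assumptions.

```python
import numpy as np

def two_parameter_cfar(img, guard=10, background=20, threshold=3.0):
    """Two-parameter CFAR: normalize each pixel by local clutter mean/std and threshold."""
    H, W = img.shape
    out = np.zeros((H, W), dtype=np.uint8)
    for i in range(H):
        for j in range(W):
            # background window around the pixel, clipped at the image border
            t, b = max(0, i - background), min(H, i + background + 1)
            l, r = max(0, j - background), min(W, j + background + 1)
            window = img[t:b, l:r].astype(np.float64)
            # mask out the guard area so the target itself does not pollute the clutter statistics
            mask = np.ones(window.shape, dtype=bool)
            gt, gb = max(0, i - guard) - t, min(H, i + guard + 1) - t
            gl, gr = max(0, j - guard) - l, min(W, j + guard + 1) - l
            mask[gt:gb, gl:gr] = False
            clutter = window[mask]
            mu, sigma = clutter.mean(), clutter.std() + 1e-6
            d = (img[i, j] - mu) / sigma      # CFAR detection statistic
            out[i, j] = 1 if d > threshold else 0
    return out
```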

Step 2: construct an anchor-free SAR target detection network based on gradient information, CFAR information fusion and a supervised attention mechanism.

Referring to FIG. 3, which is a framework diagram of the anchor-free SAR target detection network provided by an embodiment of the present invention, the target detection network comprises a feature extraction module, a feature processing module and a network prediction module.

The three modules are described in detail below.

(I) Feature extraction module

As shown in FIG. 3, the feature extraction module comprises three feature extraction sub-networks A, B and C with identical structure but unshared parameters: the amplitude feature extraction network A, the gradient feature extraction network B and the CFAR feature extraction network C.

Each of the three feature extraction sub-networks A, B and C comprises a feature extraction module with ResNet-18 as its backbone and an FPN multi-scale feature fusion module.

The three feature extraction modules perform feature extraction on the amplitude training slices of ψ, the gradient magnitude training slices and the CFAR binary training slices respectively, producing the corresponding output feature layers of each sub-network (i = A, B, C).

The three FPN multi-scale feature fusion modules then perform multi-scale feature fusion on these output feature layers, producing the output feature layers of the whole feature extraction module.

In this embodiment, the network structure, parameter settings and connections of the ResNet-18 used by the feature extraction networks of the three feature extraction sub-networks A, B and C are as follows.

The ResNet-18 structure used here consists of five layers: conv1, conv2_x, conv3_x, conv4_x and conv5_x. The image input to the network has size H×W×3.

conv1: comprises one convolution layer L1 with a 7×7 kernel window, stride 2 and padding 3, which outputs 64 feature maps of size H/2×W/2, forming the feature layer X1. The output of conv1 is the input of conv2_x.

conv2_x: comprises one pooling layer P1, two convolution modules M1 and M2 and two residual modules R1 and R2. Specifically, P1 applies max pooling to the feature layer X1 output by conv1, with a 3×3 kernel window, stride 2 and padding 1, and outputs 64 feature maps of size H/4×W/4. The output of P1 is an input of M1 and of R1. M1 contains two convolution layers L2 and L3, each with a 3×3 kernel window, stride 1 and padding 1, and outputs 64 feature maps of size H/4×W/4; the output of M1 is the other input of R1. R1 adds the output feature layer of P1 and the output feature layer of M1 element by element and outputs 64 feature maps of size H/4×W/4. The output of R1 is an input of M2 and of R2. M2 likewise contains two convolution layers L4 and L5, each with a 3×3 kernel window, stride 1 and padding 1, and outputs 64 feature maps of size H/4×W/4; the output of M2 is the other input of R2. R2 adds the output feature layer of R1 and the output feature layer of M2 element by element and outputs 64 feature maps of size H/4×W/4. The output of R2 is the input of conv3_x.

conv3_x: comprises two convolution modules M3 and M4 and two residual modules R3 and R4. Specifically, M3 contains two convolution layers L6 and L7 and outputs 128 feature maps of size H/8×W/8; L6 has a 3×3 kernel window, stride 2 and padding 1, and L7 has a 3×3 kernel window, stride 1 and padding 1. The output of M3 is one input of R3. The other input of R3 is the feature layer X6 output by R2; R3 first passes X6 through a convolution layer L8 with a 1×1 kernel window, stride 2 and padding 0, which outputs 128 feature maps of size H/8×W/8, and then adds the output feature layer X8 of L8 and the output feature layer X7 of M3 element by element, giving 128 feature maps of size H/8×W/8. The output of R3 is an input of R4 and of M4. M4 contains two convolution layers L9 and L10, each with a 3×3 kernel window, stride 1 and padding 1, and outputs 128 feature maps of size H/8×W/8; the output of M4 is the other input of R4. R4 adds the output feature layer of R3 and the output feature layer of M4 element by element and outputs 128 feature maps of size H/8×W/8. The output of R4 is the input of conv4_x; it is also one of the output feature layers of the feature extraction networks of A, B and C (i = A, B, C) and an input of the corresponding FPN multi-scale feature fusion module.

conv4_x: comprises two convolution modules M5 and M6 and two residual modules R5 and R6. Specifically, M5 contains two convolution layers L11 and L12 and outputs 256 feature maps of size H/16×W/16; L11 has a 3×3 kernel window, stride 2 and padding 1, and L12 has a 3×3 kernel window, stride 1 and padding 1. The output of M5 is one input of R5. The other input of R5 is the feature layer X11 output by R4; R5 first passes X11 through a convolution layer L13 with a 1×1 kernel window, stride 2 and padding 0, which outputs 256 feature maps of size H/16×W/16, and then adds the output feature layer X13 of L13 and the output feature layer X12 of M5 element by element, giving 256 feature maps of size H/16×W/16. The output of R5 is an input of R6 and of M6. M6 contains two convolution layers L14 and L15, each with a 3×3 kernel window, stride 1 and padding 1, and outputs 256 feature maps of size H/16×W/16; the output of M6 is the other input of R6. R6 adds the output feature layer of R5 and the output feature layer of M6 element by element and outputs 256 feature maps of size H/16×W/16. The output of R6 is the input of conv5_x; it is also one of the output feature layers of the feature extraction networks of A, B and C (i = A, B, C) and an input of the corresponding FPN multi-scale feature fusion module.

conv5_x: comprises two convolution modules M7 and M8 and two residual modules R7 and R8. M7 contains two convolution layers L16 and L17 and outputs 512 feature maps of size H/32×W/32; L16 has a 3×3 kernel window, stride 2 and padding 1, and L17 has a 3×3 kernel window, stride 1 and padding 1. The output of M7 is one input of R7. The other input of R7 is the feature layer X16 output by R6; R7 first passes X16 through a convolution layer L18 with a 1×1 kernel window, stride 2 and padding 0, which outputs 512 feature maps of size H/32×W/32, and then adds the output feature layer X18 of L18 and the output feature layer X17 of M7 element by element, giving 512 feature maps of size H/32×W/32. The output of R7 is an input of R8 and of M8. M8 contains two convolution layers L19 and L20, each with a 3×3 kernel window, stride 1 and padding 1, and outputs 512 feature maps of size H/32×W/32; the output of M8 is the other input of R8. R8 adds the output feature layer of R7 and the output feature layer of M8 element by element and outputs 512 feature maps of size H/32×W/32. The output of R8 is the third output feature layer of the feature extraction networks of A, B and C (i = A, B, C) and is also an input of the corresponding FPN multi-scale feature fusion module.
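For orientation, a PyTorch sketch of one ResNet-18 branch exposing the conv3_x, conv4_x and conv5_x outputs is shown below; it reuses torchvision's resnet18 (which also contains batch-normalization and ReLU layers not listed above) as a stand-in rather than re-implementing the layer list verbatim, and the three-branch instantiation is an assumption about how the unshared sub-networks A, B and C could be built.

```python
import torch.nn as nn
from torchvision.models import resnet18

class ResNet18Backbone(nn.Module):
    """Wraps a standard ResNet-18 and exposes the conv3_x, conv4_x and conv5_x outputs."""
    def __init__(self):
        super().__init__()
        net = resnet18(weights=None)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)  # conv1 + pooling
        self.layer1, self.layer2 = net.layer1, net.layer2   # conv2_x, conv3_x
        self.layer3, self.layer4 = net.layer3, net.layer4   # conv4_x, conv5_x

    def forward(self, x):
        x = self.layer1(self.stem(x))
        c3 = self.layer2(x)      # H/8  x W/8,  128 channels
        c4 = self.layer3(c3)     # H/16 x W/16, 256 channels
        c5 = self.layer4(c4)     # H/32 x W/32, 512 channels
        return c3, c4, c5

# three structurally identical branches with independent parameters,
# one each for the amplitude, gradient and CFAR inputs
branches = nn.ModuleList([ResNet18Backbone() for _ in range(3)])
```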

Furthermore, the three feature extraction sub-networks A, B and C each use an FPN module for multi-scale feature fusion. The implementation, parameter settings and connections of the FPN multi-scale feature fusion module are as follows.

The inputs of the FPN multi-scale feature fusion module in A, B and C are the output feature layers of the corresponding feature extraction network, i.e. the outputs of conv3_x, conv4_x and conv5_x described above; the sub-network A is taken as an example below.

The FPN multi-scale feature fusion module consists of three parts: the FPN1 module, the FPN2 module and the FPN3 module.

FPN1 module: its input is the conv5_x output feature layer, of size H/32×W/32. This feature layer is first passed through a convolution with a 1×1 kernel window, stride 1 and padding 0, giving 256 feature maps of size H/32×W/32, i.e. the feature layer X22; X22 is then upsampled once, giving 256 feature maps of size H/16×W/16, i.e. the feature layer X23, which is the output of the FPN1 module. The output of the FPN1 module is also one input of the FPN2 module.

FPN2 module: its inputs are the conv4_x output feature layer and the output feature layer X23 of the FPN1 module, both of size H/16×W/16. The conv4_x output is first passed through a convolution with a 1×1 kernel window, stride 1 and padding 0, giving 256 feature maps of size H/16×W/16, i.e. the feature layer X24; X24 and X23 are then added directly and fused, giving 256 feature maps of size H/16×W/16, i.e. the feature layer X25; finally X25 is upsampled once, giving 256 feature maps of size H/8×W/8, i.e. the feature layer X26, which is the output of the FPN2 module. The output of the FPN2 module is also one input of the FPN3 module.

FPN3 module: its inputs are the conv3_x output feature layer and the output feature layer X26 of the FPN2 module, both of spatial size H/8×W/8. The conv3_x output is first passed through a convolution with a 1×1 kernel window, stride 1 and padding 0, giving 256 feature maps of size H/8×W/8, i.e. the feature layer X27; X27 and X26 are then added directly and fused, giving 256 feature maps of size H/8×W/8, i.e. the feature layer X28; finally X28 is passed through a convolution with a 3×3 kernel window, stride 1 and padding 1, giving 256 feature maps of size H/8×W/8, which is the output of the FPN3 module. The output of the FPN3 module is also the output feature layer of the entire FPN module.
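A minimal PyTorch sketch of the FPN1/FPN2/FPN3 top-down fusion described above (lateral 1×1 convolutions, upsampling and addition, and a final 3×3 convolution) follows; the channel sizes mirror the ResNet-18 outputs, and nearest-neighbour upsampling and the class name are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Top-down FPN that fuses C3, C4, C5 into a single 256-channel map at the C3 resolution."""
    def __init__(self, c3_ch=128, c4_ch=256, c5_ch=512, out_ch=256):
        super().__init__()
        self.lat5 = nn.Conv2d(c5_ch, out_ch, kernel_size=1)   # FPN1: 1x1 conv on C5
        self.lat4 = nn.Conv2d(c4_ch, out_ch, kernel_size=1)   # FPN2: 1x1 conv on C4
        self.lat3 = nn.Conv2d(c3_ch, out_ch, kernel_size=1)   # FPN3: 1x1 conv on C3
        self.smooth = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)  # final 3x3 conv

    def forward(self, c3, c4, c5):
        p5 = F.interpolate(self.lat5(c5), scale_factor=2, mode="nearest")       # up to C4 resolution
        p4 = F.interpolate(self.lat4(c4) + p5, scale_factor=2, mode="nearest")  # up to C3 resolution
        p3 = self.smooth(self.lat3(c3) + p4)
        return p3   # H/8 x W/8, 256 channels
```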

(II) Feature processing module

As shown in FIG. 3, in this embodiment the feature processing module comprises a T-ICSAF feature fusion module based on an interactive attention mechanism and a CSSCAM module that combines a GT-binary-label-supervised spatial attention mechanism with a channel attention mechanism.

The T-ICSAF feature fusion module fuses the output feature layers of the feature extraction module (one from each of the sub-networks A, B and C) to obtain the fused feature F3.

The CSSCAM module processes the fused feature F3 to obtain a feature layer A3 in which the target features are enhanced and the background clutter features are suppressed, which serves as the output of the feature processing module.

Referring to FIG. 4, which is a network framework diagram of the T-ICSAF feature fusion module provided by an embodiment of the present invention, the T-ICSAF feature fusion module consists of four sub-modules: a feature preprocessing sub-module, an interactive channel attention sub-module, an interactive spatial attention sub-module and an attention fusion sub-module.

In this embodiment, the feature preprocessing sub-module applies a convolution operation and a BN batch normalization operation to each of the three output feature layers of the feature extraction part, producing three feature layers X30, X31 and X32.

Specifically, the feature preprocessing sub-module is implemented as follows: each of the three output feature layers of the feature extraction part is passed through one convolution with a 3×3 kernel window, stride 1 and padding 1 and one BN batch normalization operation, giving three feature layers X30, X31 and X32 of size H/8×W/8×256, which are the outputs of the feature preprocessing sub-module. X30, X31 and X32 are the inputs of the interactive channel attention sub-module and the interactive spatial attention sub-module, and X30 is also one input of the attention fusion sub-module.

Further, the interactive channel attention sub-module applies global average pooling, element-wise summation and Sigmoid normalization in turn to the feature layers X30, X31 and X32 to obtain the channel attention weight Fc.

In this embodiment, the interactive channel attention sub-module is implemented as follows: a global average pooling operation over the spatial dimensions is first applied to each of the feature layers X30, X31 and X32 output by the feature preprocessing sub-module, giving three feature vectors X33, X34 and X35 of size 1×1×256; X33, X34 and X35 are then added element by element and a Sigmoid normalization operation is applied, giving a channel attention weight Fc of size 1×1×256, which is the output of the interactive channel attention sub-module and one input of the attention fusion sub-module.

Further, the interactive spatial attention sub-module applies global average pooling over the channel dimension, concatenation (Concat), convolution and Sigmoid normalization in turn to the feature layers X30, X31 and X32 to obtain the spatial attention weight Fs.

In this embodiment, the interactive spatial attention sub-module is implemented as follows: a global average pooling operation over the channel dimension is first applied to each of the feature layers X30, X31 and X32 output by the feature preprocessing sub-module, giving three feature maps X36, X37 and X38 of size H/8×W/8×1; X36, X37 and X38 are then concatenated, giving one feature map X39 of size H/8×W/8×3; finally X39 is passed through a convolution with a 3×3 kernel window, stride 1 and padding 1 and a Sigmoid normalization operation, giving a spatial attention weight Fs of size H/8×W/8×1, which is the output of the interactive spatial attention sub-module and one input of the attention fusion sub-module.

In addition, the attention fusion sub-module multiplies the channel attention weight Fc with the corresponding channel of the feature layer X30 output by the feature preprocessing sub-module to obtain the feature layer F'3, and then multiplies the spatial attention weight Fs element by element with the pixels of every channel of F'3 to obtain the fused feature layer F3, which is the output of the T-ICSAF feature fusion module.

In this embodiment, the attention fusion sub-module is implemented as follows: the channel attention weight Fc output by the interactive channel attention sub-module is first multiplied with the corresponding channel of X30 output by the feature preprocessing sub-module, giving a feature layer F'3 of size H/8×W/8×256; the spatial attention weight Fs output by the interactive spatial attention sub-module is then multiplied element by element with the pixels of every channel of F'3, finally giving a fused feature layer F3 of size H/8×W/8×256, which is the output of the T-ICSAF feature fusion module. The output of the T-ICSAF feature fusion module is the input of the CSSCAM module of the feature processing part.
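A condensed PyTorch sketch of the T-ICSAF module described above follows; the class name and the exact placement of the normalizations are assumptions, and the three inputs are taken to be the FPN outputs of the amplitude, gradient and CFAR branches.

```python
import torch
import torch.nn as nn

class TICSAF(nn.Module):
    """Sketch of the triple-branch interactive channel-spatial attention fusion (T-ICSAF)."""
    def __init__(self, channels=256):
        super().__init__()
        # feature preprocessing: one 3x3 conv + BN per branch
        self.pre = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                           nn.BatchNorm2d(channels)) for _ in range(3)]
        )
        self.spatial_conv = nn.Conv2d(3, 1, kernel_size=3, padding=1)  # fuses the 3 pooled maps

    def forward(self, f_amp, f_grad, f_cfar):
        x30, x31, x32 = [p(f) for p, f in zip(self.pre, (f_amp, f_grad, f_cfar))]
        # interactive channel attention: spatial GAP per branch, element-wise sum, sigmoid
        fc = torch.sigmoid(sum(x.mean(dim=(2, 3), keepdim=True) for x in (x30, x31, x32)))
        # interactive spatial attention: channel-wise GAP per branch, concat, 3x3 conv, sigmoid
        pooled = torch.cat([x.mean(dim=1, keepdim=True) for x in (x30, x31, x32)], dim=1)
        fs = torch.sigmoid(self.spatial_conv(pooled))
        # attention fusion: channel weights then spatial weights applied to the amplitude branch
        f3 = fs * (fc * x30)
        return f3
```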

进一步的,请参见图5,图5是本发明实施例提供的CSSCAM模块的网络框架图,其中,CSSCAM模块包括三个子模块,分别为GT二值标签监督的空间注意力机制子模块、SE通道注意力机制子模块和注意力融合子模块;其中,Further, please refer to FIG. 5 , which is a network framework diagram of the CSSCAM module provided by an embodiment of the present invention, wherein the CSSCAM module includes three submodules, namely, a spatial attention mechanism submodule for GT binary label supervision, a SE channel attention mechanism submodule, and an attention fusion submodule; wherein,

GT二值标签监督的空间注意力机制子模块用于对T-ICSAF特征融合模块输出的融合特征层F3依次进行通道维度上的全局平均池化操作和通道维度上的全局最大池化操作、Concat操作、卷积以及Sigmoid归一化操作,得到空间注意权重AsThe spatial attention mechanism submodule of GT binary label supervision is used to perform global average pooling operation on the channel dimension, global maximum pooling operation on the channel dimension, Concat operation, convolution and Sigmoid normalization operation on the fusion feature layerF3 output by the T-ICSAF feature fusion module to obtain the spatial attention weightAs .

Specifically, the GT-binary-label-supervised spatial attention mechanism submodule first applies to the fused feature F3 output by the T-ICSAF feature fusion module one global average pooling operation and one global max pooling operation along the channel dimension, giving two single-channel feature maps X40 and X41; X40 and X41 are then concatenated (Concat), giving a two-channel feature map X42; finally, X42 is passed through a convolution with a 7×7 kernel window, a sliding stride of 1 and padding of 3, followed by a Sigmoid normalization, giving the spatial attention weight As, which is the output of the GT-binary-label-supervised spatial attention mechanism submodule.
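The operations just described (channel-wise average and max pooling, Concat, a 7×7 convolution with stride 1 and padding 3, and a Sigmoid) can be sketched in PyTorch as follows; this is an illustrative reconstruction rather than the patent's own code, and the module and variable names are assumptions.

```python
import torch
import torch.nn as nn

class SupervisedSpatialAttention(nn.Module):
    """Spatial attention over F3, later supervised by the GT binary label."""
    def __init__(self):
        super().__init__()
        # 2 input maps (avg + max), 1 output map, 7x7 window, stride 1, padding 3
        self.conv = nn.Conv2d(2, 1, kernel_size=7, stride=1, padding=3)

    def forward(self, f3):                                # f3: (B, C, H, W)
        x40 = torch.mean(f3, dim=1, keepdim=True)         # global avg pool over channels
        x41, _ = torch.max(f3, dim=1, keepdim=True)       # global max pool over channels
        x42 = torch.cat([x40, x41], dim=1)                # (B, 2, H, W)
        a_s = torch.sigmoid(self.conv(x42))               # spatial attention weight As
        return a_s
```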

SE通道注意力机制子模块用于对T-ICSAF特征融合模块输出的融合特征层F3依次进行空间维度的全局平均池化操作和全局最大池化操作、特征向量压缩、映射和解压缩的操作、加和、以及Sigmoid归一化操作,得到通道注意权重AcThe SE channel attention mechanism submodule is used to perform global average pooling and global maximum pooling operations in the spatial dimension, feature vector compression, mapping and decompression operations, addition, and Sigmoid normalization operations on the fused feature layerF3 output by the T-ICSAF feature fusion module to obtain the channel attention weight Ac .

具体的,SE通道注意力机制模块首先将T-ICSAF特征融合模块输出的融合特征F3分别进行一次空间维度的全局平均池化操作和一次空间维度的全局最大池化操作,得到2个1×1×256大小的特征向量X43、X44;然后将得到的X43、X44分别通过一个有16个神经元的全连接层L21、一个relu激活函数层和一个有256个神经元的全连接层L22进行对特征向量压缩、映射和解压缩的操作,得到通道注意后的2个1×1×256大小的特征向量X45、X46,其中全连接层L21将输入的特征向量压缩成1×1×16大小的特征向量,全连接层L22将输入的特征向量解压缩回1×1×256大小的特征向量;最后将得到的X45、X46进行逐元素的加和操作并进行一次Sigmoid归一化操作,得到一个1×1×256大小的通道注意权重Ac,即为SE通道注意力机制子模块的输出。Specifically, the SE channel attention mechanism module first performs a global average pooling operation in the spatial dimension and a global maximum pooling operation in the spatial dimension on the fusion featureF3 output by the T-ICSAF feature fusion module to obtain two 1×1×256 feature vectorsX43 andX44 ; then the obtainedX43 andX44 are respectively compressed, mapped and decompressed through a fully connected layerL21 with 16 neurons, a relu activation function layer and a fully connected layerL22 with 256 neurons to obtain two 1×1×256 feature vectorsX45 andX46 after channel attention, where the fully connected layerL21 compresses the input feature vector into a feature vector of 1×1×16, and the fully connected layerL22 decompresses the input feature vector back to a feature vector of 1×1×256; finally, the obtainedX45 and X46 are46 performs an element-by-element addition operation and a Sigmoid normalization operation to obtain a channel attention weight Ac of size 1×1×256, which is the output of the SE channel attention mechanism submodule.
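A hedged PyTorch sketch of this SE-style channel attention is given below; it follows the 256 to 16 to 256 compression and decompression described above, with both pooled vectors passed through the same pair of fully connected layers, while the class and variable names are assumptions.

```python
import torch
import torch.nn as nn

class SEChannelAttention(nn.Module):
    """SE-style channel attention with average- and max-pooling branches."""
    def __init__(self, channels=256, reduced=16):
        super().__init__()
        self.fc_l21 = nn.Linear(channels, reduced)   # L21: compress 1x1x256 -> 1x1x16
        self.relu = nn.ReLU(inplace=True)
        self.fc_l22 = nn.Linear(reduced, channels)   # L22: decompress 1x1x16 -> 1x1x256

    def forward(self, f3):                           # f3: (B, 256, H, W)
        b, c, _, _ = f3.shape
        x43 = f3.mean(dim=(2, 3))                    # spatial global average pooling
        x44 = f3.amax(dim=(2, 3))                    # spatial global max pooling
        x45 = self.fc_l22(self.relu(self.fc_l21(x43)))
        x46 = self.fc_l22(self.relu(self.fc_l21(x44)))
        a_c = torch.sigmoid(x45 + x46).view(b, c, 1, 1)  # channel attention weight Ac
        return a_c
```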

注意力融合子模块用于将空间注意权重As与T-ICSAF特征融合模块输出的融合特征层F3的每个通道的像素逐一对应相乘,得到特征层A'3,然后将通道注意权重Ac与特征层A'3的每个通道对应相乘,得到一个目标特征被增强、背景杂波特征被抑制的特征层A3,即为CSSCAM模块的输出。The attention fusion submodule is used to multiply the spatial attention weightAs with the pixels of each channel of the fused feature layerF3 output by the T-ICSAF feature fusion module one by one to obtain the feature layerA'3 , and then multiply the channel attention weightAc with each channel of the feature layerA'3 to obtain a feature layerA3 in which the target features are enhanced and the background clutter features are suppressed, which is the output of the CSSCAM module.

Specifically, the spatial attention weight As output by the GT-binary-label-supervised spatial attention mechanism submodule is first multiplied pixel by pixel with every channel of the fused feature F3 output by the T-ICSAF feature fusion module, giving the feature layer A'3; the channel attention weight Ac output by the SE channel attention mechanism submodule is then multiplied with each channel of the feature layer A'3, giving a feature layer A3 in which target features are enhanced and background clutter features are suppressed, which is the output of the CSSCAM module. The output of the CSSCAM module is the input of the network prediction part.
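For completeness, the CSSCAM fusion order (spatial re-weighting first, then channel re-weighting) can be expressed with the same broadcasting convention as before; again, this is a sketch under assumed tensor shapes, not the patent's implementation.

```python
def csscam_fusion(f3, a_s, a_c):
    """CSSCAM attention fusion (illustrative sketch).

    f3:  (B, C, H, W) fused feature layer from the T-ICSAF module
    a_s: (B, 1, H, W) GT-supervised spatial attention weight As
    a_c: (B, C, 1, 1) SE channel attention weight Ac
    """
    a3_prime = f3 * a_s   # pixel-wise spatial re-weighting, gives A'3
    a3 = a3_prime * a_c   # channel-wise re-weighting, gives A3
    return a3
```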

此外,需要说明的是,本发明构建的目标检测网络中,还包括一监督标签生成模块,用于生成GT二值标签。In addition, it should be noted that the target detection network constructed by the present invention also includes a supervised label generation module for generating GT binary labels.

在本实施例中,GT二值标签的构造方法如下:In this embodiment, the construction method of the GT binary label is as follows:

a)对训练图像进行真实标注,将目标像素标注为1,背景像素标注为0,得到与训练图像相对应的标注图像;a) Perform true annotation on the training image, annotate the target pixel as 1 and the background pixel as 0, and obtain the annotated image corresponding to the training image;

b)以45为步长,800×1333为大小得到3536张与原始训练集ψ相对应的二值标注切片GT;b) With a step size of 45 and a size of 800×1333, 3536 binary labeled slices GT corresponding to the original training set ψ are obtained;

c)将b)中得到3536张二值标注切片GT下采样三次,得到与送入CSSCAM模块和网络预测部分的特征层同样大小的3536张二值标注切片GT',并保存与二值标注切片GT'对应的.mat文件作为最终的GT二值标签。c) Downsample the 3536 binary labeled slices GT obtained in b) three times to obtain 3536 binary labeled slices GT' of the same size as the feature layer sent to the CSSCAM module and the network prediction part, and save the .mat file corresponding to the binary labeled slice GT' as the final GT binary label.
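The downsampling of the binary annotation slices to the resolution of the feature layer could be sketched as follows; the use of max pooling for the three 2x downsamplings and the .mat key name are assumptions, since the text only states that the labels are downsampled three times and saved as .mat files.

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.io import savemat

def build_gt_binary_label(gt_slice, save_path):
    """Downsample one 800x1333 binary annotation slice to the feature layer size.

    gt_slice: numpy array (800, 1333) with target pixels = 1 and background = 0.
    """
    t = torch.from_numpy(gt_slice.astype(np.float32))[None, None]  # (1, 1, 800, 1333)
    for _ in range(3):                                             # three 2x downsamplings
        t = F.max_pool2d(t, kernel_size=2, stride=2)               # pooling choice is assumed
    gt_prime = t[0, 0].numpy()
    savemat(save_path, {"GT": gt_prime})                           # .mat key name assumed
    return gt_prime
```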

(三)网络预测模块(III) Network prediction module

请继续参见图3,其中,网络预测模块包括用于目标检测任务的分类分支子模块、回归分支子模块和基于GT二值标签监督的Attention二分类分支子模块;Please continue to refer to Figure 3, where the network prediction module includes a classification branch submodule for target detection tasks, a regression branch submodule, and an Attention binary classification branch submodule based on GT binary label supervision;

分类分支子模块和回归分支子模块分别用于对特征处理模块的输出特征层A3进行预测,对应得到分类得分和边界框回归参数;The classification branch submodule and the regression branch submodule are used to predict the output feature layerA3 of the feature processing module, and obtain the classification score and bounding box regression parameters respectively;

Attention二分类分支子模块用于对用于预测分类得分的特征层CP和用于预测边界框回归参数的特征层RP进行处理,得到二分类得分。The Attention binary classification branch submodule is used to process the feature layer CP used to predict the classification score and the feature layer RP used to predict the bounding box regression parameters to obtain the binary classification score.

在本实施例中,分类分支子模块采用如下方式实现:In this embodiment, the classification branch submodule is implemented in the following manner:

首先对特征处理部分CSSCAM模块输出的特征层A3进行四次卷积操作,得到一个用于预测分类得分的特征层CP;然后对得到的特征层CP依次进行卷积和Sigmoid归一化操作,得到分类分支子模块预测的分类得分图X47First, the feature layer A3 output by the feature processing part CSSCAM module is convolved four times to obtain a feature layer CP for predicting classification scores; then the obtained feature layer CP is convolved and Sigmoid normalized in turn to obtain the classification score map X47 predicted by the classification branch submodule.

Specifically, the feature layer A3 output by the CSSCAM module of the feature processing part is first passed through four convolution operations, each with a 3×3 kernel window, a sliding stride of 1 and padding of 1, giving a feature layer CP used for predicting the classification score; the feature layer CP is then passed through one more convolution with a 3×3 kernel window, a sliding stride of 1 and padding of 1, followed by a Sigmoid normalization, giving the classification score map X47 predicted by the classification branch submodule.

在本实施例中,回归分支子模块采用如下方式实现:In this embodiment, the regression branch submodule is implemented in the following manner:

首先对特征处理部分CSSCAM模块输出的特征层A3进行四次卷积操作,得到一个用于预测边界框回归参数的特征层RP;然后对得到的特征层RP进行一次卷积操作,得到回归分支子模块预测的边界框回归参数X48First, the feature layerA3 output by the feature processing part CSSCAM module is convolved four times to obtain a feature layerRp for predicting bounding box regression parameters; then the obtained feature layerRp is convolved once to obtain the bounding box regression parametersX48 predicted by the regression branch submodule.

Specifically, the feature layer A3 output by the CSSCAM module of the feature processing part is first passed through four convolution operations, each with a 3×3 kernel window, a sliding stride of 1 and padding of 1, giving a feature layer RP used for predicting the bounding box regression parameters; the feature layer RP is then passed through one more convolution with a 3×3 kernel window, a sliding stride of 1 and padding of 1, giving the bounding box regression parameters X48 predicted by the regression branch submodule.

在本实施例中,Attention二分类分支子模块采用如下方式实现:In this embodiment, the Attention binary classification branch submodule is implemented in the following manner:

首先将分类分支子模块中用于预测分类得分的特征层CP和回归分支子模块中用于预测边界框回归参数的特征层RP进行一次Concat操作,并对得到的特征图X49进行卷积操作,得到特征层AP;最后对特征层AP进行卷积操作和Sigmoid归一化操作,得到Attention二分类分支子模块预测的二分类得分图X50First, a concat operation is performed on the feature layer CP used to predict the classification score in the classification branch submodule and the feature layer RP used to predict the bounding box regression parameters in the regression branch submodule, and a convolution operation is performed on the obtained feature map X49 to obtain the feature layer AP ; finally, a convolution operation and a Sigmoid normalization operation are performed on the feature layer AP to obtain the binary classification score map X50 predicted by the Attention binary classification branch submodule.

Specifically, the feature layer CP used for predicting the classification score in the classification branch submodule and the feature layer RP used for predicting the bounding box regression parameters in the regression branch submodule are first concatenated (Concat), giving a feature map X49; X49 is then passed through a convolution with a 3×3 kernel window, a sliding stride of 1 and padding of 1, giving a feature layer AP; finally, AP is passed through one more convolution with a 3×3 kernel window, a sliding stride of 1 and padding of 1, followed by a Sigmoid normalization, giving the binary classification score map X50 predicted by the Attention binary classification branch submodule.
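Taken together, the three prediction branches can be sketched as a single PyTorch head as below. The kernel sizes, strides, padding, the Concat of CP and RP, and the Sigmoid normalizations follow the description above; the ReLU activations between the four 3×3 convolutions, the 256 intermediate channels and the single-class output are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    """Classification, regression and GT-supervised Attention binary branches."""
    def __init__(self, channels=256, num_classes=1):
        super().__init__()
        def tower():
            layers = []
            for _ in range(4):
                layers += [nn.Conv2d(channels, channels, 3, stride=1, padding=1),
                           nn.ReLU(inplace=True)]
            return nn.Sequential(*layers)
        self.cls_tower = tower()                                      # produces CP
        self.reg_tower = tower()                                      # produces RP
        self.cls_pred = nn.Conv2d(channels, num_classes, 3, 1, 1)     # -> X47 (after sigmoid)
        self.reg_pred = nn.Conv2d(channels, 4, 3, 1, 1)               # -> X48 (l, t, r, b)
        self.att_reduce = nn.Conv2d(2 * channels, channels, 3, 1, 1)  # Concat(CP, RP) -> AP
        self.att_pred = nn.Conv2d(channels, 1, 3, 1, 1)               # -> X50 (after sigmoid)

    def forward(self, a3):
        cp = self.cls_tower(a3)
        rp = self.reg_tower(a3)
        cls_score = torch.sigmoid(self.cls_pred(cp))      # classification score map
        bbox_reg = self.reg_pred(rp)                      # bounding box regression params
        ap = self.att_reduce(torch.cat([cp, rp], dim=1))
        att_score = torch.sigmoid(self.att_pred(ap))      # binary classification score map
        return cls_score, bbox_reg, att_score
```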

本发明设计的目标检测网络在特征融合方面,采用了基于三支路交互式注意力机制的特征融合模块T-ICSAF;同时,提出了基于GT二值标签监督的注意力机制模块CSSCAM来有效地抑制背景杂波特征,解决了由于传统特征引入所带来的额外背景杂波特征造成虚警框增加的问题,进一步增强目标特征,让特征提取后的目标特征更具有目标性,从而获得更好的SAR目标检测性能。In terms of feature fusion, the target detection network designed by the present invention adopts a feature fusion module T-ICSAF based on a three-branch interactive attention mechanism. At the same time, an attention mechanism module CSSCAM based on GT binary label supervision is proposed to effectively suppress background clutter features, solve the problem of increased false alarm frames caused by additional background clutter features brought about by the introduction of traditional features, further enhance target features, and make the target features after feature extraction more targeted, thereby obtaining better SAR target detection performance.

此外,本发明根据SAR图像本身具有几何畸变、辐射畸变、遮挡阴影等特点,设计了一个基于GT二值标签监督的Attention二分类分支取代了原始无锚框目标检测网络FCOS中的Centerness分支,让其更加适用于SAR车辆目标检测任务,并能联合本发明提出的CSSCAM模块使用,从而进一步提升SAR目标检测性能。In addition, according to the characteristics of SAR images themselves, such as geometric distortion, radiation distortion, occlusion and shadow, the present invention designs an Attention binary classification branch based on GT binary label supervision to replace the Centerness branch in the original anchor-free target detection network FCOS, making it more suitable for SAR vehicle target detection tasks, and can be used in conjunction with the CSSCAM module proposed in the present invention, thereby further improving the SAR target detection performance.

步骤3:利用新的训练集对目标检测网络进行训练,得到训练好的目标检测网络ψ'。Step 3: Use the new training set to train the target detection network to obtain the trained target detection network ψ'.

具体的,在对网络进行训练的过程中,CSSCAM模块使用GT二值标签GT'对GT二值标签监督的空间注意力机制子模块的空间注意权重As进行监督学习,损失函数使用FocalLoss;Specifically, in the process of training the network, the CSSCAM module uses the GT binary label GT' to supervise the learning of the spatial attentionweight As of the spatial attention mechanism submodule supervised by the GT binary label, and the loss function uses FocalLoss;

分类分支子模块使用根据无锚框正负样本选择策略得到的正负样本标签进行监督学习,损失函数使用Focal Loss;The classification branch submodule uses the positive and negative sample labels obtained according to the anchor-free positive and negative sample selection strategy for supervised learning, and the loss function uses Focal Loss;

回归分支子模块使用根据无锚框正负样本选择策略得到的正样本标签进行监督学习,损失函数使用GIOU Loss;The regression branch submodule uses the positive sample labels obtained according to the anchor-free positive and negative sample selection strategy for supervised learning, and the loss function uses GIOU Loss;

Attention二分类分支子模块使用与根据无锚框正负样本选择策略得到的正样本相同位置处的GT二值标签GT'进行监督学习,其损失函数使用BCE Loss。The Attention binary classification branch submodule uses the GT binary label GT' at the same position as the positive sample obtained according to the anchor-free positive and negative sample selection strategy for supervised learning, and its loss function uses BCE Loss.

具体的,Focal Loss损失函数的表达式如下:Specifically, the expression of the Focal Loss loss function is as follows:

FL(pt) = -(1 - pt)^γ · log(pt)

pt = p if c* = 1; pt = 1 - p otherwise

其中,pt表示网络预测为对应分类标签c*的概率,γ表示调制因子,一般取2;Among them, pt represents the probability of the network prediction corresponding to the classification label c* , γ represents the modulation factor, which is generally 2;
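A direct implementation of this expression (the alpha balancing factor found in some Focal Loss variants is not part of the expression above and is omitted) might look like the following sketch; the reduction to a mean is an assumption.

```python
import torch

def focal_loss(p, target, gamma=2.0):
    """Focal Loss: FL(pt) = -(1 - pt)^gamma * log(pt).

    p:      predicted probabilities in (0, 1), after a sigmoid
    target: binary labels c* in {0, 1} with the same shape as p
    """
    p = p.clamp(1e-6, 1 - 1e-6)                 # numerical stability
    pt = torch.where(target == 1, p, 1 - p)     # pt = p if c* = 1 else 1 - p
    return (-(1 - pt) ** gamma * torch.log(pt)).mean()
```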

GIoU Loss损失函数针对两个边界框A和B,其表达式如下:The GIoU Loss loss function is for two bounding boxes A and B, and its expression is as follows:

GIoU = IoU - |C \ (A ∪ B)| / |C|

L_GIoU = 1 - GIoU

其中,C表示能够包围边界框A、B的最小外接边界框,IoU表示边界框A、B的交并比;Among them, C represents the minimum external bounding box that can enclose bounding boxes A and B, and IoU represents the intersection-over-union ratio of bounding boxes A and B;
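For axis-aligned boxes given as (x1, y1, x2, y2), the GIoU loss above can be computed as in the following sketch; the batched (N, 4) tensor layout is an assumption.

```python
import torch

def giou_loss(box_a, box_b):
    """GIoU Loss for two sets of boxes of shape (N, 4) in (x1, y1, x2, y2) format."""
    # intersection
    x1 = torch.max(box_a[:, 0], box_b[:, 0])
    y1 = torch.max(box_a[:, 1], box_b[:, 1])
    x2 = torch.min(box_a[:, 2], box_b[:, 2])
    y2 = torch.min(box_a[:, 3], box_b[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (box_a[:, 2] - box_a[:, 0]) * (box_a[:, 3] - box_a[:, 1])
    area_b = (box_b[:, 2] - box_b[:, 0]) * (box_b[:, 3] - box_b[:, 1])
    union = area_a + area_b - inter
    iou = inter / union.clamp(min=1e-6)
    # smallest enclosing box C
    cx1 = torch.min(box_a[:, 0], box_b[:, 0])
    cy1 = torch.min(box_a[:, 1], box_b[:, 1])
    cx2 = torch.max(box_a[:, 2], box_b[:, 2])
    cy2 = torch.max(box_a[:, 3], box_b[:, 3])
    area_c = (cx2 - cx1) * (cy2 - cy1)
    giou = iou - (area_c - union) / area_c.clamp(min=1e-6)
    return (1 - giou).mean()
```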

BCE Loss损失函数的表达式如下:The expression of BCE Loss loss function is as follows:

L_BCE = -[c* · log(p) + (1 - c*) · log(1 - p)]

其中,p表示网络预测为目标(c*=1)的概率。Here, p represents the probability that the network predicts the target (c* = 1).
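The binary cross entropy used for the Attention binary classification branch reduces to the following sketch, with p clamped for numerical stability; the mean reduction is an assumption.

```python
import torch

def bce_loss(p, target):
    """BCE Loss: -[c* log(p) + (1 - c*) log(1 - p)], averaged over all positions."""
    p = p.clamp(1e-6, 1 - 1e-6)
    return -(target * torch.log(p) + (1 - target) * torch.log(1 - p)).mean()
```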

进一步的,在本实施例中,根据无锚框的正负样本选择策略得到正负样本标签使用的策略如下:Furthermore, in this embodiment, the strategy for obtaining positive and negative sample labels based on the positive and negative sample selection strategy without anchor boxes is as follows:

a) For any position (x, y) on the prediction feature layer, map it back onto the original image according to the downsampling multiple, obtaining the corresponding position (x', y') on the original image.

b) To judge whether (x', y') falls inside a target horizontal box, first compute, for each of the M target horizontal boxes in the original image, the minimum of the distances from (x', y') to its four sides. For any target horizontal box Bt with top-left corner (xt1, yt1) and bottom-right corner (xt2, yt2), the distances lt, rt, tt, bt from (x', y') to the four sides of Bt are computed as:

lt = x' - xt1, tt = y' - yt1

rt = xt2 - x', bt = yt2 - y'

and the minimum side distance with respect to Bt is min(lt, rt, tt, bt).

The position (x, y) on the prediction feature layer corresponding to (x', y') is then divided into a positive or negative sample, and its labels are defined, according to the following three cases:

b1) If the minimum side distance is non-positive for every target horizontal box, i.e. (x', y') does not fall inside any target horizontal box, the corresponding position (x, y) on the prediction feature layer is divided into a negative sample, its class label is defined as 0, and its regression label is defined as -1.

b2) If exactly one target horizontal box has a positive minimum side distance, the position (x', y') falls inside only one target horizontal box. It then needs to be further judged, according to the scale regression range of this prediction feature layer, whether the corresponding position (x, y) on the prediction feature layer predicts that target horizontal box. Assume that this target horizontal box is Bi. First compute the maximum of the distances from (x', y') to the four sides of Bi, and compare it with the size regression range [smin, smax] set for this prediction feature layer. If the maximum lies within [smin, smax], the corresponding position (x, y) on the prediction feature layer is divided into a positive sample, its class label is defined as the category of Bi, and its regression label is defined as the distances (li, ti, ri, bi) from (x', y') to the four sides of Bi; otherwise, the position (x, y) is divided into a negative sample, i.e. position (x, y) is considered not to predict Bi, its class label is defined as 0, and its regression label is defined as -1.

b3) If N target horizontal boxes have a positive minimum side distance, the position (x', y') falls inside N target horizontal boxes. Then, similarly to case b2), it needs to be further judged, according to the scale regression range of this prediction feature layer, whether the corresponding position (x, y) on the prediction feature layer predicts these target horizontal boxes. First compute the maximum of the distances from (x', y') to the four sides of each of these N target horizontal boxes, and compare these maxima with the size regression range [smin, smax] set for this prediction feature layer. If t (t ≥ 2) of the boxes satisfy the range, the position (x, y) on the prediction feature layer corresponds to t target horizontal boxes. Since the Fully Convolutional One-Stage Object Detection (FCOS) algorithm holds that a target horizontal box with larger area should be predicted by positions on a deeper-scale prediction feature layer, the position (x, y) is finally associated with the target horizontal box Bj having the smallest area among the t boxes: (x, y) is divided into a positive sample corresponding to Bj, its class label is defined as the category of Bj, and its regression label is defined as the distances (lj, tj, rj, bj) from (x', y') to the four sides of Bj. If exactly one box Bk satisfies the range, the position (x, y) corresponds only to that target horizontal box Bk: (x, y) is divided into a positive sample corresponding to Bk, its class label is defined as the category of Bk, and its regression label is defined as the distances (lk, tk, rk, bk) from (x', y') to the four sides of Bk. Otherwise, the position (x, y) is divided into a negative sample, i.e. position (x, y) is considered not to predict any target horizontal box, its class label is defined as 0, and its regression label is defined as -1.

c)重复a)、b)操作,定义任一特征层上所有位置处的正负样本及其标签;c) Repeat operations a) and b) to define positive and negative samples and their labels at all positions on any feature layer;

d)重复a)、b)、c)操作,定义所有特征层上所有位置处的正负样本及其标签;d) Repeat operations a), b), and c) to define positive and negative samples and their labels at all positions on all feature layers;

需要说明的是,本实施例仅使用特征处理部分CSSCAM融合模块输出的A3进行网络预测,所以不需要进行d)的操作。另外,本实施例将A3预测特征层的尺寸回归范围设置为[-1,128]。It should be noted that this embodiment only uses A3 output by the CSSCAM fusion module of the feature processing part for network prediction, so there is no need to perform operation d). In addition, this embodiment sets the size regression range of the A3 prediction feature layer to [-1,128].
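The assignment logic of steps a) through b3) can be condensed into the following NumPy sketch; it assumes the feature-layer positions have already been mapped back to image coordinates, treats class labels as integers with 0 reserved for the background, and collapses cases b1), b2) and b3) into one loop by always choosing the smallest-area box among the candidates that contain the position and satisfy the scale range.

```python
import numpy as np

def assign_labels(points, gt_boxes, gt_classes, s_min=-1, s_max=128):
    """Assign each mapped image position (x', y') a class label and regression target.

    points:     (P, 2) array of positions already mapped back to the original image
    gt_boxes:   (M, 4) array of target horizontal boxes (x1, y1, x2, y2)
    gt_classes: (M,) array of class labels (>= 1)
    Returns class labels (0 = negative) and (P, 4) regression targets (-1 for negatives).
    """
    P = len(points)
    cls_labels = np.zeros(P, dtype=np.int64)
    reg_labels = -np.ones((P, 4), dtype=np.float32)
    for p in range(P):
        x, y = points[p]
        l = x - gt_boxes[:, 0]; t = y - gt_boxes[:, 1]
        r = gt_boxes[:, 2] - x; b = gt_boxes[:, 3] - y
        dists = np.stack([l, t, r, b], axis=1)                    # (M, 4)
        inside = dists.min(axis=1) > 0                            # point inside the box
        in_range = (dists.max(axis=1) >= s_min) & (dists.max(axis=1) <= s_max)
        candidates = np.where(inside & in_range)[0]
        if len(candidates) == 0:
            continue                                              # stays a negative sample
        areas = (gt_boxes[candidates, 2] - gt_boxes[candidates, 0]) * \
                (gt_boxes[candidates, 3] - gt_boxes[candidates, 1])
        j = candidates[np.argmin(areas)]                          # smallest-area box wins
        cls_labels[p] = gt_classes[j]
        reg_labels[p] = dists[j]
    return cls_labels, reg_labels
```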

步骤4:将新的测试集输入到训练好的目标检测网络ψ'中,得到初步的目标检测结果。Step 4: Input the new test set into the trained object detection network ψ' to obtain preliminary object detection results.

41) An amplitude test slice from the new test set, together with its corresponding gradient amplitude test slice and CFAR binary test slice, is respectively fed into the three feature extraction sub-networks of the trained target detection network for testing, giving the classification score, the bounding box regression parameters and the binary classification score at each position of the feature layer; the amplitude test slice is a certain test slice in the original test set T.

42)将每个位置的分类得分和二分类得分相乘并开根号,以作为该位置最终的目标检测得分,并与预设的得分阈值进行比较:42) Multiply the classification score and the binary classification score of each position and take the square root to obtain the final target detection score of the position, and compare it with the preset score threshold:

若特征层中任意位置处的目标检测得分小于预设得分阈值,则丢弃该位置处所预测的检测框;If the target detection score at any position in the feature layer is less than the preset score threshold, the predicted detection box at that position is discarded;

否则,将特征层上的该位置根据下采样倍数映射回该测试切片原图上作为最终检测框的中心,并综合边界框回归参数,以得到该位置的目标检测结果;Otherwise, the position on the feature layer is mapped back to the original image of the test slice according to the downsampling multiple as the center of the final detection box, and the bounding box regression parameters are integrated to obtain the target detection result at this position;

43)将特征层中的每个位置重复步骤42)的操作,得到初步的目标检测结果。43) Repeat step 42) for each position in the feature layer to obtain preliminary target detection results.
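Steps 42) and 43) can be sketched as follows; the score threshold value, the centre-of-cell mapping offset and the feature-layer stride are assumptions made for illustration, while the square root of the product of the two scores follows the description above.

```python
import numpy as np

def decode_detections(cls_score, att_score, bbox_reg, stride=8, score_thr=0.3):
    """Combine scores, filter by threshold, and decode (l, t, r, b) regressions into boxes.

    cls_score, att_score: (H, W) score maps; bbox_reg: (4, H, W) with (l, t, r, b).
    """
    score = np.sqrt(cls_score * att_score)                 # final target detection score
    ys, xs = np.where(score >= score_thr)
    boxes, scores = [], []
    for y, x in zip(ys, xs):
        cx = x * stride + stride // 2                      # map back to the slice image
        cy = y * stride + stride // 2
        l, t, r, b = bbox_reg[:, y, x]
        boxes.append([cx - l, cy - t, cx + r, cy + b])
        scores.append(score[y, x])
    return np.array(boxes), np.array(scores)
```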

步骤5:将初步的目标检测结果对应到测试图像上,并进行NMS操作以去除重叠的目标检测框,得到最终的目标检测结果。Step 5: Map the preliminary target detection results to the test image and perform NMS operation to remove overlapping target detection boxes to obtain the final target detection results.

具体的,根据步骤1构建测试集时得到的Loc将每个测试切片的目标检测结果对应回测试图像上,并进行NMS操作去除重叠的目标检测框。重复步骤4和5的操作,从而得到最终的目标检测结果。Specifically, the target detection result of each test slice is mapped back to the test image according to the Loc obtained when constructing the test set in step 1, and the NMS operation is performed to remove the overlapping target detection boxes. The operations of steps 4 and 5 are repeated to obtain the final target detection result.
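A standard greedy NMS of the kind referred to here can be sketched as follows; the IoU threshold value is an assumption.

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS used to remove overlapping target detection boxes."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_o = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_o - inter + 1e-6)
        order = order[1:][iou <= iou_thr]
    return keep
```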

本发明提供的基于有监督注意力机制的无锚框SAR目标检测方法从增强目标特征的角度出发,引入了梯度幅度信息和CFAR信息,构建了基于梯度信息、CFAR信息融合的无锚框目标检测网络,实现了复杂场景下地面SAR目标检测任务,能够有效缓解有锚框目标检测网络本身存在的计算复杂、正负样本不平衡等问题,提升了SAR目标检测性能。The anchor-free SAR target detection method based on supervised attention mechanism provided by the present invention starts from the perspective of enhancing target features, introduces gradient amplitude information and CFAR information, constructs an anchor-free target detection network based on the fusion of gradient information and CFAR information, realizes the ground SAR target detection task in complex scenes, can effectively alleviate the problems of computational complexity and imbalance of positive and negative samples existing in the anchor-free target detection network itself, and improves the SAR target detection performance.

为了进一步验证本发明提出的基于有监督注意力机制的无锚框SAR目标检测方法的有效性,本实施例还将其在MiniSAR数据图像上进行了检测。In order to further verify the effectiveness of the anchor-free SAR target detection method based on the supervised attention mechanism proposed in the present invention, this embodiment also detects it on the MiniSAR data image.

请参见图6-11,图6-11是本发明实验所使用的MiniSAR数据图像;表1给出了本发明所提方法与现阶段目标检测性能较好的CFAR-Guided-EfficientDet SAR图像目标检测方法(简称CFAR-Guided-EfficientDet,出自论文《SAR图像目标检测与鉴别方法研究》,西安电子科技大学博士论文,王宁,2021)和基于CFAR指导的双流SSD SAR图像目标检测方法(简称ICSAF-CFAR-SSD,出自论文《结合恒虚警检测与深层网络的SAR目标检测研究》,西安电子科技大学硕士论文,唐天顾,2022)在图6-11所示的MiniSAR数据图像上的车辆目标检测性能指标。Please refer to Figures 6-11, which are the MiniSAR data images used in the experiments of the present invention; Table 1 gives the vehicle target detection performance indicators of the method proposed in the present invention and the CFAR-Guided-EfficientDet SAR image target detection method with better target detection performance at this stage (referred to as CFAR-Guided-EfficientDet, from the paper "Research on SAR Image Target Detection and Identification Methods", doctoral dissertation of Xidian University, Wang Ning, 2021) and the CFAR-guided dual-stream SSD SAR image target detection method (referred to as ICSAF-CFAR-SSD, from the paper "Research on SAR Target Detection Combining Constant False Alarm Detection and Deep Network", master's thesis of Xidian University, Tang Tiangu, 2022) on the MiniSAR data images shown in Figures 6-11.

表1不同检测网络在图6-11所示的MiniSAR数据图像上的目标检测性能对比Table 1 Comparison of target detection performance of different detection networks on the MiniSAR data images shown in Figures 6-11


表1中的Pre表示精确率,即检测出的目标框中真实目标的百分比;Rec表示召回率,即目标被正确检测出来的百分比;F1-Score表示调和平均数,AP表示平均准确率,它们是统一精确率Pre和召回率Rec的系统性指标。In Table 1, Pre represents precision, that is, the percentage of true targets in the detected target frame; Rec represents recall, that is, the percentage of targets correctly detected; F1-Score represents the harmonic mean, and AP represents the average accuracy, which are systematic indicators that unify the precision Pre and recall Rec.
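Given counts of true positives, false positives and false negatives from matching detection boxes to the ground truth, Pre, Rec and F1-Score follow directly, as in the short sketch below; AP additionally requires integrating precision over recall and is omitted here.

```python
def detection_metrics(tp, fp, fn):
    """Precision (Pre), recall (Rec) and F1-Score from detection match counts."""
    pre = tp / (tp + fp) if tp + fp > 0 else 0.0
    rec = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * pre * rec / (pre + rec) if pre + rec > 0 else 0.0
    return pre, rec, f1
```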

通过表1可以看到,本发明所提方法在图6-11所示的六幅MiniSAR数据图像上的SAR目标检测性能均好于CFAR-Guided-EfficientDet目标检测网络;本发明所提方法在图像1、图像2、图像3和图像4上的SAR目标检测性能要全面好于ICSAF-CFAR-SSD目标检测网络;在图像5上的SAR目标检测性能除AP与ICSAF-CFAR-SSD目标检测网络相当外,其他指标均明显高于ICSAF-CFAR-SSD目标检测网络;另外,在图像6上的SAR目标检测除Pre和F1-Score低于ICSAF-CFAR-SSD目标检测网络外,Rec和AP均高于ICSAF-CFAR-SSD目标检测网络。It can be seen from Table 1 that the SAR target detection performance of the proposed method on the six MiniSAR data images shown in Figures 6-11 is better than that of the CFAR-Guided-EfficientDet target detection network; the SAR target detection performance of the proposed method on Image 1, Image 2, Image 3 and Image 4 is better than that of the ICSAF-CFAR-SSD target detection network in all aspects; the SAR target detection performance on Image 5 is comparable to that of the ICSAF-CFAR-SSD target detection network, and other indicators are significantly higher than those of the ICSAF-CFAR-SSD target detection network; in addition, the SAR target detection on Image 6 is higher than that of the ICSAF-CFAR-SSD target detection network except that Pre and F1-Score are lower than those of the ICSAF-CFAR-SSD target detection network, while Rec and AP are higher than those of the ICSAF-CFAR-SSD target detection network.

总的来看,本发明所提方法在召回率方面明显优于现阶段目标检测性能相对较好的两种SAR目标检测方法在复杂场景下车辆目标上的检测性能,能够检测出更多的目标,同时还能保证目标检测精确率、F1-Score和AP也相对更好。所以,通过上述试验分析,本发明所提方法获得了比现阶段目标检测性能相对较好的SAR目标检测方法更好的SAR目标检测性能,充分验证了本发明所提方法的有效性和优越性。In general, the method proposed in the present invention is significantly better than the two SAR target detection methods with relatively good target detection performance at the current stage in terms of recall rate on vehicle targets in complex scenes, and can detect more targets while ensuring that the target detection accuracy, F1-Score and AP are also relatively better. Therefore, through the above experimental analysis, the method proposed in the present invention has achieved better SAR target detection performance than the SAR target detection methods with relatively good target detection performance at the current stage, which fully verifies the effectiveness and superiority of the method proposed in the present invention.

以上内容是结合具体的优选实施方式对本发明所作的进一步详细说明,不能认定本发明的具体实施只局限于这些说明。对于本发明所属技术领域的普通技术人员来说,在不脱离本发明构思的前提下,还可以做出若干简单推演或替换,都应当视为属于本发明的保护范围。The above contents are further detailed descriptions of the present invention in combination with specific preferred embodiments, and it cannot be determined that the specific implementation of the present invention is limited to these descriptions. For ordinary technicians in the technical field to which the present invention belongs, several simple deductions or substitutions can be made without departing from the concept of the present invention, which should be regarded as falling within the scope of protection of the present invention.

Claims (10)

1. An anchor frame-free SAR target detection method based on a supervised attention mechanism is characterized by comprising the following steps:
step 1: acquiring an original training set and an original testing set based on the original SAR image; gradient information and CFAR information are respectively obtained from an original training set and an original testing set by utilizing a ROEWA edge detection algorithm and a double-parameter CFAR algorithm, and a new training set and a new testing set are constructed according to the gradient information and the CFAR information;
Step 2: constructing an anchor frame-free SAR target detection network based on gradient information, CFAR information fusion and a supervised attention mechanism; the target detection network comprises a feature extraction module, a feature processing module and a network prediction module;
step 3: training the target detection network by using the new training set to obtain a trained target detection network;
step 4: inputting the new test set into a trained target detection network to obtain a preliminary target detection result;
step 5: corresponding the preliminary target detection result to the test image, and carrying out an NMS operation to remove overlapping target detection frames, obtaining a final target detection result.
2. The supervised attention mechanism based anchor free SAR target detection method of claim 1, wherein step 1 comprises:
1. constructing a new training set;
selecting a plurality of original SAR images as training images, and slicing the training images to obtain a plurality of training slices as an original training set psi;
respectively obtaining a gradient amplitude training image and a CFAR binary training image from the selected training image through a ROEWA edge detection algorithm and a double-parameter CFAR algorithm;
slicing the gradient amplitude training image and the CFAR binary training image to obtain a gradient amplitude training slice and a CFAR binary training slice corresponding to the original training set ψ, which together with the original training set ψ form a new training set ψ';
2. constructing a new test set;
selecting an original SAR image as a test image, and slicing the original SAR image to obtain a plurality of test slices as an original test set T;
respectively obtaining a gradient amplitude test image and a CFAR binary test image from the selected test image through a ROEWA edge detection algorithm and a double-parameter CFAR algorithm;
slicing the gradient amplitude test image and the CFAR binary test image to obtain a gradient amplitude test slice and a CFAR binary test slice corresponding to the original test set T, which together with said original test set T form a new test set T'.
3. The method for detecting the target of the anchor-free frame SAR according to claim 2, wherein in the anchor-free frame SAR target detection network constructed in step 2, the feature extraction module comprises three feature extraction sub-networks A, B, C which are identical in structure but do not share parameters, namely an amplitude feature extraction network A, a gradient feature extraction network B and a CFAR feature extraction network C; wherein,
each of the three feature extraction sub-networks A, B, C comprises a feature extraction module taking ResNet-18 as a backbone and an FPN multi-scale feature fusion module;
the three feature extraction modules are respectively used for extracting features from the training slices of the original training set ψ, the gradient amplitude training slices and the CFAR binary training slices, obtaining the corresponding output feature layers;
the three FPN multi-scale feature fusion modules are respectively used for performing multi-scale feature fusion on these output feature layers, obtaining the output feature layers of the whole feature extraction module.
4. The method for detecting the target of the anchor-free frame SAR based on the supervised attention mechanism as set forth in claim 3, wherein in the anchor-free frame SAR target detection network constructed in the step 2, the feature processing module comprises a T-ICSAF feature fusion module based on the interactive attention mechanism and a CSSCAM module based on the combination of the GT-binary-label-supervised spatial attention mechanism and the channel attention mechanism; wherein,
the T-ICSAF feature fusion module is used for fusing the output feature layers of the feature extraction module to obtain a fused feature F3;
the CSSCAM module is used for processing the fused feature F3 to obtain a feature layer A3 with enhanced target features and suppressed background clutter features, as the output of the feature processing module.
5. The method for detecting the target of the anchor-free SAR based on the supervised attention mechanism as set forth in claim 4, wherein the T-ICSAF feature fusion module mainly comprises four sub-modules, namely a feature preprocessing sub-module, an interactive channel attention sub-module, an interactive spatial attention sub-module and an attention fusion sub-module; wherein,
the feature preprocessing sub-module is used for respectively performing a convolution operation and a BN batch normalization operation on the output feature layers of the feature extraction module, correspondingly obtaining three feature layers X30, X31 and X32;
the interactive channel attention sub-module is used for sequentially performing global average pooling, addition and Sigmoid normalization operations on the feature layers X30, X31 and X32 to obtain a channel attention weight Fc;
the interactive spatial attention sub-module is used for sequentially performing global average pooling on the channel dimension, a Concat operation, convolution and Sigmoid normalization on the feature layers X30, X31 and X32 to obtain a spatial attention weight Fs;
the attention fusion sub-module is used for multiplying the channel attention weight Fc with each channel of the feature layer X30 output by the feature preprocessing sub-module to obtain a feature layer F'3, and then multiplying the spatial attention weight Fs pixel by pixel with every channel of the feature layer F'3 to obtain the fused feature layer F3, which is the output of the T-ICSAF feature fusion module.
6. The method for detecting a frame-less SAR target based on a supervised attention mechanism according to claim 5, wherein the CSSCAM module comprises three sub-modules, namely a spatial attention mechanism sub-module supervised by a GT binary label, an SE channel attention mechanism sub-module and an attention fusion sub-module; wherein,
the GT-binary-label-supervised spatial attention mechanism sub-module is used for sequentially performing a global average pooling operation on the channel dimension and a global maximum pooling operation on the channel dimension, a Concat operation, a convolution and a Sigmoid normalization operation on the fused feature layer F3 output by the T-ICSAF feature fusion module, obtaining a spatial attention weight As;
the SE channel attention mechanism sub-module is used for sequentially performing a global average pooling operation and a global maximum pooling operation on the spatial dimension, feature vector compression, mapping and decompression operations, addition, and a Sigmoid normalization operation on the fused feature layer F3 output by the T-ICSAF feature fusion module, obtaining a channel attention weight Ac;
the attention fusion sub-module is used for multiplying the spatial attention weight As pixel by pixel with every channel of the fused feature layer F3 output by the T-ICSAF feature fusion module to obtain a feature layer A'3, and then multiplying the channel attention weight Ac with each channel of the feature layer A'3 to obtain a feature layer A3 with enhanced target features and suppressed background clutter features, which is the output of the CSSCAM module.
7. The method for detecting the target of the anchor-free SAR based on the supervised Attention mechanism as set forth in claim 4, wherein in the anchor-free SAR target detection network constructed in the step 2, the network prediction module comprises a classification branch sub-module for the target detection task, a regression branch sub-module and an Attention binary classification branch sub-module based on GT binary label supervision;
the classification branch sub-module and the regression branch sub-module are respectively used for predicting on the output feature layer A3 of the feature processing module, correspondingly obtaining classification scores and bounding box regression parameters;
the Attention binary classification branch sub-module is used for processing the feature layer CP used for predicting the classification scores and the feature layer RP used for predicting the bounding box regression parameters, obtaining binary classification scores.
8. The supervised attention mechanism based anchor free SAR target detection method of claim 7, wherein said classification branch sub-module is implemented by:
first performing four convolution operations on the feature layer A3 output by the CSSCAM module of the feature processing part, obtaining a feature layer CP used for predicting the classification scores; then sequentially performing a convolution and a Sigmoid normalization operation on the obtained feature layer CP, obtaining the classification score map X47 predicted by the classification branch sub-module;
the regression branch sub-module is implemented in the following manner:
first performing four convolution operations on the feature layer A3 output by the CSSCAM module of the feature processing part, obtaining a feature layer RP used for predicting the bounding box regression parameters; then performing one convolution operation on the obtained feature layer RP, obtaining the bounding box regression parameters X48 predicted by the regression branch sub-module;
the Attention binary classification branch sub-module is implemented in the following manner:
performing a Concat operation on the feature layer CP used for predicting the classification scores in the classification branch sub-module and the feature layer RP used for predicting the bounding box regression parameters in the regression branch sub-module, and performing a convolution operation on the obtained feature map X49, obtaining a feature layer AP; finally performing a convolution operation and a Sigmoid normalization operation on the feature layer AP, obtaining the binary classification score map X50 predicted by the Attention binary classification branch sub-module.
9. The supervised attention mechanism based anchor free SAR target detection method of claim 8, wherein in step 3, during training of the target detection network,
the CSSCAM module uses the GT binary label GT' to perform supervised learning on the spatial attention weight As of the GT-binary-label-supervised spatial attention mechanism sub-module, wherein the Loss function uses Focal Loss;
the classification branch sub-module performs supervised learning by using positive and negative sample labels obtained according to a positive and negative sample selection strategy without an anchor frame, and the Loss function uses Focal Loss;
the regression branch sub-module uses positive sample labels obtained according to an anchor-frame-free positive and negative sample selection strategy to conduct supervised learning, and the Loss function uses GIOU Loss;
the Attention binary classification branch sub-module uses the GT binary label GT' at the same positions as the positive samples obtained according to the anchor-frame-free positive and negative sample selection strategy to conduct supervised learning, and its Loss function uses BCE Loss.
10. The supervised attention mechanism based anchor free SAR target detection method of claim 1, wherein step 4 comprises:
41) sending one amplitude test slice in the new test set and the corresponding gradient amplitude test slice and CFAR binary test slice respectively into the three feature extraction sub-networks of the trained target detection network for testing, obtaining the classification score, the bounding box regression parameters and the binary classification score of each position on the feature layer; the amplitude test slice is a certain test slice in the original test set T;
42) multiplying the classification score and the binary classification score of each position and taking the square root as the final target detection score for the position, and comparing it with a preset score threshold:
if the target detection score at any position in the feature layer is smaller than a preset score threshold value, discarding a predicted detection frame at the position;
otherwise, mapping the position on the feature layer back to the original image of the test slice as the center of a final detection frame according to the downsampling multiple, and synthesizing the regression parameters of the boundary frame to obtain a target detection result of the position;
43 Repeating the operation of step 42) for each position in the feature layer to obtain a preliminary target detection result.
CN202310153318.2A2023-02-222023-02-22 Anchor-free SAR target detection method based on supervised attention mechanismActiveCN116363504B (en)

Priority Applications (1)

Application NumberPriority DateFiling DateTitle
CN202310153318.2ACN116363504B (en)2023-02-222023-02-22 Anchor-free SAR target detection method based on supervised attention mechanism


Publications (2)

Publication NumberPublication Date
CN116363504Atrue CN116363504A (en)2023-06-30
CN116363504B CN116363504B (en)2025-08-26

Family

ID=86940415

Family Applications (1)

Application NumberTitlePriority DateFiling Date
CN202310153318.2AActiveCN116363504B (en)2023-02-222023-02-22 Anchor-free SAR target detection method based on supervised attention mechanism

Country Status (1)

CountryLink
CN (1)CN116363504B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
US20210174149A1 (en)*2018-11-202021-06-10Xidian UniversityFeature fusion and dense connection-based method for infrared plane object detection
US20210232813A1 (en)*2020-01-232021-07-29Tongji UniversityPerson re-identification method combining reverse attention and multi-scale deep supervision
CN114202672A (en)*2021-12-092022-03-18南京理工大学 A small object detection method based on attention mechanism
CN114764886A (en)*2022-03-182022-07-19西安电子科技大学CFAR (computational fluid dynamics) -guidance-based double-current SSD SAR (solid State disk) image target detection method
CN115147731A (en)*2022-07-282022-10-04北京航空航天大学 A SAR Image Target Detection Method Based on Full Spatial Coding Attention Module


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李健伟;曲长文;彭书娟;: "SAR图像舰船目标联合检测与方向估计", 武汉大学学报(信息科学版), no. 06, 5 June 2019 (2019-06-05)*
邹树岭: "结合SAR图像特性的无锚框SAR目标检测方法研究", 《知网》, 15 March 2025 (2025-03-15)*

Also Published As

Publication numberPublication date
CN116363504B (en)2025-08-26


Legal Events

DateCodeTitleDescription
PB01Publication
SE01Entry into force of request for substantive examination
GR01Patent grant
