CN115909086A


Info

Publication number: CN115909086A
Application number: CN202211449058.5A
Authority: CN
Inventors: 白雪茹; 鲜要胜; 杨敏佳; 孟昭晗; 周峰
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2022-11-18
Filing date: 2022-11-18
Publication date: 2023-04-04
Anticipated expiration: 2042-11-18
Also published as: CN115909086B

Abstract

The invention discloses a SAR target detection and recognition method based on a multi-level enhancement network, which mainly addresses the poor robustness in complex environments, the high false alarm and missed detection rates, and the low detection and recognition accuracy of the prior art. The implementation scheme is as follows: labeling and dividing measured SAR data to obtain a training set and a test set; constructing a multi-level enhancement network formed by cascading a data-level enhancement module, a feature-level enhancement module, a region proposal module and a decision-level enhancement module; training the multi-level enhancement network on the training set with the stochastic gradient descent algorithm; and inputting the test set images into the trained multi-level enhancement network to obtain the SAR target detection and recognition results. The invention significantly improves SAR target detection and recognition performance in complex environments, and can be used for battlefield reconnaissance and situation awareness.

Description

Translated from Chinese

SAR target detection and recognition method based on a multi-level enhancement network

Technical field

The invention belongs to the technical field of radar remote sensing, and further relates to a SAR target detection and recognition method that can be used for battlefield reconnaissance and situation awareness.

Background

Synthetic aperture radar (SAR) is an active microwave imaging sensor that transmits signals with a large time-bandwidth product and uses aperture synthesis to obtain two-dimensional high-resolution images. Compared with optical and infrared sensors, SAR offers unique advantages such as day-and-night and all-weather operation, long operating range and strong penetration. It has become an important means of earth observation and is widely used in military and civilian fields. With the continuous improvement of SAR systems and SAR imaging quality, SAR image interpretation has gradually attracted the attention of scholars and researchers in related fields. As a difficult and key step of interpretation, the accurate detection and recognition of key targets is of great significance and research value.

Traditional SAR target detection and recognition methods mainly adopt a three-stage processing flow comprising target detection, target discrimination and target recognition. Target detection is mainly based on the constant false alarm rate (CFAR) algorithm. Assuming that the background clutter follows a certain probability distribution model, the algorithm slides a window over the SAR image and compares the selected value with an adaptive threshold to detect targets. However, because non-uniform strong clutter backgrounds are difficult to model effectively, the algorithm adapts poorly to complex scenes and its detection accuracy is low. Target discrimination and target recognition rely mainly on the statistical information and physical characteristics of the image for hand-crafted feature design and classifier construction. This requires strong professional knowledge and expert experience, and the resulting algorithms have poor accuracy and flexibility, so the desired effect is difficult to achieve in practical applications. In addition, the inefficient connection between the stages of the traditional three-stage flow greatly reduces computational efficiency, so a new architecture is urgently needed.

In recent years, with the continuous development of deep learning, target detection and recognition methods based on deep neural networks have made major breakthroughs in computer vision. Thanks to the structure of deep networks, these algorithms predict the location and the class of a target simultaneously, without multi-stage processing, which significantly improves the performance and efficiency of detection and recognition. Mainstream detection and recognition algorithms fall into two modes, single-stage and two-stage. Single-stage algorithms directly decode the features extracted by the network to detect and recognize targets and have faster inference speed; representative algorithms include YOLO, SSD and RetinaNet. Two-stage algorithms add a candidate region extraction stage: a deep network first extracts candidate regions that may contain key targets from the image, and the positions of the candidate regions are then further refined and the recognition results obtained; representative algorithms include R-CNN, Faster R-CNN and Cascade R-CNN. Compared with single-stage algorithms, two-stage algorithms achieve higher detection and recognition accuracy.

Although the above deep-learning-based methods provide a feasible route to SAR target detection and recognition, SAR image scenes are more complex than optical images, targets of different classes are more similar to one another, and speckle noise blurs target edges. These methods are therefore still not robust in complex environments and have difficulty distinguishing similar classes.

The patent document with application number 201710461303.7 discloses "an integrated method for SAR image target detection and recognition". It first extracts SAR image features with a convolutional neural network, then generates candidate regions that may contain targets based on these features, and finally uses a fully connected network to predict the class and location of each region of interest, thereby detecting and recognizing SAR targets. Because this method is not optimized for the characteristics of SAR images, it is not robust in complex environments, its predicted bounding boxes are inaccurate, and its false alarm rate and missed detection rate are high. It also has difficulty mining the fine-grained features of targets, which leads to low detection and recognition accuracy.

Summary of the invention

The purpose of the present invention is to address the deficiencies of the above prior art by proposing a SAR target detection and recognition method based on a multi-level enhancement network, so as to improve the robustness of detection and recognition in complex environments, reduce the false alarm rate and the missed detection rate, enhance the separability of features, and significantly improve SAR target detection and recognition accuracy.

The technical idea of the present invention is to improve SAR target detection and recognition performance in complex environments by designing a multi-level enhancement network. The implementation steps are as follows:

(1) Obtain SAR images containing multiple classes of targets, label the target position and target class in each SAR image, and randomly divide the labeled SAR images to obtain a training set and a test set;

(2) Build the multi-level enhancement network:

(2a) Establish a data-level enhancement module that sequentially performs multi-scale transformation, random flip, random rotation, power transformation and random noise operations;

(2b) Establish a feature-level enhancement module formed by cascading a backbone network A, a feature optimization pyramid network F, a recursive backbone network Q and a recursive feature optimization pyramid network E;

(2c) Use an existing region proposal network to form the region proposal module G, and select the cross-entropy loss and the CIOU loss as its classification and regression losses;

(2d) Establish a decision-level enhancement module D formed by cascading three decision makers d₁, d₂, d₃, and select the cross-entropy loss and the CIOU loss as its classification and regression losses;

(2e) Cascade the data-level enhancement module, the feature-level enhancement module, the region proposal module and the decision-level enhancement module in sequence to form the multi-level enhancement network;

(3) Train the multi-level enhancement network:

(3a) Randomly sample a group of SAR images from the training set, input them into the multi-level enhancement network, calculate the loss, and update the network parameters with the stochastic gradient descent algorithm based on this loss;

(3b) Repeat (3a) until the network converges to obtain the trained multi-level enhancement network;

(4) Input the SAR images of the test set into the trained multi-level enhancement network to obtain the detection and recognition results.
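Steps (2c) and (2d) name the CIOU loss as the regression loss without spelling it out. The following is a minimal sketch of the commonly used Complete-IoU formulation, assuming boxes in the (x, y, w, h) center format used later in the description; the function name `ciou_loss` and the `eps` stabilizer are illustrative choices, not taken from the patent.

```python
import math

def ciou_loss(pred, target, eps=1e-9):
    """Complete-IoU loss between two boxes given as (cx, cy, w, h)."""
    px, py, pw, ph = pred
    tx, ty, tw, th = target
    # Convert center format to corner format.
    p1x, p1y, p2x, p2y = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    t1x, t1y, t2x, t2y = tx - tw / 2, ty - th / 2, tx + tw / 2, ty + th / 2
    # Intersection over union.
    inter_w = max(0.0, min(p2x, t2x) - max(p1x, t1x))
    inter_h = max(0.0, min(p2y, t2y) - max(p1y, t1y))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)
    # Squared center distance over squared diagonal of the enclosing box.
    enc_w = max(p2x, t2x) - min(p1x, t1x)
    enc_h = max(p2y, t2y) - min(p1y, t1y)
    center_term = ((px - tx) ** 2 + (py - ty) ** 2) / (enc_w ** 2 + enc_h ** 2 + eps)
    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(tw / th) - math.atan(pw / ph)) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + center_term + alpha * v

print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))   # close to 0 for identical boxes
print(ciou_loss((0, 0, 2, 2), (10, 0, 2, 2)))  # above 1 for disjoint boxes
```

Unlike a plain IoU loss, the center-distance term still provides a useful signal when the predicted and labeled boxes do not overlap, which is one plausible reason for choosing it when target edges are blurred.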

Compared with the prior art, the present invention has the following advantages:

First, by designing a data-level enhancement module that simulates changes in target scale and orientation as well as clutter and noise interference, the present invention improves the robustness of the algorithm in complex environments and reduces the false alarm rate and the missed detection rate.

Second, by designing a feature-level enhancement module, the present invention fully mines the fine-grained features of targets in SAR images and enhances the separability of targets of similar classes.

Third, by designing a decision-level enhancement module, the present invention fine-tunes the prediction results multiple times to gradually reduce the deviation between the predicted target position and the true value, effectively suppressing the influence of blurred SAR target edges on detection accuracy.

Description of the drawings

Fig. 1 is the implementation flowchart of the present invention;

Fig. 2 is a model diagram of the multi-level enhancement network constructed in the present invention;

Fig. 3 shows the simulation results of the present invention.

Detailed description

The embodiments and effects of the present invention are described in further detail below with reference to the accompanying drawings.

Referring to Fig. 1, the SAR target detection and recognition method based on a multi-level enhancement network in this example comprises, in order, data labeling and division, building the multi-level enhancement network, training the multi-level enhancement network, and obtaining the SAR target detection and recognition results. The specific implementation is as follows:

Step 1, data labeling and division.

Obtain SAR images containing multiple classes of targets, label the target position and target class in each SAR image, and randomly divide the labeled SAR images in a 7:3 ratio to obtain a training set and a test set.

In the embodiment of the present invention, the SAR images come from the spaceborne radar on the Gaofen-3 satellite. The images have three sizes, 600×600, 1024×1024 and 2048×2048, and contain seven classes of aircraft targets. The training set contains 1400 images and the test set contains 600 images.
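The random 7:3 division of Step 1 can be sketched as follows. This is an illustrative example only: the helper `split_dataset`, the fixed seed and the image identifiers are hypothetical and do not appear in the patent.

```python
import random

def split_dataset(items, train_ratio=0.7, seed=0):
    """Randomly split items into (train, test) lists with the given ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded only to make the sketch repeatable
    n_train = round(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers standing in for the 2000 labeled SAR images.
image_ids = [f"img_{k:04d}" for k in range(2000)]
train_set, test_set = split_dataset(image_ids)
print(len(train_set), len(test_set))  # 1400 600
```

With 2000 images, the 7:3 ratio reproduces the 1400 training images and 600 test images reported for the embodiment.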

Step 2, build the multi-level enhancement network.

Referring to Fig. 2, the multi-level enhancement network constructed in this step comprises a data-level enhancement module, a feature-level enhancement module, a region proposal module and a decision-level enhancement module cascaded in sequence. The construction steps are as follows:

2.1) Establish a data-level enhancement module that sequentially performs multi-scale transformation, random flip, random rotation, power transformation and random noise operations. In the embodiment of the present invention, all of the above operations are performed during training, and only the multi-scale transformation and random flip operations are performed during testing. The scales of the multi-scale transformation are 1024×1024, 1088×1088 and 1152×1152; the random flip directions are horizontal, vertical and diagonal; the random rotation angles are 90°, 180° and 270°; and the coefficient of the power transformation is a random value in [0.8, 1.2].
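The data-level enhancement pipeline of 2.1) can be sketched on a single-channel image stored as a nested list. Several details are assumptions made for illustration and are not stated in the patent: nearest-neighbour resizing, pixel values normalised to [0, 1], a diagonal flip implemented as a combined horizontal and vertical flip, additive Gaussian noise with a small `sigma`, and every operation being applied on each call. The matching transformation of the bounding-box labels is omitted.

```python
import random

def resize_nearest(img, size):
    """Resize a 2-D list to size x size with nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def flip(img, direction):
    if direction == "horizontal":
        return [row[::-1] for row in img]
    if direction == "vertical":
        return [row[:] for row in img[::-1]]
    if direction == "diagonal":  # assumed: horizontal and vertical flip combined
        return [row[::-1] for row in img[::-1]]
    raise ValueError(direction)

def rotate(img, angle):
    """Rotate counter-clockwise by a multiple of 90 degrees."""
    for _ in range(angle // 90):
        img = [list(col) for col in zip(*img)][::-1]
    return img

def power_transform(img, gamma):
    return [[p ** gamma for p in row] for row in img]

def add_noise(img, rng, sigma=0.01):
    # Assumed noise model: additive Gaussian, clipped back to [0, 1].
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]

def augment(img, rng, scales=(1024, 1088, 1152), training=True):
    img = resize_nearest(img, rng.choice(scales))
    img = flip(img, rng.choice(["horizontal", "vertical", "diagonal"]))
    if training:  # rotation, power transform and noise are training-only
        img = rotate(img, rng.choice([90, 180, 270]))
        img = power_transform(img, rng.uniform(0.8, 1.2))
        img = add_noise(img, rng)
    return img

rng = random.Random(0)
tiny = [[0.1, 0.2], [0.3, 0.4]]
out = augment(tiny, rng, scales=(4, 6, 8))  # small scales keep the demo fast
print(len(out), len(out[0]))
```

Passing `training=False` reproduces the test-time behaviour described above, where only the multi-scale transformation and the flip are applied.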

2.2) Establish a feature-level enhancement module formed by cascading a backbone network A, a feature optimization pyramid network F, a recursive backbone network Q and a recursive feature optimization pyramid network E:

2.2.1) Establish a backbone network A comprising five cascaded convolution modules a₁, a₂, a₃, a₄, a₅, where:

The first convolution module a₁ is formed by cascading a 7×7 standard convolution layer, a batch normalization layer, a ReLU activation layer and a max-pooling downsampling layer;

The second convolution module a₂ is formed by cascading 3 residual blocks;

The third convolution module a₃ is formed by cascading 4 residual blocks;

The fourth convolution module a₄ is formed by cascading 6 residual blocks;

The fifth convolution module a₅ is formed by cascading 3 residual blocks;

In the embodiment of the present invention, the backbone network A is used to extract multi-scale feature maps. For an input SAR image of width W and height H, the multi-scale feature maps output by the backbone network are:

[formula not reproduced in the source text]

where Y_i is the output feature map of the i-th convolution module a_i.

2.2.2) Establish a feature optimization pyramid network F comprising four parallel branches f₁, f₂, f₃, f₄. Each branch f_i is formed by cascading a feature optimization module f_i^s and a feature fusion module f_i^u. Each feature optimization module f_i^s consists of two parallel sub-branches followed by a 1×1 standard convolution layer: the first sub-branch comprises, in order, a global average pooling layer, a 1-dimensional convolution layer and a Sigmoid activation layer, and the second sub-branch is an identity branch. Each feature fusion module f_i^u consists of a 3×3 standard convolution layer.

In the embodiment of the present invention, the feature optimization pyramid network F performs feature optimization and feature fusion on the multi-scale feature maps Y_i (i = 2, 3, 4, 5) output by the backbone network. After feature optimization, Y_i is expressed as:

S_i = f_i^s(Y_{i+1}) = Conv1×1(σ(Conv1d(GAP(Y_{i+1}))) ⊙ Y_{i+1}), i = 1, 2, 3, 4

where S_i is the output feature map of the i-th feature optimization module f_i^s, ⊙ denotes channel-wise multiplication, GAP(·) denotes global average pooling, Conv1d(·) denotes 1-dimensional convolution, σ(·) denotes the Sigmoid function, and Conv1×1(·) denotes a 1×1 standard convolution. After feature fusion, S_i is expressed as:

[formula not reproduced in the source text]

where U_i is the output feature map of the i-th feature fusion module f_i^u, Conv3×3(·) denotes a 3×3 standard convolution, and Up(·) denotes the bilinear interpolation upsampling function.
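The feature optimization module can be illustrated with a small pure-Python sketch of the chain GAP, Conv1d, Sigmoid, channel-wise multiplication and 1×1 convolution. The 1-D kernel, its zero padding and the 1×1 weights below are placeholder values chosen for the demonstration; in the network they would be learned, and the patent does not state the kernel size.

```python
import math

def global_avg_pool(x):
    """x is a feature map stored as [C][H][W]; returns one mean per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]

def conv1d_same(v, kernel):
    """1-D convolution across the channel axis with zero padding (odd kernel)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(v) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(v))]

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def conv1x1(x, weight):
    """weight is [C_out][C_in]; mixes channels independently at every pixel."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wo[c] * x[c][r][s] for c in range(len(x))) for s in range(w)]
             for r in range(h)] for wo in weight]

def feature_optimization(y, kernel, weight):
    # First sub-branch: one gate in (0, 1) per channel.
    gates = [sigmoid(t) for t in conv1d_same(global_avg_pool(y), kernel)]
    # Identity sub-branch scaled channel by channel, then the 1x1 convolution.
    gated = [[[g * p for p in row] for row in ch] for g, ch in zip(gates, y)]
    return conv1x1(gated, weight)

y = [[[1.0, 3.0]], [[0.0, 0.0]]]        # C=2, H=1, W=2
s = feature_optimization(y, kernel=[0.0, 1.0, 0.0],
                         weight=[[1.0, 0.0], [0.0, 1.0]])
print(s)
```

Because the gate is computed from a per-channel global average, the module rescales whole channels rather than individual pixels, which is what "channel-wise multiplication" refers to.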

2.2.3) Establish a recursive backbone network Q comprising five cascaded convolution modules q₁, q₂, q₃, q₄, q₅, where:

The first convolution module q₁ is formed by cascading a 7×7 standard convolution layer, a batch normalization layer, a ReLU activation layer and a max-pooling downsampling layer;

The second convolution module q₂ consists of 3 cascaded residual blocks in parallel with a 1×1 standard convolution layer;

The third convolution module q₃ consists of 4 cascaded residual blocks in parallel with a 1×1 standard convolution layer;

The fourth convolution module q₄ consists of 6 cascaded residual blocks in parallel with a 1×1 standard convolution layer;

The fifth convolution module q₅ consists of 3 cascaded residual blocks in parallel with a 1×1 standard convolution layer;

In the embodiment of the present invention, the recursive backbone network Q is used to extract multi-scale feature maps. For an input SAR image of width W and height H, the multi-scale feature maps output by the recursive backbone network are:

[formula not reproduced in the source text]

where Z_i is the output feature map of the i-th convolution module q_i, U_{i-1} is the output feature map of the (i-1)-th feature fusion module f_{i-1}^u of the feature optimization pyramid network F, and Conv1×1(·) denotes a 1×1 standard convolution.

2.2.4) Establish a recursive feature optimization pyramid network E with the same structure and parameters as the feature optimization pyramid network F.

In the embodiment of the present invention, the recursive feature optimization pyramid network E performs feature optimization and feature fusion on the multi-scale feature maps Z_i (i = 2, 3, 4, 5) output by the recursive backbone network Q. After feature optimization, Z_i is expressed as:

W_i = f_i^s(Z_{i+1}) = Conv1×1(σ(Conv1d(GAP(Z_{i+1}))) ⊙ Z_{i+1}), i = 1, 2, 3, 4

where W_i is the output feature map of the i-th feature optimization module f_i^s, ⊙ denotes channel-wise multiplication, GAP(·) denotes global average pooling, Conv1d(·) denotes 1-dimensional convolution, σ(·) denotes the Sigmoid function, and Conv1×1(·) denotes a 1×1 standard convolution. After feature fusion, W_i is expressed as:

[formula not reproduced in the source text]

where P_i is the output feature map of the i-th feature fusion module f_i^u, Conv3×3(·) denotes a 3×3 standard convolution, and Up(·) denotes the bilinear interpolation upsampling function.

2.3) Establish a region proposal module G consisting of a region proposal network. The region proposal network comprises, in order, a candidate region generation module, a classification and regression module, a post-processing module and a positive and negative sample assignment module, where:

The candidate region generation module generates, at each point of the input feature map P_i (i = 1, 2, 3, 4), three rectangular candidate regions of area 2^{2i+2} with width-to-height ratios of 1:2, 1:1 and 2:1;

The classification and regression module comprises two parallel 3×3 standard convolution layers g₁ and g₂. The first convolution layer g₁ adjusts the center point position, width and height of each candidate region, and the second convolution layer g₂ predicts the target confidence of each candidate region;

The post-processing module filters out redundant candidate regions and outputs the N candidate regions with the highest target confidence, {(p_j, b_j) | j = 1, 2, ..., N}, where p_j is the target confidence of the j-th candidate region, b_j = (x_j, y_j, w_j, h_j) is its bounding box, (x_j, y_j) are the coordinates of the center point of the bounding box, and (w_j, h_j) are its width and height;

The positive and negative sample assignment module assigns candidate regions as positive or negative samples: candidate regions whose intersection-over-union (IoU) with a labeled bounding box is greater than 0.7 are assigned as positive samples, and candidate regions whose IoU with the labeled bounding boxes is less than 0.3 are assigned as negative samples;
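The candidate region geometry can be worked out directly from the stated area and width-to-height ratios: for area A and ratio r = w/h, the width is sqrt(A·r) and the height is sqrt(A/r). The sketch below takes the area expression 2^{2i+2} as printed; the function names are hypothetical.

```python
import math

def anchor_sizes(level, ratios=(0.5, 1.0, 2.0)):
    """Return (width, height) pairs for pyramid level i = level.

    ratios are width:height values, i.e. 1:2, 1:1 and 2:1.
    """
    area = 2 ** (2 * level + 2)
    return [(math.sqrt(area * r), math.sqrt(area / r)) for r in ratios]

def anchors_at(x, y, level):
    """Candidate regions (x, y, w, h) centred on one feature-map point."""
    return [(x, y, w, h) for w, h in anchor_sizes(level)]

for level in (1, 2, 3, 4):
    print(level, [(round(w, 2), round(h, 2)) for w, h in anchor_sizes(level)])
```

Each level quadruples the anchor area of the previous one, so the side length of the square anchor doubles from level to level.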

In the embodiment of the present invention, the bounding boxes R_G output by the region proposal module G are expressed as:

R_G = {b_j = (x_j, y_j, w_j, h_j) | j = 1, 2, ..., N}.

2.4) Establish a decision-level enhancement module D consisting of three cascaded sub-decision makers d₁, d₂, d₃. The sub-decision makers have the same structure, each comprising, in order, a region-of-interest extraction module, a classification and regression module and a positive and negative sample assignment module, where:

The region-of-interest extraction module is formed by cascading an adaptive average pooling layer and a flattening layer. The adaptive average pooling layer pools the candidate region features into 7×7 features of interest, and the flattening layer flattens them;

The classification and regression module consists of two parallel linear layers. The first linear layer adjusts the position of the bounding box, and the second linear layer predicts the class scores of the bounding box;

The positive and negative sample assignment module assigns bounding boxes as positive or negative samples: bounding boxes whose IoU with a labeled bounding box is greater than the threshold are assigned as positive samples, and bounding boxes whose IoU with the labeled bounding boxes is less than the threshold are assigned as negative samples. The thresholds of the three sub-decision makers are 0.5, 0.6 and 0.7 respectively;
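The effect of the increasing thresholds 0.5, 0.6 and 0.7 can be seen in a short sketch of IoU-based sample assignment. For simplicity the boxes here are in corner format (x1, y1, x2, y2), and the same boxes are fed to all three thresholds, whereas in the cascade each sub-decision maker would receive boxes already refined by the previous one. A box whose IoU equals the threshold is treated as negative, a case the text leaves open.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

CASCADE_THRESHOLDS = (0.5, 0.6, 0.7)  # one threshold per sub-decision maker

def assign_labels(boxes, gt_boxes, threshold):
    """Return 1 (positive) or 0 (negative) for each box."""
    labels = []
    for box in boxes:
        best = max((iou(box, gt) for gt in gt_boxes), default=0.0)
        labels.append(1 if best > threshold else 0)
    return labels

gt = [(0, 0, 10, 10)]
boxes = [(0, 0, 10, 10), (0, 0, 10, 6.5), (0, 0, 10, 5.5), (5, 5, 15, 15)]
for t in CASCADE_THRESHOLDS:
    print(t, assign_labels(boxes, gt, t))
```

Raising the threshold from stage to stage means later sub-decision makers are trained only on progressively better-localized positives, which matches the stated goal of gradually reducing the deviation between predicted and true positions.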

The bounding boxes R_D output by the decision-level enhancement module are expressed as:

[formula not reproduced in the source text]

where N is the number of bounding boxes. Each output pairs the position of the j-th bounding box produced by the third sub-decision maker, given by its center point coordinates and its width and height, with the mean of the class scores of the j-th bounding box output by the three sub-decision makers, expressed as:

[formula not reproduced in the source text]

where N_c is the total number of classes and the averaged terms are the class scores of the j-th bounding box output by the sub-decision makers d₁, d₂, d₃ respectively.
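The score averaging across the three sub-decision makers can be illustrated as an element-wise mean over per-class score vectors. The scores below are made-up values for a single bounding box, and three classes are used instead of the embodiment's seven to keep the example short.

```python
def average_scores(stage_scores):
    """Element-wise mean of the class-score vectors from each sub-decision maker."""
    n = len(stage_scores)
    return [sum(col) / n for col in zip(*stage_scores)]

# Hypothetical class scores of one bounding box from d1, d2 and d3.
scores_d1 = [0.6, 0.3, 0.1]
scores_d2 = [0.7, 0.2, 0.1]
scores_d3 = [0.8, 0.1, 0.1]

mean_scores = average_scores([scores_d1, scores_d2, scores_d3])
predicted_class = max(range(len(mean_scores)), key=mean_scores.__getitem__)
print([round(s, 3) for s in mean_scores], predicted_class)
```

The box position is taken from the third sub-decision maker alone, while the class decision pools the evidence of all three.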

Step 3, train the multi-level enhancement network.

3.1) Randomly sample a group of SAR images from the training set, input them into the multi-level enhancement network to calculate its loss, and update the network parameters with the stochastic gradient descent algorithm based on this loss:

3.1.1) Calculate the loss of the multi-level enhancement network from the loss of the region proposal module and the loss of the decision-level enhancement module, which are respectively expressed as:

[formulas not reproduced in the source text]

where N_G is the number of candidate regions randomly sampled by the region proposal module; p_m and b_m are the target confidence and the bounding box of the m-th candidate region sampled by the region proposal module, each compared against its corresponding ground-truth label; L_cls and L_reg are the cross-entropy classification loss and the CIOU regression loss respectively; N_D is the number of bounding boxes randomly sampled by each sub-decision maker; c_l^i and b_l^i are the class score and the bounding box of the l-th bounding box sampled by the i-th sub-decision maker, each compared against its corresponding ground-truth label; and λ_i is the loss weight of the i-th sub-decision maker. The constraint on λ_i and the expression of the activation function used in the loss are not reproduced in the source text.

3.1.2) Compute the gradient of the multi-level enhancement network loss from 3.1.1) with respect to the network parameters θ, expressed in terms of the loss of the region proposal module and the loss of the decision-level enhancement module:

[formula not reproduced in the source text]

3.1.3) Update the multi-level enhancement network parameters using the gradient obtained in 3.1.2), expressed as:

[formula not reproduced in the source text]

where θ′ denotes the network parameters after the update, θ denotes the network parameters before the update, and lr is the learning rate, which is set according to the batch size of the input images. In the embodiment of the present invention, lr = 0.005.
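As a rough illustration of steps 3.1.3) and 3.2), the sketch below applies a plain gradient descent update, new parameter = old parameter minus lr times gradient, and repeats it until the gradient is negligible. This assumes the textbook update rule without momentum or weight decay, which the source does not confirm, and it uses a toy quadratic loss in place of the network loss; in actual training the gradient would be computed on a randomly sampled group of images at every step.

```python
def sgd_step(params, grads, lr=0.005):
    """Plain gradient descent update applied element-wise."""
    return [p - lr * g for p, g in zip(params, grads)]

def toy_gradient(params):
    """Gradient of the toy loss (theta - 3)^2, standing in for the network loss."""
    return [2.0 * (p - 3.0) for p in params]

theta = [0.0]
for step in range(10000):
    grads = toy_gradient(theta)
    if abs(grads[0]) < 1e-6:  # simple convergence criterion for the sketch
        break
    theta = sgd_step(theta, grads)

print(step, round(theta[0], 4))
```

With lr = 0.005 each update shrinks the distance to the optimum by only one percent, which illustrates why the learning rate is tuned together with the batch size.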

3.2) Repeat step 3.1) until the network converges to obtain the trained multi-level enhancement network.

Step 4, obtain the SAR target detection and recognition results.

Input the SAR images of the test set into the trained multi-level enhancement network to obtain the detection and recognition results.

The effect of the present invention is further illustrated by the following simulation experiments:

1. Simulation conditions:

The software platform of the simulation experiments is the Ubuntu 18.04 operating system with PyTorch 1.8.0, and the hardware configuration is a Core i9-10980XE CPU and an NVIDIA GeForce RTX 3090 GPU.

The simulation experiments use measured Gaofen-3 SAR data. The scene type is airport, the image resolution is 1 m × 1 m, and there are 2000 SAR images in three sizes, 600×600, 1024×1024 and 2048×2048. There are 7 target classes and 6556 targets in total. The training set contains 1400 images and the test set contains 600 images.

2. Simulation content and result analysis:

Under the above simulation conditions, the present invention and the existing "integrated method for SAR image target detection and recognition" were each trained on the training set. A test set image was then selected at random and input into the trained networks, and the detection and recognition results were visualized on the image, as shown in Fig. 3. Fig. 3(a) shows the detection and recognition result of the prior art, and Fig. 3(b) shows that of the present invention. Green rectangles indicate correctly detected and recognized targets, and red rectangles indicate targets that were detected or recognized incorrectly.

Comparing Fig. 3(a) and Fig. 3(b) shows that the prior art produces more false alarms and missed detections, whereas the present invention produces fewer.

The detection and recognition metrics of the present invention and the prior art on all test set images were also compared, including the average precision, average recall, average F1 score and class average precision over the seven target classes. The results are shown in Table 1:

Table 1

Evaluation index           Prior art    The invention
Average precision          81.4%        96.5%
Average recall             78.8%        97.1%
Average F1 score           0.80         0.97
Class average precision    83.1%        97.3%

Table 1 shows that the present invention exceeds the prior art on average precision rate, average recall rate, average F1 score, and class average precision, indicating that its detection and recognition performance is significantly better than that of the prior art.
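As a consistency check on Table 1, the F1 score can be recomputed from precision and recall. The sketch below assumes the standard harmonic-mean definition and applies it to the averaged values in the table. The source reports an average of per-class F1 scores, which need not equal the F1 of the averaged precision and recall exactly, so this is illustrative only. The function name `f1_score` is hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Averaged values from Table 1, converted from percent to fractions.
prior_art_f1 = f1_score(0.814, 0.788)
invention_f1 = f1_score(0.965, 0.971)
print(round(prior_art_f1, 2), round(invention_f1, 2))
```

Rounded to two decimals, both values agree with the F1 row of Table 1.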

Claims

1. A SAR target detection and recognition method based on a multi-level enhancement network, characterized by comprising the following steps:

(1) Acquiring SAR images containing multiple types of targets, labeling the target positions and target classes in each SAR image, and randomly dividing the labeled SAR images into a training set and a test set;

(2) Constructing a multi-level enhancement network:

(2a) Establishing a data-level enhancement module that sequentially performs multi-scale transformation, random flipping, random rotation, power transformation, and random noise addition;

(2b) Establishing a feature-level enhancement module formed by cascading a backbone network A, a feature optimization pyramid network F, a recursive backbone network Q, and a recursive feature optimization pyramid network E;

(2c) Selecting an existing region proposal network to form a region proposal module G, with cross-entropy loss and CIOU loss as the classification and regression losses;

(2d) Establishing a decision-level enhancement module D formed by cascading three sub-deciders d₁, d₂, d₃, with cross-entropy loss and CIOU loss as the classification and regression losses;

(2e) Sequentially cascading the data-level enhancement module, the feature-level enhancement module, the region proposal module, and the decision-level enhancement module to form the multi-level enhancement network;

(3) Training the multi-level enhancement network:

(3a) Randomly sampling a batch of SAR images from the training set, inputting them into the multi-level enhancement network, calculating the loss, and updating the network parameters with the stochastic gradient descent algorithm based on the loss;

(3b) Repeating step (3a) until the network converges, obtaining the trained multi-level enhancement network;

(4) Inputting the SAR images of the test set into the trained multi-level enhancement network to obtain the detection and recognition results.
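Steps (2c) and (2d) name the CIOU loss as the regression loss. The sketch below follows the commonly published CIoU definition (IoU, normalized center distance, and an aspect-ratio term) for a single pair of boxes in the (x, y, w, h) center format used in the claims. The function name `ciou_loss` and the `eps` stabilizer are assumptions for illustration; the source does not give its own implementation.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss between two boxes given as (x_center, y_center, width, height)."""
    px, py, pw, ph = pred
    tx, ty, tw, th = target

    # Convert center format to corner coordinates.
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    t_x1, t_y1, t_x2, t_y2 = tx - tw / 2, ty - th / 2, tx + tw / 2, ty + th / 2

    # Intersection over union.
    inter_w = max(0.0, min(p_x2, t_x2) - max(p_x1, t_x1))
    inter_h = max(0.0, min(p_y2, t_y2) - max(p_y1, t_y1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Squared center distance, normalized by the squared diagonal
    # of the smallest box enclosing both boxes.
    center_dist2 = (px - tx) ** 2 + (py - ty) ** 2
    enc_w = max(p_x2, t_x2) - min(p_x1, t_x1)
    enc_h = max(p_y2, t_y2) - min(p_y1, t_y1)
    diag2 = enc_w ** 2 + enc_h ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + center_dist2 / diag2 + alpha * v


# Hypothetical boxes: the loss shrinks as the prediction approaches the label.
label = (0.0, 0.0, 2.0, 2.0)
print(ciou_loss((10.0, 0.0, 2.0, 2.0), label), ciou_loss((1.0, 0.0, 2.0, 2.0), label))
```

Unlike a plain IoU loss, the center-distance term still provides a useful signal when the two boxes do not overlap at all.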

2. The method of claim 1, wherein the backbone network A in step (2b) comprises 5 cascaded convolution modules a₁, a₂, a₃, a₄, a₅;

the first convolution module a₁ is formed by cascading a 7×7 standard convolution layer, a batch normalization layer, a ReLU activation layer, and a max-pooling downsampling layer;

the second convolution module a₂ is formed by cascading 3 residual blocks;

the third convolution module a₃ is formed by cascading 4 residual blocks;

the fourth convolution module a₄ is formed by cascading 6 residual blocks;

the fifth convolution module a₅ is formed by cascading 3 residual blocks;

the output feature map of the whole backbone network is expressed as:

wherein Y_i is the output feature map of the i-th convolution module a_i, and the input is a SAR image of width W and height H.

3. The method of claim 1, wherein the feature optimization pyramid network F in step (2b) comprises 4 parallel branches f₁, f₂, f₃, f₄, each branch f_i being formed by cascading a feature optimization module f_i^s and a feature fusion module f_i^u;

each feature optimization module f_i^s is formed by cascading two parallel sub-branches with a 1×1 standard convolution layer; the first sub-branch consists, in order, of a global average pooling layer, a 1-dimensional convolution layer, and a Sigmoid activation layer; the second sub-branch is an identity branch;

each feature fusion module f_i^u consists of a 3×3 standard convolution layer;

the output feature map of the whole feature optimization pyramid network F is expressed as:

wherein U_i is the output feature map of the i-th feature fusion module f_i^u, Conv3×3(·) denotes a 3×3 standard convolution, Up(·) denotes bilinear-interpolation upsampling, and S_i is the output feature map of the i-th feature optimization module f_i^s, expressed as:

S_i = f_i^s(Y_{i+1}) = Conv1×1(σ(Conv1d(GAP(Y_{i+1}))) ⊙ Y_{i+1}), i = 1, 2, 3, 4

wherein Y_{i+1} is the output feature map of the (i+1)-th convolution module a_{i+1} of the backbone network, ⊙ denotes channel-wise multiplication, GAP(·) denotes global average pooling, Conv1d(·) denotes a 1-dimensional convolution, σ(·) denotes the Sigmoid function, and Conv1×1(·) denotes a 1×1 standard convolution.

4. The method of claim 1, wherein the recursive backbone network Q in step (2b) comprises 5 cascaded convolution modules q₁, q₂, q₃, q₄, q₅;

the first convolution module q₁ is formed by cascading a 7×7 standard convolution layer, a batch normalization layer, a ReLU activation layer, and a max-pooling downsampling layer;

the second convolution module q₂ is formed by connecting 3 cascaded residual blocks in parallel with a 1×1 standard convolution layer;

the third convolution module q₃ is formed by connecting 4 cascaded residual blocks in parallel with a 1×1 standard convolution layer;

the fourth convolution module q₄ is formed by connecting 6 cascaded residual blocks in parallel with a 1×1 standard convolution layer;

the fifth convolution module q₅ is formed by connecting 3 cascaded residual blocks in parallel with a 1×1 standard convolution layer;

the output feature map of the whole recursive backbone network Q is expressed as:

wherein Z_i is the output feature map of the i-th convolution module q_i, the input is a SAR image of width W and height H, U_{i-1} is the output feature map of the (i-1)-th feature fusion module f_{i-1}^u of the feature optimization pyramid network, and Conv1×1(·) denotes a 1×1 standard convolution.

5. The method of claim 1, wherein the recursive feature optimization pyramid network E in step (2b) has the same structure and parameters as the feature optimization pyramid network F, and its output feature map is expressed as:

wherein P_i is the output feature map of the i-th feature fusion module f_i^u, Conv3×3(·) denotes a 3×3 standard convolution, Up(·) denotes bilinear-interpolation upsampling, and W_i is the output feature map of the i-th feature optimization module f_i^s, expressed as:

W_i = f_i^s(Z_{i+1}) = Conv1×1(σ(Conv1d(GAP(Z_{i+1}))) ⊙ Z_{i+1}), i = 1, 2, 3, 4

wherein Z_{i+1} is the output feature map of the (i+1)-th convolution module q_{i+1} of the recursive backbone network, ⊙ denotes channel-wise multiplication, GAP(·) denotes global average pooling, Conv1d(·) denotes a 1-dimensional convolution, σ(·) denotes the Sigmoid function, and Conv1×1(·) denotes a 1×1 standard convolution.
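The feature optimization formula Conv1×1(σ(Conv1d(GAP(Z))) ⊙ Z) can be traced step by step on a tiny feature map. The following is a minimal pure-Python sketch, not the patented implementation: the nested-list layout (channels, rows, columns), the helper names, the 1-D kernel of length 3, and the identity 1×1 weights are all assumptions chosen to keep the example readable.

```python
import math


def global_average_pool(fmap):
    """GAP: one mean per channel. fmap is a list of C channels, each an H x W nested list."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def conv1d_same(vec, kernel):
    """1-D convolution along the channel axis, zero padded (odd kernel length assumed)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(vec) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel)) for i in range(len(vec))]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv1x1(fmap, weight):
    """1x1 convolution: weight[o][c] mixes input channel c into output channel o at each pixel."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [[[sum(weight[o][c] * fmap[c][y][x] for c in range(len(fmap)))
              for x in range(w)] for y in range(h)] for o in range(len(weight))]


def feature_optimization(fmap, kernel, weight):
    """Conv1x1(sigmoid(Conv1d(GAP(Z))) * Z), with * applied channel by channel."""
    gates = [sigmoid(v) for v in conv1d_same(global_average_pool(fmap), kernel)]
    gated = [[[g * px for px in row] for row in ch] for g, ch in zip(gates, fmap)]
    return conv1x1(gated, weight)


# Hypothetical input with C = 2 channels and H = W = 2.
z = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
w_out = feature_optimization(z, kernel=[0.0, 1.0, 0.0], weight=[[1.0, 0.0], [0.0, 1.0]])
print(w_out)
```

The first sub-branch produces one gate per channel, the identity sub-branch carries the feature map unchanged, and their channel-wise product is mixed by the 1×1 convolution, so the output keeps the spatial size of the input.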

6. The method of claim 1, wherein the region proposal network in step (2c) comprises, in order, a candidate region generation module, a classification regression module, a post-processing module, and a positive and negative sample assignment module;

the candidate region generation module generates, at each point of the input feature map P_i, three rectangular candidate regions with an area of 2²ⁱ⁺² and width-to-height ratios of 1, 1;

the classification regression module comprises two parallel 3×3 standard convolution layers g₁ and g₂; the first convolution layer g₁ adjusts the center position, width, and height of each candidate region, and the second convolution layer g₂ predicts the target confidence of each candidate region;

the post-processing module filters out redundant candidate regions and outputs the N candidate regions with the highest target confidence, {(p_j, b_j) | j = 1, 2, ..., N}, where p_j is the target confidence of the j-th candidate region, b_j = (x_j, y_j, w_j, h_j) is the bounding box of the j-th candidate region, (x_j, y_j) are the center coordinates of the bounding box, and (w_j, h_j) are its width and height;

the positive and negative sample assignment module assigns candidate regions as positive or negative samples: a candidate region whose intersection over union (IoU) with a labeled bounding box is greater than 0.7 is assigned as a positive sample, and a candidate region whose IoU with the labeled bounding box is less than 0.3 is assigned as a negative sample;

the bounding boxes output by the whole region proposal network are expressed as: r_G = {b_j = (x_j, y_j, w_j, h_j) | j = 1, 2, ..., N}.
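The positive and negative sample assignment rule can be sketched as follows for boxes in the (x, y, w, h) center format. The thresholds 0.7 and 0.3 come from the claim. Taking the best match over several labeled boxes and leaving the candidates in between unassigned (label -1) are assumptions, since the claim does not state either point. The function names are hypothetical.

```python
def iou_xywh(a, b):
    """Intersection over union of two boxes given as (x_center, y_center, width, height)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def assign_samples(candidates, labeled_boxes, pos_thr=0.7, neg_thr=0.3):
    """Label each candidate: 1 = positive, 0 = negative, -1 = unassigned (assumed)."""
    labels = []
    for cand in candidates:
        # Assumption: compare against the best-matching labeled box.
        best = max((iou_xywh(cand, gt) for gt in labeled_boxes), default=0.0)
        if best > pos_thr:
            labels.append(1)
        elif best < neg_thr:
            labels.append(0)
        else:
            labels.append(-1)
    return labels


# Hypothetical boxes: an exact match, a distant box, and a partial overlap.
labeled = [(10.0, 10.0, 10.0, 10.0)]
candidates = [(10.0, 10.0, 10.0, 10.0), (30.0, 30.0, 10.0, 10.0), (12.0, 10.0, 10.0, 10.0)]
print(assign_samples(candidates, labeled))
```

The partial overlap has an IoU of 80/120, which falls between the two thresholds, so it is neither positive nor negative under this rule.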

7. The method of claim 1, wherein each of the three cascaded sub-deciders d₁, d₂, d₃ in step (2d) comprises, in order, a region-of-interest extraction module, a classification regression module, and a positive and negative sample assignment module;

the region-of-interest extraction module is formed by cascading an adaptive average pooling layer and a flattening layer, wherein the adaptive average pooling layer pools the candidate region features into 7×7 region-of-interest features and the flattening layer flattens those features;

the classification regression module is formed by connecting two linear layers in parallel; the first linear layer adjusts the position of the bounding box, and the second linear layer predicts the class scores of the bounding box;

the positive and negative sample assignment module assigns bounding boxes as positive or negative samples: a bounding box whose IoU with a labeled bounding box is greater than a threshold is assigned as a positive sample, and a bounding box whose IoU with the labeled bounding box is smaller than the threshold is assigned as a negative sample; the thresholds of the three sub-deciders are set to 0.5, 0.6, and 0.7, respectively;

the bounding boxes output by the decision-level enhancement module are expressed as:

wherein N is the number of bounding boxes; the position of each output bounding box is the position of the j-th bounding box output by the third sub-decider, given by its center coordinates, width, and height; and its class score is the mean of the class scores output by the sub-deciders for the j-th bounding box, expressed as:

wherein N_c is the total number of classes, and the averaged terms are the class scores of the j-th bounding box output by the sub-deciders d₁, d₂, d₃, respectively.
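The score fusion in this claim, where the class scores that the three sub-deciders give to the same bounding box are averaged, can be sketched as below. The function names are hypothetical, and picking the class with the highest mean score is an assumption about how the averaged scores are used.

```python
def mean_class_scores(scores_d1, scores_d2, scores_d3):
    """Average the per-class scores from the three sub-deciders for one bounding box."""
    if not (len(scores_d1) == len(scores_d2) == len(scores_d3)):
        raise ValueError("all sub-deciders must score the same number of classes")
    return [(s1 + s2 + s3) / 3 for s1, s2, s3 in zip(scores_d1, scores_d2, scores_d3)]


def predict_class(scores_d1, scores_d2, scores_d3):
    """Return (class index, mean score) of the highest averaged class score (assumed rule)."""
    mean = mean_class_scores(scores_d1, scores_d2, scores_d3)
    best = max(range(len(mean)), key=mean.__getitem__)
    return best, mean[best]


# Hypothetical scores for one bounding box over two classes.
print(predict_class([0.2, 0.8], [0.4, 0.6], [0.3, 0.7]))
```

Averaging lets the sub-deciders trained with IoU thresholds 0.5, 0.6, and 0.7 all contribute to the final class decision, while the box position is taken from the last one.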

8. The method of claim 1, wherein the loss in step (3a) comprises the loss of the region proposal module and the loss of the decision-level enhancement module, respectively expressed as follows:

wherein N is_G Number of candidate regions, p, randomly sampled for region suggestion module_m And

target confidence and corresponding true label for the mth candidate region sampled by the region suggestion module, respectively, b_m And &>

Bounding boxes and corresponding real labels, L, for the mth candidate region sampled by the region suggestion module, respectively_cls And L_reg Respectively, cross entropy classification loss and CIOU regression loss, N_D For the number of randomly sampled bounding boxes for each sub-decision maker>

And &>

Class scores and corresponding true tags, <' > according to the ith bounding box sampled by the ith sub-decider>

And &>

Respectively for the ith sub-decision makerThe ith bounding box of a sample and the corresponding true label, λ_i Satisfy @, for the lost weight of the ith sub-decider>

For activating a function, the expression is:

the loss of the entire multi-stage enhancement network is expressed as:

9. The method of claim 1, wherein the network parameters in step (3b) are updated by the stochastic gradient descent method, implemented as follows:

(3b1) Computing the gradient of the loss with respect to the parameters of the multi-level enhancement network, expressed as:

wherein the formula involves the loss of the multi-level enhancement network and the losses of the region proposal module and the decision-level enhancement module, and θ denotes the learnable parameters of the multi-level enhancement network;

(3b2) Updating the parameters of the multi-level enhancement network according to the computed gradient, expressed as:

wherein θ′ denotes the network parameters after the update and θ the network parameters before the update; lr is the learning rate, which is set according to the batch size of the input images.
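The parameter update described in this claim matches the usual stochastic gradient descent step, θ′ = θ − lr · gradient. The sketch below applies it to a toy quadratic loss rather than the network loss. The linear scaling of lr with batch size is only one common convention and is an assumption here, since the claim says no more than that lr depends on the batch size. All names and numeric values are hypothetical.

```python
def sgd_step(theta, grad, lr):
    """One stochastic gradient descent update: theta' = theta - lr * grad."""
    return [t - lr * g for t, g in zip(theta, grad)]


def scaled_lr(base_lr, batch_size, base_batch_size):
    """Assumed linear scaling rule: learning rate proportional to batch size."""
    return base_lr * batch_size / base_batch_size


# Toy loss L(theta) = sum(t ** 2 for t in theta), whose gradient is 2 * theta.
theta = [1.0, -2.0]
lr = scaled_lr(base_lr=0.02, batch_size=8, base_batch_size=16)  # hypothetical values
for _ in range(100):
    grad = [2 * t for t in theta]
    theta = sgd_step(theta, grad, lr)
print(theta)
```

In the method itself the gradient would come from backpropagating the network loss on a sampled batch of SAR images, and the loop would run until the convergence condition of step (3b) is met.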