


Technical Field

The present invention belongs to the technical field of synthetic aperture radar (SAR) image interpretation, and relates to a scene-aware data augmentation method for SAR ship detection.
Background Art

Synthetic aperture radar (SAR) is a high-resolution active microwave imaging radar that works day and night and in all weather. Compared with optical sensors, the electromagnetic waves emitted by SAR can penetrate clouds, fog, vegetation, and other occluding objects in complex environments, and imaging is unaffected by the illumination of the observed area, so SAR is widely used in both civil and military fields. For details, see: Ou Yening. Application research of synthetic aperture radar in ship target positioning and imaging technology [J]. Ship Science and Technology, 2019, 41(02): 152-154.

In recent years, ship detection in SAR images has become a research hotspot because it enables convenient marine traffic management, ship oil-spill monitoring, maritime disaster rescue, and so on. Ships in SAR images are important and valuable targets, especially in the national defense and military field, where their detection can effectively protect national maritime rights and interests and provides an effective means of resolving maritime disputes. In particular, SAR operation is unaffected by daylight and weather conditions, so it is especially suitable for the unpredictable marine environment and makes up for the shortcomings of optical sensors. For details, see: Meng Fanchao, Bao Yong. Application of synthetic aperture radar in high-resolution surveillance and mapping of ship targets [J]. Ship Science and Technology, 2018, 40(22): 157-159.

So far, many ship detection algorithms for SAR images have been proposed; the most common and effective ones are the various CFAR-based detectors. Methods of this type rely on a pre-established sea-clutter model and scan the image with a sliding window, deciding whether a window contains a ship according to a detection threshold supplied by the sea-clutter model; common sea-clutter models are based on the Gaussian distribution, the Rayleigh distribution, the K distribution, and so on. However, because the sea-surface background is affected by the surrounding environment and weather, such models struggle to fit the real background clutter distribution, which makes CFAR difficult to apply in more complex scenes. For details, see: Yang Xuezhi, Song Hui, Du Yang, Zhang Xi, Meng Junmin. Ship detection in SAR images based on Rice-CFAR [J]. Journal of Hefei University of Technology (Natural Science Edition), 2015, 38(04): 463-467.

With the development of artificial intelligence, deep learning has been applied to SAR ship detection. Deep-learning-based methods mainly use deep convolutional neural networks to extract ship features automatically, fit the underlying distribution of the data through training, and obtain the coordinates of ships in a SAR image by regression; their accuracy is higher than that of the CFAR-based detectors. Several object detectors originating in computer vision, such as Fast R-CNN, Faster R-CNN, YOLO, and RetinaNet, have been successfully applied to SAR ship detection. However, because inshore areas exhibit strong backscattering, the detection accuracy for inshore (docked) ships is significantly lower than that for offshore ships.

Although these CNN-based SAR ship detectors achieve better detection performance than traditional methods, the detection accuracy for inshore ships remains hard to improve because of the imbalance of sample scenes. To balance the numbers of inshore and offshore samples, a Balance Scene Learning Mechanism (BSLM) for inshore and offshore ship detection in SAR images has been proposed. Based on unsupervised learning, it uses a generative adversarial network (GAN) to extract scene features of SAR images; with these features, binary scene clustering (inshore/offshore) is performed by k-means; finally, the inshore samples are augmented by copying, rotation, or added noise to balance the offshore samples, thereby eliminating scene learning bias, obtaining a balanced learning representation, and improving learning efficiency and detection accuracy. For details, see: T. Zhang et al., "Balance Scene Learning Mechanism for Offshore and Inshore Ship Detection in SAR Images," IEEE Geoscience and Remote Sensing Letters, doi: 10.1109/LGRS.2020.3033988.

Therefore, to address the insufficient detection accuracy of traditional SAR inshore ship detection, the present invention proposes a scene-aware data augmentation method for SAR ship detection.
Summary of the Invention

The present invention belongs to the technical field of synthetic aperture radar (SAR) image interpretation and discloses a scene-aware data augmentation method for SAR ship detection. The method is based on deep learning theory and mainly comprises a scene classification convolutional neural network, scene amplification, and the classic detection network Faster R-CNN. The present invention makes certain improvements to the classic convolutional neural network VGG-11 so that it is better suited to SAR images, and uses this network to divide the images in the training set into two classes, inshore training samples and offshore training samples; scene amplification is then applied to obtain balanced numbers of inshore and offshore training samples; finally, the classic detection network is trained on the processed dataset, performs the detection task, and the detection results are evaluated. The overall detection accuracy of the Faster R-CNN ship detection network using this method is 1.95% higher than that of the prior-art Faster R-CNN ship detection network, and its detection accuracy for inshore ships is 6.61% higher, thereby improving the detection accuracy for inshore ships in SAR images.

For the convenience of describing the content of the present invention, the following terms are first defined:
Definition 1: SSDD dataset acquisition method

The SSDD dataset, short for SAR Ship Detection Dataset, is the first open dataset for ship detection in SAR images. The SSDD data mainly come from the RadarSat-2, TerraSAR-X, and Sentinel-1 sensors and include the four polarization modes HH, HV, VV, and VH. The observation scenes of the SSDD dataset are mainly sea areas and inshore regions. There are 1160 images of 500×500 pixels and 2551 ships in total, an average of 2.20 ships per image; the ships differ in scale, location, and resolution, so the ship targets are diverse. For the method of obtaining the SSDD dataset, see: Li Jianwei, Qu Changwen, Peng Shujuan, Deng Bing. Ship target detection in SAR images based on convolutional neural network [J]. Systems Engineering and Electronics, 2018, 40(09): 1953-1959.
Definition 2: Classic convolutional neural network

A classic convolutional neural network usually consists of an input layer, hidden layers, and an output layer. The input layer can handle multi-dimensional data; in computer vision the input is usually assumed to be three-dimensional, i.e., two-dimensional pixels plus RGB channels. In image detection and recognition, the output layer usually uses a logistic function or a normalized exponential (softmax) function to output classification labels and the corresponding bounding-box coordinates. The hidden layers consist of convolutional layers, nonlinear activation functions, pooling layers, and fully connected layers. A convolutional layer abstracts features into higher dimensions over small rectangular regions of the input; a nonlinear pooling layer shrinks the feature matrix and thereby reduces the number of parameters in subsequent layers; a fully connected layer is equivalent to the hidden layer of a traditional feed-forward neural network and takes the previously abstracted high-dimensional features as input for classification and detection tasks. For details on classic convolutional neural network methods, see: Hu Fuyuan, Li Linyan, Shang Xinru, Shen Junyu, Dai Yongliang. A review of object detection algorithms based on convolutional neural networks [J]. Journal of Suzhou University of Science and Technology (Natural Science Edition), 2020, 37(02): 1-10+25.
Definition 3: Standard fully connected layer method

A fully connected layer is part of a convolutional neural network. Its input and output sizes are fixed, and each of its nodes is connected to all nodes of the previous layer; it is used to combine the features extracted earlier. For details on the fully connected layer method, see: Haoren Wang, Haotian Shi, Ke Lin, Chengjin Qin, Liqun Zhao, Yixiang Huang, Chengliang Liu. A high-precision arrhythmia classification method based on dual fully connected neural network [J]. Biomedical Signal Processing and Control, 2020, 58.
Definition 4: Convolution kernel

A convolution kernel is a node that weights the values in a small rectangular region of the input feature map or image and sums them to produce an output. Each kernel requires several manually specified parameters. One type of parameter is the height and width of the node matrix the kernel processes; the size of this node matrix is the size of the kernel. Another is the depth of the resulting unit node matrix, which is the depth of the kernel. During convolution, each kernel slides over the input data, the inner product between the kernel and the corresponding input region is computed and passed through a nonlinear function, and the results at all positions form a two-dimensional feature map. Each kernel generates one two-dimensional feature map, and the maps generated by multiple kernels are stacked into a three-dimensional feature map. For details on the convolution kernel method, see: Fan Lili, Zhao Hongwei, Zhao Haoyu, Hu Huangshui, Wang Zhen. A review of object detection research based on deep convolutional neural networks [J]. Optics and Precision Engineering, 2020, 28(05): 1152-1164.
Definition 5: Traditional IoU (intersection over union) method

The IoU score is a standard performance measure for object category segmentation problems. Given a set of images, the IoU measure gives the similarity between the predicted region and the ground-truth region of an object, and is defined by the formula IoU(X) = I(X)/U(X), where I(X) and U(X) denote the areas of the intersection and of the union of the predicted bounding box and the ground-truth bounding box, respectively. For the traditional IoU computation method, see: Rahman M A, Wang Y. Optimizing intersection-over-union in deep neural networks for image segmentation [M]// Advances in Visual Computing. Springer International Publishing, 2016: 234-244.
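As an illustration, a minimal axis-aligned IoU computation might look as follows (a sketch in Python; the box format [x1, y1, x2, y2] is an assumption for illustration, not prescribed by the definition above):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes [x1, y1, x2, y2]."""
    # Coordinates of the intersection rectangle I(X).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter  # U(X)
    return inter / union if union > 0 else 0.0
```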
Definition 6: Standard ReLU activation method

The standard ReLU (Rectified Linear Unit), also called the rectified linear function, is an activation function commonly used in artificial neural networks, usually referring to the ramp function and its variants. Its expression is f(x) = max(0, x). The function is identically zero on the negative half-axis and monotonically increasing and differentiable on the positive half-axis, which increases sparsity in the neural network. For the standard ReLU activation method, see: https://www.cnblogs.com/makefile/p/activation-function.html.
Definition 7: Standard batch normalization method

The standard batch normalization (BN) method normalizes scattered data to a common distribution, which makes it easier for the network to learn the regularities in the data. BN is usually treated as a layer placed adjacent to the activation function; it reduces the dynamic range of its input and thereby mitigates overfitting to a certain extent. For the standard batch normalization method, see: https://www.cnblogs.com/shine-lee/p/11989612.html.
Definition 8: Standard max pooling method

The standard max pooling method takes the point with the largest value in a local receptive field. Its main purpose is to reduce the model size, speed up computation, and improve the robustness of the extracted features. For the standard max pooling method, see: https://blog.csdn.net/weixin_43336281/article/details/102149468.
Definition 9: Standard softmax method

The standard softmax method is the generalization of the logistic regression model to multi-class problems. Its expression is S_i = e^{V_i} / Σ_{j=1}^{C} e^{V_j}, where V_i is the output of the i-th output unit of the preceding classifier stage, i is the class index, C is the total number of classes, and S_i is the ratio of the exponential of the current element to the sum of the exponentials of all elements. The softmax output represents the values as relative probabilities over the classes. For the standard softmax method, see: https://blog.csdn.net/qq_32642107/article/details/97270994?utm_medium=distribute.pc_relevant.none-task-blog-2~default~baidujs_baidulandingword~default-0.control&spm=1001.2101.3001.4242.
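For illustration, a numerically stable softmax over a vector of scores can be sketched as follows (subtracting the maximum before exponentiating is a standard stabilization trick and an implementation choice, not part of the definition above):

```python
import math

def softmax(scores):
    """Softmax S_i = exp(V_i) / sum_j exp(V_j), computed stably."""
    m = max(scores)                      # subtract max to avoid overflow
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Example: two-class scene scores (inshore vs. offshore).
print(softmax([2.0, 0.5]))  # -> two probabilities summing to 1
```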
Definition 10: Standard VGG-11 network

The standard VGG-11 network is the VGG variant with 11 weight layers. It is the part of the network used to extract features; it combines different modules, including multiple convolutional layers and pooling layers, and automatically learns useful feature information through training. For details, see: Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition [J]. Computer Science, 2014.
Definition 11: Classic stochastic gradient descent algorithm

The classic stochastic gradient descent (SGD) algorithm is an optimization algorithm that minimizes the loss function of the model to find the optimal parameters. Its characteristic is that the loss and its gradient are computed per sample to update the parameters, so computation is fast. For details on the classic stochastic gradient descent algorithm, see: https://blog.csdn.net/qq_38150441/article/details/80533891.
Definition 12: Recall and precision calculation method

The recall R is the fraction of all positive samples that are predicted correctly, R = TP/(TP + FN). The precision P is the fraction of correct predictions among all results predicted as positive, P = TP/(TP + FP). Here TP (true positive) denotes positive samples predicted as positive by the model; FN (false negative) denotes positive samples wrongly predicted as negative by the model; FP (false positive) denotes negative samples wrongly predicted as positive by the model. The precision-recall curve P(R) is the function with R as the independent variable and P as the dependent variable. For how these quantities are obtained, see: Li Hang. Statistical Learning Methods [M]. Beijing: Tsinghua University Press, 2012.
Definition 13: Standard mAP accuracy evaluation method

mAP stands for mean average precision. In object detection, mAP is used to measure the accuracy of a detection model. The average precision of a class is computed as AP = ∫₀¹ P(R) dR, where P is the precision and R is the recall, and mAP is the mean of AP over all classes (for the single ship class, mAP equals AP). For the standard mAP evaluation method, see: https://www.cnblogs.com/zongfa/p/9783972.html.
Definition 14: Prior-art Faster R-CNN

The prior-art Faster R-CNN is an object detection network composed of two modules: the first is the region proposal network, which proposes locations where targets may appear; the second is the Fast R-CNN network, which performs target classification and bounding-box regression. For the construction of the prior-art Faster R-CNN network, see: Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks [J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6): 1137-1149.
Definition 15: Classic data augmentation method

Classic data augmentation generates new training samples by adding random perturbations to the original data while keeping the class labels unchanged, thereby producing more training samples. Its purpose is to strengthen the generalization ability of the network and improve its metrics. Common augmentation operations include flipping, rotation, scaling, shearing, and so on. For details on classic data augmentation methods, see: https://blog.csdn.net/u010801994/article/details/81914716.
Definition 16: Standard forward propagation method

Standard forward propagation is the most basic operation in deep learning: the input is propagated forward through the network according to its parameters and connections to obtain the network output. For the standard forward propagation method, see: https://www.jianshu.com/p/f30c8daebebb.
Definition 17: Standard non-maximum suppression method

Standard non-maximum suppression (NMS) is an algorithm used in object detection to remove redundant detection boxes. In the forward-propagation results of a classic detection network, the same target often corresponds to multiple detection boxes, so an algorithm is needed to select the single best-quality, highest-scoring box among them. Non-maximum suppression performs a local maximum search using an overlap-ratio threshold. For the standard non-maximum suppression method, see: https://www.cnblogs.com/makefile/p/nms.html.
Definition 18: Standard image mirroring method

Standard image mirroring is divided into horizontal mirroring and vertical mirroring. Horizontal mirroring swaps the left and right halves of the image about its vertical central axis; vertical mirroring swaps the upper and lower halves about its horizontal central axis. For the standard image mirroring method, see: https://blog.csdn.net/qq_30708445/article/details/87881362?utm_medium=distribute.pc_relevant.none-task-blog-2~default~baidujs_baidulandingword~default-0.no_search_link&spm=1001.2101.3001.4242.
Definition 19: Standard dataset merging method

Standard dataset merging combines data from different sources, including merging and renaming images and labels, for further data processing and analysis. For details on the standard dataset merging method, see: https://zhuanlan.zhihu.com/p/97074949.
The present invention provides a scene-aware data augmentation method for SAR ship detection. The overall flow is shown in Figure 1 and comprises the following steps:

Step 1. Prepare the dataset

Obtain the SSDD dataset by the acquisition method of Definition 1. Select the images whose file names end in 1 or 9 as the test set, denoted Test, and take the remaining images as the training set, denoted Train. Label each SAR image in the training set Train as one of two scene classes, inshore or offshore, to obtain a new training set, denoted new_Train.
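For illustration, the train/test split by file-name suffix could be scripted as follows (a sketch only: the directory layout and the .jpg extension are assumptions about the local copy of SSDD):

```python
from pathlib import Path

def split_ssdd(image_dir):
    """Split SSDD images: file stems ending in 1 or 9 -> Test, the rest -> Train."""
    test, train = [], []
    for img in sorted(Path(image_dir).glob("*.jpg")):
        (test if img.stem[-1] in ("1", "9") else train).append(img)
    return train, test

train_imgs, test_imgs = split_ssdd("SSDD/JPEGImages")  # hypothetical path
print(len(train_imgs), "training images,", len(test_imgs), "test images")
```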
Step 2. Build the scene classification network

Following the classic convolutional neural network method of Definition 2, define the input layer, denoted L1, which takes a SAR image of size 224×224×1;

Taking the input layer L1 as input and following the classic convolutional neural network method of Definition 2, build convolutional layer C1 with kernel parameters: size 3×3×64, stride 1;

Activate C1 with the standard ReLU activation of Definition 6 to obtain the activated layer C1act;

Apply the standard batch normalization of Definition 7 to C1act to obtain a 224×224×64 tensor, denoted L2;

Taking L2 as input, apply the standard max pooling of Definition 8 with size 2×2 to obtain a 112×112×64 tensor, denoted L3;

Taking L3 as input and following Definition 2, build convolutional layer C2 with kernel parameters: size 3×3×128, stride 1;

Activate C2 (Definition 6) to obtain C2act;

Batch-normalize C2act (Definition 7) to obtain a 112×112×128 tensor, denoted L4;

Taking L4 as input, apply 2×2 max pooling (Definition 8) to obtain a 56×56×128 tensor, denoted L5;

Taking L5 as input and following Definition 2, build convolutional layer C3 with kernel parameters: size 3×3×256, stride 1;

Activate C3 (Definition 6) to obtain C3act;

Batch-normalize C3act (Definition 7) to obtain a 56×56×256 tensor, denoted L6;

Taking L6 as input and following Definition 2, build convolutional layer C4 with kernel parameters: size 3×3×256, stride 1;

Activate C4 (Definition 6) to obtain C4act;

Batch-normalize C4act (Definition 7) to obtain a 56×56×256 tensor, denoted L7;

Taking L7 as input, apply 2×2 max pooling (Definition 8) to obtain a 28×28×256 tensor, denoted L8;

Taking L8 as input and following Definition 2, build convolutional layer C5 with kernel parameters: size 3×3×512, stride 1;

Activate C5 (Definition 6) to obtain C5act;

Batch-normalize C5act (Definition 7) to obtain a 28×28×512 tensor, denoted L9;

Taking L9 as input and following Definition 2, build convolutional layer C6 with kernel parameters: size 3×3×512, stride 1;

Activate C6 (Definition 6) to obtain C6act;

Batch-normalize C6act (Definition 7) to obtain a 28×28×512 tensor, denoted L10;

Taking L10 as input, apply 2×2 max pooling (Definition 8) to obtain a 14×14×512 tensor, denoted L11;

Taking L11 as input and following Definition 2, build convolutional layer C7 with kernel parameters: size 3×3×512, stride 1;

Activate C7 (Definition 6) to obtain C7act;

Batch-normalize C7act (Definition 7) to obtain a 14×14×512 tensor, denoted L12;

Taking L12 as input and following Definition 2, build convolutional layer C8 with kernel parameters: size 3×3×512, stride 1;

Activate C8 (Definition 6) to obtain C8act;

Batch-normalize C8act (Definition 7) to obtain a 14×14×512 tensor, denoted L13;

Taking L13 as input, apply 2×2 max pooling (Definition 8) to obtain a 7×7×512 tensor, denoted L14;

Taking the 7×7×512 tensor L14 as input, use the standard fully connected layer method of Definition 3 to build a fully connected layer of size 1×1×4096, denoted FC1;

Taking FC1 as input, use Definition 3 to build a fully connected layer of size 1×1×4096, denoted FC2;

Taking FC2 as input, use Definition 3 to build a fully connected layer of size 1×1×Nclass, where Nclass is the number of scene classes, denoted FC-Nclass;

At this point the scene classification network is complete; it is denoted Modified-VGGpre.
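For reference, the layer sequence above can be sketched in PyTorch as follows. This is a sketch under two assumptions not stated in Step 2: each 3×3 convolution uses padding 1 (which preserves the spatial sizes listed above), and ReLU follows the first two fully connected layers as in standard VGG. Batch normalization is placed after ReLU, following the ordering described in Step 2.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # Conv (3x3, stride 1, padding 1) -> ReLU -> BN, per Step 2's ordering.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
        nn.ReLU(inplace=True),
        nn.BatchNorm2d(c_out),
    )

class ModifiedVGG(nn.Module):
    """Scene classifier Modified-VGG: VGG-11-style backbone for 224x224x1 SAR images."""
    def __init__(self, n_class=2):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 64),    nn.MaxPool2d(2),  # C1 -> L3: 112x112x64
            conv_block(64, 128),  nn.MaxPool2d(2),  # C2 -> L5: 56x56x128
            conv_block(128, 256),                   # C3 -> L6
            conv_block(256, 256), nn.MaxPool2d(2),  # C4 -> L8: 28x28x256
            conv_block(256, 512),                   # C5 -> L9
            conv_block(512, 512), nn.MaxPool2d(2),  # C6 -> L11: 14x14x512
            conv_block(512, 512),                   # C7 -> L12
            conv_block(512, 512), nn.MaxPool2d(2),  # C8 -> L14: 7x7x512
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(7 * 7 * 512, 4096), nn.ReLU(inplace=True),  # FC1
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),         # FC2
            nn.Linear(4096, n_class),                             # FC-Nclass
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = ModifiedVGG(n_class=2)
print(model(torch.randn(1, 1, 224, 224)).shape)  # torch.Size([1, 2])
```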
Step 3. Train the scene classification network

Taking the new training set new_Train obtained in Step 1 as input, train and optimize the scene classification network Modified-VGGpre built in Step 2 with the classic stochastic gradient descent algorithm of Definition 11, obtaining the trained and optimized scene classification network, denoted Modified-VGG.
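A minimal training loop for this step might look as follows (a sketch: the batch size, learning rate, and epoch count are placeholder values, and cross-entropy loss is assumed since the classification loss is not named above):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_scene_classifier(model, new_train_dataset, epochs=20, lr=0.001):
    """Train Modified-VGGpre on new_Train with classic SGD (Definition 11)."""
    loader = DataLoader(new_train_dataset, batch_size=32, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()   # assumed classification loss
    model.train()
    for epoch in range(epochs):
        for images, labels in loader:   # labels: 0 = inshore, 1 = offshore
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model  # the trained Modified-VGG
```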
Step 4. Perform scene classification

Taking the training set Train as input, classify all its images with the scene classification network Modified-VGG obtained in Step 3, dividing them into two classes: the first class, inshore scenes, denoted Data1, and the second class, offshore scenes, denoted Data2.
Step 5. Perform scene amplification

Consider the classification results Data1 and Data2 obtained in Step 4. Let the number of images in Data1 be M1 and the number of images in Data2 be M2.

If M1 < M2, apply the standard image mirroring method of Definition 18 to M2 − M1 images randomly selected from the inshore class Data1, obtaining M2 − M1 mirrored images, denoted extra_Data1. Then, using the standard dataset merging method of Definition 19, merge the mirrored images extra_Data1 with the inshore class Data1 to obtain a new inshore scene dataset, denoted new_Data1. Define new_Data2 = Data2.

If M1 > M2, apply the standard image mirroring method of Definition 18 to M1 − M2 images randomly selected from the offshore class Data2, obtaining M1 − M2 mirrored images, denoted extra_Data2. Then, using the standard dataset merging method of Definition 19, merge the mirrored images extra_Data2 with the offshore class Data2 to obtain a new offshore scene dataset, denoted new_Data2. Define new_Data1 = Data1.

Define the new dataset new_Data = {new_Data1, new_Data2}.
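Step 5 can be summarized in code as follows (a sketch using PIL; horizontal mirroring is chosen here for concreteness, and random.sample assumes the smaller class contains at least the required number of images, otherwise sampling with replacement would be needed):

```python
import random
from PIL import Image, ImageOps

def amplify_scenes(data1, data2):
    """Balance inshore (data1) and offshore (data2) image-path lists by
    mirroring randomly chosen images of the smaller class (Step 5)."""
    minority, majority = (data1, data2) if len(data1) < len(data2) else (data2, data1)
    extra = []
    for path in random.sample(minority, len(majority) - len(minority)):
        mirrored = ImageOps.mirror(Image.open(path))   # horizontal mirror
        out_path = str(path).replace(".jpg", "_mirror.jpg")
        mirrored.save(out_path)        # note: the bounding-box labels of the
        extra.append(out_path)         # mirrored image must be flipped too
    return minority + extra, majority  # balanced class, unchanged class
```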
Step 6. Experimental verification on a classic model

Step 6.1. Data augmentation

Taking the new dataset new_Data obtained in Step 5 as input, apply the classic data augmentation method of Definition 15 to new_Data to obtain the augmented SAR image detection training set, denoted DetTrain.
Step 6.2. Build the network

Build an untrained Faster R-CNN network with the classic Faster R-CNN method of Definition 14;

Step 6.3. Train the network

Initialize the image batch size of the untrained network obtained in Step 6.2, denoted Batchsize;

Initialize the learning rate of the untrained network, denoted η;

Initialize the weight decay rate and momentum of the training parameters of the untrained network, denoted DC and MM respectively;

Randomly initialize the parameters of the untrained Faster R-CNN network obtained in Step 6.2, and denote the initialized parameters W;

Using the training set DetTrain from Step 6.1, train the untrained Faster R-CNN network with the classic stochastic gradient descent algorithm of Definition 11, obtaining the loss value of the network, denoted loss.

When the loss value loss of the network falls below the target loss value, stop training; the resulting network parameters are denoted new_W.
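One way to realize Steps 6.2-6.3 is with torchvision's reference Faster R-CNN implementation (a sketch: the ResNet-50-FPN backbone, the target loss value, and the hyperparameter defaults below are illustrative assumptions; the text above only names Batchsize, η, DC, and MM):

```python
import torch
import torchvision
from torch.utils.data import DataLoader

# Step 6.2: untrained Faster R-CNN (num_classes = 2: background + ship).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=None, num_classes=2)

# Step 6.3: SGD training until the loss drops below a target value.
def train_detector(det_train, batchsize=8, eta=0.001, dc=5e-4, mm=0.9,
                   target_loss=0.1, max_epochs=50):
    loader = DataLoader(det_train, batch_size=batchsize, shuffle=True,
                        collate_fn=lambda batch: tuple(zip(*batch)))
    optimizer = torch.optim.SGD(model.parameters(), lr=eta,
                                momentum=mm, weight_decay=dc)
    model.train()
    for _ in range(max_epochs):
        for images, targets in loader:
            loss = sum(model(images, targets).values())  # sum of head losses
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if loss.item() < target_loss:   # stopping rule of Step 6.3
            break
    return model.state_dict()           # new_W
```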
Step 6.4. Evaluate the detection results

Taking the new network parameters new_W obtained in Step 6.3 and the test set Test obtained in Step 1 as input, apply the standard forward propagation method of Definition 16 to obtain the detection results of the Faster R-CNN-based ship detection network, denoted Result.

Taking the detection results Result as input, apply the standard non-maximum suppression method of Definition 17 to remove redundant boxes from Result and retain the highest-scoring detection boxes. The specific steps are as follows (a code sketch follows the list):

(1) First take the highest-scoring box in the detection results Result, denoted BS;

(2) Then, using the IoU computation method of Definition 5, compute the IoU between BS and each remaining box in Result; after discarding the boxes with IoU > 0.5, denote the boxes remaining in Result as RB;

(3) Continue by selecting the highest-scoring box BS from RB;

Repeat the IoU computation and discarding process of (2) until no box remains to be discarded; the boxes finally retained constitute the final detection result, denoted RR.
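The loop above is the standard greedy NMS procedure; a compact NumPy version is sketched below (the 0.5 threshold follows step (2); the box format [x1, y1, x2, y2] is an assumption for illustration):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; boxes are rows [x1, y1, x2, y2]."""
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size > 0:
        bs = order[0]                         # current best box BS
        keep.append(bs)
        rest = order[1:]
        # IoU of BS against all remaining boxes.
        x1 = np.maximum(boxes[bs, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[bs, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[bs, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[bs, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area = lambda b: (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
        iou = inter / (area(boxes[[bs]]) + area(boxes[rest]) - inter)
        order = rest[iou <= iou_thresh]       # discard IoU > 0.5, keep RB
    return keep                               # indices of the final result RR
```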
Taking the Faster R-CNN detection result RR obtained above as input, compute the precision P, the recall R, and the precision-recall curve P(R) of the Faster R-CNN detection using the calculation method of Definition 12; then compute the average precision mAP of the Faster R-CNN network with the standard mAP evaluation method of Definition 13.
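For completeness, the integral AP = ∫ P(R) dR can be approximated numerically from sampled points of the P(R) curve, for example as follows (a sketch; the all-point interpolation below is one common convention for computing AP, not necessarily the one used in the experiments):

```python
import numpy as np

def average_precision(recalls, precisions):
    """Approximate AP = integral of P(R) dR from sampled P-R points."""
    # Pad the curve, then make precision monotonically non-increasing.
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    for i in range(p.size - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the area under the step curve where recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```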
The innovation of the present invention is to use a convolutional neural network to build a scene classification model for data augmentation, which improves the detection accuracy for inshore ships in SAR images. The method classifies the training set into inshore and offshore samples and balances their numbers, so that the ship detection model of the present invention detects inshore ships better: the overall detection accuracy of the Faster R-CNN ship detection network using this method is 1.95% higher than that of the prior-art Faster R-CNN ship detection network, and its detection accuracy for inshore ships is 6.61% higher.

The advantage of the present invention is that it improves the detection accuracy for inshore ships in SAR images, overcoming the insufficient inshore detection accuracy of the prior art, while also improving the overall detection accuracy to a certain extent.
Description of Drawings

Figure 1 is a flowchart of the scene-aware data augmentation method for SAR ship detection of the present invention.

Figure 2 is a schematic diagram of the scene classification network structure of the scene-aware data augmentation method for SAR ship detection of the present invention.

Figure 3 shows the detection accuracy of the scene-aware data augmentation method for SAR ship detection of the present invention.
Detailed Description of the Embodiments
步骤1、准备数据集Step 1. Prepare the dataset
按照定义1获取SSDD数据集方法得到SSDD数据集,选择后缀为1和9的图像作为测试集,记为Test,其他图像作为训练集,记为Train,并将训练集Train中的SAR图像进行标记,分为靠岸场景和离岸场景两类,得到新的训练集,记为new_Train。Obtain the SSDD data set according to Definition 1, select the images with suffixes 1 and 9 as the test set, record it as Test, and use other images as the training set, record it as Train, and mark the SAR images in the training set Train , which can be divided into two types: docking scene and offshore scene, and a new training set is obtained, which is denoted as new_Train.
步骤2、建立场景分类网络Step 2. Establish a scene classification network
按照定义2中经典的卷积神经网络方法,定义输入层,记为L1,输入尺寸为224×224×1的SAR图像;According to the classic convolutional neural network method in Definition 2, define the input layer, denoted as L1, and input a SAR image with a size of 224×224×1;
以输入层L1作为输入,按照定义2中经典的卷积神经网络方法,构建卷积层C1,卷积核参数设置:尺寸设为3×3×64、步长设为1;Taking the input layer L1 as input, according to the classic convolutional neural network method in Definition 2, construct the convolutional layer C1, and set the parameters of the convolution kernel: the size is set to 3×3×64, and the step size is set to 1;
采用定义6中的标准ReLU函数激活方法对卷积层C1进行激活,得到激活后的卷积层C1act;Use the standard ReLU function activation method in Definition 6 to activate the convolutional layer C1 to obtain the activated convolutional layer C1act ;
采用定义7中的标准的批归一化方法对激活后的卷积层C1act进行批归一化处理,得到224×224×64维的向量,记为L2;Use the standard batch normalization method in Definition 7 to perform batch normalization processing on the activated convolutional layer C1act to obtain a 224×224×64-dimensional vector, which is denoted as L2;
以224×224×64维的向量L2作为输入,采用定义8中的标准的最大池化方法对L2进行尺寸为2×2的最大池化,得到112×112×64维的向量,记为L3;Take the 224×224×64-dimensional vector L2 as input, and use the standard maximum pooling method in Definition 8 to perform maximum pooling on L2 with a size of 2×2 to obtain a 112×112×64-dimensional vector, denoted as L3 ;
以112×112×64维的向量L3作为输入,按照定义2中经典的卷积神经网络方法,构建卷积层C2,卷积核参数设置:尺寸设为3×3×128、步长设为1;Taking the 112×112×64-dimensional vector L3 as input, according to the classic convolutional neural network method in Definition 2, construct the convolutional layer C2, and set the convolution kernel parameters: the size is set to 3×3×128, and the step size is set to 1;
采用定义6中的标准ReLU函数激活方法对卷积层C2进行激活,得到激活后的卷积层C2act;Use the standard ReLU function activation method in Definition 6 to activate the convolutional layer C2 to obtain the activated convolutional layer C2act ;
采用定义7中的标准的批归一化方法对激活后的卷积层C2act进行批归一化处理,得到112×112×128维的向量,记为L4;Use the standard batch normalization method in Definition 7 to perform batch normalization on the activated convolutional layer C2act to obtain a 112×112×128-dimensional vector, which is denoted as L4;
以112×112×128维的向量L4作为输入,采用定义8中的标准的最大池化方法对L4进行尺寸为2×2的最大池化,得到56×56×128维的向量,记为L5;Take the 112×112×128-dimensional vector L4 as input, use the standard maximum pooling method in Definition 8 to perform maximum pooling on L4 with a size of 2×2, and obtain a 56×56×128-dimensional vector, denoted as L5 ;
以56×56×128维的向量L5作为输入,按照定义2中经典的卷积神经网络方法,构建卷积层C3,卷积核参数设置:尺寸设为3×3×256、步长设为1;Taking the 56×56×128-dimensional vector L5 as input, according to the classic convolution neural network method in Definition 2, construct the convolution layer C3, and set the convolution kernel parameters: the size is set to 3×3×256, and the step size is set to 1;
采用定义6中的标准ReLU函数激活方法对卷积层C3进行激活,得到激活后的卷积层C3act;Use the standard ReLU function activation method in Definition 6 to activate the convolutional layer C3 to obtain the activated convolutional layer C3act ;
采用定义7中的标准的批归一化方法对激活后的卷积层C3act进行批归一化处理,得到56×56×256维的向量,记为L6;Use the standard batch normalization method in Definition 7 to perform batch normalization on the activated convolutional layer C3act to obtain a 56×56×256-dimensional vector, denoted as L6;
以56×56×256维的向量L6作为输入,按照定义2中经典的卷积神经网络方法,构建卷积层C4,卷积核参数设置:尺寸设为3×3×256、步长设为1;Taking the 56×56×256-dimensional vector L6 as input, according to the classic convolutional neural network method in Definition 2, construct the convolutional layer C4, and set the convolution kernel parameters: the size is set to 3×3×256, and the step size is set to 1;
采用定义6中的标准ReLU函数激活方法对卷积层C4进行激活,得到激活后的卷积层C4act;Use the standard ReLU function activation method in Definition 6 to activate the convolutional layer C4 to obtain the activated convolutional layer C4act ;
采用定义7中的标准的批归一化方法对激活后的卷积层C4act进行批归一化处理,得到56×56×256维的向量,记为L7;Use the standard batch normalization method in Definition 7 to perform batch normalization on the activated convolutional layer C4act to obtain a 56×56×256-dimensional vector, denoted as L7;
以56×56×256维的向量L7作为输入,采用定义8中的标准的最大池化方法对L7进行尺寸为2×2的最大池化,得到28×28×256维的向量,记为L8;Taking the 56×56×256-dimensional vector L7 as input, use the standard maximum pooling method in Definition 8 to perform maximum pooling on L7 with a size of 2×2, and obtain a 28×28×256-dimensional vector, denoted as L8 ;
以28×28×256维的向量L8作为输入,按照定义2中经典的卷积神经网络方法,构建卷积层C5,卷积核参数设置:尺寸设为3×3×512、步长设为1;Taking the 28×28×256-dimensional vector L8 as input, according to the classic convolutional neural network method in Definition 2, construct the convolutional layer C5, and set the convolution kernel parameters: the size is set to 3×3×512, and the step size is set to 1;
采用定义6中的标准ReLU函数激活方法对卷积层C5进行激活,得到激活后的卷积层C5act;Use the standard ReLU function activation method in Definition 6 to activate the convolutional layer C5 to obtain the activated convolutional layer C5act ;
采用定义7中的标准的批归一化方法对激活后的卷积层C5act进行批归一化处理,得到28×28×512维的向量,记为L9;Use the standard batch normalization method in Definition 7 to perform batch normalization on the activated convolutional layer C5act to obtain a 28×28×512-dimensional vector, denoted as L9;
以28×28×512维的向量L9作为输入,按照定义2中经典的卷积神经网络方法,构建卷积层C6,卷积核参数设置:尺寸设为3×3×512、步长设为1;Taking the 28×28×512-dimensional vector L9 as input, according to the classic convolutional neural network method in Definition 2, construct the convolutional layer C6, and set the convolution kernel parameters: the size is set to 3×3×512, and the step size is set to 1;
采用定义6中的标准ReLU函数激活方法对卷积层C6进行激活,得到激活后的卷积层C6act;Use the standard ReLU function activation method in Definition 6 to activate the convolutional layer C6 to obtain the activated convolutional layer C6act ;
采用定义7中的标准的批归一化方法对激活后的卷积层C6act进行批归一化处理,得到28×28×512维的向量,记为L10;Use the standard batch normalization method in Definition 7 to perform batch normalization on the activated convolutional layer C6act to obtain a 28×28×512-dimensional vector, denoted as L10;
以28×28×512维的向量L10作为输入,采用定义8中的标准的最大池化方法对L10进行尺寸为2×2的最大池化,得到14×14×512维的向量,记为L11;Take the 28×28×512-dimensional vector L10 as input, use the standard maximum pooling method in Definition 8 to perform maximum pooling on L10 with a size of 2×2, and obtain a 14×14×512-dimensional vector, denoted as L11 ;
以14×14×512维的向量L11作为输入,按照定义2中经典的卷积神经网络方法,构建卷积层C7,卷积核参数设置:尺寸设为3×3×512、步长设为1;Taking the 14×14×512-dimensional vector L11 as input, according to the classic convolutional neural network method in Definition 2, construct the convolutional layer C7, and set the convolution kernel parameters: the size is set to 3×3×512, and the step size is set to 1;
采用定义6中的标准ReLU函数激活方法对卷积层C7进行激活,得到激活后的卷积层C7act;The convolutional layer C7 is activated by using the standard ReLU function activation method in Definition 6 to obtain the activated convolutional layer C7act ;
采用定义7中的标准的批归一化方法对激活后的卷积层C7act进行批归一化处理,得到14×14×512维的向量,记为L12;Use the standard batch normalization method in Definition 7 to perform batch normalization on the activated convolutional layer C7act to obtain a 14×14×512-dimensional vector, denoted as L12;
以14×14×512维的向量L12作为输入,按照定义2中经典的卷积神经网络方法,构建卷积层C8,卷积核参数设置:尺寸设为3×3×512、步长设为1;Taking the 14×14×512-dimensional vector L12 as input, according to the classic convolutional neural network method in Definition 2, construct the convolutional layer C8, and set the convolution kernel parameters: the size is set to 3×3×512, and the step size is set to 1;
采用定义6中的标准ReLU函数激活方法对卷积层C8进行激活,得到激活后的卷积层C8act;The convolutional layer C8 is activated by using the standard ReLU function activation method in Definition 6 to obtain the activated convolutional layer C8act ;
采用定义7中的标准的批归一化方法对激活后的卷积层C8act进行批归一化处理,得到14×14×512维的向量,记为L13;Use the standard batch normalization method in Definition 7 to perform batch normalization on the activated convolutional layer C8act to obtain a 14×14×512-dimensional vector, denoted as L13;
以14×14×512维的向量L13作为输入,采用定义8中的标准的最大池化方法对L13进行尺寸为2×2的最大池化,得到7×7×512维的向量,记为L14;Take the 14×14×512-dimensional vector L13 as input, use the standard maximum pooling method in Definition 8 to perform maximum pooling on L13 with a size of 2×2, and obtain a 7×7×512-dimensional vector, denoted as L14 ;
以7×7×512维的向量L14作为输入,采用定义3中的标准的全连接层方法,构建尺寸为1×1×4096的全连接层,记为FC1;Take the 7×7×512-dimensional vector L14 as input, adopt the standard fully connected layer method in Definition 3, and construct a fully connected layer with a size of 1×1×4096, denoted as FC1;
以FC1作为输入,采用定义3中的标准的全连接层方法,构建尺寸为1×1×4096的全连接层,记为FC2;Taking FC1 as input, adopt the standard fully connected layer method in Definition 3 to construct a fully connected layer with a size of 1×1×4096, denoted as FC2;
以FC2作为输入,采用定义3中的标准的全连接层方法,构建尺寸为1×1×Nclass的全连接层,Nclass为场景类别数,记为FC-Nclass;Taking FC2 as input, using the standard fully-connected layer method in Definition 3, construct a fully-connected layer with a size of 1×1×Nclass , where Nclass is the number of scene categories, denoted as FC-Nclass ;
至此,场景分类网络构建完毕,记为Modified-VGGpre。So far, the scene classification network has been constructed, which is recorded as Modified-VGGpre .
步骤3、训练场景分类网络Step 3. Training scene classification network
以步骤1中获取得到的新训练集new_Train作为输入,采用定义9中的经典的随机梯度下降算法,对步骤2中建立的场景分类网络Modified-VGGpre进行训练和优化,得到训练和优化之后的场景分类网络,记为Modified-VGG。Take the new training set new_Train obtained in step 1 as input, and use the classic stochastic gradient descent algorithm in definition 9 to train and optimize the scene classification network Modified-VGGpre established in step 2, and obtain the trained and optimized Scene classification network, denoted as Modified-VGG.
步骤4、进行场景分类Step 4. Carry out scene classification
以训练集Train作为输入,通过步骤3中得到的场景分类网络Modified-VGG进行分类,将Train中的所有图片分为两类,第一类为靠岸场景,记为Data1,第二类为离岸场景,记为Data2。Taking the training set Train as input, classify all the pictures in Train through the scene classification network Modified-VGG obtained in step 3, and divide all the pictures in Train into two categories. Shore scene, denoted as Data2.
Step 5. Perform scene amplification
Given the classification results Data1 and Data2 obtained in Step 4, let M1 denote the number of images in Data1 and M2 the number of images in Data2.
If M1 < M2, apply the standard image mirroring method of Definition 18 to M2−M1 images randomly selected from the inshore scene set Data1, obtaining M2−M1 mirrored images, denoted extra_Data1. Then, using the standard dataset merging method of Definition 19, merge extra_Data1 with the inshore scene set Data1 to obtain a new inshore scene dataset, denoted new_Data1, and define new_Data2 = Data2.
If M1 > M2, apply the standard image mirroring method of Definition 18 to M1−M2 images randomly selected from the offshore scene set Data2, obtaining M1−M2 mirrored images, denoted extra_Data2. Then, using the standard dataset merging method of Definition 19, merge extra_Data2 with the offshore scene set Data2 to obtain a new offshore scene dataset, denoted new_Data2, and define new_Data1 = Data1.
Define the new dataset new_Data = {new_Data1, new_Data2}; a sketch of this balancing step appears below.
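The balancing logic of Step 5 condensed into one function. The mirroring operation `mirror` (e.g. a horizontal flip) is passed in because the patent only requires "the standard image mirroring method"; the function name and the implicit assumption that the deficit does not exceed the size of the smaller subset are illustrative.

```python
import random

def balance_by_mirroring(data1, data2, mirror):
    """Balance the inshore/offshore subsets by mirroring images of the smaller one.

    Assumes the deficit |M1 - M2| does not exceed the smaller subset's size,
    as the patent implies by sampling without replacement.
    """
    m1, m2 = len(data1), len(data2)
    if m1 < m2:
        extra = [mirror(img) for img in random.sample(data1, m2 - m1)]  # extra_Data1
        return data1 + extra, data2        # new_Data1, new_Data2 = Data2
    if m1 > m2:
        extra = [mirror(img) for img in random.sample(data2, m1 - m2)]  # extra_Data2
        return data1, data2 + extra        # new_Data1 = Data1, new_Data2
    return data1, data2                    # already balanced

# new_Data = {new_Data1, new_Data2} after calling the function
```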
Step 6. Experimental verification on a classic model
Step 6.1. Data augmentation
Taking the new dataset new_Data obtained in Step 5 as input, apply the classic data augmentation method of Definition 15 to new_Data, obtaining the augmented SAR image detection training set, denoted DetTrain; an illustrative augmentation pipeline is sketched below.
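A hedged stand-in for "the classic data augmentation method of Definition 15": the exact operations live in the patent's definitions, so this flip-and-crop pipeline built from standard torchvision transforms is purely illustrative.

```python
from torchvision import transforms

# Illustrative augmentation pipeline; the specific operations and parameters
# are assumptions, not the patent's Definition 15.
classic_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomCrop(size=224, padding=8),
    transforms.ToTensor(),
])
# DetTrain would be built by applying classic_augment to each image in new_Data.
```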
Step 6.2. Build the network
Build an untrained Faster R-CNN network using the classic Faster R-CNN method of Definition 14; one possible instantiation is sketched below.
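One way to instantiate an untrained Faster R-CNN is torchvision's reference implementation; using it here is an assumption, since the patent only points to "the classic Faster R-CNN method of Definition 14".

```python
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Untrained detector; num_classes = 2 covers background + ship.
detector = fasterrcnn_resnet50_fpn(num_classes=2)
```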
Step 6.3. Train the network
Initialize the image batch size of the untrained network obtained in Step 6.2, denoted Batchsize;
Initialize the learning rate of the untrained network, denoted η;
Initialize the weight decay rate and the momentum of the untrained network's training parameters, denoted DC and MM respectively;
Randomly initialize the parameters of the untrained Faster R-CNN network obtained in Step 6.2, denoting the initialized parameters W;
Using the training set DetTrain from Step 6.1, train the untrained Faster R-CNN network with the classic stochastic gradient descent algorithm of Definition 11, obtaining the network's loss value, denoted loss.
When the loss value falls below the ideal loss value, stop training, obtaining the new network parameters new_W; a hedged training sketch follows.
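A sketch of Step 6.3 under stated assumptions: the hyperparameter values and the stopping threshold are illustrative, `detector` is the torchvision model from the previous sketch, and `det_train_loader` is assumed to yield (images, targets) batches in torchvision's detection format (a list of image tensors and a list of dicts with boxes and labels).

```python
import torch

Batchsize = 4          # image batch size (illustrative)
eta = 0.001            # learning rate η (illustrative)
DC, MM = 5e-4, 0.9     # weight decay rate DC and momentum MM (illustrative)

optimizer = torch.optim.SGD(detector.parameters(), lr=eta,
                            momentum=MM, weight_decay=DC)
ideal_loss = 0.1       # illustrative stand-in for the "ideal loss value"

detector.train()
done = False
while not done:
    for images, targets in det_train_loader:
        # torchvision detection models return a dict of partial losses in train mode
        loss_dict = detector(images, targets)
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if loss.item() < ideal_loss:   # stop once below the ideal loss value
            done = True
            break

new_W = detector.state_dict()          # the trained parameters new_W
```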
Step 6.4. Evaluate the detection results
Taking the new network parameters new_W obtained in Step 6.3 and the test set Test obtained in Step 1 as input, apply the standard forward propagation method of Definition 16 to obtain the detection results of the Faster R-CNN-based ship detection network, denoted Result.
Taking the detection results Result as input, apply the standard non-maximum suppression method of Definition 17 to remove redundant boxes from Result and retain the highest-scoring detection boxes. The specific steps are as follows (a runnable sketch appears after the list):
(1) First, take the highest-scoring box in the detection results Result, denoted BS;
(2) Then, using the IoU (intersection-over-union) computation method of Definition 5, compute the IoU between BS and every remaining box in Result; discard all boxes with IoU > 0.5, and denote the boxes remaining in Result as RB;
(3) Select the highest-scoring box from RB as the new BS;
Repeat the IoU computation and discarding of step (2) until no box remains to be discarded; the remaining boxes constitute the final detection result, denoted RR.
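The greedy loop of steps (1)-(3) is standard non-maximum suppression; a NumPy sketch follows, with the box layout [x1, y1, x2, y2] as an assumed convention.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.5):
    """Greedy NMS as in steps (1)-(3): keep the highest-scoring box BS,
    drop every remaining box whose IoU with BS exceeds the threshold, repeat.
    `boxes` is (N, 4) as [x1, y1, x2, y2]; returns indices of the kept boxes RR."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]            # candidate boxes, best first
    keep = []
    while order.size > 0:
        bs = order[0]                         # (1) highest-scoring box BS
        keep.append(bs)
        # (2) IoU of BS against all remaining boxes
        xx1 = np.maximum(x1[bs], x1[order[1:]])
        yy1 = np.maximum(y1[bs], y1[order[1:]])
        xx2 = np.minimum(x2[bs], x2[order[1:]])
        yy2 = np.minimum(y2[bs], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[bs] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thr]     # discard IoU > 0.5; the rest is RB
    return keep                               # final detection result RR
```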
Taking the Faster R-CNN detection result RR obtained above as input, compute the detection precision P, the recall R, and the precision-recall curve P(R) using the recall and precision computation methods of Definition 12; then, using the standard mAP accuracy evaluation method of Definition 13, compute the mean average precision (mAP) of the Faster R-CNN network. A sketch of these metrics is given below.
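A sketch of the precision/recall/AP computation, assuming detections have already been matched to ground truth (e.g. by IoU ≥ 0.5) and sorted by score. The all-point interpolation used here is one common mAP convention; the patent's Definition 13 may differ in detail. With a single ship class, mAP equals this AP.

```python
import numpy as np

def precision_recall_ap(tp_flags: np.ndarray, n_gt: int):
    """Precision/recall curve P(R) and AP from score-sorted detections.

    tp_flags[i] is 1 if the i-th highest-scoring detection matches a
    ground-truth ship, else 0; n_gt is the number of ground-truth ships.
    """
    tp = np.cumsum(tp_flags)
    fp = np.cumsum(1 - tp_flags)
    recall = tp / n_gt                         # R = TP / (TP + FN)
    precision = tp / (tp + fp)                 # P = TP / (TP + FP)
    # integrate the monotone precision envelope over recall
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(mpre.size - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])    # precision envelope
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    ap = np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1])
    return precision, recall, ap               # single class: mAP == AP
```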