CN118015388A

Movatterモバイル変換

Info

Publication number: CN118015388A
Application number: CN202410424489.9A
Authority: CN
Inventors: 覃仁超; 张岚; 叶承卓; 何飞
Original assignee: Southwest University of Science and Technology
Current assignee: Southwest University of Science and Technology
Priority date: 2024-04-10
Filing date: 2024-04-10
Publication date: 2024-05-10

Abstract

Translated fromChinese

本申请提供了一种小目标检测方法、装置及存储介质，涉及图像检测技术领域，其包括：获取包含小目标的待检测图片，输入至小目标检测模型中；模型包括：添加有注意力机制的特征提取网络，包括预设锚框尺寸和比例、以及新增面积比惩罚项和宽高比惩罚项的损失函数的区域建议网络；通过特征提取网络提取待检测图片中包含小目标信息的特征图；通过区域建议网络基于特征图输出多个候选框，并进行过滤处理，得到预测框；通过感兴趣区域层和全连接层，得到目标概率分数和边界框回归分数，以得到待检测图片对应的小目标检测框及类别。本申请对原有小目标检测模型进行改进，提高模型的检测精度，降低漏检率。

The present application provides a small target detection method, device and storage medium, which relates to the field of image detection technology, and includes: obtaining a picture to be detected containing a small target, and inputting it into a small target detection model; the model includes: a feature extraction network with an attention mechanism added, including a region proposal network with a loss function of preset anchor frame size and ratio, and newly added area ratio penalty term and aspect ratio penalty term; extracting a feature map containing small target information in the picture to be detected through the feature extraction network; outputting multiple candidate frames based on the feature map through the region proposal network, and filtering and processing to obtain a predicted frame; obtaining a target probability score and a bounding box regression score through a region of interest layer and a fully connected layer, so as to obtain a small target detection frame and category corresponding to the picture to be detected. The present application improves the original small target detection model, improves the detection accuracy of the model, and reduces the missed detection rate.

Description

Translated fromChinese

小目标检测方法、装置及存储介质Small target detection method, device and storage medium

技术领域Technical Field

本申请涉及图像处理技术领域，尤其是涉及一种小目标检测方法、装置及存储介质。The present application relates to the field of image processing technology, and in particular to a small target detection method, device and storage medium.

背景技术Background technique

近年来，深度学习与目标检测相结合的这项技术已经趋于成熟，但在小目标检测方面效果还不是很理想，当前，小目标检测在现实很多领域中都有涉及，例如在缺陷检测、航拍图像分析和智慧医疗等发挥着重要的作用。由于小目标像素占比小、语义信息不丰富且易受噪音干扰等问题，导致小目标检测仍然是目标检测领域中的一大难点。Faster R-CNN作为经典的两阶段目标检测算法，在主干网络中提取目标的特征信息，并在RPN网络中生成候选框，最后经过ROI Pooling层综合proposals和feature maps后，送入全连接层判定目标类别。该方法虽然对目标检测的速度和精度都有了一定程度的提高，但无法兼顾较小的局部特征，并且无法保证通用的锚框尺寸和比例适用于小目标数据集，导致检测精度不高和高漏检率等问题。In recent years, the technology combining deep learning with target detection has matured, but the effect of small target detection is still not very ideal. At present, small target detection is involved in many fields in reality, such as defect detection, aerial image analysis and smart medical care. Due to the small pixel ratio of small targets, the lack of semantic information and susceptibility to noise interference, small target detection is still a major difficulty in the field of target detection. Faster R-CNN is a classic two-stage target detection algorithm. It extracts the feature information of the target in the backbone network and generates candidate boxes in the RPN network. Finally, after the ROI Pooling layer integrates proposals and feature maps, it is sent to the fully connected layer to determine the target category. Although this method has improved the speed and accuracy of target detection to a certain extent, it cannot take into account smaller local features and cannot ensure that the general anchor box size and proportion are suitable for small target data sets, resulting in low detection accuracy and high missed detection rate.

发明内容Summary of the invention

本申请的目的在于提供一种小目标检测方法、装置及存储介质，对原有小目标检测模型进行改进，通过添加有MS-ECA注意力机制的特征提取网络，提高较小的局部特征的提取精度，通过聚类分析确定的锚框的尺寸和比例适用于小目标数据集，并结合以面积比和宽高比作为新的惩罚项的损失函数，来全面提高模型的检测精度，降低漏检率。The purpose of this application is to provide a small target detection method, device and storage medium, to improve the original small target detection model, to improve the extraction accuracy of smaller local features by adding a feature extraction network with MS-ECA attention mechanism, to make the size and proportion of the anchor box determined by cluster analysis suitable for small target data sets, and to combine the loss function with area ratio and aspect ratio as new penalty terms to comprehensively improve the detection accuracy of the model and reduce the missed detection rate.

第一方面，本申请提供一种小目标检测方法，方法包括：获取包含小目标的待检测图片；将待检测图片输入至小目标检测模型中；小目标检测模型包括添加有MS-ECA注意力机制的特征提取网络，包括预设锚框尺寸和比例、以及新增面积比惩罚项和宽高比惩罚项的损失函数的区域建议网络；预设锚框尺寸和比例为通过聚类算法对小目标训练样本集进行聚类分析后得到；通过特征提取网络，提取待检测图片中的包含小目标信息的特征图；通过区域建议网络，基于提取的特征图输出多个候选框，并对多个候选框进行过滤处理，得到后续用于计算的预测框；通过感兴趣区域层对预测框进行调整，再经过全连接层后，得到目标概率分数和边界框回归分数，以得到待检测图片对应的小目标检测框及类别。In a first aspect, the present application provides a small target detection method, the method comprising: obtaining a picture to be detected containing a small target; inputting the picture to be detected into a small target detection model; the small target detection model comprises a feature extraction network with an MS-ECA attention mechanism added, including a preset anchor box size and ratio, and a region proposal network with a loss function of newly added area ratio penalty and aspect ratio penalty; the preset anchor box size and ratio are obtained after clustering analysis of a small target training sample set through a clustering algorithm; through the feature extraction network, extracting a feature map containing small target information in the picture to be detected; through the region proposal network, outputting multiple candidate boxes based on the extracted feature map, and filtering the multiple candidate boxes to obtain a prediction box for subsequent calculation; adjusting the prediction box through the region of interest layer, and then after passing through the fully connected layer, obtaining a target probability score and a bounding box regression score to obtain a small target detection box and category corresponding to the picture to be detected.

进一步地，上述通过特征提取网络，提取待检测图片中的包含小目标信息的特征图的步骤，包括：通过特征提取网络中的主干网络提取待检测图片的第一特征图；将第一特征图，按照通道划分为多组特征图，分别用不同大小的卷积核对多组特征图进行卷积，得到多组第二特征图；将卷积后得到的多组第二特征图输入至ECA注意力机制，得到不同的权值，将每个权值与对应的第二特征图进行相乘，得到不同组的第三特征图；将不同组的第三特征图在通道上进行拼接，得到包含小目标信息的特征图。Furthermore, the step of extracting a feature map containing small target information from the image to be detected through a feature extraction network includes: extracting a first feature map of the image to be detected through a backbone network in the feature extraction network; dividing the first feature map into multiple groups of feature maps according to channels, and convolving the multiple groups of feature maps with convolution kernels of different sizes to obtain multiple groups of second feature maps; inputting the multiple groups of second feature maps obtained after the convolution into the ECA attention mechanism to obtain different weights, multiplying each weight by the corresponding second feature map to obtain different groups of third feature maps; and splicing different groups of third feature maps on the channel to obtain feature maps containing small target information.

进一步地，上述小目标检测模型的训练过程如下：获取小目标训练样本集；采用K-means++聚类算法对小目标训练样本集进行聚类分析，确定适用于训练集的锚框尺寸和比例；将小目标训练样本集中的样本输入至小目标检测模型对应的初始检测模型中；通过初始检测模型中的添加有MS-ECA注意力机制的特征提取网络，提取样本特征；通过配置有锚框尺寸和比例的区域建议网络对样本特征进行处理得到预测框；基于包括预测框和真实框的面积比和宽高比的损失函数，确定损失值；基于损失值调整初始检测模型的参数，直至初始检测模型收敛，得到小目标检测模型。Furthermore, the training process of the above-mentioned small target detection model is as follows: obtain a small target training sample set; use the K-means++ clustering algorithm to perform cluster analysis on the small target training sample set to determine the anchor box size and ratio suitable for the training set; input the samples in the small target training sample set into the initial detection model corresponding to the small target detection model; extract sample features through the feature extraction network with the MS-ECA attention mechanism added in the initial detection model; process the sample features through the region proposal network configured with the anchor box size and ratio to obtain a predicted box; determine the loss value based on the loss function including the area ratio and aspect ratio of the predicted box and the real box; adjust the parameters of the initial detection model based on the loss value until the initial detection model converges to obtain the small target detection model.

进一步地，上述获取小目标训练样本集的步骤，包括：获取小目标数据集；对小目标数据集中的图片进行缩放处理，以及图片数据增强处理，得到小目标训练样本集；其中，缩放处理包括：对图片中小目标的真实框宽高值超过阈值的图片，按照缩放比进行缩放处理；缩放比等于目标真实框宽高值中的最小值除以预设值；图片数据增强处理包括：对每张图片进行旋转、翻转以及HSV值调整。Furthermore, the above-mentioned step of obtaining a small target training sample set includes: obtaining a small target data set; scaling the images in the small target data set, and performing image data enhancement processing to obtain a small target training sample set; wherein the scaling processing includes: scaling the images whose real frame width and height values of the small targets in the images exceed a threshold value according to a scaling ratio; the scaling ratio is equal to the minimum value of the target real frame width and height divided by a preset value; the image data enhancement processing includes: rotating, flipping, and adjusting the HSV value of each image.

进一步地，上述采用K-means++聚类算法对小目标训练样本集进行聚类分析，确定适用于训练集的锚框尺寸和比例的步骤，包括：获取训练集中每个样本中小目标对应的真实框的面积；按照两个样本中真实框的面积差小于等于预设面积阈值，确定两个真实框属于一个簇，否则不属于一个簇的原则对所有样本进行判断，确定聚类个数；对训练集中的样本进行聚类中心初始化处理，得到与聚类个数相同数量的初始化聚类中心；对多个初始化聚类中心进行更新处理，得到多个更新聚类中心；基于多个更新聚类中心分别对应的目标真实框，确定为适用于训练集的锚框尺寸和比例。Furthermore, the above-mentioned step of using the K-means++ clustering algorithm to perform cluster analysis on the small target training sample set to determine the size and proportion of the anchor frame suitable for the training set includes: obtaining the area of the real frame corresponding to the small target in each sample in the training set; judging all samples according to the principle that the area difference of the real frames in two samples is less than or equal to a preset area threshold, and determining that the two real frames belong to one cluster, otherwise they do not belong to a cluster, and determining the number of clusters; initializing the cluster centers of the samples in the training set to obtain the same number of initialized cluster centers as the number of clusters; updating multiple initialized cluster centers to obtain multiple updated cluster centers; and determining the size and proportion of the anchor frame suitable for the training set based on the target real frames corresponding to the multiple updated cluster centers.

进一步地，上述对训练集中的样本进行聚类中心初始化处理，得到与聚类个数相同数量的初始化聚类中心的步骤，包括：从训练集中随机选择一个样本作为当前簇中心；基于当前簇中心进行簇中心确定步骤：计算其它每个样本与当前簇中心间的IoU值，将最小IoU值对应的样本作为新的当前簇中心，继续执行簇中心确定步骤，直至确定出的当前簇中心的数量达到聚类个数，将确定出的当前簇中心确定为初始化聚类中心。Furthermore, the above-mentioned step of initializing the cluster centers of the samples in the training set to obtain the same number of initialized cluster centers as the number of clusters includes: randomly selecting a sample from the training set as the current cluster center; performing a cluster center determination step based on the current cluster center: calculating the IoU value between each other sample and the current cluster center, taking the sample corresponding to the minimum IoU value as the new current cluster center, and continuing to perform the cluster center determination step until the number of determined current cluster centers reaches the number of clusters, and determining the determined current cluster centers as the initialized cluster centers.

进一步地，上述对多个初始化聚类中心进行更新处理，得到多个更新聚类中心的步骤，包括：基于多个初始化聚类中心执行样本划分步骤：针对每个样本，计算样本与每个初始化聚类中心的IoU值；将样本归属到IoU值最大的初始化聚类中心对应的簇中；根据每个簇中的中值更新该簇中的聚类中心；中值为该簇中多个样本分别与簇中心的IoU值中，处于中间位置的IoU值对应的样本；基于更新的聚类中心继续执行样本划分步骤，直到每个簇中的样本不再变化，将此时的每个簇中的聚类中心确定为更新聚类中心。Furthermore, the above-mentioned step of updating the multiple initialized cluster centers to obtain multiple updated cluster centers includes: performing a sample division step based on the multiple initialized cluster centers: for each sample, calculating the IoU value of the sample with each initialized cluster center; assigning the sample to the cluster corresponding to the initialized cluster center with the largest IoU value; updating the cluster center in each cluster according to the median value in the cluster; the median is the sample corresponding to the IoU value in the middle position among the IoU values of multiple samples in the cluster with the cluster center; continuing to perform the sample division step based on the updated cluster center until the samples in each cluster no longer change, and determining the cluster center in each cluster at this time as the updated cluster center.

进一步地，上述基于包括预测框和真实框的面积比和宽高比的损失函数，确定损失值的步骤，包括：按照以下算式计算损失值AWH-IoU：Furthermore, the step of determining the loss value based on the loss function including the area ratio and the aspect ratio of the predicted box and the real box includes: calculating the loss valueAWH-IoU according to the following formula:

； ;

其中，IoU表示输出结果中的预测框与真实框之间的面积交并比；和/>分别代表预测框和真实框的中心坐标值，/>表示预测框和真实框的中心距离，表示能够包围预测框和真实框最小闭合区域的对角线距离，/>和/>分别表示预测框和真实框的面积，/>，/>分别表示真实框和预测框的宽高值；/>表示指数函数。Among them,IoU represents the area intersection-over-union ratio between the predicted box and the true box in the output result; and/> Represent the center coordinates of the predicted box and the real box respectively,/> Represents the center distance between the predicted box and the true box, Indicates the diagonal distance of the minimum closed area that can enclose the predicted box and the real box, /> and/> Represent the area of the predicted box and the real box respectively,/> ,/> Respectively represent the width and height of the real box and the predicted box; /> Represents an exponential function.

第二方面，本申请还提供一种小目标检测装置，装置包括用于执行第一方面任一项所述的小目标检测方法的步骤的多个模块，多个模块包括图片获取模块、图片输入模块、特征提取模块和预测模块，其中：图片获取模块，用于获取包含小目标的待检测图片；图片输入模块，用于将待检测图片输入至小目标检测模型中；小目标检测模型包括添加有MS-ECA注意力机制的特征提取网络，包括预设锚框尺寸和比例、以及新增面积比惩罚项和宽高比惩罚项的损失函数的区域建议网络；预设锚框尺寸和比例为通过聚类算法对小目标训练样本集进行聚类分析后得到；特征提取模块，用于通过特征提取网络，提取待检测图片中的包含小目标信息的特征图；预测模块，用于通过区域建议网络，基于提取的特征图输出多个候选框，并对多个所述候选框进行过滤处理，得到后续用于计算的预测框；通过感兴趣区域层对预测框进行调整，再经过全连接层后，得到目标概率分数和边界框回归分数，以得到待检测图片对应的小目标检测框及类别。In a second aspect, the present application also provides a small target detection device, which includes multiple modules for executing the steps of the small target detection method described in any one of the first aspects, and the multiple modules include an image acquisition module, an image input module, a feature extraction module and a prediction module, wherein: the image acquisition module is used to acquire a picture to be detected containing a small target; the image input module is used to input the picture to be detected into a small target detection model; the small target detection model includes a feature extraction network with an MS-ECA attention mechanism added, including a region proposal network with a preset anchor box size and ratio, and a loss function with newly added area ratio penalty terms and aspect ratio penalty terms; the preset anchor box size and ratio are obtained after clustering analysis of a small target training sample set by a clustering algorithm; the feature extraction module is used to extract a feature map containing small target information in the picture to be detected through the feature extraction network; the prediction module is used to output multiple candidate boxes based on the extracted feature map through the region proposal network, and filter the multiple candidate boxes to obtain a prediction box for subsequent calculation; the prediction box is adjusted through the region of interest layer, and then after passing through the fully connected layer, the target probability score and the bounding box regression score are obtained to obtain the small target detection box and category corresponding to the picture to be detected.

第三方面，本申请还提供一种计算机可读存储介质，计算机可读存储介质存储有计算机可执行指令，计算机可执行指令在被处理器调用和执行时，计算机可执行指令促使处理器实现上述第一方面所述的方法。In a third aspect, the present application further provides a computer-readable storage medium, which stores computer-executable instructions. When the computer-executable instructions are called and executed by a processor, the computer-executable instructions prompt the processor to implement the method described in the first aspect above.

本申请提供的小目标检测方法、装置及存储介质中，首先获取包含小目标的待检测图片，将待检测图片输入至小目标检测模型中；该小目标检测模型包括添加有MS-ECA注意力机制的特征提取网络，包括预设锚框尺寸和比例、以及新增面积比惩罚项和宽高比惩罚项的损失函数的区域建议网络；预设锚框尺寸和比例为通过聚类算法对小目标训练样本集进行聚类分析后得到的锚框；然后通过特征提取网络，提取待检测图片中的包含小目标信息的特征图；通过区域建议网络，基于提取的特征图输出多个候选框，并对多个候选框进行过滤处理，得到后续用于计算的预测框；通过感兴趣区域层对预测框进行调整，再经过全连接层后，得到目标概率分数和边界框回归分数，以得到待检测图片对应的小目标检测框及类别。本申请能够对原有小目标检测模型进行改进，通过添加有MS-ECA注意力机制的特征提取网络，提高较小的局部特征的提取精度，通过聚类分析确定的锚框的尺寸和比例适用于小目标数据集，并结合以面积比和宽高比作为新的惩罚项的损失函数，来全面提高模型的检测精度，降低漏检率。In the small target detection method, device and storage medium provided by the present application, a picture to be detected containing a small target is first obtained, and the picture to be detected is input into a small target detection model; the small target detection model includes a feature extraction network with an MS-ECA attention mechanism added, including a region proposal network with a preset anchor frame size and ratio, and a loss function with a newly added area ratio penalty term and aspect ratio penalty term; the preset anchor frame size and ratio are the anchor frames obtained after clustering analysis of a small target training sample set by a clustering algorithm; then, through the feature extraction network, a feature map containing small target information in the picture to be detected is extracted; through the region proposal network, multiple candidate frames are output based on the extracted feature map, and the multiple candidate frames are filtered to obtain a prediction frame for subsequent calculation; the prediction frame is adjusted through the region of interest layer, and after passing through the fully connected layer, the target probability score and the bounding box regression score are obtained to obtain the small target detection frame and category corresponding to the picture to be detected. This application can improve the original small target detection model by adding a feature extraction network with MS-ECA attention mechanism to improve the extraction accuracy of smaller local features. The size and proportion of the anchor box determined by cluster analysis are suitable for small target data sets. It is combined with a loss function with area ratio and aspect ratio as new penalty terms to comprehensively improve the detection accuracy of the model and reduce the missed detection rate.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

为了更清楚地说明本申请具体实施方式或现有技术中的技术方案，下面将对具体实施方式或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图是本申请的一些实施方式，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to more clearly illustrate the specific implementation methods of the present application or the technical solutions in the prior art, the drawings required for use in the specific implementation methods or the description of the prior art will be briefly introduced below. Obviously, the drawings described below are some implementation methods of the present application. For ordinary technicians in this field, other drawings can be obtained based on these drawings without paying any creative work.

图1为本申请实施例提供的一种小目标检测方法的流程图；FIG1 is a flow chart of a small target detection method provided in an embodiment of the present application;

图2为本申请实施例提供的一种Faster R-CNN的示意图；FIG2 is a schematic diagram of a Faster R-CNN provided in an embodiment of the present application;

图3为本申请实施例提供的一种Resnet-50的示意图；FIG3 is a schematic diagram of a Resnet-50 provided in an embodiment of the present application;

图4为本申请实施例提供的一种MS_ECA注意力机制的原理示意图；FIG4 is a schematic diagram of the principle of an MS_ECA attention mechanism provided in an embodiment of the present application;

图5为本申请实施例提供的一种K-means++聚类算法的流程图；FIG5 is a flow chart of a K-means++ clustering algorithm provided in an embodiment of the present application;

图6为本申请实施例提供的一种结果对比示意图；FIG6 is a schematic diagram of a result comparison provided in an embodiment of the present application;

图7为本申请实施例提供的一种小目标检测装置的结构框图。FIG. 7 is a structural block diagram of a small target detection device provided in an embodiment of the present application.

具体实施方式Detailed ways

下面将结合实施例对本申请的技术方案进行清楚、完整地描述，显然，所描述的实施例是本申请一部分实施例，而不是全部的实施例。基于本申请中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本申请保护的范围。The technical solution of the present application will be described clearly and completely in conjunction with the embodiments below. Obviously, the described embodiments are part of the embodiments of the present application, not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by ordinary technicians in this field without creative work are within the scope of protection of the present application.

目前，由于小目标特征难提取这一问题，注意力机制作为一种即插即用的结构，被广泛应用到目标检测领域当中，但是，当下流行的几种注意力机制，例如：SE、CBAM以及ECA等，其主要是对输入的特征图进行同尺寸下特征放大和增强，这样的操作对小目标的特征增强效果不明显；此外，最近出现一种以对抗生成网络来获取高分辨率图像的方法，该方法虽然能增强小目标在图像上的信息，但是会极大的增加检测网络的负担，所以并不适用于二阶段检测方法。其次，建议框是检测效果好坏最直观的表现，而在候选框筛选阶段，当前应用最多的损失函数为：IoU、G-IoU、D-IoU和C-IoU等，这些损失函数在一定程度上能够对中大型目标的候选框进行很好效果的过滤作用，但面对小目标，特别是有明显长宽比例的小目标，还不能发挥很好的效果。At present, due to the problem that small target features are difficult to extract, the attention mechanism, as a plug-and-play structure, is widely used in the field of target detection. However, several popular attention mechanisms, such as SE, CBAM, and ECA, mainly amplify and enhance the features of the input feature map at the same size. Such operations have little effect on the feature enhancement of small targets. In addition, a method of obtaining high-resolution images with adversarial generative networks has recently emerged. Although this method can enhance the information of small targets on the image, it will greatly increase the burden of the detection network, so it is not suitable for the two-stage detection method. Secondly, the proposed box is the most intuitive manifestation of the detection effect. In the candidate box screening stage, the most commonly used loss functions are IoU, G-IoU, D-IoU, and C-IoU. These loss functions can filter the candidate boxes of medium and large targets to a certain extent, but they cannot play a good effect on small targets, especially small targets with obvious length-to-width ratios.

综上，现有两阶段小目标检测方法主要面临的问题是：In summary, the main problems faced by the existing two-stage small target detection method are:

1、锚框大小和比例的设计不适用于小目标；1. The design of anchor box size and ratio is not suitable for small objects;

2、小目标的特征提取部分，无法提取丰富的小目标信息；2. The feature extraction part of small targets cannot extract rich small target information;

3、候选框的筛选部分，由于损失函数的惩戒力度太小，导致效果差的候选框也能作为预测框参与后续操作。3. In the screening part of the candidate boxes, since the penalty of the loss function is too small, the candidate boxes with poor results can also be used as prediction boxes to participate in subsequent operations.

基于此，本申请实施例提供一种小目标检测方法、装置及存储介质，对原有小目标检测模型进行改进，通过添加有MS-ECA注意力机制的特征提取网络，提高较小的局部特征的提取精度，通过聚类分析确定的锚框的尺寸和比例适用于小目标数据集，并结合以面积比和宽高比作为新的惩罚项的损失函数，来全面提高模型的检测精度，降低漏检率。Based on this, the embodiments of the present application provide a small target detection method, device and storage medium, which improve the original small target detection model, improve the extraction accuracy of smaller local features by adding a feature extraction network with MS-ECA attention mechanism, and the size and proportion of the anchor frame determined by clustering analysis are suitable for small target data sets, and combine with the loss function with area ratio and aspect ratio as new penalty terms to comprehensively improve the detection accuracy of the model and reduce the missed detection rate.

为便于对本实施例进行理解，首先对本申请实施例所公开的一种小目标检测方法进行详细介绍。To facilitate understanding of this embodiment, a small target detection method disclosed in an embodiment of the present application is first introduced in detail.

图1为本申请实施例提供的一种小目标检测方法的流程图，该方法包括以下步骤：FIG1 is a flow chart of a small target detection method provided in an embodiment of the present application, the method comprising the following steps:

步骤S102，获取包含小目标的待检测图片；待检测图片中的小目标可以是癌细胞，也可以是车牌等不同的类别，也就是说，本申请提供的小目标检测模型可以是癌细胞检测模型，也可以是车牌检测模型等。Step S102, obtaining a picture to be detected containing a small target; the small target in the picture to be detected can be a cancer cell, or a license plate and other different categories, that is, the small target detection model provided in the present application can be a cancer cell detection model, or a license plate detection model, etc.

步骤S104，将待检测图片输入至小目标检测模型中；小目标检测模型包括添加有MS-ECA（Multi-scale Split Efficient Channel Attention，多尺度分割高效通道注意力）机制的特征提取网络，包括预设锚框尺寸和比例、以及新增面积比惩罚项和宽高比惩罚项的损失函数的区域建议网络；预设锚框尺寸和比例为通过聚类算法对小目标训练样本集进行聚类分析后得到；Step S104, inputting the image to be detected into a small target detection model; the small target detection model includes a feature extraction network with an MS-ECA (Multi-scale Split Efficient Channel Attention) mechanism added, including a region proposal network with a preset anchor box size and ratio, and a loss function with newly added area ratio penalty terms and aspect ratio penalty terms; the preset anchor box size and ratio are obtained by clustering analysis of a small target training sample set through a clustering algorithm;

本申请实施例中的小目标检测模型是基于Faster R-CNN（其原有结构如图2所示）进行改进得到的，在特征提取网络中添加了MS-ECA注意力机制，可以丰富提取的小目标信息；在区域建议网络中用到的损失函数中，增加了以候选框和真实框的面积比和宽高比构成的新惩罚项；在候选框过滤时基于上述预设锚框尺寸和比例进行处理，其中，预设锚框尺寸和比例为通过聚类算法对小目标训练样本集进行聚类分析后得到；这样可以针对具有明显长宽比例的小目标，发挥很好的效果，提高模型预测精度。The small target detection model in the embodiment of the present application is improved based on Faster R-CNN (whose original structure is shown in Figure 2). The MS-ECA attention mechanism is added to the feature extraction network to enrich the extracted small target information; in the loss function used in the region proposal network, a new penalty term consisting of the area ratio and aspect ratio of the candidate box and the real box is added; when filtering the candidate box, it is processed based on the above-mentioned preset anchor box size and ratio, wherein the preset anchor box size and ratio are obtained by clustering analysis of the small target training sample set through a clustering algorithm; this can achieve a good effect for small targets with obvious length-to-width ratios and improve the model prediction accuracy.

步骤S106，通过特征提取网络，提取待检测图片中的包含小目标信息的特征图。Step S106, extracting a feature map containing small target information in the image to be detected through a feature extraction network.

通过添加了MS-ECA注意力机制的特征提取网络进行特征提取，可以得到包含更丰富的小目标信息的特征图。By adding the feature extraction network with the MS-ECA attention mechanism for feature extraction, a feature map containing richer small target information can be obtained.

步骤S108，通过区域建议网络，基于提取的特征图输出多个候选框，并对多个候选框进行过滤处理，得到后续用于计算的预测框；通过感兴趣区域层对预测框进行调整，再经过全连接层后，得到目标概率分数和边界框回归分数，以得到待检测图片对应的小目标检测框及类别。Step S108, through the region proposal network, multiple candidate boxes are output based on the extracted feature map, and the multiple candidate boxes are filtered to obtain the prediction box used for subsequent calculation; the prediction box is adjusted through the region of interest layer, and then after passing through the fully connected layer, the target probability score and bounding box regression score are obtained to obtain the small target detection box and category corresponding to the image to be detected.

本申请实施例提供的小目标检测方法中，为了解决在主干提取网络部分，由于小目标本身分辨率低而导致小目标检测不准确的问题，主要想法是设计一个能在多尺度上进行特征增强的注意力机制，即MS-ECA注意力机制，将其应用在特征提取部分，以此获取更丰富的小目标信息；其次，针对候选框筛选过程，本实施例中提供一种考虑候选框和真实框之间的面积比以及长宽比作为惩罚项的损失函数，以此来增强小目标，尤其是长宽比明显的小目标在过滤候选框时的惩罚力度，进而降低对小目标的漏检率；另外，本申请实施例中模型训练过程中的锚框尺寸和比例基于对样本集的聚类算法分析得到，更适用于小目标的模型训练，从而提高模型检测效果。In the small target detection method provided in the embodiment of the present application, in order to solve the problem of inaccurate small target detection in the backbone extraction network part due to the low resolution of the small target itself, the main idea is to design an attention mechanism that can perform feature enhancement at multiple scales, namely the MS-ECA attention mechanism, and apply it to the feature extraction part to obtain richer small target information; secondly, for the candidate box screening process, the present embodiment provides a loss function that considers the area ratio and the aspect ratio between the candidate box and the real box as penalty terms, so as to enhance the penalty intensity of small targets, especially small targets with obvious aspect ratios when filtering candidate boxes, thereby reducing the missed detection rate of small targets; in addition, the anchor box size and proportion in the model training process in the embodiment of the present application are obtained based on the clustering algorithm analysis of the sample set, which is more suitable for model training of small targets, thereby improving the model detection effect.

本申请实施例还提供另一种小目标检测方法，该方法在上述实施例的基础上实现；本实施例重点描述特征提取过程、模型训练过程。The embodiment of the present application also provides another small target detection method, which is implemented on the basis of the above embodiment; this embodiment focuses on describing the feature extraction process and the model training process.

上述通过特征提取网络，提取待检测图片中的包含小目标信息的特征图的步骤，包括：The above step of extracting a feature map containing small target information from the image to be detected through a feature extraction network includes:

通过特征提取网络中的主干网络提取待检测图片的第一特征图；将第一特征图，按照通道划分为多组特征图，分别用不同大小的卷积核对多组特征图进行卷积，得到多组第二特征图；将卷积后得到的多组第二特征图输入至ECA注意力机制，得到不同的权值，将每个权值与对应的第二特征图进行相乘，得到不同组的第三特征图；将不同组的第三特征图在通道上进行拼接，得到包含小目标信息的特征图。The first feature map of the image to be detected is extracted through the backbone network in the feature extraction network; the first feature map is divided into multiple groups of feature maps according to the channel, and the multiple groups of feature maps are convolved with convolution kernels of different sizes to obtain multiple groups of second feature maps; the multiple groups of second feature maps obtained after the convolution are input into the ECA attention mechanism to obtain different weights, and each weight is multiplied by the corresponding second feature map to obtain different groups of third feature maps; the third feature maps of different groups are spliced on the channel to obtain feature maps containing small target information.

本申请实施例中，主干网络为Resnet-50主干提取网络，如图3所示，由一个卷积层和四个残差层组成，在Resnet-50的第一个残差块和第二个残差块后添加MS_ECA注意力机制，从而可以丰富提取的小目标特征信息。In an embodiment of the present application, the backbone network is a Resnet-50 backbone extraction network, as shown in FIG3 , which is composed of a convolutional layer and four residual layers. The MS_ECA attention mechanism is added after the first residual block and the second residual block of Resnet-50, thereby enriching the extracted small target feature information.

具体实施时，参见图4所示的MS_ECA注意力机制的原理示意图，首先，输入特征图像尺寸为，根据特征图的大小按照通道划分为多组，特征图越大，被分成的组越少，将分成的多组特征图分别用不同大小的卷积核进行卷积，为保证最后的输出尺寸和输入尺寸一致，在使用大的卷积核进行卷积时，通过增大padding值来保证特征图尺寸的不变性，将卷积后得到的特征图输入到ECA注意力机制得到不同的权值，把该权值与输入特征图相乘，最后将不同组的特征图在通道上进行拼接输出，以此使网络更加关注小目标的信息。In the specific implementation, referring to the principle diagram of the MS_ECA attention mechanism shown in Figure 4, first, the input feature image size is According to the size of the feature map, it is divided into multiple groups according to the channel. The larger the feature map, the fewer groups it is divided into. The multiple groups of feature maps are convolved with convolution kernels of different sizes. To ensure that the final output size is consistent with the input size, when using a large convolution kernel for convolution, the padding value is increased to ensure the invariance of the feature map size. The feature map obtained after convolution is input into the ECA attention mechanism to obtain different weights, which are multiplied by the input feature map. Finally, the feature maps of different groups are spliced and output on the channel, so that the network pays more attention to the information of small targets.

下面详细说明本申请实施例中小目标检测模型的训练过程：The following is a detailed description of the training process of the small target detection model in the embodiment of the present application:

（1）获取小目标训练样本集；(1) Obtain a small target training sample set;

首先获取小目标数据集；然后对小目标数据集中的图片进行缩放处理，以及图片数据增强处理，得到小目标训练样本集；其中，缩放处理包括：对图片中小目标的真实框宽高值超过阈值的图片，按照缩放比进行缩放处理；缩放比等于目标真实框宽高值中的最小值除以预设值；图片数据增强处理包括：对每张图片进行旋转、翻转以及HSV值调整。First, a small target data set is obtained; then, the images in the small target data set are scaled and the image data is enhanced to obtain a small target training sample set; wherein the scaling process includes: for images whose real frame width and height values of small targets in the image exceed a threshold, scaling is performed according to a scaling ratio; the scaling ratio is equal to the minimum value of the target real frame width and height divided by a preset value; the image data enhancement process includes: rotating, flipping and adjusting the HSV value of each image.

对公开数据集FlickrLogos-32进行预处理。按照小目标的定义，读取每张图片所对应标注文件的目标宽高值，若宽值或高值大于32px，那么对整张图片进行缩放，缩放后的图像左上角与原图像左上角对齐，其余部分填充黑色构成一张新的图像作为训练集，随后，将每张新的图像通过旋转、翻转以及调整图像HSV值等方式对数据集进行数据增强，缩放比公式如下：The public dataset FlickrLogos-32 is preprocessed. According to the definition of small targets, the target width and height values of the annotation file corresponding to each image are read. If the width or height value is greater than 32px, the entire image is scaled, and the upper left corner of the scaled image is aligned with the upper left corner of the original image, and the rest is filled with black to form a new image as a training set. Subsequently, each new image is rotated, flipped, and the image HSV value is adjusted to enhance the data set. The scaling ratio formula is as follows:

； ;

其中，ratio表示缩放比；和/>分别代表每个目标的宽高值，表示每个目标的宽高值中的最小值；32是小目标的绝对尺寸定义。Among them,ratio represents the scaling ratio; and/> Represent the width and height of each target respectively. Indicates the minimum value of the width and height of each object; 32 is the absolute size definition of a small object.

（2）采用K-means++聚类算法对小目标训练样本集进行聚类分析，确定适用于训练集的锚框尺寸和比例；(2) Use the K-means++ clustering algorithm to perform cluster analysis on the small target training sample set to determine the anchor box size and proportion suitable for the training set;

具体实施过程可以结合图5所示的步骤进行说明，通过以下步骤实现：The specific implementation process can be described in conjunction with the steps shown in FIG5 , and is implemented by the following steps:

A、获取训练集中每个样本中小目标对应的真实框的面积；按照两个样本中真实框的面积差小于等于预设面积阈值，确定两个真实框属于一个簇，否则不属于一个簇的原则对所有样本进行判断，确定聚类个数；A. Obtain the area of the real box corresponding to the small target in each sample in the training set; determine that the two real boxes belong to the same cluster if the area difference between the real boxes in two samples is less than or equal to the preset area threshold, otherwise judge all samples and determine the number of clusters;

比如，首先读取每张图片对应的标注文件，获取每个目标的真实框宽高值方便后续计算，再设置面积阈值，用于确定聚类个数，具体公式如下：For example, first read the annotation file corresponding to each image, obtain the real frame width and height of each target for subsequent calculations, and then set the area threshold , used to determine the number of clusters, the specific formula is as follows:

其中，和/>分别代表两个样本的面积，/>表明做对比的两个样本不属于一个簇，/>则表明两个样本属于一个簇，/>表示两个样本是否属于同一簇的结果。in, and/> Represent the area of two samples respectively,/> Indicates that the two samples being compared do not belong to the same cluster, /> This indicates that the two samples belong to one cluster. The result indicating whether two samples belong to the same cluster.

B、对训练集中的样本进行聚类中心初始化处理，得到与聚类个数相同数量的初始化聚类中心；B. Initialize the cluster centers of the samples in the training set to obtain the same number of initialized cluster centers as the number of clusters;

从训练集中随机选择一个样本作为当前簇中心；基于当前簇中心进行簇中心确定步骤：计算其它每个样本与当前簇中心间的IoU值，将最小IoU值对应的样本作为新的当前簇中心，继续执行簇中心确定步骤，直至确定出的当前簇中心的数量达到聚类个数，将确定出的当前簇中心确定为初始化聚类中心。A sample is randomly selected from the training set as the current cluster center; the cluster center determination step is performed based on the current cluster center: the IoU value between each other sample and the current cluster center is calculated, and the sample corresponding to the minimum IoU value is used as the new current cluster center. The cluster center determination step is continued until the number of current cluster centers determined reaches the number of clusters, and the current cluster center determined is determined as the initialization cluster center.

IoU值可按如下公式计算：The IoU value can be calculated as follows:

； ;

其中，代表交集，/>代表并集，IoU即为面积交并比，分别代表由两个真实框构成的组合框的左上角坐标值和右下角坐标值，分别代表两个真实框的面积。in, represents the intersection, /> represents the union,IoU is the intersection over union ratio, Respectively represent the coordinate values of the upper left corner and the lower right corner of the combo box composed of two real boxes. Represent the areas of the two real boxes respectively.

C、对多个初始化聚类中心进行更新处理，得到多个更新聚类中心；C. Updating multiple initialized cluster centers to obtain multiple updated cluster centers;

具体实施时，可以基于多个初始化聚类中心执行样本划分步骤：针对每个样本，计算样本与每个初始化聚类中心的IoU值；将样本归属到IoU值最大的初始化聚类中心对应的簇中；根据每个簇中的中值更新该簇中的聚类中心；中值为该簇中多个样本分别与簇中心的IoU值中，处于中间位置的IoU值对应的样本；基于更新的聚类中心继续执行样本划分步骤，直到每个簇中的样本不再变化，将此时的每个簇中的聚类中心确定为更新聚类中心。In specific implementation, the sample division step can be performed based on multiple initialized cluster centers: for each sample, the IoU value between the sample and each initialized cluster center is calculated; the sample is assigned to the cluster corresponding to the initialized cluster center with the largest IoU value; the cluster center in each cluster is updated according to the median value in the cluster; the median is the sample corresponding to the IoU value in the middle position among the IoU values of multiple samples in the cluster with the cluster center; the sample division step is continued based on the updated cluster center until the samples in each cluster no longer change, and the cluster center in each cluster at this time is determined as the updated cluster center.

D、基于多个更新聚类中心分别对应的目标真实框，确定适用于训练集的锚框尺寸和比例。也就确定出适合该训练集的锚框的尺寸和比例，比如最后聚类出一共9个聚类中心，那么这9个聚类中心就是锚框信息，例如(宽，高)：(5,5)、(15,15)、(20,20)、(10,10)、(30,30)、(40,40)、(20,20)、(60,60)、(90,90)，那么最后的尺寸和比例就是，尺寸：(10,10)、(30,30)、(40,40)，比例：(1:1)、(1:2)、(2:1)。需要说明的是，要对聚类结果分析后，选择尽可能包含所有聚类中心的尺寸和比例最后尺寸。本申请实施例中，可以选择任意多个，比例是根据预设锚框的个数，最后通过：聚类中心/预设锚框数得到，如果聚类中心是12，那可以选择尺寸个数：1、2、3、4、6、12，对应的比例个数：12、6、4、3、2、1；即尺寸个数和比例个数相乘，得到聚类中心个数。D. Based on the target real boxes corresponding to multiple updated cluster centers, determine the size and ratio of the anchor box suitable for the training set. In other words, determine the size and ratio of the anchor box suitable for the training set. For example, if a total of 9 cluster centers are finally clustered, then these 9 cluster centers are the anchor box information, such as (width, height): (5,5), (15,15), (20,20), (10,10), (30,30), (40,40), (20,20), (60,60), (90,90), then the final size and ratio are, size: (10,10), (30,30), (40,40), ratio: (1:1), (1:2), (2:1). It should be noted that after analyzing the clustering results, select the final size that contains the size and ratio of all cluster centers as much as possible. In the embodiment of the present application, any number can be selected, and the ratio is based on the number of preset anchor frames, and finally obtained by: cluster center/preset number of anchor frames. If the cluster center is 12, then the number of sizes can be selected: 1, 2, 3, 4, 6, 12, and the corresponding number of ratios is: 12, 6, 4, 3, 2, 1; that is, the number of sizes and the number of ratios are multiplied to get the number of cluster centers.

（3）将小目标训练样本集中的样本输入至小目标检测模型对应的初始检测模型中；这里的初始检测模型也就是上述结构改进后的Faster R-CNN。(3) Input the samples in the small target training sample set into the initial detection model corresponding to the small target detection model; the initial detection model here is the Faster R-CNN with the improved structure mentioned above.

（4）通过初始检测模型中的添加有MS-ECA注意力机制的特征提取网络，提取样本特征；通过配置有锚框尺寸和比例的区域建议网络对样本特征进行处理得到预测框；基于包括预测框和真实框的面积比和宽高比的损失函数，确定损失值；也就是根据候选框和真实框来计算损失值。具体的计算过程如下：(4) Extract sample features through the feature extraction network with MS-ECA attention mechanism added in the initial detection model; process sample features through the region proposal network configured with anchor box size and ratio to obtain prediction box; determine the loss value based on the loss function including the area ratio and aspect ratio of the prediction box and the real box; that is, calculate the loss value based on the candidate box and the real box. The specific calculation process is as follows:

按照以下算式计算损失值AWH-IoU：The loss valueAWH-IoU is calculated according to the following formula:

； ;

其中，IoU表示输出结果中的候选框与真实框之间的面积交并比；和/>分别代表候选框和真实框的中心坐标值，/>表示候选框和真实框的中心距离，表示能够包围候选框和真实框最小闭合区域的对角线距离，/>和/>分别表示候选框和真实框的面积，/>，/>分别表示真实框和候选框的宽高值；/>表示指数函数。Among them,IoU represents the area intersection-over-union ratio between the candidate box and the real box in the output result; and/> Represent the center coordinates of the candidate box and the real box respectively,/> Indicates the center distance between the candidate box and the real box, Indicates the diagonal distance of the minimum closed area that can enclose the candidate box and the real box, /> and/> Represent the area of the candidate box and the real box respectively,/> ,/> Respectively represent the width and height of the real box and the candidate box; /> Represents an exponential function.

（5）基于损失值调整初始检测模型的参数，直至初始检测模型收敛，得到小目标检测模型。(5) Adjust the parameters of the initial detection model based on the loss value until the initial detection model converges to obtain a small target detection model.

上面的过程即为模型训练过程，在模型应用时，同样可以通过区域建议网络，基于提取的目标特征输出多个候选框，并对多个候选框进行过滤处理，得到后续用于计算的预测框；通过感兴趣区域层对预测框进行调整，再经过全连接层后，得到目标概率分数和边界框回归分数，以得到待检测图片对应的小目标检测框及类别。The above process is the model training process. When the model is applied, the region proposal network can also be used to output multiple candidate boxes based on the extracted target features, and the multiple candidate boxes can be filtered to obtain the prediction box used for subsequent calculations; the prediction box is adjusted through the region of interest layer, and then after passing through the fully connected layer, the target probability score and bounding box regression score are obtained to obtain the small target detection box and category corresponding to the image to be detected.

具体的过程如下：The specific process is as follows:

RPN网络（如图2中的区域建议网络）使用NMS（Non-Maximum Suppression，非极大抑制算法）对候选框进行过滤，而NMS就是以anchor boxes之间的IoU值作为过滤条件，本申请实施例在此基础上还引入候选框与真实框的面积比和宽高比作为新的惩罚项。首先通过检测模型生成一组候选框，每个候选框都有其坐标信息和置信度得分，对生成的候选框按照它们的置信度的得分降序排列，并计算所有候选框的AWH_IoU值。然后根据得到的AWH_IoU值对候选框进行降序排序，选择具有最高AWH_IoU值的候选框，将其添加到最终结果列表中，再遍历剩余候选框，计算它们与选择的候选框的AWH_IoU值，如果剩余候选框与任何已选择的候选框AWH_IoU值大于设定的阈值(0.5)，则丢弃该候选框，否则将其添加到最终列表中。重复上述过程直到所有候选框都被处理完毕。最后，最终的结果列表中包含的候选框即为输出的预测框。进而通过感兴趣区域层对预测框进行调整，再经过全连接层后，得到目标概率分数和边界框回归分数，以得到待检测图片对应的小目标检测框及类别。The RPN network (such as the region proposal network in Figure 2) uses NMS (Non-Maximum Suppression) to filter the candidate boxes, and NMS uses the IoU value between anchor boxes as the filtering condition. On this basis, the embodiment of the present application also introduces the area ratio and aspect ratio of the candidate box to the real box as a new penalty item. First, a set of candidate boxes are generated by the detection model, each of which has its coordinate information and confidence score. The generated candidate boxes are arranged in descending order according to their confidence scores, and the AWH_IoU values of all candidate boxes are calculated. Then, the candidate boxes are sorted in descending order according to the obtained AWH_IoU values, and the candidate box with the highest AWH_IoU value is selected and added to the final result list. Then, the remaining candidate boxes are traversed and their AWH_IoU values with the selected candidate boxes are calculated. If the AWH_IoU value of the remaining candidate boxes with any selected candidate box is greater than the set threshold (0.5), the candidate box is discarded, otherwise it is added to the final list. Repeat the above process until all candidate boxes have been processed. Finally, the candidate boxes included in the final result list are the output prediction boxes. The prediction boxes are then adjusted through the region of interest layer, and after passing through the fully connected layer, the target probability score and bounding box regression score are obtained to obtain the small target detection box and category corresponding to the image to be detected.

本申请实施例提供的一种小目标检测方法中，通过K-means++对训练集进行聚类，分别设置了不同的聚类中心个数，聚类结果表明，随着簇的个数增加，真实框的平均IoU值也在不断增加，意味着在设置聚类中心个数时，使用较大的K值能更好的反应样本的分布情况。最后将K值设置为12后，能得到四个尺寸和三种不同比例的锚框信息，相比于原FasterR-CNN通过经验所得的三个尺寸和三种不同比例的锚框信息，本申请实施例提供的方法更能适用于小目标的锚框设计。In a small target detection method provided in an embodiment of the present application, the training set is clustered by K-means++, and different numbers of cluster centers are set. The clustering results show that as the number of clusters increases, the average IoU value of the real box is also increasing, which means that when setting the number of cluster centers, using a larger K value can better reflect the distribution of samples. Finally, after setting the K value to 12, four sizes and three different proportions of anchor box information can be obtained. Compared with the three sizes and three different proportions of anchor box information obtained by the original FasterR-CNN through experience, the method provided in the embodiment of the present application is more suitable for the design of anchor boxes for small targets.

此外，在Resnet-50第一层和第二层残差模块后，通过添加MS_ECA注意力机制，使得特征提取网络能更有效的关注小目标的信息，从而降低小目标的漏检率，并且在使用惩罚力度更大的AWH_IoU后，可以让RPN在过滤候选框时，考虑到框之间的面积比以及宽高比，这样能更有效的过滤掉一些IoU值很大，但是效果很差的候选框，进而提升对小目标的检测精度。In addition, by adding the MS_ECA attention mechanism after the first and second residual modules of Resnet-50, the feature extraction network can pay more attention to the information of small targets, thereby reducing the missed detection rate of small targets. After using the AWH_IoU with a stronger penalty, RPN can consider the area ratio and aspect ratio between frames when filtering candidate frames. This can more effectively filter out some candidate frames with large IoU values but poor effects, thereby improving the detection accuracy of small targets.

小目标检测是计算机视觉领域中的一个挑战性问题，通常涉及到在图像或视频中识别和定位极小尺寸的目标物体。这些小目标可能由于低分辨率、遮挡或噪声等因素而难以准确检测。现有的小目标检测方法通常依赖于复杂的神经网络架构和大量的训练数据。因此，需要一种更有效、精确和可扩展的小目标检测系统和方法。本申请实施例提出的一种基于多尺度注意力机制的小目标检测方法，该方法在提高检测准确性的同时，具有很好的可扩展性。以下是该方法的主要构思：Small target detection is a challenging problem in the field of computer vision, which usually involves identifying and locating extremely small target objects in images or videos. These small targets may be difficult to detect accurately due to factors such as low resolution, occlusion or noise. Existing small target detection methods usually rely on complex neural network architectures and large amounts of training data. Therefore, a more effective, accurate and scalable small target detection system and method is needed. A small target detection method based on a multi-scale attention mechanism proposed in an embodiment of the present application has good scalability while improving detection accuracy. The following are the main ideas of the method:

1、小目标增强技术：本申请实施例提供一种增强小目标特征信息的技术，能在主干提取网络部分中，增强低分辨率或模糊图像中的目标特征，以提高检测性能；1. Small target enhancement technology: The embodiment of the present application provides a technology for enhancing the feature information of small targets, which can enhance the target features in low-resolution or blurred images in the backbone extraction network part to improve the detection performance;

2、多尺度检测：本申请实施例提供的小目标检测模型，能够支持多尺度的目标检测，以确保即使在不同距离和尺寸下，也能够准确的检测小目标；2. Multi-scale detection: The small target detection model provided in the embodiment of the present application can support multi-scale target detection to ensure that small targets can be accurately detected even at different distances and sizes;

3、丰富的训练样本：本申请实施例使用翻转和旋转等方法，对原始数据集进行数据增强，从而扩展训练集，提高模型的鲁棒性；3. Rich training samples: The embodiment of the present application uses methods such as flipping and rotation to perform data enhancement on the original data set, thereby expanding the training set and improving the robustness of the model;

4、更有效的候选框筛选技术：除考虑到IoU带来的影响，本申请实施例增加候选框之间的面积比以及宽高比作为新的惩罚项，从而在候选框筛选过程中，能更有效的过滤掉效果差的候选框。4. More effective candidate box screening technology: In addition to considering the impact of IoU, the embodiment of the present application adds the area ratio and aspect ratio between candidate boxes as new penalty items, so that in the candidate box screening process, candidate boxes with poor effects can be filtered out more effectively.

可行性和有益效果：Feasibility and beneficial effects:

1、本申请实施例所有实验均在Windows系统下进行，使用pytorch作为深度学习框架，硬件环境：GPU为NVIDIA GTX3050，并搭有CUDA10.2；1. All experiments in the embodiments of this application are carried out under Windows system, using pytorch as the deep learning framework, and the hardware environment: GPU is NVIDIA GTX3050, and is equipped with CUDA10.2;

2、本申请实施例具有较高的检测准确性，尤其在小目标检测方面表现出色；2. The embodiments of the present application have high detection accuracy, especially in the detection of small targets;

3、本申请实施例所提供的MS_ECA注意力机制和AWH_IoU能适用于各种检测模型，具有很好的扩展性；3. The MS_ECA attention mechanism and AWH_IoU provided in the embodiments of the present application can be applied to various detection models and have good scalability;

实验结果及分析：Experimental results and analysis:

改进后的检测模型采用了一种新的神经网络架构，具有更好的特征提取能力，由图6所示的实验结果可以看出，改进后的算法在增强后的FlickrLogos-32数据集上检测精度达到了92.7%，相比于原Faster R-CNN提升了12%，所以MS_ECA注意力机制和AWH_IoU能有效的提升对小目标的检测精度。The improved detection model adopts a new neural network architecture with better feature extraction capabilities. From the experimental results shown in Figure 6, it can be seen that the improved algorithm has a detection accuracy of 92.7% on the enhanced FlickrLogos-32 dataset, which is 12% higher than the original Faster R-CNN. Therefore, the MS_ECA attention mechanism and AWH_IoU can effectively improve the detection accuracy of small targets.

基于上述方法实施例，本申请实施例还提供一种小目标检测装置，装置包括用于执行上面所述的小目标检测方法的步骤的多个模块，参见图7所示，多个模块包括图片获取模块82、图片输入模块84、特征提取模块86和预测模块88，其中：图片获取模块82，用于获取包含小目标的待检测图片；图片输入模块84，用于将待检测图片输入至小目标检测模型中；小目标检测模型包括添加有MS-ECA注意力机制的特征提取网络，包括预设锚框尺寸和比例、以及新增面积比惩罚项和宽高比惩罚项的损失函数的区域建议网络；预设锚框尺寸和比例为通过聚类算法对小目标训练样本集进行聚类分析后得到的锚框；特征提取模块86，用于通过特征提取网络，提取待检测图片中的包含小目标信息的特征图；预测模块88，用于通过区域建议网络，基于提取的特征图输出多个候选框，并对多个候选框进行过滤处理，得到后续用于计算的预测框；通过感兴趣区域层对预测框进行调整，再经过全连接层后，得到目标概率分数和边界框回归分数，以得到待检测图片对应的小目标检测框及类别。Based on the above method embodiment, the embodiment of the present application also provides a small target detection device, which includes multiple modules for executing the steps of the small target detection method described above, as shown in Figure 7, the multiple modules include an image acquisition module 82, an image input module 84, a feature extraction module 86 and a prediction module 88, wherein: the image acquisition module 82 is used to obtain a picture to be detected containing a small target; the image input module 84 is used to input the picture to be detected into the small target detection model; the small target detection model includes a feature extraction network with an MS-ECA attention mechanism added, including a preset anchor frame size and ratio, and a newly added area ratio penalty term and aspect ratio A region proposal network with a loss function of a penalty term; the preset anchor frame size and ratio are the anchor frames obtained after clustering analysis of the small target training sample set through a clustering algorithm; a feature extraction module 86 is used to extract a feature map containing small target information in the image to be detected through a feature extraction network; a prediction module 88 is used to output multiple candidate frames based on the extracted feature map through a region proposal network, and filter the multiple candidate frames to obtain a prediction frame for subsequent calculation; the prediction frame is adjusted through the region of interest layer, and then after passing through the fully connected layer, the target probability score and the bounding box regression score are obtained to obtain the small target detection frame and category corresponding to the image to be detected.

进一步地，上述特征提取模块86，用于通过特征提取网络中的主干网络提取待检测图片的第一特征图；将第一特征图，按照通道划分为多组特征图，分别用不同大小的卷积核对多组特征图进行卷积，得到多组第二特征图；将卷积后得到的多组第二特征图输入至ECA注意力机制，得到不同的权值，将每个权值与对应的第二特征图进行相乘，得到不同组的第三特征图；将不同组的第三特征图在通道上进行拼接，得到包含小目标信息的特征图。Furthermore, the feature extraction module 86 is used to extract a first feature map of the image to be detected through a backbone network in the feature extraction network; divide the first feature map into multiple groups of feature maps according to channels, and convolve the multiple groups of feature maps with convolution kernels of different sizes to obtain multiple groups of second feature maps; input the multiple groups of second feature maps obtained after convolution into the ECA attention mechanism to obtain different weights, multiply each weight by the corresponding second feature map, and obtain different groups of third feature maps; splice the different groups of third feature maps on the channel to obtain a feature map containing small target information.

进一步地，上述装置还包括训练模块，用于执行以下小目标检测模型的训练过程：获取小目标训练样本集；采用K-means++聚类算法对小目标训练样本集进行聚类分析，确定适用于训练集的锚框尺寸和比例；将小目标训练样本集中的样本输入至小目标检测模型对应的初始检测模型中；通过初始检测模型中的添加有MS-ECA注意力机制的特征提取网络，提取样本特征；通过配置有锚框尺寸和比例的区域建议网络对样本特征进行处理得到预测框；基于包括预测框和真实框的面积比和宽高比的损失函数，确定损失值；基于损失值调整初始检测模型的参数，直至初始检测模型收敛，得到小目标检测模型。Furthermore, the above-mentioned device also includes a training module, which is used to execute the following training process of the small target detection model: obtaining a small target training sample set; using the K-means++ clustering algorithm to perform cluster analysis on the small target training sample set to determine the anchor box size and ratio suitable for the training set; inputting the samples in the small target training sample set into the initial detection model corresponding to the small target detection model; extracting sample features through a feature extraction network with an MS-ECA attention mechanism added in the initial detection model; processing the sample features through a region proposal network configured with an anchor box size and ratio to obtain a predicted box; determining a loss value based on a loss function including an area ratio and an aspect ratio of the predicted box and the true box; adjusting the parameters of the initial detection model based on the loss value until the initial detection model converges to obtain a small target detection model.

进一步地，上述训练模块，用于获取小目标数据集；对小目标数据集中的图片进行缩放处理，以及图片数据增强处理，得到小目标训练样本集；其中，缩放处理包括：对图片中小目标的真实框宽高值超过阈值的图片，按照缩放比进行缩放处理；缩放比等于目标真实框宽高值中的最小值除以预设值；图片数据增强处理包括：对每张图片进行旋转、翻转以及HSV值调整。Furthermore, the above-mentioned training module is used to obtain a small target data set; scale the images in the small target data set, and perform image data enhancement processing to obtain a small target training sample set; wherein the scaling processing includes: scaling the images whose real frame width and height values of the small targets in the images exceed the threshold according to the scaling ratio; the scaling ratio is equal to the minimum value of the target real frame width and height divided by a preset value; the image data enhancement processing includes: rotating, flipping and adjusting the HSV value of each image.

进一步地，上述训练模块，用于获取训练集中每个样本中小目标对应的真实框的面积；按照两个样本中真实框的面积差小于等于预设面积阈值，确定两个真实框属于一个簇，否则不属于一个簇的原则对所有样本进行判断，确定聚类个数；对训练集中的样本进行聚类中心初始化处理，得到与聚类个数相同数量的初始化聚类中心；对多个初始化聚类中心进行更新处理，得到多个更新聚类中心；基于多个更新聚类中心分别对应的目标真实框，确定适用于训练集的锚框尺寸和比例。Furthermore, the above-mentioned training module is used to obtain the area of the real box corresponding to the small target in each sample in the training set; according to the principle that the area difference of the real boxes in two samples is less than or equal to a preset area threshold, it is determined that the two real boxes belong to one cluster, otherwise all samples are judged to determine the number of clusters; the cluster center is initialized for the samples in the training set to obtain the same number of initialized cluster centers as the number of clusters; multiple initialized cluster centers are updated to obtain multiple updated cluster centers; based on the target real boxes corresponding to the multiple updated cluster centers, the size and proportion of the anchor box suitable for the training set are determined.

进一步地，上述训练模块，用于从训练集中随机选择一个样本作为当前簇中心；基于当前簇中心进行簇中心确定步骤：计算其它每个样本与当前簇中心间的IoU值，将最小IoU值对应的样本作为新的当前簇中心，继续执行簇中心确定步骤，直至确定出的当前簇中心的数量达到聚类个数，将确定出的当前簇中心确定为初始化聚类中心。Furthermore, the above-mentioned training module is used to randomly select a sample from the training set as the current cluster center; perform a cluster center determination step based on the current cluster center: calculate the IoU value between each other sample and the current cluster center, and use the sample corresponding to the minimum IoU value as the new current cluster center, and continue to execute the cluster center determination step until the number of determined current cluster centers reaches the number of clusters, and the determined current cluster center is determined as the initialization cluster center.

进一步地，上述训练模块，用于基于多个初始化聚类中心执行样本划分步骤：针对每个样本，计算样本与每个初始化聚类中心的IoU值；将样本归属到IoU值最大的初始化聚类中心对应的簇中；根据每个簇中的中值更新该簇中的聚类中心；中值为该簇中多个样本分别与簇中心的IoU值中，处于中间位置的IoU值对应的样本；基于更新的聚类中心继续执行样本划分步骤，直到每个簇中的样本不再变化，将此时的每个簇中的聚类中心确定为更新聚类中心。Furthermore, the above-mentioned training module is used to perform a sample division step based on multiple initialized cluster centers: for each sample, calculate the IoU value between the sample and each initialized cluster center; assign the sample to the cluster corresponding to the initialized cluster center with the largest IoU value; update the cluster center in each cluster according to the median value in the cluster; the median is the sample corresponding to the IoU value in the middle position among the IoU values of multiple samples in the cluster with the cluster center; continue to perform the sample division step based on the updated cluster center until the samples in each cluster no longer change, and determine the cluster center in each cluster at this time as the updated cluster center.

进一步地，上述训练模块，用于按照以下算式计算损失值AWH-IoU：Furthermore, the above training module is used to calculate the loss valueAWH-IoU according to the following formula:

； ;

本申请实施例提供的装置，其实现原理及产生的技术效果和前述方法实施例相同，为简要描述，装置的实施例部分未提及之处，可参考前述方法实施例中相应内容。The device provided in the embodiment of the present application has the same implementation principle and technical effects as those of the aforementioned method embodiment. For the sake of brief description, for matters not mentioned in the embodiment of the device, reference may be made to the corresponding contents in the aforementioned method embodiment.

本申请实施例还提供了一种计算机可读存储介质，该计算机可读存储介质存储有计算机可执行指令，该计算机可执行指令在被处理器调用和执行时，该计算机可执行指令促使处理器实现上述方法，具体实现可参见前述方法实施例，在此不再赘述。An embodiment of the present application also provides a computer-readable storage medium, which stores computer-executable instructions. When the computer-executable instructions are called and executed by a processor, the computer-executable instructions prompt the processor to implement the above method. The specific implementation can be found in the aforementioned method embodiment, which will not be repeated here.

本申请实施例所提供的方法、装置和电子设备的计算机程序产品，包括存储了程序代码的计算机可读存储介质，所述程序代码包括的指令可用于执行前面方法实施例中所述的方法，具体实现可参见方法实施例，在此不再赘述。The computer program products of the methods, devices, and electronic devices provided in the embodiments of the present application include a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the methods described in the previous method embodiments. The specific implementation can be found in the method embodiments, which will not be repeated here.

除非另外具体说明，否则在这些实施例中阐述的部件和步骤的相对步骤、数字表达式和数值并不限制本申请的范围。Unless otherwise specifically stated, the relative steps, numerical expressions and values of the components and steps set forth in these embodiments do not limit the scope of the present application.

所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个处理器可执行的非易失的计算机可读取存储介质中。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备（可以是个人计算机，服务器，或者网络设备等）执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器（ROM，Read-Only Memory）、随机存取存储器（RAM，Random Access Memory）、磁碟或者光盘等各种可以存储程序代码的介质。If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a non-volatile computer-readable storage medium that can be executed by a processor. Based on this understanding, the technical solution of the present application, or the part that contributes to the prior art or the part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium, including several instructions to enable a computer device (which can be a personal computer, server, or network device, etc.) to execute all or part of the steps of the method described in each embodiment of the present application. The aforementioned storage medium includes: U disk, mobile hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), disk or optical disk, and other media that can store program codes.

在本申请的描述中，需要说明的是，术语“第一”、“第二”、“第三”仅用于描述目的，而不能理解为指示或暗示相对重要性。In the description of the present application, it should be noted that the terms "first", "second" and "third" are only used for descriptive purposes and cannot be understood as indicating or implying relative importance.

最后应说明的是：以上所述实施例，仅为本申请的具体实施方式，用以说明本申请的技术方案，而非对其限制，本申请的保护范围并不局限于此，尽管参照前述实施例对本申请进行了详细的说明，本领域的普通技术人员应当理解：任何熟悉本技术领域的技术人员在本申请揭露的技术范围内，其依然可以对前述实施例所记载的技术方案进行修改或可轻易想到变化，或者对其中部分技术特征进行等同替换；而这些修改、变化或者替换，并不使相应技术方案的本质脱离本申请实施例技术方案的精神和范围，都应涵盖在本申请的保护范围之内。因此，本申请的保护范围应所述以权利要求的保护范围为准。Finally, it should be noted that the above-described embodiments are only specific implementation methods of the present application, which are used to illustrate the technical solutions of the present application, rather than to limit them. The protection scope of the present application is not limited thereto. Although the present application is described in detail with reference to the above-described embodiments, ordinary technicians in the field should understand that any technician familiar with the technical field can still modify the technical solutions recorded in the above-described embodiments within the technical scope disclosed in the present application, or can easily think of changes, or make equivalent replacements for some of the technical features therein; and these modifications, changes or replacements do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions of the embodiments of the present application, and should be included in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.