


Technical Field
The invention belongs to the field of digital image processing and relates to object detection, in particular to an improved object detection algorithm based on a feature pyramid network and an attention mechanism.
Background
The task of object detection is to find the objects of interest in an image and determine their categories and locations. It is one of the core problems in computer vision, with wide applications in infrared detection, intelligent video surveillance, remote-sensing image analysis, medical diagnosis, and fire and smoke detection in intelligent buildings. Object detection algorithms can be divided into traditional algorithms and deep-learning-based algorithms. Representative traditional algorithms include the SIFT-based and Viola-Jones (V-J) detectors, but these methods have high time complexity and poor robustness. Classic deep-learning-based algorithms include R-CNN, Fast R-CNN, Faster R-CNN, YOLO, and SSD. Although many excellent detectors exist at this stage, their detection performance still has shortcomings, leading to problems such as missed and false detections.
Summary of the Invention
In view of the above defects or deficiencies in the prior art, the purpose of the present invention is to provide an improved object detection algorithm based on a feature pyramid network and an attention mechanism.
To accomplish the above task, the present invention adopts the following technical solution:
An improved object detection algorithm based on a feature pyramid network and an attention mechanism, characterized by the following steps:
Step 1) Following the principle of the feature pyramid network, the six multi-scale feature maps extracted from the input image by the VGG-16 base network of the original SSD algorithm are fused in order from the smallest to the largest, yielding feature maps that combine different layers and contain both rich semantic information and rich detail information;
In the original SSD algorithm, the scales of the feature maps extracted from the input image by the VGG-16 base network decrease from large to small: the lower-level feature maps have higher resolution and contain more detail, while the higher-level maps have lower resolution and contain more abstract semantic information. The original SSD therefore uses the lower-level feature maps to detect small objects and the higher-level maps to detect medium and large objects;
Step 2) A channel attention mechanism is introduced, and an attention model is added to the two fused feature maps that contain the richest detail and semantic information and are the most sensitive to small objects. The attention mechanism is realized by adding a mask to the feature maps to mark the features of the regions of interest; through continued training, the network learns the regions of interest in each image that deserve focus and suppresses the influence of other, distracting regions, thereby enhancing the algorithm's ability to detect small objects.
According to the present invention, the input image in step 1) is 300×300, and the feature maps for detection obtained from the VGG-16 base network have sizes 38×38, 19×19, 10×10, 5×5, 3×3, and 1×1. Following the feature-pyramid principle, the feature maps are fused in order from the smallest to the largest size, yielding six feature maps whose sizes remain 38×38, 19×19, 10×10, 5×5, 3×3, and 1×1.
Further, in step 2) an attention model is added to the feature maps fused in step 1). Because the fusion proceeds from the smallest feature map to the largest, the most information-rich maps after fusion are the 38×38 and 19×19 maps; compared with the other maps they contain richer detail and semantic information and are more sensitive to small objects. To preserve detection speed and reduce computation, the attention model is added only to these two fused feature maps. The detection algorithm proceeds as follows:
a) Object detection based on a single-stage network model uses the idea of regression: a single convolutional neural network directly regresses the category and bounding box of each object from the input image. First, following the feature-pyramid principle, the multi-scale feature maps extracted by the original SSD algorithm are fused in order from the smallest to the largest size. The maps extracted by the VGG-16 base network have sizes 38×38, 19×19, 10×10, 5×5, 3×3, and 1×1; fusing them yields six feature maps of the same sizes, each containing rich semantic and detail information.
b) Following the principle of the attention mechanism, channel attention is introduced and an attention model is added to the fused feature maps of step a). Since the fused 38×38 and 19×19 maps contain the richest information, and in order to preserve real-time performance, the attention model is added only to these two maps.
c) On each of the six multi-scale feature maps obtained in steps a) and b), candidate boxes of different sizes and aspect ratios are set at every cell. The scale of the candidate boxes is computed by formula (1):

s_k = s_min + ((s_max − s_min)/(m − 1))·(k − 1),  k ∈ [1, m]  (1)

where m is the number of feature layers; s_k is the ratio of the candidate box to the image; and s_max and s_min are the maximum and minimum ratios, set to 0.9 and 0.2 respectively. Formula (1) gives the scale of each candidate box;
For the aspect ratio, the values a_r ∈ {1, 2, 3, 1/2, 1/3} are generally used, and the width and height of the candidate boxes are computed by formula (2):

w_k^a = s_k·√a_r,  h_k^a = s_k/√a_r  (2)
For the candidate box with aspect ratio 1, an additional box of scale s'_k = √(s_k·s_(k+1)) is also added. The center coordinates of the candidate boxes are ((i + 0.5)/|f_k|, (j + 0.5)/|f_k|), where |f_k| is the size of the k-th feature layer;
d) A 3×3 convolution kernel is applied to the multi-scale feature maps to predict categories and confidences, and the detection algorithm is trained. During training, the loss function is defined as the weighted sum of the localization loss (loc) and the confidence loss (conf):

L(x, c, l, g) = (1/N)·(L_conf(x, c) + α·L_loc(x, l, g))
where N is the number of matched candidate boxes; x ∈ {1, 0} indicates whether a candidate box matches a ground-truth box (x = 1 if matched, x = 0 otherwise); c is the predicted category confidence; g is the location parameter of the ground-truth box; l is the predicted location of the predicted box; and α is a weight coefficient, set to 1.
For the localization loss in SSD, Smooth L1 loss is used to regress the offsets of the candidate box center (cx, cy), width (w), and height (h):

L_loc(x, l, g) = Σ_(i∈Pos) Σ_(m∈{cx,cy,w,h}) x_ij · smooth_L1(l_i^m − ĝ_j^m),  where smooth_L1(d) = 0.5d² if |d| < 1, and |d| − 0.5 otherwise.
For the confidence loss in SSD, the typical softmax loss is used:

L_conf(x, c) = −Σ_(i∈Pos) x_ij^p · log(ĉ_i^p) − Σ_(i∈Neg) log(ĉ_i^0),  where ĉ_i^p = exp(c_i^p)/Σ_p exp(c_i^p).
The improved object detection algorithm of the present invention is based on the single-stage SSD detector and takes into account the effect of feature-map resolution on detection performance. Following the idea of the feature pyramid network, it fuses the multi-scale feature maps extracted by the original SSD algorithm into feature maps with rich semantic and detail information; following the principle of the attention mechanism, it then adds an attention model to the fused 38×38 and 19×19 feature maps to strengthen the recognition of small objects.
Description of the Drawings
Figure 1 is a schematic diagram of the network structure of the object detection algorithm combining a feature pyramid network and an attention mechanism;
Figure 2 compares the detection results of the original SSD algorithm and the improved algorithm: figures a1, a2, a3, a4, and a5 on the left show the detections of the original SSD algorithm; figures b1, b2, b3, b4, and b5 on the right show the detections of the improved algorithm.
The present invention is described in further detail below with reference to the accompanying drawings and embodiments.
Detailed Description
The technical approach of the improved object detection algorithm of the present invention is as follows. Starting from the single-stage detector SSD, the deficiencies of the SSD algorithm are analyzed and improvements are proposed. Following the principle of the feature pyramid network, the six feature maps extracted by the original SSD algorithm are fused to form new feature maps that carry both rich semantic and rich detail information. An attention model is then added to the fused feature maps; to preserve real-time performance, it is added only to the 38×38 and 19×19 maps, which contain the most information and are the most sensitive to small objects. These improvements raise the detection ability of the algorithm and alleviate problems such as missed detections.
This embodiment provides an improved object detection algorithm based on a feature pyramid network and an attention mechanism, comprising the following steps:
Step 1) Following the principle of the feature pyramid network, the six multi-scale feature maps extracted from the input image by the VGG-16 base network of the original SSD algorithm are fused in order from the smallest to the largest, yielding feature maps that combine different layers and contain both rich semantic information and rich detail information;
In the original SSD algorithm, the scales of the feature maps extracted from the input image by the VGG-16 base network decrease from large to small: the lower-level feature maps have higher resolution and contain more detail, while the higher-level maps have lower resolution and contain more abstract semantic information. The original SSD therefore uses the lower-level feature maps to detect small objects and the higher-level maps to detect medium and large objects;
Step 2) A channel attention mechanism is introduced, and an attention model is added to the two fused feature maps that contain the richest detail and semantic information and are the most sensitive to small objects. The attention mechanism is realized by adding a mask to the feature maps to mark the features of the regions of interest; through continued training, the network learns the regions of interest in each image that deserve focus and suppresses the influence of other, distracting regions, thereby enhancing the algorithm's ability to detect small objects.
In step 1), the input image is 300×300, and the feature maps extracted by the VGG-16 base network have sizes 38×38, 19×19, 10×10, 5×5, 3×3, and 1×1. Following the idea of the feature pyramid network, these six maps are fused pairwise from the smallest size to the largest: 1×1 with 3×3, 3×3 with 5×5, 5×5 with 10×10, 10×10 with 19×19, and 19×19 with 38×38. The fused feature maps retain the sizes 38×38, 19×19, 10×10, 5×5, 3×3, and 1×1.
In step 2), following the principle of the attention mechanism, an attention model is added to the fused feature maps. Since the fused 38×38 and 19×19 maps contain the richest information, and in order to preserve real-time detection and reduce computation, the attention model is added only to these two maps; it enhances the extraction of small-object features.
The detection process of the improved object detection algorithm is as follows:
a) Object detection based on a single-stage network model uses the idea of regression: a single convolutional neural network directly regresses the category and bounding box of each object from the input image. First, following the feature-pyramid principle, the multi-scale feature maps extracted by the original SSD algorithm are fused in order from the smallest to the largest size. The maps extracted by the VGG-16 base network have sizes 38×38, 19×19, 10×10, 5×5, 3×3, and 1×1. Taking the 1×1 and 3×3 feature maps as an example:
First, the 1×1 feature map is upsampled by interpolation: on the basis of the original pixels, new elements are inserted between pixel positions using a suitable interpolation algorithm, enlarging the map to the size of the 3×3 feature map. Then a 1×1 convolution is applied to the 3×3 feature map to change its number of channels so that it matches the upsampled map. Finally, the two maps are fused, and a 3×3 convolution kernel is applied to the fused map to eliminate the aliasing effect of upsampling. The fusion of the other adjacent feature-map pairs follows the same procedure. The fusion yields six feature maps of sizes 38×38, 19×19, 10×10, 5×5, 3×3, and 1×1, each containing rich semantic and detail information.
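The fusion step above can be sketched in a few lines of NumPy. This is an illustration, not the patented implementation: the nearest-neighbour upsampling, toy channel counts, and constant weights are assumptions, and the final 3×3 smoothing convolution is omitted.

```python
import numpy as np

def upsample_nearest(x, size):
    """Nearest-neighbour upsampling of a (C, H, W) feature map to (C, size, size)."""
    c, h, w = x.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return x[:, rows][:, :, cols]

def conv1x1(x, weight):
    """1x1 convolution = per-pixel channel projection. weight: (C_out, C_in)."""
    c_in, h, w = x.shape
    return (weight @ x.reshape(c_in, -1)).reshape(weight.shape[0], h, w)

def fuse(small, large, proj):
    """Fuse a smaller (deeper) map into the next larger one: upsample the
    small map, project the large map's channels to match, add elementwise.
    In practice a 3x3 conv would follow to suppress upsampling aliasing."""
    up = upsample_nearest(small, large.shape[1])
    return up + conv1x1(large, proj)

# toy example: fuse an 8-channel 3x3 map into a 4-channel 5x5 map
small = np.ones((8, 3, 3))
large = np.ones((4, 5, 5))
proj = np.ones((8, 4)) * 0.25   # projects 4 channels -> 8
fused = fuse(small, large, proj)
```

With these all-ones inputs, both branches contribute 1.0 everywhere, so the fused map is constant and keeps the larger spatial size with the deeper map's channel count.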
b) Following the principle of the attention mechanism, channel attention is introduced and an attention model is added to the feature maps fused in step a). Since the fused 38×38 and 19×19 maps contain the richest information, and in order to preserve real-time performance, the attention model is added only to these two maps. Adding the attention model proceeds in three steps: squeeze, excitation, and attention.
The squeeze operation is given by formula (1):

y_c = (1/(H × W)) · Σ_(i=1..H) Σ_(j=1..W) u_c(i, j)  (1)

where H and W are the height and width of the input, U is the input, Y is the output, and C is the number of input channels. Formula (1) converts an H×W×C input into a 1×1×C output, which is equivalent to a global average pooling operation.
The excitation operation is given by formula (2):

S = h-Swish(W₂ · ReLU6(W₁ · Y))  (2)

where Y is the output of the squeeze operation, S is the output of the excitation operation, W₁ has dimensions C/r × C, W₂ has dimensions C × C/r, and r is a scaling parameter, set here to 4. Multiplying Y by W₁ represents a fully connected operation, followed by the ReLU6 activation; multiplying by W₂ represents another fully connected operation, followed by the hard-Swish activation, which completes the excitation. The ReLU6 and hard-Swish activation functions are given by formula (3):

ReLU6(x) = min(max(x, 0), 6),  h-Swish(x) = x · ReLU6(x + 3) / 6  (3)
The attention operation is given by formula (4):

X = S × U  (4)

where X is the feature map after the attention mechanism is applied, U is the original input, and S is the output of the excitation operation; the weight of each channel is multiplied with the features of the corresponding feature map.
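The three steps of formulas (1), (2), and (4) can be sketched as a small NumPy function. The weight shapes and r = 4 follow the text; the random toy inputs and the hard-swish definition x·ReLU6(x + 3)/6 are illustrative assumptions (the latter is the standard form).

```python
import numpy as np

def relu6(x):
    return np.minimum(np.maximum(x, 0.0), 6.0)

def h_swish(x):
    # standard hard-swish: x * ReLU6(x + 3) / 6
    return x * relu6(x + 3.0) / 6.0

def channel_attention(u, w1, w2):
    """Squeeze-excite style channel attention on a (C, H, W) feature map.
    Squeeze: global average pool -> (C,).
    Excite: two fully connected layers, ReLU6 then hard-swish gating.
    Attention: rescale each input channel by its gate value."""
    c = u.shape[0]
    y = u.reshape(c, -1).mean(axis=1)        # squeeze, eq. (1)
    s = h_swish(w2 @ relu6(w1 @ y))          # excite, eq. (2)
    return s[:, None, None] * u              # attention, eq. (4)

# toy check: C = 8 channels, r = 4 -> hidden size C/r = 2
rng = np.random.default_rng(0)
u = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8)) * 0.1       # (C/r, C)
w2 = rng.standard_normal((8, 2)) * 0.1       # (C, C/r)
x = channel_attention(u, w1, w2)
```

Note that the output has the same shape as the input: attention only reweights channels, so the module can be dropped onto the fused 38×38 and 19×19 maps without changing the rest of the detector.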
c) On each of the six multi-scale feature maps obtained in steps a) and b), candidate boxes of different sizes and aspect ratios are set at every cell. The scale of the candidate boxes is computed by formula (5):

s_k = s_min + ((s_max − s_min)/(m − 1))·(k − 1),  k ∈ [1, m]  (5)

where m is the number of feature layers; s_k is the ratio of the candidate box to the image; and s_max and s_min are the maximum and minimum ratios, set to 0.9 and 0.2 respectively. Formula (5) gives the scale of each candidate box;
For the aspect ratio, the values a_r ∈ {1, 2, 3, 1/2, 1/3} are generally used, and the width and height of the candidate boxes are computed by formula (6):

w_k^a = s_k·√a_r,  h_k^a = s_k/√a_r  (6)
For the candidate box with aspect ratio 1, an additional box of scale s'_k = √(s_k·s_(k+1)) is also added. The center coordinates of the candidate boxes are ((i + 0.5)/|f_k|, (j + 0.5)/|f_k|), where |f_k| is the size of the k-th feature layer;
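Formulas (5) and (6) can be checked with a few lines of Python, using m = 6 layers and s_min = 0.2, s_max = 0.9 as in the text (the helper names are ours, not from the patent):

```python
import math

def box_scales(m=6, s_min=0.2, s_max=0.9):
    """Scale of the candidate box on each of the m feature layers:
    s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1), k in [1, m]."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]

def box_sizes(s_k, ratios=(1, 2, 3, 1/2, 1/3)):
    """Width/height per aspect ratio a: w = s_k*sqrt(a), h = s_k/sqrt(a)."""
    return [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in ratios]

scales = box_scales()
# scales is approximately [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```

These fractions are relative to the 300×300 input, so for example the smallest boxes on the 38×38 layer cover roughly 0.2 × 300 = 60 pixels on a side.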
d) A 3×3 convolution kernel is applied to the multi-scale feature maps to predict categories and confidences, and the detection algorithm is trained. During training, the loss function is defined as the weighted sum of the localization loss (loc) and the confidence loss (conf):

L(x, c, l, g) = (1/N)·(L_conf(x, c) + α·L_loc(x, l, g))
where N is the number of matched candidate boxes; x ∈ {1, 0} indicates whether a candidate box matches a ground-truth box (x = 1 if matched, x = 0 otherwise); c is the predicted category confidence; g is the location parameter of the ground-truth box; l is the predicted location of the predicted box; and α is a weight coefficient, set to 1.
For the localization loss in SSD, Smooth L1 loss is used to regress the offsets of the candidate box center (cx, cy), width (w), and height (h):

L_loc(x, l, g) = Σ_(i∈Pos) Σ_(m∈{cx,cy,w,h}) x_ij · smooth_L1(l_i^m − ĝ_j^m),  where smooth_L1(d) = 0.5d² if |d| < 1, and |d| − 0.5 otherwise.
For the confidence loss in SSD, the typical softmax loss is used:

L_conf(x, c) = −Σ_(i∈Pos) x_ij^p · log(ĉ_i^p) − Σ_(i∈Neg) log(ĉ_i^0),  where ĉ_i^p = exp(c_i^p)/Σ_p exp(c_i^p).
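The combined loss above can be sketched in NumPy under the simplifying assumption that box matching has already been done, so matched location offsets and class labels are given directly (hard-negative mining and the offset encoding are omitted):

```python
import numpy as np

def smooth_l1(d):
    """Smooth L1: 0.5*d^2 if |d| < 1, else |d| - 0.5, elementwise."""
    d = np.abs(d)
    return np.where(d < 1, 0.5 * d * d, d - 0.5)

def softmax_ce(logits, label):
    """Softmax cross-entropy for one box: -log softmax(logits)[label]."""
    z = logits - logits.max()          # subtract max for numerical stability
    return -(z[label] - np.log(np.exp(z).sum()))

def ssd_loss(loc_pred, loc_target, cls_logits, cls_labels, alpha=1.0):
    """L = (1/N) * (L_conf + alpha * L_loc) over N matched boxes."""
    n = loc_pred.shape[0]
    l_loc = smooth_l1(loc_pred - loc_target).sum()
    l_conf = sum(softmax_ce(cls_logits[i], cls_labels[i]) for i in range(n))
    return (l_conf + alpha * l_loc) / n

# toy check: perfect localization and a near-certain correct class -> loss near 0
loc = np.zeros((2, 4))
logits = np.array([[10.0, 0.0], [10.0, 0.0]])
loss = ssd_loss(loc, loc, logits, [0, 0])
```

Worsening either term (wrong offsets or a low-confidence correct class) raises the value, which is the behavior the weighted sum is meant to balance with α = 1.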
The improved object detection model is then trained.
In this embodiment, the PASCAL VOC2007 and PASCAL VOC2012 datasets are used as the training set for the model, and data augmentation (horizontal flipping, random cropping, and similar operations) is applied to expand the training images.
Data used in the experiments: the PASCAL VOC dataset, a standardized dataset for image recognition and classification containing 20 categories: person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, and TV.
This embodiment trains on the VOC2007 and VOC2012 datasets and tests on the VOC2007 dataset. Training uses stochastic gradient descent (SGD) with a batch size of 32, an initial learning rate of 0.001, and a momentum of 0.9; the learning rate is reduced by 90% at 100,000 and 150,000 iterations, for a total of 200,000 training iterations.
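The step-decay schedule described above (initial rate 0.001, cut by 90% at 100,000 and 150,000 iterations) can be written as a small helper; this is a sketch of the schedule only, not code from the patent:

```python
def learning_rate(step, base_lr=1e-3, milestones=(100_000, 150_000), gamma=0.1):
    """Step-decay schedule: the learning rate is multiplied by gamma
    (i.e. cut by 90%) at each milestone iteration."""
    lr = base_lr
    for m in milestones:
        if step >= m:
            lr *= gamma
    return lr
```

In a framework such as PyTorch this corresponds to a multi-step scheduler with milestones [100000, 150000] and gamma 0.1.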
To verify the detection performance of the improved single-stage detection algorithm of this embodiment, the applicant selected the test split of the PASCAL VOC2007 dataset and used mAP (mean Average Precision) as the evaluation metric: each detected category yields a curve of precision versus recall (the P-R curve), the area under which is the AP value, and averaging the AP values over all detected categories gives the mAP. The detection performance is compared with other mainstream detection models both subjectively and objectively (see Table 1 and Table 2).
Table 1
Table 2
In the subjective evaluation, the result images of the original SSD algorithm and the improved detection algorithm are compared (see Figure 2, where a1–a5 are detections by the original SSD algorithm and b1–b5 are detections by the improved algorithm). As the figure shows, compared with the original SSD algorithm the improved algorithm markedly alleviates missed detections, detects densely distributed small objects better, and finds more objects. The detection performance is clearly improved over the original SSD algorithm.
| Application Number | Publication | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| CN202010710684.XA | CN111914917B | 2020-07-22 | 2020-07-22 | An improved object detection algorithm based on feature pyramid network and attention mechanism |
| Publication Number | Publication Date |
|---|---|
| CN111914917A | 2020-11-10 |
| CN111914917B | 2025-01-17 |
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010710684.XAActiveCN111914917B (en) | 2020-07-22 | 2020-07-22 | An improved object detection algorithm based on feature pyramid network and attention mechanism |
| Country | Link |
|---|---|
| CN (1) | CN111914917B (en) |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112418345A (en)* | 2020-12-07 | 2021-02-26 | 苏州小阳软件科技有限公司 | Method and device for quickly identifying fine-grained small target |
| CN112465057A (en)* | 2020-12-08 | 2021-03-09 | 中国人民解放军空军工程大学 | Target detection and identification method based on deep convolutional neural network |
| CN112819737A (en)* | 2021-01-13 | 2021-05-18 | 西北大学 | Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution |
| CN112837747A (en)* | 2021-01-13 | 2021-05-25 | 上海交通大学 | A protein binding site prediction method based on attention twin network |
| CN113158738A (en)* | 2021-01-28 | 2021-07-23 | 中南大学 | Port environment target detection method, system, terminal and readable storage medium based on attention mechanism |
| CN113177579A (en)* | 2021-04-08 | 2021-07-27 | 北京科技大学 | Feature fusion method based on attention mechanism |
| CN113255443A (en)* | 2021-04-16 | 2021-08-13 | 杭州电子科技大学 | Pyramid structure-based method for positioning time sequence actions of graph attention network |
| CN113408549A (en)* | 2021-07-14 | 2021-09-17 | 西安电子科技大学 | Few-sample weak and small target detection method based on template matching and attention mechanism |
| CN113409249A (en)* | 2021-05-17 | 2021-09-17 | 上海电力大学 | Insulator defect detection method based on end-to-end algorithm |
| CN113807291A (en)* | 2021-09-24 | 2021-12-17 | 南京莱斯电子设备有限公司 | Airport runway foreign matter detection and identification method based on feature fusion attention network |
| CN113920468A (en)* | 2021-12-13 | 2022-01-11 | 松立控股集团股份有限公司 | Multi-branch pedestrian detection method based on cross-scale feature enhancement |
| CN114220015A (en)* | 2021-12-21 | 2022-03-22 | 一拓通信集团股份有限公司 | Improved YOLOv 5-based satellite image small target detection method |
| CN114387202A (en)* | 2021-06-25 | 2022-04-22 | 南京交通职业技术学院 | 3D target detection method based on vehicle end point cloud and image fusion |
| CN114419530A (en)* | 2021-12-01 | 2022-04-29 | 国电南瑞南京控制系统有限公司 | Helmet wearing detection algorithm based on improved YOLOv5 |
| CN114494870A (en)* | 2022-01-21 | 2022-05-13 | 山东科技大学 | A dual-phase remote sensing image change detection method, model building method and device |
| CN114627368A (en)* | 2020-12-10 | 2022-06-14 | 华东理工大学 | Novel real-time detector for remote sensing image |
| CN114782772A (en)* | 2022-04-08 | 2022-07-22 | 河海大学 | A detection and identification method of floating objects on water based on improved SSD algorithm |
| CN114821347A (en)* | 2021-01-21 | 2022-07-29 | 南京理工大学 | Remote sensing aircraft target identification method based on depth feature fusion |
| CN114937196A (en)* | 2022-04-22 | 2022-08-23 | 广州大学 | Shadow detection method based on random attention mechanism |
| CN114972860A (en)* | 2022-05-23 | 2022-08-30 | 郑州轻工业大学 | Target detection method based on attention-enhanced bidirectional feature pyramid network |
| CN115019169A (en)* | 2022-05-31 | 2022-09-06 | 海南大学 | Single-stage water surface small target detection method and device |
| CN115995042A (en)* | 2023-02-09 | 2023-04-21 | 上海理工大学 | A video SAR moving target detection method and device |
| CN119579968A (en)* | 2024-11-14 | 2025-03-07 | 中国科学院自动化研究所 | Image internal texture classification method and texture classification model training method |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180182109A1 (en)* | 2016-12-22 | 2018-06-28 | TCL Research America Inc. | System and method for enhancing target tracking via detector and tracker fusion for unmanned aerial vehicles |
| CN109344821A (en)* | 2018-08-30 | 2019-02-15 | 西安电子科技大学 | Small target detection method based on feature fusion and deep learning |
| US20190341025A1 (en)* | 2018-04-18 | 2019-11-07 | Sony Interactive Entertainment Inc. | Integrated understanding of user characteristics by multimodal processing |
| CN110533084A (en)* | 2019-08-12 | 2019-12-03 | 长安大学 | A kind of multiscale target detection method based on from attention mechanism |
| CN110674866A (en)* | 2019-09-23 | 2020-01-10 | 兰州理工大学 | Method for detecting X-ray breast lesion images by using transfer learning characteristic pyramid network |
| CN110705457A (en)* | 2019-09-29 | 2020-01-17 | 核工业北京地质研究院 | Remote sensing image building change detection method |
| CN111179217A (en)* | 2019-12-04 | 2020-05-19 | 天津大学 | A multi-scale target detection method in remote sensing images based on attention mechanism |
| CN111259940A (en)* | 2020-01-10 | 2020-06-09 | 杭州电子科技大学 | Target detection method based on space attention map |
| CN111401201A (en)* | 2020-03-10 | 2020-07-10 | 南京信息工程大学 | A spatial-pyramid-attention-driven multi-scale object detection method for aerial imagery |
| Title |
|---|
| Xu Chengqi; Hong Xuehai: "Feature Pyramid Object Detection Network Based on Function Preservation", Pattern Recognition and Artificial Intelligence, no. 06, 15 June 2020 (2020-06-15)* |
| Shen Wenxiang; Qin Pinle; Zeng Jianchao: "Indoor Crowd Detection Network Based on Multi-level Features and Hybrid Attention Mechanism", Journal of Computer Applications, no. 12* |
| Gao Jianling; Sun Jian; Wang Ziniu; Han Yulu; Feng Jiaojiao: "SSD Object Detection Algorithm Based on Attention Mechanism and Feature Fusion", Software, no. 02, 15 February 2020 (2020-02-15)* |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112418345B (en)* | 2020-12-07 | 2024-02-23 | 深圳小阳软件有限公司 | Method and device for quickly identifying small targets with fine granularity |
| CN112418345A (en)* | 2020-12-07 | 2021-02-26 | 苏州小阳软件科技有限公司 | Method and device for quickly identifying fine-grained small target |
| CN112465057A (en)* | 2020-12-08 | 2021-03-09 | 中国人民解放军空军工程大学 | Target detection and identification method based on deep convolutional neural network |
| CN114627368A (en)* | 2020-12-10 | 2022-06-14 | 华东理工大学 | Novel real-time detector for remote sensing image |
| CN112819737B (en)* | 2021-01-13 | 2023-04-07 | 西北大学 | Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution |
| CN112837747B (en)* | 2021-01-13 | 2022-07-12 | 上海交通大学 | A protein binding site prediction method based on an attention Siamese network |
| CN112819737A (en)* | 2021-01-13 | 2021-05-18 | 西北大学 | Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution |
| CN112837747A (en)* | 2021-01-13 | 2021-05-25 | 上海交通大学 | A protein binding site prediction method based on an attention Siamese network |
| CN114821347B (en)* | 2021-01-21 | 2025-06-13 | 南京理工大学 | A remote sensing aircraft target recognition method based on deep feature fusion |
| CN114821347A (en)* | 2021-01-21 | 2022-07-29 | 南京理工大学 | Remote sensing aircraft target recognition method based on deep feature fusion |
| CN113158738B (en)* | 2021-01-28 | 2022-09-20 | 中南大学 | Port environment target detection method, system, terminal and readable storage medium based on attention mechanism |
| CN113158738A (en)* | 2021-01-28 | 2021-07-23 | 中南大学 | Port environment target detection method, system, terminal and readable storage medium based on attention mechanism |
| CN113177579A (en)* | 2021-04-08 | 2021-07-27 | 北京科技大学 | Feature fusion method based on attention mechanism |
| CN113255443A (en)* | 2021-04-16 | 2021-08-13 | 杭州电子科技大学 | Pyramid structure-based method for positioning time sequence actions of graph attention network |
| CN113255443B (en)* | 2021-04-16 | 2024-02-09 | 杭州电子科技大学 | Pyramid-structure-based temporal action localization method using a graph attention network |
| CN113409249A (en)* | 2021-05-17 | 2021-09-17 | 上海电力大学 | Insulator defect detection method based on end-to-end algorithm |
| CN114387202A (en)* | 2021-06-25 | 2022-04-22 | 南京交通职业技术学院 | 3D target detection method based on vehicle-side point cloud and image fusion |
| CN113408549A (en)* | 2021-07-14 | 2021-09-17 | 西安电子科技大学 | Few-sample weak and small target detection method based on template matching and attention mechanism |
| CN113408549B (en)* | 2021-07-14 | 2023-01-24 | 西安电子科技大学 | Few-sample weak and small target detection method based on template matching and attention mechanism |
| CN113807291B (en)* | 2021-09-24 | 2024-04-26 | 南京莱斯电子设备有限公司 | Airport runway foreign matter detection and identification method based on feature fusion attention network |
| CN113807291A (en)* | 2021-09-24 | 2021-12-17 | 南京莱斯电子设备有限公司 | Airport runway foreign matter detection and identification method based on feature fusion attention network |
| CN114419530A (en)* | 2021-12-01 | 2022-04-29 | 国电南瑞南京控制系统有限公司 | Helmet wearing detection algorithm based on improved YOLOv5 |
| CN113920468A (en)* | 2021-12-13 | 2022-01-11 | 松立控股集团股份有限公司 | Multi-branch pedestrian detection method based on cross-scale feature enhancement |
| CN114220015A (en)* | 2021-12-21 | 2022-03-22 | 一拓通信集团股份有限公司 | Improved YOLOv5-based satellite image small target detection method |
| CN114494870B (en)* | 2022-01-21 | 2025-05-30 | 山东科技大学 | A dual-temporal remote sensing image change detection method, model building method and device |
| CN114494870A (en)* | 2022-01-21 | 2022-05-13 | 山东科技大学 | A dual-temporal remote sensing image change detection method, model building method and device |
| CN114782772A (en)* | 2022-04-08 | 2022-07-22 | 河海大学 | A detection and identification method of floating objects on water based on improved SSD algorithm |
| CN114937196A (en)* | 2022-04-22 | 2022-08-23 | 广州大学 | Shadow detection method based on random attention mechanism |
| CN114937196B (en)* | 2022-04-22 | 2025-04-22 | 广州大学 | A shadow detection method based on random attention mechanism |
| CN114972860A (en)* | 2022-05-23 | 2022-08-30 | 郑州轻工业大学 | Target detection method based on attention-enhanced bidirectional feature pyramid network |
| CN115019169A (en)* | 2022-05-31 | 2022-09-06 | 海南大学 | Single-stage water surface small target detection method and device |
| CN115995042A (en)* | 2023-02-09 | 2023-04-21 | 上海理工大学 | A video SAR moving target detection method and device |
| CN119579968A (en)* | 2024-11-14 | 2025-03-07 | 中国科学院自动化研究所 | Image internal texture classification method and texture classification model training method |
| Publication number | Publication date |
|---|---|
| CN111914917B (en) | 2025-01-17 |
| Publication | Title |
|---|---|
| CN111914917A (en) | Improved target detection algorithm based on a feature pyramid network and an attention mechanism |
| CN113065558B (en) | Lightweight small target detection method combined with an attention mechanism |
| CN112200045B (en) | Remote sensing image target detection model establishment method based on context enhancement, and application |
| CN111126202B (en) | Optical remote sensing image object detection method based on a dilated-convolution feature pyramid network |
| CN114612937B (en) | Pedestrian detection method combining infrared and visible light based on single-modality enhancement |
| CN107220611B (en) | Spatio-temporal feature extraction method based on a deep neural network |
| CN107784288B (en) | Iterative-localization face detection method based on a deep neural network |
| CN108549841A (en) | Deep learning-based method for recognizing fall behavior in the elderly |
| CN111783685B (en) | An improved target detection algorithm based on a single-stage network model |
| CN111160249A (en) | Multi-class target detection method in optical remote sensing images based on cross-scale feature fusion |
| CN111738344A (en) | A fast target detection method based on multi-scale fusion |
| CN113887649B (en) | Target detection method based on fusion of deep and shallow features |
| CN117037004A (en) | Unmanned aerial vehicle image detection method based on multi-scale feature fusion and context enhancement |
| CN107292875A (en) | A saliency detection method based on global and local feature fusion |
| CN111860171A (en) | A method and system for detecting irregularly shaped targets in large-scale remote sensing images |
| CN109767456A (en) | A target tracking method based on the SiameseFC framework and a PFP neural network |
| CN109784278A (en) | Deep learning-based real-time detection method for weak and small moving ships at sea |
| CN109271876A (en) | Video action detection method based on temporal evolution modeling and multi-instance learning |
| CN114743023B (en) | An image detection method for wheat spiders based on the RetinaNet model |
| CN108256496A (en) | A video-based stockyard smoke detection method |
| CN115223009A (en) | Small target detection method and device based on improved YOLOv5 |
| CN111340019A (en) | Granary pest detection method based on Faster R-CNN |
| CN110046595A (en) | A cascade-based multi-scale dense face detection method |
| CN117203678A (en) | Target detection method and device |
| CN111428655A (en) | A scalp detection method based on deep learning |
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |