CN111914917A - Target detection improved algorithm based on feature pyramid network and attention mechanism - Google Patents

Target detection improved algorithm based on feature pyramid network and attention mechanism

Info

Publication number
CN111914917A
CN111914917A (application CN202010710684.XA)
Authority
CN
China
Prior art keywords
feature
algorithm
fusion
network
small
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010710684.XA
Other languages
Chinese (zh)
Other versions
CN111914917B (en)
Inventor
王燕妮
刘祥
翟会杰
余丽仙
孙雪松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Architecture and Technology
Original Assignee
Xian University of Architecture and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Architecture and Technology
Priority to CN202010710684.XA
Publication of CN111914917A
Application granted
Publication of CN111914917B
Status: Active
Anticipated expiration


Abstract

Translated from Chinese

The invention discloses an improved target detection algorithm based on a feature pyramid network and an attention mechanism. Following the principle of the feature pyramid network, the method fuses the 6 multi-scale feature maps extracted by the base network of the original SSD algorithm; each new feature map formed by the fusion contains rich contextual information, which improves detection ability. An attention model is then added to the fused feature maps to effectively extract the feature information of small targets. The method reduces missed detections and improves the robustness of the algorithm, while still meeting real-time requirements in terms of detection speed.

Description

An Improved Target Detection Algorithm Based on Feature Pyramid Network and Attention Mechanism

Technical Field

The invention belongs to the field of digital image processing and relates to target detection, in particular to an improved target detection algorithm based on a feature pyramid network and an attention mechanism.

Background Art

The task of target detection is to find the objects of interest in an image and determine their categories and locations. It is one of the core problems in computer vision and is widely applied in infrared detection, intelligent video surveillance, remote-sensing image target detection, medical diagnosis, and fire and smoke detection in intelligent buildings. Target detection algorithms can be divided into traditional algorithms and deep-learning-based algorithms. Representative traditional algorithms include the SIFT algorithm and the Viola-Jones (V-J) detection algorithm, but these methods have high time complexity and poor robustness. Classic deep-learning-based algorithms include R-CNN, Fast R-CNN, Faster R-CNN, YOLO, and SSD. Although many excellent target detection algorithms exist at this stage, their detection performance still has shortcomings, leading to problems such as missed and false detections.

Summary of the Invention

In view of the above defects or deficiencies in the prior art, the purpose of the present invention is to provide an improved target detection algorithm based on a feature pyramid network and an attention mechanism.

To accomplish the above task, the present invention adopts the following technical solution:

An improved target detection algorithm based on a feature pyramid network and an attention mechanism, characterized in that it comprises the following steps:

Step 1) Following the principle of the feature pyramid network, the 6 multi-scale feature maps extracted from the input image by the base network VGG-16 of the original SSD algorithm are fused in order from small to large, yielding feature maps that fuse different layers; each fused feature map contains both rich semantic information and rich detail information;

In the original SSD algorithm, the feature maps extracted from the input image by the base network VGG-16 decrease in scale from large to small: the low-level feature maps have higher resolution and contain more detail information, while the high-level feature maps have lower resolution and contain more abstract semantic information. The original SSD algorithm therefore uses the low-level feature maps to detect small targets and the high-level feature maps to detect medium and large targets;

Step 2) A channel attention mechanism is introduced, and an attention model is added to the two fused feature maps that have the richest detail and semantic information and are the most sensitive to small-target detection. The attention mechanism is realized by adding a mask to the feature map, which marks the features of the regions of interest; through continuous training, the network learns the regions of interest that deserve attention in each image and suppresses the influence of other, interfering regions, thereby enhancing the algorithm's ability to detect small target objects.

According to the present invention, the input image in step 1) has size 300×300, and the feature maps used for detection obtained from the base network VGG-16 have sizes 38×38, 19×19, 10×10, 5×5, 3×3, and 1×1. Following the principle of the feature pyramid network, the feature maps are fused in order of size from small to large, yielding 6 feature maps whose sizes remain 38×38, 19×19, 10×10, 5×5, 3×3, and 1×1.

Further, in step 2), an attention model is added to the feature maps fused according to the feature pyramid principle in step 1). Because the fusion proceeds from the smallest feature map to the largest, the most information-rich maps after fusion are the 38×38 and 19×19 feature maps; compared with the others, these two have richer detail and semantic information and are more sensitive to small-target detection. To maintain the detection speed and reduce the computational cost of the algorithm, the attention model is added only to the fused 38×38 and 19×19 feature maps. The detection algorithm proceeds as follows:

a) Target detection based on a single-stage network model uses the idea of regression: a single convolutional neural network directly regresses the categories and bounding boxes of targets from the input image. First, following the principle of the feature pyramid network, the multi-scale feature maps extracted by the original SSD algorithm are fused in order of size from small to large. The multi-scale feature maps extracted by the base network VGG-16 of the original SSD algorithm have sizes 38×38, 19×19, 10×10, 5×5, 3×3, and 1×1; fusing them from small to large according to the feature pyramid principle yields 6 feature maps of sizes 38×38, 19×19, 10×10, 5×5, 3×3, and 1×1, all of which contain rich semantic and detail information.

b) Following the principle of the attention mechanism, channel attention is introduced and an attention model is added to the fused feature maps from a). Since the fused 38×38 and 19×19 feature maps contain the richest information, and in order to preserve the real-time performance of the algorithm, the attention model is added only to these two feature maps.

c) For the 6 multi-scale feature maps obtained in steps a) and b), candidate boxes of different scales and aspect ratios are set at each cell. The scale of the candidate boxes is computed by formula (1):

s_k = s_min + (s_max − s_min)/(m − 1) · (k − 1),  k ∈ [1, m]   (1)

where m is the number of feature layers; s_k is the ratio of the candidate box to the image; and s_max and s_min are the maximum and minimum ratios, set to 0.9 and 0.2 respectively. Formula (1) gives the scale of each candidate box;

For the aspect ratios, the values are generally taken as a_r ∈ {1, 2, 3, 1/2, 1/3}, and the width and height of the candidate boxes are computed by formula (2):

w_k^a = s_k · √(a_r),  h_k^a = s_k / √(a_r)   (2)

For the candidate box with aspect ratio 1, an additional box of scale s'_k = √(s_k · s_(k+1)) is also added. The center coordinates of the candidate boxes are ((i + 0.5)/|f_k|, (j + 0.5)/|f_k|), where |f_k| is the size of the k-th feature layer;

d) A 3×3 convolution kernel is used to detect the categories and confidences on the multi-scale feature maps, and the target detection algorithm is trained. During training the loss function is defined as the weighted sum of the localization loss (loc) and the confidence loss (conf), computed as:

L(x, c, l, g) = (1/N) · (L_conf(x, c) + α · L_loc(x, l, g))

where N is the number of matched candidate boxes; x ∈ {1, 0} indicates whether a candidate box matches a ground-truth box (x = 1 if it matches, otherwise x = 0); c is the predicted class confidence; g is the position parameter of the ground-truth box; l is the predicted box position; and α is a weight coefficient, set to 1.

For the localization loss in SSD, Smooth L1 loss is used to regress the offsets of the candidate box center (cx, cy), width (w), and height (h). The formulas are:

L_loc(x, l, g) = Σ_(i∈Pos) Σ_(m∈{cx,cy,w,h}) x_ij^k · smooth_L1(l_i^m − ĝ_j^m)

ĝ_j^cx = (g_j^cx − d_i^cx)/d_i^w,  ĝ_j^cy = (g_j^cy − d_i^cy)/d_i^h,  ĝ_j^w = log(g_j^w / d_i^w),  ĝ_j^h = log(g_j^h / d_i^h)

smooth_L1(x) = 0.5x²  if |x| < 1,  |x| − 0.5  otherwise

For the confidence loss in SSD, the typical softmax loss is used:

L_conf(x, c) = −Σ_(i∈Pos) x_ij^p · log(ĉ_i^p) − Σ_(i∈Neg) log(ĉ_i^0),  where  ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p)

The improved target detection algorithm of the present invention, based on the single-stage SSD detection algorithm, takes into account the influence of feature-map resolution on detection performance. Following the idea of the feature pyramid network, it fuses the multi-scale feature maps extracted by the original SSD algorithm into feature maps with rich semantic and detail information; then, following the principle of the attention mechanism, it adds an attention model to the fused 38×38 and 19×19 feature maps to strengthen the recognition of small target objects.

Brief Description of the Drawings

Figure 1 is a schematic diagram of the network structure of the target detection algorithm combining the feature pyramid network and the attention mechanism;

Figure 2 compares the detection results of the original SSD algorithm and the improved target detection algorithm: panels a1-a5 on the left are detections by the original SSD algorithm; panels b1-b5 on the right are detections by the improved target detection algorithm.

The present invention is described in further detail below with reference to the accompanying drawings and embodiments.

Detailed Description of the Embodiments

The technical approach of the improved target detection algorithm of the present invention is as follows. Taking the single-stage SSD detection algorithm as the basis, the shortcomings of SSD are analyzed and improvements are proposed. Following the principle of the feature pyramid network, the 6 feature maps extracted by the original SSD algorithm are fused to form new feature maps with rich semantic and detail information. An attention model is then added to the fused feature maps; to preserve real-time performance, it is added only to the 38×38 and 19×19 feature maps, which contain the richest information and are the most sensitive to small-target detection. These improvements raise the detection ability of the algorithm and reduce problems such as missed detections.

This embodiment provides an improved target detection algorithm based on a feature pyramid network and an attention mechanism, comprising the following steps:

Step 1) Following the principle of the feature pyramid network, the 6 multi-scale feature maps extracted from the input image by the base network VGG-16 of the original SSD algorithm are fused in order from small to large, yielding feature maps that fuse different layers; each fused feature map contains both rich semantic information and rich detail information;

In the original SSD algorithm, the feature maps extracted from the input image by the base network VGG-16 decrease in scale from large to small: the low-level feature maps have higher resolution and contain more detail information, while the high-level feature maps have lower resolution and contain more abstract semantic information. The original SSD algorithm therefore uses the low-level feature maps to detect small targets and the high-level feature maps to detect medium and large targets;

Step 2) A channel attention mechanism is introduced, and an attention model is added to the two fused feature maps that have the richest detail and semantic information and are the most sensitive to small-target detection. The attention mechanism is realized by adding a mask to the feature map, which marks the features of the regions of interest; through continuous training, the network learns the regions of interest that deserve attention in each image and suppresses the influence of other, interfering regions, thereby enhancing the algorithm's ability to detect small target objects.

In step 1), the input image has size 300×300, and the feature maps extracted by the base network VGG-16 have sizes 38×38, 19×19, 10×10, 5×5, 3×3, and 1×1. Following the idea of the feature pyramid network, the 6 extracted feature maps are fused pairwise from small to large: 1×1 with 3×3, 3×3 with 5×5, 5×5 with 10×10, 10×10 with 19×19, and 19×19 with 38×38. The fused feature maps still have sizes 38×38, 19×19, 10×10, 5×5, 3×3, and 1×1.

In step 2), following the principle of the attention mechanism, an attention model is added to the fused feature maps. Since the fused 38×38 and 19×19 feature maps contain the richest information, and in order to preserve the real-time performance of the detection algorithm and reduce computation, the attention model is added only to these two feature maps; adding it enhances the extraction of small-target features.

The detection process of the improved target detection algorithm is as follows:

a) Target detection based on a single-stage network model uses the idea of regression: a single convolutional neural network directly regresses the categories and bounding boxes of targets from the input image. First, following the principle of the feature pyramid network, the multi-scale feature maps extracted by the original SSD algorithm are fused in order of size from small to large. The maps extracted by the base network VGG-16 have sizes 38×38, 19×19, 10×10, 5×5, 3×3, and 1×1; the fusion from small to large is illustrated with the 1×1 and 3×3 feature maps as an example:

First, the 1×1 feature map is upsampled by interpolation: on the basis of the original image pixels, new elements are inserted between pixel points with a suitable interpolation algorithm, enlarging the feature map to the same size as the 3×3 feature map. A 1×1 convolution is then applied to the 3×3 feature map to change its channel count so that it matches that of the upsampled map. Finally the two maps are fused, and a 3×3 convolution is applied to the fused feature map to eliminate the aliasing effect of upsampling. Fusion between the other adjacent feature maps follows the same method. The fusion yields 6 feature maps of sizes 38×38, 19×19, 10×10, 5×5, 3×3, and 1×1, all containing rich semantic and detail information.
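The fusion step described above can be sketched in NumPy. This is an illustrative sketch, not the patented implementation: nearest-neighbour repetition stands in for the unspecified interpolation method (and only handles integer scale factors), the channel counts and the 1×1-convolution weights are hypothetical placeholders, and the final 3×3 smoothing convolution is omitted.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map (stands in for the
    unspecified interpolation; integer factors only)."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def conv1x1(x, w):
    """A 1x1 convolution is per-pixel channel mixing; w has shape (C_out, C_in)."""
    return np.tensordot(w, x, axes=([1], [0]))  # -> (C_out, H, W)

def fuse(small, large, w_proj):
    """Upsample the smaller (higher-level) map to the larger map's size,
    project the larger map's channels with a 1x1 conv to match, then add."""
    factor = large.shape[1] // small.shape[1]
    return upsample_nearest(small, factor) + conv1x1(large, w_proj)

rng = np.random.default_rng(0)
top = rng.standard_normal((256, 1, 1))      # the 1x1 map; 256 channels assumed
below = rng.standard_normal((128, 3, 3))    # the 3x3 map; 128 channels assumed
w = rng.standard_normal((256, 128)) * 0.01  # placeholder 1x1-conv weights
fused = fuse(top, below, w)
print(fused.shape)  # (256, 3, 3)
```

The fused map keeps the spatial size of the lower-level map and the channel count of the upsampled higher-level map, matching the "sizes remain unchanged after fusion" property stated in the text.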

b) Following the principle of the attention mechanism, channel attention is introduced and an attention model is added to the fused feature maps from step a). Since the fused 38×38 and 19×19 feature maps contain the richest information, and in order to preserve the real-time performance of the algorithm, the attention model is added only to these two feature maps. Adding the attention model proceeds in three steps: squeeze, excitation, and attention.

The squeeze operation is given by formula (1):

y_c = (1/(H × W)) · Σ_(i=1..H) Σ_(j=1..W) u_c(i, j)   (1)

where H and W are the height and width of the input, U is the input, Y is the output, and C is the number of input channels;

Formula (1) converts an H×W×C input into a 1×1×C output, which is equivalent to a global average pooling operation.
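As a minimal NumPy illustration, the squeeze step is just a per-channel mean over the spatial dimensions:

```python
import numpy as np

def squeeze(u):
    """Global average pooling: (C, H, W) input -> (C,) channel descriptor."""
    return u.mean(axis=(1, 2))

u = np.arange(8, dtype=float).reshape(2, 2, 2)  # toy 2-channel 2x2 input
print(squeeze(u).tolist())  # [1.5, 5.5]
```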

The excitation operation is given by formula (2):

S = h-Swish(W2 · ReLU6(W1 · Y))   (2)

where Y is the output of the squeeze operation and S is the output of the excitation operation; W1 has dimensions C/r × C and W2 has dimensions C × C/r, where r is a scaling parameter, here set to 4. Multiplying Y by W1 is a fully connected operation, followed by the ReLU6 activation function; multiplying by W2 is another fully connected operation, followed by the hard-Swish activation function, which completes the excitation. The ReLU6 and hard-Swish activation functions are given in formula (3):

ReLU6(x) = min(max(0, x), 6),  h-Swish(x) = x · ReLU6(x + 3) / 6   (3)
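The two activation functions are easy to state directly in NumPy (a straightforward transcription of formula (3)):

```python
import numpy as np

def relu6(x):
    """ReLU6(x) = min(max(0, x), 6)."""
    return np.minimum(np.maximum(x, 0.0), 6.0)

def hard_swish(x):
    """h-Swish(x) = x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0

xs = np.array([-4.0, 0.0, 3.0, 10.0])
print(relu6(xs).tolist())       # [0.0, 0.0, 3.0, 6.0]
print(hard_swish(xs).tolist())  # [-0.0, 0.0, 3.0, 10.0]
```

Note that h-Swish saturates to the identity for x ≥ 3 and to zero for x ≤ −3, which is what makes it a cheap approximation of the Swish gate.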

The attention operation is given by formula (4):

X = S × U   (4)

where X is the feature map after the attention mechanism is applied, U is the original input, and S is the output of the excitation operation; the weight of each channel is multiplied by the features of that channel of the feature map.
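Putting the three steps together, the whole channel-attention block can be sketched as follows. The weights W1 and W2 are random placeholders standing in for learned parameters; the channel count is a toy value, while r = 4 follows the text.

```python
import numpy as np

def relu6(x):
    return np.minimum(np.maximum(x, 0.0), 6.0)

def hard_swish(x):
    return x * relu6(x + 3.0) / 6.0

def se_attention(u, w1, w2):
    """Channel attention on a (C, H, W) map: squeeze (global average pool),
    excite (two FC layers, ReLU6 then h-Swish), then rescale each channel."""
    y = u.mean(axis=(1, 2))             # squeeze   -> (C,)
    s = hard_swish(w2 @ relu6(w1 @ y))  # excite    -> per-channel weights (C,)
    return s[:, None, None] * u         # attention -> X = S x U

rng = np.random.default_rng(1)
c, r = 8, 4                                  # toy channel count; r = 4 as in the text
u = rng.standard_normal((c, 19, 19))         # a fused 19x19 feature map
w1 = rng.standard_normal((c // r, c)) * 0.1  # placeholder learned weights
w2 = rng.standard_normal((c, c // r)) * 0.1
x = se_attention(u, w1, w2)
print(x.shape)  # (8, 19, 19)
```

The output keeps the input's shape; only the per-channel magnitudes change, which is why adding this block does not alter the 38×38 and 19×19 map sizes used downstream.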

c) For the 6 multi-scale feature maps obtained in steps a) and b), candidate boxes of different scales and aspect ratios are set at each cell. The scale of the candidate boxes is computed by formula (5):

s_k = s_min + (s_max − s_min)/(m − 1) · (k − 1),  k ∈ [1, m]   (5)

where m is the number of feature layers; s_k is the ratio of the candidate box to the image; and s_max and s_min are the maximum and minimum ratios, set to 0.9 and 0.2 respectively;

Formula (5) gives the scale of each candidate box;
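With m = 6, s_min = 0.2, and s_max = 0.9 as stated above, the scale formula can be evaluated directly (a plain-Python sketch):

```python
def anchor_scales(m=6, s_min=0.2, s_max=0.9):
    """Candidate-box scales: s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1)."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]

print([round(s, 2) for s in anchor_scales()])  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```

So the boxes on the 38×38 layer cover about 20% of the image, growing linearly to 90% on the 1×1 layer.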

For the aspect ratios, the values are generally taken as a_r ∈ {1, 2, 3, 1/2, 1/3}, and the width and height of the candidate boxes are computed by formula (6):

w_k^a = s_k · √(a_r),  h_k^a = s_k / √(a_r)   (6)

For the candidate box with aspect ratio 1, an additional box of scale s'_k = √(s_k · s_(k+1)) is also added. The center coordinates of the candidate boxes are ((i + 0.5)/|f_k|, (j + 0.5)/|f_k|), where |f_k| is the size of the k-th feature layer;

d) A 3×3 convolution kernel is used to detect the categories and confidences on the multi-scale feature maps, and the target detection algorithm is trained. During training the loss function is defined as the weighted sum of the localization loss (loc) and the confidence loss (conf), computed by formula (7):

L(x, c, l, g) = (1/N) · (L_conf(x, c) + α · L_loc(x, l, g))   (7)

where N is the number of matched candidate boxes; x ∈ {1, 0} indicates whether a candidate box matches a ground-truth box (x = 1 if it matches, otherwise x = 0); c is the predicted class confidence; g is the position parameter of the ground-truth box; l is the predicted box position; and α is a weight coefficient, set to 1.

For the localization loss in SSD, Smooth L1 loss is used to regress the offsets of the candidate box center (cx, cy), width (w), and height (h). The formulas are:

L_loc(x, l, g) = Σ_(i∈Pos) Σ_(m∈{cx,cy,w,h}) x_ij^k · smooth_L1(l_i^m − ĝ_j^m)   (8)

ĝ_j^cx = (g_j^cx − d_i^cx)/d_i^w,  ĝ_j^cy = (g_j^cy − d_i^cy)/d_i^h,  ĝ_j^w = log(g_j^w / d_i^w),  ĝ_j^h = log(g_j^h / d_i^h)

smooth_L1(x) = 0.5x²  if |x| < 1,  |x| − 0.5  otherwise

For the confidence loss in SSD, the typical softmax loss is used, given by formula (9):

L_conf(x, c) = −Σ_(i∈Pos) x_ij^p · log(ĉ_i^p) − Σ_(i∈Neg) log(ĉ_i^0),  where  ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p)   (9)
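The softmax confidence loss can be sketched directly from formula (9). This is an illustrative transcription with a toy input, not the patented training code:

```python
import numpy as np

def softmax(c):
    e = np.exp(c - c.max())  # shift by the max for numerical stability
    return e / e.sum()

def confidence_loss(pos_logits, pos_labels, neg_logits):
    """Negative log-likelihood of the true class for positive boxes, plus that
    of the background class (index 0) for negative boxes."""
    loss = 0.0
    for c, p in zip(pos_logits, pos_labels):
        loss -= np.log(softmax(c)[p])
    for c in neg_logits:
        loss -= np.log(softmax(c)[0])
    return float(loss)

# One positive box whose true class (index 1) already has the largest logit:
print(round(confidence_loss([np.array([0.1, 2.0, -1.0])], [1], []), 4))  # 0.1818
```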

The improved target detection algorithm model is then trained.

In this embodiment, the PASCAL VOC2007 and PASCAL VOC2012 data sets are used as the training sets, and data augmentation is applied: the training images are expanded by operations such as horizontal flipping and random cropping.

Data used in the experiments: the PASCAL VOC data set is a standardized data set for image recognition and classification containing 20 categories: person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, and TV/monitor.

This embodiment trains on the VOC2007 and VOC2012 data sets and tests on the VOC2007 data set. Training uses stochastic gradient descent (SGD) with a batch size of 32, an initial learning rate of 0.001, and a momentum of 0.9; the learning rate is reduced by 90% at 100,000 and 150,000 iterations, for a total of 200,000 training iterations.

To verify the detection effect of the improved target detection algorithm of this embodiment, the applicant uses the test set of PASCAL VOC2007 for evaluation, with mAP (mean Average Precision) as the evaluation metric. Each detected category yields a curve of precision versus recall (the P-R curve); the area under the curve is the AP value, and averaging the AP values over all detected categories gives the mAP. The detection results are compared with those of other mainstream detection models both subjectively and objectively (see Table 1 and Table 2).
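The AP-as-area-under-the-P-R-curve computation can be sketched in plain Python. Note this uses the trapezoidal rule as one common scheme; the official VOC2007 protocol itself used 11-point interpolation, so this is an illustration of the idea, not the exact benchmark procedure:

```python
def average_precision(precision, recall):
    """Area under the precision-recall curve via the trapezoidal rule."""
    pts = sorted(zip(recall, precision))  # order points by increasing recall
    ap = 0.0
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        ap += (r1 - r0) * (p0 + p1) / 2.0  # trapezoid between adjacent points
    return ap

def mean_ap(ap_per_class):
    """mAP: mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)

# Toy P-R curve: precision 1.0 up to recall 0.5, then falling to 0.5.
print(average_precision([1.0, 1.0, 0.5], [0.0, 0.5, 1.0]))  # 0.875
```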

Table 1

[Table 1 appears only as an image in the original document.]

Table 2

[Table 2 appears only as an image in the original document.]

In the subjective evaluation, the detection results of the original SSD algorithm and the improved algorithm are compared (see Figure 2, in which panels a1-a5 are detections by the original SSD algorithm and panels b1-b5 are detections by the improved target detection algorithm). As the figure shows, compared with the original SSD algorithm, the improved algorithm significantly reduces missed detections, is better at detecting densely distributed small targets, and finds more targets. The detection effect is clearly improved over the original SSD algorithm.

Claims (3)

1. An improved target detection algorithm based on a feature pyramid network and an attention mechanism, characterized by comprising the following steps:
step 1) following the principle of a feature pyramid network, performing feature fusion, in order from small to large, on the 6 multi-scale feature maps extracted from an input image by the base network VGG-16 of the original SSD algorithm, obtaining feature maps that fuse different layers, wherein each fused feature map contains both rich semantic information and rich detail information;
in the original SSD algorithm, the feature maps extracted from the input image by the base network VGG-16 decrease in scale layer by layer: the low-level feature maps have high resolution and contain more detail information, while the high-level feature maps have low resolution and contain more abstract semantic information; the original SSD algorithm therefore uses the low-level feature maps to detect small targets and the high-level feature maps to detect medium and large targets;
step 2) introducing a channel attention mechanism by adding an attention model to the two fused feature maps that carry the richest detail and semantic information and are the most sensitive to small-target detection; that is, a mask is applied to each feature map to realize the attention mechanism, identifying the features of the regions of interest; through continued training, the network learns which regions of each image deserve attention and suppresses the influence of other, distracting regions, thereby strengthening the algorithm's ability to detect small target objects.
2. The algorithm of claim 1, wherein the input image in step 1) has a size of 300 × 300, and the feature maps used for detection after passing through the base network VGG-16 have sizes 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1; following the principle of the feature pyramid network, feature fusion is performed on these detection feature maps in order of size from small to large, yielding 6 fused feature maps whose sizes remain 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1.
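The small-to-large fusion described in this claim can be illustrated as follows: each smaller map is upsampled and added into the next larger one, so the fused pyramid keeps the original sizes. This is a minimal NumPy sketch under simplifying assumptions; the actual network uses learned convolutions for the lateral and fusion paths, and the nearest-neighbour handling of non-integer size ratios here (pad with zeros) is only an approximation:

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_top_down(feats):
    """Fuse multi-scale feature maps from the smallest to the largest:
    each map receives the upsampled map from the level above it.
    `feats` is ordered large -> small, e.g. sizes 38, 19, 10, 5, 3, 1."""
    fused = [feats[-1]]                      # start from the smallest (1x1) map
    for f in reversed(feats[:-1]):
        top = fused[-1]
        factor = max(1, f.shape[1] // top.shape[1])  # integer upsampling factor
        up = upsample_nearest(top, factor)
        # crop / zero-pad when sizes are not exact multiples (a simplification)
        up = up[:, :f.shape[1], :f.shape[2]]
        pad_h = f.shape[1] - up.shape[1]
        pad_w = f.shape[2] - up.shape[2]
        up = np.pad(up, ((0, 0), (0, pad_h), (0, pad_w)))
        fused.append(f + up)
    return list(reversed(fused))             # back to large -> small order
```

The output maps have the same 38 × 38 down to 1 × 1 sizes as the inputs, with each map enriched by the semantic information propagated down from the smaller maps.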
3. The algorithm of claim 1, wherein in step 2) an attention model is added to the feature maps fused according to the feature pyramid principle in step 1); because the fusion proceeds in order of feature-map size from small to large, the feature maps richest in information after fusion are the 38 × 38 and 19 × 19 maps, which carry more detail and semantic information than the other feature maps and are more sensitive to small-object detection; to preserve the detection speed of the algorithm and limit its computation, the attention model is added only to these two fused feature maps; the detection process of the target detection algorithm is as follows:
a) target detection based on a single-stage network model: following the regression idea, the category and bounding box of each target are regressed directly from the input image by a convolutional neural network; first, following the principle of the feature pyramid network, the multi-scale feature maps extracted by the original SSD algorithm are fused in order of size from small to large; in the original SSD algorithm, the multi-scale feature maps extracted from the input image by the base network VGG-16 have sizes 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1; fusing them according to the feature pyramid principle, from small to large, yields 6 feature maps of the same sizes that contain rich semantic and detail information;
b) channel attention is introduced following the principle of the attention mechanism, and an attention model is added to the fused feature maps; among the feature maps fused in step a), the 38 × 38 and 19 × 19 maps contain the richest information, and to keep the algorithm real-time, the attention model is added only to these two feature maps;
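The channel-attention "mask" of steps 2) and b) can be sketched in the style of a squeeze-and-excitation block: global average pooling produces a channel descriptor, two small fully connected layers produce per-channel weights in (0, 1), and each channel is rescaled by its weight. This is a minimal NumPy sketch; `w1` and `w2` stand in for learned weight matrices, and the real model operates on batched convolutional tensors during training:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """SE-style channel attention on a (C, H, W) feature map:
    squeeze by global average pooling, excite through two FC layers,
    then rescale each channel by its learned weight (the 'mask')."""
    squeeze = x.mean(axis=(1, 2))            # (C,) channel descriptor
    hidden = np.maximum(w1 @ squeeze, 0.0)   # ReLU bottleneck, (C/r,)
    mask = sigmoid(w2 @ hidden)              # (C,) per-channel weights in (0, 1)
    return x * mask[:, None, None]           # reweight the channels
```

Because the mask values lie strictly between 0 and 1, informative channels are preserved while less useful channels are attenuated, which is how the network learns to focus on regions and channels relevant to small targets.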
c) for each of the 6 multi-scale feature maps obtained in steps a) and b), candidate boxes of different sizes and aspect ratios are set in each cell; the scale of the candidate boxes is calculated by formula (1):
s_k = s_min + ((s_max − s_min) / (m − 1)) · (k − 1),  k ∈ [1, m]    (1)

wherein m represents the number of feature layers; s_k represents the ratio of the candidate box to the picture; s_max and s_min, the maximum and minimum of this ratio, are 0.9 and 0.2 respectively; the scale of each candidate box is obtained from formula (1);
for the aspect ratio, the values are generally a_r ∈ {1, 2, 3, 1/2, 1/3},
and the width and height of the candidate box are calculated by formula (2):

w_k^a = s_k · √(a_r),  h_k^a = s_k / √(a_r)    (2)
for a candidate box with aspect ratio of 1, a scale is also added
Figure FDA0002596425360000034
The candidate frame of (1), the center coordinates of the candidate frame are
Figure FDA0002596425360000035
Wherein | fk| represents the size of the feature layer;
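The candidate-box construction of step c) can be sketched in plain Python as follows; for the last feature layer, where s_{k+1} is undefined, a scale of 1.0 is assumed here, which is one common convention rather than something stated in the text:

```python
import math

def anchor_scales(m=6, s_min=0.2, s_max=0.9):
    """Per-layer candidate-box scales from formula (1):
    s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1), k = 1..m."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]

def boxes_for_cell(k, i, j, f_k, scales, ratios=(1, 2, 3, 1 / 2, 1 / 3)):
    """Candidate boxes (cx, cy, w, h), in relative coordinates, for cell (i, j)
    of feature layer k, following formula (2) plus the extra
    sqrt(s_k * s_{k+1}) box for aspect ratio 1."""
    cx = (i + 0.5) / f_k
    cy = (j + 0.5) / f_k
    s_k = scales[k - 1]
    boxes = [(cx, cy, s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in ratios]
    # extra box with aspect ratio 1 and scale sqrt(s_k * s_{k+1});
    # assume s_{m+1} = 1.0 for the last layer (a convention, not from the text)
    s_next = scales[k] if k < len(scales) else 1.0
    s_prime = math.sqrt(s_k * s_next)
    boxes.append((cx, cy, s_prime, s_prime))
    return boxes
```

With m = 6 this gives scales from 0.2 up to 0.9 and six candidate boxes per cell, matching the SSD default-box scheme the claim describes.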
d) categories and confidences are predicted on the multi-scale feature maps by 3 × 3 convolutions, and the target detection algorithm is trained; the loss function used in training is defined as the weighted sum of the location loss (loc) and the confidence loss (conf):
L(x, c, l, g) = (1 / N) · (L_conf(x, c) + α · L_loc(x, l, g))

in the formula, N is the number of matched candidate boxes; x ∈ {1, 0} indicates whether a candidate box matches a ground-truth box (x = 1 if matched, 0 otherwise); c is the predicted class confidence; g is the position parameter of the ground-truth box; l is the predicted position of the predicted box; α is a weight coefficient, set to 1;
for the position loss function in SSD, the center (cx, cy) of the candidate frame, and the offset of the width (w) and height (h) are regressed using Smooth L1 loss. The formula is as follows:
Figure FDA0002596425360000041
Figure FDA0002596425360000042
Figure FDA0002596425360000043
for the confidence loss function in SSD, a typical softmax loss is used, which is formulated as:
Figure FDA0002596425360000044
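A compact NumPy sketch of the training loss in step d) is given below. Candidate-to-ground-truth matching and the hard-negative mining that the full SSD training pipeline performs before this computation are omitted, so this assumes all N boxes are already matched positives:

```python
import numpy as np

def smooth_l1(d):
    """Smooth L1: 0.5 * d^2 if |d| < 1, else |d| - 0.5."""
    d = np.abs(d)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5)

def multibox_loss(loc_pred, loc_target, cls_logits, labels, alpha=1.0):
    """Weighted sum of confidence (softmax) loss and location (Smooth L1)
    loss over N matched candidate boxes:
    L = (1/N) * (L_conf + alpha * L_loc)."""
    n = loc_pred.shape[0]
    loc_loss = smooth_l1(loc_pred - loc_target).sum()
    # numerically stable softmax cross-entropy over the class logits
    z = cls_logits - cls_logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    conf_loss = -log_probs[np.arange(n), labels].sum()
    return (conf_loss + alpha * loc_loss) / n
```

When the location predictions equal their targets, only the confidence term remains; with uniform logits over C classes it reduces to log C per box, which is a convenient sanity check.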
Priority Applications (1)

Application Number: CN202010710684.XA
Priority Date / Filing Date: 2020-07-22
Title: An improved object detection algorithm based on feature pyramid network and attention mechanism

Publications (2)

Publication Number: CN111914917A, published 2020-11-10
Publication Number: CN111914917B, published 2025-01-17


Patent Citations (9)

* Cited by examiner, † Cited by third party

- US20180182109A1 (2016-12-22 / 2018-06-28), TCL Research America Inc.: System and method for enhancing target tracking via detector and tracker fusion for unmanned aerial vehicles*
- CN109344821A (2018-08-30 / 2019-02-15), Xidian University: Small target detection method based on feature fusion and deep learning*
- US20190341025A1 (2018-04-18 / 2019-11-07), Sony Interactive Entertainment Inc.: Integrated understanding of user characteristics by multimodal processing*
- CN110533084A (2019-08-12 / 2019-12-03), Chang'an University: A multi-scale target detection method based on self-attention mechanism*
- CN110674866A (2019-09-23 / 2020-01-10), Lanzhou University of Technology: Method for detecting X-ray breast lesion images using a transfer-learning feature pyramid network*
- CN110705457A (2019-09-29 / 2020-01-17), Beijing Research Institute of Uranium Geology: Remote sensing image building change detection method*
- CN111179217A (2019-12-04 / 2020-05-19), Tianjin University: A multi-scale target detection method in remote sensing images based on attention mechanism*
- CN111259940A (2020-01-10 / 2020-06-09), Hangzhou Dianzi University: Target detection method based on spatial attention map*
- CN111401201A (2020-03-10 / 2020-07-10), Nanjing University of Information Science and Technology: A multi-scale object detection method for aerial imagery driven by spatial pyramid attention*

(dates: priority date / publication date)


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party

- Xu Chengqi; Hong Xuehai: "Feature Pyramid Object Detection Network Based on Function Preservation", Pattern Recognition and Artificial Intelligence, no. 06, 15 June 2020*
- Shen Wenxiang; Qin Pinle; Zeng Jianchao: "Indoor Crowd Detection Network Based on Multi-level Features and Hybrid Attention Mechanism", Journal of Computer Applications, no. 12*
- Gao Jianling; Sun Jian; Wang Ziniu; Han Yulu; Feng Jiaojiao: "SSD Object Detection Algorithm Based on Attention Mechanism and Feature Fusion", Software, no. 02, 15 February 2020*




Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
