CN109583456B - Infrared surface target detection method based on feature fusion and dense connection - Google Patents

Infrared surface target detection method based on feature fusion and dense connection

Info

Publication number
CN109583456B
CN109583456B (Application CN201811386234.9A)
Authority
CN
China
Prior art keywords
target
feature
bounding box
network
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811386234.9A
Other languages
Chinese (zh)
Other versions
CN109583456A (en)
Inventor
周慧鑫
施元斌
赵东
郭立新
张嘉嘉
秦翰林
王炳健
赖睿
李欢
宋江鲁奇
姚博
于跃
贾秀萍
周峻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN201811386234.9A
Publication of CN109583456A
Application granted
Publication of CN109583456B
Legal status: Active
Anticipated expiration

Abstract

Translated from Chinese

The invention discloses an infrared surface target detection method based on feature fusion and dense connection. An infrared image data set containing the targets to be recognized is constructed, and the positions and categories of those targets are annotated in the data set to obtain the known label images. The data set is divided into a training set and a validation set. Images in the training set are preprocessed with image enhancement and then subjected to feature extraction and feature fusion, and a regression network produces classification results and bounding boxes. A loss function is computed between the classification results and bounding boxes and the known label images, and the parameter values of the convolutional neural network are updated. The parameters are updated iteratively until the error is sufficiently small or the number of iterations reaches a set upper limit. Finally, the trained convolutional neural network processes the images in the validation set to obtain the detection accuracy, the required time, and the final detection result images.


Description

Translated from Chinese

Infrared surface target detection method based on feature fusion and dense connection

Technical Field

The invention belongs to the technical field of image processing, and in particular relates to an infrared surface target detection method based on feature fusion and dense connection.

Background Art

At present, mainstream target detection methods fall roughly into two categories: methods based on background modeling and methods based on foreground modeling. Background-modeling methods build a model of the background and declare regions that differ strongly from it to be targets; because real backgrounds are complex, the detection performance of such methods is often unsatisfactory. Foreground-modeling methods extract feature information of the target and declare the regions that best match this information to be targets; the most representative of these are detection methods based on deep learning. A deep-learning detector automatically extracts target features through a deep convolutional neural network and predicts the category and location of each target. The predictions are then compared with the annotations in the training set, a loss function is computed, and gradient descent is used to improve the features extracted by the network so that they better match the actual targets. At the same time, the parameters of the subsequent detection stage are updated to make the detection results more accurate. Training is repeated until the expected detection performance is reached.

Summary of the Invention

In order to solve the above problems in the prior art, the present invention provides a target detection method based on feature fusion and dense blocks.

The technical scheme adopted by the present invention is as follows:

An embodiment of the present invention provides an infrared surface target detection method based on feature fusion and dense connection, which is implemented through the following steps:

Step 1: construct an infrared image data set containing the targets to be recognized, annotate the positions and categories of those targets in the data set, and obtain the known label images;

Step 2: divide the infrared image data set into a training set and a validation set;

Step 3: preprocess the images in the training set with image enhancement;

Step 4: perform feature extraction and feature fusion on the preprocessed images, and obtain classification results and bounding boxes through a regression network; compute a loss function between the classification results and bounding boxes and the known label images, backpropagate the prediction error through the convolutional neural network using stochastic gradient descent with momentum, and update the parameter values of the convolutional neural network;

Step 5: repeat steps 3 and 4 to iteratively update the parameters of the convolutional neural network until the error is sufficiently small or the number of iterations reaches a set upper limit;

Step 6: process the images in the validation set with the trained convolutional neural network parameters to obtain the detection accuracy, the required time, and the final detection result images.
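Read end to end, steps 3 to 6 amount to a standard supervised training loop. A minimal sketch is given below, assuming a PyTorch implementation; the model, detection loss, data loader, learning rate, momentum, error threshold, and iteration cap are illustrative placeholders rather than values taken from the patent.

```python
# Minimal sketch of the training flow in steps 3-5 (PyTorch assumed).
import torch

def train(model, detection_loss, train_loader, max_iters=50000, err_threshold=1e-3):
    # stochastic gradient descent with momentum, as required by step 4
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    it = 0
    while it < max_iters:
        for images, labels in train_loader:       # step 3: augmented training images
            preds = model(images)                  # step 4: extraction, fusion, regression
            loss = detection_loss(preds, labels)   # compare with the known label images
            optimizer.zero_grad()
            loss.backward()                        # backpropagate the prediction error
            optimizer.step()                       # update the network parameters
            it += 1
            if loss.item() < err_threshold or it >= max_iters:  # step 5: stopping rule
                return model
    return model
```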

In the above scheme, the feature extraction and feature fusion of the preprocessed images in step 4, and the acquisition of the classification results and bounding boxes through the regression network, are specifically implemented through the following steps:

Step 401: randomly select a fixed number of images from the training set and divide each image into a 10×10 grid of regions;

Step 402: input the divided images from step 401 into a densely connected network for feature extraction;

Step 403: perform feature fusion on the extracted feature maps to obtain fused feature maps;

Step 404: generate a fixed number of proposal boxes for each region of the fused feature maps;

Step 405: send the fused feature maps and proposal boxes to the regression network for classification and bounding-box regression, and remove redundancy with non-maximum suppression to obtain the classification results and bounding boxes.

In the above scheme, the densely connected network in step 402 is computed according to the formula

$$d_l = H_l([d_0, d_1, \ldots, d_{l-1}])$$

where $d_l$ denotes the output of the $l$-th convolutional layer of the densely connected network; if the densely connected network contains $B$ convolutional layers in total, then $l$ takes values between 0 and $B$; $H_l(\cdot)$ is the combined operation of regularization, convolution, and rectified-linear activation; $d_0$ is the input image; and $d_{l-1}$ is the output of layer $l-1$.
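A minimal PyTorch sketch of this dense-connection rule follows: each layer receives the concatenation of the block input and all earlier layer outputs, and $H_l$ is written as batch normalization, convolution, and ReLU. The class names, channel counts, and growth rate are illustrative assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One H_l: regularization (batch norm) + convolution + linear rectification."""
    def __init__(self, in_channels, growth):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, growth, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.conv(self.bn(x)))

class DenseBlock(nn.Module):
    def __init__(self, in_channels, growth, num_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            [DenseLayer(in_channels + i * growth, growth) for i in range(num_layers)]
        )

    def forward(self, d0):
        feats = [d0]                              # d_0 is the block input
        for layer in self.layers:
            dl = layer(torch.cat(feats, dim=1))   # d_l = H_l([d_0, ..., d_{l-1}])
            feats.append(dl)
        return torch.cat(feats, dim=1)

block = DenseBlock(in_channels=32, growth=16, num_layers=4)
out = block(torch.randn(1, 32, 80, 80))           # -> (1, 32 + 4*16, 80, 80)
```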

In the above scheme, the feature fusion of the extracted feature maps in step 403 directly fuses feature maps of different scales by means of a pooling operation.

In the above scheme, the feature fusion of the extracted feature maps in step 403 is specifically implemented through the following steps:

Step 4031: convert the first group of feature maps F1 into new, smaller feature maps through a pooling operation, then fuse them with the second group of feature maps F2 to obtain new feature maps F2';

Step 4032: apply a pooling operation to the new feature maps F2', then fuse them with the third group of feature maps F3 to obtain new feature maps F3';

Step 4033: replace the second group of feature maps F2 and the third group of feature maps F3 with the new feature maps F2' and F3' as the input to the regression network.
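A short sketch of this cascade under assumed shapes: F1 is pooled down to the resolution of F2 and fused to give F2', which is pooled again and fused with F3 to give F3'. Channel-wise concatenation is assumed as the fusion operation, since the text only states that the pooled maps are fused directly; the tensor sizes are illustrative.

```python
import torch
import torch.nn.functional as F

def pool_and_fuse(larger, smaller):
    """Pool the larger-scale map by 2x2 and concatenate it with the smaller-scale map."""
    pooled = F.max_pool2d(larger, kernel_size=2, stride=2)
    return torch.cat([pooled, smaller], dim=1)

F1 = torch.randn(1, 64, 80, 80)
F2 = torch.randn(1, 128, 40, 40)
F3 = torch.randn(1, 256, 20, 20)

F2p = pool_and_fuse(F1, F2)    # F2': 64+128 channels at 40x40
F3p = pool_and_fuse(F2p, F3)   # F3': 64+128+256 channels at 20x20
```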

In the above scheme, step 405, in which the fused feature maps and proposal boxes are sent to the regression network for classification and bounding-box regression and redundancy is removed by non-maximum suppression to obtain the classification results and bounding boxes, is specifically implemented through the following steps:

Step 4051: divide the feature map into 10×10 regions and input them into the regression detection network;

Step 4052: for each region, the regression detection network outputs the positions and categories of 7 possible targets; there are A target categories in total, i.e. the network outputs the probability of each of the A categories, where A depends on the configuration of the training set; the position parameters comprise three items of data: the center coordinates, the width, and the height of the target bounding box;

Step 4053: in the non-maximum suppression, for bounding boxes of the same category the intersection-over-union is computed with the following formula:

$$S = \frac{M \cap N}{M \cup N}$$

where S is the computed intersection-over-union, M and N denote two bounding boxes of the same target category, M∩N denotes the intersection of bounding boxes M and N, and M∪N denotes their union. For two bounding boxes with S greater than 0.75, the one with the smaller classification score is discarded.
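A minimal sketch of this suppression rule in plain Python, assuming boxes given as (x1, y1, x2, y2) corner coordinates; the function and variable names are illustrative.

```python
def iou(m, n):
    """Intersection-over-union S of two boxes m and n given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(m[0], n[0]), max(m[1], n[1])
    ix2, iy2 = min(m[2], n[2]), min(m[3], n[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_m = (m[2] - m[0]) * (m[3] - m[1])
    area_n = (n[2] - n[0]) * (n[3] - n[1])
    return inter / (area_m + area_n - inter)

def nms(boxes, scores, threshold=0.75):
    """Greedy suppression: of two same-class boxes with S > threshold, keep the higher score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep
```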

In the above scheme, the computation in step 4 of the loss function between the classification results and bounding boxes and the known label images, the backpropagation of the prediction error through the convolutional neural network using stochastic gradient descent with momentum, and the update of the parameter values of the convolutional neural network are specifically implemented through the following steps:

Step 401: compute the loss function from the classification results, the positions and categories of the targets in the bounding boxes, and the positions and categories of the targets annotated in the training set. The loss function is computed as follows:

$$\mathrm{loss} = \sum_{i=1}^{100}\sum_{j=1}^{7} \mathbb{1}_{ij}^{\mathrm{obj}}\Big[(x_{ij}-\hat{x}_{ij})^{2}+(y_{ij}-\hat{y}_{ij})^{2}+(w_{ij}-\hat{w}_{ij})^{2}+(h_{ij}-\hat{h}_{ij})^{2}+(C_{ij}-\hat{C}_{ij})^{2}\Big]+\sum_{i=1}^{100}\sum_{j=1}^{7} \mathbb{1}_{ij}^{\mathrm{noobj}}(C_{ij}-\hat{C}_{ij})^{2}$$

where 100 is the number of regions; 7 is the number of proposal boxes to be predicted per region and of finally generated bounding boxes; i is the region index and j the box index; loss is the error value; obj indicates that a target is present and noobj that no target is present; x and y are the predicted horizontal and vertical coordinates of the box center; w and h are the predicted width and height of the box; C is the prediction of whether the box contains a target and comprises A values corresponding to the probabilities of the A target categories; $\hat{x}$, $\hat{y}$, $\hat{w}$, $\hat{h}$, and $\hat{C}$ are the corresponding annotated values; and $\mathbb{1}_{ij}^{\mathrm{obj}}$ and $\mathbb{1}_{ij}^{\mathrm{noobj}}$ indicate, respectively, that a target does or does not fall into the j-th box of region i;

Step 402: update the weights with stochastic gradient descent with momentum according to the computed loss.
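The momentum update of step 402 keeps a running velocity per weight and moves each weight along that velocity rather than along the raw gradient. A small sketch follows; the learning rate and momentum coefficient are illustrative assumptions (in PyTorch the same rule is available as torch.optim.SGD(..., momentum=...)).

```python
import torch

def sgd_momentum_step(weights, grads, velocities, lr=1e-3, momentum=0.9):
    """One momentum SGD update applied in place to every weight tensor."""
    with torch.no_grad():
        for w, g, v in zip(weights, grads, velocities):
            v.mul_(momentum).add_(g, alpha=-lr)   # v <- momentum * v - lr * grad
            w.add_(v)                             # w <- w + v

# one toy parameter, its gradient, and its (initially zero) velocity
w = torch.tensor([1.0, 2.0])
g = torch.tensor([0.5, -0.5])
v = torch.zeros_like(w)
sgd_momentum_step([w], [g], [v])
```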

In the above scheme, the preprocessing of step 3 expands the training set through random rotation, mirroring, flipping, scaling, translation, scale transformation, contrast transformation, noise perturbation, and color changes.

Compared with the prior art, the present invention trains on infrared images so that the target detection network acquires the ability to recognize both visible-light and infrared targets; at the same time, the improved network structure gives this method better detection performance than conventional deep-learning methods.

Description of the Drawings

Fig. 1 is a flowchart of the present invention;

Fig. 2 is a network structure diagram of the present invention;

Fig. 3 shows the result images of the present invention.

Detailed Description

In order to make the objects, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it.

An embodiment of the present invention provides an infrared surface target detection method based on feature fusion and dense connection. As shown in Fig. 1, the method is implemented through the following steps.

Step 1. Build the data set

If the detection algorithm is required to recognize infrared images, infrared images must be included in the data set. The present invention builds the data set from infrared images and manually annotates the images in the data set with bounding boxes.

Step 2. Expand the training set

The training set is expanded through random rotation, mirroring, flipping, scaling, translation, scale transformation, contrast transformation, noise perturbation, and color changes. This compensates for the difficulty of collecting data and improves training on small data sets.
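A minimal sketch of such an augmentation pipeline, assuming torchvision; the specific parameter ranges are illustrative assumptions.

```python
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                                  # random rotation
    transforms.RandomHorizontalFlip(p=0.5),                                 # mirroring
    transforms.RandomVerticalFlip(p=0.5),                                   # flipping
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1), scale=(0.8, 1.2)),  # translation, scaling
    transforms.ColorJitter(brightness=0.2, contrast=0.2),                   # contrast / color change
    transforms.ToTensor(),
    transforms.Lambda(lambda x: (x + 0.01 * torch.randn_like(x)).clamp(0.0, 1.0)),  # noise perturbation
])
```

In a detection setting the bounding-box annotations must be transformed with the same geometric operations; the pipeline above only transforms the image itself.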

Step 3. Divide the image into a 10×10 grid

The original image is divided into a 10×10 grid of regions; each region is responsible only for detecting targets whose center falls inside it, which greatly speeds up detection.
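A small sketch of this assignment rule; the 480×640 image size follows the simulation conditions given later, and the function name is illustrative.

```python
def grid_cell(cx, cy, img_w=640, img_h=480, grid=10):
    """Return the (row, col) of the 10x10 cell that owns a target centered at (cx, cy)."""
    col = min(int(cx / img_w * grid), grid - 1)
    row = min(int(cy / img_h * grid), grid - 1)
    return row, col

# A target centered at pixel (500, 300) is assigned to cell (6, 7):
print(grid_cell(500, 300))
```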

Step 4. Extract features with the dense network

The feature extraction process comprises the following steps:

First, the input image is processed by a convolutional layer with 32 kernels of size 3×3, followed by a 2×2 pooling operation, yielding feature map F1.

Second, features are extracted from F1 by a dense block containing 64 3×3 kernels and 64 1×1 kernels while the residual is computed, followed by a 2×2 pooling operation, yielding feature map F2.

Third, features are extracted from F2 by a dense block containing 64 1×1 kernels and 64 3×3 kernels while the residual is computed, followed by a 2×2 pooling operation, yielding feature map F3.

Fourth, features are extracted from F3 by a dense block containing 64 1×1 kernels and 64 3×3 kernels, followed by a 1×1 convolution while the residual is computed, and finally a 2×2 pooling operation, yielding feature map F4.

Fifth, features are extracted from F4 by a dense block containing 256 1×1 kernels and 256 3×3 kernels, followed by a 1×1 convolution while the residual is computed, and finally a 2×2 pooling operation, yielding feature map F5.

Sixth, features are extracted from F5 by a dense block containing 1024 1×1 kernels, 1024 3×3 kernels, and 1024 1×1 kernels, followed by a 1×1 convolution while the residual is computed, yielding feature map F6.
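A compact sketch of this F1 to F6 pipeline follows. For brevity the dense connections and residual additions are omitted and each block is written as a plain stack of the listed convolutions; the single-channel infrared input, the batch normalization, and the padding choices are assumptions rather than details stated in the patent.

```python
import torch.nn as nn

def conv(c_in, c_out, k):
    """Convolution + batch norm + ReLU, padded so the spatial size is preserved."""
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, padding=k // 2),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

backbone = nn.Sequential(
    conv(1, 32, 3), nn.MaxPool2d(2),                                          # -> F1
    conv(32, 64, 3), conv(64, 64, 1), nn.MaxPool2d(2),                        # -> F2
    conv(64, 64, 1), conv(64, 64, 3), nn.MaxPool2d(2),                        # -> F3
    conv(64, 64, 1), conv(64, 64, 3), conv(64, 64, 1), nn.MaxPool2d(2),       # -> F4
    conv(64, 256, 1), conv(256, 256, 3), conv(256, 256, 1), nn.MaxPool2d(2),  # -> F5
    conv(256, 1024, 1), conv(1024, 1024, 3), conv(1024, 1024, 1),             # -> F6
)
```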

Step 5. Fuse the extracted features

The feature fusion method comprises the following steps:

First, take the feature maps F4, F5, and F6 obtained in step 4.

Second, apply 2×2 pooling to feature map F4 four times, taking respectively the upper-left, upper-right, lower-left, and lower-right point of each 2×2 neighbourhood, to form the new feature map F4'; combine F4' with feature map F5 into the feature map group F7.

Third, apply 2×2 pooling to feature map F7 four times, taking respectively the upper-left, upper-right, lower-left, and lower-right point of each 2×2 neighbourhood, to form the new feature map F7'; combine F7' with feature map F6 into the feature map group F8.
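A sketch of this "take the four corners of every 2×2 neighbourhood" operation: each window contributes its upper-left, upper-right, lower-left, and lower-right pixel to four half-resolution maps, which are stacked along the channel axis before being combined with the deeper feature map. The tensor shapes and the use of concatenation for the final combination are illustrative assumptions.

```python
import torch

def corner_pool(x):
    """x: (N, C, H, W) with even H, W  ->  (N, 4*C, H/2, W/2)."""
    ul = x[:, :, 0::2, 0::2]   # upper-left of each 2x2 window
    ur = x[:, :, 0::2, 1::2]   # upper-right
    ll = x[:, :, 1::2, 0::2]   # lower-left
    lr = x[:, :, 1::2, 1::2]   # lower-right
    return torch.cat([ul, ur, ll, lr], dim=1)

F4 = torch.randn(1, 64, 30, 40)
F5 = torch.randn(1, 256, 15, 20)
F4_prime = corner_pool(F4)                 # (1, 256, 15, 20)
F7 = torch.cat([F4_prime, F5], dim=1)      # combine F4' with F5 into the group F7
```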

Step 6. Obtain classification results and bounding boxes by regression detection

The classification results and bounding boxes are obtained as follows: for each region, the classification and regression detection network outputs the positions and categories of 7 possible targets. There are A target categories in total, i.e. the network outputs the probability of each of the A categories, where A depends on the configuration of the training set; the position parameters comprise three items of data: the center coordinates, the width, and the height of the target bounding box.

Step 7. Compute the loss function and update the parameters

The loss function is computed from the positions and categories of the targets output in step 6 and the positions and categories of the targets annotated in the training set; this step is performed only during training. The loss function is computed as follows:

$$\mathrm{loss} = \sum_{i=1}^{100}\sum_{j=1}^{7} \mathbb{1}_{ij}^{\mathrm{obj}}\Big[(x_{ij}-\hat{x}_{ij})^{2}+(y_{ij}-\hat{y}_{ij})^{2}+(w_{ij}-\hat{w}_{ij})^{2}+(h_{ij}-\hat{h}_{ij})^{2}+(C_{ij}-\hat{C}_{ij})^{2}\Big]+\sum_{i=1}^{100}\sum_{j=1}^{7} \mathbb{1}_{ij}^{\mathrm{noobj}}(C_{ij}-\hat{C}_{ij})^{2}$$

where 100 is the number of regions; 7 is the number of proposal boxes to be predicted per region and of finally generated bounding boxes; i is the region index and j the box index; loss is the error value; obj indicates that a target is present and noobj that no target is present; x and y are the predicted horizontal and vertical coordinates of the box center; w and h are the predicted width and height of the box; C is the prediction of whether the box contains a target and comprises A values corresponding to the probabilities of the A target categories; $\hat{x}$, $\hat{y}$, $\hat{w}$, $\hat{h}$, and $\hat{C}$ are the corresponding annotated values; and $\mathbb{1}_{ij}^{\mathrm{obj}}$ and $\mathbb{1}_{ij}^{\mathrm{noobj}}$ indicate, respectively, that a target does or does not fall into the j-th box of region i. Then, according to the computed loss, the weights are updated with stochastic gradient descent with momentum.

Steps 3 to 7 are repeated until the error meets the requirement or the number of iterations reaches the set upper limit.

Step 8. Test on the validation set

The images in the validation set are processed with the target detection network trained in step 7 to obtain the detection accuracy, the required time, and the final detection result images.

The network structure of the present invention is further described below with reference to Fig. 2.

1. Network layer settings

The neural network used in the present invention consists of two parts. The first part is the feature extraction network, composed of 5 dense blocks and containing 25 convolutional layers in total. The second part is the feature fusion and regression detection network, containing 8 convolutional layers and a 1-layer fully convolutional network.

2. Dense block settings

The dense blocks used in the feature extraction network are configured as follows:

(1) Dense block 1 contains 2 convolutional layers; the first layer uses 64 kernels of size 1×1 with stride 1, and the second layer uses 64 kernels of size 3×3 with stride 1. Dense block 1 is used once.

(2) Dense block 2 contains 2 convolutional layers; the first layer uses 64 kernels of size 3×3 with stride 1, and the second layer uses 64 kernels of size 1×1 with stride 1. Dense block 2 is used once.

(3) Dense block 3 contains 2 convolutional layers; the first layer uses 64 kernels of size 1×1 with stride 1, and the second layer uses 64 kernels of size 3×3 with stride 1. Dense block 3 is used twice.

(4) Dense block 4 contains 2 convolutional layers; the first layer uses 256 kernels of size 1×1 with stride 1, and the second layer uses 256 kernels of size 3×3 with stride 1. Dense block 4 is used 4 times.

(5) Dense block 5 contains 3 convolutional layers; the first layer uses 1024 kernels of size 1×1 with stride 1, the second layer uses 1024 kernels of size 3×3 with stride 1, and the third layer uses 1024 kernels of size 1×1 with stride 1. Dense block 5 is used twice.

3. Feature fusion settings

The three groups of feature maps used for fusion come from the outputs of the 9th, 18th, and 25th layers of the feature extraction network. The generated feature maps are then combined with shallower feature maps through convolution and upsampling. The results are further processed by 3×3 and 1×1 convolutional layers, and the resulting three groups of new feature maps are fused.
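A small sketch of this fusion head: the deeper map is upsampled to the shallower map's resolution, combined with it, and refined by 3×3 and 1×1 convolutions. The channel counts, spatial sizes, and the use of nearest-neighbour upsampling plus concatenation are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_up(deep, shallow, refine):
    """Upsample the deeper map, combine with the shallower map, then refine."""
    up = F.interpolate(deep, size=shallow.shape[2:], mode="nearest")
    return refine(torch.cat([up, shallow], dim=1))

refine = nn.Sequential(nn.Conv2d(1024 + 256, 256, 3, padding=1), nn.ReLU(inplace=True),
                       nn.Conv2d(256, 256, 1))
deep = torch.randn(1, 1024, 15, 20)      # e.g. layer-25 output
shallow = torch.randn(1, 256, 30, 40)    # e.g. layer-18 output
fused = fuse_up(deep, shallow, refine)   # (1, 256, 30, 40)
```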

The simulation results of the present invention are further described below with reference to Fig. 3.

1. Simulation conditions:

The images to be detected in the simulation are 480×640 in size and contain pedestrians and bicycles.

2. Simulation results and analysis:

Fig. 3 shows the results of the present invention, where Fig. 3(a) is the image to be detected, Fig. 3(b) shows an extracted feature map, and Fig. 3(c) is the detection result.

Feature extraction with the dense network applied to Fig. 3(a) yields a series of feature maps; because the intermediate process produces too many feature maps, only two are shown, namely Fig. 3(b) and Fig. 3(c). Fig. 3(b) is a feature map extracted by a shallower layer of the network: it is larger and contains more detail information but less semantic information. Fig. 3(c) is a feature map extracted by a deeper layer: it is smaller and contains less detail information but more semantic information.

After fusing the feature maps and performing regression detection, the positions of the pedestrians and bicycles are obtained and marked on the original image, giving the final result shown in Fig. 3(c).

The above is only a preferred embodiment of the present invention and is not intended to limit the protection scope of the present invention.

Claims (2)

Translated from Chinese
1. An infrared surface target detection method based on feature fusion and dense connection, characterized in that the method is implemented through the following steps:

Step 1: construct an infrared image data set containing the targets to be recognized, annotate the positions and categories of those targets in the data set, and obtain the known label images;

Step 2: divide the infrared image data set into a training set and a validation set;

Step 3: preprocess the images in the training set with image enhancement;

Step 4: perform feature extraction and feature fusion on the preprocessed images, and obtain classification results and bounding boxes through a regression network; compute a loss function between the classification results and bounding boxes and the known label images, backpropagate the prediction error through the convolutional neural network using stochastic gradient descent with momentum, and update the parameter values of the convolutional neural network;

Step 5: repeat steps 3 and 4 to iteratively update the parameters of the convolutional neural network until the error is sufficiently small or the number of iterations reaches a set upper limit;

Step 6: process the images in the validation set with the trained convolutional neural network parameters to obtain the detection accuracy, the required time, and the final detection result images;

wherein the feature extraction and feature fusion of the preprocessed images in step 4 and the acquisition of the classification results and bounding boxes through the regression network are specifically implemented through the following steps:

Step 401: randomly select a fixed number of images from the training set and divide each image into a 10×10 grid of regions;

Step 402: input the divided images from step 401 into a densely connected network for feature extraction;

Step 403: perform feature fusion on the extracted feature maps to obtain fused feature maps;

Step 404: generate a fixed number of proposal boxes for each region of the fused feature maps;

Step 405: send the fused feature maps and proposal boxes to the regression network for classification and bounding-box regression, and remove redundancy with non-maximum suppression to obtain the classification results and bounding boxes;

the densely connected network in step 402 is computed according to the formula

$$d_l = H_l([d_0, d_1, \ldots, d_{l-1}])$$

where $d_l$ denotes the output of the $l$-th convolutional layer of the densely connected network; if the densely connected network contains $B$ convolutional layers in total, then $l$ takes values between 0 and $B$; $H_l(\cdot)$ is the combined operation of regularization, convolution, and rectified-linear activation; $d_0$ is the input image; and $d_{l-1}$ is the output of layer $l-1$;

the feature fusion of the extracted feature maps in step 403 directly fuses feature maps of different scales by means of a pooling operation;

the feature fusion of the extracted feature maps in step 403 is specifically implemented through the following steps:

Step 4031: convert the first group of feature maps F1 into new, smaller feature maps through a pooling operation, then fuse them with the second group of feature maps F2 to obtain new feature maps F2';

Step 4032: apply a pooling operation to the new feature maps F2', then fuse them with the third group of feature maps F3 to obtain new feature maps F3';

Step 4033: replace the second group of feature maps F2 and the third group of feature maps F3 with the new feature maps F2' and F3' as the input to the regression network;

step 405, in which the fused feature maps and proposal boxes are sent to the regression network for classification and bounding-box regression and redundancy is removed by non-maximum suppression to obtain the classification results and bounding boxes, is specifically implemented through the following steps:

Step 4051: divide the feature map into 10×10 regions and input them into the regression detection network;

Step 4052: for each region, the regression detection network outputs the positions and categories of 7 possible targets; there are A target categories in total, i.e. the network outputs the probability of each of the A categories, where A depends on the configuration of the training set; the position parameters comprise three items of data: the center coordinates, the width, and the height of the target bounding box;

Step 4053: in the non-maximum suppression, for bounding boxes of the same category the intersection-over-union is computed with

$$S = \frac{M \cap N}{M \cup N}$$

where S is the computed intersection-over-union, M and N denote two bounding boxes of the same target category, M∩N denotes their intersection, and M∪N denotes their union; for two bounding boxes with S greater than 0.75, the one with the smaller classification score is discarded;

the computation in step 4 of the loss function between the classification results and bounding boxes and the known label images, the backpropagation of the prediction error through the convolutional neural network using stochastic gradient descent with momentum, and the update of the parameter values of the convolutional neural network are specifically implemented through the following steps:

Step 401: compute the loss function from the classification results, the positions and categories of the targets in the bounding boxes, and the positions and categories of the targets annotated in the training set, the loss function being computed as

$$\mathrm{loss} = \sum_{i=1}^{100}\sum_{j=1}^{7} \mathbb{1}_{ij}^{\mathrm{obj}}\Big[(x_{ij}-\hat{x}_{ij})^{2}+(y_{ij}-\hat{y}_{ij})^{2}+(w_{ij}-\hat{w}_{ij})^{2}+(h_{ij}-\hat{h}_{ij})^{2}+(C_{ij}-\hat{C}_{ij})^{2}\Big]+\sum_{i=1}^{100}\sum_{j=1}^{7} \mathbb{1}_{ij}^{\mathrm{noobj}}(C_{ij}-\hat{C}_{ij})^{2}$$

where 100 is the number of regions; 7 is the number of proposal boxes to be predicted per region and of finally generated bounding boxes; i is the region index and j the box index; loss is the error value; obj indicates that a target is present and noobj that no target is present; x and y are the predicted horizontal and vertical coordinates of the box center; w and h are the predicted width and height of the box; C is the prediction of whether the box contains a target and comprises A values corresponding to the probabilities of the A target categories; $\hat{x}$, $\hat{y}$, $\hat{w}$, $\hat{h}$, and $\hat{C}$ are the corresponding annotated values; and $\mathbb{1}_{ij}^{\mathrm{obj}}$ and $\mathbb{1}_{ij}^{\mathrm{noobj}}$ indicate, respectively, that a target does or does not fall into the j-th box of region i;

Step 402: update the weights with stochastic gradient descent with momentum according to the computed loss.

2. The infrared surface target detection method based on feature fusion and dense connection according to claim 1, characterized in that the preprocessing of step 3 expands the training set through random rotation, mirroring, flipping, scaling, translation, scale transformation, contrast transformation, noise perturbation, and color changes.
CN201811386234.9A (filed 2018-11-20): Infrared surface target detection method based on feature fusion and dense connection; status: Active; granted as CN109583456B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201811386234.9ACN109583456B (en)2018-11-202018-11-20 Infrared surface target detection method based on feature fusion and dense connection

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201811386234.9ACN109583456B (en)2018-11-202018-11-20 Infrared surface target detection method based on feature fusion and dense connection

Publications (2)

Publication Number | Publication Date
CN109583456A (en) | 2019-04-05
CN109583456B (en) | 2023-04-28

Family

ID=65923459

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201811386234.9A (Active, granted as CN109583456B (en)) | Infrared surface target detection method based on feature fusion and dense connection | 2018-11-20 | 2018-11-20

Country Status (1)

Country | Link
CN | CN109583456B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110197152B (en)* | 2019-05-28 | 2022-08-26 | 南京邮电大学 | Road target identification method for automatic driving system
CN110532914A (en)* | 2019-08-20 | 2019-12-03 | 西安电子科技大学 | Building analyte detection method based on fine-feature study
CN111461145B (en)* | 2020-03-31 | 2023-04-18 | 中国科学院计算技术研究所 | Method for detecting target based on convolutional neural network
US11475249B2 (en)* | 2020-04-30 | 2022-10-18 | Electronic Arts Inc. | Extending knowledge data in machine vision
CN112767354B (en)* | 2021-01-19 | 2025-02-28 | 南京汇川图像视觉技术有限公司 | Defect detection method, device, equipment and storage medium based on image segmentation
CN113869165B (en)* | 2021-09-18 | 2024-11-01 | 山东师范大学 | Traffic scene target detection method and system
CN114255377A (en)* | 2021-12-02 | 2022-03-29 | 青岛图灵科技有限公司 | Differential commodity detection and classification method for intelligent container
CN119206521B (en)* | 2024-12-02 | 2025-01-28 | 四川农业大学 | Tea plant diseases and insect pests detection method based on deep learning and computer device


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN107609525A (en)* | 2017-09-19 | 2018-01-19 | 吉林大学 | Remote Sensing Target detection method based on Pruning strategy structure convolutional neural networks
CN107818302A (en)* | 2017-10-20 | 2018-03-20 | 中国科学院光电技术研究所 | Non-rigid multi-scale object detection method based on convolutional neural network
CN107808143A (en)* | 2017-11-10 | 2018-03-16 | 西安电子科技大学 | Dynamic gesture identification method based on computer vision
CN108182456A (en)* | 2018-01-23 | 2018-06-19 | 哈工大机器人(合肥)国际创新研究院 | A kind of target detection model and its training method based on deep learning
CN108038519A (en)* | 2018-01-30 | 2018-05-15 | 浙江大学 | A kind of uterine neck image processing method and device based on dense feature pyramid network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Densely Connected Convolutional Networks; Gao Huang et al.; 2017 IEEE Conference on Computer Vision and Pattern Recognition; 2017-11-09; full text *
Infrared dim small target tracking under complex sky background (复杂天空背景下的红外弱小目标跟踪); Zhao Dong (赵东) et al.; High Power Laser and Particle Beams (强激光与粒子束); 2018-06-30; Vol. 30, No. 6; full text *

Also Published As

Publication number | Publication date
CN109583456A (en) | 2019-04-05

Similar Documents

Publication | Publication Date | Title
CN109583456B (en) Infrared surface target detection method based on feature fusion and dense connection
WO2020102988A1 (en)Feature fusion and dense connection based infrared plane target detection method
CN110084292B (en)Target detection method based on DenseNet and multi-scale feature fusion
CN109886066B (en)Rapid target detection method based on multi-scale and multi-layer feature fusion
CN106650789B (en)Image description generation method based on depth LSTM network
CN107145908B (en) A small target detection method based on R-FCN
CN112488025B (en)Double-temporal remote sensing image semantic change detection method based on multi-modal feature fusion
CN106228162B (en)A kind of quick object identification method of mobile robot based on deep learning
CN116206185A (en)Lightweight small target detection method based on improved YOLOv7
CN110929607A (en)Remote sensing identification method and system for urban building construction progress
CN108647585A (en)A kind of traffic mark symbol detection method based on multiple dimensioned cycle attention network
CN110659601B (en) Dense vehicle detection method for remote sensing images based on deep fully convolutional network based on central points
CN111160249A (en) Multi-class target detection method in optical remote sensing images based on cross-scale feature fusion
CN111178121B (en)Pest image positioning and identifying method based on spatial feature and depth feature enhancement technology
CN106980858A (en)The language text detection of a kind of language text detection with alignment system and the application system and localization method
CN108830285A (en)A kind of object detection method of the reinforcement study based on Faster-RCNN
CN106778835A (en)The airport target by using remote sensing image recognition methods of fusion scene information and depth characteristic
CN114519819B (en)Remote sensing image target detection method based on global context awareness
CN116797787B (en)Remote sensing image semantic segmentation method based on cross-modal fusion and graph neural network
CN106022273A (en)Handwritten form identification system of BP neural network based on dynamic sample selection strategy
CN106228125A (en)Method for detecting lane lines based on integrated study cascade classifier
CN107273870A (en)The pedestrian position detection method of integrating context information under a kind of monitoring scene
CN110909623B (en)Three-dimensional target detection method and three-dimensional target detector
CN116524189A (en)High-resolution remote sensing image semantic segmentation method based on coding and decoding indexing edge characterization
CN109800764A (en)A kind of airport X-ray contraband image detecting method based on attention mechanism

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
