TECHNICAL FIELD
The invention belongs to the technical field of dense object detection, and relates to a rebar detection method based on object detection.
BACKGROUND
In recent years, with the continuous development of artificial intelligence, combining technologies such as object detection and neural networks with traditional industries can not only improve efficiency and reduce costs, but also increase production capacity across industries. At present, rebar is usually counted by hand, with each bar marked and counted in turn. For a large, densely packed bundle of rebar, this manual counting is laborious, time-consuming, and error-prone. Using object detection to accurately detect bundled, densely packed rebar is therefore of significant research value. Compared with manual counting, object detection offers not only higher accuracy but also promotes the automation and efficiency of industrial production, moving it toward intelligent manufacturing.
Prior-art rebar detection methods mainly suffer from the following two problems:
1. In real rebar detection scenes, each bar occupies relatively few pixels in the image, making the relevant features difficult to extract; in addition, the feature information of such small targets is lost as it passes through the network, resulting in high missed-detection and false-detection rates.
2. When multiple bundles of rebar are stacked together, their boundaries become difficult to distinguish. If all the bars are detected in a single pass, bars at the edges of the image are missed because they are occluded by neighboring bars.
In view of these defects, it is necessary to propose a technical solution that addresses the problems of the prior art.
SUMMARY OF THE INVENTION
To solve the above problems, the present invention provides a rebar detection method based on object detection, comprising the following steps:
S1, preprocessing the rebar image data and pre-training the model;
S2, inputting the image into the feature extraction network and outputting the feature map F″;
S3, performing feature fusion on the feature map F″ obtained in S2;
S4, classifying and regressing the extracted features; during regression, redundant prediction boxes are suppressed by non-maximum suppression, and the single closest box is retained as the final output;
S5, counting the prediction boxes finally output by S4 and taking that count as the number of rebars, thereby realizing rebar counting.
Preferably, S1, preprocessing the rebar image data and pre-training the model, specifically comprises the following steps:
S11, enriching the rebar dataset through data augmentation;
S12, pre-training the yolo-v4 network on the public PASCAL VOC2007 dataset, and loading the weights of the DarkNet53 and PANet parts of yolo-v4;
S13, modifying the anchor sizes of the pre-trained model by clustering the rebar dataset with the K-means algorithm to obtain anchor sizes suited to the rebar dataset.
Preferably, S11, enriching the rebar dataset through data augmentation, specifically comprises the following steps:
S111, keeping the image size unchanged, randomly adjusting contrast and brightness, and applying noise and blur to strengthen the network's robustness to image quality;
S112, changing the image size, i.e., randomly rotating and flipping images by arbitrary angles, and stretching and compressing them to enlarge the number of images of different sizes.
Preferably, S13, modifying the anchor sizes of the pre-trained model, specifically comprises the following steps:
S131, randomly selecting K cluster centers and assigning each annotation box to the class of its nearest cluster center, so that all annotation boxes are classified;
S132, computing the average width and height of the boxes in each class and updating the cluster centers;
S133, repeating S131 and S132 until the cluster centers no longer change, yielding the anchor sizes.
Preferably, in S2 the image is input into the feature extraction network, which outputs the feature map F″. Specifically, the feature extraction network improves on the CSPDarknet53 network by replacing the CBM module at the tail of each CSPx module with a CBAM module. A CBM module consists of a convolution layer (Conv), batch normalization (BN), and the Mish activation function; a CSPx module comprises a convolution module, multiple residual network structures, a concatenation module (Concat), and the CBAM module.
Preferably, S2, inputting the image into the feature extraction network and outputting the feature map F″, specifically comprises the following steps:
S21, inputting the feature map F into the channel attention module, outputting the channel attention map Mc(F), and multiplying it element-wise with the original feature map F to obtain F′;
S22, inputting the feature map F′ into the spatial attention module to obtain the final feature map F″.
Preferably, S21 specifically comprises the following steps:
S211, applying max pooling and average pooling to the input feature map F to obtain F_max^c and F_avg^c; max pooling derives the output feature map F_max^c from the maximum of each local region of F, and average pooling derives the output feature map F_avg^c from the average of each local region of F;
S212, passing the feature maps F_max^c and F_avg^c through a shared multi-layer perceptron, adding the two resulting feature maps pixel by pixel, and applying the Sigmoid activation to generate the channel attention map Mc(F):
Mc(F) = σ(W1(W0(F_avg^c)) + W1(W0(F_max^c)))
where σ is the Sigmoid function, and W0 and W1 are the shared weights of the multi-layer perceptron;
S213, broadcasting the channel attention map Mc(F) and multiplying it element-wise with the original feature map F to obtain F′ = Mc(F) ⊗ F.
Preferably, S22 specifically comprises the following steps:
S221, applying max pooling and average pooling to the F′ obtained in S21 to obtain F′_max^s and F′_avg^s, and concatenating the two maps to obtain the two-channel feature map Fcat;
S222, reducing the number of channels of the feature map Fcat to 1 through a 7×7 convolution layer, and generating the output of the spatial attention module, Ms(F′), via the Sigmoid function:
Ms(F′) = σ(f7×7(Fcat))
where f denotes the convolution operation and 7×7 is the kernel size;
S223, broadcasting the attention map Ms(F′) and multiplying it element-wise with the original feature map F′ to obtain F″ = Ms(F′) ⊗ F′.
Preferably, in S3, performing feature fusion on the feature map F″ obtained in S2, the feature fusion network introduces an adaptive spatial feature fusion module into the PANet network of yolo-v4, specifically comprising the following steps:
S31, obtaining three feature maps X1, X2, X3 of different scales through spatial pyramid pooling and the feature pyramid network;
S32, fusing the three output feature maps by addition: the weight parameters of each layer are computed, the three feature maps are multiplied by the computed weight parameters, and the sum is output as the result, according to:
yi = αi·X1→i + βi·X2→i + γi·X3→i, i = 1, 2, 3
where X1→i, X2→i, X3→i are the feature vectors obtained by resizing the three different layers to the size of the i-th layer's feature map, and αi, βi, γi are the weight parameters for fusing the three layers into the i-th layer; they are learned adaptively by the network, each lies between 0 and 1, and αi + βi + γi = 1.
Preferably, S4 specifically uses the Soft-DIoU-NMS algorithm, whose strategy is not to suppress the boxes that exceed the threshold outright, but to first lower their confidence so that they are temporarily retained. The Soft-DIoU-NMS formula is:
S = S, if IoU(X,Y) − ρ²(a,b)/l² < n
S = S·(1 − IoU(X,Y) + ρ²(a,b)/l²), if IoU(X,Y) − ρ²(a,b)/l² ≥ n
where S is the confidence score, X is the box with the highest confidence among all candidate boxes, Y is any remaining candidate box, IoU(X,Y) is the intersection-over-union of the two boxes, n is the set threshold (0.5 by default), ρ²(·,·) is the squared distance between two points, a and b are the center points of candidate boxes X and Y respectively, and l is the diagonal length of the smallest box enclosing X and Y.
The beneficial effects of the present invention include at least the following:
1. A CBAM module is added to the CSPx module of the feature extraction network to improve the model's detection of rebar. The feature extraction is optimized along both the channel and spatial directions: the weights of relevant features are increased and those of irrelevant features are decreased, so the network attends more to regions containing important information while suppressing irrelevant regions. This improves the overall accuracy of rebar detection and effectively alleviates the low accuracy caused by the dense spacing and small size of the bars;
2. The ASFF module introduced into the feature fusion network preserves the expressive power of low-level features as they propagate and reduces interference from irrelevant features. It avoids the missing or insufficiently extracted features that occur when closely stacked bars are occluded by their neighbors, lowering the missed-detection rate;
3. When detecting small targets, adjacent detection boxes are filtered out by non-maximum suppression because their overlap is too large; based on the actual conditions of rebar detection, the gentler Soft-DIoU-NMS algorithm is therefore adopted in the prediction stage. The improved prediction model raises the accuracy of rebar detection.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a flow chart of the steps of the rebar detection method based on object detection of the present invention;
Fig. 2 is the overall network structure of the rebar detection method based on object detection of the present invention;
Fig. 3 is the feature extraction network structure of the rebar detection method based on object detection of the present invention;
Fig. 4 is the structure of the attention mechanism module of the rebar detection method based on object detection of the present invention;
Fig. 5 is the structure of the channel attention module of the rebar detection method based on object detection of the present invention;
Fig. 6 is the structure of the spatial attention module of the rebar detection method based on object detection of the present invention;
Fig. 7 is the feature fusion network structure of the rebar detection method based on object detection of the present invention.
DETAILED DESCRIPTION
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only intended to explain the present invention, not to limit it.
On the contrary, the invention covers any alternatives, modifications, equivalent methods, and schemes within the spirit and scope of the invention as defined by the claims. Further, to give the public a better understanding of the present invention, some specific details are described at length below; those skilled in the art can fully understand the present invention without these details.
Referring to Fig. 1, a schematic diagram of the rebar detection method based on object detection according to an embodiment of the present invention, the method comprises the following steps:
S1, preprocessing the rebar image data and pre-training the model;
S2, inputting the image into the feature extraction network and outputting the feature map F″;
S3, performing feature fusion on the feature map F″ obtained in S2;
S4, classifying and regressing the extracted features; during regression, redundant prediction boxes are suppressed by non-maximum suppression, and the single closest box is retained as the final output;
S5, counting the prediction boxes finally output by S4 and taking that count as the number of rebars, thereby realizing rebar counting.
Fig. 2 shows the network structure corresponding to the method of the present invention: the input image (Input) passes through preprocessing, the feature extraction network, and the feature fusion network, and the prediction boxes are output.
For S1, preprocessing the rebar image data and pre-training the model specifically comprises the following steps:
S11, enriching the rebar dataset through data augmentation;
S12, pre-training the yolo-v4 network on the public PASCAL VOC2007 dataset, and loading the weights of the DarkNet53 and PANet parts of yolo-v4;
S13, modifying the anchor sizes of the pre-trained model by clustering the rebar dataset with the K-means algorithm to obtain anchor sizes suited to the rebar dataset.
Here, S11, enriching the rebar dataset through data augmentation, specifically comprises the following steps (a minimal augmentation sketch follows the two sub-steps):
S111, keeping the image size unchanged, randomly adjusting contrast and brightness, and applying noise and blur to strengthen the network's robustness to image quality;
S112, changing the image size, i.e., randomly rotating and flipping images by arbitrary angles, and stretching and compressing them to enlarge the number of images of different sizes.
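The augmentation of S111 and S112 can be illustrated with the following minimal sketch. It is an illustrative assumption rather than code from the invention: torchvision transforms are used, the 608-pixel input size commonly paired with yolo-v4 is assumed, and in a real detection pipeline the geometric transforms would also have to be applied to the annotation boxes, which is omitted here.

import torch
from torchvision import transforms

augment = transforms.Compose([
    # S111: keep the image size, randomly jitter contrast and brightness, blur.
    transforms.ColorJitter(brightness=0.4, contrast=0.4),
    transforms.GaussianBlur(kernel_size=5),
    # S112: arbitrary-angle rotation, flips, stretch/compress to vary scale.
    transforms.RandomRotation(degrees=180),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomResizedCrop(size=608, scale=(0.5, 1.0)),
    transforms.ToTensor(),
    # S111: additive Gaussian noise on the tensor image (magnitude assumed).
    transforms.Lambda(lambda t: (t + 0.02 * torch.randn_like(t)).clamp(0.0, 1.0)),
])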
S13, modifying the anchor sizes of the pre-trained model, specifically comprises the following steps (a clustering sketch follows S133):
S131, randomly selecting K cluster centers and assigning each annotation box to the class of its nearest cluster center, so that all annotation boxes are classified;
S132, computing the average width and height of the boxes in each class and updating the cluster centers;
S133, repeating S131 and S132 until the cluster centers are almost unchanged, yielding the anchor sizes.
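A hedged NumPy sketch of S131-S133 follows. It reads the text literally, assigning boxes to the nearest center by Euclidean distance over (width, height) and updating centers with the class mean; note that common YOLO implementations use a 1 − IoU distance instead, which is not what is described here.

import numpy as np

def kmeans_anchors(boxes_wh: np.ndarray, k: int = 9, seed: int = 0) -> np.ndarray:
    # boxes_wh: (N, 2) array of annotation-box (width, height) pairs.
    rng = np.random.default_rng(seed)
    centers = boxes_wh[rng.choice(len(boxes_wh), size=k, replace=False)]  # S131: random init
    while True:
        # S131: assign every box to the class of its nearest cluster center.
        dists = np.linalg.norm(boxes_wh[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # S132: new center = mean width and height of the boxes in each class
        # (an empty class keeps its previous center).
        new_centers = np.array([
            boxes_wh[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
        # S133: stop once the centers are almost unchanged.
        if np.allclose(new_centers, centers, atol=1e-3):
            return new_centers
        centers = new_centers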
For S2, the image is input into the feature extraction network shown in Fig. 3, which outputs the feature map F″. Specifically, the feature extraction network improves on the CSPDarknet53 network: the CBM (convolution) module at the tail of each CSPx (cross-stage partial network) module is replaced with a CBAM (Convolutional Block Attention Module). A CBM module consists of a convolution layer (Conv), batch normalization (BN), and the Mish activation function; a CSPx module comprises a convolution module, multiple residual network structures, a concatenation module (Concat), and the CBAM module. The input first passes through a convolution module; one branch then passes through multiple residual network structures while the other undergoes conventional convolution. Finally, the two branches are concatenated (Concat) and passed through the CBAM module, as sketched below.
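A structural sketch of the CBM module and of a CSPx block with the tail CBAM, under stated assumptions: the channel widths, kernel sizes, and residual-unit layout are illustrative guesses in the spirit of CSPDarknet53, not dimensions given by the invention, and the CBAM class used here is sketched after the attention-module description below.

import torch
import torch.nn as nn

class CBM(nn.Module):
    # Conv + BatchNorm + Mish, as described for the CBM module.
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.Mish()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class ResUnit(nn.Module):
    # One residual unit: 1x1 then 3x3 CBM with a skip connection.
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(CBM(c, c, k=1), CBM(c, c, k=3))

    def forward(self, x):
        return x + self.body(x)

class CSPxWithCBAM(nn.Module):
    # CSPx block: one branch through several residual units, the other through
    # a plain convolution; the branches are concatenated (Concat) and the tail
    # CBM is replaced by CBAM (sketched further below).
    def __init__(self, c, num_res):
        super().__init__()
        self.stem = CBM(c, c)
        self.res_in = CBM(c, c // 2, k=1)
        self.res_branch = nn.Sequential(*[ResUnit(c // 2) for _ in range(num_res)])
        self.short_branch = CBM(c, c // 2, k=1)
        self.cbam = CBAM(c)  # tail attention module

    def forward(self, x):
        x = self.stem(x)
        a = self.res_branch(self.res_in(x))
        b = self.short_branch(x)
        return self.cbam(torch.cat([a, b], dim=1))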
The structure of CBAM is shown in Fig. 4. When the feature map F (Input Feature in Fig. 4) enters the CBAM module, it first passes through the channel attention module, which highlights rebar-related features according to the importance of each channel and weakens irrelevant ones; the output map Mc(F) is multiplied with the original feature map F to obtain F′, which serves as the input of the spatial attention module. The rebar features are then further enhanced spatially, and the output map Ms(F′) is multiplied with F′ to obtain the final feature map F″ (Refined Feature in Fig. 4). The formulas are:
F′ = Mc(F) ⊗ F
F″ = Ms(F′) ⊗ F′
where Mc(F) is the output of the channel attention module 21, Ms(F′) is the feature map output by the spatial attention module 22, and F″ is the feature map output by the CBAM module.
S2, inputting the image into the feature extraction network and outputting the feature map F″, specifically comprises the following steps:
S21, inputting the feature map F (Input Feature in Fig. 5) into the channel attention module 21 shown in Fig. 5, outputting the channel attention map Mc(F) (Channel Attention in Fig. 5), and multiplying it element-wise with the original feature map F to obtain F′;
S22, inputting the feature map F′ into the spatial attention module to obtain the final feature map F″.
S21 specifically comprises the following steps (a sketch of the channel attention module follows S213):
S211, applying max pooling (MaxPool) and average pooling (AvgPool) to the input feature map F to obtain F_max^c and F_avg^c; max pooling derives the output feature map F_max^c from the maximum of each local region of F, and average pooling derives the output feature map F_avg^c from the average of each local region of F;
S212, passing the feature maps F_max^c and F_avg^c through a multi-layer perceptron (MLP) containing one hidden layer, adding the two resulting feature maps pixel by pixel, and applying the Sigmoid activation to generate the channel attention map Mc(F):
Mc(F) = σ(W1(W0(F_avg^c)) + W1(W0(F_max^c)))
where σ is the Sigmoid function, and W0 and W1 are the shared weights of the multi-layer perceptron;
S213, broadcasting the channel attention map Mc(F) and multiplying it element-wise with the original feature map F to obtain F′ = Mc(F) ⊗ F.
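A sketch of the channel attention module (S211-S213), with assumptions noted: 1×1 convolutions implement the shared W0/W1 MLP, and the reduction ratio r = 16 is the usual CBAM default rather than a value given by the invention.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, r=16):
        super().__init__()
        self.mlp = nn.Sequential(                                            # shared W0, W1
            nn.Conv2d(channels, channels // r, kernel_size=1, bias=False),   # W0
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, kernel_size=1, bias=False),   # W1
        )

    def forward(self, f):
        f_max = torch.amax(f, dim=(2, 3), keepdim=True)        # S211: max pooling
        f_avg = torch.mean(f, dim=(2, 3), keepdim=True)        # S211: average pooling
        mc = torch.sigmoid(self.mlp(f_avg) + self.mlp(f_max))  # S212: Mc(F)
        return mc * f                                          # S213: broadcast, F' = Mc(F) * F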
S22 specifically comprises the following steps (a sketch of the spatial attention module and the combined CBAM follows S223):
S221, inputting the F′ obtained in S21 (Input Feature in Fig. 6) into the spatial attention module 22 shown in Fig. 6, applying max pooling and average pooling to obtain F′_max^s and F′_avg^s, and concatenating the two maps to obtain the two-channel feature map Fcat;
S222, reducing the number of channels of the feature map Fcat to 1 through a 7×7 convolution layer, and generating the output of the spatial attention module, Ms(F′) (Spatial Attention in Fig. 6), via the Sigmoid function:
Ms(F′) = σ(f7×7(Fcat))
where f denotes the convolution operation and 7×7 is the kernel size;
S223, broadcasting the attention map Ms(F′) and multiplying it element-wise with the original feature map F′ to obtain F″ = Ms(F′) ⊗ F′.
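A sketch of the spatial attention module (S221-S223) and of the CBAM wrapper chaining the two attention modules; it assumes the ChannelAttention class from the previous sketch and padding 3 so the 7×7 convolution preserves the spatial size.

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)

    def forward(self, f1):
        f_max = torch.amax(f1, dim=1, keepdim=True)   # S221: max pool over channels
        f_avg = torch.mean(f1, dim=1, keepdim=True)   # S221: average pool over channels
        f_cat = torch.cat([f_max, f_avg], dim=1)      # S221: two-channel Fcat
        ms = torch.sigmoid(self.conv(f_cat))          # S222: Ms(F') = sigma(f7x7(Fcat))
        return ms * f1                                # S223: broadcast, F'' = Ms(F') * F'

class CBAM(nn.Module):
    # F' = Mc(F) * F, then F'' = Ms(F') * F'.
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, f):
        return self.sa(self.ca(f))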
For S3, the feature map F″ obtained in S2 undergoes feature fusion. The feature fusion network is shown in Fig. 7; an adaptive spatial feature fusion module is introduced into the PANet network of yolo-v4, specifically comprising the following steps:
S31, obtaining three feature maps X1, X2, X3 of different scales through spatial pyramid pooling (SPP) and the feature pyramid network (FPN);
S32, fusing the three output feature maps (ASFF, Adaptively Spatial Feature Fusion) by addition: the weight parameters of each layer are computed, the three feature maps are multiplied by the computed weight parameters, and the sum is output as the result, according to:
yi = αi·X1→i + βi·X2→i + γi·X3→i, i = 1, 2, 3
where X1→i, X2→i, X3→i are the feature vectors obtained by resizing the three different layers to the size of the i-th layer's feature map, and αi, βi, γi are the weight parameters for fusing the three layers into the i-th layer; they are learned adaptively by the network, each lies between 0 and 1, and αi + βi + γi = 1. The learning of αi can be expressed by the following steps (a fusion sketch follows S322); βi and γi are learned in the same way as αi.
S321, applying a 1×1 convolution to X1→i, X2→i, X3→i to obtain the control maps λα^i, λβ^i, λγ^i;
S322, obtaining the value of αi through the softmax algorithm:
αi = e^(λα^i) / (e^(λα^i) + e^(λβ^i) + e^(λγ^i))
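A sketch of the fusion for one output level i (S32 with S321-S322), assuming the three inputs have already been resized to level i's resolution and channel count; the per-branch 1×1 convolutions and the channel-wise softmax mirror the steps above.

import torch
import torch.nn as nn

class ASFF(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.w1 = nn.Conv2d(channels, 1, kernel_size=1)  # S321: lambda_alpha
        self.w2 = nn.Conv2d(channels, 1, kernel_size=1)  # S321: lambda_beta
        self.w3 = nn.Conv2d(channels, 1, kernel_size=1)  # S321: lambda_gamma

    def forward(self, x1, x2, x3):
        # x1, x2, x3: X_{1->i}, X_{2->i}, X_{3->i}, already resized to level i.
        lam = torch.cat([self.w1(x1), self.w2(x2), self.w3(x3)], dim=1)
        w = torch.softmax(lam, dim=1)            # S322: alpha_i + beta_i + gamma_i = 1
        a, b, g = w[:, 0:1], w[:, 1:2], w[:, 2:3]
        return a * x1 + b * x2 + g * x3          # y_i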
S4 specifically uses the Soft-DIoU-NMS algorithm, whose strategy is not to suppress the boxes that exceed the threshold outright, but to first lower their confidence so that they are temporarily retained. The Soft-DIoU-NMS formula is:
S = S, if IoU(X,Y) − ρ²(a,b)/l² < n
S = S·(1 − IoU(X,Y) + ρ²(a,b)/l²), if IoU(X,Y) − ρ²(a,b)/l² ≥ n
where S is the confidence score, X is the box with the highest confidence among all candidate boxes, Y is any remaining candidate box, IoU(X,Y) is the intersection-over-union of the two boxes, n is the set threshold (0.5 by default), ρ²(·,·) is the squared distance between two points, a and b are the center points of candidate boxes X and Y respectively, and l is the diagonal length of the smallest box enclosing X and Y.
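A hedged NumPy sketch of this decay rule, with assumptions: boxes are (x1, y1, x2, y2) corner coordinates, and the final cutoff score_min below which a decayed box is discarded is an implementation choice, not a value given by the invention.

import numpy as np

def soft_diou_nms(boxes: np.ndarray, scores: np.ndarray,
                  n: float = 0.5, score_min: float = 0.001) -> list:
    scores = scores.astype(float).copy()
    idx, keep = list(range(len(boxes))), []
    while idx:
        best = max(idx, key=lambda i: scores[i])   # X: highest-confidence box
        idx.remove(best)
        keep.append(best)
        bx = boxes[best]
        for i in list(idx):                        # Y: each remaining candidate
            by = boxes[i]
            # IoU(X, Y)
            iw = max(0.0, min(bx[2], by[2]) - max(bx[0], by[0]))
            ih = max(0.0, min(bx[3], by[3]) - max(bx[1], by[1]))
            inter = iw * ih
            union = ((bx[2] - bx[0]) * (bx[3] - bx[1])
                     + (by[2] - by[0]) * (by[3] - by[1]) - inter)
            iou = inter / (union + 1e-9)
            # rho^2(a, b): squared distance between the two box centers.
            rho2 = (((bx[0] + bx[2]) - (by[0] + by[2])) ** 2
                    + ((bx[1] + bx[3]) - (by[1] + by[3])) ** 2) / 4.0
            # l^2: squared diagonal of the smallest box enclosing X and Y.
            l2 = ((max(bx[2], by[2]) - min(bx[0], by[0])) ** 2
                  + (max(bx[3], by[3]) - min(bx[1], by[1])) ** 2 + 1e-9)
            if iou - rho2 / l2 >= n:
                scores[i] *= 1.0 - iou + rho2 / l2  # decay confidence, don't delete
                if scores[i] < score_min:
                    idx.remove(i)
    return keep  # indices of the retained prediction boxes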
The above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.