CN116071343A - Improved RefineDet pipeline defect detection method - Google Patents

Improved RefineDet pipeline defect detection method

Info

Publication number
CN116071343A
Authority
CN
China
Prior art keywords
module
branch
anchor
loss function
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310158656.5A
Other languages
Chinese (zh)
Other versions
CN116071343B (en)
Inventor
张婷
马聪
陈迎春
刘兆英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202310158656.5A
Publication of CN116071343A
Application granted
Publication of CN116071343B
Status: Active
Anticipated expiration

Abstract

The invention discloses an improved RefineDet pipeline defect detection method. First, the backbone network of RefineDet is replaced with a Swin Transformer. Second, the features output by the multi-scale feature fusion of the Neck module are used for the classification and fine adjustment of anchors in the anchor refinement module, so that the anchors output by the anchor refinement module are more accurate and the target detection module can localize targets better. Finally, the improved RefineDet network is trained with the AdamW optimizer using a warmup and cosine learning-rate adjustment strategy. This RefineDet-based improvement effectively raises the average accuracy of pipeline defect detection.

Description

Translated from Chinese
An improved pipeline defect detection method based on RefineDet

Technical Field

The present invention belongs to the field of computer vision object detection, and in particular relates to a pipeline defect detection method based on an improved RefineDet.

Background Art

Pipelines are widely used in industry to transport liquids and gases. Because of the complexity of their working environments and of the materials they carry, pipelines are prone to corrosion, blockage, and even rupture during service, so they must be inspected regularly to ensure their service life and safety. Traditional manual inspection of pipeline defects is time-consuming and prone to false and missed detections, so intelligent defect detection methods are a better choice. Pipeline defect detection refers to the detection of surface defects on pipelines; surface defect detection generally uses advanced artificial intelligence visual inspection technology to detect defects such as spots, pits, scratches, color differences, and chips on the workpiece surface.

The present invention performs defect detection on pipelines with an object detection model based on an improved RefineDet. The RefineDet model can be regarded as a combination of the SSD, RPN, and FPN algorithms, and its main idea is as follows: two-stage algorithms such as Faster R-CNN regress each box twice, so their accuracy is high but they are slow; one-stage algorithms such as YOLO regress each box only once, so they are fast but less accurate; RefineDet combines the two and regresses each box twice while remaining a one-stage algorithm, which improves accuracy while keeping the speed relatively high. RefineDet uses the SSD framework and introduces the feature fusion operation of FPN to improve the detection of small targets.

Summary of the Invention

The technical problem to be solved by the present invention is to provide an improved RefineDet-based method that overcomes the low average accuracy of target defect detection. The present invention raises the average accuracy on a pipeline defect dataset by modifying the network architecture and training strategy of RefineDet, so that the trained deep convolutional neural network model is more accurate, thereby improving the model's ability to classify and localize pipeline defects. To achieve this, the present invention adopts the following technical solution:

An improved RefineDet pipeline defect detection method, characterized by comprising the following steps:

Step 1: Obtain the pipeline defect dataset Xs (containing Ns samples) and the corresponding labels Ys; divide Xs in a 7:3 ratio into a training set Xtrain (containing Ntrain samples) and a test set Xtest (containing Ntest samples);
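
A minimal sketch of the 7:3 split, assuming the dataset is held as a list of (image, label) pairs; the helper name and the fixed random seed are illustrative, not specified by the patent:

```python
import random

def split_dataset(samples, train_ratio=0.7, seed=0):
    """Shuffle (image, label) pairs and split them into train/test subsets."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)             # reproducible shuffle
    n_train = int(len(samples) * train_ratio)        # Ntrain = 0.7 * Ns (rounded)
    train = [samples[i] for i in indices[:n_train]]  # Xtrain with labels
    test = [samples[i] for i in indices[n_train:]]   # Xtest with labels
    return train, test
```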

Step 2: Build the improved RefineDet model, which consists of four branches: a backbone network, a Neck module, an anchor refinement module, and a target detection module:

Step 2.1: Build the backbone network. The backbone is a Swin Transformer, which extracts features in four hierarchical stages, denoted S1, S2, S3, and S4; the output feature maps of the stages are denoted Os1, Os2, Os3, and Os4 respectively. Each stage consists of two parts: a patch merging layer and Swin Transformer blocks. Each of the four stages contains one patch merging layer; the numbers of Swin Transformer blocks are 2, 2, 6, and 2 respectively, with two Swin Transformer blocks forming a group. The patch merging operation divides a feature map of height × width × channels H×W×C into four (H/2)×(W/2)×C feature maps, concatenates these four maps along the channel dimension into an (H/2)×(W/2)×4C feature map, and finally normalizes and linearly transforms the concatenated map into an (H/2)×(W/2)×2C feature map. In the first stage the patch merging operation is only a linear transformation and the feature map size is unchanged, so the H×W×C inputs to the patch merging layers of the four stages are 80×80×48, 80×80×96, 40×40×192, and 20×20×384 respectively. In a group of Swin Transformer blocks, the first block consists of a normalization layer, a window multi-head attention module, a normalization layer, and a multi-layer perceptron module; the second block consists of a normalization layer, a shifted-window multi-head attention module, a normalization layer, and a multi-layer perceptron module;

Step 2.2: Build the Neck module. The Neck module works like a feature pyramid: it enriches semantic information by adding high-level features to low-level features, improving detection accuracy. It contains four transfer connection blocks, denoted T1, T2, T3, and T4, corresponding to S1, S2, S3, and S4 of the backbone. T4 contains only one branch, which takes Os4 as input and applies three convolution operations that leave the feature map size unchanged. Each remaining transfer connection block Ti (i = 1, 2, 3) contains two branches: the first takes Osi as input and the second takes the output feature map of Ti+1 as input. Because the two branches' inputs differ in size, the branch carrying the output of Ti+1 uses a deconvolution operation to enlarge it to the size of Osi, while the other branch applies two convolution operations that leave the feature map size unchanged; the outputs of the two branches are then summed element-wise. Finally, a convolution layer is added after the summation to keep the detection features discriminative. The outputs of the four transfer connection blocks T1, T2, T3, and T4 are denoted Ot1, Ot2, Ot3, and Ot4 respectively. A sketch of one block follows.
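
A minimal sketch of a transfer connection block Ti (i = 1, 2, 3). The patent fixes only the structure (two convolutions on Osi, a deconvolution on the deeper feature, an element-wise sum, and one convolution after the sum); the 3×3 kernels, stride-2 deconvolution, and 256 output channels are assumptions:

```python
import torch.nn as nn

class TransferConnectionBlock(nn.Module):
    def __init__(self, in_ch, deep_ch, out_ch=256):
        super().__init__()
        self.lateral = nn.Sequential(           # two convs on Osi, size unchanged
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1))
        self.upsample = nn.ConvTranspose2d(     # deconv doubles T(i+1)'s output size
            deep_ch, out_ch, kernel_size=2, stride=2)
        self.post = nn.Sequential(              # conv after the element-wise sum
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, lateral, deeper):
        return self.post(self.lateral(lateral) + self.upsample(deeper))
```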

Step 2.3: Build the anchor refinement module. By adjusting the positions and sizes of anchors, the anchor refinement module provides better initialization for bounding-box regression in the target detection module. It contains four convolution blocks, each composed of two convolution layers; the four blocks operate on the four input feature maps of different scales, Ot1, Ot2, Ot3, and Ot4, predicting fine-tuned anchor positions and the confidence that each anchor is a positive sample;

Step 2.4: Build the target detection module. The target detection module is similar in structure to the anchor refinement module and also contains four convolution blocks, each composed of two convolution layers; the four blocks operate on the four input feature maps of different scales, Ot1, Ot2, Ot3, and Ot4, predicting the offsets of the target boxes and the defect category of each target;
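
One convolution block of the kind shared by both modules can be sketched as below: one convolution layer predicts box offsets and the other predicts class scores. The anchors-per-location count, the channel width, and the class counts (2 for the anchor refinement module's positive/negative decision; 21, i.e. 20 defect categories plus background, for the target detection module) are assumptions for illustration:

```python
import torch.nn as nn

class PredictionBlock(nn.Module):
    """Two conv layers at one scale: box regression + classification."""
    def __init__(self, in_ch=256, num_anchors=3, num_classes=2):
        super().__init__()
        self.loc = nn.Conv2d(in_ch, num_anchors * 4, 3, padding=1)            # dx, dy, dw, dh per anchor
        self.cls = nn.Conv2d(in_ch, num_anchors * num_classes, 3, padding=1)  # class scores per anchor

    def forward(self, x):
        return self.loc(x), self.cls(x)

# One block per fused feature map Ot1..Ot4:
arm_blocks = nn.ModuleList([PredictionBlock(num_classes=2) for _ in range(4)])   # anchor refinement
odm_blocks = nn.ModuleList([PredictionBlock(num_classes=21) for _ in range(4)])  # target detection
```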

Step 3: The loss function comprises the loss function of the anchor refinement module and the loss function of the target detection module, specifically:

Step 3.1: The anchor refinement module classifies each anchor into one of two classes according to whether it contains a target, and computes a binary classification loss and a regression loss for the anchors; the loss functions used are the cross-entropy loss function and the smooth L1 loss function, respectively.

The cross-entropy loss function is:

L_cls = -(1/n) Σ_{i=1}^{n} [ y_i log(y'_i) + (1 - y_i) log(1 - y'_i) ],

where y_i is the label value, y'_i is the predicted value, and n is the number of samples Ntrain in the training set Xtrain;

The formula for the smooth L1 loss function is:

L_loc = Σ_{k∈{x,y,w,h}} smooth_L1(t_k - v_k),

where smooth_L1(x) = 0.5·x² if |x| < 1, and |x| - 0.5 otherwise; v = (v_x, v_y, v_w, v_h) denotes the center-point coordinates (v_x, v_y) of the ground-truth box together with its width v_w and height v_h; t = (t_x, t_y, t_w, t_h) denotes the center-point coordinates (t_x, t_y) of the predicted box together with its width t_w and height t_h; and k ∈ {x, y, w, h}, where x, y, w and h denote the center-point coordinates (x, y) of a box and its width w and height h;
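
A small sketch of both losses, assuming predictions y' are probabilities in (0, 1), box offsets are (n, 4) tensors ordered (x, y, w, h), and the reduction (sum over k, mean over boxes) is a common choice the patent does not spell out; the epsilon clamp is an added numerical safeguard:

```python
import torch

def binary_cross_entropy(y_pred, y_true, eps=1e-7):
    """L = -(1/n) * sum(y*log(y') + (1-y)*log(1-y'))."""
    y_pred = y_pred.clamp(eps, 1 - eps)  # avoid log(0)
    return -(y_true * y_pred.log()
             + (1 - y_true) * (1 - y_pred).log()).mean()

def smooth_l1(t, v):
    """Sum smooth-L1 over k in {x, y, w, h}, then average over boxes."""
    diff = (t - v).abs()
    per_k = torch.where(diff < 1, 0.5 * diff ** 2, diff - 0.5)
    return per_k.sum(dim=1).mean()
```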

Step 3.2: The loss functions of the target detection module are likewise a cross-entropy loss function and a smooth L1 loss function, defined exactly as in Step 3.1; here the cross-entropy loss scores the predicted defect category of each target and the smooth L1 loss scores the predicted offsets of the target boxes;

Step 4: Train the improved RefineDet network with the AdamW optimizer, using a warmup and cosine learning-rate adjustment strategy;
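
A sketch of this training setup under assumed hyperparameters (the base learning rate, warmup length, and weight decay are not stated in the patent):

```python
import math
import torch

def make_optimizer_and_scheduler(model, total_steps, warmup_steps=500, lr=1e-4):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.05)

    def lr_lambda(step):
        if step < warmup_steps:                  # linear warmup from 0 to lr
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay to 0

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```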

Step 5: Test the trained improved RefineDet network on the test set and compute the average accuracy.
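
The patent does not define "average accuracy"; if it is the usual mean average precision over defect categories, per-class AP can be computed as in this sketch (101-point interpolation; the input format is an assumption):

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """AP for one class from detection scores and true-positive flags."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    flags = np.asarray(is_tp, dtype=float)[order]
    tp = np.cumsum(flags)                 # cumulative true positives
    fp = np.cumsum(1.0 - flags)           # cumulative false positives
    recall = tp / max(num_gt, 1)
    precision = tp / np.maximum(tp + fp, 1e-9)
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 101):  # integrate precision over recall
        mask = recall >= r
        ap += precision[mask].max() if mask.any() else 0.0
    return ap / 101
```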

Brief Description of the Drawings

FIG. 1 is a flow chart of the basic method of the present invention;

FIG. 2 shows the overall network architecture design;

FIG. 3 is a structural diagram of the backbone network;

FIG. 4 is a structural diagram of a transfer connection block in the Neck module;

FIG. 5 is a structural diagram of the anchor refinement module;

FIG. 6 is a structural diagram of the target detection module;

Table 1 gives the average accuracy of the RefineDet model on the test set before and after the improvement.

Detailed Description

The present invention provides an improved RefineDet-based method, which is explained and illustrated below with reference to the drawings:

The pipeline defect dataset used in the present invention contains 20 categories and 6010 samples with corresponding labels, of which 4258 samples form the training set and 1752 samples form the test set.

The implementation of the present invention proceeds as follows:

Step 1: Obtain the pipeline defect dataset Xs (6010 samples) and the corresponding labels Ys (6010 labels); divide Xs in a 7:3 ratio into a training set Xtrain (4258 samples with labels) and a test set Xtest (1752 samples with labels);

Step 2: Build the improved RefineDet model, which consists of four branches: a backbone network, a Neck module, an anchor refinement module, and a target detection module:

Step 2.1: Build the backbone network. The backbone is a Swin Transformer, which extracts features in four hierarchical stages, denoted S1, S2, S3, and S4; the output feature maps of the stages are denoted Os1, Os2, Os3, and Os4 respectively. Each stage consists of two parts: a patch merging layer and Swin Transformer blocks. Each of the four stages contains one patch merging layer; the numbers of Swin Transformer blocks are 2, 2, 6, and 2 respectively, with two Swin Transformer blocks forming a group. The patch merging operation divides a feature map of height × width × channels H×W×C into four (H/2)×(W/2)×C feature maps, concatenates these four maps along the channel dimension into an (H/2)×(W/2)×4C feature map, and finally normalizes and linearly transforms the concatenated map into an (H/2)×(W/2)×2C feature map. In the first stage the patch merging operation is only a linear transformation and the feature map size is unchanged, so the H×W×C inputs to the patch merging layers of the four stages are 80×80×48, 80×80×96, 40×40×192, and 20×20×384 respectively. In a group of Swin Transformer blocks, the first block consists of a normalization layer, a window multi-head attention module, a normalization layer, and a multi-layer perceptron module; the second block consists of a normalization layer, a shifted-window multi-head attention module, a normalization layer, and a multi-layer perceptron module;

Step 2.2: Build the Neck module. The Neck module works like a feature pyramid: it enriches semantic information by adding high-level features to low-level features, improving detection accuracy. It contains four transfer connection blocks, denoted T1, T2, T3, and T4, corresponding to S1, S2, S3, and S4 of the backbone. T4 contains only one branch, which takes Os4 as input and applies three convolution operations that leave the feature map size unchanged. Each remaining transfer connection block Ti (i = 1, 2, 3) contains two branches: the first takes Osi as input and the second takes the output feature map of Ti+1 as input. Because the two branches' inputs differ in size, the branch carrying the output of Ti+1 uses a deconvolution operation to enlarge it to the size of Osi, while the other branch applies two convolution operations that leave the feature map size unchanged; the outputs of the two branches are then summed element-wise. Finally, a convolution layer is added after the summation to keep the detection features discriminative. The outputs of the four transfer connection blocks T1, T2, T3, and T4 are denoted Ot1, Ot2, Ot3, and Ot4 respectively;

Step 2.3: Build the anchor refinement module. By adjusting the positions and sizes of anchors, the anchor refinement module provides better initialization for bounding-box regression in the target detection module. It contains four convolution blocks, each composed of two convolution layers; the four blocks operate on the four input feature maps of different scales, Ot1, Ot2, Ot3, and Ot4, predicting fine-tuned anchor positions and the confidence that each anchor is a positive sample;

Step 2.4: Build the target detection module. The target detection module is similar in structure to the anchor refinement module and also contains four convolution blocks, each composed of two convolution layers; the four blocks operate on the four input feature maps of different scales, Ot1, Ot2, Ot3, and Ot4, predicting the offsets of the target boxes and the defect category of each target;

Step 3: The loss function comprises the loss function of the anchor refinement module and the loss function of the target detection module, specifically:

Step 3.1: The anchor refinement module classifies each anchor into one of two classes according to whether it contains a target, and computes a binary classification loss and a regression loss for the anchors; the loss functions used are the cross-entropy loss function and the smooth L1 loss function, respectively.

The cross-entropy loss function is:

L_cls = -(1/n) Σ_{i=1}^{n} [ y_i log(y'_i) + (1 - y_i) log(1 - y'_i) ],

where y_i is the label value, y'_i is the predicted value, and n is the number of samples in the training set Xtrain, 4258 in total;

The formula for the smooth L1 loss function is:

L_loc = Σ_{k∈{x,y,w,h}} smooth_L1(t_k - v_k),

where smooth_L1(x) = 0.5·x² if |x| < 1, and |x| - 0.5 otherwise; v = (v_x, v_y, v_w, v_h) denotes the center-point coordinates (v_x, v_y) of the ground-truth box together with its width v_w and height v_h; t = (t_x, t_y, t_w, t_h) denotes the center-point coordinates (t_x, t_y) of the predicted box together with its width t_w and height t_h; and k ∈ {x, y, w, h}, where x, y, w and h denote the center-point coordinates (x, y) of a box and its width w and height h;

Step 3.2: The loss functions of the target detection module are likewise a cross-entropy loss function and a smooth L1 loss function, defined exactly as in Step 3.1; here the cross-entropy loss scores the predicted defect category of each target and the smooth L1 loss scores the predicted offsets of the target boxes;

Step 4: Train the improved RefineDet network with the AdamW optimizer, using a warmup and cosine learning-rate adjustment strategy;

Step 5: Test the trained improved RefineDet network on the test set; the computed average accuracy is 91.8%.

Table 1: Test results of different models

[Table 1 appears as an image in the original document: average accuracy on the test set for the compared models; the improved model reaches 91.8%.]

The above examples serve only to describe the present invention, not to limit the technical solutions it describes. Accordingly, all technical solutions and improvements that do not depart from the spirit and scope of the present invention should fall within the scope of the claims of the present invention.

Claims (2)

1. An improved RefineDet pipeline defect detection method, comprising the steps of:
step 1: acquiring a pipeline defect dataset Xs and corresponding labels Ys; dividing Xs in a 7:3 ratio into a training set Xtrain and a test set Xtest; Xs contains Ns samples, the training set Xtrain contains Ntrain samples, and the test set Xtest contains Ntest samples;
step 2: building an improved RefineDet model, which consists of 4 branches, namely a backbone network, a Neck module, an anchor refinement module and a target detection module:
step 3: the loss function comprises a loss function of the anchor refinement module and a loss function of the target detection module, specifically:
step 3.1: the anchor refinement module classifies each anchor into two classes according to whether it contains a target, and performs binary classification and regression loss calculation for the anchors; the loss functions used are a cross entropy loss function and a smooth L1 loss function, respectively;
the cross entropy loss function is:

L_cls = -(1/n) Σ_{i=1}^{n} [ y_i log(y'_i) + (1 - y_i) log(1 - y'_i) ],

wherein y_i is the label value, y'_i is the predicted value, and n is the number of samples Ntrain in the training set Xtrain;
the formula for the smooth L1 loss function is:

L_loc = Σ_{k∈{x,y,w,h}} smooth_L1(t_k - v_k),

wherein smooth_L1(x) = 0.5·x² if |x| < 1, and |x| - 0.5 otherwise; v = (v_x, v_y, v_w, v_h) represents the center point coordinates (v_x, v_y) of the real bounding box and the width v_w and height v_h of the box; t = (t_x, t_y, t_w, t_h) represents the center point coordinates (t_x, t_y) of the prediction box and the width t_w and height t_h of the box; k ∈ {x, y, w, h}, wherein x, y, w and h represent the center point coordinates (x, y) of a box and the width w and height h of the box;
step 3.2: the loss functions of the target detection module are respectively a cross entropy loss function and a smooth L1 loss function;
the cross entropy loss function is:

L_cls = -(1/n) Σ_{i=1}^{n} [ y_i log(y'_i) + (1 - y_i) log(1 - y'_i) ],

wherein y_i is the label value, y'_i is the predicted value, and n is the number of samples Ntrain in the training set Xtrain;
the formula for the smooth L1 loss function is:

L_loc = Σ_{k∈{x,y,w,h}} smooth_L1(t_k - v_k),

wherein smooth_L1(x) = 0.5·x² if |x| < 1, and |x| - 0.5 otherwise; v = (v_x, v_y, v_w, v_h) represents the center point coordinates (v_x, v_y) of the real bounding box and the width v_w and height v_h of the box; t = (t_x, t_y, t_w, t_h) represents the center point coordinates (t_x, t_y) of the prediction box and the width t_w and height t_h of the box; k ∈ {x, y, w, h}, wherein x, y, w and h represent the center point coordinates (x, y) of a box and the width w and height h of the box;
step 4: training the improved RefineDet network with the AdamW optimizer, using a warmup and cosine learning rate adjustment policy;
step 5: testing the trained improved RefineDet network on the test set and calculating the average accuracy.
2. The improved RefineDet pipeline defect detection method according to claim 1, characterized in that step 2.1: constructing a backbone network; the backbone network is a Swin Transformer network, which builds 4 feature extraction stages in a hierarchical manner, denoted S1, S2, S3 and S4, and the output feature maps of the stages are denoted Os1, Os2, Os3 and Os4 respectively; each stage comprises two parts, namely a patch merging layer and Swin Transformer blocks; each of the 4 feature extraction stages contains 1 patch merging layer, the numbers of Swin Transformer blocks are 2, 2, 6 and 2 respectively, and two Swin Transformer blocks form a group; the patch merging operation divides a feature map of height × width × channels H×W×C into four (H/2)×(W/2)×C feature maps, then concatenates the 4 feature maps in the channel dimension to obtain an (H/2)×(W/2)×4C feature map, and finally normalizes and linearly transforms the concatenated feature map into an (H/2)×(W/2)×2C feature map; in a group of Swin Transformer blocks, the first Swin Transformer block consists of a normalization layer, a window multi-head attention module, a normalization layer and a multi-layer perceptron module, and the second Swin Transformer block consists of a normalization layer, a shifted-window multi-head attention module, a normalization layer and a multi-layer perceptron module;
step 2.2: constructing a Neck module, which enriches semantic information by adding high-level features to low-level features; the Neck module contains 4 transfer connection blocks, denoted T1, T2, T3 and T4, corresponding to S1, S2, S3 and S4 of the backbone network; T4 contains only one branch, with Os4 as its input, and after 3 convolution operations the size of the output feature map is unchanged; each remaining transfer connection block Ti (i = 1, 2, 3) contains two branches, the first taking Osi as input and the second taking the output feature map of Ti+1 as input; since the input feature maps of the two branches differ in size, the branch carrying the output of Ti+1 uses a deconvolution operation to enlarge it to the size of Osi, the other branch performs two convolution operations that leave the feature map size unchanged, and the outputs of the two branches are then summed element-wise; finally, a convolution layer is added after the summation to ensure the distinguishability of the detection features; the outputs of the 4 transfer connection blocks T1, T2, T3 and T4 of the Neck module are denoted Ot1, Ot2, Ot3 and Ot4 respectively;
step 2.3: constructing an anchor refinement module; the anchor refinement module provides better initialization for bounding-box regression in the target detection module by adjusting the positions and sizes of anchors; the anchor refinement module contains 4 convolution blocks, each composed of two convolution layers; the 4 convolution blocks respectively predict, for the 4 input feature maps of different scales Ot1, Ot2, Ot3 and Ot4, the fine-tuned positions of the anchors and the confidence that each anchor is a positive sample;
step 2.4: constructing a target detection module; the target detection module is similar in structure to the anchor refinement module and also contains 4 convolution blocks, each composed of two convolution layers; the 4 convolution blocks respectively predict, for the 4 input feature maps of different scales Ot1, Ot2, Ot3 and Ot4, the offsets of the target boxes and the defect category of each target.

Priority Applications (1)

Application Number: CN202310158656.5A
Priority Date / Filing Date: 2023-02-13
Title: An improved pipeline defect detection method based on RefineDet

Publications (2)

CN116071343A (publication date: 2023-05-05)
CN116071343B (publication date: 2025-09-26)

Family ID: 86175430

Family Applications (1)

Application Number: CN202310158656.5A (Active)
Priority Date / Filing Date: 2023-02-13
Title: An improved pipeline defect detection method based on RefineDet

Country Status (1)

CN: CN116071343B (en)

Citations (8)

* Cited by examiner, † Cited by third party

Patent citations:

CN110619356A (priority 2019-08-28, published 2019-12-27), 电子科技大学: Target detection method based on regional suggestion attention *
WO2020233414A1 (priority 2019-05-20, published 2020-11-26), 阿里巴巴集团控股有限公司: Object recognition method and apparatus, and vehicle *
CN112381165A (priority 2020-11-20, published 2021-02-19), 河南爱比特科技有限公司: Intelligent pipeline defect detection method based on RSP model *
WO2021129691A1 (priority 2019-12-23, published 2021-07-01), 长沙智能驾驶研究院有限公司: Target detection method and corresponding device *
WO2021139069A1 (priority 2020-01-09, published 2021-07-15), 南京信息工程大学: General target detection method for adaptive attention guidance mechanism *
CN114065847A (priority 2021-11-04, published 2022-02-18), 武汉纺织大学: A fabric defect detection method based on improved RefineDet *
CN114202672A (priority 2021-12-09, published 2022-03-18), 南京理工大学: A small object detection method based on attention mechanism *
CN115147380A (priority 2022-07-08, published 2022-10-04), 中国计量大学: Small transparent plastic product defect detection method based on YOLOv5 *

Non-patent citations:

王若霄, 徐智勇, 张建林: "Cross-depth convolutional feature enhanced object detection algorithm" (跨深度的卷积特征增强目标检测算法), Computer Engineering and Design, no. 07, 16 July 2020 *

Also Published As

CN116071343B (en), published 2025-09-26


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
