Technical Field

The present invention relates to the technical field of bearing inspection, and in particular to an online visual intelligent detection method for full-surface defects of high-precision bearings.
Background

A bearing is an important component of mechanical equipment. Its main functions are to support a rotating mechanical body, reduce the friction coefficient during its motion, and guarantee its rotation accuracy. During production, bearings are affected by raw materials, machining processes, machining equipment, and external factors, so defects inevitably occur, including threads, black spots, flat marks, bumps, and cuts and scratches.

Detecting bearing surface defects is an essential process step, so it is necessary to identify bearing defects effectively on the assembly line. If bearing defect detection can be performed online through artificial intelligence, it will not only save labor costs but also avoid errors caused by worker distraction, human misjudgment, and missed detections, making a significant contribution to the entire production line.

YOLOv5 is currently one of the object detection networks with the best overall trade-off between detection accuracy and detection efficiency. However, directly applying the YOLOv5 algorithm to bearing defect recognition gives unsatisfactory results, mainly because the texture background of the bearing surface is complex and the types, shapes, and sizes of the defects are highly diverse. Based on this, and combining the optical characteristics of the bearing surface, the imaging characteristics of the defects, and industrial inspection requirements, the present invention proposes a bearing ring surface defect detection algorithm based on an improved YOLOv5 network.
Summary of the Invention

The technical problem to be solved by the present invention is to provide an online visual intelligent detection method for full-surface defects of high-precision bearings, which realizes visual intelligent detection of production defects on the four surfaces of a bearing: the upper end face, lower end face, inner side, and outer side.

To solve the above technical problem, the present invention provides an online visual intelligent detection method for full-surface defects of high-precision bearings, comprising the following process: images of the upper end face, lower end face, inner side, and outer side of a bearing are collected; a host computer then crops windows from the collected images in a sliding-window manner and feeds them into a bearing defect detection model for prediction; the model outputs images annotated with the defect type, confidence, and defect location.

The bearing defect detection model uses YOLOv5 as the base network: an ECCA module is added before the SPPF module of the YOLOv5 backbone, the Neck of the YOLOv5 network is replaced by a Slim neck composed of GSConv and VoVGSCSP, and the head adopts the decoupled detection head of YOLOX.
As an improvement of the online visual intelligent detection method for full-surface defects of high-precision bearings according to the present invention:

The ECCA module comprises a coordinate attention module and an efficient channel attention module. After the input feature map enters the coordinate attention module, global average pooling is performed separately in the horizontal and vertical directions to obtain two separate position-aware feature maps. The channel feature vector extracted from the input feature map by the efficient channel attention module is then weighted onto the two position-aware feature maps. The two feature maps are concatenated in the spatial dimension, and the vertical and horizontal spatial information is encoded by Conv, BN, and hardSwish; finally, the two position-aware feature maps are separated and weighted onto the input feature map.
As a further improvement of the online visual intelligent detection method for full-surface defects of high-precision bearings according to the present invention:

The clustered anchor boxes of the bearing defect detection model have sizes [39,39,62,122,178,76], [106,244,597,58,236,202], and [178,547,478,220,354,455], respectively.
As a further improvement of the online visual intelligent detection method for full-surface defects of high-precision bearings according to the present invention:

The training and testing process of the bearing defect detection model is as follows:

Images of the upper end face, lower end face, inner side, and outer side of defective bearings are collected; windows are manually cropped at the defect locations and classified by defect type; after the number of windows is augmented, the samples of each defect type are randomly divided into a training set, a validation set, and a test set. Training runs for 100 epochs in total with a batch size of 16, an initial learning rate of 0.01, and a final learning rate of 0.001, using the SGD optimizer; the loss function is:

Loss = w_box·L_box + w_obj·L_obj + w_cls·L_cls    (2)

where w_box, w_obj, and w_cls are 0.05, 0.5, and 1.0, respectively.
The beneficial effects of the present invention are mainly reflected in the following:

1. The present invention constructs the hybrid attention mechanism ECCA module from the efficient channel attention (ECA) module and the coordinate attention (CA) module and introduces it into the backbone of YOLOv5, which further improves the feature extraction capability of the network;

2. The present invention replaces the YOLOv5 neck composed of Conv and C3 with a Slim neck module built from GSConv and VoVGSCSP, effectively reducing the number of parameters while improving the ability to detect bearing defects;

3. The present invention replaces the detection head of the YOLOv5 network with the decoupled detection head of YOLOX, separating the regression and classification tasks so that the network distinguishes defect categories more accurately.
Brief Description of the Drawings

Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

Figure 1 is a schematic structural diagram of the bearing defect detection model of the present invention;

Figure 2 is a schematic structural diagram of the coordinate attention module;

Figure 3 is a schematic structural diagram of the efficient channel attention module;

Figure 4 is a schematic structural diagram of the hybrid attention mechanism module of the present invention;

Figure 5 is a schematic structural diagram of the GSConv module;

Figure 6 is a schematic structural diagram of the VoVGSCSP module;

Figure 7 is a schematic structural diagram of the decoupled detection head of YOLOX;

Figure 8 shows the training-set loss curve, the validation-set loss curve, and the mAP curve of the present invention;

Figure 9 is a schematic diagram of the sliding-window cropping of the present invention;

Figure 10 compares the detection results of the present invention with those of the YOLOv5s, YOLOXs, YOLOv6n, and YOLOv7-tiny networks on five images with different defects.
Detailed Description of the Embodiments

The present invention is further described below with reference to specific embodiments, but the protection scope of the present invention is not limited thereto:

Embodiment 1: An online visual intelligent detection method for full-surface defects of high-precision bearings, as shown in Figures 1-10. First, a hybrid attention module ECCA, combining a channel attention mechanism and a spatial attention mechanism, is introduced into the YOLOv5 backbone to strengthen the network's ability to locate target features. Second, the original neck is replaced by a Slim neck composed of GSConv and VoVGSCSP, reducing the number of parameters and the computational cost of the model without loss of accuracy. Third, the original detection head is replaced by the decoupled detection head of YOLOX, separating the classification and regression tasks. This constitutes the bearing defect detection model of the present invention; the specific implementation is as follows:
1. Improved YOLOv5 network

1.1 Network structure

YOLOv5 uses CSPDarknet53 as its backbone, whose role is to extract features through several downsampling modules. The neck combines a feature pyramid network (FPN) and a path aggregation network (PAN): the FPN layers propagate strong semantic features from top to bottom, while the PAN layers propagate strong localization features from bottom to top, performing feature aggregation on the detection layers fed from different backbone layers to improve feature extraction. The FPN+PAN structure thus fuses the features extracted by the backbone. The head predicts the results, including the location and category of the target.

The improved YOLOv5 network structure proposed by the present invention is shown in Figure 1. The backbone downsamples the input five times; the downsampling module CBS consists of a convolution, batch normalization, and the SiLU activation function. The C3 module, mainly used for feature extraction, is a CSP structure composed of three downsampling (Conv) modules and several Bottleneck modules. SPPF is a spatial pyramid pooling module that performs max pooling with different kernel sizes, enlarging the receptive field of the network, and concatenates the resulting features for fusion. The ECCA module is the hybrid attention mechanism proposed by the present invention; it fuses the ECA and CA attention mechanisms so that the network pays more attention to the channel and position information of the feature map. In the neck of the present invention, the CBS modules are replaced by GSConv modules and the C3 modules by VoVGSCSP modules; GSConv has a lower computational cost than standard convolution while producing better results. The head adopts the decoupled detection head of YOLOX, which separates the classification and regression tasks and greatly accelerates the convergence of the loss function.
1.2 ECCA module

The attention mechanism is essentially similar to human selective visual attention: it adjusts the weights of different regions of an image so that the network can focus on important regions and ignore irrelevant information. Attention mechanisms have proven helpful in various computer vision tasks such as image classification and object detection; using an attention mechanism can therefore make the network focus more on defect regions. The coordinate attention (CA) module decomposes channel attention into two one-dimensional feature encoding processes that aggregate features along the H and W spatial directions, respectively. In this way, long-range dependencies can be captured along one spatial direction while precise position information is preserved along the other; the generated feature maps are then encoded into a pair of direction-aware and position-sensitive attention maps, which are applied complementarily to the input feature map. The CA module thus considers both inter-channel relationships and position information.

It captures not only cross-channel information but also direction-aware and position-sensitive information, which allows the model to locate and identify target regions more accurately. However, because the CA module must attend to both the channel and position information of the feature map, channel information is lost during training. The efficient channel attention (ECA) module, a lightweight channel attention module, can capture cross-channel interaction information and achieve significant performance improvements. On this basis, the present invention combines the CA and ECA modules to construct a hybrid attention mechanism module (the ECCA module), in which the ECA module helps the CA module capture channel information; this module is introduced into the YOLOv5 backbone to improve feature extraction. The color, location, and size of bearing defects vary greatly and are affected by background interference, and some defects cannot be detected with the original YOLOv5. The CA module improves the network's ability to extract defect location and size, and the additional ECA module improves the network's discrimination of defect colors.
(1) CA

The structure of the CA module is shown in Figure 2. First, global average pooling (GAP) is performed separately in the horizontal and vertical directions to obtain two separate position-aware feature maps; the result of pooling in the vertical direction is permuted to swap its second and third dimensions. Second, the two feature maps are concatenated in the spatial dimension, and the vertical and horizontal spatial information is encoded by Conv, BN, and hardSwish. Finally, the two position-aware feature maps are separated and weighted onto the input feature map.
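The directional pooling and permute steps above can be illustrated with a minimal NumPy sketch; for simplicity, a single feature map of shape (C, H, W) is assumed and the batch dimension is omitted:

```python
import numpy as np

def ca_directional_pool(x):
    """Directional global average pooling from the CA module (sketch).

    x: feature map of shape (C, H, W).
    Returns:
      pool_h: shape (C, H, 1), averaged over the horizontal (W) axis;
      pool_w_perm: shape (C, W, 1), averaged over the vertical (H) axis
                   with the last two dimensions swapped (the "permute" step);
      cat: the two maps concatenated along the spatial dimension, (C, H+W, 1).
    """
    pool_h = x.mean(axis=2, keepdims=True)           # (C, H, 1)
    pool_w = x.mean(axis=1, keepdims=True)           # (C, 1, W)
    pool_w_perm = np.transpose(pool_w, (0, 2, 1))    # (C, W, 1)
    cat = np.concatenate([pool_h, pool_w_perm], axis=1)  # (C, H+W, 1)
    return pool_h, pool_w_perm, cat
```

The concatenated tensor is what the subsequent Conv-BN-hardSwish stack encodes before the two maps are split again and applied as attention weights.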
(2) ECA

The structure of the ECA module is shown in Figure 3. First, global average pooling (GAP) is applied to the input feature map (C×H×W) to obtain a C×1×1 tensor. Second, a fast one-dimensional convolution of size k captures cross-channel interaction information, yielding the weight of each channel, and an activation function generates a C×1×1 feature map. Finally, this is multiplied with the input feature map to obtain the final feature map. The ECA module avoids dimensionality reduction while adding only a small number of parameters, and to better capture cross-channel interaction information it takes each channel together with its k neighbors as the key indicator. The kernel size k represents how many neighbors participate in the attention computation, i.e., the coverage of local cross-channel interaction; k is determined adaptively from the channel dimension, as shown in Equation (1):

k = |log2(c)/r + b/r|_odd    (1)

where c is the channel dimension and |x|_odd denotes the odd number nearest to x; in the present invention, r is set to 2 and b is set to 1.
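Equation (1) can be sketched as a small Python helper; the rounding-to-nearest-odd step follows the description above, with r = 2 and b = 1 as in the present invention:

```python
import math

def eca_kernel_size(c, r=2, b=1):
    """Adaptive 1-D convolution kernel size for ECA, Eq. (1) (sketch).

    c: channel dimension; r, b: hyperparameters (r=2, b=1 in the text).
    Returns the odd number nearest to log2(c)/r + b/r.
    """
    t = int(abs(math.log2(c) / r + b / r))
    return t if t % 2 == 1 else t + 1
```

For example, c = 256 gives log2(256)/2 + 1/2 = 4.5, which rounds to the odd kernel size 5.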
(3) ECCA

The present invention combines the CA module with the ECA module to construct a hybrid attention mechanism module (the ECCA module), which is added before the SPPF module of the YOLOv5 backbone to strengthen the network's feature extraction. The structure of ECCA is shown in Figure 4.

The ECCA module weights the channel feature vector extracted by the ECA module onto the two position-aware feature maps of the CA module. The purpose is to strengthen the cross-channel interaction information available to the position-aware feature maps, improving the performance of the network and enhancing feature extraction.
1.3 Slim neck

Industrial applications place high demands on detection accuracy and inference speed. In general, the more parameters a model has, the higher its detection accuracy; however, its detection speed correspondingly decreases. The present invention therefore introduces the lighter convolution structure GSConv, which reduces the number of parameters and the computational cost without losing feature expression capability. GSConv modules are embedded in the feature fusion stage so that the new model achieves better results with significantly fewer parameters. GSConv is not used in the backbone, because doing so would deepen the backbone layers, and a deeper network increases the resistance to spatial information flow and thus slows inference.
(1) GSConv

The structure of the GSConv module is shown in Figure 5. GSConv consists of two parts: a standard convolution (SC) layer and a depthwise separable convolution (DSC) layer. The SC layer extracts the high-level semantic information of the feature map, while the DSC layer reduces the number of channels and the computational cost. The features extracted by the two layers are then concatenated, and a channel shuffle [28] operation produces the output feature map. The channel shuffle operation rearranges the channels after grouped convolution so that information can flow between the different groups, improving the performance and accuracy of the network.
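The channel shuffle operation can be illustrated with a minimal NumPy sketch; a feature map layout of (C, H, W) is assumed, and two groups correspond to the concatenated SC and DSC halves:

```python
import numpy as np

def channel_shuffle(x, groups=2):
    """Channel shuffle as used after the Concat in GSConv (sketch).

    x: feature map of shape (C, H, W), with C divisible by `groups`.
    Reorders the channels so that information mixes across groups:
    [g0_0, g0_1, g1_0, g1_1] -> [g0_0, g1_0, g0_1, g1_1].
    """
    c, h, w = x.shape
    assert c % groups == 0
    x = x.reshape(groups, c // groups, h, w)  # split channels into groups
    x = np.transpose(x, (1, 0, 2, 3))         # interleave the groups
    return x.reshape(c, h, w)
```

With 4 channels and 2 groups, the channel order [0, 1, 2, 3] becomes [0, 2, 1, 3], so each output neighborhood sees channels from both halves.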
(2) VoVGSCSP

Based on GSConv, the present invention introduces the GS Bottleneck and VoVGSCSP modules. Figure 6 shows their structures. Compared with the Bottleneck and C3 modules used in YOLOv5, VoVGSCSP reduces the number of parameters and the computational cost through grouped convolution and channel shuffle, making the model more lightweight; moreover, its multi-branch convolution enhances the feature extraction capability and the receptive field, improving the accuracy of the model.
(3) Slim neck

The neck of YOLOv5 is the feature fusion network; it fuses the large-, medium-, and small-scale feature maps extracted by the backbone to obtain richer feature information. To balance the accuracy and speed of the model, the present invention uses a Slim neck feature fusion network composed of GSConv and VoVGSCSP. Figure 1 shows the structure of the Slim neck: compared with the neck of YOLOv5, the Slim neck replaces the CBS and C3 modules with GSConv and VoVGSCSP, which reduces the computational cost and the number of parameters and improves the speed and efficiency of the model; it also adapts to targets of different sizes by controlling the number of channels and the expressive power of the features at each scale.
1.4 Decoupled detection head of YOLOX

The head is the detection part of YOLOv5. The original YOLOv5 algorithm uses a coupled head: after feature fusion, a single convolution layer directly produces the final detection output, which couples the position, objectness, and category information. However, bearing defect classification is complex, and because a bearing is an annular structure, the sliding-window detection adopted by the present invention crops every image differently, which makes the background complex. A coupled head then causes the loss function to converge slowly and easily misclassifies the background. The present invention therefore uses the decoupled detection head of YOLOX, whose structure is shown in Figure 7. It consists of a 1×1 convolution layer that reduces the number of channels, followed by two parallel branches: one branch is responsible for classification and the other for regression. The output shape of the classification branch is H*W*C; the regression branch is further divided into a position branch and an objectness-confidence branch, with output shapes H*W*4 and H*W*1, respectively. The present invention still uses an anchor-based detection mechanism, so each output is multiplied by the number of anchor boxes. Because the decoupled head extracts classification and regression features separately, avoiding interference between them, it greatly accelerates the convergence of the loss function during training.
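As a sanity check on the branch dimensions described above, the following hypothetical Python helper (an illustration, not part of the invention) computes the per-scale output shapes of an anchor-based decoupled head:

```python
def decoupled_head_shapes(h, w, num_classes, num_anchors=3):
    """Output shapes of the three decoupled-head branches (sketch).

    With an anchor-based mechanism, each per-cell output is repeated
    once per anchor, as stated in the text.
    """
    cls = (h, w, num_anchors * num_classes)  # classification branch: H*W*C per anchor
    box = (h, w, num_anchors * 4)            # position branch: H*W*4 per anchor
    obj = (h, w, num_anchors * 1)            # objectness branch: H*W*1 per anchor
    return cls, box, obj
```

For a 640×640 input, the three detection scales would use (h, w) of (80, 80), (40, 40), and (20, 20).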
1.5 K-means algorithm and loss function

1.5.1 K-means algorithm

YOLOv5 is an anchor-based object detection algorithm: it uses preset anchor boxes to predict the bounding boxes of targets. The shape and size of the anchor boxes strongly affect detection performance, so cluster analysis must be performed according to the characteristics of the dataset to obtain suitable anchor boxes. K-means is an unsupervised clustering algorithm that partitions unlabeled data into a given number of groups; the computation steps are listed in Table 1:

Table 1. K-means computation steps

In the anchor box computation of YOLOv5, each box is generally treated as a two-dimensional point (width, height), and the K-means algorithm clusters these points to obtain the K anchor boxes that best match the sizes of the ground-truth boxes. Since YOLOv5 has feature maps at three scales and each scale uses three anchor boxes, nine anchor boxes are clustered, with sizes [39,39,62,122,178,76], [106,244,597,58,236,202], and [178,547,478,220,354,455]; anchor boxes of different scales correspond to targets of different sizes.
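The clustering of (width, height) points can be sketched as follows. This is a minimal NumPy K-means using Euclidean distance and a simple deterministic initialization, both assumptions made for illustration; anchor clustering for YOLO-style detectors is often done with a 1 − IoU distance instead:

```python
import numpy as np

def kmeans_anchors(wh, k, iters=100):
    """Cluster (width, height) pairs into k anchor sizes (plain K-means sketch).

    wh: array of shape (N, 2) with ground-truth box sizes.
    """
    wh = np.asarray(wh, dtype=float)
    # Deterministic init: pick k samples evenly spaced after sorting by area.
    order = np.argsort(wh[:, 0] * wh[:, 1])
    centers = wh[order[np.linspace(0, len(wh) - 1, k).astype(int)]]
    for _ in range(iters):
        # Assign each box to its nearest center.
        d = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned boxes.
        new = np.array([wh[labels == i].mean(axis=0) if np.any(labels == i)
                        else centers[i] for i in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers
```

Running this with k = 9 on the dataset's box sizes would yield the nine anchors listed above.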
1.5.2 Loss function

The loss function measures how close the predicted output of the neural network is to the expected output; the closer they are, the smaller the loss. The loss function used by YOLOv5 consists of three parts: the position loss, the objectness loss, and the classification loss. The position loss measures the distance between the predicted and expected positions; the objectness loss represents the probability that a target is present, expressed as a confidence between 0 and 1, where a larger value means a higher probability; the classification loss represents the probability that the target belongs to a given category. The overall loss is the weighted sum of these three losses, as shown in Equation (2):

Loss = w_box·L_box + w_obj·L_obj + w_cls·L_cls    (2)

where w_box, w_obj, and w_cls are 0.05, 0.5, and 1.0, respectively.
The position loss L_box uses CIoU as the loss function, which is defined as:

L_box = 1 − IoU + ρ²(A, B)/c² + α·v

where IoU is the intersection-over-union of the predicted box and the ground-truth box (the larger the IoU, the closer the two boxes); ρ denotes the Euclidean distance between the center points of the ground-truth box A and the predicted box B; c denotes the diagonal length of the smallest enclosing rectangle containing both boxes, used to normalize the distance; α is a weight coefficient; and v measures the consistency of the aspect ratios of A and B.
IoU is defined as:

IoU = |A ∩ B| / |A ∪ B|

where A is the ground-truth box, B is the predicted box, A∩B denotes their intersection, and A∪B denotes their union. α and v are defined as follows:

v = (4/π²)·(arctan(w_A/h_A) − arctan(w_B/h_B))²

α = v / (1 − IoU + v)

where w_A, h_A and w_B, h_B are the widths and heights of A and B, respectively.
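A minimal Python sketch of the IoU and CIoU computations defined above, for axis-aligned boxes given as (x1, y1, x2, y2); the small epsilon added to α's denominator is an assumption for numerical robustness:

```python
import math

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def ciou_loss(a, b):
    """CIoU loss: 1 - IoU + rho^2/c^2 + alpha*v (sketch)."""
    u = iou(a, b)
    # rho^2: squared Euclidean distance between box centers.
    rho2 = (((a[0] + a[2]) - (b[0] + b[2])) / 2) ** 2 \
         + (((a[1] + a[3]) - (b[1] + b[3])) / 2) ** 2
    # c^2: squared diagonal of the smallest enclosing rectangle.
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    c2 = cw ** 2 + ch ** 2
    # v: aspect-ratio consistency; alpha: its weight coefficient.
    v = (4.0 / math.pi ** 2) * (math.atan((a[2] - a[0]) / (a[3] - a[1]))
                                - math.atan((b[2] - b[0]) / (b[3] - b[1]))) ** 2
    alpha = v / (1.0 - u + v + 1e-9)
    return 1.0 - u + rho2 / c2 + alpha * v
```

For two identical boxes, IoU is 1 and the CIoU loss is 0, as every penalty term vanishes.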
The classification loss and the objectness loss of the present invention both use the binary cross-entropy loss function BCEWithLogitsLoss, which is defined as follows:

Loss = −(1/n)·Σ [y_n·log σ(x_n) + (1 − y_n)·log(1 − σ(x_n))]

where n is the number of input samples, y_n is the ground-truth value of the target, x_n is the predicted value (logit) of the network, and σ denotes the sigmoid function.
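A minimal Python sketch of BCEWithLogitsLoss operating on raw logits; it uses the numerically stable form max(x, 0) − x·y + log(1 + e^{−|x|}), which is algebraically equivalent to applying the sigmoid followed by binary cross-entropy:

```python
import math

def bce_with_logits(x, y):
    """Binary cross-entropy on logits, averaged over n samples (sketch).

    x: list of logits; y: list of ground-truth values in [0, 1].
    """
    total = 0.0
    for xn, yn in zip(x, y):
        # Stable equivalent of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))].
        total += max(xn, 0.0) - xn * yn + math.log1p(math.exp(-abs(xn)))
    return total / len(x)
```

For a logit of 0 and a target of 1, the loss is −log(0.5) = log 2 ≈ 0.693, matching the direct sigmoid-then-BCE computation.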
2. Training and testing

2.1 Bearing surface defect dataset

The bearing defect dataset of the present invention is collected from an industrial production line. The bearing surfaces to be inspected comprise four faces: the upper end face, the lower end face, the inner side, and the outer side. The upper and lower end faces and the inner side are imaged by an area-scan camera at a resolution of 5472×3648; the outer side is imaged by a line-scan camera at a resolution of 1024×10000. The present invention collected 1000 images of defective bearings and manually cropped windows of size 640×640 at the defect locations; the dataset is divided by defect type into threads, black spots, flat marks, bumps, and cuts and scratches. Because the numbers of the defect types differ in actual production, the number of samples of each defect type was augmented to ensure reasonable training and balance among the types; the augmentation methods include translation, rotation, flipping, and changes of brightness and contrast. After augmentation there are 4934 images and 5358 labels in total; the statistics of the augmented defect types are given in Table 2. According to the number of samples in the dataset and the requirements of training, the samples of each defect type are randomly divided into a training set, a validation set, and a test set at a ratio of 8:1:1.

Table 2. Augmented defect dataset
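The random 8:1:1 split described above can be sketched as follows; this is a hypothetical helper, and the fixed seed is an assumption added for reproducibility:

```python
import random

def split_8_1_1(samples, seed=0):
    """Randomly split samples into train/val/test sets at a ratio of 8:1:1."""
    s = list(samples)
    random.Random(seed).shuffle(s)
    n_train = int(len(s) * 0.8)
    n_val = int(len(s) * 0.1)
    return s[:n_train], s[n_train:n_train + n_val], s[n_train + n_val:]
```

Applied per defect type, this keeps each category's proportions consistent across the three sets.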
2.3 Performance metrics

To verify the effectiveness of the improved YOLOv5 defect detection model, the present invention uses mean average precision (mAP), average precision (AP), the number of parameters, and FPS as metrics; the confusion matrix is shown in Table 3.

Table 3. Confusion matrix

In Table 3, TP (true positive) is the number of positive samples predicted correctly; FP (false positive) is the number of negative samples predicted as positive; FN (false negative) is the number of positive samples wrongly predicted as negative; TN (true negative) is the number of negative samples predicted correctly. FPS is the number of images the detection network can process per second; a larger FPS means faster processing.
Precision and recall are computed as follows:
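The formulas themselves did not survive in this copy; the standard definitions, consistent with the TP, FP and FN counts described above, are:

```latex
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}
```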
AP and mAP are defined as follows:
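These definitions were also dropped from this copy; a reconstruction consistent with the surrounding text, with $P$ and $R$ the precision and recall defined above and $N$ the number of defect categories (here 5), is:

```latex
AP = \int_{0}^{1} P(R)\,\mathrm{d}R, \qquad mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i
```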
Here AP is the area under the precision-recall (P-R) curve, and mAP is the mean of the AP values over all categories, measuring the detection performance of the network model across all classes.
2.4 Training process and output, testing process and output
The hardware environment and software versions used in the experiments are listed in Table 4.
Table 4 Experimental environment
Training runs for 100 epochs with a batch size of 16; the learning rate is 0.01 in the early phase and 0.001 in the late phase, using the SGD optimizer. The present invention makes three improvements to YOLOv5. To verify the effectiveness of each improvement individually and of the three combined, an ablation experiment was performed; the results are shown in Table 5.
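One plausible reading of the "early 0.01 / late 0.001" learning-rate setting is a linear decay across the 100 epochs. This is an assumption, since the patent does not state the exact schedule; the sketch below only shows the per-epoch rate such a schedule would produce.

```python
def lr_at(epoch, total_epochs=100, lr0=0.01, lrf=0.001):
    """Learning rate at a given epoch: linear decay from the early-phase
    rate lr0 to the late-phase rate lrf over the training run."""
    return lr0 + (lrf - lr0) * epoch / (total_epochs - 1)
```

In a training loop, the SGD optimizer's parameter-group learning rate would be updated from `lr_at(epoch)` at the start of each epoch.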
Table 5 Ablation experiment results
As shown in Table 5, the mAP of YOLOv5s is 96.3%; adding the ECA module raises it to 96.7%, adding the CA module raises it to 97.0%, and adding the ECCA module, which combines ECA and CA, raises it to 97.8%, indicating that the combination helps the network detect bearing surface defects. Adding Slim-neck on top of ECCA gives 98.1%, and further adding the Decoupled head gives 98.6%, the highest accuracy, an improvement of 2.3% over YOLOv5s. This shows that the combination of the three methods is effective for detecting bearing surface defects.
The loss curves for the training and validation sets are shown in Figure 8. The top two rows of Figure 8 show the localization loss, objectness loss and classification loss curves during training and validation; the curves converge rapidly within the first 30 epochs and have fully converged by epoch 100, and thanks to the added decoupled head, the objectness and classification losses converge especially quickly. The third row of Figure 8 shows the mAP curve, which rises steadily as training progresses.
3. Description of the online usage process
In actual online use, the bearing defect detection model trained in step 2 runs on a Windows 10 system. During online operation, four industrial cameras on the production line first capture images of the finished bearings and send them to the host computer for preprocessing. Preprocessing includes extracting the region of interest; on the production line, detection proceeds via a sliding window, as shown in Figure 9, with a window size of 640×640 and a step of 0.85. The bearing defect detection model then predicts on all 640×640 images and outputs images annotated with defect type, confidence and defect location. If an output image contains one or more of the bearing defect types, the bearing is judged an unqualified product; otherwise it is a qualified product.
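The sliding-window tiling can be sketched as below. Interpreting the quoted "step of 0.85" as a stride of 0.85 × the window size (544 px for a 640 px window) is an assumption, as is clamping the last row and column to the image edge so the border is always covered.

```python
def window_origins(width, height, win=640, stride_ratio=0.85):
    """Top-left corners of overlapping sliding windows over a width x height
    image. Assumption: stride = 0.85 * window; the final row/column is
    clamped to the image edge."""
    stride = int(win * stride_ratio)  # 544 px for the defaults
    xs = list(range(0, max(width - win, 0) + 1, stride))
    if xs[-1] != max(width - win, 0):
        xs.append(max(width - win, 0))
    ys = list(range(0, max(height - win, 0) + 1, stride))
    if ys[-1] != max(height - win, 0):
        ys.append(max(height - win, 0))
    return [(x, y) for y in ys for x in xs]

# Tiling one full 5472x3648 end-face frame: a 10x7 grid of 70 windows.
print(len(window_origins(5472, 3648)))  # 70
```

Under these assumptions a full end-face frame yields 70 windows; the 77 crops per face quoted later in the text come after the patent's own region-of-interest preprocessing, so the counts need not match.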
Experiment:
The experiment uses the expanded defect data set of Table 2 in Embodiment 1 and the experimental environment of Table 4.
To further verify the effectiveness of the improved YOLOv5 defect detection model, the present invention is compared with single-stage object detection methods including YOLOv5s, YOLOXs, YOLOv6n and YOLOv7-tiny. The experimental results are shown in Table 6.
Table 6 Comparative experimental results
As shown in Table 6, the mAP of the YOLOv5s model is 96.3%, the highest among the baseline models, while the improved YOLOv5s model proposed here reaches 98.6%, an improvement of 2.3%. However, because the parameter count and computation increase, the FPS of the proposed model decreases somewhat. In the present invention, each area-array camera image yields 77 crops and the line-scan camera yields 38, for a total of 269 images across the four stations; theoretically, detection can be completed in 3 s. Actual production allows 8 s per inspection, so the proposed model meets the demands of real bearing production inspection.
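The image-count arithmetic above is consistent (three area-imaged faces × 77 crops + 38 line-scan crops = 269), and the implied processing-rate requirement can be checked directly:

```python
area_faces, crops_per_area_face = 3, 77   # upper end, lower end, inner side
line_scan_crops = 38                      # outer side
total_crops = area_faces * crops_per_area_face + line_scan_crops
print(total_crops)  # 269

budget_s = 8.0                            # production allows 8 s per bearing
required_fps = total_crops / budget_s
print(required_fps)  # 33.625
```

So the model only needs to sustain about 34 frames per second to meet the 8 s budget, well under the roughly 90 FPS implied by the theoretical 3 s figure.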
To further demonstrate the advantages of the present invention, five images were randomly selected and tested on each model; the results are shown in Figure 10. The first row of Figure 10 shows the ground-truth boxes of the defect regions in the five images, and rows 3 to 8 show the predictions of each model; the number in the upper-right corner of each box is the prediction confidence, with higher confidence indicating a higher probability of being a true target. As Figure 10 shows, the models perform differently on the bearing defect data set: none of the baseline models detects black spots and flat marks satisfactorily, whereas the present invention clearly outperforms the other models overall, further verifying its advantages.
Finally, it should be noted that the above are merely several specific embodiments of the present invention. Obviously, the present invention is not limited to these embodiments, and many variations are possible. All variations that a person of ordinary skill in the art can directly derive or deduce from the disclosure of the present invention shall be deemed within the protection scope of the present invention.
| Application Number | Publication Number | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| CN202310435716.3A | CN116977270A | 2023-04-22 | 2023-04-22 | Online visual intelligent detection method for defects on whole surface of high-precision bearing |
| Publication Number | Publication Date |
|---|---|
| CN116977270A | 2023-10-31 |