CN119399144A - A chip surface defect segmentation model based on improved YOLOv8-seg and its training method and application - Google Patents


Info

Publication number
CN119399144A
CN119399144A (Application CN202411453719.0A)
Authority
CN
China
Prior art keywords
network structure
feature
layer
seg
concat
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411453719.0A
Other languages
Chinese (zh)
Inventor
顾寄南
朱永民
单韵竹
姜宝康
李静
高艳
向泓宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University
Priority to CN202411453719.0A
Publication of CN119399144A
Status: Pending

Abstract

Translated from Chinese


The invention discloses a chip surface defect segmentation model based on an improved YOLOv8-seg, together with its training method and application, and builds the network of the improved YOLOv8-seg segmentation detection model. The Backbone module includes a multi-layer ShuffleNetV2 network structure, performs multiple convolutions on the input picture, and extracts three effective feature layers. The Neck module adopts a Concat_SDI network structure and an EMA attention mechanism network structure: the Concat_SDI network structure performs multi-level feature fusion on the effective feature layers output by the Backbone module to obtain feature maps, and the EMA attention mechanism network structure performs feature weighting on the feature maps. The head module is a segmentation head that outputs the final feature information based on the output of the Neck module. After the network structure of the improved YOLOv8-seg segmentation detection model is built, training, validation and testing are performed. The invention uses the improved YOLOv8-seg segmentation detection model to accurately segment and detect chip surface defects, and maintains detection accuracy even when facing complex backgrounds and low-contrast defects.

Description

Chip surface defect segmentation model based on improved YOLOv8-seg and training method and application thereof
Technical Field
The invention belongs to the technical field of machine vision, and particularly relates to an improved YOLOv8-seg chip surface defect segmentation model and a training method and application thereof.
Background
In the semiconductor manufacturing process, the detection of chip surface defects is a key element for ensuring product quality and reliability. Traditional defect detection methods often depend on manual inspection or rule-based algorithms: manual inspection is labor-intensive and easily influenced by subjective factors, while traditional image processing algorithms are limited when handling complex backgrounds and diverse defects. Therefore, a detection algorithm built on an efficient and accurate segmentation model, optimized for both real-time performance and precision, is needed to improve the accuracy and efficiency of defect detection.
With the progress of deep learning technology, especially the development of Convolutional Neural Networks (CNN), defect detection methods based on deep learning have gradually come into wide use. Deep learning models can automatically learn features and realize efficient defect detection against complex backgrounds, and the combination of target detection (such as the YOLO series) and image segmentation (such as Mask R-CNN) models makes the localization and boundary recognition of defects more accurate. YOLOv8-seg is an improved version of the YOLO series that combines the capabilities of target detection and image segmentation. The model can both detect defects and accurately segment defect regions, shows significant improvements in speed and precision, and is suitable for real-time processing and high-resolution image detection tasks.
However, existing methods achieve relatively low detection accuracy when facing complex backgrounds and low-contrast defects, and require further improvement and optimization.
Disclosure of Invention
In order to overcome the deficiencies of the prior art, the application provides a chip surface defect segmentation model based on the improved YOLOv8-seg, together with its training method and application. The application uses the improved YOLOv8-seg segmentation detection model to accurately segment and detect chip surface defects, and maintains detection accuracy even when facing complex backgrounds and low-contrast defects.
The technical scheme adopted by the invention is as follows:
A training method for a chip surface defect segmentation model based on the improved YOLOv8-seg comprises the following steps:
Step 1, building the network structure of the improved YOLOv8-seg segmentation detection model. The improved YOLOv8-seg segmentation detection model comprises a Backbone module, a Neck module and a head module. The Backbone module comprises a multi-layer ShuffleNetV2 network structure; an input picture is subjected to multiple convolution processing through the ShuffleNetV2 network structures to extract 3 effective feature layers. The Neck module adopts a Concat_SDI network structure and an EMA attention mechanism network structure: the Concat_SDI network structure performs multi-level feature fusion on the effective feature layers output by the Backbone module to obtain a feature map, and the EMA attention mechanism network structure performs feature weighting on the feature map. The head module is a segmentation head used to output the final feature information based on the output of the Neck module;
Step 2, after the network structure of the improved YOLOv8-seg segmentation detection model is built, training, validating and testing the built improved YOLOv8-seg segmentation detection model using pictures with chip surface defects.
Further, the Backbone module is formed by sequentially connecting a Conv_maxpool network structure, a first ShuffleNetV2 network structure, a second ShuffleNetV2 network structure, a third ShuffleNetV2 network structure, a fourth ShuffleNetV2 network structure, a fifth ShuffleNetV2 network structure, a sixth ShuffleNetV2 network structure, an SPPF network structure and an EMA, wherein the fourth ShuffleNetV2 network structure, the sixth ShuffleNetV2 network structure and the SPPF network structure respectively output a first effective feature layer, a second effective feature layer and a third effective feature layer.
Further, different ShuffleNetV2 network structures are selected according to different input channel strides. When the input channel stride is 1, the input is split into two paths after passing through a Channel Split layer: one path is directly connected to the input of the Concat layer, while the other is sequentially connected to a Conv layer, a DWConv layer and a Conv layer, and then connected to the input of the Concat layer. The input tensors are spliced by the Concat layer and fed into a Channel Shuffle layer, which outputs the rearranged feature map channels;
When the input channel stride is 2, the input is processed by two paths: one path comprises a DWConv layer and a Conv layer connected in sequence, the other comprises a Conv layer, a DWConv layer and a Conv layer connected in sequence. The output ends of the two paths are connected to the input of the Concat layer; the input tensors are spliced by the Concat layer and fed into a Channel Shuffle layer, which outputs the rearranged feature map channels.
Further, the Neck module is composed of 2 Conv network structures, 4 C2f network structures, 2 upsampling layers, 4 Concat_SDI network structures and 3 EMA attention mechanism network structures, wherein the first upsampling layer is sequentially connected with a first Concat_SDI network structure, a first C2f network structure, a second upsampling layer, a second Concat_SDI network structure, a second C2f network structure, a first EMA attention mechanism network structure, a first Conv network structure, a third Concat_SDI network structure, a third C2f network structure, a second EMA attention mechanism network structure, a second Conv network structure, a fourth Concat_SDI network structure, a fourth C2f network structure and a third EMA attention mechanism network structure, and the first C2f network structure is also directly connected with the third Concat_SDI network structure.
Further, the multi-level feature fusion in the Neck module proceeds as follows: the Neck module receives the 3 effective feature layers output by the Backbone, wherein the first effective feature layer P1 output by the fourth ShuffleNetV2 network structure is connected to the second Concat_SDI network structure and fused at the same scale with the feature map N1 to be detected;
the second effective feature layer P2 output by the sixth ShuffleNetV2 network structure is connected to the first Concat_SDI network structure and fused at the same scale with the feature map N2 to be detected;
and the third effective feature layer P3 output by the SPPF module is connected to the fourth Concat_SDI network structure and fused at the same scale with the feature map N3 to be detected.
Further, the EMA attention mechanism network structure comprises a groups-style branch and a Cross-space learning branch. In the groups-style branch, attention weight descriptors of the grouped feature map are extracted through three parallel routes; this branch not only encodes inter-channel information to adjust the importance of different channels, but also retains accurate spatial structure information within the channels.
Further, in the groups-style branch, attention weight descriptors of the grouped feature map are extracted through three parallel routes: two parallel paths on the 1x1 branch and one path on the 3x3 branch. For any given input feature map, the X component of the feature map is divided into G sub-features along one 1x1 branch to learn different semantics, while the Y component of the feature map undergoes feature learning along the other 1x1 branch. The outputs of the two 1x1 branches are processed through Concat and Conv and then decomposed into two vectors; two nonlinear Sigmoid functions are used to fit the two-dimensional binomial distribution after linear convolution to realize different cross-channel interaction features. Finally, the outputs of the two parallel paths of the 1x1 branch and the grouped output are input together into Re-weight, and the feature map, after feature learning along the 3x3 branch, is input into the Cross-space learning branch.
Further, in the Cross-space learning branch, the output of the 1x1 branch and the output of the 3x3 branch serve as tensors. The output of the 1x1 branch passes sequentially through GroupNorm, Avg Pool, Softmax and Matmul to output a vector of class probabilities; the output of the 3x3 branch passes sequentially through Avg Pool, Softmax and Matmul to output the probability of each class, and in the 3x3 branch GroupNorm must be connected to Matmul;
Finally, the outputs of the two branches are fused using a nonlinear Sigmoid function to obtain a probability value between 0 and 1, and the features are then Re-weighted together with the grouped outputs, finally obtaining the smoothed model parameters and features.
A chip surface defect segmentation model based on the improved YOLOv8-seg, obtained by training with the above method.
A chip surface defect segmentation detection method based on the improved YOLOv8-seg comprises the following steps:
step 1, acquiring a picture of a chip surface defect to be identified by using an industrial camera;
Step 2, inputting the image to be identified obtained in step 1 into the improved YOLOv8-seg chip surface defect segmentation model, performing segmentation prediction on the chip surface defects of the image to be identified, and outputting information on the chip surface defects in the industrial production process.
The invention has the beneficial effects that:
1. Aiming at the problem that existing methods achieve low detection accuracy when facing complex backgrounds and low-contrast defects, the invention improves the network structure of the feature extraction and feature fusion parts of the YOLOv8 network. Introducing a lighter-weight feature extraction module allows the details of low-contrast defects to be better captured, and adopting a finer fusion strategy during feature fusion lets features from different layers complement each other more effectively. This optimization of the network structure not only improves the detection of low-contrast defects but also enhances the overall detection accuracy and reliability.
2. By introducing an IoU (Intersection over Union) loss function to quantify the quality of the predicted mask and taking the IoU value into the total loss function as an additional loss term, the prediction performance of the model in boundary and overlapping areas is directly optimized, the overall segmentation performance on chip surface defects is effectively improved, various types of defects can be identified and located more accurately in practical applications, and higher-quality detection results are provided.
Drawings
FIG. 1 is a schematic diagram of the method for constructing the segmentation detection model based on the improved YOLOv8-seg.
FIG. 2 is a schematic diagram of the network structure of the detection model based on the improved YOLOv8-seg segmentation according to the present invention.
Fig. 3 is a schematic diagram of the ShuffleNetV2 network structures, where (a) is the network structure when the input stride is 1 and (b) is the network structure when the input stride is 2.
Fig. 4 is a schematic diagram of the structure of the EMA attention mechanism module.
Fig. 5 is a schematic diagram of the Concat_SDI structure, wherein (c) is a schematic diagram of the Concat_SDI network structure and (d) is a schematic diagram of Concat_SDI feature fusion.
FIG. 6 is a graph showing the effect of chip surface defects segmented by the improved YOLOv8-seg segmentation detection model of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1
The application discloses a model training method for chip surface defect segmentation based on the improved YOLOv8-seg, which, with reference to accompanying figures 1-6, comprises the following steps:
Step 1, building the network structure of the improved YOLOv8-seg segmentation detection model. As shown in fig. 2, the network structure of the improved YOLOv8-seg segmentation detection model specifically comprises a Backbone module, a Neck module and a head module.
1. The Backbone module mainly comprises 1 Conv_maxpool network structure, 6 ShuffleNetV2 network structures, 1 SPPF network structure and 1 EMA attention mechanism structure, and is formed by sequentially connecting the Conv_maxpool network structure, a first ShuffleNetV2 network structure, a second ShuffleNetV2 network structure, a third ShuffleNetV2 network structure, a fourth ShuffleNetV2 network structure, a fifth ShuffleNetV2 network structure, a sixth ShuffleNetV2 network structure, the SPPF network structure and the EMA, wherein the fourth ShuffleNetV2 network structure, the sixth ShuffleNetV2 network structure and the SPPF network structure respectively output a first effective feature layer, a second effective feature layer and a third effective feature layer. In the Backbone module, 3 effective feature layers are extracted after the input picture is convolved multiple times (by the ShuffleNetV2 network structures).
More specifically, the Conv_maxpool network structure extracts features at different levels from the input picture, the SPPF network structure performs multi-scale feature extraction and dimension reduction on the output of the last ShuffleNetV2 network structure, and the EMA smooths the model parameters.
More specifically, the ShuffleNetV2 network architecture introduces a Channel Shuffle mechanism on the basis of depth separable convolution. The channel shuffling mechanism breaks the independence between channels through a "channel shuffle" operation, forcing the network to enhance feature expression while maintaining a low computational load. Specifically, the ShuffleNetV2 network structure adopts grouped convolution and channel shuffling to reduce the amount of computation, and the channels of the feature map are rearranged to improve the feature extraction capability of the model.
The ShuffleNetV2 network structure is shown in fig. 3, and different ShuffleNetV2 network structures are selected according to different input channel strides. When the input channel stride is 1, the ShuffleNetV2 network structure is as shown in fig. 3 (a): the input is split into two paths after passing through a Channel Split layer; one path is directly connected to the input of the Concat layer, while the other is sequentially connected to a Conv layer, a DWConv layer and a Conv layer and then connected to the input of the Concat layer. The input tensors are spliced by the Concat layer and fed into the Channel Shuffle layer, and the Channel Shuffle layer outputs the rearranged feature map channels.
When the input channel stride is 2, the ShuffleNetV2 network structure is as shown in fig. 3 (b): the input is processed by two paths, one comprising a DWConv layer and a Conv layer connected in sequence, the other comprising a Conv layer, a DWConv layer and a Conv layer connected in sequence. The output ends of the two paths are connected to the input of the Concat layer; the input tensors are spliced by the Concat layer and fed into the Channel Shuffle layer, and the Channel Shuffle layer outputs the rearranged feature map channels.
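The channel shuffle operation that closes both ShuffleNetV2 units can be sketched in a few lines of NumPy (a minimal illustration of the rearrangement, not the patent's implementation):

```python
import numpy as np

def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Rearrange the channels of an (N, C, H, W) feature map across groups,
    as the Channel Shuffle layer does after the Concat layer."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    # reshape to (N, groups, C//groups, H, W), swap the two channel axes,
    # then flatten back so channels from different groups interleave
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(n, c, h, w)

# with 2 groups and 4 channels, channel order [0, 1, 2, 3] becomes [0, 2, 1, 3]
x = np.arange(4).reshape(1, 4, 1, 1).astype(np.float32)
shuffled = channel_shuffle(x, groups=2)
print(shuffled.flatten())  # [0. 2. 1. 3.]
```

This is exactly the "grouping convolution plus channel shuffling" idea described above: after grouped processing, the shuffle forces information exchange between groups at negligible cost.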
Furthermore, the ReLU activation function is adopted in the ShuffleNetV2 network structure to introduce nonlinearity so that the model can fit complex functions. On one hand it accelerates training and reduces gradient vanishing, because the derivative on the positive half-axis is constantly 1; on the other hand it improves model sparsity and reduces model parameters, because negative outputs are set to zero. The expression of the ReLU activation function is as follows:
f(x)=max(0,x)
wherein x is the feature value input from the previous layer.
2. Concat_SDI is adopted in the Neck module to realize multi-level feature fusion; for each level of the feature map, Concat_SDI fuses high-level features containing more semantic information with low-level features capturing finer details. The Neck module is composed of 2 Conv network structures, 4 C2f network structures, 2 upsampling layers, 4 Concat_SDI network structures and 3 EMA attention mechanism network structures. The first upsampling layer is sequentially connected with a first Concat_SDI network structure, a first C2f network structure, a second upsampling layer, a second Concat_SDI network structure, a second C2f network structure, a first EMA attention mechanism network structure, a first Conv network structure, a third Concat_SDI network structure, a third C2f network structure, a second EMA attention mechanism network structure, a second Conv network structure, a fourth Concat_SDI network structure, a fourth C2f network structure and a third EMA attention mechanism network structure; the first C2f network structure is also directly connected with the third Concat_SDI network structure. The first upsampling layer receives the output of the EMA attention mechanism network structure in the Backbone module, the fourth Concat_SDI network structure receives the third effective feature layer output by the SPPF in the Backbone module, the first Concat_SDI network structure receives the second effective feature layer output by the sixth ShuffleNetV2 network structure in the Backbone module, and the second Concat_SDI network structure receives the first effective feature layer output by the fourth ShuffleNetV2 network structure in the Backbone module.
More specifically, the Concat_SDI network structure is shown in fig. 5, and is composed of four feature extraction layers, a forward propagation calculation layer and an output layer.
As shown in fig. 5, the 3 effective feature layers output by the Backbone are input into the Concat_SDI network structure: the first effective feature layer P1 output by the fourth ShuffleNetV2 network structure is connected into the second Concat_SDI network structure and fused at the same scale with the feature map N1 to be detected;
the second effective feature layer P2 output by the sixth ShuffleNetV2 network structure is connected into the first Concat_SDI network structure and fused at the same scale with the feature map N2 to be detected;
and the third effective feature layer P3 output by the SPPF module is connected into the fourth Concat_SDI network structure and fused at the same scale with the feature map N3 to be detected.
As for the feature maps N1, N2 and N3 to be detected: N1 is the feature map obtained through the second upsampling; N2 is the feature map generated by fusing the feature map upsampled from the previous network layer N1 with the second effective feature layer P2 output by the sixth ShuffleNetV2 network structure and the feature extraction network part F2, that is, by feature fusion across three different network levels; and N3 is the feature map obtained through the second Conv network structure.
More specifically, the EMA attention mechanism network structure is shown in fig. 4. EMA enhances feature learning capability by parallelizing 1x1 and 3x3 convolutions. The 1x1 branch keeps the channel dimension unchanged and captures fine-grained channel information, while the 3x3 branch aggregates multi-scale spatial information. Through feature grouping and cross-space learning, EMA can effectively model local and global feature dependencies, improving pixel-level attention and performance. Finally, the output of the EMA is consistent with the input size. The EMA cross-space efficient multi-scale attention mechanism realizes feature weighting by learning the importance of different regions of the image, and performs weighted fusion of the feature maps with the corresponding attention weights to obtain the final multi-scale feature representation.
With reference to fig. 4, the EMA attention mechanism network structure includes a groups-style branch and a Cross-space learning branch. In the groups-style branch, attention weight descriptors of the grouped feature map are extracted through three parallel routes: two parallel paths on the 1x1 branch and one path on the 3x3 branch. For any given input feature map, the X component of the feature map is divided into G sub-features along one 1x1 branch (i.e. the channel dimension direction) to learn different semantics, and the Y component of the feature map undergoes feature learning along the other 1x1 branch (i.e. the channel dimension direction); specifically, two 1D global average pooling operations are adopted in the 1x1 branches to encode the channels along the two spatial directions.
The outputs of the two 1x1 branches are processed by Concat and Conv and then decomposed into two vectors, i.e. the two encoded features are concatenated along the image height direction so that they share the same 1x1 convolution without reducing the dimension of the 1x1 branch. Finally, the outputs of the two parallel paths of the 1x1 branch and the grouped output are input together into Re-weight to re-weight the features.
The feature map also undergoes feature learning along the 3x3 branch: only one 3x3 kernel is stacked in the 3x3 branch to capture a multi-scale feature representation and expand the feature space, and the result then enters the Cross-space learning branch.
The groups-style branch not only encodes inter-channel information to adjust the importance of different channels, but also retains accurate spatial structure information within the channels.
In the Cross-space learning branch, the output of the 1x1 branch and the output of the 3x3 branch are taken as tensors.
The output of the 1x1 branch passes sequentially through GroupNorm, Avg Pool, Softmax and Matmul to output a vector of class probabilities; the output of the 3x3 branch passes sequentially through Avg Pool, Softmax and Matmul to output the probability of each class, and in the 3x3 branch GroupNorm must be connected to Matmul;
Finally, the outputs of the two branches are fused using a nonlinear Sigmoid function to obtain a probability value between 0 and 1, and the features are then Re-weighted together with the grouped outputs, finally obtaining the smoothed model parameters and features.
Then, global spatial information is encoded on the output of the 1x1 branch by two-dimensional global average pooling, and the output of the smallest branch is directly converted into the corresponding dimensional shape before the channel-feature joint activation mechanism. Multiplying the parallel-processed outputs by a matrix dot-product operation yields the first spatial attention map. In addition, global spatial information is encoded on the 3x3 branch by two-dimensional global average pooling, and the 1x1 branch is directly converted into the corresponding dimensional shape before the channel-feature joint activation mechanism; on this basis, a second spatial attention map that preserves the complete, accurate spatial position information is derived. Finally, the output feature map within each group is computed as the aggregation of the two generated spatial attention weight values, followed by a Sigmoid function. It captures pairwise relationships at the pixel level and highlights the global context of all pixels. The final output of the EMA is the same size as X.
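The directional encoding and reweighting idea in the 1x1 branch can be illustrated with a simplified NumPy sketch. This is not the EMA module itself (it omits grouping, the 3x3 branch and cross-space matmuls); it only shows how 1D global average pooling along each spatial direction yields descriptors that, squashed by Sigmoid, reweight the input while preserving its size:

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def directional_reweight(x: np.ndarray) -> np.ndarray:
    """Simplified sketch of the 1x1-branch idea: encode a (C, H, W) map with
    1D global average pooling along each spatial direction, fuse the two
    descriptors with Sigmoid, and reweight the input. The output keeps the
    input size, as the EMA module's output does."""
    pool_h = x.mean(axis=2, keepdims=True)       # (C, H, 1): pooled along width
    pool_w = x.mean(axis=1, keepdims=True)       # (C, 1, W): pooled along height
    weights = sigmoid(pool_h) * sigmoid(pool_w)  # (C, H, W) by broadcasting
    return x * weights

x = np.random.default_rng(0).normal(size=(4, 8, 8))
y = directional_reweight(x)
assert y.shape == x.shape  # size preserved, values attenuated by weights in (0, 1)
```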
More specifically, the Concat_SDI network structure of the invention adds SDI on the basis of plain concatenation to realize multi-level feature fusion. The formula is as follows:
f1i = φci(φsi(f0i))
wherein f0i represents the original feature map of the i-th stage, φsi and φci represent the parameters of the spatial and channel attention mechanisms respectively, and f1i is the processed feature map.
The SDI module enhances the representation capability of each level of the feature map by combining the semantic information of high-level features with the detail information of low-level features, thereby improving precision in the image segmentation task. Compared with the traditional feature-map splicing approach, the skip connection of this module is simpler and more efficient, reducing computational complexity and GPU memory usage;
Further, the intermediate steps of SDI are explained in detail through the SDI formula:
Firstly, the channel number of f1i is reduced to a hyperparameter c through a 1x1 convolution, obtaining a new feature map f2i;
Further, the feature maps are passed to the decoder, and the SDI module adjusts the feature map of each stage to the same resolution. The adjusted feature map is denoted f3ij, where i represents the target level and j represents the source level of the feature map. The adjustment operation is as follows:
For the case of j < i, adaptive average pooling is used to adjust the size;
for the case of j = i, an identity mapping is used;
for the case of j > i, bilinear interpolation is used for resizing;
Further, the adjusted feature maps are smoothed through a 3x3 convolution and denoted f4ij; finally, all the adjusted feature maps are fused together through the Hadamard product, enhancing the semantic information and detail information of each level of the feature map and obtaining f5i.
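The resize-then-Hadamard-fuse step of SDI can be sketched as follows. This is an illustrative NumPy stand-in, not the patent's implementation: nearest-neighbour resizing replaces the adaptive average pooling / bilinear interpolation pair to keep the example dependency-free, and the attention and convolution steps are omitted:

```python
import numpy as np

def resize_nearest(f: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour stand-in for the SDI resizing step (the patent uses
    adaptive average pooling when downscaling and bilinear interpolation
    when upscaling)."""
    c, h, w = f.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return f[:, rows][:, :, cols]

def sdi_fuse(features: list, target_size: int) -> np.ndarray:
    """Sketch of SDI fusion: bring every stage's feature map to the target
    resolution, then combine them with a Hadamard (element-wise) product."""
    fused = np.ones((features[0].shape[0], target_size, target_size))
    for f in features:
        fused = fused * resize_nearest(f, target_size)  # Hadamard product
    return fused

# three stages at 8x8, 16x16 and 32x32, all with 4 channels,
# fused at the 16x16 target level
stages = [np.ones((4, s, s)) for s in (8, 16, 32)]
out = sdi_fuse(stages, target_size=16)
assert out.shape == (4, 16, 16)
```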
3. The head module is the YOLOv8 segmentation head, and specifically includes 3 Segment heads, which correspond respectively to the first, second and third EMA attention mechanism network structures in the Neck module, and output the final feature information including category labels, bounding box coordinates, bounding box areas, segmentation masks and confidence levels.
Segment predicts the category of each pixel using the feature map extracted by the feature extraction network, adopts the IoU loss function over the IoU value between the predicted mask and the real mask, and adds IoU to the total loss as an additional loss term, optimizing the model's prediction in boundary and overlapping areas and improving the overall segmentation performance. The formula is as follows:
IoU = Intersection(Pred, Target) / Union(Pred, Target)
wherein Intersection(Pred, Target) is the intersection area of the predicted segmented region and the true labeled region, and Union(Pred, Target) is the union area of the predicted segmented region and the true labeled region.
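The IoU computation above reduces to a few lines for binary masks; a minimal sketch (the loss-term form 1 − IoU and the empty-mask convention are assumptions for illustration):

```python
import numpy as np

def iou_loss(pred_mask: np.ndarray, target_mask: np.ndarray) -> float:
    """IoU loss for binary masks: 1 - |Pred ∩ Target| / |Pred ∪ Target|."""
    pred = pred_mask.astype(bool)
    target = target_mask.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    iou = intersection / union if union > 0 else 1.0  # two empty masks: perfect match
    return 1.0 - iou

pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
print(iou_loss(pred, target))  # 0.5: intersection 1 pixel, union 2 pixels
```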
Step 2, after the network structure of the improved YOLOv8-seg segmentation detection model is built, training, validating and testing the built improved YOLOv8-seg segmentation detection model. The specific steps are as follows:
step 2.1, data set preparation.
Firstly, 900 pictures with chip surface defects are obtained using an industrial camera, and the collected pictures are annotated with Labelme software; the annotation name is "Crack", and in this embodiment the annotation format adopts the PascalVOC format.
Then, image preprocessing is performed on the images in the dataset of step 2.1. The preprocessing specifically includes data enhancement of the chip surface defect dataset by random cropping, rotation, flipping, color jittering, contrast adjustment and similar operations, which improves the robustness of the model under different data conditions. The images are resized to the input size required by the model, a uniform 640 x 640, and pixel values are scaled to the range 0 to 1 or standardized (subtracting the mean and dividing by the standard deviation) to make model training more stable. Meanwhile, four pictures are spliced using the Mosaic data enhancement method, which indirectly increases the batch_size so that a single GPU can achieve a better training effect. The enhanced dataset is divided into a training set, a validation set and a test set, where (training set + validation set) : test set = 9:1, and training set : validation set = 9:1.
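The two-stage 9:1 split described above can be sketched as follows (an illustrative helper, not from the patent; function and variable names are assumptions):

```python
import random

def split_dataset(items, test_ratio=0.1, val_ratio=0.1, seed=0):
    """Split as described: (train+val) : test = 9:1, then
    train : val = 9:1 within the remainder."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    n_test = int(len(items) * test_ratio)
    test, rest = items[:n_test], items[n_test:]
    n_val = int(len(rest) * val_ratio)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

# with the 900 collected pictures: 729 train, 81 val, 90 test
train, val, test = split_dataset(range(900))
print(len(train), len(val), len(test))  # 729 81 90
```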
And 2.2, training a model.
The improved YOLOv8-seg segmentation detection model described above is trained using the training set and the verification set of the data set. The number of training epochs is set to 500: the first 150 epochs freeze the backbone feature extraction network with an initial learning rate of 0.01, and the last 300 epochs unfreeze the backbone feature extraction network with an initial learning rate of 0.0001. Segmentation detection is then verified on the test set of the constructed chip surface defect data set to complete accurate segmentation of the chip surface defects.
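The two-stage schedule above (freeze, then unfreeze the backbone, each stage with its own initial learning rate) can be sketched with a hypothetical helper; the epoch threshold and learning rates are taken from the text, while the function itself is ours:

```python
def stage_config(epoch, freeze_epochs=150):
    """Return (backbone_frozen, initial_lr) for a given 0-based epoch:
    the first `freeze_epochs` epochs freeze the backbone at lr 0.01,
    the remaining epochs unfreeze it at lr 0.0001."""
    if epoch < freeze_epochs:
        return True, 0.01
    return False, 0.0001

print(stage_config(0))    # (True, 0.01)
print(stage_config(150))  # (False, 0.0001)
```

Freezing the backbone first lets the randomly initialized neck and head stabilize on the pretrained features before the whole network is fine-tuned at a much lower learning rate.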
Example 2
A chip surface defect segmentation model based on the improved YOLOv8-seg, obtained by training with the method of embodiment 1.
Example 3
A chip surface defect segmentation detection method based on the improved YOLOv8-seg comprises the following steps:
step 1, acquiring a picture of a chip surface defect to be identified by using an industrial camera;
Step 2, inputting the image to be identified obtained in step 1 into the chip surface defect segmentation model based on the improved YOLOv8-seg of embodiment 2, performing segmentation prediction of chip surface defects on the image, and outputting information on the chip surface defects in the industrial production process, including category label names, bounding box coordinates, bounding box areas, segmentation masks and category confidences.
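The per-defect output described in step 2 might be held in a record like the following; this is an illustrative sketch only (the class and field names are ours, not the model's actual output API):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DefectResult:
    label: str                              # category label name, e.g. "Crack"
    box: Tuple[float, float, float, float]  # bounding box (x1, y1, x2, y2)
    confidence: float                       # category confidence
    mask: List[List[int]]                   # binary segmentation mask

    @property
    def box_area(self) -> float:
        """Bounding box area derived from the coordinates."""
        x1, y1, x2, y2 = self.box
        return max(0.0, x2 - x1) * max(0.0, y2 - y1)

r = DefectResult("Crack", (10.0, 20.0, 110.0, 70.0), 0.91, [[0, 1], [1, 1]])
print(r.box_area)  # 100.0 * 50.0 = 5000.0
```

Deriving the box area from the coordinates, rather than storing it separately, keeps the record free of redundant fields that could fall out of sync.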
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above embodiments are merely intended to illustrate the design concept and features of the present invention, so that those skilled in the art can understand the content of the present invention and implement it accordingly; the scope of the present invention is not limited to the above embodiments. Therefore, all equivalent changes or modifications made according to the principles and design ideas of the present invention fall within the scope of the present invention.

Claims (10)

1. A training method for a chip surface defect segmentation model based on an improved YOLOv8-seg, characterized by comprising the following steps:

Step 1, building the network structure of the improved YOLOv8-seg segmentation detection model, wherein the improved YOLOv8-seg segmentation detection model comprises a Backbone module, a Neck module and a head module; the Backbone module comprises a multi-layer ShuffleNetV2 network structure, and an input picture is subjected to multiple convolution operations by the ShuffleNetV2 network structure to extract 3 effective feature layers; the Neck module adopts a Concat_SDI network structure and an EMA attention mechanism network structure, uses the Concat_SDI network structure to perform multi-level feature fusion on the effective feature layers output by the Backbone module to obtain feature maps, and uses the EMA attention mechanism network structure to perform feature weighting on the feature maps; the head module is a segmentation head which outputs the final feature information based on the output of the Neck module;

Step 2, after the network structure of the improved YOLOv8-seg segmentation detection model is built, training, verifying and testing the built improved YOLOv8-seg segmentation detection model structure using pictures with chip surface defects.

2. The training method according to claim 1, characterized in that the Backbone module is formed by connecting, in sequence, a Conv_maxpool network structure, a first ShuffleNetV2 network structure, a second ShuffleNetV2 network structure, a third ShuffleNetV2 network structure, a fourth ShuffleNetV2 network structure, a fifth ShuffleNetV2 network structure, a sixth ShuffleNetV2 network structure, an SPPF network structure and an EMA; wherein the fourth ShuffleNetV2 network structure, the sixth ShuffleNetV2 network structure and the SPPF network structure output the first effective feature layer, the second effective feature layer and the third effective feature layer respectively.

3. The training method according to claim 1, characterized in that different ShuffleNetV2 network structures are selected according to the input channel stride: when the input channel stride is 1, the input passes through a Channel Split layer and is split into two paths, one path being connected directly to the input of a Concat layer, and the other path being connected in sequence to a Conv layer, a DWConv layer and a Conv layer before being connected to the input of the Concat layer; the Concat layer concatenates the input tensors and feeds them into a Channel Shuffle layer, which outputs the rearranged feature map channels;

when the input channel stride is 2, the input is processed along two paths, one path comprising a DWConv layer and a Conv layer connected in sequence, and the other path comprising a Conv layer, a DWConv layer and a Conv layer connected in sequence; the outputs of both paths are connected to the input of the Concat layer, which concatenates the input tensors and feeds them into the Channel Shuffle layer, which outputs the rearranged feature map channels.

4. The training method according to claim 1, characterized in that the Neck module is composed of 2 Conv network structures, 4 C2f network structures, 2 upsampling layers, 4 Concat_SDI network structures and 3 EMA attention mechanism network structures; wherein the first upsampling layer, the first Concat_SDI network structure, the first C2f network structure, the second upsampling layer, the second Concat_SDI network structure, the second C2f network structure, the first EMA attention mechanism network structure, the first Conv network structure, the third Concat_SDI network structure, the third C2f network structure, the second EMA attention mechanism network structure, the second Conv network structure, the fourth Concat_SDI network structure, the fourth C2f network structure and the third EMA attention mechanism network structure are connected in sequence, and the first C2f network structure is also directly connected to the third Concat_SDI network structure.

5. The training method according to claim 4, characterized in that the multi-level feature fusion in the Neck module proceeds as follows: the Neck module receives the 3 effective feature layers output by the Backbone, wherein the first effective feature layer P1 output by the fourth ShuffleNetV2 network structure is fed into the second Concat_SDI network structure and fused at the same scale with the feature map N1 to be detected;

the second effective feature layer P2 output by the sixth ShuffleNetV2 network structure is fed into the first Concat_SDI network structure and fused at the same scale with the feature map N2 to be detected;

the third effective feature layer P3 output by the SPPF module is fed into the fourth Concat_SDI network structure and fused at the same scale with the feature map N3 to be detected.

6. The training method according to claim 1, characterized in that the EMA attention mechanism network structure comprises a groups-style branch and a Cross-spatial learning branch; in the groups-style branch, the attention weight descriptors of the grouped feature maps are extracted through three parallel routes; the groups-style branch not only encodes inter-channel information to adjust the importance of different channels, but also preserves precise spatial structure information within the channels.

7. The training method according to claim 6, characterized in that in the groups-style branch the attention weight descriptors of the grouped feature maps are extracted through three parallel routes, namely two parallel paths on 1x1 branches and one path on a 3x3 branch; for any given input feature map, the X component of the feature map is divided along one 1x1 branch into G sub-features to learn different semantics, and the Y component of the feature map undergoes feature learning along the other 1x1 branch; the outputs of the two 1x1 branches are processed by Concat and conv and then decomposed into two vectors, and two nonlinear Sigmoid functions are used to fit the two-dimensional binomial distribution after linear convolution, realizing different cross-channel interaction features; finally, the outputs of the two parallel 1x1 paths, together with the grouped output, are fed into Re-weight to re-weight the features; the feature map also undergoes feature learning along the 3x3 branch and is then fed into the Cross-spatial learning branch.

8. The training method according to claim 6, characterized in that in the Cross-spatial learning branch the output of the 1x1 branch and the output of the 3x3 branch are taken as tensors; the output of the 1x1 branch is connected in sequence to GroupNorm, Avg Pool, Softmax and Matmul, which outputs a vector of category probabilities; the output of the 3x3 branch is connected in sequence to Avg Pool, Softmax and Matmul, which outputs the probability of each category, and the GroupNorm is also connected to the Matmul in the 3x3 branch; finally, a nonlinear Sigmoid function fuses the outputs of the two branches into a probability value between 0 and 1, which, together with the grouped output, is fed into Re-weight to re-weight the features, finally yielding smoothed model parameters and features.

9. A chip surface defect segmentation model based on an improved YOLOv8-seg, characterized in that it is obtained by training with the training method according to claim 1.

10. A chip surface defect segmentation detection method based on an improved YOLOv8-seg, characterized by comprising the following steps:

Step 1, acquiring pictures of the surface defects of a chip to be identified with an industrial camera;

Step 2, inputting the image to be identified obtained in Step 1 into the chip surface defect segmentation model based on the improved YOLOv8-seg according to claim 9, performing segmentation prediction of chip surface defects on the image to be identified, and outputting information on chip surface defects in the industrial production process.
CN202411453719.0A | 2024-10-17 | 2024-10-17 | A chip surface defect segmentation model based on improved YOLOv8-seg and its training method and application | Pending | CN119399144A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202411453719.0A (published as CN119399144A) | 2024-10-17 | 2024-10-17 | A chip surface defect segmentation model based on improved YOLOv8-seg and its training method and application

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202411453719.0A (published as CN119399144A) | 2024-10-17 | 2024-10-17 | A chip surface defect segmentation model based on improved YOLOv8-seg and its training method and application

Publications (1)

Publication Number | Publication Date
CN119399144A (en) | 2025-02-07

Family

ID=94421594

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202411453719.0A | A chip surface defect segmentation model based on improved YOLOv8-seg and its training method and application (CN119399144A, Pending) | 2024-10-17 | 2024-10-17

Country Status (1)

Country | Link
CN (1) | CN119399144A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN119672029A (en)* | 2025-02-21 | 2025-03-21 | Anhui University | A classification method for surface defect detection of prefabricated components based on target detection


Similar Documents

Publication | Publication Date | Title

CN111210443B (en) A Deformable Convolutional Hybrid Task Cascade Semantic Segmentation Method Based on Embedding Balance
CN114119638B (en) Medical image segmentation method integrating multi-scale features and attention mechanisms
CN118134952B (en) Medical image segmentation method based on feature interaction
CN114612477B (en) Lightweight image segmentation method, system, medium, terminal and application
CN117253154B (en) Container weak and small serial number target detection and identification method based on deep learning
US20160358337A1 (en) Image semantic segmentation
CN110738207A (en) Character detection method for fusing character area edge information in character image
CN110059698A (en) The semantic segmentation method and system based on the dense reconstruction in edge understood for streetscape
CN118691815A (en) A high-quality automatic instance segmentation method for remote sensing images based on fine-tuning of the SAM large model
CN115393289A (en) Tumor image semi-supervised segmentation method based on integrated cross pseudo label
CN115082778B (en) Multi-branch learning-based homestead identification method and system
CN117237623B (en) A method and system for semantic segmentation of UAV remote sensing images
CN119399144A (en) A chip surface defect segmentation model based on improved YOLOv8-seg and its training method and application
CN118736226A (en) A method and system for dam crack segmentation based on a few samples of a universal segmentation model
CN114359554A (en) An Image Semantic Segmentation Method Based on Multi-receptive Field Contextual Semantic Information
CN118781077A (en) Tunnel disease detection method based on multi-scale feature pyramid
CN118470714A (en) A method, system, medium and electronic device for semantic segmentation of camouflaged objects based on decision-level feature fusion modeling
CN110633706B (en) Semantic segmentation method based on pyramid network
Song et al. Building footprint extraction from aerial images using an edge-aware YOLO-v8 network
CN116935051B (en) Polyp segmentation network method, system, electronic equipment and storage medium
CN114708591B (en) Chinese character detection method in document images based on single-word connection
CN116579952A (en) Image restoration method based on DU-GAN network
CN119180959B (en) Edge-assisted feature calibration method for real-time semantic segmentation
CN118982843B (en) A lightweight pedestrian detection method in harsh environments based on deep learning
Li et al. Pixel memory sharing-based multiscale features perception method for remote sensing images

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
