Disclosure of Invention
To solve the above problems, the invention provides a glass panel surface defect detection method based on small-sample learning. The method achieves efficient and accurate detection of glass panel surface defects when only a small number of defect sample images are available, the annotations are imprecise, and many of the targets are small. The method comprises the following steps:
S1: collecting a small number of defective glass panel images, and annotating the bounding boxes and defect categories;
S2: preprocessing the glass panel images and expanding their number to construct a glass panel surface defect detection data set;
S3: constructing a defect detection network for identifying and localizing defects in the glass panel image, wherein the defect detection network comprises a backbone feature extraction network, an improved region proposal network (RPN), an ROI pooling layer, a global ROI extraction layer, a bounding-box regression network and a classification network, and wherein the backbone feature extraction network is composed of ResNet101 and a feature pyramid network;
the backbone feature extraction network is used to extract global features of the image; the improved RPN and the ROI pooling layer are used to extract candidate region features of the image; the global ROI extraction layer is used to fuse the global features with the candidate region features and update the candidate region features; and the bounding-box regression network and the classification network are used to generate the localization bounding box and the defect category from the updated candidate region features;
S4: training the constructed defect detection network on the glass panel surface defect detection data set to obtain a trained defect detection model;
S5: performing defect detection on a glass panel image with the trained defect detection model, and outputting each defect's localization bounding box and the defect category to which it belongs.
Further, the backbone feature extraction network comprises ResNet101 and a feature pyramid network;
ResNet101 comprises one convolution layer and four residual blocks connected in sequence from bottom to top; the output of each residual block is connected to the corresponding layer of the feature pyramid network, which is traversed from top to bottom, and the output of each layer of the feature pyramid network undergoes L2 regularization processing to form the multi-scale feature maps.
The improved RPN works as follows. The multi-scale feature maps output by the feature pyramid network are taken as the input of the RPN, and p anchor boxes are generated on the feature map of each scale. The widths and heights of the anchor boxes are obtained by clustering: the annotation boxes in the training set are clustered into p × q classes using the k-means method, where q is the number of scales of the multi-scale feature maps; p × q anchor boxes are generated, one per cluster center, and sorted by area, so that the feature map of each scale corresponds to p anchor boxes. Positive and negative samples are selected automatically by an adaptive training sample selection method and used for classification and bounding-box regression training. The RPN performs bounding-box regression and scoring on the anchor boxes to obtain initial candidate regions, which are filtered by non-maximum suppression (NMS) to obtain the final candidate regions.
The candidate regions generated by the improved RPN are input into the ROI pooling layer, which extracts the candidate region features from the feature map of the corresponding scale output by the feature pyramid network and unifies their size. The multi-scale feature maps output by the feature pyramid network are taken as the global features; the candidate region features and the average-pooled global features are processed with an attention mechanism, using the candidate region features as a mask, to generate background features; the background features and the candidate region features are then dynamically fused to obtain the updated candidate region features.
The updated candidate region features are fed into the bounding-box regression network and the classification network. The total loss of the defect detection network comprises a classification loss and a bounding-box loss; the classification loss uses the cross-entropy function and the bounding-box loss uses the smooth L1 function.
Further, the positive and negative samples are obtained by the adaptive training sample selection method: first, samples whose IOU is below a threshold are filtered out according to the statistical characteristics of the IOU between the anchor boxes and the annotation box; then, for each remaining anchor box, it is judged whether its center falls inside the annotation box; if so, it is a positive sample, and otherwise it is a negative sample.
The invention has the following beneficial effects. When only a small number of defective glass panel images can be obtained, contrast-limited adaptive histogram equalization is used to improve image contrast, noise is added to simulate out-of-focus images, and image blocks are randomly sampled subject to an IOU constraint between the annotation boxes and the image blocks to expand the small sample set; data augmentation, transfer learning and L2 regularization together alleviate the small-sample problem. Random jitter is applied to the annotation boxes to increase their diversity and make the model more robust to imprecise annotation. The feature pyramid fuses low-level structural information with high-level semantic information to generate multi-scale features, which improves detection accuracy for small defects such as glass panel pinholes. The candidate region features are fused with the features of the whole image, introducing background information into the candidate region features, which helps classify and regress candidate boxes that are smaller than the annotation box. Finally, the weight of each sample in the loss is adjusted according to that sample's influence on mAP, which improves the overall performance of the model.
Detailed Description
The invention is further described below with reference to the drawings and specific embodiments. The technical features of the embodiments may be combined with one another provided they do not conflict.
The overall flow chart of the glass panel surface defect detection method based on small sample learning disclosed by the invention is shown in fig. 1, and the specific implementation process is as follows:
(1) A small number of defective glass panel images are collected. Each image is annotated with the labeling software labelImg to generate an xml file that contains the defect bounding boxes and defect categories; the defect categories are bubbles, tin ash, pinholes and scratches.
(2) The glass panel images are preprocessed and their number is expanded to construct the glass panel surface defect detection data set.
The method comprises the following specific steps:
a. Preprocessing includes contrast-limited adaptive histogram equalization (CLAHE), noise addition, and the like.
Because the acquired glass panel images are dark, CLAHE is used to enhance image contrast, with the contrast clip limit set to 4 and the histogram equalization grid size set to 8.
The CLAHE procedure is as follows: the image is first converted from the RGB color space to the LAB color space and divided into 8 × 8 rectangular blocks of equal size; the histogram of the luminance channel (L channel) of each block is computed; in each histogram, any gray-level count that exceeds the threshold is clipped and the excess is distributed evenly over all gray levels; each histogram is then equalized, and the equalized luminance value at the center point of each block is computed; the luminance of every pixel is obtained by bilinear interpolation between the center points of the adjacent blocks; finally, the interpolated image is converted from the LAB color space back to the RGB color space.
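The clipping and equalization steps for a single rectangular block can be sketched in Python as follows. This is a minimal illustration rather than the embodiment's implementation: the function names are hypothetical, the clip limit is treated as an absolute count per gray level (an assumption; image libraries often scale it by the block size), and the color-space conversion and the bilinear interpolation between blocks are omitted.

```python
def clip_histogram(hist, clip_limit):
    """Clip each bin at clip_limit and spread the excess evenly over all bins.

    Single-pass simplification: a bin may end up slightly above the limit
    again after the excess has been redistributed.
    """
    excess = sum(max(h - clip_limit, 0) for h in hist)
    clipped = [min(h, clip_limit) for h in hist]
    share, remainder = divmod(excess, len(hist))
    # Every bin receives the same share; the first `remainder` bins get one more.
    return [c + share + (1 if i < remainder else 0) for i, c in enumerate(clipped)]


def equalization_lut(hist, levels=256):
    """Map each gray level through the normalized cumulative histogram."""
    total = sum(hist)
    lut, running = [], 0
    for count in hist:
        running += count
        lut.append(round((levels - 1) * running / total))
    return lut


def equalize_tile(tile, clip_limit, levels=256):
    """Equalize one block of integer luminance values (a list of rows)."""
    hist = [0] * levels
    for row in tile:
        for value in row:
            hist[value] += 1
    lut = equalization_lut(clip_histogram(hist, clip_limit), levels)
    return [[lut[value] for value in row] for row in tile]
```

Clipping before equalization bounds the slope of the cumulative histogram, which is what keeps noise in nearly uniform regions of the panel from being over-amplified.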
Gaussian noise or salt-and-pepper noise is added at random to simulate out-of-focus images.
b. The number expansion is achieved by random sampling.
Each defective glass panel image is randomly cropped to obtain a plurality of image blocks, each of which contains one or more bounding boxes with defect categories.
In this embodiment, when a defective glass panel image is cropped, a set of thresholds {0.1, 0.3, 0.5, 0.7, 0.9} is defined for the IOU between an image block and a bounding box. Several rounds of cropping are performed, taking the thresholds in ascending order, and in each round the IOU between the image block and the bounding box with a defect category that it contains must be greater than the current threshold. For example, the threshold is first set to 0.1 and image blocks are randomly sampled within the defective glass panel image to extend the data set; after that round of sampling, the threshold is set to 0.3 and the process is repeated.
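The multi-round cropping can be sketched as below. This is an illustration under stated assumptions, not the embodiment's code: boxes are (x1, y1, x2, y2) tuples, a crop is accepted when its IOU with at least one annotated box exceeds the current threshold, and the crop size range, `crops_per_round` and `max_tries` are hypothetical parameters that the text does not specify.

```python
import random


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def sample_crops(img_w, img_h, boxes, thresholds=(0.1, 0.3, 0.5, 0.7, 0.9),
                 crops_per_round=2, max_tries=5000, rng=None):
    """Run one sampling round per threshold, in ascending order."""
    rng = rng or random.Random(0)
    crops = []
    for thr in sorted(thresholds):
        found = 0
        for _ in range(max_tries):
            if found == crops_per_round:
                break
            # Assumed crop size range: 10% to 100% of each image side.
            w = rng.uniform(0.1, 1.0) * img_w
            h = rng.uniform(0.1, 1.0) * img_h
            x1 = rng.uniform(0, img_w - w)
            y1 = rng.uniform(0, img_h - h)
            crop = (x1, y1, x1 + w, y1 + h)
            if any(iou(crop, b) > thr for b in boxes):
                crops.append((thr, crop))
                found += 1
    return crops
```

High thresholds rarely succeed for small defects, so a round may return fewer crops than requested; the `max_tries` cap keeps the loop from running indefinitely in that case.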
After cropping, random jitter is applied to the annotation boxes to increase their diversity and thereby improve the adaptability and robustness of the model: the center point of each annotation box is kept unchanged, and the scaling factors for its width and height are sampled uniformly between 0.9 and 1.1. The annotation boxes are transformed together with the images during data augmentation, and a final manual check ensures that the augmented images and annotation boxes are realistic and satisfy the construction principles of the data set.
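A minimal sketch of the annotation box jitter, assuming boxes are stored as (x1, y1, x2, y2) and that width and height are scaled independently (the text does not say whether the two factors are shared); `jitter_box` is a hypothetical name.

```python
import random


def jitter_box(box, low=0.9, high=1.1, rng=random):
    """Rescale width and height about a fixed center point."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    # Independent uniform scale factors for width and height (assumption).
    w = (x2 - x1) * rng.uniform(low, high)
    h = (y2 - y1) * rng.uniform(low, high)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

In practice the jittered box would also be clipped to the image block boundaries, a step left out here.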
(3) A glass panel surface defect detection network model is constructed for identifying and localizing defects in glass panel images. As shown in fig. 2, the defect detection network model comprises a backbone feature extraction network composed of ResNet101 and a feature pyramid network, an improved RPN, an ROI pooling layer, a global ROI extraction layer, a bounding-box regression network, and a classification network.
The following describes the structure of each sub-network included in the glass panel surface defect detection network model.
(3.1) Backbone feature extraction network
The backbone feature extraction network is composed of ResNet101 and a feature pyramid network. In this embodiment, ResNet101 is initialized with weights pre-trained on the classification task of the large-scale ImageNet data set, so that transfer learning mitigates overfitting of the deep model on a small data set.
ResNet101 comprises one convolution layer and 4 residual blocks. The feature pyramid network fuses the feature outputs of the 4 residual blocks, combining low-level structural information with high-level semantic information, which strengthens the feature representation and improves small-target detection. The output of the feature pyramid network consists of 5 levels of feature maps, and the feature map of each branch undergoes L2 regularization processing to form the multi-scale feature maps.
(3.2) Improved RPN
The multi-scale features output by the feature pyramid network are input into the improved RPN to extract candidate regions. Specifically, the multi-scale feature maps output by the feature pyramid network are taken as the input of the RPN, and 3 anchor boxes are generated on the feature map of each scale. The widths and heights of the anchor boxes are obtained by clustering: the k-means method automatically generates a set of prior sizes that suits the data set better. Because a higher IOU between the anchor boxes and the annotation boxes improves the detection accuracy of the model, the IOU value is used as the clustering criterion, and the distance metric is as follows:
$$d(\text{box}, \text{centroid}) = 1 - \mathrm{IOU}(\text{box}, \text{centroid})$$
where $\mathrm{IOU}(\text{box}, \text{centroid})$ denotes the intersection over union of an annotation box and a cluster-center box.
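The clustering with this distance can be sketched as follows. The sketch makes several assumptions that the text does not state: each box is reduced to a (width, height) pair, the IOU is computed as if box and centroid shared a corner, centroids are updated with the cluster mean, and initial centroids are drawn at random from the data; `wh_iou` and `kmeans_anchors` are hypothetical names.

```python
import random


def wh_iou(a, b):
    """IOU of two (width, height) boxes aligned at a common corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(sizes, k, iters=100, seed=0):
    """Cluster (width, height) pairs using d = 1 - IOU; return centroids by area."""
    rng = random.Random(seed)
    centroids = rng.sample(sizes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for s in sizes:
            # Assign to the centroid with the smallest 1 - IOU distance.
            j = min(range(k), key=lambda c: 1 - wh_iou(s, centroids[c]))
            clusters[j].append(s)
        updated = []
        for j, members in enumerate(clusters):
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(centroids[j])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids, key=lambda c: c[0] * c[1])
```

Unlike Euclidean distance on width and height, the 1 - IOU distance does not let large boxes dominate the clustering error, which matters when pinhole-sized and scratch-sized defects share one data set.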
Positive and negative samples are selected automatically by the adaptive training sample selection method and used for classification and bounding-box regression training. The RPN performs bounding-box regression and scoring on the anchor boxes to obtain initial candidate regions. The candidate regions that remain after non-maximum suppression (NMS) are sent to the ROI pooling layer for feature extraction, and the feature size is unified to 7 × 7.
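The NMS filtering step can be illustrated with the standard greedy procedure below. The IOU threshold of 0.7 is an assumed default, since the text does not give the value used in the embodiment, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_thr=0.7):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a higher-scoring kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep
```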
In this embodiment, the annotation boxes in the training set are clustered into 15 classes, and 15 anchor boxes are generated, one per cluster center. A high-resolution feature map contains finer structural features and is suited to detecting small targets, whereas a low-resolution feature map has a larger receptive field and is suited to detecting large targets. The generated anchor boxes are therefore sorted by area, the anchor boxes with small areas are used as the prior boxes of the high-resolution feature maps, and 3 anchor boxes are assigned to each feature map.
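The assignment of sorted anchor boxes to pyramid levels can be sketched as follows, assuming anchors are (width, height) pairs and that level 0 denotes the highest-resolution feature map; `assign_anchors` is a hypothetical name.

```python
def assign_anchors(anchors, num_levels=5, per_level=3):
    """Sort anchors by area and give each pyramid level `per_level` of them.

    Index 0 of the result is the highest-resolution level, which receives
    the smallest anchors.
    """
    if len(anchors) != num_levels * per_level:
        raise ValueError("expected num_levels * per_level anchors")
    ordered = sorted(anchors, key=lambda a: a[0] * a[1])
    return [ordered[i * per_level:(i + 1) * per_level] for i in range(num_levels)]
```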
In the common method for selecting positive and negative samples, the IOU between each anchor box and the annotation box is computed and compared with a preset threshold to decide whether the anchor box is a positive or a negative sample. The drawback is evident: the result depends heavily on the design of the anchor box widths and heights and on the choice of threshold. The invention instead uses an adaptive training sample selection method: anchor boxes with small IOU are first filtered out according to the statistical characteristics of the IOU between the anchor boxes and the annotation box, and positive and negative samples are then determined by whether the center of the anchor box falls inside the annotation box, which largely overcomes this drawback.
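One possible reading of this selection rule is sketched below. The text does not specify which IOU statistic sets the threshold; the sketch assumes the mean plus one standard deviation of the IOUs, as in the published ATSS method, and it labels the filtered-out anchor boxes as negatives, which is also an assumption. `select_samples` is a hypothetical name.

```python
from statistics import mean, pstdev


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def select_samples(anchors, gt_box):
    """Label each anchor 1 (positive) or 0 (negative) for one annotation box."""
    ious = [iou(a, gt_box) for a in anchors]
    # Assumed statistic: threshold = mean + standard deviation of the IOUs.
    thr = mean(ious) + pstdev(ious)
    labels = []
    for anchor, value in zip(anchors, ious):
        cx, cy = (anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2
        inside = gt_box[0] <= cx <= gt_box[2] and gt_box[1] <= cy <= gt_box[3]
        labels.append(1 if value >= thr and inside else 0)
    return labels, thr
```

Because the threshold is derived from the IOU distribution of each annotation box, no fixed IOU cut-off has to be tuned for the anchor sizes in use.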
(3.3) ROI pooling layer
The candidate regions generated by the improved RPN are input into the ROI pooling layer, which extracts the candidate region features from the feature map of the corresponding scale output by the feature pyramid network and unifies their size to 7 × 7.
(3.4) Global ROI extraction layer
The global ROI extraction layer extracts the features of the whole image and fuses the global features with the candidate region features through a residual structure. This introduces background information into the candidate region features, which helps classify and regress candidate boxes that are smaller than the annotation box.
In this embodiment, the global ROI extraction layer adopts a residual structure, whose side branch helps avoid network degradation; the specific structure is shown in fig. 3. The global features are first reduced to a resolution of 7 × 7 by adaptive average pooling. The candidate region features X_pro are then used as a mask, based on an attention mechanism, to refine the pooled global features into the background features X_bg. Simply adding the two features position by position would treat them as equally important, whereas the background features should play only an auxiliary role; a dynamic fusion strategy is therefore used to fuse the candidate region features X_pro and the background features X_bg, with the weights w_pro and w_bg both being learnable parameters. Finally, the updated candidate region features are output through a convolution layer.
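Two of these steps, the adaptive average pooling to 7 × 7 and the weighted fusion, can be sketched on a single-channel map as below. This is a partial illustration only: the attention-based masking and the final convolution are omitted because the text does not give their exact form, the pooling windows follow the usual adaptive-pooling convention (an assumption), and w_pro and w_bg are passed in as plain numbers rather than learned.

```python
import math


def adaptive_avg_pool2d(fmap, out_size=7):
    """Average-pool a 2-D map (list of rows) down to out_size x out_size."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(out_size):
        r0, r1 = (i * h) // out_size, math.ceil((i + 1) * h / out_size)
        row = []
        for j in range(out_size):
            c0, c1 = (j * w) // out_size, math.ceil((j + 1) * w / out_size)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out


def fuse(x_pro, x_bg, w_pro, w_bg):
    """Weighted element-wise fusion: w_pro * X_pro + w_bg * X_bg."""
    return [[w_pro * p + w_bg * b for p, b in zip(row_p, row_b)]
            for row_p, row_b in zip(x_pro, x_bg)]
```

With separate weights, training can keep w_bg small so that the background features support the candidate region features without overwhelming them.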
(3.5) Bounding-box regression network and classification network
The updated candidate region features are fed into the bounding-box regression network and the classification network to detect the surface defects of the glass panel.
The total loss of the detection network comprises a classification loss and a bounding-box loss. The classification loss uses the cross-entropy function and the bounding-box loss uses the smooth L1 function. The total loss weights the prediction boxes by importance, according to how each prediction box affects the mean average precision (mAP).
The importance weighting of the prediction boxes is as follows. The classification weights are adjusted to account for each sample's influence on mAP: for a positive sample, the larger its IOU with the annotation box, the larger the classification weight $w_i$; for a negative sample, the larger its score as a positive sample (i.e., the probability of being classified as a defect), the larger the classification weight $w_j$. The bounding-box regression weights are adjusted as well: for a positive sample, the larger the score predicted for a given defect class, the larger the bounding-box regression weight $c_i$.
The loss function is as follows:
$$L = \lambda L_{cls} + \mu L_{loc}$$
where $L$ is the total loss, and $\lambda$ and $\mu$ are parameters that balance the classification loss $L_{cls}$ and the bounding-box loss $L_{loc}$;
the classification loss is as follows:
[equation image not available]
where $L_{cls}$ is the classification loss; cross_entropy(·) is the cross-entropy function; $n$ is the number of positive samples; $m$ is the number of negative samples; $s_i$ is the prediction score of the $i$-th positive sample, whose target is its true defect class; $s_j$ is the prediction score of the $j$-th negative sample, whose target is the category of that negative sample (i.e., the background category); $w_i$ and $w_j$ are weights; $\beta$ and $\gamma$ are hyperparameters; $r_i$ is the rank of the $i$-th positive sample and $r_j$ is the rank of the $j$-th negative sample, where positive samples are ranked by their IOU with the annotation box (the larger the IOU, the smaller the rank) and negative samples are ranked by their probability of being predicted as a defect category (the larger the probability, the smaller the rank); and $n_{max}$ is the total number of the classes to which the samples belong;
the bounding-box loss is as follows:
[equation image not available]
where $L_{loc}$ is the bounding-box loss; smoothL1(·) is the smooth L1 function; $n$ is the number of positive samples; $d_i$ is the predicted bounding-box offset of the $i$-th positive sample, which is regressed toward the offset between the annotation box and the prediction bounding box; $c_i$ is a weight; $b$ and $k$ are hyperparameters; and $p_i$ is the prediction score of the true defect class for the prediction box of the $i$-th positive sample. Thus, for a positive sample, the larger the score predicted for a given defect class, the larger its weight in the loss.
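The parts of the loss that the text does specify can be sketched as follows: the ranking of samples, the cross-entropy and smooth L1 terms, and the combination $L = \lambda L_{cls} + \mu L_{loc}$. The formulas that turn ranks and scores into the weights $w_i$, $w_j$ and $c_i$ are not given in the text, so the sketch takes the weights as inputs; the 1-based ranks, the smooth L1 transition point of 1 and the averaging over samples are assumptions, and all function names are hypothetical.

```python
import math


def rank_descending(values):
    """1-based ranks: the largest value gets rank 1 (e.g. positives ranked by IOU)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def cross_entropy(probs, target):
    """Negative log-probability assigned to the target class."""
    return -math.log(probs[target])


def smooth_l1(x):
    """Quadratic near zero, linear elsewhere (transition at |x| = 1)."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def total_loss(cls_probs, cls_targets, cls_weights,
               box_preds, box_targets, box_weights, lam=1.0, mu=1.0):
    """L = lam * L_cls + mu * L_loc, with externally supplied sample weights."""
    l_cls = sum(w * cross_entropy(p, t)
                for p, t, w in zip(cls_probs, cls_targets, cls_weights)) / len(cls_probs)
    l_loc = sum(c * sum(smooth_l1(d - g) for d, g in zip(pred, tgt))
                for pred, tgt, c in zip(box_preds, box_targets, box_weights)) / len(box_preds)
    return lam * l_cls + mu * l_loc
```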
The model is trained on the glass panel surface defect detection data set. The initial learning rate is 0.004, the learning rate is varied with a cyclical schedule and cosine annealing, training runs for 50 epochs, the momentum is set to 0.9, and the weight decay is set to 0.0001.
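For illustration, a single cosine annealing cycle over the 50 epochs can be computed as below. The minimum learning rate of 0 and the use of one cycle with no restarts are assumptions; the text does not specify how the cyclical schedule and the cosine annealing are combined.

```python
import math


def cosine_lr(epoch, total_epochs=50, base_lr=0.004, min_lr=0.0):
    """Cosine-annealed learning rate for a given epoch (0 <= epoch <= total_epochs)."""
    cosine = 1 + math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * cosine
```

Under these assumptions the rate starts at 0.004, passes 0.002 at epoch 25 and decays to 0 at epoch 50.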
In practical application, the trained defect detection model is used to detect defects in glass panel images and to output the defect bounding boxes and defect categories. Fig. 5 shows the detection results for the four defect types; because the inspected images are large, only local regions are shown.
In this embodiment, the defect detection results are evaluated with mAP, precision and recall; a higher mAP indicates better defect detection on the glass panel. The defect detection results of the algorithm of this embodiment are shown in fig. 4, and a comparison with the conventional Faster R-CNN is given in Table 1. Table 1 shows that the method of this embodiment outperforms the conventional Faster R-CNN in mAP (85.1% versus 72.8%), precision (82.2% versus 73.1%) and recall (82.8% versus 70.1%).
Table 1. Detection results of the conventional Faster R-CNN and the algorithm of this embodiment on the glass panel defect data set
| Detection network | mAP (%) | Precision (%) | Recall (%) |
| --- | --- | --- | --- |
| Conventional Faster R-CNN | 72.8 | 73.1 | 70.1 |
| The method of the invention | 85.1 | 82.2 | 82.8 |
In conclusion, when only a small number of defective glass panel images can be obtained, contrast-limited adaptive histogram equalization is used to improve image contrast, noise is added to simulate out-of-focus images, and image blocks are randomly sampled subject to an IOU constraint between the annotation boxes and the image blocks to expand the small sample set; data augmentation, transfer learning and L2 regularization together alleviate the small-sample problem. Random jitter is applied to the annotation boxes to increase their diversity and make the model more robust to imprecise annotation. The feature pyramid fuses low-level structural information with high-level semantic information to generate multi-scale features, which improves detection accuracy for small defects such as glass panel pinholes. The candidate region features are fused with the whole-image features, introducing background information into the candidate region features, which helps classify and regress candidate boxes that are smaller than the ground-truth box. Finally, the weight of each sample in the loss is adjusted according to that sample's influence on mAP, which improves the overall performance of the model.