Disclosure of Invention
The invention mainly aims to provide a notebook appearance flaw segmentation method based on deep learning, which addresses the problems noted in the background art: at present, notebook appearance flaws are mostly detected manually, which requires a large amount of human resources and yields low detection efficiency, while traditional vision-based notebook appearance flaw detection algorithms are easily disturbed by factors such as the external environment, are difficult to design uniformly for different flaw characteristics, and suffer from low detection accuracy and poor generalization capability.
To achieve the above purpose, the invention adopts the following technical scheme:
a notebook appearance flaw segmentation method based on deep learning comprises the following steps:
step one: collecting training samples, making a data set, and training a deep learning model with the data set until convergence;
step two: collecting a target image, and segmenting the foreground and background of the image with the maximum inter-class variance method;
step three: performing connected-domain analysis to find the connected domain with the largest area, and cropping the image to the target size centred on that region as the input;
step four: modifying the structure of ResNet50, replacing the convolution modules of Res4 and Res5 with deformable convolution, keeping the parameters of the preceding layers fixed, and retraining the parameters of Res4 and the layers after Res4;
step five: performing K-Means clustering on the target boxes in the data set to obtain prior knowledge of the search box sizes;
step six: resizing the cropped image and feeding it into the deep learning model;
step seven: distinguishing the appearance flaws of the notebook computer with the deep learning model and outputting the inference result to an upper computer for display.
Further, the data set in the first step includes a plurality of sample images and label information corresponding to each sample image. The label information includes the category of the detection target in the image, a segmentation mask, and a framing position. The framing position can be represented as (x, y, w, h), where x is the abscissa of the target box, y is the ordinate of the target box, w is the width of the target box, and h is the height of the target box; the segmentation mask is the outline of the actual detection object within the target box.
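For concreteness, one sample's label could be stored in a structure such as the following (a minimal sketch; the file name, category and coordinate values are purely illustrative and not part of the invention):

```python
# hypothetical annotation for one sample image in the data set
annotation = {
    "image": "laptop_0001.png",      # sample image file
    "category": "scratch",           # class of the detection target
    "bbox": [412, 230, 96, 18],      # framing position (x, y, w, h) in pixels
    "segmentation": [[412, 236, 450, 230, 508, 234, 470, 246]],  # polygon outlining the defect
}
```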
Furthermore, the deformable convolution in the fourth step mainly adds learning of offsets in the x and y directions to the original convolution unit, so that the size and position of the convolution kernel are adjusted dynamically. The input of the deformable convolution is the feature map produced by a standard convolution; a convolution operation is then performed on this feature map to generate N two-dimensional offsets (Δx, Δy), which correct the sampling position of each point on the input feature map. Denoting the feature map by P, this gives P(x, y) = P(x + Δx, y + Δy); when x + Δx is fractional, P(x + Δx, y + Δy) is computed by bilinear interpolation. N feature maps are formed in this way, and N convolution kernels are applied to them one to one to obtain the output.
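As a brief illustration of the interpolation step (standard bilinear interpolation, not specific to this invention), when the shifted sampling position is fractional,

\[
P(x+\Delta x,\; y+\Delta y) \;=\; \sum_{i=0}^{1}\sum_{j=0}^{1} \bigl(1-\lvert u-i\rvert\bigr)\bigl(1-\lvert v-j\rvert\bigr)\, P\bigl(\lfloor x+\Delta x\rfloor + i,\; \lfloor y+\Delta y\rfloor + j\bigr),
\]

where u and v are the fractional parts of x + Δx and y + Δy, respectively.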
Further, before training the deep learning model, the sizes of the search boxes for the RPN candidate regions are set.
Further, the candidate boxes in step five have sizes of 32², 64² and 128², and their aspect ratios are 1:1, 1:3 and 3:1.
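For illustration, such values could be handed to an RPN anchor generator, for example torchvision's AnchorGenerator (shown here for a single feature-map level; the area values 32², 64² and 128² correspond to anchor side lengths of 32, 64 and 128 pixels, and this mapping is an assumption of the sketch):

```python
from torchvision.models.detection.rpn import AnchorGenerator

# side lengths 32, 64, 128 (areas 32^2, 64^2, 128^2) with aspect ratios 1:1, 1:3, 3:1
anchor_generator = AnchorGenerator(
    sizes=((32, 64, 128),),
    aspect_ratios=((1.0, 1.0 / 3.0, 3.0),),
)
```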
Compared with the prior art, the invention has the following beneficial effects:
1. when a sample image is processed, the foreground area is first cropped out with a traditional image algorithm and the cropped foreground image is then resized, which preserves image sharpness to the greatest extent compared with resizing the original image directly;
2. the network structures of Res4 and Res5 of ResNet50 are modified, with DCN replacing the ordinary convolution modules, which enhances the geometric transformation modelling capability of the model and reduces missed detections and false detections to a certain extent;
3. the sizes and aspect ratios of the RPN candidate boxes are optimized using prior knowledge, making them better suited to detecting notebook appearance flaws, further reducing missed detections and improving detection accuracy.
Detailed Description
The present invention will be further described with reference to the following detailed description. The drawings are for illustrative purposes only and are not intended to be limiting; certain elements may be omitted, enlarged or reduced in size and do not represent the actual dimensions of the product, so as to better illustrate the detailed description of the invention.
Example 1
As shown in fig. 1, a notebook appearance flaw segmentation method based on deep learning includes the following steps:
step one: collecting training samples, making a data set, and training a deep learning model with the data set until convergence;
step two: collecting a target image, and segmenting the foreground and background of the image with the maximum inter-class variance method;
step three: performing connected-domain analysis to find the connected domain with the largest area, and cropping the image to the target size centred on that region as the input;
step four: modifying the structure of ResNet50, replacing the convolution modules of Res4 and Res5 with deformable convolution, keeping the parameters of the preceding layers fixed, and retraining the parameters of Res4 and the layers after Res4;
step five: performing K-Means clustering on the target boxes in the data set to obtain prior knowledge of the search box sizes;
step six: resizing the cropped image and feeding it into the deep learning model;
step seven: distinguishing the appearance flaws of the notebook computer with the deep learning model and outputting the inference result to an upper computer for display.
In the second step, the foreground area is cropped out with a traditional image algorithm and the cropped foreground image is then resized, which preserves image sharpness to the greatest extent compared with resizing the original image directly.
In the third step, a larger image to be detected increases detection accuracy but slows detection down and consumes somewhat more GPU memory, so the size can be reduced appropriately to strike a balance between detection accuracy and detection speed.
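A minimal sketch of steps two, three and six using OpenCV is given below; the crop size of 1024 pixels, the model input size of 800 pixels and the fallback to the full image when no foreground is found are assumptions made for illustration only:

```python
import cv2
import numpy as np


def crop_foreground(image_bgr, crop_size=1024, model_input=800):
    """Steps two, three and six: Otsu segmentation, largest connected component,
    centred crop, then resize to the model input size."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # maximum inter-class variance method = Otsu's threshold
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # connected-domain analysis; label 0 is the background
    num, _, stats, centroids = cv2.connectedComponentsWithStats(mask, connectivity=8)
    if num <= 1:
        crop = image_bgr                      # no foreground found: keep the full image
    else:
        largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
        cx, cy = centroids[largest]
        h, w = gray.shape
        x0 = int(np.clip(cx - crop_size / 2, 0, max(w - crop_size, 0)))
        y0 = int(np.clip(cy - crop_size / 2, 0, max(h - crop_size, 0)))
        crop = image_bgr[y0:y0 + crop_size, x0:x0 + crop_size]
    # step six: resize the cropped image before feeding the deep learning model
    return cv2.resize(crop, (model_input, model_input), interpolation=cv2.INTER_LINEAR)
```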
Example 2
As shown in fig. 1, a notebook appearance flaw segmentation method based on deep learning includes the following steps:
step one: collecting training samples, making a data set, and training a deep learning model with the data set until convergence;
step two: collecting a target image, and segmenting the foreground and background of the image with the maximum inter-class variance method;
step three: performing connected-domain analysis to find the connected domain with the largest area, and cropping the image to the target size centred on that region as the input;
step four: modifying the structure of ResNet50, replacing the convolution modules of Res4 and Res5 with deformable convolution, keeping the parameters of the preceding layers fixed, and retraining the parameters of Res4 and the layers after Res4;
step five: performing K-Means clustering on the target boxes in the data set to obtain prior knowledge of the search box sizes;
step six: resizing the cropped image and feeding it into the deep learning model;
step seven: distinguishing the appearance flaws of the notebook computer with the deep learning model and outputting the inference result to an upper computer for display.
The deformable convolution in the fourth step mainly adds learning of offsets in the x and y directions to the original convolution unit, so that the size and position of the convolution kernel are adjusted dynamically. The input of the deformable convolution is the feature map produced by a standard convolution; a convolution operation is then performed on this feature map to generate N two-dimensional offsets (Δx, Δy), which correct the sampling position of each point on the input feature map. Denoting the feature map by P, this gives P(x, y) = P(x + Δx, y + Δy); when x + Δx is fractional, P(x + Δx, y + Δy) is computed by bilinear interpolation. N feature maps are formed in this way, and N convolution kernels are applied to them one to one to obtain the output;
meanwhile, the network structures of Res4 and Res5 of ResNet50 are modified, with DCN replacing the ordinary convolution modules, which enhances the geometric transformation modelling capability of the model and reduces missed detections and false detections to a certain extent.
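A minimal sketch of this modification with PyTorch and torchvision (torchvision.ops.DeformConv2d) follows; the wrapper class, the zero-initialised offset branch, the use of pretrained ImageNet weights and the freezing policy are illustrative choices rather than the only possible implementation:

```python
import torch.nn as nn
import torchvision
from torchvision.ops import DeformConv2d


class DeformableConv(nn.Module):
    """Wrap a pretrained 3x3 convolution: an extra branch predicts per-location
    (dx, dy) offsets, and DeformConv2d samples the input at the shifted points."""

    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        kh, kw = conv.kernel_size
        # 2 offsets (dx, dy) for each of the kh*kw kernel sampling locations
        self.offset = nn.Conv2d(conv.in_channels, 2 * kh * kw,
                                kernel_size=conv.kernel_size,
                                stride=conv.stride, padding=conv.padding)
        nn.init.zeros_(self.offset.weight)   # start out behaving like an ordinary convolution
        nn.init.zeros_(self.offset.bias)
        self.dcn = DeformConv2d(conv.in_channels, conv.out_channels,
                                kernel_size=conv.kernel_size,
                                stride=conv.stride, padding=conv.padding,
                                bias=conv.bias is not None)
        self.dcn.weight.data.copy_(conv.weight.data)  # reuse the pretrained weights

    def forward(self, x):
        return self.dcn(x, self.offset(x))


backbone = torchvision.models.resnet50(weights="IMAGENET1K_V1")

# replace the 3x3 convolution of every bottleneck in Res4 (layer3) and Res5 (layer4)
for stage in (backbone.layer3, backbone.layer4):
    for block in stage:
        block.conv2 = DeformableConv(block.conv2)

# freeze everything before Res4; only Res4, Res5 and the head are retrained
for name, param in backbone.named_parameters():
    param.requires_grad = name.startswith(("layer3", "layer4", "fc"))
```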
Example 3
As shown in fig. 1, a notebook appearance flaw segmentation method based on deep learning includes the following steps:
step one: collecting training samples, making a data set, and training a deep learning model with the data set until convergence;
step two: collecting a target image, and segmenting the foreground and background of the image with the maximum inter-class variance method;
step three: performing connected-domain analysis to find the connected domain with the largest area, and cropping the image to the target size centred on that region as the input;
step four: modifying the structure of ResNet50, replacing the convolution modules of Res4 and Res5 with deformable convolution, keeping the parameters of the preceding layers fixed, and retraining the parameters of Res4 and the layers after Res4;
step five: performing K-Means clustering on the target boxes in the data set to obtain prior knowledge of the search box sizes;
step six: resizing the cropped image and feeding it into the deep learning model;
step seven: distinguishing the appearance flaws of the notebook computer with the deep learning model and outputting the inference result to an upper computer for display.
The data set in the first step includes a plurality of sample images and label information corresponding to each sample image. The label information includes the category of the detection target in the image, a segmentation mask, and a framing position. The framing position can be represented as (x, y, w, h), where x is the abscissa of the target box, y is the ordinate of the target box, w is the width of the target box, and h is the height of the target box; the segmentation mask is the outline of the actual detection object within the target box.
Before training the deep learning model, the sizes of the search boxes for the RPN candidate regions are set. The invention applies K-Means clustering to the sizes of the target boxes in the data set to select suitable candidate box sizes and aspect ratios. Obtaining this prior knowledge of the search box sizes by K-Means clustering and then setting the search box parameters accordingly effectively avoids the loss of detection accuracy caused by a mismatch between the search box sizes and the sizes of the actual defects.
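A minimal sketch of deriving this prior with scikit-learn is shown below; the mapping from cluster centres to anchor scales (area-equivalent side length) and aspect ratios (height over width) is one reasonable convention assumed for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans


def anchor_prior_from_boxes(boxes_wh, k=3):
    """Cluster the (width, height) pairs of the labelled target boxes.
    Returns anchor scales and aspect ratios as the prior for the RPN search boxes."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(boxes_wh)
    centres = km.cluster_centers_                     # shape (k, 2): mean (w, h) per cluster
    scales = np.sqrt(centres[:, 0] * centres[:, 1])   # side length of a square of equal area
    ratios = centres[:, 1] / centres[:, 0]            # aspect ratio = height / width
    return np.sort(scales), np.sort(ratios)


# boxes_wh: an (N, 2) array of (w, h) values taken from the data-set annotations
# scales, ratios = anchor_prior_from_boxes(boxes_wh)
```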
The foregoing shows and describes the general principles, main features and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the embodiments and the description merely illustrate the principle of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.