Unsupervised data enhancement method for surface defect images of hot rolled coils

Technical Field
The invention belongs to the technical field of metallurgical hot rolled coil surface defect detection, and relates to a method for generating surface defect image data of hot rolled coils based on an artificial intelligence network model.
Background
In an ever-evolving industrial era, steel has become an industrial "grain" and one of the most important materials of modern times. In the actual production of steel coils in steel mills, however, hot rolled coils with surface defects are often produced owing to equipment problems and processing techniques. If the surface defects of a coil cannot be effectively detected and identified, the quality, appearance, economic value and usability of the coil as a steel product are seriously affected, so research on coil surface defect identification has great practical significance. Early defect detection relied on visual inspection by quality control staff, which is labor-intensive, inefficient and time-consuming. Machine vision based on traditional machine learning was later adopted for defect detection; although the recognition rate improved, results still required manual judgment, and features had to be designed by hand through feature engineering, making the approach cumbersome. In recent years, the rise of deep learning has brought a qualitative leap to image recognition. Deep learning grew out of research on artificial neural networks; a deep neural network contains multiple hidden layers. Deep learning forms more abstract high-level representations, attribute categories or features by combining low-level features, thereby discovering distributed feature representations of the data. It requires no manual feature design and has a simple pipeline: an image is input at one end of the network and a prediction result is output at the other end. The network automatically learns image texture features from large amounts of data, combines and abstracts them, and can better describe the rich internal information of the data. Compared with traditional networks its learning capacity is improved, which in turn improves the accuracy of identifying surface defect images of hot rolled coils.
Using deep learning to detect surface defects of hot rolled coils first requires training a corresponding neural network model, which usually relies on massive labeled data, so the requirement on training samples is very high. The larger the data volume, the better the generalization performance of the trained model and the higher the accuracy of identifying hot rolled coil surface defects. However, collecting surface defect data of hot rolled coils in the metallurgical steel rolling field is very difficult, which makes such industrial data a precious resource. Defects appear on the surfaces of hot rolled coils at random, and after collection, steel rolling experts must spend a great deal of time labeling the defect data, which is time-consuming and labor-intensive. When the data are fed into a deep neural network, the coil picture data and their features largely determine the upper limit of the final recognition result; the selection and optimization of the model algorithm only gradually approach that upper limit, so data quality and quantity play a critical role in the final training performance of the model. Data enhancement technology has developed against this background. Data enhancement, also called data augmentation, transforms the existing original data through a series of processing steps, without collecting additional data, so as to generate more new data, extract more value from limited data, and at the same time improve the recognition accuracy and generalization capability of a classification model.
An unsupervised learning model is an artificial intelligence model that automatically discovers the underlying rules of its input examples from those examples alone. Unsupervised learning learns the essential characteristics of real data, thereby describing the distribution of the sample data and generating new data similar to the training samples. Because the parameters of the model are far fewer than the amount of training data, the model must discover and effectively internalize the essence of the data in order to generate it. The idea of the generative adversarial network (GAN, Generative Adversarial Networks), proposed by Ian Goodfellow in 2014, is derived from game theory and belongs to unsupervised learning. A GAN ultimately produces good outputs through the mutual game played by two modules in its framework, the generator (Generator) and the discriminator (Discriminator). The generator produces observation data from given implicit information and random input. In the invention, the input of the generator network is a random noise variable and the output is computer-generated image data of hot rolled coil surface defects. The generator continuously learns to fit the true high-dimensional data distribution starting from a low-dimensional distribution, while the discriminator's task is to distinguish whether data originates from real images or from images produced by the generator. The discriminator takes an image as input and makes its prediction through the model's loss function. The generator and discriminator are trained alternately: the discriminator is trained first by feeding it real data and generated data until it discriminates well, and the trained discriminator is then used to train the generator, the goal being that the data produced by the generator deceive the discriminator as far as possible. Through continuous adversarial learning, the generated pictures eventually cannot be distinguished from real pictures by the discriminator, and Nash equilibrium is finally reached (an equilibrium in game theory in which no party can improve its own outcome by changing its strategy alone).
Disclosure of Invention
The invention aims to provide a method for stably and effectively generating surface defect image data of hot rolled coils, solving the problem that, in the metallurgical steel rolling field, surface defect image data of hot rolled coils are scarce and difficult to acquire, so that defect identification by artificial intelligence machine vision has low accuracy and product quality is seriously affected. The invention differs from existing supervised data enhancement. Traditional data enhancement applies fixed preset rules to existing images, such as simple geometric transformations (flipping, rotation, cropping, deformation, scaling, etc.) and color transformations (noise, blurring, brightness, erasure, dithering, filling, etc.); it does not consider the differences between tasks, the rich diversity of the samples, or the image quality after enhancement, and is therefore neither efficient nor stable enough.
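For contrast with the method of the invention, the conventional supervised-style augmentation described above is typically implemented with fixed preset transforms. The following is a minimal illustrative sketch using the torchvision library; the specific transforms and parameter values are assumptions for illustration and are not part of the invention.

# Illustrative only: conventional geometric/color augmentation with fixed rules,
# i.e. the approach the invention is contrasted against.
import torchvision.transforms as T

traditional_augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                # flipping
    T.RandomRotation(degrees=10),                 # rotation
    T.RandomResizedCrop(128, scale=(0.8, 1.0)),   # cropping / scaling
    T.ColorJitter(brightness=0.2, contrast=0.2),  # brightness / color jitter
    T.ToTensor(),
])
# Each call produces a randomly perturbed copy of the same defect image; no new
# defect appearance is learned, which is the limitation noted above.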
The original GAN theory does not require that the generator and the discriminator be neural networks; it only requires functions that can fit the corresponding generation and discrimination tasks. The generative adversarial network consists of two deep neural networks competing with each other and is a probabilistic generative model: samples are generated by forward propagation through the generator, and optimization is performed by gradient back-propagation after the discriminator's judgment; the model does not depend on any prior assumption. Many earlier data enhancement methods assumed, often with great complexity, that the data follow a certain distribution and then used maximum likelihood to estimate that distribution. The generator in a GAN takes random noise as input and produces a sample, and the discriminator judges whether the data produced by the generating network are real or fake. During training of the generative adversarial network, the two networks are iteratively optimized in a mutual game: the generator learns to produce ever more realistic samples, while the discriminator tries its best to tell real samples from fake ones. The two sides compete until neither can improve further and the two networks reach a dynamic balance, i.e. the distribution of images produced by the generator approaches the real image distribution and the discriminator can no longer tell real from fake. The earliest generators and discriminators were not deep neural networks but perceptrons, and the objective function they optimize is shown in the following equation:

min_G max_D V(D, G) = E_{x~p(x)}[log D(x)] + E_{z~p(z)}[log(1 − D(G(z)))]
Where x represents a real sample, z represents the random noise input to the generator G, and G(z) represents the image generated by G. D(x) is the probability assigned by the discriminator D that a picture is a real picture, and D(G(z)) is the probability assigned by D that the picture generated by G is real. p(x) represents the distribution of the real data, and p(z) the distribution of the noise from which the generated data are produced. The formula requires maximizing the discriminator's probability estimate for real data while minimizing its probability estimate for generated data. The generative adversarial network is unstable during training, can only capture local variance characteristics of the data distribution, and is prone to mode collapse. It suffers from difficult training, generator and discriminator losses that do not indicate training progress, and generated samples that lack diversity, and the generated pictures are blurry and of poor quality.
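The objective above is usually implemented with a binary cross-entropy loss. A minimal sketch of the original (non-saturating) GAN losses follows; the function and argument names are illustrative, and D is assumed to output a probability in (0, 1).

import torch
import torch.nn.functional as F

def original_gan_losses(D, G, real_images, z):
    """Vanilla GAN losses for one batch (illustrative sketch).

    D: discriminator returning probabilities in (0, 1)
    G: generator mapping noise z to images
    """
    fake_images = G(z)

    # Discriminator: maximize log D(x) + log(1 - D(G(z)))
    d_real = D(real_images)
    d_fake = D(fake_images.detach())
    loss_d = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))

    # Generator (non-saturating form): maximize log D(G(z))
    d_fake_for_g = D(fake_images)
    loss_g = F.binary_cross_entropy(d_fake_for_g, torch.ones_like(d_fake_for_g))
    return loss_d, loss_g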
In view of the above problems, the present invention improves and optimizes the original generative adversarial network, which here consists of two deep neural networks competing with each other: a generator sequence network and a discriminator sequence network.
Step 1. The invention defines the generator function as a generator sequence GEN(z) composed of generator modules, where the initial module is defined as:

g_0 : Z → A,  z ~ N(0, 1),  a_0 = g_0(z)

Where A is the picture space obtained by mapping the noise space Z, and Z is the space consisting of Gaussian noise hidden vectors obeying a normal distribution (the latent vector input at the left of fig. 3). N(0, 1) is the standard normal (Gaussian) distribution, R represents the spatial dimension, → is the mapping symbol, and a_0 represents the picture first generated from the noise space mapping.
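A minimal sketch of such an initial module is given below: it maps a normally distributed latent vector to the lowest-resolution feature map. The latent dimension, channel count and 4×4 starting size are assumptions typical of this family of networks, not values fixed by the invention.

import torch
import torch.nn as nn

class InitialBlock(nn.Module):
    """g_0: maps a Gaussian latent vector z to the first low-resolution activation a_0."""
    def __init__(self, latent_dim=512, channels=512):
        super().__init__()
        self.net = nn.Sequential(
            # Transposed convolution expands the 1x1 latent "pixel" to a 4x4 map.
            nn.ConvTranspose2d(latent_dim, channels, kernel_size=4),
            nn.LeakyReLU(0.2),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2),
        )

    def forward(self, z):
        # z: (batch, latent_dim), sampled from N(0, 1)
        return self.net(z.view(z.size(0), -1, 1, 1))  # a_0: (batch, channels, 4, 4)

# z = torch.randn(8, 512); a0 = InitialBlock()(z)  # a0.shape == (8, 512, 4, 4)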
Step 2. Let the generator sequence have k layers (k is set by the model designer; in the invention k is set to 7), and let g_i denote the general generator module function of layer i. It represents the basic generator module; in the implementation the function includes an upsampling operation, so that the image resolution of each layer is greater than that of the previous layer.
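A minimal sketch of one such generator module g_i follows, assuming nearest-neighbour upsampling followed by two 3×3 convolutions; the exact layer composition is an assumption, the invention only requiring that the module upsample so each layer's resolution exceeds the previous one.

import torch.nn as nn

class GeneratorBlock(nn.Module):
    """g_i: doubles the spatial resolution of the incoming feature map."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),  # resolution grows each layer
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2),
        )

    def forward(self, a_prev):
        return self.net(a_prev)  # a_i, with twice the height and width of a_prev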
Step 3. The k generator modules defined in this way are composed, one after another, into the final generator sequence GEN(z).
Step 4. A function r is defined (represented by the horizontal black rectangular boxes in fig. 3). Its role is to generate images of different resolutions at the different stages of the modules in the generator sequence. r is modeled as a module consisting of a 1×1 convolution and an activation function (a function that increases the nonlinearity of a neural network model), which greatly increases the nonlinear expressiveness of the module while keeping the scale of the feature map unchanged (i.e. without losing resolution); the subsequent nonlinear activation function converts the intermediate convolution activations into images, and the output o_i of the module corresponds to a differently downsampled version of the final output image. r is defined as follows:

o_i = r_i(a_i),  i = 0, 1, …, k

wherein the function r_i acts as a regularizer (a rule that improves the generalization capability of the model) and projects the learned feature map into RGB image space to generate an intermediate-layer image.
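A minimal sketch of r_i is given below, assuming the activation is tanh and the output has three RGB channels; both are assumptions, the invention only specifying a 1×1 convolution followed by a nonlinear activation that keeps the feature-map scale unchanged.

import torch.nn as nn

class ToImage(nn.Module):
    """r_i: projects an intermediate feature map a_i to an RGB image o_i
    without changing its spatial resolution."""
    def __init__(self, in_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 3, kernel_size=1)  # 1x1 conv, keeps H x W
        self.act = nn.Tanh()                                   # nonlinearity, bounds pixel values

    def forward(self, a_i):
        return self.act(self.conv(a_i))  # o_i: (batch, 3, H_i, W_i)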
Step 5. Following the notation of step 1, g_i(z) = a_i, and therefore r_i(g_i(z)) = r_i(a_i) = o_i,
where o_i is the image generated by the i-th intermediate layer of the generator. The generated pictures o_i of different resolutions are all sent to the discriminator, so that during back-propagation the gradient information flows into the current layer, strengthening information interaction and preventing the gradient from vanishing.
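Putting steps 1 to 5 together, the following sketch assembles a generator sequence GEN that returns every intermediate image o_0 … o_k for the discriminator. It reuses the InitialBlock, GeneratorBlock and ToImage sketches above; the channel schedule and default values are assumptions for illustration.

import torch
import torch.nn as nn

class GeneratorSequence(nn.Module):
    """GEN(z): k upsampling modules; each layer also emits an RGB image o_i via r_i."""
    def __init__(self, latent_dim=512, k=7, base_channels=512):
        super().__init__()
        self.initial = InitialBlock(latent_dim, base_channels)   # step 1: z -> a_0 (4x4)
        self.blocks = nn.ModuleList()                             # steps 2-3: g_1 ... g_k
        self.to_image = nn.ModuleList([ToImage(base_channels)])   # step 4: r_0
        ch = base_channels
        for _ in range(k):
            out_ch = max(ch // 2, 16)
            self.blocks.append(GeneratorBlock(ch, out_ch))
            self.to_image.append(ToImage(out_ch))
            ch = out_ch

    def forward(self, z):
        a = self.initial(z)
        outputs = [self.to_image[0](a)]           # o_0 at 4x4
        for block, r in zip(self.blocks, self.to_image[1:]):
            a = block(a)
            outputs.append(r(a))                  # o_i at 8x8, 16x16, ...
        return outputs                            # all o_i are passed to the discriminator

# z = torch.randn(4, 512)
# images = GeneratorSequence()(z)  # each module doubles the resolution: 4 * 2**k pixels per side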
Step 6. The discriminator sequence is defined. We denote by D the discriminator sequence consisting of all discriminator modules, define the last layer of the discriminator as D(z'), the first layer as d_0, denote a real sample by y and a generated sample image by y', and similarly let d_j be the j-th intermediate-layer function of the discriminator. The indices are always related by j = k − i. The output of the j-th intermediate-layer function of the discriminator is therefore defined as:
a'_j = d_j(combine(o_{k−j}, a'_{j−1})) = d_j(combine(o_i, a'_{j−1}))
Wherein combine concatenates the output o_i of the i-th intermediate layer of the generator with the output of the (j−1)-th intermediate layer of the discriminator along the channel dimension and applies a 1×1 convolution, thereby integrating the feature information (see fig. 3). The final discriminant loss of the discriminator is therefore a function not only of the generator's final output but also of its intermediate-layer outputs. Connecting the intermediate layers lets gradient information flow from the intermediate layers of the discriminator to the intermediate layers of the generator during model training, making network training more stable.
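A minimal sketch of one discriminator module d_j with the combine operation described above follows: the intermediate image o_i (with i = k − j) is merged with the previous discriminator activation a'_{j−1} by channel concatenation and a 1×1 convolution before convolution and downsampling. The specific layer widths and the use of average pooling are assumptions for illustration.

import torch
import torch.nn as nn

class DiscriminatorBlock(nn.Module):
    """d_j: fuses an intermediate generated/real image with prior features, then downsamples."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # combine: concatenate the 3-channel image o_i with a'_{j-1}, then a 1x1 convolution
        self.combine = nn.Conv2d(in_channels + 3, in_channels, kernel_size=1)
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2),
            nn.AvgPool2d(2),   # downsample: halve the spatial resolution
        )

    def forward(self, o_i, a_prev):
        x = torch.cat([o_i, a_prev], dim=1)   # channel concatenation
        x = self.combine(x)                   # 1x1 convolution fuses the two sources
        return self.net(x)                    # a'_j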
Step 7. Finally, the whole discriminator sequence is obtained by composing the modules d_0, d_1, …, d_k defined above, with the combine operation applied at each intermediate layer.
Step 8. The invention improves the structure of the traditional generative adversarial network and defines a new loss function, in which L_D denotes the loss function of the discriminator D, L_G the loss function of the generator G, x_f the picture data produced by the generator, x_t the real picture data, and E the averaging (expectation) operator. The new loss function estimates the probability that, on average, the given real data are more realistic than randomly sampled fake data, enabling the whole network to generate stable, high-quality data images (finer edges and richer textures) from smaller samples and reducing the time required for model training.
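The patent's formula itself is not reproduced above. As an illustration of the idea of comparing real data against the average of generated data, the following sketch uses the relativistic average hinge form; this specific variant is an assumption, not necessarily the exact loss of the invention.

import torch
import torch.nn.functional as F

def relativistic_average_losses(d_real, d_fake):
    """Illustrative relativistic-average (hinge) losses.

    d_real: discriminator scores for real images x_t
    d_fake: discriminator scores for generated images x_f
    Each score is compared against the average score of the opposite set,
    i.e. "is this real sample more realistic than the average fake sample?"
    In practice, d_fake used for the discriminator loss comes from detached
    generator outputs.
    """
    rel_real = d_real - d_fake.mean()   # real relative to the average fake
    rel_fake = d_fake - d_real.mean()   # fake relative to the average real

    loss_d = F.relu(1.0 - rel_real).mean() + F.relu(1.0 + rel_fake).mean()
    loss_g = F.relu(1.0 + rel_real).mean() + F.relu(1.0 - rel_fake).mean()
    return loss_d, loss_g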
In deep learning, the loss function measures the gap between prediction and reality, and an optimization strategy reduces that gap. The model optimization rule adopted in this patent is a gradient descent method that computes first- and second-moment estimates of the gradient and an adaptive learning rate for each parameter, unlike plain stochastic gradient descent, which corrects errors using the gradient alone. The learning rate can be adjusted according to the historical information of the gradient, which reduces the memory required when updating the weights of the neural network from the training data and makes training faster.
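An optimization rule with per-parameter adaptive learning rates derived from first- and second-moment estimates of the gradient is commonly implemented in PyTorch as torch.optim.Adam; the small sketch below is only an illustration of that setup, with a stand-in module and the learning rate quoted later for fig. 4 (treating that value as the intended one is an assumption).

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the generator or discriminator network
# Adam keeps running estimates of the gradient's first and second moments and
# scales each parameter's update accordingly.
optimizer = torch.optim.Adam(model.parameters(), lr=0.003)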
The invention has the advantage of providing a novel unsupervised data enhancement method that generates surface defect image data of hot rolled coils at different resolutions (shown in fig. 4). Unlike other artificial intelligence networks, this generative adversarial network can stably generate hot rolled coil surface defect images at multiple resolutions, produces high-quality image sample data, and overcomes the training instability of the original generative adversarial network. The network architecture differs in the connections between the intermediate layers of the generator and the discriminator, which introduce a regularization term between the input vector and the generated result and ensure that similar input vectors do not always fall into a single mapping region. The connected layers are trained simultaneously, which saves a great deal of network training time compared with the traditional progressive, layer-by-layer growing approach, improves image generation efficiency, and suits rapid engineering deployment. A further advantage of the invention is that the discriminator's result is determined jointly by the generator's final output and its intermediate-layer outputs, so gradient information is propagated to images of different resolutions. Compared with the prior art, the method further alleviates the difficulty of acquiring training sets of hot rolled coil surface defect images and lays a better data foundation for improving the recognition accuracy and generalization capability of AI models. The data generated by this new deep generative adversarial network are a good training resource for AI models, break the data barrier to a certain extent, and provide a reference for obtaining desired image data samples in further basic research and practical projects.
Drawings
Fig. 1 is a graph of the generated data distribution fitting the real data distribution. Line b is the Gaussian distribution of the real data; line c is the data distribution learned by the generative adversarial network, which is initialized from random noise. Line a is the probability with which the discriminator network judges an image to be real data. The goal of the generative adversarial network is to make curve c gradually fit curve b through continued training of the generator and discriminator networks, i.e. the process from left to right in fig. 1; finally, when the distributions of generated and real samples coincide completely, the discriminator judges a sample to be real or fake with equal probability. The horizontal line marked x represents the sampling space obeying the distribution x, and the horizontal line marked z represents the sampling space obeying the distribution z. The arrows pointing from the z axis to the x axis indicate that the generative adversarial network learns the mapping from z space to x space;
Fig. 2 is a conceptual diagram of the generative adversarial network. The generator network takes a random noise vector z as input to generate an image x. The discriminator is trained as a binary classifier on two kinds of data: real data samples, all labeled 1, and data from the generator, labeled 0. The real and generated data are mixed and fed in as samples, and the discriminator outputs the probability that the input image is a real image; it is used to judge whether the input image is real or generated. Ideally, the generator network can produce pictures x that look so "genuine" that the discriminator network can hardly determine whether they are real or generated. Once such a generative adversarial model is obtained, it can be used to generate target pictures for unsupervised data enhancement;
Fig. 3 is a diagram of the model network architecture. The architecture connects the intermediate layers of the generator sequence with the intermediate layers of the discriminator sequence (unlike conventional generative adversarial networks) and passes the multi-resolution images from the generator's intermediate layers into the discriminator together with the corresponding activations obtained from the preceding convolutional layer. In the invention, the discriminator therefore receives not only the final (highest-resolution) output of the generator but also the outputs of its intermediate layers. This connection lets the generative adversarial network adjust its parameters better and improves the stability of network training. The leftmost input of the figure is a latent noise vector fed into the horizontal rectangular module of the generator sequence; it then passes through the two convolution modules of the vertical rectangular box (one 4×4 convolution and one 3×3 convolution) and is upsampled (upsample) through the intermediate convolution layers, such as g1, g2 in the figure, until the image sample with the designated highest resolution (highest resolution samples) is generated (in fig. 3, a defect image of a surface break on a hot rolled coil). The generated pictures and the training images are then input to the discriminator sequence network on the right, which downsamples (downsample) continuously through its convolution modules, such as dk-2, dk-1 at the right of the figure. During this process the generator's intermediate layers are connected in: the images of different resolutions and the corresponding activation values from the preceding convolution layer are joined by a combine module (the vertical modules at the right of the figure). A MinibatchStd module then computes the variation across the feature maps of a small batch of samples at a given layer and feeds this statistic as an input to the next layer of the discriminator network; after further downsampling, the features are passed to a fully connected layer that outputs the real/fake decision, which helps the discriminator detect insufficient sample diversity and alleviates mode collapse;
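A minimal sketch of the minibatch standard-deviation module referred to in the figure follows: it measures how much the samples in a batch differ from one another at a given feature layer and appends that statistic as an extra feature map, giving the discriminator a direct signal when the generator's outputs lack diversity. The single-statistic formulation below is an assumption for illustration.

import torch
import torch.nn as nn

class MinibatchStd(nn.Module):
    """Appends the average per-feature standard deviation across the batch
    as one extra channel, so the discriminator can detect low sample diversity."""
    def forward(self, x):
        # x: (batch, channels, height, width)
        std = x.std(dim=0, unbiased=False)            # variation across the batch
        mean_std = std.mean().expand(x.size(0), 1, x.size(2), x.size(3))
        return torch.cat([x, mean_std], dim=1)        # one additional feature map

# y = MinibatchStd()(torch.randn(8, 16, 32, 32))  # y.shape == (8, 17, 32, 32)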
FIG. 4 shows examples of generated hot rolled coil surface rolling defect images. The resolutions of the images are 4×4, 8×8, 16×16, 32×32, 64×64 and 128×128, in order from left to right and top to bottom. In theory the generated image resolution may also be 256×256, 512×512 or 1024×1024; this is related to the resolution of the training image dataset, the performance of the training machine, the number of training rounds (iterations), the training time, the neural network hyper-parameter settings, and the number of layers k of the generator sequence. The generator and the discriminator in the invention use the same hyper-parameter settings: the model learning rate is 0.003 and the batch size is 120, which makes training simpler and easier to implement than for other neural networks. The resolution of the training dataset used in the invention is 128×128, the number of training rounds is 57290, and the training time is one week. The machine configuration is given in the detailed description;
FIG. 5 shows further examples of generated hot rolled coil surface rolling defect images. The resolutions of the images are 4×4, 8×8, 16×16, 32×32, 64×64 and 128×128, in order from left to right and top to bottom. The generation principle is the same as for fig. 4; figs. 4 and 5 are both multi-resolution hot rolled coil surface defect images randomly generated by the trained generative adversarial network model (i.e. one that has fitted the true distribution of the real data).
Detailed Description
1. The method is implemented on a deep learning computer server with 2 Tesla V100 GPU graphics cards (32 GB of graphics memory each), 128 GB of system memory and a 1 TB solid state disk as the hardware platform;
2. The software environment is a 64-bit Linux (Ubuntu 16.04) operating system with the Anaconda software library, the PyTorch deep learning framework, CUDA Toolkit version 10.2, NVIDIA graphics driver version 440.33.01 and the cuDNN acceleration package version 7.6.5, using the Python programming language;
3. Activate the deep learning development environment using the command source activate pytorch;
4. Prepare the dataset. The picture training set comes from an actual steel mill hot rolled coil production line and is captured by a surface defect detector; the labeled dataset is sorted by workers and experts and then cleaned to remove pictures in wrong formats unsuitable for scene recognition, noisy pictures and damaged pictures, and to unify the picture size. After preprocessing, 325 rolling defect images are obtained, with a resolution of 128×128 and in jpg format;
5. Define the structure of the generative adversarial network: define the dimensions (shape) of the input data, the shape and initialization mode of each layer, and the loss functions and Gaussian noise distribution of the two networks, the discriminator D and the generator G;
6. Initialize the parameters of both the generator G and the discriminator D. Train the generator to produce image samples using the defined noise distribution, and read in data from the training set;
7. Fix the parameters of the generator G and train the discriminator: feed the generated pictures and the real pictures into the discriminator and, through the loss function, make it distinguish real from fake pictures as well as possible;
8. Once the discriminator can separate the real data from the generated data, fix the parameters of the discriminator and train the generator, letting it generate picture data again;
9. Optimize the model with the selected optimization strategy. After k rounds of alternating updates the discriminator can no longer distinguish real from fake samples; when this Nash equilibrium target is reached, the generator can be considered to have captured the true distribution of the real data (a consolidated code sketch of steps 4 to 9 is given after step 10 below);
10. Examples of images generated using the trained generative adversarial model are shown in figs. 4 and 5, as described with reference to the drawings.
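Steps 4 to 9 above can be summarised in the short training sketch below. The dataset path and folder layout, the build_discriminator helper, and the single pass over the loader are assumptions for illustration; the generator, discriminator block and loss function are the sketches from the Disclosure section, and the learning rate and batch size are those quoted for fig. 4.

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Step 4 (assumed folder layout): the cleaned 128x128 defect JPGs in one directory tree.
dataset = datasets.ImageFolder(
    "data/hot_rolled_defects",                      # hypothetical path
    transform=transforms.Compose([transforms.Resize(128),
                                  transforms.CenterCrop(128),
                                  transforms.ToTensor()]),
)
loader = DataLoader(dataset, batch_size=120, shuffle=True, drop_last=True)

k = 5                                               # 4x4 doubled five times -> 128x128
generator = GeneratorSequence(latent_dim=512, k=k)  # sketch from the Disclosure section
discriminator = build_discriminator(k)              # hypothetical helper stacking DiscriminatorBlock modules
opt_g = torch.optim.Adam(generator.parameters(), lr=0.003)   # step 9: adaptive-moment optimisation
opt_d = torch.optim.Adam(discriminator.parameters(), lr=0.003)

def multi_scale(x, k):
    """Real images downsampled to every resolution the generator emits (4x4 ... 128x128)."""
    return [F.adaptive_avg_pool2d(x, 4 * 2 ** i) for i in range(k + 1)]

for real, _ in loader:                              # repeat for the required number of training rounds
    z = torch.randn(real.size(0), 512)              # step 6: Gaussian noise input
    fakes = generator(z)                            # list o_0 ... o_k of generated images

    # Step 7: fix G, train D to separate real from generated samples.
    loss_d, _ = relativistic_average_losses(
        discriminator(multi_scale(real, k)),
        discriminator([f.detach() for f in fakes]))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Step 8: fix D, train G so that its samples fool the discriminator.
    _, loss_g = relativistic_average_losses(
        discriminator(multi_scale(real, k)), discriminator(fakes))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()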