Image blind motion deblurring method based on a recurrent multi-scale generative adversarial network
Technical Field
The invention belongs to the technical field of image processing, and relates to an image blind motion deblurring method based on a recurrent multi-scale generative adversarial network.
Background
Because it is difficult to keep the photographing device and the imaged object relatively stationary, images often suffer from motion blur. Yet in daily life, traffic safety, medicine, military reconnaissance, and other fields, obtaining sharp images is very important.
Motion blur in an image can be modeled as the convolution of a sharp image with a two-dimensional linear function, followed by corruption with additive noise. This linear function, called the point spread function or blur kernel, encodes the blur information of the image. Blind deblurring refers to restoring the original sharp image by relying only on the information in the blurred image, when the blur process (i.e., the blur kernel) is unknown. In blind deblurring of a single motion-blurred image, both the blur kernel and its size are unknown, which limits the accuracy of blur-kernel estimation and in turn the final restoration quality.
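For reference, this degradation model is commonly written as follows (a standard formulation with assumed notation, not quoted from the text above):

```latex
% Standard motion-blur degradation model (assumed notation):
%   B : observed blurred image,  S : latent sharp image,
%   k : blur kernel (point spread function),  n : additive noise,
%   * : two-dimensional convolution.
B = S * k + n
```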
Disclosure of Invention
The invention aims to provide an image blind motion deblurring method based on a recurrent multi-scale generative adversarial network, tailored to the characteristics of image motion blur; the method can estimate a sharp image without estimating a blur kernel.
The invention specifically comprises the following steps:
Step (1), constructing a discriminator D;
The discriminator D consists of nine convolutional layers, a fully connected layer, and a Sigmoid activation layer, and takes as input a color image of size 256 × 256.
Each convolutional layer uses LeakyReLU as its activation function. The first layer has 32 convolution kernels, each of size 5 × 5, with stride 2 and zero-padding width 2; the second layer has 64 kernels of size 5 × 5, stride 1, zero-padding width 2; the third layer has 64 kernels of size 5 × 5, stride 2, zero-padding width 2; the fourth layer has 128 kernels of size 5 × 5, stride 1, zero-padding width 2; the fifth layer has 128 kernels of size 5 × 5, stride 4, zero-padding width 2; the sixth layer has 256 kernels of size 5 × 5, stride 1, zero-padding width 2; the seventh layer has 256 kernels of size 5 × 5, stride 4, zero-padding width 2; the eighth layer has 512 kernels of size 5 × 5, stride 1, zero-padding width 2; the ninth layer has 512 kernels of size 4 × 4, stride 4, zero-padding width 0.
The convolutional output of the last layer is passed through a fully connected layer with 512 input channels and 1 output channel to obtain a single scalar, which is activated by a Sigmoid function to output the discrimination probability.
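For illustration, a minimal PyTorch sketch of a discriminator with the layer sizes listed above is given below; the LeakyReLU slope, the module name, and training details are assumptions, not part of the invention's specification.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        # (out_channels, stride) for the first eight 5x5 convolutions.
        cfg = [(32, 2), (64, 1), (64, 2), (128, 1),
               (128, 4), (256, 1), (256, 4), (512, 1)]
        layers, in_ch = [], 3
        for out_ch, stride in cfg:
            layers += [nn.Conv2d(in_ch, out_ch, kernel_size=5,
                                 stride=stride, padding=2),
                       nn.LeakyReLU(0.2)]  # slope 0.2 is an assumption
            in_ch = out_ch
        # Ninth layer: 512 kernels of size 4x4, stride 4, zero-padding 0.
        layers += [nn.Conv2d(in_ch, 512, kernel_size=4, stride=4, padding=0),
                   nn.LeakyReLU(0.2)]
        self.features = nn.Sequential(*layers)
        self.fc = nn.Linear(512, 1)  # 512 input channels -> 1 output

    def forward(self, x):                 # x: (N, 3, 256, 256)
        f = self.features(x)              # -> (N, 512, 1, 1)
        return torch.sigmoid(self.fc(f.flatten(1)))  # discrimination probability
```

With a 256 × 256 input, the strides (2, 1, 2, 1, 4, 1, 4, 1, 4) reduce the feature map to 1 × 1 × 512 before the fully connected layer.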
Step (2), constructing a generator G;
The generator G comprises cascaded sub-networks at three scales. Each sub-network comprises, in cascade, 1 input module, 2 encoding modules, 1 convolutional long short-term memory (ConvLSTM) module, 2 decoding modules, and 1 output module. Each module contains residual modules; a residual module is formed by cascaded convolutional layers, each taking the Rectified Linear Unit (ReLU) as its activation function, and the output of the cascaded convolutional layers is added to the residual module's input to form the residual module's output.
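A minimal sketch of such a residual module might look as follows (PyTorch); the use of exactly two convolutional layers per module is an assumption consistent with common residual designs.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels, kernel_size=5):
        super().__init__()
        pad = kernel_size // 2
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size, padding=pad),
            nn.ReLU(inplace=True),        # ReLU activation, as in the text
            nn.Conv2d(channels, channels, kernel_size, padding=pad),
        )

    def forward(self, x):
        # Output of the cascaded convolutions is added to the module input.
        return x + self.body(x)
```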
The input module comprises an independent convolutional layer and three identically structured residual modules. The convolutional layers of the independent layer and of the residual modules each have 32 convolution kernels of size 5 × 5, with stride 1 and zero-padding width 2; the independent convolutional layer uses the ReLU function as its activation.
The first encoding module comprises an independent convolutional layer and three identically structured residual modules. The convolutional layers have 64 convolution kernels of size 5 × 5, with stride 2 and zero-padding width 2; the independent convolutional layer uses the ReLU function as its activation.
The second encoding module comprises an independent convolutional layer and three identically structured residual modules. The convolutional layers have 128 convolution kernels of size 5 × 5, with stride 2 and zero-padding width 2; the independent convolutional layer uses the ReLU function as its activation.
The cell-state output of the ConvLSTM module serves as the input to the decoding module, and the hidden-state output of the ConvLSTM module is connected to the hidden-state input of the ConvLSTM module in the next-scale sub-network; for the last scale, the hidden-state output of the ConvLSTM module is not connected to any other module.
The structure of the convolutional long short-term memory (ConvLSTM) module follows Shi X, Chen Z, Wang H, et al. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting [C]// Advances in Neural Information Processing Systems, 2015: 802-810.
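For reference, a compact ConvLSTM cell in the spirit of Shi et al. can be sketched as follows (PyTorch); fusing the four gates into a single convolution and omitting peephole connections are simplifying assumptions.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hidden_ch, kernel_size=5):
        super().__init__()
        pad = kernel_size // 2
        # One convolution computes all four gates from [input, hidden].
        self.gates = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch,
                               kernel_size, padding=pad)
        self.hidden_ch = hidden_ch

    def forward(self, x, state=None):
        if state is None:
            zeros = x.new_zeros(x.size(0), self.hidden_ch, x.size(2), x.size(3))
            state = (zeros, zeros)                 # (hidden h, cell c)
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = i.sigmoid(), f.sigmoid(), o.sigmoid(), g.tanh()
        c = f * c + i * g                          # new cell state
        h = o * c.tanh()                           # new hidden state
        return h, (h, c)
```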
The first decoding module comprises three identically structured residual modules followed by an independent convolutional layer. The convolutional layers have 128 convolution kernels of size 5 × 5, with stride 2 and zero-padding width 2; the independent convolutional layer cascaded after the residual modules uses the ReLU function as its activation.
The second decoding module comprises three identically structured residual modules followed by an independent convolutional layer. The convolutional layers have 64 convolution kernels of size 5 × 5, with stride 2 and zero-padding width 2; the independent convolutional layer cascaded after the residual modules uses the ReLU function as its activation.
The output module comprises three identically structured residual modules followed by an independent convolutional layer. The convolutional layers have 32 convolution kernels of size 5 × 5, with stride 1 and zero-padding width 2; the independent convolutional layer cascaded after the residual modules uses the ReLU function as its activation.
The third-level scale outputs the generator image L3 of size 64 × 64; L3 is upsampled to a 128 × 128 image that serves as input to the second-level scale, which outputs the 128 × 128 generator image L2. L2 is upsampled to a 256 × 256 image that serves as input to the first-level scale, which outputs the 256 × 256 generator image L1, i.e., the final deblurred image. Across the three cascaded sub-networks, the corresponding structures, channel numbers, and convolution kernel sizes are identical, and weights are shared across the three RGB channels of the color image.
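The coarse-to-fine recurrence can be summarized roughly as follows; here `net` stands for the shared-weight sub-network described above, and concatenating the upsampled previous output with the blurred input at each scale is an assumption in the spirit of scale-recurrent designs (the text specifies only that the upsampled output serves as the next scale's input).

```python
import torch
import torch.nn.functional as F

def generator_forward(net, blurred_pyramid):
    """blurred_pyramid: blurred inputs, coarsest first, e.g. 64, 128, 256 px."""
    outputs, state, prev = [], None, None
    for blurred in blurred_pyramid:
        if prev is None:
            up = blurred                  # coarsest scale: no previous estimate
        else:
            up = F.interpolate(prev, scale_factor=2, mode='bilinear',
                               align_corners=False)
        inp = torch.cat([blurred, up], dim=1)   # 6-channel input (assumption)
        # One pass through the shared-weight sub-network; the ConvLSTM hidden
        # state is carried to the next scale (state resizing omitted here).
        prev, state = net(inp, state)
        outputs.append(prev)
    return outputs                        # [L3, L2, L1], finest last
```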
Step (3), randomly extract m (m ≥ 16) blurred images and their corresponding sharp images from the training data set T, and randomly crop them into 256 × 256 square regions to form a training blurred image set B and a corresponding sharp image set S; B and S each contain m images, every image being a 3-channel 256 × 256 color image. Input the blurred image set B into the generator to obtain the generator output image set L, which contains m color images of size 256 × 256.
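Step (3) might be prepared along the following lines (PyTorch); the tensor shapes, the cropping helper, and the construction of the 3-level pyramid are assumptions for illustration.

```python
import random
import torch.nn.functional as F

def random_crop_pair(blurred, sharp, size=256):
    # blurred, sharp: (C, H, W) tensors of the same spatial size.
    _, H, W = blurred.shape
    top = random.randint(0, H - size)
    left = random.randint(0, W - size)
    return (blurred[:, top:top + size, left:left + size],
            sharp[:, top:top + size, left:left + size])

def pyramid(img, levels=3):
    # Returns images coarsest first: 64, 128, 256 px for a 256x256 crop.
    imgs = [img]
    for _ in range(levels - 1):
        imgs.append(F.interpolate(imgs[-1].unsqueeze(0), scale_factor=0.5,
                                  mode='bilinear',
                                  align_corners=False).squeeze(0))
    return imgs[::-1]
```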
Step (4), feed the generator output image set L and the corresponding sharp image set S in turn to the discriminator, which outputs two groups of confidence results, each containing m probability values, so that each input image is judged to be a sharp image or a generated image: if the probability value is greater than 0.5, the image is judged to be sharp; if the probability value is less than or equal to 0.5, it is judged to be a generated image.
Step (5), constructing the loss function for training the generator:

l_db = l_E + α1 · l_grad + α2 · l_adv

where α1 and α2 are regularization coefficients greater than 0, and l_E is the mean squared error between the generator output image set L and the corresponding sharp image set S, i.e.:

l_E = Σ_{i=1}^{3} (1/N_i) · ||L_i − S_i||²

where L_i and S_i denote the generator output image and the sharp image at the i-th scale, respectively, and N_i denotes the number of pixels over all channels of the i-th scale image, i = 1, 2, 3. The multi-scale images are obtained by downsampling the image to 3 scales: the first-level scale is the original-size image, and from the second level on, each level is half the width and half the height of the previous level.

l_grad is the gradient error between the gradient images of L_i and S_i, i.e.:

l_grad = Σ_{i=1}^{3} (1/N_i) · ( ||L_i(dx) − S_i(dx)||² + ||L_i(dy) − S_i(dy)||² )

where L_i(dx) and L_i(dy) denote the horizontal and vertical gradients of L_i, and S_i(dx) and S_i(dy) denote the horizontal and vertical gradients of S_i.

l_adv is the discrimination error over the generator output image set L and the corresponding sharp image set S:

l_adv = E_{s∼p(S)}[log D(s)] + E_{b∼p(B)}[log(1 − D(G(b)))]

where s∼p(S) indicates that the sharp image s is drawn from the sharp image set S, with p(S) the probability distribution of S; b∼p(B) indicates that the blurred image b is drawn from the blurred image set B, with p(B) the probability distribution of B; D(s) denotes the discriminator's output probability for input image s; G(b) denotes the result image generated by the generator from input image b; and E[·] denotes the expectation of the bracketed quantity.
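A sketch of the generator loss l_db in PyTorch follows; the forward-difference gradient operator, the mean-squared reductions (which absorb the 1/N_i factors), and the default coefficient values (taken from the embodiment described later) are assumptions.

```python
import torch
import torch.nn.functional as F

def image_gradients(img):
    # Horizontal (dx) and vertical (dy) forward differences.
    dx = img[..., :, 1:] - img[..., :, :-1]
    dy = img[..., 1:, :] - img[..., :-1, :]
    return dx, dy

def generator_loss(outputs, sharps, d_fake, alpha1=1e-2, alpha2=1e-4):
    """outputs, sharps: per-scale image lists (same order); d_fake: D(G(b))."""
    l_E = sum(F.mse_loss(L, S) for L, S in zip(outputs, sharps))
    l_grad = 0.0
    for L, S in zip(outputs, sharps):
        Ldx, Ldy = image_gradients(L)
        Sdx, Sdy = image_gradients(S)
        l_grad = l_grad + F.mse_loss(Ldx, Sdx) + F.mse_loss(Ldy, Sdy)
    # Generator term of the adversarial loss: push D(G(b)) toward 1
    # by minimizing log(1 - D(G(b))).
    l_adv = torch.log(1.0 - d_fake + 1e-8).mean()
    return l_E + alpha1 * l_grad + alpha2 * l_adv
```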
Step (6), input the generated images and the sharp images into the discriminator, update the weight parameters of each layer by gradient-descent iteration, and continually optimize l_adv until the discriminator can no longer tell whether an input image is a generated image or a sharp image, i.e., until the difference between the output probability and 0.5 changes by less than thr, with 0.01 ≤ thr ≤ 0.08; discriminator training is then complete.
Step (7), train the generator according to the loss function l_db = l_E + α1 · l_grad + α2 · l_adv: input the blurred images into the generator, obtain the generated images by forward propagation, compare the generated images with the sharp images, and update the weight parameters of each layer by gradient-descent iteration, continually minimizing l_db until the change in the training-set loss value l_db is less than a threshold Th, with 0.001 ≤ Th ≤ 0.01; generator training is then complete.
Step (8), repeat training steps (3) to (7) until the change in the generator's training-set loss value l_db is less than the threshold Th and the discriminator cannot judge whether an input image is a sharp image or a generated image; the generator and discriminator models are then considered converged, and inputting a blurred image into the generator yields the estimated deblurred image.
The method of the invention uses deep learning to learn the relationship between motion-blurred images and their corresponding sharp images, omitting the complex blur-kernel estimation process. Trained by comparing a large number of blurred images against sharp images, the resulting model can extract image edge features; it has a simpler network structure and fewer parameters, is easier to train, and achieves a better restoration effect.
Detailed Description
The following further illustrates the practice of the present invention.
The blurred image set B is input to the generator G to obtain the generator output image set L, which serves as input to the discriminator D to obtain a discrimination result. Similarly, the sharp image set S is also fed to the discriminator to obtain its discrimination result. The discrimination result indicates whether the input comes from the sharp image set or the generated image set: if the result is greater than 0.5, the input is judged to come from the sharp image set S; otherwise, from the generator output image set L. The error between the discrimination results and the true labels is computed and the discriminator is optimized by gradient descent; then the mean error between the generated images and the sharp images is computed and the generator is optimized by gradient descent. The discriminator and generator are optimized alternately until the model converges. In the experiments of the invention, the model converged after a total of 400,000 training iterations.
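This alternating optimization can be sketched as follows, reusing the `generator_forward` and `generator_loss` sketches above; the binary cross-entropy formulation for the discriminator step and the optimizer handling are assumptions.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_G, opt_D, blurred_pyr, sharp_pyr):
    # --- Discriminator step: sharp images labeled 1, generated images 0,
    #     matching the 0.5 decision rule described above.
    with torch.no_grad():
        fakes = generator_forward(G, blurred_pyr)       # [L3, L2, L1]
    d_real = D(sharp_pyr[-1])                           # full-resolution S1
    d_fake = D(fakes[-1])                               # full-resolution L1
    loss_D = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # --- Generator step: minimize l_db = l_E + a1*l_grad + a2*l_adv.
    outputs = generator_forward(G, blurred_pyr)
    loss_G = generator_loss(outputs, sharp_pyr, D(outputs[-1]))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```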
The specific steps of the image blind motion deblurring method based on the recurrent multi-scale generative adversarial network are as follows:
s1, constructing a discriminator D: the discriminator D is composed of nine convolutional layers, one full link layer, and one Sigmoid active layer, and inputs a color image having a size of 256 × 256.
Each convolutional layer used a LeakyReLU as the activation function: the first layer has 32 convolution kernels, each convolution kernel has a size of 5 × 5, a step size of 2, and a zero-filling width of 2; the second layer has 64 convolution kernels, each convolution kernel is 5 × 5 in size, 1 in step size and 2 in zero-filling width; the third layer has 64 convolution kernels, each convolution kernel is 5 × 5 in size, 2 in step size and 2 in zero-filling width; the fourth layer has 128 convolution kernels, each convolution kernel is 5 × 5 in size, 1 in step size and 2 in zero-filling width; the fifth layer has 128 convolution kernels, each convolution kernel size is 5 × 5, step size is 4, and zero filling width is 2; the sixth layer has 256 convolution kernels, each convolution kernel has a size of 5 × 5, a step size of 1, and a zero filling width of 2; the seventh layer has 256 convolution kernels, each convolution kernel has a size of 5 × 5, a step size of 4, and a zero filling width of 2; the eighth layer has 512 convolution kernels, each convolution kernel is 5 × 5 in size, the step size is 1, and the zero filling width is 2; the ninth layer has 512 convolution kernels, each with a size of 4 x 4, step size of 4, and zero-fill width of 0.
And the convolution output of the last layer is subjected to full-connection layers with the number of input channels being 512 and the number of output channels being 1 to obtain 1 constant, and the probability of judgment is output after being activated by a Sigmoid function.
S2, constructing a generator G: the generator G comprises cascaded sub-networks at three scales. Each sub-network comprises, in cascade, 1 input module, 2 encoding modules, 1 convolutional long short-term memory (ConvLSTM) module, 2 decoding modules, and 1 output module. Each module contains residual modules; a residual module is formed by cascaded convolutional layers, each taking the Rectified Linear Unit (ReLU) as its activation function, and the output of the cascaded convolutional layers is added to the residual module's input to form the residual module's output.
The input module comprises an independent convolutional layer and three identically structured residual modules. The convolutional layers of the independent layer and of the residual modules each have 32 convolution kernels of size 5 × 5, with stride 1 and zero-padding width 2; the independent convolutional layer uses the ReLU function as its activation.
The first encoding module comprises an independent convolutional layer and three identically structured residual modules. The convolutional layers have 64 convolution kernels of size 5 × 5, with stride 2 and zero-padding width 2; the independent convolutional layer uses the ReLU function as its activation.
The second encoding module comprises an independent convolutional layer and three identically structured residual modules. The convolutional layers have 128 convolution kernels of size 5 × 5, with stride 2 and zero-padding width 2; the independent convolutional layer uses the ReLU function as its activation.
The cell-state output of the ConvLSTM module serves as the input to the decoding module, and the hidden-state output of the ConvLSTM module is connected to the hidden-state input of the ConvLSTM module in the next-scale sub-network; for the last scale, the hidden-state output of the ConvLSTM module is not connected to any other module.
The first decoding module comprises three identically structured residual modules followed by an independent convolutional layer. The convolutional layers have 128 convolution kernels of size 5 × 5, with stride 2 and zero-padding width 2; the independent convolutional layer cascaded after the residual modules uses the ReLU function as its activation.
The second decoding module comprises three identically structured residual modules followed by an independent convolutional layer. The convolutional layers have 64 convolution kernels of size 5 × 5, with stride 2 and zero-padding width 2; the independent convolutional layer cascaded after the residual modules uses the ReLU function as its activation.
The output module comprises three identically structured residual modules followed by an independent convolutional layer. The convolutional layers have 32 convolution kernels of size 5 × 5, with stride 1 and zero-padding width 2; the independent convolutional layer cascaded after the residual modules uses the ReLU function as its activation.
The third-level scale outputs the generator image L3 of size 64 × 64; L3 is upsampled to a 128 × 128 image that serves as input to the second-level scale, which outputs the 128 × 128 generator image L2. L2 is upsampled to a 256 × 256 image that serves as input to the first-level scale, which outputs the 256 × 256 generator image L1, i.e., the final deblurred image. Across the three cascaded sub-networks, the corresponding structures, channel numbers, and convolution kernel sizes are identical, and weights are shared across the three RGB channels of the color image.
S3, randomly extract m (m = 16) blurred images and their corresponding sharp images from the training data set T, and randomly crop them into 256 × 256 square regions to form a training blurred image set B and a corresponding sharp image set S; the images in B and S are all 3-channel 256 × 256 color images. Input the blurred image set B into the generator to obtain the generator output image set L.
S4, feed the generator output image set L and the corresponding sharp image set S in turn to the discriminator, which outputs two groups of confidence results, each containing 16 probability values, so that each input image is judged to be a sharp image or a generated image: if the probability value is greater than 0.5, the image is judged to be sharp; otherwise, it is judged to be a generated image.
S5, constructing the loss function for training the generator:

l_db = l_E + α1 · l_grad + α2 · l_adv

where α1 and α2 are regularization coefficients, α1 = 10⁻², α2 = 10⁻⁴. l_E is the mean squared error between the generator output image set L and the corresponding sharp image set S:

l_E = Σ_{i=1}^{3} (1/N_i) · ||L_i − S_i||²

where L_i and S_i denote the generator output image and the sharp image at the i-th scale, respectively, and N_i denotes the number of pixels over all channels of the i-th scale image, i = 1, 2, 3. In the multi-scale scheme, reduced-size images are obtained by downsampling the image to three scales: the first-level scale is the original-size image, and from the second level on, each level is half the width and half the height of the previous level.

l_grad is the gradient error between the gradient images of L_i and S_i, i.e.:

l_grad = Σ_{i=1}^{3} (1/N_i) · ( ||L_i(dx) − S_i(dx)||² + ||L_i(dy) − S_i(dy)||² )

where L_i(dx) and L_i(dy) denote the horizontal and vertical gradients of L_i, and S_i(dx) and S_i(dy) denote the horizontal and vertical gradients of S_i. l_adv is the discrimination error over the generator output image set L and the corresponding sharp image set S:

l_adv = E_{s∼p(S)}[log D(s)] + E_{b∼p(B)}[log(1 − D(G(b)))]

where s∼p(S) indicates that the sharp image s is drawn from the sharp image set S, with p(S) the probability distribution of S; b∼p(B) indicates that the blurred image b is drawn from the blurred image set B, with p(B) the probability distribution of B; D(s) denotes the discriminator's output probability for input image s; G(b) denotes the result image generated by the generator from input image b; and E[·] denotes the expectation of the bracketed quantity.
S6, input the generated images and the sharp images into the discriminator, update the weight parameters of each layer by gradient-descent iteration, and continually optimize l_adv until the discriminator can no longer tell whether an input image is a generated image or a sharp image, i.e., until the difference between the output probability and 0.5 changes by less than the set threshold 0.05; discriminator training is then complete.
S7, train the generator according to the loss function l_db = l_E + α1 · l_grad + α2 · l_adv: input the blurred images into the generator, obtain the generated images by forward propagation, compare the generated images with the sharp images, and update the weight parameters of each layer by gradient-descent iteration, continually minimizing l_db until the change in the training-set loss value l_db is less than the set threshold 0.005; generator training is then complete.
S8, repeat training steps S3 to S7 until the change in the generator's training-set loss value l_db is less than 0.005 and the discriminator cannot judge whether an input image is a sharp image or a generated image; the generator and discriminator models are then considered converged, and inputting a blurred image into the generator yields the estimated deblurred image.