Disclosure of Invention
The invention aims to reduce cost through image reconstruction technology and to overcome hardware limitations by providing an effective and feasible algorithmic solution. To this end, it provides a deep residual network image super-resolution reconstruction method based on cascaded contraction and expansion, which solves the problems in the prior art, achieves better detail recovery in the super-resolved image, and offers a simple algorithm, high running speed, and strong practicability.
To achieve this purpose, the invention provides a deep residual network image super-resolution reconstruction method based on cascaded contraction and expansion, with the following specific technical scheme:
a deep residual network image super-resolution reconstruction method based on cascaded contraction and expansion comprises the following steps:
step S1, acquiring a low-resolution observation image sequence, and performing bicubic interpolation on any one image in the sequence to obtain an initial estimate of the high-resolution image;
step S2, training a deep residual neural network based on cascaded contraction and expansion on the initial estimate image, and adding the feature map produced by the trained network to the initial estimate image to recover the high-resolution image corresponding to the low-resolution image.
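Steps S1 and S2 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the bicubic interpolation uses the Keys cubic convolution kernel with a = -0.5 (a common choice that the text does not specify), and `residual_net` is a hypothetical placeholder for the trained network, here replaced by a dummy that predicts a zero residual.

```python
import math

def cubic_kernel(x, a=-0.5):
    """Keys cubic convolution kernel (a = -0.5 is an assumed, common choice)."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def upscale_1d(signal, scale):
    """Cubic interpolation of a 1-D list by an integer factor."""
    n = len(signal)
    out = []
    for i in range(n * scale):
        x = (i + 0.5) / scale - 0.5              # output position in input coordinates
        base = math.floor(x)
        value = 0.0
        for k in range(base - 1, base + 3):      # the four nearest input samples
            sample = signal[min(max(k, 0), n - 1)]   # clamp indices at the borders
            value += sample * cubic_kernel(x - k)
        out.append(value)
    return out

def bicubic_upscale(image, scale):
    """Separable bicubic upscaling of a 2-D list of lists (rows, then columns)."""
    rows = [upscale_1d(row, scale) for row in image]
    cols = [upscale_1d(list(col), scale) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def reconstruct(lr_image, scale, residual_net):
    """Step S1: initial estimate. Step S2: add the predicted residual to it."""
    initial = bicubic_upscale(lr_image, scale)
    residual = residual_net(initial)
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(initial, residual)]

def zero_net(image):
    """Hypothetical stand-in for the trained network: predicts no residual."""
    return [[0.0] * len(row) for row in image]

hr = reconstruct([[10, 10], [10, 10]], 2, zero_net)
```

With the dummy network the output is simply the bicubic estimate; in the method described above, the residual would come from the trained contraction-expansion network.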
Preferably, in step S2, training the deep residual neural network on the initial estimate image includes: first contracting the features of the initial estimate image with a contraction sub-network, then reconstructing and outputting the contracted image features through an expansion sub-network, and finally recovering the high-resolution image corresponding to the low-resolution image through a reconstruction sub-network.
Preferably, the contraction step of the contraction sub-network is as follows: the initial estimate image is input into the first-level convolutional layer of the contraction sub-network to obtain a first contraction layer C1; the first contraction layer C1 is downsampled and then passed through one convolutional layer to obtain a second contraction layer C2; the second contraction layer C2 is downsampled and then passed through one convolutional layer to obtain a third contraction layer C3; the third contraction layer C3 is downsampled and then passed through three convolutional layers to obtain a fourth contraction layer C4; and the fourth contraction layer C4 serves as the first expansion layer D1 of the expansion sub-network.
Preferably, the expansion step of the expansion sub-network is as follows: the first expansion layer D1 is upsampled and then added to the third contraction layer C3 to obtain a second expansion layer D2; the second expansion layer D2 is passed through one convolutional layer, upsampled, and then added to the second contraction layer C2 to obtain a third expansion layer D3; the third expansion layer D3 is passed through one convolutional layer, upsampled, and then added to the first contraction layer C1 to obtain a fourth expansion layer D4; the fourth expansion layer D4 is passed through one convolutional layer; and the contraction step and the expansion step are then repeated.
Preferably, the reconstruction step of the reconstruction sub-network is as follows: in the repeated expansion sub-network, the second expansion layer D2 is upsampled twice to the size of the fourth expansion layer D4 and output as a third feature map R3; the third expansion layer D3 is upsampled once to the size of the fourth expansion layer D4 and output as a second feature map R2; the fourth expansion layer D4 obtained by the expansion step is output as a first feature map R1; the first feature map R1, the second feature map R2, and the third feature map R3 are added to obtain the trained feature map; and, after a convolution operation on the trained feature map, the result is added to the initial estimate image to obtain the high-resolution image corresponding to the low-resolution image.
Preferably, downsampling in the contraction sub-network is performed by convolution with a stride of 2, upsampling in the expansion sub-network is performed by deconvolution with a stride of 2, and each layer in the network consists of 3 × 3 filters followed by an activation function.
Preferably, in the contraction sub-network, after the first contraction layer C1 and the second contraction layer C2 are downsampled, the results are connected through shortcut connections to the corresponding layer of the expansion sub-network or to the next contraction layer of the contraction sub-network.
Compared with the prior art, the invention has the following beneficial effects:
The invention discloses a deep residual network image super-resolution reconstruction method based on cascaded contraction and expansion. Bicubic interpolation is first applied to the input low-resolution image to obtain an initial estimate of the high-resolution image. Features are then contracted by the downsampling operations of the contraction sub-network, which enlarges the receptive field, and the resulting contracted features are upsampled at multiple levels by the expansion sub-network to achieve detail reconstruction and multi-level feature output. To enrich the features without increasing the computational load, the feature maps of the contraction sub-network and the expansion sub-network are combined through shortcut connections. Finally, the reconstruction sub-network fuses the features of different scales to recover the high-resolution image corresponding to the low-resolution image. Because the expansion sub-network reconstructs at multiple levels, it outputs detail textures at different levels, so the details of the low-resolution image are recovered better. Compared with other mainstream algorithms, the algorithm performs well and efficiently and takes less time to reach its best performance; its network structure is simple, its computational complexity is low, and its running speed is high, so it achieves good performance at reduced computational cost, improves the overall effect, and is highly practical.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Referring to fig. 2, the invention relates to a deep residual network image super-resolution reconstruction method based on cascaded contraction and expansion, which comprises the following steps:
1. Nonlinear fitting of neural networks
A neural network is a set of neurons connected according to certain rules. Each neuron node receives the output values of the neurons in the previous layer as its inputs and passes its own output to the next layer; the nodes of the input layer pass the input attribute values directly to the next layer (or to the output layer). In a multi-layer neural network, the output of a node in one layer and the input of a node in the next layer are related by a function called the activation function. The structure of a single neuron is shown in fig. 2 and can be described mathematically as follows:
$$y = f\left(\sum_{j=1}^{n} w_j x_j + b\right) \tag{1}$$
In equation (1), f denotes the function applied by the neuron (its activation function), $x_j$ is the j-th input of the neuron, $w_j$ is the weight of the j-th input, b is the bias term, and y is the output of a neuron with n inputs and one output. Every connection in a neural network has a weight; these weights are the parameters of the neural network model, that is, what the model needs to learn. By contrast, settings such as the connection pattern of the network, the number of layers, the number of nodes per layer, and the learning rate are set manually in advance. These manually set parameters are usually called hyper-parameters.
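As a small illustration of the single-neuron model, the sketch below computes a weighted sum of the inputs plus a bias and passes it through an activation. The function names and the choice of a sigmoid activation are assumptions made for the example only.

```python
import math

def sigmoid(z):
    """Logistic activation, used here only as an example of f."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation):
    """Single neuron: weighted sum of the inputs plus a bias, then the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Two inputs: z = 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0
y = neuron([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid)
```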
2. Forward propagation
The forward propagation algorithm of a neural network can be summarized as follows: the inputs of each layer are weighted by the corresponding connection weights and summed, and a bias term is added to the result (some neural networks omit the bias term). The output of each layer is then obtained through a nonlinear activation function such as ReLU, Sigmoid, or Tanh. Repeating this computation layer by layer yields the result of the output layer. Whatever the dimensionality, forward propagation can be expressed by the following formula:
$$a_i^{(l)} = f\left(\sum_{j} W_{ij}^{(l)} a_j^{(l-1)} + b_i^{(l)}\right)$$
where $W_{ij}^{(l)}$ represents the weight of the i-th unit of layer l for its j-th input, f represents the activation function, $a_i^{(l)}$ represents the activation output of the i-th unit of layer l (with $a^{(0)}$ being the network input), and $b_i^{(l)}$ is the bias of that unit. Forward propagation is simple because only the output of each layer needs to be computed; back propagation through the network starts after forward propagation is finished.
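The layer-by-layer computation can be sketched for a small fully connected network as follows. The layer sizes, weights, and helper names are made up for illustration; a real implementation would use an array library rather than nested lists.

```python
def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def dense_layer(inputs, weights, biases, activation):
    """weights[i][j] is the weight of unit i for its j-th input."""
    return [activation(sum(w * a for w, a in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Propagate x through a list of (weights, biases, activation) layers."""
    a = x
    for weights, biases, activation in layers:
        a = dense_layer(a, weights, biases, activation)
    return a

layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu),  # hidden layer with 2 units
    ([[2.0, 1.0]], [0.1], identity),                 # linear output layer
]
# Hidden activations: relu(3 - 1) = 2 and relu(1.5 + 0.5 - 1) = 1
output = forward([3.0, 1.0], layers)
```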
3. Back propagation
To update the parameters of the network, the gradient must be propagated backward (Back-Propagation, BP) to compute how much each parameter should change; back propagation is the concrete implementation of gradient descent on a deep network. The gradient is computed for the objective function of the deep network, and neural network optimization consists of following this gradient: parameter values are updated along the negative gradient direction to approach a minimum of the objective function (or along the gradient direction to approach a maximum). Gradient descent is the most commonly used optimization algorithm for deep learning networks.
The back propagation algorithm is commonly used to train a CNN: the loss gradient is propagated backward to each layer and the corresponding network parameters are updated, which completes one iteration; the loss for the next iteration is then computed by forward propagation, and the parameters are updated again. This is repeated until the set number of iterations or the set loss value is reached, so that the mapping learned by the trained network is as close as possible to the true target mapping. The chain rule formula for calculating the gradient is expressed as:
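One standard way to write this chain rule is sketched below. The notation is an assumption rather than the original formula: $z_i^{(l)}$ denotes the weighted input of unit i in layer l, $a_i^{(l)}$ its activation output, f the activation function, and J the cost.

```latex
z_i^{(l)} = \sum_j W_{ij}^{(l)} a_j^{(l-1)} + b_i^{(l)}, \qquad a_i^{(l)} = f\big(z_i^{(l)}\big)

\frac{\partial J}{\partial W_{ij}^{(l)}}
  = \frac{\partial J}{\partial a_i^{(l)}}\,
    \frac{\partial a_i^{(l)}}{\partial z_i^{(l)}}\,
    \frac{\partial z_i^{(l)}}{\partial W_{ij}^{(l)}}
  = \frac{\partial J}{\partial a_i^{(l)}}\, f'\big(z_i^{(l)}\big)\, a_j^{(l-1)}
```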
Suppose there is a fixed sample set containing m samples $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$, and the neural network is solved using batch gradient descent. Specifically, if the common two-norm is used to calculate the loss for a single sample (x, y), the cost function is expressed as:
$$J(W,b;x,y) = \frac{1}{2}\left\lVert h_{W,b}(x) - y \right\rVert^2$$
where $h_{W,b}(x)$ is the network output for input x. To reduce the magnitude of the weights and prevent overfitting, a regularization term is added, in which the weight decay parameter λ controls the relative importance of the two terms, and the loss function is expressed as:
$$J(W,b) = \frac{1}{m}\sum_{i=1}^{m} J\left(W,b;x^{(i)},y^{(i)}\right) + \frac{\lambda}{2}\sum_{l}\sum_{i}\sum_{j}\left(W_{ij}^{(l)}\right)^2$$
where $W_{ij}^{(l)}$ is the weight of the i-th unit of layer l for its j-th input and m is the batch size of the batch gradient descent method. The loss is recomputed at each iteration, and the aim is to find the parameters W and b that minimize the cost J(W, b). Each iteration of gradient descent updates W and b according to the following formulas:
$$W_{ij}^{(l)} := W_{ij}^{(l)} - \alpha \frac{\partial}{\partial W_{ij}^{(l)}} J(W,b)$$
$$b_{i}^{(l)} := b_{i}^{(l)} - \alpha \frac{\partial}{\partial b_{i}^{(l)}} J(W,b)$$
$$(W,b) = \left(W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}, \ldots, W^{(n)}, b^{(n)}\right) \tag{8}$$
where α is the learning rate. The step by which each parameter is updated is obtained by computing the partial derivatives of the loss function with respect to the weight W and the bias b and multiplying them by the learning rate; formula (8) is the set of parameters to be optimized, namely the weights W and the biases b. After all parameters have been updated, the back propagation algorithm returns to the forward propagation step, and the process is repeated until training is finished.
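The update loop can be illustrated on the smallest possible model, a single linear neuron y = w·x + b fitted by batch gradient descent with an optional weight-decay term. The data, learning rate, and function names below are assumptions chosen for the example, not values from this description.

```python
def cost(w, b, samples, lam):
    """Mean of the per-sample squared-error costs plus an L2 weight-decay term."""
    m = len(samples)
    data_term = sum(0.5 * (w * x + b - y) ** 2 for x, y in samples) / m
    return data_term + 0.5 * lam * w ** 2

def gradient_step(w, b, samples, lam, alpha):
    """One batch gradient descent update of w and b with learning rate alpha."""
    m = len(samples)
    grad_w = sum((w * x + b - y) * x for x, y in samples) / m + lam * w
    grad_b = sum((w * x + b - y) for x, y in samples) / m  # the bias is not decayed
    return w - alpha * grad_w, b - alpha * grad_b

samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # points on the line y = 2x + 1
w, b = 0.0, 0.0
history = [cost(w, b, samples, lam=0.0)]
for _ in range(2000):
    w, b = gradient_step(w, b, samples, lam=0.0, alpha=0.1)
    history.append(cost(w, b, samples, lam=0.0))
```

With `lam=0.0` the fit converges towards w = 2 and b = 1; a positive `lam` would pull w towards zero, which is the weight-decay effect the regularization term is meant to provide.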
A convolutional layer is typically followed by an activation layer. The activation function introduces nonlinearity, because a purely linear model cannot express complex nonlinear mappings. The output of the convolution is mapped by the activation function into the function's range; that is, the activation function implements the nonlinear mapping of the output features.
The Rectified Linear Unit (ReLU) is a common nonlinear activation function. It is linear on the positive half-axis, where the input is mapped to itself, and on the negative half-axis the function value is 0, so the derivative there is also 0. It generally performs better than the Sigmoid and Tanh functions and is the most widely used activation function today. The function is simple and very easy to differentiate during back propagation, and its mathematical expression is as follows:
$$f(x) = \max(0, x) \tag{9}$$
Owing to the nonlinear characteristics of activation functions, a neural network can realize high-level nonlinear mappings by combining them. Besides the common activation functions mentioned above, there are more advanced variants: the LeakyReLU, ELU, MaxOut, and PReLU functions extend ReLU, and the Softmax function extends the Sigmoid function to multi-class classification.
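The ReLU function and two of its variants can be written in a few lines. This is a scalar sketch for illustration; the default slope of 0.01 for LeakyReLU is a common convention assumed here, and in PReLU the slope `a` would be learned during training.

```python
def relu(x):
    """f(x) = max(0, x)."""
    return max(0.0, x)

def relu_grad(x):
    """Derivative of ReLU (taken as 0 at x = 0 by convention)."""
    return 1.0 if x > 0 else 0.0

def leaky_relu(x, slope=0.01):
    """ReLU variant with a small fixed slope for negative inputs."""
    return x if x > 0 else slope * x

def prelu(x, a):
    """Same shape as LeakyReLU, but the negative slope `a` is a learned parameter."""
    return x if x > 0 else a * x
```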
The construction of the deep residual network algorithm based on cascaded contraction and expansion is shown in fig. 3 and comprises a contraction sub-network, an expansion sub-network, and a reconstruction sub-network. The contraction and expansion sub-networks each contain four levels of feature description. First, the low-resolution image after bicubic interpolation is taken as input; the contraction sub-network enlarges the receptive field by building feature contraction through multiple levels of downsampling layers, and its contracted feature representation is passed to the expansion sub-network. The expansion sub-network builds feature fusion through multiple levels of upsampling layers corresponding to the contraction sub-network to recover image detail. The feature maps of the contraction sub-network and the expansion sub-network are combined in the reconstruction sub-network by shortcut connections to produce a multi-level reconstruction feature output.
The algorithm implementation of the invention is shown in fig. 3, where each rectangular block represents a convolutional layer, Downsampling denotes a convolution with a stride of 2 that performs the downsampling operation, and Upsampling denotes a deconvolution with a stride of 2 that performs the upsampling operation. Each layer in the network consists of 3 × 3 filters followed by an activation function. The activation function is the parametric rectified linear unit (PReLU), chosen to maximize the representational capacity of the network; it is not applied to the last layer, so that this layer keeps a linear output. Each part is described in detail as follows:
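The effect of the stride-2 operations on feature-map size can be checked with the usual output-size formulas for convolution and transposed convolution. The padding values below are assumptions (the text only fixes the 3 × 3 kernel and the stride of 2); they are chosen so that each downsampling halves the size and each upsampling doubles it.

```python
def conv_out_size(n, kernel=3, stride=2, padding=1):
    """Spatial output size of a strided convolution."""
    return (n + 2 * padding - kernel) // stride + 1

def deconv_out_size(n, kernel=3, stride=2, padding=1, output_padding=1):
    """Spatial output size of a transposed convolution (deconvolution)."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding

# Three downsampling steps (C1 -> C2 -> C3 -> C4) on a 64 x 64 block ...
down = [64]
for _ in range(3):
    down.append(conv_out_size(down[-1]))

# ... and three upsampling steps (D1 -> D2 -> D3 -> D4) back to full size.
up = [down[-1]]
for _ in range(3):
    up.append(deconv_out_size(up[-1]))
```

Because the sizes on the way up mirror those on the way down, each expansion layer can be added element-wise to the contraction layer of the same level.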
(1) Contraction sub-network
Constructing the contraction sub-network. Its structure is shown in the left half of fig. 3. Each contraction is a downsampling operation implemented by a convolutional layer with a stride of 2. First, the initial estimate image is input into the first-level convolutional layer of the contraction sub-network to obtain a first contraction layer C1; the first contraction layer C1 is downsampled and then passed through one convolutional layer to obtain a second contraction layer C2; the second contraction layer C2 is downsampled and then passed through one convolutional layer to obtain a third contraction layer C3; the third contraction layer C3 is downsampled and then passed through three convolutional layers to obtain a fourth contraction layer C4; and the fourth contraction layer C4 serves as the first expansion layer D1 of the expansion sub-network.
(2) Expansion sub-network
Constructing the expansion sub-network. Its structure is shown in the right half of the left side of fig. 3. Each expansion is an upsampling operation implemented by deconvolution with a stride of 2, which gradually restores the spatial resolution of the feature map. First, the first expansion layer D1 is upsampled and then added to the third contraction layer C3 to obtain a second expansion layer D2; the second expansion layer D2 is passed through one convolutional layer, upsampled, and then added to the second contraction layer C2 to obtain a third expansion layer D3; the third expansion layer D3 is passed through one convolutional layer, upsampled, and then added to the first contraction layer C1 to obtain a fourth expansion layer D4; the fourth expansion layer D4 is passed through one convolutional layer, and the contraction step and the expansion step are repeated. In the repeated pass, the additions during upsampling in the first expansion sub-network use only the blocks of corresponding size from the downsampling path of the first contraction sub-network, whereas the additions during upsampling in the second expansion sub-network use both the blocks of corresponding size from the second contraction sub-network and those from the first contraction sub-network.
(3) Reconstruction sub-network
In the repeated expansion sub-network, the second expansion layer D2 is upsampled twice to the size of the fourth expansion layer D4 and output as a third feature map R3; the third expansion layer D3 is upsampled once to the size of the fourth expansion layer D4 and output as a second feature map R2; the fourth expansion layer D4 obtained by the expansion step is output as a first feature map R1. The first feature map R1, the second feature map R2, and the third feature map R3 are added to obtain the trained feature map, and, after a convolution operation on the trained feature map, the result is added to the initial estimate image to obtain the high-resolution image corresponding to the low-resolution image.
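The wiring of the three sub-networks can be traced with the following sketch. It only illustrates the data flow: the learned layers are replaced by hypothetical stand-ins (`conv` is an identity, `down` is 2 × 2 average pooling, `up` is nearest-neighbour upsampling), feature maps are single-channel lists of lists, and the way the second pass reuses the first pass's contraction layers is one reading of the description, not a confirmed detail.

```python
def conv(x):
    """Stand-in for a learned 3x3 convolution + activation (identity here)."""
    return [row[:] for row in x]

def down(x):
    """Stand-in for a stride-2 convolution: 2x2 average pooling."""
    return [[(x[2*i][2*j] + x[2*i][2*j+1] + x[2*i+1][2*j] + x[2*i+1][2*j+1]) / 4
             for j in range(len(x[0]) // 2)] for i in range(len(x) // 2)]

def up(x):
    """Stand-in for a stride-2 deconvolution: nearest-neighbour 2x upsampling."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out += [wide, wide[:]]
    return out

def add(*maps):
    """Element-wise sum of equally sized feature maps."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*maps)]

def contract(x):
    c1 = conv(x)
    c2 = conv(down(c1))
    c3 = conv(down(c2))
    c4 = conv(conv(conv(down(c3))))  # three convolutional layers at the lowest level
    return c1, c2, c3, c4

def expand(c4, skips):
    """skips: list of (c1, c2, c3) tuples added at the matching resolution."""
    d1 = c4
    d2 = add(up(d1), *[s[2] for s in skips])
    d3 = add(up(conv(d2)), *[s[1] for s in skips])
    d4 = add(up(conv(d3)), *[s[0] for s in skips])
    return d2, d3, d4

def reconstruct(initial):
    first = contract(initial)
    _, _, d4 = expand(first[3], [first[:3]])                  # first pass
    second = contract(conv(d4))
    d2, d3, d4 = expand(second[3], [second[:3], first[:3]])   # repeated pass
    r1, r2, r3 = d4, up(d3), up(up(d2))                       # multi-level outputs
    features = conv(add(r1, r2, r3))
    return add(features, initial)                             # global residual connection

image = [[float(i + j) for j in range(8)] for i in range(8)]
result = reconstruct(image)
```

The input side length must be divisible by 8 here so that three halvings and three doublings return to the original size.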
Training and testing of the deep residual network algorithm based on cascaded contraction and expansion
(1) Algorithm training
The reconstruction algorithm supports image reconstruction at multiple scale factors. The advantage of a multi-scale model is that all network parameters are shared across magnifications. Like the VDSR, EDSR, and DRNN methods, the model can reconstruct HR images at magnification factors of 2, 3, and 4. Compared with algorithms that must train a separate model for each scale factor, this greatly reduces the number of network parameters and the size of the image reconstruction model in practical applications. To train the network well, the proposed network is trained on a large image reconstruction dataset, the DIV2K dataset, which contains 1000 high-resolution training and validation images. The color images in all datasets are converted from RGB space to YCbCr space; only the luminance component is extracted for training and testing, and the color components are enlarged by bicubic interpolation. The network uses the mean squared error (MSE) as the loss function, with a constraint on the parameters added to prevent overfitting:
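A common form of such a regularized MSE loss is sketched below. The 1/n averaging and the squared two-norm penalty are assumptions made to match the symbols θ, n, $y^{(i)}$, $x^{(i)}$, $Y(x^{(i)})$, and λ used in this description; the exact constants are not confirmed by the text.

```latex
L(\theta) = \frac{1}{n}\sum_{i=1}^{n}
  \left\lVert y^{(i)} - Y\big(x^{(i)}\big) \right\rVert_2^2
  + \lambda \lVert \theta \rVert_2^2
```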
where θ denotes the network parameters, n is the number of images in the training set, $y^{(i)}$ is the HR image, $x^{(i)}$ is the LR image, $Y(x^{(i)})$ is the network output, and λ is the regularization coefficient. This part is carried out as follows: the training image set is degraded by blurring with random Gaussian kernels and then downsampled to obtain multiple LR images, which form training image pairs with the corresponding HR images of the training set, and the training set is expanded by data augmentation; the resulting LR and HR images are fed into the contraction-expansion deep residual network to train the reconstruction network; finally, the reconstructed luminance channel and the interpolated color channels are converted back into RGB space.
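The color-space handling can be sketched per pixel as follows. The text does not say which YCbCr variant is used; this example assumes the full-range BT.601 (JFIF) definition for 8-bit values, whereas some super-resolution code bases use a studio-range variant with different constants.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JFIF) conversion for values in [0, 255]."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Inverse of rgb_to_ycbcr (up to rounding of the published constants)."""
    r = y + 1.402 * (cr - 128.0)
    g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    return r, g, b

# Only the luminance y would be passed through the network; cb and cr are
# upscaled by bicubic interpolation and recombined with the reconstructed y.
y, cb, cr = rgb_to_ycbcr(200.0, 100.0, 50.0)
restored = ycbcr_to_rgb(y, cb, cr)
```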
In data preprocessing, the training images are typically cropped into blocks of 64 × 64 pixels to speed up training. The training data is augmented in two ways: randomly rotating the image by 90°, 180°, or 270°, and flipping the image horizontally and vertically. Because the images in the DIV2K dataset are large, only one augmentation is randomly selected per image, which expands the training set by a factor of 2. Except for the last layer, each layer consists of 64 convolution filters of size 3 × 3 and one activation function. The filters are initialized with the He-normal method. During training, the network is trained with a batch gradient descent algorithm, with 64 image blocks per batch. Training also uses a learning rate decay strategy: the initial learning rate is set to 0.0001 and decays to one-tenth every 15 epochs, for a total of four decays by the end of the 60 epochs of training. All experiments were carried out on the TensorFlow platform in a Python 3.6 environment.
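Two of these training details, the step learning-rate decay and the random choice of a single augmentation, can be sketched in plain Python. The helper names are hypothetical, and image blocks are represented as lists of lists; counting epochs from 0, the fourth decay in this sketch falls at epoch 60, that is, at the end of training.

```python
import random

def learning_rate(epoch, initial=1e-4, factor=0.1, step=15):
    """Step decay: multiply the rate by `factor` every `step` epochs."""
    return initial * factor ** (epoch // step)

def rotate90(img):
    """Rotate a 2-D list of lists by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def flip_horizontal(img):
    return [row[::-1] for row in img]

def flip_vertical(img):
    return [row[:] for row in img[::-1]]

def augment(img, rng=random):
    """Apply one randomly chosen rotation or flip to an image block."""
    ops = [
        rotate90,
        lambda x: rotate90(rotate90(x)),
        lambda x: rotate90(rotate90(rotate90(x))),
        flip_horizontal,
        flip_vertical,
    ]
    return rng.choice(ops)(img)

block = [[1, 2], [3, 4]]
augmented = augment(block, random.Random(0))
schedule = [learning_rate(epoch) for epoch in range(60)]
```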
(2) Algorithm testing
To further illustrate the effectiveness of the algorithm of the invention, the method is compared with other image reconstruction methods using two commonly used objective image quality metrics: the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM), which measures the similarity of two images.
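PSNR follows directly from the mean squared error between the reference image and the reconstructed image. The sketch below assumes 8-bit luminance values (peak value 255) stored as lists of lists; SSIM is more involved and is not shown.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized 2-D lists."""
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def psnr(reference, test, max_value=255.0):
    """PSNR in dB; infinite when the two images are identical."""
    err = mse(reference, test)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)

reference = [[100, 100], [100, 100]]
reconstructed = [[101, 99], [100, 100]]   # MSE = (1 + 1 + 0 + 0) / 4 = 0.5
score = psnr(reference, reconstructed)
```

A higher PSNR means the reconstruction is closer to the reference in the mean-squared sense.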
The compared methods, for which test code has been published by their respective authors, are Bicubic, SRCNN, VDSR, DRCN, LapSRN, ARN, and DWSR. The proposed method and these existing image reconstruction methods are tested on the BSD100 and Urban100 evaluation datasets, using the image obtained by bicubic interpolation as the initial estimate, and the PSNR and SSIM values of the reconstructed images are computed on the luminance channel to objectively evaluate the performance of the image reconstruction algorithms. Table 1 gives the average PSNR and SSIM values of the 7 reconstruction algorithms on the 2 benchmark datasets. As the data in table 1 show, the PSNR values of the reconstruction algorithm on the four public image reconstruction evaluation datasets are higher than those of the other mainstream algorithms, the PSNR and SSIM values obtained are better, and the overall image quality is improved.
TABLE 1 PSNR and SSIM comparison of the algorithm of the present invention with current mainstream algorithms in different data sets
The comparison results on the public test datasets are shown in fig. 4, where the detail regions of the reconstructed images are enlarged. The reconstruction algorithm of the invention reconstructs detail textures and similar information better, achieving better detail recovery in the super-resolved image, and is highly practical.
Fig. 5 shows the relation between the average PSNR and the average test time on the Urban100 test dataset at 4× magnification, comparing the running time of the reconstruction algorithm of the invention with that of four deep learning algorithms: VDSR, DRCN, LapSRN, and DWSR. Compared with the other algorithms, the algorithm of the invention takes less time and its PSNR is clearly higher, particularly in the SRCNN case. The algorithm thus obtains better performance while reducing the computational cost; it has low computational complexity, a high running speed, and a simple algorithm and network structure.
The above embodiments only describe preferred modes of the invention and do not limit its scope. Various modifications and improvements made to the technical solution of the invention by those skilled in the art, without departing from the spirit of the invention, fall within the protection scope determined by the claims of the invention.