CN117455812A - Video restoration method and system - Google Patents

Video restoration method and system

Info

Publication number
CN117455812A
Authority
CN
China
Prior art keywords
representing
video
frame
optical flow
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311504674.0A
Other languages
Chinese (zh)
Other versions
CN117455812B (en)
Inventor
沈君华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Zhonglu Culture Communication Co ltd
Original Assignee
Zhejiang Zhonglu Culture Communication Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Zhonglu Culture Communication Co ltd
Priority to CN202311504674.0A
Publication of CN117455812A
Application granted
Publication of CN117455812B
Legal status: Active (current)
Anticipated expiration


Abstract

The invention discloses a video restoration method and system, belonging to the technical field of image data processing. The method comprises the following steps: acquiring video data; constructing a defect area detection model and detecting the defect area in each video frame through the model; extracting the optical flow features of each video frame through an optical flow extraction algorithm; extracting the local features of each video frame through a convolutional neural network; extracting the global features of each video frame through a long short-term memory network; carrying out feature fusion on the optical flow features, the local features and the global features to obtain fusion features; detecting the defect area in each video frame according to the fusion features; constructing a video repair model and repairing the defect area through the video repair model; detecting whether an undamaged induced image frame exists among the adjacent frames; repairing the defect area according to the induced image frame; and repairing the defect area through a generative adversarial network.

Description

Video restoration method and system
Technical Field
The invention belongs to the technical field of image data processing, and particularly relates to a video restoration method and system.
Background
As one of the mainstream media carriers in modern society, video may become blurred, damaged or incomplete during shooting, storage and transmission owing to factors such as hardware equipment, imaging technology, motion blur, ambient light and atmospheric particulate matter, and the original video often needs to be repaired using video restoration technology.
However, the existing video restoration technology mainly uses linear or nonlinear interpolation to fill in the missing frames of a video: information from adjacent frames is interpolated to fill the missing areas, which easily causes image artifacts, distortion and discontinuity, especially in heavily damaged areas.
With the rapid development of pattern recognition, machine vision and deep learning, and the urgent need for video restoration, more and more modern technologies are being applied to video repair.
At present, generative adversarial networks have also been applied to video restoration. Their main principle is a game between a generator and a discriminator, and the video is finally restored by having the generator produce high-quality images. However, generative adversarial networks require a large amount of computing resources, resulting in high video restoration cost and low restoration efficiency.
Disclosure of Invention
In order to solve the technical problems that the existing method uses linear or nonlinear interpolation to fill in missing frames, which easily causes image artifacts, distortion and discontinuity, and that methods based on generative adversarial networks require a large amount of computing resources, leading to high video repair cost and low repair efficiency, the invention provides a video repair method and a video repair system.
First aspect
The invention provides a video restoration method, which comprises the following steps:
S1: acquiring video data;
S2: constructing a defect area detection model, and detecting the defect area in each video frame through the defect area detection model; the step S2 specifically comprises the following steps:
S201: extracting the optical flow features of each video frame through an optical flow extraction algorithm;
S202: extracting the local features of each video frame through a convolutional neural network;
S203: extracting the global features of each video frame through a long short-term memory network;
S204: performing feature fusion on the optical flow features, the local features and the global features to obtain fusion features;
S205: detecting the defect area in each video frame according to the fusion features;
S3: constructing a video repair model, and repairing the defect area through the video repair model; the step S3 specifically comprises the following steps:
S301: detecting whether an undamaged induced image frame exists among the adjacent frames; if so, executing S302, otherwise executing S304;
S302: repairing the defect area according to the induced image frame;
S303: calculating the image quality score of the repaired video frame, and executing S304 when the image quality score of the repaired video frame is lower than a preset score;
S304: repairing the defect area through a generative adversarial network.
Second aspect
The invention provides a video repair system, which comprises a processor and a memory for storing instructions executable by the processor; the processor is configured to invoke the instructions stored by the memory to perform the video repair method of the first aspect.
Compared with the prior art, the invention has at least the following beneficial technical effects:
(1) In the invention, when an undamaged induced image frame exists among the adjacent frames, that frame is preferentially used to repair the defect area; a generative adversarial network is used for repair only if the required repair quality is not reached. This saves computing resources to a certain extent, reduces the video repair cost and improves the repair efficiency; the defective frames in the video no longer need to be filled by linear or nonlinear interpolation, and because the repair quality is monitored, image artifacts, distortion and discontinuity are avoided.
(2) In the invention, the optical flow characteristics, the local characteristics and the global characteristics are comprehensively considered, the characteristics of the video frame are more comprehensively evaluated, the defect area is automatically determined, and meanwhile, the accuracy of detecting the defect area of the video is improved.
Drawings
The above features, technical features, advantages and implementation of the present invention will be further described in the following description of preferred embodiments with reference to the accompanying drawings in a clear and easily understood manner.
Fig. 1 is a schematic flow chart of a video restoration method provided by the invention.
Fig. 2 is a schematic structural diagram of a video repair system according to the present invention.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will explain the specific embodiments of the present invention with reference to the accompanying drawings. It is evident that the drawings in the following description are only examples of the invention, from which other drawings and other embodiments can be obtained by a person skilled in the art without inventive effort.
For simplicity of the drawing, only the parts relevant to the invention are schematically shown in each drawing, and they do not represent the actual structure thereof as a product. Additionally, in order to simplify the drawing for ease of understanding, components having the same structure or function in some of the drawings are shown schematically with only one of them, or only one of them is labeled. Herein, "a" means not only "only this one" but also "more than one" case.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
In this context, it should be noted that, unless otherwise explicitly stated and defined, the terms "mounted", "connected" and "coupled" are to be construed broadly: the connection may be fixed, detachable or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediate medium, or an internal communication between two elements. The specific meaning of the above terms in the present invention will be understood by those of ordinary skill in the art according to the specific circumstances.
In addition, in the description of the present invention, the terms "first," "second," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Example 1
In one embodiment, referring to fig. 1 of the specification, a schematic flow chart of a video restoration method provided by the present invention is shown.
The invention provides a video restoration method, which comprises the following steps:
S1: video data is acquired.
S2: and constructing a defect area detection model, and detecting the defect area in each video frame through the defect area detection model.
In one possible implementation, S2 specifically includes substeps S201 to S205:
s201: and extracting optical flow characteristics of each video frame through an optical flow extraction algorithm.
Wherein the optical flow features are computer vision features describing pixel displacement between adjacent video frames. Reflecting the motion information of objects in the video, usually expressed in the form of optical fields. The optical flow field is an image that contains a motion displacement vector for each pixel, where the displacement vector for each pixel represents the displacement of the pixel from one frame to another.
Specifically, the optical flow extraction algorithm includes: a Lucas-Kanade optical flow extraction algorithm, a Horn-Schunck optical flow extraction algorithm and a Farnesback optical flow extraction algorithm.
In one possible implementation, the present invention proposes a completely new optical flow extraction algorithm, and the substep S201 specifically includes grandchild steps S2011 to S2013:
s2011: introducing smoothness constraint on the basis of an optical flow basic equation to construct an optical flow extraction algorithm.
Wherein, the optical flow basic equation is expressed as:
Where ζ represents the optical flow fundamental constraint parameter, I represents the gray value at the pixel point (x, y), and (x, y) represents the pixel point coordinates, and t represents time.
Wherein the smoothness constraint is expressed as:
wherein ζ represents the smoothness constraint parameter.
Specifically, by introducing smoothness constraint on the optical flow basic equation, continuity and consistency between pixel points in an image can be better processed, noise and instability possibly occurring in optical flow estimation can be reduced, and optical flow accuracy is improved.
S2012: constructing an optical flow extraction objective function:
f_1(u, v) = min L = min { ∫∫ [ α·ζ^2 + (1−α)·ξ^2 ] dx dy }
wherein f_1( ) represents the optical flow extraction objective function, (u, v) represents the displacement vector at the pixel point (x, y), L represents the optical flow extraction target term, ζ represents the smoothness constraint parameter, ξ represents the optical flow basic constraint parameter, and α represents the weight coefficient of the smoothness constraint parameter.
The size of the weight coefficient α of the smoothness constraint parameter can be set by a person skilled in the art according to the actual situation, and the invention does not limit it.
S2013: With the goal of minimizing the function value of the optical flow extraction objective function, solving the optical flow extraction target term by means of the Euler-Lagrange equation to obtain the displacement vector (u, v) of every pixel point, and summarizing the results to obtain the optical flow features of each video frame.
In particular, solving the objective function using the Euler-Lagrangian equation is a common optimization method that can help find the minimum of the objective function, i.e., find the appropriate displacement vector (u, v), to best describe the pixel displacement in the image.
According to the method, smoothness constraint is introduced into an optical flow extraction algorithm, and an optical flow extraction objective function is constructed, so that the accuracy and stability of optical flow estimation can be improved, better understanding of motion information in an image is facilitated, and video restoration effect is improved.
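As an editorial illustration (not part of the original disclosure), the Python sketch below shows the kind of iterative solution this step describes: a data term ξ and a smoothness term ζ are balanced by the weight α, and the flow field (u, v) is refined with classic Horn-Schunck style fixed-point updates of the Euler-Lagrange equations. The function name, the averaging kernel and the parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

def extract_optical_flow(I1, I2, alpha=0.5, n_iter=100):
    """Estimate a dense flow field (u, v) between two grayscale frames.

    Minimal sketch of a variational optical flow solver: the data term
    xi = Ix*u + Iy*v + It is balanced against a smoothness term on (u, v)
    by the weight alpha, and the resulting Euler-Lagrange equations are
    solved by fixed-point iteration.
    """
    I1 = I1.astype(np.float64)
    I2 = I2.astype(np.float64)
    Ix = np.gradient(I1, axis=1)          # spatial derivative in x
    Iy = np.gradient(I1, axis=0)          # spatial derivative in y
    It = I2 - I1                          # temporal derivative

    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    # Kernel approximating the local average of the neighbouring flow vectors.
    avg = np.array([[1, 2, 1], [2, 0, 2], [1, 2, 1]], dtype=np.float64) / 12.0
    # Larger alpha -> stronger smoothness; reg plays the role of alpha / (1 - alpha).
    reg = alpha / max(1e-8, 1.0 - alpha)

    for _ in range(n_iter):
        u_avg = convolve(u, avg, mode="nearest")
        v_avg = convolve(v, avg, mode="nearest")
        # Residual of the optical flow basic equation at the averaged flow.
        xi = Ix * u_avg + Iy * v_avg + It
        denom = reg + Ix ** 2 + Iy ** 2
        u = u_avg - Ix * xi / denom
        v = v_avg - Iy * xi / denom
    return u, v
```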
S202: and extracting local characteristics of each video frame through a convolutional neural network.
Wherein, a convolutional neural network (Convolutional Neural Network, CNN) is a deep learning architecture dedicated to machine learning tasks that process and analyze data with a grid structure.
In one possible implementation, the substep S202 specifically includes grandchild steps S2021 to S2024:
s2021: video data is input.
S2022: extracting data characteristics of video data:
x_j^l = f_c( Σ_{i∈M_j} x_i^{l−1} * k_{ij}^l + b_j^l )
wherein x_j^l represents the output of the j-th channel of the current convolution layer, x_i^{l−1} represents the output of the i-th convolution kernel in the j-th channel of the previous convolution layer, k_{ij}^l represents the convolution kernel weights of the current convolution layer, b_j^l represents the bias term of the current convolution layer, M_j represents the selected input feature maps, and f_c( ) represents the convolution layer activation function.
Specifically, in convolutional neural networks, different features are detected by multiple convolutional kernels, each of which detects a different feature, resulting in a feature map of multiple channels, each channel corresponding to a different feature. These multi-channel feature maps can provide more information to help the system better understand the content in the video.
S2023: performing dimension reduction compression on the features extracted by the convolution layer:
x_j^l = f_p( β_j^l · f_down(x_j^{l−1}) + b_j^l )
wherein x_j^l represents the output of the j-th channel of the current pooling layer, f_p( ) represents the pooling layer activation function, β_j^l represents the multiplicative bias of the current pooling layer, f_down( ) represents the downsampling function, x_j^{l−1} represents the output of the j-th channel of the previous layer, and b_j^l represents the additive bias of the current pooling layer.
In particular, the feature is reduced and compressed in the pooling layer, so that the computational burden is reduced, the computational efficiency is improved, the feature dimension can be effectively reduced by reducing the spatial resolution of the feature, and the complexity of subsequent processing is reduced.
S2024: and summarizing the output of the pooling layer to obtain the local characteristics of the video data.
Specifically, the local features are rolled and pooled to be summarized into a local feature representation of the video data. The local feature representation will contain the primary local feature information in the video frame, helping the subsequent steps to better understand the video content and structure.
In the invention, the convolutional neural network is used for extracting the local characteristics of the video frame, so that the system can be helped to better understand the details and the structure of the video, and the accuracy and the efficiency of the video repair task are improved.
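As an editorial illustration (not taken from the patent), the following sketch shows per-frame local feature extraction with convolution and pooling layers as described in S202; the layer sizes and activation choices are assumptions for the example.

```python
import torch
import torch.nn as nn

class LocalFeatureExtractor(nn.Module):
    """Minimal sketch of S202: convolution layers detect local patterns and
    pooling layers downsample them, mirroring x_j = f_c(sum_i x_i * k_ij + b_j)
    followed by x_j = f_p(beta_j * f_down(x_j) + b_j)."""

    def __init__(self, in_channels: int = 3, feat_channels: int = 32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, feat_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),            # f_c, the convolution-layer activation
            nn.MaxPool2d(kernel_size=2),      # f_down, spatial down-sampling
            nn.Conv2d(feat_channels, feat_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((8, 8)),     # fixed-size summary of each frame
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, channels, height, width) -> (batch, feat_channels, 8, 8)
        return self.features(frames)

# Usage: local features for a batch of 4 RGB frames of size 128 x 128.
extractor = LocalFeatureExtractor()
local_feats = extractor(torch.randn(4, 3, 128, 128))
print(local_feats.shape)  # torch.Size([4, 32, 8, 8])
```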
S203: and extracting global features of each video frame through a long-short-time memory network.
Wherein Long Short-Term Memory (LSTM) is a variant of recurrent neural network (Recurrent Neural Network, RNN) and aims to solve the problems of gradient disappearance and gradient explosion of RNN when processing Long sequence data.
It should be noted that video data generally includes time-related information, such as movement, motion, continuity, and the like of an object. LSTM networks can effectively capture and model this time dependence. With LSTM, the network can remember previous frames and use this information in subsequent frames to better understand the global context of the video.
In one possible implementation, the substep S203 specifically includes grandchild steps S2031 to S2033:
s2031: a sequence of video frames of video data is input.
S2032: extracting hidden states h of each video frame, wherein the hidden states include forward hidden statesAnd a backward hidden state
It =Sigmoid(WXI Xt +WHI ht-1 +bI )
Ft =Sigmoid(WXF Xt +WHF ht-1 +bF )
Ot =Sigmoid(WXO Xt +WHO ht-1 +bO )
C't =tanh(WXC Xt +WHC ht-1 +bC )
Ct =Ft ·Ct-1 +It ·C't
ht =Ot ·tanh(Ct )
Wherein I ist An activation output vector representing an input gate at time t, sigmoid () representing a Sigmoid activation function, WXI Representing a weight matrix between word sequences and input gates, WHI Representing a weight matrix between hidden states and input gates, bI Representing the bias term of the input gate, Ft An activation output vector of a forgetting gate at the time t is represented by WXF Weight matrix between word sequence and forgetting gate, WHF A weight matrix representing the hidden state and forgetting gate, bF Indicating the forgetting of the bias term of the door, Ot An activation output vector W representing an output gate at time tXO Representing a weight matrix between word sequences and output gates, WHO Representing a weight matrix between hidden states and output gates, Ct An activation output vector, C ', representing the cell memory cell at time t't Indicating t-time cell storageCandidate output vector of cell, Ct-1 Representing the activation output vector of the cell memory unit at time t-1, and tanh () represents tanh activation function, WXC Representing a weight matrix between word sequences and cell storage units, WHC Representing a weight matrix between hidden states and cell storage units, bC Bias term, h, representing cell memory cellt Represents the hidden state at the time t, ht-1 The hidden state at time t-1 is indicated.
S2033: integrating the forward hidden state and the backward hidden state to obtain a comprehensive hidden state which is used as the global feature of each video frame:
wherein H ist Represents the comprehensive hidden state at the time t, Wtf Representing the forward weight matrix at time t,indicating the forward hidden state at time t, Wtb A backward weight matrix representing the time t, < +.>And represents the backward hidden state at the time t.
It should be noted that integrating the forward and backward hidden states makes the comprehensive hidden state more comprehensive, allowing the model to take into account the context of the video frame, capturing the temporal features.
In the invention, the LSTM is used for extracting the global characteristics of the video frames, which is helpful for better understanding video content, including time dependence, motion information and continuity, so as to improve performance in tasks such as video repair.
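As an editorial illustration (not part of the original disclosure), the sketch below runs a bidirectional LSTM over a per-frame feature sequence and combines the forward and backward hidden states with learned weights, following H_t = W_tf·h_t^f + W_tb·h_t^b; the feature dimensions are assumptions.

```python
import torch
import torch.nn as nn

class GlobalFeatureExtractor(nn.Module):
    """Minimal sketch of S203: a bidirectional LSTM over the frame sequence,
    with the forward and backward hidden states merged into one state per frame."""

    def __init__(self, frame_feat_dim: int = 256, hidden_dim: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(frame_feat_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.w_forward = nn.Linear(hidden_dim, hidden_dim, bias=False)   # W_tf
        self.w_backward = nn.Linear(hidden_dim, hidden_dim, bias=False)  # W_tb

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, num_frames, frame_feat_dim)
        out, _ = self.lstm(frame_feats)        # (batch, num_frames, 2 * hidden_dim)
        h_fwd, h_bwd = out.chunk(2, dim=-1)    # split forward / backward states
        return self.w_forward(h_fwd) + self.w_backward(h_bwd)  # H_t for every frame

# Usage: global features for a clip of 16 frames, each described by a 256-d vector.
model = GlobalFeatureExtractor()
H = model(torch.randn(2, 16, 256))
print(H.shape)  # torch.Size([2, 16, 128])
```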
S204: and carrying out feature fusion on the light flow features, the local features and the global features to obtain fusion features.
In one possible implementation, S204 is specifically: and carrying out feature fusion on the light flow features, the local features and the global features according to the following formula to obtain fusion features:
S=β1 ·s12 ·s23 ·s3
wherein S represents a fusion feature, S1 Representing the features of the optical flow, beta1 Weighting coefficients, s, representing optical flow characteristics2 Representing local features, beta2 Weighting coefficients, s, representing local features3 Representing global features, beta3 Weight coefficients representing global features.
Wherein, the person skilled in the art can set the weight coefficient beta of the optical flow characteristic according to the actual situation1 Weighting coefficient beta of local feature2 And the weighting coefficient beta of the global feature3 The size of (3) is not limited in the present invention.
In the invention, feature fusion allows the model to benefit from different feature sources, and improves the comprehensive performance, adaptability and robustness of the model.
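As an editorial illustration, the weighted fusion S = β_1·s_1 + β_2·s_2 + β_3·s_3 amounts to a single line of array arithmetic; the weights in the sketch are illustrative assumptions.

```python
import numpy as np

def fuse_features(s1, s2, s3, beta=(0.4, 0.3, 0.3)):
    """Sketch of S204: per-pixel weighted sum of the optical flow, local and
    global feature maps; the weight coefficients beta are left to the
    practitioner by the patent and are chosen arbitrarily here."""
    b1, b2, b3 = beta
    return b1 * np.asarray(s1) + b2 * np.asarray(s2) + b3 * np.asarray(s3)
```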
S205: and detecting a defect area in each video frame according to the fusion characteristics.
In one possible implementation, the substep S205 specifically includes grandchild steps S2051 to S2053:
s2051: according to the fusion characteristics, defect detection values of all pixel points are calculated:
Cij =Softmax(W·Sij +B)
wherein C isij Representing pixel points (x)i ,yj ) Defect detection value at Softmax () represents Softmax activation function, Sij Representing pixel points (x)i ,yj ) And the fusion characteristic value is represented by W, wherein W represents a weight coefficient and B represents a bias parameter.
S2052: and when the defect detection value is larger than a preset value, determining the pixel point as a defect pixel point.
S2053: and combining each defective pixel point into a defective area.
In the invention, the optical flow characteristics, the local characteristics and the global characteristics are comprehensively considered, the characteristics of the video frame are more comprehensively evaluated, the defect area is automatically determined, and meanwhile, the accuracy of detecting the defect area of the video is improved.
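As an editorial illustration (not from the patent), the sketch below scores every pixel with Softmax(W·S_ij + B), thresholds the defect probability, and leaves region grouping to a standard connected-component step; W, B and the threshold are assumptions.

```python
import numpy as np

def detect_defect_mask(fused, W, B, threshold=0.5):
    """Sketch of S205: per-pixel softmax defect score over the fused features.

    fused: (H, W_img, D) fused feature map; W: (2, D) weights; B: (2,) bias.
    Returns a boolean mask of pixels whose defect probability exceeds the threshold.
    """
    logits = fused @ W.T + B                       # (H, W_img, 2)
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    defect_prob = probs[..., 1]                    # probability of the "defect" class
    return defect_prob > threshold

# Neighbouring defect pixels can then be combined into defect areas, e.g. with
# scipy.ndimage.label(mask) (a standard connected-component routine).
```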
In one possible embodiment, the training method of the defect area detection model includes:
constructing a loss function of a defect area detection model:
L(θ) = λ·L_dice + (1−λ)·L_IoU
where L( ) represents the loss function, θ represents the model parameter set of the defect area detection model, θ = [α, β_1, β_2, β_3, W, B], L_dice represents the Dice loss, λ represents the weight coefficient of the Dice loss, and L_IoU represents the IoU loss.
The size of the weight coefficient λ of the Dice loss can be set by a person skilled in the art according to the actual situation, and the invention does not limit it.
In the present invention, both the Dice loss and the IoU loss are used to measure the degree of overlap between the predicted result and the real label. The Dice loss focuses on accuracy, while the IoU loss focuses on recall. By using both losses, the model more fully takes accuracy and recall into account during training, so as to better accommodate various detection tasks.
Wherein, the Dice loss is specifically:
L_dice = 1 − 2·Σ_i(y_i·ŷ_i) / (Σ_i y_i + Σ_i ŷ_i)
wherein y_i represents the true label of the i-th sample, ŷ_i represents the prediction result of the i-th sample, i = 1, 2, …, N, and N represents the total number of samples.
Wherein, the IoU loss is specifically:
L_IoU = 1 − Σ_i(y_i·ŷ_i) / (Σ_i y_i + Σ_i ŷ_i − Σ_i(y_i·ŷ_i))
and training the defect area detection model by taking the minimum function value of the loss function of the defect area detection model as a target.
In the invention, the comprehensive use of the Dice loss and IoU loss to construct the loss function is helpful to improve the performance of the defect area detection model, so that the defect area detection model has better performance in the aspects of accuracy, recall rate, unbalance data adaptation and the like.
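As an editorial illustration, the combined objective can be written as a single differentiable loss; the soft Dice/IoU forms below are the standard ones and are an assumption about the patent's exact definitions.

```python
import torch

def detection_loss(pred, target, lam=0.5, eps=1e-6):
    """Sketch of L(theta) = lam * L_dice + (1 - lam) * L_iou for the defect
    area detection model.

    pred: predicted defect probabilities in [0, 1]; target: binary ground truth.
    """
    pred = pred.reshape(-1)
    target = target.reshape(-1)
    inter = (pred * target).sum()
    dice = 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    union = pred.sum() + target.sum() - inter
    iou = 1 - (inter + eps) / (union + eps)
    return lam * dice + (1 - lam) * iou
```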
In one possible implementation manner, training the defect area detection model with the goal of minimizing the function value of its loss function specifically includes:
Initializing the population Q, the initial temperature T_0, the maximum number of iterations m and the termination temperature T_m, the population Q comprising a plurality of individuals X, each individual X representing a feasible model parameter set θ, θ = [α, β_1, β_2, β_3, W, B];
Calculating the fitness value of each individual, and determining the food position and the natural enemy position of the population Q, wherein the fitness value is calculated as follows:
δ_i = 1 / L_i
wherein δ_i represents the fitness value of the i-th individual, and L_i represents the function value of the loss function when the model parameter set of the i-th individual is used;
it should be noted that, taking the inverse of the loss function as the fitness function can facilitate subsequent calculation and optimization.
Performing a mutation operation on the individual X to generate a new individual X_new:
wherein X_new represents the new individual, X represents the target individual, X_max represents the individual with the largest fitness value, X_min represents the individual with the smallest fitness value, and rand represents a random number between 0 and 1;
in the invention, by carrying out mutation operation on individuals, new solutions can be introduced, so that the diversity of the population is increased, the solution which is unknown before is explored, and the algorithm is more likely to find the globally optimal solution.
Comparing the fitness values of the individual X and the new individual X_new: when δ(X_new) > δ(X), replacing the individual X with the new individual X_new; when δ(X_new) ≤ δ(X), replacing the individual X with the new individual X_new with a preset replacement probability P;
The preset replacement probability P is calculated as follows:
P = e^{(δ(X_new) − δ(X)) / T}
wherein P represents the preset replacement probability, e represents the base of the natural logarithm, δ(X_new) represents the fitness value of the new individual X_new, δ(X) represents the fitness value of the individual X, and T represents the current temperature;
in the present invention, a temperature parameter T is introduced, allowing more sub-optimal solutions to be accepted at an early stage, helping to avoid premature collapse into a locally optimal solution; when the temperature is higher, a worse solution is more acceptable, and the tapering temperature may gradually converge to a better solution.
In the invention, the preset replacement probability P is used for controlling whether a new individual is accepted or not, and random exploration in a search space is facilitated. By accepting the new solution with a higher probability, there is an opportunity to find a better solution, while gradually sinking into a converging state as the temperature gradually decreases.
When the new individual X_new does not successfully replace the individual X, updating the position of the individual X:
X_{t+1} = X_t + ΔX_{t+1}
ΔX_{t+1} = (η_1·A_1 + η_2·A_2 + η_3·A_3 + η_4·A_4 + η_5·A_5) + ω·ΔX_t
wherein X_{t+1} represents the position of the individual X at the (t+1)-th iteration, X_t represents the position of the individual X at the t-th iteration, ΔX_{t+1} represents the displacement vector at the (t+1)-th iteration, ΔX_t represents the displacement vector at the t-th iteration, A_1 represents the first behavior, η_1 represents the weight coefficient of the first behavior, A_2 represents the second behavior, η_2 represents the weight coefficient of the second behavior, A_3 represents the third behavior, η_3 represents the weight coefficient of the third behavior, A_4 represents the fourth behavior, η_4 represents the weight coefficient of the fourth behavior, A_5 represents the fifth behavior, η_5 represents the weight coefficient of the fifth behavior, and ω represents the inertia weight factor;
optionally, the first behavior is indicative of separation, the second behavior is indicative of alignment, the third behavior is indicative of aggregation, the fourth behavior is indicative of predation, and the fifth behavior is indicative of avoidance of natural enemies;
in the present invention, even a new individual Xnew The position of the individual X can be slightly adjusted by the position updating strategy after the individual X is not replaced, so that diversity among the individuals is kept, the population is prevented from falling into a local optimal solution, and the individuals can gradually trend to a better solution without suddenly jumping out of a potential good solution by small-amplitude displacement.
Judging whether the iteration number reaches the maximum iteration number m or whether the current temperature reaches the termination temperature Tm The method comprises the steps of carrying out a first treatment on the surface of the If yes, outputting a feasible solution with the maximum reserved fitness value (the minimum function value of the loss function) as an optimal solution; otherwise, updating the temperature, and returning to the step of calculating the fitness value of each individual for iteration:
Tt+1 =εTt
wherein epsilon represents the cooling coefficient, Tt+1 Represents the temperature at the t+1st iteration, Tt The temperature at the t-th iteration is indicated.
In the present invention, algorithms can more easily escape from the initial solution by gradually decreasing the temperature as the iteration progresses, and explore more widely in the search space to find globally optimal solutions, the gradual decrease in temperature helping to guide the search toward more optimal solutions.
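As an editorial illustration (not from the patent), the sketch below combines the ingredients this training scheme describes: fitness as the inverse of the loss, temperature-dependent acceptance of worse mutants, a small position update for rejected candidates, and geometric cooling. The mutation and position-update operators are simplified stand-ins for the five swarm behaviours A_1..A_5 named in the text, and all parameter values are assumptions.

```python
import numpy as np

def anneal_train(loss_fn, dim, pop_size=20, t0=1.0, t_end=1e-3,
                 max_iter=200, eps=0.95, omega=0.5, seed=0):
    """Hedged sketch of the population-based training loop: returns the
    parameter vector theta with the best fitness (smallest loss) found."""
    rng = np.random.default_rng(seed)
    fit = lambda x: 1.0 / (loss_fn(x) + 1e-12)       # fitness = inverse of the loss

    pop = rng.normal(size=(pop_size, dim))           # each row is one candidate theta
    fitness = np.array([fit(x) for x in pop])
    temp = t0

    for _ in range(max_iter):
        best = pop[fitness.argmax()]
        worst = pop[fitness.argmin()]
        for i in range(pop_size):
            # Mutation: perturb towards the spread between best and worst individuals.
            x_new = pop[i] + rng.random() * (best - worst)
            f_new = fit(x_new)
            # Accept if better, or with probability exp((f_new - f_old) / T) if worse.
            if f_new > fitness[i] or rng.random() < np.exp((f_new - fitness[i]) / temp):
                pop[i], fitness[i] = x_new, f_new
            else:
                # Rejected: small inertial position update keeps diversity.
                pop[i] = pop[i] + omega * rng.normal(scale=0.01, size=dim)
                fitness[i] = fit(pop[i])
        temp *= eps                                   # cooling: T_{t+1} = eps * T_t
        if temp < t_end:
            break
    return pop[fitness.argmax()]
```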
S3: and constructing a video repair model, and repairing the defect area through the video repair model. S3 specifically includes substeps S301 to S304:
s301: it is detected whether there is an undamaged induced image frame in the neighboring frames, if so, S302 is performed, otherwise S304 is performed.
In one possible implementation, substep S301 specifically includes grandchild steps S3011 and S3012:
s3011: the similarity between the current frame and the neighboring frame is calculated according to the following formula:
wherein σ_k represents the similarity between the current frame and the k-th adjacent frame, s_1(ij) represents the optical flow feature at the pixel point (x_i, y_j) in the current frame, s_1(ijk) represents the optical flow feature at the pixel point (x_i, y_j) in the k-th adjacent frame, β_1 represents the weight coefficient of the optical flow feature, s_2(ij) represents the local feature at the pixel point (x_i, y_j) in the current frame, s_2(ijk) represents the local feature at the pixel point (x_i, y_j) in the k-th adjacent frame, β_2 represents the weight coefficient of the local feature, s_3(ij) represents the global feature at the pixel point (x_i, y_j) in the current frame, s_3(ijk) represents the global feature at the pixel point (x_i, y_j) in the k-th adjacent frame, β_3 represents the weight coefficient of the global feature, i = 1, 2, …, M, M represents the total number of horizontal pixel points of the video, j = 1, 2, …, N, and N represents the total number of vertical pixel points of the video.
In the invention, by integrating different types of feature information (optical flow features, local features and global features), the algorithm can more fully compare the similarity between the current frame and the adjacent frames, and is helpful for better understanding the relationship between frames, especially in the case of complex motion or uneven variation. Further, the similarity calculation formula combines the information of various feature dimensions, so that the similarity between frames can be reflected more accurately, and whether the current frame is repaired by using the adjacent frames is better determined. This helps to reduce false decisions and improves the effectiveness of the repair.
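As an editorial illustration: the patent's exact similarity formula is not reproduced in this text, so the sketch below simply combines the per-pixel differences of the three feature maps with the weights β and maps the result to a score in (0, 1]; the exact form used in the patent may differ, and the weights are assumptions.

```python
import numpy as np

def frame_similarity(cur, ref, beta=(0.4, 0.3, 0.3)):
    """Hedged sketch of S3011: weighted feature-difference similarity between
    the current frame and one adjacent frame.

    cur, ref: tuples (s1, s2, s3) of feature maps for the current frame and
    the k-th adjacent frame, each of matching shape.
    """
    diff = sum(b * np.abs(np.asarray(c) - np.asarray(r)).mean()
               for b, c, r in zip(beta, cur, ref))
    return 1.0 / (1.0 + diff)   # identical frames -> 1, very different frames -> ~0
```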
S3012: when the similarity between the current frame and the adjacent frame is greater than the preset similarity, checking whether the adjacent frame has an undamaged induced image frame, if so, executing S302, otherwise, executing S304.
The size of the preset similarity can be set by a person skilled in the art according to practical situations, and the invention is not limited.
S302: and repairing the defect area according to the induced image frame.
In the invention, whether the similar adjacent frames are not damaged is determined preferentially, and the undamaged adjacent frames can be adopted to repair the video rapidly, so that the video repair efficiency is improved.
In one possible implementation manner, the present invention proposes a completely new video repair method, and the substep S302 specifically includes grandchild steps S3021 to S3023:
s3021: and performing dimension reduction processing on the video frame through a dynamic Gaussian process, and mapping the high-dimension fusion characteristics to a low-dimension potential variable space.
It should be noted that the high-dimensional fusion feature of the video frame is mapped to the low-dimensional latent variable space. This helps reduce the dimensionality of the data, lessening the computational burden while retaining critical information. The low-dimensional representation facilitates more efficient processing of subsequent image restoration tasks.
In one possible implementation manner, the invention provides a brand new construction manner of a dynamic Gaussian process, which comprises the following steps:
introducing M auxiliary points, and obtaining a probability model of a dynamic Gaussian process according to auxiliary input positions Z and auxiliary outputs u of the M auxiliary points:
p(y,f,u|X,Z)=p(y|f)·p(f,u|X,Z)
wherein p (y, f, u|X, Z) represents a probability model of the dynamic Gaussian process, y represents the output, f represents the dynamic Gaussian process, u represents the auxiliary output, X represents the input position, and Z represents the auxiliary input position.
It should be noted that the introduction of auxiliary points can increase the flexibility of the model, so that it can better adapt to complex data distribution. The introduction of auxiliary points can reduce direct dependence on data, thereby reducing the burden of calculation and storage and improving the calculation efficiency of the model. Further, introducing auxiliary points can improve the flexibility, efficiency and fitting ability of the dynamic gaussian process model, and simultaneously reduce the computational complexity, so that the dynamic gaussian process model is more suitable for various applications including image processing and restoration.
The posterior distribution of the dynamic Gaussian process is determined through the optimal distribution of the auxiliary points:
p(f|y)=∫p(f|u)q(u)du
where p (f|y) represents the posterior distribution of the dynamic gaussian process, p (f|u) represents the posterior distribution of the auxiliary points, and q (u) represents the optimal distribution of the auxiliary points.
Based on posterior distribution of a dynamic Gaussian process, the video frames are subjected to dimension reduction processing, and high-dimension fusion features are mapped to a low-dimension potential variable space.
In the invention, the mapping of the high-dimensional fusion features to the low-dimensional potential variable space is beneficial to improving the computing efficiency, removing noise, retaining key information and better understanding the structure of data, and has benefits for various image processing tasks, in particular to image restoration tasks.
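As an editorial illustration (not part of the original disclosure), the sketch below shows the auxiliary-point mechanism behind p(f | y) = ∫ p(f | u) q(u) du on a toy 1-D regression problem rather than the full latent-variable mapping: M inducing inputs Z summarise the data so that predictions scale with M instead of the number of data points. This is the standard subset-of-regressors approximation; the kernel, noise level and sizes are assumptions.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def sparse_gp_predict(X, y, Z, X_star, noise=1e-2):
    """Posterior mean of a sparse GP that routes all information through the
    M auxiliary (inducing) points Z, as in the probability model of S3021."""
    Kzz = rbf(Z, Z)
    Kzx = rbf(Z, X)
    Ksz = rbf(X_star, Z)
    # Sigma = (Kzz + noise^{-1} Kzx Kxz)^{-1}: covariance of the optimal q(u).
    Sigma = np.linalg.inv(Kzz + Kzx @ Kzx.T / noise + 1e-8 * np.eye(len(Z)))
    mean_u_weights = Sigma @ Kzx @ y / noise
    return Ksz @ mean_u_weights            # posterior mean at the test inputs

# Usage: compress a noisy 1-D signal through M = 10 inducing points.
X = np.linspace(0, 1, 200)[:, None]
y = np.sin(6 * X[:, 0]) + 0.05 * np.random.randn(200)
Z = np.linspace(0, 1, 10)[:, None]
y_hat = sparse_gp_predict(X, y, Z, X)
```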
S3022: and selecting a target area in the induced image frame in the potential variable space, and replacing, interpolating and reconstructing the defect area to repair the defect area.
It should be noted that in the latent variable space, the target region can be selected more easily, the defective region can be replaced, interpolated and reconstructed, allowing for more accurate and efficient repair of the defective pixel or region, improving image quality.
Further, compared with the traditional scheme of directly replacing the defect with a similar frame, the reduced-dimension representation in the latent variable space makes defect repair easier to realize; the repaired image can keep the visual quality of the original image, and unnecessary artifacts or deformation are avoided.
S3023: Performing the inverse dynamic Gaussian process on the repaired video frame, and remapping it back to the original data space.
It should be noted that after the repair task is completed, the repaired image can be remapped back to the original data space through the inverse dynamic gaussian process, and the repaired image can maintain the same resolution and characteristics as those of the original video frame, so that no unexpected distortion is introduced.
In the invention, through data dimension reduction, defect repair and inverse mapping, a higher-quality image repair result is provided while the computational cost is reduced, which helps improve the performance and usability of the image repair algorithm.
S303: and calculating the image quality score of the repaired video frame, and executing S304 when the image quality score of the repaired video frame is lower than the preset score.
In the present invention, evaluating the image quality of the repaired video frame helps to determine the effectiveness of the repair process. If the image quality is not as expected, the system may automatically trigger generation of an anti-network repair to ensure that the final output image quality meets the requirements. Meanwhile, the use of the generated countermeasure network can be reduced to a certain extent, the computing resources are saved, the video restoration cost is reduced, and the restoration efficiency is improved.
In one possible implementation, the substep S303 specifically includes grandchild steps S3031 and S3032:
s3031: and calculating the peak signal-to-noise ratio and the structural similarity of the repaired video frame.
Wherein, the peak signal-to-noise ratio is specifically:
e_1 = 10·log_10( (2^k − 1)^2 / ( (1/(M·N))·Σ_i Σ_j (x_ij − y_ij)^2 ) )
wherein e_1 represents the peak signal-to-noise ratio, k represents the number of bits of the binary representation, x_ij represents the pixel value of the pixel point in the i-th row and j-th column of the original image frame, y_ij represents the pixel value of the pixel point in the i-th row and j-th column of the repaired image frame, i = 1, 2, …, M, M represents the total number of horizontal pixel points of the video, j = 1, 2, …, N, and N represents the total number of vertical pixel points of the video.
Wherein, the structural similarity is specifically:
e_2 = L(y, ref)^{γ_1} · C(y, ref)^{γ_2} · S(y, ref)^{γ_3}
wherein e_2 represents the structural similarity, y represents the repaired image frame, ref represents the reference image frame, L(y, ref) represents the luminance similarity between the repaired image frame and the reference image, γ_1 represents the weight coefficient of the luminance similarity, C(y, ref) represents the contrast similarity between the repaired image frame and the reference image, γ_2 represents the weight coefficient of the contrast similarity, S(y, ref) represents the structural similarity between the repaired image frame and the reference image, and γ_3 represents the weight coefficient of the structural similarity.
S3032: calculating the image quality score of the repaired video frame according to the peak signal-to-noise ratio and the structural similarity:
E = μ·e_1 + (1−μ)·e_2
wherein E represents the image quality score, e_1 represents the peak signal-to-noise ratio, μ represents the weight of the peak signal-to-noise ratio, and e_2 represents the structural similarity.
The size of the weight μ of the peak signal-to-noise ratio can be set by a person skilled in the art according to practical situations, and the invention is not limited.
In the invention, the image quality is evaluated by integrating the peak signal-to-noise ratio and the structural similarity, which is helpful for realizing objective, comprehensive and adjustable image quality evaluation and improving the efficiency and reliability of image processing and restoration.
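As an editorial illustration (not from the patent), the sketch below computes a PSNR term and a global SSIM term and blends them as E = μ·e_1 + (1−μ)·e_2. The SSIM stabilising constants follow common defaults, the exponents are taken as γ_1 = γ_2 = γ_3 = 1, and the normalisation of e_1 before mixing is an assumption, since the patent does not state one.

```python
import numpy as np

def quality_score(repaired, reference, k=8, mu=0.5):
    """Sketch of S303: blended PSNR / structural-similarity quality score."""
    x = repaired.astype(np.float64)
    r = reference.astype(np.float64)
    peak = 2 ** k - 1

    # e1: peak signal-to-noise ratio over all pixels.
    mse = np.mean((x - r) ** 2)
    e1 = 10 * np.log10(peak ** 2 / max(mse, 1e-12))

    # e2: global SSIM with the usual stabilising constants.
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, mr = x.mean(), r.mean()
    vx, vr = x.var(), r.var()
    cov = ((x - mx) * (r - mr)).mean()
    e2 = ((2 * mx * mr + c1) * (2 * cov + c2)) / ((mx ** 2 + mr ** 2 + c1) * (vx + vr + c2))

    # Scale e1 roughly into [0, 1] before mixing so the two terms are comparable.
    return mu * min(e1 / 50.0, 1.0) + (1 - mu) * e2
```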
S304: the defective area is repaired by creating an countermeasure network.
Wherein, a generative adversarial network (Generative Adversarial Network, GAN) is a deep learning framework. A GAN consists of two neural networks, a Generator and a Discriminator, which oppose each other and learn together through a game so as to generate high-quality data samples.
In one possible implementation, the substep S304 specifically includes grandchild steps S3041 to S3045:
S3041: The discriminator F and the generator G are constructed in association with each other.
S3042: Real video data is input, and a plurality of derivative video data are generated by the generator according to the real video data.
S3043: Fixing the parameters of the generator G, and training the discriminator F with the goal of minimizing a first objective function L_1(θ_f, θ_g):
wherein θ_f represents the parameters of the discriminator, θ_g represents the parameters of the generator, E( ) represents the mathematical expectation, x represents real video data, F(x) represents the discrimination result of the discriminator on the real video data, p_t represents the distribution of the real video data, y represents derivative video data, F(y) represents the discrimination result of the discriminator on the derivative video data, and p_g represents the distribution of the derivative video data.
S3044: Fixing the parameters of the discriminator F, and training the generator G with the goal of maximizing a second objective function L_2(θ_f, θ_g):
S3045: Repairing the defect area through the trained generator G.
In the invention, the adversarial training framework of the generative adversarial network is fully utilized to generate high-quality data and achieves good results in tasks such as image restoration; through iterative training, the generator gradually improves the generated data so that it approaches the real data distribution, while the discriminator continuously improves its ability to distinguish real from fake data, thereby achieving better data restoration and generation.
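As an editorial illustration (not from the patent), the sketch below pairs a small convolutional generator and discriminator and alternates their updates with the standard non-saturating GAN losses; the layer sizes, the use of masked frames as the generator input, and the training hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Tiny fully-convolutional generator: takes a frame whose defect area has
    been zeroed out and proposes a repaired frame with values in [0, 1]."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, masked_frame):
        return self.net(masked_frame)

class Discriminator(nn.Module):
    """Scores a frame with a single real/fake logit."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1),
        )

    def forward(self, frame):
        return self.net(frame)

def train_step(gen, disc, opt_g, opt_d, real, masked,
               bce=nn.BCEWithLogitsLoss()):
    ones = torch.ones(real.size(0), 1)
    zeros = torch.zeros(real.size(0), 1)

    # Discriminator step: push real frames towards 1, generated frames towards 0.
    fake = gen(masked).detach()
    d_loss = bce(disc(real), ones) + bce(disc(fake), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: make the discriminator score generated frames as real.
    g_loss = bce(disc(gen(masked)), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Usage sketch with illustrative sizes: 8 frames of 3x64x64 with a zeroed defect area.
gen, disc = Generator(), Discriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
real = torch.rand(8, 3, 64, 64)
masked = real.clone()
masked[:, :, 16:48, 16:48] = 0.0
d_loss, g_loss = train_step(gen, disc, opt_g, opt_d, real, masked)
```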
Compared with the prior art, the invention has at least the following beneficial technical effects:
(1) In the invention, when an undamaged induced image frame exists among the adjacent frames, that frame is preferentially used to repair the defect area; a generative adversarial network is used for repair only if the required repair quality is not reached. This saves computing resources to a certain extent, reduces the video repair cost and improves the repair efficiency; the lost frames in the video no longer need to be filled by linear or nonlinear interpolation, and because the repair quality is monitored, image artifacts, distortion and discontinuity are avoided.
(2) In the invention, the optical flow characteristics, the local characteristics and the global characteristics are comprehensively considered, the characteristics of the video frame are more comprehensively evaluated, the defect area is automatically determined, and meanwhile, the accuracy of detecting the defect area of the video is improved.
Example 2
In one embodiment, referring to fig. 2 of the specification, a schematic structural diagram of a video repair system provided by the present invention is shown.
The invention provides a video repair system which comprises a processor 201 and a memory 202 for storing instructions executable by the processor 201. The processor 201 is configured to call the instructions stored in the memory 202 to perform the video repair method in embodiment 1.
The video restoration system provided by the invention can realize the steps and effects of the video restoration method in the embodiment 1, and in order to avoid repetition, the invention is not repeated.
Compared with the prior art, the invention has at least the following beneficial technical effects:
(1) In the invention, when an undamaged induced image frame exists among the adjacent frames, that frame is preferentially used to repair the defect area; a generative adversarial network is used for repair only if the required repair quality is not reached. This saves computing resources to a certain extent, reduces the video repair cost and improves the repair efficiency; the lost frames in the video no longer need to be filled by linear or nonlinear interpolation, and because the repair quality is monitored, image artifacts, distortion and discontinuity are avoided.
(2) In the invention, the optical flow characteristics, the local characteristics and the global characteristics are comprehensively considered, the characteristics of the video frame are more comprehensively evaluated, the defect area is automatically determined, and meanwhile, the accuracy of detecting the defect area of the video is improved.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing examples illustrate only a few embodiments of the invention, which are described in detail and are not to be construed as limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (10)

Wherein I_t represents the activation output vector of the input gate at time t, Sigmoid( ) represents the Sigmoid activation function, W_XI represents the weight matrix between the input sequence and the input gate, W_HI represents the weight matrix between the hidden state and the input gate, b_I represents the bias term of the input gate, F_t represents the activation output vector of the forget gate at time t, W_XF represents the weight matrix between the input sequence and the forget gate, W_HF represents the weight matrix between the hidden state and the forget gate, b_F represents the bias term of the forget gate, O_t represents the activation output vector of the output gate at time t, W_XO represents the weight matrix between the input sequence and the output gate, W_HO represents the weight matrix between the hidden state and the output gate, C_t represents the activation output vector of the cell memory unit at time t, C'_t represents the candidate output vector of the cell memory unit at time t, C_{t−1} represents the activation output vector of the cell memory unit at time t−1, tanh( ) represents the tanh activation function, W_XC represents the weight matrix between the input sequence and the cell memory unit, W_HC represents the weight matrix between the hidden state and the cell memory unit, b_C represents the bias term of the cell memory unit, h_t represents the hidden state at time t, and h_{t−1} represents the hidden state at time t−1;
wherein σ_k represents the similarity between the current frame and the k-th adjacent frame, s_1(ij) represents the optical flow feature at the pixel point (x_i, y_j) in the current frame, s_1(ijk) represents the optical flow feature at the pixel point (x_i, y_j) in the k-th adjacent frame, β_1 represents the weight coefficient of the optical flow feature, s_2(ij) represents the local feature at the pixel point (x_i, y_j) in the current frame, s_2(ijk) represents the local feature at the pixel point (x_i, y_j) in the k-th adjacent frame, β_2 represents the weight coefficient of the local feature, s_3(ij) represents the global feature at the pixel point (x_i, y_j) in the current frame, s_3(ijk) represents the global feature at the pixel point (x_i, y_j) in the k-th adjacent frame, β_3 represents the weight coefficient of the global feature, i = 1, 2, …, M, M represents the total number of horizontal pixel points of the video, j = 1, 2, …, N, and N represents the total number of vertical pixel points of the video;
CN202311504674.0A — priority date 2023-11-13, filing date 2023-11-13 — Video restoration method and system — Active, granted as CN117455812B (en)

Priority Applications (1)

CN202311504674.0A — CN117455812B (en) — priority date 2023-11-13, filing date 2023-11-13 — Video restoration method and system

Publications (2)

CN117455812A (en) — 2024-01-26
CN117455812B (en) — 2024-06-04

Family

ID=89583458

Country Status (1)

CN — CN117455812B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party

US20060257042A1 (en) * — priority 2005-05-13, published 2006-11-16 — Microsoft Corporation — Video enhancement
WO2009126621A2 (en) * — priority 2008-04-07, published 2009-10-15 — Tufts University — Methods and apparatus for image restoration
US20230008473A1 (en) * — priority 2021-06-28, published 2023-01-12 — Beijing Baidu Netcom Science Technology Co., Ltd. — Video repairing methods, apparatus, device, medium and products
CN115731132A (en) * — priority 2022-11-25, published 2023-03-03 — 京东方科技集团股份有限公司 — Image restoration method, device, equipment and medium
CN116189292A (en) * — priority 2023-01-05, published 2023-05-30 — 重庆大学 — Video action recognition method based on double-flow network

Cited By (3)

CN118570698A (en) * — priority 2024-05-24, published 2024-08-30 — 北京优酷科技有限公司 — Video defect detection method and device, electronic device and storage medium
CN118570698B (en) * — priority 2024-05-24, granted 2025-07-08 — 北京优酷科技有限公司 — Video defect detection method and device, electronic device and storage medium
CN119323525A (en) * — priority 2024-09-26, published 2025-01-17 — 深圳前海微众银行股份有限公司 — Real-time video stream processing method and device

Also Published As

CN117455812B (en) — 2024-06-04


Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant
