CN117455812A - Video restoration method and system - Google Patents

Video restoration method and system

Info

Publication number
CN117455812A
Authority
CN
China
Prior art keywords
representing
video
frame
optical flow
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311504674.0A
Other languages
Chinese (zh)
Other versions
CN117455812B (en)
Inventor
沈君华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Zhonglu Culture Communication Co ltd
Original Assignee
Zhejiang Zhonglu Culture Communication Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Zhonglu Culture Communication Co ltd
Priority to CN202311504674.0A
Publication of CN117455812A
Application granted
Publication of CN117455812B
Legal status: Active (current)
Anticipated expiration


Abstract

The invention discloses a video restoration method and system, belonging to the technical field of image data processing. The method comprises the following steps: acquiring video data; constructing a defect area detection model and detecting the defect area in each video frame through the model; extracting the optical flow features of each video frame through an optical flow extraction algorithm; extracting the local features of each video frame through a convolutional neural network; extracting the global features of each video frame through a long short-term memory network; carrying out feature fusion on the optical flow features, the local features and the global features to obtain fusion features; detecting the defect area in each video frame according to the fusion features; constructing a video repair model and repairing the defect area through the video repair model; detecting whether an undamaged induced image frame exists among the adjacent frames; repairing the defect area according to the induced image frame; and repairing the defect area through a generative adversarial network.

Description

Video restoration method and system
Technical Field
The invention belongs to the technical field of image data processing, and particularly relates to a video restoration method and system.
Background
As one of the mainstream media carriers in modern society, video may become blurred, damaged or incomplete during shooting, storage and transmission owing to factors such as hardware equipment, imaging technology, motion blur, ambient light and atmospheric particulate matter, and the original video often needs to be repaired using video restoration technology.
However, the existing video restoration technology mainly uses linear or nonlinear interpolation to fill in the missing frames of a video: information from adjacent frames is interpolated to fill the missing areas, which easily causes image artifacts, distortion and discontinuity, especially in heavily damaged areas.
With the rapid development of pattern recognition, machine vision and deep learning, and the urgent need for video restoration, more and more modern technologies are being applied to video repair.
At present, generative adversarial networks have also been applied to video restoration. Their main principle is a game between a generator and a discriminator, and the video is finally restored by having the generator produce high-quality images. However, generative adversarial networks require a large amount of computing resources, resulting in high video restoration cost and low restoration efficiency.
Disclosure of Invention
In order to solve the technical problems that the existing method uses linear or nonlinear interpolation to fill in missing frames, which easily causes image artifacts, distortion and discontinuity, and that methods based on generative adversarial networks require a large amount of computing resources, leading to high video repair cost and low repair efficiency, the invention provides a video repair method and a video repair system.
First aspect
The invention provides a video restoration method, which comprises the following steps:
S1: acquiring video data;
S2: constructing a defect area detection model, and detecting the defect area in each video frame through the defect area detection model; the step S2 specifically comprises the following steps:
S201: extracting the optical flow features of each video frame through an optical flow extraction algorithm;
S202: extracting the local features of each video frame through a convolutional neural network;
S203: extracting the global features of each video frame through a long short-term memory network;
S204: performing feature fusion on the optical flow features, the local features and the global features to obtain fusion features;
S205: detecting the defect area in each video frame according to the fusion features;
S3: constructing a video repair model, and repairing the defect area through the video repair model; the step S3 specifically comprises the following steps:
S301: detecting whether an undamaged induced image frame exists among the adjacent frames; if so, executing S302, otherwise executing S304;
S302: repairing the defect area according to the induced image frame;
S303: calculating the image quality score of the repaired video frame, and executing S304 when the image quality score of the repaired video frame is lower than a preset score;
S304: repairing the defect area through a generative adversarial network.
Second aspect
The invention provides a video repair system, which comprises a processor and a memory for storing instructions executable by the processor; the processor is configured to invoke the instructions stored by the memory to perform the video repair method of the first aspect.
Compared with the prior art, the invention has at least the following beneficial technical effects:
(1) In the invention, when an undamaged induced image frame exists among the adjacent frames, that frame is preferentially used to repair the defect area; a generative adversarial network is used for repair only if the required repair quality is not reached. This saves computing resources to a certain extent, reduces the video repair cost and improves the repair efficiency; the defective frames in the video no longer need to be filled by linear or nonlinear interpolation, and because the repair quality is monitored, image artifacts, distortion and discontinuity are avoided.
(2) In the invention, the optical flow characteristics, the local characteristics and the global characteristics are comprehensively considered, the characteristics of the video frame are more comprehensively evaluated, the defect area is automatically determined, and meanwhile, the accuracy of detecting the defect area of the video is improved.
Drawings
The above features, technical features, advantages and implementation of the present invention will be further described in the following description of preferred embodiments with reference to the accompanying drawings in a clear and easily understood manner.
Fig. 1 is a schematic flow chart of a video restoration method provided by the invention.
Fig. 2 is a schematic structural diagram of a video repair system according to the present invention.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will explain the specific embodiments of the present invention with reference to the accompanying drawings. It is evident that the drawings in the following description are only examples of the invention, from which other drawings and other embodiments can be obtained by a person skilled in the art without inventive effort.
For simplicity of the drawing, only the parts relevant to the invention are schematically shown in each drawing, and they do not represent the actual structure thereof as a product. Additionally, in order to simplify the drawing for ease of understanding, components having the same structure or function in some of the drawings are shown schematically with only one of them, or only one of them is labeled. Herein, "a" means not only "only this one" but also "more than one" case.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
In this context, it should be noted that, unless otherwise explicitly stated and defined, the terms "mounted", "connected" and "coupled" are to be construed broadly: the connection may be fixed, detachable or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediate medium, or an internal communication between two elements. The specific meaning of the above terms in the present invention will be understood by those of ordinary skill in the art according to the specific circumstances.
In addition, in the description of the present invention, the terms "first," "second," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Example 1
In one embodiment, referring to fig. 1 of the specification, a schematic flow chart of a video restoration method provided by the present invention is shown.
The invention provides a video restoration method, which comprises the following steps:
S1: video data is acquired.
S2: and constructing a defect area detection model, and detecting the defect area in each video frame through the defect area detection model.
In one possible implementation, S2 specifically includes substeps S201 to S205:
s201: and extracting optical flow characteristics of each video frame through an optical flow extraction algorithm.
Wherein the optical flow features are computer vision features describing pixel displacement between adjacent video frames. Reflecting the motion information of objects in the video, usually expressed in the form of optical fields. The optical flow field is an image that contains a motion displacement vector for each pixel, where the displacement vector for each pixel represents the displacement of the pixel from one frame to another.
Specifically, the optical flow extraction algorithm includes: a Lucas-Kanade optical flow extraction algorithm, a Horn-Schunck optical flow extraction algorithm and a Farnesback optical flow extraction algorithm.
In one possible implementation, the present invention proposes a completely new optical flow extraction algorithm, and the substep S201 specifically includes grandchild steps S2011 to S2013:
s2011: introducing smoothness constraint on the basis of an optical flow basic equation to construct an optical flow extraction algorithm.
Wherein, the optical flow basic equation is expressed as:
Where ζ represents the optical flow fundamental constraint parameter, I represents the gray value at the pixel point (x, y), and (x, y) represents the pixel point coordinates, and t represents time.
Wherein the smoothness constraint is expressed as:
wherein ζ represents the smoothness constraint parameter.
Specifically, by introducing smoothness constraint on the optical flow basic equation, continuity and consistency between pixel points in an image can be better processed, noise and instability possibly occurring in optical flow estimation can be reduced, and optical flow accuracy is improved.
S2012: constructing an optical flow extraction objective function:
f_1(u, v) = min L = min { ∫∫ [ α·ζ^2 + (1−α)·ξ^2 ] dx dy }
wherein f_1( ) represents the optical flow extraction objective function, (u, v) represents the displacement vector at the pixel point (x, y), L represents the optical flow extraction target term, ζ represents the smoothness constraint parameter, ξ represents the optical flow basic constraint parameter, and α represents the weight coefficient of the smoothness constraint parameter.
The size of the weight coefficient α of the smoothness constraint parameter can be set by a person skilled in the art according to the actual situation, and the invention does not limit it.
S2013: With the goal of minimizing the function value of the optical flow extraction objective function, solving the optical flow extraction target term by means of the Euler-Lagrange equation to obtain the displacement vector (u, v) of every pixel point, and summarizing the results to obtain the optical flow features of each video frame.
In particular, solving the objective function using the Euler-Lagrangian equation is a common optimization method that can help find the minimum of the objective function, i.e., find the appropriate displacement vector (u, v), to best describe the pixel displacement in the image.
According to the method, smoothness constraint is introduced into an optical flow extraction algorithm, and an optical flow extraction objective function is constructed, so that the accuracy and stability of optical flow estimation can be improved, better understanding of motion information in an image is facilitated, and video restoration effect is improved.
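As an editorial illustration (not part of the original disclosure), the Python sketch below shows the kind of iterative solution this step describes: a data term ξ and a smoothness term ζ are balanced by the weight α, and the flow field (u, v) is refined with classic Horn-Schunck style fixed-point updates of the Euler-Lagrange equations. The function name, the averaging kernel and the parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

def extract_optical_flow(I1, I2, alpha=0.5, n_iter=100):
    """Estimate a dense flow field (u, v) between two grayscale frames.

    Minimal sketch of a variational optical flow solver: the data term
    xi = Ix*u + Iy*v + It is balanced against a smoothness term on (u, v)
    by the weight alpha, and the resulting Euler-Lagrange equations are
    solved by fixed-point iteration.
    """
    I1 = I1.astype(np.float64)
    I2 = I2.astype(np.float64)
    Ix = np.gradient(I1, axis=1)          # spatial derivative in x
    Iy = np.gradient(I1, axis=0)          # spatial derivative in y
    It = I2 - I1                          # temporal derivative

    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    # Kernel approximating the local average of the neighbouring flow vectors.
    avg = np.array([[1, 2, 1], [2, 0, 2], [1, 2, 1]], dtype=np.float64) / 12.0
    # Larger alpha -> stronger smoothness; reg plays the role of alpha / (1 - alpha).
    reg = alpha / max(1e-8, 1.0 - alpha)

    for _ in range(n_iter):
        u_avg = convolve(u, avg, mode="nearest")
        v_avg = convolve(v, avg, mode="nearest")
        # Residual of the optical flow basic equation at the averaged flow.
        xi = Ix * u_avg + Iy * v_avg + It
        denom = reg + Ix ** 2 + Iy ** 2
        u = u_avg - Ix * xi / denom
        v = v_avg - Iy * xi / denom
    return u, v
```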
S202: and extracting local characteristics of each video frame through a convolutional neural network.
Wherein, a convolutional neural network (Convolutional Neural Network, CNN) is a deep learning architecture dedicated to machine learning tasks that process and analyze data with a grid structure.
In one possible implementation, the substep S202 specifically includes grandchild steps S2021 to S2024:
s2021: video data is input.
S2022: extracting data characteristics of video data:
x_j^l = f_c( Σ_{i∈M_j} x_i^{l−1} * k_{ij}^l + b_j^l )
wherein x_j^l represents the output of the j-th channel of the current convolution layer, x_i^{l−1} represents the output of the i-th convolution kernel in the j-th channel of the previous convolution layer, k_{ij}^l represents the convolution kernel weights of the current convolution layer, b_j^l represents the bias term of the current convolution layer, M_j represents the selected input feature maps, and f_c( ) represents the convolution layer activation function.
Specifically, in convolutional neural networks, different features are detected by multiple convolutional kernels, each of which detects a different feature, resulting in a feature map of multiple channels, each channel corresponding to a different feature. These multi-channel feature maps can provide more information to help the system better understand the content in the video.
S2023: performing dimension reduction compression on the features extracted by the convolution layer:
x_j^l = f_p( β_j^l · f_down(x_j^{l−1}) + b_j^l )
wherein x_j^l represents the output of the j-th channel of the current pooling layer, f_p( ) represents the pooling layer activation function, β_j^l represents the multiplicative bias of the current pooling layer, f_down( ) represents the downsampling function, x_j^{l−1} represents the output of the j-th channel of the previous layer, and b_j^l represents the additive bias of the current pooling layer.
In particular, the feature is reduced and compressed in the pooling layer, so that the computational burden is reduced, the computational efficiency is improved, the feature dimension can be effectively reduced by reducing the spatial resolution of the feature, and the complexity of subsequent processing is reduced.
S2024: and summarizing the output of the pooling layer to obtain the local characteristics of the video data.
Specifically, the local features are rolled and pooled to be summarized into a local feature representation of the video data. The local feature representation will contain the primary local feature information in the video frame, helping the subsequent steps to better understand the video content and structure.
In the invention, the convolutional neural network is used for extracting the local characteristics of the video frame, so that the system can be helped to better understand the details and the structure of the video, and the accuracy and the efficiency of the video repair task are improved.
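As an editorial illustration (not taken from the patent), the following sketch shows per-frame local feature extraction with convolution and pooling layers as described in S202; the layer sizes and activation choices are assumptions for the example.

```python
import torch
import torch.nn as nn

class LocalFeatureExtractor(nn.Module):
    """Minimal sketch of S202: convolution layers detect local patterns and
    pooling layers downsample them, mirroring x_j = f_c(sum_i x_i * k_ij + b_j)
    followed by x_j = f_p(beta_j * f_down(x_j) + b_j)."""

    def __init__(self, in_channels: int = 3, feat_channels: int = 32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, feat_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),            # f_c, the convolution-layer activation
            nn.MaxPool2d(kernel_size=2),      # f_down, spatial down-sampling
            nn.Conv2d(feat_channels, feat_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((8, 8)),     # fixed-size summary of each frame
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, channels, height, width) -> (batch, feat_channels, 8, 8)
        return self.features(frames)

# Usage: local features for a batch of 4 RGB frames of size 128 x 128.
extractor = LocalFeatureExtractor()
local_feats = extractor(torch.randn(4, 3, 128, 128))
print(local_feats.shape)  # torch.Size([4, 32, 8, 8])
```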
S203: and extracting global features of each video frame through a long-short-time memory network.
Wherein Long Short-Term Memory (LSTM) is a variant of recurrent neural network (Recurrent Neural Network, RNN) and aims to solve the problems of gradient disappearance and gradient explosion of RNN when processing Long sequence data.
It should be noted that video data generally includes time-related information, such as movement, motion, continuity, and the like of an object. LSTM networks can effectively capture and model this time dependence. With LSTM, the network can remember previous frames and use this information in subsequent frames to better understand the global context of the video.
In one possible implementation, the substep S203 specifically includes grandchild steps S2031 to S2033:
s2031: a sequence of video frames of video data is input.
S2032: extracting hidden states h of each video frame, wherein the hidden states include forward hidden statesAnd a backward hidden state
It =Sigmoid(WXI Xt +WHI ht-1 +bI )
Ft =Sigmoid(WXF Xt +WHF ht-1 +bF )
Ot =Sigmoid(WXO Xt +WHO ht-1 +bO )
C't =tanh(WXC Xt +WHC ht-1 +bC )
Ct =Ft ·Ct-1 +It ·C't
ht =Ot ·tanh(Ct )
Wherein I ist An activation output vector representing an input gate at time t, sigmoid () representing a Sigmoid activation function, WXI Representing a weight matrix between word sequences and input gates, WHI Representing a weight matrix between hidden states and input gates, bI Representing the bias term of the input gate, Ft An activation output vector of a forgetting gate at the time t is represented by WXF Weight matrix between word sequence and forgetting gate, WHF A weight matrix representing the hidden state and forgetting gate, bF Indicating the forgetting of the bias term of the door, Ot An activation output vector W representing an output gate at time tXO Representing a weight matrix between word sequences and output gates, WHO Representing a weight matrix between hidden states and output gates, Ct An activation output vector, C ', representing the cell memory cell at time t't Indicating t-time cell storageCandidate output vector of cell, Ct-1 Representing the activation output vector of the cell memory unit at time t-1, and tanh () represents tanh activation function, WXC Representing a weight matrix between word sequences and cell storage units, WHC Representing a weight matrix between hidden states and cell storage units, bC Bias term, h, representing cell memory cellt Represents the hidden state at the time t, ht-1 The hidden state at time t-1 is indicated.
S2033: integrating the forward hidden state and the backward hidden state to obtain a comprehensive hidden state which is used as the global feature of each video frame:
wherein H ist Represents the comprehensive hidden state at the time t, Wtf Representing the forward weight matrix at time t,indicating the forward hidden state at time t, Wtb A backward weight matrix representing the time t, < +.>And represents the backward hidden state at the time t.
It should be noted that integrating the forward and backward hidden states makes the comprehensive hidden state more comprehensive, allowing the model to take into account the context of the video frame, capturing the temporal features.
In the invention, the LSTM is used for extracting the global characteristics of the video frames, which is helpful for better understanding video content, including time dependence, motion information and continuity, so as to improve performance in tasks such as video repair.
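As an editorial illustration (not part of the original disclosure), the sketch below runs a bidirectional LSTM over a per-frame feature sequence and combines the forward and backward hidden states with learned weights, following H_t = W_tf·h_t^f + W_tb·h_t^b; the feature dimensions are assumptions.

```python
import torch
import torch.nn as nn

class GlobalFeatureExtractor(nn.Module):
    """Minimal sketch of S203: a bidirectional LSTM over the frame sequence,
    with the forward and backward hidden states merged into one state per frame."""

    def __init__(self, frame_feat_dim: int = 256, hidden_dim: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(frame_feat_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.w_forward = nn.Linear(hidden_dim, hidden_dim, bias=False)   # W_tf
        self.w_backward = nn.Linear(hidden_dim, hidden_dim, bias=False)  # W_tb

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, num_frames, frame_feat_dim)
        out, _ = self.lstm(frame_feats)        # (batch, num_frames, 2 * hidden_dim)
        h_fwd, h_bwd = out.chunk(2, dim=-1)    # split forward / backward states
        return self.w_forward(h_fwd) + self.w_backward(h_bwd)  # H_t for every frame

# Usage: global features for a clip of 16 frames, each described by a 256-d vector.
model = GlobalFeatureExtractor()
H = model(torch.randn(2, 16, 256))
print(H.shape)  # torch.Size([2, 16, 128])
```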
S204: and carrying out feature fusion on the light flow features, the local features and the global features to obtain fusion features.
In one possible implementation, S204 is specifically: and carrying out feature fusion on the light flow features, the local features and the global features according to the following formula to obtain fusion features:
S=β1 ·s12 ·s23 ·s3
wherein S represents a fusion feature, S1 Representing the features of the optical flow, beta1 Weighting coefficients, s, representing optical flow characteristics2 Representing local features, beta2 Weighting coefficients, s, representing local features3 Representing global features, beta3 Weight coefficients representing global features.
Wherein, the person skilled in the art can set the weight coefficient beta of the optical flow characteristic according to the actual situation1 Weighting coefficient beta of local feature2 And the weighting coefficient beta of the global feature3 The size of (3) is not limited in the present invention.
In the invention, feature fusion allows the model to benefit from different feature sources, and improves the comprehensive performance, adaptability and robustness of the model.
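As an editorial illustration, the weighted fusion S = β_1·s_1 + β_2·s_2 + β_3·s_3 amounts to a single line of array arithmetic; the weights in the sketch are illustrative assumptions.

```python
import numpy as np

def fuse_features(s1, s2, s3, beta=(0.4, 0.3, 0.3)):
    """Sketch of S204: per-pixel weighted sum of the optical flow, local and
    global feature maps; the weight coefficients beta are left to the
    practitioner by the patent and are chosen arbitrarily here."""
    b1, b2, b3 = beta
    return b1 * np.asarray(s1) + b2 * np.asarray(s2) + b3 * np.asarray(s3)
```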
S205: and detecting a defect area in each video frame according to the fusion characteristics.
In one possible implementation, the substep S205 specifically includes grandchild steps S2051 to S2053:
s2051: according to the fusion characteristics, defect detection values of all pixel points are calculated:
Cij =Softmax(W·Sij +B)
wherein C isij Representing pixel points (x)i ,yj ) Defect detection value at Softmax () represents Softmax activation function, Sij Representing pixel points (x)i ,yj ) And the fusion characteristic value is represented by W, wherein W represents a weight coefficient and B represents a bias parameter.
S2052: and when the defect detection value is larger than a preset value, determining the pixel point as a defect pixel point.
S2053: and combining each defective pixel point into a defective area.
In the invention, the optical flow characteristics, the local characteristics and the global characteristics are comprehensively considered, the characteristics of the video frame are more comprehensively evaluated, the defect area is automatically determined, and meanwhile, the accuracy of detecting the defect area of the video is improved.
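As an editorial illustration (not from the patent), the sketch below scores every pixel with Softmax(W·S_ij + B), thresholds the defect probability, and leaves region grouping to a standard connected-component step; W, B and the threshold are assumptions.

```python
import numpy as np

def detect_defect_mask(fused, W, B, threshold=0.5):
    """Sketch of S205: per-pixel softmax defect score over the fused features.

    fused: (H, W_img, D) fused feature map; W: (2, D) weights; B: (2,) bias.
    Returns a boolean mask of pixels whose defect probability exceeds the threshold.
    """
    logits = fused @ W.T + B                       # (H, W_img, 2)
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    defect_prob = probs[..., 1]                    # probability of the "defect" class
    return defect_prob > threshold

# Neighbouring defect pixels can then be combined into defect areas, e.g. with
# scipy.ndimage.label(mask) (a standard connected-component routine).
```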
In one possible embodiment, the training method of the defect area detection model includes:
constructing a loss function of a defect area detection model:
L(θ) = λ·L_dice + (1−λ)·L_IoU
where L( ) represents the loss function, θ represents the model parameter set of the defect area detection model, θ = [α, β_1, β_2, β_3, W, B], L_dice represents the Dice loss, λ represents the weight coefficient of the Dice loss, and L_IoU represents the IoU loss.
The size of the weight coefficient λ of the Dice loss can be set by a person skilled in the art according to the actual situation, and the invention does not limit it.
In the present invention, both the Dice loss and the IoU loss are used to measure the degree of overlap between the predicted result and the real label. The Dice loss focuses on accuracy, while the IoU loss focuses on recall. By using both losses, the model more fully takes accuracy and recall into account during training, so as to better accommodate various detection tasks.
Wherein, the Dice loss is specifically:
L_dice = 1 − 2·Σ_i(y_i·ŷ_i) / (Σ_i y_i + Σ_i ŷ_i)
wherein y_i represents the true label of the i-th sample, ŷ_i represents the prediction result of the i-th sample, i = 1, 2, …, N, and N represents the total number of samples.
Wherein, the IoU loss is specifically:
L_IoU = 1 − Σ_i(y_i·ŷ_i) / (Σ_i y_i + Σ_i ŷ_i − Σ_i(y_i·ŷ_i))
and training the defect area detection model by taking the minimum function value of the loss function of the defect area detection model as a target.
In the invention, the comprehensive use of the Dice loss and IoU loss to construct the loss function is helpful to improve the performance of the defect area detection model, so that the defect area detection model has better performance in the aspects of accuracy, recall rate, unbalance data adaptation and the like.
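As an editorial illustration, the combined objective can be written as a single differentiable loss; the soft Dice/IoU forms below are the standard ones and are an assumption about the patent's exact definitions.

```python
import torch

def detection_loss(pred, target, lam=0.5, eps=1e-6):
    """Sketch of L(theta) = lam * L_dice + (1 - lam) * L_iou for the defect
    area detection model.

    pred: predicted defect probabilities in [0, 1]; target: binary ground truth.
    """
    pred = pred.reshape(-1)
    target = target.reshape(-1)
    inter = (pred * target).sum()
    dice = 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    union = pred.sum() + target.sum() - inter
    iou = 1 - (inter + eps) / (union + eps)
    return lam * dice + (1 - lam) * iou
```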
In one possible implementation manner, training the defect area detection model with the goal of minimizing the function value of its loss function specifically includes:
Initializing the population Q, the initial temperature T_0, the maximum number of iterations m and the termination temperature T_m, the population Q comprising a plurality of individuals X, each individual X representing a feasible model parameter set θ, θ = [α, β_1, β_2, β_3, W, B];
Calculating the fitness value of each individual, and determining the food position and the natural enemy position of the population Q, wherein the fitness value is calculated as follows:
δ_i = 1 / L_i
wherein δ_i represents the fitness value of the i-th individual, and L_i represents the function value of the loss function when the model parameter set of the i-th individual is used;
it should be noted that, taking the inverse of the loss function as the fitness function can facilitate subsequent calculation and optimization.
Performing a mutation operation on the individual X to generate a new individual X_new:
wherein X_new represents the new individual, X represents the target individual, X_max represents the individual with the largest fitness value, X_min represents the individual with the smallest fitness value, and rand represents a random number between 0 and 1;
in the invention, by carrying out mutation operation on individuals, new solutions can be introduced, so that the diversity of the population is increased, the solution which is unknown before is explored, and the algorithm is more likely to find the globally optimal solution.
Comparing the fitness values of the individual X and the new individual X_new: when δ(X_new) > δ(X), replacing the individual X with the new individual X_new; when δ(X_new) ≤ δ(X), replacing the individual X with the new individual X_new with a preset replacement probability P;
The preset replacement probability P is calculated as follows:
P = e^{(δ(X_new) − δ(X)) / T}
wherein P represents the preset replacement probability, e represents the base of the natural logarithm, δ(X_new) represents the fitness value of the new individual X_new, δ(X) represents the fitness value of the individual X, and T represents the current temperature;
in the present invention, a temperature parameter T is introduced, allowing more sub-optimal solutions to be accepted at an early stage, helping to avoid premature collapse into a locally optimal solution; when the temperature is higher, a worse solution is more acceptable, and the tapering temperature may gradually converge to a better solution.
In the invention, the preset replacement probability P is used for controlling whether a new individual is accepted or not, and random exploration in a search space is facilitated. By accepting the new solution with a higher probability, there is an opportunity to find a better solution, while gradually sinking into a converging state as the temperature gradually decreases.
When the new individual X_new does not successfully replace the individual X, updating the position of the individual X:
X_{t+1} = X_t + ΔX_{t+1}
ΔX_{t+1} = (η_1·A_1 + η_2·A_2 + η_3·A_3 + η_4·A_4 + η_5·A_5) + ω·ΔX_t
wherein X_{t+1} represents the position of the individual X at the (t+1)-th iteration, X_t represents the position of the individual X at the t-th iteration, ΔX_{t+1} represents the displacement vector at the (t+1)-th iteration, ΔX_t represents the displacement vector at the t-th iteration, A_1 represents the first behavior, η_1 represents the weight coefficient of the first behavior, A_2 represents the second behavior, η_2 represents the weight coefficient of the second behavior, A_3 represents the third behavior, η_3 represents the weight coefficient of the third behavior, A_4 represents the fourth behavior, η_4 represents the weight coefficient of the fourth behavior, A_5 represents the fifth behavior, η_5 represents the weight coefficient of the fifth behavior, and ω represents the inertia weight factor;
optionally, the first behavior is indicative of separation, the second behavior is indicative of alignment, the third behavior is indicative of aggregation, the fourth behavior is indicative of predation, and the fifth behavior is indicative of avoidance of natural enemies;
in the present invention, even a new individual Xnew The position of the individual X can be slightly adjusted by the position updating strategy after the individual X is not replaced, so that diversity among the individuals is kept, the population is prevented from falling into a local optimal solution, and the individuals can gradually trend to a better solution without suddenly jumping out of a potential good solution by small-amplitude displacement.
Judging whether the iteration number reaches the maximum iteration number m or whether the current temperature reaches the termination temperature Tm The method comprises the steps of carrying out a first treatment on the surface of the If yes, outputting a feasible solution with the maximum reserved fitness value (the minimum function value of the loss function) as an optimal solution; otherwise, updating the temperature, and returning to the step of calculating the fitness value of each individual for iteration:
Tt+1 =εTt
wherein epsilon represents the cooling coefficient, Tt+1 Represents the temperature at the t+1st iteration, Tt The temperature at the t-th iteration is indicated.
In the present invention, algorithms can more easily escape from the initial solution by gradually decreasing the temperature as the iteration progresses, and explore more widely in the search space to find globally optimal solutions, the gradual decrease in temperature helping to guide the search toward more optimal solutions.
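As an editorial illustration (not from the patent), the sketch below combines the ingredients this training scheme describes: fitness as the inverse of the loss, temperature-dependent acceptance of worse mutants, a small position update for rejected candidates, and geometric cooling. The mutation and position-update operators are simplified stand-ins for the five swarm behaviours A_1..A_5 named in the text, and all parameter values are assumptions.

```python
import numpy as np

def anneal_train(loss_fn, dim, pop_size=20, t0=1.0, t_end=1e-3,
                 max_iter=200, eps=0.95, omega=0.5, seed=0):
    """Hedged sketch of the population-based training loop: returns the
    parameter vector theta with the best fitness (smallest loss) found."""
    rng = np.random.default_rng(seed)
    fit = lambda x: 1.0 / (loss_fn(x) + 1e-12)       # fitness = inverse of the loss

    pop = rng.normal(size=(pop_size, dim))           # each row is one candidate theta
    fitness = np.array([fit(x) for x in pop])
    temp = t0

    for _ in range(max_iter):
        best = pop[fitness.argmax()]
        worst = pop[fitness.argmin()]
        for i in range(pop_size):
            # Mutation: perturb towards the spread between best and worst individuals.
            x_new = pop[i] + rng.random() * (best - worst)
            f_new = fit(x_new)
            # Accept if better, or with probability exp((f_new - f_old) / T) if worse.
            if f_new > fitness[i] or rng.random() < np.exp((f_new - fitness[i]) / temp):
                pop[i], fitness[i] = x_new, f_new
            else:
                # Rejected: small inertial position update keeps diversity.
                pop[i] = pop[i] + omega * rng.normal(scale=0.01, size=dim)
                fitness[i] = fit(pop[i])
        temp *= eps                                   # cooling: T_{t+1} = eps * T_t
        if temp < t_end:
            break
    return pop[fitness.argmax()]
```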
S3: and constructing a video repair model, and repairing the defect area through the video repair model. S3 specifically includes substeps S301 to S304:
s301: it is detected whether there is an undamaged induced image frame in the neighboring frames, if so, S302 is performed, otherwise S304 is performed.
In one possible implementation, substep S301 specifically includes grandchild steps S3011 and S3012:
s3011: the similarity between the current frame and the neighboring frame is calculated according to the following formula:
wherein σ_k represents the similarity between the current frame and the k-th adjacent frame, s_1(ij) represents the optical flow feature at the pixel point (x_i, y_j) in the current frame, s_1(ijk) represents the optical flow feature at the pixel point (x_i, y_j) in the k-th adjacent frame, β_1 represents the weight coefficient of the optical flow feature, s_2(ij) represents the local feature at the pixel point (x_i, y_j) in the current frame, s_2(ijk) represents the local feature at the pixel point (x_i, y_j) in the k-th adjacent frame, β_2 represents the weight coefficient of the local feature, s_3(ij) represents the global feature at the pixel point (x_i, y_j) in the current frame, s_3(ijk) represents the global feature at the pixel point (x_i, y_j) in the k-th adjacent frame, β_3 represents the weight coefficient of the global feature, i = 1, 2, …, M, M represents the total number of horizontal pixel points of the video, j = 1, 2, …, N, and N represents the total number of vertical pixel points of the video.
In the invention, by integrating different types of feature information (optical flow features, local features and global features), the algorithm can more fully compare the similarity between the current frame and the adjacent frames, and is helpful for better understanding the relationship between frames, especially in the case of complex motion or uneven variation. Further, the similarity calculation formula combines the information of various feature dimensions, so that the similarity between frames can be reflected more accurately, and whether the current frame is repaired by using the adjacent frames is better determined. This helps to reduce false decisions and improves the effectiveness of the repair.
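As an editorial illustration: the patent's exact similarity formula is not reproduced in this text, so the sketch below simply combines the per-pixel differences of the three feature maps with the weights β and maps the result to a score in (0, 1]; the exact form used in the patent may differ, and the weights are assumptions.

```python
import numpy as np

def frame_similarity(cur, ref, beta=(0.4, 0.3, 0.3)):
    """Hedged sketch of S3011: weighted feature-difference similarity between
    the current frame and one adjacent frame.

    cur, ref: tuples (s1, s2, s3) of feature maps for the current frame and
    the k-th adjacent frame, each of matching shape.
    """
    diff = sum(b * np.abs(np.asarray(c) - np.asarray(r)).mean()
               for b, c, r in zip(beta, cur, ref))
    return 1.0 / (1.0 + diff)   # identical frames -> 1, very different frames -> ~0
```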
S3012: when the similarity between the current frame and the adjacent frame is greater than the preset similarity, checking whether the adjacent frame has an undamaged induced image frame, if so, executing S302, otherwise, executing S304.
The size of the preset similarity can be set by a person skilled in the art according to practical situations, and the invention is not limited.
S302: and repairing the defect area according to the induced image frame.
In the invention, whether the similar adjacent frames are not damaged is determined preferentially, and the undamaged adjacent frames can be adopted to repair the video rapidly, so that the video repair efficiency is improved.
In one possible implementation manner, the present invention proposes a completely new video repair method, and the substep S302 specifically includes grandchild steps S3021 to S3023:
s3021: and performing dimension reduction processing on the video frame through a dynamic Gaussian process, and mapping the high-dimension fusion characteristics to a low-dimension potential variable space.
It should be noted that the high-dimensional fusion feature of the video frame is mapped to the low-dimensional latent variable space. This helps reduce the dimensionality of the data, lessening the computational burden while retaining critical information. The low-dimensional representation facilitates more efficient processing of subsequent image restoration tasks.
In one possible implementation manner, the invention provides a brand new construction manner of a dynamic Gaussian process, which comprises the following steps:
introducing M auxiliary points, and obtaining a probability model of a dynamic Gaussian process according to auxiliary input positions Z and auxiliary outputs u of the M auxiliary points:
p(y,f,u|X,Z)=p(y|f)·p(f,u|X,Z)
wherein p (y, f, u|X, Z) represents a probability model of the dynamic Gaussian process, y represents the output, f represents the dynamic Gaussian process, u represents the auxiliary output, X represents the input position, and Z represents the auxiliary input position.
It should be noted that the introduction of auxiliary points can increase the flexibility of the model, so that it can better adapt to complex data distribution. The introduction of auxiliary points can reduce direct dependence on data, thereby reducing the burden of calculation and storage and improving the calculation efficiency of the model. Further, introducing auxiliary points can improve the flexibility, efficiency and fitting ability of the dynamic gaussian process model, and simultaneously reduce the computational complexity, so that the dynamic gaussian process model is more suitable for various applications including image processing and restoration.
The posterior distribution of the dynamic Gaussian process is determined through the optimal distribution of the auxiliary points:
p(f|y)=∫p(f|u)q(u)du
where p (f|y) represents the posterior distribution of the dynamic gaussian process, p (f|u) represents the posterior distribution of the auxiliary points, and q (u) represents the optimal distribution of the auxiliary points.
Based on posterior distribution of a dynamic Gaussian process, the video frames are subjected to dimension reduction processing, and high-dimension fusion features are mapped to a low-dimension potential variable space.
In the invention, the mapping of the high-dimensional fusion features to the low-dimensional potential variable space is beneficial to improving the computing efficiency, removing noise, retaining key information and better understanding the structure of data, and has benefits for various image processing tasks, in particular to image restoration tasks.
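As an editorial illustration (not part of the original disclosure), the sketch below shows the auxiliary-point mechanism behind p(f | y) = ∫ p(f | u) q(u) du on a toy 1-D regression problem rather than the full latent-variable mapping: M inducing inputs Z summarise the data so that predictions scale with M instead of the number of data points. This is the standard subset-of-regressors approximation; the kernel, noise level and sizes are assumptions.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def sparse_gp_predict(X, y, Z, X_star, noise=1e-2):
    """Posterior mean of a sparse GP that routes all information through the
    M auxiliary (inducing) points Z, as in the probability model of S3021."""
    Kzz = rbf(Z, Z)
    Kzx = rbf(Z, X)
    Ksz = rbf(X_star, Z)
    # Sigma = (Kzz + noise^{-1} Kzx Kxz)^{-1}: covariance of the optimal q(u).
    Sigma = np.linalg.inv(Kzz + Kzx @ Kzx.T / noise + 1e-8 * np.eye(len(Z)))
    mean_u_weights = Sigma @ Kzx @ y / noise
    return Ksz @ mean_u_weights            # posterior mean at the test inputs

# Usage: compress a noisy 1-D signal through M = 10 inducing points.
X = np.linspace(0, 1, 200)[:, None]
y = np.sin(6 * X[:, 0]) + 0.05 * np.random.randn(200)
Z = np.linspace(0, 1, 10)[:, None]
y_hat = sparse_gp_predict(X, y, Z, X)
```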
S3022: and selecting a target area in the induced image frame in the potential variable space, and replacing, interpolating and reconstructing the defect area to repair the defect area.
It should be noted that in the latent variable space, the target region can be selected more easily, the defective region can be replaced, interpolated and reconstructed, allowing for more accurate and efficient repair of the defective pixel or region, improving image quality.
Further, compared with the traditional scheme of directly replacing the defect with a similar frame, the reduced-dimension representation in the latent variable space makes defect repair easier to realize; the repaired image can keep the visual quality of the original image, and unnecessary artifacts or deformation are avoided.
S3023: Performing the inverse dynamic Gaussian process on the repaired video frame, and remapping it back to the original data space.
It should be noted that after the repair task is completed, the repaired image can be remapped back to the original data space through the inverse dynamic gaussian process, and the repaired image can maintain the same resolution and characteristics as those of the original video frame, so that no unexpected distortion is introduced.
In the invention, through data dimension reduction, defect repair and inverse mapping, a higher-quality image repair result is provided while the computational cost is reduced, which helps improve the performance and usability of the image repair algorithm.
S303: and calculating the image quality score of the repaired video frame, and executing S304 when the image quality score of the repaired video frame is lower than the preset score.
In the present invention, evaluating the image quality of the repaired video frame helps to determine the effectiveness of the repair process. If the image quality is not as expected, the system may automatically trigger generation of an anti-network repair to ensure that the final output image quality meets the requirements. Meanwhile, the use of the generated countermeasure network can be reduced to a certain extent, the computing resources are saved, the video restoration cost is reduced, and the restoration efficiency is improved.
In one possible implementation, the substep S303 specifically includes grandchild steps S3031 and S3032:
s3031: and calculating the peak signal-to-noise ratio and the structural similarity of the repaired video frame.
Wherein, the peak signal-to-noise ratio is specifically:
e_1 = 10·log_10( (2^k − 1)^2 / ( (1/(M·N))·Σ_i Σ_j (x_ij − y_ij)^2 ) )
wherein e_1 represents the peak signal-to-noise ratio, k represents the number of bits of the binary representation, x_ij represents the pixel value of the pixel point in the i-th row and j-th column of the original image frame, y_ij represents the pixel value of the pixel point in the i-th row and j-th column of the repaired image frame, i = 1, 2, …, M, M represents the total number of horizontal pixel points of the video, j = 1, 2, …, N, and N represents the total number of vertical pixel points of the video.
Wherein, the structural similarity is specifically:
e_2 = L(y, ref)^{γ_1} · C(y, ref)^{γ_2} · S(y, ref)^{γ_3}
wherein e_2 represents the structural similarity, y represents the repaired image frame, ref represents the reference image frame, L(y, ref) represents the luminance similarity between the repaired image frame and the reference image, γ_1 represents the weight coefficient of the luminance similarity, C(y, ref) represents the contrast similarity between the repaired image frame and the reference image, γ_2 represents the weight coefficient of the contrast similarity, S(y, ref) represents the structural similarity between the repaired image frame and the reference image, and γ_3 represents the weight coefficient of the structural similarity.
S3032: calculating the image quality score of the repaired video frame according to the peak signal-to-noise ratio and the structural similarity:
E = μ·e_1 + (1−μ)·e_2
wherein E represents the image quality score, e_1 represents the peak signal-to-noise ratio, μ represents the weight of the peak signal-to-noise ratio, and e_2 represents the structural similarity.
The size of the weight μ of the peak signal-to-noise ratio can be set by a person skilled in the art according to practical situations, and the invention is not limited.
In the invention, the image quality is evaluated by integrating the peak signal-to-noise ratio and the structural similarity, which is helpful for realizing objective, comprehensive and adjustable image quality evaluation and improving the efficiency and reliability of image processing and restoration.
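As an editorial illustration (not from the patent), the sketch below computes a PSNR term and a global SSIM term and blends them as E = μ·e_1 + (1−μ)·e_2. The SSIM stabilising constants follow common defaults, the exponents are taken as γ_1 = γ_2 = γ_3 = 1, and the normalisation of e_1 before mixing is an assumption, since the patent does not state one.

```python
import numpy as np

def quality_score(repaired, reference, k=8, mu=0.5):
    """Sketch of S303: blended PSNR / structural-similarity quality score."""
    x = repaired.astype(np.float64)
    r = reference.astype(np.float64)
    peak = 2 ** k - 1

    # e1: peak signal-to-noise ratio over all pixels.
    mse = np.mean((x - r) ** 2)
    e1 = 10 * np.log10(peak ** 2 / max(mse, 1e-12))

    # e2: global SSIM with the usual stabilising constants.
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mx, mr = x.mean(), r.mean()
    vx, vr = x.var(), r.var()
    cov = ((x - mx) * (r - mr)).mean()
    e2 = ((2 * mx * mr + c1) * (2 * cov + c2)) / ((mx ** 2 + mr ** 2 + c1) * (vx + vr + c2))

    # Scale e1 roughly into [0, 1] before mixing so the two terms are comparable.
    return mu * min(e1 / 50.0, 1.0) + (1 - mu) * e2
```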
S304: the defective area is repaired by creating an countermeasure network.
Wherein, a generative adversarial network (Generative Adversarial Network, GAN) is a deep learning framework. A GAN consists of two neural networks, a Generator and a Discriminator, which oppose each other and learn together through a game so as to generate high-quality data samples.
In one possible implementation, the substep S304 specifically includes grandchild steps S3041 to S3045:
S3041: The discriminator F and the generator G are constructed in association with each other.
S3042: Real video data is input, and a plurality of derivative video data are generated by the generator according to the real video data.
S3043: Fixing the parameters of the generator G, and training the discriminator F with the goal of minimizing a first objective function L_1(θ_f, θ_g):
wherein θ_f represents the parameters of the discriminator, θ_g represents the parameters of the generator, E( ) represents the mathematical expectation, x represents real video data, F(x) represents the discrimination result of the discriminator on the real video data, p_t represents the distribution of the real video data, y represents derivative video data, F(y) represents the discrimination result of the discriminator on the derivative video data, and p_g represents the distribution of the derivative video data.
S3044: Fixing the parameters of the discriminator F, and training the generator G with the goal of maximizing a second objective function L_2(θ_f, θ_g):
S3045: Repairing the defect area through the trained generator G.
In the invention, the adversarial training framework of the generative adversarial network is fully utilized to generate high-quality data and achieves good results in tasks such as image restoration; through iterative training, the generator gradually improves the generated data so that it approaches the real data distribution, while the discriminator continuously improves its ability to distinguish real from fake data, thereby achieving better data restoration and generation.
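As an editorial illustration (not from the patent), the sketch below pairs a small convolutional generator and discriminator and alternates their updates with the standard non-saturating GAN losses; the layer sizes, the use of masked frames as the generator input, and the training hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Tiny fully-convolutional generator: takes a frame whose defect area has
    been zeroed out and proposes a repaired frame with values in [0, 1]."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, masked_frame):
        return self.net(masked_frame)

class Discriminator(nn.Module):
    """Scores a frame with a single real/fake logit."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1),
        )

    def forward(self, frame):
        return self.net(frame)

def train_step(gen, disc, opt_g, opt_d, real, masked,
               bce=nn.BCEWithLogitsLoss()):
    ones = torch.ones(real.size(0), 1)
    zeros = torch.zeros(real.size(0), 1)

    # Discriminator step: push real frames towards 1, generated frames towards 0.
    fake = gen(masked).detach()
    d_loss = bce(disc(real), ones) + bce(disc(fake), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: make the discriminator score generated frames as real.
    g_loss = bce(disc(gen(masked)), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Usage sketch with illustrative sizes: 8 frames of 3x64x64 with a zeroed defect area.
gen, disc = Generator(), Discriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
real = torch.rand(8, 3, 64, 64)
masked = real.clone()
masked[:, :, 16:48, 16:48] = 0.0
d_loss, g_loss = train_step(gen, disc, opt_g, opt_d, real, masked)
```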
Compared with the prior art, the invention has at least the following beneficial technical effects:
(1) In the invention, when an undamaged induced image frame exists among the adjacent frames, that frame is preferentially used to repair the defect area; a generative adversarial network is used for repair only if the required repair quality is not reached. This saves computing resources to a certain extent, reduces the video repair cost and improves the repair efficiency; the lost frames in the video no longer need to be filled by linear or nonlinear interpolation, and because the repair quality is monitored, image artifacts, distortion and discontinuity are avoided.
(2) In the invention, the optical flow characteristics, the local characteristics and the global characteristics are comprehensively considered, the characteristics of the video frame are more comprehensively evaluated, the defect area is automatically determined, and meanwhile, the accuracy of detecting the defect area of the video is improved.
Example 2
In one embodiment, referring to fig. 2 of the specification, a schematic structural diagram of a video repair system provided by the present invention is shown.
The invention provides a video repair system which comprises a processor 201 and a memory 202 for storing instructions executable by the processor 201. The processor 201 is configured to call the instructions stored in the memory 202 to perform the video repair method in embodiment 1.
The video restoration system provided by the invention can realize the steps and effects of the video restoration method in the embodiment 1, and in order to avoid repetition, the invention is not repeated.
Compared with the prior art, the invention has at least the following beneficial technical effects:
(1) In the invention, when an undamaged induced image frame exists among the adjacent frames, that frame is preferentially used to repair the defect area; a generative adversarial network is used for repair only if the required repair quality is not reached. This saves computing resources to a certain extent, reduces the video repair cost and improves the repair efficiency; the lost frames in the video no longer need to be filled by linear or nonlinear interpolation, and because the repair quality is monitored, image artifacts, distortion and discontinuity are avoided.
(2) In the invention, the optical flow characteristics, the local characteristics and the global characteristics are comprehensively considered, the characteristics of the video frame are more comprehensively evaluated, the defect area is automatically determined, and meanwhile, the accuracy of detecting the defect area of the video is improved.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing examples illustrate only a few embodiments of the invention, which are described in detail and are not to be construed as limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (10)

Wherein I_t represents the activation output vector of the input gate at time t, Sigmoid( ) represents the Sigmoid activation function, W_XI represents the weight matrix between the input sequence and the input gate, W_HI represents the weight matrix between the hidden state and the input gate, b_I represents the bias term of the input gate, F_t represents the activation output vector of the forget gate at time t, W_XF represents the weight matrix between the input sequence and the forget gate, W_HF represents the weight matrix between the hidden state and the forget gate, b_F represents the bias term of the forget gate, O_t represents the activation output vector of the output gate at time t, W_XO represents the weight matrix between the input sequence and the output gate, W_HO represents the weight matrix between the hidden state and the output gate, C_t represents the activation output vector of the cell memory unit at time t, C'_t represents the candidate output vector of the cell memory unit at time t, C_{t−1} represents the activation output vector of the cell memory unit at time t−1, tanh( ) represents the tanh activation function, W_XC represents the weight matrix between the input sequence and the cell memory unit, W_HC represents the weight matrix between the hidden state and the cell memory unit, b_C represents the bias term of the cell memory unit, h_t represents the hidden state at time t, and h_{t−1} represents the hidden state at time t−1;
wherein σ_k represents the similarity between the current frame and the k-th adjacent frame, s_1(ij) represents the optical flow feature at the pixel point (x_i, y_j) in the current frame, s_1(ijk) represents the optical flow feature at the pixel point (x_i, y_j) in the k-th adjacent frame, β_1 represents the weight coefficient of the optical flow feature, s_2(ij) represents the local feature at the pixel point (x_i, y_j) in the current frame, s_2(ijk) represents the local feature at the pixel point (x_i, y_j) in the k-th adjacent frame, β_2 represents the weight coefficient of the local feature, s_3(ij) represents the global feature at the pixel point (x_i, y_j) in the current frame, s_3(ijk) represents the global feature at the pixel point (x_i, y_j) in the k-th adjacent frame, β_3 represents the weight coefficient of the global feature, i = 1, 2, …, M, M represents the total number of horizontal pixel points of the video, j = 1, 2, …, N, and N represents the total number of vertical pixel points of the video;
CN202311504674.0A — priority date 2023-11-13, filing date 2023-11-13 — Video restoration method and system — Active, granted as CN117455812B (en)

Priority Applications (1)

CN202311504674.0A — CN117455812B (en) — priority date 2023-11-13, filing date 2023-11-13 — Video restoration method and system

Publications (2)

CN117455812A (en) — 2024-01-26
CN117455812B (en) — 2024-06-04

Family

ID=89583458

Country Status (1)

CN — CN117455812B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party

US20060257042A1 (en) * — priority 2005-05-13, published 2006-11-16 — Microsoft Corporation — Video enhancement
WO2009126621A2 (en) * — priority 2008-04-07, published 2009-10-15 — Tufts University — Methods and apparatus for image restoration
US20230008473A1 (en) * — priority 2021-06-28, published 2023-01-12 — Beijing Baidu Netcom Science Technology Co., Ltd. — Video repairing methods, apparatus, device, medium and products
CN115731132A (en) * — priority 2022-11-25, published 2023-03-03 — 京东方科技集团股份有限公司 — Image restoration method, device, equipment and medium
CN116189292A (en) * — priority 2023-01-05, published 2023-05-30 — 重庆大学 — Video action recognition method based on double-flow network

Cited By (3)

CN118570698A (en) * — priority 2024-05-24, published 2024-08-30 — 北京优酷科技有限公司 — Video defect detection method and device, electronic device and storage medium
CN118570698B (en) * — priority 2024-05-24, granted 2025-07-08 — 北京优酷科技有限公司 — Video defect detection method and device, electronic device and storage medium
CN119323525A (en) * — priority 2024-09-26, published 2025-01-17 — 深圳前海微众银行股份有限公司 — Real-time video stream processing method and device

Also Published As

CN117455812B (en) — 2024-06-04


Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination
GR01 — Patent grant
