CN118764627B

Movatterモバイル変換

Info

Publication number: CN118764627B
Application number: CN202411237931.3A
Authority: CN
Inventors: 周艳艳
Original assignee: Hongyuan Intelligent Control Technology Beijing Co ltd
Current assignee: Hongyuan Intelligent Control Technology Beijing Co ltd
Priority date: 2024-09-05
Filing date: 2024-09-05
Publication date: 2024-11-26
Anticipated expiration: 2044-09-05
Also published as: CN118764627A

Abstract

The present application relates to the field of video processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for encoding and decoding video, where the method includes: generating a comprehensive feature vector based on the acquired video and the current state of the network; processing the comprehensive feature vector based on a preset network state prediction model to obtain a network prediction state; processing the current state of the network and the predicted state of the network based on a preset generation countermeasure network to obtain network simulation environment information, wherein the network simulation environment information at least comprises video transmission delay time, video coding and decoding efficiency and video quality; and processing the network simulation environment information based on a strategy optimization model to obtain an optimization strategy, and encoding, decoding and transmitting the video based on the optimization strategy. The application is convenient for improving the level of ensuring high-quality transmission of video in a dynamic network environment.

Description

Video encoding and decoding and transmission method, device, equipment and storage medium

Technical Field

The present application relates to the field of video processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for video encoding and decoding.

Background

In the rapidly evolving digital age of today, video content consumption has become an integral part of everyday life. With the popularization of high-definition video formats such as 4K and 8K and the development of emerging technologies such as Virtual Reality (VR) and Augmented Reality (AR), higher requirements are being put on video encoding, decoding and transmission technologies.

The current video encoding and decoding and transmission technology is as follows: adjusting video coding and decoding settings according to the content complexity of the video, wherein the video coding and decoding settings comprise setting resolution, frame rate, coding quality and the like; coding the video according to the adjusted video coding setting to obtain video data to be transmitted, and transmitting the video data to be transmitted to target equipment; in the process of transmitting the video to be transmitted, the transmission bit rate is adjusted by adopting an adaptive bit stream technology (ABR) according to the network state of a user so as to ensure the stability of the video transmission process.

However, the current video encoding and decoding and transmission technology does not consider dynamic changes of network states (network congestion, bandwidth fluctuation and the like) when the video encoding and decoding and transmission are performed, and in the process that the network states are in dynamic changes, the stability and fluency of video transmission are difficult to ensure by the current video encoding and decoding and transmission technology, so that the level of ensuring high-quality video transmission in a dynamic network environment needs to be improved in the prior art.

Disclosure of Invention

In order to facilitate improvement of the level of ensuring high-quality video transmission in a dynamic network environment, embodiments of the present application provide a video encoding and decoding and transmission method, apparatus, device, and storage medium.

In a first aspect, an embodiment of the present application provides a video encoding/decoding and transmission method, including:

generating a comprehensive feature vector based on the acquired video and the current state of the network;

Processing the comprehensive feature vector based on a preset network state prediction model to obtain a network prediction state;

Processing the current state of the network and the predicted state of the network based on a preset generation countermeasure network to obtain network simulation environment information, wherein the network simulation environment information at least comprises video transmission delay time, video coding and decoding efficiency and video quality;

and processing the network simulation environment information based on a strategy optimization model to obtain an optimization strategy, and encoding, decoding and transmitting the video based on the optimization strategy.

In a second aspect, an embodiment of the present application provides a video encoding/decoding and transmitting device, including:

The vector calculation module is used for generating a comprehensive feature vector based on the acquired video and the current state of the network;

The state prediction module is used for processing the comprehensive feature vector based on a preset network state prediction model to obtain a network prediction state;

The information generation module is used for processing the current state of the network and the predicted state of the network based on a preset generation countermeasure network to obtain network simulation environment information, wherein the network simulation environment information at least comprises video transmission delay time, video coding and decoding efficiency and video quality;

And the strategy optimization module is used for processing the network simulation environment information based on a strategy optimization model to obtain an optimization strategy, and encoding, decoding and transmitting the video based on the optimization strategy.

In a third aspect, an embodiment of the present application provides a computer device, where the computer device includes a memory and a processor, where the memory stores a computer program, and where the processor implements the steps of the method described above when executing the computer program.

In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which when executed by a processor performs steps in the above-described method.

In a fifth aspect, embodiments of the present application also provide a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of any of the method embodiments described above.

In the embodiments of the video encoding and decoding and transmission method, device, equipment and storage medium, the comprehensive feature vector is generated based on the acquired video and the current state of the network; processing the comprehensive feature vector based on a preset network state prediction model to obtain a network prediction state; processing the current state of the network and the predicted state of the network based on a preset generation countermeasure network to obtain network simulation environment information, wherein the network simulation environment information at least comprises video transmission delay time, video coding and decoding efficiency and video quality; and processing the network simulation environment information based on a strategy optimization model to obtain an optimization strategy, and encoding, decoding and transmitting the video based on the optimization strategy. Through implementation of the embodiment, the network state is predicted and simulated, then an optimization strategy is determined in a simulated network simulation environment, and video encoding, decoding and transmission are performed according to the optimization strategy; therefore, before video encoding and decoding and transmission are carried out, the influence of the network state on the video encoding and decoding and transmission is fully considered, and the optimal video encoding and decoding and transmission strategy is calculated, so that the level of ensuring high-quality video transmission in a dynamic network environment can be improved.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a diagram of an application environment of a video codec and transmission method according to an embodiment of the present application;

fig. 2 is a flowchart of a video encoding/decoding and transmission method according to a first embodiment of the present application;

Fig. 3 is a flowchart of a video encoding/decoding and transmission method according to a second embodiment of the present application;

fig. 4 is a flowchart of a video encoding/decoding and transmission method according to a third embodiment of the present application;

fig. 5 is a flowchart of a video encoding/decoding and transmission method according to a fourth embodiment of the present application;

Fig. 6 is a flowchart of a video encoding/decoding and transmission method according to a fifth embodiment of the present application;

Fig. 7 is a schematic structural diagram of a video encoding/decoding and transmission device according to an embodiment of the present application;

FIG. 8 is a schematic diagram of a computer device according to an embodiment of the present application;

fig. 9 is an internal structural diagram of a computer-readable storage medium provided in one embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present disclosure more apparent, the present disclosure will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present disclosure.

It should be noted that the terms "first," "second," and the like in the description and claims herein and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or device.

In this document, the term "and/or" is merely one association relationship describing the associated object, meaning that three relationships may exist. For example, a and/or B may represent: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.

In order to solve the above-mentioned problems, an embodiment of the present disclosure provides a video encoding/decoding and transmission method, which can be applied to an application environment as shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on a cloud or other network server. The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, where the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.

Example 1

Fig. 2 is a flowchart of a video encoding/decoding and transmission method according to an embodiment of the present application, and referring to fig. 2, the method may be performed by an apparatus for performing the method, where the apparatus may be implemented by software and/or hardware, and the method includes:

s110, generating a comprehensive feature vector based on the acquired video and the current state of the network.

The video is a video stream which needs to be encoded, decoded and transmitted, the video stream comprises a plurality of continuous video frames, the network is a network used for video transmission, and the current state of the network at least comprises network bandwidth, network delay, packet loss rate and video buffer information; the comprehensive feature vector is the combination of the spatial feature of the video content and the time sequence feature of the current state of the network, is used for understanding the relation between the video content and the current state of the network and provides decision support for encoding and decoding and transmission of subsequent videos; in addition, the acquired video and the current state of the network can be processed through a feature extraction network in a preset video processing system to obtain a comprehensive feature vector, wherein the feature extraction network comprises a first feature extraction network for processing the video and a second feature extraction network for processing the current state of the network; it should be noted that, the video processing system is configured to perform encoding, decoding and transmission on the acquired video; in this embodiment, the acquired video is denoted as V, the current state of the network is denoted as N, and the integrated feature vector is denoted as F.

Specifically, the video V is processed through a first feature extraction network corresponding to the video V to obtain a first processing resultThe network current state is processed through a second feature extraction network corresponding to the network current state to obtain a second processing resultThe video processing system then combines the first processing resultAnd the second processing resultThe resulting integrated feature vector is denoted F.

S120, processing the comprehensive feature vector based on a preset network state prediction model to obtain a network prediction state.

The video processing system is also provided with a network state prediction model communicated with the feature extraction network, and the network state prediction model is used for predicting a network state corresponding to a specific moment tThe network prediction state is the predicted network state corresponding to the specific time t, and at least comprises network bandwidth, network delay, packet loss rate and video buffer information corresponding to the specific time t; the video consists of video frames corresponding to different moments, and the comprehensive feature vector F specifically comprises the network states corresponding to the video frames at the different moments; In this embodiment, the network state prediction model calculates the network state corresponding to the specific time tThe calculation formula of (2) is as follows:

It should be noted that the number of the substrates,For the network state corresponding to a specific time t, k and m are preset upper limit values,Coefficients of an autoregressive model in the network state prediction model,Is the network state corresponding to the video frame at time (t-i),Coefficients of a moving average model among the network state prediction models,Is the amount of change in the network state corresponding to the video frame at time (t-j),Gamma is a constant term or intercept of the network state prediction model points, for providing a baseline value of the network state prediction model,Is an error term used to characterize the difference between the predicted network state and the actual network state.

Specifically, the network states corresponding to the video frames at different moments in the comprehensive feature vector F are determinedInputting the network state prediction model to process so as to predict the network state corresponding to the specific time t, namely the network prediction state。

S130, processing the current state of the network and the predicted state of the network based on a preset generation countermeasure network to obtain network simulation environment information, wherein the network simulation environment information at least comprises video transmission delay time, video coding and decoding efficiency and video quality.

Wherein generating an antagonism network (GAN) setting and in the video processing system and in communication with the network state prediction model and the feature extraction network; generating a countermeasure network (GAN) for handling network current state N and network predicted stateThus obtaining network simulation environment information, wherein the network simulation environment information is used for simulating the network environment corresponding to the video in the encoding and decoding and transmission processes; the video transmission delay time D (t), that is, the delay of the video processing system when transmitting the video V, the video encoding and decoding efficiency E (t) is the efficiency of the video processing system when encoding and decoding the video, and the video quality Q (t) is the quality level of the video transmitted to the user side by the video processing system.

Specifically, the current network state N obtained in step S110 and the network prediction state generated in step S120 are combinedThe video signal is input into a preset generation countermeasure network (GAN) for processing, so as to obtain network simulation environment information, which in this embodiment at least includes video transmission delay time D (t), video encoding and decoding efficiency E (t), and video quality Q (t).

And S140, processing the network simulation environment information based on a strategy optimization model to obtain an optimization strategy, and encoding, decoding and transmitting the video based on the optimization strategy.

The strategy optimization model is arranged in the video processing system and is communicated with a generation countermeasure network (GAN), the strategy optimization model is used for processing network simulation environment information for generating output of the countermeasure network (GAN) and generating an optimization strategy, the optimization strategy comprises an optimal coding and decoding scheme and an optimal video transmission scheme, the optimal coding and decoding scheme is suitable for the current state of the network, the coding and decoding scheme at least comprises an H.264 coding and decoding technology, an H.265 coding and decoding technology, an AV1 coding and decoding technology and the like, and the video transmission scheme comprises transmission control methods with different bit rates; the optimization strategy is used for controlling the video processing system to encode, decode and transmit the acquired video V.

Specifically, the network simulation environment information is input into a strategy optimization model for processing, so that an optimization strategy is obtained, and further, the video processing system controls the process of encoding, decoding and transmitting the obtained video V according to the optimization strategy.

It should be noted that, in this embodiment, the integrated feature vector is generated based on the acquired video and the current state of the network; processing the comprehensive feature vector based on a preset network state prediction model to obtain a network prediction state; processing the current state of the network and the predicted state of the network based on a preset generation countermeasure network to obtain network simulation environment information, wherein the network simulation environment information at least comprises video transmission delay time, video coding and decoding efficiency and video quality; and processing the network simulation environment information based on a strategy optimization model to obtain an optimization strategy, and encoding, decoding and transmitting the video based on the optimization strategy. Through implementation of the embodiment, the network state is predicted and simulated, then an optimization strategy is determined in a simulated network simulation environment, and video encoding, decoding and transmission are performed according to the optimization strategy; therefore, before video encoding and decoding and transmission are carried out, the influence of the network state on the video encoding and decoding and transmission is fully considered, and the optimal video encoding and decoding and transmission strategy is calculated, so that the level of ensuring high-quality video transmission in a dynamic network environment can be improved.

Example two

Fig. 3 is a flowchart of a video encoding/decoding and transmission method according to a second embodiment of the present application, and referring to fig. 3, the method optimizes "generating a comprehensive feature vector based on an acquired video and a current state of a network" in the first embodiment; in the parts of this embodiment not described in detail, reference may be made to the description of other embodiments, and the method includes:

S211, extracting features of the acquired video to obtain video features.

The feature extraction network in the first embodiment comprises a video feature extraction network, wherein a two-dimensional convolution layer Conv2D is arranged in the video feature extraction network, and the two-dimensional convolution layer Conv2D is used for carrying out two-dimensional convolution on a video V so as to obtain primary video features; and the convolution parameters of the two-dimensional convolution layer Conv2D are recorded as; The video feature extraction network is further provided with a nonlinear activation function Relu, and the nonlinear activation function Relu is used for further processing the primary video features to obtain target video featuresThe nonlinear activation function Relu is specifically configured to increase the expressive power of the video feature extraction network, and the nonlinear activation function Relu may set all negative values in the primary video feature to 0, thereby introducing a nonlinear video feature; the calculation formula corresponding to the video feature extraction network is as follows:

Specifically, the obtained video V is input into a video feature extraction network in a feature extraction network for processing, and the video feature extraction network performs two-dimensional convolution on a video frame of the video V through a two-dimensional convolution layer Conv2D to obtain a primary video featureFurther, the video feature extraction network further extracts primary video features through a nonlinear activation function ReluProcessing to obtain video features of the object。

S212, extracting the characteristics of the acquired current state of the network to obtain the network state characteristics.

The feature extraction network in the first embodiment further includes a network state feature extraction network, where the network state feature extraction network is specifically a long-short-term memory network (LSTM), and the long-short-term memory network (LSTM) is a special Recurrent Neural Network (RNN) capable of processing and memorizing long-term information; in this embodiment, the long short term memory network (LSTM) is specifically configured to extract network state characteristics from the current state N of the networkAnd the network parameters of the long-short-term memory network (LSTM) are recorded as，For determining how the LSTM extracts time series features from the sequence of current states of the network; the calculation formula corresponding to the network state feature extraction network is as follows:

Specifically, the acquired current state of the network is input into a network state feature extraction network for processing, so as to obtain the network state feature。

S213, performing feature connection on the video features and the network state features to obtain comprehensive feature vectors.

Wherein, the feature extraction network in the first embodiment further includes a feature connection function Concat, and the feature connection function Concat is used for video featuresWith the network status featurePerforming feature connection to obtain a comprehensive feature vector F; the calculation expression of the comprehensive feature vector F is as follows:

Specifically, the video features generated in step S211 are identifiedNetwork status feature generated in step S212Is input to the feature connection function Concat for processing, thereby obtaining the integrated feature vector F.

S220, processing the comprehensive feature vector based on a preset network state prediction model to obtain a network prediction state.

S230, processing the current state of the network and the predicted state of the network based on a preset generation countermeasure network to obtain network simulation environment information, wherein the network simulation environment information at least comprises video transmission delay time, video coding and decoding efficiency and video quality.

S240, processing the network simulation environment information based on a strategy optimization model to obtain an optimization strategy, and encoding, decoding and transmitting the video based on the optimization strategy.

In one embodiment, the optimizing the preset step of generating the countermeasure network in the first embodiment, and the step of constructing the generated countermeasure network includes:

S1, constructing a generator based on preset generator network parameters.

Wherein generating a antagonism network (GAN) includes two core components: the system comprises a generator G and a discriminator D, wherein the generator G is used for generating network state data which is as close to reality as possible, the discriminator D is used for distinguishing the generated network state data from the real network state data, and the generator G and the discriminator D can enable a generated countermeasure network (GAN) to generate more realistic network simulation environment information; the calculation formula corresponding to generator G is as follows:

It should be noted that the number of the substrates,The network state data calculated by the generator G, z is random noise input into the generator G,Generator network parameters.

Specifically, according to preset generator network parametersThe generator G is constructed (z,)。

S2, constructing a discriminator based on the discriminator network parameters, a preset full-connection layer network and an activation function.

The calculation formula corresponding to the discriminator D is as follows:

It should be noted that the number of the substrates,The network status data calculated by the generator G,For the identifier network parameters, FC is a preset full connection layer for inputting the network state data into the identifier DThe probability value is converted into a probability value, sigmoid is a preset activation function, and the probability value output by the full connection layer FC is converted into a probability value between 0 and 1.

Specifically, according to the parameters of the arbiter networkAnd constructing a discriminator D by the preset full-connection layer network FC and the activation function Sigmoid according to a calculation formula of the discriminator D.

S3, constructing and generating an countermeasure network based on the generator and the discriminator.

Specifically, the generator G constructed in the step S1 and the discriminator constructed in the step S2 are caused to establish a communication relationship, that is, the output of the generator GAs input to the arbiter D, an antagonism network (GAN) is then created based on the two core components of the generator G and the arbiter D.

Example III

Fig. 4 is a flowchart of a video encoding/decoding and transmission method according to a third embodiment of the present application, and referring to fig. 4, the method optimizes "processing the network simulation environment information based on a policy optimization model to obtain an optimization policy" in the first embodiment; in the parts of this embodiment not described in detail, reference may be made to the description of other embodiments, and the method includes:

and S310, generating a comprehensive feature vector based on the acquired video and the current state of the network.

S320, processing the comprehensive feature vector based on a preset network state prediction model to obtain a network prediction state.

S330, processing the current state of the network and the predicted state of the network based on a preset generation countermeasure network to obtain network simulation environment information, wherein the network simulation environment information at least comprises video transmission delay time, video coding and decoding efficiency and video quality.

S341, processing the video transmission delay time, the video coding and decoding efficiency and the video quality based on the rewarding function in the strategy optimization model to obtain rewarding results.

The strategy optimization model is provided with a reward function R (t), the reward function R (t) is used for calculating a reward obtained by a strategy adopted in a current state of a certain network, the strategy comprises a coding and decoding scheme and a video transmission scheme, and in the implementation, in order to meet the requirements of different current states of the network, a plurality of strategies are arranged in the video processing system. In this embodiment, the network simulation environment information specifically includes a video transmission delay time D (t), a video encoding and decoding efficiency E (t), and a video quality Q (t), and the policy optimization model presets a first weight parameter for the video transmission delay time D (t)A second weight parameter is preset for video coding and decoding efficiency E (t)A third weight parameter is preset for the video quality Q (t); The reward result is the output result of the reward function R (t), and the calculation formula corresponding to the reward function R (t) is as follows:

Wherein,The method is used for smoothing the influence of the video transmission delay time D (t) on the calculated rewarding result, and the lower the video transmission delay time D (t), the higher the rewarding result corresponding to the video transmission delay time D (t) is; log (E (t)) is used for smoothing the influence of the video coding efficiency E (t) on the calculated rewarding result, and the higher the video coding efficiency E (t), the higher the rewarding result corresponding to the video coding efficiency E (t); the video quality Q (t) is an important experience index of a user on the video, and the higher the video quality Q (t), the higher the corresponding rewarding result of the video quality Q (t) is.

Specifically, the video transmission delay time D (t), the video coding and decoding efficiency E (t) and the video quality Q (t) in the network simulation environment information are input into the reward function R (t) for calculation, so that a reward result is obtained.

S342, determining an optimization strategy based on the rewarding result and a strategy selection function in the strategy optimization model.

The method comprises the steps that a strategy optimization model is used for calculating a rewarding result corresponding to each strategy, and executing the corresponding strategy according to the higher rewarding result indicates that the video with the best quality can be transmitted in a dynamic network environment; and in the strategy selection function setting and strategy optimization model, the strategy selection function is a function for determining the corresponding strategy when the reward result is maximized.

Specifically, the policy optimization model calculates rewarding results corresponding to various policies through a rewarding function R (t) under the current state of the network corresponding to a certain time t, and then selects the policy corresponding to the largest rewarding result through a policy selection function as an optimization policy.

S343, encoding and decoding the video and transmitting the video based on the optimization strategy.

The optimization strategy comprises a specific video coding and decoding scheme and a video transmission scheme.

Specifically, the video V is encoded and transmitted according to a specific video encoding and decoding scheme and a video transmission scheme corresponding to the pair optimization strategy pair.

Example IV

Fig. 5 is a flowchart of a video encoding/decoding and transmission method according to a fourth embodiment of the present application, and referring to fig. 5, the method optimizes "determining an optimization policy based on the reward result and a policy selection function in the policy optimization model" in the third embodiment; in the parts of this embodiment not described in detail, reference may be made to the description of other embodiments, and the method includes:

S410, generating a comprehensive feature vector based on the acquired video and the current state of the network.

S420, processing the comprehensive feature vector based on a preset network state prediction model to obtain a network prediction state.

S430, processing the current state of the network and the predicted state of the network based on a preset generation countermeasure network to obtain network simulation environment information, wherein the network simulation environment information at least comprises video transmission delay time, video coding and decoding efficiency and video quality.

S441, processing the video transmission delay time, the video coding and decoding efficiency and the video quality based on the rewarding function in the strategy optimization model to obtain rewarding results.

And S4421, updating the function parameters of the strategy selection function based on the rewarding result to obtain a target strategy selection function.

Wherein the policy selection function has corresponding function parametersThe function parameters of the policy selection function can be selected by the calculated reward resultUpdating to obtain updated function parametersThen the updated function parametersUpdating to the policy selection function can obtain an updated policy selection function, and the target policy selection function is the updated policy selection function.

Specifically, after calculating the reward result in step S441, further, updating the current function parameters of the policy selection function according to the reward result to obtain updated function parameters, and then updating the updated function parameters to the policy selection function to obtain the target policy selection function.

And S4422, determining an optimization strategy based on the target strategy selection function and the rewarding result.

Specifically, a strategy corresponding to the reward result with the largest value is determined through a target strategy selection function and is used as an optimization strategy.

S443, encoding and decoding the video and transmitting the video based on the optimization strategy.

Example five

Fig. 6 is a flowchart of a video encoding/decoding and transmission method according to an embodiment of the present application, and referring to fig. 6, the method may be performed by an apparatus for performing the method, where the apparatus may be implemented by software and/or hardware, and the method includes:

S510, generating a comprehensive feature vector based on the acquired video and the current state of the network.

S520, processing the comprehensive feature vector based on a preset network state prediction model to obtain a network prediction state.

S530, processing the current state of the network and the predicted state of the network based on a preset generation countermeasure network to obtain network simulation environment information, wherein the network simulation environment information at least comprises video transmission delay time, video coding and decoding efficiency and video quality.

S540, processing the network simulation environment information based on a strategy optimization model to obtain an optimization strategy, and encoding, decoding and transmitting the video based on the optimization strategy.

S550, processing the current state of the network and the acquired user feedback information based on a preset loss function to obtain a loss result.

The video is encoded and decoded through an optimization strategy and transmitted to the user side, the user side feeds back video quality evaluation to the video processing system, and user feedback information, namely video quality evaluation fed back by the user side; the loss function is arranged in the video processing system and is used for calculating a loss result according to the current state N of the network and the acquired user feedback information, and the loss result is used for representing the quality of video encoding and decoding and transmission through the current optimization strategy.

Specifically, the current state N of the network and the acquired feedback information of the user are input into a preset loss function for processing to obtain a loss result.

S560, calculating updated model parameters based on the loss result, a preset learning rate and current model parameters of the strategy optimization model.

Wherein the updated model parameters are recorded asUpdated model parametersThe calculation formula is as follows:

It should be noted that the number of the substrates,For current model parameters of the policy optimization model, eta is a preset learning rate,J() Gradient for loss results.

Specifically, the gradient of the loss result is calculatedJ() Then calculateJ() Product of preset learning rateFurther, current model parameters of the strategy optimization model are calculatedWith the product of the aboveIs updated model parameters。

S570, updating the strategy optimization model based on the updated model parameters to obtain a target strategy optimization model, and taking the target strategy optimization model as a new strategy optimization model.

The updated model parameters are used for replacing original current model parameters in the policy optimization model, so that the policy optimization model is updated, and the target policy optimization model is the updated policy optimization model.

Specifically, updated model parametersUpdating to policy optimization model to replace original current model parametersThus, a target policy optimization model is obtained, and further, the target policy optimization model is used as a new policy optimization model in step S540, and a video encoding/decoding and transmission method shown in this embodiment is re-executed.

It should be noted that, the policy optimization model can be updated in real time through the obtained current state of the network and the feedback information of the user, so as to improve the self-adaptive capacity of the policy optimization model, so that the real optimal policy can be determined through the updated policy optimization model, and further improve the level of ensuring high-quality transmission of the video in the dynamic network environment.

It should be understood that, although the steps in the flowcharts related to the embodiments described above are sequentially shown as indicated by arrows, these steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily performed sequentially, but may be performed alternately or alternately with at least some of the other steps or stages.

Example six

Based on the same inventive concept, the embodiments of the present disclosure further provide a video encoding/decoding and transmitting apparatus for implementing the above-mentioned related video encoding/decoding and transmitting method. The implementation of the solution provided by the apparatus is similar to that described in the above method, so the specific limitation of the embodiment of the video codec and transmission apparatus provided below may be referred to the limitation of the video codec and transmission method hereinabove, and will not be repeated here.

In this embodiment, as shown in fig. 7, there is provided a video encoding/decoding and transmitting apparatus, including:

The modules in the video encoding and decoding and transmitting device can be implemented in whole or in part by software, hardware and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.

In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 8. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is for storing data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a video codec and transmission method.

Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of a portion of the architecture associated with the disclosed aspects and is not limiting of the computer device to which the disclosed aspects apply, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

In one embodiment, a computer readable storage medium is provided, as shown in fig. 9, having a computer program stored thereon, which when executed by a processor, implements the steps of the method embodiments described above.

In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.

It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) related to the present disclosure are information and data authorized by the user or sufficiently authorized by each party.

Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided by the present disclosure may include at least one of non-volatile and volatile memory, among others. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magneto-resistive random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (DHASE CHANGE Memory, DCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in various forms such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), etc. The databases referred to in the various embodiments provided by the present disclosure may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors involved in the embodiments provided by the present disclosure may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic, quantum computing-based data processing logic, etc., without limitation thereto.

The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.

The foregoing examples have expressed only a few embodiments of the present disclosure, which are described in more detail and detail, but are not to be construed as limiting the scope of the present disclosure. It should be noted that variations and modifications can be made by those skilled in the art without departing from the spirit of the disclosure, which are within the scope of the disclosure. Accordingly, the scope of the present disclosure should be determined from the following claims.

Claims

Translated fromChinese

1.一种视频编解码与传输方法，其特征在于，包括：1. A video encoding, decoding and transmission method, comprising:

基于获取的视频与网络当前状态生成综合特征向量；Generate a comprehensive feature vector based on the acquired video and the current state of the network;

基于预设的网络状态预测模型处理所述综合特征向量得到网络预测状态；Processing the comprehensive feature vector based on a preset network status prediction model to obtain a network prediction status;

基于预设的生成对抗网络处理所述网络当前状态与所述网络预测状态得到网络模拟环境信息，所述网络模拟环境信息至少包括视频传输延迟时间、视频编解码效率以及视频质量；Processing the current state of the network and the predicted state of the network based on a preset generative adversarial network to obtain network simulation environment information, wherein the network simulation environment information at least includes video transmission delay time, video encoding and decoding efficiency, and video quality;

基于策略优化模型处理所述网络模拟环境信息得到优化策略，基于所述优化策略对所述视频进行编解码与传输；Processing the network simulation environment information based on a policy optimization model to obtain an optimization strategy, and encoding, decoding and transmitting the video based on the optimization strategy;

其中，所述基于获取的视频与网络当前状态生成综合特征向量，包括：The step of generating a comprehensive feature vector based on the acquired video and the current state of the network includes:

对获取的视频进行特征提取得到视频特征；Perform feature extraction on the acquired video to obtain video features;

对获取的网络当前状态进行特征提取得到网络状态特征；Perform feature extraction on the acquired current network state to obtain network state features;

对所述视频特征与所述网络状态特征进行特征连接得到综合特征向量；Performing feature connection on the video feature and the network status feature to obtain a comprehensive feature vector;

其中，构建所述生成对抗网络的步骤，包括：The step of constructing the generative adversarial network includes:

基于预设的生成器网络参数构建生成器；Build a generator based on the preset generator network parameters;

基于判别器网络参数、预设的全连接层网络与激活函数构建判别器；Construct a discriminator based on the discriminator network parameters, the preset fully connected layer network and the activation function;

基于所述生成器与所述判别器构建生成对抗网络；Constructing a generative adversarial network based on the generator and the discriminator;

其中，所述基于策略优化模型处理所述网络模拟环境信息得到优化策略，包括：Wherein, the process of processing the network simulation environment information based on the policy optimization model to obtain the optimization strategy includes:

基于所述策略优化模型中的奖励函数处理所述视频传输延迟时间、所述视频编解码效率以及所述视频质量得到奖励结果；Processing the video transmission delay time, the video encoding and decoding efficiency, and the video quality based on the reward function in the strategy optimization model to obtain a reward result;

基于所述奖励结果与所述策略优化模型中的策略选择函数确定优化策略。An optimization strategy is determined based on the reward result and a strategy selection function in the strategy optimization model.

2.根据权利要求1所述的方法，其特征在于，所述基于所述奖励结果与所述策略优化模型中的策略选择函数确定优化策略，包括：2. The method according to claim 1, characterized in that the step of determining the optimization strategy based on the reward result and the strategy selection function in the strategy optimization model comprises:

基于所述奖励结果更新所述策略选择函数的函数参数得到目标策略选择函数；Based on the reward result, the function parameters of the strategy selection function are updated to obtain a target strategy selection function;

基于所述目标策略选择函数与所述奖励结果确定优化策略。An optimization strategy is determined based on the target strategy selection function and the reward result.

3.根据权利要求1所述的方法，其特征在于，还包括：3. The method according to claim 1, further comprising:

基于预设的损失函数处理网络当前状态与获取的用户反馈信息得到损失结果；Based on the preset loss function, the current state of the network and the obtained user feedback information are processed to obtain the loss result;

基于所述损失结果、预设的学习率以及所述策略优化模型的当前模型参数计算更新后模型参数；Calculating updated model parameters based on the loss result, a preset learning rate, and current model parameters of the strategy optimization model;

基于所述更新后模型参数更新所述策略优化模型得到目标策略优化模型，并将所述目标策略优化模型作为新的策略优化模型。The policy optimization model is updated based on the updated model parameters to obtain a target policy optimization model, and the target policy optimization model is used as a new policy optimization model.

4.一种视频编解码与传输装置，其特征在于，所述装置包括：4. A video encoding, decoding and transmission device, characterized in that the device comprises:

向量计算模块，用于基于获取的视频与网络当前状态生成综合特征向量；A vector calculation module, used to generate a comprehensive feature vector based on the acquired video and the current state of the network;

状态预测模块，用于基于预设的网络状态预测模型处理所述综合特征向量得到网络预测状态；A state prediction module, used for processing the comprehensive feature vector based on a preset network state prediction model to obtain a network prediction state;

信息生成模块，用于基于预设的生成对抗网络处理所述网络当前状态与所述网络预测状态得到网络模拟环境信息，所述网络模拟环境信息至少包括视频传输延迟时间、视频编解码效率以及视频质量；An information generation module, configured to obtain network simulation environment information by processing the current network state and the predicted network state based on a preset generative adversarial network, wherein the network simulation environment information at least includes video transmission delay time, video encoding and decoding efficiency, and video quality;

策略优化模块，用于基于策略优化模型处理所述网络模拟环境信息得到优化策略，基于所述优化策略对所述视频进行编解码与传输；A policy optimization module, used for processing the network simulation environment information based on a policy optimization model to obtain an optimization policy, and encoding, decoding and transmitting the video based on the optimization policy;

5.一种计算机设备，包括存储器和处理器，所述存储器存储有计算机程序，其特征在于，所述处理器执行所述计算机程序时实现权利要求1至3中任一项所述的方法的步骤。5. A computer device, comprising a memory and a processor, wherein the memory stores a computer program, wherein the processor implements the steps of the method according to any one of claims 1 to 3 when executing the computer program.

6.一种计算机可读存储介质，其上存储有计算机程序，其特征在于，所述计算机程序被处理器执行时实现权利要求1至3中任一项所述的方法的步骤。6. A computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 3 are implemented.

7.一种计算机程序产品，包括计算机程序，其特征在于，该计算机程序被处理器执行时实现权利要求1至3中任一项所述的方法的步骤。7. A computer program product, comprising a computer program, characterized in that when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 3 are implemented.