CN106937168B

Info

Publication number: CN106937168B
Application number: CN201511021812.5A
Authority: CN
Inventors: 焦华龙
Original assignee: Palmwin Information Technology Shanghai Co ltd
Current assignee: Xiao Feng
Priority date: 2015-12-30
Filing date: 2015-12-30
Publication date: 2020-05-12
Anticipated expiration: 2035-12-30
Also published as: CN106937168A

Abstract

Description

Video coding method, electronic equipment and system using long-term reference frame

Technical Field

The present invention relates to the field of video coding, and in particular, to a video coding method, an electronic device, and a system using a long-term reference frame.

Background

In video coding that combines long-term reference frames with layered (hierarchical) coding based on short-term reference frames, bit rate is wasted because layered coding often makes a video frame refer to a short-term reference frame that is not the nearest frame.

Disclosure of Invention

In order to solve the above problem, embodiments of the present invention provide a video encoding method, an electronic device, and a system.

According to a first aspect, an embodiment of the present invention provides a video encoding method using long-term reference frames, the method including:

acquiring a video frame;

caching the video frame and marking it as a long-term reference frame to be validated;

determining whether a valid long-term reference frame is closer to the video frame than a short-term reference frame corresponding to layered coding;

if so, encoding the video frame by using the effective long-term reference frame to generate encoded data;

if not, encoding the video frame by using the short-term reference frame corresponding to the layered coding to generate encoded data;

setting information for marking the video frame as a long-term reference frame in the coded data;

transmitting the encoded data to a decoding end;

receiving long-term reference frame feedback from the decoding end; and

marking the long-term reference frame to be validated at which the long-term reference frame feedback is directed as a valid long-term reference frame, and clearing the previous long-term reference frame.

With reference to the first aspect, in a first possible implementation manner, the determining whether the valid long-term reference frame is closer to the video frame than the short-term reference frame corresponding to the layered coding includes:

determining whether a residual between the valid long-term reference frame and the video frame is smaller than a residual between the video frame and the short-term reference frame corresponding to layered coding.

With reference to the first aspect, in a second possible implementation manner, the determining whether the valid long-term reference frame is closer to the video frame than the short-term reference frame corresponding to the layered coding includes:

determining whether the frame number of the valid long-term reference frame is closer to the frame number of the video frame than the frame number of the short-term reference frame corresponding to the layered coding.

With reference to the first aspect or either of the first and second possible implementation manners of the first aspect, in a third possible implementation manner, the layered coding includes 1-3 layer coding.

According to a second aspect, an embodiment of the present invention provides an electronic device, including:

the acquisition module is used for acquiring video frames;

the reference frame management module is used for caching the video frame and marking the video frame as a long-term reference frame to be validated;

a judging module for judging whether the effective long-term reference frame is closer to the video frame than the short-term reference frame corresponding to the layered coding;

the coding module is used for coding the video frame by using the valid long-term reference frame to generate coded data if the judging module judges yes;

the coding module is further used for coding the video frame by using the short-term reference frame corresponding to the layered coding to generate coded data if the judging module judges no;

a marking module, configured to set information for marking the video frame as a long-term reference frame in the encoded data;

the sending module is used for sending the coded data to other electronic equipment;

a receiving module for receiving long term reference frame feedback from the other electronic devices; and

the reference frame management module is further used for marking the long-term reference frame to be validated at which the long-term reference frame feedback is directed as a valid long-term reference frame and clearing the previous long-term reference frame.

With reference to the second aspect, in a first possible implementation manner, the determining module is specifically configured to:

With reference to the second aspect, in a second possible implementation manner, the determining module is specifically configured to:

With reference to the second aspect or any implementation manner of the second possible implementation manner of the second aspect, in a third possible implementation manner, the layered coding includes 1-3 layer coding.

According to a third aspect, an embodiment of the present invention provides a video coding and decoding system, where the video coding and decoding system includes a first electronic device and a second electronic device, which are configured to perform coding and decoding, respectively.

The first electronic device includes:

the acquisition module is used for acquiring video frames;

the first reference frame management module is used for caching the video frame and marking the video frame as a long-term reference frame to be validated;

a first determining module for determining whether a valid long-term reference frame is closer to the video frame than a short-term reference frame corresponding to a layered coding;

a sending module, configured to send the encoded data to the second electronic device;

a first receiving module for receiving long term reference frame feedback from the second electronic device; and

the first reference frame management module is further used for marking the long-term reference frame to be validated at which the long-term reference frame feedback is directed as a valid long-term reference frame and clearing the previous long-term reference frame;

the second electronic device includes:

a second receiving module, configured to receive the encoded data;

the decoding module is used for decoding the coded data to obtain a video frame;

a second judging module, configured to judge whether information indicating that the video frame is a long-term reference frame is set in the encoded data and whether the decoding is correct;

the second reference frame management module is used for adding the video frame into a reference frame cache and marking the video frame as a long-term reference frame if the second judgment module judges that the video frame is a long-term reference frame;

a feedback module, configured to send long-term reference frame feedback to the first electronic device after the second reference frame management module adds the video frame to the reference frame buffer and marks the video frame as a long-term reference frame.

With reference to the third aspect, in a first possible implementation manner, the first determining module is specifically configured to:

With reference to the third aspect, in a second possible implementation manner, the first determining module is specifically configured to:

With reference to the third aspect or any implementation manner of the second possible implementation manner of the third aspect, in a third possible implementation manner, the layered coding includes 1-3 layer coding.

According to a fourth aspect, there is provided an electronic device comprising a memory, a transmitting/receiving module, and a processor connected to the memory and the transmitting/receiving module, wherein the memory stores a set of program codes, and the processor calls the program codes stored in the memory to execute the following operations:

acquiring a video frame;

transmitting the encoded data to a decoding end;

receiving long-term reference frame feedback from the decoding end; and

With reference to the fourth aspect, in a first possible implementation manner, the processor calls the program code stored in the memory to perform the following operations:

With reference to the fourth aspect, in a second possible implementation manner, the processor calls the program code stored in the memory to perform the following operations:

With reference to the fourth aspect, in a third possible implementation manner, the layered coding includes 1-3 layer coding.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a video encoding method using long-term reference frames according to an embodiment of the present invention;

Fig. 2 is a flowchart of a video encoding method using long-term reference frames according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of a layered coding scheme according to an embodiment of the present invention;

Fig. 4 is a flowchart of a video encoding method using long-term reference frames according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a video encoding and decoding system according to an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention provides a video coding and decoding method using a long-term reference frame, which can be applied to scenarios such as instant video communication or video stream playback; the embodiment of the invention does not limit the scenario. Replacing the IDR frame with a successfully transmitted long-term reference frame compresses the data better, gives better image quality at the same bit rate, and avoids the packet loss and stalling caused by oversized IDR frame data. For example, when packet loss is severe, using a successfully transmitted long-term reference frame as the reference prevents a previous frame that was lost and cannot be decoded normally from affecting the decoding of subsequent frames. The embodiment of the invention can be applied to the H.264 protocol. However, one of ordinary skill in the art will appreciate that embodiments of the present invention may also be applied to other protocols. The application range of the embodiments of the present invention is not particularly limited.

Example one

An embodiment of the present invention provides a video encoding method using a long-term reference frame, and as shown in fig. 1, the method includes:

101. Acquiring a video frame.

Specifically, acquiring the video frame includes acquiring the video frame by a camera. Optionally, acquiring the video frame includes acquiring a video frame from another device or acquiring a stored video frame. The embodiment of the present invention is not limited thereto.

102. Caching the video frame and marking it as a long-term reference frame to be validated.

Specifically, the step includes adding the video frame to a long-term reference frame buffer area in a reference frame buffer and setting an indicator corresponding to the long-term reference frame to "to be validated".

103. Determining whether a valid long-term reference frame is closer to the video frame than a short-term reference frame corresponding to layered coding; if yes, step 104 is performed, and if no, step 105 is performed.

Optionally, the method includes:

Optionally, the layered coding comprises 1-3 layer coding.

104. Encoding the video frame using the valid long-term reference frame to generate encoded data.

105. Encoding the video frame using the short-term reference frame corresponding to the layered coding to generate encoded data.

106. Setting, in the encoded data, information indicating that the video frame is a long-term reference frame.

Specifically, the information indicating that the video frame is a long-term reference frame is 1-bit information in the encoded data, for example, binary 1.

107. Sending the encoded data to a decoding end.

108. Receiving long-term reference frame feedback from the decoding end.

Specifically, the long-term reference frame feedback from the decoding end includes a frame number of the long-term reference frame.

109. Marking the long-term reference frame to be validated at which the long-term reference frame feedback is directed as a valid long-term reference frame, and clearing the previous long-term reference frame.

Specifically, the marking of the long-term reference frame to be validated for which the long-term reference frame feedback is directed as the validated long-term reference frame includes:

acquiring a frame number included in the long-term reference frame feedback;

determining a long-term reference frame corresponding to the frame number in a reference frame buffer; and

marking the corresponding long-term reference frame as valid.

Specifically, clearing the previous long-term reference frame includes clearing all long-term reference frames, whether to be validated or already valid, that precede the long-term reference frame to be validated at which the long-term reference frame feedback is directed.
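The bookkeeping of steps 102, 108 and 109 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the class name `LongTermRefBuffer`, its methods, and the representation of a frame as `bytes` are all hypothetical, and frame numbers are assumed to increase monotonically without wrap-around.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LongTermRefBuffer:
    """Encoder-side long-term reference bookkeeping (hypothetical helper)."""

    # Frames marked "to be validated", keyed by frame number.
    pending: Dict[int, bytes] = field(default_factory=dict)
    # The single valid long-term reference frame, if any.
    valid_frame_number: Optional[int] = None
    valid_frame: Optional[bytes] = None

    def cache_pending(self, frame_number: int, frame: bytes) -> None:
        # Step 102: cache the frame and mark it as "to be validated".
        self.pending[frame_number] = frame

    def on_feedback(self, frame_number: int) -> bool:
        # Steps 108-109: the feedback carries the frame number of the
        # long-term reference frame that the decoding end has stored.
        frame = self.pending.get(frame_number)
        if frame is None:
            # Unknown or already cleared frame number: ignore the feedback.
            return False
        # Mark the acknowledged frame as valid; this replaces (clears)
        # the previously valid long-term reference frame.
        self.valid_frame_number = frame_number
        self.valid_frame = frame
        # Clear every pending frame up to and including the acknowledged one.
        self.pending = {n: f for n, f in self.pending.items() if n > frame_number}
        return True
```

With this layout, feedback that arrives late for an older frame is ignored, because that frame was already cleared when a newer frame was validated.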

Example two

An embodiment of the present invention provides a video encoding method using a long-term reference frame, and as shown in fig. 2, the method includes:

201. Acquiring a video frame.

202. Caching the video frame and marking it as a long-term reference frame to be validated.

Specifically, the step includes adding the video frame to a long-term reference frame buffer area in a reference frame buffer and setting an indicator corresponding to the long-term reference frame to "to be validated". Of course, the embodiment of the present invention does not limit the specific manner of buffering and marking the video frame as the long-term reference frame to be validated.

203. Determining whether a residual between a valid long-term reference frame and the video frame is smaller than a residual between the video frame and a short-term reference frame corresponding to layered coding; if yes, step 204 is performed, and if no, step 205 is performed.

The determining whether the residual between the valid long-term reference frame and the video frame is smaller than the residual between the video frame and the short-term reference frame corresponding to the layered coding may specifically include:

acquiring a short-term reference frame corresponding to a video frame according to a preset layered coding rule;

calculating a residual error between the video frame and the corresponding short-term reference frame, namely a first residual error;

acquiring an effective long-term reference frame;

calculating a residual between the video frame and the effective long-term reference frame, which is called a second residual;

comparing the first residual and the second residual;

if the second residual is less than the first residual, determining yes;

otherwise, no is determined.

Specifically, the obtaining of the short-term reference frame corresponding to the video frame according to the preset layered coding rule includes:

determining the frame number of a short-term reference frame corresponding to the video frame according to a preset layered coding rule; and

and acquiring the corresponding short-term reference frame from the short-term reference frame region in the reference frame buffer according to the frame number of the corresponding short-term reference frame.

It should be noted that "first" and "second" serve only to distinguish the two residuals and do not limit the order. The two operations, namely acquiring the short-term reference frame and calculating the first residual, and acquiring the long-term reference frame and calculating the second residual, may be performed in any order: the short-term operation first, the long-term operation first, or both in parallel. The embodiment of the present invention is not limited in this respect.

204. Encoding the video frame using the valid long-term reference frame to generate encoded data.

This step may include calculating a residual between the video frame and the effective long-term reference frame, transforming and quantizing the residual, and entropy encoding the transformed and quantized result to generate encoded data. Motion estimation and motion compensation may be included in the process of computing the residual between the video frame and the effective long-term reference frame. Of course, the embodiment of the present invention does not limit the specific process of encoding.

205. Encoding the video frame using the short-term reference frame corresponding to the layered coding to generate encoded data.

This step may include calculating a residual between the video frame and a short-term reference frame corresponding to the layered coding, transforming and quantizing the residual, and entropy coding the transformed and quantized result to generate coded data. Motion estimation and motion compensation may be included in the computation of the residual between a video frame and a short-term reference frame corresponding to a layered coding. Of course, the embodiment of the present invention does not limit the specific process of encoding.

206. Setting, in the encoded data, information indicating that the video frame is a long-term reference frame.

Specifically, the information indicating that the video frame is a long-term reference frame is 1-bit information in the encoded data, for example, binary 1. Of course, the embodiment of the present invention does not limit the specific manner of setting the information indicating that the video frame is the long-term reference frame in the encoded data.

207. Sending the encoded data to a decoding end.

The embodiment of the present invention does not limit the specific transmission process.

208. Receiving long-term reference frame feedback from the decoding end.

209. Marking the long-term reference frame to be validated at which the long-term reference frame feedback is directed as a valid long-term reference frame, and clearing the previous long-term reference frame.

acquiring a frame number included in the long-term reference frame feedback;

marking the corresponding long-term reference frame as valid.

Specifically, marking the corresponding long-term reference frame as valid includes marking an indicator corresponding to the corresponding long-term reference frame as valid.
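The residual comparison of step 203 can be sketched as follows. The text does not specify how the residual is measured, so this sketch assumes the sum of absolute differences (SAD) over co-located samples and omits motion estimation and compensation; the frame representation (rows of integer samples) and the function names `sad` and `use_long_term_reference` are likewise hypothetical.

```python
from typing import Sequence

# Assumed representation: a frame is a sequence of rows of integer samples.
Frame = Sequence[Sequence[int]]


def sad(a: Frame, b: Frame) -> int:
    """Sum of absolute differences between two equally sized frames."""
    return sum(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


def use_long_term_reference(
    current: Frame, short_term_ref: Frame, long_term_ref: Frame
) -> bool:
    """Return True when the valid long-term reference frame should be used."""
    # First residual: video frame vs. the short-term reference frame
    # selected by the layered coding rule.
    first_residual = sad(current, short_term_ref)
    # Second residual: video frame vs. the valid long-term reference frame.
    second_residual = sad(current, long_term_ref)
    # "Yes" only when the second residual is strictly smaller.
    return second_residual < first_residual
```

A tie resolves to "no", so the encoder keeps the short-term reference frame of the layered coding unless the long-term reference frame is strictly better.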

Example three

An embodiment of the present invention provides a video encoding method using a long-term reference frame, and as shown in fig. 4, the method includes:

401. Acquiring a video frame.

402. Caching the video frame and marking it as a long-term reference frame to be validated.

403. Judging whether the frame number of the valid long-term reference frame is closer to the frame number of the video frame than the frame number of the short-term reference frame corresponding to the layered coding; if yes, step 404 is performed, and if no, step 405 is performed.

Optionally, the layered coding comprises 1-3 layer coding. The scheme of layered coding can refer to Fig. 3 and the related description of step 203, and is not described herein again.

Judging whether the frame number of the valid long-term reference frame is closer to the frame number of the video frame than the frame number of the short-term reference frame corresponding to the layered coding specifically includes:

acquiring a frame number of a short-term reference frame corresponding to a video frame according to a preset layered coding rule, wherein the frame number is called a first frame number;

determining a frame number of the effective long-term reference frame, called a second frame number;

comparing the first frame number and the second frame number;

if the second frame number is less than the first frame number, determining yes;

otherwise, no is determined.

It should be noted that the "first" and the "second" are only for distinguishing purposes, but not for limiting the order, and the order of determining the first frame number and the second frame number may be arbitrary, for example, the first frame number may be determined first, then the second frame number may be determined, or the second frame number may be determined first, then the first frame number may be determined, or the first frame number and the second frame number may be determined in parallel, which is not limited in the embodiment of the present invention.

404. Encoding the video frame using the valid long-term reference frame to generate encoded data.

This step is the same as step 204 and is not described again.

405. Encoding the video frame using the short-term reference frame corresponding to the layered coding to generate encoded data.

This step is the same as step 205 and will not be described again.

406. Setting, in the encoded data, information indicating that the video frame is a long-term reference frame.

407. Sending the encoded data to a decoding end.

The embodiment of the present invention does not limit the specific form of transmitting the encoded data.

408. Receiving long-term reference frame feedback from the decoding end.

409. Marking the long-term reference frame to be validated at which the long-term reference frame feedback is directed as a valid long-term reference frame, and clearing the previous long-term reference frame.

acquiring a frame number included in the long-term reference frame feedback;

marking the corresponding long-term reference frame as valid.
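The frame-number judgment of step 403 can be sketched as follows. Two things here are assumptions rather than part of the text: the layered coding rule in `short_term_ref_frame_number` is a hypothetical three-layer pattern with a period of four frames (the actual rule of Fig. 3 is not reproduced in this text), and "closer" is interpreted as the smaller absolute distance between frame numbers.

```python
from typing import Optional


def short_term_ref_frame_number(frame_number: int) -> int:
    """Hypothetical 3-layer rule with a period of 4 frames (an assumption)."""
    if frame_number <= 0:
        raise ValueError("frame 0 has no earlier reference frame")
    if frame_number % 4 == 0:
        # Layer 0 references the previous layer-0 frame.
        return frame_number - 4
    if frame_number % 2 == 0:
        # Layer 1 references the preceding layer-0 frame.
        return frame_number - 2
    # Layer 2 references the immediately preceding frame.
    return frame_number - 1


def use_long_term_reference(
    frame_number: int, valid_ltr_frame_number: Optional[int]
) -> bool:
    """Return True when the valid long-term reference frame is closer."""
    if valid_ltr_frame_number is None:
        # No long-term reference frame has been validated yet.
        return False
    first = short_term_ref_frame_number(frame_number)  # first frame number
    second = valid_ltr_frame_number                    # second frame number
    return abs(frame_number - second) < abs(frame_number - first)
```

Under this assumed rule, frame 8 would normally reference frame 4; if feedback has already validated frame 6 as a long-term reference frame, frame 6 is closer and is used instead.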

Example four

An embodiment of the present invention provides an electronic device, and with reference to fig. 5, the electronic device includes:

an obtaining module 501, configured to obtain a video frame;

a reference frame management module 502, configured to cache the video frame and mark it as a long-term reference frame to be validated;

a determining module 503, configured to determine whether the valid long-term reference frame is closer to the video frame than the short-term reference frame corresponding to the layered coding;

an encoding module 504, configured to encode the video frame with the valid long-term reference frame to generate encoded data if the determining module determines yes;

the encoding module 504 is further configured to encode the video frame by using the short-term reference frame corresponding to the layered coding to generate encoded data if the determining module determines no;

a marking module 505, configured to set, in the encoded data, information that marks the video frame as a long-term reference frame;

a sending module 506, configured to send the encoded data to other electronic devices;

a receiving module 507, configured to receive long-term reference frame feedback from the other electronic devices; and

the reference frame management module 502 is further configured to mark the long-term reference frame to be validated at which the long-term reference frame feedback is directed as a valid long-term reference frame and clear the previous long-term reference frame.

Optionally, the determining module 503 is specifically configured to determine whether a residual between the valid long-term reference frame and the video frame is smaller than a residual between the video frame and a short-term reference frame corresponding to layered coding.

Specifically, the determining module 503 is configured to: acquiring a short-term reference frame corresponding to a video frame according to a preset layered coding rule;

calculating a residual between the video frame and the short-term reference frame corresponding to the layered coding, which is called a first residual;

acquiring an effective long-term reference frame;

comparing the first residual and the second residual;

if the second residual is less than the first residual, determining yes;

otherwise, no is determined.

Optionally, the determining module 503 is specifically configured to determine whether the frame number of the valid long-term reference frame is closer to the frame number of the video frame than the frame number of the short-term reference frame corresponding to layered coding.

Specifically, the determining module 503 is configured to:

comparing the first frame number and the second frame number;

otherwise, no is determined.

Optionally, the layered coding includes 1-3 layer coding.

The embodiment of the invention provides electronic equipment. Each video frame is cached and marked as a long-term reference frame to be validated. When the network condition is good and the decoding end quickly returns the long-term reference frame feedback, the fed-back long-term reference frame may be closer to the video frame than the short-term reference frame of the layered coding; in that case the video frame is encoded using the long-term reference frame, which uses the bit rate more efficiently and avoids the bit rate waste caused by layered coding. Comparing the residual between the video frame and the long-term reference frame with the residual between the video frame and the short-term reference frame corresponding to the layered coding determines more accurately which reference frame is closer to the video frame. Comparing the frame numbers of the long-term reference frame and of the short-term reference frame corresponding to the layered coding with the frame number of the video frame determines which reference frame is closer more simply and quickly, which improves efficiency. Because a long-term reference frame is marked as valid only after the long-term reference frame feedback from the decoding end is received, only fed-back long-term reference frames are used for encoding, which ensures that the decoding end can correctly decode video frames encoded with the long-term reference frame. Clearing the previous long-term reference frame after receiving the long-term reference frame feedback from the decoding end prevents the reference frame buffer from becoming full.

Example five

An embodiment of the present invention provides a video coding and decoding system, and referring to Fig. 6, the video coding and decoding system includes a first electronic device 61 and a second electronic device 62, which are configured to perform coding and decoding, respectively.

The first electronic device 61 includes:

an obtaining module 611, configured to obtain a video frame;

a first reference frame management module 612, configured to cache the video frame and mark it as a long-term reference frame to be validated;

a first determining module 613, configured to determine whether the valid long-term reference frame is closer to the video frame than the short-term reference frame corresponding to the layered coding;

an encoding module 614, configured to encode the video frame with the valid long-term reference frame to generate encoded data if the first determining module determines yes;

the encoding module 614 is further configured to encode the video frame by using the short-term reference frame corresponding to the layered coding to generate encoded data if the first determining module determines no;

a marking module 615, configured to set, in the encoded data, information that marks the video frame as a long-term reference frame;

a sending module 616, configured to send the encoded data to the second electronic device;

a first receiving module 617, configured to receive long-term reference frame feedback from the second electronic device; and

the first reference frame management module 612 is further configured to mark the long-term reference frame to be validated at which the long-term reference frame feedback is directed as a valid long-term reference frame and clear the previous long-term reference frame;

the second electronic device 62 includes:

a second receiving module 621, configured to receive the encoded data;

a decoding module 622, configured to decode the encoded data to obtain a video frame;

a second determining module 623, configured to determine whether information indicating that the video frame is a long-term reference frame is set in the encoded data and whether the decoding is correct;

a second reference frame management module 624, configured to add the video frame to a reference frame buffer and mark the video frame as a long-term reference frame if the second determining module determines yes;

a feedback module 625, configured to send long-term reference frame feedback to the first electronic device after the second reference frame management module adds the video frame to the reference frame buffer and marks the video frame as a long-term reference frame.

Optionally, the first determining module 613 is specifically configured to:

Specifically, the first determining module 613 is configured to:

acquiring an effective long-term reference frame;

comparing the first residual and the second residual;

if the second residual is less than the first residual, determining yes;

otherwise, no is determined.

Optionally, the first determining module 613 is specifically configured to:

Specifically, the first determining module 613 is configured to:

comparing the first frame number and the second frame number;

otherwise, no is determined.

Optionally, the layered coding includes 1-3 layer coding.
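The behaviour of the second electronic device 62 (modules 621 to 625) can be sketched as follows. The packet layout `EncodedPacket`, the `Receiver` class, and the injected `decode` and `send_feedback` callables are hypothetical names introduced for illustration; the text does not define a packet format or a transport for the feedback.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class EncodedPacket:
    frame_number: int
    ltr_flag: bool   # the 1-bit "long-term reference frame" information
    payload: bytes   # encoded data for one video frame


@dataclass
class Receiver:
    # Decoder stand-in: returns the decoded frame, or None if decoding fails.
    decode: Callable[[bytes], Optional[bytes]]
    # Sends long-term reference frame feedback (a frame number) to the encoder.
    send_feedback: Callable[[int], None]
    # Reference frame buffer area for long-term reference frames.
    long_term_refs: Dict[int, bytes] = field(default_factory=dict)

    def on_packet(self, packet: EncodedPacket) -> Optional[bytes]:
        frame = self.decode(packet.payload)
        # Second determining module: the flag is set AND decoding is correct.
        if frame is not None and packet.ltr_flag:
            # Second reference frame management module: store and mark.
            self.long_term_refs[packet.frame_number] = frame
            # Feedback module: report the stored frame number to the encoder.
            self.send_feedback(packet.frame_number)
        return frame
```

Feedback is sent only after the frame has been stored, so a frame number that reaches the first electronic device always refers to a frame the decoder can use as a reference.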

Example six

An embodiment of the present invention provides an electronic device, and referring to Fig. 7, the electronic device includes a memory 701, a sending/receiving module 702, and a processor 703 connected to the memory 701 and the sending/receiving module 702. The memory 701 stores a set of program codes, and the processor 703 calls the program codes stored in the memory 701 to perform the following operations:

acquiring a video frame;

transmitting the encoded data to a decoding end;

receiving long-term reference frame feedback from the decoding end; and

Optionally, the processor 703 calls the program code stored in the memory 701 to perform the following operations:

Optionally, the layered coding includes 1-3 layer coding.

All the above-mentioned optional technical solutions can be combined arbitrarily to form the optional embodiments of the present invention, and are not described herein again.

It should be noted that: in the above embodiment, when the device executes the video coding method using the long-term reference frame, only the division of the above functional modules is illustrated, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. In addition, the apparatus provided in the foregoing embodiment and the video encoding method using the long-term reference frame belong to the same concept, and specific implementation processes thereof are described in the method embodiment and are not described herein again.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A method for video coding using long-term reference frames, the method comprising:

acquiring a video frame;

determining whether a residual between a valid long-term reference frame and the video frame is smaller than a residual between the video frame and a short-term reference frame corresponding to layered coding;

transmitting the encoded data to a decoding end;

receiving long-term reference frame feedback from the decoding end; and

2. The method of claim 1, wherein the layered coding comprises 1-3 layer coding.

3. An electronic device, characterized in that the electronic device comprises:

the acquisition module is used for acquiring video frames;

a determining module for determining whether a residual between the valid long-term reference frame and the video frame is smaller than a residual between the video frame and a short-term reference frame corresponding to the layered coding;

4. A video coding and decoding system is characterized in that the video coding and decoding system comprises a first electronic device and a second electronic device, wherein

The first electronic device includes:

the acquisition module is used for acquiring video frames;

a first determining module, configured to determine whether a residual between a valid long-term reference frame and the video frame is smaller than a residual between the video frame and a short-term reference frame corresponding to layered coding;

the second electronic device includes:

a second receiving module, configured to receive the encoded data;