This application is a divisional of invention patent application No. 200480002031.5, filed on January 7, 2004, and entitled "Method and Apparatus for Improved Coding Mode Selection".
This patent application claims priority to U.S. Provisional Patent Application Serial No. 60/439062, entitled "Method and Apparatus for Improved Coding Mode Selection", filed on January 8, 2003.
Detailed Description
A method and system for improved coding mode selection is disclosed. In the following description, specific nomenclature is set forth for purposes of explanation, to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that these specific details are not required in order to practice the present invention.
The emerging H.264 video coding standard, also known as MPEG-4 Part 10, Joint Video Team (JVT), Advanced Video Coding (AVC), and H.26L, was developed jointly by the Motion Picture Experts Group (MPEG) and the International Telecommunication Union (ITU) to provide higher compression of moving pictures than state-of-the-art video coding systems that are compliant with existing MPEG standards. The target applications of H.264, which is expected to become an international standard in 2003, include (but are not limited to) video conferencing, digital storage media, television broadcasting, Internet streaming, and communication.
Like other video coding standards (in their main body or annexes), the H.264 standard employs a rate-distortion (RD) decision framework. In particular, the H.264 standard employs rate-distortion optimization for coding mode selection and motion estimation. In this disclosure, the primary focus is coding mode selection within the framework of the H.264 standard.
In most video coding systems, each video frame of a video sequence is divided into subsets of pixels called pixelblocks. In the H.264 standard, pixelblocks may have various sizes (a pixelblock of 16×16 pixels is commonly referred to as a macroblock). The coding mode selection problem may be informally defined as "select the best of all possible coding methods (or coding modes) to encode each pixelblock in the video frame." A video encoder can solve the coding mode selection problem in many different ways. One possible method is to employ rate-distortion optimization.
There are several different coding modes that may be selected to encode each pixelblock within the framework of the H.264 video coding standard. Mode 0 is known as "direct mode" in B frames and as "skip mode" in P frames. The other coding modes employ pixelblocks of 16×16, 16×8, and 8×16 pixels, and of 8×8, 8×4, 4×8, and 4×4 pixels, in B frames or P frames.
In direct mode (mode 0 in B pictures), no motion information is transmitted to the decoder. Instead, a prediction system is used to generate the motion information. Direct mode can therefore provide significant bit rate savings for sequences in which neighboring spatial or temporal information allows good motion vector prediction. However, experimental evaluation has shown that direct mode selection in H.264 does not yield as many selected pixelblocks as desired for some video sequences.
This disclosure proposes a method for enhancing direct mode (mode 0) selection in bidirectionally predicted pictures (known as B pictures or B frames) within the framework of the H.264 standard. When applied to P frames, the coding method of the present invention achieves an enhancement of skip mode (also mode 0) selection. The enhancement of direct mode and skip mode is achieved by clustering Lagrangian values, removing outliers, and specifying smaller values of the Lagrangian multiplier in the rate-distortion optimization for coding mode selection.
Experimental results using high-quality sample video sequences show that the bit rate of the compressed bitstream obtained with the present invention is lower than that of the compressed bitstream obtained with the H.264 codec. The bit rate reduction is accompanied by a slight loss in the peak signal-to-noise ratio (PSNR) of the bitstream. However, two experiments confirm that no subjective visual loss is associated with the change in PSNR. More importantly, when other possible solutions, such as further increasing the value of the quantization parameter, are not applicable because they introduce unacceptable artifacts into the decoded pictures, the method of the present invention achieves a further significant bit rate reduction without introducing visible distortion in the decoded video sequence. Moreover, although the present invention is described within the H.264 framework, its coding method is applicable to any video coding system that employs rate-distortion optimization.
The remainder of this document is organized as follows. The Video Compression Overview section first describes the basic concepts related to the rate-distortion optimization framework in the H.264 standard. The Proposed Direct Mode Enhancement Method section describes in detail the coding method proposed by the present invention. Finally, a set of experimental results and conclusions are provided in the Experimental Results section and the Conclusion section, respectively.
Video Compression Overview
As set forth earlier in this document, each video frame is divided into a set of pixelblocks in the H.264 standard. These pixelblocks may be encoded using motion-compensated predictive coding. A predicted pixelblock may be an intra (I) pixelblock (I-pixelblock) that uses no information from preceding pictures in its coding, a unidirectionally predicted (P) pixelblock (P-pixelblock) that uses information from one preceding picture, or a bidirectionally predicted (B) pixelblock (B-pixelblock) that uses information from one preceding picture and one future picture.
For each P-pixelblock in a P picture, one motion vector is computed. (Note that, within each video picture, pixelblocks may be encoded in many ways. For example, a pixelblock may be divided into smaller subblocks, with a motion vector computed and transmitted for each subblock. The shape of the subblocks may vary and need not be square.) Using the computed motion vector, a predicted pixelblock can be formed by translating pixels from the aforementioned preceding picture. The difference between the actual pixelblock in the video picture and the predicted pixelblock is then coded for transmission. (This difference is used to correct minor differences between the predicted pixelblock and the actual pixelblock.)
Each motion vector may also be transmitted via predictive coding. That is, a prediction for a motion vector is formed using nearby motion vectors that have already been transmitted, and the difference between the actual motion vector and the predicted motion vector is then coded for transmission.
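The predictive coding of a motion vector described above can be sketched in Python as follows. This is a minimal illustration, not the codec's actual procedure: the text only says that nearby, already transmitted motion vectors are used, so the component-wise median predictor, the function names, and the sample vectors are all assumptions.

```python
from statistics import median

def predict_mv(neighbor_mvs):
    """Predict a motion vector from nearby, already transmitted vectors.
    The component-wise median is an assumed predictor for illustration."""
    xs = [mv[0] for mv in neighbor_mvs]
    ys = [mv[1] for mv in neighbor_mvs]
    return (median(xs), median(ys))

def mv_residual(actual_mv, neighbor_mvs):
    """Encoder side: difference between the actual and the predicted vector."""
    px, py = predict_mv(neighbor_mvs)
    return (actual_mv[0] - px, actual_mv[1] - py)

def reconstruct_mv(residual, neighbor_mvs):
    """Decoder side: the same prediction plus the transmitted difference."""
    px, py = predict_mv(neighbor_mvs)
    return (residual[0] + px, residual[1] + py)

neighbors = [(4, -2), (6, 0), (5, -1)]      # hypothetical neighboring vectors
residual = mv_residual((5, 0), neighbors)   # only this difference is coded
recovered = reconstruct_mv(residual, neighbors)
```

Because encoder and decoder form the same prediction from the same neighbors, transmitting only the (usually small) difference is enough to recover the actual vector.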
For each B-pixelblock, two motion vectors are typically computed: one for the aforementioned preceding picture and one for the future picture. (Note that within a P picture or B picture, some pixelblocks may be better coded without motion compensation. Such pixelblocks may be coded as intra-pixelblocks. Within a B picture, some pixelblocks may be better coded using forward or backward unidirectional motion compensation. Such pixelblocks may be coded as forward predicted or backward predicted, depending on whether a preceding picture or a future picture is used in the prediction.) Two predicted pixelblocks are computed from the two B-pixelblock motion vectors. The two predicted pixelblocks are then combined to form a final predicted pixelblock. As above, the difference between the actual pixelblock in the video picture and the predicted pixelblock is then coded for transmission.
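A small Python sketch may help picture how the two predictions of a B-pixelblock are combined and how the transmitted difference is formed. The text does not say how the two predicted pixelblocks are combined, so the rounded average used here is an assumption, as are the function names and the 2×2 sample blocks.

```python
def bipredict(pred_prev, pred_next):
    """Combine two predicted pixelblocks into a final prediction.
    A rounded average is assumed here for illustration."""
    return [[(a + b + 1) // 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(pred_prev, pred_next)]

def block_difference(actual, predicted):
    """Per-pixel difference that would subsequently be coded for transmission."""
    return [[a - p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(actual, predicted)]

pred_prev = [[100, 102], [104, 106]]   # prediction from the preceding picture
pred_next = [[104, 102], [100, 110]]   # prediction from the future picture
actual = [[103, 101], [102, 110]]

final_pred = bipredict(pred_prev, pred_next)
diff = block_difference(actual, final_pred)
```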
As with P-pixelblocks, each motion vector of a B-pixelblock may be transmitted via predictive coding. That is, a predicted motion vector is formed using nearby motion vectors that have already been transmitted. The difference between the actual motion vector and the predicted motion vector is then coded for transmission.
However, with B-pixelblocks the opportunity also exists for interpolating motion vectors from the motion vectors of the collocated or nearby pixelblocks of stored pictures. (When the motion vector prediction is constructed using the motion vector of the block collocated with the current pixelblock, the direct mode type is called temporal direct mode. When the motion vector prediction is constructed using spatial neighbors of the current pixelblock, the direct mode type is called spatial direct mode.) The interpolated value may then be used as a predicted motion vector, and the difference between the actual motion vector and the predicted motion vector is coded for transmission. Such interpolation is carried out in both the encoder and the decoder. (Note that an encoder always contains a decoder, so the encoder knows exactly how a reconstructed video picture will appear.)
In some cases, the interpolated motion vector is good enough to be used without any difference correction, in which case no motion vector data needs to be transmitted. This is referred to as direct mode in the H.264 (and H.263) standards. Direct mode selection is particularly effective when the recording camera is slowly panning across a stationary background. In fact, the motion vector interpolation may be good enough to be used as is, which means that no differential information needs to be transmitted for these B-pixelblock motion vectors. In skip mode (mode 0 in P pictures), the motion vector prediction is constructed in the same way as in 16×16 direct mode, so that no motion vector bits are transmitted.
Prior to transmission, the prediction error (difference) of a pixelblock or subblock is typically transformed, quantized, and entropy coded to reduce the number of bits. In direct mode, the prediction error, computed as the mean squared error between the original pixelblock and the decoded predicted pixelblock obtained after direct mode coding, is encoded. In skip mode, however, the prediction error is neither encoded nor transmitted. The size and shape of the subblocks used for the transform may differ from the size and shape of the subblocks used for motion compensation. For example, 8×8 pixels or 4×4 pixels are commonly used for transforms, whereas 16×16 pixels, 16×8 pixels, 8×16 pixels, or smaller sizes are commonly used for motion compensation. The motion compensation and transform subblock sizes and shapes may vary from pixelblock to pixelblock.
The selection of the best coding mode for encoding each pixelblock is one of the decisions in the H.264 standard that has a very direct impact on the bit rate R of the compressed bitstream and on the distortion D in the decoded video sequence. The goal of coding mode selection is to select the coding mode $M^{*}$ that minimizes the distortion D(p) subject to the bit rate constraint $R(p) \le R_{\max}$, where p is a vector of adjustable coding parameters and $R_{\max}$ is the maximum allowed bit rate. This constrained optimization problem can be converted into an unconstrained optimization problem using the Lagrangian equation J(p, λ) given by:
$$J(p,\lambda) = D(p) + \lambda R(p) \tag{1}$$
where λ is the Lagrangian multiplier, which controls the rate-distortion trade-off. The coding mode decision problem thus becomes the minimization of J(p, λ). This may be expressed by the following equation:
$$M^{*} = \arg\min_{M} J(p,\lambda) \tag{2}$$
The Lagrangian equation above may be evaluated by performing the following steps for each admissible coding mode:
(a) Compute the distortion D as the $L_2$ norm of the error between the original pixelblock and the reconstructed pixelblock obtained after encoding and decoding with the specific coding mode;
(b) Compute the bit rate R as the total number of bits necessary to encode the motion vectors and the transform coefficients;
(c) Compute the Lagrangian value J using equation (1).
Finally, after the Lagrangian value J has been computed for all coding modes, the minimum Lagrangian value J obtained identifies the coding mode $M^{*}$ that solves the minimization expressed by equation (2).
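Steps (a) to (c) and the final minimum search can be sketched in Python as follows. This is only an illustrative outline under stated assumptions: the distortion and bit counts per mode are taken as already measured, and the function names, the candidate values, and the multiplier value 4.0 are hypothetical.

```python
def lagrangian_cost(distortion, rate_bits, lam):
    """Equation (1): J = D + lambda * R."""
    return distortion + lam * rate_bits

def select_mode_baseline(candidates, lam):
    """candidates maps each coding mode to its measured (distortion, rate_bits).
    Returns the mode with the minimum Lagrangian value and all computed values."""
    costs = {mode: lagrangian_cost(d, r, lam)
             for mode, (d, r) in candidates.items()}
    best_mode = min(costs, key=costs.get)
    return best_mode, costs

# Hypothetical measurements for three coding modes of one pixelblock.
candidates = {0: (520.0, 2), 1: (400.0, 30), 2: (380.0, 44)}
best_mode, costs = select_mode_baseline(candidates, lam=4.0)
```

In this made-up example mode 0 spends almost no bits but loses narrowly to mode 1, which is the kind of near tie that the enhancement described later in the document is concerned with.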
Note that, in the H.264 video compression standard, the coding mode decision is made for 8×8 and smaller pixelblocks before the coding mode of larger pixelblocks is decided. Note also that, in an effort to reduce the complexity of the optimization process, the minimization is performed with a fixed quantizer value Q, and the Lagrangian multiplier is often selected to be equal to (for example) $0.85 \times Q/2$ or $0.85 \times 2^{Q/3}$, where Q is the quantization parameter. For multiple B pictures, larger values are often selected. Of course, this reduction in complexity also limits the search for the minimum value of the Lagrangian J in the rate-distortion plane.
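As a rough illustration of how the two example multipliers grow with Q, the following Python sketch evaluates both. The second expression is read here as 0.85 × 2^(Q/3), which is an assumption about the intended notation, and the function names are hypothetical.

```python
def lambda_linear(q):
    """Example multiplier 0.85 * Q / 2 (grows linearly with Q)."""
    return 0.85 * q / 2

def lambda_exponential(q):
    """Example multiplier 0.85 * 2^(Q/3), an assumed reading of the text
    (doubles every time Q increases by 3)."""
    return 0.85 * 2 ** (q / 3)

for q in (12, 24, 36):
    print(q, lambda_linear(q), lambda_exponential(q))
```

The faster a multiplier grows with Q, the more heavily the rate term weighs in equation (1) at coarse quantization.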
Proposed Direct Mode Enhancement Method
The system of the present invention proposes a method for enhancing direct mode selection in B frames and skip mode selection in P frames. The system employs clustering of cost values, outlier reduction, and specification of the Lagrangian multiplier. In one embodiment, the system performs the method in four steps. The following text describes these method steps in detail with reference to FIG. 3.
First, the current pixelblock is encoded and decoded with each possible coding mode M, and the distortion $D_M$ is computed, as set forth in steps 310 and 320. In one embodiment, the distortion $D_M$ is computed as the sum of the Huber function values of the errors between the pixels of the original pixelblock and the pixels of the decoded pixelblock. The Huber function, illustrated in FIG. 1, is given by the following equation:
$$H_{\beta}(x) = \begin{cases} x^{2}, & |x| \le \beta \\ 2\beta|x| - \beta^{2}, & |x| > \beta \end{cases}$$
where x is the error for one pixel of the pixelblock and β is a parameter. Evidently, for error values smaller than β, the value of the Huber function equals the value given by the squared error. For error values greater than β, the value of the Huber function is smaller than the squared error for the same error value.
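The distortion computation of this first step can be sketched in Python as follows. The sketch assumes the standard Huber form that is quadratic up to β and linear beyond it, which matches the two properties stated above; the function names, the sample pixel values, and the choice β = 4 are illustrative assumptions only.

```python
def huber(x, beta):
    """Huber function: squared error up to beta, linear growth beyond it
    (standard form, assumed here)."""
    ax = abs(x)
    if ax <= beta:
        return ax * ax
    return 2 * beta * ax - beta * beta

def huber_distortion(original, decoded, beta):
    """Sum of Huber values of the per-pixel errors of one pixelblock."""
    return sum(huber(o - d, beta)
               for row_o, row_d in zip(original, decoded)
               for o, d in zip(row_o, row_d))

original = [[10, 12], [14, 200]]   # hypothetical 2x2 block with one outlier
decoded = [[11, 12], [13, 180]]
d_huber = huber_distortion(original, decoded, beta=4)
d_squared = sum((o - d) ** 2
                for row_o, row_d in zip(original, decoded)
                for o, d in zip(row_o, row_d))
```

In this example the single large error of 20 contributes 144 to the Huber sum instead of 400 to the squared-error sum, so it dominates the distortion far less.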
Second, the bit rate R of each coding mode is computed, as set forth in step 330. In one embodiment, the system computes the bit rate R as the total number of bits necessary to encode the motion vectors and the transform coefficients of the pixelblock.
Third, as set forth in step 340, the system computes the Lagrangian value of each coding mode using equation (1). In one embodiment, the system selects a value of the Lagrangian multiplier λ that, as a function of the quantization parameter, varies more slowly than the original Lagrangian multiplier λ suggested in the non-normative part of version 4.1 of the H.264 standard. The proposed variation of the Lagrangian multiplier λ as a function of the quantizer Q is illustrated in FIGS. 2A, 2B, and 2C. By making the Lagrangian multiplier λ vary more slowly than the lambda in the reference implementation, the system places less emphasis on the bit rate component R of the Lagrangian equation (1) and therefore more emphasis on the distortion component D. As a result of this change to the Lagrangian multiplier λ, small increases in the bit rate R have less influence on the resulting Lagrangian value J. (This also reduces the influence of the bit rate R on the Lagrangian clustering described in the following paragraph.)
Fourth, let $J^{*}$ be the minimum of all $J_M$ (computed using equation (1)), where M is one of the possible coding modes. Instead of selecting the coding mode $M^{*}$ that yields $J^{*}$, the system clusters the computed Lagrangian values $J_M$ as follows. Let S be the set of coding modes K whose computed Lagrangian values $J_K$ deviate from $J^{*}$ by no more than a tolerance set by ε, where epsilon ("ε") is a selected error value and $J^{*}$ is the minimum J over all modes. If coding mode 0 is an element of the set S, the system selects coding mode 0 as the coding mode to be used to encode the pixelblock; otherwise, the system selects the coding mode $M^{*}$ corresponding to $J^{*}$ (that is, the coding mode that yields the minimum J value).
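The clustering step can be sketched in Python as follows. The exact closeness condition is not spelled out in this text, so the sketch assumes a relative test, (J_K − J*) ≤ ε·J*; an absolute test would be structured the same way. The function name, the cost values, and the ε values are hypothetical.

```python
def select_mode_clustered(costs, eps):
    """costs maps each coding mode to its Lagrangian value J.
    Mode 0 is chosen whenever it falls in the cluster around the minimum;
    the relative closeness test is an assumption."""
    j_star = min(costs.values())
    best_mode = min(costs, key=costs.get)
    cluster = {mode for mode, j in costs.items()
               if (j - j_star) <= eps * j_star}
    return 0 if 0 in cluster else best_mode

# Hypothetical Lagrangian values: mode 1 is the strict minimum,
# mode 0 (direct or skip) is only slightly more expensive.
costs = {0: 528.0, 1: 520.0, 2: 556.0}
loose = select_mode_clustered(costs, eps=0.02)   # mode 0 is inside the cluster
tight = select_mode_clustered(costs, eps=0.01)   # mode 0 is outside the cluster
```

With the looser tolerance the near tie is resolved in favor of mode 0; with the tighter one the strict minimum is kept, which shows how ε controls how strongly mode 0 is favored.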
The steps above employ components that are novel in comparison with the reference (non-normative) H.264 encoder. Specifically, the present invention computes the distortion using the Huber cost function, uses a modified Lagrangian multiplier, and clusters the Lagrangian values.
The Huber cost function belongs to the class of robust M-estimators. An important property of these functions is their ability to reduce the influence of outliers. More specifically, if any outliers are present in a pixelblock, the Huber cost function weights them less (linearly) than the mean squared error function does (quadratically), which in turn makes it more likely that the coding mode selected for the pixelblock is the same as that of neighboring macroblocks.
The modified Lagrangian multiplier λ varies more slowly as a function of the quantization parameter Q, and therefore places more emphasis on the distortion component D of the Lagrangian value J than on the bit rate component R. (In this document, "lambda" or "λ" denotes the Lagrangian multiplier used in the coding mode decision process. The multiplier used in the motion vector selection process is different.)
Finally, the clustering of Lagrangian values described above favors coding mode 0. Consequently, the system of the present invention allows more pixelblocks to be encoded using direct mode or skip mode for B-pixelblocks and P-pixelblocks, respectively.
Experimental Results
The video test set used in the experiments consists of nine color video clips from the movies "Discovering Egypt", "Gone with the Wind", and "The English Patient". The specific characteristics of these video sequences are set forth in Table 1.
Table 1: Test sequences
(The abbreviations ch and og stand for chapter and opposing glances, respectively.)
| Sequence number | Video sequence name | Frame size | Number of frames | Type |
| 1 | Discovering Egypt, ch.1 | 704×464 | 58 | Pan |
| 2 | Gone with the Wind, ch.11 | 720×480 | 44 | og |
| 3 | Discovering Egypt, ch.1 | 704×464 | 630 | Pan |
| 4 | Discovering Egypt, ch.2 | 704×464 | 148 | Zoom |
| 5 | Discovering Egypt, ch.3 | 704×464 | 196 | Boom |
| 6 | Discovering Egypt, ch.6 | 704×464 | 298 | Pan |
| 7 | The English Patient, ch.2 | 720×352 | 97 | Texture |
| 8 | The English Patient, ch.6 | 720×352 | 196 | og |
| 9 | The English Patient, ch.8 | 720×352 | 151 | og |
The video frames are represented in YUV format, and the video frame rate is 23.976 frames per second (fps) for all video sequences. The effectiveness of the method proposed by the present invention is evaluated using the bit rate R of the compressed video sequences and the visual quality of the decoded video sequences. The latter is evaluated by visual inspection and by the peak signal-to-noise ratio (PSNR) values of the video sequences.
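For reference, PSNR is conventionally computed from the mean squared error as 10·log10(peak² / MSE). The Python sketch below applies that standard definition to a small block; the peak value of 255 assumes 8-bit samples, which the text does not state, and the function name and sample values are hypothetical.

```python
import math

def psnr(original, decoded, peak=255):
    """PSNR in dB between two equally sized blocks given as lists of rows.
    peak=255 assumes 8-bit samples."""
    squared_error = 0
    count = 0
    for row_o, row_d in zip(original, decoded):
        for o, d in zip(row_o, row_d):
            squared_error += (o - d) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical blocks have unbounded PSNR
    return 10 * math.log10(peak * peak / mse)

original = [[100, 100], [100, 100]]
decoded = [[101, 99], [100, 102]]
print(round(psnr(original, decoded), 2))
```

Because the measure is logarithmic, a loss of a few tenths of a dB corresponds to only a small relative increase in mean squared error.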
The novel components of the coding method of the present invention, described in the Proposed Direct Mode Enhancement Method section, complement one another in their effect on rate and distortion. The method of the present invention yields an overall bit rate reduction and a slight PSNR reduction. The system of the present invention is evaluated using two experiments, described in the following sections.
Fixed Quantization Parameter for All Sequences
In the first experiment, the same quantization parameter is selected for all video sequences, equal to Q, Q+1, and Q+3 for I frames, P frames, and B frames, respectively. As set forth in Table 2, the bit rate reduction achieved with the coding method of the present invention can reach 9%, with a PSNR loss of about 0.12 dB. Compared with the sequences encoded using the reference method, no visible distortion is present in the video sequences encoded using the coding method of the present invention.
Table 2: Bit rate (BR) [kbps] and peak signal-to-noise ratio (PSNR) [dB] of the video sequences encoded with the reference method and with the proposed method, using the same quantization parameter Q for all sequences
Highest Quantization Parameter for Each Sequence
To further evaluate the effectiveness of the coding method of the present invention, a second experiment was designed and carried out. When both the bit rate R and the PSNR values decrease, a common argument is that the same result could be produced by various other methods, such as pre-filtering the video sequence or increasing the value of the quantizer Q. The purpose of this experiment is to show that the method of the present invention can further reduce the bit rate when those methods cannot be applied any further without unacceptably degrading the video quality.
First, for each test video sequence, the bit rate is reduced as much as possible with the reference method by increasing the value of the quantization parameter until distortion becomes visible, which occurs at $Q_{\max}+1$. Next, the system encodes and decodes the video sequence using $Q_{\max}$ (the largest value at which distortion is still not visible) and the reference method, yielding the bit rates and PSNR values included in Table 3. The value of $Q_{\max}$ differs from sequence to sequence, and it also differs for I frames, P frames, and B frames. Taking this as the maximum bit rate reduction achievable without visual loss, the sequences are then encoded with the coding method of the present invention at the same $Q_{\max}$ values.
Table 3: Bit rate (BR) [kbps] and peak signal-to-noise ratio (PSNR) [dB] of the movie sequences encoded with the reference method and with the proposed method, using the highest quantization parameter
As set forth in Table 3, the method of the present invention can significantly reduce the bit rate further, by 13.3%, with a PSNR loss of about 0.29 dB. Visual inspection of the sequences at full frame rate (in order to evaluate any B-frame related artifacts) confirms that the bit rate reduction does not introduce visible artifacts in the decoded video sequences. Note that, when the method of the present invention is used, the value of the quantization parameter may be increased beyond $Q_{\max}$ to obtain a further bit rate reduction without visual loss.
Conclusion
The present invention provides a method for enhancing direct mode in B pictures and skip mode in P pictures within the framework of the H.264 (MPEG-4 Part 10) video compression standard. The system of the present invention computes distortion using the Huber cost function, modifies the Lagrangian multiplier, and clusters the Lagrangian values to select the coding mode used to encode a pixelblock. Experiments have shown that, with the method of the present invention, a significant bit rate reduction can be obtained with a slight PSNR loss and without degradation of subjective visual quality. In addition, these characteristics make the method of the present invention particularly useful for bit rate reduction in any video coding system that employs a rate-distortion optimization framework for coding mode decisions, when other solutions such as further increasing the value of the quantization parameter are no longer applicable.
The foregoing has described a method and apparatus for performing digital image enhancement. Those of ordinary skill in the art may make changes and modifications to the materials and arrangements of elements of the present invention without departing from the scope of the invention.