CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Application Ser. No. 60/818,874, filed Jul. 6, 2006, and U.S. Provisional Application Ser. No. 60/807,706, filed Jul. 18, 2006, which are incorporated by reference herein in their respective entireties. Further, this application is related to the non-provisional application, Attorney Docket No. PU060136, entitled “Method and Apparatus for Decoupling Frame Number and/or Picture Order Count (POC) for Multi-view Video Encoding and Decoding”, which is commonly assigned, incorporated by reference herein, and concurrently filed herewith.
TECHNICAL FIELD
The present principles relate generally to video encoding and decoding and, more particularly, to a method and apparatus for decoupling frame number and/or Picture Order Count (POC) for multi-view video encoding and decoding.
BACKGROUND
In the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 recommendation (hereinafter the “MPEG-4 AVC standard”), the syntax element frame_num is used as an identifier for pictures and has several constraints as defined in the MPEG-4 AVC standard. The primary purpose of frame_num is to act as a counter that increments each time a picture is decoded so that, if data is lost, the decoder can detect that pictures are missing and conceal the problem. frame_num increases in decoding order of access units and does not necessarily indicate display order. The Memory Management Control Operations (MMCO) use the value of frame_num to mark pictures as long-term and short-term references, or to mark reference pictures as unused for reference. frame_num is also used for the default reference list ordering for P and SP slices.
The Picture Order Count in the MPEG-4 AVC standard is an indication of the timing or output ordering of a particular picture. Picture order count is a variable having a value that is non-decreasing with increasing picture position in output order relative to the previous Instantaneous Decoding Refresh (IDR) picture in decoding order or relative to the previous picture containing the memory management control operation that marks all reference pictures as “unused for reference”. Picture Order Count is derived from slice header syntax elements. Picture Order Count is used in the derivation of motion vectors in temporal DIRECT mode, implicit weighted prediction, and default initial reference picture list ordering for B slices.
In particular, DIRECT mode motion parameters using temporal correlation are typically derived for the current macroblock/block by considering the motion information at the co-located position in a subsequent reference picture or, more precisely, the first List 1 reference. Turning to FIG. 1, a diagram illustrating temporal DIRECT prediction in B slice coding is indicated generally by the reference numeral 100. Following the presumption that an object is moving with constant speed, these parameters are scaled according to the temporal distances (as shown in FIG. 1) of the reference pictures involved. The motion vectors MV_L0 and MV_L1 for a DIRECT coded block are calculated from the motion vector MV of its co-located position in the first List 1 reference as follows:
X = (16384 + abs(TD_D / 2)) / TD_D   (1)
ScaleFactor = clip(−1024, 1023, (TD_B × X + 32) >> 6)   (2)
MV_L0 = (ScaleFactor × MV + 128) >> 8   (3)
MV_L1 = MV_L0 − MV   (4)
In the preceding equations, TD_B and TD_D are the temporal distances, or more precisely the Picture Order Count (POC) distances, of the reference picture used by the List 0 motion vector of the co-located block in the List 1 picture compared to the current picture and the List 1 picture, respectively. The List 1 reference picture and the List 0 reference referred to by the motion vectors of the co-located block in List 1 are used as the two references of DIRECT mode. If the reference index refIdxL0 refers to a long-term reference picture, or DiffPicOrderCnt(pic1, pic0) is equal to 0, the motion vectors MV_L0 and MV_L1 for the direct mode partition are derived by the following:
MV_L0 = mv of the co-located macroblock
MV_L1 = 0
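As a non-normative illustration only, the derivation in Equations (1) through (4), together with the long-term/zero-distance fallback just described, can be sketched in C++ as follows. The MotionVector structure, the function name, and the assumption that TD_B and TD_D are already available as clipped POC distances are illustrative choices for this sketch, not part of the MPEG-4 AVC standard text.

#include <algorithm>
#include <cstdlib>

struct MotionVector { int x; int y; };

// Temporal DIRECT mode motion vector derivation following Equations (1)-(4).
// mvCol            : List 0 motion vector of the co-located block in the first List 1 reference
// tdB, tdD         : POC distances TD_B and TD_D, already clipped to [-128, 127]
// colRefIsLongTerm : true if refIdxL0 of the co-located block refers to a long-term reference
static void deriveTemporalDirect(const MotionVector& mvCol, int tdB, int tdD,
                                 bool colRefIsLongTerm,
                                 MotionVector& mvL0, MotionVector& mvL1)
{
    if (colRefIsLongTerm || tdD == 0) {   // special case described above
        mvL0 = mvCol;                     // MV_L0 = mv of the co-located macroblock
        mvL1 = MotionVector{0, 0};        // MV_L1 = 0
        return;
    }
    const int x     = (16384 + std::abs(tdD / 2)) / tdD;             // Eq. (1)
    const int scale = std::clamp((tdB * x + 32) >> 6, -1024, 1023);  // Eq. (2)
    mvL0.x = (scale * mvCol.x + 128) >> 8;                           // Eq. (3)
    mvL0.y = (scale * mvCol.y + 128) >> 8;
    mvL1.x = mvL0.x - mvCol.x;                                       // Eq. (4)
    mvL1.y = mvL0.y - mvCol.y;
}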
The implicit weighted prediction tool also uses Picture Order Count information to determine the weights. In weighted prediction (WP) implicit mode, weighting factors are not explicitly transmitted in the slice header, but instead are derived based on relative distances between the current picture and the reference pictures. Implicit mode is used only for bi-predictively coded macroblocks and macroblock partitions in B slices, including those using DIRECT mode. For implicit mode, the formula shown in Equation (5) is used, except that the offset values o0 and o1 are equal to zero, and the weighting factors W_0 and W_1 are derived using the formulas in Equation (6) to Equation (10) below.
predPartC[x, y] = Clip1_C(((predPartL0C[x, y] * w0 + predPartL1C[x, y] * w1 + 2^logWD) >> (logWD + 1)) + ((o0 + o1 + 1) >> 1))   (5)
X = (16384 + (TD_D >> 1)) / TD_D   (6)
Z = clip3(−1024, 1023, (TD_B · X + 32) >> 6)   (7)
W_1 = Z >> 2   (8)
W_0 = 64 − W_1   (9)
This is a division-free, 16-bit-safe implementation of the following:
W_1 = (64 · TD_B) / TD_D   (10)
DiffPicOrderCnt(picA, picB) = PicOrderCnt(picA) − PicOrderCnt(picB)   (11)
where TD_D is the temporal difference between the List 1 reference picture and the List 0 reference picture, clipped to the range [−128, 127], and TD_B is the temporal difference between the current picture and the List 0 reference picture, clipped to the range [−128, 127]. In Multi-view Video Coding, there can be cases where TD_D evaluates to zero (this happens when DiffPicOrderCnt(pic1, pic0) in Equation (11) becomes zero). In such a case, the weights W_0 and W_1 are both set to 32.
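In the same illustrative spirit, the implicit weight derivation of Equations (6) through (9), including the TD_D equal to zero fallback just noted for Multi-view Video Coding, might be sketched as below; the function name and the assumption that the POC differences are already clipped to [−128, 127] are not taken from the standard.

#include <algorithm>

// Implicit weighted prediction weights following Equations (6)-(9); offsets o0 and o1 are zero.
// tdB : POC(current picture) - POC(List 0 reference), clipped to [-128, 127]
// tdD : POC(List 1 reference) - POC(List 0 reference), clipped to [-128, 127]
static void deriveImplicitWeights(int tdB, int tdD, int& w0, int& w1)
{
    if (tdD == 0) {   // possible in Multi-view Video Coding, as noted above
        w0 = 32;
        w1 = 32;
        return;
    }
    const int x = (16384 + (tdD >> 1)) / tdD;                    // Eq. (6)
    const int z = std::clamp((tdB * x + 32) >> 6, -1024, 1023);  // Eq. (7)
    w1 = z >> 2;                                                 // Eq. (8)
    w0 = 64 - w1;                                                // Eq. (9)
}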
In the current MPEG-4 AVC compliant implementation of Multi-view Video Coding (MVC), the reference software achieves multi-view prediction by interleaving all video sequences into a single stream. In this way, frame_num and Picture Order Count between views are coupled together. This has several disadvantages. One disadvantage is that partial decoding leaves gaps in the value of frame_num. This may complicate the management of reference picture lists or make loss detection based on frame_num gaps impossible. Another disadvantage is that Picture Order Count no longer has a real physical meaning, which can break any coding tool that relies upon Picture Order Count information, such as temporal DIRECT mode or implicit weighted prediction. Yet another disadvantage is that the coupling makes parallel coding of multi-view sequences more difficult.
SUMMARY
These and other drawbacks and disadvantages of the prior art are addressed by the present principles, which are directed to a method and apparatus for decoupling frame number and Picture Order Count (POC) for multi-view video encoding and decoding.
According to an aspect of the present principles, there is provided an apparatus. The apparatus includes a decoder for decoding at least one picture corresponding to at least one of at least two views of multi-view video content from a bitstream. In the bitstream at least one of coding order information and output order information for the at least one picture is decoupled from the at least one view to which the at least one picture corresponds.
According to another aspect of the present principles, there is provided a method. The method includes decoding at least one picture corresponding to at least one of at least two views of multi-view video content from a bitstream. In the bitstream at least one of coding order information and output order information for the at least one picture is decoupled from the at least one view to which the at least one picture corresponds.
According to yet another aspect of the present principles, there is provided an apparatus. The apparatus includes a decoder for decoding at least one of at least two views corresponding to multi-view video content. The decoder decodes the at least one of the at least two views using redefined variables in a default reference picture list construction process and reference picture list reordering corresponding to the International Organization for Standardization/International Electrotechnical Commission Moving Picture Experts Group-4 Part 10 Advanced Video Coding standard/International Telecommunication Union, Telecommunication Sector H.264 recommendation.
According to still yet another aspect of the present principles, there is provided an apparatus. The apparatus includes a decoder (250) for decoding at least one of at least two views corresponding to multi-view video content. The decoder decodes the at least one of the at least two views using redefined variables in a decoded reference picture marking process of the International Organization for Standardization/International Electrotechnical Commission Moving Picture Experts Group-4 Part 10 Advanced Video Coding standard/International Telecommunication Union, Telecommunication Sector H.264 recommendation.
According to a further aspect of the present principles, there is provided a method. The method includes decoding at least one of at least two views corresponding to multi-view video content. The decoding step decodes the at least one of the at least two views using redefined variables in a default reference picture list construction process and reference picture list reordering corresponding to the International Organization for Standardization/International Electrotechnical Commission Moving Picture Experts Group-4 Part 10 Advanced Video Coding standard/International Telecommunication Union, Telecommunication Sector H.264 recommendation.
According to a still further aspect of the present principles, there is provided a method. The method includes decoding at least one of at least two views corresponding to multi-view video content. The decoding step decodes the at least one of the at least two views using redefined variables in a decoded reference picture marking process of the International Organization for Standardization/International Electrotechnical Commission Moving Picture Experts Group-4 Part 10 Advanced Video Coding standard/International Telecommunication Union, Telecommunication Sector H.264 recommendation.
These and other aspects, features and advantages of the present principles will become apparent from the following detailed description of exemplary embodiments, which is to be read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
The present principles may be better understood in accordance with the following exemplary figures, in which:
FIG. 1 is a diagram illustrating temporal DIRECT prediction in B slice coding;
FIG. 2A is a block diagram for an exemplary Multi-view Video Coding (MVC) encoder to which the present principles may be applied, in accordance with an embodiment of the present principles;
FIG. 2B is a block diagram for an exemplary Multi-view Video Coding (MVC) decoder to which the present principles may be applied, in accordance with an embodiment of the present principles;
FIG. 3 is a flow diagram for an exemplary method for encoding multi-view video content using modified decoded reference picture marking, in accordance with an embodiment of the present principles;
FIG. 4 is a flow diagram for an exemplary method for decoding multi-view video content using modified decoded reference picture marking, in accordance with an embodiment of the present principles;
FIG. 5 is a flow diagram for an exemplary method for encoding multi-view video content using modified reference picture list construction, in accordance with an embodiment of the present principles;
FIG. 6 is a flow diagram for another exemplary method for encoding multi-view video content using modified reference picture list construction, in accordance with an embodiment of the present principles;
FIG. 7 is a flow diagram for yet another exemplary method for encoding multi-view video content using modified reference picture list construction, in accordance with an embodiment of the present principles;
FIG. 8 is a flow diagram for an exemplary method for decoding multi-view video content using modified reference picture list construction, in accordance with an embodiment of the present principles;
FIG. 9 is a flow diagram for another exemplary method for decoding multi-view video content using modified reference picture list construction, in accordance with an embodiment of the present principles;
FIG. 10 is a flow diagram for yet another exemplary method for decoding multi-view video content using modified reference picture list construction, in accordance with an embodiment of the present principles;
FIG. 11 is a flow diagram for an exemplary method for encoding multi-view video content using temporal DIRECT mode and implicit weighted prediction, in accordance with an embodiment of the present principles;
FIG. 12 is a flow diagram for another exemplary method for encoding multi-view video content using temporal DIRECT mode and implicit weighted prediction, in accordance with an embodiment of the present principles;
FIG. 13 is a flow diagram for an exemplary method for decoding multi-view video content using modified decoded reference picture marking, in accordance with an embodiment of the present principles;
FIG. 14 is a flow diagram for another exemplary method for decoding multi-view video content using modified decoded reference picture marking, in accordance with an embodiment of the present principles;
FIG. 15 is a flow diagram for an exemplary method for encoding multi-view video content using modified decoded reference picture marking, in accordance with an embodiment of the present principles;
FIG. 16 is a flow diagram for an exemplary method for decoding multi-view video content using modified decoded reference picture marking, in accordance with an embodiment of the present principles;
FIG. 17 is a flow diagram for an exemplary method for encoding multi-view video content using modified reference picture list construction and frame number calculation, in accordance with an embodiment of the present principles;
FIG. 18 is a flow diagram for another exemplary method for encoding multi-view video content using modified reference picture list construction and frame number calculation, in accordance with an embodiment of the present principles;
FIG. 19 is a flow diagram for an exemplary method for decoding multi-view video content using modified reference picture list construction and frame number calculation, in accordance with an embodiment of the present principles;
FIG. 20 is a flow diagram for another exemplary method for decoding multi-view video content using modified reference picture list construction and frame number calculation, in accordance with an embodiment of the present principles;
FIG. 21 is a flow diagram for an exemplary method for encoding multi-view video content using modified reference picture list initialization with Reference Picture List Reordering (RPLR) commands, in accordance with an embodiment of the present principles;
FIG. 22 is a flow diagram for another exemplary method for encoding multi-view video content using modified reference picture list initialization with Reference Picture List Reordering (RPLR) commands, in accordance with an embodiment of the present principles;
FIG. 23 is a flow diagram for an exemplary method for decoding multi-view video content using modified reference picture list construction with Reference Picture List Reordering (RPLR) commands, in accordance with an embodiment of the present principles; and
FIG. 24 is a flow diagram for another exemplary method for decoding multi-view video content using modified reference picture list construction with Reference Picture List Reordering (RPLR) commands, in accordance with an embodiment of the present principles.
DETAILED DESCRIPTION
The present principles are directed to a method and apparatus for decoupling frame number and Picture Order Count (POC) for multi-view video encoding and decoding.
The present description illustrates the present principles. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the present principles and are included within its spirit and scope.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the present principles and the concepts contributed by the inventor(s) to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.
Moreover, all statements herein reciting principles, aspects, and embodiments of the present principles, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the present principles. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (“DSP”) hardware, read-only memory (“ROM”) for storing software, random access memory (“RAM”), and non-volatile storage.
Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.
In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The present principles as defined by such claims reside in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any means that can provide those functionalities are equivalent to those shown herein.
Reference in the specification to “one embodiment” or “an embodiment” of the present principles means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment” appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
As used herein, “high level syntax” refers to syntax present in the bitstream that resides hierarchically above the macroblock layer. For example, high level syntax, as used herein, may refer to, but is not limited to, syntax at the slice header level, Supplemental Enhancement Information (SEI) level, picture parameter set level, sequence parameter set level and NAL unit header level.
Further, as used herein, “previously unused syntax” refers to syntax that does not yet exist in any currently known video coding standards and recommendations and extensions thereof including, but not limited to, the MPEG-4 AVC standard.
Also, as used herein, “coding order information” refers to information present in a video bitstream that indicates the order in which the pictures in the bitstream are coded and/or decoded. Coding order information may include, for example, frame_num.
Additionally, as used herein, “output order information” refers to information present in a video bitstream that indicates the order in which the pictures in the bitstream are output. Output order information may include, for example, a Picture Order Count (POC) value.
Moreover, it is to be appreciated that while the present principles are described herein with respect to the MPEG-4 AVC standard, the present principles are not limited to solely this standard and, thus, may be utilized with respect to other video coding standards and extensions thereof, including extensions of the MPEG-4 AVC standard, while maintaining the spirit of the present principles.
Further, as interchangeably used herein, “cross-view” and “inter-view” both refer to pictures that belong to a view other than a current view.
Turning to FIG. 2A, an exemplary Multi-view Video Coding (MVC) encoder is indicated generally by the reference numeral 100. The encoder 100 includes a combiner 105 having an output connected in signal communication with an input of a transformer 110. An output of the transformer 110 is connected in signal communication with an input of a quantizer 115. An output of the quantizer 115 is connected in signal communication with an input of an entropy coder 120 and an input of an inverse quantizer 125. An output of the inverse quantizer 125 is connected in signal communication with an input of an inverse transformer 130. An output of the inverse transformer 130 is connected in signal communication with a first non-inverting input of a combiner 135. An output of the combiner 135 is connected in signal communication with an input of an intra predictor 145 and an input of a deblocking filter 150. An output of the deblocking filter 150 is connected in signal communication with an input of a reference picture store 155 (for view i). An output of the reference picture store 155 is connected in signal communication with a first input of a motion compensator 175 and a first input of a motion estimator 180. An output of the motion estimator 180 is connected in signal communication with a second input of the motion compensator 175.
An output of a reference picture store 160 (for other views) is connected in signal communication with a first input of a disparity estimator 170 and a first input of a disparity compensator 165. An output of the disparity estimator 170 is connected in signal communication with a second input of the disparity compensator 165.
An output of the entropy coder 120 is available as an output of the encoder 100. A non-inverting input of the combiner 105 is available as an input of the encoder 100, and is connected in signal communication with a second input of the disparity estimator 170 and a second input of the motion estimator 180. An output of a switch 185 is connected in signal communication with a second non-inverting input of the combiner 135 and with an inverting input of the combiner 105. The switch 185 includes a first input connected in signal communication with an output of the motion compensator 175, a second input connected in signal communication with an output of the disparity compensator 165, and a third input connected in signal communication with an output of the intra predictor 145.
Turning to FIG. 2B, an exemplary Multi-view Video Coding (MVC) decoder is indicated generally by the reference numeral 3200. The decoder 3200 includes an entropy decoder 3205 having an output connected in signal communication with an input of an inverse quantizer 3210. An output of the inverse quantizer is connected in signal communication with an input of an inverse transformer 3215. An output of the inverse transformer 3215 is connected in signal communication with a first non-inverting input of a combiner 3220. An output of the combiner 3220 is connected in signal communication with an input of a deblocking filter 3225 and an input of an intra predictor 3230. An output of the deblocking filter 3225 is connected in signal communication with an input of a reference picture store 3240 (for view i). An output of the reference picture store 3240 is connected in signal communication with a first input of a motion compensator 3235.
An output of a reference picture store 3245 (for other views) is connected in signal communication with a first input of a disparity compensator 3250.
An input of the entropy decoder 3205 is available as an input to the decoder 3200, for receiving a residue bitstream. Moreover, a control input of the switch 3255 is also available as an input to the decoder 3200, for receiving control syntax to control which input is selected by the switch 3255. Further, a second input of the motion compensator 3235 is available as an input of the decoder 3200, for receiving motion vectors. Also, a second input of the disparity compensator 3250 is available as an input to the decoder 3200, for receiving disparity vectors.
An output of a switch 3255 is connected in signal communication with a second non-inverting input of the combiner 3220. A first input of the switch 3255 is connected in signal communication with an output of the disparity compensator 3250. A second input of the switch 3255 is connected in signal communication with an output of the motion compensator 3235. A third input of the switch 3255 is connected in signal communication with an output of the intra predictor 3230. An output of the mode module 3260 is connected in signal communication with the switch 3255 for controlling which input is selected by the switch 3255. An output of the deblocking filter 3225 is available as an output of the decoder.
In accordance with the present principles, several changes are proposed to the high level syntax of the MPEG-4 AVC standard for efficient coding of a multi-view video sequence. In an embodiment, it is proposed to decouple the frame number (frame_num) and/or Picture Order Count (POC) values between views when coding a multi-view video sequence. One possible application is to apply the MPEG-4 AVC compliant decoding and output process to each view independently. In an embodiment, the frame number and/or Picture Order Count values between views are decoupled by sending a view_id for each of the views. Previously, it has been proposed simply to add a view identifier (view_id) in the high level syntax, since view_id information is needed for several Multi-view Video Coding (MVC) requirements including view interpolation/synthesis, view random access, parallel processing, and so forth. The view_id information can also be useful for special coding modes that only relate to cross-view prediction. It is this view_id that is used in accordance with the present principles to decouple the frame number and Picture Order Count values between the views of multi-view video content. Moreover, in an embodiment, a solution is proposed for fixing the coding tools in the MPEG-4 AVC standard with respect to Multi-view Video Coding.
In an embodiment, each view will have a different view_id, thus allowing the same frame_num and POC to be reused for different views.
| S0 | I0 | I8 | B4 | B2 | B6 | B1 | B3 | B5 | B7 | (View 0 slice types) |
| S1 | B0 | B8 | B4 | B2 | B6 | B1 | B3 | B5 | B7 | (View 1 slice types) |
| S2 | P0 | P8 | B4 | B2 | B6 | B1 | B3 | B5 | B7 | (View 2 slice types) |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | (frame_num) |
There are several ways in which the different views can be coded to enable parallel processing. One possible way is to code all the pictures of one view for a GOP first, followed by the pictures of another view for the same GOP, until all the views have been coded for that GOP. The process is then repeated for the other GOPs. In the illustration above, the pictures in view S0 are coded first, followed by the pictures from view S2 and then view S1.
Another possible way is to code all the pictures in all the views belonging to the same time instance first, followed by the pictures belonging to the next time instance in all the views. This process is repeated until all the pictures have been coded. In the illustration above, all the pictures in views S0, S1, and S2 at time instance T0 are coded first, followed by those at T8, T4, and so forth. The present principles are agnostic to the order in which the pictures are encoded.
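Purely as an illustration of these two orders, the encoder loop nests might look as follows; encodePicture(), the view order, and the coding-order lists are hypothetical placeholders rather than an interface of any reference software.

#include <vector>

// Hypothetical per-picture encoding call; stands in for an actual encoder invocation.
void encodePicture(int viewId, int pictureIndex);

// View-first order: all pictures of a GOP in one view, then the same GOP in the next
// view (e.g. view S0, then S2, then S1 above), repeated for every GOP.
void encodeViewFirst(const std::vector<int>& viewOrder, int numGops,
                     const std::vector<int>& gopCodingOrder)
{
    const int gopLength = static_cast<int>(gopCodingOrder.size());
    for (int gop = 0; gop < numGops; ++gop)
        for (int view : viewOrder)
            for (int pic : gopCodingOrder)                // coding order within the GOP
                encodePicture(view, gop * gopLength + pic);
}

// Time-first order: every view at one time instance, then the next time instance in
// coding order (e.g. all views at T0, then at T8, then at T4, and so on).
void encodeTimeFirst(const std::vector<int>& viewOrder,
                     const std::vector<int>& timeCodingOrder)
{
    for (int t : timeCodingOrder)
        for (int view : viewOrder)
            encodePicture(view, t);
}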
Hereinafter, we will discuss changes to the MPEG-4 AVC standard in accordance with various embodiments of the present principles. We will also show how one or more of the changes can enable parallel coding of multi-view sequences. However, it is to be appreciated that while the present principles are primarily described herein with respect to the MPEG-4 AVC standard, the present principles may be implemented with respect to extensions of the MPEG-4 AVC standard as well as other video coding standards and recommendations and extensions thereof, as readily determined by one of ordinary skill in this and related arts given the teachings of the present principles provided herein, while maintaining the scope of the present principles.
Decoded Reference Picture Marking Process
In the current MPEG-4 AVC standard, it is not permitted to have multiple pictures with the same frame_num in the decoded picture buffer (DPB). However, in accordance with an embodiment of the present principles, this restriction may be relaxed in Multi-view Video Coding (MVC), since we decouple the frame_num and/or Picture Order Count, i.e., we propose that each view have its own independent frame_num and/or Picture Order Count values. In order to allow this, in an embodiment, we associate view_id with the decoded pictures. This introduces another dimension for each picture. Thus, in an embodiment, the decoded reference picture marking process is redefined to include the view_id.
There are two methods by which the MPEG-4 AVC standard allows decoded reference picture marking. The first method for decoded reference picture marking in the MPEG-4 AVC standard involves sliding window decoded reference picture marking. The second method for decoded reference picture marking in the MPEG-4 AVC standard involves adaptive memory control decoded reference picture marking.
In accordance with various embodiments of the present principles, one or more of these methods are altered to take into account the new view_id that is present in the slice header. Table 1 illustrates the slice header syntax in accordance with an embodiment of the present principles.
TABLE 1

slice_header( ) {                                          C    Descriptor
    first_mb_in_slice                                      2    ue(v)
    slice_type                                             2    ue(v)
    pic_parameter_set_id                                   2    ue(v)
    if( nal_unit_type == 22 || nal_unit_type == 23 ) {
        view_parameter_set_id                              2    ue(v)
        view_id                                            2    ue(v)
    }
    frame_num                                              2    u(v)
    if( !frame_mbs_only_flag ) {
        field_pic_flag                                     2    u(1)
        if( field_pic_flag )
            bottom_field_flag                              2    u(1)
    }
    ........
}
For the first method for decoded reference picture marking in the MPEG-4 AVC standard, a default behavior should be specified when there are pictures with the same frame_num/POC value but with different view_id values. One embodiment of such default behavior, in accordance with the present principles, is to apply MMCO commands only to those pictures with the same view_id as the current decoded picture.
For the second method for decoded reference picture marking in the MPEG-4 AVC standard, various embodiments in accordance with the present principles are provided in which we introduce new Memory Management Control Operations (MMCO) commands and/or modify the existing MMCO commands in the MPEG-4 AVC standard to take into consideration the view_id of the picture that needs to be marked. One embodiment of redefining the existing MMCO command (when memory_management_control_operation is equal to 1) involves the following:
Let picNumX and viewIdX be specified by the following:
picNumX=CurrPicNum−(difference_of_pic_nums_minus1+1).
viewIdX=CurrViewId−(difference_of_view_ids_minus1+1).
where picNumX, CurrPicNum, and difference_of_pic_nums_minus1 are as defined in the current MPEG-4 AVC standard, viewIdX is the viewId of the picture that is to be marked using the MMCO command, CurrViewId is the viewId of the current decoded picture, and difference_of_view_ids_minus1 is the difference between the current viewId and the viewId of the picture that is to be marked using the MMCO command.
Additionally, for the default behavior of the sliding window decoded reference picture marking process, only pictures with the same view_id as the current picture are considered for marking as “unused for reference”.
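A minimal sketch of how the redefined MMCO command (memory_management_control_operation equal to 1) could locate the picture to be marked, using both differences, is given below; the DpbEntry structure and the function name are assumptions standing in for whatever decoded picture buffer representation an implementation uses.

#include <vector>

struct DpbEntry {
    int  picNum;             // picture number, per view
    int  viewId;             // view to which the picture belongs
    bool usedForReference;
};

// Marks a short-term reference picture as "unused for reference" using both the
// picture-number difference and the view-id difference carried by the redefined command.
void markUnusedForReference(std::vector<DpbEntry>& dpb,
                            int currPicNum, int currViewId,
                            int differenceOfPicNumsMinus1,
                            int differenceOfViewIdsMinus1)
{
    const int picNumX = currPicNum - (differenceOfPicNumsMinus1 + 1);
    const int viewIdX = currViewId - (differenceOfViewIdsMinus1 + 1);
    for (DpbEntry& entry : dpb) {
        if (entry.picNum == picNumX && entry.viewId == viewIdX) {
            entry.usedForReference = false;   // only the picture matching both values is marked
            break;
        }
    }
}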
Turning to FIG. 3, an exemplary method for encoding multi-view video content using modified decoded reference picture marking, which uses view-first coding, is indicated generally by the reference numeral 300.
The method 300 includes a start block 305 that passes control to a function block 310. The function block 310 reads the encoder configuration file, and passes control to a function block 315. The function block 315 lets the number of views be N, with variables i (view number index) and j (picture number index) both being set equal to zero, and passes control to a decision block 320. The decision block 320 determines whether or not i is less than N. If so, then control is passed to a decision block 325. Otherwise, control is passed to an end block 399.
The decision block 325 determines whether or not j is less than the number of pictures in view i. If so, then control is passed to a function block 330. Otherwise, control is passed to a function block 350.
The function block 330 encodes picture j in view i, increments j, and passes control to a decision block 335. The decision block 335 determines whether or not a Memory Management Control Operations (MMCO) command is associated with the current picture. If so, then control is passed to a function block 340. Otherwise, control is passed to a function block 355.
The function block 340 calculates difference_of_pic_nums_minus1 and difference_of_view_ids_minus1 to determine the picture and view_id of the reference picture to be marked as “unused for reference”, and passes control to a function block 345. The function block 345 inserts the current picture in the decoded picture buffer (DPB), and passes control to a function block 360. The function block 360 changes frame_num and the Picture Order Count (POC) for the current view_id, and returns control to the decision block 325.
The function block 350 increments i, resets frame_num and Picture Order Count (POC), and returns control to the decision block 320.
The function block 355 selects the picture with a view_id equal to the view_id of the current picture to be marked as “unused for reference” for use by the MPEG-4 AVC process for sliding window reference picture marking, and passes control to the function block 345.
Turning to FIG. 4, an exemplary method for decoding multi-view video content using modified decoded reference picture marking is indicated generally by the reference numeral 400.
The method 400 includes a start block 405 that passes control to a function block 410. The function block 410 parses the bitstream, view_id, frame_num, and Picture Order Count (POC), and passes control to a function block 415. The function block 415 decodes the current picture, and passes control to a decision block 420. The decision block 420 determines whether or not a Memory Management Control Operations (MMCO) command is present. If so, then control is passed to a function block 425. Otherwise, control is passed to a function block 440.
The function block 425 parses difference_of_pic_nums_minus1 and difference_of_view_ids_minus1 to determine the picture and view_id of the reference picture to be marked as “unused for reference”, and passes control to a function block 430. The function block 430 inserts the current picture in the decoded picture buffer (DPB), and passes control to a decision block 435. The decision block 435 determines whether or not all pictures have been decoded. If so, then control is passed to an end block 499. Otherwise, control is returned to the function block 410.
The function block 440 selects the picture with the view_id equal to the view_id of the current picture to be marked as “unused for reference” for use with the MPEG-4 AVC process for sliding window decoded reference picture marking, and passes control to the function block 430.
Turning to FIG. 15, an exemplary method for encoding multi-view video content using modified decoded reference picture marking is indicated generally by the reference numeral 1500.
The method 1500 includes a start block 1505 that passes control to a function block 1510. The function block 1510 reads the encoder configuration file, and passes control to a function block 1515. The function block 1515 lets the number of views be N, with variables i (view number index) and j (picture number index) both being set equal to zero, and passes control to a decision block 1520. The decision block 1520 determines whether or not i is less than N. If so, then control is passed to a decision block 1525. Otherwise, control is passed to an end block 1599.
The decision block 1525 determines whether or not j is less than the number of pictures in view i. If so, then control is passed to a function block 1530. Otherwise, control is passed to a function block 1550.
The function block 1530 encodes picture j in view i, increments j, and passes control to a decision block 1535. The decision block 1535 determines whether or not a Memory Management Control Operations (MMCO) command is associated with the current picture. If so, then control is passed to a function block 1540. Otherwise, control is passed to a function block 1555.
The function block 1540 performs the associated MMCO command only with respect to a picture with a view_id equal to the view_id of the current picture, and passes control to a function block 1545. The function block 1545 inserts the current picture in the decoded picture buffer (DPB), and passes control to a function block 1560. The function block 1560 changes frame_num and the Picture Order Count (POC) for the current view_id, and returns control to the decision block 1525.
The function block 1550 increments i, resets frame_num and Picture Order Count (POC), and returns control to the decision block 1520.
The function block 1555 selects the picture with a view_id equal to the view_id of the current picture to be marked as “unused for reference” for use by the MPEG-4 AVC process for sliding window reference picture marking, and passes control to the function block 1545.
Turning to FIG. 16, an exemplary method for decoding multi-view video content using modified decoded reference picture marking is indicated generally by the reference numeral 1600.
The method 1600 includes a start block 1605 that passes control to a function block 1610. The function block 1610 parses the bitstream, view_id, frame_num, and Picture Order Count (POC), and passes control to a function block 1615. The function block 1615 decodes the current picture, and passes control to a decision block 1620. The decision block 1620 determines whether or not a Memory Management Control Operations (MMCO) command is present. If so, then control is passed to a function block 1625. Otherwise, control is passed to a function block 1640.
The function block 1625 parses the MMCO commands and performs the MMCO commands only with respect to a picture with a view_id equal to the view_id of the current picture, and passes control to a function block 1630. The function block 1630 inserts the current picture in the decoded picture buffer (DPB), and passes control to a decision block 1635. The decision block 1635 determines whether or not all pictures have been decoded. If so, then control is passed to an end block 1699. Otherwise, control is returned to the function block 1610.
The function block 1640 selects the picture with the view_id equal to the view_id of the current picture to be marked as “unused for reference” for use with the MPEG-4 AVC process for sliding window decoded reference picture marking, and passes control to the function block 1630.
Reference Picture Lists Construction
In accordance with an embodiment of the present principles, we associate view_id with the decoded reference pictures. Accordingly, in an embodiment, we redefine the initialization process for reference pictures and the reordering process for reference picture lists to include the view_id.
The MPEG-4 AVC standard specifies a default process to initialize the reference lists for P and B slices. This default process can then be modified by special Reference Picture List Reordering (RPLR) commands, which are present in the bitstream.
This default ordering and re-ordering of reference pictures is based on frame_num and Picture Order Count values. However, since we allow pictures with the same frame_num/POC value to be present in the decoded picture buffer (DPB), we need to distinguish between pictures with the same frame_num/POC values using the view_id. In an embodiment, one or more of these processes to set the reference picture lists is changed.
One embodiment of the default initialization process to initialize the reference lists for P and B slices involves allowing only temporal reference pictures in the reference list and ignoring all pictures with a view_id that is different from the view_id of the current picture. The temporal reference pictures would follow the same default initialization process specified in the MPEG-4 AVC standard. Another embodiment involves placing only the cross-view references in the list, such that the closest view_id is placed earlier in the list. Another embodiment involves initializing the reference lists using temporal references first, then placing the cross-view reference frames at certain fixed locations, for example at the end of the reference lists under construction.
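The three initialization variants just described can be sketched as simple filters over the decoded picture buffer; the RefPic fields and function names below are illustrative assumptions, and the result of the first variant would still be passed through the unmodified MPEG-4 AVC default ordering.

#include <algorithm>
#include <cstdlib>
#include <vector>

struct RefPic {
    int viewId;
    int poc;           // output order within its own view
    int timeInstance;  // capture time shared across views
};

// Variant 1: temporal references only - keep pictures of the current view and let the
// default MPEG-4 AVC initialization order the result.
std::vector<RefPic> initTemporalOnly(const std::vector<RefPic>& dpb, int currViewId)
{
    std::vector<RefPic> list;
    for (const RefPic& p : dpb)
        if (p.viewId == currViewId)
            list.push_back(p);
    return list;
}

// Variant 2: cross-view references only, taken at the same time instance as the
// current picture and ordered so that the closest view_id comes first.
std::vector<RefPic> initCrossViewOnly(const std::vector<RefPic>& dpb,
                                      int currViewId, int currTime)
{
    std::vector<RefPic> list;
    for (const RefPic& p : dpb)
        if (p.viewId != currViewId && p.timeInstance == currTime)
            list.push_back(p);
    std::sort(list.begin(), list.end(),
              [currViewId](const RefPic& a, const RefPic& b) {
                  return std::abs(a.viewId - currViewId) < std::abs(b.viewId - currViewId);
              });
    return list;
}

// Variant 3: temporal references first, then cross-view references appended at the end.
std::vector<RefPic> initTemporalThenCrossView(const std::vector<RefPic>& dpb,
                                              int currViewId, int currTime)
{
    std::vector<RefPic> list  = initTemporalOnly(dpb, currViewId);
    std::vector<RefPic> cross = initCrossViewOnly(dpb, currViewId, currTime);
    list.insert(list.end(), cross.begin(), cross.end());
    return list;
}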
For the Reference Picture List Reordering commands to re-order the list, in an embodiment, new commands are introduced and/or the semantics of existing commands are modified to take into consideration the view_id of the picture that needs to be moved.
In an embodiment, we redefine the MPEG-4 AVC standard variables used in this process as shown below, so that the Reference Picture List Reordering commands specified in the MPEG-4 AVC standard remain unchanged.
One embodiment where we redefine the variables of the MPEG-4 AVC standard relating to reordering the reference lists is shown below. In this embodiment, the following applies:
FrameNum = frame_num * N + view_id; and
MaxFrameNum = 2^(log2_max_frame_num_minus4 + 4) * N.
The variable CurrPicNum is derived as follows: if field_pic_flag is equal to 0, then CurrPicNum is set equal to frame_num * N+view_id; and otherwise, if field_pic_flag is equal to 1, then CurrPicNum is set equal to 2 * (frame_num * N+view_id)+1.
The Picture Order Count for a slice in the MPEG-4 AVC standard is defined as follows:
if( picX is a frame or a complementary field pair )
    PicOrderCnt( picX ) = Min( TopFieldOrderCnt, BottomFieldOrderCnt ) of the frame or complementary field pair picX
else if( picX is a top field )
    PicOrderCnt( picX ) = TopFieldOrderCnt of field picX
else if( picX is a bottom field )
    PicOrderCnt( picX ) = BottomFieldOrderCnt of field picX
For Multi-view Video Coding slices, the Picture Order Count is derived as follows for the decoding process for reference picture list construction and the decoded reference picture marking process:
PicOrderCnt(picX) = PicOrderCnt(picX) * N + view_id,
where N denotes the number of views. The number of views is indicated using a high level syntax in the bitstream and can be conveyed in-band or out-of-band. One embodiment is to include this in parameter sets of the MPEG-4 AVC standard (e.g., Sequence Parameter Set (SPS), Picture Parameter Set (PPS), or View Parameter Set (VPS)).
Another embodiment of redefining the variables of the MPEG-4 AVC standard relating to reordering the reference lists is shown below. In this embodiment, the following applies:
FrameNum=GOP_length*view_id+frame_num.
For Multi-view Video Coding slices, the Picture Order Count is derived as follows for the decoding process for reference picture list construction and decoded reference picture marking process:
PicOrderCnt(picX)=PicOrderCnt(picX)+GOP_length*view_id,
where GOP_length is defined, for each view, as the number of pictures in a GOP, i.e., an anchor picture and all pictures that are temporally located between that anchor picture and the previous anchor picture.
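Both remappings amount to small arithmetic helpers, sketched below; the function names are illustrative, and N (the number of views) and GOP_length are assumed to be available from the high level syntax as described above.

// Variant 1: interleave by view index, with N the number of views signalled in a
// parameter set (e.g. SPS, PPS, or VPS).
int remapFrameNumByViews(int frameNum, int viewId, int numViews)
{
    return frameNum * numViews + viewId;
}

int remapPocByViews(int picOrderCnt, int viewId, int numViews)
{
    return picOrderCnt * numViews + viewId;
}

// CurrPicNum as redefined above: frame coding uses the remapped frame number,
// field coding uses twice the remapped frame number plus one.
int currPicNum(int frameNum, int viewId, int numViews, bool fieldPicFlag)
{
    const int n = remapFrameNumByViews(frameNum, viewId, numViews);
    return fieldPicFlag ? 2 * n + 1 : n;
}

// Variant 2: offset frame_num and POC by whole GOPs per view.
int remapFrameNumByGop(int frameNum, int viewId, int gopLength)
{
    return gopLength * viewId + frameNum;
}

int remapPocByGop(int picOrderCnt, int viewId, int gopLength)
{
    return gopLength * viewId + picOrderCnt;
}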
In another embodiment, we change the semantics of the existing RPLR commands such that they apply only to the pictures that have the same view_id as the current view.
Turning to FIG. 5, an exemplary method for encoding multi-view video content using modified reference picture list construction is indicated generally by the reference numeral 500. The method 500 includes a start block 505 that passes control to a function block 510. The function block 510 reads the encoder configuration file, and passes control to a function block 515. The function block 515 lets the number of views be equal to a variable N, sets variables i (view number index) and j (picture number index) to both be equal to zero, and passes control to a decision block 520. The decision block 520 determines whether or not i is less than N. If so, then control is passed to a function block 525. Otherwise, control is passed to an end block 599.
The function block 525 determines whether or not j is less than the number of pictures in view i. If so, then control is passed to a function block 530. Otherwise, control is passed to a function block 545.
The function block 530, for inter pictures, includes only pictures with a view_id equal to the view_id of the current picture for use by the MPEG-4 AVC process for reference list initialization, and passes control to a function block 532. The function block 532 reorders the reference list, and passes control to a function block 535. The function block 535 encodes picture j in view i, increments j, and passes control to a function block 540. The function block 540 increments frame_num and Picture Order Count (POC), and returns control to the decision block 525.
The function block 545 increments i, resets frame_num and Picture Order Count (POC), and returns control to the decision block 520.
Turning to FIG. 6, another exemplary method for encoding multi-view video content using modified reference picture list construction is indicated generally by the reference numeral 600.
The method 600 includes a start block 605 that passes control to a function block 610. The function block 610 reads the encoder configuration file, and passes control to a function block 615. The function block 615 lets the number of views be equal to a variable N, sets variables i (view number index) and j (picture number index) to both be equal to zero, and passes control to a decision block 620. The decision block 620 determines whether or not i is less than N. If so, then control is passed to a function block 625. Otherwise, control is passed to an end block 699.
The function block 625 determines whether or not j is less than the number of pictures in view i. If so, then control is passed to a function block 630. Otherwise, control is passed to a function block 645.
The function block 630, for inter pictures, initializes the reference lists with only pictures with a view_id different than the view_id of the current picture, sampled at the same time as the current picture and ordered such that the closest view_ids are placed earlier in the list, and passes control to a function block 632. The function block 632 reorders the reference list, and passes control to a function block 635. The function block 635 encodes picture j in view i, increments j, and passes control to a function block 640. The function block 640 increments frame_num and Picture Order Count (POC), and returns control to the decision block 625.
The function block 645 increments i, resets frame_num and Picture Order Count (POC), and returns control to the decision block 620.
Turning to FIG. 7, yet another exemplary method for encoding multi-view video content using modified reference picture list construction is indicated generally by the reference numeral 700.
The method 700 includes a start block 705 that passes control to a function block 710. The function block 710 reads the encoder configuration file, and passes control to a function block 715. The function block 715 lets the number of views be equal to a variable N, sets variables i (view number index) and j (picture number index) to both be equal to zero, and passes control to a decision block 720. The decision block 720 determines whether or not i is less than N. If so, then control is passed to a function block 725. Otherwise, control is passed to an end block 799.
The function block 725 determines whether or not j is less than the number of pictures in view i. If so, then control is passed to a function block 730. Otherwise, control is passed to a function block 745.
The function block 730 includes only pictures with a view_id equal to the view_id of the current picture for use by the MPEG-4 AVC process for reference list initialization, and passes control to a function block 732. The function block 732 inserts cross-view pictures, with the same temporal location as the current picture, at the end of the reference list, and passes control to a function block 735. The function block 735 encodes picture j in view i, increments j, and passes control to a function block 740. The function block 740 increments frame_num and Picture Order Count (POC), and returns control to the decision block 725.
The function block 745 increments i, resets frame_num and Picture Order Count (POC), and returns control to the decision block 720.
Turning to FIG. 8, an exemplary method for decoding multi-view video content using modified reference picture list construction is indicated generally by the reference numeral 800. The method 800 includes a start block 805 that passes control to a function block 810. The function block 810 parses the bitstream, view_id, frame_num, and Picture Order Count (POC), and passes control to a function block 815. The function block 815 includes only pictures with a view_id equal to the view_id of the current picture for use by the MPEG-4 AVC process for reference list initialization, and passes control to a function block 820. The function block 820 decodes the current picture, and passes control to a function block 825. The function block 825 inserts the current picture in the decoded picture buffer, and passes control to a decision block 830. The decision block 830 determines whether or not all pictures have been decoded. If so, then control is passed to an end block 899. Otherwise, control is returned to the function block 810.
Turning to FIG. 9, another exemplary method for decoding multi-view video content using modified reference picture list construction is indicated generally by the reference numeral 900. The method 900 includes a start block 905 that passes control to a function block 910. The function block 910 parses the bitstream, view_id, frame_num, and Picture Order Count (POC), and passes control to a function block 915. The function block 915 initializes the reference lists with only pictures with a view_id different than the view_id of the current picture, sampled at the same time as the current picture and ordered such that the closest view_ids are placed earlier in the list, and passes control to a function block 920. The function block 920 decodes the current picture, and passes control to a function block 925. The function block 925 inserts the current picture in the decoded picture buffer (DPB), and passes control to a decision block 930. The decision block 930 determines whether or not all pictures have been decoded. If so, then control is passed to an end block 999. Otherwise, control is returned to the function block 910.
Turning to FIG. 10, yet another exemplary method for decoding multi-view video content using modified reference picture list construction is indicated generally by the reference numeral 1000. The method 1000 includes a start block 1005 that passes control to a function block 1010. The function block 1010 parses the bitstream, view_id, frame_num, and Picture Order Count (POC), and passes control to a function block 1015. The function block 1015 includes only pictures with a view_id equal to the view_id of the current picture for use by the MPEG-4 AVC process for reference list initialization, and passes control to a function block 1020. The function block 1020 inserts cross-view pictures, with the same temporal location as the current picture, at the end of the reference list, and passes control to a function block 1025. The function block 1025 inserts the current picture in the decoded picture buffer, and passes control to a decision block 1030. The decision block 1030 determines whether or not all pictures have been decoded. If so, then control is passed to an end block 1099. Otherwise, control is returned to the function block 1010.
Turning to FIG. 17, an exemplary method for encoding multi-view video content using modified reference picture list construction and frame number calculation is indicated generally by the reference numeral 1700.
The method 1700 includes a start block 1705 that passes control to a function block 1710. The function block 1710 reads the encoder configuration file, and passes control to a function block 1715. The function block 1715 lets the number of views be equal to a variable N, sets variables i (view number index) and j (picture number index) to both be equal to zero, and passes control to a decision block 1720. The decision block 1720 determines whether or not i is less than N. If so, then control is passed to a function block 1725. Otherwise, control is passed to an end block 1799.
The function block 1725 determines whether or not j is less than the number of pictures in view i. If so, then control is passed to a function block 1730. Otherwise, control is passed to a function block 1745.
The function block 1730 sets frame_num = frame_num * N + view_id, sets PicOrderCnt(picX) = PicOrderCnt(picX) * N + view_id, and passes control to a function block 1735. The function block 1735 encodes picture j in view i, increments j, and passes control to a function block 1740. The function block 1740 increments frame_num and Picture Order Count (POC), and returns control to the decision block 1725.
The function block 1745 increments i, resets frame_num and Picture Order Count (POC), and returns control to the decision block 1720.
Turning to FIG. 18, another exemplary method for encoding multi-view video content using modified reference picture list construction and frame number calculation is indicated generally by the reference numeral 1800.
The method 1800 includes a start block 1805 that passes control to a function block 1810. The function block 1810 reads the encoder configuration file, and passes control to a function block 1815. The function block 1815 lets the number of views be equal to a variable N, sets variables i (view number index) and j (picture number index) to both be equal to zero, and passes control to a decision block 1820. The decision block 1820 determines whether or not i is less than N. If so, then control is passed to a function block 1825. Otherwise, control is passed to an end block 1899.
The function block 1825 determines whether or not j is less than the number of pictures in view i. If so, then control is passed to a function block 1830. Otherwise, control is passed to a function block 1845.
The function block 1830 sets frame_num = GOP_length * view_id + frame_num, sets PicOrderCnt(picX) = PicOrderCnt(picX) + GOP_length * view_id, and passes control to a function block 1835. The function block 1835 encodes picture j in view i, increments j, and passes control to a function block 1840. The function block 1840 increments frame_num and Picture Order Count (POC), and returns control to the decision block 1825.
The function block 1845 increments i, resets frame_num and Picture Order Count (POC), and returns control to the decision block 1820.
Turning to FIG. 19, an exemplary method for decoding multi-view video content using modified reference picture list construction and frame number calculation is indicated generally by the reference numeral 1900. The method 1900 includes a start block 1905 that passes control to a function block 1910. The function block 1910 parses the bitstream, view_id, frame_num, and Picture Order Count (POC), and passes control to a function block 1915. The function block 1915 sets frame_num = frame_num * N + view_id, sets PicOrderCnt(picX) = PicOrderCnt(picX) * N + view_id, and passes control to a function block 1920. The function block 1920 decodes the current picture, and passes control to a function block 1925. The function block 1925 inserts the current picture in the decoded picture buffer (DPB), and passes control to a decision block 1930. The decision block 1930 determines whether or not all pictures have been decoded. If so, then control is passed to an end block 1999. Otherwise, control is returned to the function block 1910.
Turning to FIG. 20, another exemplary method for decoding multi-view video content using modified reference picture list construction and frame number calculation is indicated generally by the reference numeral 2000. The method 2000 includes a start block 2005 that passes control to a function block 2010. The function block 2010 parses the bitstream, view_id, frame_num, and Picture Order Count (POC), and passes control to a function block 2015. The function block 2015 sets frame_num = GOP_length * view_id + frame_num, sets PicOrderCnt(picX) = PicOrderCnt(picX) + GOP_length * view_id, and passes control to a function block 2020. The function block 2020 decodes the current picture, and passes control to a function block 2025. The function block 2025 inserts the current picture in the decoded picture buffer (DPB), and passes control to a decision block 2030. The decision block 2030 determines whether or not all pictures have been decoded. If so, then control is passed to an end block 2099. Otherwise, control is returned to the function block 2010.
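A minimal decode loop for these two schemes might look as follows (a sketch only; the placeholder types SliceHeader, Picture, Dpb and the parsing/decoding helpers are assumptions of this sketch, and only the remapping lines come from the methods above):

    #include <vector>

    // Placeholder types assumed for this sketch; a real decoder supplies
    // its own bitstream, slice header, picture, and DPB structures.
    struct SliceHeader { int view_id; int frame_num; int poc; };
    struct Picture     { SliceHeader hdr; };
    struct Dpb {
        std::vector<Picture> pics;
        void insert(const Picture& p) { pics.push_back(p); }
    };

    // Parsing and picture decoding are outside the scope of the sketch
    // and are only declared here.
    bool parseSliceHeader(SliceHeader& sh);                       // returns false at end of stream
    Picture decodeCurrentPicture(const SliceHeader& sh, const Dpb& dpb);

    void decodeAllViews(Dpb& dpb, int numViews) {
        SliceHeader sh;
        while (parseSliceHeader(sh)) {                            // view_id, frame_num, POC
            // Remapping of the function block 1915 (FIG. 19); the scheme of
            // FIG. 20 would instead use frame_num = GOP_length * view_id + frame_num
            // and poc = poc + GOP_length * view_id.
            sh.frame_num = sh.frame_num * numViews + sh.view_id;
            sh.poc       = sh.poc * numViews + sh.view_id;
            dpb.insert(decodeCurrentPicture(sh, dpb));            // decode, then store in the DPB
        }
    }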
Turning to FIG. 21, an exemplary method for encoding multi-view video content using modified reference picture list initialization with Reference Picture List Reordering (RPLR) commands is indicated generally by the reference numeral 2100.
The method 2100 includes a start block 2105 that passes control to a function block 2110. The function block 2110 reads the encoder configuration file, and passes control to a function block 2115. The function block 2115 lets the number of views be equal to a variable N, sets variables i (view number index) and j (picture number index) to both be equal to zero, and passes control to a decision block 2120. The decision block 2120 determines whether or not i is less than N. If so, then control is passed to a function block 2125. Otherwise, control is passed to an end block 2199.
The function block 2125 determines whether or not j is less than the number of pictures in view i. If so, then control is passed to a function block 2130. Otherwise, control is passed to a function block 2145.
The function block 2130, for inter pictures, performs default reference list initialization, and passes control to a function block 2132. The function block 2132 reads RPLR commands from the encoder configuration file, and passes control to a function block 2134. The function block 2134 performs the RPLR commands only with respect to the picture with a view_id equal to the view_id of the current picture, and passes control to a function block 2135. The function block 2135 encodes picture j in view i, increments j, and passes control to a function block 2140. The function block 2140 increments frame_num and Picture Order Count (POC), and returns control to the decision block 2125.
The function block 2145 increments i, resets frame_num and Picture Order Count (POC), and returns control to the decision block 2120.
Turning to FIG. 22, another exemplary method for encoding multi-view video content using modified reference picture list initialization with Reference Picture List Reordering (RPLR) commands is indicated generally by the reference numeral 2200.
The method 2200 includes a start block 2205 that passes control to a function block 2210. The function block 2210 reads the encoder configuration file, and passes control to a function block 2215. The function block 2215 lets the number of views be equal to a variable N, sets variables i (view number index) and j (picture number index) to both be equal to zero, and passes control to a decision block 2220. The decision block 2220 determines whether or not i is less than N. If so, then control is passed to a function block 2225. Otherwise, control is passed to an end block 2299.
The function block 2225 determines whether or not j is less than the number of pictures in view i. If so, then control is passed to a function block 2230. Otherwise, control is passed to a function block 2245.
The function block 2230, for inter pictures, performs default reference list initialization, and passes control to a function block 2232. The function block 2232 reads RPLR commands from the encoder configuration file, and passes control to a function block 2234. The function block 2234 performs the RPLR commands on the picture specified by the view_id indicated in the RPLR command, and passes control to a function block 2235. The function block 2235 encodes picture j in view i, increments j, and passes control to a function block 2240. The function block 2240 increments frame_num and Picture Order Count (POC), and returns control to the decision block 2225.
The function block 2245 increments i, resets frame_num and Picture Order Count (POC), and returns control to the decision block 2220.
Turning to FIG. 23, an exemplary method for decoding multi-view video content using modified reference picture list construction with Reference Picture List Reordering (RPLR) commands is indicated generally by the reference numeral 2300. The method 2300 includes a start block 2305 that passes control to a function block 2310. The function block 2310 parses the bitstream, view_id, frame_num, and Picture Order Count (POC), and passes control to a function block 2315. The function block 2315 includes only pictures with a view_id equal to the view_id of the current picture for use by the MPEG-4 AVC process for reference list initialization, and passes control to a function block 2317. The function block 2317 reads the RPLR commands, and passes control to a function block 2319. The function block 2319 performs the RPLR commands only with respect to a picture with a view_id equal to the view_id of the current picture, and passes control to a function block 2320. The function block 2320 decodes the current picture, and passes control to a function block 2325. The function block 2325 inserts the current picture in the decoded picture buffer, and passes control to a decision block 2330. The decision block 2330 determines whether or not all pictures have been decoded. If so, then control is passed to an end block 2399. Otherwise, control is returned to the function block 2310.
Turning to FIG. 24, another exemplary method for decoding multi-view video content using modified reference picture list construction with Reference Picture List Reordering (RPLR) commands is indicated generally by the reference numeral 2400. The method 2400 includes a start block 2405 that passes control to a function block 2410. The function block 2410 parses the bitstream, view_id, frame_num, and Picture Order Count (POC), and passes control to a function block 2415. The function block 2415 includes only pictures with a view_id equal to the view_id of the current picture for use by the MPEG-4 AVC process for reference list initialization, and passes control to a function block 2417. The function block 2417 reads the RPLR commands, and passes control to a function block 2419. The function block 2419 performs the RPLR commands only with respect to a picture with a view_id equal to the view_id of the current picture, and passes control to a function block 2420. The function block 2420 decodes the current picture, and passes control to a function block 2425. The function block 2425 inserts the current picture in the decoded picture buffer, and passes control to a decision block 2430. The decision block 2430 determines whether or not all pictures have been decoded. If so, then control is passed to an end block 2499. Otherwise, control is returned to the function block 2410.
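The same-view restriction used by the function blocks 2315/2319 and 2415/2419 can be illustrated as follows (DpbEntry and its fields are assumptions of this sketch; the subsequent frame_num/POC ordering and the handling of the RPLR commands themselves are not shown):

    #include <vector>

    // Minimal view of a stored reference picture, assumed for this sketch.
    struct DpbEntry {
        int  viewId;
        int  frameNum;
        int  poc;
        bool isReference;
    };

    // Default reference list initialization restricted to pictures whose
    // view_id equals that of the current picture; the resulting list would
    // then be ordered by the usual MPEG-4 AVC rules (frame_num for P slices,
    // POC for B slices) before any RPLR commands are applied, those commands
    // likewise being restricted to same-view pictures.
    std::vector<DpbEntry> initSameViewRefList(const std::vector<DpbEntry>& dpb, int currViewId) {
        std::vector<DpbEntry> refList;
        for (const DpbEntry& e : dpb) {
            if (e.isReference && e.viewId == currViewId) {
                refList.push_back(e);
            }
        }
        return refList;
    }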
Temporal DIRECT Mode
As mentioned above, temporal DIRECT mode uses the Picture Order Count information to derive the motion vector for a given macroblock. Since we decouple the frame_num and/or Picture Order Count values, introduce the view_id for each view of multi-view video content, and allow placing cross-view pictures in the decoded picture buffer and reference lists, in an embodiment we also refine this mode to handle the derivations correctly when cross-view pictures refer to pictures from a view that is different from the current view.
In temporal DIRECT mode, we have the following exemplary cases:
(1) picture in ref list 1 and picture in ref list 0 have different POC and same view_id;
(2) picture in ref list 1 and picture in ref list 0 have different POC and different view_id;
(3) picture in ref list 1 and picture in ref list 0 have same POC and different view_id; and
(4) picture in ref list 1 and picture in ref list 0 have same POC and same view_id.
One embodiment of obtaining the motion vector in temporal DIRECT mode is to use the existing MPEG-4 AVC method of simply ignoring the view_id information present in the bitstream. In another embodiment, we redefine temporal DIRECT mode to take into consideration view_id information along with the Picture Order Count information.
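As one possible illustration of the second embodiment (a sketch only; the cross-view fallback chosen here, copying the co-located vector, is an assumption made for illustration and not a derivation mandated by the present principles), the scaling could be made view-aware as follows:

    #include <algorithm>
    #include <cstdlib>

    struct Mv { int x; int y; };

    static int clip3(int lo, int hi, int v) { return std::max(lo, std::min(hi, v)); }

    // View-aware temporal DIRECT sketch: apply the usual POC-distance
    // scaling of the co-located motion vector only when both references
    // share the current picture's view_id and have distinct POC values;
    // otherwise fall back to copying the co-located vector.
    void temporalDirectSketch(const Mv& mvCol, int pocCur, int poc0, int poc1,
                              int viewId0, int viewId1, int viewIdCur,
                              Mv& mvL0, Mv& mvL1) {
        if (viewId0 == viewIdCur && viewId1 == viewIdCur && poc0 != poc1) {
            int tb  = clip3(-128, 127, pocCur - poc0);
            int td  = clip3(-128, 127, poc1 - poc0);
            int tx  = (16384 + std::abs(td / 2)) / td;
            int dsf = clip3(-1024, 1023, (tb * tx + 32) >> 6);
            mvL0.x = (dsf * mvCol.x + 128) >> 8;
            mvL0.y = (dsf * mvCol.y + 128) >> 8;
            mvL1.x = mvL0.x - mvCol.x;
            mvL1.y = mvL0.y - mvCol.y;
        } else {
            mvL0 = mvCol;          // assumed fallback for cross-view or same-POC references
            mvL1 = Mv{0, 0};
        }
    }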
Implicit Weighted Prediction
Similar to temporal DIRECT mode, implicit weighted prediction (as discussed above) also uses Picture Order Count values to determine the weights to be applied to the reference pictures. As a result, in an embodiment, all the changes that apply to temporal DIRECT mode will indirectly fix the implicit weighted prediction mode. In another embodiment, the method to obtain weights in implicit weighted prediction mode can be redefined to take into consideration view_id information along with the Picture Order Count information. For example, we may calculate the Picture Order Count by taking into consideration the view_id information and the number of views as described above and thereafter take the difference between Picture Order Counts in order to obtain the required values to perform implicit weighted prediction.
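For reference, the implicit weights in the MPEG-4 AVC standard are derived from POC distances roughly as sketched below; if the Picture Order Count is computed in the view-aware manner described above, the same derivation can be reused, with the zero-distance fallback naturally covering same-POC cross-view references. This fragment is a simplified sketch and omits, for example, the long-term reference conditions of the standard.

    #include <algorithm>
    #include <cstdlib>

    static int clip3(int lo, int hi, int v) { return std::max(lo, std::min(hi, v)); }

    // Simplified implicit weighted prediction weights from POC distances:
    // w0 applies to the list 0 reference, w1 to the list 1 reference.
    void implicitWeights(int pocCur, int poc0, int poc1, int& w0, int& w1) {
        int td = clip3(-128, 127, poc1 - poc0);
        if (td == 0) { w0 = 32; w1 = 32; return; }               // equal-weight fallback
        int tb  = clip3(-128, 127, pocCur - poc0);
        int tx  = (16384 + std::abs(td / 2)) / td;
        int dsf = clip3(-1024, 1023, (tb * tx + 32) >> 6);
        w1 = dsf >> 2;
        if (w1 < -64 || w1 > 128) { w0 = 32; w1 = 32; return; }  // out-of-range fallback
        w0 = 64 - w1;
    }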
Turning to FIG. 11, an exemplary method for encoding multi-view video content using temporal DIRECT mode and implicit weighted prediction is indicated generally by the reference numeral 1100.
The method 1100 includes a start block 1105 that passes control to a function block 1110. The function block 1110 reads the encoder configuration file, and passes control to a function block 1115. The function block 1115 lets the number of views be equal to a variable N, sets variables i (view number index) and j (picture number index) to both be equal to zero, and passes control to a decision block 1120. The decision block 1120 determines whether or not i is less than N. If so, then control is passed to a function block 1125. Otherwise, control is passed to an end block 1199.
The function block 1125 determines whether or not j is less than the number of pictures in view i. If so, then control is passed to a decision block 1132. Otherwise, control is passed to a function block 1145.
The decision block 1132 determines whether or not weighted prediction is enabled for the current slice. If so, then control is passed to a function block 1134. Otherwise, control is passed to a function block 1136.
The function block 1134 ignores view_id information for weighted prediction, and passes control to the function block 1136.
The function block 1136 starts encoding a current macroblock, and passes control to a decision block 1138. The decision block 1138 determines whether or not to choose direct mode for the macroblock. If so, then control is passed to a function block 1142. Otherwise, control is passed to a function block 1152.
The function block 1142 ignores view_id for direct mode, and passes control to the function block 1152.
The function block 1152 encodes the current macroblock, and passes control to a decision block 1154. The decision block 1154 determines whether or not all macroblocks have been encoded. If so, then control is passed to a function block 1156. Otherwise, control is returned to the function block 1136.
The function block 1156 increments the variable j, and passes control to a function block 1140. The function block 1140 increments frame_num and Picture Order Count, and returns control to the decision block 1125.
The function block 1145 increments i, resets frame_num and Picture Order Count, and returns control to the decision block 1120.
Turning to FIG. 12, another exemplary method for encoding multi-view video content using temporal DIRECT mode and implicit weighted prediction is indicated generally by the reference numeral 1200.
The method 1200 includes a start block 1205 that passes control to a function block 1210. The function block 1210 reads the encoder configuration file, and passes control to a function block 1215. The function block 1215 lets the number of views be equal to a variable N, sets variables i (view number index) and j (picture number index) to both be equal to zero, and passes control to a decision block 1220. The decision block 1220 determines whether or not i is less than N. If so, then control is passed to a function block 1225. Otherwise, control is passed to an end block 1299.
The function block 1225 determines whether or not j is less than the number of pictures in view i. If so, then control is passed to a decision block 1232. Otherwise, control is passed to a function block 1245.
The decision block 1232 determines whether or not weighted prediction is enabled for the current slice. If so, then control is passed to a function block 1234. Otherwise, control is passed to a function block 1236.
The function block 1234 ignores view_id information for weighted prediction, and passes control to the function block 1236.
The function block 1236 starts encoding a current macroblock, and passes control to a decision block 1238. The decision block 1238 determines whether or not to choose direct mode for the macroblock. If so, then control is passed to a function block 1242. Otherwise, control is passed to a function block 1252.
The function block 1242 considers view_id for direct mode, and passes control to the function block 1252.
The function block 1252 encodes the current macroblock, and passes control to a decision block 1254. The decision block 1254 determines whether or not all macroblocks have been encoded. If so, then control is passed to a function block 1256. Otherwise, control is returned to the function block 1236.
The function block 1256 increments the variable j, and passes control to a function block 1240. The function block 1240 increments frame_num and Picture Order Count, and returns control to the decision block 1225.
The function block 1245 increments i, resets frame_num and Picture Order Count, and returns control to the decision block 1220.
Turning to FIG. 13, an exemplary method for decoding multi-view video content using modified decoded reference picture marking is indicated generally by the reference numeral 1300.
The method 1300 includes a start block 1305 that passes control to a function block 1310. The function block 1310 parses the bitstream, view_id, frame_num, and Picture Order Count (POC), and passes control to a function block 1315. The function block 1315 parses the macroblock mode, the motion vector, and ref_idx, and passes control to a decision block 1320. The decision block 1320 determines whether or not weighted prediction is enabled for the picture. If so, then control is passed to a function block 1325. Otherwise, control is passed to a decision block 1330.
The function block 1325 ignores view_id information for weighted prediction, and passes control to the decision block 1330.
The decision block 1330 determines whether or not a macroblock is a direct mode macroblock. If so, then control is passed to a function block 1355. Otherwise, control is passed to a function block 1335.
The function block 1355 ignores view_id information for direct mode, and passes control to the function block 1335.
The function block 1335 decodes the current macroblock, and passes control to a decision block 1340. The decision block 1340 determines whether or not all macroblocks have been decoded. If so, then control is passed to a function block 1345. Otherwise, control is returned to the function block 1315.
The function block 1345 inserts the current picture in the decoded picture buffer, and passes control to a decision block 1350. The decision block 1350 determines whether or not all pictures have been decoded. If so, then control is passed to an end block 1399. Otherwise, control is returned to the function block 1310.
Turning to FIG. 14, another exemplary method for decoding multi-view video content using modified decoded reference picture marking is indicated generally by the reference numeral 1400.
The method 1400 includes a start block 1405 that passes control to a function block 1410. The function block 1410 parses the bitstream, view_id, frame_num, and Picture Order Count (POC), and passes control to a function block 1415. The function block 1415 parses the macroblock mode, the motion vector, and ref_idx, and passes control to a decision block 1420. The decision block 1420 determines whether or not weighted prediction is enabled for the picture. If so, then control is passed to a function block 1425. Otherwise, control is passed to a decision block 1430.
The function block 1425 ignores view_id information for weighted prediction, and passes control to the decision block 1430.
The decision block 1430 determines whether or not a macroblock is a direct mode macroblock. If so, then control is passed to a function block 1455. Otherwise, control is passed to a function block 1435.
The function block 1455 considers view_id information for direct mode, and passes control to the function block 1435.
The function block 1435 decodes the current macroblock, and passes control to a decision block 1440. The decision block 1440 determines whether or not all macroblocks have been decoded. If so, then control is passed to a function block 1445. Otherwise, control is returned to the function block 1415.
The function block 1445 inserts the current picture in the decoded picture buffer, and passes control to a decision block 1450. The decision block 1450 determines whether or not all pictures have been decoded. If so, then control is passed to an end block 1499. Otherwise, control is returned to the function block 1410.
Parallel Coding of MVC
Due to the amount of data involved in processing multi-view video content sequences, support for parallel encoding/decoding in Multi-view Video Coding is very important for many applications, especially those with a real-time constraint. In the current MPEG-4 AVC compliant implementation of Multi-view Video Coding, cross-view prediction is enabled but there is no provision to distinguish temporal references from cross-view references. By adding view_id support in the Multi-view Video Coding encoder and/or decoder and including the view_id in decoded reference picture management and reference picture list construction as we proposed herein, the data dependency between parallel processing engines is clearly defined, which facilitates a parallel implementation of the MVC codec.
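As an illustration of the dependency information this makes available (a sketch; RefInfo and its field names are placeholders assumed here), the references of a picture can be split into same-view and cross-view sets, which is exactly what a parallel implementation needs in order to know which other view engines it must wait for:

    #include <vector>

    // Placeholder description of a reference picture, assumed for this sketch.
    struct RefInfo { int viewId; int frameNum; int poc; };

    // Split the references of the current picture into temporal (same-view)
    // and cross-view dependencies.
    void classifyReferences(const std::vector<RefInfo>& refs, int currViewId,
                            std::vector<RefInfo>& temporalRefs,
                            std::vector<RefInfo>& crossViewRefs) {
        for (const RefInfo& r : refs) {
            if (r.viewId == currViewId) {
                temporalRefs.push_back(r);    // handled within this view's engine
            } else {
                crossViewRefs.push_back(r);   // requires output from another view's engine
            }
        }
    }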
A description will now be given of some of the many attendant advantages/features of the present invention, some of which have been mentioned above. For example, one advantage/feature is an apparatus that includes a decoder for decoding at least one picture corresponding to at least one of at least two views of multi-view video content from a bitstream, wherein in the bitstream at least one of coding order information and output order information for the at least one picture is decoupled from the at least one view to which the at least one picture corresponds.
Another advantage/feature is the apparatus having the decoder as described above, wherein the decoder determines an existence of a decoupling of the at least one of the coding order information and the output order information for the at least one picture using at least one existing syntax element (frame_num and pic_order_cnt_lsb) corresponding to the International Organization for Standardization/International Electrotechnical Commission Moving Picture Experts Group-4 Part 10 Advanced Video Coding standard/International Telecommunication Union, Telecommunication Sector H.264 recommendation.
Yet another advantage/feature is the apparatus having the decoder as described above, wherein the decoder determines an existence of a decoupling of the at least one of the coding order information and the output order information for the at least one picture using a view identifier.
Yet still another advantage/feature is the apparatus having the decoder that determines the existence of the decoupling using the view identifier as described above, wherein the view identifier is present at a slice level in the bitstream.
Yet still a further advantage/feature is the apparatus having the decoder that determines the existence of the decoupling using the view identifier as described above, wherein the view identifier is present at a level higher than a macroblock level in the bitstream.
Moreover, another advantage/feature is the apparatus having the decoder that determines the existence of the decoupling using the view identifier present at the level higher than the macroblock level as described above, wherein the decoder parses the view identifier from the bitstream for use by a decoded reference picture marking process.
Further, another advantage/feature is the apparatus having the decoder that parses the view identifier from the bitstream as described above, wherein the decoder parses the view identifier from the bitstream to determine to which of the at least two views a particular picture to be marked by the decoded reference picture marking process belongs.
Also, another advantage/feature is the apparatus having the decoder that determines the existence of the decoupling using the view identifier present at the level higher than the macroblock level as described above, wherein the decoder uses at least one existing syntax element (no_output_of_prior_pics_flag, long_term_reference_flag, adaptive_ref_pic_marking_mode_flag, memory_management_control_operation, difference_of_pic_nums_minus1, long_term_pic_num, long_term_frame_idx, max_long_term_frame_idx_plus1) with semantics of the at least one existing syntax element redefined for use in a redefined decoded reference picture marking process corresponding to the International Organization for Standardization/International Electrotechnical Commission Moving Picture Experts Group-4 Part 10 Advanced Video Coding standard/International Telecommunication Union, Telecommunication Sector H.264 recommendation to support a use of the view identifier in the redefined decoded reference picture marking process.
Additionally, another advantage/feature is the apparatus having the decoder that uses the at least one existing syntax element as described above, wherein in the redefined decoded reference picture marking process, only pictures with a same view identifier as a currently decoded picture are marked.
Moreover, another advantage/feature is the apparatus having the decoder that uses the at least one existing syntax element as described above, wherein at least one of a sliding window decoded reference picture marking process and an adaptive memory control decoded reference picture marking process are applied.
Further, another advantage/feature is the apparatus having the decoder that uses the at least one existing syntax element as described above, wherein in the redefined decoded reference picture marking process, pictures which have a different view identifier than that of the at least one picture are marked using a previously unused syntax element (difference_of_view_ids_minus1).
Also, another advantage/feature is the apparatus having the decoder that determines the existence of the decoupling using the view identifier present at the level higher than the macroblock level as described above, wherein the decoder parses the view identifier from the bitstream for default reference picture list construction.
Additionally, another advantage/feature is the apparatus having the decoder that parses the view identifier from the bitstream as described above, wherein inter-view reference pictures are prohibited from being added to a reference list for a default reference picture list creation process corresponding to the reference picture list construction, according to at least one existing syntax element (frame_num and pic_order_cnt_lsb) for the reference picture list construction.
Moreover, another advantage/feature is the apparatus having the decoder that parses the view identifier from the bitstream as described above, wherein only inter-view reference pictures are added to a reference list for a default reference picture list creation process corresponding to the reference picture list construction, according to at least one existing syntax element (frame_num and pic_order_cnt_lsb) for the default reference picture list construction.
Further, another advantage/feature is the apparatus having the decoder wherein only inter-view reference pictures are added to the reference list for the default reference picture list creation process as described above, wherein the inter-view reference pictures are added after temporal references.
Also, another advantage/feature is the apparatus having the decoder that determines the existence of the decoupling using the view identifier as described above, wherein the decoder uses at least one existing syntax element (ref_pic_list_reordering_flag_l0, reordering_of_pic_nums_idc, abs_diff_pic_num_minus1, long_term_pic_num, ref_pic_list_reordering_flag_l1, reordering_of_pic_nums_idc, abs_diff_pic_num_minus1, long_term_pic_num) redefined for use in a redefined reference picture list reordering process corresponding to the International Organization for Standardization/International Electrotechnical Commission Moving Picture Experts Group-4 Part 10 Advanced Video Coding standard/International Telecommunication Union, Telecommunication Sector H.264 recommendation to support a use of the view identifier in the redefined reference picture list reordering process.
Additionally, another advantage/feature is the apparatus having the decoder that uses the at least one existing syntax element as described above, wherein in the redefined reference picture list reordering process, only pictures with a same view identifier as a currently decoded picture are reordered.
Moreover, another advantage/feature is the apparatus having the decoder wherein, in the redefined reference picture list reordering process, only pictures with a same view identifier as a currently decoded picture are reordered as described above, wherein the view identifier indicates to which of the at least two views a particular picture to be moved to a current index in a corresponding reference picture list corresponds.
Further, another advantage/feature is the apparatus having the decoder wherein, in the redefined reference picture list reordering process, only pictures with a same view identifier as a currently decoded picture are reordered as described above, wherein the view identifier is only required when the view identifier of a reference picture to be reordered is different from that of the at least one picture.
Also, another advantage/feature is the apparatus having the decoder that determines the existence of the decoupling using the view identifier as described above, wherein the decoder uses an existing syntax element (pic_order_cnt_lsb) redefined for temporal DIRECT mode, the existing syntax element corresponding to the International Organization for Standardization/International Electrotechnical Commission Moving Picture Experts Group-4 Part 10 Advanced Video Coding standard/International Telecommunication Union, Telecommunication Sector H.264 recommendation, to support a use of the view identifier in the temporal DIRECT mode.
Additionally, another advantage/feature is the apparatus having the decoder that uses the existing syntax element as described above, wherein the temporal DIRECT mode is derived based on at least one of a Picture Order Count value and a view identifier.
Moreover, another advantage/feature is the apparatus having the decoder that determines the existence of the decoupling using the view identifier as described above, wherein the decoder uses an existing syntax element (pic_order_cnt_lsb), existing semantics, and an existing decoding process for temporal DIRECT mode, the existing syntax element, the existing semantics, and the existing decoding process corresponding to the International Organization for Standardization/International Electrotechnical Commission Moving Picture Experts Group-4 Part 10 Advanced Video Coding standard/International Telecommunication Union, Telecommunication Sector H.264 recommendation.
Further, another advantage/feature is the apparatus having the decoder that determines the existence of the decoupling using the view identifier as described above, wherein the decoder uses an existing syntax element (pic_order_cnt_lsb) redefined for implicit weighted prediction, the existing syntax element corresponding to the International Organization for Standardization/International Electrotechnical Commission Moving Picture Experts Group-4 Part 10 Advanced Video Coding standard/International Telecommunication Union, Telecommunication Sector H.264 recommendation, to support a use of the view identifier in the implicit weighted prediction.
Also, another advantage/feature is the apparatus having the decoder that uses the existing syntax element as described above, wherein the implicit weighted prediction is derived based on at least one of a Picture Order Count value and a view identifier.
Additionally, another advantage/feature is the apparatus having the decoder that determines the existence of the decoupling using the view identifier as described above, wherein the decoder uses an existing syntax element (pic_order_cnt_lsb), existing semantics, and an existing decoding process for implicit weighted prediction, the existing syntax element, the existing semantics, and the existing decoding process corresponding to the International Organization for Standardization/International Electrotechnical Commission Moving Picture Experts Group-4 Part 10 Advanced Video Coding standard/International Telecommunication Union, Telecommunication Sector H.264 recommendation.
Moreover, another advantage/feature is the apparatus having the decoder as described above, wherein the decoder uses a particular one of the at least two views corresponding to a particular picture to determine an inter-view dependency in a parallel decoding of different ones of the at least two views.
Yet another advantage/feature is an apparatus having a decoder for decoding at least one of at least two views corresponding to multi-view video content. The decoder decodes the at least one of the at least two views using redefined variables in a default reference picture list construction process and reference picture list reordering corresponding to the International Organization for Standardization/International Electrotechnical Commission Moving Picture Experts Group-4 Part 10 Advanced Video Coding standard/International Telecommunication Union, Telecommunication Sector H.264 recommendation.
Moreover, another advantage/feature is an apparatus having the decoder as described above, wherein at least one of a number of views and view identifier information is used to redefine the variables.
Further, another advantage/feature is an apparatus having the decoder as described above, wherein at least one of a Group Of Pictures length and view identifier information is used to redefine the variables.
Yet another advantage/feature is an apparatus having a decoder for decoding at least one of at least two views corresponding to multi-view video content. The decoder decodes the at least one of the at least two views using redefined variables in a decoded reference picture marking process of the International Organization for Standardization/International Electrotechnical Commission Moving Picture Experts Group-4 Part 10 Advanced Video Coding standard/International Telecommunication Union, Telecommunication Sector H.264 recommendation.
Moreover, another advantage/feature is an apparatus having the decoder as described above, wherein at least one of a number of views and view identifier information is used to redefine the variables.
Further, another advantage/feature is an apparatus having the decoder as described above, wherein at least one of a Group Of Pictures length and view identifier information is used to redefine the variables.
It is to be appreciated that the selection of particular syntax names, particularly previously unused syntax names as described with respect to various inventive aspects of the present principles, is for purposes of illustration and clarity and, thus, given the teachings of the present principles provided herein, other names and/or characters and so forth may also be used in place of and/or in addition to the syntax names provided herein, while maintaining the spirit of the present principles.
These and other features and advantages of the present principles may be readily ascertained by one of ordinary skill in the pertinent art based on the teachings herein. It is to be understood that the teachings of the present principles may be implemented in various forms of hardware, software, firmware, special purpose processors, or combinations thereof.
Most preferably, the teachings of the present principles are implemented as a combination of hardware and software. Moreover, the software may be implemented as an application program tangibly embodied on a program storage unit. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (“CPU”), a random access memory (“RAM”), and input/output (“I/O”) interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU. In addition, various other peripheral units may be connected to the computer platform such as an additional data storage unit and a printing unit.
It is to be further understood that, because some of the constituent system components and methods depicted in the accompanying drawings are preferably implemented in software, the actual connections between the system components or the process function blocks may differ depending upon the manner in which the present principles are programmed. Given the teachings herein, one of ordinary skill in the pertinent art will be able to contemplate these and similar implementations or configurations of the present principles.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the present principles are not limited to those precise embodiments, and that various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present principles. All such changes and modifications are intended to be included within the scope of the present principles as set forth in the appended claims.