The present disclosure is part of a non-provisional application claiming priority to U.S. provisional patent application No. 62/808,940, filed on February 22, 2019. The contents of the above-mentioned application are incorporated herein by reference.
Detailed Description
In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant teachings. Any variations, derivations and/or extensions based on the teachings described herein are within the scope of the present disclosure. In some instances, well known methods, procedures, components, and/or circuits related to one or more example implementations disclosed herein may be described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the teachings of the present disclosure.
I. Candidate list
a. Merge mode and AMVP
For intra prediction mode, spatially neighboring reconstructed pixels may be used to generate a directional prediction. For inter prediction mode, temporally reconstructed reference frames may be used to generate a motion-compensated prediction. Conventional inter prediction modes include skip, merge, and inter Advanced Motion Vector Prediction (AMVP) modes. Skip mode and merge mode obtain motion information from spatially neighboring blocks (spatial candidates) or temporally collocated (co-located) blocks (temporal candidates). When a PU is coded in skip or merge mode, no motion information is coded; only the index of the selected candidate is coded. In skip mode, the residual signal is forced to zero and is not coded. If a particular block is coded as skip or merge, a candidate index is signaled to indicate which candidate in the candidate set is used for merging. Each merged PU reuses the MV, prediction direction, and reference picture index of the selected candidate.
Fig. 1 illustrates motion candidates for the merge mode. The figure identifies a current block 100 of a video picture or frame being encoded or decoded by a video codec. As illustrated, up to four spatial MV candidates are derived from the spatial neighbors A0, A1, B0, and B1, and one temporal MV candidate is derived from TBR or TCTR (TBR is used first; if TBR is not available, TCTR is used instead). If any of the four spatial MV candidates is not available, position B2 is used to derive an MV candidate as a replacement. After the derivation process of the four spatial MV candidates and the one temporal MV candidate, in some embodiments redundancy removal (pruning) is applied to remove redundant MV candidates. If the number of available MV candidates is less than five after pruning, three types of additional candidates are derived and added to the candidate set (candidate list). The video encoder selects one final candidate within the candidate set for skip or merge mode based on a Rate-Distortion Optimization (RDO) decision and transmits the index to the video decoder. (Skip mode and merge mode are collectively referred to herein as "merge mode".)
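The following sketch, in Python, outlines the candidate derivation just described. It is an illustrative sketch only, not the normative HEVC/VVC derivation: the helper callables get_motion(position) and derive_extra_candidates(list), the checking order of the spatial positions, and the maximum list size of five are assumptions made for this example.

```python
# Sketch of merge candidate list construction as described above.
# get_motion(pos) is a hypothetical helper returning the motion information of a
# coded neighboring position, or None if that position is unavailable.
# derive_extra_candidates(list) is a hypothetical helper that derives one additional
# candidate (combined bi-predictive, scaled bi-predictive, or zero vector), or None.

MAX_MERGE_CANDIDATES = 5

def build_merge_list(get_motion, derive_extra_candidates):
    candidates = []

    # Up to four spatial candidates from A0, A1, B0, B1 (checking order illustrative).
    spatial = [get_motion(p) for p in ("A0", "A1", "B0", "B1")]
    available = [m for m in spatial if m is not None]

    # If any of the four spatial candidates is unavailable, B2 is used as a replacement.
    if len(available) < 4:
        b2 = get_motion("B2")
        if b2 is not None:
            available.append(b2)
    candidates.extend(available)

    # One temporal candidate: T_BR first, T_CTR if T_BR is unavailable.
    temporal = get_motion("T_BR")
    if temporal is None:
        temporal = get_motion("T_CTR")
    if temporal is not None:
        candidates.append(temporal)

    # Redundancy removal (pruning): keep only distinct candidates.
    pruned = []
    for cand in candidates:
        if cand not in pruned:
            pruned.append(cand)

    # If fewer than five candidates remain, derive additional candidates.
    while len(pruned) < MAX_MERGE_CANDIDATES:
        extra = derive_extra_candidates(pruned)
        if extra is None:
            break
        pruned.append(extra)

    return pruned[:MAX_MERGE_CANDIDATES]
```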
When a PU is encoded in inter-AMVP mode, motion-compensated prediction is performed using a transmitted Motion Vector Difference (MVD), which is combined with a Motion Vector Predictor (MVP) to derive the Motion Vector (MV). To decide the MVP in inter-AMVP mode, the Advanced Motion Vector Prediction (AMVP) scheme is used to select a motion vector predictor from an AMVP candidate set that includes two spatial MVPs and one temporal MVP. Therefore, in AMVP mode, the MVP index of the selected MVP and the corresponding MVD need to be encoded and transmitted. In addition, the inter prediction direction, accompanied by a reference frame index for each list, should also be encoded and transmitted to specify the prediction direction among bi-directional prediction and uni-directional prediction from list 0 (L0) and list 1 (L1).
When a PU is encoded in skip mode or merge mode, no motion information is transmitted except for the merge index of the selected candidate. This is because skip mode and merge mode obtain motion information using a motion inference method (MV = MVP + MVD, where MVD is zero) from spatially neighboring blocks (spatial candidates) or a temporal block located in a collocated picture (temporal candidate), where the collocated picture is the first reference picture in list 0 or list 1, as signaled in the slice header. In the case of a skipped PU, the residual signal is also omitted. To determine the merge index for skip mode and merge mode, a merge scheme is used to select a motion vector predictor from a merge candidate set that includes four spatial MVPs and one temporal MVP.
Fig. 1 also shows the MVP candidate set for inter prediction mode, i.e., neighboring PUs that are referred to for deriving spatial MVP and temporal MVP for both AMVP and merging schemes. The current block 100 (which may be a PU or a CU) refers to neighboring blocks to derive spatial MVP and temporal MVP as MVP lists or candidate lists for AMVP mode, merge mode, or skip mode.
For AMVP mode, the left MVP is the first available MVP from A0 and A1, the top MVP is the first available MVP from B0, B1, and B2, and the temporal MVP is the first available MVP from TBR or TCTR (TBR is used first; if TBR is not available, TCTR is used instead). If the left MVP is not available and the top MVP is not a scaled MVP, a second top MVP can be derived if a scaled MVP exists at B0, B1, or B2. Thus, after the derivation process of two spatial MVPs and one temporal MVP, only the first two MVPs are included in the candidate list. If the number of available MVPs is less than two after removing redundancy, zero vector candidates are added to the candidate list.
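A corresponding sketch of the AMVP list derivation described above is given below. The helper get_mvp(position) and the representation of an MVP as an (mvx, mvy) tuple are assumptions made for this example; the scaled-MVP condition for deriving a second top MVP is omitted for brevity.

```python
# Sketch of the AMVP candidate derivation described above. get_mvp(pos) is a
# hypothetical helper returning the MVP at a neighboring position, or None.

def build_amvp_list(get_mvp):
    # Left MVP: first available from A0, A1.
    left = next((m for m in (get_mvp("A0"), get_mvp("A1")) if m is not None), None)

    # Top MVP: first available from B0, B1, B2.
    top = next((m for m in (get_mvp("B0"), get_mvp("B1"), get_mvp("B2"))
                if m is not None), None)

    # Temporal MVP: T_BR first, then T_CTR.
    temporal = get_mvp("T_BR")
    if temporal is None:
        temporal = get_mvp("T_CTR")

    candidates = [m for m in (left, top, temporal) if m is not None]

    # Remove redundancy and keep at most the first two MVPs.
    pruned = []
    for cand in candidates:
        if cand not in pruned:
            pruned.append(cand)
    pruned = pruned[:2]

    # Pad with zero-vector candidates if fewer than two MVPs remain.
    while len(pruned) < 2:
        pruned.append((0, 0))

    return pruned
```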
For skip mode and merge mode, up to four spatial merge indices are derived from A0, A1, B0, and B1, and one temporal merge index is derived from TBR or TCTR (TBR is used first; if TBR is not available, TCTR is used instead). If any of the four spatial merge indices is not available, position B2 is used to derive a merge index as a replacement. After the four spatial merge indices and the one temporal merge index are derived, redundant merge indices are removed. If the number of non-redundant merge indices is less than five, additional candidates may be derived from the original candidates and added to the candidate list. There are three types of derived candidates:
1. Combined bi-directionally predicted merge candidate (derived candidate type 1)
2. Scaled bi-directionally predicted merge candidate (derived candidate type 2)
3. Zero vector merge/AMVP candidate (derived candidate type 3)
For derived candidate type 1, combined bi-directionally predicted merge candidates are created by combining original merge candidates. In particular, if the current slice is a B-slice, further merge candidates can be generated by combining candidates from list 0 and list 1. Fig. 2 illustrates a merge candidate list that includes combined bi-directionally predicted merge candidates. As illustrated, a bi-predictive merge candidate is created using two original candidates, one having mvL0 (the motion vector in list 0) and refIdxL0 (the reference picture index in list 0), and the other having mvL1 (the motion vector in list 1) and refIdxL1 (the reference picture index in list 1).
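The following sketch illustrates how a combined bi-directionally predicted candidate of this kind might be formed from two original candidates. The dictionary field names (mvL0, refIdxL0, mvL1, refIdxL1) mirror the labels used above, but the candidate representation is otherwise an assumption of this example.

```python
# Sketch of derived candidate type 1: a combined bi-predictive candidate formed by
# pairing the list-0 motion of one original candidate with the list-1 motion of
# another. Candidates are plain dicts here; this is not a normative data structure.

def combine_bi_predictive(cand_a, cand_b):
    """Pair cand_a's list-0 motion with cand_b's list-1 motion, if both exist."""
    if cand_a.get("mvL0") is None or cand_b.get("mvL1") is None:
        return None
    return {
        "mvL0": cand_a["mvL0"], "refIdxL0": cand_a["refIdxL0"],
        "mvL1": cand_b["mvL1"], "refIdxL1": cand_b["refIdxL1"],
    }

# Example: one original list-0 candidate and one original list-1 candidate.
cand0 = {"mvL0": (3, -1), "refIdxL0": 0, "mvL1": None, "refIdxL1": None}
cand1 = {"mvL0": None, "refIdxL0": None, "mvL1": (-2, 4), "refIdxL1": 0}
combined = combine_bi_predictive(cand0, cand1)
```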
For derived candidate type 2, scaled merge candidates are created by scaling original merge candidates. Fig. 3 illustrates a merge candidate list that includes scaled merge candidates. As illustrated, an original merge candidate has mvLX (the motion vector in list X, where X is 0 or 1) and refIdxLX (the reference picture index in list X). For example, original candidate A is a list 0 uni-directionally predicted MV with mvL0_A and reference picture index ref0. Candidate A is first copied to list L1 with reference picture index ref0'. The scaled MV mvL0'_A is calculated by scaling mvL0_A based on ref0 and ref0'. A scaled bi-predictive merge candidate having mvL0_A and ref0 in list L0 and mvL0'_A and ref0' in list L1 is created and added to the merge candidate list. Likewise, a scaled bi-predictive merge candidate having mvL1'_A and ref1' in list 0 and mvL1_A and ref1 in list 1 is created and added to the merge candidate list.
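The sketch below illustrates the scaling step for derived candidate type 2, assuming the usual scaling of a motion vector by the ratio of picture order count (POC) distances. The fixed-point rounding and clipping used in actual codecs are omitted, and the function and field names are illustrative assumptions.

```python
# Sketch of derived candidate type 2: a scaled bi-predictive candidate created from a
# uni-predictive list-0 candidate. The scale factor follows a simple POC-distance
# ratio; normative fixed-point arithmetic is omitted.

def scale_mv(mv, poc_cur, poc_ref_src, poc_ref_dst):
    """Scale mv from distance (poc_cur - poc_ref_src) to distance (poc_cur - poc_ref_dst)."""
    td = poc_cur - poc_ref_src          # distance of the original reference
    tb = poc_cur - poc_ref_dst          # distance of the target reference
    scale = tb / td                     # assumes td != 0
    return (round(mv[0] * scale), round(mv[1] * scale))

def make_scaled_bipred(mvL0_A, ref0_poc, ref0p_poc, cur_poc):
    """Copy a list-0 candidate to list 1 with a scaled MV (mvL0'_A)."""
    mvL0p_A = scale_mv(mvL0_A, cur_poc, ref0_poc, ref0p_poc)
    return {"mvL0": mvL0_A, "refPocL0": ref0_poc,
            "mvL1": mvL0p_A, "refPocL1": ref0p_poc}

# Example: current POC 8, original reference at POC 4, target reference at POC 12.
scaled = make_scaled_bipred((6, -2), ref0_poc=4, ref0p_poc=12, cur_poc=8)
```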
For derived candidate type 3, zero vector candidates are created by combining zero vectors with reference indices. If a created zero vector candidate is not a duplicate, it is added to the merge/AMVP candidate list. Fig. 4 illustrates an example of adding zero vector candidates to the merge candidate list or the AMVP candidate list.
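A minimal sketch of derived candidate type 3 follows; the candidate representation and the stopping conditions are illustrative assumptions rather than requirements of the disclosure.

```python
# Sketch of derived candidate type 3: zero-vector candidates combining a (0, 0) motion
# vector with increasing reference indices, added only when not already present.

def add_zero_vector_candidates(candidates, num_ref, max_candidates):
    for ref_idx in range(num_ref):
        if len(candidates) >= max_candidates:
            break
        zero_cand = {"mv": (0, 0), "refIdx": ref_idx}
        if zero_cand not in candidates:
            candidates.append(zero_cand)
    return candidates
```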
b. Intra block copy prediction
Intra Block Copy (IBC) is also known as Current Picture Referencing (CPR). IBC prediction is similar to inter prediction (inter mode), except that the reference picture providing the prediction is the currently decoded frame or current picture that includes the current block being encoded. An IBC (or CPR) motion vector is a motion vector that refers to reference samples already reconstructed in the current picture. In some embodiments, an IBC-coded CU is signaled as an inter-coded block. In other words, the currently (partially) decoded picture is treated as a reference picture. By referring to such a reference picture, the current block can be predicted from a reference block of the same picture in the same manner as motion compensation. For some embodiments, the differences between IBC-coded blocks and conventional Motion Compensation (MC) coded blocks include the following: (1) block vectors (the displacement vectors in IBC) have only integer resolution, so no interpolation is required for luma or chroma; (2) block vectors do not participate in temporal motion vector prediction; (3) block vectors and motion vectors are not used for inter prediction; and (4) a valid block vector is subject to constraints such that it can only point to a subset of the current picture. In some embodiments, to reduce implementation cost, the reference samples for IBC mode are taken from the already reconstructed portion of the current slice or tile and satisfy the wavefront parallel processing (WPP) conditions. In some embodiments, to reduce memory consumption and decoder complexity, the video codec allows only the reconstructed portion of the current CTU to be used for IBC mode. This restriction allows IBC mode to be implemented using local on-chip memory in a hardware implementation.
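The sketch below illustrates one possible form of the block-vector constraint mentioned in item (4) together with the current-CTU restriction above. It is a simplified check, not the normative rule set: the coordinate convention, the parameter names, and the approximation of "already reconstructed" as "not overlapping the current block" are assumptions of this example.

```python
# Simplified sketch of an IBC block-vector validity check: the reference block must
# lie inside the already reconstructed part of the current CTU. Real codecs impose
# further rules (e.g., reconstruction order, virtual buffer handling).

def is_valid_ibc_block_vector(bv, cur_x, cur_y, blk_w, blk_h, ctu_x, ctu_y, ctu_size):
    ref_x, ref_y = cur_x + bv[0], cur_y + bv[1]   # block vectors have integer resolution

    # Reference block must stay inside the current CTU.
    if ref_x < ctu_x or ref_y < ctu_y:
        return False
    if ref_x + blk_w > ctu_x + ctu_size or ref_y + blk_h > ctu_y + ctu_size:
        return False

    # Reference block must be reconstructed before the current block; here this is
    # approximated conservatively by requiring it to be fully left of or above the
    # current block (no overlap).
    no_overlap = (ref_x + blk_w <= cur_x) or (ref_y + blk_h <= cur_y)
    return no_overlap
```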
More detailed information on IBC prediction can be found in the following document: Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 11th Meeting: Ljubljana, SI, 10-18 July 2018, document JVET-K0076, "CE8-2.2: Current picture referencing using reference index signaling". More detailed information on IBC mode can also be found in the following document: X. Xu, S. Liu, T.-D. Chuang, Y.-W. Huang, S.-M. Lei, K. Rapaka, C. Pang, V. Seregin, Y.-K. Wang, and M. Karczewicz, "Intra Block Copy in HEVC Screen Content Coding Extensions," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 6, no. 4, pp. 409-419, 2016.
c. History-based motion vector prediction
In some embodiments, the motion information used to generate a hypothesis for inter prediction may be obtained by referring to previously coded motion information in a history-based scheme, referred to as history-based motion vector prediction (HMVP). An HMVP candidate is defined as the motion information of a previously coded block. The video codec maintains a table with multiple HMVP candidates during the encoding/decoding process. The table is cleared when a new slice is encountered.
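A minimal sketch of such a history table follows. The table size of six entries and the move-to-most-recent behavior for duplicates are illustrative assumptions rather than requirements of the disclosure.

```python
# Sketch of history-based (HMVP) table maintenance: a bounded FIFO of previously
# coded motion information, cleared at the start of each new slice.

class HmvpTable:
    def __init__(self, max_size=6):      # table size illustrative
        self.max_size = max_size
        self.table = []

    def reset(self):
        """Clear the table when a new slice is encountered."""
        self.table = []

    def update(self, motion_info):
        """Insert the motion information of the just-coded block (FIFO with pruning)."""
        if motion_info in self.table:
            self.table.remove(motion_info)   # move a duplicate to the most recent slot
        self.table.append(motion_info)
        if len(self.table) > self.max_size:
            self.table.pop(0)                # drop the oldest entry
```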
II. Simplified merge lists for IBC
For IBC merge mode, a merge candidate list is generated to include merge indices associated with only two of the plurality of coded spatially neighboring blocks of the current block. One merge candidate will be selected from the merge candidate list for decoding the current block.
In practice, when the merge candidate list is generated, several types of merge candidates are inserted into the merge list if such candidates exist. The types of merge candidates that may be inserted into the merge list include spatial merge candidates (i.e., merge indices associated with only two of the plurality of coded spatially neighboring blocks), temporal merge candidates, history-based (e.g., HMVP) merge candidates, pairwise average merge candidates, sub-CU merge candidates, and default merge candidates. A pruning process is also performed on the list.
In some embodiments, for IBC merge mode, the video codec simplifies the merge list construction by excluding some merge candidates or by reducing the pruning process. The reduced merge list construction may also be applied according to some constraints.
In some embodiments, for IBC mode, one or some or all merge candidates are excluded or omitted from the merge candidate list construction. In other words, the merge candidate list constructed for IBC mode has no merge candidates, only some merge candidates, or only a subset of the merge candidates that would be available in conventional inter-prediction merge mode. For example, in some embodiments, one or some or all of the spatial merge candidates that would be included in conventional inter-prediction merge mode are excluded or omitted from the merge candidate list construction for IBC prediction mode, so that the list has only a subset of spatial merge candidates compared to conventional (or non-IBC) merge mode.
Fig. 5 illustrates an example simplified merge candidate list for IBC mode when encoding the current block 100. As illustrated, only two spatial merge candidates, A1 and B1, serve as merge candidates for IBC mode, while the other spatial merge candidates A0, B0, and B2 are omitted, excluded, or not included in the merge candidate list. In other words, among all spatial neighbors coded before the current block 100 (i.e., the above neighbors B0, B1, and B2, and the left neighbors A1 and A0), only the above neighbor (B1) and the left neighbor (A1) are included. Even if the other spatial neighbors (B0, B2, A0) of the current block 100 were encoded before the current block 100, those spatial neighbors are not used as merge mode candidates for IBC.
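The sketch below illustrates this restricted spatial candidate selection. get_ibc_motion(position) is a hypothetical helper that returns the block vector of a neighboring position (or None if unavailable), and the position order used for the non-IBC case is only an example.

```python
# Sketch of the simplified spatial candidate selection of Fig. 5: in IBC merge mode
# only A1 and B1 are considered, while A0, B0, and B2 are skipped.

def build_ibc_spatial_candidates(get_ibc_motion, ibc_mode):
    positions = ["A1", "B1"] if ibc_mode else ["A1", "B1", "B0", "A0", "B2"]
    candidates = []
    for pos in positions:
        motion = get_ibc_motion(pos)
        if motion is not None and motion not in candidates:
            candidates.append(motion)
    return candidates
```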
In some embodiments, some or all history-based (e.g., HMVP) candidates are excluded or omitted from the merge candidate list construction. As mentioned, to implement HMVP, a video codec may maintain a motion history table that stores motion information for previously encoded blocks of a current slice. To generate a merge candidate list for encoding a current block in IBC prediction mode, the video codec may include only a subset of motion information stored in the motion history table in the merge candidate list for IBC mode.
In some embodiments, for IBC mode, one or some or all of the temporal merge candidates are excluded or omitted from the merge candidate list construction. In some embodiments, one or some or all of the pairwise averaged merge candidates are excluded or omitted from the merge candidate list construction. In some embodiments, one or some or all of the sub-CU merge candidates are excluded or omitted from the merge candidate list construction. In some embodiments, the default merge candidates are excluded or omitted from the merge candidate list construction.
In some embodiments, for IBC merge mode, the pruning process (redundancy removal process) is simplified or not performed for the merge candidate list construction. In some embodiments, pruning of spatial merge candidates is simplified or not performed in the merge candidate list construction. In some embodiments, pruning of temporal merge candidates is simplified or not performed in the merge candidate list construction. In some embodiments, pruning of pairwise averaged merge candidates is simplified or not performed in the merge candidate list construction. In some embodiments, pruning of sub-CU merge candidates is simplified or not performed in the merge candidate list construction. In some embodiments, pruning of default merge candidates is simplified or not performed in the merge candidate list construction. In some embodiments, pruning of history-based (e.g., HMVP) candidates is simplified or not performed in the merge candidate list construction. When the pruning process is simplified, only the first N HMVP candidates in the HMVP candidate list are compared to the merge candidate list (to detect or check for redundant candidates). In some embodiments, a compared HMVP candidate is added to the merge candidate list when the comparison result indicates that it differs from the candidates already in the merge candidate list. When no pruning is performed, an HMVP candidate is included in the merge candidate list without any prior comparison. In short, the pruning process for the various types of merge candidates described herein may be simplified or not performed. When the pruning process is not performed for any one of the various types of merge candidates, merge candidates of that type may be included in the merge candidate list without redundancy comparison. For some embodiments, this simplified pruning process does not conflict with the simplified generation of merge candidates, and the two may be performed in the same process.
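The following sketch illustrates one way the simplified HMVP pruning described above could behave, assuming simple list and table representations. Whether the uncompared HMVP candidates are appended without checking or skipped entirely is embodiment-dependent; appending is only the choice made for this example.

```python
# Sketch of simplified pruning: only the first N HMVP candidates are compared against
# the merge list; the remaining HMVP candidates are added without a redundancy check
# in this illustrative variant.

def append_hmvp_candidates(merge_list, hmvp_table, max_candidates, n_compared=1):
    for idx, hmvp in enumerate(hmvp_table):
        if len(merge_list) >= max_candidates:
            break
        if idx < n_compared:
            # Simplified pruning: compare only the first N HMVP candidates.
            if hmvp not in merge_list:
                merge_list.append(hmvp)
        else:
            # No pruning for the remaining candidates: add without comparison.
            merge_list.append(hmvp)
    return merge_list
```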
In some embodiments, for IBC merge mode, one or some merge candidates are excluded from the merge candidate list construction depending on the CU width or height. In other words, when generating the merge candidate list, the video codec determines which merge candidates to include or omit based on attributes of the current CU.
In some embodiments, one or some or all spatial merge candidates are excluded from the merge candidate list construction according to a certain CU width or height. In some embodiments, one or some or all of the temporal merging candidates are excluded or omitted from the merging candidate list construction, depending on a certain CU width or height. In some embodiments, one or some or all history-based (e.g., HMVP) merge candidates are excluded or omitted from the merge candidate list construction, depending on a certain CU width or height. In some embodiments, one or some or all of the pairwise averaged merge candidates are excluded or omitted from the merge candidate list construction, depending on a certain CU width or height. In some embodiments, one or some or all sub-CU merging candidates are excluded or omitted from the merging candidate list construction, depending on a certain CU width or height. In some embodiments, one or some or all default merge candidates are excluded or omitted from the merge candidate list construction, depending on a certain CU width or height. In some embodiments, the pruning process is simplified or not performed in the merge candidate list construction, depending on a certain CU width or height.
In some embodiments, in IBC merge mode, one or some candidates are excluded from the merge candidate list construction according to a certain CU area. In some embodiments, one or some or all spatial merge candidates are excluded from the merge candidate list construction, depending on a certain CU area. In some embodiments, one or some or all of the temporal merging candidates are excluded from the merging candidate list construction according to a certain CU area. In some embodiments, one or some or all history-based (e.g., HMVP) merge candidates are excluded from the merge candidate list construction, depending on a certain CU area. In some embodiments, one or some or all pairwise averaged merge candidates are excluded from the merge candidate list construction, depending on a certain CU area. In some embodiments, one or some or all sub-CU merging candidates are excluded from the merging candidate list construction according to a certain CU area. In some embodiments, the default merge candidate is excluded from the merge candidate list construction according to a certain CU area. In another embodiment, the pruning process is simplified or not performed in the merge candidate list construction depending on a certain CU area.
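The sketch below illustrates size-dependent list construction in the abstract: which candidate types enter the IBC merge list can depend on the CU width, height, or area. The thresholds and the particular candidate types gated by them are illustrative assumptions, not values taken from the disclosure.

```python
# Sketch of size-dependent candidate selection for the IBC merge list. The numeric
# thresholds (4, 16) are purely illustrative.

def candidate_types_for_cu(width, height):
    types = ["spatial"]
    area = width * height
    if area > 16:                      # illustrative area threshold
        types.append("history")        # HMVP candidates only for larger CUs
    if width > 4 and height > 4:       # illustrative width/height threshold
        types.append("pairwise")
    types.append("default")
    return types
```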
Any of the aforementioned proposed methods may be implemented in an encoder and/or decoder. For example, any of the proposed methods may be implemented in a predictor derivation module of an encoder and/or a predictor derivation module of a decoder. Alternatively, any of the proposed methods may be implemented as circuitry coupled to a predictor derivation module of an encoder and/or a predictor derivation module of a decoder, providing the information required by the predictor derivation module.
III. Example video encoder
Fig. 6 illustrates an example video encoder 600 that can encode a block using a simplified merge list construction in IBC mode. As illustrated, the video encoder 600 receives an input video signal from the video source 605 and encodes the signal into a bitstream 695. The video encoder 600 has several components or modules for encoding the signal from the video source 605, including at least some components selected from the transform module 610, the quantization module 611, the inverse quantization module 614, the inverse transform module 615, the intra-picture estimation module 620, the intra prediction module 625, the motion compensation module 630, the motion estimation module 635, the loop filter 645, the reconstructed picture buffer 650, the MV buffer 665, the MV prediction module 675, and the entropy encoder 690. The motion compensation module 630 and the motion estimation module 635 are part of the inter prediction module 640.
In some implementations, the modules 610-690 are modules of software instructions executed by one or more processing units (e.g., processors) of a computing device or electronic device. In some implementations, the modules 610-690 are modules of hardware circuitry implemented by one or more Integrated Circuits (ICs) of an electronic device. Although the modules 610-690 are illustrated as separate modules, some of the modules may be combined into a single module.
The video source 605 provides a raw video signal that presents the pixel data of each video frame without compression. The subtractor 608 computes the difference between the raw video pixel data of the video source 605 and the predicted pixel data 613 from the motion compensation module 630 or the intra prediction module 625. The transform module 610 converts the difference (or the residual pixel data or residual signal 609) into transform coefficients (e.g., by performing a Discrete Cosine Transform, or DCT). The quantization module 611 quantizes the transform coefficients into quantized data (or quantized coefficients) 612, which is encoded into the bitstream 695 by the entropy encoder 690.
The inverse quantization module 614 inverse quantizes the quantized data (or quantized coefficients) 612 to obtain transform coefficients, and the inverse transform module 615 performs an inverse transform on the transform coefficients to produce the reconstructed residual 619. The reconstructed residual 619 is added to the predicted pixel data 613 to produce reconstructed pixel data 617. In some embodiments, the reconstructed pixel data 617 is temporarily stored in a line buffer (not shown) for intra-picture prediction and spatial MV prediction. The reconstructed pixels are filtered by the loop filter 645 and stored in the reconstructed picture buffer 650. In some implementations, the reconstructed picture buffer 650 is a storage external to the video encoder 600. In some implementations, the reconstructed picture buffer 650 is a storage internal to the video encoder 600.
The intra-picture estimation module 620 performs intra prediction based on the reconstructed pixel data 617 to generate intra prediction data. The intra prediction data is provided to the entropy encoder 690 to be encoded into the bitstream 695. The intra prediction data is also used by the intra prediction module 625 to generate the predicted pixel data 613.
The motion estimation module 635 performs inter prediction by generating MVs that reference the pixel data of previously decoded frames stored in the reconstructed picture buffer 650. These MVs are provided to the motion compensation module 630 to generate predicted pixel data.
Instead of encoding the complete actual MVs in the bitstream, the video encoder 600 uses MV prediction to generate predicted MVs, and the difference between the MVs used for motion compensation and the predicted MVs is encoded as residual motion data and stored in the bitstream 695.
The MV prediction module 675 generates predicted MVs based on reference MVs that were generated for encoding previous video frames, i.e., the motion compensated MVs that were used to perform motion compensation. The MV prediction module 675 retrieves the reference MVs of previous video frames from the MV buffer 665. The video encoder 600 stores the MVs generated for the current video frame in the MV buffer 665 as reference MVs for generating predicted MVs.
The MV prediction module 675 uses the reference MVs to create the predicted MVs. The predicted MVs may be computed by spatial MV prediction or temporal MV prediction. The entropy encoder 690 encodes the difference (residual motion data) between the predicted MVs and the motion compensated MVs (MC MVs) of the current frame into the bitstream 695.
The entropy encoder 690 encodes various parameters and data into the bitstream 695 by using entropy coding techniques such as Context Adaptive Binary Arithmetic Coding (CABAC) or Huffman coding. The entropy encoder 690 encodes various header elements and flags, together with the quantized transform coefficients 612 and the residual motion data, as syntax elements into the bitstream 695. The bitstream 695 is in turn stored in a storage device or transmitted to a decoder over a communication medium such as a network.
The loop filter 645 performs a filtering or smoothing operation on the reconstructed pixel data 617 to reduce coding artifacts, particularly at the boundaries of blocks. In some embodiments, the filtering operation performed includes Sample Adaptive Offset (SAO). In some embodiments, the filtering operation includes an Adaptive Loop Filter (ALF).
To implement IBC mode, the motion estimation module 635 may search the encoded portion of the current picture stored in the reconstructed picture buffer 650 to determine motion vectors, and corresponding motion information, that reference pixels of the current picture. The motion compensation module 630 may implement the merge candidate list based on motion information stored in the MV buffer 665, which includes the motion information of spatial neighbors of the current block (coded before the current block). When the current block is encoded using IBC mode, the merge candidate list may include some but not all spatial neighbors of the current block as spatial merge candidates. The video encoder 600 may also apply simplified pruning to the merge candidate list.
FIG. 7 conceptually illustrates a process 700 for encoding a current block by using a simplified merge candidate list for IBC mode. In some implementations, one or more processing units (e.g., processors) of a computing device implementing the encoder 600 perform the process 700 by executing instructions stored in a computer-readable medium. In some implementations, an electronic device implementing the encoder 600 performs the process 700. In some implementations, the process 700 is performed at the inter prediction module 640.
The encoder receives (at 710) original pixel data for a block to be encoded as a current block of a current picture of video. Two or more spatially adjacent neighboring blocks of the current block are coded before the current block. In the example of FIG. 1, the spatial neighbors A0, A1, B0, B1, and B2, which are PUs or CUs above and/or to the left of the current block, were encoded before the current block 100.
The encoder generates (at 720) a merge candidate list. The merge candidate list may include spatial merge candidates, temporal merge candidates, history-based (e.g., HMVP) merge candidates, pairwise-average merge candidates, sub-CU merge candidates, and/or default merge candidates. The encoder may determine which merge candidate to include in the list based on the properties (e.g., size, width, height, aspect ratio) of the current block.
A pruning process is also performed on the list. The pruning process may be simplified such that pruning is not performed for certain types of merge candidates. For example, in some embodiments, pruning of history-based (e.g., HMVP) candidates is simplified or not performed in the merge candidate list construction. When the pruning process is simplified, only the first N HMVP candidates in the HMVP candidate list are compared to the merge candidate list. In some embodiments, N is equal to 1; in other words, in this embodiment, no more than one HMVP candidate is compared to the merge candidate list. Then, for example, when the comparison result indicates that the compared HMVP candidate is different from the candidates already in the merge candidate list, the compared HMVP candidate is added to the merge candidate list. When no pruning is performed, an HMVP candidate is included in the merge candidate list without any prior comparison. In short, the pruning process for the various types of merge candidates described herein may be simplified or not performed. When the pruning process is not performed for any one of the various types of merge candidates, merge candidates of that type may be included in the merge candidate list without redundancy comparison (comparing candidates to identify redundancies).
Since the merge candidate list is generated for IBC mode, the list includes intra picture candidates associated with motion information that references pixels in the current picture. In some implementations, the intra picture candidates include candidates associated with some, but not all, of the two or more spatially neighboring blocks of the current block. For example, the intra picture candidates of the merge candidate list may include only the spatial neighbors A1 and B1, and not the spatial neighbors A0, B0, and B2. In other words, some but not all spatial merge candidates of the current block are included in the merge candidate list for IBC.
The encoder selects (at 730) a merge candidate from the generated list, e.g., by generating an index to be included as a syntax element in the bitstream 695. The encoder then encodes (at 740) the current block by using the motion information of the selected merge candidate to generate a prediction of the current block.
IV. Example video decoder
Fig. 8 illustrates an example video decoder 800 that may decode a block using a simplified merge list construction in IBC mode. As illustrated, the video decoder 800 is an image decoding or video decoding circuit that receives the bitstream 895 and decodes the content of the bitstream into pixel data of video frames for display. The video decoder 800 has several components or modules for decoding the bitstream 895, including some components selected from the inverse quantization module 805, the inverse transform module 810, the intra prediction module 825, the motion compensation module 830, the loop filter 845, the decoded picture buffer 850, the MV buffer 865, the MV prediction module 875, and the parser 890. The motion compensation module 830 is part of the inter prediction module 840.
In some implementations, the modules 810-890 are modules of software instructions that are executed by one or more processing units (e.g., processors) of a computing device. In some implementations, the modules 810-890 are modules of hardware circuitry implemented by one or more ICs of an electronic device. Although the modules 810-890 are illustrated as separate modules, some of the modules may be combined into a single module.
The parser 890 (or entropy decoder) receives the bitstream 895 and performs initial parsing according to the syntax defined by a video coding or image coding standard. The parsed syntax elements include various header elements, flags, and quantized data (or quantized coefficients) 812. The parser 890 parses out the various syntax elements by using entropy coding techniques such as Context Adaptive Binary Arithmetic Coding (CABAC) or Huffman coding.
The inverse quantization module 805 inverse quantizes the quantized data (or quantized coefficients) 812 to obtain transform coefficients, and the inverse transform module 810 performs an inverse transform on the transform coefficients 816 to produce the reconstructed residual signal 819. The reconstructed residual signal 819 is added to the predicted pixel data 813 from either the intra prediction module 825 or the motion compensation module 830 to produce decoded pixel data 817. The decoded pixel data is filtered by the loop filter 845 and stored in the decoded picture buffer 850. In some implementations, the decoded picture buffer 850 is a storage external to the video decoder 800. In some implementations, the decoded picture buffer 850 is a storage internal to the video decoder 800.
The intra prediction module 825 receives intra prediction data from the bitstream 895 and, based on it, generates the predicted pixel data 813 from the decoded pixel data 817 stored in the decoded picture buffer 850. In some embodiments, the decoded pixel data 817 is also stored in a line buffer (not shown) for intra-picture prediction and spatial MV prediction.
In some implementations, the contents of the decoded picture buffer 850 are used for display. The display device 855 either retrieves the contents of the decoded picture buffer 850 for direct display or retrieves the contents of the decoded picture buffer to a display buffer. In some implementations, the display device receives pixel values from the decoded picture buffer 850 via pixel transfer.
The motion compensation module 830 generates the predicted pixel data 813 from the decoded pixel data 817 stored in the decoded picture buffer 850 based on motion compensated MVs (MC MVs). These motion compensated MVs are decoded by adding the residual motion data received from the bitstream 895 to the predicted MVs received from the MV prediction module 875.
The MV prediction module 875 generates predicted MVs based on reference MVs that were generated for decoding previous video frames (e.g., the motion compensated MVs used to perform motion compensation). The MV prediction module 875 retrieves the reference MVs of previous video frames from the MV buffer 865. The video decoder 800 stores the motion compensated MVs generated for decoding the current video frame in the MV buffer 865 as reference MVs for generating predicted MVs.
The loop filter 845 performs a filtering or smoothing operation on the decoded pixel data 817 to reduce coding artifacts, particularly at block boundaries. In some embodiments, the filtering operation performed includes Sample Adaptive Offset (SAO). In some embodiments, the filtering operation includes an Adaptive Loop Filter (ALF).
To implement IBC mode, the motion compensation module 830 may implement a merge candidate list that includes intra picture candidates associated with motion information that references pixels in the current picture. Based on the motion information stored in the MV buffer 865, the merge candidate list includes motion information of spatial neighbors of the current block (coded before the current block). When the current block is decoded using IBC mode, the merge candidate list may include some but not all spatial neighbors of the current block as spatial merge candidates. The video decoder 800 may also apply simplified pruning to the merge candidate list.
Figure 9 conceptually illustrates a process 900 for decoding a current block by using a simplified merge candidate list for IBC mode. In some implementations, one or more processing units (e.g., processors) of a computing device implementing the decoder 800 perform the process 900 by executing instructions stored in a computer-readable medium. In some implementations, the process 900 is performed by an electronic device implementing the decoder 800. In some implementations, the process 900 is performed at the inter prediction module 840.
The decoder receives (at 910) data from a bitstream for a block to be decoded as a current block of a current picture of video. Two or more spatially adjacent neighboring blocks of the current block are coded before the current block. In the example of FIG. 1, the spatial neighbors A0, A1, B0, B1, and B2 were encoded before the current block 100.
The decoder generates (at 920) a merge candidate list. The merge candidate list may include spatial merge candidates, temporal merge candidates, history-based (e.g., HMVP) merge candidates, pairwise-average merge candidates, sub-CU merge candidates, and/or default merge candidates. The decoder may determine which merge candidate to include in the list based on the properties (e.g., size, width, height, aspect ratio) of the current block.
A pruning process is also performed on the list. The pruning process may be simplified such that at least one redundant candidate in the merge candidate list is not removed. The pruning process may also be simplified such that pruning is not performed for certain types of merge candidates. For example, in some embodiments, the simplified pruning process may not remove redundancy related to HMVP candidates.
Since the merge candidate list is generated for IBC mode, the list includes intra picture candidates associated with motion information that references pixels in the current picture. In some implementations, the intra picture candidates include candidates associated with some, but not all, of the two or more spatially neighboring blocks of the current block. For example, the intra picture candidates of the merge candidate list may include only the spatial neighbors A1 and B1, and not the spatial neighbors A0, B0, and B2. In other words, some but not all spatial merge candidates of the current block are included in the merge candidate list for IBC.
In some embodiments, some merge candidates available for the merge mode are not included in the merge candidate list for the IBC mode. For example, in some embodiments, at least one HMVP candidate stored for the current slice is not included in the merge candidate list for the current block of IBC mode.
The decoder selects (at 930) a merge candidate from the generated list, e.g., based on an index provided by a syntax element parsed from the bitstream 895. The decoder then decodes (at 940) the current block by using the motion information of the selected merge candidate to generate a prediction of the current block.
V. Example electronic system
Many of the above-described features and applications are implemented as software processes that are specified as a set of instructions recorded on a computer readable storage medium (also referred to as computer readable medium). When executed by one or more computing or processing units (e.g., one or more processors, cores of processors, or other processing units), these instructions cause the processing unit to perform the actions indicated in the instructions. Examples of computer-readable media include, but are not limited to, CD-ROM, flash drives, Random Access Memory (RAM) chips, hard drives, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like. Computer-readable media do not contain carrier waves or electronic signals that are communicated wirelessly or through a wired connection.
In this specification, the term "software" is intended to include firmware residing in read-only memory or applications stored in magnetic storage that can be read into memory for processing by a processor. Moreover, in some embodiments, multiple software inventions may be implemented as sub-parts of a larger program while remaining distinct software inventions. In some embodiments, multiple software inventions may also be implemented as a single program. Finally, any combination of separate programs that together implement the software invention described herein is within the scope of the present disclosure. In some embodiments, a software program defines one or more specific machine implementations that perform the operations of the software program when installed for execution on one or more electronic systems.
Figure 10 conceptually illustrates an electronic system 1000 with which some embodiments of the present disclosure are implemented. The electronic system 1000 may be a computer (e.g., desktop computer, personal computer, tablet computer, etc.), a telephone, a PDA, or any other kind of electronic device. Such an electronic system includes various types of computer-readable media and interfaces for various other types of computer-readable media. The electronic system 1000 includes a bus 1005, a processing unit 1010, a Graphics Processing Unit (GPU) 1015, a system memory 1020, a network 1025, a read-only memory 1030, a permanent storage device 1035, input devices 1040, and output devices 1045.
The bus 1005 generally represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the electronic system 1000. For example, the bus 1005 communicatively connects the processing unit 1010 with the GPU 1015, the read-only memory 1030, the system memory 1020, and the permanent storage device 1035.
The processing unit 1010 retrieves instructions to execute and data to process from these various memory units in order to perform the processes of the present disclosure. In different embodiments, the processing unit may be a single processor or a multi-core processor. Some instructions are passed to and executed by the GPU 1015. The GPU 1015 may offload various computations or supplement the image processing provided by the processing unit 1010.
The Read Only Memory (ROM) 1030 stores static data and instructions for use by the processing unit 1010 and other modules of the electronic system. The permanent storage device 1035, on the other hand, is a read-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the electronic system 1000 is turned off. Some embodiments of the present disclosure use a mass storage device (e.g., a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 1035.
Other embodiments use a removable storage device (e.g., a floppy disk, flash memory device, etc., and its corresponding disk drive) as the permanent storage device. Like the permanent storage device 1035, the system memory 1020 is a read-write memory device. Unlike the storage device 1035, however, the system memory 1020 is a volatile read-and-write memory, such as a random access memory. The system memory 1020 stores some of the instructions and data used by the processor during runtime. In some embodiments, processes according to the present disclosure are stored in the system memory 1020, the permanent storage device 1035, and/or the read-only memory 1030. For example, according to some embodiments, the various memory units include instructions for processing a multimedia clip. The processing unit 1010 retrieves instructions to be executed and data to be processed from these various memory units to perform the processes of some embodiments.
The bus 1005 is also connected to the input devices 1040 and the output devices 1045. The input devices 1040 enable a user to communicate information and select commands to the electronic system. The input devices 1040 include an alphanumeric keyboard and a pointing device (also referred to as a "cursor control device"), a camera (e.g., a webcam), a microphone or similar device for receiving voice commands, and so forth. The output devices 1045 display images or otherwise output data generated by the electronic system. The output devices 1045 include a printer, a display device such as a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), and speakers or similar audio output devices. Some embodiments include a device, such as a touch screen, that functions as both an input device and an output device.
Finally, as shown in FIG. 10, the bus 1005 also couples the electronic system 1000 to a network 1025 through a network adapter (not shown). In this manner, the computer may be part of a network of computers (e.g., a local area network ("LAN"), a wide area network ("WAN"), or an intranet), or a network of networks, such as the Internet. Any or all of the components of the electronic system 1000 may be used in conjunction with the present disclosure.
Some embodiments include electronic components, such as microprocessors, storage, and memory, that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as a computer-readable storage medium, machine-readable medium, or machine-readable storage medium). Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM), recordable compact discs (CD-R), rewritable compact discs (CD-RW), read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), various recordable/rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc.), flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc.), magnetic and/or solid state disk drives, read-only and recordable Blu-Ray discs, ultra-high density optical discs, any other optical or magnetic medium, and floppy disks. The computer-readable medium may store a computer program that is executable by at least one processing unit and that includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as produced by a compiler, and files including higher level code that are executed by a computer, electronic component, or microprocessor using an interpreter.
Although the above discussion has primarily referred to microprocessor or multi-core processors executing software, many of the above features and applications are performed by one or more integrated circuits, such as Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs). In some implementations, such integrated circuits execute instructions stored on the circuit itself. In addition, some embodiments execute software stored in a Programmable Logic Device (PLD), ROM, or RAM device.
As used in this specification and any claims of this application, the terms "computer," "server," "processor," and "memory" all refer to electronic or other technical devices. These terms do not include a person or group of persons. For purposes of this description, the term "display" refers to displaying on an electronic device. As used in this specification and any claims of this application, the terms "computer-readable medium" and "machine-readable medium" are entirely limited to tangible physical objects that store information in a form that can be read by a computer. These terms do not include any wireless signals, wired download signals, and any other temporary signals.
Although the present disclosure has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the present disclosure may be embodied in other specific forms without departing from the spirit of the present disclosure. In addition, a number of figures (including FIGS. 7 and 9) conceptually illustrate processes. The specific operations of these processes may not be performed in the exact order shown and described. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments. Further, a process may be implemented using several sub-processes, or as part of a larger macro process. Accordingly, one of ordinary skill in the art will understand that the present disclosure is not limited by the foregoing illustrative details, but rather is defined by the appended claims.
Supplementary notes
The subject matter described herein sometimes illustrates different components contained within, or connected with, different other components. It is to be understood that such depicted architectures are merely examples, and that in fact many other architectures can be implemented which achieve the same functionality. Conceptually, any arrangement of components to achieve the same functionality is effectively "associated" such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as "associated with" each other such that the desired functionality is achieved, irrespective of architectures or intermedial components. Likewise, any two components so associated can also be viewed as being "operably connected," or "operably coupled," to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being "operably couplable," to each other to achieve the desired functionality. Specific examples of operatively couplable include but are not limited to physically mateable and/or physically interacting components and/or wirelessly interactable and/or wirelessly interacting components and/or logically interacting and/or logically interactable components.
Furthermore, with respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. Various singular/plural permutations may be expressly set forth herein for the sake of clarity.
Furthermore, those skilled in the art will understand that, in general, terms used herein, especially in the appended claims (e.g., bodies of the appended claims) are generally intended as "open" terms, e.g., the term "including" should be interpreted as "including but not limited to," the term "having" should be interpreted as "having at least," the term "includes" should be interpreted as "includes but is not limited to," etc. It will be further understood by those within the art that if a specific number of an introduced claim recitation (recitations) is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases "at least one" and "one or more" to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles "a" or "an" limits any particular claim containing such introduced claim recitation to implementations containing only one such recitation, even when the same claim includes the introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an" (e.g., "a" and/or "an" should be interpreted to mean "at least one" or "one or more"); the same is true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of "two recitations," without other modifiers, means at least two recitations, or two or more recitations). Further, where a convention analogous to "A, B and at least one of C, etc." is used, in general such a construction is intended to mean what one of skill in the art would understand as that convention (e.g., "a system having at least one of A, B and C" would include, but is not limited to, systems that include A alone, B alone, C, A alone, B, A and C, B and C and/or A, B and C, etc.). Where a convention analogous to "A, B or at least one of C, etc." is used, in general such a structure is intended to mean the meaning of the convention as would be understood by a person skilled in the art (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that include a alone, B alone, C, A alone and B, A and C, B and C and/or A, B and C, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase "a or B" should be understood to include the possibility of "a" or "B" or "a and B".
From the foregoing, it will be appreciated that various implementations of the disclosure have been described herein for purposes of illustration, and that various modifications may be made without deviating from the scope and spirit of the disclosure. Accordingly, the various implementations disclosed herein are not intended to be limiting, with the true scope and spirit being indicated by the following claims.