HK1261263A1 - Coding of transform coefficients for video coding


Info

Publication number
HK1261263A1
Authority
HK
Hong Kong
Prior art keywords
transform coefficients
subset
scan
coding
block
Prior art date
Application number
HK19121159.8A
Other languages
Chinese (zh)
Other versions
HK1261263B (en)
Inventor
Joel Sole Rojas
Rajan Laxman Joshi
Marta Karczewicz
Original Assignee
QUALCOMM Incorporated
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by QUALCOMM Incorporated
Publication of HK1261263A1
Publication of HK1261263B


Description

Coding of transform coefficients for video coding
The present application claims the benefit of U.S. Provisional Application No. 61/450,555, filed March 8, 2011, U.S. Provisional Application No. 61/451,485, filed March 10, 2011, U.S. Provisional Application No. 61/451,496, filed March 10, 2011, U.S. Provisional Application No. 61/452,384, filed March 14, 2011, U.S. Provisional Application No. 61/494,855, filed June 8, 2011, and U.S. Provisional Application No. 61/497,345, filed June 15, 2011, each of which is incorporated herein by reference in its entirety.
Related information of divisional application
The present application is a divisional application. The parent of this division is an invention patent application with a filing date of March 7, 2012, application number 201280015368.4, entitled "Coding of transform coefficients for video coding".
Technical Field
This disclosure relates to video coding, and more particularly, to techniques for scanning and coding transform coefficients generated by a video coding process.
Background
Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, Personal Digital Assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, digital media players, video gaming devices, video gaming consoles, cellular or satellite radio telephones, video teleconferencing devices, and the like. Digital video devices implement video compression techniques, such as those described in standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 part 10, Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standard currently being developed, and extensions to such standards, in order to more efficiently transmit, receive, and store digital video information.
Video compression techniques include spatial prediction and/or temporal prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video frame or slice may be partitioned into blocks. Each block may be further partitioned. Blocks in an intra-coded (I) frame or slice are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same frame or slice. Blocks in an inter-coded (P or B) frame or slice may use spatial prediction with respect to reference samples in neighboring blocks in the same frame or slice, or temporal prediction with respect to reference samples in other reference frames. Spatial or temporal prediction generates a predictive block for a block to be coded. The residual data represents pixel differences between the original block to be coded and the predictive block.
An inter-coded block is encoded according to a motion vector that points to a block of reference samples that forms a predictive block and residual data that indicates a difference between the coded block and the predictive block. The intra-coded block is encoded according to an intra-coding mode and residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, producing residual transform coefficients, which may then be quantized. Quantized transform coefficients initially arranged in a two-dimensional array may be scanned in a particular order to generate one-dimensional vectors of transform coefficients for entropy coding.
Disclosure of Invention
In general, devices and methods are described for coding transform coefficients associated with a block of residual video data in a video coding process. The techniques, structures, and methods described in this disclosure may be applicable to video coding processes that code transform coefficients using entropy coding, such as Context Adaptive Binary Arithmetic Coding (CABAC). Aspects of this disclosure include selecting a scanning order for both significance map coding and level and sign coding, and selecting contexts for entropy coding consistent with the selected scanning order. The techniques, structures, and methods of this disclosure may be applicable for use in both video encoders and video decoders.
This disclosure proposes the coordination of the coding of significance maps of transform coefficients and the scanning order of coding the levels of transform coefficients. That is, in some examples, the scan orders used for significance map and level coding should have the same pattern and direction. In another example, it is proposed that the scan order for the significance map should be in the reverse direction (i.e., from coefficients of higher frequencies to coefficients of lower frequencies). In yet another example, it is proposed that the scanning order for significance map and level coding should be coordinated so that each proceeds in the reverse direction.
This disclosure also proposes, in some examples, scanning the transform coefficients in subsets. In particular, the transform coefficients are scanned in subsets consisting of a number of consecutive coefficients according to the scan order. These subsets may be used both for the significance map scan and for the coefficient level scans.
Furthermore, this disclosure proposes that, in some examples, the significance map and coefficient level scans are performed in consecutive scan passes and according to the same scan order. In one aspect, the scan order is an inverse scan order. The consecutive scan may consist of several scan passes, and each scan pass may scan one syntax element. For example, the first pass is the significance map scan (also referred to as bin 0 of the transform coefficient levels), the second pass scans bin 1 of the levels in each subset, the third pass may scan bin 2 of the levels in each subset, the fourth pass scans the remaining bins of the levels, and the fifth pass scans the sign of the levels. The sign pass may occur at any point after the significance map pass. Moreover, the number of scan passes may be reduced by coding more than one syntax element per pass. For example, one scan pass may cover the syntax elements coded with context-coded bins, and a second scan pass may cover the syntax elements coded with bypass bins (e.g., the remaining levels and the signs). In this context, a bin is one position in the binary string produced for entropy coding: a given non-binary-valued syntax element is mapped to a binary sequence (a so-called bin string).
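As a rough illustration of the pass structure just described, the following sketch (a hypothetical helper, not the normative HEVC process) derives the per-pass syntax elements for one subset of coefficient levels that is already ordered according to the shared inverse scan:

```python
def scan_subset_passes(levels):
    """Derive per-pass syntax elements for one subset of coefficient levels.

    levels: list of signed quantized levels for one subset, already ordered
    according to the shared (inverse) scan order. Each pass walks the same
    subset in the same order and emits one kind of syntax element.
    """
    sig = [1 if c != 0 else 0 for c in levels]                    # pass 1: bin 0 (significance)
    gt1 = [1 if abs(c) > 1 else 0 for c in levels if c]           # pass 2: bin 1 (|level| > 1)
    gt2 = [1 if abs(c) > 2 else 0 for c in levels if abs(c) > 1]  # pass 3: bin 2 (|level| > 2)
    rem = [abs(c) - 3 for c in levels if abs(c) > 2]              # pass 4: remaining level
    sgn = [1 if c < 0 else 0 for c in levels if c]                # pass 5: sign
    return {"sig": sig, "gt1": gt1, "gt2": gt2, "rem": rem, "sgn": sgn}
```

The bypass-coded passes (remaining levels and signs) could then be grouped into one pass, as the text suggests, without changing the derived values.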
This disclosure also proposes, in some examples, entropy coding transform coefficients using CABAC in two different context regions. The context derivation for the first context region depends on the position of the transform coefficients, while the context derivation for the second region depends on causal neighbors of the transform coefficients. In another example, the second context region may use two different context models, depending on the location of the transform coefficients.
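A minimal sketch of such a two-region context derivation follows. The region threshold, the causal-neighbor pattern, and the concrete context indices are illustrative assumptions chosen for the example, not values from any standard:

```python
def significance_context(x, y, sig_map, region_threshold=2):
    """Derive a context index for the significance flag at position (x, y).

    Region 1 (low-frequency positions): context depends on the position itself.
    Region 2 (all other positions): context depends on already-coded causal
    neighbors (to the right and below, consistent with an inverse scan).
    Threshold, neighbor set, and index values are illustrative assumptions.
    """
    if x + y < region_threshold:
        return x + 2 * y                      # position-derived context
    h, w = len(sig_map), len(sig_map[0])
    neighbors = [(x + 1, y), (x, y + 1), (x + 1, y + 1)]
    count = sum(sig_map[ny][nx] for nx, ny in neighbors if nx < w and ny < h)
    return 10 + min(count, 2)                 # neighbor-derived context
```

Using two context models within the second region, selected by position as the text mentions, would simply add a second offset alongside the neighbor count.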
In one example of this disclosure, a method of coding transform coefficients associated with residual video data in a video coding process is presented. The method includes arranging a block of transform coefficients into one or more subsets of transform coefficients based on a scanning order, coding a first portion of levels of transform coefficients in each subset, wherein the first portion of levels includes at least significance of transform coefficients in each subset, and coding a second portion of levels of transform coefficients in each subset.
In another example of this disclosure, a system for coding a plurality of transform coefficients associated with residual video data in a video coding process is presented. The system includes a video coding unit configured to arrange a block of transform coefficients into one or more subsets of transform coefficients based on a scanning order, code a first portion of levels of transform coefficients in each subset, wherein the first portion of levels includes at least significance of transform coefficients in each subset, and code a second portion of levels of transform coefficients in each subset.
In another example of this disclosure, a system for coding a plurality of transform coefficients associated with residual video data in a video coding process is presented. The system comprises means for arranging a block of transform coefficients into one or more subsets of transform coefficients based on a scanning order, means for coding a first portion of levels of transform coefficients in each subset, wherein the first portion of levels includes at least significance of transform coefficients in each subset, and means for coding a second portion of levels of transform coefficients in each subset.
In another example of this disclosure, a computer program product comprises a computer-readable storage medium having stored thereon instructions that, when executed, cause a processor of a device for coding transform coefficients associated with residual video data in a video coding process to: arranging a block of transform coefficients into one or more subsets of transform coefficients based on a scanning order, coding a first portion of levels of transform coefficients in each subset, wherein the first portion of levels includes at least significance of transform coefficients in each subset, and coding a second portion of levels of transform coefficients in each subset.
The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
Fig. 1 is a conceptual diagram illustrating a significance map coding process.
Fig. 2 is a conceptual diagram illustrating scan patterns and directions for significance map coding.
Fig. 3 is a conceptual diagram illustrating a scanning technique for level coding of transform units.
Fig. 4 is a block diagram illustrating an example video coding system.
Fig. 5 is a block diagram illustrating an example video encoder.
Fig. 6 is a conceptual diagram illustrating an inverse scan order for significance map and coefficient level coding.
Fig. 7 is a conceptual diagram illustrating a first subset of transform coefficients according to an inverse diagonal scan order.
Fig. 8 is a conceptual diagram illustrating a first subset of transform coefficients according to an inverse horizontal scan order.
Fig. 9 is a conceptual diagram illustrating a first subset of transform coefficients according to an inverse vertical scan order.
Fig. 10 is a conceptual diagram illustrating context regions for significance map coding.
Fig. 11 is a conceptual diagram illustrating example context regions for significance map coding using an inverse scan order.
Fig. 12 is a conceptual diagram illustrating example causal neighbors for entropy coding using forward scan order.
Fig. 13 is a conceptual diagram illustrating example causal neighbors for entropy coding using an inverse scan order.
Fig. 14 is a conceptual diagram illustrating example context regions for entropy coding using inverse scan order.
Fig. 15 is a conceptual diagram illustrating example causal neighbors for entropy coding using an inverse scan order.
Fig. 16 is a conceptual diagram illustrating another example of a context region for CABAC using an inverse scan order.
Fig. 17 is a conceptual diagram illustrating another example of a context region for CABAC using an inverse scan order.
Fig. 18 is a conceptual diagram illustrating another example of a context region for CABAC using an inverse scan order.
Fig. 19 is a block diagram illustrating an example entropy coding unit.
Fig. 20 is a block diagram illustrating an example video decoder.
Fig. 21 is a block diagram illustrating an example entropy decoding unit.
Fig. 22 is a flow diagram illustrating an example process for significance map and coefficient level scanning using a coordinated scanning order.
Fig. 23 is a flow diagram illustrating an example process for significance map and coefficient level scanning and entropy coding context derivation.
Fig. 24 is a flow diagram illustrating another example process for significance map and coefficient level scanning and entropy coding context derivation.
Fig. 25 is a flow diagram illustrating another example process for significance map and coefficient level scanning and entropy coding context derivation.
Fig. 26 is a flow diagram illustrating an example process for significance map coding using an inverse scan direction.
Fig. 27 is a flow diagram illustrating an example process for significance map and coefficient level scanning according to a subset of transform coefficients.
Fig. 28 is a flow diagram illustrating another example process for significance map and coefficient level scanning in accordance with a subset of transform coefficients.
Fig. 29 is a flow diagram illustrating another example process for significance map and coefficient level scanning according to a subset of transform coefficients.
Fig. 30 is a flow diagram illustrating an example process for entropy coding using multiple regions.
Detailed Description
Digital video devices implement video compression techniques to transmit and receive digital video information more efficiently. Video compression may apply spatial (intra) prediction and/or temporal (inter) prediction techniques to reduce or remove redundancy inherent in video sequences.
As one example, for video coding according to the High Efficiency Video Coding (HEVC) standard currently being developed by the Joint Collaborative Team on Video Coding (JCT-VC), video frames may be partitioned into coding units. A coding unit generally refers to an image region serving as a basic unit to which various coding tools are applied for video compression. Coding units are typically, but not necessarily, square and may be considered similar to so-called macroblocks, e.g., according to other video coding standards such as ITU-T H.264. Coding in accordance with some presently proposed aspects of the HEVC standard being developed will be described in this application for purposes of illustration. However, the techniques described in this disclosure may be used for other video coding processes, such as video coding processes defined according to H.264 or other standard or proprietary video coding processes.
To achieve desirable coding efficiency, a Coding Unit (CU) may have a variable size depending on the video content. In addition, the coding unit may be split into smaller blocks for prediction or transform. In particular, each coding unit may be further partitioned into Prediction Units (PUs) and Transform Units (TUs). The prediction units may be considered similar to so-called partitions according to other video coding standards, such as the H.264 standard. A Transform Unit (TU) generally refers to a block of residual data to which a transform is applied to generate transform coefficients.
A coding unit typically has one luma component (denoted Y) and two chroma components (denoted U and V). Depending on the video sampling format, the size of the U and V components may be the same or different from the size of the Y component in terms of the number of samples.
To code a block (e.g., a prediction unit of video data), a predictor for the block is first derived. The predictor, also referred to as a predictive block, may be derived by intra (I) prediction (i.e., spatial prediction) or inter (P or B) prediction (i.e., temporal prediction). Thus, some prediction units may be intra-coded (I) using spatial prediction with respect to reference samples in neighboring reference blocks in the same frame (or slice), and other prediction units may be uni-directionally inter-coded (P) or bi-directionally inter-coded (B) with respect to reference sample blocks in other previously coded frames (or slices). In each case, the reference samples may be used to form a predictive block for the block to be coded.
Upon identifying the predictive block, a difference between the original block of video data and its predictive block is determined. This difference may be referred to as prediction residual data and indicates the pixel differences between the pixel values in the block to be coded and the pixel values in the predictive block selected to represent the coded block. To achieve better compression, the prediction residual data may be transformed, for example, using a Discrete Cosine Transform (DCT), an integer transform, a Karhunen-Loève (K-L) transform, or another transform.
The residual data in a transform block (e.g., TU) may be arranged in a two-dimensional (2D) array of pixel difference values residing in the spatial pixel domain. The transform converts the residual pixel values into a two-dimensional array of transform coefficients in a transform domain (e.g., the frequency domain). For further compression, the transform coefficients may be quantized prior to entropy coding. The entropy coder then applies entropy coding, such as Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), probability interval partition entropy coding (PIPE), etc., to the quantized transform coefficients.
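The transform-then-quantize step can be illustrated with a floating-point orthonormal 2-D DCT-II and a uniform scalar quantizer. This is only a sketch: real codecs use scaled integer transform approximations and more elaborate quantizer designs.

```python
import math

def dct2d(block):
    """Orthonormal 2-D DCT-II of an N x N residual block (illustrative,
    floating point; production codecs use integer approximations)."""
    n = len(block)
    def a(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = a(u) * a(v) * s
    return out

def quantize(coeffs, qstep):
    """Uniform scalar quantization with rounding (illustrative)."""
    return [[int(round(c / qstep)) for c in row] for row in coeffs]
```

For a constant 4 x 4 residual, all the energy lands in the DC (top-left) coefficient, which is why non-zero quantized coefficients cluster toward the low-frequency corner, as discussed below.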
To entropy code a block of quantized transform coefficients, a scanning process is typically performed so that the two-dimensional (2D) array of quantized transform coefficients in the block is processed, according to a particular scan order, as an ordered one-dimensional (1D) array (i.e., a vector) of transform coefficients. Entropy coding is applied in the 1D order of the transform coefficients. The scan of the quantized transform coefficients in a transform unit serializes the 2D array of transform coefficients for the entropy coder. A significance map may be generated to indicate the positions of significant (i.e., non-zero) coefficients. Scanning may be applied to scan the levels of significant (i.e., non-zero) coefficients and/or to code the signs of the significant coefficients.
As an example, for DCT, the probability of non-zero coefficients towards the upper left corner of the 2D transform unit (i.e., the low frequency region) is often higher. It may be desirable to scan the non-zero coefficients in a manner that increases the probability of grouping them together at one end of a serialized run of coefficients, thereby permitting zero-valued coefficients to be grouped together toward the other end of the serialized vector and coded into a zero run more efficiently. For this reason, the scanning order may be important for efficient entropy coding.
As one example, so-called diagonal (or wavefront) scan orders have been employed in the HEVC standard for scanning quantized transform coefficients. Alternatively, a zig-zag, horizontal, vertical, or other scanning order may be used. As mentioned above, for the example where the transform is a DCT, through the transform and quantization, the non-zero transform coefficients are generally located at a low frequency region toward the top left region of the block. Thus, after a diagonal scanning process (which may traverse the upper left region first), the non-zero transform coefficients are generally more likely to be located in the front portion of the scan. For a diagonal scanning process that traverses from the bottom-right region first, non-zero transform coefficients are generally more likely to be located in the latter part of the scan.
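A scan order of this family can be generated programmatically rather than stored. The sketch below produces diagonal-scan coordinates for an n x n block, either forward (from the DC coefficient) or inverse; the traversal direction along each anti-diagonal is an assumption here, since conventions differ between proposals.

```python
def diagonal_scan(n, inverse=False):
    """Coordinates (x, y) of a diagonal scan of an n x n block, ordered
    from the top-left (DC) position outward; reversed when inverse=True.
    The direction of travel along each anti-diagonal is an assumption."""
    order = []
    for d in range(2 * n - 1):            # anti-diagonal index: x + y = d
        for y in range(d, -1, -1):
            x = d - y
            if x < n and y < n:
                order.append((x, y))
    return order[::-1] if inverse else order
```

With `inverse=True` the scan starts at the bottom-right, high-frequency corner, matching the inverse scans discussed throughout this disclosure.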
Multiple zero-valued coefficients will typically be grouped together at one end of the scan (depending on the scan direction) because of the energy reduction at higher frequencies, and because of the effect of quantization, which may cause some non-zero coefficients to become zero-valued coefficients once the bit depth is reduced. These characteristics of the coefficient distribution in the serialized 1D array can be exploited in entropy coder design to improve coding efficiency. In other words, if the non-zero coefficients can be efficiently arranged in one portion of the 1D array by some appropriate scan order, better coding efficiency may be expected due to the design of many entropy coders.
To achieve this goal of placing more non-zero coefficients at one end of the 1D array, transform coefficients may be coded using different scan orders in a video encoder-decoder (codec). In some cases, diagonal scanning may be effective. In other cases, different types of scanning, such as zig-zag, vertical, or horizontal scanning, may be more efficient.
Different scan orders can be generated in various ways. As an example, for each block of transform coefficients, a "best" scan order may be selected from a plurality of available scan orders. The video encoder may then provide, for each block, an indication to the decoder of an index identifying the best scan order among the set of scan orders. The best scan order may be determined by applying several scan orders and selecting the one that is most efficient at placing the non-zero coefficients near the beginning or end of the 1D vector, thereby facilitating efficient entropy coding.
In another example, the scan order for the current block may be determined based on various factors related to the coding of the relevant prediction unit, such as prediction mode (I, B, P), block size, transform, or other factors. In some cases, because the same information (e.g., prediction mode) may be inferred on both the encoder and decoder sides, it may not be necessary to provide an indication of the scan order index to the decoder. Rather, the video decoder may store configuration data that indicates an appropriate scan order with knowledge of the prediction mode for the block and one or more criteria that map the prediction mode to a particular scan order.
To further improve coding efficiency, the available scanning order may not be constant all the time. Rather, some adaptation may be enabled, for example, to adaptively adjust the scanning order based on already coded coefficients. In general, scan order adaptation may be done in a manner such that according to the selected scan order, zero and non-zero coefficients are more likely to be grouped together.
In some video codecs, the initial available scanning order may take a very conventional form, such as full horizontal, vertical, diagonal or zig-zag scanning. Alternatively, the scan order may be derived by a training process, and thus may appear somewhat random. The training process may involve applying different scan orders to a block or series of blocks to identify a scan order that produces a desirable result (e.g., in terms of the significant placement of non-zero and zero-valued coefficients, as mentioned above).
If the scan order is derived from a training process, or if a variety of different scan orders can be selected, it may be necessary to store the particular scan orders on both the encoder and decoder side. The amount of data specifying such scan orders can be quite large. For example, for a 32 x 32 transform block, one scan order contains 1024 transform coefficient positions. Because there may be differently sized blocks, and multiple different scan orders for each block size, the total amount of data that needs to be stored is not trivial. Regular scan orders, such as diagonal, horizontal, vertical, or zig-zag orders, may require no or minimal storage. However, such regular orders may not provide sufficient variety to match the coding performance of trained scan orders.
In one conventional example, for H.264 and the HEVC standard currently being developed, when a CABAC entropy coder is used, the positions of the significant coefficients (i.e., non-zero transform coefficients) in a transform block (a transform unit in HEVC) are encoded prior to the coefficient levels. The process of coding the significant coefficient positions is referred to as significance map coding. The significance of a coefficient is the same as bin 0 of the coefficient level. As shown in fig. 1, significance map coding of the quantized transform coefficients 11 produces a significance map 13. The significance map 13 is a map of ones and zeros, where a one indicates the position of a significant coefficient. The significance map typically requires a high percentage of the video bit rate. The techniques of this disclosure may also be adapted for use with other entropy coders (e.g., PIPE).
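Deriving the significance map of fig. 1 from a block of quantized coefficients is straightforward; a minimal sketch:

```python
def significance_map(block):
    """Binary map of significant (non-zero) quantized coefficients, as in
    fig. 1: a 1 marks the position of each significant coefficient."""
    return [[1 if c != 0 else 0 for c in row] for row in block]
```

The cost of coding this map efficiently, given a scan order and context model, is what the remainder of this section addresses.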
An example process for coding significance maps is described in D. Marpe, H. Schwarz, and T. Wiegand, "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, July 2003. In this process, the significance map is coded if there is at least one significant coefficient in the block, as indicated by the coded block flag (CBF), which is defined as:
Coded block flag: coded_block_flag is a one-bit symbol that indicates whether there are significant (i.e., non-zero) coefficients inside a single block of transform coefficients for which the coded block pattern indicates non-zero entries. If coded_block_flag is zero, no further information is transmitted for the related block.
If significant coefficients are present in the block, the significance map is encoded by following the scan order of the transform coefficients in the block as follows:
Scanning of transform coefficients: the two-dimensional array of transform coefficient levels of a block for which coded_block_flag indicates non-zero entries is first mapped into a one-dimensional list using a given scan pattern. In other words, a block having significant coefficients is scanned according to the scan pattern.
Given the scan pattern, the significance map is scanned as follows:
Significance map: if coded_block_flag indicates that a block has significant coefficients, a binary-valued significance map is encoded. For each transform coefficient in scan order, a one-bit symbol significant_coeff_flag is transmitted. If the significant_coeff_flag symbol is one, i.e., if a non-zero coefficient exists at this scan position, a further one-bit symbol last_significant_coeff_flag is sent. This symbol indicates whether the current significant coefficient is the last significant coefficient inside the block or whether further significant coefficients follow. If the last scan position is reached and the significance map coding has not yet been terminated by a last_significant_coeff_flag of value one, it is obvious that the last coefficient must be significant.
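The flag-emission rule quoted above can be sketched as follows, assuming coded_block_flag has already signaled at least one significant coefficient, so no flags are needed for an implied final position:

```python
def encode_significance(serialized):
    """Emit (syntax_element, bit) pairs for the H.264-style significance map
    described above: a significant_coeff_flag per scanned position, plus a
    last_significant_coeff_flag after each significant one. Flags for the
    final scan position are omitted when its significance is implied.
    Assumes at least one coefficient is significant (coded_block_flag == 1)."""
    flags = []
    last = max(i for i, c in enumerate(serialized) if c)
    for i, c in enumerate(serialized[:-1]):
        flags.append(("significant_coeff_flag", 1 if c else 0))
        if c:
            stop = 1 if i == last else 0
            flags.append(("last_significant_coeff_flag", stop))
            if stop:
                break
    return flags
```

Note how a block whose only significant coefficient sits at the final scan position needs no significance flags at that position at all: its significance is inferred.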
Recent proposals for HEVC have removed the last_significant_coeff_flag. In those proposals, an indication of the X and Y position of the last significant coefficient is sent before the significance map itself.
Currently, in HEVC, it is proposed to use three scan modes for significance maps: diagonal, vertical and horizontal. Fig. 2 shows an example of zigzag scanning 17, vertical scanning 19, horizontal scanning 21 and diagonal scanning 15. As shown in fig. 2, each of these scans is done in the forward direction, i.e., from lower frequency transform coefficients in the upper left corner of the transform block to higher frequency transform coefficients in the lower right corner of the transform block. After coding the significance map, the remaining level information (bins 1-N, where N is the total number of bins) for each significant transform coefficient (i.e., coefficient value) is coded.
In the CABAC process previously specified in the H.264 standard, when the 4 x 4 sub-blocks are processed, each transform coefficient level is binarized, e.g., according to a unary code, to generate a series of bins. In H.264, the set of CABAC context models for each sub-block consists of two sets of five context models: five models for the first bin, and five models for all remaining bins (up to and including the 14th bin) of the coeff_abs_level_minus1 syntax element, which encodes the absolute value of a transform coefficient. Notably, in one proposed version of HEVC, the remaining bins include only bin 1 and bin 2. The remainder of the coefficient level is coded using Golomb-Rice coding and exponential Golomb codes.
In HEVC, the selection of the context models can be performed as in the original CABAC process proposed for the H.264 standard. However, different sets of context models may be selected for different sub-blocks. In particular, the selection of the set of context models for a given sub-block depends on certain statistics of the previously coded sub-blocks.
Fig. 3 shows the scan order followed by one proposed version of the HEVC process when coding the levels of the transform coefficients (the absolute values of the levels and the signs of the levels) in transform unit 25. Note that there is a forward zig-zag pattern 27 for scanning the 4 x 4 sub-blocks of the larger block, and an inverse zig-zag pattern 23 for scanning the transform coefficient levels within each sub-block. In other words, the 4 x 4 sub-blocks are visited in sequence according to a forward zig-zag pattern, and within each sub-block an inverse zig-zag scan is performed to scan the levels of the transform coefficients. The transform coefficients in the two-dimensional array formed by the transform unit are thus serialized into a one-dimensional array such that the inversely scanned coefficients of a given sub-block are followed by the inversely scanned coefficients of the next sub-block.
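The combined forward/inverse sub-block scan of fig. 3 might be sketched as below. The direction of travel along each zig-zag anti-diagonal is an assumption here, since conventions vary:

```python
def zigzag(n):
    """Forward zig-zag coordinates (x, y) for an n x n grid, alternating
    direction along each anti-diagonal (direction convention assumed)."""
    coords = []
    for d in range(2 * n - 1):
        rng = range(d + 1) if d % 2 == 0 else range(d, -1, -1)
        for y in rng:
            x = d - y
            if x < n and y < n:
                coords.append((x, y))
    return coords

def serialize_subblock_scan(block, sub=4):
    """Serialize a block as in fig. 3: 4 x 4 sub-blocks are visited in a
    forward zig-zag, while the levels inside each sub-block are read in an
    inverse zig-zag. block is a 2-D list indexed block[y][x]."""
    n = len(block)
    out = []
    for sx, sy in zigzag(n // sub):              # forward over sub-blocks
        for x, y in reversed(zigzag(sub)):       # inverse within a sub-block
            out.append(block[sy * sub + y][sx * sub + x])
    return out
```

The first serialized value is the bottom-right coefficient of the top-left sub-block, which is exactly the "walking around" behavior the next paragraph identifies as a drawback.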
In one example, CABAC coding of coefficients scanned according to the sub-block scanning approach shown in fig. 3 may use 60 contexts, i.e., 6 sets of 10 contexts each, distributed as described below. For a 4 x 4 block, 10 context models may be used (5 models for bin 1 and 5 models for bins 2 through 14), as shown in Table 1:
TABLE 1 context for bin 1 and bins 2 through 14 of coefficient levels of a sub-block
According to Table 1, one of context models 0 through 4 in a context set is used for bin 1 in the following respective cases: the currently coded coefficient is scanned after a coefficient greater than 1 has been coded within the sub-block; the currently coded coefficient is the initial coefficient scanned in the sub-block, or there are no trailing 1's (no previously coded coefficients); there is one trailing 1 in the sub-block (i.e., a 1 has been coded, but no coefficient greater than 1 has been coded); there are two trailing 1's in the sub-block; or there are three or more trailing 1's in the sub-block. For each of bins 2 through 14 (although the currently proposed version of HEVC codes only bin 2 using CABAC, with successive bins of a coefficient level coded with exponential Golomb codes), one of context models 0 through 4 may be used in the following respective cases: the coefficient is the initial coefficient scanned in the sub-block, or there are zero previously coded coefficients greater than 1; there is one previously coded coefficient greater than 1; there are two; there are three; or there are four or more previously coded coefficients greater than 1.
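The bin 1 context model selection, as read from the description of Table 1 above, might be sketched as:

```python
def bin1_context(num_trailing_ones, seen_greater_than_1):
    """Context model (0-4) for bin 1 of a coefficient level, per the rules
    read from Table 1 above: model 0 once a coefficient greater than 1 has
    been coded in the sub-block; otherwise models 1-4 track the number of
    trailing 1's coded so far, capped at three or more."""
    if seen_greater_than_1:
        return 0
    return 1 + min(num_trailing_ones, 3)
```

The initial coefficient of a sub-block (no trailing 1's, nothing greater than 1 yet) thus falls into model 1.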
Depending on the number of coefficients greater than 1 in the previously coded 4 x 4 sub-block in the forward scan of the sub-block, there are 6 different sets of these 10 models:
TABLE 2 context for bin 1 and bins 2 through 14
According to table 2, sets 0 to 5 of the context models are used for a given sub-block as follows: set 0 when the block size is 4 x 4; set 1 when there are 0 to 3 coefficients greater than 1 in the previously coded sub-block; set 2 when there are 4 to 7; set 3 when there are 8 to 11; set 4 when there are 12 to 15; and set 5 when the given sub-block is the first 4 x 4 sub-block (the top-left sub-block) or there are 16 coefficients greater than 1 in the previously coded sub-block.
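The selection among the 6 context sets may be sketched as follows. This is an illustrative Python sketch of the table 2 description; the function name and argument names are assumptions made for the example.

```python
def context_set(block_is_4x4, is_first_subblock, prev_greater_than_one):
    """Context set (0-5) for a sub-block, per the table 2 description."""
    if block_is_4x4:
        return 0                  # 4 x 4 blocks use a dedicated set
    if is_first_subblock or prev_greater_than_one == 16:
        return 5
    # 0-3 -> set 1, 4-7 -> set 2, 8-11 -> set 3, 12-15 -> set 4
    return 1 + prev_greater_than_one // 4
```

This per-block-size branching is the non-uniformity that this disclosure later proposes to remove by coordinating contexts across block sizes.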
The above-described coding process for h.264 and the currently proposed coding process for HEVC have several drawbacks. As shown in fig. 3, one drawback is that the scanning of the sub-blocks proceeds forward (i.e., starting from the top-left sub-block), but the scanning of the coefficient levels within each sub-block proceeds backward (i.e., starting from the bottom-right coefficient in each sub-block). This approach entails jumping back and forth within the block, which may make data fetching more complex.
Another drawback derives from the fact that the scanning order of the coefficient levels is different from the scanning order of the significance map. In HEVC, there are three different proposed scan orders for significance maps: forward diagonal, forward horizontal, and forward vertical, as shown in fig. 2. All of the significance scans differ from the coefficient level scanning currently proposed for HEVC, since the level scanning proceeds in the reverse direction. Because the direction and pattern of the coefficient level scan do not match the direction and pattern of the significance scan, more coefficient levels must be examined. For example, assume that a horizontal scan is used for the significance map, and the last significant coefficient is found at the end of the first row of coefficients. The coefficient level scanning in HEVC would then require a diagonal scan across multiple rows, when only the first row actually contains coefficient levels other than 0. Such a scanning process may introduce unwanted inefficiencies.
In the current proposal for HEVC, the scan of the significance map proceeds in the block from the DC coefficient found in the upper left corner of the block forward to the highest frequency coefficient typically found in the lower right corner of the block, while the scan for the coefficient level proceeds backward within each 4 x 4 sub-block. This may also result in more complex and less efficient data acquisition.
Another drawback of the current HEVC proposal comes from the context set. The context set for CABAC (see table 2 above) is different for block size 4 × 4 than for other block sizes. According to this disclosure, it would be desirable to coordinate contexts across all block sizes so that less memory is dedicated to storing different sets of contexts.
Furthermore, as will be described in more detail below, the CABAC contexts currently proposed for significance maps in HEVC are only valid when the scan order is forward, and therefore do not allow an inverse significance map scan.
Furthermore, the context described above for encoding a level of quantized coefficients attempts to exploit local correlation of coefficient levels. These contexts depend on the correlation among the 4 x 4 sub-blocks (see context set in table 2), and the correlation within each sub-block (see context model in table 1). The drawback of these contexts is that the dependencies may be too far apart (i.e., there is a low dependency between coefficients that are separated from each other by several other coefficients from one sub-block to another). Furthermore, within each sub-block, the dependencies may be weak.
The present invention proposes several different features which may reduce or eliminate some of the disadvantages described above. In some examples, these features may provide a more efficient and coordinated scanning order of transform coefficients in video coding. In other examples of this disclosure, these features provide a more efficient set of contexts to be used for CABAC-based entropy coding of transform coefficients consistent with the proposed scan order. It should be noted that all of the techniques described in this disclosure can be used independently or together in any combination.
Fig. 4 is a block diagram illustrating an example video encoding and decoding system 10 that may be configured to utilize techniques for coding transform coefficients in accordance with examples of this disclosure. As shown in fig. 4, system 10 includes a source device 12 that transmits encoded video to a destination device 14 via a communication channel 16. The encoded video may also be stored on storage medium 34 or file server 36 and may be accessed by destination device 14 when needed. Source device 12 and destination device 14 may comprise any of a wide variety of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets (such as so-called smart phones), televisions, cameras, display devices, digital media players, video game consoles, and the like. In many cases, such devices may be equipped for wireless communication. Thus, the communication channel 16 may comprise a wireless channel, a wired channel, or a combination of wireless and wired channels suitable for transmitting encoded video data. Similarly, the file server 36 may be accessed by the destination device 14 through any standard data connection, including an internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both suitable for accessing encoded video data stored on a file server.
Techniques for coding transform coefficients according to examples of this disclosure may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions (e.g., via the internet), encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, the system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
In the example of fig. 4, source device 12 includes a video source 18, a video encoder 20, a modulator/demodulator 22, and a transmitter 24. In source device 12, video source 18 may comprise a source such as a video capture device (e.g., a video camera), a video archive containing previously captured video, a video feed interface for receiving video from a video content provider, and/or a computer graphics system for generating computer graphics data as the source video, or a combination of such sources. As one example, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. However, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications.
Captured, pre-captured, or computer-generated video may be encoded by video encoder 20. Modem 22 may modulate the encoded video information according to a communication standard, such as a wireless communication protocol, and transmit the encoded video information to destination device 14 via transmitter 24. Modem 22 may include various mixers, filters, amplifiers, or other components designed for signal modulation. Transmitter 24 may include circuitry designed for transmitting data, including amplifiers, filters, and one or more antennas.
The captured, pre-captured, or computer-generated video encoded by video encoder 20 may also be stored on storage medium 34 or file server 36 for later consumption. Storage medium 34 may comprise a blu-ray disc, DVD, CD-ROM, flash memory, or any other suitable digital storage medium for storing encoded video. Destination device 14 may then access the encoded video stored on storage medium 34 for decoding and playback.
File server 36 may be any type of server capable of storing encoded video and transmitting the encoded video to destination device 14. Example file servers include web servers (e.g., for a website), FTP servers, Network Attached Storage (NAS) devices, local disk drives, or any other type of device capable of storing encoded video data and transmitting the encoded video data to a destination device. The transmission of the encoded video data from the file server 36 may be a streaming transmission, a download transmission, or a combination of both. The destination device 14 may access the file server 36 through any standard data connection, including an internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, ethernet, USB, etc.), or a combination of both suitable for accessing encoded video data stored on a file server.
Destination device 14 in the example of fig. 4 includes a receiver 26, a modem 28, a video decoder 30, and a display device 32. Receiver 26 of destination device 14 receives the information via channel 16, and modem 28 demodulates the information to produce a demodulated bitstream for video decoder 30. The information communicated over channel 16 may include a variety of syntax information generated by video encoder 20 for use by video decoder 30 in decoding video data. Such syntax may also be included in the encoded video data stored on storage medium 34 or file server 36. Each of video encoder 20 and video decoder 30 may form part of a respective encoder-decoder (codec) capable of encoding or decoding video data.
The display device 32 may be integrated with the destination device 14 or external to the destination device 14. In some examples, destination device 14 may include an integrated display device and also be configured to interface with an external display device. In other examples, destination device 14 may be a display device. In general, display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices, such as a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device.
In the example of fig. 4, communication channel 16 may comprise any wireless or wired communication medium, such as a Radio Frequency (RF) spectrum or one or more physical transmission lines, or any combination of wireless and wired media. The communication channel 16 may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the internet. Communication channel 16 generally represents any suitable communication medium or collection of different communication media, including any suitable combination of wired or wireless media, for transmitting video data from source device 12 to destination device 14. Communication channel 16 may include a router, switch, base station, or any other apparatus that may be used to facilitate communication from source device 12 to destination device 14.
Video encoder 20 and video decoder 30 may operate in accordance with a video compression standard, such as the High Efficiency Video Coding (HEVC) standard currently being developed, and may conform to the HEVC test model (HM). Alternatively, video encoder 20 and video decoder 30 may operate in accordance with other proprietary or industry standards, such as the ITU-T h.264 standard, alternatively referred to as MPEG-4, Part 10, Advanced Video Coding (AVC), or extensions of such standards. However, the techniques of this disclosure are not limited to any particular coding standard. Other examples include MPEG-2 and ITU-T H.263.
Although not shown in fig. 4, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate multiplexer-demultiplexer (MUX-DEMUX) units or other hardware and software to handle encoding of both audio and video in a common data stream or separate data streams. In some examples, the MUX-DEMUX unit may be compliant with the ITU h.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP), if applicable.
Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder circuits, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. When the techniques are implemented in part in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in the respective device.
Video encoder 20 may implement any or all of the techniques of this disclosure to improve the encoding of transform coefficients in a video coding process. Likewise, video decoder 30 may implement any or all of these techniques to improve decoding of transform coefficients in a video coding process. As described in this disclosure, a video coder may refer to a video encoder or a video decoder. Similarly, a video coding unit may refer to a video encoder or a video decoder. Likewise, video coding may refer to video encoding or video decoding.
In one example of this disclosure, a video coder (e.g., video encoder 20 or video decoder 30) may be configured to code a plurality of transform coefficients associated with residual video data in a video coding process. The video coder may be configured to code information indicative of significant coefficients of the plurality of transform coefficients according to a scan order, and code information indicative of levels of the plurality of transform coefficients according to the scan order.
In another example of this disclosure, a video coder (e.g., video encoder 20 or video decoder 30) may be configured to code a plurality of transform coefficients associated with residual video data in a video coding process. The video coder may be configured to code information indicative of significant transform coefficients in a block of transform coefficients with a scan in an inverse scan direction from higher frequency coefficients in the block of transform coefficients to lower frequency coefficients of the block of transform coefficients.
In another example of this disclosure, a video coder (e.g., video encoder 20 or video decoder 30) may be configured to code a plurality of transform coefficients associated with residual video data in a video coding process. The video coder may be configured to: arranging a block of transform coefficients into one or more subsets of transform coefficients based on a scanning order; coding a first portion of levels of transform coefficients in each subset, wherein the first portion of levels includes at least significance of transform coefficients in each subset; and coding a second portion of the levels of transform coefficients in each subset.
In another example of this disclosure, a video coder (e.g., video encoder 20 or video decoder 30) may be configured to: coding information indicative of significant coefficients of the plurality of transform coefficients according to a scanning order; dividing the coded information into at least a first region and a second region; entropy coding coded information in the first region using context derivation criteria according to a first set of contexts; and entropy coding the coded information in the second region using the same context derivation criteria as the first region according to the second set of contexts.
Fig. 5 is a block diagram illustrating an example of a video encoder 20 that may be used for coding transform coefficients as described in this disclosure. Video encoder 20 will be described in the context of HEVC coding for purposes of illustration, but this disclosure is not limited with respect to other coding standards or methods that may require scanning of transform coefficients. Video encoder 20 may perform intra and inter coding on CUs within a video frame. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy between a current frame and a previously coded frame of a video sequence. Intra mode (I-mode) may refer to any of several spatially based video compression modes. An inter mode, such as uni-directional prediction (P-mode) or bi-directional prediction (B-mode), may refer to any of several time-based video compression modes.
As shown in fig. 5, video encoder 20 receives a current video block within a video frame to be encoded. In the example of fig. 5, video encoder 20 includes motion compensation unit 44, motion estimation unit 42, intra-prediction module 46, reference frame buffer 64, summer 50, transform module 52, quantization unit 54, and entropy encoding unit 56. The transform module 52 illustrated in fig. 5 is the module that applies the actual transform to a block of residual data, and should not be confused with the block of transform coefficients, which may be referred to as a transform unit (TU) of a CU. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform module 60, and summer 62. A deblocking filter (not shown in fig. 5) may also be included to filter block boundaries to remove blockiness artifacts from the reconstructed video. If desired, the deblocking filter would typically filter the output of summer 62.
During the encoding process, video encoder 20 receives a video frame or slice to be coded. The frame or slice may be divided into a plurality of video blocks, e.g., Largest Coding Units (LCUs). Motion estimation unit 42 and motion compensation unit 44 may perform inter-predictive coding of the received video block relative to one or more blocks in one or more reference frames to provide temporal compression. Intra-prediction unit 46 may perform intra-predictive coding of the received video block relative to one or more neighboring blocks in the same frame or slice as the block to be coded to provide spatial compression.
Mode select unit 40 may select one of the coding modes (intra or inter), e.g., based on error (i.e., distortion) results for each mode, and provide the resulting intra- or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use in a reference frame. Some video frames may be designated as I-frames, where all blocks in an I-frame are encoded in an intra-prediction mode. In some cases, intra-prediction module 46 may perform intra-prediction encoding of a block in a P or B frame if, for example, the motion search performed by motion estimation unit 42 does not yield a sufficient prediction for the block.
Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation is the process of generating motion vectors that estimate the motion of video blocks. A motion vector, for example, may indicate the displacement of a prediction unit in a current frame relative to a reference sample of a reference frame. A reference sample may be a block that is found to closely match the portion of the CU including the PU being coded in terms of pixel difference, which may be determined by sum of absolute differences (SAD), sum of squared differences (SSD), or other difference metrics. The motion compensation performed by motion compensation unit 44 may involve fetching or generating values for the prediction unit based on the motion vector determined by motion estimation. Also, in some examples, motion estimation unit 42 and motion compensation unit 44 may be functionally integrated.
Motion estimation unit 42 calculates a motion vector for a prediction unit of an inter-coded frame by comparing the prediction unit to reference samples of a reference frame stored in reference frame buffer 64. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference frames stored in reference frame buffer 64. For example, video encoder 20 may calculate values for quarter-pixel positions, eighth-pixel positions, or other fractional-pixel positions of the reference frame. Thus, motion estimation unit 42 may perform a motion search with respect to integer pixel positions and fractional pixel positions and output motion vectors with fractional pixel precision. Motion estimation unit 42 sends the calculated motion vectors to entropy encoding unit 56 and motion compensation unit 44. The portion of the reference frame identified by the motion vector may be referred to as a reference sample. Motion compensation unit 44 may calculate a prediction value for the prediction unit of the current CU, e.g., by retrieving a reference sample identified by the motion vector of the PU.
As an alternative to the inter prediction performed by motion estimation unit 42 and motion compensation unit 44, intra-prediction module 46 may intra-predictively encode the received block. Intra-prediction module 46 may encode the received block relative to neighboring, previously coded blocks (e.g., blocks above, above and to the right, above and to the left, or to the left of the current block), assuming a left-to-right, top-to-bottom encoding order for blocks. Intra-prediction module 46 may be configured with a variety of different intra-prediction modes. For example, intra-prediction module 46 may be configured with a certain number of directional prediction modes, e.g., 33 directional prediction modes, based on the size of the CU being encoded.
The intra-prediction module 46 may select the intra-prediction mode, for example, by calculating error values for various intra-prediction modes and selecting the mode that yields the lowest error value. The directional prediction mode may include functions for combining values of spatially neighboring pixels and applying the combined values to one or more pixel locations in the PU. Once the values for all pixel locations in the PU have been calculated, intra-prediction module 46 may calculate an error value for the prediction mode based on the pixel differences between the PU and the received block to be encoded. Intra-prediction module 46 may continue to test intra-prediction modes until an intra-prediction mode is found that yields an acceptable error value. Intra-prediction module 46 may then send the PU to summer 50.
Video encoder 20 forms a residual block by subtracting the prediction data calculated by motion compensation unit 44 or intra-prediction module 46 from the original video block being coded. Summer 50 represents the component that performs this subtraction operation. The residual block may correspond to a two-dimensional matrix of pixel difference values, where the number of values in the residual block is the same as the number of pixels in the PU corresponding to the residual block. The values in the residual block may correspond to differences, i.e., errors, between co-located pixels in the PU and in the original block to be coded. The difference may be a chrominance or luminance difference, depending on the type of block being coded.
Transform module 52 may form one or more Transform Units (TUs) from the residual blocks. Transform module 52 applies a transform, such as a Discrete Cosine Transform (DCT), a directional transform, or a conceptually similar transform, to the TU, producing a video block comprising transform coefficients. Transform module 52 may send the resulting transform coefficients to quantization unit 54. The quantization unit 54 may then quantize the transform coefficients. Entropy encoding unit 56 may then perform a scan of the quantized transform coefficients in the matrix according to the specified scan order. This disclosure describes entropy encoding unit 56 as performing the scan. However, it should be understood that in other examples, other processing units (e.g., quantization unit 54) may perform the scan.
As mentioned above, the scanning of transform coefficients may involve two scans. One scan identifies which of the coefficients are significant (i.e., non-zero) to form a significance map, while another scan codes levels of transform coefficients. In one example, this disclosure proposes that the scanning order used to code coefficient levels in a block be the same as the scanning order used to code significant coefficients in a significance map for the block. In HEVC, the block may be a transform unit. As used herein, the term scan order may refer to a direction of scanning and/or a pattern of scanning. Thus, the significance map and the scanning of the coefficient levels may be the same in terms of scan pattern and/or scan direction. That is, as one example, if the scanning order used to form the significance map is a horizontal scanning pattern in the forward direction, then the scanning order for the coefficient levels should also be a horizontal scanning pattern in the forward direction. Also, as another example, if the scan order for the significance map is a vertical scan pattern in the inverse direction, then the scan order for the coefficient levels should also be a vertical scan pattern in the inverse direction. This may be the case for diagonal, zig-zag, or other scan patterns.
Fig. 6 shows an example of an inverse scan order for a block of transform coefficients (i.e., a transform block). The transform blocks may be formed using a transform, such as, for example, a Discrete Cosine Transform (DCT). It should be noted that each of the inverse diagonal mode 9, inverse zigzag mode 29, inverse vertical mode 31, and inverse horizontal mode 33 proceeds from higher frequency coefficients in the lower right corner of the transform block to lower frequency coefficients in the upper left corner of the transform block. Accordingly, one aspect of this disclosure presents a unified scanning order for coding significance maps and coding coefficient levels. The proposed techniques apply a scan order for the significance map to a scan order for coefficient level coding. In general, horizontal, vertical, and diagonal scan patterns have been shown to work well, thereby reducing the need for additional scan patterns. However, the general techniques of the present invention are applicable for use with any scan pattern.
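The inverse scan orders of fig. 6 can be expressed simply as the forward scan orders traversed in reverse. The following Python sketch is illustrative only; the function names are assumptions, and the diagonal pattern shown is a plain anti-diagonal traversal rather than an alternating zigzag.

```python
def forward_order(n, pattern):
    """Forward scan order for an n x n block, as a list of (row, col) pairs."""
    if pattern == "horizontal":
        return [(r, c) for r in range(n) for c in range(n)]
    if pattern == "vertical":
        return [(r, c) for c in range(n) for r in range(n)]
    if pattern == "diagonal":
        return [(s - c, c) for s in range(2 * n - 1)
                for c in range(n) if 0 <= s - c < n]
    raise ValueError(pattern)

def inverse_order(n, pattern):
    """Inverse scan: the same pattern, traversed from the bottom-right
    (highest frequency) position back to the top-left DC position."""
    return list(reversed(forward_order(n, pattern)))
```

Because an inverse scan is simply the reversal of its forward counterpart, using the same pattern for both the significance map and the coefficient levels requires storing only one scan order per pattern.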
According to another aspect, the invention proposes to perform the significance scan as an inverse scan, i.e. from the last significant coefficient in the transform unit to the first coefficient (i.e. the DC coefficient) in the transform unit. An example of an inverse scan order is shown in fig. 6. In particular, the significance scan proceeds from the last significant coefficient at a higher frequency position to a significant coefficient at a lower frequency position and finally to a DC coefficient position.
To facilitate the inverse scan, a technique for identifying the last significant coefficient may be used. A process for identifying the last significant coefficient is described in U.S. provisional patent application No. 61/419,740, filed December 3, 2010, by Joel Sole Rojals et al., entitled "Encoding of the position of the last significant transform coefficient in video coding," and in "Parallel Context Processing for the significance map in high coding efficiency," J. Sole, R. Joshi, et al., JCTVC-D262, 4th JCT-VC Meeting, Daegu, KR, January 2011. Once the last significant coefficient in the block is identified, an inverse scan order may be applied to both the significance map and the coefficient levels.
The invention further proposes that, rather than the significance scan and the coefficient level scan proceeding in opposite directions, the two scans have the same scan direction, and more specifically, a single direction within a block. In particular, it is proposed that both the significance scan and the coefficient level scan use an inverse scan order proceeding from the last significant coefficient in a transform unit to the first coefficient in the transform unit. Thus, the significance scan is performed from the last significant coefficient back to the first coefficient (the DC coefficient), an inverse scan relative to the scan currently proposed for HEVC. This aspect of the disclosure presents a unified, unidirectional scan order for coding significance maps and coding coefficient levels. In particular, the unified, unidirectional scan order may be a unified inverse scan order. The scan order for the significance and coefficient level scans according to the unified inverse scan pattern may be inverse diagonal, inverse zig-zag, inverse horizontal, or inverse vertical, as shown in fig. 6. However, any scan pattern may be used.
Instead of defining the set of coefficients in a two-dimensional sub-block as shown in fig. 3 for CABAC context derivation purposes, the present invention proposes to define the set of coefficients as a number of coefficients that are successively scanned according to a scanning order. In particular, each set of coefficients may comprise consecutive coefficients of a scan order over the entire block. Any size set may be considered, but a size of 16 coefficients in the scan set has been found to be very effective. The set size may be fixed or adaptive. This definition allows the set to be a 2D block (if using the sub-block scanning approach), a rectangle (if using horizontal or vertical scanning), or a diagonal shape (if using zigzag or diagonal scanning). The diagonally shaped set of coefficients may be a portion of a diagonal shape, a continuous diagonal shape, or a portion of a continuous diagonal shape.
Figs. 7-9 show examples of coefficients arranged into 16-coefficient subsets according to a particular scan order, rather than a fixed 4 x 4 block arrangement. Fig. 7 depicts a 16-coefficient subset 51 that consists of the first 16 coefficients in an inverse diagonal scan order. In this example, the next subset would simply consist of the next 16 consecutive coefficients along the inverse diagonal scan order. Similarly, fig. 8 depicts a 16-coefficient subset 53 of the first 16 coefficients in an inverse horizontal scan order. Fig. 9 depicts a 16-coefficient subset 55 of the first 16 coefficients in an inverse vertical scan order.
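Partitioning a scan into subsets of consecutively scanned coefficients, as in figs. 7-9, may be sketched as follows. This illustrative Python sketch uses an inverse horizontal scan of an 8 x 8 block; the function name and the default subset size of 16 follow the example in the text, but are assumptions of this sketch.

```python
def scan_subsets(n, size=16):
    """Partition an inverse horizontal scan of an n x n block into
    subsets of `size` consecutively scanned coefficient positions."""
    forward = [(r, c) for r in range(n) for c in range(n)]  # forward horizontal
    order = forward[::-1]                                    # inverse horizontal
    return [order[i:i + size] for i in range(0, len(order), size)]
```

For an 8 x 8 block, this yields four subsets of 16 positions each; the first subset covers the two bottom rows (the highest-frequency positions), matching the rectangle shape that a horizontal scan produces.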
This technique is compatible with using the same scan order for the coefficient levels as for the significance map. In this case, a different (and sometimes cumbersome) scan order for the coefficient levels, such as that shown in fig. 3, is not required. Like the significance map scan currently proposed for HEVC, the coefficient level scan may proceed as a forward scan from the position of the last significant coefficient in the transform unit to the DC coefficient position.
As currently proposed in HEVC, for entropy coding using CABAC, the transform coefficients are encoded in the following manner. First, there is one pass (in significance map scan order) over the complete transform unit to encode the significance map. Then, there are three passes (in coefficient level scan order): one to encode bin 1 of the levels (first pass), one to encode the remaining bins of the coefficient levels (second pass), and one to encode the signs of the coefficient levels (third pass). These three passes for coefficient level coding are not performed over the complete transform unit. Instead, as shown in fig. 3, each pass is performed in 4 x 4 sub-blocks. When the three passes have been completed in one sub-block, the next sub-block is processed by sequentially performing the same three encoding passes. This approach facilitates parallelization of the encoding.
As described above, this disclosure proposes scanning the transform coefficients in a more coordinated manner, such that the scan order for the coefficient levels is the same as the scan order for the significant coefficients used to form the significance map. In addition, it is proposed to perform the scans for the coefficient levels and the significant coefficients in the inverse direction, from the last significant coefficient in the block to the first coefficient in the block (the DC component). This inverse scan proceeds in the opposite direction from the significance scan currently proposed for HEVC.
As previously described with reference to fig. 7-9, the present invention further proposes to partition the context for the coefficient level (including the significance map) into subsets. That is, a context is determined for each subset of coefficients. Thus, in this example, the same context is not necessarily used for the entire coefficient scan. Rather, different subsets within a transform block may have different contexts determined separately for each subset. Each subset may comprise a one-dimensional array of consecutive scan coefficients in scan order. Thus, a coefficient level scan proceeds from the last significant coefficient to the first coefficient (DC component), where the scan is conceptually partitioned into different subsets of consecutive scan coefficients according to scan order. For example, each subset may include n consecutive scan coefficients for a particular scan order. Grouping the coefficients in several subsets according to their scan order may provide better correlation between the coefficients and thus more efficient entropy coding.
This disclosure further proposes to improve the parallelization of CABAC-based entropy coding of transform coefficients by expanding the multiple-pass concept from the coefficient levels to include an additional pass for the significance map. Thus, an example with four passes may include: (1) coding of significant coefficient flag values for the transform coefficients, e.g., to form the significance map, (2) coding of bin 1 of the level values of the transform coefficients, (3) coding of the remaining bins of the coefficient level values, and (4) coding of the signs of the coefficient levels, all in the same scan order. Using the techniques described in this disclosure, the four-pass coding outlined above may be facilitated. That is, scanning the significant coefficients and the levels of the transform coefficients with the same scan order, with the scan order proceeding in the inverse direction from high frequency coefficients to low frequency coefficients, supports performance of the multiple-pass coding techniques described above.
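As a concrete illustration of these four passes over one subset, the following sketch (a hypothetical helper, not the claimed embodiment) lists the symbols each pass would hand to the entropy coder, all derived in the same scan order:

```python
def four_pass_code(levels):
    # Pass 1: significance flags for every coefficient position.
    sig = [1 if c != 0 else 0 for c in levels]
    # Pass 2: bin 1 (is |level| > 1?) for each significant coefficient.
    bin1 = [1 if abs(c) > 1 else 0 for c in levels if c != 0]
    # Pass 3: remaining level bins for coefficients with |level| > 1.
    rest = [abs(c) - 2 for c in levels if abs(c) > 1]
    # Pass 4: sign flags for each significant coefficient.
    signs = [0 if c > 0 else 1 for c in levels if c != 0]
    return sig, bin1, rest, signs

sig, bin1, rest, signs = four_pass_code([3, 0, -1, 1, 2, 0])
```

The input list stands for one subset of coefficient levels already arranged in the shared (inverse) scan order.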
In another example, a five-pass scanning technique may include: (1) coding of significant coefficient flag values for transform coefficients, e.g., to form a significance map, (2) coding of bin 1 for level values of transform coefficients, (3) coding of bin 2 for level values of transform coefficients, (4) coding of the sign of the coefficient level (e.g., in bypass mode), and (5) coding of the remaining bins of the coefficient level values (e.g., in bypass mode), all passes using the same scan order.
Instances with fewer passes may also be employed. For example, a two-pass scan in which level and sign information are processed in parallel may include: (1) coding the regular-pass bins (e.g., significance, bin 1 of the level, and bin 2 of the level) in one pass, and (2) coding the bypass bins (e.g., the remaining level bins and the signs) in another pass, each pass using the same scan order. Regular bins are bins encoded with CABAC using an updated context determined by context derivation criteria. For example, as will be explained in more detail below, the context derivation criteria may include coded level information for causal neighbor coefficients relative to the current transform coefficient. Bypass bins are bins encoded with CABAC using a fixed context.
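The regular/bypass split of the two-pass example can be sketched as follows (an assumed binarization in which significance, bin 1 and bin 2 are context coded and the remainder is bypassed; the helper name is hypothetical):

```python
def two_pass_code(levels):
    regular = []  # pass 1: context-coded bins
    bypass = []   # pass 2: bypass bins
    for c in levels:
        a = abs(c)
        regular.append(1 if a > 0 else 0)      # significance flag
        if a > 0:
            regular.append(1 if a > 1 else 0)  # bin 1 of the level
        if a > 1:
            regular.append(1 if a > 2 else 0)  # bin 2 of the level
    for c in levels:
        a = abs(c)
        if a > 2:
            bypass.append(a - 3)               # remaining level value
        if a > 0:
            bypass.append(0 if c > 0 else 1)   # sign flag
    return regular, bypass
```

Both loops walk the same scan order, so the two passes differ only in which bins they emit, matching the parallel-friendly structure described above.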
The multiple scan pass examples described above may be generalized to a first scan pass comprising a first portion of the coefficient levels, wherein the first portion includes a significance pass, and a second scan pass comprising a second portion of the coefficient levels.
In each of the examples given above, the passes may be performed sequentially in each subset. While it may be desirable to use a one-dimensional subset comprising consecutively scanned coefficients, the multiple-pass approach may also be applied to sub-blocks, such as 4 x 4 sub-blocks. Example two-pass and four-pass processes for subsets of consecutively scanned coefficients are outlined in more detail below.
In a simplified two-pass process, for each subset of transform units, a first pass codes the significance of the coefficients in the subset following a scan order, and a second pass codes the coefficient levels of the coefficients in the subset following the same scan order. The scan order may be characterized by the scan direction (forward or reverse) and the scan pattern (e.g., horizontal, vertical, or diagonal). As described above, the algorithm may be more amenable to parallel processing if these two passes in each subset follow the same scan order.
In a more elaborate four-pass process, for each subset of transform units, a first pass codes the significance of the coefficients in the subset, a second pass codes bin 1 of the coefficient levels of the coefficients in the subset, a third pass codes the remaining bins of the coefficient levels of the coefficients in the subset, and a fourth pass codes the signs of the coefficient levels of the coefficients in the subset. Also, to be more amenable to parallel processing, all passes in each subset should have the same scan order. As mentioned above, it has been shown that a scanning order with a reverse direction is very effective. It should be noted that the fourth pass (i.e., coding of the sign of the coefficient level) may be done immediately after the first pass (i.e., coding of the significance map) or just before the remaining values of the coefficient level.
For some transform sizes, the subset may be the entire transform unit. In this case, there is a single subset corresponding to the entire transform unit that contains all of the significant coefficients, and the significance scan and the level scan proceed in the same scan order. In this case, instead of a limited number n (e.g., n = 16) of coefficients in a subset, the subset may be a single subset of the transform unit, where the single subset includes all significant coefficients.
Returning to fig. 5, once the transform coefficients are scanned, entropy encoding unit 56 may apply entropy coding, such as CAVLC or CABAC, to the coefficients. In addition, entropy encoding unit 56 may encode Motion Vector (MV) information, as well as any of a variety of syntax elements useful in decoding video data at video decoder 30. The syntax elements may include a significance map having a significant coefficient flag indicating whether a particular coefficient is significant (e.g., non-zero) and a last significant coefficient flag indicating whether the particular coefficient is the last significant coefficient. Video decoder 30 may use these syntax elements to reconstruct the encoded video data. Following the entropy coding by entropy encoding unit 56, the resulting encoded video may be transmitted to another device, such as video decoder 30, or archived for later transmission or retrieval.
To entropy encode the syntax element, entropy encoding unit 56 may perform CABAC and select a context model based on, for example, the number of significant coefficients in the previously scanned N coefficients, where N is an integer value that may be related to the size of the scanned block. Entropy encoding unit 56 may also select a context model based on a prediction mode used to calculate the residual data that is transformed into the transform coefficient block and a transform type used to transform the residual data into the transform coefficient block. When the corresponding prediction data is predicted using the intra-prediction mode, entropy encoding unit 56 may further base the selection of the context model on the direction of the intra-prediction mode.
Furthermore, according to another aspect of the present invention, it is proposed to divide the CABAC contexts into several subsets of coefficients (e.g., the subsets shown in figs. 7-9). It is proposed that each subset consist of consecutive coefficients in the scan order over the whole block. Any subset size may be considered, but a subset size of 16 scanned coefficients has been found to be very effective. In this example, a subset may be 16 consecutive coefficients in a scan order, which may be any scan pattern, including sub-block, diagonal, zig-zag, horizontal, and vertical scan patterns. According to this proposal, the coefficient level scan starts from the last significant coefficient in a block. Thus, a coefficient level scan proceeds from the last significant coefficient in the block to the first coefficient (the DC component), where the scan is conceptually partitioned into different subsets of coefficients in order to derive the context to be applied. For example, the scanned coefficients are arranged in subsets of n consecutive coefficients in the scan order. The last significant coefficient is the first significant coefficient encountered in an inverse scan from the highest frequency coefficient of the block (typically found near the bottom right corner of the block) toward the DC coefficient of the block (the top left corner of the block).
In another aspect of the present invention, it is proposed to coordinate the CABAC context derivation criteria for all block sizes. In other words, instead of having different context derivations based on block size as discussed above, the same CABAC context derivation is used for every block size. In this way, the block size need not be considered in order to derive the CABAC context for a block. The context derivation is also the same for both significance coding and coefficient level coding.
It is also proposed, as a context derivation criterion, that the CABAC context set depend on whether the subset is subset 0, defined as the subset of coefficients with the lowest frequencies (i.e., the subset containing the DC coefficient and neighboring low frequency coefficients). See tables 3a and 3b below.
Table 3a - Context set table. Compare with table 2. There is a dependency on the subset: whether or not the subset is subset 0 (lowest frequency).
According to table 3a above, sets 0-2 of context models are used for the lowest frequency scanning subset (i.e., the set of n consecutive coefficients) if there are zero coefficients greater than 1 in the previously coded subset, one coefficient greater than 1 in the previously coded subset, or more than one coefficient greater than 1 in the previously coded subset, respectively. Sets 3-5 of context models are used for all subsets above the lowest frequency subset if there are zero coefficients greater than 1 in the previously coded subset, one coefficient greater than 1 in the previously coded subset, or more than one coefficient greater than 1 in the previously coded subset, respectively.
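The selection rule just described for table 3a can be sketched as follows (the function name is hypothetical; the set indices 0 through 5 follow the text):

```python
def context_set(is_lowest_freq_subset, num_gt1_in_prev_subset):
    # Zero, one, or more than one coefficient greater than 1 in the
    # previously coded subset selects one of three context sets; the
    # lowest frequency subset uses sets 0-2, all other subsets use the
    # parallel sets 3-5.
    offset = min(num_gt1_in_prev_subset, 2)  # clamp "more than one" to 2
    base = 0 if is_lowest_freq_subset else 3
    return base + offset
```

For example, a subset above the lowest frequencies whose predecessor held exactly one coefficient greater than 1 would use context set 4.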
Table 3b - Context set table.
Table 3b shows a context set table that shows good performance because this table allows for a more accurate count of the number of coefficients greater than 1 in the previous subset. Table 3b may be used as an alternative to table 3a above.
Table 3c shows a simplified context set table with context derivation criteria that may also be used instead.
Table 3c - Context set table.
In addition, the subset containing the last significant coefficient in the transform unit may utilize a unique set of contexts.
The invention also proposes that the context for a subset still depend on the number of coefficients greater than 1 in the previous subset, tracked as a sliding window. For example, assume this number of coefficients greater than 1 in the previous subset is uiNumOne. Once this value has been checked to determine the context set for the current scan subset, the value is not reset to zero. Instead, this value is normalized (e.g., using uiNumOne = uiNumOne/4, which is equivalent to uiNumOne >> 2, or uiNumOne = uiNumOne/2, which is equivalent to uiNumOne >> 1). By doing so, the values of the subsets preceding the previous subset may still be considered, but with less weight in the CABAC context decision for the currently coded subset. In particular, the CABAC context decision for a given subset takes into account not only the number of coefficients greater than 1 in the previous subset, but also a weighted number of coefficients greater than 1 in earlier coded subsets.
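A sketch of the sliding-window update follows. Only the halving (uiNumOne >> 1) is stated in the text; accumulating the current subset's count on top of the halved value is an assumed detail, and the helper name is hypothetical:

```python
def update_num_one(ui_num_one, gt1_in_current_subset):
    # After ui_num_one has been used to pick the context set for the
    # current subset, halve it rather than resetting it to zero, so
    # earlier subsets keep contributing with decaying weight, then add
    # the current subset's count of coefficients greater than 1
    # (assumed accumulation step).
    return (ui_num_one >> 1) + gt1_in_current_subset
```

Repeated calls give each older subset half the influence of the one after it, matching the "less weight" behavior described above.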
Additionally, the set of contexts may depend on: (1) the number of significant coefficients in the currently scanned subset, and (2) whether the current subset is the last subset with significant coefficients (i.e., given the inverse scan order, whether it is the subset scanned first for the coefficient levels). In addition, the context model for the coefficient levels may depend on whether the current coefficient is the last coefficient.
In HEVC, highly adaptive context selection methods have previously been proposed for significance map coding of 16 × 16 and 32 × 32 transform coefficient blocks. It should be noted that this context selection method can be extended to all block sizes. As shown in fig. 10, this method divides a 16 × 16 block into four regions, where each coefficient in a lower frequency region 41 (the four coefficients at the upper-left x, y coordinate positions [0,0], [0,1], [1,0] and [1,1] in the example of the 16 × 16 block, where [0,0] indicates the upper-left DC coefficient) has its own context, the coefficients in a top region 37 (the coefficients in the top row from x, y coordinate positions [2,0] to [15,0] in the example of the 16 × 16 block) share 3 contexts, the coefficients in a left region 35 (the coefficients in the left column from x, y coordinate positions [0,2] to [0,15] in the example of the 16 × 16 block) share another 3 contexts, and the coefficients in the remaining region 39 (the remaining coefficients in the 16 × 16 block) share 5 contexts. As an example, the context selection for a transform coefficient X in region 39 is based on the sum of the significance of a maximum of 5 transform coefficients B, E, F, H and I. Because X is independent of the other positions on the same diagonal line as X along the scan direction (in this example, a zig-zag or diagonal scan pattern), the contexts for the significance of the transform coefficients along one diagonal line in the scan order can be computed in parallel from the previous diagonal lines in the scan order.
As shown in fig. 10, the proposed context for the significance map is only valid if the scan order is forward, since it becomes non-causal at the decoder if inverse scanning is used. That is, if inverse scanning is used, the decoder has not yet decoded coefficients B, E, F, H and I as shown in FIG. 10. As a result, the bitstream is not decodable.
However, the present invention proposes to use the inverse scan direction. When the scan order proceeds in the reverse direction, there is corresponding correlation among the coefficients of the significance map, as shown in fig. 6. Thus, as described above, using an inverse scan for the significance map provides desirable coding efficiency. Furthermore, using an inverse scan for the significance map coordinates the scanning for coding of the coefficient levels and the significance map. To support inverse scanning of the significant coefficients, the contexts need to be changed so that they are compatible with the inverse scan. It is proposed that the coding of significant coefficients utilize contexts that are causal with respect to the inverse scan.
In one example, this disclosure further proposes a technique for significance map coding using the context depicted in FIG. 11. Each coefficient in the lower frequency region 43 (three coefficients in the upper left corner x, y coordinate positions [0,0], [0,1], [1,0] in the example of a 16 x 16 block, where [0,0] indicates the upper left corner DC coefficient) has its own context derivation. The coefficients in the top region 45 (the coefficients from x, y coordinate positions [2,0] to [15,0] in the top row in the example of a 16 x 16 block) have contexts that depend on the significance of two previous coefficients in the top region 45 (e.g., the two coefficients immediately to the right of the coefficient to be coded, where such coefficients are causal neighbors for decoding purposes given an inverse scan). The coefficients in left region 47 (the coefficients from x, y coordinate positions [0,2] to [0,15] in the left column in the example of a 16 x 16 block) have contexts that depend on the significance of two previous coefficients (e.g., two coefficients immediately below the coefficient to be coded, where such coefficients are causal neighbors for decoding purposes given the inverse scan orientation). It should be noted that these contexts in the top region 45 and left region 47 in fig. 11 are the inverse of the contexts shown in fig. 10 (e.g., where coefficients in the top region 37 have contexts that depend on coefficients on the left side, and coefficients in the left region 35 have contexts that depend on coefficients above). Returning to fig. 11, the context for the coefficients in the remaining region 49 (i.e., the remaining coefficients outside of the lower frequency region 43, the top region 45, and the left region 47) depends on the sum (or any other function) of the significance of the coefficients in the locations labeled I, H, F, E and B.
In another example, the coefficients in the top region 45 and the left region 47 may be derived using exactly the same context as the coefficients in region 49. In the inverse scan, this is possible because the adjacent positions labeled I, H, F, E and B are available for coefficients in the top region 45 and the left region 47. At the end of the row/column, the locations for the causal coefficients I, H, F, E and B may be outside of the block. In that case, the value of such coefficients is assumed to be zero (i.e., not significant).
There are many options in selecting a context. The basic idea is to use the significance of coefficients that have been coded according to the scanning order. In the example shown in fig. 10, the context of the coefficient at position X is derived based on the sum of the significance of the coefficients at positions B, E, F, H and I. These context coefficients appear before the current coefficient of the inverse scan order proposed in this disclosure for the significance map. Contexts that are causal in the forward scan become non-causal (unavailable) in the reverse scan order. A way to solve this problem is to mirror the context of the conventional case in fig. 10 to the context for inverse scanning in fig. 11. For the significance scan from the last significant coefficient to the DC coefficient position in the inverse direction, the context neighborhood for coefficient X is made up of coefficients B, E, F, H, I that are associated with higher frequency positions relative to the position of coefficient X and that have been processed by the encoder or decoder in the inverse scan prior to coding of coefficient X.
As discussed above, the contexts and context models illustrated in tables 1 and 2 attempt to exploit the local correlation of coefficient levels among the 4 x 4 sub-blocks. However, the dependency may be too far away. That is, there may be low dependency between coefficients that are separated from each other by several coefficients (e.g., from one sub-block to another). Furthermore, within each sub-block, the dependency between coefficients may be weak. This disclosure describes techniques for addressing these issues by creating context sets for the coefficient levels that employ a more local context neighborhood.
This disclosure proposes the use of a local neighborhood to derive the contexts for the transform coefficient levels, for example in video coding according to HEVC or other standards. This neighborhood consists of already encoded (or decoded) coefficients whose levels have a high correlation with the level of the current coefficient. The coefficients may be spatially adjacent to the coefficient to be coded, and may include coefficients immediately neighboring the coefficient to be coded as well as other nearby coefficients, such as shown in fig. 11 or fig. 13. Notably, the coefficients used for context derivation are not limited to the current sub-block or previous sub-blocks. Rather, the local neighborhood may include coefficients that are spatially close to the coefficient to be coded but that do not necessarily reside in the same sub-block as that coefficient, or in the same sub-block as each other (if the coefficients are arranged in sub-blocks). Rather than relying on coefficients located in fixed sub-blocks, the present invention proposes to use neighboring coefficients that are available (i.e., already decoded) given the particular scan order used.
Different sets of CABAC contexts may be specified for different subsets of coefficients (e.g., based on previously coded subsets of coefficients). Within a given subset of coefficients, a context is derived based on a local neighborhood of coefficients (sometimes referred to as a context neighborhood). An example of a context neighborhood is shown in FIG. 12, in accordance with this disclosure. Coefficients in the context neighborhood may be spatially located near the coefficients to be coded.
As shown in fig. 12, for a forward scan, the context for the level of transform coefficient X depends on the values of coefficients B, E, F, H and I. In the forward scan, coefficients B, E, F, H and I are associated with lower frequency positions relative to the position of coefficient X that have been processed by an encoder or decoder prior to coding coefficient X.
To encode bin 1 for CABAC, the context depends on the number of significant coefficients in this context neighborhood (i.e., among coefficients B, E, F, H and I in this example). If a coefficient in the context neighborhood does not belong to the block (i.e., it falls outside the block boundary), its value may be considered to be 0 for the purpose of determining the context of coefficient X. To encode the remaining bins for CABAC, the context depends on the sum of the number of coefficients in the neighborhood equal to 1 and the sum of the number of coefficients in the neighborhood greater than 1. In another example, the context of bin 1 may depend on the sum of the bin 1 values of the coefficients in the local context neighborhood. In another example, the context of bin 1 may depend on a combination of the significance and the sum of the bin 1 values in this context neighborhood.
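The two context derivations just described can be sketched together (a hypothetical helper; the neighborhood is passed in as the five causal coefficient values, with out-of-block positions already replaced by 0):

```python
def level_contexts(neighborhood):
    # Context selection for coding one coefficient level from its
    # causal context neighborhood (e.g. coefficients B, E, F, H, I).
    # bin 1 context: count of significant (non-zero) neighbors.
    # remaining-bins context: counts of neighbors equal to 1 and of
    # neighbors greater than 1, per the text.
    n_sig = sum(1 for c in neighborhood if c != 0)
    n_eq1 = sum(1 for c in neighborhood if abs(c) == 1)
    n_gt1 = sum(1 for c in neighborhood if abs(c) > 1)
    return n_sig, (n_eq1, n_gt1)
```

For a neighborhood holding levels 0, 1, -1, 3 and 0, the bin 1 context index would be 3 and the remaining-bins context would be selected by the pair (2, 1).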
There are many possibilities in selecting a context neighborhood. However, the context neighborhood should be made of coefficients so that both the encoder and decoder can access the same information. In particular, the coefficients B, F, E, I and H in the neighborhood should be causal neighbors in the sense that they have been previously encoded or decoded and can be used for reference in determining the context for coefficient X.
The context described above with reference to fig. 12 is one of many possibilities. Such context may be applied to any of the three scans currently proposed for use in HEVC: diagonal, horizontal and vertical. The invention proposes that the context neighborhood used for deriving the context of the coefficient level may be the same as the context neighborhood used for deriving the context of the significance map. For example, the context neighborhood used to derive the context of the coefficient level may be a local neighborhood, as is the case in the coding of significance maps.
As described in more detail above, this disclosure proposes scanning the significant coefficients using an inverse scan order to form a significance map. The inverse scan order may be an inverse zigzag pattern, a vertical pattern, or a horizontal pattern, as shown in fig. 6. The context neighborhood shown in fig. 12 will become non-causal if the scan order for coefficient level scanning is also the inverse mode. The invention proposes to reverse the position of the context neighbourhood so that it is causal with respect to the inverse scan order. FIG. 13 shows an example of a context neighborhood for an inverse scan order.
As shown in fig. 13, for a hierarchical scan in the reverse direction from the last significant coefficient to the DC coefficient position, the context neighborhood for coefficient X includes coefficients B, E, F, H and I, which are associated with higher frequency positions relative to the position of coefficient X. Given the inverse scan, coefficients B, E, F, H and I have already been processed by the encoder or decoder prior to coding coefficient X, and are therefore causal in the sense that they are available for use. Similarly, such a context neighborhood may be applied to the coefficient level.
In one example, this disclosure further proposes another technique for significance map coding that utilizes contexts selected to support inverse scanning. As discussed above, highly adaptive context selection methods have previously been proposed for significance map coding of 16 × 16 and 32 × 32 transform coefficient blocks for HEVC. For example, as described above with reference to fig. 10, such an approach divides a 16 × 16 block into four regions, with each position in region 41 having its own context, region 37 having 3 contexts, region 35 having another 3 contexts, and region 39 having 5 contexts. As an example, the context selection for transform coefficient X is based on the sum of the significance of a maximum of 5 positions B, E, F, H and I. Because X is independent of the other positions on the same diagonal line as X along the scan direction, the contexts for the significance of transform coefficients along a diagonal line in the scan order can be computed in parallel from the previous diagonal lines in the scan order.
The current HEVC approach to context derivation has several drawbacks. One issue is the number of contexts per block. Having more contexts means more memory and more processing each time the contexts are refreshed. Therefore, it would be beneficial to have an algorithm that uses few contexts and, accordingly, few ways of deriving a context (e.g., fewer than the four ways in the foregoing example).
One way to address these drawbacks is to code the significance map in reverse order, that is, from the last significant coefficient (higher frequency) to the DC component (lowest frequency). A consequence of processing in reverse order is that the contexts designed for the forward scan are no longer valid. The techniques described above include a method for determining a context for context adaptive binary arithmetic coding (CABAC) of information indicating whether a current coefficient is significant, based on previously coded significant coefficients in the inverse scan direction. In the example of an inverse zig-zag scan, the previously coded significant coefficients reside at positions to the right of the scan line on which the current significant coefficient resides.
Context generation may be different for different locations of the transform block based at least on a distance from the boundary and a distance from the DC component. In the example techniques described above, it is proposed that significance map coding utilize the set of contexts depicted in fig. 11.
The present invention proposes a context set for inverse significance map scanning that can achieve higher performance by reducing the number of contexts per block. Referring back to fig. 11, a reduction in the number of contexts per block can be achieved by allowing the left region 47 and the top region 45 to use the same context derivation as the remaining region 49. In the inverse scan, this is possible because the adjacent locations labeled I, H, F, E and B are available for coefficients at regions 47 and 45.
Fig. 14 shows an example of context derivation for this example. In this example, there are only two context regions: a low frequency region 57 for the DC coefficient, and the remaining region 59 for all other coefficients. Thus, this example proposes only two ways to derive a context. In the low frequency region 57 (the DC coefficient at x, y position [0,0]), the context is derived based on position, i.e., the DC coefficient has its own context. In the remaining region 59, contexts are derived based on the significance of the neighboring coefficients in the local neighborhood of each coefficient to be coded. In this example, the context depends on the sum of the significance of the 5 neighbors represented by I, H, F, E and B in fig. 14.
Thus, the number of ways to derive context within a block is reduced from 4 to 2. Furthermore, the number of contexts is reduced by 8 relative to the previous example in fig. 11 (2 for the lower frequency region 43 and 3 for each of the upper region 45 and the left region 47). In another example, the DC coefficient may use the same method as the rest of the block, thus reducing the number of ways to derive context within the block to 1.
Fig. 15 shows an example in which the current position of coefficient X causes some of the neighboring coefficients (H and B in this case) to fall outside the current block. If any neighbor of the current coefficient is outside the block, such neighboring coefficients may be assumed to have 0 significance (i.e., they are treated as zero-valued and thus not significant). Alternatively, one or more special contexts may be specified for one or more of the lower-right coefficients. In this way, the higher frequency coefficients can have a position-dependent context, in a manner similar to the DC coefficient. However, assuming that the out-of-block neighbors are zero may provide adequate results, particularly because the bottom-right coefficients generally have a lower probability of being significant, or at least a lower probability of having significant coefficients of larger value.
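A sketch of this two-region derivation, with out-of-block neighbors assumed not significant, follows. The neighbor offsets are illustrative stand-ins for positions I, H, F, E and B; the exact layout depends on the scan:

```python
# Assumed (dx, dy) offsets toward higher-frequency positions,
# standing in for I, H, F, E and B of the figures.
NEIGHBORS = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]

def significance_context(sig_map, x, y, n):
    # Only two ways to derive a context: the DC coefficient has its
    # own context, and every other coefficient derives its context
    # from the sum of the significance of its causal neighbors.
    # Neighbors outside the n x n block count as not significant.
    if (x, y) == (0, 0):
        return "DC"
    total = 0
    for dx, dy in NEIGHBORS:
        nx, ny = x + dx, y + dy
        if nx < n and ny < n:
            total += 1 if sig_map[ny][nx] else 0
    return total  # 0..5 selects among the shared contexts
```

With all neighbors significant, an interior coefficient yields the maximum sum of 5, while a bottom-right coefficient whose whole neighborhood lies outside the block yields 0.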
The reduction in the number of contexts in the example of fig. 14 is beneficial for implementation. However, this may result in a slight degradation of performance. The present invention proposes another technique to improve performance while still reducing the number of contexts. In particular, it is proposed to have a second set of contexts that are also based on neighboring coefficients. The context derivation algorithm is identical, but uses two sets of contexts with different probability models. The set of contexts used depends on the position of the coefficient to be coded within the transform unit.
That is, increased performance has been demonstrated when a context model is used for higher frequency coefficients (e.g., coefficients at lower-right x, y coordinate positions) that is different from the one used for coefficients at lower frequencies (e.g., coefficients at upper-left x, y coordinate positions). One way to separate the lower frequency coefficients from the higher frequency coefficients, and thus to separate the context models for each, is to compute the value x + y for each coefficient, where x is the horizontal position and y is the vertical position of the coefficient. If this value is less than some threshold (a threshold of 4 has been shown to be very effective), then context set 1 is used. If the value is equal to or greater than the threshold, then context set 2 is used. Likewise, context sets 1 and 2 have different probability models.
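The x + y rule can be sketched as follows (the function name is hypothetical; the strict inequality follows the sentence above, and the threshold of 4 is the example value):

```python
def context_set_for_position(x, y, threshold=4):
    # Position-based choice between the two context sets: set 1 covers
    # the lower frequencies (x + y below the threshold), set 2 covers
    # the higher frequencies, each with its own probability models.
    return 1 if x + y < threshold else 2
```

Both sets share the same neighbor-based derivation criteria; only the probability models behind the selected set differ.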
FIG. 16 shows an example of the context regions for this example. Likewise, the DC coefficient at position (0,0) has its own context region 61. The lower frequency context region 63 consists of the transform coefficients at positions with x + y equal to or less than a threshold of 4 (not including the DC coefficient). The higher frequency context region 65 consists of the transform coefficients at positions with x + y greater than the threshold of 4. The threshold of 4 is used as an example, and may be adjusted to any number that achieves better performance. In another example, the threshold may depend on the TU size.
The context derivation for the lower frequency context region 63 and the higher frequency context region 65 is identical in the way that neighbors are used to select contexts, but the probabilities (i.e., contexts) employed are different. In particular, the same criteria for neighbor-based context selection may be used, but applying such criteria results in different contexts being selected for different coefficient positions, as different coefficient positions may be associated with different sets of contexts. In this way, knowledge that the lower and higher frequency coefficients have different statistics is incorporated into the algorithm so that different sets of contexts for the different coefficients can be used.
In other examples, the x + y function may be changed to other functions of the position of the coefficients. For example, one option is to give the same set of contexts to all coefficients with x < T and y < T, where T is a threshold. Fig. 17 shows an example of a transform coefficient block having these context regions. Again, the DC coefficient at position (0,0) may have its own context region 61. The lower frequency context region 73 consists of the transform coefficients whose x and y positions are both less than the threshold of 4 (excluding the DC coefficient). The higher frequency context region consists of the remaining transform coefficients, i.e., those with an x or y position greater than or equal to the threshold of 4. Again, the threshold of 4 is used as an example and may be adjusted to any number that achieves better performance. In one example, the threshold may depend on the TU size.
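The alternative region function from this paragraph can be sketched the same way. One reading of the criterion above (assumed here) is that the lower-frequency set covers coefficients with both x and y below the threshold T:

```python
def select_context_set_rect(x, y, t=4):
    """Variant region function: one context set when x < t and y < t."""
    if x == 0 and y == 0:
        return "dc"            # separate context region for DC
    if x < t and y < t:
        return "low_freq"      # rectangular lower-frequency region
    return "high_freq"         # everything else
```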
The techniques described above and shown in figs. 16 and 17 use two sets of 5 contexts, which is still a small number of contexts relative to the number of contexts shown in fig. 10, and they exhibit higher performance. This is achieved by partitioning the block into different regions and specifying a different set of contexts for the coefficients in each region, while still applying the same context derivation criteria in each region.
Fig. 18 shows another example of a transform coefficient block having a context region. In this example, the DC coefficients in region 81 and the coefficients at the x, y positions (1,0) and (0,1) in regions 83 and 85 each have their own context. The remaining region 87 has yet another context. In a variation of the example shown in fig. 18, regions 83 and 85 share a context.
In general, the techniques described above may include scanning significant coefficients in a transform coefficient block in an inverse direction from higher frequency coefficients in the transform coefficient block to lower frequency coefficients in the transform coefficient block to form a significance map, and determining contexts for Context Adaptive Binary Arithmetic Coding (CABAC) of the significant coefficients of the significance map based on a local neighborhood of previously scanned coefficients in the block. A context may be determined for each of the significant coefficients based on previously scanned transform coefficients in the local neighborhood having a higher frequency than the significant transform coefficients. In some examples, the context may be determined based on a sum of a number of significant coefficients in previously scanned coefficients of a context neighborhood. A local neighborhood of each of the significant coefficients to be coded may comprise a plurality of transform coefficients that spatially neighbor respective coefficients in the block.
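As a hedged illustration (not the HEVC-specified derivation), the neighbor-based context selection just summarized might look as follows. The neighborhood shape and the clipping to a set of 5 contexts are assumptions for this sketch.

```python
def significance_context(sig, x, y, width, height):
    """Select a context for the significance flag at (x, y), where
    sig[ny][nx] is 1 for already-coded significant coefficients
    (higher-frequency positions, scanned earlier in the reverse scan)."""
    if x == 0 and y == 0:
        return "ctx_dc"  # separate context for the DC position
    # Assumed local neighborhood of previously scanned (higher-frequency)
    # positions; positions outside the block count as zero-valued.
    neighbors = [(x + 1, y), (x, y + 1), (x + 1, y + 1),
                 (x + 2, y), (x, y + 2)]
    count = sum(1 for nx, ny in neighbors
                if nx < width and ny < height and sig[ny][nx])
    return "ctx_%d" % min(count, 4)  # clip to a small context set
```

The same function can serve every block size, which is the context-count reduction discussed above.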
The context of the significant coefficient at the DC (i.e., upper-left) position of the transform coefficient block may be determined based on an individual context specified for the significant coefficient at the DC position. Further, contexts may be determined for coefficients at the left and top edges of the block using criteria substantially similar or equivalent to the criteria used to determine contexts for coefficients not at the left and top edges of the block. In some examples, the context of the coefficient at the bottom-right-most position of the block may be determined using a criterion that assumes that neighboring coefficients outside the block are zero-valued coefficients. Further, in some examples, determining the context may include selecting a context within a context set using substantially similar or equivalent criteria, but with different context sets based on the position of the coefficient within the transform coefficient block.
References in this disclosure to upper, lower, right, left, and the like are generally used for convenience to refer to the relative positions of higher and lower frequency coefficients in a transform coefficient block arranged in the conventional manner, i.e., with the lower frequency coefficients toward the upper left of the block and the higher frequency coefficients toward the lower right of the block. This should not be taken as limiting, as the higher and lower frequency coefficients may be arranged in a different, unconventional manner.
Returning to fig. 5, in some examples, transform module 52 may be configured to zero out certain transform coefficients (that is, transform coefficients in certain positions). For example, transform module 52 may be configured to zero out all transform coefficients outside of the upper-left quadrant of the TU following the transform. As another example, entropy encoding unit 56 may be configured to zero out transform coefficients in the array following a certain position in the array. In any case, video encoder 20 may be configured to zero out some portion of the transform coefficients, e.g., before or after the scan. The phrase "zero out" is used to mean setting the value of a coefficient equal to zero, not necessarily skipping or discarding the coefficient. In some examples, this setting of coefficients to zero may be in addition to the zeroing that may result from quantization.
Inverse quantization unit 58 and inverse transform module 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the frames of reference frame buffer 64. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion-compensated prediction block generated by motion compensation unit 44 to produce a reconstructed video block for storage in reference frame buffer 64. The reconstructed video block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-code a block in a subsequent video frame.
Fig. 19 is a block diagram illustrating an example of entropy encoding unit 56 used in the video encoder of fig. 5. Fig. 19 illustrates various functional aspects of entropy encoding unit 56 for selecting the scanning order and corresponding set of contexts used in CABAC entropy coding. Entropy encoding unit 56 may include a scan order and context selection unit 90, a 2D/1D scan unit 92, an entropy encoding engine 94, and a scan order memory 96.
Scan order and context selection unit 90 selects the scan order used by 2D/1D scan unit 92 for the significance map scan and the coefficient level scan. As discussed above, the scan order consists of both a scan pattern and a scan direction. Scan memory 96 may store instructions and/or data defining which scan order to use in a particular situation. As examples, the scan order may be selected based on the prediction mode of the frame or slice, the block size, the transform, or other characteristics of the video data. In one proposal for HEVC, each of the intra-prediction modes is assigned a particular scan order (sub-block diagonal, horizontal, or vertical); the decoder parses the intra-prediction mode and uses a lookup table to determine the scan order to apply. In another example, an adaptive approach may be used that tracks statistics of the most frequently significant coefficients, so that the most frequently significant coefficients are placed first in the scan order. As another example, scan order and context selection unit 90 may use a predetermined scan order for all cases. As described above, scan order and context selection unit 90 may select a scan order for both the significance map scan and the coefficient level scan. In accordance with the techniques of this disclosure, the two scans may have the same scan order, and in particular both may proceed in the reverse direction.
Based on the selected scan order, scan order and context selection unit 90 also selects contexts to be used for CABAC in entropy encoding engine 94, such as the contexts described above with reference to fig. 11 and 13-18.
The 2D/1D scanning unit 92 applies the selected scan to the two-dimensional array of transform coefficients. In particular, 2D/1D scanning unit 92 may scan the transform coefficients in the subset, as described above with reference to fig. 7-9. In particular, transform coefficients are scanned in a subset consisting of a plurality of consecutive coefficients according to a scanning order. These subsets may be applicable to both significance map scans as well as coefficient level scans. In addition, the 2D/1D scan unit 92 may perform significance map and coefficient level scanning as a continuous scan and according to the same scan order. As described above, the continuous scan may consist of several scans. In one example, the first scan is a significance map scan, the second scan is a scan of bin 1 of a transform coefficient level in each subset, the third scan is a scan of the remaining bins of transform coefficient levels, and the fourth scan is a scan of the sign of the transform coefficient levels.
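The four consecutive passes over one subset can be sketched as follows. The emit() callback stands in for the actual CABAC bin coding, and the binarization of the remaining level bins (|level| − 2) is a simplified assumption for illustration.

```python
def code_subset(coeffs, emit):
    """Code one subset of coefficients (already in scan order) in four passes."""
    # Pass 1: significance map
    for c in coeffs:
        emit("sig", 1 if c != 0 else 0)
    # Pass 2: bin 1 of each level (|level| > 1 flag)
    for c in coeffs:
        if c != 0:
            emit("gt1", 1 if abs(c) > 1 else 0)
    # Pass 3: remaining bins of the level
    for c in coeffs:
        if abs(c) > 1:
            emit("rem", abs(c) - 2)
    # Pass 4: sign of each significant level
    for c in coeffs:
        if c != 0:
            emit("sign", 1 if c < 0 else 0)
```

Because every pass walks the subset in the same order, the significance and level information stay aligned with a single shared scan order.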
Entropy encoding engine 94 applies an entropy encoding process to the scanned coefficients using the contexts selected by scan order and context selection unit 90. In some instances, the context for CABAC may be predetermined for all cases, and thus a process or unit for selecting a context may not be needed. The entropy encoding process may be applied to the coefficients after they are fully scanned into a 1D vector, or as each coefficient is added to the 1D vector. In other examples, the coefficients are processed directly in the 2D array using the scan order. In some cases, entropy encoding engine 94 may be configured to encode different sections of the 1D vector in parallel to facilitate parallelization of the entropy encoding process and thereby increase speed and efficiency. Entropy encoding engine 94 produces a bitstream carrying the encoded video. The bitstream may be transmitted to another device or stored in a data storage archive for later retrieval. In addition to the residual transform coefficient data, the bitstream may carry motion vector data and various syntax elements that may be used to decode the encoded video in the bitstream.
Additionally, entropy encoding unit 56 may provide signaling in the encoded video bitstream to indicate the scan order and/or the contexts used in the CABAC process. For example, the scan order and/or contexts may be signaled as syntax elements at various levels (e.g., the frame, slice, LCU, CU, or TU level). If a predetermined scan order and/or context is set, there may be no need to provide signaling in the encoded bitstream. Furthermore, in some examples, video decoder 30 may infer some parameter values without signaling. To permit different scan orders to be defined for different TUs, such syntax elements may need to be signaled at the TU level (e.g., in the TU quadtree header). Although signaling in the encoded video bitstream is described for purposes of illustration, information indicative of the parameter values or functions could instead be signaled out-of-band in side information.
In this context, signaling the scan order and/or context in the encoded bitstream does not require such elements to be transmitted from the encoder to the decoder in real-time, but rather means that such syntax elements are encoded into the bitstream and made accessible to the decoder in any way. This may include real-time transmission (e.g., in a video conference) and storing the encoded bitstream on a computer-readable medium for future use by a decoder (e.g., by streaming, downloading, disk access, card access, DVD, blu-ray, etc.).
It should be noted that while shown as separate functional units for ease of illustration, the structure and functionality of the scan order and context selection unit 90, the 2D/1D scan unit 92, the entropy encoding engine 94, and the scan order memory 96 may be highly integrated with one another.
Fig. 20 is a block diagram illustrating an example of video decoder 30 that decodes an encoded video sequence. In the example of fig. 20, video decoder 30 includes an entropy decoding unit 70, a motion compensation unit 72, an intra-prediction module 74, an inverse quantization unit 76, an inverse transform unit 78, a reference frame buffer 82, and a summer 80. In some examples, video decoder 30 may perform a decoding pass that is generally reciprocal to the encoding pass described with respect to video encoder 20 (fig. 5).
Entropy decoding unit 70 entropy decodes the encoded video in a process that is the inverse of that used by entropy encoding unit 56 of fig. 5. Motion compensation unit 72 may generate prediction data based on motion vectors received from entropy decoding unit 70. Intra-prediction module 74 may generate prediction data for a current block of a current frame based on the signaled intra-prediction mode and data from previously decoded blocks of the current frame.
In some examples, entropy decoding unit 70 (or inverse quantization unit 76) may scan the received values using a scan that mirrors the scan order used by entropy encoding unit 56 (or quantization unit 54) of video encoder 20. While the scanning of coefficients may be performed in inverse quantization unit 76, the scanning will be described as being performed by entropy decoding unit 70 for purposes of illustration. In addition, although shown as separate functional units for ease of illustration, the structure and functionality of entropy decoding unit 70, inverse quantization unit 76, and other units of video decoder 30 may be highly integrated with one another.
In accordance with the techniques of this disclosure, video decoder 30 may scan both the significance map and the transform coefficient levels according to the same scan order. That is, the scan order for significance map and level coding should have the same pattern and direction. In addition, video decoder 30 may use an inverse scan order for the significance map, i.e., a scan proceeding in the reverse direction. As another example, video decoder 30 may use scan orders for significance map and level coding that are harmonized in the reverse direction.
In another aspect of the present invention, video decoder 30 may scan the transform coefficients in the subset. In particular, transform coefficients are scanned in a subset consisting of a plurality of consecutive coefficients according to a scanning order. These subsets may be applicable to both significance map scans as well as coefficient level scans. In addition, video decoder 30 may perform significance map scanning and coefficient level scanning according to the same scanning order as consecutive scans. In one aspect, the scan order is an inverse scan order. The continuous scan may consist of several scan passes. In one example, the first scan is a significance map scan, the second scan is a scan of bin 1 in a transform coefficient level in each subset, the third scan is a scan of the remaining bins of the transform coefficient level, and the fourth scan is a scan of the sign of the transform coefficient level.
Video decoder 30 may receive signaling from the encoded bitstream identifying the scanning order and/or context used by video encoder 20 for CABAC. Additionally, or alternatively, scan order and context may be inferred by video decoder 30 based on characteristics of the coded video such as prediction mode, block size, or other characteristics. As another example, video encoder 20 and video decoder 30 may use a predetermined scanning order and context for all use cases, and thus, signaling in the encoded bitstream would not be needed.
Regardless of how the scan order is determined, entropy decoding unit 70 scans the 1D vector into the 2D array using the inverse of the scan order. The 2D array of transform coefficients produced by entropy decoding unit 70 may be quantized and may generally match the 2D array of transform coefficients scanned by entropy encoding unit 56 of video encoder 20 to produce a 1D vector of transform coefficients.
Inverse quantization unit 76 inverse quantizes, i.e., dequantizes, the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 70. The inverse quantization process may include conventional processes, e.g., similar to the processes proposed for HEVC or defined by the H.264 decoding standard. The inverse quantization process may include use of a quantization parameter QP calculated by video encoder 20 for the CU to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied. Inverse quantization unit 76 may inverse quantize the transform coefficients either before or after the coefficients are converted from a 1D vector to a 2D array.
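As a rough illustration of the QP-to-step relationship (in H.264/HEVC-style designs the quantization step size roughly doubles for every increase of 6 in QP), a minimal dequantizer might look like the sketch below. The exact scaling lists, shifts, and rounding of a real codec are deliberately omitted.

```python
def dequantize(qcoeffs, qp):
    """Rescale quantized coefficients with a simplified uniform step size."""
    step = 2.0 ** (qp / 6.0)  # assumed step-size relation; illustrative only
    return [q * step for q in qcoeffs]
```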
The inverse transform module 78 applies an inverse transform, such as an inverse DCT, an inverse integer transform, an inverse KLT, an inverse rotational transform, an inverse directional transform, or another inverse transform. In some examples, inverse transform module 78 may determine the inverse transform based on signaling from video encoder 20 or by inferring the transform from one or more coding characteristics, such as block size, coding mode, and so forth. In some examples, inverse transform module 78 may determine a transform to apply to the current block based on a transform signaled at a root node of a quadtree for an LCU that includes the current block. In some instances, the inverse transform module 78 may apply a cascaded inverse transform.
Motion compensation unit 72 generates motion compensated blocks, possibly performing interpolation based on interpolation filters. An identifier of an interpolation filter to be used for motion estimation with sub-pixel precision may be included in the syntax element. Motion compensation unit 72 may calculate interpolated values for sub-integer pixels of the reference block using interpolation filters used by video encoder 20 during encoding of the video block. Motion compensation unit 72 may determine the interpolation filters used by video encoder 20 according to the received syntax information and use the interpolation filters to generate predictive blocks.
In the HEVC example, motion compensation unit 72 and intra-prediction module 74 may use some of the syntax information (e.g., provided by a quadtree) to determine the sizes of the LCUs used to encode the frames of the encoded video sequence. Motion compensation unit 72 and intra-prediction module 74 may also use the syntax information to determine split information that describes how each CU of a frame of the encoded video sequence is split (and, likewise, how sub-CUs are split). The syntax information may also include modes indicating how each split is encoded (e.g., intra- or inter-prediction, and, for intra-prediction, an intra-prediction encoding mode), one or more reference frames (and/or reference lists containing identifiers for the reference frames) for each inter-encoded PU, and other information for decoding the encoded video sequence.
Summer 80 combines the residual block with a corresponding predictive block generated by motion compensation unit 72 or intra-prediction module 74 to form a decoded block. A deblocking filter may also be applied to filter the decoded blocks, if desired, in order to remove blockiness artifacts. The decoded video blocks are then stored in a reference frame buffer 82, the reference frame buffer 82 providing reference blocks for subsequent motion compensation and also generating decoded video for presentation on a display device (such as display device 32 of fig. 4).
As mentioned above, the techniques for scanning transform coefficients presented in this disclosure may be applicable to both encoders and decoders. The video encoder may apply the scan order to scan the transform coefficients from the two-dimensional array to the one-dimensional array, while the video decoder may apply the scan order to scan the transform coefficients from the one-dimensional array to the two-dimensional array, e.g., in a reverse manner to the encoder. Alternatively, the video decoder may apply the scan order to scan the transform coefficients from the one-dimensional array to the two-dimensional array, and the video encoder may apply the scan order to scan the transform coefficients from the two-dimensional array to the one-dimensional array in a reverse manner to the decoder. Thus, a scan by a decoder may refer to a 2D/1D scan by an encoder, or a 1D/2D scan by a decoder. In addition, scanning according to a scanning order may refer to scanning in a scanning order for 2D/1D scanning, scanning in a scanning order for 1D/2D scanning, scanning in an inverse order of a scanning order for 1D/2D scanning, or scanning in an inverse order of a scanning order for 2D/1D scanning. Thus, the scan order may be established for a scan by an encoder or a scan by a decoder.
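The encoder/decoder scan symmetry described here can be sketched with a shared scan order represented as a list of (x, y) positions; the small diagonal order in the test below is only an assumed example.

```python
def scan_2d_to_1d(block, order):
    """Encoder-side scan: serialize a 2D block into a 1D list."""
    return [block[y][x] for (x, y) in order]

def scan_1d_to_2d(vector, order, width, height):
    """Decoder-side scan: place 1D values back into a 2D block."""
    block = [[0] * width for _ in range(height)]
    for value, (x, y) in zip(vector, order):
        block[y][x] = value
    return block
```

Applying `scan_1d_to_2d` with the same order the encoder used reconstructs the original 2D array, which is the symmetry this paragraph describes.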
Video decoder 30 may operate in a manner essentially symmetrical to that of video encoder 20. For example, video decoder 30 may receive entropy-encoded data representative of an encoded CU, including encoded PU and TU data. Video decoder 30 may entropy decode the received data to form quantized transform coefficients. When video encoder 20 entropy encodes data using an arithmetic coding algorithm (e.g., CABAC), video decoder 30 may decode the data using a context model corresponding to the same context model that video encoder 20 used to encode the data.
Video decoder 30 may also inverse scan the decoded coefficients with an inverse scan that mirrors the scan used by video encoder 20. To inverse scan the coefficients, video decoder 30 selects the same scanning order used by video encoder 20, which may be stored at the decoder or signaled by the encoder in the encoded bitstream. Using this scanning order, video decoder 30 thus forms a two-dimensional matrix from the one-dimensional vectors of quantized transform coefficients generated by the entropy decoding process. In particular, video decoder 30 inverse scans the coefficients from a one-dimensional array to a two-dimensional array according to the scan order used by video encoder 20.
Next, video decoder 30 may inverse quantize coefficients in a two-dimensional matrix generated by the inverse scan performed according to the scan order. Video decoder 30 may then apply one or more inverse transforms to the two-dimensional matrix. The inverse transform may correspond to a transform applied by video encoder 20. For example, video decoder 30 may determine the inverse transform to apply based on information signaled at the root of the quadtree corresponding to the currently decoded CU or with reference to other information indicating an appropriate inverse transform. Upon applying the inverse transform, video decoder 30 recovers the residual video data in the pixel domain and applies the applicable intra-prediction or inter-prediction decoding to reconstruct the original video data.
Fig. 21 is a block diagram illustrating an example of entropy decoding unit 70 for the video decoder of fig. 20. Fig. 21 illustrates various functional aspects of entropy decoding unit 70 for selecting a scan order and context for CABAC decoding in a video decoding process. As shown in fig. 21, entropy decoding unit 70 may include a scan order and context selection unit 100, a 1D/2D scanning unit 102, an entropy decoding engine 104, and a scan order memory 106.
Entropy decoding engine 104 entropy decodes encoded video transmitted to video decoder 30 or retrieved from storage by video decoder 30. For example, entropy decoding engine 104 may apply an entropy decoding process, such as CAVLC, CABAC, or another process, to the bitstream carrying the encoded video to recover the 1D vector of transform coefficients. In addition to the residual transform coefficient data, entropy decoding engine 104 may apply entropy decoding to recover motion vector data and various syntax elements that may be used to decode the encoded video in the bitstream. Entropy decoding engine 104 may determine which entropy decoding process to select, e.g., CAVLC, CABAC, or another process, based on signaling in the encoded video bitstream or by inferring the appropriate process from other information in the bitstream.
In accordance with the techniques of this disclosure, entropy decoding engine 104 may entropy decode the encoded video from two different context regions using CABAC. The scan order and context selection unit 100 may provide the context derivation to the entropy decoding engine 104. According to an example of this disclosure, the context derivation for the first context region depends on the position of the transform coefficients, while the context derivation for the second region depends on causal neighbors of the transform coefficients. In another example, the second context region may use two different context models, depending on the location of the transform coefficients.
Scan order and context selection unit 100 may also determine the scan order, and/or an indication of the scan order, based on signaling in the encoded video bitstream. For example, entropy decoding unit 70 may receive syntax elements that explicitly signal the scan order. Also, although signaling in the encoded video bitstream is described for purposes of illustration, the scan order could be received by entropy decoding unit 70 as out-of-band information in side information. Furthermore, in some examples, scan order and context selection unit 100 may infer the scan order without signaling. The scan order may be based on the prediction mode, block size, transform, or other characteristics of the encoded video. Like memory 96 of fig. 19, memory 106 of fig. 21 may store instructions and/or data defining the scan order.
1D/2D scan unit 102 receives the scan order from scan order and context selection unit 100 and applies the scan order, either directly or in an inverse manner, to control the scanning of the coefficients. In accordance with the techniques of this disclosure, the same scan order may be used for both the significance map scan and the coefficient level scan. In another aspect of this disclosure, the significance map scan may proceed in the reverse direction. In another aspect of this disclosure, both the significance map scan and the coefficient level scan may proceed in the reverse direction.
According to another aspect of this disclosure, 1D/2D scan unit 102 may scan a one-dimensional array of transform coefficients into one or more subsets of transform coefficients, code the significance of the transform coefficients in each subset, and code the levels of the transform coefficients in each subset. In another aspect of this disclosure, the significance map scan and the coefficient level scan are performed in consecutive scans according to the same scan order. In one aspect, the scan order is an inverse scan order. The consecutive scans may consist of several scan passes, where the first pass is the significance map scan, the second pass is a scan of bin 1 of the transform coefficient levels in each subset, the third pass is a scan of the remaining bins of the transform coefficient levels, and the fourth pass is a scan of the signs of the transform coefficient levels.
At the encoder side, coding of transform coefficients may include encoding the transform coefficients according to a scan order to form a one-dimensional array of transform coefficients. On the decoder side, coding the transform coefficients may include decoding the transform coefficients according to a scan order to reconstruct a two-dimensional array of transform coefficients in a transform block.
It should be noted that while shown as separate functional units for ease of illustration, the structure and functionality of the scan order and context selection unit 100, the 1D/2D scan unit 102, the entropy decoding engine 104, and the scan order memory 106 may be highly integrated with one another.
Fig. 22 is a flow diagram illustrating an example process for significance map and coefficient level scanning using a coordinated scanning order. A method of coding a plurality of transform coefficients associated with residual video data in a video coding process is presented. The method may be performed by a video coder, such as video encoder 20 or video decoder 30 of fig. 4. The video coder may be configured to select a scanning order (120). The scan order may be selected based on a prediction mode, block size, transform, or other characteristics of the encoded video. Additionally, the scan order may be a default scan order. The scan order defines both the scan pattern and the scan direction. In one example, the scan direction is an inverse scan direction proceeding from higher frequency coefficients in the plurality of transform coefficients to lower frequency coefficients in the plurality of transform coefficients. The scan mode may include one of a zig-zag mode, a diagonal mode, a horizontal mode, or a vertical mode.
The video coder may be further configured to code information indicative of significant coefficients of the plurality of transform coefficients according to a scan order (122), and determine contexts for coding significant coefficient levels for a plurality of subsets of significant coefficients, wherein each of the plurality of subsets comprises one or more significant coefficients scanned according to the scan order (124). The video coder also codes (126) information indicative of levels of the plurality of transform coefficients according to the scan order. The subsets may be of different sizes. It should be noted that steps 122, 124, and 126 may be interleaved, as the determination of the context for the level information depends on previously coded neighboring coefficients.
Fig. 23 is a flow diagram illustrating another example process for significance map and coefficient level scanning and CABAC context derivation. The method of fig. 23 is slightly different from the method shown in fig. 22, as the contexts of different sized blocks may use the same context derivation criteria. As one example, a video coder may: deriving a first context for a first block of transform coefficients according to a context derivation criterion, the first block having a first size; and deriving a second context for a second block of transform coefficients according to the same context derivation criteria as the first block, the second block having a second, different size (123). As with fig. 22, steps 122, 123 and 126 may be interleaved, as the determination of the context for the level information depends on previously coded neighboring coefficients.
Fig. 24 is a flow diagram illustrating another example process for significance map and coefficient level scanning and CABAC context derivation. The method of fig. 24 differs slightly from the method shown in fig. 22 in that the context for the subset is determined based on the presence of a DC coefficient in the subset. As one example, the video coder may determine different sets of contexts for different subsets of coefficients based on whether the respective subsets contain DC coefficients for the transform coefficients (125). As with fig. 22, steps 122, 125 and 126 may be interleaved, as the determination of the context for the level information depends on previously coded neighboring coefficients.
Fig. 25 is a flow diagram illustrating another example process for significance map and coefficient level scanning and CABAC context derivation. The method of fig. 25 differs slightly from the method shown in fig. 22 because the context is determined based on the weighted number of significant coefficients in the other previous subsets. As one example, a video coder may determine different sets of contexts for different subsets of coefficients based on a number of significant coefficients in a previous subset of coefficients and weighted numbers of significant coefficients in other previous subsets of coefficients (127). As with fig. 22, steps 122, 127 and 126 may be interleaved, as the determination of the context for the level information depends on previously coded neighboring coefficients.
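A minimal sketch of this kind of subset context-set selection follows. The weights, threshold, and context-set indices below are illustrative assumptions, not values taken from the description:

```python
def select_context_set(prev_sig_counts, contains_dc, weights=None, threshold=4):
    """Pick a context-set index for a subset from the significant-coefficient
    counts of previously coded subsets.  prev_sig_counts[0] is the most
    recently coded subset, which is weighted most heavily; all numeric
    choices here are assumptions for illustration only."""
    if weights is None:
        weights = [1.0 / (i + 1) for i in range(len(prev_sig_counts))]
    score = sum(w * c for w, c in zip(weights, prev_sig_counts))
    base = 0 if contains_dc else 2  # the DC-containing subset gets its own sets
    return base + (1 if score > threshold else 0)
```

Under these assumptions, a subset without the DC coefficient whose predecessors held many significant coefficients selects a different context set than one whose predecessors were mostly zero.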
FIG. 26 is a flow diagram illustrating an example process for significance map coding using an inverse scan direction. A method of coding transform coefficients associated with residual video data in a video coding process is presented. The method may be performed by a video coder, such as video encoder 20 or video decoder 30 of fig. 4. The video coder may be configured to select a scan order having a reverse direction (140), and determine a context for Context Adaptive Binary Arithmetic Coding (CABAC) of information indicating a current one of the significant coefficients based on previously coded significant coefficients in the reverse scan direction (142). The video coder may be further configured to code information indicative of significant transform coefficients along the inverse scan direction to form a significance map (146).
In one example, the scan has a diagonal pattern, and the previously coded significant coefficients reside at positions to the right of the scan line on which the current one of the significant coefficients resides. In another example, the scan has a horizontal pattern, and previously coded significant coefficients reside at positions below the scan line on which the current one of the significant coefficients resides. In another example, the scan has a vertical pattern, and the previously coded significant coefficients reside at positions to the right of the scan line on which the current one of the significant coefficients resides.
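For illustration, a diagonal scan and its reversal might be generated as follows. This is only a sketch; the 4x4 block size in the usage note and the ordering of positions within each anti-diagonal are assumptions:

```python
def diagonal_scan(n):
    """Positions of an n x n block visited anti-diagonal by anti-diagonal,
    starting at the DC position (0, 0)."""
    order = []
    for d in range(2 * n - 1):      # d = x + y indexes the anti-diagonal
        for x in range(d + 1):
            y = d - x
            if x < n and y < n:
                order.append((x, y))
    return order

def inverse_diagonal_scan(n):
    """Inverse scan direction: from the highest-frequency position back
    toward DC."""
    return list(reversed(diagonal_scan(n)))

# In the inverse scan, every already-coded coefficient lies on a later
# anti-diagonal (larger x + y) than the current one, so a significance
# context template may draw on those neighbors without violating causality.
```

For a 4x4 block this visits position (3,3) first and (0,0) last.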
The video coder may be further configured to code information indicative of levels of significant transform coefficients (148). The step of coding information indicative of levels of significant transform coefficients may be performed in an inverse scan direction from higher frequency coefficients in the transform coefficient block to lower frequency coefficients in the transform coefficient block. As with fig. 22, steps 142, 146 and 148 may be interleaved, as the determination of the context for the level information depends on previously coded neighboring coefficients.
Fig. 27 is a flow diagram illustrating an example process for significance map and coefficient level scanning according to a subset of transform coefficients. A method of coding transform coefficients associated with residual video data in a video coding process is presented. The method may be performed by a video coder, such as video encoder 20 or video decoder 30 of fig. 4. The video coder may be configured to: arranging a block of transform coefficients into one or more subsets of transform coefficients (160); coding (162) the significance of the transform coefficients in each subset; and coding levels of transform coefficients in each subset (164). In one example, arranging the transform coefficient block may include arranging the transform coefficient block into a single set of transform coefficients corresponding to the entire transform unit. In another example, arranging the transform coefficient blocks may include arranging the transform coefficient blocks into one or more subsets of transform coefficients based on a scanning order.
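One way to form scan-order subsets is simply to chunk the scan into fixed-size runs. The subset size of 16 used below is a common choice but an assumption here, as is the helper name:

```python
def subsets_from_scan(scan_order, subset_size=16):
    """Partition a scan order (a list of coefficient positions) into
    consecutive subsets; the last subset may be shorter than the rest."""
    return [scan_order[i:i + subset_size]
            for i in range(0, len(scan_order), subset_size)]

# The single-set arrangement covering the entire transform unit is the
# degenerate case: subsets_from_scan(scan, subset_size=len(scan)).
```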
The video coder may be configured to code the significance of the transform coefficients in each subset according to a scan order, and code the transform coefficient levels according to the scan order. Coding the significance map (162) and coding the levels (164) may be performed together (165) in two or more consecutive scan passes of the subset.
Fig. 28 is a flow diagram illustrating another example process for significance map and coefficient level scanning according to a subset of transform coefficients. The video coder may perform the consecutive scan passes (165) by first coding (170) the significance of the transform coefficients in each subset in a first scan of the transform coefficients in the corresponding subset.
Coding (164) coefficient levels in each subset includes at least a second scan of transform coefficients in the respective subset. The second scan may include coding (172) bin 1 of transform coefficient levels in the subset in a second scan of transform coefficients in the respective subset, coding (174) remaining bins of levels of transform coefficients in the subset in a third scan of transform coefficients in the respective subset, and coding (176) signs of transform coefficient levels in the subset in a fourth scan of transform coefficients in the respective subset.
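The four passes over one subset can be sketched as follows. The `emit` callback stands in for the CABAC engine, and the exact binarization of the remaining level bins is an assumption for illustration:

```python
def code_subset(levels, emit):
    """Four scan passes over the levels of one subset: significance flags,
    bin 1 of each significant level, the remaining level bins, and signs."""
    for c in levels:                       # pass 1: significance map
        emit('sig', int(c != 0))
    for c in levels:                       # pass 2: bin 1 (is |level| > 1?)
        if c != 0:
            emit('bin1', int(abs(c) > 1))
    for c in levels:                       # pass 3: remaining bins
        if abs(c) > 1:
            emit('rem', abs(c) - 2)
    for c in levels:                       # pass 4: signs
        if c != 0:
            emit('sign', int(c < 0))
```

Moving pass 4 ahead of passes 2 and 3 would give the sign-first variant described with respect to fig. 29.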
Fig. 29 is a flow diagram illustrating another example process for significance map and coefficient level scanning according to a subset of transform coefficients. In this example, coding (176) of signs of transform coefficient levels is performed prior to coding (172, 174) of the levels.
Fig. 30 is a flow diagram illustrating an example process for entropy coding using multiple regions. A method of coding a plurality of transform coefficients associated with residual video data in a video coding process is presented. The method may be performed by a video coder, such as video encoder 20 or video decoder 30 of fig. 4. The video coder may be configured to code information indicative of significant coefficients of the plurality of transform coefficients according to a scan order (180), divide the coded information into a first region and a second region (182), entropy code the coded information in the first region according to a first set of contexts using context adaptive binary arithmetic coding (184), and entropy code the coded information in the second region according to a second set of contexts using context adaptive binary arithmetic coding (186). In one example, the scan order has an inverse direction and a diagonal scan pattern. This approach may also be applied to more than two regions, where each region has a set of contexts.
The first and second regions may be divided in several ways. In one example, the first region contains at least a DC component of the plurality of transform coefficients, and the second region contains the remaining plurality of transform coefficients that are not in the first region.
In another example, the first region contains all transform coefficients within a region defined by x + y < T, where x is the horizontal position of the transform coefficient, y is the vertical position of the transform coefficient, and T is a threshold. The first region may contain a DC coefficient. The second region contains the remaining plurality of transform coefficients that are not in the first region.
In another example, the first region contains all transform coefficients within a region defined by x < T and y < T, where x is the horizontal position of the transform coefficient, y is the vertical position of the transform coefficient, and T is a threshold. The second region contains the remaining plurality of transform coefficients that are not in the first region.
In another example, the first region contains a DC coefficient, the second region contains all transform coefficients (excluding the DC coefficient) within a region defined by x < T and y < T, where x is a horizontal position of the transform coefficient, y is a vertical position of the transform coefficient, and T is a threshold, and the third region contains the remaining plurality of transform coefficients that are not in the first region or the second region. In another example, the second and third regions described above may use the same method to derive the context, but use a different set of contexts for each region.
In another example, the first region includes the DC component and the transform coefficients at positions (1,0) and (0,1). The second region contains the remaining plurality of transform coefficients that are not in the first region.
In another example, the first region contains only the DC component of the plurality of transform coefficients, and the second region contains the remaining plurality of transform coefficients.
In general, a first context for each transform coefficient in a first region is based on a position of each transform coefficient in the first region, while a second context for each transform coefficient in a second region is based on coded information of causal neighbors of each transform coefficient. In some examples, the second context is further based on a position of each transform coefficient in the second region. In another example, the second context for each transform coefficient in the second region is based on the coded information of five causal neighbors of each transform coefficient.
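Under the x + y < T partition, region and context assignment might look like the following sketch. The threshold T, the clip value, and the context index maps are assumptions for illustration, not the normative derivation:

```python
def region_and_context(x, y, sig_neighbors, T=4):
    """Assign the coefficient at (x, y) to a region and derive its context.
    Region 1 (low frequencies, x + y < T) uses position-based contexts;
    region 2 uses the number of significant causal neighbors, here clipped
    to at most 4 (as with a five-neighbor template)."""
    if x + y < T:
        return 1, x + y                   # position-based context index
    return 2, min(sum(sig_neighbors), 4)  # neighbor-based context index
```

Each region thus keeps its own set of contexts, with the low-frequency region indexed purely by position and the remainder by previously coded neighborhood significance.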
In one or more examples, the functions described in this disclosure may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be performed by a hardware-based processing unit (e.g., one or more processors) executing software in the form of computer-readable instructions or code. Such instructions or code may be stored or transmitted on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to tangible, non-transitory media, such as data storage media, or any communication media that facilitates transfer of a computer program from one place to another, such as according to a communication protocol. In this manner, the computer-readable medium may generally correspond to (1) a tangible computer-readable storage medium, which is non-transitory, or (2) a communication medium, such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, flash memory, CD-ROM, or any other solid-state, optical or magnetic data storage medium, including optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor," as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be performed by a wide variety of devices or apparatuses, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets (e.g., so-called smart phones), televisions, cameras, display devices, digital media players, video game consoles, and so forth. In many cases, such devices may be equipped for wireless communication. Additionally, such techniques may be implemented by an Integrated Circuit (IC) or a group of ICs (e.g., a chipset). A device configured to perform the techniques of this disclosure may include any of the above devices, and in some cases may be a video encoder or a video decoder or a combined video encoder-decoder, i.e., a video codec, which may be formed by a combination of hardware, software, and firmware. Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit, in conjunction with suitable software and/or firmware, or provided by a collection of interoperating hardware units, including one or more processors as described above.
Various examples have been described. These and other examples are within the scope of the following claims.

Claims (20)

1. A method of coding transform coefficients associated with residual video data in a predicted video block coding process, the method comprising:
arranging a block of transform coefficients into one or more transform coefficient subsets;
decoding the position of the last significant coefficient in each subset;
coding significance of transform coefficients in each subset according to a scan order, wherein the scan order includes both a scan mode and a scan direction, wherein the scan direction is an inverse scan direction, and wherein the scan order starts from a position of a last significant coefficient in the each subset; and
coding levels of the transform coefficients in each subset according to the scan order of the inverse scan direction.
2. The method of claim 1, wherein the predicted video block coding process is a video decoding process, wherein coding significance of the transform coefficients in each subset comprises decoding significance of the transform coefficients in each subset, wherein coding levels of the transform coefficients in each subset comprises decoding levels of the transform coefficients in each subset, and wherein the method further comprises:
inverse transforming the block of transform coefficients to generate a block of residual video data;
performing a prediction process to generate predicted video data; and
decoding the predicted video block based on the predicted video data and the block of residual video data.
3. The method of claim 1, wherein the predicted video block coding process is a video encoding process, wherein coding significance of the transform coefficients in each subset comprises encoding significance of the transform coefficients in each subset, wherein coding levels of the transform coefficients in each subset comprises encoding levels of the transform coefficients in each subset, and wherein the method further comprises:
performing a prediction process on the predicted video block to generate predicted video data;
generating a block of residual video data based on the predicted video data; and
transforming the block of residual video data to generate the block of transform coefficients.
4. The method of claim 1, wherein arranging a block of transform coefficients comprises arranging a block of transform coefficients into a single set of transform coefficients corresponding to an entire transform unit.
5. The method of claim 1, wherein coding the significance of transform coefficients and coding the levels of transform coefficients employ Context Adaptive Binary Arithmetic Coding (CABAC).
6. The method of claim 1, wherein coding the significance of transform coefficients comprises coding significance of transform coefficients in each subset in a first scan of transform coefficients in a respective subset, and wherein coding the levels of the transform coefficients comprises coding the levels of transform coefficients in each subset in at least a second scan of transform coefficients in the respective subset.
7. The method of claim 1, wherein coding the significance of transform coefficients comprises coding significance of transform coefficients in each subset in a first scan of transform coefficients in a respective subset, and wherein coding the levels of the transform coefficients comprises:
coding bin 1 of the level of transform coefficients in each subset in a second scan of transform coefficients in the respective subset;
coding at least a portion of remaining bins of the levels of transform coefficients in each subset in a third scan of transform coefficients in the respective subset; and
coding signs of the levels of transform coefficients in each subset in a fourth scan of transform coefficients in the respective subset.
8. The method of claim 7, further comprising coding the signs of the levels of transform coefficients prior to coding the remaining bins of the levels of transform coefficients.
9. The method of claim 1, wherein coding the significance of transform coefficients comprises coding significance of transform coefficients in each subset in a first scan of transform coefficients in a respective subset, and wherein coding the levels of transform coefficients comprises:
coding bin 1 of the level of transform coefficients in each subset in a second scan of transform coefficients in the respective subset;
coding bin 2 of the level of transform coefficients in each subset in a third scan of transform coefficients in the respective subset;
coding signs of the levels of transform coefficients in each subset in a fourth scan of transform coefficients in the respective subset; and
coding at least a portion of remaining bins of the levels of transform coefficients in each subset in a fifth scan of transform coefficients in the respective subset.
10. A device that codes transform coefficients associated with residual video data in a predicted video block coding process, the device comprising:
a memory configured to store video data; and
a video coder in communication with the memory and configured to:
arranging a block of transform coefficients into one or more transform coefficient subsets;
decoding the position of the last significant coefficient in each subset;
coding significance of transform coefficients in each subset according to a scan order, wherein the scan order includes both a scan mode and a scan direction, wherein the scan direction is an inverse scan direction, and wherein the scan order starts from a position of a last significant coefficient in the each subset; and
coding levels of the transform coefficients in each subset according to the scan order of the inverse scan direction.
11. The device of claim 10, wherein the video coder is configured to decode the predicted video block, and wherein the video coder is further configured to:
inverse transforming the block of transform coefficients to generate a block of residual video data;
performing a prediction process to generate predicted video data; and
decoding the predicted video block based on the predicted video data and the block of residual video data.
12. The device of claim 10, wherein the video coder is configured to encode the predicted video block, and wherein the video coder is further configured to:
performing a prediction process on the predicted video block to generate predicted video data;
generating a block of residual video data based on the predicted video data; and
transforming the block of residual video data to generate the block of transform coefficients.
13. The device of claim 10, wherein the video coder is configured to arrange a block of transform coefficients into a single set of transform coefficients corresponding to an entire transform unit.
14. The device of claim 10, wherein the video coder is configured to code the significance and the level with Context Adaptive Binary Arithmetic Coding (CABAC).
15. The device of claim 10, wherein the video coder is configured to code significance of transform coefficients in each subset in a first scan of transform coefficients in a respective subset, and wherein the video coder is configured to code the levels of transform coefficients in each subset in at least a second scan of transform coefficients in the respective subset.
16. The device of claim 10, wherein the video coder is configured to:
coding significance of transform coefficients in each subset in a first scan of transform coefficients in a respective subset;
coding bin 1 of the level of transform coefficients in each subset in a second scan of transform coefficients in the respective subset;
coding at least a portion of remaining bins of the levels of transform coefficients in each subset in a third scan of transform coefficients in the respective subset; and
coding signs of the levels of transform coefficients in each subset in a fourth scan of transform coefficients in the respective subset.
17. The device of claim 16, wherein the video coder is configured to code the signs of the levels of transform coefficients prior to coding the remaining bins of the levels of transform coefficients.
18. The device of claim 16, wherein the video coder is further configured to:
coding significance of transform coefficients in each subset in a first scan of transform coefficients in the respective subset;
coding bin 1 of the level of transform coefficients in each subset in a second scan of transform coefficients in the respective subset;
coding bin 2 of the level of transform coefficients in each subset in a third scan of transform coefficients in the respective subset;
coding signs of the levels of transform coefficients in each subset in a fourth scan of transform coefficients in the respective subset; and
coding at least a portion of remaining bins of the levels of transform coefficients in each subset in a fifth scan of transform coefficients in the respective subset.
19. A non-transitory computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors of a device that codes transform coefficients associated with residual video data in a video decoding process to:
arranging a block of transform coefficients into one or more transform coefficient subsets;
decoding the position of the last significant coefficient in each subset;
decoding the significance of the transform coefficients in each subset according to a scan order, wherein the scan order includes both a scan mode and a scan direction, wherein the scan direction is an inverse scan direction, and wherein the scan order starts from the position of the last significant coefficient in the each subset;
decoding levels of the transform coefficients in each subset according to the scan order of the inverse scan direction;
inverse transforming the block of transform coefficients to generate residual video data;
performing a prediction process to generate predicted video data; and
decoding the block of video data based on the predicted video data and the residual video data.
20. The non-transitory computer-readable storage medium of claim 19, further comprising instructions that cause the one or more processors to:
decoding the significance of the transform coefficients in each subset in a first scan of the transform coefficients in the respective subset;
decoding bin 1 of the level of transform coefficients in each subset in a second scan of transform coefficients in the respective subset;
decoding at least a portion of remaining bins of the level of transform coefficients in each subset in a third scan of transform coefficients in the respective subset; and
decoding signs of the levels of transform coefficients in each subset in a fourth scan of transform coefficients in the respective subset.
Application HK19121159.8A (priority date 2011-03-08, filed 2019-03-19): Coding of transform coefficients for video coding, HK1261263B (en)

Applications Claiming Priority (7)

Application Number    Priority Date
US61/450,555          2011-03-08
US61/451,485          2011-03-10
US61/451,496          2011-03-10
US61/452,384          2011-03-14
US61/494,855          2011-06-08
US61/497,345          2011-06-15
US13/413,497          2012-03-06

Publications (2)

Publication Number    Publication Date
HK1261263A1 (en)      2019-12-27
HK1261263B (en)       2023-01-20


