The present application claims the benefit of U.S. provisional application No. 63/355,027, filed on June 23, 2022, the entire contents of which are incorporated herein by reference.
Detailed Description
Reference will now be made in detail to the present embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous non-limiting specific details are set forth in order to provide an understanding of the subject matter presented herein. It will be apparent, however, to one of ordinary skill in the art that various alternatives may be used and that the subject matter may be practiced without these specific details without departing from the scope of the claims. For example, it will be apparent to one of ordinary skill in the art that the subject matter presented herein may be implemented on many types of electronic devices having digital video capabilities.
It should be noted that the terms "first," "second," and the like, as used in the description and claims of the present disclosure and in the accompanying drawings, are used for distinguishing between objects and not for describing any particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that embodiments of the disclosure described herein may be implemented in other sequences than those illustrated in the figures or otherwise described in the disclosure.
The first version of the VVC standard was finalized in July 2020, and it offers a bit-rate saving of approximately 50% at equivalent perceptual quality compared to the previous-generation video codec standard HEVC. Although the VVC standard provides significant coding improvements over its predecessor, there is evidence that superior coding efficiency can be achieved with additional coding tools. Recently, the Joint Video Experts Team (JVET), a cooperation between ITU-T VCEG and ISO/IEC MPEG, began to explore advanced technologies that could substantially improve coding efficiency over VVC. In April 2021, a software code base named the Enhanced Compression Model (ECM) was established for future video coding exploration work. The ECM reference software is based on the VVC Test Model (VTM) developed by JVET for VVC, with several existing modules (e.g., intra/inter prediction, transform, in-loop filter, etc.) further extended and/or improved. In the future, any new coding tool beyond the VVC standard needs to be integrated into the ECM platform and tested using the JVET common test conditions (CTCs).
Like all previous video codec standards, the ECM is built on a block-based hybrid video coding framework. Fig. 1 shows a block diagram of a generic block-based hybrid video coding system. The input video signal is processed block by block, each block being called a coding unit (CU). In ECM-1.0, a CU may be up to 128×128 pixels. However, as in VVC, one coding tree unit (CTU) is partitioned into CUs to adapt to varying local characteristics based on a quadtree/binary-tree/ternary-tree structure. In the multi-type tree structure, one CTU is first partitioned by a quadtree structure. Each quadtree leaf node may then be further partitioned by a binary-tree or ternary-tree structure. As shown in Figs. 2A, 2B, 2C, 2D, and 2E, there are five split types: quaternary split, vertical binary split, horizontal binary split, vertical ternary split, and horizontal ternary split.
In Fig. 1, spatial prediction and/or temporal prediction may be performed. Spatial prediction (or "intra prediction") predicts the current video block using pixels from the samples (referred to as reference samples) of already decoded neighboring blocks in the same video picture/slice. Spatial prediction reduces the spatial redundancy inherent in the video signal. Temporal prediction (also referred to as "inter prediction" or "motion-compensated prediction") uses reconstructed pixels from already coded video pictures to predict the current video block. Temporal prediction reduces the temporal redundancy inherent in the video signal. The temporal prediction signal for a given CU is typically signaled by one or more motion vectors (MVs), which indicate the amount and the direction of motion between the current CU and its temporal reference. Also, if multiple reference pictures are supported, one reference picture index is additionally sent, which identifies from which reference picture in the reference picture store the temporal prediction signal originates. After spatial and/or temporal prediction, a mode decision block in the encoder selects the best prediction mode, e.g., based on a rate-distortion optimization method. The prediction block is then subtracted from the current video block, and the prediction residual is decorrelated using a transform and quantized. The quantized residual coefficients are inverse quantized and inverse transformed to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU. Furthermore, in-loop filtering, such as a deblocking filter, Sample Adaptive Offset (SAO), and Adaptive Loop Filter (ALF), may be applied to the reconstructed CU before it is placed in the reference picture store and used to code future video blocks.
To form the output video bitstream, the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to an entropy encoding unit to be further compressed and packed to form the bitstream. It should be noted that the term "block" or "video block" as used herein may be a part of a frame or picture, in particular a rectangular (square or non-square) part. Referring to HEVC and VVC, a block or video block may be or correspond to a Coding Tree Unit (CTU), a CU, a Prediction Unit (PU) or a Transform Unit (TU), and/or may be or correspond to a corresponding block, e.g., a Coding Tree Block (CTB), a Coding Block (CB), a Prediction Block (PB) or a Transform Block (TB), and/or a sub-block.
Fig. 3 shows a general block diagram of a block-based video decoder. The video bitstream is first entropy decoded at an entropy decoding unit. The coding mode and prediction information are sent to a spatial prediction unit (if intra coded) or a temporal prediction unit (if inter coded) to form a prediction block. The residual transform coefficients are sent to an inverse quantization unit and an inverse transform unit to reconstruct the residual block. The prediction block and the residual block are then added together. The reconstructed block may also be loop filtered before being stored in the reference picture store. The reconstructed video in the reference picture store is then sent out to drive the display device and used to predict future video blocks.
The primary focus of the present disclosure is to further enhance the coding efficiency of the cross-component linear model (CCLM), a cross-component prediction coding tool in the ECM. Hereinafter, some related coding tools in the ECM are briefly reviewed. Then, some drawbacks in the existing design of the CCLM are discussed. Finally, solutions for improving the existing CCLM prediction design are provided.
Cross-component linear model prediction
To reduce cross-component redundancy, a cross-component linear model (CCLM) prediction mode is used in VVC for which chroma samples are predicted based on reconstructed luma samples of the same CU by using a linear model:
predC(i,j) = α·recL′(i,j) + β (1)
where predC(i,j) represents the predicted chroma samples in the CU, and recL′(i,j) represents the downsampled reconstructed luma samples of the same CU, obtained by performing downsampling on the reconstructed luma samples recL(i,j). The above α and β are linear model parameters derived from at most four neighboring chroma samples and their corresponding downsampled luma samples, which may be referred to as neighboring luma-chroma sample pairs. Assuming that the current chroma block has a size of W×H, W′ and H′ are obtained as follows:
- when the LM mode is applied, W′ = W and H′ = H;
- when the LM-A mode is applied, W′ = W + H;
- when the LM-L mode is applied, H′ = H + W;
where, in the LM mode, the above and left samples of the CU are used together to calculate the linear model coefficients; in the LM-A mode, only the above samples of the CU are used to calculate the linear model coefficients; and in the LM-L mode, only the left samples of the CU are used to calculate the linear model coefficients.
If the positions of the above neighboring samples of the chroma block are denoted as S[0, −1] … S[W′−1, −1], and the positions of the left neighboring samples of the chroma block are denoted as S[−1, 0] … S[−1, H′−1], the positions of the four neighboring chroma samples are selected as follows:
- when the LM mode is applied and both the above and left neighboring samples are available, S[W′/4, −1], S[3W′/4, −1], S[−1, H′/4], and S[−1, 3H′/4] are selected as the positions of the four neighboring chroma samples;
- when the LM-A mode is applied or only the above neighboring samples are available, S[W′/8, −1], S[3W′/8, −1], S[5W′/8, −1], and S[7W′/8, −1] are selected as the positions of the four neighboring chroma samples;
- when the LM-L mode is applied or only the left neighboring samples are available, S[−1, H′/8], S[−1, 3H′/8], S[−1, 5H′/8], and S[−1, 7H′/8] are selected as the positions of the four neighboring chroma samples.
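The position-selection rules above can be sketched in code. This is an illustrative helper only; the function name, the mode strings, and the tuple-based position encoding are assumptions of this sketch, not taken from any reference software:

```python
def cclm_ref_positions(w_prime, h_prime, mode, have_above=True, have_left=True):
    """Sketch of the four neighboring chroma sample positions used by CCLM.

    (x, -1) indexes the above reference row and (-1, y) the left reference
    column, following the S[...] notation above.
    """
    if mode == "LM" and have_above and have_left:
        return [(w_prime // 4, -1), (3 * w_prime // 4, -1),
                (-1, h_prime // 4), (-1, 3 * h_prime // 4)]
    if mode == "LM_A" or not have_left:
        # LM-A, or only the above neighbors are available
        return [(w_prime // 8, -1), (3 * w_prime // 8, -1),
                (5 * w_prime // 8, -1), (7 * w_prime // 8, -1)]
    # LM-L, or only the left neighbors are available
    return [(-1, h_prime // 8), (-1, 3 * h_prime // 8),
            (-1, 5 * h_prime // 8), (-1, 7 * h_prime // 8)]
```

For example, an 8×8 block in LM mode selects positions (2, −1), (6, −1), (−1, 2), and (−1, 6).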
Four neighboring luma samples corresponding to the selected positions are obtained by the downsampling operation, and the four obtained neighboring luma samples are compared four times to find the two larger values x0A and x1A and the two smaller values x0B and x1B. The chroma sample values corresponding to the two larger values and the two smaller values are denoted as y0A, y1A, y0B, and y1B, respectively. Then, Xa, Xb, Ya, and Yb are derived as follows:
Xa = (x0A + x1A + 1) >> 1;
Xb = (x0B + x1B + 1) >> 1;
Ya = (y0A + y1A + 1) >> 1;
Yb = (y0B + y1B + 1) >> 1 (2)
Finally, the linear model parameters α and β are obtained according to the following equations:
α = ( Ya − Yb ) / ( Xa − Xb ) (3)
β = Yb − α·Xb (4)
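The averaging of equation (2) and the parameter derivation can be sketched as follows. This is a minimal floating-point sketch with illustrative names; the standard replaces the division with a look-up table, as described below:

```python
def derive_cclm_params(luma4, chroma4):
    """Min-max CCLM parameter derivation from four neighboring sample pairs.

    luma4 / chroma4 hold the four neighboring downsampled-luma and chroma
    sample values.  Returns (alpha, beta) as floats for clarity.
    """
    pairs = sorted(zip(luma4, chroma4))           # order pairs by luma value
    (x0B, y0B), (x1B, y1B) = pairs[0], pairs[1]   # two smaller luma values
    (x0A, y0A), (x1A, y1A) = pairs[2], pairs[3]   # two larger luma values
    Xa = (x0A + x1A + 1) >> 1                     # averages of eq. (2)
    Xb = (x0B + x1B + 1) >> 1
    Ya = (y0A + y1A + 1) >> 1
    Yb = (y0B + y1B + 1) >> 1
    alpha = (Ya - Yb) / (Xa - Xb) if Xa != Xb else 0.0   # eq. (3)
    beta = Yb - alpha * Xb                               # eq. (4)
    return alpha, beta
```

For instance, the pairs (10, 5), (20, 10), (30, 15), (40, 20) give α = 0.5 and β = 0.5.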
Fig. 4 shows an example of the locations of the left and above samples and the samples of the current block involved in the CCLM mode, including the locations of the left and above samples of an N×N chroma block in the CU and the locations of the left and above samples of a 2N×2N luma block in the CU.
The division operation for calculating the parameter α is implemented with a look-up table. To reduce the memory required to store the table, the diff value (the difference between the maximum and minimum values) and the parameter α are expressed in an exponential notation. For example, diff is approximated with a 4-bit significant part and an exponent. Consequently, the table for 1/diff is reduced to 16 elements corresponding to the 16 values of the significant part, as follows:
DivTable[] = { 0, 7, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 1, 1, 0 } (5)
This has the benefit of both reducing the computational complexity and reducing the memory size required to store the needed tables.
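The exponential representation can be illustrated with a short sketch. This only mirrors the idea of a 4-bit significant part plus an exponent; it is not the exact VVC derivation, and the helper name is an assumption of this sketch:

```python
# 16-entry table from equation (5)
DIV_TABLE = [0, 7, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 1, 1, 0]

def significand4_and_exponent(diff):
    """Split a positive diff into a 4-bit significant part and an exponent,
    so that 1/diff can be approximated from a 16-entry table."""
    exp = max(0, diff.bit_length() - 1)   # floor(log2(diff))
    shift = max(0, exp - 3)               # keep 4 significant bits
    sig = (diff >> shift) & 15            # 4-bit significant part
    return sig, exp
```

For example, diff = 100 has exponent 6 and 4-bit significant part 12.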
Besides the case in which the above and left templates are used together to calculate the linear model coefficients, the two templates can alternatively be used in the other two LM modes (referred to as the LM_A and LM_L modes).
In the LM_T mode, only the above template is used to calculate the linear model coefficients. To get more samples, the above template is extended to (W+H) samples. In the LM_L mode, only the left template is used to calculate the linear model coefficients. To get more samples, the left template is extended to (H+W) samples.
In the LM_LT mode, both the left and above templates are used to calculate the linear model coefficients.
To match the chroma sample locations of a 4:2:0 video sequence, two types of downsampling filters are applied to the luma samples to achieve a 2:1 downsampling ratio in both the horizontal and vertical directions. The selection of the downsampling filter is specified by an SPS-level flag. The two downsampling filters, which correspond to "type-0" and "type-2" content, respectively, are as follows.
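The two SPS-selectable filters correspond to the 6-tap and 5-tap luma downsampling filters known from VVC. The following is a sketch under assumptions: the array convention p_y[row][col] and the function names are ours, and the tap layout is reproduced from the well-known VVC design rather than from this document:

```python
def downsample_type0(p_y, x, y):
    """6-tap filter (type-0 content, chroma sited between luma rows)."""
    return (p_y[2*y][2*x - 1] + p_y[2*y + 1][2*x - 1]
            + 2 * p_y[2*y][2*x] + 2 * p_y[2*y + 1][2*x]
            + p_y[2*y][2*x + 1] + p_y[2*y + 1][2*x + 1] + 4) >> 3

def downsample_type2(p_y, x, y):
    """5-tap filter (type-2 content, chroma co-sited with luma)."""
    return (p_y[2*y - 1][2*x] + p_y[2*y][2*x - 1] + 4 * p_y[2*y][2*x]
            + p_y[2*y][2*x + 1] + p_y[2*y + 1][2*x] + 4) >> 3
```

Both filters have tap weights summing to 8, so a constant luma region is reproduced exactly after the >> 3 normalization.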
Note that when the above reference line is at the CTU boundary, only one luma line (the line buffer shared with intra prediction) is used to construct the downsampled luma samples.
This parameter computation is performed as part of the decoding process, not merely as an encoder search operation. As a result, no syntax is used to convey the α and β values to the decoder.
For chroma intra mode coding, a total of eight intra modes are allowed. These modes include five traditional intra modes and three cross-component linear model modes (CCLM, LM_A, and LM_L). The chroma mode signaling and derivation procedure are shown in Table 1. Chroma mode coding directly depends on the intra prediction mode of the corresponding luma block. Since separate block partitioning structures for the luma and chroma components are enabled in I slices, one chroma block may correspond to multiple luma blocks. Therefore, for the chroma DM mode, the intra prediction mode of the corresponding luma block covering the center position of the current chroma block is directly inherited.
TABLE 1 deriving chroma prediction modes from luma modes when CCLM is enabled
Regardless of the value of sps_cclm_enabled_flag, a single binarization table is used, as shown in Table 2.
TABLE 2 unified binarization table for chroma prediction modes
In Table 2, the first bin indicates whether the mode is a normal mode (0) or an LM mode (1). If it is an LM mode, the next bin indicates whether it is LM_CHROMA (0). If it is not LM_CHROMA, the next bin indicates whether it is LM_L (0) or LM_A (1). In this scheme, when sps_cclm_enabled_flag is 0, the first bin of the binarization table for the corresponding intra_chroma_pred_mode may be discarded before entropy coding. In other words, the first bin is inferred to be 0 and is therefore not coded. This single binarization table is used for both cases of sps_cclm_enabled_flag equal to 0 and 1. The first two bins in Table 2 are context-coded with their own context models, and the remaining bins are bypass-coded.
In addition, to reduce the luma-chroma latency in dual tree, when a 64×64 luma coding tree node is not split (and ISP is not used for the 64×64 CU) or is partitioned with QT, the chroma CUs in the 32×32/32×16 chroma coding tree nodes are allowed to use CCLM as follows:
- If the 32×32 chroma node is not split or is partitioned with QT, all chroma CUs in the 32×32 node may use CCLM.
- If the 32×32 chroma node is partitioned with horizontal BT, and its 32×16 child node is not split or is partitioned with vertical BT, all chroma CUs in the 32×16 chroma node may use CCLM.
CCLM is not allowed for chroma CUs under all other luma and chroma coding tree split conditions.
During the ECM development, the simplified derivation of α and β (the min-max approximation) was removed. Instead, a linear least squares solution between the causal reconstructed data of the downsampled luma samples and the causal chroma samples is applied to derive the model parameters α and β:

α = ( I·ΣRecC(i)·Rec′L(i) − ΣRecC(i)·ΣRec′L(i) ) / ( I·ΣRec′L(i)·Rec′L(i) − ΣRec′L(i)·ΣRec′L(i) ) (8)

β = ( ΣRecC(i) − α·ΣRec′L(i) ) / I (9)

where RecC(i) and Rec′L(i) indicate the reconstructed chroma samples and the downsampled reconstructed luma samples around the target block, and I indicates the total number of samples of the neighboring data.
The LM_A and LM_L modes are also called the multi-directional linear model (MDLM). Fig. 5A shows an example of MDLM operation when the block content cannot be predicted from the L-shaped reconstructed region. Fig. 5B shows MDLM_L, which uses only the left reconstructed samples to derive the CCLM parameters. Fig. 5C shows MDLM_T, which uses only the top reconstructed samples to derive the CCLM parameters.
An integer implementation of the least mean square (LMS) derivation discussed above (see equations (8)-(9)) has been proposed as an improvement to CCLM. The initial integer design of the LMS CCLM was first presented in JCTVC-C206. The method was then improved by a series of simplifications, finally forming the LMS version in the ECM, including JCTVC-F0233/I0178, which reduces the α precision nα from 13 to 7; JCTVC-I0151, which reduces the maximum multiplier bit width; and JCTVC-H0490/I0166, which reduces the division LUT entries from 64 to 32.
As discussed in equation (1), the integer design models the correlation of luminance and chrominance signals using a linear relationship. The chrominance values are predicted from the reconstructed luminance values of the co-located block.
In YUV 4:2:0 sampling, the luma and chroma components have different sampling ratios. The sampling rate of the chroma component is half that of the luma component, with a phase difference of 0.5 pixel in the vertical direction. The reconstructed luma needs downsampling in the vertical direction and subsampling in the horizontal direction to match the size of the chroma signal. For example, the downsampling may be achieved by:
RecL′(i,j) = ( recL(2i, 2j) + recL(2i, 2j+1) ) >> 1 (10)
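Equation (10) translates directly into code. A minimal sketch, assuming rec_l is indexed as rec_l[a][b] in the same order as the equation's arguments:

```python
def downsample_luma_2tap(rec_l, i, j):
    """Two-tap average of equation (10): (recL(2i,2j) + recL(2i,2j+1)) >> 1."""
    return (rec_l[2 * i][2 * j] + rec_l[2 * i][2 * j + 1]) >> 1
```

For example, averaging the neighboring values 10 and 20 yields 15.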
In equation (8), floating-point operations are needed when calculating the linear model parameter α in order to maintain high data accuracy. Moreover, when α is represented by a floating-point value, a floating-point multiplication is involved in equation (1). In this section, an integer implementation of the algorithm is designed. Specifically, the fractional part of the parameter α is quantized with nα bits of data accuracy. The value of parameter α is then represented by an amplified and rounded integer value α′, where α′ = α × (1 << nα). The linear model of equation (1) is then changed to:
predC[x, y] = ( α′·RecL′[x, y] >> nα ) + β′ (11)
where β′ is the rounded value of the floating-point β, and α′ can be calculated as follows.
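The quantization of α and the integer prediction of equation (11) can be sketched as follows, assuming nα = 13 as stated later in this section (function names are illustrative):

```python
N_ALPHA = 13  # fractional-precision bits for alpha

def quantize_alpha(alpha_float):
    """alpha' = round(alpha * (1 << n_alpha)), the amplified integer value."""
    return int(round(alpha_float * (1 << N_ALPHA)))

def predict_chroma_int(rec_l_ds, alpha_q, beta_q):
    """Integer prediction of equation (11): (alpha'*rec >> n_alpha) + beta'."""
    return ((alpha_q * rec_l_ds) >> N_ALPHA) + beta_q
```

For example, α = 0.5 quantizes to α′ = 4096, and a downsampled luma value of 100 with β′ = 3 predicts a chroma value of 53.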
Instead of the division operation of equation (12), it is proposed to use a table look-up and a multiplication. a2 is first scaled down to reduce the table size, and a1 is also scaled down to avoid product overflow. Then, in a2, only the most significant bits, whose number is defined by the value nA2, are kept, and the other bits are all set to zero. The approximation a2′ can be calculated as in equation (13), where [·] means the rounding operation, and the scaling term can be calculated as in equation (14), where bdepth(a2) represents the bit depth of the value a2. The same procedure is performed for a1, as in equation (15). Considering the quantized representations of a1 and a2, equation (12) can be rewritten as equation (16), in which a look-up table with 2^nA2 entries is used to avoid the division.
In the simulation, the constant parameters are set as follows:
- nα is equal to 13, which is a trade-off between data accuracy and computational cost.
- nA2 is equal to 6, such that the look-up table size is 64; the table size can be further reduced to 32 by amplifying a2 when bdepth(a2) < 6 (e.g., a2 < 32).
- ntable is equal to 15, resulting in a 16-bit data representation of the table elements.
- The scaling parameter for a1 is set to 15, to avoid product overflow and to maintain 16-bit multiplication.
Finally, α′ is clipped to [−2^15, 2^15 − 1] to preserve the 16-bit multiplication in equation (11). With this clipping, when nα is equal to 13, the actual α value is limited to [−4, 4), which helps to prevent error amplification.
With the calculated parameter α′, the parameter β′ is calculated as follows:
where the division in the above equation can be simply replaced by a shift, since the value I is a power of 2.
Similar to the discussion above regarding equation (1), in HM6.0, an intra prediction mode referred to as LM is applied to predict the chroma PU based on a linear model, using the reconstruction of the co-located luma PU. The parameters of the linear model consist of a slope (a >> k) and a y-intercept (b), which are derived from the neighboring luma and chroma pixels using a least mean square solution. The values of the prediction samples predSamples[x, y] are derived as follows, where x, y = 0..nS−1, and nS specifies the block size of the current chroma PU:
predSamples[x, y] = Clip1C( ( ( pY′[x, y] * a ) >> k ) + b ), where x, y = 0..nS−1 (17)
where pY′[x, y] is the reconstructed pixel from the corresponding luma component. When the coordinates x and y are equal to or greater than 0, pY′ is a reconstructed pixel of the co-located luma PU. When x or y is less than 0, pY′ is a reconstructed neighboring pixel of the co-located luma PU.
Some intermediate variables in the derivation, L, C, LL, LC, k2, and k3, are derived as:
k2 = Log2( (2*nS) >> k3 ) (18-5)
k3 = Max( 0, BitDepthC + Log2( nS ) - 14 ) (18-6)
Thus, the variables a, b, and k can be derived as:
a1 = ( LC << k2 ) – L*C (19-1)
a2 = ( LL << k2 ) – L*L (19-2)
k1 = Max( 0, Log2( abs( a2 ) ) - 5 ) – Max( 0, Log2( abs( a1 ) ) - 14 ) + 2 (19-3)
a1s = a1 >> Max(0, Log2( abs( a1 ) ) - 14 ) (19-4)
a2s = abs( a2 >> Max(0, Log2( abs( a2 ) ) - 5 ) ) (19-5)
a3 = a2s < 1 ? 0 : Clip3( −2^15, 2^15 − 1, a1s*lmDiv + ( 1 << ( k1 - 1 ) ) >> k1 ) (19-6)
a = a3 >> Max( 0, Log2( abs( a3 ) ) - 6 ) (19-7)
k = 13 – Max( 0, Log2( abs( a ) ) - 6 ) (19-8)
b = ( L – ( ( a*C ) >> k1 ) + ( 1 << ( k2 - 1 ) ) ) >> k2, (19-9)
where lmDiv is specified in a 63-entry look-up table (i.e., Table 3) that is generated online by:
lmDiv(a2s)=( (1 << 15) + a2s/2 ) / a2s . (20)
table 3 Specification of lmDiv
| a2s | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| lmDiv | 32768 | 16384 | 10923 | 8192 | 6554 | 5461 | 4681 | 4096 | 3641 | 3277 | 2979 | 2731 | 2521 |
| a2s | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 |
| lmDiv | 2341 | 2185 | 2048 | 1928 | 1820 | 1725 | 1638 | 1560 | 1489 | 1425 | 1365 | 1311 | 1260 |
| a2s | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 |
| lmDiv | 1214 | 1170 | 1130 | 1092 | 1057 | 1024 | 993 | 964 | 936 | 910 | 886 | 862 | 840 |
| a2s | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 |
| lmDiv | 819 | 799 | 780 | 762 | 745 | 728 | 712 | 697 | 683 | 669 | 655 | 643 | 630 |
| a2s | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | |
| lmDiv | 618 | 607 | 596 | 585 | 575 | 565 | 555 | 546 | 537 | 529 | 520 | 512 | |
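The generation rule of equation (20) reproduces the entries of Table 3. A one-line sketch (the function name is illustrative):

```python
def lm_div(a2s):
    """Equation (20): lmDiv(a2s) = ((1 << 15) + a2s/2) / a2s, integer math."""
    return ((1 << 15) + a2s // 2) // a2s
```

For example, lm_div(1) = 32768, lm_div(3) = 10923, and lm_div(64) = 512, matching the table entries.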
In equation (19-6), a1s is a 16-bit signed integer, and lmDiv is a 16-bit unsigned integer. Thus, a 16-bit multiplier and 16-bit storage are required. It is proposed to reduce the bit depth of the multiplier to an internal bit depth and to reduce the size of the look-up table, as described in more detail below.
The bit depth of a1s is reduced to the internal bit depth by changing equation (19-4) to the following equation:
a1s = a1 >> Max(0, Log2( abs( a1 ) ) – (BitDepthC – 2)) . (21)
The value of lmDiv with the internal bit depth is computed using the following equation (22) and stored in a look-up table:
lmDiv(a2s)=( (1 << (BitDepthC-1)) + a2s/2 ) / a2s. (22)
Table 4 shows an example with the internal bit depth equal to 10.
TABLE 4 Specification of lmDiv with internal bit depth equal to 10
Equation (19-3) and equation (19-8) are also modified as follows:
k1 = Max( 0, Log2( abs( a2 ) ) - 5 ) – Max( 0, Log2( abs( a1 ) ) – ( BitDepthC – 2 ) ), and (23-1)
k = BitDepthC – 1 – Max( 0, Log2( abs( a ) ) - 6 ) (23-2)
It is also proposed to reduce the number of entries from 63 to 32, and the bits per entry from 16 to 10, as shown in Table 5. By doing so, approximately 70% memory savings can be achieved. The corresponding changes to equations (19-6), (20), and (19-8) are as follows:
a3 = a2s < 32 ? 0 : Clip3( −2^15, 2^15 − 1, a1s*lmDiv + ( 1 << ( k1 - 1 ) ) >> k1 ) (24-1)
lmDiv( a2s ) = ( ( 1 << ( BitDepthC + 4 ) ) + a2s/2 ) / a2s (24-2)
k = BitDepthC + 4 – Max( 0, Log2( abs( a ) ) - 6 ) (24-3)
TABLE 5 Specification of lmDiv with internal bit depth equal to 10
| a2s | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 |
| lmDiv | 512 | 496 | 482 | 468 | 455 | 443 | 431 | 420 | 410 | 400 | 390 | 381 | 372 | 364 | 356 | 349 |
| a2s | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 |
| lmDiv | 341 | 334 | 328 | 321 | 315 | 309 | 303 | 298 | 293 | 287 | 282 | 278 | 273 | 269 | 264 | 260 |
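The reduced table of equation (24-2) can likewise be generated online. A sketch assuming an internal bit depth of 10, matching Table 5 (the function name is illustrative):

```python
def lm_div_reduced(a2s, bit_depth_c=10):
    """Equation (24-2): lmDiv with internal bit depth, 32 entries (a2s >= 32)."""
    return ((1 << (bit_depth_c + 4)) + a2s // 2) // a2s
```

For example, lm_div_reduced(32) = 512 and lm_div_reduced(63) = 260, matching the first and last entries of Table 5.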
Multi-model linear model prediction
In ECM-1.0, a multi-model LM (MMLM) prediction mode is proposed, for which the chroma samples are predicted based on the reconstructed luma samples of the same CU by using two linear models, as follows:

predC(i,j) = α1·recL′(i,j) + β1, if recL′(i,j) ≤ Threshold
predC(i,j) = α2·recL′(i,j) + β2, if recL′(i,j) > Threshold (25)

where predC(i,j) represents the predicted chroma samples in the CU, and recL′(i,j) represents the downsampled reconstructed luma samples of the same CU. Threshold is calculated as the average value of the neighboring reconstructed luma samples. Fig. 6 shows an example of classifying the neighboring samples into two groups based on the value Threshold. For each group, the parameters αi and βi (where i equals 1 and 2, respectively) are derived from the linear relationship between the luma and chroma values of two samples: the minimum luma sample A (XA, YA) and the maximum luma sample B (XB, YB) inside the group. Here, XA and YA are the x-coordinate (i.e., luma value) and y-coordinate (i.e., chroma value) of sample A, and XB and YB are the x-coordinate and y-coordinate of sample B. The linear model parameters α and β are obtained according to the following equations:
α = ( YB − YA ) / ( XB − XA )
β = YA − α·XA (26)
Such a method is also called a min-max method. Division in the above equation can be avoided and replaced by multiplication and shifting.
For a coded block having a square shape, the above two equations are directly applied. For non-square coded blocks, neighboring samples of longer boundaries are first sub-sampled to have the same number of samples as the shorter boundaries.
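The two-model selection described above can be sketched as follows. This is an illustrative floating-point helper (names are assumptions of this sketch), where each model is an (α, β) pair and samples at or below the threshold use the first model:

```python
def mmlm_predict(rec_l_ds, threshold, model1, model2):
    """Two-model MMLM prediction: pick a linear model by comparing the
    downsampled luma sample against the classification threshold."""
    alpha, beta = model1 if rec_l_ds <= threshold else model2
    return alpha * rec_l_ds + beta
```

For example, with Threshold = 50, model 1 = (1.0, 2.0), and model 2 = (2.0, 0.0), a luma value of 10 predicts 12.0 while a luma value of 60 predicts 120.0.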
In addition to the scenario in which the above template and the left template are used together to calculate the linear model coefficients, the two templates can alternatively be used in the other two MMLM modes (referred to as the MMLM_A and MMLM_L modes).
In the MMLM_A mode, only the pixel samples in the above template are used to calculate the linear model coefficients. To get more samples, the above template is extended to a size of (W+W). In the MMLM_L mode, only the pixel samples in the left template are used to calculate the linear model coefficients. To get more samples, the left template is extended to a size of (H+H).
Note that when the above reference line is at the CTU boundary, only one luma line (stored in the line buffer for intra prediction) is used to construct the downsampled luma samples.
For chroma intra mode coding, a total of eleven intra modes are allowed. These modes include five traditional intra modes and six cross-component linear model modes (CCLM, LM_A, LM_L, MMLM, MMLM_A, and MMLM_L). The chroma mode signaling and derivation procedure are shown in Table 6. Chroma mode coding directly depends on the intra prediction mode of the corresponding luma block. Since separate block partitioning structures for the luma and chroma components are enabled in I slices, one chroma block may correspond to multiple luma blocks. Therefore, for the chroma DM mode, the intra prediction mode of the corresponding luma block covering the center position of the current chroma block is directly inherited.
TABLE 6 deriving chroma prediction modes from luma modes when MMLM is enabled
The MMLM mode and the LM mode can also be used together in an adaptive manner. For MMLM, the two linear models are as follows:

predC(i,j) = α1·recL′(i,j) + β1, if recL′(i,j) ≤ Threshold
predC(i,j) = α2·recL′(i,j) + β2, if recL′(i,j) > Threshold (27)

where predC(i,j) represents the predicted chroma samples in the CU, and recL′(i,j) represents the downsampled reconstructed luma samples of the same CU. Threshold can be determined simply based on the luma average, together with the minimum and maximum luma values. Fig. 7 shows an example of classifying the neighboring samples into two groups based on the knee point T, indicated by an arrow. The linear model parameters α1 and β1 are derived from the linear relationship between the luma and chroma values of two samples: the minimum luma sample A (XA, YA) and the Threshold sample (XT, YT). The linear model parameters α2 and β2 are derived from the linear relationship between the luma and chroma values of two samples: the maximum luma sample B (XB, YB) and the Threshold sample (XT, YT). Here, XA and YA are the x-coordinate (i.e., luma value) and y-coordinate (i.e., chroma value) of sample A, and XB and YB are the x-coordinate and y-coordinate of sample B. The linear model parameters αi and βi for each group, where i equals 1 and 2, respectively, are obtained according to the following equations:
α1 = ( YT − YA ) / ( XT − XA ), β1 = YA − α1·XA
α2 = ( YB − YT ) / ( XB − XT ), β2 = YT − α2·XT (28)
For a coded block having a square shape, the above equation is directly applied. For non-square coded blocks, neighboring samples of longer boundaries are first sub-sampled to have the same number of samples as the shorter boundaries.
In addition to the scenario in which the above template and the left template are used together to determine the linear model coefficients, the two templates can alternatively be used in the other two MMLM modes (denoted as the MMLM_A and MMLM_L modes, respectively).
In the MMLM_A mode, only the pixel samples in the above template are used to calculate the linear model coefficients. To get more samples, the above template is extended to a size of (W+W). In the MMLM_L mode, only the pixel samples in the left template are used to calculate the linear model coefficients. To get more samples, the left template is extended to a size of (H+H).
Note that when the above reference line is at the CTU boundary, only one luma line (stored in the line buffer for intra prediction) is used to construct the downsampled luma samples.
For chroma intra mode coding, there is a condition check used to select either one of the LM modes (CCLM, LM_A, and LM_L) or one of the multi-model LM modes (MMLM, MMLM_A, and MMLM_L). The condition check is as follows:
where BlkSizeThresLM denotes the minimum block size for the LM modes, and BlkSizeThresMM denotes the minimum block size for the MMLM modes. The symbol d represents a predetermined threshold. In one example, d may take the value 0. In another example, d may take the value 8.
For chroma intra mode coding, a total of eight intra modes are allowed. These modes include five traditional intra modes and three cross-component linear model modes. The chroma mode signaling and derivation procedure are shown in Table 1. Notably, for a given CU coded in a linear model mode, whether it is the conventional single-model LM mode or the MMLM mode is determined based on the condition check above. Unlike the case shown in Table 6, there is no separate MMLM mode to be signaled. Chroma mode coding directly depends on the intra prediction mode of the corresponding luma block. Since separate block partitioning structures for the luma and chroma components are enabled in I slices, one chroma block may correspond to multiple luma blocks. Therefore, for the chroma DM mode, the intra prediction mode of the corresponding luma block covering the center position of the current chroma block is directly inherited.
Scaling (slope) adjustments to the CCLM were proposed as a further improvement during ECM development, for example as described in JVET-Y0055/Z0049.
As discussed above, CCLM uses a model with two parameters to map luma values to chroma values. The scaling parameter "a" and the offset parameter "b" define the mapping as follows:
chromaVal = a * lumaVal + b (30)
It is proposed to signal an adjustment "u" to the scaling parameter to update the model to the following form:
chromaVal = a’ * lumaVal + b’ (31)
where a’ = a + u, and b’ = b − u * yr.
By this selection, the mapping function will tilt or rotate around the point with the luminance value yr. It is proposed to use the average of the reference luminance samples used in the model creation as yr in order to provide meaningful modifications to the model. Fig. 8A to 8B show the effect of the scaling parameter "u", wherein fig. 8A shows a model created without the scaling parameter "u", and fig. 8B shows a model created with the scaling parameter "u".
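The adjustment can be sketched as follows. This is a floating-point illustration assuming u has already been converted into the same units as "a" (the standard signals u in 1/8 units); the function name is ours:

```python
def adjust_cclm_model(a, b, u, y_r):
    """Rotate the CCLM mapping around the luma value y_r:
    a' = a + u, and b' = b - u * y_r, so the prediction at y_r is unchanged."""
    a_adj = a + u
    b_adj = b - u * y_r
    return a_adj, b_adj
```

Note the design property: the prediction at lumaVal = yr is identical before and after the adjustment, which is exactly why the model "rotates" around that point.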
In one example, the scaling adjustment is provided as an integer between -4 and 4, inclusive, and signaled in the bitstream. The unit of the scaling adjustment is 1/8 of a chroma sample value per one luma sample value (for 10-bit content).
In one example, the adjustment is available for the CCLM models that use reference samples both above and to the left of the block ("LM_CHROMA_IDX" and "MMLM_CHROMA_IDX"), but not for the "single-sided" modes. This choice is based on the trade-off between coding efficiency and complexity.
When scaling adjustments are applied to a multi-mode CCLM model, both models may be adjusted and thus, for a single chroma block, at most two scaling updates are signaled.
To enable the scaling adjustment at the encoder, the encoder may perform an SATD-based search for the best scaling update value for Cr and a similar SATD-based search for Cb. If either search results in a non-zero scaling parameter, the combined scaling adjustment pair (the SATD-based update for Cr, the SATD-based update for Cb) is included in the list of RD checks for the TU.
Fusion of chroma intra prediction modes
JVET-Y0092/Z0051 proposed the fusion of chroma intra modes during ECM development.
The intra prediction modes enabled for the chroma components in ECM-4.0 are six cross-component linear model (LM) modes including the CCLM_LT, CCLM_L, CCLM_T, MMLM_LT, MMLM_L, and MMLM_T modes, the direct mode (DM), and four default chroma intra prediction modes. The four default modes are given by the list {0, 50, 18, 1}, and if the DM mode already belongs to the list, the mode in the list is replaced with mode 66.
A decoder-side intra mode derivation (DIMD) method for luma intra prediction is included in ECM-4.0. First, horizontal and vertical gradients are calculated for each reconstructed luma sample in the L-shaped template formed by the second neighboring row and column of the current block to construct a histogram of gradients (HoG). Then, the two intra prediction modes having the largest and the second largest histogram amplitude values are blended with the planar mode to generate the final predictor of the current luma block.
In order to improve the coding efficiency of chroma intra prediction, two methods are proposed: a decoder-side derived chroma intra prediction mode (DIMD chroma) and a fusion of a non-LM mode with the MMLM_LT mode.
In the first embodiment, a DIMD chroma mode is proposed. The proposed DIMD chroma mode uses the DIMD derivation method to derive the chroma intra prediction mode of the current block based on the co-located reconstructed luma samples. Specifically, horizontal and vertical gradients are calculated for each co-located reconstructed luma sample of the current chroma block to construct a HoG, as shown in fig. 8C. Then, intra prediction of the current chroma block is performed using the intra prediction mode having the largest histogram amplitude value.
When the intra prediction mode derived from the DIMD chroma mode is the same as the intra prediction mode derived from the DM mode, the intra prediction mode having the second largest histogram amplitude value is used as the DIMD chroma mode.
As shown in table 7, a CU level flag is signaled to indicate whether the proposed DIMD chroma mode is applied.
Table 7 - Binarization process of intra_chroma_pred_mode in the proposed method
| intra_chroma_pred_mode | Binary bit string | Chroma intra mode |
| 0 | 1100 | List [0] |
| 1 | 1101 | List [1] |
| 2 | 1110 | List [2] |
| 3 | 1111 | List [3] |
| 4 | 10 | DIMD chroma |
| 5 | 0 | DM |
In a second embodiment, a fusion of chroma intra prediction modes is proposed, in which the DM mode and the four default modes can be fused with the MMLM_LT mode as follows:
pred=(w0*pred0+w1*pred1+(1<<(shift-1)))>>shift
where pred0 is the predictor obtained by applying the non-LM mode, pred1 is the predictor obtained by applying the MMLM_LT mode, and pred is the final predictor of the current chroma block. The two weights (w0 and w1) are determined by the intra prediction modes of the neighboring chroma blocks, and shift is set equal to 2. Specifically, {w0, w1} = {1, 3} when both the above and left neighboring blocks are coded with LM modes; {w0, w1} = {3, 1} when both the above and left neighboring blocks are coded with non-LM modes; and {w0, w1} = {2, 2} otherwise.
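The weight selection and the blend above can be sketched as follows (an illustrative transcription; the names are not from the ECM source):

```python
def fuse_chroma_pred(pred0, pred1, above_is_lm, left_is_lm, shift=2):
    """Fuse a non-LM predictor (pred0) with the MMLM_LT predictor (pred1):
    pred = (w0*pred0 + w1*pred1 + (1 << (shift-1))) >> shift,
    with {w0, w1} chosen from the coding modes of the above/left neighbors."""
    if above_is_lm and left_is_lm:
        w0, w1 = 1, 3        # both neighbors LM-coded: favor MMLM_LT
    elif not above_is_lm and not left_is_lm:
        w0, w1 = 3, 1        # both neighbors non-LM-coded: favor non-LM
    else:
        w0, w1 = 2, 2        # mixed neighbors: equal weights
    return (w0 * pred0 + w1 * pred1 + (1 << (shift - 1))) >> shift
```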
Regarding the syntax design, if a non-LM mode is selected, a flag is signaled to indicate whether the fusion is applied. The proposed fusion is applied only to I slices.
In a third embodiment, the DIMD chroma mode is combined with the fusion of chroma intra prediction modes. Specifically, the DIMD chroma mode described in the first embodiment is applied, and for I slices, the DM mode, the four default modes, and the DIMD chroma mode can be fused with the MMLM_LT mode using the weights described in the second embodiment, while for non-I slices, only the DIMD chroma mode can be fused with the MMLM_LT mode using equal weights.
In a fourth embodiment, the DIMD chroma mode with reduced processing is combined with the fusion of chroma intra prediction modes. Specifically, the DIMD chroma mode with reduced processing derives the intra mode based on the neighboring reconstructed Y, Cb, and Cr samples in the second neighboring rows and columns, as shown in fig. 8D. The other portions are the same as those of the third embodiment.
In one embodiment, when DIMD is applied, two intra modes are derived from the reconstructed neighboring samples, and the two predictors are combined with the planar mode predictor using weights derived from the gradients, as described in JVET-O0449. The division operations in the weight derivation are performed using the same look-up table (LUT) based integerization scheme as used by CCLM. For example, the division in the orientation calculation
Orient=Gy/Gx
Is calculated by the following LUT-based scheme:
x=Floor(Log2(Gx))
normDiff=((Gx<<4)>>x)&15
x+=3+((normDiff!=0)?1:0)
Orient=(Gy*(DivSigTable[normDiff]|8)+(1<<(x-1)))>>x
Wherein,
DivSigTable[16]={0,7,6,5,5,4,4,3,3,2,2,1,1,1,1,0}。
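The LUT-based division above can be exercised directly; the sketch below transcribes the pseudo-code into Python, computing Floor(Log2(Gx)) via bit_length (assuming Gx > 0) and reading the ternary as adding 3 plus one more when normDiff is non-zero:

```python
DivSigTable = [0, 7, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 1, 1, 0]

def dimd_orient(Gy, Gx):
    """LUT-based integer approximation of Orient = Gy / Gx (Gx > 0),
    avoiding an actual division, in the spirit of the CCLM
    integerization scheme."""
    x = Gx.bit_length() - 1                 # Floor(Log2(Gx)) for Gx > 0
    normDiff = ((Gx << 4) >> x) & 15
    x += 3 + (1 if normDiff != 0 else 0)
    return (Gy * (DivSigTable[normDiff] | 8) + (1 << (x - 1))) >> x
```

When Gx is an exact power of two, normDiff is 0 and the result reduces to (Gy * 8 + rounding) >> (log2(Gx) + 3), i.e., the exact integer quotient Gy / Gx with rounding.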
The derived intra mode is included in the primary list of intra most probable modes (MPM), and thus the DIMD process is performed before the MPM list is constructed. The primary derived intra mode of a DIMD block is stored with the block and is used for the MPM list construction of the neighboring blocks.
Fig. 8E to 8H show the steps of decoder-side intra mode derivation, in which the intra prediction direction is estimated without intra mode signaling. The first step, shown in fig. 8E, involves estimating the gradient of each sample (for the light gray samples shown in fig. 8E). The second step, shown in fig. 8F, involves mapping the gradient values to the nearest prediction direction within [2, 66]. The third step, shown in fig. 8G, involves selecting 2 prediction directions, wherein for each prediction direction, the absolute gradients Gx and Gy of the neighboring pixels mapped to that direction are summed, and the top 2 directions are selected. The fourth step, shown in fig. 8H, involves enabling weighted intra prediction with the selected directions.
Multiple reference line (MRL) intra prediction uses more reference lines for intra prediction. In fig. 9, an example of 4 reference lines is depicted, where the samples of segments A and F are not taken from reconstructed neighboring samples but are padded with the closest samples from segments B and E, respectively. HEVC intra-picture prediction uses the nearest reference line (i.e., reference line 0). In MRL, 2 additional lines (reference line 1 and reference line 3) are used.
The index of the selected reference line (mrl_idx) is signaled and used to generate the intra predictor. For a reference line index greater than 0, only the modes in the MPM list are allowed, and only the MPM index is signaled without the remaining modes. The reference line index is signaled before the intra prediction mode, and the planar mode is excluded from the intra prediction modes in case a non-zero reference line index is signaled.
MRL is disabled for the blocks of the first line inside a CTU to prevent using extended reference samples outside the current CTU line. In addition, PDPC is disabled when an additional line is used. For the MRL mode, the derivation of the DC value in the DC intra prediction mode for a non-zero reference line index is aligned with that for reference line index 0. MRL requires the storage of 3 neighboring luma reference lines within a CTU to generate predictions. The cross-component linear model (CCLM) tool also requires 3 neighboring luma reference lines for its downsampling filters. The definition of MRL to use the same 3 lines is aligned with CCLM to reduce the storage requirements of decoders.
During ECM development, a convolutional cross-component model (CCCM) for chroma intra prediction was proposed.
It is proposed to apply a convolutional cross-component model (CCCM) to predict chroma samples from reconstructed luma samples, in a spirit similar to the current CCLM modes. As with CCLM, when chroma subsampling is used, the reconstructed luma samples are downsampled to match the lower-resolution chroma grid.
Furthermore, similarly to CCLM, there is an option to use a single-model or a multi-model variant of CCCM. The multi-model variant uses two models, one model derived for samples above the average luma reference value and the other model for the remaining samples (following the spirit of the CCLM design). The multi-model CCCM mode can be selected for PUs that have at least 128 available reference samples.
The proposed convolutional 7-tap filter consists of a 5-tap plus-sign-shaped spatial component, a nonlinear term, and a bias term. The input to the spatial 5-tap component of the filter consists of a center (C) luma sample, which is co-located with the chroma sample to be predicted, and its above/north (N), below/south (S), left/west (W), and right/east (E) neighbors, as shown in fig. 10A.
The nonlinear term P is represented as the square of the center luma sample C and scaled to the sample value range of the content:
P=(C*C+midVal)>>bitDepth
That is, for 10-bit content, it is calculated as:
P=(C*C+512)>>10
The offset term B represents the scalar offset between the input and output (similar to the offset term in CCLM) and is set to an intermediate chroma value (512 for 10-bit content).
The output of the filter is calculated as a convolution between the filter coefficients ci and the input values, and is clipped to the range of valid chroma samples:
predChromaVal=c0C+c1N+c2S+c3E+c4W+c5P+c6B
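A minimal sketch of the filter output described above (the coefficients are taken here as plain numbers; the actual ECM implementation uses fixed-point integer arithmetic, and the names are illustrative):

```python
def cccm_nonlinear_term(C, bit_depth=10):
    """P = (C*C + midVal) >> bitDepth, with midVal = 1 << (bit_depth - 1);
    for 10-bit content this is P = (C*C + 512) >> 10."""
    return (C * C + (1 << (bit_depth - 1))) >> bit_depth

def cccm_predict(c, C, N, S, E, W, bit_depth=10):
    """predChromaVal = c0*C + c1*N + c2*S + c3*E + c4*W + c5*P + c6*B,
    clipped to the valid chroma sample range. c is the list [c0..c6]."""
    P = cccm_nonlinear_term(C, bit_depth)
    B = 1 << (bit_depth - 1)                  # bias term: 512 for 10-bit
    val = c[0]*C + c[1]*N + c[2]*S + c[3]*E + c[4]*W + c[5]*P + c[6]*B
    return min(max(int(val), 0), (1 << bit_depth) - 1)
```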
The filter coefficients ci are calculated by minimizing the MSE between the predicted and reconstructed chroma samples in the reference region. Fig. 10B shows the reference region consisting of 6 lines of chroma samples above and to the left of the PU. The reference region extends one PU width to the right and one PU height below the PU boundaries. The region is adjusted to include only available samples. The area shown in blue is needed to support the "side samples" of the plus-shaped spatial filter and is padded when falling into an unavailable area.
MSE minimization is performed by computing an autocorrelation matrix for the luminance input and a cross-correlation vector between the luminance input and the chrominance output. LDL decomposition is performed on the autocorrelation matrix and back-substitution is used to calculate the final filter coefficients. This process generally follows the calculation of ALF filter coefficients in the ECM, however LDL decomposition is chosen instead of Cholesky decomposition to avoid the use of square root operations. The proposed method uses only integer arithmetic.
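To illustrate the solver choice, the sketch below performs a floating-point LDL decomposition (A = L·D·Lᵀ) followed by substitution; the ECM implementation uses integer arithmetic only, but the structure, and the absence of square roots (unlike Cholesky), is the same:

```python
def ldl_solve(A, b):
    """Solve A x = b for a symmetric positive-definite matrix A via
    LDL decomposition, then forward substitution (L y = b), diagonal
    scaling (D z = y), and back substitution (L^T x = z)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    D = [0.0] * n
    for j in range(n):
        D[j] = A[j][j] - sum(L[j][k] * L[j][k] * D[k] for k in range(j))
        L[j][j] = 1.0
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] * D[k]
                                     for k in range(j))) / D[j]
    y = [0.0] * n
    for i in range(n):                       # forward substitution
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
    z = [y[i] / D[i] for i in range(n)]      # diagonal scaling
    x = [0.0] * n
    for i in reversed(range(n)):             # back substitution
        x[i] = z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))
    return x
```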
The use of this mode is signaled with a CABAC-coded PU-level flag. One new CABAC context is included to support this. In terms of signaling, CCCM is considered a sub-mode of CCLM. That is, the CCCM flag is signaled only when the intra prediction mode is LM_CHROMA_IDX (to enable the single-model CCCM mode) or MMLM_CHROMA_IDX (to enable the multi-model CCCM mode).
The encoder performs two new RD checks in the chroma prediction mode loop, one for checking the single model CCCM mode and one for checking the multi-model CCCM mode.
In the existing CCLM or MMLM designs, the neighboring reconstructed luma-chroma sample pairs are classified into one or more sample groups based on a value Threshold, which considers only the luma DC values. That is, a luma-chroma sample pair is classified by considering only the intensity of the luma sample. However, the luma component usually retains rich textures, and a current luma sample may be highly correlated with its neighboring luma samples. Such inter-sample correlation (AC correlation) may benefit the classification of luma-chroma sample pairs and may bring additional coding efficiency.
As shown in fig. 10C, CCLM assumes that a given chroma sample is correlated only with the corresponding luma sample (L0.5, which may be taken as a fractional luma sample position), and predicts the given chroma sample with a simple linear regression (SLR) model estimated using ordinary least squares (OLS). However, as shown in fig. 10D, in some video content, one chroma sample may be simultaneously correlated (AC or DC correlation) with multiple luma samples, so a multiple linear regression (MLR) model may further improve the prediction accuracy.
Although CCCM mode can enhance intra prediction efficiency, there is room for further improvement in its performance. At the same time, some portions of the existing CCCM modes also need to be simplified to achieve efficient codec hardware implementations, or improved to have better codec efficiency. Furthermore, the tradeoff between implementation complexity and its codec efficiency benefits needs to be further improved.
Edge classification linear model (ELM)
In order to improve the codec efficiency of the luminance component and the chrominance component, a classifier that considers luminance edges or AC information is introduced, contrary to the above implementation in which only luminance DC values are considered. In addition to the existing band classification MMLM, the present disclosure also provides an exemplary classifier. The process of generating linear prediction models for different sets of points may be similar to CCLM or MMLM (e.g., via least squares or simplified min-max methods, etc.), but classified using different metrics. Different classifiers may be used to classify neighboring luma samples (e.g., of neighboring luma-chroma sample pairs) and/or luma samples corresponding to chroma samples to be predicted. Luminance samples corresponding to chroma samples may be obtained by a downsampling operation to match the positions of the corresponding chroma samples of the 4:2:0 video sequence. For example, luminance samples corresponding to chroma samples may be obtained by performing a downsampling operation on more than one (e.g., 4) reconstructed luminance samples (e.g., located around the chroma samples) corresponding to the chroma samples. Alternatively, for example, in the case of a 4:4:4 video sequence, luminance samples may be obtained directly from the reconstructed luminance samples. Alternatively, luminance samples may be obtained from respective ones of the reconstructed luminance samples located at respective co-located positions of the corresponding chrominance samples. For example, a luminance sample to be classified may be obtained from one reconstructed luminance sample of four reconstructed luminance samples corresponding to a chroma sample, which is located at an upper left position of the four reconstructed luminance samples, which may be regarded as a co-located position of the chroma sample.
The first classifier may classify the luma samples according to the luma sample edge strength. For example, one direction (e.g., 0 degrees, 45 degrees, or 90 degrees, etc.) may be selected to calculate the edge strength. A direction may be formed by the current sample and a neighboring sample along that direction (e.g., a neighboring sample located at 45 degrees to the upper right of the current sample). The edge strength may be calculated by subtracting the neighboring sample from the current sample. The edge strength may be quantized into one of M segments by M-1 thresholds, and the first classifier may use M classes to classify the current sample. Alternatively or additionally, N directions may be formed by the current sample and N neighboring samples along the N directions. N edge strengths may be calculated by subtracting the N neighboring samples from the current sample, respectively. Similarly, if each of the N edge strengths can be quantized into one of M segments by M-1 thresholds, the first classifier can use M^N classes to classify the current sample.
The second classifier may be used to classify according to local patterns. For example, the current luminance sample Y0 may be compared with N luminance samples Yi adjacent thereto. If the value of Y0 is greater than the value of Yi, the score may be incremented by one, otherwise the score may be decremented by one. The scores may be quantized to form K classes. The second classifier may classify the current sample point into one of the K classes. For example, the neighboring luminance samples may be obtained from four neighboring samples located above, to the left, to the right, and below the current luminance sample, i.e., without diagonal neighboring samples.
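The two classifiers can be sketched as follows (an illustrative reading; the exact quantization rules are design choices not fixed by the text above):

```python
def classify_edge_strength(cur, neighbor, thresholds):
    """First classifier: the edge strength along one chosen direction is
    the neighboring sample subtracted from the current sample; it is
    quantized into one of M = len(thresholds) + 1 segments by the M-1
    thresholds."""
    edge = cur - neighbor
    return sum(1 for t in thresholds if edge >= t)

def classify_local_pattern(y0, neighbors):
    """Second classifier: compare the current luma sample y0 with its N
    neighbors (e.g., above, left, right, below); the score is incremented
    by one when y0 is greater and decremented by one otherwise. The
    resulting score in [-N, N] may then be quantized into K classes."""
    return sum(1 if y0 > yi else -1 for yi in neighbors)
```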
It is contemplated that multiple first classifiers, second classifiers, or different instances of either the first classifier or the second classifier or other classifiers described herein may be combined. For example, the first classifier may be combined with an existing MMLM-threshold-based classifier. For another example, instance a of the first classifier may be combined with another instance B of the first classifier, wherein instances a and B take different directions (e.g., vertical and horizontal directions, respectively).
Those skilled in the art will recognize that, while the existing CCLM design in the VVC standard is used in this description as the basic CCLM method, the proposed cross-component methods described in this disclosure can also be applied to other prediction coding tools with a similar design spirit. For example, for the chroma-from-luma (CfL) tool in the AV1 standard, the proposed methods can also be applied by dividing the luma-chroma sample pairs into multiple sample groups.
Those skilled in the art will recognize that Y/Cb/Cr may also be denoted Y/U/V in the field of video encoding and decoding. For example, if the video data is in RGB format, the proposed method can also be applied by simply mapping YUV symbols to GBR.
Filter-based linear model (FLM)
Considering the possibility that one chroma sample may be correlated with multiple luma samples simultaneously, a filter-based linear model (FLM) using an MLR model was introduced, as described below.
For a chroma sample to be predicted, the reconstructed co-located and neighboring luma samples can be used to predict the chroma sample, in order to capture the inter-sample correlation among the co-located luma sample, the neighboring luma samples, and the chroma sample. The reconstructed luma samples are linearly weighted and combined with an "offset" to generate the predicted chroma sample (C: predicted chroma sample, Li: i-th reconstructed co-located or neighboring luma sample, αi: filter coefficients, β: offset, N: number of filter taps), as shown in equation (32-1) below:
C = α0 * L0 + α1 * L1 + ... + αN-1 * LN-1 + β (32-1)
Note that the linearly weighted luma samples plus the offset value directly form the predicted chroma sample (the filtering can be low-pass or high-pass, adapting to the video content), and the residual is then added to form the reconstructed chroma sample.
In some implementations, like CCCM, the offset term may also be implemented as the middle chroma value B (512 for 10-bit content) multiplied by another coefficient, as shown in equation (32-2) below:
C = α0 * L0 + α1 * L1 + ... + αN-1 * LN-1 + αN * B (32-2)
For a given CU, the top and left reconstructed luma and chroma samples can be used to derive or train the FLM parameters (αi, β). Like CCLM, αi and β can be derived via OLS. The top and left training samples are collected, and the pseudo-inverse matrix is calculated at both the encoder and decoder sides to derive the parameters, which are then used to predict the chroma samples in the given CU. Let N denote the number of filter taps applied to the luma samples, M denote the total number of top and left reconstructed luma-chroma sample pairs used for training the parameters, Li,j denote the luma sample with the i-th sample pair and the j-th filter tap, and Ci denote the chroma sample with the i-th sample pair; the following equations show the derivation of the pseudo-inverse matrix A+ as well as the parameters. Fig. 11 shows an example in which N is 6 (6 taps), M is 8, and the top 2 rows and left 3 columns of luma samples and the top 1 row and left 1 column of chroma samples are used to derive or train the parameters.
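A small floating-point sketch of the OLS derivation (x = (AᵀA)⁻¹Aᵀb = A⁺b), solving the normal equations by elimination; the function and variable names are illustrative, not from any codec software:

```python
def flm_derive_params(luma_taps, chroma):
    """Derive FLM coefficients (alpha_0..alpha_{N-1}, beta) by ordinary
    least squares. luma_taps[i] holds the N luma tap values of the i-th
    training sample pair; a constant 1 is appended per row so that the
    last returned coefficient is the offset beta."""
    A = [list(row) + [1.0] for row in luma_taps]       # M x (N+1)
    n = len(A[0])
    # Normal equations: (A^T A) x = A^T b
    AtA = [[sum(r[i] * r[j] for r in A) for j in range(n)] for i in range(n)]
    Atb = [sum(r[i] * c for r, c in zip(A, chroma)) for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting on [AtA | Atb]
    M = [AtA[i] + [Atb[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]
```

With training data generated by an exact linear model, the derivation recovers the model parameters, mirroring how the encoder and decoder both compute the same (αi, β) from the top/left reconstructed samples.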
Note that without the offset β, the chroma samples can be predicted using only αi; this may be regarded as a subset of the proposed method.
The proposed ELM/FLM/GLM (discussed below) can be extended straightforwardly to the CfL design in the AV1 standard, which transmits the model parameters (α, β) explicitly. For example, α and/or β are derived at the encoder at the SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level and signaled to the decoder for the CfL mode.
To further improve the coding performance, additional designs can be used in the FLM prediction. As shown in fig. 11 and discussed above, a 6-tap luma filter is used for the FLM prediction. However, although a multi-tap filter can fit the training data well (e.g., the top and left neighboring reconstructed luma and chroma samples), in some cases the training data do not capture the full characteristics of the testing data, which may lead to overfitting and poor prediction of the testing data (i.e., the chroma block samples to be predicted). Moreover, different filter shapes can adapt well to different video block contents, leading to more accurate prediction.
To solve this problem, the filter shape and the number of filter taps may be predefined or signaled or switched in a Sequence Parameter Set (SPS), an Adaptive Parameter Set (APS), a Picture Parameter Set (PPS), a Picture Header (PH), a Slice Header (SH), a region, a CTU, a CU, a sub-block or a sample level. A set of filter shape candidates may be predefined and the selection of the set of filter shape candidates may be signaled or switched in SPS, APS, PPS, PH, SH, region, CTU, CU, sub-block or sample level. Different components (e.g., U and V) may have different filter switching controls. For example, a set of filter shape candidates (e.g., indicated by indices 0 through 5) may be predefined, and filter shape (1, 2) may represent a 2-tap luma filter, filter shape (1, 2, 4) may represent a 3-tap luma filter, etc., as shown in fig. 11. The filter shape selection of the U and V components may be switched in PH or CU or CTU levels. Note that N taps may represent N taps with or without the offset β described herein. An example is given in table 8 below.
Table 8-exemplary signaling and switching for different filter shapes
Different chroma types and/or color formats can have different predefined filter shapes and/or taps. For example, as shown in fig. 12, a predefined filter shape (1, 2, 4, 5) may be used for 4:2:0 type-0, a predefined filter shape (0, 1, 2, 4, 7) may be used for 4:2:0 type-2, a predefined filter shape (1, 4) may be used for 4:2:2, and a predefined filter shape (0, 1, 2, 3, 4, 5) may be used for 4:4:4.
In another aspect of the present disclosure, the unavailable luma and chroma samples used to derive the MLR model can be padded from the available reconstructed samples. For example, if a 6-tap (0, 1, 2, 3, 4, 5) filter as shown in fig. 12 is used, then for a CU located at the left picture boundary, the left column including sample (0, 3) is not available (outside the picture boundary), so sample (0, 3) is padded by repetition from sample (1, 4) to apply the 6-tap filter. Note that the padding process can be applied to both the training data (the top and left neighboring reconstructed luma and chroma samples) and the testing data (the luma and chroma samples in the CU).
One or more shapes/numbers of filter taps may be used for FLM prediction, examples of which are shown in fig. 16, 17, and 18A-18B. One or more sets of filter taps may be used for FLM prediction, examples of which are shown in fig. 19A-19G.
As described above, the MLR model (linear equations) must be derived at both the encoder and the decoder. According to one or more aspects of the present disclosure, several methods are proposed to derive the pseudo-inverse matrix A+ or to solve the linear equations directly. Other known methods, such as the Newton method, the Cayley-Hamilton method, and the eigendecomposition mentioned in https://en., can also be applied.
In the present disclosure, A+ may be denoted as A^-1 for simplification. The linear equations can be solved by the methods below.
1. Closed-form (analytical) solution of A^-1 via the adjugate matrix (adjA), as follows:
The following shows the general N×N form, together with the 2×2 and 3×3 cases. If 3×3 is used for the FLM, 2 scaling parameters plus one offset need to be solved.
Ax = b, x = (A^T A)^-1 A^T b = A+ b (denoted as A^-1 b)
where each cofactor is the determinant of the (n-1)×(n-1) submatrix obtained by removing the j-th row and the i-th column.
2. Gauss-Jordan elimination
The linear equations can be solved using Gauss-Jordan elimination, by applying a series of elementary row operations to the augmented matrix [A | In] to obtain the reduced row echelon form [I | X]. The 2×2 and 3×3 examples are shown below.
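A small floating-point sketch of this method, reducing [A | I] to [I | A⁻¹] with elementary row operations (partial pivoting is added here for numerical safety):

```python
def gauss_jordan_inverse(A):
    """Invert a square matrix A by Gauss-Jordan elimination on the
    augmented matrix [A | I], producing the reduced row echelon form
    [I | X] with X = A^-1."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(i == j) for j in range(n)]
         for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        d = M[col][col]
        M[col] = [v / d for v in M[col]]      # scale the pivot row to 1
        for r in range(n):
            if r != col:                      # eliminate the other rows
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]
```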
3. Cholesky decomposition
To solve Ax = b, A can first be decomposed by the Cholesky-Crout algorithm to obtain an upper triangular matrix and a lower triangular matrix, and then forward substitution and backward substitution are applied in sequence to obtain the solution. A 3×3 example is shown below.
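A floating-point sketch of the Cholesky-Crout path described above (the special handling for non-decomposable matrices discussed later is omitted here):

```python
import math

def cholesky_solve(A, b):
    """Solve A x = b for symmetric positive-definite A: decompose
    A = L L^T with the Cholesky-Crout algorithm, then apply forward
    substitution (L y = b) and backward substitution (L^T x = y)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(A[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k]
                                     for k in range(j))) / L[j][j]
    y = [0.0] * n
    for i in range(n):                       # forward substitution
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):             # backward substitution
        x[i] = (y[i] - sum(L[k][i] * x[k]
                           for k in range(i + 1, n))) / L[i][i]
    return x
```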
In addition to the above examples, some cases require special handling. For example, if some circumstances lead to linear equations that cannot be solved, a default value can be used to fill the chroma prediction values. The default value can be predefined or signaled or switched in the SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level; for example, 1 << (bitDepth - 1), meanC, meanL, or meanC - meanL (the average of the current chroma, or of other chroma or luma values available in the FLM reconstructed neighboring region, or a subset thereof) can be predefined as the default value.
The following examples represent cases in which the matrix A cannot be solved, where a default predictor may be assigned to the entire current block:
1. Solving by the closed form (analytical solution, adjugate matrix), but A is singular (i.e., detA = 0);
2. Solving by Cholesky decomposition, but A cannot be Cholesky decomposed (Gjj < REG_SQR), where REG_SQR is a small value that can be predefined or signaled or switched in the SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level.
Fig. 11 shows a typical case of deriving FLM parameters using top 2 and/or left 3 luminance lines and top 1 and/or left 1 chrominance lines. However, as described above, parameter derivation using different regions may bring codec benefits due to different block contents and reconstruction quality of different neighboring samples. Several methods of selecting an application area for parameter derivation are presented below:
1. Similar to MDLM, FLM derivation can use only top or left luminance and/or chrominance samples to derive parameters. Whether FLM, flm_l or flm_t is used may be predefined or signaled or switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. Assuming that the current chroma block size is w×h, W 'and H' are obtained as follows:
-when FLM mode is applied, W '=w, H' =h;
when flm_t mode is applied, W' =w+we, where We represents the extended top luminance/chrominance samples;
When flm_l mode is applied, H' =h+he, where He represents the extended left luminance/chrominance sample point.
The number of extended luminance/chrominance samples (We, he) may be predefined or signaled or switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level.
For example, (We, He) can be predefined as (H, W), as in VVC CCLM, or as (W, H), as in ECM CCLM. The unavailable (We, He) luma/chroma samples can be padded by repetition from the nearest (horizontal, vertical) luma/chroma samples.
Fig. 13 shows a graphical representation of flm_l and flm_t (e.g., under 4 taps). When flm_l or flm_t is applied, only H 'or W' luminance/chrominance samples are used for parameter derivation, respectively.
2. Similar to MRL, different row indices may be predefined or signaled or switched in SPS/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample levels to indicate the selected luma-chroma sample pair rows. This may benefit from different reconstruction quality for different rows of samples.
Fig. 14 shows that, like the MRL, the FLM may use different row parameter derivation (e.g., under 4 taps). For example, the FLM may use light blue/yellow luminance and/or chrominance samples in index 1.
3. The CCLM region is extended and the full top N and/or left M rows are obtained for parameter derivation. Fig. 14 shows that all deep blue and light blue and yellow regions may be used simultaneously. Training with larger regions (data) may result in a more robust MLR model.
It should be understood that, throughout this disclosure, the luma sample values of an outer region of a video block to be decoded may be referred to as "outer luma sample values," and the chroma sample values of the outer region may be referred to as "outer chroma sample values."
The corresponding syntax for the FLM prediction can be defined as in table 9 below, where FLC denotes a fixed-length code, TU denotes a truncated unary code, EGk denotes a k-th order exponential-Golomb code, where k can be fixed or signaled/switched in the SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level, SVLC denotes the signed EG0 code, and UVLC denotes the unsigned EG0 code.
Table 9-examples of FLM syntax
Note that the binarization of each syntax element can be changed.
Based on the existing linear model design, a new method for cross-component prediction is provided to further improve coding and decoding accuracy and efficiency. The main aspects of the proposed method are detailed below.
While the FLM discussed above provides the best flexibility (best performance), it needs to solve many unknown parameters if the number of filter taps increases. When the inverse matrix is larger than 3×3, the closed-form derivation is unsuitable (too many multipliers), and an iterative method such as the Cholesky decomposition is required, which burdens the decoder processing cycles. In this section, pre-operations applied before the linear model are presented, including using sample gradients to exploit the correlation between the luma AC information and the chroma intensities. With the gradients, the number of filter taps can be reduced efficiently.
Note that the methods/examples in this section may be combined/reused from any of the designs discussed above, including but not limited to classification, filter shape, matrix derivation (with special handling), application area, syntax. Furthermore, the methods/examples listed in this section may also be applied to any of the designs discussed above to achieve better performance under certain complexity trade-offs.
Note that reference samples/training templates/reconstructed neighboring areas as used herein generally refer to luminance samples used to derive MLR model parameters that are then applied to internal luminance samples in one CU to predict chroma samples in the CU.
According to the proposed method, pre-operations (e.g., pre-linear weighting, sign, scale/absolute, thresholding, reLU) can be applied to reduce the dimension of the unknown parameters, instead of directly using the luminance sample intensity values as inputs to the linear model. In one example, the pre-operation may include calculating a sample difference based on the luminance sample value. As will be appreciated by those skilled in the art, the sample differences may be characterized as gradients, and thus this new approach is also referred to as a Gradient Linear Model (GLM) in certain embodiments.
Note that the following detailed description discusses a scenario in which the proposed pre-operations can be reused for/in combination with an SLR model (also referred to as a 1-tap case) and for/in combination with an MLR model (also referred to as a multi-tap case, e.g. 2 taps).
For example, instead of applying 2 taps on 2 luminance samples, pre-operations can be performed on the 2 luminance samples first, and then a simpler 1-tap model can be applied to reduce complexity. Figs. 15A to 15D show some examples of 1-tap/2-tap (with offset) pre-operations, where the 2-tap coefficients are denoted (a, b). Note that each circle shown in figs. 15A-15D represents an illustrative chroma position in the YUV 4:2:0 format. As discussed above, in the YUV 4:2:0 format, the luma sample corresponding to a chroma sample may be obtained by performing a downsampling operation on more than one (e.g., 4) reconstructed luma samples corresponding to the chroma sample (e.g., located around the chroma sample). In other words, the chroma position may correspond to one or more luminance samples, including a co-located luminance sample. Different 1-tap modes are designed for different gradient directions and use different "interpolated" luminance samples (weighted at different luminance positions) for the gradient calculation. For example, a typical filter [1, 0, -1; 1, 0, -1] is shown in figs. 15A, 15C and 15D, which represents the following operation:
Where RecL represents the reconstructed luminance sample value and RecL″(i, j) represents the pre-operated luminance sample value. Note also that the 1-tap filter shown in figs. 15A, 15C, and 15D can be understood as an alternative to the downsampling filter used in CCLM (see equations (6)-(7)), with the filter coefficients changed.
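As an illustration, the horizontal 1-tap GLM filter [1, 0, -1; 1, 0, -1] can be applied at a chroma position as sketched below in Python. The window placement mirrors the CCLM 6-tap downsampling window [1,2,1; 1,2,1]/8; the exact indexing convention (co-located luma at (2i, 2j)) and the function name are illustrative assumptions, not mandated by the text.

```python
# Hedged sketch: 1-tap GLM horizontal gradient at a chroma position in
# YUV 4:2:0 (type-0). The 2x3 window covers the same luma samples as the
# CCLM downsampling filter; only the coefficients differ.

def glm_gradient(rec_luma, ci, cj):
    """Horizontal gradient for the chroma sample at column ci, row cj
    (indexing convention assumed for illustration)."""
    x, y = 2 * ci, 2 * cj            # co-located luma position (assumption)
    win = [[rec_luma[y + r][x + c] for c in (-1, 0, 1)] for r in (0, 1)]
    coef = [[1, 0, -1], [1, 0, -1]]  # GLM filter replacing [1,2,1; 1,2,1]/8
    return sum(coef[r][c] * win[r][c] for r in range(2) for c in range(3))
```

On a horizontal luma ramp the filter responds with a constant nonzero value, while on a flat region it responds with zero, which is the high-pass behavior the text describes.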
The pre-operations may be based on gradients, edge direction (detection), pixel intensity, pixel variation, pixel variance, Roberts/Prewitt/compass/Sobel/Laplacian operators, high-pass filters (computing gradients or other related operators), low-pass filters (performing weighted average operations), etc. The edge direction detectors listed in the examples may be extended to different edge directions. For example, 1-tap (1, -1) or 2-tap (a, b) filters may be applied in different directions to detect different edge gradients. The filter shape/coefficients may be symmetric about the chroma position, as in the examples of figs. 15A-15D (4:2:0 type-0 case).
The pre-operation parameters (coefficients, signs, scale/absolute value, thresholding, ReLU) may be fixed or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. Note that, in an example, if multiple coefficients are applied to one sample (e.g., -1 and 4), they may be combined (e.g., into 3) to reduce the number of operations.
In one example, the pre-operation may involve calculating sample differences of the luminance sample values. Alternatively, the pre-operation may include performing downsampling by a weighted average operation. In some cases, the pre-operations may be applied repeatedly. For example, a low-pass smoothing FIR filter [1,2,1]/4 or [1,2,1; 1,2,1]/8 (i.e., downsampling) may first be applied as a template filter to remove outliers, and then a 1-tap GLM filter may be applied to calculate the sample differences used to derive the linear model. It is also contemplated that the sample differences may be calculated first, and downsampling applied afterwards.
In one example, the pre-operation coefficients (the finally applied coefficient (e.g., 3), or the intermediate coefficients applied to each luminance sample (e.g., -1 and 4)) may be limited to powers of 2 to save multipliers.
In one aspect of the present disclosure, the proposed new method may be reused for/combined with the CCLM discussed above, which utilizes a Simple Linear Regression (SLR) model and uses one corresponding luma sample value to predict the chroma sample value. This is also referred to as the 1-tap case. In this case, deriving the linear model further comprises deriving the scaling parameter α and the offset parameter β by using the pre-operated neighboring luma sample values and neighboring chroma sample values. Accordingly, the linear model may be rewritten as:
C=α·L+β (35)
Where L here denotes the "pre-operated" luminance samples. The parameter derivation of the 1-tap GLM may reuse the CCLM design, but considers the directional gradient (possibly with a high-pass filter). In one example, the scaling parameter α may be derived by using a division look-up table (as described in detail below) to achieve simplification.
In one example, when combining the GLM with the SLR model, the scaling parameter α and the offset parameter β may be derived by utilizing the min-max method discussed above. Specifically, the scaling parameter α and the offset parameter β may be derived by comparing the pre-operated neighboring luminance sample values to determine a minimum luminance sample value XA and a maximum luminance sample value XB, determining the corresponding chrominance sample values YA and YB of the minimum luminance sample value XA and the maximum luminance sample value XB, respectively, and deriving the scaling parameter α and the offset parameter β based on the minimum luminance sample value XA, the maximum luminance sample value XB, and the corresponding chrominance sample values YA and YB according to the following equations:
α=(YB-YA)/(XB-XA)
β=YA-α·XA (36)
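The min-max derivation above can be sketched as follows, with XA/XB the minimum/maximum pre-operated luma values and YA/YB the chroma values at the same positions, consistent with equation (36). Floating-point division is used here for clarity; an actual implementation would use the division LUT discussed later, and the function name is illustrative.

```python
def derive_slr_minmax(grad_luma, chroma):
    """Min-max SLR derivation: XA/XB are the min/max (pre-operated) luma
    values, YA/YB the chroma values at the same positions (equation (36))."""
    ia = min(range(len(grad_luma)), key=lambda k: grad_luma[k])
    ib = max(range(len(grad_luma)), key=lambda k: grad_luma[k])
    XA, XB = grad_luma[ia], grad_luma[ib]
    YA, YB = chroma[ia], chroma[ib]
    alpha = (YB - YA) / (XB - XA) if XB != XA else 0.0
    beta = YA - alpha * XA          # beta = YA - alpha * XA, per (36)
    return alpha, beta
```

The derived model then predicts a chroma value as alpha * L + beta, per equation (35).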
In one example, the scaling adjustment discussed above may be reused when combining the GLM with the SLR model. In this case, the encoder may determine a scaling adjustment value (e.g., "u") to be signaled in the bitstream and add the scaling adjustment value to the derived scaling parameter α. The decoder may determine the scaling adjustment value (e.g., "u") from the bitstream and add it to the derived scaling parameter α. The adjusted value is ultimately used to predict the internal chroma sample values.
In one aspect of the present disclosure, the proposed new method can be reused for/combined with the FLM, which utilizes a Multiple Linear Regression (MLR) model and uses multiple luminance sample values to predict the chroma sample value. This is also referred to as the multi-tap case, e.g., 2 taps. In this case, the linear model can be rewritten as:
C=α0·L0+α1·L1+β
In this case, the plurality of scaling parameters α and the offset parameter β may be derived by using the pre-operated neighboring luminance sample values and neighboring chrominance sample values. In one example, the offset parameter β is optional. In one example, at least one of the plurality of scaling parameters α may be derived by utilizing the sample differences. Furthermore, another one of the plurality of scaling parameters α may be derived by using the downsampled luminance sample values. In one example, at least one of the plurality of scaling parameters α may be derived by utilizing horizontal or vertical sample differences calculated on the basis of the downsampled neighboring luminance sample values. In other words, the linear model may combine multiple scaling parameters α associated with different pre-operations.
Implicit filter shape derivation
In one example, the directional filter shape to be used may be derived at the decoder to save bit overhead, rather than explicitly signaling the selected filter shape index. For example, at the decoder, a plurality of directional gradient filters may be applied to each reconstructed luma sample of the L-shaped template of the i-th neighboring row and column of the current block. The filtered values (gradients) may then be accumulated for each direction of the plurality of directional gradient filters, respectively. In an example, the accumulated value is an accumulation of the absolute values of the corresponding filtered values. After the accumulation, the direction of the directional gradient filter with the largest accumulated value may be determined as the derived (luminance) gradient direction. For example, a histogram of gradients (HoG) may be constructed to determine the maximum. The derived direction may further be used as the direction for predicting the chroma samples in the current block.
The following example relates to reusing the decoder-side intra mode derivation (DIMD) method for luma intra prediction included in ECM-4.0:
step 1, applying 2 kinds of directional gradient filters (3×3 horizontal/vertical Sobel) to each reconstructed luminance sample of the L-shaped template of the 2nd neighboring row and column of the current block;
step 2, accumulating the filtered values (gradients) for each direction of the directional gradient filters by SAD (sum of absolute differences);
step 3, constructing a histogram of gradients (HoG) based on the accumulated filtered values; and
step 4, determining the maximum value in the HoG as the derived (luminance) gradient direction, based on which the GLM filter can be determined.
In one example, if the shape candidates are [-1, 0, 1; -1, 0, 1] (horizontal) and [1, 2, 1; -1, -2, -1] (vertical), then the shape [-1, 0, 1; -1, 0, 1] is used for GLM-based chroma prediction when the maximum is associated with the horizontal shape.
The shape of the gradient filter used to derive the gradient direction may be the same as or different from the shape of the GLM filter. For example, both filters may be horizontal [-1, 0, 1; -1, 0, 1], or the two filters may have different shapes, with the GLM filter being determined based on the gradient filter.
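The four decoder-side steps above can be sketched as follows. The 3×3 Sobel kernels and the two-direction accumulation by SAD follow the text; the template representation (a list of sample positions) and the function names are illustrative assumptions.

```python
# Sketch of steps 1-4: accumulate |horizontal| and |vertical| 3x3 Sobel
# responses over an L-shaped template, then pick the dominant direction.
SOBEL_H = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]   # responds to horizontal change
SOBEL_V = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]   # responds to vertical change

def conv3x3(img, y, x, k):
    return sum(k[r][c] * img[y + r - 1][x + c - 1]
               for r in range(3) for c in range(3))

def derive_direction(img, template):
    """template: iterable of (y, x) positions on the L-shaped template."""
    hog = {"horizontal": 0, "vertical": 0}
    for (y, x) in template:
        hog["horizontal"] += abs(conv3x3(img, y, x, SOBEL_H))  # step 2: SAD
        hog["vertical"] += abs(conv3x3(img, y, x, SOBEL_V))
    return max(hog, key=hog.get)    # step 4: largest accumulated value
```

For a template straddling a vertical edge the horizontal gradient dominates, so the horizontal GLM shape would be selected, mirroring the example in the text.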
The proposed GLM may be combined with MMLM or ELM discussed above. When combined with classification, each group may share or have its own filter shape, with the syntax indicating the shape for each group. For example, as an exemplary classifier, a horizontal gradient grad_hor may be classified into a first group, which corresponds to a first linear model, and a vertical gradient grad_ver may be classified into a second group, which corresponds to a second linear model. In one example, the horizontal luminance pattern may be generated only once.
Additional possible classifiers are provided below. With a classifier, adjacent and internal luminance sample pairs of a current video block may be classified into groups based on one or more thresholds. Note that as discussed above, each neighboring/inner chroma sample and its corresponding luma sample may be referred to as a luma-chroma sample pair. One or more thresholds are associated with the intensities of the neighboring/internal luminance samples. In this case, each of the plurality of groups corresponds to a respective one of the plurality of linear models.
When combined with the MMLM classifier, the following operations may be performed: classifying the neighboring reconstructed luma-chroma sample pairs of the current video block into 2 groups based on Threshold; deriving different linear models for the different groups, wherein the derivation process may be GLM-simplified, i.e., the number of taps is reduced by the pre-operations described above; classifying the luma-chroma sample pairs inside the CU (the internal luma-chroma sample pairs, wherein each of the internal luma-chroma sample pairs comprises an internal chroma sample value to be predicted with the derived linear models) into 2 groups based on Threshold; applying the different linear models to the reconstructed luma samples in the different groups; and predicting the chroma samples in the CU based on the linear models of the different classes. Here, Threshold may be the average of the neighboring reconstructed luminance samples. Note that by increasing the number of Thresholds, the number of classes (2) can be extended to multiple classes (e.g., equally partitioned based on the minimum/maximum values of the neighboring reconstructed (downsampled) luminance samples, fixed or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level).
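A minimal sketch of the two-group classification step described above, assuming Threshold is the average of the neighboring reconstructed luma samples; the helper name and the pair-list representation are hypothetical.

```python
def mmlm_classify(neigh_luma, neigh_chroma):
    """Split neighboring luma-chroma sample pairs into 2 groups around the
    average neighboring luma value; each group later gets its own
    (GLM-simplified) linear model."""
    threshold = sum(neigh_luma) // len(neigh_luma)   # average neighboring luma
    g0 = [(l, c) for l, c in zip(neigh_luma, neigh_chroma) if l <= threshold]
    g1 = [(l, c) for l, c in zip(neigh_luma, neigh_chroma) if l > threshold]
    return threshold, g0, g1
```

The same threshold is then applied to the internal luma samples of the CU so that each internal sample is predicted with the model of its own group.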
In one example, instead of the MMLM luminance DC intensity, the filtered values of the FLM/GLM applied to the neighboring luminance samples are used for classification. For example, if a 1-tap (1, -1) GLM is applied, the average AC value (physical meaning) is used. The processing may be: classifying the neighboring reconstructed luma-chroma sample pairs into K groups based on one or more filter shapes, one or more filtered values, and K-1 thresholds Ti; deriving different MLR models for the different groups, wherein the derivation process may be GLM-simplified, i.e., the number of taps is reduced by the pre-operations described above; similarly classifying the luma-chroma sample pairs inside the CU (the internal luma-chroma sample pairs, wherein each of the internal luma-chroma sample pairs comprises an internal chroma sample value to be predicted with the derived linear models) based on the one or more filter shapes, the one or more filtered values, and the K-1 thresholds Ti; applying the different linear models to the reconstructed luma samples in the different groups; and predicting the chroma samples in the CU based on the linear models of the different classes. The thresholds Ti may be predefined (e.g., 0, or given by a table) or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. For example, a threshold may be the average AC value (filtered value) of the neighboring reconstructed (possibly downsampled) luminance samples (2 groups), or determined based on the min/max AC values (K groups).
It has also been proposed to combine the GLM with the ELM classifier. As shown in figs. 15A to 15D, one filter shape (e.g., 1-tap) may be selected to calculate the edge intensity. The direction is determined as the direction along which the sample difference between the current sample and N neighboring samples (e.g., all 6 luminance samples) is calculated. For example, the filter at the upper middle portion of fig. 15A (shape [1, 0, -1; 1, 0, -1]) indicates the horizontal direction, since the sample differences between the samples are calculated in the horizontal direction, while the filter below it (shape [1, 2, 1; -1, -2, -1]) indicates the vertical direction, since the sample differences between the samples are calculated in the vertical direction. The positive and negative coefficients in each of the filters enable the sample differences to be calculated. The processing may then include: calculating one edge intensity from the filtered values; quantizing the edge intensity into M segments by M-1 thresholds Ti; classifying the current sample using K classes (e.g., K = M); deriving different MLR models for the different groups, wherein the derivation process may be GLM-simplified, i.e., the number of taps is reduced by the pre-operations described above; classifying the luminance-chrominance sample pairs inside the CU into K groups; applying the different MLR models to the reconstructed luminance samples in the different groups; and predicting the chrominance samples in the CU based on the MLR models of the different classes. Note that the filter shape used for classification may be the same as or different from the filter shape used for MLR prediction. Both the number of thresholds, M-1, and the threshold values Ti may be fixed or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. In addition, other classifiers/combined classifiers as discussed for the ELM may also be used for the GLM.
If the number of classified samples in a group is less than a number (e.g., a predefined 4), the default values mentioned in the discussion of matrix derivation for the MLR model may be applied to the group parameters (αi, β). Default values may also be applied if the corresponding neighboring reconstructed samples are not available for the selected LM mode, for example, when the MMLM_L mode is selected but the left samples are invalid.
Several methods related to the simplification of GLM are introduced below to further improve the codec efficiency.
Matrix/parameter derivation in the FLM requires floating-point operations (e.g., the division in the closed-form solution), which is expensive for decoder hardware, so a fixed-point design is required. The 1-tap GLM case can be considered a modified luma reconstructed sample generation of the CCLM (e.g., for the horizontal gradient direction, changing from the CCLM filter [1,2,1; 1,2,1]/8 to the GLM filter [-1,0,1; -1,0,1]), so the original CCLM processes can be reused for the GLM, including the fixed-point operations, MDLM downsampling, division table, applied size constraints, min-max approximation, and scaling adjustment. For all of these items, the 1-tap GLM may have its own configuration or share the same design as the CCLM. For example, the parameters are derived using the simplified min-max method (instead of LMS) and combined with the scaling adjustment after the GLM model is derived. In this case, the center point (luminance value yr) used for rotating the slope becomes the average value of the reference luminance sample "gradients". For another example, when the GLM is turned on for a CU, the CCLM slope adjustment is inferred to be off, and the syntax related to slope adjustment need not be signaled.
This section takes the typical case reference sample points (top 1 row and left 1 column) as an example. Note that as shown in fig. 14, the extended reconstructed region may also use simplifications of the same nature, and may have a syntax (e.g., MDLM, MRL) that indicates a particular region.
Note that the following aspects may be combined and applied jointly. For example, the division process is performed in conjunction with reference sample downsampling and the division table.
When classification (MMLM/ELM) is applied, each group may apply the same or different simplified operations. For example, before applying the right shift, the samples of each group are respectively padded to the target sample number, and then the same derivation process and the same division table are applied.
Fixed point implementation
The 1-tap case may reuse the CCLM design, where division by n may be achieved by a right shift and division by a2 may be achieved by a LUT. The integer parameters involved in the integer design of the LMS CCLM, including nα and nTable, and the intermediate parameters used to derive the linear model (equations (19)-(20)), may have the same values as in the CCLM or different values for greater accuracy. The integer parameters may be predefined or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level, and may be adjusted according to the sequence bit depth. For example, nTable = bit depth + 4.
MDLM downsampling
When the GLM is combined with MDLM, the existing total number of samples for parameter derivation may not be a power of 2, and needs to be padded to a power of 2 to replace division with a right-shift operation. For example, for an 8×4 chroma CU, MDLM requires W+H = 12 samples, whereas for MDLM_T only 8 samples are available (reconstructed); the 4 downsampled samples (at positions 0, 2, 4, 6) can then be equivalently padded to reach the power-of-2 target. The code to implement such an operation is as follows:
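Since the referenced listing is not reproduced here, the following is a hedged sketch of the padding step: the sample count is padded to the next power of 2 so that the division in the parameter derivation can be replaced by a right shift, with padding values taken from even positions (0, 2, 4, ...) as in the 8×4 example. The exact padding rule used by MDLM in the reference software may differ.

```python
def pad_to_pow2(samples):
    """Pad the sample list to the next power of 2 so that division by the
    sample count can become a right shift. Padding values are taken from
    even positions (0, 2, 4, ...) of the existing samples; other schemes
    (repeat/mirror the last sample) also work."""
    n = len(samples)
    target = 1 << (n - 1).bit_length()          # next power of 2 (>= n)
    pad = [samples[(2 * k) % n] for k in range(target - n)]
    return samples + pad
```

For 12 available samples this pads with the samples at positions 0, 2, 4, 6, reaching 16 samples, matching the positions mentioned in the text.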
Other padding methods, such as repetition/mirroring with respect to the last neighboring sample (the rightmost/bottommost), may also be applied.
The padding method for the GLM may be the same as or different from that of the CCLM.
Note that in the ECM version, for an 8×4 chroma CU, MDLM_T/MDLM_L requires 2W/2H = 16/8 samples, respectively; in this case, the same padding method can be applied to reach the target power-of-2 number of samples.
Division LUT
The division LUTs proposed for CCLM/LIC (Local Illumination Compensation) during the development of known standards such as AVC/HEVC/AV1/VVC/AVS can be used for the GLM division. For example, the LUT in JCTVC-I0166 for bit depth = 10 is reused (table 4). The division LUT may be different from that of the CCLM. For example, the CCLM uses the min-max method with DivTable (as in equation (5)), whereas the GLM may use a 32-entry LMS division LUT (as in table 5).
When the GLM is combined with MMLM, the meanL value may not always be positive (e.g., when filtered/gradient values are used to classify the groups), so sgn(meanL) needs to be extracted, and abs(meanL) is used to look up the division LUT. Note that the division LUTs used for the MMLM classification and for the parameter derivation may be different. For example, a lower-precision LUT (e.g., the min-max LUT) may be used for the mean classification, and a higher-precision LUT (e.g., that of the LMS) may be used for the parameter derivation.
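A sketch of the sign-aware LUT division described above: sgn(meanL) is extracted and abs(meanL) indexes a reciprocal table. The LUT size and precision here (16 entries of rounded 8-bit reciprocals, with a simple clamp instead of a normalizing shift) are illustrative assumptions, not the JCTVC-I0166 values.

```python
# Hedged sketch: fixed-point division num/meanL via a reciprocal LUT, with
# sign handling because meanL can be negative under gradient classification.
LUT_BITS, VAL_BITS = 4, 8
DIV_LUT = [((1 << VAL_BITS) + d // 2) // d if d else 0
           for d in range(1 << LUT_BITS)]       # round((1 << 8) / d)

def lut_divide(num, meanL):
    sign = -1 if meanL < 0 else 1               # extract sgn(meanL)
    d = min(abs(meanL), (1 << LUT_BITS) - 1)    # abs(meanL), clamped (sketch)
    return sign * ((num * DIV_LUT[d]) >> VAL_BITS)
```

The multiply-and-shift replaces an actual division, at the cost of the LUT's precision.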
Size constraint and latency constraint
Similar to the CCLM design, some size constraints may be applied for ELM/FLM/GLM. For example, the same constraint for luminance-chrominance delays in a dual-tree may be applied.
The size constraint may be based on CU area/width/height/depth. The threshold may be predefined or signaled in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. For example, for a chroma CU area, the predefined threshold may be 128.
In one example, at least one pre-operation is performed in response to determining that the video block meets an enabling threshold, wherein the enabling threshold is associated with an area, a width, a height, or a segmentation depth of the video block. In particular, the enablement threshold may define a minimum or maximum area, width, height, or segmentation depth of the video block. As understood by those skilled in the art, a video block may include a current chroma block and its co-located luma block. It is also proposed to jointly apply the above-mentioned enabling threshold to the current chroma block and its co-located luma block. For example, in response to determining that both the current chroma block and its co-located luma block meet an enable threshold, at least one pre-operation is performed.
Line buffer reduction
Similar to the CCLM design, if the co-located luma region of the current chroma CU contains the first row inside a CTU, the top template sample generation may be limited to 1 row to reduce the line-buffer storage for the CTU row. Note that when the upper reference line is located at the CTU boundary, only one luma line (the common line buffer in intra prediction) is used to generate the downsampled luma samples.
For example, in fig. 13, if the co-located luma region of the current chroma CU contains the first row inside a CTU, the top template may be limited to using only 1 row (instead of 2) for parameter derivation (other CUs may still use 2 rows). This saves luma sample line-buffer storage when CTU rows are processed one by one in decoder hardware. Line-buffer reduction may be achieved by several methods. Note that the example limited to "1" row can be extended to N rows with similar operations. Similarly, the 2-tap or multi-tap cases may also apply such operations. When multiple taps are applied, such operations may also need to be applied to the chroma samples.
For example, the 1-tap filter [1, 0, -1; 1, 0, -1] shown in fig. 15A is taken as an example. The filter can be reduced to [0, 0, 0; 1, 0, -1], i.e., only the lower coefficients are used. Alternatively, the restricted upper-row luminance samples may be padded from below (repetition, mirroring, 0, meanL, meanC, etc.).
Taking N = 4 as an example, i.e., the video block is located at the top boundary of the current CTU, the neighboring luma sample values and the corresponding chroma sample values of the top 4 rows are used to derive the linear model. Note that the corresponding chroma sample values may refer to the corresponding top 4 rows of neighboring chroma sample values (e.g., for the YUV 4:4:4 format). Alternatively, the corresponding chroma sample values may refer to the corresponding top 2 rows of neighboring chroma sample values (e.g., for the YUV 4:2:0 format). In this case, the neighboring luminance sample values and corresponding chrominance sample values of the top 4 rows may be divided into two regions: a first region including valid sample values (e.g., the luminance sample values and corresponding chrominance sample values of the nearest row), and a second region including invalid sample values (e.g., the luminance sample values and corresponding chrominance sample values of the other three rows). The coefficients of the filter corresponding to sample positions not belonging to the first region may then be set to zero, such that only sample values from the first region are used to calculate the sample differences. For example, as discussed above, in this case the filter [1, 0, -1; 1, 0, -1] may be reduced to [0, 0, 0; 1, 0, -1]. Alternatively, the nearest sample values in the first region may be padded into the second region, so that the padded sample values may be used to calculate the sample differences.
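The coefficient-zeroing idea above (keeping only filter rows that fall in the valid region) can be sketched as follows; the representation of a filter as a list of coefficient rows and the function name are illustrative.

```python
def restrict_filter(coef, valid_rows):
    """Zero the filter rows that fall in the invalid region above the CTU
    line, e.g. [1,0,-1; 1,0,-1] -> [0,0,0; 1,0,-1] when only the lower
    row of the window is valid."""
    return [row if r in valid_rows else [0] * len(row)
            for r, row in enumerate(coef)]
```

The alternative described in the text (padding the nearest valid row into the invalid region) leaves the coefficients unchanged and modifies the samples instead.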
Fusion of chroma intra prediction modes
In one example, because the GLM can be considered a special CCLM mode, the fusion design can be reused or have its own way. Multiple (two or more) weights may be applied to generate the final predictor. For example,
pred=(w0*pred0+w1*pred1+(1<<(shift-1)))>>shift
Where pred0 is a predictor based on a non-LM mode and pred1 is a GLM-based predictor, or
pred0 is a CCLM-based predictor (including all MDLM/MMLM) and pred1 is a GLM-based predictor, or
pred0 is a GLM-based predictor and pred1 is another GLM-based predictor.
Different I/P/B slices may have different weight designs (w 0 and w 1) depending on whether neighboring blocks are coded with CCLM/GLM/other coding modes or block size/width/height.
For example, the weight design may be determined by the intra prediction modes of the neighboring chroma blocks, with shift set equal to 2. Specifically, when both the above and left neighboring blocks are coded using LM modes, {w0, w1} = {1, 3}; when both the above and left neighboring blocks are coded using non-LM modes, {w0, w1} = {3, 1}; otherwise, {w0, w1} = {2, 2}. For non-I slices, w0 and w1 may both be set equal to 2.
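The weighted fusion and the neighbor-based weight selection above can be sketched as follows. pred0/pred1 are per-sample predictor values, and representing the neighboring modes as two booleans is an illustrative simplification.

```python
def fuse(pred0, pred1, up_is_lm, left_is_lm, shift=2):
    """Weighted fusion of a non-LM predictor (pred0) and an LM/GLM predictor
    (pred1); weights follow the neighbor-mode rule described above."""
    if up_is_lm and left_is_lm:
        w0, w1 = 1, 3           # both neighbors LM-coded: favor the LM side
    elif not up_is_lm and not left_is_lm:
        w0, w1 = 3, 1           # both neighbors non-LM-coded
    else:
        w0, w1 = 2, 2
    # pred = (w0*pred0 + w1*pred1 + (1 << (shift-1))) >> shift, with rounding
    return (w0 * pred0 + w1 * pred1 + (1 << (shift - 1))) >> shift
```

With shift = 2 the four weights always sum to 4, so the result stays in the predictor's dynamic range.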
For the syntax design, if a non-LM mode is selected, a flag is signaled to indicate whether fusion is applied.
As described above, GLM has a good gain complexity tradeoff because it can reuse existing CCLM modules without introducing additional derivation. Such a 1-tap design may be further extended or generalized in accordance with one or more aspects of the present disclosure.
In one aspect of the present disclosure, for a chroma sample to be predicted, a single corresponding luma sample L may be generated by combining the co-located luma sample and neighboring luma samples. For example, the combination may be a combination of different linear filters, e.g., a combination of a high-pass gradient filter (GLM) and a low-pass smoothing filter (e.g., the [1,2,1; 1,2,1]/8 FIR downsampling filter commonly used in CCLM), and/or a combination of a linear filter and a non-linear filter (e.g., one with a power of n, e.g., L^n, where n may be a positive number, a negative number, or a ± fraction (e.g., +1/2 (square root) or +3 (cube)), whose output may be rounded and rescaled to the bit-depth dynamic range).
In one aspect of the disclosure, the combination may be applied repeatedly. For example, a combination of the GLM and the [1,2,1; 1,2,1]/8 FIR filter may be applied to the reconstructed luminance samples, and then a nonlinear power of 1/2 may be applied. For example, the nonlinear filter may be implemented as a LUT (look-up table): for bit depth = 10 and a power of n with n = 1/2, LUT[i] = (int)(sqrt(i) + 0.5) << 5, for i = 0 to 1023, where the left shift by 5 rescales the result to the bit depth = 10 dynamic range. The nonlinear filter may provide an option for cases where a linear filter cannot efficiently model the luminance-chrominance relationship. Whether the non-linear term is used may be predefined or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level.
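The power-of-1/2 LUT described above can be built directly from the stated formula; a minimal sketch for bit depth = 10 (the table name is illustrative):

```python
import math

BIT_DEPTH = 10
# LUT[i] = (int)(sqrt(i) + 0.5) << 5 : power-of-1/2 nonlinearity. sqrt of a
# 10-bit value fits in ~5 bits, so the left shift by 5 rescales the output
# back to the 10-bit dynamic range.
NL_LUT = [int(math.sqrt(i) + 0.5) << 5 for i in range(1 << BIT_DEPTH)]
```

A combined (linear then nonlinear) luma value is then obtained by indexing NL_LUT with the clipped linear-filter output.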
In one or more aspects of the present disclosure, GLM may refer to a Generalized Linear Model (which may be used to linearly or non-linearly generate one single luminance sample, and the generated single luminance sample may be fed into the CCLM linear model to derive the parameters of the CCLM linear model); the linear/non-linear generation may be referred to as a generic mode. Different gradient or generic modes may be combined to form another mode. For example, a gradient mode may be combined with a downsampled value as in CCLM; a gradient mode may be combined with a non-linear L^2 value; or a gradient mode may be combined with another gradient mode, where the two gradient modes to be combined may have different directions or the same direction, e.g., [1,1,1; -1,-1,-1] and [1,2,1; -1,-2,-1], both of which have the vertical direction, or [1,1,1; -1,-1,-1] and [1,0,-1; 1,0,-1], which have the vertical and horizontal directions, respectively, as shown in figs. 15A-15D. The combination may include addition, subtraction, or linear weighting.
GLM applied to the downsampled domain
As described above, the pre-operations may be applied repeatedly, and the GLM may be applied to pre-linearly-weighted/pre-operated samples. For example, as in CCLM, a template filter may be applied to the luma samples to remove outliers (i.e., the CCLM downsampling smoothing filter), generating downsampled luma samples (one downsampled luma sample corresponding to one chroma sample) using the low-pass smoothing FIR filter [1,2,1; 1,2,1]/8. Thereafter, a 1-tap GLM may be applied to the smoothed downsampled luminance samples to derive the MLR model.
Some gradient filter patterns (such as 3×3 Sobel or Prewitt operators) may be applied to the downsampled luminance samples. The following table shows some of the gradient filter patterns.
The gradient filter pattern may be combined with other gradient/generic filter patterns in the downsampled luminance domain. In one example, a combined filter pattern may be applied to the downsampled luminance samples. For example, the combined filter pattern may be derived by performing an addition or subtraction operation on the corresponding coefficients of the gradient filter pattern and a DC/low-pass-based filter pattern, such as the filter pattern [0,0,0; 0,1,0; 0,0,0] or [1,2,1; 2,4,2; 1,2,1]. In another example, the combined filter pattern is derived by performing an addition or subtraction operation on the coefficients of the gradient filter pattern and a non-linear value (such as L^2). In another example, the combined filter pattern is derived by performing an addition or subtraction operation on the corresponding coefficients of the gradient filter pattern and another gradient filter pattern having a different direction or the same direction. In another example, the combined filter pattern is derived by performing a linear weighting operation on the coefficients of the gradient filter pattern.
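The coefficient-wise addition/subtraction of filter patterns described above can be sketched as follows, here combining the standard 3×3 Sobel horizontal operator with the DC pattern [0,0,0; 0,1,0; 0,0,0]; the function and variable names are illustrative.

```python
def combine_patterns(p, q, sign=1):
    """Element-wise add (sign=1) or subtract (sign=-1) two 3x3 filter
    patterns, e.g. a Sobel gradient pattern plus a DC pattern."""
    return [[a + sign * b for a, b in zip(pr, qr)] for pr, qr in zip(p, q)]
```

Adding the DC pattern keeps a fraction of the sample's own intensity alongside its gradient, which is one way to form the combined gradient-plus-low-pass patterns mentioned in the text.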
The GLM applied to the downsampled domain may fit into the CCCM framework, but may sacrifice high-frequency accuracy, because low-pass smoothing is applied before the GLM is applied.
In one or more aspects of the present disclosure, one or more grammars may be introduced to indicate information about GLM. Table 10 below shows an example of GLM syntax.
Table 10
FLC: fixed-length code
TU: truncated unary code
EGk: exponential Golomb code of order k, where k may be fixed or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level
SVLC: signed EG0
UVLC: unsigned EG0
Note that the binarization of each syntax element may be changed.
In one aspect of the present disclosure, the GLM on/off control for the Cb/Cr components may be performed jointly or separately. For example, at the CU level, 1 flag may be used to indicate whether the GLM is active for the CU. If it is active, 1 flag may be used to indicate whether both Cb and Cr are active. If not both are active, 1 flag may indicate whether Cb or Cr is active. When Cb and/or Cr is active, the filter index/gradient (generic) mode may be signaled separately. All flags may have their own context models or be bypass coded.
In another aspect of the present disclosure, whether to signal the GLM on/off flag may depend on the luma/chroma coding mode and/or the CU size. For example, in the ECM5 chroma intra mode syntax, the GLM may be inferred to be off when MMLM or MMLM_L or MMLM_T is applied; when the CU area < A, where A may be predefined or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level; or, if combined with CCCM, when CCCM is on.
Note that when GLM is combined with MMLM, the different models may share the same gradient/generic pattern or each have their own gradient/generic pattern.
When GLM is combined with CCCM/FLM, if CCCM/FLM is enabled for the current CU, then the CU-level GLM enable flag may be inferred to be off, for example:
hasGlmFlag&=!pu.cccmFlag;
CCCM without downsampling procedure
CCCM requires processing of the downsampled luminance reference values before calculating the model parameters and applying the CCCM model, which increases the burden on the decoder processing cycles. In this section, CCCM without the downsampling process is presented, including utilizing non-downsampled luminance reference values and/or selecting among different non-downsampled luminance references. One or more filter shapes may be used for this purpose, as described below.
Note that the methods/examples in this section may be reused in combination with/together with the above-mentioned methods, including but not limited to methods related to classification, filter shape, matrix derivation (with special handling), application area and syntax. Furthermore, the methods/examples listed in this section may also be applied with the above-described methods/examples (more taps) to have better performance under certain complexity trade-offs.
In this disclosure, reference samples/training templates/reconstructed neighboring areas generally refer to luminance samples used to derive MLR model parameters, which are then applied to internal luminance samples in one CU to predict chroma samples in the CU.
Filter shape
One or more shapes/numbers of filter taps may be used for CCCM prediction, as shown in figs. 16, 17, and 18A-18B. One or more sets of filter taps may be used for FLM prediction, examples of which are shown in figs. 19A-19G. The selected luminance reference values are not downsampled. One or more predefined shapes/numbers of filter taps may be used for CCCM prediction based on previously decoded information at the TB/CB/slice/picture/sequence level.
While a multi-tap filter may fit the training data well (i.e., the top/left neighboring reconstructed luma/chroma samples), in some cases where the training data does not capture the complete characteristics of the test data, this may result in overfitting, so that the test data (i.e., the chroma block samples to be predicted) is not predicted well. Furthermore, different filter shapes may adapt well to different video block contents, resulting in more accurate prediction. To solve this problem, the filter shape/number of filter taps may be predefined or signaled/switched at the SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. A set of filter shape candidates may be predefined or signaled/switched at the SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. Different components (U/V) may have different filter switching controls. For example, as shown in the following table, a set of filter shape candidates (index = 0 to 5) is predefined, where the filter shape (1, 2) represents a 2-tap luminance filter and the filter shape (1, 2, 4) represents a 3-tap luminance filter, etc., as shown in fig. 11. The filter shape selection for the U/V components may be switched at the PH or CU/CTU level. Note that an N-tap filter may represent N taps with or without the offset β as described above.
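A candidate table of the kind described above might be sketched as follows; only indices 1 and 2, corresponding to the (1, 2) and (1, 2, 4) shapes quoted in the text, are grounded in the description, and the remaining entries are hypothetical placeholders standing in for the table that accompanies fig. 11.

```python
# Hypothetical filter-shape candidate table (index 0..5). Each tuple lists
# the luma tap positions the filter uses; entries other than indices 1 and 2
# are invented placeholders for illustration.
FILTER_SHAPE_CANDIDATES = {
    0: (0, 1, 2, 3, 4, 5),  # 6-tap (placeholder)
    1: (1, 2),              # 2-tap luminance filter, from the text
    2: (1, 2, 4),           # 3-tap luminance filter, from the text
    3: (0, 1, 2),           # placeholder
    4: (2, 4),              # placeholder
    5: (2,),                # placeholder
}

def select_filter_shape(idx):
    """Look up the filter shape for a predefined or signaled index
    (e.g., switched at the PH or CU/CTU level, per the text)."""
    return FILTER_SHAPE_CANDIDATES[idx]
```

Each U/V component could carry its own index under this scheme, matching the separate filter switching controls described above.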
Different chroma types/color formats may have different predefined filter shapes/taps. For example, the predefined filter shapes may be (1, 2, 4, 5) for 420 type-0, (0, 1, 2, 4, 7) for 420 type-2, (1, 4) for 422, and (0, 1, 2, 3, 4, 5) for 444, as shown in fig. 12.
Luminance/chrominance samples that are not available for deriving the MLR model may be filled from the available reconstructed samples. For example, if the 6-tap (0, 1, 2, 3, 4, 5) filter of fig. 12 is used, then for a CU located at the left picture boundary, the left column comprising taps (0, 3) is not available (beyond the picture boundary), so the values of taps (0, 3) are repetitively filled from taps (1, 4) in order to apply the 6-tap filter. Note that the padding process is applied to both the training data (top/left neighboring reconstructed luminance/chrominance samples) and the test data (luminance/chrominance samples in the CU).
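The repetitive-fill rule in the example above can be sketched as follows, assuming the tap layout of fig. 12 in which taps (0, 3) form the left column and taps (1, 4) are their neighbors; the `REPEAT_FROM` mapping and the accessor callbacks are assumptions made for illustration.

```python
# Assumed fill mapping: each unavailable tap is copied from its neighbor tap,
# per the left-picture-boundary example in the text.
REPEAT_FROM = {0: 1, 3: 4}

def padded_tap_values(taps, get_sample, is_available):
    """Return a value per tap, repeating from the mapped neighbor tap when a
    tap position is unavailable (e.g., beyond the picture boundary)."""
    values = {}
    for t in taps:
        if is_available(t):
            values[t] = get_sample(t)
        else:
            values[t] = get_sample(REPEAT_FROM[t])  # repetitive fill
    return values
```

The same routine would run over both the training data and the test data, since the text applies the padding process to both.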
Luminance/chrominance samples that are not available for deriving the MLR model may be skipped and not used in accordance with one or more embodiments of the present disclosure. Thus, a filling process is not required for unavailable luminance/chrominance samples.
CCLM/MMLM with LDL decomposition
CCCM needs to perform an LDL decomposition to calculate the model parameters of the CCCM model; this avoids square root operations and requires only integer operations. In this section, CCLM/MMLM with LDL decomposition is presented. As described above, LDL decomposition can also be used in ELM/FLM/GLM.
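For illustration, the floating-point sketch below solves a symmetric positive-definite system A x = b (the form the least-squares normal equations take when deriving linear model parameters) via an LDL^T decomposition; a real codec would use fixed-point integer arithmetic, but the sketch shows why no square root is needed, unlike Cholesky's L L^T factorization.

```python
import numpy as np

def ldl_solve(A, b):
    """Solve A x = b for symmetric positive-definite A via A = L D L^T.
    Only additions, multiplications, and divisions appear; no square roots."""
    n = A.shape[0]
    L = np.eye(n)
    D = np.zeros(n)
    # Factorization: unit lower-triangular L and diagonal D.
    for j in range(n):
        D[j] = A[j, j] - np.sum(L[j, :j] ** 2 * D[:j])
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.sum(L[i, :j] * L[j, :j] * D[:j])) / D[j]
    # Forward substitution: L y = b.
    y = np.zeros(n)
    for i in range(n):
        y[i] = b[i] - L[i, :i] @ y[:i]
    # Diagonal scaling: D z = y.
    z = y / D
    # Back substitution: L^T x = z.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = z[i] - L[i + 1:, i] @ x[i + 1:]
    return x
```

In the CCLM/MMLM setting, A would be the autocorrelation of the reference luminance values and b their cross-correlation with the reference chrominance values; the solution x holds the model coefficients.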
Note that the methods/examples in this section may be reused in combination with/together with the above-mentioned methods, including but not limited to methods related to classification, filter shape, matrix derivation (with special handling), application area and syntax. Furthermore, the methods/examples listed in this section may also be applied with the above methods/examples to have better performance under certain complexity trade-offs.
In this disclosure, reference samples/training templates/reconstructed neighboring areas generally refer to luminance samples used to derive MLR model parameters, which are then applied to internal luminance samples in one CU to predict chroma samples in the CU.
CCLM/MMLM with extended range
One or more reference sample regions may be used for CCLM/MMLM prediction; i.e., as shown in fig. 10B, the reference region may be the same as the reference region in CCCM. Different reference regions may be used for CCLM/MMLM prediction based on previously decoded information at the TB/CB/slice/picture/sequence level.
While training data from multiple reference regions may fit the model parameter calculation well, in some cases where the training data does not capture the complete characteristics of the test data, this may result in overfitting, so that the test data (i.e., the chroma block samples to be predicted) is not predicted well. Furthermore, different reference regions may adapt well to different video block contents, resulting in more accurate prediction. To solve this problem, the reference shape/number of reference regions may be predefined or signaled/switched at the SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. A set of reference region candidates may be predefined or signaled/switched at the SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. Different components (U/V) may have different reference region switching controls. For example, a set of candidate reference regions (index = 0 to 4) is predefined, as shown in the following table. The reference region selection for the U/V components may be switched at the PH or CU/CTU level. Different chroma types/color formats may have different predefined reference regions.
Luminance/chrominance samples that are not available for deriving the MLR model may be padded from the available reconstructed samples, the padding process being applied in both training data (top/left neighboring reconstructed luminance/chrominance samples) and test data (luminance/chrominance samples in the CU).
Luminance/chrominance samples that are not available for deriving the MLR model may be skipped and not used in accordance with one or more embodiments of the present disclosure. Thus, a filling process is not required for unavailable luminance/chrominance samples.
FLM/GLM/ELM/CCCM with minimum sample restriction
FLM needs to process the downsampled luminance reference values and calculate the model parameters, which increases the burden on the decoder processing cycles, especially for small blocks. In this section, an FLM with a minimum sample restriction is presented; e.g., FLM is used only when the number of samples exceeds a predefined number (such as 64 or 128). One or more different restrictions may be used for this purpose; e.g., FLM in a single model is used only when the number of samples exceeds a predefined number (such as 256), and FLM in multiple models is used only when the number of samples exceeds a predefined number (such as 128).
According to one or more embodiments of the present disclosure, the predefined minimum number of samples for multiple models may be greater than or equal to the predefined minimum number of samples for a single model. For example, FLM/GLM/ELM/CCCM in a single model is used only when the number of samples is greater than or equal to a predefined number (such as 128), and FLM/GLM/ELM/CCCM in multiple models is used only when the number of samples is greater than or equal to a predefined number (such as 256).
According to one or more embodiments of the present disclosure, the predefined minimum number of samples for FLM/GLM/ELM may be greater than or equal to the predefined minimum number of samples for CCCM. For example, CCCM in a single model is used only when the number of samples is greater than or equal to a predefined number (such as 0), and CCCM in multiple models is used only when the number of samples is greater than or equal to a predefined number (such as 128); FLM in a single model is used only when the number of samples is greater than or equal to a predefined number (such as 128), and FLM in multiple models is used only when the number of samples is greater than or equal to a predefined number (such as 256).
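The minimum-sample checks in the examples above can be sketched as follows; the threshold table is hypothetical and simply mirrors the example numbers quoted in the text, which in a real codec would be predefined or signaled in the bitstream.

```python
# Hypothetical (single-model, multi-model) minimum-sample thresholds, taken
# from the example numbers in the text; the multi-model threshold is >= the
# single-model one.
MIN_SAMPLES = {
    "CCCM": (0, 128),
    "FLM":  (128, 256),
}

def mode_allowed(mode, num_samples, multi_model):
    """A mode is allowed only when the block has at least the predefined
    minimum number of samples for the chosen model count."""
    single_min, multi_min = MIN_SAMPLES[mode]
    return num_samples >= (multi_min if multi_model else single_min)
```

A decoder could use such a check to infer a mode flag as off without signaling it whenever the block is too small, which is the burden-reduction goal stated above.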
Note that the methods/examples in this section may be reused in combination with/together with the above-mentioned methods, including but not limited to methods related to classification, filter shape, matrix derivation (with special handling), application area and syntax. Furthermore, the methods/examples listed in this section may also be applied with the above-described methods/examples (more taps) to have better performance under certain complexity trade-offs.
Fig. 20 illustrates a workflow of a method 2000 of decoding video data in accordance with one or more aspects of the present disclosure.
At step 2010, method 2000 includes obtaining video blocks from a bitstream.
At step 2020, method 2000 includes obtaining internal luma sample values of the video block, external luma sample values of an external region of the video block, and external chroma sample values of the external region.
At step 2030, method 2000 includes calculating downsampled internal luminance sample values from the obtained internal luminance sample values of the video block and downsampled external luminance sample values from the obtained external luminance sample values of the external region, respectively.
At step 2040, method 2000 includes calculating filtered values for the downsampled external luminance sample values, wherein each of the filtered values is calculated based on a combined filter pattern derived from at least one gradient filter pattern that enables sample differences between the downsampled external luminance sample values to be calculated.
At step 2050, method 2000 includes predicting internal chroma sample values for the video block using the combined filter pattern based on the downsampled internal luma sample values and the filtered values.
At step 2060, method 2000 includes obtaining a decoded video block using the predicted internal chroma sample values.
In one example, calculating the downsampled internal luminance sample value and the downsampled external luminance sample value includes obtaining the downsampled internal luminance sample value and the downsampled external luminance sample value by performing a weighted average operation.
In one example, predicting the internal chroma sample values of the video block includes deriving a linear model by using the filtered values and the external chroma sample values and predicting each of the internal chroma sample values of the video block by applying the linear model to the filtered values of the downsampled internal luma sample values, wherein each of the filtered values of the downsampled internal luma sample values is calculated based on the combined filter pattern.
In one example, the combined filter pattern is derived by performing an addition or subtraction operation on the respective coefficients of the at least one gradient filter pattern and the at least one low pass based filter pattern.
In another example, the combined filter pattern is derived by performing an addition or subtraction operation on the coefficients and non-linear values of at least one gradient filter pattern.
In one example, the non-linear value comprises a square of the corresponding downsampled internal luminance sample value.
In one example, the non-linear values are scaled to a range of luminance sample values for the video block.
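One plausible way to realize "the square of the luminance sample, scaled to the sample range" is the rounding-shift form below; the exact rounding and scaling used by a particular codec is an assumption here, not a quotation from the disclosure.

```python
def nonlinear_term(c, bit_depth=10):
    """Square of a downsampled luma sample, scaled back into the sample
    range [0, 2**bit_depth - 1] with a rounding offset. The rounding form
    is an assumption for illustration."""
    mid = 1 << (bit_depth - 1)          # rounding offset (half of the range)
    return (c * c + mid) >> bit_depth   # c^2 scaled down by the sample range
```

Without this scaling, the squared term would occupy twice the bit depth of the linear terms, so keeping it in the same range simplifies a fixed-point model derivation.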
In one example, the combined filter pattern is derived by performing an addition or subtraction operation on respective coefficients of at least one gradient filter pattern and another gradient filter pattern.
In one example, the combined filter pattern is derived by performing a linear weighting operation on coefficients of at least one gradient filter pattern.
Fig. 21 illustrates a workflow of a method 2100 for encoding video data in accordance with one or more aspects of the present disclosure.
At step 2110, method 2100 includes obtaining a video block.
At step 2120, method 2100 includes obtaining internal luma sample values of the video block, external luma sample values of an external region of the video block, and external chroma sample values of the external region.
At step 2130, method 2100 includes calculating downsampled internal luminance sample values from the obtained internal luminance sample values of the video block and downsampled external luminance sample values from the obtained external luminance sample values of the external region, respectively.
At step 2140, method 2100 includes calculating filtered values for the downsampled external luminance sample values, wherein each of the filtered values is calculated based on a combined filter pattern derived from at least one gradient filter pattern that enables calculation of sample differences between the downsampled external luminance sample values.
At step 2150, method 2100 includes predicting internal chroma sample values for a video block using a combined filter pattern based on the downsampled internal luma sample values and the filtered values.
At step 2160, method 2100 includes generating a bitstream including the encoded video block using the predicted internal chroma sample values.
In one example, calculating the downsampled internal luminance sample value and the downsampled external luminance sample value includes obtaining the downsampled internal luminance sample value and the downsampled external luminance sample value by performing a weighted average operation.
In one example, predicting the internal chroma sample values of the video block includes deriving a linear model by using the filtered values and the external chroma sample values and predicting each of the internal chroma sample values of the video block by applying the linear model to the filtered values of the downsampled internal luma sample values, wherein each of the filtered values for the downsampled internal luma sample values is calculated based on a combined filter pattern.
In one example, the combined filter pattern is derived by performing an addition or subtraction operation on the respective coefficients of the at least one gradient filter pattern and the at least one low pass based filter pattern.
In one example, the combined filter pattern is derived by performing an addition or subtraction operation on the coefficients and non-linear values of at least one gradient filter pattern.
In one example, the non-linear value comprises a square of the corresponding downsampled internal luminance sample value.
In one example, the non-linear values are scaled to a range of luminance sample values for the video block.
In one example, the combined filter pattern is derived by performing an addition or subtraction operation on respective coefficients of at least one gradient filter pattern and another gradient filter pattern.
In one example, the combined filter pattern is derived by performing a linear weighting operation on coefficients of at least one gradient filter pattern.
FIG. 22 illustrates an example computing system 2200 in accordance with one or more aspects of the disclosure. The computing system 2200 may include at least one processor 2210. Computing system 2200 may also include at least one storage device 2220. The storage device 2220 may store computer executable instructions that, when executed, cause the processor 2210 to perform the steps of the methods described above. The processor 2210 may be a general purpose processor or may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. The storage device 2220 may store input data, output data, data generated by the processor 2210, and/or instructions for execution by the processor 2210.
It should be appreciated that the storage device 2220 may store computer executable instructions that, when executed, cause the processor 2210 to perform any operations according to embodiments of the disclosure.
Embodiments of the present disclosure may be embodied in a computer-readable medium, such as a non-transitory computer-readable medium. The non-transitory computer-readable medium may include instructions that, when executed, cause one or more processors to perform any operations in accordance with embodiments of the present disclosure. For example, the instructions, when executed, may cause one or more processors to receive a bitstream and perform decoding operations as described above. As another example, the instructions, when executed, may cause the one or more processors to perform encoding operations and transmit a bitstream including encoded video information associated with predicted chroma samples as described above.
It should be recognized that all operations in the above-described methods are merely exemplary, and that the present disclosure is not limited to any operations in the methods or to the order of such operations, but rather should encompass all other equivalents under the same or similar concepts.
It should also be appreciated that all of the modules in the above methods may be implemented in a variety of ways. These modules may be implemented as hardware, software, or a combination thereof. Furthermore, any of these modules may be functionally further divided into sub-modules or combined together.
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Accordingly, the claims are not intended to be limited to the aspects shown herein. All structural and functional equivalents to the elements of the various aspects described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims.