
Method and apparatus for cross-component prediction for video coding

Info

Publication number
CN119256544A
Authority
CN
China
Prior art keywords
glm
samples
chroma
luminance
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202380042295.6A
Other languages
Chinese (zh)
Inventor
郭哲玮
朱弘正
修晓宇
闫宁
陈伟
王祥林
于冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Publication of CN119256544A


Abstract



The present disclosure provides a method for encoding and decoding video data, including: obtaining a bitstream; obtaining an indication of information related to a gradient linear model (GLM) from the bitstream, wherein the GLM is used to obtain one or more filtered values based on intensity differences between luminance samples; and decoding the video data based on the information related to the GLM.

Description

Method and apparatus for cross-component prediction for video coding
Cross Reference to Related Applications
The present application claims the benefit of U.S. Provisional Application No. 63/346,253, filed on May 26, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
Aspects of the present disclosure relate generally to image/video coding and compression, and more particularly, to methods and apparatus for cross-component prediction techniques.
Background
Various video coding techniques may be used to compress video data. Video encoding and decoding are performed according to one or more video coding standards. For example, video coding standards include Versatile Video Coding (VVC), High Efficiency Video Coding (H.265/HEVC), Advanced Video Coding (H.264/AVC), Moving Picture Experts Group (MPEG) coding, and so forth. Video coding typically employs prediction methods (e.g., inter-prediction, intra-prediction, etc.) that exploit redundancy present in video images or sequences. Video coding aims at compressing video data into a form that uses a lower bit rate while avoiding or minimizing degradation of video quality.
Disclosure of Invention
The following presents a simplified summary in accordance with one or more aspects of the disclosure in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
According to one aspect of the present disclosure, there is provided a method for decoding video data, comprising obtaining a bitstream, obtaining from the bitstream an indication indicative of information related to a Gradient Linear Model (GLM), wherein the GLM is configured to obtain one or more filtered values based on intensity differences between luminance samples, and decoding the video data based on the information related to the GLM.
According to another aspect of the present disclosure, there is provided a method for encoding video data, comprising obtaining an indication indicative of information related to a Gradient Linear Model (GLM), wherein the GLM is configured to obtain one or more filtered values based on intensity differences between luminance samples, encoding the video data based on the information related to the GLM, and obtaining a bitstream comprising the encoded video data and the indication indicative of the information related to the GLM.
According to an embodiment, a computer system is provided that includes one or more processors and one or more storage devices storing computer-executable instructions that, when executed, cause the one or more processors to perform the operations of the methods of the present disclosure.
According to an embodiment, a computer program product is provided that stores computer-executable instructions that, when executed, cause one or more processors to perform the operations of the methods of the present disclosure.
According to an embodiment, a computer-readable medium is provided that stores computer-executable instructions that, when executed, cause one or more processors to perform the operations of the methods of the present disclosure.
According to an embodiment, a computer readable medium storing a bitstream generated according to the operation of the method of the present disclosure is provided.
Drawings
The disclosed aspects will be described below in conjunction with the drawings, which are provided to illustrate and not limit the disclosed aspects.
Fig. 1 shows a block diagram of a generic block-based hybrid video coding system.
Fig. 2A to 2E show five division types including quaternary division, horizontal binary division, vertical binary division, horizontal ternary division, and vertical ternary division.
Fig. 3 shows a general block diagram of a block-based video decoder.
Fig. 4 shows examples of left and upper samples and the positions of the samples of the current block involved in the CCLM mode.
Fig. 5A-5C illustrate examples of deriving CCLM parameters.
Fig. 6 shows an example of classifying neighboring samples into two groups based on a value Threshold.
Fig. 7 shows an example of classifying neighboring sample points into two groups based on the turning points.
Fig. 8A and 8B show the effect of scaling the adjustment parameter "u".
Fig. 8C shows co-located reconstructed luminance samples.
Fig. 8D shows neighboring reconstructed samples.
Fig. 8E to 8H show steps of decoder-side intra mode derivation.
Fig. 9 shows an example of four reference lines adjacent to a prediction block.
Fig. 10A and 10B show schematic diagrams of the correlation between a chroma sample and one or more luma samples.
Fig. 11 illustrates an example of using 6 taps in a Multiple Linear Regression (MLR) model in accordance with one or more aspects of the present disclosure.
Fig. 12 illustrates example different filter shapes and/or tap numbers in accordance with one or more aspects of the present disclosure.
Fig. 13 shows an example in which FLM can only use top or left luminance and/or chrominance samples (extended) for parameter derivation.
Fig. 14 shows an example in which FLM may use different rows for parameter derivation.
Fig. 15 shows some examples of 1 tap/2 tap pre-operations.
FIG. 16 illustrates an exemplary pattern of the convolutional cross-component model (CCCM).
Fig. 17 shows an exemplary reference region consisting of 6 lines of chroma samples above and to the left of the PU.
Fig. 18 illustrates a workflow of a method for decoding video data in accordance with one or more aspects of the present disclosure.
Fig. 19 illustrates a workflow of a method for encoding video data in accordance with one or more aspects of the present disclosure.
FIG. 20 illustrates an exemplary computing system in accordance with one or more aspects of the present disclosure.
Detailed Description
Reference will now be made in detail to the present embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous non-limiting specific details are set forth in order to provide an understanding of the subject matter presented herein. It will be apparent, however, to one of ordinary skill in the art that various alternatives may be used and that the subject matter may be practiced without these specific details without departing from the scope of the claims. For example, it will be apparent to one of ordinary skill in the art that the subject matter presented herein may be implemented on many types of electronic devices having digital video capabilities.
It should be noted that the terms "first," "second," and the like, as used in the description and claims of the present disclosure and in the accompanying drawings, are used for distinguishing between objects and not for describing any particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that embodiments of the disclosure described herein may be implemented in other sequences than those illustrated in the figures or otherwise described in the disclosure.
The first version of the VVC standard was completed in July 2020, and it provides a bit rate saving of approximately 50% at equivalent perceptual quality compared to the previous-generation video coding standard HEVC. Although the VVC standard provides significant coding improvements over its predecessor, there is evidence that superior coding efficiency can be achieved with additional coding tools. Recently, the Joint Video Experts Team (JVET), a collaboration of ITU-T VCEG and ISO/IEC MPEG, began to explore advanced technologies that could substantially improve coding efficiency over VVC. In April 2021, a software codebase named the Enhanced Compression Model (ECM) was established for future video coding exploration work. The ECM reference software is based on the VVC Test Model (VTM) developed by JVET for VVC, with several existing modules (e.g., intra/inter prediction, transform, loop filter, etc.) further extended and/or improved. Any new coding tool beyond the VVC standard needs to be integrated into the ECM platform and tested using the JVET common test conditions (CTCs).
Like all previous video coding standards, the ECM is built on a block-based hybrid video coding framework. Fig. 1 shows a block diagram of a generic block-based hybrid video coding system. The input video signal is processed block by block (each block is called a Coding Unit, CU). In ECM-1.0, a CU may be up to 128×128 pixels. However, as in VVC, one Coding Tree Unit (CTU) is partitioned into CUs based on a quadtree/binary-tree/ternary-tree structure to accommodate different local characteristics. In the multi-type tree structure, one CTU is first partitioned by a quadtree structure. Each quadtree leaf node may then be further partitioned by binary-tree and ternary-tree structures. As shown in Figs. 2A, 2B, 2C, 2D, and 2E, there are five split types: quaternary partitioning, horizontal binary partitioning, vertical binary partitioning, horizontal ternary partitioning, and vertical ternary partitioning.
In Fig. 1, spatial prediction and/or temporal prediction may be performed. Spatial prediction (or "intra prediction") predicts the current video block using pixels from samples of already decoded neighboring blocks (which are called reference samples) in the same video picture/slice. Spatial prediction reduces the spatial redundancy inherent in the video signal. Temporal prediction (also referred to as "inter prediction" or "motion compensated prediction") uses reconstructed pixels from already coded video pictures to predict the current video block. Temporal prediction reduces the temporal redundancy inherent in the video signal. The temporal prediction signal for a given CU is usually signaled by one or more motion vectors (MVs), which indicate the amount and direction of motion between the current CU and its temporal reference. Also, if multiple reference pictures are supported, one reference picture index is additionally sent, which identifies from which reference picture in the reference picture store the temporal prediction signal comes. After spatial and/or temporal prediction, the mode decision block in the encoder chooses the best prediction mode, for example based on the rate-distortion optimization method. The prediction block is then subtracted from the current video block, and the prediction residual is decorrelated using a transform and quantized. The quantized residual coefficients are inverse-quantized and inverse-transformed to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU. Furthermore, in-loop filtering, such as a deblocking filter, Sample Adaptive Offset (SAO), and Adaptive Loop Filter (ALF), may be applied to the reconstructed CU before it is put into the reference picture store and used to code future video blocks. To form the output video bitstream, the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding unit to be further compressed and packed to form the bitstream. It should be noted that the term "block" or "video block" as used herein may be a portion, in particular a rectangular (square or non-square) portion, of a frame or picture. Referring to HEVC and VVC, a block or video block may be or correspond to a Coding Tree Unit (CTU), a CU, a Prediction Unit (PU) or a Transform Unit (TU), and/or may be or correspond to a corresponding block, e.g., a Coding Tree Block (CTB), a Coding Block (CB), a Prediction Block (PB) or a Transform Block (TB), and/or a sub-block.
Fig. 3 shows a general block diagram of a block-based video decoder. The video bitstream is first entropy decoded at an entropy decoding unit. The coding mode and prediction information are sent to a spatial prediction unit (if intra coded) or a temporal prediction unit (if inter coded) to form a prediction block. The residual transform coefficients are sent to an inverse quantization unit and an inverse transform unit to reconstruct the residual block. The prediction block and the residual block are then added together. The reconstructed block may also be loop filtered before being stored in the reference picture store. The reconstructed video in the reference picture store is then sent out to drive the display device and used to predict future video blocks.
The primary focus of the present disclosure is to further enhance the coding efficiency of the cross-component linear model (CCLM), a coding tool for cross-component prediction in the ECM. Hereinafter, some relevant coding tools in the ECM are briefly reviewed, some drawbacks in the existing CCLM design are discussed, and finally solutions for improving the existing CCLM prediction design are provided.
Cross-component linear model prediction
To reduce cross-component redundancy, a cross-component linear model (CCLM) prediction mode is used in VVC for which chroma samples are predicted based on reconstructed luma samples of the same CU by using a linear model:
pred_C(i,j) = α · rec_L′(i,j) + β    (1)

where pred_C(i,j) represents the predicted chroma samples in the CU and rec_L′(i,j) represents the downsampled reconstructed luma samples of the same CU, obtained by performing downsampling on the reconstructed luma samples rec_L(i,j). The above α and β are linear model parameters derived from at most four neighboring chroma samples and their corresponding downsampled luma samples, which may be referred to as neighboring luma-chroma sample pairs. Assuming that the current chroma block has a size of W×H, W′ and H′ are obtained as follows:
- When LM mode is applied, W′ = W, H′ = H;
- When LM-A mode is applied, W′ = W + H;
- When LM-L mode is applied, H′ = H + W;

where, in LM mode, the upper and left side samples of the CU are used together to calculate the linear model coefficients; in LM_A mode, only the upper samples of the CU are used; and in LM_L mode, only the left side samples of the CU are used.
If the positions of the upper neighboring samples of the chroma block are denoted as S[0, −1] … S[W′−1, −1], and the positions of the left neighboring samples of the chroma block are denoted as S[−1, 0] … S[−1, H′−1], the positions of the four neighboring chroma samples are selected as follows:
- When LM mode is applied and both the upper and left neighboring samples are available, S[W′/4, −1], S[3W′/4, −1], S[−1, H′/4], S[−1, 3H′/4] are selected as the positions of the four neighboring chroma samples;
- When LM-A mode is applied or only the upper neighboring samples are available, S[W′/8, −1], S[3W′/8, −1], S[5W′/8, −1], S[7W′/8, −1] are selected as the positions of the four neighboring chroma samples;
- When LM-L mode is applied or only the left neighboring samples are available, S[−1, H′/8], S[−1, 3H′/8], S[−1, 5H′/8], S[−1, 7H′/8] are selected as the positions of the four neighboring chroma samples.
The four neighboring luma samples corresponding to the selected positions are obtained by a downsampling operation, and the four obtained neighboring luma samples are compared four times to find the two larger values x0_A and x1_A and the two smaller values x0_B and x1_B. The chroma sample values corresponding to the two larger values and the two smaller values are denoted as y0_A, y1_A, y0_B and y1_B, respectively. Then Xa, Xb, Ya and Yb are derived as follows:
Xa = (x0_A + x1_A + 1) >> 1;
Xb = (x0_B + x1_B + 1) >> 1;
Ya = (y0_A + y1_A + 1) >> 1;
Yb = (y0_B + y1_B + 1) >> 1    (2)
Finally, the linear model parameters α and β are obtained according to the following equations:

α = (Ya − Yb) / (Xa − Xb)    (3)
β = Yb − α · Xb    (4)
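As an illustrative sketch of the derivation above (Python, with invented helper names; not text from the standard), the four neighboring sample pairs may be processed as follows:

def cclm_min_max_params(neighbor_luma, neighbor_chroma):
    # Derive (alpha, beta) from four neighboring luma/chroma sample pairs,
    # following equations (2)-(4). Sorting by luma value is an illustrative
    # shortcut for the four comparisons described above.
    pairs = sorted(zip(neighbor_luma, neighbor_chroma))
    (x0B, y0B), (x1B, y1B), (x0A, y0A), (x1A, y1A) = pairs
    Xa = (x0A + x1A + 1) >> 1
    Xb = (x0B + x1B + 1) >> 1
    Ya = (y0A + y1A + 1) >> 1
    Yb = (y0B + y1B + 1) >> 1
    # Equations (3)-(4); the real codec replaces this division with the
    # DivTable look-up discussed below.
    alpha = (Ya - Yb) / (Xa - Xb) if Xa != Xb else 0.0
    beta = Yb - alpha * Xb
    return alpha, beta

def cclm_predict(luma_ds, alpha, beta):
    # Equation (1): pred_C(i,j) = alpha * rec_L'(i,j) + beta
    return alpha * luma_ds + beta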
Fig. 4 shows examples of the positions of the left and upper samples and the samples of the current block involved in the CCLM mode, including the positions of the left and upper samples of an N×N chroma block in the CU and the positions of the left and upper samples of a 2N×2N luma block in the CU.
The division operation for calculating the parameter α is implemented with a look-up table. To reduce the memory required for storing the table, the diff value (the difference between the maximum and minimum values) and the parameter α are expressed in an exponential notation. For example, diff is approximated with a 4-bit significand and an exponent. Consequently, the table for 1/diff is reduced to 16 elements for the 16 values of the significand, as follows:

DivTable[] = { 0, 7, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 1, 1, 0 }    (5)

This has the benefit of both reducing the complexity of the calculation and reducing the memory size required for storing the table.
In addition to being used together to calculate the linear model coefficients, the upper and left templates can alternatively be used individually in the other 2 LM modes (referred to as LM_A (also denoted LM_T) and LM_L modes).

In LM_T mode, only the upper template is used to calculate the linear model coefficients. To get more samples, the upper template is extended to (W+H) samples. In LM_L mode, only the left template is used to calculate the linear model coefficients. To get more samples, the left template is extended to (H+W) samples.

In LM_LT mode, the left and upper templates are used together to calculate the linear model coefficients.
To match the chroma sample locations of a 4:2:0 video sequence, two types of downsampling filters are applied to the luma samples to achieve a 2:1 downsampling ratio in both the horizontal and vertical directions. The selection of the downsampling filter is specified by an SPS-level flag. The two downsampling filters correspond to "type 0" and "type 2" content, respectively.
Note that when the upper reference line is located at the CTU boundary, only one luma line (the common line buffer in intra prediction) is used to produce the downsampled luma samples.
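The filter equations themselves are not reproduced above. As a reference sketch only, the two 4:2:0 luma downsampling filters used by CCLM in VVC (a 6-tap filter for "type 0" chroma siting and a 5-tap cross-shaped filter for "type 2") can be written as follows, where p_y[x][y] is the reconstructed luma array indexed horizontally by x and vertically by y, and interior positions are assumed (boundary padding is omitted):

def downsample_type0(p_y, x, y):
    # "Type 0" (chroma sited halfway between two luma rows): weights
    # {1,2,1} horizontally, averaged over two luma rows, sum >> 3.
    return (p_y[2*x - 1][2*y] + 2 * p_y[2*x][2*y] + p_y[2*x + 1][2*y]
            + p_y[2*x - 1][2*y + 1] + 2 * p_y[2*x][2*y + 1]
            + p_y[2*x + 1][2*y + 1] + 4) >> 3

def downsample_type2(p_y, x, y):
    # "Type 2" (chroma co-located with a luma sample): 5-tap cross filter
    # with weights {0,1,0; 1,4,1; 0,1,0}, sum >> 3.
    return (p_y[2*x][2*y - 1] + p_y[2*x - 1][2*y] + 4 * p_y[2*x][2*y]
            + p_y[2*x + 1][2*y] + p_y[2*x][2*y + 1] + 4) >> 3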
This parameter computation is performed as part of the decoding process, and not just as an encoder search operation. As a result, no syntax is used to transmit the α and β values to the decoder.
For chroma intra mode coding, a total of 8 intra modes are allowed. These modes include five traditional intra modes and three cross-component linear model modes (CCLM, LM_A, and LM_L). The chroma mode signaling and derivation procedure is shown in Table 1. Chroma mode coding depends directly on the intra prediction mode of the corresponding luma block. Since separate block partition structures for the luma and chroma components are enabled in I slices, one chroma block may correspond to multiple luma blocks. Therefore, for the chroma DM mode, the intra prediction mode of the corresponding luma block covering the center position of the current chroma block is directly inherited.
TABLE 1 deriving chroma prediction modes from luma modes when CCLM is enabled
Regardless of the value of sps_cclm_enabled_flag, a single binarization table is used, as shown in Table 2.
TABLE 2 unified binarization table for chroma prediction modes
In Table 2, the first binary bit indicates whether it is the normal mode (0) or the LM mode (1). If it is the LM mode, the next binary bit indicates whether it is LM_CHROMA (0). If it is not LM_CHROMA, the next binary bit indicates whether it is LM_L (0) or LM_A (1). For this case, when sps_cclm_enabled_flag is 0, the first binary bit of the binarization table of the corresponding intra_chroma_pred_mode may be discarded before entropy coding; in other words, the first binary bit is inferred to be 0 and is therefore not coded. This single binarization table is used for both the cases where sps_cclm_enabled_flag is equal to 0 and equal to 1. The first two binary bits in Table 2 are context-coded with their own context models, and the remaining binary bits are bypass-coded.
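A minimal sketch of this bin-string parse (illustrative Python; read_bin stands in for the entropy decoder's bin-reading call, and the mode names are symbolic):

def parse_normal_mode(read_bin):
    # Placeholder for the remaining bypass-coded bins that select one of
    # the traditional chroma intra modes; details omitted in this sketch.
    return "NORMAL_MODE"

def parse_chroma_mode(read_bin, sps_cclm_enabled_flag):
    # When sps_cclm_enabled_flag is 0, the first binary bit is inferred
    # to be 0 (normal mode) and is not present in the bitstream.
    if not sps_cclm_enabled_flag or read_bin() == 0:
        return parse_normal_mode(read_bin)
    if read_bin() == 0:                            # second bit: 0 => LM_CHROMA
        return "LM_CHROMA"
    return "LM_L" if read_bin() == 0 else "LM_A"   # third bit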
In addition, to reduce the luma-chroma latency in dual tree, when the 64×64 luma coding tree node is not split (and ISP is not used for the 64×64 CU) or is partitioned with QT, the chroma CUs in the 32×32/32×16 chroma coding tree nodes are allowed to use CCLM as follows:
If the 32×32 chroma node is not partitioned or QT partitioned, then all chroma CUs in the 32×32 node may use CCLM.
If the 32×32 chroma node is partitioned with horizontal BT and the 32×16 child node is not partitioned or partitioned using vertical BT, then all chroma CUs in the 32×16 chroma node may use CCLM.
The CCLM is not allowed for chroma CUs under all other luma and chroma coding tree partitioning conditions.
During ECM development, the simplified derivation of α and β (the min-max approximation) was removed. Instead, a linear least-squares solution between the causal reconstructed data of the downsampled luma samples and the causal chroma samples is used to derive the model parameters α and β:

α = ( I · Σ Rec_C(i) · Rec′_L(i) − Σ Rec_C(i) · Σ Rec′_L(i) ) / ( I · Σ Rec′_L(i) · Rec′_L(i) − (Σ Rec′_L(i))² )    (8)

β = ( Σ Rec_C(i) − α · Σ Rec′_L(i) ) / I    (9)

where Rec_C(i) and Rec′_L(i) indicate the reconstructed chroma samples and the downsampled reconstructed luma samples around the target block, and I indicates the total number of samples of the neighboring data.
The LM_A and LM_L modes are also called the multi-directional linear model (MDLM). Fig. 5A shows an example of MDLM operation when the block content cannot be predicted from the L-shaped reconstructed region. Fig. 5B shows MDLM_L, which uses only the left reconstructed samples to derive the CCLM parameters. Fig. 5C shows MDLM_T, which uses only the top reconstructed samples to derive the CCLM parameters.
The integerization of the least-mean-squares (LMS) derivation discussed above (see equations (8)-(9)) has been proposed as an improvement over CCLM. An initial integer design of the LMS CCLM was first presented in JCTVC-C206. The method was then improved by a series of simplifications, which finally formed the ECM LMS version, including JCTVC-F0233/I0178, which reduces the α precision nα from 13 to 7; JCTVC-I0151, which reduces the maximum multiplier bit width; and JCTVC-H0490/I0166, which reduces the division LUT entries from 64 to 32.
As discussed in equation (1), the integer design models the correlation of luminance and chrominance signals using a linear relationship. The chrominance values are predicted from the reconstructed luminance values of the co-located block.
In YUV 4:2:0 sampling, the luma and chroma components have different sampling ratios. The sampling rate of the chroma component is half that of the luma component, with a 0.5-pixel phase difference in the vertical direction. The reconstructed luma samples require downsampling in the vertical direction and subsampling in the horizontal direction to match the size of the chroma signal. For example, the downsampling may be achieved by:
Rec′_L(i,j) = ( rec_L(2i, 2j) + rec_L(2i, 2j+1) ) >> 1    (10)
In equation (8), floating-point operations are required to compute the linear model parameter α with high data accuracy, and when α is represented by a floating-point value, a floating-point multiplication is involved in equation (1). In this section, an integer implementation of the algorithm is designed. Specifically, the fractional part of the parameter α is quantized with nα bits of data accuracy. The parameter α is represented by an amplified and rounded integer value α′, with α′ = α × (1 << nα). The linear model of equation (1) then becomes:
pred_C[x,y] = (α′ · Rec′_L[x,y] >> nα) + β′    (11)
where β′ is the rounded value of the floating-point β, and α′ can be calculated as follows.
Instead of the division operation of equation (12), a table look-up and a multiplication are proposed. The value a2 is first scaled down to reduce the table size, and a1 is scaled down as well to avoid product overflow. In a2, only the most significant bits, defined by a precision parameter, are kept, and the other bits are set to zero; the approximation a2′ is obtained by rounding a2 to that precision, where the number of retained significant bits depends on bdepth(a2), the bit depth of the value a2. The same scaling procedure is performed for a1. Considering the quantized representations of a1 and a2, equation (12) can then be rewritten using a look-up table, indexed by the retained significant bits of a2, to avoid the division.
In the simulation, the constant parameters are set as follows:
- nα is equal to 13, which is a trade-off between data accuracy and computational cost;
- the number of retained significant bits of a2 is equal to 6, so that the look-up table size is 64; the table size can be further reduced to 32 by amplifying a2 when bdepth(a2) < 6 (e.g., a2 < 32);
- nTable is equal to 15, resulting in a 16-bit data representation of the table elements;
- the scaled-down precision of a1 is set to 15 bits to avoid product overflow and to maintain 16-bit multiplication.
Finally, α′ is clipped to [−2^15, 2^15 − 1] to preserve the 16-bit multiplication in equation (11). With this clipping, when nα is equal to 13, the actual α value is limited to [−4, 4), which helps to prevent error amplification.
With the calculated parameter α′, the parameter β′ is calculated from the neighboring samples in the same way as equation (9), with α′ in place of α; the division in that computation can be simply replaced by a shift, since the value I is a power of 2.
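A small sketch of the fixed-point mapping of equation (11) and the clipping just described (Python, with illustrative names):

N_ALPHA = 13  # fractional precision of alpha', as set above

def quantize_alpha(alpha_float):
    # alpha' = alpha * (1 << n_alpha), rounded, then clipped to
    # [-2^15, 2^15 - 1] to keep the 16-bit multiplication; with
    # n_alpha = 13 the effective alpha range is [-4, 4).
    a = int(round(alpha_float * (1 << N_ALPHA)))
    return max(-(1 << 15), min((1 << 15) - 1, a))

def predict_chroma_int(rec_l_ds, alpha_q, beta_q):
    # Equation (11): pred_C[x,y] = (alpha' * Rec_L'[x,y] >> n_alpha) + beta'
    return (alpha_q * rec_l_ds >> N_ALPHA) + beta_q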
Similar to the discussion above regarding equation (1), in HM6.0, an intra prediction mode referred to as LM is applied to predict a chroma PU based on a linear model using the reconstruction of the co-located luma PU. The parameters of the linear model consist of a slope (a >> k) and a y-intercept (b), which are derived from the neighboring luma and chroma pixels using a least-mean-squares solution. The values of the prediction samples predSamples[x, y], with x, y = 0..nS−1 (where nS specifies the block size of the current chroma PU), are derived as follows:
predSamples[x,y] = Clip1C( ((pY′[x,y] * a) >> k) + b ), with x, y = 0..nS−1    (17)
where pY′[x,y] is the reconstructed pixel from the corresponding luma component. When the coordinates x and y are equal to or greater than 0, pY′ is a reconstructed pixel of the co-located luma PU. When x or y is less than 0, pY′ is a reconstructed neighboring pixel of the co-located luma PU.
Some intermediate variables L, C, LL, LC, k2 and k3 in the derivation are derived as:
k2=Log2((2*nS)>>k3) (18-5)
k3=Max(0,BitDepthC+Log2(nS)-14) (18-6)
Thus, the variables a, b, and k can be derived as:

a1 = (LC << k2) − L*C    (19-1)
a2 = (LL << k2) − L*L    (19-2)
k1 = Max(0, Log2(abs(a2)) − 5) − Max(0, Log2(abs(a1)) − 14) + 2    (19-3)
a1s = a1 >> Max(0, Log2(abs(a1)) − 14)    (19-4)
a2s = abs(a2 >> Max(0, Log2(abs(a2)) − 5))    (19-5)
a3 = a2s < 1 ? 0 : Clip3(−2^15, 2^15 − 1, a1s*lmDiv + (1 << (k1−1)) >> k1)    (19-6)
a = a3 >> Max(0, Log2(abs(a3)) − 6)    (19-7)
k = 13 − Max(0, Log2(abs(a)) − 6)    (19-8)
b = (L − ((a*C) >> k1) + (1 << (k2−1))) >> k2    (19-9)
wherein lmDiv is specified in a 63-entry lookup table (i.e., table 3) that is generated online by:
lmDiv(a2s)=((1<<15)+a2s/2)/a2s. (20)
Table 3. Specification of lmDiv

a2s:    1     2     3    4    5    6    7    8    9   10   11   12   13
lmDiv: 32768 16384 10923 8192 6554 5461 4681 4096 3641 3277 2979 2731 2521

a2s:   14   15   16   17   18   19   20   21   22   23   24   25   26
lmDiv: 2341 2185 2048 1928 1820 1725 1638 1560 1489 1425 1365 1311 1260

a2s:   27   28   29   30   31   32   33  34  35  36  37  38  39
lmDiv: 1214 1170 1130 1092 1057 1024 993 964 936 910 886 862 840

a2s:   40  41  42  43  44  45  46  47  48  49  50  51  52
lmDiv: 819 799 780 762 745 728 712 697 683 669 655 643 630

a2s:   53  54  55  56  57  58  59  60  61  62  63  64
lmDiv: 618 607 596 585 575 565 555 546 537 529 520 512
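Table 3 can be regenerated on the fly from equation (20); a quick sketch:

def lm_div(a2s):
    # Equation (20): lmDiv(a2s) = ((1 << 15) + a2s/2) / a2s,
    # in integer arithmetic.
    return ((1 << 15) + a2s // 2) // a2s

LM_DIV_TABLE = [lm_div(a2s) for a2s in range(1, 65)]
assert LM_DIV_TABLE[0] == 32768   # a2s = 1
assert LM_DIV_TABLE[-1] == 512    # a2s = 64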
In equation (19-6), a1s is a 16-bit signed integer, and lmDiv is a 16-bit unsigned integer. Thus, a 16-bit multiplier and 16-bit storage are required. It is proposed to reduce the bit depth of the multiplier to an internal bit depth and to reduce the size of the look-up table, as described in more detail below.
The bit depth of a1s is reduced to the internal bit depth by changing equation (19-4) to the following equation:
a1s=a1>>Max(0,Log2(abs(a1))–(BitDepthC–2)). (21)
the value of lmDiv with the internal bit depth is implemented using the following equation (22) and stored in a look-up table:
lmDiv(a2s)=((1<<(BitDepthC-1))+a2s/2)/a2s. (22)
Table 4 shows an example of the internal bit depth 10.
TABLE 4 Specification of lmDiv with internal bit depth equal to 10
a2s:    1   2   3   4   5  6  7  8  9 10 11 12 13 14 15 16
lmDiv: 512 256 171 128 102 85 73 64 57 51 47 43 39 37 34 32

a2s:   17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
lmDiv: 30 28 27 26 24 23 22 21 20 20 19 18 18 17 17 16

a2s:   33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
lmDiv: 16 15 15 14 14 13 13 13 12 12 12 12 11 11 11 11

a2s:   49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
lmDiv: 10 10 10 10 10  9  9  9  9  9  9  9  8  8  8
Equations (19-3) and (19-8) are also modified as follows:

k1 = Max(0, Log2(abs(a2)) − 5) − Max(0, Log2(abs(a1)) − (BitDepthC − 2)), and    (23-1)
k = BitDepthC − 1 − Max(0, Log2(abs(a)) − 6)    (23-2)
It is also proposed to reduce the number of entries from 63 to 32 and the bits per entry from 16 to 10, as shown in Table 5. By doing so, a memory saving of approximately 70% may be achieved. The corresponding changes to equations (19-6), (20) and (19-8) are as follows:

a3 = a2s < 32 ? 0 : Clip3(−2^15, 2^15 − 1, a1s*lmDiv + (1 << (k1−1)) >> k1)    (24-1)
lmDiv(a2s) = ((1 << (BitDepthC + 4)) + a2s/2) / a2s    (24-2)
k = BitDepthC + 4 − Max(0, Log2(abs(a)) − 6)    (24-3)
TABLE 5 Specification of lmDiv with internal bit depth equal to 10
a2s:   32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47
lmDiv: 512 496 482 468 455 443 431 420 410 400 390 381 372 364 356 349

a2s:   48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63
lmDiv: 341 334 328 321 315 309 303 298 293 287 282 278 273 269 264 260
Multi-model linear model prediction
In ECM-1.0, a multi-model LM (MMLM) prediction mode is proposed, for which the chroma samples are predicted based on the reconstructed luma samples of the same CU by using two linear models:

pred_C(i,j) = α1 · rec_L′(i,j) + β1, if rec_L′(i,j) ≤ Threshold
pred_C(i,j) = α2 · rec_L′(i,j) + β2, if rec_L′(i,j) > Threshold    (25)

where pred_C(i,j) represents the predicted chroma samples in the CU and rec_L′(i,j) represents the downsampled reconstructed luma samples of the same CU. Threshold is calculated as the average value of the neighboring reconstructed luma samples. Fig. 6 shows an example of classifying the neighboring samples into two groups based on the value Threshold. For each group, the parameters αi and βi (with i equal to 1 and 2, respectively) are derived from the linear relationship between the luma and chroma values of two samples, namely the minimum luma sample A (X_A, Y_A) and the maximum luma sample B (X_B, Y_B) inside the group. Here X_A, Y_A are the x-coordinate (i.e., luma value) and y-coordinate (i.e., chroma value) of sample A, and X_B, Y_B are the x-coordinate and y-coordinate of sample B. The linear model parameters α and β are obtained according to the following equations:
α = (Y_B − Y_A) / (X_B − X_A)
β = Y_A − α · X_A    (26)

Such a method is also called the min-max method. The division in the above equations can be avoided and replaced by a multiplication and a shift.
For a coded block having a square shape, the above two equations are directly applied. For non-square coded blocks, neighboring samples of longer boundaries are first sub-sampled to have the same number of samples as the shorter boundaries.
In addition to the scenario in which the upper template and the left template are used together to calculate the linear model coefficients, these two templates can alternatively be used individually in the other two MMLM modes (called MMLM_A mode and MMLM_L mode).
In MMLM_A mode, only the pixel samples in the upper template are used to calculate the linear model coefficients. To get more samples, the upper template is extended to a size of (W+W). In MMLM_L mode, only the pixel samples in the left template are used to calculate the linear model coefficients. To get more samples, the left template is extended to a size of (H+H).
Note that when the upper reference line is located at the CTU boundary, only one luma line (stored in the line buffer for intra prediction) is used to produce the downsampled luma samples.
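A compact sketch of the two-model prediction of equations (25)-(26) (illustrative Python; the threshold and per-group parameters are assumed to be derived from the neighboring samples as described above):

def mmlm_threshold(neighbor_luma):
    # Threshold is the average of the neighboring reconstructed luma samples.
    return sum(neighbor_luma) // len(neighbor_luma)

def min_max_params(group):
    # Equation (26): derive (alpha, beta) from the minimum- and maximum-luma
    # sample pairs (X_A, Y_A) and (X_B, Y_B) inside one group of
    # (luma, chroma) pairs.
    (xa, ya), (xb, yb) = min(group), max(group)
    alpha = (yb - ya) / (xb - xa) if xb != xa else 0.0
    return alpha, ya - alpha * xa

def mmlm_predict(rec_l_ds, threshold, model1, model2):
    # Equation (25): pick model 1 or model 2 per sample by the luma threshold.
    alpha, beta = model1 if rec_l_ds <= threshold else model2
    return alpha * rec_l_ds + beta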
For chroma intra mode coding, a total of 11 intra modes are allowed. These modes include five traditional intra modes and six cross-component linear model modes (CCLM, LM_A, LM_L, MMLM, MMLM_A, and MMLM_L). The chroma mode signaling and derivation procedure is shown in Table 6. Chroma mode coding depends directly on the intra prediction mode of the corresponding luma block. Since separate block partition structures for the luma and chroma components are enabled in I slices, one chroma block may correspond to multiple luma blocks. Therefore, for the chroma DM mode, the intra prediction mode of the corresponding luma block covering the center position of the current chroma block is directly inherited.
TABLE 6 deriving chroma prediction modes from luma modes when MMLM is enabled
The MMLM mode and the LM mode can also be used together in an adaptive manner. For MMLM, the two linear models are as follows:

pred_C(i,j) = α1 · rec_L′(i,j) + β1, if rec_L′(i,j) ≤ Threshold
pred_C(i,j) = α2 · rec_L′(i,j) + β2, if rec_L′(i,j) > Threshold    (27)

where pred_C(i,j) represents the predicted chroma samples in the CU and rec_L′(i,j) represents the downsampled reconstructed luma samples of the same CU. Threshold can be simply determined based on the average of the luma and chroma values, together with their minimum and maximum values. Fig. 7 shows an example of classifying the neighboring samples into two groups based on the inflection point T, indicated by an arrow. The linear model parameters α1 and β1 are derived from the linear relationship between the luma and chroma values of two samples, namely the minimum luma sample A (X_A, Y_A) and the threshold sample T (X_T, Y_T). The linear model parameters α2 and β2 are derived from the linear relationship between the luma and chroma values of two samples, namely the maximum luma sample B (X_B, Y_B) and the threshold sample T (X_T, Y_T). Here X_A, Y_A are the x-coordinate (i.e., luma value) and y-coordinate (i.e., chroma value) of sample A, and X_B, Y_B are the x-coordinate and y-coordinate of sample B. The linear model parameters αi and βi for each group, with i equal to 1 and 2, respectively, are obtained according to the following equations:
α1 = (Y_T − Y_A) / (X_T − X_A),  β1 = Y_A − α1 · X_A
α2 = (Y_B − Y_T) / (X_B − X_T),  β2 = Y_T − α2 · X_T    (28)
For a coded block having a square shape, the above equation is directly applied. For non-square coded blocks, neighboring samples of longer boundaries are first sub-sampled to have the same number of samples as the shorter boundaries.
In addition to the scenario in which the upper template and the left template are used together to determine the linear model coefficients, these two templates can alternatively be used individually in the other two MMLM modes (denoted as MMLM_A mode and MMLM_L mode, respectively).
In MMLM_A mode, only the pixel samples in the upper template are used to calculate the linear model coefficients. To get more samples, the upper template is extended to a size of (W+W). In MMLM_L mode, only the pixel samples in the left template are used to calculate the linear model coefficients. To get more samples, the left template is extended to a size of (H+H).
Note that when the upper reference line is located at the CTU boundary, only one luma line (stored in the line buffer for intra prediction) is used to produce the downsampled luma samples.
For chroma intra mode coding, a condition check is used to select either the LM modes (CCLM, LM_A, and LM_L) or the multi-model LM modes (MMLM, MMLM_A, and MMLM_L). The condition check involves the current block size and a threshold, where BlkSizeThresLM denotes the minimum block size of the LM modes and BlkSizeThresMM denotes the minimum block size of the MMLM modes. The symbol d represents a predetermined threshold; in one example, d may take the value 0, and in another example, d may take the value 8.
For chroma intra mode coding, a total of 8 intra modes are allowed. These modes include five traditional intra modes and three cross-component linear model modes. The chroma mode signaling and derivation procedure is shown in Table 1. Notably, for a given CU coded in a linear model mode, whether it is the normal single-model LM mode or the MMLM mode is determined by the condition check above. Unlike the case shown in Table 6, there is no separate MMLM mode to be signaled. Chroma mode coding depends directly on the intra prediction mode of the corresponding luma block. Since separate block partition structures for the luma and chroma components are enabled in I slices, one chroma block may correspond to multiple luma blocks. Therefore, for the chroma DM mode, the intra prediction mode of the corresponding luma block covering the center position of the current chroma block is directly inherited.
During ECM development, a scaling (slope) adjustment for CCLM was proposed as a further improvement, for example as described in JVET-Y0055/Z0049.
As discussed above, CCLM uses a model with 2 parameters to map luma values to chroma values. The scaling parameter "a" and the offset parameter "b" define the mapping as follows:
chromaVal = a * lumaVal + b (30)
It is proposed to signal an adjustment "u" to the scaling parameter, updating the model to the form:
chromaVal = a’ * lumaVal + b’ (31)
where a′ = a + u, and b′ = b − u · y_r.
With this selection, the mapping function is tilted or rotated around the point with luma value y_r. It is proposed to use the average of the reference luma samples used in the model creation as y_r, in order to provide a meaningful modification to the model. Fig. 8A and 8B show the effect of the scaling adjustment parameter "u": Fig. 8A shows a model created without the "u" adjustment, and Fig. 8B shows a model created with it.
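A one-line sketch of the update (the numeric values in the usage line are made up for illustration):

def apply_slope_adjustment(a, b, u, y_r):
    # a' = a + u, b' = b - u * y_r, so that chromaVal = a' * lumaVal + b'
    # rotates the original mapping around the luma value y_r.
    return a + u, b - u * y_r

# Usage: y_r is the average of the reference luma samples used in model
# creation; the signaled integer adjustment is scaled by its 1/8 unit.
a_adj, b_adj = apply_slope_adjustment(a=0.5, b=12.0, u=2 / 8.0, y_r=500.0)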
In one example, the adjustment "u" is an integer between −4 and 4 (inclusive) and is signaled in the bitstream. The unit of the adjustment is 1/8 of a chroma sample value per luma sample value (for 10-bit content).
In one example, the adjustment may be used for the CCLM models that use reference samples both above and to the left of the block ("LM_CHROMA_IDX" and "MMLM_CHROMA_IDX"), but not for the "single-sided" modes. This choice is based on the trade-off between coding efficiency and complexity.
When the scaling adjustment is applied to a multi-model CCLM model, both models may be adjusted, and thus up to two scaling updates are signaled for a single chroma block.
To enable the scaling adjustment at the encoder, the encoder may perform an SATD-based search for the best value of the scaling update for Cr, and a similar SATD-based search for Cb. If either result is a non-zero scaling adjustment, the combined scaling adjustment pair (SATD-based update for Cr, SATD-based update for Cb) is included in the list of RD checks for the TU.
Fusion of chroma intra prediction modes
JVET-Y0092/Z0051 proposed the fusion of chroma intra modes during ECM development.
The intra prediction modes enabled for the chroma components in ECM-4.0 are six cross-component linear model (LM) modes (CCLM_LT, CCLM_L, CCLM_T, MMLM_LT, MMLM_L, and MMLM_T), the direct mode (DM), and four default chroma intra prediction modes. The four default modes are given by the list {0, 50, 18, 1}, and if the DM mode already belongs to the list, the mode in the list is replaced with mode 66.
A decoder-side intra mode derivation (DIMD) method for luma intra prediction is included in ECM-4.0. First, a horizontal gradient and a vertical gradient are calculated for each reconstructed luma sample of the L-shaped template of the second neighboring row and column of the current block to construct a gradient histogram (HoG). Then, the two intra prediction modes having the largest histogram magnitude value and the second largest histogram magnitude value are mixed with the plane mode to generate a final predictor of the current luminance block.
In order to improve the coding efficiency of chroma intra prediction, two methods are proposed: a decoder-side derived chroma intra prediction mode (DIMD chroma) and a fusion of a non-LM mode with the MMLM_LT mode.
In the first embodiment, a DIMD chroma mode is proposed. The proposed DIMD chroma mode uses the DIMD derivation method to derive the chroma intra prediction mode of the current block based on the co-located reconstructed luma samples. Specifically, a horizontal gradient and a vertical gradient are calculated for each co-located reconstructed luma sample of the current chroma block to build a HoG, as shown in Fig. 8C. Then the intra prediction mode with the largest histogram amplitude value is used for performing chroma intra prediction of the current chroma block.
When the intra prediction mode derived from the DIMD chroma mode is the same as the intra prediction mode derived from the DM mode, the intra prediction mode with the second largest histogram amplitude value is used as the DIMD chroma mode.
As shown in table 7, a CU level flag is signaled to indicate whether the proposed DIMD chroma mode is applied.
Table 7. Binarization process of intra_chroma_pred_mode in the proposed method

intra_chroma_pred_mode    Binary bit string    Chroma intra mode
0                         1100                 List[0]
1                         1101                 List[1]
2                         1110                 List[2]
3                         1111                 List[3]
4                         10                   DIMD chroma
5                         0                    DM
In a second embodiment, a fusion of chroma intra prediction modes is proposed, in which the DM mode and the four default modes can be fused with the MMLM_LT mode as follows:
pred=(w0*pred0+w1*pred1+(1<<(shift-1)))>>shift
where pred0 is the predictor obtained by applying the non-LM mode, pred1 is the predictor obtained by applying the MMLM_LT mode, and pred is the final predictor of the current chroma block. The two weights w0 and w1 are determined by the intra prediction modes of the neighboring chroma blocks, and shift is set equal to 2. Specifically, {w0, w1} = {1, 3} when both the above and left neighboring blocks are coded with LM modes, {w0, w1} = {3, 1} when both the above and left neighboring blocks are coded with non-LM modes, and {w0, w1} = {2, 2} otherwise.
For the syntax design, if a non-LM mode is selected, a flag is signaled to indicate whether the fusion is applied. The proposed fusion applies only to I slices.
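The weighting rule of the second embodiment can be sketched as follows (illustrative Python; the two predictors are assumed to be available per sample):

def fuse_chroma_pred(pred0, pred1, above_is_lm, left_is_lm, shift=2):
    # pred0: non-LM predictor; pred1: MMLM_LT predictor. The weights
    # follow the neighbor-based rule described above.
    if above_is_lm and left_is_lm:
        w0, w1 = 1, 3
    elif not above_is_lm and not left_is_lm:
        w0, w1 = 3, 1
    else:
        w0, w1 = 2, 2
    return (w0 * pred0 + w1 * pred1 + (1 << (shift - 1))) >> shift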
In a third embodiment, the DIMD chroma mode is combined with the fusion of chroma intra prediction modes. Specifically, the DIMD chroma mode described in the first embodiment is applied, and, for I slices, the DM mode, the four default modes, and the DIMD chroma mode can be fused with the MMLM_LT mode using the weights described in the second embodiment, while for non-I slices, only the DIMD chroma mode can be fused with the MMLM_LT mode using equal weights.
In a fourth embodiment, the DIMD chroma mode with reduced processing is combined with the fusion of chroma intra prediction modes. Specifically, the DIMD chroma mode with reduced processing derives the intra mode based on the neighboring reconstructed Y, Cb and Cr samples in the second neighboring row and column, as shown in Fig. 8D. The other parts are the same as in the third embodiment.
In one embodiment, when DIMD is applied, two intra modes are derived from the reconstructed neighboring samples, and the two predictors are combined with the planar mode predictor using weights derived from the gradients, as described in JVET-O0449. The division operations in the weight derivation are performed using the same look-up table (LUT) based integerization scheme as used by CCLM. For example, the division in the orientation calculation
Orient = Gy / Gx

is calculated by the following LUT-based scheme:

x = Floor(Log2(Gx))
normDiff = ((Gx << 4) >> x) & 15
x += (3 + (normDiff != 0 ? 1 : 0))
Orient = (Gy * (DivSigTable[normDiff] | 8) + (1 << (x−1))) >> x

where
DivSigTable[16] = { 0, 7, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 1, 1, 0 }.
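A runnable rendering of this LUT-based division (Python; assumes Gx > 0 and implements Floor(Log2(Gx)) via bit_length):

DIV_SIG_TABLE = [0, 7, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 1, 1, 0]

def lut_orientation(gy, gx):
    # Approximate Orient = Gy / Gx with a table look-up and a multiply,
    # following the steps listed above. Assumes gx > 0.
    x = gx.bit_length() - 1                     # Floor(Log2(Gx))
    norm_diff = ((gx << 4) >> x) & 15
    x += 3 + (1 if norm_diff != 0 else 0)
    return (gy * (DIV_SIG_TABLE[norm_diff] | 8) + (1 << (x - 1))) >> x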
The derived intra modes are included in the primary list of intra Most Probable Modes (MPMs), so the DIMD process is performed before the MPM list is constructed. The primary derived intra mode of a DIMD block is stored with the block and is used for MPM list construction of the neighboring blocks.
Fig. 8E to 8H show the steps of decoder-side intra mode derivation, in which the intra prediction direction is estimated without intra mode signaling. The first step, shown in Fig. 8E, estimates the gradient of each sample (for the light gray samples shown in Fig. 8E). The second step, shown in Fig. 8F, maps the gradient values to the nearest prediction direction within [2, 66]. The third step, shown in Fig. 8G, selects 2 prediction directions: for each prediction direction, all absolute gradients Gx and Gy of the neighboring pixels with that direction are summed, and the top 2 directions are selected. The fourth step, shown in Fig. 8H, enables weighted intra prediction with the selected directions.
Multiple Reference Line (MRL) intra prediction uses more reference lines for intra prediction. In Fig. 9, an example of 4 reference lines is depicted, where the samples of segments A and F are not fetched from reconstructed neighboring samples but are padded with the closest samples from segments B and E, respectively. HEVC intra-picture prediction uses the nearest reference line (i.e., reference line 0). In MRL, 2 additional lines (reference line 1 and reference line 3) are used.
The index of the selected reference line (mrl_idx) is signaled and used to generate the intra predictor. For a reference line index greater than 0, only the additional reference line modes are included in the MPM list, and only the MPM index is signaled, without the remaining modes. The reference line index is signaled before the intra prediction modes, and the planar mode is excluded from the intra prediction modes in case a non-zero reference line index is signaled.
For blocks in the first line inside a CTU, MRL is disabled to prevent using extended reference samples outside the current CTU line. In addition, PDPC is disabled when an additional line is used. For MRL mode, the derivation of the DC value in DC intra prediction mode for a non-zero reference line index is aligned with that for reference line index 0. MRL requires storing 3 neighboring luma reference lines with a CTU to generate predictions. The cross-component linear model (CCLM) tool also requires 3 neighboring luma reference lines for its downsampling filters. The definition of MRL to use the same 3 lines is aligned with CCLM to reduce the storage requirements for decoders.
During ECM development, a convolutional cross-component model (CCCM) for chroma intra modes was proposed.
It is proposed to apply a convolutional cross-component model (CCCM) to predict chroma samples from reconstructed luma samples, in a spirit similar to the current CCLM modes. As with CCLM, when chroma subsampling is used, the reconstructed luma samples are downsampled to match the lower-resolution chroma grid.
Furthermore, similar to CCLM, there is an option to use a single-model or multi-model variant of CCCM. The multi-model variant uses two models, one derived for samples above the average luma reference value and the other for the remaining samples (following the spirit of the CCLM design). The multi-model CCCM mode can be selected for PUs that have at least 128 available reference samples.
The proposed convolutional 7-tap filter consists of a 5-tap plus-sign-shaped spatial component, a nonlinear term, and a bias term. The input to the spatial 5-tap component of the filter consists of the center (C) luma sample, which is co-located with the chroma sample to be predicted, and its above/north (N), below/south (S), left/west (W) and right/east (E) neighboring samples, as shown in Fig. 16.
The nonlinear term P is represented as the square of the center luma sample C, scaled to the sample value range of the content:
P=(C*C+midVal)>>bitDepth
that is, for 10-bit content, it is calculated as:
P=(C*C+512)>>10
The bias term B represents a scalar offset between the input and the output (similar to the offset term in CCLM) and is set to the middle chroma value (512 for 10-bit content).
The output of the filter is calculated as the convolution between the filter coefficients ci and the input values, and is clipped to the range of valid chroma samples:
predChromaVal = c0·C + c1·N + c2·S + c3·E + c4·W + c5·P + c6·B
The filter coefficients ci are calculated by minimizing the MSE between the predicted and reconstructed chroma samples in the reference region. Fig. 17 shows the reference region, which consists of 6 lines of chroma samples above and to the left of the PU. The reference region extends one PU width to the right and one PU height below the PU boundaries, and is adjusted to include only available samples. An extension of the region (shown in blue in Fig. 17) is needed to support the "side samples" of the plus-sign-shaped spatial filter, and it is padded when in unavailable areas.
The MSE minimization is performed by computing an autocorrelation matrix for the luma inputs and a cross-correlation vector between the luma inputs and the chroma output. LDL decomposition is performed on the autocorrelation matrix, and back-substitution is used to calculate the final filter coefficients. This process generally follows the calculation of the ALF filter coefficients in ECM; however, LDL decomposition is chosen instead of Cholesky decomposition to avoid the use of square root operations. The proposed method uses only integer arithmetic.
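A condensed sketch of the CCCM predictor (Python/numpy; the generic least-squares solver stands in for the autocorrelation + LDL decomposition used in ECM, and the array/function names are ours):

import numpy as np

def cccm_inputs(luma, x, y, bit_depth=10):
    # The 7 filter inputs at one chroma position: center C, its N/S/E/W
    # neighbors, the nonlinear term P, and the bias term B.
    c = luma[y][x]
    p = (c * c + (1 << (bit_depth - 1))) >> bit_depth
    b = 1 << (bit_depth - 1)
    return [c, luma[y - 1][x], luma[y + 1][x], luma[y][x + 1],
            luma[y][x - 1], p, b]

def cccm_solve(ref_inputs, ref_chroma):
    # Fit the coefficients c_i over the reference region by minimizing the
    # MSE between predicted and reconstructed chroma samples.
    A = np.asarray(ref_inputs, dtype=np.float64)
    c = np.asarray(ref_chroma, dtype=np.float64)
    coeffs, *_ = np.linalg.lstsq(A, c, rcond=None)
    return coeffs

def cccm_predict(coeffs, inputs, bit_depth=10):
    # predChromaVal = c0*C + c1*N + c2*S + c3*E + c4*W + c5*P + c6*B,
    # clipped to the valid chroma sample range.
    val = float(np.dot(coeffs, inputs))
    return int(min(max(val, 0.0), (1 << bit_depth) - 1))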
The use of the mode is signaled with a CABAC-coded PU-level flag, and a new CABAC context is included to support this. In terms of signaling, CCCM is considered a sub-mode of CCLM. That is, the CCCM flag is signaled only when the intra prediction mode is LM_CHROMA_IDX (to enable the single-model CCCM) or MMLM_CHROMA_IDX (to enable the multi-model CCCM).
The encoder performs two new RD checks in the chroma prediction mode loop, one for checking the single model CCCM mode and one for checking the multi-model CCCM mode.
In existing CCLM or MMLM designs, the neighboring reconstructed luma-chroma sample pairs are classified into one or more sample groups based on the value Threshold, which considers only the luma DC values. That is, the luma-chroma sample pairs are classified by considering only the intensity of the luma samples. However, the luma component typically retains rich texture, and the current luma sample may be highly correlated with its neighboring luma samples. Such inter-sample correlation (AC correlation) may benefit the classification of luma-chroma sample pairs and may bring additional coding efficiency.
Edge classification linear model (ELM)
In order to improve the coding efficiency of the luma and chroma components, classifiers that consider luma edge or AC information are introduced, in contrast to the above implementation in which only the luma DC values are considered. Besides the existing band-based MMLM classification, the present disclosure provides exemplary classifiers. The process of generating the linear prediction models for the different sample groups may be similar to that of CCLM or MMLM (e.g., via the least-squares method or the simplified min-max method, etc.), but with different classification metrics. Different classifiers may be used to classify the neighboring luma samples (e.g., of the neighboring luma-chroma sample pairs) and/or the luma samples corresponding to the chroma samples to be predicted. The luma samples corresponding to the chroma samples may be obtained by a downsampling operation to match the positions of the corresponding chroma samples for a 4:2:0 video sequence. For example, a luma sample corresponding to a chroma sample may be obtained by performing a downsampling operation on more than one (e.g., 4) reconstructed luma samples corresponding to the chroma sample (e.g., located around the chroma sample). Alternatively, the luma samples may be obtained directly from the reconstructed luma samples, for example in the case of a 4:4:4 video sequence. Alternatively, the luma samples may be obtained from respective ones of the reconstructed luma samples that are located at the respective co-located positions of the corresponding chroma samples. For example, a luma sample to be classified may be obtained from the one of the four reconstructed luma samples corresponding to a chroma sample that is located at the top-left position, which may be regarded as the co-located position of the chroma sample.
The first classifier may classify luma samples according to their edge strength. For example, one direction (e.g., 0 degrees, 45 degrees, or 90 degrees, etc.) may be selected to calculate the edge strength. A direction is formed by the current sample and a neighboring sample along that direction (e.g., the neighboring sample located at 45 degrees to the upper right of the current sample). The edge strength may be calculated by subtracting the neighboring sample from the current sample. The edge strength may be quantized into one of M segments by M−1 thresholds, and the first classifier may use M classes to classify the current sample. Alternatively or additionally, N directions may be formed by the current sample and N neighboring samples along those directions. N edge strengths may be calculated by subtracting the N neighboring samples from the current sample, respectively. Similarly, if each of the N edge strengths can be quantized into one of M segments by M−1 thresholds, the first classifier can use M^N classes to classify the current sample.
The second classifier may be used to classify samples according to their local pattern. For example, the current luma sample Y0 may be compared with its N neighboring luma samples Yi. If the value of Y0 is greater than the value of Yi, the score is incremented by one; otherwise, the score is decremented by one. The score may be quantized to form K classes, and the second classifier classifies the current sample into one of the K classes. For example, the neighboring luma samples may be obtained from the four neighboring samples located above, left, right and below the current luma sample, i.e., without the diagonal neighbors.
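The two classifiers can be sketched as follows (illustrative Python; the thresholds, directions, and score quantization are free design parameters as described above):

def edge_strength_class(cur, dir_neighbors, thresholds):
    # First classifier: for each of the N chosen directions, the edge
    # strength is the current sample minus the neighbor along that
    # direction, quantized into M segments by M-1 thresholds, giving
    # M**N classes in total.
    m = len(thresholds) + 1
    cls = 0
    for nb in dir_neighbors:
        strength = cur - nb
        seg = sum(1 for t in thresholds if strength >= t)
        cls = cls * m + seg
    return cls

def local_pattern_class(cur, neighbors, k):
    # Second classifier: score +1/-1 against each neighbor, then quantize
    # the score (range [-N, N]) into one of k classes.
    score = sum(1 if cur > nb else -1 for nb in neighbors)
    n = len(neighbors)
    return min((score + n) * k // (2 * n + 1), k - 1)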
It is contemplated that multiple first classifiers, second classifiers, or different instances of either the first classifier or the second classifier or other classifiers described herein may be combined. For example, the first classifier may be combined with an existing MMLM-threshold-based classifier. For another example, instance a of the first classifier may be combined with another instance B of the first classifier, wherein instances a and B take different directions (e.g., vertical and horizontal directions, respectively).
Those skilled in the art will recognize that while the existing CCLM design in the VVC standard is used in the specification as the basic CCLM method, the proposed cross-component method described in this disclosure may also be applied to other predictive codec tools with similar design spirit. For example, for chromaticity (CfL) from luminance in the AV1 standard, the proposed method can also be applied by dividing luminance-chromaticity sample pairs into multiple sample groups.
Those skilled in the art will recognize that Y/Cb/Cr may also be denoted Y/U/V in the field of video encoding and decoding. For example, if the video data is in RGB format, the proposed method can also be applied by simply mapping YUV symbols to GBR.
Filter-based linear model (FLM)
As shown in fig. 10A, CCLM assumes that a given chroma sample is only related to one corresponding luma sample (L0.5, which may be at a fractional luma sample position), and uses Simple Linear Regression (SLR) with an ordinary least squares (OLS) estimate to predict the given chroma sample. However, as shown in fig. 10B, in some video content one chroma sample may be correlated (AC or DC correlated) with multiple luma samples at the same time, so a Multiple Linear Regression (MLR) model may further improve the prediction accuracy.
Thus, considering the possibility that one chroma sample may be correlated with a plurality of luma samples at the same time, a filter-based linear model (FLM) using an MLR model is introduced, as described below.
For a chroma sample to be predicted, the reconstructed co-located and neighboring luma samples may be used to predict the chroma sample, in order to capture the inter-sample correlation between the co-located luma sample, the neighboring luma samples, and the chroma sample. The reconstructed luma samples are linearly weighted and combined with an "offset" to generate the predicted chroma sample: C = α0·L0 + α1·L1 + ... + α(N-1)·L(N-1) + β, where C is the predicted chroma sample, Li is the i-th reconstructed co-located or neighboring luma sample, αi are the filter coefficients, β is the offset, and N is the number of filter taps. Note that the linear weighted sum plus the offset directly forms the predicted chroma sample (which may behave low-pass or high-pass, adapting to the video content), and the residual is then added to form the reconstructed chroma sample.
For a given CU, the top and left reconstructed luma and chroma samples may be used to derive or train the FLM parameters (αi, β). Like CCLM, αi and β can be derived via OLS. The top and left training samples are collected, and the pseudo-inverse matrix is calculated at both the encoder and decoder sides to derive the parameters, which are then used to predict the chroma samples in the given CU. Let N denote the number of filter taps applied to the luma samples, M denote the total number of top and left reconstructed luma-chroma sample pairs used for training, L_i^j denote the luma sample of the i-th sample pair at the j-th filter tap, and Ci denote the chroma sample of the i-th sample pair. With row i of the matrix A collecting the N luma taps L_i^j and the vector b collecting the chroma samples Ci, the pseudo-inverse is A+ = (A^T A)^(-1) A^T, and the parameters are obtained as x = A+ b. Fig. 11 shows an example where N is 6 (6 taps) and M is 8, with luma samples from the top 2 rows and left 3 columns and chroma samples from the top 1 row and left 1 column used to derive or train the parameters.
Note that without the offset β, the chroma samples may be predicted through the αi terms alone, which may be a subset of the proposed method. A sketch of the OLS derivation is given below.
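For illustration only, the OLS derivation may be sketched as below in C++, solving the normal equations (A^T A) x = A^T b by Gauss-Jordan elimination; floating-point arithmetic and the function name are assumptions for readability, and a practical codec would use the fixed-point designs described later. Row i of A holds the N luma taps of the i-th sample pair (with a trailing constant 1 when the offset β is included), b holds the chroma samples Ci, and the last entry of the returned vector is then β:

    #include <vector>

    // Solve x = (A^T A)^(-1) A^T b; returns an empty vector if the system is
    // singular, in which case a default predictor would be used instead.
    std::vector<double> deriveFlmParams(const std::vector<std::vector<double>>& A,
                                        const std::vector<double>& b) {
        const int M = (int)A.size(), N = (int)A[0].size();
        std::vector<std::vector<double>> aug(N, std::vector<double>(N + 1, 0.0));
        for (int i = 0; i < N; ++i) {        // build the normal equations
            for (int j = 0; j < N; ++j)
                for (int k = 0; k < M; ++k) aug[i][j] += A[k][i] * A[k][j];
            for (int k = 0; k < M; ++k) aug[i][N] += A[k][i] * b[k];
        }
        for (int p = 0; p < N; ++p) {        // Gauss-Jordan elimination
            const double piv = aug[p][p];
            if (piv == 0.0) return {};       // singular matrix
            for (int j = p; j <= N; ++j) aug[p][j] /= piv;
            for (int i = 0; i < N; ++i) {
                if (i == p) continue;
                const double f = aug[i][p];
                for (int j = p; j <= N; ++j) aug[i][j] -= f * aug[p][j];
            }
        }
        std::vector<double> x(N);
        for (int i = 0; i < N; ++i) x[i] = aug[i][N];
        return x;                            // alpha_0..alpha_{N-2}, then beta
    }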
The proposed ELM/FLM/GLM (discussed below) can be directly extended to the CfL design in the AV1 standard, which explicitly sends the model parameters (α, β). For example, α and/or β are derived at the encoder at SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level and signaled to the decoder for the CfL mode.
To further improve codec performance, additional designs may be used in FLM prediction. As shown in fig. 11 and discussed above, a 6-tap luma filter is used for FLM prediction. However, while a multi-tap filter may fit the training data well (e.g., the top and left adjacent reconstructed luma and chroma samples), in some cases the training data does not capture the full characteristics of the test data, which may result in overfitting and poor prediction of the test data (i.e., the chroma block samples to be predicted). Furthermore, different filter shapes may adapt well to different video block content, leading to more accurate prediction.
To solve this problem, the filter shape and the number of filter taps may be predefined or signaled or switched in a Sequence Parameter Set (SPS), an Adaptive Parameter Set (APS), a Picture Parameter Set (PPS), a Picture Header (PH), a Slice Header (SH), a region, a CTU, a CU, a sub-block or a sample level. A set of filter shape candidates may be predefined and the selection of the set of filter shape candidates may be signaled or switched in SPS, APS, PPS, PH, SH, region, CTU, CU, sub-block or sample level. Different components (e.g., U and V) may have different filter switching controls. For example, a set of filter shape candidates (e.g., indicated by indices 0 through 5) may be predefined, and filter shape (1, 2) may represent a 2-tap luma filter, filter shape (1, 2, 4) may represent a 3-tap luma filter, etc., as shown in fig. 11. The filter shape selection of the U and V components may be switched in PH or CU or CTU levels. Note that N taps may represent N taps with or without the offset β described herein. An example is given in table 8 below.
Table 8-exemplary signaling and switching for different filter shapes
Different chroma types and/or color formats may have different predefined filter shapes and/or taps. For example, as shown in FIG. 12, predefined filter shapes (1, 2,4, 5) may be used for 4:2:0 type-0, predefined filter shapes (0,1,2,4,7) may be used for 4:2:0 type-2, and predefined filter shapes (1, 4) may be used for 4:2:2, and predefined filter shapes (0, 1,2,3,4, 5) may be used for 4:4:4.
In another aspect of the present disclosure, unavailable luma and chroma samples used to derive the MLR model may be padded from the available reconstructed samples. For example, if the 6-tap (0, 1, 2, 3, 4, 5) filter shown in fig. 12 is used, then for a CU located at the left picture boundary, the left column including sample (0, 3) is not available (beyond the picture boundary), so sample (0, 3) is padded by repetition from sample (1, 4) in order to apply the 6-tap filter. Note that the padding process may be applied to both the training data (top and left adjacent reconstructed luma and chroma samples) and the test data (the luma and chroma samples in the CU).
As described above, the MLR model (linear equation) must be derived at both the encoder and the decoder. According to one or more aspects of the present disclosure, several methods are proposed to derive the pseudo-inverse matrix A+, or to directly solve the linear equation. Other known methods, such as Newton's method, the Cayley-Hamilton method and the eigendecomposition mentioned at https://en., may also be applied.
In the present disclosure, A+ may be denoted as A^(-1) for simplification. The linear equation may be solved by the following methods. 1. Closed form: an analytical solution for A^(-1) through the adjugate matrix (adjA), as follows:
The following shows the generic n x n form, a 2 x 2 case and a 3 x 3 case. If 3 x 3 is used for FLM, then 2 scaling parameters plus one offset need to be solved.
b = A·x, x = (A^T A)^(-1) A^T b = A+ b, denoted A^(-1) b
where each entry of the adjugate is a signed cofactor, i.e., the determinant of the (n-1) x (n-1) submatrix obtained by removing the j-th row and the i-th column.
2. Gauss-Jordan elimination
The linear equation can be solved using Gauss-Jordan elimination, through an augmented matrix [A | In] and a series of elementary row operations to obtain the reduced row echelon form [I | X]. The 2 x 2 and 3 x 3 examples are shown below.
3. Cholesky decomposition
To solve Ax = b, A may first be decomposed by the Cholesky-Crout algorithm to obtain lower and upper triangular matrices, and then forward substitution and back substitution are applied in sequence to obtain the solution. A 3 x 3 example is shown below.
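A 3 x 3 sketch in C++ of the decomposition and the two substitutions; the reg_sqr guard corresponds to the special handling described below, and the fixed size and function name are illustrative assumptions:

    #include <cmath>

    // Solve A x = b for a symmetric positive-definite A (e.g., A^T A from the
    // normal equations) via Cholesky-Crout: A = G G^T, then G y = b (forward
    // substitution) and G^T x = y (back substitution).
    bool choleskySolve3x3(const double A[3][3], const double b[3], double x[3],
                          double regSqr) {
        double G[3][3] = { { 0 } };
        for (int j = 0; j < 3; ++j) {
            double d = A[j][j];
            for (int k = 0; k < j; ++k) d -= G[j][k] * G[j][k];
            if (d < regSqr) return false;          // cannot decompose: default value
            G[j][j] = std::sqrt(d);
            for (int i = j + 1; i < 3; ++i) {
                double s = A[i][j];
                for (int k = 0; k < j; ++k) s -= G[i][k] * G[j][k];
                G[i][j] = s / G[j][j];
            }
        }
        double y[3];
        for (int i = 0; i < 3; ++i) {              // forward substitution
            double s = b[i];
            for (int k = 0; k < i; ++k) s -= G[i][k] * y[k];
            y[i] = s / G[i][i];
        }
        for (int i = 2; i >= 0; --i) {             // back substitution
            double s = y[i];
            for (int k = i + 1; k < 3; ++k) s -= G[k][i] * x[k];
            x[i] = s / G[i][i];
        }
        return true;
    }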
In addition to the above examples, some cases require special handling. For example, if some circumstance results in a linear equation that cannot be solved, a default value may be used to fill the chroma prediction values. The default value may be predefined or signaled or switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level; for example, it is predefined as 1 << (bitDepth - 1), meanC, meanL, or meanC - meanL (the mean of the current chroma, or of other available chroma or luma values, or of a subset of the FLM reconstructed neighboring region).
The following examples represent cases where the matrix A cannot be solved, in which a default predictor may be assigned to the entire current block:
1. Solving by closed form (analytical solution via the adjugate matrix), but A is singular (i.e., det A = 0);
2. Solving by Cholesky decomposition, but A cannot be Cholesky-decomposed, i.e., Gjj < reg_sqr, where reg_sqr is a small value that may be predefined or signaled or switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level.
Fig. 11 shows a typical case of deriving FLM parameters using top 2 and/or left 3 luminance lines and top 1 and/or left 1 chrominance lines. However, as described above, parameter derivation using different regions may bring codec benefits due to different block contents and reconstruction quality of different neighboring samples. Several methods of selecting an application area for parameter derivation are presented below:
1. Similar to MDLM, the FLM derivation may use only the top or only the left luma and/or chroma samples to derive the parameters. Whether FLM, FLM_L or FLM_T is used may be predefined or signaled or switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. Assuming that the current chroma block size is W x H, W' and H' are obtained as follows:
-when the FLM mode is applied, W' = W, H' = H;
-when the FLM_T mode is applied, W' = W + We, where We denotes the number of extended top luma/chroma samples;
-when the FLM_L mode is applied, H' = H + He, where He denotes the number of extended left luma/chroma samples.
The number of extended luma/chroma samples (We, He) may be predefined or signaled or switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level.
For example, the predefined (We, He) = (H, W) corresponds to VVC CCLM, and the predefined (We, He) = (W, H) corresponds to ECM CCLM. Unavailable (We, He) luma/chroma samples may be padded by repetition from the nearest (horizontal, vertical) luma/chroma samples.
Fig. 13 shows a graphical representation of FLM_L and FLM_T (e.g., with 4 taps). When FLM_L or FLM_T is applied, only H' or W' luma/chroma samples are used for parameter derivation, respectively.
2. Similar to MRL, different row indices may be predefined or signaled or switched in SPS/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample levels to indicate the selected luma-chroma sample pair rows. This may benefit from different reconstruction quality for different rows of samples.
Fig. 14 shows that, like MRL, the FLM may use different rows for parameter derivation (e.g., with 4 taps). For example, the FLM may use the light blue/yellow luma and/or chroma samples of index 1.
3. The CCLM region is extended and the full top N and/or left M rows are obtained for parameter derivation. Fig. 14 shows that all deep blue and light blue and yellow regions may be used simultaneously. Training with larger regions (data) may result in a more robust MLR model.
The corresponding syntax for FLM prediction may be defined as in Table 9 below, where FLC denotes a fixed-length code, TU denotes a truncated unary code, EGk denotes a k-th order exponential Golomb code, where k may be fixed or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level, SVLC denotes signed EG0, and UVLC denotes unsigned EG0.
Table 9-examples of FLM syntax
Note that the binarization of each syntax element can be changed.
Based on the existing linear model design, a new method for cross-component prediction is provided to further improve coding and decoding accuracy and efficiency. The main aspects of the proposed method are detailed below.
While the FLM discussed above provides the best flexibility (best performance), if the number of filter taps increases, it needs to solve for many unknown parameters. When the inverse matrix is greater than 3 x 3, the closed form derivation is unsuitable (too many multipliers) and an iterative approach like Cholesky is required, which increases the burden of the decoder processing loop. In this section, pre-operations prior to applying the linear model are presented, including using sample gradients to exploit the correlation between luminance AC information and chrominance intensity. By means of the gradient, the number of filter taps can be effectively reduced.
Note that the methods/examples in this section may be combined/reused from any of the designs discussed above, including but not limited to classification, filter shape, matrix derivation (with special handling), application area, syntax. Furthermore, the methods/examples listed in this section may also be applied to any of the designs discussed above to achieve better performance under certain complexity trade-offs.
Note that reference samples/training templates/reconstructed neighboring areas as used herein generally refer to luminance samples used to derive MLR model parameters that are then applied to internal luminance samples in one CU to predict chroma samples in the CU.
According to the proposed method, pre-operations (e.g., pre-linear weighting, sign, scale/absolute, thresholding, reLU) can be applied to reduce the dimension of the unknown parameters, instead of directly using the luminance sample intensity values as inputs to the linear model. In one example, the pre-operation may include calculating a sample difference based on the luminance sample value. As will be appreciated by those skilled in the art, the sample differences may be characterized as gradients, and thus this new approach is also referred to as a Gradient Linear Model (GLM) in certain embodiments.
Note that the following detailed description discusses scenarios in which the proposed pre-operations may be reused for/combined with the SLR model (also referred to as the 1-tap case) and with the MLR model (also referred to as the multi-tap case, e.g., 2 taps).
For example, instead of applying 2 taps on 2 luminance samples, pre-operations can be performed on the 2 luminance samples, and then a simpler 1 tap can be applied to reduce complexity. Fig. 15 shows some examples of 1-tap/2-tap (with offset) pre-operations, where the 2-tap coefficients are denoted (a, b). Note that each circle shown in fig. 15 represents an illustrative chroma position in the YUV 4:2:0 format. As discussed above, in the YUV 4:2:0 format, the luminance sample corresponding to a chroma sample may be obtained by performing a downsampling operation on more than one (e.g., 4) reconstructed luminance samples corresponding to the chroma sample (e.g., located around the chroma sample). In other words, the chroma position may correspond to one or more luminance samples including a co-located luminance sample. Different 1-tap modes are designed for different gradient directions and use different "interpolated" luminance samples (weighting different luminance positions) for the gradient calculation. For example, a typical filter [1,0,-1; 1,0,-1] is shown in fig. 15, which represents the following operation:
RecL"(i, j) = RecL(2i-1, 2j) - RecL(2i+1, 2j) + RecL(2i-1, 2j+1) - RecL(2i+1, 2j+1),
where RecL represents the reconstructed luminance sample value and RecL"(i, j) represents the pre-computed (filtered) luminance sample value. Note also that the 1-tap filters shown in fig. 15 can be understood as alternatives to the downsampling filter used in CCLM (see equations (6)-(7)), with the filter coefficients changed.
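As an illustrative sketch, this pre-operation may be written as below in C++, assuming the co-located luminance sample of chroma position (i, j) is at (2i, 2j) in row-major storage; boundary padding is omitted, and the function name is an assumption:

    // Horizontal-gradient pre-operation: applies the 1-tap GLM shape
    // [1,0,-1; 1,0,-1] to the 2x3 luma neighborhood of chroma position (i, j),
    // replacing the CCLM [1,2,1; 1,2,1]/8 downsampling filter.
    int glmHorGradient(const int* recL, int lumaStride, int i, int j) {
        const int x = 2 * i, y = 2 * j;  // co-located luma position
        const int* row0 = recL + y * lumaStride;
        const int* row1 = recL + (y + 1) * lumaStride;
        return (row0[x - 1] - row0[x + 1]) + (row1[x - 1] - row1[x + 1]);
    }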
The pre-operations may be based on gradients, edge direction (detection), pixel intensity, pixel variation, pixel variance, Roberts/Prewitt/compass/Sobel/Laplacian operators, high-pass filters (computing gradients or other related operators), low-pass filters (performing weighted-average operations), etc. The edge direction detectors listed in the examples may be extended to different edge directions. For example, 1-tap (1, -1) or 2-tap (a, b) coefficients are applied in different directions to detect different edge gradients. The filter shape/coefficients may be symmetric with respect to the chroma position, as in the example of fig. 15 (the 4:2:0 type-0 case).
The pre-operation parameters (coefficients, symbols, scale/absolute values, thresholding, reLU) may be fixed or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. Note that in an example, if multiple coefficients are applied to one sample (e.g., -1, 4), they may be combined (e.g., 3) to reduce the operation.
In one example, the pre-operation may involve calculating sample differences of the luminance sample values. Alternatively, the pre-operation may include performing downsampling by a weighted-average operation. In some cases, the pre-operations may be applied repeatedly. For example, a low-pass smoothing FIR filter [1,2,1]/4 or [1,2,1; 1,2,1]/8 (i.e., downsampling) may first be applied as a template filter to remove outliers, and then a 1-tap GLM filter is applied to calculate the sample differences from which the linear model is derived. It is also contemplated that the sample differences may be calculated first and the downsampling applied afterwards.
In one example, the pre-operation coefficients (the finally applied ones (e.g., 3), or the intermediately applied ones (e.g., -1, 4) for each luminance sample) may be limited to powers of 2 to save multipliers.
In one aspect of the present disclosure, the proposed new method may be reused for/combined with the CCLM discussed above, which utilizes a Simple Linear Regression (SLR) model and uses one corresponding luma sample value to predict a chroma sample value. This is also referred to as the 1-tap case. In this case, deriving the linear model further comprises deriving the scaling parameter α and the offset parameter β by using the pre-computed neighboring luma sample values and the neighboring chroma sample values. Alternatively, the linear model may be rewritten as:
C = α·L + β (35)
where L here denotes the "pre-operated" luminance sample. The parameter derivation of the 1-tap GLM may reuse the CCLM design but considers the directional gradient (possibly with a high-pass filter). In one example, the scaling parameter α may be derived by using a division look-up table (as described in detail below) to achieve simplification.
In one example, when combining the GLM with the SLR model, the scaling parameter α and the offset parameter β may be derived by utilizing the min-max method discussed above. Specifically, the scaling parameter α and the offset parameter β may be derived by comparing the pre-computed neighboring luminance sample values to determine a minimum luminance sample value YA and a maximum luminance sample value YB, determining the corresponding chrominance sample values XA and XB of the minimum luminance sample value YA and the maximum luminance sample value YB, respectively, and deriving the scaling parameter α and the offset parameter β based on YA, YB, XA and XB according to the following equations:
α = (XB - XA) / (YB - YA),
β = XA - α·YA. (36)
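A sketch of this min-max derivation in C++, using floating-point division for readability; the fixed-point design would replace the division with the LUT-based scheme discussed below, and the array layout and names are assumptions:

    // Derive (alpha, beta) from the extreme points of the pre-operated
    // (gradient) neighboring luma values and their corresponding chroma values.
    void deriveMinMax(const int* lumaGrad, const int* chroma, int count,
                      double& alpha, double& beta) {
        int iMin = 0, iMax = 0;
        for (int k = 1; k < count; ++k) {
            if (lumaGrad[k] < lumaGrad[iMin]) iMin = k;
            if (lumaGrad[k] > lumaGrad[iMax]) iMax = k;
        }
        const int yA = lumaGrad[iMin], yB = lumaGrad[iMax];  // min/max luma (gradient)
        const int xA = chroma[iMin],   xB = chroma[iMax];    // corresponding chroma
        alpha = (yB != yA) ? (double)(xB - xA) / (double)(yB - yA) : 0.0;
        beta  = xA - alpha * yA;                             // beta = XA - alpha*YA
    }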
In one example, the scaling adjustment discussed above may be reused when combining the GLM with the SLR model. In this case, the encoder may determine a scaling adjustment value (e.g., "u") to be signaled in the bitstream and add the scaling adjustment value to the derived scaling parameter α. The decoder may determine the scaling adjustment value (e.g., "u") from the bitstream and add the scaling adjustment value to the derived scaling parameter α. The adjusted value is ultimately used to predict the internal chroma sample values.
In one aspect of the present disclosure, the proposed new method can be reused for/combined with the FLM, which utilizes a Multiple Linear Regression (MLR) model and uses multiple luminance sample values to predict a chroma sample value. This is also referred to as the multi-tap case, e.g., 2 taps. In this case, the linear model can be rewritten as: C = Σi αi·Li + β, where each Li may be a pre-operated luminance value.
In this case, the plurality of scaling parameters α and offset parameters β may be derived by using the pre-computed neighboring luminance sample values and neighboring chrominance sample values. In one example, the offset parameter β is optional. In one example, at least one of the plurality of scaling parameters α may be derived by utilizing the sample point differences. Furthermore, another one of the plurality of scaling parameters α may be derived by using the downsampled luminance sample value. In one example, at least one of the plurality of scaling parameters α may be derived by utilizing a horizontal or vertical sample difference calculated on the basis of the downsampled neighboring luminance sample values. In other words, the linear model may combine multiple scaling parameters α associated with different pre-operations.
Implicit filter shape derivation
In one example, the used directional filter shape may be derived at the decoder to save bit overhead, rather than explicitly signaling the selected filter shape index. For example, at the decoder, a plurality of directional gradient filters may be applied to each reconstructed luma sample of the L-shaped templates of the ith adjacent row and column of the current block. The filtered values (gradients) may then be accumulated for each direction in the plurality of directional gradient filters, respectively. In an example, the accumulated value is an accumulated value of absolute values of the corresponding filtered values. After the accumulation, the direction of the directional gradient filter with the largest accumulated value may be determined as the derived (luminance) gradient direction. For example, a gradient histogram (HoG) may be constructed to determine the maximum. The derived direction may be further used as a direction for predicting chroma-samples in the current block.
The following example relates to reusing the decoder-side intra mode derivation (DIMD) method for luma intra prediction included in ECM-4.0:
Step 1, applying 2 directional gradient filters (3 x 3 horizontal/vertical Sobel) to each reconstructed luminance sample of the L-shaped template of the 2nd adjacent row and column of the current block;
Step 2, accumulating filtered values (gradients) for each of the directional gradient filters by SAD (sum of absolute differences);
Step 3, constructing a gradient histogram (HoG) based on the accumulated filtered values; and
Step 4, determining the maximum value in the HoG as the derived (luminance) gradient direction, based on which the GLM filter can be determined.
In one example, if the shape candidates are [-1,0,1; -1,0,1] (horizontal) and [1,2,1; -1,-2,-1] (vertical), then the shape [-1,0,1; -1,0,1] is used for GLM-based chroma prediction when the maximum value is associated with the horizontal shape. A sketch of the derivation is given below.
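An illustrative sketch of this decoder-side derivation for the two candidate directions, assuming the template sample coordinates are supplied by the caller and all 3 x 3 Sobel neighborhoods are available; the names are hypothetical:

    // Decoder-side GLM direction derivation: accumulate absolute horizontal/
    // vertical Sobel responses over the L-shaped template and pick the
    // direction with the larger sum (a 2-bin gradient histogram).
    int deriveGlmDirection(const int* recL, int stride,
                           const int* tplX, const int* tplY, int numTplSamples) {
        long long accHor = 0, accVer = 0;   // 2-bin HoG
        for (int n = 0; n < numTplSamples; ++n) {
            const int* p = recL + tplY[n] * stride + tplX[n];
            // 3x3 horizontal/vertical Sobel responses at the template sample.
            int gh = (p[-stride - 1] + 2 * p[-1] + p[stride - 1])
                   - (p[-stride + 1] + 2 * p[+1] + p[stride + 1]);
            int gv = (p[-stride - 1] + 2 * p[-stride] + p[-stride + 1])
                   - (p[stride - 1] + 2 * p[stride] + p[stride + 1]);
            accHor += (gh < 0) ? -gh : gh;  // SAD-style accumulation per direction
            accVer += (gv < 0) ? -gv : gv;
        }
        return (accHor >= accVer) ? 0 : 1;  // 0: horizontal GLM shape, 1: vertical
    }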
The shape of the gradient filter used to derive the gradient direction may be the same as or different from the shape of the GLM filter. For example, both filters may be the horizontal [-1,0,1; -1,0,1], or the two filters may have different shapes while the GLM filter is determined based on the gradient filter.
The proposed GLM may be combined with MMLM or ELM discussed above. When combined with classification, each group may share or have its own filter shape, with the syntax indicating the shape for each group. For example, as an exemplary classifier, a horizontal gradient grad_hor may be classified into a first group, which corresponds to a first linear model, and a vertical gradient grad_ver may be classified into a second group, which corresponds to a second linear model. In one example, the horizontal luminance pattern may be generated only once.
Additional possible classifiers are provided below. With a classifier, the neighboring and internal luma-chroma sample pairs of the current video block may be classified into multiple groups based on one or more thresholds. Note that, as discussed above, each neighboring/internal chroma sample and its corresponding luma sample may be referred to as a luma-chroma sample pair. The one or more thresholds are associated with the intensities of the neighboring/internal luma samples. In this case, each of the multiple groups corresponds to a respective one of multiple linear models.
When combined with the MMLM classifier, the following operations may be performed: classifying the neighboring reconstructed luma-chroma sample pairs of the current video block into 2 groups based on Threshold; deriving different linear models for the different groups, where the derivation process may be GLM-simplified, i.e., the number of taps is reduced by the pre-operation described above; classifying the luma-chroma sample pairs inside the CU (the internal luma-chroma sample pairs, where each internal luma-chroma sample pair includes an internal chroma sample value predicted with the derived linear model) into 2 groups based on Threshold; applying the different linear models to the reconstructed luma samples in the different groups; and predicting the chroma samples in the CU based on the linear models of the different classes, as sketched below. Here, Threshold may be the average of the neighboring reconstructed luma samples. Note that, by increasing the number of Thresholds, the number of classes (2) can be extended to multiple classes (e.g., equal partitioning based on the minimum/maximum values of the neighboring reconstructed (downsampled) luma samples, fixed or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level).
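An illustrative sketch of the two-group case, assuming Threshold is the mean of the neighboring (pre-operated) luma values and the two simplified models have already been derived; the structure and names are assumptions:

    struct LinearModel { double alpha, beta; };    // one simplified model per group

    // Threshold: average of the neighboring reconstructed (pre-operated) luma values.
    int computeThreshold(const int* nbrLuma, int count) {
        long long sum = 0;
        for (int k = 0; k < count; ++k) sum += nbrLuma[k];
        return (int)(sum / count);
    }

    // Classify an internal luma value into one of 2 groups, then predict chroma
    // with that group's model: C = alpha * L + beta.
    int predictChromaMmlmGlm(int lumaVal, int threshold, const LinearModel model[2]) {
        const LinearModel& m = model[(lumaVal <= threshold) ? 0 : 1];
        return (int)(m.alpha * lumaVal + m.beta);
    }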
In one example, instead of the MMLM luminance DC intensity, the filtered values of the FLM/GLM applied to the neighboring luminance samples are used for classification. For example, if the 1-tap (1, -1) GLM is applied, the average AC value (its physical meaning) is used. The processing may be: classifying the neighboring reconstructed luma-chroma sample pairs into K groups based on one or more filter shapes, one or more filtered values, and K-1 thresholds Ti; deriving different MLR models for the different groups, where the derivation process may be GLM-simplified, i.e., the number of taps is reduced by the pre-operation described above; similarly classifying the luma-chroma sample pairs inside the CU (the internal luma-chroma sample pairs, where each internal luma-chroma sample pair includes an internal chroma sample value predicted with the derived linear model) based on the one or more filter shapes, the one or more filtered values, and the K-1 thresholds Ti; applying the different linear models to the reconstructed luma samples in the different groups; and predicting the chroma samples in the CU based on the linear models of the different classes. The thresholds may be predefined (e.g., 0, or given by a table) or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. For example, Threshold may be the average AC value (filtered value) of the neighboring reconstructed (possibly downsampled) luminance samples (2 groups), or be based on min/max AC scores (K groups).
It has also been proposed to combine GLM with the ELM classifier. As shown in fig. 15, one filter shape (e.g., 1-tap) may be selected to calculate the edge strength. The direction is determined as the direction along which the sample difference between the current sample and the N neighboring samples (e.g., all 6 luminance samples) is calculated. For example, the filter at the upper middle portion of fig. 15 (shape [1,0,-1; 1,0,-1]) indicates the horizontal direction, since the sample difference between samples can be calculated in the horizontal direction, while the filter below it (shape [1,2,1; -1,-2,-1]) indicates the vertical direction, since the sample difference between samples can be calculated in the vertical direction. The positive and negative coefficients in each of the filters enable the sample differences to be calculated. The processing may then include: calculating one edge strength from the filtered value; quantizing the edge strength into M segments by M-1 thresholds Ti; classifying the current sample using K classes (e.g., K = M); deriving different MLR models for the different groups, where the derivation process may be GLM-simplified, i.e., the number of taps is reduced by the pre-operation described above; classifying the luminance-chrominance sample pairs inside the CU into K groups; applying the different MLR models to the reconstructed luminance samples in the different groups; and predicting the chrominance samples in the CU based on the MLR models of the different classes. Note that the filter shape used for classification may be the same as or different from the filter shape used for MLR prediction. The number of thresholds M-1 and the threshold values Ti may be fixed or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. In addition, other classifiers/combined classifiers as discussed for the ELM may also be used for the GLM.
If the number of classified samples in a group is less than a certain number (e.g., a predefined 4), the default values mentioned in the discussion of the matrix derivation for the MLR model may be applied to the group parameters (αi, β). A default value may also be applied if the corresponding neighboring reconstructed samples are not available for the selected LM mode, for example when the MMLM_L mode is selected but the required samples are invalid.
Several methods related to the simplification of GLM are introduced below to further improve the codec efficiency.
Matrix/parameter derivation in FLM requires floating-point operations (e.g., division in the closed-form solution), which is expensive for decoder hardware, so a fixed-point design is required. The 1-tap GLM case can be considered a modified luma reconstructed sample generation of CCLM (e.g., for the horizontal gradient direction, from CCLM [1,2,1; 1,2,1]/8 to GLM [-1,0,1; -1,0,1]); the original CCLM process can be reused for GLM, including the fixed-point operations, MDLM downsampling, division tables, applied size constraints, the min-max approximation, and the scaling adjustment. For all items, the 1-tap GLM may have its own configuration or share the same design as CCLM. For example, the parameters are derived using the simplified min-max method (instead of LMS) and combined with the scaling adjustment after the GLM model is derived; in this case, the center point (luminance value yr) used to rotate the slope becomes the average of the "gradients" of the reference luminance samples. For another example, when GLM is turned on for a CU, the CCLM slope adjustment is inferred to be off, and the syntax related to slope adjustment need not be signaled.
This section takes the typical case reference sample points (top 1 row and left 1 column) as an example. Note that as shown in fig. 14, the extended reconstructed region may also use simplifications of the same nature, and may have a syntax (e.g., MDLM, MRL) that indicates a particular region.
Note that the following aspects may be combined and applied jointly. For example, the division process is performed in conjunction with the reference sample downsampling and the division table.
When classification (MMLM/ELM) is applied, each group may apply the same or different simplified operations. For example, the samples of each group are respectively padded to the target number of samples before the right shift is applied, and then the same derivation process and the same division table are applied.
Fixed point implementation
The 1-tap case may reuse the CCLM design: division by n may be achieved by a right shift, and division by A2 may be achieved by a LUT. The integer parameters involved in the integer design of the LMS CCLM, including nα, nA1, nA2, rA1, rA2 and nTable, and the intermediate parameters used to derive the linear model (equations (19)-(20)) may be the same as in CCLM or take different values to achieve greater accuracy. The integer parameters may be predefined or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level, and may be adjusted according to the sequence bit depth. For example, nTable = bit depth + 4.
MDLM downsampling
When GLM is combined with MDLM, the existing total number of samples for parameter derivation may not be a power of 2 and needs to be padded to a power of 2 so that the division can be replaced by a right-shift operation. For example, for an 8 x 4 chroma CU, MDLM requires W + H = 12 samples, whereas MDLM_T has only 8 samples available (reconstructed); the 4 downsampled samples (at positions 0, 2, 4, 6) can then be equivalently padded. Code implementing such an operation is sketched below.
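A sketch of such an operation in C++, under the assumption that the extra samples are copied from evenly spaced positions of the available ones (consistent with the (0, 2, 4, 6) example above); the function name is illustrative:

    // Pad the available downsampled samples up to the target count so that the
    // division in the subsequent mean/parameter derivation becomes a right shift.
    void padSamples(int* buf, int numAvail, int numTarget) {
        const int extra = numTarget - numAvail;        // e.g., 12 - 8 = 4 for MDLM_T
        const int step = (extra > 0) ? (numAvail / extra) : 1;
        for (int k = 0; k < extra; ++k)
            buf[numAvail + k] = buf[k * step];         // copies samples 0, 2, 4, 6
    }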
other filling methods, such as repeated/mirrored filling with respect to the last adjacent sample point (rightmost/bottommost) may also be applied.
The filling method for GLM may be the same as or different from the filling method of CCLM.
Note that in the ECM version, for an 8 x 4 chroma CU, MDLM_T/MDLM_L requires 2T/2L = 16/8 samples, respectively; in this case, the same padding method can be applied to reach the target power-of-2 number of samples.
Division LUT
The division LUT proposed for CCLM/LIC (Local Illumination Compensation) during the development of known standards such as AVC/HEVC/AV1/VVC/AVS can be used for the GLM division. For example, the LUT in JCTVC-I0166 for bit depth = 10 is reused (Table 4). The division LUT may be different from that of CCLM. For example, CCLM uses min-max with DivTable (as in equation (5)), but GLM uses a 32-entry LMS division LUT (as in Table 5).
When GLM is combined with MMLM, the meanL value may not always be positive (e.g., when filtered/gradient values are used to classify the groups), so sgn(meanL) needs to be extracted and abs(meanL) used to look up the division LUT. Note that the division LUTs used for MMLM classification and for parameter derivation may be different. For example, a lower-precision LUT (e.g., the min-max LUT) is used for mean classification, and a higher-precision LUT (e.g., as in LMS) is used for parameter derivation.
Size constraint and latency constraint
Similar to the CCLM design, some size constraints may be applied for ELM/FLM/GLM. For example, the same constraint for luminance-chrominance delays in a dual-tree may be applied.
The size constraint may be based on CU area/width/height/depth. The threshold may be predefined or signaled in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. For example, for a chroma CU area, the predefined threshold may be 128.
In one example, at least one pre-operation is performed in response to determining that the video block meets an enabling threshold, wherein the enabling threshold is associated with an area, a width, a height, or a segmentation depth of the video block. In particular, the enablement threshold may define a minimum or maximum area, width, height, or segmentation depth of the video block. As understood by those skilled in the art, a video block may include a current chroma block and its co-located luma block. It is also proposed to jointly apply the above-mentioned enabling threshold to the current chroma block and its co-located luma block. For example, in response to determining that both the current chroma block and its co-located luma block meet an enable threshold, at least one pre-operation is performed.
Line buffer reduction
Similar to the CCLM design, if the co-located luma region of the current chroma CU contains the first line inside one CTU, the top template sample generation may be limited to 1 line to reduce the line buffer storage for the CTU row. Note that when the above reference line is located at the CTU boundary, only one luma line (the common line buffer in intra prediction) is used to generate the downsampled luma samples.
For example, in fig. 13, if the co-located luma region of the current chroma CU contains the first line inside one CTU, the top template may be limited to using only 1 line (instead of 2 lines) for parameter derivation (other CUs may still use 2 lines). This may save luma sample line buffer storage when the decoder hardware processes CTUs row by row. Several methods may be used to achieve the line buffer reduction. Note that the example of limiting to "1" line can be extended to N lines with similar operations. Similarly, the 2-tap or multi-tap cases may also apply such operations; when multi-tap is applied, the chroma samples may also require such operations.
For example, take the 1-tap filter [1,0,-1; 1,0,-1] shown in fig. 15. The filter may be reduced to [0,0,0; 1,0,-1], i.e., only the lower coefficients are used. Alternatively, the restricted upper-line luminance samples may be padded from below (by repetition, mirroring, 0, meanL, meanC, etc.).
Taking N = 4 as an example, i.e., the video block is located at the top boundary of the current CTU, the neighboring luma sample values and the corresponding chroma sample values of the top 4 lines are used to derive the linear model. Note that the corresponding chroma sample values may refer to the corresponding top 4 rows of neighboring chroma sample values (e.g., for the YUV 4:4:4 format). Alternatively, the corresponding chroma sample values may refer to the corresponding top 2 rows of neighboring chroma sample values (e.g., for the YUV 4:2:0 format). In this case, the neighboring luma sample values and corresponding chroma sample values of the top 4 rows may be divided into two regions: a first region including valid sample values (e.g., the luma sample values and corresponding chroma sample values of the nearest row), and a second region including invalid sample values (e.g., the luma sample values and corresponding chroma sample values of the other three rows). The coefficients of the filter corresponding to sample positions not belonging to the first region may then be set to zero, such that only sample values from the first region are used to calculate the sample difference. For example, as discussed above, in this case the filter [1,0,-1; 1,0,-1] may be reduced to [0,0,0; 1,0,-1]. Alternatively, the nearest sample values in the first region may be padded to the second region, so that the padded sample values may be used to calculate the sample differences.
Fusion of chroma intra prediction modes
In one example, because GLM can be a special CCLM mode, the fusion design may be reused, or GLM may have its own fusion design. Multiple (two or more) weights may be applied to generate the final predictor. For example,
pred=(w0*pred0+w1*pred1+(1<<(shift-1)))>>shift
where pred0 is a predictor based on a non-LM mode and pred1 is a GLM-based predictor; or
pred0 is a CCLM-based predictor (including all of MDLM/MMLM) and pred1 is a GLM-based predictor; or
pred0 is a GLM-based predictor and pred1 is another GLM-based predictor.
Different I/P/B slices may have different weight designs (w0 and w1), depending on whether the neighboring blocks are coded with CCLM/GLM/other coding modes, or on the block size/width/height.
For example, the weights may be determined by the intra prediction modes of the neighboring chroma blocks, with shift set equal to 2. Specifically, when both the above and left neighboring blocks are coded in LM modes, {w0, w1} = {1, 3}; when both the above and left neighboring blocks are coded in non-LM modes, {w0, w1} = {3, 1}; otherwise, {w0, w1} = {2, 2}. For non-I slices, w0 and w1 may both be set equal to 2, as sketched below.
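A sketch of the fusion in C++, directly following the weight rule and rounding formula above; the boolean flags and function name are illustrative assumptions:

    // Blend a non-LM/CCLM predictor (pred0) with a GLM-based predictor (pred1).
    void fusePredictors(const int* pred0, const int* pred1, int* pred,
                        int numSamples, bool aboveIsLM, bool leftIsLM,
                        bool isISlice) {
        int w0 = 2, w1 = 2;                              // non-I slices: {2, 2}
        if (isISlice) {
            if (aboveIsLM && leftIsLM)        { w0 = 1; w1 = 3; }
            else if (!aboveIsLM && !leftIsLM) { w0 = 3; w1 = 1; }
        }
        const int shift = 2;
        for (int n = 0; n < numSamples; ++n)
            pred[n] = (w0 * pred0[n] + w1 * pred1[n] + (1 << (shift - 1))) >> shift;
    }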
For the syntax design, if a non-LM mode is selected, a flag is signaled to indicate whether fusion is applied.
As described above, GLM has a good gain complexity tradeoff because it can reuse existing CCLM modules without introducing additional derivation. Such a 1-tap design may be further extended or generalized in accordance with one or more aspects of the present disclosure.
In one aspect of the present disclosure, for a chroma sample to be predicted, a single corresponding luma sample L may be generated by combining the co-located luma sample and neighboring luma samples. For example, the combination may be a combination of different linear filters, e.g., a combination of a high-pass gradient filter (GLM) and a low-pass smoothing filter (e.g., the [1,2,1; 1,2,1]/8 FIR downsampling filter commonly used in CCLM), and/or a combination of a linear filter and a non-linear filter (e.g., a power of n, such as L^n, where n may be a positive number, a negative number, or a fraction, e.g., +1/2 (square root) or +3 (cube), which may be rounded and rescaled to the bit-depth dynamic range).
In one aspect of the disclosure, the combination may be applied repeatedly. For example, a combination of GLM and the [1,2,1; 1,2,1]/8 FIR filter may be applied on the reconstructed luminance samples, and then a nonlinear power of 1/2 may be applied. For example, the nonlinear filter may be implemented as a LUT (look-up table): for bit depth = 10 and power of n with n = 1/2, LUT[i] = (int)(sqrt(i) + 0.5) << 5, for i = 0..1023, where the left shift by 5 rescales to the 10-bit dynamic range, as sketched below. The nonlinear filter may provide an option when a linear filter cannot efficiently model the luminance-chrominance relationship. Whether the non-linear term is used may be predefined or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level.
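The described LUT may be generated as in the sketch below, which follows the formula given above for bit depth = 10 and n = 1/2; the rounding and the left shift by 5 rescale the square root back to the 10-bit dynamic range:

    #include <cmath>

    // Nonlinear (power 1/2) LUT: LUT[i] = (int)(sqrt(i) + 0.5) << 5, i = 0..1023.
    void buildSqrtLut(int lut[1024]) {
        for (int i = 0; i < 1024; ++i)
            lut[i] = ((int)(std::sqrt((double)i) + 0.5)) << 5;
    }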
In one or more aspects of the present disclosure, GLM may refer to a generalized linear model (used to generate one single luminance sample linearly or non-linearly, and the generated single luminance sample may be fed into the CCLM linear model to derive the parameters of the CCLM linear model); the linear/non-linear generation may be referred to as a generic mode. Different gradient or generic modes may be combined to form another mode. For example, a gradient mode may be combined with a CCLM downsampled value, a gradient mode may be combined with a non-linear L2 value, or a gradient mode may be combined with another gradient mode; the two gradient modes to be combined may have different directions or the same direction, e.g., [1,1,1; -1,-1,-1] and [1,2,1; -1,-2,-1], both of which have the vertical direction, or [1,1,1; -1,-1,-1] and [1,0,-1], which have the vertical and horizontal directions, respectively, as shown in fig. 15. The combination may include addition, subtraction, or linear weighting.
In one or more aspects of the present disclosure, one or more syntax elements may be introduced to indicate information about the GLM. Table 10 below shows an example of the GLM syntax.
Table 10
FLC: fixed-length code
TU: truncated unary code
EGk: k-th order exponential Golomb code, where k may be fixed or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level
SVLC: signed EG0
UVLC: unsigned EG0
Note that the binarization of each syntax element may be changed.
In one aspect of the present disclosure, GLM on/off control for the Cb/Cr components may be performed jointly or separately. For example, at the CU level, 1 flag may be used to indicate whether GLM is active for this CU. If so, 1 flag may be used to indicate whether both Cb and Cr are active. If not both are active, 1 flag indicates whether Cb or Cr is active. When Cb and/or Cr is active, the filter index/gradient (generic) mode may be signaled separately. All flags may have their own context models or be bypass coded.
In another aspect of the present disclosure, whether to signal the GLM on/off flag may depend on the luma/chroma coding mode and/or the CU size. For example, in the ECM5 chroma intra mode syntax, GLM may be inferred to be off when MMLM or MMLM_L or MMLM_T is applied; when CU area < A, where A may be predefined or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level; or, if combined with CCCM, GLM may be inferred to be off when CCCM is on.
Note that when GLM is combined with MMLM, different models may share the same gradient/common pattern or have their own gradient/common pattern.
Fig. 18 illustrates a workflow of a method 1800 for decoding video data in accordance with one or more aspects of the present disclosure. For example, method 1800 may be performed by a decoder described with reference to fig. 3.
At step 1810, a bitstream may be received or obtained at a decoder. The bitstream may include video data that has been encoded using various video encoding and decoding techniques as described herein or other video encoding and decoding techniques beyond those described.
At step 1820, an indication indicating information related to a Gradient Linear Model (GLM) may be obtained or derived from the bitstream. The GLM may be used to obtain one or more filtered values based on intensity differences between luminance samples along one or more of a plurality of directions (e.g., horizontal, vertical, diagonal, or any combination thereof, as shown in fig. 15). As shown in fig. 15, the GLM may be used to filter along one direction (e.g., the vertical direction via [1,1,1; -1,-1,-1]) or along multiple directions (e.g., the vertical, horizontal, and +/-45 degree diagonal directions via [-1,5,-1; -1,-1,-1]). For example, as described with reference to fig. 15, the GLM may be used to obtain only one single value, or two values, filtered from multiple reconstructed luminance samples. In other examples, the GLM may be used to obtain more than two (e.g., 3) filtered values. Dramatic intensity changes in a video sequence may result in corresponding changes in the chrominance values, known as the purple fringing problem; for example, high-brightness halos may cause saturated photodiode quantum wells and charge leakage in charge-coupled devices (CCDs). The GLM may be proposed to handle cases where the luminance gradient is highly correlated with the chrominance values. In other cases, the GLM may be disabled; for example, in MMLM mode, a GLM used to obtain two filtered values may not achieve the desired codec performance and may be disabled. The information related to the GLM may include one or more of: whether the GLM is enabled, which one or more of a plurality of directions (e.g., horizontal, vertical, diagonal, or any combination thereof) are used for the GLM, or which one of a plurality of filter modes (e.g., [1,1,1; -1,-1,-1], [1,2,1; -1,-2,-1], as shown in fig. 15, etc.) is used for the GLM.
In one or more aspects of the present disclosure, the indication indicating information related to the GLM may be obtained explicitly or implicitly. For example, one or more syntax elements may be used to indicate the GLM-related information, such as the syntax shown in Table 10. The syntax elements may be entropy coded in the bitstream along with the video data, or CABAC bypass coded and included in the bitstream along with the video data. As another example, information related to the GLM may be inferred or indicated based on other coding modes or the size of the video block; e.g., GLM being off may be inferred or indicated when CCCM is on or CCLM is off, without using additional signaling dedicated to the GLM.
At step 1830, the video data may be decoded based on the information related to the GLM. For example, a linear model may be trained by using the output data of the GLM. Training data may be collected over a region (e.g., multiple sets, where each set consists of one or more luminance sample values and a corresponding chrominance sample value). The region may include one or more columns and/or rows, or other irregular shapes, corresponding to the current video block to be decoded. The region for deriving the parameters of the linear model may be determined based on the information related to the GLM, such as the MDLM index (GLM, GLM_L, GLM_T) and/or the GLM MRL index (e.g., 0, 1). Decoding of the video data may be performed by applying the linear model to the reconstructed luma component to predict the corresponding chroma component of the video data. The linear model may include a Simple Linear Regression (SLR) model or a Multiple Linear Regression (MLR) model, and the SLR or MLR may use at least one filtered value output by the GLM as at least one luminance value in each set, together with the value of the corresponding chrominance sample in the region, to train or derive the parameters of the SLR or MLR. For example, the SLR may have only one tap and use only one filtered value output by the GLM to train that tap. The MLR may have a plurality of taps, and one of the plurality of taps may be trained using one filtered value output by the GLM. Alternatively, the MLR may train more than one (e.g., 2) of the plurality of taps using more than one (e.g., 2) filtered values output by the GLM. In one example, the number of filtered values output by the GLM that are used may be less than the number of MLR taps, so that the MLR may be trained taking into account not only the luminance gradient but also other factors.
Fig. 19 illustrates a workflow of a method 1900 for encoding video data in accordance with one or more aspects of the present disclosure. For example, method 1900 may be performed by the encoder described with reference to fig. 1. Method 1900 may be the counterpart of method 1800.
At step 1910, an indication indicating information related to a Gradient Linear Model (GLM) may be obtained. For example, an explicit indication or an implicit indication may be obtained. The explicit indication may be obtained by generating one or more syntax elements to be included or encoded in the bitstream. Alternatively, the implicit indication may be obtained based on the coding mode or the size of the video block. For example, when CCCM is on or CCLM is off, GLM being off may be inferred or indicated without generating additional signaling dedicated to the GLM.
At step 1920, video data may be encoded based on the information related to the GLM.
At step 1930, a bitstream may be generated to include the encoded video data and an indication indicating GLM-related information.
Fig. 20 illustrates an exemplary computing system 2000 in accordance with one or more aspects of the present disclosure. The computing system 2000 may include at least one processor 2010. The computing system 2000 may also include at least one storage device 2020. The storage 2020 may store computer executable instructions that, when executed, cause the processor 2010 to perform the steps of the method described above. Processor 2010 may be a general purpose processor or may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. The storage 2020 may store input data, output data, data generated by the processor 2010, and/or instructions to be executed by the processor 2010.
It should be appreciated that the storage 2020 may store computer executable instructions that, when executed, cause the processor 2010 to perform any operations in accordance with embodiments of the present disclosure.
Embodiments of the present disclosure may be embodied in a computer-readable medium, such as a non-transitory computer-readable medium. The non-transitory computer-readable medium may include instructions that, when executed, cause one or more processors to perform any operations in accordance with embodiments of the present disclosure. For example, the instructions, when executed, may cause one or more processors to receive a bitstream and perform decoding operations as described above. As another example, the instructions, when executed, may cause the one or more processors to perform encoding operations and transmit a bitstream including encoded video data and an indication of information related to the GLM, as described above.
In an embodiment, the information related to the GLM may include one or more of: whether the GLM is enabled, which one or more of a plurality of directions are used for the GLM, which one of a plurality of filter modes is used for the GLM, or which region is used by the GLM to derive the parameters of the linear model.
In an embodiment, the indication of the information related to the GLM may be obtained through at least one of: a coding mode of the video data, a size of a video block of the video data, or signaling (e.g., one or more syntax elements) in the bitstream.
In an embodiment, the indication indicating the GLM related information may be signaled in a Sequence Parameter Set (SPS), a Picture Header (PH), a Slice Header (SH), a Coding Tree Unit (CTU), or a Coding Unit (CU) level.
In an embodiment, the indication indicating the information related to the GLM is signaled separately or jointly for the Cr component and the Cb component. For example, the direction used for Cr may be different from the direction used for Cb. For another example, the direction for Cr may be the same as that for Cb, but the coefficients for Cr may differ from the coefficients for Cb, e.g., [1,1,1; -1,-1,-1] for Cr and [1,2,1; -1,-2,-1] for Cb.
It should be recognized that all operations in the above-described methods are merely exemplary, and that the present disclosure is not limited to any operations in the methods or to the order of such operations, but rather should encompass all other equivalents under the same or similar concepts. One or more aspects of the presented methods and/or processes described with reference to fig. 1, 2A-2E, 3-4, 5A-5C, 6-7, 8A-8H, 9, 10A-10B, 11-20 may be combined without departing from the present disclosure.
It should also be appreciated that all of the modules in the above methods may be implemented in a variety of ways. These modules may be implemented as hardware, software, or a combination thereof. Furthermore, any of these modules may be functionally further divided into sub-modules or combined together.
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Accordingly, the claims are not intended to be limited to the aspects shown herein. All structural and functional equivalents to the elements of the various aspects described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims.
