The present application claims the benefit of U.S. provisional application No. 63/355,027, filed on June 23, 2022, the entire contents of which are incorporated herein by reference.
Detailed Description
Reference will now be made in detail to the present embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous non-limiting specific details are set forth in order to provide an understanding of the subject matter presented herein. It will be apparent, however, to one of ordinary skill in the art that various alternatives may be used and that the subject matter may be practiced without these specific details without departing from the scope of the claims. For example, it will be apparent to one of ordinary skill in the art that the subject matter presented herein may be implemented on many types of electronic devices having digital video capabilities.
It should be noted that the terms "first," "second," and the like, as used in the description and claims of the present disclosure and in the accompanying drawings, are used for distinguishing between objects and not for describing any particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that embodiments of the disclosure described herein may be implemented in other sequences than those illustrated in the figures or otherwise described in the disclosure.
The first version of the VVC standard was finalized in July 2020, and it offers a bit-rate saving of approximately 50% at equivalent perceptual quality compared to the previous-generation video codec standard HEVC. Although the VVC standard provides significant coding improvements over its predecessor, there is evidence that superior coding efficiency can be achieved with additional coding tools. Recently, the Joint Video Experts Team (JVET), a cooperation between ITU-T VCEG and ISO/IEC MPEG, began to explore advanced technologies that could substantially improve coding efficiency over VVC. In April 2021, a software code base named the Enhanced Compression Model (ECM) was established for future video coding exploration work. The ECM reference software is based on the VVC Test Model (VTM) developed by JVET for VVC, with several existing modules (e.g., intra/inter prediction, transform, in-loop filter, etc.) further extended and/or improved. In the future, any new coding tool beyond the VVC standard needs to be integrated into the ECM platform and tested using the JVET common test conditions (CTCs).
Like all previous video codec standards, the ECM is built on a block-based hybrid video coding framework. Fig. 1 shows a block diagram of a generic block-based hybrid video coding system. The input video signal is processed block by block, each block being called a coding unit (CU). In ECM-1.0, a CU may be up to 128×128 pixels. However, as in VVC, one coding tree unit (CTU) is partitioned into CUs to adapt to varying local characteristics based on a quadtree/binary-tree/ternary-tree structure. In the multi-type tree structure, one CTU is first partitioned by a quadtree structure. Each quadtree leaf node may then be further partitioned by a binary-tree or ternary-tree structure. As shown in Figs. 2A, 2B, 2C, 2D, and 2E, there are five split types: quaternary split, vertical binary split, horizontal binary split, vertical ternary split, and horizontal ternary split.
In Fig. 1, spatial prediction and/or temporal prediction may be performed. Spatial prediction (or "intra prediction") predicts the current video block using pixels from the samples (referred to as reference samples) of already decoded neighboring blocks in the same video picture/slice. Spatial prediction reduces the spatial redundancy inherent in the video signal. Temporal prediction (also referred to as "inter prediction" or "motion-compensated prediction") uses reconstructed pixels from already coded video pictures to predict the current video block. Temporal prediction reduces the temporal redundancy inherent in the video signal. The temporal prediction signal for a given CU is typically signaled by one or more motion vectors (MVs), which indicate the amount and the direction of motion between the current CU and its temporal reference. Also, if multiple reference pictures are supported, one reference picture index is additionally sent, which identifies from which reference picture in the reference picture store the temporal prediction signal originates. After spatial and/or temporal prediction, a mode decision block in the encoder selects the best prediction mode, e.g., based on a rate-distortion optimization method. The prediction block is then subtracted from the current video block, and the prediction residual is decorrelated using a transform and quantized. The quantized residual coefficients are inverse quantized and inverse transformed to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU. Furthermore, in-loop filtering, such as a deblocking filter, Sample Adaptive Offset (SAO), and Adaptive Loop Filter (ALF), may be applied to the reconstructed CU before it is placed in the reference picture store and used to code future video blocks.
To form the output video bitstream, the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to an entropy encoding unit to be further compressed and packed to form the bitstream. It should be noted that the term "block" or "video block" as used herein may be a part of a frame or picture, in particular a rectangular (square or non-square) part. Referring to HEVC and VVC, a block or video block may be or correspond to a Coding Tree Unit (CTU), a CU, a Prediction Unit (PU) or a Transform Unit (TU), and/or may be or correspond to a corresponding block, e.g., a Coding Tree Block (CTB), a Coding Block (CB), a Prediction Block (PB) or a Transform Block (TB), and/or a sub-block.
Fig. 3 shows a general block diagram of a block-based video decoder. The video bitstream is first entropy decoded at an entropy decoding unit. The coding mode and prediction information are sent to a spatial prediction unit (if intra coded) or a temporal prediction unit (if inter coded) to form a prediction block. The residual transform coefficients are sent to an inverse quantization unit and an inverse transform unit to reconstruct the residual block. The prediction block and the residual block are then added together. The reconstructed block may also be loop filtered before being stored in the reference picture store. The reconstructed video in the reference picture store is then sent out to drive the display device and used to predict future video blocks.
The primary focus of the present disclosure is to further enhance the coding efficiency of the cross-component linear model (CCLM), a cross-component prediction coding tool in the ECM. Hereinafter, some related coding tools in the ECM are briefly reviewed. Then, some drawbacks in the existing design of the CCLM are discussed. Finally, solutions for improving the existing CCLM prediction design are provided.
Cross-component linear model prediction
To reduce cross-component redundancy, a cross-component linear model (CCLM) prediction mode is used in VVC for which chroma samples are predicted based on reconstructed luma samples of the same CU by using a linear model:
predC(i,j) = α·recL′(i,j) + β (1)
where predC(i,j) represents the predicted chroma samples in the CU, and recL′(i,j) represents the downsampled reconstructed luma samples of the same CU, obtained by performing downsampling on the reconstructed luma samples recL(i,j). The above α and β are linear model parameters derived from at most four neighboring chroma samples and their corresponding downsampled luma samples, which may be referred to as neighboring luma-chroma sample pairs. Assuming that the current chroma block has a size of W×H, W′ and H′ are obtained as follows:
- when the LM mode is applied, W′ = W and H′ = H;
- when the LM-A mode is applied, W′ = W + H;
- when the LM-L mode is applied, H′ = H + W;
where, in the LM mode, the above and left samples of the CU are used together to calculate the linear model coefficients; in the LM-A mode, only the above samples of the CU are used to calculate the linear model coefficients; and in the LM-L mode, only the left samples of the CU are used to calculate the linear model coefficients.
If the positions of the above neighboring samples of the chroma block are denoted as S[0, −1] … S[W′−1, −1], and the positions of the left neighboring samples of the chroma block are denoted as S[−1, 0] … S[−1, H′−1], the positions of the four neighboring chroma samples are selected as follows:
- when the LM mode is applied and both the above and left neighboring samples are available, S[W′/4, −1], S[3W′/4, −1], S[−1, H′/4], and S[−1, 3H′/4] are selected as the positions of the four neighboring chroma samples;
- when the LM-A mode is applied or only the above neighboring samples are available, S[W′/8, −1], S[3W′/8, −1], S[5W′/8, −1], and S[7W′/8, −1] are selected as the positions of the four neighboring chroma samples;
- when the LM-L mode is applied or only the left neighboring samples are available, S[−1, H′/8], S[−1, 3H′/8], S[−1, 5H′/8], and S[−1, 7H′/8] are selected as the positions of the four neighboring chroma samples.
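The position-selection rules above can be sketched in code. This is an illustrative helper only; the function name, the mode strings, and the tuple-based position encoding are assumptions of this sketch, not taken from any reference software:

```python
def cclm_ref_positions(w_prime, h_prime, mode, have_above=True, have_left=True):
    """Sketch of the four neighboring chroma sample positions used by CCLM.

    (x, -1) indexes the above reference row and (-1, y) the left reference
    column, following the S[...] notation above.
    """
    if mode == "LM" and have_above and have_left:
        return [(w_prime // 4, -1), (3 * w_prime // 4, -1),
                (-1, h_prime // 4), (-1, 3 * h_prime // 4)]
    if mode == "LM_A" or not have_left:
        # LM-A, or only the above neighbors are available
        return [(w_prime // 8, -1), (3 * w_prime // 8, -1),
                (5 * w_prime // 8, -1), (7 * w_prime // 8, -1)]
    # LM-L, or only the left neighbors are available
    return [(-1, h_prime // 8), (-1, 3 * h_prime // 8),
            (-1, 5 * h_prime // 8), (-1, 7 * h_prime // 8)]
```

For example, an 8×8 block in LM mode selects positions (2, −1), (6, −1), (−1, 2), and (−1, 6).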
Four neighboring luma samples corresponding to the selected positions are obtained by the downsampling operation, and the four obtained neighboring luma samples are compared four times to find the two larger values x0A and x1A and the two smaller values x0B and x1B. The chroma sample values corresponding to the two larger values and the two smaller values are denoted as y0A, y1A, y0B, and y1B, respectively. Then, Xa, Xb, Ya, and Yb are derived as follows:
Xa = (x0A + x1A + 1) >> 1;
Xb = (x0B + x1B + 1) >> 1;
Ya = (y0A + y1A + 1) >> 1;
Yb = (y0B + y1B + 1) >> 1 (2)
Finally, the linear model parameters α and β are obtained according to the following equations:
α = ( Ya − Yb ) / ( Xa − Xb ) (3)
β = Yb − α·Xb (4)
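The averaging of equation (2) and the parameter derivation can be sketched as follows. This is a minimal floating-point sketch with illustrative names; the standard replaces the division with a look-up table, as described below:

```python
def derive_cclm_params(luma4, chroma4):
    """Min-max CCLM parameter derivation from four neighboring sample pairs.

    luma4 / chroma4 hold the four neighboring downsampled-luma and chroma
    sample values.  Returns (alpha, beta) as floats for clarity.
    """
    pairs = sorted(zip(luma4, chroma4))           # order pairs by luma value
    (x0B, y0B), (x1B, y1B) = pairs[0], pairs[1]   # two smaller luma values
    (x0A, y0A), (x1A, y1A) = pairs[2], pairs[3]   # two larger luma values
    Xa = (x0A + x1A + 1) >> 1                     # averages of eq. (2)
    Xb = (x0B + x1B + 1) >> 1
    Ya = (y0A + y1A + 1) >> 1
    Yb = (y0B + y1B + 1) >> 1
    alpha = (Ya - Yb) / (Xa - Xb) if Xa != Xb else 0.0   # eq. (3)
    beta = Yb - alpha * Xb                               # eq. (4)
    return alpha, beta
```

For instance, the pairs (10, 5), (20, 10), (30, 15), (40, 20) give α = 0.5 and β = 0.5.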
Fig. 4 shows an example of the locations of the left and above samples and the samples of the current block involved in the CCLM mode, including the locations of the left and above samples of an N×N chroma block in the CU and the locations of the left and above samples of a 2N×2N luma block in the CU.
The division operation for calculating the parameter α is implemented with a look-up table. To reduce the memory required to store the table, the diff value (the difference between the maximum and minimum values) and the parameter α are expressed in an exponential notation. For example, diff is approximated with a 4-bit significant part and an exponent. Consequently, the table for 1/diff is reduced to 16 elements corresponding to the 16 values of the significant part, as follows:
DivTable[] = { 0, 7, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 1, 1, 0 } (5)
This has the benefit of both reducing the computational complexity and reducing the memory size required to store the needed tables.
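The exponential representation can be illustrated with a short sketch. This only mirrors the idea of a 4-bit significant part plus an exponent; it is not the exact VVC derivation, and the helper name is an assumption of this sketch:

```python
# 16-entry table from equation (5)
DIV_TABLE = [0, 7, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 1, 1, 0]

def significand4_and_exponent(diff):
    """Split a positive diff into a 4-bit significant part and an exponent,
    so that 1/diff can be approximated from a 16-entry table."""
    exp = max(0, diff.bit_length() - 1)   # floor(log2(diff))
    shift = max(0, exp - 3)               # keep 4 significant bits
    sig = (diff >> shift) & 15            # 4-bit significant part
    return sig, exp
```

For example, diff = 100 has exponent 6 and 4-bit significant part 12.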
Besides the case in which the above and left templates are used together to calculate the linear model coefficients, the two templates can alternatively be used in the other two LM modes (referred to as the LM_A and LM_L modes).
In the LM_T mode, only the above template is used to calculate the linear model coefficients. To get more samples, the above template is extended to (W+H) samples. In the LM_L mode, only the left template is used to calculate the linear model coefficients. To get more samples, the left template is extended to (H+W) samples.
In the LM_LT mode, both the left and above templates are used to calculate the linear model coefficients.
To match the chroma sample locations of a 4:2:0 video sequence, two types of downsampling filters are applied to the luma samples to achieve a 2:1 downsampling ratio in both the horizontal and vertical directions. The selection of the downsampling filter is specified by an SPS-level flag. The two downsampling filters, which correspond to "type-0" and "type-2" content, respectively, are as follows.
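The two SPS-selectable filters correspond to the 6-tap and 5-tap luma downsampling filters known from VVC. The following is a sketch under assumptions: the array convention p_y[row][col] and the function names are ours, and the tap layout is reproduced from the well-known VVC design rather than from this document:

```python
def downsample_type0(p_y, x, y):
    """6-tap filter (type-0 content, chroma sited between luma rows)."""
    return (p_y[2*y][2*x - 1] + p_y[2*y + 1][2*x - 1]
            + 2 * p_y[2*y][2*x] + 2 * p_y[2*y + 1][2*x]
            + p_y[2*y][2*x + 1] + p_y[2*y + 1][2*x + 1] + 4) >> 3

def downsample_type2(p_y, x, y):
    """5-tap filter (type-2 content, chroma co-sited with luma)."""
    return (p_y[2*y - 1][2*x] + p_y[2*y][2*x - 1] + 4 * p_y[2*y][2*x]
            + p_y[2*y][2*x + 1] + p_y[2*y + 1][2*x] + 4) >> 3
```

Both filters have tap weights summing to 8, so a constant luma region is reproduced exactly after the >> 3 normalization.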
Note that when the above reference line is at the CTU boundary, only one luma line (the line buffer shared with intra prediction) is used to construct the downsampled luma samples.
This parameter computation is performed as part of the decoding process, not merely as an encoder search operation. As a result, no syntax is used to convey the α and β values to the decoder.
For chroma intra mode coding, a total of eight intra modes are allowed. These modes include five traditional intra modes and three cross-component linear model modes (CCLM, LM_A, and LM_L). The chroma mode signaling and derivation procedure are shown in Table 1. Chroma mode coding directly depends on the intra prediction mode of the corresponding luma block. Since separate block partitioning structures for the luma and chroma components are enabled in I slices, one chroma block may correspond to multiple luma blocks. Therefore, for the chroma DM mode, the intra prediction mode of the corresponding luma block covering the center position of the current chroma block is directly inherited.
TABLE 1 deriving chroma prediction modes from luma modes when CCLM is enabled
Regardless of the value of sps_cclm_enabled_flag, a single binarization table is used, as shown in Table 2.
TABLE 2 unified binarization table for chroma prediction modes
In Table 2, the first bin indicates whether the mode is a normal mode (0) or an LM mode (1). If it is an LM mode, the next bin indicates whether it is LM_CHROMA (0). If it is not LM_CHROMA, the next bin indicates whether it is LM_L (0) or LM_A (1). In this scheme, when sps_cclm_enabled_flag is 0, the first bin of the binarization table for the corresponding intra_chroma_pred_mode may be discarded before entropy coding. In other words, the first bin is inferred to be 0 and is therefore not coded. This single binarization table is used for both cases of sps_cclm_enabled_flag equal to 0 and 1. The first two bins in Table 2 are context-coded with their own context models, and the remaining bins are bypass-coded.
In addition, to reduce the luma-chroma latency in dual tree, when a 64×64 luma coding tree node is not split (and ISP is not used for the 64×64 CU) or is partitioned with QT, the chroma CUs in the 32×32/32×16 chroma coding tree nodes are allowed to use CCLM as follows:
- If the 32×32 chroma node is not split or is partitioned with QT, all chroma CUs in the 32×32 node may use CCLM.
- If the 32×32 chroma node is partitioned with horizontal BT, and its 32×16 child node is not split or is partitioned with vertical BT, all chroma CUs in the 32×16 chroma node may use CCLM.
CCLM is not allowed for chroma CUs under all other luma and chroma coding tree split conditions.
During the ECM development, the simplified derivation of α and β (the min-max approximation) was removed. Instead, a linear least squares solution between the causal reconstructed data of the downsampled luma samples and the causal chroma samples is applied to derive the model parameters α and β:

α = ( I·ΣRecC(i)·Rec′L(i) − ΣRecC(i)·ΣRec′L(i) ) / ( I·ΣRec′L(i)·Rec′L(i) − ΣRec′L(i)·ΣRec′L(i) ) (8)

β = ( ΣRecC(i) − α·ΣRec′L(i) ) / I (9)

where RecC(i) and Rec′L(i) indicate the reconstructed chroma samples and the downsampled reconstructed luma samples around the target block, and I indicates the total number of samples of the neighboring data.
The LM_A and LM_L modes are also called the multi-directional linear model (MDLM). Fig. 5A shows an example of MDLM operation when the block content cannot be predicted from the L-shaped reconstructed region. Fig. 5B shows MDLM_L, which uses only the left reconstructed samples to derive the CCLM parameters. Fig. 5C shows MDLM_T, which uses only the top reconstructed samples to derive the CCLM parameters.
An integer implementation of the least mean square (LMS) derivation discussed above (see equations (8)-(9)) has been proposed as an improvement to CCLM. The initial integer design of the LMS CCLM was first presented in JCTVC-C206. The method was then improved by a series of simplifications, finally forming the LMS version in the ECM, including JCTVC-F0233/I0178, which reduces the α precision nα from 13 to 7; JCTVC-I0151, which reduces the maximum multiplier bit width; and JCTVC-H0490/I0166, which reduces the division LUT entries from 64 to 32.
As discussed in equation (1), the integer design models the correlation of luminance and chrominance signals using a linear relationship. The chrominance values are predicted from the reconstructed luminance values of the co-located block.
In YUV 4:2:0 sampling, the luma and chroma components have different sampling ratios. The sampling rate of the chroma component is half that of the luma component, with a phase difference of 0.5 pixel in the vertical direction. The reconstructed luma needs downsampling in the vertical direction and subsampling in the horizontal direction to match the size of the chroma signal. For example, the downsampling may be achieved by:
RecL′(i,j) = ( recL(2i, 2j) + recL(2i, 2j+1) ) >> 1 (10)
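Equation (10) translates directly into code. A minimal sketch, assuming rec_l is indexed as rec_l[a][b] in the same order as the equation's arguments:

```python
def downsample_luma_2tap(rec_l, i, j):
    """Two-tap average of equation (10): (recL(2i,2j) + recL(2i,2j+1)) >> 1."""
    return (rec_l[2 * i][2 * j] + rec_l[2 * i][2 * j + 1]) >> 1
```

For example, averaging the neighboring values 10 and 20 yields 15.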
In equation (8), floating-point operations are needed when calculating the linear model parameter α in order to maintain high data accuracy. Moreover, when α is represented by a floating-point value, a floating-point multiplication is involved in equation (1). In this section, an integer implementation of the algorithm is designed. Specifically, the fractional part of the parameter α is quantized with nα bits of data accuracy. The value of parameter α is then represented by an amplified and rounded integer value α′, where α′ = α × (1 << nα). The linear model of equation (1) is then changed to:
predC[x, y] = ( α′·RecL′[x, y] >> nα ) + β′ (11)
where β′ is the rounded value of the floating-point β, and α′ can be calculated as follows.
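The quantization of α and the integer prediction of equation (11) can be sketched as follows, assuming nα = 13 as stated later in this section (function names are illustrative):

```python
N_ALPHA = 13  # fractional-precision bits for alpha

def quantize_alpha(alpha_float):
    """alpha' = round(alpha * (1 << n_alpha)), the amplified integer value."""
    return int(round(alpha_float * (1 << N_ALPHA)))

def predict_chroma_int(rec_l_ds, alpha_q, beta_q):
    """Integer prediction of equation (11): (alpha'*rec >> n_alpha) + beta'."""
    return ((alpha_q * rec_l_ds) >> N_ALPHA) + beta_q
```

For example, α = 0.5 quantizes to α′ = 4096, and a downsampled luma value of 100 with β′ = 3 predicts a chroma value of 53.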
Instead of the division operation of equation (12), it is proposed to use a table look-up and a multiplication. a2 is first scaled down to reduce the table size, and a1 is also scaled down to avoid product overflow. Then, in a2, only the most significant bits, whose number is defined by the value nA2, are kept, and the other bits are all set to zero. The approximation a2′ can be calculated as in equation (13), where [·] means the rounding operation, and the scaling term can be calculated as in equation (14), where bdepth(a2) represents the bit depth of the value a2. The same procedure is performed for a1, as in equation (15). Considering the quantized representations of a1 and a2, equation (12) can be rewritten as equation (16), in which a look-up table with 2^nA2 entries is used to avoid the division.
In the simulation, the constant parameters are set as follows:
- nα is equal to 13, which is a trade-off between data accuracy and computational cost.
- nA2 is equal to 6, such that the look-up table size is 64; the table size can be further reduced to 32 by amplifying a2 when bdepth(a2) < 6 (e.g., a2 < 32).
- ntable is equal to 15, resulting in a 16-bit data representation of the table elements.
- The scaling parameter for a1 is set to 15, to avoid product overflow and to maintain 16-bit multiplication.
Finally, α′ is clipped to [−2^15, 2^15 − 1] to preserve the 16-bit multiplication in equation (11). With this clipping, when nα is equal to 13, the actual α value is limited to [−4, 4), which helps to prevent error amplification.
With the calculated parameter α′, the parameter β′ is calculated as follows:
where the division in the above equation can be simply replaced by a shift, since the value I is a power of 2.
Similar to the discussion above regarding equation (1), in HM6.0, an intra prediction mode referred to as LM is applied to predict the chroma PU based on a linear model, using the reconstruction of the co-located luma PU. The parameters of the linear model consist of a slope (a >> k) and a y-intercept (b), which are derived from the neighboring luma and chroma pixels using a least mean square solution. The values of the prediction samples predSamples[x, y] are derived as follows, where x, y = 0..nS−1, and nS specifies the block size of the current chroma PU:
predSamples[x, y] = Clip1C( ( ( pY′[x, y] * a ) >> k ) + b ), where x, y = 0..nS−1 (17)
where pY′[x, y] is the reconstructed pixel from the corresponding luma component. When the coordinates x and y are equal to or greater than 0, pY′ is a reconstructed pixel of the co-located luma PU. When x or y is less than 0, pY′ is a reconstructed neighboring pixel of the co-located luma PU.
Some intermediate variables in the derivation, L, C, LL, LC, k2, and k3, are derived as:
k2 = Log2( (2*nS) >> k3 ) (18-5)
k3 = Max( 0, BitDepthC + Log2( nS ) - 14 ) (18-6)
Thus, the variables a, b, and k can be derived as:
a1 = ( LC << k2 ) – L*C (19-1)
a2 = ( LL << k2 ) – L*L (19-2)
k1 = Max( 0, Log2( abs( a2 ) ) - 5 ) – Max( 0, Log2( abs( a1 ) ) - 14 ) + 2 (19-3)
a1s = a1 >> Max(0, Log2( abs( a1 ) ) - 14 ) (19-4)
a2s = abs( a2 >> Max(0, Log2( abs( a2 ) ) - 5 ) ) (19-5)
a3 = a2s < 1 ? 0 : Clip3( −2^15, 2^15 − 1, a1s*lmDiv + ( 1 << ( k1 - 1 ) ) >> k1 ) (19-6)
a = a3 >> Max( 0, Log2( abs( a3 ) ) - 6 ) (19-7)
k = 13 – Max( 0, Log2( abs( a ) ) - 6 ) (19-8)
b = ( L – ( ( a*C ) >> k1 ) + ( 1 << ( k2 - 1 ) ) ) >> k2, (19-9)
where lmDiv is specified in a 63-entry look-up table (i.e., Table 3) that is generated online by:
lmDiv(a2s)=( (1 << 15) + a2s/2 ) / a2s . (20)
table 3 Specification of lmDiv
| a2s | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| lmDiv | 32768 | 16384 | 10923 | 8192 | 6554 | 5461 | 4681 | 4096 | 3641 | 3277 | 2979 | 2731 | 2521 |
| a2s | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 |
| lmDiv | 2341 | 2185 | 2048 | 1928 | 1820 | 1725 | 1638 | 1560 | 1489 | 1425 | 1365 | 1311 | 1260 |
| a2s | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 |
| lmDiv | 1214 | 1170 | 1130 | 1092 | 1057 | 1024 | 993 | 964 | 936 | 910 | 886 | 862 | 840 |
| a2s | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 |
| lmDiv | 819 | 799 | 780 | 762 | 745 | 728 | 712 | 697 | 683 | 669 | 655 | 643 | 630 |
| a2s | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | |
| lmDiv | 618 | 607 | 596 | 585 | 575 | 565 | 555 | 546 | 537 | 529 | 520 | 512 | |
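The generation rule of equation (20) reproduces the entries of Table 3. A one-line sketch (the function name is illustrative):

```python
def lm_div(a2s):
    """Equation (20): lmDiv(a2s) = ((1 << 15) + a2s/2) / a2s, integer math."""
    return ((1 << 15) + a2s // 2) // a2s
```

For example, lm_div(1) = 32768, lm_div(3) = 10923, and lm_div(64) = 512, matching the table entries.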
In equation (19-6), a1s is a 16-bit signed integer, and lmDiv is a 16-bit unsigned integer. Thus, a 16-bit multiplier and 16-bit storage are required. It is proposed to reduce the bit depth of the multiplier to an internal bit depth and to reduce the size of the look-up table, as described in more detail below.
The bit depth of a1s is reduced to the internal bit depth by changing equation (19-4) to the following equation:
a1s = a1 >> Max(0, Log2( abs( a1 ) ) – (BitDepthC – 2)) . (21)
The value of lmDiv with the internal bit depth is computed using the following equation (22) and stored in a look-up table:
lmDiv(a2s)=( (1 << (BitDepthC-1)) + a2s/2 ) / a2s. (22)
Table 4 shows an example with the internal bit depth equal to 10.
TABLE 4 Specification of lmDiv with internal bit depth equal to 10
Equation (19-3) and equation (19-8) are also modified as follows:
k1 = Max( 0, Log2( abs( a2 ) ) - 5 ) – Max( 0, Log2( abs( a1 ) ) – ( BitDepthC – 2 ) ), and (23-1)
k = BitDepthC – 1 – Max( 0, Log2( abs( a ) ) - 6 ) (23-2)
It is also proposed to reduce the number of entries from 63 to 32, and the bits per entry from 16 to 10, as shown in Table 5. By doing so, approximately 70% memory savings can be achieved. The corresponding changes to equations (19-6), (20), and (19-8) are as follows:
a3 = a2s < 32 ? 0 : Clip3( −2^15, 2^15 − 1, a1s*lmDiv + ( 1 << ( k1 - 1 ) ) >> k1 ) (24-1)
lmDiv( a2s ) = ( ( 1 << ( BitDepthC + 4 ) ) + a2s/2 ) / a2s (24-2)
k = BitDepthC + 4 – Max( 0, Log2( abs( a ) ) - 6 ) (24-3)
TABLE 5 Specification of lmDiv with internal bit depth equal to 10
| a2s | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 |
| lmDiv | 512 | 496 | 482 | 468 | 455 | 443 | 431 | 420 | 410 | 400 | 390 | 381 | 372 | 364 | 356 | 349 |
| a2s | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 |
| lmDiv | 341 | 334 | 328 | 321 | 315 | 309 | 303 | 298 | 293 | 287 | 282 | 278 | 273 | 269 | 264 | 260 |
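The reduced table of equation (24-2) can likewise be generated online. A sketch assuming an internal bit depth of 10, matching Table 5 (the function name is illustrative):

```python
def lm_div_reduced(a2s, bit_depth_c=10):
    """Equation (24-2): lmDiv with internal bit depth, 32 entries (a2s >= 32)."""
    return ((1 << (bit_depth_c + 4)) + a2s // 2) // a2s
```

For example, lm_div_reduced(32) = 512 and lm_div_reduced(63) = 260, matching the first and last entries of Table 5.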
Multi-model linear model prediction
In ECM-1.0, a multi-model LM (MMLM) prediction mode is proposed, for which the chroma samples are predicted based on the reconstructed luma samples of the same CU by using two linear models, as follows:

predC(i,j) = α1·recL′(i,j) + β1, if recL′(i,j) ≤ Threshold
predC(i,j) = α2·recL′(i,j) + β2, if recL′(i,j) > Threshold (25)

where predC(i,j) represents the predicted chroma samples in the CU, and recL′(i,j) represents the downsampled reconstructed luma samples of the same CU. Threshold is calculated as the average value of the neighboring reconstructed luma samples. Fig. 6 shows an example of classifying the neighboring samples into two groups based on the value Threshold. For each group, the parameters αi and βi (where i equals 1 and 2, respectively) are derived from the linear relationship between the luma and chroma values of two samples: the minimum luma sample A (XA, YA) and the maximum luma sample B (XB, YB) inside the group. Here, XA and YA are the x-coordinate (i.e., luma value) and y-coordinate (i.e., chroma value) of sample A, and XB and YB are the x-coordinate and y-coordinate of sample B. The linear model parameters α and β are obtained according to the following equations:
α = ( YB − YA ) / ( XB − XA )
β = YA − α·XA (26)
Such a method is also called a min-max method. Division in the above equation can be avoided and replaced by multiplication and shifting.
For a coded block having a square shape, the above two equations are directly applied. For non-square coded blocks, neighboring samples of longer boundaries are first sub-sampled to have the same number of samples as the shorter boundaries.
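The two-model selection described above can be sketched as follows. This is an illustrative floating-point helper (names are assumptions of this sketch), where each model is an (α, β) pair and samples at or below the threshold use the first model:

```python
def mmlm_predict(rec_l_ds, threshold, model1, model2):
    """Two-model MMLM prediction: pick a linear model by comparing the
    downsampled luma sample against the classification threshold."""
    alpha, beta = model1 if rec_l_ds <= threshold else model2
    return alpha * rec_l_ds + beta
```

For example, with Threshold = 50, model 1 = (1.0, 2.0), and model 2 = (2.0, 0.0), a luma value of 10 predicts 12.0 while a luma value of 60 predicts 120.0.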
In addition to the scenario in which the above template and the left template are used together to calculate the linear model coefficients, the two templates can alternatively be used in the other two MMLM modes (referred to as the MMLM_A and MMLM_L modes).
In the MMLM_A mode, only the pixel samples in the above template are used to calculate the linear model coefficients. To get more samples, the above template is extended to a size of (W+W). In the MMLM_L mode, only the pixel samples in the left template are used to calculate the linear model coefficients. To get more samples, the left template is extended to a size of (H+H).
Note that when the above reference line is at the CTU boundary, only one luma line (stored in the line buffer for intra prediction) is used to construct the downsampled luma samples.
For chroma intra mode coding, a total of eleven intra modes are allowed. These modes include five traditional intra modes and six cross-component linear model modes (CCLM, LM_A, LM_L, MMLM, MMLM_A, and MMLM_L). The chroma mode signaling and derivation procedure are shown in Table 6. Chroma mode coding directly depends on the intra prediction mode of the corresponding luma block. Since separate block partitioning structures for the luma and chroma components are enabled in I slices, one chroma block may correspond to multiple luma blocks. Therefore, for the chroma DM mode, the intra prediction mode of the corresponding luma block covering the center position of the current chroma block is directly inherited.
TABLE 6 deriving chroma prediction modes from luma modes when MMLM is enabled
The MMLM mode and the LM mode can also be used together in an adaptive manner. For MMLM, the two linear models are as follows:

predC(i,j) = α1·recL′(i,j) + β1, if recL′(i,j) ≤ Threshold
predC(i,j) = α2·recL′(i,j) + β2, if recL′(i,j) > Threshold (27)

where predC(i,j) represents the predicted chroma samples in the CU, and recL′(i,j) represents the downsampled reconstructed luma samples of the same CU. Threshold can be determined simply based on the luma average, together with the minimum and maximum luma values. Fig. 7 shows an example of classifying the neighboring samples into two groups based on the knee point T, indicated by an arrow. The linear model parameters α1 and β1 are derived from the linear relationship between the luma and chroma values of two samples: the minimum luma sample A (XA, YA) and the Threshold sample (XT, YT). The linear model parameters α2 and β2 are derived from the linear relationship between the luma and chroma values of two samples: the maximum luma sample B (XB, YB) and the Threshold sample (XT, YT). Here, XA and YA are the x-coordinate (i.e., luma value) and y-coordinate (i.e., chroma value) of sample A, and XB and YB are the x-coordinate and y-coordinate of sample B. The linear model parameters αi and βi for each group, where i equals 1 and 2, respectively, are obtained according to the following equations:
α1 = ( YT − YA ) / ( XT − XA ), β1 = YA − α1·XA
α2 = ( YB − YT ) / ( XB − XT ), β2 = YT − α2·XT (28)
For a coded block having a square shape, the above equation is directly applied. For non-square coded blocks, neighboring samples of longer boundaries are first sub-sampled to have the same number of samples as the shorter boundaries.
In addition to the scenario in which the above template and the left template are used together to determine the linear model coefficients, the two templates can alternatively be used in the other two MMLM modes (denoted as the MMLM_A and MMLM_L modes, respectively).
In the MMLM_A mode, only the pixel samples in the above template are used to calculate the linear model coefficients. To get more samples, the above template is extended to a size of (W+W). In the MMLM_L mode, only the pixel samples in the left template are used to calculate the linear model coefficients. To get more samples, the left template is extended to a size of (H+H).
Note that when the above reference line is at the CTU boundary, only one luma line (stored in the line buffer for intra prediction) is used to construct the downsampled luma samples.
For chroma intra mode coding, there is a condition check used to select either one of the LM modes (CCLM, LM_A, and LM_L) or one of the multi-model LM modes (MMLM, MMLM_A, and MMLM_L). The condition check is as follows:
where BlkSizeThresLM denotes the minimum block size for the LM modes, and BlkSizeThresMM denotes the minimum block size for the MMLM modes. The symbol d represents a predetermined threshold. In one example, d may take the value 0. In another example, d may take the value 8.
For chroma intra mode coding, a total of eight intra modes are allowed. These modes include five traditional intra modes and three cross-component linear model modes. The chroma mode signaling and derivation procedure are shown in Table 1. Notably, for a given CU coded in a linear model mode, whether it is the conventional single-model LM mode or the MMLM mode is determined based on the condition check above. Unlike the case shown in Table 6, there is no separate MMLM mode to be signaled. Chroma mode coding directly depends on the intra prediction mode of the corresponding luma block. Since separate block partitioning structures for the luma and chroma components are enabled in I slices, one chroma block may correspond to multiple luma blocks. Therefore, for the chroma DM mode, the intra prediction mode of the corresponding luma block covering the center position of the current chroma block is directly inherited.
Scaling (slope) adjustments to the CCLM were proposed as a further improvement during ECM development, for example as described in JVET-Y0055/Z0049.
As discussed above, CCLM uses a model with two parameters to map luma values to chroma values. The scaling parameter "a" and the offset parameter "b" define the mapping as follows:
chromaVal = a * lumaVal + b (30)
It is proposed to signal an adjustment "u" to the scaling parameter to update the model to the following form:
chromaVal = a’ * lumaVal + b’ (31)
where a’ = a + u, and b’ = b − u * yr.
By this selection, the mapping function will tilt or rotate around the point with the luminance value yr. It is proposed to use the average of the reference luminance samples used in the model creation as yr in order to provide meaningful modifications to the model. Fig. 8A to 8B show the effect of the scaling parameter "u", wherein fig. 8A shows a model created without the scaling parameter "u", and fig. 8B shows a model created with the scaling parameter "u".
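The adjustment can be sketched as follows. This is a floating-point illustration assuming u has already been converted into the same units as "a" (the standard signals u in 1/8 units); the function name is ours:

```python
def adjust_cclm_model(a, b, u, y_r):
    """Rotate the CCLM mapping around the luma value y_r:
    a' = a + u, and b' = b - u * y_r, so the prediction at y_r is unchanged."""
    a_adj = a + u
    b_adj = b - u * y_r
    return a_adj, b_adj
```

Note the design property: the prediction at lumaVal = yr is identical before and after the adjustment, which is exactly why the model "rotates" around that point.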
In one example, the scaling adjustment is provided as an integer between -4 and 4, inclusive, and signaled in the bitstream. The unit of the scaling adjustment is 1/8 of a chroma sample value per one luma sample value (for 10-bit content).
In one example, the adjustment is available for the CCLM models that use reference samples both above and to the left of the block ("LM_CHROMA_IDX" and "MMLM_CHROMA_IDX"), but not for the "single-sided" modes. This choice is based on the trade-off between coding efficiency and complexity.
When scaling adjustments are applied to a multi-mode CCLM model, both models may be adjusted and thus, for a single chroma block, at most two scaling updates are signaled.
To enable the scaling adjustment at the encoder, the encoder may perform an SATD-based search for the best scaling update value for Cr and a similar SATD-based search for Cb. If either search results in a non-zero scaling parameter, the combined scaling adjustment pair (the SATD-based update for Cr, the SATD-based update for Cb) is included in the list of RD checks for the TU.
Fusion of chroma intra prediction modes
JVET-Y0092/Z0051 proposed the fusion of chroma intra modes during ECM development.
The intra prediction modes enabled for the chroma components in ECM-4.0 are six cross-component linear model (LM) modes including the CCLM_LT, CCLM_L, CCLM_T, MMLM_LT, MMLM_L, and MMLM_T modes, the direct mode (DM), and four default chroma intra prediction modes. The four default modes are given by the list {0, 50, 18, 1}, and if the DM mode already belongs to the list, the mode in the list is replaced with mode 66.
A decoder-side intra mode derivation (DIMD) method for luma intra prediction is included in ECM-4.0. First, horizontal and vertical gradients are calculated for each reconstructed luma sample in the L-shaped template formed by the second neighboring row and column of the current block to construct a histogram of gradients (HoG). Then, the two intra prediction modes having the largest and the second largest histogram amplitude values are blended with the planar mode to generate the final predictor of the current luma block.
In order to improve the coding efficiency of chroma intra prediction, two methods are proposed: a decoder-side derived chroma intra prediction mode (DIMD chroma) and a fusion of a non-LM mode with the MMLM_LT mode.
In the first embodiment, a DIMD chroma mode is proposed. The proposed DIMD chroma mode uses the DIMD derivation method to derive the chroma intra prediction mode of the current block based on the co-located reconstructed luma samples. Specifically, horizontal and vertical gradients are calculated for each co-located reconstructed luma sample of the current chroma block to construct a HoG, as shown in fig. 8C. Then, intra prediction of the current chroma block is performed using the intra prediction mode having the largest histogram amplitude value.
When the intra prediction mode derived from the DIMD chroma mode is the same as the intra prediction mode derived from the DM mode, the intra prediction mode having the second largest histogram amplitude value is used as the DIMD chroma mode.
As shown in table 7, a CU level flag is signaled to indicate whether the proposed DIMD chroma mode is applied.
Table 7 - Binarization process of intra_chroma_pred_mode in the proposed method
| intra_chroma_pred_mode | Binary bit string | Chroma intra mode |
| 0 | 1100 | List [0] |
| 1 | 1101 | List [1] |
| 2 | 1110 | List [2] |
| 3 | 1111 | List [3] |
| 4 | 10 | DIMD chroma |
| 5 | 0 | DM |
In a second embodiment, a fusion of chroma intra prediction modes is proposed, in which the DM mode and the four default modes can be fused with the MMLM_LT mode as follows:
pred=(w0*pred0+w1*pred1+(1<<(shift-1)))>>shift
where pred0 is the predictor obtained by applying the non-LM mode, pred1 is the predictor obtained by applying the MMLM_LT mode, and pred is the final predictor of the current chroma block. The two weights (w0 and w1) are determined by the intra prediction modes of the neighboring chroma blocks, and shift is set equal to 2. Specifically, {w0, w1} = {1, 3} when both the above and left neighboring blocks are coded with LM modes; {w0, w1} = {3, 1} when both the above and left neighboring blocks are coded with non-LM modes; and {w0, w1} = {2, 2} otherwise.
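The weight selection and the blend above can be sketched as follows (an illustrative transcription; the names are not from the ECM source):

```python
def fuse_chroma_pred(pred0, pred1, above_is_lm, left_is_lm, shift=2):
    """Fuse a non-LM predictor (pred0) with the MMLM_LT predictor (pred1):
    pred = (w0*pred0 + w1*pred1 + (1 << (shift-1))) >> shift,
    with {w0, w1} chosen from the coding modes of the above/left neighbors."""
    if above_is_lm and left_is_lm:
        w0, w1 = 1, 3        # both neighbors LM-coded: favor MMLM_LT
    elif not above_is_lm and not left_is_lm:
        w0, w1 = 3, 1        # both neighbors non-LM-coded: favor non-LM
    else:
        w0, w1 = 2, 2        # mixed neighbors: equal weights
    return (w0 * pred0 + w1 * pred1 + (1 << (shift - 1))) >> shift
```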
Regarding the syntax design, if a non-LM mode is selected, a flag is signaled to indicate whether the fusion is applied. The proposed fusion is applied only to I slices.
In a third embodiment, the DIMD chroma mode is combined with the fusion of chroma intra prediction modes. Specifically, the DIMD chroma mode described in the first embodiment is applied, and for I slices, the DM mode, the four default modes, and the DIMD chroma mode can be fused with the MMLM_LT mode using the weights described in the second embodiment, while for non-I slices, only the DIMD chroma mode can be fused with the MMLM_LT mode using equal weights.
In a fourth embodiment, the DIMD chroma mode with reduced processing is combined with the fusion of chroma intra prediction modes. Specifically, the DIMD chroma mode with reduced processing derives the intra mode based on the neighboring reconstructed Y, Cb, and Cr samples in the second neighboring rows and columns, as shown in fig. 8D. The other portions are the same as those of the third embodiment.
In one embodiment, when DIMD is applied, two intra modes are derived from the reconstructed neighboring samples, and the two predictors are combined with the planar mode predictor using weights derived from the gradients, as described in JVET-O0449. The division operations in the weight derivation are performed using the same look-up table (LUT) based integerization scheme as used by CCLM. For example, the division in the orientation calculation
Orient=Gy/Gx
Is calculated by the following LUT-based scheme:
x=Floor(Log2(Gx))
normDiff=((Gx<<4)>>x)&15
x+=3+((normDiff!=0)?1:0)
Orient=(Gy*(DivSigTable[normDiff]|8)+(1<<(x-1)))>>x
Wherein,
DivSigTable[16]={0,7,6,5,5,4,4,3,3,2,2,1,1,1,1,0}。
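The LUT-based division above can be exercised directly; the sketch below transcribes the pseudo-code into Python, computing Floor(Log2(Gx)) via bit_length (assuming Gx > 0) and reading the ternary as adding 3 plus one more when normDiff is non-zero:

```python
DivSigTable = [0, 7, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 1, 1, 0]

def dimd_orient(Gy, Gx):
    """LUT-based integer approximation of Orient = Gy / Gx (Gx > 0),
    avoiding an actual division, in the spirit of the CCLM
    integerization scheme."""
    x = Gx.bit_length() - 1                 # Floor(Log2(Gx)) for Gx > 0
    normDiff = ((Gx << 4) >> x) & 15
    x += 3 + (1 if normDiff != 0 else 0)
    return (Gy * (DivSigTable[normDiff] | 8) + (1 << (x - 1))) >> x
```

When Gx is an exact power of two, normDiff is 0 and the result reduces to (Gy * 8 + rounding) >> (log2(Gx) + 3), i.e., the exact integer quotient Gy / Gx with rounding.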
The derived intra mode is included in the primary list of intra most probable modes (MPM), and thus the DIMD process is performed before the MPM list is constructed. The primary derived intra mode of a DIMD block is stored with the block and is used for the MPM list construction of the neighboring blocks.
Fig. 8E to 8H show the steps of decoder-side intra mode derivation, in which the intra prediction direction is estimated without intra mode signaling. The first step, shown in fig. 8E, involves estimating the gradient of each sample (for the light gray samples shown in fig. 8E). The second step, shown in fig. 8F, involves mapping the gradient values to the nearest prediction direction within [2, 66]. The third step, shown in fig. 8G, involves selecting 2 prediction directions, wherein for each prediction direction, the absolute gradients Gx and Gy of the neighboring pixels mapped to that direction are summed, and the top 2 directions are selected. The fourth step, shown in fig. 8H, involves enabling weighted intra prediction with the selected directions.
Multiple reference line (MRL) intra prediction uses more reference lines for intra prediction. In fig. 9, an example of 4 reference lines is depicted, where the samples of segments A and F are not taken from reconstructed neighboring samples but are padded with the closest samples from segments B and E, respectively. HEVC intra-picture prediction uses the nearest reference line (i.e., reference line 0). In MRL, 2 additional lines (reference line 1 and reference line 3) are used.
The index of the selected reference line (mrl_idx) is signaled and used to generate the intra predictor. For a reference line index greater than 0, only the modes in the MPM list are allowed, and only the MPM index is signaled without the remaining modes. The reference line index is signaled before the intra prediction mode, and the planar mode is excluded from the intra prediction modes in case a non-zero reference line index is signaled.
MRL is disabled for the blocks of the first line inside a CTU to prevent using extended reference samples outside the current CTU line. In addition, PDPC is disabled when an additional line is used. For the MRL mode, the derivation of the DC value in the DC intra prediction mode for a non-zero reference line index is aligned with that for reference line index 0. MRL requires the storage of 3 neighboring luma reference lines within a CTU to generate predictions. The cross-component linear model (CCLM) tool also requires 3 neighboring luma reference lines for its downsampling filters. The definition of MRL to use the same 3 lines is aligned with CCLM to reduce the storage requirements of decoders.
During ECM development, a convolutional cross-component model (CCCM) for chroma intra prediction was proposed.
It is proposed to apply a convolutional cross-component model (CCCM) to predict chroma samples from reconstructed luma samples, in a spirit similar to the current CCLM modes. As with CCLM, when chroma subsampling is used, the reconstructed luma samples are downsampled to match the lower-resolution chroma grid.
Furthermore, similarly to CCLM, there is an option to use a single-model or a multi-model variant of CCCM. The multi-model variant uses two models, one model derived for samples above the average luma reference value and the other model for the remaining samples (following the spirit of the CCLM design). The multi-model CCCM mode can be selected for PUs that have at least 128 available reference samples.
The proposed convolutional 7-tap filter consists of a 5-tap plus-sign-shaped spatial component, a nonlinear term, and a bias term. The input to the spatial 5-tap component of the filter consists of a center (C) luma sample, which is co-located with the chroma sample to be predicted, and its above/north (N), below/south (S), left/west (W), and right/east (E) neighbors, as shown in fig. 10A.
The nonlinear term P is represented as the square of the center luma sample C and scaled to the sample value range of the content:
P=(C*C+midVal)>>bitDepth
That is, for 10-bit content, it is calculated as:
P=(C*C+512)>>10
The offset term B represents the scalar offset between the input and output (similar to the offset term in CCLM) and is set to an intermediate chroma value (512 for 10-bit content).
The output of the filter is calculated as a convolution between the filter coefficients ci and the input values, and is clipped to the range of valid chroma samples:
predChromaVal=c0C+c1N+c2S+c3E+c4W+c5P+c6B
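A minimal sketch of the filter output described above (the coefficients are taken here as plain numbers; the actual ECM implementation uses fixed-point integer arithmetic, and the names are illustrative):

```python
def cccm_nonlinear_term(C, bit_depth=10):
    """P = (C*C + midVal) >> bitDepth, with midVal = 1 << (bit_depth - 1);
    for 10-bit content this is P = (C*C + 512) >> 10."""
    return (C * C + (1 << (bit_depth - 1))) >> bit_depth

def cccm_predict(c, C, N, S, E, W, bit_depth=10):
    """predChromaVal = c0*C + c1*N + c2*S + c3*E + c4*W + c5*P + c6*B,
    clipped to the valid chroma sample range. c is the list [c0..c6]."""
    P = cccm_nonlinear_term(C, bit_depth)
    B = 1 << (bit_depth - 1)                  # bias term: 512 for 10-bit
    val = c[0]*C + c[1]*N + c[2]*S + c[3]*E + c[4]*W + c[5]*P + c[6]*B
    return min(max(int(val), 0), (1 << bit_depth) - 1)
```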
The filter coefficients ci are calculated by minimizing the MSE between the predicted and reconstructed chroma samples in the reference region. Fig. 10B shows the reference region consisting of 6 lines of chroma samples above and to the left of the PU. The reference region extends one PU width to the right and one PU height below the PU boundaries. The region is adjusted to include only available samples. The area shown in blue is needed to support the "side samples" of the plus-shaped spatial filter and is padded when falling into an unavailable area.
MSE minimization is performed by computing an autocorrelation matrix for the luminance input and a cross-correlation vector between the luminance input and the chrominance output. LDL decomposition is performed on the autocorrelation matrix and back-substitution is used to calculate the final filter coefficients. This process generally follows the calculation of ALF filter coefficients in the ECM, however LDL decomposition is chosen instead of Cholesky decomposition to avoid the use of square root operations. The proposed method uses only integer arithmetic.
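To illustrate the solver choice, the sketch below performs a floating-point LDL decomposition (A = L·D·Lᵀ) followed by substitution; the ECM implementation uses integer arithmetic only, but the structure, and the absence of square roots (unlike Cholesky), is the same:

```python
def ldl_solve(A, b):
    """Solve A x = b for a symmetric positive-definite matrix A via
    LDL decomposition, then forward substitution (L y = b), diagonal
    scaling (D z = y), and back substitution (L^T x = z)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    D = [0.0] * n
    for j in range(n):
        D[j] = A[j][j] - sum(L[j][k] * L[j][k] * D[k] for k in range(j))
        L[j][j] = 1.0
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] * D[k]
                                     for k in range(j))) / D[j]
    y = [0.0] * n
    for i in range(n):                       # forward substitution
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
    z = [y[i] / D[i] for i in range(n)]      # diagonal scaling
    x = [0.0] * n
    for i in reversed(range(n)):             # back substitution
        x[i] = z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))
    return x
```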
The use of this mode is signaled with a CABAC-coded PU-level flag. One new CABAC context is included to support this. In terms of signaling, CCCM is considered a sub-mode of CCLM. That is, the CCCM flag is signaled only when the intra prediction mode is LM_CHROMA_IDX (to enable the single-model CCCM mode) or MMLM_CHROMA_IDX (to enable the multi-model CCCM mode).
The encoder performs two new RD checks in the chroma prediction mode loop, one for checking the single model CCCM mode and one for checking the multi-model CCCM mode.
In the existing CCLM or MMLM designs, the neighboring reconstructed luma-chroma sample pairs are classified into one or more sample groups based on a value Threshold, which considers only the luma DC values. That is, a luma-chroma sample pair is classified by considering only the intensity of the luma sample. However, the luma component usually retains rich textures, and a current luma sample may be highly correlated with its neighboring luma samples. Such inter-sample correlation (AC correlation) may benefit the classification of luma-chroma sample pairs and may bring additional coding efficiency.
As shown in fig. 10C, CCLM assumes that a given chroma sample is correlated only with the corresponding luma sample (L0.5, which may be taken as a fractional luma sample position), and predicts the given chroma sample with a simple linear regression (SLR) model estimated using ordinary least squares (OLS). However, as shown in fig. 10D, in some video content, one chroma sample may be simultaneously correlated (AC or DC correlation) with multiple luma samples, so a multiple linear regression (MLR) model may further improve the prediction accuracy.
Although CCCM mode can enhance intra prediction efficiency, there is room for further improvement in its performance. At the same time, some portions of the existing CCCM modes also need to be simplified to achieve efficient codec hardware implementations, or improved to have better codec efficiency. Furthermore, the tradeoff between implementation complexity and its codec efficiency benefits needs to be further improved.
Edge classification linear model (ELM)
In order to improve the codec efficiency of the luminance component and the chrominance component, a classifier that considers luminance edges or AC information is introduced, contrary to the above implementation in which only luminance DC values are considered. In addition to the existing band classification MMLM, the present disclosure also provides an exemplary classifier. The process of generating linear prediction models for different sets of points may be similar to CCLM or MMLM (e.g., via least squares or simplified min-max methods, etc.), but classified using different metrics. Different classifiers may be used to classify neighboring luma samples (e.g., of neighboring luma-chroma sample pairs) and/or luma samples corresponding to chroma samples to be predicted. Luminance samples corresponding to chroma samples may be obtained by a downsampling operation to match the positions of the corresponding chroma samples of the 4:2:0 video sequence. For example, luminance samples corresponding to chroma samples may be obtained by performing a downsampling operation on more than one (e.g., 4) reconstructed luminance samples (e.g., located around the chroma samples) corresponding to the chroma samples. Alternatively, for example, in the case of a 4:4:4 video sequence, luminance samples may be obtained directly from the reconstructed luminance samples. Alternatively, luminance samples may be obtained from respective ones of the reconstructed luminance samples located at respective co-located positions of the corresponding chrominance samples. For example, a luminance sample to be classified may be obtained from one reconstructed luminance sample of four reconstructed luminance samples corresponding to a chroma sample, which is located at an upper left position of the four reconstructed luminance samples, which may be regarded as a co-located position of the chroma sample.
The first classifier may classify the luma samples according to the luma sample edge strength. For example, one direction (e.g., 0 degrees, 45 degrees, or 90 degrees, etc.) may be selected to calculate the edge strength. A direction may be formed by the current sample and a neighboring sample along that direction (e.g., a neighboring sample located at 45 degrees to the upper right of the current sample). The edge strength may be calculated by subtracting the neighboring sample from the current sample. The edge strength may be quantized into one of M segments by M-1 thresholds, and the first classifier may use M classes to classify the current sample. Alternatively or additionally, N directions may be formed by the current sample and N neighboring samples along the N directions. N edge strengths may be calculated by subtracting the N neighboring samples from the current sample, respectively. Similarly, if each of the N edge strengths can be quantized into one of M segments by M-1 thresholds, the first classifier can use M^N classes to classify the current sample.
The second classifier may be used to classify according to local patterns. For example, the current luminance sample Y0 may be compared with N luminance samples Yi adjacent thereto. If the value of Y0 is greater than the value of Yi, the score may be incremented by one, otherwise the score may be decremented by one. The scores may be quantized to form K classes. The second classifier may classify the current sample point into one of the K classes. For example, the neighboring luminance samples may be obtained from four neighboring samples located above, to the left, to the right, and below the current luminance sample, i.e., without diagonal neighboring samples.
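The two classifiers can be sketched as follows (an illustrative reading; the exact quantization rules are design choices not fixed by the text above):

```python
def classify_edge_strength(cur, neighbor, thresholds):
    """First classifier: the edge strength along one chosen direction is
    the neighboring sample subtracted from the current sample; it is
    quantized into one of M = len(thresholds) + 1 segments by the M-1
    thresholds."""
    edge = cur - neighbor
    return sum(1 for t in thresholds if edge >= t)

def classify_local_pattern(y0, neighbors):
    """Second classifier: compare the current luma sample y0 with its N
    neighbors (e.g., above, left, right, below); the score is incremented
    by one when y0 is greater and decremented by one otherwise. The
    resulting score in [-N, N] may then be quantized into K classes."""
    return sum(1 if y0 > yi else -1 for yi in neighbors)
```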
It is contemplated that multiple first classifiers, second classifiers, or different instances of either the first classifier or the second classifier or other classifiers described herein may be combined. For example, the first classifier may be combined with an existing MMLM-threshold-based classifier. For another example, instance a of the first classifier may be combined with another instance B of the first classifier, wherein instances a and B take different directions (e.g., vertical and horizontal directions, respectively).
Those skilled in the art will recognize that, while the existing CCLM design in the VVC standard is used in this description as the basic CCLM method, the proposed cross-component methods described in this disclosure can also be applied to other prediction coding tools with a similar design spirit. For example, for the chroma-from-luma (CfL) tool in the AV1 standard, the proposed methods can also be applied by dividing the luma-chroma sample pairs into multiple sample groups.
Those skilled in the art will recognize that Y/Cb/Cr may also be denoted Y/U/V in the field of video encoding and decoding. For example, if the video data is in RGB format, the proposed method can also be applied by simply mapping YUV symbols to GBR.
Filter-based linear model (FLM)
Considering the possibility that one chroma sample may be correlated with multiple luma samples simultaneously, a filter-based linear model (FLM) using an MLR model was introduced, as described below.
For a chroma sample to be predicted, the reconstructed co-located and neighboring luma samples can be used to predict the chroma sample, in order to capture the inter-sample correlation among the co-located luma sample, the neighboring luma samples, and the chroma sample. The reconstructed luma samples are linearly weighted and combined with an "offset" to generate the predicted chroma sample (C: predicted chroma sample, Li: i-th reconstructed co-located or neighboring luma sample, αi: filter coefficients, β: offset, N: number of filter taps), as shown in equation (32-1) below:
C = α0 * L0 + α1 * L1 + ... + αN-1 * LN-1 + β (32-1)
Note that the linearly weighted luma samples plus the offset value directly form the predicted chroma sample (the filtering can be low-pass or high-pass, adapting to the video content), and the residual is then added to form the reconstructed chroma sample.
In some implementations, like CCCM, the offset term may also be implemented as the middle chroma value B (512 for 10-bit content) multiplied by another coefficient, as shown in equation (32-2) below:
C = α0 * L0 + α1 * L1 + ... + αN-1 * LN-1 + αN * B (32-2)
For a given CU, the top and left reconstructed luma and chroma samples can be used to derive or train the FLM parameters (αi, β). Like CCLM, αi and β can be derived via OLS. The top and left training samples are collected, and the pseudo-inverse matrix is calculated at both the encoder and decoder sides to derive the parameters, which are then used to predict the chroma samples in the given CU. Let N denote the number of filter taps applied to the luma samples, M denote the total number of top and left reconstructed luma-chroma sample pairs used for training the parameters, Li,j denote the luma sample with the i-th sample pair and the j-th filter tap, and Ci denote the chroma sample with the i-th sample pair; the following equations show the derivation of the pseudo-inverse matrix A+ as well as the parameters. Fig. 11 shows an example in which N is 6 (6 taps), M is 8, and the top 2 rows and left 3 columns of luma samples and the top 1 row and left 1 column of chroma samples are used to derive or train the parameters.
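A small floating-point sketch of the OLS derivation (x = (AᵀA)⁻¹Aᵀb = A⁺b), solving the normal equations by elimination; the function and variable names are illustrative, not from any codec software:

```python
def flm_derive_params(luma_taps, chroma):
    """Derive FLM coefficients (alpha_0..alpha_{N-1}, beta) by ordinary
    least squares. luma_taps[i] holds the N luma tap values of the i-th
    training sample pair; a constant 1 is appended per row so that the
    last returned coefficient is the offset beta."""
    A = [list(row) + [1.0] for row in luma_taps]       # M x (N+1)
    n = len(A[0])
    # Normal equations: (A^T A) x = A^T b
    AtA = [[sum(r[i] * r[j] for r in A) for j in range(n)] for i in range(n)]
    Atb = [sum(r[i] * c for r, c in zip(A, chroma)) for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting on [AtA | Atb]
    M = [AtA[i] + [Atb[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]
```

With training data generated by an exact linear model, the derivation recovers the model parameters, mirroring how the encoder and decoder both compute the same (αi, β) from the top/left reconstructed samples.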
Note that without the offset β, the chroma samples can be predicted using only αi; this may be regarded as a subset of the proposed method.
The proposed ELM/FLM/GLM (discussed below) can be extended straightforwardly to the CfL design in the AV1 standard, which transmits the model parameters (α, β) explicitly. For example, α and/or β are derived at the encoder at the SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level and signaled to the decoder for the CfL mode.
To further improve the coding performance, additional designs can be used in the FLM prediction. As shown in fig. 11 and discussed above, a 6-tap luma filter is used for the FLM prediction. However, although a multi-tap filter can fit the training data well (e.g., the top and left neighboring reconstructed luma and chroma samples), in some cases the training data do not capture the full characteristics of the testing data, which may lead to overfitting and poor prediction of the testing data (i.e., the chroma block samples to be predicted). Moreover, different filter shapes can adapt well to different video block contents, leading to more accurate prediction.
To solve this problem, the filter shape and the number of filter taps may be predefined or signaled or switched in a Sequence Parameter Set (SPS), an Adaptive Parameter Set (APS), a Picture Parameter Set (PPS), a Picture Header (PH), a Slice Header (SH), a region, a CTU, a CU, a sub-block or a sample level. A set of filter shape candidates may be predefined and the selection of the set of filter shape candidates may be signaled or switched in SPS, APS, PPS, PH, SH, region, CTU, CU, sub-block or sample level. Different components (e.g., U and V) may have different filter switching controls. For example, a set of filter shape candidates (e.g., indicated by indices 0 through 5) may be predefined, and filter shape (1, 2) may represent a 2-tap luma filter, filter shape (1, 2, 4) may represent a 3-tap luma filter, etc., as shown in fig. 11. The filter shape selection of the U and V components may be switched in PH or CU or CTU levels. Note that N taps may represent N taps with or without the offset β described herein. An example is given in table 8 below.
Table 8-exemplary signaling and switching for different filter shapes
Different chroma types and/or color formats can have different predefined filter shapes and/or taps. For example, as shown in fig. 12, a predefined filter shape (1, 2, 4, 5) may be used for 4:2:0 type-0, a predefined filter shape (0, 1, 2, 4, 7) may be used for 4:2:0 type-2, a predefined filter shape (1, 4) may be used for 4:2:2, and a predefined filter shape (0, 1, 2, 3, 4, 5) may be used for 4:4:4.
In another aspect of the present disclosure, the unavailable luma and chroma samples used to derive the MLR model can be padded from the available reconstructed samples. For example, if a 6-tap (0, 1, 2, 3, 4, 5) filter as shown in fig. 12 is used, then for a CU located at the left picture boundary, the left column including sample (0, 3) is not available (outside the picture boundary), so sample (0, 3) is padded by repetition from sample (1, 4) to apply the 6-tap filter. Note that the padding process can be applied to both the training data (the top and left neighboring reconstructed luma and chroma samples) and the testing data (the luma and chroma samples in the CU).
One or more shapes/numbers of filter taps may be used for FLM prediction, examples of which are shown in fig. 16, 17, and 18A-18B. One or more sets of filter taps may be used for FLM prediction, examples of which are shown in fig. 19A-19G.
As described above, the MLR model (linear equations) must be derived at both the encoder and the decoder. According to one or more aspects of the present disclosure, several methods are proposed to derive the pseudo-inverse matrix A+ or to solve the linear equations directly. Other known methods, such as the Newton method, the Cayley-Hamilton method, and the eigendecomposition mentioned in https://en., can also be applied.
In the present disclosure, A+ may be denoted as A^-1 for simplification. The linear equations can be solved by the methods below.
1. Closed-form (analytical) solution of A^-1 via the adjugate matrix (adjA), as follows:
The following shows the general N×N form, together with the 2×2 and 3×3 cases. If 3×3 is used for the FLM, 2 scaling parameters plus one offset need to be solved.
Ax = b, x = (A^T A)^-1 A^T b = A+ b (denoted as A^-1 b)
where each cofactor is the determinant of the (n-1)×(n-1) submatrix obtained by removing the j-th row and the i-th column.
2. Gauss-Jordan elimination
The linear equations can be solved using Gauss-Jordan elimination, by applying a series of elementary row operations to the augmented matrix [A | In] to obtain the reduced row echelon form [I | X]. The 2×2 and 3×3 examples are shown below.
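A small floating-point sketch of this method, reducing [A | I] to [I | A⁻¹] with elementary row operations (partial pivoting is added here for numerical safety):

```python
def gauss_jordan_inverse(A):
    """Invert a square matrix A by Gauss-Jordan elimination on the
    augmented matrix [A | I], producing the reduced row echelon form
    [I | X] with X = A^-1."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(i == j) for j in range(n)]
         for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        d = M[col][col]
        M[col] = [v / d for v in M[col]]      # scale the pivot row to 1
        for r in range(n):
            if r != col:                      # eliminate the other rows
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]
```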
3. Cholesky decomposition
To solve Ax = b, A can first be decomposed by the Cholesky-Crout algorithm to obtain an upper triangular matrix and a lower triangular matrix, and then forward substitution and backward substitution are applied in sequence to obtain the solution. A 3×3 example is shown below.
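A floating-point sketch of the Cholesky-Crout path described above (the special handling for non-decomposable matrices discussed later is omitted here):

```python
import math

def cholesky_solve(A, b):
    """Solve A x = b for symmetric positive-definite A: decompose
    A = L L^T with the Cholesky-Crout algorithm, then apply forward
    substitution (L y = b) and backward substitution (L^T x = y)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(A[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k]
                                     for k in range(j))) / L[j][j]
    y = [0.0] * n
    for i in range(n):                       # forward substitution
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):             # backward substitution
        x[i] = (y[i] - sum(L[k][i] * x[k]
                           for k in range(i + 1, n))) / L[i][i]
    return x
```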
In addition to the above examples, some cases require special handling. For example, if some circumstances lead to linear equations that cannot be solved, a default value can be used to fill the chroma prediction values. The default value can be predefined or signaled or switched in the SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level; for example, 1 << (bitDepth - 1), meanC, meanL, or meanC - meanL (the average of the current chroma, or of other chroma or luma values available in the FLM reconstructed neighboring region, or a subset thereof) can be predefined as the default value.
The following examples represent cases in which the matrix A cannot be solved, where a default predictor may be assigned to the entire current block:
1. Solving by the closed form (analytical solution, adjugate matrix), but A is singular (i.e., detA = 0);
2. Solving by Cholesky decomposition, but A cannot be Cholesky decomposed (Gjj < REG_SQR), where REG_SQR is a small value that can be predefined or signaled or switched in the SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level.
Fig. 11 shows a typical case of deriving FLM parameters using top 2 and/or left 3 luminance lines and top 1 and/or left 1 chrominance lines. However, as described above, parameter derivation using different regions may bring codec benefits due to different block contents and reconstruction quality of different neighboring samples. Several methods of selecting an application area for parameter derivation are presented below:
1. Similar to MDLM, FLM derivation can use only top or left luminance and/or chrominance samples to derive parameters. Whether FLM, flm_l or flm_t is used may be predefined or signaled or switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. Assuming that the current chroma block size is w×h, W 'and H' are obtained as follows:
-when FLM mode is applied, W '=w, H' =h;
when flm_t mode is applied, W' =w+we, where We represents the extended top luminance/chrominance samples;
When flm_l mode is applied, H' =h+he, where He represents the extended left luminance/chrominance sample point.
The number of extended luminance/chrominance samples (We, he) may be predefined or signaled or switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level.
For example, (We, He) can be predefined as (H, W), as in VVC CCLM, or as (W, H), as in ECM CCLM. The unavailable (We, He) luma/chroma samples can be padded by repetition from the nearest (horizontal, vertical) luma/chroma samples.
Fig. 13 shows a graphical representation of flm_l and flm_t (e.g., under 4 taps). When flm_l or flm_t is applied, only H 'or W' luminance/chrominance samples are used for parameter derivation, respectively.
2. Similar to MRL, different row indices may be predefined or signaled or switched in SPS/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample levels to indicate the selected luma-chroma sample pair rows. This may benefit from different reconstruction quality for different rows of samples.
Fig. 14 shows that, like the MRL, the FLM may use different row parameter derivation (e.g., under 4 taps). For example, the FLM may use light blue/yellow luminance and/or chrominance samples in index 1.
3. The CCLM region is extended and the full top N and/or left M rows are obtained for parameter derivation. Fig. 14 shows that all deep blue and light blue and yellow regions may be used simultaneously. Training with larger regions (data) may result in a more robust MLR model.
It should be understood that, throughout this disclosure, the luma sample values of an outer region of a video block to be decoded may be referred to as "outer luma sample values," and the chroma sample values of the outer region may be referred to as "outer chroma sample values."
The corresponding syntax for the FLM prediction can be defined as in table 9 below, where FLC denotes a fixed-length code, TU denotes a truncated unary code, EGk denotes a k-th order exponential-Golomb code, where k can be fixed or signaled/switched in the SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level, SVLC denotes the signed EG0 code, and UVLC denotes the unsigned EG0 code.
Table 9-examples of FLM syntax
Note that the binarization of each syntax element can be changed.
Based on the existing linear model design, a new method for cross-component prediction is provided to further improve coding and decoding accuracy and efficiency. The main aspects of the proposed method are detailed below.
While the FLM discussed above provides the best flexibility (best performance), it needs to solve many unknown parameters if the number of filter taps increases. When the inverse matrix is larger than 3×3, the closed-form derivation is unsuitable (too many multipliers), and an iterative method such as the Cholesky decomposition is required, which burdens the decoder processing cycles. In this section, pre-operations applied before the linear model are presented, including using sample gradients to exploit the correlation between the luma AC information and the chroma intensities. With the gradients, the number of filter taps can be reduced efficiently.
Note that the methods/examples in this section may be combined/reused from any of the designs discussed above, including but not limited to classification, filter shape, matrix derivation (with special handling), application area, syntax. Furthermore, the methods/examples listed in this section may also be applied to any of the designs discussed above to achieve better performance under certain complexity trade-offs.
Note that reference samples/training templates/reconstructed neighboring areas as used herein generally refer to luminance samples used to derive MLR model parameters that are then applied to internal luminance samples in one CU to predict chroma samples in the CU.
According to the proposed method, pre-operations (e.g., pre-linear weighting, sign, scale/absolute, thresholding, reLU) can be applied to reduce the dimension of the unknown parameters, instead of directly using the luminance sample intensity values as inputs to the linear model. In one example, the pre-operation may include calculating a sample difference based on the luminance sample value. As will be appreciated by those skilled in the art, the sample differences may be characterized as gradients, and thus this new approach is also referred to as a Gradient Linear Model (GLM) in certain embodiments.
Note that the following detailed description discusses a scenario in which the proposed pre-operations can be reused for/in combination with an SLR model (also referred to as a 1-tap case) and for/in combination with an MLR model (also referred to as a multi-tap case, e.g. 2 taps).
For example, instead of applying 2 taps on 2 luminance samples, pre-operations can be performed on the 2 luminance samples first, and then a simpler 1-tap model can be applied to reduce complexity. Figs. 15A to 15D show some examples of 1-tap/2-tap (with offset) pre-operations, where the 2-tap coefficients are denoted (a, b). Note that each circle shown in figs. 15A-15D represents an illustrative chroma position in the YUV 4:2:0 format. As discussed above, in the YUV 4:2:0 format, the luma sample corresponding to a chroma sample may be obtained by performing a downsampling operation on more than one (e.g., 4) reconstructed luma samples corresponding to the chroma sample (e.g., located around the chroma sample). In other words, the chroma position may correspond to one or more luminance samples, including a co-located luminance sample. Different 1-tap modes are designed for different gradient directions and use different "interpolated" luminance samples (weighted at different luminance positions) for the gradient calculation. For example, a typical filter [1, 0, -1; 1, 0, -1] is shown in figs. 15A, 15C and 15D, which represents the following operation:
Where RecL represents the reconstructed luminance sample value and RecL″(i, j) represents the pre-operated luminance sample value. Note also that the 1-tap filter shown in figs. 15A, 15C, and 15D can be understood as an alternative to the downsampling filter used in CCLM (see equations (6)-(7)), with the filter coefficients changed.
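As an illustration, the horizontal 1-tap GLM filter [1, 0, -1; 1, 0, -1] can be applied at a chroma position as sketched below in Python. The window placement mirrors the CCLM 6-tap downsampling window [1,2,1; 1,2,1]/8; the exact indexing convention (co-located luma at (2i, 2j)) and the function name are illustrative assumptions, not mandated by the text.

```python
# Hedged sketch: 1-tap GLM horizontal gradient at a chroma position in
# YUV 4:2:0 (type-0). The 2x3 window covers the same luma samples as the
# CCLM downsampling filter; only the coefficients differ.

def glm_gradient(rec_luma, ci, cj):
    """Horizontal gradient for the chroma sample at column ci, row cj
    (indexing convention assumed for illustration)."""
    x, y = 2 * ci, 2 * cj            # co-located luma position (assumption)
    win = [[rec_luma[y + r][x + c] for c in (-1, 0, 1)] for r in (0, 1)]
    coef = [[1, 0, -1], [1, 0, -1]]  # GLM filter replacing [1,2,1; 1,2,1]/8
    return sum(coef[r][c] * win[r][c] for r in range(2) for c in range(3))
```

On a horizontal luma ramp the filter responds with a constant nonzero value, while on a flat region it responds with zero, which is the high-pass behavior the text describes.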
The pre-operations may be based on gradients, edge direction (detection), pixel intensity, pixel variation, pixel variance, Roberts/Prewitt/compass/Sobel/Laplacian operators, high-pass filters (computing gradients or other related operators), low-pass filters (performing weighted average operations), etc. The edge direction detectors listed in the examples may be extended to different edge directions. For example, 1-tap (1, -1) or 2-tap (a, b) filters may be applied in different directions to detect different edge gradients. The filter shape/coefficients may be symmetric about the chroma position, as in the examples of figs. 15A-15D (4:2:0 type-0 case).
The pre-operation parameters (coefficients, signs, scale/absolute value, thresholding, ReLU) may be fixed or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. Note that, in an example, if multiple coefficients are applied to one sample (e.g., -1 and 4), they may be combined (e.g., into 3) to reduce the number of operations.
In one example, the pre-operation may involve calculating sample differences of the luminance sample values. Alternatively, the pre-operation may include performing downsampling by a weighted average operation. In some cases, the pre-operations may be applied repeatedly. For example, a low-pass smoothing FIR filter [1,2,1]/4 or [1,2,1; 1,2,1]/8 (i.e., downsampling) may first be applied as a template filter to remove outliers, and then a 1-tap GLM filter may be applied to calculate the sample differences used to derive the linear model. It is also contemplated that the sample differences may be calculated first, and downsampling applied afterwards.
In one example, the pre-operation coefficients (the finally applied coefficient (e.g., 3), or the intermediate coefficients applied to each luminance sample (e.g., -1 and 4)) may be limited to powers of 2 to save multipliers.
In one aspect of the present disclosure, the proposed new method may be reused for/combined with the CCLM discussed above, which utilizes a Simple Linear Regression (SLR) model and uses one corresponding luma sample value to predict the chroma sample value. This is also referred to as the 1-tap case. In this case, deriving the linear model further comprises deriving the scaling parameter α and the offset parameter β by using the pre-operated neighboring luma sample values and neighboring chroma sample values. Accordingly, the linear model may be rewritten as:
C=α·L+β (35)
Where L here denotes the "pre-operated" luminance samples. The parameter derivation of the 1-tap GLM may reuse the CCLM design, but considers the directional gradient (possibly with a high-pass filter). In one example, the scaling parameter α may be derived by using a division look-up table (as described in detail below) to achieve simplification.
In one example, when combining the GLM with the SLR model, the scaling parameter α and the offset parameter β may be derived by utilizing the min-max method discussed above. Specifically, the scaling parameter α and the offset parameter β may be derived by comparing the pre-operated neighboring luminance sample values to determine a minimum luminance sample value XA and a maximum luminance sample value XB, determining the corresponding chrominance sample values YA and YB of the minimum luminance sample value XA and the maximum luminance sample value XB, respectively, and deriving the scaling parameter α and the offset parameter β based on the minimum luminance sample value XA, the maximum luminance sample value XB, and the corresponding chrominance sample values YA and YB according to the following equations:
α=(YB-YA)/(XB-XA)
β=YA-α·XA (36)
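The min-max derivation above can be sketched as follows, with XA/XB the minimum/maximum pre-operated luma values and YA/YB the chroma values at the same positions, consistent with equation (36). Floating-point division is used here for clarity; an actual implementation would use the division LUT discussed later, and the function name is illustrative.

```python
def derive_slr_minmax(grad_luma, chroma):
    """Min-max SLR derivation: XA/XB are the min/max (pre-operated) luma
    values, YA/YB the chroma values at the same positions (equation (36))."""
    ia = min(range(len(grad_luma)), key=lambda k: grad_luma[k])
    ib = max(range(len(grad_luma)), key=lambda k: grad_luma[k])
    XA, XB = grad_luma[ia], grad_luma[ib]
    YA, YB = chroma[ia], chroma[ib]
    alpha = (YB - YA) / (XB - XA) if XB != XA else 0.0
    beta = YA - alpha * XA          # beta = YA - alpha * XA, per (36)
    return alpha, beta
```

The derived model then predicts a chroma value as alpha * L + beta, per equation (35).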
In one example, the scaling adjustment discussed above may be reused when combining the GLM with the SLR model. In this case, the encoder may determine a scaling adjustment value (e.g., "u") to be signaled in the bitstream and add the scaling adjustment value to the derived scaling parameter α. The decoder may determine the scaling adjustment value (e.g., "u") from the bitstream and add it to the derived scaling parameter α. The adjusted value is ultimately used to predict the internal chroma sample values.
In one aspect of the present disclosure, the proposed new method can be reused for/combined with the FLM, which utilizes a Multiple Linear Regression (MLR) model and uses multiple luminance sample values to predict the chroma sample value. This is also referred to as the multi-tap case, e.g., 2 taps. In this case, the linear model can be rewritten as:
C=α0·L0+α1·L1+β
In this case, the plurality of scaling parameters α and the offset parameter β may be derived by using the pre-operated neighboring luminance sample values and neighboring chrominance sample values. In one example, the offset parameter β is optional. In one example, at least one of the plurality of scaling parameters α may be derived by utilizing the sample differences. Furthermore, another one of the plurality of scaling parameters α may be derived by using the downsampled luminance sample values. In one example, at least one of the plurality of scaling parameters α may be derived by utilizing horizontal or vertical sample differences calculated on the basis of the downsampled neighboring luminance sample values. In other words, the linear model may combine multiple scaling parameters α associated with different pre-operations.
Implicit filter shape derivation
In one example, the directional filter shape to be used may be derived at the decoder to save bit overhead, rather than explicitly signaling the selected filter shape index. For example, at the decoder, a plurality of directional gradient filters may be applied to each reconstructed luma sample of the L-shaped template of the i-th neighboring row and column of the current block. The filtered values (gradients) may then be accumulated for each direction of the plurality of directional gradient filters, respectively. In an example, the accumulated value is an accumulation of the absolute values of the corresponding filtered values. After the accumulation, the direction of the directional gradient filter with the largest accumulated value may be determined as the derived (luminance) gradient direction. For example, a histogram of gradients (HoG) may be constructed to determine the maximum. The derived direction may further be used as the direction for predicting the chroma samples in the current block.
The following example relates to reusing the decoder-side intra mode derivation (DIMD) method for luma intra prediction included in ECM-4.0:
step 1, applying 2 kinds of directional gradient filters (3×3 horizontal/vertical Sobel) to each reconstructed luminance sample of the L-shaped template of the 2nd neighboring row and column of the current block;
step 2, accumulating the filtered values (gradients) for each direction of the directional gradient filters by SAD (sum of absolute differences);
step 3, constructing a histogram of gradients (HoG) based on the accumulated filtered values; and
step 4, determining the maximum value in the HoG as the derived (luminance) gradient direction, based on which the GLM filter can be determined.
In one example, if the shape candidates are [-1, 0, 1; -1, 0, 1] (horizontal) and [1, 2, 1; -1, -2, -1] (vertical), then the shape [-1, 0, 1; -1, 0, 1] is used for GLM-based chroma prediction when the maximum is associated with the horizontal shape.
The shape of the gradient filter used to derive the gradient direction may be the same as or different from the shape of the GLM filter. For example, both filters may be horizontal [-1, 0, 1; -1, 0, 1], or the two filters may have different shapes, with the GLM filter being determined based on the gradient filter.
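The four decoder-side steps above can be sketched as follows. The 3×3 Sobel kernels and the two-direction accumulation by SAD follow the text; the template representation (a list of sample positions) and the function names are illustrative assumptions.

```python
# Sketch of steps 1-4: accumulate |horizontal| and |vertical| 3x3 Sobel
# responses over an L-shaped template, then pick the dominant direction.
SOBEL_H = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]   # responds to horizontal change
SOBEL_V = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]   # responds to vertical change

def conv3x3(img, y, x, k):
    return sum(k[r][c] * img[y + r - 1][x + c - 1]
               for r in range(3) for c in range(3))

def derive_direction(img, template):
    """template: iterable of (y, x) positions on the L-shaped template."""
    hog = {"horizontal": 0, "vertical": 0}
    for (y, x) in template:
        hog["horizontal"] += abs(conv3x3(img, y, x, SOBEL_H))  # step 2: SAD
        hog["vertical"] += abs(conv3x3(img, y, x, SOBEL_V))
    return max(hog, key=hog.get)    # step 4: largest accumulated value
```

For a template straddling a vertical edge the horizontal gradient dominates, so the horizontal GLM shape would be selected, mirroring the example in the text.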
The proposed GLM may be combined with MMLM or ELM discussed above. When combined with classification, each group may share or have its own filter shape, with the syntax indicating the shape for each group. For example, as an exemplary classifier, a horizontal gradient grad_hor may be classified into a first group, which corresponds to a first linear model, and a vertical gradient grad_ver may be classified into a second group, which corresponds to a second linear model. In one example, the horizontal luminance pattern may be generated only once.
Additional possible classifiers are provided below. With a classifier, adjacent and internal luminance sample pairs of a current video block may be classified into groups based on one or more thresholds. Note that as discussed above, each neighboring/inner chroma sample and its corresponding luma sample may be referred to as a luma-chroma sample pair. One or more thresholds are associated with the intensities of the neighboring/internal luminance samples. In this case, each of the plurality of groups corresponds to a respective one of the plurality of linear models.
When combined with the MMLM classifier, the following operations may be performed: classifying the neighboring reconstructed luma-chroma sample pairs of the current video block into 2 groups based on Threshold; deriving different linear models for the different groups, wherein the derivation process may be GLM-simplified, i.e., the number of taps is reduced by the pre-operations described above; classifying the luma-chroma sample pairs inside the CU (the internal luma-chroma sample pairs, wherein each of the internal luma-chroma sample pairs comprises an internal chroma sample value to be predicted with the derived linear models) into 2 groups based on Threshold; applying the different linear models to the reconstructed luma samples in the different groups; and predicting the chroma samples in the CU based on the linear models of the different classes. Here, Threshold may be the average of the neighboring reconstructed luminance samples. Note that by increasing the number of Thresholds, the number of classes (2) can be extended to multiple classes (e.g., equally partitioned based on the minimum/maximum values of the neighboring reconstructed (downsampled) luminance samples, fixed or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level).
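A minimal sketch of the two-group classification step described above, assuming Threshold is the average of the neighboring reconstructed luma samples; the helper name and the pair-list representation are hypothetical.

```python
def mmlm_classify(neigh_luma, neigh_chroma):
    """Split neighboring luma-chroma sample pairs into 2 groups around the
    average neighboring luma value; each group later gets its own
    (GLM-simplified) linear model."""
    threshold = sum(neigh_luma) // len(neigh_luma)   # average neighboring luma
    g0 = [(l, c) for l, c in zip(neigh_luma, neigh_chroma) if l <= threshold]
    g1 = [(l, c) for l, c in zip(neigh_luma, neigh_chroma) if l > threshold]
    return threshold, g0, g1
```

The same threshold is then applied to the internal luma samples of the CU so that each internal sample is predicted with the model of its own group.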
In one example, instead of the MMLM luminance DC intensity, the filtered values of the FLM/GLM applied to the neighboring luminance samples are used for classification. For example, if a 1-tap (1, -1) GLM is applied, the average AC value (physical meaning) is used. The processing may be: classifying the neighboring reconstructed luma-chroma sample pairs into K groups based on one or more filter shapes, one or more filtered values, and K-1 thresholds Ti; deriving different MLR models for the different groups, wherein the derivation process may be GLM-simplified, i.e., the number of taps is reduced by the pre-operations described above; similarly classifying the luma-chroma sample pairs inside the CU (the internal luma-chroma sample pairs, wherein each of the internal luma-chroma sample pairs comprises an internal chroma sample value to be predicted with the derived linear models) based on the one or more filter shapes, the one or more filtered values, and the K-1 thresholds Ti; applying the different linear models to the reconstructed luma samples in the different groups; and predicting the chroma samples in the CU based on the linear models of the different classes. The thresholds Ti may be predefined (e.g., 0, or given by a table) or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. For example, a threshold may be the average AC value (filtered value) of the neighboring reconstructed (possibly downsampled) luminance samples (2 groups), or determined based on the min/max AC values (K groups).
It has also been proposed to combine the GLM with the ELM classifier. As shown in figs. 15A to 15D, one filter shape (e.g., 1-tap) may be selected to calculate the edge intensity. The direction is determined as the direction along which the sample difference between the current sample and N neighboring samples (e.g., all 6 luminance samples) is calculated. For example, the filter at the upper middle portion of fig. 15A (shape [1, 0, -1; 1, 0, -1]) indicates the horizontal direction, since the sample differences between the samples are calculated in the horizontal direction, while the filter below it (shape [1, 2, 1; -1, -2, -1]) indicates the vertical direction, since the sample differences between the samples are calculated in the vertical direction. The positive and negative coefficients in each of the filters enable the sample differences to be calculated. The processing may then include: calculating one edge intensity from the filtered values; quantizing the edge intensity into M segments by M-1 thresholds Ti; classifying the current sample using K classes (e.g., K = M); deriving different MLR models for the different groups, wherein the derivation process may be GLM-simplified, i.e., the number of taps is reduced by the pre-operations described above; classifying the luminance-chrominance sample pairs inside the CU into K groups; applying the different MLR models to the reconstructed luminance samples in the different groups; and predicting the chrominance samples in the CU based on the MLR models of the different classes. Note that the filter shape used for classification may be the same as or different from the filter shape used for MLR prediction. Both the number of thresholds, M-1, and the threshold values Ti may be fixed or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. In addition, other classifiers/combined classifiers as discussed for the ELM may also be used for the GLM.
If the number of classified samples in a group is less than a number (e.g., a predefined 4), the default values mentioned in the discussion of matrix derivation for the MLR model may be applied to the group parameters (αi, β). Default values may also be applied if the corresponding neighboring reconstructed samples are not available for the selected LM mode, for example, when the MMLM_L mode is selected but the left samples are invalid.
Several methods related to the simplification of GLM are introduced below to further improve the codec efficiency.
Matrix/parameter derivation in the FLM requires floating-point operations (e.g., the division in the closed-form solution), which is expensive for decoder hardware, so a fixed-point design is required. The 1-tap GLM case can be considered a modified luma reconstructed sample generation of the CCLM (e.g., for the horizontal gradient direction, changing from the CCLM filter [1,2,1; 1,2,1]/8 to the GLM filter [-1,0,1; -1,0,1]), so the original CCLM processes can be reused for the GLM, including the fixed-point operations, MDLM downsampling, division table, applied size constraints, min-max approximation, and scaling adjustment. For all of these items, the 1-tap GLM may have its own configuration or share the same design as the CCLM. For example, the parameters are derived using the simplified min-max method (instead of LMS) and combined with the scaling adjustment after the GLM model is derived. In this case, the center point (luminance value yr) used for rotating the slope becomes the average value of the reference luminance sample "gradients". For another example, when the GLM is turned on for a CU, the CCLM slope adjustment is inferred to be off, and the syntax related to slope adjustment need not be signaled.
This section takes the typical case reference sample points (top 1 row and left 1 column) as an example. Note that as shown in fig. 14, the extended reconstructed region may also use simplifications of the same nature, and may have a syntax (e.g., MDLM, MRL) that indicates a particular region.
Note that the following aspects may be combined and applied jointly. For example, the division process is performed in conjunction with reference sample downsampling and the division table.
When classification (MMLM/ELM) is applied, each group may apply the same or different simplified operations. For example, before applying the right shift, the samples of each group are respectively padded to the target sample number, and then the same derivation process and the same division table are applied.
Fixed point implementation
The 1-tap case may reuse the CCLM design, where division by n may be achieved by a right shift and division by a2 may be achieved by a LUT. The integer parameters involved in the integer design of the LMS CCLM, including nα and nTable, and the intermediate parameters used to derive the linear model (equations (19)-(20)), may have the same values as in the CCLM or different values for greater accuracy. The integer parameters may be predefined or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level, and may be adjusted according to the sequence bit depth. For example, nTable = bit depth + 4.
MDLM downsampling
When the GLM is combined with MDLM, the existing total number of samples for parameter derivation may not be a power of 2, and needs to be padded to a power of 2 to replace division with a right-shift operation. For example, for an 8×4 chroma CU, MDLM requires W+H = 12 samples, whereas for MDLM_T only 8 samples are available (reconstructed); the 4 downsampled samples (at positions 0, 2, 4, 6) can then be equivalently padded to reach the power-of-2 target. The code to implement such an operation is as follows:
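Since the referenced listing is not reproduced here, the following is a hedged sketch of the padding step: the sample count is padded to the next power of 2 so that the division in the parameter derivation can be replaced by a right shift, with padding values taken from even positions (0, 2, 4, ...) as in the 8×4 example. The exact padding rule used by MDLM in the reference software may differ.

```python
def pad_to_pow2(samples):
    """Pad the sample list to the next power of 2 so that division by the
    sample count can become a right shift. Padding values are taken from
    even positions (0, 2, 4, ...) of the existing samples; other schemes
    (repeat/mirror the last sample) also work."""
    n = len(samples)
    target = 1 << (n - 1).bit_length()          # next power of 2 (>= n)
    pad = [samples[(2 * k) % n] for k in range(target - n)]
    return samples + pad
```

For 12 available samples this pads with the samples at positions 0, 2, 4, 6, reaching 16 samples, matching the positions mentioned in the text.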
Other padding methods, such as repetition/mirroring with respect to the last neighboring sample (the rightmost/bottommost), may also be applied.
The padding method for the GLM may be the same as or different from that of the CCLM.
Note that in the ECM version, for an 8×4 chroma CU, MDLM_T/MDLM_L requires 2W/2H = 16/8 samples, respectively; in this case, the same padding method can be applied to reach the target power-of-2 number of samples.
Division LUT
The division LUTs proposed for CCLM/LIC (Local Illumination Compensation) during the development of known standards such as AVC/HEVC/AV1/VVC/AVS can be used for the GLM division. For example, the LUT in JCTVC-I0166 for bit depth = 10 is reused (table 4). The division LUT may be different from that of the CCLM. For example, the CCLM uses the min-max method with DivTable (as in equation (5)), whereas the GLM may use a 32-entry LMS division LUT (as in table 5).
When the GLM is combined with MMLM, the meanL value may not always be positive (e.g., when filtered/gradient values are used to classify the groups), so sgn(meanL) needs to be extracted, and abs(meanL) is used to look up the division LUT. Note that the division LUTs used for the MMLM classification and for the parameter derivation may be different. For example, a lower-precision LUT (e.g., the min-max LUT) may be used for the mean classification, and a higher-precision LUT (e.g., that of the LMS) may be used for the parameter derivation.
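A sketch of the sign-aware LUT division described above: sgn(meanL) is extracted and abs(meanL) indexes a reciprocal table. The LUT size and precision here (16 entries of rounded 8-bit reciprocals, with a simple clamp instead of a normalizing shift) are illustrative assumptions, not the JCTVC-I0166 values.

```python
# Hedged sketch: fixed-point division num/meanL via a reciprocal LUT, with
# sign handling because meanL can be negative under gradient classification.
LUT_BITS, VAL_BITS = 4, 8
DIV_LUT = [((1 << VAL_BITS) + d // 2) // d if d else 0
           for d in range(1 << LUT_BITS)]       # round((1 << 8) / d)

def lut_divide(num, meanL):
    sign = -1 if meanL < 0 else 1               # extract sgn(meanL)
    d = min(abs(meanL), (1 << LUT_BITS) - 1)    # abs(meanL), clamped (sketch)
    return sign * ((num * DIV_LUT[d]) >> VAL_BITS)
```

The multiply-and-shift replaces an actual division, at the cost of the LUT's precision.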
Size constraint and latency constraint
Similar to the CCLM design, some size constraints may be applied for ELM/FLM/GLM. For example, the same constraint for luminance-chrominance delays in a dual-tree may be applied.
The size constraint may be based on CU area/width/height/depth. The threshold may be predefined or signaled in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. For example, for a chroma CU area, the predefined threshold may be 128.
In one example, at least one pre-operation is performed in response to determining that the video block meets an enabling threshold, wherein the enabling threshold is associated with an area, a width, a height, or a segmentation depth of the video block. In particular, the enablement threshold may define a minimum or maximum area, width, height, or segmentation depth of the video block. As understood by those skilled in the art, a video block may include a current chroma block and its co-located luma block. It is also proposed to jointly apply the above-mentioned enabling threshold to the current chroma block and its co-located luma block. For example, in response to determining that both the current chroma block and its co-located luma block meet an enable threshold, at least one pre-operation is performed.
Line buffer reduction
Similar to the CCLM design, if the co-located luma region of the current chroma CU contains the first row inside a CTU, the top template sample generation may be limited to 1 row to reduce the line-buffer storage for the CTU row. Note that when the upper reference line is located at the CTU boundary, only one luma line (the common line buffer in intra prediction) is used to generate the downsampled luma samples.
For example, in fig. 13, if the co-located luma region of the current chroma CU contains the first row inside a CTU, the top template may be limited to using only 1 row (instead of 2) for parameter derivation (other CUs may still use 2 rows). This saves luma sample line-buffer storage when CTU rows are processed one by one in decoder hardware. Line-buffer reduction may be achieved by several methods. Note that the example limited to "1" row can be extended to N rows with similar operations. Similarly, the 2-tap or multi-tap cases may also apply such operations. When multiple taps are applied, such operations may also need to be applied to the chroma samples.
For example, the 1-tap filter [1, 0, -1; 1, 0, -1] shown in fig. 15A is taken as an example. The filter can be reduced to [0, 0, 0; 1, 0, -1], i.e., only the lower coefficients are used. Alternatively, the restricted upper-row luminance samples may be padded from below (repetition, mirroring, 0, meanL, meanC, etc.).
Taking N = 4 as an example, i.e., the video block is located at the top boundary of the current CTU, the neighboring luma sample values and the corresponding chroma sample values of the top 4 rows are used to derive the linear model. Note that the corresponding chroma sample values may refer to the corresponding top 4 rows of neighboring chroma sample values (e.g., for the YUV 4:4:4 format). Alternatively, the corresponding chroma sample values may refer to the corresponding top 2 rows of neighboring chroma sample values (e.g., for the YUV 4:2:0 format). In this case, the neighboring luminance sample values and corresponding chrominance sample values of the top 4 rows may be divided into two regions: a first region including valid sample values (e.g., the luminance sample values and corresponding chrominance sample values of the nearest row), and a second region including invalid sample values (e.g., the luminance sample values and corresponding chrominance sample values of the other three rows). The coefficients of the filter corresponding to sample positions not belonging to the first region may then be set to zero, such that only sample values from the first region are used to calculate the sample differences. For example, as discussed above, in this case the filter [1, 0, -1; 1, 0, -1] may be reduced to [0, 0, 0; 1, 0, -1]. Alternatively, the nearest sample values in the first region may be padded into the second region, so that the padded sample values may be used to calculate the sample differences.
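The coefficient-zeroing idea above (keeping only filter rows that fall in the valid region) can be sketched as follows; the representation of a filter as a list of coefficient rows and the function name are illustrative.

```python
def restrict_filter(coef, valid_rows):
    """Zero the filter rows that fall in the invalid region above the CTU
    line, e.g. [1,0,-1; 1,0,-1] -> [0,0,0; 1,0,-1] when only the lower
    row of the window is valid."""
    return [row if r in valid_rows else [0] * len(row)
            for r, row in enumerate(coef)]
```

The alternative described in the text (padding the nearest valid row into the invalid region) leaves the coefficients unchanged and modifies the samples instead.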
Fusion of chroma intra prediction modes
In one example, because the GLM can be considered a special CCLM mode, the fusion design can be reused or have its own way. Multiple (two or more) weights may be applied to generate the final predictor. For example,
pred=(w0*pred0+w1*pred1+(1<<(shift-1)))>>shift
Where pred0 is a predictor based on a non-LM mode and pred1 is a GLM-based predictor, or
pred0 is a CCLM-based predictor (including all MDLM/MMLM) and pred1 is a GLM-based predictor, or
pred0 is a GLM-based predictor and pred1 is another GLM-based predictor.
Different I/P/B slices may have different weight designs (w 0 and w 1) depending on whether neighboring blocks are coded with CCLM/GLM/other coding modes or block size/width/height.
For example, the weight design may be determined by the intra prediction modes of the neighboring chroma blocks, with shift set equal to 2. Specifically, when both the above and left neighboring blocks are coded using LM modes, {w0, w1} = {1, 3}; when both the above and left neighboring blocks are coded using non-LM modes, {w0, w1} = {3, 1}; otherwise, {w0, w1} = {2, 2}. For non-I slices, w0 and w1 may both be set equal to 2.
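The weighted fusion and the neighbor-based weight selection above can be sketched as follows. pred0/pred1 are per-sample predictor values, and representing the neighboring modes as two booleans is an illustrative simplification.

```python
def fuse(pred0, pred1, up_is_lm, left_is_lm, shift=2):
    """Weighted fusion of a non-LM predictor (pred0) and an LM/GLM predictor
    (pred1); weights follow the neighbor-mode rule described above."""
    if up_is_lm and left_is_lm:
        w0, w1 = 1, 3           # both neighbors LM-coded: favor the LM side
    elif not up_is_lm and not left_is_lm:
        w0, w1 = 3, 1           # both neighbors non-LM-coded
    else:
        w0, w1 = 2, 2
    # pred = (w0*pred0 + w1*pred1 + (1 << (shift-1))) >> shift, with rounding
    return (w0 * pred0 + w1 * pred1 + (1 << (shift - 1))) >> shift
```

With shift = 2 the four weights always sum to 4, so the result stays in the predictor's dynamic range.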
For the syntax design, if a non-LM mode is selected, a flag is signaled to indicate whether fusion is applied.
As described above, GLM has a good gain complexity tradeoff because it can reuse existing CCLM modules without introducing additional derivation. Such a 1-tap design may be further extended or generalized in accordance with one or more aspects of the present disclosure.
In one aspect of the present disclosure, for a chroma sample to be predicted, a single corresponding luma sample L may be generated by combining the co-located luma sample and neighboring luma samples. For example, the combination may be a combination of different linear filters, e.g., a combination of a high-pass gradient filter (GLM) and a low-pass smoothing filter (e.g., the [1,2,1; 1,2,1]/8 FIR downsampling filter commonly used in CCLM), and/or a combination of a linear filter and a non-linear filter (e.g., one with a power of n, e.g., L^n, where n may be a positive number, a negative number, or a ± fraction (e.g., +1/2 (square root) or +3 (cube)), whose output may be rounded and rescaled to the bit-depth dynamic range).
In one aspect of the disclosure, the combination may be applied repeatedly. For example, a combination of the GLM and the [1,2,1; 1,2,1]/8 FIR filter may be applied to the reconstructed luminance samples, and then a nonlinear power of 1/2 may be applied. For example, the nonlinear filter may be implemented as a LUT (look-up table): for bit depth = 10 and a power of n with n = 1/2, LUT[i] = (int)(sqrt(i) + 0.5) << 5, for i = 0 to 1023, where the left shift by 5 rescales the result to the bit depth = 10 dynamic range. The nonlinear filter may provide an option for cases where a linear filter cannot efficiently model the luminance-chrominance relationship. Whether the non-linear term is used may be predefined or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level.
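The power-of-1/2 LUT described above can be built directly from the stated formula; a minimal sketch for bit depth = 10 (the table name is illustrative):

```python
import math

BIT_DEPTH = 10
# LUT[i] = (int)(sqrt(i) + 0.5) << 5 : power-of-1/2 nonlinearity. sqrt of a
# 10-bit value fits in ~5 bits, so the left shift by 5 rescales the output
# back to the 10-bit dynamic range.
NL_LUT = [int(math.sqrt(i) + 0.5) << 5 for i in range(1 << BIT_DEPTH)]
```

A combined (linear then nonlinear) luma value is then obtained by indexing NL_LUT with the clipped linear-filter output.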
In one or more aspects of the present disclosure, GLM may refer to a Generalized Linear Model (which may be used to linearly or non-linearly generate one single luminance sample, and the generated single luminance sample may be fed into the CCLM linear model to derive the parameters of the CCLM linear model); the linear/non-linear generation may be referred to as a generic mode. Different gradient or generic modes may be combined to form another mode. For example, a gradient mode may be combined with a downsampled value as in CCLM; a gradient mode may be combined with a non-linear L^2 value; or a gradient mode may be combined with another gradient mode, where the two gradient modes to be combined may have different directions or the same direction, e.g., [1,1,1; -1,-1,-1] and [1,2,1; -1,-2,-1], both of which have the vertical direction, or [1,1,1; -1,-1,-1] and [1,0,-1; 1,0,-1], which have the vertical and horizontal directions, respectively, as shown in figs. 15A-15D. The combination may include addition, subtraction, or linear weighting.
GLM applied to the downsampled domain
As described above, the pre-operations may be applied repeatedly, and the GLM may be applied to pre-linearly-weighted/pre-operated samples. For example, as in CCLM, a template filter may be applied to the luma samples to remove outliers (i.e., the CCLM downsampling smoothing filter), generating downsampled luma samples (one downsampled luma sample corresponding to one chroma sample) using the low-pass smoothing FIR filter [1,2,1; 1,2,1]/8. Thereafter, a 1-tap GLM may be applied to the smoothed downsampled luminance samples to derive the MLR model.
Some gradient filter patterns (such as 3×3 Sobel or Prewitt operators) may be applied to the downsampled luminance samples. The following table shows some of the gradient filter patterns.
The gradient filter pattern may be combined with other gradient/generic filter patterns in the downsampled luminance domain. In one example, a combined filter pattern may be applied to the downsampled luminance samples. For example, the combined filter pattern may be derived by performing an addition or subtraction operation on the corresponding coefficients of the gradient filter pattern and a DC/low-pass-based filter pattern, such as the filter pattern [0,0,0; 0,1,0; 0,0,0] or [1,2,1; 2,4,2; 1,2,1]. In another example, the combined filter pattern is derived by performing an addition or subtraction operation on the coefficients of the gradient filter pattern and a non-linear value (such as L^2). In another example, the combined filter pattern is derived by performing an addition or subtraction operation on the corresponding coefficients of the gradient filter pattern and another gradient filter pattern having a different direction or the same direction. In another example, the combined filter pattern is derived by performing a linear weighting operation on the coefficients of the gradient filter pattern.
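The coefficient-wise addition/subtraction of filter patterns described above can be sketched as follows, here combining the standard 3×3 Sobel horizontal operator with the DC pattern [0,0,0; 0,1,0; 0,0,0]; the function and variable names are illustrative.

```python
def combine_patterns(p, q, sign=1):
    """Element-wise add (sign=1) or subtract (sign=-1) two 3x3 filter
    patterns, e.g. a Sobel gradient pattern plus a DC pattern."""
    return [[a + sign * b for a, b in zip(pr, qr)] for pr, qr in zip(p, q)]
```

Adding the DC pattern keeps a fraction of the sample's own intensity alongside its gradient, which is one way to form the combined gradient-plus-low-pass patterns mentioned in the text.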
The GLM applied to the downsampled domain may fit into the CCCM framework, but may sacrifice high-frequency accuracy, because low-pass smoothing is applied before the GLM is applied.
In one or more aspects of the present disclosure, one or more grammars may be introduced to indicate information about GLM. Table 10 below shows an example of GLM syntax.
Table 10
FLC: fixed-length code
TU: truncated unary code
EGk: exponential Golomb code of order k, where k may be fixed or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level
SVLC: signed EG0
UVLC: unsigned EG0
Note that the binarization of each syntax element may be changed.
In one aspect of the present disclosure, the GLM on/off control for the Cb/Cr components may be performed jointly or separately. For example, at the CU level, 1 flag may be used to indicate whether the GLM is active for the CU. If it is active, 1 flag may be used to indicate whether both Cb and Cr are active. If not both are active, 1 flag may indicate whether Cb or Cr is active. When Cb and/or Cr is active, the filter index/gradient (generic) mode may be signaled separately. All flags may have their own context models or be bypass coded.
In another aspect of the present disclosure, whether to signal the GLM on/off flag may depend on the luma/chroma coding mode and/or the CU size. For example, in the ECM5 chroma intra mode syntax, the GLM may be inferred to be off when MMLM or MMLM_L or MMLM_T is applied; when the CU area < A, where A may be predefined or signaled/switched in SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level; or, if combined with CCCM, when CCCM is on.
Note that when GLM is combined with MMLM, the different models may share the same gradient/generic pattern or each have their own gradient/generic pattern.
When GLM is combined with CCCM/FLM, if CCCM/FLM is enabled for the current CU, then the CU-level GLM enable flag may be inferred to be off, for example:
hasGlmFlag&=!pu.cccmFlag;
CCCM without downsampling procedure
CCCM requires processing of the downsampled luminance reference values before calculating the model parameters and applying the CCCM model, which increases the burden on the decoder processing cycles. In this section, CCCM without the downsampling process is presented, including utilizing non-downsampled luminance reference values and/or selecting among different non-downsampled luminance references. One or more filter shapes may be used for this purpose, as described below.
Note that the methods/examples in this section may be reused in combination with/together with the above-mentioned methods, including but not limited to methods related to classification, filter shape, matrix derivation (with special handling), application area and syntax. Furthermore, the methods/examples listed in this section may also be applied with the above-described methods/examples (more taps) to have better performance under certain complexity trade-offs.
In this disclosure, reference samples/training templates/reconstructed neighboring areas generally refer to luminance samples used to derive MLR model parameters, which are then applied to internal luminance samples in one CU to predict chroma samples in the CU.
Filter shape
One or more shapes/numbers of filter taps may be used for CCCM prediction, as shown in figs. 16, 17, and 18A-18B. One or more sets of filter taps may be used for FLM prediction, examples of which are shown in figs. 19A-19G. The selected luminance reference values are not downsampled. One or more predefined shapes/numbers of filter taps may be used for CCCM prediction based on previously decoded information at the TB/CB/slice/picture/sequence level.
While a multi-tap filter may fit the training data well (i.e., the top/left neighboring reconstructed luma/chroma samples), in some cases where the training data does not capture the complete characteristics of the test data, this may result in overfitting, so that the test data (i.e., the chroma block samples to be predicted) is not predicted well. Furthermore, different filter shapes may adapt well to different video block contents, resulting in more accurate prediction. To solve this problem, the filter shape/number of filter taps may be predefined or signaled/switched at the SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. A set of filter shape candidates may be predefined or signaled/switched at the SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. Different components (U/V) may have different filter switching controls. For example, as shown in the following table, a set of filter shape candidates (index = 0 to 5) is predefined, where the filter shape (1, 2) represents a 2-tap luminance filter and the filter shape (1, 2, 4) represents a 3-tap luminance filter, etc., as shown in fig. 11. The filter shape selection for the U/V components may be switched at the PH or CU/CTU level. Note that an N-tap filter may represent N taps with or without the offset β as described above.
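A candidate table of the kind described above might be sketched as follows; only indices 1 and 2, corresponding to the (1, 2) and (1, 2, 4) shapes quoted in the text, are grounded in the description, and the remaining entries are hypothetical placeholders standing in for the table that accompanies fig. 11.

```python
# Hypothetical filter-shape candidate table (index 0..5). Each tuple lists
# the luma tap positions the filter uses; entries other than indices 1 and 2
# are invented placeholders for illustration.
FILTER_SHAPE_CANDIDATES = {
    0: (0, 1, 2, 3, 4, 5),  # 6-tap (placeholder)
    1: (1, 2),              # 2-tap luminance filter, from the text
    2: (1, 2, 4),           # 3-tap luminance filter, from the text
    3: (0, 1, 2),           # placeholder
    4: (2, 4),              # placeholder
    5: (2,),                # placeholder
}

def select_filter_shape(idx):
    """Look up the filter shape for a predefined or signaled index
    (e.g., switched at the PH or CU/CTU level, per the text)."""
    return FILTER_SHAPE_CANDIDATES[idx]
```

Each U/V component could carry its own index under this scheme, matching the separate filter switching controls described above.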
Different chroma types/color formats may have different predefined filter shapes/taps. For example, the predefined filter shapes may be (1, 2, 4, 5) for 420 type-0, (0, 1, 2, 4, 7) for 420 type-2, (1, 4) for 422, and (0, 1, 2, 3, 4, 5) for 444, as shown in fig. 12.
Luminance/chrominance samples that are not available for deriving the MLR model may be filled from the available reconstructed samples. For example, if the 6-tap (0, 1, 2, 3, 4, 5) filter of fig. 12 is used, then for a CU located at the left picture boundary, the left column comprising taps (0, 3) is not available (beyond the picture boundary), so the values of taps (0, 3) are repetitively filled from taps (1, 4) in order to apply the 6-tap filter. Note that the padding process is applied to both the training data (top/left neighboring reconstructed luminance/chrominance samples) and the test data (luminance/chrominance samples in the CU).
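The repetitive-fill rule in the example above can be sketched as follows, assuming the tap layout of fig. 12 in which taps (0, 3) form the left column and taps (1, 4) are their neighbors; the `REPEAT_FROM` mapping and the accessor callbacks are assumptions made for illustration.

```python
# Assumed fill mapping: each unavailable tap is copied from its neighbor tap,
# per the left-picture-boundary example in the text.
REPEAT_FROM = {0: 1, 3: 4}

def padded_tap_values(taps, get_sample, is_available):
    """Return a value per tap, repeating from the mapped neighbor tap when a
    tap position is unavailable (e.g., beyond the picture boundary)."""
    values = {}
    for t in taps:
        if is_available(t):
            values[t] = get_sample(t)
        else:
            values[t] = get_sample(REPEAT_FROM[t])  # repetitive fill
    return values
```

The same routine would run over both the training data and the test data, since the text applies the padding process to both.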
Luminance/chrominance samples that are not available for deriving the MLR model may be skipped and not used in accordance with one or more embodiments of the present disclosure. Thus, a filling process is not required for unavailable luminance/chrominance samples.
CCLM/MMLM with LDL decomposition
CCCM needs to perform an LDL decomposition to calculate the model parameters of the CCCM model; this avoids square root operations and requires only integer operations. In this section, CCLM/MMLM with LDL decomposition is presented. As described above, LDL decomposition can also be used in ELM/FLM/GLM.
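For illustration, the floating-point sketch below solves a symmetric positive-definite system A x = b (the form the least-squares normal equations take when deriving linear model parameters) via an LDL^T decomposition; a real codec would use fixed-point integer arithmetic, but the sketch shows why no square root is needed, unlike Cholesky's L L^T factorization.

```python
import numpy as np

def ldl_solve(A, b):
    """Solve A x = b for symmetric positive-definite A via A = L D L^T.
    Only additions, multiplications, and divisions appear; no square roots."""
    n = A.shape[0]
    L = np.eye(n)
    D = np.zeros(n)
    # Factorization: unit lower-triangular L and diagonal D.
    for j in range(n):
        D[j] = A[j, j] - np.sum(L[j, :j] ** 2 * D[:j])
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.sum(L[i, :j] * L[j, :j] * D[:j])) / D[j]
    # Forward substitution: L y = b.
    y = np.zeros(n)
    for i in range(n):
        y[i] = b[i] - L[i, :i] @ y[:i]
    # Diagonal scaling: D z = y.
    z = y / D
    # Back substitution: L^T x = z.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = z[i] - L[i + 1:, i] @ x[i + 1:]
    return x
```

In the CCLM/MMLM setting, A would be the autocorrelation of the reference luminance values and b their cross-correlation with the reference chrominance values; the solution x holds the model coefficients.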
Note that the methods/examples in this section may be reused in combination with/together with the above-mentioned methods, including but not limited to methods related to classification, filter shape, matrix derivation (with special handling), application area and syntax. Furthermore, the methods/examples listed in this section may also be applied with the above methods/examples to have better performance under certain complexity trade-offs.
In this disclosure, reference samples/training templates/reconstructed neighboring areas generally refer to luminance samples used to derive MLR model parameters, which are then applied to internal luminance samples in one CU to predict chroma samples in the CU.
CCLM/MMLM with extended range
One or more reference sample regions may be used for CCLM/MMLM prediction; i.e., as shown in fig. 10B, the reference region may be the same as the reference region in CCCM. Different reference regions may be used for CCLM/MMLM prediction based on previously decoded information at the TB/CB/slice/picture/sequence level.
While training data from multiple reference regions may fit the model parameter calculation well, in some cases where the training data does not capture the complete characteristics of the test data, this may result in overfitting, so that the test data (i.e., the chroma block samples to be predicted) is not predicted well. Furthermore, different reference regions may adapt well to different video block contents, resulting in more accurate prediction. To solve this problem, the reference shape/number of reference regions may be predefined or signaled/switched at the SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. A set of reference region candidates may be predefined or signaled/switched at the SPS/DPS/VPS/SEI/APS/PPS/PH/SH/region/CTU/CU/sub-block/sample level. Different components (U/V) may have different reference region switching controls. For example, a set of candidate reference regions (index = 0 to 4) is predefined, as shown in the following table. The reference region selection for the U/V components may be switched at the PH or CU/CTU level. Different chroma types/color formats may have different predefined reference regions.
Luminance/chrominance samples that are not available for deriving the MLR model may be padded from the available reconstructed samples, the padding process being applied in both training data (top/left neighboring reconstructed luminance/chrominance samples) and test data (luminance/chrominance samples in the CU).
Luminance/chrominance samples that are not available for deriving the MLR model may be skipped and not used in accordance with one or more embodiments of the present disclosure. Thus, a filling process is not required for unavailable luminance/chrominance samples.
FLM/GLM/ELM/CCCM with minimum sample restriction
FLM needs to process the downsampled luminance reference values and calculate the model parameters, which increases the burden on the decoder processing cycles, especially for small blocks. In this section, an FLM with a minimum sample restriction is presented; e.g., FLM is used only when the number of samples exceeds a predefined number (such as 64 or 128). One or more different restrictions may be used for this purpose; e.g., FLM in a single model is used only when the number of samples exceeds a predefined number (such as 256), and FLM in multiple models is used only when the number of samples exceeds a predefined number (such as 128).
According to one or more embodiments of the present disclosure, the predefined minimum number of samples for multiple models may be greater than or equal to the predefined minimum number of samples for a single model. For example, FLM/GLM/ELM/CCCM in a single model is used only when the number of samples is greater than or equal to a predefined number (such as 128), and FLM/GLM/ELM/CCCM in multiple models is used only when the number of samples is greater than or equal to a predefined number (such as 256).
According to one or more embodiments of the present disclosure, the predefined minimum number of samples for FLM/GLM/ELM may be greater than or equal to the predefined minimum number of samples for CCCM. For example, CCCM in a single model is used only when the number of samples is greater than or equal to a predefined number (such as 0), and CCCM in multiple models is used only when the number of samples is greater than or equal to a predefined number (such as 128); FLM in a single model is used only when the number of samples is greater than or equal to a predefined number (such as 128), and FLM in multiple models is used only when the number of samples is greater than or equal to a predefined number (such as 256).
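The minimum-sample checks in the examples above can be sketched as follows; the threshold table is hypothetical and simply mirrors the example numbers quoted in the text, which in a real codec would be predefined or signaled in the bitstream.

```python
# Hypothetical (single-model, multi-model) minimum-sample thresholds, taken
# from the example numbers in the text; the multi-model threshold is >= the
# single-model one.
MIN_SAMPLES = {
    "CCCM": (0, 128),
    "FLM":  (128, 256),
}

def mode_allowed(mode, num_samples, multi_model):
    """A mode is allowed only when the block has at least the predefined
    minimum number of samples for the chosen model count."""
    single_min, multi_min = MIN_SAMPLES[mode]
    return num_samples >= (multi_min if multi_model else single_min)
```

A decoder could use such a check to infer a mode flag as off without signaling it whenever the block is too small, which is the burden-reduction goal stated above.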
Note that the methods/examples in this section may be reused in combination with/together with the above-mentioned methods, including but not limited to methods related to classification, filter shape, matrix derivation (with special handling), application area and syntax. Furthermore, the methods/examples listed in this section may also be applied with the above-described methods/examples (more taps) to have better performance under certain complexity trade-offs.
Fig. 20 illustrates a workflow of a method 2000 of decoding video data in accordance with one or more aspects of the present disclosure.
At step 2010, method 2000 includes obtaining video blocks from a bitstream.
At step 2020, method 2000 includes obtaining internal luma sample values of the video block, external luma sample values of an external region of the video block, and external chroma sample values of the external region.
At step 2030, method 2000 includes calculating downsampled internal luminance sample values from the obtained internal luminance sample values of the video block and downsampled external luminance sample values from the obtained external luminance sample values of the external region, respectively.
At step 2040, method 2000 includes calculating filtered values for the downsampled external luminance sample values, wherein each of the filtered values is calculated based on a combined filter pattern derived from at least one gradient filter pattern that enables sample differences between the downsampled external luminance sample values to be calculated.
At step 2050, method 2000 includes predicting internal chroma sample values for the video block using the combined filter pattern based on the downsampled internal luma sample values and the filtered values.
At step 2060, method 2000 includes obtaining a decoded video block using the predicted internal chroma sample values.
In one example, calculating the downsampled internal luminance sample value and the downsampled external luminance sample value includes obtaining the downsampled internal luminance sample value and the downsampled external luminance sample value by performing a weighted average operation.
In one example, predicting the internal chroma sample values of the video block includes deriving a linear model by using the filtered values and the external chroma sample values and predicting each of the internal chroma sample values of the video block by applying the linear model to the filtered values of the downsampled internal luma sample values, wherein each of the filtered values of the downsampled internal luma sample values is calculated based on the combined filter pattern.
In one example, the combined filter pattern is derived by performing an addition or subtraction operation on the respective coefficients of the at least one gradient filter pattern and the at least one low pass based filter pattern.
In another example, the combined filter pattern is derived by performing an addition or subtraction operation on the coefficients and non-linear values of at least one gradient filter pattern.
In one example, the non-linear value comprises a square of the corresponding downsampled internal luminance sample value.
In one example, the non-linear values are scaled to a range of luminance sample values for the video block.
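One plausible way to realize "the square of the luminance sample, scaled to the sample range" is the rounding-shift form below; the exact rounding and scaling used by a particular codec is an assumption here, not a quotation from the disclosure.

```python
def nonlinear_term(c, bit_depth=10):
    """Square of a downsampled luma sample, scaled back into the sample
    range [0, 2**bit_depth - 1] with a rounding offset. The rounding form
    is an assumption for illustration."""
    mid = 1 << (bit_depth - 1)          # rounding offset (half of the range)
    return (c * c + mid) >> bit_depth   # c^2 scaled down by the sample range
```

Without this scaling, the squared term would occupy twice the bit depth of the linear terms, so keeping it in the same range simplifies a fixed-point model derivation.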
In one example, the combined filter pattern is derived by performing an addition or subtraction operation on respective coefficients of at least one gradient filter pattern and another gradient filter pattern.
In one example, the combined filter pattern is derived by performing a linear weighting operation on coefficients of at least one gradient filter pattern.
Fig. 21 illustrates a workflow of a method 2100 for encoding video data in accordance with one or more aspects of the present disclosure.
At step 2110, method 2100 includes obtaining a video block.
At step 2120, method 2100 includes obtaining internal luma sample values of the video block, external luma sample values of an external region of the video block, and external chroma sample values of the external region.
At step 2130, method 2100 includes calculating downsampled internal luminance sample values from the obtained internal luminance sample values of the video block and downsampled external luminance sample values from the obtained external luminance sample values of the external region, respectively.
At step 2140, method 2100 includes calculating filtered values for the downsampled external luminance sample values, wherein each of the filtered values is calculated based on a combined filter pattern derived from at least one gradient filter pattern that enables calculation of sample differences between the downsampled external luminance sample values.
At step 2150, method 2100 includes predicting internal chroma sample values for a video block using a combined filter pattern based on the downsampled internal luma sample values and the filtered values.
At step 2160, method 2100 includes generating a bitstream including the encoded video block using the predicted internal chroma sample values.
In one example, calculating the downsampled internal luminance sample value and the downsampled external luminance sample value includes obtaining the downsampled internal luminance sample value and the downsampled external luminance sample value by performing a weighted average operation.
In one example, predicting the internal chroma sample values of the video block includes deriving a linear model by using the filtered values and the external chroma sample values and predicting each of the internal chroma sample values of the video block by applying the linear model to the filtered values of the downsampled internal luma sample values, wherein each of the filtered values for the downsampled internal luma sample values is calculated based on a combined filter pattern.
In one example, the combined filter pattern is derived by performing an addition or subtraction operation on the respective coefficients of the at least one gradient filter pattern and the at least one low pass based filter pattern.
In one example, the combined filter pattern is derived by performing an addition or subtraction operation on the coefficients and non-linear values of at least one gradient filter pattern.
In one example, the non-linear value comprises a square of the corresponding downsampled internal luminance sample value.
In one example, the non-linear values are scaled to a range of luminance sample values for the video block.
In one example, the combined filter pattern is derived by performing an addition or subtraction operation on respective coefficients of at least one gradient filter pattern and another gradient filter pattern.
In one example, the combined filter pattern is derived by performing a linear weighting operation on coefficients of at least one gradient filter pattern.
FIG. 22 illustrates an example computing system 2200 in accordance with one or more aspects of the disclosure. The computing system 2200 may include at least one processor 2210. Computing system 2200 may also include at least one storage device 2220. The storage device 2220 may store computer executable instructions that, when executed, cause the processor 2210 to perform the steps of the methods described above. The processor 2210 may be a general purpose processor or may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. The storage device 2220 may store input data, output data, data generated by the processor 2210, and/or instructions for execution by the processor 2210.
It should be appreciated that the storage device 2220 may store computer executable instructions that, when executed, cause the processor 2210 to perform any operations according to embodiments of the disclosure.
Embodiments of the present disclosure may be embodied in a computer-readable medium, such as a non-transitory computer-readable medium. The non-transitory computer-readable medium may include instructions that, when executed, cause one or more processors to perform any operations in accordance with embodiments of the present disclosure. For example, the instructions, when executed, may cause one or more processors to receive a bitstream and perform decoding operations as described above. As another example, the instructions, when executed, may cause the one or more processors to perform encoding operations and transmit a bitstream including encoded video information associated with predicted chroma samples as described above.
It should be recognized that all operations in the above-described methods are merely exemplary, and that the present disclosure is not limited to any operations in the methods or to the order of such operations, but rather should encompass all other equivalents under the same or similar concepts.
It should also be appreciated that all of the modules in the above methods may be implemented in a variety of ways. These modules may be implemented as hardware, software, or a combination thereof. Furthermore, any of these modules may be functionally further divided into sub-modules or combined together.
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Accordingly, the claims are not intended to be limited to the aspects shown herein. All structural and functional equivalents to the elements of the various aspects described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims.