[0182] As shown in FIG. 8, the ⊕ operator translates the center of each c-th vector element to zero by an element-wise addition (shown as a subtraction in FIG. 8) of the FM and the expanded mean parameters to generate an estimated re-centered feature map, FM_C. The ⊙ operator scales the elements by an element-wise multiplication of FM_C 810 and the expanded, then inverted, standard deviation parameters to generate the normalized feature map, FM_n 818. For some embodiments, expansion is done first, inversion is done second, and element-wise multiplication is done third. For some embodiments, an epsilon ε may be added, as shown in Eq. 1, to the estimated standard deviation distribution parameters as a very small float value to avoid the possibility of zero division.

[0183] FIG. 9 is a schematic illustration showing an example expansion to a feature map dimension according to some embodiments. For some embodiments, the “Expansion to FM” blocks 808, 816 of FIG. 8 operate as shown in FIG. 9. The estimated set of mean parameters μ 910 is inputted into an expansion to FM dimension process block 908 to match the dimension of the original FM. For some embodiments, the distribution parameters array 910 may be repeated for each of the n rows of the FM matrix to increase the dimension of the distribution parameters array to match the FM matrix. For some embodiments, a matrix of expanded distribution parameters 902 may have dimensions of C-dimensions 904 by n-pixels 906.
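
The row-wise repetition described for FIG. 9 may be sketched in a few lines of Python. This is only an illustrative sketch: the function name `expand_to_fm` and the use of plain nested lists for the n x C matrix are assumptions made here and are not part of the described embodiments.

```python
def expand_to_fm(params, n):
    """Repeat a length-C parameter array once per row to form an n x C matrix."""
    # Each row is a fresh copy so later element-wise edits do not alias.
    return [list(params) for _ in range(n)]


mu = [0.5, -1.0, 2.0]             # hypothetical per-channel parameters, C = 3
expanded = expand_to_fm(mu, n=4)  # 4 rows, each equal to mu
```

After expansion, every element of the FM has a matching parameter at the same row and column, so the later ⊕ and ⊙ steps reduce to plain element-wise operations.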

[0184] FIG. 10 is a functional block diagram showing a second example feature map normalization process according to some embodiments. For some embodiments, as illustrated in FIG. 10, a set of distribution parameters σ 1014 may be learnt by an “NN Layers 2” block 1012. For some embodiments (not shown in FIG. 10), the input of the “NN Layers 2” block may be the initial FM instead of the centered FM_C.

[0185] An initial FM 1002 is inputted into an “NN Layers 1” block 1004 in which a set of mean distribution parameters μ 1006 is estimated. The estimated mean distribution parameters (μ) 1006 are expanded 1008 back into a feature map and subtracted from the original feature map 1002. The centered feature map, FM_C 1010, is passed to the “NN Layers 2” block 1012 to estimate σ. The estimated standard deviation distribution parameters (σ) 1014 are expanded 1016 back into a feature map. The centered feature map, FM_C 1010, is divided by the estimated standard deviation distribution parameters to generate the normalized feature map, FM_n 1018. For some embodiments, for each of the columns in the centered feature map matrix, FM_C 1010, each element in that particular column is divided by the element of the estimated standard deviation distribution parameters array σ corresponding to that particular column. For some embodiments, an epsilon ε may be added to the estimated standard deviation distribution parameters as a very small float value to avoid the possibility of zero division.
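
The data flow of FIG. 10 may be sketched as follows. In the described embodiments, μ and σ are learnt by the “NN Layers 1” and “NN Layers 2” blocks; the closed-form statistics used below (`estimate_mean`, `estimate_std`) are hypothetical stand-ins chosen only so that the sketch runs on its own, and the value of `EPS` is likewise an assumption.

```python
import math

EPS = 1e-5  # assumed small constant that guards against division by zero


def estimate_mean(fm):
    """Stand-in for "NN Layers 1": per-channel mean over the n rows."""
    n = len(fm)
    return [sum(col) / n for col in zip(*fm)]


def estimate_std(fm_c):
    """Stand-in for "NN Layers 2": per-channel RMS of the centered map."""
    n = len(fm_c)
    return [math.sqrt(sum(v * v for v in col) / n) for col in zip(*fm_c)]


def normalize(fm):
    """Center each column by mu, then divide each column by sigma + EPS."""
    mu = estimate_mean(fm)
    fm_c = [[v - m for v, m in zip(row, mu)] for row in fm]  # centered FM_C
    sigma = estimate_std(fm_c)
    return [[v / (s + EPS) for v, s in zip(row, sigma)] for row in fm_c]


fm = [[1.0, 10.0],
      [3.0, 10.0]]      # second channel is constant, so its sigma is zero
fm_n = normalize(fm)    # the EPS term keeps the constant channel finite
```

Pairing each row with `mu` or `sigma` through `zip` plays the role of the “Expansion to FM” blocks: each column is matched with its own parameter without materializing the expanded matrix.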

Transforming Feature Elements via Neural Network Layers

[0186] Transforming vector elements in a feature map (FM) by its distribution parameter(s) may emphasize the property of the reconstructing points embedded in the FM. Such a channel-wise (or pointwise) independent transform of vector elements may efficiently adjust the expressivity of the abstracted FM by optimizing the proposed distribution learning model through an end-to-end training process. An independent parameter estimation per vector further discriminates the feature representation by breaking the rigidity of the deformation within the FM.

[0187] FIG. 11 is a functional block diagram showing a first example feature map transformation process according to some embodiments. A transformation may scale and translate the entire set of elements with parameters γ and β. For some embodiments, the parameters may be predetermined. Different pairs of scale and translation values may be applied separately per vector in a feature map. A feature vector may be chosen for this separation, but, as depicted in FIG. 5, a feature vector may be reshaped into a per-channel vector. The transformation parameters γ_c and β_c, which may be elements of the distribution parameters γ and β, respectively, are defined and applied independently for each reshaped vector (channel-wise). These neural network layers, “NN Layers 3” 1104 and “NN Layers 4” 1112, may be trained by an end-to-end PCC framework. During inference, an optimal set of distribution parameters γ and β may be estimated from the given input feature map FM_n 1102.

[0188] For some embodiments, such as when transforming an entire FM at once, γ and β are used to perform a linear transformation of the feature map. For example, suppose a 3D object is in a virtual scene. The parameters γ and β scale and translate this 3D object to another place. The 3D object’s point coordinates may be represented as a feature map (FM) after a feature extraction process. The parameters γ and β also may be applied to the FM. To adjust the distribution of a feature map more precisely, the parameters γ and β may be learnt/estimated on a per-channel basis. In this case, each parameter γ and β becomes a vectorized parameter and the transformation becomes non-linear with more degrees of freedom to deform the FM.

[0189] For some embodiments of an example process 1100, given a normalized FM_n 1102 as an input, each set of per-channel distribution parameters, which forms a vector in the C dimension, is learnt by a set of neural network (NN) layers. These sets of NN layers 1104, 1112, which may learn a core set of distribution parameters 1106, 1114, may operate similarly to the description given for FIG. 7.

[0190] Using these sets of NN layers 1104, 1112, two sets of distribution parameters γ and β are learnt via their own “NN Layers 3” block 1104 and “NN Layers 4” block 1112, respectively, as shown in FIG. 11. Each set of parameters γ and β goes through an “Expansion to FM” block 1108, 1116 to match the FM dimensions and prepare for their respective ⊙ and ⊕ operations. This transformation learning architecture via NN layers is based on the transformation equation 2:

$$\mathrm{FM}_t = \left(\mathrm{FM}_n \odot \mathrm{Expand}(\gamma)\right) \oplus \mathrm{Expand}(\beta) \tag{2}$$

[0191] As shown in FIG. 11, the ⊙ operator scales the elements by an element-wise multiplication of FM_n with an expanded set of distribution parameters γ to generate the scaled feature map, FM_S 1110. The scaled feature map, FM_S 1110, generated by the ⊙ operation on FM_n 1102, is used as the input to the “NN Layers 4” block 1112. The ⊕ operator translates each element in a particular column of the scaled feature map, FM_S 1110, by the expanded element of the set of distribution parameters β that corresponds to that particular column. This ⊕ operator generates the transformed feature map, FM_t 1118.
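
The element-wise arithmetic of the scale-then-translate transformation may be sketched as follows. The per-channel values of γ and β are supplied by hand here purely for illustration; in the described embodiments they would be estimated by the “NN Layers 3” and “NN Layers 4” blocks. The function name `transform` is an assumption.

```python
def transform(fm_n, gamma, beta):
    """Scale each column c by gamma[c], then translate it by beta[c]."""
    # Element-wise multiplication with the (implicitly expanded) gamma.
    fm_s = [[v * g for v, g in zip(row, gamma)] for row in fm_n]
    # Element-wise addition of the (implicitly expanded) beta.
    fm_t = [[v + b for v, b in zip(row, beta)] for row in fm_s]
    return fm_s, fm_t


fm_n = [[1.0, -1.0],
        [0.0, 2.0]]
gamma = [2.0, 0.5]   # hypothetical per-channel scale
beta = [1.0, -1.0]   # hypothetical per-channel translation
fm_s, fm_t = transform(fm_n, gamma, beta)
# fm_s == [[2.0, -0.5], [0.0, 1.0]]
# fm_t == [[3.0, -1.5], [1.0, 0.0]]
```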

[0192] FIG. 12 is a functional block diagram showing a second example feature map transformation process according to some embodiments. For some embodiments, the “NN Layers 3” block 1204, the “NN Layers 4” block 1212, and the “Expansion to FM” blocks 1208, 1216 operate as described earlier.

[0193] For some embodiments, as depicted in FIG. 12, the FM input 1202 to the “NN Layers 4” 1212 is parallel. Instead of using the scaled feature map, FM_S 1210, as an input to the “NN Layers 4” block 1212, the normalized feature map, FM_n 1202, is used directly as an input to the “NN Layers 4” 1212. This variation of the input FM may improve the training of the NN layers by independently optimizing the set of distribution parameters β.

[0194] For some embodiments of the example process 1200, a ⊕ operator adds the scaled feature map, FM_S 1210, to the output of the expansion to FM process 1216 to generate the transformed feature map, FM_t 1218.
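
The difference between the FIG. 11 and FIG. 12 wirings lies only in which feature map feeds the “NN Layers 4” block. The sketch below contrasts the two under loud assumptions: `nn_layers_3` and `nn_layers_4` are hypothetical closed-form stand-ins for the learnt layers, and all function names are illustrative.

```python
def col_mean(fm):
    """Per-channel mean, used only to build the toy estimators below."""
    return [sum(col) / len(col) for col in zip(*fm)]


def nn_layers_3(fm):
    """Hypothetical stand-in for "NN Layers 3" (estimates gamma)."""
    return [1.0 + abs(m) for m in col_mean(fm)]


def nn_layers_4(fm):
    """Hypothetical stand-in for "NN Layers 4" (estimates beta)."""
    return col_mean(fm)


def scale(fm, gamma):
    return [[v * g for v, g in zip(row, gamma)] for row in fm]


def shift(fm, beta):
    return [[v + b for v, b in zip(row, beta)] for row in fm]


def transform_fig11(fm_n):
    """Serial wiring: beta is estimated from the scaled map FM_S."""
    fm_s = scale(fm_n, nn_layers_3(fm_n))
    return shift(fm_s, nn_layers_4(fm_s))


def transform_fig12(fm_n):
    """Parallel wiring: beta is estimated directly from FM_n."""
    fm_s = scale(fm_n, nn_layers_3(fm_n))
    return shift(fm_s, nn_layers_4(fm_n))


fm_n = [[1.0, 2.0],
        [3.0, 2.0]]
fm_t_11 = transform_fig11(fm_n)  # [[9.0, 12.0], [15.0, 12.0]]
fm_t_12 = transform_fig12(fm_n)  # [[5.0, 8.0], [11.0, 8.0]]
```

In the parallel wiring, the β estimate does not depend on γ, which is one way to read the statement that the set of distribution parameters β is optimized independently.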

[0195] For some embodiments, a single transformation block may be used without a normalization process. In this case, the initial FM is inputted directly into an “NN Layers 3” and/or “NN Layers 4” block instead of inputting the normalized feature map, FM_n.

[0196] For some embodiments, distribution parameters may be shared by: (1) processes within the encoder; (2) processes within the decoder; and/or (3) a process in the encoder and a process in the decoder and vice versa.

Extension Towards a Multi-Stage Architecture

[0197] A multi-stage architecture may further refine a feature map computed from the above distribution learning network. To deeply and smoothly communicate between stages, additional functionalities, such as downsampling or filtering, may follow the feature map transformation.

Downsampling Feature Map

[0198] FIG. 13 is a functional block diagram showing an example downsampling of a feature map according to some embodiments. Both voxel-based layers (e.g., CNN) and point-based neural network layers (e.g., shared MLP in PointNet, described in Qi, Charles R., et al., PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, PROC. OF IEEE CONF. ON COMP. VISION AND PATTERN RECOGNITION, 652-660 (2017)) may downsample points by a later max or average pooling operation. The degree of downsampling is further adjustable by the size of the local group in image or 3D space. In FIG. 13, there are m input feature maps. By controlling this number m, the size of the local group may be adjusted. If m is larger, then the size of the local group gets bigger.

[0199] In the example process 1300 of FIG. 13, an example set of m feature maps 1302, 1304, 1306 is processed by a shared “Distribution Learning Network” block 1308, 1310, 1312. For some embodiments, a DLN block may be, e.g., a combination of the process shown in either FIG. 7 or FIG. 9 plus the process shown in either FIG. 10 or FIG. 11. Each FM 1302, 1304, 1306 of a local group of pixels is transformed via the distribution learning network 1308, 1310, 1312 into a transformed feature map 1314, 1316, 1318. The transformed feature maps 1314, 1316, 1318 are each pooled 1320, 1322, 1324 to a feature vector, FV 1326, 1328, 1330. In this manner, the features aggregate progressively across multi-stage layers. For some embodiments, the pooling operation, for each of the C columns of an n x C transformed feature map FM_t, may, e.g., determine an average of the elements in a particular column or may, e.g., select the maximum value among the elements in a particular column.
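
The column-wise pooling that reduces an n x C transformed feature map to a length-C feature vector may be sketched as follows. The function name `pool` and its `mode` argument are assumptions made for illustration.

```python
def pool(fm_t, mode="max"):
    """Reduce an n x C feature map to a length-C feature vector, column by column."""
    columns = list(zip(*fm_t))
    if mode == "max":
        return [max(col) for col in columns]
    if mode == "avg":
        return [sum(col) / len(col) for col in columns]
    raise ValueError(f"unsupported pooling mode: {mode}")


fm_t = [[1.0, 4.0],
        [3.0, 2.0]]
fv_max = pool(fm_t, "max")  # [3.0, 4.0]
fv_avg = pool(fm_t, "avg")  # [2.0, 3.0]
```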

Filtering Feature Map

[0200] FIG. 14 is a functional block diagram showing an example filtering of a feature map according to some embodiments. For some embodiments, the size of points and/or feature dimensions is kept constant across different stages of a multi-stage design. This structure may improve the compatibility and allow the above-described methods to be plugged between layers of any end-to-end compression framework. For some embodiments, as illustrated in the example structure 1400 in FIG. 14, a downsampled feature vector (FV) 1406 is expanded to the original point size by copying the same vector for each expanded point. For some embodiments, downsampling may occur via a pool process 1404. The feature map (FM) 1402 before downsampling is concatenated to the expanded FV 1410. The expansion of the feature vector (FV) to match the size of the original feature map is done before concatenation. For some embodiments, the concatenation generates a concatenated feature map FM_concat 1408. For some embodiments, an additional feed forward network 1412 (e.g., MLP) may be connected to generate a filtered feature map 1414 after the concatenation. This additional neural network layer also matches the in-out dimension of the FM. For some embodiments, a feed forward network block 1412 may be a neural network in which information flows in one direction. A feed forward network 1412 may be, for example, an MLP or a CNN. For some embodiments, the purpose of a feed forward network 1412 is to refine a concatenated FM to smoothly adapt to the next block in the process.
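
The pool, expand, concatenate, and feed-forward sequence of FIG. 14 may be sketched as follows. The feed forward network is replaced here by a single fixed linear layer mapping 2C inputs to C outputs; that layer, its `weights` argument, the choice of max pooling, and the function name `filter_fm` are all assumptions made only to keep the sketch self-contained.

```python
def filter_fm(fm, weights):
    """Pool the FM, expand the FV, concatenate with the FM, then map 2C -> C."""
    n = len(fm)
    fv = [max(col) for col in zip(*fm)]                     # pooled feature vector
    expanded_fv = [list(fv) for _ in range(n)]              # copy FV for each point
    fm_concat = [row + e for row, e in zip(fm, expanded_fv)]  # n x 2C
    # Stand-in feed forward layer: weights has C rows of 2C coefficients each.
    return [[sum(x * w for x, w in zip(row, w_out)) for w_out in weights]
            for row in fm_concat]


fm = [[1.0, 2.0],
      [3.0, 0.0]]
# Hypothetical weights: each output channel is "own value minus pooled value".
weights = [[1.0, 0.0, -1.0, 0.0],
           [0.0, 1.0, 0.0, -1.0]]
fm_f = filter_fm(fm, weights)  # [[-2.0, 0.0], [0.0, -2.0]]
```

The output has the same n x C shape as the input, which is what allows such a block to be placed between layers without changing the surrounding dimensions.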

[0201] Based on these downsampling and filtering functions, the distribution learning network may be connected multiple times with numerous options in accordance with some embodiments.

[0202] For some embodiments, the first row of FIG. 13 may be combined with FIG. 14. The first feature map of FIG. 13, FM¹, may be concatenated with an expanded first feature vector of FIG. 13, FV¹. See the expanded FV in FIG. 14. This expanded and concatenated feature map, FM_concat, may be inputted into the Feed Forward Network block of FIG. 14, which outputs a filtered feature map, FM_f. For some embodiments, the first transformed feature map of FIG. 13, FM_t¹, may be concatenated with an expanded first feature vector of FIG. 13, FV¹. For some embodiments, this combination of FIGs. 13 and 14 to generate a filtered feature map may be used to further filter a transformed feature map, such as FM_t of FIGs. 10 and 11.

Combination with Another Feature Extractor

[0203] For some embodiments, to further enrich the abstraction level of the proposed feature extractor, different purpose neural networks may be put in place along with the distribution learning network. The residual networks of He and Szegedy and the transformer-based models of Zhao, Mao, and Zhang are examples of the other feature extractors that may be combined with the distribution learning network. In the following subsections, two such combined architectures are discussed.

Parallel Architecture

[0204] FIG. 15 is a functional block diagram showing an example parallel architecture for an enhanced distribution learning network according to some embodiments. FIG. 15 illustrates an enhanced distribution learning network process 1500 in which the distribution learning network 1504 outputting a transformed FM is combined with another feature extraction network 1506 in a parallel manner. A feature map FM 1502 is fed to both networks 1504, 1506 in parallel. These neural networks 1504, 1506 respectively extract a distribution-adjusted feature map, FM_d 1510, and another feature map, FM_a. After concatenating 1508 both feature maps, an additional feed forward network 1512, such as an MLP, further merges the concatenated FM and outputs an enhanced feature map, FM_e 1514. For some embodiments, to avoid degradation in a deeper network with multiple stages, the architecture may combine with, e.g., a residual network. This residual network block (“another feature extractor” block 1506 in FIG. 15) works jointly with the distribution learning block in parallel, thereby propagating a rich representation through a deep network. In some embodiments, the “Distribution Learning Network” block 1504 in FIG. 15 may process a downsampled and filtered FM. In this way, both global and local features are preserved by the “filtering” process.

Serial Architecture

[0205] FIG. 16 is a functional block diagram showing an example serial architecture for an enhanced distribution learning network according to some embodiments. Similar to the parallel architecture of FIG. 15, FIG. 16 shows another enhanced distribution learning network process 1600. In FIG. 16, the output of the distribution learning network block 1604, which may be a distribution-adjusted feature map, FM_d 1606, is combined in series with another feature extraction network block 1608. In this architecture, a feature map FM 1602 is inputted into the distribution learning network block 1604. The distribution-adjusted feature map, FM_d 1606, becomes the input of the other network block (“another feature extractor” block 1608 in FIG. 16), which outputs another feature map, FM_a. The enhanced feature map, FM_e 1612, is generated by a feed forward network 1610.

[0206] For some embodiments, the serial enhanced “Distribution Learning Network” in FIG. 16 may also process a downsampled and filtered FM. For both parallel and serial architectures, for some embodiments, multiple other purpose networks (more than one) may be combined with the distribution learning network.
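
The parallel and serial wirings may be contrasted with a short sketch in which each network is passed in as a function. The toy functions `toy_dln`, `toy_other`, `toy_merge`, and `identity` are hypothetical placeholders for the distribution learning network, the other feature extractor, and the feed forward network; none of them reflects the actual layers of the embodiments.

```python
def parallel_edl(fm, dln, other, ffn):
    """Parallel wiring: both branches read the same FM; outputs are concatenated."""
    fm_d = dln(fm)                                   # distribution-adjusted FM_d
    fm_a = other(fm)                                 # FM_a from the other extractor
    fm_concat = [d + a for d, a in zip(fm_d, fm_a)]  # n x 2C concatenation
    return ffn(fm_concat)                            # enhanced FM_e


def serial_edl(fm, dln, other, ffn):
    """Serial wiring: FM_d feeds the other extractor, whose output feeds the FFN."""
    fm_d = dln(fm)
    fm_a = other(fm_d)
    return ffn(fm_a)


# Hypothetical placeholder networks, for illustration only.
def toy_dln(fm):
    return [[2.0 * v for v in row] for row in fm]


def toy_other(fm):
    return [[v + 1.0 for v in row] for row in fm]


def toy_merge(fm):
    """Map n x 2C back to n x C by summing the two halves of each row."""
    c = len(fm[0]) // 2
    return [[row[i] + row[i + c] for i in range(c)] for row in fm]


def identity(fm):
    return fm


fm = [[1.0, 2.0]]
fm_e_parallel = parallel_edl(fm, toy_dln, toy_other, toy_merge)  # [[4.0, 7.0]]
fm_e_serial = serial_edl(fm, toy_dln, toy_other, identity)       # [[3.0, 5.0]]
```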

Deep Enhanced Distribution Learning Network

[0207] FIG. 17 is a functional block diagram showing an example deep enhanced distribution learning network according to some embodiments. The enhanced distribution learning network may be designed in a multi-stage architecture 1700. Such a deep neural network, which may be called a deep enhanced distribution learning network (Deep EDL-Net) 1704, 1706, 1708, is illustrated in detail in FIG. 17. In each stage, the enhanced distribution learning network (either a parallel or serial EDL network combined with residual or point transformer layers) 1704, 1706, 1708 outputs an enhanced feature map, FM_e^i, in which i represents the i-th stage. After the last stage, a ⊕ operation stabilizes the deep network by estimating the residual of the input feature map 1702. For some embodiments, the networks try to learn network parameters through the chain of enhanced distribution learning network blocks, and this ⊕ operation leads to an estimate of a residual of the FM. The output feature map, FM₀ 1710, is the final enriched feature map generated through the deep EDL network.
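
The chaining of stages followed by a final element-wise addition with the input feature map may be sketched as follows. The function `deep_edl` and the placeholder `toy_stage` are assumptions; a real stage would be a parallel or serial enhanced distribution learning network.

```python
def deep_edl(fm, stages):
    """Run the stages in sequence, then add the input FM to the last output."""
    out = fm
    for stage in stages:
        out = stage(out)  # enhanced feature map of the current stage
    # Final element-wise addition: the chain of stages models a residual of fm.
    return [[x + r for x, r in zip(row_in, row_out)]
            for row_in, row_out in zip(fm, out)]


def toy_stage(fm):
    """Hypothetical placeholder for one enhanced distribution learning stage."""
    return [[0.5 * v for v in row] for row in fm]


fm = [[4.0, 8.0]]
fm_out = deep_edl(fm, [toy_stage, toy_stage])  # [[5.0, 10.0]]
```

Because the input is added back at the end, the chain only has to produce a correction to the input, which is consistent with the stabilizing role attributed to the final operation.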

Application within Learning-Based Compression Frameworks

[0208] Feature extraction in a learning-based compression system is a relatively new area compared to the classic point cloud classification or segmentation processes. As observed from the recent learning-based PCC frameworks of Wang and the ‘861 application, a feature extractor needs to be designed carefully to handle the trade-off between fine abstraction and coding time.

[0209] A deep EDL-Net architecture enriches the point feature with an extendable design. For some embodiments, the architecture may be followed by any neural network layer outputting a compatible feature map. The architecture is also applicable within both encoder and decoder of any learning-based compression framework.

Voxel-Based PCC Framework

[0210] FIG. 18 is a functional block diagram showing an example deep enhanced distribution learning network in a voxel-based point cloud compression framework according to some embodiments. In a voxel-based PCC architecture, the points are positioned in a regular voxel grid. The downsampling of points may be done by merging adjacent voxels in the regular grid to a coarser resolution. For example, some sparse CNN layers may be placed for both “Voxel Down NN Layers”. Upsampling may reconstruct the merged adjacent voxels with some, e.g., sparse CNN layers placed in both “Voxel Up NN Layers”. An example of recent voxel-based PCC is proposed in Wang and the ‘798 application.

[0211] A voxel-based PCC architecture may be combined with Deep EDL-Net blocks as illustrated in the architecture 1800 in FIG. 18. For example, in between the voxel upsampling layers 1 (1822) and 2 (1818) in the decoder, a deep EDL-Net 1820 may be inserted to enhance the feature map upsampled from the Voxel Up NN Layers 2 block 1818. FIG. 18 also shows other possible locations to apply a deep EDL-Net 1806, 1810, 1816, 1820 for some embodiments.

[0212] For some embodiments, an input point cloud 1802 may be inputted into a voxel down NN layers 1 block 1804 and the output passed through a deep EDL-Net block 1806. The output of the deep EDL-Net block 1806 may be passed through a voxel down NN layers 2 block 1808 and the output passed through a second deep EDL-Net block 1810. The output of the second deep EDL-Net block 1810 may be passed through an entropy encoder 1812 to generate a bitstream. On the decoder side, the bitstream may be passed through an entropy decoder 1814. The output of the entropy decoder 1814 may be passed through a deep EDL-Net block 1816. The output of the deep EDL-Net block 1816 may be passed through a voxel up NN layers 2 block 1818, the output of which may be passed through a second deep EDL-Net block 1820. The output of the second deep EDL-Net block 1820 may be passed through a voxel up NN layers 1 block 1822 to generate an output point cloud 1824.

Point-Based PCC Framework

[0213] FIG. 19 is a functional block diagram showing an example deep enhanced distribution learning network in a point-based point cloud compression framework according to some embodiments. A recent point-based PCC architecture is introduced in the ‘424 application. The point-based PCC architecture uses an end-to-end PCC framework, which processes point and feature analysis in the encoder and performs feature and point synthesis in the decoder. For some embodiments, on the encoder side, residuals of points surrounding a center block are computed for each block. The feature(s) is/are embedded on each block (via point analysis). These blocks and their corresponding feature(s) is/are downsampled to make a more condensed feature (via feature analysis). On the decoder side, the condensed feature(s) is/are upsampled to create a finer block (via feature synthesis). The surrounding points of each block are reconstructed (via point synthesis). In this framework, MLP and aggregation layers are combined in both point analysis and synthesis layers.

[0214] A point-based PCC architecture may be combined with Deep EDL-Net blocks as shown in FIG. 19. For example, in between the feature synthesis layer 1918 and point synthesis layer 1922 in the decoder, a deep EDL-Net 1920 may be inserted to enhance the feature map generated by the Feature Synthesis NN Layers block. FIG. 19 shows other locations to apply deep EDL-Net 1906, 1910, 1916, 1920 for some embodiments.

[0215] For some embodiments, an input point cloud 1902 may be inputted into a point analysis NN layers block 1904 and the output passed through a deep EDL-Net block 1906. The output of the deep EDL-Net block 1906 may be passed through a feature analysis NN layers block 1908 and the output passed through a second deep EDL-Net block 1910. The output of the second deep EDL-Net block 1910 may be passed through an entropy encoder 1912 to generate a bitstream. On the decoder side, the bitstream may be passed through an entropy decoder 1914. The output of the entropy decoder 1914 may be passed through a deep EDL-Net block 1916. The output of the deep EDL-Net block 1916 may be passed through a feature synthesis NN layers block 1918, the output of which may be passed through a second deep EDL-Net block 1920. The output of the second deep EDL-Net block 1920 may be passed through a point synthesis NN layers block 1922 to generate an output point cloud 1924.

[0216] FIG. 20 is a flowchart illustrating an example learning-based point cloud geometry process according to some embodiments. For some embodiments, an example process 2000 may include obtaining 2002 a first feature map, wherein the first feature map includes C channels, and wherein the first feature map is generated by one or more preceding neural network layers. For some embodiments, the example process 2000 may further include generating 2004 a first set of distribution parameters using a first set of neural network layers based on the first feature map. For some embodiments, the example process 2000 may further include transforming 2006 the first feature map to a second feature map based on the first set of distribution parameters. For some embodiments, the example process 2000 may further include outputting 2008 the second feature map to a succeeding neural network layer.

[0217] FIG. 21 is a flowchart illustrating an example learning-based point cloud geometry process according to some embodiments. For some embodiments, an example process 2100 may include obtaining 2102 a first feature map, wherein the first feature map includes C channels, and wherein the first feature map is generated by one or more preceding neural network layers. For some embodiments, the example process 2100 may further include generating 2104 a first set of distribution parameters using a first set of neural network layers based on the first feature map. For some embodiments, the example process 2100 may further include expanding 2106 the first set of distribution parameters to dimensionally-match the first feature map. For some embodiments, the example process 2100 may further include subtracting 2108 the expanded first set of distribution parameters from the first feature map to generate a centered feature map. For some embodiments, the example process 2100 may further include generating 2110 a second set of distribution parameters using a second set of neural network layers based on the first feature map. For some embodiments, the example process 2100 may further include expanding 2112 the second set of distribution parameters to dimensionally-match the centered feature map. For some embodiments, the example process 2100 may further include dividing 2114 the centered feature map by the expanded second set of distribution parameters to generate a normalized feature map. For some embodiments, the example process 2100 may further include outputting 2116 the normalized feature map to a succeeding neural network layer.

[0218] FIG. 22 is a flowchart illustrating an example learning-based point cloud geometry process according to some embodiments. For some embodiments, an example process 2200 may include obtaining 2202 a normalized feature map, wherein the normalized feature map includes C channels corresponding to an original feature map, and wherein the original feature map and the normalized feature map are generated separately by one or more preceding neural network layers. For some embodiments, the example process 2200 may further include generating 2204 a first set of distribution parameters using a first set of neural network layers based on the normalized feature map. For some embodiments, the example process 2200 may further include expanding 2206 the first set of distribution parameters to dimensionally-match the normalized feature map. For some embodiments, the example process 2200 may further include scaling 2208 the normalized feature map by the expanded first set of distribution parameters to generate a scaled feature map. For some embodiments, the example process 2200 may further include generating 2210 a second set of distribution parameters using a second set of neural network layers based on the normalized feature map. For some embodiments, the example process 2200 may further include expanding 2212 the second set of distribution parameters to dimensionally-match the scaled feature map. For some embodiments, the example process 2200 may further include adding 2214 the scaled feature map to the expanded second set of distribution parameters to generate a transformed feature map. For some embodiments, the example process 2200 may further include outputting 2216 the transformed feature map to a succeeding neural network layer.

[0219] While the methods and systems in accordance with some embodiments are generally discussed in the context of extended reality (XR), some embodiments may be applied to any XR contexts such as, e.g., virtual reality (VR) / mixed reality (MR) / augmented reality (AR) contexts. Also, although the term “head mounted display (HMD)” is used herein in accordance with some embodiments, some embodiments may be applied to a wearable device (which may or may not be attached to the head) capable of, e.g., XR, VR, AR, and/or MR for some embodiments.

[0220] A first example method in accordance with some embodiments may include: obtaining a first feature map, wherein the first feature map includes C channels, and wherein the first feature map is generated by one or more preceding neural network layers; generating a first set of distribution parameters using a first set of neural network layers based on the first feature map; transforming the first feature map to a second feature map based on the first set of distribution parameters; and outputting the second feature map to a succeeding neural network layer.

[0221] For some embodiments of the first example method, the first feature map further includes n feature vectors respectively corresponding to n distinct points in 3D space.

[0222] For some embodiments of the first example method, each feature vector represents a feature of the corresponding point in 3D space; and the first feature map has a dimension n x C.

[0223] Some embodiments of the first example method may further include: generating a second set of distribution parameters using a second set of neural network layers; transforming the first feature map to a third feature map based on the second set of distribution parameters; and updating the first feature map to the third feature map.

[0224] For some embodiments of the first example method, generating the first set of distribution parameters may include: updating a given feature map via neural network layers; and simplifying the updated feature map to obtain the first set of per channel distribution parameters.

[0225] For some embodiments of the first example method, transforming the first feature map may include: generating, for each of the C channels in the first feature map, a respective reshaped vector, wherein each reshaped vector is generated by reshaping a respective channel vector in the first feature map; obtaining a distribution parameter for each of the C channels corresponding to one of the respective reshaped vectors; expanding each of the distribution parameters into a respective feature channel; and transforming, for each of the C channels, each element in the reshaped vector by the expanded distribution parameter.

[0226] Some embodiments of the first example method may further include updating the first feature map by normalizing each feature element of the first feature map.

[0227] For some embodiments of the first example method, normalizing each feature element of the first feature map may include: generating a third set of distribution parameters using a third set of neural network layers; centering the first feature map to a fourth feature map based on the third set of distribution parameters; and updating the first feature map to the fourth feature map.

[0228] Some embodiments of the first example method may further include: generating a fourth set of distribution parameters from the fourth feature map; and normalizing the fourth feature map to a fifth feature map based on the fourth set of distribution parameters.

[0229] Some embodiments of the first example method may further include: generating a fifth set of distribution parameters using a fifth set of neural network layers; and normalizing the fourth feature map to a sixth feature map based on the fifth set of distribution parameters.

[0230] Some embodiments of the first example method may further include: downsampling the second feature map to generate a first feature vector, wherein downsampling the second feature map uses a pooling function.

[0231] For some embodiments of the first example method, the pooling function is selected from the group consisting of average pooling and max pooling.

[0232] Some embodiments of the first example method may further include: generating an expanded feature map by expanding the first feature vector towards a channel dimension; concatenating the second feature map with the expanded feature map to generate a concatenated feature map; and passing the concatenated feature map through a set of filtering neural network layers to generate a filtered feature map.

[0233] Some embodiments of the first example method may further include aggregating the second feature map using an additional neural network.

[0234] For some embodiments of the first example method, the additional neural network includes at least one of a sparse convolution neural network (CNN) and a multi-layer perceptron (MLP).

[0235] Some embodiments of the first example method may further include aggregating the second feature map using a residual network.

[0236] For some embodiments of the first example method, the residual network is a ResNet block.

[0237] Some embodiments of the first example method may further include aggregating the second feature map using a transformer block.

[0238] For some embodiments of the first example method, the transformer block is selected from the group consisting of a point transformer and a voxel transformer.

[0239] Some embodiments of the first example method may further include: generating a seventh feature map by aggregating the second feature map using a neural network in parallel to transform the first feature map to the second feature map; and concatenating the seventh feature map to the second feature map.

[0240] Some embodiments of the first example method may further include: repeating the learning-based point cloud geometry process one or more times to generate an eighth feature map; and adding the eighth feature map to the first feature map to generate a ninth feature map.
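
As a non-limiting illustration, the repetition and addition described above resemble a residual connection and may be sketched as follows. The sketch assumes the process can be modeled as a function `block` that maps an n x C feature map to another n x C feature map. The name `residual_repeat` and the toy `halve` block are hypothetical and are used only to make the sketch executable.

```python
def residual_repeat(fm, block, repeats):
    """Apply `block` `repeats` times, then add the result to the input map."""
    out = fm
    for _ in range(repeats):
        out = block(out)  # after the loop, `out` plays the role of the eighth feature map
    # Element-wise addition with the first feature map gives the ninth feature map.
    return [[a + b for a, b in zip(row_in, row_out)]
            for row_in, row_out in zip(fm, out)]


def halve(fm):
    """Toy stand-in for one pass of the process (not the disclosed network)."""
    return [[0.5 * x for x in row] for row in fm]


fm1 = [[2.0, 4.0]]
fm9 = residual_repeat(fm1, halve, repeats=2)
print(fm9)
```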

[0241] A first example apparatus in accordance with some embodiments may include: a processor; and a non-transitory computer-readable medium storing instructions operative, when executed by the processor, to cause the apparatus to: obtain a first feature map, wherein the first feature map includes C channels, and wherein the first feature map is generated by one or more preceding neural network layers; generate a first set of distribution parameters using a first set of neural network layers based on the first feature map; transform the first feature map to a second feature map based on the first set of distribution parameters; and output the second feature map to a succeeding neural network layer.

[0242] For some embodiments of the first example apparatus, the first feature map further includes n feature vectors respectively corresponding to n distinct points in 3D space.

[0243] For some embodiments of the first example apparatus, each feature vector represents a feature of the corresponding point in 3D space; and the first feature map has a dimension n x C.

[0244] A second example method in accordance with some embodiments may include: obtaining a first feature map, wherein the first feature map includes C channels, and wherein the first feature map is generated by one or more preceding neural network layers; generating a first set of distribution parameters using a first set of neural network layers based on the first feature map; expanding the first set of distribution parameters to dimensionally-match the first feature map; subtracting the expanded first set of distribution parameters from the first feature map to generate a centered feature map; generating a second set of distribution parameters using a second set of neural network layers based on the first feature map; expanding the second set of distribution parameters to dimensionally-match the centered feature map; dividing the centered feature map by the expanded second set of distribution parameters to generate a normalized feature map; and outputting the normalized feature map to a succeeding neural network layer.
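
As a non-limiting illustration, the centering and dividing steps of the second example method may be sketched as follows. In the disclosure, both sets of distribution parameters are produced by neural network layers. The sketch replaces those layers with the per-channel mean and the per-channel standard deviation, which is an assumption made only so that the sketch can be executed. The names `estimate_mean`, `estimate_scale`, `expand`, and `normalize`, and the value of `EPS`, are likewise hypothetical.

```python
import math

EPS = 1e-5  # small constant guarding against division by zero (value assumed)


def estimate_mean(fm):
    """Stand-in for the first set of NN layers: per-channel mean of an n x C map."""
    n = len(fm)
    return [sum(col) / n for col in zip(*fm)]


def estimate_scale(fm):
    """Stand-in for the second set of NN layers: per-channel standard deviation."""
    n = len(fm)
    mu = estimate_mean(fm)
    return [math.sqrt(sum((x - m) ** 2 for x in col) / n)
            for col, m in zip(zip(*fm), mu)]


def expand(params, n):
    """Copy a C-length parameter vector n times to match an n x C map."""
    return [list(params) for _ in range(n)]


def normalize(fm):
    n = len(fm)
    mu_map = expand(estimate_mean(fm), n)      # expanded first set of parameters
    sigma_map = expand(estimate_scale(fm), n)  # expanded second set, also based on fm
    centered = [[x - m for x, m in zip(row, m_row)]
                for row, m_row in zip(fm, mu_map)]
    return [[c / (s + EPS) for c, s in zip(row, s_row)]
            for row, s_row in zip(centered, sigma_map)]


fm = [[1.0, 10.0], [3.0, 30.0]]  # n = 2 points, C = 2 channels (toy values)
fm_n = normalize(fm)
print(fm_n)
```

In this toy input the two channels differ in scale by a factor of ten, and after normalization both channels take values close to -1 and 1.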

[0245] For some embodiments of the second example method, the first feature map further includes n feature vectors respectively corresponding to n distinct points in 3D space.

[0246] For some embodiments of the second example method, each feature vector represents a feature of the corresponding point in 3D space; and the first feature map has a dimension n x C.

[0247] For some embodiments of the second example method, expanding the first set of distribution parameters to dimensionally-match the first feature map includes copying a first distribution parameter vector one or more times to dimensionally-match a corresponding dimension of the first feature map, the first distribution parameter vector includes the first set of distribution parameters, expanding the second set of distribution parameters to dimensionally-match the centered feature map includes copying a second distribution parameter vector one or more times to dimensionally-match a corresponding dimension of the centered feature map, and the second distribution parameter vector includes the second set of distribution parameters.

[0248] A second example apparatus in accordance with some embodiments may include: a processor; and a non-transitory computer-readable medium storing instructions operative, when executed by the processor, to cause the apparatus to: obtain a first feature map, wherein the first feature map includes C channels, and wherein the first feature map is generated by one or more preceding neural network layers; generate a first set of distribution parameters using a first set of neural network layers based on the first feature map; expand the first set of distribution parameters to dimensionally-match the first feature map; subtract the expanded first set of distribution parameters from the first feature map to generate a centered feature map; generate a second set of distribution parameters using a second set of neural network layers based on the first feature map; expand the second set of distribution parameters to dimensionally-match the centered feature map; divide the centered feature map by the expanded second set of distribution parameters to generate a normalized feature map; and output the normalized feature map to a succeeding neural network layer.

[0249] A third example method in accordance with some embodiments may include: obtaining a normalized feature map, wherein the normalized feature map includes C channels corresponding to an original feature map, and wherein the original feature map and the normalized feature map are generated separately by one or more preceding neural network layers; generating a first set of distribution parameters using a first set of neural network layers based on the normalized feature map; expanding the first set of distribution parameters to dimensionally-match the normalized feature map; scaling the normalized feature map by the expanded first set of distribution parameters to generate a scaled feature map; generating a second set of distribution parameters using a second set of neural network layers based on the normalized feature map; expanding the second set of distribution parameters to dimensionally-match the scaled feature map; adding the scaled feature map to the expanded second set of distribution parameters to generate a transformed feature map; and outputting the transformed feature map to a succeeding neural network layer.
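
As a non-limiting illustration, the scaling and adding steps of the third example method may be sketched as follows. The two sets of neural network layers are passed in as callables that map an n x C feature map to a C-length parameter vector. The toy callables used in the example (one plus the per-channel maximum, and the per-channel sum of absolute values) are invented stand-ins with no basis in the disclosure, and the names `per_channel` and `scale_and_shift` are hypothetical.

```python
def per_channel(fm, reduce_fn):
    """Apply `reduce_fn` to each channel (column) of an n x C map."""
    return [reduce_fn(col) for col in zip(*fm)]


def scale_and_shift(fm_n, gain_layers, offset_layers):
    """Scale a normalized map by one parameter set, then add a second set."""
    n = len(fm_n)
    gain = gain_layers(fm_n)      # first set of distribution parameters
    offset = offset_layers(fm_n)  # second set, also based on the normalized map
    gain_map = [list(gain) for _ in range(n)]      # expansion to n x C
    offset_map = [list(offset) for _ in range(n)]  # expansion to n x C
    scaled = [[x * g for x, g in zip(row, g_row)]
              for row, g_row in zip(fm_n, gain_map)]
    return [[s + o for s, o in zip(row, o_row)]
            for row, o_row in zip(scaled, offset_map)]


def toy_gain(fm):
    """Invented stand-in for the first set of NN layers."""
    return per_channel(fm, lambda col: 1.0 + max(col))


def toy_offset(fm):
    """Invented stand-in for the second set of NN layers."""
    return per_channel(fm, lambda col: sum(abs(x) for x in col))


fm_n = [[-1.0, 0.5], [1.0, -0.5]]  # toy normalized feature map, n = 2, C = 2
fm_t = scale_and_shift(fm_n, toy_gain, toy_offset)
print(fm_t)
```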

[0250] For some embodiments of the third example method, the first feature map further includes n feature vectors respectively corresponding to n distinct points in 3D space.

[0251] For some embodiments of the third example method, each feature vector represents a feature of the corresponding point in 3D space; and the first feature map has a dimension n x C.

[0252] For some embodiments of the third example method, an input to the second set of neural network layers is the scaled feature map.

[0253] For some embodiments of the third example method, an input to the second set of neural network layers is the normalized feature map.

[0254] Some embodiments of the third example method may further include: downsampling the transformed feature map to generate a first feature vector, wherein downsampling the transformed feature map uses a pooling function.

[0255] For some embodiments of the third example method, the pooling function is selected from the group consisting of average pooling and max pooling.

[0256] Some embodiments of the third example method may further include: generating an expanded feature map by expanding the first feature vector towards a channel dimension; concatenating the first feature map with the expanded feature map to generate a concatenated feature map; and passing the concatenated feature map through filtering neural network layers to generate a filtered feature map.

[0257] Some embodiments of the third example method may further include: aggregating the first feature map using an additional neural network.

[0258] For some embodiments of the third example method, the additional neural network includes at least one of a sparse convolution neural network (CNN) and a multi-layer perceptron (MLP).

[0259] Some embodiments of the third example method may further include aggregating the first feature map using a residual network.

[0260] For some embodiments of the third example method, the residual network is a ResNet block.

[0261] Some embodiments of the third example method may further include: repeating the learning-based point cloud geometry method one or more times to generate an enhanced feature map; and adding the enhanced feature map to the first feature map to generate an enhanced output feature map.

[0262] Some embodiments of the third example method may further include repeating the learning-based point cloud geometry method one or more times within a point cloud encoder.

[0263] Some embodiments of the third example method may further include repeating the learning-based point cloud geometry method one or more times within a point cloud decoder.

[0264] For some embodiments of the third example method, the point cloud encoder processes voxels.

[0265] For some embodiments of the third example method, the point cloud encoder performs point and feature analysis.

[0266] A third example apparatus in accordance with some embodiments may include: a processor; and a non-transitory computer-readable medium storing instructions operative, when executed by the processor, to cause the apparatus to: obtain a normalized feature map, wherein the normalized feature map includes C channels corresponding to an original feature map, and wherein the original feature map and the normalized feature map are generated separately by one or more preceding neural network layers; generate a first set of distribution parameters using a first set of neural network layers based on the normalized feature map; expand the first set of distribution parameters to dimensionally-match the normalized feature map; scale the normalized feature map by the expanded first set of distribution parameters to generate a scaled feature map; generate a second set of distribution parameters using a second set of neural network layers based on the normalized feature map; expand the second set of distribution parameters to dimensionally-match the scaled feature map; add the scaled feature map to the expanded second set of distribution parameters to generate a transformed feature map; and output the transformed feature map to a succeeding neural network layer.

[0267] An example apparatus in accordance with some embodiments may include at least one processor configured to perform any one of the methods listed above.

[0268] An example apparatus in accordance with some embodiments may include a computer-readable medium storing instructions for causing one or more processors to perform any one of the methods listed above.

[0269] An example apparatus in accordance with some embodiments may include at least one processor and at least one non-transitory computer-readable medium storing instructions for causing the at least one processor to perform any one of the methods listed above.

[0270] An example apparatus in accordance with some embodiments may include a computer-readable medium storing a scene description file generated according to any one of the methods listed above.

[0271] An example apparatus in accordance with some embodiments may include a signal including a scene description file generated according to any one of the methods listed above.

[0272] This disclosure describes a variety of aspects, including tools, features, embodiments, models, approaches, etc. Many of these aspects are described with specificity and, at least to show the individual characteristics, are often described in a manner that may sound limiting. However, this is for purposes of clarity in description, and does not limit the disclosure or scope of those aspects. Indeed, all of the different aspects can be combined and interchanged to provide further aspects. Moreover, the aspects can be combined and interchanged with aspects described in earlier filings as well.

[0273] The aspects described and contemplated in this disclosure can be implemented in many different forms. While some embodiments are illustrated specifically, other embodiments are contemplated, and the discussion of particular embodiments does not limit the breadth of the implementations. At least one of the aspects generally relates to video encoding and decoding, and at least one other aspect generally relates to transmitting a bitstream generated or encoded. These and other aspects can be implemented as a method, an apparatus, a computer readable storage medium having stored thereon instructions for encoding or decoding video data according to any of the methods described, and/or a computer readable storage medium having stored thereon a bitstream generated according to any of the methods described.

[0274] In the present disclosure, the terms “reconstructed” and “decoded” may be used interchangeably, the terms “pixel” and “sample” may be used interchangeably, the terms “image,” “picture” and “frame” may be used interchangeably. Usually, but not necessarily, the term “reconstructed” is used at the encoder side while “decoded” is used at the decoder side.

[0275] The terms HDR (high dynamic range) and SDR (standard dynamic range) often convey specific values of dynamic range to those of ordinary skill in the art. However, additional embodiments are also intended in which a reference to HDR is understood to mean “higher dynamic range” and a reference to SDR is understood to mean “lower dynamic range.” Such additional embodiments are not constrained by any specific values of dynamic range that might often be associated with the terms “high dynamic range” and “standard dynamic range.”

[0276] Various methods are described herein, and each of the methods includes one or more steps or actions for achieving the described method. Unless a specific order of steps or actions is required for proper operation of the method, the order and/or use of specific steps and/or actions may be modified or combined. Additionally, terms such as “first”, “second”, etc. may be used in various embodiments to modify an element, component, step, operation, etc., such as, for example, a “first decoding” and a “second decoding”. Use of such terms does not imply an ordering to the modified operations unless specifically required. So, in this example, the first decoding need not be performed before the second decoding, and may occur, for example, before, during, or in an overlapping time period with the second decoding.

[0277] Various numeric values may be used in the present disclosure. The specific values are for example purposes, and the aspects described are not limited to these specific values.

[0278] Embodiments described herein may be carried out by computer software implemented by a processor or other hardware, or by a combination of hardware and software. As a non-limiting example, the embodiments can be implemented by one or more integrated circuits. The processor can be of any type appropriate to the technical environment and can encompass one or more of microprocessors, general purpose computers, special purpose computers, and processors based on a multi-core architecture, as non-limiting examples.

[0279] Various implementations involve decoding. “Decoding”, as used in this disclosure, can encompass all or part of the processes performed, for example, on a received encoded sequence in order to produce a final output suitable for display. In various embodiments, such processes include one or more of the processes typically performed by a decoder, for example, entropy decoding, inverse quantization, inverse transformation, and differential decoding. In various embodiments, such processes also, or alternatively, include processes performed by a decoder of various implementations described in this disclosure, for example, extracting a picture from a tiled (packed) picture, determining an upsampling filter to use and then upsampling a picture, and flipping a picture back to its intended orientation.

[0280] As further examples, in one embodiment “decoding” refers only to entropy decoding, in another embodiment “decoding” refers only to differential decoding, and in another embodiment “decoding” refers to a combination of entropy decoding and differential decoding. Whether the phrase “decoding process” is intended to refer specifically to a subset of operations or generally to the broader decoding process will be clear based on the context of the specific descriptions.

[0281] Various implementations involve encoding. In an analogous way to the above discussion about “decoding”, “encoding” as used in this disclosure can encompass all or part of the processes performed, for example, on an input video sequence in order to produce an encoded bitstream. In various embodiments, such processes include one or more of the processes typically performed by an encoder, for example, partitioning, differential encoding, transformation, quantization, and entropy encoding. In various embodiments, such processes also, or alternatively, include processes performed by an encoder of various implementations described in this disclosure.

[0282] As further examples, in one embodiment “encoding” refers only to entropy encoding, in another embodiment “encoding” refers only to differential encoding, and in another embodiment “encoding” refers to a combination of differential encoding and entropy encoding. Whether the phrase “encoding process” is intended to refer specifically to a subset of operations or generally to the broader encoding process will be clear based on the context of the specific descriptions.

[0283] When a figure is presented as a flow diagram, it should be understood that it also provides a block diagram of a corresponding apparatus. Similarly, when a figure is presented as a block diagram, it should be understood that it also provides a flow diagram of a corresponding method/process.

[0284] Various embodiments refer to rate distortion optimization. In particular, during the encoding process, the balance or trade-off between the rate and distortion is usually considered, often given the constraints of computational complexity. The rate distortion optimization is usually formulated as minimizing a rate distortion function, which is a weighted sum of the rate and of the distortion. There are different approaches to solve the rate distortion optimization problem. For example, the approaches may be based on an extensive testing of all encoding options, including all considered modes or coding parameters values, with a complete evaluation of their coding cost and related distortion of the reconstructed signal after coding and decoding. Faster approaches may also be used, to save encoding complexity, in particular with computation of an approximated distortion based on the prediction or the prediction residual signal, not the reconstructed one. A mix of these two approaches can also be used, such as by using an approximated distortion for only some of the possible encoding options, and a complete distortion for other encoding options. Other approaches only evaluate a subset of the possible encoding options. More generally, many approaches employ any of a variety of techniques to perform the optimization, but the optimization is not necessarily a complete evaluation of both the coding cost and related distortion.
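
As a non-limiting illustration, an exhaustive form of the rate distortion optimization described above may be sketched as follows. The sketch assumes the common Lagrangian form of the weighted sum, J = D + λR, where D is the distortion, R is the rate, and λ is the weight. The candidate names, rates, and distortions are invented for illustration and are not measurements, and the names `Candidate`, `rd_cost`, and `best_option` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    rate: float        # cost in bits (toy value)
    distortion: float  # reconstruction error (toy value)


def rd_cost(candidate, lam):
    """Weighted sum of distortion and rate, J = D + lam * R (assumed form)."""
    return candidate.distortion + lam * candidate.rate


def best_option(candidates, lam):
    """Evaluate every encoding option and keep the one with the lowest cost."""
    return min(candidates, key=lambda c: rd_cost(c, lam))


options = [
    Candidate("coarse", rate=100.0, distortion=9.0),
    Candidate("medium", rate=250.0, distortion=4.0),
    Candidate("fine", rate=600.0, distortion=1.0),
]

# A larger weight on the rate favors cheaper options.
for lam in (0.001, 0.01, 0.1):
    print(lam, best_option(options, lam).name)
```

A faster approach of the kind mentioned above could keep the same selection loop but pass approximated distortion values for some or all of the candidates.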

[0285] The implementations and aspects described herein can be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed can also be implemented in other forms (for example, an apparatus or program). An apparatus can be implemented in, for example, appropriate hardware, software, and firmware. The methods can be implemented in, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.

[0286] Reference to “one embodiment” or “an embodiment” or “one implementation” or “an implementation”, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well as any other variations, appearing in various places throughout this disclosure are not necessarily all referring to the same embodiment.

[0287] Additionally, this disclosure may refer to “determining” various pieces of information. Determining the information can include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.

[0288] Further, this disclosure may refer to “accessing” various pieces of information. Accessing the information can include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, moving the information, copying the information, calculating the information, determining the information, predicting the information, or estimating the information.

[0289] Additionally, this disclosure may refer to “receiving” various pieces of information. Receiving is, as with “accessing”, intended to be a broad term. Receiving the information can include one or more of, for example, accessing the information, or retrieving the information (for example, from memory). Further, “receiving” is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.

[0290] It is to be appreciated that the use of any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended for as many items as are listed.

[0291] Also, as used herein, the word “signal” refers to, among other things, indicating something to a corresponding decoder. For example, in certain embodiments the encoder signals a particular one of a plurality of parameters for region-based filter parameter selection for de-artifact filtering. In this way, in an embodiment the same parameter is used at both the encoder side and the decoder side. Thus, for example, an encoder can transmit (explicit signaling) a particular parameter to the decoder so that the decoder can use the same particular parameter. Conversely, if the decoder already has the particular parameter as well as others, then signaling can be used without transmitting (implicit signaling) to simply allow the decoder to know and select the particular parameter. By avoiding transmission of any actual functions, a bit savings is realized in various embodiments. It is to be appreciated that signaling can be accomplished in a variety of ways. For example, one or more syntax elements, flags, and so forth are used to signal information to a corresponding decoder in various embodiments. While the preceding relates to the verb form of the word “signal”, the word “signal” can also be used herein as a noun.

[0292] Implementations can produce a variety of signals formatted to carry information that can be, for example, stored or transmitted. The information can include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal can be formatted to carry the bitstream of a described embodiment. Such a signal can be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting can include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries can be, for example, analog or digital information. The signal can be transmitted over a variety of different wired or wireless links, as is known. The signal can be stored on a processor-readable medium.

[0293] We describe a number of embodiments. Features of these embodiments can be provided alone or in any combination, across various claim categories and types. Further, embodiments can include one or more of the following features, devices, or aspects, alone or in any combination, across various claim categories and types:

• Adapting residues at an encoder according to any of the embodiments discussed.

• A bitstream or signal that includes one or more of the described syntax elements, or variations thereof.

• A bitstream or signal that includes syntax conveying information generated according to any of the embodiments described.

• Inserting in the signaling syntax elements that enable the decoder to adapt residues in a manner corresponding to that used by an encoder.

• Creating and/or transmitting and/or receiving and/or decoding a bitstream or signal that includes one or more of the described syntax elements, or variations thereof.

• Creating and/or transmitting and/or receiving and/or decoding according to any of the embodiments described.

• A method, process, apparatus, medium storing instructions, medium storing data, or signal according to any of the embodiments described.

• A TV, set-top box, cell phone, tablet, or other electronic device that performs adaptation of filter parameters according to any of the embodiments described.

• A TV, set-top box, cell phone, tablet, or other electronic device that performs adaptation of filter parameters according to any of the embodiments described, and that displays (e.g. using a monitor, screen, or other type of display) a resulting image.

• A TV, set-top box, cell phone, tablet, or other electronic device that selects (e.g. using a tuner) a channel to receive a signal including an encoded image, and performs adaptation of filter parameters according to any of the embodiments described.

• A TV, set-top box, cell phone, tablet, or other electronic device that receives (e.g. using an antenna) a signal over the air that includes an encoded image, and performs adaptation of filter parameters according to any of the embodiments described.

[0294] Note that various hardware elements of one or more of the described embodiments are referred to as “modules” that carry out (i.e., perform, execute, and the like) various functions that are described herein in connection with the respective modules. As used herein, a module includes hardware (e.g., one or more processors, one or more microprocessors, one or more microcontrollers, one or more microchips, one or more application-specific integrated circuits (ASICs), one or more field programmable gate arrays (FPGAs), one or more memory devices) deemed suitable by those of skill in the relevant art for a given implementation. Each described module may also include instructions executable for carrying out the one or more functions described as being carried out by the respective module, and it is noted that those instructions could take the form of or include hardware (i.e., hardwired) instructions, firmware instructions, software instructions, and/or the like, and may be stored in any suitable non-transitory computer-readable medium or media, such as commonly referred to as RAM, ROM, etc.

[0295] Although features and elements are described above in particular combinations, one of ordinary skill in the art will appreciate that each feature or element can be used alone or in any combination with the other features and elements. In addition, the methods described herein may be implemented in a computer program, software, or firmware incorporated in a computer-readable medium for execution by a computer or processor. Examples of computer-readable storage media include, but are not limited to, a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs). A processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.

Claims

1. A learning-based point cloud geometry method, the method comprising: obtaining a first feature map, wherein the first feature map comprises C channels, and wherein the first feature map is generated by one or more preceding neural network layers; generating a first set of distribution parameters using a first set of neural network layers based on the first feature map; transforming the first feature map to a second feature map based on the first set of distribution parameters; and encoding a bitstream based on the second feature map.

2. A learning-based point cloud geometry method, the method comprising: obtaining a first feature map, wherein the first feature map comprises C channels, and wherein the first feature map is generated by one or more preceding neural network layers; generating a first set of distribution parameters using a first set of neural network layers based on the first feature map; transforming the first feature map to a second feature map based on the first set of distribution parameters; and outputting the second feature map to a succeeding neural network layer.

3. The method of claim 2, wherein the first feature map further comprises n feature vectors respectively corresponding to n distinct points in 3D space.

4. The method of claim 3, wherein each feature vector represents a feature of the corresponding point in 3D space; and wherein the first feature map has a dimension n x C.

5. The method of claim 2, further comprising: generating a second set of distribution parameters using a second set of neural network layers; transforming the first feature map to a third feature map based on the second set of distribution parameters; and updating the first feature map to the third feature map.

6. The method of claim 2, wherein generating the first set of distribution parameters comprises: updating a given feature map via neural network layers; and simplifying the updated feature map to obtain the first set of per channel distribution parameters.

7. The method of claim 2, wherein transforming the first feature map comprises: generating, for each of the C channels in the first feature map, a respective reshaped vector, wherein each reshaped vector is generated by reshaping a respective channel vector in the first feature map; obtaining a distribution parameter for each of the C channels corresponding to one of the respective reshaped vectors; expanding each of the distribution parameters into a respective feature channel; and transforming, for each of the C channels, each element in the reshaped vector by the expanded distribution parameter.

8. The method of claim 2, further comprising updating the first feature map by normalizing each feature element of the first feature map.

9. The method of claim 8, wherein normalizing each feature element of the first feature map comprises: generating a third set of distribution parameters using a third set of neural network layers; centering the first feature map to a fourth feature map based on the third set of distribution parameters; and updating the first feature map to the fourth feature map.

10. The method of claim 9, further comprising: generating a fourth set of distribution parameters from the fourth feature map; and normalizing the fourth feature map to a fifth feature map based on the fourth set of distribution parameters.

11. The method of claim 9, further comprising: generating a fifth set of distribution parameters using a fifth set of neural network layers; and normalizing the fourth feature map to a sixth feature map based on the fifth set of distribution parameters.

12. The method of claim 2, further comprising: downsampling the second feature map to generate a first feature vector, wherein downsampling the second feature map uses a pooling function.

13. The method of claim 12, wherein the pooling function is selected from the group consisting of average pooling and max pooling.

14. The method of claim 12, further comprising: generating an expanded feature map by expanding the first feature vector towards a channel dimension; concatenating the second feature map with the expanded feature map to generate a concatenated feature map; and passing the concatenated feature map through filtering neural network layers to generate a filtered feature map.

15. The method of claim 2, further comprising aggregating the second feature map using an additional neural network.

16. The method of claim 15, wherein the additional neural network comprises at least one of a sparse convolution neural network (CNN) and a multi-layer perceptron (MLP).

17. The method of claim 2, further comprising aggregating the second feature map using a residual network.

18. The method of claim 17, wherein the residual network is a ResNet block.

19. The method of claim 2, further comprising aggregating the second feature map using a transformer block.

20. The method of claim 19, wherein the transformer block is selected from the group consisting of a point transformer and a voxel transformer.

21. The method of claim 2, further comprising: generating a seventh feature map by aggregating the second feature map using a neural network in parallel to transform the first feature map to the second feature map; and concatenating the seventh feature map to the second feature map.

22. The method of claim 2, further comprising: repeating the learning-based point cloud geometry method one or more times to generate an eighth feature map; and adding the eighth feature map to the first feature map to generate a ninth feature map.

23. An apparatus comprising: a processor; and a non-transitory computer-readable medium storing instructions operative, when executed by the processor, to cause the apparatus to: obtain a first feature map, wherein the first feature map comprises C channels, and wherein the first feature map is generated by one or more preceding neural network layers; generate a first set of distribution parameters using a first set of neural network layers based on the first feature map; transform the first feature map to a second feature map based on the first set of distribution parameters; and encode a bitstream based on the second feature map.

24. An apparatus comprising: a processor; and a non-transitory computer-readable medium storing instructions operative, when executed by the processor, to cause the apparatus to: obtain a first feature map, wherein the first feature map comprises C channels, and wherein the first feature map is generated by one or more preceding neural network layers; generate a first set of distribution parameters using a first set of neural network layers based on the first feature map; transform the first feature map to a second feature map based on the first set of distribution parameters; and output the second feature map to a succeeding neural network layer.

25. The apparatus of claim 24, wherein the first feature map further comprises n feature vectors respectively corresponding to n distinct points in 3D space.

26. The apparatus of claim 24, wherein each feature vector represents a feature of the corresponding point in 3D space; and wherein the first feature map has a dimension n x C.

27. A learning-based point cloud geometry method, the method comprising: obtaining a first feature map, wherein the first feature map comprises C channels, and wherein the first feature map is generated by one or more preceding neural network layers; generating a first set of distribution parameters using a first set of neural network layers based on the first feature map; expanding the first set of distribution parameters to dimensionally-match the first feature map; subtracting the expanded first set of distribution parameters from the first feature map to generate a centered feature map; generating a second set of distribution parameters using a second set of neural network layers based on the first feature map; expanding the second set of distribution parameters to dimensionally-match the centered feature map; dividing the centered feature map by the expanded second set of distribution parameters to generate a normalized feature map; and encoding a bitstream based on the normalized feature map.

28. A learning-based point cloud geometry method, the method comprising: obtaining a first feature map, wherein the first feature map comprises C channels, and wherein the first feature map is generated by one or more preceding neural network layers; generating a first set of distribution parameters using a first set of neural network layers based on the first feature map; expanding the first set of distribution parameters to dimensionally-match the first feature map; subtracting the expanded first set of distribution parameters from the first feature map to generate a centered feature map; generating a second set of distribution parameters using a second set of neural network layers based on the first feature map; expanding the second set of distribution parameters to dimensionally-match the centered feature map; dividing the centered feature map by the expanded second set of distribution parameters to generate a normalized feature map; and outputting the normalized feature map to a succeeding neural network layer.

29. The method of claim 28, wherein the first feature map further comprises n feature vectors respectively corresponding to n distinct points in 3D space.

30. The method of claim 28, wherein each feature vector represents a feature of the corresponding point in 3D space; and wherein the first feature map has a dimension n x C.

31. The method of claim 28, wherein expanding the first set of distribution parameters to dimensionally-match the first feature map comprises copying a first distribution parameter vector one or more times to dimensionally-match a corresponding dimension of the first feature map, wherein the first distribution parameter vector comprises the first set of distribution parameters, wherein expanding the second set of distribution parameters to dimensionally-match the centered feature map comprises copying a second distribution parameter vector one or more times to dimensionally-match a corresponding dimension of the centered feature map, and wherein the second distribution parameter vector comprises the second set of distribution parameters.

32. An apparatus comprising: a processor; and a non-transitory computer-readable medium storing instructions operative, when executed by the processor, to cause the apparatus to: obtain a first feature map, wherein the first feature map comprises C channels, and wherein the first feature map is generated by one or more preceding neural network layers; generate a first set of distribution parameters using a first set of neural network layers based on the first feature map; expand the first set of distribution parameters to dimensionally-match the first feature map; subtract the expanded first set of distribution parameters from the first feature map to generate a centered feature map; generate a second set of distribution parameters using a second set of neural network layers based on the first feature map; expand the second set of distribution parameters to dimensionally-match the centered feature map; divide the centered feature map by the expanded second set of distribution parameters to generate a normalized feature map; and encode a bitstream based on the normalized feature map.

33. An apparatus comprising: a processor; and a non-transitory computer-readable medium storing instructions operative, when executed by the processor, to cause the apparatus to: obtain a first feature map, wherein the first feature map comprises C channels, and wherein the first feature map is generated by one or more preceding neural network layers; generate a first set of distribution parameters using a first set of neural network layers based on the first feature map; expand the first set of distribution parameters to dimensionally-match the first feature map; subtract the expanded first set of distribution parameters from the first feature map to generate a centered feature map; generate a second set of distribution parameters using a second set of neural network layers based on the first feature map; expand the second set of distribution parameters to dimensionally-match the centered feature map; divide the centered feature map by the expanded second set of distribution parameters to generate a normalized feature map; and output the normalized feature map to a succeeding neural network layer.

34. A learning-based point cloud geometry method, the method comprising: obtaining a normalized feature map, wherein the normalized feature map comprises C channels corresponding to an original feature map, and wherein the original feature map and the normalized feature map are generated separately by one or more preceding neural network layers; generating a first set of distribution parameters using a first set of neural network layers based on the normalized feature map; expanding the first set of distribution parameters to dimensionally-match the normalized feature map; scaling the normalized feature map by the expanded first set of distribution parameters to generate a scaled feature map; generating a second set of distribution parameters using a second set of neural network layers based on the normalized feature map; expanding the second set of distribution parameters to dimensionally-match the scaled feature map; adding the scaled feature map to the expanded second set of distribution parameters to generate a transformed feature map; and encoding a bitstream based on the transformed feature map.

35. A learning-based point cloud geometry method, the method comprising: obtaining a normalized feature map, wherein the normalized feature map comprises C channels corresponding to an original feature map, and wherein the original feature map and the normalized feature map are generated separately by one or more preceding neural network layers; generating a first set of distribution parameters using a first set of neural network layers based on the normalized feature map; expanding the first set of distribution parameters to dimensionally-match the normalized feature map; scaling the normalized feature map by the expanded first set of distribution parameters to generate a scaled feature map; generating a second set of distribution parameters using a second set of neural network layers based on the normalized feature map; expanding the second set of distribution parameters to dimensionally-match the scaled feature map; adding the scaled feature map to the expanded second set of distribution parameters to generate a transformed feature map; and outputting the transformed feature map to a succeeding neural network layer.

36. The method of claim 35, wherein the first feature map further comprises n feature vectors respectively corresponding to n distinct points in 3D space.

37. The method of claim 35, wherein each feature vector represents a feature of the corresponding point in 3D space; and wherein the first feature map has a dimension n x C.

38. The method of claim 35, wherein an input to the second set of neural network layers is the scaled feature map.

39. The method of claim 35, wherein an input to the second set of neural network layers is the normalized feature map.

40. The method of any one of claims 35-39, further comprising: downsampling the transformed feature map to generate a first feature vector, wherein downsampling the transformed feature map uses a pooling function.

41. The method of claim 40, wherein the pooling function is selected from the group consisting of average pooling and max pooling.

42. The method of any one of claims 40-41, further comprising: generating an expanded feature map by expanding the first feature vector towards a channel dimension; concatenating the first feature map with the expanded feature map to generate a concatenated feature map; and passing the concatenated feature map through filtering neural network layers to generate a filtered feature map.

43. The method of any one of claims 35-42, further comprising aggregating the first feature map using an additional neural network.

44. The method of claim 43, wherein the additional neural network comprises at least one of a sparse convolution neural network (CNN) and a multi-layer perceptron (MLP).

45. The method of any one of claims 35-42, further comprising aggregating the first feature map using a residual network.

46. The method of claim 45, wherein the residual network is a ResNet block.

47. The method of any one of claims 35-46, further comprising: repeating the learning-based point cloud geometry method one or more times to generate an enhanced feature map; and adding the enhanced feature map to the first feature map to generate an enhanced output feature map.

48. The method of any one of claims 35-47, further comprising repeating the learning-based point cloud geometry method one or more times within a point cloud encoder.

49. The method of any one of claims 35-47, further comprising repeating the learning-based point cloud geometry method one or more times within a point cloud decoder.

50. The method of any one of claims 35-48, wherein the point cloud encoder processes voxels.

51. The method of any one of claims 35-48, wherein the point cloud encoder performs point and feature analysis.

52. An apparatus comprising: a processor; and a non-transitory computer-readable medium storing instructions operative, when executed by the processor, to cause the apparatus to: obtain a normalized feature map, wherein the normalized feature map comprises C channels corresponding to an original feature map, and wherein the original feature map and the normalized feature map are generated separately by one or more preceding neural network layers; generate a first set of distribution parameters using a first set of neural network layers based on the normalized feature map; expand the first set of distribution parameters to dimensionally-match the normalized feature map; scale the normalized feature map by the expanded first set of distribution parameters to generate a scaled feature map; generate a second set of distribution parameters using a second set of neural network layers based on the normalized feature map; expand the second set of distribution parameters to dimensionally-match the scaled feature map; add the scaled feature map to the expanded second set of distribution parameters to generate a transformed feature map; and encode a bitstream based on the transformed feature map.

53. An apparatus comprising: a processor; and a non-transitory computer-readable medium storing instructions operative, when executed by the processor, to cause the apparatus to: obtain a normalized feature map, wherein the normalized feature map comprises C channels corresponding to an original feature map, and wherein the original feature map and the normalized feature map are generated separately by one or more preceding neural network layers; generate a first set of distribution parameters using a first set of neural network layers based on the normalized feature map; expand the first set of distribution parameters to dimensionally-match the normalized feature map; scale the normalized feature map by the expanded first set of distribution parameters to generate a scaled feature map; generate a second set of distribution parameters using a second set of neural network layers based on the normalized feature map; expand the second set of distribution parameters to dimensionally-match the scaled feature map; add the scaled feature map to the expanded second set of distribution parameters to generate a transformed feature map; and output the transformed feature map to a succeeding neural network layer.

54. An apparatus comprising at least one processor configured to perform the method of any one of claims 1-22, 27-31, and 34-51.

55. An apparatus comprising a computer-readable medium storing instructions for causing one or more processors to perform the method of any one of claims 1-22, 27-31, and 34-51.

56. An apparatus comprising at least one processor and at least one non-transitory computer-readable medium storing instructions for causing the at least one processor to perform the method of any one of claims 1-22, 27-31, and 34-51.

57. A computer-readable medium storing a feature map generated according to any one of claims 1-22, 27-31, and 34-51.

58. A signal including a feature map generated according to any one of claims 1-22, 27-31, and 34-51.
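
By way of non-limiting illustration only, the centering and normalizing steps recited in claims 27-28 and the scaling and adding steps recited in claims 34-35 may be sketched as follows for an n x C feature map. The sketch is an assumption-laden example and not a definitive implementation: the function names (expand, estimate_mean, estimate_std, normalize, scale_and_shift) are hypothetical, and the two estimator functions are simple statistical stand-ins for the first and second sets of neural network layers, which in the claimed method generate the distribution parameters. The small constant EPS corresponds to the epsilon that some embodiments add to avoid division by zero.

```python
import math

EPS = 1e-6  # assumed small constant guarding against division by zero


def expand(params, n):
    """Copy a length-C parameter vector n times to match an n x C feature map."""
    return [list(params) for _ in range(n)]


def estimate_mean(fm):
    """Hypothetical stand-in for the first set of neural network layers."""
    n, c_dim = len(fm), len(fm[0])
    return [sum(row[c] for row in fm) / n for c in range(c_dim)]


def estimate_std(fm_c):
    """Hypothetical stand-in for the second set of neural network layers."""
    n, c_dim = len(fm_c), len(fm_c[0])
    return [math.sqrt(sum(row[c] ** 2 for row in fm_c) / n) for c in range(c_dim)]


def normalize(fm, mean_fn=estimate_mean, std_fn=estimate_std):
    """Center, then normalize: FM_C = FM - expand(mu); FM_n = FM_C / expand(sigma)."""
    n = len(fm)
    mu = expand(mean_fn(fm), n)
    fm_c = [[x - m for x, m in zip(row, m_row)] for row, m_row in zip(fm, mu)]
    # Assumption: the centered map feeds the second estimator (as in FIG. 10);
    # other embodiments may feed the original feature map instead.
    sigma = expand(std_fn(fm_c), n)
    return [[x / (s + EPS) for x, s in zip(row, s_row)]
            for row, s_row in zip(fm_c, sigma)]


def scale_and_shift(fm_n, scale, shift):
    """Scale by an expanded per-channel vector, then add an expanded per-channel vector."""
    n = len(fm_n)
    g, b = expand(scale, n), expand(shift, n)
    scaled = [[x * gi for x, gi in zip(row, g_row)] for row, g_row in zip(fm_n, g)]
    return [[x + bi for x, bi in zip(row, b_row)] for row, b_row in zip(scaled, b)]


# Example usage with n = 2 points and C = 2 channels (values are illustrative).
fm = [[1.0, 10.0], [3.0, 30.0]]
fm_n = normalize(fm)
fm_t = scale_and_shift(fm_n, scale=[2.0, 0.5], shift=[1.0, -1.0])
```

In this sketch the expansion is an explicit row-wise copy, mirroring the copying recited in claim 31; an array library with broadcasting could achieve the same effect without materializing the copies.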