The present application claims priority from U.S. provisional patent application No. 62/689,666, entitled "Method and Apparatus for Point Cloud Coding Using Hybrid Coding Trees," filed on June 25, 2018 by Futurewei Technologies, Inc., the contents of which are incorporated herein by reference.
Detailed Description
It should be understood at the outset that although illustrative implementations of one or more embodiments are provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or in existence. The present disclosure is not limited to the illustrative embodiments, drawings, and techniques shown below (including the exemplary designs and embodiments shown and described herein), but may be modified within the scope of the appended claims along with their full scope of equivalents.
The following abbreviations apply:
ASIC: application-specific integrated circuit
CPU: central processing unit
DSP: digital signal processor
EO: electrical-to-optical
FPGA: field-programmable gate array
LoD: level of detail
OE: optical-to-electrical
PCC: point cloud coding
RAM: random-access memory
RF: radio frequency
ROM: read-only memory
RX: receiver unit
SRAM: static RAM
TCAM: ternary content-addressable memory
TX: transmitter unit
2D: two-dimensional
3D: three-dimensional
Fig. 1 is a schematic diagram of a coding system 100. The coding system 100 includes a source device 110, a medium 150, and a destination device 160.
The source device 110 includes a point cloud generator 120, an encoder 130, and an output interface 140. The point cloud generator 120 is a component adapted to generate a point cloud. The encoder 130 may also be referred to as a coder. The encoder 130 performs encoding according to a set of rules. The output interface 140 is an antenna or another component adapted to transmit data to the destination device 160. Alternatively, the point cloud generator 120, the encoder 130, and the output interface 140 are in a combination of devices.
The medium 150 is a local area network, a wireless network, the internet, or another suitable medium. The medium 150 transfers data between the source device 110 and the destination device 160.
The destination device 160 includes an input interface 170, a decoder 180, and a projector 190. The input interface 170 is an antenna or another component adapted to receive data from the source device 110. The decoder 180 may also be referred to as a coder. The decoder 180 performs decoding according to a set of rules. The projector 190 is a component adapted to project a point cloud. Alternatively, the input interface 170, the decoder 180, and the projector 190 are in a combination of devices.
In operation, in the source device 110, the point cloud generator 120 captures the point cloud, the encoder 130 encodes the point cloud to create an encoded point cloud, and the output interface 140 transmits the encoded point cloud to the destination device 160 over the medium 150. The source device 110 may store the point cloud or the encoded point cloud locally, or the source device 110 may instruct the point cloud or the encoded point cloud to be stored on another device. In the destination device 160, the input interface 170 receives the encoded point cloud from the source device 110, the decoder 180 decodes the encoded point cloud to obtain a decoded point cloud, and the projector 190 projects the decoded point cloud. The decoder 180 may decode the encoded point cloud in a manner inverse to how the encoder 130 encodes the point cloud. The destination device 160 stores the encoded point cloud or the decoded point cloud locally, or the destination device 160 instructs the encoded point cloud or the decoded point cloud to be stored on another device.
The point cloud generator 120 may be a lidar, a conventional camera, an infrared camera, a time-of-flight camera, a laser system, a scanner, or another device that scans objects and generates a point cloud representing the objects. The objects may be vehicles, and the point cloud may therefore be 3D. The point cloud may include hundreds of thousands or millions of points. A point cloud therefore requires a large amount of data to encode and a large amount of bandwidth to communicate. There is thus a need for efficient coding and communication of point clouds.
The encoder 130 and the decoder 180 perform PCC of the point cloud. PCC comprises attribute coding and geometry coding. Coding comprises encoding and decoding. Attribute coding codes attributes such as color, reflectivity, and transparency. Geometry coding codes the positions of points in space. Octree coding is a type of geometry coding.
Octree coding starts from a parent node, which is a 3D cube containing all of the points in the point cloud; divides the parent node into eight child nodes, which are also 3D cubes; and repeats the division until a desired endpoint is reached. At each division, the node being divided is a parent node, and the nodes it is divided into are child nodes. Each parent node is coded with eight bits, where each bit indicates whether the corresponding child node includes a point. Some parent nodes may not have points distributed throughout their 3D space. Rather, those parent nodes may have points along a plane or substantially along a plane. It is therefore inefficient to divide those parent nodes into eight child nodes.
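For illustration only, the following Python sketch performs one octree subdivision step and builds the eight-bit occupancy described above. It assumes integer point coordinates and a cubic parent node of side 2^node_size_log2; the child-index bit order and the helper names are assumptions of this sketch, not the syntax defined later in this disclosure.

```python
import numpy as np

def octree_occupancy(points: np.ndarray, origin: np.ndarray, node_size_log2: int):
    """Return (occupancy_byte, children) for one parent cube.

    points: (N, 3) integer coordinates lying inside the parent cube.
    origin: (3,) coordinates of the parent cube's lower corner.
    """
    half = 1 << (node_size_log2 - 1)
    # Assumed child index: 4*bx + 2*by + bz, where b* marks the upper half along that axis.
    upper = (points - origin) >= half                       # (N, 3) booleans
    child_idx = upper[:, 0] * 4 + upper[:, 1] * 2 + upper[:, 2]
    occupancy_byte = 0
    children = {}
    for idx in range(8):
        mask = child_idx == idx
        if mask.any():
            occupancy_byte |= 1 << idx                      # bit set: child contains points
            offset = half * np.array([(idx >> 2) & 1, (idx >> 1) & 1, idx & 1])
            children[idx] = (points[mask], origin + offset)
    return occupancy_byte, children
```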
Embodiments of hybrid geometry coding of point clouds are disclosed herein. In this context, hybrid geometry coding of point clouds combines 3D coding and 2D coding, such as octree coding and quadtree coding. Quadtree coding starts from a parent node, which is a 3D cube; projects the points in the parent node onto a 2D plane; and divides the plane into four child nodes, which are 2D squares. The parent node is coded with four bits, where each bit indicates whether the corresponding child node includes a point. Whether octree coding or quadtree coding is used is based on a flatness parameter, rate distortion, or another metric. Although hybrid geometry coding is discussed in connection with octree coding and quadtree coding, the embodiments also apply to other types of coding having different numbers of nodes. Furthermore, although flatness parameters and rate distortion are discussed, the embodiments also apply to other metrics that make 2D coding more efficient while maintaining an acceptable level of quality.
Fig. 2 is a flow chart illustrating a simplified coding method 200. The coding system 100 implements the method 200. The simplified coding method 200 illustrates only a portion of the coding process, so the coding process may include other steps, such as attribute coding.
At step 210, the encoder 130 performs geometric encoding on the point cloud to obtain a first encoded bitstream. Step 210 is further described below with reference to fig. 3 and 5. At step 220, the encoder 130 performs arithmetic encoding on the first encoded bitstream to obtain a second encoded bitstream. At step 230, the output interface 140 transmits the second encoded bitstream.
At step 240, the input interface 170 receives the second encoded bitstream. At step 250, the decoder 180 performs arithmetic decoding on the second encoded bitstream to obtain a first decoded bitstream. Finally, at step 260, the decoder 180 performs geometric decoding on the first decoded bitstream to obtain the point cloud. Step 260 is further described below with reference to fig. 6.
Fig. 3 is a flow chart illustrating a method 300 of hybrid geometry coding based on a flatness constraint according to an embodiment of the disclosure. Method 300 may implement step 210 in fig. 2. At step 305, the encoder 130 calculates eigenvalues λ1, λ2, λ3 of the parent node, which correspond to an orthogonal basis (x', y', z'), i.e., three orthogonal axes. (x', y', z') is rotated relative to the global coordinate system (x, y, z). The angles between (x, y, z) and (x', y', z') are Rx, Ry, Rz. At step 310, the encoder 130 calculates a flatness parameter Θ of the parent node. With λ3 being the smallest of λ1, λ2, λ3, Θ is determined as follows:
atdecision 315,encoder 130 determines whether λ3<And theta. If so, the parent node has a point along a plane or substantially along a plane, and themethod 300 proceeds to step 320. If not, the parent node has no points along the plane or substantially along the plane, and themethod 300 proceeds to step 340.
Fig. 4A is a schematic diagram of a parent node 410 having points along a plane or substantially along a plane. Thus, for the parent node 410, the method 300 will proceed to step 320. Fig. 4B is a schematic diagram of a parent node 420 without points along a plane or substantially along a plane. Thus, for the parent node 420, the method 300 will proceed to step 340.
Returning to fig. 3, at step 320, the encoder 130 sets the parent node as a quadtree node. For example, the encoder 130 encodes node_coding_mode = 1. At step 325, the encoder 130 calculates parameters of the plane. Specifically, the encoder 130 determines the eigenvector corresponding to λ3; a normal vector n corresponding to that eigenvector; a normal vector n' corresponding to the parent node; and compares n with n' to obtain an angle θ in the x direction, an angle φ in the y direction, and an angle ψ in the z direction. The encoder 130 uses θ, φ, and ψ to calculate a rotation set (Rx, Ry, Rz) having the following components:
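The components of the rotation set are not reproduced in the text above; the sketch below uses the standard per-axis rotation matrices for angles θ, φ, ψ about the x, y, z axes, which is an assumption consistent with the description rather than the disclosure's exact formula.

```python
import numpy as np

def rotation_set(theta: float, phi: float, psi: float):
    """Return (Rx, Ry, Rz) as standard rotations about the x, y, z axes."""
    cx, sx = np.cos(theta), np.sin(theta)
    cy, sy = np.cos(phi), np.sin(phi)
    cz, sz = np.cos(psi), np.sin(psi)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rx, Ry, Rz
```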
The encoder 130 uses centroid translation components tx, ty, tz to calculate the translation matrix T as follows:
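The translation matrix itself is likewise not reproduced above; a standard homogeneous translation by the centroid components (tx, ty, tz), given here only as an assumed illustration, would be:

```python
import numpy as np

def translation_matrix(tx: float, ty: float, tz: float) -> np.ndarray:
    """4x4 homogeneous matrix translating the origin by (tx, ty, tz)."""
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T
```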
T shifts the origin from the center of the parent node to the center of the plane. The encoder 130 also calculates the size of one side of the plane as 2^m, where m = QTNodeSizeLog2 is based on NodeSizeLog2 and scale_projection of the parent point cloud:
QTNodeSizeLog2 = NodeSizeLog2
or
QTNodeSizeLog2 = NodeSizeLog2 – 1. (4)
At step 330, the encoder 130 projects the points in the point cloud onto the plane. Specifically, the encoder 130 projects the points from the 3D (x, y, z) coordinate system to the 2D (u, v) coordinate system using (Rx, Ry, Rz) and T. At step 335, the encoder 130 encodes the parent node into a quadtree node.
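A hedged sketch of step 330: the points are moved into the plane's local frame and the two in-plane coordinates are kept as (u, v). The order of composition and the rounding to integer (u, v) are assumptions of this sketch.

```python
import numpy as np

def project_to_plane(points: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """points: (N, 3); R: (3, 3) combined rotation (e.g., Rz @ Ry @ Rx); t: (3,) centroid translation."""
    local = (points - t) @ R.T                 # translate to the plane centre, then rotate
    return np.rint(local[:, :2]).astype(int)   # keep (u, v); the normal axis is near zero
```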
As a first alternative to projecting the points onto the plane, the encoder 130 uses a projection axis index and a distance after performing the rotation. In that case, the encoder 130 may perform only three rotations of the points. As a second alternative to projecting the points onto the plane, the encoder 130 uses an oblique projection onto one of the child node surfaces without rotation. In that case, the encoder 130 encodes a normal vector n(x, y, z), a distance d to the centroid of the plane, and a projection plane index. For example, the projection plane index is 00 for xy, 01 for xz, and 10 for yz. The projection plane may be the plane onto which the projection of the normal is smallest, as follows:
min{nxy(x,y,0),nyz(0,y,z),nxz(x,0,z)} (5)
If the size of the projected normal is 0, the plane is parallel to the corresponding coordinate plane, and thus the encoder 130 may encode only the distance and the projection plane index. Thus, if no normal vector is present, the plane is parallel to the corresponding coordinate plane.
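The following sketch applies equation (5): the projection plane is the coordinate plane onto which the projected normal is smallest, with the two-bit index values taken from the example above (00 for xy, 01 for xz, 10 for yz).

```python
import numpy as np

def projection_plane_index(n: np.ndarray) -> int:
    """n: (3,) normal vector; returns the two-bit projection plane index."""
    sizes = {
        0b00: np.hypot(n[0], n[1]),   # |n projected onto xy| = |(nx, ny, 0)|
        0b01: np.hypot(n[0], n[2]),   # |n projected onto xz|
        0b10: np.hypot(n[1], n[2]),   # |n projected onto yz|
    }
    return min(sizes, key=sizes.get)  # smallest projected normal -> flattest plane
```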
At step 340, the encoder 130 sets the parent node as an octree node. At step 345, the encoder 130 encodes the parent node into an octree node. Finally, at step 350, the encoder 130 generates a child node. When the parent node is a quadtree node, the encoder 130 calculates the center point of the child node as follows:
ChildQTNodeSizeLog2=QTNodeSizeLog2–1 (6)
When the parent node is an octree node, the encoder 130 calculates the center point of the child node using standard octree coding.
The encoder 130 repeats method 300 until a LoD is reached, which indicates a predefined maximum number of node partitions or splits. Once the encoder 130 sets the parent node as a quadtree node, the encoder 130 encodes each subsequent child node, grandchild node, and so on into a quadtree node until method 300 reaches the LoD.
To reduce signaling, the encoder 130 may assume octree coding and perform method 300 starting at a depth indicated by minQuadTreeDepth, at a depth indicated by maxQuadTreeSizeLog2, or at a depth where the number of points in the parent node is below a threshold. The encoder 130 may encode minQuadTreeDepth or maxQuadTreeSizeLog2. Depth represents the number of node splits. If minQuadTreeDepth is 0, quadtree coding is not used. The encoder 130 may signal minQuadTreeDepth or maxQuadTreeSizeLog2.
Fig. 5 is a flow chart illustrating a cost-based hybrid geometry encoding method 500 according to an embodiment of the present disclosure. Method 500 may implement step 210 in fig. 2. At step 505, the encoder 130 calculates a cost C_O of octree coding of the parent node. At step 510, the encoder 130 calculates a cost C_Q of quadtree coding of the parent node. The costs C_O and C_Q indicate rate distortion, flatness, or another measure of the points in the point cloud. At decision 515, the encoder 130 determines whether C_Q < C_O. If so, the cost of quadtree coding is low enough, and the method 500 proceeds to step 520. If not, the cost of quadtree coding is not low enough, and the method 500 proceeds to step 540. To reduce signaling, the encoder 130 may assume octree coding and perform method 500 starting at a depth indicated by minQuadTreeDepth, at a depth indicated by maxQuadTreeSizeLog2, or at a depth where the number of points in the parent node is below a threshold. The encoder 130 may signal minQuadTreeDepth or maxQuadTreeSizeLog2. Steps 520 through 550 are similar to steps 320 through 350 of fig. 3.
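A minimal sketch of decision 515, with the two cost functions left as caller-supplied placeholders for whatever metric (rate distortion, flatness, or another measure) an implementation uses:

```python
def choose_coding_mode(node, cost_octree, cost_quadtree) -> str:
    """Return 'quadtree' only when its cost is strictly lower (decision 515)."""
    c_o = cost_octree(node)        # C_O, step 505
    c_q = cost_quadtree(node)      # C_Q, step 510
    return "quadtree" if c_q < c_o else "octree"
```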
The encoder 130 repeats method 500 until it reaches the LoD. Once the encoder 130 sets the parent node as a quadtree node, the encoder 130 encodes each subsequent child node, grandchild node, and so on into a quadtree node until method 500 reaches the LoD.
To further implement step 210 of fig. 2, which is further described in method 300 of fig. 3 and method 500 of fig. 5, the encoder 130 may encode the following syntax into a geometry bitstream syntax:
TABLE 1 geometry bitstream syntax
bitstream_size_in_bytes specifies the size of the compressed geometry bitstream in bytes. minQuadTreeDepth specifies the minimum depth of the quadtree decomposition in the current point cloud. When minQuadTreeDepth is 0, the bitstream does not contain quadtree nodes. If minQuadTreeDepth is not defined, it is 0.
GeometryNodeOccupancyCnt[depth][xN][yN][zN] represents the number of child nodes present in an octree node at location (xN, yN, zN) of a given depth. An undefined GeometryNodeOccupancyCnt value is considered to be 0. It is initialized as follows: GeometryNodeOccupancyCnt[-1][0][0][0] = 8.
NodeX[depth][nodeIdx], NodeY[depth][nodeIdx], NodeZ[depth][nodeIdx] represent the x, y, z coordinates of the nodeIdx-th node in decoding order at a given depth. They are initialized as follows: NodeX[0][0] = NodeY[0][0] = NodeZ[0][0] = 0. NumNodesAtDepth[depth] indicates the number of nodes to be decoded at a given depth. It is initialized as follows: NumNodesAtDepth[0] = 1.
Further, the encoder 130 may encode the following syntax into a geometric node syntax:
TABLE 2. geometric node syntax
A node with a depth of MaxGeometryCodeTreeDepth is the last child node and may be referred to as a leaf node. For quadtree coding, the position of a node is given by the coordinates of its lower-left corner as (uN, vN). uPn and vPn indicate the position of the parent node of the node as follows:
uPn = uN >> 1
vPn = vN >> 1. (7)
For octree coding, the position of a node is given by the coordinates of its lower-left corner as (xN, yN, zN). xPn, yPn, zPn indicate the position of the parent node of the node as follows:
xPn = xN >> 1
yPn = yN >> 1
zPn = zN >> 1. (8)
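A worked example of equations (7) and (8): the parent's coordinates are obtained by shifting one bit off the child's coordinates.

```python
uN, vN = 5, 12
uPn, vPn = uN >> 1, vN >> 1                  # quadtree case, equation (7): (2, 6)

xN, yN, zN = 5, 12, 9
xPn, yPn, zPn = xN >> 1, yN >> 1, zN >> 1    # octree case, equation (8): (2, 6, 4)
```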
NodeSizeLog2 is derived as follows: NodeSizeLog2 = MaxGeometryOctreeDepth – depth. ChildNodeSizeLog2 is derived as follows: ChildNodeSizeLog2 = NodeSizeLog2 – 1. NeighbourPattern is derived as follows:
NeighbourPattern =
((GeometryNodeOccupancyCnt[depth-1][xPn+1][yPn][zPn] != 0) << 0)
| ((GeometryNodeOccupancyCnt[depth-1][xPn-1][yPn][zPn] != 0) << 1)
| ((GeometryNodeOccupancyCnt[depth-1][xPn][yPn-1][zPn] != 0) << 2)
| ((GeometryNodeOccupancyCnt[depth-1][xPn][yPn+1][zPn] != 0) << 3)
| ((GeometryNodeOccupancyCnt[depth-1][xPn][yPn][zPn-1] != 0) << 4)
| ((GeometryNodeOccupancyCnt[depth-1][xPn][yPn][zPn+1] != 0) << 5).
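The derivation above sets one bit per face neighbour of the parent node. The Python sketch below mirrors that bit order; representing GeometryNodeOccupancyCnt[depth-1] as a dictionary keyed by (x, y, z) is an assumption of this sketch.

```python
def neighbour_pattern(occupancy_cnt: dict, xPn: int, yPn: int, zPn: int) -> int:
    """occupancy_cnt maps (x, y, z) -> child count at depth-1; missing entries count as 0."""
    neighbours = [
        (xPn + 1, yPn, zPn), (xPn - 1, yPn, zPn),   # bits 0, 1
        (xPn, yPn - 1, zPn), (xPn, yPn + 1, zPn),   # bits 2, 3
        (xPn, yPn, zPn - 1), (xPn, yPn, zPn + 1),   # bits 4, 5
    ]
    pattern = 0
    for bit, key in enumerate(neighbours):
        if occupancy_cnt.get(key, 0) != 0:
            pattern |= 1 << bit
    return pattern
```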
For quadtree coding, the neighbour pattern is assumed to be that of an empty node. single_occupancy_flag = 1 indicates that the node contains a single child node. single_occupancy_flag = 0 indicates that the node may contain multiple child nodes. occupancy_idx identifies the single child node present in the parent node. When present, OccupancyMap is set equal to 1 << occupancy_idx. occupancy_map is a bitmap that identifies the child nodes present in the parent node. When present, OccupancyMap is set equal to the output of the geometric occupancy map replacement process invoked with NeighbourPattern and occupancy_map as inputs. For quadtree coding, occupancy_map has a maximum of four input elements. For octree coding, occupancy_map has a maximum of eight input elements. The array GeometryNodeChildren[i] identifies the index of the i-th occupied child node of the parent node. GeometryNodeChildrenCnt identifies the number of child nodes in the array GeometryNodeChildren[]. For octree coding, a parent node may have one to eight child nodes. For quadtree coding, a parent node may have one to four child nodes.
When either occupancy_idx or occupancy_map is present, the following conditions apply:
DirectModeFlagPresent is derived as follows:
DirectModeFlagPresent is 1 when all of the following conditions are true:
inferred_direct_coding_mode_enabled_flag=1
NodeSizeLog2>1
GeometryNodeOccupancyCnt[depth-1][xPn][yPn][zPn]≤2
GeometryNodeOccupancyCnt[depth][xN][yN][zN]=1
Otherwise, DirectModeFlagPresent is 0.
num_points_eq_1_flag = 1 indicates that the current child node includes a single point. num_points_eq_1_flag = 0 indicates that the child node includes at least two points. If not present, num_points_eq_1_flag is inferred to be 1. num_points_minus2 indicates the number of points represented by a child node. direct_mode_flag = 1 indicates that the single child node of the parent node is a leaf node and contains one or more delta point coordinates. direct_mode_flag = 0 indicates that the single child node of the parent node is an octree node. If not present, direct_mode_flag is inferred to be 0.
When direct_mode_flag is equal to 1, the following condition applies: GeometryNodeOccupancyCnt[depth][xN][yN][zN] = 0. When direct_mode_flag is equal to 0, the following condition applies:
num_direct_points_minus1 indicates the number of points in the child node. point_rem_x[i][j], point_rem_y[i][j], point_rem_z[i][j] indicate the j-th bit of the corresponding x, y, z coordinate of the i-th point of the current child node relative to the origin of the child node identified by the index GeometryNodeChildren[0].
Fig. 6 is a flow chart illustrating a hybrid geometry decoding method 600 according to an embodiment of the disclosure. Method 600 may implement step 260 in fig. 2. At decision 610, the decoder 180 determines whether minQuadTreeDepth is 0. If so, method 600 proceeds to step 620. If not, method 600 proceeds to decision 630. Alternatively, once the decoder 180 reaches a depth at which the number of points in the parent node is below the threshold, the decoder 180 may proceed to step 620. At step 620, the decoder 180 decodes the current node as an octree node. The decoder 180 repeats step 620 until the LoD is reached.
At decision 630, the decoder 180 determines whether node_coding_mode is 0. If so, method 600 proceeds to step 640. If not, method 600 proceeds to step 650. At step 640, the decoder 180 decodes the current node as an octree node. Further, if the current node is an octree node and its parent node is also an octree node, the decoder 180 may not read additional information from the bitstream. At step 650, the decoder 180 decodes the current node as a quadtree node. If the current node is a quadtree node and its parent node is an octree node, the decoder 180 reads a scale parameter (scale_projection), offset parameters (mean_value_x, mean_value_y, mean_value_z), and normal parameters (projection_rot_x, projection_rot_y, projection_rot_z) from the bitstream, and the decoder 180 applies the inverse scale parameter to project the points onto the plane. If the current node is a quadtree node and its parent node is also a quadtree node, the decoder 180 uses the scale parameter, the offset parameters (mean_value_x, mean_value_y, mean_value_z), and the normal parameters (projection_rot_x, projection_rot_y, projection_rot_z) inherited from the parent node, and the decoder 180 projects the points onto the plane using the inverse scale parameter (scale_projection). The decoder 180 repeats decision 630 and steps 640 through 650 until it reaches the LoD.
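A simplified sketch of the per-node dispatch in method 600. The bitstream reader and the octree and quadtree node decoders are passed in as placeholders; they are assumptions of this sketch, not the disclosure's decoding process.

```python
def decode_node(reader, depth, min_quadtree_depth, decode_octree, decode_quadtree):
    """reader.read_flag() is assumed to return node_coding_mode for the current node."""
    if min_quadtree_depth == 0:
        return decode_octree(reader, depth)      # decision 610 yes -> step 620
    if reader.read_flag() == 0:                  # decision 630
        return decode_octree(reader, depth)      # step 640
    return decode_quadtree(reader, depth)        # step 650
```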
Fig. 7 is a flow chart illustrating a hybrid geometry encoding method 700 according to an embodiment of the present disclosure. The source device 110 implements method 700. At step 710, a point cloud is obtained. For example, the point cloud generator 120 records the point cloud. At step 720, a selection is made between octree coding and quadtree coding of a parent node. For example, the encoder 130 makes decision 315 in fig. 3 or decision 515 in fig. 5. At step 730, the parent node is encoded into a bitstream based on the selection. For example, the encoder 130 performs step 335 or step 345 in fig. 3, or performs step 535 or step 545 in fig. 5. Finally, at step 740, the bitstream is transmitted. For example, the output interface 140 transmits the bitstream to the destination device 160 over the medium 150.
Fig. 8 is a flow chart illustrating a hybrid geometry decoding method 800 according to an embodiment of the present disclosure. The destination device 160 implements method 800. At step 810, a bitstream is received. For example, the input interface 170 receives the bitstream from the source device 110. At step 820, a first parameter is parsed from the bitstream, the first parameter indicating a depth of the PCC. For example, the decoder 180 parses minQuadTreeDepth from the bitstream. At step 830, a first node before the depth is decoded using octree decoding. For example, after the result of decision 610 in fig. 6 is yes, the decoder 180 performs step 620. At step 840, a second parameter is parsed from the bitstream, the second parameter indicating a node coding mode of a second node at and after the depth. For example, the decoder 180 parses node_coding_mode from the bitstream. Finally, at step 850, the second node is decoded based on the second parameter. For example, the decoder 180 performs step 640 or step 650 in fig. 6.
Fig. 9 is a schematic diagram of an apparatus 900 according to an embodiment of the present disclosure. The apparatus 900 may implement the disclosed embodiments. The apparatus 900 includes an ingress port 910 and an RX 920 for receiving data; a processor, logic unit, baseband unit, or CPU 930 for processing the data; a TX 940 and an egress port 950 for transmitting the data; and a memory 960 for storing the data. The apparatus 900 may also include OE components, EO components, or RF components coupled to the ingress port 910, the RX 920, the TX 940, and the egress port 950 to provide ingress or egress of optical signals, electrical signals, or RF signals.
The processor 930 is any combination of hardware, middleware, firmware, or software. The processor 930 includes any combination of one or more CPU chips, cores, FPGAs, ASICs, or DSPs. The processor 930 is in communication with the ingress port 910, the RX 920, the TX 940, the egress port 950, and the memory 960. The processor 930 includes a hybrid geometry coding component 970, where the hybrid geometry coding component 970 implements the disclosed embodiments. Thus, the inclusion of the hybrid geometry coding component 970 provides a substantial improvement to the functionality of the apparatus 900 and enables the transformation of the apparatus 900 to a different state. Alternatively, the memory 960 stores the hybrid geometry coding component 970 as instructions, and the processor 930 executes those instructions.
The memory 960 includes any combination of disks, tape drives, or solid-state drives. The apparatus 900 may use the memory 960 as an overflow data store to store programs to be executed when selected by the apparatus 900, and to store instructions and data read by the apparatus 900 during execution of those programs, e.g., as a computer program product. The memory 960 may be volatile or non-volatile and may be any combination of ROM, RAM, TCAM, or SRAM.
A computer program product may include computer-executable instructions stored on a non-transitory medium (e.g., memory 960) that, when executed by a processor (e.g., processor 930), cause an apparatus to perform any of the embodiments.
Fig. 10 is a schematic diagram of a coding apparatus 1000. In an embodiment, the coding apparatus 1000 is implemented in a PCC device 1002 (e.g., the encoder 130 or the decoder 180). The PCC device 1002 includes receiving means 1001. The receiving means 1001 is configured to receive an image to be encoded or a bitstream to be decoded. The PCC device 1002 includes transmitting means 1007 coupled to the receiving means 1001. The transmitting means 1007 is configured to transmit a bitstream to a decoder or a decoded image to a display device.
The PCC device 1002 includes storage means 1003. The storage means 1003 is coupled to at least one of the receiving means 1001 or the transmitting means 1007. The storage means 1003 is configured to store instructions. The PCC device 1002 also includes processing means 1005. The processing means 1005 is coupled to the storage means 1003. The processing means 1005 is configured to execute the instructions stored in the storage means 1003 to perform the methods disclosed herein.
While various embodiments have been provided in the present disclosure, it can be understood that the disclosed systems and methods can be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein. For example, various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented.
Moreover, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, components, techniques, or methods without departing from the scope of the present disclosure. Other items shown or discussed as coupled may be directly coupled or may be indirectly coupled or communicating through some interface, device, or intermediate component, whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the spirit and scope disclosed herein.