Aerial view semantic segmentation label generation method based on multi-frame semantic point cloud splicing

Technical Field
The invention relates to the technical field of surround perception for automobile automatic driving, and in particular to a method for generating bird's-eye view semantic segmentation labels based on multi-frame semantic point cloud splicing.
Background
Autonomous driving is valued by more and more manufacturers as a key technology for the new generation of intelligent cars. Generally, the entire autonomous driving system consists of three major modules: a perception fusion module, a decision planning module and a control module. Perception fusion is the front-end module of the other two, and its perception precision directly determines the performance of the whole autonomous driving system.
The current perception module is no longer limited to the traditional single front-view camera (Forward Camera) configuration; manufacturers have started to perform 360-degree, blind-spot-free surround perception with multiple cameras around the vehicle body. The most common arrangement is shown in fig. 1: six cameras are arranged at the front view, rear view, left front, left rear, right front and right rear positions, collect images from different viewing angles, and feed them into a surround perception model, which directly outputs the semantic information of a Bird's Eye View (BEV). The bird's-eye view discussed herein refers specifically to the view from directly above the host vehicle. Bird's-eye view semantic information refers to the semantic segmentation of the bird's-eye view; its segmentation elements are defined according to requirements and include static objects such as lane lines and drivable areas, as well as moving objects such as vehicles and pedestrians.
In order to train such a bird's-eye view semantic segmentation model, it is naturally necessary to acquire corresponding bird's-eye view semantic segmentation labels (hereinafter referred to as BEV labels). The currently possible label acquisition modes are as follows:
The first mode: a high-precision map is generated off-line (such as the high-precision map generation method, device, equipment and readable storage medium disclosed in CN 202010597488.6), and the corresponding BEV labels are then generated from the semantic information elements of the high-precision map. This method requires a pre-built high-precision map to be extracted, after which the BEV labels are obtained directly from its semantic information.
The second mode: a drone (unmanned aerial vehicle) performs synchronized aerial photography directly above the data acquisition vehicle, and the resulting bird's-eye views are then labeled manually. The biggest disadvantage of this approach is that drone flights are usually subject to regional restrictions, so the data collection scenarios are limited. Furthermore, this acquisition approach cannot be triggered by the shadow mode, which makes subsequent closed-loop iterations of the model difficult.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a bird's-eye view semantic segmentation label generation method based on multi-frame semantic point cloud splicing, so as to obtain a low-cost automatic BEV label generation algorithm that avoids the high cost and inconvenience of drones and high-precision maps and obtains BEV labels directly from semantic point clouds and multi-frame splicing.
The technical scheme of the invention is realized as follows:
a bird's-eye view semantic segmentation label generation method based on multi-frame semantic point cloud splicing is characterized by comprising the following steps:
1) configuring sensors for data acquisition on the vehicle: arranging cameras in two or more directions of the vehicle so that the viewing angles of adjacent cameras partially overlap and 360 degrees around the vehicle body are covered; arranging the laser radar on the top of the vehicle body;
2) sensor calibration: calibrating the internal parameters of each camera and the external parameters relative to the vehicle body by using a calibration plate, and calibrating the external parameters of the laser radar relative to the vehicle body by using a camera and laser radar combined calibration method;
3) data acquisition: synchronizing the data collected by the cameras and the laser radar at the same moment, and ensuring that the timestamp difference of all data at each moment does not exceed a set value; when the scanning edge of the laser radar coincides with the optical axis of a camera, triggering that camera to expose, thereby acquiring an original image of the scene outside the vehicle body;
4) data annotation: jointly labeling the original images acquired by the cameras at the same moment and the corresponding point cloud acquired by the laser radar; on each frame of image acquired by the cameras, labeling road surface information comprising at least two static targets, namely lane lines and the drivable area; on the synchronized point cloud acquired by the laser radar, labeling 3D bounding boxes of at least two types of moving targets, namely pedestrians and vehicles;
5) generating a single-frame semantic point cloud: projecting the labeled point cloud onto each camera plane, and staining (dyeing) the point cloud with the semantic information of the images to generate the semantic point cloud;
6) splicing continuous multi-frame semantic point clouds into a unified vehicle-body coordinate system that takes a certain frame as reference, and projecting them onto a BEV canvas to obtain a dense BEV label.
In this way, semantic point clouds are generated from the image semantic information and the point cloud information, continuous multi-frame semantic point clouds are then spliced, and the result is finally projected onto the bird's-eye-view canvas and post-processed, so that the bird's-eye view semantic segmentation map is generated automatically. High-cost acquisition of bird's-eye view semantic labels by means of drones or high-precision maps is avoided, and the cost of data labeling is greatly reduced.
Further: a BEV label post-processing step follows step 6, in which hole regions left by the point cloud projection are repaired and the projected map is refined manually or morphologically to obtain the BEV road surface area label. A more accurate BEV road surface area label can thus be obtained through this post-processing repair.
Further: the external parameters of each camera are described by a yaw angle yaw, a pitch angle pitch, a roll angle roll, an X-direction translation distance tx, a Y-direction translation distance ty and a Z-direction translation distance tz; the internal parameters are the camera's x-direction and y-direction pixel focal lengths f_x, f_y and the pixel center p_x, p_y;
The projection matrix from the vehicle body coordinate system to the camera pixel coordinate system is obtained from the extrinsics and intrinsics; the specific transformation derivation is given by formulas (1) to (8):
R = R_yaw · R_pitch · R_roll    (4)
The rigid-body rotation matrix is denoted R and the translation vector T; K is the intrinsic matrix formed by the camera intrinsics. Formula (9) is derived from (6) and (8): R, T and K form a 3x4 projection matrix P,
Z_c · [u, v, 1]^T = P · [X, Y, Z, 1]^T,  P = K · [R | T]    (9)
which projects the homogeneous coordinates [X, Y, Z, 1]^T of a point in the vehicle body coordinate system to the pixel coordinates (u, v) on the camera pixel plane, where Z_c is the depth of the point in the camera coordinate system. The extrinsics and intrinsics of each camera can thus be accurately calibrated, providing the reference needed to obtain a high-precision BEV road surface area label.
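By way of illustration only, the assembly of the projection matrix P from the calibrated extrinsics and intrinsics can be sketched as follows. This is a minimal numpy sketch assuming a Z-Y-X (yaw-pitch-roll) rotation convention and the body-to-camera direction described above; all numeric values are placeholders rather than real calibration results.

```python
import numpy as np

def rotation_from_ypr(yaw: float, pitch: float, roll: float) -> np.ndarray:
    """R = R_yaw @ R_pitch @ R_roll (formula (4)), assuming rotations about Z, Y and X."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    R_yaw = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    R_pitch = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    R_roll = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return R_yaw @ R_pitch @ R_roll

def projection_matrix(yaw, pitch, roll, tx, ty, tz, fx, fy, px, py) -> np.ndarray:
    """3x4 projection matrix P = K [R | T] from the body frame to the pixel frame (formula (9))."""
    R = rotation_from_ypr(yaw, pitch, roll)
    T = np.array([[tx], [ty], [tz]])
    K = np.array([[fx, 0.0, px], [0.0, fy, py], [0.0, 0.0, 1.0]])
    return K @ np.hstack([R, T])

# Usage: project a homogeneous body-frame point to pixel coordinates (u, v).
P = projection_matrix(0.0, 0.0, 0.0, 0.0, 0.0, -1.5, 1000.0, 1000.0, 960.0, 540.0)
X_hom = np.array([2.0, 0.0, 10.0, 1.0])   # placeholder point in the body frame
uvw = P @ X_hom                           # equals Z_c * [u, v, 1]^T
u, v, Z_c = uvw[0] / uvw[2], uvw[1] / uvw[2], uvw[2]
print(u, v, Z_c)
```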
Further: the laser radar coordinate system is converted into the vehicle body coordinate system, with the following conversion relation:
[X_ego, Y_ego, Z_ego]^T = R_lidar · [X_lidar, Y_lidar, Z_lidar]^T + T_lidar    (10)
The laser radar coordinate system can thus be transformed into the vehicle body coordinate system, making the coordinate transformation convenient and accurate.
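As a concrete illustration of this rigid transform, a minimal numpy sketch follows; R_lidar and T_lidar below are placeholder values, not calibrated extrinsics.

```python
import numpy as np

# Placeholder lidar-to-vehicle-body extrinsics (rotation R_lidar, translation T_lidar).
R_lidar = np.eye(3)                    # 3x3 rotation, identity as a stand-in
T_lidar = np.array([0.0, 0.0, 1.8])    # lidar assumed mounted about 1.8 m above the body origin

def lidar_to_body(points_lidar: np.ndarray) -> np.ndarray:
    """Rigidly transform an (N, 3) lidar point cloud into the vehicle-body frame (formula (10))."""
    return points_lidar @ R_lidar.T + T_lidar

demo = np.array([[10.0, 0.0, -1.5]])   # one lidar point, 10 m ahead of the sensor
print(lidar_to_body(demo))             # -> [[10.   0.   0.3]]
```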
Further: the single-frame semantic point cloud in step 5 is generated as follows:
the point cloud is first converted into the vehicle body coordinate system using formula (10), then into the camera coordinate system using formula (6); point cloud points whose Z-direction coordinate is smaller than 0 are filtered out, and the remaining points are finally converted into pixel coordinate points using formula (8);
the process is formulated as follows:
[X_i_lidar, Y_i_lidar, Z_i_lidar]^T = R_i · (R_lidar · [X_lidar, Y_lidar, Z_lidar]^T + T_lidar) + T_i, with Z_i_lidar > 0    (11)
Z_i_lidar · [u_i_lidar, v_i_lidar, 1]^T = K_i · [X_i_lidar, Y_i_lidar, Z_i_lidar]^T    (12)
wherein (X_i_lidar, Y_i_lidar, Z_i_lidar) represents the laser radar point expressed in the i-th camera coordinate system, Z_i_lidar > 0, and (u_i_lidar, v_i_lidar) represents the coordinates of the laser radar point on the i-th camera pixel plane;
the original point cloud is perspective-transformed onto the pixel plane of camera i by formulas (11) and (12), and the resulting perspective projection map is recorded as Mask_i_lidar, where Mask_i_lidar(u, v) = 1 if a point cloud projection falls on pixel coordinate (u, v) and Mask_i_lidar(u, v) = 0 otherwise; the semantic label Mask_i_gt of the i-th camera's image is then used to stain Mask_i_lidar; the point cloud thereby acquires category attributes and becomes a semantic point cloud.
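The projection of the labeled point cloud onto camera i's pixel plane and the construction of Mask_i_lidar can be sketched as follows. This is a minimal numpy illustration; R_i, T_i, K_i and the image size are assumed to come from the calibration above, and the returned index array idx is an implementation convenience for the later staining step, not part of the patent's notation.

```python
import numpy as np

def project_points_to_camera(points_body, R_i, T_i, K_i, img_h, img_w):
    """Project body-frame points into camera i (formulas (6) and (8)), keep Z_i_lidar > 0,
    and build Mask_i_lidar together with the index of the source point of every projection."""
    cam = points_body @ R_i.T + T_i                 # body -> camera i
    keep = cam[:, 2] > 0                            # discard points behind the camera
    cam, idx = cam[keep], np.flatnonzero(keep)
    uvw = cam @ K_i.T                               # homogeneous pixel coordinates, = Z_c * [u, v, 1]
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < img_w) & (v >= 0) & (v < img_h)
    u, v, idx = u[inside], v[inside], idx[inside]
    mask_i_lidar = np.zeros((img_h, img_w), dtype=np.uint8)
    mask_i_lidar[v, u] = 1                          # Mask_i_lidar(u, v) = 1 where a point projects
    return mask_i_lidar, u, v, idx                  # idx maps each (u, v) back to its source point
```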
Further: the staining process is as follows:
(1) selecting the target category j to be stained;
(2) assuming the label value of this category is f_j, the set of pixel points corresponding to this category is:
Mask_j_i_gt = (Mask_i_gt == f_j)    (13)
(3) solving the intersection M_ij of the mask of category j and the non-zero pixel points in the point cloud projection mask:
M_ij = {(u, v) | Mask_j_i_gt(u, v) == 1, Mask_i_lidar(u, v) == 1}    (14)
(4) for this set of non-zero projection points, reversely solving the corresponding point set in the original point cloud:
P_ij = {(X_lidar, Y_lidar, Z_lidar) | the projection point of (X_lidar, Y_lidar, Z_lidar) on the i-th camera ∈ M_ij}    (15)
The point set P_ij is given the label of category j, i.e. the point cloud is stained. In this way each target category can be stained, yielding a semantic point cloud.
Further: P_ij can be found for each camera, and the final point cloud with category j is:
P_j = ∪_i P_ij    (16)
For each category of the road surface area, the point cloud can be stained according to steps (1) to (4), so that semantic point clouds P_j with the different category labels are finally obtained.
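Formulas (13)-(16) can be illustrated with the following minimal sketch, which assumes the per-camera pixel coordinates u, v, the source-point indices idx and the semantic mask come from a projection step such as the project_points_to_camera sketch above; the function names are illustrative only.

```python
import numpy as np

def stain_category(points_lidar, u, v, idx, mask_i_gt, f_j):
    """Return P_ij: the original lidar points whose projections fall on pixels of
    semantic class j (label value f_j) in camera i's semantic mask (formulas (13)-(15))."""
    mask_j_i_gt = (mask_i_gt == f_j)            # formula (13)
    hit = mask_j_i_gt[v, u]                     # membership of each projected pixel in M_ij (14)
    return points_lidar[idx[hit]]               # formula (15): back to the original 3D points

def stain_all_cameras(points_lidar, per_camera, f_j):
    """P_j as the union over all cameras of P_ij (formula (16)).
    per_camera is a list of (u, v, idx, mask_i_gt) tuples, one entry per camera."""
    parts = [stain_category(points_lidar, u, v, idx, m, f_j) for (u, v, idx, m) in per_camera]
    merged = np.concatenate(parts, axis=0) if parts else np.empty((0, 3))
    # Drop duplicate points seen by more than one camera.
    return merged if merged.size == 0 else np.unique(merged, axis=0)
```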
Further: the dense BEV label in step 6 is obtained through the following steps:
a total of 2N+1 frames, namely the N frames before and the N frames after the current frame plus the current frame itself, are selected as the raw information for generating the BEV label of the current frame; the reference frame is the current frame, labeled 0 and denoted by the subscript ref. An objectively invariant world coordinate system, denoted by the subscript w, is introduced, and points in the world coordinate system are converted into the reference coordinate system as follows:
wherein R_w and T_w are still defined according to formulas (4) and (5), with the yaw angle yaw, pitch angle pitch, roll angle roll, X-direction translation distance tx, Y-direction translation distance ty and Z-direction translation distance tz taken from the pose information of the vehicle body itself; for the m-th frame (m ∈ {-N, -(N-1), ..., -1, 0, 1, ..., N-1, N}), the semantic point cloud set P_mj of category j is first obtained through step 5, and these point clouds are then converted into the world coordinate system with the following conversion formula:
wherein the converted coordinates denote a point of the point cloud set of the j-th label category of the m-th frame; by means of formulas (17) and (18), the point cloud set of the j-th label of the m-th frame can be converted into a point cloud in the unified reference coordinate system:
converting the semantic point clouds of all frames in [-N, N] into the reference coordinate system through formula (19) yields a dense point cloud, which is then projected onto the BEV canvas to obtain a dense BEV label.
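A minimal sketch of this multi-frame splicing follows. It assumes one possible pose convention, namely that each frame's ego pose (R_w, T_w) maps body coordinates into the world frame; the patent's formulas (17) to (19) define the exact convention used.

```python
import numpy as np

def body_to_world(points_body, R_w, T_w):
    """Assumed pose convention: a frame's ego pose (R_w, T_w) maps body coordinates
    into the world frame, X_w = R_w @ X_body + T_w."""
    return points_body @ R_w.T + T_w

def world_to_reference(points_world, R_ref, T_ref):
    """Inverse of the reference frame's pose: X_ref = R_ref^T @ (X_w - T_ref)."""
    return (points_world - T_ref) @ R_ref

def splice_frames(frames, ref_pose):
    """frames: list of (P_mj, R_w, T_w) for m in [-N, N], where P_mj is the (M, 3)
    semantic point cloud of label j in frame m's body coordinates.
    Returns the dense point cloud of label j expressed in the reference body frame."""
    R_ref, T_ref = ref_pose
    merged = [world_to_reference(body_to_world(P_mj, R_w, T_w), R_ref, T_ref)
              for (P_mj, R_w, T_w) in frames]
    return np.concatenate(merged, axis=0)
```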
Further: the BEV label post-processing repairs hole regions of the point cloud projection and manually refines the label again; for moving targets, the four ground points of each 3D bounding box are projected directly onto the BEV canvas of the reference coordinate system, and the BEV road surface label and the BEV moving targets are finally fused to obtain an accurate BEV road surface area label.
Further: the set value for the timestamp difference of all data at each moment is 45 ms.
Further: one camera is arranged at each of the front view, rear view, left front, left rear, right front and right rear of the vehicle body, six cameras in total, covering 360 degrees around the vehicle body. The viewing angles of adjacent cameras thus partially overlap, and the vehicle body surroundings are covered through 360 degrees.
In summary, the invention has the following beneficial effects:
1. The method is a low-cost automatic BEV label generation algorithm: the high cost and inconvenience of drones and high-precision maps are avoided, and BEV labels are obtained directly from semantic point clouds and multi-frame splicing.
2. Semantic point clouds are generated from the image semantic information and the point cloud information, continuous multi-frame semantic point clouds are then spliced, and the result is finally projected onto the bird's-eye-view canvas and post-processed, so that the bird's-eye view semantic segmentation map is generated automatically; high-cost acquisition of bird's-eye view semantic labels by means of drones or high-precision maps is avoided, and the cost of data labeling is greatly reduced.
Drawings
FIG. 1 is a schematic view of a data acquisition sensor configuration of the vehicle;
FIG. 2-1 is a schematic view of a raw picture taken by a sensor;
FIG. 2-2 is a mask generated from the label of the original picture of FIG. 2-1;
FIGS. 2-3 are point cloud views with labeled bounding boxes;
FIG. 3-1 is a projection view of a single-frame travelable area BEV; FIG. 3-2 is a single frame lane line BEV projection;
FIG. 4-1 is a multi-frame drivable area BEV projection; fig. 4-2 are multi-frame lane line BEV projection views;
FIG. 5 shows the moving target BEV label, the refined road surface BEV label and the final fusion map;
fig. 6 is a flow chart of the overall BEV label auto-generation algorithm.
Detailed Description
The following detailed description of specific embodiments of the invention refers to the accompanying drawings.
Referring to fig. 1 to 6, the invention relates to a bird's-eye view semantic segmentation label generation method based on multi-frame semantic point cloud splicing, which comprises the following steps:
the specific implementation steps are as follows:
1. Sensors for data acquisition are configured on a vehicle, mainly comprising cameras and a radar (laser radar): the acquisition devices are arranged at the vehicle body positions shown in fig. 1 to form a data acquisition vehicle. Six cameras are arranged in the directions shown, namely front, rear, left front, left rear, right front and right rear, so that the viewing angles of adjacent cameras partially overlap and the surroundings of the vehicle body are covered through 360 degrees. For the cameras: the front-view and rear-view cameras have an HFOV of 50 degrees, a maximum range of 200 m and a focal length of 6 mm; the side-front cameras have an HFOV of 120 degrees, a maximum range of 40 m and a focal length of 2.33 mm; the side-rear cameras have an HFOV of 80 degrees, a maximum range of 70 m and a focal length of 4.14 mm. All cameras have a resolution of 2 megapixels. The laser radar is mounted on the top of the vehicle body, with a horizontal FOV of 360 degrees, a vertical FOV of approximately -20 to 20 degrees, and a scanning frequency of 20 Hz.
2. Sensor calibration: the internal parameters (intrinsics) of each camera and its external parameters (extrinsics) relative to the vehicle body (ego vehicle) are calibrated with a calibration board; the extrinsics of the laser radar relative to the vehicle body are calibrated with a joint camera and laser radar (hereinafter referred to as lidar) calibration method.
The extrinsics and intrinsics of each camera relative to the vehicle body (ego vehicle) are calibrated with a camera calibration board. The extrinsics of each camera are described by a yaw angle yaw, a pitch angle pitch, a roll angle roll, an X-direction translation distance tx, a Y-direction translation distance ty and a Z-direction translation distance tz. The intrinsics are the camera's x-direction and y-direction pixel focal lengths f_x, f_y and pixel center p_x, p_y. The projection matrix from the vehicle body coordinate system to the camera pixel coordinate system is obtained from the extrinsics and intrinsics; the specific transformation derivation is given by formulas (1)-(8).
R = R_yaw · R_pitch · R_roll    (4)
The rigid-body rotation matrix is denoted R and the translation vector T. K is the intrinsic matrix formed by the camera intrinsics. Formula (9) is derived from (6) and (8): R, T and K form a 3x4 projection matrix P,
Z_c · [u, v, 1]^T = P · [X, Y, Z, 1]^T,  P = K · [R | T]    (9)
which projects the homogeneous coordinates [X, Y, Z, 1]^T of a point in the vehicle body coordinate system to the pixel coordinates (u, v) on the camera pixel plane, where Z_c is the depth of the point in the camera coordinate system.
For the lidar, this patent only involves the transformation from the lidar coordinate system to the ego vehicle coordinate system, whose conversion relation is:
[X_ego, Y_ego, Z_ego]^T = R_lidar · [X_lidar, Y_lidar, Z_lidar]^T + T_lidar    (10)
3. Data acquisition: the data collected by the cameras and the lidar at the same moment are synchronized, and the timestamp difference of all data at each moment is guaranteed not to exceed a set value, e.g. 45 ms or lower. During data acquisition, the six camera streams must be synchronized with the lidar. In this patent, synchronization is achieved by triggering a camera's exposure when the scanning edge of the lidar coincides with that camera's optical axis, so that the original image of the scene outside the vehicle body is acquired. Thus, every time the lidar sweeps through 360 degrees, every camera is exposed once. The lidar's scanning frequency is 20 Hz, i.e. one rotation takes 50 ms, so the maximum synchronization difference between cameras is (5/6) × 50 ms ≈ 41.7 ms, which satisfies the requirement of less than 45 ms.
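To make the synchronization bound concrete, the following sketch computes the per-camera trigger times within one lidar sweep. The optical-axis azimuths are illustrative assumptions, not the calibration values of this patent.

```python
# Lidar spins at 20 Hz, i.e. one 360-degree sweep takes 50 ms; each camera is
# triggered when the scanning edge crosses its optical-axis azimuth.
SWEEP_MS = 1000.0 / 20.0                      # 50 ms per revolution

# Illustrative optical-axis azimuths (degrees) for the six cameras, assumed evenly spread.
camera_azimuths = {"front": 0, "right_front": 60, "right_rear": 120,
                   "rear": 180, "left_rear": 240, "left_front": 300}

trigger_ms = {name: az / 360.0 * SWEEP_MS for name, az in camera_azimuths.items()}
spread = max(trigger_ms.values()) - min(trigger_ms.values())
print(trigger_ms)
print(f"max inter-camera offset: {spread:.1f} ms")   # 300/360 * 50 = 41.7 ms < 45 ms
```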
4. Data annotation: the original images acquired by the cameras at the same moment (e.g. 6 original images from the six cameras) and the corresponding point cloud acquired by the lidar are labeled jointly: static road surface areas, comprising at least the drivable area and lane lines, are labeled on the images acquired by the cameras, and 3D bounding boxes of at least two types of moving targets, such as vehicles and pedestrians, are labeled on the point cloud acquired by the lidar.
Referring to figs. 2-1, 2-2 and 2-3, each frame of image collected by the cameras is labeled with road surface information, which for this patent consists of two static targets, namely lane lines and the drivable area. Moving target information is labeled on the synchronized point cloud acquired by the lidar; for this patent, two types of moving targets, namely pedestrians and vehicles, are labeled with the existing 3D-bounding-box labeling approach.
5. Generating a single-frame semantic point cloud: the labeled point cloud is projected onto each camera plane and stained with the semantic information of the images to generate the semantic point cloud.
The point cloud is first converted into the ego vehicle coordinate system using formula (10), then into the camera coordinate system using formula (6); point cloud points whose Z-direction coordinate is smaller than 0 are filtered out, and the remaining points are finally converted into pixel coordinate points using formula (8). This process can be formulated as follows:
[X_i_lidar, Y_i_lidar, Z_i_lidar]^T = R_i · (R_lidar · [X_lidar, Y_lidar, Z_lidar]^T + T_lidar) + T_i, with Z_i_lidar > 0    (11)
Z_i_lidar · [u_i_lidar, v_i_lidar, 1]^T = K_i · [X_i_lidar, Y_i_lidar, Z_i_lidar]^T    (12)
wherein (X_i_lidar, Y_i_lidar, Z_i_lidar) represents the lidar point expressed in the i-th camera coordinate system, Z_i_lidar > 0, and (u_i_lidar, v_i_lidar) represents the coordinates of the lidar point on the i-th camera pixel plane. The original point cloud is perspective-transformed onto the pixel plane of camera i by formulas (11) and (12), and the resulting perspective projection map is recorded as Mask_i_lidar, where Mask_i_lidar(u, v) = 1 if a point cloud projection falls on pixel coordinate (u, v) and Mask_i_lidar(u, v) = 0 otherwise. The semantic label Mask_i_gt of the i-th camera's image is then used to stain Mask_i_lidar; the specific staining procedure is as follows:
(1) selecting the target category j to be stained, e.g. the travelable area in this patent;
(2) assuming the label value of this category is f_j, the set of pixel points corresponding to this category is:
Mask_j_i_gt = (Mask_i_gt == f_j)    (13)
(3) solving the intersection M_ij of the mask of category j and the non-zero pixel points in the point cloud projection mask:
M_ij = {(u, v) | Mask_j_i_gt(u, v) == 1, Mask_i_lidar(u, v) == 1}    (14)
(4) for this set of non-zero projection points, reversely solving the corresponding point set in the original point cloud:
P_ij = {(X_lidar, Y_lidar, Z_lidar) | the projection point of (X_lidar, Y_lidar, Z_lidar) on the i-th camera ∈ M_ij}    (15)
The point set P_ij is given the label of category j, i.e. the point cloud is stained; the point cloud thereby acquires category attributes and becomes a semantic point cloud.
Further, P_ij can be found for each camera, and the final point cloud with category j is:
P_j = ∪_i P_ij    (16)
For each category of the road surface area, the point cloud can be stained according to steps (1) to (4), so that semantic point clouds P_j with the different category labels are finally obtained. Figs. 3-1 and 3-2 show the result of projecting a single-frame semantic point cloud of the travelable area onto the BEV canvas.
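Projecting a semantic point cloud of category j onto the BEV canvas amounts to rasterizing its X-Y coordinates into a top-down grid. The following minimal sketch assumes a 100 m x 100 m canvas at 0.1 m resolution centred on the ego vehicle; the grid extent, resolution and label value are illustrative choices, not values specified by the patent.

```python
import numpy as np

def rasterize_to_bev(points_body, label_value, bev=None,
                     x_range=(-50.0, 50.0), y_range=(-50.0, 50.0), resolution=0.1):
    """Rasterize body-frame points onto a BEV canvas centred on the ego vehicle;
    every occupied cell receives the category's label value."""
    h = int(round((x_range[1] - x_range[0]) / resolution))
    w = int(round((y_range[1] - y_range[0]) / resolution))
    if bev is None:
        bev = np.zeros((h, w), dtype=np.uint8)
    rows = ((x_range[1] - points_body[:, 0]) / resolution).astype(int)   # forward (+x) = top of canvas
    cols = ((points_body[:, 1] - y_range[0]) / resolution).astype(int)
    keep = (rows >= 0) & (rows < h) & (cols >= 0) & (cols < w)
    bev[rows[keep], cols[keep]] = label_value
    return bev

# Usage: draw a (single-frame or spliced) travelable-area point cloud with assumed label value 1.
demo_points = np.array([[5.0, -1.2, 0.0], [5.1, -1.1, 0.0]])
canvas = rasterize_to_bev(demo_points, label_value=1)
print(int(canvas.sum()))
```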
6. Splicing and projecting multi-frame semantic point clouds: continuous multi-frame semantic point clouds are spliced into a unified vehicle-body coordinate system that takes a certain frame as reference and are then projected onto the BEV canvas.
Step 5 deals with the generation of single-frame semantic point clouds, but point clouds are sparse (see figs. 3-1 and 3-2); to obtain dense semantic point clouds, continuous multi-frame semantic point clouds are therefore spliced into a unified vehicle-body coordinate system that takes a certain frame as reference. Specifically, this patent selects a total of 2N+1 frames (the plus 1 being the current frame), namely the N frames before and the N frames after the current frame, as the raw information for generating the BEV label of the current frame; the reference frame is the current frame (labeled 0) and is denoted by the subscript ref. An objectively invariant world coordinate system, denoted by the subscript w, is introduced here and, similarly to the principle of formula (6), points in the world coordinate system are transformed into the reference coordinate system as follows:
wherein R_w and T_w are still defined according to formulas (4) and (5), with the yaw angle yaw, pitch angle pitch, roll angle roll, X-direction translation distance tx, Y-direction translation distance ty and Z-direction translation distance tz taken from the pose information of the ego vehicle body itself (this information can be obtained from wheel encoders/IMU or VIO with its own set of algorithms, which is outside the scope of this patent). For the m-th frame (m ∈ {-N, -(N-1), ..., -1, 0, 1, ..., N-1, N}), the semantic point cloud set P_mj of category j is first obtained through step 5, and these point clouds are then converted into the world coordinate system with the following conversion formula:
wherein the converted coordinates denote a point of the point cloud set of the j-th label category of the m-th frame. By means of formulas (17) and (18), the point cloud set of the j-th label of the m-th frame can be converted into a point cloud in the unified reference coordinate system:
Converting the semantic point clouds of all frames in [-N, N] into the reference coordinate system through formula (19) yields a dense point cloud, which is then projected onto the BEV canvas to obtain a dense BEV label. Figs. 4-1 and 4-2 show the dense point cloud projection of the drivable road surface area (with N = 5).
7. BEV label post-processing: the map after point cloud projection is refined manually or morphologically to obtain the BEV road surface area label.
The BEV label generated in step 6 contains more or fewer holes, as shown in figs. 4-1 and 4-2, so this patent further applies morphological transformations to the map after point cloud projection to repair the hole regions, and then refines the label again manually, thereby obtaining an accurate road surface BEV label (fig. 5). For moving targets, the four ground points of each 3D bounding box (see figs. 2-3) are projected directly onto the BEV canvas of the reference coordinate system (fig. 5). Finally, the BEV road surface label and the BEV moving targets are fused to obtain the accurate BEV road surface area label (fig. 5). The whole algorithm flow of the present application is summarized in the flow chart shown in fig. 6.
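The hole repair and fusion described above can be approximated with standard image morphology; the following minimal OpenCV sketch illustrates the idea. The kernel size, label values and the bounding-box ground-point format are assumptions and do not reproduce the patent's exact manual refine procedure.

```python
import cv2
import numpy as np

def fill_holes(bev_label: np.ndarray, kernel_size: int = 5) -> np.ndarray:
    """Morphological closing to repair small hole regions left by the sparse point projection."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    return cv2.morphologyEx(bev_label, cv2.MORPH_CLOSE, kernel)

def draw_moving_target(bev_label, ground_cells, label_value):
    """Fill the quadrilateral formed by the four ground points of a 3D bounding box,
    already converted to BEV cell coordinates (col, row), on the BEV canvas."""
    poly = np.asarray(ground_cells, dtype=np.int32).reshape(-1, 1, 2)
    cv2.fillPoly(bev_label, [poly], int(label_value))
    return bev_label

# Usage: close holes in the road-surface label, then draw a vehicle box on top (assumed label 3).
road_bev = np.zeros((1000, 1000), dtype=np.uint8)
road_bev[400:600, 450:470] = 1
fused = fill_holes(road_bev)
fused = draw_moving_target(fused, [(480, 500), (500, 500), (500, 540), (480, 540)], 3)
print(int(fused.max()))
```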
Finally, it should be noted that the above-mentioned examples of the present invention are only illustrative and are not intended to limit the embodiments of the present invention. While the invention has been described in detail with reference to preferred embodiments, it will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the invention; it is impossible to exhaustively enumerate all embodiments here. All obvious changes and modifications of the present invention fall within its scope of protection.