CN114445593A - Aerial view semantic segmentation label generation method based on multi-frame semantic point cloud splicing - Google Patents

Aerial view semantic segmentation label generation method based on multi-frame semantic point cloud splicing
Download PDF

Info

Publication number
CN114445593A
CN114445593A (application CN202210114639.7A)
Authority
CN
China
Prior art keywords
point cloud
camera
semantic
frame
coordinate system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210114639.7A
Other languages
Chinese (zh)
Other versions
CN114445593B (en)
Inventor
詹东旭
冯绪杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Changan Automobile Co Ltd
Original Assignee
Chongqing Changan Automobile Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Changan Automobile Co Ltd
Priority to CN202210114639.7A (CN114445593B/en)
Publication of CN114445593A (CN114445593A/en)
Application granted
Publication of CN114445593B (CN114445593B/en)
Legal status: Active (current)
Anticipated expiration

Abstract

The invention discloses a method for generating bird's-eye view semantic segmentation labels based on multi-frame semantic point cloud splicing, which comprises the following steps: 1) configure six cameras and a laser radar on a vehicle; 2) calibrate the internal parameters of each camera and its external parameters relative to the vehicle body using a calibration plate, and calibrate the external parameters of the laser radar relative to the vehicle body; 3) synchronize the data collected by the cameras and the laser radar at the same moment; 4) jointly annotate the six original images collected by the cameras at the same moment and the corresponding laser radar point cloud image; 5) convert the annotated point cloud image into each camera plane and dye the point cloud with the semantic information of the images; 6) splice continuous multi-frame semantic point clouds into a unified vehicle body coordinate system taking a certain frame as the reference and project them onto a BEV canvas. The invention generates and splices semantic point clouds from the image semantic information and the point cloud information and finally projects them into the bird's-eye view canvas, so that bird's-eye view labels are generated automatically and the cost of data labels is reduced.

Description

Aerial view semantic segmentation label generation method based on multi-frame semantic point cloud splicing
Technical Field
The invention relates to the technical field of automobile automatic driving surrounding perception, in particular to a method for generating a bird's-eye view semantic segmentation label based on multi-frame semantic point cloud splicing.
Background
Autonomous driving, as a key technology of the new generation of intelligent vehicles, has been receiving attention from more and more manufacturers. Generally, the entire autonomous driving system consists of three major modules: the perception-fusion module, the decision-planning module and the control module. Perception fusion is the front-end module of the other two, and its perception accuracy directly determines the performance of the whole autonomous driving system.
Current perception modules are no longer limited to the traditional single forward camera configuration; manufacturers have begun to use multiple cameras around the vehicle body to achieve 360-degree surround perception without blind spots. The most common arrangement is shown in Fig. 1: six cameras are installed at the front view, rear view, left-front, left-rear, right-front and right-rear positions, collect image information from different viewing angles and feed it into a surround perception model, which directly outputs semantic information in the Bird's Eye View (BEV). The bird's-eye view discussed here refers specifically to the view from directly above the host vehicle. Bird's-eye view semantic information means the semantic segmentation of the bird's-eye view, whose segmentation elements are defined as required and include static elements such as lane lines and drivable areas as well as moving objects such as vehicles and pedestrians.
In order to train such a bird's-eye view semantic segmentation model, corresponding bird's-eye view semantic segmentation labels (hereinafter referred to as BEV labels) must naturally be acquired. The currently available ways of acquiring such labels are as follows:
The first way: generate a high-precision map offline (such as the high-precision map generation method, device, equipment and readable storage medium disclosed in CN 202010597488.6), and then generate the corresponding BEV labels from the semantic information elements of the high-precision map. This approach requires an already constructed high-precision map, from whose semantic information the BEV bird's-eye view is then obtained directly.
The second way: use an unmanned aerial vehicle to take synchronized aerial photographs directly above the data acquisition vehicle, and then annotate the bird's-eye view manually. The biggest disadvantage of this approach is that drones are usually subject to regional flight restrictions, so the data collection scenarios are limited. Furthermore, this acquisition approach cannot be triggered by a shadow mode, which makes subsequent closed-loop iteration of the model difficult.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a bird's-eye view semantic segmentation label generation method based on multi-frame semantic point cloud splicing so as to obtain a set of low-cost BEV label automatic generation algorithm, avoid the high cost and inconvenience of an unmanned aerial vehicle and a high-precision map, and directly obtain BEV labels by utilizing semantic point cloud and multi-frame splicing.
The technical scheme of the invention is realized as follows:
a bird's-eye view semantic segmentation label generation method based on multi-frame semantic point cloud splicing is characterized by comprising the following steps:
1) configuring sensors for data acquisition on the vehicle: arranging a camera in more than two directions of the vehicle respectively to ensure that parts of the visual angles of two adjacent cameras are overlapped together so as to cover 360 degrees around the vehicle body; the laser radar is arranged at the top of the vehicle body;
2) sensor calibration: calibrating the internal parameters of each camera and the external parameters relative to the vehicle body by using a calibration plate, and calibrating the external parameters of the laser radar relative to the vehicle body by using a camera and laser radar combined calibration method;
3) data acquisition: synchronizing data collected by a camera and a laser radar at the same moment, and ensuring that the time stamp difference of all data at each moment does not exceed a set value; when the scanning edge of the laser radar is overlapped with the optical axis of the camera, triggering the camera to expose to obtain an original image of the external image data of the vehicle body;
4) data annotation: the original images collected by each camera at the same moment and the point cloud image correspondingly collected by the laser radar are labeled jointly; road surface information is labeled on each frame of camera image, covering at least two kinds of static targets, namely lane lines and the travelable area, and 3D bounding boxes of moving targets are labeled on the synchronized point cloud image collected by the laser radar, covering at least two kinds of moving targets, namely pedestrians and vehicles;
5) generating a single-frame semantic point cloud: converting the marked point cloud picture into each camera plane, and dyeing the point cloud by using semantic information of the image to generate semantic point cloud;
6) and splicing continuous multi-frame semantic point clouds to a uniform body coordinate system taking a certain frame as a reference, and projecting the continuous multi-frame semantic point clouds to a BEV canvas to obtain a compact BEV label.
In this way, the semantic point clouds are generated by utilizing the image semantic information and the point cloud information, then continuous multi-frame semantic point clouds are spliced and finally projected into the bird's-eye view canvas and post-processed, so that the bird's-eye view semantic segmentation map is automatically generated, the situation that the bird's-eye view semantic tags are obtained in a high-cost mode such as an unmanned aerial vehicle or a high-precision map is avoided, and the cost of data tags is greatly reduced.
Further: and 6, a BEV label post-processing step is further included, some hole areas projected by point clouds are repaired, and manual or morphological refine is carried out on the map projected by the point clouds to obtain the BEV pavement area label. Thus, a more accurate BEV pavement area label can be obtained by the repair of the post-processing step.
Further: the external parameters of each camera are described by a yaw angle yaw, a pitch angle pitch, a roll angle roll, an X-direction translation distance tx, a Y-direction translation distance ty and a Z-direction translation distance tz; the internal parameters are the x-direction and y-direction pixel focal lengths f_x, f_y of the camera and the pixel center p_x, p_y.
The projection matrix from the vehicle body coordinate system to the camera pixel coordinate system can be obtained from the external and internal parameters; the transformation is derived in formulas (1) to (9):

R_yaw = [[cos(yaw), -sin(yaw), 0], [sin(yaw), cos(yaw), 0], [0, 0, 1]]   (1)

R_pitch = [[cos(pitch), 0, sin(pitch)], [0, 1, 0], [-sin(pitch), 0, cos(pitch)]]   (2)

R_roll = [[1, 0, 0], [0, cos(roll), -sin(roll)], [0, sin(roll), cos(roll)]]   (3)

R = R_yaw·R_pitch·R_roll   (4)

T = [tx, ty, tz]^T   (5)

[X_c, Y_c, Z_c]^T = R·[X, Y, Z]^T + T   (6)

K = [[f_x, 0, p_x], [0, f_y, p_y], [0, 0, 1]]   (7)

Z_c·[u, v, 1]^T = K·[X_c, Y_c, Z_c]^T   (8)

Z_c·[u, v, 1]^T = K·[R | T]·[X, Y, Z, 1]^T = P·[X, Y, Z, 1]^T   (9)

The rigid-body rotation matrix is denoted R and the translation vector T; K is the internal parameter matrix formed from the camera internal parameters; formula (9) is derived from (6) and (8). R, T and K thus form a 3×4 projection matrix P = K·[R | T] that maps the homogeneous coordinates [X, Y, Z, 1]^T of a point in the vehicle body coordinate system to the pixel coordinates (u, v) of the camera pixel plane, where Z_c is the depth of the point in the camera coordinate system. In this way the external and internal parameters of each camera can be accurately calibrated, providing the basis for obtaining a high-precision BEV road surface area label.
Further: the laser radar coordinate system is converted into the vehicle body coordinate system; the conversion relation is as follows:
[X_ego, Y_ego, Z_ego]^T = R_lidar·[X_lidar, Y_lidar, Z_lidar]^T + T_lidar   (10)

where R_lidar and T_lidar are the extrinsic rotation and translation of the laser radar relative to the vehicle body. In this way the laser radar coordinate system can be transformed into the vehicle body coordinate system, so that the coordinate transformation can be carried out accurately and conveniently.
Further: the single-frame semantic point cloud in step 5 is generated as follows:
First the point cloud is converted into the vehicle body coordinate system using formula (10), then into the camera coordinate system using formula (6); point cloud points with a Z-direction camera coordinate smaller than 0 are filtered out, and the remaining points are finally converted into pixel coordinates using formula (8). The combined formulas are:

[X_i_lidar, Y_i_lidar, Z_i_lidar]^T = R_i·(R_lidar·[X_lidar, Y_lidar, Z_lidar]^T + T_lidar) + T_i   (11)

Z_i_lidar·[u_i_lidar, v_i_lidar, 1]^T = K_i·[X_i_lidar, Y_i_lidar, Z_i_lidar]^T   (12)

where R_i, T_i and K_i are the extrinsics and intrinsics of the i-th camera, [X_i_lidar, Y_i_lidar, Z_i_lidar]^T denotes a laser radar point expressed in the i-th camera coordinate system (with Z_i_lidar > 0), and (u_i_lidar, v_i_lidar) denotes its coordinates on the i-th camera pixel plane.

By formulas (11) and (12) the original point cloud is perspective-projected onto the pixel plane of a given camera i, yielding a projection mask denoted Mask_i_lidar, where Mask_i_lidar(u, v) = 1 if a point cloud point projects onto pixel (u, v) and Mask_i_lidar(u, v) = 0 otherwise. The semantic label mask Mask_i_gt of the i-th camera image is then used to dye Mask_i_lidar, so that the point cloud acquires a category attribute and becomes a semantic point cloud.
Further: the dyeing process is as follows:
(1) select the category j to be dyed;
(2) assume the label value of this category is f_j; the set of pixel points corresponding to this category is then:
Mask_j_i_gt = (Mask_i_gt == f_j)   (13)
(3) compute the intersection M_ij of the mask of category j and the non-zero pixels of the point cloud projection mask:
M_ij = {(u, v) | Mask_j_i_gt(u, v) == 1, Mask_i_lidar(u, v) == 1}   (14)
(4) for this set of non-zero projection points, recover the corresponding point set in the original point cloud:
P_ij = {(X_lidar, Y_lidar, Z_lidar) | the projection point of (X_lidar, Y_lidar, Z_lidar) in the i-th camera ∈ M_ij}   (15)
The point set P_ij is given the label of category j, i.e. the point cloud is dyed. In this way each target category can be dyed and the point cloud becomes a semantic point cloud.
Further: P_ij can be found for each camera, and the final point cloud with category j is:

P_j = ∪_i P_ij   (16)

For each category of the road surface area, the point cloud can be dyed according to steps (1) to (4), so that semantic point clouds P_j with different category labels are finally obtained.
Further: the step 6 is to obtain a compact BEV label, which specifically comprises the following steps:
A total of 2N + 1 frames, consisting of the previous N frames, the next N frames and the current frame, are selected as the raw information for generating the BEV label of the current frame; the reference frame is the current frame, indexed 0 and denoted by the subscript ref. An objective, time-invariant world coordinate system, denoted by the subscript w, is used, and a point of the world coordinate system is converted into the reference coordinate system as follows:

[X_ref, Y_ref, Z_ref]^T = R_w_ref^T · ([X_w, Y_w, Z_w]^T - T_w_ref)   (17)

where R_w_ref and T_w_ref are still defined according to equations (4) and (5) above, with the yaw angle yaw, pitch angle pitch, roll angle roll, X-direction translation distance tx, Y-direction translation distance ty and Z-direction translation distance tz being the pose information of the vehicle body itself. For the m-th frame (m ∈ {-N, -(N-1), …, -1, 0, 1, …, N-1, N}), the semantic point cloud set P_mj of category j is first obtained by step 5, and these point clouds are then converted to the world coordinate system by:

[X_w, Y_w, Z_w]^T = R_w_m·[X_mj, Y_mj, Z_mj]^T + T_w_m   (18)

where [X_mj, Y_mj, Z_mj]^T denotes a point in the point cloud set of the j-th label category of the m-th frame. By means of equations (17) and (18), the point cloud set of the j-th label of the m-th frame can be converted into the unified reference coordinate system:

P_ref_mj = { R_w_ref^T·(R_w_m·x + T_w_m - T_w_ref) | x ∈ P_mj }   (19)

The semantic point clouds of all frames in [-N, N] are converted into the reference coordinate system through formula (19) and merged to obtain a compact point cloud, which is then projected onto the BEV canvas to obtain the compact BEV label.
Further: the BEV label post-processing repairs the hole areas of the point cloud projection, after which the labels are refined again manually; for a moving target, the four ground contact points of its 3D bounding box are projected directly onto the BEV canvas of the reference coordinate system, and finally the BEV road surface label and the BEV moving targets are fused to obtain an accurate BEV road surface area label.
Further: the set value for the timestamp difference of all data at each moment is 45 ms.
Further: the front view, the back view, the left front, the left back, the right front and the right back of the vehicle body are respectively provided with one camera, and the total number of the cameras is six, and the cameras cover 360 degrees around the vehicle body. Therefore, the visual angles of two adjacent cameras are partially overlapped, and the vehicle body can be covered by 360 degrees.
In summary, the invention has the following beneficial effects:
1. the method is a set of low-cost BEV label automatic generation algorithm, high cost and inconvenience of an unmanned aerial vehicle and a high-precision map are avoided, and BEV labels are obtained by directly utilizing semantic point cloud and multi-frame splicing.
2. According to the bird's-eye view semantic segmentation method, the semantic point clouds are generated by utilizing the image semantic information and the point cloud information, then the continuous multi-frame semantic point clouds are spliced and finally projected to the bird's-eye view canvas and subjected to post-processing, so that the bird's-eye view semantic segmentation map is automatically generated, the situation that a bird's-eye view semantic tag is obtained in a high-cost mode such as an unmanned aerial vehicle or a high-precision map is avoided, and the cost of data tags is greatly reduced.
Drawings
FIG. 1 is a schematic view of a data acquisition sensor configuration of the vehicle;
FIG. 2-1 is a schematic view of a raw picture taken by a sensor;
FIG. 2-2 is a mask generated from the label of the original picture of FIG. 2-1;
FIGS. 2-3 are point cloud views with labeled 3D bounding boxes;
FIG. 3-1 is a projection view of a single-frame travelable area BEV; FIG. 3-2 is a single frame lane line BEV projection;
FIG. 4-1 is a multi-frame drivable area BEV projection; fig. 4-2 are multi-frame lane line BEV projection views;
FIG. 5 shows the moving object BEV label, the refined road surface BEV label and the final fused map;
fig. 6 is a flow chart of the overall BEV label auto-generation algorithm.
Detailed Description
The following detailed description of specific embodiments of the invention refers to the accompanying drawings.
Referring to fig. 1 to 6, the invention relates to a bird's-eye view semantic segmentation label generation method based on multi-frame semantic point cloud splicing, which comprises the following steps:
the specific implementation steps are as follows:
1. Configure the sensors for data acquisition on a vehicle; these mainly comprise cameras and a radar (laser radar). The acquisition devices are arranged on the vehicle body as shown in Fig. 1 to form a data acquisition vehicle: six cameras are installed at the front, rear, front-left, rear-left, front-right and rear-right of the vehicle body, so that the viewing angles of two adjacent cameras partially overlap and the surroundings of the vehicle body are covered through 360 degrees. For the cameras: the front-view and rear-view cameras have an HFOV of 50 degrees, a maximum range of 200 m and a focal length of 6 mm; the side-front cameras have an HFOV of 120 degrees, a maximum range of 40 m and a focal length of 2.33 mm; the side-rear cameras have an HFOV of 80 degrees, a maximum range of 70 m and a focal length of 4.14 mm. All cameras have a resolution of 2 megapixels. The laser radar is mounted on the top of the vehicle body, with a horizontal FOV of 360 degrees, a vertical FOV of about -20 to 20 degrees, and a scanning frequency of 20 Hz.
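For reference, the sensor layout described above can be collected into a simple configuration structure; the dictionary keys below are illustrative assumptions, while the HFOV, range, focal-length and scan-rate values are those given in the text:

```python
# Illustrative configuration of the data-acquisition sensors described above.
# Dictionary keys are hypothetical; HFOV / range / focal-length / scan-rate values follow the text.
CAMERAS = {
    "front":       {"hfov_deg": 50,  "max_range_m": 200, "focal_mm": 6.00},
    "rear":        {"hfov_deg": 50,  "max_range_m": 200, "focal_mm": 6.00},
    "front_left":  {"hfov_deg": 120, "max_range_m": 40,  "focal_mm": 2.33},
    "front_right": {"hfov_deg": 120, "max_range_m": 40,  "focal_mm": 2.33},
    "rear_left":   {"hfov_deg": 80,  "max_range_m": 70,  "focal_mm": 4.14},
    "rear_right":  {"hfov_deg": 80,  "max_range_m": 70,  "focal_mm": 4.14},
}  # all cameras: 2-megapixel resolution

LIDAR = {"mount": "roof", "hfov_deg": 360, "vfov_deg": (-20, 20), "scan_hz": 20}
```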
2. Sensor calibration: calibrating an internal parameter (intrinsic) of each camera and an external parameter (extrinsic) relative to a vehicle body (ego vehicle) by using a calibration plate; and calibrating external parameters of the lidar relative to the vehicle body by using a camera and laser radar (hereinafter referred to as lidar) combined calibration method.
The external parameters (extrinsics) and internal parameters (intrinsics) of each camera relative to the vehicle body (ego vehicle) are calibrated with a camera calibration board. The external parameters of each camera are described by a yaw angle yaw, a pitch angle pitch, a roll angle roll, an X-direction translation distance tx, a Y-direction translation distance ty and a Z-direction translation distance tz. The internal parameters are the x-direction and y-direction pixel focal lengths f_x, f_y of each camera and its pixel center p_x, p_y. The projection matrix from the vehicle body coordinate system to the camera pixel coordinate system is obtained from the external and internal parameters; the specific transformation is derived in formulas (1) to (9).
Formulas (1) to (9) are as given above: R_yaw, R_pitch and R_roll are the elementary rotation matrices for the yaw, pitch and roll angles (1)-(3), R = R_yaw·R_pitch·R_roll (4), T = [tx, ty, tz]^T (5), the body-to-camera transform is [X_c, Y_c, Z_c]^T = R·[X, Y, Z]^T + T (6), K is the internal parameter matrix (7), Z_c·[u, v, 1]^T = K·[X_c, Y_c, Z_c]^T (8), and Z_c·[u, v, 1]^T = P·[X, Y, Z, 1]^T with P = K·[R | T] (9).

The rigid-body rotation matrix is denoted R and the translation vector T; K is the internal parameter matrix formed from the camera internal parameters. Formula (9) is derived from (6) and (8). It can be seen that R, T and K form a 3×4 projection matrix P that maps the homogeneous coordinates [X, Y, Z, 1]^T of a point in the vehicle body coordinate system to the pixel coordinates (u, v) of the camera pixel plane, where Z_c is the depth of the point in the camera coordinate system.
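As a numerical illustration of formulas (1)-(9), a minimal sketch is given below; it assumes the convention reconstructed above (camera coordinates obtained as R·X_body + T, so that P = K·[R | T]), and all function and variable names are illustrative rather than prescribed by the patent:

```python
import numpy as np

def rotation_from_ypr(yaw, pitch, roll):
    """R = R_yaw @ R_pitch @ R_roll, angles in radians (cf. formulas (1)-(4))."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    r_yaw   = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    r_pitch = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    r_roll  = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return r_yaw @ r_pitch @ r_roll

def projection_matrix(yaw, pitch, roll, tx, ty, tz, fx, fy, px, py):
    """3x4 matrix P = K [R | T] mapping body-frame points to pixel coordinates (cf. (5)-(9))."""
    R = rotation_from_ypr(yaw, pitch, roll)
    T = np.array([[tx], [ty], [tz]])
    K = np.array([[fx, 0.0, px], [0.0, fy, py], [0.0, 0.0, 1.0]])
    return K @ np.hstack([R, T])

def project_point(P, xyz_body):
    """Project one body-frame point; returns pixel (u, v) and the camera depth Z_c."""
    uvw = P @ np.append(np.asarray(xyz_body, dtype=float), 1.0)  # homogeneous body coordinates
    return uvw[:2] / uvw[2], uvw[2]
```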
For the Lidar, the patent only relates to the transformation from the Lidar coordinate system to the ego vehicle coordinate system, and the transformation relationship is as follows:
[X_ego, Y_ego, Z_ego]^T = R_lidar·[X_lidar, Y_lidar, Z_lidar]^T + T_lidar   (10)

where R_lidar and T_lidar are the extrinsic rotation and translation of the Lidar relative to the vehicle body.
3. Data acquisition: the data collected by the cameras and the laser radar at the same moment are synchronized, and the timestamp difference of all data at each moment is guaranteed not to exceed a set value, for example 45 ms or lower. During data acquisition the six camera channels and the Lidar must be kept synchronized. In this patent, the camera exposure is triggered when the scanning edge of the Lidar coincides with the optical axis of that camera, acquiring an original image of the scene outside the vehicle body. Thus, each time the Lidar sweeps through 360 degrees, every camera is exposed once. The Lidar scanning frequency is 20 Hz, i.e. one rotation takes 50 ms, so the maximum synchronization difference between cameras is (5/6) × 50 ms ≈ 41.7 ms, which meets the requirement of less than 45 ms.
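As a simple illustration of the synchronization check described above (a hedged sketch: the 45 ms budget comes from the text, the function and field names are illustrative):

```python
# Hedged sketch of the synchronization check: a capture cycle is kept only if the
# camera/lidar timestamp spread stays within the 45 ms budget discussed above.
MAX_SKEW_S = 0.045  # set value from the text (45 ms)

def is_synchronized(frame_timestamps):
    """frame_timestamps: dict sensor_name -> timestamp in seconds for one capture cycle."""
    ts = list(frame_timestamps.values())
    return (max(ts) - min(ts)) <= MAX_SKEW_S

# Example: cameras triggered within one 50 ms lidar sweep
print(is_synchronized({"lidar": 0.000, "cam_front": 0.002, "cam_rear": 0.027}))  # True
```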
4. Data annotation: the original images collected by each camera at the same moment (for example, 6 original images from the six cameras) and the point cloud image correspondingly collected by the laser radar are labeled jointly. Static road surface areas are labeled on the camera images, i.e. at least two kinds of static targets such as the travelable area and the lane lines, and 3D bounding boxes of at least two kinds of moving targets, such as vehicles and pedestrians, are labeled on the point cloud image collected by the laser radar.
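One possible way to organize such a jointly labeled frame in code is sketched below; the structure and field names are illustrative assumptions, not prescribed by the patent:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LabeledFrame:
    """Joint annotation for one synchronized capture cycle (illustrative layout)."""
    images: dict              # camera name -> HxWx3 image array
    semantic_masks: dict      # camera name -> HxW array of class ids (lane line, drivable area, ...)
    lidar_points: np.ndarray  # Nx3 points in the lidar coordinate system
    boxes_3d: list            # 3D bounding boxes (center, size, yaw, class) for pedestrians / vehicles
    ego_pose: tuple           # (yaw, pitch, roll, tx, ty, tz) of the vehicle body in the world frame
```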
reference is made to fig. 2-1, 2-2, and 2-3, where each frame of image collected by the camera is marked with original image road information, which is two static targets, i.e., a lane line and a drivable area for the present patent. Moving target information is marked on a synchronous point cloud picture acquired by a laser radar, and for the patent, two moving targets, namely a pedestrian and a vehicle, are marked by adopting a 3D bounding box marking mode in the prior art.
5. Generating a single-frame semantic point cloud: converting the marked point cloud picture into each camera plane, and dyeing the point cloud by using semantic information of the image to generate semantic point cloud;
firstly, converting the point cloud into an ego vehicle coordinate system by using a formula (10), then converting the point cloud into a camera coordinate system by using a formula (6), then filtering out point cloud points with Z-direction coordinates smaller than 0, and finally converting the point cloud points into pixel coordinate points by using a formula (8). The above process can be formulated as follows:
[X_i_lidar, Y_i_lidar, Z_i_lidar]^T = R_i·(R_lidar·[X_lidar, Y_lidar, Z_lidar]^T + T_lidar) + T_i   (11)

Z_i_lidar·[u_i_lidar, v_i_lidar, 1]^T = K_i·[X_i_lidar, Y_i_lidar, Z_i_lidar]^T   (12)

where R_i, T_i and K_i are the extrinsics and intrinsics of the i-th camera, [X_i_lidar, Y_i_lidar, Z_i_lidar]^T denotes a lidar point expressed in the i-th camera coordinate system (with Z_i_lidar > 0), and (u_i_lidar, v_i_lidar) denotes its coordinates on the i-th camera pixel plane. By formulas (11) and (12) the original point cloud is perspective-projected onto the pixel plane of a given camera i, giving a projection mask denoted Mask_i_lidar, where Mask_i_lidar(u, v) = 1 if a point cloud point projects onto pixel (u, v) and Mask_i_lidar(u, v) = 0 otherwise. The semantic label mask Mask_i_gt of the i-th camera image can then be used to dye Mask_i_lidar; the specific dyeing procedure is as follows:
(1) select the category j to be dyed, for example the travelable area in this patent;
(2) assume the label value of this category is f_j; the set of pixel points corresponding to this category is then:
Mask_j_i_gt = (Mask_i_gt == f_j)   (13)
(3) compute the intersection M_ij of the mask of category j and the non-zero pixels of the point cloud projection mask:
M_ij = {(u, v) | Mask_j_i_gt(u, v) == 1, Mask_i_lidar(u, v) == 1}   (14)
(4) for this set of non-zero projection points, recover the corresponding point set in the original point cloud:
P_ij = {(X_lidar, Y_lidar, Z_lidar) | the projection point of (X_lidar, Y_lidar, Z_lidar) in the i-th camera ∈ M_ij}   (15)
The point set P_ij is given the label of category j, i.e. the point cloud is dyed; the point cloud thus acquires a category attribute and becomes a semantic point cloud.
Further, P_ij can be found for each camera, and the final point cloud with category j is:

P_j = ∪_i P_ij   (16)

For each category of the road surface area, the point cloud can be dyed according to steps (1) to (4), so that semantic point clouds P_j with different category labels are finally obtained. Figures 3-1 and 3-2 show the result of projecting a single-frame semantic point cloud of a travelable area onto the BEV canvas.
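The single-frame semantic point cloud generation of step 5 (projection via formulas (10)-(12) followed by the dyeing of steps (1)-(4) and the union of formula (16)) can be sketched roughly as follows. This is a minimal illustration assuming the coordinate conventions reconstructed above; the function and variable names are illustrative:

```python
import numpy as np

def color_point_cloud(points_lidar, T_lidar2ego, cam_projections, cam_masks, class_ids, img_hw):
    """Hedged sketch of step 5: assign a semantic class to each lidar point.

    points_lidar   : Nx3 points in the lidar frame
    T_lidar2ego    : 4x4 rigid transform, lidar frame -> body frame (formula (10))
    cam_projections: list of 3x4 matrices P_i = K_i [R_i | T_i], body frame -> pixels
    cam_masks      : list of HxW semantic label images Mask_i_gt
    class_ids      : label values f_j to dye (e.g. lane line, drivable area)
    img_hw         : (H, W) image size
    Returns an (N,) array of class ids, -1 where no camera / label covers the point.
    """
    H, W = img_hw
    N = points_lidar.shape[0]
    labels = np.full(N, -1, dtype=np.int32)
    pts_ego = (T_lidar2ego @ np.c_[points_lidar, np.ones(N)].T)[:3]      # 3xN, body frame
    for P, mask in zip(cam_projections, cam_masks):
        uvw = P @ np.vstack([pts_ego, np.ones(N)])                       # 3xN pixel homogeneous coords
        z = uvw[2]
        zs = np.where(z > 0, z, 1.0)                                     # avoid divide-by-zero; filtered below
        u = np.round(uvw[0] / zs).astype(int)
        v = np.round(uvw[1] / zs).astype(int)
        valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)        # keep Z_c > 0 and in-image points
        for f_j in class_ids:
            hit = valid.copy()
            hit[valid] = mask[v[valid], u[valid]] == f_j                 # intersection with the class mask (14)
            labels[hit] = f_j                                            # dye the corresponding 3D points (15)
    return labels
```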
6. Splicing and projecting multi-frame semantic point cloud: splicing continuous multi-frame semantic point clouds under a unified body coordinate system taking a certain frame as a reference, and projecting the point clouds onto a BEV canvas:
Step 5 above produces only single-frame semantic point clouds, which are sparse (see Figures 3-1 and 3-2). To obtain dense semantic point clouds, continuous multi-frame semantic point clouds are therefore spliced into a unified vehicle body coordinate system taking a certain frame as the reference. Specifically, this patent selects a total of 2N + 1 frames (the previous N frames, the next N frames, and the current frame itself) as the raw information for generating the BEV label of the current frame; the reference frame is the current frame (index 0) and is denoted by the subscript ref. An objective, time-invariant world coordinate system, denoted by the subscript w, is introduced here; similar in principle to formula (6), a point of the world coordinate system is transformed into the reference coordinate system as follows:
[X_ref, Y_ref, Z_ref]^T = R_w_ref^T · ([X_w, Y_w, Z_w]^T - T_w_ref)   (17)

where R_w_ref and T_w_ref are still constructed according to formulas (4) and (5), with the yaw angle yaw, pitch angle pitch, roll angle roll, X-direction translation distance tx, Y-direction translation distance ty and Z-direction translation distance tz being the pose information of the ego vehicle body itself (this information is obtained from the wheel encoders / IMU or from VIO by their own algorithms, which are outside the scope of this patent). For the m-th frame (m ∈ {-N, -(N-1), …, -1, 0, 1, …, N-1, N}), the semantic point cloud set P_mj of category j is first obtained by step 5, and these point clouds are then converted to the world coordinate system by:

[X_w, Y_w, Z_w]^T = R_w_m·[X_mj, Y_mj, Z_mj]^T + T_w_m   (18)

where [X_mj, Y_mj, Z_mj]^T denotes a point in the point cloud set of the j-th label category of the m-th frame, and R_w_m, T_w_m are built from the ego pose of frame m. By means of formulas (17) and (18), the point cloud set of the j-th label of the m-th frame can be converted into the unified reference coordinate system:

P_ref_mj = { R_w_ref^T·(R_w_m·x + T_w_m - T_w_ref) | x ∈ P_mj }   (19)

The semantic point clouds of all frames in [-N, N] are converted into the reference coordinate system through formula (19) and merged to obtain a dense point cloud, which is then projected onto the BEV canvas to obtain the compact BEV label. Figures 4-1 and 4-2 show the dense point cloud projections of the travelable road surface region (with N = 5).
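A minimal sketch of the multi-frame splicing of step 6, assuming the ego poses are available as 4x4 body-to-world transforms (consistent with formulas (17)-(19) above); names are illustrative:

```python
import numpy as np

def stitch_to_reference(per_frame_points, per_frame_pose_w, ref_index):
    """Hedged sketch of step 6: express all frames' semantic points in one reference body frame.

    per_frame_points: list of (N_m, 3) arrays, each in its own body frame (frames -N .. N)
    per_frame_pose_w: list of 4x4 body->world transforms for the same frames (wheel odometry / IMU / VIO)
    ref_index       : index of the reference (current) frame in the lists
    Returns a single (sum N_m, 3) array in the reference body frame.
    """
    T_ref_w = np.linalg.inv(per_frame_pose_w[ref_index])     # world -> reference body frame
    stitched = []
    for pts, T_w_m in zip(per_frame_points, per_frame_pose_w):
        homo = np.c_[pts, np.ones(len(pts))].T                # 4 x N_m homogeneous points
        stitched.append((T_ref_w @ T_w_m @ homo)[:3].T)       # frame m -> world -> reference
    return np.vstack(stitched)
```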
7. BEV label post-processing: the map obtained after point cloud projection is refined manually or morphologically to obtain the BEV road surface area label.
The BEV label generated in step 6 contains a certain number of holes, as shown in Figures 4-1 and 4-2. This patent therefore applies a morphological transformation to the map after point cloud projection to repair the hole areas of the projection, and the labels are then refined again manually, giving an accurate road surface BEV label (Figure 5). For a moving target, the four ground contact points of its 3D bounding box (see Figures 2-3) can be projected directly onto the BEV canvas of the reference coordinate system (Figure 5). Finally, the BEV road surface label and the BEV moving targets are fused to obtain the accurate BEV road surface area label (Figure 5). The overall algorithm flow of this application is summarized in the flow chart of Figure 6.
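The BEV projection and morphological hole repair of step 7 might look roughly like the following sketch; the canvas size, resolution and kernel size are illustrative assumptions, and OpenCV's morphological closing is used for the hole repair:

```python
import numpy as np
import cv2

def rasterize_bev(points_ref, canvas_hw=(800, 800), meters_per_px=0.1, close_px=5):
    """Hedged sketch of step 7: project stitched points onto a BEV canvas centered on the
    ego vehicle and fill small holes with a morphological closing."""
    H, W = canvas_hw
    bev = np.zeros((H, W), dtype=np.uint8)
    # x forward -> up in the image, y left -> left in the image; ego at the canvas center
    rows = (H // 2 - points_ref[:, 0] / meters_per_px).astype(int)
    cols = (W // 2 - points_ref[:, 1] / meters_per_px).astype(int)
    keep = (rows >= 0) & (rows < H) & (cols >= 0) & (cols < W)
    bev[rows[keep], cols[keep]] = 255
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (close_px, close_px))
    return cv2.morphologyEx(bev, cv2.MORPH_CLOSE, kernel)   # repairs small projection holes
```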
Finally, it should be noted that the above examples are merely illustrative of the invention and are not intended to limit its embodiments. While the invention has been described in detail with reference to preferred embodiments, it will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the invention; it is not practical to list every embodiment exhaustively. All obvious changes and modifications derived from the invention remain within its scope of protection.

Claims (10)

1. A bird's-eye view semantic segmentation label generation method based on multi-frame semantic point cloud splicing is characterized by comprising the following steps:
1) configuring sensors for data acquisition on a vehicle: arranging a camera in more than two directions of the vehicle respectively to ensure that some visual angles of two adjacent cameras are overlapped together; the laser radar is arranged at the top of the vehicle body;
2) sensor calibration: calibrating the internal parameters of each camera and the external parameters relative to the vehicle body by using a calibration plate, and calibrating the external parameters of the laser radar relative to the vehicle body by using a camera and laser radar combined calibration method;
3) data acquisition: synchronizing data collected by a camera and a laser radar at the same moment, and ensuring that the time stamp difference of all data at each moment does not exceed a set value; when the scanning edge of the laser radar is overlapped with the optical axis of the camera, triggering the camera to expose to obtain an original image of the image data outside the vehicle body;
4) data annotation: the original images collected by each camera at the same moment and the point cloud image correspondingly collected by the laser radar are labeled jointly; road surface information is labeled on each frame of camera image, covering at least two kinds of static targets, namely lane lines and the travelable area, and 3D bounding boxes of moving targets are labeled on the synchronized point cloud image collected by the laser radar, covering at least two kinds of moving targets, namely pedestrians and vehicles;
5) generating a single-frame semantic point cloud: converting the marked point cloud picture into each camera plane, and dyeing the point cloud by using semantic information of the image to generate semantic point cloud;
6) and splicing continuous multi-frame semantic point clouds to a uniform body coordinate system taking a certain frame as a reference, and projecting the continuous multi-frame semantic point clouds to a BEV canvas to obtain a compact BEV label.
2. The bird view semantic segmentation label generation method based on multi-frame semantic point cloud stitching according to claim 1, characterized in that: and 6, a BEV label post-processing step is further included, some hollow areas of the point cloud projection are repaired, and manual or morphological refine is carried out on the map after the point cloud projection to obtain a BEV pavement area label.
3. The bird's-eye view semantic segmentation label generation method based on multi-frame semantic point cloud stitching according to claim 1, characterized in that: the external parameters of each camera are described by a yaw angle yaw, a pitch angle pitch, a roll angle roll, an X-direction translation distance tx, a Y-direction translation distance ty and a Z-direction translation distance tz; the internal parameters are the x-direction and y-direction pixel focal lengths f_x, f_y of the camera and the pixel center p_x, p_y;
the projection matrix from the vehicle body coordinate system to the camera pixel coordinate system is obtained from the external and internal parameters; the transformation is derived in formulas (1) to (9): R_yaw, R_pitch and R_roll are the elementary rotation matrices for the yaw, pitch and roll angles (1)-(3), R = R_yaw·R_pitch·R_roll (4), T = [tx, ty, tz]^T (5), [X_c, Y_c, Z_c]^T = R·[X, Y, Z]^T + T (6), K is the internal parameter matrix (7), Z_c·[u, v, 1]^T = K·[X_c, Y_c, Z_c]^T (8), and Z_c·[u, v, 1]^T = P·[X, Y, Z, 1]^T with P = K·[R | T] (9);
the rigid-body rotation matrix is denoted R and the translation vector T; K is the internal parameter matrix formed from the camera internal parameters; formula (9) is derived from (6) and (8); R, T and K form a 3×4 projection matrix P that projects the homogeneous coordinates [X, Y, Z, 1]^T of a point in the vehicle body coordinate system to the pixel coordinates (u, v) of the camera pixel plane, where Z_c is the depth of the point in the camera coordinate system.
4. The bird's-eye view semantic segmentation label generation method based on multi-frame semantic point cloud stitching according to claim 3, characterized in that: the laser radar coordinate system is converted into the vehicle body coordinate system with the conversion relation:
[X_ego, Y_ego, Z_ego]^T = R_lidar·[X_lidar, Y_lidar, Z_lidar]^T + T_lidar   (10)
where R_lidar and T_lidar are the extrinsic rotation and translation of the laser radar relative to the vehicle body.
5. the bird view semantic segmentation label generation method based on multi-frame semantic point cloud stitching according to any one of claims 1 to 4, characterized in that: generating the single-frame semantic point cloud in the step 5 specifically as follows:
firstly, converting point cloud into a vehicle body coordinate system by using the following formula, then converting the point cloud into a camera coordinate system by using the formula (6), filtering out point cloud points with Z-direction coordinates smaller than 0, and finally converting the point cloud points into pixel coordinate points by using the formula (8);
Figure FDA0003495806240000034
the formula is as follows:
Figure FDA0003495806240000035
Figure FDA0003495806240000041
wherein
Figure FDA0003495806240000042
Representing the conversion of the lidar point cloud to the coordinates in the ith camera coordinate system, Zi_lidar>0,
Figure FDA0003495806240000043
Representing the coordinates of the laser radar point cloud on the ith camera pixel plane;
the original point cloud perspective is transformed to a pixel plane of a certain camera i by the formulas (11) and (12), and a perspective projection picture is obtained and recorded as Maski_lidarWherein if a certain pixel coordinate has a projection of point cloud, then Maski_lidar(u, v) ═ 1, otherwise Maski_lidar(u, v) ═ 0; then passing through the semantic label Mask of the image of the ith camerai_gtFor Maski_lidarPerforming a dyeing; the point cloud has category attribute and becomes semantic point cloud.
6. The bird view semantic segmentation label generation method based on multi-frame semantic point cloud stitching according to claim 5, characterized in that: the dyeing process is as follows:
(1) select the category j to be dyed;
(2) assume the label value of this category is f_j; the set of pixel points corresponding to this category is then:
Mask_j_i_gt = (Mask_i_gt == f_j)   (13)
(3) compute the intersection M_ij of the mask of category j and the non-zero pixels of the point cloud projection mask:
M_ij = {(u, v) | Mask_j_i_gt(u, v) == 1, Mask_i_lidar(u, v) == 1}   (14)
(4) for this set of non-zero projection points, recover the corresponding point set in the original point cloud:
P_ij = {(X_lidar, Y_lidar, Z_lidar) | the projection point of (X_lidar, Y_lidar, Z_lidar) in the i-th camera ∈ M_ij}   (15)
the point set P_ij is given the label of category j, i.e. the point cloud is dyed.
7. The bird's-eye view semantic segmentation label generation method based on multi-frame semantic point cloud stitching according to claim 6, characterized in that: P_ij can be found for each camera, and the final point cloud with category j is:
P_j = ∪_i P_ij   (16)
for each category of the road surface area, the point cloud can be dyed according to steps (1) to (4), so that semantic point clouds P_j with different category labels are finally obtained.
8. The bird's-eye view semantic segmentation label generation method based on multi-frame semantic point cloud splicing according to any one of claims 1 to 3, characterized in that: step 6 obtains the compact BEV label as follows:
a total of 2N + 1 frames, consisting of the previous N frames, the next N frames and the current frame, are selected as the raw information for generating the BEV label of the current frame; the reference frame is the current frame, indexed 0 and denoted by the subscript ref; an objective, time-invariant world coordinate system, denoted by the subscript w, is used, and a point of the world coordinate system is converted into the reference coordinate system as:
[X_ref, Y_ref, Z_ref]^T = R_w_ref^T · ([X_w, Y_w, Z_w]^T - T_w_ref)   (17)
where R_w_ref and T_w_ref are still constructed according to formulas (4) and (5), with the yaw angle yaw, pitch angle pitch, roll angle roll, X-direction translation distance tx, Y-direction translation distance ty and Z-direction translation distance tz being the pose information of the vehicle body itself; for the m-th frame (m ∈ {-N, -(N-1), …, -1, 0, 1, …, N-1, N}), the semantic point cloud set P_mj of category j is first obtained by step 5, and these point clouds are then converted to the world coordinate system by:
[X_w, Y_w, Z_w]^T = R_w_m·[X_mj, Y_mj, Z_mj]^T + T_w_m   (18)
where [X_mj, Y_mj, Z_mj]^T denotes a point in the point cloud set of the j-th label category of the m-th frame; by means of formulas (17) and (18), the point cloud set of the j-th label of the m-th frame can be converted into the unified reference coordinate system:
P_ref_mj = { R_w_ref^T·(R_w_m·x + T_w_m - T_w_ref) | x ∈ P_mj }   (19)
the semantic point clouds of all frames in [-N, N] are converted into the reference coordinate system through formula (19) and merged to obtain a compact point cloud, which is then projected onto the BEV canvas to obtain the compact BEV label.
9. The bird view semantic segmentation label generation method based on multi-frame semantic point cloud stitching according to claim 2, characterized in that: the BEV label post-processing is to repair some hollow areas of the point cloud projection and perform refine on the label again manually; for the moving target, four grounding points of the 3d bounding box are directly projected onto a BEV canvas of a reference coordinate system, and finally the BEV pavement area label and the BEV moving target are fused to obtain an accurate BEV pavement area label.
10. The bird view semantic segmentation label generation method based on multi-frame semantic point cloud stitching according to any one of claims 1 to 3, characterized in that: the front view, the back view, the left front, the left back, the right front and the right back of the vehicle body are respectively provided with one camera, and the total number of the cameras is six, so that the vehicle body can be covered by 360 degrees.
CN202210114639.7A | 2022-01-30 | 2022-01-30 | Bird's eye view semantic segmentation label generation method based on multi-frame semantic point cloud splicing | Active | CN114445593B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210114639.7A (CN114445593B) | 2022-01-30 | 2022-01-30 | Bird's eye view semantic segmentation label generation method based on multi-frame semantic point cloud splicing

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202210114639.7A (CN114445593B) | 2022-01-30 | 2022-01-30 | Bird's eye view semantic segmentation label generation method based on multi-frame semantic point cloud splicing

Publications (2)

Publication Number | Publication Date
CN114445593A (en) | 2022-05-06
CN114445593B (en) | 2024-05-10

Family

ID=81372524

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210114639.7A (Active, granted as CN114445593B) | Bird's eye view semantic segmentation label generation method based on multi-frame semantic point cloud splicing | 2022-01-30 | 2022-01-30

Country Status (1)

Country | Link
CN (1) | CN114445593B (en)



Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
GB201518613D0 (en)*2015-10-212015-12-02Nokia Technologies Oy3D scene rendering
US20180188026A1 (en)*2016-12-302018-07-05DeepMap Inc.Visual odometry and pairwise alignment for high definition map creation
CN112101066A (en)*2019-06-172020-12-18商汤集团有限公司 Object detection method and device and intelligent driving method, device and storage medium
US20210287430A1 (en)*2020-03-132021-09-16Nvidia CorporationSelf-supervised single-view 3d reconstruction via semantic consistency
WO2021247741A1 (en)*2020-06-032021-12-09Waymo LlcAutonomous driving with surfel maps
CN112767485A (en)*2021-01-262021-05-07哈尔滨工业大学(深圳)Point cloud map creating and scene identifying method based on static semantic information
CN113076830A (en)*2021-03-222021-07-06上海欧菲智能车联科技有限公司Environment passing area detection method and device, vehicle-mounted terminal and storage medium
CN113252053A (en)*2021-06-162021-08-13中智行科技有限公司High-precision map generation method and device and electronic equipment
CN113537049A (en)*2021-07-142021-10-22广东汇天航空航天科技有限公司Ground point cloud data processing method and device, terminal equipment and storage medium
CN113673484A (en)*2021-09-092021-11-19上海融进电子商务有限公司Road condition identification and decision-making method in unmanned driving scene
CN113936139A (en)*2021-10-292022-01-14江苏大学 A method and system for scene bird's-eye view reconstruction combining visual depth information and semantic segmentation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MONG H. NG: "BEV-Seg: Bird's Eye View Semantic Segmentation Using Geometry and Semantic Point Cloud", Computer Science, 23 June 2020 (2020-06-23) *
杨雪梦: "Research and application of semantic segmentation of LiDAR point clouds based on deep learning" (基于深度学习的激光雷达点云语义分割研究及应用), China Master's Theses Full-text Database, Information Science and Technology, no. 2, 15 February 2023 (2023-02-15) *
钟泽宇: "Semantic understanding method for autonomous driving scenes based on LiDAR point clouds" (基于激光雷达点云的自动驾驶场景语义理解方法), China Master's Theses Full-text Database, Engineering Science and Technology II, no. 1, 15 January 2021 (2021-01-15) *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115601530A (en)*2022-08-222023-01-13奥特酷智能科技(南京)有限公司(Cn) Method and system for automatically marking traffic lights on a point cloud map
CN115690416A (en)*2022-10-292023-02-03重庆长安汽车股份有限公司 A knowledge distillation-based BEV semantic segmentation model training method, system, device and medium
CN116152702A (en)*2022-12-212023-05-23北京百度网讯科技有限公司Point cloud label acquisition method and device, electronic equipment and automatic driving vehicle
CN115661394A (en)*2022-12-262023-01-31安徽蔚来智驾科技有限公司Method for constructing lane line map, computer device and storage medium
CN115937825A (en)*2023-01-062023-04-07之江实验室Robust lane line generation method and device under BEV (beam-based attitude vector) of on-line pitch angle estimation
CN115797454A (en)*2023-02-082023-03-14深圳佑驾创新科技有限公司Multi-camera fusion sensing method and device under bird's-eye view angle
CN115797454B (en)*2023-02-082023-06-02深圳佑驾创新科技有限公司Multi-camera fusion sensing method and device under bird's eye view angle
CN116109657A (en)*2023-02-162023-05-12航科院中宇(北京)新技术发展有限公司Geographic information data acquisition processing method, system, electronic equipment and storage medium
CN116109657B (en)*2023-02-162023-07-07航科院中宇(北京)新技术发展有限公司Geographic information data acquisition processing method, system, electronic equipment and storage medium
CN116452654A (en)*2023-04-112023-07-18北京辉羲智能科技有限公司BEV perception-based relative pose estimation method, neural network and training method thereof
CN116452654B (en)*2023-04-112023-11-10北京辉羲智能科技有限公司BEV perception-based relative pose estimation method, neural network and training method thereof
CN116304992A (en)*2023-05-222023-06-23智道网联科技(北京)有限公司Sensor time difference determining method, device, computer equipment and storage medium
CN117935262A (en)*2024-01-232024-04-26镁佳(北京)科技有限公司Point cloud data labeling method and device, computer equipment and storage medium
CN120143802A (en)*2025-05-122025-06-13天津联汇智造科技有限公司 A mobile robot navigation method and system based on deep learning

Also Published As

Publication number | Publication date
CN114445593B (en) | 2024-05-10

Similar Documents

Publication | Publication Date | Title
CN114445593B (en)Bird's eye view semantic segmentation label generation method based on multi-frame semantic point cloud splicing
CN114445592B (en)Bird's eye view semantic segmentation label generation method based on inverse perspective transformation and point cloud projection
CN113050074B (en)Camera and laser radar calibration system and calibration method in unmanned environment perception
CN112581612A (en)Vehicle-mounted grid map generation method and system based on fusion of laser radar and look-around camera
CN103707781B (en)Based on the driver's seat position automatic adjustment system of range image
CN111462135A (en) Semantic Mapping Method Based on Visual SLAM and 2D Semantic Segmentation
CN112180373A (en)Multi-sensor fusion intelligent parking system and method
JP2022522385A (en) Road sign recognition methods, map generation methods, and related products
WO2000007373A1 (en)Method and apparatus for displaying image
CN116977806A (en)Airport target detection method and system based on millimeter wave radar, laser radar and high-definition array camera
CN110736472A (en) An indoor high-precision map representation method based on the fusion of vehicle surround view image and millimeter-wave radar
US20250086983A1 (en)Unmanned parking space detection method based on panoramic surround view
US12106516B2 (en)Pose estimation refinement for aerial refueling
CN108596982A (en)A kind of easy vehicle-mounted multi-view camera viewing system scaling method and device
CN112991524B (en)Three-dimensional reconstruction method, electronic device and storage medium
CN114639115B (en)Human body key point and laser radar fused 3D pedestrian detection method
CN103714321A (en)Driver face locating system based on distance image and strength image
CN114998436A (en)Object labeling method and device, electronic equipment and storage medium
CN113609942B (en)Road intelligent monitoring system based on multi-view and multi-spectral fusion
Huang et al.Measuring the absolute distance of a front vehicle from an in-car camera based on monocular vision and instance segmentation
CN115588053A (en) Color 3D reconstruction method of complex shape and large workpiece based on hand-eye calibration of point cloud information
CN102202159B (en)Digital splicing method for unmanned aerial photographic photos
Castorena et al.Motion guided LiDAR-camera self-calibration and accelerated depth upsampling for autonomous vehicles
CN117972885A (en)Simulation enhancement-based space intelligent perception data generation method
Ge et al.Multivision sensor extrinsic calibration method with non-overlapping fields of view using encoded 1D target

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
