CN119919320B - Dynamic point cloud elimination method, device, equipment and storage medium based on multimodal data perception - Google Patents

Dynamic point cloud elimination method, device, equipment and storage medium based on multimodal data perception

Info

Publication number
CN119919320B
CN119919320B (application CN202411801168.2A)
Authority
CN
China
Prior art keywords
point cloud
image
frame
dynamic
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411801168.2A
Other languages
Chinese (zh)
Other versions
CN119919320A (en)
Inventor
张建国
李永刚
庞旭芳
丁宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Guochuang Guishen Intelligent Robot Co ltd
Chinese University of Hong Kong Shenzhen
Original Assignee
Shenzhen Guochuang Guishen Intelligent Robot Co ltd
Chinese University of Hong Kong Shenzhen
Filing date
Publication date
Application filed by Shenzhen Guochuang Guishen Intelligent Robot Co ltd, Chinese University of Hong Kong Shenzhen
Priority to CN202411801168.2A
Publication of CN119919320A
Application granted
Publication of CN119919320B
Legal status: Active
Anticipated expiration


Abstract

The present application relates to the field of image processing technologies, and in particular to a method, an apparatus, a device, and a storage medium for dynamic point cloud rejection based on multi-modal data perception. The method comprises: obtaining a single-frame point cloud and determining the corresponding image to be processed; performing camera distortion correction on the image to be processed and inputting it into a semantic segmentation model to obtain a mask image; obtaining the front-frame and rear-frame point clouds closest to the image acquisition time of the image to be processed, and performing a linear transition on the single-frame point cloud based on the front-frame point cloud, the rear-frame point cloud and the image acquisition time; projecting the single-frame point cloud after the linear transition onto the mask image and extracting a dynamic object point cloud and a static point cloud; performing cluster analysis on the dynamic object point cloud to extract the ground point cloud; and combining the static point cloud and the ground point cloud to obtain a three-dimensional point cloud with the dynamic point cloud removed. The method and the device can confirm the specific details of objects in fused point cloud data, so as to meet scene understanding requirements of increasingly higher precision.

Description

Dynamic point cloud eliminating method, device, equipment and storage medium based on multi-mode data perception
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for dynamic point cloud rejection based on multi-mode data sensing.
Background
Point cloud data refers to a set of vectors in a three-dimensional coordinate system. Each point contains three-dimensional coordinates and can carry additional attribute information such as color and reflectivity. Point clouds provide geometric information of high precision, high resolution and high dimensionality, and can intuitively represent the shape, surface and texture of objects in space. With the continuous development of technology, the application range of point cloud data keeps widening, with broad application prospects and market demand in fields such as autonomous driving, intelligent robots and smart city construction.
In practical applications, dynamic point cloud removal is an indispensable technique in fields such as autonomous driving and mobile robotics. It not only helps improve the positioning accuracy and map construction quality of a system, but also reduces computational cost, enhances environment perception and adapts to complex scenes, thereby providing users with a safer, more comfortable and more intelligent travel experience.
However, at present, the position and range of a target object are mainly delineated by fusing point cloud data with object detection results, i.e., the detection result is a rectangular bounding box. This limits the expression of information such as the internal structure of an object, its shape details and its precise relationship with the surrounding environment, cannot accurately represent complex object shapes, and makes it difficult to meet scene understanding requirements of increasingly higher precision; for example, it is difficult to achieve ideal results in fields such as robot navigation, high-precision mapping and building information modeling.
Based on this, how to confirm the specific details of objects in fused point cloud data, so as to meet scene understanding requirements of increasingly higher precision, is a technical problem to be solved urgently.
Disclosure of Invention
In order to overcome the defects of the prior art, the application provides a dynamic point cloud elimination method, device, equipment and storage medium based on multi-modal data perception, which can confirm the specific details of objects in fused point cloud data so as to meet scene understanding requirements of increasingly higher precision.
The technical scheme adopted for solving the technical problems is as follows:
In a first aspect, the present application provides a dynamic point cloud rejection method based on multi-modal data awareness, the method comprising:
Acquiring a single-frame point cloud, and determining an image to be processed corresponding to the single-frame point cloud;
Carrying out camera distortion correction on the image to be processed, inputting the corrected image to be processed into a pre-trained image semantic segmentation model, and carrying out semantic segmentation on the corrected image to be processed through the image semantic segmentation model to obtain a mask image;
Acquiring a front frame point cloud and a rear frame point cloud which are closest to the image acquisition time of the image to be processed, and carrying out linear transition on the single frame point cloud based on the front frame point cloud, the rear frame point cloud and the image acquisition time;
Projecting a single-frame point cloud after linear transition to the mask image, and extracting a dynamic object point cloud and a static point cloud from the single-frame point cloud, wherein the dynamic object point cloud comprises a dynamic point cloud and a ground point cloud;
Performing cluster analysis on the dynamic object point cloud to extract the ground point cloud;
and combining the static point cloud and the ground point cloud to obtain a three-dimensional point cloud with the dynamic point cloud removed.
Optionally, the step of linearly transitioning the single-frame point cloud based on the front-frame point cloud, the rear-frame point cloud, and the image acquisition time includes:
Acquiring a front-frame pose corresponding to the front-frame point cloud and a rear-frame pose corresponding to the rear-frame point cloud, and acquiring a front-frame acquisition time corresponding to the front-frame point cloud and a rear-frame acquisition time corresponding to the rear-frame point cloud;
Determining a time proportion based on the front-frame acquisition time, the rear-frame acquisition time and the image acquisition time;
and carrying out linear interpolation on the pose corresponding to the single-frame point cloud according to the front-frame pose, the rear-frame pose and the time proportion to obtain a transition pose, wherein the transition pose is the pose corresponding to the image acquisition time.
Optionally, the step of projecting the single-frame point cloud after the linear transition to the mask image, and extracting the dynamic object point cloud and the static point cloud from the single-frame point cloud includes:
Acquiring an external parameter matrix of a camera coordinate system relative to a radar coordinate system and a transition pose matrix corresponding to the transition pose;
Determining a conversion matrix based on the external reference matrix, the transition pose matrix and the pose corresponding to the single-frame point cloud, and converting each point in the single-frame point cloud from a radar coordinate system to a camera coordinate system through the conversion matrix;
based on a preset camera internal reference matrix, projecting each point corresponding to the single-frame point cloud under a camera coordinate system to the mask image to obtain a plurality of pixel coordinates;
and obtaining pixel values corresponding to each pixel coordinate in the mask image, so as to extract the dynamic object point cloud and the static point cloud according to each pixel value.
Optionally, the step of performing camera distortion correction on the image to be processed includes:
performing radial distortion correction and tangential distortion correction on each image point of the image to be processed;
And determining the input of the image semantic segmentation model according to the radial distortion correction results and the tangential distortion correction results of all the image points.
Optionally, the step of obtaining a pixel value corresponding to each pixel coordinate in the mask image to extract the dynamic object point cloud and the static point cloud according to each pixel value includes:
Determining the corresponding relation between each pixel coordinate and the mask image according to each pixel coordinate;
screening to obtain all pixel coordinates with non-zero values of the corresponding pixel values in the mask image through the corresponding relation between each pixel coordinate and the mask image;
and extracting the dynamic object point cloud according to all the pixel coordinates obtained by screening, and determining the static point cloud.
Optionally, the step of determining, according to each pixel coordinate, a correspondence between each pixel coordinate and the mask image includes:
Judging whether each pixel coordinate is an integer pixel coordinate or not;
if the pixel coordinates are integer pixel coordinates, determining the corresponding relation between the pixel coordinates and the mask image according to the pixel coordinates;
Otherwise, acquiring four integer pixel coordinates closest to the pixel coordinates, and acquiring four pixel values corresponding to the four integer pixel coordinates;
And carrying out interpolation calculation based on the obtained four pixel values, and determining the corresponding relation between the pixel coordinates and the mask image according to interpolation calculation results.
Optionally, the step of performing cluster analysis on the dynamic object point cloud to extract the ground point cloud includes:
determining any point in the dynamic object point cloud as a starting point, and adding the starting point into a preset clustering set;
the Euclidean distance between each point which is not accessed in the dynamic object point cloud and all points in the clustering set is calculated one by one;
comparing the Euclidean distance of each point with a preset distance threshold, and adding points whose Euclidean distance is smaller than the distance threshold to the cluster set, until all the points in the dynamic object point cloud have been accessed;
and extracting the ground point cloud from the clustering set according to the preset ground point cloud characteristics.
In a second aspect, the present application provides a dynamic point cloud removing device based on multi-mode data sensing, including:
the data acquisition module is used for acquiring a single-frame point cloud and determining an image to be processed corresponding to the single-frame point cloud;
the semantic segmentation module is used for carrying out camera distortion correction on the image to be processed, inputting the corrected image to be processed into a pre-trained image semantic segmentation model, and carrying out semantic segmentation on the corrected image to be processed through the image semantic segmentation model to obtain a mask image;
The linear transition module is used for acquiring a front frame point cloud and a rear frame point cloud which are closest to the image acquisition time of the image to be processed, and carrying out linear transition on the single frame point cloud based on the front frame point cloud, the rear frame point cloud and the image acquisition time;
The point cloud extraction module is used for projecting the single-frame point cloud after linear transition to the mask image, and extracting a dynamic object point cloud and a static point cloud from the single-frame point cloud, wherein the dynamic object point cloud comprises a dynamic point cloud and a ground point cloud;
the ground point cloud determining module is used for carrying out cluster analysis on the dynamic object point cloud so as to extract the ground point cloud;
And the point cloud data fusion module is used for combining the static point cloud and the ground point cloud to obtain a three-dimensional point cloud after the dynamic point cloud is removed.
In a third aspect, the present application provides an electronic device, comprising:
One or more processors;
One or more memories;
and one or more computer programs, wherein the one or more computer programs are stored in the one or more memories, the one or more computer programs comprising instructions, which when executed by the one or more processors, cause the electronic device to perform the method described above.
In a fourth aspect, the present application provides a computer readable storage medium having stored therein a program or instructions which, when executed, implement the above-described method.
In the technical scheme, a radar and a camera acquire data of the same scene at the same time: the radar acquires three-dimensional point clouds and the camera acquires two-dimensional images. One frame of point cloud is determined as the point cloud to be processed, and the image to be processed closest to it in time is found. Camera distortion correction is then carried out on the image to be processed, and the corrected image is input into a model for semantic segmentation to obtain a mask image. Next, a linear transition is carried out on the basis of the trajectory point pose of the front-frame point cloud, the trajectory point pose of the rear-frame point cloud and the acquisition time of the image to be processed, so that the trajectory point pose corresponding to the single-frame point cloud is converted into the interpolated pose. The single-frame point cloud is then projected onto the mask image to separate out the dynamic object point cloud corresponding to the mask image, the other part of the single-frame point cloud being the static point cloud. The ground point cloud is further extracted from the dynamic object point cloud through clustering, and finally the static point cloud and the ground point cloud are combined to obtain the three-dimensional point cloud with the dynamic point cloud removed.
By adopting the technical scheme, the application at least has the following beneficial effects:
1. Compared with traditional target detection, semantic segmentation expresses spatial information more finely: a semantic label is given to each point in the scene, more detailed spatial information is provided, and all point cloud data are processed and analyzed so that the information in the data is fully mined and nothing is omitted. The dynamic object point cloud separated based on semantic segmentation therefore expresses the internal structure of objects, their shape details and their precise relationship with the surrounding environment more richly, and makes it possible to confirm the specific details of objects in fused point cloud data, meeting scene understanding requirements of increasingly higher precision.
2. The application adopts a multi-sensor fusion calibration technology combining a camera and a radar to make up for the defect of a single sensor, thereby improving the overall perception capability and further improving the accuracy and the integrity of point cloud data.
Drawings
FIG. 1 is a flow chart of a dynamic point cloud rejection method based on multi-modal data awareness provided by an embodiment of the application;
FIG. 2 is a schematic diagram of a virtual structure of the dynamic point cloud removing device based on multi-mode data perception;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The application will be further described with reference to the drawings and examples.
The conception, specific structure and technical effects of the present application will be clearly and completely described below with reference to the embodiments and the drawings, so that the objects, features and effects of the application can be fully understood. It is apparent that the described embodiments are only some, not all, embodiments of the present application, and other embodiments obtained by those skilled in the art without inventive effort fall within the scope of the present application. In addition, the coupling/connection relationships referred to in this patent do not refer solely to direct connection of members, but rather mean that a better coupling structure can be formed by adding or removing coupling aids depending on the specific implementation. The technical features of the application can be combined with each other provided there is no contradiction or conflict.
Referring to fig. 1, fig. 1 is a flow chart of the dynamic point cloud removal method based on multi-modal data perception provided by an embodiment of the application, which includes the following steps, described in detail below:
In step S1, a single-frame point cloud is acquired, and a to-be-processed image corresponding to the single-frame point cloud is determined.
Specifically, the single-frame point cloud is point cloud data acquired by a three-dimensional acquisition device such as a lidar, and the image to be processed is a two-dimensional image of the target scene, acquired by an image acquisition device such as a camera, that contains dynamic objects (for example vehicles and pedestrians). In the embodiment of the application, the same scene is acquired multiple times at the same frequency by multiple sensors (camera and radar), giving multiple images acquired by the camera and multiple frames of point clouds acquired by the radar, where each frame of point cloud corresponds to a SLAM trajectory point pose. A single-frame point cloud containing a dynamic object is selected from the acquired multi-frame point clouds, and the temporally and spatially aligned image to be processed is then found based on the acquisition time of the single-frame point cloud.
In step S2, the camera distortion correction is performed on the image to be processed, and the corrected image to be processed is input into a pre-trained image semantic segmentation model, so that the corrected image to be processed is subjected to semantic segmentation through the image semantic segmentation model, and a mask image is obtained.
Specifically, the image semantic segmentation model is a model capable of semantic segmentation: it can segment an image into region blocks with specific semantic meaning and identify the semantic category of each region block, for example pedestrians, vehicles and the like. The mask image is a binary image representing the different types of regions in the image; in the mask image obtained by semantic segmentation, dynamic objects can be marked with non-zero pixel values and the static background with zero values.
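As a minimal sketch of how a per-pixel segmentation result can be turned into such a mask (the `run_segmentation` call and the dynamic class IDs below are illustrative assumptions, not part of the application):

```python
import numpy as np

DYNAMIC_CLASS_IDS = {11, 12, 13}  # assumed IDs for pedestrian / rider / vehicle in some label map

def build_dynamic_mask(label_map: np.ndarray) -> np.ndarray:
    """label_map: HxW array of per-pixel class IDs -> HxW uint8 mask (255 = dynamic object)."""
    mask = np.isin(label_map, list(DYNAMIC_CLASS_IDS))
    return mask.astype(np.uint8) * 255

# Usage with a hypothetical model call:
# label_map = run_segmentation(corrected_image)  # HxW class-ID map from any pretrained model
# mask_image = build_dynamic_mask(label_map)
```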
More specifically, taking into account the distortion condition of the camera, before obtaining the mask image (i.e., before inputting the image to be processed into the image semantic segmentation model), the distortion parameters need to be used to correct the distortion of the image to be processed, including:
And carrying out radial distortion correction and tangential distortion correction on each image point of the image to be processed.
In particular, camera distortion generally includes radial distortion and tangential distortion. Assume that an uncorrected point in the camera coordinate system is $P_c = (X_c, Y_c, Z_c)$ and that the corrected point is $(x_{corrected}, y_{corrected})$. The normalized coordinates are first obtained as $x = X_c / Z_c$ and $y = Y_c / Z_c$, and distortion correction is then carried out on the normalized coordinates through the radial distortion formula and the tangential distortion formula respectively. The radial distortion formula (the standard Brown model) is:

$$x_{rad} = x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$$

$$y_{rad} = y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$$

$$r^2 = x^2 + y^2$$

where $k_1$, $k_2$ and $k_3$ are the radial distortion parameters.

The tangential distortion formula is:

$$\Delta x_{tan} = 2 p_1 x y + p_2\,(r^2 + 2 x^2)$$

$$\Delta y_{tan} = p_1\,(r^2 + 2 y^2) + 2 p_2 x y$$

where $p_1$ and $p_2$ are the tangential distortion parameters.

Further, the input of the image semantic segmentation model is determined according to the radial distortion correction results and the tangential distortion correction results of all image points.

In particular, the distortion-corrected point can be expressed as $(x_{corrected}, y_{corrected}) = (x_{rad} + \Delta x_{tan},\ y_{rad} + \Delta y_{tan})$, and the camera distortion correction of the image to be processed can be completed through the above formulas.
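The formulas above can be applied pointwise to the normalized coordinates; a minimal numpy sketch, assuming the calibration parameters k1, k2, k3, p1, p2 are already known, might look as follows:

```python
import numpy as np

def correct_normalized_points(x, y, k1, k2, k3, p1, p2):
    """x, y: arrays of normalized coordinates (X_c / Z_c, Y_c / Z_c)."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_corr = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)   # radial + tangential terms
    y_corr = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_corr, y_corr

# In practice a whole image is often undistorted in one call, e.g. with OpenCV:
# undistorted = cv2.undistort(image, camera_matrix, np.array([k1, k2, p1, p2, k3]))
```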
In step S3, a front frame point cloud and a rear frame point cloud closest to the image acquisition time of the image to be processed are obtained, and linear transition is performed on the single frame point cloud based on the front frame point cloud, the rear frame point cloud and the image acquisition time.
Specifically, in the embodiment of the present application the lidar frequency is 10 Hz and the image acquisition frequency is also 10 Hz; however, due to various errors, the image to be processed (any acquired image containing a dynamic object) and the frame of point cloud closest to its acquisition time are not captured at exactly the same moment. Therefore, before the subsequent point cloud data fusion, the front and rear frames of point clouds closest to the current image acquisition time need to be found, and on this basis a spatio-temporally unified, kinematic linear transition is performed to determine the single-frame point cloud corresponding to the acquisition time.
More specifically, the step of linearly transitioning the single-frame point cloud based on the front-frame point cloud, the rear-frame point cloud, and the image acquisition time includes:
And acquiring the front-frame pose corresponding to the front-frame point cloud and the rear-frame pose corresponding to the rear-frame point cloud, and acquiring the front-frame acquisition time corresponding to the front-frame point cloud and the rear-frame acquisition time corresponding to the rear-frame point cloud.
Specifically, let the acquisition time of the image to be processed be $t$, let the acquisition time of the front-frame point cloud be $t_1$ and its SLAM trajectory point pose be $(x_1, y_1, z_1, roll_1, pitch_1, yaw_1)$, where $(x_1, y_1, z_1)$ are the position coordinates and roll, pitch, yaw are the attitude angles of the moving platform (such as a vehicle); let the acquisition time of the rear-frame point cloud be $t_2$ and its SLAM trajectory point pose be $(x_2, y_2, z_2, roll_2, pitch_2, yaw_2)$.
The SLAM trajectory point pose refers to the three-dimensional position and orientation information of the robot or camera in space obtained by Simultaneous Localization and Mapping (SLAM); a pose describes the position and orientation of the robot or camera in space and generally comprises three-dimensional coordinates $(x, y, z)$ and rotation angles (roll angle, pitch angle, yaw angle).
Further, the time proportion is determined based on the front-frame acquisition time, the rear-frame acquisition time and the image acquisition time.
Specifically, a first time difference relative to the front frame is determined as $\Delta t_1 = t - t_1$, and a second time difference relative to the rear frame is determined as $\Delta t_2 = t_2 - t$. The total time difference is then $\Delta t = \Delta t_1 + \Delta t_2 = t_2 - t_1$, and the time proportion of the front-frame time difference to the total time difference is $s = \Delta t_1 / \Delta t = (t - t_1) / (t_2 - t_1)$.
Further, linear interpolation is carried out on the pose corresponding to the single-frame point cloud according to the front-frame pose, the rear-frame pose and the time proportion, so that the transition pose is obtained; the transition pose is the pose corresponding to the image acquisition time.
Specifically, the position component of the pose is interpolated based on the time proportion obtained above: $x = x_1 + s\,(x_2 - x_1)$, and similarly $y = y_1 + s\,(y_2 - y_1)$ and $z = z_1 + s\,(z_2 - z_1)$.
Further, the attitude angles are interpolated in the same way, giving the interpolated roll angle $roll = roll_1 + s\,(roll_2 - roll_1)$, the interpolated pitch angle $pitch = pitch_1 + s\,(pitch_2 - pitch_1)$ and the interpolated yaw angle $yaw = yaw_1 + s\,(yaw_2 - yaw_1)$, where roll, pitch and yaw represent rotations about the x-axis, y-axis and z-axis respectively and describe the spatial attitude of the platform.
In this way, the transition pose $(x, y, z, roll, pitch, yaw)$ corresponding to the image acquisition time is obtained. This pose can be converted into a matrix M2, based on which the conversion matrix from the radar coordinate system to the camera coordinate system can subsequently be calculated.
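A minimal sketch of this linear transition, interpolating position and attitude angles componentwise (angle wrap-around handling is omitted, and the function name is illustrative):

```python
import numpy as np

def interpolate_pose(pose1, pose2, t1, t2, t_img):
    """pose1, pose2: (x, y, z, roll, pitch, yaw) at the front time t1 and rear time t2."""
    s = (t_img - t1) / (t2 - t1)                # time proportion
    p1, p2 = np.asarray(pose1, float), np.asarray(pose2, float)
    return p1 + s * (p2 - p1)                   # transition pose at the image time

# transition = interpolate_pose((0, 0, 0, 0, 0, 0.00), (0.5, 0, 0, 0, 0, 0.02), 0.00, 0.10, 0.04)
```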
In step S4, projecting the single-frame point cloud after the linear transition to the mask image, and extracting a dynamic object point cloud and a static point cloud from the single-frame point cloud, where the dynamic object point cloud includes a dynamic point cloud and a ground point cloud.
Specifically, the single-frame point cloud after the linear transition is projected into the pixel coordinate system of the mask image, and the points of the dynamic object and of the static part are classified and stored, thereby extracting the dynamic object point cloud and the static point cloud. Concretely, the pose corresponding to the single-frame point cloud is first denoted M0, the pose matrix M2 is obtained through linear interpolation of the pose corresponding to the single-frame point cloud, the extrinsic matrix of the camera relative to the radar is denoted M1, and the conversion matrix is denoted M3. The extrinsic matrix of the camera relative to the radar can be obtained by calibrating the camera and the radar, and the calibration process can be realized, for example, with MATLAB software.
More specifically, projecting the single-frame point cloud after linear transition to the mask image, and extracting the dynamic object point cloud and the static point cloud from the single-frame point cloud specifically comprises the following steps:
and acquiring an external parameter matrix of the camera coordinate system relative to the radar coordinate system and a transition pose matrix corresponding to the transition pose.
Specifically, the extrinsic matrix M1 of the camera with respect to the radar is obtained, together with the matrix M2 converted from the transition pose obtained in step S3, where M2 results from linear interpolation of the pose M0 corresponding to the single-frame point cloud.
Further, a transformation matrix is determined based on the extrinsic matrix, the transition pose matrix and the pose corresponding to the single-frame point cloud, and each point in the single-frame point cloud is transformed from a radar coordinate system to a camera coordinate system through the transformation matrix.
Specifically, let the extrinsic matrix of the camera coordinate system relative to the radar coordinate system be M1, and let the pose matrix corresponding to the transition pose be M2 (i.e., the matrix obtained by linear interpolation of the pose corresponding to the point cloud). The conversion matrix M3 used for the coordinate transformation is then:

in the resting state: $M3 = M1$; and in the motion state: $M3 = M1 \cdot M2^{-1} \cdot M0$;

where M3 is a $4 \times 4$ homogeneous transformation matrix of the form $\begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}$, $R$ is a $3 \times 3$ rotation matrix and $t$ is a $3 \times 1$ translation vector. The radar coordinate system of the point cloud and the camera coordinate system differ by the matrix M3, i.e., multiplying the point cloud by M3 converts it into the camera coordinate system and associates it with the camera. Conversion from the radar coordinate system to the camera coordinate system:

for example, point cloud coordinates $P_L = (X_L, Y_L, Z_L, 1)^{T}$ in the radar coordinate system (in homogeneous form) are converted to coordinates $P_C = (X_C, Y_C, Z_C, 1)^{T}$ in the camera coordinate system by:

$$P_C = M3 \cdot P_L$$

At this point, the point $P_L$ under the radar coordinate system yields the corresponding point $(X_C, Y_C, Z_C)$ in the camera coordinate system.
Further, based on a preset camera internal reference matrix, each point corresponding to the single-frame point cloud in a camera coordinate system is projected to the mask image, and a plurality of pixel coordinates are obtained.
Specifically, the camera intrinsic matrix is a known preset value:

$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$

where $f_x$ and $f_y$ are the focal lengths of the camera in the x-axis and y-axis directions respectively, and $(c_x, c_y)$ are the principal point coordinates of the image. Based on the point $(X_C, Y_C, Z_C)$ in the camera coordinate system, the pixel coordinates $(u, v)$ projected onto the image plane are calculated as:

$$u = f_x \frac{X_C}{Z_C} + c_x, \qquad v = f_y \frac{Y_C}{Z_C} + c_y$$

To sum up, in this step each point of the single-frame point cloud is converted from $(X_L, Y_L, Z_L)$ (radar coordinate system) to $(X_C, Y_C, Z_C)$ (camera coordinate system) and then to $(u, v)$ (pixel coordinate system).
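A minimal numpy sketch of this radar-to-pixel projection, assuming the 4x4 conversion matrix M3 and the intrinsic matrix K are available from the steps above (the function name and return convention are illustrative):

```python
import numpy as np

def project_points(points_lidar, M3, K):
    """points_lidar: Nx3 array in the radar frame -> (Nx2 pixel coordinates, validity mask)."""
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])  # homogeneous Nx4
    pts_cam = (M3 @ pts_h.T).T[:, :3]                                   # Nx3 in the camera frame
    valid = pts_cam[:, 2] > 0.0                                         # points behind the camera are discarded via this mask
    u = K[0, 0] * pts_cam[:, 0] / pts_cam[:, 2] + K[0, 2]
    v = K[1, 1] * pts_cam[:, 1] / pts_cam[:, 2] + K[1, 2]
    return np.stack([u, v], axis=1), valid
```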
Further, obtaining pixel values corresponding to each pixel coordinate in the mask image, so as to extract the dynamic object point cloud and the static point cloud according to each pixel value.
Specifically, an index is established on the mask image according to the pixel coordinates to determine which pixel positions correspond to the effective area (generally, non-zero pixel values are the effective values); if a pixel coordinate is effective in the mask image, the corresponding radar point can be determined through the index. Concretely, each calculated pixel coordinate (u, v) has a corresponding pixel value in the mask image and corresponds to a point of the single-frame point cloud; based on this correspondence, the points that fall on the mask portion of the image can be extracted from the single-frame point cloud, giving the dynamic object point cloud, and the remaining points are determined to be the static point cloud.
The following specifically describes the step of obtaining a pixel value corresponding to each pixel coordinate in the mask image, so as to extract the dynamic object point cloud and the static point cloud according to each pixel value:
Determining the corresponding relation between each pixel coordinate and the mask image according to each pixel coordinate;
screening to obtain all pixel coordinates with non-zero values of the corresponding pixel values in the mask image through the corresponding relation between each pixel coordinate and the mask image;
and extracting the dynamic object point cloud according to all the pixel coordinates obtained by screening, and determining the static point cloud.
Specifically, the position corresponding to each calculated pixel coordinate (u, v) is looked up in the mask image to establish the index, and it is checked whether the pixel value at that position is an effective value (normally, a non-zero value is an effective value). The points corresponding to pixel coordinates with effective values are determined to be the dynamic object point cloud; by screening the coordinates one by one, all dynamic object points are obtained, and the remaining points of the single-frame point cloud are determined to be the static point cloud. A data structure (including but not limited to a dictionary or a list) may be employed to store the correspondence between pixel coordinates and single-frame point cloud indices.
More specifically, in the embodiment of the present application, it is considered that there are two cases of integer and non-integer pixel coordinates, the integer pixel coordinates can be directly used as the index, and the non-integer pixel coordinates need to be processed before the corresponding index is obtained, specifically:
Judging whether each pixel coordinate is an integer pixel coordinate or not;
And if the pixel coordinates are integer pixel coordinates, determining the corresponding relation between the pixel coordinates and the mask image according to the pixel coordinates.
Specifically, if the calculated pixel coordinate is an integer, that is, the integer pixel coordinate, the calculated pixel coordinate may be directly used as an index to check whether the pixel value corresponding to the mask image is an effective value, and all the effective values are judged and extracted one by one, so as to obtain all the point clouds corresponding to the integer pixel coordinate, that is, all the dynamic object point clouds corresponding to the integer pixel coordinate in the single-frame point cloud. Wherein, the index of the radar point cloud corresponding to the integer pixel coordinate is recorded, and a data structure (such as a dictionary or a list) can be used to store the corresponding relation between the integer pixel coordinate and the single-frame point cloud index.
Otherwise, acquiring four integer pixel coordinates closest to the pixel coordinates, and acquiring four pixel values corresponding to the four integer pixel coordinates;
And carrying out interpolation calculation based on the obtained four pixel values, and determining the corresponding relation between the pixel coordinates and the mask image according to interpolation calculation results.
Specifically, if the pixel coordinates are non-integer, i.e., non-integer pixel coordinates, a method such as bilinear interpolation may be used to determine whether the pixel value in the mask image corresponding to the pixel position is valid. And if the pixel value is valid, determining a corresponding point cloud in the single-frame point cloud. The application is exemplified by bilinear interpolation methods, specifically:
The four integer pixel coordinates closest to the pixel coordinate $(u, v)$ are obtained, i.e. $(u_0, v_0)$, $(u_1, v_0)$, $(u_0, v_1)$ and $(u_1, v_1)$, where $u_0 = \lfloor u \rfloor$, $v_0 = \lfloor v \rfloor$, $u_1 = u_0 + 1$ and $v_1 = v_0 + 1$;
the pixel values corresponding to the four integer pixel coordinates are obtained from the mask image and respectively recorded as $I_{00}$, $I_{10}$, $I_{01}$ and $I_{11}$;
interpolation is then carried out based on the four pixel values, which comprises calculating the interpolation weight of each pixel value and obtaining the value of the non-integer pixel coordinate in the mask image from these weights. With the fractional offsets $d_u = u - u_0$ and $d_v = v - v_0$, the interpolation weights are:

$$w_{00} = (1 - d_u)(1 - d_v), \qquad w_{10} = d_u (1 - d_v)$$

$$w_{01} = (1 - d_u)\, d_v, \qquad w_{11} = d_u\, d_v$$

The value of the non-integer pixel coordinate in the mask image is then obtained as:

$$I(u, v) = w_{00} I_{00} + w_{10} I_{10} + w_{01} I_{01} + w_{11} I_{11}$$
similar to the point cloud index for integer pixel coordinates described above, different index records for different non-integer pixel coordinates may be ensured in some way (e.g., fine-tuning based on the fractional portion of the pixel coordinates).
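A minimal sketch combining the integer and non-integer cases described above (bounds checking and the exact index bookkeeping are omitted; the names are illustrative):

```python
import numpy as np

def mask_value_at(mask, u, v):
    """mask: HxW array; returns the (possibly interpolated) mask value at pixel (u, v)."""
    if float(u).is_integer() and float(v).is_integer():
        return float(mask[int(v), int(u)])                    # direct index (row = v, column = u)
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    du, dv = u - u0, v - v0
    i00, i10 = float(mask[v0, u0]), float(mask[v0, u0 + 1])   # four nearest integer pixels
    i01, i11 = float(mask[v0 + 1, u0]), float(mask[v0 + 1, u0 + 1])
    return ((1 - du) * (1 - dv) * i00 + du * (1 - dv) * i10
            + (1 - du) * dv * i01 + du * dv * i11)

# A projected point is assigned to the dynamic object point cloud when
# mask_value_at(mask_image, u, v) is non-zero.
```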
In summary, the above steps screen out the positions in the single-frame point cloud that correspond to the mask portion of the image (i.e., the portion corresponding to the dynamic object), and on this basis the dynamic object point cloud (the points corresponding to the mask image) and the static point cloud (the points remaining after the mask points are removed) are obtained.
In step S5, cluster analysis is performed on the dynamic object point cloud to extract the ground point cloud.
Specifically, the extracted dynamic object point cloud is a mixed point cloud obtained by mixing a dynamic object (for example, a vehicle) with the ground, and in order to remove the point cloud of the dynamic object part, cluster analysis is also required to partition the point cloud of each cluster in the mixed point cloud. In the embodiment of the application, the Euclidean distance between the points is calculated, and if the Euclidean distance between the two points is smaller than a certain preset threshold value, the two points are considered to belong to the same class of point clouds.
More specifically, the step of performing cluster analysis on the dynamic object point cloud to extract the ground point cloud includes:
And determining any point in the dynamic object point cloud as a starting point, and adding the starting point into a preset clustering set.
Specifically, let the input point cloud set be $P = \{p_1, p_2, \dots, p_n\}$, where each point is $p_i = (x_i, y_i, z_i)$. This set is formed by the dynamic object point cloud and is a mixture of several kinds of points. Then any point in the set is selected as the starting point (alternatively, the point with the smallest coordinate value can be selected), and the starting point is assumed to be $p_1$.
And calculating Euclidean distances between each unvisited point in the dynamic object point cloud and all points in the clustering set one by one.
Specifically, an empty cluster set $C = \{\}$ and an accessed point set $V = \{\}$ are created, and the starting point $p_1$ is added to both the cluster set C and the accessed set V. Subsequently, for each point $p_i$ of the point cloud set that is not in the accessed set, it is checked whether its Euclidean distances to the points in the cluster set meet the preset clustering condition (i.e., whether the Euclidean distance between at least one point in the cluster set and the currently accessed point is smaller than the preset distance threshold).
The Euclidean distance is calculated as:

$$d(p_i, p_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}$$

where $p_j = (x_j, y_j, z_j)$ is the point in the cluster set C with the smallest Euclidean distance to $p_i$, i.e., $d_{min} = \min_{p_j \in C} d(p_i, p_j)$.
The Euclidean distance of each point is compared with the preset distance threshold, and points whose minimum Euclidean distance is smaller than the distance threshold are added to the cluster set, until all points in the dynamic object point cloud have been accessed.
Specifically, assume the Euclidean distance threshold is $d_{th}$. If the minimum Euclidean distance $d_{min}$ is smaller than the preset distance threshold $d_{th}$, the point $p_i$ is added to the cluster set C and to the accessed point set V, and the process is repeated until all points have been accessed.
And extracting the ground point cloud from the clustering set according to the preset ground point cloud characteristics.
Specifically, after the above clustering process, a plurality of clusters may be obtained, and the characteristics of each cluster are analyzed to identify the cluster corresponding to the ground point cloud. Typically, the ground point cloud is the largest cluster or the cluster with particular characteristics; the application proposes that the ground cluster may be determined based on ground point cloud characteristics such as flatness and height range. The cluster determined to be the ground point cloud is then separated from the dynamic object point cloud, giving the ground point cloud.
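A minimal sketch of this Euclidean clustering followed by a simple ground-cluster selection; the O(n^2) neighbour search and the lowest-mean-height heuristic are simplifications chosen for illustration, whereas the document itself only specifies a distance-threshold clustering and selection by ground characteristics such as flatness and height range:

```python
import numpy as np

def euclidean_clusters(points, d_th):
    """points: Nx3 array -> list of clusters, each a list of point indices."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier and unvisited:
            i = frontier.pop()
            cand = np.array(sorted(unvisited), dtype=int)
            dists = np.linalg.norm(points[cand] - points[i], axis=1)
            near = cand[dists < d_th].tolist()          # unvisited points within the distance threshold
            unvisited.difference_update(near)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(cluster)
    return clusters

def pick_ground_cluster(points, clusters):
    """Assumed heuristic: take the cluster with the lowest mean height (z) as the ground."""
    return min(clusters, key=lambda idx: points[idx, 2].mean())
```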
In step S6, combining the static point cloud and the ground point cloud to obtain a three-dimensional point cloud from which the dynamic point cloud is removed.
Specifically, the obtained ground point cloud is combined with the static point cloud to obtain the cleaned point cloud with the dynamic object part removed, i.e., the three-dimensional point cloud from which the dynamic point cloud has been eliminated. If the resulting three-dimensional point cloud needs to be converted from the camera coordinate system back to the radar coordinate system, the coordinate conversion can be performed by multiplying by the inverse of the conversion matrix M3 mentioned in the above steps.
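A minimal sketch of this final merge, with the optional conversion back to the radar frame via the inverse of M3 (function name illustrative):

```python
import numpy as np

def merge_and_restore(static_pts, ground_pts, M3=None):
    """static_pts, ground_pts: Nx3 arrays; M3: optional 4x4 radar-to-camera transform."""
    merged = np.vstack([static_pts, ground_pts])        # three-dimensional point cloud without dynamic points
    if M3 is not None:                                  # optionally convert camera frame -> radar frame
        pts_h = np.hstack([merged, np.ones((len(merged), 1))])
        merged = (np.linalg.inv(M3) @ pts_h.T).T[:, :3]
    return merged
```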
By adopting the above technical scheme, the application can be widely used in fields such as autonomous driving, intelligent robots and smart city construction, specifically:
1. In the field of automatic driving, the quality of point cloud data can be improved, so that the perception capability and decision accuracy of an automatic driving system are improved. For example, by accurately separating the point cloud of the dynamic object, collision of the autonomous vehicle with other dynamic objects during driving can be avoided. Meanwhile, the accuracy and the integrity of the point cloud data can be improved through a multi-sensor fusion calibration technology, so that the environment sensing capability of an automatic driving system is improved.
2. In the field of intelligent robots, the three-dimensional environment sensing capability with higher precision can be provided, so that the navigation and obstacle avoidance capability of the intelligent robot is improved. For example, by accurately separating the point cloud of the dynamic object, collision between the intelligent robot and the dynamic object can be avoided. Meanwhile, the accuracy and the integrity of the point cloud data can be improved through a multi-sensor fusion calibration technology, so that the environment sensing capability of the intelligent robot is improved.
3. In the field of smart city construction, a higher-precision three-dimensional city environment modeling capability can be provided, so that the daily life quality of city managers and residents is improved. For example, dynamic objects in urban environments, such as vehicles, pedestrians, etc., can be modeled more accurately by accurately separating out the dynamic object point cloud. Meanwhile, the accuracy and the integrity of the point cloud data can be improved through a multi-sensor fusion calibration technology, so that the accuracy and the fineness of urban environment modeling are improved.
In a second aspect, the present application provides a dynamic point cloud removing device based on multi-mode data sensing, referring to fig. 2, fig. 2 is a schematic virtual structure diagram of the dynamic point cloud removing device based on multi-mode data sensing, including:
The data acquisition module 100 is used for acquiring a single-frame point cloud and determining an image to be processed corresponding to the single-frame point cloud;
the semantic segmentation module 200 is configured to perform camera distortion correction on the image to be processed, and input the corrected image to be processed into a pre-trained image semantic segmentation model, so as to perform semantic segmentation on the corrected image to be processed through the image semantic segmentation model, thereby obtaining a mask image;
The linear transition module 300 is configured to obtain a front frame point cloud and a rear frame point cloud that are closest to an image acquisition time of the image to be processed, and perform linear transition on the single frame point cloud based on the front frame point cloud, the rear frame point cloud, and the image acquisition time;
The point cloud extraction module 400 is configured to project a single-frame point cloud after linear transition to the mask image, and extract a dynamic object point cloud and a static point cloud from the single-frame point cloud, where the dynamic object point cloud includes a dynamic point cloud and a ground point cloud;
the ground point cloud determining module 500 is configured to perform cluster analysis on the dynamic object point cloud to extract the ground point cloud;
and the point cloud data fusion module 600 is configured to combine the static point cloud and the ground point cloud to obtain a three-dimensional point cloud from which the dynamic point cloud is removed.
The dynamic point cloud removal device based on multi-modal data perception according to the embodiment of the present application can execute the dynamic point cloud removal method based on multi-modal data perception provided by the above embodiment, and has the corresponding functional steps and advantages of that method; for details, please refer to the above method embodiment, which is not repeated here.
Referring to fig. 3, fig. 3 is a schematic structural diagram of the electronic device according to the embodiment of the present application. The electronic device may include a processor and a memory, which may be connected by a bus or in other ways. The processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination of such chips. The memory, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as the program instructions/modules corresponding to the dynamic point cloud elimination method based on multi-modal data perception in the embodiment of the application. The processor executes the various functional applications and data processing by running the non-transitory software programs, instructions and modules stored in the memory, i.e., it implements the dynamic point cloud elimination method based on multi-modal data perception in the method embodiment.
The memory may include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required for at least one function, and the data storage area may store data created by the processor, etc. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. The one or more modules are stored in the memory and, when executed by the processor, perform the dynamic point cloud elimination method based on multi-modal data perception in the above method embodiments. The specific details of the electronic device can be understood by reference to the corresponding descriptions and effects in the foregoing method embodiments and are not repeated here. It will be appreciated by those skilled in the art that all or part of the above-described embodiment methods may be implemented by a computer program instructing the related hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like, and may also include a combination of the above types of memories.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims.

Claims (10)

CN202411801168.2A · 2024-12-09 · Dynamic point cloud elimination method, device, equipment and storage medium based on multimodal data perception · Active · CN119919320B (en)

Priority Applications (1)

Application Number · Priority Date · Filing Date · Title
CN202411801168.2A · CN119919320B (en) · 2024-12-09 · Dynamic point cloud elimination method, device, equipment and storage medium based on multimodal data perception

Applications Claiming Priority (1)

Application Number · Priority Date · Filing Date · Title
CN202411801168.2A · CN119919320B (en) · 2024-12-09 · Dynamic point cloud elimination method, device, equipment and storage medium based on multimodal data perception

Publications (2)

Publication Number · Publication Date
CN119919320A (en) · 2025-05-02
CN119919320B · 2025-10-10



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number · Priority date · Publication date · Assignee · Title
CN116643291A (en) * · 2023-05-29 · 2023-08-25 · 中国矿业大学(北京) · SLAM method for removing dynamic targets by combining vision and laser radar
CN117593650A (en) * · 2024-01-18 · 2024-02-23 · 上海几何伙伴智能驾驶有限公司 · Moving point filtering visual SLAM method based on 4D millimeter wave radar and SAM image segmentation

Similar Documents

Publication · Publication Date · Title
CN113159151B (en) · A multi-sensor deep fusion 3D object detection method for autonomous driving
CN111797650B (en) · Obstacle identification method, obstacle identification device, computer equipment and storage medium
Ding et al. · Vehicle pose and shape estimation through multiple monocular vision
CN114217665B (en) · A camera and laser radar time synchronization method, device and storage medium
CN112097732A (en) · Binocular camera-based three-dimensional distance measurement method, system, equipment and readable storage medium
CN115410167A (en) · Target detection and semantic segmentation method, device, equipment and storage medium
US12380592B2 (en) · 2025-08-05 · Image processing system and method
CN111860072A (en) · Parking control method, device, computer device, and computer-readable storage medium
US12260582B2 (en) · 2025-03-25 · Image processing system and method
CN116433737A (en) · Method and device for registering laser radar point cloud and image and intelligent terminal
CN115601430A (en) · Texture-free high-reflection object pose estimation method and system based on key point mapping
CN114926485B (en) · Image depth annotation method, device, equipment and storage medium
WO2025002194A1 (en) · 2025-01-02 · Scene reconstruction method and apparatus, and storage medium and electronic device
CN117584943A (en) · Parking method, device, electronic equipment system and storage medium
CN114648639B (en) · Target vehicle detection method, system and device
CN118644551B (en) · Multi-mode-based pose estimation method for unmanned vehicle tracking moving target
CN117237609B (en) · Multi-mode fusion three-dimensional target detection method and system
CN119919320B (en) · Dynamic point cloud elimination method, device, equipment and storage medium based on multimodal data perception
CN112464905A (en) · 3D target detection method and device
CN118097666A (en) · Semantic segmentation method, device and system and excavator
CN117789193A (en) · Multimode data fusion 3D target detection method based on secondary enhancement
CN117576200A (en) · Long-period mobile robot positioning method, system, equipment and medium
CN117611800A (en) · YOLO-based target grounding point detection and ranging method
CN119919320A (en) · Dynamic point cloud elimination method, device, equipment and storage medium based on multimodal data perception
CN112433193B (en) · Multi-sensor-based mold position positioning method and system

Legal Events

Date · Code · Title · Description
PB01 · Publication
SE01 · Entry into force of request for substantive examination
GR01 · Patent grant
