CN118762235B - Method, device, equipment and medium for determining label - Google Patents


Info

Publication number
CN118762235B
Authority
CN
China
Prior art keywords
point cloud
semantic
frame
dense
dynamic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411216313.0A
Other languages
Chinese (zh)
Other versions
CN118762235A (en)
Inventor
刘家豪
王野
黄鸿胜
张�雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neolix Technologies Co Ltd
Original Assignee
Neolix Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neolix Technologies Co Ltd
Priority to CN202411216313.0A
Publication of CN118762235A
Application granted
Publication of CN118762235B
Status: Active
Anticipated expiration


Abstract

The application discloses a method, a device, equipment and a medium for determining a label, and relates to the field of computer technology. The method comprises: obtaining point cloud semantic information of multiple frames of point clouds to be processed in a designated area; classifying the point cloud semantic information of each frame to obtain a static semantic point cloud and a dynamic semantic point cloud of each frame; respectively performing splicing processing on the static semantic point clouds and the dynamic semantic point clouds of the frames to obtain a dense static semantic point cloud and a dense dynamic semantic point cloud; determining the dense semantic point cloud of each frame by utilizing a merging strategy based on the dense static semantic point cloud and the dense dynamic semantic point cloud; performing voxelization processing on each point cloud in the dense semantic point cloud of each frame based on a voxel grid network of the designated area to obtain the point cloud occupation condition of each voxel grid; and performing selection processing on the semantics of the point clouds of each voxel grid based on the point cloud occupation condition of each voxel grid to obtain an occupied grid semantic label.

Description

Label determining method, device, equipment and medium
Technical Field
The application relates to the field of computer technology, in particular to positioning and autonomous driving, and specifically to a method, a device, equipment and a medium for determining a label.
Background
The general obstacle occupation network (Occupancy Network) model based on deep learning can learn directly from raw sensor data to generate the mapping relation of the occupation grid, can handle complex environments and dynamic changes, and provides high-precision, real-time environment perception capability. Training the obstacle occupation network model, however, requires truth value labels for the obstacle occupation grid.
At present, generating the truth value labels of the obstacle occupation grid involves collecting point cloud data, segmenting the point cloud data, converting the point cloud semantics obtained by segmentation into occupation grid labels, and storing the generated labels. Segmenting point cloud data typically requires manual or semi-automated labeling tools: labeling personnel classify each point, dividing the point cloud into different categories such as roads, pedestrians, vehicles, and buildings. Converting the point cloud into an occupation grid assigns each point to a fixed-size grid cell, and the state and class of each grid cell are determined by the labels of the points it contains.
Disclosure of Invention
The application provides a method, a device, equipment and a medium for determining a tag, which can address the problem that the accuracy of the determined obstacle occupation grid tag data is not high. The technical scheme is as follows:
In a first aspect, a method of tag determination is provided, the method comprising:
acquiring point cloud semantic information of at least one frame of point cloud to be processed in a designated area;
Classifying the point cloud semantic information of each frame to obtain static semantic point clouds and dynamic semantic point clouds of each frame;
Respectively splicing the static semantic point cloud and the dynamic semantic point cloud of each frame to obtain a dense static semantic point cloud and a dense dynamic semantic point cloud;
Based on the dense static semantic point cloud and the dense dynamic semantic point cloud, determining the dense semantic point cloud of each frame by utilizing a preset merging strategy;
Carrying out voxelization processing on each point cloud in the dense semantic point cloud of each frame based on a preset voxel grid network of the designated area so as to obtain the point cloud occupation condition of each voxel grid in the voxel grid network;
Based on the point cloud occupation condition of each voxel grid, the semantics of the point cloud of each voxel grid are selected and processed to determine the occupation grid semantic tags of each voxel grid.
In one possible implementation manner, classifying the point cloud semantic information of each frame to obtain a static semantic point cloud and a dynamic semantic point cloud of each frame includes:
Detecting each frame of point cloud by using a preset target detection model to obtain a dynamic obstacle detection result of each frame;
Based on the dynamic obstacle detection result of each frame, extracting and processing the dynamic semantic point cloud in the point cloud semantic information of each frame to obtain the dynamic semantic point cloud of each frame;
And filtering the dynamic semantic point cloud in the point cloud semantic information of each frame based on the dynamic semantic point cloud of each frame to obtain the static semantic point cloud of each frame.
In one possible implementation manner, the performing a stitching process on the dynamic semantic point cloud of each frame to obtain a dense dynamic semantic point cloud includes:
Acquiring a dynamic obstacle detection result corresponding to the dynamic semantic point cloud of each frame;
Based on a dynamic obstacle detection result corresponding to the dynamic semantic point cloud of each frame, performing splicing processing on the dynamic semantic point cloud of each frame;
and obtaining dense dynamic semantic point clouds based on the result of the splicing process.
In one possible implementation manner, the stitching processing is performed on the static semantic point cloud of each frame to obtain a dense static semantic point cloud, including:
based on a preset sequence, performing splicing processing on the static semantic point cloud of each frame;
and obtaining dense static semantic point clouds based on the result of the splicing process.
In a possible implementation manner, the determining, based on the dense static semantic point cloud and the dense dynamic semantic point cloud, the dense semantic point cloud of each frame by using a preset merging policy includes:
performing the following operations for the point cloud of each frame: acquiring position data of the point cloud of the current frame;
based on the position data of the point cloud of the current frame, converting the dense static semantic point cloud to obtain the dense static semantic point cloud of the current frame;
acquiring a dynamic obstacle detection result of the point cloud of the current frame;
matching the dynamic obstacle detection result with the dense dynamic semantic point cloud to obtain the dense dynamic semantic point cloud of the current frame;
Merging the dense static semantic point cloud of the current frame and the dense dynamic semantic point cloud of the current frame to obtain the dense semantic point cloud of the current frame;
A dense semantic point cloud for each frame is determined based on the dense semantic point cloud for each current frame.
In a possible implementation manner, the voxelization processing is performed on each point cloud in the dense semantic point cloud of each frame based on the preset voxel grid network of the specified area to obtain a point cloud occupation condition of each voxel grid in the voxel grid network, and the method includes:
Performing voxelization processing on each point cloud in the dense semantic point clouds of each frame based on a preset voxel grid network of the designated area to obtain a position index of each point cloud;
Determining the number of point clouds in each voxel grid based on the position index of each point cloud and the preset position index of the voxel grid;
Based on the number of point clouds in each voxel grid, the point cloud occupancy of each voxel grid is obtained.
In a possible implementation manner, the selecting the semantics of the point cloud in each voxel grid based on the point cloud occupation condition of each voxel grid to determine the occupation grid semantic tag of each voxel grid includes:
responding to the point cloud occupation condition of the voxel grid as the point cloud occupation, and acquiring the number of the point clouds and the semantics of the point clouds in the voxel grid;
based on the number of point clouds in the voxel grid and the semantics of the point clouds, selecting the semantics of the point cloud with the highest voting rate in the voxel grid by utilizing a voting mechanism;
And determining the semantics of the point cloud with the highest vote rate as the occupied grid semantic tag of the voxel grid.
In a second aspect, a training method of an obstacle occupation network model is provided, and the method includes:
Acquiring sample data and occupied grid semantic tags corresponding to the sample data, wherein the occupied grid semantic tags are determined by processing the sample data according to the method described in the first aspect and any possible implementation manner of the first aspect;
Performing iterative training processing on an obstacle occupation network model to be trained based on the sample data and the occupation grid semantic tags corresponding to the sample data;
and responding to the training reaching a preset termination condition, and obtaining the obstacle occupation network model after the training is completed.
In a third aspect, there is provided an apparatus for tag determination, the apparatus comprising:
the acquisition unit is used for acquiring point cloud semantic information of at least one frame of point cloud to be processed in the designated area;
the classification unit is used for classifying the point cloud semantic information of each frame to obtain static semantic point clouds and dynamic semantic point clouds of each frame;
The splicing unit is used for respectively carrying out splicing treatment on the static semantic point cloud and the dynamic semantic point cloud of each frame so as to respectively obtain a dense static semantic point cloud and a dense dynamic semantic point cloud;
The merging unit is used for determining the dense semantic point cloud of each frame by utilizing a preset merging strategy based on the dense static semantic point cloud and the dense dynamic semantic point cloud;
The acquisition unit is used for carrying out voxelization processing on each point cloud in the dense semantic point cloud of each frame based on a preset voxel grid network of the appointed area so as to acquire the point cloud occupation condition of each voxel grid in the voxel grid network;
the determining unit is used for selecting and processing the semantics of the point cloud in each voxel grid based on the point cloud occupation condition of each voxel grid so as to determine the occupation grid semantic tags of each voxel grid.
In a fourth aspect, there is provided a training apparatus for an obstacle occupying network model, the apparatus comprising:
an obtaining unit, configured to obtain sample data and an occupied grid semantic tag corresponding to the sample data, where the occupied grid semantic tag is determined by processing the sample data according to the method described in the first aspect and any possible implementation manner of the first aspect;
The training unit is used for carrying out iterative training treatment on the obstacle occupation network model to be trained based on the sample data and the occupation grid semantic tags corresponding to the sample data;
The obtaining unit is used for obtaining the obstacle occupation network model after training is completed in response to the training reaching a preset termination condition.
In a fifth aspect, there is provided a computer readable storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the method of the first aspect and any possible implementation thereof, or the method of the second aspect, as described above.
In a sixth aspect, there is provided an electronic device comprising:
at least one processor, and
A memory communicatively coupled to the at least one processor, wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect and any one of its possible implementations, or the method of the second aspect, as described above.
In a seventh aspect, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method of the first aspect and any possible implementation manner thereof, or the method of the second aspect, as described above.
In an eighth aspect, there is provided an autonomous vehicle comprising an electronic device as described above.
The technical scheme provided by the application has the beneficial effects that at least:
As can be seen from the above technical solution, on one hand, in the embodiment of the present application, point cloud semantic information of at least one frame of point clouds to be processed in a designated area is acquired; the point cloud semantic information of each frame is classified to obtain a static semantic point cloud and a dynamic semantic point cloud of each frame; the static semantic point clouds and the dynamic semantic point clouds of the frames are respectively spliced to obtain a dense static semantic point cloud and a dense dynamic semantic point cloud; the dense semantic point cloud of each frame is determined by utilizing a preset merging strategy based on the dense static semantic point cloud and the dense dynamic semantic point cloud; voxelization processing is performed on each point cloud in the dense semantic point cloud of each frame based on a preset voxel grid network of the designated area, so as to obtain the point cloud occupation condition of each voxel grid; and the semantics of the point cloud of each voxel grid are selected based on the point cloud occupation condition of each voxel grid, so as to determine the occupation grid semantic label of each voxel grid. In this way, automatic determination of occupation grid semantic labels can be realized, and the semantics of the point cloud can be accurately and effectively converted into the occupation grid semantic labels of the corresponding grids, thereby improving the precision and reliability of the occupation grid semantic labels.
On the other hand, in the embodiment of the present application, the point cloud semantic information of each frame corresponding to sample data is classified to obtain a static semantic point cloud and a dynamic semantic point cloud of each frame; the static semantic point clouds and the dynamic semantic point clouds of the frames are respectively spliced to obtain a dense static semantic point cloud and a dense dynamic semantic point cloud; the dense semantic point cloud of each frame is determined by utilizing a preset merging strategy based on the dense static semantic point cloud and the dense dynamic semantic point cloud; and voxelization processing is performed on each point cloud in the dense semantic point cloud of each frame based on a preset voxel grid network of the designated area to obtain the point cloud occupation condition of each voxel grid, so that the semantics of each voxel grid's point cloud can be selected based on the point cloud occupation condition of each voxel grid to obtain the occupation grid semantic labels corresponding to the sample data. The obstacle occupation network model to be trained is then iteratively trained based on the sample data and the occupation grid semantic labels corresponding to the sample data, and the trained obstacle occupation network model is obtained when the training reaches a preset termination condition. Training with more effective and accurate occupation grid semantic labels improves the robustness and performance of the trained obstacle occupation network model.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method of tag determination provided by one embodiment of the present application;
FIG. 2 is a flowchart of a training method of an obstacle occupation network model according to another embodiment of the present application;
FIG. 3 is a flow chart of a method of tag determination provided by another embodiment of the present application;
FIG. 4 is a block diagram of an apparatus for tag determination according to still another embodiment of the present application;
FIG. 5 is a block diagram of a training apparatus for an obstacle occupation network model according to still another embodiment of the present application;
fig. 6 is a block diagram of an electronic device for implementing a method of tag determination of an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It will be apparent that the described embodiments are some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that the terminal device according to the embodiment of the present application may include, but is not limited to, smart devices such as a mobile phone, a personal digital assistant (PDA), a wireless handheld device, and a tablet computer, and the display device may include, but is not limited to, devices with display functions such as a personal computer and a television.
In addition, the term "and/or" herein merely describes an association relation between associated objects, indicating that three relations may exist; for example, A and/or B may represent three cases: A exists alone, both A and B exist, and B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
The general obstacle occupation grid technology is a key technology in automatic driving, intelligent robots, environment sensing and navigation systems. The method realizes accurate representation and understanding of the three-dimensional space by dividing the environment into grids and recording the occupied state of each grid unit.
At present, the obstacle occupation network model based on deep learning can directly learn from the original sensor data to generate the mapping relation of the occupation grid, can process complex environment and dynamic change, provides high-precision and real-time environment sensing capability, and provides safe and reliable navigation and decision support for automatic driving vehicles and robots.
Training the obstacle occupation network model, however, requires obstacle occupation grid truth value tags, namely occupation grid semantic tags. Generating them generally involves several key steps: collecting data, segmenting the scene point cloud, converting the point cloud into an occupation grid, and storing the generated occupation grid semantic tags. Point cloud semantic segmentation typically requires manual or semi-automated labeling tools: labeling personnel classify each point, dividing the point cloud into different categories such as roads, pedestrians, vehicles, and buildings. Converting the point cloud into an occupation grid assigns each point to a fixed-size grid cell, and the state and class of each grid cell are determined by the labels of the points it contains.
However, the above related technologies still have several problems: the manual labeling process for point cloud segmentation is very time-consuming, laborious, and costly; the precision of the generated occupation grid labels is limited by sensor noise, complex real environments, and subjective differences in the labeling process; and large volumes of occupation grid label data occupy a large amount of storage space.
Therefore, a method for determining labels is needed that can automatically and effectively label occupation grid semantics, so as to ensure the reliability and accuracy of the occupation grid semantic labels.
Referring to fig. 1, a flowchart of a method for determining a tag according to an embodiment of the application is shown. The method for determining the label specifically comprises the following steps:
step 101, acquiring point cloud semantic information of at least one frame of point cloud to be processed in a designated area.
Step 102, classifying the point cloud semantic information of each frame to obtain static semantic point clouds and dynamic semantic point clouds of each frame.
Step 103, respectively performing splicing processing on the static semantic point cloud and the dynamic semantic point cloud of each frame to respectively obtain a dense static semantic point cloud and a dense dynamic semantic point cloud.
Step 104, determining the dense semantic point cloud of each frame by utilizing a preset merging strategy based on the dense static semantic point cloud and the dense dynamic semantic point cloud.
Step 105, carrying out voxelization processing on each point cloud in the dense semantic point cloud of each frame based on a preset voxel grid network of the designated area so as to obtain the point cloud occupation condition of each voxel grid in the voxel grid network.
Step 106, selecting and processing the semantics of the point cloud in each voxel grid based on the point cloud occupation condition of each voxel grid so as to determine the occupation grid semantic tags of each voxel grid.
It should be noted that the designated area may be an area where the point cloud is collected. The at least one frame of point clouds may include a point cloud for each of the successive frames.
It should be noted that the dense static semantic point cloud may be a dense point cloud representing a static object or a static scene, i.e. may be a dense point cloud with static semantic categories.
It should be noted that the dense dynamic semantic point cloud may be a dense point cloud representing a dynamic object, i.e. may be a dense point cloud with dynamic semantic categories.
It should be noted that the dense semantic point cloud may be a dense point cloud having semantic categories. Each of the dense semantic point clouds may have a corresponding semantic category, i.e., semantic label.
It should be noted that, part or all of the execution body in steps 101 to 106 may be an application located in the local terminal, or may be a functional unit such as a plug-in unit or a software development kit (Software Development Kit, SDK) disposed in the application located in the local terminal, or may be a processing engine located in a server on the network side, or may be a distributed system located on the network side, for example, a processing engine or a distributed system in a data processing platform on the network side, which is not limited in this embodiment.
It will be appreciated that the application may be a native program (Native App) installed on the local terminal, or may also be a web page program (Web App) of a browser on the local terminal, which is not limited in this embodiment.
In this way, the static semantic point cloud and the dynamic semantic point cloud of each frame can be determined based on the point cloud semantic information of each frame, the static semantic point cloud and the dynamic semantic point cloud of each frame are spliced to obtain the dense static semantic point cloud and the dense dynamic semantic point cloud, the dense static semantic point cloud and the dense dynamic semantic point cloud of each frame are combined, the dense semantic point cloud can be obtained after combination to determine the occupied grid semantic tags corresponding to the voxel grids, automatic determination of the occupied grid semantic tags can be realized, the semantics of the point cloud can be accurately and effectively converted into the occupied grid semantic tags of the corresponding grids, and therefore the accuracy and the reliability of the occupied grid semantic tags are improved.
Alternatively, in one possible implementation manner of the present embodiment, in step 101, specifically, first, at least one frame of point cloud and an image corresponding to the point cloud may be acquired. Secondly, carrying out semantic segmentation processing on the image corresponding to the point cloud by using an image semantic segmentation model to obtain an image semantic segmentation result. And mapping the point cloud onto the image semantic segmentation result to take the semantic segmentation label of the corresponding pixel on the image as the semantic segmentation result of the point cloud. And thirdly, taking the semantic segmentation result of the point cloud as the semantic information of the point cloud.
Therefore, the image semantic segmentation model is utilized to carry out semantic segmentation processing on the image corresponding to the point cloud, so that automation of point cloud semantic annotation is realized, and data processing efficiency is improved.
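For illustration, this mapping can be read as a point-to-pixel projection followed by a label lookup. The following is a minimal sketch assuming a calibrated pinhole camera; the projection matrix, array shapes, and function names are illustrative assumptions, not part of the application.

```python
import numpy as np

def label_points_from_image(points, seg_labels, proj):
    """points: (N, 3) lidar points; seg_labels: (H, W) per-pixel class ids;
    proj: (3, 4) camera projection matrix, assumed known from calibration."""
    h, w = seg_labels.shape
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])  # homogeneous coords
    uvw = (proj @ pts_h.T).T                                    # project to image plane
    in_front = uvw[:, 2] > 0                                    # keep points before the camera
    uv = uvw[in_front, :2] / uvw[in_front, 2:3]                 # perspective divide
    u = uv[:, 0].astype(int)
    v = uv[:, 1].astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)            # inside the image
    labels = np.full(points.shape[0], -1, dtype=int)            # -1: no pixel label
    idx = np.flatnonzero(in_front)[inside]
    labels[idx] = seg_labels[v[inside], u[inside]]              # pixel label -> point label
    return labels
```

Points that fall outside the image or behind the camera keep the placeholder label and would be handled by the other recognition passes described below.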
Optionally, in one possible implementation manner of this embodiment, in step 101, at least one frame of point cloud and an image corresponding to the point cloud may also be acquired first. Secondly, a first recognition algorithm can be utilized to perform ground point cloud recognition processing on the point cloud so as to obtain the first ground point cloud. Again, the point cloud may be subjected to obstacle recognition processing using a second recognition algorithm to obtain the first obstacle point cloud. And determining first point cloud semantic segmentation corresponding to the point cloud by utilizing a preset image semantic segmentation strategy based on the point cloud and the image. And thirdly, filtering the point cloud of the ground semantics in the first point cloud semantic segmentation based on the first ground point cloud to obtain second point cloud semantic segmentation data and second ground point cloud data. And filtering the point cloud data of the obstacle semantics in the second point cloud semantic segmentation data based on the first obstacle point cloud data to obtain third point cloud semantic segmentation data and second obstacle point cloud data. And obtaining a point cloud semantic segmentation result of the point cloud based on the second ground point cloud data, the second obstacle point cloud data and the third point cloud semantic segmentation data. Finally, the semantic segmentation result of the point cloud can be used as the semantic information of the point cloud.
In this implementation, the first recognition algorithm may include a ground point cloud recognition algorithm. The second recognition algorithm may include a target obstacle detection algorithm based on a 3D target detection model.
In this way, the first point cloud semantic segmentation data determined by the image semantic segmentation strategy can be respectively filtered according to the determined obstacle point cloud data and the ground point cloud data to obtain third point cloud semantic segmentation data, second ground point cloud data and second obstacle point cloud data, and then the second ground point cloud data, the second obstacle point cloud data and the third point cloud semantic segmentation data are used as point cloud semantic segmentation results of the point cloud data, so that more accurate point cloud semantic segmentation results can be obtained by combining a multi-dimensional point cloud semantic recognition result, the reliability of the point cloud semantic segmentation results is improved, the reliability of the point cloud semantic information is guaranteed, meanwhile, the automation of point cloud semantic annotation is realized, and the data processing efficiency is improved.
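One way to picture this multi-source fusion, as a minimal sketch under assumed array shapes: labels from the dedicated ground and obstacle recognition passes override the image-based segmentation, and the three sources are combined into the final per-point result. The class id GROUND and all function names are hypothetical.

```python
import numpy as np

GROUND = 0  # hypothetical semantic class id for ground

def fuse_semantics(image_seg, ground_mask, obstacle_labels):
    """image_seg: (N,) per-point labels from the image semantic segmentation strategy;
    ground_mask: (N,) bool from the ground point cloud recognition algorithm;
    obstacle_labels: (N,) class ids from 3D obstacle detection, -1 where none."""
    fused = image_seg.copy()             # first point cloud semantic segmentation
    fused[ground_mask] = GROUND          # ground recognition replaces ground semantics
    detected = obstacle_labels >= 0
    fused[detected] = obstacle_labels[detected]  # detection replaces obstacle semantics
    return fused                         # combined point cloud semantic result
```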
It should be noted that, the specific implementation procedure provided in the present implementation manner may be combined with the various specific implementation procedures provided in the foregoing implementation manner to implement the method for determining a tag in this embodiment. The detailed description may refer to the relevant content in the foregoing implementation, and will not be repeated here.
Optionally, in one possible implementation manner of this embodiment, in step 102, specifically, a preset target detection model may be used to perform detection processing on each frame of point cloud to obtain a dynamic obstacle detection result of each frame, and further, based on the dynamic obstacle detection result of each frame, dynamic semantic point clouds in the point cloud semantic information of each frame may be extracted to obtain dynamic semantic point clouds of each frame, and based on the dynamic semantic point clouds of each frame, dynamic semantic point clouds in the point cloud semantic information of each frame may be filtered to obtain static semantic point clouds of each frame.
In the present implementation, the dynamic obstacle detection results may include, but are not limited to, a dynamic obstacle detection box and a dynamic obstacle category Identification (ID). The dynamic obstacle category identification may be a dynamic obstacle detection box category identification.
In this implementation, the preset object detection model may include a 3D object detection model.
It can be understood that, here, for each single-frame point cloud, a 3D target detection model is used to detect each single-frame point cloud, so that dynamic obstacles, for example, 3D detection frames of vehicles, pedestrians and the like, can be obtained, point clouds in the detection frames are extracted, and dynamic semantic point clouds can be obtained. And filtering the point cloud semantic information of the point cloud by utilizing the dynamic semantic point cloud to obtain a static semantic point cloud, so that the single-frame point cloud can be divided into the dynamic semantic point cloud and the static semantic point cloud. The dynamic semantic point cloud may be a dynamic object point cloud. The static semantic point cloud may be a static scene point cloud. Moreover, the dynamic semantic point cloud and the static semantic point cloud can be stored separately, so that dynamic and static separation of single-frame point clouds can be realized.
In this way, by using the target detection model to detect the dynamic obstacle in each frame of point cloud, and separating the dynamic semantic point cloud and the static semantic point cloud in the point cloud semantic information based on the detected dynamic obstacle, the static semantic point cloud and the dynamic semantic point cloud of each frame are respectively obtained, so that the reliability of the static semantic point cloud and the dynamic semantic point cloud can be improved, the reliability of the semantics of the point cloud can be improved, and the accuracy of the subsequently determined occupied grid semantic tags can be further ensured.
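As a concrete reading of this separation, the sketch below splits one frame into dynamic and static semantic point clouds using detection boxes. Axis-aligned boxes are a simplifying assumption (real 3D detectors output oriented boxes), and all names are illustrative.

```python
import numpy as np

def split_dynamic_static(points, semantics, boxes):
    """points: (N, 3); semantics: (N,) per-point labels;
    boxes: list of (min_xyz, max_xyz) axis-aligned dynamic-obstacle boxes."""
    dynamic_mask = np.zeros(points.shape[0], dtype=bool)
    for lo, hi in boxes:
        inside = np.all((points >= lo) & (points <= hi), axis=1)
        dynamic_mask |= inside                                  # points in any box are dynamic
    dynamic = (points[dynamic_mask], semantics[dynamic_mask])   # extracted dynamic cloud
    static = (points[~dynamic_mask], semantics[~dynamic_mask])  # filtered remainder
    return dynamic, static
```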
It should be noted that, the specific implementation procedure provided in the present implementation manner may be combined with the various specific implementation procedures provided in the foregoing implementation manner to implement the method for determining a tag in this embodiment. The detailed description may refer to the relevant content in the foregoing implementation, and will not be repeated here.
Optionally, in one possible implementation manner of this embodiment, in step 103, specifically, first, a dynamic obstacle detection result corresponding to a dynamic semantic point cloud of each frame may be obtained. And secondly, based on a dynamic obstacle detection result corresponding to the dynamic semantic point cloud of each frame, the dynamic semantic point cloud of each frame can be spliced. Again, a dense dynamic semantic point cloud may be obtained based on the results of the stitching process.
In the present implementation, the dynamic obstacle detection result includes a dynamic obstacle category ID. The dynamic semantic point clouds of each frame may be arranged according to a preset time sequence, that is, dynamic semantic point clouds of consecutive frames.
In a specific implementation process of the implementation manner, the dynamic semantic point clouds of each frame may be spliced based on a dynamic obstacle category ID of a dynamic obstacle detection result corresponding to the dynamic semantic point clouds of each frame. Again, a dense dynamic semantic point cloud may be obtained based on the results of the stitching process.
In this implementation, the dense dynamic semantic point cloud may be a dynamic semantic point cloud of consecutive frames according to a preset time sequence. The dense dynamic semantic point cloud may be a dense point cloud characterizing a dynamic object.
Optionally, in one possible implementation manner of this embodiment, in step 103, the static semantic point cloud of each frame may be further subjected to a stitching process based on a preset sequence, and further a dense static semantic point cloud may be obtained based on a result of the stitching process.
In this implementation, the preset order may be a time order of the static semantic point cloud of each frame.
In a specific implementation process of the implementation manner, based on a preset time sequence, the static semantic point cloud of each frame is spliced, so that a dense static semantic point cloud can be obtained based on a result of the splicing.
In this implementation, the dense static semantic point cloud may be a static scene point cloud that is stitched to get a continuous frame. A dense static semantic point cloud may be a dense point cloud that characterizes a static scene.
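The two stitching passes can be pictured as follows: static clouds are transformed into a shared world frame and concatenated in time order, while dynamic clouds are concatenated per obstacle ID. This is a minimal sketch under assumed data layouts (per-frame poses, box-local dynamic points); none of these names come from the application.

```python
import numpy as np
from collections import defaultdict

def stitch_static(static_frames, poses):
    """static_frames: list of (N_i, 3) static clouds in sensor coordinates;
    poses: list of (4, 4) frame-to-world transforms, in time order."""
    world_parts = []
    for pts, T in zip(static_frames, poses):          # preset (time) order
        pts_h = np.hstack([pts, np.ones((len(pts), 1))])
        world_parts.append((T @ pts_h.T).T[:, :3])    # into the shared world frame
    return np.vstack(world_parts)                     # dense static semantic point cloud

def stitch_dynamic(dynamic_frames):
    """dynamic_frames: list of dicts {obstacle_id: (N, 3) box-local points}."""
    per_object = defaultdict(list)
    for frame in dynamic_frames:
        for obj_id, pts in frame.items():             # splice by obstacle ID
            per_object[obj_id].append(pts)
    return {oid: np.vstack(parts) for oid, parts in per_object.items()}
```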
Alternatively, in one possible implementation of the present embodiment, in step 104, specifically, first, an operation may be performed for the point cloud of each frame, that is, acquiring the position data of the point cloud of the current frame. Secondly, the dense static semantic point cloud can be converted based on the position data of the point cloud of the current frame, so that the dense static semantic point cloud of the current frame is obtained. And thirdly, acquiring a dynamic obstacle detection result of the point cloud of the current frame. And thirdly, carrying out matching processing on the dynamic obstacle detection result and the dense dynamic semantic point cloud to obtain the dense dynamic semantic point cloud of the current frame. And thirdly, merging the dense static semantic point cloud of the current frame and the dense dynamic semantic point cloud of the current frame to obtain the dense semantic point cloud of the current frame. Again, a dense semantic point cloud for each frame is determined based on the dense semantic point cloud for each current frame.
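A sketch of this merging strategy, under the same assumed data layouts as above: the dense static cloud is brought into the current frame via the frame pose, each current detection is matched to its stitched dynamic object and placed at the detected box pose, and the results are concatenated. The pose conventions are assumptions.

```python
import numpy as np

def merge_frame(dense_static_world, frame_pose, detections, dense_dynamic):
    """frame_pose: (4, 4) frame-to-world; detections: list of (obstacle_id, box_pose);
    dense_dynamic: {obstacle_id: (N, 3) box-local points}."""
    world_to_frame = np.linalg.inv(frame_pose)
    pts_h = np.hstack([dense_static_world, np.ones((len(dense_static_world), 1))])
    static_cur = (world_to_frame @ pts_h.T).T[:, :3]  # dense static cloud, current frame
    dynamic_cur = []
    for obj_id, box_pose in detections:               # match detections against the
        if obj_id in dense_dynamic:                   # stitched dynamic objects
            pts = dense_dynamic[obj_id]
            pts_h = np.hstack([pts, np.ones((len(pts), 1))])
            dynamic_cur.append((box_pose @ pts_h.T).T[:, :3])
    return np.vstack([static_cur] + dynamic_cur)      # dense semantic point cloud
```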
Optionally, in one possible implementation manner of this embodiment, in step 105, specifically, first, voxelization processing is performed on each point cloud in the dense semantic point clouds of each frame based on the preset voxel grid network of the designated area, so as to obtain a position index of each point cloud. Next, the number of point clouds within each voxel grid is determined based on the position index of each point cloud and the preset position indexes of the voxel grids. Then, based on the number of point clouds within each voxel grid, the point cloud occupation condition of each voxel grid is obtained.
In this implementation, the preset voxel grid network of the designated region may include a plurality of voxel grids. The preset position index of the voxel grid may be determined based on position information of each voxel grid in the preset voxel grid network.
In a specific implementation process of the implementation manner, for a point in the dense semantic point cloud of each frame with position coordinates (x, y, z), voxelization processing may be performed on the point based on the preset voxel grid network of the designated area to obtain the position index of the point, index_i for i ∈ {x, y, z}, which can be represented by the following formula (1):

index_i = floor((i - i_lb) / i_size)    (1)

where i_lb is the lower bound of the preset valid point cloud range along coordinate axis i, and i_size is the size, or length, of the unit voxel grid along coordinate axis i in the coordinate system defined by the unit voxel grid.

For example, for a three-dimensional Euclidean space in meters and a unit voxel grid that is a cube of length, width, and height 1 m, i_size is 1 for each of i ∈ {x, y, z}.
Here, the preset valid point cloud range may be determined based on the valid acquisition range of the point cloud corresponding to one full revolution of the lidar scan.
In another specific implementation process of the implementation manner, matching processing is performed on the position index of each point cloud and the preset position index of each voxel grid, and the number of the point clouds in each voxel grid is determined based on a result of the matching processing. Again, based on the number of point clouds within each voxel grid, the point cloud occupancy of each voxel grid is obtained.
By way of example, each position of the voxel grid is traversed, if the number of point clouds in the voxel grid is greater than 0, the point cloud occupation condition of the voxel grid can be determined to be point cloud occupation, otherwise, the point cloud occupation condition is determined to be point cloud unoccupied.
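Formula (1) and the occupancy rule can be exercised with a short sketch; the range bounds, voxel size, and grid dimensions below are hypothetical example values.

```python
import numpy as np

def voxelize(points, lb, size, dims):
    """points: (N, 3); lb: (3,) lower bound of the valid range; size: (3,)
    unit voxel edge lengths; dims: (3,) grid dimensions (nx, ny, nz)."""
    idx = np.floor((points - lb) / size).astype(int)   # formula (1) per axis
    ok = np.all((idx >= 0) & (idx < dims), axis=1)     # drop out-of-range points
    counts = np.zeros(tuple(dims), dtype=int)
    np.add.at(counts, tuple(idx[ok].T), 1)             # number of points per voxel
    occupied = counts > 0                              # > 0 points: occupied
    return idx, ok, counts, occupied

# e.g. a 1 m cubic voxel grid covering x, y in [-50, 50) m and z in [-3, 5) m:
# voxelize(pts, lb=np.array([-50., -50., -3.]),
#          size=np.array([1., 1., 1.]), dims=np.array([100, 100, 8]))
```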
Optionally, in one possible implementation manner of the present embodiment, in step 106, specifically, first, in response to the point cloud occupation condition of a voxel grid being point cloud occupied, the number of point clouds and the semantics of the point clouds in the voxel grid are acquired. Next, based on the number of point clouds in the voxel grid and the semantics of the point clouds, the semantics of the point cloud with the highest vote rate in the voxel grid is selected by utilizing a voting mechanism. The semantics of the point cloud with the highest vote rate is then determined as the occupation grid semantic tag of the voxel grid.
In this implementation, the number of semantics of the point cloud may be multiple.
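Since a voxel can contain points with several different semantics, the vote can be implemented as a per-voxel histogram over classes followed by an argmax. A minimal sketch, with tie-breaking by lowest class id as an assumption the application does not specify:

```python
import numpy as np

def vote_labels(voxel_idx, semantics, dims, num_classes):
    """voxel_idx: (N, 3) per-point voxel indices (from formula (1));
    semantics: (N,) per-point semantic class ids in [0, num_classes)."""
    votes = np.zeros((*tuple(dims), num_classes), dtype=int)
    np.add.at(votes, (*voxel_idx.T, semantics), 1)   # one vote per point
    labels = votes.argmax(axis=-1)                   # class with the most votes
    occupied = votes.sum(axis=-1) > 0                # only occupied voxels get labels
    return np.where(occupied, labels, -1)            # -1 marks unoccupied voxels
```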
Alternatively, in one possible implementation of the present embodiment, after step 106, first, the position information of each voxel grid may be acquired. Next, the position index of each voxel grid is determined based on the position information of each voxel grid. Then, based on the occupation grid semantic tag of each voxel grid, an occupation grid semantic tag index of each voxel grid is determined. Finally, encoding processing is performed on each voxel grid based on a preset acquisition range corresponding to the voxel grid network, the position index of each voxel grid, and the occupation grid semantic tag index, so as to obtain each encoded voxel grid, which is then stored.
In a specific implementation process of the implementation manner, a preset acquisition range corresponding to the voxel grid network can be determined based on an acquisition coverage area of the laser radar.
Here, the acquisition coverage area of the lidar may be an area that the lidar can acquire. The position index of each voxel grid may be a preset position index of each voxel grid. The preset position index of each voxel grid may be determined based on position information of each voxel grid.
In another specific implementation process of the implementation manner, for any voxel grid, encoding processing can be performed based on the position index of the voxel grid and its occupation grid semantic tag, so as to obtain an encoded voxel grid index, which can be shown in the following formula (2):

index = x_index × y_size × z_size × c_size + y_index × z_size × c_size + z_index × c_size + c_index    (2)
Where x_index, y_index, and z_index may represent the position index of the voxel grid, c_index may represent the occupied grid semantic label of the voxel grid, i.e. the category index, and y_size, z_size, and c_size may be the maximum values of y_index, z_index, and c_index, respectively.
Illustratively, here, the encoded voxel grid may be a 32-bit integer number.
Therefore, the voxel grids with the occupied grid semantic tags can be subjected to coding processing and stored in a coding mode, and the storage space of large-batch occupied grid semantic tag data can be greatly reduced.
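The encoding of formula (2) and its inverse can be sketched as integer packing. For the packing to be reversible, each size here is taken as a strict upper bound on the corresponding index, and keeping the product of the sizes within 32 bits matches the integer width mentioned above; the function names are illustrative.

```python
def encode_voxel(x_idx, y_idx, z_idx, c_idx, y_size, z_size, c_size):
    return ((x_idx * y_size * z_size * c_size)   # formula (2)
            + (y_idx * z_size * c_size)
            + (z_idx * c_size)
            + c_idx)

def decode_voxel(code, y_size, z_size, c_size):
    c_idx = code % c_size                        # peel off fields in reverse order
    code //= c_size
    z_idx = code % z_size
    code //= z_size
    y_idx = code % y_size
    x_idx = code // y_size
    return x_idx, y_idx, z_idx, c_idx

# e.g. encode_voxel(3, 5, 2, 7, y_size=100, z_size=8, c_size=16) -> 39079,
# and decode_voxel(39079, 100, 8, 16) -> (3, 5, 2, 7)
```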
It should be noted that, the specific implementation procedure provided in the present implementation manner may be combined with the various specific implementation procedures provided in the foregoing implementation manner to implement the method for determining a tag in this embodiment. The detailed description may refer to the relevant content in the foregoing implementation, and will not be repeated here.
Fig. 2 is a flowchart of a training method of an obstacle occupation network model according to another embodiment of the present application. As shown in fig. 2, the method includes the following steps:
Step 201, acquiring sample data and occupied grid semantic tags corresponding to the sample data, wherein the occupied grid semantic tags are determined by processing the sample data according to the method described in the foregoing embodiment.
Step 202, performing iterative training processing on the obstacle occupation network model to be trained based on the sample data and the occupation grid semantic tags corresponding to the sample data.
Step 203, obtaining the trained obstacle occupation network model in response to the training reaching a preset termination condition.
It should be noted that the sample data may include a point cloud of a specified area.
It should be noted that the preset termination condition may include that the number of iterations reaches a preset number of times threshold, that the training result satisfies a preset convergence condition, and so on.
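Steps 201 to 203 amount to a standard supervised training loop whose exit is the preset termination condition. A minimal sketch in a PyTorch-like style; the model, loader, loss function, and thresholds are hypothetical placeholders, not APIs named by the application.

```python
from itertools import cycle

def train_occupancy_model(model, loader, optimizer, loss_fn,
                          max_iters=100_000, converge_eps=1e-4):
    """Iteratively train on (sample, occupied-grid semantic label) pairs."""
    prev_loss = float("inf")
    for step, (sample, grid_labels) in enumerate(cycle(loader), start=1):
        pred = model(sample)                   # predicted occupancy grid
        loss = loss_fn(pred, grid_labels)      # compare against the truth labels
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # preset termination: iteration threshold or convergence condition
        if step >= max_iters or abs(prev_loss - loss.item()) < converge_eps:
            break
        prev_loss = loss.item()
    return model
```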
Before step 201, the point cloud semantic information of each frame corresponding to the sample data may be classified to obtain a static semantic point cloud and a dynamic semantic point cloud of each frame, the static semantic point cloud and the dynamic semantic point cloud of each frame are respectively spliced to obtain a dense static semantic point cloud and a dense dynamic semantic point cloud, the dense semantic point cloud of each frame is determined by using a preset merging strategy based on the dense static semantic point cloud and the dense dynamic semantic point cloud, and voxel processing is performed on each point cloud in the dense semantic point cloud of each frame based on a preset voxel grid network of the specified area to obtain a point cloud occupation condition of each voxel grid, so that the semantics of each voxel grid point cloud can be selected and processed based on the point cloud occupation condition of each voxel grid to obtain an occupation grid semantic label corresponding to the sample data.
Before step 201, each encoded voxel grid corresponding to the sample data may be obtained from the database based on the sample data, and decoding processing may then be performed on each encoded voxel grid, so as to obtain the occupation grid semantic label corresponding to the sample data.
It should be noted that, part or all of the execution bodies of steps 201 to 203 may be applications located at the local terminal, or may be functional units such as a plug-in unit or a software development kit (Software Development Kit, SDK) disposed in the applications located at the local terminal, or may be a processing engine located in a server on the network side, or may be a distributed system located on the network side, for example, a processing engine or a distributed system in a model training platform on the network side, which is not limited in this embodiment.
It will be appreciated that the application may be a native program (Native App) installed on the local terminal, or may also be a web page program (Web App) of a browser on the local terminal, which is not limited in this embodiment.
Therefore, the obstacle occupation network model can be obtained through training by utilizing the occupation grid semantic tags of the more effective and accurate sample data, and the robustness and performance of the trained obstacle occupation network model can be improved.
It can be understood that, for a specific implementation manner of determining the occupied grid semantic tag corresponding to the sample data, reference may be made to the corresponding specific implementation manner and the specific implementation process in the tag determination method in the foregoing embodiment, which are not described herein.
For a better understanding of the method according to the embodiment of the present application, the following describes the method according to the embodiment of the present application with reference to the accompanying drawings and specific application scenarios.
Fig. 3 is a flowchart of a method for determining a tag according to another embodiment of the present application. As shown in fig. 3, the method includes the following steps:
Step 301, at least one frame of point cloud and an image corresponding to the point cloud are acquired.
In this embodiment, at least one frame of point cloud may be the acquired point cloud data of the designated area. A single frame in the at least one frame of point cloud may be the point cloud data acquired during one full revolution of the lidar scan. The image corresponding to a single-frame point cloud may be the image data acquired by an image acquisition device, such as a camera, in correspondence with that revolution of the lidar scan.
Step 302, performing semantic segmentation processing on the image corresponding to the point cloud of each frame by using an image semantic segmentation model to obtain an image semantic segmentation result corresponding to the point cloud of each frame.
In this embodiment, for a point cloud of a single frame, an image semantic segmentation model may be used to perform semantic segmentation processing on an image corresponding to the point cloud, so as to obtain an image semantic segmentation result corresponding to the point cloud.
Step 303, mapping the point cloud of each frame onto an image semantic segmentation result corresponding to the point cloud of each frame, using a semantic segmentation label of a corresponding pixel on the image as the semantic segmentation result of the point cloud, and using the semantic segmentation result of the point cloud as the semantic information of the point cloud.
Step 304, detecting each frame of point cloud by using a 3D target detection model to obtain a dynamic obstacle detection result of each frame.
Step 305, based on the dynamic obstacle detection result of each frame, performing separation processing on the dynamic semantic point cloud and the static semantic point cloud in the point cloud semantic information of each frame to obtain the static semantic point cloud and the dynamic semantic point cloud of each frame.
In this embodiment, first, based on a dynamic obstacle detection result of each frame, a dynamic semantic point cloud in the point cloud semantic information of each frame is extracted. Secondly, other semantic point clouds in the point cloud semantic information of each frame after the dynamic semantic point clouds are extracted are used as static semantic point clouds of each frame.
Step 306, based on the dynamic obstacle category ID of the dynamic obstacle detection result corresponding to the dynamic semantic point cloud of each frame, performing a stitching process on the dynamic semantic point cloud of each frame to obtain a dense dynamic semantic point cloud.
In the present embodiment, the dynamic obstacle detection result may include, but is not limited to, a dynamic obstacle detection frame and a dynamic obstacle class ID. The dynamic obstacle category identification may be a dynamic obstacle detection box category identification.
Step 307, based on the time sequence of the point cloud of each frame, performing a stitching process on the static semantic point cloud of each frame to obtain a dense static semantic point cloud.
In this embodiment, the dense static semantic point cloud may be a static semantic point cloud of a continuous frame point cloud obtained by stitching.
In this embodiment, the timing of the point cloud of each frame may be the acquisition time sequence of the point cloud of each frame.
Step 308, performing the following operations on the point clouds of each frame, namely, performing conversion processing on the dense static semantic point clouds based on the position data of the point clouds of the current frame to obtain the dense static semantic point clouds of the current frame, performing matching processing on the dynamic obstacle detection result of the point clouds of the current frame and the dense dynamic semantic point clouds to obtain the dense dynamic semantic point clouds of the current frame, and performing merging processing on the dense static semantic point clouds of the current frame and the dense dynamic semantic point clouds of the current frame to obtain the dense semantic point clouds of the current frame to determine the dense semantic point clouds of each frame.
Step 309, based on a preset voxel grid network of the designated area, voxel processing is performed on each point cloud in the dense semantic point cloud of each frame to obtain the point cloud occupation condition of each voxel grid.
In this embodiment, the preset voxel grid network of the designated region includes a plurality of voxel grids.
Step 310, in response to the point cloud occupation condition of a voxel grid being point cloud occupied, selecting, by utilizing a voting mechanism based on the number of point clouds and the semantics of the point clouds in the voxel grid, the semantics of the point cloud with the highest vote rate in the voxel grid as the occupation grid semantic tag of the voxel grid.
In this embodiment, specifically, first, voxelization processing is performed on each point cloud in the dense semantic point clouds of each frame based on the preset voxel grid network of the designated area, so as to obtain the position index of each point cloud. Next, the number of point clouds within each voxel grid is determined based on the position index of each point cloud and the preset position indexes of the voxel grids. Then, based on the number of point clouds within each voxel grid, the point cloud occupation condition of each voxel grid is obtained.
Here, the position index of each point cloud may be expressed as index_i for i ∈ {x, y, z}, i.e. the x_index, y_index, and z_index of the point cloud coordinates.
It will be appreciated that the position index of each point cloud can be calculated using the aforementioned formula (1).
In this embodiment, the point cloud occupancy condition of the voxel grid may include a point cloud occupancy and a point cloud unoccupied.
By way of example, each position of the voxel grid is traversed, if the number of point clouds in the voxel grid is greater than 0, the point cloud occupation condition of the voxel grid can be determined to be point cloud occupation, otherwise, the point cloud occupation condition is determined to be point cloud unoccupied.
In this embodiment, each voxel grid may be traversed, for each voxel grid, the point cloud occupation condition of the voxel grid is responded to be point cloud occupation, and based on the number of point clouds and the semantics of the point clouds in the voxel grid, the semantics of the point cloud with the highest vote rate in the voxel grid is selected as the occupation grid semantic tag of the voxel grid by using a voting mechanism.
So far, iterative training processing can be performed on the obstacle occupation network model to be trained based on the occupation grid semantic tags of the voxel grids and the corresponding point cloud sample data, so as to obtain the trained obstacle occupation network model.
Step 311, performing encoding processing on each voxel grid based on a collection range corresponding to a preset voxel grid network, a preset position index of each voxel grid and an occupied grid semantic tag of each voxel grid to obtain each encoding voxel grid, so as to perform storage processing on each encoding voxel grid.
Here, the preset position index of each voxel grid may be determined based on position information of each voxel grid in the preset voxel grid network.
In this embodiment, the occupied grid semantic tag index of each voxel grid may be determined based on the occupied grid semantic tag of each voxel grid, and then encoding processing may be performed on each voxel grid based on the collection range corresponding to the preset voxel grid network, the preset position index of each voxel grid, and the occupied grid semantic tag index, so as to obtain each encoded voxel grid, and store the encoded voxel grid.
It will be appreciated that each encoded voxel grid index may be calculated using equation (2) above.
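Since equation (2) is not reproduced here, the following sketch assumes a standard row-major linear flattening of (x_index, y_index, z_index) over the collection range as the encoding; the patent's actual equation (2) may differ:

```python
import numpy as np

def encode_occupied_voxels(labels, free_label=-1, path="frame_occupancy.npy"):
    """Store one frame's occupied voxel grids as sparse
    (linear index, semantic label) pairs.

    Assumes a row-major flattening over the grid shape, i.e.
    code = x_index * (Y * Z) + y_index * Z + z_index,
    which is what np.ravel_multi_index computes.
    """
    grid_shape = labels.shape
    mask = labels != free_label
    occ = np.argwhere(mask)                       # (K, 3) occupied indices
    codes = np.ravel_multi_index(tuple(occ.T), grid_shape)
    encoded = np.stack([codes, labels[mask]], axis=1)

    # Persisting only the sparse pairs keeps a single frame's storage
    # proportional to the number of occupied grids, not the full volume.
    np.save(path, encoded)
    return encoded
```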
By adopting the technical solution of this embodiment, the static semantic point cloud and the dynamic semantic point cloud of each frame can be determined based on the point cloud semantic information of each frame; the static semantic point clouds and the dynamic semantic point clouds are spliced to obtain a dense static semantic point cloud and a dense dynamic semantic point cloud; these are then merged to obtain the dense semantic point cloud of each frame, from which the occupied grid semantic label of each voxel grid is determined. The occupied grid semantic labels can thus be determined automatically, and the semantics of the point clouds can be accurately and effectively converted into the occupied grid semantic labels of the corresponding grids, improving the precision and reliability of the occupied grid semantic labels.
In addition, with the technical solution of this embodiment, the voting mechanism effectively improves the precision of the occupied grid semantic labels, and encoding the grids effectively reduces the storage footprint of a single frame's occupied grids.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
Fig. 4 is a block diagram showing the structure of an apparatus for label determination according to an embodiment of the present application. As shown in Fig. 4, the apparatus 400 for label determination of this embodiment may include an acquisition unit 401, a classification unit 402, a splicing unit 403, a merging unit 404, an obtaining unit 405, and a determination unit 406. The acquisition unit 401 is configured to acquire point cloud semantic information of at least one frame of point cloud to be processed in a designated area; the classification unit 402 is configured to classify the point cloud semantic information of each frame to obtain a static semantic point cloud and a dynamic semantic point cloud of each frame; the splicing unit 403 is configured to splice the static semantic point clouds and the dynamic semantic point clouds of the frames respectively, so as to obtain a dense static semantic point cloud and a dense dynamic semantic point cloud; the merging unit 404 is configured to determine the dense semantic point cloud of each frame based on the dense static semantic point cloud and the dense dynamic semantic point cloud by using a preset merging strategy; the obtaining unit 405 is configured to voxelize each point cloud in the dense semantic point cloud of each frame based on a preset voxel grid network of the designated area, so as to obtain the point cloud occupation condition of each voxel grid; and the determination unit 406 is configured to select, based on the point cloud occupation condition of each voxel grid, the semantics of the point clouds in each voxel grid, so as to determine the occupied grid semantic label of each voxel grid.
It should be noted that part or all of the apparatus for label determination in this embodiment may be an application located at a local terminal, or may also be a functional unit such as a plug-in or a software development kit (Software Development Kit, SDK) disposed in the application located at the local terminal, or may also be a processing engine located in a server on a network side, or may also be a distributed system located on the network side, for example, a processing engine or a distributed system in a data processing platform on the network side, which is not limited in this embodiment.
It will be appreciated that the application may be a native program (Native App) installed on the local terminal, or may also be a web page program (Web App) of a browser on the local terminal, which is not limited in this embodiment.
Optionally, in one possible implementation manner of this embodiment, the classification unit 402 may specifically be configured to: detect each frame of point cloud by using a preset target detection model to obtain a dynamic obstacle detection result of each frame; extract the dynamic semantic point cloud from the point cloud semantic information of each frame based on the dynamic obstacle detection result of that frame, so as to obtain the dynamic semantic point cloud of each frame; and filter the dynamic semantic point cloud out of the point cloud semantic information of each frame based on the dynamic semantic point cloud of that frame, so as to obtain the static semantic point cloud of each frame.
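As an illustration of this classification step, a simplified sketch is given below; it assumes axis-aligned detection boxes for the containment test, whereas a real detector would typically output oriented boxes requiring a rotation into each box frame first:

```python
import numpy as np

def split_dynamic_static(points, semantics, boxes):
    """Split one frame into dynamic and static semantic point clouds
    using dynamic-obstacle detection boxes.

    points:    (N, 3) point coordinates
    semantics: (N,) per-point semantic labels
    boxes:     (B, 6) axis-aligned boxes [xmin, ymin, zmin, xmax, ymax, zmax]
    """
    dynamic_mask = np.zeros(len(points), dtype=bool)
    for box in boxes:
        lo, hi = box[:3], box[3:]
        inside = np.all((points >= lo) & (points <= hi), axis=1)
        dynamic_mask |= inside

    # Points inside any detection are extracted as the dynamic semantic
    # point cloud; filtering them out leaves the static semantic point cloud.
    dynamic = (points[dynamic_mask], semantics[dynamic_mask])
    static = (points[~dynamic_mask], semantics[~dynamic_mask])
    return dynamic, static
```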
Optionally, in one possible implementation manner of this embodiment, the splicing unit 403 may be configured to: obtain the dynamic obstacle detection result corresponding to the dynamic semantic point cloud of each frame; splice the dynamic semantic point clouds of the frames based on these dynamic obstacle detection results; and obtain the dense dynamic semantic point cloud based on the result of the splicing.
Optionally, in one possible implementation manner of this embodiment, the splicing unit 403 may further be configured to splice the static semantic point cloud of each frame based on a preset order, and to obtain the dense static semantic point cloud based on the result of the splicing.
Optionally, in one possible implementation manner of this embodiment, the merging unit 404 may be configured to: obtain position data of the point cloud of the current frame; convert the dense static semantic point cloud based on the position data of the point cloud of the current frame, so as to obtain a dense static semantic point cloud of the current frame; obtain a dynamic obstacle detection result of the point cloud of the current frame; match the dynamic obstacle detection result against the dense dynamic semantic point cloud, so as to obtain a dense dynamic semantic point cloud of the current frame; merge the dense static semantic point cloud of the current frame and the dense dynamic semantic point cloud of the current frame to obtain a dense semantic point cloud of the current frame; and determine the dense semantic point cloud of each frame based on the dense semantic point cloud of each current frame.
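A simplified sketch of such a merging strategy is given below; the 4x4 homogeneous-transform convention and the function names are assumptions made for the example, since the embodiment only states that a conversion and a matching step are performed:

```python
import numpy as np

def merge_current_frame(static_pts, static_sem, world_to_frame,
                        dynamic_pts, dynamic_sem, object_to_frame):
    """Assemble the dense semantic point cloud of the current frame.

    world_to_frame:  4x4 transform built from the current frame's
                     localization (position) data
    object_to_frame: 4x4 transform taken from the dynamic-obstacle
                     detection matched to the current frame
    """
    def apply(pose, pts):
        homo = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coords
        return (homo @ pose.T)[:, :3]

    static_cur = apply(world_to_frame, static_pts)     # dense static map -> frame
    dynamic_cur = apply(object_to_frame, dynamic_pts)  # dense object cloud -> frame

    # Merging the converted static cloud with the matched dynamic cloud
    # yields the dense semantic point cloud of the current frame.
    points = np.vstack([static_cur, dynamic_cur])
    semantics = np.concatenate([static_sem, dynamic_sem])
    return points, semantics
```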
Optionally, in one possible implementation manner of this embodiment, the obtaining unit 405 may be configured to: voxelize each point cloud in the dense semantic point cloud of each frame based on the preset voxel grid network of the designated area, so as to obtain a position index of each point cloud; determine the number of point clouds within each voxel grid based on the position index of each point cloud and the preset position index of the voxel grid; and obtain the point cloud occupation condition of each voxel grid based on the number of point clouds within each voxel grid.
Optionally, in one possible implementation manner of this embodiment, the determination unit 406 may be configured to: in response to the point cloud occupation condition of a voxel grid being point cloud occupied, obtain the number of point clouds and the semantics of the point clouds in the voxel grid; select, by using a voting mechanism, the semantics of the point cloud with the highest vote rate in the voxel grid based on the number of point clouds and the semantics of the point clouds; and determine the semantics of the point cloud with the highest vote rate as the occupied grid semantic label of the voxel grid.
In this embodiment, the acquisition unit may acquire point cloud semantic information of at least one frame of point cloud to be processed in a designated area; the classification unit may then classify the point cloud semantic information of each frame to obtain a static semantic point cloud and a dynamic semantic point cloud of each frame; the splicing unit may splice the static semantic point clouds and the dynamic semantic point clouds respectively, so as to obtain a dense static semantic point cloud and a dense dynamic semantic point cloud; the merging unit may determine the dense semantic point cloud of each frame based on the dense static semantic point cloud and the dense dynamic semantic point cloud by using a preset merging strategy; the obtaining unit may voxelize each point cloud in the dense semantic point cloud of each frame based on the preset voxel grid network of the designated area, so as to obtain the point cloud occupation condition of each voxel grid in the voxel grid network; and the determination unit may select, based on the point cloud occupation condition of each voxel grid, the semantics of the point clouds in each voxel grid, so as to determine the occupied grid semantic label of each voxel grid. In this way, the occupied grid semantic labels can be determined automatically, and the semantics of the point clouds are accurately and effectively converted into the occupied grid semantic labels of the corresponding grids, improving the precision and reliability of the occupied grid semantic labels.
Fig. 5 is a block diagram of a training apparatus for an obstacle occupation network model according to an embodiment of the present application. As shown in Fig. 5, the training apparatus 500 of the obstacle occupation network model of this embodiment may include an acquisition unit 501, a training unit 502, and an obtaining unit 503. The acquisition unit 501 is configured to acquire sample data and the occupied grid semantic label corresponding to the sample data; the training unit 502 is configured to iteratively train the obstacle occupation network model to be trained based on the sample data and the occupied grid semantic label corresponding to the sample data; and the obtaining unit 503 is configured to obtain the trained obstacle occupation network model in response to the training reaching a preset termination condition.
It should be noted that part or all of the training apparatus of the obstacle occupation network model in this embodiment may be an application located at a local terminal, or may also be a functional unit such as a plug-in or a software development kit (Software Development Kit, SDK) disposed in the application located at the local terminal, or may also be a processing engine located in a server on a network side, or may also be a distributed system located on the network side, for example, a processing engine or a distributed system in a model training platform on the network side, which is not limited in this embodiment.
It will be appreciated that the application may be a native program (Native App) installed on the local terminal, or may also be a web page program (Web App) of a browser on the local terminal, which is not limited in this embodiment.
Optionally, in one possible implementation manner of this embodiment, the acquisition unit 501 may specifically be configured to: classify the point cloud semantic information of each frame corresponding to the sample data to obtain a static semantic point cloud and a dynamic semantic point cloud of each frame; splice the static semantic point clouds and the dynamic semantic point clouds to obtain a dense static semantic point cloud and a dense dynamic semantic point cloud; determine the dense semantic point cloud of each frame based on the dense static semantic point cloud and the dense dynamic semantic point cloud by using a preset merging strategy; voxelize each point cloud in the dense semantic point cloud of each frame based on a preset voxel grid network of the designated area, so as to obtain the point cloud occupation condition of each voxel grid; and, based on the point cloud occupation condition of each voxel grid, select the semantics of the point clouds in each voxel grid, so as to obtain the occupied grid semantic label corresponding to the sample data.
In this embodiment, the acquisition unit may acquire the sample data and the occupied grid semantic label corresponding to the sample data, where the occupied grid semantic label is determined by processing the sample data according to the label determination method of the foregoing embodiments; the training unit may then iteratively train the obstacle occupation network model to be trained based on the sample data and the corresponding occupied grid semantic label; and the obtaining unit may obtain the trained obstacle occupation network model in response to the training reaching a preset termination condition.
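For illustration, a minimal iterative-training loop consistent with this description might look as follows in Python/PyTorch; the optimizer, the loss function, and the specific termination condition (an epoch budget or a loss threshold) are assumptions of the sketch:

```python
import torch

def train_occupancy_model(model, loader, max_epochs=50, target_loss=0.05):
    """Iteratively train an occupancy model on (sample, label) pairs,
    stopping at a preset termination condition."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss(ignore_index=-1)  # -1 marks free voxels

    for epoch in range(max_epochs):
        total = 0.0
        for samples, labels in loader:   # labels: per-voxel occupied grid semantics
            optimizer.zero_grad()
            logits = model(samples)      # assumed shape (B, num_classes, X, Y, Z)
            loss = loss_fn(logits, labels)
            loss.backward()
            optimizer.step()
            total += loss.item()
        if total / len(loader) < target_loss:  # preset termination condition
            break
    return model
```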
In the technical solution of the present application, the collection, storage, use, processing, transmission, provision, disclosure and other handling of the user's personal information, such as images and attribute data of the user, complies with the provisions of relevant laws and regulations and does not violate public order and good customs.
According to embodiments of the present application, the present application also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present application, there is further provided an autonomous driving vehicle including the electronic device provided above; the vehicle may be an unmanned vehicle of level L2 or above, for example an unmanned delivery vehicle or an unmanned logistics vehicle.
Fig. 6 shows a schematic block diagram of an example electronic device 600 that may be used to implement an embodiment of the application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the applications described and/or claimed herein.
As shown in fig. 6, the electronic device 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the electronic device 600 can also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Various components in the electronic device 600 are connected to the I/O interface 605, including an input unit 606, such as a keyboard, mouse, etc., an output unit 607, such as various types of displays, speakers, etc., a storage unit 608, such as a magnetic disk, optical disk, etc., and a communication unit 609, such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the electronic device 600 to exchange information/data with other devices through a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the respective methods and processes described above, for example, the method of label determination or the training method of the obstacle occupation network model. For example, in some embodiments, the method of label determination or the training method of the obstacle occupation network model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the method of label determination described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured in any other suitable way (e.g., by means of firmware) to perform the method of label determination or the training method of the obstacle occupation network model.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special or general purpose programmable processor, operable to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present application may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user, for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a Local Area Network (LAN), a Wide Area Network (WAN), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, so long as the desired result of the technical solution of the present disclosure is achieved, and the present disclosure is not limited herein.
The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims (12)

Translated from Chinese
1. A method for determining a label, characterized in that the method comprises:
acquiring point cloud semantic information of at least one frame of point cloud to be processed in a designated area, wherein the point cloud semantic information is automatically annotated;
classifying the point cloud semantic information of each frame to obtain a static semantic point cloud and a dynamic semantic point cloud of each frame;
splicing the static semantic point cloud and the dynamic semantic point cloud of each frame respectively, so as to obtain a dense static semantic point cloud and a dense dynamic semantic point cloud respectively;
determining a dense semantic point cloud of each frame based on the dense static semantic point cloud and the dense dynamic semantic point cloud by using a preset merging strategy;
voxelizing each point cloud in the dense semantic point cloud of each frame based on a preset voxel grid network of the designated area, so as to obtain a position index of each point cloud, the position index of each point cloud satisfying index_i = floor((p_i - i_ub) / i_size), i ∈ {x, y, z}, where p_i is the coordinate of the point cloud along coordinate axis i, i_ub is the lower bound of the preset valid point cloud range along coordinate axis i, and i_size is the length of a unit voxel grid along coordinate axis i;
determining the number of point clouds within each voxel grid based on the position index of each point cloud and a preset position index of the voxel grid;
obtaining a point cloud occupation condition of each voxel grid based on the number of point clouds within each voxel grid; and
selecting, based on the point cloud occupation condition of each voxel grid, the semantics of the point clouds in each voxel grid, so as to determine an occupied grid semantic label of each voxel grid.

2. The method according to claim 1, characterized in that classifying the point cloud semantic information of each frame to obtain the static semantic point cloud and the dynamic semantic point cloud of each frame comprises:
detecting each frame of point cloud by using a preset target detection model to obtain a dynamic obstacle detection result of each frame;
extracting the dynamic semantic point cloud from the point cloud semantic information of each frame based on the dynamic obstacle detection result of each frame, so as to obtain the dynamic semantic point cloud of each frame; and
filtering the dynamic semantic point cloud out of the point cloud semantic information of each frame based on the dynamic semantic point cloud of each frame, so as to obtain the static semantic point cloud of each frame.

3. The method according to claim 1, characterized in that splicing the dynamic semantic point cloud of each frame to obtain the dense dynamic semantic point cloud comprises:
acquiring the dynamic obstacle detection result corresponding to the dynamic semantic point cloud of each frame;
splicing the dynamic semantic point cloud of each frame based on the dynamic obstacle detection result corresponding to the dynamic semantic point cloud of each frame; and
obtaining the dense dynamic semantic point cloud based on the result of the splicing.

4. The method according to claim 1, characterized in that splicing the static semantic point cloud of each frame to obtain the dense static semantic point cloud comprises:
splicing the static semantic point cloud of each frame based on a preset order; and
obtaining the dense static semantic point cloud based on the result of the splicing.

5. The method according to claim 1, characterized in that determining the dense semantic point cloud of each frame based on the dense static semantic point cloud and the dense dynamic semantic point cloud by using the preset merging strategy comprises performing the following operations for the point cloud of each frame:
acquiring position data of the point cloud of the current frame;
converting the dense static semantic point cloud based on the position data of the point cloud of the current frame, so as to obtain a dense static semantic point cloud of the current frame;
acquiring a dynamic obstacle detection result of the point cloud of the current frame;
matching the dynamic obstacle detection result against the dense dynamic semantic point cloud, so as to obtain a dense dynamic semantic point cloud of the current frame;
merging the dense static semantic point cloud of the current frame and the dense dynamic semantic point cloud of the current frame to obtain a dense semantic point cloud of the current frame; and
determining the dense semantic point cloud of each frame based on the dense semantic point cloud of each current frame.

6. The method according to claim 1, characterized in that selecting, based on the point cloud occupation condition of each voxel grid, the semantics of the point clouds in each voxel grid so as to determine the occupied grid semantic label of each voxel grid comprises:
in response to the point cloud occupation condition of a voxel grid being point cloud occupied, acquiring the number of point clouds in the voxel grid and the semantics of the point clouds;
selecting, by using a voting mechanism, the semantics of the point cloud with the highest vote rate in the voxel grid based on the number of point clouds and the semantics of the point clouds in the voxel grid; and
determining the semantics of the point cloud with the highest vote rate as the occupied grid semantic label of the voxel grid.

7. A training method for an obstacle occupation network model, characterized in that the method comprises:
acquiring sample data and an occupied grid semantic label corresponding to the sample data, wherein the occupied grid semantic label is determined by processing the sample data according to the method of any one of claims 1 to 6;
iteratively training an obstacle occupation network model to be trained based on the sample data and the occupied grid semantic label corresponding to the sample data; and
obtaining a trained obstacle occupation network model in response to the training reaching a preset termination condition.

8. An apparatus for determining a label, characterized in that the apparatus comprises:
an acquisition unit, configured to acquire point cloud semantic information of at least one frame of point cloud to be processed in a designated area, wherein the point cloud semantic information is automatically annotated;
a classification unit, configured to classify the point cloud semantic information of each frame to obtain a static semantic point cloud and a dynamic semantic point cloud of each frame;
a splicing unit, configured to splice the static semantic point cloud and the dynamic semantic point cloud of each frame respectively, so as to obtain a dense static semantic point cloud and a dense dynamic semantic point cloud respectively;
a merging unit, configured to determine a dense semantic point cloud of each frame based on the dense static semantic point cloud and the dense dynamic semantic point cloud by using a preset merging strategy;
an obtaining unit, configured to voxelize each point cloud in the dense semantic point cloud of each frame based on a preset voxel grid network of the designated area to obtain a position index of each point cloud, the position index of each point cloud satisfying index_i = floor((p_i - i_ub) / i_size), i ∈ {x, y, z}, where i_ub is the lower bound of the preset valid point cloud range along coordinate axis i and i_size is the length of a unit voxel grid along coordinate axis i; to determine the number of point clouds within each voxel grid based on the position index of each point cloud and a preset position index of the voxel grid; and to obtain a point cloud occupation condition of each voxel grid based on the number of point clouds within each voxel grid; and
a determination unit, configured to select, based on the point cloud occupation condition of each voxel grid, the semantics of the point clouds in each voxel grid, so as to determine an occupied grid semantic label of each voxel grid.

9. A training apparatus for an obstacle occupation network model, characterized in that the apparatus comprises:
an acquisition unit, configured to acquire sample data and an occupied grid semantic label corresponding to the sample data, wherein the occupied grid semantic label is determined by processing the sample data according to the method of any one of claims 1 to 6;
a training unit, configured to iteratively train an obstacle occupation network model to be trained based on the sample data and the occupied grid semantic label corresponding to the sample data; and
an obtaining unit, configured to obtain a trained obstacle occupation network model in response to the training reaching a preset termination condition.

10. An electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 7.

11. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1 to 7.

12. A computer program product, comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 7.
CN202411216313.0A | 2024-09-02 | 2024-09-02 | Method, device, equipment and medium for determining label | Active | CN118762235B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202411216313.0A / CN118762235B (en) | 2024-09-02 | 2024-09-02 | Method, device, equipment and medium for determining label

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202411216313.0A / CN118762235B (en) | 2024-09-02 | 2024-09-02 | Method, device, equipment and medium for determining label

Publications (2)

Publication Number | Publication Date
CN118762235A | 2024-10-11
CN118762235B | 2025-01-24

Family

ID=92950020

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202411216313.0A | Active | CN118762235B (en) | 2024-09-02 | 2024-09-02 | Method, device, equipment and medium for determining label

Country Status (1)

Country | Link
CN (1) | CN118762235B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN116433962A (en) * | 2023-03-09 | 2023-07-14 | 北京鉴智科技有限公司 | Space semantic occupation label generation method and device, electronic equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US11164363B2 (en) * | 2019-07-08 | 2021-11-02 | Waymo Llc | Processing point clouds using dynamic voxelization
CN113284203B (en) * | 2021-05-04 | 2022-07-12 | 北京航空航天大学 | A point cloud compression and decompression method based on octree coding and voxel context
CN113808251B (en) * | 2021-08-09 | 2024-04-12 | 杭州易现先进科技有限公司 | Dense reconstruction method, system, device and medium based on semantic segmentation
CN114549542A (en) * | 2021-12-24 | 2022-05-27 | 阿里巴巴达摩院(杭州)科技有限公司 | Visual semantic segmentation method, device and equipment
WO2024065269A1 (en) * | 2022-09-28 | 2024-04-04 | Oppo广东移动通信有限公司 | Point cloud encoding and decoding method and apparatus, device, and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN116433962A (en) * | 2023-03-09 | 2023-07-14 | 北京鉴智科技有限公司 | Space semantic occupation label generation method and device, electronic equipment and storage medium

Also Published As

Publication number | Publication date
CN118762235A | 2024-10-11

Similar Documents

Publication | Publication Date | Title
CN113902897B | Training of target detection model, target detection method, device, equipment and medium
CN113657390A | Training method of text detection model, and text detection method, device and equipment
CN113139543A | Training method of target object detection model, target object detection method and device
CN112016638B | Method, device and equipment for identifying steel bar cluster and storage medium
CN115861632B | A three-dimensional target detection method based on graph convolution and visual laser fusion
CN112560684A | Lane line detection method, lane line detection device, electronic apparatus, storage medium, and vehicle
CN114764778A | Target detection method, target detection model training method and related equipment
CN114241448A | Method, device, electronic device and vehicle for obtaining obstacle course angle
CN113344121A | Method for training signboard classification model and signboard classification
CN112819753B | Building change detection method and device, intelligent terminal and storage medium
CN114332509B | Image processing method, model training method, electronic device and automatic driving vehicle
CN110349138B | Target object detection method and device based on example segmentation framework
CN117671480B | Automatic landslide identification method, system and computer equipment based on visual large model
CN114419428A | A target detection method, target detection device and computer readable storage medium
CN113569911A | Vehicle identification method, device, electronic device and storage medium
CN113592015A | Method and device for positioning and training feature matching network
CN113255501A | Method, apparatus, medium, and program product for generating form recognition model
CN113569600A | Object re-identification method, device, electronic device and storage medium
CN115661577A | Method, apparatus, and computer-readable storage medium for object detection
CN113705304B | Image processing method, device, storage medium and computer equipment
CN114111814A | High-precision map data processing method, device, electronic device and storage medium
CN119249872A | Point cloud-based robot environment simulation reconstruction method, device, equipment and medium
CN118762235B (en) | 2024-09-02 | 2025-01-24 | Method, device, equipment and medium for determining label
CN116597213B | Target detection method, training device, electronic equipment and storage medium
CN114913330B | Point cloud component segmentation method and device, electronic equipment and storage medium

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
