Disclosure of Invention
In view of the above, it is desirable to provide an obstacle detection method, apparatus, computer device, and storage medium that can improve the accuracy of obstacle detection.
A method of obstacle detection, the method comprising:
acquiring collected point cloud data;
segmenting the point cloud data using a preset number of segmentation scales of different sizes to obtain voxels at each segmentation scale;
extracting feature vectors from the voxels at each segmentation scale to obtain the shape features;
resizing each shape feature to obtain shape features to be concatenated that share the same size;
concatenating and combining the shape features to obtain a point cloud feature carrying multi-scale information;
projecting the point cloud feature onto a horizontal plane to form a two-dimensional tensor; and
inputting the two-dimensional tensor into a convolutional neural network model to predict obstacles and determine a prediction result.
In one embodiment, the preset number of segmentation scales of different sizes is determined as follows:
the segmentation scales are determined according to the types of obstacles and the size range corresponding to each type.
In one embodiment, the step of extracting feature vectors from the voxels at each segmentation scale to obtain the shape features includes:
inputting the voxels at each segmentation scale into a corresponding feature extraction network for feature extraction to obtain the shape features.
In one embodiment, the feature extraction network is a deep learning network structure.
In one embodiment, the step of resizing each shape feature to obtain shape features to be concatenated with the same size includes:
resizing each shape feature by trilinear interpolation to obtain the shape features to be concatenated with the same size.
In one embodiment, the step of projecting the point cloud feature onto a horizontal plane to form a two-dimensional tensor includes:
calling a reshape function to project the point cloud feature onto a horizontal plane, forming a two-dimensional tensor.
In one embodiment, the step of inputting the two-dimensional tensor into the convolutional neural network model to predict obstacles and determine the prediction result includes:
inputting the two-dimensional tensor into a backbone network of the convolutional neural network model for computation to obtain a computation result;
and inputting the computation result into a head network of the convolutional neural network model to predict a bounding box and the obstacle type, thereby determining the center-point coordinates of the obstacle, its length, width, and height, its orientation angle, and its type.
An obstacle detection device, the device comprising:
a data acquisition module, configured to acquire collected point cloud data;
a segmentation module, configured to segment the point cloud data using a preset number of segmentation scales of different sizes to obtain voxels at each segmentation scale;
a feature extraction module, configured to extract feature vectors from the voxels at each segmentation scale to obtain the shape features;
a shape adjustment module, configured to resize each shape feature to obtain shape features to be concatenated with the same size;
a concatenation module, configured to concatenate and combine the shape features to obtain a point cloud feature carrying multi-scale information;
a projection module, configured to project the point cloud feature onto a horizontal plane to form a two-dimensional tensor; and
a prediction module, configured to input the two-dimensional tensor into a convolutional neural network model to predict obstacles and determine a prediction result.
A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the above method when executing the computer program.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above method.
With the obstacle detection method, apparatus, computer device, and storage medium described above, collected point cloud data is acquired, and the point cloud data is segmented using a preset number of segmentation scales of different sizes to obtain voxels at each segmentation scale, which better preserves the shape information of the point cloud at different scales and yields better point cloud features. Feature vectors are extracted from the voxels at each segmentation scale to obtain the shape features; each shape feature is resized to obtain shape features to be concatenated with the same size; the shape features are concatenated and combined to obtain a point cloud feature carrying multi-scale information; the point cloud feature is projected onto a horizontal plane to form a two-dimensional tensor; and the two-dimensional tensor is input into a convolutional neural network model to predict obstacles and determine a prediction result, thereby improving the accuracy of obstacle detection.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In one embodiment, as shown in fig. 1, there is provided an obstacle detection method including the steps of:
Step S220, acquiring the collected point cloud data.
The point cloud data is a three-dimensional point cloud generated by a lidar. For example, the three-dimensional point cloud generated by an installed 128-beam lidar over its detection range typically contains about 120,000 points (laser reflection points), distributed within 50 meters to the front, back, left, and right, at heights generally ranging from -5 meters to 3 meters.
Step S240, segmenting the point cloud data using a preset number of segmentation scales of different sizes to obtain voxels at each segmentation scale.
The size of a segmentation scale can be expressed as (x_i, y_i, z_i), where i is the index of the scale, x is the length, y is the width, and z is the height. "Voxel" is an abbreviation of volume pixel; by analogy with a pixel, a voxel is a three-dimensional concept while a pixel is a two-dimensional one. The voxels at each segmentation scale are those obtained by dividing the point cloud data with that scale: if the point cloud data has length l, width w, and height h, the number of voxels produced by scale i along the three axes is l/x_i, w/y_i, and h/z_i.
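As an illustration, a minimal sketch of this voxelization step is given below, assuming points stored as an (N, 3) array and a cuboid detection region; the function name, the region bounds, and the example scale are illustrative assumptions, not values from the patent.

```python
import numpy as np

def voxelize(points, scale, origin, extent):
    """points: (N, 3) array; scale: voxel size (x_i, y_i, z_i) in meters;
    origin: minimum corner of the region; extent: region size (l, w, h)."""
    dims = np.ceil(np.asarray(extent) / np.asarray(scale)).astype(int)  # l/x_i, w/y_i, h/z_i
    grid = np.floor((points - origin) / scale).astype(int)              # voxel index per point
    inside = np.all((grid >= 0) & (grid < dims), axis=1)                # drop points outside the region
    return grid[inside], dims

# roughly the point cloud described above: ~120,000 points, +/-50 m, heights -5 m to 3 m
points = np.random.uniform([-50, -50, -5], [50, 50, 3], size=(120_000, 3))
indices, dims = voxelize(points, scale=(0.5, 0.5, 0.5),
                         origin=np.array([-50.0, -50.0, -5.0]),
                         extent=(100.0, 100.0, 8.0))
print(dims)  # voxels per axis: [200 200 16]
```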
In one embodiment, the preset number of segmentation scales of different sizes is determined according to the types of obstacles and the size range corresponding to each type. For example, suppose the obstacles to be predicted include persons, cars, and buses. The length, width, and height of a person are typically (0.73, 0.67, 1.77) meters, averaging about 1 meter; those of a car are (4.63, 1.97, 1.74) meters, averaging 2.78 meters; and those of a bus are (10.5, 2.94, 3.47) meters, averaging 5.63 meters. The designer then decides, from experience, into how many parts the length, width, and height of an obstacle need to be cut. If, for instance, the designer considers that voxels obtained by cutting each direction into 10 parts of the average value are fine enough to express the shape of an obstacle, then three segmentation scales of 0.1 meter, 0.278 meter, and 0.563 meter can be selected from the averages of the three obstacle types; these three scales are the selected segmentation scales. The size spans of the three obstacles in this example are large, so three scales are used; if the sizes of the predicted obstacles do not differ as much, fewer scales may suffice. This selection is sketched below.
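A minimal sketch of the scale selection just described; the average sizes repeat the figures from the example above, and the choice of 10 parts per axis is the designer's assumption from that example.

```python
# mean of (length, width, height) per obstacle type, in meters, from the example
avg_sizes = {"person": 1.0, "car": 2.78, "bus": 5.63}
parts_per_axis = 10  # designer's choice: cut each direction into 10 parts

scales = {k: round(v / parts_per_axis, 3) for k, v in avg_sizes.items()}
print(scales)  # {'person': 0.1, 'car': 0.278, 'bus': 0.563}
```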
Step S260, extracting feature vectors from the voxels at each segmentation scale to obtain the shape features.
In one embodiment, the step of extracting feature vectors from the voxels at each segmentation scale to obtain the shape features includes: inputting the voxels at each segmentation scale into a corresponding feature extraction network for feature extraction to obtain the shape features.
A separate feature extraction network is trained for each segmentation scale to extract the feature vectors of the voxels at that scale. The points in the voxels at each scale are fed into the corresponding feature extraction network and shape features are extracted, yielding feature vectors of different sizes, expressed as (C_i, l/x_i, w/y_i, h/z_i), where C_i is the dimension of the feature vector. The feature extraction network is a deep learning network structure: each point in a voxel passes through a fully connected network, and max pooling is then applied to obtain the shape feature of the whole voxel.
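A minimal sketch of such a per-voxel extractor, assuming PyTorch and a fixed number of points per voxel; the hidden layer width and feature dimension are assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class VoxelFeatureNet(nn.Module):
    """Fully connected network applied per point, then max pooling over the
    points of each voxel, as described above."""
    def __init__(self, in_dim=3, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 32), nn.ReLU(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )

    def forward(self, voxel_points):
        # voxel_points: (V, P, 3) -- V voxels, P points per voxel
        per_point = self.mlp(voxel_points)   # (V, P, feat_dim)
        return per_point.max(dim=1).values   # pool over points: (V, feat_dim)

net = VoxelFeatureNet()
feats = net(torch.randn(100, 32, 3))  # 100 voxels of 32 points -> (100, 64)
```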
Step S280, resizing each shape feature to obtain shape features to be concatenated with the same size.
The shape features differ in at least one of length, width, height, or number, so before they can be concatenated they must be adjusted to the same length, width, height, and number. As shown in fig. 2, the point cloud data is segmented with 3 segmentation scales of different sizes; the cube-like object represents the whole point cloud data. Segmentation scale 1 is the largest and cuts the length, width, and height of the point cloud data into 2 parts each (the scale-1 result in fig. 2); scale 2 cuts them into 3 parts each (the scale-2 result in fig. 2); and scale 3 cuts them into 4 parts each (the scale-3 result in fig. 2). Each voxel is typically extracted into a feature vector of 64 numbers, so the feature shapes at the 3 scales in fig. 2 are (64, 2, 2, 2), (64, 3, 3, 3), and (64, 4, 4, 4), respectively. The features must therefore be adjusted, by trilinear interpolation, to the size (64, 4, 4, 4), giving shape features to be concatenated with the same size.
In one embodiment, the step of resizing each shape feature to obtain shape features to be concatenated with the same size includes:
resizing each shape feature by trilinear interpolation to obtain the shape features to be concatenated with the same size. A shape feature to be concatenated can be expressed as (C_i, l_0, w_0, h_0), where l_0 is its length, w_0 its width, and h_0 its height.
Trilinear interpolation is a method of linear interpolation on a tensor-product grid of three-dimensional discretely sampled data.
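A minimal sketch of this resizing step, using PyTorch's trilinear interpolation mode and the (64, 2, 2, 2), (64, 3, 3, 3), (64, 4, 4, 4) shapes from the fig. 2 example; the batch dimension is added only because interpolate expects 5-D input.

```python
import torch
import torch.nn.functional as F

f1 = torch.randn(1, 64, 2, 2, 2)  # scale-1 shape feature
f2 = torch.randn(1, 64, 3, 3, 3)  # scale-2 shape feature
f3 = torch.randn(1, 64, 4, 4, 4)  # scale-3 shape feature, the target size

target = f3.shape[2:]  # (l_0, w_0, h_0) = (4, 4, 4)
f1 = F.interpolate(f1, size=target, mode="trilinear", align_corners=False)
f2 = F.interpolate(f2, size=target, mode="trilinear", align_corners=False)
# all three features are now (1, 64, 4, 4, 4) and ready for concatenation
```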
Step S300, concatenating and combining the shape features to obtain the point cloud feature with multi-scale information.
All shape features to be concatenated are combined into a point cloud feature f carrying multi-scale information, whose shape is expressed as (C_1 + C_2 + ... + C_n, l_0, w_0, h_0). Writing C_1 + C_2 + ... + C_n as C, the shape of the point cloud feature f obtained in this step is (C, l_0, w_0, h_0).
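Continuing the sketch above with three resized 64-channel features, a minimal concatenation along the channel axis (so C = 64 + 64 + 64 = 192):

```python
import torch

f1 = torch.randn(1, 64, 4, 4, 4)  # resized scale-1 feature
f2 = torch.randn(1, 64, 4, 4, 4)  # resized scale-2 feature
f3 = torch.randn(1, 64, 4, 4, 4)  # resized scale-3 feature

f = torch.cat([f1, f2, f3], dim=1)  # multi-scale point cloud feature: (1, 192, 4, 4, 4)
```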
Step S320, projecting the point cloud feature onto a horizontal plane to form a two-dimensional tensor.
In one embodiment, the step of projecting the point cloud feature onto a horizontal plane to form a two-dimensional tensor includes: calling a reshape function to project the point cloud feature onto the horizontal plane, forming a two-dimensional tensor.
The point cloud feature f has shape (C, l_0, w_0, h_0) and needs to be projected onto a horizontal plane; the projection can be a reshape operation, which turns f into a feature f' of shape (C × l_0, w_0, h_0), i.e., a two-dimensional tensor. The reshape function transforms a given matrix into a matrix of specified dimensions.
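A minimal sketch of the reshape-based projection, following the (C × l_0, w_0, h_0) shape convention stated above; the concrete numbers are carried over from the earlier sketches.

```python
import torch

f = torch.randn(192, 4, 4, 4)     # point cloud feature of shape (C, l_0, w_0, h_0)
C, l0, w0, h0 = f.shape
f_2d = f.reshape(C * l0, w0, h0)  # f' of shape (C*l_0, w_0, h_0) = (768, 4, 4)
print(f_2d.shape)                 # torch.Size([768, 4, 4])
```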
Step S340, inputting the two-dimensional tensor into the convolutional neural network model to predict obstacles and determine the prediction result.
In one embodiment, the step of inputting the two-dimensional tensor into the convolutional neural network model to predict obstacles and determine the prediction result includes: inputting the two-dimensional tensor into a backbone network of the convolutional neural network model for computation to obtain a computation result; and inputting the computation result into a head network of the convolutional neural network model to predict a bounding box and the obstacle type, determining the center-point coordinates of the obstacle, its length, width, and height, its orientation angle, and its type.
The convolutional neural network model is divided into a backbone network and a head network. The backbone network has 16 convolutional layers arranged in 3 stages of 4, 6, and 6 layers; the first convolutional layer of each stage performs downsampling, and the channel counts of the 3 stages are 64, 128, and 256. The output computed by the backbone network (the computation result) serves as the input of the head network, which is divided into two branches: the first branch predicts the bounding box, specifically the center-point coordinates of the obstacle, its length, width, and height, and its orientation angle; the second branch predicts the type of the obstacle. The outputs of the two branches form the prediction result, which includes the center-point coordinates, the length, width, and height, the orientation angle, and the type of the obstacle, where the types are, for example: pedestrians, bicycles, cars, buses, and the like.
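A minimal sketch of such a model under the stage and channel counts given above; the kernel sizes, activation functions, and the 7-value box encoding (center x, y, z; length, width, height; orientation angle) are assumptions, not details specified by the patent.

```python
import torch
import torch.nn as nn

def stage(in_ch, out_ch, n_layers):
    # first layer of the stage downsamples; the remaining layers keep resolution
    layers = [nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1), nn.ReLU()]
    for _ in range(n_layers - 1):
        layers += [nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU()]
    return nn.Sequential(*layers)

class ObstacleDetector(nn.Module):
    def __init__(self, in_ch, n_classes):
        super().__init__()
        # 3 stages of 4, 6, 6 conv layers with 64, 128, 256 channels
        self.backbone = nn.Sequential(
            stage(in_ch, 64, 4), stage(64, 128, 6), stage(128, 256, 6),
        )
        self.box_head = nn.Conv2d(256, 7, 1)          # center, size, orientation angle
        self.cls_head = nn.Conv2d(256, n_classes, 1)  # obstacle type scores

    def forward(self, x):
        result = self.backbone(x)  # the computation result fed to the head
        return self.box_head(result), self.cls_head(result)

model = ObstacleDetector(in_ch=768, n_classes=4)  # e.g. pedestrian/bicycle/car/bus
boxes, classes = model(torch.randn(1, 768, 64, 64))
```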
With the obstacle detection method described above, the collected point cloud data is acquired, and the point cloud data is segmented using a preset number of segmentation scales of different sizes to obtain voxels at each segmentation scale, which better preserves the shape information of the point cloud at different scales and provides better point cloud features. Feature vectors are extracted from the voxels at each segmentation scale to obtain the shape features; each shape feature is resized to obtain shape features to be concatenated with the same size; the shape features are concatenated and combined to obtain a point cloud feature carrying multi-scale information; the point cloud feature is projected onto a horizontal plane to form a two-dimensional tensor; and the two-dimensional tensor is input into the convolutional neural network model to predict obstacles and determine a prediction result, thereby improving the accuracy of obstacle detection.
It should be understood that although the steps in the flowchart of fig. 1 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 1 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments; these sub-steps or stages are likewise not necessarily performed in sequence, and may be performed in turn or alternately with at least some other steps or with sub-steps or stages of other steps.
In one embodiment, as shown in fig. 3, there is provided an obstacle detection apparatus including: a data acquisition module 310, a segmentation module 320, a feature extraction module 330, a shape adjustment module 340, a concatenation module 350, a projection module 360, and a prediction module 370.
The data acquisition module 310 is configured to acquire the collected point cloud data.
The segmentation module 320 is configured to segment the point cloud data using a preset number of segmentation scales of different sizes to obtain voxels at each segmentation scale.
The feature extraction module 330 is configured to extract feature vectors from the voxels at each segmentation scale to obtain the shape features.
The shape adjustment module 340 is configured to resize each shape feature to obtain shape features to be concatenated with the same size.
The concatenation module 350 is configured to concatenate and combine the shape features to obtain a point cloud feature carrying multi-scale information.
The projection module 360 is configured to project the point cloud feature onto a horizontal plane to form a two-dimensional tensor.
The prediction module 370 is configured to input the two-dimensional tensor into the convolutional neural network model to predict obstacles and determine a prediction result.
In one embodiment, the preset number of segmentation scales of different sizes is determined as follows:
the segmentation scales are determined according to the types of obstacles and the size range corresponding to each type.
In one embodiment, the feature extraction module 330 is further configured to input the voxels at each segmentation scale into a corresponding feature extraction network for feature extraction to obtain the shape features.
In one embodiment, the feature extraction network is a deep learning network structure.
In one embodiment, the shape adjustment module 340 is further configured to resize each shape feature by trilinear interpolation to obtain the shape features to be concatenated with the same size.
In one embodiment, the projection module 360 is further configured to call a reshape function to project the point cloud feature onto the horizontal plane, forming a two-dimensional tensor.
In one embodiment, the prediction module 370 is further configured to input the two-dimensional tensor into the backbone network of the convolutional neural network model for computation to obtain a computation result, and to input the computation result into the head network of the convolutional neural network model to predict a bounding box and the obstacle type, determining the center-point coordinates of the obstacle, its length, width, and height, its orientation angle, and its type.
For specific limitations of the obstacle detection apparatus, reference may be made to the limitations of the obstacle detection method above, which are not repeated here. Each module in the obstacle detection apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded, in hardware form, in or independent of a processor of the computer device, or stored, in software form, in a memory of the computer device, so that the processor can call and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, including a memory storing a computer program and a processor implementing the steps of the obstacle detection method described above when the processor executes the computer program.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the obstacle detection method described above.
Those skilled in the art will appreciate that all or part of the procedures of the above-described methods may be implemented by a computer program. The computer program may be stored on a non-transitory computer readable storage medium and, when executed, may include the procedures of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. The volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of this specification.
The above embodiments illustrate only several implementations of the present application and are described in relative detail, but they are not to be construed as limiting the scope of the application. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the application. Accordingly, the scope of protection of the present application shall be determined by the appended claims.