Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of an object identification method based on width learning according to an embodiment of the present invention. As shown in fig. 1, the method includes:
S1, acquiring the three-dimensional point cloud data in the current area.
The execution subject of the embodiment of the present invention is an electronic device. The electronic device may be a vehicle-mounted terminal, and the vehicle-mounted terminal may comprise a LiDAR sensor for acquiring three-dimensional point cloud data in the current area, where an object to be identified may exist in the current area.
The object to be identified may be a tree or a building, etc.
In addition, the in-vehicle terminal may be disposed on top of an Unmanned Ground Vehicle (UGV).
S2, processing the three-dimensional point cloud data through a preset unified space encoder to obtain current feature nodes in a unified feature space.
S3, performing object recognition on the current feature nodes through a preset width learning neural network.
It is understood that the embodiment of the present invention may adopt a preset width learning (Broad Learning System, BLS) neural network to perform object identification on the object to be identified in the current area. For example, if there is an object to be identified and its object type is a tree, an object identification result may be obtained, such as "the object to be identified is a tree".
The preset width learning neural network is a neural network structure that does not rely on a deep structure.
Compared with a traditional deep learning neural network, the preset width learning neural network used in the method has a simple structure: it has fewer network layers, so its calculation efficiency is higher and it offers excellent real-time processing characteristics. Meanwhile, the number of parameters involved is much smaller than that of a traditional deep learning neural network, so the network structure is lightweight and can meet the real-time requirements that the unmanned driving field places on algorithms, thereby further improving the calculation efficiency.
After all, even if a trained deep learning neural network is used for object identification on three-dimensional point cloud data, its real-time performance remains poor because of the large number of parameters involved.
Moreover, on the basis of using the width learning neural network, the embodiment of the present invention additionally introduces the network structure of a preset unified space encoder, through which the three-dimensional point cloud data can be processed in advance, so that the original three-dimensional point cloud data is converted into a feature vector in a unified feature space, namely the current feature node. When the width learning neural network is actually used, its input thus changes from the original three-dimensional point cloud data to a feature vector in the unified feature space, a data type that is simpler to process, which greatly improves the calculation efficiency.
The object identification method based on width learning provided by the embodiment of the present invention first collects three-dimensional point cloud data in the current area; then processes the three-dimensional point cloud data through a preset unified space encoder to obtain current feature nodes in a unified feature space; and finally performs object recognition on the current feature nodes through a preset width learning neural network. When identifying objects, the preset width learning neural network adopted by the embodiment of the present invention differs from a traditional deep learning neural network: it has fewer network layers and fewer parameters in the neural network structure, so the overall calculation efficiency is higher. Meanwhile, the input of the preset width learning neural network changes from the original three-dimensional point cloud data to feature vectors in a unified feature space, a data type that is simpler to process, which further improves the calculation efficiency.
Fig. 2 is a flowchart of an object recognition method based on width learning according to another embodiment of the present invention; this embodiment is based on the embodiment shown in fig. 1.
In this embodiment, the S2 specifically includes:
S201, processing the three-dimensional point cloud data through a current coding matrix in the preset unified space encoder to obtain a current uncertain feature vector in an uncertain feature space.
For convenience of distinction, the embodiment of the present invention divides the whole process into a training stage and a using stage (in which object recognition is actually performed). The training stage can further be divided into two parts: a first training stage corresponding to the preset unified space encoder and a second training stage corresponding to the preset width learning neural network.
Specifically, in the using stage, the original three-dimensional point cloud data is mapped to an uncertain feature space (Uncertain-feature space) by using the predetermined current coding matrix, and the feature vector mapped to the uncertain feature space is recorded as the current uncertain feature vector.
For example, the three-dimensional coordinate information in the three-dimensional point cloud data may be denoted as (x, y, z), and the three-dimensional coordinate information is processed through the current encoding matrix in a preset uncertain space mapping formula.
The preset uncertain space mapping formula is as follows:

r_{i,j,k} = w_{k,1} x_{i,j} + w_{k,2} y_{i,j} + w_{k,3} z_{i,j} + b,

where r_{i,j,k} is the uncertain feature vector, w_{k,1} through w_{k,3} are elements of the current coding matrix W, i and j together denote the j-th point of the i-th object, k denotes the vector index of the uncertain feature vector, and b denotes the offset.
S202, performing pooling processing on the current uncertain feature vectors through an average pooling algorithm in the preset unified space encoder to obtain the current unified feature vector in the unified feature space.
Then, the feature vectors in the uncertain feature space can be pooled by using an average pooling algorithm so as to be encoded into feature vectors in a unified feature space (Unified-feature space), recorded as the unified feature vector V_i.
The average pooling algorithm is as follows:

v_{i,k} = (1/n_i) Σ_{j=1}^{n_i} r_{i,j,k},

where v_{i,k} is the average pooling result, r_{i,j,k} is the uncertain feature vector, n_i denotes the number of points contained in the three-dimensional point cloud sample, i denotes the i-th object, i and j together denote the j-th point of the i-th object, and k denotes the index of the uncertain feature vector.
The unified feature vector V_i consists of the average pooling results and can be expressed as

V_i = [v_{i,1}, v_{i,2}, …, v_{i,k}, …, v_{i,d}]^T.
It can be seen that the feature vectors in the uncertain feature space can be mapped to the unified feature space through the average pooling operation.
S203, transposing the current unified feature vector to obtain a current feature node.
Transposing the current unified feature vector yields a feature node (feature node), which can be recorded as the current feature node:

X_i = V_i^T,

where X_i denotes the current feature node of the i-th object and V_i denotes the unified feature vector of the i-th object.
It should be noted that the current feature node mentioned here has the same data type as the preset feature node appearing later; the two names are used only to distinguish them.
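To make the data flow of S201 to S203 concrete, below is a minimal sketch of the forward pass of the preset unified space encoder. It assumes that the current coding matrix W (one row of w_{k,1}..w_{k,3} per dimension of the uncertain feature space) and the offset b have already been determined in the training stage; the function name, array shapes and the NumPy-based implementation are illustrative rather than prescribed by the embodiment.

```python
import numpy as np

def uase_forward(points: np.ndarray, W: np.ndarray, b: float) -> np.ndarray:
    """Map one object's point cloud to its current feature node.

    points : (n_i, 3) array of the (x, y, z) coordinates of object i.
    W      : (d, 3) current coding matrix; row k holds w_{k,1}..w_{k,3}.
    b      : scalar offset.
    """
    # S201: map every point into the uncertain feature space,
    # r_{i,j,k} = w_{k,1} x_{i,j} + w_{k,2} y_{i,j} + w_{k,3} z_{i,j} + b
    r = points @ W.T + b          # shape (n_i, d)
    # S202: average pooling over all n_i points,
    # v_{i,k} = (1/n_i) * sum_j r_{i,j,k}
    v = r.mean(axis=0)            # unified feature vector V_i, shape (d,)
    # S203: transpose to obtain the current feature node X_i = V_i^T
    return v.reshape(1, -1)       # one row vector per object
```

In this way, each object's point cloud, whatever its number of points n_i, collapses into a single d-dimensional feature node, which is why the downstream width learning neural network receives a much simpler data type than raw point clouds.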
The object identification method based on width learning provided by this embodiment of the present invention applies the current coding matrix and specifies an implementation of the current feature node; with this implementation, the original three-dimensional point cloud data can be successfully converted into a data type that is easier to process and better suited to the preset width learning neural network.
Fig. 3 is a flowchart of an object recognition method based on width learning according to yet another embodiment of the present invention, which is based on the embodiment shown in fig. 1.
In this embodiment, the S3 specifically includes:
S301, performing object identification on the current feature node through a preset weight matrix in the preset width learning neural network.
In the using stage, the preset weight matrix can be recorded as W^*. The preset weight matrix in the preset width learning neural network is determined in the training stage, so that when the preset width learning neural network is applied to object recognition, it can be used directly in the using stage.
The object identification method based on width learning provided by the embodiment of the invention uses the preset weight matrix to carry out object identification operation.
On the basis of the foregoing embodiment, preferably, before S1, the method specifically includes:
and S11, acquiring preset characteristic nodes, preset enhanced nodes and a weight matrix to be updated in the width learning neural network to be trained.
And S12, performing output processing according to the preset feature node, the preset enhancement node and the weight matrix to be updated to obtain an output matrix.
Specifically, the second training stage corresponding to the preset width learning neural network is involved here; in order to determine the weight matrix actually used in the using stage, that is, the preset weight matrix, the second training stage trains and optimizes the weight matrix.
The untrained width learning neural network can be recorded as a to-be-trained width learning neural network, and the trained width learning neural network can be recorded as a preset width learning neural network.
For example, in the training optimization process, a default value of the weight matrix may be initialized first, and this default value is recorded as the weight matrix to be updated. The preset feature nodes, the preset enhancement nodes (enhancement nodes) and the weight matrix to be updated can then be processed by a preset output processing algorithm to obtain the output matrix.
The preset output processing algorithm is as follows:

γ = [Z_n | H_m] W^*,

where γ denotes the output matrix, Z_n denotes the preset feature nodes, H_m denotes the preset enhancement nodes, W^* denotes the weight matrix, and n and m both denote sequence numbers.
S13, updating the weight matrix according to the preset feature nodes, the preset enhancement nodes and the output matrix, so as to update the weight matrix to be updated into the preset weight matrix.
Then, the output matrix obtained based on the weight matrix to be updated can be used for updating the weight matrix.
For example, in one specific implementation, the preset feature node Z_n and the preset enhancement node H_m can first be used to determine a block matrix F, as follows:

F = [Z_n | H_m].
Then, a new weight matrix, i.e., the preset weight matrix, is determined according to the block matrix and the output matrix:

W^* = F^+ γ = (λI + F F^T)^{-1} F^T γ,

where W^* denotes the weight matrix, F^+ denotes the generalized inverse matrix of F, F denotes the block matrix, γ denotes the output matrix, the variable λ is a regularization coefficient, and I denotes the identity matrix.
In addition, F^+ has the general expression

F^+ = lim_{λ→0} (λI + F F^T)^{-1} F^T,

that is, it denotes the solution obtained for the matrix F^+ when the variable λ approaches 0.
In addition, the updating of the weight matrix according to the preset feature nodes, the preset enhancement nodes and the output matrix, so as to update the weight matrix to be updated into a preset weight matrix, specifically includes:
updating the weight matrix according to the preset feature nodes, the preset enhancement nodes and the output matrix to obtain a target weight matrix;
and if the object identification accuracy corresponding to the target weight matrix is not within the preset accuracy range, taking the target weight matrix as a new weight matrix to be updated and executing again the step of performing output processing according to the preset feature nodes, the preset enhancement nodes and the weight matrix to be updated to obtain an output matrix, until the object identification accuracy corresponding to the target weight matrix is within the preset accuracy range, at which point the target weight matrix is updated into the preset weight matrix.
For example, the weight matrix after the first update may be referred to as the target weight matrix, abbreviated here as matrix L1, and an automatic object identification test may be performed with matrix L1 to automatically generate an object identification accuracy; if this accuracy is not sufficiently high, the update continues. In the continued update operation, matrix L1 is used to regenerate the output matrix, and the weight matrix is updated through this output matrix to obtain a new target weight matrix, which can be denoted as matrix L2. If the object identification accuracy corresponding to matrix L2 is sufficiently high, that is, within the preset accuracy range, matrix L2 may be selected as the weight matrix used in subsequent object identification, that is, the preset weight matrix.
Of course, if the object recognition accuracy corresponding to the matrix L1 is within the preset accuracy range, the matrix L1 may be selected as the weight matrix used in the subsequent object recognition.
Therefore, a weight matrix with a high object identification accuracy can be obtained by updating the weight matrix in such a loop. Meanwhile, because the weight matrix in the neural network structure is updated, the identification accuracy achieved when the neural network structure is used for identifying objects can be improved.
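The accuracy-driven loop described above might be organized as follows. This sketch assumes an external evaluate_accuracy helper and an accuracy threshold, both illustrative; it reuses solve_weight_matrix from the previous sketch, and it follows the text in regenerating the output matrix from the current target weight matrix on each round (in the common broad-learning formulation, γ would simply remain the label matrix).

```python
import numpy as np

def train_weight_matrix(Z, H, labels, evaluate_accuracy,
                        acc_threshold=0.95, max_rounds=20):
    """Iterate S12 (output processing) and S13 (weight update) until the
    object identification accuracy lies within the preset accuracy range."""
    F = np.hstack([Z, H])                 # block matrix [Z_n | H_m]
    W = None
    for _ in range(max_rounds):
        # S12: output processing; the label matrix serves as the initial output matrix
        gamma = labels if W is None else F @ W
        # S13: update to obtain the target weight matrix (matrix L1, L2, ...)
        W = solve_weight_matrix(Z, H, gamma)
        if evaluate_accuracy(W) >= acc_threshold:
            break                         # within the preset accuracy range
    return W                              # taken as the preset weight matrix
```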
Fig. 4 is a flowchart of an object recognition method based on width learning according to another embodiment of the present invention; this embodiment is based on the embodiment shown in fig. 3.
In this embodiment, before the acquiring of the preset feature nodes, the preset enhancement nodes and the weight matrix to be updated in the width learning neural network to be trained, the object identification method based on width learning further includes:
acquiring a preset feature node;
and constructing an enhancement layer by using the preset feature node through a preset activation function so as to construct a preset enhancement node.
It is understood that the preset enhancement node is used in the width learning neural network, and what is described here is the manner of generating the preset enhancement node.
Specifically, a preset feature node is obtained first, and the preset feature node is input into a preset activation function to obtain a preset enhancement node. The preset activation function is as follows:

H_m = ζ(Z_n W'_m + β_m),

where H_m denotes the preset enhancement node of the enhancement layer, ζ denotes the preset activation function, Z_n denotes the preset feature node, W'_m denotes a predefined matrix, β_m denotes an offset vector, embodied as a set of randomly generated fixed values, and m denotes a sequence number.
In addition, see fig. 5, a schematic diagram of the architecture of the preset unified space encoder and the preset width learning neural network, where

Z_n = [Z_1 | Z_2 | … | Z_l | … | Z_u], Z_l = [z_{l,1}, z_{l,2}, …, z_{l,p}, …, z_{l,d}],
H_m = [h_{m,1}, …, h_{m,q}, …, h_{m,s}], β_m = [b_{m,1}, b_{m,2}, …, b_{m,q}, …, b_{m,s}],

and n, l, u, p, d, m, q and s each denote a sequence number. In fig. 5, UASE denotes the preset unified space encoder, average pooling denotes the average pooling operation, and transpose denotes the transposition operation.
The preset width learning neural network can be divided into three layers, namely an input layer, an enhancement layer and an output layer, so its number of layers is smaller than that of a traditional deep learning neural network. The preset feature nodes form the input layer of the preset width learning neural network, the preset enhancement nodes are located in the enhancement layer, and the object recognition result is obtained at the output layer. The comparison operation against the labels takes place at the output layer.
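A sketch of this three-layer forward pass is given below, assuming a sigmoid for the activation function ζ and a randomly generated, then fixed, pair (W'_m, β_m); the argmax-based label comparison at the output layer is an illustrative choice, not a detail fixed by the embodiment.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bls_predict(X, W_enh, beta, W_star):
    """Three-layer forward pass of the preset width learning neural network.

    X      : (N, n) feature nodes Z_n from the UASE (input layer).
    W_enh  : (n, m) randomly generated, then fixed, matrix W'_m.
    beta   : (m,)   randomly generated, then fixed, offset vector beta_m.
    W_star : (n+m, c) preset weight matrix W*.
    """
    H = sigmoid(X @ W_enh + beta)         # enhancement layer: H_m = zeta(Z_n W'_m + beta_m)
    gamma = np.hstack([X, H]) @ W_star    # output layer: gamma = [Z_n | H_m] W*
    return gamma.argmax(axis=1)           # label comparison: most likely object type
```

Since W'_m and β_m stay fixed after random generation, only W^* is learned, which is what keeps the training of the width learning neural network lightweight.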
Therefore, the preset enhanced node can be generated through the embodiment of the invention.
On the basis of the foregoing embodiment, preferably, before S11, the method for object recognition based on width learning further includes:
and S111, obtaining a three-dimensional point cloud sample.
It is understood that the preset feature nodes are used in the width learning neural network, and what is described here is the manner of generating the preset feature nodes.
Specifically, in the first training stage corresponding to the preset unified space encoder, three-dimensional point cloud data in an outdoor environment can be collected by any LiDAR sensor to serve as training samples. Of course, the training samples may be stored on a local hard disk.
Then, a large number of three-dimensional point cloud samples can be visualized; see fig. 6, which is a schematic view of three-dimensional point cloud data visualization in an outdoor environment. Corresponding labels can be marked on the different three-dimensional point cloud samples.
For example, referring to fig. 7, fig. 7 is a schematic diagram of the spatial distribution of the three-dimensional point cloud data corresponding to target objects; from left to right are shown the spatial distributions corresponding to a car, a pedestrian, a bush, a trunk, a tree and a building. That is, the target objects include six types: cars, pedestrians, bushes, trunks, trees and buildings.
Therefore, three-dimensional point cloud samples corresponding to different target objects can be stored in separate files, and a sample label representing the object type is added to each file, for example, the sample label can be an automobile.
When the method is actually used, a plurality of existing files can be directly obtained to obtain the three-dimensional point cloud sample and the sample label corresponding to the three-dimensional point cloud sample.
S112, selecting a current coding matrix from the preset coding matrices according to the coordinate information in the three-dimensional point cloud sample.
A plurality of coding matrices and decoding matrices may be generated first and recorded as preset coding matrices and preset decoding matrices.
Then, the coordinate information can be used as a reference to screen the preset coding matrices so as to select one coding matrix, which is recorded as the current coding matrix.
S113, processing the three-dimensional point cloud sample through the current coding matrix to obtain a target uncertain feature vector under an uncertain feature space.
Then, the original three-dimensional point cloud sample can be mapped to the uncertain feature space (Uncertain-feature space) by using the screened current coding matrix, and the feature vector mapped to the uncertain feature space is recorded as the target uncertain feature vector.
For example, the three-dimensional coordinate information in the three-dimensional point cloud sample may be denoted as (x, y, z), and the three-dimensional coordinate information is processed through the current coding matrix in a preset uncertain space mapping formula.
The preset uncertain space mapping formula is as follows:

r_{i,j,k} = w_{k,1} x_{i,j} + w_{k,2} y_{i,j} + w_{k,3} z_{i,j} + b,

where r_{i,j,k} is the uncertain feature vector, w_{k,1} through w_{k,3} are the elements in the k-th row of the coding matrix, i and j together denote the j-th point of the i-th object, k denotes the vector index of the uncertain feature vector, and b denotes the offset.
S114, performing pooling processing on the target uncertain feature vectors through an average pooling algorithm to obtain target unified feature vectors in the unified feature space.
Then, the feature vectors in the uncertain feature space can be pooled by using an average pooling algorithm so as to be encoded into feature vectors in a unified feature space (Unified-feature space), recorded as the unified feature vector V_i.
The average pooling algorithm is as follows:

v_{i,k} = (1/n_i) Σ_{j=1}^{n_i} r_{i,j,k},

where v_{i,k} is the average pooling result, r_{i,j,k} is the uncertain feature vector, n_i denotes the number of points contained in the three-dimensional point cloud sample, i denotes the i-th object, i and j together denote the j-th point of the i-th object, and k denotes the index of the uncertain feature vector.
The unified feature vector V_i consists of the average pooling results and can be expressed as

V_i = [v_{i,1}, v_{i,2}, …, v_{i,k}, …, v_{i,d}]^T.
It can be seen that the feature vectors in the uncertain feature space can be mapped to the unified feature space through the average pooling operation.
S115, transposing the target unified feature vector to obtain a preset feature node.
The target unified feature vector is transposed to obtain the feature node shown below, which can be recorded as the preset feature node:

X_i = V_i^T,

where X_i denotes the preset feature node of the i-th object and V_i denotes the unified feature vector of the i-th object.
It should be noted that names such as the target unified feature vector and the current unified feature vector are used in the embodiments of the present invention only to distinguish the data contents in different situations; the data types are still the same.
Therefore, the preset feature node can be generated through the embodiment of the invention.
On the basis of the foregoing embodiment, preferably, the selecting of a current coding matrix from preset coding matrices according to the coordinate information in the three-dimensional point cloud sample specifically includes:
training a coding matrix and a decoding matrix through a preset batch gradient descent algorithm to obtain a preset coding matrix and a preset decoding matrix;
mapping the coordinate information in the three-dimensional point cloud sample through the preset coding matrix to obtain hidden layer information;
mapping the hidden layer information through the preset decoding matrix to obtain output layer information;
and if the similarity between the coordinate information and the output layer information is within a preset similarity range, taking the preset coding matrix as the current coding matrix.
It can be understood that the embodiment of the present invention optimizes the training of the coding matrix, and the accuracy of object identification can be further improved by optimizing the coding accuracy of the coding matrix.
Specifically, the coding matrix and the decoding matrix are trained through a preset Batch Gradient Descent (BGD) algorithm; the trained coding matrix is recorded as the preset coding matrix, and the trained decoding matrix is recorded as the preset decoding matrix.
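For illustration, one batch-gradient-descent update of such an encoder-decoder pair could look as follows. The mean-squared reconstruction loss, the learning rate and the parameter shapes are assumptions consistent with the sigmoid f and g functions named below, not details fixed by the embodiment.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bgd_autoencoder_step(W, W_dec, b1, b2, batch, lr=0.01):
    """One batch-gradient-descent update of the coding/decoding matrices.

    W     : (d, 3) coding matrix; W_dec : (3, d) decoding matrix.
    b1, b2: bias terms of the hidden and output layers.
    batch : (n, 3) coordinate information; BGD uses the whole batch at once.
    """
    h = sigmoid(batch @ W.T + b1)            # hidden layer information, f(.)
    out = sigmoid(h @ W_dec.T + b2)          # output layer information, g(.)
    err = out - batch                        # reconstruction error
    # gradients of the mean-squared reconstruction loss
    d_out = err * out * (1.0 - out)          # sigmoid derivative at the output
    d_hid = (d_out @ W_dec) * h * (1.0 - h)  # back-propagated to the hidden layer
    n = len(batch)
    W_dec -= lr * d_out.T @ h / n
    W     -= lr * d_hid.T @ batch / n
    b2    -= lr * d_out.mean(axis=0)
    b1    -= lr * d_hid.mean(axis=0)
    return W, W_dec, b1, b2
```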
Then, the preset coding matrix and the preset decoding matrix may be tested. For example, the coordinate information in the three-dimensional point cloud sample, together with a bias node, may be mapped to the hidden layer through the preset coding matrix to obtain hidden layer information corresponding to the coordinate information.
The hidden layer information, together with another bias node, may then be mapped to the output layer through the preset decoding matrix to obtain output layer information.
The testing process using the preset coding matrix and the preset decoding matrix can be expressed by the following preset testing formula:

â_{i,j} = g(W̃ f(W a_{i,j} + b_1) + b_2),

where â_{i,j} is the output layer information, i.e., the coding information, W̃ is the preset decoding matrix, W is the preset coding matrix, a_{i,j} is the coordinate information of the input three-dimensional point cloud sample, b_1 and b_2 are the bias terms corresponding to the two bias nodes, i and j together denote the j-th point of the i-th object, and both the f function and the g function are Sigmoid functions.
Then, if the coordinate information before encoding and the output layer information obtained after encoding and decoding have a sufficiently high similarity, the preset coding matrix can be retained as the coding matrix to be used subsequently and recorded as the current coding matrix.
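Putting the screening test of S112 together, the following sketch runs candidate matrix pairs through the encode-decode round trip and keeps the first one whose reconstruction is similar enough to the input. The mean-squared-error similarity measure and its threshold are assumptions; the embodiment only requires the similarity to fall within a preset similarity range.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def select_current_coding_matrix(candidates, points, sim_threshold=1e-2):
    """Screen candidate (W, W_dec, b1, b2) tuples trained by BGD.

    candidates : list of (preset coding matrix, preset decoding matrix,
                 hidden-layer bias, output-layer bias) tuples.
    points     : (n, 3) coordinate information a_{i,j} of a point cloud sample.
    Returns the first coding matrix whose encode-decode round trip is
    similar enough to the input, i.e., the current coding matrix.
    """
    for W, W_dec, b1, b2 in candidates:
        hidden = sigmoid(points @ W.T + b1)       # hidden layer information, f(.)
        output = sigmoid(hidden @ W_dec.T + b2)   # output layer information, g(.)
        if np.mean((points - output) ** 2) < sim_threshold:
            return W                              # within the preset similarity range
    return None
```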
In addition, see also Table 1 below.

Table 1: test run table

In Table 1, Name denotes the name, UASE Training sample denotes the number of training samples of the preset unified space encoder, BLS Training sample denotes the number of training samples of the preset width learning neural network, Training Accuracy denotes the training accuracy, BLS Testing sample denotes the number of testing samples of the preset width learning neural network, and Testing Accuracy denotes the testing accuracy; Car denotes car, Pedestrian denotes pedestrian, Bush denotes shrub, Trunk denotes trunk, Tree denotes tree, Building denotes building, and Total/Aver denotes total/average.
In the test flow of Table 1, the preset width learning neural network uses 10 feature nodes as inputs, 12 preset unified space encoders (UASE) and 9000 enhancement nodes.
Therefore, the embodiment of the invention can optimize the coding matrix so as to find the coding matrix with better coding performance to be used in object identification.
Fig. 8 is a schematic structural diagram of an object recognition system based on width learning according to an embodiment of the present invention. As shown in fig. 8, the system includes: a data acquisition module 301, a spatial coding module 302 and an object identification module 303;
the data acquisition module 301 is used for acquiring three-dimensional point cloud data in a current area;
the spatial coding module 302 is configured to process the three-dimensional point cloud data through a preset unified space encoder to obtain a current feature node in a unified feature space;
and the object identification module 303 is configured to perform object identification on the current feature node through a preset width learning neural network.
The object identification system based on width learning provided by the embodiment of the present invention first collects three-dimensional point cloud data in the current area; then processes the three-dimensional point cloud data through the preset unified space encoder to obtain current feature nodes in the unified feature space; and finally performs object recognition on the current feature nodes through the preset width learning neural network. When identifying objects, the preset width learning neural network adopted by the embodiment of the present invention differs from a traditional deep learning neural network: it has fewer network layers and fewer parameters in the neural network structure, so the overall calculation efficiency is higher. Meanwhile, the input of the preset width learning neural network changes from the original three-dimensional point cloud data to feature vectors in the unified feature space, a data type that is simpler to process, which further improves the calculation efficiency.
The system embodiment provided in the embodiments of the present invention is for implementing the above method embodiments, and for details of the process and the details, reference is made to the above method embodiments, which are not described herein again.
Fig. 9 is a schematic entity structure diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 9, the electronic device may include: a processor (processor) 401, a communication interface (Communication Interface) 402, a memory (memory) 403 and a bus 404, wherein the processor 401, the communication interface 402 and the memory 403 communicate with each other through the bus 404. The communication interface 402 may be used for information transfer of the electronic device. The processor 401 may call logic instructions in the memory 403 to perform a method comprising:
collecting three-dimensional point cloud data in a current area;
processing the three-dimensional point cloud data through a preset unified space encoder to obtain current feature nodes in a unified feature space;
and carrying out object recognition on the current feature nodes through a preset width learning neural network.
In addition, the logic instructions in the memory 403 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the above-described method embodiments of the present invention. The aforementioned storage medium includes: a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other various media capable of storing program codes.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the method provided by the foregoing embodiments, for example, comprising:
collecting three-dimensional point cloud data in a current area;
processing the three-dimensional point cloud data through a preset unified space encoder to obtain current feature nodes in a unified feature space;
and carrying out object recognition on the current feature nodes through a preset width learning neural network.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.