CN117788277A


Info

Publication number: CN117788277A
Application number: CN202311816094.5A
Authority: CN
Inventors: 李弘扬; 杨泽同; 陈立; 孙亚楠; 司马崇昊; 杨佳智; 曾嘉; 乔宇
Original assignee: Shanghai AI Innovation Center
Current assignee: Shanghai AI Innovation Center
Priority date: 2023-12-26
Filing date: 2023-12-26
Publication date: 2024-03-29
Anticipated expiration: 2043-12-26
Also published as: CN117788277B

Abstract

The invention discloses a point cloud generation method, device, electronic device and storage medium. The method includes: obtaining historical time-series image data of a target vehicle and determining bird's-eye view feature information of the historical time-series image data; collecting three-dimensional geometric representation data of the historical time-series image data within the bird's-eye view feature information; and predicting and generating a future point cloud according to the ego-vehicle motion condition of the target vehicle and the three-dimensional geometric representation data. Embodiments of the invention provide visual image processing for autonomous driving models and make full use of the semantic features, geometric features and dynamic temporal information in image and point cloud sequences, which can improve the accuracy of point cloud prediction, improve the reliability of the autonomous driving model, and improve the user experience.

Description

Translated from Chinese

Point cloud generation method, device, electronic device and storage medium

Technical field

The present invention relates to the field of image processing technology, and in particular to a point cloud generation method, device, electronic device and storage medium.

Background

With the development of visual processing technology, autonomous driving is about to become a reality. Current visual processing research focuses mainly on general-purpose vision, and there is still little research on how to pre-train autonomous driving models in a self-supervised manner. As a result, common visual processing cannot cover semantic, geometric and temporal features at the same time, and it still has shortcomings in end-to-end perception, prediction and planning. How to make full use of the rich semantic features, geometric features and dynamic temporal information in image and point cloud sequences, so as to improve the efficiency and accuracy of point cloud prediction, has become an urgent problem for the industry.

Summary of the invention

The present invention provides a point cloud generation method, device, electronic device and storage medium that provide visual image processing for autonomous driving models and make full use of the semantic features, geometric features and dynamic temporal information in image and point cloud sequences, which can improve the accuracy of point cloud prediction, improve the reliability of the autonomous driving model, and improve the user experience.

According to one aspect of the present invention, a point cloud generation method is provided, wherein the method includes:

obtaining historical time-series image data of a target vehicle, and determining bird's-eye view feature information of the historical time-series image data;

collecting three-dimensional geometric representation data of the historical time-series image data within the bird's-eye view feature information;

predicting and generating a future point cloud according to the ego-vehicle motion condition of the target vehicle and the three-dimensional geometric representation data.

According to another aspect of the present invention, a point cloud generation device is provided, wherein the device includes:

an encoder module, configured to obtain historical time-series image data of a target vehicle and determine bird's-eye view feature information of the historical time-series image data;

a latent space renderer module, configured to collect three-dimensional geometric representation data of the historical time-series image data within the bird's-eye view feature information;

a decoder module, configured to predict and generate a future point cloud according to the ego-vehicle motion condition of the target vehicle and the three-dimensional geometric representation data.

According to another aspect of the present invention, an electronic device is provided, the electronic device including:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor so that the at least one processor can perform the point cloud generation method described in any embodiment of the present invention.

According to another aspect of the present invention, a computer-readable storage medium is provided. The computer-readable storage medium stores computer instructions which, when executed by a processor, implement the point cloud generation method described in any embodiment of the present invention.

In the technical solution of the embodiments of the present invention, historical time-series image data is obtained in the target vehicle, bird's-eye view feature information corresponding to the historical time-series image data is generated, three-dimensional geometric representation data of the historical time-series image data is extracted from the bird's-eye view feature information, and a future point cloud is predicted and generated from the three-dimensional geometric representation data under the ego-vehicle motion condition of the target vehicle. This provides visual image processing for autonomous driving model scenarios and makes full use of the semantic features, geometric features and dynamic temporal information in image and point cloud sequences, which can improve the accuracy of point cloud prediction, improve the reliability of the autonomous driving model, and improve the user experience.

It should be understood that what is described in this section is not intended to identify key or important features of the embodiments of the invention, nor is it intended to limit the scope of the invention. Other features of the present invention will become easy to understand from the following description.

Description of the drawings

In order to explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. A person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

Figure 1 is a flow chart of a point cloud generation method provided according to Embodiment 1 of the present invention;

Figure 2 is a flow chart of another point cloud generation method provided according to Embodiment 2 of the present invention;

Figure 3 is an example architecture diagram of a point cloud generation method provided according to Embodiment 3 of the present invention;

Figure 4 is a schematic structural diagram of a latent space renderer provided according to Embodiment 3 of the present invention;

Figure 5 is a schematic structural diagram of a decoder provided according to Embodiment 3 of the present invention;

Figure 6 is a schematic structural diagram of a point cloud generation device provided according to Embodiment 4 of the present invention;

Figure 7 is a schematic structural diagram of an electronic device implementing an embodiment of the present invention.

Detailed description

In order to enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention, without creative effort, fall within the scope of protection of the present invention.

It should be noted that the terms "first", "second", etc. in the specification, the claims and the above drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that data used in this way can be interchanged where appropriate, so that the embodiments described herein can be implemented in an order other than those illustrated or described herein. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the steps or units clearly listed, but may include other steps or units that are not clearly listed or that are inherent to the process, method, product or device.

Embodiment 1

Figure 1 is a flow chart of a point cloud generation method provided according to Embodiment 1 of the present invention. This embodiment is applicable to situations. The method can be executed by a point cloud generation device, which can be implemented in hardware and/or software and can be configured in a server or a server cluster. As shown in Figure 1, the method includes:

Step 110: Obtain the historical time-series image data of the target vehicle, and determine the bird's-eye view feature information of the historical time-series image data.

Here, the target vehicle can be the subject that performs autonomous driving, and it can collect data about its surroundings at different moments while driving. The historical time-series image data can be compiled from the surroundings data collected by the target vehicle. It can be a data set whose image frames are arranged in chronological order, and it can reflect the driving state of the target vehicle over a past period of time. The time span covered by the historical time-series image data can be set according to specific business needs: the higher the accuracy required of autonomous driving, the longer or the more fine-grained the time span of the included image frames can be. The bird's-eye view feature information can be a bird's-eye view generated by processing two-dimensional images from one or more viewpoints. It strengthens the association between objects and spatial relationships in the scene and makes it easier to process the three-dimensional geometric features of the historical time-series image data.

In the embodiment of the present invention, the historical time-series image data, composed of environment images collected by the target vehicle at different moments during autonomous driving, can be extracted. The environment images collected at different moments can then be processed so that one or more frames of two-dimensional environment image data at each moment are converted into a bird's-eye view, and each bird's-eye view can be used as the bird's-eye view feature information of the corresponding historical time-series image data.

Step 120: Collect three-dimensional geometric representation data of the historical time-series image data within the bird's-eye view feature information.

Here, the three-dimensional geometric representation data can be feature data that reflects three-dimensional geometric relationships in space. It can include the occupancy probability of the target vehicle in the grid, the geometric shape of the target vehicle's driving path, and so on.

In the embodiment of the present invention, three-dimensional geometric features can be extracted from the bird's-eye view feature information and used as the three-dimensional geometric representation data. The historical time-series image data can correspond to multiple frames of bird's-eye view feature information, and each frame can have its own three-dimensional geometric representation data. Specifically, the three-dimensional geometric representation data can be extracted by a neural network model. For example, in some embodiments it can be generated by a long short-term memory model, in which the three-dimensional geometric representation data at each moment is generated from the three-dimensional geometric representation data at the previous moment and the bird's-eye view feature information at the current moment.
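The recurrent dependency described above (the representation at each moment depends on the previous representation and the current bird's-eye view features) can be sketched as follows. This is a minimal illustration only: it uses a plain tanh recurrent update as a stand-in for the long short-term memory model, and the sizes, the random weights and the names `recurrent_step`, `W_state` and `W_input` are assumptions that do not appear in the source.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 8        # feature channels per grid cell (illustrative)
H = W = 4    # bird's-eye view grid size (illustrative)

# Hypothetical, untrained weights; a real model would learn these.
W_state = rng.normal(scale=0.1, size=(C, C))
W_input = rng.normal(scale=0.1, size=(C, C))


def recurrent_step(prev_state, bev_feat):
    """Combine the previous representation with the current BEV features.

    A simplified tanh recurrence applied independently to every grid cell;
    it is not a full LSTM (no gates, no separate cell state).
    """
    return np.tanh(prev_state @ W_state + bev_feat @ W_input)


bev_sequence = rng.normal(size=(3, H, W, C))  # three historical frames, oldest first
state = np.zeros((H, W, C))
states = []
for bev_feat in bev_sequence:
    state = recurrent_step(state, bev_feat)
    states.append(state)
```

Each entry of `states` depends on every earlier frame through `state`, which is the property the text attributes to the long short-term memory model.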

Step 130: Predict and generate a future point cloud according to the ego-vehicle motion condition of the target vehicle and the three-dimensional geometric representation data.

Here, the ego-vehicle motion condition can be information about the operating state of the target vehicle at different moments. It can include, but is not limited to, information such as vehicle speed, driving direction and vehicle position. The future point cloud is point cloud data generated by prediction, and it can represent the state of the target vehicle at one or more future moments.

In the embodiment of the present invention, information such as the speed, driving direction and position of the target vehicle can be extracted as the ego-vehicle motion condition. The ego-vehicle motion condition can include the motion condition over a past period of time, and it can also include the motion condition for a future period of time predicted from the historical motion condition. The future point cloud of the target vehicle can be predicted from the ego-vehicle motion condition and the previously obtained three-dimensional geometric representation data. The future point cloud can correspond to the next frame or the next several frames, and this time span can be determined by the specific business scenario.

In the embodiment of the present invention, historical time-series image data is obtained in the target vehicle, bird's-eye view feature information corresponding to the historical time-series image data is generated, three-dimensional geometric representation data of the historical time-series image data is extracted from the bird's-eye view feature information, and a future point cloud is predicted and generated from the three-dimensional geometric representation data under the ego-vehicle motion condition of the target vehicle. This provides visual image processing for autonomous driving model scenarios and makes full use of the semantic features, geometric features and dynamic temporal information in image and point cloud sequences, which can improve the accuracy of point cloud prediction, improve the reliability of the autonomous driving model, and improve the user experience.

Embodiment 2

Figure 2 is a flow chart of another point cloud generation method provided according to Embodiment 2 of the present invention. This embodiment is a more specific version of the embodiment above. Referring to Figure 2, the method provided by this embodiment includes the following steps:

Step 210: Extract the image frames collected by the target vehicle in chronological order to form the historical time-series image data.

Here, an image frame can be an image of the environment in which the target vehicle is located, as collected by the target vehicle, for example an image of the environment around the vehicle.

In the embodiment of the present invention, the target vehicle can collect images of its environment as image frames while driving, at regular or irregular intervals. The image frames can be sorted by the time at which each frame was collected or generated. One or more image frames may exist for the same moment, and the chronologically sorted image frames can be used as the historical time-series image data.
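The ordering step can be sketched as follows, assuming each frame carries a capture timestamp and a camera name. The `ImageFrame` record, its fields and the file names are hypothetical and serve only to illustrate sorting by time while allowing several frames per moment.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageFrame:
    """Hypothetical record for one captured image (field names are illustrative)."""
    timestamp: float  # capture time in seconds
    camera: str       # which camera produced the frame
    path: str         # where the image is stored


def build_history(frames):
    """Group frames by timestamp and return the groups in chronological order."""
    by_time = defaultdict(list)
    for frame in frames:
        by_time[frame.timestamp].append(frame)
    # Each entry is (timestamp, frames captured at that moment, sorted by camera).
    return [(t, sorted(by_time[t], key=lambda f: f.camera)) for t in sorted(by_time)]


frames = [
    ImageFrame(0.5, "front", "f_0005.jpg"),
    ImageFrame(0.0, "rear", "r_0000.jpg"),
    ImageFrame(0.0, "front", "f_0000.jpg"),
]
history = build_history(frames)
```

Grouping by timestamp keeps all views of the same moment together, so they can be passed to the encoder at the same time, as the next step describes.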

Step 220: Input the image frames corresponding to different moments in the historical time-series image data into the preset history encoder in sequence to generate the bird's-eye view feature information of the image frames at each moment, where the preset history encoder includes a bottom-up bird's-eye view model, a top-down bird's-eye view model, a multi-modal bird's-eye view model, and a decoding bird's-eye view model.

Here, the preset history encoder can be an encoder that processes one or more frames of two-dimensional images into a bird's-eye view. It can include one or more of a bottom-up bird's-eye view model, a top-down bird's-eye view model, a multi-modal bird's-eye view model, and a decoding bird's-eye view model. The bottom-up bird's-eye view model can include the Lift-Splat-Shoot model, the BEVDet model, etc.; the top-down bird's-eye view model can include the DETR model, the PETR model, etc.; the multi-modal bird's-eye view model can include bevfusion–ADLab, etc.; and the decoding bird's-eye view model can include the CenterNet model, etc.

In the embodiment of the present invention, the image frames at different moments in the historical time-series image data can be input to the preset history encoder separately for processing, and image frames belonging to the same moment can be input at the same time. The preset history encoder converts the image frames into a bird's-eye view, and the bird's-eye view feature information that it outputs for the image frames at each moment can be extracted. The preset history encoder can be implemented with a bottom-up bird's-eye view model, a top-down bird's-eye view model, a multi-modal bird's-eye view model, or a decoding bird's-eye view model.

Step 230: Call the preset latent space renderer to process each frame of bird's-eye view feature information.

Here, the preset latent space renderer can be a component that processes the three-dimensional geometric features in the bird's-eye view feature information. It can extract the occupancy probability of each occupancy grid cell in the bird's-eye view feature information as the three-dimensional geometric representation data.

In the embodiment of the present invention, each frame of bird's-eye view feature information can be input to the preset latent space renderer for processing, so that the three-dimensional geometric representation data of each frame is extracted in the preset latent space renderer. In some embodiments, the preset latent space renderer can have the characteristics of a long short-term memory model: the bird's-eye view feature information is input to the renderer in chronological order, and the BEV query of the renderer retains the weight information of each frame of bird's-eye view feature information for use when later frames are processed. The processing of each frame therefore depends on the others, which preserves the temporal characteristics of the bird's-eye view feature information and improves the accuracy of future point cloud prediction.

Step 240: Extract the occupancy grid features that the preset latent space renderer generates for the bird's-eye view feature information, and use them as the three-dimensional geometric representation data.

Here, the occupancy grid features can be the probabilities that the target vehicle occupies each grid cell in the bird's-eye view feature information, and they can be generated by the preset latent space renderer. Each piece of bird's-eye view feature information can correspond to one set of occupancy grid features, and the set can include the occupancy probability of every cell in the bird's-eye view grid.

In the embodiment of the present invention, the occupancy grid features generated when the preset latent space renderer processes the bird's-eye view feature information can be extracted, and the occupancy grid features of each piece of bird's-eye view feature information can be used as the corresponding three-dimensional geometric representation data.

Step 250: Call the preset multi-layer perceptron to determine the ego-vehicle motion condition at the next moment based on the image frame at the current moment in the historical time-series image data.

Here, the preset multi-layer perceptron can be a neural network model used to predict the future ego-vehicle motion condition of the target vehicle, and it can be generated by training an artificial neural network. The network structure and weight parameters of the preset multi-layer perceptron are not limited here.

In the embodiment of the present invention, a pre-trained preset multi-layer perceptron can be obtained, the image frame at the current moment can be extracted from the historical time-series image data, and the preset multi-layer perceptron can be called to process that frame, so that it predicts the ego-vehicle motion condition at the next moment from the image frame at the current moment. In some embodiments, the image frames input to the preset multi-layer perceptron are not limited to the current moment and can also include image frames from a period of time adjacent to the current moment.
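A multi-layer perceptron of this kind can be sketched as below. Since the source leaves the network structure and weights open, everything concrete here is an assumption: a two-layer network, random untrained weights, a 16-dimensional feature vector standing in for the current frame, and a 4-dimensional output standing in for quantities such as speed, heading and position.

```python
import numpy as np


def mlp_forward(x, params):
    """Two-layer perceptron: linear -> ReLU -> linear."""
    hidden = np.maximum(0.0, x @ params["w1"] + params["b1"])
    return hidden @ params["w2"] + params["b2"]


rng = np.random.default_rng(0)
feat_dim, hidden_dim, motion_dim = 16, 32, 4  # all sizes are illustrative

# Hypothetical, untrained parameters; the source assumes a pre-trained network.
params = {
    "w1": rng.normal(scale=0.1, size=(feat_dim, hidden_dim)),
    "b1": np.zeros(hidden_dim),
    "w2": rng.normal(scale=0.1, size=(hidden_dim, motion_dim)),
    "b2": np.zeros(motion_dim),
}

frame_feature = rng.normal(size=feat_dim)          # stand-in for the current frame's features
next_motion = mlp_forward(frame_feature, params)   # predicted motion condition, shape (4,)
```

To use frames from an adjacent time window instead of a single frame, one option would be to concatenate their feature vectors and enlarge `feat_dim` accordingly.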

Step 260: Pass the bird's-eye view query corresponding to the three-dimensional geometric representation data at the current moment, the occupancy grid features of the three-dimensional geometric representation data at the current moment, and the ego-vehicle motion condition through the self-attention module, the temporal cross-attention module and the feed-forward network in sequence, for at least one iteration, to generate the future point cloud.

Here, the bird's-eye view query can be intermediate parameter information retained while the three-dimensional geometric representation data at different moments is generated. It can be produced while the preset latent space renderer processes each frame of bird's-eye view feature information. For example, the bird's-eye view query at the current moment can be generated by the preset latent space renderer from the bird's-eye view query at the previous moment and the bird's-eye view feature information at the current moment, and it can reflect the temporal characteristics of the bird's-eye view feature information at the current moment.

The self-attention module can be built on the self-attention mechanism. When the self-attention mechanism processes sequence data, each element can be associated with the other elements in the sequence, and long-range dependencies between elements are captured adaptively by computing the relative importance of the elements in the sequence. Specifically, the self-attention module computes the similarity between each element and the other elements in the sequence and normalizes these similarities into attention weights; the output for each element is the sum of the elements weighted by the corresponding attention weights. The temporal cross-attention module can be a model that computes attention between an element of one sequence and all elements of another sequence, where the two sequences can be the occupancy grid features and the ego-vehicle motion condition provided in the embodiments of the present invention. The feed-forward network can be a pre-trained neural network model that predicts the future point cloud, and it can consist of an input layer, a hidden layer and an output layer.
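The two attention patterns described above can be illustrated with standard scaled dot-product attention. This is a generic sketch rather than the specific modules of this patent: it omits the learned query, key and value projections, and the sequences `seq_a` and `seq_b` are random placeholders.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    """Scaled dot-product attention; returns (output, attention weights)."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])  # pairwise similarity
    weights = softmax(scores, axis=-1)                      # normalize per query
    return weights @ values, weights                        # weighted sum of values


rng = np.random.default_rng(0)
seq_a = rng.normal(size=(5, 8))  # placeholder sequence of 5 elements
seq_b = rng.normal(size=(3, 8))  # placeholder second sequence of 3 elements

# Self-attention: every element of seq_a attends to all elements of seq_a.
self_out, self_w = attention(seq_a, seq_a, seq_a)

# Cross-attention: every element of seq_a attends to all elements of seq_b.
cross_out, cross_w = attention(seq_a, seq_b, seq_b)
```

Each row of `self_w` and `cross_w` sums to 1, which is the normalization into attention weights that the text describes.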

In the embodiment of the present invention, the bird's-eye view query and occupancy grid features of the three-dimensional geometric representation data at the current moment can be extracted, together with the ego-vehicle motion condition at the current moment. The ego-vehicle motion condition, the bird's-eye view query and the occupancy grid features can then be processed for multiple iterations by the self-attention module, the temporal cross-attention module and the feed-forward network in sequence, and the output can be used as the future point cloud.
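The iteration over the three modules can be sketched as a small decoder loop. Several details are assumptions not stated in this section: the ego-vehicle motion condition is injected by adding an embedding to the queries, residual connections are used, the same untrained weights are shared across iterations, and the loop stops at the updated queries because the step that turns them into points is not described here.

```python
import numpy as np


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention without learned projections."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def decoder_layer(bev_query, occ_feat, ego_embed, w1, w2):
    """One pass: self-attention -> cross-attention -> feed-forward network."""
    q = bev_query + ego_embed                  # assumed way of conditioning on ego motion
    q = q + attention(q, q, q)                 # self-attention among the queries
    q = q + attention(q, occ_feat, occ_feat)   # cross-attention over occupancy grid features
    q = q + np.maximum(0.0, q @ w1) @ w2       # two-layer feed-forward network
    return q


rng = np.random.default_rng(0)
n_cells, dim = 16, 8                           # illustrative sizes
bev_query = rng.normal(size=(n_cells, dim))
occ_feat = rng.normal(size=(n_cells, dim))
ego_embed = rng.normal(size=(dim,))            # hypothetical embedding of the motion condition
w1 = rng.normal(scale=0.1, size=(dim, 2 * dim))
w2 = rng.normal(scale=0.1, size=(2 * dim, dim))

future_query = bev_query
for _ in range(3):                             # "at least one" iteration; 3 is arbitrary
    future_query = decoder_layer(future_query, occ_feat, ego_embed, w1, w2)
```

A practical decoder would normally add layer normalization and separate weights per layer; they are left out to keep the control flow visible.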

In the embodiment of the present invention, the image frames collected by the target vehicle are sorted chronologically into historical time-series image data; the image frames at different moments are input in sequence to the preset history encoder to generate bird's-eye view feature information; the preset latent space renderer processes the bird's-eye view feature information to extract occupancy grid features as the three-dimensional geometric representation data; the preset multi-layer perceptron predicts the ego-vehicle motion condition at the next moment from the image frame at the current moment; and the bird's-eye view query at the current moment, the occupancy grid features and the ego-vehicle motion condition at the next moment are processed for multiple iterations by the self-attention module, the temporal cross-attention module and the feed-forward network to generate the future point cloud. This makes full use of the semantic features, geometric features and dynamic temporal information in image and point cloud sequences, which can improve the accuracy of point cloud prediction, improve the reliability of the autonomous driving model, and improve the user experience.

Further, on the basis of the above embodiments, calling the preset latent space renderer to process each frame of bird's-eye view feature information includes:

accumulating the conditional probabilities of the different occupancy grids within the bird's-eye view feature information according to a preset conditional probability function; determining, according to a preset feature expectation function, the ray feature corresponding to the conditional probabilities and the bird's-eye view feature information; and taking the weighted product of the ray feature and the conditional probability as the occupancy grid feature.

The preset conditional probability function may be a preconfigured function for determining conditional probabilities. The preset feature expectation function may compute the joint probability distribution along each ray that starts at the bird's-eye view origin corresponding to the bird's-eye view feature information and reaches a grid; the ray feature may be the joint probability value determined by the preset feature expectation function.

In this embodiment of the present invention, the preset conditional probability function can be used to compute the conditional occupancy probability of each grid in the bird's-eye view corresponding to the bird's-eye view feature information. The preset feature expectation function can be used to compute, as the ray feature, the joint probability distribution along each ray that starts at the bird's-eye view origin and reaches a grid. The weighted product of the ray feature and the conditional probability can then be used as the occupancy grid feature.
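The three steps above can be sketched for a single ray as follows. This is a minimal illustration, not the patent's implementation: it assumes the conditional probability of a waypoint is its independent occupancy probability multiplied by the probability that every earlier waypoint on the ray is free, and the function names (`conditional_probs`, `ray_feature`, `grid_features`) are hypothetical.

```python
from typing import List


def conditional_probs(p: List[float]) -> List[float]:
    """p[j] is the independent occupancy probability of waypoint j on one ray.

    Assumed form: the ray must pass every earlier waypoint unobstructed
    and then be occupied at waypoint j.
    """
    out, free = [], 1.0
    for pj in p:
        out.append(free * pj)
        free *= 1.0 - pj
    return out


def ray_feature(p_hat: List[float], feats: List[List[float]]) -> List[float]:
    """Expectation of the per-waypoint BEV features under the conditional probabilities."""
    dim = len(feats[0])
    return [sum(w * f[d] for w, f in zip(p_hat, feats)) for d in range(dim)]


def grid_features(p: List[float], feats: List[List[float]]) -> List[List[float]]:
    """Weight the ray feature by each waypoint's conditional probability."""
    p_hat = conditional_probs(p)
    ray = ray_feature(p_hat, feats)
    return [[w * v for v in ray] for w in p_hat]


if __name__ == "__main__":
    probs = [0.5, 0.5]                      # two waypoints on the ray
    features = [[1.0, 0.0], [0.0, 1.0]]     # one 2-D BEV feature per waypoint
    print(conditional_probs(probs))         # [0.5, 0.25]
    print(grid_features(probs, features))
```

In a full renderer the same computation would run over every ray from the bird's-eye view origin, typically in batched tensor form rather than Python lists.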

Further, on the basis of the above embodiments, the preset conditional probability function includes at least the following:

where the quantity defined by the function denotes the conditional probability for grid i in the bird's-eye view feature information and waypoint j on the ray pointing from the origin to grid i; p denotes the independent occupancy probability of each grid of the bird's-eye view feature information, and the value of p is estimated by a neural network.

The preset feature expectation function includes at least the following:

where the probability term denotes the conditional probability for grid i in the bird's-eye view feature information and waypoint k on the ray pointing from the origin to grid i, and the feature term denotes the bird's-eye view feature information of the k-th image frame.

On the basis of the above embodiments, the ego-vehicle motion conditions include at least one of the following: vehicle heading, vehicle speed, and vehicle position.

In this embodiment of the present invention, the ego-vehicle motion conditions may be information reflecting the motion state of the target vehicle at different moments, and may include, but are not limited to, vehicle heading, vehicle speed, and vehicle position.

Embodiment 3

This embodiment of the present invention provides a point cloud generation method that realizes visual point cloud prediction. The method can be implemented with a history encoder that condenses the historical input images into bird's-eye view (BEV) features, a latent space renderer that converts the BEV features into three-dimensional geometric features, and a decoder that predicts the future point cloud from the three-dimensional geometric features. In this embodiment, point cloud prediction is implemented by the model ViDAR. Referring to FIG. 3, ViDAR mainly includes three core components: (1) an encoder, which can be pre-trained and is used to extract BEV features from the visual input; (2) a latent space renderer, which extracts a three-dimensional geometric representation from the BEV features; and (3) a decoder, which predicts future BEV features autoregressively, projects them through a prediction head into a three-dimensional occupancy probability grid, and then parses the point cloud from that grid. ViDAR casts rays from the origin in specified directions, finds along each ray the distance of the waypoint with the maximum occupancy response, and determines the position of the point from that distance and the ray direction, thereby predicting the future point cloud.
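The ray-based parsing step described at the end of the paragraph above can be sketched as follows. This is an illustrative sketch only: it assumes waypoints are evenly spaced along each ray, and the function name `point_from_ray` and the `step` parameter are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def point_from_ray(origin: Vec3, direction: Vec3,
                   occupancy: Sequence[float], step: float) -> Vec3:
    """Return the 3D point at the waypoint with the highest occupancy response.

    occupancy[k] is the response at waypoint k + 1, assumed to lie
    (k + 1) * step away from the origin along the ray.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    unit = tuple(c / norm for c in direction)
    # Index of the waypoint with the maximum occupancy response.
    best = max(range(len(occupancy)), key=lambda k: occupancy[k])
    distance = (best + 1) * step
    return tuple(o + distance * u for o, u in zip(origin, unit))


if __name__ == "__main__":
    # One ray along +x with four waypoints spaced 0.5 apart.
    print(point_from_ray((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                         [0.1, 0.2, 0.9, 0.3], step=0.5))  # (1.5, 0.0, 0.0)
```

Repeating this for every ray direction yields one point per ray, and the collection of those points forms the predicted point cloud.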

Referring to FIG. 4, the goal of the latent space renderer is to extract more discriminative and representative features from the visual input. The latent space renderer accumulates the conditional probabilities of the BEV grids corresponding to the visual input through the conditional probability function, computes the feature of each ray through the feature expectation function, and finally weights each ray feature by its associated conditional probability to obtain the feature of each occupancy grid. Specifically, the conditional probability function is expressed as:

where i indexes the BEV grids, j indexes the waypoints on the ray pointing from the origin to BEV grid i, and p denotes the independent occupancy probability of each BEV grid, estimated by a neural network. Through the conditional probability function, this embodiment converts the occupancy probability of each independent BEV grid into a joint probability distribution over the probabilities of the waypoints preceding it on the ray.

In this embodiment of the present invention, after the joint probability distribution is determined, the ray feature can be determined through the feature expectation function, which may be expressed as:

Through the feature expectation function, this embodiment obtains the ray feature corresponding to each ray cast from the origin. The ray feature can then be weighted by the conditional probability to obtain the final occupancy grid feature:
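One form of the three expressions that is consistent with the verbal description is given below. It is offered as an assumption, not as the patent's exact formulas. The symbols are introduced here for illustration: \hat{p}_{i,j} is the conditional probability of waypoint j on the ray toward grid i, p_{i,k} is the independent occupancy probability at waypoint k, F_{i,k} is the BEV feature at waypoint k, K is the number of waypoints on the ray, \hat{F}_{i} is the ray feature, and \hat{F}_{i,j} is the resulting occupancy grid feature.

```latex
\begin{aligned}
\hat{p}_{i,j} &= p_{i,j} \prod_{k=1}^{j-1} \left(1 - p_{i,k}\right)
  && \text{(conditional probability)} \\
\hat{F}_{i} &= \sum_{k=1}^{K} \hat{p}_{i,k}\, F_{i,k}
  && \text{(ray feature as an expectation)} \\
\hat{F}_{i,j} &= \hat{p}_{i,j}\, \hat{F}_{i}
  && \text{(occupancy grid feature)}
\end{aligned}
```

Under this form, a waypoint behind a confidently occupied waypoint receives a conditional probability near zero, so its grid feature is suppressed.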

In this embodiment of the present invention, the ViDAR decoder decodes the future point cloud from the three-dimensional geometric representation. Referring to FIG. 5, the decoder can be a Transformer applied iteratively, which autoregressively predicts the BEV features of the next frame from the result of the previous frame. In the t-th iteration, it first encodes the expected ego-vehicle motion conditions of the next frame, which may include the future heading, position, and speed of the ego vehicle, into a high-dimensional representation through a multi-layer perceptron (MLP). This representation is then added to a set of future BEV queries as input. A deformable self-attention module, a temporal cross-attention module, and a feed-forward network then predict the future point cloud based on the ego-vehicle motion conditions and the BEV features of the previous frame.
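The control flow of this autoregressive decoding can be sketched as below. Only the loop structure follows the description; `toy_encode` and `toy_layer` are hypothetical stand-ins for the MLP and for the stacked deformable self-attention, temporal cross-attention, and feed-forward layers, and ego motion is reduced to an assumed (heading, speed) pair.

```python
from typing import Callable, List, Sequence, Tuple

Feature = List[float]      # one feature vector per BEV cell
BevMap = List[Feature]     # flattened BEV grid
Motion = Tuple[float, float]  # assumed (heading, speed)


def decode_future(bev_now: BevMap, init_queries: BevMap,
                  ego_motions: Sequence[Motion],
                  encode_motion: Callable[[Motion], Feature],
                  layer: Callable[[BevMap, BevMap], BevMap],
                  num_layers: int = 1) -> List[BevMap]:
    """Predict one BEV feature map per future step, feeding each result back in."""
    futures: List[BevMap] = []
    prev = bev_now
    for motion in ego_motions:                 # iteration t
        cond = encode_motion(motion)           # ego motion -> high-dimensional vector
        # Add the motion encoding to every future BEV query.
        queries = [[q + c for q, c in zip(cell, cond)] for cell in init_queries]
        for _ in range(num_layers):            # stacked decoder layers
            queries = layer(queries, prev)
        futures.append(queries)
        prev = queries                         # autoregressive feedback
    return futures


def toy_encode(motion: Motion) -> Feature:
    # Stand-in for the MLP: a fixed linear map to two dimensions.
    heading, speed = motion
    return [0.5 * heading, 0.5 * speed]


def toy_layer(queries: BevMap, prev: BevMap) -> BevMap:
    # Stand-in for attention + feed-forward: average each query with the
    # previous-frame feature of the same cell.
    return [[(q + p) / 2 for q, p in zip(qc, pc)] for qc, pc in zip(queries, prev)]


if __name__ == "__main__":
    out = decode_future([[1.0, 1.0]], [[0.0, 0.0]],
                        [(0.0, 2.0), (0.0, 2.0)], toy_encode, toy_layer)
    print(out)
```

The sketch stops at future BEV features; in the described method a prediction head would still project each map to occupancy probabilities before points are parsed.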

Experiments were conducted on the nuScenes dataset to verify the performance of the point cloud generation method. The nuScenes dataset covers complex urban scenes and includes 1,000 time-series clips of different autonomous driving scenarios, with multi-view images from 6 cameras, point clouds from one 32-beam LiDAR, and annotations at 2 Hz. On this dataset, ViDAR, as implemented by the point cloud generation method of this embodiment, was compared with previous point cloud generation methods on future point cloud prediction, and different pre-training schemes were compared on downstream tasks. The results are shown in the following tables:

Table 1. Future point cloud prediction

Table 1 shows the results of future point cloud prediction. The evaluation metric is the Chamfer Distance between the predicted point cloud and the actual point cloud; the smaller the value, the better the prediction. As shown in Table 1, with visual image input, the method provided by this embodiment achieves better point cloud prediction than the existing state-of-the-art methods.
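For reference, one common symmetric variant of the Chamfer Distance is sketched below: the mean squared nearest-neighbour distance, summed over both directions. The exact variant and normalization used in the experiments are not stated in the text, so this is an assumption, and the brute-force search is for illustration only.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer Distance between two non-empty point clouds."""

    def sq_dist(p: Point, q: Point) -> float:
        return sum((x - y) ** 2 for x, y in zip(p, q))

    def one_way(src: Sequence[Point], dst: Sequence[Point]) -> float:
        # Mean, over src, of the squared distance to the nearest point in dst.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


if __name__ == "__main__":
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    actual = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
    print(chamfer_distance(predicted, actual))  # 1.0
```

Identical point clouds give a distance of zero, which matches the statement that smaller values indicate better prediction.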

Table 2. Comparison on downstream perception tasks

Table 2 shows the results on downstream autonomous driving tasks. The evaluation metrics are NDS and mAP, where NDS is the detection score on the nuScenes dataset and mAP (mean average precision) is a commonly used measure of detection accuracy. The higher these two metrics, the better the downstream performance. As shown in Table 2, ViDAR, as implemented by the method of this embodiment, substantially outperforms the existing pre-training schemes.
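As background on how the two metrics relate, the nuScenes detection benchmark defines NDS as a weighted combination of mAP and five mean true-positive error metrics (translation, scale, orientation, velocity, and attribute errors). The helper below is a small sketch of that published definition; the function name `nds` is hypothetical, and the inputs are placeholders rather than results from the tables.

```python
from typing import Sequence


def nds(m_ap: float, tp_errors: Sequence[float]) -> float:
    """nuScenes Detection Score from mAP and the five mean TP errors.

    tp_errors holds mATE, mASE, mAOE, mAVE, and mAAE; each error is
    clipped at 1 and converted to a score of 1 - error.
    """
    if len(tp_errors) != 5:
        raise ValueError("expected five true-positive error metrics")
    tp_scores = [1.0 - min(1.0, e) for e in tp_errors]
    return (5.0 * m_ap + sum(tp_scores)) / 10.0


if __name__ == "__main__":
    # Placeholder inputs chosen only to exercise the formula.
    print(nds(0.5, [0.5, 0.5, 0.5, 0.5, 0.5]))  # 0.5
```

Because mAP carries half of the total weight, improvements in mAP and NDS usually move together, though NDS also rewards better box quality.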

Embodiment 4

FIG. 6 is a schematic structural diagram of a point cloud generation device provided according to Embodiment 4 of the present invention. As shown in FIG. 6, the device includes an encoder module 301, a latent space renderer module 302, and a decoder module 303.

The encoder module 301 is configured to obtain historical time-series image data of the target vehicle and determine the bird's-eye view feature information of the historical time-series image data.

The latent space renderer module 302 is configured to collect, within the bird's-eye view feature information, the three-dimensional geometric representation data of the historical time-series image data.

The decoder module 303 is configured to predict and generate a future point cloud according to the ego-vehicle motion conditions of the target vehicle and the three-dimensional geometric representation data.

In this embodiment of the present invention, the encoder module obtains historical time-series image data from the target vehicle and generates the corresponding bird's-eye view feature information; the latent space renderer module extracts the three-dimensional geometric representation data of the historical time-series image data from the bird's-eye view feature information; and the decoder module predicts and generates the future point cloud from the three-dimensional geometric representation data under the ego-vehicle motion conditions of the target vehicle. This realizes visual image processing for autonomous driving models and makes full use of the semantic, geometric, and dynamic temporal information in the image and point cloud sequences, which can improve the accuracy of point cloud prediction, the reliability of the autonomous driving model, and the user experience.
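The data flow between the three modules can be sketched as a simple composition. The class and attribute names are hypothetical, and each module is represented by an arbitrary callable rather than a trained network; the sketch shows only the order in which the modules hand data to one another.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class PointCloudGenerationDevice:
    encoder: Callable[[Sequence[Any]], Any]   # module 301: frames -> BEV features
    renderer: Callable[[Any], Any]            # module 302: BEV features -> 3D geometry
    decoder: Callable[[Any, Any], Any]        # module 303: geometry + ego motion -> point cloud

    def generate(self, frames: Sequence[Any], ego_motion: Any) -> Any:
        bev = self.encoder(frames)
        geometry = self.renderer(bev)
        return self.decoder(geometry, ego_motion)


if __name__ == "__main__":
    # Placeholder callables that only trace the data flow.
    device = PointCloudGenerationDevice(
        encoder=lambda frames: [f"bev({f})" for f in frames],
        renderer=lambda bev: [f"geo({b})" for b in bev],
        decoder=lambda geo, motion: {"from": geo, "motion": motion},
    )
    print(device.generate(["t-1", "t"], "forward"))
```

Keeping the modules as injected callables mirrors the text's statement that the encoder may be any of several preset bird's-eye view models.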

In some embodiments, the encoder module 301 includes:

a time-series processing unit, configured to extract the image frames collected by the target vehicle in chronological order to form the historical time-series image data; and

a bird's-eye view processing unit, configured to input the image frames corresponding to different moments in the historical time-series image data in sequence into a preset history encoder to generate the bird's-eye view feature information of the image frames at the different moments, where the preset history encoder includes a bottom-up bird's-eye view model, a top-down bird's-eye view model, a multi-modal bird's-eye view model, or a decoding bird's-eye view model.

In some embodiments, the latent space renderer module 302 includes:

a bird's-eye view processing unit, configured to call a preset latent space renderer to process each frame of the bird's-eye view feature information; and

a geometry extraction unit, configured to extract the occupancy grid features that the preset latent space renderer generates for the bird's-eye view feature information as the three-dimensional geometric representation data.

In some embodiments, the bird's-eye view processing unit is specifically configured to: accumulate the conditional probabilities of the different occupancy grids within the bird's-eye view feature information according to a preset conditional probability function; determine, according to a preset feature expectation function, the ray feature corresponding to the conditional probabilities and the bird's-eye view feature information; and take the weighted product of the ray feature and the conditional probability as the occupancy grid feature.

In some embodiments, the preset conditional probability function includes at least the following:

where the quantity defined by the function denotes the conditional probability for grid i in the bird's-eye view feature information and waypoint j on the ray pointing from the origin to grid i; p denotes the independent occupancy probability of each grid of the bird's-eye view feature information, and the value of p is estimated by a neural network.

The preset feature expectation function includes at least the following:

where the probability term denotes the conditional probability for grid i in the bird's-eye view feature information and waypoint k on the ray pointing from the origin to grid i, and the feature term denotes the bird's-eye view feature information of the k-th image frame.

In other embodiments, the decoder module 303 is specifically configured to: call a preset multi-layer perceptron to determine the ego-vehicle motion conditions at the next moment based on the image frame at the current moment in the historical time-series image data; and pass the bird's-eye view query corresponding to the three-dimensional geometric representation data at the current moment, the occupancy grid features of the three-dimensional geometric representation data at the current moment, and the ego-vehicle motion conditions in sequence through the self-attention module, the temporal cross-attention module, and the feed-forward network for at least one iteration to generate the future point cloud.

The point cloud generation device provided by the embodiments of the present invention can execute the point cloud generation method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.

Embodiment 6

FIG. 7 is a schematic structural diagram of an electronic device implementing an embodiment of the present invention. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices (such as helmets, glasses, and watches), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present invention described and/or claimed herein.

As shown in FIG. 7, the electronic device 10 includes at least one processor 11 and memory communicatively connected to the at least one processor 11, such as a read-only memory (ROM) 12 and a random access memory (RAM) 13. The memory stores a computer program executable by the at least one processor, and the processor 11 can perform various appropriate actions and processing according to the computer program stored in the ROM 12 or loaded from the storage unit 18 into the RAM 13. The RAM 13 can also store various programs and data required for the operation of the electronic device 10. The processor 11, the ROM 12, and the RAM 13 are connected to one another via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.

Multiple components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16, such as a keyboard or mouse; an output unit 17, such as various types of displays and speakers; a storage unit 18, such as a magnetic disk or optical disc; and a communication unit 19, such as a network card, modem, or wireless communication transceiver. The communication unit 19 allows the electronic device 10 to exchange information and data with other devices through a computer network such as the Internet and/or various telecommunication networks.

The processor 11 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Examples of the processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various processors that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, or microcontroller. The processor 11 executes the methods and processes described above, such as the point cloud generation method.

In some embodiments, the point cloud generation method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the point cloud generation method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the point cloud generation method in any other suitable manner (for example, by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be special-purpose or general-purpose, and may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

Computer programs for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the computer program, when executed by the processor, causes the functions and operations specified in the flowcharts and/or block diagrams to be implemented. A computer program may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.

In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by, or in connection with, an instruction execution system, apparatus, or device. A computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide interaction with a user, the systems and techniques described herein may be implemented on an electronic device having: a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (for example, a mouse or trackball) through which the user can provide input to the electronic device. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (for example, visual, auditory, or tactile feedback), and input from the user may be received in any form (including acoustic, voice, or tactile input).

The systems and techniques described herein may be implemented in a computing system that includes a back-end component (for example, a data server), a computing system that includes a middleware component (for example, an application server), a computing system that includes a front-end component (for example, a user computer with a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), a blockchain network, and the Internet.

A computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the difficult management and weak service scalability of traditional physical hosts and VPS services.

It should be understood that steps may be reordered, added, or deleted using the various forms of the flows shown above. For example, the steps described in the present invention may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution of the present invention can be achieved; no limitation is imposed herein.

The above specific implementations do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made according to design requirements and other factors. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.