CN116721418A


Info

Publication number: CN116721418A
Application number: CN202310736368.3A
Authority: CN
Inventors: 李思洋; 郭雪亮; 郭钰; 李强; 刘让; 张丹
Original assignee: Uisee Shanghai Automotive Technologies Ltd
Current assignee: Uisee Shanghai Automotive Technologies Ltd
Priority date: 2023-06-20
Filing date: 2023-06-20
Publication date: 2023-09-08

Abstract


Embodiments of the present disclosure provide a method, apparatus, device, and medium for annotating a 3D detection box of a vehicle. The method includes: establishing an association between pixels of a road area in a first 2D image and world coordinates; determining a pseudo-3D detection box of a vehicle in a second 2D image, where the road area in the first 2D image includes the road area occupied by the vehicle in the second 2D image; determining, according to the association, the world coordinates corresponding to preset feature pixels on the pseudo-3D detection box; and constructing a true 3D detection box of the vehicle based on the world coordinates of the preset feature pixels. The disclosure thereby annotates vehicle 3D detection boxes while improving annotation accuracy and efficiency and reducing cost.

Description


Method, Apparatus, Device, and Medium for Annotating a 3D Detection Box of a Vehicle

Technical Field

The present disclosure relates to the technical field of object detection, and in particular to a method, apparatus, device, and medium for annotating a 3D detection box of a vehicle.

Background

Deep-learning-based monocular visual 3D object detection algorithms generally build a typical encoder-decoder network. End-to-end detection methods feed pre-annotated data, usually the original 2D images together with the 3D detection boxes annotated for the targets, into the network, compute the loss, and train the network by backpropagating gradients, so that the network learns to output the 3D detection boxes of targets from monocular 2D images.

Compared with 2D object detection, 3D object detection focuses on estimating the distance and heading angle of the target. Getting a network to detect a rough 3D detection box of a target in a 2D image is not difficult; the difficulty lies in the accuracy of that box's distance and heading angle. Experience from other deep learning tasks shows that, for the same task under the same conditions, richer data yields a better-performing network model. If large amounts of reasonably high-quality 3D detection box annotations can be generated at low cost, the algorithm gains a better data foundation.

In view of this, the present invention is proposed.

Summary of the Invention

To solve the above technical problem, or at least partially solve it, embodiments of the present disclosure provide a method, apparatus, device, and medium for annotating a 3D detection box of a vehicle, which improve annotation accuracy and efficiency and reduce cost.

In a first aspect, embodiments of the present disclosure provide a method for annotating a 3D detection box of a vehicle. The method includes:

Establishing an association between pixels of a road area in a first 2D image and world coordinates;

Determining a pseudo-3D detection box of a vehicle in a second 2D image, where the road area in the first 2D image includes the road area occupied by the vehicle in the second 2D image;

Determining, according to the association, the world coordinates corresponding to preset feature pixels on the pseudo-3D detection box;

Constructing a true 3D detection box of the vehicle based on the world coordinates of the preset feature pixels.

In a second aspect, embodiments of the present disclosure further provide an apparatus for annotating a 3D detection box of a vehicle. The apparatus includes:

An establishing module, configured to establish an association between pixels of a road area in a first 2D image and world coordinates;

A first determining module, configured to determine a pseudo-3D detection box of a vehicle in a second 2D image, where the road area in the first 2D image includes the road area occupied by the vehicle in the second 2D image;

A second determining module, configured to determine, according to the association, the world coordinates corresponding to preset feature pixels on the pseudo-3D detection box;

A constructing module, configured to construct a true 3D detection box of the vehicle based on the world coordinates of the preset feature pixels.

In a third aspect, embodiments of the present disclosure further provide an electronic device. The electronic device includes: one or more processors; and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the method for annotating a 3D detection box of a vehicle described above.

In a fourth aspect, embodiments of the present disclosure further provide a computer-readable storage medium storing a computer program. When executed by a processor, the program implements the method for annotating a 3D detection box of a vehicle described above.

The method for annotating a 3D detection box of a vehicle provided by embodiments of the present disclosure establishes an association between pixels of a road area in a first 2D image and world coordinates; determines a pseudo-3D detection box of a vehicle in a second 2D image, where the road area in the first 2D image includes the road area occupied by the vehicle in the second 2D image; determines, according to the association, the world coordinates corresponding to preset feature pixels on the pseudo-3D detection box; and constructs a true 3D detection box of the vehicle based on the world coordinates of the preset feature pixels. By these technical means, vehicle 3D detection boxes are annotated, annotation accuracy and efficiency are improved, and cost is reduced.

Brief Description of the Drawings

The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.

Figure 1 is a flowchart of a method for annotating a 3D detection box of a vehicle in an embodiment of the present disclosure;

Figure 2 is a schematic diagram of a marker pole in an embodiment of the present disclosure;

Figure 3 is a schematic diagram of pixels in an embodiment of the present disclosure;

Figure 4 is a schematic diagram including a 2D detection box in an embodiment of the present disclosure;

Figure 5 is a schematic diagram including a pseudo-3D detection box in an embodiment of the present disclosure;

Figure 6 is a schematic diagram with the vehicle's direction of motion marked in an embodiment of the present disclosure;

Figure 7 is a schematic diagram with a contact point between a vehicle tire and the ground marked in an embodiment of the present disclosure;

Figure 8 is a schematic diagram of a pseudo-3D detection box in an embodiment of the present disclosure;

Figure 9 is a schematic structural diagram of an apparatus for annotating a 3D detection box of a vehicle in an embodiment of the present disclosure;

Figure 10 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the disclosure are shown in the drawings, it should be understood that the disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding of the disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit its scope of protection.

Note that terms such as "first" and "second" in this disclosure are used only to distinguish different apparatuses, modules, or units, and are not intended to limit the order of, or the interdependence between, the functions performed by these apparatuses, modules, or units.

The names of messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of those messages or that information.

Deep-learning-based monocular visual 3D object detection algorithms require large amounts of annotated 3D target data. In general, the larger the dataset, the more it helps the training and inference performance of the network, and the better the model generalizes. Traditional 3D target annotation, however, requires precise distance-measuring instruments to obtain accurate sensor pose data and then requires manual annotation afterwards. Such a process entails substantial hardware and labor investment, and the cost grows in proportion to the amount of annotated data. Monocular object detection and segmentation algorithms, by contrast, are now relatively mature. Thanks to large amounts of public annotated data and relatively low data acquisition costs, it is fairly easy to obtain a well-performing monocular object detection model and to annotate the 2D box and contour of a target. The technical solution of the present disclosure is proposed on this basis.

Figure 1 is a flowchart of a method for annotating a 3D detection box of a vehicle in an embodiment of the present disclosure. The method may be performed by an apparatus for annotating a 3D detection box of a vehicle; the apparatus may be implemented in software and/or hardware and may be configured in an electronic device. As shown in Figure 1, the method may include the following steps:

S110. Establish an association between pixels of the road area in a first 2D image and world coordinates.

For example, establishing the association between pixels of the road area in the first 2D image and world coordinates includes:

Photographing, with a camera whose pose is fixed, a marker pole erected vertically at different positions in the road area, to obtain multiple first 2D images, where multiple marker points of different colors are arranged on the marker pole from bottom to top;

Obtaining the world coordinates of a feature marker point through a GPS module associated with the feature marker point, where the feature marker point is any one of the marker points of different colors;

For a current marker point among the multiple marker points of different colors, determining the world coordinates of the current marker point according to the relative position of the current marker point and the feature marker point on the marker pole and the world coordinates of the feature marker point;

Establishing an association between the pixel of each marker point in the corresponding first 2D image and its world coordinates.

Specifically, the camera is a monocular camera. Once its height and orientation have been chosen, the camera is fixed in place, so that its pose is fixed and known throughout shooting; its intrinsic and extrinsic parameters are also known. The camera normally faces the road area to be photographed. After the camera is fixed, the marker pole can be pushed manually so that it moves evenly through the road area. Each time the pole is stationary at a position, the camera photographs it, yielding multiple first 2D images. Alternatively, multiple identical marker poles can be erected vertically and evenly across the road area in advance and photographed with the camera, yielding a single first 2D image that contains all of the poles. In that case the poles should not occlude one another, and each pole should be labeled so that the identical poles can be told apart.

Multiple marker points of different colors are arranged on the marker pole from bottom to top. As shown in Figure 2, these are, from bottom to top, a red marker point 210, a blue marker point 220, a yellow marker point 230, and a green marker point 240. The red marker point 210 can be treated as the feature marker point and given a bound GPS module, through which its world coordinates can be determined. Because the positions of the other marker points relative to the red marker point 210 are fixed and known, the world coordinates of each of the other marker points can be determined from the world coordinates of the red marker point 210 and those relative positions. Each marker point also corresponds to a specific pixel in the first 2D image, so an association between the pixel of each marker point in the first 2D image and its world coordinates can now be established.
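The derivation of the other markers' world coordinates from the feature marker can be sketched as follows. This is a minimal illustration, assuming the GPS reading has already been converted into a local metric frame with z pointing up and that the pole is exactly vertical; the function name `marker_world_coords` and the 0.5 m spacing are hypothetical and do not come from the source.

```python
from typing import Dict, Tuple

World = Tuple[float, float, float]  # (x, y, z) in an assumed local metric frame


def marker_world_coords(feature_world: World,
                        height_offsets: Dict[str, float]) -> Dict[str, World]:
    """Derive each marker's world coordinate from the feature marker.

    The pole is assumed vertical, so every marker shares the feature
    marker's x and y and differs only by a known height offset in z.
    """
    x, y, z = feature_world
    return {name: (x, y, z + dz) for name, dz in height_offsets.items()}


# Hypothetical heights (metres) above the red feature marker.
offsets = {"red": 0.0, "blue": 0.5, "yellow": 1.0, "green": 1.5}
coords = marker_world_coords((10.0, 20.0, 0.25), offsets)
print(coords["green"])  # (10.0, 20.0, 1.75)
```

Pairing each entry of `coords` with the pixel at which that colored marker is detected in the first 2D image gives one pixel-to-world correspondence per marker.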

It will be understood that, in first 2D images obtained this way (the camera is fixed, the marker pole is pushed manually so that it moves evenly through the road area, and the camera photographs the pole each time it is stationary at a position), the different marker points on the pole at a given moment correspond to different pixels in the same first 2D image. Suppose the first 2D image has height H and width W, so that it has H*W pixels and can be regarded as an H*W matrix, and suppose there are N marker points. The matrix storing the positioning information of the marker points can then be an H*W*(N+1) matrix, in which layers 1 through N store the pixel coordinates of the marker points and layer N+1 stores the world coordinates of the feature marker point (for example, the lowest, red marker point on the pole). In particular, the world coordinates of the other marker points need not be stored: the marker points are spaced at fixed intervals on the pole and the pole is always perpendicular to the ground, so the marker points differ only by fixed heights and their world coordinates can be computed from those of the feature marker point.

Further, establishing the association between pixels of the road area in the first 2D image and world coordinates also includes:

Computing, by linear interpolation from the world coordinates of pixels for which the association has already been established, the world coordinates of pixels in the road area for which it has not, and establishing the association between each such pixel and its computed world coordinates.

In other words, once the pixels in the road area of the camera image that already have associated positioning information reach a certain density, the positioning information of the remaining pixels in the road area is computed by linear interpolation from the positioning information of the surrounding associated pixels and then associated with them. For example, as shown in Figure 3, the red squares 310 represent pixels that already have associated positioning information. The remaining pixels can compute their own positioning information from that of the surrounding associated pixels; the positioning information of the yellow square 320, for instance, can be obtained this way. The association between all pixels of the road area and world coordinates is thus obtained.
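One way to realize the linear interpolation step is barycentric interpolation inside a triangle of three already-associated pixels. The sketch below is an assumption about how "surrounding pixels" might be chosen; the source only says that linear interpolation is used. The function name `interpolate_world` is hypothetical.

```python
from typing import Tuple

Pixel = Tuple[float, float]          # (column, row) in the image
World = Tuple[float, float, float]   # (x, y, z) in the world frame


def interpolate_world(p: Pixel,
                      tri_px: Tuple[Pixel, Pixel, Pixel],
                      tri_world: Tuple[World, World, World]) -> World:
    """Linearly interpolate the world coordinate of pixel p from three
    surrounding pixels whose world coordinates are already known."""
    (x1, y1), (x2, y2), (x3, y3) = tri_px
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("the three known pixels are collinear")
    # Barycentric weights of p with respect to the triangle.
    w1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    w2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    w3 = 1.0 - w1 - w2
    # Apply the same weights to each world-coordinate component.
    return tuple(w1 * a + w2 * b + w3 * c for a, b, c in zip(*tri_world))


known_px = ((0.0, 0.0), (10.0, 0.0), (0.0, 10.0))
known_world = ((0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 8.0, 0.0))
print(interpolate_world((5.0, 5.0), known_px, known_world))  # (2.5, 4.0, 0.0)
```

Because a camera projects the road plane perspectively, linear interpolation in pixel space is only a local approximation, which is consistent with the text's requirement that the associated pixels first reach a certain density.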

S120. Determine a pseudo-3D detection box of a vehicle in a second 2D image, where the road area in the first 2D image includes the road area occupied by the vehicle in the second 2D image.

The second 2D image is obtained by photographing the road area with the same camera; specifically, the camera captures the scene when an ordinary road vehicle drives into the road area. The first 2D image and the second 2D image may be the same image, but in that case the marker pole should, as far as possible, not occlude the vehicle, so that the accuracy of the final 3D detection box annotation is preserved and the annotation does not become more complex.

For example, determining the pseudo-3D detection box of the vehicle in the second 2D image includes:

Inputting at least two consecutively captured second 2D images into a vehicle detection model to obtain the 2D detection box of the vehicle in the second 2D images. Figure 4 shows a schematic diagram including a 2D detection box, in which detection box 410 is the 2D detection box of the vehicle.

Inputting the at least two consecutively captured second 2D images into an instance segmentation model to obtain the instance segmentation result of the vehicle in the second 2D images. Besides the vehicle's pixels, the 2D detection box also contains pixels of other objects; instance segmentation of the second 2D image is performed to obtain precisely the pixels that belong to the vehicle.

Determining the vehicle's direction of motion from an optical flow algorithm, a feature point tracking algorithm, and the at least two consecutively captured second 2D images; and determining the pseudo-3D detection box from the direction of motion, the instance segmentation result, and the 2D detection box. Figure 5 shows a schematic diagram including a pseudo-3D detection box, in which detection box 510 is the pseudo-3D detection box of the vehicle. Compared with the 2D detection box, the pseudo-3D detection box adds the contact line between the wheels and the ground.

Note that the vehicle detection model, instance segmentation model, optical flow algorithm, and feature point tracking algorithm mentioned above may all be existing algorithms; the embodiments of the present disclosure do not limit them.
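The last stage of the direction estimate can be sketched once matched feature points are available. The snippet below assumes that some existing tracker (for example a pyramidal Lucas-Kanade tracker, not implemented here) has already produced matched point pairs on the vehicle in two consecutive frames; the function name `horizontal_motion` and the 1-pixel `min_shift` threshold are hypothetical.

```python
from statistics import median
from typing import List, Tuple

Point = Tuple[float, float]  # (column, row) in the image


def horizontal_motion(prev_pts: List[Point], next_pts: List[Point],
                      min_shift: float = 1.0) -> str:
    """Classify image-plane motion as 'left', 'right', or 'static' from
    matched feature points in two consecutive frames."""
    dx = [b[0] - a[0] for a, b in zip(prev_pts, next_pts)]
    if not dx:
        return "static"
    shift = median(dx)  # the median is robust to a few bad matches
    if shift <= -min_shift:
        return "left"
    if shift >= min_shift:
        return "right"
    return "static"


prev_frame = [(100.0, 50.0), (120.0, 60.0), (110.0, 55.0)]
next_frame = [(92.0, 50.0), (113.0, 61.0), (101.0, 55.0)]
print(horizontal_motion(prev_frame, next_frame))  # left
```

A result of "left" corresponds to scanning the segmentation mask from right to left in the steps that follow.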

Further, determining the pseudo-3D detection box from the direction of motion, the instance segmentation result, and the 2D detection box includes:

Determining, by pixel scanning according to the direction of motion and the instance segmentation result, the points where the vehicle's tires contact the ground;

Determining, from the points where the vehicle's tires contact the ground and the 2D detection box, the contact line formed by the parts of the tires in contact with the ground; and generating the pseudo-3D detection box from the contact line and the 2D detection box.

Determining, by pixel scanning according to the direction of motion and the instance segmentation result, the points where the vehicle's tires contact the ground includes:

Taking K rows of pixels upward from the ground in the instance segmentation result to obtain a first pixel submatrix of size K*w1, where K is set according to the total number of rows h1 of the instance segmentation result and w1 is its total number of columns;

Determining the scan direction from the direction of motion;

Scanning the first pixel submatrix column by column in the scan direction and counting the non-zero values in each column;

On first reaching a first target column w2 whose non-zero count equals a threshold, determining the pixel at column w2 in the last row of the first pixel submatrix as the first target point;

Extracting, according to the direction of motion, a second pixel submatrix of size h1*(w1-w2) from the instance segmentation result and scanning it row by row from bottom to top, where each row is scanned in the scan direction while the zero values encountered are counted, the scan of the current row stops at the first non-zero value, and scanning continues with the next row until a target row h2 is reached whose zero count is less than that of the previous row;

Starting from the target row h2 in the instance segmentation result, taking K consecutive rows of pixels upward to obtain a third pixel submatrix of size K*w1;

Scanning the third pixel submatrix column by column in the scan direction and counting the non-zero values in each column, until first reaching a second target column w3 whose non-zero count equals the threshold;

Determining the pixel at column w3 in the last row of the third pixel submatrix as the second target point;

The first target point and the second target point together constitute the points where the vehicle's tires contact the ground.

Determining, from the points where the vehicle's tires contact the ground and the 2D detection box, the contact line formed by the parts of the tires in contact with the ground includes: determining a first straight line through the first target point and the second target point; determining the two intersection points of the first straight line with the 2D detection box; determining a second straight line through the two intersection points; and taking the second straight line as the contact line formed by the parts of the vehicle's tires in contact with the ground.

Specifically, suppose the original image has size (H, W), that is, height H and width W. The output of a pixel-level instance segmentation algorithm is a mask of size (H, W), which can be understood as an (H, W) matrix in which "0" usually denotes regions with no target and "X" denotes the region occupied by the target numbered X. Different targets in the mask have different numbers X so that they can be told apart, and all values within the region of a single target are the same.

After the 2D detection box, the vehicle's direction of motion, and the mask have been obtained, the part of the mask enclosed by the 2D detection box is extracted. Call it mask1, a matrix of size (h1, w1), and set K = 0.05*h1 (this can be increased or decreased according to the size of the 2D detection box). Taking K rows upward from the bottom edge of mask1 gives a matrix of size (K, w1). The vehicle's direction of motion is already known; the yellow arrow 610 in Figure 6 indicates it, and in the 2D image it runs from right to left. The (K, w1) submatrix is therefore scanned column by column from right to left. First, the non-zero values in every column are counted, and n_max denotes the largest of these per-column counts. Then the columns are scanned from right to left again: while the non-zero count of the current column is less than n_max, the scan moves on to the next column. The first time a column is reached whose non-zero count equals n_max, its column number w2 is recorded, together with row number K (rows are numbered 1 to K from top to bottom, so row K is the bottom row). The pixel at row K, column w2 is marked A (the first target point). Its position in the mask is a bottom-edge point of the vehicle's pseudo-3D box in the 2D image, that is, one point where the vehicle's tires contact the ground, shown as point A in Figure 7.
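The scan for point A can be sketched in plain Python. The sketch assumes `mask1` is a list of rows ordered top to bottom and uses 0-based indices, whereas the text counts from 1; the function names and the `ratio` parameter are hypothetical, with 0.05 taken from the text.

```python
from typing import List, Tuple

Mask = List[List[int]]  # rows top to bottom; 0 = background, X = target id


def first_column_at_max(strip: Mask, right_to_left: bool) -> int:
    """Return the 0-based index of the first column, in scan order, whose
    non-zero count equals the largest per-column count (n_max)."""
    width = len(strip[0])
    counts = [sum(1 for row in strip if row[c] != 0) for c in range(width)]
    n_max = max(counts)
    if n_max == 0:
        raise ValueError("strip contains no target pixels")
    order = range(width - 1, -1, -1) if right_to_left else range(width)
    for c in order:
        if counts[c] == n_max:
            return c
    raise ValueError("unreachable: n_max always occurs in counts")


def first_contact_point(mask1: Mask, right_to_left: bool = True,
                        ratio: float = 0.05) -> Tuple[int, int]:
    """Locate point A as (row, column) in mask1, both 0-based."""
    h1 = len(mask1)
    k = max(1, round(ratio * h1))   # K = 0.05 * h1, at least one row
    strip = mask1[h1 - k:]          # bottom K rows of mask1
    w2 = first_column_at_max(strip, right_to_left)
    return h1 - 1, w2               # bottom row, column w2


demo = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
]
# ratio=0.5 only so that this tiny mask yields K = 2 rows.
print(first_contact_point(demo, right_to_left=True, ratio=0.5))  # (3, 2)
```

The `right_to_left` flag would be set from the vehicle's direction of motion, so a vehicle moving left to right is scanned from the left instead.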

Next, the (w1-w2) columns on the right side of mask1 are extracted as a matrix of size (h1, w1-w2) and processed row by row from bottom to top: each row is scanned pixel by pixel from right to left, and the zero values are counted until the first non-zero value is encountered. When the count for a row is less than 1/2 of the count for the previous row, that row number h2 is recorded. The K rows at and above row h2 (including row h2) are then extracted from mask1 as a submatrix of size (K, w1). This submatrix is scanned column by column from right to left in the same way as before: the non-zero values in every column are counted, n_max denotes the largest of these per-column counts, and the scan moves on to the next column while the current column's non-zero count is less than n_max. The first time a column is reached whose non-zero count equals n_max, its column number w3 is recorded, with row number h2. The pixel at row h2, column w3 is the second target point, another point where the vehicle's tires contact the ground.
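The row scan that finds h2 can be sketched as follows, again with 0-based indices and `mask1` stored as a list of rows from top to bottom. The function names are hypothetical; the halving criterion follows the text. Once h2 is known, the column scan for n_max is repeated on the K rows ending at h2 to obtain w3.

```python
from typing import List, Optional

Mask = List[List[int]]  # rows top to bottom; 0 = background


def leading_zeros(row: List[int], right_to_left: bool) -> int:
    """Count zeros from the scan side until the first non-zero value."""
    seq = reversed(row) if right_to_left else row
    n = 0
    for v in seq:
        if v != 0:
            break
        n += 1
    return n


def find_row_h2(mask1: Mask, w2: int,
                right_to_left: bool = True) -> Optional[int]:
    """Scan rows bottom-up over the columns on the scan side of w2 and
    return the 0-based index of the first row whose leading-zero count is
    less than half that of the row below it, or None if there is none."""
    prev = None
    for r in range(len(mask1) - 1, -1, -1):
        row = mask1[r]
        part = row[w2 + 1:] if right_to_left else row[:w2]
        n = leading_zeros(part, right_to_left)
        if prev is not None and n < prev / 2:
            return r
        prev = n
    return None


demo = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
]
print(find_row_h2(demo, w2=2))  # 2
```

In the demo, the leading-zero counts from the bottom row upward are 5, 4, 3, 1, so the drop from 3 to 1 is the first that falls below half and row index 2 is returned.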

The first target point and the second target point define a straight line, and the intersection of this line with the side of the 2D detection frame is point B in Figure 8. The four points O(x1, y2), A(x3, y2), B(x2, y3), and C(x3, y1) determine the vehicle's pseudo 3D detection frame, and the rectangle 810 determined by the points TL(x1, y1), O(x1, y2), and RB(x2, y2) represents the vehicle's 2D detection frame.
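The construction of points O, A, B, and C can be illustrated with a short sketch. It assumes image coordinates with y increasing downward, a 2D detection frame given as (x1, y1, x2, y2), point A lying on the bottom edge (so its y coordinate equals y2), and B lying on the side x = x2; the function name `pseudo_3d_points` is made up for the example.

```python
def pseudo_3d_points(box, a, p2):
    """Return the pixels O, A, B, C that define the pseudo 3D frame.

    box = (x1, y1, x2, y2) with (x1, y1) the top-left corner;
    a and p2 are the first and second target points as (x, y) pixels.
    """
    x1, y1, x2, y2 = box
    x3 = a[0]
    if p2[0] == a[0]:
        raise ValueError("line through the two target points is vertical")

    # Extend the line through A and the second target point to x = x2.
    slope = (p2[1] - a[1]) / (p2[0] - a[0])
    y3 = a[1] + slope * (x2 - x3)

    return {"O": (x1, y2), "A": (x3, y2), "B": (x2, y3), "C": (x3, y1)}
```

For a frame (0, 0, 10, 7) with A = (4, 7) and a second target point (8, 5), the line has slope -0.5 and meets the right side at B = (10, 4).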

S130. Determine, according to the association, the world coordinates corresponding to preset feature pixels on the pseudo 3D detection frame.

The preset feature pixels on the pseudo 3D detection frame may be the pixel points O(x1, y2), A(x3, y2), B(x2, y3), and C(x3, y1) described above. These four points uniquely determine the vehicle's pseudo 3D detection frame. The world coordinates of each of the four points can be determined from the association, and the vehicle's true 3D detection frame can then be obtained from them.

S140. Construct the true 3D detection frame of the vehicle based on the world coordinates of the preset feature pixels.
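One possible reading of steps S130 and S140 is sketched below. It is an assumption, not the construction stated in the disclosure: the association is modeled as a plain dictionary from pixel to world coordinate, A is treated as a corner shared by three edges A-O, A-B, and A-C, and the eight corners of the frame are generated from those three edge vectors. How the association resolves a pixel that is not on the road surface (such as C) is not spelled out in this passage, so the lookup is simply taken as given. The name `true_3d_box` is hypothetical.

```python
from itertools import product


def true_3d_box(pixel_to_world, pts):
    """Return eight world-space corners built from pixels O, A, B, C.

    pixel_to_world: dict mapping (x, y) pixels to (X, Y, Z) world points
    (a stand-in for the pixel-to-world association).
    pts: dict with keys "O", "A", "B", "C" holding pixel coordinates.
    """
    # S130: look up the world coordinates of the four feature pixels.
    o, a, b, c = (pixel_to_world[pts[name]] for name in ("O", "A", "B", "C"))

    # Assumed interpretation: three edges of the frame meet at corner A.
    edges = [tuple(p[i] - a[i] for i in range(3)) for p in (o, b, c)]

    # S140: every corner is A plus a subset of the three edge vectors.
    corners = []
    for picks in product((0, 1), repeat=3):
        corners.append(tuple(
            a[i] + sum(picks[j] * edges[j][i] for j in range(3))
            for i in range(3)
        ))
    return corners
```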

The method for labeling a vehicle 3D detection frame provided by the embodiments of the present disclosure uses a fixed camera and sign poles to associate pixels of the road area in a 2D image with world coordinates. A 2D object detection model and a 2D instance segmentation model then yield the vehicle's 2D detection frame and segmentation result, from which the vehicle's pseudo 3D detection frame is determined. Finally, geometric conversion constructs the vehicle's true 3D detection frame in the coordinate system, so that 3D detection frames are labeled automatically without manual involvement. The only cost is a one-time investment, with no subsequent labeling cost, and a large amount of labeled data can be obtained in a short time, giving the method the advantages of high efficiency and low cost.

An embodiment of the present disclosure further provides a device for labeling a vehicle 3D detection frame. As shown in Figure 9, the device includes: an establishment module 910, configured to establish an association between pixels of the road area in a first 2D image and world coordinates; a first determination module 920, configured to determine a pseudo 3D detection frame of a vehicle in a second 2D image, where the road area in the first 2D image includes the road area occupied by the vehicle in the second 2D image; a second determination module 930, configured to determine, according to the association, the world coordinates corresponding to preset feature pixels on the pseudo 3D detection frame; and a construction module 940, configured to construct the true 3D detection frame of the vehicle based on the world coordinates of the preset feature pixels.

Further, the establishment module 910 includes:

a shooting unit, configured to photograph, with a camera whose pose is fixed, sign poles erected vertically at different positions in the road area, obtaining a plurality of the first 2D images, where each sign pole carries a plurality of marker points of different colors arranged from low to high; an acquisition unit, configured to obtain the world coordinates of a feature marker point through a GPS module associated with that feature marker point, the feature marker point being any one of the marker points of different colors; a determination unit, configured to determine, for a current marker point among the plurality of marker points of different colors, the world coordinates of the current marker point from its position on the sign pole relative to the feature marker point and from the world coordinates of the feature marker point; and a first establishment unit, configured to establish the association between the pixel of each marker point in the corresponding first 2D image and its world coordinates.
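The determination unit's calculation can be illustrated with a small sketch. It assumes that the world coordinates are expressed in a local metric frame whose z axis points up, so that markers on a vertical pole share x and y and differ only in z; the function name `marker_world_coords`, the color keys, and the heights are invented for the example.

```python
def marker_world_coords(feature_xyz, feature_height, marker_heights):
    """World coordinates of every marker point on one vertical sign pole.

    feature_xyz: (x, y, z) of the feature marker point, e.g. from the GPS
    module, assumed to be in a local metric frame with z pointing up.
    feature_height: height of the feature marker along the pole, in meters.
    marker_heights: dict mapping a marker color to its height on the pole.
    """
    x, y, z = feature_xyz
    return {
        color: (x, y, z + (height - feature_height))
        for color, height in marker_heights.items()
    }
```

For example, with the feature marker at (10.0, 20.0, 1.0) mounted 1.0 m up the pole, a marker 2.0 m up the same pole is placed at (10.0, 20.0, 2.0).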

Further, the establishment module 910 also includes:

a second establishment unit, configured to calculate, by linear interpolation from the world coordinates of pixels for which the association has been established, the world coordinates of pixels in the road area for which the association has not been established, and to establish the association between the calculated world coordinates and the corresponding pixels.
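A minimal sketch of the interpolation step, assuming the simplest case in which the unknown pixel lies on the image segment between two pixels whose world coordinates are already known. The function name `interpolate_world` is hypothetical, and the disclosure does not state which known pixels are paired. Note that, because of perspective projection, interpolating linearly in pixel space is only an approximation of the true ground position, which is one reason to place the sign poles densely.

```python
def interpolate_world(pix_a, world_a, pix_b, world_b, pix):
    """Linearly interpolate the world coordinate of pix between a and b.

    pix_a, pix_b: pixels with known world coordinates world_a, world_b.
    pix: a pixel on (or near) the image segment from pix_a to pix_b.
    """
    dx, dy = pix_b[0] - pix_a[0], pix_b[1] - pix_a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        raise ValueError("the two reference pixels must be distinct")

    # Fraction of the way from pix_a to pix_b (projection onto the segment).
    t = ((pix[0] - pix_a[0]) * dx + (pix[1] - pix_a[1]) * dy) / length_sq

    return tuple(wa + t * (wb - wa) for wa, wb in zip(world_a, world_b))
```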

Further, the first determination module 920 includes:

a detection unit, configured to input at least two consecutively captured second 2D images into a vehicle detection model to obtain the 2D detection frame of the vehicle in the second 2D images; a segmentation unit, configured to input the at least two consecutively captured second 2D images into an instance segmentation model to obtain the instance segmentation result of the vehicle in the second 2D images; a first determination unit, configured to determine the direction of motion of the vehicle according to an optical flow algorithm, a feature point tracking algorithm, and the at least two consecutively captured second 2D images; and a second determination unit, configured to determine the pseudo 3D detection frame according to the direction of motion, the instance segmentation result, and the 2D detection frame.
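The last step of the first determination unit, turning tracked feature points into a direction of motion, might look like the sketch below. The tracking itself (optical flow or feature matching between the two frames) is assumed to have been done already by some other component and is not shown; the function name `motion_direction` and the returned labels are illustrative only.

```python
def motion_direction(points_prev, points_next):
    """Classify horizontal image motion from matched feature points.

    points_prev[i] and points_next[i] are the (x, y) pixel positions of
    the same vehicle feature point in two consecutive frames.
    """
    if not points_prev or len(points_prev) != len(points_next):
        raise ValueError("need two non-empty point lists of equal length")

    # Mean horizontal displacement of the tracked points.
    mean_dx = sum(
        q[0] - p[0] for p, q in zip(points_prev, points_next)
    ) / len(points_prev)

    if mean_dx < 0:
        return "right_to_left"
    if mean_dx > 0:
        return "left_to_right"
    return "stationary"
```

Averaging over several points makes the estimate less sensitive to a single bad match; a production system would likely also reject outliers, which this sketch omits.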

Further, the second determination unit includes:

a first determination subunit, configured to determine, by pixel scanning according to the direction of motion and the instance segmentation result, the points where the tires of the vehicle contact the ground; a second determination subunit, configured to determine, according to those contact points and the 2D detection frame, the contact line formed by the portions of the vehicle's tires that contact the ground; and a generation subunit, configured to generate the pseudo 3D detection frame based on the contact line and the 2D detection frame.

Further, the first determination subunit is specifically configured to:

in the instance segmentation result, take K rows of pixels upward from the ground to obtain a first pixel sub-matrix of size K*w1, where K is set according to the total number of rows h1 of the instance segmentation result and w1 is its total number of columns; determine a scanning direction according to the direction of motion; scan the first pixel sub-matrix column by column along the scanning direction, counting the non-zero values in each column; when a first target column w2 whose number of non-zero values equals a threshold is scanned for the first time, determine the pixel at column w2 in the last row of the first pixel sub-matrix as the first target point; extract, according to the direction of motion, a second pixel sub-matrix of size h1*(w1-w2) from the instance segmentation result, and scan it row by row from bottom to top, scanning each row along the scanning direction while counting the 0 values encountered, stopping the scan of the current row at the first non-zero value and moving on to the next row, until a target row h2 is reached whose number of 0 values is less than that of the previous row; starting from the target row h2 in the instance segmentation result, take K consecutive rows of pixels upward to obtain a third pixel sub-matrix of size K*w1; scan the third pixel sub-matrix column by column along the scanning direction, counting the non-zero values in each column, until a second target column w3 whose number of non-zero values equals the threshold is scanned for the first time, and determine the pixel at column w3 in the last row of the third pixel sub-matrix as the second target point; the first target point and the second target point constitute the points where the tires of the vehicle contact the ground.

Further, the second determination subunit is specifically configured to:

determine a first straight line formed by the first target point and the second target point; determine the two intersection points of the first straight line with the 2D detection frame; determine a second straight line formed by the two intersection points; and determine the second straight line as the contact line formed by the portions of the vehicle's tires that contact the ground.
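The two intersection points can be found with a simple line-rectangle test, sketched below. The 2D detection frame is assumed to be axis-aligned and given as (x1, y1, x2, y2) in image coordinates, the two target points are assumed to be distinct, and the function name `contact_line` is hypothetical. Duplicate hits at a corner are removed by exact comparison, which is adequate for an illustration but would need a tolerance with noisy floating-point input.

```python
def contact_line(box, p, q):
    """Return the points where the line through p and q crosses the box.

    box = (x1, y1, x2, y2); p and q are the two target points as (x, y).
    The result is sorted by x and normally contains two points.
    """
    x1, y1, x2, y2 = box
    (px, py), (qx, qy) = p, q
    dx, dy = qx - px, qy - py

    hits = []
    if dx != 0:
        for xe in (x1, x2):              # left and right sides
            y = py + (xe - px) / dx * dy
            if y1 <= y <= y2 and (xe, y) not in hits:
                hits.append((xe, y))
    if dy != 0:
        for ye in (y1, y2):              # top and bottom sides
            x = px + (ye - py) / dy * dx
            if x1 <= x <= x2 and (x, ye) not in hits:
                hits.append((x, ye))
    return sorted(hits)
```

For a frame (0, 0, 10, 7) and target points (4, 7) and (8, 5), the line leaves the frame at (4, 7) on the bottom edge and (10, 4) on the right side; the segment between them is the contact line.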

The device for labeling a vehicle 3D detection frame provided by the embodiments of the present disclosure can perform the steps of the method for labeling a vehicle 3D detection frame provided by the method embodiments of the present disclosure and achieves the same beneficial effects, which are not repeated here.

Figure 10 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure. Referring to Figure 10, it shows the structure of an electronic device 500 suitable for implementing the embodiments of the present disclosure. The electronic device shown in Figure 10 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.

As shown in Figure 10, the electronic device 500 may include a processing device 501 (for example, a central processing unit or a graphics processor), which can perform various appropriate actions and processing, according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503, to implement the methods of the embodiments described in the present disclosure. The RAM 503 also stores the various programs and data required for the operation of the electronic device 500. The processing device 501, the ROM 502, and the RAM 503 are connected to one another via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowchart, thereby implementing the method for labeling a vehicle 3D detection frame described above. In such embodiments, the computer program may be downloaded and installed from a network via a communication device 509, installed from the storage device 508, or installed from the ROM 502. When the computer program is executed by the processing device 501, the functions defined in the methods of the embodiments of the present disclosure are performed.

It should be noted that the computer-readable medium described above may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave and carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: electrical wire, optical cable, RF (radio frequency), or any suitable combination of the above.

The computer-readable medium may be included in the electronic device described above, or it may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: establish an association between pixels of the road area in a first 2D image and world coordinates; determine a pseudo 3D detection frame of a vehicle in a second 2D image, where the road area in the first 2D image includes the road area occupied by the vehicle in the second 2D image; determine, according to the association, the world coordinates corresponding to preset feature pixels on the pseudo 3D detection frame; and construct the true 3D detection frame of the vehicle based on the world coordinates of the preset feature pixels.

Optionally, when the one or more programs are executed by the electronic device, the electronic device may also perform the other steps described in the above embodiments.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

The above is only a description of preferred embodiments of the present disclosure and of the technical principles applied. Those skilled in the art should understand that the scope of disclosure involved herein is not limited to technical solutions formed by the specific combinations of the technical features above; it also covers other technical solutions formed by any combination of those technical features or their equivalents without departing from the disclosed concept, for example, technical solutions formed by interchanging the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.