Disclosure of Invention
An embodiment of the invention provides a three-dimensional target detection method based on a multi-level cross-modal self-attention mechanism, which is used to effectively detect the categories, positions and poses of three-dimensional objects in a two-dimensional RGB image.
In order to achieve the above purpose, the present invention adopts the following technical scheme.
A three-dimensional target detection method based on a multi-level cross-modal self-attention mechanism comprises the following steps:
constructing training set and testing set data by utilizing RGB image data;
constructing a three-dimensional target detection model, wherein the three-dimensional target detection model comprises an RGB backbone network, a depth backbone network, a classifier and a regressor;
training the three-dimensional target detection model by using the training set and the testing set data, verifying the training effect of the three-dimensional target detection model by using the testing set, respectively acquiring RGB features and depth features through the RGB backbone network and the depth backbone network, inputting the RGB features and the depth features into a cross-modal self-attention learning module to update the RGB features, and learning a classifier and a regressor with the updated RGB features to obtain a trained three-dimensional target detection model;
and detecting the category, the position and the pose of the three-dimensional object in the two-dimensional RGB image to be identified by using the classifier and the regressor in the trained three-dimensional target detection model.
Preferably, the constructing training set and test set data using RGB image data includes:
The method comprises the steps of collecting RGB images, dividing the RGB images into a training set and a testing set at a ratio of about 1:1, carrying out normalization processing on the image data in the training set and the testing set, obtaining two-dimensional depth images for the training set images through a depth estimation algorithm, labeling the categories of the objects in the training set images, and labeling the coordinates of the two-dimensional detection frames as well as the center position, size and rotation angle of the three-dimensional detection frames.
Preferably, the RGB backbone network, the depth backbone network, the classifier and the regressor in the three-dimensional target detection model each comprise convolution layers, full connection layers and normalization layers; the structures of the RGB backbone network and the depth backbone network are identical, each comprising 4 convolution modules.
Preferably, the training of the three-dimensional target detection model by using the training set and the testing set data, in which the RGB backbone network and the depth backbone network respectively obtain RGB features and depth features, the RGB features and the depth features are input into a cross-modal self-attention learning module to update the RGB features, and a classifier and a regressor are learned with the updated RGB features to obtain a trained three-dimensional target detection model, includes:
Step S3-1, initializing the parameters in the convolution layers, full connection layers and normalization layers contained in the RGB backbone network, the depth backbone network, the classifier and the regressor of the three-dimensional target detection model;
Step S3-2, setting the relevant training parameters of the stochastic gradient descent algorithm, wherein the relevant training parameters comprise the learning rate, momentum, batch size and number of iterations;
Step S3-3, for any iteration batch, respectively inputting all RGB images and depth images into the RGB backbone network and the depth backbone network to obtain multi-level RGB features and depth features, constructing a cross-modal self-attention learning module, inputting the RGB features and the depth features into the cross-modal self-attention learning module, learning a self-attention matrix based on depth information, updating the RGB features through the self-attention matrix, learning a classifier and a regressor with the updated RGB features, and using the classifier and the regressor for target detection of three-dimensional objects in a two-dimensional RGB image,
obtaining objective function values by calculating the errors between the network estimates and the actual labeled values, where three objective function values are calculated by formulas (1), (2) and (3) respectively;
wherein s_i and p_i in formula (1) are the class label and the estimated class probability of the i-th target, the quantities in formula (2) and formula (3) represent the two-dimensional and three-dimensional estimated boxes of the i-th target respectively, gt denotes the actual labeled value, and N denotes the total number of targets;
Step S3-4, adding the three objective function values to obtain an overall objective function value, calculating the partial derivatives with respect to all parameters in the three-dimensional target detection model, and updating the parameters by the stochastic gradient descent method;
Step S3-5, repeating step S3-3 and step S3-4 to continuously update the parameters of the three-dimensional target detection model until convergence, and outputting the trained parameters of the three-dimensional target detection model.
Preferably, the inputting of the RGB features and the depth features into the cross-modal self-attention learning module to update the RGB features, and obtaining a trained three-dimensional target detection model by learning a classifier and a regressor with the updated RGB features, includes:
for any two-dimensional RGB feature map R and two-dimensional depth feature map D, assuming the dimensions are C×H×W, where C, H and W are the channel dimension, height and width respectively, the two-dimensional RGB feature map R and the two-dimensional depth feature map D are represented as sets of N C-dimensional features, R = [r_1, r_2, ..., r_N]^T and D = [d_1, d_2, ..., d_N]^T, where N = H×W;
for the input feature map R, a fully connected graph is constructed, wherein each feature r_i is taken as a node and the edge (r_i, r_j) represents the relation between nodes r_i and r_j; the edges are learned from the two-dimensional depth feature map D, and the current two-dimensional RGB feature map R is updated, specifically expressed as:

$$\hat{r}_i = \frac{1}{\mathcal{C}(r)} \sum_{j} \delta\big(d_\theta(i)^{T} d_\phi(j)\big)\, r_g(j)$$

where 1/C(r) is the normalization parameter, δ is the softmax function, j enumerates all positions associated with i, and r̂_i is the updated RGB feature; the above formula is written in the form of matrix multiplication:

$$\hat{R} = A(D)\, R_g, \qquad A(D) = \delta\big(D_\theta D_\phi^{T}\big)$$

where A(D) is the self-attention matrix, and the dimensions of D_θ, D_φ and R_g are all N×C';
taking the feature vector r_i of each spatial position as a node, nodes associated with r_i are searched in the whole spatial region, and for any node i in the depth feature map, S representative features are sampled from all nodes associated with i:

$$s(n) = \mathcal{F}_s(d;\, i, n), \quad n = 1, \dots, S$$

where s(n) is a sampled feature vector of dimension C' and F_s is the sampling function; the cross-modal self-attention learning module is expressed as:

$$\hat{r}_i = \sum_{n=1}^{S} \delta\big(d_\theta(i)^{T} s_\phi(n)\big)\, s_g(n)$$

where n indexes the sampled nodes associated with i, δ is the softmax function, d_θ(i) = W_θ d(i), s_φ(n) = W_φ s(n), s_g(n) = W_g s(n), and W_θ, W_φ and W_g are three linear transformation matrices.
According to the technical scheme provided by the embodiment of the invention, a multi-level cross-modal self-attention learning mechanism for three-dimensional target detection is provided: depth structure information over the global scene range is obtained from the depth feature map and organically combined with appearance information to improve the accuracy of the three-dimensional target detection algorithm. Meanwhile, various strategies are adopted to reduce the operation complexity so as to meet the processing-speed requirements of unmanned-driving scenarios and the like.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the present invention and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the purpose of facilitating an understanding of the embodiments of the invention, reference will now be made to several specific embodiments illustrated in the accompanying drawings; these embodiments should in no way be taken to limit the embodiments of the invention.
Based on the main defects of current three-dimensional target detection algorithms, the method acquires depth information through a two-dimensional depth map and formulates the use of the depth information as a cross-modal self-attention learning problem. The depth information and the appearance information are combined through the cross-modal self-attention mechanism, and the depth information is extracted over the global range in a non-iterative manner through the self-attention mechanism, so that the detection precision is improved. When acquiring the depth information, the method also uses several measures to further reduce the operation complexity, ensuring that it can be used in scenarios with real-time processing requirements such as automatic driving.
The invention provides a three-dimensional target detection method based on a multi-level cross-modal self-attention mechanism, which takes a two-dimensional RGB image and a depth image as input and combines the appearance information acquired from the two-dimensional RGB image with the structural information acquired from the depth image through the self-attention mechanism, so as to achieve accurate detection results while avoiding the high computational cost caused by point cloud processing. In addition, since the self-attention mechanism acquires a large amount of redundant information while acquiring global structural information, the method adopts an improved self-attention mechanism: for a given region, structural information is calculated only for the globally most correlated regions, which further reduces the computation on the premise of ensuring detection precision.
The three-dimensional target detection method based on the multi-level cross-modal self-attention mechanism comprises the following processing procedures:
Data set construction, which comprises constructing the training set and testing set of the three-dimensional target detection model, specifically including collecting RGB images used for training and testing, extracting the depth information corresponding to the training set images through a depth model, labeling the category, two-dimensional coordinates, three-dimensional coordinates, depth, size and the like of the objects in the training images, and pre-processing the image data.
Three-dimensional target detection model construction, which comprises constructing a three-dimensional target detection model based on a convolutional neural network, specifically including the construction of the RGB image feature extraction network, the depth image feature extraction network and the cross-modal self-attention learning network.
Three-dimensional target detection model training, which comprises updating the parameters of the three-dimensional target detection model until convergence by calculating the two-dimensional target detection, three-dimensional target detection, classification and regression loss functions and applying the stochastic gradient descent algorithm.
Three-dimensional object detection, which comprises providing a color image or video frame and detecting the three-dimensional objects therein.
The processing flow chart of the three-dimensional target detection method based on the multi-level cross-modal self-attention mechanism provided by the embodiment of the invention is shown in fig. 1, and comprises the following steps:
Step S1, constructing a training set and a testing set. RGB images are collected and split into a training set and a testing set at a ratio of about 1:1. Because the three-dimensional target detection method provided by the embodiment of the invention acquires depth information through two-dimensional depth images instead of the point cloud data adopted by traditional methods, a two-dimensional depth image is obtained through a depth estimation algorithm for each color image in the training set. In addition, the objects in the training set images are labeled with their categories, the coordinates of the two-dimensional detection frame, and the center position, size and rotation angle of the three-dimensional detection frame. Finally, normalization processing is carried out on the image data in the training set and the testing set.
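For illustration, a minimal sketch of the preprocessing in step S1 is given below. The normalization statistics, split seed and placeholder depth estimator are assumptions, not values fixed by the method; the text only specifies normalization, an approximately 1:1 split, and a depth estimation algorithm.

```python
# Illustrative sketch of step S1; mean/std values, the random seed and the
# placeholder depth estimator are assumptions.
import random
import numpy as np

def normalize_image(img, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Scale an HxWx3 uint8 RGB image to [0, 1] and normalize per channel."""
    img = img.astype(np.float32) / 255.0
    return (img - np.asarray(mean, dtype=np.float32)) / np.asarray(std, dtype=np.float32)

def split_dataset(samples, ratio=0.5, seed=0):
    """Split the collected RGB images into training and testing sets (about 1:1)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]

def estimate_depth(rgb_image, depth_model):
    """Placeholder for the depth estimation algorithm applied to training images."""
    return depth_model(rgb_image)  # any monocular depth estimator can be plugged in here
```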
Step S2, after the training set and the testing set are obtained, a three-dimensional target detection model is constructed, which comprises an RGB backbone network, a depth backbone network, a classifier and a regressor. The structure of the three-dimensional target detection model provided by the embodiment of the invention is shown in fig. 2. Since features need to be extracted separately for the RGB images and the depth images during training, two feature extraction backbone networks are constructed. In the embodiment of the invention, the RGB backbone network and the depth backbone network have the same structure and each comprise 4 convolution modules for extracting multi-level features.
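A minimal PyTorch sketch of the model structure in step S2 follows for illustration. The channel widths, strides and head output sizes are assumptions; the text only fixes two structurally identical backbones of 4 convolution modules plus a classifier and a regressor.

```python
import torch.nn as nn

def conv_module(c_in, c_out):
    # one "convolution module": convolution + normalization + activation
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class Backbone(nn.Module):
    """Four convolution modules; returns the multi-level feature maps."""
    def __init__(self, c_in, widths=(64, 128, 256, 512)):
        super().__init__()
        chans = [c_in] + list(widths)
        self.stages = nn.ModuleList([conv_module(chans[i], chans[i + 1]) for i in range(4)])

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats

class Detector(nn.Module):
    def __init__(self, num_classes=3, box_dim=4 + 7):  # 2D box + 3D center/size/angle (assumed layout)
        super().__init__()
        self.rgb_backbone = Backbone(3)    # RGB backbone network
        self.depth_backbone = Backbone(1)  # depth backbone network, same structure
        self.classifier = nn.Conv2d(512, num_classes, kernel_size=1)
        self.regressor = nn.Conv2d(512, box_dim, kernel_size=1)
```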
And step S3, training a three-dimensional target detection model. After the three-dimensional target detection model is built, the model can be trained through the training set obtained in the step S1, and the training effect of the three-dimensional target detection model is verified through the testing set. The training flow chart of the three-dimensional target detection model provided by the embodiment of the invention is shown in fig. 3, and specifically comprises the following steps:
Step S3-1, initializing the model parameters, which comprise the parameters in the convolution layers, full connection layers and normalization layers contained in the RGB backbone network, the depth backbone network, the classifier and the regressor.
Step S3-2, setting the training parameters. The three-dimensional target detection model of the embodiment of the invention is trained with SGD (stochastic gradient descent); the relevant training parameters, including the learning rate, momentum, batch size and number of iterations, need to be set before training.
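As a sketch of step S3-2, the SGD hyper-parameters might be set as below; the concrete numbers are placeholders, not values specified in the text, and Detector refers to the model sketch from step S2 above.

```python
import torch

model = Detector()                                                       # sketch model from step S2
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)   # learning rate, momentum
batch_size = 8                                                           # batch size per iteration
num_iterations = 40000                                                   # number of training iterations
```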
Step S3-3, calculating the objective function values. For any iteration batch, all RGB images and depth images are respectively input into the RGB backbone network and the depth backbone network to obtain multi-level features; updated RGB features are obtained through the cross-modal self-attention learning module, and the estimated category, position, pose and depth value of the target object are then obtained through the classifier and the regressor. Finally, the objective function values are obtained by calculating the errors between the network estimates and the actual labeled values. Three objective function values, given by formulas (1), (2) and (3), are calculated in training the model, where s_i and p_i in formula (1) are the class label and the estimated class probability of the i-th target, the quantities in formula (2) and formula (3) represent the two-dimensional and three-dimensional estimated boxes of the i-th target respectively, gt denotes the actual labeled value, and N denotes the total number of targets.
Step S3-4, adding the objective function values to obtain the overall objective function value, calculating the partial derivatives with respect to all parameters in the model, and updating the parameters by the stochastic gradient descent method.
Step S3-5, repeating step S3-3 and step S3-4, continuously updating the model parameters until convergence, and finally outputting the model parameters.
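A minimal sketch of one training iteration (steps S3-3 and S3-4) is shown below. The cross-entropy and L1 terms merely stand in for formulas (1)-(3), whose exact forms are those given by the patent's formulas, and the cross_modal_attention attribute is an assumed interface of the detector sketched above.

```python
import torch.nn.functional as F

def train_step(model, optimizer, rgb, depth, targets):
    rgb_feat = model.rgb_backbone(rgb)[-1]          # multi-level RGB features (last level shown)
    depth_feat = model.depth_backbone(depth)[-1]    # multi-level depth features
    fused = model.cross_modal_attention(rgb_feat, depth_feat)  # updated RGB features (assumed attribute)
    cls_logits = model.classifier(fused)
    boxes = model.regressor(fused)

    loss_cls = F.cross_entropy(cls_logits, targets["cls"])   # stands in for formula (1)
    loss_2d = F.l1_loss(boxes[:, :4], targets["box2d"])      # stands in for formula (2)
    loss_3d = F.l1_loss(boxes[:, 4:], targets["box3d"])      # stands in for formula (3)
    loss = loss_cls + loss_2d + loss_3d                      # overall objective of step S3-4

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                         # stochastic gradient descent update
    return loss.item()
```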
At this point, all parameters of the three-dimensional target detection model in the embodiment of the invention have been obtained, and what remains is to detect the objects in the two-dimensional images provided by the user.
Step S4, after the multi-level RGB features and depth features are acquired, a cross-modal self-attention learning module is constructed. Taking the RGB features and the depth features as inputs simultaneously, the module learns a self-attention matrix based on the depth information and updates the RGB features through the self-attention matrix to increase the structural information in the RGB features. Finally, the updated RGB features are used to learn a classifier and a regressor for target detection of three-dimensional objects in the two-dimensional RGB image: the classifier identifies the category of a three-dimensional object, and the regressor identifies its position and pose.
The three-dimensional target detection model in the embodiment of the invention comprises an RGB backbone network, a depth backbone network, a classifier and a regressor. After training is finished, the RGB backbone network has retained the depth structure information learned through the cross-modal self-attention learning module. At test time, only the two-dimensional RGB image needs to be provided, and depth features do not need to be extracted by the depth backbone network.
The cross-modal self-attention learning module provided by the embodiment of the invention can acquire depth structure information by learning from the depth map and embed it into the RGB image features, thereby improving the accuracy of three-dimensional target detection. A detailed description follows.
The structural flow chart of the cross-modal self-attention learning module provided by the embodiment of the invention is shown in fig. 4; it mainly comprises four sub-modules, namely a sampling point generation module, a multi-level attention learning module, an information updating module and an information fusion module. The core idea of the construction is that a self-attention matrix based on depth information is learned from the multi-level depth feature maps; this self-attention matrix reflects the structural similarity between different positions over the global image range, and the RGB feature map is updated through the self-attention matrix so as to obtain structural features over the global image range, finally improving the accuracy of three-dimensional target detection. Two levels of depth feature maps are shown in fig. 4 as an example; in actual operation this may be extended to multi-level depth features.
For any two-dimensional RGB feature map R and two-dimensional depth feature map D, assume the dimensions are C×H×W, where C, H and W are the channel dimension, height and width, respectively. Both the two-dimensional RGB feature map R and the two-dimensional depth feature map D can be represented as a set of N C-dimensional features, R = [r_1, r_2, ..., r_N]^T and D = [d_1, d_2, ..., d_N]^T, where N = H×W. For the input feature map R, a fully connected graph is constructed in which each feature r_i serves as a node and the edge (r_i, r_j) represents the relation between nodes r_i and r_j. In the two-dimensional RGB feature map R, appearance features such as color and texture are rich, while structural information such as depth is lacking. The cross-modal self-attention learning module in the embodiment of the invention therefore learns the edges from the two-dimensional depth feature map D and then updates the current two-dimensional RGB feature map R to add structural features, which can be expressed as follows:

$$\hat{r}_i = \frac{1}{\mathcal{C}(r)} \sum_{j} \delta\big(d_\theta(i)^{T} d_\phi(j)\big)\, r_g(j) \qquad (4)$$

where 1/C(r) is the normalization parameter, δ is the softmax function, j enumerates all positions associated with i, and r̂_i is the updated RGB feature. We can further write the above formula in the form of matrix multiplication:

$$\hat{R} = A(D)\, R_g, \qquad A(D) = \delta\big(D_\theta D_\phi^{T}\big) \qquad (5)$$

where A(D) is the self-attention matrix, and the dimensions of D_θ, D_φ and R_g are all N×C'.
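The matrix form of formula (5) maps directly onto a few tensor operations. The sketch below is illustrative only, assuming the feature maps have already been flattened to N×C matrices and that W_theta, W_phi and W_g are the C×C' projection matrices defined above.

```python
import torch
import torch.nn.functional as F

def dense_cross_modal_attention(R, D, W_theta, W_phi, W_g):
    """R, D: (N, C) flattened RGB / depth features; W_*: (C, C') projections."""
    D_theta = D @ W_theta                       # (N, C')
    D_phi = D @ W_phi                           # (N, C')
    R_g = R @ W_g                               # (N, C')
    A = F.softmax(D_theta @ D_phi.t(), dim=-1)  # (N, N) self-attention matrix A(D)
    return A @ R_g                              # updated RGB features, (N, C')
```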
Thus, a single-level cross-modal self-attention learning module is constructed, which can learn a self-attention matrix containing structural information from a single-level depth feature map and update the RGB feature map of the corresponding level. However, as can be seen from the matrix multiplication formula above, the complexity of updating the RGB feature map is O(C'×N²). For three-dimensional target detection, especially in unmanned-driving scenarios, the resolution of the input image or video frame is usually large, so computing the self-attention matrix A(D) becomes too time-consuming, which is unfavorable for application scenarios that require real-time processing. In constructing the fully connected graph, the feature vector r_i of each spatial position is regarded as a node, the nodes associated with r_i are found over the whole spatial region, and the self-attention matrix is calculated. Because the nodes associated with r_i over the whole spatial region are highly overlapping, the cross-modal self-attention learning module in the embodiment of the invention selects, through a sampling mechanism, only the nodes with the highest association degree among the nodes associated with r_i, and calculates the self-attention matrix after removing a large number of redundant nodes, thereby greatly improving the operation efficiency while still preserving the correlations over the whole spatial region. The cross-modal self-attention learning module including the sampling mechanism is described in detail below.
For any node i in the depth feature map, S representative features are sampled from all nodes associated with i:

$$s(n) = \mathcal{F}_s(d;\, i, n), \quad n = 1, \dots, S \qquad (6)$$

where s(n) is a sampled feature vector of dimension C' and F_s is the sampling function. Thus, the cross-modal self-attention learning module in the embodiment of the present invention can be expressed as:

$$\hat{r}_i = \sum_{n=1}^{S} \delta\big(d_\theta(i)^{T} s_\phi(n)\big)\, s_g(n) \qquad (7)$$

where n indexes the sampled nodes associated with i, δ is the softmax function, d_θ(i) = W_θ d(i), s_φ(n) = W_φ s(n), s_g(n) = W_g s(n), and W_θ, W_φ and W_g are three linear transformation matrices. By adding the sampling module, the number of nodes involved in computing the self-attention matrix is reduced from N to S, and the complexity of updating the RGB feature map drops accordingly:

$$O(C' \times N^{2}) \;\rightarrow\; O(C' \times N \times S), \qquad S \ll N \qquad (8)$$

so the operation complexity is greatly reduced. For example, for a feature map with a spatial dimension of 80×80, N is 6400, whereas in the embodiment of the present invention the number of sampling points S is chosen to be 9.
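A sketch of the sampled attention of formula (7) follows: attention at each position i is computed over its S sampled features only, so the cost scales with N×S rather than N². The (N, S, C) layout of the sampled features is an assumption, and both keys and values are taken from the same sampled tensor, following the notation of the text; producing those samples is sketched separately below.

```python
import torch
import torch.nn.functional as F

def sampled_cross_modal_attention(D, S_feats, W_theta, W_phi, W_g):
    """D: (N, C) depth features; S_feats: (N, S, C) sampled features per position;
    W_*: (C, C') projection matrices."""
    d_theta = D @ W_theta                                # (N, C')
    s_phi = S_feats @ W_phi                              # (N, S, C')
    s_g = S_feats @ W_g                                  # (N, S, C')
    logits = torch.einsum("nc,nsc->ns", d_theta, s_phi)  # similarity of position i to its S samples
    attn = F.softmax(logits, dim=-1)                     # (N, S)
    return torch.einsum("ns,nsc->nc", attn, s_g)         # aggregated features, (N, C')
```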
The invention dynamically selects sampling points by estimating offsets, borrowing the idea of deformable convolution. Specifically, for a certain position p in the feature map, the sampling function can be expressed as:

$$\mathcal{F}_s(d;\, p, n) = d(p + \Delta p_n) \qquad (9)$$

where Δp_n is the offset obtained by regression. Since the result of the convolution operation usually contains a fractional part while the coordinates of sampling points must be integers, the sampled value is obtained by bilinear interpolation over integer coordinates:

$$d(p_s) = \sum_{t} K(p_s, t)\, d(t), \qquad p_s = p + \Delta p_n \qquad (10)$$

where t enumerates the four integer-coordinate neighbors of the computed sampling point p_s and K is the bilinear interpolation kernel.
In practical application, for each node in the RGB feature map, its offsets are obtained by a linear transformation with transformation matrix W_s; the output offset dimension is 2S, corresponding to the offsets of the coordinates along the horizontal and vertical axes. Through bilinear interpolation, the S most representative nodes are obtained for each node.
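The sampling-point generation might be sketched as below in the spirit of deformable convolution: a 1×1 convolution plays the role of W_s and predicts 2S offsets per position, while torch.nn.functional.grid_sample performs the bilinear interpolation of formula (10). The coordinate normalization and the choice of which map drives the offsets versus which map is sampled are implementation assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SamplePoints(nn.Module):
    """Predict 2S offsets per position (via W_s) and bilinearly sample S features."""
    def __init__(self, channels, num_samples=9):
        super().__init__()
        self.num_samples = num_samples
        self.offset_conv = nn.Conv2d(channels, 2 * num_samples, kernel_size=1)  # plays the role of W_s

    def forward(self, guide_feat, value_feat):
        # guide_feat: (B, C, H, W) features from which the offsets are regressed
        # value_feat: (B, C, H, W) features that are read at p + delta_p_n
        B, C, H, W = value_feat.shape
        offsets = self.offset_conv(guide_feat).view(B, self.num_samples, 2, H, W)
        ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
        base = torch.stack((xs, ys)).float().to(value_feat.device)   # (2, H, W) pixel coordinates
        coords = base.unsqueeze(0).unsqueeze(1) + offsets            # (B, S, 2, H, W)
        # normalize absolute coordinates to [-1, 1] for grid_sample
        gx = coords[:, :, 0] / max(W - 1, 1) * 2 - 1
        gy = coords[:, :, 1] / max(H - 1, 1) * 2 - 1
        grid = torch.stack((gx, gy), dim=-1).flatten(1, 2)           # (B, S*H, W, 2)
        sampled = F.grid_sample(value_feat, grid, align_corners=True)  # bilinear kernel K
        return sampled.view(B, C, self.num_samples, H, W)            # S samples per position
```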
After the most representative sampling nodes are obtained through the depth feature map and the self-attention matrix is calculated, the RGB feature map can be updated. In the cross-modal self-attention learning module of the embodiment of the invention, the residual network structure is adopted to update the RGB feature map, which can be expressed as:

$$y_i = W_y\, \hat{r}_i + r_i \qquad (11)$$

where r̂_i is the updated RGB feature of equation (7), W_y is a linear transformation matrix, W_y r̂_i is the learned residual, r_i is the original input RGB feature, and y_i is the final updated RGB feature. The cross-modal self-attention learning module constructed on this residual network structure can be embedded into any neural network model.
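A sketch of the residual update of formula (11): the attended feature is passed through a 1×1 convolution standing in for W_y and added back to the original RGB feature, so the module can be dropped into an existing network. The zero initialization is a common convention assumed here, not something the text prescribes.

```python
import torch.nn as nn

class ResidualFuse(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.W_y = nn.Conv2d(channels, channels, kernel_size=1)
        nn.init.zeros_(self.W_y.weight)  # start as an identity mapping (assumption)
        nn.init.zeros_(self.W_y.bias)

    def forward(self, r, r_hat):
        # y = W_y * r_hat + r, as in formula (11)
        return r + self.W_y(r_hat)
```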
As can be seen from the above description, constructing a single-level cross-modal self-attention learning module requires 5 linear transformation matrices: W_θ, W_φ and W_g in equation (7), W_y in equation (11), and the linear transformation matrix W_s used to generate the sampling points. To further reduce the number of parameters, the cross-modal self-attention learning module is constructed as a bottleneck structure, i.e., W_θ, W_φ and W_g in equation (7) are fused into a single linear transformation matrix W used to obtain d_θ, s_φ and s_g. Thus, only 3 linear transformation matrices are needed to construct a single-level cross-modal self-attention learning module. All linear transformations are realized by 1×1 convolutions, followed by batch normalization operations.
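A sketch of the bottleneck variant: one 1×1 convolution with batch normalization plays the role of the fused matrix W, and its output is reused as d_θ, s_φ and s_g. The reduced channel width is an assumption.

```python
import torch.nn as nn

class SharedProjection(nn.Module):
    """Single fused projection W realized as a 1x1 convolution + batch normalization."""
    def __init__(self, c_in, c_reduced):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(c_in, c_reduced, kernel_size=1, bias=False),
            nn.BatchNorm2d(c_reduced),
        )

    def forward(self, x):
        # the same projected tensor is later read as d_theta (queries) and,
        # after sampling, as s_phi (keys) and s_g (values)
        return self.proj(x)
```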
As shown in fig. 4, the cross-modal self-attention learning module in the embodiment of the present invention can learn self-attention matrices containing structural information from the multi-level depth feature maps and update the RGB feature map accordingly, so the multi-level information finally needs to be fused. The fusion operation can be expressed as:

$$y_i = \sum_{j} W_y^{j}\, \hat{r}_i^{j} + r_i \qquad (12)$$

where j enumerates all of the depth levels, W_y^j is the linear transformation matrix of the corresponding level, and r̂_i^j is the RGB feature updated at the corresponding level, which can be calculated by equation (7).
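A sketch of the multi-level fusion of formula (12): each level's updated feature is transformed by its own W_y^j and summed onto the original RGB feature. It assumes the per-level updated features have already been brought to a common spatial resolution; the number of levels is illustrative.

```python
import torch.nn as nn

class MultiLevelFusion(nn.Module):
    def __init__(self, channels, num_levels=2):
        super().__init__()
        self.W_y = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=1) for _ in range(num_levels)]
        )

    def forward(self, r, r_hats):
        # r: original RGB feature map; r_hats: list of per-level updated features
        out = r
        for W_j, r_hat_j in zip(self.W_y, r_hats):
            out = out + W_j(r_hat_j)   # y_i = sum_j W_y^j r_hat_i^j + r_i
        return out
```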
It should be noted that, to further reduce the operation complexity, the embodiments of the present invention may additionally group the feature maps at the spatial and channel levels when calculating the self-attention matrix. At the spatial level, a feature map of dimension C×H×W can be divided into several regions, each containing several feature vectors of dimension C×1; by applying a pooling operation to each region, one region can be treated as a single node, so that the matrix operation is carried out over regions rather than over all individual features, greatly reducing the operation complexity. Similarly, at the channel level, all feature channels can be equally divided into G groups, each group having a feature map of dimension C'×H×W, where C' = C/G. The attention is first computed for each group of features, and all grouped features are then concatenated together to obtain the final features, as sketched below.
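A sketch of the channel-grouping trick: the C channels are split into G groups of C' = C/G channels, attention is computed per group, and the grouped results are concatenated. The attention_fn argument and the group count are assumptions used only for illustration.

```python
import torch

def grouped_attention(R, D, attention_fn, groups=4):
    """R, D: (N, C) RGB / depth features with C divisible by `groups`;
    attention_fn maps an (N, C') RGB group and an (N, C') depth group to (N, C')."""
    R_groups = R.chunk(groups, dim=1)
    D_groups = D.chunk(groups, dim=1)
    outs = [attention_fn(r_g, d_g) for r_g, d_g in zip(R_groups, D_groups)]
    return torch.cat(outs, dim=1)  # concatenate the grouped features
```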
In summary, the present invention organically combines the depth structure information obtained from the depth map with the appearance information obtained from the RGB image through the cross-modal self-attention mechanism, rather than simply fusing the two kinds of information, so as to achieve accurate detection results. When acquiring the depth structure information, the correlation between different positions is considered over the global scene range rather than being limited to a neighborhood. This is mainly due to the nature of the self-attention learning mechanism and the way learning is performed with multi-level features. In addition, when obtaining the correlations between different positions over the global scene range, only a single pass is performed without iteration, so that the category, position and pose of three-dimensional objects in a two-dimensional RGB image can be detected effectively.
When the cross-modal self-attention mechanism in the invention acquires the correlations between different positions, the self-attention matrix is calculated only for the positions with high correlation, so that computing the self-attention matrix over a large number of redundant positions is avoided and the operation complexity is reduced while the effect is preserved. In addition, the depth features can be grouped along the channel and spatial dimensions when calculating the self-attention matrix to further reduce the operation complexity.
Those of ordinary skill in the art will appreciate that the drawing is merely a schematic illustration of one embodiment and that modules or flow in the drawing are not necessarily required to practice the invention.
From the above description of embodiments, it will be apparent to those skilled in the art that the present invention may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present invention.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for apparatus or system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, with reference to the description of method embodiments in part. The apparatus and system embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.