CN111652933B - Repositioning method and device based on monocular camera, storage medium and electronic equipment - Google Patents


Info

Publication number
CN111652933B
Authority
CN
China
Prior art keywords
key frame
point cloud
real scene
cloud data
frame images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010373453.4A
Other languages
Chinese (zh)
Other versions
CN111652933A (en)
Inventor
彭冬炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010373453.4A
Publication of CN111652933A
Application granted
Publication of CN111652933B
Status: Active
Anticipated expiration


Abstract

The present disclosure provides a relocation method and device based on a monocular camera, a storage medium, and an electronic device, relating to the technical field of computer vision. The method includes: acquiring a video stream of a real scene captured by a monocular camera; extracting a plurality of key frame images from the video stream; performing three-dimensional reconstruction on any two key frame images according to the pose transformation parameters between them, to obtain point cloud data of the real scene; and matching the point cloud data of the real scene against pre-acquired map data to determine the pose of the monocular camera. This method achieves relocation with only a monocular camera, at low hardware cost, with a simple implementation process and high practicality.

Description

Translated from Chinese
Monocular Camera-Based Relocation Method, Device, Storage Medium and Electronic Equipment

Technical Field

The present disclosure relates to the field of computer vision, and in particular to a monocular camera-based relocation method, a monocular camera-based relocation device, a computer-readable storage medium, and an electronic device.

Background

Relocation technology has important applications in fields such as AR (Augmented Reality) and SLAM (Simultaneous Localization and Mapping). By matching the images captured by a camera against an already-built map, a camera-equipped device (such as a smartphone or a robot) is relocated within the map, enabling scenarios such as multi-user shared AR maps, or a sweeping robot planning routes on an established map.

In the related art, the implementation of relocation depends heavily on the camera hardware: for example, a binocular (stereo) camera or a depth sensor (such as a TOF (Time of Flight) camera) is required, and three-dimensional information is recovered from the stereo image pairs or the image depth information in order to perform relocation. The related art therefore imposes high hardware requirements and cannot be applied to devices equipped with only a monocular camera.

It should be noted that the information disclosed in the background section above is provided only to enhance understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.

Summary of the Invention

The present disclosure provides a monocular camera-based relocation method, a monocular camera-based relocation device, a computer-readable storage medium, and an electronic device, thereby overcoming, at least to a certain extent, the problem that the related art cannot be applied to a monocular camera.

Other features and advantages of the present disclosure will become apparent from the following detailed description, or will in part be learned by practice of the present disclosure.

According to a first aspect of the present disclosure, there is provided a monocular camera-based relocation method, including: acquiring a video stream of a real scene captured by a monocular camera; extracting a plurality of key frame images from the video stream; performing three-dimensional reconstruction on any two key frame images according to the pose transformation parameters between them, to obtain point cloud data of the real scene; and matching the point cloud data of the real scene against pre-acquired map data to determine the pose of the monocular camera.
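The three-dimensional reconstruction step above recovers scene points by triangulating matched features between two key frames whose relative pose is known. As an illustrative sketch (not the patent's specific implementation), the standard linear (DLT) triangulation of one point from two views can be written as follows; the 3x4 projection matrices, which fold together the camera intrinsics and the pose transformation parameters, are assumed to be given:

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one scene point from two views.

    P1, P2 : 3x4 projection matrices K[R|t] of the two key frames
    x1, x2 : image coordinates (u, v) of the matched feature in each frame
    Returns the 3D point in the world frame.
    """
    # Each view contributes two linear constraints on the homogeneous point X.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the singular vector for the smallest
    # singular value of A (the least-squares null vector).
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```

Running this for every matched feature pair across the key frames yields the point cloud of the real scene referred to in the claim.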

According to a second aspect of the present disclosure, there is provided a monocular camera-based relocation device, including: a video stream acquisition module, configured to acquire a video stream of a real scene captured by a monocular camera; a key frame extraction module, configured to extract a plurality of key frame images from the video stream; a three-dimensional reconstruction module, configured to perform three-dimensional reconstruction on any two key frame images according to the pose transformation parameters between them, to obtain point cloud data of the real scene; and a point cloud matching module, configured to match the point cloud data of the real scene against pre-acquired map data to determine the pose of the monocular camera.
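The final step of the point cloud matching module, recovering the camera pose once correspondences between the reconstructed points and the map points are available, has a well-known closed-form solution (the Kabsch/SVD rigid alignment). The sketch below is illustrative only: the patent does not specify the matching algorithm, and establishing the correspondences themselves (e.g., by ICP iterations or feature descriptors) is assumed to have been done already.

```python
import numpy as np

def align_point_clouds(src, dst):
    """Closed-form rigid alignment (Kabsch/SVD) of two matched point sets.

    src : Nx3 points reconstructed from the key frames
    dst : Nx3 corresponding points in the pre-built map
    Returns R (3x3) and t (3,) such that dst ~= src @ R.T + t.
    """
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (det = -1) in the recovered rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t
```

The recovered (R, t) expresses the reconstructed scene in the map's coordinate frame, from which the monocular camera's pose in the map follows directly.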

According to a third aspect of the present disclosure, there is provided a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the monocular camera-based relocation method of the first aspect and its possible implementations.

According to a fourth aspect of the present disclosure, there is provided an electronic device, including: a processor; and a memory for storing executable instructions of the processor, wherein the processor is configured to execute the executable instructions so as to perform the monocular camera-based relocation method of the first aspect and its possible implementations.

The technical solution of the present disclosure has the following beneficial effects:

According to the above monocular camera-based relocation method and device, storage medium, and electronic device, a video stream of a real scene captured by a monocular camera is acquired, key frame images are extracted from it, three-dimensional reconstruction is performed according to the pose transformation parameters between any two key frame images to obtain point cloud data of the real scene, and finally the point cloud data of the real scene is matched against pre-acquired map data to determine the pose of the monocular camera. On the one hand, this solution provides a relocation method that can be realized with nothing but a monocular camera: no binocular camera, depth sensor, or other additional hardware is needed, so the implementation cost is low. On the other hand, the implementation process is relatively simple, and the number of video frames and images captured by the monocular camera is small, so the amount of data processing involved in relocation is low, giving the solution high practicality.

It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its principles. Apparently, the drawings in the following description are only some embodiments of the present disclosure; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 shows a schematic diagram of the system architecture of the operating environment in this exemplary embodiment;

Fig. 2 shows a schematic structural diagram of a mobile terminal in this exemplary embodiment;

Fig. 3 shows a flowchart of a monocular camera-based relocation method in this exemplary embodiment;

Fig. 4 shows a sub-flowchart of a monocular camera-based relocation method in this exemplary embodiment;

Fig. 5 shows a schematic diagram of triangulation in this exemplary embodiment;

Fig. 6 shows a schematic diagram of multi-threaded triangulation in this exemplary embodiment;

Fig. 7 shows a structural block diagram of a monocular camera-based relocation device in this exemplary embodiment.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of the embodiments of the present disclosure. However, those skilled in the art will appreciate that the technical solutions of the present disclosure may be practiced with one or more of the specific details omitted, or that other methods, components, devices, steps, and so on may be adopted. In other instances, well-known technical solutions are not shown or described in detail, to avoid obscuring aspects of the present disclosure.

Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and repeated descriptions of them are therefore omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different network and/or processor devices and/or microcontroller devices.

Exemplary embodiments of the present disclosure provide a monocular camera-based relocation method and a monocular camera-based relocation device.

Fig. 1 shows a schematic diagram of the system architecture of the operating environment of an exemplary embodiment of the present disclosure.

As shown in Fig. 1, the system architecture 100 may include a monocular camera 110, a network 120, and an electronic device 130. The monocular camera 110 refers to a camera equipped with a single lens; the electronic device 130 may be any device with processing capability, such as a computer, a smartphone, a tablet, a smart wearable device (such as AR glasses), a robot, or a drone. The monocular camera 110 may establish a communication connection with the electronic device 130 through the network 120 and transmit the captured images or videos to the electronic device 130 for analysis and processing. Fig. 1 shows the monocular camera 110 as external to the electronic device 130; in one embodiment, the monocular camera 110 may also be built into the electronic device 130, for example when the electronic device 130 is a smartphone or robot equipped with a monocular camera.

It should be noted that in this exemplary embodiment it is the monocular camera 110 that is relocated; if the monocular camera 110 is built into the electronic device 130, this is equivalent to relocating the electronic device 130.

It should be understood that the number of devices in Fig. 1 is merely illustrative. For example, depending on implementation requirements, multiple monocular cameras may be provided, each connected to the electronic device 130 through the network 120, and the electronic device 130 may analyze and process the images from each monocular camera simultaneously, and so on.

Taking the mobile terminal 200 in Fig. 2 as an example, the structure of the above electronic device 130 is described below. In other implementations, the mobile terminal 200 may include more or fewer components than shown, combine certain components, split certain components, or arrange the components differently. The illustrated components may be realized in hardware, software, or a combination of the two. The interface connections between the components are shown only schematically and do not constitute a structural limitation on the mobile terminal 200. In other implementations, the mobile terminal 200 may also adopt an interface connection manner different from that in Fig. 2, or a combination of multiple interface connection manners.

As shown in Fig. 2, the mobile terminal 200 may specifically include: a processor 210, an internal memory 221, an external memory interface 222, a USB interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 271, a receiver 272, a microphone 273, an earphone interface 274, a sensor module 280, a display screen 290, a camera module 291, an indicator 292, a motor 293, keys 294, a Subscriber Identification Module (SIM) card interface 295, and the like.

The processor 210 may include one or more processing units; for example, the processor 210 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, an encoder, a decoder, a Digital Signal Processor (DSP), a baseband processor, and/or a Neural-Network Processing Unit (NPU). The different processing units may be independent devices or may be integrated in one or more processors. The encoder can encode (i.e., compress) image or video data to form bitstream data; the decoder can decode (i.e., decompress) the bitstream data to restore the image or video data. The mobile terminal 200 may support one or more encoders and decoders. In this way, the mobile terminal 200 can play or record images or videos in multiple encoding formats, for example image formats such as JPEG (Joint Photographic Experts Group), PNG (Portable Network Graphics), and BMP (Bitmap), and video formats such as MPEG (Moving Picture Experts Group) 1, MPEG2, MPEG3, MPEG4, H.263, H.264, H.265, and HEVC (High Efficiency Video Coding). After the mobile terminal 200 acquires an image or video from the monocular camera, it may first decode it through a decoder and then perform subsequent processing.

In some implementations, the processor 210 may include one or more interfaces. The interfaces may include an Inter-Integrated Circuit (I2C) interface, an Inter-Integrated Circuit Sound (I2S) interface, a Pulse Code Modulation (PCM) interface, a Universal Asynchronous Receiver/Transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a General-Purpose Input/Output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, among others. Connections to the other components of the mobile terminal 200 are formed through these different interfaces.

The USB interface 230 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 230 can be used to connect a charger to charge the mobile terminal 200, to connect earphones and play audio through them, or to connect the mobile terminal 200 to other electronic devices such as computers and peripherals.

The charging management module 240 is configured to receive charging input from a charger. While charging the battery 242, the charging management module 240 can also supply power to the device through the power management module 241.

The power management module 241 is used to connect the battery 242, the charging management module 240, and the processor 210. The power management module 241 receives input from the battery 242 and/or the charging management module 240, supplies power to the various parts of the mobile terminal 200, and can also be used to monitor the state of the battery.

The wireless communication function of the mobile terminal 200 can be realized by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, the modem processor, the baseband processor, and so on.

The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the mobile terminal 200 can be used to cover a single communication frequency band or multiple bands. Different antennas can also be multiplexed to improve antenna utilization. The mobile communication module 250 can provide wireless communication solutions applied on the mobile terminal 200, including 2G/3G/4G/5G.

The wireless communication module 260 can provide wireless communication solutions applied on the mobile terminal 200, including Wireless Local Area Networks (WLAN) (such as Wireless Fidelity (Wi-Fi)), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), and Infrared (IR). The wireless communication module 260 may be one or more devices integrating at least one communication processing module. The wireless communication module 260 receives electromagnetic waves via the antenna 2, frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 210. The wireless communication module 260 can also receive signals to be sent from the processor 210, frequency-modulate and amplify them, and convert them into electromagnetic waves radiated through the antenna 2.

In some implementations, the antenna 1 of the mobile terminal 200 is coupled to the mobile communication module 250 and the antenna 2 is coupled to the wireless communication module 260, so that the mobile terminal 200 can communicate with the network and other devices through wireless communication technologies. The wireless communication technologies may include Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Time Division-Synchronous Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), New Radio (NR), BT, GNSS, WLAN, NFC, FM, and/or IR technologies.

The mobile terminal 200 realizes the display function through the GPU, the display screen 290, the application processor, and so on. The GPU performs mathematical and geometric calculations for graphics rendering and connects the display screen 290 and the application processor. The processor 210 may include one or more GPUs that execute program instructions to generate or change display information. The mobile terminal 200 may include one or more display screens 290 for displaying images, videos, and the like.

The mobile terminal 200 can realize the shooting function through the ISP, the camera module 291, the encoder, the decoder, the GPU, the display screen 290, the application processor, and so on.

The camera module 291 is used to capture still images or videos; it collects light signals through a photosensitive element and converts them into electrical signals. The ISP processes the data fed back by the camera module 291 and converts the electrical signals into digital image signals.

The external memory interface 222 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the mobile terminal 200.

The internal memory 221 may be used to store computer-executable program code, which includes instructions. The internal memory 221 may include a program storage area and a data storage area. The program storage area can store the operating system and the application programs required by at least one function (such as a sound playback function or an image playback function). The data storage area can store data (such as images and videos) created during use of the mobile terminal 200. The processor 210 executes the various functional applications and data processing of the mobile terminal 200 by running the instructions stored in the internal memory 221 and/or instructions stored in a memory provided within the processor.

The mobile terminal 200 can implement audio functions, such as music playback and recording, through the audio module 270, the speaker 271, the receiver 272, the microphone 273, the earphone interface 274, the application processor, and so on. The audio module 270 converts digital audio information into analog audio signal output, converts analog audio input into digital audio signals, and can also encode and decode audio signals. The speaker 271 converts audio electrical signals into sound signals. The receiver 272 likewise converts audio electrical signals into sound signals. The microphone 273 converts sound signals into electrical signals. The earphone interface 274 is used to connect wired earphones.

The sensor module 280 may include a depth sensor 2801, a pressure sensor 2802, a gyroscope sensor 2803, an air pressure sensor 2804, and so on. The depth sensor 2801 obtains depth information of the scene. The pressure sensor 2802 senses pressure signals and can convert them into electrical signals for functions such as pressure-sensitive touch. The gyroscope sensor 2803 can determine the motion posture of the mobile terminal 200 and can be used in scenarios such as image stabilization during shooting, navigation, and motion-sensing games. The air pressure sensor 2804 measures air pressure and can assist positioning and navigation by calculating altitude. In addition, sensors with other functions, such as magnetic sensors, acceleration sensors, and distance sensors, may also be provided in the sensor module 280 according to actual needs.

The indicator 292 may be an indicator light that can be used to indicate the charging status and battery level changes, as well as messages, missed calls, notifications, and so on.

The motor 293 can generate vibration prompts, such as for incoming calls, alarms, and received messages, and can also be used for touch vibration feedback.

The keys 294 include a power key, volume keys, and so on. The keys 294 may be mechanical keys or touch keys. The mobile terminal 200 can receive key input and generate key signal input related to its user settings and function control.

The mobile terminal 200 may support one or more SIM card interfaces 295 for connecting SIM cards, so that the mobile terminal 200 interacts with the network through a SIM card to realize functions such as calls and data communication.

下面对本公开示例性实施方式的基于单目相机的重定位方法和基于单目相机的重定位装置进行具体说明。The monocular camera-based relocation method and the monocular camera-based relocation device according to the exemplary embodiments of the present disclosure will be described in detail below.

图3示出该重定位方法的示意性流程,可以包括以下步骤S310至S340:FIG. 3 shows a schematic flow of the relocation method, which may include the following steps S310 to S340:

步骤S310,获取由单目相机采集的现实场景的视频流。Step S310, acquiring the video stream of the real scene captured by the monocular camera.

其中,现实场景是指单目相机所在的真实世界环境,如现实的房间、咖啡店、商场或者街道等。在重定位时,一般需要相机一边移动一边拍摄周围的现实场景,例如用户可以手持手机,打开摄像头,一边走动一边拍摄现实场景,以完整地记录场景的每个部分、每个角落,拍摄的画面生成视频流,传输至处理器或外部的电子设备,以进行下一步处理。Here, the real scene refers to the real-world environment in which the monocular camera is located, such as an actual room, coffee shop, shopping mall, or street. During relocation, the camera generally needs to capture the surrounding real scene while moving. For example, the user may hold a mobile phone, turn on the camera, and shoot the real scene while walking, so as to completely record every part and every corner of the scene; the captured frames form a video stream that is transmitted to a processor or an external electronic device for further processing.

步骤S320,从视频流中提取多个关键帧图像。Step S320, extracting multiple key frame images from the video stream.

其中,关键帧图像是指视频流中质量较高、易于反映单目相机位姿的图像。Here, a key frame image refers to a high-quality image in the video stream from which the pose of the monocular camera can readily be inferred.

在一种可选的实施方式中,在提取关键帧图像之前,可以对视频流中的图像进行预处理,以滤除质量较差的图像。例如,考虑到视频流是单目相机在移动中采集的,难免因为抖动出现画面模糊的情况,因此可以对视频流中的图像进行模糊检测,以滤除视频流中的模糊图像。具体来说,可以通过拉普拉斯算子或索贝尔算子等算法,计算每一帧图像的梯度,如果梯度达到一定的水平(如大于按照经验设定的阈值),则判断图像清晰,反之则判断图像模糊(一般模糊图像中存在较多的低梯度区域,导致整张图像的整体梯度较低),予以滤除。这样通过对低质量图像进行过滤,可以缩小后续提取关键帧的范围,提高效率。In an optional implementation manner, before extracting the key frame images, preprocessing may be performed on the images in the video stream, so as to filter out images with poor quality. For example, considering that the video stream is collected by a monocular camera while moving, it is inevitable that the picture will be blurred due to shaking. Therefore, blur detection can be performed on the image in the video stream to filter out the blurred image in the video stream. Specifically, the gradient of each frame of image can be calculated through algorithms such as Laplacian operator or Sobel operator. If the gradient reaches a certain level (such as greater than the threshold value set according to experience), the image is judged to be clear. Otherwise, it is judged that the image is blurred (generally there are many low-gradient areas in the blurred image, resulting in a low overall gradient of the entire image), and it is filtered out. In this way, by filtering low-quality images, the range of key frames for subsequent extraction can be narrowed to improve efficiency.
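As an illustration of the blur-detection idea above, the following is a minimal sketch of one possible implementation (an assumption, not the patent's code): the sharpness of a grayscale frame is scored by the variance of its Laplacian response, and the threshold is an assumed empirical value.

```python
import numpy as np

def laplacian_response(gray: np.ndarray) -> np.ndarray:
    # Valid 3x3 convolution with the discrete Laplacian kernel.
    k = np.array([[0.0, 1.0, 0.0],
                  [1.0, -4.0, 1.0],
                  [0.0, 1.0, 0.0]])
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += k[i, j] * gray[i:i + h - 2, j:j + w - 2]
    return out

def is_sharp(gray: np.ndarray, threshold: float = 100.0) -> bool:
    # A blurred frame has mostly low gradients, so the variance of the
    # Laplacian response is low; the threshold is an assumed empirical value.
    return float(laplacian_response(gray).var()) > threshold
```

Frames for which `is_sharp` returns False would be filtered out before key frame extraction; a Sobel-based score could be substituted in the same way.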

下面对于如何提取关键帧图像,提供两个具体方案:The following provides two specific solutions for how to extract key frame images:

方案一、参考图4所示,可以通过以下步骤S401至S403实现关键帧提取:Option 1, as shown in FIG. 4, the key frame extraction can be realized through the following steps S401 to S403:

步骤S401,对视频流解码,依次得到连续多帧图像;Step S401, decoding the video stream to sequentially obtain multiple consecutive frames of images;

步骤S402,根据当前帧图像相对于上一帧图像的位姿变换参数,确定当前帧图像与上一帧图像的相对运动距离;Step S402, according to the pose transformation parameters of the current frame image relative to the previous frame image, determine the relative movement distance between the current frame image and the previous frame image;

步骤S403,当上述相对运动距离处于预设数值范围内时,提取当前帧图像作为关键帧图像。Step S403, when the relative movement distance is within a preset numerical range, extract the current frame image as the key frame image.

实际应用中,一般可以将视频流的采集和关键帧图像的提取同步执行,以提高效率;当然也可以在视频流采集完成或者采集到一定程度后,开始关键帧图像的提取,即关键帧图像的提取可以落后于视频帧的采集;本公开对此不做限定。In practical applications, the acquisition of the video stream and the extraction of key frame images can generally be performed synchronously to improve efficiency; of course, the extraction of key frame images may also start after the video stream has been fully or partially acquired, that is, the extraction of key frame images may lag behind the acquisition of video frames; the present disclosure does not limit this.

单目相机在采集视频流时,一帧一帧的流入处理器,处理器进行逐帧分析:As the monocular camera captures the video stream, frames flow into the processor one by one, and the processor analyzes them frame by frame:

以当前流入的图像称为当前帧,获取当前帧图像相对于上一帧图像的位姿变换参数。位姿变换参数通常包括平移参数T(如可以是3*1的矩阵)和旋转参数R(如可以是3*3的矩阵),是由于单目相机在拍摄这两帧时发生了移动。可以通过单目相机或内置有单目相机的电子设备所配置的IMU(Inertial Measurement Unit,惯性测量单元),测量这两帧之间单目相机的加速度、角速度等参数,通过这些参数求解得到位姿变换参数。The currently incoming image is called the current frame, and the pose transformation parameters of the current frame image relative to the previous frame image are obtained. The pose transformation parameters usually include a translation parameter T (e.g., a 3*1 matrix) and a rotation parameter R (e.g., a 3*3 matrix), which arise because the monocular camera moves between capturing these two frames. The IMU (Inertial Measurement Unit) configured in the monocular camera, or in the electronic device with a built-in monocular camera, can measure the acceleration, angular velocity, and other parameters of the monocular camera between the two frames, and the pose transformation parameters can be obtained by solving from these measurements.

将位姿变换参数以相对运动距离的方式进行量化,以表征单目相机在这两帧之间运动的程度。相对运动距离可以通过以下公式(1)计算:The pose transformation parameters are quantified in terms of relative motion distance to characterize the degree of motion of the monocular camera between these two frames. The relative movement distance can be calculated by the following formula (1):

D = ‖T‖ + α·min(2π-‖R‖, ‖R‖); (1)

其中,D表示相对运动距离;T为当前帧图像相对于上一帧图像的平移参数,‖T‖表示T的范数;R为当前帧图像相对于上一帧图像的旋转参数,‖R‖表示R的范数;由于在表示单目相机的旋转时,通常固定为一个方向(如固定为逆时针或顺时针),所以可能出现旋转超过180度的情况,例如逆时针旋转210度,其相当于顺时针旋转150度;这里用min(2π-‖R‖,‖R‖)来度量旋转量,即保证旋转角度值不超过180度;α为预设系数,表示在将平移参数与旋转参数进行统一时,对旋转参数这一部分所施加的权重,以均衡平移参数与旋转参数两个方面对于D的影响。Here, D denotes the relative motion distance; T is the translation parameter of the current frame image relative to the previous frame image, and ‖T‖ is the norm of T; R is the rotation parameter of the current frame image relative to the previous frame image, and ‖R‖ is the norm of R. Since the rotation of the monocular camera is usually expressed in a fixed direction (e.g., always counterclockwise or always clockwise), rotations exceeding 180 degrees may occur; for example, a 210-degree counterclockwise rotation is equivalent to a 150-degree clockwise rotation. Therefore min(2π-‖R‖, ‖R‖) is used to measure the amount of rotation, ensuring that the rotation angle does not exceed 180 degrees. α is a preset coefficient representing the weight applied to the rotation part when unifying the translation and rotation parameters, so as to balance the influence of the two on D.

可见,公式(1)是对相邻两帧之间单目相机运动程度的一种度量。如果D过大,则两帧之间的运动程度过高,不利于对现实场景图像的连续采集,画面信息可能缺失;如果D过小,则两帧之间的运动程度过低(甚至接近静止),所采集的图像中可能重复信息过多,有效信息不足。基于此,可以为D设定预设数值范围[Dmin,Dmax],其中Dmin表示最小运动距离,Dmax表示最大运动距离,两者均为可调的经验参数。当D∈[Dmin,Dmax]时,可以将当前帧图像作为关键帧图像提取出来,这样得到的关键帧图像中,其信息的质量较高,有利于后续实现重定位。It can be seen that formula (1) is a measure of the degree of monocular camera motion between two adjacent frames. If D is too large, the motion between the two frames is too great, which is unfavorable for continuous acquisition of real scene images and may cause missing picture information; if D is too small, the motion between the two frames is too slight (even close to stationary), so the collected images may contain too much repeated information and too little effective information. Based on this, a preset value range [Dmin, Dmax] can be set for D, where Dmin is the minimum motion distance and Dmax is the maximum motion distance, both being adjustable empirical parameters. When D∈[Dmin, Dmax], the current frame image can be extracted as a key frame image; key frame images obtained in this way carry high-quality information, which facilitates the subsequent relocation.
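The relative motion distance of formula (1) and the [Dmin, Dmax] check can be sketched as follows. Two assumptions are made explicit: ‖R‖ is interpreted as the rotation angle in radians (the norm of an axis-angle vector, which is what the term min(2π-‖R‖, ‖R‖) implies), and the bounds Dmin and Dmax are example values for the adjustable empirical parameters.

```python
import numpy as np

def relative_motion_distance(T, r, alpha=1.0):
    """Formula (1): D = ||T|| + alpha * min(2*pi - ||R||, ||R||).

    T: translation between the two frames; r: rotation as an axis-angle
    vector whose norm is the rotation angle in radians (an assumed
    interpretation); alpha: weight applied to the rotation part.
    """
    r_norm = np.linalg.norm(r)
    return float(np.linalg.norm(T) + alpha * min(2.0 * np.pi - r_norm, r_norm))

def is_keyframe(D, d_min=0.05, d_max=0.5):
    # Keep the frame only when D lies in [Dmin, Dmax]; both bounds are
    # assumed example values for the adjustable empirical parameters.
    return d_min <= D <= d_max
```

For instance, a 210-degree rotation scores the same as a 150-degree one, matching the clockwise/counterclockwise equivalence described above.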

方案二、可以在视频流中每间隔固定的帧数,例如每5帧或每30帧,提取一帧作为关键帧图像。Solution 2: A frame may be extracted as a key frame image at a fixed interval of frames in the video stream, for example, every 5 frames or every 30 frames.

实际应用中,也可以结合上述方案一和方案二,例如每间隔固定的帧数,提取一帧,将其相对于上一帧图像的相对运动距离与预设数值范围进行比较,如果相对运动距离处于预设数值范围中,则确定所提取的帧为关键帧图像。In practical applications, the above scheme 1 and scheme 2 can also be combined: for example, one frame is extracted at a fixed frame interval, and its relative motion distance with respect to the previous frame image is compared with the preset value range; if the relative motion distance is within the preset value range, the extracted frame is determined to be a key frame image.

步骤S330,根据任意两个关键帧图像之间的位姿变换参数,对该两个关键帧图像进行三维重建处理,得到现实场景的点云数据。In step S330, according to the pose transformation parameters between any two key frame images, three-dimensional reconstruction processing is performed on the two key frame images to obtain point cloud data of the real scene.

其中,每个关键帧图像是对现实场景的一个局部进行采集的图像,两个关键帧图像所对应的局部中,一般存在重合的部分,而两个关键帧图像是单目相机从不同位置、不同角度拍摄得到的,因此可以通过对两个关键帧图像进行三维重建处理,还原出两个关键帧图像中至少一个像素点(一般是两个关键帧图像重合的部分)的三维信息(主要是恢复像素点的深度信息),这样得到的三维信息可以作为现实场景的点云数据。为了增加两个关键帧图像中可实现三维重建的区域大小,两个关键帧图像可以选取相邻两个关键帧图像,以保证其重合的部分较多。Here, each key frame image captures a local part of the real scene, and the local parts corresponding to two key frame images generally have an overlapping portion. Since the two key frame images are captured by the monocular camera from different positions and at different angles, three-dimensional reconstruction processing can be performed on them to restore the three-dimensional information of at least one pixel point (generally within the overlapping portion of the two key frame images), mainly by recovering the depth information of the pixel points; the three-dimensional information obtained in this way can serve as the point cloud data of the real scene. To increase the size of the region in which three-dimensional reconstruction can be performed, two adjacent key frame images can be selected as the pair, so as to ensure a larger overlapping portion.

通常三维重建可以基于三角定位原理实现,在一种可选的实施方式中,步骤S330可以包括:Generally, three-dimensional reconstruction can be realized based on the principle of triangulation. In an optional implementation manner, step S330 may include:

获取现实场景中的三维点在两个关键帧图像上的投影点;Obtain the projection points of the 3D points in the real scene on the two key frame images;

基于同一三维点对应的两个投影点的相机坐标,以及上述两个关键帧图像之间的位姿变换参数进行三角化处理,求解得到该三维点的空间坐标。Based on the camera coordinates of the two projection points corresponding to the same three-dimensional point, and the pose transformation parameters between the above two key frame images, triangulation is performed to obtain the spatial coordinates of the three-dimensional point.

其中,两个关键帧图像之间的位姿变换参数可以参考上述当前帧图像相对于上一帧图像的位姿变换参数,包括平移参数和旋转参数。在计算两个关键帧图像之间的位姿变换参数时,可以列出其间的所有帧,然后对每相邻两帧之间的位姿变换参数进行叠加,得到这两个关键帧图像之间的位姿变换参数。Here, the pose transformation parameters between two key frame images are analogous to the above pose transformation parameters of the current frame image relative to the previous frame image, including a translation parameter and a rotation parameter. When calculating the pose transformation parameters between two key frame images, all the frames between them can be listed, and the pose transformation parameters between every two adjacent frames can then be accumulated to obtain the pose transformation parameters between the two key frame images.
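The accumulation of per-frame pose transformation parameters described above can be sketched as follows; this is a hedged illustration that assumes the composition convention x_k = R_k·x_{k-1} + T_k for each per-frame increment.

```python
import numpy as np

def compose_poses(increments):
    """Chain per-frame (R, T) increments, assuming the convention
    x_k = R_k @ x_{k-1} + T_k, into the net keyframe-to-keyframe pose."""
    R_net, T_net = np.eye(3), np.zeros(3)
    for R, T in increments:
        R_net = R @ R_net          # rotations compose by matrix product
        T_net = R @ T_net + T      # earlier translation is rotated forward
    return R_net, T_net
```

Applying the composed (R_net, T_net) to a point gives the same result as applying the per-frame increments one after another.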

参考图5所示,假设现实场景中存在一三维点P0,其在两个关键帧图像F1和F2上的投影点分别为P1和P2;基于F1建立相机坐标系c1,P1在c1中的坐标为X1(x1,y1);基于F2建立相机坐标系c2,P2在c2中的坐标为X2(x2,y2);构建以下三角化公式(2):Referring to FIG. 5, suppose there is a three-dimensional point P0 in the real scene, whose projection points on the two key frame images F1 and F2 are P1 and P2 respectively; a camera coordinate system c1 is established based on F1, and the coordinate of P1 in c1 is X1(x1, y1); a camera coordinate system c2 is established based on F2, and the coordinate of P2 in c2 is X2(x2, y2); the following triangulation formula (2) is constructed:

s1X1 = s2RX2 + T; (2)

其中,R、T分别为F2相对于F1的旋转参数和平移参数,注意与公式(1)中的R、T不相同。Among them, R and T are the rotation parameters and translation parameters of F2 relative to F1 respectively, and it should be noted that they are different from R and T in the formula (1).

可以利用叉乘进行消元,对公式(2)左右两边均乘以X1的反对称矩阵,可得:Elimination can be performed using the cross product: multiplying both sides of formula (2) on the left by the antisymmetric (skew-symmetric) matrix of X1 yields:

s1X1×X1 = 0 = s2X1×RX2 + X1×T; (3)

由此可以求解得到P0的深度值,然后重建出P0的空间坐标,该空间坐标可以是在相机坐标系或世界坐标系中的坐标。In this way, the depth value of P0 can be obtained by solving, and then the space coordinates of P0 can be reconstructed, and the space coordinates can be coordinates in the camera coordinate system or the world coordinate system.
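A minimal sketch of the triangulation of formula (2): instead of the cross-product elimination of formula (3), the depths s1 and s2 are recovered here by linear least squares, which yields the same result in the noise-free case. Normalized camera-plane coordinates (x, y) are assumed for the projection points.

```python
import numpy as np

def triangulate(x1, x2, R, T):
    """Recover the spatial coordinates of P0 from s1*X1 = s2*R*X2 + T.

    x1, x2: normalized camera-plane coordinates of the projections P1, P2;
    R, T: rotation and translation of F2 relative to F1.
    """
    X1 = np.array([x1[0], x1[1], 1.0])
    X2 = np.array([x2[0], x2[1], 1.0])
    # Rearranged as a 3x2 linear system in the unknown depths:
    # [X1, -R@X2] @ [s1, s2]^T = T
    A = np.column_stack([X1, -(R @ X2)])
    depths, *_ = np.linalg.lstsq(A, T, rcond=None)
    return depths[0] * X1  # P0 in the first camera's coordinate system
```

With noisy observations, the least-squares solution gives the depths that best satisfy the two-view constraint.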

在得到P0的空间坐标后,可以将其添加到现实场景的点云数据中。现实场景的点云数据即大量三维点的空间坐标所形成的集合。After obtaining the spatial coordinates of P0, it can be added to the point cloud data of the real scene. The point cloud data of a real scene is a collection of spatial coordinates of a large number of three-dimensional points.

进一步的,在得到三维点的空间坐标后,可以对三维点进行筛选,不满足条件的三维点不加入点云数据中,从而提高点云数据的质量。具体来说,步骤S330还可以包括:Further, after the spatial coordinates of the 3D points are obtained, the 3D points can be screened, and the 3D points that do not satisfy the conditions are not added to the point cloud data, thereby improving the quality of the point cloud data. Specifically, step S330 may also include:

当判断三维点的梯度大于预设梯度阈值时,将该三维点添加到现实场景的点云数据中。When it is judged that the gradient of the 3D point is greater than the preset gradient threshold, the 3D point is added to the point cloud data of the real scene.

获取点云数据的目的,是对现实场景的纹理、地形、障碍物等特征进行表征,因此对现实场景中的轮廓、边角、纹理突变、起伏等较为鲜明的部分采集点云数据,可以更好地体现出上述特征。一般这些部分的三维点,由于和周围的临近点差异较大,其梯度也较大,所以可以通过梯度进行三维点的筛选。根据经验或实际应用需求确定预设梯度阈值,如果三维点的梯度大于该阈值,则将其添加到现实场景的点云中,反之则说明三维点的特征性较低,可以将其舍弃。The purpose of obtaining point cloud data is to characterize features of the real scene such as texture, terrain, and obstacles; therefore, collecting point cloud data at the more distinctive parts of the real scene, such as contours, corners, abrupt texture changes, and undulations, better reflects these features. The three-dimensional points in these parts generally differ considerably from their neighboring points, so their gradients are also large, and the three-dimensional points can therefore be screened by gradient. A preset gradient threshold is determined according to experience or actual application requirements; if the gradient of a three-dimensional point is greater than this threshold, the point is added to the point cloud of the real scene; otherwise, the point is weakly characteristic and may be discarded.

本示例性实施方式提供一种计算三维点梯度的方法,如以下公式(4)所示:This exemplary embodiment provides a method for calculating a three-dimensional point gradient, as shown in the following formula (4):

其中,Guv表示三维点的像素梯度,由x轴方向的像素梯度gxuv和y轴方向的像素梯度gyuv组成,u、v表示三维点在上述两个关键帧图像的任一帧图像中的投影点位于第u行、第v列;|Guv|表示三维点的像素绝对梯度值;Iuv表示投影点在任一帧图像中的像素灰度值;i表示增量,一般是较小的数值。通常在任一帧图像中度量三维点的梯度,而图像是二维的,因此梯度一般包括x轴与y轴上的梯度分量,通过像素绝对梯度值可以将两分量进行综合度量。参考上述图5所示,P0在F1上的投影点,转换到F1的平面坐标中,为(u,v),表示第u行、第v列的像素,在F1中,将该像素与邻近像素之间灰度值求梯度,可以得到P0的梯度。如果转换到F2中计算P0的梯度,得到的结果可能存在一定差别,但是差别一般不大,因此选用任一帧图像均可,本公开不做限定。Here, Guv denotes the pixel gradient of the three-dimensional point, composed of the pixel gradient gxuv in the x-axis direction and the pixel gradient gyuv in the y-axis direction; u and v indicate that the projection point of the three-dimensional point in either of the two key frame images is located at row u and column v; |Guv| denotes the absolute pixel gradient value of the three-dimensional point; Iuv denotes the pixel gray value of the projection point in either frame image; i denotes an increment, generally a small value. The gradient of a three-dimensional point is usually measured in either frame image, and since the image is two-dimensional, the gradient generally includes components along the x-axis and y-axis, which the absolute pixel gradient value combines into a single measure. Referring to FIG. 5 above, the projection point of P0 on F1, converted into the plane coordinates of F1, is (u, v), i.e., the pixel at row u and column v; in F1, taking the gradient of the gray values between this pixel and its neighboring pixels yields the gradient of P0. If the gradient of P0 is instead computed in F2, the result may differ slightly, but the difference is generally small, so either frame image can be used, which is not limited in the present disclosure.
Based on formula (4), when judging whether the gradient of a three-dimensional point is greater than the preset gradient threshold, it can be judged whether the absolute pixel gradient value |Guv| of the three-dimensional point is greater than the preset gradient threshold.
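Since formula (4) itself was not reproduced in this text, the following sketch assumes a standard central-difference form for the gradient components gxuv and gyuv with step i, and combines them into the absolute pixel gradient value; the exact finite-difference form and the threshold value are assumptions.

```python
import numpy as np

def pixel_abs_gradient(img, u, v, i=1):
    # Central differences with step i (an assumed form of gxuv / gyuv)
    # at the projection point in row u, column v of the gray image.
    gx = (float(img[u, v + i]) - float(img[u, v - i])) / (2.0 * i)
    gy = (float(img[u + i, v]) - float(img[u - i, v])) / (2.0 * i)
    return float(np.hypot(gx, gy))  # |Guv| combines both components

def keep_point(img, u, v, grad_threshold=10.0):
    # Add the 3D point to the cloud only if its projection has a strong
    # gradient; the threshold is an assumed empirical value.
    return pixel_abs_gradient(img, u, v) > grad_threshold
```

Points at contours and abrupt texture changes score high and are kept; points in flat regions score low and are discarded.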

步骤S340,将现实场景的点云数据和预先获取的地图数据进行匹配,以确定单目相机的位姿。Step S340, matching the point cloud data of the real scene with the pre-acquired map data to determine the pose of the monocular camera.

其中,地图数据是指已经建立的现实场景的三维地图模型,可以由其他设备建立并同步到本设备,也可以由本设备在以前建图的环节所建立得到。地图数据也是大量三维点的集合,可以看作是另一个点云数据。在将两个点云数据匹配时,基本原理是计算相匹配的三维点是否具有相同或相近的法线信息。具体来说,在进行匹配时,可以在点云数据中计算一部分三维点的特征描述子,然后根据特征描述子进行三维点的逐对匹配,最后返回匹配信息。Here, the map data refers to an already established three-dimensional map model of the real scene, which may be built by other devices and synchronized to this device, or built by this device in an earlier mapping stage. The map data is also a collection of a large number of three-dimensional points and can be regarded as another point cloud. When matching the two point clouds, the basic principle is to check whether matched three-dimensional points have the same or similar normal information. Specifically, during matching, feature descriptors of a subset of three-dimensional points can be computed in the point cloud data, the three-dimensional points are then matched pair by pair according to the feature descriptors, and the matching information is finally returned.

在一种可选的实施方式中,步骤S340可以包括:In an optional implementation manner, step S340 may include:

通过ICP(Iterative Closest Point,迭代最近邻点)算法对现实场景的点云数据进行位姿变换,使变换后的点云数据和地图数据之间的误差收敛;Through the ICP (Iterative Closest Point, Iterative Closest Point) algorithm, the point cloud data of the real scene is transformed to make the error between the transformed point cloud data and the map data converge;

如果误差小于预设误差阈值,则确定现实场景的点云数据和地图数据匹配成功。If the error is smaller than the preset error threshold, it is determined that the point cloud data of the real scene and the map data are successfully matched.

假设现实场景的点云数据为集合X={xj|j=1,2,…,m},地图数据为集合Y={yj|j=1,2,…,n},m和n分别为两个集合中的点数量,m和n可以相等,也可以不相等。通过以下公式(5)进行ICP运算:Suppose the point cloud data of the real scene is the set X={xj|j=1,2,…,m} and the map data is the set Y={yj|j=1,2,…,n}, where m and n are the numbers of points in the two sets respectively; m and n may or may not be equal. The ICP operation is performed by the following formula (5):

其中,e表示误差,R、T为针对现实场景的点云数据的位姿变换参数,注意与公式(1)~(3)中的R、T不相同。误差收敛是指误差达到全局最小或局部最小,通过进一步的迭代无法再有效降低误差。ICP运算通过最小化e,迭代得到公式(5)中的R和T,然后基于R和T计算X和Y的误差,如果其小于根据经验确定的预设误差阈值,说明现实场景的点云和地图数据重合度较高,即匹配成功。Here, e denotes the error, and R and T are the pose transformation parameters applied to the point cloud data of the real scene; note that they are different from R and T in formulas (1) to (3). Error convergence means that the error reaches a global or local minimum and can no longer be effectively reduced by further iterations. The ICP operation iteratively obtains R and T in formula (5) by minimizing e, and then computes the error between X and Y based on R and T; if this error is smaller than a preset error threshold determined empirically, the point cloud of the real scene and the map data have a high degree of overlap, that is, the matching is successful.
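One ICP update step can be sketched as follows, assuming formula (5) expresses the standard ICP objective, minimizing e = Σ‖xj - (R·yj + T)‖² over R and T. For already-matched point pairs this has the SVD-based (Kabsch) closed-form solution below; a full ICP would alternately re-match nearest neighbors and re-solve until the error converges.

```python
import numpy as np

def best_rigid_transform(X, Y):
    """Closed-form R, T minimizing sum ||x_j - (R y_j + T)||^2 for
    already-matched rows of X and Y (SVD / Kabsch solution)."""
    cx, cy = X.mean(axis=0), Y.mean(axis=0)
    H = (Y - cy).T @ (X - cx)                  # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # enforce det(R) = +1
    T = cx - R @ cy
    return R, T

def alignment_error(X, Y, R, T):
    # Mean point-to-point residual after applying (R, T) to Y.
    return float(np.mean(np.linalg.norm(X - (Y @ R.T + T), axis=1)))
```

When the residual after convergence falls below the preset error threshold, the two clouds are declared matched.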

进一步的,在进行匹配时,可以先对现实场景的点云数据和地图数据进行配准(Alignment),配准可以看作是粗匹配,然后再通过ICP等算法进行精匹配,这样可以提高匹配准确度,且减少精匹配阶段的运算量。Furthermore, when performing matching, the point cloud data of the real scene and the map data can first be registered (alignment), which can be regarded as coarse matching, and fine matching can then be carried out through ICP or similar algorithms; this improves matching accuracy and reduces the amount of computation in the fine matching stage.

在确定现实场景的点云数据和地图数据匹配成功后,可以根据公式(5)中得到的点云数据的位姿变换参数,确定单目相机在世界坐标系中的位姿。具体来说,世界坐标系即地图数据的坐标系,公式(5)中的位姿变换参数可用于相机坐标系与世界坐标系间的转换。通过任一关键帧图像可以确定单目相机的相机坐标,将其转换到世界坐标系中,得到其在世界坐标系中的位姿。换而言之,将单目相机定位到地图中,实现重定位。After it is determined that the point cloud data of the real scene and the map data are successfully matched, the pose of the monocular camera in the world coordinate system can be determined according to the pose transformation parameters of the point cloud data obtained in formula (5). Specifically, the world coordinate system is the coordinate system of the map data, and the pose transformation parameters in formula (5) can be used for conversion between the camera coordinate system and the world coordinate system. The camera coordinates of the monocular camera can be determined from any key frame image and converted into the world coordinate system to obtain the pose in the world coordinate system. In other words, the monocular camera is localized in the map, achieving relocation.

一般的,现实场景的点云数据生成是逐帧累积的过程,当点云数据达到一定程度,即可进行匹配。在一种可选的实施方式中,步骤S340可以包括:Generally, the generation of point cloud data in real scenes is a process of frame-by-frame accumulation. When the point cloud data reaches a certain level, matching can be performed. In an optional implementation manner, step S340 may include:

当现实场景的点云数据中三维点的数量达到第一数量阈值,或者达到预设周期时间时,将现实场景的点云数据和地图数据进行匹配;When the number of three-dimensional points in the point cloud data of the real scene reaches a first number threshold, or reaches a preset cycle time, match the point cloud data of the real scene with the map data;

如果匹配失败,则继续从视频流中提取关键帧图像,并根据所提取的关键帧图像向现实场景的点云数据中增加新的三维点;If the matching fails, continue to extract key frame images from the video stream, and add new 3D points to the point cloud data of the real scene according to the extracted key frame images;

直到新的三维点的数量达到第二数量阈值,或者达到下一预设周期时间时,再次将现实场景的点云数据和地图数据进行匹配。Until the number of new three-dimensional points reaches the second number threshold, or reaches the next preset cycle time, the point cloud data of the real scene is matched with the map data again.

其中,第一数量阈值和第二数量阈值是根据经验和实际需求确定的参数,和现实场景的面积、复杂度等相关。当现实场景的点云数据中三维点的数量达到第一数量阈值时,可以认为其三维点数量已经足够表征现实场景的特征,此时将其与地图数据进行匹配,如果匹配成功,则可以实现重定位;如果匹配不成功,则说明当前的点云数据还不够充分,继续执行步骤S320和S330,提取更多的关键帧图像,并通过三维重建处理得到更多的三维点;当新增三维点的数量达到第二数量阈值时,再进行一次匹配;如果仍然匹配不成功,则等到下一次新增三维点的数量达到第二数量阈值时,再进行匹配。即,以新增三维点的数量达到第二数量阈值为条件,不断地尝试匹配,直到匹配成功。Here, the first number threshold and the second number threshold are parameters determined according to experience and actual needs, related to the area and complexity of the real scene. When the number of three-dimensional points in the point cloud data of the real scene reaches the first number threshold, the number of points can be considered sufficient to represent the features of the real scene, and the point cloud is matched with the map data; if the matching succeeds, relocation can be achieved. If the matching fails, the current point cloud data is not yet sufficient, so steps S320 and S330 continue to be performed to extract more key frame images and obtain more three-dimensional points through three-dimensional reconstruction processing; when the number of newly added three-dimensional points reaches the second number threshold, matching is performed again. If the matching is still unsuccessful, matching is attempted again the next time the number of newly added three-dimensional points reaches the second number threshold. That is, with the number of newly added three-dimensional points reaching the second number threshold as the condition, matching is attempted repeatedly until it succeeds.

此外,也可以以预设周期时间为条件,例如预设周期时间为1分钟,则每分钟匹配一次,直到匹配成功为止。In addition, the preset cycle time can also be used as a condition, for example, if the preset cycle time is 1 minute, the match will be performed every minute until the match is successful.
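The thresholded retry logic described above can be sketched as follows; the threshold values n1 and n2 and the two helper callables are illustrative assumptions (the patent leaves both thresholds as tunable parameters, and a time-based variant would simply replace the point-count condition with a timer).

```python
def relocate_with_retries(point_cloud, map_data, extract_new_points, match,
                          n1=5000, n2=1000):
    """Match once the cloud has n1 points; on failure, keep extracting
    keyframes / new 3D points and retry after every n2 additional points.

    n1 = first number threshold, n2 = second number threshold (assumed
    example values); extract_new_points() returns newly triangulated
    points; match(cloud, map) returns True on a successful ICP match.
    """
    while len(point_cloud) < n1:
        point_cloud.extend(extract_new_points())
    while not match(point_cloud, map_data):
        added = 0
        while added < n2:                  # accumulate new 3D points
            new_pts = extract_new_points()
            point_cloud.extend(new_pts)
            added += len(new_pts)
    return point_cloud
```

The loop terminates only when `match` succeeds, mirroring the "keep trying until matching succeeds" behavior of the text.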

本示例性实施方式中,三维重建处理的环节对于整个重定位过程具有重要影响,通常也是制约重定位响应速度的主要因素。基于此,可以预先创建多个用于三维重建处理的线程,例如对于N核处理器的电子设备,可以创建N个线程。在进行重定位时,获取上述多个线程,通过每个线程分别对不同的两个关键帧图像进行三角化,从而能够实现并行处理,提高重定位的响应速度。In this exemplary embodiment, the three-dimensional reconstruction step has an important influence on the entire relocation process and is usually the main factor limiting the relocation response speed. Based on this, multiple threads for three-dimensional reconstruction processing can be created in advance; for example, for an electronic device with an N-core processor, N threads can be created. During relocation, the multiple threads are obtained, and each thread triangulates a different pair of key frame images, thereby realizing parallel processing and improving the response speed of relocation.

进一步的,在从视频流中提取关键帧图像后,可以将关键帧图像放置到关键帧队列中。由上述每个线程依次从关键帧队列中提取相邻两个关键帧图像进行三维重建处理。图6示出了通过三角化实现三维重建处理时,设置多线程的情况。如图6所示,设置线程1、线程2、线程3,当产生第一关键帧图像时,放入关键帧队列,随后放入第二关键帧图像,此时线程1从队列中提取第一关键帧图像和第二关键帧图像,进行三角化处理;线程2随后从队列中提取第三关键帧图像和第四关键帧图像,进行三角化处理……各个线程进行三角化处理后输出的数据,也可以进入一个队列(输出数据队列),然后再更新到点云数据中。通过这样的方式,实现了各线程的同步并行处理,且利用队列的方式实现了各线程的负载均衡,进一步提高了效率。Further, after key frame images are extracted from the video stream, they can be placed in a key frame queue, and each of the above threads in turn extracts two adjacent key frame images from the key frame queue for three-dimensional reconstruction processing. FIG. 6 shows the multithreading setup when three-dimensional reconstruction is realized through triangulation. As shown in FIG. 6, thread 1, thread 2, and thread 3 are set up. When the first key frame image is generated, it is put into the key frame queue, followed by the second key frame image; at this point, thread 1 extracts the first and second key frame images from the queue and triangulates them; thread 2 then extracts the third and fourth key frame images from the queue and triangulates them, and so on. The data output by each thread after triangulation can also enter a queue (an output data queue) before being merged into the point cloud data. In this way, synchronous parallel processing of the threads is realized, and the queues provide load balancing among the threads, further improving efficiency.
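The keyframe-queue and worker-thread arrangement of FIG. 6 can be sketched as follows. The triangulation itself is abstracted into a callable, the pairing of consecutive keyframes follows the figure, and the poison-pill shutdown is an assumed implementation detail.

```python
import queue
import threading

def reconstruction_worker(kf_queue, out_queue, triangulate_pair):
    # Pull keyframe pairs until a poison pill (None) signals shutdown.
    while True:
        pair = kf_queue.get()
        if pair is None:
            break
        out_queue.put(triangulate_pair(*pair))  # results go to the output queue

def run_parallel_reconstruction(keyframes, triangulate_pair, n_threads=3):
    kf_queue, out_queue = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=reconstruction_worker,
                                args=(kf_queue, out_queue, triangulate_pair))
               for _ in range(n_threads)]
    for w in workers:
        w.start()
    # Pair consecutive keyframes (1st, 2nd), (3rd, 4th), ... as in FIG. 6.
    for pair in zip(keyframes[0::2], keyframes[1::2]):
        kf_queue.put(pair)
    for _ in workers:
        kf_queue.put(None)          # one poison pill per worker
    for w in workers:
        w.join()
    results = []
    while not out_queue.empty():
        results.append(out_queue.get())
    return results
```

Because every worker pulls from the same queue, a slow pair does not stall the others, which is the load-balancing effect described above.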

综上所述,本示例性实施方式中,获取由单目相机采集的现实场景的视频流,从中提取关键帧图像,并根据任意两个关键帧图像之间的位姿变换参数进行三维重建处理,得到现实场景的点云数据,最后将现实场景的点云数据和预先获取的地图数据进行匹配,以确定单目相机的位姿。一方面,提供了一种通过单目相机即可实现的重定位方法,无需设置双目相机、深度传感器等其他硬件,实现成本较低。另一方面,本示例性实施方式的实现过程较为简单,且由单目相机所采集的视频与图像数量较少,使得重定位所涉及的数据处理量较低,具有较高的实用性。To sum up, in this exemplary embodiment, the video stream of the real scene captured by the monocular camera is obtained, key frame images are extracted from it, three-dimensional reconstruction processing is performed according to the pose transformation parameters between any two key frame images to obtain the point cloud data of the real scene, and finally the point cloud data of the real scene is matched with the pre-acquired map data to determine the pose of the monocular camera. On the one hand, a relocation method that can be realized with only a monocular camera is provided, without requiring additional hardware such as a binocular camera or a depth sensor, so the implementation cost is low. On the other hand, the implementation process of this exemplary embodiment is relatively simple, and the number of videos and images collected by the monocular camera is small, so the amount of data processing involved in relocation is low, giving the method high practicality.

图7示出了本公开示例性实施方式中的基于单目相机的重定位装置。如图7所示,该重定位装置700可以包括:Fig. 7 shows a relocation device based on a monocular camera in an exemplary embodiment of the present disclosure. As shown in Figure 7, the relocation device 700 may include:

视频流获取模块710,用于获取由单目相机采集的现实场景的视频流;Video stream obtaining module 710, for obtaining the video stream of the real scene collected by monocular camera;

关键帧提取模块720,用于从视频流中提取多个关键帧图像;A key frame extraction module 720, configured to extract a plurality of key frame images from the video stream;

三维重建模块730,用于根据任意两个关键帧图像之间的位姿变换参数,对该两个关键帧图像进行三维重建处理,得到现实场景的点云数据;The three-dimensional reconstruction module 730 is used to perform three-dimensional reconstruction processing on the two key frame images according to the pose transformation parameters between any two key frame images to obtain the point cloud data of the real scene;

点云匹配模块740,用于将现实场景的点云数据和预先获取的地图数据进行匹配,以确定所述单目相机的位姿。The point cloud matching module 740 is configured to match the point cloud data of the real scene with the pre-acquired map data to determine the pose of the monocular camera.

在一种可选的实施方式中,关键帧提取模块720,被配置为:In an optional implementation manner, the key frame extraction module 720 is configured to:

对视频流解码,依次得到连续多帧图像;Decode the video stream to obtain continuous multi-frame images in sequence;

根据当前帧图像相对于上一帧图像的位姿变换参数,确定当前帧图像与上一帧图像的相对运动距离;Determine the relative motion distance between the current frame image and the previous frame image according to the pose transformation parameters of the current frame image relative to the previous frame image;

如果上述相对运动距离处于预设数值范围内,则提取当前帧图像作为关键帧。If the above-mentioned relative movement distance is within a preset value range, the current frame image is extracted as a key frame.

在一种可选的实施方式中,三维重建模块730,被配置为:In an optional implementation manner, the three-dimensional reconstruction module 730 is configured to:

获取预先创建的多个线程,通过每个线程分别对不同的两个关键帧图像进行三维重建处理。Obtain multiple pre-created threads, and perform three-dimensional reconstruction processing on two different key frame images through each thread.

进一步,关键帧提取模块720在从视频流中提取关键帧图像后,将关键帧图像放置到关键帧队列中;每个线程依次从关键帧队列中提取相邻两个关键帧图像进行三维重建处理。Further, after the key frame extraction module 720 extracts the key frame image from the video stream, the key frame image is placed in the key frame queue; each thread sequentially extracts two adjacent key frame images from the key frame queue for 3D reconstruction processing .

在一种可选的实施方式中,三维重建模块730,被配置为:In an optional implementation manner, the three-dimensional reconstruction module 730 is configured to:

获取现实场景中的三维点在两个关键帧图像上的投影点;Obtain the projection points of the 3D points in the real scene on the two key frame images;

基于同一三维点对应的两个投影点的相机坐标,以及上述两个关键帧图像之间的位姿变换参数进行三角化处理,求解得到该三维点的空间坐标。Based on the camera coordinates of the two projection points corresponding to the same three-dimensional point, and the pose transformation parameters between the above two key frame images, triangulation is performed to obtain the spatial coordinates of the three-dimensional point.

在一种可选的实施方式中,三维重建模块730,还用于当判断三维点的梯度大于预设梯度阈值时,将三维点添加到现实场景的点云数据中。In an optional implementation manner, the 3D reconstruction module 730 is further configured to add the 3D point to the point cloud data of the real scene when it is judged that the gradient of the 3D point is greater than a preset gradient threshold.

在一种可选的实施方式中,点云匹配模块740,被配置为:In an optional implementation manner, the point cloud matching module 740 is configured to:

当现实场景的点云数据中三维点的数量达到第一数量阈值,或者达到预设周期时间时,将现实场景的点云数据和地图数据进行匹配;When the number of three-dimensional points in the point cloud data of the real scene reaches a first number threshold, or reaches a preset cycle time, match the point cloud data of the real scene with the map data;

如果匹配失败,则继续从视频流中提取关键帧图像,并根据所提取的关键帧图像向现实场景的点云数据中增加新的三维点;If the matching fails, continue to extract key frame images from the video stream, and add new 3D points to the point cloud data of the real scene according to the extracted key frame images;

直到新的三维点的数量达到第二数量阈值,或者达到下一预设周期时间时,再次将现实场景的点云数据和地图数据进行匹配。Until the number of new three-dimensional points reaches the second number threshold, or reaches the next preset cycle time, the point cloud data of the real scene is matched with the map data again.

In an optional implementation, the point cloud matching module 740 is configured to:

transform the pose of the point cloud data of the real scene using the iterative closest point (ICP) algorithm, so that the error between the transformed point cloud data and the map data converges; and

if the error is smaller than a preset error threshold, determine that the point cloud data of the real scene has been successfully matched to the map data.
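A point-to-point variant of the ICP idea referred to here can be sketched with brute-force nearest-neighbour correspondences and a Kabsch (SVD) alignment step. This is a generic textbook ICP, not the patent's specific implementation; the iteration limit and tolerance are assumed tuning parameters:

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rigid transform (Kabsch) mapping src onto dst."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:        # avoid a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(src, dst, max_iters=50, tol=1e-10):
    """Point-to-point ICP: alternate nearest-neighbour matching and rigid
    alignment until the mean residual error converges."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur, prev_err = src.copy(), np.inf
    for _ in range(max_iters):
        # brute-force nearest neighbours of each source point in the map cloud
        nn = np.linalg.norm(cur[:, None] - dst[None], axis=2).argmin(axis=1)
        R, t = best_rigid_transform(cur, dst[nn])
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
        err = np.mean(np.linalg.norm(cur - dst[nn], axis=1))
        if abs(prev_err - err) < tol:   # error has converged
            break
        prev_err = err
    return R_total, t_total, err
```

The final residual `err` plays the role of the preset error threshold test in the text: a small converged error indicates a successful match.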

Further, after determining that the point cloud data of the real scene and the map data are successfully matched, the point cloud matching module 740 determines the pose of the monocular camera in the world coordinate system from the pose transformation parameters of the point cloud data, where the world coordinate system is the coordinate system of the map data.
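One way to read this step: the rigid transform recovered by point cloud matching maps the local reconstruction frame into the map's world frame, so the camera's world pose is the composition of that transform with its local pose. A hypothetical sketch (function and argument names are assumptions):

```python
import numpy as np

def camera_pose_in_world(R_align, t_align, R_cam, t_cam):
    """Compose the point-cloud alignment (local frame -> map/world frame)
    with the camera pose expressed in the local reconstruction frame."""
    return R_align @ R_cam, R_align @ t_cam + t_align
```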

In an optional implementation, the key frame extraction module 720 is further configured to perform blur detection on the images in the video stream before extracting key frame images, so as to filter out blurred images.
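A common blur-detection heuristic consistent with this step is the variance of the Laplacian response: sharp frames have strong second derivatives, blurred frames do not. The operator and threshold below are illustrative assumptions, not specified by the patent:

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian response of a grayscale image;
    sharp images give large values, blurred images small ones."""
    lap = (-4.0 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())

def is_blurred(gray, threshold=100.0):
    """Flag a frame as blurred when its Laplacian variance falls below an
    assumed, scene-dependent threshold."""
    return laplacian_variance(gray) < threshold
```

Frames flagged as blurred would be dropped before key frame selection, so only well-focused images feed the reconstruction.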

The specific details of each module of the above apparatus have been described in the method embodiments; details not disclosed here can be found there and are therefore not repeated.

Those skilled in the art will understand that the various aspects of the present disclosure may be implemented as a system, a method, or a program product. Accordingly, aspects of the present disclosure may be embodied in the following forms: an entirely hardware implementation, an entirely software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software, which may collectively be referred to herein as a "circuit", "module", or "system".

Exemplary embodiments of the present disclosure also provide a computer-readable storage medium storing a program product capable of implementing the methods described above in this specification. In some possible implementations, aspects of the present disclosure may also be implemented as a program product comprising program code which, when the program product runs on an electronic device, causes the electronic device to execute the steps of the various exemplary embodiments described in the "Exemplary Methods" section above, such as the method steps shown in FIG. 3 or FIG. 4. The program product may take the form of a portable compact disc read-only memory (CD-ROM) containing program code, and may run on an electronic device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium containing or storing a program that can be used by, or in combination with, an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more conductors, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device.

Program code contained on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).

It should be noted that although several modules or units of the apparatus for action execution are mentioned in the detailed description above, this division is not mandatory. Indeed, according to exemplary embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in a single module or unit; conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.

Other embodiments of the present disclosure will readily occur to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be considered exemplary only, with the true scope and spirit of the disclosure indicated by the appended claims.

It should be understood that the present disclosure is not limited to the precise constructions described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (12)

1. A monocular-camera-based relocalization method, comprising:

acquiring a video stream of a real scene captured by a monocular camera;

extracting a plurality of key frame images from the video stream;

performing three-dimensional reconstruction on any two key frame images according to pose transformation parameters between the two key frame images, to obtain point cloud data of the real scene; and

matching the point cloud data of the real scene against pre-acquired map data to determine the pose of the monocular camera;

wherein matching the point cloud data of the real scene against the pre-acquired map data comprises:

when the number of three-dimensional points in the point cloud data of the real scene reaches a first quantity threshold, or a preset cycle time elapses, matching the point cloud data of the real scene against the map data;

if the matching fails, continuing to extract key frame images from the video stream and adding new three-dimensional points to the point cloud data of the real scene based on the extracted key frame images; and

when the number of new three-dimensional points reaches a second quantity threshold, or the next preset cycle time elapses, matching the point cloud data of the real scene against the map data again.

2. The method according to claim 1, wherein extracting a plurality of key frame images from the video stream comprises:

decoding the video stream to obtain consecutive frames of images in sequence;

determining the relative movement distance between the current frame image and the previous frame image according to the pose transformation parameters of the current frame image relative to the previous frame image; and

extracting the current frame image as a key frame image when the relative movement distance is within a preset numerical range.

3. The method according to claim 1, wherein, when performing three-dimensional reconstruction on the two key frame images, the method further comprises:

acquiring a plurality of pre-created threads, each thread performing three-dimensional reconstruction on a different pair of key frame images.

4. The method according to claim 3, wherein, after a key frame image is extracted from the video stream, the key frame image is placed into a key frame queue; and

each thread in turn takes two adjacent key frame images from the key frame queue for three-dimensional reconstruction.

5. The method according to claim 1, wherein performing three-dimensional reconstruction on the two key frame images according to the pose transformation parameters between them, to obtain the point cloud data of the real scene, comprises:

obtaining the projection points of a three-dimensional point in the real scene on the two key frame images; and

performing triangulation based on the camera coordinates of the two projection points corresponding to the same three-dimensional point and the pose transformation parameters between the two key frame images, solving for the spatial coordinates of that three-dimensional point.

6. The method according to claim 5, wherein performing three-dimensional reconstruction on the two key frame images further comprises:

adding the three-dimensional point to the point cloud data of the real scene when the gradient of the three-dimensional point is determined to be greater than a preset gradient threshold.

7. The method according to claim 1, wherein matching the point cloud data of the real scene against the pre-acquired map data comprises:

transforming the pose of the point cloud data of the real scene using the iterative closest point (ICP) algorithm, so that the error between the transformed point cloud data and the map data converges; and

determining that the point cloud data of the real scene and the map data are successfully matched if the error is smaller than a preset error threshold.

8. The method according to claim 7, wherein, after it is determined that the point cloud data of the real scene and the map data are successfully matched, the pose of the monocular camera in a world coordinate system is determined from the pose transformation parameters of the point cloud data, the world coordinate system being the coordinate system of the map data.

9. The method according to claim 1, wherein, before the key frame images are extracted from the video stream, the method further comprises:

performing blur detection on the images in the video stream to filter out blurred images.

10. A monocular-camera-based relocalization apparatus, comprising:

a video stream acquisition module, configured to acquire a video stream of a real scene captured by a monocular camera;

a key frame extraction module, configured to extract a plurality of key frame images from the video stream;

a three-dimensional reconstruction module, configured to perform three-dimensional reconstruction on any two key frame images according to pose transformation parameters between the two key frame images, to obtain point cloud data of the real scene; and

a point cloud matching module, configured to match the point cloud data of the real scene against pre-acquired map data to determine the pose of the monocular camera;

wherein the point cloud matching module is configured to:

when the number of three-dimensional points in the point cloud data of the real scene reaches a first quantity threshold, or a preset cycle time elapses, match the point cloud data of the real scene against the map data;

if the matching fails, continue extracting key frame images from the video stream and add new three-dimensional points to the point cloud data of the real scene based on the extracted key frame images; and

when the number of new three-dimensional points reaches a second quantity threshold, or the next preset cycle time elapses, match the point cloud data of the real scene against the map data again.

11. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 9.

12. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to execute the method according to any one of claims 1 to 9 by executing the executable instructions.
CN202010373453.4A (priority: 2020-05-06; filed: 2020-05-06) — Repositioning method and device based on monocular camera, storage medium and electronic equipment — Status: Active

Priority Applications (1)

- CN202010373453.4A — Priority date: 2020-05-06; Filing date: 2020-05-06; Title: Repositioning method and device based on monocular camera, storage medium and electronic equipment

Publications (2)

- CN111652933A — published 2020-09-11
- CN111652933B — granted 2023-08-04

Family

ID=72348255

Family Applications (1)

- CN202010373453.4A — Active — priority 2020-05-06, filed 2020-05-06 — Repositioning method and device based on monocular camera, storage medium and electronic equipment

Country Status (1)

- CN — CN111652933B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party

- CN114820776A* — priority 2021-01-29, published 2022-07-29 — 北京外号信息技术有限公司 — Method and electronic device for obtaining information of objects in scene
- CN112837424B* — priority 2021-02-04, published 2024-02-06 — 脸萌有限公司 — Image processing method, apparatus, device and computer readable storage medium
- CN114088099B* — priority 2021-11-18, published 2024-06-25 — 北京易航远智科技有限公司 — Semantic repositioning method and device based on known map, electronic equipment and medium
- CN114413915A* — priority 2022-01-21, published 2022-04-29 — 北京三快在线科技有限公司 — Map construction method and device
- CN116503474B* — priority 2023-02-08, published 2025-10-03 — 腾讯科技(深圳)有限公司 — Position acquisition method, device, electronic device, storage medium and program product
- CN116188583B* — priority 2023-04-23, published 2023-07-14 — 禾多科技(北京)有限公司 — Method, device, device, and computer-readable medium for generating camera pose information

Citations (17)

* Cited by examiner, † Cited by third party

- CN105953796A* — priority 2016-05-23, published 2016-09-21 — 北京暴风魔镜科技有限公司 — Stable motion tracking method and stable motion tracking device based on integration of simple camera and IMU (inertial measurement unit) of smart cellphone
- CN107610175A* — priority 2017-08-04, published 2018-01-19 — 华南理工大学 — The monocular vision SLAM algorithms optimized based on semi-direct method and sliding window
- CN107990899A* — priority 2017-11-22, published 2018-05-04 — 驭势科技(北京)有限公司 — A kind of localization method and system based on SLAM
- CN109961506A* — priority 2019-03-13, published 2019-07-02 — 东南大学 — A kind of fusion improves the local scene three-dimensional reconstruction method of Census figure
- CN110132242A* — priority 2018-02-09, published 2019-08-16 — 驭势科技(北京)有限公司 — Multiple-camera positions and the Triangulation Algorithm and its movable body of map structuring immediately
- CN110322500A* — priority 2019-06-28, published 2019-10-11 — Oppo广东移动通信有限公司 — Immediately optimization method and device, medium and the electronic equipment of positioning and map structuring
- CN110335316A* — priority 2019-06-28, published 2019-10-15 — Oppo广东移动通信有限公司 — Pose determination method, device, medium and electronic device based on depth information
- CN110349213A* — priority 2019-06-28, published 2019-10-18 — Oppo广东移动通信有限公司 — Method, apparatus, medium and electronic equipment are determined based on the pose of depth information
- CN110349212A* — priority 2019-06-28, published 2019-10-18 — Oppo广东移动通信有限公司 — Immediately optimization method and device, medium and the electronic equipment of positioning and map structuring
- CN110555901A* — priority 2019-09-05, published 2019-12-10 — 亮风台(上海)信息科技有限公司 — Method, device, equipment and storage medium for positioning and mapping dynamic and static scenes
- CN110568447A* — priority 2019-07-29, published 2019-12-13 — 广东星舆科技有限公司 — Visual positioning method, device and computer readable medium
- CN110675450A* — priority 2019-09-06, published 2020-01-10 — 武汉九州位讯科技有限公司 — Method and system for generating orthoimage in real time based on SLAM technology
- CN110766716A* — priority 2019-09-10, published 2020-02-07 — 中国科学院深圳先进技术研究院 — Information acquisition method and system for unknown moving target in space
- CN110807809A* — priority 2019-10-25, published 2020-02-18 — 中山大学 — Light-weight monocular vision positioning method based on point-line characteristics and depth filter
- CN110827395A* — priority 2019-09-09, published 2020-02-21 — 广东工业大学 — A real-time positioning and map construction method suitable for dynamic environment
- CN110866953A* — priority 2019-10-31, published 2020-03-06 — Oppo广东移动通信有限公司 — Map construction method and device, and positioning method and device
- CN110992487A* — priority 2019-12-10, published 2020-04-10 — 南京航空航天大学 — Fast 3D map reconstruction device and reconstruction method for handheld aircraft fuel tank


Also Published As

- CN111652933A — published 2020-09-11

Similar Documents

- CN111652933B — Repositioning method and device based on monocular camera, storage medium and electronic equipment
- CN112269851B — Map data updating method, device, storage medium and electronic device
- CN112927362B — Map reconstruction method and device, computer readable medium and electronic device
- CN111598776B — Image processing method, image processing device, storage medium and electronic apparatus
- CN111580765B — Screen projection method, screen projection device, storage medium, screen projection equipment and screen projection equipment
- CN111429517A — Relocation method, relocation device, storage medium and electronic device
- CN113096185B — Visual positioning method, visual positioning device, storage medium and electronic equipment
- CN112270710B — Pose determining method, pose determining device, storage medium and electronic equipment
- TWI808987B — Apparatus and method of five dimensional (5D) video stabilization with camera and gyroscope fusion
- CN112381828B — Positioning method, device, medium and equipment based on semantic and depth information
- CN111784614A — Image denoising method and device, storage medium and electronic device
- CN112270754A — Local grid map construction method and device, readable medium and electronic device
- TW202112138A — Method and system for processing input video
- WO2021233032A1 — Video processing method, video processing apparatus, and electronic device
- CN112270755A — Three-dimensional scene construction method and device, storage medium and electronic equipment
- CN112270702B — Volume measurement method and device, computer readable medium and electronic equipment
- CN112927271B — Image processing method, image processing device, storage medium and electronic device
- CN112348738B — Image optimization method, image optimization device, storage medium and electronic equipment
- CN111741303B — Deep video processing method and device, storage medium and electronic equipment
- CN111835973A — Shooting method, shooting device, storage medium and mobile terminal
- CN111784734A — Image processing method and device, storage medium and electronic device
- CN116310105B — Multi-view based object three-dimensional reconstruction method, device, equipment and storage medium
- CN114429495A — A method and electronic device for reconstructing a three-dimensional scene
- US20250308062A1 — Image rendering method and apparatus, electronic device, and storage medium
- CN116486008B — Three-dimensional reconstruction method, display method and electronic equipment

Legal Events

- PB01 — Publication
- SE01 — Entry into force of request for substantive examination
- GR01 — Patent grant
