Technical Field

The invention belongs to the technical field of three-dimensional stereoscopic imaging, and in particular relates to a virtual-real fusion image generation method for stereoscopic display.
Background Art

Stereoscopic display technology has become a central theme in the IT, communications, and broadcasting industries and a familiar term to the public. As stereoscopic display technology matures, enthusiasm for and expectations of 3D content keep rising, and growing market demand is driving the search for more convenient, faster, and lower-cost ways of generating 3D content. Virtual-real fusion refers to the technology of integrating virtual information into the real world by computer, and it has broad application prospects in fields such as medicine, entertainment, and the military. Combining stereoscopic display with virtual-real fusion is a natural trend: stereoscopic display provides a better way to present virtual-real fusion, while virtual-real fusion in turn offers a new approach to 3D content production.
For a monocular camera, the key to 3D content production lies in the virtual viewpoint generation algorithm. Virtual viewpoint generation algorithms fall into two categories: model-based virtual viewpoint rendering and image-based virtual viewpoint rendering.

Model-based virtual viewpoint rendering first reconstructs a 3D model of the captured scene using computer vision techniques and then renders new virtual viewpoint images with computer 3D graphics rendering. This approach produces good virtual viewpoint images for scenes with simple structure, but for complex scenes building an accurate 3D model is extremely difficult and demands very large amounts of computation and data, so it is not suitable for rendering virtual viewpoints of natural real-world scenes.

Image-based virtual viewpoint rendering does not require an accurate 3D scene model; it directly uses real images captured by the camera and maps them to new virtual viewpoint images through a binocular or multi-view camera model. Compared with model-based rendering it has many advantages, such as a small amount of input data, simple image acquisition, and fast rendering, which makes it well suited to natural 3D scenes; however, filling the holes caused by occlusion in regions of large disparity remains difficult.
Camera tracking is the most critical technology in virtual-real fusion: the system must compute the camera's position accurately in real time. In a virtual-real fusion system, camera tracking directly determines whether virtual objects are always placed at the correct position, and thus the stability and realism of the fusion result. Early virtual-real fusion systems used marker-based camera tracking, estimating the camera's motion and pose from special patterns placed in the scene as markers. This approach is relatively simple, but because tracking depends on special markers, it is not applicable in many scenarios.

In the 1990s, Smith, Cheeseman, and others proposed estimation-theoretic solutions to simultaneous localization and mapping (SLAM), which extract image feature points to build a sparse feature point cloud and then use that point cloud for camera tracking. Building on this, many camera tracking schemes based on monocular RGB cameras have emerged, such as the MonoSLAM system proposed by Davison and the PTAM (Parallel Tracking and Mapping) algorithm proposed by Klein et al., enabling flexible markerless 3D registration; however, the tracking accuracy is still not high enough.
Summary of the Invention

In view of the above technical problems in the prior art, the present invention provides a virtual-real fusion image generation method for stereoscopic display that achieves high camera tracking accuracy and virtual object registration accuracy and can better handle the holes that arise in image-based virtual viewpoint rendering.
A virtual-real fusion image generation method for stereoscopic display comprises the following steps:

(1) Use a monocular RGB-D (red, green, and blue plus depth) camera to acquire the depth map Dr_k and color map Cr_k of the current frame of the scene from the monocular RGB-D camera viewpoint;

(2) Use the 3D scene reconstruction model of the previous frame to determine the monocular RGB-D camera parameters of the current frame, and update the 3D scene reconstruction model with the depth map Dr_k and color map Cr_k to obtain the 3D scene reconstruction model of the current frame;

(3) With the acquired depth map Dr_k and color map Cr_k, and using the current frame's 3D scene reconstruction model as guidance, map through the binocular camera model to obtain the depth map Dv_k and color map Cv_k of the current frame of the scene from the virtual camera viewpoint;

(4) Perform 3D registration of the virtual object and render it to obtain depth maps and color maps of the virtual object from the monocular RGB-D camera viewpoint and the virtual camera viewpoint; use the two viewpoints' depth maps of the scene and the virtual object for occlusion judgment and collision detection, and fuse the two viewpoints' color maps of the scene and the virtual object to obtain a virtual-real fusion image for stereoscopic display.
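Read as a whole, steps (1) to (4) form a per-frame loop. The following Python sketch only illustrates that control flow; every function and parameter name in it (grab_depth_and_color, track_camera, render_virtual_view, composite, shift_pose, the 0.06 m baseline) is a hypothetical placeholder, not an interface defined by the method.

```python
def process_frame(camera, model, virtual_objects, prev_pose, baseline=0.06):
    # (1) acquire the depth map Dr_k and color map Cr_k from the RGB-D viewpoint
    D_rk, C_rk = camera.grab_depth_and_color()

    # (2) track the camera against the previous frame's reconstruction, then
    #     fuse the static part of the current frame into the model
    pose_k = track_camera(model, D_rk, prev_pose)
    model.integrate(D_rk, C_rk, pose_k)

    # (3) synthesize the virtual viewpoint (Dv_k, Cv_k), guided by the model
    D_vk, C_vk = render_virtual_view(model, D_rk, C_rk, pose_k, baseline)

    # (4) register/render virtual objects in both views and composite them,
    #     using per-pixel depth comparison for occlusion and collision tests
    left = composite(C_rk, D_rk, virtual_objects, pose_k)
    right = composite(C_vk, D_vk, virtual_objects, shift_pose(pose_k, baseline))
    return left, right, pose_k
```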
The specific process of step (2) is as follows:

2.1 Extract, from the 3D scene reconstruction model of the previous frame, the depth map Dr_k-1 of the previous frame of the scene from the monocular RGB-D camera viewpoint;

2.2 Match the current-frame depth map Dr_k against the previous-frame depth map Dr_k-1 and compute the monocular RGB-D camera parameters of the current frame;

2.3 Filter the outliers of the matching process to obtain the moving-object region, and use this region as a mask to separate moving objects from the static background in the current-frame depth map Dr_k and color map Cr_k;

2.4 Using the monocular RGB-D camera parameters of the current frame and the depth and color information of the static scene in the current frame, update the previous frame's 3D scene reconstruction model with a volume integration algorithm to obtain the 3D scene reconstruction model of the current frame.
Preferably, the Raycast algorithm is used to extract the depth map Dr_k-1 of the previous frame of the scene from the monocular RGB-D camera viewpoint out of the previous frame's 3D scene reconstruction model.
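As a minimal sketch of the raycasting idea, each pixel's ray can be marched through a TSDF volume until the signed distance changes from positive to negative. The fixed step size, nearest-voxel lookup (no trilinear interpolation), and array layout below are simplifying assumptions, not the implementation of the method.

```python
import numpy as np

def raycast_depth(tsdf, voxel_size, origin, K, pose, H, W,
                  near=0.4, far=4.0, step=0.004):
    """Ray-march a TSDF volume; return per-pixel ray distance of the first
    positive-to-negative zero crossing (0 where no surface is hit)."""
    depth = np.zeros((H, W), np.float32)
    Kinv, R, c = np.linalg.inv(K), pose[:3, :3], pose[:3, 3]
    for v in range(H):
        for u in range(W):
            ray = R @ (Kinv @ np.array([u + 0.5, v + 0.5, 1.0]))
            ray /= np.linalg.norm(ray)
            prev = None
            for t in np.arange(near, far, step):
                idx = tuple(((c + t * ray - origin) / voxel_size).astype(int))
                if not all(0 <= i < s for i, s in zip(idx, tsdf.shape)):
                    continue
                d = tsdf[idx]
                if prev is not None and prev > 0 >= d:   # surface crossed
                    depth[v, u] = t
                    break
                prev = d
    return depth
```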
Preferably, the ICP (Iterative Closest Point) algorithm is used to match the depth map Dr_k with the depth map Dr_k-1.

The outliers are the pixels in the current-frame depth map Dr_k that are not matched to the previous-frame depth map Dr_k-1.
In step 2.3, outliers lying on object edges in the scene, outliers for which the monocular RGB-D camera could not obtain a depth value, and sporadic small clusters of outliers are filtered out of the outliers of the current-frame depth map Dr_k, leaving the moving-object region.
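One possible realization of this filtering, using OpenCV, is sketched below; the edge test, kernel sizes, and area threshold are illustrative choices, not values fixed by the method.

```python
import cv2
import numpy as np

def moving_object_mask(outlier_mask, depth, min_area=500):
    """Keep only outlier blobs that plausibly correspond to a moving object.

    outlier_mask : uint8 (H, W), 1 where ICP failed to match the pixel
    depth        : float32 (H, W), 0 where the sensor returned no depth
    """
    mask = outlier_mask.copy()
    # drop outliers where the camera produced no depth value
    mask[depth <= 0] = 0
    # drop outliers sitting on depth discontinuities (object edges)
    depth_u8 = cv2.convertScaleAbs(depth, alpha=255.0 / max(depth.max(), 1e-6))
    edges = cv2.dilate(cv2.Canny(depth_u8, 50, 150), np.ones((5, 5), np.uint8))
    mask[edges > 0] = 0
    # morphological opening removes sporadic small clusters
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    # keep only connected components above an area threshold
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    keep = np.zeros_like(mask)
    for i in range(1, n):
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            keep[labels == i] = 1
    return keep
```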
The specific process of step (3) is as follows:

3.1 Substitute the monocular RGB-D camera parameters of the current frame into the binocular camera model to compute the virtual camera parameters of the current frame, and use these virtual camera parameters to extract, from the current frame's 3D scene reconstruction model, the depth map Dv1_k and color map Cv1_k of the current frame of the scene from the virtual camera viewpoint;

3.2 According to the binocular camera model, map the current-frame depth map Dr_k to obtain the depth map Dv2_k of the current frame of the scene from the virtual camera viewpoint;

3.3 Fill the resampling holes in the mapped current-frame depth map Dv2_k;

3.4 According to the filled depth map Dv2_k and the binocular camera model, map the current-frame color map Cr_k to obtain the color map Cv2_k of the current frame of the scene from the virtual camera viewpoint;

3.5 Use the extracted current-frame depth map Dv1_k and color map Cv1_k to fill the occlusion holes in the mapped current-frame depth map Dv2_k and color map Cv2_k, finally obtaining the depth map Dv_k and color map Cv_k of the current frame of the scene from the virtual camera viewpoint.
Preferably, the Raycast algorithm is used to extract the depth map Dv1_k and color map Cv1_k of the current frame of the scene from the virtual camera viewpoint out of the current frame's 3D scene reconstruction model.
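The binocular camera model in 3.1 can be made concrete, under the common assumption of a rectified parallel stereo rig, by offsetting the real camera's pose along its own x axis; the 0.06 m baseline below is only an example value, not one specified by the method.

```python
import numpy as np

def virtual_camera_pose(pose_real, baseline=0.06):
    """Virtual (right-eye) camera-to-world pose for a parallel stereo rig.

    pose_real : 4x4 camera-to-world matrix of the RGB-D camera
    baseline  : horizontal eye separation in metres (illustrative value)
    """
    offset = np.eye(4)
    offset[0, 3] = baseline          # shift along the camera's own x axis
    return pose_real @ offset

def disparity_from_depth(depth, fx, baseline=0.06):
    """Horizontal pixel shift used when warping the real view to the virtual view."""
    disp = np.zeros_like(depth)
    valid = depth > 0
    disp[valid] = fx * baseline / depth[valid]
    return disp
```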
The present invention captures the scene with a monocular RGB-D camera and reconstructs the 3D scene model frame by frame; the model is used simultaneously for camera tracking and virtual viewpoint mapping. The method achieves high camera tracking accuracy and virtual object registration accuracy, handles the holes that arise in image-mapping-based virtual viewpoint rendering, and supports occlusion judgment and collision detection between virtual and real scenes; with a 3D stereoscopic display device, a realistic stereoscopic display effect can be obtained.
Brief Description of the Drawings

FIG. 1 is a schematic diagram of the processing flow of the camera tracking module of the present invention.

FIG. 2 is a schematic diagram of the processing flow of the virtual viewpoint rendering module of the present invention.
Detailed Description of the Embodiments

To describe the present invention more specifically, the technical solution of the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.

The virtual-real fusion image generation method for stereoscopic display of the present invention comprises the following steps:
(1) Acquire scene depth information and color information with a monocular RGB-D camera.

(2) Use the camera tracking module to determine the camera parameters of each frame from the 3D scene reconstruction model, while integrating the scene depth and color information into the 3D scene reconstruction model frame by frame.
2.1 Use the Raycast algorithm to extract the previous frame's depth map from the 3D scene reconstruction model according to the saved camera pose of the previous frame;

2.2 Preprocess the current frame's depth map. Use the ICP algorithm to match the depth maps of the previous and current frames, compute the camera motion from the previous frame to the current frame, and from it the camera parameters of the current frame;

2.3 Filter the outliers of the matching process to obtain the moving-object region, and use this region as a mask to separate moving objects from the static background in the current-frame depth map Dr_k and color map Cr_k;

2.4 Use the volume integration algorithm to integrate the depth and color information of the static scene in the current frame into the 3D scene reconstruction model according to the current frame's camera parameters. The model is a cube in 3D space composed of many small cubes (voxels) of uniform size, each storing the weighted TSDF value and weighted color value of the spatial position it represents.
As shown in FIG. 1, the camera tracking module adopts a model-based camera tracking method: it uses the 3D scene surface model reconstructed frame by frame as the matching target and separates moving objects from the matching outliers, improving the robustness of camera tracking against interference. In this embodiment a monocular RGB-D camera is used as the acquisition device, and only depth information is used for matching during camera tracking. The camera is first calibrated to obtain its intrinsic parameters. After the depth information of each frame is acquired, the depth map is denoised; this embodiment uses a bilateral filter. Using the camera intrinsics, the depth map can be back-projected into a 3D point cloud in the camera coordinate system. Assuming the camera moves smoothly, the ICP algorithm can quickly match the current frame's point cloud against the previous frame's, yielding the relative camera motion between the two frames, from which the current frame's camera parameters are computed using the previous frame's. The ICP algorithm uses the following point-to-plane distance energy:
E(T_g,k) = Σ_u ‖ ( T_g,k · V_k(u) − V_k-1(u) ) · N_k-1(u) ‖²

where V_k and V_k-1 are the vertex maps of the current and previous frames' 3D point clouds, N_k-1 is the normal map of the previous frame's 3D point cloud, and T_g,k is the camera motion matrix between the two frames.
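For illustration, this energy can be evaluated as follows in NumPy; the per-pixel identity correspondence and the absence of outlier rejection are simplifications of the projective data association actually used by the tracking module.

```python
import numpy as np

def point_to_plane_energy(T_gk, V_k, V_prev, N_prev, valid):
    """Evaluate the point-to-plane ICP energy for a candidate camera motion T_gk.

    T_gk            : 4x4 rigid motion applied to the current frame's vertices
    V_k             : (H, W, 3) current vertex map
    V_prev, N_prev  : (H, W, 3) previous vertex and normal maps
    valid           : (H, W) boolean mask of pixels with depth in both frames
    """
    Vt = V_k @ T_gk[:3, :3].T + T_gk[:3, 3]             # transform current vertices
    r = np.einsum('hwc,hwc->hw', Vt - V_prev, N_prev)   # signed point-to-plane distance
    return np.sum(r[valid] ** 2)
```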
In addition, the outliers of the matching process are filtered by morphological operations to locate the moving objects, and this mask yields a scene depth map with motion interference removed. With the obtained camera parameters, the current frame's depth map can be re-projected into space, giving the spatial positions of the scene surface points captured in the current frame, and this depth information is integrated into the 3D scene surface model. The model is a cube in 3D space composed of many small cubes (voxels) of uniform size, each storing the weighted TSDF value and weighted color value of the spatial position it represents. The TSDF value represents the distance from that spatial position to the nearest physical surface; the value stored in each voxel is a weighted combination of the per-frame TSDF values, computed as follows:
d̄_k = (w_k-1 · d̄_k-1 + w_k′ · d_k) / (w_k-1 + w_k′)

w_k = w_k-1 + w_k′

where d̄_k-1 and d̄_k are the weighted TSDF values of the previous and current frames, d_k is the current frame's TSDF value, w_k-1 and w_k are the weights of the previous and current frames, and w_k′ is the weight added per frame, set to the constant 1 in this method. Colors are weighted in the same way as the TSDF values.
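A minimal per-voxel sketch of this running weighted average follows, with TSDF and color fused using the same weights; the computation of each frame's TSDF samples and the truncation step are assumed to happen elsewhere.

```python
import numpy as np

def fuse_tsdf(tsdf_avg, weight, color_avg, tsdf_new, color_new, valid, w_new=1.0):
    """Update per-voxel weighted TSDF and color with one new frame.

    tsdf_avg, weight, color_avg : running model state (arrays over the voxel grid)
    tsdf_new, color_new         : this frame's TSDF / color samples for the same voxels
    valid                       : boolean mask of voxels observed in this frame
    """
    w_prev = weight[valid]
    tsdf_avg[valid] = (w_prev * tsdf_avg[valid] + w_new * tsdf_new[valid]) \
                      / (w_prev + w_new)
    color_avg[valid] = (w_prev[..., None] * color_avg[valid] + w_new * color_new[valid]) \
                       / (w_prev[..., None] + w_new)
    weight[valid] = w_prev + w_new
    return tsdf_avg, weight, color_avg
```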
(3) Use the virtual viewpoint rendering module to obtain the depth map and color map at the virtual viewpoint position from the depth map and color map acquired each frame, using the 3D scene reconstruction model as guidance and the binocular camera model for mapping.

3.1 According to the camera parameters of the binocular camera model, use the Raycast algorithm to extract from the 3D scene reconstruction model the depth map at the shooting position and the depth map and color map at the virtual viewpoint position;

3.2 Use the extracted depth map at the shooting position to fill the current frame's depth map, and, according to the binocular camera model, map the filled current-frame depth map to obtain the depth map at the virtual viewpoint position;

3.3 Fill the resampling holes in the mapped depth map at the virtual viewpoint position;

3.4 Using the filled virtual-viewpoint depth map and the camera parameters of the binocular camera, map the current frame's color map to obtain the color map at the virtual viewpoint position;

3.5 Use the depth map and color map extracted from the model to fill the remaining holes in the virtual-viewpoint depth map and color map, obtaining the final depth map and color map at the virtual viewpoint position.
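Steps 3.2 to 3.4 can be sketched, again assuming a rectified binocular geometry, as a forward depth warp, a median fill of isolated resampling holes, and an inverse color lookup; the window size, hole criterion, and nearest-surface z-buffer rule are illustrative choices rather than part of the disclosed method.

```python
import numpy as np

def warp_to_virtual_view(depth, color, fx, baseline=0.06):
    """Forward-warp depth to the virtual view, fill resampling holes,
    then pull colors back from the real view (rectified stereo case)."""
    H, W = depth.shape
    d_virt = np.zeros_like(depth)
    # 3.2 forward warping: each real pixel lands fx*baseline/z pixels to the left
    for v in range(H):
        for u in range(W):
            z = depth[v, u]
            if z <= 0:
                continue
            u2 = int(round(u - fx * baseline / z))
            if 0 <= u2 < W and (d_virt[v, u2] == 0 or z < d_virt[v, u2]):
                d_virt[v, u2] = z            # keep the nearest surface (z-buffer)
    # 3.3 resampling holes: fill isolated zeros with the median of valid neighbours
    filled = d_virt.copy()
    for v in range(1, H - 1):
        for u in range(1, W - 1):
            if d_virt[v, u] == 0:
                win = d_virt[v - 1:v + 2, u - 1:u + 2]
                vals = win[win > 0]
                if vals.size >= 5:
                    filled[v, u] = np.median(vals)
    # 3.4 inverse mapping: fetch each virtual pixel's color from the real view
    c_virt = np.zeros_like(color)
    for v in range(H):
        for u in range(W):
            z = filled[v, u]
            if z > 0:
                u_src = int(round(u + fx * baseline / z))
                if 0 <= u_src < W:
                    c_virt[v, u] = color[v, u_src]
    return filled, c_virt
```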
As shown in FIG. 2, the virtual viewpoint rendering module combines image-based and model-based virtual viewpoint rendering. The module contains an occlusion-hole filling unit, a depth map mapping unit, a resampling-hole filling unit, and a color map inverse-mapping unit. After a frame's depth map is acquired, the Raycast algorithm first extracts the current frame's depth map from the 3D scene reconstruction model as an auxiliary depth map, and the hole filling unit uses this auxiliary depth map as a reference to fill holes in the current frame's depth map. The depth map mapping unit then uses the binocular camera model to map the current frame's depth map to the virtual viewpoint position, producing the virtual-viewpoint depth map. The resampling-hole filling unit then fills the holes produced in the virtual-viewpoint depth map by resampling. The current frame's color map is passed through the color map inverse-mapping unit, using the inverse mapping determined by the filled virtual-viewpoint depth map, to obtain the virtual-viewpoint color map. Finally, the occlusion-hole filling unit extracts the depth map and color map at the virtual camera position from the 3D scene reconstruction model as references and fills the occlusion holes in the virtual-viewpoint depth map and color map, yielding a virtual-viewpoint depth map and color map without holes.
(4) Use the virtual-real fusion module to perform 3D registration of the virtual object and render the depth maps and color maps of the virtual object at the shooting position and the virtual camera position. Fuse the virtual and real images, using the depth information for occlusion judgment and collision detection, to obtain virtual-real fused content for stereoscopic display.
The virtual-real fusion module fuses the depth maps and color maps of the virtual object and the real scene at the two viewpoints, the real shooting position and the virtual camera position. The module includes a 3D registration unit, an occlusion judgment unit, a collision detection unit, and a virtual object control unit; each unit operates on both viewpoints simultaneously. The virtual object control unit listens for keyboard input so the virtual object can be scaled, translated, and rotated in the world coordinate system. The 3D registration unit computes the color map and depth map of the virtual object in each viewpoint's projection plane from the virtual object's position in the world coordinate system and the camera parameters of the two viewpoints. The occlusion judgment unit compares the depth values of the virtual object and the real scene at the same pixel to decide whether the virtual object or the real scene should be shown, producing a realistic occlusion effect. The method also obtains depth maps of both the front and back faces of the virtual object along the camera's viewing direction, and the collision detection unit determines whether a collision has occurred from the relationship between these two depth maps and the real scene depth map; collision locations are marked in red. Finally, depending on the actual application, the virtual-real fusion result can be displayed in various stereoscopic formats.
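A minimal sketch of the per-pixel occlusion judgment and collision flagging for one viewpoint is given below; the collision tolerance and the red marking are illustrative, and the actual module repeats this for both viewpoints.

```python
import numpy as np

def composite_view(scene_color, scene_depth, obj_color, obj_depth_front,
                   obj_depth_back, collision_eps=0.01):
    """Fuse one viewpoint's real and virtual layers with per-pixel occlusion,
    marking pixels where the virtual object intersects the real surface."""
    out = scene_color.copy()
    has_obj = obj_depth_front > 0
    has_scene = scene_depth > 0
    # occlusion: draw the virtual object only where it is in front of the scene
    obj_wins = has_obj & (~has_scene | (obj_depth_front < scene_depth))
    out[obj_wins] = obj_color[obj_wins]
    # collision: the real surface lies between the object's front and back faces
    collide = has_obj & has_scene & \
        (scene_depth > obj_depth_front - collision_eps) & \
        (scene_depth < obj_depth_back + collision_eps)
    out[collide] = (255, 0, 0)       # mark collision locations in red, as in the text
    return out, bool(collide.any())
```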