CN118037989A - Prior-driven multi-view neural implicit surface reconstruction method - Google Patents

Prior-driven multi-view neural implicit surface reconstruction method

Info

Publication number
CN118037989A
CN118037989A · CN118037989B (application CN202311806718.5A)
Authority
CN
China
Prior art keywords
image
ray
depth
view
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311806718.5A
Other languages
Chinese (zh)
Other versions
CN118037989B (en)
Inventor
陶文兵
苏婉娟
刘李漫
顾华领
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Tuke Intelligent Information Technology Co ltd
Original Assignee
Hangzhou Tuke Intelligent Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Tuke Intelligent Information Technology Co Ltd
Priority to CN202311806718.5A
Publication of CN118037989A
Application granted
Publication of CN118037989B
Legal status: Active


Abstract

The invention provides a prior-driven multi-view neural implicit surface reconstruction method that takes a set of scene images and the camera parameters of each image as input and reconstructs a surface model of the scene in an unsupervised manner through volume rendering and surface rendering techniques. The proposed method mainly comprises the following contributions: a visibility-aware feature consistency loss explicitly supervises the geometric field, alleviating its under-constrained optimization and improving the geometric consistency of the reconstructed surface; a depth-prior-assisted sampling strategy helps locate points on the surface efficiently; and an internal-prior-guided importance rendering strategy mines prior knowledge inside the network to guide its optimization, mitigating the biased surface rendering caused by volume rendering and improving the fidelity of the reconstructed surface model. The proposed strategies can be applied in many fields such as three-dimensional reconstruction, virtual reality, and smart cities.

Description

Translated from Chinese
A prior-driven multi-view neural implicit surface reconstruction method

Technical Field

The present invention relates to the field of computer vision technology, and more specifically to a prior-driven multi-view neural implicit surface reconstruction method.

Background Art

Multi-view neural implicit surface reconstruction methods aim to reconstruct a surface model of a scene from multi-view images with camera poses through volume rendering and surface rendering techniques. Because such methods can learn a three-dimensional representation of a scene in an unsupervised manner, multi-view surface reconstruction based on neural implicit learning has developed rapidly in recent years. Although neural implicit surface reconstruction has shown impressive results on simple scenes, it relies mainly on a color reconstruction loss for optimization, so the reconstruction of the geometric field is a by-product of color field reconstruction; that is, the geometric field is under-constrained during optimization. This makes it difficult for multi-view surface reconstruction methods based on neural implicit learning to handle complex real-world scenes. How to strengthen the optimization constraints on the geometric field so that high-quality three-dimensional representations of complex scenes can be reconstructed is therefore a key problem to be solved.

Summary of the Invention

The present invention overcomes the shortcomings of the prior art and provides a prior-driven multi-view neural implicit surface reconstruction method that solves the technical problem that existing methods struggle to recover surface models of complex scenes.

The technical solution of the present invention is as follows:

A prior-driven multi-view neural implicit surface reconstruction method comprises the following steps:

S1 prior extraction step: given K images of a scene and the camera parameters corresponding to each image, select each image in turn from the K scene images as the reference image, take its N-1 nearest images as neighborhood images, and perform prior extraction.

First, each reference image and its corresponding neighborhood images are input into a pre-trained multi-view stereo network to obtain the depth map D_MVS of the reference image, the visibility maps V_m of the neighborhood images, and the image features F_r and F_m of the reference and neighborhood images. This process is repeated until all images have been processed; afterwards, the unreliable depths in each depth map are filtered out according to photometric consistency and geometric consistency.

S2 image training step: for the K images of a scene, randomly select one image as the target image for training.

First, N_p pixels x are randomly selected in the chosen image; according to the camera parameters of that image, a ray r is cast from the camera center o through pixel x along the viewing direction v. Error-bounded sampling and depth-prior-assisted sampling are then used to sample P depth values on ray r, yielding P sampling points on the ray.

S3 encoding step: construct a multi-scale hash encoding and encode the three-dimensional points sampled in the S2 image training step.

S4 SDF processing step: for each three-dimensional point p sampled on the ray and its hash-encoded feature h(p), first encode p with a Fourier-series positional encoding to obtain the positional encoding feature γ(p); then concatenate γ(p) and h(p) and feed them into a geometry network f_θ composed of a two-layer multi-layer perceptron (MLP) to learn the SDF field, outputting the learned SDF value s and a 256-channel SDF feature.

S5 color processing step: feed the viewing direction v of ray r, the SDF feature learned by the geometry network, and the normal n of the three-dimensional point p into a color network composed of a two-layer MLP to model the color field, outputting the learned color value ĉ of each sampling point p on ray r.

S6 rendering step: convert the SDF values s learned in the S4 SDF processing step into volume density, and render the color values ĉ and normals n obtained in the S5 color processing step by volume rendering to obtain the color Ĉ(r), normal N(r), and depth D(r) of ray r.

S7 important-point rendering step: after the SDF values s and the rendered depth value D(r) of the current ray r have been obtained through the above steps, construct a sample set T_imp and perform the operations of steps S3 to S6 on it to obtain the importance-rendered color Ĉ_imp(r).

S8 supervision step: supervise the operations of steps S1 to S7 with a visibility-aware feature consistency loss L_vfc, a normal consistency loss L_normal, a geometric bias loss L_bias, a smoothness loss L_smooth, an Eikonal loss L_eikonal, and a color loss L_rgb.

S9 surface model acquisition step: during training, repeat steps S2 to S7 until a preset number of iterations is reached; after training, extract the surface model of the scene from the zero level set of the SDF field learned by the geometry network using the Marching Cubes algorithm.

Further, S2 image training specifically includes the following steps:

S201: for the K images of a scene, randomly select one image as the target image for training, and randomly select N_p pixels x in that image; according to the camera parameters of the image, cast a ray r from the camera center o through pixel x along the viewing direction v;

S202: use error-bounded sampling to draw P_E samples on ray r, obtaining the sample set T_E;

S203: use depth-prior-assisted sampling to draw P_MVS samples on ray r, obtaining the sample set T_MVS;

S204: combine the sample set T_E from error-bounded sampling with the sample set T_MVS from depth-prior-assisted sampling to obtain the final sample set, and convert each sampled value t_i in the final set into a three-dimensional point on the ray, p_i = o + t_i·v, with t_i ∈ [t_n, t_f]; t_n and t_f denote the near and far bounds of the ray.

Further, step S202 is specifically processed as follows:

First, the error threshold ε and the learnable parameter β are each randomly initialized to a value greater than 0, and uniform sampling over [t_n, t_f] yields the initial sample set T. Then β+ is initialized such that β+ > β, which makes the error bound B_{T,β+} on the estimated opacity satisfy B_{T,β+} ≤ ε.

Then, to reduce β+ while maintaining B_{T,β+} ≤ ε, T is upsampled to obtain a candidate sample set T_c. If B_{T,β} ≤ ε holds once T has been sufficiently upsampled, β+ is decreased toward β; the sampling algorithm keeps iterating while B_{T,β} > ε. By the intermediate value theorem there exists β* ∈ (β, β+) such that B_{T,β*} = ε, and a bisection method with at most 10 iterations is used to find β* and update β+. This upsampling and iteration loop runs until the maximum number of iterations is reached or B_{T,β+} ≤ ε.

Next, the final T and β are used to estimate the current opacity Ô(t), i.e.

Ô(t_k) = 1 − exp(−Σ_{i=1}^{k−1} σ_i·δ_i)

where δ_i = t_{i+1} − t_i and σ denotes the volume density, obtained from the signed distance function value s through the density conversion function σ_β(s):

σ_β(s) = (1/β)·Ψ_β(−s), where Ψ_β(u) = ½·e^(u/β) for u ≤ 0 and Ψ_β(u) = 1 − ½·e^(−u/β) for u > 0 is the CDF of the zero-mean Laplace distribution with scale β.

Finally, inverse transform sampling of Ô is applied to obtain the final sample set T_E.

Further, step S203 is specifically processed as follows:

First, based on the reliable depth map D_MVS obtained in step S1, the depth value d_MVS corresponding to pixel x in the target image is extracted and projected into 3D space to obtain p_MVS.

The 3D point p_MVS is then converted to its distance t_MVS from the camera center o along ray r:

t_MVS = ‖p_MVS − o‖₂

Next, P_MVS samples are drawn uniformly in an interval centred on t_MVS, yielding the sample set T_MVS, where H_t denotes the hypothesis interval between the i-th and (i+1)-th samples and P_t is a predefined scalar controlling the hypothesis interval H_t. For a pixel x without a reliable depth value d_MVS, P_MVS samples are instead drawn at random from the candidate sample set T_c produced in step S202 to construct T_MVS.

Further, the S3 encoding step is specifically processed as follows:

First, a multi-resolution feature grid with learnable parameters θ is constructed in space. It is divided into L levels, each holding T feature vectors, i.e. the hash table size of each level is T, and each feature vector has dimension F. Then, for each three-dimensional point p to be encoded, the encoded feature on the hash grid of each level is obtained by interpolation, and the per-level features are concatenated to form the hash-encoded feature vector h(p).

Further, the S7 important-point rendering step is specifically processed as follows:

S701: based on the learned SDF values s, locate the depth t* of the point p* lying on the zero level set of the SDF field by linear interpolation at the first sign change of the SDF along the ray:

t* = t_i − s_i·(t_{i+1} − t_i)/(s_{i+1} − s_i),

where [t_i, t_{i+1}] is the first interval whose SDF values s_i and s_{i+1} have opposite signs;

S702: uniformly sample Q points in [t_n, t_f]; the depths of these Q points, together with t* and D(r), form a new sample set T_imp;

S703: perform the operations of steps S3 to S6 on the sample set T_imp to obtain the importance-rendered color Ĉ_imp(r).

Further, the S8 supervision step specifically includes the following:

S801: compute the visibility-aware feature consistency loss. First, the position of the point p* is obtained from the depth t* found in S701. Then, based on the 3D point p* corresponding to pixel x in the current target view I_r, the corresponding pixel x_m in the m-th neighborhood view I_m is obtained through the plane-induced homography H_m determined by the plane through p* and the camera parameters of the two views; for the pixel x itself, this amounts to projecting p* into view m, x_m ∼ K_m·(R_m·p* + t_m),

where K_r, R_r, t_r and K_m, R_m, t_m denote the camera intrinsics, rotation, and translation of the reference view and the m-th neighborhood view, respectively.

Next, based on the image features of the reference and neighborhood views extracted from the pre-trained multi-view stereo network in step S1, the image feature of pixel x and that of its corresponding pixel x_m are extracted. To enhance the representation ability, the image feature F_r of the Q×Q patch q_r centered on pixel x and the corresponding feature F_m of the patch q_m matching q_r are extracted; the photometric consistency of F_r and F_m is measured by cosine similarity:

Sim(F_r, F_m) = ⟨F_r, F_m⟩ / (‖F_r‖·‖F_m‖).

Finally, to handle visibility, the visibility maps of the neighborhood views extracted from the pre-trained multi-view stereo network in step S1 are further introduced, giving the visibility-aware feature consistency loss L_vfc:

L_vfc = (1/(N_p·M))·Σ_x Σ_{m=1}^{M} V_m(x_m)·(1/N_q)·Σ_{j=1}^{N_q} (1 − Sim(F_r^j, F_m^j)),

where N_p and M denote the number of pixels in the mini-batch and the number of neighborhood views, respectively, and N_q denotes the number of pixels in the patch q_r.

S802: compute the normal consistency loss L_normal:

L_normal = Σ_r ( ‖N(r) − n_mono(r)‖₁ + ‖1 − N(r)ᵀ·n_mono(r)‖₁ ),

where N(r) denotes the normal of ray r obtained in step S6 and n_mono denotes the monocular normal extracted from the pre-trained Omnidata model.

S803: compute the geometric bias loss L_bias:

L_bias = (1/|S|)·Σ_{p*∈S} |f_θ(p*)|,

where p* denotes the position of a zero-level-set point of the learned SDF computed in step S801, S denotes the set of points p* in the mini-batch, |S| denotes their number, and |f_θ(p*)| denotes the absolute value of the SDF at p*.

S804: compute the smoothness loss L_smooth:

L_smooth = (1/|X|)·Σ_{p∈X} ‖n(p) − n(p + ε)‖₂,

where n(p) denotes the normal obtained in step S5, ε denotes a perturbation constant of very small value, and X is the set of uniformly sampled points and near-surface points.

S805: compute the Eikonal loss L_eikonal:

L_eikonal = (1/|X|)·Σ_{p∈X} ( ‖∇_p f_θ(p)‖₂ − 1 )².

S806: compute the color loss L_rgb:

L_rgb = (1/|R|)·Σ_{r∈R} ( ‖Ĉ(r) − C(r)‖₁ + ‖Ĉ_imp(r) − C(r)‖₁ ),

where Ĉ(r) and Ĉ_imp(r) denote the color values computed in step S6 and step S7, respectively, C(r) denotes the ground-truth color value, and R denotes the set of rays in the mini-batch.

Compared with the prior art, the present invention has the following advantages:

To solve the under-constrained optimization of the geometry network, the present invention proposes a visibility-aware feature consistency loss, which significantly improves the geometric consistency of the reconstructed surface model. In addition, a depth-prior-assisted sampling strategy is proposed to help locate points on the surface. To alleviate the biased surface rendering caused by volume rendering, an internal-prior-guided importance rendering is proposed, which improves the fidelity of the reconstructed surface model.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the overall architecture of the present invention.

Detailed Description of the Embodiments

Embodiments of the present invention are described in detail below, in which the same or similar reference numerals throughout denote the same or similar elements or elements with similar functions. The embodiments described below with reference to the accompanying drawings are exemplary; they are only intended to explain the present invention and shall not be construed as limiting it.

Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having meanings consistent with their meaning in the context of the prior art and, unless defined as herein, will not be interpreted in an idealized or overly formal sense.

The step numbers mentioned in the embodiments are only for convenience of description and imply no substantive order; the steps of the specific embodiments may be combined in different orders to achieve the purpose of the invention.

The present invention is further described below with reference to the accompanying drawings and specific embodiments.

As shown in FIG. 1, a prior-driven multi-view neural implicit surface reconstruction method includes the following steps:

S1 prior extraction step: given K images of a scene and the camera parameters corresponding to each image, select each image in turn from the K scene images as the reference image, take its N-1 nearest images as neighborhood images, and perform prior extraction.

First, each reference image and its corresponding neighborhood images are input into a pre-trained multi-view stereo network to obtain the depth map D_MVS of the reference image, the visibility maps V_m of the neighborhood images, and the image features F_r and F_m of the reference and neighborhood images. This process is repeated until all images have been processed, i.e. until the depth maps of all images are obtained; afterwards, the unreliable depths in each depth map are filtered out according to photometric consistency and geometric consistency, leaving only reliable depths.

Here, the pre-trained multi-view stereo network adopts the cascade architecture of CasMVSNet. When constructing the cost volume, the visibility-aware cost volume of PVSNet is used, and an uncertainty-aware loss supervises training, so that the uncertainty of each depth map is obtained in an unsupervised manner. More specifically, after the depth features F_r and F_m of the reference and neighborhood images are extracted by the multi-view image feature network, in the first stage of the cascade, L_1 depth hypotheses are uniformly sampled over the full scene depth range R_1. Then, under each depth hypothesis, the depth feature F_m of the m-th neighborhood view is warped to the reference view by a differentiable homography, and a group-wise correlation metric is used to build the two-view cost volume M_m. Next, for the m-th two-view cost volume M_m, a shallow 3D CNN estimates the visibility map V_m, and all two-view cost volumes are weighted and summed according to the visibility map V_m of each neighborhood view to obtain the final aggregated cost volume M. Finally, a 3D convolutional neural network regularizes M into a regularized cost volume with two channels; Softmax is applied to the two channels separately to obtain a depth probability volume and an uncertainty probability volume. From the depth probability volume, soft-argmax yields the depth map, and an entropy operation on the uncertainty probability volume yields the uncertainty map. In the second and third stages of the cascade, the depth hypothesis sampling range R_s of the stage is determined from the depth map estimated in the previous stage, and L_s depth hypotheses are uniformly sampled within that range. The cost volume construction and aggregation, cost volume regularization, and depth and uncertainty regression steps above are then repeated until the third-stage depth map D_MVS is obtained. During the training of the multi-view stereo network, the depth map of each stage is supervised with an uncertainty-aware loss function in which the ground-truth depth and the predicted depth of stage s are compared.

Photometric consistency filters the predicted depth maps using the confidence maps derived from the uncertainty estimated during depth prediction; the filtering threshold in all three stages is 0.6, i.e. a pixel of the predicted depth map D_MVS is considered reliable only if its confidence value exceeds 0.6 in the confidence maps of all three stages, and unreliable otherwise. Geometric consistency filters the predicted depth by computing reprojection errors: a depth is considered reliable if, in at least 5 views, its reprojected coordinate error is below 1 pixel and its reprojected depth error is below 0.01.
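The geometric-consistency check can be illustrated with a short sketch. The NumPy helper below is an illustration, not the patent's implementation: it assumes pinhole intrinsics K, 4×4 world-to-camera extrinsics T, nearest-neighbor depth lookups, and the thresholds quoted above (1-pixel reprojected coordinate error, 0.01 relative depth error, at least 5 agreeing views).

```python
import numpy as np

def geometric_consistency_mask(depth_r, K_r, T_r, neighbors,
                               px_thresh=1.0, d_thresh=0.01, min_views=5):
    """Mark a reference pixel's depth reliable if >= min_views neighbors agree.

    depth_r: (H, W) reference depth; K_*: 3x3 intrinsics;
    T_*: 4x4 world-to-camera extrinsics;
    neighbors: list of (depth_m, K_m, T_m) tuples.
    """
    H, W = depth_r.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1).astype(np.float64)

    # Back-project all reference pixels to world space.
    cam_r = np.linalg.inv(K_r) @ pix * depth_r.reshape(1, -1)
    world = np.linalg.inv(T_r) @ np.vstack([cam_r, np.ones((1, H * W))])

    votes = np.zeros(H * W, dtype=np.int64)
    for depth_m, K_m, T_m in neighbors:
        # Project into the neighbor view and sample its depth (nearest neighbor).
        cam_m = (T_m @ world)[:3]
        uvm = (K_m @ cam_m)[:2] / np.clip(cam_m[2], 1e-8, None)
        u = np.clip(np.round(uvm[0]).astype(int), 0, W - 1)
        v = np.clip(np.round(uvm[1]).astype(int), 0, H - 1)
        d_m = depth_m[v, u]

        # Back-project the neighbor's depth and re-project into the reference view.
        pix_m = np.stack([u, v, np.ones_like(u)]).astype(np.float64)
        cam_m2 = np.linalg.inv(K_m) @ pix_m * d_m
        world_m = np.linalg.inv(T_m) @ np.vstack([cam_m2, np.ones((1, H * W))])
        cam_r2 = (T_r @ world_m)[:3]
        uvr = (K_r @ cam_r2)[:2] / np.clip(cam_r2[2], 1e-8, None)

        # Round-trip coordinate error and relative depth error.
        px_err = np.hypot(uvr[0] - pix[0], uvr[1] - pix[1])
        d_err = np.abs(cam_r2[2] - depth_r.reshape(-1)) / np.clip(depth_r.reshape(-1), 1e-8, None)
        votes += ((px_err < px_thresh) & (d_err < d_thresh) & (cam_m[2] > 0)).astype(np.int64)

    return (votes >= min_views).reshape(H, W)
```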

S2 image training step: for the K images of a scene, randomly select one image as the target image for training.

First, N_p pixels x are randomly selected in the chosen image; according to the camera parameters of that image, a ray r is cast from the camera center o through pixel x along the viewing direction v. Error-bounded sampling and depth-prior-assisted sampling are then used to sample P depth values on ray r, yielding P sampling points on the ray.

S2 image training specifically includes the following steps:

S201: for the K images of a scene, randomly select one image as the target image for training, and randomly select N_p pixels x in that image; according to the camera parameters of the image, cast a ray r from the camera center o through pixel x along the viewing direction v;

S202: use error-bounded sampling to draw P_E samples on ray r, obtaining the sample set T_E. The specific processing is as follows:

First, the error threshold ε and the learnable parameter β are each randomly initialized to a value greater than 0, and uniform sampling over [t_n, t_f] yields the initial sample set T. Then β+ is initialized such that β+ > β, which makes the error bound B_{T,β+} on the estimated opacity satisfy B_{T,β+} ≤ ε.

Then, to reduce β+ while maintaining B_{T,β+} ≤ ε, T is upsampled to obtain a candidate sample set T_c. If B_{T,β} ≤ ε holds once T has been sufficiently upsampled, β+ is decreased toward β; the sampling algorithm keeps iterating while B_{T,β} > ε. By the intermediate value theorem there exists β* ∈ (β, β+) such that B_{T,β*} = ε, and a bisection method with at most 10 iterations is used to find β* and update β+. This upsampling and iteration loop runs until the maximum number of iterations is reached or B_{T,β+} ≤ ε.

Next, the final T and β are used to estimate the current opacity Ô(t), i.e.

Ô(t_k) = 1 − exp(−Σ_{i=1}^{k−1} σ_i·δ_i)

where δ_i = t_{i+1} − t_i and σ denotes the volume density, obtained from the Signed Distance Function (SDF) value s through the density conversion function σ_β(s):

σ_β(s) = (1/β)·Ψ_β(−s), where Ψ_β(u) = ½·e^(u/β) for u ≤ 0 and Ψ_β(u) = 1 − ½·e^(−u/β) for u > 0 is the CDF of the zero-mean Laplace distribution with scale β.

Finally, inverse transform sampling of Ô is applied to obtain the final sample set T_E.
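The density conversion and opacity estimate translate directly into code. The PyTorch sketch below assumes the Laplace-CDF form of σ_β given above; tensor shapes are illustrative.

```python
import torch

def sigma_beta(s, beta):
    """Density conversion sigma_beta(s) = (1/beta) * Psi_beta(-s), with Psi_beta
    the CDF of a zero-mean Laplace distribution of scale beta."""
    return (1.0 / beta) * torch.where(
        s >= 0,
        0.5 * torch.exp(-s / beta),
        1.0 - 0.5 * torch.exp(s / beta),
    )

def estimate_opacity(sdf, t_vals, beta):
    """Discrete opacity O(t_k) = 1 - exp(-sum_{i<k} sigma_i * delta_i) along rays.

    sdf, t_vals: (R, P) per-ray SDF values and sample depths."""
    delta = t_vals[..., 1:] - t_vals[..., :-1]        # delta_i = t_{i+1} - t_i
    sigma = sigma_beta(sdf[..., :-1], beta)
    return 1.0 - torch.exp(-torch.cumsum(sigma * delta, dim=-1))
```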

S203: use depth-prior-assisted sampling to draw P_MVS samples on ray r, obtaining the sample set T_MVS. The specific processing is as follows:

First, based on the reliable depth map D_MVS obtained in step S1, the depth value d_MVS corresponding to pixel x in the target image is extracted and projected into 3D space to obtain p_MVS.

The 3D point p_MVS is then converted to its distance t_MVS from the camera center o along ray r:

t_MVS = ‖p_MVS − o‖₂

Next, P_MVS samples are drawn uniformly in an interval centred on t_MVS, yielding the sample set T_MVS, where H_t denotes the hypothesis interval between the i-th and (i+1)-th samples and P_t is a predefined scalar controlling the hypothesis interval H_t. For a pixel x without a reliable depth value d_MVS, P_MVS samples are instead drawn at random from the candidate sample set T_c produced in step S202 to construct T_MVS.
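A minimal sketch of the depth-prior-assisted draw. The centring of the hypotheses on t_MVS and the spacing by h_t are assumptions; the patent only states that the samples are distributed uniformly and that P_t controls the hypothesis interval H_t.

```python
import torch

def depth_prior_samples(t_mvs, reliable, t_candidates, p_mvs, h_t):
    """Draw p_mvs depth samples guided by the MVS prior.

    If the pixel's MVS depth is reliable, place p_mvs hypotheses spaced h_t
    apart and centred on t_mvs; otherwise fall back to random picks from the
    error-bounded candidate set t_candidates."""
    if reliable:
        offsets = (torch.arange(p_mvs, dtype=torch.float32) - (p_mvs - 1) / 2.0) * h_t
        return t_mvs + offsets
    idx = torch.randperm(t_candidates.shape[0])[:p_mvs]
    return t_candidates[idx]
```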

S204: combine the sample set T_E from error-bounded sampling with the sample set T_MVS from depth-prior-assisted sampling to obtain the final sample set, and convert each sampled value t_i in the final set into a three-dimensional point on the ray, p_i = o + t_i·v, with t_i ∈ [t_n, t_f]; t_n and t_f denote the near and far bounds of the ray.

S3 encoding step: construct a multi-scale hash encoding and encode the three-dimensional points sampled in the S2 image training step. The specific processing is as follows:

First, a multi-resolution feature grid with learnable parameters θ is constructed in space. It is divided into L levels, each holding T feature vectors, i.e. the hash table size of each level is T, and each feature vector has dimension F. The feature grid of each level is independent, and the feature vectors are stored on the grid vertices. To combine features of different frequencies, the grid resolutions are sampled in geometric progression; the hash grid resolution R_l of level l is defined by

R_l = ⌊R_min·b^l⌋,  b = exp((ln R_max − ln R_min)/(L − 1)),

where R_max is the highest grid resolution and R_min the lowest.

Then, for each three-dimensional point p to be encoded, the encoded feature h_l(p) on the hash grid of each level is obtained by interpolation, and the per-level features are concatenated to form the hash-encoded feature vector

h(p) = (h_1(p), h_2(p), …, h_L(p)).
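A rough stand-in for the multi-resolution encoding. To stay short, the sketch stores dense per-level grids instead of hashed tables (a deliberate simplification of the patent's hash encoding); the geometric resolution schedule follows the formula above, and the default sizes are arbitrary.

```python
import math
import torch
import torch.nn.functional as F

def level_resolutions(r_min, r_max, num_levels):
    """Geometric schedule R_l = floor(r_min * b**l), b = exp((ln r_max - ln r_min) / (L - 1))."""
    b = math.exp((math.log(r_max) - math.log(r_min)) / (num_levels - 1))
    return [int(r_min * b ** l) for l in range(num_levels)]

class MultiResFeatureGrid(torch.nn.Module):
    """Dense stand-in for the hashed grid: one learnable F-channel volume per
    level; h(p) is the concatenation of trilinearly interpolated level features."""

    def __init__(self, r_min=16, r_max=128, num_levels=4, feat_dim=2):
        super().__init__()
        self.grids = torch.nn.ParameterList(
            [torch.nn.Parameter(1e-4 * torch.randn(1, feat_dim, r, r, r))
             for r in level_resolutions(r_min, r_max, num_levels)])

    def forward(self, p):                        # p: (N, 3), coordinates in [-1, 1]
        q = p.view(1, -1, 1, 1, 3)               # grid_sample expects (1, D, H, W, 3)
        feats = [F.grid_sample(g, q, align_corners=True).view(g.shape[1], -1).t()
                 for g in self.grids]
        return torch.cat(feats, dim=-1)          # h(p): (N, num_levels * feat_dim)
```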

S4 SDF processing step: for each three-dimensional point p sampled on the ray and its hash-encoded feature h(p), first encode p with a Fourier-series positional encoding to obtain the positional encoding feature γ(p), computed as

γ(p) = (sin(2⁰πp), cos(2⁰πp), …, sin(2^(L−1)πp), cos(2^(L−1)πp)).

Then, the positional encoding feature γ(p) and the hash-encoded feature h(p) are concatenated and fed into a geometry network f_θ composed of a two-layer multi-layer perceptron (MLP) to learn the SDF field, which outputs the learned SDF value s and a 256-channel SDF feature.
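The positional encoding and the two-layer geometry network can be sketched as follows. The hidden width, activation, and initialization are assumptions; the patent fixes only the two-layer MLP structure and the 256-channel SDF feature.

```python
import torch

def fourier_pe(x, n_freqs=6):
    """gamma(x) = (sin(2^0 pi x), cos(2^0 pi x), ..., sin(2^{L-1} pi x), cos(2^{L-1} pi x))."""
    freqs = (2.0 ** torch.arange(n_freqs)) * torch.pi          # (L,)
    ang = x[..., None, :] * freqs[:, None]                     # (..., L, 3)
    return torch.cat([torch.sin(ang), torch.cos(ang)], dim=-1).flatten(-2)

class GeometryNetwork(torch.nn.Module):
    """Two-layer MLP f_theta mapping [gamma(p), h(p)] to the SDF value s
    and a 256-channel SDF feature."""

    def __init__(self, pe_dim, hash_dim, hidden=256):
        super().__init__()
        self.fc1 = torch.nn.Linear(pe_dim + hash_dim, hidden)
        self.fc2 = torch.nn.Linear(hidden, 1 + 256)            # s plus the feature
        self.act = torch.nn.Softplus(beta=100)

    def forward(self, gamma_p, h_p):
        x = self.act(self.fc1(torch.cat([gamma_p, h_p], dim=-1)))
        out = self.fc2(x)
        return out[..., :1], out[..., 1:]                      # (s, SDF feature)
```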

S5 color processing step: feed the viewing direction v of ray r, the SDF feature learned by the geometry network, and the normal n of the three-dimensional point p into a color network composed of a two-layer MLP to model the color field, outputting the learned color value ĉ of each sampling point p on ray r.

The specific processing is as follows:

S501: following the computation of γ(p), the viewing direction v is encoded with the Fourier-series positional encoding to obtain the positional encoding feature γ(v).

S502: the normal n of the three-dimensional point p is computed as the gradient of the SDF, n(p) = ∇_p f_θ(p).

S503: the encoded feature γ(v), the SDF feature learned by the geometry network, and the normal n of the three-dimensional point p are input into the color network composed of a two-layer MLP to model the color field, which outputs the learned color value ĉ of each sampling point p on ray r.
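A matching sketch of the two-layer color network of step S5. The input dimensions and activations are assumptions.

```python
import torch

class ColorNetwork(torch.nn.Module):
    """Two-layer MLP over [gamma(v), SDF feature, normal] -> RGB."""

    def __init__(self, pe_v_dim, feat_dim=256, hidden=256):
        super().__init__()
        self.fc1 = torch.nn.Linear(pe_v_dim + feat_dim + 3, hidden)
        self.fc2 = torch.nn.Linear(hidden, 3)

    def forward(self, gamma_v, sdf_feat, normal):
        x = torch.relu(self.fc1(torch.cat([gamma_v, sdf_feat, normal], dim=-1)))
        return torch.sigmoid(self.fc2(x))        # colors in [0, 1]
```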

S6 rendering step: convert the SDF values s learned in the S4 SDF processing step into volume density, and render the color values ĉ and normals n obtained in the S5 color processing step by volume rendering, obtaining the color Ĉ(r), normal N(r), and depth D(r) of ray r:

Ĉ(r) = Σ_{i=1}^{P} T_i·α_i·ĉ_i,  N(r) = Σ_{i=1}^{P} T_i·α_i·n_i,  D(r) = Σ_{i=1}^{P} T_i·α_i·t_i,

with T_i = Π_{j=1}^{i−1}(1 − α_j) and α_i = 1 − exp(−σ_i·δ_i), where T_i and α_i denote the accumulated transmittance and the alpha value of the i-th point on ray r.
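The discrete rendering equations above follow the standard alpha-compositing formulation and translate directly into code; this sketch makes that concrete.

```python
import torch

def render_ray(sigma, t_vals, colors, normals):
    """Alpha-composite color, normal and depth along rays.

    sigma: (R, P) densities; t_vals: (R, P) depths;
    colors, normals: (R, P, 3)."""
    delta = t_vals[..., 1:] - t_vals[..., :-1]
    delta = torch.cat([delta, 1e10 * torch.ones_like(delta[..., :1])], dim=-1)
    alpha = 1.0 - torch.exp(-sigma * delta)                       # alpha_i
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[..., :1]), 1.0 - alpha + 1e-10], dim=-1),
        dim=-1)[..., :-1]                                         # T_i
    w = trans * alpha
    color = (w[..., None] * colors).sum(dim=-2)                   # C(r)
    normal = (w[..., None] * normals).sum(dim=-2)                 # N(r)
    depth = (w * t_vals).sum(dim=-1)                              # D(r)
    return color, normal, depth
```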

S7 important-point rendering step: after the SDF values s and the rendered depth value D(r) of the current ray r have been obtained through the above steps, construct a sample set T_imp and perform the operations of steps S3 to S6 on it to obtain the importance-rendered color Ĉ_imp(r). The specific processing is as follows:

S701: based on the learned SDF values s, locate the depth t* of the point p* lying on the zero level set of the SDF field (i.e. the depth of the point p* lying on the surface) by linear interpolation at the first sign change of the SDF along the ray:

t* = t_i − s_i·(t_{i+1} − t_i)/(s_{i+1} − s_i),

where [t_i, t_{i+1}] is the first interval whose SDF values s_i and s_{i+1} have opposite signs (a sketch of this root-finding step is given after this list).

The point p* at depth t* and the point at the rendered depth D(r) are regarded as important points that indicate the location of the surface, and they are used for importance rendering.

S702: uniformly sample Q points in [t_n, t_f]; the depths of these Q points, together with t* and D(r), form a new sample set T_imp.

S703: perform the operations of steps S3 to S6 on the sample set T_imp to obtain the importance-rendered color Ĉ_imp(r).
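As referenced in S701, a sketch of the zero-crossing localization using the linear interpolation form given above; rays without a sign change are masked out. The outside-to-inside sign convention is an assumption.

```python
import torch

def zero_crossing_depth(sdf, t_vals):
    """t* = t_i - s_i * (t_{i+1} - t_i) / (s_{i+1} - s_i) at the first
    outside -> inside sign change of the SDF along each ray.

    sdf, t_vals: (R, P). Returns (t_star, valid_mask)."""
    sign_flip = (sdf[..., :-1] > 0) & (sdf[..., 1:] <= 0)
    n = sign_flip.shape[-1]
    order = torch.arange(n, device=sdf.device).expand_as(sign_flip)
    # First flip index per ray (n - 1 acts as a sentinel when there is no crossing).
    idx = torch.where(sign_flip, order, torch.full_like(order, n - 1)).amin(dim=-1)
    gather = lambda x, i: torch.gather(x, -1, i.unsqueeze(-1)).squeeze(-1)
    s0, s1 = gather(sdf, idx), gather(sdf, idx + 1)
    t0, t1 = gather(t_vals, idx), gather(t_vals, idx + 1)
    t_star = t0 - s0 * (t1 - t0) / (s1 - s0 + 1e-10)
    return t_star, sign_flip.any(dim=-1)
```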

S8 supervision step: supervise the operations of steps S1 to S7 with the visibility-aware feature consistency loss L_vfc, normal consistency loss L_normal, geometric bias loss L_bias, smoothness loss L_smooth, Eikonal loss L_eikonal, and color loss L_rgb. The specific processing includes the following:

S801: compute the visibility-aware feature consistency loss. First, the position of the point p* is obtained from the depth t* found in S701. Then, based on the 3D point p* corresponding to pixel x in the current target (reference) view I_r, the corresponding pixel x_m in the m-th neighborhood view I_m is obtained through the plane-induced homography H_m determined by the plane through p* and the camera parameters of the two views; for the pixel x itself, this amounts to projecting p* into view m, x_m ∼ K_m·(R_m·p* + t_m),

where K_r, R_r, t_r and K_m, R_m, t_m denote the camera intrinsics, rotation, and translation of the reference view and the m-th neighborhood view, respectively.

Next, based on the image features of the reference and neighborhood views extracted from the pre-trained multi-view stereo network in step S1, the image feature of pixel x and that of its corresponding pixel x_m are extracted. To enhance the representation ability, the image feature F_r of the Q×Q patch q_r centered on pixel x and the corresponding feature F_m of the patch q_m matching q_r are extracted; the photometric consistency of F_r and F_m is measured by cosine similarity:

Sim(F_r, F_m) = ⟨F_r, F_m⟩ / (‖F_r‖·‖F_m‖).

Finally, to handle visibility, the visibility maps of the neighborhood views extracted from the pre-trained multi-view stereo network in step S1 are further introduced, giving the visibility-aware feature consistency loss L_vfc:

L_vfc = (1/(N_p·M))·Σ_x Σ_{m=1}^{M} V_m(x_m)·(1/N_q)·Σ_{j=1}^{N_q} (1 − Sim(F_r^j, F_m^j)),

where N_p and M denote the number of pixels in the mini-batch and the number of neighborhood views, respectively, and N_q denotes the number of pixels in the patch q_r.
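A sketch of L_vfc under the normalization written above. The exact normalization and the broadcasting layout are assumptions; visibility weights are applied per neighbor view.

```python
import torch
import torch.nn.functional as F

def vfc_loss(feat_ref, feat_nbrs, vis):
    """Visibility-weighted feature consistency.

    feat_ref:  (Np, Nq, C)    patch features in the reference view
    feat_nbrs: (M, Np, Nq, C) matching patch features in M neighbor views
    vis:       (M, Np)        visibility weights V_m(x_m) in [0, 1]"""
    cos = F.cosine_similarity(feat_ref.unsqueeze(0), feat_nbrs, dim=-1)  # (M, Np, Nq)
    per_pixel = 1.0 - cos.mean(dim=-1)              # average over the Q x Q patch
    return (vis * per_pixel).sum() / (vis.sum() + 1e-8)
```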

S802: compute the normal consistency loss L_normal:

L_normal = Σ_r ( ‖N(r) − n_mono(r)‖₁ + ‖1 − N(r)ᵀ·n_mono(r)‖₁ ),

where N(r) denotes the normal of ray r obtained in step S6 and n_mono denotes the monocular normal extracted from the pre-trained Omnidata model.

S803: compute the geometric bias loss L_bias:

L_bias = (1/|S|)·Σ_{p*∈S} |f_θ(p*)|,

where p* denotes the position of a zero-level-set point of the learned SDF computed in step S801, S denotes the set of points p* in the mini-batch, |S| denotes their number, and |f_θ(p*)| denotes the absolute value of the SDF at p*.

S804: compute the smoothness loss L_smooth:

L_smooth = (1/|X|)·Σ_{p∈X} ‖n(p) − n(p + ε)‖₂,

where n(p) denotes the normal obtained in step S5, ε denotes a perturbation constant of very small value, and X is the set of uniformly sampled points and near-surface points.

S805: compute the Eikonal loss L_eikonal:

L_eikonal = (1/|X|)·Σ_{p∈X} ( ‖∇_p f_θ(p)‖₂ − 1 )².

S806: compute the color loss L_rgb:

L_rgb = (1/|R|)·Σ_{r∈R} ( ‖Ĉ(r) − C(r)‖₁ + ‖Ĉ_imp(r) − C(r)‖₁ ),

where Ĉ(r) and Ĉ_imp(r) denote the color values computed in step S6 and step S7, respectively, C(r) denotes the ground-truth color value, and R denotes the set of rays in the mini-batch.
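The six supervision terms are combined into a single training objective. The sketch below shows the Eikonal term and a weighted sum; the weights are hypothetical placeholders, since the patent does not disclose numeric values.

```python
import torch

def eikonal_loss(grad_f):
    """L_eikonal: unit-gradient-norm regularizer; grad_f is (N, 3) gradients of f_theta."""
    return ((grad_f.norm(dim=-1) - 1.0) ** 2).mean()

# Hypothetical weights; not specified in the patent.
LOSS_WEIGHTS = dict(rgb=1.0, vfc=0.5, normal=0.1, bias=0.1, smooth=0.05, eikonal=0.1)

def total_loss(terms):
    """Weighted sum of the S8 supervision terms, e.g. terms = {'rgb': ..., 'vfc': ...}."""
    return sum(LOSS_WEIGHTS[name] * value for name, value in terms.items())
```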

S9 surface model acquisition step: during training, repeat steps S2 to S7 until a preset number of iterations is reached; after training, extract the surface model of the scene from the zero level set of the SDF field learned by the geometry network using the Marching Cubes algorithm.
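Surface extraction in S9 can be sketched with scikit-image's Marching Cubes; the grid resolution and chunk size below are arbitrary choices.

```python
import numpy as np
import torch
from skimage import measure

@torch.no_grad()
def extract_mesh(sdf_fn, bbox_min, bbox_max, res=256):
    """Evaluate the learned SDF on a dense grid and run Marching Cubes at level 0.

    sdf_fn maps (N, 3) points to (N,) SDF values."""
    axes = [torch.linspace(lo, hi, res) for lo, hi in zip(bbox_min, bbox_max)]
    pts = torch.stack(torch.meshgrid(*axes, indexing="ij"), dim=-1).reshape(-1, 3)
    sdf = torch.cat([sdf_fn(chunk) for chunk in pts.split(2 ** 16)])
    verts, faces, normals, _ = measure.marching_cubes(
        sdf.reshape(res, res, res).cpu().numpy(), level=0.0)
    # Rescale vertices from grid indices back to world coordinates.
    scale = (np.asarray(bbox_max) - np.asarray(bbox_min)) / (res - 1)
    return verts * scale + np.asarray(bbox_min), faces, normals
```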

The neural implicit surface reconstruction method provided by the present invention yields a significant gain in reconstructed surface accuracy, which comes mainly from three aspects: first, constraining the zero level set of the learned SDF field with the visibility-aware feature consistency loss significantly improves the geometric consistency of the reconstructed surface model; second, the depth-prior-assisted sampling strategy helps locate points on the surface; on this basis, the internal-prior-guided importance rendering strategy alleviates the biased surface rendering caused by volume rendering and thus further improves the fidelity of the reconstructed surface model.

It should be noted that parts not described in detail in the above embodiments can be implemented by conventional technical means and are therefore not elaborated.

Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded computer, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

Although preferred embodiments of the present invention have been described, those skilled in the art may make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.

Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims (7)

Translated from Chinese
1. A prior-driven multi-view neural implicit surface reconstruction method, characterized by comprising the following steps:

S1 prior extraction step: given K images of a scene and the camera parameters corresponding to each image, select each image in turn from the K scene images as the reference image, take its N-1 nearest images as neighborhood images, and perform prior extraction: input each reference image and its corresponding neighborhood images into a pre-trained multi-view stereo network to obtain the depth map D_MVS of the reference image, the visibility maps V_m of the neighborhood images, and the image features F_r and F_m of the reference and neighborhood images; repeat this process until all images have been processed, then filter out the unreliable depths in each depth map according to photometric consistency and geometric consistency;

S2 image training step: for the K images of a scene, randomly select one image as the target image for training; randomly select N_p pixels x in the chosen image and, according to the camera parameters of that image, cast a ray r from the camera center o through pixel x along the viewing direction v; then use error-bounded sampling and depth-prior-assisted sampling to sample P depth values on ray r, yielding P sampling points on the ray;

S3 encoding step: construct a multi-scale hash encoding and encode the three-dimensional points sampled in the S2 image training step;

S4 SDF processing step: for each three-dimensional point p sampled on the ray and its hash-encoded feature h(p), first encode p with a Fourier-series positional encoding to obtain the positional encoding feature γ(p); then concatenate γ(p) and h(p) and feed them into a geometry network f_θ composed of a two-layer multi-layer perceptron MLP to learn the SDF field, outputting the learned SDF value s and a 256-channel SDF feature;

S5 color processing step: feed the viewing direction v of ray r, the SDF feature learned by the geometry network, and the normal n of the three-dimensional point p into a color network composed of a two-layer MLP to model the color field, outputting the learned color value ĉ of each sampling point p on ray r;

S6 rendering step: convert the SDF values s learned in the S4 SDF processing step into volume density, and render the color values ĉ and normals n obtained in the S5 color processing step by volume rendering to obtain the color Ĉ(r), normal N(r), and depth D(r) of ray r;

S7 important-point rendering step: after the SDF values s and the rendered depth value D(r) of the current ray r have been obtained through the above steps, construct a sample set T_imp and perform the operations of steps S3 to S6 on it to obtain the importance-rendered color Ĉ_imp(r);

S8 supervision step: supervise the operations of steps S1 to S7 with a visibility-aware feature consistency loss L_vfc, a normal consistency loss L_normal, a geometric bias loss L_bias, a smoothness loss L_smooth, an Eikonal loss L_eikonal, and a color loss L_rgb;

S9 surface model acquisition step: during training, repeat steps S2 to S7 until a preset number of iterations is reached; after training, extract the surface model of the scene from the zero level set of the SDF field learned by the geometry network using the Marching Cubes algorithm.

2. The prior-driven multi-view neural implicit surface reconstruction method according to claim 1, characterized in that S2 image training specifically includes the following steps:

S201: for the K images of a scene, randomly select one image as the target image for training, and randomly select N_p pixels x in that image; according to the camera parameters of the image, cast a ray r from the camera center o through pixel x along the viewing direction v;

S202: use error-bounded sampling to draw P_E samples on ray r, obtaining the sample set T_E;

S203: use depth-prior-assisted sampling to draw P_MVS samples on ray r, obtaining the sample set T_MVS;

S204: combine the sample set T_E from error-bounded sampling with the sample set T_MVS from depth-prior-assisted sampling into the final sample set, and convert each sampled value t_i in it into a three-dimensional point on the ray, p_i = o + t_i·v, with t_i ∈ [t_n, t_f], where t_n and t_f denote the near and far bounds of the ray.

3. The prior-driven multi-view neural implicit surface reconstruction method according to claim 2, characterized in that step S202 is specifically processed as follows:

first, the error threshold ε and the learnable parameter β are each randomly initialized to a value greater than 0, and uniform sampling over [t_n, t_f] yields the initial sample set T; β+ is initialized such that β+ > β, which makes the error bound B_{T,β+} on the estimated opacity satisfy B_{T,β+} ≤ ε;

then, to reduce β+ while maintaining B_{T,β+} ≤ ε, T is upsampled to obtain a candidate sample set T_c; if B_{T,β} ≤ ε holds once T has been sufficiently upsampled, β+ is decreased toward β; the sampling algorithm keeps iterating while B_{T,β} > ε; by the intermediate value theorem there exists β* ∈ (β, β+) such that B_{T,β*} = ε, and a bisection method with at most 10 iterations is used to find β* and update β+; this upsampling and iteration loop runs until the maximum number of iterations is reached or B_{T,β+} ≤ ε;

next, the final T and β are used to estimate the current opacity Ô(t), i.e.

Ô(t_k) = 1 − exp(−Σ_{i=1}^{k−1} σ_i·δ_i),

where δ_i = t_{i+1} − t_i and σ denotes the volume density, obtained from the signed distance function value s through the density conversion function σ_β(s), i.e.

σ_β(s) = (1/β)·Ψ_β(−s), with Ψ_β the CDF of the zero-mean Laplace distribution with scale β;

finally, inverse transform sampling of Ô is applied to obtain the final sample set T_E.

4. The prior-driven multi-view neural implicit surface reconstruction method according to claim 3, characterized in that step S203 is specifically processed as follows:

first, based on the reliable depth map D_MVS obtained in step S1, the depth value d_MVS corresponding to pixel x in the target image is extracted and projected into 3D space to obtain p_MVS;

the 3D point p_MVS is then converted to its distance t_MVS from the camera center o along ray r, t_MVS = ‖p_MVS − o‖₂;

next, P_MVS samples are drawn uniformly in an interval centred on t_MVS, yielding the sample set T_MVS, where H_t denotes the hypothesis interval between the i-th and (i+1)-th samples and P_t is a predefined scalar controlling the hypothesis interval H_t; for a pixel x without a reliable depth value d_MVS, P_MVS samples are instead drawn at random from the candidate sample set T_c produced in step S202 to construct T_MVS.

5. The prior-driven multi-view neural implicit surface reconstruction method according to claim 1, characterized in that the S3 encoding step is specifically processed as follows:

first, a multi-resolution feature grid with learnable parameters θ is constructed in space; it is divided into L levels, each holding T feature vectors, i.e. the hash table size of each level is T, and each feature vector has dimension F; then, for each three-dimensional point p to be encoded, the encoded feature on the hash grid of each level is obtained by interpolation, and the per-level features are concatenated to form the hash-encoded feature vector h(p).

6. The prior-driven multi-view neural implicit surface reconstruction method according to claim 1, characterized in that the S7 important-point rendering step is specifically processed as follows:

S701: based on the learned SDF values s, locate the depth t* of the point p* lying on the zero level set of the SDF field by linear interpolation at the first sign change of the SDF along the ray, t* = t_i − s_i·(t_{i+1} − t_i)/(s_{i+1} − s_i);

S702: uniformly sample Q points in [t_n, t_f]; the depths of these Q points, together with t* and D(r), form a new sample set T_imp;

S703: perform the operations of steps S3 to S6 on the sample set T_imp to obtain the importance-rendered color Ĉ_imp(r).

7. The prior-driven multi-view neural implicit surface reconstruction method according to claim 6, characterized in that the S8 supervision step specifically includes the following:

S801: compute the visibility-aware feature consistency loss; first, the position of the point p* is obtained from the depth t* found in S701; then, based on the 3D point p* corresponding to pixel x in the current target view I_r, the corresponding pixel x_m in the m-th neighborhood view I_m is obtained through the plane-induced homography H_m determined by the plane through p* and the camera parameters of the two views, where K_r, R_r, t_r and K_m, R_m, t_m denote the camera intrinsics, rotation, and translation of the reference view and the m-th neighborhood view, respectively;

next, based on the image features of the reference and neighborhood views extracted from the pre-trained multi-view stereo network in step S1, the image feature of pixel x and that of its corresponding pixel x_m are extracted; to enhance the representation ability, the image feature F_r of the Q×Q patch q_r centered on pixel x and the corresponding feature F_m of the patch q_m matching q_r are extracted; the photometric consistency of F_r and F_m is measured by cosine similarity, Sim(F_r, F_m) = ⟨F_r, F_m⟩ / (‖F_r‖·‖F_m‖);

finally, to handle visibility, the visibility maps of the neighborhood views extracted from the pre-trained multi-view stereo network in step S1 are further introduced, giving the visibility-aware feature consistency loss L_vfc, where N_p and M denote the number of pixels in the mini-batch and the number of neighborhood views, respectively, and N_q denotes the number of pixels in the patch q_r;

S802: compute the normal consistency loss L_normal = Σ_r ( ‖N(r) − n_mono(r)‖₁ + ‖1 − N(r)ᵀ·n_mono(r)‖₁ ), where N(r) denotes the normal of ray r obtained in step S6 and n_mono denotes the monocular normal extracted from the pre-trained Omnidata model;

S803: compute the geometric bias loss L_bias = (1/|S|)·Σ_{p*∈S} |f_θ(p*)|, where p* denotes the position of a zero-level-set point of the learned SDF computed in step S801, S denotes the set of points p* in the mini-batch, |S| denotes their number, and |f_θ(p*)| denotes the absolute value of the SDF at p*;

S804: compute the smoothness loss L_smooth = (1/|X|)·Σ_{p∈X} ‖n(p) − n(p + ε)‖₂, where n(p) denotes the normal obtained in step S5, ε denotes a perturbation constant of very small value, and X is the set of uniformly sampled points and near-surface points;

S805: compute the Eikonal loss L_eikonal = (1/|X|)·Σ_{p∈X} ( ‖∇_p f_θ(p)‖₂ − 1 )²;

S806: compute the color loss L_rgb = (1/|R|)·Σ_{r∈R} ( ‖Ĉ(r) − C(r)‖₁ + ‖Ĉ_imp(r) − C(r)‖₁ ), where Ĉ(r) and Ĉ_imp(r) denote the color values computed in step S6 and step S7, respectively, C(r) denotes the ground-truth color value, and R denotes the set of rays in the mini-batch.
CN202311806718.5A | Priority date 2023-12-26 | Filing date 2023-12-26 | A prior-driven multi-view neural implicit surface reconstruction method | Active | Granted as CN118037989B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202311806718.5A (granted as CN118037989B) | 2023-12-26 | 2023-12-26 | A prior-driven multi-view neural implicit surface reconstruction method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202311806718.5A (granted as CN118037989B) | 2023-12-26 | 2023-12-26 | A prior-driven multi-view neural implicit surface reconstruction method

Publications (2)

Publication Number | Publication Date
CN118037989A | 2024-05-14
CN118037989B (en) | 2024-11-08

Family

ID=90997717

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202311806718.5A (Active; granted as CN118037989B) | A prior-driven multi-view neural implicit surface reconstruction method | 2023-12-26 | 2023-12-26

Country Status (1)

Country | Link
CN (1) | CN118037989B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20220189104A1 (en)* | 2020-12-14 | 2022-06-16 | Raxium, Inc. | Methods and Systems for Rendering View-Dependent Images Using 2D Images
WO2023015414A1 (en)* | 2021-08-09 | 2023-02-16 | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences | Method for eliminating uncertainty in self-supervised three-dimensional reconstruction
CN117036612A (en)* | 2023-08-18 | 2023-11-10 | 武汉创升无限数字科技有限公司 | Three-dimensional reconstruction method based on neural radiance fields
CN117274493A (en)* | 2023-09-25 | 2023-12-22 | Zhejiang Gongshang University | Neural implicit surface reconstruction method and device integrating depth estimation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Bowen Song: "PINER: Prior-Informed Implicit Neural Representation Learning for Test-time Adaptation in Sparse-view CT Reconstruction", 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 17 March 2023 *
Wanjuan Su: "PSDF: Prior-Driven Neural Implicit Surface Learning for Multi-view Reconstruction", Journal of LaTeX Class Files, vol. 14, no. 8, 23 January 2024 *
Li Sentao et al.: "Multi-view 3D Reconstruction Based on Volume Rendering and Geometric Consistency" (基于体积渲染和几何一致性的多视图三维重建), Industrial Control Computer (工业控制计算机), vol. 36, no. 10, 26 October 2023 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN118736123A (en)* | 2024-06-27 | 2024-10-01 | Zhejiang University | High-fidelity reconstruction method for indoor scenes based on normal deflection network
CN119578009A (en)* | 2025-01-27 | 2025-03-07 | 上海漫格科技有限公司 | Meshing method and system for implicit expression model, computer device, computer-readable storage medium and computer program product
CN119722908A (en)* | 2025-02-27 | 2025-03-28 | Chengdu University of Information Technology | Multi-view neural implicit surface reconstruction method based on dynamic sampling range
CN119722908B (en)* | 2025-02-27 | 2025-05-06 | Chengdu University of Information Technology | Multi-view neural implicit surface reconstruction method based on dynamic sampling range

Also Published As

Publication number | Publication date
CN118037989B (en) | 2024-11-08

Similar Documents

Publication | Title
Cherabier et al. | Learning priors for semantic 3d reconstruction
CN118037989B (en) | A prior-driven multi-view neural implicit surface reconstruction method
CN110689599A (en) | 3D visual saliency prediction method based on non-local enhancement generative adversarial network
CN113850900B (en) | Method and system for recovering depth map based on image and geometric clues in three-dimensional reconstruction
CN112329662B (en) | Multi-view saliency estimation method based on unsupervised learning
US20240096001A1 | Geometry-Free Neural Scene Representations Through Novel-View Synthesis
Li et al. | Learning to optimize non-rigid tracking
CN113962846B (en) | Image alignment method and device, computer-readable storage medium and electronic equipment
CN117237623B (en) | A method and system for semantic segmentation of UAV remote sensing images
CN114494395A (en) | Method, apparatus, device and storage medium for depth map generation based on plane prior
Xu et al. | AutoSegNet: an automated neural network for image segmentation
Uddin et al. | A perceptually inspired new blind image denoising method using $L_1$ and perceptual loss
Hirner et al. | FC-DCNN: A densely connected neural network for stereo estimation
Kashyap et al. | Sparse representations for object- and ego-motion estimations in dynamic scenes
CN118570273A (en) | A monocular depth estimation method based on surface normal vector and neural radiance field
CN119173906A (en) | Method, electronic device and storage medium for predicting depth map by multi-view stereo vision system
Durasov et al. | Double refinement network for efficient monocular depth estimation
CN116342776A (en) | Three-dimensional scene decoupling method, electronic device and storage medium
Sharma et al. | A novel 3D-UNet deep learning framework based on high-dimensional bilateral grid for edge consistent single image depth estimation
Cheng et al. | Two-branch convolutional sparse representation for stereo matching
CN117689747B | Multi-view neural implicit surface reconstruction method based on point cloud guidance
Zhao et al. | Generalizable 3D Gaussian Splatting for novel view synthesis
Tsuji et al. | Non-guided depth completion with adversarial networks
Lu et al. | Scene-aware feature matching
CN114078155B (en) | Method and system for training neural networks using unlabeled paired images to derive object perspectives

Legal Events

Code | Event
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
