CN119625162A - Incremental optimal view selection method for neural radiance field based on hybrid uncertainty estimation - Google Patents


Info

Publication number
CN119625162A
CN119625162A (application CN202411487001.3A)
Authority
CN
China
Prior art keywords
uncertainty
image
hybrid
nerf
rendering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411487001.3A
Other languages
Chinese (zh)
Inventor
劳奕臻
谢景鹏
薛逸飞
谈诗语
王远磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN202411487001.3A
Publication of CN119625162A
Legal status: Pending


Abstract

Translated from Chinese


The present invention discloses a neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation, comprising the following steps: an RGB image set input step; a 3D position and viewing direction calculation step; a NeRF training initialization step; a hybrid uncertainty calculation step, in which threshold sampling is performed through the modified NeRF network based on the 5D coordinates (x, d) of the remaining image set to calculate the color and opacity of each RGB image, the color and opacity are then integrated into a Beta distribution to compute the rendering uncertainty, and, separately, the 3D coordinates x of the remaining image set are input, a classifier judges whether the trajectory is planar or non-planar, and a Voronoi diagram algorithm estimates the position information to obtain the position uncertainty; and an optimal view selection step, in which the rendering uncertainty and the position uncertainty are normalized and summed to compute the hybrid uncertainty of each RGB image, the image with the highest hybrid uncertainty is selected and added to the training set, and the process is repeated until a specific reconstruction quality or a preset limit on the number of images is reached.

Description

Neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation
Technical Field
The invention relates to the field of computer vision using neural radiance fields (Neural Radiance Fields, NeRF) for large-scale three-dimensional reconstruction, and in particular to a neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation.
Background
At present, image-based three-dimensional reconstruction techniques fall into two main categories: traditional geometry-based methods and neural-network-based methods. The traditional pipeline comprises two steps: structure from motion (Structure from Motion, SfM) and multi-view stereo (Multiple View Stereo, MVS). SfM extracts feature points from an image sequence and recovers the three-dimensional structure and camera trajectory through bundle adjustment; MVS then estimates a depth map for each view to generate a per-view point cloud. The other category is represented by neural radiance fields (Neural Radiance Fields, NeRF), a significant breakthrough in three-dimensional reconstruction in recent years. NeRF implicitly expresses three-dimensional scene information through trained network parameters and enables the synthesis of images from new viewpoints. However, city-level large-scale reconstruction faces two major challenges. First, large-scale reconstruction generally requires processing and storing a large amount of data, which can rapidly deplete the memory of a single GPU, leading to slow processing, insufficient memory, and other performance problems; this is particularly unfriendly to users or researchers with limited GPU memory. Second, as demand increases for real-time or near-real-time applications such as navigation or disaster management, research into faster and more efficient large-scale three-dimensional reconstruction techniques becomes especially urgent.
In order to create more accurate, detailed, and useful large-scale 3D models, and to address the practical challenges of large datasets and limited computational resources, the present invention proposes an efficient method that enables fast, robust large-scale three-dimensional reconstruction under limited GPU memory.
Disclosure of Invention
Addressing the problems that existing large-scale reconstruction techniques still train slowly and demand large amounts of GPU memory, the invention provides a fast and efficient large-scale three-dimensional reconstruction method based on a neural radiance field that can effectively process large data inputs under limited GPU memory. The method comprises two major parts: an uncertainty-based view planning strategy and an information-gain-based view selection strategy.
In order to achieve the above purpose, the application adopts the following technical scheme:
A neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation comprises the following steps:
inputting a group of N RGB images of uniform resolution captured by an unmanned aerial vehicle;
processing each input RGB image, performing pose estimation, recovering the structure in the RGB image, and outputting the 3D position x and viewing direction vector d corresponding to each image in the scene, where the structure in the RGB image refers to scene geometry recovered from the image, including but not limited to the position, shape, and spatial relationships of objects;
initializing NeRF training, namely randomly selecting a predetermined proportion of images from the RGB image set for initialization training of the NeRF model, the predetermined proportion being not lower than 15%;
calculating the hybrid uncertainty: based on the 5D coordinates (x, d) of the remaining image set, performing threshold sampling through the modified NeRF network to calculate the color and opacity of each RGB image, then integrating the color and opacity into a Beta distribution to compute the rendering uncertainty; and inputting the 3D coordinates x of the remaining image set, judging by a classifier whether the trajectory is planar or non-planar, and estimating the position information with a Voronoi diagram algorithm to obtain the position uncertainty;
and selecting the image with the highest hybrid uncertainty, adding it to the training set, and repeating the process until a specific reconstruction quality or a preset limit on the number of images is reached, as the sketch after these steps shows.
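For orientation, the following minimal Python sketch shows the shape of this incremental loop; `train_nerf`, `rendering_uncertainty`, and `position_uncertainty` are hypothetical placeholders standing in for the NeRF training and the uncertainty computations defined below, not part of any published API:

```python
import random

def train_nerf(train_set):                 # placeholder: would (re)train the NeRF model
    return {"views": list(train_set)}

def rendering_uncertainty(model, img):     # placeholder: Beta-distribution rendering term
    return random.random()

def position_uncertainty(train_set, img):  # placeholder: Voronoi-based position term
    return random.random()

def normalize(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo + 1e-8) for x in xs]

def select_views(images, init_ratio=0.15, n_add=15):
    random.shuffle(images)
    n_init = max(1, int(init_ratio * len(images)))
    train_set, candidates = images[:n_init], images[n_init:]
    model = train_nerf(train_set)                       # initialization training
    for _ in range(min(n_add, len(candidates))):
        r = normalize([rendering_uncertainty(model, c) for c in candidates])
        p = normalize([position_uncertainty(train_set, c) for c in candidates])
        hybrid = [ri + pi for ri, pi in zip(r, p)]      # normalized sum
        best = max(range(len(hybrid)), key=hybrid.__getitem__)
        train_set.append(candidates.pop(best))          # add most informative view
        model = train_nerf(train_set)                   # iterative retraining
    return model, train_set
```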
The training set is the image dataset used to train the NeRF model. The NeRF model, like triangle meshes, point clouds, and voxels, is a representation for three-dimensional reconstruction; but whereas triangle meshes, point clouds, and voxels are explicit representations, NeRF is an implicit one, storing the scene information in a neural network. Reconstructing a scene therefore requires training a dedicated network, performing multi-view reconstruction with images captured of that scene; those images constitute the training set. With sufficient GPU memory, the more training images, the more information is brought in and the better the reconstruction.
As a preferred aspect of the invention, in the hybrid uncertainty calculation step,
based on the 5D coordinates (x, y, z, θ, φ) of the remaining image set, threshold sampling is performed through the modified NeRF network; the color and opacity of the sample points on each ray of each RGB image are calculated using the multi-resolution hash storage that combines an explicit voxel grid with an implicit neural network, and the rendering uncertainty of each picture is computed by integration into the Beta distribution. Meanwhile,
the 3D coordinates (x, y, z) of the remaining image set are input, and a classifier judges whether the UAV flight trajectory is planar or non-planar; for a planar trajectory, a Voronoi information radiance field performs top-level global planning and bottom-level local planning to compress and constrain the position information, while for a non-planar trajectory a Voronoi clustering algorithm quantifies the position uncertainty by the distance from the center point.
Here (x, d) and (x, y, z, θ, φ) are two notations for the same 5D coordinates: in the former, x and d denote the position and viewing direction in three-dimensional space, respectively; in the latter, x, y, z are the coordinates in the xyz coordinate system of three-dimensional space, and θ, φ are the horizontal and vertical components of the direction.
As a preferred aspect of the present invention, the hybrid uncertainty calculation step is specifically performed as follows:
The color and opacity along each ray are obtained through the NeRF model, and the volume density is calculated. Consistent with the definitions below, this is the standard NeRF compositing weight:

$$\alpha_i = \exp\Big(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Big)\,\big(1 - \exp(-\sigma_i \delta_i)\big) \tag{1}$$

where $\alpha_i$ denotes the volume density of the i-th sample point on the ray, $\sigma_j$ the opacity of the j-th sample point, and $\delta_j$ the distance between the j-th sample point and the previous sample point.
The color variance is then calculated for each ray through the NeRF model:

$$\beta^2(r(t_i)) = -P(\alpha_i)\,\log P(\alpha_i)$$

where $\beta^2(r(t_i))$ denotes the variance at the i-th sample point on the ray, $\alpha_i$ the volume density of the i-th sample point (Equation 1 above), and $P(\alpha_i)$ the proportion of $\alpha_i$ in the ray's total volume density $\sum_i \alpha_i$;
A Beta distribution is then built from the estimated mean and variance, $\hat{C}(r) \sim \mathrm{Beta}\big(\bar{C}(r),\, \bar{\beta}^2(r)\big)$, where $\hat{C}(r)$ denotes the predicted color of a ray, $\bar{C}(r)$ the mean color of the ray, and $\bar{\beta}^2(r)$ the mean variance of the ray;
The continuous function is optimized by minimizing the squared reconstruction error between the ground-truth RGB image and the rendered pixel colors, while the rendering uncertainty of each picture is calculated. A formulation consistent with the definitions below is

$$\psi_r^2(I) = \frac{1}{N_r} \sum_{i=1}^{N_r} \frac{\big\|C(r_i) - \hat{C}(r_i)\big\|^2}{\bar{\beta}^2(r_i)}$$

where $\psi_r^2(I)$ denotes the rendering uncertainty of the I-th picture, $N_r$ the total number of rays per picture, $\|C(r_i) - \hat{C}(r_i)\|^2$ the squared error between the true RGB image and the rendered pixel color, and $\bar{\beta}^2(r_i)$ the mean variance of ray i;
The Hausdorff distance between the two point sets,

$$h(A, B) = \max_{a_i \in A}\, \min_{b_j \in B}\, \|a_i - b_j\|,$$

is calculated to determine whether the image trajectory is planar or non-planar, where $h(A, B)$ is the Hausdorff distance between the two trajectories A and B, and $\|a_i - b_j\|$ the distance between the i-th point of trajectory A and the j-th point of trajectory B.
As a preferred aspect of the invention, the hybrid uncertainty calculation step computes the position uncertainty of a planar trajectory from the Voronoi diagram of the camera positions, where $F_p(I)$ denotes the planar position uncertainty value of photo I, $N_v$ the total number of three-dimensional pose points, the distance term the distance between points i and j under the weight $\lambda_i$ of point i, and $A_i$ the Voronoi cell area of point i.
As a preferred aspect of the invention, for a non-planar trajectory the position uncertainty is estimated by a Voronoi clustering formula in which $F_{np}(I)$ denotes the non-planar position uncertainty value of photo I, $G_i$ the probability of selecting picture i based on its Voronoi polygon, and $r_i$ the relative local density of the evaluated point, measuring the density change before and after selection.
In a preferred aspect of the invention, in the optimal view selection step, the rendering uncertainty and the position uncertainty are normalized and summed to form the hybrid uncertainty of each image; the image with the highest hybrid uncertainty in the candidate set is selected, added to the training set, and the process is repeated until a specific reconstruction quality or a preset limit on the number of images is reached, thereby optimizing the view selection strategy. The hybrid uncertainty is calculated as

$$\psi^2(I) = \operatorname{norm}\big(\psi_p^2(I)\big) + \operatorname{norm}\big(\psi_r^2(I)\big)$$

where $\psi_p^2(I)$ denotes the position uncertainty of picture I and $\psi_r^2(I)$ its rendering uncertainty; normalizing and adding the two gives the total uncertainty $\psi^2(I)$ of picture I.
As a preferred aspect of the present invention, the method further comprises an iterative optimization step: in each iteration, the uncertainty of each unused candidate view is calculated based on the current training set, and the view with the largest information gain is selected and added to the training set to gradually improve the quality of novel view synthesis.
In the RGB image input step, a rotor unmanned aerial vehicle flies a specific route over the target area, the flight route being either planar or non-planar; fixed-point shooting during the flight yields a group of N orthographic or oblique aerial RGB images of identical resolution captured by the UAV.
In the method, COLMAP software is used in the 3D position and viewing direction calculation to estimate the pose of all input RGB images and obtain the camera pose corresponding to each image; the default camera model is the pinhole model, and all cameras are set to share the same intrinsics for incremental SfM reconstruction, thereby obtaining the intrinsic and extrinsic parameters of all cameras.
In a preferred aspect of the present invention, in the NeRF training initialization step, at least 15% of the images are randomly selected from the RGB image set for initialization training of the NeRF model, so that the model learns the basic structure and appearance characteristics of the scene and stores them in a multi-layer perceptron network.
Randomly selecting 15% of the images from the training set for initialization training yields a rough scene model, and iterative training then proceeds from this initial model. If the initial model is only slightly flawed, subsequent iterations reach a good result in fewer rounds; otherwise, more rounds of iteration are needed.
For example, given 100 images, 10 of them (10%) are selected as the test set for evaluating model quality, and the remaining 90 serve as the training pool; 15 (15%) form the initial training set and the rest the training candidate set. Initialization training first produces a coarse scene model. The hybrid uncertainty of every image in the candidate set is then computed, the candidate carrying the largest amount of information is found and added to the training set (now 16 images: the 15 initialization images plus 1 newly added candidate), and iterative training produces a new, optimized scene model. These steps are repeated until the set condition is met, for example until 15 candidates have been selected.
The specific advantages of the invention are as follows:
The invention introduces information gain based on a hybrid of position uncertainty and rendering uncertainty, and through maximum-gain view selection helps accomplish fast, large-scale three-dimensional reconstruction under limited GPU memory. Compared with the prior art, the method accepts larger-scale input data, completes large-scale three-dimensional reconstruction faster, and renders higher-quality images under the same conditions.
The neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation can suppress the artifacts that appear under multiple viewpoints and reduces computational cost while maintaining high rendering fidelity. In particular, the method focuses on how, under limited GPU memory, an incremental optimal view selection strategy can effectively choose from the candidate views the one that brings the maximum information gain, thereby improving rendering quality and efficiency.
In addition, the invention combines hybrid multi-resolution hash storage of an explicit voxel grid and an implicit neural network, a Voronoi-diagram information-gain radiance field and clustering algorithm, threshold sampling, a flight classifier, and other techniques to further improve processing efficiency while preserving the performance of the original NeRF network. The technique can serve as a plug-in tool that helps existing systems achieve more efficient three-dimensional scene reconstruction and rendering.
Specific embodiments of the invention are disclosed in detail below with reference to the following description and drawings, indicating the manner in which the principles of the invention may be employed. It should be understood that the embodiments of the invention are not limited in scope thereby. The embodiments of the invention include many variations, modifications and equivalents within the spirit and scope of the appended claims.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments in combination with or instead of the features of the other embodiments.
It should be emphasized that the term "comprises/comprising" when used herein is taken to specify the presence of stated features, integers, steps or components but does not preclude the presence or addition of one or more other features, integers, steps or components.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic illustration of the neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation of the present invention.
FIG. 2 is a flow chart of the neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation of the present invention.
FIG. 3 is a block diagram of the construction of the neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation of the present invention.
FIG. 4 is a schematic diagram of the Voronoi-diagram information-gain radiance field employed for planar flight trajectories in the neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation of the present invention.
FIG. 5 is a schematic diagram of the Voronoi-diagram clustering algorithm employed for non-planar flight trajectories in the neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation of the present invention.
FIG. 6 is a display of partial results of the neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation of the present invention.
Detailed Description
In order to make the technical solution of the present invention better understood by those skilled in the art, the technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, shall fall within the scope of the invention.
As shown in fig. 1 to fig. 6, an embodiment of the present invention provides a neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation, mainly aimed at large-scale three-dimensional reconstruction and novel view rendering; the algorithm schematic is shown in fig. 1.
The algorithm comprises two major parts: computing the rendering uncertainty and the position uncertainty, and combining them into a hybrid uncertainty for maximum-information-gain view selection planning. The algorithm flow is shown in fig. 2 and mainly comprises the following steps:
S1, input a set of real RGB photographs: a set of high-resolution RGB images captured by an unmanned aerial vehicle.
A rotor UAV flies a specific route over the area; the route is either planar or non-planar, with non-planar routes usually flown manually. Planar routes can be configured in the flight app, typically as a grid-shaped pattern. Fixed-point shooting during the flight yields a group of N orthographic or oblique aerial survey images of identical resolution.
S2, use COLMAP to perform pose estimation (the position of the camera in space and its orientation) and sparse point cloud reconstruction on all input images, obtaining the camera pose and sparse point cloud information corresponding to each image.
The default camera model is the pinhole model (PINHOLE), and all cameras are set to share the same intrinsics for incremental SfM reconstruction, thereby obtaining the intrinsic and extrinsic parameters of all cameras.
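For illustration, S2 can be driven through COLMAP's standard command-line interface. This is a sketch, not the patent's own tooling: the paths are placeholders, and the flags select the PINHOLE model and a single shared camera as described above.

```python
import subprocess

db, imgs, out = "scene.db", "images/", "sparse/"   # placeholder paths

# Feature extraction with a shared pinhole camera for all images
subprocess.run(["colmap", "feature_extractor",
                "--database_path", db, "--image_path", imgs,
                "--ImageReader.camera_model", "PINHOLE",
                "--ImageReader.single_camera", "1"], check=True)

# Pairwise feature matching
subprocess.run(["colmap", "exhaustive_matcher",
                "--database_path", db], check=True)

# Incremental SfM: camera poses and sparse point cloud
subprocess.run(["colmap", "mapper",
                "--database_path", db, "--image_path", imgs,
                "--output_path", out], check=True)
```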
S3, initialize NeRF training. A proportion of the images (at least 15% is recommended) is randomly selected from the input dataset for initialization training of the NeRF model, ensuring that the model can learn the basic structure and appearance features of the scene.
S4, calculate the hybrid uncertainty. Based on the 5D coordinates (x, d) of the remaining image set, the color and opacity of the image are calculated via threshold sampling through the modified NeRF network.
The NeRF network is the part of the NeRF model responsible for learning and encoding scene information; the NeRF model relies on the NeRF network to learn the three-dimensional representation of the scene and requires other components (e.g., the volume rendering equation) to achieve novel view synthesis and three-dimensional reconstruction.
The complete image set is divided into a test set, a training set, and a training candidate set; the remaining image set refers to the training candidate set, while the test set does not participate in training and is used only to evaluate the results. For example, with 100 images, 10 (10%) are selected as the test set for evaluating model quality, the remaining 90 form the training pool, 15 of them (15%) initialize the training set, and the most informative images are then selected from the remaining candidates and added to the training set.
The color and opacity are then integrated into the Beta distribution and the rendering uncertainty is calculated. For the position uncertainty, the 3D coordinates x of the remaining image set are input, a classifier judges whether the trajectory is planar or non-planar, and a Voronoi diagram algorithm estimates the position information to obtain the position uncertainty. This step is executed by the following specific algorithm:
S41, the color and opacity along each ray are calculated by the MLP of the NeRF model, and the volume density, which accounts for the distances and opacities between adjacent samples, is computed. Consistent with the definitions below, this is the standard NeRF compositing weight:

$$\alpha_i = \exp\Big(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Big)\,\big(1 - \exp(-\sigma_i \delta_i)\big) \tag{1}$$

where $\alpha_i$ denotes the volume density of the i-th sample point on the ray, $\sigma_j$ the opacity of the j-th sample point, and $\delta_j$ the distance between the j-th sample point and the previous sample point.
S42, the variance of the color is calculated for each ray through the NeRF model:

$$\beta^2(r(t_i)) = -P(\alpha_i)\,\log P(\alpha_i)$$

where $\beta^2(r(t_i))$ denotes the variance at the i-th sample point on the ray, $\alpha_i$ the volume density of the i-th sample point (Equation 1 above), and $P(\alpha_i)$ the proportion of $\alpha_i$ in the ray's total volume density $\sum_i \alpha_i$.
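To make S41 and S42 concrete, here is a minimal NumPy sketch of the per-sample compositing weights and the entropy-based color variance. The `sigma` and `delta` arrays stand in for the opacities and inter-sample distances that the NeRF network and sampler would actually produce, and the weight formula is the standard NeRF one assumed in Equation 1 above:

```python
import numpy as np

def ray_weights(sigma, delta):
    # transmittance T_i = exp(-sum_{j<i} sigma_j * delta_j)
    accum = np.concatenate([[0.0], np.cumsum(sigma[:-1] * delta[:-1])])
    T = np.exp(-accum)
    return T * (1.0 - np.exp(-sigma * delta))        # alpha_i per sample point

def entropy_variance(alpha):
    P = alpha / (alpha.sum() + 1e-10)                # P(alpha_i)
    return -P * np.log(P + 1e-10)                    # beta^2(r(t_i)) per sample

sigma = np.array([0.1, 0.8, 1.5, 0.3])               # toy opacities from the MLP
delta = np.array([0.2, 0.2, 0.2, 0.2])               # toy inter-sample distances
alpha = ray_weights(sigma, delta)
beta2 = entropy_variance(alpha)
```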
S43, since the volume density at a given location is affected only by its own 3D coordinates and not by the viewing direction, the distributions at different locations are mutually independent; and since volume rendering can be approximated as a linear combination along the ray's sample points, a Beta distribution can be constructed from the estimated mean and variance: $\hat{C}(r) \sim \mathrm{Beta}\big(\bar{C}(r),\, \bar{\beta}^2(r)\big)$, where $\hat{C}(r)$ denotes the predicted color of a ray (calculated with the constructed model), $\bar{C}(r)$ the mean color of the ray, and $\bar{\beta}^2(r)$ the mean variance of the ray.
S44, the continuous function is optimized by minimizing the squared reconstruction error between the ground-truth RGB image and the rendered pixel colors, while the rendering uncertainty of each picture is calculated. A formulation consistent with the definitions below is

$$\psi_r^2(I) = \frac{1}{N_r} \sum_{i=1}^{N_r} \frac{\big\|C(r_i) - \hat{C}(r_i)\big\|^2}{\bar{\beta}^2(r_i)}$$

where $\psi_r^2(I)$ denotes the rendering uncertainty of the I-th picture, $N_r$ the total number of rays per picture, $\|C(r_i) - \hat{C}(r_i)\|^2$ the squared error between the true RGB image and the rendered pixel color, and $\bar{\beta}^2(r_i)$ the mean variance of ray i.
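A sketch of S44 under the stated definitions follows. The patent's exact formula is not reproduced in the source, so the variance-weighted mean squared error below is an assumption consistent with the quantities named above, not a verbatim implementation:

```python
import numpy as np

def image_rendering_uncertainty(c_true, c_pred, beta2_mean):
    # c_true, c_pred: (N_r, 3) ray colors; beta2_mean: (N_r,) mean ray variance
    sq_err = np.sum((c_true - c_pred) ** 2, axis=-1)   # ||C(r_i) - C_hat(r_i)||^2
    return float(np.mean(sq_err / (beta2_mean + 1e-10)))
```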
S45, the Hausdorff distance between the two point sets,

$$h(A, B) = \max_{a_i \in A}\, \min_{b_j \in B}\, \|a_i - b_j\|,$$

is calculated to determine whether the image trajectory is planar or non-planar, where $h(A, B)$ is the Hausdorff distance between the two trajectories A and B, and $\|a_i - b_j\|$ the distance between the i-th point of trajectory A and the j-th point of trajectory B.
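For S45, SciPy ships a directed Hausdorff distance, and the symmetric distance is the maximum over both directions. The sketch below classifies a trajectory as planar by comparing it with its projection onto a constant-height plane; both that projection choice and the `z_tol` threshold are illustrative assumptions:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def is_planar(traj, z_tol=1.0):
    # traj: (N, 3) array of camera positions along the flight trajectory
    flat = traj.copy()
    flat[:, 2] = traj[:, 2].mean()                   # project onto mean-height plane
    h = max(directed_hausdorff(traj, flat)[0],
            directed_hausdorff(flat, traj)[0])       # symmetric Hausdorff distance
    return h < z_tol
```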
S46, for a planar trajectory, the position uncertainty is calculated with the Voronoi diagram, taking into account the importance of each position point and its planar Voronoi cell area; here $F_p(I)$ denotes the planar position uncertainty value of photo I, $N_v$ the total number of three-dimensional pose points, the distance term the distance between points i and j under the weight $\lambda_i$ of point i, and $A_i$ the Voronoi cell area of point i.
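For S46, the Voronoi cell area of each 2D camera position can be computed with SciPy. This sketch skips unbounded boundary cells, which a full implementation would clip to the survey region:

```python
import numpy as np
from scipy.spatial import Voronoi

def voronoi_areas(points_2d):
    # points_2d: (N, 2) planar camera positions; returns per-point cell areas
    vor = Voronoi(points_2d)
    areas = np.full(len(points_2d), np.nan)
    for i, region_idx in enumerate(vor.point_region):
        region = vor.regions[region_idx]
        if len(region) == 0 or -1 in region:
            continue                                  # unbounded boundary cell
        poly = vor.vertices[region]
        x, y = poly[:, 0], poly[:, 1]
        # shoelace formula for the polygon area A_i
        areas[i] = 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
    return areas
```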
For a non-planar trajectory, a Voronoi clustering algorithm estimates the position uncertainty, taking into account three-dimensional spatial importance and local correlation density; here $F_{np}(I)$ denotes the non-planar position uncertainty value of photo I, $G_i$ the probability of selecting picture i based on its Voronoi polygon, and $r_i$ the relative local density of the evaluated point, measuring the density change before and after selection.
S5, select the optimal view. The rendering uncertainty and the position uncertainty are normalized and summed to compute the hybrid uncertainty of each image,

$$\psi^2(I) = \operatorname{norm}\big(\psi_p^2(I)\big) + \operatorname{norm}\big(\psi_r^2(I)\big)$$

where $\psi_p^2(I)$ denotes the position uncertainty of picture I and $\psi_r^2(I)$ its rendering uncertainty; their normalized sum is the total uncertainty $\psi^2(I)$ of picture I. The image with the highest hybrid uncertainty is selected and added to the training set, and the process is repeated until a specific reconstruction quality or a preset limit on the number of images is reached.
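A minimal sketch of the S5 combination; the patent does not spell out the normalization, so the min-max choice here is an assumption:

```python
import numpy as np

def hybrid_uncertainty(render_u, position_u):
    def norm(x):
        x = np.asarray(x, dtype=float)
        return (x - x.min()) / (x.max() - x.min() + 1e-10)
    return norm(render_u) + norm(position_u)          # psi^2(I) per candidate

psi2 = hybrid_uncertainty([0.8, 0.3, 0.5], [0.2, 0.9, 0.4])
best = int(np.argmax(psi2))                           # view added to the training set
```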
S6, iterative optimization. In each iteration, the uncertainty of each unused candidate view is calculated based on the current training set, the view with the greatest information gain is selected and added to the training set to gradually improve the quality of the new view synthesis.
It should be noted that NeRF introduced a new scene representation and view synthesis approach that allows highly realistic 3D scenes to be synthesized from 2D images. Training begins with a set of 2D images of the scene captured from different viewpoints. For each pixel in an image, NeRF computes the corresponding ray in 3D space, estimates the scene color and opacity at each point along the ray, compares the predicted color values with the observed ones, and adjusts the network parameters to minimize the difference. Once NeRF is trained, new scene views can be synthesized by casting rays from a virtual camera position and accumulating color and opacity along each ray using the learned neural radiance field. This makes it possible to generate photographs from previously unseen viewpoints, providing realistic renderings of the scene. Instant-NGP is to date one of the fastest neural radiance field models and is of great engineering value: it proposes a multi-resolution hash encoding to accelerate model training, its multi-resolution hash tables of trainable feature vectors further reduce the model size, and the whole system is implemented with fully fused CUDA kernels to exploit parallelism to the fullest.
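For context, the following toy sketch illustrates the multi-resolution hash encoding idea from Instant-NGP (Mueller et al., 2022): each 3D point indexes one feature table per resolution level through a spatial hash of its grid cell. The hash primes follow the paper; the nearest-vertex lookup (instead of trilinear interpolation), the random tables, and all sizes are simplifications for illustration:

```python
import numpy as np

PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_encode(xyz, n_levels=4, base_res=16, growth=1.5,
                table_size=2**14, n_features=2, seed=0):
    # xyz: (N, 3) points in [0, 1); returns (N, n_levels * n_features) features
    rng = np.random.default_rng(seed)
    feats = []
    for lvl in range(n_levels):
        table = rng.standard_normal((table_size, n_features)).astype(np.float32)
        res = int(base_res * growth ** lvl)                  # per-level grid resolution
        idx = np.floor(xyz * res).astype(np.uint64)          # grid vertex per point
        h = np.bitwise_xor.reduce(idx * PRIMES, axis=-1)     # spatial hash (XOR of products)
        feats.append(table[(h % np.uint64(table_size)).astype(np.int64)])
    return np.concatenate(feats, axis=-1)

features = hash_encode(np.random.rand(8, 3))
```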
Each round of iterative training is thus a NeRF-based three-dimensional reconstruction that takes view planning under both position uncertainty and rendering uncertainty into account; the specific model used is Instant-NGP.
As can be seen from fig. 6, which shows partial results of the fast large-scale three-dimensional reconstruction method based on the neural radiance field, six groups of public large-scene datasets were selected, and better reconstruction results were obtained compared with previous baselines.
In the present application, a plurality of elements, components, parts or steps can be provided by a single integrated element, component, part or step. Alternatively, a single integrated element, component, part or step may be divided into separate plural elements, components, parts or steps. The disclosure of "a" or "an" to describe an element, component, section or step is not intended to exclude other elements, components, sections or steps.
It is to be understood that the above description is intended to be illustrative, and not restrictive. Many embodiments and many applications other than the examples provided will be apparent to those of skill in the art upon reading the above description. The scope of the present teachings should, therefore, be determined not with reference to the above description, but instead with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. The disclosures of all articles and references, including patent applications and publications, are incorporated herein by reference for the purpose of completeness. The omission from the preceding claims of any aspect of the subject matter disclosed herein is not a disclaimer of that subject matter, nor should it be regarded that the inventors did not consider such subject matter to be part of the disclosed subject matter.

Claims (10)

Translated from Chinese

1. A neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation, characterized by comprising the following steps:
an RGB image set input step: inputting a set of N RGB images of uniform resolution captured by an unmanned aerial vehicle;
a 3D position and viewing direction calculation step: processing each input RGB image, performing pose estimation, recovering the structure in the RGB image, and outputting the 3D position x and viewing direction vector d corresponding to each image in the scene;
a NeRF training initialization step: randomly selecting a predetermined proportion of images from the RGB image set for initialization training of the NeRF model, the predetermined proportion being not lower than 15%;
a hybrid uncertainty calculation step: based on the 5D coordinates (x, d) of the remaining image set, performing threshold sampling through the modified NeRF network to calculate the color and opacity of the RGB images, then integrating the color and opacity into a Beta distribution to calculate the rendering uncertainty; and inputting the 3D coordinates x of the remaining image set, judging by a classifier whether the trajectory is planar or non-planar, and estimating the position information with a Voronoi diagram algorithm to obtain the position uncertainty;
an optimal view selection step: normalizing and summing the rendering uncertainty and the position uncertainty to calculate the hybrid uncertainty of each RGB image; selecting the image with the highest hybrid uncertainty, adding it to the training set, and repeating this process until a specific reconstruction effect or a preset limit on the number of images is reached.

2. The neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation according to claim 1, characterized in that, in the hybrid uncertainty calculation step,
based on the 5D coordinates (x, y, z, θ, φ) of the remaining image set, threshold sampling is performed through the modified NeRF network, the color and opacity of the sample points on each ray of each RGB image are calculated through the multi-resolution hash storage of an explicit voxel grid and an implicit neural network, and the rendering uncertainty of each picture is calculated by integration into the Beta distribution; meanwhile,
the 3D coordinates (x, y, z) of the remaining image set are input, and a classifier judges whether the UAV flight trajectory is planar or non-planar; for a planar trajectory, a Voronoi information radiance field performs top-level global planning and bottom-level local planning to compress and constrain the position information; for a non-planar trajectory, a Voronoi clustering algorithm quantifies the position uncertainty by the distance from the center point.

3. The neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation according to claim 2, characterized in that the hybrid uncertainty calculation step is specifically performed as follows:
the color and opacity along each ray are obtained through the NeRF model, and the volume density is calculated, where $\alpha_i$ denotes the volume density of the i-th sample point on the ray, $\sigma_j$ the opacity of the j-th sample point, and $\delta_j$ the distance between the j-th sample point and the previous sample point;
the variance of the color is calculated for each ray through the NeRF model as $\beta^2(r(t_i)) = -P(\alpha_i)\,\log P(\alpha_i)$, where $\beta^2(r(t_i))$ denotes the variance at the i-th sample point on the ray, $\alpha_i$ the volume density of the i-th sample point (Formula 1 above), and $P(\alpha_i)$ the proportion of $\alpha_i$ in the ray's total volume density $\sum_i \alpha_i$;
a Beta distribution is built from the estimated mean and variance, where $\hat{C}(r)$ denotes the predicted color of a ray, $\bar{C}(r)$ the mean color of the ray, and $\bar{\beta}^2(r)$ the mean variance of the ray;
the continuous function is optimized by minimizing the squared reconstruction error between the ground-truth RGB image and the rendered pixel colors, while the rendering uncertainty of each picture is calculated, where $\psi_r^2(I)$ denotes the rendering uncertainty of the I-th picture, $N_r$ the total number of rays per picture, $\|C(r_i) - \hat{C}(r_i)\|^2$ the squared error between the true RGB image and the rendered pixel color, and $\bar{\beta}^2(r_i)$ the mean variance of ray i;
the Hausdorff distance between the two datasets is calculated to determine whether the image trajectory is planar or non-planar, where $h(A, B)$ is the Hausdorff distance between the two trajectories A and B, and $\|a_i - b_j\|$ the distance between the i-th point of trajectory A and the j-th point of trajectory B.

4. The neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation according to claim 3, characterized in that the hybrid uncertainty calculation step is specifically performed as follows: for a planar trajectory, the position uncertainty is calculated by a formula in which $F_p(I)$ denotes the planar position uncertainty value of photo I, $N_v$ the total number of three-dimensional pose points, the distance term the distance between points i and j under the weight $\lambda_i$ of point i, and $A_i$ the Voronoi cell area of point i.

5. The neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation according to claim 3, characterized in that, for a non-planar trajectory, the position uncertainty is estimated by a formula in which $F_{np}(I)$ denotes the non-planar position uncertainty value of photo I, $G_i$ the probability of selecting picture i based on its Voronoi polygon, and $r_i$ the relative local density of the evaluated point, measuring the density change before and after selection.

6. The neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation according to claim 1, characterized in that, in the optimal view selection step, the rendering uncertainty and the position uncertainty are normalized and summed to form the hybrid uncertainty of each image; the image with the highest hybrid uncertainty in the candidate set is selected and added to the training set, and this process is repeated until a specific reconstruction effect or a preset limit on the number of images is reached, thereby optimizing the view selection strategy, where $\psi_p^2(I)$ denotes the position uncertainty of picture I, $\psi_r^2(I)$ the rendering uncertainty of picture I, and their normalized sum the total uncertainty $\psi^2(I)$ of picture I.

7. The neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation according to claim 1, characterized by further comprising:
an iterative optimization step: in each iteration, calculating the uncertainty of each unused candidate view based on the current training set, selecting the view with the largest information gain, and adding it to the training set to gradually improve the quality of novel view synthesis.

8. The neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation according to claim 1, characterized in that, in the RGB image set input step, a rotor unmanned aerial vehicle first flies a specific route over the target area, the flight route being planar or non-planar; fixed-point shooting is performed during the flight, thereby obtaining a set of N orthographic or oblique aerial survey RGB images of identical resolution captured by the UAV.

9. The neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation according to claim 1, characterized in that, in the 3D position and viewing direction calculation, COLMAP software performs pose estimation on all input RGB images to obtain the camera pose corresponding to each image, where the camera model defaults to a pinhole camera model and all cameras are set to share the same intrinsics for incremental SfM reconstruction, thereby obtaining the intrinsic and extrinsic parameters of all cameras.

10. The neural radiance field incremental optimal view selection method based on hybrid uncertainty estimation according to claim 1, characterized in that, in the NeRF training initialization step, at least 15% of the images are randomly selected from the RGB image set for initialization training of the NeRF model, so that the NeRF model learns the basic structure and appearance features of the scene and stores them in a multi-layer perceptron network.
CN202411487001.3A, filed 2024-10-23: Incremental optimal view selection method for neural radiance field based on hybrid uncertainty estimation. Status: Pending. Published as CN119625162A (en).

Priority Applications (1)

Application Number: CN202411487001.3A; Publication: CN119625162A (en); Priority Date: 2024-10-23; Filing Date: 2024-10-23; Title: Incremental optimal view selection method for neural radiance field based on hybrid uncertainty estimation

Applications Claiming Priority (1)

Application Number: CN202411487001.3A; Publication: CN119625162A (en); Priority Date: 2024-10-23; Filing Date: 2024-10-23; Title: Incremental optimal view selection method for neural radiance field based on hybrid uncertainty estimation

Publications (1)

Publication Number: CN119625162A; Publication Date: 2025-03-14

Family


Family Applications (1)

Application Number: CN202411487001.3A (Pending); Publication: CN119625162A (en); Priority Date: 2024-10-23; Filing Date: 2024-10-23; Title: Incremental optimal view selection method for neural radiance field based on hybrid uncertainty estimation

Country Status (1)

Country: CN; Publication: CN119625162A (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party

US7542624B1 * (priority 2005-06-08, published 2009-06-02), Sandia Corporation: Window-based method for approximating the Hausdorff in three-dimensional range imagery
US20100036647A1 * (priority 2008-08-05, published 2010-02-11), Technion Research & Development Foundation Ltd.: Efficient computation of Voronoi diagrams of general generators in general spaces and uses thereof
CN103399994A * (priority 2013-07-23, published 2013-11-20), 中国人民解放军海军航空工程学院: Optimization method of periodic inspection process and maintenance of airplanes based on uncertain network planning techniques
US20230154104A1 * (priority 2021-11-12, published 2023-05-18), NEC Laboratories America, Inc.: Uncertainty-aware fusion towards large-scale NeRF
US20230377180A1 * (priority 2022-05-18, published 2023-11-23), Toyota Research Institute Inc.: Systems and methods for neural implicit scene representation with dense, uncertainty-aware monocular depth constraints
WO2024138350A1 * (priority 2022-12-27, published 2024-07-04): Video rendering method and system based on multi-scale spatial delta encoding
US20240295879A1 * (priority 2023-03-03, published 2024-09-05), Dalian University of Technology: Active scene mapping method based on constraint guidance and space optimization strategies

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party

赵艳丽, 赵小虎, 刘康明: "基于Voronoi图和量子粒子群算法的无人机航路规划" (UAV route planning based on Voronoi diagrams and the quantum particle swarm algorithm), 科技导报 (Science & Technology Review), no. 22, 8 August 2013. *
路坦, 李建泽, 吴仲超, 薛立军, 盛华艳, 孙智慧: "不确定性下柔性分布式多能源发电系统扩展规划" (Expansion planning of flexible distributed multi-energy power generation systems under uncertainty), 计算技术与自动化 (Computing Technology and Automation), no. 03, 28 September 2020. *

Cited By (1)

* Cited by examiner, † Cited by third party

CN120411325A * (priority 2025-07-03, published 2025-08-01), 江西农业大学 (Jiangxi Agricultural University): A neural radiation field plant rendering method and device integrating a large language model

Similar Documents

Publication | Title
CN108510573B: A method for reconstruction of multi-view face 3D model based on deep learning
Flynn et al., DeepStereo: Learning to predict new views from the world's imagery
CN112509115B: Method and system for three-dimensional time-varying unconstrained reconstruction of dynamic scene from sequence images
CN107481279B: Monocular video depth map calculation method
US8929645B2: Method and system for fast dense stereoscopic ranging
US20240013479A1: Methods and systems for training quantized neural radiance field
US20100111444A1: Method and system for fast dense stereoscopic ranging
CN113689539A: Dynamic scene real-time three-dimensional reconstruction method and device based on implicit optical flow field
CN115953476A: Human free viewpoint synthesis method based on generalizable neural radiation field
CN119625162A: Incremental optimal view selection method for neural radiance field based on hybrid uncertainty estimation
US20250157133A1: Neural dynamic image-based rendering
Li et al., Sat2Vid: Street-view panoramic video synthesis from a single satellite image
Fan et al., RS-DPSNet: Deep plane sweep network for rolling shutter stereo images
CN119006678A: Three-dimensional Gaussian splatting optimization method for pose-free input
Choi et al., TMO: Textured mesh acquisition of objects with a mobile device by using differentiable rendering
CN119991937A: A single-view 3D human body reconstruction method based on Gaussian surface elements
Premalatha et al., Adaptive fish school search optimized ResNet for multi-view 3D objects reconstruction
CN117576542A: A new viewpoint image synthesis method, system, device and storage medium
Zhang et al., A portable multiscopic camera for novel view and time synthesis in dynamic scenes
Kim et al., Complex-motion NeRF: joint reconstruction and pose optimization with motion and depth priors
CN111210507B: Initial view selection method for multi-view three-dimensional reconstruction
Thakur et al., A conditional adversarial network for scene flow estimation
Dickson et al., User-centred depth estimation benchmarking for VR content creation from single images
CN115205463B: New perspective image generation method, device and equipment based on multi-spherical scene expression
CN117315152B: Binocular stereoscopic imaging method and binocular stereoscopic imaging system

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
