Neural radiance field

From Wikipedia, the free encyclopedia
3D reconstruction technique

A neural radiance field (NeRF) is a neural field for reconstructing a three-dimensional representation of a scene from two-dimensional images. The NeRF model enables downstream applications of novel view synthesis, scene geometry reconstruction, and obtaining the reflectance properties of the scene. Additional scene properties such as camera poses may also be jointly learned. First introduced in 2020,[1] it has since gained significant attention for its potential applications in computer graphics and content creation.[2]

Algorithm


The NeRF algorithm represents a scene as a radiance field parametrized by a deep neural network (DNN). The network predicts a volume density and view-dependent emitted radiance given the spatial location (x, y, z) and viewing direction, expressed as Euler angles (θ, Φ), of the camera. By sampling many points along camera rays, traditional volume rendering techniques can produce an image.[1]
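The compositing step can be sketched with the standard volume-rendering quadrature. The following is a minimal NumPy illustration (the function name and sample values are illustrative, not taken from any published implementation):

```python
import numpy as np

def render_ray(densities, colors, deltas):
    """Volume-rendering quadrature along one camera ray.

    densities: (N,) non-negative volume density sigma_i at each sample
    colors:    (N, 3) view-dependent emitted radiance c_i at each sample
    deltas:    (N,) distance between adjacent samples along the ray
    Returns the accumulated RGB color for the ray.
    """
    # alpha_i = 1 - exp(-sigma_i * delta_i): opacity of each ray segment
    alphas = 1.0 - np.exp(-densities * deltas)
    # T_i: transmittance, i.e. probability the ray reaches sample i unoccluded
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]
    weights = trans * alphas  # contribution of each sample to the pixel
    return (weights[:, None] * colors).sum(axis=0)

# A ray passing through empty space, then entering a dense red region:
sigmas = np.array([0.0, 0.0, 50.0, 50.0])
cols = np.array([[0, 0, 0], [0, 0, 0], [1.0, 0, 0], [1.0, 0, 0]])
print(render_ray(sigmas, cols, deltas=np.full(4, 0.1)))  # ≈ [1, 0, 0]
```

Because every operation here is differentiable in the densities and colors, gradients can flow from a pixel error back to the network that predicted them.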

Data collection


A NeRF needs to be retrained for each unique scene. The first step is to collect images of the scene from different angles along with their respective camera poses. These are standard 2D images and do not require a specialized camera or software; any camera can generate suitable datasets, provided the settings and capture method meet the requirements for structure from motion (SfM).

This requires tracking of the camera position and orientation, often through some combination of SLAM, GPS, or inertial estimation. Researchers often use synthetic data to evaluate NeRF and related techniques. For such data, images (rendered through traditional non-learned methods) and their respective camera poses are reproducible and error-free.[3]

Training


For each sparse viewpoint (image and camera pose) provided, camera rays are marched through the scene, generating a set of 3D points with a given radiance direction (into the camera). For these points, volume density and emitted radiance are predicted using a multi-layer perceptron (MLP). An image is then generated through classical volume rendering. Because this process is fully differentiable, the error between the predicted image and the original image can be minimized with gradient descent over multiple viewpoints, encouraging the MLP to develop a coherent model of the scene.[1]
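As a toy illustration of this differentiability (not the actual NeRF training code), the sketch below fits a single density value by gradient descent so that the rendered opacity of one ray segment matches a target pixel value; the target, learning rate, and segment length are arbitrary choices:

```python
import numpy as np

def rendered_alpha(sigma, delta=0.1):
    # Opacity of a single ray segment of length delta with density sigma
    return 1.0 - np.exp(-sigma * delta)

def grad_alpha(sigma, delta=0.1):
    # Analytic derivative d(alpha)/d(sigma)
    return delta * np.exp(-sigma * delta)

target, sigma, lr = 0.8, 1.0, 50.0
for _ in range(200):
    err = rendered_alpha(sigma) - target        # photometric error
    sigma -= lr * 2 * err * grad_alpha(sigma)   # gradient descent step
print(round(rendered_alpha(sigma), 3))  # ≈ 0.8
```

A real NeRF does the same thing at scale: millions of densities and colors are implicit in the MLP's weights, and the gradient of the pixel error is backpropagated through the volume-rendering integral into those weights.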

Variations and improvements


Early versions of NeRF were slow to optimize and required that all input views were taken with the same camera in the same lighting conditions. These performed best when limited to orbiting around individual objects, such as a drum set, plants or small toys.[2] Since the original paper in 2020, many improvements have been made to the NeRF algorithm, with variations for special use cases.

Fourier feature mapping


In 2020, shortly after the release of NeRF, the addition of Fourier feature mapping improved training speed and image accuracy. Deep neural networks struggle to learn high-frequency functions in low-dimensional domains, a phenomenon known as spectral bias. To overcome this shortcoming, points are mapped to a higher-dimensional feature space before being fed into the MLP.

\gamma(\mathbf{v}) = \begin{bmatrix} a_1 \cos(2\pi \mathbf{B}_1^{T}\mathbf{v}) \\ a_1 \sin(2\pi \mathbf{B}_1^{T}\mathbf{v}) \\ \vdots \\ a_m \cos(2\pi \mathbf{B}_m^{T}\mathbf{v}) \\ a_m \sin(2\pi \mathbf{B}_m^{T}\mathbf{v}) \end{bmatrix}

where v is the input point, the B_i are the frequency vectors, and the a_i are coefficients.

This allows for rapid convergence to high frequency functions, such as pixels in a detailed image.[4]
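A minimal sketch of such a mapping with Gaussian random frequencies follows; the frequency scale of 10, the choice of m = 256, and setting all coefficients a_i to 1 are illustrative hyperparameters, not values prescribed by the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 256, 3                      # number of frequencies, input dimension
B = rng.normal(0.0, 10.0, (m, d))  # random frequency matrix (scale is a tunable hyperparameter)

def fourier_features(v):
    """Map a d-dimensional point v to a 2m-dimensional feature vector."""
    proj = 2.0 * np.pi * B @ v
    return np.concatenate([np.cos(proj), np.sin(proj)])  # all a_i = 1 here

x = np.array([0.1, 0.2, 0.3])
print(fourier_features(x).shape)  # (512,)
```

The MLP then receives this 512-dimensional vector instead of the raw 3D coordinate, which lets it fit fine spatial detail much faster.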

Bundle-adjusting neural radiance fields


One limitation of NeRFs is the requirement of knowing accurate camera poses to train the model. Pose estimation methods are often imperfect, and in some cases the camera poses cannot be known at all. These imperfections result in artifacts and suboptimal convergence. A method was therefore developed to optimize the camera poses along with the volumetric function itself. Called Bundle-Adjusting Neural Radiance Fields (BARF), the technique uses a dynamic low-pass filter to go from coarse to fine adjustment, minimizing error by finding the geometric transformation to the desired image. This corrects imperfect camera poses and greatly improves the quality of NeRF renders.[5]
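The coarse-to-fine schedule can be sketched as a frequency-dependent weight on the positional encoding, following the smooth windowing function described in the BARF paper; here alpha is a scalar that ramps from 0 up to the number of frequency bands over the course of training:

```python
import numpy as np

def barf_weight(k, alpha):
    """Coarse-to-fine weight for the k-th positional-encoding frequency band.

    As alpha grows during training, low frequencies are enabled first
    (a smooth, low-pass view of the scene) and higher frequencies are
    blended in gradually, acting as a dynamic low-pass filter.
    """
    t = np.clip(alpha - k, 0.0, 1.0)
    return (1.0 - np.cos(t * np.pi)) / 2.0

# Early in training (alpha = 1.5): only the lowest bands are active.
print([round(barf_weight(k, 1.5), 2) for k in range(4)])  # [1.0, 0.5, 0.0, 0.0]
```

Because pose gradients through high-frequency encodings are noisy, suppressing those bands early lets the pose and the radiance field converge jointly instead of fighting each other.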

Multiscale representation


Conventional NeRFs struggle to represent detail at all viewing distances, producing blurry images up close and overly aliased images from distant views. In 2021, researchers introduced mip-NeRF (named after mipmaps), a technique to improve the sharpness of details at different viewing scales. Rather than sampling a single ray per pixel, the technique fits a Gaussian to the conical frustum cast by the camera. This improvement effectively anti-aliases across all viewing scales. mip-NeRF also reduces overall image error and converges faster at about half the size of a ray-based NeRF.[6]

Learned initializations


In 2021, researchers applied meta-learning to assign initial weights to the MLP. This rapidly speeds up convergence by effectively giving the network a head start in gradient descent. Meta-learning also allowed the MLP to learn an underlying representation of certain scene types. For example, given a dataset of famous tourist landmarks, an initialized NeRF could partially reconstruct a scene from a single image.[7]

NeRF in the wild


Conventional NeRFs are vulnerable to slight variations in input images (objects, lighting), often resulting in ghosting and artifacts. As a result, NeRFs struggle to represent dynamic scenes, such as bustling city streets with changing lighting and moving objects. In 2021, researchers at Google[2] developed NeRF in the Wild (NeRF-W), a method for accounting for these variations. It splits the neural network (MLP) into three separate models: the main MLP is retained to encode the static volumetric radiance, but it operates in sequence with a separate MLP for appearance embedding (changes in lighting and camera properties) and an MLP for transient embedding (changes in scene objects). This allows a NeRF to be trained on diverse photo collections, such as those taken by mobile phones at different times of day.[8]

Relighting


In 2021, researchers added more outputs to the MLP at the heart of NeRFs. The outputs now included volume density, surface normal, material parameters, distance to the first surface intersection (in any direction), and visibility of the external environment in any direction. Including these new parameters lets the MLP learn material properties rather than pure radiance values, facilitating a more complex rendering pipeline that calculates direct and global illumination, specular highlights, and shadows. As a result, the NeRF can render the scene under any lighting conditions without retraining.[9]

Plenoctrees


Although NeRFs had reached high levels of fidelity, their costly compute time made them impractical for applications requiring real-time rendering, such as VR/AR and interactive content. Introduced in 2021, PlenOctrees (plenoptic octrees) enabled real-time rendering of pre-trained NeRFs by dividing the volumetric radiance function into an octree. Rather than assigning a radiance direction into the camera, viewing direction is removed from the network input and a spherical radiance function is predicted for each region. This makes rendering over 3000× faster than conventional NeRFs.[10]
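The per-region spherical radiance is stored as spherical-harmonic coefficients, so view-dependent color becomes a cheap closed-form lookup rather than a network query. A minimal degree-1 evaluation is sketched below; the coefficient layout and function name are illustrative, not taken from the PlenOctrees code:

```python
import numpy as np

def sh_radiance(coeffs, d):
    """Evaluate view-dependent color from degree-1 spherical harmonics.

    coeffs: (3, 4) SH coefficients, one row per RGB channel
    d:      unit viewing direction (dx, dy, dz)
    """
    dx, dy, dz = d
    # Real spherical-harmonic basis up to degree 1
    basis = np.array([0.28209479,        # Y_0^0  (constant term)
                      0.48860251 * dy,   # Y_1^{-1}
                      0.48860251 * dz,   # Y_1^0
                      0.48860251 * dx])  # Y_1^1
    return coeffs @ basis

# With only the constant coefficient set, color is view-independent:
flat = np.zeros((3, 4)); flat[:, 0] = 1.0
print(sh_radiance(flat, (1, 0, 0)))
```

Higher SH degrees (PlenOctrees uses more coefficients per channel) capture sharper view-dependent effects such as specular highlights at slightly higher storage cost.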

Sparse Neural Radiance Grid


Similar to PlenOctrees, this method enabled real-time rendering of pre-trained NeRFs. To avoid querying the large MLP for each point, it bakes NeRFs into Sparse Neural Radiance Grids (SNeRG). A SNeRG is a sparse voxel grid containing opacity and color, with learned feature vectors encoding view-dependent information. A lightweight, more efficient MLP then produces view-dependent residuals that modify the color and opacity. To enable this compressive baking, small changes were made to the NeRF architecture, such as running the MLP once per pixel rather than for each point along the ray. These improvements make SNeRG extremely efficient, outperforming PlenOctrees.[11]

Instant NeRFs


In 2022, researchers at Nvidia enabled real-time training of NeRFs through a technique known as Instant Neural Graphics Primitives. An innovative input encoding reduces computation, enabling real-time training of a NeRF, an improvement of orders of magnitude over previous methods. The speedup stems from the use of spatial hash functions, which have O(1) access times, and from parallelized architectures that run fast on modern GPUs.[12]
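The hash itself is simple: integer grid coordinates are scaled by large primes and XOR-combined, mapping any point to a slot in a fixed-size feature table in O(1). The sketch below uses the per-dimension primes from the Instant-NGP paper; the table size is an illustrative choice, and collisions are tolerated because training implicitly resolves them:

```python
def spatial_hash(x, y, z, table_size=2**19):
    """Spatial hash of an integer grid coordinate (Instant-NGP style).

    Returns an index into a fixed-size table of learned feature vectors.
    """
    # Per-dimension primes from the Instant-NGP paper (pi_1 = 1)
    primes = (1, 2_654_435_761, 805_459_861)
    return (x * primes[0] ^ y * primes[1] ^ z * primes[2]) % table_size

h = spatial_hash(12, 34, 56)
print(h)
```

Instant-NGP applies this at multiple grid resolutions and concatenates the looked-up feature vectors, so a much smaller MLP suffices downstream.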

Related techniques


Plenoxels


Plenoxels (plenoptic volume elements) use a sparse voxel representation in place of the MLP-based volumetric function seen in NeRFs. The MLP is removed entirely; gradient descent is instead performed directly on the voxel coefficients, which can match the fidelity of a conventional NeRF in orders of magnitude less training time. Published in 2022, this method showed that the MLP is not the essential ingredient and that the differentiable rendering pipeline is the critical component.[13]

Gaussian splatting

Main article: Gaussian splatting

Gaussian splatting is a newer method that can outperform NeRF in render time and fidelity. Rather than representing the scene as a volumetric function, it uses a sparse cloud of 3D Gaussians. First, a point cloud is generated (through structure from motion) and converted to Gaussians with initial covariance, color, and opacity. The Gaussians are then optimized directly through stochastic gradient descent to match the input images. This saves computation by removing empty space and foregoing the need to query a neural network for each point: all the Gaussians are simply "splatted" onto the screen, where they overlap to produce the desired image.[14]

Photogrammetry


Traditional photogrammetry is not neural; it uses robust geometric equations to obtain 3D measurements. Unlike photogrammetric methods, NeRFs do not inherently produce dimensionally accurate 3D geometry. While their results are often sufficient for extracting accurate geometry (e.g., via marching cubes[1]), the process is fuzzy, as with most neural methods. This limits NeRFs to cases where the output image is valued rather than the raw scene geometry. However, NeRFs excel in situations with unfavorable lighting: photogrammetric methods break down completely when reconstructing reflective or transparent objects in a scene, while a NeRF is able to infer the geometry.[15]

Applications


NeRFs have a wide range of applications, and are starting to grow in popularity as they become integrated into user-friendly applications.[3]

Content creation

Video rendered from a neural radiance field

NeRFs have huge potential in content creation, where on-demand photorealistic views are extremely valuable.[16] The technology democratizes a space previously accessible only to teams of VFX artists with expensive assets; neural radiance fields allow anyone with a camera to create compelling 3D environments.[3] NeRF has been combined with generative AI, allowing users with no modelling experience to instruct changes in photorealistic 3D scenes.[17] NeRFs have potential uses in video production, computer graphics, and product design.

Interactive content


The photorealism of NeRFs makes them appealing for applications where immersion is important, such as virtual reality or video games. NeRFs can be combined with classical rendering techniques to insert synthetic objects and create believable virtual experiences.[18]

Medical imaging


NeRFs have been used to reconstruct 3D CT scans from sparse or even single X-ray views, demonstrating high-fidelity renderings of chest and knee data. If adopted, this method could spare patients excess doses of ionizing radiation, allowing for safer diagnosis.[19]

Robotics and autonomy


The unique ability of NeRFs to understand transparent and reflective objects makes them useful for robots interacting in such environments. The use of NeRF allowed a robot arm to precisely manipulate a transparent wine glass, a task where traditional computer vision would struggle.[20]

NeRFs can also generate photorealistic human faces, making them valuable tools for human-computer interaction. Traditionally rendered faces can be uncanny, while other neural methods are too slow to run in real-time.[21]

References

  1. Mildenhall, Ben; Srinivasan, Pratul P.; Tancik, Matthew; Barron, Jonathan T.; Ramamoorthi, Ravi; Ng, Ren (2020). "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis". In Vedaldi, Andrea; Bischof, Horst; Brox, Thomas; Frahm, Jan-Michael (eds.). Computer Vision – ECCV 2020. Lecture Notes in Computer Science. Vol. 12346. Cham: Springer International Publishing. pp. 405–421. arXiv:2003.08934. doi:10.1007/978-3-030-58452-8_24. ISBN 978-3-030-58452-8.
  2. "What is a Neural Radiance Field (NeRF)? | Definition from TechTarget". Enterprise AI. Retrieved 2023-10-24.
  3. Tancik, Matthew; Weber, Ethan; Ng, Evonne; Li, Ruilong; Yi, Brent; Kerr, Justin; Wang, Terrance; Kristoffersen, Alexander; Austin, Jake; Salahi, Kamyar; Ahuja, Abhik; McAllister, David; Kanazawa, Angjoo (2023). "Nerfstudio: A Modular Framework for Neural Radiance Field Development". Special Interest Group on Computer Graphics and Interactive Techniques Conference Proceedings. pp. 1–12. arXiv:2302.04264. doi:10.1145/3588432.3591516. ISBN 9798400701597.
  4. Tancik, Matthew; Srinivasan, Pratul P.; Mildenhall, Ben; Fridovich-Keil, Sara; Raghavan, Nithin; Singhal, Utkarsh; Ramamoorthi, Ravi; Barron, Jonathan T.; Ng, Ren (2020). "Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains". arXiv:2006.10739 [cs.CV].
  5. Lin, Chen-Hsuan; Ma, Wei-Chiu; Torralba, Antonio; Lucey, Simon (2021). "BARF: Bundle-Adjusting Neural Radiance Fields". arXiv:2104.06405 [cs.CV].
  6. Barron, Jonathan T.; Mildenhall, Ben; Tancik, Matthew; Hedman, Peter; Martin-Brualla, Ricardo; Srinivasan, Pratul P. (2021). "Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields". arXiv:2103.13415 [cs.CV].
  7. Tancik, Matthew; Mildenhall, Ben; Wang, Terrance; Schmidt, Divi; Srinivasan, Pratul (2021). "Learned Initializations for Optimizing Coordinate-Based Neural Representations". arXiv:2012.02189 [cs.CV].
  8. Martin-Brualla, Ricardo; Radwan, Noha; Sajjadi, Mehdi S. M.; Barron, Jonathan T.; Dosovitskiy, Alexey; Duckworth, Daniel (2020). "NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections". arXiv:2008.02268 [cs.CV].
  9. Srinivasan, Pratul P.; Deng, Boyang; Zhang, Xiuming; Tancik, Matthew; Mildenhall, Ben; Barron, Jonathan T. (2020). "NeRV: Neural Reflectance and Visibility Fields for Relighting and View Synthesis". arXiv:2012.03927 [cs.CV].
  10. Yu, Alex; Li, Ruilong; Tancik, Matthew; Li, Hao; Ng, Ren; Kanazawa, Angjoo (2021). "PlenOctrees for Real-time Rendering of Neural Radiance Fields". arXiv:2103.14024 [cs.CV].
  11. Hedman, Peter; Srinivasan, Pratul P.; Mildenhall, Ben; Barron, Jonathan T.; Debevec, Paul (2021). "Baking Neural Radiance Fields for Real-Time View Synthesis". arXiv:2103.14645 [cs.CV].
  12. Müller, Thomas; Evans, Alex; Schied, Christoph; Keller, Alexander (2022). "Instant Neural Graphics Primitives with a Multiresolution Hash Encoding". ACM Transactions on Graphics. 41 (4): 1–15. arXiv:2201.05989. doi:10.1145/3528223.3530127. ISSN 0730-0301.
  13. Fridovich-Keil, Sara; Yu, Alex; Tancik, Matthew; Chen, Qinhong; Recht, Benjamin; Kanazawa, Angjoo (2021). "Plenoxels: Radiance Fields without Neural Networks". arXiv:2112.05131 [cs.CV].
  14. Kerbl, Bernhard; Kopanas, Georgios; Leimkuehler, Thomas; Drettakis, George (2023). "3D Gaussian Splatting for Real-Time Radiance Field Rendering". ACM Transactions on Graphics. 42 (4): 1–14. arXiv:2308.04079. doi:10.1145/3592433. ISSN 0730-0301.
  15. "Why THIS is the Future of Imagery (and Nobody Knows it Yet)". 20 November 2022 – via www.youtube.com.
  16. "Shutterstock Speaks About NeRFs At Ad Week | Neural Radiance Fields". neuralradiancefields.io. 2023-10-20. Retrieved 2023-10-24.
  17. Haque, Ayaan; Tancik, Matthew; Efros, Alexei; Holynski, Aleksander; Kanazawa, Angjoo (2023). "InstructPix2Pix: Learning to Follow Image Editing Instructions". 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. pp. 18392–18402. arXiv:2211.09800. doi:10.1109/cvpr52729.2023.01764. ISBN 979-8-3503-0129-8.
  18. "Venturing Beyond Reality: VR-NeRF | Neural Radiance Fields". neuralradiancefields.io. 2023-11-08. Retrieved 2023-11-09.
  19. Corona-Figueroa, Abril; Frawley, Jonathan; Bond-Taylor, Sam; Bethapudi, Sarath; Shum, Hubert P. H.; Willcocks, Chris G. (2022). "MedNeRF: Medical Neural Radiance Fields for Reconstructing 3D-aware CT-Projections from a Single X-ray". 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE. pp. 3843–3848. doi:10.1109/embc48229.2022.9871757. PMID 36085823.
  20. Kerr, Justin; Fu, Letian; Huang, Huang; Avigal, Yahav; Tancik, Matthew; Ichnowski, Jeffrey; Kanazawa, Angjoo (2022). "Evo-NeRF: Evolving NeRF for Sequential Robot Grasping of Transparent Objects". CoRL 2022.
  21. Aurora (2023-06-04). "Generating highly detailed human faces using Neural Radiance Fields". ILLUMINATION. Archived from the original on 2023-11-16. Retrieved 2023-11-09.
Retrieved from "https://en.wikipedia.org/w/index.php?title=Neural_radiance_field&oldid=1310707783"