Gemini connector

Interfacing differentiable physical models and neural networks


Informatik Spektrum

Abstract

Spectacular advances have been made in the field of machine vision over the past decade. While this discipline is traditionally driven by geometric models, neural networks have proven superior in some applications and have significantly expanded the limits of what is possible. At the same time, conventional graphics models describe the relationship between images and the associated scene with textures and light in a physically realistic manner and are an important part of photogrammetry. Differentiable renderers combine these approaches by enabling gradient-based optimization within the fixed structures of a graphics pipeline and can thus be coupled with the learning process of neural networks. This fusion of formalized knowledge and machine learning motivates the idea of a modular differentiable renderer in which physical and statistical models can be recombined depending on the use case. We therefore present Gemini Connector: an initiative for the modular development and combination of differentiable physical models and neural networks. We examine opportunities and challenges and motivate the idea by extending a differentiable rendering pipeline with models of underwater optics for the analysis of deep-sea images. Finally, we discuss use cases, especially within the Cross-Domain Fusion initiative.



Introduction

Inverse problems are integral parts of numerous scientific disciplines with applications in earth sciences such as geophysics and oceanography, but also in the areas of signal processing and computer vision [1]. Their solution is traditionally based on the analytical evaluation of physical models that (minimally) describe the processes to be examined using specific functions and parameters. In addition to such explicitly formalized models, data-driven models that provide probabilistic solutions directly through the analysis of given measurement data are increasingly being used. This includes machine learning methods, which have proven to be powerful and versatile tools for solving inverse problems and are state-of-the-art in many areas due to their high precision [8].

In the field of computer vision, such techniques have greatly expanded the limits of what was previously possible in areas such as object recognition, visual tracking or semantic segmentation. Their capabilities are not firmly anchored in structures defined by experts, but learned from scratch through training using example data. However, the assumption that the incorporation of formalized knowledge should generally be avoided when training neural networks is a fallacy.

An example of this is the 3D reconstruction of an object or scene from given 2D images. In computer graphics, rendering refers to the process of generating images from a 3D model that defines the scene through surface descriptions, light, and camera properties. Methods in this area build on decades of research and are highly developed and technically mature. Physically based models such as ray tracing, which realistically simulate the radiative exchange of light, sometimes achieve photorealism and have almost closed the visible gap between synthetic and real images.

Making this formalized knowledge usable for 3D reconstruction from images motivates the research area of inverse rendering. Standard renderers cannot easily be coupled with gradient-based methods such as neural networks, since they contain discretization steps like rasterization. To close this gap, differentiable renderers have been developed in recent years that eliminate these obstacles by replacing the non-differentiable program parts.

In this paper, we discuss synergetic effects of a fusion of statistical and physical models. To this end, we present Gemini Connector, an initiative to create a platform for differentiable rendering with an easily extensible modular concept. In this way, optical phenomena can either be described physically or learned via neural networks, expanding the possibilities of image analysis through inverse rendering (see Fig. 1).

Fig. 1

Central idea: The fusion of neural networks and differentiable physically based models enables any recombination of these approaches as a bidirectional interface between the world and its parametric description. Instances can be synthesized from the latter, but conversely the model parameters can also be optimized. This hybrid approach aims to achieve a synergetic effect between formalized and data-driven models

In a concrete application example, we investigate how such a combination can be used for improved image analysis to solve visual challenges in underwater imagery, using a tightly coupled combination of differentiable renderers with physical and statistical models of underwater optics. This enables the measurement of visual underwater properties from ocean imagery, e.g., color absorption patterns or turbidity. It has been shown that these optical parameters can be used to detect and analyze further ocean health parameters [3]. We will therefore discuss possible applications in other projects of Cross-Domain Fusion (CDF), especially but not limited to ocean science.

Related work

Neural networks are now being used very successfully in virtually every conceivable area. This success was built on the backpropagation algorithm in conjunction with graphics processing unit (GPU)-based autodiff frameworks such as Caffe, PyTorch, and TensorFlow. Recently, there have also been significant developments in differentiable physical models, which build on a similar technical foundation and therefore benefit from interoperability.

Differentiable rendering

Differentiable rendering is a young but rapidly developing field of image analysis; the first open-source framework, OpenDR, was released in 2014. In just a few years, a large number of application areas emerged in 3D, material, and light reconstruction as well as body and pose estimation. Kato et al. provide an overview of current techniques [6]. The rapidly growing visibility is driven by new platforms from major vendors such as NVIDIA's Kaolin (2019), Meta's PyTorch3D (2020), and Alphabet's TensorFlow Graphics (2021), and by their direct connection to established machine learning libraries such as PyTorch and TensorFlow. Despite their visually appealing results, conventional computer graphics models are simplified and only realistic to a limited extent. Renderers like Mitsuba 2 (2019) therefore introduce differentiable ray tracing, in which the light distribution is grounded in physical principles. Although this method is very computationally intensive and is sometimes not considered sufficiently performant for deep learning approaches [7], it enables inverse rendering with even more realistic feedback.

Image analysis in the medium of water

Scattering media reduce visibility and cast a visual haze over distant objects. Depending on the composition of the medium, the range of vision can be barely or greatly restricted, image content can be blurred, colors can be desaturated, or scattered light can cover the field of view. Although these properties hide areas behind the medium, they also provide information about the medium itself. For water, particularly the open bodies of water on our planet, these visual properties in underwater images can provide information about the condition of the water body. For example, suspended particulate materials such as plankton and detritus, nutrients, or microplastics can be detected optically [3]. The choice of lighting model depends on whether the modeled water layer is illuminated by sunlight or by artificial light sources. In the first case, the lighting model by Nayar and Narasimhan [10], originally intended for foggy weather, has also become established for uniformly illuminated underwater scenarios; for the second, a macroscopic underwater model was presented by McGlamery and later extended by Jaffe [5] (Fig. 2). In addition to effects such as wavelength-dependent light attenuation, forward- and backward-scattered light components were identified and taken into account. Measurements by Petzold show that scattering distributions differ greatly between clear and turbid water [11].
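The wavelength-dependent attenuation and the veiling backscatter component can be sketched with a strongly simplified image-formation model in the spirit of Jaffe-McGlamery; the function and parameter names below are illustrative stand-ins, and forward scattering and artificial light geometry are deliberately omitted:

```python
import numpy as np

def underwater_image(J, d, beta, B_inf):
    """Simplified underwater image formation: the direct signal is
    attenuated by the medium (Beer-Lambert) and a backscatter veil is
    added. J: scene radiance per color channel, d: object distance (m),
    beta: per-channel attenuation coefficient (1/m),
    B_inf: per-channel veiling light at infinite distance."""
    t = np.exp(-beta * d)            # per-channel transmission
    return J * t + B_inf * (1.0 - t)
```

At distance zero the model returns the unattenuated scene radiance; with increasing distance the observed color converges toward the veiling light, which is exactly the haze effect described above.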

Fig. 2

Three-dimensional scene in mesh representation (a) and with the underwater lighting model according to Jaffe-McGlamery (b) with its light components (c). Sunlight is not taken into account, but can be integrated via other models

Approach

The reconstruction of image content sometimes requires complex inverse rendering pipelines—a suitable combination of methods is one of the main challenges. We strive for a simple platform for the fusion of such methods and motivate this idea with a modular extension of differentiable renderers with physical models of underwater optics. This work builds directly on the results of Nakath et al. [9], who successfully estimated light and water parameters using differentiable ray tracing.

The goal of this effort is to recover scene parameters. Rendered images are compared with a reference image using an error function, and the parameters to be optimized are iteratively adjusted via gradient descent. How many and which parameters are estimated rather than specified is freely configurable and has a major impact on accuracy. In the following, we therefore examine several research questions with different configurations of the rendering pipeline.
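The render-compare-optimize loop can be sketched with a toy one-parameter "renderer" (pure Beer-Lambert attenuation of a known scene radiance); the setup, names, and the hand-derived gradient below are illustrative stand-ins for an autodiff framework, not the actual pipeline:

```python
import numpy as np

def render(beta, J, d):
    # Toy "renderer": Beer-Lambert attenuation of known scene radiance J
    # over distance d with attenuation coefficient beta.
    return J * np.exp(-beta * d)

J = np.array([0.8, 0.6, 0.4])    # known scene radiance per channel
d = 2.0                          # known object distance (m)
beta_true = 0.35                 # unknown medium parameter to recover
reference = render(beta_true, J, d)

beta, lr = 0.05, 0.5             # initial guess and step size
for _ in range(500):
    pred = render(beta, J, d)
    residual = pred - reference                 # error w.r.t. reference
    grad = np.sum(2.0 * residual * -d * pred)   # analytic d(L2 loss)/d(beta)
    beta -= lr * grad                           # gradient-descent update
```

After a few hundred iterations `beta` converges to the value that produced the reference image; in the real pipeline the renderer, the error function, and the parameter set are correspondingly richer.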

Optimizing light and water parameters

Fig. 3 shows the core application, the differentiable renderer, augmented with underwater lighting (Jaffe-McGlamery) and light scattering data (Petzold's scattering measurements). Depending on the application, parameters can be specified or learned; larger numbers of simultaneously learned parameters lead to complex optimization problems that can sometimes be solved only with difficulty or imprecisely.

Fig. 3

Structure of the extended differentiable renderer built around the Jaffe-McGlamery model with Petzold's scattering measurements, capable of detecting individual color attenuation and water turbidity

Light scattering

Synthetic copies can be generated from real input images with known geometry, texture, and pose, which ideally differ only in the light scattering. For the measurement of specific light behaviour, all standard parameters described in Fig. 3 should therefore be known. Using the optimization approach described above, these values can be traced back to light distributions if a suitable model has been chosen for them. Fig. 4 shows the recovery of discrete measured values that can be interpolated to a Gaussian-like light distribution. An implicit functional modeling would also be possible, as described in the section on adaptive Monte Carlo sampling. If a fixed scattering model is used instead, as shown in Fig. 3, the water turbidity of the input images can be determined from it.
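Interpolating recovered discrete scattering values to a continuous distribution can be sketched as follows; the tabulated angles and intensities are made-up stand-ins for Petzold-style measurements, not real data:

```python
import numpy as np

# Hypothetical discrete scattering measurements (angle in degrees,
# relative intensity) standing in for Petzold-style tabulated data.
angles = np.array([0.0, 10.0, 45.0, 90.0, 135.0, 180.0])
vsf    = np.array([1.0, 0.50, 0.08, 0.02, 0.015, 0.01])

def scatter(theta_deg):
    """Interpolate the discrete table to a continuous function of angle.
    Interpolating in log space keeps the strongly peaked forward lobe
    positive and smooth."""
    return np.exp(np.interp(theta_deg, angles, np.log(vsf)))
```

A differentiable pipeline needs values between the measurement angles, which is exactly what such an interpolant (or, alternatively, an implicit functional model) provides.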

Fig. 4

The scattered light does not follow a normal distribution and can be estimated depending on the angle. a Input images. b Forward scatter measurement results overlaid with an estimated isotropic light distribution [12]

Color attenuation

The measurement of the color attenuation through the medium can be carried out using a similar setup in which all standard parameters are likewise known. In addition, the use of a Macbeth chart (see Fig. 5) is well suited to determining the attenuation gradients of the different color channels.
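With known chart colors and geometry, and ignoring backscatter, the per-channel attenuation even follows in closed form by inverting Beer-Lambert attenuation; the patch colors and coefficients below are illustrative, not measured values:

```python
import numpy as np

# Known reference colors of two chart patches (rows) per RGB channel.
J = np.array([[0.7, 0.5, 0.3],
              [0.2, 0.6, 0.4]])
d = 1.5                               # camera-to-chart distance (m)
beta = np.array([0.10, 0.05, 0.30])   # unknown in practice
I = J * np.exp(-beta * d)             # observed patch colors (no scatter)

# Per-channel attenuation recovered by inverting I = J * exp(-beta * d),
# averaged over the chart patches.
beta_est = -np.log(I / J).mean(axis=0) / d
```

In the full pipeline this closed-form inversion is replaced by the gradient-based optimization, which can additionally account for scattering, lighting, and sensor effects.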

Fig. 5

a Use of an ArUco board with a Macbeth chart to calibrate the water parameters (experimental setup adapted from [9]). For comparison: b Macbeth chart with undistorted colors and without reflections. MacBeth ColorChecker Chart by Acser123 (https://en.wikipedia.org/wiki/File:Color_Checker.pdf). License: https://creativecommons.org/licenses/by-sa/4.0/deed.en

Adaptive Monte Carlo sampling

Physically realistic rendering is usually based on path tracing, i.e., Monte Carlo-based ray tracing. Light transport paths toward the emitting light sources are sampled probabilistically with the help of the bidirectional reflectance distribution function, and their influence on the image sensor is simulated. The greater the number of samples, the closer the result converges to realistic global illumination.
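The convergence behavior can be illustrated with a one-dimensional Monte Carlo estimator standing in for the high-dimensional light-transport integral; the integrand is an arbitrary toy example, and the standard error of such an estimator shrinks as 1/sqrt(n):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(f, n):
    """Monte Carlo estimate of the integral of f over [0, 1] with
    n uniform samples; the error decreases as 1/sqrt(n)."""
    x = rng.random(n)
    return f(x).mean()

f = lambda x: 3.0 * x**2   # closed-form integral over [0, 1] is exactly 1
coarse = abs(mc_estimate(f, 100) - 1.0)
fine = abs(mc_estimate(f, 100_000) - 1.0)
```

This slow square-root convergence is precisely why the sample counts in path tracing become a limiting cost factor, motivating the adaptive sampling methods discussed next.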

Despite technical progress, the computational effort is still a limiting factor. In addition to noise-suppression methods, adaptive methods have been developed that fit the sampling distribution of the Monte Carlo simulation to the light field. Huo gives an overview and presents a reinforcement learning approach that achieves realistic results with a smaller number of samples [4]. He also suggests representing light distributions as implicit functions, for example via neural radiance fields. Their superior interpolation properties appear very promising for the reconstruction of light distributions. Likewise, as shown in Fig. 6, an extension to the complex lighting conditions of underwater scenarios would be very interesting, since these are not fully physically captured but are only specified by a discrete number of measured values.

Fig. 6

Replacing the scattering model (see Fig. 3) and using a neural network to learn and interpolate the scattering distribution of light in water, dependent on turbidity

Conclusion

In this paper we presented the idea of Gemini Connector, a project within the CDF initiative. We strive for a fusion of data-driven and model-based approaches achieved through end-to-end differentiation. As an example, we presented an application that measures optical underwater parameters from which further ocean parameters can be deduced. We seek to combine these observations with other measurements and models to synergistically create a more holistic view of Earth-scale systems. The examination of digital twinning and CDF is a central aspect of this research. While we seek to develop specific methods for the underwater projects planned in this initiative [2], the announced platform should not be limited to these applications. Using examples in the underwater optical domain, we have shown the potential of this approach, which offers new possibilities for the estimation of environmental parameters, but also for the physically based instantiation of samples of these models. By recombining the modules, the interface can be adapted to the respective requirements. We want to employ this potential within the CDF initiative in order to promote the postulated paradigm of CDF through a technical approach.

References

  1. Feng X, Jiang Y, Yang X, Du M, Li X (2019) Computer vision algorithms and hardware implementations: A survey. Integration 69:309–320

  2. Grossmann V, Nakath D, Urlaub M, Oppelt N, Koch R, Köser K (2022) Digital twinning in the ocean: challenges in multimodal sensing and multiscale fusion based on faithful visual models. In: Proceedings of the ISPRS Congress 2022, Nice

  3. Huang H, Sun Z, Liu S, Di Y, Xu J, Liu C, Xu R, Song H, Zhan S, Wu J (2021) Underwater hyperspectral imaging for in situ underwater microplastic detection. Sci Total Environ 776:145960

  4. Huo Y (2022) Extension-adaptive sampling with implicit radiance field. arXiv preprint arXiv:2202.00855

  5. Jaffe JS (1990) Computer modeling and the design of optimal underwater imaging systems. IEEE J Ocean Eng 15(2):101–111

  6. Kato H, Beker D, Morariu M, Ando T, Matsuoka T, Kehl W, Gaidon A (2020) Differentiable rendering: a survey. CoRR abs/2006.12057

  7. Laine S, Hellsten J, Karras T, Seol Y, Lehtinen J, Aila T (2020) Modular primitives for high-performance differentiable rendering. ACM Trans Graph 39(6):1–14

  8. McCann MT, Jin KH, Unser M (2017) Convolutional neural networks for inverse problems in imaging: a review. IEEE Signal Process Mag 34(6):85–95

  9. Nakath D, She M, Song Y, Köser K (2021) In-situ joint light and medium estimation for underwater color restoration. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 3731–3740

  10. Nayar SK, Narasimhan SG (1999) Vision in bad weather. In: Proceedings of the seventh IEEE international conference on computer vision, vol 2, pp 820–827

  11. Petzold TJ (1972) Volume scattering functions for selected ocean waters. Technical report, Scripps Institution of Oceanography, La Jolla, CA, Visibility Lab

  12. Song Y, Nakath D, She M, Elibol F, Köser K (2021) Deep sea robotic imaging simulator. In: International Conference on Pattern Recognition. Springer, Berlin, pp 375–389


Acknowledgements

The work has been supported through the Future Ocean Network and Kiel Marine Science by Kiel University (funding for V. Grossmann) as well as by Deutsche Forschungsgemeinschaft (Projektnummer 396311425).

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

  1. Institut für Informatik, Christian-Albrechts-Universität zu Kiel, Kiel, Germany

    Vasco Grossmann & Reinhard Koch

  2. Helmholtz-Zentrum für Ozeanforschung Kiel, Kiel, Germany

    David Nakath & Kevin Köser

Authors
  1. Vasco Grossmann
  2. David Nakath
  3. Reinhard Koch
  4. Kevin Köser

Corresponding author

Correspondence to Vasco Grossmann.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Grossmann, V., Nakath, D., Koch, R. et al. Gemini connector. Informatik Spektrum 45, 309–313 (2022). https://doi.org/10.1007/s00287-022-01492-x


