Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15115)

Included in the following conference series:

European Conference on Computer Vision

Abstract

3D decomposition/segmentation remains a challenge as large-scale 3D annotated data is not readily available. Existing approaches typically leverage 2D machine-generated segments, integrating them to achieve 3D consistency. In this paper, we propose ClusteringSDF, a novel approach achieving both segmentation and reconstruction in 3D via the neural implicit surface representation, specifically the Signed Distance Function (SDF), where the segmentation rendering is directly integrated with the volume rendering of neural implicit surfaces. Although based on ObjectSDF++, ClusteringSDF no longer requires ground-truth segments for supervision while maintaining the capability of reconstructing individual object surfaces, relying purely on the noisy and inconsistent labels from pre-trained models. As the core of ClusteringSDF, we introduce a highly efficient clustering mechanism for lifting 2D labels to 3D. Experimental results on the challenging scenes from ScanNet and Replica datasets show that ClusteringSDF can achieve competitive performance compared to the state-of-the-art with significantly reduced training time.
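
To make the phrase "segmentation rendering is directly integrated with the volume rendering of neural implicit surfaces" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common Laplace-CDF conversion from signed distance to density (density = Psi_beta(-sdf) / beta) and standard alpha compositing along a ray. All function names, the value of beta, and the toy ray are hypothetical choices made for illustration, and the exact rendering formulation in ClusteringSDF or ObjectSDF++ may differ.

```python
import math


def laplace_cdf(s, beta):
    """CDF of a zero-mean Laplace distribution with scale beta."""
    if s <= 0:
        return 0.5 * math.exp(s / beta)
    return 1.0 - 0.5 * math.exp(-s / beta)


def sdf_to_density(sdf, beta=0.1):
    """Assumed conversion: density = Psi_beta(-sdf) / beta (high inside, low outside)."""
    return laplace_cdf(-sdf, beta) / beta


def render_weights(sdf_samples, deltas, beta=0.1):
    """Compositing weights w_i = T_i * (1 - exp(-sigma_i * delta_i)) along one ray."""
    weights = []
    transmittance = 1.0  # fraction of light that has survived so far
    for sdf, delta in zip(sdf_samples, deltas):
        alpha = 1.0 - math.exp(-sdf_to_density(sdf, beta) * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def render_labels(weights, label_probs):
    """Blend per-sample label distributions into one per-pixel distribution."""
    num_labels = len(label_probs[0])
    return [
        sum(w * probs[k] for w, probs in zip(weights, label_probs))
        for k in range(num_labels)
    ]


# Toy ray (hypothetical data): the surface sits at depth 0.5, so the SDF is
# positive in front of it and negative behind it.
depths = [0.1 * i for i in range(1, 10)]
sdf_samples = [0.5 - t for t in depths]
deltas = [0.1] * len(depths)
weights = render_weights(sdf_samples, deltas)

# Samples near the visible surface vote mostly for label 0, deeper ones for label 1.
label_probs = [[0.9, 0.1] if t < 0.7 else [0.1, 0.9] for t in depths]
pixel_labels = render_labels(weights, label_probs)
print(weights)
print(pixel_labels)
```

Because the same weights that composite color also composite the label distribution, the samples closest to the first visible surface dominate the rendered 2D label, which is what ties the segmentation output to the reconstructed geometry in this kind of pipeline.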

Acknowledgements

This study is supported under the RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s). Chuanxia Zheng is supported by EPSRC SYN3D EP/Z001811/1.

Author information

Authors and Affiliations

Nanyang Technological University, Singapore, Singapore
Tianhao Wu & Tat-Jen Cham
S-Lab, Singapore, Singapore
Tianhao Wu
VGG, University of Oxford, Oxford, UK
Chuanxia Zheng
Monash University, Melbourne, Australia
Qianyi Wu

Corresponding author

Correspondence to Tianhao Wu.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Hessen, Germany
Stefan Roth
Princeton University, Palo Alto, CA, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 6125 KB)

About this paper

Cite this paper

Wu, T., Zheng, C., Wu, Q., Cham, TJ. (2025). ClusteringSDF: Self-Organized Neural Implicit Surfaces for 3D Decomposition. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15115. Springer, Cham. https://doi.org/10.1007/978-3-031-72998-0_15

DOI: https://doi.org/10.1007/978-3-031-72998-0_15
Published: 30 September 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72997-3
Online ISBN: 978-3-031-72998-0
eBook Packages: Computer Science, Computer Science (R0)
