- Tianhao Wu (ORCID: orcid.org/0000-0003-2195-9942; affiliations 13, 14)
- Chuanxia Zheng (ORCID: orcid.org/0000-0002-3584-9640; affiliation 15)
- Qianyi Wu (ORCID: orcid.org/0000-0001-8764-6178; affiliation 16)
- …
- Tat-Jen Cham (ORCID: orcid.org/0000-0001-5264-2572; affiliation 13)
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15115)
Abstract
3D decomposition/segmentation remains a challenge because large-scale 3D annotated data is not readily available. Existing approaches typically leverage 2D machine-generated segments, integrating them to achieve 3D consistency. In this paper, we propose ClusteringSDF, a novel approach that achieves both segmentation and reconstruction in 3D via a neural implicit surface representation, specifically the Signed Distance Function (SDF), where segmentation rendering is directly integrated with the volume rendering of neural implicit surfaces. Although based on ObjectSDF++, ClusteringSDF no longer requires ground-truth segments for supervision while maintaining the capability of reconstructing individual object surfaces, relying purely on the noisy and inconsistent labels from pre-trained models. As the core of ClusteringSDF, we introduce a highly efficient clustering mechanism for lifting 2D labels to 3D. Experimental results on challenging scenes from the ScanNet and Replica datasets show that ClusteringSDF achieves competitive performance compared to the state of the art with significantly reduced training time.
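To make the phrase "segmentation rendering integrated with the volume rendering of neural implicit surfaces" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a VolSDF-style conversion from SDF to density (a scaled Laplace CDF) and blends hypothetical per-sample label distributions with the usual volume-rendering weights. The function names, the `beta` value, and the toy ray are all illustrative assumptions; the clustering mechanism itself is not described in the abstract and is not sketched here.

```python
import math

def laplace_density(sdf, beta=0.1):
    """Assumed VolSDF-style density: (1/beta) * Laplace CDF evaluated at -sdf."""
    s = -sdf
    if s <= 0:
        cdf = 0.5 * math.exp(s / beta)
    else:
        cdf = 1.0 - 0.5 * math.exp(-s / beta)
    return cdf / beta

def render_weights(sdf_samples, deltas, beta=0.1):
    """Volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    weights, transmittance = [], 1.0
    for sdf, delta in zip(sdf_samples, deltas):
        alpha = 1.0 - math.exp(-laplace_density(sdf, beta) * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights

def render_labels(weights, sample_probs):
    """Blend per-sample label distributions into one per-pixel distribution."""
    n_labels = len(sample_probs[0])
    return [sum(w * p[k] for w, p in zip(weights, sample_probs))
            for k in range(n_labels)]

# Toy ray (hypothetical values): the SDF changes sign between samples 2 and 3,
# so the surface lies there and those samples should dominate the pixel label.
sdf_samples = [0.4, 0.1, -0.1, -0.4]
deltas = [0.3, 0.3, 0.3, 0.3]
sample_probs = [[0.5, 0.5], [0.9, 0.1], [0.9, 0.1], [0.2, 0.8]]

weights = render_weights(sdf_samples, deltas)
pixel_probs = render_labels(weights, sample_probs)
```

In this toy ray, the samples nearest the zero crossing receive most of the weight, so the rendered pixel label follows the surface's label rather than what lies behind it. A 2D label loss applied to `pixel_probs` would therefore mostly supervise the 3D points near the surface.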
Acknowledgements
This study is supported under the RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s). Chuanxia Zheng is supported by EPSRC SYN3D EP/Z001811/1.
Author information
Authors and Affiliations
Nanyang Technological University, Singapore, Singapore
Tianhao Wu & Tat-Jen Cham
S-Lab, Singapore, Singapore
Tianhao Wu
VGG, University of Oxford, Oxford, UK
Chuanxia Zheng
Monash University, Melbourne, Australia
Qianyi Wu
Corresponding author
Correspondence to Tianhao Wu.
Editor information
Editors and Affiliations
University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Hessen, Germany
Stefan Roth
Princeton University, Palo Alto, CA, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, T., Zheng, C., Wu, Q., Cham, TJ. (2025). ClusteringSDF: Self-Organized Neural Implicit Surfaces for 3D Decomposition. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15115. Springer, Cham. https://doi.org/10.1007/978-3-031-72998-0_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72997-3
Online ISBN: 978-3-031-72998-0
eBook Packages: Computer Science, Computer Science (R0)