
RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting

  • Conference paper

Abstract

The advancement of diffusion models has pushed the boundary of text-to-3D object generation. While it is straightforward to composite objects into a scene with reasonable geometry, it is nontrivial to texture such a scene perfectly due to style inconsistency and occlusions between objects. To tackle these problems, we propose a coarse-to-fine 3D scene texturing framework, referred to as RoomTex, to generate high-fidelity and style-consistent textures for untextured compositional scene meshes. In the coarse stage, RoomTex first unwraps the scene mesh into a panoramic depth map and leverages ControlNet to generate a room panorama, which serves as the coarse reference that ensures global texture consistency. In the fine stage, based on the panoramic image and perspective depth maps, RoomTex iteratively refines and textures each object in the room along a series of selected camera views until that object is completely painted. Moreover, we propose to maintain precise alignment between the RGB and depth spaces via subtle edge detection methods. Extensive experiments show that our method generates high-quality and diverse room textures and, more importantly, supports interactive fine-grained texture control and flexible scene editing thanks to our inpainting-based framework and compositional mesh input. Our project page is available at https://qwang666.github.io/RoomTex/.
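The abstract describes a two-stage pipeline: a panoramic coarse pass conditioned on scene depth, followed by per-object iterative inpainting from selected viewpoints. The Python sketch below illustrates only that control flow under stated assumptions; all helper callables (render_panoramic_depth, generate_panorama, render_view, inpaint_view, project_to_texture, and so on) are hypothetical placeholders supplied by the caller, not the authors' implementation.

```python
# Minimal control-flow sketch of a coarse-to-fine scene texturing loop,
# loosely following the pipeline described in the abstract. Every helper
# callable here is a hypothetical placeholder; this is NOT the authors' code.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TexturingPipeline:
    # Coarse stage: scene depth panorama -> depth-conditioned room panorama.
    render_panoramic_depth: Callable   # scene_mesh -> panoramic depth map
    generate_panorama: Callable        # (depth_panorama, prompt) -> RGB panorama
    # Fine stage: per-view rendering, depth-conditioned inpainting, back-projection.
    render_view: Callable              # (scene_mesh, camera) -> (rgb, depth, unpainted_mask)
    inpaint_view: Callable             # (rgb, depth, mask, prompt, reference) -> painted rgb
    project_to_texture: Callable       # (object_mesh, camera, rgb) -> writes into UV texture
    is_fully_painted: Callable         # object_mesh -> bool

    def run(self, scene_mesh, objects: List, cameras_for: Callable, prompt: str):
        # Coarse stage: one panorama acts as the global style/color reference.
        depth_pano = self.render_panoramic_depth(scene_mesh)
        reference_pano = self.generate_panorama(depth_pano, prompt)

        # Fine stage: texture each object from a series of selected views,
        # stopping once its texture map is completely covered.
        for obj in objects:
            for camera in cameras_for(obj):
                rgb, depth, mask = self.render_view(scene_mesh, camera)
                painted = self.inpaint_view(rgb, depth, mask, prompt, reference_pano)
                self.project_to_texture(obj, camera, painted)
                if self.is_fully_painted(obj):
                    break
        return objects
```

The point the sketch tries to mirror is that every per-view inpainting call is conditioned on both the view's depth map and the shared panorama, which is how the described approach keeps styles consistent across objects while handling occlusions view by view.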

Q. Wang and R. Lu: Equal contribution; work done during an internship at Shanghai AI Laboratory.



Acknowledgements

This research is supported in part by the Early Career Scheme of the Research Grants Council (RGC) of the Hong Kong SAR under grant No. 26202321, SAIL Research Project, HKUST-Zeekr Collaborative Research Fund, HKUST-WeBank Joint Lab Project, Tencent Rhino-Bird Focused Research Program, Sichuan Science and Technology Program (2023YFSY0008), China Tower-Peking University Joint Laboratory of Intelligent Society and Space Governance, National Natural Science Foundation of China (61632003, 61375022, 61403005), Grant SCITLAB-20017 of Intelligent Terminal Key Laboratory of SiChuan Province, Beijing Advanced Innovation Center for Intelligent Robots and Systems (2018IRS11), and PEK-SenseTime Joint Laboratory of Machine Vision. This research is also supported by Shanghai Artificial Intelligence Laboratory.

Author information

Authors and Affiliations

  1. The Hong Kong University of Science and Technology, Hong Kong, Hong Kong SAR, China

    Qi Wang, Michael Yu Wang & Dan Xu

  2. National Key Laboratory of General AI, School of IST, Peking University, Beijing, China

    Ruijie Lu & Gang Zeng

  3. Shanghai AI Laboratory, Shanghai, China

    Qi Wang, Ruijie Lu, Xudong Xu, Jingbo Wang & Bo Dai

Authors
  1. Qi Wang
  2. Ruijie Lu
  3. Xudong Xu
  4. Jingbo Wang
  5. Michael Yu Wang
  6. Bo Dai
  7. Gang Zeng
  8. Dan Xu

Corresponding author

Correspondence to Xudong Xu.

Editor information

Editors and Affiliations

  1. University of Birmingham, Birmingham, UK

    Aleš Leonardis

  2. University of Trento, Trento, Italy

    Elisa Ricci

  3. Technical University of Darmstadt, Darmstadt, Germany

    Stefan Roth

  4. Princeton University, Princeton, NJ, USA

    Olga Russakovsky

  5. Czech Technical University in Prague, Prague, Czech Republic

    Torsten Sattler

  6. École des Ponts ParisTech, Marne-la-Vallée, France

    Gül Varol

Electronic supplementary material

Below is the link to the electronic supplementary material.

Rights and permissions

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Wang, Q. et al. (2025). RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15126. Springer, Cham. https://doi.org/10.1007/978-3-031-73113-6_27
