- Qi Wang (ORCID: orcid.org/0000-0001-7774-978X),
- Ruijie Lu (ORCID: orcid.org/0009-0007-2786-2937),
- Xudong Xu (ORCID: orcid.org/0009-0003-8858-0918),
- Jingbo Wang (ORCID: orcid.org/0009-0005-0740-8548),
- Michael Yu Wang (ORCID: orcid.org/0000-0002-6524-5741),
- Bo Dai (ORCID: orcid.org/0000-0003-0777-9232),
- Gang Zeng (ORCID: orcid.org/0000-0002-9575-4651) &
- Dan Xu (ORCID: orcid.org/0000-0003-0136-9603)
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15126)
Included in the following conference series: European Conference on Computer Vision (ECCV)
Abstract
The advancement of diffusion models has pushed the boundary of text-to-3D object generation. While it is straightforward to composite objects into a scene with reasonable geometry, it is nontrivial to texture such a scene perfectly due to style inconsistency and occlusions between objects. To tackle these problems, we propose a coarse-to-fine 3D scene texturing framework, referred to as RoomTex, to generate high-fidelity and style-consistent textures for untextured compositional scene meshes. In the coarse stage, RoomTex first unwraps the scene mesh to a panoramic depth map and leverages ControlNet to generate a room panorama, which serves as the coarse reference that ensures global texture consistency. In the fine stage, based on the panoramic image and perspective depth maps, RoomTex iteratively refines and textures every single object in the room along a series of selected camera views, until the object is completely painted. Moreover, we propose to maintain superior alignment between the RGB and depth spaces via subtle edge detection methods. Extensive experiments show that our method is capable of generating high-quality and diverse room textures and, more importantly, supports interactive fine-grained texture control and flexible scene editing thanks to our inpainting-based framework and compositional mesh input. Our project page is available at https://qwang666.github.io/RoomTex/.
Q. Wang and R. Lu: Equal contribution; work done during an internship at Shanghai AI Laboratory.
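To make the coarse stage concrete, the sketch below shows how a depth-conditioned ControlNet can turn a depth render of an untextured scene into an RGB style reference, in the spirit of the pipeline described in the abstract. This is a minimal illustration, not the authors' released code: the model IDs, the prompt, and the `scene_depth.npy` input are assumptions, and RoomTex conditions on a panoramic depth map rather than the single perspective render used here.

```python
# Sketch of a depth-conditioned generation step (assumptions noted above).
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Load a depth ControlNet and attach it to a Stable Diffusion pipeline.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Depth rendered from the compositional scene mesh (assumed precomputed
# and saved as a 2D float array).
depth = np.load("scene_depth.npy").astype(np.float32)
depth_u8 = (
    255.0 * (depth - depth.min()) / (np.ptp(depth) + 1e-8)
).astype(np.uint8)
depth_image = Image.fromarray(depth_u8).convert("RGB")

# Generate the RGB reference conditioned on the depth render.
result = pipe(
    prompt="a cozy living room in Scandinavian style, photorealistic",
    image=depth_image,
    num_inference_steps=30,
).images[0]
result.save("coarse_reference.png")

# Rough consistency check in the spirit of the paper's edge-based
# RGB-depth alignment: compare Canny edges of the generated image
# against edges of the depth render.
rgb_edges = cv2.Canny(np.array(result.convert("L")), 100, 200)
depth_edges = cv2.Canny(depth_u8, 100, 200)
overlap = (rgb_edges > 0) & (depth_edges > 0)
print("edge overlap ratio:", overlap.sum() / max(1, (depth_edges > 0).sum()))
```

The final lines only gesture at the paper's edge-detection-based alignment; the method described in the abstract operates inside the iterative inpainting loop and is more involved than this single overlap score.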
Acknowledgements
This research is supported in part by the Early Career Scheme of the Research Grants Council (RGC) of the Hong Kong SAR under grant No. 26202321, SAIL Research Project, HKUST-Zeekr Collaborative Research Fund, HKUST-WeBank Joint Lab Project, Tencent Rhino-Bird Focused Research Program, Sichuan Science and Technology Program (2023YFSY0008), China Tower-Peking University Joint Laboratory of Intelligent Society and Space Governance, National Natural Science Foundation of China (61632003, 61375022, 61403005), Grant SCITLAB-20017 of Intelligent Terminal Key Laboratory of SiChuan Province, Beijing Advanced Innovation Center for Intelligent Robots and Systems (2018IRS11), and PEK-SenseTime Joint Laboratory of Machine Vision. This research is also supported by Shanghai Artificial Intelligence Laboratory.
Author information
Authors and Affiliations
The Hong Kong University of Science and Technology, Hong Kong, Hong Kong SAR, China
Qi Wang, Michael Yu Wang & Dan Xu
National Key Laboratory of General AI, School of IST, Peking University, Beijing, China
Ruijie Lu & Gang Zeng
Shanghai AI Laboratory, Shanghai, China
Qi Wang, Ruijie Lu, Xudong Xu, Jingbo Wang & Bo Dai
Corresponding author
Correspondence to Xudong Xu.
Editor information
Editors and Affiliations
University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Germany
Stefan Roth
Princeton University, Princeton, NJ, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol
Electronic supplementary material
Below is the link to the electronic supplementary material.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, Q. et al. (2025). RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15126. Springer, Cham. https://doi.org/10.1007/978-3-031-73113-6_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73112-9
Online ISBN: 978-3-031-73113-6
eBook Packages: Computer Science, Computer Science (R0)