{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T06:43:07Z","timestamp":1780555387741,"version":"3.54.1"},"reference-count":83,"publisher":"Association for Computing Machinery (ACM)","issue":"6","funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2022YFB3303101"],"award-info":[{"award-number":["2022YFB3303101"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:p>Generating artistic and coherent 3D scene layouts is crucial in digital content creation. Traditional optimization-based methods are often constrained by cumbersome manual rules, while deep generative models face challenges in producing content with richness and diversity. Furthermore, approaches that utilize large language models frequently lack robustness and fail to accurately capture complex spatial relationships. To address these challenges, this paper presents a novel vision-guided 3D layout generation system. We first construct a high-quality asset library containing 2,037 scene assets and 147 3D scene layouts. Subsequently, we employ an image generation model to expand prompt representations into images, fine-tuning it to align with our asset library. We then develop a robust image parsing module to recover the 3D layout of scenes based on visual semantics and geometric information. Finally, we optimize the scene layout using scene graphs and overall visual semantics to ensure logical coherence and alignment with the images. Extensive user testing demonstrates that our algorithm significantly outperforms existing methods in terms of layout richness and quality. The code and dataset will be available at https:\/\/github.com\/HiHiAllen\/Imaginarium.<\/jats:p>","DOI":"10.1145\/3763353","type":"journal-article","created":{"date-parts":[[2025,12,4]],"date-time":"2025-12-04T17:15:39Z","timestamp":1764868539000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Imaginarium: Vision-guided High-Quality 3D Scene Layout Generation"],"prefix":"10.1145","volume":"44","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-3695-7288","authenticated-orcid":false,"given":"Xiaoming","family":"Zhu","sequence":"first","affiliation":[{"name":"Tsinghua University, ShenZhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-2655-2251","authenticated-orcid":false,"given":"Xu","family":"Huang","sequence":"additional","affiliation":[{"name":"Tencent, ShenZhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-9590-638X","authenticated-orcid":false,"given":"Qinghongbing","family":"Xie","sequence":"additional","affiliation":[{"name":"Tsinghua University, ShenZhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4582-4874","authenticated-orcid":false,"given":"Zhi","family":"Deng","sequence":"additional","affiliation":[{"name":"Tencent, ShenZhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-8719-1289","authenticated-orcid":false,"given":"Junsheng","family":"Yu","sequence":"additional","affiliation":[{"name":"Southeast University, NanJing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-0603-748X","authenticated-orcid":false,"given":"Yirui","family":"Guan","sequence":"additional","affiliation":[{"name":"Tencent, ShenZhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1601-0038","authenticated-orcid":false,"given":"Zhongyuan","family":"Liu","sequence":"additional","affiliation":[{"name":"Tencent, ShenZhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-9044-9922","authenticated-orcid":false,"given":"Lin","family":"Zhu","sequence":"additional","affiliation":[{"name":"Tencent, ShenZhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-4797-9155","authenticated-orcid":false,"given":"Qijun","family":"Zhao","sequence":"additional","affiliation":[{"name":"Tencent, ShenZhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4352-1431","authenticated-orcid":false,"given":"Ligang","family":"Liu","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3090-6319","authenticated-orcid":false,"given":"Long","family":"Zeng","sequence":"additional","affiliation":[{"name":"Tsinghua University, ShenZhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,12,4]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al.","author":"Achiam Josh","year":"2023","unstructured":"Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774 (2023)."},{"key":"e_1_2_1_2_1","volume-title":"Stewart Morris, Seung Jean Yoo, Aditya Ganeshan, R. Kenny Jones, Qiuhong Anna Wei, Kailiang Fu, and Daniel Ritchie.","author":"Aguina-Kang Rio","year":"2024","unstructured":"Rio Aguina-Kang, Maxim Gumin, Do Heon Han, Stewart Morris, Seung Jean Yoo, Aditya Ganeshan, R. Kenny Jones, Qiuhong Anna Wei, Kailiang Fu, and Daniel Ritchie. 2024. Open-Universe Indoor Scene Generation using LLM Program Synthesis and Uncurated Object Databases. arXiv:2403.09675 [cs.CV] https:\/\/arxiv.org\/abs\/2403.09675"},{"key":"e_1_2_1_3_1","unstructured":"Tom B. Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell Sandhini Agarwal Ariel Herbert-Voss Gretchen Krueger Tom Henighan Rewon Child Aditya Ramesh Daniel M. Ziegler Jeffrey Wu Clemens Winter Christopher Hesse Mark Chen Eric Sigler Mateusz Litwin Scott Gray Benjamin Chess Jack Clark Christopher Berner Sam McCandlish Alec Radford Ilya Sutskever and Dario Amodei. 2020. Language Models are Few-Shot Learners. arXiv:2005.14165 [cs.CL] https:\/\/arxiv.org\/abs\/2005.14165"},{"key":"e_1_2_1_4_1","volume-title":"Iro Armeni, Anton Obukhov, and Xi Wang.","author":"\u00c7elen Ata","year":"2024","unstructured":"Ata \u00c7elen, Guo Han, Konrad Schindler, Luc Van Gool, Iro Armeni, Anton Obukhov, and Xi Wang. 2024. I-design: Personalized llm interior designer. arXiv preprint arXiv:2404.02838 (2024)."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1217"},{"key":"e_1_2_1_6_1","volume-title":"SceneSeer: 3D scene design with natural language. arXiv preprint arXiv:1703.00050","author":"Chang Angel X","year":"2017","unstructured":"Angel X Chang, Mihail Eric, Manolis Savva, and Christopher D Manning. 2017. SceneSeer: 3D scene design with natural language. arXiv preprint arXiv:1703.00050 (2017)."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.52202\/079017-0738"},{"key":"e_1_2_1_8_1","volume-title":"Samir Yitzhak Gadre, et al","author":"Deitke Matt","year":"2024","unstructured":"Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, et al. 2024. Objaverse-xl: A universe of 10m+ 3d objects. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.00839"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2022.3170853"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01604"},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Bertram Drost Markus Ulrich Nassir Navab and Slobodan Ilic. 2010. Model globally match locally: Efficient and robust 3D object recognition. In 2010 IEEE computer society conference on computer vision and pattern recognition. Ieee 998\u20131005.","DOI":"10.1109\/CVPR.2010.5540108"},{"key":"e_1_2_1_13_1","volume-title":"Xin Eric Wang, and William Yang Wang","author":"Feng Weixi","year":"2024","unstructured":"Weixi Feng, Wanrong Zhu, Tsu-jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, and William Yang Wang. 2024. Layoutgpt: Compositional visual planning and generation with large language models. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/358669.358692"},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Matthew Fisher and Pat Hanrahan. 2010. Context-based search for 3d models. In ACM SIGGRAPH Asia 2010 papers. 1\u201310.","DOI":"10.1145\/1882262.1866204"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2366145.2366154"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2816795.2818057"},{"key":"e_1_2_1_18_1","unstructured":"Huan Fu Rongfei Jia Lin Gao Mingming Gong Binqiang Zhao Steve Maybank and Dacheng Tao. 2020. 3D-FUTURE: 3D Furniture shape with TextURE. arXiv:2009.09633 [cs.CV] https:\/\/arxiv.org\/abs\/2009.09633"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3130800.3130805"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3658236"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00399"},{"key":"e_1_2_1_22_1","volume-title":"A generative model of 3d object layouts in apartments. arXiv preprint arXiv:1711.10939","author":"Henderson Paul","year":"2017","unstructured":"Paul Henderson and Vittorio Ferrari. 2017. A generative model of 3d object layouts in apartments. arXiv preprint arXiv:1711.10939 (2017)."},{"key":"e_1_2_1_23_1","unstructured":"Jonathan Ho Ajay Jain and Pieter Abbeel. 2020. Denoising Diffusion Probabilistic Models. arXiv:2006.11239 [cs.LG] https:\/\/arxiv.org\/abs\/2006.11239"},{"key":"e_1_2_1_24_1","volume-title":"Forty-first International Conference on Machine Learning.","author":"Hu Ziniu","year":"2024","unstructured":"Ziniu Hu, Ahmet Iscen, Aashi Jain, Thomas Kipf, Yisong Yue, David A Ross, Cordelia Schmid, and Alireza Fathi. 2024. Scenecraft: An llm agent for synthesizing 3d scenes as blender code. In Forty-first International Conference on Machine Learning."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.01257"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-018-1103-5"},{"key":"e_1_2_1_27_1","volume-title":"Learning object arrangements in 3d scenes using human context. arXiv preprint arXiv:1206.6462","author":"Jiang Yun","year":"2012","unstructured":"Yun Jiang, Marcus Lim, and Ashutosh Saxena. 2012. Learning object arrangements in 3d scenes using human context. arXiv preprint arXiv:1206.6462 (2012)."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00371"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings, Part III 16","author":"Kuo Weicheng","year":"2020","unstructured":"Weicheng Kuo, Anelia Angelova, Tsung-Yi Lin, and Angela Dai. 2020. Mask2cad: 3d shape prediction by learning to segment and retrieve. In Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part III 16. Springer, 260\u2013277."},{"key":"e_1_2_1_30_1","volume-title":"Megapose: 6d pose estimation of novel objects via render & compare. arXiv preprint arXiv:2212.06870","author":"Labb\u00e9 Yann","year":"2022","unstructured":"Yann Labb\u00e9, Lucas Manuelli, Arsalan Mousavian, Stephen Tyree, Stan Birchfield, Jonathan Tremblay, Justin Carpentier, Mathieu Aubry, Dieter Fox, and Josef Sivic. 2022. Megapose: 6d pose estimation of novel objects via render & compare. arXiv preprint arXiv:2212.06870 (2022)."},{"key":"e_1_2_1_31_1","unstructured":"Black Forest Labs. 2024. FLUX. https:\/\/github.com\/black-forest-labs\/flux."},{"key":"e_1_2_1_32_1","volume-title":"Sparc: Sparse render-and-compare for cad model alignment in a single rgb image. arXiv preprint arXiv:2210.01044","author":"Langer Florian","year":"2022","unstructured":"Florian Langer, Gwangbin Bae, Ignas Budvytis, and Roberto Cipolla. 2022. Sparc: Sparse render-and-compare for cad model alignment in a single rgb image. arXiv preprint arXiv:2210.01044 (2022)."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-019-01250-9"},{"key":"e_1_2_1_34_1","volume-title":"Instructscene: Instruction-driven 3d indoor scene synthesis with semantic graph prior. arXiv preprint arXiv:2402.04717","author":"Lin Chenguo","year":"2024","unstructured":"Chenguo Lin and Yadong Mu. 2024. Instructscene: Instruction-driven 3d indoor scene synthesis with semantic graph prior. arXiv preprint arXiv:2402.04717 (2024)."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00960"},{"key":"e_1_2_1_36_1","volume-title":"One-2-3-45: Any single image to 3d mesh in 45 seconds without per-shape optimization. Advances in Neural Information Processing Systems 36","author":"Liu Minghua","year":"2024","unstructured":"Minghua Liu, Chao Xu, Haian Jin, Linghao Chen, Mukund Varma T, Zexiang Xu, and Hao Su. 2024c. One-2-3-45: Any single image to 3d mesh in 45 seconds without per-shape optimization. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_2_1_37_1","volume-title":"Syncdreamer: Generating multiview-consistent images from a single-view image. In ICLR.","author":"Liu Yuan","year":"2024","unstructured":"Yuan Liu, Cheng Lin, Zijiao Zeng, Xiaoxiao Long, Lingjie Liu, Taku Komura, and Wenping Wang. 2024a. Syncdreamer: Generating multiview-consistent images from a single-view image. In ICLR."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2980179.2980223"},{"key":"e_1_2_1_39_1","volume-title":"Interactive furniture layout using interior design guidelines. ACM transactions on graphics (TOG) 30, 4","author":"Merrell Paul","year":"2011","unstructured":"Paul Merrell, Eric Schkufza, Zeyang Li, Maneesh Agrawala, and Vladlen Koltun. 2011. Interactive furniture layout using interior design guidelines. ACM transactions on graphics (TOG) 30, 4 (2011), 1\u201310."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00945"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00945"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00665"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00083"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-73347-5_10"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00288"},{"key":"e_1_2_1_46_1","volume-title":"ATISS: Autoregressive Transformers for Indoor Scene Synthesis. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021","author":"Paschalidou Despoina","year":"2021","unstructured":"Despoina Paschalidou, Amlan Kar, Maria Shugrina, Karsten Kreis, Andreas Geiger, and Sanja Fidler. 2021a. ATISS: Autoregressive Transformers for Indoor Scene Synthesis. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6\u201314, 2021, virtual, Marc'Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan (Eds.). 12013\u201312026. https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/64986d86a17424eeac96b08a6d519059-Abstract.html"},{"key":"e_1_2_1_47_1","first-page":"12013","article-title":"Atiss: Autoregressive transformers for indoor scene synthesis","volume":"34","author":"Paschalidou Despoina","year":"2021","unstructured":"Despoina Paschalidou, Amlan Kar, Maria Shugrina, Karsten Kreis, Andreas Geiger, and Sanja Fidler. 2021b. Atiss: Autoregressive transformers for indoor scene synthesis. Advances in Neural Information Processing Systems 34 (2021), 12013\u201312026.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","unstructured":"Pulak Purkait Christopher Zach and Ian Reid. 2020. SG-VAE: Scene Grammar Variational Autoencoder to Generate New Indoor Scenes. 155\u2013171. 10.1007\/978-3-030-58586-0_10","DOI":"10.1007\/978-3-030-58586-0_10"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00618"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02058"},{"key":"e_1_2_1_51_1","unstructured":"Tianhe Ren Qing Jiang Shilong Liu Zhaoyang Zeng Wenlong Liu Han Gao Hongjie Huang Zhengyu Ma Xiaoke Jiang Yihao Chen et al. 2024. Grounding DINO 1.5: Advance the\" Edge\" of Open-Set Object Detection. arXiv preprint arXiv:2405.10300 (2024)."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00634"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02155"},{"key":"e_1_2_1_54_1","volume-title":"Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022","author":"Saharia Chitwan","year":"2022","unstructured":"Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L. Denton, Seyed Kamyar Seyed Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, Jonathan Ho, David J. Fleet, and Mohammad Norouzi. 2022. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh (Eds.)."},{"key":"e_1_2_1_55_1","volume-title":"Scenesuggest: Context-driven 3d scene design. arXiv preprint arXiv:1703.00061","author":"Savva Manolis","year":"2017","unstructured":"Manolis Savva, Angel X Chang, and Maneesh Agrawala. 2017. Scenesuggest: Context-driven 3d scene design. arXiv preprint arXiv:1703.00061 (2017)."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/cvpr.2013.377"},{"key":"e_1_2_1_57_1","volume-title":"Proceedings of the 15th conference on Winter simulation, WSC 1983","author":"Christopher","year":"1983","unstructured":"Christopher C. Skiscim and Bruce L. Golden. 1983. Optimization by simulated annealing: A preliminary computational study for the TSP. In Proceedings of the 15th conference on Winter simulation, WSC 1983, Arlington, VA, USA, December 12\u201314, 1983, Stephen D. Roberts, Jerry Banks, and Bruce W. Schmeiser (Eds.). ACM, 523\u2013535. http:\/\/dl.acm.org\/citation.cfm?id=801546"},{"key":"e_1_2_1_58_1","volume-title":"LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models. arXiv preprint arXiv:2412.02193","author":"Sun Fan-Yun","year":"2024","unstructured":"Fan-Yun Sun, Weiyu Liu, Siyi Gu, Dylan Lim, Goutam Bhat, Federico Tombari, Manling Li, Nick Haber, and Jiajun Wu. 2024. LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models. arXiv preprint arXiv:2412.02193 (2024)."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01393"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01938"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2023.104816"},{"key":"e_1_2_1_62_1","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023)."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00402"},{"key":"e_1_2_1_64_1","volume-title":"Towards understanding chain-of-thought prompting: An empirical study of what matters. arXiv preprint arXiv:2212.10001","author":"Wang Boshi","year":"2022","unstructured":"Boshi Wang, Sewon Min, Xiang Deng, Jiaming Shen, You Wu, Luke Zettlemoyer, and Huan Sun. 2022. Towards understanding chain-of-thought prompting: An empirical study of what matters. arXiv preprint arXiv:2212.10001 (2022)."},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00275"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201362"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/3dv53792.2021.00021"},{"key":"e_1_2_1_68_1","first-page":"67575","article-title":"Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting","volume":"37","author":"Wang Yian","year":"2024","unstructured":"Yian Wang, Xiaowen Qiu, Jiageng Liu, Zhehuan Chen, Jiting Cai, Yufei Wang, Tsun-Hsuan Johnson Wang, Zhou Xian, and Chuang Gan. 2024a. Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting. Advances in Neural Information Processing Systems 37 (2024), 67575\u201367603.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_69_1","volume-title":"Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models. arXiv preprint arXiv:2412.18605","author":"Wang Zehan","year":"2024","unstructured":"Zehan Wang, Ziang Zhang, Tianyu Pang, Chao Du, Hengshuang Zhao, and Zhou Zhao. 2024b. Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models. arXiv preprint arXiv:2412.18605 (2024)."},{"key":"e_1_2_1_70_1","volume-title":"Tackling the generative learning trilemma with denoising diffusion gans. arXiv preprint arXiv:2112.07804","author":"Xiao Zhisheng","year":"2021","unstructured":"Zhisheng Xiao, Karsten Kreis, and Arash Vahdat. 2021. Tackling the generative learning trilemma with denoising diffusion gans. arXiv preprint arXiv:2112.07804 (2021)."},{"key":"e_1_2_1_71_1","first-page":"1","article-title":"Sketch2Scene: Sketch-based co-retrieval and co-placement of 3D models","volume":"32","author":"Xu Kun","year":"2013","unstructured":"Kun Xu, Kang Chen, Hongbo Fu, Wei-Lun Sun, and Shi-Min Hu. 2013. Sketch2Scene: Sketch-based co-retrieval and co-placement of 3D models. ACM Transactions on Graphics (TOG) 32, 4 (2013), 1\u201315.","journal-title":"ACM Transactions on Graphics (TOG)"},{"key":"e_1_2_1_72_1","volume-title":"Sketch123: Multi-spectral channel cross attention for sketch-based 3D generation via diffusion models. Computer-Aided Design","author":"Xu Zhentong","year":"2025","unstructured":"Zhentong Xu, Long Zeng, Junli Zhao, Baodong Wang, Zhenkuan Pan, and Yong-Jin Liu. 2025. Sketch123: Multi-spectral channel cross attention for sketch-based 3D generation via diffusion models. Computer-Aided Design (2025), 103896."},{"key":"e_1_2_1_73_1","volume-title":"3dsceneeditor","author":"Yan Ziyang","year":"2024","unstructured":"Ziyang Yan, Lei Li, Yihua Shao, Siyu Chen, Zongkai Wu, Jenq-Neng Hwang, Hao Zhao, and Fabio Remondino. 2024. 3dsceneeditor: Controllable 3d scene editing with gaussian splatting. arXiv preprint arXiv:2412.01583 (2024)."},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00558"},{"key":"e_1_2_1_75_1","volume-title":"Depth Anything V2. arXiv preprint arXiv:2406.09414","author":"Yang Lihe","year":"2024","unstructured":"Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, and Hengshuang Zhao. 2024a. Depth Anything V2. arXiv preprint arXiv:2406.09414 (2024)."},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01492"},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01492"},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01536"},{"key":"e_1_2_1_79_1","volume-title":"Cast: Component-aligned 3d scene reconstruction from an rgb image. arXiv preprint arXiv:2502.12894","author":"Yao Kaixin","year":"2025","unstructured":"Kaixin Yao, Longwen Zhang, Xinhao Yan, Yan Zeng, Qixuan Zhang, Lan Xu, Wei Yang, Jiayuan Gu, and Jingyi Yu. 2025. Cast: Component-aligned 3d scene reconstruction from an rgb image. arXiv preprint arXiv:2502.12894 (2025)."},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1145\/2185520.2185552"},{"key":"e_1_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1145\/1964921.1964981"},{"key":"e_1_2_1_82_1","volume-title":"European Conference on Computer Vision. Springer, 167\u2013184","author":"Zhai Guangyao","year":"2024","unstructured":"Guangyao Zhai, Evin P\u0131nar \u00d6rnek, Dave Zhenyu Chen, Ruotong Liao, Yan Di, Nassir Navab, Federico Tombari, and Benjamin Busam. 2024. Echoscene: Indoor scene generation via information echo over scene graph diffusion. In European Conference on Computer Vision. Springer, 167\u2013184."},{"key":"e_1_2_1_83_1","first-page":"30026","article-title":"Commonscenes: Generating commonsense 3d indoor scenes with scene graph diffusion","volume":"36","author":"Zhai Guangyao","year":"2023","unstructured":"Guangyao Zhai, Evin P\u0131nar \u00d6rnek, Shun-Cheng Wu, Yan Di, Federico Tombari, Nassir Navab, and Benjamin Busam. 2023. Commonscenes: Generating commonsense 3d indoor scenes with scene graph diffusion. Advances in Neural Information Processing Systems 36 (2023), 30026\u201330038.","journal-title":"Advances in Neural Information Processing Systems"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3763353","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T21:14:22Z","timestamp":1764969262000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3763353"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12]]},"references-count":83,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["10.1145\/3763353"],"URL":"https:\/\/doi.org\/10.1145\/3763353","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12]]},"assertion":[{"value":"2025-05-24","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-09","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-12-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}