{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,17]],"date-time":"2026-07-17T06:09:24Z","timestamp":1784268564078,"version":"3.55.0"},"publisher-location":"New York, NY, USA","reference-count":61,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,12,3]],"date-time":"2024-12-03T00:00:00Z","timestamp":1733184000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,12,3]]},"DOI":"10.1145\/3680528.3687645","type":"proceedings-article","created":{"date-parts":[[2024,12,3]],"date-time":"2024-12-03T08:14:37Z","timestamp":1733213677000},"page":"1-11","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["BlobGEN-3D: Compositional 3D-Consistent Freeview Image Generation with 3D Blobs"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-5751-8723","authenticated-orcid":false,"given":"Chao","family":"Liu","sequence":"first","affiliation":[{"name":"NVIDIA, Santa Clara, United States of America"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0030-3189","authenticated-orcid":false,"given":"Weili","family":"Nie","sequence":"additional","affiliation":[{"name":"NVIDIA, Santa Clara, United States of America"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6011-3686","authenticated-orcid":false,"given":"Sifei","family":"Liu","sequence":"additional","affiliation":[{"name":"NVIDIA, Santa Clara, United States of America"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5559-2336","authenticated-orcid":false,"given":"Abhishek","family":"Badki","sequence":"additional","affiliation":[{"name":"NVIDIA, Santa Clara, United States of America"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8770-8754","authenticated-orcid":false,"given":"Hang","family":"Su","sequence":"additional","affiliation":[{"name":"NVIDIA, Santa Clara, United States of America"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-1763-6832","authenticated-orcid":false,"given":"Morteza","family":"Mardani","sequence":"additional","affiliation":[{"name":"NVIDIA, Santa Clara, United States of America"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-1959-9750","authenticated-orcid":false,"given":"Benjamin","family":"Eckart","sequence":"additional","affiliation":[{"name":"NVIDIA, Santa Clara, United States of America"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-9476-1306","authenticated-orcid":false,"given":"Arash","family":"Vahdat","sequence":"additional","affiliation":[{"name":"NVIDIA, Santa Clara, United States of America"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,12,3]]},"reference":[{"key":"e_1_3_3_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00385"},{"key":"e_1_3_3_2_3_1","volume-title":"IEEE International Conference on Computer Vision (ICCV)","author":"Chan Eric\u00a0R.","year":"2023","unstructured":"Eric\u00a0R. Chan, Koki Nagano, Matthew\u00a0A. Chan, Alexander\u00a0W. Bergman, Jeong\u00a0Joon Park, Axel Levy, Miika Aittala, Shalini\u00a0De Mello, Tero Karras, and Gordon Wetzstein. 2023. GeNVS: Generative Novel View Synthesis with 3D-Aware Diffusion Models. In IEEE International Conference on Computer Vision (ICCV)."},{"key":"e_1_3_3_2_4_1","volume-title":"International Conference on Learning Representations (ICLR)","author":"Chang Pascal","year":"2024","unstructured":"Pascal Chang, Jingwei Tang, Markus Gross, and Vinicius\u00a0C. Azevedo. 2024. How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_3_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01992"},{"key":"e_1_3_3_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01701"},{"key":"e_1_3_3_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.02033"},{"key":"e_1_3_3_2_8_1","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Chung Jaeyoung","year":"2024","unstructured":"Jaeyoung Chung, Suyoung Lee, Hyeongjin Nam, Jaerin Lee, and Kyoung\u00a0Mu Lee. 2024. LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_3_3_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW60793.2023.00314"},{"key":"e_1_3_3_2_10_1","unstructured":"Chuan Fang Xiaotao Hu Kunming Luo and Ping Tan. 2023. Ctrl-Room: Controllable Text-to-3D Room Meshes Generation with Layout Constraints. ArXiv Preprint (2023)."},{"key":"e_1_3_3_2_11_1","volume-title":"Advances in Neural Information Processing Systems (NeurIPS)","author":"Fridman Rafail","year":"2023","unstructured":"Rafail Fridman, Amit Abecasis, Yoni Kasten, and Tali Dekel. 2023. SceneScape: Text-Driven Consistent Scene Generation. In Advances in Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_3_3_2_12_1","unstructured":"Ian Goodfellow Jean Pouget-Abadie Mehdi Mirza Bing Xu David Warde-Farley Sherjil Ozair Aaron Courville and Yoshua Bengio. 2014. Generative adversarial nets. Advances in neural information processing systems 27 (2014)."},{"key":"e_1_3_3_2_13_1","doi-asserted-by":"crossref","unstructured":"Jack Hessel Ari Holtzman Maxwell Forbes Ronan\u00a0Le Bras and Yejin Choi. 2021. CLIPScore: A Reference-free Evaluation Metric for Image Captioning. ArXiv Preprint (2021).","DOI":"10.18653\/v1\/2021.emnlp-main.595"},{"key":"e_1_3_3_2_14_1","volume-title":"Advances in Neural Information Processing Systems (NeurIPS)","author":"Heusel Martin","year":"2017","unstructured":"Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. 2017. GANs trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in Neural Information Processing Systems (NeurIPS). https:\/\/dl.acm.org\/doi\/10.5555\/3295222.3295408"},{"key":"e_1_3_3_2_15_1","unstructured":"Jonathan Ho Ajay Jain and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in neural information processing systems 33 (2020) 6840\u20136851."},{"key":"e_1_3_3_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00727"},{"key":"e_1_3_3_2_17_1","unstructured":"Tero Karras Miika Aittala Timo Aila and Samuli Laine. 2022. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems 35 (2022) 26565\u201326577."},{"key":"e_1_3_3_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00907"},{"key":"e_1_3_3_2_19_1","doi-asserted-by":"crossref","unstructured":"Bernhard Kerbl Georgios Kopanas Thomas Leimk\u00fchler and George Drettakis. 2023. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Transactions on Graphics (ToG) 42 4 (July 2023). https:\/\/repo-sam.inria.fr\/fungraph\/3d-gaussian-splatting\/","DOI":"10.1145\/3592433"},{"key":"e_1_3_3_2_20_1","unstructured":"Eric Kolve Roozbeh Mottaghi Winson Han Eli VanderBilt Luca Weihs Alvaro Herrasti Daniel Gordon Yuke Zhu Abhinav Gupta and Ali Farhadi. 2017. AI2-THOR: An Interactive 3D Environment for Visual AI. ArXiv Preprint (2017)."},{"key":"e_1_3_3_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00814"},{"key":"e_1_3_3_2_22_1","unstructured":"Haoran Li Haolin Shi Wenli Zhang Wenjun Wu Yong Liao Lin Wang Lik hang Lee and Pengyuan Zhou. 2024. DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2404.03575 (2024)."},{"key":"e_1_3_3_2_23_1","volume-title":"International Conference on Machine Learning (ICML)","author":"Li Junnan","year":"2023","unstructured":"Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. 2023a. BLIP-2: bootstrapping language-image pre-training with frozen image encoders and large language models. In International Conference on Machine Learning (ICML)."},{"key":"e_1_3_3_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02156"},{"key":"e_1_3_3_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00037"},{"key":"e_1_3_3_2_26_1","volume-title":"Advances in Neural Information Processing Systems (NeurIPS)","author":"Liu Haotian","year":"2023","unstructured":"Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong\u00a0Jae Lee. 2023a. Visual Instruction Tuning. In Advances in Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_3_3_2_27_1","unstructured":"Minghua Liu Ruoxi Shi Linghao Chen Zhuoyang Zhang Chao Xu Xinyue Wei Hansheng Chen Chong Zeng Jiayuan Gu and Hao Su. 2023b. One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2311.07885 (2023)."},{"key":"e_1_3_3_2_28_1","doi-asserted-by":"crossref","unstructured":"Jonathan Lorraine Kevin Xie Xiaohui Zeng Chen-Hsuan Lin Towaki Takikawa Nicholas Sharp Tsung-Yi Lin Ming-Yu Liu Sanja Fidler and James Lucas. 2023. ATT3D: Amortized Text-to-3D Object Synthesis. (2023).","DOI":"10.1109\/ICCV51070.2023.01645"},{"key":"e_1_3_3_2_29_1","unstructured":"Fan Lu Kwan-Yee Lin Yan Xu Hongsheng Li Guang Chen and Changjun Jiang. 2024. Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2404.06780 (2024)."},{"key":"e_1_3_3_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01218"},{"key":"e_1_3_3_2_31_1","doi-asserted-by":"crossref","unstructured":"Thomas M\u00fcller Alex Evans Christoph Schied and Alexander Keller. 2022. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (ToG) 41 4 (2022) 1\u201315.","DOI":"10.1145\/3528223.3530127"},{"key":"e_1_3_3_2_32_1","unstructured":"Weili Nie Sifei Liu Morteza Mardani Chao Liu Benjamin Eckart and Arash Vahdat. 2024. Compositional Text-to-Image Generation with Dense Blob Representations. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2405.08246 (2024)."},{"key":"e_1_3_3_2_33_1","unstructured":"Ryan Po and Gordon Wetzstein. 2024. Compositional 3D Scene Generation using Locally Conditioned Diffusion. (2024)."},{"key":"e_1_3_3_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.02070"},{"key":"e_1_3_3_2_35_1","volume-title":"International Conference on Learning Representations (ICLR)","author":"Poole Ben","year":"2023","unstructured":"Ben Poole, Ajay Jain, Jonathan\u00a0T. Barron, and Ben Mildenhall. 2023. DreamFusion: Text-to-3D using 2D Diffusion. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_3_2_36_1","unstructured":"Guocheng Qian Junli Cao Aliaksandr Siarohin Yash Kant Chaoyang Wang Michael Vasilkovsky Hsin-Ying Lee Yuwei Fang Ivan Skorokhodov Peiye Zhuang Igor Gilitschenski Jian Ren Bernard Ghanem Kfir Aberman and Sergey Tulyakov. 2024. AToM: Amortized Text-to-Mesh using 2D Diffusion. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2402.00867 (2024)."},{"key":"e_1_3_3_2_37_1","first-page":"8748","volume-title":"International conference on machine learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong\u00a0Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et\u00a0al. 2021. Learning transferable visual models from natural language supervision. In International conference on machine learning. PMLR, 8748\u20138763."},{"key":"e_1_3_3_2_38_1","unstructured":"Aditya Ramesh Prafulla Dhariwal Alex Nichol Casey Chu and Mark Chen. 2022. Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2204.06125 (2022)."},{"key":"e_1_3_3_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588432.3591503"},{"key":"e_1_3_3_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_3_2_41_1","volume-title":"Advances in Neural Information Processing Systems (NeurIPS)","author":"Saharia Chitwan","year":"2022","unstructured":"Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar\u00a0Seyed Ghasemipour, Raphael Gontijo-Lopes, Burcu\u00a0Karagol Ayan, Tim Salimans, Jonathan Ho, David\u00a0J. Fleet, and Mohammad Norouzi. 2022. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. In Advances in Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_3_3_2_42_1","unstructured":"Tim Salimans Ian Goodfellow Wojciech Zaremba Vicki Cheung Alec Radford and Xi Chen. 2016. Improved techniques for training gans. Advances in Neural Information Processing Systems (NeurIPS) (2016)."},{"key":"e_1_3_3_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00593"},{"key":"e_1_3_3_2_44_1","unstructured":"Yichun Shi Peng Wang Jianglong Ye Mai Long Kejie Li and Xiao Yang. 2023. Mvdream: Multi-view diffusion for 3d generation. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2308.16512 (2023)."},{"key":"e_1_3_3_2_45_1","unstructured":"Jaidev Shriram Alex Trevithick Lingjie Liu and Ravi Ramamoorthi. 2024. RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2404.07199 (2024)."},{"key":"e_1_3_3_2_46_1","first-page":"2256","volume-title":"International conference on machine learning","author":"Sohl-Dickstein Jascha","year":"2015","unstructured":"Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. 2015. Deep unsupervised learning using nonequilibrium thermodynamics. In International conference on machine learning. PMLR, 2256\u20132265."},{"key":"e_1_3_3_2_47_1","unstructured":"Yang Song Jascha Sohl-Dickstein Diederik\u00a0P Kingma Abhishek Kumar Stefano Ermon and Ben Poole. 2020. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2011.13456 (2020)."},{"key":"e_1_3_3_2_48_1","doi-asserted-by":"crossref","unstructured":"Jiaxiang Tang Zhaoxi Chen Xiaokang Chen Tengfei Wang Gang Zeng and Ziwei Liu. 2024. LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2402.05054 (2024).","DOI":"10.1007\/978-3-031-73235-5_1"},{"key":"e_1_3_3_2_49_1","volume-title":"Advances in Neural Information Processing Systems (NeurIPS)","author":"Tang Shitao","year":"2023","unstructured":"Shitao Tang, Fuayng Zhang, Jiacheng Chen, Peng Wang, and Furukawa Yasutaka. 2023. MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion. In Advances in Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_3_3_2_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01609"},{"key":"e_1_3_3_2_51_1","volume-title":"Advances in Neural Information Processing Systems (NeurIPS)","author":"Wang Zhengyi","year":"2023","unstructured":"Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu. 2023. ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation. In Advances in Neural Information Processing Systems (NeurIPS)."},{"key":"e_1_3_3_2_52_1","unstructured":"Qiuhong\u00a0Anna Wei Sijie Ding Jeong\u00a0Joon Park Rahul Sajnani Adrien Poulenard Srinath Sridhar and Leonidas Guibas. 2023. LEGO-Net: Learning Regular Rearrangements of Objects in Rooms. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2301.09629 (2023)."},{"key":"e_1_3_3_2_53_1","doi-asserted-by":"crossref","unstructured":"Kevin Xie Jonathan Lorraine Tianshi Cao Jun Gao James Lucas Antonio Torralba Sanja Fidler and Xiaohui Zeng. 2024. LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2403.15385 (2024).","DOI":"10.1007\/978-3-031-72980-5_18"},{"key":"e_1_3_3_2_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00289"},{"key":"e_1_3_3_2_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01369"},{"key":"e_1_3_3_2_56_1","unstructured":"Yu-Ying Yeh Jia-Bin Huang Changil Kim Lei Xiao Thu Nguyen-Phuoc Numair Khan Cheng Zhang Manmohan Chandraker Carl\u00a0S Marshall Zhao Dong and Zhengqin Li. 2024. TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2401.09416 (2024)."},{"key":"e_1_3_3_2_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00008"},{"key":"e_1_3_3_2_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.29"},{"key":"e_1_3_3_2_59_1","doi-asserted-by":"crossref","unstructured":"Chong Zeng Yue Dong Pieter Peers Youkang Kong Hongzhi Wu and Xin Tong. 2024. DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2402.11929 (2024).","DOI":"10.1145\/3641519.3657396"},{"key":"e_1_3_3_2_60_1","doi-asserted-by":"crossref","unstructured":"Jingbo Zhang Xiaoyu Li Ziyu Wan Can Wang and Jing Liao. 2023a. Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields. IEEE Transactions on Visualization and Computer Graphics (TVCG) (2023).","DOI":"10.1109\/TVCG.2024.3361502"},{"key":"e_1_3_3_2_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00355"},{"key":"e_1_3_3_2_62_1","volume-title":"International Conference on Machine Learning (ICML)","author":"Zhou Xiaoyu","year":"2024","unstructured":"Xiaoyu Zhou, Xingjian Ran, Yajiao Xiong, Jinlin He, Zhiwei Lin, Yongtao Wang, Deqing Sun, and Ming-Hsuan Yang. 2024. GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting. In International Conference on Machine Learning (ICML)."}],"event":{"name":"SA '24: SIGGRAPH Asia 2024 Conference Papers","location":"Tokyo Japan","acronym":"SA '24","sponsor":["SIGGRAPH ACM Special Interest Group on Computer Graphics and Interactive Techniques"]},"container-title":["SIGGRAPH Asia 2024 Conference Papers"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3680528.3687645","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3680528.3687645","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:58:27Z","timestamp":1750294707000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3680528.3687645"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,3]]},"references-count":61,"alternative-id":["10.1145\/3680528.3687645","10.1145\/3680528"],"URL":"https:\/\/doi.org\/10.1145\/3680528.3687645","relation":{},"subject":[],"published":{"date-parts":[[2024,12,3]]},"assertion":[{"value":"2024-12-03","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}