{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T22:05:18Z","timestamp":1769637918477,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":49,"publisher":"ACM","funder":[{"name":"Institute of Information & communications Technology Planning & Evaluation (IITP)","award":["2019-0-01906,RS-2024-00457882,2021-0-02068"],"award-info":[{"award-number":["2019-0-01906,RS-2024-00457882,2021-0-02068"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,12,15]]},"DOI":"10.1145\/3757377.3763871","type":"proceedings-article","created":{"date-parts":[[2025,12,8]],"date-time":"2025-12-08T16:30:41Z","timestamp":1765211441000},"page":"1-11","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["VideoFrom3D: 3D Scene Video Generation via Complementary Image and Video Diffusion Models"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0806-6963","authenticated-orcid":false,"given":"Geonung","family":"Kim","sequence":"first","affiliation":[{"name":"POSTECH, Pohang, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-2287-6263","authenticated-orcid":false,"given":"Janghyeok","family":"Han","sequence":"additional","affiliation":[{"name":"POSTECH, Pohang, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7627-3513","authenticated-orcid":false,"given":"Sunghyun","family":"Cho","sequence":"additional","affiliation":[{"name":"POSTECH, Pohang, Republic of Korea"}]}],"member":"320","published-online":{"date-parts":[[2025,12,14]]},"reference":[{"key":"e_1_3_3_3_2_1","unstructured":"Hassan\u00a0Abu Alhaija Jose Alvarez Maciej Bala Tiffany Cai Tianshi Cao Liz Cha Joshua Chen Mike Chen Francesco Ferroni Sanja Fidler et\u00a0al. 2025. 
Cosmos-transfer1: Conditional world generation with adaptive multimodal control. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2503.14492 (2025)."},{"key":"e_1_3_3_3_3_1","unstructured":"Dane\u00a0Edward Bettis. 2005. Digital production pipelines: examining structures and methods in the computer effects industry. Ph.\u00a0D. Dissertation. Texas A&M University."},{"key":"e_1_3_3_3_4_1","doi-asserted-by":"crossref","unstructured":"Ryan Burgert Yuancheng Xu Wenqi Xian Oliver Pilarski Pascal Clausen Mingming He Li Ma Yitong Deng Lingxiao Li Mohsen Mousavi et\u00a0al. 2025. Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2501.08331 (2025).","DOI":"10.1109\/CVPR52734.2025.00011"},{"key":"e_1_3_3_3_5_1","unstructured":"Chenjie Cao Jingkai Zhou Shikai Li Jingyun Liang Chaohui Yu Fan Wang Xiangyang Xue and Yanwei Fu. 2025. Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2504.14899 (2025)."},{"key":"e_1_3_3_3_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01701"},{"key":"e_1_3_3_3_7_1","unstructured":"Hansheng Chen Ruoxi Shi Yulin Liu Bokui Shen Jiayuan Gu Gordon Wetzstein Hao Su and Leonidas Guibas. 2024a. Generic 3d diffusion adapter using controlled multi-view editing. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2403.12032 (2024)."},{"key":"e_1_3_3_3_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.02033"},{"key":"e_1_3_3_3_9_1","unstructured":"Yuedong Chen Chuanxia Zheng Haofei Xu Bohan Zhuang Andrea Vedaldi Tat-Jen Cham and Jianfei Cai. 2024b. Mvsplat360: Feed-forward 360 scene synthesis from sparse views. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2411.04924 (2024)."},{"key":"e_1_3_3_3_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3641519.3657425"},{"key":"e_1_3_3_3_11_1","volume-title":"3D Environment Design with Blender: Enhance your modeling, texturing, and lighting skills to create realistic 3D scenes","author":"Hamdani Abdelilah","year":"2023","unstructured":"Abdelilah Hamdani and Carlos Barreto. 2023. 3D Environment Design with Blender: Enhance your modeling, texturing, and lighting skills to create realistic 3D scenes. Packt Publishing Ltd."},{"key":"e_1_3_3_3_12_1","unstructured":"Edward\u00a0J Hu Yelong Shen Phillip Wallis Zeyuan Allen-Zhu Yuanzhi Li Shean Wang Lu Wang Weizhu Chen et\u00a0al. 2022. Lora: Low-rank adaptation of large language models. ICLR 1 2 (2022) 3."},{"key":"e_1_3_3_3_13_1","unstructured":"Ziqi Huang Fan Zhang Xiaojie Xu Yinan He Jiashuo Yu Ziyue Dong Qianli Ma Nattapol Chanpaisit Chenyang Si Yuming Jiang et\u00a0al. 2024. Vbench++: Comprehensive and versatile benchmark suite for video generative models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2411.13503 (2024)."},{"key":"e_1_3_3_3_14_1","unstructured":"Zeyinzi Jiang Zhen Han Chaojie Mao Jingfeng Zhang Yulin Pan and Yu Liu. 2025. Vace: All-in-one video creation and editing. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2503.07598 (2025)."},{"key":"e_1_3_3_3_15_1","unstructured":"Haian Jin Hanwen Jiang Hao Tan Kai Zhang Sai Bi Tianyuan Zhang Fujun Luan Noah Snavely and Zexiang Xu. 2024. Lvsm: A large view synthesis model with minimal 3d inductive bias. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2410.17242 (2024)."},{"key":"e_1_3_3_3_16_1","unstructured":"Wonjoon Jin Qi Dai Chong Luo Seung-Hwan Baek and Sunghyun Cho. 2025. FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2502.08244 (2025)."},{"key":"e_1_3_3_3_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00510"},{"key":"e_1_3_3_3_18_1","doi-asserted-by":"crossref","unstructured":"Bernhard Kerbl Georgios Kopanas Thomas Leimk\u00fchler and George Drettakis. 2023. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42 4 (2023) 139\u20131.","DOI":"10.1145\/3592433"},{"key":"e_1_3_3_3_19_1","unstructured":"Diederik\u00a0P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1412.6980 (2014)."},{"key":"e_1_3_3_3_20_1","unstructured":"Black\u00a0Forest Labs. 2024. FLUX. https:\/\/github.com\/black-forest-labs\/flux."},{"key":"e_1_3_3_3_21_1","unstructured":"LAION-AI. 2023. Aesthetic Predictor. https:\/\/github.com\/LAION-AI\/aesthetic-predictor. Accessed: 2025-05-01."},{"key":"e_1_3_3_3_22_1","first-page":"12888","volume-title":"International conference on machine learning","author":"Li Junnan","year":"2022","unstructured":"Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. 2022. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International conference on machine learning. PMLR, 12888\u201312900."},{"key":"e_1_3_3_3_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02092"},{"key":"e_1_3_3_3_24_1","unstructured":"Fangfu Liu Wenqiang Sun Hanyang Wang Yikai Wang Haowen Sun Junliang Ye Jun Zhang and Yueqi Duan. 2024. Reconx: Reconstruct any scene from sparse views with video diffusion model. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2408.16767 (2024)."},{"key":"e_1_3_3_3_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3641233.3664307"},{"key":"e_1_3_3_3_26_1","unstructured":"Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/1711.05101 (2017)."},{"key":"e_1_3_3_3_27_1","unstructured":"Fan Lu Kwan-Yee Lin Yan Xu Hongsheng Li Guang Chen and Changjun Jiang. 2024. Urban architect: Steerable 3d urban scene generation with layout prior. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2404.06780 (2024)."},{"key":"e_1_3_3_3_28_1","unstructured":"Baorui Ma Huachen Gao Haoge Deng Zhengxiong Luo Tiejun Huang Lulu Tang and Xinlong Wang. 2024. You See it You Got it: Learning 3D Creation on Pose-Free Videos at Scale. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2412.06699 (2024)."},{"key":"e_1_3_3_3_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01218"},{"key":"e_1_3_3_3_30_1","first-page":"8748","volume-title":"International conference on machine learning","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong\u00a0Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et\u00a0al. 2021. Learning transferable visual models from natural language supervision. In International conference on machine learning. PMLR, 8748\u20138763."},{"key":"e_1_3_3_3_31_1","doi-asserted-by":"crossref","unstructured":"Ren\u00e9 Ranftl Katrin Lasinger David Hafner Konrad Schindler and Vladlen Koltun. 2020. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE transactions on pattern analysis and machine intelligence 44 3 (2020) 1623\u20131637.","DOI":"10.1109\/TPAMI.2020.3019967"},{"key":"e_1_3_3_3_32_1","unstructured":"Xuanchi Ren Tianchang Shen Jiahui Huang Huan Ling Yifan Lu Merlin Nimier-David Thomas M\u00fcller Alexander Keller Sanja Fidler and Jun Gao. 2025. Gen3c: 3d-informed world-consistent video generation with precise camera control. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2503.03751 (2025)."},{"key":"e_1_3_3_3_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588432.3591503"},{"key":"e_1_3_3_3_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3721238.3730701"},{"key":"e_1_3_3_3_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00593"},{"key":"e_1_3_3_3_36_1","unstructured":"Ruoxi Shi Hansheng Chen Zhuoyang Zhang Minghua Liu Chao Xu Xinyue Wei Linghao Chen Chong Zeng and Hao Su. 2023. Zero123++: a single image to consistent multi-view diffusion base model. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2310.15110 (2023)."},{"key":"e_1_3_3_3_37_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58536-5_24"},{"key":"e_1_3_3_3_38_1","unstructured":"TurboSquid. [n. d.]. TurboSquid - 3D Models for Professionals. https:\/\/www.turbosquid.com."},{"key":"e_1_3_3_3_39_1","unstructured":"Zhenwei Wang Tengfei Wang Zexin He Gerhard Hancke Ziwei Liu and Rynson\u00a0WH Lau. 2024. Phidias: A generative model for creating 3d content from text image and 3d conditions with reference-augmented diffusion. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2409.11406 (2024)."},{"key":"e_1_3_3_3_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.164"},{"key":"e_1_3_3_3_41_1","unstructured":"XLabs. 2024. x-flux. https:\/\/github.com\/XLabs-AI\/x-flux."},{"key":"e_1_3_3_3_42_1","unstructured":"Zhuoyi Yang Jiayan Teng Wendi Zheng Ming Ding Shiyu Huang Jiazheng Xu Yuanming Yang Wenyi Hong Xiaohan Zhang Guanyu Feng et\u00a0al. 2024. CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2408.06072 (2024)."},{"key":"e_1_3_3_3_43_1","unstructured":"Wangbo Yu Jinbo Xing Li Yuan Wenbo Hu Xiaoyu Li Zhipeng Huang Xiangjun Gao Tien-Tsin Wong Ying Shan and Yonghong Tian. 2024a. Viewcrafter: Taming video diffusion models for high-fidelity novel view synthesis. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2409.02048 (2024)."},{"key":"e_1_3_3_3_44_1","doi-asserted-by":"crossref","unstructured":"Xin Yu Ze Yuan Yuan-Chen Guo Ying-Tian Liu Jianhui Liu Yangguang Li Yan-Pei Cao Ding Liang and Xiaojuan Qi. 2024b. Texgen: a generative diffusion model for mesh textures. ACM Transactions on Graphics (TOG) 43 6 (2024) 1\u201314.","DOI":"10.1145\/3687909"},{"key":"e_1_3_3_3_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00407"},{"key":"e_1_3_3_3_46_1","unstructured":"Hao Zhang Feng Li Shilong Liu Lei Zhang Hang Su Jun Zhu Lionel\u00a0M Ni and Heung-Yeung Shum. 2022. Dino: Detr with improved denoising anchor boxes for end-to-end object detection. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2203.03605 (2022)."},{"key":"e_1_3_3_3_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3641519.3657494"},{"key":"e_1_3_3_3_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00355"},{"key":"e_1_3_3_3_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00068"},{"key":"e_1_3_3_3_50_1","unstructured":"Jensen\u00a0Jinghao Zhou Hang Gao Vikram Voleti Aaryaman Vasishta Chun-Han Yao Mark Boss Philip Torr Christian Rupprecht and Varun Jampani. 2025. STABLE VIRTUAL CAMERA: Generative View Synthesis with Diffusion Models. 
arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2503.14489 (2025)."}],"event":{"name":"SA Conference Papers '25: SIGGRAPH Asia 2025 Conference Papers","location":"Hong Kong Hong Kong","acronym":"SA Conference Papers '25","sponsor":["SIGGRAPH ACM Special Interest Group on Computer Graphics and Interactive Techniques"]},"container-title":["Proceedings of the SIGGRAPH Asia 2025 Conference Papers"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3757377.3763871","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T03:32:34Z","timestamp":1765251154000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3757377.3763871"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,14]]},"references-count":49,"alternative-id":["10.1145\/3757377.3763871","10.1145\/3757377"],"URL":"https:\/\/doi.org\/10.1145\/3757377.3763871","relation":{},"subject":[],"published":{"date-parts":[[2025,12,14]]},"assertion":[{"value":"2025-12-14","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}