{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T04:27:42Z","timestamp":1765254462649,"version":"3.46.0"},"reference-count":59,"publisher":"Association for Computing Machinery (ACM)","issue":"6","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:p>\n                    Real-world applications like video gaming and virtual reality often demand the ability to model 3D scenes that users can explore along custom camera trajectories. While significant progress has been made in generating 3D objects from text or images, creating long-range, 3D-consistent, explorable 3D scenes remains a complex and challenging problem. In this work, we present\n                    <jats:bold>\n                      <jats:italic toggle=\"yes\">Voyager<\/jats:italic>\n                    <\/jats:bold>\n                    , a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image with user-defined camera path. Unlike existing approaches, Voyager achieves end-to-end scene generation and reconstruction with inherent consistency across frames, eliminating the need for 3D reconstruction pipelines (e.g., structure-from-motion or multi-view stereo). 
Our method integrates three key components: 1)\n                    <jats:bold>World-Consistent Video Diffusion<\/jats:bold>\n                    : A unified architecture that jointly generates aligned RGB and depth video sequences, conditioned on existing world observations to ensure global coherence; 2)\n                    <jats:bold>Long-Range World Exploration<\/jats:bold>\n                    : An efficient world cache with point culling and an auto-regressive inference scheme with smooth video sampling for iterative scene extension with context-aware consistency; and 3)\n                    <jats:bold>Scalable Data Engine<\/jats:bold>\n                    : A video reconstruction pipeline that automates camera pose estimation and metric depth prediction for arbitrary videos, enabling large-scale, diverse training data curation without manual 3D annotations. Collectively, these designs result in a clear improvement over existing methods in visual quality and geometric accuracy, with versatile applications. 
Code for this paper is available at https:\/\/github.com\/Tencent-Hunyuan\/HunyuanWorld-Voyager.\n                  <\/jats:p>","DOI":"10.1145\/3763330","type":"journal-article","created":{"date-parts":[[2025,12,4]],"date-time":"2025-12-04T17:15:39Z","timestamp":1764868539000},"page":"1-15","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation"],"prefix":"10.1145","volume":"44","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-1071-6371","authenticated-orcid":false,"given":"Tianyu","family":"Huang","sequence":"first","affiliation":[{"name":"Harbin Institute of Technology, Harbin, China"},{"name":"City University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-0919-1191","authenticated-orcid":false,"given":"Wangguandong","family":"Zheng","sequence":"additional","affiliation":[{"name":"Tencent, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3435-8110","authenticated-orcid":false,"given":"Tengfei","family":"Wang","sequence":"additional","affiliation":[{"name":"Tencent, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0550-4788","authenticated-orcid":false,"given":"Yuhao","family":"Liu","sequence":"additional","affiliation":[{"name":"City University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0215-660X","authenticated-orcid":false,"given":"Zhenwei","family":"Wang","sequence":"additional","affiliation":[{"name":"Tencent, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-3928-4606","authenticated-orcid":false,"given":"Junta","family":"Wu","sequence":"additional","affiliation":[{"name":"Tencent, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7019-2077","authenticated-orcid":false,"given":"Jie","family":"Jiang","sequence":"additional","affiliation":[{"name":"Tencent, Shenzhen, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9198-3951","authenticated-orcid":false,"given":"Hui","family":"Li","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Harbin, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8957-8129","authenticated-orcid":false,"given":"Rynson","family":"Lau","sequence":"additional","affiliation":[{"name":"City University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3330-783X","authenticated-orcid":false,"given":"Wangmeng","family":"Zuo","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Harbin, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-7465-802X","authenticated-orcid":false,"given":"Chunchao","family":"Guo","sequence":"additional","affiliation":[{"name":"Tencent, Shenzhen, China"}]}],"member":"320","published-online":{"date-parts":[[2025,12,4]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Jianhong Bai Menghan Xia Xiao Fu Xintao Wang Lianrui Mu Jinwen Cao Zuozhu Liu Haoji Hu Xiang Bai Pengfei Wan and Di Zhang. 2025. ReCamMaster: Camera-Controlled Generative Rendering from A Single Video. arXiv:2503.11647 [cs.CV] https:\/\/arxiv.org\/abs\/2503.11647"},{"key":"e_1_2_1_2_1","volume-title":"VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control. arXiv preprint arXiv:2503.05639","author":"Bian Yuxuan","year":"2025","unstructured":"Yuxuan Bian, Zhaoyang Zhang, Xuan Ju, Mingdeng Cao, Liangbin Xie, Ying Shan, and Qiang Xu. 2025. VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control. 
arXiv preprint arXiv:2503.05639 (2025)."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02161"},{"key":"e_1_2_1_4_1","first-page":"8","article-title":"Video generation models as world simulators","volume":"1","author":"Brooks Tim","year":"2024","unstructured":"Tim Brooks, Bill Peebles, Connor Holmes, Will DePue, Yufei Guo, Li Jing, David Schnurr, Joe Taylor, Troy Luhman, Eric Luhman, et al. 2024. Video generation models as world simulators. OpenAI Blog 1 (2024), 8.","journal-title":"OpenAI Blog"},{"key":"e_1_2_1_5_1","unstructured":"Luxi Chen Zihan Zhou Min Zhao Yikai Wang Ge Zhang Wenhao Huang Hao Sun Ji-Rong Wen and Chongxuan Li. 2025. FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis. arXiv:2503.13265 [cs.CV] https:\/\/arxiv.org\/abs\/2503.13265"},{"key":"e_1_2_1_6_1","volume-title":"LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes. arXiv preprint arXiv:2311.13384","author":"Chung Jaeyoung","year":"2023","unstructured":"Jaeyoung Chung, Suyoung Lee, Hyeongjin Nam, Jaerin Lee, and Kyoung Mu Lee. 2023. LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes. arXiv preprint arXiv:2311.13384 (2023)."},{"key":"e_1_2_1_7_1","volume-title":"WorldScore: A Unified Evaluation Benchmark for World Generation. arXiv preprint arXiv:2504.00983","author":"Duan Haoyi","year":"2025","unstructured":"Haoyi Duan, Hong-Xing Yu, Sirui Chen, Li Fei-Fei, and Jiajun Wu. 2025. WorldScore: A Unified Evaluation Benchmark for World Generation. arXiv preprint arXiv:2504.00983 (2025)."},{"key":"e_1_2_1_8_1","volume-title":"CAT3D: Create Anything in 3D with Multi-View Diffusion Models. Advances in Neural Information Processing Systems","author":"Ruiqi","year":"2024","unstructured":"Ruiqi Gao*, Aleksander Holynski*, Philipp Henzler, Arthur Brussee, Ricardo Martin-Brualla, Pratul P. Srinivasan, Jonathan T. Barron, and Ben Poole*. 2024. CAT3D: Create Anything in 3D with Multi-View Diffusion Models. 
Advances in Neural Information Processing Systems (2024)."},{"key":"e_1_2_1_9_1","volume-title":"Animatediff: Animate your personalized text-to-image diffusion models without specific tuning. arXiv preprint arXiv:2307.04725","author":"Guo Yuwei","year":"2023","unstructured":"Yuwei Guo, Ceyuan Yang, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu Qiao, Maneesh Agrawala, Dahua Lin, and Bo Dai. 2023. Animatediff: Animate your personalized text-to-image diffusion models without specific tuning. arXiv preprint arXiv:2307.04725 (2023)."},{"key":"e_1_2_1_10_1","volume-title":"CameraCtrl: Enabling Camera Control for Text-to-Video Generation. arXiv preprint arXiv:2404.02101","author":"He Hao","year":"2024","unstructured":"Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, and Ceyuan Yang. 2024. CameraCtrl: Enabling Camera Control for Text-to-Video Generation. arXiv preprint arXiv:2404.02101 (2024)."},{"key":"e_1_2_1_11_1","volume-title":"Latent video diffusion models for high-fidelity long video generation. arXiv preprint arXiv:2211.13221","author":"He Yingqing","year":"2022","unstructured":"Yingqing He, Tianyu Yang, Yong Zhang, Ying Shan, and Qifeng Chen. 2022. Latent video diffusion models for high-fidelity long video generation. arXiv preprint arXiv:2211.13221 (2022)."},{"key":"e_1_2_1_12_1","volume-title":"Consistent, dynamic, and extendable long video generation from text. arXiv preprint arXiv:2403.14773","author":"Henschel Roberto","year":"2024","unstructured":"Roberto Henschel, Levon Khachatryan, Hayk Poghosyan, Daniil Hayrapetyan, Vahram Tadevosyan, Zhangyang Wang, Shant Navasardyan, and Humphrey Shi. 2024. Streamingt2v: Consistent, dynamic, and extendable long video generation from text. arXiv preprint arXiv:2403.14773 (2024)."},{"key":"e_1_2_1_13_1","volume-title":"Lrm: Large reconstruction model for single image to 3d. 
arXiv preprint arXiv:2311.04400","author":"Hong Yicong","year":"2023","unstructured":"Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. 2023. Lrm: Large reconstruction model for single image to 3d. arXiv preprint arXiv:2311.04400 (2023)."},{"key":"e_1_2_1_14_1","volume-title":"Metric3d v2: A versatile monocular geometric foundation model for zero-shot metric depth and surface normal estimation","author":"Hu Mu","year":"2024","unstructured":"Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Hao Chen, Kaixuan Wang, Gang Yu, Chunhua Shen, and Shaojie Shen. 2024. Metric3d v2: A versatile monocular geometric foundation model for zero-shot metric depth and surface normal estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence (2024)."},{"key":"e_1_2_1_15_1","volume-title":"https:\/\/3d-models.hunyuan.tencent.com","author":"D.","year":"2025","unstructured":"Hunyuan3D. 2025. Hunyuan-3D. (2025). https:\/\/3d-models.hunyuan.tencent.com"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3592433"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073599"},{"key":"e_1_2_1_18_1","volume-title":"Hunyuanvideo: A systematic framework for large video generative models. arXiv preprint arXiv:2412.03603","author":"Kong Weijie","year":"2024","unstructured":"Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, et al. 2024. Hunyuanvideo: A systematic framework for large video generative models. arXiv preprint arXiv:2412.03603 (2024)."},{"key":"e_1_2_1_19_1","unstructured":"Black Forest Labs. 2024. FLUX. https:\/\/github.com\/black-forest-labs\/flux."},{"key":"e_1_2_1_20_1","volume-title":"Hunyuan-dit: A powerful multi-resolution diffusion transformer with fine-grained chinese understanding. 
arXiv preprint arXiv:2405.08748","author":"Li Zhimin","year":"2024","unstructured":"Zhimin Li, Jianwei Zhang, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, et al. 2024. Hunyuan-dit: A powerful multi-resolution diffusion transformer with fine-grained chinese understanding. arXiv preprint arXiv:2405.08748 (2024)."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02092"},{"key":"e_1_2_1_22_1","volume-title":"Pavel Tokmakov, Sergey Zakharov, and Carl Vondrick.","author":"Liu Ruoshi","year":"2023","unstructured":"Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, and Carl Vondrick. 2023. Zero-1-to-3: Zero-shot One Image to 3D Object. arXiv:2303.11328 [cs.CV]"},{"key":"e_1_2_1_23_1","volume-title":"European Conference on Computer Vision. Springer, 265\u2013282","author":"Liu Yang","year":"2024","unstructured":"Yang Liu, Chuanchen Luo, Lue Fan, Naiyan Wang, Junran Peng, and Zhaoxiang Zhang. 2024. Citygaussian: Real-time high-quality large-scale scene rendering with gaussians. In European Conference on Computer Vision. Springer, 265\u2013282."},{"key":"e_1_2_1_24_1","volume-title":"Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy. arXiv preprint arXiv:2506.22432","author":"Liu Yuhao","year":"2025","unstructured":"Yuhao Liu, Tengfei Wang, Fang Liu, Zhenwei Wang, and Rynson WH Lau. 2025. Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy. arXiv preprint arXiv:2506.22432 (2025)."},{"key":"e_1_2_1_25_1","volume-title":"Freelong: Training-free long video generation with spectralblend temporal attention. arXiv preprint arXiv:2407.19918","author":"Lu Yu","year":"2024","unstructured":"Yu Lu, Yuanzhi Liang, Linchao Zhu, and Yi Yang. 2024. Freelong: Training-free long video generation with spectralblend temporal attention. 
arXiv preprint arXiv:2407.19918 (2024)."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.00194"},{"key":"e_1_2_1_27_1","volume-title":"Lt3sd: Latent trees for 3d scene diffusion. arXiv preprint arXiv:2409.08215","author":"Meng Quan","year":"2024","unstructured":"Quan Meng, Lei Li, Matthias Nie\u00dfner, and Angela Dai. 2024. Lt3sd: Latent trees for 3d scene diffusion. arXiv preprint arXiv:2409.08215 (2024)."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503250"},{"key":"e_1_2_1_29_1","volume-title":"Scalable Diffusion Models with Transformers. arXiv preprint arXiv:2212.09748","author":"Peebles William","year":"2022","unstructured":"William Peebles and Saining Xie. 2022. Scalable Diffusion Models with Transformers. arXiv preprint arXiv:2212.09748 (2022)."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.00574"},{"key":"e_1_2_1_31_1","volume-title":"https:\/\/hyper3d.ai","author":"Rodin AI.","year":"2025","unstructured":"RodinAI. 2025. Rodin. (2025). https:\/\/hyper3d.ai"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_2_1_33_1","unstructured":"Runway. 2024. Introducing gen-3 alpha: A new frontier for video generation. (2024). https:\/\/runwayml.com\/research\/introducing-gen-3-alpha"},{"key":"e_1_2_1_34_1","volume-title":"Lorenzo Porzi, and Peter Kontschieder.","author":"Schwarz Katja","year":"2025","unstructured":"Katja Schwarz, Denys Rozumnyi, Samuel Rota Bul\u00f2, Lorenzo Porzi, and Peter Kontschieder. 2025. A Recipe for Generating 3D Worlds From a Single Image. arXiv preprint arXiv:2503.16611 (2025)."},{"key":"e_1_2_1_35_1","volume-title":"GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping. 
arXiv preprint arXiv:2405.17251","author":"Seo Junyoung","year":"2024","unstructured":"Junyoung Seo, Kazumi Fukuda, Takashi Shibuya, Takuya Narihira, Naoki Murata, Shoukang Hu, Chieh-Hsin Lai, Seungryong Kim, and Yuki Mitsufuji. 2024. GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping. arXiv preprint arXiv:2405.17251 (2024)."},{"key":"e_1_2_1_36_1","volume-title":"Droid-slam: Deep visual slam for monocular, stereo, and rgb-d cameras. Advances in neural information processing systems 34","author":"Teed Zachary","year":"2021","unstructured":"Zachary Teed and Jia Deng. 2021. Droid-slam: Deep visual slam for monocular, stereo, and rgb-d cameras. Advances in neural information processing systems 34 (2021), 16558\u201316569."},{"key":"e_1_2_1_37_1","unstructured":"HunyuanWorld Team Tencent. 2025. HunyuanWorld 1.0: Generating Immersive Explorable and Interactive 3D Worlds from Words or Pixels. arXiv:2507.21809 [cs.CV]"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.510"},{"key":"e_1_2_1_39_1","volume-title":"Gen-l-video: Multi-text to long video generation via temporal co-denoising. arXiv preprint arXiv:2305.18264","author":"Wang Fu-Yun","year":"2023","unstructured":"Fu-Yun Wang, Wenshuo Chen, Guanglu Song, Han-Jia Ye, Yu Liu, and Hongsheng Li. 2023. Gen-l-video: Multi-text to long video generation via temporal co-denoising. arXiv preprint arXiv:2305.18264 (2023)."},{"key":"e_1_2_1_40_1","volume-title":"Vggt: Visual geometry grounded transformer. arXiv preprint arXiv:2503.11651","author":"Wang Jianyuan","year":"2025","unstructured":"Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. 2025. Vggt: Visual geometry grounded transformer. arXiv preprint arXiv:2503.11651 (2025)."},{"key":"e_1_2_1_41_1","volume-title":"Moge: Unlocking accurate monocular geometry estimation for open-domain images with optimal training supervision. 
arXiv preprint arXiv:2410.19115","author":"Wang Ruicheng","year":"2024","unstructured":"Ruicheng Wang, Sicheng Xu, Cassie Dai, Jianfeng Xiang, Yu Deng, Xin Tong, and Jiaolong Yang. 2024a. Moge: Unlocking accurate monocular geometry estimation for open-domain images with optimal training supervision. arXiv preprint arXiv:2410.19115 (2024)."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3641519.3657518"},{"key":"e_1_2_1_43_1","volume-title":"Toon3D: Seeing Cartoons from a New Perspective. arXiv preprint arXiv:2405.10320","author":"Weber Ethan","year":"2024","unstructured":"Ethan Weber, Riley Peterlinz, Rohan Mathur, Frederik Warburg, Alexei A Efros, and Angjoo Kanazawa. 2024. Toon3D: Seeing Cartoons from a New Perspective. arXiv preprint arXiv:2405.10320 (2024)."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02036"},{"key":"e_1_2_1_45_1","volume-title":"Structured 3d latents for scalable and versatile 3d generation. arXiv preprint arXiv:2412.01506","author":"Xiang Jianfeng","year":"2024","unstructured":"Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, and Jiaolong Yang. 2024. Structured 3d latents for scalable and versatile 3d generation. arXiv preprint arXiv:2412.01506 (2024)."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00923"},{"key":"e_1_2_1_47_1","volume-title":"Easyanimate: A high-performance long video generation method based on transformer architecture. arXiv preprint arXiv:2405.18991","author":"Xu Jiaqi","year":"2024","unstructured":"Jiaqi Xu, Xinyi Zou, Kunzhe Huang, Yunkuo Chen, Bo Liu, MengLi Cheng, Xing Shi, and Jun Huang. 2024. Easyanimate: A high-performance long video generation method based on transformer architecture. 
arXiv preprint arXiv:2405.18991 (2024)."},{"key":"e_1_2_1_48_1","unstructured":"Zhuoyi Yang Jiayan Teng Wendi Zheng Ming Ding Shiyu Huang Jiazheng Xu Yuanming Yang Wenyi Hong Xiaohan Zhang Guanyu Feng et al. 2024. CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer. arXiv preprint arXiv:2408.06072 (2024)."},{"key":"e_1_2_1_49_1","volume-title":"Nuwa-xl: Diffusion over diffusion for extremely long video generation. arXiv preprint arXiv:2303.12346","author":"Yin Shengming","year":"2023","unstructured":"Shengming Yin, Chenfei Wu, Huan Yang, Jianfeng Wang, Xiaodong Wang, Minheng Ni, Zhengyuan Yang, Linjie Li, Shuguang Liu, Fan Yang, et al. 2023. Nuwa-xl: Diffusion over diffusion for extremely long video generation. arXiv preprint arXiv:2303.12346 (2023)."},{"key":"e_1_2_1_50_1","volume-title":"From slow bidirectional to fast causal video generators. arXiv preprint arXiv:2412.07772","author":"Yin Tianwei","year":"2024","unstructured":"Tianwei Yin, Qiang Zhang, Richard Zhang, William T Freeman, Fredo Durand, Eli Shechtman, and Xun Huang. 2024. From slow bidirectional to fast causal video generators. arXiv preprint arXiv:2412.07772 (2024)."},{"key":"e_1_2_1_51_1","volume-title":"WonderWorld: Interactive 3D Scene Generation from a Single Image. arXiv:2406.09394","author":"Yu Hong-Xing","year":"2024","unstructured":"Hong-Xing Yu, Haoyi Duan, Charles Herrmann, William T. Freeman, and Jiajun Wu. 2024a. WonderWorld: Interactive 3D Scene Generation from a Single Image. arXiv:2406.09394 (2024)."},{"key":"e_1_2_1_52_1","volume-title":"WonderJourney: Going from Anywhere to Everywhere. arXiv preprint arXiv:2312.03884","author":"Yu Hong-Xing","year":"2023","unstructured":"Hong-Xing Yu, Haoyi Duan, Junhwa Hur, Kyle Sargent, Michael Rubinstein, William T Freeman, Forrester Cole, Deqing Sun, Noah Snavely, Jiajun Wu, and Charles Herrmann. 2023a. WonderJourney: Going from Anywhere to Everywhere. 
arXiv preprint arXiv:2312.03884 (2023)."},{"key":"e_1_2_1_53_1","unstructured":"Lijun Yu Jos\u00e9 Lezama Nitesh B Gundavarapu Luca Versari Kihyuk Sohn David Minnen Yong Cheng Vighnesh Birodkar Agrim Gupta Xiuye Gu et al. 2023b. Language Model Beats Diffusion-Tokenizer is Key to Visual Generation. arXiv preprint arXiv:2310.05737 (2023)."},{"key":"e_1_2_1_54_1","volume-title":"ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis. arXiv preprint arXiv:2409.02048","author":"Yu Wangbo","year":"2024","unstructured":"Wangbo Yu, Jinbo Xing, Li Yuan, Wenbo Hu, Xiaoyu Li, Zhipeng Huang, Xiangjun Gao, Tien-Tsin Wong, Ying Shan, and Yonghong Tian. 2024b. ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis. arXiv preprint arXiv:2409.02048 (2024)."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i3.25464"},{"key":"e_1_2_1_56_1","unstructured":"Zibo Zhao Zeqiang Lai Qingxiang Lin Yunfei Zhao Haolin Liu Shuhui Yang Yifei Feng Mingxin Yang Sheng Zhang Xianghui Yang et al. 2025. Hunyuan3d 2.0: Scaling diffusion models for high resolution textured 3d assets generation. arXiv preprint arXiv:2501.12202 (2025)."},{"key":"e_1_2_1_57_1","volume-title":"Stable Virtual Camera: Generative View Synthesis with Diffusion Models. arXiv preprint arXiv:2503.14489","author":"Zhou Jensen","year":"2025","unstructured":"Jensen (Jinghao) Zhou, Hang Gao, Vikram Voleti, Aaryaman Vasishta, Chun-Han Yao, Mark Boss, Philip Torr, Christian Rupprecht, and Varun Jampani. 2025. Stable Virtual Camera: Generative View Synthesis with Diffusion Models. arXiv preprint arXiv:2503.14489 (2025)."},{"key":"e_1_2_1_58_1","volume-title":"Stereo magnification: Learning view synthesis using multiplane images. arXiv preprint arXiv:1805.09817","author":"Zhou Tinghui","year":"2018","unstructured":"Tinghui Zhou, Richard Tucker, John Flynn, Graham Fyffe, and Noah Snavely. 2018. 
Stereo magnification: Learning view synthesis using multiplane images. arXiv preprint arXiv:1805.09817 (2018)."},{"key":"e_1_2_1_59_1","volume-title":"Allegro: Open the Black Box of Commercial-Level Video Generation Model. arXiv preprint arXiv:2410.15458","author":"Zhou Yuan","year":"2024","unstructured":"Yuan Zhou, Qiuyue Wang, Yuxuan Cai, and Huan Yang. 2024. Allegro: Open the Black Box of Commercial-Level Video Generation Model. arXiv preprint arXiv:2410.15458 (2024)."}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3763330","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T21:12:11Z","timestamp":1764969131000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3763330"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12]]},"references-count":59,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["10.1145\/3763330"],"URL":"https:\/\/doi.org\/10.1145\/3763330","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"type":"print","value":"0730-0301"},{"type":"electronic","value":"1557-7368"}],"subject":[],"published":{"date-parts":[[2025,12]]},"assertion":[{"value":"2025-05-24","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-09","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-12-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}