{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T21:16:24Z","timestamp":1764969384263,"version":"3.46.0"},"reference-count":122,"publisher":"Association for Computing Machinery (ACM)","issue":"6","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:p>\n                    We present a unique system for large-scale, multi-performer, high resolution 4D volumetric capture providing realistic free-viewpoint video up to and including 4K resolution facial closeups. To achieve this, we employ a novel volumetric capture, reconstruction and rendering pipeline based on Dynamic Gaussian Splatting and Diffusion-based Detail Enhancement. We design our pipeline specifically to meet the demands of high-end media production. We employ two capture rigs: the\n                    <jats:italic toggle=\"yes\">Scene Rig<\/jats:italic>\n                    , which captures multi-actor performances at a resolution which falls short of 4K production quality, and the\n                    <jats:italic toggle=\"yes\">Face Rig<\/jats:italic>\n                    , which records high-fidelity single-actor facial detail to serve as a reference for detail enhancement. We first reconstruct dynamic performances from the\n                    <jats:italic toggle=\"yes\">Scene Rig<\/jats:italic>\n                    using 4D Gaussian Splatting, incorporating new model designs and training strategies to improve reconstruction, dynamic range, and rendering quality. Then to render high-quality images for facial closeups, we introduce a diffusion-based detail enhancement model. 
This model is fine-tuned with high-fidelity data from the same actors recorded in the\n                    <jats:italic toggle=\"yes\">Face Rig.<\/jats:italic>\n                    We train on paired data generated from low- and high-quality Gaussian Splatting (GS) models, using the low-quality input to match the quality of the\n                    <jats:italic toggle=\"yes\">Scene Rig<\/jats:italic>\n                    , with the high-quality GS as ground truth. Our results demonstrate the effectiveness of this pipeline in bridging the gap between the scalable performance capture of a large-scale rig and the high-resolution standards required for film and media production.\n                  <\/jats:p>","DOI":"10.1145\/3763336","type":"journal-article","created":{"date-parts":[[2025,12,4]],"date-time":"2025-12-04T17:15:39Z","timestamp":1764868539000},"page":"1-13","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Detail Enhanced Gaussian Splatting for Large-Scale Volumetric Capture"],"prefix":"10.1145","volume":"44","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3125-1614","authenticated-orcid":false,"given":"Julien","family":"Philip","sequence":"first","affiliation":[{"name":"Eyeline Labs, London, United Kingdom"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6992-0089","authenticated-orcid":false,"given":"Li","family":"Ma","sequence":"additional","affiliation":[{"name":"Eyeline Labs, Los Angeles, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-2534-9266","authenticated-orcid":false,"given":"Pascal","family":"Clausen","sequence":"additional","affiliation":[{"name":"Eyeline Labs, Geneve, Switzerland"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-4678-6458","authenticated-orcid":false,"given":"Wenqi","family":"Xian","sequence":"additional","affiliation":[{"name":"Eyeline Labs, Los Angeles, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-7150-0160","authenticated-orcid":false,"given":"Ahmet 
Levent","family":"Ta\u015fel","sequence":"additional","affiliation":[{"name":"Eyeline Labs, Vancouver, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9982-7934","authenticated-orcid":false,"given":"Mingming","family":"He","sequence":"additional","affiliation":[{"name":"Eyeline Labs, Los Angeles, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-8189-6024","authenticated-orcid":false,"given":"Xueming","family":"Yu","sequence":"additional","affiliation":[{"name":"Eyeline Labs, Los Angeles, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-1570-3708","authenticated-orcid":false,"given":"David M.","family":"George","sequence":"additional","affiliation":[{"name":"Eyeline Labs, Los Angeles, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-6865-1325","authenticated-orcid":false,"given":"Ning","family":"Yu","sequence":"additional","affiliation":[{"name":"Eyeline Labs, Los Angeles, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-7803-2314","authenticated-orcid":false,"given":"Oliver","family":"Pilarski","sequence":"additional","affiliation":[{"name":"Eyeline Labs, Munich, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7381-2323","authenticated-orcid":false,"given":"Paul","family":"Debevec","sequence":"additional","affiliation":[{"name":"Eyeline Labs, Los Angeles, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,12,4]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2008.4587758"},{"key":"e_1_2_2_2_1","unstructured":"Autodesk INC. 2024. Maya. https:\/\/autodesk.com\/maya"},{"key":"e_1_2_2_3_1","volume-title":"MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation. arXiv preprint arXiv:2302.08113","author":"Bar-Tal Omer","year":"2023","unstructured":"Omer Bar-Tal, Lior Yariv, Yaron Lipman, and Tali Dekel. 2023. MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation. 
arXiv preprint arXiv:2302.08113 (2023)."},{"key":"e_1_2_2_4_1","volume-title":"Srinivasan","author":"Barron Jonathan T.","year":"2021","unstructured":"Jonathan T. Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P. Srinivasan. 2021. Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. ICCV (2021)."},{"key":"e_1_2_2_5_1","volume-title":"Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields. ICCV","author":"Barron Jonathan T.","year":"2023","unstructured":"Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. 2023. Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields. ICCV (2023)."},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2010324.1964970"},{"key":"e_1_2_2_7_1","unstructured":"Andreas Blattmann Tim Dockhorn Sumith Kulal Daniel Mendelevitch Maciej Kilian Dominik Lorenz Yam Levi Zion English Vikram Voleti Adam Letts et al. 2023. Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127 (2023)."},{"key":"e_1_2_2_8_1","doi-asserted-by":"crossref","unstructured":"Jose Caballero Christian Ledig Andrew Aitken Alejandro Acosta Johannes Totz Zehan Wang and Wenzhe Shi. 2017. Real-time video super-resolution with spatio-temporal networks and motion compensation. In CVPR. 4778\u20134787.","DOI":"10.1109\/CVPR.2017.304"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15561-1_24"},{"key":"e_1_2_2_10_1","volume-title":"AP-LDM: Attentive and Progressive Latent Diffusion Model for Training-Free High-Resolution Image Generation. arXiv preprint arXiv:2410.06055","author":"Cao Boyuan","year":"2024","unstructured":"Boyuan Cao, Jiaxin Ye, Yujie Wei, and Hongming Shan. 2024. AP-LDM: Attentive and Progressive Latent Diffusion Model for Training-Free High-Resolution Image Generation. 
arXiv preprint arXiv:2410.06055 (2024)."},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-73254-6_4"},{"key":"e_1_2_2_12_1","volume-title":"Basicvsr: The search for essential components in video super-resolution and beyond. In CVPR. 4947\u20134956.","author":"Chan Kelvin CK","year":"2021","unstructured":"Kelvin CK Chan, Xintao Wang, Ke Yu, Chao Dong, and Chen Change Loy. 2021. Basicvsr: The search for essential components in video super-resolution and beyond. In CVPR. 4947\u20134956."},{"key":"e_1_2_2_13_1","unstructured":"Kelvin CK Chan Shangchen Zhou Xiangyu Xu and Chen Change Loy. 2022. Investigating tradeoffs in real-world video super-resolution. In CVPR. 5962\u20135971."},{"key":"e_1_2_2_14_1","doi-asserted-by":"crossref","unstructured":"Zhikai Chen Fuchen Long Zhaofan Qiu Ting Yao Wengang Zhou Jiebo Luo and Tao Mei. 2024. Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution. In CVPR. 9232\u20139241.","DOI":"10.1109\/CVPR52733.2024.00882"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2766945"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1360612.1360697"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3130800.3130801"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00589"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01406"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3641519.3657463"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/2383894.2383917"},{"key":"e_1_2_2_22_1","volume-title":"Fast Dynamic Radiance Fields with Time-Aware Neural Voxels. In SIGGRAPH Asia 2022 Conference Papers.","author":"Fang Jiemin","year":"2022","unstructured":"Jiemin Fang, Taoran Yi, Xinggang Wang, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Matthias Nie\u00dfner, and Qi Tian. 2022. 
Fast Dynamic Radiance Fields with Time-Aware Neural Voxels. In SIGGRAPH Asia 2022 Conference Papers."},{"key":"e_1_2_2_23_1","doi-asserted-by":"crossref","unstructured":"Ruicheng Feng Chongyi Li and Chen Change Loy. 2024. Kalman-Inspired Feature Propagation for Video Face Super-Resolution. arXiv:2408.05205 [cs.CV] https:\/\/arxiv.org\/abs\/2408.05205","DOI":"10.1007\/978-3-031-73347-5_12"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01201"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1111\/J.1467-8659.2011.01888.X"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/280814.280822"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3355089.3356571"},{"key":"e_1_2_2_28_1","unstructured":"Lanqing Guo Yingqing He Haoxin Chen Menghan Xia Xiaodong Cun Yufei Wang Siyu Huang Yong Zhang Xintao Wang Qifeng Chen et al. 2024. Make a cheap scaling: A self-cascade diffusion model for higher-resolution adaptation. arXiv preprint arXiv:2402.10491 (2024)."},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-72764-1_3"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00631"},{"key":"e_1_2_2_31_1","doi-asserted-by":"crossref","unstructured":"Muhammad Haris Gregory Shakhnarovich and Norimichi Ukita. 2019. Recurrent back-projection network for video super-resolution. In CVPR. 3897\u20133906.","DOI":"10.1109\/CVPR.2019.00402"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.13919"},{"key":"e_1_2_2_33_1","volume-title":"VEnhancer: Generative Space-Time Enhancement for Video Generation. arXiv preprint arXiv:2407.07667","author":"He Jingwen","year":"2024","unstructured":"Jingwen He, Tianfan Xue, Dongyang Liu, Xinqi Lin, Peng Gao, Dahua Lin, Yu Qiao, Wanli Ouyang, and Ziwei Liu. 2024b. VEnhancer: Generative Space-Time Enhancement for Video Generation. 
arXiv preprint arXiv:2407.07667 (2024)."},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3680528.3687644"},{"key":"e_1_2_2_35_1","volume-title":"The Twelfth International Conference on Learning Representations.","author":"He Yingqing","year":"2024","unstructured":"Yingqing He, Shaoshu Yang, Haoxin Chen, Xiaodong Cun, Menghan Xia, Yong Zhang, Xintao Wang, Ran He, Qifeng Chen, and Ying Shan. 2024c. Scalecrafter: Tuning-free higher-resolution visual generation with diffusion models. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3272127.3275084"},{"key":"e_1_2_2_37_1","unstructured":"Jonathan Ho William Chan Chitwan Saharia Jay Whang Ruiqi Gao Alexey Gritsenko Diederik P Kingma Ben Poole Mohammad Norouzi David J Fleet et al. 2022. Imagen video: High definition video generation with diffusion models. arXiv preprint arXiv:2210.02303 (2022)."},{"key":"e_1_2_2_38_1","volume-title":"International Conference on Machine Learning. PMLR, 13213\u201313232","author":"Hoogeboom Emiel","year":"2023","unstructured":"Emiel Hoogeboom, Jonathan Heek, and Tim Salimans. 2023. simple diffusion: End-to-end diffusion for high resolution images. In International Conference on Machine Learning. PMLR, 13213\u201313232."},{"key":"e_1_2_2_39_1","volume-title":"FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis. arXiv preprint arXiv:2403.12963","author":"Huang Linjiang","year":"2024","unstructured":"Linjiang Huang, Rongyao Fang, Aiping Zhang, Guanglu Song, Si Liu, Yu Liu, and Hongsheng Li. 2024. FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis. arXiv preprint arXiv:2403.12963 (2024)."},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2701380"},{"key":"e_1_2_2_41_1","volume-title":"Vbench: Comprehensive benchmark suite for video generative models. 
arXiv preprint arXiv:2311.17982","author":"Huang Ziqi","year":"2023","unstructured":"Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, et al. 2023. Vbench: Comprehensive benchmark suite for video generative models. arXiv preprint arXiv:2311.17982 (2023)."},{"key":"e_1_2_2_42_1","volume-title":"Upsample guidance: Scale up diffusion models without training. arXiv preprint arXiv:2404.01709","author":"Hwang Juno","year":"2024","unstructured":"Juno Hwang, Yong-Hyun Park, and Junghyo Jo. 2024. Upsample guidance: Scale up diffusion models without training. arXiv preprint arXiv:2404.01709 (2024)."},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3592415"},{"key":"e_1_2_2_44_1","unstructured":"Yang Jin Zhicheng Sun Ningyuan Li Kun Xu Kun Xu Hao Jiang Nan Zhuang Quzhe Huang Yang Song Yadong Mu and Zhouchen Lin. 2024. Pyramidal Flow Matching for Efficient Video Generative Modeling. (2024)."},{"key":"e_1_2_2_45_1","first-page":"70847","article-title":"Training-free diffusion model adaptation for variable-sized text-to-image synthesis","volume":"36","author":"Jin Zhiyu","year":"2023","unstructured":"Zhiyu Jin, Xuli Shen, Bin Li, and Xiangyang Xue. 2023. Training-free diffusion model adaptation for variable-sized text-to-image synthesis. Advances in Neural Information Processing Systems 36 (2023), 70847\u201370860.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/93.580394"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3592433"},{"key":"e_1_2_2_48_1","unstructured":"Shakiba Kheradmand Daniel Rebain Gopal Sharma Weiwei Sun Yang-Che Tseng Hossam Isack Abhishek Kar Andrea Tagliasacchi and Kwang Moo Yi. 2024. 3D Gaussian Splatting as Markov Chain Monte Carlo. In Advances in Neural Information Processing Systems (NeurIPS). 
Spotlight Presentation."},{"key":"e_1_2_2_49_1","first-page":"D21","volume-title":"Point-Based Neural Rendering with Per-View Optimization. Computer Graphics Forum (Proceedings of the Eurographics Symposium on Rendering) 40","author":"Kopanas Georgios","year":"2021","unstructured":"Georgios Kopanas, Julien Philip, Thomas Leimk\u00fchler, and George Drettakis. 2021. Point-Based Neural Rendering with Per-View Optimization. Computer Graphics Forum (Proceedings of the Eurographics Symposium on Rendering) 40, 4 (June 2021). http:\/\/www-sop.inria.fr\/reves\/Basilic\/2021\/KPLD21"},{"key":"e_1_2_2_50_1","unstructured":"Black Forest Labs. 2024. FLUX.1: An advanced state-of-the-art generative deep learning model. Technical Report. Black Forest Labs. https:\/\/flux1.io\/"},{"key":"e_1_2_2_51_1","first-page":"50648","article-title":"Syncdiffusion: Coherent montage via synchronized joint diffusions","volume":"36","author":"Lee Yuseung","year":"2023","unstructured":"Yuseung Lee, Kunho Kim, Hyunjin Kim, and Minhyuk Sung. 2023. Syncdiffusion: Coherent montage via synchronized joint diffusions. Advances in Neural Information Processing Systems 36 (2023), 50648\u201350660.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530130"},{"key":"e_1_2_2_53_1","volume-title":"Mucan: Multi-correspondence aggregation network for video super-resolution","author":"Li Wenbo","year":"2020","unstructured":"Wenbo Li, Xin Tao, Taian Guo, Lu Qi, Jiangbo Lu, and Jiaya Jia. 2020. Mucan: Multi-correspondence aggregation network for video super-resolution. In ECCV. 
Springer, 335\u2013351."},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00813"},{"key":"e_1_2_2_55_1","volume-title":"Vrt: A video restoration transformer","author":"Liang Jingyun","year":"2024","unstructured":"Jingyun Liang, Jiezhang Cao, Yuchen Fan, Kai Zhang, Rakesh Ranjan, Yawei Li, Radu Timofte, and Luc Van Gool. 2024. Vrt: A video restoration transformer. IEEE TIP (2024)."},{"key":"e_1_2_2_56_1","first-page":"378","article-title":"Recurrent video restoration transformer with guided deformable attention","volume":"35","author":"Liang Jingyun","year":"2022","unstructured":"Jingyun Liang, Yuchen Fan, Xiaoyu Xiang, Rakesh Ranjan, Eddy Ilg, Simon Green, Jiezhang Cao, Kai Zhang, Radu Timofte, and Luc V Gool. 2022. Recurrent video restoration transformer with guided deformable attention. NeurIPS 35 (2022), 378\u2013393.","journal-title":"NeurIPS"},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2310.08585"},{"key":"e_1_2_2_58_1","volume-title":"Fast, Cheap, and Strong Diffusion Extrapolation Method. arXiv preprint arXiv:2404.15141","author":"Lin Mingbao","year":"2024","unstructured":"Mingbao Lin, Zhihang Lin, Wengyi Zhan, Liujuan Cao, and Rongrong Ji. 2024. CutDiffusion: A Simple, Fast, Cheap, and Strong Diffusion Extrapolation Method. arXiv preprint arXiv:2404.15141 (2024)."},{"key":"e_1_2_2_59_1","volume-title":"Diffbir: Towards blind image restoration with generative diffusion prior. arXiv preprint arXiv:2308.15070","author":"Lin Xinqi","year":"2023","unstructured":"Xinqi Lin, Jingwen He, Ziyan Chen, Zhaoyang Lyu, Bo Dai, Fanghua Yu, Wanli Ouyang, Yu Qiao, and Chao Dong. 2023a. Diffbir: Towards blind image restoration with generative diffusion prior. arXiv preprint arXiv:2308.15070 (2023)."},{"key":"e_1_2_2_60_1","unstructured":"Songhua Liu Weihao Yu Zhenxiong Tan and Xinchao Wang. 2024. LinFusion: 1 GPU 1 Minute 16K Image. (2024). 
arXiv:2409.02097 [cs.CV]"},{"key":"e_1_2_2_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3323020"},{"key":"e_1_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3450626.3459863"},{"key":"e_1_2_2_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3641519.3657472"},{"key":"e_1_2_2_64_1","volume-title":"Sander","author":"Ma Li","year":"2021","unstructured":"Li Ma, Xiaoyu Li, Jing Liao, Qi Zhang, Xuan Wang, Jue Wang, and Pedro V. Sander. 2021. Deblur-NeRF: Neural Radiance Fields from Blurry Images. arXiv preprint arXiv:2111.14292 (2021)."},{"key":"e_1_2_2_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/3414685.3417814"},{"key":"e_1_2_2_66_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-72627-9_18"},{"key":"e_1_2_2_67_1","volume-title":"Barron","author":"Mildenhall Ben","year":"2022","unstructured":"Ben Mildenhall, Peter Hedman, Ricardo Martin-Brualla, Pratul P. Srinivasan, and Jonathan T. Barron. 2022. NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images. CVPR (2022)."},{"key":"e_1_2_2_68_1","doi-asserted-by":"crossref","unstructured":"Ben Mildenhall Pratul P. Srinivasan Matthew Tancik Jonathan T. Barron Ravi Ramamoorthi and Ren Ng. 2020. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In ECCV.","DOI":"10.1007\/978-3-030-58452-8_24"},{"key":"e_1_2_2_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2012.2227726"},{"key":"e_1_2_2_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530127"},{"key":"e_1_2_2_71_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00581"},{"key":"e_1_2_2_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/3478513.3480487"},{"key":"e_1_2_2_73_1","unstructured":"Keunhong Park Philipp Henzler Ben Mildenhall Jonathan T. Barron and Ricardo Martin-Brualla. 2023. CamP: Camera Preconditioning for Neural Radiance Fields. ACM Trans. Graph. 
(2023)."},{"key":"e_1_2_2_74_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00414"},{"key":"e_1_2_2_75_1","doi-asserted-by":"publisher","DOI":"10.2312\/sr.20231122"},{"key":"e_1_2_2_76_1","volume-title":"FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion. arXiv preprint arXiv:2412.09626","author":"Qiu Haonan","year":"2024","unstructured":"Haonan Qiu, Shiwei Zhang, Yujie Wei, Ruihang Chu, Hangjie Yuan, Xiang Wang, Yingya Zhang, and Ziwei Liu. 2024. FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion. arXiv preprint arXiv:2412.09626 (2024)."},{"key":"e_1_2_2_77_1","volume-title":"Ultrapixel: Advancing ultra-high-resolution image synthesis to new peaks. arXiv preprint arXiv:2407.02158","author":"Ren Jingjing","year":"2024","unstructured":"Jingjing Ren, Wenbo Li, Haoyu Chen, Renjing Pei, Bin Shao, Yong Guo, Long Peng, Fenglong Song, and Lei Zhu. 2024. Ultrapixel: Advancing ultra-high-resolution image synthesis to new peaks. arXiv preprint arXiv:2407.02158 (2024)."},{"key":"e_1_2_2_78_1","doi-asserted-by":"publisher","DOI":"10.1145\/3618402"},{"key":"e_1_2_2_79_1","doi-asserted-by":"crossref","unstructured":"Robin Rombach Andreas Blattmann Dominik Lorenz Patrick Esser and Bj\u00f6rn Ommer. 2022. High-resolution image synthesis with latent diffusion models. In CVPR. 10684\u201310695.","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_2_2_80_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528223.3530122"},{"key":"e_1_2_2_81_1","unstructured":"Runway AI Inc. 2024. Gen-3 Alpha. https:\/\/runwayml.com\/research\/introducing-gen-3-alpha"},{"key":"e_1_2_2_82_1","doi-asserted-by":"crossref","unstructured":"Mehdi SM Sajjadi Raviteja Vemulapalli and Matthew Brown. 2018. Frame-recurrent video super-resolution. In CVPR. 
6626\u20136634.","DOI":"10.1109\/CVPR.2018.00693"},{"key":"e_1_2_2_83_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-73001-6_3"},{"key":"e_1_2_2_84_1","volume-title":"European Conference on Computer Vision (ECCV).","author":"Shen Yuan","year":"2024","unstructured":"Yuan Shen, Duygu Ceylan, Paul Guerrero, Zexiang Xu, Niloy J. Mitra, Shenlong Wang, and Anna Fr\u00fchst\u00fcck. 2024. SuperGaussian: Repurposing Video Models for 3D Super Resolution. In European Conference on Computer Vision (ECCV)."},{"key":"e_1_2_2_85_1","first-page":"36081","article-title":"Rethinking alignment in video super-resolution transformers","volume":"35","author":"Shi Shuwei","year":"2022","unstructured":"Shuwei Shi, Jinjin Gu, Liangbin Xie, Xintao Wang, Yujiu Yang, and Chao Dong. 2022. Rethinking alignment in video super-resolution transformers. NeurIPS 35 (2022), 36081\u201336093.","journal-title":"NeurIPS"},{"key":"e_1_2_2_86_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01954"},{"key":"e_1_2_2_87_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58536-5_24"},{"key":"e_1_2_2_88_1","volume-title":"Relay diffusion: Unifying diffusion process across resolutions for image synthesis. arXiv preprint arXiv:2309.03350","author":"Teng Jiayan","year":"2023","unstructured":"Jiayan Teng, Wendi Zheng, Ming Ding, Wenyi Hong, Jianqiao Wangni, Zhuoyi Yang, and Jie Tang. 2023. Relay diffusion: Unifying diffusion process across resolutions for image synthesis. arXiv preprint arXiv:2309.03350 (2023)."},{"key":"e_1_2_2_89_1","volume-title":"FVD: A new Metric for Video Generation. In DGS@ICLR. OpenReview.net","author":"Unterthiner Thomas","year":"2019","unstructured":"Thomas Unterthiner, Sjoerd van Steenkiste, Karol Kurach, Rapha\u00ebl Marinier, Marcin Michalski, and Sylvain Gelly. 2019. FVD: A new Metric for Video Generation. In DGS@ICLR. OpenReview.net. 
http:\/\/dblp.uni-trier.de\/db\/conf\/iclr\/dgs2019.html#UnterthinerSKMM19"},{"key":"e_1_2_2_90_1","doi-asserted-by":"publisher","DOI":"10.1145\/1360612.1360696"},{"key":"e_1_2_2_91_1","doi-asserted-by":"publisher","DOI":"10.1145\/1618452.1618520"},{"key":"e_1_2_2_92_1","volume-title":"Cinematic Gaussians: Real-Time HDR Radiance Fields with Depth. In Computer Graphics Forum","author":"Wang Chao","year":"2024","unstructured":"Chao Wang, Krzysztof Wolski, Bernhard Kerbl, Ana Serrano, Mojtaba Bemana, Hans-Peter Seidel, Karol Myszkowski, and Thomas Leimk\u00fchler. 2024b. Cinematic Gaussians: Real-Time HDR Radiance Fields with Depth. In Computer Graphics Forum, Vol. 43. Blackwell-Wiley, 1\u201313."},{"key":"e_1_2_2_93_1","volume-title":"Kelvin CK Chan, and Chen Change Loy","author":"Wang Jianyi","year":"2024","unstructured":"Jianyi Wang, Zongsheng Yue, Shangchen Zhou, Kelvin CK Chan, and Chen Change Loy. 2024d. Exploiting diffusion prior for real-world image super-resolution. IJCV (2024), 1\u201321."},{"key":"e_1_2_2_94_1","volume-title":"Lavie: High-quality video generation with cascaded latent diffusion models. arXiv preprint arXiv:2309.15103","author":"Wang Yaohui","year":"2023","unstructured":"Yaohui Wang, Xinyuan Chen, Xin Ma, Shangchen Zhou, Ziqi Huang, Yi Wang, Ceyuan Yang, Yinan He, Jiashuo Yu, Peiqing Yang, et al. 2023. Lavie: High-quality video generation with cascaded latent diffusion models. arXiv preprint arXiv:2309.15103 (2023)."},{"key":"e_1_2_2_95_1","doi-asserted-by":"publisher","DOI":"10.1145\/3658148"},{"key":"e_1_2_2_96_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.02437"},{"key":"e_1_2_2_97_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01920"},{"key":"e_1_2_2_98_1","volume-title":"Sanja Fidler, Zan Gojcic, and Huan Ling.","author":"Wu Jay Zhangjie","year":"2025","unstructured":"Jay Zhangjie Wu, Yuxuan Zhang, Haithem Turki, Xuanchi Ren, Jun Gao, Mike Zheng Shou, Sanja Fidler, Zan Gojcic, and Huan Ling. 2025. 
Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models. arXiv:2503.01774 [cs.CV]"},{"key":"e_1_2_2_99_1","volume-title":"Seesr: Towards semantics-aware real-world image super-resolution. In CVPR. 25456\u201325467.","author":"Wu Rongyuan","year":"2024","unstructured":"Rongyuan Wu, Tao Yang, Lingchen Sun, Zhengqiang Zhang, Shuai Li, and Lei Zhang. 2024a. Seesr: Towards semantics-aware real-world image super-resolution. In CVPR. 25456\u201325467."},{"key":"e_1_2_2_100_1","volume-title":"STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution. arXiv preprint arXiv:2501.02976","author":"Xie Rui","year":"2025","unstructured":"Rui Xie, Yinhong Liu, Penghao Zhou, Chen Zhao, Jun Zhou, Kai Zhang, Zhenyu Zhang, Jian Yang, Zhenheng Yang, and Ying Tai. 2025. STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution. arXiv preprint arXiv:2501.02976 (2025)."},{"key":"e_1_2_2_101_1","unstructured":"Gang Xu Jun Xu Zhen Li Liang Wang Xing Sun and Ming-Ming Cheng. 2021. Temporal modulation network for controllable space-time video super-resolution. In CVPR. 6388\u20136397."},{"key":"e_1_2_2_102_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00536"},{"key":"e_1_2_2_103_1","doi-asserted-by":"publisher","DOI":"10.1145\/3306346.3323007"},{"key":"e_1_2_2_104_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01893"},{"key":"e_1_2_2_105_1","doi-asserted-by":"publisher","DOI":"10.1145\/3687919"},{"key":"e_1_2_2_106_1","volume-title":"Pixel-aware stable diffusion for realistic image super-resolution and personalized stylization. arXiv preprint arXiv:2308.14469","author":"Yang Tao","year":"2023","unstructured":"Tao Yang, Rongyuan Wu, Peiran Ren, Xuansong Xie, and Lei Zhang. 2023. Pixel-aware stable diffusion for realistic image super-resolution and personalized stylization. 
arXiv preprint arXiv:2308.14469 (2023)."},{"key":"e_1_2_2_107_1","doi-asserted-by":"crossref","unstructured":"Xi Yang Chenhang He Jianqi Ma and Lei Zhang. 2024a. Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution. (2024).","DOI":"10.1007\/978-3-031-72784-9_13"},{"key":"e_1_2_2_108_1","volume-title":"Cogvideox: Text-to-video diffusion models with an expert transformer. arXiv preprint arXiv:2408.06072","author":"Yang Zhuoyi","year":"2024","unstructured":"Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, et al. 2024b. Cogvideox: Text-to-video diffusion models with an expert transformer. arXiv preprint arXiv:2408.06072 (2024)."},{"key":"e_1_2_2_109_1","volume-title":"International Conference on Learning Representations (ICLR).","author":"Yang Zeyu","year":"2024","unstructured":"Zeyu Yang, Hongye Yang, Zijie Pan, and Li Zhang. 2024c. Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting. International Conference on Learning Representations (ICLR)."},{"key":"e_1_2_2_110_1","first-page":"1","article-title":"gsplat: An open-source library for Gaussian splatting","volume":"26","author":"Ye Vickie","year":"2025","unstructured":"Vickie Ye, Ruilong Li, Justin Kerr, Matias Turkulainen, Brent Yi, Zhuoyang Pan, Otto Seiskari, Jianbo Ye, Jeffrey Hu, Matthew Tancik, and Angjoo Kanazawa. 2025. gsplat: An open-source library for Gaussian splatting. Journal of Machine Learning Research 26, 34 (2025), 1\u201317.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_2_111_1","doi-asserted-by":"crossref","unstructured":"Peng Yi Zhongyuan Wang Kui Jiang Junjun Jiang and Jiayi Ma. 2019. Progressive fusion video super-resolution network via exploiting non-local spatio-temporal correlations. In ICCV. 
3106\u20133115.","DOI":"10.1109\/ICCV.2019.00320"},{"key":"e_1_2_2_112_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01839"},{"key":"e_1_2_2_113_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACVW60836.2024.00059"},{"key":"e_1_2_2_114_1","volume-title":"Resshift: Efficient diffusion model for image super-resolution by residual shifting. NeurIPS 36","author":"Yue Zongsheng","year":"2024","unstructured":"Zongsheng Yue, Jianyi Wang, and Chen Change Loy. 2024. Resshift: Efficient diffusion model for image super-resolution by residual shifting. NeurIPS 36 (2024)."},{"key":"e_1_2_2_115_1","doi-asserted-by":"publisher","DOI":"10.1145\/3641519.3657445"},{"key":"e_1_2_2_116_1","doi-asserted-by":"crossref","unstructured":"Richard Zhang Phillip Isola Alexei A Efros Eli Shechtman and Oliver Wang. 2018. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR. 586\u2013595.","DOI":"10.1109\/CVPR.2018.00068"},{"key":"e_1_2_2_117_1","volume-title":"RealViformer: Investigating Attention for Real-World Video Super-Resolution. ECCV","author":"Zhang Yuehan","year":"2024","unstructured":"Yuehan Zhang and Angela Yao. 2024. RealViformer: Investigating Attention for Real-World Video Super-Resolution. ECCV (2024)."},{"key":"e_1_2_2_118_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00791"},{"key":"e_1_2_2_119_1","doi-asserted-by":"publisher","DOI":"10.1145\/3550454.3555451"},{"key":"e_1_2_2_120_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i7.28589"},{"key":"e_1_2_2_121_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01190"},{"key":"e_1_2_2_122_1","doi-asserted-by":"crossref","unstructured":"Shangchen Zhou Peiqing Yang Jianyi Wang Yihang Luo and Chen Change Loy. 2024. Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution. In CVPR. 
2535\u20132545.","DOI":"10.1109\/CVPR52733.2024.00245"}],"container-title":["ACM Transactions on Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3763336","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T21:11:45Z","timestamp":1764969105000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3763336"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12]]},"references-count":122,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["10.1145\/3763336"],"URL":"https:\/\/doi.org\/10.1145\/3763336","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"type":"print","value":"0730-0301"},{"type":"electronic","value":"1557-7368"}],"subject":[],"published":{"date-parts":[[2025,12]]},"assertion":[{"value":"2025-05-24","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-09","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-12-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}