{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T15:56:15Z","timestamp":1775577375092,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":65,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,12,15]]},"DOI":"10.1145\/3757377.3763816","type":"proceedings-article","created":{"date-parts":[[2025,12,8]],"date-time":"2025-12-08T16:27:29Z","timestamp":1765211249000},"page":"1-12","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Shape-for-Motion: Precise and Consistent Video Editing With 3D Proxy"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0550-4788","authenticated-orcid":false,"given":"Yuhao","family":"Liu","sequence":"first","affiliation":[{"name":"City University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3435-8110","authenticated-orcid":false,"given":"Tengfei","family":"Wang","sequence":"additional","affiliation":[{"name":"Tencent, Shang Hai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5763-0172","authenticated-orcid":false,"given":"Fang","family":"Liu","sequence":"additional","affiliation":[{"name":"City University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0215-660X","authenticated-orcid":false,"given":"Zhenwei","family":"Wang","sequence":"additional","affiliation":[{"name":"City University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8957-8129","authenticated-orcid":false,"given":"Rynson W.H.","family":"Lau","sequence":"additional","affiliation":[{"name":"City University of Hong Kong, Hong Kong, Hong Kong"}]}],"member":"320","published-online":{"date-parts":[[2025,12,14]]},"reference":[{"key":"e_1_3_3_2_2_1","doi-asserted-by":"crossref","unstructured":"Radhakrishna Achanta Appu Shaji Kevin Smith Aurelien Lucchi Pascal Fua and Sabine S\u00fcsstrunk. 2012. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE transactions on pattern analysis and machine intelligence 34 11 (2012) 2274\u20132282.","DOI":"10.1109\/TPAMI.2012.120"},{"key":"e_1_3_3_2_3_1","first-page":"53","volume-title":"European Conference on Computer Vision","author":"Bahmani Sherwin","year":"2024","unstructured":"Sherwin Bahmani, Xian Liu, Wang Yifan, Ivan Skorokhodov, Victor Rong, Ziwei Liu, Xihui Liu, Jeong\u00a0Joon Park, Sergey Tulyakov, Gordon Wetzstein, et\u00a0al. 2024. Tc4d: Trajectory-conditioned text-to-4d generation. In European Conference on Computer Vision. Springer, 53\u201372."},{"key":"e_1_3_3_2_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19784-0_41"},{"key":"e_1_3_3_2_5_1","doi-asserted-by":"crossref","unstructured":"Weikang Bian Zhaoyang Huang Xiaoyu Shi Yijin Li Fu-Yun Wang and Hongsheng Li. 2025. Gs-dit: Advancing video generation with pseudo 4d gaussian fields through efficient dense 3d point tracking. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2501.02690 (2025).","DOI":"10.1109\/CVPR52734.2025.02023"},{"key":"e_1_3_3_2_6_1","unstructured":"Andreas Blattmann Tim Dockhorn Sumith Kulal Daniel Mendelevitch Maciej Kilian Dominik Lorenz Yam Levi Zion English Vikram Voleti Adam Letts et\u00a0al. 2023. Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2311.15127 (2023)."},{"key":"e_1_3_3_2_7_1","unstructured":"Tim Brooks Bill Peebles Connor Holmes Will DePue Yufei Guo Li Jing David Schnurr Joe Taylor Troy Luhman Eric Luhman Clarence Ng Ricky Wang and Aditya Ramesh. 2024. Video generation models as world simulators. (2024). https:\/\/openai.com\/research\/video-generation-models-as-world-simulators"},{"key":"e_1_3_3_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00727"},{"key":"e_1_3_3_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.02121"},{"key":"e_1_3_3_2_10_1","unstructured":"Weifeng Chen Yatai Ji Jie Wu Hefeng Wu Pan Xie Jiashi Li Xin Xia Xuefeng Xiao and Liang Lin. 2023. Control-a-video: Controllable text-to-video generation with diffusion models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2305.13840 (2023)."},{"key":"e_1_3_3_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3721238.3730695"},{"key":"e_1_3_3_2_12_1","first-page":"183","volume-title":"European Conference on Computer Vision","author":"Deng Yufan","year":"2024","unstructured":"Yufan Deng, Ruida Wang, Yuhao Zhang, Yu-Wing Tai, and Chi-Keung Tang. 2024. Dragvideo: Interactive drag-style video editing. In European Conference on Computer Vision. Springer, 183\u2013199."},{"key":"e_1_3_3_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00675"},{"key":"e_1_3_3_2_14_1","doi-asserted-by":"crossref","unstructured":"Xiang Fan Anand Bhattad and Ranjay Krishna. 2024. Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2403.14617 (2024).","DOI":"10.1007\/978-3-031-73254-6_14"},{"key":"e_1_3_3_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00728"},{"key":"e_1_3_3_2_16_1","unstructured":"Zekai Gu Rui Yan Jiahao Lu Peng Li Zhiyang Dou Chenyang Si Zhen Dong Qifeng Liu Cheng Lin Ziwei Liu et\u00a0al. 2025. Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2501.03847 (2025)."},{"key":"e_1_3_3_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3641519.3657407"},{"key":"e_1_3_3_2_18_1","unstructured":"Yuwei Guo Ceyuan Yang Anyi Rao Zhengyang Liang Yaohui Wang Yu Qiao Maneesh Agrawala Dahua Lin and Bo Dai. 2024a. AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning. International Conference on Learning Representations (2024)."},{"key":"e_1_3_3_2_19_1","unstructured":"Jonathan Ho William Chan Chitwan Saharia Jay Whang Ruiqi Gao Alexey Gritsenko Diederik\u00a0P Kingma Ben Poole Mohammad Norouzi David\u00a0J Fleet et\u00a0al. 2022. Imagen video: High definition video generation with diffusion models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2210.02303 (2022)."},{"key":"e_1_3_3_2_20_1","unstructured":"Jonathan Ho Ajay Jain and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in neural information processing systems 33 (2020) 6840\u20136851."},{"key":"e_1_3_3_2_21_1","unstructured":"Zhihao Hu and Dong Xu. 2023. Videocontrolnet: A motion-guided video-to-video translation framework by using diffusion model with controlnet. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2307.14073 (2023)."},{"key":"e_1_3_3_2_22_1","doi-asserted-by":"crossref","unstructured":"Tianyu Huang Wangguandong Zheng Tengfei Wang Yuhao Liu Zhenwei Wang Junta Wu Jie Jiang Hui Li Rynson\u00a0WH Lau Wangmeng Zuo et\u00a0al. 2025. Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2506.04225 (2025).","DOI":"10.1145\/3763330"},{"key":"e_1_3_3_2_23_1","unstructured":"Yanqin Jiang Chaohui Yu Chenjie Cao Fan Wang Weiming Hu and Jin Gao. 2024. Animate3d: Animating any 3d model with multi-view video diffusion. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2407.11398 (2024)."},{"key":"e_1_3_3_2_24_1","doi-asserted-by":"crossref","unstructured":"Mizuki Kagaya William Brendel Qingqing Deng Todd Kesterson Sinisa Todorovic Patrick\u00a0J Neill and Eugene Zhang. 2010. Video painting with space-time-varying style parameters. IEEE transactions on visualization and computer graphics 17 1 (2010) 74\u201387.","DOI":"10.1109\/TVCG.2010.25"},{"key":"e_1_3_3_2_25_1","doi-asserted-by":"crossref","unstructured":"Yoni Kasten Dolev Ofri Oliver Wang and Tali Dekel. 2021. Layered neural atlases for consistent video editing. ACM Transactions on Graphics (TOG) 40 6 (2021) 1\u201312.","DOI":"10.1145\/3478513.3480546"},{"key":"e_1_3_3_2_26_1","doi-asserted-by":"crossref","unstructured":"Bernhard Kerbl Georgios Kopanas Thomas Leimk\u00fchler and George Drettakis. 2023. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42 4 (2023) 139\u20131.","DOI":"10.1145\/3592433"},{"key":"e_1_3_3_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.01648"},{"key":"e_1_3_3_2_28_1","unstructured":"Max Ku Cong Wei Weiming Ren Huan Yang and Wenhu Chen. 2024. AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks. Transactions on Machine Learning Research (2024)."},{"key":"e_1_3_3_2_29_1","unstructured":"Black\u00a0Forest Labs. 2024. FLUX. https:\/\/github.com\/black-forest-labs\/flux."},{"key":"e_1_3_3_2_30_1","doi-asserted-by":"crossref","unstructured":"Samuli Laine Janne Hellsten Tero Karras Yeongho Seol Jaakko Lehtinen and Timo Aila. 2020. Modular primitives for high-performance differentiable rendering. ACM Transactions on Graphics (ToG) 39 6 (2020) 1\u201314.","DOI":"10.1145\/3414685.3417861"},{"key":"e_1_3_3_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.01713"},{"key":"e_1_3_3_2_32_1","unstructured":"Isabella Liu Hao Su and Xiaolong Wang. 2024b. Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2404.12379 (2024)."},{"key":"e_1_3_3_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.01650"},{"key":"e_1_3_3_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00402"},{"key":"e_1_3_3_2_35_1","unstructured":"Jonathon Luiten Georgios Kopanas Bastian Leibe and Deva Ramanan. 2023. Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2308.09713 (2023)."},{"key":"e_1_3_3_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW63382.2024.00150"},{"key":"e_1_3_3_2_37_1","unstructured":"Yue Ma Xiaodong Cun Yingqing He Chenyang Qi Xintao Wang Ying Shan Xiu Li and Qifeng Chen. 2023. Magicstick: Controllable video editing via control handle transformations. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2312.03047 (2023)."},{"key":"e_1_3_3_2_38_1","unstructured":"Oscar Michel Anand Bhattad Eli VanderBilt Ranjay Krishna Aniruddha Kembhavi and Tanmay Gupta. 2023. Object 3dit: Language-guided 3d-aware image editing. Advances in Neural Information Processing Systems 36 (2023) 3497\u20133516."},{"key":"e_1_3_3_2_39_1","doi-asserted-by":"crossref","unstructured":"Ben Mildenhall Pratul\u00a0P Srinivasan Matthew Tancik Jonathan\u00a0T Barron Ravi Ramamoorthi and Ren Ng. 2021. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 65 1 (2021) 99\u2013106.","DOI":"10.1145\/3503250"},{"key":"e_1_3_3_2_40_1","unstructured":"Chong Mou Mingdeng Cao Xintao Wang Zhaoyang Zhang Ying Shan and Jian Zhang. 2024. ReVideo: Remake a Video with Motion and Content Control. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2405.13865 (2024)."},{"key":"e_1_3_3_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3680528.3687656"},{"key":"e_1_3_3_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588432.3591500"},{"key":"e_1_3_3_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00735"},{"key":"e_1_3_3_2_44_1","unstructured":"Songyou Peng Chiyu Jiang Yiyi Liao Michael Niemeyer Marc Pollefeys and Andreas Geiger. 2021. Shape as points: A differentiable poisson solver. Advances in Neural Information Processing Systems 34 (2021) 13032\u201313044."},{"key":"e_1_3_3_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01018"},{"key":"e_1_3_3_2_46_1","volume-title":"ICML","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong\u00a0Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et\u00a0al. 2021. Learning transferable visual models from natural language supervision. In ICML."},{"key":"e_1_3_3_2_47_1","doi-asserted-by":"crossref","unstructured":"Jiawei Ren Cheng Xie Ashkan Mirzaei Karsten Kreis Ziwei Liu Antonio Torralba Sanja Fidler Seung\u00a0Wook Kim Huan Ling et\u00a0al. 2024. L4gm: Large 4d gaussian reconstruction model. Advances in Neural Information Processing Systems 37 (2024) 56828\u201356858.","DOI":"10.52202\/079017-1810"},{"key":"e_1_3_3_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_3_2_49_1","unstructured":"Ruoxi Shi Hansheng Chen Zhuoyang Zhang Minghua Liu Chao Xu Xinyue Wei Linghao Chen Chong Zeng and Hao Su. 2023a. Zero123++: a single image to consistent multi-view diffusion base model. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2310.15110 (2023)."},{"key":"e_1_3_3_2_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3641519.3657497"},{"key":"e_1_3_3_2_51_1","unstructured":"Yichun Shi Peng Wang Jianglong Ye Mai Long Kejie Li and Xiao Yang. 2023b. Mvdream: Multi-view diffusion for 3d generation. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2308.16512 (2023)."},{"key":"e_1_3_3_2_52_1","volume-title":"ICLR","author":"Song Jiaming","year":"2021","unstructured":"Jiaming Song, Chenlin Meng, and Stefano Ermon. 2021. Denoising Diffusion Implicit Models. In ICLR."},{"key":"e_1_3_3_2_53_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-73235-5_1"},{"key":"e_1_3_3_2_54_1","unstructured":"Yao Teng Enze Xie Yue Wu Haoyu Han Zhenguo Li and Xihui Liu. 2023. Drag-a-video: Non-rigid video editing with point-based interaction. arXiv (2023)."},{"key":"e_1_3_3_2_55_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-73232-4_25"},{"key":"e_1_3_3_2_56_1","volume-title":"arXiv","author":"Wang Tengfei","year":"2022","unstructured":"Tengfei Wang, Ting Zhang, Bo Zhang, Hao Ouyang, Dong Chen, Qifeng Chen, and Fang Wen. 2022. Pretraining is All You Need for Image-to-Image Translation. In arXiv."},{"key":"e_1_3_3_2_57_1","unstructured":"Zhenwei Wang Tengfei Wang Zexin He Gerhard Hancke Ziwei Liu and Rynson\u00a0WH Lau. 2025. Phidias: A generative model for creating 3d content from text image and 3d conditions with reference-augmented diffusion. ICLR (2025)."},{"key":"e_1_3_3_2_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01920"},{"key":"e_1_3_3_2_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00701"},{"key":"e_1_3_3_2_60_1","unstructured":"Rundi Wu Ruiqi Gao Ben Poole Alex Trevithick Changxi Zheng Jonathan\u00a0T Barron and Aleksander Holynski. 2024a. Cat4d: Create anything in 4d with multi-view video diffusion models. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2411.18613 (2024)."},{"key":"e_1_3_3_2_61_1","unstructured":"Yiming Xie Chun-Han Yao Vikram Voleti Huaizu Jiang and Varun Jampani. 2024. Sv4d: Dynamic 3d content generation with multi-frame and multi-view consistency. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2407.17470 (2024)."},{"key":"e_1_3_3_2_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3641519.3657481"},{"key":"e_1_3_3_2_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01922"},{"key":"e_1_3_3_2_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00406"},{"key":"e_1_3_3_2_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00355"},{"key":"e_1_3_3_2_66_1","unstructured":"Yabo Zhang Yuxiang Wei Dongsheng Jiang Xiaopeng Zhang Wangmeng Zuo and Qi Tian. 2023b. Controlvideo: Training-free controllable text-to-video generation. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2305.13077 (2023)."}],"event":{"name":"SA Conference Papers '25: SIGGRAPH Asia 2025 Conference Papers","location":"Hong Kong Hong Kong","acronym":"SA Conference Papers '25","sponsor":["SIGGRAPH ACM Special Interest Group on Computer Graphics and Interactive Techniques"]},"container-title":["Proceedings of the SIGGRAPH Asia 2025 Conference Papers"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3757377.3763816","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,9]],"date-time":"2025-12-09T03:20:38Z","timestamp":1765250438000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3757377.3763816"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,14]]},"references-count":65,"alternative-id":["10.1145\/3757377.3763816","10.1145\/3757377"],"URL":"https:\/\/doi.org\/10.1145\/3757377.3763816","relation":{},"subject":[],"published":{"date-parts":[[2025,12,14]]},"assertion":[{"value":"2025-12-14","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}