{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T15:48:41Z","timestamp":1774021721511,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":55,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,8,10]]},"DOI":"10.1145\/3721238.3730683","type":"proceedings-article","created":{"date-parts":[[2025,7,23]],"date-time":"2025-07-23T08:40:47Z","timestamp":1753260047000},"page":"1-11","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-8067-4025","authenticated-orcid":false,"given":"Shiyi","family":"Zhang","sequence":"first","affiliation":[{"name":"Tsinghua University, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1642-0750","authenticated-orcid":false,"given":"Junhao","family":"Zhuang","sequence":"additional","affiliation":[{"name":"Tencent, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-5583-6454","authenticated-orcid":false,"given":"Zhaoyang","family":"Zhang","sequence":"additional","affiliation":[{"name":"Tencent, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7673-8325","authenticated-orcid":false,"given":"Ying","family":"Shan","sequence":"additional","affiliation":[{"name":"Tencent, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1534-4549","authenticated-orcid":false,"given":"Yansong","family":"Tang","sequence":"additional","affiliation":[{"name":"Tsinghua University, Shenzhen, China"}]}],"member":"320","published-online":{"date-parts":[[2025,7,27]]},"reference":[{"key":"e_1_3_3_2_2_1","doi-asserted-by":"crossref","unstructured":"Omer Bar-Tal Hila Chefer Omer Tov Charles Herrmann Roni Paiss 
Shiran Zada Ariel Ephrat Junhwa Hur Guanghui Liu Amit Raj et\u00a0al. 2024. Lumiere: A space-time diffusion model for video generation. arXiv preprint arXiv:2401.12945 (2024).","DOI":"10.1145\/3680528.3687614"},{"key":"e_1_3_3_2_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02161"},{"key":"e_1_3_3_2_4_1","unstructured":"Tim Brooks Bill Peebles Connor Holmes Will DePue Yufei Guo Li Jing David Schnurr Joe Taylor Troy Luhman Eric Luhman Clarence Ng Ricky Wang and Aditya Ramesh. 2024. Video generation models as world simulators. (2024). https:\/\/openai.com\/research\/video-generation-models-as-world-simulators"},{"key":"e_1_3_3_2_5_1","volume-title":"Forty-first International Conference on Machine Learning","author":"Bruce Jake","year":"2024","unstructured":"Jake Bruce, Michael\u00a0D Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, et\u00a0al. 2024. Genie: Generative interactive environments. In Forty-first International Conference on Machine Learning."},{"key":"e_1_3_3_2_6_1","unstructured":"Haoxin Chen Menghan Xia Yingqing He Yong Zhang Xiaodong Cun Shaoshu Yang Jinbo Xing Yaofang Liu Qifeng Chen Xintao Wang et\u00a0al. 2023b. VideoCrafter1: Open Diffusion Models for High-Quality Video Generation. arXiv preprint arXiv:2310.19512 (2023)."},{"key":"e_1_3_3_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00698"},{"key":"e_1_3_3_2_8_1","unstructured":"Weifeng Chen Jie Wu Pan Xie Hefeng Wu Jiashi Li Xin Xia Xuefeng Xiao and Liang Lin. 2023a. Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models. arXiv preprint arXiv:2305.13840 (2023)."},{"key":"e_1_3_3_2_9_1","volume-title":"ICML","author":"Dao Tri","year":"2024","unstructured":"Tri Dao and Albert Gu. 2024. 
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality. In ICML."},{"key":"e_1_3_3_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00675"},{"key":"e_1_3_3_2_11_1","volume-title":"Forty-first international conference on machine learning","author":"Esser Patrick","year":"2024","unstructured":"Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas M\u00fcller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et\u00a0al. 2024. Scaling rectified flow transformers for high-resolution image synthesis. In Forty-first international conference on machine learning."},{"key":"e_1_3_3_2_12_1","unstructured":"Rinon Gal Yuval Alaluf Yuval Atzmon Or Patashnik Amit\u00a0H Bermano Gal Chechik and Daniel Cohen-Or. 2022. An image is worth one word: Personalizing text-to-image generation using textual inversion. arXiv preprint arXiv:2208.01618 (2022)."},{"key":"e_1_3_3_2_13_1","doi-asserted-by":"crossref","unstructured":"Ian Goodfellow Jean Pouget-Abadie Mehdi Mirza Bing Xu David Warde-Farley Sherjil Ozair Aaron Courville and Yoshua Bengio. 2020. Generative adversarial networks. Commun. ACM (2020).","DOI":"10.1145\/3422622"},{"key":"e_1_3_3_2_14_1","unstructured":"Yuwei Guo Ceyuan Yang Anyi Rao Yaohui Wang Yu Qiao Dahua Lin and Bo Dai. 2023. Animatediff: Animate your personalized text-to-image diffusion models without specific tuning. arXiv preprint arXiv:2307.04725 (2023)."},{"key":"e_1_3_3_2_15_1","unstructured":"Yingqing He Tianyu Yang Yong Zhang Ying Shan and Qifeng Chen. 2022. Latent Video Diffusion Models for High-Fidelity Long Video Generation. (2022). arXiv:2211.13221\u00a0[cs.CV]"},{"key":"e_1_3_3_2_16_1","unstructured":"Jonathan Ho Ajay Jain and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Proc. 
NeurIPS 33 (2020) 6840\u20136851."},{"key":"e_1_3_3_2_17_1","unstructured":"Edward\u00a0J Hu Yelong Shen Phillip Wallis Zeyuan Allen-Zhu Yuanzhi Li Shean Wang Lu Wang and Weizhu Chen. 2021. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685 (2021)."},{"key":"e_1_3_3_2_18_1","first-page":"8153","volume-title":"Proc. CVPR","author":"Hu Li","year":"2024","unstructured":"Li Hu. 2024. Animate anyone: Consistent and controllable image-to-video synthesis for character animation. In Proc. CVPR. 8153\u20138163."},{"key":"e_1_3_3_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00233"},{"key":"e_1_3_3_2_20_1","doi-asserted-by":"crossref","unstructured":"Hyeonho Jeong Jinho Chang Geon\u00a0Yeong Park and Jong\u00a0Chul Ye. 2024. DreamMotion: Space-Time Self-Similarity Score Distillation for Zero-Shot Video Editing. arXiv preprint arXiv:2403.12002 (2024).","DOI":"10.1007\/978-3-031-73404-5_21"},{"key":"e_1_3_3_2_21_1","unstructured":"Hyeonho Jeong Geon\u00a0Yeong Park and Jong\u00a0Chul Ye. 2023. VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models. arXiv preprint arXiv:2312.00845 (2023)."},{"key":"e_1_3_3_2_22_1","unstructured":"Xuan Ju Yiming Gao Zhaoyang Zhang Ziyang Yuan Xintao Wang Ailing Zeng Yu Xiong Qiang Xu and Ying Shan. 2024. Miradata: A large-scale video dataset with long durations and structured captions. arXiv preprint arXiv:2407.06358 (2024)."},{"key":"e_1_3_3_2_23_1","unstructured":"Nikita Karaev Ignacio Rocco Benjamin Graham Natalia Neverova Andrea Vedaldi and Christian Rupprecht. 2023. Cotracker: It is better to track together. 
arXiv preprint arXiv:2307.07635 (2023)."},{"key":"e_1_3_3_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV57701.2024.00532"},{"key":"e_1_3_3_2_25_1","unstructured":"Pengyang Ling Jiazi Bu Pan Zhang Xiaoyi Dong Yuhang Zang Tong Wu Huaian Chen Jiaqi Wang and Yi Jin. 2024. MotionClone: Training-Free Motion Cloning for Controllable Video Generation. arXiv preprint arXiv:2406.05338 (2024)."},{"key":"e_1_3_3_2_26_1","unstructured":"Yu Lu Yuanzhi Liang Linchao Zhu and Yi Yang. 2024. Freelong: Training-free long video generation with spectralblend temporal attention. arXiv preprint arXiv:2407.19918 (2024)."},{"key":"e_1_3_3_2_27_1","unstructured":"Xin Ma Yaohui Wang Gengyun Jia Xinyuan Chen Ziwei Liu Yuan-Fang Li Cunjian Chen and Yu Qiao. 2024. Latte: Latent diffusion transformer for video generation. arXiv preprint arXiv:2401.03048 (2024)."},{"key":"e_1_3_3_2_28_1","unstructured":"Bohao Peng Jian Wang Yuechen Zhang Wenbo Li Ming-Chang Yang and Jiaya Jia. 2024. ControlNeXt: Powerful and Efficient Control for Image and Video Generation. arXiv preprint arXiv:2408.06070 (2024)."},{"key":"e_1_3_3_2_29_1","unstructured":"Haonan Qiu Menghan Xia Yong Zhang Yingqing He Xintao Wang Ying Shan and Ziwei Liu. 2023. Freenoise: Tuning-free longer video diffusion via noise rescheduling. arXiv preprint arXiv:2310.15169 (2023)."},{"key":"e_1_3_3_2_30_1","first-page":"8748","volume-title":"Proc. ICML","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong\u00a0Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et\u00a0al. 2021. Learning transferable visual models from natural language supervision. In Proc. ICML. 
PMLR, 8748\u20138763."},{"key":"e_1_3_3_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00453"},{"key":"e_1_3_3_2_32_1","volume-title":"NeurIPS","author":"Siarohin Aliaksandr","year":"2019","unstructured":"Aliaksandr Siarohin, St\u00e9phane Lathuili\u00e8re, Sergey Tulyakov, Elisa Ricci, and Nicu Sebe. 2019. First Order Motion Model for Image Animation. In NeurIPS."},{"key":"e_1_3_3_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01344"},{"key":"e_1_3_3_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00753"},{"key":"e_1_3_3_2_35_1","unstructured":"Shuyuan Tu Zhen Xing Xintong Han Zhi-Qi Cheng Qi Dai Chong Luo and Zuxuan Wu. 2024b. StableAnimator: High-Quality Identity-Preserving Human Image Animation. arXiv preprint arXiv:2411.17697 (2024)."},{"key":"e_1_3_3_2_36_1","unstructured":"Jiuniu Wang Hangjie Yuan Dayou Chen Yingya Zhang Xiang Wang and Shiwei Zhang. 2023c. Modelscope text-to-video technical report. arXiv preprint arXiv:2308.06571 (2023)."},{"key":"e_1_3_3_2_37_1","unstructured":"Luozhou Wang Ziyang Mai Guibao Shen Yixun Liang Xin Tao Pengfei Wan Di Zhang Yijun Li and Yingcong Chen. 2024b. Motion inversion for video customization. arXiv preprint arXiv:2403.20193 (2024)."},{"key":"e_1_3_3_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00891"},{"key":"e_1_3_3_2_39_1","unstructured":"Wenjing Wang Huan Yang Zixi Tuo Huiguo He Junchen Zhu Jianlong Fu and Jiaying Liu. 2023b. VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation. arXiv preprint arXiv:2305.10874 (2023)."},{"key":"e_1_3_3_2_40_1","unstructured":"Xiang Wang Hangjie Yuan Shiwei Zhang Dayou Chen Jiuniu Wang Yingya Zhang Yujun Shen Deli Zhao and Jingren Zhou. 2023d. VideoComposer: Compositional Video Synthesis with Motion Controllability. 
arXiv preprint arXiv:2306.02018 (2023)."},{"key":"e_1_3_3_2_41_1","unstructured":"Xiang Wang Shiwei Zhang Changxin Gao Jiayu Wang Xiaoqiang Zhou Yingya Zhang Luxin Yan and Nong Sang. 2024c. UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation. arXiv preprint arXiv:2406.01188 (2024)."},{"key":"e_1_3_3_2_42_1","unstructured":"Yaohui Wang Xinyuan Chen Xin Ma Shangchen Zhou Ziqi Huang Yi Wang Ceyuan Yang Yinan He Jiashuo Yu Peiqing Yang et\u00a0al. 2023a. LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models. arXiv preprint arXiv:2309.15103 (2023)."},{"key":"e_1_3_3_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00701"},{"key":"e_1_3_3_2_44_1","unstructured":"Tianxing Wu Chenyang Si Yuming Jiang Ziqi Huang and Ziwei Liu. 2023b. Freeinit: Bridging initialization gap in video diffusion models. arXiv preprint arXiv:2312.07537 (2023)."},{"key":"e_1_3_3_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00147"},{"key":"e_1_3_3_2_46_1","unstructured":"Zhuoyi Yang Jiayan Teng Wendi Zheng Ming Ding Shiyu Huang Jiazheng Xu Yuanming Yang Wenyi Hong Xiaohan Zhang Guanyu Feng et\u00a0al. 2024. CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer. arXiv preprint arXiv:2408.06072 (2024)."},{"key":"e_1_3_3_2_47_1","unstructured":"Danah Yatim Rafail Fridman Omer\u00a0Bar Tal Yoni Kasten and Tali Dekel. 2023. Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer. arXiv preprint arXiv:2311.17009 (2023)."},{"key":"e_1_3_3_2_48_1","unstructured":"Hu Ye Jun Zhang Sibo Liu Xiao Han and Wei Yang. 2023a. IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models. 
arXiv preprint arXiv:2308.06721 (2023)."},{"key":"e_1_3_3_2_49_1","unstructured":"Hu Ye Jun Zhang Sibo Liu Xiao Han and Wei Yang. 2023b. Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv preprint arXiv:2308.06721 (2023)."},{"key":"e_1_3_3_2_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00618"},{"key":"e_1_3_3_2_51_1","unstructured":"David\u00a0Junhao Zhang Jay\u00a0Zhangjie Wu Jia-Wei Liu Rui Zhao Lingmin Ran Yuchao Gu Difei Gao and Mike\u00a0Zheng Shou. 2024b. Show-1: Marrying pixel and latent diffusion models for text-to-video generation. Int. J. Comput. Vis. (2024) 1\u201315."},{"key":"e_1_3_3_2_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00355"},{"key":"e_1_3_3_2_53_1","unstructured":"Yuang Zhang Jiaxi Gu Li-Wen Wang Han Wang Junqi Cheng Yuefeng Zhu and Fangyuan Zou. 2024a. Mimicmotion: High-quality human motion video generation with confidence-aware pose guidance. arXiv preprint arXiv:2406.19680 (2024)."},{"key":"e_1_3_3_2_54_1","unstructured":"Rui Zhao Yuchao Gu Jay\u00a0Zhangjie Wu David\u00a0Junhao Zhang Jiawei Liu Weijia Wu Jussi Keppo and Mike\u00a0Zheng Shou. 2023. MotionDirector: Motion Customization of Text-to-Video Diffusion Models. arXiv:2310.08465\u00a0[cs.CV]"},{"key":"e_1_3_3_2_55_1","unstructured":"Yuan Zhou Qiuyue Wang Yuxuan Cai and Huan Yang. 2024. Allegro: Open the Black Box of Commercial-Level Video Generation Model. arXiv preprint arXiv:2410.15458 (2024)."},{"key":"e_1_3_3_2_56_1","volume-title":"ECCV","author":"Zhu Shenhao","year":"2024","unstructured":"Shenhao Zhu, Junming\u00a0Leo Chen, Zuozhuo Dai, Yinghui Xu, Xun Cao, Yao Yao, Hao Zhu, and Siyu Zhu. 2024. Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance. 
In ECCV."}],"event":{"name":"SIGGRAPH Conference Papers '25: Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers","location":"Vancouver BC Canada","acronym":"SIGGRAPH Conference Papers '25","sponsor":["SIGGRAPH ACM Special Interest Group on Computer Graphics and Interactive Techniques"]},"container-title":["Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3721238.3730683","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T14:55:45Z","timestamp":1774018545000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3721238.3730683"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,27]]},"references-count":55,"alternative-id":["10.1145\/3721238.3730683","10.1145\/3721238"],"URL":"https:\/\/doi.org\/10.1145\/3721238.3730683","relation":{},"subject":[],"published":{"date-parts":[[2025,7,27]]},"assertion":[{"value":"2025-07-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}