{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T18:26:28Z","timestamp":1780424788750,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":75,"publisher":"ACM","funder":[{"name":"Natural Science Foundation of China","award":["62421003"],"award-info":[{"award-number":["62421003"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,8,10]]},"DOI":"10.1145\/3721238.3730611","type":"proceedings-article","created":{"date-parts":[[2025,7,23]],"date-time":"2025-07-23T08:40:47Z","timestamp":1753260047000},"page":"1-12","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Motion-example-controlled Co-speech Gesture Generation Leveraging Large Language Models"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-1036-7737","authenticated-orcid":false,"given":"Bohong","family":"Chen","sequence":"first","affiliation":[{"name":"State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-6558-4165","authenticated-orcid":false,"given":"Yumeng","family":"Li","sequence":"additional","affiliation":[{"name":"State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9120-9592","authenticated-orcid":false,"given":"Youyi","family":"Zheng","sequence":"additional","affiliation":[{"name":"State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8580-1103","authenticated-orcid":false,"given":"Yao-Xiang","family":"Ding","sequence":"additional","affiliation":[{"name":"State Key Lab of CAD&CG, Zhejiang 
University, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4243-6112","authenticated-orcid":false,"given":"Kun","family":"Zhou","sequence":"additional","affiliation":[{"name":"State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,7,27]]},"reference":[{"key":"e_1_3_3_2_2_1","doi-asserted-by":"crossref","unstructured":"Kfir Aberman Peizhuo Li Dani Lischinski Olga Sorkine-Hornung Daniel Cohen-Or and Baoquan Chen. 2020a. Skeleton-aware networks for deep motion retargeting. ACM Trans. Graph. 39 4 Article 62 (Aug. 2020) 14\u00a0pages.","DOI":"10.1145\/3386569.3392462"},{"key":"e_1_3_3_2_3_1","doi-asserted-by":"crossref","unstructured":"Kfir Aberman Yijia Weng Dani Lischinski Daniel Cohen-Or and Baoquan Chen. 2020b. Unpaired Motion Style Transfer from Video to Animation. ACM Transactions on Graphics (TOG) 39 4 (2020) 64.","DOI":"10.1145\/3386569.3392469"},{"key":"e_1_3_3_2_4_1","doi-asserted-by":"crossref","unstructured":"Simon Alexanderson Gustav\u00a0Eje Henter Taras Kucherenko and Jonas Beskow. 2020. Style-controllable speech-driven gesture synthesis using normalising flows. Computer Graphics Forum 39 2 (2020) 487\u2013496.","DOI":"10.1111\/cgf.13946"},{"key":"e_1_3_3_2_5_1","doi-asserted-by":"crossref","unstructured":"Simon Alexanderson Rajmund Nagy Jonas Beskow and Gustav\u00a0Eje Henter. 2023. Listen Denoise Action! Audio-Driven Motion Synthesis with Diffusion Models. ACM Trans. Graph. 42 4 Article 44 (July 2023) 20\u00a0pages.","DOI":"10.1145\/3592458"},{"key":"e_1_3_3_2_6_1","doi-asserted-by":"crossref","unstructured":"Tenglong Ao Qingzhe Gao Yuke Lou Baoquan Chen and Libin Liu. 2022. Rhythmic gesticulator: Rhythm-aware co-speech gesture synthesis with hierarchical neural embeddings. 
ACM Transactions on Graphics (TOG) 41 6 (2022) 1\u201319.","DOI":"10.1145\/3550454.3555435"},{"key":"e_1_3_3_2_7_1","unstructured":"Tenglong Ao Zeyi Zhang and Libin Liu. 2023. GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents. ACM Trans. Graph. (2023) 18\u00a0pages."},{"key":"e_1_3_3_2_8_1","volume-title":"arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2307.15818","author":"Brohan Anthony","year":"2023","unstructured":"Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse\u00a0Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alex Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Lisa Lee, Tsang-Wei\u00a0Edward Lee, Sergey Levine, Yao Lu, Henryk Michalewski, Igor Mordatch, Karl Pertsch, Kanishka Rao, Krista Reymann, Michael Ryoo, Grecia Salazar, Pannag Sanketi, Pierre Sermanet, Jaspiar Singh, Anikait Singh, Radu Soricut, Huong Tran, Vincent Vanhoucke, Quan Vuong, Ayzaan Wahid, Stefan Welker, Paul Wohlhart, Jialin Wu, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, and Brianna Zitkovich. 2023. RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control. In arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2307.15818."},{"key":"e_1_3_3_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/192161.192272"},{"key":"e_1_3_3_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/383259.383315"},{"key":"e_1_3_3_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3664647.3680847"},{"key":"e_1_3_3_2_12_1","volume-title":"arXiv","author":"Chen Changan","year":"2024","unstructured":"Changan Chen, Juze Zhang, Shrinidhi\u00a0Kowshika Lakshmikanth, Yusu Fang, Ruizhi Shao, Gordon Wetzstein, Li Fei-Fei, and Ehsan Adeli. 2024c. The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion. 
In arXiv."},{"key":"e_1_3_3_2_13_1","unstructured":"Ling-Hao Chen Shunlin Lu Ailing Zeng Hao Zhang Benyou Wang Ruimao Zhang and Lei Zhang. 2024b. MotionLLM: Understanding Human Behaviors from Human Motions and Videos. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2405.20340 (2024)."},{"key":"e_1_3_3_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3680528.3687677"},{"key":"e_1_3_3_2_15_1","unstructured":"Hyung\u00a0Won Chung Le Hou Shayne Longpre Barret Zoph Yi Tay William Fedus Yunxuan Li Xuezhi Wang Mostafa Dehghani Siddhartha Brahma et\u00a0al. 2024. Scaling instruction-finetuned language models. Journal of Machine Learning Research 25 70 (2024) 1\u201353."},{"key":"e_1_3_3_2_16_1","doi-asserted-by":"crossref","unstructured":"Sharice Clough and Melissa\u00a0C. Duff. 2020. The Role of Gesture in Communication and Cognition: Implications for Understanding and Treating Neurogenic Communication Disorders. Frontiers in Human Neuroscience 14 (2020).","DOI":"10.3389\/fnhum.2020.00323"},{"key":"e_1_3_3_2_17_1","doi-asserted-by":"crossref","unstructured":"Saeed Ghorbani Ylva Ferstl Daniel Holden Nikolaus\u00a0F. Troje and Marc-Andr\u00e9 Carbonneau. 2023. ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech. Computer Graphics Forum 42 1 (2023) 206\u2013216. arXiv:https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1111\/cgf.14734","DOI":"10.1111\/cgf.14734"},{"key":"e_1_3_3_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00361"},{"key":"e_1_3_3_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3641519.3657447"},{"key":"e_1_3_3_2_20_1","doi-asserted-by":"crossref","unstructured":"Chuan Guo Yuxuan Mu Muhammad\u00a0Gohar Javed Sen Wang and Li Cheng. 2023. MoMask: Generative Masked Modeling of 3D Human Motions. (2023). 
arxiv:https:\/\/arXiv.org\/abs\/2312.00063\u00a0[cs.CV]","DOI":"10.1109\/CVPR52733.2024.00186"},{"key":"e_1_3_3_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528233.3530750"},{"key":"e_1_3_3_2_22_1","doi-asserted-by":"crossref","unstructured":"Ikhsanul Habibie Weipeng Xu Dushyant Mehta Lingjie Liu Hans-Peter Seidel Gerard Pons-Moll Mohamed Elgharib and Christian Theobalt. 2021. Learning Speech-driven 3D Conversational Gestures from Video. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2102.06837 (2021).","DOI":"10.1145\/3472306.3478335"},{"key":"e_1_3_3_2_23_1","doi-asserted-by":"crossref","unstructured":"Fangzhou Hong Mingyuan Zhang Liang Pan Zhongang Cai Lei Yang and Ziwei Liu. 2022. AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars. ACM Transactions on Graphics (TOG) 41 4 (2022) 1\u201319.","DOI":"10.1145\/3528223.3530094"},{"key":"e_1_3_3_2_24_1","doi-asserted-by":"crossref","unstructured":"Wei-Ning Hsu Benjamin Bolte Yao-Hung\u00a0Hubert Tsai Kushal Lakhotia Ruslan Salakhutdinov and Abdelrahman Mohamed. 2021. Hubert: Self-supervised speech representation learning by masked prediction of hidden units. IEEE\/ACM transactions on audio speech and language processing 29 (2021) 3451\u20133460.","DOI":"10.1109\/TASLP.2021.3122291"},{"key":"e_1_3_3_2_25_1","unstructured":"Biao Jiang Xin Chen Wen Liu Jingyi Yu Gang Yu and Tao Chen. 2024. Motiongpt: Human motion as a foreign language. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_3_3_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00205"},{"key":"e_1_3_3_2_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/11821830_17"},{"key":"e_1_3_3_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3382507.3418815"},{"key":"e_1_3_3_2_29_1","doi-asserted-by":"crossref","unstructured":"Taras Kucherenko Pieter Wolfert Youngwoo Yoon Carla Viegas Teodor Nikolov Mihail Tsakov and Gustav\u00a0Eje Henter. 2024. 
Evaluating Gesture Generation in a Large-scale Open Challenge: The GENEA Challenge 2022. 43 3 Article 32 (jun 2024).","DOI":"10.1145\/3656374"},{"key":"e_1_3_3_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3600006.3613165"},{"key":"e_1_3_3_2_31_1","doi-asserted-by":"crossref","unstructured":"Jina Lee and Stacy Marsella. 2006. Nonverbal Behavior Generator for Embodied Conversational Agents(IVA \u201906). Springer 243\u2013255.","DOI":"10.1007\/11821830_20"},{"key":"e_1_3_3_2_32_1","doi-asserted-by":"crossref","unstructured":"Margot Lhommet Yuyu Xu and Stacy Marsella. 2015. Cerebella: Automatic Generation of Nonverbal Behavior for Virtual Humans(AAAI \u201915 1).","DOI":"10.1609\/aaai.v29i1.9778"},{"key":"e_1_3_3_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01110"},{"key":"e_1_3_3_2_34_1","doi-asserted-by":"crossref","unstructured":"Weiyu Li Xuelin Chen Peizhuo Li Olga Sorkine-Hornung and Baoquan Chen. 2023. Example-Based Motion Synthesis via Generative Motion Matching. ACM Transactions on Graphics (TOG) 42 4 Article 94 (2023).","DOI":"10.1145\/3592395"},{"key":"e_1_3_3_2_35_1","unstructured":"Shijia Liao Yuxuan Wang Tianyu Li Yifan Cheng Ruoyi Zhang Rongzhi Zhou and Yijin Xing. 2024. Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis. arxiv:https:\/\/arXiv.org\/abs\/2411.01156\u00a0[cs.SD]"},{"key":"e_1_3_3_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3548400"},{"key":"e_1_3_3_2_37_1","unstructured":"Haiyang Liu Xingchao Yang Tomoya Akiyama Yuantian Huang Qiaoge Li Shigeru Kuriyama and Takafumi Taketomi. 2024a. TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio Motion Embedding and Diffusion Interpolation. arxiv:https:\/\/arXiv.org\/abs\/2410.04221\u00a0[cs.CV]"},{"key":"e_1_3_3_2_38_1","unstructured":"Haiyang Liu Zihao Zhu Giorgio Becherini Yichen Peng Mingyang Su You Zhou Naoya Iwamoto Bo Zheng and Michael\u00a0J. Black. 2024b. 
EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Masked Audio Gesture Modeling. arxiv:https:\/\/arXiv.org\/abs\/2401.00374\u00a0[cs.CV]"},{"key":"e_1_3_3_2_39_1","unstructured":"Haiyang Liu Zihao Zhu Naoya Iwamoto Yichen Peng Zhengqing Li You Zhou Elif Bozkurt and Bo Zheng. 2022d. BEAT: A Large-Scale Semantic and Emotional Multi-Modal Dataset for Conversational Gestures Synthesis. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2203.05297 (2022)."},{"key":"e_1_3_3_2_40_1","first-page":"21386","volume-title":"Advances in Neural Information Processing Systems","author":"Liu Xian","year":"2022","unstructured":"Xian Liu, Qianyi Wu, Hang Zhou, Yuanqi Du, Wayne Wu, Dahua Lin, and Ziwei Liu. 2022b. Audio-Driven Co-Speech Gesture Video Generation. In Advances in Neural Information Processing Systems , S.\u00a0Koyejo, S.\u00a0Mohamed, A.\u00a0Agarwal, D.\u00a0Belgrave, K.\u00a0Cho, and A.\u00a0Oh (Eds.), Vol.\u00a035. Curran Associates, Inc., 21386\u201321399."},{"key":"e_1_3_3_2_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01021"},{"key":"e_1_3_3_2_42_1","doi-asserted-by":"crossref","unstructured":"Shuhong Lu Youngwoo Yoon and Andrew\u00a0W. Feng. 2023. Co-Speech Gesture Synthesis using Discrete Gesture Token Learning. 2023 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS) (2023) 9808\u20139815.","DOI":"10.1109\/IROS55552.2023.10342027"},{"key":"e_1_3_3_2_43_1","doi-asserted-by":"crossref","unstructured":"Mingshuang Luo Ruibing Hou Zhuo Li Hong Chang Zimo Liu Yaowei Wang and Shiguang Shan. 2024. M3GPT: An Advanced Multimodal Multitask Framework for Motion Comprehension and Generation. 
Advances in Neural Information Processing Systems (2024).","DOI":"10.52202\/079017-0879"},{"key":"e_1_3_3_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01000"},{"key":"e_1_3_3_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00101"},{"key":"e_1_3_3_2_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00925"},{"key":"e_1_3_3_2_47_1","doi-asserted-by":"crossref","unstructured":"Long Ouyang Jeffrey Wu Xu Jiang Diogo Almeida Carroll Wainwright Pamela Mishkin Chong Zhang Sandhini Agarwal Katarina Slama Alex Ray et\u00a0al. 2022. Training language models to follow instructions with human feedback. Advances in neural information processing systems 35 (2022) 27730\u201327744.","DOI":"10.52202\/068431-2011"},{"key":"e_1_3_3_2_48_1","doi-asserted-by":"crossref","unstructured":"Haozhou Pang Tianwei Ding Lanshan He Ming Tao Lu Zhang and Qi Gan. 2024. LLM Gesticulator: Leveraging Large Language Models for Scalable and Controllable Co-Speech Gesture Synthesis. arxiv:https:\/\/arXiv.org\/abs\/2410.10851\u00a0[cs.GR]","DOI":"10.1117\/12.3060395"},{"key":"e_1_3_3_2_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01123"},{"key":"e_1_3_3_2_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00870"},{"key":"e_1_3_3_2_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3680528.3687579"},{"key":"e_1_3_3_2_52_1","volume-title":"International Conference on Machine Learning (ICML)","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong\u00a0Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models from Natural Language Supervision. 
In International Conference on Machine Learning (ICML)."},{"key":"e_1_3_3_2_53_1","unstructured":"Colin Raffel Noam Shazeer Adam Roberts Katherine Lee Sharan Narang Michael Matena Yanqi Zhou Wei Li and Peter\u00a0J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research 21 140 (2020) 1\u201367."},{"key":"e_1_3_3_2_54_1","volume-title":"The Twelfth International Conference on Learning Representations","author":"Shafir Yoni","year":"2024","unstructured":"Yoni Shafir, Guy Tevet, Roy Kapon, and Amit\u00a0Haim Bermano. 2024. Human Motion Diffusion as a Generative Prior. In The Twelfth International Conference on Learning Representations."},{"key":"e_1_3_3_2_55_1","doi-asserted-by":"crossref","unstructured":"Min Shi Wenke Feng Lin Gao and Dengming Gao. 2024. Generating diverse clothed 3D human animations via a generative model. Computational Visual Media 10 2 (2024) 261\u2013277.","DOI":"10.1007\/s41095-022-0324-2"},{"key":"e_1_3_3_2_56_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20047-2_21"},{"key":"e_1_3_3_2_57_1","volume-title":"The Eleventh International Conference on Learning Representations","author":"Tevet Guy","year":"2023","unstructured":"Guy Tevet, Sigal Raab, Brian Gordon, Yoni Shafir, Daniel Cohen-or, and Amit\u00a0Haim Bermano. 2023. Human Motion Diffusion Model. In The Eleventh International Conference on Learning Representations."},{"key":"e_1_3_3_2_58_1","unstructured":"Weilin Wan Zhiyang Dou Taku Komura Wenping Wang Dinesh Jayaraman and Lingjie Liu. 2023. TLControl: Trajectory and Language Control for Human Motion Synthesis. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2311.17135 (2023)."},{"key":"e_1_3_3_2_59_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"e_1_3_3_2_60_1","doi-asserted-by":"crossref","unstructured":"Bowen Wu Chaoran Liu Carlos\u00a0Toshinori Ishi and Hiroshi Ishiguro. 2021. 
Modeling the conditional distribution of co-speech upper body gesture jointly using conditional-GAN and unrolled-GAN. Electronics 10 3 (2021) 228.","DOI":"10.3390\/electronics10030228"},{"key":"e_1_3_3_2_61_1","unstructured":"Yiming Xie Varun Jampani Lei Zhong Deqing Sun and Huaizu Jiang. 2023. OmniControl: Control Any Joint at Any Time for Human Motion Generation. arxiv:https:\/\/arXiv.org\/abs\/2310.08580"},{"key":"e_1_3_3_2_62_1","unstructured":"An Yang Baosong Yang Beichen Zhang Binyuan Hui Bo Zheng Bowen Yu Chengyuan Li Dayiheng Liu Fei Huang Haoran Wei Huan Lin Jian Yang Jianhong Tu Jianwei Zhang Jianxin Yang Jiaxi Yang Jingren Zhou Junyang Lin Kai Dang Keming Lu Keqin Bao Kexin Yang Le Yu Mei Li Mingfeng Xue Pei Zhang Qin Zhu Rui Men Runji Lin Tianhao Li Tingyu Xia Xingzhang Ren Xuancheng Ren Yang Fan Yang Su Yichang Zhang Yu Wan Yuqiong Liu Zeyu Cui Zhenru Zhang and Zihan Qiu. 2024. Qwen2.5 Technical Report. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2412.15115 (2024)."},{"key":"e_1_3_3_2_63_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2023\/650"},{"key":"e_1_3_3_2_64_1","doi-asserted-by":"crossref","unstructured":"Heyuan Yao Zhenhua Song Yuyang Zhou Tenglong Ao Baoquan Chen and Libin Liu. 2024. MoConVQ: Unified Physics-Based Motion Control via Scalable Discrete Representations. ACM Trans. Graph. 43 4 Article 144 (July 2024) 21\u00a0pages.","DOI":"10.1145\/3658137"},{"key":"e_1_3_3_2_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS47612.2022.9981117"},{"key":"e_1_3_3_2_66_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20065-6_41"},{"key":"e_1_3_3_2_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00053"},{"key":"e_1_3_3_2_68_1","doi-asserted-by":"crossref","unstructured":"Youngwoo Yoon Bok Cha Joo-Haeng Lee Minsu Jang Jaeyeon Lee Jaehong Kim and Geehyuk Lee. 2020. Speech gesture generation from the trimodal context of text audio and speaker identity. 
ACM Transactions on Graphics (TOG) 39 6 (2020) 1\u201316.","DOI":"10.1145\/3414685.3417838"},{"key":"e_1_3_3_2_69_1","unstructured":"Jun Zhan Junqi Dai Jiasheng Ye Yunhua Zhou Dong Zhang Zhigeng Liu Xin Zhang Ruibin Yuan Ge Zhang Linyang Li et\u00a0al. 2024. AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2402.12226 (2024)."},{"key":"e_1_3_3_2_70_1","doi-asserted-by":"crossref","unstructured":"Dong Zhang Shimin Li Xin Zhang Jun Zhan Pengyu Wang Yaqian Zhou and Xipeng Qiu. 2023a. SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities. arxiv:https:\/\/arXiv.org\/abs\/2305.11000\u00a0[cs.CL]","DOI":"10.18653\/v1\/2023.findings-emnlp.1055"},{"key":"e_1_3_3_2_71_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Zhang Jianrong","year":"2023","unstructured":"Jianrong Zhang, Yangsong Zhang, Xiaodong Cun, Shaoli Huang, Yong Zhang, Hongwei Zhao, Hongtao Lu, and Xi Shen. 2023b. T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_3_3_2_72_1","unstructured":"Mingyuan Zhang Zhongang Cai Liang Pan Fangzhou Hong Xinying Guo Lei Yang and Ziwei Liu. 2022. MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2208.15001 (2022)."},{"key":"e_1_3_3_2_73_1","first-page":"397","volume-title":"Computer Vision \u2013 ECCV 2024: 18th European Conference, Milan, Italy, September 29\u2013October 4, 2024, Proceedings, Part XIII","author":"Zhang Mingyuan","year":"2024","unstructured":"Mingyuan Zhang, Daisheng Jin, Chenyang Gu, Fangzhou Hong, Zhongang Cai, Jingfang Huang, Chongzhi Zhang, Xinying Guo, Lei Yang, Ying He, and Ziwei Liu. 2024b. Large Motion Model for Unified Multi-modal Motion Generation. 
In Computer Vision \u2013 ECCV 2024: 18th European Conference, Milan, Italy, September 29\u2013October 4, 2024, Proceedings, Part XIII (Milan, Italy). Springer-Verlag, Berlin, Heidelberg, 397\u2013421."},{"key":"e_1_3_3_2_74_1","doi-asserted-by":"crossref","unstructured":"Zeyi Zhang Tenglong Ao Yuyao Zhang Qingzhe Gao Chuan Lin Baoquan Chen and Libin Liu. 2024a. Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis. ACM Trans. Graph. (2024) 17\u00a0pages.","DOI":"10.1145\/3658134"},{"key":"e_1_3_3_2_75_1","doi-asserted-by":"publisher","DOI":"10.1145\/3536221.3558063"},{"key":"e_1_3_3_2_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00589"}],"event":{"name":"SIGGRAPH Conference Papers '25: Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers","location":"Vancouver BC Canada","acronym":"SIGGRAPH Conference Papers '25","sponsor":["SIGGRAPH ACM Special Interest Group on Computer Graphics and Interactive Techniques"]},"container-title":["Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3721238.3730611","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T14:58:24Z","timestamp":1774018704000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3721238.3730611"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,27]]},"references-count":75,"alternative-id":["10.1145\/3721238.3730611","10.1145\/3721238"],"URL":"https:\/\/doi.org\/10.1145\/3721238.3730611","relation":{},"subject":[],"published":{"date-parts":[[2025,7,27]]},"assertion":[{"value":"2025-07-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}