{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,24]],"date-time":"2026-07-24T14:54:21Z","timestamp":1784904861115,"version":"3.55.0"},"publisher-location":"New York, NY, USA","reference-count":42,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,10,9]],"date-time":"2023-10-09T00:00:00Z","timestamp":1696809600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Shenzhen Key Laboratory of next generation interactive media innovative technology","award":["ZDSYS20210623092001004"],"award-info":[{"award-number":["ZDSYS20210623092001004"]}]},{"name":"Shenzhen Science and Technology Program","award":["WDZC20200818121348001"],"award-info":[{"award-number":["WDZC20200818121348001"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62076144"],"award-info":[{"award-number":["62076144"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,10,9]]},"DOI":"10.1145\/3577190.3616114","type":"proceedings-article","created":{"date-parts":[[2023,10,7]],"date-time":"2023-10-07T22:30:48Z","timestamp":1696717848000},"page":"779-785","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":27,"title":["The DiffuseStyleGesture+ entry to the GENEA Challenge 2023"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0928-034X","authenticated-orcid":false,"given":"Sicheng","family":"Yang","sequence":"first","affiliation":[{"name":"Shenzhen International Graduate School, Tsinghua University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7318-9682","authenticated-orcid":false,"given":"Haiwei","family":"Xue","sequence":"additional","affiliation":[{"name":"Shenzhen International Graduate School, Tsinghua University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-7911-7564","authenticated-orcid":false,"given":"Zhensong","family":"Zhang","sequence":"additional","affiliation":[{"name":"Huawei Noah's Ark Lab, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1427-3507","authenticated-orcid":false,"given":"Minglei","family":"Li","sequence":"additional","affiliation":[{"name":"Huawei Cloud Computing Technologies Co., Ltd, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8533-0524","authenticated-orcid":false,"given":"Zhiyong","family":"Wu","sequence":"additional","affiliation":[{"name":"Shenzhen International Graduate School, Tsinghua University, China and The Chinese University of Hong Kong, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-0143-1485","authenticated-orcid":false,"given":"Xiaofei","family":"Wu","sequence":"additional","affiliation":[{"name":"Huawei Noah's Ark Lab, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0022-0906","authenticated-orcid":false,"given":"Songcen","family":"Xu","sequence":"additional","affiliation":[{"name":"Huawei Noah's Ark Lab, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-7723-4130","authenticated-orcid":false,"given":"Zonghong","family":"Dai","sequence":"additional","affiliation":[{"name":"Huawei Cloud Computing Technologies Co., Ltd, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2023,10,9]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01991"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSA.2005.851998"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00051"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3536221.3558060"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSTSP.2022.3188113"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-21996-7_17"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00941"},{"key":"e_1_3_2_1_8_1","volume-title":"Computer Graphics Forum, Vol.\u00a042","author":"Ghorbani Saeed","unstructured":"Saeed Ghorbani , Ylva Ferstl , Daniel Holden , Nikolaus\u00a0 F Troje , and Marc-Andr\u00e9 Carbonneau . 2023. ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech . In Computer Graphics Forum, Vol.\u00a042 . Wiley Online Library , 206\u2013216. Saeed Ghorbani, Ylva Ferstl, Daniel Holden, Nikolaus\u00a0F Troje, and Marc-Andr\u00e9 Carbonneau. 2023. ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech. In Computer Graphics Forum, Vol.\u00a042. Wiley Online Library, 206\u2013216."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00509"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3267851.3267878"},{"key":"e_1_3_2_1_11_1","first-page":"6840","article-title":"Denoising diffusion probabilistic models","volume":"33","author":"Ho Jonathan","year":"2020","unstructured":"Jonathan Ho , Ajay Jain , and Pieter Abbeel . 2020 . Denoising diffusion probabilistic models . Advances in Neural Information Processing Systems 33 (2020), 6840 \u2013 6851 . Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33 (2020), 6840\u20136851.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Peter\u00a0J Huber. 1992. Robust estimation of a location parameter. In Breakthroughs in statistics. 492\u2013518.  Peter\u00a0J Huber. 1992. Robust estimation of a location parameter. In Breakthroughs in statistics. 492\u2013518.","DOI":"10.1007\/978-1-4612-4380-9_35"},{"key":"e_1_3_2_1_13_1","volume-title":"FLAME: Free-form Language-based Motion Synthesis & Editing. CoRR abs\/2209.00349","author":"Kim Jihoon","year":"2022","unstructured":"Jihoon Kim , Jiseob Kim , and Sungjoon Choi . 2022 . FLAME: Free-form Language-based Motion Synthesis & Editing. CoRR abs\/2209.00349 (2022). https:\/\/doi.org\/10.48550\/arXiv.2209.00349 arXiv:2209.00349 10.48550\/arXiv.2209.00349 Jihoon Kim, Jiseob Kim, and Sungjoon Choi. 2022. FLAME: Free-form Language-based Motion Synthesis & Editing. CoRR abs\/2209.00349 (2022). https:\/\/doi.org\/10.48550\/arXiv.2209.00349 arXiv:2209.00349"},{"key":"e_1_3_2_1_14_1","volume-title":"Reformer: The efficient transformer. arXiv preprint arXiv:2001.04451","author":"Kitaev Nikita","year":"2020","unstructured":"Nikita Kitaev , \u0141ukasz Kaiser , and Anselm Levskaya . 2020 . Reformer: The efficient transformer. arXiv preprint arXiv:2001.04451 (2020). Nikita Kitaev, \u0141ukasz Kaiser, and Anselm Levskaya. 2020. Reformer: The efficient transformer. arXiv preprint arXiv:2001.04451 (2020)."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3308532.3329472"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3577190.3616120"},{"key":"e_1_3_2_1_17_1","volume-title":"Evaluating gesture-generation in a large-scale open challenge: The GENEA Challenge","author":"Kucherenko Taras","year":"2022","unstructured":"Taras Kucherenko , Pieter Wolfert , Youngwoo Yoon , Carla Viegas , Teodor Nikolov , Mihail Tsakov , and Gustav\u00a0Eje Henter . 2023. Evaluating gesture-generation in a large-scale open challenge: The GENEA Challenge 2022 . arXiv preprint arXiv:2303.08737 (2023). Taras Kucherenko, Pieter Wolfert, Youngwoo Yoon, Carla Viegas, Teodor Nikolov, Mihail Tsakov, and Gustav\u00a0Eje Henter. 2023. Evaluating gesture-generation in a large-scale open challenge: The GENEA Challenge 2022. arXiv preprint arXiv:2303.08737 (2023)."},{"key":"e_1_3_2_1_18_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision. 763\u2013772","author":"Lee Gilwoo","year":"2019","unstructured":"Gilwoo Lee , Zhiwei Deng , Shugao Ma , Takaaki Shiratori , Siddhartha\u00a0 S Srinivasa , and Yaser Sheikh . 2019 . Talking with hands 16.2 m: A large-scale dataset of synchronized body-finger motion and audio for conversational motion analysis and synthesis . In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 763\u2013772 . Gilwoo Lee, Zhiwei Deng, Shugao Ma, Takaaki Shiratori, Siddhartha\u00a0S Srinivasa, and Yaser Sheikh. 2019. Talking with hands 16.2 m: A large-scale dataset of synchronized body-finger motion and audio for conversational motion analysis and synthesis. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 763\u2013772."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01110"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01022"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20071-7_36"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01021"},{"key":"e_1_3_2_1_23_1","volume-title":"Decoupled Weight Decay Regularization. In 7th International Conference on Learning Representations, ICLR 2019","author":"Loshchilov Ilya","year":"2019","unstructured":"Ilya Loshchilov and Frank Hutter . 2019 . Decoupled Weight Decay Regularization. In 7th International Conference on Learning Representations, ICLR 2019 , New Orleans, LA, USA , May 6-9, 2019. OpenReview.net. https:\/\/openreview.net\/forum?id=Bkg6RiCqY7 Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net. https:\/\/openreview.net\/forum?id=Bkg6RiCqY7"},{"key":"e_1_3_2_1_24_1","volume-title":"Computer Graphics Forum, Vol.\u00a042","author":"Nyatsanga Simbarashe","unstructured":"Simbarashe Nyatsanga , Taras Kucherenko , Chaitanya Ahuja , Gustav\u00a0Eje Henter , and Michael Neff . 2023. A Comprehensive Review of Data-Driven Co-Speech Gesture Generation . In Computer Graphics Forum, Vol.\u00a042 . Wiley Online Library , 569\u2013596. Simbarashe Nyatsanga, Taras Kucherenko, Chaitanya Ahuja, Gustav\u00a0Eje Henter, and Michael Neff. 2023. A Comprehensive Review of Data-Driven Co-Speech Gesture Generation. In Computer Graphics Forum, Vol.\u00a042. Wiley Online Library, 569\u2013596."},{"key":"e_1_3_2_1_25_1","volume-title":"EmotionGesture: Audio-Driven Diverse Emotional Co-Speech 3D Gesture Generation. arXiv preprint arXiv:2305.18891","author":"Qi Xingqun","year":"2023","unstructured":"Xingqun Qi , Chen Liu , Lincheng Li , Jie Hou , Haoran Xin , and Xin Yu. 2023. EmotionGesture: Audio-Driven Diverse Emotional Co-Speech 3D Gesture Generation. arXiv preprint arXiv:2305.18891 ( 2023 ). Xingqun Qi, Chen Liu, Lincheng Li, Jie Hou, Haoran Xin, and Xin Yu. 2023. EmotionGesture: Audio-Driven Diverse Emotional Co-Speech 3D Gesture Generation. arXiv preprint arXiv:2305.18891 (2023)."},{"key":"e_1_3_2_1_26_1","volume-title":"Passing a non-verbal turing test: Evaluating gesture animations generated from speech. In 2021 IEEE Virtual Reality and 3D User Interfaces (VR)","author":"Rebol Manuel","unstructured":"Manuel Rebol , Christian G\u00fcti , and Krzysztof Pietroszek . 2021. Passing a non-verbal turing test: Evaluating gesture animations generated from speech. In 2021 IEEE Virtual Reality and 3D User Interfaces (VR) . IEEE , 573\u2013581. Manuel Rebol, Christian G\u00fcti, and Krzysztof Pietroszek. 2021. Passing a non-verbal turing test: Evaluating gesture animations generated from speech. In 2021 IEEE Virtual Reality and 3D User Interfaces (VR). IEEE, 573\u2013581."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00353"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01077"},{"key":"e_1_3_2_1_29_1","volume-title":"Human Motion Diffusion Model. In The Eleventh International Conference on Learning Representations, ICLR 2023","author":"Tevet Guy","year":"2023","unstructured":"Guy Tevet , Sigal Raab , Brian Gordon , Yonatan Shafir , Daniel Cohen-Or , and Amit\u00a0Haim Bermano . 2023 . Human Motion Diffusion Model. In The Eleventh International Conference on Learning Representations, ICLR 2023 , Kigali, Rwanda , May 1-5, 2023. OpenReview.net. https:\/\/openreview.net\/pdf?id=SJ1kSyO2jwu Guy Tevet, Sigal Raab, Brian Gordon, Yonatan Shafir, Daniel Cohen-Or, and Amit\u00a0Haim Bermano. 2023. Human Motion Diffusion Model. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net. https:\/\/openreview.net\/pdf?id=SJ1kSyO2jwu"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00051"},{"key":"e_1_3_2_1_31_1","volume-title":"Attention is all you need. Advances in neural information processing systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan\u00a0 N Gomez , \u0141ukasz Kaiser , and Illia Polosukhin . 2017. Attention is all you need. Advances in neural information processing systems 30 ( 2017 ). Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan\u00a0N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/THMS.2022.3149173"},{"key":"e_1_3_2_1_33_1","volume-title":"DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models. arXiv preprint arXiv:2305.04919","author":"Yang Sicheng","year":"2023","unstructured":"Sicheng Yang , Zhiyong Wu , Minglei Li , Zhensong Zhang , Lei Hao , Weihong Bao , Ming Cheng , and Long Xiao . 2023. DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models. arXiv preprint arXiv:2305.04919 ( 2023 ). Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Ming Cheng, and Long Xiao. 2023. DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models. arXiv preprint arXiv:2305.04919 (2023)."},{"key":"e_1_3_2_1_34_1","volume-title":"QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR. IEEE, 2321\u20132330","author":"Yang Sicheng","year":"2023","unstructured":"Sicheng Yang , Zhiyong Wu , Minglei Li , Zhensong Zhang , Lei Hao , Weihong Bao , and Haolin Zhuang . 2023 . QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR. IEEE, 2321\u20132330 . Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, and Haolin Zhuang. 2023. QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR. IEEE, 2321\u20132330."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3536221.3558066"},{"key":"e_1_3_2_1_36_1","volume-title":"EMoG: Synthesizing Emotive Co-speech 3D Gesture with Diffusion Model. arXiv preprint arXiv:2306.11496","author":"Yin Lianying","year":"2023","unstructured":"Lianying Yin , Yijun Wang , Tianyu He , Jinming Liu , Wei Zhao , Bohan Li , Xin Jin , and Jianxin Lin . 2023. EMoG: Synthesizing Emotive Co-speech 3D Gesture with Diffusion Model. arXiv preprint arXiv:2306.11496 ( 2023 ). Lianying Yin, Yijun Wang, Tianyu He, Jinming Liu, Wei Zhao, Bohan Li, Xin Jin, and Jianxin Lin. 2023. EMoG: Synthesizing Emotive Co-speech 3D Gesture with Diffusion Model. arXiv preprint arXiv:2306.11496 (2023)."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3414685.3417838"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8793720"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3536221.3558058"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3272127.3275090"},{"key":"e_1_3_2_1_41_1","volume-title":"Motiondiffuse: Text-driven human motion generation with diffusion model. arXiv preprint arXiv:2208.15001","author":"Zhang Mingyuan","year":"2022","unstructured":"Mingyuan Zhang , Zhongang Cai , Liang Pan , Fangzhou Hong , Xinying Guo , Lei Yang , and Ziwei Liu . 2022 . Motiondiffuse: Text-driven human motion generation with diffusion model. arXiv preprint arXiv:2208.15001 (2022). Mingyuan Zhang, Zhongang Cai, Liang Pan, Fangzhou Hong, Xinying Guo, Lei Yang, and Ziwei Liu. 2022. Motiondiffuse: Text-driven human motion generation with diffusion model. arXiv preprint arXiv:2208.15001 (2022)."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP49357.2023.10095203"}],"event":{"name":"ICMI '23: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION","location":"Paris France","acronym":"ICMI '23","sponsor":["SIGCHI ACM Special Interest Group on Computer-Human Interaction"]},"container-title":["INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3577190.3616114","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3577190.3616114","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:02Z","timestamp":1750178222000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3577190.3616114"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,9]]},"references-count":42,"alternative-id":["10.1145\/3577190.3616114","10.1145\/3577190"],"URL":"https:\/\/doi.org\/10.1145\/3577190.3616114","relation":{},"subject":[],"published":{"date-parts":[[2023,10,9]]},"assertion":[{"value":"2023-10-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}