{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,15]],"date-time":"2025-08-15T01:28:58Z","timestamp":1755221338646,"version":"3.43.0"},"reference-count":79,"publisher":"Association for Computing Machinery (ACM)","issue":"4","funder":[{"name":"Innovation and Technology Fund, HKSAR","award":["ITS\/319\/21FP"],"award-info":[{"award-number":["ITS\/319\/21FP"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Comput. Graph. Interact. Tech."],"published-print":{"date-parts":[[2025,8,31]]},"abstract":"<jats:p>We address the problem of accurate capture of interactive behaviors between two people in daily scenarios. Most previous works either only consider one person or solely focus on conversational gestures of two people, assuming the body orientation and\/or position of each actor are constant or barely change over each interaction. In contrast, we propose to simultaneously model two people\u2019s activities, and target objective-driven, dynamic, and semantically consistent interactions which often span longer duration and cover bigger space. To this end, we capture a new multi-modal dataset dubbed InterAct, which is composed of 241 motion sequences where two people perform a realistic and coherent scenario for one minute or longer over a complete interaction. For each sequence, two actors are assigned different roles and emotion labels, and collaborate to finish one task or conduct a common interaction activity. The audios, body motions, and facial expressions of both persons are captured. InterAct contains diverse and complex motions of individuals and interesting and relatively long-term interaction patterns barely seen before. We also demonstrate a simple yet effective diffusion-based method that estimates interactive face expressions and body motions of two people from speech inputs. Our method regresses the body motions in a hierarchical manner, and we also propose a novel fine-tuning mechanism to improve the lip accuracy of facial expressions. 
To facilitate further research, the data and code will be made public.<\/jats:p>","DOI":"10.1145\/3747871","type":"journal-article","created":{"date-parts":[[2025,8,8]],"date-time":"2025-08-08T15:33:31Z","timestamp":1754667211000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["InterAct: A Large-Scale Dataset of Dynamic, Expressive and Interactive Activities between Two People in Daily Scenarios"],"prefix":"10.1145","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-4373-2289","authenticated-orcid":false,"given":"Leo","family":"Ho","sequence":"first","affiliation":[{"name":"The University of Hong Kong","place":["Hong Kong, Hong Kong"]},{"name":"Centre for Transformative Garment Production","place":["Hong Kong, Hong Kong"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8871-5128","authenticated-orcid":false,"given":"Yinghao","family":"Huang","sequence":"additional","affiliation":[{"name":"Great Bay University","place":["Dongguan, China"]},{"name":"Dongguan Key Laboratory for Intelligence and Information Technology","place":["Dongguan, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-4992-4760","authenticated-orcid":false,"given":"Dafei","family":"Qin","sequence":"additional","affiliation":[{"name":"The University of Hong Kong","place":["Hong Kong, Hong Kong"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5180-600X","authenticated-orcid":false,"given":"Mingyi","family":"Shi","sequence":"additional","affiliation":[{"name":"The University of Hong Kong","place":["Hong Kong, Hong Kong"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-3836-4875","authenticated-orcid":false,"given":"Wangpok","family":"Tse","sequence":"additional","affiliation":[{"name":"The University of Hong Kong","place":["Hong Kong, Hong Kong"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-7554-1730","authenticated-orcid":false,"given":"Wei","family":"Liu","sequence":"additional","affiliation":[{"name":"Shandong University","place":["Jinan, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2752-3955","authenticated-orcid":false,"given":"Junichi","family":"Yamagishi","sequence":"additional","affiliation":[{"name":"National Institute of Informatics","place":["Tokyo, Japan"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2729-5860","authenticated-orcid":false,"given":"Taku","family":"Komura","sequence":"additional","affiliation":[{"name":"The University of Hong Kong","place":["Hong Kong, Hong Kong"]},{"name":"Centre for Transformative Garment Production","place":["Hong Kong, Hong Kong"]}]}],"member":"320","published-online":{"date-parts":[[2025,8,8]]},"reference":[{"key":"e_1_3_3_2_1","doi-asserted-by":"publisher","unstructured":"Simon Alexanderson Gustav\u00a0Eje Henter Taras Kucherenko and Jonas Beskow. 2020a. Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows. Comput. Graph. Forum 39 2 (2020) 487\u2013496. 10.1111\/cgf.13946","DOI":"10.1111\/cgf.13946"},{"key":"e_1_3_3_3_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.13946"},{"key":"e_1_3_3_4_1","doi-asserted-by":"crossref","unstructured":"Simon Alexanderson Rajmund Nagy Jonas Beskow and Gustav\u00a0Eje Henter. 2023. Listen, denoise, action! Audio-driven motion synthesis with diffusion models. ACM Transactions on Graphics (TOG) 42 4 (2023) 1\u201320.","DOI":"10.1145\/3592458"},{"key":"e_1_3_3_5_1","doi-asserted-by":"crossref","unstructured":"Tenglong Ao Zeyi Zhang and Libin Liu. 2023. GestureDiffuCLIP: Gesture diffusion model with CLIP latents. 
ACM Transactions on Graphics (TOG) 42 4 (2023) 1\u201318.","DOI":"10.1145\/3592097"},{"key":"e_1_3_3_6_1","unstructured":"Alexei Baevski Henry Zhou Abdelrahman Mohamed and Michael Auli. 2020. wav2vec 2.0: A framework for self-supervised learning of speech representations. arXiv preprint arXiv:2006.11477 (2020)."},{"key":"e_1_3_3_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1186822.1073248"},{"key":"e_1_3_3_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3641519.3657440"},{"key":"e_1_3_3_9_1","first-page":"4","volume-title":"Proc. of GDC","author":"Clavet Simon","year":"2016","unstructured":"Simon Clavet et\u00a0al. 2016. Motion matching and the road to next-gen animation. In Proc. of GDC, Vol.\u00a02. 4."},{"key":"e_1_3_3_10_1","doi-asserted-by":"publisher","unstructured":"Alan\u00a0S. Cowen and Dacher Keltner. 2017. Self-report captures 27 distinct categories of emotion bridged by continuous gradients. Proceedings of the National Academy of Sciences 114 38 (2017) E7900\u2013E7909. 10.1073\/pnas.1702247114","DOI":"10.1073\/pnas.1702247114"},{"key":"e_1_3_3_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01034"},{"key":"e_1_3_3_12_1","doi-asserted-by":"publisher","unstructured":"Radek Dan\u011b\u010dek Kiran Chhatre Shashank Tripathi Yandong Wen Michael Black and Timo Bolkart. 2023. Emotional Speech-Driven Animation with Content-Emotion Disentanglement. ACM. 10.1145\/3610548.3618183","DOI":"10.1145\/3610548.3618183"},{"key":"e_1_3_3_13_1","doi-asserted-by":"crossref","unstructured":"Jos\u00e9\u00a0Mario De\u00a0Martino L\u00e9o\u00a0Pini Magalh\u00e3es and F\u00e1bio Violaro. 2006. Facial animation based on context-dependent visemes. Computers & Graphics 30 6 (2006) 971\u2013980.","DOI":"10.1016\/j.cag.2006.08.017"},{"key":"e_1_3_3_14_1","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)."},{"key":"e_1_3_3_15_1","doi-asserted-by":"crossref","unstructured":"Pif Edwards Chris Landreth Eugene Fiume and Karan Singh. 2016. JALI: an animator-centric viseme model for expressive lip synchronization. ACM Transactions on Graphics 35 4 (2016) 1\u201311.","DOI":"10.1145\/2897824.2925984"},{"key":"e_1_3_3_16_1","unstructured":"Epic Games. 2024. Facial Capture with Live Link. https:\/\/dev.epicgames.com\/community\/learning\/tutorials\/lEYe\/unreal-engine-facial-capture-with-live-link. Accessed on June 2, 2025."},{"key":"e_1_3_3_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CA.1998.681913"},{"key":"e_1_3_3_18_1","first-page":"204","volume-title":"European Conference on Computer Vision","author":"Fan Xiangyu","year":"2024","unstructured":"Xiangyu Fan, Jiaqi Li, Zhiqian Lin, Weiye Xiao, and Lei Yang. 2024. UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model. In European Conference on Computer Vision. 
Springer, 204\u2013221."},{"key":"e_1_3_3_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01821"},{"key":"e_1_3_3_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3267851.3267898"},{"key":"e_1_3_3_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/2919332.2919834"},{"key":"e_1_3_3_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01524"},{"key":"e_1_3_3_23_1","doi-asserted-by":"crossref","unstructured":"John\u00a0S Garofolo Lori\u00a0F Lamel William\u00a0M Fisher Jonathan\u00a0G Fiscus and David\u00a0S Pallett. 1993. DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1. NASA STI\/Recon technical report n 93 (1993) 27403.","DOI":"10.6028\/NIST.IR.4930"},{"key":"e_1_3_3_24_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.14734"},{"key":"e_1_3_3_25_1","first-page":"418","volume-title":"European Conference on Computer Vision","author":"Ghosh Anindita","year":"2024","unstructured":"Anindita Ghosh, Rishabh Dabral, Vladislav Golyanik, Christian Theobalt, and Philipp Slusallek. 2024. ReMoS: 3d motion-conditioned reaction synthesis for two-person interactions. In European Conference on Computer Vision. Springer, 418\u2013437."},{"key":"e_1_3_3_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00361"},{"key":"e_1_3_3_27_1","doi-asserted-by":"crossref","unstructured":"F\u00a0Sebastian Grassia. 1998. Practical parameterization of rotations using the exponential map. Journal of graphics tools 3 3 (1998) 29\u201348.","DOI":"10.1080\/10867651.1998.10487493"},{"key":"e_1_3_3_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1186562.1015755"},{"key":"e_1_3_3_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00509"},{"key":"e_1_3_3_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3528233.3530750"},{"key":"e_1_3_3_31_1","doi-asserted-by":"crossref","unstructured":"Daniel Holden Oussama Kanoun Maksym Perepichka and Tiberiu Popa. 2020. Learned motion matching. ACM Transactions on Graphics (ToG) 39 4 (2020) 53\u20131.","DOI":"10.1145\/3386569.3392440"},{"key":"e_1_3_3_32_1","doi-asserted-by":"crossref","unstructured":"Daniel Holden Taku Komura and Jun Saito. 2017. Phase-functioned neural networks for character control. ACM Transactions on Graphics (TOG) 36 4 (2017) 42.","DOI":"10.1145\/3072959.3073663"},{"key":"e_1_3_3_33_1","unstructured":"Nicholas Howe Michael Leventon and William Freeman. 1999. Bayesian reconstruction of 3d human motion from single-camera video. Advances in neural information processing systems 12 (1999)."},{"key":"e_1_3_3_34_1","doi-asserted-by":"crossref","unstructured":"Catalin Ionescu Dragos Papava Vlad Olaru and Cristian Sminchisescu. 2013. Human3.6M: Large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE transactions on pattern analysis and machine intelligence 36 7 (2013) 1325\u20131339.","DOI":"10.1109\/TPAMI.2013.248"},{"key":"e_1_3_3_35_1","unstructured":"Hanbyul Joo Tomas Simon Xulong Li Hao Liu Lei Tan Lin Gui Sean Banerjee Timothy\u00a0Scott Godisart Bart Nabbe Iain Matthews Takeo Kanade Shohei Nobuhara and Yaser Sheikh. 2017. Panoptic Studio: A Massively Multiview System for Social Interaction Capture. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017)."},{"key":"e_1_3_3_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CA.2001.982373"},{"key":"e_1_3_3_37_1","doi-asserted-by":"crossref","unstructured":"Tero Karras Timo Aila Samuli Laine Antti Herva and Jaakko Lehtinen. 2017. 
Audio-driven facial animation by joint end-to-end learning of pose and emotion. ACM Transactions on Graphics (TOG) 36 4 (2017) 1\u201312.","DOI":"10.1145\/3072959.3073658"},{"key":"e_1_3_3_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3577190.3616120"},{"key":"e_1_3_3_39_1","first-page":"763","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Lee Gilwoo","year":"2019","unstructured":"Gilwoo Lee, Zhiwei Deng, Shugao Ma, Takaaki Shiratori, Siddhartha\u00a0S Srinivasa, and Yaser Sheikh. 2019. Talking with hands 16.2M: A large-scale dataset of synchronized body-finger motion and audio for conversational motion analysis and synthesis. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 763\u2013772."},{"key":"e_1_3_3_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/1661412.1618518"},{"key":"e_1_3_3_41_1","doi-asserted-by":"crossref","unstructured":"Sergey Levine Jack\u00a0M Wang Alexis Haraux Zoran Popovi\u0107 and Vladlen Koltun. 2012. Continuous character control with low-dimensional embeddings. ACM Transactions on Graphics (TOG) 31 4 (2012) 1\u201310.","DOI":"10.1145\/2185520.2335379"},{"key":"e_1_3_3_42_1","doi-asserted-by":"crossref","unstructured":"Han Liang Wenqian Zhang Wenxuan Li Jingyi Yu and Lan Xu. 2024. InterGen: Diffusion-based multi-human motion generation under complex interactions. International Journal of Computer Vision 132 9 (2024) 3463\u20133483.","DOI":"10.1007\/s11263-024-02042-6"},{"key":"e_1_3_3_43_1","unstructured":"Jing Lin Ailing Zeng Shunlin Lu Yuanhao Cai Ruimao Zhang Haoqian Wang and Lei Zhang. 2023. Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset. Advances in Neural Information Processing Systems (2023)."},{"key":"e_1_3_3_44_1","unstructured":"Haiyang Liu Zihao Zhu Giorgio Becherini Yichen Peng Mingyang Su You Zhou Naoya Iwamoto Bo Zheng and Michael\u00a0J Black. 2023. EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Masked Audio Gesture Modeling. arXiv preprint arXiv:2401.00374 (2023)."},{"key":"e_1_3_3_45_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20071-7_36"},{"key":"e_1_3_3_46_1","doi-asserted-by":"crossref","unstructured":"Steven\u00a0R Livingstone and Frank\u00a0A Russo. 2018. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic multimodal set of facial and vocal expressions in North American English. PLoS ONE 13 5 (2018) e0196391.","DOI":"10.1371\/journal.pone.0196391"},{"key":"e_1_3_3_47_1","unstructured":"Zhiyuan Ma Xiangyu Zhu Guojun Qi Chen Qian Zhaoxiang Zhang and Zhen Lei. 2024. DiffSpeaker: Speech-Driven 3D Facial Animation with Diffusion Transformer. arXiv preprint arXiv:2402.05712 (2024)."},{"key":"e_1_3_3_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00554"},{"key":"e_1_3_3_49_1","doi-asserted-by":"publisher","DOI":"10.25080\/Majora-7b98e3ed-003"},{"key":"e_1_3_3_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00138"},{"key":"e_1_3_3_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/1186822.1073313"},{"key":"e_1_3_3_52_1","doi-asserted-by":"crossref","unstructured":"Michael Neff Michael Kipp Irene Albrecht and Hans-Peter Seidel. 2008. Gesture modeling and animation based on a probabilistic re-creation of speaker style. 
ACM Transactions on Graphics (TOG) 27 1 (2008) 5.","DOI":"10.1145\/1330511.1330516"},{"key":"e_1_3_3_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01169"},{"key":"e_1_3_3_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01975"},{"key":"e_1_3_3_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00101"},{"key":"e_1_3_3_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00925"},{"key":"e_1_3_3_57_1","doi-asserted-by":"crossref","unstructured":"Kunkun Pang Dafei Qin Yingruo Fan Julian Habekost Takaaki Shiratori Junichi Yamagishi and Taku Komura. 2023. BodyFormer: Semantics-guided 3d body gesture synthesis with transformer. ACM Transactions on Graphics (TOG) 42 4 (2023) 1\u201312.","DOI":"10.1145\/3592456"},{"key":"e_1_3_3_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01891"},{"key":"e_1_3_3_59_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.7"},{"key":"e_1_3_3_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00121"},{"key":"e_1_3_3_61_1","doi-asserted-by":"crossref","unstructured":"Alla Safonova Jessica\u00a0K Hodgins and Nancy\u00a0S Pollard. 2004. Synthesizing physically realistic human motion in low-dimensional behavior-specific spaces. ACM Transactions on Graphics (ToG) 23 3 (2004) 514\u2013521.","DOI":"10.1145\/1015706.1015754"},{"key":"e_1_3_3_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3623264.3624447"},{"key":"e_1_3_3_63_1","doi-asserted-by":"crossref","unstructured":"Sebastian Starke Ian Mason and Taku Komura. 2022. DeepPhase: Periodic autoencoders for learning motion phase manifolds. ACM Transactions on Graphics (ToG) 41 4 (2022) 1\u201313.","DOI":"10.1145\/3528223.3530178"},{"key":"e_1_3_3_64_1","doi-asserted-by":"crossref","unstructured":"Sebastian Starke Paul Starke Nicky He Taku Komura and Yuting Ye. 2024. Categorical Codebook Matching for Embodied Character Controllers. ACM Transactions on Graphics (TOG) 43 4 (2024) 1\u201314.","DOI":"10.1145\/3658209"},{"key":"e_1_3_3_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553505"},{"key":"e_1_3_3_66_1","doi-asserted-by":"crossref","unstructured":"Sarah Taylor Taehwan Kim Yisong Yue Moshe Mahler James Krahe Anastasio\u00a0Garcia Rodriguez Jessica Hodgins and Iain Matthews. 2017. A deep learning approach for generalized speech animation. ACM Transactions on Graphics 36 4 (2017) 1\u201311.","DOI":"10.1145\/3072959.3073699"},{"key":"e_1_3_3_67_1","unstructured":"Guy Tevet Sigal Raab Brian Gordon Yonatan Shafir Daniel Cohen-Or and Amit\u00a0H Bermano. 2022. Human motion diffusion model. arXiv preprint arXiv:2209.14916 (2022)."},{"key":"e_1_3_3_68_1","unstructured":"Guillermo Valle-P\u00e9rez Gustav\u00a0Eje Henter Jonas Beskow Andr\u00e9 Holzapfel Pierre-Yves Oudeyer and Simon Alexanderson. 2021. Transflower: probabilistic autoregressive dance generation with multimodal attention. arXiv preprint arXiv:2106.13871 (2021)."},{"key":"e_1_3_3_69_1","doi-asserted-by":"crossref","unstructured":"Michiel van\u00a0de Panne. 2014. Motion fields for interactive character animation: Technical perspective. Commun. 
ACM 57 6 (2014) 100\u2013100.","DOI":"10.1145\/2602759"},{"key":"e_1_3_3_70_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.14640"},{"key":"e_1_3_3_71_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58589-1_42"},{"key":"e_1_3_3_72_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01229"},{"key":"e_1_3_3_73_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01229"},{"key":"e_1_3_3_74_1","unstructured":"Liang Xu Xintao Lv Yichao Yan Xin Jin Shuwen Wu Congsheng Xu Yifan Liu Yizhou Zhou Fengyun Rao Xingdong Sheng et\u00a0al. 2023. Inter-X: Towards Versatile Human-Human Interaction Analysis. arXiv preprint arXiv:2312.16051 (2023)."},{"key":"e_1_3_3_75_1","unstructured":"Sicheng Xu Guojun Chen Yu-Xiao Guo Jiaolong Yang Chong Li Zhenyu Zang Yizhong Zhang Xin Tong and Baining Guo. 2024. VASA-1: Lifelike audio-driven talking faces generated in real time. Advances in Neural Information Processing Systems 37 (2024) 660\u2013684."},{"key":"e_1_3_3_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00053"},{"key":"e_1_3_3_77_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01632"},{"key":"e_1_3_3_78_1","doi-asserted-by":"crossref","unstructured":"Zeyi Zhang Tenglong Ao Yuyao Zhang Qingzhe Gao Chuan Lin Baoquan Chen and Libin Liu. 2024. Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis. ACM Transactions on Graphics (TOG) 43 4 (2024) 1\u201317.","DOI":"10.1145\/3658134"},{"key":"e_1_3_3_79_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00366"},{"key":"e_1_3_3_80_1","doi-asserted-by":"publisher","DOI":"10.1145\/3641519.3657413"}],"container-title":["Proceedings of the ACM on Computer Graphics and Interactive Techniques"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3747871","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,8]],"date-time":"2025-08-08T16:25:09Z","timestamp":1754670309000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3747871"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,8]]},"references-count":79,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,8,31]]}},"alternative-id":["10.1145\/3747871"],"URL":"https:\/\/doi.org\/10.1145\/3747871","relation":{},"ISSN":["2577-6193"],"issn-type":[{"value":"2577-6193","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,8]]},"assertion":[{"value":"2025-08-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}