{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,17]],"date-time":"2026-07-17T06:02:59Z","timestamp":1784268179710,"version":"3.55.0"},"reference-count":134,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2024,6,23]],"date-time":"2024-06-23T00:00:00Z","timestamp":1719100800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Industrial Fundamental Technology Development Program","award":["20023495"],"award-info":[{"award-number":["20023495"]}]},{"DOI":"10.13039\/501100003130","name":"Flemish Research Foundation","doi-asserted-by":"crossref","award":["1S95020N"],"award-info":[{"award-number":["1S95020N"]}],"id":[{"id":"10.13039\/501100003130","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Portuguese Foundation for Science and Technology","award":["SFRH\/BD\/127842\/ 2016"],"award-info":[{"award-number":["SFRH\/BD\/127842\/ 2016"]}]},{"DOI":"10.13039\/501100004063","name":"Knut and Alice Wallenberg Foundation","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004063","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Graph."],"published-print":{"date-parts":[[2024,6,30]]},"abstract":"<jats:p>This article reports on the second GENEA Challenge to benchmark data-driven automatic co-speech gesture generation. Participating teams used the same speech and motion dataset to build gesture-generation systems. Motion generated by all these systems was rendered to video using a standardised visualisation pipeline and evaluated in several large, crowdsourced user studies. Unlike when comparing different research articles, differences in results are here only due to differences between methods, enabling direct comparison between systems. 
The dataset was based on 18 hours of full-body motion capture, including fingers, of different persons engaging in a dyadic conversation. Ten teams participated in the challenge across two tiers: full-body and upper-body gesticulation. For each tier, we evaluated both the human-likeness of the gesture motion and its appropriateness for the specific speech signal. Our evaluations decouple human-likeness from gesture appropriateness, which has been a difficult problem in the field.<\/jats:p>\n          <jats:p>The evaluation results show some synthetic gesture conditions being rated as significantly more human-like than 3D human motion capture. To the best of our knowledge, this has not been demonstrated before. On the other hand, all synthetic motion is found to be vastly less appropriate for the speech than the original motion-capture recordings. We also find that conventional objective metrics do not correlate well with subjective human-likeness ratings in this large evaluation. The one exception is the Fr\u00e9chet gesture distance (FGD), which achieves a Kendall\u2019s tau rank correlation of around -0.5. 
Based on the challenge results we formulate numerous recommendations for system building and evaluation.<\/jats:p>","DOI":"10.1145\/3656374","type":"journal-article","created":{"date-parts":[[2024,4,27]],"date-time":"2024-04-27T10:01:50Z","timestamp":1714212110000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["Evaluating Gesture Generation in a Large-scale Open Challenge: The GENEA Challenge 2022"],"prefix":"10.1145","volume":"43","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9838-8848","authenticated-orcid":false,"given":"Taras","family":"Kucherenko","sequence":"first","affiliation":[{"name":"SEED, Electronic Arts Inc, Stockholm, Sweden"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7420-7181","authenticated-orcid":false,"given":"Pieter","family":"Wolfert","sequence":"additional","affiliation":[{"name":"Donders Institute for Brain, Cognition & Behaviour, Radboud Universiteit, Nijmegen, Netherlands and IDLab, Ghent University, Gent, Belgium"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4286-3421","authenticated-orcid":false,"given":"Youngwoo","family":"Yoon","sequence":"additional","affiliation":[{"name":"ETRI, Daejeon, Korea (the Republic of)"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3385-4101","authenticated-orcid":false,"given":"Carla","family":"Viegas","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, United States and Nova University of Lisbon, Lisboa, Portugal"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4573-1400","authenticated-orcid":false,"given":"Teodor","family":"Nikolov","sequence":"additional","affiliation":[{"name":"Department of Computing Science, Ume\u00e5 Universitet, Ume\u00e5, Sweden and Motorica AB, Stockholm, 
Sweden"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1817-440X","authenticated-orcid":false,"given":"Mihail","family":"Tsakov","sequence":"additional","affiliation":[{"name":"Department of Computing Science, Ume\u00e5 Universitet, Ume\u00e5, Sweden"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1643-1054","authenticated-orcid":false,"given":"Gustav Eje","family":"Henter","sequence":"additional","affiliation":[{"name":"Division of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden and Motorica AB, Stockholm, Sweden"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,6,23]]},"reference":[{"key":"e_1_3_3_2_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.170"},{"key":"e_1_3_3_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01991"},{"key":"e_1_3_3_4_1","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.4088599"},{"key":"e_1_3_3_5_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.13946"},{"key":"e_1_3_3_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3592458"},{"key":"e_1_3_3_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3550454.3555435"},{"key":"e_1_3_3_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/566570.566606"},{"key":"e_1_3_3_9_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.4919317"},{"key":"e_1_3_3_10_1","first-page":"12449","volume-title":"Proceedings of the Advances in Neural Information Processing Systems (NeurIPS \u201920)","author":"Baevski Alexei","year":"2020","unstructured":"Alexei Baevski , Yuhao Zhou , Abdelrahman Mohamed , and Michael Auli . 2020. wav2vec 2.0: A framework for self-supervised learning of speech representations. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS \u201920) . 12449\u201312460. 
Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/92d1e1eb1cd6f9fba3227870bb6d7f07-Abstract.html"},{"key":"e_1_3_3_11_1","doi-asserted-by":"publisher","DOI":"10.1038\/156783b0"},{"key":"e_1_3_3_12_1","volume-title":"Proceedings of the Workshop on Gesture and Speech in Interaction (GeSpIn \u201911)","author":"Bergmann Kirsten","year":"2011","unstructured":"Kirsten Bergmann , Volkan Aksu , and Stefan Kopp . 2011. The relation of speech and gestures: Temporal synchrony follows semantic synchrony. In Proceedings of the Workshop on Gesture and Speech in Interaction (GeSpIn \u201911) . Retrieved from https:\/\/pub.uni-bielefeld.de\/record\/2392953"},{"key":"e_1_3_3_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-04380-2_12"},{"key":"e_1_3_3_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15892-6_11"},{"key":"e_1_3_3_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475223"},{"key":"e_1_3_3_16_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2005-72"},{"key":"e_1_3_3_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00652"},{"key":"e_1_3_3_18_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00051"},{"key":"e_1_3_3_19_1","doi-asserted-by":"publisher","DOI":"10.1098\/rspb.2020.2419"},{"key":"e_1_3_3_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2015.7177478"},{"key":"e_1_3_3_21_1","first-page":"1877","volume-title":"Proceedings of the Advances in Neural Information Processing Systems (NeurIPS \u201920)","author":"Brown Tom B.","year":"2020","unstructured":"Tom B. Brown , Benjamin Mann , Nick Ryder , Melanie Subbiah , Jared D. 
Kaplan , Prafulla Dhariwal , Arvind Neelakantan , Pranav Shyam , Girish Sastry , Amanda Askell , Sandhini Agarwal , Ariel Herbert-Voss , Gretchen Krueger , Tom Henighan , Rewon Child , Aditya Ramesh , Daniel Ziegler , Jeffrey Wu , Clemens Winter , Chris Hesse , Mark Chen , Eric Sigler , Mateusz Litwin , Scott Gray , Benjamin Chess , Jack Clark , Christopher Berner , Sam McCandlish , Alec Radford , Ilya Sutskever , and Dario Amodei . 2020. Language models are few-shot learners. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS \u201920) . 1877\u20131901. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html"},{"key":"e_1_3_3_22_1","volume-title":"Proceedings of the Nucl.ai","author":"B\u00fcttner Michael","year":"2015","unstructured":"Michael B\u00fcttner and Simon Clavet . 2015. Motion matching \u2013 the road to next gen animation. In Proceedings of the Nucl.ai . Retrieved from https:\/\/youtu.be\/z_wpgHFSWss"},{"key":"e_1_3_3_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/383259.383315"},{"key":"e_1_3_3_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3536221.3558060"},{"key":"e_1_3_3_25_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2013-395"},{"key":"e_1_3_3_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3450626.3459932"},{"key":"e_1_3_3_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSTSP.2022.3188113"},{"key":"e_1_3_3_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-21996-7_17"},{"key":"e_1_3_3_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1980.1163420"},{"key":"e_1_3_3_30_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1423"},{"key":"e_1_3_3_31_1","article-title":"Loudness Normalisation and Permitted Maximum Level of Audio Signals","author":"Union European Broadcasting","year":"2020","unstructured":"European Broadcasting Union . 2020. 
Loudness Normalisation and Permitted Maximum Level of Audio Signals. EBU Recommendation EBU R 128v4. Retrieved from https:\/\/tech.ebu.ch\/docs\/r\/r128.pdf","journal-title":"EBU Recommendation EBU R 128v4"},{"key":"e_1_3_3_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3267851.3267898"},{"key":"e_1_3_3_33_1","doi-asserted-by":"publisher","DOI":"10.1002\/cav.2016"},{"key":"e_1_3_3_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3536221.3558068"},{"key":"e_1_3_3_35_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2019-1783"},{"key":"e_1_3_3_36_1","doi-asserted-by":"publisher","DOI":"10.1080\/10867651.1998.10487493"},{"key":"e_1_3_3_37_1","doi-asserted-by":"publisher","DOI":"10.1002\/9780470316771"},{"key":"e_1_3_3_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514197.3549697"},{"key":"e_1_3_3_39_1","doi-asserted-by":"publisher","DOI":"10.1155\/2022\/1828293"},{"key":"e_1_3_3_40_1","volume-title":"Proceedings of the Advances in Neural Information Processing Systems (NIPS \u201917)","author":"Heusel Martin","year":"2017","unstructured":"Martin Heusel , Hubert Ramsauer , Thomas Unterthiner , Bernhard Nessler , and Sepp Hochreiter . 2017. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Proceedings of the Advances in Neural Information Processing Systems (NIPS \u201917) . Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2017\/file\/8a1d694707eb0fefe65871369074926d-Paper.pdf"},{"key":"e_1_3_3_41_1","doi-asserted-by":"publisher","DOI":"10.3758\/s13423-017-1363-z"},{"issue":"2","key":"e_1_3_3_42_1","first-page":"65","article-title":"A simple sequentially rejective multiple test procedure","volume":"6","author":"Holm Sture","year":"1979","unstructured":"Sture Holm . 1979. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6, 2 (1979), 65\u201370. 
Retrieved from https:\/\/www.jstor.org\/stable\/4615733","journal-title":"Scandinavian Journal of Statistics"},{"key":"e_1_3_3_43_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2022-970"},{"key":"e_1_3_3_44_1","volume-title":"Methods for Subjective Determination of Transmission Quality","author":"Sector International Telecommunication Union, Telecommunication Standardisation","year":"1996","unstructured":"International Telecommunication Union, Telecommunication Standardisation Sector . 1996. Methods for Subjective Determination of Transmission Quality . Recommendation ITU-T P.800. Retrieved from https:\/\/www.itu.int\/rec\/T-REC-P.800-199608-I"},{"key":"e_1_3_3_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2018.2856281"},{"key":"e_1_3_3_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3267851.3267866"},{"key":"e_1_3_3_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3383652.3423911"},{"key":"e_1_3_3_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3383652.3423860"},{"key":"e_1_3_3_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3462244.3479957"},{"key":"e_1_3_3_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3536221.3558061"},{"key":"e_1_3_3_51_1","volume-title":"Rank Correlation Methods (4 ed.)","author":"Kendall Maurice G.","year":"1970","unstructured":"Maurice G. Kendall . 1970. Rank Correlation Methods (4 ed.). 
Charles Griffin and Co."},{"key":"e_1_3_3_52_1","doi-asserted-by":"publisher","DOI":"10.3989\/loquens.2014.006"},{"key":"e_1_3_3_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3536220.3558801"},{"key":"e_1_3_3_54_1","doi-asserted-by":"publisher","DOI":"10.28995\/2075-7182-2021-20-425-432"},{"key":"e_1_3_3_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/566654.566605"},{"key":"e_1_3_3_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3308532.3329472"},{"key":"e_1_3_3_57_1","doi-asserted-by":"publisher","DOI":"10.1080\/10447318.2021.1883883"},{"key":"e_1_3_3_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3382507.3418815"},{"key":"e_1_3_3_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397481.3450692"},{"key":"e_1_3_3_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3472306.3478333"},{"key":"e_1_3_3_61_1","doi-asserted-by":"publisher","DOI":"10.5555\/3535850.3535937"},{"key":"e_1_3_3_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3577190.3616120"},{"key":"e_1_3_3_63_1","doi-asserted-by":"crossref","unstructured":"Quoc Anh Le and Catherine Pelachaud . 2012. Evaluating an Expressive Gesture Model for a Humanoid Robot: Experimental Results. 
Retrieved from https:\/\/www.researchgate.net\/publication\/268257868_Evaluating_an_Expressive_Gesture_Model_for_a_Humanoid_Robot_Experimental_Results","DOI":"10.1007\/978-3-642-24571-8_24"},{"key":"e_1_3_3_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00085"},{"key":"e_1_3_3_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/566654.566607"},{"key":"e_1_3_3_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1778861"},{"key":"e_1_3_3_67_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33016706"},{"key":"e_1_3_3_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01315"},{"key":"e_1_3_3_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01022"},{"key":"e_1_3_3_70_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20071-7_36"},{"key":"e_1_3_3_71_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01021"},{"key":"e_1_3_3_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/3472307.3484167"},{"key":"e_1_3_3_73_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP39728.2021.9414660"},{"key":"e_1_3_3_74_1","doi-asserted-by":"publisher","DOI":"10.1145\/3536221.3558059"},{"key":"e_1_3_3_75_1","doi-asserted-by":"publisher","DOI":"10.1002\/sim.3531"},{"key":"e_1_3_3_76_1","doi-asserted-by":"publisher","DOI":"10.1145\/3623264.3624443"},{"key":"e_1_3_3_77_1","doi-asserted-by":"publisher","DOI":"10.31234\/osf.io\/dxvhc"},{"key":"e_1_3_3_78_1","doi-asserted-by":"publisher","DOI":"10.1038\/264746a0"},{"key":"e_1_3_3_79_1","doi-asserted-by":"publisher","DOI":"10.1177\/002383099403700208"},{"key":"e_1_3_3_80_1","doi-asserted-by":"publisher","DOI":"10.21437\/SSW.2023-24"},{"key":"e_1_3_3_81_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503250"},{"key":"e_1_3_3_82_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2020-2382"},{"key":"e_1_3_3_83_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2010-413"},{"key":"e_1_3_3_84_1","doi-asserted-by":"publisher","DOI":"10.1177\
/0261927X17728361"},{"key":"e_1_3_3_85_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00236911"},{"key":"e_1_3_3_86_1","volume-title":"Encyclopedia of Mathematics","author":"Nikulin Mikhail S.","year":"2001","unstructured":"Mikhail S. Nikulin . 2001. Hellinger distance. Encyclopedia of Mathematics . Springer. http:\/\/encyclopediaofmath.org\/index.php?title=Hellinger_distance Accessed: 2021-01-31 ."},{"key":"e_1_3_3_87_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.14776"},{"key":"e_1_3_3_88_1","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.4090878"},{"key":"e_1_3_3_89_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_3_90_1","first-page":"8748","volume-title":"Proceedings of the International Conference on Machine Learning (ICML \u201921)","author":"Radford Alec","year":"2021","unstructured":"Alec Radford , Jong Wook Kim , Chris Hallacy , Aditya Ramesh , Gabriel Goh , Sandhini Agarwal , Girish Sastry , Amanda Askell , Pamela Mishkin , Jack Clark , Gretchen Krueger , and Ilya Sutskever . 2021. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning (ICML \u201921) . 8748\u20138763. 
https:\/\/proceedings.mlr.press\/v139\/radford21a.html"},{"key":"e_1_3_3_91_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9053569"},{"key":"e_1_3_3_92_1","doi-asserted-by":"publisher","DOI":"10.1109\/VR50410.2021.00082"},{"key":"e_1_3_3_93_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2015-368"},{"key":"e_1_3_3_94_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2019.04.005"},{"key":"e_1_3_3_95_1","doi-asserted-by":"publisher","DOI":"10.1145\/3536221.3558064"},{"key":"e_1_3_3_96_1","doi-asserted-by":"publisher","DOI":"10.1007\/s12369-013-0196-9"},{"key":"e_1_3_3_97_1","doi-asserted-by":"publisher","DOI":"10.1007\/s12369-011-0124-9"},{"key":"e_1_3_3_98_1","doi-asserted-by":"publisher","DOI":"10.1109\/ROMAN.2011.6005285"},{"key":"e_1_3_3_99_1","doi-asserted-by":"publisher","DOI":"10.1155\/2009\/191940"},{"key":"e_1_3_3_100_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2016.04.001"},{"key":"e_1_3_3_101_1","doi-asserted-by":"publisher","DOI":"10.1109\/FG52635.2021.9667023"},{"key":"e_1_3_3_102_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1968.10480934"},{"key":"e_1_3_3_103_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8461368"},{"key":"e_1_3_3_104_1","doi-asserted-by":"publisher","DOI":"10.1109\/ROMAN.2018.8525621"},{"key":"e_1_3_3_105_1","first-page":"3335","volume-title":"Proceedings of the International Conference on Language Resources and Evaluation (LREC \u201912)","author":"Sz\u00e9kely \u00c9va","year":"2012","unstructured":"\u00c9va Sz\u00e9kely , Jo\u00e3o P. Cabral , Mohamed Abou-Zleikha , Peter Cahill , and Julie Carson-Berndsen . 2012. Evaluating expressive speech synthesis from audiobooks in conversational phrases. In Proceedings of the International Conference on Language Resources and Evaluation (LREC \u201912) . 3335\u20133339. 
Retrieved from https:\/\/aclanthology.org\/L12-1513\/"},{"key":"e_1_3_3_106_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS47612.2022.9981734"},{"key":"e_1_3_3_107_1","doi-asserted-by":"publisher","DOI":"10.1109\/ECTI-CON51831.2021.9454931"},{"key":"e_1_3_3_108_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-94-011-2546-8_20"},{"key":"e_1_3_3_109_1","doi-asserted-by":"publisher","DOI":"10.4135\/9781412983570"},{"key":"e_1_3_3_110_1","unstructured":"George Toderici Lucas Theis Nick Johnston Eirikur Agustsson Johannes Ball\u00e9 Fabian Mentzer Wenzhe Shi and Radu Timofte . 2020. CLIC 2020: Overview and Analysis of the Competition Results. Retrieved March 27 2024 from https:\/\/youtu.be\/iXzgFrRWNEg"},{"key":"e_1_3_3_111_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00204593"},{"key":"e_1_3_3_112_1","unstructured":"A\u00e4ron van den Oord Sander Dieleman Heiga Zen Karen Simonyan Oriol Vinyals Alex Graves Nal Kalchbrenner Andrew Senior and Koray Kavukcuoglu . 2016. WaveNet: A Generative Model for Raw Audio. arXiv:1609.03499. Retrieved from https:\/\/arxiv.org\/abs\/1609.03499"},{"key":"e_1_3_3_113_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2013.09.008"},{"key":"e_1_3_3_114_1","volume-title":"Proceedings of the Advances in Neural Information Processing Systems (NeurIPS \u201919)","author":"Wang Alex","year":"2019","unstructured":"Alex Wang , Yada Pruksachatkun , Nikita Nangia , Amanpreet Singh , Julian Michael , Felix Hill , Omer Levy , and Samuel R. Bowman . 2019. SuperGLUE: A stickier benchmark for general-purpose language understanding systems. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS \u201919) . 
Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/4496bf24afe7fab6f046bf4923da8de6-Abstract.html"},{"key":"e_1_3_3_115_1","doi-asserted-by":"publisher","DOI":"10.1145\/3462244.3479914"},{"key":"e_1_3_3_116_1","doi-asserted-by":"publisher","DOI":"10.1145\/3536221.3558065"},{"key":"e_1_3_3_117_1","first-page":"95","volume-title":"Proceedings of the Research on Spoken Language Processing Progress Report No. 26","author":"Winters Stephen J.","year":"2004","unstructured":"Stephen J. Winters and David B. Pisoni . 2004. Perception and comprehension of synthetic speech. In Proceedings of the Research on Spoken Language Processing Progress Report No. 26 . Speech Research Laboratory, Department of Psychology, Indiana University, Bloomington, IN, 95\u2013138. Retrieved from https:\/\/citeseerx.ist.psu.edu\/pdf\/8e10a4c4d279e9540cd5af5aae692fe9907409ff"},{"key":"e_1_3_3_118_1","doi-asserted-by":"publisher","DOI":"10.1145\/3462244.3479889"},{"key":"e_1_3_3_119_1","doi-asserted-by":"publisher","DOI":"10.1145\/3610661.3617160"},{"key":"e_1_3_3_120_1","first-page":"4","volume-title":"Proceedings of the ICDL-EpiRob Workshop on Naturalistic Non-Verbal and Affective Human-Robot Interactions (ICDL-EpiRob \u201919 Workshop)","author":"Wolfert Pieter","year":"2019","unstructured":"Pieter Wolfert , Taras Kucherenko , Hedvig Kjelstr\u00f6m , and Tony Belpaeme . 2019. Should beat gestures be learned or designed? A benchmarking user study. In Proceedings of the ICDL-EpiRob Workshop on Naturalistic Non-Verbal and Affective Human-Robot Interactions (ICDL-EpiRob \u201919 Workshop) . 4 pages. 
Retrieved from http:\/\/urn.kb.se\/resolve?urn=urn:nbn:se:kth:diva-255998"},{"key":"e_1_3_3_121_1","doi-asserted-by":"publisher","DOI":"10.1109\/THMS.2022.3149173"},{"key":"e_1_3_3_122_1","doi-asserted-by":"publisher","DOI":"10.1145\/3462244.3481275"},{"key":"e_1_3_3_123_1","doi-asserted-by":"publisher","DOI":"10.1145\/3536221.3558066"},{"key":"e_1_3_3_124_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS47612.2022.9981117"},{"key":"e_1_3_3_125_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-20065-6_41"},{"key":"e_1_3_3_126_1","doi-asserted-by":"publisher","DOI":"10.1145\/3414685.3417838"},{"key":"e_1_3_3_127_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8793720"},{"key":"e_1_3_3_128_1","doi-asserted-by":"publisher","DOI":"10.1145\/3472749.3474789"},{"key":"e_1_3_3_129_1","doi-asserted-by":"publisher","DOI":"10.1145\/3536221.3558058"},{"key":"e_1_3_3_130_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2016-847"},{"key":"e_1_3_3_131_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-27077-2_18"},{"key":"e_1_3_3_132_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201366"},{"key":"e_1_3_3_133_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW50498.2020.00254"},{"key":"e_1_3_3_134_1","doi-asserted-by":"publisher","DOI":"10.1145\/3536221.3558063"},{"key":"e_1_3_3_135_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00589"}],"container-title":["ACM Transactions on 
Graphics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3656374","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3656374","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:48:59Z","timestamp":1750286939000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3656374"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,23]]},"references-count":134,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,6,30]]}},"alternative-id":["10.1145\/3656374"],"URL":"https:\/\/doi.org\/10.1145\/3656374","relation":{},"ISSN":["0730-0301","1557-7368"],"issn-type":[{"value":"0730-0301","type":"print"},{"value":"1557-7368","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,23]]},"assertion":[{"value":"2023-02-15","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-02-27","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-23","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}