{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T16:31:43Z","timestamp":1753893103481,"version":"3.41.2"},"reference-count":37,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,4,8]],"date-time":"2025-04-08T00:00:00Z","timestamp":1744070400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Robot. AI"],"abstract":"<jats:p>Generating natural and expressive co-speech gestures for conversational virtual agents and social robots is crucial for enhancing their acceptability and usability in real-world contexts. However, this task is complicated by strong cultural and linguistic influences on gesture patterns, exacerbated by the limited availability of cross-cultural co-speech gesture datasets. To address this gap, we introduce the TED-Culture Dataset, a novel dataset derived from TED talks, designed to enable cross-cultural gesture generation based on linguistic cues. We propose a generative model based on the Stable Diffusion architecture, which we evaluate on both the TED-Expressive Dataset and the TED-Culture Dataset. The model is further implemented on the NAO robot to assess real-time performance. Our model surpasses state-of-the-art baselines in gesture naturalness and exhibits rapid convergence across languages, specifically Indonesian, Japanese, and Italian. Objective and subjective evaluations confirm improvements in communicative effectiveness. Notably, results reveal that individuals are more critical of gestures in their native language, expecting higher generative performance in familiar linguistic contexts. By releasing the TED-Culture Dataset, we facilitate future research on multilingual gesture generation for embodied agents. 
The study underscores the importance of cultural and linguistic adaptation in co-speech gesture synthesis, with implications for human-robot interaction design.<\/jats:p>","DOI":"10.3389\/frobt.2025.1546765","type":"journal-article","created":{"date-parts":[[2025,4,8]],"date-time":"2025-04-08T04:37:38Z","timestamp":1744087058000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["TED-culture: culturally inclusive co-speech gesture generation for embodied social agents"],"prefix":"10.3389","volume":"12","author":[{"given":"Yixin","family":"Shen","sequence":"first","affiliation":[]},{"given":"Wafa","family":"Johal","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,4,8]]},"reference":[{"key":"B1","doi-asserted-by":"crossref","first-page":"719","DOI":"10.1109\/3DV.2019.00084","article-title":"Language2pose: natural language grounded pose forecasting","volume-title":"2019 international conference on 3D vision (3DV)","author":"Ahuja","year":"2019"},{"key":"B2","first-page":"1029","article-title":"Beat gesture generation rules for human-robot interaction","volume-title":"RO-MAN 2009-the 18th IEEE international Symposium on Robot and human interactive communication","author":"Bremner","year":"2009"},{"key":"B3","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1007\/978-3-030-58607-2_2","article-title":"Monocular expressive body regression through body-driven attention","volume-title":"Computer vision\u2013ECCV 2020: 16th European conference, glasgow, UK, august 23\u201328, 2020, proceedings, Part X 16","author":"Choutas","year":"2020"},{"key":"B4","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1080\/09298210701653344","article-title":"Beat tracking by dynamic programming","volume":"36","author":"Ellis","year":"2007","journal-title":"J. New Music Res."},{"article-title":"Evaluation of social interaction (ESI)","year":"2010","author":"Fisher","key":"B5"},{"key":"B6","doi-asserted-by":"publisher","first-page":"206","DOI":"10.1111\/cgf.14734","article-title":"Zeroeggs: zero-shot example-based gesture generation from speech","volume":"42","author":"Ghorbani","year":"2023","journal-title":"Comput. Graph. Forum"},{"key":"B7","first-page":"3497","article-title":"Learning individual styles of conversational gesture","author":"Ginosar","year":"2019"},{"key":"B8","doi-asserted-by":"publisher","first-page":"1493","DOI":"10.1007\/s12369-022-00893-y","article-title":"Towards culture-aware co-speech gestures for social robots","volume":"14","author":"Gjaci","year":"2022","journal-title":"Int. J. Soc. robotics"},{"article-title":"Generative adversarial networks","year":"2014","author":"Goodfellow","key":"B9"},{"key":"B10","first-page":"101","article-title":"Learning speech-driven 3d conversational gestures from video","volume-title":"Proceedings of the 21st ACM international conference on intelligent virtual agents","author":"Habibie","year":"2021"},{"key":"B11","doi-asserted-by":"publisher","first-page":"6840","DOI":"10.48550\/arXiv.2006.11239","article-title":"Denoising diffusion probabilistic models","volume":"33","author":"Ho","year":"2020","journal-title":"Adv. neural Inf. Process. 
Syst."},{"key":"B12","doi-asserted-by":"publisher","first-page":"3757","DOI":"10.1109\/lra.2018.2856281","article-title":"A speech-driven hand gesture generation method and evaluation in android robots","volume":"3","author":"Ishi","year":"2018","journal-title":"IEEE Robotics Automation Lett."},{"key":"B13","doi-asserted-by":"publisher","first-page":"145","DOI":"10.4324\/9781003059783-1","article-title":"Cross-cultural variation of speech-accompanying gesture: a review","volume":"24","author":"Kita","year":"2009","journal-title":"Lang. cognitive Process."},{"article-title":"Nonverbal communication in human interaction (Cengage Learning)","year":"2013","author":"Knapp","key":"B14"},{"key":"B15","first-page":"242","article-title":"Gesticulator: a framework for semantically-aware speech-driven gesture generation","author":"Kucherenko","year":"2020"},{"key":"B16","first-page":"763","article-title":"Talking with hands 16.2m: a large-scale dataset of synchronized body-finger motion and audio for conversational motion analysis and synthesis","author":"Lee","year":""},{"key":"B17","first-page":"763","article-title":"Talking with hands 16.2 m: A large-scale dataset of synchronized body-finger motion and audio for conversational motion analysis and synthesis","author":"Lee","year":""},{"key":"B18","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1911.02001","article-title":"Dancing to music","volume":"32","author":"Lee","year":"","journal-title":"Adv. neural Inf. Process. Syst."},{"key":"B19","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1833349.1778861","article-title":"Gesture controllers","volume-title":"Acm siggraph 2010 papers","author":"Levine","year":"2010"},{"key":"B20","doi-asserted-by":"publisher","first-page":"1272","DOI":"10.1609\/aaai.v36i2.20014","article-title":"Danceformer: music conditioned 3d dance generation with parametric motion transformer","volume":"36","author":"Li","year":"2022","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"B21","first-page":"13401","article-title":"Ai choreographer: music conditioned 3d dance generation with aist++","author":"Li","year":"2021"},{"key":"B22","first-page":"405","article-title":"Speech-gesture gan: gesture generation for robots and embodied agents","author":"Liu","year":"2023"},{"key":"B23","first-page":"T612","article-title":"Beat: a large-scale semantic and emotional multi-modal dataset for conversational gestures synthesis","volume-title":"Computer Vision \u2013 ECCV 2022. Editors Avidan, S., Brostow, G., Ciss\u00e9, M., Farinella, G. M., Hassner","author":"Liu","year":""},{"key":"B24","first-page":"10462","article-title":"Learning hierarchical cross-modal association for co-speech gesture generation","author":"Liu","year":""},{"key":"B25","first-page":"31","article-title":"Speech-based gesture generation for robots and embodied agents: a scoping review","author":"Liu","year":"2021"},{"key":"B26","doi-asserted-by":"publisher","first-page":"569","DOI":"10.1111\/cgf.14776","article-title":"A comprehensive review of data-driven co-speech gesture generation","volume":"42","author":"Nyatsanga","year":"2023","journal-title":"Comput. Graph. 
Forum"},{"key":"B27","first-page":"7753","article-title":"3d human pose estimation in video with temporal convolutions and semi-supervised training","author":"Pavllo","year":"2019"},{"key":"B28","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1177\/002383099403700208","article-title":"Hand and mind: what gestures reveal about thought","volume":"37","author":"Studdert-Kennedy","year":"1994","journal-title":"Lang. Speech"},{"key":"B29","doi-asserted-by":"publisher","first-page":"497","DOI":"10.1109\/tmm.2020.2981989","article-title":"Deepdance: music-to-dance motion choreography with adversarial learning","volume":"23","author":"Sun","year":"2020","journal-title":"IEEE Trans. Multimedia"},{"key":"B30","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1007\/978-3-319-58750-9_28","article-title":"Creating a gesture-speech dataset for speech-based automatic gesture generation","volume-title":"HCI international 2017\u2013posters\u2019 extended abstracts: 19th international conference, HCI international 2017","author":"Takeuchi","year":"2017"},{"key":"B31","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1706.03762","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. neural Inf. Process. Syst."},{"key":"B32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3414685.3417838","article-title":"Speech gesture generation from the trimodal context of text, audio, and speaker identity","volume":"39","author":"Yoon","year":"2020","journal-title":"ACM Trans. Graph. (TOG)"},{"key":"B33","first-page":"4303","article-title":"Robots learn social skills: end-to-end learning of co-speech gesture generation for humanoid robots","author":"Yoon","year":"2019"},{"key":"B34","first-page":"736","article-title":"The genea challenge 2022: a large evaluation of data-driven co-speech gesture generation","author":"Yoon","year":"2022"},{"key":"B35","first-page":"20807","article-title":"Livelyspeaker: towards semantic-aware co-speech gesture generation","author":"Zhi","year":"2023"},{"key":"B36","first-page":"764","article-title":"Gesturemaster: graph-based speech-driven gesture generation","author":"Zhou","year":"2022"},{"key":"B37","first-page":"10544","article-title":"Taming diffusion models for audio-driven co-speech gesture generation","author":"Zhu","year":"2023"}],"container-title":["Frontiers in Robotics and AI"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2025.1546765\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,8]],"date-time":"2025-04-08T04:37:56Z","timestamp":1744087076000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2025.1546765\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,8]]},"references-count":37,"alternative-id":["10.3389\/frobt.2025.1546765"],"URL":"https:\/\/doi.org\/10.3389\/frobt.2025.1546765","relation":{},"ISSN":["2296-9144"],"issn-type":[{"type":"electronic","value":"2296-9144"}],"subject":[],"published":{"date-parts":[[2025,4,8]]},"article-number":"1546765"}}