{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T07:03:04Z","timestamp":1773903784352,"version":"3.50.1"},"reference-count":47,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2022,6,3]],"date-time":"2022-06-03T00:00:00Z","timestamp":1654214400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,6,3]],"date-time":"2022-06-03T00:00:00Z","timestamp":1654214400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004702","name":"Universit\u00e0 degli Studi di Genova","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004702","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J of Soc Robotics"],"published-print":{"date-parts":[[2022,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Embedding social robots with the capability of accompanying their sentences with natural gestures may be the key to increasing their acceptability and their usage in real contexts. However, the definition of <jats:italic>natural<\/jats:italic> communicative gestures may not be trivial, since it strictly depends on the culture of the person interacting with the robot. The proposed work investigates the possibility of generating culture-dependent communicative gestures, by proposing an integrated approach based on a custom dataset composed exclusively of persons belonging to the same culture, an adversarial generation module based on speech audio features, a voice conversion module to manage the multi-person dataset, and a 2D-to-3D mapping module for generating three-dimensional gestures. 
The approach has eventually been implemented and tested with the humanoid robot Pepper. Preliminary results, obtained through a statistical analysis of the evaluations made by human participants identifying themselves as belonging to different cultures, are discussed.\n<\/jats:p>","DOI":"10.1007\/s12369-022-00893-y","type":"journal-article","created":{"date-parts":[[2022,6,3]],"date-time":"2022-06-03T03:33:01Z","timestamp":1654227181000},"page":"1493-1506","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Towards Culture-Aware Co-Speech Gestures for Social Robots"],"prefix":"10.1007","volume":"14","author":[{"given":"Ariel","family":"Gjaci","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9550-3740","authenticated-orcid":false,"given":"Carmine Tommaso","family":"Recchiuto","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Antonio","family":"Sgorbissa","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,6,3]]},"reference":[{"key":"893_CR1","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1016\/S0065-2601(08)60241-5","volume":"28","author":"RM Krauss","year":"1996","unstructured":"Krauss RM, Chen Y, Chawla P (1996) Nonverbal behavior and nonverbal communication: What do conversational hand gestures tell us? Adv Exp Soc Psychol 28:389\u2013450","journal-title":"Adv Exp Soc Psychol"},{"issue":"2","key":"893_CR2","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1177\/002383099403700208","volume":"37","author":"M Studdert-Kennedy","year":"1994","unstructured":"Studdert-Kennedy M (1994) Hand and Mind: What Gestures Reveal About Thought. 
Lang Speech 37(2):203\u2013209","journal-title":"Lang Speech"},{"issue":"6","key":"893_CR3","doi-asserted-by":"publisher","first-page":"593","DOI":"10.1080\/016909600750040571","volume":"15","author":"MW Alibali","year":"2000","unstructured":"Alibali MW, Kita S, Young AJ (2000) Gesture and the process of speech production: We think, therefore we gesture. Lang Cognit Process 15(6):593\u2013613","journal-title":"Lang Cognit Process"},{"issue":"1","key":"893_CR4","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1023\/A:1024716331692","volume":"20","author":"D Archer","year":"1997","unstructured":"Archer D (1997) Unspoken diversity: Cultural differences in gestures. Qual Sociol 20(1):79\u2013105","journal-title":"Qual Sociol"},{"key":"893_CR5","unstructured":"Archer D (1992) A world of gestures: Culture and nonverbal communication. video) Berkeley: University of California Extension Center for Media and Independent Learning-2000 Center Street. Fourth Floor, Berkeley, California 94704:642\u20130460"},{"issue":"2","key":"893_CR6","doi-asserted-by":"publisher","first-page":"145","DOI":"10.1080\/01690960802586188","volume":"24","author":"S Kita","year":"2009","unstructured":"Kita S (2009) Cross-cultural variation of speech-accompanying gesture: A review. Lang Cognit Process 24(2):145\u2013167","journal-title":"Lang Cognit Process"},{"key":"893_CR7","doi-asserted-by":"crossref","unstructured":"Bremner P, Pipe AG, Melhuish C, Fraser M, Subramanian S (2011, October) The effects of robot-performed co-verbal gesture on listener behaviour. In: 2011 11th IEEE-RAS International Conference on Humanoid Robots. IEEE, p 458\u2013465","DOI":"10.1109\/Humanoids.2011.6100810"},{"key":"893_CR8","doi-asserted-by":"crossref","unstructured":"Wilson JR, Lee NY, Saechao A, Hershenson S, Scheutz M, Tickle-Degnen L (2017, November) Hand gestures and verbal acknowledgments improve human-robot rapport. In: International Conference on Social Robotics. 
Springer, Cham, p 334\u2013344","DOI":"10.1007\/978-3-319-70022-9_33"},{"key":"893_CR9","doi-asserted-by":"crossref","unstructured":"Sun L, Li K, Wang H, Kang S, Meng H (2016, July) Phonetic posteriorgrams for many-to-one voice conversion without parallel data training. In: 2016 IEEE International Conference on Multimedia and Expo (ICME), IEEE, p 1\u20136","DOI":"10.1109\/ICME.2016.7552917"},{"key":"893_CR10","doi-asserted-by":"crossref","unstructured":"Kucherenko T, Jonell P, Yoon Y, Wolfert P, Henter GE (2021, April) A large, crowdsourced evaluation of gesture generation systems on common data: The GENEA Challenge 2020. In: 26th International Conference on Intelligent User Interfaces, p 11\u201321","DOI":"10.1145\/3397481.3450692"},{"key":"893_CR11","doi-asserted-by":"crossref","unstructured":"Liu Y, Mohammadi G, Song Y, Johal W (2021, November) Speech-based Gesture Generation for Robots and Embodied Agents: A Scoping Review. In: Proceedings of the 9th International Conference on Human-Agent Interaction, p 31\u201338","DOI":"10.1145\/3472307.3484167"},{"issue":"3","key":"893_CR12","doi-asserted-by":"publisher","first-page":"40","DOI":"10.1109\/MRA.2018.2833157","volume":"25","author":"AK Pandey","year":"2018","unstructured":"Pandey AK, Gelin R (2018) A mass-produced sociable humanoid robot: Pepper: The first machine of its kind. IEEE Robot & Autom Mag 25(3):40\u201348","journal-title":"IEEE Robot & Autom Mag"},{"key":"893_CR13","doi-asserted-by":"crossref","unstructured":"Le QA, Hanoune S, Pelachaud C (2011, October) Design and implementation of an expressive gesture model for a humanoid robot. In: 2011 11th IEEE-RAS International Conference on Humanoid Robots. IEEE, p 134\u2013140","DOI":"10.1109\/Humanoids.2011.6100857"},{"key":"893_CR14","doi-asserted-by":"crossref","unstructured":"Meena R, Jokinen K, Wilcock G (2012, December) Integration of gestures and speech in human-robot interaction. 
In 2012 IEEE 3rd International Conference on Cognitive Infocommunications (CogInfoCom). IEEE, p 673\u2013678","DOI":"10.1109\/CogInfoCom.2012.6421936"},{"key":"893_CR15","doi-asserted-by":"crossref","unstructured":"Levine S, Kr\u00e4henb\u00fchl P, Thrun S, Koltun V (2010) Gesture controllers. In: ACM SIGGRAPH 2010 papers, p 1\u201311","DOI":"10.1145\/1778765.1778861"},{"key":"893_CR16","doi-asserted-by":"crossref","unstructured":"Ginosar S, Bar A, Kohavi G, Chan C, Owens A, Malik J (2019) Learning individual styles of conversational gesture. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition p 3497\u20133506","DOI":"10.1109\/CVPR.2019.00361"},{"key":"893_CR17","doi-asserted-by":"crossref","unstructured":"Yoon Y, Ko WR, Jang M, Lee J, Kim J, Lee G (2019, May) Robots learn social skills: End-to-end learning of co-speech gesture generation for humanoid robots. In: 2019 International Conference on Robotics and Automation (ICRA). IEEE, p 4303\u20134309","DOI":"10.1109\/ICRA.2019.8793720"},{"issue":"1","key":"893_CR18","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1109\/MSP.2017.2765202","volume":"35","author":"A Creswell","year":"2018","unstructured":"Creswell A, White T, Dumoulin V, Arulkumaran K, Sengupta B, Bharath AA (2018) Generative adversarial networks: An overview. IEEE Signal Process Mag 35(1):53\u201365","journal-title":"IEEE Signal Process Mag"},{"key":"893_CR19","doi-asserted-by":"crossref","unstructured":"Ronneberger O, Fischer P, Brox T (2015, October) U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, p 234\u2013241","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"893_CR20","doi-asserted-by":"crossref","unstructured":"Joo H, Liu H, Tan L, Gui L, Nabbe B, Matthews I, Sheikh Y (2015) Panoptic studio: A massively multiview system for social motion capture. 
In: Proceedings of the IEEE International Conference on Computer Vision, p 3334\u20133342","DOI":"10.1109\/ICCV.2015.381"},{"issue":"6","key":"893_CR21","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3414685.3417838","volume":"39","author":"Y Yoon","year":"2020","unstructured":"Yoon Y, Cha B, Lee JH, Jang M, Lee J, Kim J, Lee G (2020) Speech gesture generation from the trimodal context of text, audio, and speaker identity. ACM Trans on Graph (TOG) 39(6):1\u201316","journal-title":"ACM Trans on Graph (TOG)"},{"key":"893_CR22","doi-asserted-by":"crossref","unstructured":"Kucherenko T, Hasegawa D, Henter GE, Kaneko N, Kjellstr\u00f6m H (2019, July) Analyzing input and output representations for speech-driven gesture generation. In: Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents, p 97\u2013104","DOI":"10.1145\/3308532.3329472"},{"key":"893_CR23","doi-asserted-by":"crossref","unstructured":"Ferstl Y, McDonnell R (2018, November) Investigating the use of recurrent motion modelling for speech gesture generation. In: Proceedings of the 18th International Conference on Intelligent Virtual Agents, p 93\u201398","DOI":"10.1145\/3267851.3267898"},{"key":"893_CR24","doi-asserted-by":"crossref","unstructured":"Ferstl Y, Neff M, McDonnell R (2019) Multi-objective adversarial gesture generation. In: Motion, Interaction and Games, p 1\u201310","DOI":"10.1145\/3359566.3360053"},{"key":"893_CR25","doi-asserted-by":"crossref","unstructured":"Panteris M, Manschitz S, Calinon S (2020, March) Learning, Generating and Adapting Wave Gestures for Expressive Human-Robot Interaction. 
In: Companion of the 2020 ACM\/IEEE International Conference on Human-Robot Interaction, p 386\u2013388","DOI":"10.1145\/3371382.3378286"},{"issue":"2","key":"893_CR26","first-page":"83","volume":"4","author":"G Trovato","year":"2013","unstructured":"Trovato G, Zecca M, Sessa S, Jamone L, Ham J, Hashimoto K, Takanishi A (2013) Cross-cultural study on human-robot greeting interaction: acceptance and discomfort by Egyptians and Japanese. Paladyn. J Behav Robot 4(2):83\u201393","journal-title":"J Behav Robot"},{"issue":"4","key":"893_CR27","doi-asserted-by":"publisher","first-page":"34","DOI":"10.5772\/60117","volume":"12","author":"G Trovato","year":"2015","unstructured":"Trovato G, Zecca M, Do M, Terlemez \u00d6, Kuramochi M, Waibel A, Takanishi A (2015) A novel greeting selection system for a culture-adaptive humanoid robot. Int J Adv Rob Syst 12(4):34","journal-title":"Int J Adv Rob Syst"},{"key":"893_CR28","doi-asserted-by":"crossref","unstructured":"Andrist S, Ziadee M, Boukaram H, Mutlu B, Sakr M (2015, March) Effects of culture on the credibility of robot speech: A comparison between english and arabic. In: Proceedings of the Tenth Annual ACM\/IEEE International Conference on Human-Robot Interaction, p 157\u2013164","DOI":"10.1145\/2696454.2696464"},{"issue":"4","key":"893_CR29","doi-asserted-by":"publisher","first-page":"1743","DOI":"10.1109\/TASE.2017.2731371","volume":"14","author":"XT Truong","year":"2017","unstructured":"Truong XT, Ngo TD (2017) Toward socially aware robot navigation in dynamic and crowded environments: A proactive social motion model. IEEE Trans Autom Sci Eng 14(4):1743\u20131760","journal-title":"IEEE Trans Autom Sci Eng"},{"issue":"1","key":"893_CR30","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1007\/s12369-019-00560-9","volume":"12","author":"P Patompak","year":"2020","unstructured":"Patompak P, Jeong S, Nilkhamhang I, Chong NY (2020) Learning proxemics for personalized human-robot social interaction. 
Int J Soc Robot 12(1):267\u2013280","journal-title":"Int J Soc Robot"},{"key":"893_CR31","doi-asserted-by":"crossref","unstructured":"Papadopoulos C, Castro N, Nigath A, Davidson R, Faulkes N, Menicatti R, Sgorbissa A (2021) The CARESSES Randomised Controlled Trial: Exploring the Health-Related Impact of Culturally Competent Artificial Intelligence Embedded Into Socially Assistive Robots and Tested in Older Adult Care Homes. International Journal of Social Robotics, 1-12","DOI":"10.1007\/s12369-021-00781-x"},{"key":"893_CR32","doi-asserted-by":"crossref","unstructured":"Sgorbissa A, Papadopoulos I, Bruno B, Koulouglioti C, Recchiuto C (2018, October) Encoding guidelines for a culturally competent robot for elderly care. In: 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, p 1988\u20131995","DOI":"10.1109\/IROS.2018.8594089"},{"key":"893_CR33","doi-asserted-by":"crossref","unstructured":"Khaliq AA, K\u00f6ckemann U, Pecora F, Saffiotti A, Bruno B, Recchiuto CT, Chong NY (2018, October) Culturally aware planning and execution of robot actions. In: 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, p 326\u2013332","DOI":"10.1109\/IROS.2018.8593570"},{"issue":"3","key":"893_CR34","doi-asserted-by":"publisher","first-page":"515","DOI":"10.1007\/s12369-019-00519-w","volume":"11","author":"B Bruno","year":"2019","unstructured":"Bruno B, Recchiuto CT, Papadopoulos I, Saffiotti A, Koulouglioti C, Menicatti R, Sgorbissa A (2019) Knowledge representation for culturally competent personal robots: requirements, design principles, implementation, and assessment. 
Int J Soc Robot 11(3):515\u2013538","journal-title":"Int J Soc Robot"},{"issue":"4","key":"893_CR35","doi-asserted-by":"publisher","first-page":"6559","DOI":"10.1109\/LRA.2020.3015461","volume":"5","author":"CT Recchiuto","year":"2020","unstructured":"Recchiuto CT, Sgorbissa A (2020) A feasibility study of culture-aware cloud services for conversational robots. IEEE Robot Automat Lett 5(4):6559\u20136566","journal-title":"IEEE Robot Automat Lett"},{"key":"893_CR36","doi-asserted-by":"crossref","unstructured":"Recchiuto C, Gava L, Grassi L, Grillo A, Lagomarsino M, Lanza D, Sgorbissa A (2020, June) Cloud services for culture aware conversation: Socially assistive robots and virtual assistants. In: 2020 17th International Conference on Ubiquitous Robots (UR). IEEE, p 270\u2013277","DOI":"10.1109\/UR49135.2020.9144750"},{"key":"893_CR37","unstructured":"Bergmann K, Aksu V, Kopp S (2011) The relation of speech and gestures: Temporal synchrony follows semantic synchrony. In: Proceedings of the 2nd Workshop on Gesture and Speech in Interaction (GeSpIn 2011)"},{"key":"893_CR38","unstructured":"Zaino G, Recchiuto CT, Sgorbissa A (2022) Culture-to-Culture Image Translation with Generative Adversarial Networks. arXiv preprint arXiv:2201.01565"},{"issue":"1","key":"893_CR39","first-page":"35","volume":"13","author":"R Raina","year":"2016","unstructured":"Raina R, Zameer A (2016) A study of non-verbal immediacy behaviour from the perspective of Indian cultural context, gender and experience. Int J Ind Cult Bus Manag 13(1):35\u201356","journal-title":"Int J Ind Cult Bus Manag"},{"issue":"1","key":"893_CR40","doi-asserted-by":"publisher","first-page":"172","DOI":"10.1109\/TPAMI.2019.2929257","volume":"43","author":"Z Cao","year":"2019","unstructured":"Cao Z, Hidalgo G, Simon T, Wei SE, Sheikh Y (2019) OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields. 
IEEE Trans Pattern Anal Mach Intell 43(1):172\u2013186","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"893_CR41","unstructured":"PySceneDetect (2021) PySceneDetect: Intelligent scene cut detection and video splitting tool. Retrieved July 13, 2021, from https:\/\/pyscenedetect.readthedocs.io\/en\/latest"},{"key":"893_CR42","doi-asserted-by":"crossref","unstructured":"Hazen TJ, Shen W, White C (2009, December) Query-by-example spoken term detection using phonetic posteriorgram templates. In: 2009 IEEE Workshop on Automatic Speech Recognition & Understanding. IEEE, p 421\u2013426","DOI":"10.1109\/ASRU.2009.5372889"},{"key":"893_CR43","unstructured":"Povey D, Ghoshal A, Boulianne G, Burget L, Glembek O, Goel N, Vesely K (2011) The Kaldi speech recognition toolkit. In: IEEE 2011 workshop on automatic speech recognition and understanding (No. CONF). IEEE Signal Processing Society"},{"key":"893_CR44","doi-asserted-by":"crossref","unstructured":"Sun L, Kang S, Li K, Meng H (2015, April) Voice conversion using deep bidirectional long short-term memory based recurrent neural networks. In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, p 4869\u20134873","DOI":"10.1109\/ICASSP.2015.7178896"},{"key":"893_CR45","doi-asserted-by":"crossref","unstructured":"Wolfert P, Robinson N, Belpaeme T (2021) A review of evaluation practices of gesture generation in embodied conversational agents. arXiv preprint arXiv:2101.03769","DOI":"10.1109\/THMS.2022.3149173"},{"issue":"10","key":"893_CR46","doi-asserted-by":"publisher","first-page":"344","DOI":"10.5772\/56870","volume":"10","author":"I Mlakar","year":"2013","unstructured":"Mlakar I, Ka\u010di\u010d Z, Rojc M (2013) TTS-driven synthetic behaviour-generation model for artificial bodies. 
Int J Adv Rob Syst 10(10):344","journal-title":"Int J Adv Rob Syst"},{"key":"893_CR47","doi-asserted-by":"crossref","unstructured":"Kucherenko T (2018, October) Data driven non-verbal behavior generation for humanoid robots. In: Proceedings of the 20th ACM International Conference on Multimodal Interaction, p 520-523","DOI":"10.1145\/3242969.3264970"}],"container-title":["International Journal of Social Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s12369-022-00893-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s12369-022-00893-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s12369-022-00893-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,8,31]],"date-time":"2022-08-31T09:34:11Z","timestamp":1661938451000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s12369-022-00893-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,3]]},"references-count":47,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2022,8]]}},"alternative-id":["893"],"URL":"https:\/\/doi.org\/10.1007\/s12369-022-00893-y","relation":{},"ISSN":["1875-4791","1875-4805"],"issn-type":[{"value":"1875-4791","type":"print"},{"value":"1875-4805","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,6,3]]},"assertion":[{"value":"18 May 2022","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 June 2022","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Informed consent, constituted by the first page of the online survey, was obtained from all subjects involved in this study. Institutional Review Board approval was not required because responses to the survey were anonymous.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Informed consent"}}]}}