{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T08:58:55Z","timestamp":1773392335947,"version":"3.50.1"},"reference-count":77,"publisher":"Springer Science and Business Media LLC","issue":"8","license":[{"start":{"date-parts":[[2023,4,2]],"date-time":"2023-04-02T00:00:00Z","timestamp":1680393600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,4,2]],"date-time":"2023-04-02T00:00:00Z","timestamp":1680393600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62061136001"],"award-info":[{"award-number":["62061136001"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"crossref","award":["TRR 169"],"award-info":[{"award-number":["TRR 169"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Office of China Postdoctoral Council"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J of Soc Robotics"],"published-print":{"date-parts":[[2023,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>To enhance human-robot social interaction, it is essential for robots to process multiple social cues in a complex real-world environment. However, incongruency of input information across modalities is inevitable and could be challenging for robots to process. To tackle this challenge, our study adopted the neurorobotic paradigm of crossmodal conflict resolution to make a robot express human-like social attention. A behavioural experiment was conducted on 37 participants for the human study. 
We designed a round-table meeting scenario with three animated avatars to improve ecological validity. Each avatar wore a medical mask to obscure the facial cues of the nose, mouth, and jaw. The central avatar shifted its eye gaze while the peripheral avatars generated sound. Gaze direction and sound locations were either spatially congruent or incongruent. We observed that the central avatar\u2019s dynamic gaze could trigger crossmodal social attention responses. In particular, human performance was better under the congruent audio-visual condition than the incongruent condition. Our saliency prediction model was trained to detect social cues, predict audio-visual saliency, and attend selectively for the robot study. After mounting the trained model on the iCub, the robot was exposed to laboratory conditions similar to the human experiment. While the human performance was overall superior, our trained model demonstrated that it could replicate attention responses similar to humans.<\/jats:p>","DOI":"10.1007\/s12369-023-00993-3","type":"journal-article","created":{"date-parts":[[2023,4,3]],"date-time":"2023-04-03T07:09:58Z","timestamp":1680505798000},"page":"1325-1340","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["A Trained Humanoid Robot can Perform Human-Like Crossmodal Social Attention and Conflict 
Resolution"],"prefix":"10.1007","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5385-2982","authenticated-orcid":false,"given":"Di","family":"Fu","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fares","family":"Abawi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hugo","family":"Carneiro","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Matthias","family":"Kerzel","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ziwei","family":"Chen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Erik","family":"Strahl","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xun","family":"Liu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stefan","family":"Wermter","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,4,2]]},"reference":[{"key":"993_CR1","doi-asserted-by":"publisher","unstructured":"Abawi F, Weber T, Wermter S (2021) GASP: gated attention for saliency prediction. In: Proceedings of the international joint conference on artificial intelligence (IJCAI), pp. 584\u2013591. IJCAI Organization. https:\/\/doi.org\/10.24963\/ijcai.2021\/81","DOI":"10.24963\/ijcai.2021\/81"},{"issue":"1","key":"993_CR2","doi-asserted-by":"publisher","first-page":"25","DOI":"10.5898\/JHRI.6.1.Admoni","volume":"6","author":"H Admoni","year":"2017","unstructured":"Admoni H, Scassellati B (2017) Social eye gaze in human-robot interaction: a review. J Human-Robot Interact 6(1):25\u201363. 
https:\/\/doi.org\/10.5898\/JHRI.6.1.Admoni","journal-title":"J Human-Robot Interact"},{"issue":"11","key":"993_CR3","doi-asserted-by":"publisher","first-page":"2593","DOI":"10.1093\/cercor\/bhl166","volume":"17","author":"T Akiyama","year":"2007","unstructured":"Akiyama T, Kato M, Muramatsu T, Umeda S, Saito F, Kashima H (2007) Unilateral amygdala lesions hamper attentional orienting triggered by gaze direction. Cereb Cortex 17(11):2593\u20132600. https:\/\/doi.org\/10.1093\/cercor\/bhl166","journal-title":"Cereb Cortex"},{"key":"993_CR4","doi-asserted-by":"publisher","first-page":"283","DOI":"10.3389\/fnhum.2015.00283","volume":"9","author":"M Ambrosecchia","year":"2015","unstructured":"Ambrosecchia M, Marino BF, Gawryszewski LG, Riggio L (2015) Spatial stimulus-response compatibility and affordance effects are not ruled by the same mechanisms. Front Hum Neurosci 9:283. https:\/\/doi.org\/10.3389\/fnhum.2015.00283","journal-title":"Front Hum Neurosci"},{"key":"993_CR5","doi-asserted-by":"publisher","DOI":"10.1007\/s12369-020-00690-5","author":"A Andriella","year":"2020","unstructured":"Andriella A, Siqueira H, Fu D, Magg S, Barros P, Wermter S, Torras C, Alenya G (2020) Do I have a personality? Endowing care robots with context-dependent personality traits. Int J Soc Robot. https:\/\/doi.org\/10.1007\/s12369-020-00690-5","journal-title":"Int J Soc Robot"},{"issue":"14","key":"993_CR6","doi-asserted-by":"publisher","first-page":"10209","DOI":"10.1007\/s00521-019-04559-1","volume":"32","author":"AJST Montes-y","year":"2019","unstructured":"Montes-y AJST, FA GMG (2019) Gated multimodal networks. Neural Comput Appl 32(14):10209. https:\/\/doi.org\/10.1007\/s00521-019-04559-1","journal-title":"Neural Comput Appl"},{"key":"993_CR7","volume-title":"Mindblindness: an essay on autism and theory of mind","author":"S Baron-Cohen","year":"1997","unstructured":"Baron-Cohen S (1997) Mindblindness: an essay on autism and theory of mind. 
MIT press, Cambridge"},{"key":"993_CR8","doi-asserted-by":"publisher","DOI":"10.3758\/s13423-020-01766-z","author":"L Battich","year":"2020","unstructured":"Battich L, Fairhurst M, Deroy O (2020) Coordinating attention requires coordinated senses. Psychonom Bull Rev. https:\/\/doi.org\/10.3758\/s13423-020-01766-z","journal-title":"Psychonom Bull Rev"},{"issue":"58","key":"993_CR9","doi-asserted-by":"publisher","first-page":"eabc5044","DOI":"10.1126\/scirobotics.abc5044","volume":"6","author":"M Belkaid","year":"2021","unstructured":"Belkaid M, Kompatsiari K, De Tommaso D, Zablith I, Wykowska A (2021) Mutual gaze with a robot affects human neural activity and delays decision-making processes. Sci Robot 6(58):eabc5044. https:\/\/doi.org\/10.1126\/scirobotics.abc5044","journal-title":"Sci Robot"},{"issue":"1","key":"993_CR10","doi-asserted-by":"publisher","first-page":"118","DOI":"10.1111\/j.1749-6632.2009.04468.x","volume":"1156","author":"E Birmingham","year":"2009","unstructured":"Birmingham E, Kingstone A (2009) Human social attention: a new look at past, present, and future investigations. Ann N Y Acad Sci 1156(1):118\u2013140. https:\/\/doi.org\/10.1111\/j.1749-6632.2009.04468.x","journal-title":"Ann N Y Acad Sci"},{"issue":"6","key":"993_CR11","doi-asserted-by":"publisher","first-page":"535","DOI":"10.1111\/j.1467-7687.2005.00445.x","volume":"8","author":"R Brooks","year":"2005","unstructured":"Brooks R, Meltzoff AN (2005) The development of gaze following and its relation to language. Dev Sci 8(6):535\u2013543. https:\/\/doi.org\/10.1111\/j.1467-7687.2005.00445.x","journal-title":"Dev Sci"},{"issue":"3","key":"993_CR12","doi-asserted-by":"publisher","first-page":"740","DOI":"10.1109\/TPAMI.2018.2815601","volume":"41","author":"Z Bylinskii","year":"2019","unstructured":"Bylinskii Z, Judd T, Oliva A, Torralba A, Durand F (2019) What do different evaluation metrics tell us about saliency models? IEEE Trans Pattern Anal Mach Intell 41(3):740\u2013757. 
https:\/\/doi.org\/10.1109\/TPAMI.2018.2815601","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"993_CR13","doi-asserted-by":"publisher","unstructured":"Carneiro H, Weber C, Wermter S FaVoA: Face-Voice association favours ambiguous speaker detection. In: Proceedings of the 30th international conference on artificial neural networks (ICANN 2021), vol. LNCS 12891:439\u2013450. https:\/\/doi.org\/10.1007\/978-3-030-86362-3_36","DOI":"10.1007\/978-3-030-86362-3_36"},{"issue":"3","key":"993_CR14","doi-asserted-by":"publisher","first-page":"332","DOI":"10.1037\/0033-295x.97.3.332","volume":"97","author":"JD Cohen","year":"1990","unstructured":"Cohen JD, Dunbar K, McClelland JL (1990) On the control of automatic processes: a parallel distributed processing account of the stroop effect. Psychol Rev 97(3):332. https:\/\/doi.org\/10.1037\/0033-295x.97.3.332","journal-title":"Psychol Rev"},{"issue":"10","key":"993_CR15","doi-asserted-by":"publisher","first-page":"5142","DOI":"10.1109\/TIP.2018.2851672","volume":"27","author":"M Cornia","year":"2018","unstructured":"Cornia M, Baraldi L, Serra G, Cucchiara R (2018) Predicting human eye fixations via an LSTM-based saliency attentive model. IEEE Trans Image Process 27(10):5142\u20135154. https:\/\/doi.org\/10.1109\/TIP.2018.2851672","journal-title":"IEEE Trans Image Process"},{"issue":"6","key":"993_CR16","doi-asserted-by":"publisher","first-page":"204166952110584","DOI":"10.1177\/20416695211058480","volume":"12","author":"M Dalmaso","year":"2021","unstructured":"Dalmaso M, Zhang X, Galfano G, Castelli L (2021) Face masks do not alter gaze cueing of attention: evidence from the Covid-19 pandemic. I-Perception 12(6):20416695211058480. 
https:\/\/doi.org\/10.1177\/20416695211058480","journal-title":"I-Perception"},{"issue":"4","key":"993_CR17","doi-asserted-by":"publisher","DOI":"10.1016\/j.heliyon.2018.e00595","volume":"4","author":"D Doruk","year":"2018","unstructured":"Doruk D, Chanes L, Malavera A, Merabet LB, Valero-Cabr\u00e9 A, Fregni F (2018) Cross-modal cueing effects of visuospatial attention on conscious somatosensory perception. Heliyon 4(4):e00595. https:\/\/doi.org\/10.1016\/j.heliyon.2018.e00595","journal-title":"Heliyon"},{"issue":"1","key":"993_CR18","doi-asserted-by":"publisher","first-page":"143","DOI":"10.3758\/BF03203267","volume":"16","author":"BA Eriksen","year":"1974","unstructured":"Eriksen BA, Eriksen CW (1974) Effects of noise letters upon the identification of a target letter in a nonsearch task. Percept Psychophys 16(1):143\u2013149. https:\/\/doi.org\/10.3758\/BF03203267","journal-title":"Percept Psychophys"},{"issue":"1","key":"993_CR19","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1207\/s15327078in0501_2","volume":"5","author":"T Farroni","year":"2004","unstructured":"Farroni T, Massaccesi S, Pividori D, Johnson MH (2004) Gaze following in newborns. Infancy 5(1):39\u201360. https:\/\/doi.org\/10.1207\/s15327078in0501_2","journal-title":"Infancy"},{"issue":"3","key":"993_CR20","doi-asserted-by":"publisher","first-page":"490","DOI":"10.3758\/BF03208827","volume":"5","author":"CK Friesen","year":"1998","unstructured":"Friesen CK, Kingstone A (1998) The eyes have it! Reflexive orienting is triggered by nonpredictive gaze. Psychonom Bull Rev 5(3):490\u2013495. https:\/\/doi.org\/10.3758\/BF03208827","journal-title":"Psychonom Bull Rev"},{"issue":"2","key":"993_CR21","doi-asserted-by":"publisher","first-page":"319","DOI":"10.1037\/0096-1523.30.2.319","volume":"30","author":"CK Friesen","year":"2004","unstructured":"Friesen CK, Ristic J, Kingstone A (2004) Attentional effects of counterpredictive gaze and arrow cues. 
J Exp Psychol Hum Percept Perform 30(2):319. https:\/\/doi.org\/10.1037\/0096-1523.30.2.319","journal-title":"J Exp Psychol Hum Percept Perform"},{"issue":"4","key":"993_CR22","doi-asserted-by":"publisher","first-page":"694","DOI":"10.1037\/0033-2909.133.4.694","volume":"133","author":"A Frischen","year":"2007","unstructured":"Frischen A, Bayliss AP, Tipper SP (2007) Gaze cueing of attention: visual attention, social cognition, and individual differences. Psychol Bull 133(4):694. https:\/\/doi.org\/10.1037\/0033-2909.133.4.694","journal-title":"Psychol Bull"},{"key":"993_CR23","unstructured":"Fu D, Barros P, Parisi GI, Wu H, Magg S, Liu X, Wermter S (2018) Assessing the contribution of semantic congruency to multisensory integration and conflict resolution. In: IROS 2018 Workshop on crossmodal learning for intelligent robotics. IEEE. https:\/\/arxiv.org\/abs\/1810.06748"},{"key":"993_CR24","doi-asserted-by":"publisher","first-page":"10","DOI":"10.3389\/fnint.2020.00010","volume":"14","author":"D Fu","year":"2020","unstructured":"Fu D, Weber C, Yang G, Kerzel M, Nan W, Barros P, Wu H, Liu X, Wermter S (2020) What can computational models learn from human selective attention? A review from an audiovisual unimodal and crossmodal perspective. Front Integr Neurosci 14:10. https:\/\/doi.org\/10.3389\/fnint.2020.00010","journal-title":"Front Integr Neurosci"},{"key":"993_CR25","doi-asserted-by":"publisher","unstructured":"Gao R, Grauman K (2019) 2.5D visual sound. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR), IEEE. pp. 324\u2013333. https:\/\/doi.org\/10.1109\/CVPR.2019.00041","DOI":"10.1109\/CVPR.2019.00041"},{"key":"993_CR26","doi-asserted-by":"publisher","first-page":"1541","DOI":"10.3389\/fpsyg.2021.669432","volume":"12","author":"M Gori","year":"2021","unstructured":"Gori M, Schiatti L, Amadeo MB (2021) Masking emotions: face masks impair how we read emotions. Front Psychol 12:1541. 
https:\/\/doi.org\/10.3389\/fpsyg.2021.669432","journal-title":"Front Psychol"},{"key":"993_CR27","doi-asserted-by":"publisher","DOI":"10.1016\/j.dcn.2019.100671","volume":"38","author":"J Guo","year":"2019","unstructured":"Guo J, Luo X, Wang E, Li B, Chang Q, Sun L, Song Y (2019) Abnormal alpha modulation in response to human eye gaze predicts inattention severity in children with ADHD. Dev Cogn Neurosci 38:100671. https:\/\/doi.org\/10.1016\/j.dcn.2019.100671","journal-title":"Dev Cogn Neurosci"},{"key":"993_CR28","doi-asserted-by":"publisher","unstructured":"Hara K, Kataoka H, Satoh Y (2018) Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and Imagenet? In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), IEEE. pp. 6546\u20136555. https:\/\/doi.org\/10.1109\/CVPR.2018.00685","DOI":"10.1109\/CVPR.2018.00685"},{"key":"993_CR29","doi-asserted-by":"publisher","unstructured":"He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE international conference on computer vision (ICCV), IEEE, USA. pp. 1026\u20131034. https:\/\/doi.org\/10.1109\/ICCV.2015.123","DOI":"10.1109\/ICCV.2015.123"},{"key":"993_CR30","doi-asserted-by":"publisher","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. In: European conference on computer vision. Springer. pp. 630\u2013645. https:\/\/doi.org\/10.1007\/978-3-319-46493-0_38","DOI":"10.1007\/978-3-319-46493-0_38"},{"key":"993_CR31","doi-asserted-by":"publisher","unstructured":"Jain S, Yarlagadda P, Jyoti S, Karthik S, Subramanian R, Gandhi V (2020) ViNet: Pushing the limits of visual modality for audio-visual saliency prediction. In: Proceedings of the IEEE\/RSJ international conference on intelligent robots and systems (IROS), IEEE. pp. 3520\u20133527. 
https:\/\/doi.org\/10.1109\/IROS51168.2021.9635989","DOI":"10.1109\/IROS51168.2021.9635989"},{"issue":"45","key":"993_CR32","doi-asserted-by":"publisher","first-page":"16208","DOI":"10.1073\/pnas.1411333111","volume":"111","author":"S Jessen","year":"2014","unstructured":"Jessen S, Grossmann T (2014) Unconscious discrimination of social cues from eye whites in infants. Proc Natl Acad Sci 111(45):16208\u201316213. https:\/\/doi.org\/10.1073\/pnas.1411333111","journal-title":"Proc Natl Acad Sci"},{"issue":"2","key":"993_CR33","doi-asserted-by":"publisher","first-page":"233","DOI":"10.1111\/1467-7687.00036","volume":"1","author":"S Johnson","year":"1998","unstructured":"Johnson S, Slaughter V, Carey S (1998) Whose gaze will infants follow? the elicitation of gaze-following in 12-month-olds. Dev Sci 1(2):233\u2013238. https:\/\/doi.org\/10.1111\/1467-7687.00036","journal-title":"Dev Sci"},{"key":"993_CR34","unstructured":"Kerzel M, Wermter S (2020) Towards a data generation framework for affective shared perception and social cue learning using virtual avatars. In: Workshop on affective shared perception, ICDL 2020, IEEE international conference on development and learning https:\/\/www.whisperproject.eu\/images\/WASP2020 submissions\/9_ICDL_Workshop_WASPKerzelWermter.pdf"},{"issue":"3","key":"993_CR35","doi-asserted-by":"publisher","first-page":"525","DOI":"10.1007\/s12369-019-00565-4","volume":"13","author":"K Kompatsiari","year":"2021","unstructured":"Kompatsiari K, Ciardo F, Tikhanoff V, Metta G, Wykowska A (2021) It\u2019s in the eyes: the engaging role of eye contact in HRI. Int J Soc Robot 13(3):525\u2013535. https:\/\/doi.org\/10.1007\/s12369-019-00565-4","journal-title":"Int J Soc Robot"},{"key":"993_CR36","doi-asserted-by":"publisher","unstructured":"K\u00f6p\u00fckl\u00fc O, Taseska M, Rigoll G (2021) How to design a three-stage architecture for audio-visual active speaker detection in the wild. 
In: Proceedings of the IEEE\/CVF international conference on computer vision (ICCV), IEEE. pp. 1193\u20131203. https:\/\/doi.org\/10.1109\/ICCV48922.2021.00123","DOI":"10.1109\/ICCV48922.2021.00123"},{"key":"993_CR37","doi-asserted-by":"publisher","unstructured":"Kornblum S, Lee JW (1995) Stimulus-response compatibility with relevant and irrelevant stimulus dimensions that do and do not overlap with the response. J Exp Psychol Hum Percept Perform 21(4):855. https:\/\/doi.org\/10.1037\/\/0096-1523.21.4.855","DOI":"10.1037\/\/0096-1523.21.4.855"},{"issue":"2","key":"993_CR38","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1016\/s1364-6613(99)01436-9","volume":"4","author":"SR Langton","year":"2000","unstructured":"Langton SR, Watt RJ, Bruce V (2000) Do the eyes have it? Cues to the direction of social attention. Trends Cogn Sci 4(2):50\u201359. https:\/\/doi.org\/10.1016\/s1364-6613(99)01436-9","journal-title":"Trends Cogn Sci"},{"issue":"2","key":"993_CR39","doi-asserted-by":"publisher","first-page":"1643","DOI":"10.1016\/j.neuroimage.2010.08.074","volume":"54","author":"I Laube","year":"2011","unstructured":"Laube I, Kamphuis S, Dicke PW, Thier P (2011) Cortical processing of head-and eye-gaze cues guiding joint social attention. Neuroimage 54(2):1643\u20131653. https:\/\/doi.org\/10.1016\/j.neuroimage.2010.08.074","journal-title":"Neuroimage"},{"issue":"7","key":"993_CR40","doi-asserted-by":"publisher","first-page":"1347","DOI":"10.1037\/dev0000524","volume":"54","author":"X Liu","year":"2018","unstructured":"Liu X, Liu T, Shangguan F, S\u00f8rensen TA, Liu Q, Shi J (2018) Neurodevelopment of conflict adaptation: evidence from event-related potentials. Dev Psychol 54(7):1347. https:\/\/doi.org\/10.1037\/dev0000524","journal-title":"Dev Psychol"},{"key":"993_CR41","doi-asserted-by":"publisher","unstructured":"MacLeod CM (1991) Half a century of research on the stroop effect: an integrative review. Psychol Bull 109(2):163. 
https:\/\/doi.org\/10.1037\/0033-2909.109.2.163","DOI":"10.1037\/0033-2909.109.2.163"},{"issue":"7","key":"993_CR42","doi-asserted-by":"publisher","first-page":"748","DOI":"10.1016\/j.cub.2014.02.021","volume":"24","author":"RK Maddox","year":"2014","unstructured":"Maddox RK, Pospisil DA, Stecker GC, Lee AK (2014) Directing eye gaze enhances auditory spatial cue discrimination. Curr Biol 24(7):748\u2013752. https:\/\/doi.org\/10.1016\/j.cub.2014.02.021","journal-title":"Curr Biol"},{"issue":"4","key":"993_CR43","doi-asserted-by":"publisher","first-page":"679","DOI":"10.1037\/0021-843X.112.4.679","volume":"112","author":"HE McNeely","year":"2003","unstructured":"McNeely HE, West R, Christensen BK, Alain C (2003) Neurophysiological evidence for disturbances of conflict processing in patients with schizophrenia. J Abnorm Psychol 112(4):679. https:\/\/doi.org\/10.1037\/0021-843X.112.4.679","journal-title":"J Abnorm Psychol"},{"issue":"5","key":"993_CR44","doi-asserted-by":"publisher","first-page":"269","DOI":"10.1111\/j.1467-8721.2007.00518.x","volume":"16","author":"P Mundy","year":"2007","unstructured":"Mundy P, Newell L (2007) Attention, joint attention, and social cognition. Curr Dir Psychol Sci 16(5):269\u2013274. https:\/\/doi.org\/10.1111\/j.1467-8721.2007.00518.x","journal-title":"Curr Dir Psychol Sci"},{"issue":"4","key":"993_CR45","doi-asserted-by":"publisher","first-page":"625","DOI":"10.1080\/17470210802486027","volume":"62","author":"R Newport","year":"2009","unstructured":"Newport R, Howarth S (2009) Social gaze cueing to auditory locations. Q J Experiment Psychol 62(4):625\u2013634. 
https:\/\/doi.org\/10.1080\/17470210802486027","journal-title":"Q J Experiment Psychol"},{"issue":"3","key":"993_CR46","doi-asserted-by":"publisher","first-page":"54","DOI":"10.3390\/robotics8030054","volume":"8","author":"O Nocentini","year":"2019","unstructured":"Nocentini O, Fiorini L, Acerbi G, Sorrentino A, Mancioppi G, Cavallo F (2019) A survey of behavioral models for social robots. Robotics 8(3):54. https:\/\/doi.org\/10.3390\/robotics8030054","journal-title":"Robotics"},{"issue":"1","key":"993_CR47","doi-asserted-by":"publisher","first-page":"339","DOI":"10.1016\/j.concog.2007.06.014","volume":"17","author":"P Nuku","year":"2008","unstructured":"Nuku P, Bekkering H (2008) Joint attention: inferring what others perceive (and don\u2019t perceive). Conscious Cogn 17(1):339\u2013349. https:\/\/doi.org\/10.1016\/j.concog.2007.06.014","journal-title":"Conscious Cogn"},{"issue":"1","key":"993_CR48","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1016\/j.concog.2009.07.012","volume":"19","author":"P Nuku","year":"2010","unstructured":"Nuku P, Bekkering H (2010) When one sees what the other hears: crossmodal attentional modulation for gazed and non-gazed upon auditory targets. Conscious Cogn 19(1):135\u2013143. https:\/\/doi.org\/10.1016\/j.concog.2009.07.012","journal-title":"Conscious Cogn"},{"issue":"3","key":"993_CR49","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1016\/j.tics.2008.12.006","volume":"13","author":"L Nummenmaa","year":"2009","unstructured":"Nummenmaa L, Calder AJ (2009) Neural mechanisms of social attention. Trends Cogn Sci 13(3):135\u2013143. https:\/\/doi.org\/10.1016\/j.tics.2008.12.006","journal-title":"Trends Cogn Sci"},{"key":"993_CR50","doi-asserted-by":"publisher","unstructured":"Parisi GI, Barros P, Fu D, Magg S, Wu H, Liu X, Wermter S (2018) A neurorobotic experiment for crossmodal conflict resolution in complex environments. 
In: Proceedings of the IEEE\/RSJ international conference on intelligent robots and systems (IROS), IEEE. pp. 2330\u20132335. https:\/\/doi.org\/10.1109\/IROS.2018.8594036","DOI":"10.1109\/IROS.2018.8594036"},{"key":"993_CR51","unstructured":"Pfeifer-Lessmann N, Pfeifer T, Wachsmuth I (2012) An operational model of joint attention-timing of gaze patterns in interactions between humans and a virtual human. In: Proceedings of the annual meeting of the cognitive science society, vol.\u00a034. https:\/\/escholarship.org\/uc\/item\/4f49f71h"},{"key":"993_CR52","unstructured":"Posner M, Cohen Y (1984) Components of visual orienting. Attention and performance X: Control of language processes. Psychology Press, London, pp 531\u2013556"},{"key":"993_CR53","doi-asserted-by":"publisher","unstructured":"Posner MI, Snyder CR, Davidson BJ (1980) Attention and the detection of signals. J Exp Psychol Gen 109(2):160. https:\/\/doi.org\/10.1037\/0096-3445.109.2.160","DOI":"10.1037\/0096-3445.109.2.160"},{"key":"993_CR54","doi-asserted-by":"publisher","DOI":"10.1201\/9780203022795","volume-title":"Stimulus-response compatibility principles: data, theory, and application","author":"RW Proctor","year":"2006","unstructured":"Proctor RW, Vu KPL (2006) Stimulus-response compatibility principles: data, theory, and application. CRC Press, Cambridge"},{"key":"993_CR55","doi-asserted-by":"publisher","unstructured":"Rachavarapu KK, Sundaresha V, Aakanksha Rajagopalan A (2021) Localize to binauralize: Audio spatialization from visual sound source localization. In: Proceedings of the IEEE\/CVF international conference on computer vision, IEEE. pp. 1930\u20131939. https:\/\/doi.org\/10.1109\/ICCV48922.2021.00194","DOI":"10.1109\/ICCV48922.2021.00194"},{"key":"993_CR56","doi-asserted-by":"publisher","unstructured":"Raptopoulou A, Komnidis A, Bamidis PD, Astaras A (2021) Human-robot interaction for social skill development in children with ASD: a literature review. 
Healthcare Technol Lett 8(4):90\u201396. https:\/\/doi.org\/10.1049\/htl2.12013","DOI":"10.1049\/htl2.12013"},{"issue":"5","key":"993_CR57","doi-asserted-by":"publisher","first-page":"964","DOI":"10.3758\/bf03194129","volume":"14","author":"J Ristic","year":"2007","unstructured":"Ristic J, Wright A, Kingstone A (2007) Attentional control and reflexive orienting to gaze and arrow cues. Psychonom Bull Rev 14(5):964\u2013969. https:\/\/doi.org\/10.3758\/bf03194129","journal-title":"Psychonom Bull Rev"},{"key":"993_CR58","doi-asserted-by":"publisher","unstructured":"Roth J, Chaudhuri S, Klejch O, Marvin R, Gallagher A, Kaver L, Ramaswamy S, Stopczynski A, Schmid C, Xi Z, Pantofaru C (2020) AVA-ActiveSpeaker: An audio-visual dataset for active speaker detection. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE. pp. 4492\u20134496. https:\/\/doi.org\/10.1109\/ICASSP40776.2020.9053900","DOI":"10.1109\/ICASSP40776.2020.9053900"},{"key":"993_CR59","doi-asserted-by":"publisher","first-page":"275","DOI":"10.1146\/annurev-bioeng-071811-150036","volume":"14","author":"B Scassellati","year":"2012","unstructured":"Scassellati B, Admoni H, Matari\u0107 M (2012) Robots for use in autism research. Ann Rev Biomed Eng 14:275\u2013294. https:\/\/doi.org\/10.1146\/annurev-bioeng-071811-150036","journal-title":"Ann Rev Biomed Eng"},{"key":"993_CR60","doi-asserted-by":"publisher","unstructured":"Schuller AM, Rossion B (2004) Perception of static eye gaze direction facilitates subsequent early visual processing. Clin Neurophysiol 115(5):1161\u20131168. https:\/\/doi.org\/10.1016\/j.clinph.2003.12.022","DOI":"10.1016\/j.clinph.2003.12.022"},{"issue":"8","key":"993_CR61","doi-asserted-by":"publisher","first-page":"1204","DOI":"10.1016\/j.neubiorev.2009.06.001","volume":"33","author":"A Senju","year":"2009","unstructured":"Senju A, Johnson MH (2009) Atypical eye contact in autism: models, mechanisms and development. 
Neurosci Biobehav Rev 33(8):1204\u20131214. https:\/\/doi.org\/10.1016\/j.neubiorev.2009.06.001","journal-title":"Neurosci Biobehav Rev"},{"key":"993_CR62","doi-asserted-by":"publisher","first-page":"5","DOI":"10.3389\/fnint.2010.00005","volume":"4","author":"SV Shepherd","year":"2010","unstructured":"Shepherd SV (2010) Following gaze: gaze-following behavior as a window into social cognition. Front Integr Neurosci 4:5. https:\/\/doi.org\/10.3389\/fnint.2010.00005","journal-title":"Front Integr Neurosci"},{"key":"993_CR63","doi-asserted-by":"crossref","unstructured":"Shimaya J, Yoshikawa Y, Matsumoto Y, Kumazaki H, Ishiguro H, Mimura M, Miyao M (2016) Advantages of indirect conversation via a desktop humanoid robot: Case study on daily life guidance for adolescents with autism spectrum disorders. In: 2016 25th IEEE international symposium on robot and human interactive communication (RO-MAN), IEEE. pp. 831\u2013836. https:\/\/doi.org\/10.1109\/ROMAN.2016.7745215","DOI":"10.1109\/ROMAN.2016.7745215"},{"key":"993_CR64","doi-asserted-by":"publisher","unstructured":"Simon JR, Rudell AP (1967) Auditory SR compatibility: the effect of an irrelevant cue on information processing. J Appl Psychol 51(3):300. https:\/\/doi.org\/10.1037\/h0020586","DOI":"10.1037\/h0020586"},{"issue":"6","key":"993_CR65","doi-asserted-by":"publisher","first-page":"1024","DOI":"10.3758\/BF03206438","volume":"12","author":"S Soto-Faraco","year":"2005","unstructured":"Soto-Faraco S, Sinnett S, Alsius A, Kingstone A (2005) Spatial orienting of tactile attention induced by social cues. Psychonom Bull Rev 12(6):1024\u20131031. 
https:\/\/doi.org\/10.3758\/BF03206438","journal-title":"Psychonom Bull Rev"},{"key":"993_CR66","doi-asserted-by":"publisher","first-page":"1","DOI":"10.7554\/eLife.31670","volume":"7","author":"HF Sperdin","year":"2018","unstructured":"Sperdin HF, Coito A, Kojovic N, Rihs TA, Jan RK, Franchini M, Plomp G, Vulliemoz S, Eliez S, Michel CM, Schaer M (2018) Early alterations of social brain networks in young children with autism. ELife 7:1\u201323. https:\/\/doi.org\/10.7554\/eLife.31670","journal-title":"ELife"},{"key":"993_CR67","doi-asserted-by":"publisher","unstructured":"Srinivasan SM, Eigsti IM, Neelly L, Bhat AN (2016) The effects of embodied rhythm and robotic interventions on the spontaneous and responsive social attention patterns of children with autism spectrum disorder (ASD): a pilot randomized controlled trial. Res Autism Spect Disord 27:54\u201372. https:\/\/doi.org\/10.1016\/j.rasd.2016.01.004","DOI":"10.1016\/j.rasd.2016.01.004"},{"issue":"1","key":"993_CR68","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s41235-022-00360-2","volume":"7","author":"A Stajduhar","year":"2022","unstructured":"Stajduhar A, Ganel T, Avidan G, Rosenbaum RS, Freud E (2022) Face masks disrupt holistic processing and face perception in school-age children. Cogn Res Princ Implic 7(1):1\u201310. https:\/\/doi.org\/10.1186\/s41235-022-00360-2","journal-title":"Cogn Res Princ Implic"},{"key":"993_CR69","doi-asserted-by":"crossref","unstructured":"Stroop JR (1935) Studies of interference in serial verbal reactions. J Exp Psychol 18(6):643. https:\/\/doi.org\/10.1037\/h0054651","DOI":"10.1037\/h0054651"},{"key":"993_CR70","doi-asserted-by":"publisher","unstructured":"Tavakoli HR, Borji A, Kannala J, Rahtu E (2020) Deep audio-visual saliency: Baseline model and data. In: ACM symposium on eye tracking research and applications, ETRA \u201920 Short Papers. Association for Computing Machinery, New York, NY, USA. pp. 1\u20135. 
https:\/\/doi.org\/10.1145\/3379156.3391337","DOI":"10.1145\/3379156.3391337"},{"key":"993_CR71","doi-asserted-by":"publisher","unstructured":"Tsiami A, Koutras P, Maragos P (2020) STAViS: Spatio-temporal audiovisual saliency network. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR), IEEE. pp. 4766\u20134776. https:\/\/doi.org\/10.1109\/CVPR42600.2020.00482","DOI":"10.1109\/CVPR42600.2020.00482"},{"issue":"4","key":"993_CR72","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13636-020-0171-y","volume":"2020","author":"J Wang","year":"2020","unstructured":"Wang J, Wang J, Qian K, Xie X, Kuang J (2020) Binaural sound localization based on deep neural network and affinity propagation clustering in mismatched HRTF condition. EURASIP J Audio Speech Music Process 2020(4):1\u201316. https:\/\/doi.org\/10.1186\/s13636-020-0171-y","journal-title":"EURASIP J Audio Speech Music Process"},{"key":"993_CR73","doi-asserted-by":"publisher","unstructured":"Wightman FL, Kistler DJ (1997) Monaural sound localization revisited. J Acoust Soc Am 101(2):1050\u20131063. https:\/\/doi.org\/10.1121\/1.418029","DOI":"10.1121\/1.418029"},{"key":"993_CR74","doi-asserted-by":"publisher","first-page":"70","DOI":"10.3389\/fpsyg.2018.00070","volume":"9","author":"C Willemse","year":"2018","unstructured":"Willemse C, Marchesi S, Wykowska A (2018) Robot faces that follow gaze facilitate attentional engagement and increase their likeability. Front Psychol 9:70. https:\/\/doi.org\/10.3389\/fpsyg.2018.00070","journal-title":"Front Psychol"},{"key":"993_CR75","doi-asserted-by":"crossref","unstructured":"Wu X, Wu Z, Ju L, Wang S (2021) Binaural Audio-Visual Localization, vol. 35(4). AAAI. https:\/\/doi.org\/10.1609\/aaai.v35i4.16403","DOI":"10.1609\/aaai.v35i4.16403"},{"key":"993_CR76","doi-asserted-by":"publisher","unstructured":"Xu M, Liu Y, Hu R, He F (2018) Find who to look at: turning from action to saliency. 
IEEE Trans Image Process 27(9):4529\u20134544. https:\/\/doi.org\/10.1109\/TIP.2018.2837106","DOI":"10.1109\/TIP.2018.2837106"},{"key":"993_CR77","doi-asserted-by":"publisher","unstructured":"Yeung HH, Werker JF (2013) Lip movements affect infants\u2019 audiovisual speech perception. Psychol Sci 24(5):603\u2013612. https:\/\/doi.org\/10.1177\/0956797612458802","DOI":"10.1177\/0956797612458802"}],"container-title":["International Journal of Social Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s12369-023-00993-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s12369-023-00993-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s12369-023-00993-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,29]],"date-time":"2023-08-29T10:25:51Z","timestamp":1693304751000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s12369-023-00993-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,2]]},"references-count":77,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2023,8]]}},"alternative-id":["993"],"URL":"https:\/\/doi.org\/10.1007\/s12369-023-00993-3","relation":{},"ISSN":["1875-4791","1875-4805"],"issn-type":[{"value":"1875-4791","type":"print"},{"value":"1875-4805","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,2]]},"assertion":[{"value":"8 March 2023","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 April 2023","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Informed consent was obtained from all participants in the study.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Informed Consent"}},{"value":"All procedures performed in studies involving participants were in accordance with the ethical standards of the institutional and national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical Approval"}}]}}