{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T22:31:08Z","timestamp":1776119468828,"version":"3.50.1"},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"7","license":[{"start":{"date-parts":[[2020,6,30]],"date-time":"2020-06-30T00:00:00Z","timestamp":1593475200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,6,30]],"date-time":"2020-06-30T00:00:00Z","timestamp":1593475200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"DFG","award":["TRR 169"],"award-info":[{"award-number":["TRR 169"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Comput & Applic"],"published-print":{"date-parts":[[2021,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Whenever we are addressing a specific object or refer to a certain spatial location, we are using referential or deictic gestures usually accompanied by some verbal description. Particularly, pointing gestures are necessary to dissolve ambiguities in a scene and they are of crucial importance when verbal communication may fail due to environmental conditions or when two persons simply do not speak the same language. With the currently increasing advances of humanoid robots and their future integration in domestic domains, the development of gesture interfaces complementing human\u2013robot interaction scenarios is of substantial interest. The implementation of an intuitive gesture scenario is still challenging because both the pointing intention and the corresponding object have to be correctly recognized in real time. The demand increases when considering pointing gestures in a cluttered environment, as is the case in households. 
Also, humans perform pointing in many different ways and those variations have to be captured. Research in this field often proposes a set of geometrical computations which do not scale well with the number of gestures and objects and use specific markers or a predefined set of pointing directions. In this paper, we propose an unsupervised learning approach to model the distribution of pointing gestures using a growing-when-required (GWR) network. We introduce an interaction scenario with a humanoid robot and define the so-called ambiguity classes. Our implementation for the hand and object detection is independent of any markers or skeleton models; thus, it can be easily reproduced. Our evaluation comparing a baseline computer vision approach with our GWR model shows that the pointing-object association is well learned even in cases of ambiguities resulting from close object proximity.<\/jats:p>","DOI":"10.1007\/s00521-020-05109-w","type":"journal-article","created":{"date-parts":[[2020,6,30]],"date-time":"2020-06-30T07:02:46Z","timestamp":1593500566000},"page":"2297-2319","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Solving visual object ambiguities when pointing: an unsupervised learning 
approach"],"prefix":"10.1007","volume":"33","author":[{"given":"Doreen","family":"Jirak","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David","family":"Biertimpel","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Matthias","family":"Kerzel","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stefan","family":"Wermter","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2020,6,30]]},"reference":[{"issue":"1","key":"5109_CR1","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1111\/infa.12261","volume":"24","author":"K Astor","year":"2019","unstructured":"Astor K, Gredebaeck G (2019) Gaze following in 4.5- and 6-month-old infants: The impact of proximity on standard gaze following performance tests. Infancy 24(1):79\u201389","journal-title":"Infancy"},{"issue":"3","key":"5109_CR2","doi-asserted-by":"publisher","first-page":"359","DOI":"10.1111\/j.2044-835X.2011.02043.x","volume":"30","author":"T Behne","year":"2012","unstructured":"Behne T, Liszkowski U, Carpenter M, Tomasello M (2012) Twelve-month-olds comprehension and production of pointing. Brit J Dev Psychol 30(3):359\u2013375","journal-title":"Brit J Dev Psychol"},{"issue":"2","key":"5109_CR3","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1109\/TSMCC.2004.826268","volume":"34","author":"C Breazeal","year":"2004","unstructured":"Breazeal C (2004) Social interactions in HRI: the robot view. 
Trans Syst Man Cybern Part C 34(2):181\u2013186","journal-title":"Trans Syst Man Cybern Part C"},{"key":"5109_CR4","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1016\/j.cviu.2016.03.004","volume":"149","author":"G Canal","year":"2016","unstructured":"Canal G, Escalera S, Angulo C (2016) A real-time human\u2013robot interaction system based on gestures for assistive scenarios. Comput Vis Image Underst 149:65\u201377","journal-title":"Comput Vis Image Underst"},{"issue":"4","key":"5109_CR5","doi-asserted-by":"publisher","first-page":"i","DOI":"10.2307\/1166214","volume":"63","author":"M Carpenter","year":"1998","unstructured":"Carpenter M, Nagell K, Tomasello M, Butterworth G, Moore C (1998) Social cognition, joint attention, and communicative competence from 9 to 15 months of age. Monogr Soc Res Child Dev 63(4):i\u2013174","journal-title":"Monogr Soc Res Child Dev"},{"key":"5109_CR6","doi-asserted-by":"crossref","unstructured":"Churamani N, Kerzel M, Strahl E, Barros P, Wermter S (2017) Teaching emotion expressions to a human companion robot using deep neural architectures. In: International joint conference on neural networks (IJCNN). IEEE, Anchorage, Alaska, pp 627\u2013634","DOI":"10.1109\/IJCNN.2017.7965911"},{"key":"5109_CR7","unstructured":"Cosgun A, Trevor JA, Christensen HI (2015) Did you mean this object? Detecting ambiguity in pointing gesture targets. In: Towards a framework for joint action workshop, HRI 2015 Portland, OR, USA"},{"key":"5109_CR8","doi-asserted-by":"publisher","first-page":"1573","DOI":"10.1007\/s11263-017-1058-y","volume":"126","author":"S Escalera","year":"2018","unstructured":"Escalera S, Gonz\u00e0lez J, Escalante HJ, Bar\u00f3 X, Guyon I (2018) Looking at people special issue. Int J Comput Vis 126:1573\u20131405","journal-title":"Int J Comput Vis"},{"key":"5109_CR9","doi-asserted-by":"crossref","unstructured":"Gromov B, Gambardella LM, Giusti A (2018) Robot identification and localization with pointing gestures. 
In: 2018 IEEE\/RSJ international conference on intelligent robots and systems (IROS), pp 3921\u20133928","DOI":"10.1109\/IROS.2018.8594174"},{"key":"5109_CR10","doi-asserted-by":"crossref","unstructured":"Gulzar K, Kyrki V (2015) See what I mean-probabilistic optimization of robot pointing gestures. In: 2015 IEEE-RAS 15th international conference on humanoid robots (humanoids), pp 953\u2013958","DOI":"10.1109\/HUMANOIDS.2015.7363484"},{"issue":"1","key":"5109_CR11","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1515\/pjbr-2019-0005","volume":"10","author":"B Hafez","year":"2019","unstructured":"Hafez B, Weber C, Kerzel M, Wermter S (2019) Deep intrinsically motivated continuous actor-critic for efficient robotic visuomotor skill learning. Paladyn J Behav Robot 10(1):14\u201329","journal-title":"Paladyn J Behav Robot"},{"key":"5109_CR12","unstructured":"Jirak D, Wermter S (2017) Potentials and limitations of deep neural networks for cognitive robots. In: EUCog meeting proceedings. EUCog. Z\u00fcrich, Switzerland"},{"key":"5109_CR13","doi-asserted-by":"crossref","unstructured":"Kerzel M, Strahl E, Magg S, Navarro-Guerrero N, Heinrich S, Wermter S (2017) Nico\u2014neuro-inspired companion: a developmental humanoid robot platform for multimodal interaction. In: Proceedings of the IEEE international symposium on robot and human interactive communication (RO-MAN), pp 113\u2013120","DOI":"10.1109\/ROMAN.2017.8172289"},{"key":"5109_CR14","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1016\/j.cogdev.2017.06.001","volume":"43","author":"T Kishimoto","year":"2017","unstructured":"Kishimoto T (2017) Cross-sectional and longitudinal observations of pointing gestures by infants and their caregivers in Japan. 
Cogn Dev 43:235\u2013244","journal-title":"Cogn Dev"},{"issue":"9","key":"5109_CR15","doi-asserted-by":"publisher","first-page":"1464","DOI":"10.1109\/5.58325","volume":"78","author":"T Kohonen","year":"1990","unstructured":"Kohonen T (1990) The self-organizing map. Proc IEEE 78(9):1464\u20131480","journal-title":"Proc IEEE"},{"issue":"1","key":"5109_CR16","doi-asserted-by":"publisher","first-page":"216","DOI":"10.1109\/TRO.2018.2875388","volume":"35","author":"P Kondaxakis","year":"2019","unstructured":"Kondaxakis P, Gulzar K, Kinauer S, Kokkinos I, Kyrki V (2019) Robot\u2013robot gesturing for anchoring representations. IEEE Trans Robot 35(1):216\u2013230","journal-title":"IEEE Trans Robot"},{"issue":"8","key":"5109_CR17","doi-asserted-by":"publisher","first-page":"1041","DOI":"10.1016\/S0893-6080(02)00078-3","volume":"15","author":"S Marsland","year":"2002","unstructured":"Marsland S, Shapiro J, Nehmzow U (2002) A self-organising network that grows when required. Neural Netw 15(8):1041\u20131058","journal-title":"Neural Netw"},{"issue":"1","key":"5109_CR18","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1016\/S0163-6383(97)90063-1","volume":"20","author":"C Moore","year":"1997","unstructured":"Moore C, Angelopoulos M, Bennett P (1997) The role of movement in the development of joint visual attention. Infant Behav Dev 20(1):83\u201392","journal-title":"Infant Behav Dev"},{"key":"5109_CR19","unstructured":"Nagai Y (2005) Learning to comprehend deictic gestures in robots and human infants. In: ROMAN 2005. IEEE international workshop on robot and human interactive communication, 2005, pp 217\u2013222"},{"key":"5109_CR20","doi-asserted-by":"crossref","unstructured":"Nagai Y (2005) The role of motion information in learning human\u2013robot joint attention. 
In: Proceedings of the 2005 IEEE international conference on robotics and automation, pp 2069\u20132074","DOI":"10.1109\/ROBOT.2005.1570418"},{"key":"5109_CR21","doi-asserted-by":"crossref","unstructured":"Sugiyama O, Kanda T, Imai M, Ishiguro H, Hagita N (2007) Natural deictic communication with humanoid robots. In: 2007 IEEE\/RSJ international conference on intelligent robots and systems, pp 1441\u20131448","DOI":"10.1109\/IROS.2007.4399120"},{"key":"5109_CR22","doi-asserted-by":"publisher","DOI":"10.3389\/fnbot.2015.00003","author":"GI Parisi","year":"2015","unstructured":"Parisi GI, Weber C, Wermter S (2015) Self-organizing neural integration of pose-motion features for human action recognition. Front Neurorobot. https:\/\/doi.org\/10.3389\/fnbot.2015.00003","journal-title":"Front Neurorobot"},{"issue":"3","key":"5109_CR23","doi-asserted-by":"publisher","first-page":"313","DOI":"10.1007\/s12369-013-0196-9","volume":"5","author":"M Salem","year":"2013","unstructured":"Salem M, Eyssel F, Rohlfing K, Kopp S, Joublin F (2013) To err is human(-like): effects of robot gesture on perceived anthropomorphism and likability. Int J Soc Robot 5(3):313\u2013323","journal-title":"Int J Soc Robot"},{"key":"5109_CR24","doi-asserted-by":"crossref","unstructured":"Saupp\u00e9 A, Mutlu B (2014) Robot deictics: How gesture and context shape referential communication. In: Proceedings of the 2014 ACM\/IEEE international conference on human\u2013robot interaction, HRI \u201914. ACM, New York, pp 342\u2013349","DOI":"10.1145\/2559636.2559657"},{"key":"5109_CR25","doi-asserted-by":"crossref","unstructured":"Showers A, Si M (2018) Pointing estimation for human\u2013robot interaction using hand pose, verbal cues, and confidence heuristics. In: Proceeding social computing and social media. 
Technologies and Analytics, pp 403\u2013412","DOI":"10.1007\/978-3-319-91485-5_31"},{"key":"5109_CR26","doi-asserted-by":"crossref","unstructured":"Shukla D, Erkent O, Piater J (2015) Probabilistic detection of pointing directions for human\u2013robot interaction. In: 2015 international conference on digital image computing: techniques and applications (DICTA), pp 1\u20138","DOI":"10.1109\/DICTA.2015.7371296"},{"key":"5109_CR27","doi-asserted-by":"crossref","unstructured":"Shukla D, Erkent O, Piater J (2016) A multi-view hand gesture RGB-D dataset for human\u2013robot interaction scenarios. In: 2016 25th IEEE international symposium on robot and human interactive communication (RO-MAN), pp 1084\u20131091","DOI":"10.1109\/ROMAN.2016.7745243"},{"key":"5109_CR28","doi-asserted-by":"crossref","unstructured":"Siqueira H, Sutherland A, Barros P, Kerzel M, Magg S, Wermter S (2018) Disambiguating affective stimulus associations for robot perception and dialogue. In: IEEE-RAS international conference on humanoid robots (humanoids), pp 433\u2013440","DOI":"10.1109\/HUMANOIDS.2018.8625012"},{"issue":"2","key":"5109_CR29","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1016\/0167-8655(82)90016-2","volume":"1","author":"J Sklansky","year":"1982","unstructured":"Sklansky J (1982) Finding the convex hull of a simple polygon. Pattern Recognit Lett 1(2):79\u201383","journal-title":"Pattern Recognit Lett"},{"issue":"3","key":"5109_CR30","doi-asserted-by":"publisher","first-page":"705","DOI":"10.1111\/j.1467-8624.2007.01025.x","volume":"78","author":"M Tomasello","year":"2007","unstructured":"Tomasello M, Carpenter M, Liszkowski U (2007) A new look at infant pointing. 
Child Dev 78(3):705\u2013722","journal-title":"Child Dev"},{"issue":"1","key":"5109_CR31","doi-asserted-by":"publisher","first-page":"121","DOI":"10.1016\/j.archger.2011.02.003","volume":"54","author":"YH Wu","year":"2012","unstructured":"Wu YH, Fassert C, Rigaud AS (2012) Designing robots for the elderly: appearance issue and beyond. Arch Gerontol Geriatr 54(1):121\u2013126","journal-title":"Arch Gerontol Geriatr"}],"container-title":["Neural Computing and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-020-05109-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00521-020-05109-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-020-05109-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,6,29]],"date-time":"2021-06-29T23:44:45Z","timestamp":1625010285000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00521-020-05109-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,30]]},"references-count":31,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2021,4]]}},"alternative-id":["5109"],"URL":"https:\/\/doi.org\/10.1007\/s00521-020-05109-w","relation":{},"ISSN":["0941-0643","1433-3058"],"issn-type":[{"value":"0941-0643","type":"print"},{"value":"1433-3058","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,6,30]]},"assertion":[{"value":"12 August 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 June 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article 
History"}},{"value":"30 June 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Compliance with ethical standards"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}