{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T18:51:37Z","timestamp":1778179897069,"version":"3.51.4"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2019,6,3]],"date-time":"2019-06-03T00:00:00Z","timestamp":1559520000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"CHIST-ERA project IGLU and KTH SRA ICT The Next Generation"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Hum.-Robot Interact."],"published-print":{"date-parts":[[2019,6,30]]},"abstract":"<jats:p>This study proposes, develops, and evaluates methods for modeling the eye-gaze direction and head orientation of a person in multiparty open-world dialogues, as a function of low-level communicative signals generated by his\/her interlocutors. These signals include speech activity, eye-gaze direction, and head orientation, all of which can be estimated in real time during the interaction. By utilizing these signals and novel data representations suitable for the task and context, the developed methods can generate plausible candidate gaze targets in real time. The methods are based on Feedforward Neural Networks and Long Short-Term Memory Networks. The proposed methods are developed using several hours of unrestricted interaction data and their performance is compared with a heuristic baseline method. The study offers an extensive evaluation of the proposed methods that investigates the contribution of different predictors to the accurate generation of candidate gaze targets. The results show that the methods can accurately generate candidate gaze targets when the person being modeled is in a listening state.
However, when the person being modeled is in a speaking state, the proposed methods yield significantly lower performance.<\/jats:p>","DOI":"10.1145\/3323231","type":"journal-article","created":{"date-parts":[[2019,6,4]],"date-time":"2019-06-04T16:01:38Z","timestamp":1559664098000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["Modeling of Human Visual Attention in Multiparty Open-World Dialogues"],"prefix":"10.1145","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0861-8660","authenticated-orcid":false,"given":"Kalin","family":"Stefanov","sequence":"first","affiliation":[{"name":"KTH Royal Institute of Technology, Stockholm, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3323-5311","authenticated-orcid":false,"given":"Giampiero","family":"Salvi","sequence":"additional","affiliation":[{"name":"KTH Royal Institute of Technology, Stockholm, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8874-6629","authenticated-orcid":false,"given":"Dimosthenis","family":"Kontogiorgos","sequence":"additional","affiliation":[{"name":"KTH Royal Institute of Technology, Stockholm, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5750-9655","authenticated-orcid":false,"given":"Hedvig","family":"Kjellstr\u00f6m","sequence":"additional","affiliation":[{"name":"KTH Royal Institute of Technology, Stockholm, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1399-6604","authenticated-orcid":false,"given":"Jonas","family":"Beskow","sequence":"additional","affiliation":[{"name":"KTH Royal Institute of Technology, Stockholm, Sweden"}]}],"member":"320","published-online":{"date-parts":[[2019,6,3]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"M. Abadi A. Agarwal P. Barham E. Brevdo Z. Chen C. Citro G. Corrado A. Davis J. Dean M. Devin S. Ghemawat I. Goodfellow A. Harp G. Irving M. Isard Y. Jia R. Jozefowicz L. Kaiser M. Kudlur J. Levenberg D. Man\u00e9 R. Monga S. Moore D. 
Murray C. Olah M. Schuster J. Shlens B. Steiner I. Sutskever K. Talwar P. Tucker V. Vanhoucke V. Vasudevan F. Vi\u00e9gas O. Vinyals P. Warden M. Wattenberg M. Wicke Y. Yu and X. Zheng. 2015. TensorFlow: Large-scale machine learning on heterogeneous systems White paper."},
{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2663204.2663263"},
{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.5898\/JHRI.6.1.Admoni"},
{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1207\/s15327051hci1204_5"},
{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the Intelligent Virtual Agents. Springer","author":"Andrist S.","unstructured":"S. Andrist, B. Mutlu, and M. Gleicher. 2013. Conversational gaze aversion for virtual agents. In Proceedings of the Intelligent Virtual Agents. Springer, Berlin, 249--262."},
{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2559636.2559666"},
{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the IEEE-RAS International Conference on Humanoid Robots. IEEE, 418--423","author":"Bennewitz M.","unstructured":"M. Bennewitz, F. Faber, D. Joho, M. Schreiber, and S. Behnke. 2005. Towards a humanoid museum guide robot that interacts with multiple persons. In Proceedings of the IEEE-RAS International Conference on Humanoid Robots. IEEE, 418--423."},
{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1891903.1891910"},
{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.89"},
{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the Joint Conference on Artificial Intelligence. Morgan Kaufmann Publishers Inc., 1146--1153","author":"Breazeal C.","unstructured":"C. Breazeal and B. Scassellati. 1999. A context-dependent attention system for a social robot. In Proceedings of the Joint Conference on Artificial Intelligence. Morgan Kaufmann Publishers Inc., 1146--1153."},
{"key":"e_1_2_1_11_1","unstructured":"F. Chollet et al. 2015. Keras."},
{"key":"e_1_2_1_12_1","unstructured":"A. Colburn, M. Cohen, and S. Drucker. 2000. The Role of Eye Gaze in Avatar Mediated Conversational Interfaces. Technical Report. Microsoft."},
{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1658349.1658355"},
{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/11821830_16"},
{"key":"e_1_2_1_15_1","volume-title":"The Hidden Dimension","author":"Hall E.","unstructured":"E. Hall. 1990. The Hidden Dimension. Anchor, Garden City, NY."},
{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},
{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2006.02.008"},
{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the IEEE International Conference on Robot and Human Interactive Communication. IEEE, 241--246","author":"Holroyd A.","unstructured":"A. Holroyd, C. Rich, C. Sidner, and B. Ponsler. 2011. Generating connection events for human-robot collaboration. In Proceedings of the IEEE International Conference on Robot and Human Interactive Communication. IEEE, 241--246."},
{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2559636.2559668"},
{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction. IEEE Press","author":"Ishi C.","unstructured":"C. Ishi, C. Liu, H. Ishiguro, and N. Hagita. 2010. Head motion during dialogue speech and nod timing control in humanoid robots. In Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction. IEEE Press, Piscataway, NJ, 293--300."},
{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0042-6989(99)00163-7"},
{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1038\/35058500"},
{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010010528443"},
{"key":"e_1_2_1_24_1","volume-title":"Adam: A method for stochastic optimization. Computing Research Repository, abs\/1412.6980.","author":"Kingma D.","year":"2014","unstructured":"D. Kingma and J. Ba. 2014. Adam: A method for stochastic optimization. Computing Research Repository, abs\/1412.6980."},
{"key":"e_1_2_1_25_1","doi-asserted-by":"crossref","unstructured":"C. Koch and S. Ullman. 1987. Shifts in Selective Visual Attention: Towards the Underlying Neural Circuitry. Springer Netherlands, Dordrecht, 115--141.","DOI":"10.1007\/978-94-009-3833-5_5"},
{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the International Conference on Language Resources and Evaluation. European Language Resources Association (ELRA).","author":"Kontogiorgos D.","unstructured":"D. Kontogiorgos, V. Avramova, S. Alexandersson, P. Jonell, C. Oertel, J. Beskow, G. Skantze, and J. Gustafson. 2018. A multimodal corpus for mutual gaze and joint attention in multiparty situated interaction. In Proceedings of the International Conference on Language Resources and Evaluation. European Language Resources Association (ELRA)."},
{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2157689.2157797"},
{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the IEEE-RAS International Conference on Humanoid Robots. IEEE, 518--523","author":"Mutlu B.","unstructured":"B. Mutlu, J. Forlizzi, and J. Hodgins. 2006. A storytelling robot: Modeling and evaluation of human-like gaze behavior. In Proceedings of the IEEE-RAS International Conference on Humanoid Robots. IEEE, 518--523."},
{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/11550617_20"},
{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction. IEEE Press","author":"Rich C.","unstructured":"C. Rich, B. Ponsler, A. Holroyd, and C. Sidner. 2010. Recognizing engagement in human-robot interaction. In Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction. IEEE Press, Piscataway, NJ, 375--382."},
{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the IEEE International Conference on Robotics and Automation. IEEE, 962--967","author":"Ruesch J.","unstructured":"J. Ruesch, M. Lopes, A. Bernardino, J. Hornstein, J. Santos-Victor, and R. Pfeifer. 2008. Multimodal saliency-based bottom-up attention a framework for the humanoid robot iCub. In Proceedings of the IEEE International Conference on Robotics and Automation. IEEE, 962--967."},
{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2007.904903"},
{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the International Conference on Language Resources and Evaluation. European Language Resources Association (ELRA).","author":"Stefanov K.","unstructured":"K. Stefanov and J. Beskow. 2016. A multi-party multi-modal dataset for focus of visual attention in human-human and human-robot interaction. In Proceedings of the International Conference on Language Resources and Evaluation. European Language Resources Association (ELRA)."},
{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1349822.1349849"},
{"key":"e_1_2_1_35_1","volume-title":"Clinical Methods: The History, Physical, and Laboratory Examinations","author":"Walker H.","year":"1990","unstructured":"H. Walker, W. Hall, and J. Hurst. 1990. Clinical Methods: The History, Physical, and Laboratory Examinations. Butterworth Publishers, Boston, MA."},
{"key":"e_1_2_1_36_1","volume-title":"Proceedings the International Conference on Social Robotics. Springer International Publishing, Cham, 556--566","author":"Zhang Y.","unstructured":"Y. Zhang, J. Beskow, and H. Kjellstr\u00f6m. 2017. Look but don\u2019t stare: Mutual gaze interaction in social robots. In Proceedings of the International Conference on Social Robotics. Springer International Publishing, Cham, 556--566."}],
"container-title":["ACM Transactions on Human-Robot Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3323231","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3323231","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:23:17Z","timestamp":1750202597000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3323231"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,6,3]]},"references-count":36,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2019,6,30]]}},"alternative-id":["10.1145\/3323231"],"URL":"https:\/\/doi.org\/10.1145\/3323231","relation":{},"ISSN":["2573-9522"],"issn-type":[{"value":"2573-9522","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,6,3]]},"assertion":[{"value":"2018-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-06-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}