{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,12]],"date-time":"2026-02-12T15:15:11Z","timestamp":1770909311443,"version":"3.50.1"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2019,7,26]],"date-time":"2019-07-26T00:00:00Z","timestamp":1564099200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Comput. Graph. Interact. Tech."],"published-print":{"date-parts":[[2019,7,26]]},"abstract":"<jats:p>In this paper we propose a novel deep learning based approach to generate realistic three-party head and eye motions based on novel acoustic speech input together with speaker marking (i.e., speaking time for each interlocutor). Specifically, we first acquire a high quality, three-party conversational motion dataset. Then, based on the acquired dataset, we train a deep learning based framework to automatically predict the dynamic directions of both the eyes and heads of all the interlocutors based on speech signal input. Via the combination of existing lip-sync and speech-driven hand\/body gesture generation algorithms, we can generate realistic three-party conversational animations. 
Through many experiments and comparative user studies, we demonstrate that our approach can generate realistic three-party head-and-eye motions based on novel speech recorded on new subjects with different genders and ethnicities.<\/jats:p>","DOI":"10.1145\/3340250","type":"journal-article","created":{"date-parts":[[2019,7,29]],"date-time":"2019-07-29T20:55:51Z","timestamp":1564433751000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["A Deep Learning-Based Model for Head and Eye Motion Generation in Three-party Conversations"],"prefix":"10.1145","volume":"2","author":[{"given":"Aobo","family":"Jin","sequence":"first","affiliation":[{"name":"University of Houston"}]},{"given":"Qixin","family":"Deng","sequence":"additional","affiliation":[{"name":"University of Houston"}]},{"given":"Yuting","family":"Zhang","sequence":"additional","affiliation":[{"name":"University of Houston"}]},{"given":"Zhigang","family":"Deng","sequence":"additional","affiliation":[{"name":"University of Houston"}]}],"member":"320","published-online":{"date-parts":[[2019,7,26]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2006.885910"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/1089870.1089884"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/192161.192272"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/383259.383315"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1061347.1061355"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCG.2005.35"},{"key":"e_1_2_2_7_1","volume-title":"Data-driven 3D facial animation","author":"Deng Zhigang","unstructured":"Zhigang Deng and Junyong Noh. 2008. Computer facial animation: A survey. In Data-driven 3D facial animation. Springer, 1--28."},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3025453.3025644"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2388676.2388680"},{"key":"e_1_2_2_10_1","volume-title":"Conversation analysis. Annual review of anthropology 19, 1","author":"Goodwin Charles","year":"1990","unstructured":"Charles Goodwin and John Heritage. 1990. Conversation analysis. Annual review of anthropology 19, 1 (1990), 283--307."},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/874061.875416"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/11821830_2"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/11821830_16"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-02675-6_35"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010010528443"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.5898\/JHRI.2.1.Kondo"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2012.74"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1073204.1073242"},{"key":"e_1_2_2_20_1","volume-title":"Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems-Volume 1. International Foundation for Autonomous Agents and Multiagent Systems, 289--296","author":"Lee Jina","year":"2009","unstructured":"Jina Lee and Stacy Marsella. 2009. Learning a model of speaker head nods using gesture corpora. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems-Volume 1. International Foundation for Autonomous Agents and Multiagent Systems, 289--296."},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/566654.566629"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1778765.1778861"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1618452.1618518"},{"key":"e_1_2_2_24_1","volume-title":"Practice and Theory of Blendshape Facial Models. Eurographics (State of the Art Reports) 1, 8","author":"Lewis John P","year":"2014","unstructured":"John P Lewis, Ken Anjyo, Taehyun Rhee, Mengjie Zhang, Fr\u00e9d\u00e9ric H Pighin, and Zhigang Deng. 2014. Practice and Theory of Blendshape Facial Models. Eurographics (State of the Art Reports) 1, 8 (2014)."},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/VR.2009.4811014"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/11550617_3"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2485895.2485900"},{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.5555\/1165040.1165046"},{"key":"e_1_2_2_29_1","volume-title":"AAAI Fall Symposium: Dialog with Robots.","author":"Matsuyama Yoichi","year":"2010","unstructured":"Yoichi Matsuyama, Hikaru Taniyama, Shinya Fujie, and Tetsunori Kobayashi. 2010. Framework of Communication Activation Robot Participating in Multiparty Conversation. In AAAI Fall Symposium: Dialog with Robots."},{"key":"e_1_2_2_30_1","volume-title":"Visual prosody and speech intelligibility: Head movement improves auditory speech perception. Psychological science 15, 2","author":"Munhall Kevin G","year":"2004","unstructured":"Kevin G Munhall, Jeffery A Jones, Daniel E Callan, Takaaki Kuratate, and Eric Vatikiotis-Bateson. 2004. Visual prosody and speech intelligibility: Head movement improves auditory speech perception. Psychological science 15, 2 (2004), 133--137."},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1514095.1514109"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1322192.1322237"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1088463.1088497"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2980179.2982444"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/965400.965484"},{"key":"e_1_2_2_36_1","unstructured":"Kerstin Ruhland, Sean Andrist, Jeremy Badler, Christopher Peters, Norman Badler, Michael Gleicher, Bilge Mutlu, and Rachel Mcdonnell. 2014. Look me in the eyes: A survey of eye and gaze animation for virtual agents and artificial systems. In Eurographics State-of-the-Art Report. 69--91."},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2007.70797"},{"key":"e_1_2_2_38_1","volume-title":"Eyelid kinematics for virtual characters. Computer animation and virtual worlds 21, 3-4","author":"Steptoe William","year":"2010","unstructured":"William Steptoe, Oyewole Oyekoya, and Anthony Steed. 2010. Eyelid kinematics for virtual characters. Computer animation and virtual worlds 21, 3-4 (2010), 161--171."},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1015706.1015753"},{"key":"e_1_2_2_40_1","volume-title":"Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems-Volume 1. International Foundation for Autonomous Agents and Multiagent Systems, 321--328","author":"Thiebaux Marcus","year":"2009","unstructured":"Marcus Thiebaux, Brent Lance, and Stacy Marsella. 2009. Real-time expressive gaze animation for virtual humans. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems-Volume 1. International Foundation for Autonomous Agents and Multiagent Systems, 321--328."},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/365024.365119"},{"key":"e_1_2_2_42_1","unstructured":"Roel Vertegaal, Gerrit van der Veer, and Harro Vons. 2000. Effects of gaze on multiparty mediated communication. In Graphics Interface. 95--102."},{"key":"e_1_2_2_43_1","volume-title":"Computer Graphics Forum","author":"Vinayagamoorthy Vinoba","unstructured":"Vinoba Vinayagamoorthy, Maia Garau, Anthony Steed, and Mel Slater. 2004. An eye gaze model for dyadic interaction in an immersive virtual environment: Practice and experience. In Computer Graphics Forum, Vol. 23. Wiley Online Library, 1--11."},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925947"},{"key":"e_1_2_2_45_1","volume-title":"Janet Beavin Bavelas, and Don D Jackson","author":"Watzlawick Paul","year":"2011","unstructured":"Paul Watzlawick, Janet Beavin Bavelas, and Don D Jackson. 2011. Pragmatics of human communication: A study of interactional patterns, pathologies and paradoxes. WW Norton & Company."}],"container-title":["Proceedings of the ACM on Computer Graphics and Interactive Techniques"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3340250","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3340250","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T17:49:32Z","timestamp":1750268972000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3340250"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,7,26]]},"references-count":45,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2019,7,26]]}},"alternative-id":["10.1145\/3340250"],"URL":"https:\/\/doi.org\/10.1145\/3340250","relation":{},"ISSN":["2577-6193"],"issn-type":[{"value":"2577-6193","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,7,26]]},"assertion":[{"value":"2019-07-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}