{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,1]],"date-time":"2025-10-01T15:42:50Z","timestamp":1759333370369,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":20,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,3,23]],"date-time":"2020-03-23T00:00:00Z","timestamp":1584921600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,3,23]]},"DOI":"10.1145\/3371382.3378261","type":"proceedings-article","created":{"date-parts":[[2020,4,1]],"date-time":"2020-04-01T19:31:43Z","timestamp":1585769503000},"page":"340-342","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Attention-Based Multimodal Fusion for Estimating Human Emotion in Real-World HRI"],"prefix":"10.1145","author":[{"given":"Yuanchao","family":"Li","sequence":"first","affiliation":[{"name":"Honda R&amp;D Co., Ltd., Tokyo, Japan"}]},{"given":"Tianyu","family":"Zhao","sequence":"additional","affiliation":[{"name":"Kyoto University, Kyoto, Japan"}]},{"given":"Xun","family":"Shen","sequence":"additional","affiliation":[{"name":"The Graduate University for Advanced Studies, Tokyo, Japan"}]}],"member":"320","published-online":{"date-parts":[[2020,4]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACII.2013.65"},{"key":"e_1_3_2_1_2_1","volume-title":"Dzmitry Bahdanau, and Yoshua Bengio.","author":"Cho Kyunghyun","year":"2014","unstructured":"Kyunghyun Cho , Bart Van Merri\u00ebnboer , Dzmitry Bahdanau, and Yoshua Bengio. 2014 . On the googleproperties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259 (2014). 
Kyunghyun Cho, Bart Van Merri\u00ebnboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259 (2014)."},{"key":"e_1_3_2_1_3_1","article-title":"Multi-modal emotion recognition from speech and text","volume":"9","author":"Chuang Ze-Jing","year":"2004","unstructured":"Ze-Jing Chuang and Chung-Hsien Wu. 2004. Multi-modal emotion recognition from speech and text. In International Journal of Computational Linguistics & Chinese Language Processing, Volume 9, Number 2, August 2004: Special Issue on New Trends of Speech and Language Processing. 45--62.","journal-title":"International Journal of Computational Linguistics & Chinese Language Processing"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2388676.2388686"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1873951.1874246"},{"key":"e_1_3_2_1_6_1","unstructured":"Google. 2018. Google Cloud Speech-to-Text API. https:\/\/cloud.google.com\/speech-to-text\/"},{"key":"e_1_3_2_1_7_1","volume-title":"AVSP 2001-International Conference on Auditory-Visual Speech Processing .","author":"Grant Ken W","year":"2001","unstructured":"Ken W Grant and Steven Greenberg . 2001 . Speech intelligibility derived from asynchronous processing of auditory-visual information . In AVSP 2001-International Conference on Auditory-Visual Speech Processing . Ken W Grant and Steven Greenberg. 2001. Speech intelligibility derived from asynchronous processing of auditory-visual information. 
In AVSP 2001-International Conference on Auditory-Visual Speech Processing ."},{"key":"e_1_3_2_1_8_1","volume-title":"Long short-term memory. Neural computation","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory. Neural computation, Vol. 9, 8 (1997), 1735--1780."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.450"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3284432.3287188"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/APSIPA.2017.8282243"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/265013"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-24571-8_51"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/SLT.2016.7846319"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2661806.2661807"},{"key":"e_1_3_2_1_16_1","volume-title":"mbox","author":"W Wang","year":"2017","unstructured":"W Wang et al. 2017. R-NET: machine reading comprehension with self-matching networks. Natural Language Computer Group, Microsoft Research Asia, Beijing, China, Technical Report 5."},{"key":"e_1_3_2_1_17_1","volume-title":"Survey on audiovisual emotion recognition: databases, features, and data fusion strategies. APSIPA transactions on signal and information processing","author":"Wu Chung-Hsien","year":"2014","unstructured":"Chung-Hsien Wu , Jen-Chun Lin , and Wen-Li Wei . 2014. Survey on audiovisual emotion recognition: databases, features, and data fusion strategies. 
APSIPA transactions on signal and information processing , Vol. 3 ( 2014 ). Chung-Hsien Wu, Jen-Chun Lin, and Wen-Li Wei. 2014. Survey on audiovisual emotion recognition: databases, features, and data fusion strategies. APSIPA transactions on signal and information processing , Vol. 3 (2014)."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2006.12.001"},{"key":"e_1_3_2_1_19_1","volume-title":"Learning Alignment for Multimodal Emotion Recognition from Speech. arXiv preprint arXiv:1909.05645","author":"Xu Haiyang","year":"2019","unstructured":"Haiyang Xu , Hui Zhang , Kun Han , Yun Wang , Yiping Peng , and Xiangang Li. 2019. Learning Alignment for Multimodal Emotion Recognition from Speech. arXiv preprint arXiv:1909.05645 ( 2019 ). Haiyang Xu, Hui Zhang, Kun Han, Yun Wang, Yiping Peng, and Xiangang Li. 2019. Learning Alignment for Multimodal Emotion Recognition from Speech. arXiv preprint arXiv:1909.05645 (2019)."},{"key":"e_1_3_2_1_20_1","volume-title":"A survey of affect recognition methods: Audio, visual, and spontaneous expressions","author":"Zeng Zhihong","year":"2008","unstructured":"Zhihong Zeng , Maja Pantic , Glenn I Roisman , and Thomas S Huang . 2008. A survey of affect recognition methods: Audio, visual, and spontaneous expressions . IEEE transactions on pattern analysis and machine intelligence , Vol. 31 , 1 ( 2008 ), 39--58. Zhihong Zeng, Maja Pantic, Glenn I Roisman, and Thomas S Huang. 2008. A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE transactions on pattern analysis and machine intelligence , Vol. 
31, 1 (2008), 39--58."}],"event":{"name":"HRI '20: ACM\/IEEE International Conference on Human-Robot Interaction","sponsor":["SIGAI ACM Special Interest Group on Artificial Intelligence","SIGCHI ACM Special Interest Group on Computer-Human Interaction"],"location":"Cambridge United Kingdom","acronym":"HRI '20"},"container-title":["Companion of the 2020 ACM\/IEEE International Conference on Human-Robot Interaction"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3371382.3378261","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3371382.3378261","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:33:31Z","timestamp":1750199611000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3371382.3378261"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3,23]]},"references-count":20,"alternative-id":["10.1145\/3371382.3378261","10.1145\/3371382"],"URL":"https:\/\/doi.org\/10.1145\/3371382.3378261","relation":{},"subject":[],"published":{"date-parts":[[2020,3,23]]},"assertion":[{"value":"2020-04-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}