{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:34:13Z","timestamp":1750221253468,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":29,"publisher":"ACM","license":[{"start":{"date-parts":[[2018,10,26]],"date-time":"2018-10-26T00:00:00Z","timestamp":1540512000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Ministry of Science and ICT","award":["NRF-2017K1A3A1A16066838"],"award-info":[{"award-number":["NRF-2017K1A3A1A16066838"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2018,10,26]]},"DOI":"10.1145\/3264869.3264873","type":"proceedings-article","created":{"date-parts":[[2018,10,17]],"date-time":"2018-10-17T12:18:31Z","timestamp":1539778711000},"page":"27-32","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Audio-Visual Attention Networks for Emotion Recognition"],"prefix":"10.1145","author":[{"given":"Jiyoung","family":"Lee","sequence":"first","affiliation":[{"name":"Yonsei University, Seoul, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sunok","family":"Kim","sequence":"additional","affiliation":[{"name":"Yonsei University, Seoul, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Seungryong","family":"Kim","sequence":"additional","affiliation":[{"name":"Yonsei University, Seoul, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kwanghoon","family":"Sohn","sequence":"additional","affiliation":[{"name":"Yonsei University, Seoul, Republic of 
Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2018,10,26]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. http:\/\/tensorflow.org\/ Software available from tensorflow.org."},{"volume-title":"Proc. Conf. Int. Speech Communicat. Associat.","author":"Amir N.","key":"e_1_3_2_1_2_1","unstructured":"N. Amir and S. Ron. 1998. Towards an automatic classification of emotion in speech. In Proc. Conf. Int. Speech Communicat. Associat."},{"key":"e_1_3_2_1_3_1","unstructured":"Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. 2015. Attention-based models for speech recognition. In Advances in neural information processing systems. 577--585."},{"volume-title":"Proc. Conf. Int. Speech Communicat. Associat.","author":"Dellaert F.","key":"e_1_3_2_1_4_1","unstructured":"F. Dellaert, T. Polzin, and A. Waibel. 1996. Recognizing emotion in speech. In Proc. Conf. Int. Speech Communicat. Associat."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2818346.2830596"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1873951.1874246"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.213"},{"volume-title":"Proc. Int. Conf. Artific. Intell. Statis. 
249--256","author":"Glorot X.","key":"e_1_3_2_1_8_1","unstructured":"X. Glorot and Y. Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proc. Int. Conf. Artific. Intell. Statis. 249--256."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.4018\/jse.2010101605"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2808196.2811641"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_1_12_1","volume-title":"Proc. IEEE Int. Conf. Image Process. (Sep.","author":"Jung H.","year":"2017","unstructured":"H. Jung, Y. Kim, D. Min, C. Oh, and K. Sohn. 2017. Depth Prediction from a Single Image with Conditional Adversarial Networks. In Proc. IEEE Int. Conf. Image Process. (Sep. 2017)."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2522848.2531745"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2015.12"},{"volume-title":"Proc. IEEE Int. Conf. Image Process. 619--623","author":"Khorrami P.","key":"e_1_3_2_1_15_1","unstructured":"P. Khorrami, T. Le Paine, K. Brady, C. Dagli, and T. S. Huang. 2016. How deep neural networks can improve emotion recognition on video data. In Proc. IEEE Int. Conf. Image Process. 619--623."},{"volume-title":"IEEE Int. Conf. Acous, Speech and Signal Process. 3677--3681","author":"Kim Y.","key":"e_1_3_2_1_16_1","unstructured":"Y. 
Kim and E. M. Provost. 2013. Emotion Classification via Utterance-level Dynamics: A Pattern-based approach to characterizing affective expressions. In IEEE Int. Conf. Acous, Speech and Signal Process. 3677--3681."},{"key":"e_1_3_2_1_17_1","first-page":"1755","article-title":"Dlib-ml: A machine learning toolkit","author":"King D. E.","year":"2009","unstructured":"D. E. King. 2009. Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research 10, Jul (2009), 1755--1758.","journal-title":"Journal of Machine Learning Research 10"},{"key":"e_1_3_2_1_18_1","volume-title":"Adam: A method for stochastic optimization. arXiv:1412.6980","author":"Kingma D.","year":"2014","unstructured":"D. Kingma and J. Ba. 2014. Adam: A method for stochastic optimization. arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_1_19_1","volume-title":"Proc. IEEE Int. Conf. Image Process. (Sep.","author":"Lee J.","year":"2017","unstructured":"J. Lee, H. Jung, Y. Kim, and K. Sohn. 2017. Automatic 2D-to-3D Conversion using Multi-scale Deep Neural Network. In Proc. IEEE Int. Conf. Image Process. (Sep. 2017)."},{"volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing.","author":"Lee J.","key":"e_1_3_2_1_20_1","unstructured":"J. Lee, S. Kim, S. Kim, and K. Sohn. 2018. 
Spatiotemporal Attention Based Deep Neural Networks for Emotion Recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2733373.2806408"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133944.3133953"},{"volume-title":"Proc. Conf. Int. Speech Communicat. Associat.","author":"Rozgic V.","key":"e_1_3_2_1_23_1","unstructured":"V. Rozgic, S. Ananthakrishnan, S. Saleem, R. Kumar, A. N. Vembu, and R. Prasad. 2012. Emotion recognition using acoustic and lexical features. In Proc. Conf. Int. Speech Communicat. Associat."},{"key":"e_1_3_2_1_24_1","volume-title":"Action recognition using visual attention. arXiv:1511.04119","author":"Sharma S.","year":"2015","unstructured":"S. Sharma, R. Kiros, and R. Salakhutdinov. 2015. Action recognition using visual attention. arXiv:1511.04119 (2015)."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2988257.2988270"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2988257.2988258"},{"key":"e_1_3_2_1_27_1","first-page":"1","article-title":"Variable-state latent conditional random fields for facial expression recognition and action unit detection","volume":"1","author":"Walecki R.","year":"2015","unstructured":"R. Walecki, O. Rudovic, V. Pavlovic, and M. Pantic. 2015. 
Variable-state latent conditional random fields for facial expression recognition and action unit detection. In Proc. IEEE Int. Conf. Face and Gesture Recognit., Vol. 1. 1--8.","journal-title":"Proc. IEEE Int. Conf. Face and Gesture Recognit."},{"volume-title":"Proc. Neur. Inf. Proc. Syst. 802--810","author":"Xingjian S.","key":"e_1_3_2_1_28_1","unstructured":"S. Xingjian, Z. Chen, H. Wang, D. Y. Yeung, W. K. Wong, and W. C. Woo. 2015. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Proc. Neur. Inf. Proc. Syst. 802--810."},{"key":"e_1_3_2_1_29_1","volume-title":"International conference on machine learning. 2048--2057","author":"Xu Kelvin","year":"2015","unstructured":"Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning. 
2048--2057."}],"event":{"name":"MM '18: ACM Multimedia Conference","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Seoul Republic of Korea","acronym":"MM '18"},"container-title":["Proceedings of the 2018 Workshop on Audio-Visual Scene Understanding for Immersive Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3264869.3264873","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3264869.3264873","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:07:58Z","timestamp":1750212478000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3264869.3264873"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,10,26]]},"references-count":29,"alternative-id":["10.1145\/3264869.3264873","10.1145\/3264869"],"URL":"https:\/\/doi.org\/10.1145\/3264869.3264873","relation":{},"subject":[],"published":{"date-parts":[[2018,10,26]]},"assertion":[{"value":"2018-10-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}