{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T06:42:49Z","timestamp":1771915369667,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":48,"publisher":"ACM","license":[{"start":{"date-parts":[[2017,11,3]],"date-time":"2017-11-03T00:00:00Z","timestamp":1509667200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2017,11,3]]},"DOI":"10.1145\/3136755.3143006","type":"proceedings-article","created":{"date-parts":[[2017,11,6]],"date-time":"2017-11-06T13:30:29Z","timestamp":1509975029000},"page":"536-543","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":35,"title":["Modeling multimodal cues in a deep learning-based framework for emotion recognition in the wild"],"prefix":"10.1145","author":[{"given":"Stefano","family":"Pini","sequence":"first","affiliation":[{"name":"University of Modena and Reggio Emilia, Italy"}]},{"given":"Olfa Ben","family":"Ahmed","sequence":"additional","affiliation":[{"name":"EURECOM, France"}]},{"given":"Marcella","family":"Cornia","sequence":"additional","affiliation":[{"name":"University of Modena and Reggio Emilia, Italy"}]},{"given":"Lorenzo","family":"Baraldi","sequence":"additional","affiliation":[{"name":"University of Modena and Reggio Emilia, Italy"}]},{"given":"Rita","family":"Cucchiara","sequence":"additional","affiliation":[{"name":"University of Modena and Reggio Emilia, Italy"}]},{"given":"Benoit","family":"Huet","sequence":"additional","affiliation":[{"name":"EURECOM, 
France"}]}],"member":"320","published-online":{"date-parts":[[2017,11,3]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.572"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Yusuf Aytar, Carl Vondrick, and Antonio Torralba. 2016. SoundNet: Learning Sound Representations from Unlabeled Video. In Neural Information Processing Systems.","DOI":"10.1109\/CVPR.2016.18"},{"key":"e_1_3_2_1_3_1","volume-title":"Specificity of facial emotion recognition impairments in patients with multi-episode schizophrenia","author":"Barkhof Emile","year":"2015","unstructured":"Emile Barkhof, Leo M.J. de Sonneville, Carin J. Meijer, and Lieuwe de Haan. 2015. Specificity of facial emotion recognition impairments in patients with multi-episode schizophrenia. Schizophrenia Research: Cognition (2015)."},{"key":"e_1_3_2_1_4_1","volume-title":"the Proceedings of the MediaEval 2017 Workshop","author":"Ben-Ahmed Olfa","year":"2017","unstructured":"Olfa Ben-Ahmed, Jonas Wacker, Alessandro Gaballo, and Benoit Huet. 2017. EURECOM @MediaEval 2017: Media Genre Inference for Predicting Media Interestingness. In the Proceedings of the MediaEval 2017 Workshop, Dublin, Ireland, September 13-15, 2017."},
{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2661806.2661811"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2016.7472178"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2808196.2811638"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3136755.3143004"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/MMUL.2012.26"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2993148.2997637"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2818346.2830596"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2993148.2997632"},{"key":"e_1_3_2_1_13_1","volume-title":"International Conference on Machine Learning Workshops.","author":"Goodfellow I. J.","unstructured":"I. J. Goodfellow, D. Erhan, P. L. Carrier, A. Courville, M. Mirza, B. Hamner, W. Cukierski, Y. Tang, D. Thaler, D. Lee, Y. Zhou, C. Ramaiah, F. Feng, R. Li, X. Wang, D. Athanasakis, J. Shawe-Taylor, M. Milakov, J. Park, R. Ionescu, M. Popescu, C. Grozea, J. Bergstra, J. Xie, L. Romaszko, B. Xu, Z. Chuang, and Y. Bengio. 2013. Challenges in Representation Learning: A report on three machine learning contests. In International Conference on Machine Learning Workshops."},
{"key":"e_1_3_2_1_14_1","volume-title":"Speech Emotion Recognition Using Deep Neural Network and Extreme Learning Machine. In Fifteenth Annual Conference of the International Speech Communication Association.","author":"Han Kun","year":"2014","unstructured":"Kun Han, Dong Yu, and Ivan Tashev. 2014. Speech Emotion Recognition Using Deep Neural Network and Extreme Learning Machine. In Fifteenth Annual Conference of the International Speech Communication Association."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"crossref","unstructured":"A. Hargreaves, O. Mothersill, M. Anderson, S. Lawless, A. Corvin, and G. Donohoe. 2016. Detecting facial emotion recognition deficits in schizophrenia using dynamic stimuli of varying intensities. Neuroscience letters (2016).","DOI":"10.1016\/j.neulet.2016.09.017"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.123"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2808196.2811641"},{"key":"e_1_3_2_1_18_1","volume-title":"Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups","author":"Hinton G.","year":"2012","unstructured":"G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury. 2012. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Processing Magazine 29, 6 (2012), 82\u201397."},
{"key":"e_1_3_2_1_19_1","volume-title":"Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580","author":"Hinton Geoffrey E","year":"2012","unstructured":"Geoffrey E Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R Salakhutdinov. 2012. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580 (2012)."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654984"},{"key":"e_1_3_2_1_22_1","volume-title":"International Conference on Machine Learning.","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In International Conference on Machine Learning."},
{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2010.5540039"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2522848.2531745"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2017.01.012"},{"key":"e_1_3_2_1_26_1","volume-title":"How Deep Neural Networks Can Improve Emotion Recognition on Video Data. In IEEE International Conference on Image Processing.","author":"Khorrami Pooya","year":"2016","unstructured":"Pooya Khorrami, Tom Le Paine, Kevin Brady, Charlie Dagli, and Thomas S Huang. 2016. How Deep Neural Networks Can Improve Emotion Recognition on Video Data. In IEEE International Conference on Image Processing."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2013.6638346"},{"key":"e_1_3_2_1_28_1","volume-title":"Adam: A Method for Stochastic Optimization","author":"Kingma Diederik P.","year":"2014","unstructured":"Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. CoRR (2014)."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/APSIPA.2016.7820699"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2014.2360798"},{"key":"e_1_3_2_1_31_1","volume-title":"Cognitive penetrability and emotion recognition in human facial expressions. Frontiers in psychology","author":"Marchi Francesco","year":"2015","unstructured":"Francesco Marchi and Albert Newen. 2015. Cognitive penetrability and emotion recognition in human facial expressions. Frontiers in psychology (2015)."},
{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDEW.2006.145"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1816041.1816069"},{"key":"e_1_3_2_1_34_1","volume-title":"Deep Face Recognition. In British Machine Vision Conference.","author":"Parkhi Omkar M","year":"2015","unstructured":"Omkar M Parkhi, Andrea Vedaldi, Andrew Zisserman, et al. 2015. Deep Face Recognition. In British Machine Vision Conference."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2808196.2811642"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298682"},{"key":"e_1_3_2_1_37_1","volume-title":"Recurrent Dropout without Memory Loss. CoRR abs\/1603.05118","author":"Semeniuta Stanislau","year":"2016","unstructured":"Stanislau Semeniuta, Aliaksei Severyn, and Erhardt Barth. 2016. Recurrent Dropout without Memory Loss. CoRR abs\/1603.05118 (2016)."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3127906"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.5555\/2627435.2670313"},{"key":"e_1_3_2_1_40_1","volume-title":"Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In International Conference on Learning Representations Workshops.","author":"Szegedy Christian","year":"2016","unstructured":"Christian Szegedy, Sergey Ioffe, and Vincent Vanhoucke. 2016. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In International Conference on Learning Representations Workshops."},
{"key":"e_1_3_2_1_41_1","volume-title":"AENet: Learning Deep Audio Features for Video Analysis. CoRR abs\/1701.00599","author":"Takahashi Naoya","year":"2017","unstructured":"Naoya Takahashi, Michael Gygli, and Luc Van Gool. 2017. AENet: Learning Deep Audio Features for Video Analysis. CoRR abs\/1701.00599 (2017)."},{"key":"e_1_3_2_1_42_1","volume-title":"Deep Features for Multimodal Emotion Classification","author":"Tiwari Shriman Narayan","year":"2016","unstructured":"Shriman Narayan Tiwari, Ngoc QK Duong, Fr\u00e9d\u00e9ric Lefebvre, Claire-Hel\u00e8ne Demarty, Benoit Huet, and Louis Chevallier. 2016. Deep Features for Multimodal Emotion Classification. (2016)."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.510"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2993148.2997639"},{"key":"e_1_3_2_1_45_1","volume-title":"Recurrent Neural Network Regularization. CoRR abs\/1409.2329","author":"Zaremba Wojciech","year":"2014","unstructured":"Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. 2014. Recurrent Neural Network Regularization. CoRR abs\/1409.2329 (2014)."},
{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2016.2603342"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2911996.2912051"},{"key":"e_1_3_2_1_48_1","volume-title":"Learning Affective Features with a Hybrid Deep Model for Audio-Visual Emotion Recognition","author":"Zhang Shiqing","year":"2017","unstructured":"Shiqing Zhang, Shiliang Zhang, Tiejun Huang, Wen Gao, and Qi Tian. 2017. Learning Affective Features with a Hybrid Deep Model for Audio-Visual Emotion Recognition. IEEE Transactions on Circuits and Systems for Video Technology (2017)."}],"event":{"name":"ICMI '17: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION","location":"Glasgow UK","acronym":"ICMI '17","sponsor":["SIGCHI ACM Special Interest Group on Computer-Human Interaction"]},"container-title":["Proceedings of the 19th ACM International Conference on Multimodal 
Interaction"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3136755.3143006","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3136755.3143006","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:10:55Z","timestamp":1750212655000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3136755.3143006"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,11,3]]},"references-count":48,"alternative-id":["10.1145\/3136755.3143006","10.1145\/3136755"],"URL":"https:\/\/doi.org\/10.1145\/3136755.3143006","relation":{},"subject":[],"published":{"date-parts":[[2017,11,3]]},"assertion":[{"value":"2017-11-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}