{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,5]],"date-time":"2026-06-05T05:06:46Z","timestamp":1780636006451,"version":"3.54.1"},"publisher-location":"New York, NY, USA","reference-count":35,"publisher":"ACM","license":[{"start":{"date-parts":[[2016,10,16]],"date-time":"2016-10-16T00:00:00Z","timestamp":1476576000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2016,10,16]]},"DOI":"10.1145\/2988257.2988264","type":"proceedings-article","created":{"date-parts":[[2016,10,12]],"date-time":"2016-10-12T18:34:04Z","timestamp":1476297244000},"page":"97-104","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":83,"title":["Multi-Modal Audio, Video and Physiological Sensor Learning for Continuous Emotion Prediction"],"prefix":"10.1145","author":[{"given":"Kevin","family":"Brady","sequence":"first","affiliation":[{"name":"Massachusetts Institute of Technology Lincoln Laboratory, Lexington, MA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Youngjune","family":"Gwon","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology Lincoln Laboratory, Lexington, MA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Pooya","family":"Khorrami","sequence":"additional","affiliation":[{"name":"University of Illinois, Urbana-Champaign, IL, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Elizabeth","family":"Godoy","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology Lincoln Laboratory, Lexington, MA, 
USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"William","family":"Campbell","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology Lincoln Laboratory, Lexington, MA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Charlie","family":"Dagli","sequence":"additional","affiliation":[{"name":"MIT Lincoln Laboratory, Lexington, MA, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Thomas S.","family":"Huang","sequence":"additional","affiliation":[{"name":"University of Illinois, Urbana-Champaign, IL, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2016,10,16]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2808196.2811634"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2808196.2811638"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2818346.2830596"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2808196.2811641"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2522848.2531745"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"crossref","unstructured":"P. Khorrami T. L. Paine K. Brady C. Dagli & T. S. Huang 2016. How Deep Neural Networks Can Improve Emotion Recognition on Video Data. arXiv preprint arXiv:1602.07377.  P. Khorrami T. L. Paine K. Brady C. Dagli & T. S. Huang 2016. How Deep Neural Networks Can Improve Emotion Recognition on Video Data. arXiv preprint arXiv:1602.07377.","DOI":"10.1109\/ICIP.2016.7532431"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2015.12"},{"key":"e_1_3_2_1_8_1","volume-title":"Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research, 1755--1758.","author":"King D. E.","year":"2009","unstructured":"D. E. King , 2009 . Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research, 1755--1758. D. E. King, 2009. 
Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research, 1755--1758."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2014.11.007"},{"key":"e_1_3_2_1_10_1","volume-title":"AVEC 2016 -- Depression, Mood, and Emotion Recognition Workshop and Challenge}, arXiv:1605","author":"Valstar M.","year":"2016","unstructured":"M. Valstar , J. Gratch , B. Schuller , F. Ringeval , D. Lalanne , M. Torres , S. Scherer , G. Stratou , R. Cowie , and M. Pantic . AVEC 2016 -- Depression, Mood, and Emotion Recognition Workshop and Challenge}, arXiv:1605 .01600, 2016 . M. Valstar, J. Gratch, B. Schuller, F. Ringeval, D. Lalanne, M. Torres, S. Scherer, G. Stratou, R. Cowie, and M. Pantic. AVEC 2016 -- Depression, Mood, and Emotion Recognition Workshop and Challenge}, arXiv:1605.01600, 2016."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"crossref","DOI":"10.1002\/0471221279","volume-title":"Estimation with Applications to Tracking and Navigation: Theory, Algorithms, and Software","author":"Bar-Shalom Y.","year":"2001","unstructured":"Y. Bar-Shalom , X. Rong Li , and T. Kirubarajan , Estimation with Applications to Tracking and Navigation: Theory, Algorithms, and Software , John Wiley & Sons, Inc. , New York , 2001 . Y. Bar-Shalom, X. Rong Li, and T. Kirubarajan, Estimation with Applications to Tracking and Navigation: Theory, Algorithms, and Software, John Wiley & Sons, Inc., New York, 2001."},{"key":"e_1_3_2_1_12_1","series-title":"the series Lecture Notes in Computer Science","volume-title":"Multiple Classifier Systems","author":"Glodek M.","year":"2013","unstructured":"M. Glodek , , Kalman Filter Based Classifier Fusion for Affective State Recognition , Multiple Classifier Systems , Vol. 2772 of the series Lecture Notes in Computer Science , Springer-Verlag , Berlin ,, 2013 . M. Glodek, et al, Kalman Filter Based Classifier Fusion for Affective State Recognition, Multiple Classifier Systems, Vol. 
2772 of the series Lecture Notes in Computer Science, Springer-Verlag, Berlin,, 2013."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.5220\/0004828606710678"},{"key":"e_1_3_2_1_14_1","volume-title":"Processing Conference (EUSIPCO)","author":"Markov K.","year":"2015","unstructured":"K. Markov , Dynamic Speech Emotion Recognition with State-Space Models , 23rd European signal Processing Conference (EUSIPCO) , 2015 . K. Markov, et al, Dynamic Speech Emotion Recognition with State-Space Models, 23rd European signal Processing Conference (EUSIPCO), 2015."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/FG.2013.6553805"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2007.367262"},{"key":"e_1_3_2_1_17_1","volume-title":"Proc. of Interspeech","author":"M. W\u00f6llmer","year":"2008","unstructured":"M. W\u00f6llmer et al. , Abandoning emotion classes-towards continuous emotion recognition with modeling of long-range dependencies . In Proc. of Interspeech , 2008 . M. W\u00f6llmer et al., Abandoning emotion classes-towards continuous emotion recognition with modeling of long-range dependencies. In Proc. of Interspeech, 2008."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2808196.2811642"},{"key":"e_1_3_2_1_19_1","first-page":"41","volume-title":"Proc. Odyssey: The Speaker and Language Recognition Workshop in Toledo, Spain, ISCA","author":"Singer W. M.","year":"2004","unstructured":"Campbell, W. M. , Singer , E., Torres-Carrasquillo, P. A. , Reynolds , D. A. , Language Recognition with Support Vector Machines , In Proc. Odyssey: The Speaker and Language Recognition Workshop in Toledo, Spain, ISCA , pp. 41 -- 44 , 31 May -3 June 2004 . Campbell, W. M., Singer, E., Torres-Carrasquillo, P. A., Reynolds, D. A., Language Recognition with Support Vector Machines, In Proc. Odyssey: The Speaker and Language Recognition Workshop in Toledo, Spain, ISCA, pp. 
41--44, 31 May-3 June 2004."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2808196.2811640"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1006\/dspr.1999.0361"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0042-6989(97)00169-7"},{"key":"e_1_3_2_1_23_1","volume-title":"Proc. of AISTATS","author":"Coates A.","year":"2011","unstructured":"A. Coates , H. Lee , A. Ng . An analysis of single-layer networks in unsupervised feature learning . In Proc. of AISTATS , 2011 A. Coates, H. Lee, A. Ng. An analysis of single-layer networks in unsupervised feature learning. In Proc. of AISTATS, 2011"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995313"},{"key":"e_1_3_2_1_25_1","unstructured":"ITU Standard rec-bs.1387--1--2001 2001.  ITU Standard rec-bs.1387--1--2001 2001."},{"key":"e_1_3_2_1_26_1","first-page":"30","volume-title":"Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation,\" in Digital Audio Effects (DAFx)","author":"Roebel A.","year":"2005","unstructured":"A. Roebel and X. Rodet , \" Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation,\" in Digital Audio Effects (DAFx) , 2005 , pp. 30 -- 35 . A. Roebel and X. Rodet, \"Efficient spectral envelope estimation and its application to pitch shifting and envelope preservation,\" in Digital Audio Effects (DAFx), 2005, pp. 30--35."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"crossref","first-page":"2307","DOI":"10.21437\/Eurospeech.1999-503","author":"Kapilow D.","year":"1999","unstructured":"D. Kapilow , Y. Stylianou , and J. Schroeter , Detection of nonstationarity in speech signals and its application to time-scaling. Eurospeech. 1999 . pp. 2307 -- 2310 . D. Kapilow, Y. Stylianou, and J. Schroeter, Detection of nonstationarity in speech signals and its application to time-scaling. Eurospeech. 1999. pp. 
2307--2310.","journal-title":"Eurospeech."},{"key":"e_1_3_2_1_28_1","volume-title":"Proc. of INTERSPEECH","author":"Torres-Carrasquillo P. A.","year":"2002","unstructured":"P. A. Torres-Carrasquillo , E. Singer , M. A. Kohler , R. J. Greene , D. A. Reynolds , and J. R. Deller . Approaches to Language Identification Using Gaussian Mixture Models and Shifted Delta Cepstral Features . In Proc. of INTERSPEECH , 2002 . P. A. Torres-Carrasquillo, E. Singer, M. A. Kohler, R. J. Greene, D. A. Reynolds, and J. R. Deller. Approaches to Language Identification Using Gaussian Mixture Models and Shifted Delta Cepstral Features. In Proc. of INTERSPEECH, 2002."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00127682"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553463"},{"key":"e_1_3_2_1_31_1","unstructured":"A. Graves. \"Generating sequences with recurrent neural networks.\" arXiv preprint arXiv:1308.0850 (2013).  A. Graves. \"Generating sequences with recurrent neural networks.\" arXiv preprint arXiv:1308.0850 (2013)."},{"issue":"2","key":"e_1_3_2_1_32_1","first-page":"201","volume":"5","author":"Bone D.","year":"2014","unstructured":"D. Bone , C. Lee , and S. Narayanan , \"Robust unsupervised arousal rating: A rule-based framework with knowledge-inspired vocal features,\" Affective Computing, IEEE Transactions on , vol. 5 , no. 2 , pp. 201 -- 213 , 2014 . D. Bone, C. Lee, and S. Narayanan, \"Robust unsupervised arousal rating: A rule-based framework with knowledge-inspired vocal features,\" Affective Computing, IEEE Transactions on, vol. 5, no. 2, pp. 201--213, 2014.","journal-title":"\"Robust unsupervised arousal rating: A rule-based framework with knowledge-inspired vocal features,\" Affective Computing, IEEE Transactions on"},{"key":"e_1_3_2_1_33_1","first-page":"5200","volume-title":"Speech and Signal Processing (ICASSP)","author":"Ringeval G. Trigeorgis F.","year":"2016","unstructured":"G. Trigeorgis F. 
Ringeval , R. Brueckner , E. Marchi , M. A. Nicolaou , and S. Zafeiriou . \" Adieu features' End-to-end speech emotion recognition using a deep convolutional recurrent network.\" In 2016 IEEE International Conference on Acoustics , Speech and Signal Processing (ICASSP) , pp. 5200 -- 5204 . IEEE, 2016 . G. Trigeorgis F. Ringeval, R. Brueckner, E. Marchi, M. A. Nicolaou, and S. Zafeiriou. \"Adieu features' End-to-end speech emotion recognition using a deep convolutional recurrent network.\" In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5200--5204. IEEE, 2016."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2015.01.076"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/SSCI.2015.53"}],"event":{"name":"MM '16: ACM Multimedia Conference","location":"Amsterdam The Netherlands","acronym":"MM '16","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 6th International Workshop on Audio\/Visual Emotion 
Challenge"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2988257.2988264","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2988257.2988264","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T03:50:36Z","timestamp":1750218636000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2988257.2988264"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,10,16]]},"references-count":35,"alternative-id":["10.1145\/2988257.2988264","10.1145\/2988257"],"URL":"https:\/\/doi.org\/10.1145\/2988257.2988264","relation":{},"subject":[],"published":{"date-parts":[[2016,10,16]]},"assertion":[{"value":"2016-10-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}