{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:35:26Z","timestamp":1750221326609,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":38,"publisher":"ACM","license":[{"start":{"date-parts":[[2017,10,19]],"date-time":"2017-10-19T00:00:00Z","timestamp":1508371200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"SQUIRREL","award":["610532"],"award-info":[{"award-number":["610532"]}]},{"name":"TERESA","award":["611153"],"award-info":[{"award-number":["611153"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2017,10,19]]},"DOI":"10.1145\/3123266.3123353","type":"proceedings-article","created":{"date-parts":[[2017,10,20]],"date-time":"2017-10-20T13:04:26Z","timestamp":1508504666000},"page":"1006-1013","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":20,"title":["Deep Temporal Models using Identity Skip-Connections for Speech Emotion Recognition"],"prefix":"10.1145","author":[{"given":"Jaebok","family":"Kim","sequence":"first","affiliation":[{"name":"University of Twente, Enschede, Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gwenn","family":"Englebienne","sequence":"additional","affiliation":[{"name":"University of Twente, Enschede, Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Khiet P.","family":"Truong","sequence":"additional","affiliation":[{"name":"University of Twente, Enschede, Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vanessa","family":"Evers","sequence":"additional","affiliation":[{"name":"University of Twente, Enschede, Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2017,10,19]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Proceedings of LREC.","author":"Batliner Anton","year":"2004","unstructured":"Anton Batliner , Christian Hacker , Stefan Steidl , Elmar N\u00f6th , Shona D'Arcy , Martin J Russell , and Michael Wong . 2004 . You Stupid Tin Box-Children Interacting with the AIBO Robot: A Cross-linguistic Emotional Speech Corpus .. In Proceedings of LREC. Anton Batliner, Christian Hacker, Stefan Steidl, Elmar N\u00f6th, Shona D'Arcy, Martin J Russell, and Michael Wong. 2004. You Stupid Tin Box-Children Interacting with the AIBO Robot: A Cross-linguistic Emotional Speech Corpus.. In Proceedings of LREC."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.50"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2005-446"},{"key":"e_1_3_2_1_4_1","volume-title":"IEMOCAP: Interactive emotional dyadic motion capture database. Language resources and evaluation","author":"Busso Carlos","year":"2008","unstructured":"Carlos Busso , Murtaza Bulut , Chi-Chun Lee , Abe Kazemzadeh , Emily Mower , Samuel Kim , Jeannette N Chang , Sungbok Lee , and Shrikanth S Narayanan . 2008 . IEMOCAP: Interactive emotional dyadic motion capture database. Language resources and evaluation Vol. 42 , 4 (2008), 335--359. Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower, Samuel Kim, Jeannette N Chang, Sungbok Lee, and Shrikanth S Narayanan. 2008. IEMOCAP: Interactive emotional dyadic motion capture database. Language resources and evaluation Vol. 42, 4 (2008), 335--359."},{"key":"e_1_3_2_1_5_1","volume-title":"Proceedings of International Conference on Machine Learning (ICML). 2067--2075","author":"Chung Junyoung","year":"2015","unstructured":"Junyoung Chung , Caglar G\u00fclccehre , Kyunghyun Cho , and Yoshua Bengio . 2015 . Gated Feedback Recurrent Neural Networks . In Proceedings of International Conference on Machine Learning (ICML). 2067--2075 . Junyoung Chung, Caglar G\u00fclccehre, Kyunghyun Cho, and Yoshua Bengio. 2015. Gated Feedback Recurrent Neural Networks. In Proceedings of International Conference on Machine Learning (ICML). 2067--2075."},{"key":"e_1_3_2_1_6_1","unstructured":"Roddy Cowie Ellen Douglas-Cowie Susie Savvidou* Edelle McMahon Martin Sawey and Marc Schr\u00f6der. 2000. 'FEELTRACE': An instrument for recording perceived emotion in real time ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion. 19--24.  Roddy Cowie Ellen Douglas-Cowie Susie Savvidou* Edelle McMahon Martin Sawey and Marc Schr\u00f6der. 2000. 'FEELTRACE': An instrument for recording perceived emotion in real time ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion. 19--24."},{"key":"e_1_3_2_1_7_1","first-page":"1068","article-title":"b. Autoencoder-based Unsupervised Domain Adaptation for Speech Emotion Recognition. Signal Processing Letters","volume":"21","author":"Deng Jun","year":"2014","unstructured":"Jun Deng , Zixing Zhang , Florian Eyben , and Bjorn Schuller . 2014 b. Autoencoder-based Unsupervised Domain Adaptation for Speech Emotion Recognition. Signal Processing Letters , IEEE Vol. 21 , 9 (2014), 1068 -- 1072 . Jun Deng, Zixing Zhang, Florian Eyben, and Bjorn Schuller. 2014 b. Autoencoder-based Unsupervised Domain Adaptation for Speech Emotion Recognition. Signal Processing Letters, IEEE Vol. 21, 9 (2014), 1068--1072.","journal-title":"IEEE"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2014.141"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2133366.2133372"},{"key":"e_1_3_2_1_10_1","volume-title":"Learning Representations of Affect from Speech. arXiv preprint arXiv:1511.04747","author":"Ghosh Sayan","year":"2015","unstructured":"Sayan Ghosh , Eugene Laksana , Louis-Philippe Morency , and Stefan Scherer . 2015. Learning Representations of Affect from Speech. arXiv preprint arXiv:1511.04747 ( 2015 ). Sayan Ghosh, Eugene Laksana, Louis-Philippe Morency, and Stefan Scherer. 2015. Learning Representations of Affect from Speech. arXiv preprint arXiv:1511.04747 (2015)."},{"key":"e_1_3_2_1_11_1","volume-title":"Representation Learning for Speech Emotion Recognition Proceedings of INTERSPEECH. 3603--3607","author":"Ghosh Sayan","year":"2016","unstructured":"Sayan Ghosh , Eugene Laksana , Louis-Philippe Morency , and Stefan Scherer . 2016 . Representation Learning for Speech Emotion Recognition Proceedings of INTERSPEECH. 3603--3607 . Sayan Ghosh, Eugene Laksana, Louis-Philippe Morency, and Stefan Scherer. 2016. Representation Learning for Speech Emotion Recognition Proceedings of INTERSPEECH. 3603--3607."},{"volume-title":"Nonparametric statistical inference","author":"Gibbons Jean Dickinson","key":"e_1_3_2_1_12_1","unstructured":"Jean Dickinson Gibbons and Subhabrata Chakraborti . 2011. Nonparametric statistical inference . Springer . Jean Dickinson Gibbons and Subhabrata Chakraborti. 2011. Nonparametric statistical inference. Springer."},{"key":"e_1_3_2_1_13_1","volume-title":"Highway and residual networks learn unrolled iterative estimation. arXiv preprint arXiv:1612.07771","author":"Greff Klaus","year":"2016","unstructured":"Klaus Greff , Rupesh K Srivastava , and J\u00fcrgen Schmidhuber . 2016. Highway and residual networks learn unrolled iterative estimation. arXiv preprint arXiv:1612.07771 ( 2016 ). Klaus Greff, Rupesh K Srivastava, and J\u00fcrgen Schmidhuber. 2016. Highway and residual networks learn unrolled iterative estimation. arXiv preprint arXiv:1612.07771 (2016)."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2004.840618"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964309"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2017-736"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2013.6638344"},{"key":"e_1_3_2_1_20_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik","year":"2014","unstructured":"Diederik Kingma and Jimmy Ba . 2014 . Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014). Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"volume-title":"Speech emotion recognition using deep neural network and extreme learning machine Proceedings of INTERSPEECH","author":"Kun Han Ivan Tashev","key":"e_1_3_2_1_21_1","unstructured":"Ivan Tashev Kun Han , Dong Yu. 2011. Speech emotion recognition using deep neural network and extreme learning machine Proceedings of INTERSPEECH . Ivan Tashev Kun Han, Dong Yu. 2011. Speech emotion recognition using deep neural network and extreme learning machine Proceedings of INTERSPEECH."},{"key":"e_1_3_2_1_22_1","volume-title":"Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks","author":"LeCun Yann","year":"1995","unstructured":"Yann LeCun , Yoshua Bengio , and others. 1995. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks , Vol. 3361 , 10 ( 1995 ), 1995. Yann LeCun, Yoshua Bengio, and others. 1995. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, Vol. 3361, 10 (1995), 1995."},{"volume-title":"High-level feature representation using recurrent neural network for speech emotion recognition Proceedings of INTERSPEECH","author":"Lee Jinkyu","key":"e_1_3_2_1_23_1","unstructured":"Jinkyu Lee and Ivan Tashev . 2015. High-level feature representation using recurrent neural network for speech emotion recognition Proceedings of INTERSPEECH . Jinkyu Lee and Ivan Tashev. 2015. High-level feature representation using recurrent neural network for speech emotion recognition Proceedings of INTERSPEECH."},{"key":"e_1_3_2_1_24_1","volume-title":"Emotional prosody speech and transcripts","author":"Liberman Mark","year":"2002","unstructured":"Mark Liberman , Kelly Davis , M Grossman , N Martey , and J Bell . 2002. Emotional prosody speech and transcripts . Linguistic Data Consortium , Philadelphia ( 2002 ). Mark Liberman, Kelly Davis, M Grossman, N Martey, and J Bell. 2002. Emotional prosody speech and transcripts. Linguistic Data Consortium, Philadelphia (2002)."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2014.2360798"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDEW.2006.145"},{"key":"e_1_3_2_1_27_1","volume-title":"Roderick Cowie, and Maja Pantic.","author":"McKeown Gary","year":"2010","unstructured":"Gary McKeown , Michel Franccois Valstar , Roderick Cowie, and Maja Pantic. 2010 . The SEMAINE corpus of emotionally coloured character interactions Proceedings of IEEE International Conference on Multimedia and Expo (ICME) . 1079--1084. Gary McKeown, Michel Franccois Valstar, Roderick Cowie, and Maja Pantic. 2010. The SEMAINE corpus of emotionally coloured character interactions Proceedings of IEEE International Conference on Multimedia and Expo (ICME). 1079--1084."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0893-6080(98)00010-0"},{"volume-title":"long short-term memory, fully connected deep neural networks Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"Sainath Tara N","key":"e_1_3_2_1_29_1","unstructured":"Tara N Sainath , Oriol Vinyals , Andrew Senior , and Hacsim Sak . 2015. Convolutional , long short-term memory, fully connected deep neural networks Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) . IEEE , 4580--4584. Tara N Sainath, Oriol Vinyals, Andrew Senior, and Hacsim Sak. 2015. Convolutional, long short-term memory, fully connected deep neural networks Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 4580--4584."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/T-AFFC.2010.8"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/2627435.2670313"},{"key":"e_1_3_2_1_32_1","volume-title":"Highway networks. arXiv preprint arXiv:1505.00387","author":"Srivastava Rupesh Kumar","year":"2015","unstructured":"Rupesh Kumar Srivastava , Klaus Greff , and J\u00fcrgen Schmidhuber . 2015. Highway networks. arXiv preprint arXiv:1505.00387 ( 2015 ). Rupesh Kumar Srivastava, Klaus Greff, and J\u00fcrgen Schmidhuber. 2015. Highway networks. arXiv preprint arXiv:1505.00387 (2015)."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2016.7472669"},{"key":"e_1_3_2_1_34_1","unstructured":"Andreas Veit Michael J Wilber and Serge Belongie. 2016. Residual networks behave like ensembles of relatively shallow networks Advances in Neural Information Processing Systems. 550--558.  Andreas Veit Michael J Wilber and Serge Belongie. 2016. Residual networks behave like ensembles of relatively shallow networks Advances in Neural Information Processing Systems. 550--558."},{"key":"e_1_3_2_1_35_1","volume-title":"Emotional speech recognition: Resources, features, and methods. Speech communication","author":"Ververidis Dimitrios","year":"2006","unstructured":"Dimitrios Ververidis and Constantine Kotropoulos . 2006. Emotional speech recognition: Resources, features, and methods. Speech communication , Vol. 48 , 9 ( 2006 ), 1162--1181. Dimitrios Ververidis and Constantine Kotropoulos. 2006. Emotional speech recognition: Resources, features, and methods. Speech communication, Vol. 48, 9 (2006), 1162--1181."},{"volume-title":"Highway long short-term memory rnns for distant speech recognition Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"Zhang Yu","key":"e_1_3_2_1_36_1","unstructured":"Yu Zhang , Guoguo Chen , Dong Yu , Kaisheng Yaco , Sanjeev Khudanpur , and James Glass . 2016. Highway long short-term memory rnns for distant speech recognition Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) . IEEE , 5755--5759. Yu Zhang, Guoguo Chen, Dong Yu, Kaisheng Yaco, Sanjeev Khudanpur, and James Glass. 2016. Highway long short-term memory rnns for distant speech recognition Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 5755--5759."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACII.2015.7344669"},{"key":"e_1_3_2_1_38_1","volume-title":"Jan Koutn\u00edk, and J\u00fcrgen Schmidhuber.","author":"Zilly Julian Georg","year":"2016","unstructured":"Julian Georg Zilly , Rupesh Kumar Srivastava , Jan Koutn\u00edk, and J\u00fcrgen Schmidhuber. 2016 . Recurrent highway networks. arXiv preprint arXiv:1607.03474 (2016). Julian Georg Zilly, Rupesh Kumar Srivastava, Jan Koutn\u00edk, and J\u00fcrgen Schmidhuber. 2016. Recurrent highway networks. arXiv preprint arXiv:1607.03474 (2016)."}],"event":{"name":"MM '17: ACM Multimedia Conference","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Mountain View California USA","acronym":"MM '17"},"container-title":["Proceedings of the 25th ACM international conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3123266.3123353","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3123266.3123353","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:14:03Z","timestamp":1750212843000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3123266.3123353"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,10,19]]},"references-count":38,"alternative-id":["10.1145\/3123266.3123353","10.1145\/3123266"],"URL":"https:\/\/doi.org\/10.1145\/3123266.3123353","relation":{},"subject":[],"published":{"date-parts":[[2017,10,19]]},"assertion":[{"value":"2017-10-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}