{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T16:14:58Z","timestamp":1772554498163,"version":"3.50.1"},"reference-count":57,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2021,12,20]],"date-time":"2021-12-20T00:00:00Z","timestamp":1639958400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF Smart and Connected Health","award":["1838615"],"award-info":[{"award-number":["1838615"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Comput. Healthcare"],"published-print":{"date-parts":[[2022,4,30]]},"abstract":"<jats:p>The rapid development of machine learning on acoustic signal processing has resulted in many solutions for detecting emotions from speech. Early works were developed for clean and acted speech and for a fixed set of emotions. Importantly, the datasets and solutions assumed that a person only exhibited one of these emotions. More recent work has continually been adding realism to emotion detection by considering issues such as reverberation, de-amplification, and background noise, but often considering one dataset at a time, and also assuming all emotions are accounted for in the model. We significantly improve realistic considerations for emotion detection by (i) more comprehensively assessing different situations by combining the five common publicly available datasets as one and enhancing the new dataset with data augmentation that considers reverberation and de-amplification, (ii) incorporating 11 typical home noises into the acoustics, and (iii) considering that in real situations a person may be exhibiting many emotions that are not currently of interest and they should not have to fit into a pre-fixed category nor be improperly labeled. 
Our novel solution combines CNN with out-of-data distribution detection. Our solution increases the situations where emotions can be effectively detected and outperforms a state-of-the-art baseline.<\/jats:p>","DOI":"10.1145\/3492300","type":"journal-article","created":{"date-parts":[[2021,12,20]],"date-time":"2021-12-20T16:29:43Z","timestamp":1640017783000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Emotion Recognition Robust to Indoor Environmental Distortions and Non-targeted Emotions Using Out-of-distribution Detection"],"prefix":"10.1145","volume":"3","author":[{"given":"Ye","family":"Gao","sequence":"first","affiliation":[{"name":"University of Virginia, Charlottesville, VA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Asif","family":"Salekin","sequence":"additional","affiliation":[{"name":"Syracuse University, Syracuse, NY"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kristina","family":"Gordon","sequence":"additional","affiliation":[{"name":"University of Tennessee, Knoxville, TN"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Karen","family":"Rose","sequence":"additional","affiliation":[{"name":"Ohio State University, Columbus, OH"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongning","family":"Wang","sequence":"additional","affiliation":[{"name":"University of Virginia, Charlottesville, VA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"John","family":"Stankovic","sequence":"additional","affiliation":[{"name":"University of Virginia, Charlottesville, VA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,12,20]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"[n.d.]. dynaEdge DE-100. 
Retrieved from https:\/\/asia.dynabook.com\/laptop\/dynaedge-de100\/overview.php."},{"key":"e_1_3_1_3_2","first-page":"31","volume-title":"IEEE International Conference on Recent Advances in Intelligent Computational Systems (RAICS)","author":"Alex Starlet Ben","year":"2018","unstructured":"Starlet Ben Alex, Ben P. Babu, and Leena Mary. 2018. Utterance and syllable level prosodic features for automatic emotion recognition. In IEEE International Conference on Recent Advances in Intelligent Computational Systems (RAICS). IEEE, 31\u201335."},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.5555\/1778694.1778756"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10639-015-9388-2"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1177\/1059712316664017"},{"key":"e_1_3_1_7_2","first-page":"251","volume-title":"22nd Conference on Computational Natural Language Learning","author":"Beard Rory","year":"2018","unstructured":"Rory Beard, Ritwik Das, Raymond W. M. Ng, P. G. Keerthana Gopalakrishnan, Luka Eerens, Pawel Swietojanski, and Ondrej Miksik. 2018. Multi-modal sequence fusion via recursive attention for emotion recognition. In 22nd Conference on Computational Natural Language Learning. 251\u2013259."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/1571941.1571946"},{"key":"e_1_3_1_9_2","volume-title":"9th European Conference on Speech Communication and Technology","author":"Burkhardt Felix","year":"2005","unstructured":"Felix Burkhardt, Astrid Paeschke, Miriam Rolfes, Walter F. Sendlmeier, and Benjamin Weiss. 2005. A database of German emotional speech. 
In 9th European Conference on Speech Communication and Technology."},{"issue":"1","key":"e_1_3_1_10_2","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1007\/s10803-017-3318-7","article-title":"Brief report: Inter-relationship between emotion regulation, intolerance of uncertainty, anxiety, and depression in youth with autism spectrum disorder","volume":"48","author":"Cai Ru Ying","year":"2018","unstructured":"Ru Ying Cai, Amanda L. Richdale, Cheryl Dissanayake, and Mirko Uljarevi\u0107. 2018. Brief report: Inter-relationship between emotion regulation, intolerance of uncertainty, anxiety, and depression in youth with autism spectrum disorder. J. Autism Devel. Disord. 48, 1 (2018), 316\u2013325.","journal-title":"J. Autism Devel. Disord."},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2014.2336244"},{"key":"e_1_3_1_12_2","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1007\/978-3-319-62530-0_10","volume-title":"Personal Assistants: Emerging Computational Technologies","author":"Castillo Jos\u00e9 Carlos","year":"2018","unstructured":"Jos\u00e9 Carlos Castillo, \u00c1lvaro Castro-Gonz\u00e1lez, Fern\u00e1ndo Alonso-Mart\u00edn, Antonio Fern\u00e1ndez-Caballero, and Miguel \u00c1ngel Salichs. 2018. Emotion detection and regulation from personal assistant robot in smart environment. In Personal Assistants: Emerging Computational Technologies. Springer, 179\u2013195."},{"key":"e_1_3_1_13_2","first-page":"27","volume-title":"Emotions, Technology, Design, and Learning","author":"Cen Ling","year":"2016","unstructured":"Ling Cen, Fei Wu, Zhu Liang Yu, and Fengye Hu. 2016. A real-time speech emotion recognition system and its application in online learning. In Emotions, Technology, Design, and Learning. 
Elsevier, 27\u201346."},{"issue":"1","key":"e_1_3_1_14_2","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1109\/TCE.2021.3056421","article-title":"Real-time speech emotion analysis for smart home assistants","volume":"67","author":"Chatterjee Rajdeep","year":"2021","unstructured":"Rajdeep Chatterjee, Saptarshi Mazumdar, R. Simon Sherratt, Rohit Halder, Tanmoy Maitra, and Debasis Giri. 2021. Real-time speech emotion analysis for smart home assistants. IEEE Trans. Consum. Electron. 67, 1 (2021), 68\u201376.","journal-title":"IEEE Trans. Consum. Electron."},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.5555\/3324320.3324339"},{"issue":"2","key":"e_1_3_1_16_2","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1080\/0305764X.2018.1472744","article-title":"Using emotion regulation to cope with challenges: A study of Chinese students in the United Kingdom","volume":"49","author":"Cheng Ming","year":"2019","unstructured":"Ming Cheng, Andrew Friesen, and Olalekan Adekola. 2019. Using emotion regulation to cope with challenges: A study of Chinese students in the United Kingdom. Cambr. J. Educ. 49, 2 (2019), 133\u2013145.","journal-title":"Cambr. J. Educ."},{"key":"e_1_3_1_17_2","first-page":"257","volume-title":"IEEE Applied Signal Processing Conference (ASPCON\u201918)","author":"Choudhury Akash Roy","year":"2018","unstructured":"Akash Roy Choudhury, Anik Ghosh, Rahul Pandey, and Subhas Barman. 2018. Emotion recognition from speech signals using excitation source and spectral features. In IEEE Applied Signal Processing Conference (ASPCON\u201918). IEEE, 257\u2013261."},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-69369-7_23"},{"key":"e_1_3_1_19_2","first-page":"193","volume-title":"IEEE International Conference on Multimedia and Expo","author":"Datcu Dragos","year":"2005","unstructured":"Dragos Datcu and L\u00e9on J. M. Rothkrantz. 2005. Facial expression recognition with relevance vector machines. 
In IEEE International Conference on Multimedia and Expo. IEEE, 193\u2013196."},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2017.2672753"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.5555\/2602339.2602352"},{"key":"e_1_3_1_22_2","volume-title":"Toronto Emotional Speech Set (TESS)","author":"Dupuis Kate","year":"2010","unstructured":"Kate Dupuis and M. Kathleen Pichora-Fuller. 2010. Toronto Emotional Speech Set (TESS). University of Toronto, Psychology Department."},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2015.2457417"},{"key":"e_1_3_1_24_2","first-page":"1","volume-title":"9th International Conference on Signal Processing and Communication Systems (ICSPCS\u201915)","author":"Fayek Haytham M.","year":"2015","unstructured":"Haytham M. Fayek, Margaret Lech, and Lawrence Cavedon. 2015. Towards real-time speech emotion recognition using deep neural networks. In 9th International Conference on Signal Processing and Communication Systems (ICSPCS\u201915). IEEE, 1\u20135."},{"key":"e_1_3_1_25_2","first-page":"200","volume-title":"International Conference on System Modeling & Advancement in Research Trends (SMART\u201918)","author":"Fernandes V.","year":"2018","unstructured":"V. Fernandes, L. Mascarehnas, C. Mendonca, A. Johnson, and R. Mishra. 2018. Speech emotion recognition using mel frequency cepstral coefficient and SVM classifier. In International Conference on System Modeling & Advancement in Research Trends (SMART\u201918). 
IEEE, 200\u2013204."},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2016.09.015"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3384419.3430422"},{"issue":"3","key":"e_1_3_1_28_2","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1016\/j.jalz.2019.01.010","article-title":"2019 Alzheimer\u2019s disease facts and figures","volume":"15","author":"Gaugler Joseph","year":"2019","unstructured":"Joseph Gaugler, Bryan James, Tricia Johnson, Allison Marin, and Jennifer Weuve. 2019. 2019 Alzheimer\u2019s disease facts and figures. Alzh. Dement. 15, 3 (2019), 321\u2013387.","journal-title":"Alzh. Dement."},{"key":"e_1_3_1_29_2","volume-title":"8th International Conference on Affective Computing & Intelligent Interaction (ACII\u201919)","author":"Ghaleb Esam","year":"2019","unstructured":"Esam Ghaleb, Mirela Popa, and Stylianos Asteriadis. 2019. Multimodal and temporal perception of audio-visual cues for emotion recognition. In 8th International Conference on Affective Computing & Intelligent Interaction (ACII\u201919)."},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.5555\/3086952"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.5555\/3204855"},{"issue":"2","key":"e_1_3_1_32_2","first-page":"151","article-title":"Emotion regulation and mental health","volume":"2","author":"Gross James J.","year":"1995","unstructured":"James J. Gross and Ricardo F. Mu\u00f1oz. 1995. Emotion regulation and mental health. Clin. Psychol.: Sci. Pract. 2, 2 (1995), 151\u2013164.","journal-title":"Clin. Psychol.: Sci. Pract."},{"issue":"2","key":"e_1_3_1_33_2","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1002\/wps.20618","article-title":"Mental illness and well-being: An affect regulation perspective","volume":"18","author":"Gross James J.","year":"2019","unstructured":"James J. Gross, Helen Uusberg, and Andero Uusberg. 2019. Mental illness and well-being: An affect regulation perspective. 
World Psychiat. 18, 2 (2019), 130\u2013139.","journal-title":"World Psychiat."},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.5555\/1892152"},{"key":"e_1_3_1_35_2","article-title":"A baseline for detecting misclassified and out-of-distribution examples in neural networks","author":"Hendrycks Dan","year":"2016","unstructured":"Dan Hendrycks and Kevin Gimpel. 2016. A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136 (2016).","journal-title":"arXiv preprint arXiv:1610.02136"},{"key":"e_1_3_1_36_2","first-page":"3658","volume-title":"Interspeech Conference","author":"Huang Che-Wei","year":"2018","unstructured":"Che-Wei Huang and Shrikanth Narayanan. 2018. Stochastic shake-shake regularization for affective learning from speech. In Interspeech Conference. 3658\u20133662."},{"key":"e_1_3_1_37_2","first-page":"1","volume-title":"IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP\u201918)","author":"Jalili Amin","year":"2018","unstructured":"Amin Jalili, Sadid Sahami, Chong-Yung Chi, and Rassoul Amirfattahi. 2018. Speech emotion recognition using cyclostationary spectral analysis. In IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP\u201918). IEEE, 1\u20136."},{"key":"e_1_3_1_38_2","doi-asserted-by":"crossref","first-page":"14","DOI":"10.3389\/fcomp.2020.00014","article-title":"Real-time speech emotion recognition using a pre-trained image classification network: Effects of bandwidth reduction and companding","volume":"2","author":"Lech Margaret","year":"2020","unstructured":"Margaret Lech, Melissa Stolar, Christopher Best, and Robert Bolia. 2020. Real-time speech emotion recognition using a pre-trained image classification network: Effects of bandwidth reduction and companding. Front. Comput. Sci. 2 (2020), 14.","journal-title":"Front. Comput. 
Sci."},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.5555\/3327757.3327819"},{"key":"e_1_3_1_40_2","article-title":"Enhancing the reliability of out-of-distribution image detection in neural networks","author":"Liang Shiyu","year":"2017","unstructured":"Shiyu Liang, Yixuan Li, and Rayadurgam Srikant. 2017. Enhancing the reliability of out-of-distribution image detection in neural networks. arXiv preprint arXiv:1706.02690 (2017).","journal-title":"arXiv preprint arXiv:1706.02690"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0196391"},{"key":"e_1_3_1_42_2","volume-title":"16th International Congress of Phonetic Sciences","author":"Lugger Marko","year":"2007","unstructured":"Marko Lugger and Bin Yang. 2007. An incremental analysis of different feature groups in speaker independent emotion recognition. In 16th International Congress of Phonetic Sciences."},{"key":"e_1_3_1_43_2","unstructured":"Prasanta Chandra Mahalanobis. 1936. On the generalized distance in statistics. National Institute of Science of India."},{"key":"e_1_3_1_44_2","first-page":"1","volume-title":"Innovations in Intelligent Systems and Applications (INISTA\u201918)","author":"Mano Leandro Y.","year":"2018","unstructured":"Leandro Y. Mano. 2018. Emotional condition in the Health Smart Homes environment: Emotion recognition using ensemble of classifiers. In Innovations in Intelligent Systems and Applications (INISTA\u201918). IEEE, 1\u20138."},{"key":"e_1_3_1_45_2","first-page":"1128","volume-title":"24th European Signal Processing Conference (EUSIPCO\u201916)","author":"Mesaros Annamaria","year":"2016","unstructured":"Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. 2016. TUT database for acoustic scene classification and sound event detection. In 24th European Signal Processing Conference (EUSIPCO\u201916). 
IEEE, 1128\u20131132."},{"key":"e_1_3_1_46_2","article-title":"VoxCeleb: A large-scale speaker identification dataset","author":"Nagrani Arsha","year":"2017","unstructured":"Arsha Nagrani, Joon Son Chung, and Andrew Zisserman. 2017. VoxCeleb: A large-scale speaker identification dataset. arXiv preprint arXiv:1706.08612 (2017).","journal-title":"arXiv preprint arXiv:1706.08612"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2017.2713783"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3130961"},{"key":"e_1_3_1_49_2","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1007\/978-3-030-66471-8_28","volume-title":"International Conference on Distributed Computer and Communication Networks","author":"Shchetinin Eugene Yu","year":"2020","unstructured":"Eugene Yu Shchetinin, Leonid A. Sevastianov, Dmitry S. Kulyabov, Edik A. Ayrjan, and Anastasia V. Demidova. 2020. Deep neural networks for emotion recognition. In International Conference on Distributed Computer and Communication Networks. Springer, 365\u2013379."},{"key":"e_1_3_1_50_2","first-page":"1","volume-title":"11th International Conference on Signal Processing and Communication Systems (ICSPCS\u201917)","author":"Stolar Melissa N.","year":"2017","unstructured":"Melissa N. Stolar, Margaret Lech, Robert S. Bolia, and Michael Skinner. 2017. Real time speech emotion recognition using RGB image classification and transfer learning. In 11th International Conference on Signal Processing and Communication Systems (ICSPCS\u201917). IEEE, 1\u20138."},{"key":"e_1_3_1_51_2","article-title":"Towards robust speech emotion recognition using deep residual networks for speech enhancement","author":"Triantafyllopoulos Andreas","year":"2019","unstructured":"Andreas Triantafyllopoulos, Gil Keren, Johannes Wagner, Ingmar Steiner, and Bj\u00f6rn Schuller. 2019. Towards robust speech emotion recognition using deep residual networks for speech enhancement. 
In Interspeech Conference.","journal-title":"I"},{"key":"e_1_3_1_52_2","first-page":"5200","volume-title":"IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP\u201916)","author":"Trigeorgis George","year":"2016","unstructured":"George Trigeorgis, Fabien Ringeval, Raymond Brueckner, Erik Marchi, Mihalis A. Nicolaou, Bj\u00f6rn Schuller, and Stefanos Zafeiriou. 2016. Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP\u201916). IEEE, 5200\u20135204."},{"key":"e_1_3_1_53_2","first-page":"1007","volume-title":"42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO\u201919)","author":"Vreb\u010devi\u0107 N.","year":"2019","unstructured":"N. Vreb\u010devi\u0107, I. Miji\u0107, and D. Petrinovi\u0107. 2019. Emotion classification based on convolutional neural network using speech data. In 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO\u201919). IEEE, 1007\u20131012."},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2015.2392101"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.smhl.2020.100165"},{"key":"e_1_3_1_56_2","first-page":"281","volume-title":"International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST\u201919)","author":"Zamil Adib Ashfaq A.","year":"2019","unstructured":"Adib Ashfaq A. Zamil, Sajib Hasan, Showmik MD. Jannatul Baki, Jawad MD. Adam, and Isra Zaman. 2019. Emotion detection from speech signals using voting mechanism on classified frames. In International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST\u201919). 
IEEE, 281\u2013285."},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/2973750.2973762"},{"key":"e_1_3_1_58_2","doi-asserted-by":"crossref","first-page":"1517","DOI":"10.21437\/Interspeech.2005-446","volume-title":"Interspeech","author":"Burkhardt Felix","year":"2005","unstructured":"Felix Burkhardt, Astrid Paeschke, Miriam Rolfes, Walter F. Sendlmeier, and Benjamin Weiss. 2005. A database of German emotional speech. In Interspeech, Vol. 5. 1517\u20131520."}],"container-title":["ACM Transactions on Computing for Healthcare"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3492300","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3492300","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:31:07Z","timestamp":1750188667000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3492300"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,20]]},"references-count":57,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,4,30]]}},"alternative-id":["10.1145\/3492300"],"URL":"https:\/\/doi.org\/10.1145\/3492300","relation":{},"ISSN":["2691-1957","2637-8051"],"issn-type":[{"value":"2691-1957","type":"print"},{"value":"2637-8051","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12,20]]},"assertion":[{"value":"2020-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-12-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}