{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T16:10:58Z","timestamp":1776355858297,"version":"3.51.2"},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2012,5,4]],"date-time":"2012-05-04T00:00:00Z","timestamp":1336089600000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J AUDIO SPEECH MUSIC PROC."],"published-print":{"date-parts":[[2012,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>In this article, we propose a new set of acoustic features for automatic emotion recognition from audio. The features are based on the perceptual quality metrics that are given in perceptual evaluation of audio quality known as ITU BS.1387 recommendation. Starting from the outer and middle ear models of the auditory system, we base our features on the masked perceptual loudness which defines relatively objective criteria for emotion detection. The features computed in critical bands based on the reference concept include the partial loudness of the emotional difference, emotional difference-to-perceptual mask ratio, measures of alterations of temporal envelopes, measures of harmonics of the emotional difference, the occurrence probability of emotional blocks, and perceptual bandwidth. A soft-majority voting decision rule that strengthens the conventional majority voting is proposed to assess the classifier outputs. Compared to the state-of-the-art systems including Munich Open-Source Emotion and Affect Recognition Toolkit, Hidden Markov Toolkit, and Generalized Discriminant Analysis, it is shown that the emotion recognition rates are improved between 7-16% for EMO-DB and 7-11% in VAM for \"all\" and \"valence\" tasks.<\/jats:p>","DOI":"10.1186\/1687-4722-2012-16","type":"journal-article","created":{"date-parts":[[2012,12,5]],"date-time":"2012-12-05T15:43:35Z","timestamp":1354722215000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":56,"title":["Perceptual audio features for emotion detection"],"prefix":"10.1186","volume":"2012","author":[{"given":"Mehmet Cenk","family":"Sezgin","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bilge","family":"Gunsel","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gunes Karabulut","family":"Kurt","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2012,5,4]]},"reference":[{"issue":"1","key":"52_CR1","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1109\/79.911197","volume":"18","author":"R Cowie","year":"2001","unstructured":"Cowie R, Douglas-Cowie E, Tsapatsoulis N, Votsis G, Kollias S, Fellenz W, Taylor J: Emotion recognition in human-computer interaction. IEEE Signal Process Mag 2001, 18(1):32-80.","journal-title":"IEEE Signal Process Mag"},{"issue":"3","key":"52_CR2","doi-asserted-by":"publisher","first-page":"572","DOI":"10.1016\/j.patcog.2010.09.020","volume":"44","author":"ME Ayadia","year":"2011","unstructured":"Ayadia ME, Kamelb MS, Karrayb F: Survey on speech emotion recognition: features, classification schemes, and databases. Pattern Recognit 2011, 44(3):572-587.","journal-title":"Pattern Recognit"},{"key":"52_CR3","doi-asserted-by":"publisher","first-page":"293","DOI":"10.1109\/TSA.2004.838534","volume":"13","author":"CM Lee","year":"2005","unstructured":"Lee CM, Narayanan SS: Toward detecting emotions in spoken dialogs. IEEE Trans Speech Audio Process 2005, 13: 293-303.","journal-title":"IEEE Trans Speech Audio Process"},{"key":"52_CR4","first-page":"827","volume-title":"Proc of the IEEE Int Workshop on EmoSPACE, in Conjunction with the IEEE FG 2011, CA, USA","author":"H Gunes","year":"2011","unstructured":"Gunes H, Schuller B, Pantic M, Cowie R: Emotion representation, analysis and synthesis in continuous space: a survey. Proc of the IEEE Int Workshop on EmoSPACE, in Conjunction with the IEEE FG 2011, CA, USA 2011, 827-834."},{"key":"52_CR5","first-page":"552","volume-title":"Proc of the IEEE Automatic Speech Recognition and Understanding Workshop, Italy","author":"B Schuller","year":"2009","unstructured":"Schuller B, Vlasenko B, Eyben F, Rigoll G, Wendemuth A: Acoustic emotion recognition: a benchmark comparison of performances. Proc of the IEEE Automatic Speech Recognition and Understanding Workshop, Italy 2009, 552-557."},{"issue":"9","key":"52_CR6","doi-asserted-by":"publisher","first-page":"1162","DOI":"10.1016\/j.specom.2006.04.003","volume":"48","author":"D Ververidis","year":"2006","unstructured":"Ververidis D, Kotropoulos C: Emotional speech recognition: resources, features, and methods. Speech Commun 2006, 48(9):1162-1181.","journal-title":"Speech Commun"},{"key":"52_CR7","volume-title":"The HTK Book (v3.4)","author":"S Young","year":"2006","unstructured":"Young S, Evermann G, Gales M, Hain T, Kershaw D, Liu X, Moore G, Odell J, Ollason D, Povey D, Valtchev V, Woodland P: The HTK Book (v3.4). Cambridge University Press, Cambridge; 2006."},{"key":"52_CR8","first-page":"576","volume-title":"IEEE Proc of the 4th International HUMAINE Association Conference on Affective Computing and Intelligent Interaction, Amsterdam","author":"F Eyben","year":"2009","unstructured":"Eyben F, Wollmer M, Schuller B: openEAR--introducing the munich open-source emotion and affect recognition toolkit. IEEE Proc of the 4th International HUMAINE Association Conference on Affective Computing and Intelligent Interaction, Amsterdam 2009, 576-581."},{"key":"52_CR9","first-page":"5688","volume-title":"Proc of the IEEE International Conference on Acoustics Speech and Signal Processing, Prague","author":"A Stuhlsatz","year":"2011","unstructured":"Stuhlsatz A, Meyer C, Eyben F, Zielke T, Meier G, Schuller B: Deep neural networks for acoustic emotion recognition: raising the benchmarks. Proc of the IEEE International Conference on Acoustics Speech and Signal Processing, Prague 2011, 5688-5691."},{"key":"52_CR10","first-page":"395","volume-title":"Speech Recognition, Technologies and Applications","author":"M Lugger","year":"2008","unstructured":"Lugger M, Yang B: Psychological motivated multi-stage emotion classification exploiting voice quality features. In Speech Recognition, Technologies and Applications. Edited by: France Mihelic, Janez Zibert. I-Tech Education and Publishing, Vienna, Austria; 2008:395-410."},{"issue":"5","key":"52_CR11","doi-asserted-by":"publisher","first-page":"1415","DOI":"10.1016\/j.sigpro.2009.09.009","volume":"90","author":"B Yang","year":"2010","unstructured":"Yang B, Lugger M: Emotion recognition from speech signals using new harmony features. Signal Process 2010, 90(5):1415-1423.","journal-title":"Signal Process"},{"key":"52_CR12","doi-asserted-by":"publisher","DOI":"10.1002\/0470093366","volume-title":"MPEG-7 Audio and Beyond","author":"HG Kim","year":"2005","unstructured":"Kim HG, Moreau N, Sikora T: MPEG-7 Audio and Beyond. John Wiley & Sons Ltd., England; 2005."},{"key":"52_CR13","first-page":"780","volume-title":"Proc of the IEEE Int Workshop on EmoSPACE, in Conjunction with the IEEE FG 2011, CA, USA","author":"C Sezgin","year":"2011","unstructured":"Sezgin C, Gunsel B, Kurt GK: A novel perceptual feature set for audio emotion recognition. Proc of the IEEE Int Workshop on EmoSPACE, in Conjunction with the IEEE FG 2011, CA, USA 2011, 780-785."},{"issue":"2","key":"52_CR14","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/T-AFFC.2010.8","volume":"1","author":"B Schuller","year":"2010","unstructured":"Schuller B, Vlasenko B, Eyben F, Wollmer M, Stuhlsatz A, Wendemuth A, Rigoll G: Cross-corpus acoustic emotion recognition: variances and strategies. IEEE Trans Affect Comput 2010, 1(2):1-13.","journal-title":"IEEE Trans Affect Comput"},{"key":"52_CR15","first-page":"1517","volume-title":"Proc of the INTERSPEECH, Portugal","author":"F Burkhardt","year":"2005","unstructured":"Burkhardt F, Paeschke A, Rolfes M, Sendlmeier W, Weiss B: A database of German emotional speech. Proc of the INTERSPEECH, Portugal 2005, 1517-1520."},{"key":"52_CR16","first-page":"737","volume-title":"Proc of the IEEE International Conference on Multimedia and Expo, Germany","author":"M Grimm","year":"2008","unstructured":"Grimm M, Kroschel K, Narayanan S: The Vera am Mittag German audio-visual emotional speech database. Proc of the IEEE International Conference on Multimedia and Expo, Germany 2008, 737-742."},{"key":"52_CR17","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1080\/02699939208411068","volume":"6","author":"P Ekman","year":"1992","unstructured":"Ekman P: An argument for basic emotions. Cognit Emotion 1992, 6: 169-200.","journal-title":"Cognit Emotion"},{"key":"52_CR18","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1037\/h0054570","volume":"61","author":"H Schlosberg","year":"1954","unstructured":"Schlosberg H: Three dimensions of emotions. Psychol Rev 1954, 61: 81-88.","journal-title":"Psychol Rev"},{"key":"52_CR19","doi-asserted-by":"publisher","first-page":"1161","DOI":"10.1037\/h0077714","volume":"39","author":"JA Russell","year":"1980","unstructured":"Russell JA: A circumplex model of affect. J Personal Soc Psychol 1980, 39: 1161-1178.","journal-title":"J Personal Soc Psychol"},{"key":"52_CR20","doi-asserted-by":"publisher","first-page":"603","DOI":"10.1016\/S0167-6393(03)00099-2","volume":"41","author":"T Nwe","year":"2003","unstructured":"Nwe T, Foo S, De Silva L: Speech emotion recognition using hidden Markov models. Speech Commun 2003, 41: 603-623.","journal-title":"Speech Commun"},{"issue":"3","key":"52_CR21","doi-asserted-by":"publisher","first-page":"201","DOI":"10.1109\/89.905995","volume":"9","author":"G Zhou","year":"2001","unstructured":"Zhou G, Hansen JHL, Kaiser JF: Nonlinear feature based classification of speech under stress. IEEE Trans Speech Audio Process 2001, 9(3):201-216.","journal-title":"IEEE Trans Speech Audio Process"},{"key":"52_CR22","doi-asserted-by":"publisher","first-page":"366","DOI":"10.1109\/AFGR.1998.670976","volume-title":"Proc of the IEEE Automatic Face and Gesture Recognition, Japan","author":"L Chen","year":"1998","unstructured":"Chen L, Huang T, Miyasato T, Nakatsu R: Multimodal human emotion\/expression recognition. Proc of the IEEE Automatic Face and Gesture Recognition, Japan 1998, 366-371."},{"key":"52_CR23","first-page":"279","volume-title":"Proc of the International Conference on Pattern Recognition, Israel","author":"P Pudil","year":"1994","unstructured":"Pudil P, Ferri F, Novovicova J, Kittler J: Floating search method for feature selection with nonmonotonic criterion functions. Proc of the International Conference on Pattern Recognition, Israel 1994, 279-283."},{"key":"52_CR24","unstructured":"International Telecommunications Union Recommendation BS.1387-1, Method for objective measurements of perceived audio quality 2000."},{"key":"52_CR25","first-page":"3","volume":"48","author":"T Thiede","year":"2000","unstructured":"Thiede T, Treurniet WC, Bitto R, Schmidmer C, Sporer T, Beerends JG, Colomes C, Keyhl M, Stoll H, Brandenburg K: PEAQ--the ITU standard for objective measurement of perceived audio quality. J Audio Eng Soc 2000, 48: 3-29.","journal-title":"J Audio Eng Soc"},{"issue":"4","key":"52_CR26","doi-asserted-by":"publisher","first-page":"582","DOI":"10.1109\/TASL.2008.2009578","volume":"17","author":"C Busso","year":"2009","unstructured":"Busso C, Lee S, Narayanan S: Analysis of emotionally salient aspects of fundamental frequency for emotion detection. IEEE Trans Audio Speech Lang Process 2009, 17(4):582-596.","journal-title":"IEEE Trans Audio Speech Lang Process"},{"issue":"3","key":"52_CR27","doi-asserted-by":"publisher","first-page":"1642","DOI":"10.1121\/1.2832651","volume":"123","author":"PJ Murphy","year":"2008","unstructured":"Murphy PJ, McGuigan KG, Walsh M, Colreavy M: Investigation of a glottal related harmonics-to-noise ratio and spectral tilt as indicators of glottal noise in synthesized and human voice signals. Acoust Soc Am 2008, 123(3):1642-1652.","journal-title":"Acoust Soc Am"},{"key":"52_CR28","first-page":"27:1","volume":"2","author":"CC Chang","year":"2001","unstructured":"Chang CC, Lin CJ: LibSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2001, 2: 27:1-27:27.","journal-title":"ACM Trans Intell Syst Technol"},{"key":"52_CR29","volume-title":"Data Mining: Practical Machine Learning Tools and Techniques With Java Implementations","author":"IH Witten","year":"2000","unstructured":"Witten IH, Frank E: Data Mining: Practical Machine Learning Tools and Techniques With Java Implementations. Morgan Kaufman, San Francisco; 2000."},{"key":"52_CR30","doi-asserted-by":"publisher","first-page":"178","DOI":"10.1007\/978-3-540-24842-2_17","volume-title":"Proc of the Affective Dialogue Systems, Germany","author":"E Andr\u00e9","year":"2004","unstructured":"Andr\u00e9 E, Rehm M, Minker W, B\u00fchler D: Endowing spoken language dialogue systems with emotional intelligence. Proc of the Affective Dialogue Systems, Germany 2004, 178-187."}],"container-title":["EURASIP Journal on Audio, Speech, and Music Processing"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/1687-4722-2012-16.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/1687-4722-2012-16\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1687-4722-2012-16.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T19:17:29Z","timestamp":1630523849000},"score":1,"resource":{"primary":{"URL":"https:\/\/asmp-eurasipjournals.springeropen.com\/articles\/10.1186\/1687-4722-2012-16"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,5,4]]},"references-count":30,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2012,12]]}},"alternative-id":["52"],"URL":"https:\/\/doi.org\/10.1186\/1687-4722-2012-16","relation":{},"ISSN":["1687-4722"],"issn-type":[{"value":"1687-4722","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,5,4]]},"assertion":[{"value":"11 November 2011","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 May 2012","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 May 2012","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"16"}}