{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T05:24:06Z","timestamp":1771651446893,"version":"3.50.1"},"reference-count":48,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2023,7,4]],"date-time":"2023-07-04T00:00:00Z","timestamp":1688428800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,7,4]],"date-time":"2023-07-04T00:00:00Z","timestamp":1688428800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100019776","name":"Universidad Europea de Madrid","doi-asserted-by":"crossref","award":["2019\/UEM60"],"award-info":[{"award-number":["2019\/UEM60"]}],"id":[{"id":"10.13039\/100019776","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100003759","name":"Universidad Polit\u00e9cnica de Madrid","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100003759","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Multimed Tools Appl"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In this paper we present a new speech emotion dataset in Spanish. The database was created using an elicited approach and is composed of fifty non-actors expressing Ekman\u2019s six basic emotions of anger, disgust, fear, happiness, sadness, and surprise, plus a neutral tone. This article describes how the database was created, from the recording step to the crowdsourced perception test. The crowdsourcing made it possible to statistically validate the emotion of each collected audio sample and to filter out noisy samples. As a result, we obtained two datasets: EmoSpanishDB and EmoMatchSpanishDB. 
The first includes those recorded audios that reached consensus during the crowdsourcing process. The second selects from EmoSpanishDB only those audios whose perceived emotion also matches the originally elicited one. Last, we present a baseline comparative study of different state-of-the-art machine learning techniques in terms of accuracy, precision, and recall for both datasets. The results obtained for EmoMatchSpanishDB improve on those obtained for EmoSpanishDB and, therefore, we recommend following this methodology for the creation of emotional databases.<\/jats:p>","DOI":"10.1007\/s11042-023-15959-w","type":"journal-article","created":{"date-parts":[[2023,7,4]],"date-time":"2023-07-04T03:24:47Z","timestamp":1688441087000},"page":"13093-13112","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["EmoMatchSpanishDB: study of speech emotion recognition machine learning models in a new Spanish elicited database"],"prefix":"10.1007","volume":"83","author":[{"given":"Esteban","family":"Garcia-Cuesta","sequence":"first","affiliation":[]},{"given":"Antonio Barba","family":"Salvador","sequence":"additional","affiliation":[]},{"given":"Diego Gachet","family":"P\u00e1ez","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,7,4]]},"reference":[{"key":"15959_CR1","doi-asserted-by":"crossref","unstructured":"Amer MR, Siddiquie B, Richey C, Divakaran A (2014) Emotion recognition in speech using deep networks. In: ICASSP. Florence, Italy, pp 3752\u20133756","DOI":"10.1109\/ICASSP.2014.6854297"},{"issue":"5","key":"15959_CR2","doi-asserted-by":"publisher","first-page":"160855","DOI":"10.1098\/rsos.160855","volume":"4","author":"AS Attwood","year":"2017","unstructured":"Attwood AS, Easey KE, Dalili MN, Skinner AL, Woods A, Crick L, Ilett E, Penton-Voak IS, Munaf\u00f3 MR (2017) State anxiety and emotional face recognition in healthy volunteers. 
R Soc Open Sci 4(5):160855","journal-title":"R Soc Open Sci."},{"key":"15959_CR3","doi-asserted-by":"crossref","unstructured":"Burkhardt F, Paeschke A, Rolfes M, Sendlmeier W, Weiss B (2005) A database of German emotional speech. In: Proc. Interspeech, pp. 1517\u20131520","DOI":"10.21437\/Interspeech.2005-446"},{"key":"15959_CR4","doi-asserted-by":"publisher","first-page":"116","DOI":"10.5370\/KIEE.2016.65.1.116","volume":"65","author":"S Byun","year":"2016","unstructured":"Byun S, Lee S (2016) Emotion Recognition Using Tone and Tempo Based on Voice for IoT. Trans Korean Inst Electr Eng 65:116\u2013121","journal-title":"Trans Korean Inst Electr Eng"},{"issue":"27","key":"15959_CR5","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1109\/MIS.2012.110","volume":"27","author":"RA Calvo","year":"2012","unstructured":"Calvo RA, D\u2019Mello S (2012) Frontiers of Affect-Aware Learning Technologies. Intell. Syst. IEEE. 27(27):86\u201389","journal-title":"Intell. Syst. IEEE."},{"key":"15959_CR6","doi-asserted-by":"crossref","unstructured":"Cao H, Verma R, Nenkova A (2014) Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech. Comput Speech Lang.","DOI":"10.1016\/j.csl.2014.01.003"},{"issue":"2","key":"15959_CR7","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1037\/a0022572","volume":"11","author":"SR Cavanagh","year":"2011","unstructured":"Cavanagh SR, Urry HL, Shin LM (2011) Mood-induced shifts in attentional bias to emotional information predict ill- and well-being. Emotion 11(2):241\u2013248","journal-title":"Emotion"},{"key":"15959_CR8","doi-asserted-by":"crossref","unstructured":"Chang-Hong L, Liao WK, Hsieh WC, Liao WJ, Wang JC (2014) Emotion identification using extremely low frequency components of speech feature contours. Hindawi Publishing Corporation. Sci World J. 
Volume 2014","DOI":"10.1155\/2014\/757121"},{"key":"15959_CR9","first-page":"319","volume-title":"Approaches to Emotion","author":"P Ekman","year":"1984","unstructured":"Ekman P (1984) Expression and the nature of emotion. In: Scherer K, Ekman P (eds) Approaches to Emotion. Erlbaum, Hillsdale, NJ, pp 319\u2013344"},{"issue":"2","key":"15959_CR10","doi-asserted-by":"publisher","first-page":"190","DOI":"10.1109\/TAFFC.2015.2457417","volume":"7","author":"F Eyben","year":"2015","unstructured":"Eyben F, Scherer KR, Schuller BW, Sundberg J, Andr\u00e9 E, Busso C, Truong KP (2015) The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE Trans Affect Comput 7(2):190\u2013202","journal-title":"IEEE Trans Affect Comput"},{"key":"15959_CR11","unstructured":"Eyben F, W\u00f6llmer M, Schuller B (2010) openSMILE - The Munich Versatile and Fast Open-Source Audio Feature Extractor. Proc. ACM Multimedia (MM), ACM, Florence, Italy, pp. 1459\u20131462"},{"issue":"3","key":"15959_CR12","doi-asserted-by":"publisher","first-page":"414","DOI":"10.1016\/j.specom.2011.10.005","volume":"54","author":"I Grichkovtsova","year":"2012","unstructured":"Grichkovtsova I, Morel M, Lacheret A (2012) The role of voice quality and prosodic contour in affective speech perception. Speech Comm. 54(3):414\u2013429","journal-title":"Speech Comm."},{"key":"15959_CR13","unstructured":"Iriondo I, Guaus R, Rodriguez A, L\u00e1zaro P, Montoya N, Blanco JM, Bernadas D, Oliver JM, Tena D, Longhi L (2000) Validation of an acoustical modeling of emotional expression in Spanish using speech synthesis techniques. In ITRW on speech and emotion, Newcastle, Northern Ireland, UK, Sept. 
2000"},{"issue":"4","key":"15959_CR14","doi-asserted-by":"publisher","first-page":"363","DOI":"10.1177\/1754073910374661","volume":"2","author":"CE Izard","year":"2010","unstructured":"Izard CE (2010) The many meanings\/aspects of emotion: Emotion definitions, functions, activation, and regulation. Emot Rev 2(4):363\u2013370","journal-title":"Emot Rev"},{"key":"15959_CR15","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.wocn.2018.07.001","volume":"71","author":"Y Jadoul","year":"2018","unstructured":"Jadoul Y, Thompson B, de Boer B (2018) Introducing Parselmouth: A Python interface to Praat. J Phon 71:1\u201315. https:\/\/doi.org\/10.1016\/j.wocn.2018.07.001","journal-title":"J Phon"},{"key":"15959_CR16","unstructured":"Jiang D, Lu L, Zhang H, Tao J and Cai L. (2002). Music type classification by spectral contrast feature. In Multimedia and Expo, 2002. ICME\u201802. Proceedings. 2002 IEEE Int Conf. vol. 1, pp. 113\u2013116. IEEE, 2002"},{"key":"15959_CR17","doi-asserted-by":"publisher","first-page":"99","DOI":"10.1007\/s10772-011-9125-1","volume":"15","author":"SG Koolagudi","year":"2012","unstructured":"Koolagudi SG, Rao KS (2012) Emotion recognition from speech: a review. Int J Speech Technol 15:99\u2013117","journal-title":"Int J Speech Technol"},{"issue":"3","key":"15959_CR18","doi-asserted-by":"publisher","first-page":"1022","DOI":"10.1109\/TPAMI.2019.2944808","volume":"43","author":"J Kossaifi","year":"2021","unstructured":"Kossaifi J et al (2021) SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild. In IEEE Trans Patt Anal Mach Intell. 43(3):1022\u20131040. 
https:\/\/doi.org\/10.1109\/TPAMI.2019.2944808","journal-title":"IEEE Trans Patt Anal Mach Intell."},{"key":"15959_CR19","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1057\/s41599-020-0499-z","volume":"7","author":"A Lausen","year":"2020","unstructured":"Lausen A, Hammerschmidt K (2020) Emotion recognition and confidence ratings predicted by vocal stimulus type and prosodic parameters. Humanit Soc Sci Commun 7:2. https:\/\/doi.org\/10.1057\/s41599-020-0499-z","journal-title":"Humanit Soc Sci Commun"},{"issue":"23","key":"15959_CR20","doi-asserted-by":"publisher","first-page":"1195","DOI":"10.1049\/el.2009.1977","volume":"45","author":"N Madhu","year":"2009","unstructured":"Madhu N (2009) Note on measures for spectral flatness. Electron Lett 45(23):1195","journal-title":"Electron Lett"},{"key":"15959_CR21","doi-asserted-by":"crossref","unstructured":"Marchi E, Ringeval F, Schuller B (2014) Voice-enabled assistive robots for handling autism spectrum conditions: an examination of the role of prosody. Speech and Automata in Health Care (Speech Technology and Text Mining in Medicine and Healthcare). De Gruyter, Boston\/Berlin\/Munich. pp. 207-236","DOI":"10.1515\/9781614515159.207"},{"key":"15959_CR22","doi-asserted-by":"crossref","unstructured":"McFee B, Raffel C, Liang D, Ellis DPW, McVicar M, Battenberg E, Nieto O (2015) librosa: Audio and music signal analysis in python. In Proceedings of the 14th python in science conference, pp. 18\u201325","DOI":"10.25080\/Majora-7b98e3ed-003"},{"key":"15959_CR23","volume-title":"Silent Messages","author":"A Mehrabian","year":"1971","unstructured":"Mehrabian A (1971) Silent Messages. Wadsworth Publishing Co., Belmont, CA"},{"key":"15959_CR24","unstructured":"Montoro JM, Gutierrez-Arriola J, Colas J, Enriquez E, Pardo JM (1999) Analysis and modeling of emotional speech in Spanish. In Proc. int. conf. on phonetic sciences (pp. 
957-960)"},{"key":"15959_CR25","unstructured":"Muhammadi J, Rabiee HR, Hosseini A (2013) Crowd Labeling: a survey. arXiv: Artificial Intelligence"},{"issue":"2","key":"15959_CR26","doi-asserted-by":"publisher","first-page":"341","DOI":"10.1007\/s10579-019-09450-y","volume":"54","author":"E Parada-Cabaleiro","year":"2020","unstructured":"Parada-Cabaleiro E, Costantini G, Batliner A et al (2020) DEMoS: an Italian emotional speech corpus. Lang Res Eval 54(2):341\u2013383. https:\/\/doi.org\/10.1007\/s10579-019-09450-y","journal-title":"Lang Res Eval"},{"key":"15959_CR27","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/1140.001.0001","volume-title":"Affective Computing","author":"R Picard","year":"1997","unstructured":"Picard R (1997) Affective Computing. The MIT Press, Cambridge"},{"issue":"6","key":"15959_CR28","doi-asserted-by":"publisher","first-page":"1363","DOI":"10.1007\/s10796-017-9734-6","volume":"20","author":"M Poblet","year":"2018","unstructured":"Poblet M, Garcia-Cuesta E, Casanovas P (2018) Crowdsourcing roles, methods and tools for data-intensive disaster management. Inf Syst Front 20(6):1363\u20131379. https:\/\/doi.org\/10.1007\/s10796-017-9734-6","journal-title":"Inf Syst Front"},{"key":"15959_CR29","doi-asserted-by":"crossref","unstructured":"Polzehl T, Schmitt A, Metze F (2010) Approaching multi-lingual emotion recognition from speech - on language dependency of acoustic\/prosodic features for anger recognition. In Speech Prosody 2010 Conference, paper 442, Chicago, IL, USA, May 10\u201314","DOI":"10.21437\/SpeechProsody.2010-123"},{"issue":"2","key":"15959_CR30","doi-asserted-by":"publisher","first-page":"327","DOI":"10.1007\/s10772-019-09605-w","volume":"22","author":"SS Poorna","year":"2019","unstructured":"Poorna SS, Nair GJ (2019) Multistage classification scheme to enhance speech emotion recognition. Int J Speech Technol. 
22(2):327\u2013340","journal-title":"Int J Speech Technol."},{"issue":"4","key":"15959_CR31","doi-asserted-by":"publisher","first-page":"515","DOI":"10.1017\/S0140525X00076512","volume":"1","author":"D Premack","year":"1978","unstructured":"Premack D, Woodruff G (1978) Does the chimpanzee have a theory of mind? Behav Brain Sci Special Issue: Cognition and Consciousness in Nonhuman Species. 1(4):515\u2013526","journal-title":"Behav Brain Sci Special Issue: Cognition and Consciousness in Nonhuman Species."},{"issue":"12","key":"15959_CR32","doi-asserted-by":"publisher","first-page":"3096","DOI":"10.1016\/j.brat.2007.08.003","volume":"45","author":"S Quadflieg","year":"2007","unstructured":"Quadflieg S, Wendt B, Mohr A, Miltner WH, Straube T (2007) Recognition and evaluation of emotional prosody in individuals with generalized social phobia: A pilot study. Behav Res Ther. 45(12):3096\u20133103","journal-title":"Behav Res Ther."},{"key":"15959_CR33","unstructured":"Real Academia Espa\u00f1ola y Asociaci\u00f3n de Academias de la Lengua Espa\u00f1ola (2005) Diccionario panhisp\u00e1nico de dudas. Madrid: Santillana"},{"issue":"1","key":"15959_CR34","first-page":"e029","volume":"3","author":"IA Rodriguez","year":"2016","unstructured":"Rodriguez IA (2016) C\u00e1lculo de frecuencias de aparici\u00f3n de fonemas y al\u00f3fonos en espa\u00f1ol actual utilizando un transcriptor autom\u00e1tico. Loquens 3(1):e029","journal-title":"Loquens"},{"key":"15959_CR35","doi-asserted-by":"crossref","unstructured":"Rozgic V, Ananthakrishnan S, Saleem S, Kumar R, Vembu AN, Prasad R (2012) Emotion recognition using acoustic and lexical features. In: INTERSPEECH. Portland, USA","DOI":"10.21437\/Interspeech.2012-118"},{"key":"15959_CR36","doi-asserted-by":"crossref","unstructured":"Sailunaz K, Dhaliwal M, Rokne J, Alhajj R (2018) Emotion detection from text and speech: a survey. Soc Netw Anal Min. 
8(1)","DOI":"10.1007\/s13278-018-0505-2"},{"key":"15959_CR37","unstructured":"Scherer KR, Banziger T, Roesch E (2010) A Blueprint for Affective Computing: A source book and manual. Oxford University Press"},{"key":"15959_CR38","doi-asserted-by":"crossref","unstructured":"Schuller B, Steidl S, Batliner A, Hirschberg J, Burgoon JK, Baird A, Evanini K (2016) The interspeech 2016 computational paralinguistics challenge: Deception, sincerity and native language. In 17th Ann Conf Int Speech Comm Assoc (Interspeech 2016), Vols 1\u20135 (Vol. 8, pp. 2001\u20132005). ISCA.","DOI":"10.21437\/Interspeech.2016-129"},{"key":"15959_CR39","unstructured":"Schuller B, W\u00f6llmer M, Eyben F, Rigoll G (2009) Spectral or Voice Quality? Feature Type Relevance for the Discrimination of Emotion Pairs. The Role of Prosody in Affective Speech (S. Hancil, ed.), vol. 97 of Linguistic Insights, Studies in Language and Communication, pp. 285-307, Peter Lang Publishing Group"},{"key":"15959_CR40","doi-asserted-by":"crossref","unstructured":"Shen P, Changjun Z, Chen X (2011) Automatic Speech Emotion Recognition Using Support Vector Machine. Int Conf Electr Mech Eng Inf Technol","DOI":"10.1109\/EMEIT.2011.6023178"},{"key":"15959_CR41","unstructured":"Snow R, O'Connor B, Jurafsky D, Ng A (2008) Cheap and fast - but is it good? Evaluating non-expert annotations for natural language tasks. Proceedings of EMNLP-08"},{"issue":"6","key":"15959_CR42","doi-asserted-by":"publisher","first-page":"669","DOI":"10.1016\/j.cpr.2010.05.001","volume":"30","author":"SR Staugaard","year":"2010","unstructured":"Staugaard SR (2010) Threatening faces and social anxiety: A literature review. 
Clin Psychol Rev 30(6):669\u2013690","journal-title":"Clin Psychol Rev"},{"issue":"1","key":"15959_CR43","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1007\/s10772-018-9491-z","volume":"21","author":"M Swain","year":"2018","unstructured":"Swain M, Routray A, Kabisatpathy P (2018) Databases, features and classifiers for speech emotion recognition: a review. Int J Speech Technol 21(1):93\u2013120","journal-title":"Int J Speech Technol"},{"key":"15959_CR44","doi-asserted-by":"crossref","unstructured":"Tacconi D, Mayora O, Lukowicz P, Arnrich B, Setz C, Troster G, and Haring C (2008) Activity and emotion recognition to support early diagnosis of psychiatric diseases. In 2008 Second Int Conf Perv Comput Technol Healthcare, pp. 100-102","DOI":"10.1109\/PCTHEALTH.2008.4571041"},{"issue":"1","key":"15959_CR45","doi-asserted-by":"publisher","first-page":"59","DOI":"10.31887\/DCNS.2006.8.1\/ftremeau","volume":"8","author":"F Tr\u00e9meau","year":"2006","unstructured":"Tr\u00e9meau F (2006) A review of emotion deficits in schizophrenia. Dialogues Clin Neurosci 8(1):59\u201370","journal-title":"Dialogues Clin Neurosci"},{"issue":"4","key":"15959_CR46","doi-asserted-by":"publisher","first-page":"331","DOI":"10.1080\/13546805.2017.1330190","volume":"22","author":"HH Tseng","year":"2017","unstructured":"Tseng HH, Huang YL, Chen JT, Liang KY, Lin CC, Chen SH (2017) Facial and prosodic emotion recognition in social anxiety disorder. Cogn Neuropsychiatry. 22(4):331\u2013345","journal-title":"Cogn Neuropsychiatry."},{"key":"15959_CR47","unstructured":"Vaidyanathan PP (2008)\u00a0The Theory of Linear Prediction. Chapter 8. California Institute of Technology. 
Morgan and Claypool Publishers Series"},{"key":"15959_CR48","first-page":"1","volume-title":"\"On the Effects of Speaker Gender in Emotion Recognition Training Data,\" Speech Communication; 13th ITG-Symposium","author":"Z Xu","year":"2018","unstructured":"Xu Z, Meyer P, Fingscheidt T (2018) \u201cOn the Effects of Speaker Gender in Emotion Recognition Training Data,\u201d Speech Communication; 13th ITG-Symposium. Oldenburg, Germany, pp 1\u20135"}],"container-title":["Multimedia Tools and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11042-023-15959-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11042-023-15959-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11042-023-15959-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,23]],"date-time":"2024-10-23T15:52:55Z","timestamp":1729698775000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11042-023-15959-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,4]]},"references-count":48,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2024,2]]}},"alternative-id":["15959"],"URL":"https:\/\/doi.org\/10.1007\/s11042-023-15959-w","relation":{},"ISSN":["1573-7721"],"issn-type":[{"value":"1573-7721","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,7,4]]},"assertion":[{"value":"20 May 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 April 2023","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 May 
2023","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 July 2023","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Partial financial support was received from Universidad Europea de Madrid under the research project APRENDE-R (#2019\/UEM60).","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}