{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T16:12:47Z","timestamp":1773850367736,"version":"3.50.1"},"reference-count":44,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2019,12,13]],"date-time":"2019-12-13T00:00:00Z","timestamp":1576195200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>Because one of the key issues in improving the performance of Speech Emotion Recognition (SER) systems is the choice of an effective feature representation, most of the research has focused on developing a feature level fusion using a large set of features. In our study, we propose a relatively low-dimensional feature set that combines three features: baseline Mel Frequency Cepstral Coefficients (MFCCs), MFCCs derived from Discrete Wavelet Transform (DWT) sub-band coefficients that are denoted as DMFCC, and pitch based features. Moreover, the performance of the proposed feature extraction method is evaluated in clean conditions and in the presence of several real-world noises. Furthermore, conventional Machine Learning (ML) and Deep Learning (DL) classifiers are employed for comparison. The proposal is tested using speech utterances of both of the Berlin German Emotional Database (EMO-DB) and Interactive Emotional Dyadic Motion Capture (IEMOCAP) speech databases through speaker independent experiments. 
Experimental results show improvement in speech emotion detection over baselines.<\/jats:p>","DOI":"10.3390\/computers8040091","type":"journal-article","created":{"date-parts":[[2019,12,13]],"date-time":"2019-12-13T11:27:22Z","timestamp":1576236442000},"page":"91","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":22,"title":["An Investigation of a Feature-Level Fusion for Noisy Speech Emotion Recognition"],"prefix":"10.3390","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9413-3829","authenticated-orcid":false,"given":"Sara","family":"Sekkate","sequence":"first","affiliation":[{"name":"Team Networks, Telecoms & Multimedia, University of Hassan II Casablanca, Casablanca 20000, Morocco"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohammed","family":"Khalil","sequence":"additional","affiliation":[{"name":"Team Networks, Telecoms & Multimedia, University of Hassan II Casablanca, Casablanca 20000, Morocco"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Abdellah","family":"Adib","sequence":"additional","affiliation":[{"name":"Team Networks, Telecoms & Multimedia, University of Hassan II Casablanca, Casablanca 20000, Morocco"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sofia","family":"Ben Jebara","sequence":"additional","affiliation":[{"name":"COSIM Lab, Higher School of Communications of Tunis, Carthage University, Ariana 2083, Tunisia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,12,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"15400","DOI":"10.1109\/ACCESS.2017.2728801","article-title":"Enhanced Forensic Speaker Verification Using a Combination of DWT and MFCC Feature Warping in the Presence of Noise and Reverberation Conditions","volume":"5","author":"Dean","year":"2017","journal-title":"IEEE 
Access"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Al-Ali, A.K.H., Senadji, B., and Naik, G.R. (2017, January 12\u201314). Enhanced forensic speaker verification using multi-run ICA in the presence of environmental noise and reverberation conditions. Proceedings of the 2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), Kuching, Malaysia.","DOI":"10.1109\/ICSIPA.2017.8120601"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.dsp.2018.11.005","article-title":"Ensemble of jointly trained deep neural network based acoustic models for reverberant speech recognition","volume":"85","author":"Lee","year":"2019","journal-title":"Digit. Signal Process."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"3743","DOI":"10.1007\/s00034-019-01026-z","article-title":"Speaker Identification for OFDM-Based Aeronautical Communication System","volume":"38","author":"Sekkate","year":"2019","journal-title":"Circuits Syst. Signal Process."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"504","DOI":"10.3390\/make1010031","article-title":"A Near Real-Time Automatic Speaker Recognition Architecture for Voice-Based User Interface","volume":"1","author":"Dhakal","year":"2019","journal-title":"Mach. Learn. Knowl. Extr."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"2810","DOI":"10.1007\/s00034-018-0992-4","article-title":"Text-Independent Speaker Recognition in Clean and Noisy Backgrounds Using Modified VQ-LBG Algorithm","volume":"38","author":"Mallikarjunan","year":"2019","journal-title":"Circuits Syst. Signal Process."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S1005-8885(17)60193-6","article-title":"Noisy speech emotion recognition using sample reconstruction and multiple-kernel learning","volume":"24","author":"Xiaoqing","year":"2017","journal-title":"J. China Univ. 
Posts Telecommun."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Esposito, A., and V\u00edch, R. (2009). Polish Emotional Speech Database \u2013 Recording and Preliminary Validation. Cross-Modal Analysis of Speech, Gestures, Gaze and Facial Expressions, Springer.","DOI":"10.1007\/978-3-642-03320-9"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Tawari, A., and Trivedi, M.M. (2010, January 23\u201326). Speech Emotion Analysis in Noisy Real-World Environment. Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey.","DOI":"10.1109\/ICPR.2010.1132"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"457","DOI":"10.2478\/aoa-2013-0054","article-title":"Speech Emotion Recognition under White Noise","volume":"38","author":"Huang","year":"2013","journal-title":"Arch. Acoust."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Hyun, K., Kim, E., and Kwak, Y. (2006, January 18\u201321). Robust Speech Emotion Recognition Using Log Frequency Power Ratio. Proceedings of the 2006 SICE-ICASE International Joint Conference, Busan, Korea.","DOI":"10.1109\/SICE.2006.314794"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Yeh, L.Y., and Chi, T.S. (2010, January 26\u201330). Spectro-temporal modulations for robust speech emotion recognition. Proceedings of the INTERSPEECH 2010, Makuhari, Japan.","DOI":"10.21437\/Interspeech.2010-286"},{"key":"ref_13","unstructured":"Georgogiannis, A., and Digalakis, V. (2012, January 27\u201331). Speech Emotion Recognition using non-linear Teager energy based features in noisy environments. Proceedings of the 2012 20th European Signal Processing Conference (EUSIPCO), Bucharest, Romania."},{"key":"ref_14","first-page":"197","article-title":"Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions","volume":"12","author":"Bashirpour","year":"2016","journal-title":"Iran. J. Electr. Electron. 
Eng."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Schuller, B., Arsic, D., Wallhoff, F., and Rigoll, G. (2006, January 2\u20135). Emotion Recognition in the Noise Applying Large Acoustic Feature Sets. Proceedings of the Speech Prosody, Dresden, Germany.","DOI":"10.21437\/SpeechProsody.2006-150"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Rozgic, V., Ananthakrishnan, S., Saleem, S., Kumar, R., Vembu, A., and Prasad, R. (2012, January 9\u201313). Emotion Recognition using Acoustic and Lexical Features. Proceedings of the INTERSPEECH 2012, Portland, OR, USA.","DOI":"10.21437\/Interspeech.2012-118"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1007\/s10772-012-9176-y","article-title":"Robust emotional speech classification in the presence of babble noise","volume":"16","author":"Karimi","year":"2013","journal-title":"Int. J. Speech Technol."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Jin, Y., Song, P., Zheng, W., and Zhao, L. (2014, January 4\u20139). A feature selection and feature fusion combination method for speaker-independent speech emotion recognition. Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy.","DOI":"10.1109\/ICASSP.2014.6854515"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Huang, Y., Tian, K., Wu, A., and Zhang, G. (2017). Feature fusion methods research based on deep belief networks for speech emotion recognition under noise condition. J. Ambient. Intell. Humaniz. Comput.","DOI":"10.1007\/s12652-017-0644-8"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1799","DOI":"10.1016\/j.asej.2016.11.001","article-title":"Wavelet based feature combination for recognition of emotions","volume":"9","author":"Palo","year":"2018","journal-title":"Ain Shams Eng. 
J."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Kerkeni, L., Serrestou, Y., Raoof, K., Mbarki, M., Mahjoub, M.A., and Cleder, C. (2019). Automatic Speech Emotion Recognition using an Optimal Combination of Features based on EMD-TKEO. Speech Commun.","DOI":"10.5772\/intechopen.84856"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1535","DOI":"10.1016\/j.patrec.2009.12.036","article-title":"A learning approach to hierarchical feature selection and aggregation for audio classification","volume":"31","author":"Ruvolo","year":"2010","journal-title":"Pattern Recognit. Lett."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1415","DOI":"10.1016\/j.sigpro.2009.09.009","article-title":"Emotion recognition from speech signals using new harmony features","volume":"90","author":"Yang","year":"2010","journal-title":"Signal Process."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Seehapoch, T., and Wongthanavasu, S. (February, January 31). Speech emotion recognition using Support Vector Machines. Proceedings of the 2013 5th International Conference on Knowledge and Smart Technology (KST), Chonburi, Thailand.","DOI":"10.1109\/KST.2013.6512793"},{"key":"ref_25","unstructured":"Bhargava, M., and Polzehl, T. (2013). Improving Automatic Emotion Recognition from speech using Rhythm and Temporal feature. arXiv."},{"key":"ref_26","unstructured":"Klein, W.B., and Palival, K.K. (1995). A robust algorithm for pitch tracking (RAPT). Speech Coding and Synthesis, Elsevier."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Kasi, K., and Zahorian, S.A. (2002, January 13\u201317). Yet Another Algorithm for Pitch Tracking. 
Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, USA.","DOI":"10.1109\/ICASSP.2002.1005751"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1109\/TASSP.1980.1163420","article-title":"Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences","volume":"28","author":"Davis","year":"1980","journal-title":"IEEE Trans. Acoust. Speech Signal Process."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1215","DOI":"10.1109\/5.237532","article-title":"Signal modeling techniques in speech recognition","volume":"81","author":"Picone","year":"1993","journal-title":"Proc. IEEE"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1016\/j.asoc.2018.10.022","article-title":"A comparative analysis of speech signal processing algorithms for Parkinson\u2019s disease classification and the use of the tunable Q-factor wavelet transform","volume":"74","author":"Sakar","year":"2019","journal-title":"Appl. Soft Comput."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Mallat, S. (1998). A Wavelet Tour of Signal Processing, Academic Press. [2nd ed.].","DOI":"10.1016\/B978-012466606-1\/50008-8"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1109\/TSA.2004.838534","article-title":"Toward detecting emotions in spoken dialogs","volume":"13","author":"Lee","year":"2005","journal-title":"IEEE Trans. Speech Audio Process."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1111\/j.1469-1809.1936.tb02137.x","article-title":"The use of multiple measurements in taxonomic problems","volume":"7","author":"Fisher","year":"1936","journal-title":"Ann. Eugen."},{"key":"ref_34","unstructured":"Duda, R., and Hart, P. (1973). 
Pattern Classifications and Scene Analysis, John Wiley & Sons."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support-Vector Networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1162","DOI":"10.1016\/j.specom.2006.04.003","article-title":"Emotional speech recognition: Resources, features, and methods","volume":"48","author":"Ververidis","year":"2006","journal-title":"Speech Commun."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1007\/s10579-008-9076-6","article-title":"IEMOCAP: Interactive emotional dyadic motion capture database","volume":"42","author":"Busso","year":"2008","journal-title":"Lang. Resour. Eval."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W.F., and Weiss, B. (2005, January 4\u20138). A database of German emotional speech. Proceedings of the INTERSPEECH ISCA, Lisbon, Portugal.","DOI":"10.21437\/Interspeech.2005-446"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1109\/TPAMI.2008.52","article-title":"A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions","volume":"31","author":"Zeng","year":"2009","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Pearce, D., and Hirsch, H.G. (2000, January 18\u201320). The Aurora Experimental Framework for the Performance Evaluation of Speech Recognition Systems under Noisy Conditions. Proceedings of the ISCA ITRW ASR2000, Paris, France.","DOI":"10.21437\/ICSLP.2000-743"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Tang, D., Zeng, J., and Li, M. (2018, January 2\u20136). An End-to-End Deep Learning Framework for Speech Emotion Recognition of Atypical Individuals. 
Proceedings of the INTERSPEECH 2018, Hyderabad, India.","DOI":"10.21437\/Interspeech.2018-2581"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"312","DOI":"10.1016\/j.bspc.2018.08.035","article-title":"Speech emotion recognition using deep 1D & 2D CNN LSTM networks","volume":"47","author":"Zhao","year":"2019","journal-title":"Biomed. Signal Process. Control"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1016\/j.inffus.2018.09.008","article-title":"Emotion recognition using deep learning approach from audio\u2013visual emotional big data","volume":"49","author":"Hossain","year":"2019","journal-title":"Inf. Fusion"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Sarma, M., Ghahremani, P., Povey, D., Goel, N.K., Sarma, K.K., and Dehak, N. (2018, January 2\u20136). Emotion Identification from Raw Speech Signals Using DNNs. Proceedings of the INTERSPEECH 2018, Hyderabad, India.","DOI":"10.21437\/Interspeech.2018-1353"}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/8\/4\/91\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:42:16Z","timestamp":1760190136000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/8\/4\/91"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,12,13]]},"references-count":44,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2019,12]]}},"alternative-id":["computers8040091"],"URL":"https:\/\/doi.org\/10.3390\/computers8040091","relation":{},"ISSN":["2073-431X"],"issn-type":[{"value":"2073-431X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,12,13]]}}}