{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,19]],"date-time":"2025-12-19T09:55:12Z","timestamp":1766138112986,"version":"build-2065373602"},"reference-count":45,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2022,9,27]],"date-time":"2022-09-27T00:00:00Z","timestamp":1664236800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"University of Sharjah"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Speech signals carry various bits of information relevant to the speaker such as age, gender, accent, language, health, and emotions. Emotions are conveyed through modulations of facial and vocal expressions. This paper conducts an empirical comparison of performances between the classical classifiers: Gaussian Mixture Model (GMM), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Artificial neural networks (ANN); and the deep learning classifiers, i.e., Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Gated Recurrent Unit (GRU) in addition to the ivector approach for a text-independent speaker verification task in neutral and emotional talking environments. The deep models undergo hyperparameter tuning using the Grid Search optimization algorithm. The models are trained and tested using a private Arabic Emirati Speech Database, Ryerson Audio\u2013Visual Database of Emotional Speech and Song dataset (RAVDESS) database, and a public Crowd-Sourced Emotional Multimodal Actors (CREMA) database. Experimental results illustrate that deep architectures do not necessarily outperform classical classifiers. In fact, evaluation was carried out through Equal Error Rate (EER) along with Area Under the Curve (AUC) scores. 
The findings reveal that, among the classical classifiers, the GMM model yields the lowest EER values and the best AUC scores across all datasets. In addition, the ivector model surpasses all the fine-tuned deep models (CNN, LSTM, and GRU) on both evaluation metrics in neutral as well as emotional speech. Moreover, the GMM outperforms the ivector on the Emirati and RAVDESS databases.<\/jats:p>","DOI":"10.3390\/info13100456","type":"journal-article","created":{"date-parts":[[2022,9,28]],"date-time":"2022-09-28T01:51:49Z","timestamp":1664329909000},"page":"456","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Empirical Comparison between Deep and Classical Classifiers for Speaker Verification in Emotional Talking Environments"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1570-0897","authenticated-orcid":false,"given":"Ali Bou","family":"Nassif","sequence":"first","affiliation":[{"name":"Computer Engineering Department, University of Sharjah, Sharjah 27272, United Arab Emirates"}]},{"given":"Ismail","family":"Shahin","sequence":"additional","affiliation":[{"name":"Electrical Engineering Department, University of Sharjah, Sharjah 27272, United Arab Emirates"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4059-3176","authenticated-orcid":false,"given":"Mohammed","family":"Lataifeh","sequence":"additional","affiliation":[{"name":"Computer Science Department, University of Sharjah, Sharjah 27272, United Arab Emirates"}]},{"given":"Ashraf","family":"Elnagar","sequence":"additional","affiliation":[{"name":"Computer Science Department, University of Sharjah, Sharjah 27272, United Arab Emirates"}]},{"given":"Nawel","family":"Nemmour","sequence":"additional","affiliation":[{"name":"Computer Engineering Department, University of Sharjah, Sharjah 27272, United Arab 
Emirates"}]}],"member":"1968","published-online":{"date-parts":[[2022,9,27]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"19143","DOI":"10.1109\/ACCESS.2019.2896880","article-title":"Speech Recognition Using Deep Neural Networks: A Systematic Review","volume":"7","author":"Nassif","year":"2019","journal-title":"IEEE Access"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Reynolds, D.A. (2002, January 13\u201317). An Overview of Automatic Speaker Recognition Technology. Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, USA.","DOI":"10.1109\/ICASSP.2002.5745552"},{"key":"ref_3","unstructured":"Salehghaffari, H. (2018). Speaker Verification using Convolutional Neural Networks. arXiv."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Baldominos, A., Cervantes, A., Saez, Y., and Isasi, P. (2019). A Comparison of Machine Learning and Deep Learning Techniques for Activity Recognition using Mobile Devices. Sensors, 19.","DOI":"10.3390\/s19030521"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"7331","DOI":"10.1109\/TCOMM.2019.2924010","article-title":"Wireless Networks Design in the Era of Deep Learning: Model-Based, AI-Based, or Both?","volume":"67","author":"Zappone","year":"2019","journal-title":"IEEE Trans. Commun."},{"key":"ref_6","unstructured":"Wan, V., and Campbell, W.M. (2000, January 11\u201313). Support vector machines for speaker verification and identification. Proceedings of the Neural Networks for Signal Processing X. In Proceedings of the 2000 IEEE Signal Processing Society Workshop (Cat. No.00TH8501), Sydney, NSW, Australia."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Vivaracho-Pascual, C., Ortega-Garcia, J., Alonso, L., and Moro-Sancho, Q.I. (2001, January 3\u20137). A comparative study of MLP-based artificial neural networks in text-independent speaker verification against GMM-based systems. 
Proceedings of the Eurospeech, Aalborg, Denmark.","DOI":"10.21437\/Eurospeech.2001-410"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1109\/LSP.2006.870086","article-title":"Support vector machines using GMM supervectors for speaker verification","volume":"13","author":"Campbell","year":"2006","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_9","unstructured":"Chen, S.-H., and Luo, Y. (2009, January 18\u201320). Speaker Verification Using MFCC and Support Vector Machine. Proceedings of the International MultiConference of Engineers and Computer Scientists, Hong Kong, China."},{"key":"ref_10","first-page":"1073","article-title":"Arabic text-dependent speaker verification for mobile devices using artificial neural networks","volume":"7","author":"Alarifi","year":"2012","journal-title":"Int. J. Phys. Sci."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"3799","DOI":"10.1007\/s13369-014-1048-0","article-title":"Automatic Speaker Recognition Using Multi-Directional Local Features (MDLF)","volume":"39","author":"Mahmood","year":"2014","journal-title":"Arab. J. Sci. Eng."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Taylor, S., Hanani, A., Basha, H., and Sharaf, Y. (2015, January 14\u201317). Palestinian Arabic regional accent recognition. Proceedings of the 2015 International Conference on Speech Technology and Human-Computer Dialogue (SpeD), Bucharest, Romania.","DOI":"10.1109\/SPED.2015.7343088"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Chauhan, N., and Chandra, M. (2017, January 22\u201324). Speaker recognition and verification using artificial neural network. Proceedings of the 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), Chennai, India.","DOI":"10.1109\/WiSPNET.2017.8299943"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wu, W., Zheng, T.F., Xu, M.-X., and Bao, H.-J. (2006, January 17\u201321). 
Study on Speaker Verification on Emotional Speech. Proceedings of the INTERSPEECH, Pittsburgh, PA, USA.","DOI":"10.21437\/Interspeech.2006-191"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1049\/iet-spr.2008.0175","article-title":"Speaker verification under mismatched data conditions","volume":"3","author":"Pillay","year":"2009","journal-title":"Signal Process. IET"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"915","DOI":"10.1007\/s10772-018-9543-4","article-title":"Three-stage speaker verification architecture in emotional talking environments","volume":"21","author":"Shahin","year":"2018","journal-title":"Int. J. Speech Technol."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1007\/s10772-021-09876-2","article-title":"Automatic speaker verification systems and spoof detection techniques: Review and analysis","volume":"25","author":"Mittal","year":"2022","journal-title":"Int. J. Speech Technol."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"101258","DOI":"10.1016\/j.csl.2021.101258","article-title":"A speaker verification backend with robust performance across conditions","volume":"71","author":"Ferrer","year":"2022","journal-title":"Comput. Speech Lang."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"782","DOI":"10.1109\/LSP.2022.3143036","article-title":"Neural Acoustic-Phonetic Approach for Speaker Verification with Phonetic Attention Mask","volume":"29","author":"Liu","year":"2022","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Bhattacharya, G., Alam, J., and Kenny, P. (2017, January 20\u201324). Deep speaker embeddings for short-duration speaker verification. 
Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech, Stockholm, Sweden.","DOI":"10.21437\/Interspeech.2017-1575"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1006\/dspr.1999.0361","article-title":"Speaker verification using adapted Gaussian mixture models","volume":"10","author":"Reynolds","year":"2000","journal-title":"Digit. Signal Process. A Rev. J."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"788","DOI":"10.1109\/TASL.2010.2064307","article-title":"Front-end factor analysis for speaker verification","volume":"19","author":"Dehak","year":"2011","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"980","DOI":"10.1109\/TASL.2008.925147","article-title":"A Study of Inter-Speaker Variability in Speaker Verification","volume":"16","author":"Kenny","year":"2008","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Garcia-Romero, D., and Espy-Wilson, C. (2011, January 28\u201331). Analysis of i-vector Length Normalization in Speaker Recognition Systems. Proceedings of the Interspeech, Florence, Italy.","DOI":"10.21437\/Interspeech.2011-53"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"3811","DOI":"10.1007\/s00034-022-01957-0","article-title":"Generative and Discriminative Modelling of Linear Energy Sub-bands for Spoof Detection in Speaker Verification Systems","volume":"41","author":"Bharathi","year":"2022","journal-title":"Circuits Syst. Signal Process."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Alam, M.J., Kinnunen, T., Kenny, P., Ouellet, P., and O\u2019Shaughnessy, D. (2011, January 11\u201315). Multi-taper MFCC Features for Speaker Verification using I-vectors. 
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, Waikoloa, HI, USA.","DOI":"10.1109\/ASRU.2011.6163886"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Chen, L., Lee, K.A., Chng, E., Ma, B., Li, H., and Dai, L.-R. (2016, January 20\u201325). Content-aware local variability vector for speaker verification with short utterance. Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China.","DOI":"10.1109\/ICASSP.2016.7472726"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhu, Y., Ko, T., Snyder, D., Mak, B., and Povey, D. (2018, January 2\u20136). Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification. Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech, Hyderabad, India.","DOI":"10.21437\/Interspeech.2018-1158"},{"key":"ref_29","unstructured":"Mobiny, A., and Najarian, M. (2018). Text-Independent Speaker Verification Using Long Short-Term Memory Networks. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1007\/s10772-021-09795-2","article-title":"Convolutional neural network vectors for speaker recognition","volume":"24","author":"Hourri","year":"2021","journal-title":"Int. J. Speech Technol."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"16033","DOI":"10.1007\/s00521-021-06226-w","article-title":"Novel hybrid DNN approaches for speaker verification in emotional and stressful talking environments","volume":"33","author":"Shahin","year":"2021","journal-title":"Neural Comput. Appl."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"3256","DOI":"10.24996\/ijs.2021.62.9.38","article-title":"Analysis of Methods and Techniques Used for Speaker Identification, Recognition, and Verification: A Study on Quarter-Century Research Outcomes","volume":"62","author":"Mohammed","year":"2021","journal-title":"Iraqi J. 
Sci."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Chen, Y.H., Lopez-Moreno, I., Sainath, T.N., Visontai, M., Alvarez, R., and Parada, C. (2015, January 6\u201310). Locally-connected and convolutional neural networks for small footprint speaker recognition. Proceedings of the Interspeech, Dresden, Germany.","DOI":"10.21437\/Interspeech.2015-297"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Variani, E., Lei, X., McDermott, E., Moreno, I.L., and Gonzalez-Dominguez, J. (2014, January 4\u20139). Deep Neural Networks for Small Footprint Text-Dependent Speaker Verification. Proceedings of the 2014 in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy.","DOI":"10.1109\/ICASSP.2014.6854363"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Heigold, G., Moreno, I., Bengio, S., and Shazeer, N. (2016, January 20\u201325). End-to-end text-dependent speaker verification. Proceedings of the ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, China.","DOI":"10.1109\/ICASSP.2016.7472652"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1109\/TAFFC.2014.2336244","article-title":"CREMA-D: Crowd-sourced emotional multimodal actors dataset","volume":"5","author":"Cao","year":"2014","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Livingstone, S.R., and Russo, F.A. (2018). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE, 13.","DOI":"10.1371\/journal.pone.0196391"},{"key":"ref_38","unstructured":"Kumar, D.S.P. (2015). Feature Normalisation for Robust Speech Recognition. arXiv."},{"key":"ref_39","unstructured":"Li, L., Wang, D., Zhang, Z., and Zheng, T.F. (2015). Deep Speaker Vectors for Semi Text-independent Speaker Verification. 
arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"McFee, B., Raffel, C., Liang, D., Ellis, D.P., McVicar, M., Battenberg, E., and Nieto, O. (2015, January 6\u201312). Librosa: Audio and music signal analysis in python. Proceedings of the 14th Python in Science Conference, Austin, TX, USA.","DOI":"10.25080\/Majora-7b98e3ed-003"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1109\/89.365379","article-title":"Robust text-independent speaker identification using Gaussian mixture speaker models","volume":"3","author":"Reynolds","year":"1995","journal-title":"IEEE Trans. Speech Audio Process."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"436","DOI":"10.2991\/ijcis.2018.125905686","article-title":"AEkNN: An AutoEncoder kNN-Based Classifier With Built-in Dimensionality Reduction","volume":"12","author":"Pulgar","year":"2018","journal-title":"Int. J. Comput. Intell. Syst."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1016\/j.mcm.2008.05.010","article-title":"Artificial neural network modeling techniques applied to the hydrodesulfurization process","volume":"49","year":"2009","journal-title":"Math. Comput. Model."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Saez, Y., Baldominos, A., and Isasi, P. (2016). A Comparison Study of Classifier Algorithms for Cross-Person Physical Activity Recognition. Sensors, 17.","DOI":"10.3390\/s17010066"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Shahin, I. (2016, January 6\u201310). Emirati speaker verification based on HMMls, HMM2s, and HMM3s. 
Proceedings of the 2016 IEEE 13th International Conference on Signal Processing (ICSP), Chengdu, China.","DOI":"10.1109\/ICSP.2016.7877896"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/13\/10\/456\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:40:18Z","timestamp":1760143218000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/13\/10\/456"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,27]]},"references-count":45,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2022,10]]}},"alternative-id":["info13100456"],"URL":"https:\/\/doi.org\/10.3390\/info13100456","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2022,9,27]]}}}