{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,6]],"date-time":"2026-06-06T19:51:41Z","timestamp":1780775501291,"version":"3.54.1"},"reference-count":38,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2022,3,16]],"date-time":"2022-03-16T00:00:00Z","timestamp":1647388800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100016047","name":"Science Fund of the Republic of Serbia","doi-asserted-by":"publisher","award":["grant #6524560, AI \u2013 S ADAPT"],"award-info":[{"award-number":["grant #6524560, AI \u2013 S ADAPT"]}],"id":[{"id":"10.13039\/501100016047","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100016047","name":"Science Fund of the Republic of Serbia","doi-asserted-by":"publisher","award":["grant #6527104, AI-Com-in-AI"],"award-info":[{"award-number":["grant #6527104, AI-Com-in-AI"]}],"id":[{"id":"10.13039\/501100016047","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Speaker recognition is an important classification task, which can be solved using several approaches. Although building a speaker recognition model on a closed set of speakers under neutral speaking conditions is a well-researched task and there are solutions that provide excellent performance, the classification accuracy of developed models significantly decreases when applying them to emotional speech or in the presence of interference. Furthermore, deep models may require a large number of parameters, so constrained solutions are desirable in order to implement them on edge devices in the Internet of Things systems for real-time detection. The aim of this paper is to propose a simple and constrained convolutional neural network for speaker recognition tasks and to examine its robustness for recognition in emotional speech conditions. We examine three quantization methods for developing a constrained network: floating-point eight format, ternary scalar quantization, and binary scalar quantization. The results are demonstrated on the recently recorded SEAC dataset.<\/jats:p>","DOI":"10.3390\/e24030414","type":"journal-article","created":{"date-parts":[[2022,3,16]],"date-time":"2022-03-16T22:09:58Z","timestamp":1647468598000},"page":"414","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":23,"title":["Speaker Recognition Using Constrained Convolutional Neural Networks in Emotional Speech"],"prefix":"10.3390","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0748-4672","authenticated-orcid":false,"given":"Nikola","family":"Simi\u0107","sequence":"first","affiliation":[{"name":"Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovica 6, 21000 Novi Sad, Serbia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sini\u0161a","family":"Suzi\u0107","sequence":"additional","affiliation":[{"name":"Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovica 6, 21000 Novi Sad, Serbia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tijana","family":"Nosek","sequence":"additional","affiliation":[{"name":"Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovica 6, 21000 Novi Sad, Serbia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mia","family":"Vujovi\u0107","sequence":"additional","affiliation":[{"name":"Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovica 6, 21000 Novi Sad, Serbia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8267-9541","authenticated-orcid":false,"given":"Zoran","family":"Peri\u0107","sequence":"additional","affiliation":[{"name":"Faculty of Electronic Engineering, University of Nis, Aleksandra Medvedeva 14, 18000 Nis, Serbia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Milan","family":"Savi\u0107","sequence":"additional","affiliation":[{"name":"Faculty of Sciences and Mathematics, University of Pristina in Kosovska Mitrovica, Ive Lole Ribara 29, 38220 Kosovska Mitrovica, Serbia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4558-9918","authenticated-orcid":false,"given":"Vlado","family":"Deli\u0107","sequence":"additional","affiliation":[{"name":"Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovica 6, 21000 Novi Sad, Serbia"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,16]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1016\/j.specom.2009.08.009","article-title":"An overview of text-independent speaker recognition: From features to supervectors","volume":"52","author":"Kinnunen","year":"2010","journal-title":"Speech Commun."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Reynolds, D.A. (2002, January 13\u201317). An overview of automatic speaker recognition technology. Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, USA.","DOI":"10.1109\/ICASSP.2002.5745552"},{"key":"ref_3","first-page":"4368036","article-title":"Speech technology progress based on new machine learning paradigm","volume":"2019","year":"2019","journal-title":"Comput. Intell. Neurosci."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1002\/j.1538-7305.1987.tb00198.x","article-title":"A vector quantization approach to speaker recognition","volume":"66","author":"Soong","year":"1987","journal-title":"AT T Tech. J."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1109\/TASSP.1981.1163530","article-title":"Cepstral analysis technique for automatic speaker verification","volume":"29","author":"Furui","year":"1981","journal-title":"IEEE Trans. Acoust. Speech Signal Processing"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"M\u00fcller, C. (2007). Classification Methods for Speaker Recognition. Speaker Classification I. Lecture Notes in Computer Science, Springer.","DOI":"10.1007\/978-3-540-74200-5"},{"key":"ref_7","first-page":"7","article-title":"Speaker recognition using support vector machine","volume":"87","author":"Nijhawan","year":"2014","journal-title":"Int. J. Comput. Appl."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"788","DOI":"10.1109\/TASL.2010.2064307","article-title":"Front-end factor analysis for speaker verification","volume":"19","author":"Dehak","year":"2011","journal-title":"IEEE Trans. Audio Speech Lang. Processing"},{"key":"ref_9","unstructured":"Kenny, P. (2005). Joint Factor Analysis of Speaker and Session Variability: Theory and Algorithms, CRIM. Tech. Rep. CRIM-06\/08-13."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Mandari\u0107, I., Vujovi\u0107, M., Suzi\u0107, S., Nosek, T., Simi\u0107, N., and Deli\u0107, V. (2021, January 23\u201324). Initial analysis of the impact of emotional speech on the performance of speaker recognition on new serbian emotional database. Proceedings of the 29th Telecommunications Forum (TELFOR), Belgrade, Serbia.","DOI":"10.1109\/TELFOR52709.2021.9653376"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"02217","DOI":"10.1088\/1742-6596\/1992\/2\/022177","article-title":"Using quantized neural network for speaker recognition on edge computing devices","volume":"1992","author":"Dai","year":"2021","journal-title":"J. Phys. Conf. Ser."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Kitamura, T. (2008, January 22\u201323). Acoustic analysis of imitated voice produced by a professional impersonator. Proceedings of the 9th Annual Conference of the International Speech Communication Association (Interspeech), Brisbane, Australia.","DOI":"10.21437\/Interspeech.2008-248"},{"key":"ref_13","unstructured":"Ghiurcau, M.V., Rusu, C., and Astola, J. (2011, January 26\u201328). Speaker recognition in an emotional environment. Proceedings of the Signal Processing and Applied Mathematics for Electronics and Communications (SPAMEC 2011), Cluj-Napoca, Romania."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wu, W., Zheng, F., Xu, M., and Bao, H. (2006, January 17\u201321). Study on speaker verification on emotional speech. Proceedings of the INTERSPEECH 2006\u2014ICSLP, Ninth International Conference on Spoken Language Processing, Pittsburgh, PA, USA.","DOI":"10.21437\/Interspeech.2006-191"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Snyder, D., Garcia-Romero, D., Sell, G., Povey, D., and Khudanpur, S. (2018, January 15\u201320). X-vectors: Robust DNN embeddings for speaker recognition. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018, Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8461375"},{"key":"ref_16","unstructured":"Sarma, B.D., and Das, R.K. (2020, January 7\u201310). Emotion invariant speaker embeddings for speaker identification with emotional speech. Proceedings of the 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Auckland, New Zealand."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Lukic, Y., Vogt, C., D\u00fcrr, O., and Stadelmann, T. (2016, January 13\u201316). Speaker identification and clustering using convolutional neural networks. Proceedings of the IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), Vietri sul Mare, Italy.","DOI":"10.1109\/MLSP.2016.7738816"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"McLaren, M., Lei, Y., Scheffer, N., and Ferrer, L. (2014, January 14\u201318). Application of convolutional neural networks to speaker recognition in noisy conditions. Proceedings of the INTERSPEECH 2014, the 15th Annual Conference of the International Speech Communication Association, Singapore.","DOI":"10.21437\/Interspeech.2014-172"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"107665","DOI":"10.1016\/j.apacoust.2020.107665","article-title":"Speaker identification based on Radon transform and CNNs in the presence of different types of interference for Robotic Applications","volume":"177","author":"Shafik","year":"2021","journal-title":"Appl. Acoust."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Anvarjon, T., Mustaqeem, and Kwon, S. (2020). Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features. Sensors, 20.","DOI":"10.3390\/s20185212"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Anvarjon, T., Mustaqeem, Choeh, J., and Kwon, S. (2021). Age and Gender Recognition Using a Convolutional Neural Network with a Specially Designed Multi-Attention Module through Speech Spectrograms. Sensors, 21.","DOI":"10.3390\/s21175892"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"5571","DOI":"10.1007\/s11042-017-5292-7","article-title":"Deep features-based speech emotion recognition for smart affective services","volume":"78","author":"Badshah","year":"2019","journal-title":"Multimed. Tools Appl."},{"key":"ref_23","unstructured":"Courbariaux, M., Hubara, I., Soudry, D., El-Yaniv, R., and Bengio, Y. (2016). Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or \u22121. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"41","DOI":"10.5755\/j02.eie.28881","article-title":"Binary Quantization Analysis of Neural Networks Weights on MNIST Dataset","volume":"27","author":"Peric","year":"2021","journal-title":"Elektronika ir Elektrotechnika"},{"key":"ref_25","unstructured":"Zhu, C., Han, S., Mao, H., and Dally, W. (2017). Trained Ternary Quantization. arXiv."},{"key":"ref_26","unstructured":"(2019). IEEE Standard for Floating-Point Arithmetic (Standard No. IEEE Std 754\u20132019 (Revision of IEEE 754\u20132008))."},{"key":"ref_27","unstructured":"Sun, X., Choi, J., Chen, C.-Y., Wang, N., Venkataramani, S., Cui, X., Zhang, W., and Gopalakrishnan, K. (2019, January 8\u201314). Hybrid 8-bit floating point (HFP8) training and inference for deep neural networks. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_28","unstructured":"Wang, N., Choi, J., Brand, B., Chen, C.-Y., and Gopalakrishnan, K. (2018, January 3\u20138). Training deep neural networks with 8-bit floating point numbers. Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Nikolic, J., Peric, Z., Aleksic, D., Tomic, S., and Jovanovic, A. (2021). Whether the support region of three-bit uniform quantizer has a strong impact on post-training quantization for MNIST Dataset?. Entropy, 21.","DOI":"10.3390\/e23121699"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Peric, Z., Savic, M., Simic, N., Denic, B., and Despotovic, V. (2021). Design of a 2-Bit Neural Network Quantizer for Laplacian Source. Entropy, 23.","DOI":"10.3390\/e23080933"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Peric, Z., Denic, B., Savic, M., and Despotovic, V. (2020). Design and analysis of binary scalar quantizer of laplacian source with applications. Information, 11.","DOI":"10.3390\/info11110501"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Peric, Z., Savic, M., Dincic, M., Vucic, N., Djosic, D., and Milosavljevic, S. (2021, January 25\u201327). Floating Point and Fixed Point 32-bits Quantizers for Quantization of Weights of Neural Networks. Proceedings of the 12th International Symposium on Advanced Topics in Electrical Engineering (ATEE), Bucharest, Romania.","DOI":"10.1109\/ATEE52255.2021.9425265"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Ye, F., and Yang, J. (2021). A deep neural network model for speaker identification. Appl. Sci., 11.","DOI":"10.3390\/app11083603"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Kwon, S. (2020). A CNN-Assisted Enhanced Audio Signal Processing for Speech Emotion Recognition. Sensors, 20.","DOI":"10.3390\/s20010183"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/97.736233","article-title":"A statistical model-based voice activity detection","volume":"6","author":"Sohn","year":"1999","journal-title":"IEEE Signal Processing Lett."},{"key":"ref_36","unstructured":"Kienast, M., and Sendlmeier, W.F. (2000, January 5\u20137). Acoustical analysis of spectral and temporal changes in emotional speech. Proceedings of the ITRW on Speech and Emotion, Newcastle upon Tyne, UK."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Hershey, S., Chaudhuri, S., Ellis, D.P.W., Gemmeke, J.F., Jansen, A., Moore, R.C., Plakal, M., Platt, D., Saurous, R.A., and Seybold, B. (2017, January 5\u20139). CNN architectures for large-scale audio classification. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7952132"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"139438","DOI":"10.1109\/ACCESS.2019.2943492","article-title":"Lung Sound Recognition Algorithm Based on VGGish-BiGRU","volume":"7","author":"Shi","year":"2019","journal-title":"IEEE Access"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/24\/3\/414\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:37:26Z","timestamp":1760135846000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/24\/3\/414"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,16]]},"references-count":38,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2022,3]]}},"alternative-id":["e24030414"],"URL":"https:\/\/doi.org\/10.3390\/e24030414","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,16]]}}}