{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T17:37:59Z","timestamp":1776101879564,"version":"3.50.1"},"reference-count":58,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2019,3,19]],"date-time":"2019-03-19T00:00:00Z","timestamp":1552953600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100016644","name":"Ohio Federal Research Network","doi-asserted-by":"publisher","award":["OFRN"],"award-info":[{"award-number":["OFRN"]}],"id":[{"id":"10.13039\/100016644","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>In this paper, we present a novel pipelined near real-time speaker recognition architecture that enhances the performance of speaker recognition by exploiting the advantages of hybrid feature extraction techniques that contain the features of Gabor Filter (GF), Convolution Neural Networks (CNN), and statistical parameters as a single matrix set. This architecture has been developed to enable secure access to a voice-based user interface (UI) by enabling speaker-based authentication and integration with an existing Natural Language Processing (NLP) system. Gaining secure access to existing NLP systems also served as motivation. Initially, we identify challenges related to real-time speaker recognition and highlight the recent research in the field. Further, we analyze the functional requirements of a speaker recognition system and introduce the mechanisms that can address these requirements through our novel architecture. Subsequently, the paper discusses the effect of different techniques such as CNN, GF, and statistical parameters in feature extraction. For the classification, standard classifiers such as Support Vector Machine (SVM), Random Forest (RF) and Deep Neural Network (DNN) are investigated. To verify the validity and effectiveness of the proposed architecture, we compared different parameters including accuracy, sensitivity, and specificity with the standard AlexNet architecture.<\/jats:p>","DOI":"10.3390\/make1010031","type":"journal-article","created":{"date-parts":[[2019,3,19]],"date-time":"2019-03-19T12:12:25Z","timestamp":1552997545000},"page":"504-520","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":69,"title":["A Near Real-Time Automatic Speaker Recognition Architecture for Voice-Based User Interface"],"prefix":"10.3390","volume":"1","author":[{"given":"Parashar","family":"Dhakal","sequence":"first","affiliation":[{"name":"Electrical Engineering and Computer Science Department, the University of Toledo, Toledo, OH 43606, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8058-7072","authenticated-orcid":false,"given":"Praveen","family":"Damacharla","sequence":"additional","affiliation":[{"name":"ECE Department, Purdue University Northwest, Hammond, IN 46323, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4719-4941","authenticated-orcid":false,"given":"Ahmad Y.","family":"Javaid","sequence":"additional","affiliation":[{"name":"Electrical Engineering and Computer Science Department, the University of Toledo, Toledo, OH 43606, USA"}]},{"given":"Vijay","family":"Devabhaktuni","sequence":"additional","affiliation":[{"name":"ECE Department, Purdue University Northwest, Hammond, IN 46323, USA"}]}],"member":"1968","published-online":{"date-parts":[[2019,3,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"4","DOI":"10.17485\/ijst\/2016\/v9i4\/83894","article-title":"A voice identification system using hidden Markov model","volume":"9","author":"Das","year":"2016","journal-title":"Indian J. Sci. Technol."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Makary, M.A., and Daniel, M. (2016). Medical error\u2014The third leading cause of death in the US. BMJ, 353.","DOI":"10.1136\/bmj.i2139"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Damacharla, P., Dhakal, P., Stumbo, S., Javaid, A.Y., Ganapathy, S., Malek, D.A., Hodge, D.C., and Devabhaktuni, V. (2018). Effects of voice-based synthetic assistant on performance of emergency care provider in training. Int. J. Artif. Intell. Educ.","DOI":"10.1007\/s40593-018-0166-3"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"38637","DOI":"10.1109\/ACCESS.2018.2853560","article-title":"Common metrics to benchmark human-machine teams (HMT): A review","volume":"6","author":"Damacharla","year":"2018","journal-title":"IEEE Access"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1109\/MSP.2012.2205597","article-title":"Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups","volume":"29","author":"Hinton","year":"2012","journal-title":"IEEE Signal. Process. Mag."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1049\/iet-spr.2012.0151","article-title":"Comparative study of automatic speech recognition techniques","volume":"7","author":"Cutajar","year":"2013","journal-title":"IET Signal. Process."},{"key":"ref_7","first-page":"3133","article-title":"Do we need hundreds of classifiers to solve real-world classification problems","volume":"15","author":"Cernadas","year":"2014","journal-title":"J. Mach. Learn. Res."},{"key":"ref_8","first-page":"3837","article-title":"Are random forests truly the best classifiers?","volume":"17","author":"Weinberg","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1023\/A:1008066223044","article-title":"Audio feature extraction and analysis for scene classification","volume":"20","author":"Liu","year":"1997","journal-title":"J. VLSI Signal. Process. Syst."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"209814","DOI":"10.1155\/2015\/209814","article-title":"Optimized audio classification and segmentation algorithm by using ensemble methods","volume":"2015","author":"Zahid","year":"2015","journal-title":"Math. Probl. Eng."},{"key":"ref_11","unstructured":"Lozano, H., Hernandez, I., Navas, E., Gonzalez, F., and Idigoras, I. (2007, January 28\u201331). Household sound identification system for people with hearing disabilities. Proceedings of the Conference and Workshop on Assistive Technologies for People with Vision and Hearing Impairments, Granada, Spain."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Chang, S.Y., and Morgan, N. (2014, January 14\u201318). Robust CNN-Based Speech Recognition with Gabor Filter Kernels. Proceedings of the Fifteenth Annual Conference of the International Speech Communication Association, Singapore.","DOI":"10.21437\/Interspeech.2014-226"},{"key":"ref_13","first-page":"84","article-title":"Imagenet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2012","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"749","DOI":"10.1109\/JSTSP.2014.2364559","article-title":"A real-time end-to-end multilingual speech recognition architecture","volume":"9","author":"Eustis","year":"2015","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_15","first-page":"393","article-title":"A Review on Automatic speech recognition architecture and approaches","volume":"9","author":"Karpagavalli","year":"2016","journal-title":"Int. J. Signal. Process. Image Process. Pattern Recognit."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"30","DOI":"10.17485\/ijst\/2017\/v10i30\/115518","article-title":"Issues and challenges of voice recognition in pervasive environment","volume":"10","author":"Goyal","year":"2017","journal-title":"Indian J. Sci. Technol."},{"key":"ref_17","unstructured":"Zhang, A., Wang, Q., Zhu, Z., Paisley, J., and Wang, C. (2018). Fully Supervised Speaker Diarization. arXiv preprint, Available online: https:\/\/arxiv.org\/pdf\/1810.04719.pdf."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Zhang, A., Wang, Q., Zhu, Z., Paisley, J., and Wang, C. (2019, January 12\u201317). Fully supervised speaker diarization. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal. Processing, Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8683892"},{"key":"ref_19","unstructured":"Salehghaffari, H. (arXiv, 2018). Speaker Verification using Convolutional Neural Networks, arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Nagrani, A., Son, C.J., and Andrew, Z. (arXiv, 2017). Voxceleb: A Large-Scale Speaker Identification Dataset, arXiv.","DOI":"10.21437\/Interspeech.2017-950"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Chung, J.S., Nagrani, A., and Zisserman, A. (2018, January 6). VoxCeleb2: Deep Speaker Recognition. Presented at the Interspeech 2018, Hyderabad, India. Available online: http:\/\/dx.doi.org\/10.21437\/Interspeech.2018-1929.","DOI":"10.21437\/Interspeech.2018-1929"},{"key":"ref_22","unstructured":"Xiaoyu, L. (2017). Deep Convolutional and LSTM Neural Networks for Acoustic Modelling in Automatic Speech Recognition, Pearson Education Inc."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1016\/0167-6393(90)90010-7","article-title":"Speech database development at MIT: TIMIT and beyond","volume":"9","author":"Zue","year":"1990","journal-title":"Speech Commun."},{"key":"ref_24","unstructured":"Mobiny, A. (arXiv, 2018). Text-Independent Speaker Verification Using Long Short-Term Memory Networks, arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"3244","DOI":"10.1109\/TII.2018.2799928","article-title":"GMM and CNN hybrid method for short utterance speaker recognition","volume":"14","author":"Liu","year":"2018","journal-title":"IEEE Trans. Ind. Inf."},{"key":"ref_26","unstructured":"Selvaraj, S.S.P., and Konam, S. (2019, March 18). Deep Learning for Speaker Recognition. Available online: https:\/\/arxiv.org\/ftp\/arxiv\/papers\/1708\/1708.05682.pdf."},{"key":"ref_27","first-page":"12","article-title":"Voice recognition and authentication as a proficient biometric tool and its application in online exam for PH people","volume":"39","author":"Rudrapal","year":"2012","journal-title":"Int. J. Comput. Appl."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Dhakal, P., Damacharla, P., Javaid, A.Y., and Devabhaktuni, V. (2018, January 6\u20138). Detection and Identification of Background Sounds to Improvise Voice Interface in Critical Environments. Proceedings of the 2018 IEEE International Symposium on Signal. Processing and Information Technology (ISSPIT), Louisville, KY, USA.","DOI":"10.1109\/ISSPIT.2018.8642755"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"500","DOI":"10.14445\/22315381\/IJETT-V10P298","article-title":"An outdoor navigation with voice recognition security application for visually impaired people","volume":"10","author":"Nandish","year":"2014","journal-title":"Int. J. Eng. Trends Technol."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Sainath, T.N., Mohamed, A.R., Kingsbury, B., and Ramabhadran, B. (2013, January 26\u201331). Deep Convolutional Neural Networks for LVCSR. Proceedings of the IEEE International Conference on acoustics, Speech and Signal Processing, Vancouver, BC, Canada.","DOI":"10.1109\/ICASSP.2013.6639347"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Vesely, K., Karafit, M., and Grzl, F. (2011, January 11). Convolutive Bottleneck Network Features for LVCSR. Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Big Island, HI, USA.","DOI":"10.1109\/ASRU.2011.6163903"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1533","DOI":"10.1109\/TASLP.2014.2339736","article-title":"Convolutional neural networks for speech recognition","volume":"22","author":"Mohamed","year":"2014","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Poria, S., Cambria, E., and Gelbukh, A. (2015). Deep convolutional neural network textual features and multiple kernel learning for utterance-level multimodal sentiment analysis. EMNLP.","DOI":"10.18653\/v1\/D15-1303"},{"key":"ref_34","unstructured":"Missaoui, I., and Zied, L. (July, January 30). Gabor Filterbank Features for robust Speech Recognition. Proceedings of the International Conference on Image and Signal. Processing (ICISP), Cherburg, France."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1016\/j.csl.2017.02.006","article-title":"On the relevance of auditory-based Gabor features for deep learning in robust speech recognition","volume":"45","author":"Martinez","year":"2017","journal-title":"Comput. Speech Lang."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Chang, S.Y., and Morgan, N. (2013, January 25\u201329). Informative Spectro-Temporal Bottleneck Features for Noise-Robust Speech Recognition. Proceedings of the Interspeech 14th Annual Conference of the International Speech Communication Association, Lyon, France.","DOI":"10.21437\/Interspeech.2013-46"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Sarwar, S.S., Panda, P., and Roy, K. (2017, January 15). Gabor Filter Assisted Energy Efficient Fast Learning Convolutional Neural Networks. Proceedings of the 2017 IEEE\/ACM International Symposium on Low Power Electronics and Design (ISLPED), Taipei, Taiwan.","DOI":"10.1109\/ISLPED.2017.8009202"},{"key":"ref_38","unstructured":"Mahmoud, W.H., and Zhang, N. (2013, January 23\u201326). Software\/Hardware Implementation of an Adaptive Noise Cancellation System. Proceedings of the 120th ASEE Annual Conference and Exposition, Atlanta, GA, USA."},{"key":"ref_39","unstructured":"Wyse, L. (2017, January 18\u201319). Audio Spectrogram Representations for Processing with Convolutional Neural Networks. Proceedings of the IEEE International Conference on Deep Learning and Music, Anchorage, AK, USA."},{"key":"ref_40","unstructured":"Feng, L., and Kai, H.L. (2005). A New Database for Speaker Recognition, IMM."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Malik, F., and Baharudin, B. (2012, January 21\u201322). Quantized Histogram Color Features Analysis for Image Retrieval Based on Median and Laplacian Filters in DCT Domain. Proceedings of the IEEE International Conference on Innovation Management and Technology Research (ICIMTR), Malacca, Malaysia.","DOI":"10.1109\/ICIMTR.2012.6236471"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"7905","DOI":"10.1016\/j.eswa.2015.06.025","article-title":"CloudID: Trustworthy cloud-based and cross-enterprise biometric identification","volume":"42","author":"Haghighat","year":"2015","journal-title":"Exp. Syst. Appl."},{"key":"ref_43","unstructured":"Jain, K., and Farrokhnia, F. (1990, January 4\u20137). Unsupervised Texture Segmentation Using Gabor Filters. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Universal City, CA, USA."},{"key":"ref_44","unstructured":"Burkert, P., Trier, F., Afzal, M.Z., Dengel, A., and Liwicki, M. (arXiv, 2015). Dexpression: A Deep Convolutional Neural Network for Expression Recognition, arXiv."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Levi, G., and Hassner, T. (2015, January 7\u201312). Age and Gender Classification Using Convolutional Neural Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, MA, USA.","DOI":"10.1109\/CVPRW.2015.7301352"},{"key":"ref_46","unstructured":"Dieleman, S., Schl\u00fcter, J., Raffel, C., Olson, E., S\u00f8nderby, S.K., Nouri, D., Maturana, D., Thoma, M., Battenberg, E., and Kelly, J. (2015). Lasagne: First release, Zenodo."},{"key":"ref_47","unstructured":"Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R.R. (arXiv, 2012). Improving neural networks by preventing co-adaptation of feature detectors, arXiv."},{"key":"ref_48","unstructured":"Hijazi, S., Kumar, R., and Rowen, C. (2015). Using Convolutional Neural Networks for Image Recognition, Cadence Design Systems Inc."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"1552","DOI":"10.1109\/TMI.2002.806569","article-title":"A support vector machine approach for detection of microcalcifications","volume":"21","author":"Wernick","year":"2002","journal-title":"IEEE Trans. Med. Imag."},{"key":"ref_50","unstructured":"Hsu, W., Chang, C.C., and Lin, C.J. (2003). A Practical Guide to Support Vector Classification, Department of Computer Science and Information Engineering, National Taiwan University. Technical Report."},{"key":"ref_51","unstructured":"Liaw, A., and Wiener, M. (2002). Classification and Regression by Random Forest, The R Foundation. The Newsletter of the R Project."},{"key":"ref_52","unstructured":"Breiman, L., Friedman, J., Stone, C.J., and Olshen, R.A. (1984). Classification and Regression Trees, CRC press."},{"key":"ref_53","unstructured":"Tang, Y. (2013, January 2). Deep learning using linear support vector machines. Presented at the Challenges in Representation Learning Workshop (ICML), Atlanta, GA, USA. Available online: https:\/\/arxiv.org\/pdf\/1306.0239.pdf."},{"key":"ref_54","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_55","unstructured":"(1997). NOVA, WGBH Science Unit Online, PBS."},{"key":"ref_56","unstructured":"(2019, March 18). Amazon, Alexa. Available online: Amazon.com."},{"key":"ref_57","unstructured":"(2019, March 18). Build Natural and Rich Conversational Experiences. Available online: DialogFlow.com."},{"key":"ref_58","unstructured":"(2019, March 18). Cortana Is Your Truly Personal Digital Assistant. Available online: Microsoft.com."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/1\/31\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:39:06Z","timestamp":1760186346000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/1\/1\/31"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,3,19]]},"references-count":58,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2019,3]]}},"alternative-id":["make1010031"],"URL":"https:\/\/doi.org\/10.3390\/make1010031","relation":{},"ISSN":["2504-4990"],"issn-type":[{"value":"2504-4990","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,3,19]]}}}