{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T15:16:09Z","timestamp":1773155769996,"version":"3.50.1"},"reference-count":52,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2022,3,30]],"date-time":"2022-03-30T00:00:00Z","timestamp":1648598400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Deputyship for Research & Innovation, Ministry of Education 517 in Saudi Arabia","award":["959"],"award-info":[{"award-number":["959"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>The use of face masks has increased dramatically since the COVID-19 pandemic started in order to curb the spread of the disease. Additionally, breakthrough infections caused by the Delta and Omicron variants have further increased the importance of wearing a face mask, even for vaccinated individuals. However, the use of face masks also induces attenuation in speech signals, and this change may impact speech processing technologies, e.g., automated speaker verification (ASV) and speech to text conversion. In this paper we examine Automatic Speaker Verification (ASV) systems against the speech samples in the presence of three different types of face mask: surgical, cloth, and filtered N95, and analyze the impact on acoustics and other factors. In addition, we explore the effect of different microphones, and distance from the microphone, and the impact of face masks when speakers use ASV systems in real-world scenarios. Our analysis shows a significant deterioration in performance when an ASV system encounters different face masks, microphones, and variable distance between the subject and microphone. 
To address this problem, this paper proposes a novel framework to overcome performance degradation in these scenarios by realigning the ASV system. The novelty of the proposed ASV framework is as follows: first, we propose a fused feature descriptor by concatenating the novel Ternary Deviated overlapping Patterns (TDoP), Mel Frequency Cepstral Coefficients (MFCC), and Gammatone Cepstral Coefficients (GTCC), which are used by both the ensemble learning-based ASV and anomaly detection system in the proposed ASV architecture. Second, this paper proposes an anomaly detection model for identifying vocal samples produced in the presence of face masks. Next, it presents a Peak Norm (PN) filter to approximate the signal of the speaker without a face mask in order to boost the accuracy of ASV systems. Finally, the features of filtered samples utilizing the PN filter and samples without face masks are passed to the proposed ASV to test for improved accuracy. The proposed ASV system achieved an accuracy of 0.99 and 0.92, respectively, on samples recorded without a face mask and with different face masks. Although the use of face masks affects the ASV system, the PN filtering solution overcomes this deficiency by up to 4%. Similarly, when exposed to different microphones and distances, the PN approach enhanced system accuracy by up to 7% and 9%, respectively. The results demonstrate the effectiveness of the presented framework against an in-house prepared, diverse Multi Speaker Face Masks (MSFM) dataset (IRB No. 
FY2021-83), consisting of samples of subjects taken with a variety of face masks and microphones, and from different distances.<\/jats:p>","DOI":"10.3390\/s22072638","type":"journal-article","created":{"date-parts":[[2022,3,30]],"date-time":"2022-03-30T21:28:39Z","timestamp":1648675719000},"page":"2638","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Toward Realigning Automatic Speaker Verification in the Era of COVID-19"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2497-7687","authenticated-orcid":false,"given":"Awais","family":"Khan","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Oakland University, Rochester, MI 48309, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1290-1477","authenticated-orcid":false,"given":"Ali","family":"Javed","sequence":"additional","affiliation":[{"name":"Department of Software Engineering, University of Engineering and Technology, Taxila 47050, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7927-3436","authenticated-orcid":false,"given":"Khalid Mahmood","family":"Malik","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Oakland University, Rochester, MI 48309, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5735-7495","authenticated-orcid":false,"given":"Muhammad Anas","family":"Raza","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Oakland University, Rochester, MI 48309, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6909-934X","authenticated-orcid":false,"given":"James","family":"Ryan","sequence":"additional","affiliation":[{"name":"Department of Computer Science and 
Engineering, Oakland University, Rochester, MI 48309, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4205-3621","authenticated-orcid":false,"given":"Abdul Khader Jilani","family":"Saudagar","sequence":"additional","affiliation":[{"name":"Information Systems Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11432, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6006-3888","authenticated-orcid":false,"given":"Hafiz","family":"Malik","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of Michigan, Dearborn, MI 48128, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,30]]},"reference":[{"key":"ref_1","unstructured":"Nedelman, M. (2021, July 30). CDC Shares \u2019Pivotal Discovery\u2019 on COVID-19 Breakthrough Infections That Led to New Mask Guidance. CNN Health. Available online: https:\/\/edition.cnn.com\/2021\/07\/30\/health\/breakthrough-infection-masks-cdc-provincetown-study\/index.html."},{"key":"ref_2","unstructured":"Aradhana, A., and Chen, L. (2021, July 23). Vaccinated People Make up 75% of Recent COVID-19 Cases in Singapore, but Few Fall Ill. REUTERS. Available online: https:\/\/www.reuters.com\/world\/asia-pacific\/vaccinated-people-singapore-make-up-three-quarters-recent-covid-19-cases-2021-07-23\/."},{"key":"ref_3","unstructured":"Sheinin, A.G. (2022, January 12). Vaccinated People Infected with Delta Remain Contagious. WebMD. 
Available online: https:\/\/www.webmd.com\/lung\/news\/20220112\/cdc-better-masks-for-omicron."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1024","DOI":"10.1136\/thoraxjnl-2020-215748","article-title":"Face coverings and mask to minimise droplet dispersion and aerosolisation: A video case study","volume":"75","author":"Bahl","year":"2020","journal-title":"Thorax"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"2371","DOI":"10.1121\/10.0002279","article-title":"Acoustic effects of medical, cloth, and transparent face masks on speech signals","volume":"148","author":"Corey","year":"2020","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"3562","DOI":"10.1121\/10.0002873","article-title":"Effects of face masks on acoustic analysis and speech perception: Implications for peri-pandemic protocols","volume":"148","author":"Magee","year":"2020","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_7","unstructured":"Fecher, N., and Watt, D. (September, January 29). Effects of forensically-realistic facial concealment on auditory-visual consonant recognition in quiet and noise conditions. Proceedings of the Auditory-Visual Speech Processing (AVSP), Annecy, France."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Saeidi, R., Niemi, T., Karppelin, H., Pohjalainen, J., Kinnunen, T., and Alku, P. (2015, January 6\u201310). Speaker recognition for speech under face cover. Proceedings of the 16th Annual Conference of the International Speech Communication Association (Interspeech 2015), Dresden, Germany.","DOI":"10.21437\/Interspeech.2015-275"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Saeidi, R., Huhtakallio, I., and Alku, P. (2016, January 8\u201312). Analysis of Face Mask Effect on Speaker Recognition. 
Proceedings of the Interspeech, San Francisco, CA, USA.","DOI":"10.21437\/Interspeech.2016-518"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Loukina, A., Evanini, K., Mulholland, M., Blood, I., and Zechner, K. (2020). Do face masks introduce bias in speech technologies? The case of automated scoring of speaking proficiency. arXiv.","DOI":"10.21437\/Interspeech.2020-1264"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Ristea, N.C., and Ionescu, R.T. (2020). Are you wearing a mask? Improving mask detection from speech using augmentation by cycle-consistent GANs. arXiv.","DOI":"10.21437\/Interspeech.2020-1329"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"686","DOI":"10.3766\/jaaa.19.9.4","article-title":"Speech understanding using surgical masks: A problem in health care?","volume":"19","author":"Mendel","year":"2008","journal-title":"J. Am. Acad. Audiol."},{"key":"ref_13","unstructured":"Llamas, C., Harrison, P., Donnelly, D., and Watt, D. (2022, March 03). Effects of Different Types of Face Coverings on Speech Acoustics and Intelligibility. Available online: https:\/\/www.researchgate.net\/publication\/237289463_Effects_of_different_types_of_face_coverings_on_speech_acoustics_and_intelligibility."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Toscano, J.C., and Toscano, C.M. (2021). Effects of face masks on speech recognition in multi-talker babble noise. PLoS ONE, 16.","DOI":"10.1371\/journal.pone.0246842"},{"key":"ref_15","unstructured":"Das, R.K., and Li, H. (2020, January 7\u201310). Classification of Speech with and without Face Mask using Acoustic Features. 
Proceedings of the 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Auckland, New Zealand."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"108361","DOI":"10.1016\/j.patcog.2021.108361","article-title":"Face mask recognition from audio: The MASC database and an overview on the mask challenge","volume":"122","author":"Mohamed","year":"2022","journal-title":"Pattern Recognit."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1016\/j.neulet.2005.03.050","article-title":"Recruitment of fusiform face area associated with listening to degraded speech sounds in auditory\u2013visual speech perception: A PET study","volume":"382","author":"Kawase","year":"2005","journal-title":"Neurosci. Lett."},{"key":"ref_18","first-page":"1","article-title":"Acoustic voice characteristics with and without wearing a facemask","volume":"11","author":"Nguyen","year":"2021","journal-title":"Sci. Rep."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1097\/01.HJ.0000725092.55506.7e","article-title":"Comparison of the acoustic effects of face masks on speech","volume":"74","author":"Corey","year":"2021","journal-title":"Hear. J."},{"key":"ref_20","unstructured":"Orman, \u00d6.D., and Arslan, L.M. (2001, January 18\u201322). Frequency analysis of speaker identification. Proceedings of the 2001: A Speaker Odyssey-The Speaker Recognition Workshop, Crete, Greece."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Rusli, M.H., Sjarif, N.N.A., Yuhaniz, S.S., Kok, S., and Kadir, M.S. (2021, January 5\u20136). Evaluating the Masked and Unmasked Face with LeNet Algorithm. Proceedings of the 2021 IEEE 17th International Colloquium on Signal Processing & Its Applications (CSPA), Langkawi, Malaysia.","DOI":"10.1109\/CSPA52141.2021.9377283"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Patel, T.B., and Patil, H.A. (2015, January 6\u201310). 
Combining evidences from mel cepstral, cochlear filter cepstral and instantaneous frequency features for detection of natural vs. spoofed speech. Proceedings of the Sixteenth Annual Conference of the International Speech Communication Association, Dresden, Germany.","DOI":"10.21437\/Interspeech.2015-467"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Klumpp, P., Arias-Vergara, T., V\u00e1squez-Correa, J.C., P\u00e9rez-Toro, P.A., H\u00f6nig, F., N\u00f6th, E., and Orozco-Arroyave, J.R. (2020, January 25\u201329). Surgical Mask Detection with Deep Recurrent Phonetic Models. Proceedings of the Interspeech, Shanghai, China.","DOI":"10.21437\/Interspeech.2020-1723"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Fecher, N. (2012, January 9\u201313). \u201cAudio-Visual Face Cover Corpus\u201d: Investigations into Audio-Visual Speech and Speaker Recognition When the Speaker\u2019s Face is Occluded by Facewear. Proceedings of the Thirteenth Annual Conference of the International Speech Communication Association, Portland, OR, USA.","DOI":"10.21437\/Interspeech.2012-133"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Chen, G., Chai, S., Wang, G., Du, J., Zhang, W.Q., Weng, C., Su, D., Povey, D., Trmal, J., and Zhang, J. (2021). GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio. arXiv.","DOI":"10.21437\/Interspeech.2021-1965"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"190784","DOI":"10.1109\/ACCESS.2020.3031763","article-title":"A Speech Emotion Recognition Model Based on Multi-Level Local Binary and Local Ternary Patterns","volume":"8","author":"Varol","year":"2020","journal-title":"IEEE Access"},{"key":"ref_27","unstructured":"Muda, L., Begam, M., and Elamvazuthi, I. (2010). Voice recognition algorithms using mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques. 
arXiv."},{"key":"ref_28","unstructured":"Han, W., Chan, C.F., Choy, C.S., and Pun, K.P. (2006, January 21\u201324). An efficient MFCC extraction method in speech recognition. Proceedings of the 2006 IEEE International Symposium on Circuits and Systems, Kos, Greece."},{"key":"ref_29","unstructured":"Chin, C.S., and Xiao, J. (2021, January 23\u201325). Max-Fusion of Random Ensemble Subspace Discriminant with Aggregation of MFCCs and High Scalogram Coefficients for Acoustics Classification. Proceedings of the 2021 IEEE\/ACIS 19th International Conference on Computer and Information Science (ICIS), Shanghai, China."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1109\/LSP.2005.860538","article-title":"Combining evidence from residual phase and MFCC features for speaker recognition","volume":"13","author":"Murty","year":"2005","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1016\/j.neunet.2020.06.015","article-title":"Heart sound classification based on improved MFCC features and convolutional recurrent neural networks","volume":"130","author":"Deng","year":"2020","journal-title":"Neural Netw."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Biswas, M., Rahaman, S., Ahmadian, A., Subari, K., and Singh, P.K. (2022). Automatic spoken language identification using MFCC based time series features. Multimedia Tools and Applications, Springer.","DOI":"10.1007\/s11042-021-11439-1"},{"key":"ref_33","first-page":"540","article-title":"Gammatone cepstral coefficient for speaker Identification","volume":"2","author":"Fathima","year":"2013","journal-title":"Int. J. Adv. Res. Electr. Electron. Instrum. 
Eng."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1684","DOI":"10.1109\/TMM.2012.2199972","article-title":"Gammatone cepstral coefficients: Biologically inspired features for non-speech audio classification","volume":"14","author":"Valero","year":"2012","journal-title":"IEEE Trans. Multimed."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Chin, C.S., Kek, X.Y., and Chan, T.K. (2021, January 19\u201320). Scattering Transform of Averaged Data Augmentation for Ensemble Random Subspace Discriminant Classifiers in Audio Recognition. Proceedings of the 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India.","DOI":"10.1109\/ICACCS51430.2021.9441716"},{"key":"ref_36","unstructured":"H\u00e9ctor Delgado, N.E., and Kinnunen, T. (2022, March 03). Automatic Speaker Verification Spoofing and Countermeasures. Available online: https:\/\/www.asvspoof.org\/."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Li, X., Zhong, J., Wu, X., Yu, J., Liu, X., and Meng, H. (2020, January 4\u20138). Adversarial attacks on GMM i-vector based speaker verification systems. Proceedings of the ICASSP 2020\u20142020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9053076"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Jagtap, S.S., and Bhalke, D. (2015, January 8\u201310). Speaker verification using Gaussian mixture model. Proceedings of the 2015 International Conference on Pervasive Computing (ICPC), Pune, India.","DOI":"10.1109\/PERVASIVE.2015.7087080"},{"key":"ref_39","first-page":"126","article-title":"Speaker identification using gmm with mfcc","volume":"12","author":"Mahboob","year":"2015","journal-title":"Int. J. Comput. Sci. 
Issues"},{"key":"ref_40","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1023\/A:1007607513941","article-title":"An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization","volume":"40","author":"Dietterich","year":"2000","journal-title":"Mach. Learn."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1774","DOI":"10.1109\/TNNLS.2017.2673241","article-title":"Efficient kNN classification with different numbers of nearest neighbors","volume":"29","author":"Zhang","year":"2017","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_43","first-page":"1","article-title":"Naive bayes classifiers","volume":"18","author":"Murphy","year":"2006","journal-title":"Univ. Br. Columbia"},{"key":"ref_44","unstructured":"Claesen, M., De Smet, F., Suykens, J., and De Moor, B. (2014). EnsembleSVM: A library for ensemble learning using support vector machines. arXiv."},{"key":"ref_45","first-page":"1","article-title":"Linear discriminant analysis-a brief tutorial","volume":"18","author":"Balakrishnama","year":"1998","journal-title":"Inst. Signal Inf. Process."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1995","DOI":"10.1007\/s00521-015-1923-y","article-title":"Monarch butterfly optimization","volume":"31","author":"Wang","year":"2019","journal-title":"Neural Comput. Appl."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1504\/IJBIC.2018.093328","article-title":"Earthworm optimisation algorithm: A bio-inspired metaheuristic algorithm for global optimisation problems","volume":"12","author":"Wang","year":"2018","journal-title":"Int. J. Bio-Inspir. 
Comput."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"674","DOI":"10.1007\/s42235-021-0050-y","article-title":"The colony predation algorithm","volume":"18","author":"Tu","year":"2021","journal-title":"J. Bionic Eng."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"849","DOI":"10.1016\/j.future.2019.02.028","article-title":"Harris hawks optimization: Algorithm and applications","volume":"97","author":"Heidari","year":"2019","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Aljasem, M., Irtaza, A., Malik, H., Saba, N., Javed, A., Malik, K.M., and Meharmohammadi, M. (2021). Secure Automatic Speaker Verification (SASV) System through sm-ALTP Features and Asymmetric Bagging. IEEE Trans. Inf. Forensics Secur.","DOI":"10.1109\/TIFS.2021.3082303"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"982","DOI":"10.1109\/JSTSP.2020.2999828","article-title":"A light-weight replay detection framework for voice controlled IoT devices","volume":"14","author":"Malik","year":"2020","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"108283","DOI":"10.1016\/j.apacoust.2021.108283","article-title":"Towards protecting cyber-physical and IoT systems from single-and multi-order voice spoofing attacks","volume":"183","author":"Javed","year":"2021","journal-title":"Appl. 
Acoust."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/7\/2638\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:46:13Z","timestamp":1760136373000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/7\/2638"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,30]]},"references-count":52,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2022,4]]}},"alternative-id":["s22072638"],"URL":"https:\/\/doi.org\/10.3390\/s22072638","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,30]]}}}