{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,14]],"date-time":"2025-11-14T14:47:44Z","timestamp":1763131664668,"version":"3.45.0"},"reference-count":42,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2025,11,13]],"date-time":"2025-11-13T00:00:00Z","timestamp":1762992000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Deepfake audio refers to the generation of voice recordings using deep neural networks that replicate a specific individual\u2019s voice, often for deceptive or fraudulent purposes. Although this has been an area of research for quite some time, deepfakes still pose substantial challenges for reliable true speaker authentication. To address the issue, we propose AudioFakeNet, a hybrid deep learning architecture that uses Convolutional Neural Networks (CNNs) along with Long Short-Term Memory (LSTM) units and Multi-Head Attention (MHA) mechanisms for robust deepfake detection. The CNN extracts spatial and spectral features, the LSTM captures temporal dependencies, and MHA enhances the focus on informative audio segments. The model is trained using Mel-Frequency Cepstral Coefficients (MFCCs) from a publicly available dataset and validated on a self-collected dataset, ensuring reproducibility. Performance comparisons with state-of-the-art machine learning and deep learning models show that our proposed AudioFakeNet achieves higher accuracy, better generalization, and lower Equal Error Rate (EER). 
Its modular design allows for broader adaptability in fake-audio detection tasks, offering significant potential across diverse speech synthesis applications.<\/jats:p>","DOI":"10.3390\/a18110716","type":"journal-article","created":{"date-parts":[[2025,11,14]],"date-time":"2025-11-14T14:37:52Z","timestamp":1763131072000},"page":"716","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["AudioFakeNet: A Model for Reliable Speaker Verification in Deepfake Audio"],"prefix":"10.3390","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3526-1601","authenticated-orcid":false,"given":"Samia","family":"Dilbar","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering, Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4390-2461","authenticated-orcid":false,"given":"Muhammad Ali","family":"Qureshi","sequence":"additional","affiliation":[{"name":"Department of Information and Communication Engineering, Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8760-0628","authenticated-orcid":false,"given":"Serosh Karim","family":"Noon","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, NFC Institute of Engineering & Technology, Multan 60000, Pakistan"}]},{"given":"Abdul","family":"Mannan","sequence":"additional","affiliation":[{"name":"Department of Biomedical Engineering, NFC Institute of Engineering & Technology, Multan 60000, Pakistan"}]}],"member":"1968","published-online":{"date-parts":[[2025,11,13]]},"reference":[{"key":"ref_1","unstructured":"Bird, J.J., and Lotfi, A. (2023). Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion. 
arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"637","DOI":"10.70003\/160792642024072504014","article-title":"Design and Implementation for Research Paper Classification Based on CNN and RNN Models","volume":"25","author":"Biswas","year":"2024","journal-title":"J. Internet Technol."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"123941","DOI":"10.1016\/j.eswa.2024.123941","article-title":"Audio-Deepfake Detection: Adversarial Attacks and Countermeasures","volume":"250","author":"Rabhi","year":"2024","journal-title":"Expert Syst. Appl."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Sun, C., Jia, S., Hou, S., and Lyu, S. (2023, January 18\u201322). AI-Synthesized Voice Detection Using Neural Vocoder Artifacts. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPRW59228.2023.00097"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"7455","DOI":"10.1007\/s11760-024-03407-7","article-title":"Identification of true speakers from disguised voices in anti-forensic scenarios using an efficient framework","volume":"18","author":"Rana","year":"2024","journal-title":"Signal Image Video Process."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Chitale, M., Dhawale, A., Dubey, M., and Ghane, S. (2024, January 3\u20134). A Hybrid CNN-LSTM Approach for Deepfake Audio Detection. Proceedings of the 2024 3rd International Conference on Artificial Intelligence for Internet of Things (AIIoT), Vellore, India.","DOI":"10.1109\/AIIoT58432.2024.10574576"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ashraf, M., Abid, F., Din, I.U., Rasheed, J., Yesiltepe, M., Yeo, S.F., and Ersoy, M.T. (2023). A Hybrid CNN and RNN Variant Model for Music Classification. Appl. 
Sci., 13.","DOI":"10.3390\/app13031476"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"149221","DOI":"10.1109\/ACCESS.2024.3478731","article-title":"Hybrid Transformer Architectures with Diverse Audio Features for Deepfake Speech Classification","volume":"12","author":"Zaman","year":"2024","journal-title":"IEEE Access"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"284","DOI":"10.62019\/abbdm.v4i02.159","article-title":"A Comprehensive Review of Forensic Phonetics Techniques","volume":"4","author":"Rana","year":"2024","journal-title":"Asian Bull. Big Data Manag."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"289","DOI":"10.3390\/forensicsci4030021","article-title":"Video and audio deepfake datasets and open issues in deepfake technology: Being ahead of the curve","volume":"4","author":"Akhtar","year":"2024","journal-title":"Forensic Sci."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"103200","DOI":"10.1016\/j.specom.2025.103200","article-title":"One-Class Network Leveraging Spectro-Temporal Features for Generalized Synthetic Speech Detection","volume":"169","author":"Ye","year":"2025","journal-title":"Speech Commun."},{"key":"ref_12","first-page":"103935","article-title":"Deepfakes in Digital Media Forensics: Generation, AI-Based Detection and Challenges","volume":"88","author":"Bendiab","year":"2025","journal-title":"J. Inf. Secur. Appl."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"104145","DOI":"10.1016\/j.cviu.2024.104145","article-title":"Acoustic Features Analysis for Explainable Machine Learning-Based Audio Spoofing Detection","volume":"249","author":"Bisogni","year":"2024","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3736765","article-title":"Where Are We in Audio Deepfake Detection? A Systematic Analysis over Generative and Detection Models","volume":"25","author":"Li","year":"2025","journal-title":"ACM Trans. 
Internet Technol."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"103254","DOI":"10.1016\/j.specom.2025.103254","article-title":"A Feature Engineering Approach for Literary and Colloquial Tamil Speech Classification Using 1D-CNN","volume":"173","author":"Nanmalar","year":"2025","journal-title":"Speech Commun."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"97765","DOI":"10.1109\/ACCESS.2025.3571293","article-title":"Deepfake Audio Detection for Urdu Language Using Deep Neural Networks","volume":"13","author":"Ahmad","year":"2025","journal-title":"IEEE Access"},{"key":"ref_17","unstructured":"Ahmadiadli, Y., Zhang, X.-P., and Khan, N. (2025). Beyond Identity: A Generalizable Approach for Deepfake Audio Detection. arXiv, Available online: https:\/\/arxiv.org\/abs\/2505.06766."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Borodin, K., Kudryavtsev, V., Korzh, D., Efimenko, A., Mkrtchian, G., Gorodnichev, M., and Rogov, O.Y. (2024). AASIST3: KAN-Enhanced AASIST Speech Deepfake Detection Using SSL Features and Additional Regularization for the ASVspoof 2024 Challenge. arXiv.","DOI":"10.21437\/ASVspoof.2024-8"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Pianese, A., Cozzolino, D., Poggi, G., and Verdoliva, L. (2024, January 24\u201326). Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models. Proceedings of the 2024 ACM Workshop on Information Hiding and Multimedia Security (IH & MMSec 2024), Baiona, Spain. 
Available online: https:\/\/arxiv.org\/abs\/2405.02179.","DOI":"10.1145\/3658664.3659662"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"134018","DOI":"10.1109\/ACCESS.2022.3231480","article-title":"Deepfake audio detection via MFCC features using machine learning","volume":"10","author":"Hamza","year":"2022","journal-title":"IEEE Access"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"112798","DOI":"10.1016\/j.asoc.2025.112798","article-title":"A transformer-based deep learning approach for recognition of forgery methods in spoofing speech attribution","volume":"171","author":"Zhang","year":"2025","journal-title":"Appl. Soft Comput."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"102993","DOI":"10.1016\/j.inffus.2025.102993","article-title":"Advances in DeepFake Detection Algorithms: Exploring Fusion Techniques in Single and Multi-Modal Approach","volume":"118","author":"Kumar","year":"2025","journal-title":"Inf. Fusion"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"72134","DOI":"10.1109\/ACCESS.2023.3286864","article-title":"Detecting Fake Audio of Arabic Speakers Using Self-Supervised Deep Learning","volume":"11","author":"Almutairi","year":"2023","journal-title":"IEEE Access"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"101732","DOI":"10.1016\/j.csl.2024.101732","article-title":"Spoofing Countermeasure for Fake Speech Detection Using Brute Force Features","volume":"90","author":"Mirza","year":"2025","journal-title":"Comput. 
Speech Lang."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"132652","DOI":"10.1109\/ACCESS.2023.3333866","article-title":"Audio Deepfake Approaches","volume":"11","author":"Shaaban","year":"2023","journal-title":"IEEE Access"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"2166","DOI":"10.1109\/TASLP.2024.3378107","article-title":"A Non-Invasive Speech Quality Evaluation Algorithm for Hearing Aids with Multi-Head Self-Attention and Audiogram-Based Features","volume":"32","author":"Liang","year":"2024","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"105234","DOI":"10.1016\/j.dsp.2025.105234","article-title":"A Hybrid CNN-LSTM Model for Environmental Sound Classification: Leveraging Feature Engineering and Transfer Learning","volume":"163","author":"Akter","year":"2025","journal-title":"Digit. Signal Process."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"21547","DOI":"10.1109\/ACCESS.2025.3533653","article-title":"BMNet: Enhancing Deepfake Detection Through BiLSTM and Multi-Head Self-Attention Mechanism","volume":"13","author":"Xiong","year":"2025","journal-title":"IEEE Access"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Lavrentyeva, G., Novoselov, S., Tseren, A., Volkova, M., Gorlanov, A., and Kozlov, A. (2019, January 15\u201319). STC Antispoofing Systems for the ASVspoof2019 Challenge. Proceedings of the Interspeech 2019, Graz, Austria. Available online: https:\/\/www.isca-archive.org\/interspeech_2019\/lavrentyeva19_interspeech.html.","DOI":"10.21437\/Interspeech.2019-1768"},{"key":"ref_30","unstructured":"Huang, L., and Pun, C.-M. (2024). Self-Attention and Hybrid Features for Replay and Deep-Fake Audio Detection. 
arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"344","DOI":"10.1109\/TIFS.2023.3324724","article-title":"Domain Generalization via Aggregation and Separation for Audio Deepfake Detection","volume":"19","author":"Xie","year":"2023","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_32","unstructured":"Abdeldayem, M., and Mohamed, A. (2025, August 13). The Fake or Real Dataset. Available online: https:\/\/www.kaggle.com\/datasets\/mohammedabdeldayem\/the-fake-or-real-dataset\/data."},{"key":"ref_33","unstructured":"Yi, J., Wang, C., Tao, J., Zhang, X., Zhang, C.Y., and Zhao, Y. (2023). Audio Deepfake Detection: A Survey. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"109826","DOI":"10.1016\/j.asoc.2022.109826","article-title":"Adaptive Boosted Random Forest-Support Vector Machine Based Classification Scheme for Speaker Identification","volume":"131","author":"Karthikeyan","year":"2022","journal-title":"Appl. Soft Comput."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Liu, T., Yan, D., Wang, R., Yan, N., and Chen, G. (2021). Identification of Fake Stereo Audio Using SVM and CNN. Information, 12.","DOI":"10.3390\/info12070263"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Chau, H.-H., and Chau, Y. (2024, January 14\u201316). Audio-Based Classification of Mild Cognitive Impairment Using XGBoost. Proceedings of the 2024 IEEE 6th Eurasia Conference on Biomedical Engineering, Healthcare and Sustainability (ECBIOS), Tainan, Taiwan.","DOI":"10.1109\/ECBIOS61468.2024.10885492"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Wani, T.M., Qadri, S.A.A., Comminiello, D., and Amerini, I. (2024, January 24\u201326). Detecting Audio Deepfakes: Integrating CNN and BiLSTM with Multi-Feature Concatenation. 
Proceedings of the 2024 ACM Workshop on Information Hiding and Multimedia Security, Baiona, Spain.","DOI":"10.1145\/3658664.3659647"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Doan, T.P., Hong, K., and Jung, S. (2023, January 10\u201314). GAN Discriminator Based Audio Deepfake Detection. Proceedings of the 2nd Workshop on Security Implications of Deepfakes and Cheapfakes, Melbourne, VIC, Australia.","DOI":"10.1145\/3595353.3595883"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Lapates, J.M., Gerardo, B.D., and Medina, R.P. (2024, January 16\u201318). Performance Evaluation of Enhanced DCGANs for Detecting Deepfake Audio across Selected FoR Datasets. Proceedings of the 2024 15th International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea.","DOI":"10.1109\/ICTC62082.2024.10827547"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Wijethunga, R., Matheesha, D., Al Noman, A., De Silva, K., Tissera, M., and Rupasinghe, L. (2020, January 10\u201311). Deepfake Audio Detection: A Deep Learning Based Solution for Group Conversations. Proceedings of the 2020 2nd International Conference on Advancements in Computing (ICAC), Colombo, Sri Lanka.","DOI":"10.1109\/ICAC51239.2020.9357161"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Volkova, M., Andzhukaev, T., Lavrentyeva, G., Novoselov, S., and Kozlov, A. (2019, January 20\u201325). Light CNN Architecture Enhancement for Different Types of Spoofing Attack Detection. Proceedings of the International Conference on Speech and Computer, Istanbul, Turkey.","DOI":"10.1007\/978-3-030-26061-3_53"},{"key":"ref_42","unstructured":"Sheikholeslami, S., Ghasemirahni, H., Payberah, A.H., Wang, T., Dowling, J., and Vlassov, V. (April, January 30). Utilizing Large Language Models for Ablation Studies in Machine Learning and Deep Learning. 
Proceedings of the 5th Workshop on Machine Learning and Systems, Rotterdam, The Netherlands."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/11\/716\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,14]],"date-time":"2025-11-14T14:45:13Z","timestamp":1763131513000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/11\/716"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,13]]},"references-count":42,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2025,11]]}},"alternative-id":["a18110716"],"URL":"https:\/\/doi.org\/10.3390\/a18110716","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,13]]}}}