{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T12:00:05Z","timestamp":1778155205156,"version":"3.51.4"},"reference-count":21,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2024,10,8]],"date-time":"2024-10-08T00:00:00Z","timestamp":1728345600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>Information dissemination and preservation are crucial for societal progress, especially in the technological age. While technology fosters knowledge sharing, it also risks spreading misinformation. Audio deepfakes\u2014convincingly fabricated audio created using artificial intelligence (AI)\u2014exacerbate this issue. We present Sonic Sleuth, a novel AI model designed specifically for detecting audio deepfakes. Our approach utilizes advanced deep learning (DL) techniques, including a custom CNN model, to enhance detection accuracy in audio misinformation, with practical applications in journalism and social media. Through meticulous data preprocessing and rigorous experimentation, we achieved a remarkable 98.27% accuracy and a 0.016 equal error rate (EER) on a substantial dataset of real and synthetic audio. Additionally, Sonic Sleuth demonstrated 84.92% accuracy and a 0.085 EER on an external dataset. The novelty of this research lies in its integration of datasets that closely simulate real-world conditions, including noise and linguistic diversity, enabling the model to generalize across a wide array of audio inputs. These results underscore Sonic Sleuth\u2019s potential as a powerful tool for combating misinformation and enhancing integrity in digital communications.<\/jats:p>","DOI":"10.3390\/computers13100256","type":"journal-article","created":{"date-parts":[[2024,10,8]],"date-time":"2024-10-08T12:03:49Z","timestamp":1728389029000},"page":"256","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Audio Deep Fake Detection with Sonic Sleuth Model"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-6599-1787","authenticated-orcid":false,"given":"Anfal","family":"Alshehri","sequence":"first","affiliation":[{"name":"Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah P.O. Box 80221, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-2069-0079","authenticated-orcid":false,"given":"Danah","family":"Almalki","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah P.O. Box 80221, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7970-3248","authenticated-orcid":false,"given":"Eaman","family":"Alharbi","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah P.O. Box 80221, Saudi Arabia"},{"name":"Center of Research Excellence in Artificial Intelligence and Data Science, King Abdulaziz University, Jeddah, P.O. Box 80221, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4317-2358","authenticated-orcid":false,"given":"Somayah","family":"Albaradei","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah P.O. Box 80221, Saudi Arabia"},{"name":"Center of Research Excellence in Artificial Intelligence and Data Science, King Abdulaziz University, Jeddah, P.O. Box 80221, Saudi Arabia"}]}],"member":"1968","published-online":{"date-parts":[[2024,10,8]]},"reference":[{"key":"ref_1","unstructured":"Oh, S., Kang, M., Moon, H., Choi, K., and Chon, B.S. (2023). A demand-driven perspective on generative audio AI. arXiv."},{"key":"ref_2","unstructured":"(2020, May 04). Deepfakes (a Portmanteau of \u201cDeep Learning\u201d and \u201cFake\u201d). Images, Videos, or Audio Edited or Generated Using Artificial Intelligence Tools. Synthetic Media, Available online: https:\/\/en.wikipedia.org\/wiki\/Deepfake."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Gu, Y., Chen, Q., Liu, K., Xie, L., and Kang, C. (2019, January 18\u201321). GAN-based Model for Residential Load Generation Considering Typical Consumption Patterns. Proceedings of the ISGT 2019, Washington, DC, USA.","DOI":"10.1109\/ISGT.2019.8791575"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Camastra, F., and Vinciarelli, A. (2015). Machine Learning for Audio, Image and Video Analysis: Theory and Applications, Springer.","DOI":"10.1007\/978-1-4471-6735-8"},{"key":"ref_5","unstructured":"Tenoudji, F.C. (2018). Analog and Digital Signal Analysis: From Basics to Applications, Springer International Publishing."},{"key":"ref_6","unstructured":"Natsiou, A., and O\u2019Leary, S. (2022). Audio Representations for Deep Learning in Sound Synthesis: A Review. arXiv."},{"key":"ref_7","unstructured":"Marcus, G. (2020). The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence. arXiv."},{"key":"ref_8","unstructured":"Frank, J., and Sch\u00f6nherr, L. (2021). WaveFake: A Data Set to Facilitate Audio Deepfake Detection. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Kawa, P., Plata, M., and Syga, P. (2022, January 18\u201322). Attack Agnostic Dataset: Towards Generalization and Stabilization of Audio DeepFake Detection. Proceedings of the Interspeech 2022, ISCA, Incheon, Republic of Korea.","DOI":"10.21437\/Interspeech.2022-10078"},{"key":"ref_10","unstructured":"M\u00fcller, N.M., Czempin, P., Dieckmann, F., Froghyar, A., and B\u00f6ttinger, K. (2024). Does audio deepfake detection generalize?. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Almutairi, Z., and Elgibreen, H. (2022). A Review of Modern Audio Deepfake Detection Methods: Challenges and Future Directions. Algorithms, 15.","DOI":"10.3390\/a15050155"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Sun, C., Jia, S., Hou, S., AlBadawy, E., and Lyu, S. (2023). Exposing AI-Synthesized Human Voices Using Neural Vocoder Artifacts. arXiv.","DOI":"10.1109\/CVPRW59228.2023.00097"},{"key":"ref_13","unstructured":"Zhang, C., Zhang, C., Zheng, S., Zhang, M., Qamar, M., Bae, S.-H., and Kweon, I.S. (2023). A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wang, X., Yamagishi, J., Todisco, M., Delgado, H., Nautsch, A., Evans, N., Sahidullah, M., Vestman, V., Kinnunen, T., and Lee, K.A. (2020). ASVspoof 2019: A Large-Scale Public Database of Synthesized, Converted and Replayed Speech. arXiv.","DOI":"10.1016\/j.csl.2020.101114"},{"key":"ref_15","unstructured":"Khalid, H., Tariq, S., Kim, M., and Woo, S.S. (2023, September 29). FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset. In Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2). Available online: https:\/\/openreview.net\/forum?id=TAXFsg6ZaOl."},{"key":"ref_16","unstructured":"Abdeldayem, M. (2024, May 28). The Fake-or-Real Dataset. Kaggle Dataset, Available online: https:\/\/www.kaggle.com\/datasets\/mohammedabdeldayem\/the-fake-or-real-dataset."},{"key":"ref_17","first-page":"2087","article-title":"A Comparison of Features for Synthetic Speech Detection","volume":"2015","author":"Sahidullah","year":"2015","journal-title":"Interspeech"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Zheng, F., and Zhang, G. (2000, January 16\u201320). Integrating the energy information into MFCC. Proceedings of the 6th International Conference on Spoken Language Processing (ICSLP 2000), Beijing, China.","DOI":"10.21437\/ICSLP.2000-96"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"516","DOI":"10.1016\/j.csl.2017.01.001","article-title":"Constant Q Cepstral Coefficients: A Spoofing Countermeasure for Automatic Speaker Verification","volume":"45","author":"Todisco","year":"2017","journal-title":"Comput. Speech Lang."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Khalid, H., Kim, M., Tariq, S., and Woo, S.S. (2021, January 24). Evaluation of an audio-video multimodal deepfake dataset using unimodal and multimodal detectors. Proceedings of the 1st Workshop on Synthetic Multimedia-Audiovisual Deepfake Generation and Detection, Virtual Event.","DOI":"10.1145\/3476099.3484315"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Alzantot, M., Wang, Z., and Srivastava, M.B. (2019). Deep residual neural networks for audio spoofing detection. arXiv.","DOI":"10.21437\/Interspeech.2019-3174"}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/13\/10\/256\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T16:09:12Z","timestamp":1760112552000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/13\/10\/256"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,8]]},"references-count":21,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2024,10]]}},"alternative-id":["computers13100256"],"URL":"https:\/\/doi.org\/10.3390\/computers13100256","relation":{},"ISSN":["2073-431X"],"issn-type":[{"value":"2073-431X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,8]]}}}