{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T09:30:51Z","timestamp":1768037451231,"version":"3.49.0"},"reference-count":42,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T00:00:00Z","timestamp":1740960000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"The Major Science and Technology Project of Xiamen (Industry and Information Technology Area)","award":["3502Z20231007"],"award-info":[{"award-number":["3502Z20231007"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>The detection of synthetic speech has become a pressing challenge due to the potential societal risks posed by synthetic speech technologies. Existing methods primarily focus on either the time or frequency domain of speech, limiting their ability to generalize to new and diverse speech synthesis algorithms. In this work, we present a novel and scientifically grounded approach, the Dual-domain Fusion Network (DDFNet), which synergistically integrates features from both the time and frequency domains to capture complementary information. The architecture consists of two specialized single-domain feature extraction networks, each optimized for the unique characteristics of its respective domain, and a feature fusion network that effectively combines these features at a deep level. Moreover, we incorporate multi-task learning to simultaneously capture rich, multi-faceted representations, further enhancing the model\u2019s generalization capability. 
Extensive experiments on the ASVspoof 2019 Logical Access corpus and ASVspoof 2021 tracks demonstrate that DDFNet achieves strong performance, maintaining competitive results despite the challenges posed by channel changes and compression coding, highlighting its robust generalization ability.<\/jats:p>","DOI":"10.3390\/bdcc9030058","type":"journal-article","created":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T09:04:49Z","timestamp":1740992689000},"page":"58","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["DDFNet: A Dual-Domain Fusion Network for Robust Synthetic Speech Detection"],"prefix":"10.3390","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2803-6203","authenticated-orcid":false,"given":"Jing","family":"Lu","sequence":"first","affiliation":[{"name":"College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China"},{"name":"Xiamen Key Laboratory of Data Security & Blockchain Technology, Huaqiao University, Xiamen 361021, China"}]},{"given":"Qiang","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China"},{"name":"Xiamen Key Laboratory of Data Security & Blockchain Technology, Huaqiao University, Xiamen 361021, China"}]},{"given":"Jialu","family":"Cao","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China"},{"name":"Xiamen Key Laboratory of Data Security & Blockchain Technology, Huaqiao University, Xiamen 361021, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1591-656X","authenticated-orcid":false,"given":"Hui","family":"Tian","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China"},{"name":"Xiamen Key Laboratory of Data Security & Blockchain Technology, Huaqiao University, Xiamen 361021, 
China"}]}],"member":"1968","published-online":{"date-parts":[[2025,3,3]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"15171","DOI":"10.1007\/s11042-022-13943-4","article-title":"A deep learning approaches in text-to-speech system: A systematic review and recent research perspective","volume":"82","author":"Kumar","year":"2023","journal-title":"Multimed. Tools Appl."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1109\/TASLP.2020.3038524","article-title":"An overview of voice conversion and its challenges: From statistical modeling to deep learning","volume":"29","author":"Sisman","year":"2021","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Shen, J., Pang, R., Weiss, R.J., Schuster, M., Jaitly, N., Yang, Z., Chen, Z., Zhang, Y., Wang, Y., and Skerry-Ryan, R. (2018, January 15\u201320). Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions. Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8461368"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"540","DOI":"10.1109\/TASLP.2019.2960721","article-title":"Non-Parallel Sequence-to-Sequence Voice Conversion With Disentangled Linguistic and Speaker Representations","volume":"28","author":"Zhang","year":"2020","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"825","DOI":"10.1093\/ietisy\/e90-d.5.825","article-title":"A hidden semi-Markov model-based speech synthesis system","volume":"90","author":"Zen","year":"2007","journal-title":"IEICE Trans. Inf. Syst."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Kaneko, T., Kameoka, H., Tanaka, K., and Hojo, N. (2019, January 12\u201317). Cyclegan-VC2: Improved Cyclegan-based Non-parallel Voice Conversion. 
Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8682897"},{"key":"ref_7","unstructured":"Kubin, G., and Kacic, Z. (2019, January 15\u201319). Deep residual neural networks for audio spoofing detection. Proceedings of the 20th Annual Conference of the International Speech Communication Association, Interspeech 2019, Graz, Austria."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Kwak, I.Y., Kwag, S., Lee, J., Huh, J.H., Lee, C.H., Jeon, Y., Hwang, J., and Yoon, J.W. (2021, January 10\u201315). ResMax: Detecting Voice Spoofing Attacks with Residual Network and Max Feature Map. Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy.","DOI":"10.1109\/ICPR48806.2021.9412165"},{"key":"ref_9","unstructured":"Meng, H., Xu, B., and Zheng, T.F. (2020, January 25\u201329). Spoofing attack detection using the non-linear fusion of sub-band classifiers. Proceedings of the 21st Annual Conference of the International Speech Communication Association, Interspeech 2020, Virtual Event, Shanghai, China."},{"key":"ref_10","unstructured":"Wang, Z., Cui, S., Kang, X., Sun, W., and Li, Z. (2020, January 7\u201310). Densely connected convolutional network for audio spoofing detection. Proceedings of the 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Auckland, New Zealand."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Wu, Z., Das, R.K., Yang, J., and Li, H. (2020, January 25\u201329). Light convolutional neural network with feature genuinization for detection of synthetic speech attacks. 
Proceedings of the 21st Annual Conference of the International Speech Communication Association, Interspeech 2020, Virtual Event, Shanghai, China.","DOI":"10.21437\/Interspeech.2020-1810"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Luo, A., Li, E., Liu, Y., Kang, X., and Wang, Z.J. (2021, January 6\u201311). A capsule network based approach for detection of audio spoofing attacks. Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada.","DOI":"10.1109\/ICASSP39728.2021.9414670"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Yi, X., and Zhao, X. (2021, January 22\u201325). Fake speech detection using residual network with transformer encoder. Proceedings of the 2021 ACM Workshop on Information Hiding and Multimedia Security, Online.","DOI":"10.1145\/3437880.3460408"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Li, X., Li, N., Weng, C., Liu, X., Su, D., Yu, D., and Meng, H. (2021, January 6\u201311). Replay and synthetic speech detection with res2net architecture. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada.","DOI":"10.1109\/ICASSP39728.2021.9413828"},{"key":"ref_15","unstructured":"Hermansky, H., Cernock\u00fd, H., Burget, L., Lamel, L., Scharenborg, O., and Motl\u00edcek, P. (September, January 30). Channel-wise gated res2net: Towards robust detection of synthetic speech attacks. Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Ma, X., Liang, T., Zhang, S., Huang, S., and He, L. (2021, January 5\u20139). Improved lightcnn with attention modules for asv spoofing detection. 
Proceedings of the 2021 IEEE International Conference on Multimedia and Expo (ICME), Shenzhen, China.","DOI":"10.1109\/ICME51207.2021.9428313"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Ray, R., Karthik, S., Mathur, V., Kumar, P., Maragatham, G., Tiwari, S., and Shankarappa, R.T. (2021, January 6\u20138). Feature genuinization based residual squeeze-and-excitation for audio anti-spoofing in sound AI. Proceedings of the 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kharagpur, India.","DOI":"10.1109\/ICCCNT51525.2021.9580127"},{"key":"ref_18","unstructured":"Hermansky, H., Cernock\u00fd, H., Burget, L., Lamel, L., Scharenborg, O., and Motl\u00edcek, P. (September, January 30). Graph attention networks for anti-spoofing. Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Wang, X., and Yamagishi, J. (September, January 30). A comparative study on recent neural spoofing countermeasures for synthetic speech detection. Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia.","DOI":"10.21437\/Interspeech.2021-702"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1462","DOI":"10.1109\/LSP.2022.3183951","article-title":"Synthetic Speech Detection Based on Local Autoregression and Variance Statistics","volume":"29","author":"Cui","year":"2022","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Lei, Z., Yan, H., Liu, C., Ma, M., and Yang, Y. (2022, January 23\u201327). Two-Path GMM-ResNet and GMM-SENet for ASV Spoofing Detection. 
Proceedings of the ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore.","DOI":"10.1109\/ICASSP43922.2022.9746163"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"756","DOI":"10.1007\/978-3-031-17143-7_38","article-title":"Audio spoofing detection using constant-q spectral sketches and parallel-attention se-resnet","volume":"Volume 13556","author":"Atluri","year":"2022","journal-title":"Proceedings of the Computer Security\u2014ESORICS 2022\u201427th European Symposium on Research in Computer Security"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1142","DOI":"10.1109\/LSP.2022.3169954","article-title":"The Role of Long-Term Dependency in Synthetic Speech Detection","volume":"29","author":"Li","year":"2022","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Muckenhirn, H., Magimai-Doss, M., and Marcel, S. (2017, January 1\u20134). End-to-End convolutional neural network-based voice presentation attack detection. Proceedings of the 2017 IEEE International Joint Conference on Biometrics (IJCB), Denver, CO, USA.","DOI":"10.1109\/BTAS.2017.8272715"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Hermansky, H., Cernock\u00fd, H., Burget, L., Lamel, L., Scharenborg, O., and Motl\u00edcek, P. (September, January 30). Rw-resnet: A novel speech anti-spoofing model using raw waveform. Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia.","DOI":"10.21437\/Interspeech.2021-438"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1265","DOI":"10.1109\/LSP.2021.3089437","article-title":"Towards end-to-end synthetic speech detection","volume":"28","author":"Hua","year":"2021","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_27","unstructured":"Kubin, G., and Kacic, Z. (2019, January 15\u201319). 
Understanding and visualizing raw waveform-based cnns. Proceedings of the 20th Annual Conference of the International Speech Communication Association, Interspeech 2019, Graz, Austria."},{"key":"ref_28","first-page":"1975","article-title":"End-to-end Synthetic Speech Detection Based on Attention Mechanism","volume":"38","author":"Wang","year":"2022","journal-title":"J. Signal Process."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1016\/j.specom.2014.10.005","article-title":"Spoofing and countermeasures for speaker verification: A survey","volume":"66","author":"Wu","year":"2015","journal-title":"Speech Commun."},{"key":"ref_30","unstructured":"Kubin, G., and Kacic, Z. (2019, January 15\u201319). Asvspoof 2019: Future horizons in spoofed and fake audio detection. Proceedings of the 20th Annual Conference of the International Speech Communication Association, Interspeech 2019, Graz, Austria."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"810","DOI":"10.1109\/TIFS.2015.2398812","article-title":"Toward a Universal Synthetic Speech Spoofing Detection Using Phase Information","volume":"10","author":"Sanchez","year":"2015","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_32","unstructured":"Hermansky, H., Cernock\u00fd, H., Burget, L., Lamel, L., Scharenborg, O., and Motl\u00edcek, P. (September, January 30). The effect of silence and dual-band fusion in anti-spoofing system. Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"101114","DOI":"10.1016\/j.csl.2020.101114","article-title":"ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech","volume":"64","author":"Wang","year":"2020","journal-title":"Comput. 
Speech Lang."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Yamagishi, J., Wang, X., Todisco, M., Sahidullah, M., Patino, J., Nautsch, A., Liu, X., Lee, K.A., Kinnunen, T., and Evans, N. (2021). ASVspoof 2021: Accelerating progress in spoofed and deepfake speech detection. arXiv.","DOI":"10.21437\/ASVSPOOF.2021-8"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Jung, J.w., Heo, H.S., Tak, H., Shim, H.j., Chung, J.S., Lee, B.J., Yu, H.J., and Evans, N. (2022, January 23\u201327). AASIST: Audio anti-spoofing using integrated spectro-temporal graph attention networks. Proceedings of the ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore.","DOI":"10.1109\/ICASSP43922.2022.9747766"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Wang, X., Delgado, H., Tak, H., Jung, J.w., Shim, H.j., Todisco, M., Kukanov, I., Liu, X., Sahidullah, M., and Kinnunen, T. (2024). ASVspoof 5: Crowdsourced speech data, deepfakes, and adversarial attacks at scale. arXiv.","DOI":"10.21437\/ASVspoof.2024-1"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"M\u00fcller, N.M., Kawa, P., Choong, W.H., Casanova, E., G\u00f6lge, E., M\u00fcller, T., Syga, P., Sperl, P., and B\u00f6ttinger, K. (July, January 30). Mlaad: The multi-language audio anti-spoofing dataset. Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan.","DOI":"10.1109\/IJCNN60899.2024.10650962"},{"key":"ref_38","unstructured":"Larcher, A., and Bonastre, J. (2018, January 26\u201329). t-dcf: A detection cost function for the tandem assessment of spoofing countermeasures and automatic speaker verification. 
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, Les Sables d\u2019Olonne, France."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"937","DOI":"10.1109\/LSP.2021.3076358","article-title":"One-class learning towards synthetic voice spoofing detection","volume":"28","author":"Zhang","year":"2021","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Ge, W., Patino, J., Todisco, M., and Evans, N. (September, January 30). Raw differentiable architecture search for speech deepfake and spoofing detection. Proceedings of the 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge, Brno, Czechia.","DOI":"10.21437\/ASVSPOOF.2021-4"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Fu, Q., Teng, Z., White, J., Powell, M.E., and Schmidt, D.C. (2022, January 23\u201327). FastAudio: A Learnable Audio Front-End For Spoof Speech Detection. Proceedings of the ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore.","DOI":"10.1109\/ICASSP43922.2022.9746722"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1109\/LSP.2023.3262419","article-title":"End-to-End Dual-Branch Network Towards Synthetic Speech Detection","volume":"30","author":"Ma","year":"2023","journal-title":"IEEE Signal Process. 
Lett."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/3\/58\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:46:01Z","timestamp":1760028361000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/3\/58"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,3]]},"references-count":42,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,3]]}},"alternative-id":["bdcc9030058"],"URL":"https:\/\/doi.org\/10.3390\/bdcc9030058","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,3]]}}}