{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,30]],"date-time":"2025-11-30T23:03:57Z","timestamp":1764543837975,"version":"3.46.0"},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,12,1]],"date-time":"2025-12-01T00:00:00Z","timestamp":1764547200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,12,1]],"date-time":"2025-12-01T00:00:00Z","timestamp":1764547200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"National Key Research and Development Program of China","award":["No.2021YFB3101100"],"award-info":[{"award-number":["No.2021YFB3101100"]}]},{"DOI":"10.13039\/501100001809","name":"Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62272123"],"award-info":[{"award-number":["62272123"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Cybersecurity"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Adversarial attacks on speaker identification (SI) systems have become a critical security concern, particularly in targeted black-box scenarios where access to the target model is limited. This paper proposes a novel framework that creates highly transferable adversarial examples. We use a\u00a0voice conversion (VC) model to synthesize shadow data from a single target speech sample, which is then used to train two diverse surrogate models. Neural Tangent Kernel (NTK) theory is employed to align acoustic feature spaces, while mutual information optimization enforces consistency between the surrogate models\u2019 predictions. 
Consequently, the adversarial attack is formulated as a min-max game that maximizes attack success while preserving speech quality. Extensive experiments on LibriSpeech and VCTK datasets demonstrate that our method significantly improves the transferability and effectiveness of adversarial examples compared to conventional approaches. Our findings suggest that generating shadow data through voice conversion followed by surrogate model training under information-theoretic constraints is a promising strategy for robust adversarial attacks.<\/jats:p>","DOI":"10.1186\/s42400-025-00490-2","type":"journal-article","created":{"date-parts":[[2025,11,30]],"date-time":"2025-11-30T23:02:25Z","timestamp":1764543745000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["InfoShadow: NTK & MI guided adversarial attacks on speaker identification systems"],"prefix":"10.1186","volume":"8","author":[{"given":"Ruixin","family":"Song","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5974-1570","authenticated-orcid":false,"given":"Youliang","family":"Tian","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mengqian","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ze","family":"Yang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ruohan","family":"Wang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,12,1]]},"reference":[{"key":"490_CR1","unstructured":"Ahmed S, Wani Y, Shamsabadi AS, Yaghini M, Shumailov I, Papernot N, Fawaz K (2023) Tubes among us: analog attack on automatic speaker identification. 
In: 32nd USENIX security symposium (USENIX Security 23), pp. 265\u2013282"},{"key":"490_CR2","unstructured":"Becker S, Ackermann M, Lapuschkin S, M\u00fcller K-R, Samek W (2018) Interpreting and explaining deep neural networks for classification of audio signals. arXiv preprint arXiv:1807.03418"},{"key":"490_CR3","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2022.3189397","author":"G Chen","year":"2022","unstructured":"Chen G, Zhao Z, Song F, Chen S, Fan L, Liu Y (2022) As2t: arbitrary source-to-target adversarial attack on speaker recognition systems. IEEE Trans Dependable Secure Comput. https:\/\/doi.org\/10.1109\/TDSC.2022.3189397","journal-title":"IEEE Trans Dependable Secure Comput"},{"key":"490_CR4","doi-asserted-by":"crossref","unstructured":"Chen G, Chen S, Fan L, Du X, Zhao Z, Song F, Liu Y (2021) Who is real bob? Adversarial attacks on speaker recognition systems. In: 2021 IEEE symposium on security and privacy (SP), pp. 694\u2013711 IEEE","DOI":"10.1109\/SP40001.2021.00004"},{"key":"490_CR5","unstructured":"Chen G, Zhang Y, Zhao Z, Song F (2023) QFA2SR: Query-Free adversarial transfer attacks to speaker recognition systems. In: 32nd USENIX security symposium (USENIX Security 23), pp. 2437\u20132454"},{"key":"490_CR6","unstructured":"Christophe YV, Junichi Y, Kirsten M, et al. (2016) Superseded-cstr vctk corpus: English multi-speaker corpus for cstr voice cloning toolkit"},{"key":"490_CR7","doi-asserted-by":"publisher","unstructured":"Desplanques B, Thienpondt J, Demuynck K (2020) Ecapa-tdnn: Emphasized channel attention, propagation and aggregation in tdnn based speaker verification. arXiv preprint arXiv:2005.07143. https:\/\/doi.org\/10.48550\/arXiv.2005.07143","DOI":"10.48550\/arXiv.2005.07143"},{"key":"490_CR8","doi-asserted-by":"publisher","unstructured":"Duan R, Qu Z, Ding L, Liu Y, Lu Z (2023) Parrot-trained adversarial examples: pushing the practicality of black-box audio attacks against speaker recognition models. 
arXiv preprint arXiv:2311.07780. https:\/\/doi.org\/10.48550\/arXiv.2311.07780","DOI":"10.48550\/arXiv.2311.07780"},{"key":"490_CR9","unstructured":"Jacot A, Gabriel F, Hongler C (2018) Neural tangent kernel: convergence and generalization in neural networks. Adv Neural Inf Process Syst 31"},{"key":"490_CR10","doi-asserted-by":"publisher","first-page":"4811","DOI":"10.1109\/TIFS.2021.3116438","volume":"16","author":"S Joshi","year":"2021","unstructured":"Joshi S, Villalba J, \u017belasko P, Moro-Vel\u00e1zquez L, Dehak N (2021) Study of pre-processing defenses against adversarial attacks on state-of-the-art speaker recognition systems. IEEE Trans Inf Forensics Secur 16:4811\u20134826. https:\/\/doi.org\/10.1109\/TIFS.2021.3116438","journal-title":"IEEE Trans Inf Forensics Secur"},{"issue":"3","key":"490_CR11","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3468673","volume":"17","author":"J Li","year":"2021","unstructured":"Li J, Zhang X, Xu J, Ma S, Gao W (2021) Learning to fool the speaker recognition. ACM Trans Multimed Comput Commun Appl (TOMM) 17(3):1\u201321. https:\/\/doi.org\/10.1145\/3468673","journal-title":"ACM Trans Multimed Comput Commun Appl (TOMM)"},{"key":"490_CR12","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2024.120618","volume":"670","author":"Y Li","year":"2024","unstructured":"Li Y, Zhang X, Sun M, Chen W, Li Y (2024) An attack-agnostic defense method against adversarial attacks on speaker verification by fusing downsampling and upsampling of speech signals. Inf Sci 670:120618. https:\/\/doi.org\/10.1016\/j.ins.2024.120618","journal-title":"Inf Sci"},{"key":"490_CR13","doi-asserted-by":"crossref","unstructured":"Li J, Tu W, Xiao L (2023) Freevc: Towards high-quality text-free one-shot voice conversion. In: ICASSP 2023-2023 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 
1\u20135 IEEE","DOI":"10.1109\/ICASSP49357.2023.10095191"},{"issue":"3","key":"490_CR14","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1007\/s11280-024-01274-3","volume":"27","author":"X Liu","year":"2024","unstructured":"Liu X, Tan H, Zhang J, Li A, Gu Z (2024) Transferable universal adversarial perturbations against speaker recognition systems. World Wide Web 27(3):33. https:\/\/doi.org\/10.1007\/s11280-024-01274-3","journal-title":"World Wide Web"},{"key":"490_CR15","doi-asserted-by":"crossref","unstructured":"Li J, Wang L, Xue L, Wang L, Wu Z (2024) An initial investigation of neural replay simulator for over-the-air adversarial perturbations to automatic speaker verification. In: ICASSP 2024-2024 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 4635\u20134639 IEEE","DOI":"10.1109\/ICASSP48485.2024.10447811"},{"key":"490_CR16","doi-asserted-by":"crossref","unstructured":"Panayotov V, Chen G, Povey D, Khudanpur S (2015) Librispeech: an asr corpus based on public domain audio books. In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 5206\u20135210 IEEE","DOI":"10.1109\/ICASSP.2015.7178964"},{"issue":"2","key":"490_CR17","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1007\/s10044-024-01269-w","volume":"27","author":"U Patel","year":"2024","unstructured":"Patel U, Bhilare S, Hati A (2024) Enhancing cross-domain transferability of black-box adversarial attacks on speaker recognition systems using linearized backpropagation. Pattern Anal Appl 27(2):60. https:\/\/doi.org\/10.1007\/s10044-024-01269-w","journal-title":"Pattern Anal Appl"},{"key":"490_CR18","doi-asserted-by":"publisher","first-page":"132","DOI":"10.1109\/TASLP.2020.3038524","volume":"29","author":"B Sisman","year":"2020","unstructured":"Sisman B, Yamagishi J, King S, Li H (2020) An overview of voice conversion and its challenges: from statistical modeling to deep learning. 
IEEE\/ACM Trans Audio Speech Lang Process 29:132\u2013157. https:\/\/doi.org\/10.1109\/TASLP.2020.3038524","journal-title":"IEEE\/ACM Trans Audio Speech Lang Process"},{"key":"490_CR19","first-page":"40434","volume":"37","author":"C Wang","year":"2024","unstructured":"Wang C, He X, Wang Y, Wang J (2024) On the target-kernel alignment: a unified analysis with kernel complexity. Adv Neural Inf Process Syst 37:40434\u201340485","journal-title":"Adv Neural Inf Process Syst"},{"key":"490_CR20","doi-asserted-by":"crossref","unstructured":"Wang L, Li J, Luo Y, Zheng J, Wang L, Li H, Xu K, Fang C, Shi J, Wu Z (2024) Advsv: an over-the-air adversarial attack dataset for speaker verification. In: ICASSP 2024-2024 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 4555\u20134559 IEEE","DOI":"10.1109\/ICASSP48485.2024.10446549"},{"key":"490_CR21","doi-asserted-by":"publisher","unstructured":"Wang Q, Yao J, Wang Z, Guo P, Xie L (2023) Pseudo-siamese network based timbre-reserved black-box adversarial attack in speaker identification. arXiv preprint arXiv:2305.19020. https:\/\/doi.org\/10.48550\/arXiv.2305.19020","DOI":"10.48550\/arXiv.2305.19020"},{"key":"490_CR22","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2024.125404","volume":"260","author":"F Yu","year":"2025","unstructured":"Yu F, Guan J, Wu H, Wang H, Ma B (2025) Multi-population differential evolution approach for feature selection with mutual information ranking. Expert Syst Appl 260:125404","journal-title":"Expert Syst Appl"},{"key":"490_CR23","unstructured":"Yu Z, Chang Y, Zhang N, Xiao C (2023) SMACK: semantically meaningful adversarial audio attack. In: 32nd USENIX security symposium (USENIX Security 23), pp. 
3799\u20133816"},{"key":"490_CR24","doi-asserted-by":"publisher","first-page":"118789","DOI":"10.1109\/ACCESS.2022.3220639","volume":"10","author":"X Zhang","year":"2022","unstructured":"Zhang X, Xu Y, Zhang S, Li X (2022) A highly stealthy adaptive decay attack against speaker recognition. IEEE Access 10:118789\u2013118805. https:\/\/doi.org\/10.1109\/ACCESS.2022.3220639","journal-title":"IEEE Access"},{"key":"490_CR25","first-page":"4014","volume":"2023","author":"M Zhang","year":"2023","unstructured":"Zhang M, Xu K, Li H, Wang L, Fang C, Shi J (2023) Doubledeceiver: deceiving the speaker verification system protected by spoofing countermeasures. Proc. Interspeech 2023:4014\u20134018","journal-title":"Proc. Interspeech"},{"issue":"3","key":"490_CR26","doi-asserted-by":"publisher","first-page":"620","DOI":"10.1049\/cit2.12295","volume":"9","author":"J Zhang","year":"2024","unstructured":"Zhang J, Tan H, Wang L, Qian Y, Gu Z (2024) Rethinking multi-spatial information for transferable adversarial attacks on speaker recognition systems. CAAI Trans Intell Technol 9(3):620\u2013631. https:\/\/doi.org\/10.1049\/cit2.12295","journal-title":"CAAI Trans Intell Technol"},{"key":"490_CR27","doi-asserted-by":"crossref","unstructured":"Zhang W, Zhao S, Liu L, Li J, Cheng X, Zheng TF, Hu X (2021) Attack on practical speaker verification system using universal adversarial perturbations. In: ICASSP 2021-2021 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 2575\u20132579 IEEE","DOI":"10.1109\/ICASSP39728.2021.9413467"},{"key":"490_CR28","doi-asserted-by":"crossref","unstructured":"Zhao A, Chu T, Liu Y, Li W, Li J, Duan L (2023) Minimizing maximum model discrepancy for transferable black-box targeted attacks. In: proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp. 
8153\u20138162","DOI":"10.1109\/CVPR52729.2023.00788"},{"key":"490_CR29","doi-asserted-by":"crossref","unstructured":"Zheng B, Jiang P, Wang Q, Li Q, Shen C, Wang C, Ge Y, Teng Q, Zhang S (2021) Black-box adversarial attacks on commercial speech platforms with minimal information. In: proceedings of the 2021 ACM SIGSAC conference on computer and communications security, pp. 86\u2013107","DOI":"10.1145\/3460120.3485383"},{"key":"490_CR30","doi-asserted-by":"crossref","unstructured":"Zuo C-X, Jia Z-J, Li W-J (2024) Advtts: adversarial text-to-speech synthesis attack on speaker identification systems. In: ICASSP 2024-2024 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 4840\u20134844 IEEE","DOI":"10.1109\/ICASSP48485.2024.10447190"}],"container-title":["Cybersecurity"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s42400-025-00490-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s42400-025-00490-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s42400-025-00490-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,30]],"date-time":"2025-11-30T23:02:31Z","timestamp":1764543751000},"score":1,"resource":{"primary":{"URL":"https:\/\/cybersecurity.springeropen.com\/articles\/10.1186\/s42400-025-00490-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,1]]},"references-count":30,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["490"],"URL":"https:\/\/doi.org\/10.1186\/s42400-025-00490-2","relation":{},"ISSN":["2523-3246"],"issn-type":[{"value":"2523-3246","type":"electronic"}],"subject":[],"published":{"
date-parts":[[2025,12,1]]},"assertion":[{"value":"1 April 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 September 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 December 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"107"}}