{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T14:42:47Z","timestamp":1775054567407,"version":"3.50.1"},"reference-count":49,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,6,17]],"date-time":"2022-06-17T00:00:00Z","timestamp":1655424000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,6,17]],"date-time":"2022-06-17T00:00:00Z","timestamp":1655424000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004608","name":"Natural Science Foundation of Jiangsu Province","doi-asserted-by":"publisher","award":["BK20180080"],"award-info":[{"award-number":["BK20180080"]}],"id":[{"id":"10.13039\/501100004608","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62071484"],"award-info":[{"award-number":["62071484"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["U20B2047"],"award-info":[{"award-number":["U20B2047"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Automatic speaker recognition is an important biometric authentication approach with emerging applications. However, recent research has shown its vulnerability on adversarial attacks. 
In this paper, we propose a new type of adversarial example by generating <jats:italic>imperceptible<\/jats:italic> adversarial samples for <jats:italic>targeted<\/jats:italic> attacks on <jats:italic>black-box<\/jats:italic> systems of automatic speaker recognition. Waveform samples are created directly by solving an optimization problem with waveform inputs and outputs, which is more realistic in real-life scenarios. Inspired by <jats:italic>auditory masking<\/jats:italic>, a regularization term that adapts to the energy of the speech waveform is proposed for generating imperceptible adversarial perturbations. The optimization problems are subsequently solved by a <jats:italic>differential evolution algorithm<\/jats:italic> in a black-box manner, which does not require any knowledge of the inner configuration of the recognition systems. Experiments conducted on commonly used data sets, LibriSpeech and VoxCeleb, show that the proposed methods successfully perform targeted attacks on state-of-the-art speaker recognition systems while remaining imperceptible to human listeners. 
Given the high SNR and PESQ scores of the yielded adversarial samples, the proposed methods deteriorate less on the quality of the original signals than several recently proposed methods, which justifies the imperceptibility of adversarial samples.<\/jats:p>","DOI":"10.1007\/s40747-022-00782-x","type":"journal-article","created":{"date-parts":[[2022,6,17]],"date-time":"2022-06-17T09:10:41Z","timestamp":1655457041000},"page":"65-79","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["Imperceptible black-box waveform-level adversarial attack towards automatic speaker recognition"],"prefix":"10.1007","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0783-684X","authenticated-orcid":false,"given":"Xingyu","family":"Zhang","sequence":"first","affiliation":[]},{"given":"Xiongwei","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Meng","family":"Sun","sequence":"additional","affiliation":[]},{"given":"Xia","family":"Zou","sequence":"additional","affiliation":[]},{"given":"Kejiang","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Nenghai","family":"Yu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,6,17]]},"reference":[{"key":"782_CR1","doi-asserted-by":"crossref","unstructured":"Ren H, Song Y, Yang S, Situ F (2016) Secure smart home: a voiceprint and internet-based authentication system for remote accessing. In Proc. 2016 11th international conference on computer science and education (ICCSE), Nagoya, Japan, Aug. 2016, pp 247\u2013251","DOI":"10.1109\/ICCSE.2016.7581588"},{"key":"782_CR2","doi-asserted-by":"crossref","unstructured":"Granqvist F, Seigel M, van Dalen R, Cahill A, Shum S, Paulik M (2020) Improving on-device speaker verification using federated learning with privacy. In: Proc. 
2020 21th annual conference of the international speech communication association (INTERSPEECH), Shanghai, China, Oct. 2020","DOI":"10.21437\/Interspeech.2020-2944"},{"issue":"6","key":"782_CR3","doi-asserted-by":"publisher","first-page":"74","DOI":"10.1109\/MSP.2015.2462851","volume":"32","author":"JH Hansen","year":"2015","unstructured":"Hansen JH, Hasan T (2015) Speaker recognition by machines and humans: a tutorial review. IEEE Signal Process Mag 32(6):74\u201399","journal-title":"IEEE Signal Process Mag"},{"key":"782_CR4","doi-asserted-by":"crossref","unstructured":"Kinnunen T, Sahidullah M, Delgado H, Todisco M, Evans N, Yamagishi J, Lee KA (2017) The ASVspoof 2017 challenge: assessing the limits of replay spoofing attack detection. In: Proc. 2017 18th annual conference of the international speech communication association (INTERSPEECH), Stockholm, Sweden, Aug. 2017, pp 2\u20136","DOI":"10.21437\/Interspeech.2017-1111"},{"key":"782_CR5","doi-asserted-by":"crossref","unstructured":"Todisco M, Wang X, Vestman V, Sahidullah M, Delgado H, Nautsch A, Yamagishi J, Evans N, Kinnunen T, Lee KA (2019) ASVspoof 2019: future horizons in spoofed and fake audio detection. In: Proc. 2019 20th annual conference of the international speech communication association (INTERSPEECH), Graz, Austria, Sep. 2019","DOI":"10.21437\/Interspeech.2019-2249"},{"key":"782_CR6","doi-asserted-by":"crossref","unstructured":"Lorenzo-Trueba J, Yamagishi J, Toda T, Saito D, Villavicencio F, Kinnunen T, Ling T (2018) The voice conversion challenge 2018: promoting development of parallel and nonparallel methods. In: Proc. 2018 19th annual conference of the international speech communication association (INTERSPEECH), Hyderabad, India, Sep. 2018","DOI":"10.21437\/Odyssey.2018-28"},{"key":"782_CR7","unstructured":"Voice Conversion Challenge (2020) Accessed Oct. 2020. 
https:\/\/vc-challenge.org"},{"key":"782_CR8","doi-asserted-by":"crossref","unstructured":"Kreuk F, Adi Y, Cisse M, Keshet J (2018) Fooling end-to-end speaker verification with adversarial examples. In: Proc.2018 IEEE international conference on acoustics, speech and signal processing (ICASSP), Calgary, AB, Canada, Apr. 2018, pp 1962\u20131966","DOI":"10.1109\/ICASSP.2018.8462693"},{"key":"782_CR9","doi-asserted-by":"crossref","unstructured":"Li X, Zhong J, Wu X, Yu J, Liu X, Meng H (2020) Adversarial attacks on GMM I-vector based speaker verification systems. In: Proc.2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), Barcelona, Spain, May 2020, pp 6579\u20136583","DOI":"10.1109\/ICASSP40776.2020.9053076"},{"key":"782_CR10","doi-asserted-by":"crossref","unstructured":"Li Z, Shi C, Xie Y, Liu J, Yuan B, Chen Y (2020) Practical adversarial attacks against speaker recognition systems. In: Proc. 21st international workshop on mobile computing systems and applications (ACM Hot Mobile), Austin, Texas, USA, Mar. 2020, pp 9\u201314","DOI":"10.1145\/3376897.3377856"},{"key":"782_CR11","doi-asserted-by":"crossref","unstructured":"Xie Y, Shi C, Li Z, Liu J, Chen Y, Yuan B (2020) Real-time, universal and robust adversarial attacks against speaker recognition systems. In: Proc. 2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), Barcelona, Spain, May 2020, pp 1738\u20131742","DOI":"10.1109\/ICASSP40776.2020.9053747"},{"key":"782_CR12","doi-asserted-by":"crossref","unstructured":"Wang Q, Guo P, Xie L (2020) Inaudible adversarial perturbations for targeted attack in speaker recognition. In: Proc. 2020 21th annual conference of the international speech communication association (INTERSPEECH), Shanghai, China, Oct. 
2020","DOI":"10.21437\/Interspeech.2020-1955"},{"key":"782_CR13","doi-asserted-by":"crossref","unstructured":"Jati A, Hsu CC, Pal M, Peri R, Abd Almageed W, Narayanan S (2021) Adversarial attack and defense strategies for deep speaker recognition systems. Comp Speech Lang 68(101199)","DOI":"10.1016\/j.csl.2021.101199"},{"key":"782_CR14","doi-asserted-by":"crossref","unstructured":"Chen G, Chen S, Fan L, Du X, Zhao Z, Song F, Liu Y (2021) Who is real bob? Adversarial attacks on speaker recognition systems. In: Proc. 2021 IEEE symposium on security and privacy workshops (SPW), San Francisco, CA, USA, May 2021","DOI":"10.1109\/SP40001.2021.00004"},{"key":"782_CR15","doi-asserted-by":"crossref","unstructured":"Abdullah H, Garcia W, Peeters C, Traynor P, Butler KRB, Wilson J (2019) Practical hidden voice attacks against speech and speaker recognition systems. In: Proc. Network and Distributed Systems Security (NDSS), San Diego, United States, Feb. 2019","DOI":"10.14722\/ndss.2019.23362"},{"key":"782_CR16","unstructured":"Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, Fergus R (2014) Intriguing properties of neural networks. In: Proc. 2nd international conference on learning representations (ICLR), Banff, Canada, Apr. 2014"},{"key":"782_CR17","unstructured":"Goodfellow IJ, Shlens J, Szegedy C (2015) Explaining and harnessing adversarial examples. In: Proc. 3nd international conference on learning representations (ICLR), Toronto, Canada, Jul. 2015"},{"key":"782_CR18","unstructured":"Kurakin A, Goodfellow I, Bengio S (2017) Adversarial examples in the physical world. In: Proc. 5nd international conference on learning representations (ICLR), Toulon, France, Apr. 2017"},{"key":"782_CR19","doi-asserted-by":"crossref","unstructured":"Papernot N, McDaniel P, Jha S, Fredrikson M, Celik ZB, Swami A (2016) The Limitations of Deep Learning in Adversarial Settings.In: Proc. 
2016 IEEE european symposium on security and privacy (Euro S&P), Saarbrucken, Germany, Mar. 2016, pp 372\u2013387","DOI":"10.1109\/EuroSP.2016.36"},{"issue":"5","key":"782_CR20","doi-asserted-by":"publisher","first-page":"828","DOI":"10.1109\/TEVC.2019.2890858","volume":"23","author":"J Su","year":"2019","unstructured":"Su J, Vargas DV, Sakurai K (2019) One pixel attack for fooling deep neural networks. IEEE Trans Evol Comput 23(5):828\u2013841","journal-title":"IEEE Trans Evol Comput"},{"key":"782_CR21","doi-asserted-by":"crossref","unstructured":"Carlini N, Wagner D (2017) Towards Evaluating the Robustness of Neural Networks. In: Proc.2017 symposium on IEEE security and privacy workshops (SPW), San Jose, CA, USA, May 2017, pp 39\u201357","DOI":"10.1109\/SP.2017.49"},{"key":"782_CR22","doi-asserted-by":"crossref","unstructured":"Moosavi-Dezfooli SM, Fawzi A, Frossard P (2016)Deepfool: a simple and accurate method to fool deep neural networks. In: Proc. 2016 IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, Nevada, USA, Jun. 2016, pp 2574\u20132582","DOI":"10.1109\/CVPR.2016.282"},{"key":"782_CR23","doi-asserted-by":"crossref","unstructured":"Moosavi-Dezfooli SM, Fawzi A, Fawzi O, Frossard P (2017) Universal Adversarial Perturbations. In: Proc. 2017 IEEE conference on computer vision and pattern recognition (CVPR), Honolulu, Hawaii, USA, Jul. 2017, pp 1765\u20131773","DOI":"10.1109\/CVPR.2017.17"},{"key":"782_CR24","doi-asserted-by":"crossref","unstructured":"Das RK, Tian X, Kinnunen T, Li H (2020) The attacker's perspective on automatic speaker verification: an overview. In: Proc. 2020 21th annual conference of the international speech communication association (INTERSPEECH), Shanghai, China, Oct. 
2020","DOI":"10.21437\/Interspeech.2020-1052"},{"issue":"5","key":"782_CR25","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3178115","volume":"9","author":"Z Zhang","year":"2018","unstructured":"Zhang Z, Geiger J, Pohjalainen J, Mousa AED, Jin W, Schuller B (2018) Deep learning for environmentally robust speech recognition: an overview of recent developments. ACM Trans Intell Syst Technol (TIST) 9(5):1\u201328","journal-title":"ACM Trans Intell Syst Technol (TIST)"},{"key":"782_CR26","doi-asserted-by":"crossref","unstructured":"Safavi S, Gan H, Mporas I, Sotudeh R (2016) Fraud detection in voice-based identity authentication applications and services. In: 2016 IEEE 16th international conference on data mining workshops (ICDMW), Barcelona, Spain, Dec. 2016, pp 1074\u20131081","DOI":"10.1109\/ICDMW.2016.0155"},{"issue":"9","key":"782_CR27","doi-asserted-by":"publisher","first-page":"2805","DOI":"10.1109\/TNNLS.2018.2886017","volume":"30","author":"X Yuan","year":"2019","unstructured":"Yuan X, He P, Zhu Q, Li X (2019) Adversarial examples: attacks and defenses for deep learning. IEEE Trans Neural Netw Learn Syst 30(9):2805\u20132824","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"782_CR28","unstructured":"Qin Y, Carlini N, Goodfellow I, Cottrell G, Raffel C (2019) Imperceptible, robust, and targeted adversarial examples for automatic speech recognition. In: Proc. 2019 36th international conference on machine learning (PMLR), Long Beach, California, 2019"},{"key":"782_CR29","doi-asserted-by":"crossref","unstructured":"Schonherr L, Kohls K, Zeiler S, Holz T, Kolossa D (2019) Adversarial attacks against automatic speech recognition systems via psychoacoustic hiding. In: Proc. 2019 network and distributed system security symposium (NDSS), San Diego, California, Feb. 
2019","DOI":"10.14722\/ndss.2019.23288"},{"key":"782_CR30","unstructured":"Ilyas A, Engstrom L, Athalye A, Lin J (2018) Black-box adversarial attacks with limited queries and information. In: Proc. 2018 35th international conference on machine learning (ICML), Stockholm, Sweden, Jul. 2018, pp 2137\u20132146"},{"key":"782_CR31","doi-asserted-by":"crossref","unstructured":"Wilkinghoff K (2020) On open-set speaker identification with I-vectors. In: Proc. Odyssey 2020 the speaker and language recognition workshop, Tokyo, Japan, May 2020, pp 408\u2013414","DOI":"10.21437\/Odyssey.2020-58"},{"issue":"11","key":"782_CR32","first-page":"2851","volume":"9","author":"T Liu","year":"2014","unstructured":"Liu T, Guan S (2014) Factor analysis method for text-independent speaker identification. J Softw (JSW) 9(11):2851\u20132860","journal-title":"J Softw (JSW)"},{"key":"782_CR33","doi-asserted-by":"crossref","unstructured":"Snyder D, Garcia-Romero D, Povey D, Khudanpur S (2017) Deep neural network embeddings for text-independent speaker verification. In: Proc. 2017 18th annual conference of the international speech communication association (INTERSPEECH), Stockholm, Sweden, Aug. 2017, pp 999\u20131003","DOI":"10.21437\/Interspeech.2017-620"},{"issue":"4","key":"782_CR34","doi-asserted-by":"publisher","first-page":"846","DOI":"10.1109\/TASLP.2014.2308473","volume":"22","author":"S Cumani","year":"2014","unstructured":"Cumani S, Plchot O, Laface P (2014) On the use of I-vector posterior distributions in probabilistic linear discriminant analysis. IEEE\/ACM Trans Audio Speech Lang Process 22(4):846\u2013857","journal-title":"IEEE\/ACM Trans Audio Speech Lang Process"},{"key":"782_CR35","doi-asserted-by":"crossref","unstructured":"Snyder D, Garcia-Romero D, Sell G, Povey D, Khudanpur S (2018) X-vectors: robust DNN embeddings for speaker recognition. In: Proc. 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP), Calgary, AB, Canada, Apr. 
2018, pp 5329\u20135333","DOI":"10.1109\/ICASSP.2018.8461375"},{"key":"782_CR36","doi-asserted-by":"publisher","DOI":"10.1201\/9781315154718","volume-title":"Hearing: An introduction to psychological and physiological acoustics","author":"SA Gelfand","year":"2017","unstructured":"Gelfand SA (2017) Hearing: An introduction to psychological and physiological acoustics, 6th edn. CRC Press, Boca Raton, FL, USA","edition":"6"},{"key":"782_CR37","doi-asserted-by":"publisher","first-page":"546","DOI":"10.1016\/j.swevo.2018.06.010","volume":"44","author":"KR Opara","year":"2019","unstructured":"Opara KR, Arabas J (2019) Differential evolution: a survey of theoretical analyses. Swarm Evol Comput 44:546\u2013558","journal-title":"Swarm Evol Comput"},{"key":"782_CR38","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.swevo.2016.01.004","volume":"27","author":"S Das","year":"2016","unstructured":"Das S, Mullick SS, Suganthan PN (2016) Recent advances in differential evolution\u2014an updated survey. Swarm Evol Comput 27:1\u201330","journal-title":"Swarm Evol Comput"},{"issue":"2","key":"782_CR39","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1504\/IJCSM.2014.064064","volume":"5","author":"WK Mashwani","year":"2014","unstructured":"Mashwani WK (2014) Enhanced versions of differential evolution: state of the art survey. Int J Comput Sci Math 5(2):107\u2013126","journal-title":"Int J Comput Sci Math"},{"issue":"4","key":"782_CR40","doi-asserted-by":"publisher","first-page":"560","DOI":"10.1109\/TEVC.2014.2360890","volume":"19","author":"L Tang","year":"2015","unstructured":"Tang L, Dong Y, Liu J (2015) Differential evolution with an individual-dependent mechanism. IEEE Trans Evol Comput 19(4):560\u2013574","journal-title":"IEEE Trans Evol Comput"},{"key":"782_CR41","doi-asserted-by":"crossref","unstructured":"Nagrani A, Chung JS, Zisserman A (2017) VoxCeleb: a large-scale speaker identification dataset. In: Proc. 
2017 18th conference of the international speech communication association (INTERSPEECH), Stockholm, Sweden, Aug. 2017, pp 2616\u20132620","DOI":"10.21437\/Interspeech.2017-950"},{"key":"782_CR42","doi-asserted-by":"crossref","unstructured":"Chung JS, Nagrani A, Zisserman A VoxCeleb2: deep speaker recognition. In: Proc. 2018 19th conference of the international speech communication association (INTERSPEECH), Hyderabad, India, Sept. 2018, pp 1086\u20131090","DOI":"10.21437\/Interspeech.2018-1929"},{"key":"782_CR43","doi-asserted-by":"crossref","unstructured":"Panayotov V, Chen G, Povey D, Khudanpur S (2015) Librispeech: an ASR corpus based on public domain audio books. In: Proc. 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), Brisbane, Australia, Apr. 2015, pp 5206\u20135210","DOI":"10.1109\/ICASSP.2015.7178964"},{"key":"782_CR44","unstructured":"Kaldi. Accessed: Nov. 2019. https:\/\/github.com\/kaldi-asr\/kaldi"},{"key":"782_CR45","unstructured":"Van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(11)"},{"key":"782_CR46","unstructured":"Microsoft Azure. Accessed: Mar. 2022. https:\/\/azure.microsoft.com\/zh-cn"},{"key":"782_CR47","unstructured":"Gong Y, Poellabauer C (2018) Crafting adversarial examples for speech paralinguistics applications. In: Proc. of dynamic and novel advances in machine learning and intelligent cyber security (DYNAMICS) Workshop, San Juan, Puerto Rico, USA, 2018"},{"key":"782_CR48","first-page":"1633","volume":"33","author":"F Tramer","year":"2020","unstructured":"Tramer F, Carlini N, Brendel W, Madry A (2020) On adaptive attacks to adversarial example defenses. Adv Neural Inf Process Syst 33:1633\u20131645","journal-title":"Adv Neural Inf Process Syst"},{"key":"782_CR49","unstructured":"Carlini N, Mishra P, Vaidya T, Zhang Y, Sherr M, Shields C, Wagner D, Zhou W (2016) Hidden voice commands. In: 25th USENIX security symposium (USENIX Security 16), Austin, TX, USA, Aug. 
2016, pp 513\u2013530"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00782-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-022-00782-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00782-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,22]],"date-time":"2023-02-22T18:45:57Z","timestamp":1677091557000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-022-00782-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,17]]},"references-count":49,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,2]]}},"alternative-id":["782"],"URL":"https:\/\/doi.org\/10.1007\/s40747-022-00782-x","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,6,17]]},"assertion":[{"value":"29 November 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 May 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 June 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they do not have any conflicts of interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}