{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,21]],"date-time":"2026-01-21T18:09:45Z","timestamp":1769018985001,"version":"3.49.0"},"reference-count":45,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,12,5]],"date-time":"2023-12-05T00:00:00Z","timestamp":1701734400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,12,5]],"date-time":"2023-12-05T00:00:00Z","timestamp":1701734400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Cybersecurity"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In recent years, deep learning (DL) models have achieved significant progress in many domains, such as autonomous driving, facial recognition, and speech recognition. However, the vulnerability of deep learning models to adversarial attacks has raised serious concerns in the community because of their insufficient robustness and generalization. Also, transferable attacks have become a prominent method for black-box attacks. In this work, we explore the potential factors that impact adversarial examples (AEs) transferability in DL-based speech recognition. We also discuss the vulnerability of different DL systems and the irregular nature of decision boundaries. Our results show a remarkable difference in the transferability of AEs between speech and images, with the data relevance being low in images but opposite in speech recognition. Motivated by dropout-based ensemble approaches, we propose random gradient ensembles and dynamic gradient-weighted ensembles, and we evaluate the impact of ensembles on the transferability of AEs. 
The results show that the AEs created by both approaches are valid for transfer to the black box API.<\/jats:p>","DOI":"10.1186\/s42400-023-00175-8","type":"journal-article","created":{"date-parts":[[2023,12,5]],"date-time":"2023-12-05T01:02:07Z","timestamp":1701738127000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Towards the transferable audio adversarial attack via ensemble methods"],"prefix":"10.1186","volume":"6","author":[{"given":"Feng","family":"Guo","sequence":"first","affiliation":[]},{"given":"Zheng","family":"Sun","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0000-5031","authenticated-orcid":false,"given":"Yuxuan","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Lei","family":"Ju","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,12,5]]},"reference":[{"key":"175_CR1","unstructured":"Athalye A, Carlini N, Wagner D (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. In: International conference on machine learning. PMLR, pp 274\u2013283"},{"key":"175_CR2","unstructured":"Balduzzi D, Frean M, Leary L, Lewis J, Ma KW-D, McWilliams B (2017) The shattered gradients problem: If resnets are the answer, then what is the question? In: International conference on machine learning. PMLR, pp 342\u2013350"},{"key":"175_CR3","doi-asserted-by":"crossref","unstructured":"Carlini N, Wagner D (2017) Towards evaluating the robustness of neural networks. In: 2017 IEEE symposium on security and privacy (sp). IEEE, pp 39\u201357","DOI":"10.1109\/SP.2017.49"},{"key":"175_CR4","doi-asserted-by":"crossref","unstructured":"Carlini N, Wagner D (2018) Audio adversarial examples: targeted attacks on speech-to-text. In: 2018 IEEE security and privacy workshops (SPW). 
IEEE, pp 1\u20137","DOI":"10.1109\/SPW.2018.00009"},{"key":"175_CR5","doi-asserted-by":"crossref","unstructured":"Che Z, Borji A, Zhai G, Ling S, Li J, Le\u00a0Callet P (2020) A new ensemble adversarial attack powered by long-term gradient memories. In: Proceedings of the AAAI conference on artificial intelligence, vol. 34, pp 3405\u20133413","DOI":"10.1609\/aaai.v34i04.5743"},{"issue":"3","key":"175_CR6","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3510582","volume":"25","author":"Y Chen","year":"2022","unstructured":"Chen Y, Zhang J, Yuan X, Zhang S, Chen K, Wang X, Guo S (2022) Sok: a modularized approach to study the security of automatic speech recognition systems. ACM Trans Priv Secur 25(3):1\u201331","journal-title":"ACM Trans Priv Secur"},{"key":"175_CR7","unstructured":"Chen Y, Yuan X, Zhang J, Zhao Y, Zhang S, Chen K, Wang X (2020) Devil\u2019s whisper: a general approach for physical adversarial attacks against commercial black-box speech recognition devices. In: USENIX security symposium, pp 2667\u20132684"},{"key":"175_CR8","unstructured":"Cortes C, Lawarence N, Lee D, Sugiyama M, Garnett R (2015) Advances in neural information processing systems 28. In: Proceedings of the 29th annual conference on neural information processing systems"},{"key":"175_CR9","doi-asserted-by":"crossref","unstructured":"Dong Y, Liao F, Pang T, Su H, Zhu J, Hu X, Li J (2018) Boosting adversarial attacks with momentum. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9185\u20139193","DOI":"10.1109\/CVPR.2018.00957"},{"key":"175_CR10","doi-asserted-by":"crossref","unstructured":"Dong Y, Pang T, Su H, Zhu J (2019) Evading defenses to transferable adversarial examples by translation-invariant attacks. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 4312\u20134321","DOI":"10.1109\/CVPR.2019.00444"},{"key":"175_CR11","doi-asserted-by":"publisher","first-page":"74","DOI":"10.1016\/j.neunet.2022.01.003","volume":"148","author":"H Du","year":"2022","unstructured":"Du H, Xie L, Li H (2022) Noise-robust voice conversion with domain adversarial training. Neural Netw 148:74\u201384","journal-title":"Neural Netw"},{"key":"175_CR12","doi-asserted-by":"crossref","unstructured":"Du T, Ji S, Li J, Gu Q, Wang T, Beyah R (2020) Sirenattack: generating adversarial audio for end-to-end acoustic systems. In: Proceedings of the 15th ACM Asia conference on computer and communications security, pp 357\u2013369","DOI":"10.1145\/3320269.3384733"},{"key":"175_CR13","doi-asserted-by":"crossref","unstructured":"Hang J, Han K, Chen H, Li Y (2020) Ensemble adversarial black-box attacks against deep learning systems. Pattern Recogn 101:107184","DOI":"10.1016\/j.patcog.2019.107184"},{"key":"175_CR14","doi-asserted-by":"crossref","unstructured":"He Z, Rakin AS, Fan D (2019) Parametric noise injection: trainable randomness to improve deep neural network robustness against adversarial attack. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 588\u2013597","DOI":"10.1109\/CVPR.2019.00068"},{"key":"175_CR15","doi-asserted-by":"crossref","unstructured":"Huang Q, Katsman I, He H, Gu Z, Belongie S, Lim S-N (2019) Enhancing adversarial example transferability with an intermediate level attack. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 4733\u20134742","DOI":"10.1109\/ICCV.2019.00483"},{"key":"175_CR16","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2022.109286","volume":"137","author":"H Kim","year":"2023","unstructured":"Kim H, Park J, Lee J (2023) Generating transferable adversarial examples for speech classification. 
Pattern Recogn 137:109286","journal-title":"Pattern Recogn"},{"key":"175_CR17","doi-asserted-by":"crossref","unstructured":"Kim WJ, Hong S, Yoon S-E (2022) Diverse generative perturbations on attention space for transferable adversarial attacks. In: 2022 IEEE international conference on image processing (ICIP). IEEE, pp 281\u2013285","DOI":"10.1109\/ICIP46576.2022.9897346"},{"key":"175_CR18","unstructured":"Li Z, Bhojanapalli S, Zaheer M, Reddi S, Kumar S (2022) Robust training of neural networks using scale invariant architectures. In: International conference on machine learning. PMLR, pp 12656\u201312684"},{"key":"175_CR19","doi-asserted-by":"crossref","unstructured":"Li Y, Bai S, Zhou Y, Xie C, Zhang Z, Yuille A (2020) Learning transferable adversarial examples via ghost networks. In: Proceedings of the AAAI conference on artificial intelligence, vol. 34, pp 11458\u201311465","DOI":"10.1609\/aaai.v34i07.6810"},{"key":"175_CR20","doi-asserted-by":"crossref","unstructured":"Lin Z, Peng A, Wei R, Yu W, Zeng H (2022) An enhanced transferable adversarial attack of scale-invariant methods. In: 2022 IEEE international conference on image processing (ICIP). IEEE, pp 3788\u20133792","DOI":"10.1109\/ICIP46576.2022.9897429"},{"key":"175_CR21","unstructured":"Liu Y, Chen X, Liu C, Song D (2016) Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770"},{"key":"175_CR22","doi-asserted-by":"crossref","unstructured":"Long Y, Zhang Q, Zeng B, Gao L, Liu X, Zhang J, Song J (2022) Frequency domain model augmentation for adversarial attack. In: Computer vision\u2013ECCV 2022: 17th European conference, Tel Aviv, Israel, October 23\u201327, 2022, proceedings, part IV. Springer, pp. 549\u2013566","DOI":"10.1007\/978-3-031-19772-7_32"},{"key":"175_CR23","doi-asserted-by":"crossref","unstructured":"Moosavi-Dezfooli S-M, Fawzi A, Frossard P (2016) Deepfool: a simple and accurate method to fool deep neural networks. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2574\u20132582","DOI":"10.1109\/CVPR.2016.282"},{"key":"175_CR24","doi-asserted-by":"crossref","unstructured":"Neekhara P, Hussain S, Pandey P, Dubnov S, McAuley J, Koushanfar F (2019) Universal adversarial perturbations for speech recognition systems. arXiv preprint arXiv:1905.03828","DOI":"10.21437\/Interspeech.2019-1353"},{"key":"175_CR25","doi-asserted-by":"crossref","unstructured":"Papernot N, McDaniel P, Jha S, Fredrikson M, Celik ZB, Swami A (2016) The limitations of deep learning in adversarial settings. In: 2016 IEEE European symposium on security and privacy (EuroS &P). IEEE, pp 372\u2013387","DOI":"10.1109\/EuroSP.2016.36"},{"key":"175_CR26","doi-asserted-by":"crossref","unstructured":"Papernot N, McDaniel P, Goodfellow I, Jha S, Celik ZB, Swami A (2017) Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM on Asia conference on computer and communications security, pp 506\u2013519","DOI":"10.1145\/3052973.3053009"},{"key":"175_CR27","unstructured":"Qin Y, Carlini N, Cottrell G, Goodfellow I, Raffel C (2019) Imperceptible, robust, and targeted adversarial examples for automatic speech recognition. In: International conference on machine learning. PMLR, pp 5231\u20135240"},{"issue":"3","key":"175_CR28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3398394","volume":"53","author":"A Serban","year":"2020","unstructured":"Serban A, Poll E, Visser J (2020) Adversarial examples on object recognition: a comprehensive survey. ACM Comput Surv (CSUR) 53(3):1\u201338","journal-title":"ACM Comput Surv (CSUR)"},{"key":"175_CR29","unstructured":"Smilkov D, Thorat N, Kim B, Vi\u00e9gas F, Wattenberg M (2017) Smoothgrad: removing noise by adding noise. 
arXiv preprint arXiv:1706.03825"},{"key":"175_CR30","unstructured":"Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, Fergus R (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199"},{"key":"175_CR31","doi-asserted-by":"crossref","unstructured":"Taori R, Kamsetty A, Chu B, Vemuri N (2019) Targeted adversarial examples for black box audio systems. In: 2019 IEEE security and privacy workshops (SPW). IEEE, pp 15\u201320","DOI":"10.1109\/SPW.2019.00016"},{"key":"175_CR32","unstructured":"Tram\u00e8r F, Papernot N, Goodfellow I, Boneh D, McDaniel P (2017) The space of transferable adversarial examples. arXiv preprint arXiv:1704.03453"},{"key":"175_CR33","unstructured":"Tram\u00e8r F, Kurakin A, Papernot N, Goodfellow I, Boneh D, McDaniel P (2017) Ensemble adversarial training: attacks and defenses. arXiv preprint arXiv:1705.07204"},{"key":"175_CR34","doi-asserted-by":"crossref","unstructured":"Wu W, Su Y, Lyu MR, King I (2021) Improving the transferability of adversarial samples with adversarial transformations. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 9024\u20139033","DOI":"10.1109\/CVPR46437.2021.00891"},{"key":"175_CR35","unstructured":"Wu L, Zhu Z, Tai C, et al. (2018) Understanding and enhancing the transferability of adversarial examples. arXiv preprint arXiv:1802.09707"},{"key":"175_CR36","unstructured":"Wu D, Wang Y, Xia S-T, Bailey J, Ma X (2020) Skip connections matter: on the transferability of adversarial examples generated with resnets. arXiv preprint arXiv:2002.05990"},{"key":"175_CR37","unstructured":"Wu L, Zhu Z, Tai C et al. (2018) Understanding and enhancing the transferability of adversarial examples. arXiv preprint arXiv:1802.09707"},{"key":"175_CR38","doi-asserted-by":"crossref","unstructured":"Xie C, Zhang Z, Zhou Y, Bai S, Wang J, Ren Z, Yuille AL (2019) Improving transferability of adversarial examples with input diversity. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 2730\u20132739","DOI":"10.1109\/CVPR.2019.00284"},{"key":"175_CR39","doi-asserted-by":"crossref","unstructured":"Xiong W, Droppo J, Huang X, Seide F, Seltzer ML, Stolcke A, Yu D, Zweig G (2016) The microsoft 2016 conversational speech recognition system. In: 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 5255\u20135259","DOI":"10.1109\/ICASSP.2017.7953159"},{"key":"175_CR40","doi-asserted-by":"crossref","unstructured":"Xiong Y, Lin J, Zhang M, Hopcroft JE, He K (2022) Stochastic variance reduced ensemble adversarial attack for boosting the adversarial transferability. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 14983\u201314992","DOI":"10.1109\/CVPR52688.2022.01456"},{"key":"175_CR41","unstructured":"Xu J, Zhang J, Zhu J, Yang Y (2022) Disappeared command: spoofing attack on automatic speech recognition systems with sound masking. arXiv preprint arXiv:2204.08977"},{"key":"175_CR42","unstructured":"Xu M, Zhang T, Li Z, Zhang D (2022) Scale-invariant adversarial attack for evaluating and enhancing adversarial defenses. arXiv preprint arXiv:2201.12527"},{"key":"175_CR43","unstructured":"Yuan X, Chen Y, Zhao Y, Long Y, Liu X, Chen K, Zhang S, Huang H, Wang X, Gunter CA (2018) Commandersong: a systematic approach for practical adversarial voice recognition. In: 27th USENIX security symposium (USENIX security 18), pp 49\u201364"},{"key":"175_CR44","doi-asserted-by":"crossref","unstructured":"Zheng B, Jiang P, Wang Q, Li Q, Shen C, Wang C, Ge Y, Teng Q, Zhang S (2021) Black-box adversarial attacks on commercial speech platforms with minimal information. 
In: Proceedings of the 2021 ACM SIGSAC conference on computer and communications security, pp 86\u2013107","DOI":"10.1145\/3460120.3485383"},{"key":"175_CR45","doi-asserted-by":"publisher","first-page":"6487","DOI":"10.1109\/TIP.2022.3211736","volume":"31","author":"Y Zhu","year":"2022","unstructured":"Zhu Y, Chen Y, Li X, Chen K, He Y, Tian X, Zheng B, Chen Y, Huang Q (2022) Toward understanding and boosting adversarial transferability from a distribution perspective. IEEE Trans Image Process 31:6487\u20136501","journal-title":"IEEE Trans Image Process"}],"container-title":["Cybersecurity"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s42400-023-00175-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s42400-023-00175-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s42400-023-00175-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,5]],"date-time":"2023-12-05T01:04:36Z","timestamp":1701738276000},"score":1,"resource":{"primary":{"URL":"https:\/\/cybersecurity.springeropen.com\/articles\/10.1186\/s42400-023-00175-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,5]]},"references-count":45,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["175"],"URL":"https:\/\/doi.org\/10.1186\/s42400-023-00175-8","relation":{},"ISSN":["2523-3246"],"issn-type":[{"value":"2523-3246","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,5]]},"assertion":[{"value":"12 April 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 July 
2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 December 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"44"}}