{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,19]],"date-time":"2026-01-19T02:15:50Z","timestamp":1768788950751,"version":"3.49.0"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2024,5,31]],"date-time":"2024-05-31T00:00:00Z","timestamp":1717113600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"MSIT (Ministry of Science and ICT), Korea, under the ITRC","award":["IITP-2023-2018-0-01441"],"award-info":[{"award-number":["IITP-2023-2018-0-01441"]}]},{"name":"IITP"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Priv. Secur."],"published-print":{"date-parts":[[2024,5,31]]},"abstract":"<jats:p>Automatic speech recognition (ASR) systems are vulnerable to audio adversarial examples, which aim at deceiving ASR systems by adding perturbations to benign speech signals. These audio adversarial examples appear indistinguishable from benign audio waves, but the ASR system decodes them as intentional malicious commands. Previous studies have demonstrated the feasibility of such attacks in simulated environments (over-line) and have further showcased the creation of robust physical audio adversarial examples (over-air). Various defense techniques have been proposed to counter these attacks. However, most of them have either failed to handle various types of attacks effectively or have resulted in significant time overhead.<\/jats:p>\n          <jats:p>In this article, we propose a novel method for detecting audio adversarial examples. Our approach involves feeding both smoothed audio and original audio inputs into the ASR system. Subsequently, we introduce noise to the logits before providing them to the decoder of the ASR. 
We demonstrate that carefully selected noise can considerably influence the transcription results of audio adversarial examples while having minimal impact on the transcription of benign audio waves. Leveraging this characteristic, we detect audio adversarial examples by comparing the altered transcription, resulting from logit noising, with the original transcription. The proposed method can be easily applied to ASR systems without requiring any structural modifications or additional training. Experimental results indicate that the proposed method exhibits robustness against both over-line and over-air audio adversarial examples, outperforming state-of-the-art detection methods.<\/jats:p>","DOI":"10.1145\/3661822","type":"journal-article","created":{"date-parts":[[2024,4,26]],"date-time":"2024-04-26T11:31:35Z","timestamp":1714131095000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Toward Robust ASR System against Audio Adversarial Examples using Agitated Logit"],"prefix":"10.1145","volume":"27","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1574-905X","authenticated-orcid":false,"given":"Namgyu","family":"Park","sequence":"first","affiliation":[{"name":"POSTECH, Pohang, The Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0484-0790","authenticated-orcid":false,"given":"Jong","family":"Kim","sequence":"additional","affiliation":[{"name":"POSTECH, Pohang, The Republic of Korea"}]}],"member":"320","published-online":{"date-parts":[[2024,6,10]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"Mart\u00edn Abadi Paul Barham Jianmin Chen Zhifeng Chen Andy Davis Jeffrey Dean Matthieu Devin Sanjay Ghemawat Geoffrey Irving Michael Isard Manjunath Kudlur Josh Levenberg Rajat Monga Sherry Moore Derek G. Murray Benoit Steiner Paul Tucker Vijay Vasudevan Pete Warden Martin Wicke Yuan Yu and Xiaoqiang Zheng. 2016. TensorFlow: A system for large-scale machine learning. 
In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI'16). USENIX Association Savannah GA 265--283. https:\/\/www.usenix.org\/conference\/osdi16\/technical-sessions\/presentation\/abadi"},{"key":"e_1_3_2_3_2","doi-asserted-by":"crossref","DOI":"10.14722\/ndss.2019.23362","article-title":"Practical hidden voice attacks against speech and speaker recognition systems","author":"Abdullah Hadi","year":"2019","unstructured":"Hadi Abdullah, Washington Garcia, Christian Peeters, Patrick Traynor, Kevin R. B. Butler, and Joseph Wilson. 2019. Practical hidden voice attacks against speech and speaker recognition systems. In Network and Distributed System Security Symposium (NDSS'19).","journal-title":"Network and Distributed System Security Symposium (NDSS'19)"},{"key":"e_1_3_2_4_2","first-page":"142","volume-title":"Proceedings of the 2021 IEEE Symposium on Security and Privacy","author":"Abdullah H.","year":"2021","unstructured":"H. Abdullah, M. Rahman, W. Garcia, K. Warren, A. Swarnim Yadav, T. Shrimpton, and P. Traynor. 2021. Hear \u201cNo Evil\u201d, see \u201ckenansville\u201d*: Efficient and transferable black-box attacks on speech recognition and voice identification systems. In Proceedings of the 2021 IEEE Symposium on Security and Privacy. IEEE Computer Society, Los Alamitos, CA, USA, 142\u2013159. DOI:10.1109\/SP40001.2021.00009"},{"key":"e_1_3_2_5_2","doi-asserted-by":"crossref","unstructured":"Hadi Abdullah Kevin Warren Vincent Bindschaedler Nicolas Papernot and Patrick Traynor. 2021. SoK: The faults in our ASRs: An overview of attacks against automatic speech recognition and speaker identification systems. 
In IEEE Symposium on Security and Privacy (IEEE S&P).","DOI":"10.1109\/SP40001.2021.00014"},{"issue":"1","key":"e_1_3_2_6_2","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1109\/T-C.1974.223784","article-title":"Discrete cosine transform","volume":"100","author":"Ahmed Nasir","year":"1974","unstructured":"Nasir Ahmed, T. Natarajan, and Kamisetty R. Rao. 1974. Discrete cosine transform. IEEE Transactions on Computers 100, 1 (1974), 90\u201393.","journal-title":"IEEE Transactions on Computers"},{"key":"e_1_3_2_7_2","unstructured":"Amazon. 2024. Amazon Alexa. Retrieved from https:\/\/developer.amazon.com\/en-US\/alexa"},{"key":"e_1_3_2_8_2","unstructured":"Dario Amodei Sundaram Ananthanarayanan Rishita Anubhai Jingliang Bai Eric Battenberg Carl Case Jared Casper Bryan Catanzaro Qiang Cheng Guoliang Chen Jie Chen Jingdong Chen Zhijie Chen Mike Chrzanowski Adam Coates Greg Diamos Ke Ding Niandong Du Erich Elsen Jesse Engel Weiwei Fang Linxi Fan Christopher Fougner Liang Gao Caixia Gong Awni Hannun Tony Han Lappi Johannes Bing Jiang Cai Ju Billy Jun Patrick LeGresley Libby Lin Junjie Liu Yang Liu Weigao Li Xiangang Li Dongpeng Ma Sharan Narang Andrew Ng Sherjil Ozair Yiping Peng Ryan Prenger Sheng Qian Zongfeng Quan Jonathan Raiman Vinay Rao Sanjeev Satheesh David Seetapun Shubho Sengupta Kavya Srinet Anuroop Sriram Haiyuan Tang Liliang Tang Chong Wang Jidong Wang Kaifu Wang Yi Wang Zhijian Wang Zhiqian Wang Shuang Wu Likai Wei Bo Xiao Wen Xie Yan Xie Dani Yogatama Bin Yuan Jun Zhan and Zhenyao Zhu. 2016. Deep Speech 2: End-to-end speech recognition in English and Mandarin. In Proceedings of The 33rd International Conference on Machine Learning (Proceedings of Machine Learning Research Vol. 48) Maria Florina Balcan and Kilian Q. Weinberger (Eds.). PMLR New York New York USA 173--182. https:\/\/proceedings.mlr.press\/v48\/amodei16.html"},{"key":"e_1_3_2_9_2","unstructured":"Apple. 2024. Apple Siri. 
https:\/\/www.apple.com\/siri"},{"key":"e_1_3_2_10_2","first-page":"284","volume-title":"Proceedings of the 35th International Conference on Machine Learning","author":"Athalye Anish","year":"2018","unstructured":"Anish Athalye, Logan Engstrom, Andrew Ilyas, and Kevin Kwok. 2018. Synthesizing robust adversarial examples. In Proceedings of the 35th International Conference on Machine Learning, Jennifer Dy and Andreas Krause (Eds.). PMLR, 284\u2013293. Retrieved from http:\/\/proceedings.mlr.press\/v80\/athalye18b.html"},{"key":"e_1_3_2_11_2","first-page":"24","volume-title":"Proceedings of the 2021 6th International Conference on Advances in Biomedical Engineering","author":"Ayache Mohammad","year":"2021","unstructured":"Mohammad Ayache, Hussien Kanaan, Kawthar Kassir, and Yasser Kassir. 2021. Speech command recognition using deep learning. In Proceedings of the 2021 6th International Conference on Advances in Biomedical Engineering. IEEE, 24\u201329."},{"key":"e_1_3_2_12_2","unstructured":"Alexei Baevski Henry Zhou Abdelrahman Mohamed and Michael Auli. 2020. wav2vec 2.0: A framework for self-supervised learning of speech representations. In Proceedings of the 34th International Conference on Neural Information Processing Systems (Vancouver BC Canada ) (NIPS'20). Curran Associates Inc. Red Hook NY USA Article 1044 (2020) 12."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2023-906"},{"key":"e_1_3_2_14_2","first-page":"513","volume-title":"Proceedings of the Usenix Security Symposium","author":"Carlini Nicholas","year":"2016","unstructured":"Nicholas Carlini, Pratyush Mishra, Tavish Vaidya, Yuankai Zhang, Micah Sherr, Clay Shields, David A. Wagner, and Wenchao Zhou. 2016. Hidden voice commands. In Proceedings of the Usenix Security Symposium. 
513\u2013530."},{"key":"e_1_3_2_15_2","first-page":"1","volume-title":"Proceedings of the 2018 IEEE Security and Privacy Workshops","author":"Carlini Nicholas","year":"2018","unstructured":"Nicholas Carlini and David Wagner. 2018. Audio adversarial examples: Targeted attacks on speech-to-text. In Proceedings of the 2018 IEEE Security and Privacy Workshops. 1\u20137. DOI:10.1109\/SPW.2018.00009"},{"key":"e_1_3_2_16_2","volume-title":"Proceedings of the NDSS","author":"Chen Tao","year":"2020","unstructured":"Tao Chen, Longfei Shangguan, Zhenjiang Li, and Kyle Jamieson. 2020. Metamorph: Injecting inaudible commands into over-the-air voice controlled systems. In Proceedings of the NDSS."},{"key":"e_1_3_2_17_2","doi-asserted-by":"crossref","first-page":"1861","DOI":"10.1145\/3460120.3485365","volume-title":"Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security","author":"Chen Yanjiao","year":"2021","unstructured":"Yanjiao Chen, Yijie Bai, Richard Mitev, Kaibo Wang, Ahmad-Reza Sadeghi, and Wenyuan Xu. 2021. FakeWake: Understanding and mitigating fake wake-up words of voice assistants. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (Virtual Event, Republic of Korea). Association for Computing Machinery, New York, NY, USA, 1861\u20131883. DOI:10.1145\/3460120.3485365"},{"key":"e_1_3_2_18_2","first-page":"2667","volume-title":"Proceedings of the 29th USENIX Security Symposium","author":"Chen Yuxuan","year":"2020","unstructured":"Yuxuan Chen, Xuejing Yuan, Jiangshan Zhang, Yue Zhao, Shengzhi Zhang, Kai Chen, and XiaoFeng Wang. 2020. Devil\u2019s Whisper: A general approach for physical adversarial attacks against commercial black-box speech recognition devices. In Proceedings of the 29th USENIX Security Symposium. USENIX Association, 2667\u20132684. 
Retrieved from https:\/\/www.usenix.org\/conference\/usenixsecurity20\/presentation\/chen-yuxuan"},{"key":"e_1_3_2_19_2","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1145\/3385003.3410921","volume-title":"Proceedings of the 1st ACM Workshop on Security and Privacy on Artificial Intelligence","author":"D\u00f6rr Tom","year":"2020","unstructured":"Tom D\u00f6rr, Karla Markert, Nicolas M. M\u00fcller, and Konstantin B\u00f6ttinger. 2020. Towards resistant audio adversarial examples. In Proceedings of the 1st ACM Workshop on Security and Privacy on Artificial Intelligence (Taipei, Taiwan). Association for Computing Machinery, New York, NY, USA, 3\u201310. DOI:10.1145\/3385003.3410921"},{"key":"e_1_3_2_20_2","first-page":"3986","volume-title":"Proceedings of the 28th ACM International Conference on Multimedia","author":"Du Xia","year":"2020","unstructured":"Xia Du, Chi-Man Pun, and Zheng Zhang. 2020. A unified framework for detecting audio adversarial examples. In Proceedings of the 28th ACM International Conference on Multimedia (Seattle, WA, USA). Association for Computing Machinery, New York, NY, USA, 3986\u20133994. DOI:10.1145\/3394171.3413603"},{"key":"e_1_3_2_21_2","unstructured":"Google. 2024. Google Assistant. https:\/\/assistant.google.com\/"},{"key":"e_1_3_2_22_2","doi-asserted-by":"crossref","first-page":"369","DOI":"10.1145\/1143844.1143891","volume-title":"Proceedings of the 23rd International Conference on Machine Learning","author":"Graves Alex","year":"2006","unstructured":"Alex Graves, Santiago Fern\u00e1ndez, Faustino Gomez, and J\u00fcrgen Schmidhuber. 2006. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd International Conference on Machine Learning (Pittsburgh, Pennsylvania, USA). Association for Computing Machinery, New York, NY, USA, 369\u2013376. 
DOI:10.1145\/1143844.1143891"},{"key":"e_1_3_2_23_2","unstructured":"Awni Hannun Carl Case Jared Casper Bryan Catanzaro Greg Diamos Erich Elsen Ryan Prenger Sanjeev Satheesh Shubho Sengupta Adam Coates and Andrew Y. Ng. 2014. Deep Speech: Scaling up end-to-end Speech Recognition. arXiv:1412.5567. Retrieved from https:\/\/arxiv.org\/abs\/1412.5567"},{"key":"e_1_3_2_24_2","doi-asserted-by":"crossref","first-page":"2521","DOI":"10.1145\/3319535.3363246","volume-title":"Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security","author":"Kwon Hyun","year":"2019","unstructured":"Hyun Kwon, Hyunsoo Yoon, and Ki-Woong Park. 2019. POSTER: Detecting audio adversarial example through audio modification. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (London, United Kingdom). Association for Computing Machinery, New York, NY, USA, 2521\u20132523. DOI:10.1145\/3319535.3363246"},{"key":"e_1_3_2_25_2","first-page":"2","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Hong Kong","author":"Lamere Paul","year":"2003","unstructured":"Paul Lamere, Philip Kwok, Evandro Gouvea, Bhiksha Raj, Rita Singh, William Walker, Manfred Warmuth, and Peter Wolf. 2003. The CMU SPHINX-4 speech recognition system. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Hong Kong. 2\u20135."},{"key":"e_1_3_2_26_2","first-page":"707","volume-title":"Proceedings of the Soviet Physics Doklady","author":"Levenshtein Vladimir I.","year":"1966","unstructured":"Vladimir I. Levenshtein. 1966. Binary codes capable of correcting deletions, insertions, and reversals. In Proceedings of the Soviet Physics Doklady. Soviet Union, 707\u2013710."},{"key":"e_1_3_2_27_2","first-page":"2272","volume-title":"Proceedings of the 21st Annual Conference of the International Speech Communication Association. 
Virtual Event, Shanghai, China, 25-29 October 2020.","author":"Li Ruirui","year":"2020","unstructured":"Ruirui Li, Jyun-Yu Jiang, Xian Wu, Chu-Cheng Hsieh, and Andreas Stolcke. 2020. Speaker identification for household scenarios with self-attention and adversarial training. In Proceedings of the 21st Annual Conference of the International Speech Communication Association. Virtual Event, Shanghai, China, 25-29 October 2020. Helen Meng, Bo Xu, and Thomas Fang Zheng (Eds.), ISCA, 2272\u20132276. DOI:10.21437\/Interspeech.2020-3025"},{"key":"e_1_3_2_28_2","article-title":"Robust detection of machine-induced audio attacks in intelligent audio systems with microphone array","author":"Li Zhuohang","year":"2021","unstructured":"Zhuohang Li, Cong Shi, Tianfang Zhang, Yi Xie, Jian Liu, Bo Yuan, and Yingying Chen. 2021. Robust detection of machine-induced audio attacks in intelligent audio systems with microphone array. Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:243092171","journal-title":"Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security."},{"key":"e_1_3_2_29_2","first-page":"1121","volume-title":"Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security","author":"Li Zhuohang","year":"2020","unstructured":"Zhuohang Li, Yi Wu, Jian Liu, Yingying Chen, and Bo Yuan. 2020. AdvPulse: Universal, synchronization-free, and targeted audio adversarial attacks via subsecond perturbations. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (Virtual Event, USA). Association for Computing Machinery, New York, NY, USA, 1121\u20131134. 
DOI:10.1145\/3372297.3423348"},{"key":"e_1_3_2_30_2","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1016\/j.neuroimage.2016.03.080","article-title":"Neurobiology of knowledge and misperception of lyrics","volume":"134","author":"Lid\u00e9n Claudia Beck","year":"2016","unstructured":"Claudia Beck Lid\u00e9n, Oliver Kr\u00fcger, Lena Schwarz, Michael Erb, Bernd Kardatzki, Klaus Scheffler, and Thomas Ethofer. 2016. Neurobiology of knowledge and misperception of lyrics. NeuroImage 134, 4 (2016), 12\u201321.","journal-title":"NeuroImage"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.5928"},{"key":"e_1_3_2_32_2","article-title":"Towards deep learning models resistant to adversarial attacks","author":"Madry Aleksander","year":"2018","unstructured":"Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2018. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=rJzIBfZAb","journal-title":"International Conference on Learning Representations"},{"key":"e_1_3_2_33_2","unstructured":"Microsoft. 2024. Microsoft Cortana. https:\/\/www.microsoft.com\/en-us\/cortana"},{"key":"e_1_3_2_34_2","article-title":"Voice recognition algorithms using mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques","author":"Muda Lindasalwa","year":"2010","unstructured":"Lindasalwa Muda, Mumtaj Begam, and Irraivan Elamvazuthi. 2010. Voice recognition algorithms using mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques. 
Journal of Computing 2, 3 (2010).","journal-title":"Journal of Computing"},{"key":"e_1_3_2_35_2","first-page":"5206","volume-title":"Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing","author":"Panayotov Vassil","year":"2015","unstructured":"Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. 2015. Librispeech: An ASR corpus based on public domain audio books. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. 5206\u20135210. DOI:10.1109\/ICASSP.2015.7178964"},{"key":"e_1_3_2_36_2","doi-asserted-by":"crossref","first-page":"586","DOI":"10.1145\/3485832.3485912","volume-title":"Proceedings of the Annual Computer Security Applications Conference","author":"Park Namgyu","year":"2021","unstructured":"Namgyu Park, Sangwoo Ji, and Jong Kim. 2021. Detecting audio adversarial examples with logit noising. In Proceedings of the Annual Computer Security Applications Conference. 586\u2013595."},{"key":"e_1_3_2_37_2","volume-title":"IEEE 2011 Workshop on Automatic Speech Recognition and Understanding","unstructured":"Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, Jan Silovsky, Georg Stemmer, and Karel Vesel\u00fd. 2011. The Kaldi speech recognition toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society."},{"key":"e_1_3_2_38_2","volume-title":"Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing","author":"Pratap Vineel","year":"2019","unstructured":"Vineel Pratap, Awni Hannun, Qiantong Xu, Jeff Cai, Jacob Kahn, Gabriel Synnaeve, Vitaliy Liptchinsky, and Ronan Collobert. 2019. Wav2Letter++: A fast open-source speech recognition system. In Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE. 
DOI:10.1109\/icassp.2019.8683535"},{"key":"e_1_3_2_39_2","first-page":"5231","volume-title":"Proceedings of the 36th International Conference on Machine Learning.","author":"Qin Yao","year":"2019","unstructured":"Yao Qin, Nicholas Carlini, Garrison Cottrell, Ian Goodfellow, and Colin Raffel. 2019. Imperceptible, robust, and targeted adversarial examples for automatic speech recognition. In Proceedings of the 36th International Conference on Machine Learning. Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), PMLR, 5231\u20135240. Retrieved from http:\/\/proceedings.mlr.press\/v97\/qin19a.html"},{"key":"e_1_3_2_40_2","volume-title":"Digital Processing of Speech Signals","year":"1978","unstructured":"Lawrence R. Rabiner and Ronald W. Schafer. 1978. Digital Processing of Speech Signals. Prentice-hall."},{"key":"e_1_3_2_41_2","doi-asserted-by":"crossref","first-page":"843","DOI":"10.1145\/3427228.3427276","volume-title":"Proceedings of the Annual Computer Security Applications Conference","author":"Sch\u00f6nherr Lea","year":"2020","unstructured":"Lea Sch\u00f6nherr, Thorsten Eisenhofer, Steffen Zeiler, Thorsten Holz, and Dorothea Kolossa. 2020. Imperio: Robust over-the-air adversarial examples for automatic speech recognition systems. In Proceedings of the Annual Computer Security Applications Conference (Austin, USA). Association for Computing Machinery, New York, NY, USA, 843\u2013855. DOI:10.1145\/3427228.3427276"},{"key":"e_1_3_2_42_2","doi-asserted-by":"crossref","unstructured":"Lea Sch\u00f6nherr Katharina Kohls Steffen Zeiler Thorsten Holz and Dorothea Kolossa. 2019. Adversarial attacks against automatic speech recognition systems via psychoacoustic hiding. In 26th Annual Network and Distributed System Security Symposium (NDSS'19). San Diego California USA February 24-27 2019. The Internet Society. 
https:\/\/www.ndss-symposium.org\/ndss-paper\/adversarial-attacks-against-automatic-speech-recognition-systems-via-psychoacoustic-hiding\/","DOI":"10.14722\/ndss.2019.23288"},{"key":"e_1_3_2_43_2","unstructured":"Jonathan Shen Patrick Nguyen Yonghui Wu Zhifeng Chen Mia Xu Chen Ye Jia Anjuli Kannan Tara N. Sainath Yuan Cao Chung-Cheng Chiu Yanzhang He Jan Chorowski Smit Hinsu Stella Laurenzo James Qin Orhan Firat Wolfgang Macherey Suyog Gupta Ankur Bapna Shuyuan Zhang Ruoming Pang Ron J. Weiss Rohit Prabhavalkar Qiao Liang Benoit Jacob Bowen Liang HyoukJoong Lee Ciprian Chelba S\u00e9bastien Jean Bo Li Melvin Johnson Rohan Anil Rajat Tibrewal Xiaobing Liu Akiko Eriguchi Navdeep Jaitly Naveen Ari Colin Cherry Parisa Haghani Otavio Good Youlong Cheng Raziel Alvarez Isaac Caswell Wei-Ning Hsu Zongheng Yang Kuan-Chieh Wang Ekaterina Gonina Katrin Tomanek Ben Vanik Zelin Wu Llion Jones Mike Schuster Yanping Huang Dehao Chen Kazuki Irie George F. Foster John Richardson Klaus Macherey Antoine Bruguier Heiga Zen Colin Raffel Shankar Kumar Kanishka Rao David Rybach Matthew Murray Vijayaditya Peddinti Maxim Krikun Michiel Bacchiani Thomas B. Jablin Robert Suderman Ian Williams Benjamin Lee Deepti Bhatia Justin Carlson Semih Yavuz Yu Zhang Ian McGraw Max Galkin Qi Ge Golan Pundak Chad Whipkey Todd Wang Uri Alon Dmitry Lepikhin Ye Tian Sara Sabour William Chan Shubham Toshniwal Baohua Liao Michael Nirschl and Pat Rondon. 2019. Lingvo: A modular and scalable framework for sequence-to-sequence modeling. CoRR abs\/1902.08295 (2019). arXiv:1902.08295. http:\/\/arxiv.org\/abs\/1902.08295"},{"key":"e_1_3_2_44_2","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1109\/SPW.2019.00016","volume-title":"Proceedings of the 2019 IEEE Security and Privacy Workshops","author":"Taori Rohan","year":"2019","unstructured":"Rohan Taori, Amog Kamsetty, Brenton Chu, and Nikita Vemuri. 2019. Targeted adversarial examples for black box audio systems. 
In Proceedings of the 2019 IEEE Security and Privacy Workshops. 15\u201320. DOI:10.1109\/SPW.2019.00016"},{"key":"e_1_3_2_45_2","volume-title":"Proceedings of the 9th USENIX Workshop on Offensive Technologies","author":"Vaidya Tavish","year":"2015","unstructured":"Tavish Vaidya, Yuankai Zhang, Micah Sherr, and Clay Shields. 2015. Cocaine noodles: Exploiting the gap between human and machine speech recognition. In Proceedings of the 9th USENIX Workshop on Offensive Technologies."},{"key":"e_1_3_2_46_2","first-page":"6366","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Brighton, United Kingdom, May 12-17, 2019","author":"Wang Xiong","year":"2019","unstructured":"Xiong Wang, Sining Sun, Changhao Shan, Jingyong Hou, Lei Xie, Shen Li, and Xin Lei. 2019. Adversarial examples for improving end-to-end attention-based small-footprint keyword spotting. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Brighton, United Kingdom, May 12-17, 2019. IEEE, 6366\u20136370. DOI:10.1109\/ICASSP.2019.8683479"},{"key":"e_1_3_2_47_2","first-page":"247","volume-title":"Proceedings of the 32nd USENIX Security Symposium","author":"Wu Xinghui","year":"2023","unstructured":"Xinghui Wu, Shiqing Ma, Chao Shen, Chenhao Lin, Qian Wang, Qi Li, and Yuan Rao. 2023. KENKU: Towards efficient and stealthy black-box adversarial attacks against ASR systems. In Proceedings of the 32nd USENIX Security Symposium. USENIX Association, Anaheim, CA, 247\u2013264. Retrieved from https:\/\/www.usenix.org\/conference\/usenixsecurity23\/presentation\/wu-xinghui"},{"key":"e_1_3_2_48_2","first-page":"5334","volume-title":"Proceedings of the 28th International Joint Conference on Artificial Intelligence","author":"Yakura Hiromu","year":"2019","unstructured":"Hiromu Yakura and Jun Sakuma. 2019. Robust audio adversarial example for a physical attack. 
In Proceedings of the 28th International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, 5334\u20135341. DOI:10.24963\/ijcai.2019\/741"},{"key":"e_1_3_2_49_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Yang Zhuolin","year":"2018","unstructured":"Zhuolin Yang, Bo Li, Pin-Yu Chen, and Dawn Song. 2018. Characterizing audio adversarial examples using temporal dependency. In Proceedings of the International Conference on Learning Representations."},{"issue":"4","key":"e_1_3_2_50_2","doi-asserted-by":"crossref","first-page":"936","DOI":"10.3390\/electronics12040936","article-title":"AdVulCode: Generating adversarial vulnerable code against deep learning-based vulnerability detectors","volume":"12","author":"Yu Xueqi","year":"2023","unstructured":"Xueqi Yu, Zhen Li, Xiang Huang, and Shasha Zhao. 2023. AdVulCode: Generating adversarial vulnerable code against deep learning-based vulnerability detectors. Electronics 12, 4 (2023), 936.","journal-title":"Electronics"},{"key":"e_1_3_2_51_2","first-page":"3799","volume-title":"Proceedings of the 32nd USENIX Security Symposium","author":"Yu Zhiyuan","year":"2023","unstructured":"Zhiyuan Yu, Yuanhaur Chang, Ning Zhang, and Chaowei Xiao. 2023. SMACK: Semantically meaningful adversarial audio attack. In Proceedings of the 32nd USENIX Security Symposium. 3799\u20133816."},{"key":"e_1_3_2_52_2","first-page":"49","volume-title":"Proceedings of the 27th USENIX Security Symposium","author":"Yuan Xuejing","year":"2018","unstructured":"Xuejing Yuan, Yuxuan Chen, Yue Zhao, Yunhui Long, Xiaokang Liu, Kai Chen, Shengzhi Zhang, Heqing Huang, Xiaofeng Wang, and Carl A. Gunter. 2018. CommanderSong: A systematic approach for practical adversarial voice recognition. In Proceedings of the 27th USENIX Security Symposium. 
49\u201364."},{"key":"e_1_3_2_53_2","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1145\/3460120.3485383","volume-title":"Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security","author":"Zheng Baolin","year":"2021","unstructured":"Baolin Zheng, Peipei Jiang, Qian Wang, Qi Li, Chao Shen, Cong Wang, Yunjie Ge, Qingyang Teng, and Shenyi Zhang. 2021. Black-box adversarial attacks on commercial speech platforms with minimal information. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (Virtual Event, Republic of Korea). Association for Computing Machinery, New York, NY, USA, 86\u2013107. DOI:10.1145\/3460120.3485383"}],"container-title":["ACM Transactions on Privacy and Security"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3661822","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3661822","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:06:21Z","timestamp":1750291581000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3661822"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,31]]},"references-count":52,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,5,31]]}},"alternative-id":["10.1145\/3661822"],"URL":"https:\/\/doi.org\/10.1145\/3661822","relation":{},"ISSN":["2471-2566","2471-2574"],"issn-type":[{"value":"2471-2566","type":"print"},{"value":"2471-2574","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,31]]},"assertion":[{"value":"2023-07-07","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2024-04-23","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}