{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,5]],"date-time":"2026-02-05T09:22:42Z","timestamp":1770283362728,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":83,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,7,18]],"date-time":"2022-07-18T00:00:00Z","timestamp":1658102400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,7,18]]},"DOI":"10.1145\/3533767.3534391","type":"proceedings-article","created":{"date-parts":[[2022,7,15]],"date-time":"2022-07-15T14:28:50Z","timestamp":1657895330000},"page":"189-201","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":22,"title":["ASRTest: automated testing for deep-neural-network-driven speech recognition systems"],"prefix":"10.1145","author":[{"given":"Pin","family":"Ji","sequence":"first","affiliation":[{"name":"Nanjing University, China"}]},{"given":"Yang","family":"Feng","sequence":"additional","affiliation":[{"name":"Nanjing University, China"}]},{"given":"Jia","family":"Liu","sequence":"additional","affiliation":[{"name":"Nanjing University, China"}]},{"given":"Zhihong","family":"Zhao","sequence":"additional","affiliation":[{"name":"Nanjing University, China"}]},{"given":"Zhenyu","family":"Chen","sequence":"additional","affiliation":[{"name":"Nanjing University, China"}]}],"member":"320","published-online":{"date-parts":[[2022,7,18]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"[n.d.]. https:\/\/www.apple.com\/siri\/"},{"key":"e_1_3_2_1_2_1","volume-title":"Amazon has figured out what\u2019s behind Alexa\u2019s random laugh that was freaking people out. 
https:\/\/www.usatoday.com\/story\/tech\/2018\/03\/07\/alexas-weird-random-laughter-freaking-people-out\/404476002\/","year":"2018","unstructured":"[n.d.]. Amazon has figured out what\u2019s behind Alexa\u2019s random laugh that was freaking people out. https:\/\/www.usatoday.com\/story\/tech\/2018\/03\/07\/alexas-weird-random-laughter-freaking-people-out\/404476002\/ 7 March 2018."},{"key":"e_1_3_2_1_3_1","unstructured":"[n.d.]. Baidu Research. http:\/\/research.baidu.com\/"},{"key":"e_1_3_2_1_4_1","unstructured":"[n.d.]. The Github repository of ASRTest. https:\/\/github.com\/SATE-Lab\/ASRTest"},{"key":"e_1_3_2_1_5_1","unstructured":"[n.d.]. The Github repository of DeepSpeech2. https:\/\/github.com\/PaddlePaddle\/DeepSpeech"},{"key":"e_1_3_2_1_6_1","unstructured":"[n.d.]. The Github repository of Paddle. https:\/\/github.com\/PaddlePaddle\/Paddle"},{"key":"e_1_3_2_1_7_1","unstructured":"[n.d.]. The Github repository of Pyroomacoustics. https:\/\/github.com\/LCAV\/pyroomacoustics"},{"key":"e_1_3_2_1_8_1","volume-title":"Study: Voice-activated systems in cars raise the risk of accidents. https:\/\/www.extremetech.com\/extreme\/216860-study-voice-activated-systems-in-cars-raise-the-risk-of-accidents","year":"2015","unstructured":"[n.d.]. 
Study: Voice-activated systems in cars raise the risk of accidents. https:\/\/www.extremetech.com\/extreme\/216860-study-voice-activated-systems-in-cars-raise-the-risk-of-accidents 26 October 2015."},{"key":"e_1_3_2_1_9_1","volume-title":"Washington Garcia, Logan Blue, Kevin Warren, Anurag Swarnim Yadav, Tom Shrimpton, and Patrick Traynor.","author":"Abdullah Hadi","year":"2019","unstructured":"Hadi Abdullah, Muhammad Sajidur Rahman, Washington Garcia, Logan Blue, Kevin Warren, Anurag Swarnim Yadav, Tom Shrimpton, and Patrick Traynor. 2019. Hear \"no evil\", see \"kenansville\": Efficient and transferable black-box attacks on speech recognition and voice identification systems. arXiv preprint arXiv:1910.05262."},{"key":"e_1_3_2_1_10_1","volume-title":"Acoustical and environmental robustness in automatic speech recognition. 201","author":"Acero Alex","unstructured":"Alex Acero. 1992. Acoustical and environmental robustness in automatic speech recognition. 201, Springer Science & Business Media."},{"key":"e_1_3_2_1_11_1","volume-title":"International conference on machine learning. 
173\u2013182","author":"Amodei Dario","year":"2016","unstructured":"Dario Amodei, Sundaram Ananthanarayanan, Rishita Anubhai, Jingliang Bai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Qiang Cheng, and Guoliang Chen. 2016. Deep speech 2: End-to-end speech recognition in english and mandarin. In International conference on machine learning. 173\u2013182."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME46990.2020.00066"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Muhammad Hilmi Asyrofi Zhou Yang and David Lo. 2021. CrossASR++: A Modular Differential Testing Framework for Automatic Speech Recognition. arXiv preprint arXiv:2105.14881.","DOI":"10.26226\/morressier.613b5418842293c031b5b5ef"},{"key":"e_1_3_2_1_14_1","unstructured":"Dzmitry Bahdanau Kyunghyun Cho and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. 
arXiv preprint arXiv:1409.0473."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASRU.2017.8268937"},{"key":"e_1_3_2_1_16_1","volume-title":"Davide Del Testa","author":"Bojarski Mariusz","year":"2016","unstructured":"Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, Mathew Monfort, Urs Muller, and Jiakai Zhang. 2016. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1044\/jshd.2803.221"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/SPW.2018.00009"},{"key":"e_1_3_2_1_19_1","unstructured":"Tsong Y Chen Shing C Cheung and Shiu Ming Yiu. 2020. Metamorphic testing: a new approach for generating next test cases. arXiv preprint arXiv:2002.12543."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3143561"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP39728.2021.9413535"},{"key":"e_1_3_2_1_22_1","unstructured":"Chung-Cheng Chiu and Colin Raffel. 2017. Monotonic chunkwise attention. 
arXiv preprint arXiv:1712.05382."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8462105"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8462506"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-020-00391-w"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3395363.3397357"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143891"},{"key":"e_1_3_2_1_28_1","volume-title":"International conference on machine learning. 1764\u20131772","author":"Graves Alex","year":"2014","unstructured":"Alex Graves and Navdeep Jaitly. 2014. Towards end-to-end speech recognition with recurrent neural networks. In International conference on machine learning. 1764\u20131772."},{"key":"e_1_3_2_1_29_1","volume-title":"Conformer: Convolution-augmented transformer for speech recognition. arXiv preprint arXiv:2005.08100.","author":"Gulati Anmol","year":"2020","unstructured":"Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, and Yonghui Wu. 2020. Conformer: Convolution-augmented transformer for speech recognition. arXiv preprint arXiv:2005.08100."},{"key":"e_1_3_2_1_30_1","volume-title":"Che-Wei Huang, Andreas Stolcke, and Roland Maas.","author":"Guo Jinxi","year":"2020","unstructured":"Jinxi Guo, Gautam Tiwari, Jasha Droppo, Maarten Van Segbroeck, Che-Wei Huang, Andreas Stolcke, and Roland Maas. 2020. 
Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition. arXiv preprint arXiv:2007.13802."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP39728.2021.9414858"},{"key":"e_1_3_2_1_32_1","unstructured":"Awni Hannun Carl Case Jared Casper Bryan Catanzaro Greg Diamos Erich Elsen Ryan Prenger Sanjeev Satheesh Shubho Sengupta and Adam Coates. 2014. Deep speech: Scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00065"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8682336"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-99579-3_21"},{"key":"e_1_3_2_1_36_1","volume-title":"White noise: an infinite dimensional calculus. 253","author":"Hida Takeyuki","unstructured":"Takeyuki Hida, Hui-Hsiung Kuo, J\u00fcrgen Potthoff, and Ludwig Streit. 2013. White noise: an infinite dimensional calculus. 
253, Springer Science & Business Media."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2005-138"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9054098"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"crossref","unstructured":"Kazuki Irie Rohit Prabhavalkar Anjuli Kannan Antoine Bruguier David Rybach and Patrick Nguyen. 2019. On the choice of modeling unit for sequence-to-sequence speech recognition. arXiv preprint arXiv:1902.01955.","DOI":"10.21437\/Interspeech.2019-2277"},{"key":"e_1_3_2_1_40_1","volume-title":"International conference on computer graphics, simulation and modeling. 135\u2013138","author":"Ittichaichareon Chadawan","year":"2012","unstructured":"Chadawan Ittichaichareon, Siwat Suksri, and Thaweesak Yingthawornsuk. 2012. Speech recognition using MFCC. In International conference on computer graphics, simulation and modeling. 135\u2013138."},{"key":"e_1_3_2_1_41_1","unstructured":"Mahaveer Jain Kjell Schubert Jay Mahadeokar Ching-Feng Yeh Kaustubh Kalgaonkar Anuroop Sriram Christian Fuegen and Michael L Seltzer. 2019. RNN-T for latency controlled ASR with improved beam search. 
arXiv preprint arXiv:1911.01629."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"crossref","unstructured":"Vikas Joshi Rui Zhao Rupesh R Mehta Kshitiz Kumar and Jinyu Li. 2020. Transfer Learning Approaches for Streaming End-to-End Speech Recognition System. arXiv preprint arXiv:2008.05086.","DOI":"10.21437\/Interspeech.2020-2345"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9054295"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASRU.2013.6707748"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASRU46091.2019.9003750"},{"key":"e_1_3_2_1_46_1","unstructured":"Chanwoo Kim Ananya Misra Kean Chin Thad Hughes Arun Narayanan Tara Sainath and Michiel Bacchiani. 2017. Generation of large-scale simulated utterances in virtual rooms to train deep-neural networks for far-field speech recognition in Google Home."},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7953075"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"crossref","unstructured":"Tom Ko Vijayaditya Peddinti Daniel Povey and Sanjeev Khudanpur. 2015. Audio augmentation for speech recognition. 
In Sixteenth annual conference of the international speech communication association.","DOI":"10.21437\/Interspeech.2015-711"},{"key":"e_1_3_2_1_49_1","article-title":"Exploring strategies for training deep neural networks","volume":"10","author":"Larochelle Hugo","year":"2009","unstructured":"Hugo Larochelle, Yoshua Bengio, J\u00e9r\u00f4me Louradour, and Pascal Lamblin. 2009. Exploring strategies for training deep neural networks. Journal of machine learning research, 10, 1 (2009).","journal-title":"Journal of machine learning research"},{"key":"e_1_3_2_1_50_1","volume-title":"A method for the solution of certain non-linear problems in least squares. Quarterly of applied mathematics, 2, 2","author":"Levenberg Kenneth","year":"1944","unstructured":"Kenneth Levenberg. 1944. A method for the solution of certain non-linear problems in least squares. Quarterly of applied mathematics, 2, 2 (1944), 164\u2013168."},{"key":"e_1_3_2_1_51_1","unstructured":"Vladimir I Levenshtein. 1966. Binary codes capable of correcting deletions insertions and reversals. In Soviet physics doklady. 10 707\u2013710."},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2014.2304637"},{"key":"e_1_3_2_1_53_1","unstructured":"Jinyu Li Yu Wu Yashesh Gaur Chengyi Wang Rui Zhao and Shujie Liu. 2020. On the comparison of popular end-to-end models for large scale speech recognition. arXiv preprint arXiv:2005.14327.
"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASRU46091.2019.9003906"},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"crossref","unstructured":"Bo Liu Ying Wei Yu Zhang and Qiang Yang. 2017. Deep Neural Networks for High Dimension Low Sample Size Data. In IJCAI. 2287\u20132293.","DOI":"10.24963\/ijcai.2017\/318"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2016.7472765"},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9054476"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2015.7178964"},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3361566"},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.3390\/s20082326"},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8461809"},{"key":"e_1_3_2_1_62_1","unstructured":"Anthony Rousseau Paul Del\u00e9glise and Yannick Esteve. 2014. Enhancing the TED-LIUM corpus with selected data for language modeling and more TED talks. In LREC. 3935\u20133939."},{"key":"e_1_3_2_1_63_1","doi-asserted-by":"crossref","unstructured":"Lea Sch\u00f6nherr Katharina Kohls Steffen Zeiler Thorsten Holz and Dorothea Kolossa. 2018. Adversarial attacks against automatic speech recognition systems via psychoacoustic hiding. arXiv preprint arXiv:1808.05665.
","DOI":"10.14722\/ndss.2019.23288"},{"key":"e_1_3_2_1_64_1","first-page":"562","article-title":"Audio pitch shifting using the constant-Q transform","volume":"61","author":"Sch\u00f6rkhuber Christian","year":"2013","unstructured":"Christian Sch\u00f6rkhuber, Anssi Klapuri, and Alois Sontacchi. 2013. Audio pitch shifting using the constant-Q transform. Journal of the Audio Engineering Society, 61, 7\/8 (2013), 562\u2013572.","journal-title":"Journal of the Audio Engineering Society"},{"key":"e_1_3_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2016.2532875"},{"key":"e_1_3_2_1_66_1","unstructured":"Ben J Shannon and Kuldip K Paliwal. 2003. A comparative study of filter bank spacing for speech recognition. In Microelectronic engineering research conference. 41 310\u201312."},{"key":"e_1_3_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.2478\/jaiscr-2019-0006"},{"key":"e_1_3_2_1_68_1","volume-title":"Kyle Kastner, Aaron Courville, and Yoshua Bengio.","author":"Sotelo Jose","year":"2017","unstructured":"Jose Sotelo, Soroush Mehri, Kundan Kumar, Joao Felipe Santos, Kyle Kastner, Aaron Courville, and Yoshua Bengio. 2017. 
Char2wav: End-to-end speech synthesis."},{"key":"e_1_3_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1016\/0167-6393(93)90095-3"},{"key":"e_1_3_2_1_70_1","volume-title":"\u0141ukasz Kaiser, and Illia Polosukhin","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998\u20136008."},{"key":"e_1_3_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9053461"},{"key":"e_1_3_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.3390\/sym11081018"},{"key":"e_1_3_2_1_73_1","volume-title":"SIEVE: Secure In-Vehicle Automatic Speech Recognition Systems. In 23rd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2020","author":"Wang Shu","year":"2020","unstructured":"Shu Wang, Jiahao Cao, Kun Sun, and Qi Li. 2020. SIEVE: Secure In-Vehicle Automatic Speech Recognition Systems. In 23rd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2020). USENIX Association, San Sebastian. 365\u2013379. 
isbn:978-1-939133-18-2 https:\/\/www.usenix.org\/conference\/raid2020\/presentation\/wang-shu"},{"key":"e_1_3_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2018-1456"},{"key":"e_1_3_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSTSP.2017.2763455"},{"key":"e_1_3_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jcim.8b00785"},{"key":"e_1_3_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1002\/stv.430"},{"key":"e_1_3_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2012.2205029"},{"key":"e_1_3_2_1_79_1","volume-title":"AUTOMATIC SPEECH RECOGNITION","author":"Yu Dong","unstructured":"Dong Yu and Li Deng. 2016. AUTOMATIC SPEECH RECOGNITION. Springer."},{"key":"e_1_3_2_1_80_1","volume-title":"Commandersong: A systematic approach for practical adversarial voice recognition. In 27th USENIX Security Symposium (USENIX Security 18). 49\u201364.","author":"Yuan Xuejing","year":"2018","unstructured":"Xuejing Yuan, Yuxuan Chen, Yue Zhao, Yunhui Long, Xiaokang Liu, Kai Chen, Shengzhi Zhang, Heqing Huang, Xiaofeng Wang, and Carl A Gunter. 2018. Commandersong: A systematic approach for practical adversarial voice recognition. In 27th USENIX Security Symposium (USENIX Security 18). 49\u201364."},{"key":"e_1_3_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9053896"},{"key":"e_1_3_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2004-55"},{"key":"e_1_3_2_1_83_1","doi-asserted-by":"crossref","unstructured":"Shiyu Zhou Linhao Dong Shuang Xu and Bo Xu. 2018. 
Syllable-based sequence-to-sequence speech recognition with the transformer in mandarin chinese. arXiv preprint arXiv:1804.10752.","DOI":"10.21437\/Interspeech.2018-1107"}],"event":{"name":"ISSTA '22: 31st ACM SIGSOFT International Symposium on Software Testing and Analysis","location":"Virtual South Korea","acronym":"ISSTA '22","sponsor":["SIGSOFT ACM Special Interest Group on Software Engineering"]},"container-title":["Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3533767.3534391","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3533767.3534391","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T18:43:41Z","timestamp":1750272221000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3533767.3534391"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,18]]},"references-count":83,"alternative-id":["10.1145\/3533767.3534391","10.1145\/3533767"],"URL":"https:\/\/doi.org\/10.1145\/3533767.3534391","relation":{},"subject":[],"published":{"date-parts":[[2022,7,18]]},"assertion":[{"value":"2022-07-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}