{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,25]],"date-time":"2026-04-25T15:20:09Z","timestamp":1777130409173,"version":"3.51.4"},"reference-count":85,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2025,4,28]],"date-time":"2025-04-28T00:00:00Z","timestamp":1745798400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nd\/4.0\/"}],"funder":[{"name":"Ministry of Education, Singapore under its Academic Research Fund Tier 3","award":["MOET32020-0004"],"award-info":[{"award-number":["MOET32020-0004"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2025,5,31]]},"abstract":"<jats:p>\n            As Automated Speech Recognition (ASR) systems gain widespread acceptance, there is a pressing need to rigorously test and enhance their performance. Nonetheless, the process of collecting and executing speech test cases is typically both costly and time-consuming. This presents a compelling case for the strategic prioritization of speech test cases, which consist of a piece of\n            <jats:italic>audio<\/jats:italic>\n            and the corresponding\n            <jats:italic>reference text<\/jats:italic>\n            . The central question we address is:\n            <jats:italic>In what sequence should speech test cases be collected and executed to identify the maximum number of errors at the earliest stage<\/jats:italic>\n            ? In this study, we introduce PRiOritizing sPeecH tEsT (\n            <jats:sc>Prophet<\/jats:sc>\n            ) cases, a tool designed to predict the likelihood that speech test cases will identify errors. Consequently,\n            <jats:sc>Prophet<\/jats:sc>\n            can assess and prioritize these test cases without having to run the ASR system, facilitating large-scale analysis. Our evaluation encompasses\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(6\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            distinct prioritization techniques across\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(3\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            ASR systems and\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(12\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            datasets. When constrained by the same test budget, our approach identified\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(15.44\\%\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            more misrecognized words than the leading state-of-the-art method. We select top-ranked speech test cases from the prioritized list to fine-tune ASR systems and analyze how our approach can improve the ASR system performance. Statistical evaluations show that our method delivers a considerably higher performance boost for ASR systems compared to established baseline techniques. Moreover, our correlation analysis confirms that fine-tuning an ASR system with a dataset where the model initially underperforms tends to yield greater performance improvements.\n          <\/jats:p>","DOI":"10.1145\/3707450","type":"journal-article","created":{"date-parts":[[2024,12,9]],"date-time":"2024-12-09T16:11:58Z","timestamp":1733760718000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Prioritizing Speech Test Cases"],"prefix":"10.1145","volume":"34","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5938-1918","authenticated-orcid":false,"given":"Zhou","family":"Yang","sequence":"first","affiliation":[{"name":"Singapore Management University, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0799-5018","authenticated-orcid":false,"given":"Jieke","family":"Shi","sequence":"additional","affiliation":[{"name":"Singapore Management University, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0862-2579","authenticated-orcid":false,"given":"Muhammad Hilmi","family":"Asyrofi","sequence":"additional","affiliation":[{"name":"PropertyGuru Group, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1006-8493","authenticated-orcid":false,"given":"Bowen","family":"Xu","sequence":"additional","affiliation":[{"name":"North Carolina State University at Raleigh, Raleigh, North Carolina, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4558-0622","authenticated-orcid":false,"given":"Xin","family":"Zhou","sequence":"additional","affiliation":[{"name":"Singapore Management University, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6072-1799","authenticated-orcid":false,"given":"Donggyun","family":"Han","sequence":"additional","affiliation":[{"name":"Royal Holloway University of London, Egham, United Kingdom of Great Britain and Northern Ireland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4367-7201","authenticated-orcid":false,"given":"David","family":"Lo","sequence":"additional","affiliation":[{"name":"Singapore Management University, Singapore, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2025,4,28]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"Keyu An Xian Shi and Shiliang Zhang. 2023. BAT: Boundary aware transducer for memory-efficient and low-latency ASR. arXiv:2305.11571. Retrieved from https:\/\/arxiv.org\/abs\/2305.11571"},{"key":"e_1_3_2_3_2","first-page":"4218","volume-title":"12th Language Resources and Evaluation Conference","author":"Ardila Rosana","year":"2020","unstructured":"Rosana Ardila, Megan Branson, Kelly Davis, Michael Kohler, Josh Meyer, Michael Henretty, Reuben Morais, Lindsay Saunders, Francis Tyers, and Gregor Weber. 2020. Common voice: A massively-multilingual speech corpus. In 12th Language Resources and Evaluation Conference. European Language Resources Association, Marseille, France, 4218\u20134222. Retrieved from https:\/\/aclanthology.org\/2020.lrec-1.520"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1111\/opo.12131"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME46990.2020.00066"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3468264.3473124"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME52107.2021.00079"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2021.3136169"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP39728.2021.9414830"},{"key":"e_1_3_2_10_2","unstructured":"Abdul Hameed Azeemi Ihsan Ayyub Qazi and Agha Ali Raza. 2022. Towards representative subset selection for self-supervised speech recognition. arXiv:2203.09829. Retrieved from https:\/\/arxiv.org\/abs\/2203.09829"},{"key":"e_1_3_2_11_2","unstructured":"Massa Baali Tomoki Hayashi Hamdy Mubarak Soumi Maiti Shinji Watanabe Wassim El-Hajj and Ahmed Ali. 2023. Unsupervised data selection for TTS: Using Arabic broadcast news as a case study. arXiv:2301.09099. Retrieved from https:\/\/arxiv.org\/abs\/2301.09099"},{"key":"e_1_3_2_12_2","first-page":"1298","volume-title":"International Conference on Machine Learning","author":"Baevski Alexei","year":"2022","unstructured":"Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, and Michael Auli. 2022. Data2vec: A general framework for self-supervised learning in speech, vision and language. In International Conference on Machine Learning. PMLR, 1298\u20131312."},{"key":"e_1_3_2_13_2","first-page":"12449","article-title":"wav2vec 2.0: A framework for self-supervised learning of speech representations","author":"Baevski Alexei","year":"2020","unstructured":"Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. 2020. wav2vec 2.0: A framework for self-supervised learning of speech representations. In Advances in Neural Information Processing Systems, 12449\u201312460.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3236024.3236053"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/jstsp.2022.3188113"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP49357.2023.10095326"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP48485.2024.10447553"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3460319.3464810"},{"key":"e_1_3_2_19_2","unstructured":"Kevin Clark Minh-Thang Luong Quoc V. Le and Christopher D. Manning. 2020. ELECTRA: Pre-training text encoders as discriminators rather than generators. arXiv:2003.10555. Retrieved from https:\/\/arxiv.org\/abs\/2003.10555"},{"key":"e_1_3_2_20_2","volume-title":"Mental Workload of Common Voice-Based Vehicle Interactions across Six Different Vehicle Systems","author":"Cooper Joel M.","year":"2014","unstructured":"Joel M. Cooper, Hailey Ingebretsen, and David L. Strayer. 2014. Mental Workload of Common Voice-Based Vehicle Interactions across Six Different Vehicle Systems. AAA Foundation for Traffic Safety. Technical Report, Washington, DC."},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1423"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICECCS51672.2020.00016"},{"key":"e_1_3_2_23_2","unstructured":"Jiayu Du Jinpeng Li Guoguo Chen and Wei-Qiang Zhang. 2024. SpeechColab leaderboard: An open-source platform for automatic speech recognition evaluation. arXiv:2403.08196. Retrieved from https:\/\/arxiv.org\/abs\/2403.08196"},{"key":"e_1_3_2_24_2","unstructured":"Xiaoning Du Xiaofei Xie Yi Li Lei Ma Jianjun Zhao and Yang Liu. 2018. DeepCruiser: Automated guided testing for stateful deep learning systems. arXiv:1812.05339. Retrieved from http:\/\/arxiv.org\/abs\/1812.05339"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2001.919106"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3395363.3397357"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3377811.3380415"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3564625.3564636"},{"key":"e_1_3_2_29_2","volume-title":"Fundamental Statistics in Psychology and Education","author":"Guilford Joy Paul","year":"1950","unstructured":"Joy Paul Guilford. 1950. Fundamental Statistics in Psychology and Education, McGraw-Hill, New-York."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/1177055.1177056"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510188"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3368089.3409754"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3524610.3527897"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2021.3122291"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3511598"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3533767.3534391"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9052942"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2019.00108"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3334480.3375154"},{"key":"e_1_3_2_41_2","volume-title":"Encyclopedia of Statistical Sciences","author":"Kotz Samuel","year":"2005","unstructured":"Samuel Kotz, Narayanaswamy Balakrishnan, Campbell B. Read, and Brani Vidakovic. 2005. Encyclopedia of Statistical Sciences, Vol. 1. John Wiley & Sons."},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9053889"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/CCWC.2018.8301638"},{"key":"e_1_3_2_44_2","unstructured":"Zhenzhong Lan Mingda Chen Sebastian Goodman Kevin Gimpel Piyush Sharma and Radu Soricut. 2020. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv:1909.11942. Retrieved from http:\/\/arxiv.org\/abs\/1909.11942"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2009-730"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2009-730"},{"key":"e_1_3_2_47_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692. Retrieved from http:\/\/arxiv.org\/abs\/1907.11692"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/SANER.2019.8668044"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3238147.3238202"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3417330"},{"key":"e_1_3_2_51_2","first-page":"2215","article-title":"Active learning methods for low resource end-to-end speech recognition.","author":"Malhotra Karan","year":"2019","unstructured":"Karan Malhotra, Shubham Bansal, and Sriram Ganapathy. 2019. Active learning methods for low resource end-to-end speech recognition. In International Speech Communication Association (INTERSPEECH), 2215\u20132219.","journal-title":"International Speech Communication Association (INTERSPEECH)"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/ITS.2014.6947957"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.514"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/375360.375365"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2015.7178848"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-021-10066-6"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2015.7178964"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/3361566"},{"key":"e_1_3_2_59_2","first-page":"28492","volume-title":"40th International Conference on Machine Learning","volume":"202","author":"Radford Alec","year":"2023","unstructured":"Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine Mcleavey, and Ilya Sutskever. 2023. Robust speech recognition via large-scale weak supervision. In 40th International Conference on Machine Learning. Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett (Eds.), Vol. 202, PMLR, 28492\u201328518. Retrieved from https:\/\/proceedings.mlr.press\/v202\/radford23a.html"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-99429-7_14"},{"key":"e_1_3_2_61_2","first-page":"12108","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"36","author":"Ramesh Krithika","year":"2022","unstructured":"Krithika Ramesh, Ashiqur R. KhudaBukhsh, and Sumeet Kumar. 2022. \u201cBeach\u201d to \u201cBitch\u201d: Inadvertent unsafe transcription of kids\u2019 content on YouTube. In Proceedings of the AAAI Conference on Artificial Intelligence 36, 11 (2022), 12108\u201312118."},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/SANER53432.2022.00130"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1145\/3551349.3556964"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2014.phonme_rich"},{"key":"e_1_3_2_65_2","first-page":"379","article-title":"Measuring cognitive distraction in the Automobile II: Assessing in-vehicle voice-based interactive technologies","volume":"372","author":"Strayer David L.","year":"2014","unstructured":"David L. Strayer, Jonna Turrill, James R. Coleman, Emily V. Ortiz, and Joel M. Cooper. 2014. Measuring cognitive distraction in the Automobile II: Assessing in-vehicle voice-based interactive technologies. Accident Analysis and Prevention 372 (2014), 379.","journal-title":"Accident Analysis and Prevention"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1145\/3180155.3180220"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2009.4960685"},{"key":"e_1_3_2_68_2","first-page":"6000","article-title":"Attention is all you need","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, 6000\u20136010.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2016.7472753"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.aacl-demo.6"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2014.6854213"},{"key":"e_1_3_2_72_2","first-page":"1","article-title":"Wilcoxon signed-rank test","author":"Woolson Robert F.","year":"2007","unstructured":"Robert F. Woolson. 2007. Wilcoxon signed-rank test. Wiley Encyclopedia of Clinical Trials (2007), 1\u20133.","journal-title":"Wiley Encyclopedia of Clinical Trials"},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2112.01821"},{"key":"e_1_3_2_74_2","unstructured":"Yonghui Wu Mike Schuster Zhifeng Chen Quoc V. Le Mohammad Norouzi Wolfgang Macherey Maxim Krikun Yuan Cao Qin Gao Klaus Macherey et al. 2016. Google\u2019s neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144. Retrieved from http:\/\/arxiv.org\/abs\/1609.08144"},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.1145\/3293882.3330579"},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.1145\/3368089.3409671"},{"key":"e_1_3_2_77_2","doi-asserted-by":"publisher","DOI":"10.1109\/SANER53432.2022.00054"},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.1145\/3551349.3560421"},{"key":"e_1_3_2_79_2","doi-asserted-by":"publisher","DOI":"10.1109\/SANER53432.2022.00056"},{"key":"e_1_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510146"},{"key":"e_1_3_2_81_2","doi-asserted-by":"publisher","DOI":"10.1002\/stv.430"},{"key":"e_1_3_2_82_2","doi-asserted-by":"publisher","DOI":"10.1145\/3290607.3312791"},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICST57152.2023.00050"},{"key":"e_1_3_2_84_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2019.2962027"},{"key":"e_1_3_2_85_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME46990.2020.00017"},{"key":"e_1_3_2_86_2","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2018-1110"}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3707450","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3707450","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:38Z","timestamp":1750295858000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3707450"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,28]]},"references-count":85,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,5,31]]}},"alternative-id":["10.1145\/3707450"],"URL":"https:\/\/doi.org\/10.1145\/3707450","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,28]]},"assertion":[{"value":"2023-09-03","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-19","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-04-28","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}