{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,20]],"date-time":"2025-06-20T04:08:52Z","timestamp":1750392532040,"version":"3.41.0"},"reference-count":82,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","funder":[{"name":"Australian Research Council","award":["DP210100041"],"award-info":[{"award-number":["DP210100041"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2025,6,19]]},"abstract":"<jats:p>Machine learning (ML) for text classification has been widely used in various domains, such as toxicity detection, chatbot consulting, and review analysis. These applications can significantly impact ethics, economics, and human behavior, raising serious concerns about trusting ML decisions. Several studies indicate that traditional uncertainty metrics, such as model confidence, and performance metrics, like accuracy, are insufficient to build human trust in ML models. These models often learn spurious correlations during training and predict based on them during inference. When deployed in the real world, where such correlations are absent, their performance can deteriorate significantly. To avoid this, a common practice is to test whether predictions are made reasonably based on valid patterns in the data. Along with this, a challenge known as the trustworthiness oracle problem has been introduced. So far, due to the lack of automated trustworthiness oracles, the assessment requires manual validation, based on the decision process disclosed by explanation methods. However, this approach is time-consuming, error-prone, and not scalable.<\/jats:p>\n          <jats:p>To address this problem, we propose TOKI, the first automated trustworthiness oracle generation method for text classifiers. TOKI automatically checks whether the words contributing the most to a prediction are semantically related to the predicted class. Specifically, we leverage ML explanation methods to extract the decision-contributing words and measure their semantic relatedness with the class based on word embeddings. As a demonstration of its practical usefulness, we also introduce a novel adversarial attack method that targets trustworthiness vulnerabilities identified by TOKI. We compare TOKI with a naive baseline based solely on model confidence. To evaluate their alignment with human judgement, experiments are conducted on human-created ground truths of approximately 8,000 predictions. Additionally, we compare the effectiveness of TOKI-guided adversarial attack method with A2T, a state-of-the-art adversarial attack method for text classification. Results show that (1) relying on prediction uncertainty metrics, such as model confidence, cannot effectively distinguish between trustworthy and untrustworthy predictions, (2) TOKI achieves 142% higher accuracy than the naive baseline, and (3) TOKI-guided adversarial attack method is more effective with fewer perturbations than A2T.<\/jats:p>","DOI":"10.1145\/3729376","type":"journal-article","created":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:15:34Z","timestamp":1750346134000},"page":"2382-2405","source":"Crossref","is-referenced-by-count":0,"title":["Automated Trustworthiness Oracle Generation for Machine Learning Text Classifiers"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-3038-8403","authenticated-orcid":false,"given":"Lam","family":"Nguyen Tung","sequence":"first","affiliation":[{"name":"Monash University, Melbourne, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-2548-4406","authenticated-orcid":false,"given":"Steven","family":"Cho","sequence":"additional","affiliation":[{"name":"University of Auckland, Auckland, New Zealand"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3728-9541","authenticated-orcid":false,"given":"Xiaoning","family":"Du","sequence":"additional","affiliation":[{"name":"Monash University, Melbourne, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2572-0250","authenticated-orcid":false,"given":"Neelofar","family":"Neelofar","sequence":"additional","affiliation":[{"name":"Royal Melbourne Institute of Technology, Melbourne, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5885-9297","authenticated-orcid":false,"given":"Valerio","family":"Terragni","sequence":"additional","affiliation":[{"name":"University of Auckland, Auckland, New Zealand"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8666-2782","authenticated-orcid":false,"given":"Stefano","family":"Ruberto","sequence":"additional","affiliation":[{"name":"Joint Research Centre at the European Commission, Ispra, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1716-690X","authenticated-orcid":false,"given":"Aldeida","family":"Aleti","sequence":"additional","affiliation":[{"name":"Monash University, Melbourne, Australia"}]}],"member":"320","published-online":{"date-parts":[[2025,6,19]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2870052"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482126"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2014.2372785"},{"key":"e_1_2_1_4_1","volume-title":"Advances in Neural Information Processing Systems. 13","author":"Bengio Yoshua","unstructured":"Yoshua Bengio, R\u00e9jean Ducharme, and Pascal Vincent. 2000. A Neural Probabilistic Language Model. In Advances in Neural Information Processing Systems. 13, MIT Press."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00051"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1162\/coli.2006.32.1.13"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","unstructured":"Adrian Bussone Simone Stumpf and Dympna O\u2019Sullivan. 2015. The Role of Explanations on Trust and Reliance in Clinical Decision Support Systems. In Healthcare Informatics. 160\u2013169. https:\/\/doi.org\/10.1109\/ICHI.2015.26 10.1109\/ICHI.2015.26","DOI":"10.1109\/ICHI.2015.26"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/s42979-022-01409-1"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2783258.2788613"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i10.21289"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.3389\/frai.2020.00054"},{"key":"e_1_2_1_13_1","unstructured":"Steven Cho Seaton Cousins-Baxter Stefano Ruberto and Valerio Terragni. 2024. Automated Trustworthiness Testing for Machine Learning Classifiers. arxiv:2406.05251. arxiv:2406.05251"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_15_1","volume-title":"On-line at http:\/\/www. mw. com\/home. htm, 8, 2","author":"Dictionary Merriam-Webster","year":"2002","unstructured":"Merriam-Webster Dictionary. 2002. Merriam-webster. On-line at http:\/\/www. mw. com\/home. htm, 8, 2 (2002), 23."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3596490"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3338906.3338972"},{"key":"e_1_2_1_20_1","volume-title":"CAMS: An Annotated Corpus for Causal Analysis of Mental Health Issues in Social Media Posts. In Language Resources and Evaluation","author":"Garg Muskan","year":"2022","unstructured":"Muskan Garg, Chandni Saxena, Sriparna Saha, Veena Krishnan, Ruchi Joshi, and Vijay Mago. 2022. CAMS: An Annotated Corpus for Causal Analysis of Mental Health Issues in Social Media Posts. In Language Resources and Evaluation. European Language Resources Association, France. 387\u2013396. https:\/\/aclanthology.org\/2022.lrec-1.686"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-020-00257-z"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3432934"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-022-00831-6"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3236024.3264835"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2491411.2494578"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","unstructured":"Keith Harrigian Carlos Aguirre and Mark Dredze. 2020. Do Models of Mental Health Based on Social Media Data Generalize? In Findings of the Association for Computational Linguistics: EMNLP. 3774\u20133788. https:\/\/doi.org\/10.18653\/v1\/2020.findings-emnlp.337 10.18653\/v1\/2020.findings-emnlp.337","DOI":"10.18653\/v1"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_28_1","volume-title":"Complex, Intelligent and Software Intensive Systems","author":"Kaur Davinder","unstructured":"Davinder Kaur, Suleyman Uslu, Arjan Durresi, Sunil Badve, and Murat Dundar. 2021. Trustworthy Explainability Acceptance: A New Metric to Measure the Trustworthiness of Interpretable AI Medical Diagnostic Systems. In Complex, Intelligent and Software Intensive Systems. Springer International Publishing, Cham. 35\u201346. isbn:978-3-030-79725-6"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077257.3077271"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/REW53955.2021.00031"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","unstructured":"Nguyen Tung Lam Cho Steven Du Xiaoning Neelofar Terragni Valerio Ruberto Stefano and Aleti Aldeida. 2024. TOKI\u2019s Replicate Package. https:\/\/doi.org\/10.5281\/zenodo.13751579 10.5281\/zenodo.13751579","DOI":"10.5281\/zenodo.13751579"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-019-08987-4"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/INISTA.2017.8001122"},{"key":"e_1_2_1_34_1","volume-title":"Understanding Neural Networks through Representation Erasure.. CoRR, abs\/1612.08220","author":"Li Jiwei","year":"2016","unstructured":"Jiwei Li, Will Monroe, and Dan Jurafsky. 2016. Understanding Neural Networks through Representation Erasure.. CoRR, abs\/1612.08220 (2016), http:\/\/dblp.uni-trier.de\/db\/journals\/corr\/corr1612.html##LiMJ16a"},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Q. Vera Liao and Vaughan Jennifer Wortman. 2024. AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap. Harvard Data Science Review feb 29 https:\/\/hdsr.mitpress.mit.edu\/pub\/aelql9qy","DOI":"10.1162\/99608f92.8036d03b"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2023.102094"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3238147.3238202"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2024.3408062"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3487043"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i17.17745"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/219717.219748"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_43_1","volume-title":"Non-Classical Lexical Semantic Relations. In The Computational Lexical Semantics Workshop. Association for Computational Linguistics","author":"Morris Jane","year":"2004","unstructured":"Jane Morris and Graeme Hirst. 2004. Non-Classical Lexical Semantic Relations. In The Computational Lexical Semantics Workshop. Association for Computational Linguistics, Boston, USA. 46\u201351. https:\/\/aclanthology.org\/W04-2607"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298640"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","unstructured":"Frank Nielsen. 2016. Hierarchical Clustering. Springer International Publishing Cham. 195\u2013211. isbn:978-3-319-21903-5 https:\/\/doi.org\/10.1007\/978-3-319-21903-5_8 10.1007\/978-3-319-21903-5_8","DOI":"10.1007\/978-3-319-21903-5_8"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSR.2015.35"},{"key":"e_1_2_1_49_1","unstructured":"Yaniv Ovadia Emily Fertig Jie Ren Zachary Nado D. Sculley Sebastian Nowozin Joshua Dillon Balaji Lakshminarayanan and Jasper Snoek. 2019. Can you trust your model' s uncertainty? Evaluating predictive uncertainty under dataset shift. In Advances in Neural Information Processing Systems. 32 Curran Associates. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2019\/file\/8558cb408c1d76621371888657d2eb1d-Paper.pdf"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSM.2015.7332474"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3132747.3132785"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/CONISOFT50191.2020.00014"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/SCAM.2015.7335404"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2018.2857402"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3501967"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939778"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-020-09881-0"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2017"},{"key":"e_1_2_1_61_1","volume-title":"Towards Human-Centred Explainability Benchmarks For Text Classification. In 16th International AAAI Conference on Web and Social Media. AAAI, USA. 1.","author":"Schlegel Viktor","year":"2022","unstructured":"Viktor Schlegel, Erick Mendez Guzman, and Riza Batista-Navarro. 2022. Towards Human-Centred Explainability Benchmarks For Text Classification. In 16th International AAAI Conference on Web and Social Media. AAAI, USA. 1."},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-020-0212-3"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-024-10469-1"},{"key":"e_1_2_1_64_1","volume-title":"Swivel: Improving Embeddings by Noticing What\u2019s Missing. CoRR, abs\/1602.02215","author":"Shazeer Noam","year":"2016","unstructured":"Noam Shazeer, Ryan Doherty, Colin Evans, and Chris Waterson. 2016. Swivel: Improving Embeddings by Noticing What\u2019s Missing. CoRR, abs\/1602.02215 (2016), 9 pages. arXiv:1602.02215. arxiv:1602.02215"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/2901739.2903501"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.5120\/13897-1851"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.3389\/fnins.2019.01321"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3180155.3180220"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41060-021-00242-8"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1109\/TR.2020.2972266"},{"key":"e_1_2_1_73_1","unstructured":"Wenqian Ye Guangtao Zheng Xu Cao Yunsheng Ma and Aidong Zhang. 2024. Spurious Correlations in Machine Learning: A Survey. arxiv:2402.12715. arxiv:2402.12715"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2023.3327163"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2019.2962027"},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1145\/3374217"},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPC.2013.6613842"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1145\/3351095.3372852"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1145\/3639372"},{"key":"e_1_2_1_81_1","doi-asserted-by":"publisher","unstructured":"Huichi Zhou Zhaoyang Wang Hongtao Wang Dongping Chen Wenhan Mu and Fangyuan Zhang. 2024. Evaluating the Validity of Word-level Adversarial Attacks with Large Language Models. In Findings of the Association for Computational Linguistics. Thailand. 4902\u20134922. https:\/\/doi.org\/10.18653\/v1\/2024.findings-acl.292 10.18653\/v1\/2024.findings-acl.292","DOI":"10.18653\/v1"},{"key":"e_1_2_1_82_1","volume-title":"Artificial Intelligence Applications and Innovations","author":"Zini Julia El","unstructured":"Julia El Zini, Mohamad Mansour, Basel Mousi, and Mariette Awad. 2022. On the\u00a0Evaluation of\u00a0the\u00a0Plausibility and\u00a0Faithfulness of\u00a0Sentiment Analysis Explanations. In Artificial Intelligence Applications and Innovations. Springer International Publishing, Cham. 338\u2013349. isbn:978-3-031-08337-2"}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3729376","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:19:59Z","timestamp":1750346399000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729376"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,19]]},"references-count":82,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2025,6,19]]}},"alternative-id":["10.1145\/3729376"],"URL":"https:\/\/doi.org\/10.1145\/3729376","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,19]]}}}