{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T19:10:41Z","timestamp":1775157041060,"version":"3.50.1"},"reference-count":81,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"name":"RGC CRF grant under the contract","award":["C6015-23G"],"award-info":[{"award-number":["C6015-23G"]}]},{"DOI":"10.13039\/100002536","name":"HSBC","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100002536","id-type":"DOI","asserted-by":"crossref"}]},{"name":"OpenAI Researcher Access Program"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2026,1,31]]},"abstract":"<jats:p>LLMs increasingly serve as general-purpose AI assistants in daily life, and their subtly unethical suggestions become a serious and real concern. It is demanding to test and mitigate such unethical suggestions from LLMs. Despite existing efforts to detect violations of \u201ctestable\u201d facets of ethics (e.g., fairness testing), it is challenging to encode the full scope of ethics (e.g., justice, deontology) into a test oracle without human annotations or intervention.<\/jats:p>\n                  <jats:p>\n                    In this article, we take inspiration from reflective equilibrium, a modern moral reasoning method in moral and political philosophy, to guide our approach. Instead of seeking unethical suggestions in LLMs, we aim to identify behavioral inconsistency in LLMs\u2019 ethics-related suggestions. These inconsistencies are anticipated to serve as a useful proxy and hint at unethical suggestions. We formulate reflective equilibrium in the form of fixed-point iteration, instantiate it as a novel test oracle, and also employ it to form a mitigation scheme for LLMs\u2019 behavioral inconsistency on ethics-related inputs. 
To facilitate testing, we also create a comprehensive test suite,\n                    <jats:sc>EthicsSuite<\/jats:sc>\n                    , with 20K moral situations. In our study, we evaluate eight widely used LLMs. Our experiments reveal that LLMs are prone to ethical inconsistencies, with 81.22% of our test cases prompting ethically inconsistent suggestions on average. Our human evaluation suggests that the majority of these inconsistencies indeed manifest unethical biases. Our mitigation scheme effectively refines a significant number (80.1%) of these suggestions for commercial LLMs such as GPT-4 and Claude.\n                  <\/jats:p>","DOI":"10.1145\/3722554","type":"journal-article","created":{"date-parts":[[2025,3,11]],"date-time":"2025-03-11T10:51:05Z","timestamp":1741690265000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["<scp>Reeq<\/scp>\n                    : Testing and Mitigating Ethically Inconsistent Suggestions of Large Language Models with Reflective Equilibrium"],"prefix":"10.1145","volume":"35","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7680-2817","authenticated-orcid":false,"given":"Pingchuan","family":"Ma","sequence":"first","affiliation":[{"name":"Hong Kong University of Science and Technology, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-6892-1264","authenticated-orcid":false,"given":"Zhaoyu","family":"Wang","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9897-4086","authenticated-orcid":false,"given":"Zongjie","family":"Li","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology, Hong Kong, Hong 
Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3167-0480","authenticated-orcid":false,"given":"Zhenlan","family":"Ji","sequence":"additional","affiliation":[{"name":"The Hong Kong University of Science and Technology, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-5369-624X","authenticated-orcid":false,"given":"Ao","family":"Sun","sequence":"additional","affiliation":[{"name":"Hong Kong University of Science and Technology, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-5080-8736","authenticated-orcid":false,"given":"Juergen","family":"Rahmel","sequence":"additional","affiliation":[{"name":"HSBC, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0866-0308","authenticated-orcid":false,"given":"Shuai","family":"Wang","sequence":"additional","affiliation":[{"name":"Hong Kong University of Science and Technology, Hong Kong, Hong Kong"}]}],"member":"320","published-online":{"date-parts":[[2025,12,11]]},"reference":[{"key":"e_1_3_3_2_2","first-page":"21","volume-title":"In Proceedings and Addresses of the American Philosophical Association","volume":"89","author":"Elizabeth Anderson","year":"2015","unstructured":"Anderson Elizabeth. 2015. Moral bias and corrective practices: A pragmatist perspective. In Proceedings and Addresses of the American Philosophical Association, Vol. 89. American Philosophical Association, 21\u201347."},{"key":"e_1_3_3_3_2","unstructured":"Anthropic. 2024. The Claude 3 Model Family: Opus Sonnet Haiku. Retrieved from https:\/\/www-cdn.anthropic.com\/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627\/Model_Card_Claude_3.pdf"},{"issue":"12","key":"e_1_3_3_4_2","first-page":"5087","article-title":"Biasfinder: Metamorphic test generation to uncover bias for sentiment analysis systems","volume":"48","author":"Hilmi Asyrofi Muhammad","year":"2021","unstructured":"Muhammad Hilmi Asyrofi, Zhou Yang, Imam Nur Bani Yusuf, Hong Jin Kang, Ferdian Thung, and David Lo. 2021. 
Biasfinder: Metamorphic test generation to uncover bias for sentiment analysis systems. IEEE Transactions on Software Engineering 48, 12 (2021), 5087\u20135101.","journal-title":"IEEE Transactions on Software Engineering"},{"key":"e_1_3_3_5_2","unstructured":"Azure. 2023. Azure Content Moderator. Retrieved from https:\/\/learn.microsoft.com\/en-us\/azure\/cognitive-services\/content-moderator\/overview"},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.5297715"},{"key":"e_1_3_3_7_2","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33 (2020), 1877\u20131901.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3624700"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE51524.2021.9678670"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3143561"},{"key":"e_1_3_3_11_2","unstructured":"Zhihong Chen Feng Jiang Junying Chen Tiannan Wang Fei Yu Guiming Chen Hongbo Zhang Juhao Liang Chen Zhang Zhiyi Zhang et al. 2023. Phoenix: Democratizing ChatGPT across Languages. arXiv:2304.10453. Retrieved from https:\/\/arxiv.org\/abs\/2304.10453"},{"key":"e_1_3_3_12_2","unstructured":"Zhenpeng Chen Jie M. Zhang Max Hort Federica Sarro and Mark Harman. 2022. Fairness testing: A comprehensive survey and analysis of trends. arXiv:2207.10223. Retrieved from https:\/\/arxiv.org\/abs\/2207.10223"},{"key":"e_1_3_3_13_2","unstructured":"Wei Lin Chiang Zhuohan Li Zi Lin Ying Sheng Zhanghao Wu Hao Zhang Lianmin Zheng Siyuan Zhuang Yonghao Zhuang Joseph E. Gonzalez et al. 2023. 
Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality. Retrieved from https:\/\/lmsys.org\/blog\/2023-03-30-vicuna\/"},{"key":"e_1_3_3_14_2","unstructured":"Wei-Lin Chiang Lianmin Zheng Ying Sheng Anastasios Nikolas Angelopoulos Tianle Li Dacheng Li Hao Zhang Banghua Zhu Michael Jordan Joseph E. Gonzalez et al. 2024. Chatbot arena: An open platform for evaluating LLMs by human preference. arXiv:2403.04132. Retrieved from https:\/\/arxiv.org\/abs\/2403.04132"},{"key":"e_1_3_3_15_2","unstructured":"Jiawen Deng Hao Sun Zhexin Zhang Jiale Cheng and Minlie Huang. 2023. Recent advances towards safe responsible and moral dialogue systems: A survey. arXiv:2302.09270. Retrieved from https:\/\/arxiv.org\/abs\/2302.09270"},{"key":"e_1_3_3_16_2","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805. Retrieved from https:\/\/arxiv.org\/abs\/1810.04805"},{"key":"e_1_3_3_17_2","doi-asserted-by":"crossref","unstructured":"Shizhe Diao Rui Pan Hanze Dong KaShun Shum Jipeng Zhang Wei Xiong and Tong Zhang. 2023. LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Retrieved from https:\/\/optimalscale.github.io\/LMFlow\/","DOI":"10.18653\/v1\/2024.naacl-demo.12"},{"key":"e_1_3_3_18_2","first-page":"320","volume-title":"60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Du Zhengxiao","year":"2022","unstructured":"Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, and Jie Tang. 2022. GLM: General language model pretraining with autoregressive blank infilling. 
In 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 320\u2013335."},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3213846.3213858"},{"key":"e_1_3_3_20_2","unstructured":"Leo Gao Stella Biderman Sid Black Laurence Golding Travis Hoppe Charles Foster Jason Phang Horace He Anish Thite Noa Nabeshima et al. 2020. The pile: An 800GB dataset of diverse text for language modeling. arXiv:2101.00027. Retrieved from https:\/\/arxiv.org\/abs\/2101.00027"},{"key":"e_1_3_3_21_2","doi-asserted-by":"crossref","first-page":"3356","DOI":"10.18653\/v1\/2020.findings-emnlp.301","volume-title":"Findings of the Association for Computational Linguistics (EMNLP \u201920)","author":"Gehman Samuel","year":"2020","unstructured":"Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, and Noah A. Smith. 2020. RealToxicityPrompts: Evaluating neural toxic degeneration in language models. In Findings of the Association for Computational Linguistics (EMNLP \u201920), 3356\u20133369."},{"key":"e_1_3_3_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00047"},{"key":"e_1_3_3_23_2","unstructured":"Xingwei He Zhenghao Lin Yeyun Gong A. Jin Hang Zhang Chen Lin Jian Jiao Siu Ming Yiu Nan Duan Weizhu Chen et al. 2023. AnnoLLM: Making large language models to be better crowdsourced annotators. arXiv:2303.16854. Retrieved from https:\/\/arxiv.org\/abs\/2303.16854"},{"key":"e_1_3_3_24_2","volume-title":"International Conference on Learning Representations","author":"Hendrycks Dan","year":"2021","unstructured":"Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, and Jacob Steinhardt. 2021. Aligning AI with shared human values. In International Conference on Learning Representations. 
Retrieved from https:\/\/openreview.net\/forum?id=dNy_RKzJacY"},{"key":"e_1_3_3_25_2","volume-title":"International Conference on Learning Representations","author":"Hu Edward J.","year":"2022","unstructured":"Edward J. Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. 2022. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations."},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2020.3038802"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3533767.3534391"},{"key":"e_1_3_3_28_2","unstructured":"Zhenlan Ji Pingchuan Ma Zongjie Li and Shuai Wang. 2023. Benchmarking and explaining large language model-based code generation: A causality-centric approach. arXiv:2310.06680. Retrieved from https:\/\/arxiv.org\/abs\/2310.06680"},{"key":"e_1_3_3_29_2","unstructured":"Liwei Jiang Jena D. Hwang Chandra Bhagavatula Ronan Le Bras Maxwell Forbes Jon Borchardt Jenny Liang Oren Etzioni Maarten Sap and Yejin Choi. 2021. Delphi: Towards machine ethics and norms. arXiv:2110.07574. Retrieved from https:\/\/arxiv.org\/abs\/2110.07574"},{"key":"e_1_3_3_30_2","unstructured":"Liwei Jiang Jena D. Hwang Chandra Bhagavatula Ronan Le Bras Jenny Liang Jesse Dodge Keisuke Sakaguchi Maxwell Forbes Jon Borchardt Saadia Gabriel et al. 2021. Can machines learn morality? The Delphi experiment. arXiv:2110.07574. Retrieved from https:\/\/arxiv.org\/abs\/2110.07574"},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2022.3212329"},{"key":"e_1_3_3_32_2","volume-title":"2023 IEEE International Conference on Software Analysis, Evolution And Reengineering (SANER)","author":"Khoo Lin Sze","year":"2023","unstructured":"Lin Sze Khoo, Jia Qi Bay, Ming Lee Kimberly Yap, Mei Kuan Lim, Chun Yong Chong, Zhou Yang, and David Lo. 2023. 
Exploring and repairing gender fairness violations in word embedding-based sentiment analysis model through adversarial patches. In 2023 IEEE International Conference on Software Analysis, Evolution And Reengineering (SANER). IEEE Computer Society."},{"key":"e_1_3_3_33_2","unstructured":"Hyunwoo Kim Youngjae Yu Liwei Jiang Ximing Lu Daniel Khashabi Gunhee Kim Yejin Choi and Maarten Sap. 2022. Prosocialdialog: A prosocial backbone for conversational agents. arXiv:2205.12688. Retrieved from https:\/\/arxiv.org\/abs\/2205.12688"},{"key":"e_1_3_3_34_2","volume-title":"The Stanford Encyclopedia of Philosophy (Winter 2023 ed.)","author":"Knight Carl","year":"2023","unstructured":"Carl Knight. 2023. Reflective equilibrium. In The Stanford Encyclopedia of Philosophy (Winter 2023 ed.). Edward N. Zalta and Uri Nodelman (Eds.). Metaphysics Research Lab, Stanford University."},{"key":"e_1_3_3_35_2","unstructured":"Tianlin Li Xiaoyu Zhang Chao Du Tianyu Pang Qian Liu Qing Guo Chao Shen and Yang Liu. 2024. Your large language model is secretly a fairness proponent and you should prompt it like one. arXiv:2402.12150. Retrieved from https:\/\/arxiv.org\/abs\/2402.12150"},{"key":"e_1_3_3_36_2","first-page":"4582","volume-title":"59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Li Xiang Lisa","year":"2021","unstructured":"Xiang Lisa Li and Percy Liang. 2021. Prefix-tuning: Optimizing continuous prompts for generation. 
In 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 4582\u20134597."},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00110"},{"key":"e_1_3_3_38_2","first-page":"13","volume-title":"IEEE\/ACM 46th International Conference on Software Engineering (ICSE \u201924)","volume":"74","author":"Li Zongjie","year":"2024","unstructured":"Zongjie Li, Chaozheng Wang, Pingchuan Ma, Chaowei Liu, Shuai Wang, Daoyuan Wu, Cuiyun Gao, and Yang Liu. 2024. On extracting specialized code abilities from large language models: A feasibility study. In IEEE\/ACM 46th International Conference on Software Engineering (ICSE \u201924). ACM, New York, NY, Article 74, 13 Pages."},{"key":"e_1_3_3_39_2","first-page":"11084","volume-title":"2024 Conference on Empirical Methods in Natural Language Processing","author":"Li Zongjie","year":"2024","unstructured":"Zongjie Li, Chaozheng Wang, Pingchuan Ma, Daoyuan Wu, Shuai Wang, Cuiyun Gao, and Yang Liu. 2024. Split and merge: Aligning position biases in LLM-based evaluators. In 2024 Conference on Empirical Methods in Natural Language Processing. Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen (Eds.), 11084\u201311108."},{"key":"e_1_3_3_40_2","first-page":"2336","volume-title":"2023 ACM SIGSAC Conference on Computer and Communications Security (CCS \u201923)","author":"Li Zongjie","year":"2023","unstructured":"Zongjie Li, Chaozheng Wang, Shuai Wang, and Cuiyun Gao. 2023. Protecting intellectual property of large language model-based code generation APIs via watermarks. In 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS \u201923), 2336\u20132350."},{"key":"e_1_3_3_41_2","unstructured":"Zongjie Li Daoyuan Wu Shuai Wang and Zhendong Su. 2024. API-Guided Dataset Synthesis to Finetune Large Code Models. arXiv:2408.08343. 
Retrieved from https:\/\/arxiv.org\/abs\/2408.08343"},{"key":"e_1_3_3_42_2","first-page":"5046","volume-title":"IEEE Transactions on Software Engineering","volume":"49","author":"Liu Shuang","year":"2023","unstructured":"Shuang Liu, Shujie Dou, Junjie Chen, Zhirun Zhang, and Ye Lu. 2023. Differential testing of machine translators based on compositional semantics. IEEE Transactions on Software Engineering 49, 12 (2023), 5046\u20135059."},{"key":"e_1_3_3_43_2","first-page":"1","volume-title":"37th IEEE\/ACM International Conference on Automated Software Engineering","author":"Liu Zixi","year":"2022","unstructured":"Zixi Liu, Yang Feng, Yining Yin, Jingyu Sun, Zhenyu Chen, and Baowen Xu. 2022. QATest: A uniform fuzzing framework for question answering systems. In 37th IEEE\/ACM International Conference on Automated Software Engineering, 1\u201312."},{"key":"e_1_3_3_44_2","doi-asserted-by":"publisher","DOI":"10.14778\/3494124.3494139"},{"key":"e_1_3_3_45_2","first-page":"458","volume-title":"29th International Joint Conference on Artificial Intelligence","author":"Ma Pingchuan","year":"2020","unstructured":"Pingchuan Ma, Shuai Wang, and Jin Liu. 2020. Metamorphic testing and certified mitigation of fairness violations in NLP models. In 29th International Joint Conference on Artificial Intelligence, 458\u2013465."},{"key":"e_1_3_3_46_2","unstructured":"Pingchuan Ma Zhaoyu Wang Zongjie Li Zhenlan Ji Ao Sun Juergen Rahmel and Shuai Wang. 2025. Artifact for the Paper \u201cReeq: Testing and mitigating ethically inconsistent suggestions of large language models with reflective equilibrium\u201d. Retrieved from https:\/\/github.com\/LLM-Ethics\/SCR"},{"key":"e_1_3_3_47_2","unstructured":"Pingchuan Ma Zhaoyu Wang Zongjie Li Zhenlan Ji Ao Sun Juergen Rahmel and Shuai Wang. 2025. Dataset for the Paper \u201cReeq: Testing and Mitigating Ethically Inconsistent Suggestions of Large Language Models with Reflective Equilibrium\u201d. 
Retrieved from https:\/\/github.com\/LLM-Ethics\/EthicsSuite"},{"key":"e_1_3_3_48_2","unstructured":"Aman Madaan Niket Tandon Prakhar Gupta Skyler Hallinan Luyu Gao Sarah Wiegreffe Uri Alon Nouha Dziri Shrimai Prabhumoye Yiming Yang et al. 2023. Self-refine: Iterative refinement with self-feedback. arXiv:2303.17651. Retrieved from https:\/\/arxiv.org\/abs\/2303.17651"},{"key":"e_1_3_3_49_2","unstructured":"Nat McAleese Rai Michael Pokorny Juan Felipe Ceron Uribe Evgenia Nitishinskaya Maja Trebacz and Jan Leike. 2024. LLM critics help catch LLM bugs. arXiv:2407.00215. Retrieved from https:\/\/arxiv.org\/abs\/2407.00215"},{"issue":"1","key":"e_1_3_3_50_2","first-page":"100","article-title":"Differential testing for software","volume":"10","author":"McKeeman William M.","year":"1998","unstructured":"William M. McKeeman. 1998. Differential testing for software. Digital Technical Journal 10, 1 (1998), 100\u2013107.","journal-title":"Digital Technical Journal"},{"key":"e_1_3_3_51_2","doi-asserted-by":"crossref","first-page":"652","DOI":"10.1057\/9781137344557_27","volume-title":"The Palgrave Handbook of Philosophical Methods","author":"McPherson Tristram","year":"2015","unstructured":"Tristram McPherson. 2015. The methodological irrelevance of reflective equilibrium. In The Palgrave Handbook of Philosophical Methods. Springer, 652\u2013674."},{"key":"e_1_3_3_52_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-023-10305-y"},{"key":"e_1_3_3_53_2","unstructured":"OpenAI. 2022. Introducing ChatGPT. Retrieved from https:\/\/openai.com\/blog\/chatgpt"},{"key":"e_1_3_3_54_2","unstructured":"OpenAI. 2023. GPT-4 Technical Report. Retrieved from https:\/\/cdn.openai.com\/papers\/gpt-4.pdf"},{"key":"e_1_3_3_55_2","unstructured":"OpenAI. 2023. Moderation. Retrieved from https:\/\/platform.openai.com\/docs\/guides\/moderation\/overview"},{"key":"e_1_3_3_56_2","unstructured":"OpenAI. 2023. OpenAI API Platform. 
Retrieved from https:\/\/platform.openai.com\/overview"},{"key":"e_1_3_3_57_2","unstructured":"OpenAI. 2023. Our Approach to AI Safety. Retrieved from https:\/\/openai.com\/blog\/our-approach-to-ai-safety"},{"key":"e_1_3_3_58_2","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume":"35","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems 35 (2022), 27730\u201327744.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_59_2","unstructured":"Baolin Peng Chunyuan Li Pengcheng He Michel Galley and Jianfeng Gao. 2023. Instruction tuning with gpt-4. arXiv:2304.03277. Retrieved from https:\/\/arxiv.org\/abs\/2304.03277"},{"key":"e_1_3_3_60_2","doi-asserted-by":"crossref","unstructured":"John Rawls. 1971. A Theory of Justice. Cambridge (Mass.).","DOI":"10.4159\/9780674042605"},{"key":"e_1_3_3_61_2","unstructured":"Teven Le Scao Angela Fan Christopher Akiki Ellie Pavlick Suzana Ili\u0107 Daniel Hesslow Roman Castagn\u00e9 Alexandra Sasha Luccioni Fran\u00e7ois Yvon Matthias Gall\u00e9 et al. 2022. Bloom: A 176b-parameter open-access multilingual language model. arXiv:2211.05100. Retrieved from https:\/\/arxiv.org\/abs\/2211.05100"},{"key":"e_1_3_3_62_2","doi-asserted-by":"publisher","DOI":"10.1145\/3551349.3556953"},{"key":"e_1_3_3_63_2","unstructured":"Noah Shinn Beck Labash and Ashwin Gopinath. 2023. Reflexion: An autonomous agent with dynamic memory and self-reflection. arXiv:2303.11366. 
Retrieved from https:\/\/arxiv.org\/abs\/2303.11366"},{"issue":"12","key":"e_1_3_3_64_2","first-page":"5188","article-title":"Astraea: Grammar-based fairness testing","volume":"48","author":"Soremekun Ezekiel","year":"2022","unstructured":"Ezekiel Soremekun, Sakshi Udeshi, and Sudipta Chattopadhyay. 2022. Astraea: Grammar-based fairness testing. IEEE Transactions on Software Engineering 48, 12 (2022), 5188\u20135211.","journal-title":"IEEE Transactions on Software Engineering"},{"key":"e_1_3_3_65_2","unstructured":"Rohan Taori Ishaan Gulrajani Tianyi Zhang Yann Dubois Xuechen Li Carlos Guestrin Percy Liang and Tatsunori B. Hashimoto. 2023. Stanford Alpaca: An Instruction-Following LLaMA Model. Retrieved from https:\/\/github.com\/tatsu-lab\/stanford_alpaca"},{"key":"e_1_3_3_66_2","doi-asserted-by":"crossref","first-page":"5369","DOI":"10.18653\/v1\/2020.acl-main.477","volume-title":"58th Annual Meeting of the Association for Computational Linguistics","author":"Tay Yi","year":"2020","unstructured":"Yi Tay, Donovan Ong, Jie Fu, Alvin Chan, Nancy Chen, Anh Tuan Luu, and Christopher Pal. 2020. Would you rather? a new benchmark for learning machine alignment with cultural values and social preferences. In 58th Annual Meeting of the Association for Computational Linguistics, 5369\u20135373."},{"key":"e_1_3_3_67_2","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar et al. 2023. LLaMA: Open and efficient foundation language models. arXiv:2302.13971. Retrieved from https:\/\/arxiv.org\/abs\/2302.13971"},{"key":"e_1_3_3_68_2","first-page":"6000","article-title":"Attention is all you need","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. 
Advances in Neural Information Processing Systems 30 (2017), 6000\u20136010.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_69_2","unstructured":"Chaozheng Wang Zongjie Li Cuiyun Gao Wenxuan Wang Ting Peng Hailiang Huang Yuetang Deng Shuai Wang and Michael R. Lyu. 2024. Exploring multi-lingual bias of large code models in code generation. arXiv:2404.19368. Retrieved from https:\/\/arxiv.org\/abs\/2404.19368"},{"key":"e_1_3_3_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/3324884.3416584"},{"key":"e_1_3_3_71_2","unstructured":"Wenxuan Wang Jen-Tse Huang Weibin Wu Jianping Zhang Yizhan Huang Shuqing Li Pinjia He and Michael Lyu. 2023. MTTM: Metamorphic testing for textual content moderation software. arXiv:2302.05706. Retrieved from https:\/\/arxiv.org\/abs\/2302.05706"},{"key":"e_1_3_3_72_2","unstructured":"Zekun Moore Wang Zhongyuan Peng Haoran Que Jiaheng Liu Wangchunshu Zhou Yuhan Wu Hongcheng Guo Ruitong Gan Zehao Ni Jian Yang et al. 2023. Rolellm: Benchmarking eliciting and enhancing role-playing abilities of large language models. arXiv:2310.00746. Retrieved from https:\/\/arxiv.org\/abs\/2310.00746"},{"key":"e_1_3_3_73_2","unstructured":"Laura Weidinger John Mellor Maribeth Rauh Conor Griffin Jonathan Uesato Po-Sen Huang Myra Cheng Mia Glaese Borja Balle Atoosa Kasirzadeh et al. 2021. Ethical and social risks of harm from language models. arXiv:2112.04359. Retrieved from https:\/\/arxiv.org\/abs\/2112.04359"},{"key":"e_1_3_3_74_2","doi-asserted-by":"publisher","DOI":"10.1145\/3531146.3533088"},{"key":"e_1_3_3_75_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2021.111060"},{"key":"e_1_3_3_76_2","doi-asserted-by":"crossref","first-page":"644","DOI":"10.1109\/ICSME52107.2021.00073","volume-title":"2021 IEEE International Conference on Software Maintenance and Evolution (ICSME)","author":"Yang Zhou","year":"2021","unstructured":"Zhou Yang, Harshit Jain, Jieke Shi, Muhammad Hilmi Asyrofi, and David Lo. 2021. 
BiasHeal: On-the-fly black-box healing of bias in sentiment analysis systems. In 2021 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 644\u2013648."},{"key":"e_1_3_3_77_2","doi-asserted-by":"publisher","DOI":"10.1145\/3533767.3534389"},{"key":"e_1_3_3_78_2","volume-title":"11th International Conference on Learning Representations (ICLR)","author":"Zeng Aohan","year":"2023","unstructured":"Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, et al. 2023. GLM-130B: An open bilingual pre-trained model. In 11th International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_3_79_2","unstructured":"Susan Zhang Stephen Roller Naman Goyal Mikel Artetxe Moya Chen Shuohui Chen Christopher Dewan Mona Diab Xian Li Xi Victoria Lin et al. 2022. OPT: Open pre-trained transformer language models. arXiv:2205.01068. Retrieved from https:\/\/arxiv.org\/abs\/2205.01068"},{"key":"e_1_3_3_80_2","unstructured":"Wayne Xin Zhao Kun Zhou Junyi Li Tianyi Tang Xiaolei Wang Yupeng Hou Yingqian Min Beichen Zhang Junjie Zhang Zican Dong et al. 2023. A survey of large language models. arXiv:2303.18223. Retrieved from https:\/\/arxiv.org\/abs\/2303.18223"},{"key":"e_1_3_3_81_2","first-page":"3576","volume-title":"Findings of the Association for Computational Linguistics (EMNLP \u201922)","author":"Zhou Jingyan","unstructured":"Jingyan Zhou, Jiawen Deng, Fei Mi, Yitong Li, Yasheng Wang, Minlie Huang, Xin Jiang, Qun Liu, and Helen Meng. 2022. Towards identifying social bias in dialog systems: Framework, dataset, and benchmark. 
In Findings of the Association for Computational Linguistics (EMNLP \u201922), 3576\u20133591."},{"key":"e_1_3_3_82_2","doi-asserted-by":"crossref","first-page":"3755","DOI":"10.18653\/v1\/2022.acl-long.261","volume-title":"60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Ziems Caleb","year":"2022","unstructured":"Caleb Ziems, Jane Yu, Yi-Chia Wang, Alon Halevy, and Diyi Yang. 2022. The moral integrity corpus: A benchmark for ethical dialogue systems. In 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 3755\u20133773."}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3722554","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,11]],"date-time":"2025-12-11T15:56:14Z","timestamp":1765468574000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3722554"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,11]]},"references-count":81,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1,31]]}},"alternative-id":["10.1145\/3722554"],"URL":"https:\/\/doi.org\/10.1145\/3722554","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,11]]},"assertion":[{"value":"2024-07-24","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-05","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-12-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}