{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,8]],"date-time":"2026-06-08T18:10:39Z","timestamp":1780942239801,"version":"3.54.1"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2025,6,19]]},"abstract":"<jats:p>Large Language Models (LLMs) are prone to hallucinations, e.g., factually incorrect information, in their responses. These hallucinations present challenges for LLM-based applications that demand high factual accuracy. Existing hallucination detection methods primarily depend on external resources, which can suffer from issues such as low availability, incomplete coverage, privacy concerns, high latency, low reliability, and poor scalability. There are also methods depending on output probabilities, which are often inaccessible for closed-source LLMs like GPT models. This paper presents MetaQA, a self-contained hallucination detection approach that leverages metamorphic relation and prompt mutation. Unlike existing methods, MetaQA operates without any external resources and is compatible with both open-source and closed-source LLMs. MetaQA is based on the hypothesis that if an LLM\u2019s response is a hallucination, the designed metamorphic relations will be violated. We compare MetaQA with the state-of-the-art zero-resource hallucination detection method, SelfCheckGPT, across multiple datasets, and on two open-source and two closed-source LLMs. Our results reveal that MetaQA outperforms SelfCheckGPT in terms of precision, recall, and F1-score. For the four LLMs we study, MetaQA outperforms SelfCheckGPT with a superiority margin ranging from 0.041 - 0.113 (for precision), 0.143 - 0.430 (for recall), and 0.154 - 0.368 (for F1-score). 
For instance, with Mistral-7B, MetaQA achieves an average F1-score of 0.435, compared to SelfCheckGPT\u2019s F1-score of 0.205, representing an improvement rate of 112.2%. MetaQA also demonstrates superiority across all different categories of questions.<\/jats:p>","DOI":"10.1145\/3715735","type":"journal-article","created":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:15:34Z","timestamp":1750346134000},"page":"425-445","source":"Crossref","is-referenced-by-count":18,"title":["Hallucination Detection in Large Language Models with Metamorphic Relations"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-3482-7667","authenticated-orcid":false,"given":"Borui","family":"Yang","sequence":"first","affiliation":[{"name":"King's College London, London, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9319-3483","authenticated-orcid":false,"given":"Md Afif","family":"Al Mamun","sequence":"additional","affiliation":[{"name":"University of Calgary, Calgary, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0481-7264","authenticated-orcid":false,"given":"Jie M.","family":"Zhang","sequence":"additional","affiliation":[{"name":"King's College London, London, United Kingdom"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1376-095X","authenticated-orcid":false,"given":"Gias","family":"Uddin","sequence":"additional","affiliation":[{"name":"York University, Toronto, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,6,19]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2022. OpenAI ChatGPT. 
https:\/\/openai.com\/index\/chatgpt\/"},{"key":"e_1_2_1_2_1","first-page":"9649","article-title":"Boxe: A box embedding model for knowledge base completion","volume":"33","author":"Abboud Ralph","year":"2020","unstructured":"Ralph Abboud, Ismail Ceylan, Thomas Lukasiewicz, and Tommaso Salvatori. 2020. Boxe: A box embedding model for knowledge base completion. Advances in Neural Information Processing Systems, 33 (2020), 9649\u20139661.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","unstructured":"Ayush Agrawal Mirac Suzgun Lester Mackey and Adam Tauman Kalai. 2023. Do Language Models Know When They\u2019re Hallucinating References? arXiv preprint arXiv:2305.18248 https:\/\/doi.org\/10.48550\/arXiv.2305.18248 10.48550\/arXiv.2305.18248","DOI":"10.48550\/arXiv.2305.18248"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","unstructured":"Nicholas Asher and Swarnadeep Bhar. 2024. Strong hallucinations from negation and how to fix them. arXiv preprint arXiv:2402.10543 https:\/\/doi.org\/10.48550\/arXiv.2402.10543 10.48550\/arXiv.2402.10543","DOI":"10.48550\/arXiv.2402.10543"},{"key":"e_1_2_1_5_1","volume-title":"Generating Fact Checking Explanations. In Annual Meeting of the Association for Computational Linguistics. https:\/\/api.semanticscholar.org\/CorpusID:215744944","author":"Atanasova Pepa","year":"2020","unstructured":"Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, and Isabelle Augenstein. 2020. Generating Fact Checking Explanations. In Annual Meeting of the Association for Computational Linguistics. https:\/\/api.semanticscholar.org\/CorpusID:215744944"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1909.03242"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","unstructured":"Jifan Chen Grace Kim Aniruddh Sriram Greg Durrett and Eunsol Choi. 2023. Complex claim verification with evidence retrieved in the wild. 
arXiv preprint arXiv:2305.11859 https:\/\/doi.org\/10.48550\/arXiv.2305.11859 10.48550\/arXiv.2305.11859","DOI":"10.48550\/arXiv.2305.11859"},{"key":"e_1_2_1_8_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3143561","article-title":"Metamorphic testing: A review of challenges and opportunities","volume":"51","author":"Chen Tsong Yueh","year":"2018","unstructured":"Tsong Yueh Chen, Fei-Ching Kuo, Huai Liu, Pak-Lok Poon, Dave Towey, TH Tse, and Zhi Quan Zhou. 2018. Metamorphic testing: A review of challenges and opportunities. ACM Computing Surveys (CSUR), 51, 1 (2018), 1\u201327.","journal-title":"ACM Computing Surveys (CSUR)"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","unstructured":"I Chern Steffi Chern Shiqi Chen Weizhe Yuan Kehua Feng Chunting Zhou Junxian He Graham Neubig and Pengfei Liu. 2023. FacTool: Factuality Detection in Generative AI\u2013A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios. arXiv preprint arXiv:2307.13528 https:\/\/doi.org\/10.48550\/arXiv.2307.13528 10.48550\/arXiv.2307.13528","DOI":"10.48550\/arXiv.2307.13528"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","unstructured":"Roi Cohen May Hamri Mor Geva and Amir Globerson. 2023. Lm vs lm: Detecting factual errors via cross examination. arXiv preprint arXiv:2305.13281 https:\/\/doi.org\/10.48550\/arXiv.2305.13281 10.48550\/arXiv.2305.13281","DOI":"10.48550\/arXiv.2305.13281"},{"key":"e_1_2_1_11_1","volume-title":"Findings of the Association for Computational Linguistics ACL","author":"Dhuliawala Shehzaad","year":"2024","unstructured":"Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, and Jason Weston. 2024. Chain-of-Verification Reduces Hallucination in Large Language Models. In Findings of the Association for Computational Linguistics ACL 2024, Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computational Linguistics, Bangkok, Thailand and virtual meeting. 3563\u20133578. 
https:\/\/aclanthology.org\/2024.findings-acl.212"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","unstructured":"Christopher Foster Abhishek Gulati Mark Harman Inna Harper Ke Mao Jillian Ritchey Herv\u00e9 Robert and Shubho Sengupta. 2025. Mutation-Guided LLM-based Test Generation at Meta. arXiv preprint arXiv:2501.12862 https:\/\/doi.org\/10.48550\/arXiv.2501.12862 10.48550\/arXiv.2501.12862","DOI":"10.48550\/arXiv.2501.12862"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","unstructured":"Boris A. Galitsky. 2023. Truth-O-Meter: Collaborating with LLM in Fighting its Hallucinations. Preprints July https:\/\/doi.org\/10.20944\/preprints202307.1723.v1 10.20944\/preprints202307.1723.v1","DOI":"10.20944\/preprints202307.1723.v1"},{"key":"e_1_2_1_14_1","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1162\/tacl_a_00454","article-title":"A survey on automated fact-checking","volume":"10","author":"Guo Zhijiang","year":"2022","unstructured":"Zhijiang Guo, Michael Schlichtkrull, and Andreas Vlachos. 2022. A survey on automated fact-checking. Transactions of the Association for Computational Linguistics, 10 (2022), 178\u2013206.","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"e_1_2_1_15_1","volume-title":"A Richly Annotated Corpus for Different Tasks in Automated Fact-Checking. ArXiv, abs\/1911.01214","author":"Hanselowski Andreas","year":"2019","unstructured":"Andreas Hanselowski, Christian Stab, Claudia Schulz, Zile Li, and Iryna Gurevych. 2019. A Richly Annotated Corpus for Different Tasks in Automated Fact-Checking. ArXiv, abs\/1911.01214 (2019), https:\/\/api.semanticscholar.org\/CorpusID:207779874"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","unstructured":"Lei Huang Weijiang Yu Weitao Ma Weihong Zhong Zhangyin Feng Haotian Wang Qianglong Chen Weihua Peng Xiaocheng Feng and Bing Qin. 2023. A survey on hallucination in large language models: Principles taxonomy challenges and open questions. 
arXiv preprint arXiv:2311.05232 https:\/\/doi.org\/10.1145\/3703155 10.1145\/3703155","DOI":"10.1145\/3703155"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","unstructured":"Siqing Huo Negar Arabzadeh and Charles LA Clarke. 2023. Retrieving supporting evidence for llms generated answers. arXiv preprint arXiv:2306.13781 https:\/\/doi.org\/10.48550\/arXiv.2306.13781 10.48550\/arXiv.2306.13781","DOI":"10.48550\/arXiv.2306.13781"},{"key":"e_1_2_1_18_1","volume-title":"METAL: Metamorphic Testing Framework for Analyzing Large-Language Model Qualities. In 2024 IEEE Conference on Software Testing, Verification and Validation (ICST). 117\u2013128","author":"Hyun Sangwon","year":"2024","unstructured":"Sangwon Hyun, Mingyu Guo, and M. Ali Babar. 2024. METAL: Metamorphic Testing Framework for Analyzing Large-Language Model Qualities. In 2024 IEEE Conference on Software Testing, Verification and Validation (ICST). 117\u2013128. https:\/\/doi.org\/10.1109\/ICST60714.2024.00019 10.1109\/ICST60714.2024.00019"},{"key":"e_1_2_1_19_1","first-page":"1","article-title":"Survey of hallucination in natural language generation","volume":"55","author":"Ji Ziwei","year":"2023","unstructured":"Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of hallucination in natural language generation. Comput. Surveys, 55, 12 (2023), 1\u201338.","journal-title":"Comput. Surveys"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","unstructured":"Saurav Kadavath Tom Conerly Amanda Askell Tom Henighan Dawn Drain Ethan Perez Nicholas Schiefer Zac Hatfield-Dodds Nova DasSarma and Eli Tran-Johnson. 2022. Language models (mostly) know what they know. 
arXiv preprint arXiv:2207.05221 https:\/\/doi.org\/10.48550\/arXiv.2207.05221 10.48550\/arXiv.2207.05221","DOI":"10.48550\/arXiv.2207.05221"},{"key":"e_1_2_1_21_1","volume-title":"Akari Asai, Xinyan Yu, Dragomir Radev, Noah A Smith, Yejin Choi, and Kentaro Inui.","author":"Kasai Jungo","year":"2024","unstructured":"Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Akari Asai, Xinyan Yu, Dragomir Radev, Noah A Smith, Yejin Choi, and Kentaro Inui. 2024. REALTIME QA: what\u2019s the answer right now? Advances in Neural Information Processing Systems, 36 (2024)."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","unstructured":"Wojciech Kry\u015bci\u0144ski Bryan McCann Caiming Xiong and Richard Socher. 2019. Evaluating the factual consistency of abstractive text summarization. arXiv preprint arXiv:1910.12840 https:\/\/doi.org\/10.48550\/arXiv.1910.12840 10.48550\/arXiv.1910.12840","DOI":"10.48550\/arXiv.1910.12840"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","unstructured":"Vivian Lai Alison Smith-Renner Ke Zhang Ruijia Cheng Wenjuan Zhang Joel Tetreault and Alejandro Jaimes. 2022. An exploration of post-editing effectiveness in text summarization. arXiv preprint arXiv:2206.06383 https:\/\/doi.org\/10.48550\/arXiv.2206.06383 10.48550\/arXiv.2206.06383","DOI":"10.48550\/arXiv.2206.06383"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2305.11747"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","unstructured":"Ningke Li Yuekang Li Yi Liu Ling Shi Kailong Wang and Haoyu Wang. 2024. HalluVault: A Novel Logic Programming-aided Metamorphic Testing Framework for Detecting Fact-Conflicting Hallucinations in Large Language Models. arXiv preprint arXiv:2405.00648 https:\/\/doi.org\/10.48550\/arXiv.2405.00648 10.48550\/arXiv.2405.00648","DOI":"10.48550\/arXiv.2405.00648"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","unstructured":"Wei Li Wenhao Wu Moye Chen Jiachen Liu Xinyan Xiao and Hua Wu. 2022. 
Faithfulness in natural language generation: A systematic survey of analysis evaluation and optimization methods. arXiv preprint arXiv:2203.05227 https:\/\/doi.org\/10.48550\/arXiv.2203.05227 10.48550\/arXiv.2203.05227","DOI":"10.48550\/arXiv.2203.05227"},{"key":"e_1_2_1_27_1","article-title":"A survey of knowledge graph reasoning on graph types: Static, dynamic, and multi-modal","author":"Liang Ke","year":"2024","unstructured":"Ke Liang, Lingyuan Meng, Meng Liu, Yue Liu, Wenxuan Tu, Siwei Wang, Sihang Zhou, Xinwang Liu, Fuchun Sun, and Kunlun He. 2024. A survey of knowledge graph reasoning on graph types: Static, dynamic, and multi-modal. IEEE Transactions on Pattern Analysis and Machine Intelligence.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2109.07958"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","unstructured":"Zheheng Luo Qianqian Xie and Sophia Ananiadou. 2023. Chatgpt as a factual inconsistency evaluator for text summarization. arXiv preprint arXiv:2303.15621 https:\/\/doi.org\/10.48550\/arXiv.2303.15621 10.48550\/arXiv.2303.15621","DOI":"10.48550\/arXiv.2303.15621"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","unstructured":"Potsawee Manakul Adian Liusie and Mark JF Gales. 2023. Selfcheckgpt: Zero-resource black-box hallucination detection for generative large language models. arXiv preprint arXiv:2303.08896 https:\/\/doi.org\/10.48550\/arXiv.2303.08896 10.48550\/arXiv.2303.08896","DOI":"10.48550\/arXiv.2303.08896"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2305.14251"},{"key":"e_1_2_1_32_1","volume-title":"Annual Meeting of the Association for Computational Linguistics. https:\/\/api.semanticscholar.org\/CorpusID:196176000","author":"Moon Seungwhan","year":"2019","unstructured":"Seungwhan Moon, Pararth Shah, Anuj Kumar, and Rajen Subba. 2019. 
OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs. In Annual Meeting of the Association for Computational Linguistics. https:\/\/api.semanticscholar.org\/CorpusID:196176000"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2104.04402"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2303.08774"},{"key":"e_1_2_1_35_1","article-title":"An Empirical Study of the Non-Determinism of ChatGPT in Code Generation","volume":"34","author":"Ouyang Shuyin","year":"2025","unstructured":"Shuyin Ouyang, Jie M. Zhang, Mark Harman, and Meng Wang. 2025. An Empirical Study of the Non-Determinism of ChatGPT in Code Generation. ACM Trans. Softw. Eng. Methodol., 34, 2 (2025), Article 42, Jan., 28 pages. issn:1049-331X","journal-title":"ACM Trans. Softw. Eng. Methodol."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","unstructured":"Libo Qin Qiguang Chen Xiachong Feng Yang Wu Yongheng Zhang Yinghui Li Min Li Wanxiang Che and Philip S Yu. 2024. Large language models meet nlp: A survey. arXiv preprint arXiv:2405.12819 https:\/\/doi.org\/10.48550\/arXiv.2405.12819 10.48550\/arXiv.2405.12819","DOI":"10.48550\/arXiv.2405.12819"},{"key":"e_1_2_1_37_1","first-page":"19716","article-title":"Beta embeddings for multi-hop logical reasoning in knowledge graphs","volume":"33","author":"Ren Hongyu","year":"2020","unstructured":"Hongyu Ren and Jure Leskovec. 2020. Beta embeddings for multi-hop logical reasoning in knowledge graphs. Advances in Neural Information Processing Systems, 33 (2020), 19716\u201319726.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","unstructured":"Abigail See Peter J Liu and Christopher D Manning. 2017. Get to the point: Summarization with pointer-generator networks. 
arXiv preprint arXiv:1704.04368 https:\/\/doi.org\/10.48550\/arXiv.1704.04368 10.48550\/arXiv.1704.04368","DOI":"10.48550\/arXiv.1704.04368"},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the ACM\/IEEE 42nd international conference on software engineering. 974\u2013985","author":"Sun Zeyu","year":"2020","unstructured":"Zeyu Sun, Jie M Zhang, Mark Harman, Mike Papadakis, and Lu Zhang. 2020. Automatic testing and improvement of machine translation. In Proceedings of the ACM\/IEEE 42nd international conference on software engineering. 974\u2013985."},{"key":"e_1_2_1_40_1","doi-asserted-by":"crossref","first-page":"100159","DOI":"10.1016\/j.jnlest.2022.100159","article-title":"Knowledge graph and knowledge reasoning: A systematic review","volume":"20","author":"Tian Ling","year":"2022","unstructured":"Ling Tian, Xue Zhou, Yan-Ping Wu, Wang-Tao Zhou, Jin-Hao Zhang, and Tian-Shu Zhang. 2022. Knowledge graph and knowledge reasoning: A systematic review. Journal of Electronic Science and Technology, 20, 2 (2022), 100159.","journal-title":"Journal of Electronic Science and Technology"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","unstructured":"Neeraj Varshney Satyam Raj Venkatesh Mishra Agneet Chatterjee Ritika Sarkar Amir Saeidi and Chitta Baral. 2024. Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation. arXiv preprint arXiv:2406.05494 https:\/\/doi.org\/10.48550\/arXiv.2406.05494 10.48550\/arXiv.2406.05494","DOI":"10.48550\/arXiv.2406.05494"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","unstructured":"Neeraj Varshney Wenlin Yao Hongming Zhang Jianshu Chen and Dong Yu. 2023. A stitch in time saves nine: Detecting and mitigating hallucinations of llms by validating low-confidence generation. 
arXiv preprint arXiv:2307.03987 https:\/\/doi.org\/10.48550\/arXiv.2307.03987 10.48550\/arXiv.2307.03987","DOI":"10.48550\/arXiv.2307.03987"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2310.03214"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","unstructured":"Miao Xiong Zhiyuan Hu Xinyang Lu Yifei Li Jie Fu Junxian He and Bryan Hooi. 2023. Can llms express their uncertainty? an empirical evaluation of confidence elicitation in llms. arXiv preprint arXiv:2306.13063 https:\/\/doi.org\/10.48550\/arXiv.2306.13063 10.48550\/arXiv.2306.13063","DOI":"10.48550\/arXiv.2306.13063"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","unstructured":"Ziwei Xu Sanjay Jain and Mohan Kankanhalli. 2024. Hallucination is inevitable: An innate limitation of large language models. arXiv preprint arXiv:2401.11817 https:\/\/doi.org\/10.48550\/arXiv.2401.11817 10.48550\/arXiv.2401.11817","DOI":"10.48550\/arXiv.2401.11817"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","unstructured":"Zhilin Yang Peng Qi Saizheng Zhang Yoshua Bengio William W Cohen Ruslan Salakhutdinov and Christopher D Manning. 2018. HotpotQA: A dataset for diverse explainable multi-hop question answering. arXiv preprint arXiv:1809.09600 https:\/\/doi.org\/10.48550\/arXiv.1809.09600 10.48550\/arXiv.1809.09600","DOI":"10.48550\/arXiv.1809.09600"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","unstructured":"Jia-Yu Yao Kun-Peng Ning Zhen-Hui Liu Mu-Nan Ning and Li Yuan. 2023. Llm lies: Hallucinations are not bugs but features as adversarial examples. arXiv preprint arXiv:2310.01469 https:\/\/doi.org\/10.48550\/arXiv.2310.01469 10.48550\/arXiv.2310.01469","DOI":"10.48550\/arXiv.2310.01469"},{"key":"e_1_2_1_48_1","doi-asserted-by":"crossref","unstructured":"Yifan Yao Jinhao Duan Kaidi Xu Yuanfang Cai Zhibo Sun and Yue Zhang. 2024. A survey on large language model (llm) security and privacy: The good the bad and the ugly. 
High-Confidence Computing 100211.","DOI":"10.1016\/j.hcc.2024.100211"},{"key":"e_1_2_1_49_1","unstructured":"Chan Xing Yu Chan Si Yu James and Poh Hui-Li Phyllis David. [n. d.]. CAN LLMS HAVE A FEVER? INVESTIGATING THE EFFECTS OF TEMPERATURE ON LLM SECURITY."},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the 29th ACM\/IEEE international conference on Automated software engineering. 701\u2013712","author":"Zhang Jie","year":"2014","unstructured":"Jie Zhang, Junjie Chen, Dan Hao, Yingfei Xiong, Bing Xie, Lu Zhang, and Hong Mei. 2014. Search-based inference of polynomial metamorphic relations. In Proceedings of the 29th ACM\/IEEE international conference on Automated software engineering. 701\u2013712."},{"key":"e_1_2_1_51_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TSE.2019.2962027","article-title":"Machine Learning Testing: Survey, Landscapes and Horizons","volume":"48","author":"Zhang Jie M.","year":"2022","unstructured":"Jie M. Zhang, Mark Harman, Lei Ma, and Yang Liu. 2022. Machine Learning Testing: Survey, Landscapes and Horizons. IEEE Transactions on Software Engineering, 48, 1 (2022), 1\u201336.","journal-title":"IEEE Transactions on Software Engineering"},{"key":"e_1_2_1_52_1","volume-title":"Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1\u201316","author":"Zhang Xiaoyu","year":"2023","unstructured":"Xiaoyu Zhang, Jianping Li, Po-Wei Chi, Senthil Chandrasegaran, and Kwan-Liu Ma. 2023. ConceptEVA: concept-based interactive exploration and customization of document summaries. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1\u201316."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","unstructured":"Yue Zhang Yafu Li Leyang Cui Deng Cai Lemao Liu Tingchen Fu Xinting Huang Enbo Zhao Yu Zhang and Yulong Chen. 2023. Siren\u2019s song in the AI ocean: a survey on hallucination in large language models. 
arXiv preprint arXiv:2309.01219 https:\/\/doi.org\/10.48550\/arXiv.2309.01219 10.48550\/arXiv.2309.01219","DOI":"10.48550\/arXiv.2309.01219"},{"key":"e_1_2_1_54_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence. 33","author":"Zhou Zili","year":"2019","unstructured":"Zili Zhou, Shaowu Liu, Guandong Xu, and Wu Zhang. 2019. On completing sparse knowledge base with transitive relation embedding. In Proceedings of the AAAI Conference on Artificial Intelligence. 33, 3125\u20133132."}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3715735","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:32:01Z","timestamp":1750347121000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3715735"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,19]]},"references-count":54,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2025,6,19]]}},"alternative-id":["10.1145\/3715735"],"URL":"https:\/\/doi.org\/10.1145\/3715735","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,19]]}}}