{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T15:28:36Z","timestamp":1777390116267,"version":"3.51.4"},"reference-count":65,"publisher":"Association for Computing Machinery (ACM)","issue":"OOPSLA2","license":[{"start":{"date-parts":[[2024,10,8]],"date-time":"2024-10-08T00:00:00Z","timestamp":1728345600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2024,10,8]]},"abstract":"<jats:p>\n                    Large language models (LLMs) have revolutionized language processing, but face critical challenges with security, privacy, and generating hallucinations \u2014 coherent but factually inaccurate outputs. A major issue is fact-conflicting hallucination (FCH), where LLMs produce content contradicting ground truth facts. Addressing FCH is difficult due to two key challenges:\n                    <jats:bold>1)<\/jats:bold>\n                    Automatically constructing and updating benchmark datasets is hard, as existing methods rely on manually curated static benchmarks that cannot cover the broad, evolving spectrum of FCH cases.\n                    <jats:bold>2)<\/jats:bold>\n                    Validating the reasoning behind LLM outputs is inherently difficult, especially for complex logical relations.\n                  <\/jats:p>\n                  <jats:p>\n                    To tackle these challenges, we introduce a novel logic-programming-aided metamorphic testing technique for FCH detection. We develop an extensive and extensible framework that constructs a comprehensive factual knowledge base by crawling sources like Wikipedia, seamlessly integrated into D\n                    <jats:sc>rowzee<\/jats:sc>\n                    . 
Using logical reasoning rules, we transform and augment this knowledge into a large set of test cases with ground truth answers. We test LLMs on these cases through template-based prompts, requiring them to provide reasoned answers. To validate their reasoning, we propose two semantic-aware oracles that assess the similarity between the semantic structures of the LLM answers and ground truth. Our approach automatically generates useful test cases and identifies hallucinations across six LLMs within nine domains, with hallucination rates ranging from 24.7% to 59.8%. Key findings include LLMs struggling with temporal concepts, out-of-distribution knowledge, and lack of logical reasoning capabilities. The results show that logic-based test cases generated by D\n                    <jats:sc>rowzee<\/jats:sc>\n                    effectively trigger and detect hallucinations. To further mitigate the identified FCHs, we explored model editing techniques, which proved effective on a small scale (with edits to fewer than 1000 knowledge pieces). 
Our findings emphasize the need for continued community efforts to detect and mitigate model hallucinations.\n                  <\/jats:p>","DOI":"10.1145\/3689776","type":"journal-article","created":{"date-parts":[[2024,10,8]],"date-time":"2024-10-08T03:23:04Z","timestamp":1728357784000},"page":"1843-1872","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["Drowzee: Metamorphic Testing for Fact-Conflicting Hallucination Detection in Large Language Models"],"prefix":"10.1145","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0270-1372","authenticated-orcid":false,"given":"Ningke","family":"Li","sequence":"first","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4382-0757","authenticated-orcid":false,"given":"Yuekang","family":"Li","sequence":"additional","affiliation":[{"name":"The University of New South Wales, Sydney, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4978-127X","authenticated-orcid":false,"given":"Yi","family":"Liu","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2023-0247","authenticated-orcid":false,"given":"Ling","family":"Shi","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3977-6573","authenticated-orcid":false,"given":"Kailong","family":"Wang","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1100-8633","authenticated-orcid":false,"given":"Haoyu","family":"Wang","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,10,8]]},"reference":[{"key":"e_1_3_1_2_1","unstructured":"Ralph Abboud \u0130smail \u0130lkan Ceylan Thomas Lukasiewicz and Tommaso Salvatori. 2020. BoxE: A Box Embedding Model for Knowledge Base Completion. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020 NeurIPS 2020 December 6-12 2020 virtual Hugo Larochelle Marc\u2019Aurelio Ranzato Raia Hadsell Maria-Florina Balcan and Hsuan-Tien Lin (Eds.). https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/6dbbe6abe5f14af882ff977fc3f35501-Abstract.html"},{"key":"e_1_3_1_3_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P15-1034"},{"key":"e_1_3_1_4_1","unstructured":"Giuseppe Attardi. 2015. WikiExtractor. https:\/\/github.com\/attardi\/wikiextractor."},{"key":"e_1_3_1_5_1","doi-asserted-by":"crossref","first-page":"722","DOI":"10.1007\/978-3-540-76298-0_52","volume-title":"The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11-15, 2007 (Lecture Notes in Computer Science, Vol. 4825)","author":"Auer S\u00f6ren","year":"2007","unstructured":"S\u00f6ren Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary G. Ives. 2007. DBpedia: A Nucleus for a Web of Open Data. In The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11-15, 2007 (Lecture Notes in Computer Science, Vol. 
4825), Karl Aberer, Key-Sun Choi, Natasha Fridman Noy, Dean Allemang, Kyung-Il Lee, Lyndon J. B. Nixon, Jennifer Golbeck, Peter Mika, Diana Maynard, Riichiro Mizoguchi, Guus Schreiber, and Philippe Cudr\u00e9-Mauroux (Eds.). Springer, 722-735. https:\/\/doi.org\/10.1007\/978-3-540-76298-0_52 10.1007\/978-3-540-76298-0_52"},{"key":"e_1_3_1_6_1","doi-asserted-by":"crossref","first-page":"967","DOI":"10.18653\/v1\/2023.findings-emnlp.68","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2023","author":"Azaria Amos","year":"2023","unstructured":"Amos Azaria and Tom Mitchell. 2023. The Internal State of an LLM Knows When It\u2019s Lying. In Findings of the Association for Computational Linguistics: EMNLP 2023, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, Singapore, 967-976. https:\/\/doi.org\/10.18653\/v1\/2023.findings-emnlp.68 10.18653\/v1\/2023.findings-emnlp.68"},{"key":"e_1_3_1_7_1","first-page":"1962","volume-title":"Proceedings of the 22nd National Conference on Artificial Intelligence - Volume 2 (Vancouver, British Columbia, Canada) (AAAI\u201907)","author":"Bollacker Kurt","year":"2007","unstructured":"Kurt Bollacker, Robert Cook, and Patrick Tufts. 2007. Freebase: A Shared Database of Structured General Human Knowledge. In Proceedings of the 22nd National Conference on Artificial Intelligence - Volume 2 (Vancouver, British Columbia, Canada) (AAAI\u201907). AAAI Press, 1962-1963."},{"key":"e_1_3_1_8_1","unstructured":"Lichang Chen Shiyang Li Jun Yan Hai Wang Kalpa Gunaratna Vikas Yadav Zheng Tang Vijay Srinivasan Tianyi Zhou Heng Huang and Hongxia Jin. 2024. AlpaGasus: Training a Better Alpaca with Fewer Data. In The Twelfth International Conference on Learning Representations ICLR 2024 Vienna Austria May 7-11 2024. OpenReview.net. 
https:\/\/openreview.net\/forum?id=FdVXgSJhvz"},{"key":"e_1_3_1_9_1","first-page":"2086","volume-title":"Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)","author":"Dong Zican","year":"2024","unstructured":"Zican Dong, Tianyi Tang, Junyi Li, Wayne Xin Zhao, and Ji-Rong Wen. 2024. BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue (Eds.). ELRA and ICCL, Torino, Italia, 2086-2099. https:\/\/aclanthology.org\/2024.lrec-main.188"},{"key":"e_1_3_1_10_1","unstructured":"Mohamed Elaraby Mengyin Lu Jacob Dunn Xueying Zhang Yu Wang and Shizhu Liu. 2023. Halo: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models. CoRR abs\/2308.11764 (2023). https:\/\/doi.org\/10.48550\/ARXIV.2308.11764 10.48550\/ARXIV.2308.11764 arXiv:2308.11764"},{"key":"e_1_3_1_11_1","unstructured":"GitHub. 2024. Drowzee. https:\/\/github.com\/security-pride\/Drowzee."},{"key":"e_1_3_1_12_1","unstructured":"Zhibin Gou Zhihong Shao Yeyun Gong Yelong Shen Yujiu Yang Nan Duan and Weizhu Chen. 2024. CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing. In The Twelfth International Conference on Learning Representations ICLR 2024 Vienna Austria May 7-11 2024. OpenReview.net. https:\/\/openreview.net\/forum?id=Sx038qxjek"},{"key":"e_1_3_1_13_1","unstructured":"Simeng Han Hailey Schoelkopf Yilun Zhao Zhenting Qi Martin Riddell Luke Benson Lucy Sun Ekaterina Zubova Yujie Qiao Matthew Burtell David Peng Jonathan Fan Yixin Liu Brian Wong Malcolm Sailor Ansong Ni Linyong Nan Jungo Kasai Tao Yu Rui Zhang Shafiq R. Joty Alexander R. 
Fabbri Wojciech Kryscinski Xi Victoria Lin Caiming Xiong and Dragomir Radev. 2022. FOLIO: Natural Language Reasoning with First-Order Logic. CoRR abs\/2209.00840 (2022). https:\/\/doi.org\/10.48550\/ARXIV.2209.00840 10.48550\/ARXIV.2209.00840 arXiv:2209.00840"},{"key":"e_1_3_1_14_1","unstructured":"hiyouga. 2023. FastEdit: Editing LLMs within 10 Seconds. https:\/\/github.com\/hiyouga\/FastEdit."},{"key":"e_1_3_1_15_1","unstructured":"Xinyi Hou Yanjie Zhao Yue Liu Zhou Yang Kailong Wang Li Li Xiapu Luo David Lo John C. Grundy and Haoyu Wang. 2023. Large Language Models for Software Engineering: A Systematic Literature Review. CoRR abs\/2308.10620 (2023). https:\/\/doi.org\/10.48550\/ARXIV.2308.10620 10.48550\/ARXIV.2308.10620 arXiv:2308.10620"},{"key":"e_1_3_1_16_1","unstructured":"Lei Huang Weijiang Yu Weitao Ma Weihong Zhong Zhangyin Feng Haotian Wang Qianglong Chen Weihua Peng Xiaocheng Feng Bing Qin and Ting Liu. 2023. A Survey on Hallucination in Large Language Models: Principles Taxonomy Challenges and Open Questions. CoRR abs\/2311.05232 (2023). https:\/\/doi.org\/10.48550\/ARXIV.2311.05232 10.48550\/ARXIV.2311.05232 arXiv:2311.05232"},{"key":"e_1_3_1_17_1","unstructured":"Albert Q. Jiang Alexandre Sablayrolles Arthur Mensch Chris Bamford Devendra Singh Chaplot Diego de Las Casas Florian Bressand Gianna Lengyel Guillaume Lample Lucile Saulnier L\u00e9lio Renard Lavaud Marie-Anne Lachaux Pierre Stock Teven Le Scao Thibaut Lavril Thomas Wang Timoth\u00e9e Lacroix and William El Sayed. 2023. Mistral 7B. CoRR abs\/2310.06825 (2023). https:\/\/doi.org\/10.48550\/ARXIV.2310.06825 10.48550\/ARXIV.2310.06825 arXiv:2310.06825"},{"key":"e_1_3_1_18_1","unstructured":"Albert Q. 
Jiang Alexandre Sablayrolles Antoine Roux Arthur Mensch Blanche Savary Chris Bamford Devendra Singh Chaplot Diego de Las Casas Emma Bou Hanna Florian Bressand Gianna Lengyel Guillaume Bour Guillaume Lample L\u00e9lio Renard Lavaud Lucile Saulnier Marie-Anne Lachaux Pierre Stock Sandeep Subramanian Sophia Yang Szymon Antoniak Teven Le Scao Th\u00e9ophile Gervet Thibaut Lavril Thomas Wang Timoth\u00e9e Lacroix and William El Sayed. 2024. Mixtral of Experts. CoRR abs\/2401.04088 (2024). https:\/\/doi.org\/10.48550\/ARXIV.2401.04088 10.48550\/ARXIV.2401.04088 arXiv:2401.04088"},{"key":"e_1_3_1_19_1","unstructured":"Jean Kaddour Joshua Harris Maximilian Mozes Herbie Bradley Roberta Raileanu and Robert McHardy. 2023. Challenges and Applications of Large Language Models. CoRR abs\/2307.10169 (2023). https:\/\/doi.org\/10.48550\/ARXIV.2307.10169 10.48550\/ARXIV.2307.10169 arXiv:2307.10169"},{"key":"e_1_3_1_20_1","unstructured":"Haoqiang Kang and Xiao-Yang Liu. 2023. Deficiency of Large Language Models in Finance: An Empirical Examination of Hallucination. CoRR abs\/2311.15548 (2023). https:\/\/doi.org\/10.48550\/ARXIV.2311.15548 10.48550\/ARXIV.2311.15548 arXiv:2311.15548"},{"key":"e_1_3_1_21_1","volume-title":"A Philosophical Essay on Probabilities","author":"Laplace Pierre-Simon","year":"1951","unstructured":"Pierre-Simon Laplace. 1951. A Philosophical Essay on Probabilities. Dover Publications, New York. Originally published in 1814 as \u201cEssai Philosophique sur les Probabilit\u00e9s\u201d."},{"key":"e_1_3_1_22_1","first-page":"6449","volume-title":"Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023","author":"Li Junyi","year":"2023","unstructured":"Junyi Li, Xiaoxue Cheng, Xin Zhao, Jian-Yun Nie, and Ji-Rong Wen. 2023a. HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models. 
In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, 6449-6464. https:\/\/doi.org\/10.18653\/V1\/2023.EMNLP-MAIN.397 10.18653\/V1\/2023.EMNLP-MAIN.397"},{"key":"e_1_3_1_23_1","unstructured":"Kenneth Li Oam Patel Fernanda B. Vi\u00e9gas Hanspeter Pfister and Martin Wattenberg. 2023b. Inference-Time Intervention: Eliciting Truthful Answers from a Language Model. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023 NeurIPS 2023 New Orleans LA USA December 10 - 16 2023 Alice Oh Tristan Naumann Amir Globerson Kate Saenko Moritz Hardt and Sergey Levine (Eds.). http:\/\/papers.nips.cc\/paper_files\/paper\/2023\/hash\/81b8390039b7302c909cb769f8b6cd93-Abstract-Conference.html"},{"key":"e_1_3_1_24_1","unstructured":"Ke Liang Lingyuan Meng Meng Liu Yue Liu Wenxuan Tu Siwei Wang Sihang Zhou Xinwang Liu and Fuchun Sun. 2022. Reasoning over Different Types of Knowledge Graphs: Static Temporal and Multi-Modal. CoRR abs\/2212.05767 (2022). https:\/\/doi.org\/10.48550\/ARXIV.2212.05767 10.48550\/ARXIV.2212.05767 arXiv:2212.05767"},{"key":"e_1_3_1_25_1","unstructured":"Hunter Lightman Vineet Kosaraju Yuri Burda Harrison Edwards Bowen Baker Teddy Lee Jan Leike John Schulman Ilya Sutskever and Karl Cobbe. 2024. Let\u2019s Verify Step by Step. In The Twelfth International Conference on Learning Representations ICLR 2024 Vienna Austria May 7-11 2024. OpenReview.net. https:\/\/openreview.net\/forum?id=v8L0pN6EOi"},{"key":"e_1_3_1_26_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.229"},{"key":"e_1_3_1_27_1","unstructured":"Kevin Meng David Bau Alex Andonian and Yonatan Belinkov. 2022. Locating and Editing Factual Associations in GPT. 
In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022 NeurIPS 2022 New Orleans LA USA November 28 - December 9 2022 Sanmi Koyejo S. Mohamed A. Agarwal Danielle Belgrave K. Cho and A. Oh (Eds.). http:\/\/papers.nips.cc\/paper_files\/paper\/2022\/hash\/6f1d43d5a82a37e89b0665b33bf3a182-Abstract-Conference.html"},{"key":"e_1_3_1_28_1","unstructured":"Kevin Meng Arnab Sen Sharma Alex J. Andonian Yonatan Belinkov and David Bau. 2023. Mass-Editing Memory in a Transformer. In The Eleventh International Conference on Learning Representations ICLR 2023 Kigali Rwanda May 1-5 2023. OpenReview.net. https:\/\/openreview.net\/forum?id=MkbcAHIYgyS"},{"key":"e_1_3_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/219717.219748"},{"key":"e_1_3_1_30_1","first-page":"12076","volume-title":"Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023","author":"Min Sewon","year":"2023","unstructured":"Sewon Min, Kalpesh Krishna, Xinxi Lyu, Mike Lewis, Wen-tau Yih, Pang Wei Koh, Mohit Iyyer, Luke Zettlemoyer, and Hannaneh Hajishirzi. 2023. FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, 12076-12100. https:\/\/doi.org\/10.18653\/V1\/2023.EMNLP-MAIN.741 10.18653\/V1\/2023.EMNLP-MAIN.741"},{"key":"e_1_3_1_31_1","first-page":"845","volume-title":"Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers","author":"Moon Seungwhan","year":"2019","unstructured":"Seungwhan Moon, Pararth Shah, Anuj Kumar, and Rajen Subba. 2019. 
OpenDialKG: Explainable Conversational Reasoning with Attention-based Walks over Knowledge Graphs. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers, Anna Korhonen, David R. Traum, and Lluis M\u00e0rquez (Eds.). Association for Computational Linguistics, 845-854. https:\/\/doi.org\/10.18653\/V1\/P19-1081 10.18653\/V1\/P19-1081"},{"key":"e_1_3_1_32_1","doi-asserted-by":"crossref","first-page":"5153","DOI":"10.18653\/v1\/2023.emnlp-main.313","volume-title":"Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing","author":"Olausson Theo","year":"2023","unstructured":"Theo Olausson, Alex Gu, Ben Lipkin, Cedegao Zhang, Armando Solar-Lezama, Joshua Tenenbaum, and Roger Levy. 2023. LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, Singapore, 5153-5176. https:\/\/doi.org\/10.18653\/v1\/2023.emnlp-main.313 10.18653\/v1\/2023.emnlp-main.313"},{"key":"e_1_3_1_33_1","unstructured":"OpenAI. 2023. GPT-4 Technical Report. CoRR abs\/2303.08774 (2023). https:\/\/doi.org\/10.48550\/ARXIV.2303.08774 10.48550\/ARXIV.2303.08774 arXiv:2303.08774"},{"key":"e_1_3_1_34_1","first-page":"314","volume-title":"Proceedings of the 27th Conference on Computational Natural Language Learning, CoNLL 2023, Singapore, December 6-7, 2023","author":"Pal Ankit","year":"2023","unstructured":"Ankit Pal, Logesh Kumar Umapathi, and Malaikannan Sankarasubbu. 2023. Med-HALT: Medical Domain Hallucination Test for Large Language Models. In Proceedings of the 27th Conference on Computational Natural Language Learning, CoNLL 2023, Singapore, December 6-7, 2023, Jing Jiang, David Reitter, and Shumin Deng (Eds.). 
Association for Computational Linguistics, 314-334. https:\/\/doi.org\/10.18653\/V1\/2023.CONLL-1.21 10.18653\/V1\/2023.CONLL-1.21"},{"key":"e_1_3_1_35_1","first-page":"3806","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2023","author":"Pan Liangming","year":"2023","unstructured":"Liangming Pan, Alon Albalak, Xinyi Wang, and William Wang. 2023. Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2023, Houda Bouamor, Juan Pino, and Kalika Bali (Eds.). Association for Computational Linguistics, Singapore, 3806-3824. https:\/\/doi.org\/10.18653\/v1\/2023.findings-emnlp.248 10.18653\/v1\/2023.findings-emnlp.248"},{"key":"e_1_3_1_36_1","unstructured":"Eric Prud\u2019hommeaux and Andy Seaborne. 2018. SPARQL Query Language for RDF - W3C recommendation. https:\/\/www.w3.org\/TR\/rdf-sparql-query\/."},{"key":"e_1_3_1_37_1","first-page":"7064","volume-title":"Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)","author":"Qiu Yifu","year":"2024","unstructured":"Yifu Qiu, Zheng Zhao, Yftah Ziser, Anna Korhonen, Edoardo Ponti, and Shay Cohen. 2024. Are Large Language Models Temporally Grounded? In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Kevin Duh, Helena Gomez, and Steven Bethard (Eds.). Association for Computational Linguistics, Mexico City, Mexico, 7064-7083. https:\/\/doi.org\/10.18653\/v1\/2024.naacl-long.391 10.18653\/v1\/2024.naacl-long.391"},{"key":"e_1_3_1_38_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_3_1_39_1","unstructured":"Philippe Remy. 2020. Python wrapper for Stanford OpenIE. 
https:\/\/github.com\/philipperemy\/Stanford-OpenIE-Python."},{"key":"e_1_3_1_40_1","unstructured":"Hongyu Ren and Jure Leskovec. 2020. Beta Embeddings for Multi-Hop Logical Reasoning in Knowledge Graphs. https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/e43739bba7cdb577e9e3e4e42447f5a5-Abstract.html"},{"key":"e_1_3_1_41_1","unstructured":"Satoshi Tajiri. 2023. Pokemon. https:\/\/www.pokemon.com\/us."},{"key":"e_1_3_1_42_1","unstructured":"ScienceDirect. 2023. Jaccard Similarity. https:\/\/www.sciencedirect.com\/topics\/computer-science\/jaccard-similarity."},{"key":"e_1_3_1_43_1","unstructured":"ScienceDirect. 2024. Friedman Test. https:\/\/www.sciencedirect.com\/topics\/biochemistry-genetics-and-molecular-biology\/friedman-test."},{"key":"e_1_3_1_44_1","first-page":"1073","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"See Abigail","year":"2017","unstructured":"Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get To The Point: Summarization with Pointer-Generator Networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Regina Barzilay and Min-Yen Kan (Eds.). Association for Computational Linguistics, Vancouver, Canada, 1073-1083. https:\/\/doi.org\/10.18653\/v1\/P17-1099 10.18653\/v1\/P17-1099"},{"key":"e_1_3_1_45_1","unstructured":"Mohammed Latif Siddiq and Joanna C. S. Santos. 2023. Generate and Pray: Using SALLMS to Evaluate the Security of LLM Generated Code. CoRR abs\/2311.00889 (2023). https:\/\/doi.org\/10.48550\/ARXIV.2311.00889 10.48550\/ARXIV.2311.00889 arXiv:2311.00889"},{"key":"e_1_3_1_46_1","first-page":"697","volume-title":"Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8-12, 2007","author":"Suchanek Fabian M.","year":"2007","unstructured":"Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. 
Yago: a core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8-12, 2007, Carey L. Williamson, Mary Ellen Zurko, Peter F. Patel-Schneider, and Prashant J. Shenoy (Eds.). ACM, 697-706. https:\/\/doi.org\/10.1145\/1242572.1242667 10.1145\/1242572.1242667"},{"key":"e_1_3_1_47_1","unstructured":"Katherine Tian Eric Mitchell Huaxiu Yao Christopher D. Manning and Chelsea Finn. 2024. Fine-Tuning Language Models for Factuality. In The Twelfth International Conference on Learning Representations ICLR 2024 Vienna Austria May 7-11 2024. OpenReview.net. https:\/\/openreview.net\/forum?id=WPZ2yPag4K"},{"key":"e_1_3_1_48_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jnlest.2022.100159"},{"key":"e_1_3_1_49_1","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton-Ferrer Moya Chen Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller Cynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev Punit Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich Yinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra Igor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang Angela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic Sergey Edunov and Thomas Scialom. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. CoRR abs\/2307.09288 (2023). 
https:\/\/doi.org\/10.48550\/ARXIV.2307.09288 10.48550\/ARXIV.2307.09288 arXiv:2307.09288"},{"key":"e_1_3_1_50_1","unstructured":"Neeraj Varshney Wenlin Yao Hongming Zhang Jianshu Chen and Dong Yu. 2023. A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation. CoRR abs\/2307.03987 (2023). https:\/\/doi.org\/10.48550\/ARXIV.2307.03987 10.48550\/ARXIV.2307.03987 arXiv:2307.03987"},{"key":"e_1_3_1_51_1","first-page":"13697","volume-title":"Findings of the Association for Computational Linguistics, ACL 2024, Bangkok, Thailand and virtual meeting, August 11-16, 2024","author":"Vu Tu","year":"2024","unstructured":"Tu Vu, Mohit Iyyer, Xuezhi Wang, Noah Constant, Jerry W. Wei, Jason Wei, Chris Tar, Yun-Hsuan Sung, Denny Zhou, Quoc V. Le, and Thang Luong. 2024. FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation. In Findings of the Association for Computational Linguistics, ACL 2024, Bangkok, Thailand and virtual meeting, August 11-16, 2024, Lun-Wei Ku, Andre Martins, and Vivek Srikumar (Eds.). Association for Computational Linguistics, 13697-13720. https:\/\/aclanthology.org\/2024.findings-acl.813"},{"key":"e_1_3_1_52_1","doi-asserted-by":"crossref","first-page":"82","DOI":"10.18653\/v1\/2024.acl-demos.9","volume-title":"Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)","author":"Wang Peng","year":"2024","unstructured":"Peng Wang, Ningyu Zhang, Bozhong Tian, Zekun Xi, Yunzhi Yao, Ziwen Xu, Mengru Wang, Shengyu Mao, Xiaohan Wang, Siyuan Cheng, Kangwei Liu, Yuansheng Ni, Guozhou Zheng, and Huajun Chen. 2024. EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), Yixin Cao, Yang Feng, and Deyi Xiong (Eds.). 
Association for Computational Linguistics, Bangkok, Thailand, 82-93. https:\/\/aclanthology.org\/2024.acl-demos.9"},{"key":"e_1_3_1_53_1","first-page":"10837","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event \/ Punta Cana, Dominican Republic, 7-11 November, 2021","author":"Wang Shufan","year":"2021","unstructured":"Shufan Wang, Laure Thompson, and Mohit Iyyer. 2021. Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event \/ Punta Cana, Dominican Republic, 7-11 November, 2021, Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (Eds.). Association for Computational Linguistics, 10837-10851. https:\/\/doi.org\/10.18653\/V1\/2021.EMNLP-MAIN.846 10.18653\/V1\/2021.EMNLP-MAIN.846"},{"key":"e_1_3_1_54_1","doi-asserted-by":"publisher","DOI":"10.1017\/S1471068407003237"},{"key":"e_1_3_1_55_1","unstructured":"Hanxiang Xu Shenao Wang Ningke Li Kailong Wang Yanjie Zhao Kai Chen Ting Yu Yang Liu and Haoyu Wang. 2024. Large Language Models for Cyber Security: A Systematic Literature Review. CoRR abs\/2405.04760 (2024). https:\/\/doi.org\/10.48550\/ARXIV.2405.04760 10.48550\/ARXIV.2405.04760 arXiv:2405.04760"},{"key":"e_1_3_1_56_1","first-page":"578","volume-title":"Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (Vienna, Austria) (ISSTA 2024)","author":"Yang Mingke","year":"2024","unstructured":"Mingke Yang, Yuqi Chen, Yi Liu, and Ling Shi. 2024a. DistillSeq: A Framework for Safety Alignment Testing in Large Language Models using Knowledge Distillation. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (Vienna, Austria) (ISSTA 2024). Association for Computing Machinery, New York, NY, USA, 578-589. 
https:\/\/doi.org\/10.1145\/3650212.3680304 10.1145\/3650212.3680304"},{"key":"e_1_3_1_57_1","first-page":"209","volume-title":"Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Yang Zonglin","year":"2024","unstructured":"Zonglin Yang, Li Dong, Xinya Du, Hao Cheng, Erik Cambria, Xiaodong Liu, Jianfeng Gao, and Furu Wei. 2024b. Language Models as Inductive Reasoners. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), Yvette Graham and Matthew Purver (Eds.). Association for Computational Linguistics, St. Julian\u2019s, Malta, 209-225. https:\/\/aclanthology.org\/2024.eacl-long.13"},{"key":"e_1_3_1_58_1","first-page":"2369","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018","author":"Yang Zhilin","year":"2018","unstructured":"Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, and Christopher D. Manning. 2018. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun\u2019ichi Tsujii (Eds.). Association for Computational Linguistics, 2369-2380. https:\/\/doi.org\/10.18653\/V1\/D18-1259 10.18653\/V1\/D18-1259"},{"key":"e_1_3_1_59_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.hcc.2024.100211"},{"key":"e_1_3_1_60_1","unstructured":"Xi Ye Qiaochu Chen Isil Dillig and Greg Durrett. 2023. SatLM: Satisfiability-Aided Language Models Using Declarative Prompting. 
In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023 NeurIPS 2023 New Orleans LA USA December 10 - 16 2023 Alice Oh Tristan Naumann Amir Globerson Kate Saenko Moritz Hardt and Sergey Levine (Eds.). http:\/\/papers.nips.cc\/paper_files\/paper\/2023\/hash\/8e9c7d4a48bdac81a58f983a64aaf42b-Abstract-Conference.html"},{"key":"e_1_3_1_61_1","unstructured":"Shukang Yin Chaoyou Fu Sirui Zhao Tong Xu Hao Wang Dianbo Sui Yunhang Shen Ke Li Xing Sun and Enhong Chen. 2023. Woodpecker: Hallucination Correction for Multimodal Large Language Models. CoRR abs\/2310.16045 (2023). https:\/\/doi.org\/10.48550\/ARXIV.2310.16045 10.48550\/ARXIV.2310.16045 arXiv:2310.16045"},{"key":"e_1_3_1_62_1","unstructured":"Jifan Yu Xiaozhi Wang Shangqing Tu Shulin Cao Daniel Zhang-Li Xin Lv Hao Peng Zijun Yao Xiaohan Zhang Hanming Li Chunyang Li Zheyuan Zhang Yushi Bai Yantao Liu Amy Xin Kaifeng Yun Linlu Gong Nianyi Lin Jianhui Chen Zhili Wu Yunjia Qi Weikai Li Yong Guan Kaisheng Zeng Ji Qi Hailong Jin Jinxin Liu Yu Gu Yuan Yao Ning Ding Lei Hou Zhiyuan Liu Bin Xu Jie Tang and Juanzi Li. 2024. KoLA: Carefully Benchmarking World Knowledge of Large Language Models. In The Twelfth International Conference on Learning Representations ICLR 2024 Vienna Austria May 7-11 2024. OpenReview.net. https:\/\/openreview.net\/forum?id=AqN23oqraW"},{"key":"e_1_3_1_63_1","unstructured":"Yue Zhang Yafu Li Leyang Cui Deng Cai Lemao Liu Tingchen Fu Xinting Huang Enbo Zhao Yu Zhang Yulong Chen Longyue Wang Anh Tuan Luu Wei Bi Freda Shi and Shuming Shi. 2023. Siren\u2019s Song in the AI Ocean: A Survey on Hallucination in Large Language Models. CoRR abs\/2309.01219 (2023). https:\/\/doi.org\/10.48550\/ARXIV.2309.01219 10.48550\/ARXIV.2309.01219 arXiv:2309.01219"},{"key":"e_1_3_1_64_1","unstructured":"Zhibo Zhang Wuxia Bai Yuxi Li Mark Huasong Meng Kailong Wang Ling Shi Li Li Jun Wang and Haoyu Wang. 2024. 
GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models. CoRR abs\/2408.04905 (2024). https:\/\/doi.org\/10.48550\/ARXIV.2408.04905 10.48550\/ARXIV.2408.04905 arXiv:2408.04905"},{"key":"e_1_3_1_65_1","first-page":"3125","volume-title":"The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019","author":"Zhou Zili","year":"2019","unstructured":"Zili Zhou, Shaowu Liu, Guandong Xu, and Wu Zhang. 2019. On Completing Sparse Knowledge Base with Transitive Relation Embedding. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019. AAAI Press, 3125-3132. https:\/\/doi.org\/10.1609\/AAAI.V33I01.33013125 10.1609\/AAAI.V33I01.33013125"},{"key":"e_1_3_1_66_1","unstructured":"Andy Zou Long Phan Sarah Chen James Campbell Phillip Guo Richard Ren Alexander Pan Xuwang Yin Mantas Mazeika Ann-Kathrin Dombrowski Shashwat Goel Nathaniel Li Michael J. Byun Zifan Wang Alex Mallen Steven Basart Sanmi Koyejo Dawn Song Matt Fredrikson J. Zico Kolter and Dan Hendrycks. 2023. Representation Engineering: A Top-Down Approach to AI Transparency. CoRR abs\/2310.01405 (2023). 
https:\/\/doi.org\/10.48550\/ARXIV.2310.01405 10.48550\/ARXIV.2310.01405 arXiv:2310.01405"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3689776","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3689776","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T09:07:05Z","timestamp":1770196025000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3689776"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,8]]},"references-count":65,"journal-issue":{"issue":"OOPSLA2","published-print":{"date-parts":[[2024,10,8]]}},"alternative-id":["10.1145\/3689776"],"URL":"https:\/\/doi.org\/10.1145\/3689776","relation":{},"ISSN":["2475-1421"],"issn-type":[{"value":"2475-1421","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,8]]},"assertion":[{"value":"2024-04-06","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-18","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}