{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,21]],"date-time":"2026-06-21T06:32:37Z","timestamp":1782023557955,"version":"3.54.5"},"reference-count":391,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2025,1,24]],"date-time":"2025-01-24T00:00:00Z","timestamp":1737676800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:p>The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), fueling a paradigm shift in information acquisition. Nevertheless, LLMs are prone to hallucination, generating plausible yet nonfactual content. This phenomenon raises significant concerns over the reliability of LLMs in real-world information retrieval (IR) systems and has attracted intensive research to detect and mitigate such hallucinations. Given the open-ended general-purpose attributes inherent to LLMs, LLM hallucinations present distinct challenges that diverge from prior task-specific models. This divergence highlights the urgency for a nuanced understanding and comprehensive overview of recent advances in LLM hallucinations. In this survey, we begin with an innovative taxonomy of hallucination in the era of LLM and then delve into the factors contributing to hallucinations. Subsequently, we present a thorough overview of hallucination detection methods and benchmarks. Our discussion then transfers to representative methodologies for mitigating LLM hallucinations. Additionally, we delve into the current limitations faced by retrieval-augmented LLMs in combating hallucinations, offering insights for developing more robust IR systems. 
Finally, we highlight the promising research directions on LLM hallucinations, including hallucination in large vision-language models and understanding of knowledge boundaries in LLM hallucinations.<\/jats:p>","DOI":"10.1145\/3703155","type":"journal-article","created":{"date-parts":[[2024,11,20]],"date-time":"2024-11-20T15:07:22Z","timestamp":1732115242000},"page":"1-55","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1556,"title":["A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions"],"prefix":"10.1145","volume":"43","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-9650-8353","authenticated-orcid":false,"given":"Lei","family":"Huang","sequence":"first","affiliation":[{"name":"Harbin Institute of Technology, Harbin, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7449-3093","authenticated-orcid":false,"given":"Weijiang","family":"Yu","sequence":"additional","affiliation":[{"name":"Huawei Inc., Shenzhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-8631-3858","authenticated-orcid":false,"given":"Weitao","family":"Ma","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Harbin, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-4063-668X","authenticated-orcid":false,"given":"Weihong","family":"Zhong","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Harbin, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5909-1339","authenticated-orcid":false,"given":"Zhangyin","family":"Feng","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Harbin, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-5363-3886","authenticated-orcid":false,"given":"Haotian","family":"Wang","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Harbin, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7845-1544","authenticated-orcid":false,"given":"Qianglong","family":"Chen","sequence":"additional","affiliation":[{"name":"Huawei Inc., Shenzhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3380-6409","authenticated-orcid":false,"given":"Weihua","family":"Peng","sequence":"additional","affiliation":[{"name":"Huawei Inc., Shenzhen, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6011-0496","authenticated-orcid":false,"given":"Xiaocheng","family":"Feng","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Harbin, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2543-5604","authenticated-orcid":false,"given":"Bing","family":"Qin","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Harbin, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-1999-2600","authenticated-orcid":false,"given":"Ting","family":"Liu","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Harbin, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,1,24]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"Amro Abbas Kushal Tirumala D\u00e1niel Simig Surya Ganguli and Ari S. Morcos. 2023. SemDeDup: Data-efficient learning at web-scale through semantic deduplication. arXiv:2303.09540. 
Retrieved from https:\/\/arxiv.org\/abs\/2303.09540"},{"key":"e_1_3_1_3_2","doi-asserted-by":"crossref","unstructured":"Vaibhav Adlakha Parishad BehnamGhader Xing Han Lu Nicholas Meade and Siva Reddy. 2023. Evaluating correctness and faithfulness of instruction-following models for question answering. arXiv:2307.16877. Retrieved from https:\/\/arxiv.org\/abs\/2307.16877","DOI":"10.1162\/tacl_a_00667"},{"key":"e_1_3_1_4_2","unstructured":"Ayush Agrawal Lester Mackey and Adam Tauman Kalai. 2023. Do language models know when they\u2019re hallucinating references? arXiv:2305.18248. Retrieved from https:\/\/arxiv.org\/abs\/2305.18248"},{"key":"e_1_3_1_5_2","unstructured":"Perplexity AI. 2023. Perplexity AI. Retrieved from https:\/\/www.perplexity.ai\/"},{"key":"e_1_3_1_6_2","unstructured":"Renat Aksitov Chung-Ching Chang David Reitter Siamak Shakeri and Yun-Hsuan Sung. 2023. Characterizing attribution and fluency tradeoffs for retrieval-augmented large language models. arXiv:2302.05578. Retrieved from https:\/\/arxiv.org\/abs\/2302.05578"},{"key":"e_1_3_1_7_2","unstructured":"Badr AlKhamissi Millicent Li Asli Celikyilmaz Mona T. Diab and Marjan Ghazvininejad. 2022. A review on language models as knowledge bases. arXiv:2204.06031. Retrieved from https:\/\/arxiv.org\/abs\/2204.06031"},{"key":"e_1_3_1_8_2","unstructured":"Rohan Anil Sebastian Borgeaud Yonghui Wu Jean-Baptiste Alayrac Jiahui Yu Radu Soricut Johan Schalkwyk Andrew M. Dai Anja Hauth Katie Millican et\u00a0al. 2023. Gemini: A family of highly capable multimodal models. arXiv:2312.11805. Retrieved from https:\/\/arxiv.org\/abs\/2312.11805"},{"key":"e_1_3_1_9_2","unstructured":"Anthropic. 2023. Claude. Retrieved from https:\/\/claude.ai\/"},{"key":"e_1_3_1_10_2","unstructured":"Anthropic. 2024. Claude 3 Haiku: Our Fastest Model Yet. Retrieved from https:\/\/www.anthropic.com\/news\/claude-3-haiku"},{"key":"e_1_3_1_11_2","unstructured":"arXiv. 2023. arXiv dataset. 
Retrieved from https:\/\/www.kaggle.com\/datasets\/Cornell-University\/arxiv\/versions\/134"},{"key":"e_1_3_1_12_2","unstructured":"Akari Asai Zeqiu Wu Yizhong Wang Avirup Sil and Hannaneh Hajishirzi. 2023. Self-RAG: Learning to retrieve generate and critique through self-reflection. arXiv:2310.11511. Retrieved from https:\/\/arxiv.org\/abs\/2310.11511"},{"key":"e_1_3_1_13_2","unstructured":"Akari Asai Zexuan Zhong Danqi Chen Pang Wei Koh Luke Zettlemoyer Hannaneh Hajishirzi and Wen-tau Yih. 2024. Reliable adaptable and attributable language models with retrieval. arXiv:2403.03187. Retrieved from https:\/\/arxiv.org\/abs\/2403.03187"},{"key":"e_1_3_1_14_2","doi-asserted-by":"crossref","unstructured":"Amos Azaria and Tom M. Mitchell. 2023. The internal state of an LLM knows when it\u2019s lying. arXiv:2304.13734. Retrieved from https:\/\/arxiv.org\/abs\/2304.13734","DOI":"10.18653\/v1\/2023.findings-emnlp.68"},{"key":"e_1_3_1_15_2","doi-asserted-by":"crossref","unstructured":"Jinheon Baek Alham Fikri Aji and Amir Saffari. 2023. Knowledge-augmented language model prompting for zero-shot knowledge graph question answering. arXiv:2306.04136. Retrieved from https:\/\/arxiv.org\/abs\/2306.04136","DOI":"10.18653\/v1\/2023.matching-1.7"},{"key":"e_1_3_1_16_2","doi-asserted-by":"crossref","unstructured":"Yejin Bang Samuel Cahyawijaya Nayeon Lee Wenliang Dai Dan Su Bryan Wilie Holy Lovenia Ziwei Ji Tiezheng Yu Willy Chung et\u00a0al. 2023. A multitask multilingual multimodal evaluation of ChatGPT on reasoning hallucination and interactivity. arXiv:2302.04023. Retrieved from https:\/\/arxiv.org\/abs\/2302.04023","DOI":"10.18653\/v1\/2023.ijcnlp-main.45"},{"key":"e_1_3_1_17_2","doi-asserted-by":"crossref","unstructured":"Scott Barnett Stefanus Kurniawan Srikanth Thudumu Zach Brannelly and Mohamed Abdelrazek. 2024. Seven failure points when engineering a retrieval augmented generation system. arXiv:2401.05856. 
Retrieved from https:\/\/arxiv.org\/abs\/2401.05856","DOI":"10.1145\/3644815.3644945"},{"key":"e_1_3_1_18_2","unstructured":"Mario Barrantes Benedikt Herudek and Richard Wang. 2020. Adversarial NLI for factual correctness in text summarisation models. arXiv:2005.11739. Retrieved from https:\/\/arxiv.org\/abs\/2005.11739"},{"key":"e_1_3_1_19_2","first-page":"845","volume-title":"Proceedings of the 13th International Joint Conference on Artificial Intelligence","author":"Basso Pierre","year":"1993","unstructured":"Pierre Basso. 1993. Conditional causal logic: A formal theory of the meaning generating processes in a cognitive system. In Proceedings of the 13th International Joint Conference on Artificial Intelligence. Ruzena Bajcsy (Ed.), Morgan Kaufmann, 845\u2013851. Retrieved from http:\/\/ijcai.org\/Proceedings\/93-2\/Papers\/002.pdf"},{"key":"e_1_3_1_20_2","unstructured":"Iz Beltagy Matthew E. Peters and Arman Cohan. 2020. Longformer: The long-document transformer. arXiv:2004.05150. Retrieved from https:\/\/arxiv.org\/abs\/2004.05150"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3442188.3445922"},{"key":"e_1_3_1_22_2","first-page":"1171","volume-title":"Proceedings of the 28th International Conference on Neural Information Processing Systems","author":"Bengio Samy","year":"2015","unstructured":"Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. 2015. Scheduled sampling for sequence prediction with recurrent neural networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems. Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett (Eds.), 1171\u20131179. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2015\/hash\/e995f98d56967d946471af29d7bf99f1-Abstract.html"},{"key":"e_1_3_1_23_2","unstructured":"Lukas Berglund Meg Tong Max Kaufmann Mikita Balesni Asa Cooper Stickland Tomasz Korbak and Owain Evans. 2023. 
The reversal curse: LLMs trained on \u201cA is B\u201d fail to learn \u201cB is A\u201d. arXiv:2309.12288. Retrieved from https:\/\/arxiv.org\/abs\/2309.12288"},{"key":"e_1_3_1_24_2","doi-asserted-by":"crossref","unstructured":"Sid Black Stella Biderman Eric Hallahan Quentin Anthony Leo Gao Laurence Golding Horace He Connor Leahy Kyle McDonell Jason Phang et\u00a0al. 2022. GPT-NeoX-20B: An open-source autoregressive language model. arXiv:2204.06745. Retrieved from https:\/\/arxiv.org\/abs\/2204.06745","DOI":"10.18653\/v1\/2022.bigscience-1.9"},{"key":"e_1_3_1_25_2","series-title":"Proceedings of Machine Learning Research, Vol. 162","first-page":"2206","volume-title":"Proceedings of the International Conference on Machine Learning (ICML \u201922)","author":"Borgeaud Sebastian","year":"2022","unstructured":"Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, et\u00a0al. 2022. Improving language models by retrieving from trillions of tokens. In Proceedings of the International Conference on Machine Learning (ICML \u201922). Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesv\u00e1ri, Gang Niu, and Sivan Sabato (Eds.), Proceedings of Machine Learning Research, Vol. 162, PMLR, 2206\u20132240. Retrieved from https:\/\/proceedings.mlr.press\/v162\/borgeaud22a.html"},{"key":"e_1_3_1_26_2","unstructured":"Samuel R. Bowman Jeeyoon Hyun Ethan Perez Edwin Chen Craig Pettit Scott Heiner Kamile Lukosuite Amanda Askell Andy Jones Anna Chen et al. 2022. Measuring progress on scalable oversight for large language models. arXiv:2211.03540. 
Retrieved from https:\/\/arxiv.org\/abs\/2211.03540"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.2307\/2334029"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.113"},{"key":"e_1_3_1_29_2","first-page":"21","volume-title":"Proceedings of the Compression and Complexity of SEQUENCES 1997 (Cat. No. 97TB100171)","author":"Broder Andrei Z.","year":"1997","unstructured":"Andrei Z. Broder. 1997. On the resemblance and containment of documents. In Proceedings of the Compression and Complexity of SEQUENCES 1997 (Cat. No. 97TB100171). IEEE, 21\u201329."},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.5555\/3495724.3495883"},{"key":"e_1_3_1_31_2","unstructured":"S\u00e9bastien Bubeck Varun Chandrasekaran Ronen Eldan Johannes Gehrke Eric Horvitz Ece Kamar Peter Lee Yin Tat Lee Yuanzhi Li Scott M. Lundberg et\u00a0al. 2023. Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv:2303.12712. Retrieved from https:\/\/arxiv.org\/abs\/2303.12712"},{"key":"e_1_3_1_32_2","unstructured":"Collin Burns Haotian Ye Dan Klein and Jacob Steinhardt. 2022. Discovering latent knowledge in language models without supervision. arXiv:2212.03827. Retrieved from https:\/\/arxiv.org\/abs\/2212.03827"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2023.FINDINGS-EMNLP.835"},{"key":"e_1_3_1_34_2","unstructured":"Yihan Cao Siyu Li Yixin Liu Zhiling Yan Yutong Dai Philip S. Yu and Lichao Sun. 2023. A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT. arXiv:2303.04226. Retrieved from https:\/\/arxiv.org\/abs\/2303.04226"},{"key":"e_1_3_1_35_2","unstructured":"Nicholas Carlini Daphne Ippolito Matthew Jagielski Katherine Lee Florian Tramer and Chiyuan Zhang. 2022. Quantifying memorization across neural language models. arXiv:2202.07646. 
Retrieved from https:\/\/arxiv.org\/abs\/2202.07646"},{"key":"e_1_3_1_36_2","first-page":"2633","volume-title":"Proceedings of the 30th USENIX Security Symposium (USENIX Security 21)","author":"Carlini Nicholas","year":"2021","unstructured":"Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, et al. 2021. Extracting training data from large language models. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21). 2633\u20132650."},{"key":"e_1_3_1_37_2","unstructured":"Chung-Ching Chang David Reitter Renat Aksitov and Yun-Hsuan Sung. 2023. KL-divergence guided temperature sampling. arXiv:2306.01286. Retrieved from https:\/\/arxiv.org\/abs\/2306.01286"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2023.FINDINGS-ACL.805"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.554"},{"key":"e_1_3_1_40_2","unstructured":"Canyu Chen and Kai Shu. 2023. Combating misinformation in the age of LLMs: Opportunities and challenges. arXiv:2311.05656. Retrieved from https:\/\/arxiv.org\/abs\/2311.05656"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2022.EMNLP-MAIN.146"},{"key":"e_1_3_1_42_2","unstructured":"Hung-Ting Chen Fangyuan Xu Shane A. Arora and Eunsol Choi. 2023. Understanding retrieval augmentation for long-form question answering. arXiv:2310.12150. Retrieved from https:\/\/arxiv.org\/abs\/2310.12150"},{"key":"e_1_3_1_43_2","unstructured":"Shiqi Chen Yiran Zhao Jinghan Zhang I-Chun Chern Siyang Gao Pengfei Liu and Junxian He. 2023. FELM: Benchmarking factuality evaluation of large language models. arXiv:2310.00741. Retrieved from https:\/\/arxiv.org\/abs\/2310.00741"},{"key":"e_1_3_1_44_2","unstructured":"Tong Chen Hongwei Wang Sihao Chen Wenhao Yu Kaixin Ma Xinran Zhao Hongming Zhang and Dong Yu. 2023. 
Dense X retrieval: What retrieval granularity should we use? arXiv:2312.06648. Retrieved from https:\/\/arxiv.org\/abs\/2312.06648"},{"key":"e_1_3_1_45_2","unstructured":"Xiaoyang Chen Ben He Hongyu Lin Xianpei Han Tianshu Wang Boxi Cao Le Sun and Yingfei Sun. 2024. Spiral of silence: How is large language model killing information retrieval? \u2013 A case study on open domain question answering. arXiv:2404.10496. Retrieved from https:\/\/arxiv.org\/abs\/2404.10496"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.5555\/3600270.3602050"},{"key":"e_1_3_1_47_2","unstructured":"Yijie Chen Yijin Liu Fandong Meng Yufeng Chen Jinan Xu and Jie Zhou. 2023. Improving translation faithfulness of large language models via augmenting instructions. arXiv:2308.12674. Retrieved from https:\/\/arxiv.org\/abs\/2308.12674"},{"key":"e_1_3_1_48_2","unstructured":"Yangyi Chen Karan Sikka Michael Cogswell Heng Ji and Ajay Divakaran. 2023. Measuring and improving chain-of-thought reasoning in vision-language models. arXiv:2309.04461. Retrieved from https:\/\/arxiv.org\/abs\/2309.04461"},{"key":"e_1_3_1_49_2","doi-asserted-by":"crossref","unstructured":"Qinyuan Cheng Xiaonan Li Shimin Li Qin Zhu Zhangyue Yin Yunfan Shao Linyang Li Tianxiang Sun Hang Yan and Xipeng Qiu. 2024. Unified active retrieval for retrieval augmented generation. arXiv:2406.12534. Retrieved from https:\/\/arxiv.org\/abs\/2406.12534","DOI":"10.18653\/v1\/2024.findings-emnlp.999"},{"key":"e_1_3_1_50_2","unstructured":"Qinyuan Cheng Tianxiang Sun Wenwei Zhang Siyin Wang Xiangyang Liu Mozhi Zhang Junliang He Mianqiu Huang Zhangyue Yin Kai Chen et\u00a0al. 2023. Evaluating hallucinations in Chinese large language models. arXiv:2310.03368. Retrieved from https:\/\/arxiv.org\/abs\/2310.03368"},{"key":"e_1_3_1_51_2","unstructured":"I-Chun Chern Steffi Chern Shiqi Chen Weizhe Yuan Kehua Feng Chunting Zhou Junxian He Graham Neubig Pengfei Liu et al. 2023. 
FacTool: Factuality detection in generative AI\u2013A tool augmented framework for multi-task and multi-domain scenarios. arXiv:2307.13528. Retrieved from https:\/\/arxiv.org\/abs\/2307.13528"},{"key":"e_1_3_1_52_2","doi-asserted-by":"crossref","unstructured":"Cheng-Han Chiang and Hung-yi Lee. 2023. Can large language models be an alternative to human evaluations? arXiv:2305.01937. Retrieved from https:\/\/arxiv.org\/abs\/2305.01937","DOI":"10.18653\/v1\/2023.acl-long.870"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.527"},{"key":"e_1_3_1_54_2","doi-asserted-by":"crossref","unstructured":"Sehyun Choi Tianqing Fang Zhaowei Wang and Yangqiu Song. 2023. KCTS: Knowledge-constrained tree search decoding with token-level hallucination detection. arXiv:2310.09044. Retrieved from https:\/\/arxiv.org\/abs\/2310.09044","DOI":"10.18653\/v1\/2023.emnlp-main.867"},{"key":"e_1_3_1_55_2","unstructured":"Aakanksha Chowdhery Sharan Narang Jacob Devlin Maarten Bosma Gaurav Mishra Adam Roberts Paul Barham Hyung Won Chung Charles Sutton Sebastian Gehrmann et\u00a0al. 2023. PaLM: Scaling language modeling with pathways. Journal of Machine Learning Research 24 (2023) 240:1\u2013240:113. Retrieved from http:\/\/jmlr.org\/papers\/v24\/22-1144.html"},{"key":"e_1_3_1_56_2","first-page":"4299","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems","author":"Christiano Paul F.","year":"2017","unstructured":"Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei. 2017. Deep reinforcement learning from human preferences. In Proceedings of the 31st International Conference on Neural Information Processing Systems. Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.), 4299\u20134307. 
Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/d5e2c0adad503c91f91df240d0cd4e49-Abstract.html"},{"key":"e_1_3_1_57_2","doi-asserted-by":"crossref","unstructured":"Zheng Chu Jingchang Chen Qianglong Chen Haotian Wang Kun Zhu Xiyuan Du Weijiang Yu Ming Liu and Bing Qin. 2024. BeamAggR: Beam aggregation reasoning over multi-source knowledge for multi-hop question answering. arXiv:2406.19820. Retrieved from https:\/\/arxiv.org\/abs\/2406.19820","DOI":"10.18653\/v1\/2024.acl-long.67"},{"key":"e_1_3_1_58_2","unstructured":"Zheng Chu Jingchang Chen Qianglong Chen Weijiang Yu Tao He Haotian Wang Weihua Peng Ming Liu Bing Qin and Ting Liu. 2023. A survey of chain of thought reasoning: Advances frontiers and future. arXiv:2309.15402. Retrieved from https:\/\/arxiv.org\/abs\/2309.15402"},{"key":"e_1_3_1_59_2","unstructured":"Yung-Sung Chuang Yujia Xie Hongyin Luo Yoon Kim James Glass and Pengcheng He. 2023. DoLa: Decoding by contrasting layers improves factuality in large language models. arXiv:2309.03883. Retrieved from https:\/\/arxiv.org\/abs\/2309.03883"},{"key":"e_1_3_1_60_2","unstructured":"Hyung Won Chung Le Hou Shayne Longpre Barret Zoph Yi Tay William Fedus Eric Li Xuezhi Wang Mostafa Dehghani Siddhartha Brahma et al. 2022. Scaling instruction-finetuned language models. arXiv:2210.11416. Retrieved from https:\/\/arxiv.org\/abs\/2210.11416"},{"key":"e_1_3_1_61_2","unstructured":"Karl Cobbe Vineet Kosaraju Mohammad Bavarian Mark Chen Heewoo Jun Lukasz Kaiser Matthias Plappert Jerry Tworek Jacob Hilton Reiichiro Nakano Christopher Hesse and John Schulman. 2021. Training verifiers to solve math word problems. arXiv:2110.14168. Retrieved from https:\/\/arxiv.org\/abs\/2110.14168"},{"key":"e_1_3_1_62_2","doi-asserted-by":"crossref","unstructured":"Roi Cohen May Hamri Mor Geva and Amir Globerson. 2023. LM vs LM: Detecting factual errors via cross examination. arXiv:2305.13281. 
Retrieved from https:\/\/arxiv.org\/abs\/2305.13281","DOI":"10.18653\/v1\/2023.emnlp-main.778"},{"key":"e_1_3_1_63_2","unstructured":"Together Computer. 2023. RedPajama: An Open Dataset for Training Large Language Models. Retrieved from https:\/\/github.com\/togethercomputer\/RedPajama-Data"},{"key":"e_1_3_1_64_2","unstructured":"Ajeya Cotra. 2021. Why AI Alignment Could Be Hard with Modern Deep Learning. Cold Takes. Retrieved from https:\/\/www.cold-takes.com\/why-ai-alignment-could-be-hard-with-modern-deep-learning\/"},{"key":"e_1_3_1_65_2","doi-asserted-by":"crossref","unstructured":"Florin Cuconasu Giovanni Trappolini Federico Siciliano Simone Filice Cesare Campagnano Yoelle Maarek Nicola Tonellotto and Fabrizio Silvestri. 2024. The power of noise: Redefining retrieval for RAG systems. arXiv:2401.14887. Retrieved from https:\/\/arxiv.org\/abs\/2401.14887","DOI":"10.1145\/3626772.3657834"},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.581"},{"key":"e_1_3_1_67_2","unstructured":"Damai Dai Wenbin Jiang Qingxiu Dong Yajuan Lyu Qiaoqiao She and Zhifang Sui. 2022. Neural knowledge bank for pretrained transformers. arXiv:2208.00399. Retrieved from https:\/\/arxiv.org\/abs\/2208.00399"},{"key":"e_1_3_1_68_2","unstructured":"Sunhao Dai Yuqi Zhou Liang Pang Weihao Liu Xiaolin Hu Yong Liu Xiao Zhang and Jun Xu. 2023. LLMs may dominate information access: Neural retrievers are biased towards LLM-Generated texts. arXiv:2310.20501. Retrieved from https:\/\/arxiv.org\/abs\/2310.20501"},{"key":"e_1_3_1_69_2","volume-title":"Proceedings of the 8th International Conference on Learning Representations (ICLR \u201920)","author":"Dathathri Sumanth","year":"2020","unstructured":"Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, and Rosanne Liu. 2020. Plug and play language models: A simple approach to controlled text generation. 
In Proceedings of the 8th International Conference on Learning Representations (ICLR \u201920). OpenReview.net. Retrieved from https:\/\/openreview.net\/forum?id=H1edEyBKDS"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.522"},{"key":"e_1_3_1_71_2","unstructured":"Maria Angels de Luis Balaguer Vinamra Benara Renato Luiz de Freitas Cunha Roberto de M. Estev\u00e3o Filho Todd Hendry Daniel Holstein Jennifer Marsman Nick Mecklenburg Sara Malvar Leonardo O. Nunes et\u00a0al. 2024. RAG vs Fine-tuning: Pipelines tradeoffs and a case study on agriculture. arXiv:2401.08406. Retrieved from https:\/\/arxiv.org\/abs\/2401.08406"},{"key":"e_1_3_1_72_2","unstructured":"Gr\u00e9goire Del\u00e9tang Anian Ruoss Paul-Ambroise Duquenne Elliot Catt Tim Genewein Christopher Mattern Jordi Grau-Moya Li Kevin Wenliang Matthew Aitchison Laurent Orseau et al. 2023. Language modeling is compression. arXiv:2309.10668. Retrieved from https:\/\/arxiv.org\/abs\/2309.10668"},{"key":"e_1_3_1_73_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/N19-1423"},{"key":"e_1_3_1_74_2","unstructured":"Shehzaad Dhuliawala Mojtaba Komeili Jing Xu Roberta Raileanu Xian Li Asli Celikyilmaz and Jason Weston. 2023. Chain-of-verification reduces hallucination in large language models. arXiv:2309.11495. Retrieved from https:\/\/arxiv.org\/abs\/2309.11495"},{"key":"e_1_3_1_75_2","unstructured":"Hanxing Ding Liang Pang Zihao Wei Huawei Shen and Xueqi Cheng. 2024. Retrieve only when it needs: Adaptive retrieval augmentation for hallucination mitigation in large language models. arXiv:2402.10612. Retrieved from https:\/\/arxiv.org\/abs\/2402.10612"},{"key":"e_1_3_1_76_2","unstructured":"Zican Dong Tianyi Tang Junyi Li Wayne Xin Zhao and Ji-Rong Wen. 2023. BAMBOO: A comprehensive benchmark for evaluating long text modeling capacities of large language models. arXiv:2309.13345. 
Retrieved from https:\/\/arxiv.org\/abs\/2309.13345"},{"key":"e_1_3_1_77_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.454"},{"key":"e_1_3_1_78_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.168"},{"key":"e_1_3_1_79_2","unstructured":"Nouha Dziri Hannah Rashkin Tal Linzen and David Reitter. 2021. Evaluating groundedness in dialogue systems: The BEGIN benchmark. arXiv:2105.00071. Retrieved from https:\/\/arxiv.org\/abs\/2105.00071"},{"key":"e_1_3_1_80_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.naacl-main.187"},{"key":"e_1_3_1_81_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00373"},{"key":"e_1_3_1_82_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1213"},{"key":"e_1_3_1_83_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1346"},{"key":"e_1_3_1_84_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1082"},{"key":"e_1_3_1_85_2","unstructured":"Huawen Feng Yan Fan Xiong Liu Ting-En Lin Zekun Yao Yuchuan Wu Fei Huang Yongbin Li and Qianli Ma. 2023. Improving factual consistency of text summarization by adversarially decoupling comprehension and embellishment abilities of LLMs. arXiv:2310.19347. Retrieved from https:\/\/arxiv.org\/abs\/2310.19347"},{"key":"e_1_3_1_86_2","unstructured":"Shangbin Feng Weijia Shi Yuyang Bai Vidhisha Balachandran Tianxing He and Yulia Tsvetkov. 2023. CooK: Empowering general-purpose language models with modular and collaborative knowledge. arXiv:2305.09955. Retrieved from https:\/\/arxiv.org\/abs\/2305.09955"},{"key":"e_1_3_1_87_2","unstructured":"Zhangyin Feng Xiaocheng Feng Dezhi Zhao Maojin Yang and Bing Qin. 2023. Retrieval-generation synergy augmented large language models. arXiv:2310.05149. Retrieved from https:\/\/arxiv.org\/abs\/2310.05149"},{"key":"e_1_3_1_88_2","doi-asserted-by":"crossref","unstructured":"Constanza Fierro Reinald Kim Amplayo Fantine Huot Nicola De Cao Joshua Maynez Shashi Narayan and Mirella Lapata. 2024. 
Learning to plan and generate text with citations. arXiv:2404.03381. Retrieved from https:\/\/arxiv.org\/abs\/2404.03381","DOI":"10.18653\/v1\/2024.acl-long.615"},{"key":"e_1_3_1_89_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.76"},{"key":"e_1_3_1_90_2","unstructured":"Robert Friel and Atindriyo Sanyal. 2023. ChainPoll: A high efficacy method for LLM hallucination detection. arXiv:2310.18344. Retrieved from https:\/\/arxiv.org\/abs\/2310.18344"},{"key":"e_1_3_1_91_2","unstructured":"Jinlan Fu See-Kiong Ng Zhengbao Jiang and Pengfei Liu. 2023. GPTScore: Evaluate as you desire. arXiv:2302.04166. Retrieved from https:\/\/arxiv.org\/abs\/2302.04166"},{"key":"e_1_3_1_92_2","series-title":"JMLR Workshop and Conference Proceedings, Vol. 48","first-page":"1050","volume-title":"Proceedings of the 33rd International Conference on Machine Learning (ICML \u201916)","author":"Gal Yarin","year":"2016","unstructured":"Yarin Gal and Zoubin Ghahramani. 2016. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML \u201916). Maria-Florina Balcan and Kilian Q. Weinberger (Eds.), JMLR Workshop and Conference Proceedings, Vol. 48, JMLR.org, 1050\u20131059. Retrieved from http:\/\/proceedings.mlr.press\/v48\/gal16.html"},{"key":"e_1_3_1_93_2","unstructured":"Leo Gao Stella Biderman Sid Black Laurence Golding Travis Hoppe Charles Foster Jason Phang Horace He Anish Thite Noa Nabeshima et al. 2021. The Pile: An 800GB dataset of diverse text for language modeling. arXiv:2101.00027. Retrieved from https:\/\/arxiv.org\/abs\/2101.00027"},{"key":"e_1_3_1_94_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.910"},{"key":"e_1_3_1_95_2","unstructured":"Mingqi Gao Jie Ruan Renliang Sun Xunjian Yin Shiping Yang and Xiaojun Wan. 2023. Human-like summarization evaluation with ChatGPT. arXiv:2304.02554. 
Retrieved from https:\/\/arxiv.org\/abs\/2304.02554"},{"key":"e_1_3_1_96_2","unstructured":"Tianyu Gao Xingcheng Yao and Danqi Chen. 2021. SimCSE: Simple contrastive learning of sentence embeddings. arXiv:2104.08821. Retrieved from https:\/\/arxiv.org\/abs\/2104.08821"},{"key":"e_1_3_1_97_2","unstructured":"Yunfan Gao Tao Sheng Youlin Xiang Yun Xiong Haofen Wang and Jiawei Zhang. 2023. Chat-REC: Towards interactive and explainable LLMs-augmented recommender system. arXiv:2303.14524. Retrieved from https:\/\/arxiv.org\/abs\/2303.14524"},{"key":"e_1_3_1_98_2","doi-asserted-by":"crossref","unstructured":"Zorik Gekhman Gal Yona Roee Aharoni Matan Eyal Amir Feder Roi Reichart and Jonathan Herzig. 2024. Does fine-tuning LLMs on new knowledge encourage hallucinations? arXiv:2405.05904. Retrieved from https:\/\/arxiv.org\/abs\/","DOI":"10.18653\/v1\/2024.emnlp-main.444"},{"key":"e_1_3_1_99_2","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330955"},{"key":"e_1_3_1_100_2","unstructured":"Google. 2023. Bard. Retrieved from https:\/\/bard.google.com\/"},{"key":"e_1_3_1_101_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.322"},{"key":"e_1_3_1_102_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.114"},{"key":"e_1_3_1_103_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.findings-acl.272"},{"key":"e_1_3_1_104_2","doi-asserted-by":"crossref","unstructured":"Tianrui Guan Fuxiao Liu Xiyang Wu Ruiqi Xian Zongxia Li Xiaoyu Liu Xijun Wang Lichang Chen Furong Huang Yaser Yacoob et al. 2023. HallusionBench: An advanced diagnostic suite for entangled language hallucination & visual illusion in large vision-language models. arXiv:2310.14566. Retrieved from https:\/\/arxiv.org\/abs\/2310.14566","DOI":"10.1109\/CVPR52733.2024.01363"},{"key":"e_1_3_1_105_2","doi-asserted-by":"crossref","unstructured":"Nuno Miguel Guerreiro Duarte M. Alves Jonas Waldendorf Barry Haddow Alexandra Birch Pierre Colombo and Andr\u00e9 F. T. Martins. 
2023. Hallucinations in large multilingual translation models. arXiv:2303.16104. Retrieved from https:\/\/arxiv.org\/abs\/2303.16104","DOI":"10.1162\/tacl_a_00615"},{"key":"e_1_3_1_106_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.eacl-main.75"},{"key":"e_1_3_1_107_2","unstructured":"Suriya Gunasekar Yi Zhang Jyoti Aneja Caio C\u00e9sar Teodoro Mendes Allie Del Giorno Sivakanth Gopi Mojan Javaheripi Piero Kauffmann Gustavo de Rosa Olli Saarikivi et al. 2023. Textbooks are all you need. arXiv:2306.11644. Retrieved from https:\/\/arxiv.org\/abs\/2306.11644"},{"key":"e_1_3_1_108_2","unstructured":"Anisha Gunjal Jihan Yin and Erhan Bas. 2023. Detecting and preventing hallucinations in large vision language models. arXiv:2308.06394. Retrieved from https:\/\/arxiv.org\/abs\/2308.06394"},{"key":"e_1_3_1_109_2","series-title":"Proceedings of Machine Learning Research, Vol. 119","first-page":"3929","volume-title":"Proceedings of the 37th International Conference on Machine Learning (ICML \u201920)","author":"Guu Kelvin","year":"2020","unstructured":"Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. 2020. Retrieval augmented language model pre-training. In Proceedings of the 37th International Conference on Machine Learning (ICML \u201920), Proceedings of Machine Learning Research, Vol. 119, PMLR, 3929\u20133938. Retrieved from http:\/\/proceedings.mlr.press\/v119\/guu20a.html"},{"key":"e_1_3_1_110_2","first-page":"901","volume-title":"Proceedings of the 12th Language Resources and Evaluation Conference","author":"Gyawali Bikash","year":"2020","unstructured":"Bikash Gyawali, Lucas Anastasiou, and Petr Knoth. 2020. Deduplication of scholarly documents using locality sensitive hashing and word embeddings. In Proceedings of the 12th Language Resources and Evaluation Conference. European Language Resources Association, 901\u2013910. 
Retrieved from https:\/\/aclanthology.org\/2020.lrec-1.113"},{"key":"e_1_3_1_111_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00306"},{"key":"e_1_3_1_112_2","unstructured":"Tianyang Han Qing Lian Rui Pan Renjie Pi Jipeng Zhang Shizhe Diao Yong Lin and Tong Zhang. 2024. The instinctive bias: Spurious images lead to hallucination in MLLMs. arXiv:2402.03757. Retrieved from https:\/\/arxiv.org\/abs\/2402.03757"},{"key":"e_1_3_1_113_2","unstructured":"Hangfeng He Hongming Zhang and Dan Roth. 2023. Rethinking with retrieval: Faithful large language model inference. arXiv:2301.00303. Retrieved from https:\/\/arxiv.org\/abs\/2301.00303"},{"key":"e_1_3_1_114_2","unstructured":"Junqing He Kunhao Pan Xiaoqun Dong Zhuoyang Song Yibo Liu Yuxin Liang Hao Wang Qianguo Sun Songxin Zhang Zejian Xie and Jiaxing Zhang. 2023. Never lost in the middle: Improving large language models via attention strengthening question answering. arXiv:2311.09198. Retrieved from https:\/\/arxiv.org\/abs\/2311.09198"},{"key":"e_1_3_1_115_2","volume-title":"Proceedings of the 36th International Conference on Neural Information Processing Systems","author":"Henderson Peter","year":"2022","unstructured":"Peter Henderson, Mark S. Krass, Lucia Zheng, Neel Guha, Christopher D. Manning, Dan Jurafsky, and Daniel E. Ho. 2022. Pile of law: Learning responsible data filtering from the law and a 256GB open-source legal dataset. In Proceedings of the 36th International Conference on Neural Information Processing Systems. Sanmi Koyejo, S. Mohamed, A. Agarwal, Danielle Belgrave, K. Cho, and A. Oh (Eds.). 
Retrieved from http:\/\/papers.nips.cc\/paper_files\/paper\/2022\/hash\/bc218a0c656e49d4b086975a9c785f47-Abstract-Datasets_and_Benchmarks.html"},{"key":"e_1_3_1_116_2","volume-title":"Proceedings of the 9th International Conference on Learning Representations (ICLR \u201921)","author":"Hendrycks Dan","year":"2021","unstructured":"Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. 2021. Measuring massive multitask language understanding. In Proceedings of the 9th International Conference on Learning Representations (ICLR \u201921). OpenReview.net. Retrieved from https:\/\/openreview.net\/forum?id=d7KBjmI3GmQ"},{"key":"e_1_3_1_117_2","unstructured":"Evan Hernandez Belinda Z. Li and Jacob Andreas. 2023. Inspecting and editing knowledge representations in language models. arXiv:2304.00740. Retrieved from https:\/\/arxiv.org\/abs\/2304.00740"},{"key":"e_1_3_1_118_2","volume-title":"Proceedings of the 8th International Conference on Learning Representations (ICLR \u201920)","author":"Holtzman Ari","year":"2020","unstructured":"Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. 2020. The curious case of neural text degeneration. In Proceedings of the 8th International Conference on Learning Representations (ICLR \u201920). OpenReview.net. Retrieved from https:\/\/openreview.net\/forum?id=rygGQyrFvH"},{"key":"e_1_3_1_119_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.619"},{"key":"e_1_3_1_120_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2024.FINDINGS-ACL.838"},{"key":"e_1_3_1_121_2","doi-asserted-by":"crossref","unstructured":"Lei Huang Xiaocheng Feng Weitao Ma Liang Zhao Yuchun Fan Weihong Zhong Dongliang Xu Qing Yang Hongtao Liu and Bing Qin. 2024. Advancing large language model attribution through self-improving. arXiv:2410.13298. 
Retrieved from https:\/\/arxiv.org\/abs\/2410.13298","DOI":"10.18653\/v1\/2024.emnlp-main.223"},{"key":"e_1_3_1_122_2","doi-asserted-by":"crossref","unstructured":"Qidong Huang Xiaoyi Dong Pan Zhang Bin Wang Conghui He Jiaqi Wang Dahua Lin Weiming Zhang and Nenghai Yu. 2023. Opera: Alleviating hallucination in multi-modal large language models via over-trust penalty and retrospection-allocation. arXiv:2311.17911. Retrieved from https:\/\/arxiv.org\/abs\/2311.17911","DOI":"10.1109\/CVPR52733.2024.01274"},{"key":"e_1_3_1_123_2","unstructured":"Shaohan Huang Li Dong Wenhui Wang Yaru Hao Saksham Singhal Shuming Ma Tengchao Lv Lei Cui Owais Khan Mohammed Qiang Liu et al. 2023. Language is not all you need: Aligning perception with language models. arXiv:2302.14045. Retrieved from https:\/\/arxiv.org\/abs\/2302.14045"},{"key":"e_1_3_1_124_2","unstructured":"Yuzhen Huang Yuzhuo Bai Zhihao Zhu Junlei Zhang Jinghan Zhang Tangjun Su Junteng Liu Chuancheng Lv Yikai Zhang Jiayi Lei et al. 2023a. C-eval: A multi-level multi-discipline chinese evaluation suite for foundation models. arXiv:2305.08322. Retrieved from https:\/\/arxiv.org\/abs\/2305.08322"},{"key":"e_1_3_1_125_2","unstructured":"Yi-Chong Huang Xia-Chong Feng Xiao-Cheng Feng and Bing Qin. 2021. The factual inconsistency problem in abstractive text summarization: A survey. arXiv:2104.14839. Retrieved from https:\/\/arxiv.org\/abs\/2104.14839"},{"key":"e_1_3_1_126_2","volume-title":"Proceedings of the 11th International Conference on Learning Representations (ICLR\u00a0\u201923)","author":"Huang Zeyu","year":"2023","unstructured":"Zeyu Huang, Yikang Shen, Xiaofeng Zhang, Jie Zhou, Wenge Rong, and Zhang Xiong. 2023d. Transformer-Patcher: One mistake worth one neuron. In Proceedings of the 11th International Conference on Learning Representations (ICLR\u00a0\u201923). OpenReview.net. 
Retrieved from https:\/\/openreview.net\/pdf?id=4oYUGeGBPm"},{"key":"e_1_3_1_127_2","unstructured":"Siqing Huo Negar Arabzadeh and Charles L. A. Clarke. 2023. Retrieving supporting evidence for LLMs generated answers. arXiv:2306.13781. Retrieved from https:\/\/arxiv.org\/abs\/2306.13781"},{"key":"e_1_3_1_128_2","unstructured":"Srinivasan Iyer Xi Victoria Lin Ramakanth Pasunuru Todor Mihaylov Daniel Simig Ping Yu Kurt Shuster Tianlu Wang Qing Liu Punit Singh Koura et al. 2022. Opt-iml: Scaling language model instruction meta learning through the lens of generalization. arXiv:2212.12017. Retrieved from https:\/\/arxiv.org\/abs\/2212.12017"},{"key":"e_1_3_1_129_2","unstructured":"Gautier Izacard Mathilde Caron Lucas Hosseini Sebastian Riedel Piotr Bojanowski Armand Joulin and Edouard Grave. 2022. Unsupervised dense information retrieval with contrastive learning. Transactions on Machine Learning Research 2022 (2022). Retrieved from https:\/\/openreview.net\/forum?id=jKN1pXi7b0"},{"key":"e_1_3_1_130_2","unstructured":"Gautier Izacard Patrick S. H. Lewis Maria Lomeli Lucas Hosseini Fabio Petroni Timo Schick Jane Dwivedi-Yu Armand Joulin Sebastian Riedel and Edouard Grave. 2023. Atlas: Few-shot learning with retrieval augmented language models. Journal of Machine Learning Research 24 (2023) 251:1\u2013251:43. Retrieved from http:\/\/jmlr.org\/papers\/v24\/23-0037.html"},{"key":"e_1_3_1_131_2","unstructured":"Rolf Jagerman Honglei Zhuang Zhen Qin Xuanhui Wang and Michael Bendersky. 2023. Query expansion by prompting large language models. arXiv:2305.03653. Retrieved from https:\/\/arxiv.org\/abs\/2305.03653"},{"key":"e_1_3_1_132_2","doi-asserted-by":"crossref","unstructured":"Sameer Jain Vaishakh Keshava Swarnashree Mysore Sathyendra Patrick Fernandes Pengfei Liu Graham Neubig and Chunting Zhou. 2023. Multi-dimensional evaluation of text summarization with in-context learning. arXiv:2306.01200. 
Retrieved from https:\/\/arxiv.org\/abs\/2306.01200","DOI":"10.18653\/v1\/2023.findings-acl.537"},{"key":"e_1_3_1_133_2","unstructured":"Joonhyun Jeong. 2023. Hijacking context in large multi-modal models. arXiv:2312.07553. Retrieved from https:\/\/arxiv.org\/abs\/2312.07553"},{"key":"e_1_3_1_134_2","doi-asserted-by":"crossref","unstructured":"Soyeong Jeong Jinheon Baek Sukmin Cho Sung Ju Hwang and Jong C. Park. 2024. Adaptive-RAG: Learning to adapt retrieval-augmented large language models through question complexity. arXiv:2403.14403. Retrieved from https:\/\/arxiv.org\/abs\/2403.14403","DOI":"10.18653\/v1\/2024.naacl-long.389"},{"key":"e_1_3_1_135_2","doi-asserted-by":"publisher","DOI":"10.1145\/3571730"},{"key":"e_1_3_1_136_2","unstructured":"Ziwei Ji Tiezheng Yu Yan Xu Nayeon Lee Etsuko Ishii and Pascale Fung. 2023. Towards mitigating hallucination in large language models via self-reflection. arXiv:2310.06271. Retrieved from https:\/\/arxiv.org\/abs\/2310.06271"},{"key":"e_1_3_1_137_2","doi-asserted-by":"crossref","unstructured":"Chaoya Jiang Wei Ye Mengfan Dong Hongrui Jia Haiyang Xu Ming Yan Ji Zhang and Shikun Zhang. 2024. Hal-Eval: A universal and fine-grained hallucination evaluation framework for large vision language models. arXiv:2402.15721. Retrieved from https:\/\/arxiv.org\/abs\/2402.15721","DOI":"10.1145\/3664647.3680576"},{"key":"e_1_3_1_138_2","doi-asserted-by":"crossref","unstructured":"Huiqiang Jiang Qianhui Wu Chin-Yew Lin Yuqing Yang and Lili Qiu. 2023. LLMLingua: Compressing prompts for accelerated inference of large language models. arXiv:2310.05736. Retrieved from https:\/\/arxiv.org\/abs\/2310.05736","DOI":"10.18653\/v1\/2023.emnlp-main.825"},{"key":"e_1_3_1_139_2","doi-asserted-by":"crossref","unstructured":"Zhengbao Jiang Frank F. Xu Luyu Gao Zhiqing Sun Qian Liu Jane Dwivedi-Yu Yiming Yang Jamie Callan and Graham Neubig. 2023. Active retrieval augmented generation. arXiv:2305.06983. 
Retrieved from https:\/\/arxiv.org\/abs\/2305.06983","DOI":"10.18653\/v1\/2023.emnlp-main.495"},{"key":"e_1_3_1_140_2","unstructured":"Liqiang Jing Ruosen Li Yunmo Chen Mengzhao Jia and Xinya Du. 2023. Faithscore: Evaluating hallucinations in large vision-language models. arXiv:2311.01477. Retrieved from https:\/\/arxiv.org\/abs\/2311.01477"},{"key":"e_1_3_1_141_2","unstructured":"Zhi Jing Yongye Su Yikun Han Bo Yuan Haiyun Xu Chunjiang Liu Kehai Chen and Min Zhang. 2024. When large language models meet vector databases: A survey. arXiv:2402.01763. Retrieved from https:\/\/arxiv.org\/abs\/2402.01763"},{"key":"e_1_3_1_142_2","unstructured":"Saurav Kadavath Tom Conerly Amanda Askell Tom Henighan Dawn Drain Ethan Perez Nicholas Schiefer Zac Hatfield-Dodds Nova DasSarma Eli Tran-Johnson et al. 2022. Language models (mostly) know what they know. arXiv:2207.05221. Retrieved from https:\/\/arxiv.org\/abs\/2207.05221"},{"key":"e_1_3_1_143_2","unstructured":"Greg Kamradt. 2024. The 5 Levels of Text Splitting for Retrieval. Youtube. Retrieved from https:\/\/www.youtube.com\/watch?v=8OJC21T2SL4"},{"key":"e_1_3_1_144_2","series-title":"Proceedings of Machine Learning Research, Vol. 202","first-page":"15696","volume-title":"Proceedings of the International Conference on Machine Learning (ICML \u201923)","author":"Kandpal Nikhil","year":"2023","unstructured":"Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, and Colin Raffel. 2023. Large language models struggle to learn long-tail knowledge. In Proceedings of the International Conference on Machine Learning (ICML \u201923). Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett (Eds.), Proceedings of Machine Learning Research, Vol. 202, PMLR, 15696\u201315707. 
Retrieved from https:\/\/proceedings.mlr.press\/v202\/kandpal23a.html"},{"key":"e_1_3_1_145_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.550"},{"key":"e_1_3_1_146_2","unstructured":"Jungo Kasai Keisuke Sakaguchi Yoichi Takahashi Ronan Le Bras Akari Asai Xinyan Yu Dragomir Radev Noah A. Smith Yejin Choi and Kentaro Inui. 2022. RealTime QA: What\u2019s the answer right now? arXiv:2207.13332. Retrieved from https:\/\/arxiv.org\/abs\/2207.13332"},{"key":"e_1_3_1_147_2","unstructured":"Daniel Martin Katz Michael James Bommarito Shang Gao and Pablo Arredondo. 2023. Gpt-4 Passes the Bar Exam. Retrieved from https:\/\/www.datascienceassn.org\/sites\/default\/files\/GPT-4%20Passes%20the%20Bar%20Exam.pdf"},{"key":"e_1_3_1_148_2","volume-title":"Proceedings of the 8th International Conference on Learning Representations (ICLR \u201920)","author":"Khandelwal Urvashi","year":"2020","unstructured":"Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. 2020. Generalization through memorization: Nearest neighbor language models. In Proceedings of the 8th International Conference on Learning Representations (ICLR \u201920). OpenReview.net. Retrieved from https:\/\/openreview.net\/forum?id=HklBjCEKvH"},{"key":"e_1_3_1_149_2","doi-asserted-by":"publisher","DOI":"10.5555\/3600270.3601883"},{"key":"e_1_3_1_150_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.750"},{"key":"e_1_3_1_151_2","unstructured":"Philippe Laban Wojciech Kry\u015aci\u0144ski Divyansh Agarwal Alexander R. Fabbri Caiming Xiong Shafiq Joty and Chien-Sheng Wu. 2023. LLMs as factual reasoners: Insights from existing benchmarks and beyond. arXiv:2305.14540. 
Retrieved from https:\/\/arxiv.org\/abs\/2305.14540"},{"key":"e_1_3_1_152_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00453"},{"key":"e_1_3_1_153_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.eacl-main.234"},{"key":"e_1_3_1_154_2","first-page":"6402","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems","author":"Lakshminarayanan Balaji","year":"2017","unstructured":"Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. 2017. Simple and scalable predictive uncertainty estimation using deep ensembles. In Proceedings of the 31st International Conference on Neural Information Processing Systems. Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.), 6402\u20136413. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/9ef2ed4b7fd2c810847ffa5fa85bce38-Abstract.html"},{"key":"e_1_3_1_155_2","unstructured":"Tamera Lanham Anna Chen Ansh Radhakrishnan Benoit Steiner Carson Denison Danny Hernandez Dustin Li Esin Durmus Evan Hubinger Jackson Kernion et\u00a0al. 2023. Measuring faithfulness in chain-of-thought reasoning. arXiv:2307.13702. Retrieved from https:\/\/arxiv.org\/abs\/2307.13702"},{"key":"e_1_3_1_156_2","doi-asserted-by":"crossref","unstructured":"Barrett Martin Lattimer Patrick Chen Xinyuan Zhang and Yi Yang. 2023. Fast and accurate factual inconsistency detection over long documents. arXiv:2310.13189. Retrieved from https:\/\/arxiv.org\/abs\/2310.13189","DOI":"10.18653\/v1\/2023.emnlp-main.105"},{"key":"e_1_3_1_157_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.577"},{"key":"e_1_3_1_158_2","doi-asserted-by":"publisher","DOI":"10.5555\/3600270.3602776"},{"key":"e_1_3_1_159_2","unstructured":"Deren Lei Yaxi Li Mingyu Wang Vincent Yun Emily Ching and Eslam Kamal. 2023. Chain of natural language inference for reducing large language model ungrounded hallucinations. 
arXiv:2310.03951. Retrieved from https:\/\/arxiv.org\/abs\/2310.03951"},{"key":"e_1_3_1_160_2","unstructured":"Sicong Leng Hang Zhang Guanzheng Chen Xin Li Shijian Lu Chunyan Miao and Lidong Bing. 2023. Mitigating object hallucinations in large vision-language models through visual contrastive decoding. arXiv:2311.16922. Retrieved from https:\/\/arxiv.org\/abs\/2311.16922"},{"key":"e_1_3_1_161_2","doi-asserted-by":"crossref","unstructured":"BA Levinstein and Daniel A. Herrmann. 2023. Still no lie detector for language models: Probing empirical and conceptual roadblocks. arXiv:2307.00175. Retrieved from https:\/\/arxiv.org\/abs\/2307.00175","DOI":"10.1007\/s11098-023-02094-3"},{"key":"e_1_3_1_162_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.703"},{"key":"e_1_3_1_163_2","doi-asserted-by":"publisher","DOI":"10.5555\/3495724.3496517"},{"key":"e_1_3_1_164_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-acl.112"},{"key":"e_1_3_1_165_2","unstructured":"Jiachun Li Pengfei Cao Yubo Chen Kang Liu and Jun Zhao. 2024. Towards faithful chain-of-thought: large language models are bridging reasoners. arXiv:2405.18915. Retrieved from https:\/\/arxiv.org\/abs\/2405.18915"},{"key":"e_1_3_1_166_2","unstructured":"Junyi Li Jie Chen Ruiyang Ren Xiaoxue Cheng Wayne Xin Zhao Jian-Yun Nie and Ji-Rong Wen. 2024. The dawn after the dark: An empirical study on factuality hallucination in large language models. arXiv:2401.03205. Retrieved from https:\/\/arxiv.org\/abs\/2401.03205"},{"key":"e_1_3_1_167_2","unstructured":"Junyi Li Xiaoxue Cheng Wayne Xin Zhao Jian-Yun Nie and Ji-Rong Wen. 2023. HaluEval: A large-scale hallucination evaluation benchmark for large language models. arXiv:2305.11747. Retrieved from https:\/\/arxiv.org\/abs\/2305.11747"},{"key":"e_1_3_1_168_2","unstructured":"Junnan Li Dongxu Li Silvio Savarese and Steven Hoi. 2023. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. 
arXiv:2301.12597. Retrieved from https:\/\/arxiv.org\/abs\/2301.12597"},{"key":"e_1_3_1_169_2","series-title":"CEUR Workshop Proceedings, Vol. 3589","volume-title":"Proceedings of the SIGIR Workshop on eCommerce Co-located with the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 23)","author":"Li Jinming","year":"2023","unstructured":"Jinming Li, Wentao Zhang, Tian Wang, Guanglei Xiong, Alan Lu, and Gerard Medioni. 2023. GPT4Rec: A generative framework for personalized recommendation and user interests interpretation. In Proceedings of the SIGIR Workshop on eCommerce Co-located with the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 23). Surya Kallumadi, Yubin Kim, Tracy Holloway King, Shervin Malmasi, Maarten de Rijke, and Jacopo Tagliabue (Eds.), CEUR Workshop Proceedings, Vol. 3589, CEUR-WS.org. Retrieved from https:\/\/ceur-ws.org\/Vol-3589\/paper_2.pdf"},{"key":"e_1_3_1_170_2","unstructured":"Kenneth Li Oam Patel Fernanda Vi\u00e9gas Hanspeter Pfister and Martin Wattenberg. 2023. Inference-time intervention: Eliciting truthful answers from a language model. arXiv:2306.03341. Retrieved from https:\/\/arxiv.org\/abs\/2306.03341"},{"key":"e_1_3_1_171_2","unstructured":"Minghan Li Xilun Chen Ari Holtzman Beidi Chen Jimmy Lin Wen-tau Yih and Xi Victoria Lin. 2024. Nearest neighbor speculative decoding for LLM generation and attribution. arXiv:2405.19325. Retrieved from https:\/\/arxiv.org\/abs\/2405.19325"},{"key":"e_1_3_1_172_2","unstructured":"Wei Li Wenhao Wu Moye Chen Jiachen Liu Xinyan Xiao and Hua Wu. 2022. Faithfulness in natural language generation: A systematic survey of analysis evaluation and optimization methods. arXiv:2203.05227. Retrieved from https:\/\/arxiv.org\/abs\/2203.05227"},{"key":"e_1_3_1_173_2","unstructured":"Xiang Lisa Li Ari Holtzman Daniel Fried Percy Liang Jason Eisner Tatsunori Hashimoto Luke Zettlemoyer and Mike Lewis. 2022. 
Contrastive decoding: Open-ended text generation as optimization. arXiv:2210.15097. Retrieved from https:\/\/arxiv.org\/abs\/2210.15097"},{"key":"e_1_3_1_174_2","unstructured":"Yucheng Li. 2023. Unlocking context constraints of LLMs: Enhancing context efficiency of LLMs with self-information-based content filtering. arXiv:2304.12102. Retrieved from https:\/\/arxiv.org\/abs\/2304.12102"},{"key":"e_1_3_1_175_2","unstructured":"Yuanzhi Li S\u00e9bastien Bubeck Ronen Eldan Allie Del Giorno Suriya Gunasekar and Yin Tat Lee. 2023. Textbooks are all you need II: phi-1.5 technical report. arXiv:2309.05463. Retrieved from https:\/\/arxiv.org\/abs\/2309.05463"},{"key":"e_1_3_1_176_2","unstructured":"Yifan Li Yifan Du Kun Zhou Jinpeng Wang Wayne Xin Zhao and Ji-Rong Wen. 2023. Evaluating object hallucination in large vision-language models. arXiv:2305.10355. Retrieved from https:\/\/arxiv.org\/abs\/2305.10355"},{"key":"e_1_3_1_177_2","unstructured":"Yunxiang Li Zihan Li Kai Zhang Ruilong Dan and You Zhang. 2023. ChatDoctor: A medical chat model fine-tuned on LLaMA model using medical domain knowledge. arXiv:2303.14070. Retrieved from https:\/\/arxiv.org\/abs\/2303.14070"},{"key":"e_1_3_1_178_2","unstructured":"Zuchao Li Shitou Zhang Hai Zhao Yifei Yang and Dongjie Yang. 2023i. BatGPT: A bidirectional autoregessive talker from generative pre-trained transformer. arXiv:2307.00360. Retrieved from https:\/\/arxiv.org\/abs\/2307.00360"},{"key":"e_1_3_1_179_2","first-page":"74","volume-title":"Text Summarization Branches Out","author":"Lin Chin-Yew","year":"2004","unstructured":"Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out. Association for Computational Linguistics, 74\u201381. Retrieved from https:\/\/aclanthology.org\/W04-1013"},{"key":"e_1_3_1_180_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.229"},{"key":"e_1_3_1_181_2","unstructured":"Bingbin Liu Jordan T.
Ash Surbhi Goel Akshay Krishnamurthy and Cyril Zhang. 2023. Exposing attention glitches with flip-flop language modeling. arXiv:2306.00946. Retrieved from https:\/\/arxiv.org\/abs\/2306.00946"},{"key":"e_1_3_1_182_2","unstructured":"Fuxiao Liu Tianrui Guan Zongxia Li Lichang Chen Yaser Yacoob Dinesh Manocha and Tianyi Zhou. 2023. HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(Ision) LLaVA-1.5 and Other Multi-Modality Models. Retrieved from https:\/\/arxiv.org\/abs\/2310.14566"},{"key":"e_1_3_1_183_2","unstructured":"Fuxiao Liu Kevin Lin Linjie Li Jianfeng Wang Yaser Yacoob and Lijuan Wang. 2023. Mitigating hallucination in large multi-modal models via robust instruction tuning. arXiv:2306.14565. Retrieved from https:\/\/arxiv.org\/abs\/2306.14565"},{"key":"e_1_3_1_184_2","unstructured":"Haotian Liu Chunyuan Li Qingyang Wu and Yong Jae Lee. 2023. Visual instruction tuning. arXiv:2304.08485. Retrieved from https:\/\/arxiv.org\/abs\/2304.08485"},{"key":"e_1_3_1_185_2","unstructured":"Hanchao Liu Wenyuan Xue Yifei Chen Dapeng Chen Xiutian Zhao Ke Wang Liping Hou Rongjun Li and Wei Peng. 2024. A survey on hallucination in large vision-language models. arXiv:2402.00253. Retrieved from https:\/\/arxiv.org\/abs\/2402.00253"},{"key":"e_1_3_1_186_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-emnlp.655"},{"key":"e_1_3_1_187_2","unstructured":"Nelson F. Liu Kevin Lin John Hewitt Ashwin Paranjape Michele Bevilacqua Fabio Petroni and Percy Liang. 2023. Lost in the middle: How language models use long contexts. arXiv:2307.03172. Retrieved from https:\/\/arxiv.org\/abs\/2307.03172"},{"key":"e_1_3_1_188_2","doi-asserted-by":"crossref","unstructured":"Yang Liu Dan Iter Yichong Xu Shuohang Wang Ruochen Xu and Chenguang Zhu. 2023. Gpteval: Nlg evaluation using gpt-4 with better human alignment. arXiv:2303.16634. 
Retrieved from https:\/\/arxiv.org\/abs\/2303.16634","DOI":"10.18653\/v1\/2023.emnlp-main.153"},{"key":"e_1_3_1_189_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692. Retrieved from http:\/\/arxiv.org\/abs\/1907.11692"},{"key":"e_1_3_1_190_2","unstructured":"Yang Liu Yuanshun Yao Jean-Francois Ton Xiaoying Zhang Ruocheng Guo Hao Cheng Yegor Klochkov Muhammad Faaiz Taufiq and Hang Li. 2023. Trustworthy LLMs: A survey and guideline for evaluating large language models\u2019 alignment. arXiv:2308.05374. Retrieved from https:\/\/arxiv.org\/abs\/2308.05374"},{"key":"e_1_3_1_191_2","unstructured":"Yijin Liu Xianfeng Zeng Fandong Meng and Jie Zhou. 2023. Instruction position matters in sequence generation with large language models. arXiv:2308.12097. Retrieved from https:\/\/arxiv.org\/abs\/2308.12097"},{"key":"e_1_3_1_192_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2021.EMNLP-MAIN.565"},{"key":"e_1_3_1_193_2","doi-asserted-by":"crossref","unstructured":"Holy Lovenia Wenliang Dai Samuel Cahyawijaya Ziwei Ji and Pascale Fung. 2023. Negative object presence evaluation (NOPE) to measure object hallucination in vision-language models. arXiv:2310.05338. Retrieved from https:\/\/arxiv.org\/abs\/2310.05338","DOI":"10.18653\/v1\/2024.alvr-1.4"},{"key":"e_1_3_1_194_2","unstructured":"Jiaying Lu Jinmeng Rao Kezhen Chen Xiaoyuan Guo Yawen Zhang Baochen Sun Carl Yang and Jie Yang. 2023. Evaluation and Mitigation of Agnosia in Multimodal Large Language Models. arXiv:2309.04041. Retrieved from https:\/\/arxiv.org\/abs\/2309.04041"},{"key":"e_1_3_1_195_2","unstructured":"Huaishao Luo Lei Ji Botian Shi Haoyang Huang Nan Duan Tianrui Li Jason Li Taroon Bharti and Ming Zhou. 2020. Univl: A unified video and language pre-training model for multimodal understanding and generation. arXiv:2002.06353. 
Retrieved from https:\/\/arxiv.org\/abs\/2002.06353"},{"key":"e_1_3_1_196_2","unstructured":"Junyu Luo Cao Xiao and Fenglong Ma. 2023. Zero-resource hallucination prevention for large language models. arXiv:2309.02654. Retrieved from https:\/\/arxiv.org\/abs\/2309.02654"},{"key":"e_1_3_1_197_2","unstructured":"Zheheng Luo Qianqian Xie and Sophia Ananiadou. 2023. Chatgpt as a factual inconsistency evaluator for text summarization. arXiv:2303.15621. Retrieved from https:\/\/arxiv.org\/abs\/2303.15621"},{"key":"e_1_3_1_198_2","unstructured":"Xinbei Ma Yeyun Gong Pengcheng He Hai Zhao and Nan Duan. 2023. Query rewriting for retrieval-augmented large language models. arXiv:2305.14283. Retrieved from https:\/\/arxiv.org\/abs\/2305.14283"},{"key":"e_1_3_1_199_2","unstructured":"Muhammad Maaz Hanoona Rasheed Salman Khan and Fahad Shahbaz Khan. 2023. Video-ChatGPT: Towards detailed video understanding via large vision and language models. arXiv:2306.05424. Retrieved from https:\/\/arxiv.org\/abs\/2306.05424"},{"key":"e_1_3_1_200_2","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/9780262019200.001.0001"},{"key":"e_1_3_1_201_2","unstructured":"Chaitanya Malaviya Subin Lee Sihao Chen Elizabeth Sieber Mark Yatskar and Dan Roth. 2023. ExpertQA: Expert-curated questions and attributed answers. arXiv:2309.07852. Retrieved from https:\/\/arxiv.org\/abs\/2309.07852"},{"key":"e_1_3_1_202_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.546"},{"key":"e_1_3_1_203_2","doi-asserted-by":"crossref","unstructured":"Potsawee Manakul Adian Liusie and Mark J. F. Gales. 2023. SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models. arXiv:2303.08896. 
Retrieved from https:\/\/arxiv.org\/abs\/2303.08896","DOI":"10.18653\/v1\/2023.emnlp-main.557"},{"key":"e_1_3_1_204_2","doi-asserted-by":"publisher","DOI":"10.1137\/0222058"},{"key":"e_1_3_1_205_2","unstructured":"Shengyu Mao Yong Jiang Boli Chen Xiao Li Peng Wang Xinyu Wang Pengjun Xie Fei Huang Huajun Chen and Ningyu Zhang. 2024. RaFe: Ranking feedback improves query rewriting for RAG. arXiv:2405.14431. Retrieved from https:\/\/arxiv.org\/abs\/2405.14431"},{"key":"e_1_3_1_206_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.173"},{"key":"e_1_3_1_207_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.170"},{"key":"e_1_3_1_208_2","doi-asserted-by":"publisher","DOI":"10.5555\/3600270.3601532"},{"key":"e_1_3_1_209_2","volume-title":"Proceedings of the 11th International Conference on Learning Representations (ICLR \u201923)","author":"Meng Kevin","year":"2023","unstructured":"Kevin Meng, Arnab Sen Sharma, Alex J. Andonian, Yonatan Belinkov, and David Bau. 2023. Mass-editing memory in a transformer. In Proceedings of the 11th International Conference on Learning Representations (ICLR \u201923). OpenReview.net. Retrieved from https:\/\/openreview.net\/pdf?id=MkbcAHIYgyS"},{"key":"e_1_3_1_210_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.268"},{"key":"e_1_3_1_211_2","unstructured":"Ning Miao Yee Whye Teh and Tom Rainforth. 2023. Selfcheck: Using llms to zero-shot check their own step-by-step reasoning. arXiv:2308.00436. Retrieved from https:\/\/arxiv.org\/abs\/2308.00436"},{"key":"e_1_3_1_212_2","unstructured":"Microsoft. 2023. New Bing. Retrieved from https:\/\/www.bing.com\/new"},{"key":"e_1_3_1_213_2","unstructured":"Sewon Min Suchin Gururangan Eric Wallace Hannaneh Hajishirzi Noah A. Smith and Luke Zettlemoyer. 2023. SILO language models: Isolating legal risk in a nonparametric datastore. arXiv:2308.04430. 
Retrieved from https:\/\/arxiv.org\/abs\/2308.04430"},{"key":"e_1_3_1_214_2","doi-asserted-by":"crossref","unstructured":"Sewon Min Kalpesh Krishna Xinxi Lyu Mike Lewis Wen-tau Yih Pang Wei Koh Mohit Iyyer Luke Zettlemoyer and Hannaneh Hajishirzi. 2023. FActScore: Fine-grained atomic evaluation of factual precision in long form text generation. arXiv:2305.14251. Retrieved from https:\/\/arxiv.org\/abs\/2305.14251","DOI":"10.18653\/v1\/2023.emnlp-main.741"},{"key":"e_1_3_1_215_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.104"},{"key":"e_1_3_1_216_2","volume-title":"Proceedings of the 10th International Conference on Learning Representations (ICLR \u201922)","author":"Mitchell Eric","year":"2022","unstructured":"Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, and Christopher D. Manning. 2022. Fast model editing at scale. In Proceedings of the 10th International Conference on Learning Representations (ICLR \u201922). OpenReview.net. Retrieved from https:\/\/openreview.net\/forum?id=0DcZxeWfOPt"},{"key":"e_1_3_1_217_2","series-title":"Proceedings of Machine Learning Research, Vol. 162","first-page":"15817","volume-title":"Proceedings of the International Conference on Machine Learning (ICML \u201922)","author":"Mitchell Eric","year":"2022","unstructured":"Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D. Manning, and Chelsea Finn. 2022. Memory-based model editing at scale. In Proceedings of the International Conference on Machine Learning (ICML \u201922). Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesv\u00e1ri, Gang Niu, and Sivan Sabato (Eds.), Proceedings of Machine Learning Research, Vol. 162, PMLR, 15817\u201315831. Retrieved from https:\/\/proceedings.mlr.press\/v162\/mitchell22a.html"},{"key":"e_1_3_1_218_2","unstructured":"Luca Moschella Valentino Maiorca Marco Fumero Antonio Norelli Francesco Locatello and Emanuele Rodola. 2022. Relative representations enable zero-shot latent space communication. 
arXiv:2209.15430. Retrieved from https:\/\/arxiv.org\/abs\/2209.15430"},{"key":"e_1_3_1_219_2","unstructured":"Niklas Muennighoff Hongjin Su Liang Wang Nan Yang Furu Wei Tao Yu Amanpreet Singh and Douwe Kiela. 2024. Generative representational instruction tuning. arXiv:2402.09906. Retrieved from https:\/\/arxiv.org\/abs\/2402.09906"},{"key":"e_1_3_1_220_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2023.EACL-MAIN.148"},{"key":"e_1_3_1_221_2","unstructured":"Dor Muhlgay Ori Ram Inbal Magar Yoav Levine Nir Ratner Yonatan Belinkov Omri Abend Kevin Leyton-Brown Amnon Shashua and Yoav Shoham. 2023. Generating benchmarks for factuality evaluation of language models. arXiv:2307.06908. Retrieved from https:\/\/arxiv.org\/abs\/2307.06908"},{"key":"e_1_3_1_222_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2023.FINDINGS-EACL.65"},{"key":"e_1_3_1_223_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-main.235"},{"key":"e_1_3_1_224_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.eacl-main.9"},{"key":"e_1_3_1_225_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2023.ACL-LONG.559"},{"key":"e_1_3_1_226_2","unstructured":"Shiyu Ni Keping Bi Jiafeng Guo and Xueqi Cheng. 2024. When do LLMs need retrieval augmentation? Mitigating LLMs\u2019 overconfidence helps retrieval augmentation. arXiv:2402.11457. Retrieved from https:\/\/arxiv.org\/abs\/2402.11457"},{"key":"e_1_3_1_227_2","unstructured":"Sean O\u2019Brien and Mike Lewis. 2023. Contrastive decoding improves reasoning in large language models. arXiv:2309.09117. Retrieved from https:\/\/arxiv.org\/abs\/2309.09117"},{"key":"e_1_3_1_228_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.findings-naacl.52"},{"key":"e_1_3_1_229_2","unstructured":"OpenAI. 2022. Introducing chatgpt. Retrieved from https:\/\/openai.com\/blog\/chatgpt"},{"key":"e_1_3_1_230_2","unstructured":"OpenAI. 2023. GPT-4 technical report. arXiv:2303.08774. 
Retrieved from https:\/\/arxiv.org\/abs\/2303.08774"},{"key":"e_1_3_1_231_2","volume-title":"NeurIPS","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et\u00a0al. 2022. Training language models to follow instructions with human feedback. In NeurIPS. Retrieved from http:\/\/papers.nips.cc\/paper_files\/paper\/2022\/hash\/b1efde53be364a73914f58805a001731-Abstract-Conference.html"},{"key":"e_1_3_1_232_2","unstructured":"Oded Ovadia Menachem Brief Moshik Mishaeli and Oren Elisha. 2023. Fine-tuning or retrieval? Comparing knowledge injection in LLMs. arXiv:2312.05934. Retrieved from https:\/\/arxiv.org\/abs\/2312.05934"},{"key":"e_1_3_1_233_2","unstructured":"Lorenzo Pacchiardi Alex J Chan S\u00f6ren Mindermann Ilan Moscovitz Alexa Y Pan Yarin Gal Owain Evans and Jan Brauner. 2023. How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions. arXiv:2309.15840. Retrieved from https:\/\/arxiv.org\/abs\/2309.15840"},{"key":"e_1_3_1_234_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.383"},{"key":"e_1_3_1_235_2","unstructured":"Liangming Pan Michael Saxon Wenda Xu Deepak Nathani Xinyi Wang and William Yang Wang. 2023. Automatically correcting large language models: Surveying the landscape of diverse self-correction strategies. arXiv:2308.03188. Retrieved from https:\/\/arxiv.org\/abs\/2308.03188"},{"key":"e_1_3_1_236_2","unstructured":"Ruotong Pan Boxi Cao Hongyu Lin Xianpei Han Jia Zheng Sirui Wang Xunliang Cai and Le Sun. 2024. Not all contexts are equal: Teaching LLMs credibility-aware generation. arXiv:2404.06809. 
Retrieved from https:\/\/arxiv.org\/abs\/2404.06809"},{"key":"e_1_3_1_237_2","doi-asserted-by":"publisher","DOI":"10.3115\/1073083.1073135"},{"key":"e_1_3_1_238_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2020.EMNLP-MAIN.89"},{"key":"e_1_3_1_239_2","doi-asserted-by":"crossref","unstructured":"Debjit Paul Robert West Antoine Bosselut and Boi Faltings. 2024. Making reasoning matter: Measuring and improving faithfulness of chain-of-thought reasoning. arXiv:2402.13950. Retrieved from https:\/\/arxiv.org\/abs\/2402.13950","DOI":"10.18653\/v1\/2024.findings-emnlp.882"},{"key":"e_1_3_1_240_2","doi-asserted-by":"publisher","DOI":"10.1016\/J.PATTER.2021.100336"},{"key":"e_1_3_1_241_2","unstructured":"Guilherme Penedo Quentin Malartic Daniel Hesslow Ruxandra Cojocaru Alessandro Cappelli Hamza Alobeidli Baptiste Pannier Ebtesam Almazrouei and Julien Launay. 2023. The RefinedWeb dataset for falcon LLM: Outperforming curated corpora with web data and web data only. arXiv:2306.01116. Retrieved from https:\/\/arxiv.org\/abs\/2306.01116"},{"key":"e_1_3_1_242_2","unstructured":"Baolin Peng Chunyuan Li Pengcheng He Michel Galley and Jianfeng Gao. 2023. Instruction tuning with gpt-4. arXiv:2304.03277. Retrieved from https:\/\/arxiv.org\/abs\/2304.03277"},{"key":"e_1_3_1_243_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2023.FINDINGS-ACL.847"},{"key":"e_1_3_1_244_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1250"},{"key":"e_1_3_1_245_2","doi-asserted-by":"crossref","unstructured":"Ofir Press Muru Zhang Sewon Min Ludwig Schmidt Noah A Smith and Mike Lewis. 2022. Measuring and narrowing the compositionality gap in language models. arXiv:2210.03350. Retrieved from https:\/\/arxiv.org\/abs\/2210.03350","DOI":"10.18653\/v1\/2023.findings-emnlp.378"},{"key":"e_1_3_1_246_2","unstructured":"Jirui Qi Gabriele Sarti Raquel Fern\u00e1ndez and Arianna Bisazza. 2024. Model internals-based answer attribution for trustworthy retrieval-augmented generation. 
arXiv:2406.13663. Retrieved from https:\/\/arxiv.org\/abs\/2406.13663"},{"key":"e_1_3_1_247_2","unstructured":"Zhixiao Qi Yijiong Yu Meiqi Tu Junyi Tan and Yongfeng Huang. 2023. FoodGPT: A large language model in food testing domain with incremental pre-training and knowledge graph prompt. arXiv:2308.10173. Retrieved from https:\/\/arxiv.org\/abs\/2308.10173"},{"key":"e_1_3_1_248_2","unstructured":"Shuofei Qiao Yixin Ou Ningyu Zhang Xiang Chen Yunzhi Yao Shumin Deng Chuanqi Tan Fei Huang and Huajun Chen. 2022. Reasoning with language model prompting: A survey. arXiv:2212.09597. Retrieved from https:\/\/arxiv.org\/abs\/2212.09597"},{"key":"e_1_3_1_249_2","unstructured":"Alec Radford Karthik Narasimhan Tim Salimans and Ilya Sutskever. 2018. Improving language understanding by generative pre-training."},{"issue":"8","key":"e_1_3_1_250_2","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI blog 1, 8 (2019), 9.","journal-title":"OpenAI blog"},{"key":"e_1_3_1_251_2","unstructured":"Rafael Rafailov Archit Sharma Eric Mitchell Stefano Ermon Christopher D. Manning and Chelsea Finn. 2023. Direct preference optimization: Your language model is secretly a reward model. arXiv:2305.18290. Retrieved from https:\/\/arxiv.org\/abs\/2305.18290"},{"key":"e_1_3_1_252_2","unstructured":"Colin Raffel Noam Shazeer Adam Roberts Katherine Lee Sharan Narang Michael Matena Yanqi Zhou Wei Li and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research 21 (2020) 140:1\u2013140:67. 
Retrieved from http:\/\/jmlr.org\/papers\/v21\/20-074.html"},{"key":"e_1_3_1_253_2","doi-asserted-by":"crossref","unstructured":"Ori Ram Yoav Levine Itay Dalmedigos Dor Muhlgay Amnon Shashua Kevin Leyton-Brown and Yoav Shoham. 2023. In-context retrieval-augmented language models. arXiv:2302.00083. Retrieved from https:\/\/arxiv.org\/abs\/2302.00083","DOI":"10.1162\/tacl_a_00605"},{"key":"e_1_3_1_254_2","volume-title":"Proceedings of the 4th International Conference on Learning Representations (ICLR \u201916)","author":"Ranzato Marc\u2019Aurelio","year":"2016","unstructured":"Marc\u2019Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2016. Sequence level training with recurrent neural networks. In Proceedings of the 4th International Conference on Learning Representations (ICLR \u201916). Yoshua Bengio and Yann LeCun (Eds.), Retrieved from http:\/\/arxiv.org\/abs\/1511.06732"},{"key":"e_1_3_1_255_2","unstructured":"Mathieu Ravaut Aixin Sun Nancy F. Chen and Shafiq Joty. 2024. On context utilization in summarization with large language models. arXiv:2310.10570. Retrieved from https:\/\/arxiv.org\/abs\/2310.10570"},{"key":"e_1_3_1_256_2","unstructured":"Vipula Rawte Amit P. Sheth and Amitava Das. 2023. A survey of hallucination in large foundation models. arXiv:2309.05922. Retrieved from https:\/\/arxiv.org\/abs\/2309.05922"},{"key":"e_1_3_1_257_2","unstructured":"Machel Reid Nikolay Savinov Denis Teplyashin Dmitry Lepikhin Timothy Lillicrap Jean Baptiste Alayrac Radu Soricut Angeliki Lazaridou Orhan Firat Julian Schrittwieser et\u00a0al. 2024. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv:2403.05530. Retrieved from https:\/\/arxiv.org\/abs\/2403.05530"},{"key":"e_1_3_1_258_2","unstructured":"Ruiyang Ren Yuhao Wang Yingqi Qu Wayne Xin Zhao Jing Liu Hao Tian Hua Wu Ji-Rong Wen and Haifeng Wang. 2023. Investigating the factual knowledge boundary of large language models with retrieval augmentation. 
arXiv:2307.11019. Retrieved from https:\/\/arxiv.org\/abs\/2307.11019"},{"key":"e_1_3_1_259_2","unstructured":"Reuters. 2023. U.S. Copyright Office Says Some AI-Assisted Works May Be Copyrighted. Retrieved from https:\/\/www.reuters.com\/world\/us\/us-copyright-office-says-some-ai-assisted-works-may-be-copyrighted-2023-03-15\/"},{"key":"e_1_3_1_260_2","unstructured":"Nina Rimsky. 2023. Modulating Sycophancy in an RLHF Model via Activation Steering. Retrieved from https:\/\/www.alignmentforum.org\/posts\/zt6hRsDE84HeBKh7E\/reducing-sycophancy-and-improving-honesty-via-activation"},{"key":"e_1_3_1_261_2","unstructured":"Nina Rimsky. 2023. Reducing Sycophancy and Improving Honesty via Activation Steering. Retrieved from https:\/\/www.alignmentforum.org\/posts\/zt6hRsDE84HeBKh7E\/reducing-sycophancy-and-improving-honesty-via-activation"},{"key":"e_1_3_1_262_2","unstructured":"Vinu Sankar Sadasivan Aounon Kumar Sriram Balasubramanian Wenxiao Wang and Soheil Feizi. 2023. Can AI-generated text be reliably detected? arXiv:2303.11156. Retrieved from https:\/\/arxiv.org\/abs\/2303.11156"},{"key":"e_1_3_1_263_2","unstructured":"Sashank Santhanam Behnam Hedayatnia Spandana Gella Aishwarya Padmakumar Seokhwan Kim Yang Liu and Dilek Hakkani-Tur. 2021. Rome was built in 1776: A case study on factual correctness in knowledge-grounded response generation. arXiv:2110.05456. Retrieved from https:\/\/arxiv.org\/abs\/2110.05456"},{"key":"e_1_3_1_264_2","unstructured":"Parth Sarthi Salman Abdullah Aditi Tuli Shubh Khanna Anna Goldie and Christopher D. Manning. 2024. RAPTOR: Recursive abstractive processing for tree-organized retrieval. arXiv:2401.18059. Retrieved from https:\/\/arxiv.org\/abs\/2401.18059"},{"key":"e_1_3_1_265_2","unstructured":"William Saunders Catherine Yeh Jeff Wu Steven Bills Long Ouyang Jonathan Ward and Jan Leike. 2022. Self-critiquing models for assisting human evaluators. arXiv:2206.05802. 
Retrieved from https:\/\/arxiv.org\/abs\/2206.05802"},{"key":"e_1_3_1_266_2","unstructured":"John Schulman. 2023. Reinforcement Learning from Human Feedback: Progress and Challenges. Berkeley EECS. Retrieved from https:\/\/www.youtube.com\/watch?v=hhiLw5Q_UFg"},{"key":"e_1_3_1_267_2","unstructured":"John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv:1707.06347. Retrieved from https:\/\/arxiv.org\/abs\/1707.06347"},{"key":"e_1_3_1_268_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.529"},{"key":"e_1_3_1_269_2","doi-asserted-by":"crossref","unstructured":"Yijia Shao Yucheng Jiang Theodore A. Kanell Peter Xu Omar Khattab and Monica S. Lam. 2024. Assisting in writing wikipedia-like articles from scratch with large language models. arXiv:2402.14207. Retrieved from https:\/\/arxiv.org\/abs\/2402.14207","DOI":"10.18653\/v1\/2024.naacl-long.347"},{"key":"e_1_3_1_270_2","doi-asserted-by":"crossref","unstructured":"Zhihong Shao Yeyun Gong Yelong Shen Minlie Huang Nan Duan and Weizhu Chen. 2023. Enhancing retrieval-augmented large language models with iterative retrieval-generation synergy. arXiv:2305.15294. Retrieved from https:\/\/arxiv.org\/abs\/2305.15294","DOI":"10.18653\/v1\/2023.findings-emnlp.620"},{"key":"e_1_3_1_271_2","unstructured":"Mrinank Sharma Meg Tong Tomasz Korbak David Duvenaud Amanda Askell Samuel R. Bowman Newton Cheng Esin Durmus Zac Hatfield-Dodds Scott R. Johnston et\u00a0al. 2023. Towards understanding sycophancy in language models. arXiv:2310.13548. Retrieved from https:\/\/arxiv.org\/abs\/2310.13548"},{"key":"e_1_3_1_272_2","unstructured":"Weijia Shi Xiaochuang Han Mike Lewis Yulia Tsvetkov Luke Zettlemoyer and Scott Wen-tau Yih. 2023. Trusting your evidence: Hallucinate less with context-aware decoding. arXiv:2305.14739. 
Retrieved from https:\/\/arxiv.org\/abs\/2305.14739"},{"key":"e_1_3_1_273_2","unstructured":"Weijia Shi Sewon Min Maria Lomeli Chunting Zhou Margaret Li Xi Victoria Lin Noah A. Smith Luke Zettlemoyer Scott Yih and Mike Lewis. 2023. In-context pretraining: language modeling beyond document boundaries. arXiv:2310.10638. Retrieved from https:\/\/arxiv.org\/abs\/2310.10638"},{"key":"e_1_3_1_274_2","unstructured":"Weijia Shi Sewon Min Michihiro Yasunaga Minjoon Seo Rich James Mike Lewis Luke Zettlemoyer and Wen-tau Yih. 2023. REPLUG: Retrieval-augmented black-box language models. arXiv:2301.12652. Retrieved from https:\/\/arxiv.org\/abs\/2301.12652"},{"key":"e_1_3_1_275_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-emnlp.320"},{"key":"e_1_3_1_276_2","unstructured":"Karan Singhal Tao Tu Juraj Gottweis Rory Sayres Ellery Wulczyn Le Hou Kevin Clark Stephen Pfohl Heather Cole-Lewis Darlene Neal et\u00a0al. 2023. Towards expert-level medical question answering with large language models. arXiv:2305.09617. Retrieved from https:\/\/arxiv.org\/abs\/2305.09617"},{"key":"e_1_3_1_277_2","volume-title":"Proceedings of the 8th International Conference on Learning Representations (ICLR \u201920)","author":"Sinitsin Anton","year":"2020","unstructured":"Anton Sinitsin, Vsevolod Plokhotnyuk, Dmitriy Pyrkin, Sergei Popov, and Artem Babenko. 2020. Editable neural networks. In Proceedings of the 8th International Conference on Learning Representations (ICLR \u201920). OpenReview.net. Retrieved from https:\/\/openreview.net\/forum?id=HJedXaEtvS"},{"key":"e_1_3_1_278_2","doi-asserted-by":"crossref","unstructured":"Aviv Slobodkin Omer Goldman Avi Caciularu Ido Dagan and Shauli Ravfogel. 2023. The curious case of hallucinatory unanswerability: Finding truths in the hidden states of over-confident large language models. arXiv:2310.11877. 
Retrieved from https:\/\/arxiv.org\/abs\/2310.11877","DOI":"10.18653\/v1\/2023.emnlp-main.220"},{"key":"e_1_3_1_279_2","doi-asserted-by":"crossref","unstructured":"Aviv Slobodkin Eran Hirsch Arie Cattan Tal Schuster and Ido Dagan. 2024. Attribute first then generate: Locally-attributable grounded text generation. arXiv:2403.17104. Retrieved from https:\/\/arxiv.org\/abs\/2403.17104","DOI":"10.18653\/v1\/2024.acl-long.182"},{"key":"e_1_3_1_280_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1331"},{"key":"e_1_3_1_281_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.566"},{"key":"e_1_3_1_282_2","doi-asserted-by":"publisher","DOI":"10.5555\/3495724.3495977"},{"key":"e_1_3_1_283_2","unstructured":"Hongjin Su Howard Yen Mengzhou Xia Weijia Shi Niklas Muennighoff Han Yu Wang Haisu Liu Quan Shi Zachary S. Siegel Michael Tang Ruoxi Sun Jinsung Yoon Sercan O. Arik Danqi Chen and Tao Yu. 2024. BRIGHT: A realistic and challenging benchmark for reasoning-intensive retrieval. arXiv:2407.12883. Retrieved from https:\/\/arxiv.org\/abs\/2407.12883"},{"key":"e_1_3_1_284_2","unstructured":"Jianlin Su Yu Lu Shengfeng Pan Bo Wen and Yunfeng Liu. 2021. RoFormer: Enhanced transformer with rotary position embedding. arXiv:2104.09864. Retrieved from https:\/\/arxiv.org\/abs\/2104.09864"},{"key":"e_1_3_1_285_2","unstructured":"Weihang Su Yichen Tang Qingyao Ai Zhijing Wu and Yiqun Liu. 2024. DRAGIN: Dynamic retrieval augmented generation based on the real-time information needs of large language models. arXiv:2403.10081. Retrieved from https:\/\/arxiv.org\/abs\/2403.10081"},{"key":"e_1_3_1_286_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.findings-acl.48"},{"key":"e_1_3_1_287_2","unstructured":"Kai Sun Yifan Ethan Xu Hanwen Zha Yue Liu and Xin Luna Dong. 2023. Head-to-tail: How knowledgeable are large language models (LLM)? A.K.A. will LLMs replace knowledge graphs? arXiv:2308.10168. 
Retrieved from https:\/\/arxiv.org\/abs\/2308.10168"},{"key":"e_1_3_1_288_2","unstructured":"Ilya Sutskever. 2023. An Observation on Generalization. Youtube. Retrieved from https:\/\/www.youtube.com\/watch?v=AKMuA_TVz3A&t=5s"},{"key":"e_1_3_1_289_2","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"Tan Chenmien","year":"2024","unstructured":"Chenmien Tan, Ge Zhang, and Jie Fu. 2024. Massive editing for large language model via meta learning. In Proceedings of the 12th International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=L6L1CJQ2PE"},{"key":"e_1_3_1_290_2","unstructured":"Hexiang Tan Fei Sun Wanli Yang Yuanzhuo Wang Qi Cao and Xueqi Cheng. 2024. Blinded by generated contexts: How language models merge generated and retrieved contexts for open-domain QA? arXiv:2401.11911. Retrieved from https:\/\/arxiv.org\/abs\/2401.11911"},{"key":"e_1_3_1_291_2","unstructured":"Raphael Tang Xinyu Zhang Xueguang Ma Jimmy Lin and Ferhan Ture. 2023. Found in the middle: Permutation self-consistency improves listwise ranking in large language models. arXiv:2310.07712. Retrieved from https:\/\/arxiv.org\/abs\/2310.07712"},{"key":"e_1_3_1_292_2","unstructured":"Nandan Thakur Nils Reimers Andreas R\u00fcckl\u00e9 Abhishek Srivastava and Iryna Gurevych. 2021. BEIR: A heterogenous benchmark for zero-shot evaluation of information retrieval models. arXiv:2104.08663. Retrieved from https:\/\/arxiv.org\/abs\/2104.08663"},{"key":"e_1_3_1_293_2","unstructured":"Ran Tian Shashi Narayan Thibault Sellam and Ankur P Parikh. 2019. Sticking to the facts: Confident decoding for faithful data-to-text generation. ArXiv preprint abs\/1910.08684 (2019). Retrieved from https:\/\/arxiv.org\/abs\/1910.08684"},{"key":"e_1_3_1_294_2","unstructured":"Shengbang Tong Zhuang Liu Yuexiang Zhai Yi Ma Yann LeCun and Saining Xie. 2024. Eyes wide shut? Exploring the visual shortcomings of multimodal LLMs. 
arXiv:2401.06209. Retrieved from https:\/\/arxiv.org\/abs\/2401.06209"},{"key":"e_1_3_1_295_2","unstructured":"S. M. Towhidul Islam Tonmoy S. M. Mehedi Zaman Vinija Jain Anku Rani Vipula Rawte Aman Chadha and Amitava Das. 2024. A comprehensive survey of hallucination mitigation techniques in large language models. arXiv:2401.01313. Retrieved from https:\/\/arxiv.org\/abs\/2401.01313"},{"key":"e_1_3_1_296_2","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar et\u00a0al. 2023. LLaMA: Open and efficient foundation language models. arXiv:2302.13971. Retrieved from https:\/\/arxiv.org\/abs\/2302.13971"},{"key":"e_1_3_1_297_2","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale et\u00a0al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288. Retrieved from https:\/\/arxiv.org\/abs\/2307.09288"},{"key":"e_1_3_1_298_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.557"},{"key":"e_1_3_1_299_2","unstructured":"Miles Turpin Julian Michael Ethan Perez and Samuel R. Bowman. 2023. Language models don\u2019t always say what they think: Unfaithful explanations in chain-of-thought prompting. arXiv:2305.04388. Retrieved from https:\/\/arxiv.org\/abs\/2305.04388"},{"key":"e_1_3_1_300_2","unstructured":"Logesh Kumar Umapathi Ankit Pal and Malaikannan Sankarasubbu. 2023. Med-halt: Medical domain hallucination test for large language models. arXiv:2307.15343. Retrieved from https:\/\/arxiv.org\/abs\/2307.15343"},{"key":"e_1_3_1_301_2","unstructured":"A\u00e4ron van den Oord Yazhe Li and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv:1807.03748. 
Retrieved from http:\/\/arxiv.org\/abs\/1807.03748"},{"key":"e_1_3_1_302_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.399"},{"key":"e_1_3_1_303_2","unstructured":"Neeraj Varshney Wenlin Yao Hongming Zhang Jianshu Chen and Dong Yu. 2023. A stitch in time saves nine: Detecting and mitigating hallucinations of LLMs by validating low-confidence generation. arXiv:2307.03987. Retrieved from https:\/\/arxiv.org\/abs\/2307.03987"},{"key":"e_1_3_1_304_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1580"},{"key":"e_1_3_1_305_2","doi-asserted-by":"crossref","unstructured":"Tu Vu Mohit Iyyer Xuezhi Wang Noah Constant Jerry Wei Jason Wei Chris Tar Yun-Hsuan Sung Denny Zhou Quoc Le and Thang Luong. 2023. FreshLLMs: Refreshing large language models with search engine augmentation. arXiv:2310.03214. Retrieved from https:\/\/arxiv.org\/abs\/2310.03214","DOI":"10.18653\/v1\/2024.findings-acl.813"},{"key":"e_1_3_1_306_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.eacl-main.210"},{"key":"e_1_3_1_307_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.450"},{"key":"e_1_3_1_308_2","unstructured":"Binjie Wang Ethan Chern and Pengfei Liu. 2023. ChineseFactEval: A Factuality Benchmark for Chinese LLMs."},{"key":"e_1_3_1_309_2","unstructured":"Cunxiang Wang Xiaoze Liu Yuanhao Yue Xiangru Tang Tianhang Zhang Jiayang Cheng Yunzhi Yao Wenyang Gao Xuming Hu Zehan Qi et\u00a0al. 2023. Survey on factuality in large language models: Knowledge retrieval and domain-specificity. arXiv:2310.07521. Retrieved from https:\/\/arxiv.org\/abs\/2310.07521"},{"key":"e_1_3_1_310_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.326"},{"key":"e_1_3_1_311_2","doi-asserted-by":"crossref","unstructured":"Jiaan Wang Yunlong Liang Fandong Meng Haoxiang Shi Zhixu Li Jinan Xu Jianfeng Qu and Jie Zhou. 2023. Is chatgpt a good nlg evaluator? a preliminary study. arXiv:2303.04048. 
Retrieved from https:\/\/arxiv.org\/abs\/2303.04048","DOI":"10.18653\/v1\/2023.newsum-1.1"},{"key":"e_1_3_1_312_2","unstructured":"Junyang Wang Yuhang Wang Guohai Xu Jing Zhang Yukai Gu Haitao Jia Ming Yan Ji Zhang and Jitao Sang. 2023. An LLM-free multi-dimensional benchmark for mllms hallucination evaluation. arXiv:2311.07397. Retrieved from https:\/\/arxiv.org\/abs\/2311.07397"},{"key":"e_1_3_1_313_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-53302-0_3"},{"key":"e_1_3_1_314_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2023.EMNLP-MAIN.585"},{"key":"e_1_3_1_315_2","unstructured":"Peifeng Wang Zhengyang Wang Zheng Li Yifan Gao Bing Yin and Xiang Ren. 2023. SCOTT: Self-consistent chain-of-thought distillation. arXiv:2305.01879. Retrieved from https:\/\/arxiv.org\/abs\/2305.01879"},{"key":"e_1_3_1_316_2","unstructured":"Shuting Wang Xin Yu Mang Wang Weipeng Chen Yutao Zhu and Zhicheng Dou. 2024. RichRAG: Crafting rich responses for multi-faceted queries in retrieval-augmented generation. arXiv:2406.12566. Retrieved from https:\/\/arxiv.org\/abs\/2406.12566"},{"key":"e_1_3_1_317_2","unstructured":"Song Wang Yaochen Zhu Haochen Liu Zaiyi Zheng Chen Chen and Jundong Li. 2023. Knowledge editing for large language models: A survey. arXiv:2310.16218. Retrieved from https:\/\/arxiv.org\/abs\/2310.16218"},{"key":"e_1_3_1_318_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2023.FINDINGS-EMNLP.691"},{"key":"e_1_3_1_319_2","unstructured":"Yufei Wang Wanjun Zhong Liangyou Li Fei Mi Xingshan Zeng Wenyong Huang Lifeng Shang Xin Jiang and Qun Liu. 2023. Aligning large language models with human: A survey. arXiv:2307.12966. Retrieved from https:\/\/arxiv.org\/abs\/2307.12966"},{"key":"e_1_3_1_320_2","unstructured":"Zhiruo Wang Jun Araki Zhengbao Jiang Md. Rizwan Parvez and Graham Neubig. 2023. Learning to filter context for retrieval-augmented generation. arXiv:2311.08377. 
Retrieved from https:\/\/arxiv.org\/abs\/2311.08377"},{"key":"e_1_3_1_321_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.101"},{"key":"e_1_3_1_322_2","unstructured":"Zirui Wang Jiahui Yu Adams Wei Yu Zihang Dai Yulia Tsvetkov and Yuan Cao. 2022. SimVLM: Simple visual language model pretraining with weak supervision. In Proceedings of the 10th International Conference on Learning Representations (ICLR \u201922). OpenReview.net. Retrieved from https:\/\/openreview.net\/forum?id=GUrhfTuf_3"},{"key":"e_1_3_1_323_2","doi-asserted-by":"publisher","DOI":"10.5555\/3600270.3602070"},{"key":"e_1_3_1_324_2","unstructured":"Jerry W. Wei Da Huang Yifeng Lu Denny Zhou and Quoc V. Le. 2023. Simple synthetic data reduces sycophancy in large language models. arXiv:2308.03958. Retrieved from https:\/\/arxiv.org\/abs\/2308.03958"},{"key":"e_1_3_1_325_2","unstructured":"Laura Weidinger John Mellor Maribeth Rauh Conor Griffin Jonathan Uesato Po-Sen Huang Myra Cheng Mia Glaese Borja Balle Atoosa Kasirzadeh et\u00a0al. 2021. Ethical and social risks of harm from language models. arXiv:2112.04359. Retrieved from https:\/\/arxiv.org\/abs\/2112.04359"},{"key":"e_1_3_1_326_2","unstructured":"Yilin Wen Zifeng Wang and Jimeng Sun. 2023. MindMap: Knowledge graph prompting sparks graph of thoughts in large language models. arXiv:2308.09729. Retrieved from https:\/\/arxiv.org\/abs\/2308.09729"},{"key":"e_1_3_1_327_2","doi-asserted-by":"crossref","unstructured":"Di Wu Jia-Chen Gu Fan Yin Nanyun Peng and Kai-Wei Chang. 2024. Synchronous faithfulness monitoring for trustworthy retrieval-augmented generation. arXiv:2406.13692. Retrieved from https:\/\/arxiv.org\/abs\/2406.13692","DOI":"10.18653\/v1\/2024.emnlp-main.527"},{"key":"e_1_3_1_328_2","unstructured":"Kevin Wu Eric Wu and James Zou. 2024. ClashEval: Quantifying the tug-of-war between an LLM\u2019s internal prior and external evidence. arXiv:2404.10198. 
Retrieved from https:\/\/arxiv.org\/abs\/2404.10198"},{"key":"e_1_3_1_329_2","unstructured":"Shitao Xiao Zheng Liu Peitian Zhang and Niklas Muennighoff. 2023. C-Pack: Packaged resources to advance general chinese embedding. arXiv:2309.07597. Retrieved from https:\/\/arxiv.org\/abs\/2309.07597"},{"key":"e_1_3_1_330_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-main.236"},{"key":"e_1_3_1_331_2","unstructured":"Jian Xie Kai Zhang Jiangjie Chen Renze Lou and Yu Su. 2023. Adaptive chameleon or stubborn sloth: Unraveling the behavior of large language models in knowledge clashes. arXiv:2305.13300. Retrieved from https:\/\/arxiv.org\/abs\/2305.13300"},{"key":"e_1_3_1_332_2","unstructured":"Miao Xiong Zhiyuan Hu Xinyang Lu Yifei Li Jie Fu Junxian He and Bryan Hooi. 2023. Can LLMs express their uncertainty? An empirical evaluation of confidence elicitation in LLMs. arXiv:2306.13063. Retrieved from https:\/\/arxiv.org\/abs\/2306.13063"},{"key":"e_1_3_1_333_2","unstructured":"Fangyuan Xu Weijia Shi and Eunsol Choi. 2023. RECOMP: Improving retrieval-augmented LMs with compression and selective augmentation. arXiv:2310.04408. Retrieved from https:\/\/arxiv.org\/abs\/2310.04408"},{"key":"e_1_3_1_334_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.508"},{"key":"e_1_3_1_335_2","unstructured":"Jundong Xu Hao Fei Liangming Pan Qian Liu Mong-Li Lee and Wynne Hsu. 2024. Faithful logical reasoning via symbolic chain-of-thought. arXiv:2405.18357. Retrieved from https:\/\/arxiv.org\/abs\/2405.18357"},{"key":"e_1_3_1_336_2","unstructured":"Shicheng Xu Danyang Hou Liang Pang Jingcheng Deng Jun Xu Huawei Shen and Xueqi Cheng. 2023. AI-generated images introduce invisible relevance bias to text-image retrieval. arXiv:2311.14084. Retrieved from https:\/\/arxiv.org\/abs\/2311.14084"},{"key":"e_1_3_1_337_2","doi-asserted-by":"crossref","unstructured":"Shiping Yang Renliang Sun and Xiaojun Wan. 2023. 
A new benchmark and reverse validation method for passage-level hallucination detection. arXiv:2310.06498. Retrieved from https:\/\/arxiv.org\/abs\/2310.06498","DOI":"10.18653\/v1\/2023.findings-emnlp.256"},{"key":"e_1_3_1_338_2","unstructured":"Yuqing Yang Ethan Chern Xipeng Qiu Graham Neubig and Pengfei Liu. 2023. Alignment for honesty. arXiv:2312.07000. Retrieved from https:\/\/arxiv.org\/abs\/2312.07000"},{"key":"e_1_3_1_339_2","volume-title":"Proceedings of the 6th International Conference on Learning Representations (ICLR\u00a0\u201918)","author":"Yang Zhilin","year":"2018","unstructured":"Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, and William W. Cohen. 2018. Breaking the softmax bottleneck: A high-rank RNN language model. In Proceedings of the 6th International Conference on Learning Representations (ICLR\u00a0\u201918). OpenReview.net. Retrieved from https:\/\/openreview.net\/forum?id=HkwZSG-CZ"},{"key":"e_1_3_1_340_2","first-page":"15922","volume-title":"Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS \u201919)","author":"Yang Zhilin","year":"2019","unstructured":"Zhilin Yang, Thang Luong, Ruslan Salakhutdinov, and Quoc V. Le. 2019. Mixtape: Breaking the softmax bottleneck efficiently. In Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS \u201919). Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d\u2019Alch\u00e9-Buc, Emily B. Fox, and Roman Garnett (Eds.), 15922\u201315930. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/512fc3c5227f637e41437c999a2d3169-Abstract.html"},{"key":"e_1_3_1_341_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1259"},{"key":"e_1_3_1_342_2","unstructured":"Jia-Yu Yao Kun-Peng Ning Zhen-Hui Liu Mu-Nan Ning and Li Yuan. 2023. LLM Lies: Hallucinations are not bugs but features as adversarial examples. arXiv:2310.01469. 
Retrieved from https:\/\/arxiv.org\/abs\/2310.01469"},{"key":"e_1_3_1_343_2","unstructured":"Shunyu Yao Jeffrey Zhao Dian Yu Nan Du Izhak Shafran Karthik Narasimhan and Yuan Cao. 2022. React: Synergizing reasoning and acting in language models. arXiv:2210.03629. Retrieved from https:\/\/arxiv.org\/abs\/2210.03629"},{"key":"e_1_3_1_344_2","unstructured":"Yunzhi Yao Peng Wang Bozhong Tian Siyuan Cheng Zhoubo Li Shumin Deng Huajun Chen and Ningyu Zhang. 2023. Editing large language models: Problems methods and opportunities. arXiv:2305.13172. Retrieved from https:\/\/arxiv.org\/abs\/2305.13172"},{"key":"e_1_3_1_345_2","unstructured":"Xi Ye Ruoxi Sun Sercan \u00d6. Arik and Tomas Pfister. 2023. Effective large language model adaptation for improved grounding. arXiv:2311.09533. Retrieved from https:\/\/arxiv.org\/abs\/2311.09533"},{"key":"e_1_3_1_346_2","unstructured":"Shukang Yin Chaoyou Fu Sirui Zhao Tong Xu Hao Wang Dianbo Sui Yunhang Shen Ke Li Xing Sun and Enhong Chen. 2023. Woodpecker: Hallucination correction for multimodal large language models. arXiv:2310.16045. Retrieved from https:\/\/arxiv.org\/abs\/2310.16045"},{"key":"e_1_3_1_347_2","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2023.FINDINGS-ACL.551"},{"key":"e_1_3_1_348_2","unstructured":"Chanwoong Yoon Gangwoo Kim Byeongguk Jeon Sungdong Kim Yohan Jo and Jaewoo Kang. 2024. Ask optimal questions: Aligning large language models with retriever\u2019s preference in conversational search. arXiv:2402.11827. Retrieved from https:\/\/arxiv.org\/abs\/2402.11827"},{"key":"e_1_3_1_349_2","unstructured":"Ori Yoran Tomer Wolfson Ori Ram and Jonathan Berant. 2023. Making retrieval-augmented language models robust to irrelevant context. arXiv:2310.01558. Retrieved from https:\/\/arxiv.org\/abs\/2310.01558"},{"key":"e_1_3_1_350_2","unstructured":"Fangyi Yu Lee Quartey and Frank Schilder. 2022. Legal prompting: Teaching a language model to think like a lawyer. arXiv:2212.01326. 
Retrieved from https:\/\/arxiv.org\/abs\/2212.01326"},{"key":"e_1_3_1_351_2","unstructured":"Fei Yu Hongbo Zhang and Benyou Wang. 2023. Nature language reasoning a survey. arXiv:2303.14725. Retrieved from https:\/\/arxiv.org\/abs\/2303.14725"},{"key":"e_1_3_1_352_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475638"},{"key":"e_1_3_1_353_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2023.3238699"},{"key":"e_1_3_1_354_2","unstructured":"Wenhao Yu Hongming Zhang Xiaoman Pan Kaixin Ma Hongwei Wang and Dong Yu. 2023. Chain-of-note: Enhancing robustness in retrieval-augmented language models. arXiv:2311.09210. Retrieved from https:\/\/arxiv.org\/abs\/2311.09210"},{"key":"e_1_3_1_355_2","unstructured":"Wenhao Yu Zhihan Zhang Zhenwen Liang Meng Jiang and Ashish Sabharwal. 2023. Improving language models via plug-and-play retrieval feedback. arXiv:2305.14002. Retrieved from https:\/\/arxiv.org\/abs\/2305.14002"},{"key":"e_1_3_1_356_2","first-page":"27263","volume-title":"Proceedings of the 35th International Conference on Neural Information Processing Systems (NeurIPS \u201921)","author":"Yuan Weizhe","year":"2021","unstructured":"Weizhe Yuan, Graham Neubig, and Pengfei Liu. 2021. BARTScore: Evaluating generated text as text generation. In Proceedings of the 35th International Conference on Neural Information Processing Systems (NeurIPS \u201921). Marc\u2019Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan (Eds.), 27263\u201327277. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/e4d2b6e6fdeca3e60e0f1a62fee3d9dd-Abstract.html"},{"key":"e_1_3_1_357_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00688"},{"key":"e_1_3_1_358_2","unstructured":"Bohan Zhai Shijia Yang Chenfeng Xu Sheng Shen Kurt Keutzer and Manling Li. 2023. Halle-switch: Controlling object hallucination in large vision language models. arXiv:2310.01779. 
Retrieved from https:\/\/arxiv.org\/abs\/2310.01779"},{"key":"e_1_3_1_359_2","unstructured":"Hanning Zhang Shizhe Diao Yong Lin Yi R. Fung Qing Lian Xingyao Wang Yangyi Chen Heng Ji and Tong Zhang. 2023. R-Tuning: Teaching large language models to refuse unknown questions. arXiv:2311.09677. Retrieved from https:\/\/arxiv.org\/abs\/2311.09677"},{"key":"e_1_3_1_360_2","first-page":"25","volume-title":"Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)","author":"Zhang Hugh","year":"2021","unstructured":"Hugh Zhang, Daniel Duckworth, Daphne Ippolito, and Arvind Neelakantan. 2021. Trading off diversity and quality in natural language generation. In Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval). Association for Computational Linguistics, 25\u201333. Retrieved from https:\/\/aclanthology.org\/2021.humeval-1.3"},{"key":"e_1_3_1_361_2","unstructured":"Jiaxin Zhang Zhuohang Li Kamalika Das Bradley A. Malin and Sricharan Kumar. 2023. SAC\u00b3: Reliable hallucination detection in black-box language models via semantic-aware cross-check consistency. arXiv:2311.01740. Retrieved from https:\/\/arxiv.org\/abs\/2311.01740"},{"key":"e_1_3_1_362_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2018.2883740"},{"key":"e_1_3_1_363_2","series-title":"Proceedings of Machine Learning Research, Vol. 119","first-page":"11328","volume-title":"Proceedings of the 37th International Conference on Machine Learning (ICML \u201920)","author":"Zhang Jingqing","year":"2020","unstructured":"Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter J. Liu. 2020. PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization. In Proceedings of the 37th International Conference on Machine Learning (ICML \u201920), Proceedings of Machine Learning Research, Vol. 119, PMLR, 11328\u201311339. 
Retrieved from http:\/\/proceedings.mlr.press\/v119\/zhang20ae.html"},{"key":"e_1_3_1_364_2","unstructured":"Mingtian Zhang Shawn Lan Peter Hayes and David Barber. 2024. Mafin: Enhancing black-box embeddings with model augmented fine-tuning. arXiv:2402.12177. Retrieved from https:\/\/arxiv.org\/abs\/2402.12177"},{"key":"e_1_3_1_365_2","unstructured":"Muru Zhang Ofir Press William Merrill Alisa Liu and Noah A. Smith. 2023. How language model hallucinations can snowball. arXiv:2305.13534. Retrieved from https:\/\/arxiv.org\/abs\/2305.13534"},{"key":"e_1_3_1_366_2","unstructured":"Ningyu Zhang Yunzhi Yao Bozhong Tian Peng Wang Shumin Deng Mengru Wang Zekun Xi Shengyu Mao Jintian Zhang Yuansheng Ni et\u00a0al. 2024. A comprehensive study of knowledge editing for large language models. arXiv:2401.01286. Retrieved from https:\/\/arxiv.org\/abs\/2401.01286"},{"key":"e_1_3_1_367_2","unstructured":"Shengyu Zhang Linfeng Dong Xiaoya Li Sen Zhang Xiaofei Sun Shuhe Wang Jiwei Li Runyi Hu Tianwei Zhang Fei Wu et al. 2023. Instruction tuning for large language models: A survey. arXiv:2308.10792. Retrieved from https:\/\/arxiv.org\/abs\/2308.10792"},{"key":"e_1_3_1_368_2","unstructured":"Shuo Zhang Liangming Pan Junzhou Zhao and William Yang Wang. 2023. Mitigating language model hallucination with interactive question-knowledge alignment. arXiv:2305.13669. Retrieved from https:\/\/arxiv.org\/abs\/2305.13669"},{"key":"e_1_3_1_369_2","unstructured":"Susan Zhang Stephen Roller Naman Goyal Mikel Artetxe Moya Chen Shuohui Chen Christopher Dewan Mona T. Diab Xian Li Xi Victoria Lin et\u00a0al. 2022. OPT: Open pre-trained transformer language models. arXiv:2205.01068. Retrieved from https:\/\/arxiv.org\/abs\/2205.01068"},{"key":"e_1_3_1_370_2","unstructured":"Tianyi Zhang Faisal Ladhak Esin Durmus Percy Liang Kathleen McKeown and Tatsunori B Hashimoto. 2023. Benchmarking large language models for news summarization. arXiv:2301.13848. 
Retrieved from https:\/\/arxiv.org\/abs\/2301.13848"},{"key":"e_1_3_1_371_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.58"},{"key":"e_1_3_1_372_2","doi-asserted-by":"crossref","unstructured":"Xiaoying Zhang Baolin Peng Ye Tian Jingyan Zhou Lifeng Jin Linfeng Song Haitao Mi and Helen Meng. 2024. Self-alignment for factuality: Mitigating hallucinations in LLMs via self-evaluation. arXiv:2402.09267. Retrieved from https:\/\/arxiv.org\/abs\/2402.09267","DOI":"10.18653\/v1\/2024.acl-long.107"},{"key":"e_1_3_1_373_2","unstructured":"Yue Zhang Yafu Li Leyang Cui Deng Cai Lemao Liu Tingchen Fu Xinting Huang Enbo Zhao Yu Zhang Yulong Chen et\u00a0al. 2023. Siren\u2019s song in the AI ocean: A survey on hallucination in large language models. arXiv:2309.01219. Retrieved from https:\/\/arxiv.org\/abs\/2309.01219"},{"key":"e_1_3_1_374_2","unstructured":"Zhenyu Zhang Runjin Chen Shiwei Liu Zhewei Yao Olatunji Ruwase Beidi Chen Xiaoxia Wu and Zhangyang Wang. 2024. Found in the middle: How language models use long contexts better via plug-and-play positional encoding. arXiv:2403.04797."},{"key":"e_1_3_1_375_2","doi-asserted-by":"crossref","unstructured":"Zihan Zhang Meng Fang and Ling Chen. 2024. RetrievalQA: Assessing adaptive retrieval-augmented generation for short-form open-domain question answering. arXiv:2402.16457. Retrieved from https:\/\/arxiv.org\/abs\/2402.16457","DOI":"10.18653\/v1\/2024.findings-acl.415"},{"key":"e_1_3_1_376_2","unstructured":"Linxi Zhao Yihe Deng Weitong Zhang and Quanquan Gu. 2024. Mitigating object hallucination in large vision-language models via classifier-free guidance. arXiv:2402.08680. Retrieved from https:\/\/arxiv.org\/abs\/2402.08680"},{"key":"e_1_3_1_377_2","unstructured":"Liang Zhao Xiaocheng Feng Xiachong Feng Bing Qin and Ting Liu. 2023. Length extrapolation of transformers: A survey from the perspective of position encoding. arXiv:2312.17044. 
Retrieved from https:\/\/arxiv.org\/abs\/2312.17044"},{"key":"e_1_3_1_378_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.320"},{"key":"e_1_3_1_379_2","doi-asserted-by":"publisher","DOI":"10.1145\/3637870"},{"key":"e_1_3_1_380_2","unstructured":"Wayne Xin Zhao Kun Zhou Junyi Li Tianyi Tang Xiaolei Wang Yupeng Hou Yingqian Min Beichen Zhang Junjie Zhang Zican Dong et al. 2023. A survey of large language models. arXiv:2303.18223. Retrieved from https:\/\/arxiv.org\/abs\/2303.18223"},{"key":"e_1_3_1_381_2","unstructured":"Yukun Zhao Lingyong Yan Weiwei Sun Guoliang Xing Chong Meng Shuaiqiang Wang Zhicong Cheng Zhaochun Ren and Dawei Yin. 2023. Knowing what LLMs do not know: A simple yet effective self-detection method. arXiv:2310.17918. Retrieved from https:\/\/arxiv.org\/abs\/2310.17918"},{"key":"e_1_3_1_382_2","unstructured":"Danna Zheng Mirella Lapata and Jeff Z. Pan. 2024. Large language models as reliable knowledge bases? arXiv:2407.13578. Retrieved from https:\/\/arxiv.org\/abs\/2407.13578"},{"key":"e_1_3_1_383_2","unstructured":"Shen Zheng Jie Huang and Kevin Chen-Chuan Chang. 2023. Why does ChatGPT fall short in answering questions faithfully? arXiv:2304.10513. Retrieved from https:\/\/arxiv.org\/abs\/2304.10513"},{"key":"e_1_3_1_384_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i3.25483"},{"key":"e_1_3_1_385_2","unstructured":"Chunting Zhou Pengfei Liu Puxin Xu Srini Iyer Jiao Sun Yuning Mao Xuezhe Ma Avia Efrat Ping Yu Lili Yu et\u00a0al. 2023. Lima: Less is more for alignment. arXiv:2305.11206. Retrieved from https:\/\/arxiv.org\/abs\/2305.11206"},{"key":"e_1_3_1_386_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-acl.120"},{"key":"e_1_3_1_387_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-emnlp.968"},{"key":"e_1_3_1_388_2","unstructured":"Yiyang Zhou Chenhang Cui Jaehong Yoon Linjun Zhang Zhun Deng Chelsea Finn Mohit Bansal and Huaxiu Yao. 2023. 
Analyzing and mitigating object hallucination in large vision-language models. arXiv:2310.00754. Retrieved from https:\/\/arxiv.org\/abs\/2310.00754"},{"key":"e_1_3_1_389_2","unstructured":"Deyao Zhu Jun Chen Xiaoqian Shen Xiang Li and Mohamed Elhoseiny. 2023. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv:2304.10592. Retrieved from https:\/\/arxiv.org\/abs\/2304.10592"},{"key":"e_1_3_1_390_2","unstructured":"Wenhao Zhu Hongyi Liu Qingxiu Dong Jingjing Xu Lingpeng Kong Jiajun Chen Lei Li and Shujian Huang. 2023. Multilingual machine translation with large language models: Empirical results and analysis. arXiv:2304.04675. Retrieved from https:\/\/arxiv.org\/abs\/2304.04675"},{"key":"e_1_3_1_391_2","unstructured":"Yutao Zhu Huaying Yuan Shuting Wang Jiongnan Liu Wenhan Liu Chenlong Deng Zhicheng Dou and Ji-Rong Wen. 2023. Large language models for information retrieval: A survey. arXiv:2308.07107. Retrieved from https:\/\/arxiv.org\/abs\/2308.07107"},{"key":"e_1_3_1_392_2","unstructured":"Yongshuo Zong Tingyang Yu Bingchen Zhao Ruchika Chavhan and Timothy Hospedales. 2023. Fool your (vision and) language model with embarrassingly simple permutations. arXiv:2310.01651. 
Retrieved from https:\/\/arxiv.org\/abs\/2310.01651"}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3703155","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3703155","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:10:18Z","timestamp":1750295418000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3703155"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,24]]},"references-count":391,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,3,31]]}},"alternative-id":["10.1145\/3703155"],"URL":"https:\/\/doi.org\/10.1145\/3703155","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"value":"1046-8188","type":"print"},{"value":"1558-2868","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,24]]},"assertion":[{"value":"2023-12-08","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-09-24","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-01-24","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}