{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T17:27:45Z","timestamp":1776101265446,"version":"3.50.1"},"reference-count":217,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D program of China","doi-asserted-by":"crossref","award":["2022YFE0204900"],"award-info":[{"award-number":["2022YFE0204900"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Key NSFC Project","award":["62336006"],"award-info":[{"award-number":["62336006"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2026,1,31]]},"abstract":"<jats:p>\n            This survey addresses the crucial issue of factuality in Large Language Models (LLMs). As LLMs find applications across diverse domains, the reliability and accuracy of their outputs become vital. We define the \u201cfactuality issue\u201d as the probability of LLMs to produce content inconsistent with established facts. We first delve into the implications of these inaccuracies. Subsequently, we analyze the mechanisms through which LLMs store and process facts, seeking the primary causes of factual errors. Our discussion then transitions to methodologies for evaluating LLM factuality, emphasizing key metrics, benchmarks, and studies. We further explore strategies for enhancing LLM factuality. Our survey offers a structured guide for researchers aiming to fortify the factual reliability of LLMs. We consistently maintain and update the related open-source materials at\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/github.com\/wangcunxiang\/LLM-Factuality-Survey\">https:\/\/github.com\/wangcunxiang\/LLM-Factuality-Survey<\/jats:ext-link>\n            .\n          <\/jats:p>","DOI":"10.1145\/3742420","type":"journal-article","created":{"date-parts":[[2025,6,2]],"date-time":"2025-06-02T07:08:47Z","timestamp":1748848127000},"page":"1-37","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["Survey on Factuality in Large Language Models"],"prefix":"10.1145","volume":"58","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3023-8082","authenticated-orcid":false,"given":"Cunxiang","family":"Wang","sequence":"first","affiliation":[{"name":"Westlake University","place":["Hangzhou, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9726-3397","authenticated-orcid":false,"given":"Xiaoze","family":"Liu","sequence":"additional","affiliation":[{"name":"Purdue University","place":["West Lafayette, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-3612-5805","authenticated-orcid":false,"given":"Yuanhao","family":"Yue","sequence":"additional","affiliation":[{"name":"Fudan University","place":["Shanghai, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8805-8789","authenticated-orcid":false,"given":"Qipeng","family":"Guo","sequence":"additional","affiliation":[{"name":"Shanghai AI Laboratory","place":["Shanghai, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-2165-9236","authenticated-orcid":false,"given":"Xiangkun","family":"Hu","sequence":"additional","affiliation":[{"name":"Shanghai Innovation Institute","place":["Shanghai, 
China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-2700-4513","authenticated-orcid":false,"given":"Xiangru","family":"Tang","sequence":"additional","affiliation":[{"name":"Yale University","place":["New Haven, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-6234-4409","authenticated-orcid":false,"given":"Tianhang","family":"Zhang","sequence":"additional","affiliation":[{"name":"Amazon.com Inc","place":["Shanghai, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1140-6084","authenticated-orcid":false,"given":"Cheng","family":"Jiayang","sequence":"additional","affiliation":[{"name":"HKUST","place":["Hong Kong, Hong Kong"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9458-696X","authenticated-orcid":false,"given":"Yunzhi","family":"Yao","sequence":"additional","affiliation":[{"name":"Zhejiang University","place":["Hangzhou, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6075-4224","authenticated-orcid":false,"given":"Xuming","family":"Hu","sequence":"additional","affiliation":[{"name":"HKUST (GZ)","place":["Guangzhou, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-5232-9130","authenticated-orcid":false,"given":"Zehan","family":"Qi","sequence":"additional","affiliation":[{"name":"Tsinghua University","place":["Beijing, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1143-064X","authenticated-orcid":false,"given":"Wenyang","family":"Gao","sequence":"additional","affiliation":[{"name":"Westlake University","place":["Hangzhou, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-9969-8259","authenticated-orcid":false,"given":"Yidong","family":"Wang","sequence":"additional","affiliation":[{"name":"Westlake University","place":["Hangzhou, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0667-7349","authenticated-orcid":false,"given":"Linyi","family":"Yang","sequence":"additional","affiliation":[{"name":"Westlake University","place":["Hangzhou, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4833-0880","authenticated-orcid":false,"given":"Jindong","family":"Wang","sequence":"additional","affiliation":[{"name":"William & Mary","place":["Williamsburg, USA"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8608-8482","authenticated-orcid":false,"given":"Xing","family":"Xie","sequence":"additional","affiliation":[{"name":"Microsoft","place":["Seattle, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-2007-2019","authenticated-orcid":false,"given":"Zheng","family":"Zhang","sequence":"additional","affiliation":[{"name":"Amazon.com Inc","place":["Shanghai, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5214-2268","authenticated-orcid":false,"given":"Yue","family":"Zhang","sequence":"additional","affiliation":[{"name":"Westlake University","place":["Hangzhou, China"]}]}],"member":"320","published-online":{"date-parts":[[2025,9,2]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"3554","volume-title":"Proceedings of the NAACL","author":"Agarwal Oshin","year":"2021","unstructured":"Oshin Agarwal, Heming Ge, Siamak Shakeri, and Rami Al-Rfou. 2021. Knowledge graph based synthetic corpus generation for knowledge-enhanced language model pre-training. In Proceedings of the NAACL. Online, 3554\u20133565."},{"key":"e_1_3_2_3_2","unstructured":"Badr AlKhamissi Millicent Li Asli Celikyilmaz Mona Diab and Marjan Ghazvininejad. 2022. A Review on Language Models as Knowledge Bases. arXiv:2204.06031. Retrieved from https:\/\/arxiv.org\/abs\/2204.06031"},{"key":"e_1_3_2_4_2","doi-asserted-by":"crossref","unstructured":"Zeyuan Allen-Zhu and Yuanzhi Li. 2024. 
Physics of language models: Part 3.1, knowledge storage and extraction. In Proceedings of the 41st International Conference on Machine Learning (ICML 2024). PMLR 1067\u20131077.","DOI":"10.2139\/ssrn.5250633"},{"key":"e_1_3_2_5_2","unstructured":"Akari Asai Zeqiu Wu Yizhong Wang Avirup Sil and Hannaneh Hajishirzi. 2024. Self-RAG: Learning to retrieve generate and critique through self-reflection. In ICLR 2024 (Oral). OpenReview ID hSyW5go0v8."},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","unstructured":"Amos Azaria and Tom Mitchell. 2023. The internal state of an LLM knows when it\u2019s lying. Findings of ACL: EMNLP 2023. Singapore 967\u2013976. DOI:10.18653\/v1\/2023.findings-emnlp.68","DOI":"10.18653\/v1\/2023.findings-emnlp.68"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","unstructured":"Jinheon Baek Alham Fikri Aji and Amir Saffari. 2023. Knowledge-augmented language model prompting for zero-shot knowledge graph question answering. In Proc. 1st Workshop on Natural Language Reasoning and Structured Explanations (NLRSE @ ACL 2023). Toronto 78\u2013106. DOI:10.18653\/v1\/2023.nlrse-1.7","DOI":"10.18653\/v1\/2023.nlrse-1.7"},{"key":"e_1_3_2_8_2","doi-asserted-by":"crossref","first-page":"610","DOI":"10.1145\/3442188.3445922","volume-title":"Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT\u201921)","author":"Bender Emily M.","year":"2021","unstructured":"Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the dangers of stochastic parrots: Can language models be too big?. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT\u201921). Association for Computing Machinery, New York, NY, USA, 610\u2013623. DOI:10.1145\/3442188.3445922"},{"key":"e_1_3_2_9_2","unstructured":"Lukas Berglund Meg Tong Max Kaufmann Mikita Balesni Asa Cooper Stickland Tomasz Korbak and Owain Evans. 2024. The reversal curse: LLMs trained on \u201cA is B\u201d fail to learn \u201cB is A\u201d. In ICLR 2024."},{"key":"e_1_3_2_10_2","unstructured":"Sebastian Borgeaud et\u00a0al. 2022. Improving language models by retrieving from trillions of tokens. Proceedings of ICML 2022. PMLR 2206\u20132240."},{"key":"e_1_3_2_11_2","unstructured":"S\u00e9bastien Bubeck Varun Chandrasekaran Ronen Eldan Johannes Gehrke Eric Horvitz Ece Kamar Peter Lee Yin Tat Lee Yuanzhi Li Scott Lundberg et al. 2023. Sparks of Artificial General Intelligence: Early Experiments with GPT-4. arXiv:2303.12712. Retrieved from https:\/\/arxiv.org\/abs\/2303.12712"},{"key":"e_1_3_2_12_2","doi-asserted-by":"crossref","unstructured":"Yong Cao Li Zhou Seolhwa Lee Laura Cabello Min Chen and Daniel Hershcovich. 2023. Assessing cross-cultural alignment between ChatGPT and human societies: An empirical study. arXiv preprint arXiv:2303.17466.","DOI":"10.18653\/v1\/2023.c3nlp-1.7"},{"key":"e_1_3_2_13_2","unstructured":"Yupeng Chang Xu Wang Jindong Wang Yuan Wu Kaijie Zhu Hao Chen Linyi Yang Xiaoyuan Yi Cunxiang Wang Yidong Wang et al. 2023. A survey on evaluation of large language models. arXiv:2307.03109. Retrieved from https:\/\/arxiv.org\/abs\/2307.03109"},{"key":"e_1_3_2_14_2","unstructured":"Harrison Chase. 2022. LangChain. Retrieved September 9 2023 from https:\/\/github.com\/langchain-ai\/langchain"},{"key":"e_1_3_2_15_2","unstructured":"Anthony Chen Panupong Pasupat Sameer Singh Hongrae Lee and Kelvin Guu. 2023. PURR: Efficiently editing language model hallucinations by denoising language model corruptions. arXiv:2305.14908. 
Retrieved from https:\/\/arxiv.org\/abs\/2305.14908"},{"key":"e_1_3_2_16_2","doi-asserted-by":"crossref","unstructured":"Jiawei Chen Hongyu Lin Xianpei Han and Le Sun. 2024. Benchmarking large language models in retrieval-augmented generation. AAAI 2024 38 16 (2024) 17754\u201317762.","DOI":"10.1609\/aaai.v38i16.29728"},{"key":"e_1_3_2_17_2","doi-asserted-by":"crossref","unstructured":"Liang Chen et\u00a0al. 2023. Beyond factuality: A comprehensive evaluation of large language models as knowledge generators. EMNLP 2023. Singapore 6325\u20136341.","DOI":"10.18653\/v1\/2023.emnlp-main.390"},{"key":"e_1_3_2_18_2","first-page":"7870","volume-title":"Proceedings of the EMNLP","author":"Chen Sanyuan","year":"2020","unstructured":"Sanyuan Chen, Yutai Hou, Yiming Cui, Wanxiang Che, Ting Liu, and Xiangzhan Yu. 2020. Recall and learn: Fine-tuning deep pretrained language models with less forgetting. In Proceedings of the EMNLP. 7870\u20137881."},{"key":"e_1_3_2_19_2","doi-asserted-by":"crossref","unstructured":"Yuheng Chen Pengfei Cao Yubo Chen Kang Liu and Jun Zhao. 2024. Journey to the center of the knowledge neurons: discoveries of language-independent knowledge neurons and degenerate knowledge neurons. AAAI 2024 38 16 (2024) 17817\u201317825.","DOI":"10.1609\/aaai.v38i16.29735"},{"key":"e_1_3_2_20_2","unstructured":"I-Chun Chern et\u00a0al. 2024. FacTool: Factuality detection in generative AI - A tool-augmented framework for multi-task and multi-domain scenarios. In ICLR 2024 (Spotlight). OpenReview ID fN7AnX3ZEM."},{"key":"e_1_3_2_21_2","unstructured":"Wei-Lin Chiang Zhuohan Li Zi Lin Ying Sheng Zhanghao Wu Hao Zhang Lianmin Zheng Siyuan Zhuang Yonghao Zhuang Joseph E. Gonzalez et al. 2023. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality. Retrieved from https:\/\/lmsys.org\/blog\/2023-03-30-vicuna\/"},{"key":"e_1_3_2_22_2","unstructured":"Wei-Lin Chiang et\u00a0al. 2024. Chatbot Arena: An open platform for evaluating LLMs by human preference. ICML 2024. PMLR 235."},{"key":"e_1_3_2_23_2","unstructured":"Yung-Sung Chuang et\u00a0al. 2024. DoLa: Decoding by contrasting layers improves factuality in large language models. In ICLR 2024. OpenReview ID RJR6nqC59y."},{"key":"e_1_3_2_24_2","unstructured":"Hyung Won Chung Le Hou Shayne Longpre Barret Zoph Yi Tay William Fedus Yunxuan Li Xuezhi Wang Mostafa Dehghani Siddhartha Brahma Albert Webson Shixiang Shane Gu Zhuyun Dai Mirac Suzgun Xinyun Chen Aakanksha Chowdhery Alex Castro-Ros Marie Pellat Kevin Robinson Dasha Valter Sharan Narang Gaurav Mishra Adams Yu Vincent Zhao Yanping Huang Andrew Dai Hongkun Yu Slav Petrov Ed H. Chi Jeff Dean Jacob Devlin Adam Roberts Denny Zhou Quoc V. Le and Jason Wei. 2024. Scaling instruction-finetuned language models. Journal of Machine Learning Research 25 (2024) 1\u201353."},{"key":"e_1_3_2_25_2","unstructured":"Karl Cobbe Vineet Kosaraju Mohammad Bavarian Mark Chen Heewoo Jun Lukasz Kaiser Matthias Plappert Jerry Tworek Jacob Hilton Reiichiro Nakano et\u00a0al. 2021. Training verifiers to solve math word problems. arXiv:2110.14168. Retrieved from https:\/\/arxiv.org\/abs\/2110.14168"},{"issue":"1","key":"e_1_3_2_26_2","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1177\/001316446002000104","article-title":"A coefficient of agreement for nominal scales","volume":"20","author":"Cohen Jacob","year":"1960","unstructured":"Jacob Cohen. 1960. A coefficient of agreement for nominal scales. 
Educational and Psychological Measurement 20, 1 (1960), 37\u201346.","journal-title":"Educational and Psychological Measurement"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","unstructured":"Roi Cohen May Hamri Mor Geva and Amir Globerson. 2023. LM vs. LM: detecting factual errors via cross examination. EMNLP 2023. Singapore 12621\u201312640. DOI:10.18653\/v1\/2023.emnlp-main.778","DOI":"10.18653\/v1\/2023.emnlp-main.778"},{"key":"e_1_3_2_28_2","unstructured":"Jiaxi Cui Zongjian Li Yang Yan Bohua Chen and Li Yuan. 2023. ChatLaw: Open-Source Legal Large Language Model with Integrated External Knowledge Bases. arXiv:2306.16092. Retrieved from https:\/\/arxiv.org\/abs\/2306.16092"},{"key":"e_1_3_2_29_2","unstructured":"Shawn Curran Sam Lansley and Oliver Bethell. 2023. Hallucination is the Last Thing you Need. arXiv:2306.11520. Retrieved from https:\/\/arxiv.org\/abs\/2306.11520"},{"key":"e_1_3_2_30_2","first-page":"8493","volume-title":"Proceedings of the ACL","author":"Dai Damai","year":"2022","unstructured":"Damai Dai, Li Dong, Yaru Hao, Zhifang Sui, Baobao Chang, and Furu Wei. 2022. Knowledge neurons in pretrained transformers. In Proceedings of the ACL. Dublin, Ireland, 8493\u20138502."},{"key":"e_1_3_2_31_2","series-title":"KDD\u201924","first-page":"6437","volume-title":"Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","volume":"199","author":"Dai Sunhao","year":"2024","unstructured":"Sunhao Dai, Chen Xu, Shicheng Xu, Liang Pang, Zhenhua Dong, and Jun Xu. 2024. Bias and unfairness in information retrieval systems: New challenges in the LLM era. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining(KDD\u201924, Vol. 199). ACM, 6437\u20136447. DOI:10.1145\/3637528.3671458"},{"key":"e_1_3_2_32_2","first-page":"6491","volume-title":"Proceedings of the EMNLP","author":"Cao Nicola De","year":"2021","unstructured":"Nicola De Cao, Wilker Aziz, and Ivan Titov. 2021. Editing factual knowledge in language models. In Proceedings of the EMNLP. Online and Punta Cana, Dominican Republic, 6491\u20136506."},{"key":"e_1_3_2_33_2","volume-title":"Proceedings of the ICLR","author":"Jong Michiel de","year":"2022","unstructured":"Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Fei Sha, and William W. Cohen. 2022. Mention memory: Incorporating textual knowledge into Transformers through entity mention attention. In Proceedings of the ICLR."},{"key":"e_1_3_2_34_2","first-page":"4171","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota, 4171\u20134186. DOI:10.18653\/v1\/N19-1423"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","unstructured":"Shehzaad Dhuliawala Mojtaba Komeili Jing Xu Roberta Raileanu Xian Li Asli Celikyilmaz and Jason Weston. 2024. Chain-of-verification reduces hallucination in large language models. Findings of the Association for Computational Linguistics: ACL 2024. 3563\u20133578. Bangkok Thailand. 
DOI:10.18653\/v1\/2024.findings-acl.212","DOI":"10.18653\/v1\/2024.findings-acl.212"},{"key":"e_1_3_2_36_2","first-page":"5937","volume-title":"Findings of EMNLP 2022","author":"Dong Qingxiu","year":"2022","unstructured":"Qingxiu Dong, Damai Dai, Yifan Song, Jingjing Xu, Zhifang Sui, and Lei Li. 2022. Calibrating factual knowledge in pretrained language models. In Findings of EMNLP 2022. Abu Dhabi, United Arab Emirates, 5937\u20135947."},{"key":"e_1_3_2_37_2","doi-asserted-by":"crossref","unstructured":"Qingxiu Dong Lei Li Damai Dai Ce Zheng Jingyuan Ma et\u00a0al. 2024. A survey on in-context learning. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Miami FL. 1107\u20131128.","DOI":"10.18653\/v1\/2024.emnlp-main.64"},{"key":"e_1_3_2_38_2","unstructured":"Yilun Du Shuang Li Antonio Torralba Joshua B. Tenenbaum and Igor Mordatch. 2024. Improving factuality and reasoning in language models through multiagent debate. In Proceedings of the 12th International Conference on Learning Representations (ICLR 2024). Poster paper."},{"key":"e_1_3_2_39_2","unstructured":"Yann Dubois Bal\u00e1zs Galambosi Percy Liang and Tatsunori B. Hashimoto. 2024. Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators. COLM 2024. https:\/\/openreview.net\/forum?id=CybBmzWBX0#discussion"},{"key":"e_1_3_2_40_2","first-page":"11","volume-title":"Proceedings of the ICML (ICML\u201920)","author":"Dutta Sanghamitra","year":"2020","unstructured":"Sanghamitra Dutta, Dennis Wei, Hazar Yueksel, Pin-Yu Chen, Sijia Liu, and Kush R. Varshney. 2020. Is there a tradeoff between fairness and accuracy? A perspective using mismatched hypothesis testing. In Proceedings of the ICML (ICML\u201920). Article 263, 11 pages."},{"key":"e_1_3_2_41_2","doi-asserted-by":"crossref","unstructured":"Shahul Es Jithin James Luis Espinosa-Anke and Steven Schockaert. 2023. RAGAS: Automated Evaluation of Retrieval Augmented Generation. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations Nikolaos Aletras and Orphee De Clercq (Eds.). Association for Computational Linguistics 150\u2013158. https:\/\/aclanthology.org\/2024.eacl-demo.16\/","DOI":"10.18653\/v1\/2024.eacl-demo.16"},{"key":"e_1_3_2_42_2","unstructured":"Wayne Xin Zhao et al. 2023. A survey of large language models. arXiv:2303.18223. Retrieved from https:\/\/arxiv.org\/abs\/2303.18223"},{"issue":"11","key":"e_1_3_2_43_2","doi-asserted-by":"crossref","first-page":"793","DOI":"10.1119\/1.1937609","article-title":"Transmission of information: A statistical theory of communications","volume":"29","author":"Fano Robert M.","year":"1961","unstructured":"Robert M. Fano and David Hawkins. 1961. Transmission of information: A statistical theory of communications. American Journal of Physics 29, 11 (1961), 793\u2013794.","journal-title":"American Journal of Physics"},{"key":"e_1_3_2_44_2","first-page":"9126","volume-title":"Proceedings of the ACL","author":"Felkner Virginia","year":"2023","unstructured":"Virginia Felkner, Ho-Chun Herbert Chang, Eugene Jang, and Jonathan May. 2023. WinoQueer: A community-in-the-loop benchmark for Anti-LGBTQ+ bias in large language models. In Proceedings of the ACL. 9126\u20139140."},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","unstructured":"Jinlan Fu See-Kiong Ng Zhengbao Jiang and Pengfei Liu. 2024. GPTScore: Evaluate as you desire. Proceedings of NAACL-HLT 2024 (Vol. 1 Long Papers). Mexico City 6556\u20136576. 
DOI:10.18653\/v1\/2024.naacl-long.365","DOI":"10.18653\/v1\/2024.naacl-long.365"},{"key":"e_1_3_2_46_2","unstructured":"Xue-Yong Fu Md Tahmid Rahman Laskar Cheng Chen and Shashi Bhushan TN. 2023. Are large language models reliable judges? A study on the factuality evaluation capabilities of LLMs. Proceedings of the Third Workshop on Natural Language Generation Evaluation and Metrics (GEM 2023). Singapore 310\u2013316."},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","unstructured":"Thibault F\u00e9vry Livio Baldini Soares Nicholas FitzGerald Eunsol Choi and Tom Kwiatkowski. 2020. Entities as Experts: Sparse memory access with entity supervision. EMNLP 2020. 4937\u20134951. DOI:10.18653\/v1\/2020.emnlp-main.400","DOI":"10.18653\/v1\/2020.emnlp-main.400"},{"key":"e_1_3_2_48_2","doi-asserted-by":"crossref","unstructured":"Isabel O. Gallegos Ryan A. Rossi Joe Barrow Md Mehrab Tanjim Sungchul Kim et\u00a0al. 2024. Bias and fairness in large language models: A survey. Computational Linguistics 50 3 (2024).","DOI":"10.1162\/coli_a_00524"},{"key":"e_1_3_2_49_2","first-page":"16477","volume-title":"Proceedings of the ACL","author":"Gao Luyu","year":"2023","unstructured":"Luyu Gao, Zhuyun Dai, Panupong Pasupat, Anthony Chen, Arun Tejasvi Chaganty, Yicheng Fan, Vincent Zhao, Ni Lao, Hongrae Lee, Da-Cheng Juan, et\u00a0al. 2023. RARR: Researching and revising what language models say, using language models. In Proceedings of the ACL. 16477\u201316508."},{"key":"e_1_3_2_50_2","unstructured":"Tianyu Gao Howard Yen Jiatong Yu and Danqi Chen. 2023. Enabling large language models to generate text with citations. EMNLP 2023. Singapore 6465\u20136488."},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","unstructured":"Mor Geva Jasmijn Bastings Katja Filippova and Amir Globerson. 2023. Dissecting recall of factual associations in auto-regressive language models. EMNLP 2023. Singapore 12216\u201312235. DOI:10.18653\/v1\/2023.emnlp-main.751.","DOI":"10.18653\/v1\/2023.emnlp-main.751"},{"key":"e_1_3_2_52_2","first-page":"30","volume-title":"Proceedings of the EMNLP","author":"Geva Mor","year":"2022","unstructured":"Mor Geva, Avi Caciularu, Kevin Wang, and Yoav Goldberg. 2022. Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space. In Proceedings of the EMNLP. Abu Dhabi, United Arab Emirates, 30\u201345."},{"key":"e_1_3_2_53_2","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1162\/tacl_a_00370","article-title":"Did Aristotle use a laptop? A question answering benchmark with implicit reasoning strategies","volume":"9","author":"Geva Mor","year":"2021","unstructured":"Mor Geva, Daniel Khashabi, Elad Segal, Tushar Khot, Dan Roth, and Jonathan Berant. 2021. Did Aristotle use a laptop? A question answering benchmark with implicit reasoning strategies. Transactions of the Association for Computational Linguistics 9 (2021), 346\u2013361.","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"e_1_3_2_54_2","first-page":"5484","volume-title":"Proceedings of the EMNLP","author":"Geva Mor","year":"2021","unstructured":"Mor Geva, Roei Schuster, Jonathan Berant, and Omer Levy. 2021. Transformer feed-forward layers are key-value memories. In Proceedings of the EMNLP. Online and Punta Cana, Dominican Republic, 5484\u20135495."},{"key":"e_1_3_2_55_2","unstructured":"Ian J. Goodfellow Mehdi Mirza Da Xiao Aaron Courville and Yoshua Bengio. 2015. 
An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks. arXiv:1312.6211. Retrieved from https:\/\/arxiv.org\/abs\/1312.6211"},{"key":"e_1_3_2_56_2","article-title":"Bard","year":"2023","unstructured":"Google. 2023. Bard. bard.google.com (2023).","journal-title":"bard.google.com"},{"key":"e_1_3_2_57_2","unstructured":"Zhibin Gou Zhihong Shao Yeyun Gong Yelong Shen Yujiu Yang Nan Duan and Weizhu Chen. 2024. CRITIC: Large language models can self-correct with tool-interactive critiquing. In ICLR 2024 (Poster)."},{"key":"e_1_3_2_58_2","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics","author":"Gupta Ashim","year":"2021","unstructured":"Ashim Gupta and Vivek Srikumar. 2021. X-Fact: A new benchmark dataset for multilingual fact checking. In Proceedings of the Annual Meeting of the Association for Computational Linguistics. Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:235457983"},{"key":"e_1_3_2_59_2","doi-asserted-by":"crossref","unstructured":"Vivek Gupta Maitrey Mehta Pegah Nokhiz and Vivek Srikumar. 2020. INFOTABS: Inference on tables as semi-structured data. ACL 2020. 2309\u20132324. Online.","DOI":"10.18653\/v1\/2020.acl-main.210"},{"key":"e_1_3_2_60_2","unstructured":"Hangfeng He Hongming Zhang and Dan Roth. 2022. Rethinking with Retrieval: Faithful Large Language Model Inference. arXiv:2301.00303. Retrieved from https:\/\/arxiv.org\/abs\/2301.00303"},{"key":"e_1_3_2_61_2","article-title":"Measuring massive multitask language understanding","author":"Hendrycks Dan","year":"2021","unstructured":"Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. 2021. Measuring massive multitask language understanding. ICLR (2021).","journal-title":"ICLR"},{"key":"e_1_3_2_62_2","first-page":"5352","volume-title":"Proceedings of the ACL","author":"Hossain Tamanna","year":"2023","unstructured":"Tamanna Hossain, Sunipa Dev, and Sameer Singh. 2023. MISGENDERED: Limits of large language models in understanding pronouns. In Proceedings of the ACL. Toronto, Canada, 5352\u20135367."},{"key":"e_1_3_2_63_2","unstructured":"Neil Houlsby Andrei Giurgiu Stanislaw Jastrzebski Bruna Morrone Quentin de Laroussilhe et\u00a0al. 2019. Parameter-efficient transfer learning for NLP. Proceedings of the 36th International Conference on Machine Learning (ICML 2019). PMLR 2790\u20132799."},{"key":"e_1_3_2_64_2","unstructured":"Xuming Hu Junzhe Chen Xiaochuan Li Yufei Guo Lijie Wen Philip S. Yu and Zhijiang Guo. 2024. Towards understanding factual knowledge of large language models. ICLR 2024."},{"key":"e_1_3_2_65_2","doi-asserted-by":"crossref","unstructured":"Jie Huang and Kevin Chen-Chuan Chang. 2023. Towards reasoning in large language models: A survey. Findings of ACL 2023. Toronto Canada 8298\u20138319.","DOI":"10.18653\/v1\/2023.findings-acl.67"},{"key":"e_1_3_2_66_2","unstructured":"Quzhe Huang Mingxu Tao Zhenwei An Chen Zhang Cong Jiang Zhibin Chen Zirui Wu and Yansong Feng. 2023. Lawyer LLaMA technical report. arXiv:2305.15062. Retrieved from https:\/\/arxiv.org\/abs\/2305.15062"},{"key":"e_1_3_2_67_2","unstructured":"Yuzhen Huang Yuzhuo Bai Zhihao Zhu Junlei Zhang Jinghan Zhang et\u00a0al. 2023. C-Eval: A multi-level multi-discipline Chinese evaluation suite for foundation models. NeurIPS 2023. Datasets & Benchmarks Track Article 229.
"},{"key":"e_1_3_2_68_2","volume-title":"Proceedings of the ICLR","author":"Huang Zeyu","year":"2023","unstructured":"Zeyu Huang, Yikang Shen, Xiaofeng Zhang, Jie Zhou, Wenge Rong, and Zhang Xiong. 2023. Transformer-Patcher: One mistake worth one neuron. In Proceedings of the ICLR."},{"key":"e_1_3_2_69_2","doi-asserted-by":"crossref","unstructured":"Daphne Ippolito Florian Tram\u00e8r Milad Nasr Chiyuan Zhang Matthew Jagielski Katherine Lee Christopher Choquette-Choo and Nicholas Carlini. 2023. Preventing generation of verbatim memorization in language models gives a false sense of privacy. Proceedings of the 16th International Natural Language Generation Conference (INLG 2023). Prague 28\u201353.","DOI":"10.18653\/v1\/2023.inlg-main.3"},{"key":"e_1_3_2_70_2","unstructured":"Gautier Izacard Mathilde Caron Lucas Hosseini Sebastian Riedel Piotr Bojanowski Armand Joulin and Edouard Grave. 2022. Unsupervised dense information retrieval with contrastive learning. Transactions on Machine Learning Research (accepted Aug 2022)."},{"key":"e_1_3_2_71_2","volume-title":"Proceedings of the ICLR","author":"Izacard Gautier","year":"2021","unstructured":"Gautier Izacard and Edouard Grave. 2021. Distilling knowledge from reader to retriever for question answering. In Proceedings of the ICLR. Vienna, Austria."},{"key":"e_1_3_2_72_2","first-page":"874","volume-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume","author":"Izacard Gautier","year":"2021","unstructured":"Gautier Izacard and Edouard Grave. 2021. Leveraging passage retrieval with generative models for open domain question answering. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. Online, 874\u2013880. DOI:10.18653\/v1\/2021.eacl-main.74"},{"key":"e_1_3_2_73_2","unstructured":"Gautier Izacard Patrick Lewis Maria Lomeli Lucas Hosseini Fabio Petroni Timo Schick Jane Dwivedi-Yu Armand Joulin Sebastian Riedel and Edouard Grave. 2022. Atlas: Few-shot Learning with Retrieval Augmented Language Models. arXiv:2208.03299. Retrieved from https:\/\/arxiv.org\/abs\/2208.03299"},{"issue":"12","key":"e_1_3_2_74_2","first-page":"38","article-title":"Survey of hallucination in natural language generation","volume":"55","author":"Ji Ziwei","year":"2023","unstructured":"Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of hallucination in natural language generation. ACM Computing Surveys 55, 12, Article 248 (2023), 38 pages.","journal-title":"ACM Computing Surveys"},{"key":"e_1_3_2_75_2","first-page":"1057","volume-title":"Companion Proceedings of the the Web Conference 2018 (WWW\u201918)","author":"Jia Zhen","year":"2018","unstructured":"Zhen Jia, Abdalghani Abujabal, Rishiraj Saha Roy, Jannik Str\u00f6tgen, and Gerhard Weikum. 2018. TempQuestions: A benchmark for temporal question answering. In Companion Proceedings of the the Web Conference 2018 (WWW\u201918). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 1057\u20131062. DOI:10.1145\/3184558.3191536"},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","unstructured":"J. Jiang K. Zhou Z. Dong K. Ye W. X. Zhao and J.-R. Wen. 2023. StructGPT: A general framework for large language model to reason over structured data. In Proc. EMNLP 2023. 
Association for Computational Linguistics 9237\u20139251. 10.18653\/v1\/2023.emnlp-main.574.","DOI":"10.18653\/v1\/2023.emnlp-main.574"},{"key":"e_1_3_2_77_2","doi-asserted-by":"publisher","unstructured":"Z. Jiang A. Anastasopoulos J. Araki H. Ding and G. Neubig. 2020. X-FACTR: Multilingual factual knowledge retrieval from pre-trained language models. In Proc. EMNLP 2020. 5943\u20135959. 10.18653\/v1\/2020.emnlp-main.479.","DOI":"10.18653\/v1\/2020.emnlp-main.479"},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","unstructured":"Zhengbao Jiang Frank F. Xu Luyu Gao Zhiqing Sun Qian Liu Jane Dwivedi-Yu Yiming Yang Jamie Callan and Graham Neubig. 2023. Active Retrieval Augmented Generation. In Proc. EMNLP 2023 (Main). 7969\u20137992. 10.18653\/v1\/2023.emnlp-main.495.","DOI":"10.18653\/v1\/2023.emnlp-main.495"},{"key":"e_1_3_2_79_2","volume-title":"Proceedings of the ACL","author":"Joshi Mandar","year":"2017","unstructured":"Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. 2017. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. In Proceedings of the ACL. Vancouver, Canada."},{"key":"e_1_3_2_80_2","unstructured":"Saurav Kadavath Tom Conerly Amanda Askell Tom Henighan and Dawn Drain. 2022. Language Models (Mostly) Know What They Know. arXiv:2207.05221 [v4]."},{"key":"e_1_3_2_81_2","first-page":"15696","volume-title":"Proceedings of the ICML","author":"Kandpal Nikhil","year":"2023","unstructured":"Nikhil Kandpal, Haikang Deng, Adam Roberts, Eric Wallace, and Colin Raffel. 2023. Large language models struggle to learn long-tail knowledge. In Proceedings of the ICML. PMLR, 15696\u201315707."},{"key":"e_1_3_2_82_2","first-page":"5144","volume-title":"Proceedings of the NAACL","author":"Kang Minki","year":"2022","unstructured":"Minki Kang, Jinheon Baek, and Sung Ju Hwang. 2022. KALA: Knowledge-augmented language model adaptation. In Proceedings of the NAACL. Seattle, United States, 5144\u20135167."},{"key":"e_1_3_2_83_2","unstructured":"Jungo Kasai Keisuke Sakaguchi Yoichi Takahashi Ronan Le Bras Akari Asai Xinyan Velocity Yu Dragomir Radev Noah A. Smith Yejin Choi and Kentaro Inui. 2023. REALTIME QA: what\u2019s the answer right now? In Proceedings of the 37th International Conference on Neural Information Processing Systems (New Orleans LA USA) (NIPS\u201923). Curran Associates Inc. Red Hook NY USA Article 2130 19 pages."},{"key":"e_1_3_2_84_2","doi-asserted-by":"publisher","unstructured":"N. Kassner P. Dufter and H. Sch\u00fctze. 2021. Multilingual LAMA: Investigating knowledge in multilingual pre-trained language models. In Proc. EACL 2021. 3250\u20133258. 10.18653\/v1\/2021.eacl-main.284.","DOI":"10.18653\/v1\/2021.eacl-main.284"},{"key":"e_1_3_2_85_2","unstructured":"Tushar Khot Harsh Trivedi Matthew Finlayson Yao Fu Kyle Richardson Peter Clark and Ashish Sabharwal. 2023. Decomposed Prompting: A Modular Approach for Solving Complex Tasks. ICLR 2023 (camera-ready). arXiv:2210.02406."},{"key":"e_1_3_2_86_2","unstructured":"S. Kotha J. M. Springer and A. Raghunathan. 2024. Understanding catastrophic forgetting in language models via implicit inference. ICLR 2024 (poster). arXiv:2309.10105.
"},{"key":"e_1_3_2_87_2","doi-asserted-by":"crossref","DOI":"10.1162\/tacl_a_00276","article-title":"Natural questions: A benchmark for question answering research","author":"Kwiatkowski Tom","year":"2019","unstructured":"Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Matthew Kelcey, Jacob Devlin, et al. 2019. Natural questions: A benchmark for question answering research. Transactions of the Association for Computational Linguistics 7 (2019), 452\u2013466.","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"e_1_3_2_88_2","first-page":"8424","volume-title":"Proceedings of the ACL","author":"Lee Katherine","year":"2022","unstructured":"Katherine Lee, Daphne Ippolito, Andrew Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, and Nicholas Carlini. 2022. Deduplicating training data makes language models better. In Proceedings of the ACL. Dublin, Ireland, 8424\u20138445."},{"key":"e_1_3_2_89_2","first-page":"34586","article-title":"Factuality enhanced language models for open-ended text generation","volume":"35","author":"Lee Nayeon","year":"2022","unstructured":"Nayeon Lee, Wei Ping, Peng Xu, Mostofa Patwary, Pascale N. Fung, Mohammad Shoeybi, and Bryan Catanzaro. 2022. Factuality enhanced language models for open-ended text generation. Advances in Neural Information Processing Systems 35, 2506 (2022), 34586\u201334599.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_90_2","doi-asserted-by":"publisher","DOI":"10.1145\/219717.219745"},{"key":"e_1_3_2_91_2","doi-asserted-by":"crossref","first-page":"333","DOI":"10.18653\/v1\/K17-1034","volume-title":"Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL\u201917)","author":"Levy Omer","year":"2017","unstructured":"Omer Levy, Minjoon Seo, Eunsol Choi, and Luke Zettlemoyer. 2017. Zero-shot relation extraction via reading comprehension. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL\u201917). Vancouver, Canada, 333\u2013342. DOI:10.18653\/v1\/K17-1034"},{"key":"e_1_3_2_92_2","first-page":"1774","volume-title":"Findings of ACL 2023","author":"Li Daliang","year":"2023","unstructured":"Daliang Li, Ankit Singh Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix Yu, and Sanjiv Kumar. 2023. Large language models with controllable working memory. In Findings of ACL 2023. Toronto, Canada, 1774\u20131793."},{"key":"e_1_3_2_93_2","doi-asserted-by":"publisher","unstructured":"J. Li X. Cheng W. X. Zhao J.-Y. Nie and J.-R. Wen. 2023. HaluEval: A large-scale hallucination evaluation benchmark for large language models. In Proc. EMNLP 2023 (Main). 6449\u20136464. 10.18653\/v1\/2023.emnlp-main.397.","DOI":"10.18653\/v1\/2023.emnlp-main.397"},{"key":"e_1_3_2_94_2","unstructured":"K. Li O. Patel F. Vi\u00e9gas H. Pfister and M. Wattenberg. 2023. Inference-time Intervention: Eliciting truthful answers from a language model. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023 spotlight) Paper 1051."},{"key":"e_1_3_2_95_2","doi-asserted-by":"publisher","DOI":"10.1145\/2897350.2897352"},{"key":"e_1_3_2_96_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Li Zonglin","year":"2022","unstructured":"Zonglin Li, Ruiqi Guo, and Sanjiv Kumar. 2022. 
Decoupled context processing for context augmented language modeling. In Proceedings of the Advances in Neural Information Processing Systems. Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho (Eds.)."},{"key":"e_1_3_2_97_2","volume-title":"Proceedings of the Text Summarization Branches Out","author":"Lin Chin-Yew","year":"2004","unstructured":"Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Proceedings of the Text Summarization Branches Out."},{"key":"e_1_3_2_98_2","first-page":"3214","volume-title":"Proceedings of the ACL","author":"Lin Stephanie","year":"2022","unstructured":"Stephanie Lin, Jacob Hilton, and Owain Evans. 2022. TruthfulQA: Measuring how models mimic human falsehoods. In Proceedings of the ACL. Dublin, Ireland, 3214\u20133252."},{"key":"e_1_3_2_99_2","doi-asserted-by":"crossref","unstructured":"Y.-T. Lin and Y.-N. Chen. 2023. LLM-Eval: Unified multi-dimensional automatic evaluation for open-domain conversations with large language models. In Proc. 5th NLP4ConvAI Workshop (ACL 2023). 47\u201358.","DOI":"10.18653\/v1\/2023.nlp4convai-1.5"},{"key":"e_1_3_2_100_2","unstructured":"Chen Ling Xujiang Zhao Jiaying Lu Chengyuan Deng Can Zheng Junxiang Wang Tanmoy Chowdhury Yun Li Hejie Cui Tianjiao Zhao et\u00a0al. 2023. Beyond one-model-fits-all: A survey of domain specialization for large language models. arXiv:2305.18703. Retrieved from https:\/\/arxiv.org\/abs\/2305.18703"},{"key":"e_1_3_2_101_2","doi-asserted-by":"crossref","unstructured":"Alisa Liu Zhaofeng Wu Julian Michael Alane Suhr Peter West Alexander Koller Swabha Swayamdipta Noah A. Smith and Yejin Choi. 2023. We\u2019re afraid language models aren\u2019t modeling ambiguity. In Proc. EMNLP 2023 (Main). 790\u2013807.","DOI":"10.18653\/v1\/2023.emnlp-main.51"},{"key":"e_1_3_2_102_2","unstructured":"Fuxiao Liu Kevin Lin Linjie Li Jianfeng Wang Yaser Yacoob and Lijuan Wang. 2024. Mitigating hallucination in large multi-modal models via robust instruction tuning. ICLR 2024 (poster). arXiv:2306.14565."},{"key":"e_1_3_2_103_2","doi-asserted-by":"publisher","unstructured":"Jerry Liu. 2022. LlamaIndex. DOI:10.5281\/zenodo.1234","DOI":"10.5281\/zenodo.1234"},{"key":"e_1_3_2_104_2","doi-asserted-by":"publisher","unstructured":"Nelson F. Liu Kevin Lin John Hewitt Ashwin Paranjape Michele Bevilacqua Fabio Petroni and Percy Liang. 2024. Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics 12 (2024) 157\u2013173. 10.1162\/tacl_a_00638.","DOI":"10.1162\/tacl_a_00638"},{"issue":"3","key":"e_1_3_2_105_2","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1007\/s10287-022-00425-z","article-title":"Accuracy and fairness tradeoffs in machine learning: A stochastic multi-objective approach","volume":"19","author":"Liu Suyun","year":"2022","unstructured":"Suyun Liu and Luis Nunes Vicente. 2022. Accuracy and fairness tradeoffs in machine learning: A stochastic multi-objective approach. Computational Management Science 19, 3 (2022), 513\u2013537.","journal-title":"Computational Management Science"},{"key":"e_1_3_2_106_2","doi-asserted-by":"publisher","unstructured":"Xiaoze Liu Ting Sun Tianyang Xu Feijie Wu Cunxiang Wang Xiaoqian Wang and Jing Gao. 2024. SHIELD: Evaluation and defense strategies for copyright compliance in LLM text generation. In Proc. EMNLP 2024. 1640\u20131670. 10.18653\/v1\/2024.emnlp-main.98.
","DOI":"10.18653\/v1\/2024.emnlp-main.98"},{"key":"e_1_3_2_107_2","unstructured":"Xiaoze Liu Feijie Wu Tianyang Xu Zhuo Chen Yichi Zhang Xiaoqian Wang and Jing Gao. 2024. Evaluating the factuality of large language models using large-scale knowledge graphs. IEEE Data Eng. Bull. 48 4 (2024) 87\u2013108."},{"key":"e_1_3_2_108_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692. Retrieved from https:\/\/arxiv.org\/abs\/1907.11692"},{"key":"e_1_3_2_109_2","first-page":"7052","volume-title":"Proceedings of the EMNLP","author":"Longpre Shayne","year":"2021","unstructured":"Shayne Longpre, Kartik Perisetla, Anthony Chen, Nikhil Ramesh, Chris DuBois, and Sameer Singh. 2021. Entity-based knowledge conflicts in question answering. In Proceedings of the EMNLP. 7052\u20137063."},{"key":"e_1_3_2_110_2","volume-title":"Proceedings of the ACL","author":"Lowe Ryan","year":"2017","unstructured":"Ryan Lowe, Michael Noseworthy, Iulian Vlad Serban, Nicolas Angelard-Gontier, Yoshua Bengio, and Joelle Pineau. 2017. Towards an automatic Turing test: Learning to evaluate dialogue responses. In Proceedings of the ACL."},{"key":"e_1_3_2_111_2","volume-title":"Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS\u201922)","author":"Lu Pan","year":"2022","unstructured":"Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, and Ashwin Kalyan. 2022. Learn to explain: Multimodal reasoning via thought chains for science question answering. In Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS\u201922)."},{"key":"e_1_3_2_112_2","first-page":"14485","volume-title":"Proceedings of the ICML","author":"Lundstrom Daniel D.","year":"2022","unstructured":"Daniel D. Lundstrom, Tianjian Huang, and Meisam Razaviyayn. 2022. A rigorous study of integrated gradients method and extensions to internal neuron attributions. In Proceedings of the ICML. PMLR, 14485\u201314508."},{"key":"e_1_3_2_113_2","doi-asserted-by":"publisher","unstructured":"Hongyin Luo Yung-Sung Chuang Yuan Gong Tianhua Zhang Yoon Kim Xixin Wu Danny Fox Helen Meng and James Glass. 2023. Search-augmented instruction learning (SAIL). In Findings of ACL: EMNLP 2023. 3717\u20133729. 10.18653\/v1\/2023.findings-emnlp.242.","DOI":"10.18653\/v1\/2023.findings-emnlp.242"},{"key":"e_1_3_2_114_2","unstructured":"Yun Luo Zhen Yang Fandong Meng Yafu Li Jie Zhou and Yue Zhang. 2023. An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning. arXiv:2308.08747. Retrieved from https:\/\/arxiv.org\/abs\/2308.08747"},{"key":"e_1_3_2_115_2","first-page":"9802","volume-title":"Proceedings of the ACL","author":"Mallen Alex","year":"2023","unstructured":"Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, and Hannaneh Hajishirzi. 2023. When not to trust language models: Investigating effectiveness of parametric and non-parametric memories. In Proceedings of the ACL. Toronto, Canada, 9802\u20139822."},{"key":"e_1_3_2_116_2","doi-asserted-by":"publisher","unstructured":"P. Manakul A. Liusie and M. Gales. 2023. SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models. In Proc. EMNLP 2023. 9004\u20139017. 10.18653\/v1\/2023.emnlp-main.557.
","DOI":"10.18653\/v1\/2023.emnlp-main.557"},{"key":"e_1_3_2_117_2","first-page":"223","volume-title":"Findings of EMNLP 2020","author":"Massarelli Luca","year":"2020","unstructured":"Luca Massarelli, Fabio Petroni, Aleksandra Piktus, Myle Ott, Tim Rockt\u00e4schel, Vassilis Plachouras, Fabrizio Silvestri, and Sebastian Riedel. 2020. How decoding strategies affect the verifiability of generated text. In Findings of EMNLP 2020. 223\u2013235."},{"key":"e_1_3_2_118_2","first-page":"1906","volume-title":"Proceedings of the ACL","author":"Maynez Joshua","year":"2020","unstructured":"Joshua Maynez, Shashi Narayan, Bernd Bohnet, and Ryan McDonald. 2020. On faithfulness and factuality in abstractive summarization. In Proceedings of the ACL. Online, 1906\u20131919."},{"key":"e_1_3_2_119_2","doi-asserted-by":"publisher","unstructured":"J. McCarthy M. L. Minsky N. Rochester and C. E. Shannon. 2006. A proposal for the Dartmouth summer research project on artificial intelligence. AI Magazine 27 4 (2006) 12. 10.1609\/aimag.v27i4.1904","DOI":"10.1609\/aimag.v27i4.1904"},{"key":"e_1_3_2_120_2","unstructured":"Alan Melikdjanian. 2018. Captain Disillusion\u2019s escape from the USSR. Retrieved August 18 2023 from https:\/\/www.youtube.com\/watch?v=MaDz0FCxzR8"},{"key":"e_1_3_2_121_2","article-title":"Locating and editing factual associations in GPT","author":"Meng Kevin","year":"2022","unstructured":"Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. 2022. Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems 35, 1262 (2022), 17359\u201317372.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_122_2","unstructured":"Jacob Menick Maja Trebacz Vladimir Mikulik John Aslanides Francis Song Martin Chadwick Mia Glaese Susannah Young Lucy Campbell-Gillingham Geoffrey Irving and Nat McAleese. 2022. Teaching Language Models to Support Answers with Verified Quotes. arXiv:2203.11147. Retrieved from https:\/\/arxiv.org\/abs\/2203.11147"},{"key":"e_1_3_2_123_2","unstructured":"Microsoft. 2023. Bing Chat. Retrieved August 20 2023 from https:\/\/www.bing.com\/new"},{"key":"e_1_3_2_124_2","volume-title":"Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23\u201326, 1992","author":"Miller George A.","year":"1992","unstructured":"George A. Miller. 1992. WordNet: A lexical database for English. In Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23\u201326, 1992."},{"issue":"4","key":"e_1_3_2_125_2","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/ijl\/3.4.235","article-title":"Introduction to WordNet: An on-line lexical database","volume":"3","author":"Miller George A.","year":"1990","unstructured":"George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine J. Miller. 1990. Introduction to WordNet: An on-line lexical database. International Journal of Lexicography 3, 4 (1990), 235\u2013244.","journal-title":"International Journal of Lexicography"},{"key":"e_1_3_2_126_2","doi-asserted-by":"publisher","unstructured":"Sewon Min Kalpesh Krishna Xinxi Lyu et\u00a0al. 2023. FActScore: Fine-grained atomic evaluation of factual precision in long form text generation. In Proc. EMNLP 2023. Singapore 12076\u201312100. DOI:10.18653\/v1\/2023.emnlp-main.741.
","DOI":"10.18653\/v1\/2023.emnlp-main.741"},{"key":"e_1_3_2_127_2","volume-title":"Proceedings of the ICLR","author":"Mitchell Eric","year":"2022","unstructured":"Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, and Christopher D. Manning. 2022. Fast model editing at scale. In Proceedings of the ICLR."},{"key":"e_1_3_2_128_2","volume-title":"Proceedings of the ICML","author":"Mitchell Eric","year":"2022","unstructured":"Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D. Manning, and Chelsea Finn. 2022. Memory-based model editing at scale. In Proceedings of the ICML."},{"key":"e_1_3_2_129_2","first-page":"1581","volume-title":"Proceedings of the NAACL","author":"Moiseev Fedor","year":"2022","unstructured":"Fedor Moiseev, Zhe Dong, Enrique Alfonseca, and Martin Jaggi. 2022. SKILL: Structured knowledge infusion for large language models. In Proceedings of the NAACL. Seattle, United States, 1581\u20131588."},{"key":"e_1_3_2_130_2","unstructured":"Reiichiro Nakano Jacob Hilton Suchir Balaji Jeff Wu Long Ouyang Christina Kim Christopher Hesse Shantanu Jain Vineet Kosaraju William Saunders Xu Jiang Karl Cobbe Tyna Eloundou Gretchen Krueger Kevin Button Matthew Knight Benjamin Chess and John Schulman. 2022. WebGPT: Browser-assisted Question-answering with Human Feedback. arXiv:2112.09332. Retrieved from https:\/\/arxiv.org\/abs\/2112.09332"},{"key":"e_1_3_2_131_2","first-page":"10056","volume-title":"Proceedings of the ACL","author":"Neeman Ella","year":"2023","unstructured":"Ella Neeman, Roee Aharoni, Or Honovich, Leshem Choshen, Idan Szpektor, and Omri Abend. 2023. DisentQA: Disentangling parametric and contextual knowledge with counterfactual question answering. In Proceedings of the ACL. Toronto, Canada, 10056\u201310070."},{"key":"e_1_3_2_132_2","doi-asserted-by":"publisher","DOI":"10.1145\/360018.360022"},{"key":"e_1_3_2_133_2","unstructured":"Harsha Nori Nicholas King Scott Mayer McKinney Dean Carignan and Eric Horvitz. 2023. Capabilities of GPT-4 on Medical Challenge Problems. arXiv:2303.13375. Retrieved from https:\/\/arxiv.org\/abs\/2303.13375"},{"key":"e_1_3_2_134_2","unstructured":"OpenAI. 2022. Introducing ChatGPT. Retrieved September 3 2023 from https:\/\/openai.com\/blog\/chatgpt"},{"key":"e_1_3_2_135_2","unstructured":"OpenAI. 2023. GPT-4 technical report. arXiv:2303.08774. Retrieved from https:\/\/arxiv.org\/abs\/2303.08774"},{"key":"e_1_3_2_136_2","unstructured":"Oded Ovadia Menachem Brief Moshik Mishaeli and Oren Elisha. 2023. Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs. arXiv:2312.05934. Retrieved from https:\/\/arxiv.org\/abs\/2312.05934"},{"key":"e_1_3_2_137_2","doi-asserted-by":"publisher","unstructured":"Shirui Pan Linhao Luo et\u00a0al. 2024. Unifying large language models and knowledge graphs: A roadmap. IEEE Transactions on Knowledge and Data Engineering 36 7 (2024) 3580\u20133599. DOI:10.1109\/TKDE.2024.3352100.","DOI":"10.1109\/TKDE.2024.3352100"},{"key":"e_1_3_2_138_2","unstructured":"Yikang Pan Liangming Pan Wenhu Chen Preslav Nakov Min-Yen Kan and William Yang Wang. 2023. On the risk of misinformation pollution with large language models. arXiv:2305.13661. Retrieved from https:\/\/arxiv.org\/abs\/2305.13661"},{"key":"e_1_3_2_139_2","volume-title":"Proceedings of the ACL","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. 
In Proceedings of the ACL."},{"key":"e_1_3_2_140_2","volume-title":"Open-LLM-Leaderboard-Report","author":"Park Daniel","year":"2023","unstructured":"Daniel Park. 2023. Open-LLM-Leaderboard-Report. Retrieved October 1, 2023 from https:\/\/github.com\/dsdanielpark\/Open-LLM-Leaderboard-Report"},{"key":"e_1_3_2_141_2","unstructured":"Baolin Peng Michel Galley Pengcheng He Hao Cheng Yujia Xie Yu Hu Qiuyuan Huang Lars Liden Zhou Yu Weizhu Chen and Jianfeng Gao. 2023. Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback. arXiv:2302.12813. Retrieved from https:\/\/arxiv.org\/abs\/2302.12813"},{"key":"e_1_3_2_142_2","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Petroni Fabio","year":"2019","unstructured":"Fabio Petroni, Tim Rockt\u00e4schel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H. Miller, and Sebastian Riedel. 2019. Language models as knowledge bases?. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. https:\/\/api.semanticscholar.org\/CorpusID:202539551"},{"key":"e_1_3_2_143_2","first-page":"2463","volume-title":"Proceedings of the EMNLP","author":"Petroni Fabio","year":"2019","unstructured":"Fabio Petroni, Tim Rockt\u00e4schel, Sebastian Riedel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, and Alexander Miller. 2019. Language models as knowledge bases?. In Proceedings of the EMNLP. Hong Kong, China, 2463\u20132473. DOI:10.18653\/v1\/D19-1250"},{"key":"e_1_3_2_145_2","doi-asserted-by":"publisher","unstructured":"Fabio Petroni Tim Rockt\u00e4schel et\u00a0al. 2019. Language models as knowledge bases? In Proc. EMNLP-IJCNLP 2019. Hong Kong 2463\u20132473. DOI:10.18653\/v1\/D19-1250.","DOI":"10.18653\/v1\/D19-1250"},{"key":"e_1_3_2_146_2","doi-asserted-by":"crossref","unstructured":"Pouya Pezeshkpour. 2023. Measuring and modifying factual knowledge in large language models. In IEEE ICMLA 2023. 831\u2013838.","DOI":"10.1109\/ICMLA58977.2023.00122"},{"key":"e_1_3_2_147_2","article-title":"ChatGPT invented a sexual harassment scandal and named a real law prof as the accused","author":"Verma Pranshu","year":"2023","unstructured":"Pranshu Verma and Will Oremus. 2023. ChatGPT invented a sexual harassment scandal and named a real law prof as the accused. The Washington Post (2023). Retrieved from https:\/\/www.washingtonpost.com\/technology\/2023\/04\/05\/chatgpt-lies\/","journal-title":"The Washington Post"},{"key":"e_1_3_2_148_2","unstructured":"Xiao Pu Mingqi Gao and Xiaojun Wan. 2023. Summarization is (Almost) Dead. arXiv:2309.09558. Retrieved from https:\/\/arxiv.org\/abs\/2309.09558"},{"key":"e_1_3_2_149_2","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Qi Jirui","year":"2023","unstructured":"Jirui Qi, Raquel Fern\u00e1ndez, and Arianna Bisazza. 2023. Cross-lingual consistency of factual knowledge in multilingual language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 
Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:264145744"},{"key":"e_1_3_2_150_2","doi-asserted-by":"publisher","unstructured":"Yujia Qin Zihan Cai et\u00a0al. 2023. WebCPM: Interactive web search for Chinese long-form question answering. Proc. ACL 2023 (Long Papers Vol. 1). 8968\u20138988. DOI:10.18653\/v1\/2023.acl-long.499.","DOI":"10.18653\/v1\/2023.acl-long.499"},{"key":"e_1_3_2_151_2","unstructured":"Rafael Rafailov Archit Sharma et\u00a0al. 2023. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems (NeurIPS 2023) 36 (2023) 14. Paper ID HPuSIXJaa9."},{"issue":"140","key":"e_1_3_2_152_2","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel Colin","year":"2020","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21, 140 (2020), 1\u201367. Retrieved from http:\/\/jmlr.org\/papers\/v21\/20-074.html","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_154_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1162\/coli_a_00490","article-title":"Measuring attribution in natural language generation models","author":"Rashkin Hannah","year":"2023","unstructured":"Hannah Rashkin, Vitaly Nikolaev, Matthew Lamm, Lora Aroyo, Michael Collins, Dipanjan Das, Slav Petrov, Gaurav Singh Tomar, Iulia Turc, and David Reitter. 2023. Measuring attribution in natural language generation models. Computational Linguistics 49, 4 (2023), 1\u201366.","journal-title":"Computational Linguistics"},{"key":"e_1_3_2_155_2","unstructured":"Ruiyang Ren Yuhao Wang et\u00a0al. 2025. Investigating the factual knowledge boundary of large language models with retrieval augmentation. Proc. COLING 2025. 3697\u20133715."},{"key":"e_1_3_2_156_2","first-page":"2960","volume-title":"Proceedings of the EACL","author":"Sadeq Nafis","year":"2023","unstructured":"Nafis Sadeq, Byungkyu Kang, Prarit Lamba, and Julian McAuley. 2023. Unsupervised improvement of factual knowledge in language models. In Proceedings of the EACL. Dubrovnik, Croatia, 2960\u20132969."},{"key":"e_1_3_2_157_2","volume-title":"Proceedings of the ACL","author":"Sellam Thibault","year":"2020","unstructured":"Thibault Sellam, Dipanjan Das, and Ankur Parikh. 2020. BLEURT: Learning robust metrics for text generation. In Proceedings of the ACL."},{"key":"e_1_3_2_158_2","unstructured":"Sheikh Shafayat Eunsu Kim Juhyun Oh and Alice Oh. 2024. Multi-FAct: Assessing multilingual LLMs\u2019 multi-regional knowledge using FActScore. arXiv:2402.18045. 
Retrieved from https:\/\/arxiv.org\/abs\/2402.18045"},{"key":"e_1_3_2_159_2","first-page":"31210","volume-title":"Proceedings of the ICML","author":"Shi Freda","year":"2023","unstructured":"Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed H. Chi, Nathanael Sch\u00e4rli, and Denny Zhou. 2023. Large language models can be easily distracted by irrelevant context. In Proceedings of the ICML. PMLR, 31210\u201331227."},{"key":"e_1_3_2_160_2","doi-asserted-by":"publisher","unstructured":"Weijia Shi Sewon Min et\u00a0al. 2024. REPLUG: Retrieval-augmented black-box language models. In Proceedings of the NAACL-HLT (Long Papers). 8371\u20138384. DOI:10.18653\/v1\/2024.naacl-long.463","DOI":"10.18653\/v1\/2024.naacl-long.463"},{"key":"e_1_3_2_161_2","unstructured":"Noah Shinn Federico Cassano et\u00a0al. 2023. Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems 36 (2023)."},{"key":"e_1_3_2_162_2","first-page":"25968","article-title":"End-to-end training of multi-document reader and retriever for open-domain question answering","volume":"34","author":"Singh Devendra","year":"2021","unstructured":"Devendra Singh, Siva Reddy, Will Hamilton, Chris Dyer, and Dani Yogatama. 2021. End-to-end training of multi-document reader and retriever for open-domain question answering. Advances in Neural Information Processing Systems 34, 1988 (2021), 25968\u201325981.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_163_2","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2024.nlp4pi-1.2","article-title":"Multilingual fact-checking using LLMs","author":"Singhal Aryan","year":"2024","unstructured":"Aryan Singhal, Thomas Law, Coby Kassner, Ayushman Gupta, Evan Duan, Aviral Damle, and Ryan Li. 2024. Multilingual fact-checking using LLMs. In Proceedings of the 3rd Workshop on NLP for Positive Impact. Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:273901487","journal-title":"Proceedings of the 3rd Workshop on NLP for Positive Impact"},{"key":"e_1_3_2_164_2","first-page":"16857","article-title":"MPNet: Masked and permuted pre-training for language understanding","volume":"33","author":"Song Kaitao","year":"2020","unstructured":"Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. 2020. MPNet: Masked and permuted pre-training for language understanding. Advances in Neural Information Processing Systems 33, 1414 (2020), 16857\u201316867.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_165_2","volume-title":"Proceedings of the AAAI","author":"Speer Robyn","year":"2017","unstructured":"Robyn Speer, Joshua Chin, and Catherine Havasi. 2017. ConceptNet 5.5: An open multilingual graph of general knowledge. In Proceedings of the AAAI."},{"key":"e_1_3_2_166_2","article-title":"Beyond the imitation game: Quantifying and extrapolating the capabilities of language models","author":"Srivastava Aarohi","year":"2023","unstructured":"Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adri\u00e0 Garriga-Alonso, et\u00a0al. 2023. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. TMLR (2023).","journal-title":"TMLR"},{"key":"e_1_3_2_167_2","unstructured":"Hao Sun Xiao Liu Yeyun Gong Anlei Dong Jingwen Lu Yan Zhang Daxin Jiang Linjun Yang Rangan Majumder and Nan Duan. 2023. 
BeamSearchQA: Large Language Models are Strong Zero-Shot QA Solver. arXiv:2305.14766. Retrieved from https:\/\/arxiv.org\/abs\/2305.14766"},{"key":"e_1_3_2_168_2","doi-asserted-by":"crossref","unstructured":"Kai Sun Yifan Ethan Xu Hanwen Zha Yue Liu and Xin Luna Dong. 2023. Head-to-tail: How knowledgeable are large language models (LLMs)? A.K.A. will LLMs replace knowledge graphs? In Proceedings of the NAACL-HLT (Volume 1: Long Papers). 311\u2013325.","DOI":"10.18653\/v1\/2024.naacl-long.18"},{"key":"e_1_3_2_169_2","first-page":"13618","volume-title":"Proceedings of the AAAI","author":"Sun Weiwei","year":"2023","unstructured":"Weiwei Sun, Zhengliang Shi, Shen Gao, Pengjie Ren, Maarten de Rijke, and Zhaochun Ren. 2023. Contrastive learning reduces hallucination in conversations. In Proceedings of the AAAI. 13618\u201313626."},{"key":"e_1_3_2_170_2","doi-asserted-by":"crossref","unstructured":"Zhiqing Sun Sheng Shen Shengcao Cao et\u00a0al. 2024. Aligning large multimodal models with factually augmented RLHF. In Findings of ACL 2024. 13088\u201313110.","DOI":"10.18653\/v1\/2024.findings-acl.775"},{"key":"e_1_3_2_171_2","first-page":"5220","volume-title":"Findings of ACL 2023","author":"Tam Derek","year":"2023","unstructured":"Derek Tam, Anisha Mascarenhas, Shiyue Zhang, Sarah Kwan, Mohit Bansal, and Colin Raffel. 2023. Evaluating the factual consistency of large language models through news summarization. In Findings of ACL 2023. Toronto, Canada, 5220\u20135255."},{"key":"e_1_3_2_172_2","doi-asserted-by":"crossref","unstructured":"Yiming Tan Dehai Min Yu Li et\u00a0al. 2023. Can ChatGPT Replace Traditional KBQA Models? An In-depth Analysis of the Question-Answering Performance of the GPT LLM Family. In Proceedings of the ISWC (LNCS 14265). 348\u2013367.","DOI":"10.1007\/978-3-031-47240-4_19"},{"key":"e_1_3_2_173_2","doi-asserted-by":"crossref","first-page":"48","DOI":"10.18653\/v1\/2023.clinicalnlp-1.7","volume-title":"Proceedings of the 5th Clinical Natural Language Processing Workshop","author":"Tang Xiangru","year":"2023","unstructured":"Xiangru Tang, Arman Cohan, and Mark Gerstein. 2023. Aligning factual consistency for clinical studies summarization through reinforcement learning. In Proceedings of the 5th Clinical Natural Language Processing Workshop. Toronto, Canada, 48\u201358."},{"key":"e_1_3_2_174_2","first-page":"1","article-title":"Large language models in medicine","author":"Thirunavukarasu Arun James","year":"2023","unstructured":"Arun James Thirunavukarasu, Darren Shu Jeng Ting, Kabilan Elangovan, Laura Gutierrez, Ting Fang Tan, and Daniel Shu Wei Ting. 2023. Large language models in medicine. Nature Medicine 29, 8 (2023), 1\u201311.","journal-title":"Nature Medicine"},{"key":"e_1_3_2_175_2","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar Aurelien Rodriguez Armand Joulin Edouard Grave and Guillaume Lample. 2023. LLaMA: Open and Efficient Foundation Language Models. arXiv:2302.13971. 
Retrieved from https:\/\/arxiv.org\/abs\/2302.13971"},{"key":"e_1_3_2_176_2","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton Ferrer Moya Chen Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller Cynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev Punit Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich Yinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra Igor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang Angela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic Sergey Edunov and Thomas Scialom. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv:2307.09288. Retrieved from https:\/\/arxiv.org\/abs\/2307.09288"},{"key":"e_1_3_2_177_2","first-page":"10014","volume-title":"Proceedings of the ACL","author":"Trivedi Harsh","year":"2023","unstructured":"Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, and Ashish Sabharwal. 2023. Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions. In Proceedings of the ACL. Toronto, Canada, 10014\u201310037."},{"key":"e_1_3_2_178_2","unstructured":"Neeraj Varshney Wenlin Yao Hongming Zhang Jianshu Chen and Dong Yu. 2023. A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation. arXiv:2307.03987. Retrieved from https:\/\/arxiv.org\/abs\/2307.03987"},{"issue":"10","key":"e_1_3_2_179_2","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1145\/2629489","article-title":"Wikidata: A free collaborative knowledgebase","volume":"57","author":"Vrande\u010di\u0107 Denny","year":"2014","unstructured":"Denny Vrande\u010di\u0107 and Markus Kr\u00f6tzsch. 2014. Wikidata: A free collaborative knowledgebase. Communications of the ACM 57, 10 (2014), 78\u201385.","journal-title":"Communications of the ACM"},{"key":"e_1_3_2_180_2","doi-asserted-by":"crossref","unstructured":"Kim Trong Vu Michael Krumdick Varshini Reddy Franck Dernoncourt and Viet Dac Lai. 2024. An analysis of multilingual FActScore. arXiv:2406.19415. Retrieved from https:\/\/arxiv.org\/abs\/2406.19415","DOI":"10.18653\/v1\/2024.emnlp-main.247"},{"key":"e_1_3_2_181_2","doi-asserted-by":"crossref","unstructured":"Tu Vu Mohit Iyyer Xuezhi Wang et\u00a0al. 2024. FreshLLMs: Refreshing large language models with search engine augmentation. In Findings of ACL 2024. 13697\u201313720.","DOI":"10.18653\/v1\/2024.findings-acl.813"},{"key":"e_1_3_2_182_2","doi-asserted-by":"crossref","unstructured":"Zhongwei Wan Yichun Yin Wei Zhang et\u00a0al. 2022. G-MAP: General memory-augmented pre-trained language model for domain tasks. In Proceedings of the EMNLP. 6585\u20136597.","DOI":"10.18653\/v1\/2022.emnlp-main.441"},{"key":"e_1_3_2_183_2","unstructured":"Cunxiang Wang Sirui Cheng Qipeng Guo Yuanhao Yue Bowen Ding Zhikun Xu Yidong Wang Xiangkun Hu Zheng Zhang and Yue Zhang. 2023. Evaluating open question answering evaluation. 
In Proceedings of the 37th International Conference on Neural Information Processing Systems NIPS\u201923. 77013\u201377042."},{"key":"e_1_3_2_184_2","first-page":"3241","volume-title":"Proceedings of the ACL","author":"Wang Cunxiang","year":"2021","unstructured":"Cunxiang Wang, Pai Liu, and Yue Zhang. 2021. Can generative pre-trained language models serve as knowledge bases for closed-book QA?. In Proceedings of the ACL. Online, 3241\u20133251."},{"key":"e_1_3_2_185_2","doi-asserted-by":"crossref","first-page":"444","DOI":"10.1007\/978-3-031-44696-2_35","volume-title":"Proceedings of the Natural Language Processing and Chinese Computing","author":"Wang Cunxiang","year":"2023","unstructured":"Cunxiang Wang, Fuli Luo, Yanyang Li, Runxin Xu, Fei Huang, and Yue Zhang. 2023. Knowledgeable salient span mask for enhancing language models as knowledge base. In Proceedings of the Natural Language Processing and Chinese Computing. Fei Liu, Nan Duan, Qingting Xu, and Yu Hong (Eds.), Springer Nature Switzerland, Cham, 444\u2013456."},{"key":"e_1_3_2_186_2","unstructured":"Cunxiang Wang Ruoxi Ning Boqi Pan Tonghui Wu Qipeng Guo Cheng Deng Guangsheng Bao Xiangkun Hu Zheng Zhang Qian Wang and Yue Zhang. 2024. NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens. arXiv:2403.12766. Retrieved from https:\/\/arxiv.org\/abs\/2403.12766"},{"key":"e_1_3_2_187_2","unstructured":"Haochun Wang Chi Liu Nuwa Xi Zewen Qiang Sendong Zhao Bing Qin and Ting Liu. 2023. HuaTuo: Tuning LLaMA Model with Chinese Medical Knowledge. arXiv:2304.06975. Retrieved from https:\/\/arxiv.org\/abs\/2304.06975"},{"key":"e_1_3_2_188_2","doi-asserted-by":"crossref","unstructured":"Xinpeng Wang Bolei Ma Chengzhi Hu et\u00a0al. 2024. \u201cMy answer is C\u201d: First-token probabilities do not match text answers in instruction-tuned language models. In Findings of ACL 2024. 7407\u20137416.","DOI":"10.18653\/v1\/2024.findings-acl.441"},{"key":"e_1_3_2_189_2","unstructured":"Xuezhi Wang Jason Wei Dale Schuurmans et\u00a0al. 2023. Self-consistency improves chain-of-thought reasoning in language models. In Proceedings of the ICLR."},{"key":"e_1_3_2_190_2","unstructured":"Yike Wang Shangbin Feng Heng Wang Weijia Shi Vidhisha Balachandran Tianxing He and Yulia Tsvetkov. 2024. Resolving knowledge conflicts in large language models. COLM 2024. arXiv:2310.00935. Retrieved from https:\/\/arxiv.org\/abs\/2310.00935"},{"key":"e_1_3_2_191_2","unstructured":"Yihan Wang Si Si Daliang Li Michal Lukasik Felix Yu Cho-Jui Hsieh Inderjit S Dhillon and Sanjiv Kumar. 2022. Preserving in-context learning ability in large language model fine-tuning. arXiv:2211.00635. Retrieved from https:\/\/arxiv.org\/abs\/2211.00635"},{"key":"e_1_3_2_192_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, brian ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. 2022. Chain of thought prompting elicits reasoning in large language models. In Proceedings of the Advances in Neural Information Processing Systems. Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho (Eds.)."},{"key":"e_1_3_2_193_2","doi-asserted-by":"crossref","unstructured":"Orion Weller Marc Marone Nathaniel Weir et\u00a0al. 2024. \u201cAccording to ...\u201d: prompting language models improves quoting from pre-training data. In Proceedings of the EACL. Malta, 2288\u20132301.
(aclanthology.org).","DOI":"10.18653\/v1\/2024.eacl-long.140"},{"key":"e_1_3_2_194_2","unstructured":"Shijie Wu Ozan Irsoy Steven Lu Vadim Dabravolski Mark Dredze Sebastian Gehrmann Prabhanjan Kambadur David Rosenberg and Gideon Mann. 2023. BloombergGPT: A Large Language Model for Finance. arXiv:2303.17564. Retrieved from https:\/\/arxiv.org\/abs\/2303.17564"},{"key":"e_1_3_2_195_2","unstructured":"Jian Xie Kai Zhang Jiangjie Chen Renze Lou and Yu Su. 2023. Adaptive Chameleon or Stubborn Sloth: Unraveling the Behavior of Large Language Models in Knowledge Clashes. ICLR 2024 Spotlight."},{"key":"e_1_3_2_196_2","unstructured":"Honglin Xiong Sheng Wang Yitao Zhu Zihao Zhao Yuxiao Liu Linlin Huang Qian Wang and Dinggang Shen. 2023. DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task. arXiv:2304.01097. Retrieved from https:\/\/arxiv.org\/abs\/2304.01097"},{"key":"e_1_3_2_197_2","doi-asserted-by":"crossref","unstructured":"Rongwu Xu Zehan Qi Zhijiang Guo Cunxiang Wang Hongru Wang Yue Zhang and Wei Xu. 2024. Knowledge Conflicts for LLMs: A Survey. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing Yaser Al-Onaizan Mohit Bansal and Yun-Nung Chen (Eds.). Association for Computational Linguistics 8541\u20138565. https:\/\/aclanthology.org\/2024.emnlp-main.486\/","DOI":"10.18653\/v1\/2024.emnlp-main.486"},{"key":"e_1_3_2_198_2","doi-asserted-by":"crossref","unstructured":"Tianyang Xu Shujin Wu Shizhe Diao Xiaoze Liu Xingyao Wang Yangyi Chen and Jing Gao. 2024. SaySelf: Teaching LLMs to express confidence with self-reflective rationales. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing Yaser Al-Onaizan Mohit Bansal and Yun-Nung Chen (Eds.). Association for Computational Linguistics 5985\u20135998. https:\/\/aclanthology.org\/2024.emnlp-main.343\/","DOI":"10.18653\/v1\/2024.emnlp-main.343"},{"key":"e_1_3_2_199_2","unstructured":"Linyao Yang Hongyang Chen Zhao Li Xiao Ding and Xindong Wu. 2023. ChatGPT is not Enough: Enhancing Large Language Models with Knowledge Graphs for Fact-aware Language Modeling. arXiv:2306.11489. Retrieved from https:\/\/arxiv.org\/abs\/2306.11489"},{"key":"e_1_3_2_200_2","unstructured":"Shunyu Yao Jeffrey Zhao Dian Yu et\u00a0al. 2023. ReAct: Synergizing reasoning and acting in language models. In Proc. ICLR 2023 (poster). (openreview.net)."},{"key":"e_1_3_2_201_2","doi-asserted-by":"crossref","unstructured":"Yunzhi Yao Peng Wang Bozhong Tian et\u00a0al. 2023. Editing large language models: Problems Methods and Opportunities. In Proc. EMNLP 2023. Singapore. ACL. (aclanthology.org) 10222\u201310240.","DOI":"10.18653\/v1\/2023.emnlp-main.632"},{"key":"e_1_3_2_202_2","doi-asserted-by":"crossref","unstructured":"Xi Ye Ruoxi Sun Sercan \u00d6. Arik and Tomas Pfister. 2024. Effective Large Language Model Adaptation for Improved Grounding and Citation Generation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) Kevin Duh Helena Gomez and Steven Bethard (Eds.). Association for Computational Linguistics 6237\u20136251. https:\/\/aclanthology.org\/2024.naacl-long.346\/","DOI":"10.18653\/v1\/2024.naacl-long.346"},{"key":"e_1_3_2_203_2","first-page":"8653","volume-title":"Findings of ACL 2023","author":"Yin Zhangyue","year":"2023","unstructured":"Zhangyue Yin, Qiushi Sun, Qipeng Guo, Jiawen Wu, Xipeng Qiu, and Xuanjing Huang. 2023. 
Do large language models know what they don\u2019t know?. In Findings of ACL 2023. Toronto, Canada, 8653\u20138665."},{"key":"e_1_3_2_204_2","volume-title":"Proceedings of the ICLR","author":"Yu Wenhao","year":"2023","unstructured":"Wenhao Yu, Dan Iter, Shuohang Wang, Yichong Xu, Mingxuan Ju, Soumya Sanyal, Chenguang Zhu, Michael Zeng, and Meng Jiang. 2023. Generate rather than retrieve: Large language models are strong context generators. In Proceedings of the ICLR."},{"key":"e_1_3_2_205_2","first-page":"27263","article-title":"BARTScore: Evaluating generated text as text generation","volume":"34","author":"Yuan Weizhe","year":"2021","unstructured":"Weizhe Yuan, Graham Neubig, and Pengfei Liu. 2021. BARTScore: Evaluating generated text as text generation. Advances in Neural Information Processing Systems 34, 2088 (2021), 27263\u201327277.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_206_2","doi-asserted-by":"crossref","unstructured":"Xiang Yue Boshi Wang Kai Zhang et\u00a0al. 2023. Automatic evaluation of attribution by large language models. In Findings of EMNLP 2023. 4615\u20134635.","DOI":"10.18653\/v1\/2023.findings-emnlp.307"},{"key":"e_1_3_2_207_2","unstructured":"Yuexiang Zhai Shengbang Tong Xiao Li Mu Cai Qing Qu Yong Jae Lee and Yi Ma. 2023. Investigating the catastrophic forgetting in multimodal large language models. In CPAL 2024 (Proceedings Track, Oral). https:\/\/openreview.net\/forum?id=g7rMSiNtmA"},{"key":"e_1_3_2_208_2","doi-asserted-by":"crossref","unstructured":"Hanning Zhang Shizhe Diao Yong Lin Yi R. Fung Qing Lian Xingyao Wang Yangyi Chen Heng Ji and Tong Zhang. 2024. R-Tuning: Instructing Large Language Models to Say \u2018I Don\u2019t Know\u2019. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) Kevin Duh Helena Gomez and Steven Bethard (Eds.). Association for Computational Linguistics 7113\u20137139. https:\/\/aclanthology.org\/2024.naacl-long.394\/","DOI":"10.18653\/v1\/2024.naacl-long.394"},{"key":"e_1_3_2_209_2","unstructured":"Muru Zhang Ofir Press William Merrill Alisa Liu and Noah A. Smith. 2023. How language model hallucinations can snowball. In Proceedings of the 41st International Conference on Machine Learning ICML\u201924. 59670\u201359684."},{"key":"e_1_3_2_210_2","unstructured":"Shuo Zhang Liangming Pan Junzhou Zhao and William Yang Wang. 2023. Mitigating Language Model Hallucination with Interactive Question-Knowledge Alignment. arXiv:2305.13669. Retrieved from https:\/\/arxiv.org\/abs\/2305.13669"},{"key":"e_1_3_2_211_2","volume-title":"Proceedings of the ICLR","author":"Zhang Tianyi","year":"2020","unstructured":"Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2020. BERTScore: Evaluating text generation with BERT. In Proceedings of the ICLR."},{"key":"e_1_3_2_212_2","unstructured":"Tianhua Zhang Hongyin Luo Yung-Sung Chuang Wei Fang Luc Gaitskell Thomas Hartvigsen Xixin Wu Danny Fox Helen Meng and James Glass. 2023. Interpretable Unified Language Checking. arXiv:2304.03728. Retrieved from https:\/\/arxiv.org\/abs\/2304.03728"},{"key":"e_1_3_2_213_2","first-page":"10641","volume-title":"Proceedings of the ACL","author":"Zhang Zhengyan","year":"2023","unstructured":"Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Huadong Wang, Deming Ye, Chaojun Xiao, Xu Han, Zhiyuan Liu, Peng Li, Maosong Sun, and Jie Zhou. 2023. 
Plug-and-play knowledge injection for pre-trained language models. In Proceedings of the ACL. Toronto, Canada, 10641\u201310658."},{"key":"e_1_3_2_214_2","doi-asserted-by":"crossref","unstructured":"Wanjun Zhong Ruixiang Cui Yiduo Guo et\u00a0al. 2024. AGIEval: A human-centric benchmark for evaluating foundation models. In Findings of NAACL 2024. 2299\u20132314.","DOI":"10.18653\/v1\/2024.findings-naacl.149"},{"key":"e_1_3_2_215_2","unstructured":"Chunting Zhou Pengfei Liu Puxin Xu et\u00a0al. 2023. LIMA: Less is more for alignment. Advances in Neural Information Processing Systems 36 (2023), 55006\u201355021."},{"key":"e_1_3_2_216_2","volume-title":"Proceedings of the 1st Conference on Language Modeling","author":"Zhou Jianing","year":"2024","unstructured":"Jianing Zhou, Ziheng Zeng, Hongyu Gong, and Suma Bhat. 2024. Enhancing language models with idiomatic reasoning. In Proceedings of the 1st Conference on Language Modeling."},{"key":"e_1_3_2_217_2","first-page":"1","article-title":"Larger and more instructable language models become less reliable","author":"Zhou Lexin","year":"2024","unstructured":"Lexin Zhou, Wout Schellaert, Fernando Mart\u00ednez-Plumed, Yael Moros-Daval, C\u00e8sar Ferri, and Jos\u00e9 Hern\u00e1ndez-Orallo. 2024. Larger and more instructable language models become less reliable. Nature 634, 8032 (2024), 1\u20138.","journal-title":"Nature"},{"key":"e_1_3_2_218_2","doi-asserted-by":"crossref","unstructured":"Wenxuan Zhou Sheng Zhang Hoifung Poon and Muhao Chen. 2023. Context-faithful prompting for large language models. arXiv:2303.11315. Retrieved from https:\/\/arxiv.org\/abs\/2303.11315","DOI":"10.18653\/v1\/2023.findings-emnlp.968"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3742420","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,2]],"date-time":"2025-09-02T12:42:15Z","timestamp":1756816935000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3742420"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,2]]},"references-count":217,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1,31]]}},"alternative-id":["10.1145\/3742420"],"URL":"https:\/\/doi.org\/10.1145\/3742420","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,2]]},"assertion":[{"value":"2023-12-18","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-01-06","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-02","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}