{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,15]],"date-time":"2026-06-15T01:52:05Z","timestamp":1781488325640,"version":"3.54.1"},"reference-count":164,"publisher":"Association for Computing Machinery (ACM)","issue":"7","license":[{"start":{"date-parts":[[2025,3,5]],"date-time":"2025-03-05T00:00:00Z","timestamp":1741132800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000006","name":"Office of Naval Research","doi-asserted-by":"crossref","award":["N00014-23-1-2651"],"award-info":[{"award-number":["N00014-23-1-2651"]}],"id":[{"id":"10.13039\/100000006","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2025,7,31]]},"abstract":"<jats:p>Autonomous systems are soon to be ubiquitous, spanning manufacturing, agriculture, healthcare, entertainment, and other industries. Most of these systems are developed with modular sub-components for decision-making, planning, and control that may be hand-engineered or learning-based. While these approaches perform well under the situations they were specifically designed for, they can perform especially poorly in out-of-distribution scenarios that will undoubtedly arise at test-time. The rise of foundation models trained on multiple tasks with impressively large datasets has led researchers to believe that these models may provide \u201ccommon sense\u201d reasoning that existing planners are missing, bridging the gap between algorithm development and deployment. While researchers have shown promising results in deploying foundation models to decision-making tasks, these models are known to hallucinate and generate decisions that may sound reasonable but are in fact poor. 
We argue there is a need to step back and simultaneously design systems that can quantify the certainty of a model\u2019s decision and detect when it may be hallucinating. In this work, we discuss the current use cases of foundation models for decision-making tasks, provide a general definition for hallucinations with examples, discuss existing approaches to hallucination detection and mitigation with a focus on decision problems, present guidelines, and explore areas for further research in this exciting field.<\/jats:p>","DOI":"10.1145\/3716846","type":"journal-article","created":{"date-parts":[[2025,2,11]],"date-time":"2025-02-11T11:30:51Z","timestamp":1739273451000},"page":"1-35","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":33,"title":["Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art"],"prefix":"10.1145","volume":"57","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7132-6671","authenticated-orcid":false,"given":"Neeloy","family":"Chakraborty","sequence":"first","affiliation":[{"name":"University of Illinois Urbana-Champaign, Urbana, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8510-8787","authenticated-orcid":false,"given":"Melkior","family":"Ornik","sequence":"additional","affiliation":[{"name":"University of Illinois Urbana-Champaign, Urbana, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3760-9859","authenticated-orcid":false,"given":"Katherine","family":"Driggs-Campbell","sequence":"additional","affiliation":[{"name":"University of Illinois Urbana-Champaign, Urbana, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,3,5]]},"reference":[{"key":"e_1_3_2_2_2","article-title":"GPT-4 technical report","author":"Achiam 
Josh","year":"2023","unstructured":"Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et\u00a0al. 2023. GPT-4 technical report. arXiv preprint arXiv:2303.08774 (2023).","journal-title":"arXiv preprint arXiv:2303.08774"},{"key":"e_1_3_2_3_2","first-page":"520","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Anantha Raviteja","year":"2021","unstructured":"Raviteja Anantha, Svitlana Vakulenko, Zhucheng Tu, Shayne Longpre, Stephen Pulman, and Srinivas Chappidi. 2021. Open-domain question answering goes conversational via question rewriting. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 520\u2013534."},{"key":"e_1_3_2_4_2","article-title":"Learn then test: Calibrating predictive algorithms to achieve risk control","author":"Angelopoulos Anastasios N.","year":"2022","unstructured":"Anastasios N. Angelopoulos, Stephen Bates, Emmanuel J. Cand\u00e8s, Michael I. Jordan, and Lihua Lei. 2022. Learn then test: Calibrating predictive algorithms to achieve risk control. arXiv preprint arXiv:2110.01052 (2022).","journal-title":"arXiv preprint arXiv:2110.01052"},{"key":"e_1_3_2_5_2","first-page":"967","volume-title":"Findings of the Conference on Empirical Methods in Natural Language Processing","author":"Azaria Amos","year":"2023","unstructured":"Amos Azaria and Tom Mitchell. 2023. The internal state of an LLM knows when it\u2019s lying. In Findings of the Conference on Empirical Methods in Natural Language Processing. 
Association for Computational Linguistics, 967\u2013976."},{"key":"e_1_3_2_6_2","article-title":"Hallucination of multimodal large language models: A survey","author":"Bai Zechen","year":"2024","unstructured":"Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, and Mike Zheng Shou. 2024. Hallucination of multimodal large language models: A survey. arXiv preprint arXiv:2404.18930 (2024).","journal-title":"arXiv preprint arXiv:2404.18930"},{"key":"e_1_3_2_7_2","article-title":"On the opportunities and risks of foundation models","author":"Bommasani Rishi","year":"2022","unstructured":"Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, et\u00a0al. 2022. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2022).","journal-title":"arXiv preprint arXiv:2108.07258"},{"key":"e_1_3_2_8_2","doi-asserted-by":"crossref","first-page":"818","DOI":"10.1007\/s40435-020-00665-4","article-title":"A review of PID control, tuning methods and applications","volume":"9","author":"Borase Rakesh P.","year":"2021","unstructured":"Rakesh P. Borase, D. K. Maghade, S. Y. Sondkar, and S. N. Pawar. 2021. A review of PID control, tuning methods and applications. Int. J. Dynam. Contr. 9 (2021), 818\u2013827.","journal-title":"Int. J. Dynam. Contr."},{"key":"e_1_3_2_9_2","first-page":"1877","volume-title":"Proceedings of the Conference on Neural Information Processing Systems","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et\u00a0al. 2020. Language models are few-shot learners. In Proceedings of the Conference on Neural Information Processing Systems. 
Curran Associates, Inc., 1877\u20131901."},{"key":"e_1_3_2_10_2","first-page":"6101","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics","author":"Cao Shulin","year":"2022","unstructured":"Shulin Cao, Jiaxin Shi, Liangming Pan, Lunyiu Nie, Yutong Xiang, Lei Hou, Juanzi Li, Bin He, and Hanwang Zhang. 2022. KQA Pro: A dataset with explicit compositional programs for complex question answering over knowledge base. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 6101\u20136119."},{"key":"e_1_3_2_11_2","first-page":"9630","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201921)","author":"Caron Mathilde","year":"2021","unstructured":"Mathilde Caron, Hugo Touvron, Ishan Misra, Herv\u00e9 Jegou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. 2021. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201921). Institute of Electrical and Electronics Engineers, 9630\u20139640."},{"key":"e_1_3_2_12_2","first-page":"1125","volume-title":"Proceedings of the International Conference on Autonomous Agents and Multiagent Systems","author":"Chakraborty Neeloy","year":"2023","unstructured":"Neeloy Chakraborty, Aamir Hasan, Shuijing Liu, Tianchen Ji, Weihang Liang, D. Livingston McPherson, and Katherine Driggs-Campbell. 2023. Structural attention-based recurrent variational autoencoder for highway vehicle anomaly detection. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems. 
International Foundation for Autonomous Agents and Multiagent Systems, 1125\u20131134."},{"key":"e_1_3_2_13_2","article-title":"PURR: Efficiently editing language model hallucinations by denoising language model corruptions","author":"Chen Anthony","year":"2023","unstructured":"Anthony Chen, Panupong Pasupat, Sameer Singh, Hongrae Lee, and Kelvin Guu. 2023. PURR: Efficiently editing language model hallucinations by denoising language model corruptions. arXiv preprint arXiv:2305.14908 (2023).","journal-title":"arXiv preprint arXiv:2305.14908"},{"key":"e_1_3_2_14_2","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"Chen Canyu","year":"2024","unstructured":"Canyu Chen and Kai Shu. 2024. Can LLM-generated misinformation be detected? In Proceedings of the 12th International Conference on Learning Representations."},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1171"},{"key":"e_1_3_2_16_2","first-page":"14093","volume-title":"Proceedings of the IEEE International Conference on Robotics and Automation (ICRA\u201924)","author":"Chen Long","year":"2024","unstructured":"Long Chen, Oleg Sinavski, Jan H\u00fcnermann, Alice Karnsund, Andrew James Willmott, Danny Birch, Daniel Maund, and Jamie Shotton. 2024. Driving with LLMs: Fusing object-level vector modality for explainable autonomous driving. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA\u201924). Institute of Electrical and Electronics Engineers, 14093\u201314100."},{"key":"e_1_3_2_17_2","article-title":"Introspective tips: Large language model for in-context decision making","author":"Chen Liting","year":"2023","unstructured":"Liting Chen, Lu Wang, Hang Dong, Yali Du, Jie Yan, Fangkai Yang, Shuang Li, Pu Zhao, Si Qin, Saravan Rajmohan, et\u00a0al. 2023. Introspective tips: Large language model for in-context decision making. 
arXiv preprint arXiv:2305.11598 (2023).","journal-title":"arXiv preprint arXiv:2305.11598"},{"key":"e_1_3_2_18_2","article-title":"Evaluating large language models trained on code","author":"Chen Mark","year":"2021","unstructured":"Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, et\u00a0al. 2021. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374 (2021).","journal-title":"arXiv preprint arXiv:2107.03374"},{"key":"e_1_3_2_19_2","volume-title":"Proceedings of the Conference on Neural Information Processing Systems","author":"Chen Sijin","year":"2024","unstructured":"Sijin Chen, Xin Chen, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu, Fukun Yin, Zhibin Wang, Jingyi Yu, Gang Yu, et\u00a0al. 2024. MeshXL: Neural coordinate field for generative 3D foundation models. In Proceedings of the Conference on Neural Information Processing Systems."},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3583780.3614905"},{"key":"e_1_3_2_21_2","volume-title":"Proceedings of the 7th International Conference on Learning Representations","author":"Chevalier-Boisvert Maxime","year":"2019","unstructured":"Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, and Yoshua Bengio. 2019. BabyAI: First steps towards grounded language learning with a human in the loop. In Proceedings of the 7th International Conference on Learning Representations."},{"issue":"240","key":"e_1_3_2_22_2","first-page":"11324","article-title":"PaLM: Scaling language modeling with pathways","volume":"24","author":"Chowdhery Aakanksha","year":"2023","unstructured":"Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et\u00a0al. 2023. PaLM: Scaling language modeling with pathways. J. Mach. Learn. Res. 
24, 240 (2023), 11324\u201311436.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_2_23_2","article-title":"Sora detector: A unified hallucination detection for large text-to-video models","author":"Chu Zhixuan","year":"2024","unstructured":"Zhixuan Chu, Lei Zhang, Yichen Sun, Siqiao Xue, Zhibo Wang, Zhan Qin, and Kui Ren. 2024. Sora detector: A unified hallucination detection for large text-to-video models. arXiv preprint arXiv:2405.04180 (2024).","journal-title":"arXiv preprint arXiv:2405.04180"},{"key":"e_1_3_2_24_2","article-title":"Training verifiers to solve math word problems","author":"Cobbe Karl","year":"2021","unstructured":"Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et\u00a0al. 2021. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168 (2021).","journal-title":"arXiv preprint arXiv:2110.14168"},{"key":"e_1_3_2_25_2","first-page":"41","volume-title":"Proceedings of the 7th Computer Games Workshop at the 27th International Conference on Artificial Intelligence","author":"C\u00f4t\u00e9 Marc-Alexandre","year":"2019","unstructured":"Marc-Alexandre C\u00f4t\u00e9, \u00c1kos K\u00e1d\u00e1r, Xingdi Yuan, Ben Kybartas, Tavian Barnes, Emery Fine, James Moore, Matthew Hausknecht, Layla El Asri, Mahmoud Adada, et\u00a0al. 2019. TextWorld: A learning environment for text-based games. In Proceedings of the 7th Computer Games Workshop at the 27th International Conference on Artificial Intelligence. 
Springer International Publishing, 41\u201375."},{"key":"e_1_3_2_26_2","first-page":"958","volume-title":"Proceedings of the 1st Workshop on Large Language and Vision Models for Autonomous Driving at the IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV\u201924)","author":"Cui Can","year":"2024","unstructured":"Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Yang Zhou, Kaizhao Liang, Jintai Chen, Juanwu Lu, Zichong Yang, Kuei-Da Liao, et\u00a0al. 2024. A survey on multimodal large language models for autonomous driving. In Proceedings of the 1st Workshop on Large Language and Vision Models for Autonomous Driving at the IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV\u201924). Institute of Electrical and Electronics Engineers, 958\u2013979."},{"key":"e_1_3_2_27_2","first-page":"2136","volume-title":"Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics","author":"Dai Wenliang","year":"2023","unstructured":"Wenliang Dai, Zihan Liu, Ziwei Ji, Dan Su, and Pascale Fung. 2023. Plausible may not be faithful: Probing object hallucination in vision-language pre-training. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2136\u20132148."},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-78800-3_24"},{"key":"e_1_3_2_29_2","first-page":"4171","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 
Association for Computational Linguistics, 4171\u20134186."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-acl.212"},{"key":"e_1_3_2_31_2","article-title":"A survey on in-context learning","author":"Dong Qingxiu","year":"2023","unstructured":"Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Zhiyong Wu, Baobao Chang, Xu Sun, Jingjing Xu, Lei Li, and Zhifang Sui. 2023. A survey on in-context learning. arXiv preprint arXiv:2301.00234 (2023).","journal-title":"arXiv preprint arXiv:2301.00234"},{"key":"e_1_3_2_32_2","series-title":"Proceedings of Machine Learning Research","first-page":"8469","volume-title":"Proceedings of the 40th International Conference on Machine Learning","volume":"202","author":"Driess Danny","year":"2023","unstructured":"Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, et\u00a0al. 2023. PaLM-E: An embodied multimodal language model. In Proceedings of the 40th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 202). PMLR, 8469\u20138488."},{"key":"e_1_3_2_33_2","series-title":"Proceedings of Machine Learning Research","first-page":"11733","volume-title":"Proceedings of the 41st International Conference on Machine Learning","volume":"235","author":"Du Yilun","year":"2024","unstructured":"Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, and Igor Mordatch. 2024. Improving factuality and reasoning in language models through multiagent debate. In Proceedings of the 41st International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 235). PMLR, 11733\u201311763."},{"issue":"4","key":"e_1_3_2_34_2","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1093\/logcom\/10.4.583","article-title":"The logic of conflicts between decision making agents","volume":"10","author":"Ekenberg L.","year":"2000","unstructured":"L. Ekenberg. 2000. 
The logic of conflicts between decision making agents. J. Logic Computat. 10, 4 (Aug. 2000), 583\u2013602.","journal-title":"J. Logic Computat."},{"key":"e_1_3_2_35_2","article-title":"Halo: Estimation and reduction of hallucinations in open-source weak large language models","author":"Elaraby Mohamed","year":"2023","unstructured":"Mohamed Elaraby, Mengyin Lu, Jacob Dunn, Xueying Zhang, Yu Wang, Shizhu Liu, Pingchuan Tian, Yuping Wang, and Yuxuan Wang. 2023. Halo: Estimation and reduction of hallucinations in open-source weak large language models. arXiv preprint arXiv:2308.11764 (2023).","journal-title":"arXiv preprint arXiv:2308.11764"},{"key":"e_1_3_2_36_2","first-page":"1410","volume-title":"Proceedings of the European Control Conference (ECC\u201913)","author":"Formentin Simone","year":"2013","unstructured":"Simone Formentin, Klaske van Heusden, and Alireza Karimi. 2013. Model-based and data-driven model-reference control: A comparative analysis. In Proceedings of the European Control Conference (ECC\u201913). Institute of Electrical and Electronics Engineers, 1410\u20131415."},{"key":"e_1_3_2_37_2","volume-title":"Wikimedia Downloads","author":"Foundation Wikimedia","unstructured":"Wikimedia Foundation. [n. d.]. Wikimedia Downloads. Retrieved from https:\/\/dumps.wikimedia.org"},{"key":"e_1_3_2_38_2","doi-asserted-by":"crossref","first-page":"56","DOI":"10.18653\/v1\/W17-3207","volume-title":"Proceedings of the 1st Workshop on Neural Machine Translation","author":"Freitag Markus","year":"2017","unstructured":"Markus Freitag and Yaser Al-Onaizan. 2017. Beam search strategies for neural machine translation. In Proceedings of the 1st Workshop on Neural Machine Translation. 
Association for Computational Linguistics, 56\u201360."},{"key":"e_1_3_2_39_2","first-page":"910","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW\u201924)","author":"Fu Daocheng","year":"2024","unstructured":"Daocheng Fu, Xin Li, Licheng Wen, Min Dou, Pinlong Cai, Botian Shi, and Yu Qiao. 2024. Drive like a human: Rethinking autonomous driving with large language models. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW\u201924). Institute of Electrical and Electronics Engineers, 910\u2013919."},{"issue":"1","key":"e_1_3_2_40_2","first-page":"1","article-title":"Peer review of GPT-4 technical report and systems card","volume":"3","author":"Gallifant Jack","year":"2024","unstructured":"Jack Gallifant, Amelia Fiske, Yulia A. Levites Strekalova, Juan S. Osorio-Valencia, Rachael Parke, Rogers Mwavu, Nicole Martinez, Judy Wawira Gichoya, Marzyeh Ghassemi, Dina Demner-Fushman, et\u00a0al. 2024. Peer review of GPT-4 technical report and systems card. PLoS Digit. Health 3, 1 (Jan. 2024), 1\u201315.","journal-title":"PLoS Digit. Health"},{"key":"e_1_3_2_41_2","first-page":"346","article-title":"Did Aristotle use a laptop? A question answering benchmark with implicit reasoning strategies","volume":"9","author":"Geva Mor","year":"2021","unstructured":"Mor Geva, Daniel Khashabi, Elad Segal, Tushar Khot, Dan Roth, and Jonathan Berant. 2021. Did Aristotle use a laptop? A question answering benchmark with implicit reasoning strategies. Trans. Assoc. Computat. Ling. 9 (Apr. 2021), 346\u2013361.","journal-title":"Trans. Assoc. Computat. 
Ling."},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1016\/0004-3702(92)90028-V"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/4284.4286"},{"key":"e_1_3_2_44_2","first-page":"20123","volume-title":"Proceedings of the 38th AAAI Conference on Artificial Intelligence","author":"Hazra Rishi","year":"2024","unstructured":"Rishi Hazra, Pedro Zuidberg Dos Martires, and Luc De Raedt. 2024. SayCanPay: Heuristic planning with large language models using learnable domain knowledge. In Proceedings of the 38th AAAI Conference on Artificial Intelligence. AAAI Press, 20123\u201320133."},{"key":"e_1_3_2_45_2","volume-title":"Proceedings of the 8th Annual Conference on Robot Learning","author":"He Haoran","year":"2024","unstructured":"Haoran He, Peilin Wu, Chenjia Bai, Hang Lai, Lingxiao Wang, Ling Pan, Xiaolin Hu, and Weinan Zhang. 2024. Bridging the sim-to-real gap from the information bottleneck perspective. In Proceedings of the 8th Annual Conference on Robot Learning."},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11694"},{"key":"e_1_3_2_47_2","volume-title":"Proceedings of the 9th International Conference on Learning Representations","author":"Hendrycks Dan","year":"2021","unstructured":"Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. 2021. Measuring massive multitask language understanding. In Proceedings of the 9th International Conference on Learning Representations."},{"key":"e_1_3_2_48_2","volume-title":"Proceedings of the Conference on Neural Information Processing Systems","author":"Hermann Karl Moritz","year":"2015","unstructured":"Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Proceedings of the Conference on Neural Information Processing Systems. 
Curran Associates, Inc."},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2024.3360020"},{"key":"e_1_3_2_50_2","article-title":"A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions","author":"Huang Lei","year":"2023","unstructured":"Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, et\u00a0al. 2023. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. arXiv preprint arXiv:2311.05232 (2023).","journal-title":"arXiv preprint arXiv:2311.05232"},{"key":"e_1_3_2_51_2","first-page":"13418","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201924)","author":"Huang Qidong","year":"2024","unstructured":"Qidong Huang, Xiaoyi Dong, Pan Zhang, Bin Wang, Conghui He, Jiaqi Wang, Dahua Lin, Weiming Zhang, and Nenghai Yu. 2024. OPERA: Alleviating hallucination in multi-modal large language models via over-trust penalty and retrospection-allocation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201924). Institute of Electrical and Electronics Engineers, 13418\u201313427."},{"key":"e_1_3_2_52_2","article-title":"Understanding the planning of LLM agents: A survey","author":"Huang Xu","year":"2024","unstructured":"Xu Huang, Weiwen Liu, Xiaolong Chen, Xingmei Wang, Hao Wang, Defu Lian, Yasheng Wang, Ruiming Tang, and Enhong Chen. 2024. Understanding the planning of LLM agents: A survey. arXiv preprint arXiv:2402.02716 (2024).","journal-title":"arXiv preprint arXiv:2402.02716"},{"key":"e_1_3_2_53_2","first-page":"6693","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201919)","author":"Hudson Drew A.","year":"2019","unstructured":"Drew A. Hudson and Christopher D. Manning. 2019. 
GQA: A new dataset for real-world visual reasoning and compositional question answering. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201919). Institute of Electrical and Electronics Engineers, 6693\u20136702."},{"key":"e_1_3_2_54_2","series-title":"Proceedings of Machine Learning Research","first-page":"287","volume-title":"Proceedings of the 6th Conference on Robot Learning","volume":"205","author":"Ichter Brian","year":"2023","unstructured":"Brian Ichter, Anthony Brohan, Yevgen Chebotar, Chelsea Finn, Karol Hausman, Alexander Herzog, Daniel Ho, Julian Ibarz, Alex Irpan, Eric Jang, et\u00a0al. 2023. Do as I can, not as I say: Grounding language in robotic affordances. In Proceedings of the 6th Conference on Robot Learning (Proceedings of Machine Learning Research, Vol. 205). PMLR, 287\u2013318."},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1561\/0600000079"},{"key":"e_1_3_2_56_2","first-page":"944","volume-title":"Proceedings of the IEEE Military Communications Conference (MILCOM\u201923)","author":"Jha Sumit Kumar","year":"2023","unstructured":"Sumit Kumar Jha, Susmit Jha, Patrick Lincoln, Nathaniel D. Bastian, Alvaro Velasquez, Rickard Ewetz, and Sandeep Neema. 2023. Counterexample guided inductive synthesis using large language models and satisfiability solving. In Proceedings of the IEEE Military Communications Conference (MILCOM\u201923). Institute of Electrical and Electronics Engineers, 944\u2013949."},{"issue":"12","key":"e_1_3_2_57_2","first-page":"248","article-title":"Survey of hallucination in natural language generation","volume":"55","author":"Ji Ziwei","year":"2023","unstructured":"Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of hallucination in natural language generation. Comput. Surv. 55, 12, Article 248 (Mar. 2023), 38 pages.","journal-title":"Comput. 
Surv."},{"issue":"1","key":"e_1_3_2_58_2","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1038\/s41597-019-0322-0","article-title":"MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports","volume":"6","author":"Johnson Alistair E. W.","year":"2019","unstructured":"Alistair E. W. Johnson, Tom J. Pollard, Seth J. Berkowitz, Nathaniel R. Greenbaum, Matthew P. Lungren, Chih-ying Deng, Roger G. Mark, and Steven Horng. 2019. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scient. Data 6, 1 (2019), 317.","journal-title":"Scient. Data"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1147"},{"key":"e_1_3_2_60_2","volume-title":"Proceedings of the 37th Conference on Neural Information Processing Systems Datasets and Benchmarks Track","author":"Kasai Jungo","year":"2023","unstructured":"Jungo Kasai, Keisuke Sakaguchi, Yoichi Takahashi, Ronan Le Bras, Akari Asai, Xinyan Velocity Yu, Dragomir Radev, Noah A. Smith, Yejin Choi, and Kentaro Inui. 2023. RealTime QA: What\u2019s the answer right now? In Proceedings of the 37th Conference on Neural Information Processing Systems Datasets and Benchmarks Track."},{"key":"e_1_3_2_61_2","first-page":"3390","volume-title":"Proceedings of the 32nd AAAI Conference on Artificial Intelligence","author":"Kemker Ronald","year":"2018","unstructured":"Ronald Kemker, Marc McClure, Angelina Abitino, Tyler Hayes, and Christopher Kanan. 2018. Measuring catastrophic forgetting in neural networks. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence. AAAI Press, 3390\u20133398."},{"key":"e_1_3_2_62_2","article-title":"Can you text what is happening? 
Integrating pre-trained language encoders into trajectory prediction models for autonomous driving","author":"Keysan Ali","year":"2023","unstructured":"Ali Keysan, Andreas Look, Eitan Kosman, Gonca G\u00fcrsun, J\u00f6rg Wagner, Yu Yao, and Barbara Rakitsch. 2023. Can you text what is happening? Integrating pre-trained language encoders into trajectory prediction models for autonomous driving. arXiv preprint arXiv:2309.05282 (2023).","journal-title":"arXiv preprint arXiv:2309.05282"},{"key":"e_1_3_2_63_2","first-page":"274","volume-title":"Proceedings of the 11th Dialog System Technology Challenge","author":"Kim Seokhwan","year":"2023","unstructured":"Seokhwan Kim, Spandana Gella, Chao Zhao, Di Jin, Alexandros Papangelis, Behnam Hedayatnia, Yang Liu, and Dilek Z. Hakkani-Tur. 2023. Task-oriented conversational modeling with subjective knowledge track in DSTC11. In Proceedings of the 11th Dialog System Technology Challenge. Association for Computational Linguistics, 274\u2013281."},{"key":"e_1_3_2_64_2","first-page":"3992","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201923)","author":"Kirillov Alexander","year":"2023","unstructured":"Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, et\u00a0al. 2023. Segment anything. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201923). 
Institute of Electrical and Electronics Engineers, 3992\u20134003."},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0981-7"},{"key":"e_1_3_2_66_2","volume-title":"Proceedings of the \u201cNeural Conversational AI Workshop - What\u2019s left to TEACH (Trustworthy, Enhanced, Adaptable, Capable and Human-centric) chatbots?\u201d at the 40th International Conference on Machine Learning","author":"Kumar Bhawesh","year":"2023","unstructured":"Bhawesh Kumar, Charlie Lu, Gauri Gupta, Anil Palepu, David Bellamy, Ramesh Raskar, and Andrew Beam. 2023. Conformal prediction with large language models for multi-choice question answering. In Proceedings of the \u201cNeural Conversational AI Workshop - What\u2019s left to TEACH (Trustworthy, Enhanced, Adaptable, Capable and Human-centric) chatbots?\u201d at the 40th International Conference on Machine Learning."},{"key":"e_1_3_2_67_2","first-page":"452","article-title":"Natural questions: A benchmark for question answering research","volume":"7","author":"Kwiatkowski Tom","year":"2019","unstructured":"Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, et\u00a0al. 2019. Natural questions: A benchmark for question answering research. Trans. Assoc. Computat. Ling. 7 (2019), 452\u2013466.","journal-title":"Trans. Assoc. Computat. Ling."},{"key":"e_1_3_2_68_2","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"Kwon Minae","year":"2023","unstructured":"Minae Kwon, Sang Michael Xie, Kalesha Bullard, and Dorsa Sadigh. 2023. Reward design with language models. 
In Proceedings of the 11th International Conference on Learning Representations."},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1128"},{"key":"e_1_3_2_70_2","first-page":"1774","volume-title":"Findings of the 61st Annual Meeting of the Association for Computational Linguistics","author":"Li Daliang","year":"2023","unstructured":"Daliang Li, Ankit Singh Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix Yu, and Sanjiv Kumar. 2023. Large language models with controllable working memory. In Findings of the 61st Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 1774\u20131793."},{"key":"e_1_3_2_71_2","first-page":"1250","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Li Haonan","year":"2022","unstructured":"Haonan Li, Martin Tomko, Maria Vasardani, and Timothy Baldwin. 2022. MultiSpanQA: A dataset for multi-span question answering. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 1250\u20131260."},{"key":"e_1_3_2_72_2","first-page":"6449","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Li Junyi","year":"2023","unstructured":"Junyi Li, Xiaoxue Cheng, Xin Zhao, Jian-Yun Nie, and Ji-Rong Wen. 2023. HaluEval: A large-scale hallucination evaluation benchmark for large language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 
Association for Computational Linguistics, 6449\u20136464."},{"key":"e_1_3_2_73_2","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"Li Xingxuan","year":"2024","unstructured":"Xingxuan Li, Ruochen Zhao, Yew Ken Chia, Bosheng Ding, Shafiq Joty, Soujanya Poria, and Lidong Bing. 2024. Chain-of-knowledge: Grounding large language models via dynamic knowledge adapting over heterogeneous sources. In Proceedings of the 12th International Conference on Learning Representations."},{"key":"e_1_3_2_74_2","first-page":"292","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Li Yifan","year":"2023","unstructured":"Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Xin Zhao, and Ji-Rong Wen. 2023. Evaluating object hallucination in large vision-language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 292\u2013305."},{"key":"e_1_3_2_75_2","first-page":"9493","volume-title":"Proceedings of the IEEE International Conference on Robotics and Automation (ICRA\u201923)","author":"Liang Jacky","year":"2023","unstructured":"Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, and Andy Zeng. 2023. Code as policies: Language model programs for embodied control. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA\u201923). Institute of Electrical and Electronics Engineers, 9493\u20139500."},{"key":"e_1_3_2_76_2","article-title":"Introspective planning: Guiding language-enabled agents to refine their own uncertainty","author":"Liang Kaiqu","year":"2024","unstructured":"Kaiqu Liang, Zixu Zhang, and Jaime Fern\u00e1ndez Fisac. 2024. Introspective planning: Guiding language-enabled agents to refine their own uncertainty. 
arXiv preprint arXiv:2402.06529 (2024).","journal-title":"arXiv preprint arXiv:2402.06529"},{"key":"e_1_3_2_77_2","article-title":"Addressing image hallucination in text-to-image generation through factual image retrieval","author":"Lim Youngsun","year":"2024","unstructured":"Youngsun Lim and Hyunjung Shim. 2024. Addressing image hallucination in text-to-image generation through factual image retrieval. arXiv preprint arXiv:2407.10683 (2024).","journal-title":"arXiv preprint arXiv:2407.10683"},{"key":"e_1_3_2_78_2","first-page":"1","article-title":"Teaching models to express their uncertainty in words","author":"Lin Stephanie","year":"2022","unstructured":"Stephanie Lin, Jacob Hilton, and Owain Evans. 2022. Teaching models to express their uncertainty in words. Trans. Mach. Learn. Res. (2022), 1\u201319. https:\/\/openreview.net\/pdf?id=8s8K2UZGTZ","journal-title":"Trans. Mach. Learn. Res."},{"key":"e_1_3_2_79_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.229"},{"key":"e_1_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_81_2","article-title":"A survey on hallucination in large vision-language models","author":"Liu Hanchao","year":"2024","unstructured":"Hanchao Liu, Wenyuan Xue, Yifei Chen, Dapeng Chen, Xiutian Zhao, Ke Wang, Liping Hou, Rongjun Li, and Wei Peng. 2024. A survey on hallucination in large vision-language models. arXiv preprint arXiv:2402.00253 (2024).","journal-title":"arXiv preprint arXiv:2402.00253"},{"key":"e_1_3_2_82_2","first-page":"5154","volume-title":"Proceedings of the IEEE International Conference on Intelligent Transportation Systems (ITSC\u201923)","author":"Liu Jiaqi","year":"2023","unstructured":"Jiaqi Liu, Peng Hang, Xiao Qi, Jianqiang Wang, and Jian Sun. 2023. MTD-GPT: A multi-task decision-making GPT model for autonomous driving at unsignalized intersections. In Proceedings of the IEEE International Conference on Intelligent Transportation Systems (ITSC\u201923). 
Institute of Electrical and Electronics Engineers, 5154\u20135161."},{"key":"e_1_3_2_83_2","first-page":"6723","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics","author":"Liu Tianyu","year":"2022","unstructured":"Tianyu Liu, Yizhe Zhang, Chris Brockett, Yi Mao, Zhifang Sui, Weizhu Chen, and Bill Dolan. 2022. A token-level reference-free hallucination detection benchmark for free-form text generation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 6723\u20136737."},{"key":"e_1_3_2_84_2","first-page":"2507","volume-title":"Proceedings of the Conference on Neural Information Processing Systems","author":"Lu Pan","year":"2022","unstructured":"Pan Lu, Swaroop Mishra, Tanglin Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, and Ashwin Kalyan. 2022. Learn to explain: Multimodal reasoning via thought chains for science question answering. In Proceedings of the Conference on Neural Information Processing Systems. Curran Associates, Inc., 2507\u20132521."},{"key":"e_1_3_2_85_2","first-page":"14032","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics","author":"Malaviya Chaitanya","year":"2023","unstructured":"Chaitanya Malaviya, Peter Shaw, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2023. QUEST: A retrieval dataset of entity-seeking queries with implicit set operations. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 14032\u201314047."},{"key":"e_1_3_2_86_2","first-page":"9004","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Manakul Potsawee","year":"2023","unstructured":"Potsawee Manakul, Adian Liusie, and Mark Gales. 2023. 
SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 9004\u20139017."},{"key":"e_1_3_2_87_2","volume-title":"Proceedings of the 1st Conference on Language Modeling","author":"Mao Jiageng","year":"2024","unstructured":"Jiageng Mao, Junjie Ye, Yuxi Qian, Marco Pavone, and Yue Wang. 2024. A language agent for autonomous driving. In Proceedings of the 1st Conference on Language Modeling."},{"key":"e_1_3_2_88_2","article-title":"FLIRT: Feedback loop in-context red teaming","author":"Mehrabi Ninareh","year":"2023","unstructured":"Ninareh Mehrabi, Palash Goyal, Christophe Dupuy, Qian Hu, Shalini Ghosh, Richard Zemel, Kai-Wei Chang, Aram Galstyan, and Rahul Gupta. 2023. FLIRT: Feedback loop in-context red teaming. arXiv preprint arXiv:2308.04265 (2023).","journal-title":"arXiv preprint arXiv:2308.04265"},{"key":"e_1_3_2_89_2","article-title":"Model information","year":"2024","unstructured":"Meta. 2024. Model information. GitHub. Retrieved from https:\/\/github.com\/meta-llama\/llama-models\/blob\/main\/models\/llama3_1\/MODEL_CARD.md","journal-title":"GitHub"},{"key":"e_1_3_2_90_2","first-page":"1","article-title":"Augmented language models: A survey","author":"Mialon Gr\u00e9goire","year":"2023","unstructured":"Gr\u00e9goire Mialon, Roberto Dessi, Maria Lomeli, Christoforos Nalmpantis, Ramakanth Pasunuru, Roberta Raileanu, Baptiste Roziere, Timo Schick, Jane Dwivedi-Yu, Asli Celikyilmaz, et\u00a0al. 2023. Augmented language models: A survey. Trans. Mach. Learn. Res. (2023), 1\u201335. https:\/\/openreview.net\/pdf?id=jh7wH2AzKK","journal-title":"Trans. Mach. Learn. 
Res."},{"key":"e_1_3_2_91_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.741"},{"key":"e_1_3_2_92_2","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"M\u00fcndler Niels","year":"2024","unstructured":"Niels M\u00fcndler, Jingxuan He, Slobodan Jenko, and Martin Vechev. 2024. Self-contradictory hallucinations of large language models: Evaluation, detection and mitigation. In Proceedings of the 12th International Conference on Learning Representations."},{"key":"e_1_3_2_93_2","article-title":"Introducing ChatGPT","year":"2022","unstructured":"OpenAI. 2022. Introducing ChatGPT. OpenAI blog. Retrieved from https:\/\/openai.com\/blog\/chatgpt","journal-title":"OpenAI blog"},{"key":"e_1_3_2_94_2","article-title":"DALL \\(\\cdot\\) E 3 system card","year":"2023","unstructured":"OpenAI. 2023. DALL \\(\\cdot\\) E 3 system card. OpenAI Blog. Retrieved from https:\/\/cdn.openai.com\/papers\/DALL_E_3_System_Card.pdf","journal-title":"OpenAI Blog"},{"key":"e_1_3_2_95_2","first-page":"1","article-title":"DINOv2: Learning robust visual features without supervision","author":"Oquab Maxime","year":"2024","unstructured":"Maxime Oquab, Timoth\u00e9e Darcet, Th\u00e9o Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et\u00a0al. 2024. DINOv2: Learning robust visual features without supervision. Trans. Mach. Learn. Res. (2024), 1\u201332. https:\/\/openreview.net\/pdf?id=a68SUt6zFt","journal-title":"Trans. Mach. Learn. Res."},{"key":"e_1_3_2_96_2","first-page":"313","volume-title":"Foundation Models for Speech, Images, Videos, and Control","author":"Paa\u00df Gerhard","year":"2023","unstructured":"Gerhard Paa\u00df and Sven Giesselbach. 2023. Foundation Models for Speech, Images, Videos, and Control. 
Springer International Publishing, Cham, 313\u2013382."},{"key":"e_1_3_2_97_2","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"Pacchiardi Lorenzo","year":"2024","unstructured":"Lorenzo Pacchiardi, Alex James Chan, S\u00f6ren Mindermann, Ilan Moscovitz, Alexa Yue Pan, Yarin Gal, Owain Evans, and Jan M. Brauner. 2024. How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions. In Proceedings of the 12th International Conference on Learning Representations."},{"key":"e_1_3_2_98_2","first-page":"311","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 311\u2013318."},{"issue":"2","key":"e_1_3_2_99_2","doi-asserted-by":"crossref","first-page":"1059","DOI":"10.1109\/LRA.2023.3338514","article-title":"CLARA: Classifying and disambiguating user commands for reliable interactive robotic agents","volume":"9","author":"Park Jeongeun","year":"2024","unstructured":"Jeongeun Park, Seungwon Lim, Joonhyung Lee, Sangbeom Park, Minsuk Chang, Youngjae Yu, and Sungjoon Choi. 2024. CLARA: Classifying and disambiguating user commands for reliable interactive robotic agents. IEEE Robot. Autom. Lett. 9, 2 (2024), 1059\u20131066.","journal-title":"IEEE Robot. Autom. 
Lett."},{"key":"e_1_3_2_100_2","doi-asserted-by":"publisher","DOI":"10.1145\/3586183.3606763"},{"key":"e_1_3_2_101_2","article-title":"Check your facts and try again: Improving large language models with external knowledge and automated feedback","author":"Peng Baolin","year":"2023","unstructured":"Baolin Peng, Michel Galley, Pengcheng He, Hao Cheng, Yujia Xie, Yu Hu, Qiuyuan Huang, Lars Liden, Zhou Yu, Weizhu Chen, et\u00a0al. 2023. Check your facts and try again: Improving large language models with external knowledge and automated feedback. arXiv preprint arXiv:2302.12813 (2023).","journal-title":"arXiv preprint arXiv:2302.12813"},{"key":"e_1_3_2_102_2","first-page":"8494","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201918)","author":"Puig Xavier","year":"2018","unstructured":"Xavier Puig, Kevin Ra, Marko Boben, Jiaman Li, Tingwu Wang, Sanja Fidler, and Antonio Torralba. 2018. VirtualHome: Simulating household activities via programs. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201918). Institute of Electrical and Electronics Engineers, 8494\u20138502."},{"key":"e_1_3_2_103_2","article-title":"A moral imperative: The need for continual superalignment of large language models","author":"Puthumanaillam Gokul","year":"2024","unstructured":"Gokul Puthumanaillam, Manav Vora, Pranay Thangeda, and Melkior Ornik. 2024. A moral imperative: The need for continual superalignment of large language models. arXiv preprint arXiv:2403.14683 (2024).","journal-title":"arXiv preprint arXiv:2403.14683"},{"key":"e_1_3_2_104_2","article-title":"Latent jailbreak: A benchmark for evaluating text safety and output robustness of large language models","author":"Qiu Huachuan","year":"2023","unstructured":"Huachuan Qiu, Shuai Zhang, Anqi Li, Hongliang He, and Zhenzhong Lan. 2023. 
Latent jailbreak: A benchmark for evaluating text safety and output robustness of large language models. arXiv preprint arXiv:2307.08487 (2023).","journal-title":"arXiv preprint arXiv:2307.08487"},{"key":"e_1_3_2_105_2","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"Quach Victor","year":"2024","unstructured":"Victor Quach, Adam Fisch, Tal Schuster, Adam Yala, Jae Ho Sohn, Tommi S. Jaakkola, and Regina Barzilay. 2024. Conformal language modeling. In Proceedings of the 12th International Conference on Learning Representations."},{"key":"e_1_3_2_106_2","series-title":"Proceedings of Machine Learning Research","first-page":"8748","volume-title":"Proceedings of the 38th International Conference on Machine Learning","volume":"139","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et\u00a0al. 2021. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning(Proceedings of Machine Learning Research, Vol. 139). PMLR, 8748\u20138763."},{"key":"e_1_3_2_107_2","article-title":"Language models are unsupervised multitask learners","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog. Retrieved from https:\/\/openai.com\/research\/better-language-models","journal-title":"OpenAI Blog"},{"key":"e_1_3_2_108_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1264"},{"key":"e_1_3_2_109_2","first-page":"5422","volume-title":"Findings of the Conference on Empirical Methods in Natural Language Processing","author":"Ramakrishna Anil","year":"2023","unstructured":"Anil Ramakrishna, Rahul Gupta, Jens Lehmann, and Morteza Ziyadi. 2023. 
INVITE: A testbed of automatically generated invalid questions to evaluate large language models for hallucinations. In Findings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 5422\u20135429."},{"key":"e_1_3_2_110_2","article-title":"A survey of hallucination in large foundation models","author":"Rawte Vipula","year":"2023","unstructured":"Vipula Rawte, Amit Sheth, and Amitava Das. 2023. A survey of hallucination in large foundation models. arXiv preprint arXiv:2309.05922 (2023).","journal-title":"arXiv preprint arXiv:2309.05922"},{"key":"e_1_3_2_111_2","series-title":"Proceedings of Machine Learning Research","first-page":"661","volume-title":"Proceedings of the 7th Conference on Robot Learning","volume":"229","author":"Ren Allen Z.","year":"2023","unstructured":"Allen Z. Ren, Anushri Dixit, Alexandra Bodrova, Sumeet Singh, Stephen Tu, Noah Brown, Peng Xu, Leila Takayama, Fei Xia, Jake Varley, et\u00a0al. 2023. Robots that ask for help: Uncertainty alignment for large language model planners. In Proceedings of the 7th Conference on Robot Learning(Proceedings of Machine Learning Research, Vol. 229). PMLR, 661\u2013682."},{"key":"e_1_3_2_112_2","first-page":"11709","volume-title":"Findings of the Conference on Empirical Methods in Natural Language Processing","author":"Sahoo Pranab","year":"2024","unstructured":"Pranab Sahoo, Prabhash Meharia, Akash Ghosh, Sriparna Saha, Vinija Jain, and Aman Chadha. 2024. A comprehensive survey of hallucination in large language, image, video and audio foundation models. In Findings of the Conference on Empirical Methods in Natural Language Processing. 
Association for Computational Linguistics, 11709\u201311724."},{"key":"e_1_3_2_113_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3126658"},{"key":"e_1_3_2_114_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02157"},{"key":"e_1_3_2_115_2","first-page":"8038","volume-title":"Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS\u201923)","author":"Schreiber Andre","year":"2023","unstructured":"Andre Schreiber, Tianchen Ji, D. Livingston McPherson, and Katherine Driggs-Campbell. 2023. An attentional recurrent neural network for occlusion-aware proactive anomaly detection in field robot navigation. In Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS\u201923). Institute of Electrical and Electronics Engineers, 8038\u20138045."},{"key":"e_1_3_2_116_2","article-title":"LAION-400M: Open dataset of CLIP-filtered 400 million image-text pairs","author":"Schuhmann Christoph","year":"2021","unstructured":"Christoph Schuhmann, Richard Vencu, Romain Beaumont, Robert Kaczmarczyk, Clayton Mullis, Aarush Katta, Theo Coombes, Jenia Jitsev, and Aran Komatsuzaki. 2021. LAION-400M: Open dataset of CLIP-filtered 400 million image-text pairs. arXiv preprint arXiv:2111.02114 (2021).","journal-title":"arXiv preprint arXiv:2111.02114"},{"key":"e_1_3_2_117_2","first-page":"146","volume-title":"Proceedings of the 17th European Conference on Computer Vision (ECCV\u201922)","author":"Schwenk Dustin","year":"2022","unstructured":"Dustin Schwenk, Apoorv Khandelwal, Christopher Clark, Kenneth Marino, and Roozbeh Mottaghi. 2022. A-OKVQA: A benchmark for visual question answering using world knowledge. In Proceedings of the 17th European Conference on Computer Vision (ECCV\u201922). 
Springer Nature Switzerland, 146\u2013162."},{"issue":"12","key":"e_1_3_2_118_2","first-page":"371","article-title":"A tutorial on conformal prediction","volume":"9","author":"Shafer Glenn","year":"2008","unstructured":"Glenn Shafer and Vladimir Vovk. 2008. A tutorial on conformal prediction. J. Mach. Learn. Res. 9, 12 (2008), 371\u2013421.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_2_119_2","series-title":"Proceedings of Machine Learning Research","first-page":"492","volume-title":"Proceedings of the 6th Conference on Robot Learning","volume":"205","author":"Shah Dhruv","year":"2023","unstructured":"Dhruv Shah, B\u0142a\u017cej Osi\u0144ski, Brian Ichter, and Sergey Levine. 2023. LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action. In Proceedings of the 6th Conference on Robot Learning(Proceedings of Machine Learning Research, Vol. 205). PMLR, 492\u2013504."},{"key":"e_1_3_2_120_2","first-page":"8634","volume-title":"Proceedings of the Conference on Neural Information Processing Systems","author":"Shinn Noah","year":"2023","unstructured":"Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. 2023. Reflexion: Language agents with verbal reinforcement learning. In Proceedings of the Conference on Neural Information Processing Systems. Curran Associates, Inc., 8634\u20138652."},{"key":"e_1_3_2_121_2","article-title":"DriveLM: Driving with graph visual question answering","author":"Sima Chonghao","year":"2023","unstructured":"Chonghao Sima, Katrin Renz, Kashyap Chitta, Li Chen, Hanxue Zhang, Chengen Xie, Ping Luo, Andreas Geiger, and Hongyang Li. 2023. DriveLM: Driving with graph visual question answering. 
arXiv preprint arXiv:2312.14150 (2023).","journal-title":"arXiv preprint arXiv:2312.14150"},{"key":"e_1_3_2_122_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2021.104236"},{"key":"e_1_3_2_123_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2024.3411928"},{"key":"e_1_3_2_124_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cogr.2023.04.001"},{"key":"e_1_3_2_125_2","first-page":"1","article-title":"Beyond the imitation game: Quantifying and extrapolating the capabilities of language models","author":"Srivastava Aarohi","year":"2023","unstructured":"Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adri\u00e0 Garriga-Alonso, et\u00a0al. 2023. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. Trans. Mach. Learn. Res. (2023), 1\u201395. https:\/\/openreview.net\/pdf?id=uyTL5Bvosj","journal-title":"Trans. Mach. Learn. Res."},{"issue":"5","key":"e_1_3_2_126_2","doi-asserted-by":"crossref","first-page":"763","DOI":"10.1587\/transfun.2021WBI0002","article-title":"Current status and issues of traffic light recognition technology in autonomous driving system","volume":"105","author":"Suganuma Naoki","year":"2022","unstructured":"Naoki Suganuma and Keisuke Yoneda. 2022. Current status and issues of traffic light recognition technology in autonomous driving system. IEICE Trans. Fundam. Electron., Commun. Comput. Sci. E105.A, 5 (2022), 763\u2013769.","journal-title":"IEICE Trans. Fundam. Electron., Commun. Comput. Sci."},{"key":"e_1_3_2_127_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1074"},{"key":"e_1_3_2_128_2","article-title":"Cambrian-1: A fully open, vision-centric exploration of multimodal LLMs","author":"Tong Shengbang","year":"2024","unstructured":"Shengbang Tong, Ellis Brown, Penghao Wu, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, et\u00a0al. 
2024. Cambrian-1: A fully open, vision-centric exploration of multimodal LLMs. arXiv preprint arXiv:2406.16860 (2024).","journal-title":"arXiv preprint arXiv:2406.16860"},{"key":"e_1_3_2_129_2","article-title":"LLaMA: Open and efficient foundation language models","author":"Touvron Hugo","year":"2023","unstructured":"Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth\u00e9e Lacroix, Baptiste Rozi\u00e8re, Naman Goyal, Eric Hambro, Faisal Azhar, et\u00a0al. 2023. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023).","journal-title":"arXiv preprint arXiv:2302.13971"},{"key":"e_1_3_2_130_2","article-title":"Llama 2: Open foundation and fine-tuned chat models","author":"Touvron Hugo","year":"2023","unstructured":"Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et\u00a0al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023).","journal-title":"arXiv preprint arXiv:2307.09288"},{"key":"e_1_3_2_131_2","first-page":"210","volume-title":"Proceedings of the International Semantic Web Conference","author":"Trivedi Priyansh","year":"2017","unstructured":"Priyansh Trivedi, Gaurav Maheshwari, Mohnish Dubey, and Jens Lehmann. 2017. LC-QuAD: A corpus for complex question answering over knowledge graphs. In Proceedings of the International Semantic Web Conference. Springer International Publishing, 210\u2013218."},{"key":"e_1_3_2_132_2","first-page":"95","volume-title":"Proceedings of the Student Research Workshop at the 18th Conference of the European Chapter of the Association for Computational Linguistics","author":"Uluoglakci Cem","year":"2024","unstructured":"Cem Uluoglakci and Tugba Temizel. 2024. HypoTermQA: Hypothetical terms dataset for benchmarking hallucination tendency of LLMs. 
In Proceedings of the Student Research Workshop at the 18th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 95\u2013136."},{"key":"e_1_3_2_133_2","article-title":"A stitch in time saves nine: Detecting and mitigating hallucinations of LLMs by validating low-confidence generation","author":"Varshney Neeraj","year":"2023","unstructured":"Neeraj Varshney, Wenlin Yao, Hongming Zhang, Jianshu Chen, and Dong Yu. 2023. A stitch in time saves nine: Detecting and mitigating hallucinations of LLMs by validating low-confidence generation. arXiv preprint arXiv:2307.03987 (2023).","journal-title":"arXiv preprint arXiv:2307.03987"},{"key":"e_1_3_2_134_2","volume-title":"Proceedings of the 37th Conference on Neural Information Processing Systems Datasets and Benchmarks Track","author":"Wang Boxin","year":"2023","unstructured":"Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, et\u00a0al. 2023. DecodingTrust: A comprehensive assessment of trustworthiness in GPT models. In Proceedings of the 37th Conference on Neural Information Processing Systems Datasets and Benchmarks Track."},{"key":"e_1_3_2_135_2","article-title":"Voyager: An open-ended embodied agent with large language models","author":"Wang Guanzhi","year":"2024","unstructured":"Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2024. Voyager: An open-ended embodied agent with large language models. Trans. Mach. Learn. Res. (2024), 1\u201344. https:\/\/openreview.net\/pdf?id=ehfRiF0R3a","journal-title":"Trans. Mach. Learn. Res."},{"key":"e_1_3_2_136_2","volume-title":"Proceedings of the Conference on Neural Information Processing Systems","author":"Wang Hongbo","year":"2024","unstructured":"Hongbo Wang, Jie Cao, Jin Liu, Xiaoqiang Zhou, Huaibo Huang, and Ran He. 2024. 
Hallo3D: Multi-modal hallucination detection and mitigation for consistent 3D content generation. In Proceedings of the Conference on Neural Information Processing Systems."},{"key":"e_1_3_2_137_2","article-title":"Conformal temporal logic planning using large language models","author":"Wang Jun","year":"2024","unstructured":"Jun Wang, Jiaming Tong, Kaiyuan Tan, Yevgeniy Vorobeychik, and Yiannis Kantaros. 2024. Conformal temporal logic planning using large language models. arXiv preprint arXiv:2309.10092 (2024).","journal-title":"arXiv preprint arXiv:2309.10092"},{"key":"e_1_3_2_138_2","first-page":"24824","volume-title":"Proceedings of the Conference on Neural Information Processing Systems","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-thought prompting elicits reasoning in large language models. In Proceedings of the Conference on Neural Information Processing Systems. Curran Associates, Inc., 24824\u201324837."},{"key":"e_1_3_2_139_2","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"Wen Licheng","year":"2024","unstructured":"Licheng Wen, Daocheng Fu, Xin Li, Xinyu Cai, Tao M. A., Pinlong Cai, Min Dou, Botian Shi, Liang He, and Yu Qiao. 2024. DiLu: A knowledge-driven approach to autonomous driving with large language models. In Proceedings of the 12th International Conference on Learning Representations."},{"key":"e_1_3_2_140_2","volume-title":"Proceedings of the Workshop on Large Language Model (LLM) Agents at the 12th International Conference on Learning Representations","author":"Wen Licheng","year":"2024","unstructured":"Licheng Wen, Xuemeng Yang, Daocheng Fu, Xiaofeng Wang, Pinlong Cai, Xin Li, Tao M. A., Yingxuan Li, Linran XU, Dengke Shang, et\u00a0al. 2024. On the road with GPT-4V(ision): Explorations of utilizing visual-language model as autonomous driving agent. 
In Proceedings of the Workshop on Large Language Model (LLM) Agents at the 12th International Conference on Learning Representations."},{"key":"e_1_3_2_141_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992696"},{"key":"e_1_3_2_142_2","article-title":"Language prompt for autonomous driving","author":"Wu Dongming","year":"2023","unstructured":"Dongming Wu, Wencheng Han, Tiancai Wang, Yingfei Liu, Xiangyu Zhang, and Jianbing Shen. 2023. Language prompt for autonomous driving. arXiv preprint arXiv:2309.04379 (2023).","journal-title":"arXiv preprint arXiv:2309.04379"},{"key":"e_1_3_2_143_2","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"Xiong Miao","year":"2024","unstructured":"Miao Xiong, Zhiyuan Hu, Xinyang Lu, Yifei Li, Jie Fu, Junxian He, and Bryan Hooi. 2024. Can LLMs express their uncertainty? An empirical evaluation of confidence elicitation in LLMs. In Proceedings of the 12th International Conference on Learning Representations."},{"issue":"10","key":"e_1_3_2_144_2","doi-asserted-by":"crossref","first-page":"8186","DOI":"10.1109\/LRA.2024.3440097","article-title":"DriveGPT4: Interpretable end-to-end autonomous driving via large language model","volume":"9","author":"Xu Zhenhua","year":"2024","unstructured":"Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee K. Wong, Zhenguo Li, and Hengshuang Zhao. 2024. DriveGPT4: Interpretable end-to-end autonomous driving via large language model. IEEE Robot. Autom. Lett. 9, 10 (2024), 8186\u20138193.","journal-title":"IEEE Robot. Autom. Lett."},{"key":"e_1_3_2_145_2","article-title":"Harnessing the power of LLMs in practice: A survey on ChatGPT and beyond","author":"Yang Jingfeng","year":"2024","unstructured":"Jingfeng Yang, Hongye Jin, Ruixiang Tang, Xiaotian Han, Qizhang Feng, Haoming Jiang, Shaochen Zhong, Bing Yin, and Xia Hu. 2024. Harnessing the power of LLMs in practice: A survey on ChatGPT and beyond. ACM Trans. Knowl. Discov. 
Data 18, 6 (2024), 1\u201332.","journal-title":"ACM Trans. Knowl. Discov. Data"},{"key":"e_1_3_2_146_2","volume-title":"Proceedings of the Conference on Neural Information Processing Systems","author":"Yang Yuqing","year":"2024","unstructured":"Yuqing Yang, Ethan Chern, Xipeng Qiu, Graham Neubig, and Pengfei Liu. 2024. Alignment for honesty. In Proceedings of the Conference on Neural Information Processing Systems."},{"key":"e_1_3_2_147_2","volume-title":"Proceedings of the Workshop on Open-world Agents at the Conference on Neural Information Processing Systems","author":"Yang Zhenjie","year":"2024","unstructured":"Zhenjie Yang, Xiaosong Jia, Hongyang Li, and Junchi Yan. 2024. LLM4Drive: A survey of large language models for autonomous driving. In Proceedings of the Workshop on Open-world Agents at the Conference on Neural Information Processing Systems."},{"key":"e_1_3_2_148_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1259"},{"key":"e_1_3_2_149_2","article-title":"LLM lies: Hallucinations are not bugs, but features as adversarial examples","author":"Yao Jia-Yu","year":"2023","unstructured":"Jia-Yu Yao, Kun-Peng Ning, Zhen-Hui Liu, Mu-Nan Ning, and Li Yuan. 2023. LLM lies: Hallucinations are not bugs, but features as adversarial examples. arXiv preprint arXiv:2310.01469 (2023).","journal-title":"arXiv preprint arXiv:2310.01469"},{"key":"e_1_3_2_150_2","article-title":"Cognitive mirage: A review of hallucinations in large language models","author":"Ye Hongbin","year":"2023","unstructured":"Hongbin Ye, Tong Liu, Aijia Zhang, Wei Hua, and Weiqiang Jia. 2023. Cognitive mirage: A review of hallucinations in large language models. 
arXiv preprint arXiv:2309.06794 (2023).","journal-title":"arXiv preprint arXiv:2309.06794"},{"key":"e_1_3_2_151_2","first-page":"9333","volume-title":"Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics","author":"Yehuda Yakir","year":"2024","unstructured":"Yakir Yehuda, Itzik Malkiel, Oren Barkan, Jonathan Weill, Royi Ronen, and Noam Koenigstein. 2024. InterrogateLLM: Zero-resource hallucination detection in LLM-generated answers. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 9333\u20139347."},{"key":"e_1_3_2_152_2","article-title":"Dialog system technology challenge 7","author":"Yoshino Koichiro","year":"2019","unstructured":"Koichiro Yoshino, Chiori Hori, Julien Perez, Luis Fernando D\u2019Haro, Lazaros Polymenakos, Chulaka Gunasekara, Walter S. Lasecki, Jonathan K. Kummerfeld, Michel Galley, Chris Brockett, et\u00a0al. 2019. Dialog system technology challenge 7. arXiv preprint arXiv:1901.03461 (2019).","journal-title":"arXiv preprint arXiv:1901.03461"},{"key":"e_1_3_2_153_2","article-title":"LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop","author":"Yu Fisher","year":"2015","unstructured":"Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. 2015. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365 (2015).","journal-title":"arXiv preprint arXiv:1506.03365"},{"key":"e_1_3_2_154_2","first-page":"1333","volume-title":"Findings of the Conference of the North American Chapter of the Association for Computational Linguistics","author":"Yu Xiaodong","year":"2024","unstructured":"Xiaodong Yu, Hao Cheng, Xiaodong Liu, Dan Roth, and Jianfeng Gao. 2024. ReEval: Automatic hallucination evaluation for retrieval-augmented large language models via transferable adversarial attacks. 
In Findings of the Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 1333\u20131351."},{"key":"e_1_3_2_155_2","series-title":"Proceedings of Machine Learning Research","first-page":"726","volume-title":"Proceedings of the 4th Conference on Robot Learning","volume":"155","author":"Zeng Andy","year":"2021","unstructured":"Andy Zeng, Pete Florence, Jonathan Tompson, Stefan Welker, Jonathan Chien, Maria Attarian, Travis Armstrong, Ivan Krasin, Dan Duong, Vikas Sindhwani, et\u00a0al. 2021. Transporter networks: Rearranging the visual world for robotic manipulation. In Proceedings of the 4th Conference on Robot Learning (Proceedings of Machine Learning Research, Vol. 155). PMLR, 726\u2013747."},{"key":"e_1_3_2_156_2","article-title":"Large language models for robotics: A survey","author":"Zeng Fanlong","year":"2023","unstructured":"Fanlong Zeng, Wensheng Gan, Yongheng Wang, Ning Liu, and Philip S. Yu. 2023. Large language models for robotics: A survey. arXiv preprint arXiv:2311.07226 (2023).","journal-title":"arXiv preprint arXiv:2311.07226"},{"key":"e_1_3_2_157_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.birob.2023.100131"},{"key":"e_1_3_2_158_2","series-title":"Proceedings of Machine Learning Research","first-page":"59670","volume-title":"Proceedings of the 41st International Conference on Machine Learning","volume":"235","author":"Zhang Muru","year":"2024","unstructured":"Muru Zhang, Ofir Press, William Merrill, Alisa Liu, and Noah A. Smith. 2024. How language model hallucinations can snowball. In Proceedings of the 41st International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 235). 
PMLR, 59670\u201359684."},{"key":"e_1_3_2_159_2","first-page":"2025","volume-title":"Findings of the 62nd Annual Meeting of the Association for Computational Linguistics","author":"Zhang Shuo","year":"2024","unstructured":"Shuo Zhang, Liangming Pan, Junzhou Zhao, and William Yang Wang. 2024. The knowledge alignment problem: Bridging human and external knowledge for large language models. In Findings of the 62nd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2025\u20132038."},{"key":"e_1_3_2_160_2","volume-title":"Proceedings of the 8th International Conference on Learning Representations","author":"Zhang Tianyi","year":"2020","unstructured":"Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2020. BERTScore: Evaluating text generation with BERT. In Proceedings of the 8th International Conference on Learning Representations."},{"key":"e_1_3_2_161_2","article-title":"Siren\u2019s song in the AI Ocean: A survey on hallucination in large language models","author":"Zhang Yue","year":"2023","unstructured":"Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, et\u00a0al. 2023. Siren\u2019s song in the AI Ocean: A survey on hallucination in large language models. arXiv preprint arXiv:2309.01219 (2023).","journal-title":"arXiv preprint arXiv:2309.01219"},{"key":"e_1_3_2_162_2","first-page":"737","volume-title":"Proceedings of the IEEE Symposium Series on Computational Intelligence (SSCI\u201920)","author":"Zhao Wenshuai","year":"2020","unstructured":"Wenshuai Zhao, Jorge Pe\u00f1a Queralta, and Tomi Westerlund. 2020. Sim-to-real transfer in deep reinforcement learning for robotics: A survey. In Proceedings of the IEEE Symposium Series on Computational Intelligence (SSCI\u201920). 
737\u2013744."},{"key":"e_1_3_2_163_2","article-title":"A survey of large language models","author":"Zhao Wayne Xin","year":"2023","unstructured":"Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et\u00a0al. 2023. A survey of large language models. arXiv preprint arXiv:2303.18223 (2023).","journal-title":"arXiv preprint arXiv:2303.18223"},{"key":"e_1_3_2_164_2","article-title":"A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT","author":"Zhou Ce","year":"2023","unstructured":"Ce Zhou, Qian Li, Chen Li, Jun Yu, Yixin Liu, Guangjing Wang, Kai Zhang, Cheng Ji, Qiben Yan, Lifang He, et\u00a0al. 2023. A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT. arXiv preprint arXiv:2302.09419 (2023).","journal-title":"arXiv preprint arXiv:2302.09419"},{"key":"e_1_3_2_165_2","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"Zhou Yiyang","year":"2024","unstructured":"Yiyang Zhou, Chenhang Cui, Jaehong Yoon, Linjun Zhang, Zhun Deng, Chelsea Finn, Mohit Bansal, and Huaxiu Yao. 2024. Analyzing and mitigating object hallucination in large vision-language models. 
In Proceedings of the 12th International Conference on Learning Representations."}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3716846","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3716846","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3716846","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:53Z","timestamp":1750295933000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3716846"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,5]]},"references-count":164,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2025,7,31]]}},"alternative-id":["10.1145\/3716846"],"URL":"https:\/\/doi.org\/10.1145\/3716846","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,5]]},"assertion":[{"value":"2024-04-29","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-03","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-05","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}