{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,17]],"date-time":"2026-06-17T16:28:09Z","timestamp":1781713689598,"version":"3.54.5"},"reference-count":37,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,9,30]],"date-time":"2025-09-30T00:00:00Z","timestamp":1759190400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:p><jats:bold>Hallucination<\/jats:bold> in Large Language Models (LLMs) refers to outputs that appear fluent and coherent but are factually incorrect, logically inconsistent, or entirely fabricated. As LLMs are increasingly deployed in education, healthcare, law, and scientific research, understanding and mitigating hallucinations has become critical. In this work, we present a comprehensive survey and empirical analysis of hallucination <jats:italic>attribution<\/jats:italic> in LLMs. We introduce a novel framework to determine whether a given hallucination stems from suboptimal prompting or from the model's intrinsic behavior. We evaluate state-of-the-art LLMs\u2014including GPT-4, LLaMA 2, DeepSeek, and others\u2014under various controlled prompting conditions, using established benchmarks (TruthfulQA, HallucinationEval) to judge factuality. Our <jats:italic>attribution framework<\/jats:italic> defines metrics for <jats:italic>Prompt Sensitivity (PS)<\/jats:italic> and <jats:italic>Model Variability (MV)<\/jats:italic>, which together quantify the contribution of prompts vs. model-internal factors to hallucinations. Through extensive experiments and comparative analyses, we identify distinct patterns in hallucination occurrence, severity, and mitigation across models. 
Notably, structured prompt strategies such as chain-of-thought (CoT) prompting significantly reduce hallucinations in prompt-sensitive scenarios, though intrinsic model limitations persist in some cases. These findings contribute to <jats:italic>a deeper understanding<\/jats:italic> of LLM reliability and provide <jats:italic>insights<\/jats:italic> for prompt engineers, model developers, and AI practitioners. We further propose best practices and future directions to reduce hallucinations in both prompt design and model development pipelines.<\/jats:p>","DOI":"10.3389\/frai.2025.1622292","type":"journal-article","created":{"date-parts":[[2025,9,30]],"date-time":"2025-09-30T05:37:50Z","timestamp":1759210670000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":55,"title":["Survey and analysis of hallucinations in large language models: attribution to prompting strategies or model behavior"],"prefix":"10.3389","volume":"8","author":[{"given":"Dang","family":"Anh-Hoang","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Vu","family":"Tran","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Le-Minh","family":"Nguyen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1965","published-online":{"date-parts":[[2025,9,30]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2305.11685","article-title":"The hallucination problem in large language models: a survey","author":"Andrews","year":"2023","journal-title":"arXiv [preprint]"},{"key":"B2","unstructured":"Claude: A Next-generation AI Assistant by Anthropic\n          \n          2023"},{"key":"B3","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2301.04589","article-title":"Multitask prompted training enables zero-shot task 
generalization","author":"Bang","year":"2023","journal-title":"arXiv [preprint]"},{"key":"B4","volume-title":"Statistical Decision Theory and Bayesian Analysis","author":"Berger","year":"2013"},{"key":"B5","year":""},{"key":"B6","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2108.07258","article-title":"On the opportunities and risks of foundation models","author":"Bommasani","year":"2021","journal-title":"arXiv [preprint]"},{"key":"B7","doi-asserted-by":"publisher","first-page":"1877","DOI":"10.48550\/arXiv.2005.14165","article-title":"Language models are few-shot learners","volume":"33","author":"Brown","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"B8","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2308.03299","article-title":"Hallucination in large language models: a survey","author":"Chen","year":"2023","journal-title":"arXiv [preprint]"},{"key":"B9","unstructured":"Deepseek LLMs\n          \n          2023"},{"key":"B10","doi-asserted-by":"publisher","first-page":"5962","DOI":"10.18653\/v1\/2022.naacl-main.187","author":"Fabbri","year":"2022"},{"key":"B11","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.301","article-title":"Realtoxicityprompts: evaluating neural toxic degeneration in language models","author":"Gehman","year":"2020","journal-title":"Findings of EMNLP"},{"key":"B12","doi-asserted-by":"crossref","DOI":"10.1201\/b16018","volume-title":"Bayesian Data Analysis","author":"Gelman","year":"2013"},{"key":"B13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3571730","article-title":"Survey of hallucination in natural language generation","volume":"55","author":"Ji","year":"2023","journal-title":"ACM Comput. 
Surv"},{"key":"B14","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2207.05221","article-title":"Language models (mostly) know what they know","author":"Kadavath","year":"2022","journal-title":"arXiv [preprint]"},{"key":"B15","doi-asserted-by":"publisher","DOI":"10.4850\/arXiv:2305.00038","article-title":"Cohs: a dataset for evaluating factual consistency of summaries","author":"Kazemi","year":"2023","journal-title":"arXiv [preprint]"},{"key":"B16","doi-asserted-by":"publisher","first-page":"1233","DOI":"10.1056\/NEJMsr2214184","article-title":"Benefits, limits, and risks of gpt-4 as an ai chatbot for medicine","volume":"388","author":"Lee","year":"2023","journal-title":"N. Engl. J. Med."},{"key":"B17","author":"Lewis","year":"2020","journal-title":"Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks"},{"key":"B18","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2210.15097","article-title":"Contrastive decoding: Open-ended text generation as optimization","author":"Li","year":"2022","journal-title":"arXiv [preprint]"},{"key":"B19","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2109.07958","article-title":"Truthfulqa: measuring how models mimic human falsehoods","author":"Lin","year":"2022","journal-title":"arXiv [preprint]"},{"key":"B20","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2305.03023","article-title":"Evaluating the factual consistency of large language models: A survey","author":"Liu","year":"2023","journal-title":"arXiv [preprint]"},{"key":"B21","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.173","article-title":"On faithfulness and factuality in abstractive summarization","author":"Maynez","year":"2020","journal-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics"},{"key":"B22","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv:2112.09332","article-title":"Webgpt: browser-assisted question-answering with human 
feedback","author":"Nakano","year":"2021","journal-title":"arXiv [preprint]"},{"key":"B23","unstructured":"Gpt-4 System Card"},{"key":"B24","unstructured":"Gpt-4 Technical Report"},{"key":"B25","unstructured":"Openchat: Open-Source Chat Models\n          \n          2023"},{"key":"B26","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2203.02155","article-title":"Training language models to follow instructions with human feedback","author":"Ouyang","year":"2022","journal-title":"arXiv [preprint]"},{"key":"B27","volume-title":"Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference","author":"Pearl","year":"1988"},{"key":"B28","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2102.07350","article-title":"Prompt programming for large language models: Beyond the few-shot paradigm","author":"Reynolds","year":"2021","journal-title":"arXiv [preprint]"},{"key":"B29","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2202.08906","article-title":"Language models that seek for knowledge: modular search & generation for dialogue and prompt completion","author":"Shuster","year":"2022","journal-title":"arXiv [preprint]"},{"key":"B30","unstructured":"Touvron\n              H.\n            \n            \n              Lavril\n              T.\n            \n            \n              Izacard\n              G.\n            \n            \n              Martinet\n              X.\n            \n            \n              Lachaux\n              M.-A.\n            \n            \n              Lacroix\n              T.\n            \n          \n          Llama 2: Open Foundation and Fine-tuned Chat Models\n          \n          2023"},{"key":"B31","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2203.11171","article-title":"Self-consistency improves chain of thought reasoning in language models","author":"Wang","year":"2022","journal-title":"arXiv 
[preprint]"},{"key":"B32","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2201.11903","article-title":"Chain-of-thought prompting elicits reasoning in large language models","author":"Wei","year":"2022","journal-title":"arXiv [preprint]"},{"key":"B33","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2112.04359","article-title":"Taxonomy of risks posed by language models","author":"Weidinger","year":"2022","journal-title":"arXiv [preprint]"},{"key":"B34","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2310.06545","article-title":"Hallucinationeval: a unified framework for evaluating hallucinations in LLMs","author":"Wu","year":"2023","journal-title":"arXiv [preprint]"},{"key":"B35","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2210.03629","article-title":"React: Synergizing reasoning and acting in language models","author":"Yao","year":"2022","journal-title":"arXiv [preprint]"},{"key":"B36","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv:.303.08239","article-title":"Grounded language model training reduces hallucination","author":"Zhang","year":"2023","journal-title":"arXiv [preprint]"},{"key":"B37","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2205.10625","article-title":"Least-to-most prompting enables complex reasoning in large language models","author":"Zhou","year":"2022","journal-title":"arXiv [preprint]"}],"container-title":["Frontiers in Artificial 
Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1622292\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,30]],"date-time":"2025-09-30T05:37:56Z","timestamp":1759210676000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1622292\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,30]]},"references-count":37,"alternative-id":["10.3389\/frai.2025.1622292"],"URL":"https:\/\/doi.org\/10.3389\/frai.2025.1622292","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,30]]},"article-number":"1622292"}}