{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,7]],"date-time":"2026-07-07T15:27:39Z","timestamp":1783438059994,"version":"3.54.6"},"reference-count":124,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2026,1,3]],"date-time":"2026-01-03T00:00:00Z","timestamp":1767398400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,1,19]],"date-time":"2026-01-19T00:00:00Z","timestamp":1768780800000},"content-version":"vor","delay-in-days":16,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001803","name":"Charles Darwin University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001803","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Artif Intell Rev"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Large language models (LLMs) are trained on vast and diverse internet corpora that often include inaccurate or misleading content. Consequently, LLMs can generate misinformation, making robust fact-checking essential. This review systematically analyzes how LLM-generated content is evaluated for factual accuracy by exploring key challenges such as hallucinations, dataset limitations, and the reliability of evaluation metrics. The review emphasizes the need for strong fact-checking frameworks that integrate advanced prompting strategies, domain-specific fine-tuning, and retrieval-augmented generation (RAG) methods. It proposes five research questions that guide the analysis of the recent literature from 2020 to 2025, focusing on evaluation methods and mitigation techniques. Instruction tuning, multi-agent reasoning, and RAG frameworks for external knowledge access are also reviewed. 
The key findings demonstrate the limitations of current metrics, the importance of validated external evidence, and the improvement of factual consistency through domain-specific customization. The review underscores the importance of building more accurate, understandable, and context-aware fact-checking. These insights contribute to the advancement of research toward more trustworthy models.<\/jats:p>","DOI":"10.1007\/s10462-025-11454-w","type":"journal-article","created":{"date-parts":[[2026,1,4]],"date-time":"2026-01-04T01:50:09Z","timestamp":1767491409000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["Hallucination to truth: a review of fact-checking and factuality evaluation in large language models"],"prefix":"10.1007","volume":"59","author":[{"given":"Subhey Sadi","family":"Rahman","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Md. Adnanul","family":"Islam","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Md. Mahbub","family":"Alam","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Musarrat","family":"Zeba","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Md. 
Abdur","family":"Rahman","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sadia Sultana","family":"Chowa","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mohaimenul Azam Khan","family":"Raiaan","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sami","family":"Azam","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2026,1,3]]},"reference":[{"key":"11454_CR1","doi-asserted-by":"crossref","unstructured":"Abian AI, Raiaan MAK, Karim A, Azam S, Fahad NM, Shafiabady N, Yeo KC, De\u00a0Boer F (2024) Automated diagnosis of respiratory diseases from lung ultrasound videos ensuring XAI: an innovative hybrid model approach. Front Comput Sci, 6","DOI":"10.3389\/fcomp.2024.1438126"},{"key":"11454_CR2","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2025.110656","volume":"150","author":"AI Abian","year":"2025","unstructured":"Abian AI, Raiaan MAK, Jonkman M, Islam SMS, Azam S (2025) Atrous spatial pyramid pooling with swin transformer model for classification of gastrointestinal tract diseases from videos with enhanced explainability. Eng Appl Artif Intell 150:110656","journal-title":"Eng Appl Artif Intell"},{"key":"11454_CR3","doi-asserted-by":"crossref","unstructured":"Aly R, Papay S, Christodoulopoulos C, Augenstein I (2021) Feverous: Fact extraction and verification over unstructured and structured information. In: Proceedings of the 2021 conference on empirical methods in natural language processing. 
Association for Computational Linguistics, 6118\u20136129","DOI":"10.18653\/v1\/2021.fever-1.1"},{"issue":"8","key":"11454_CR4","doi-asserted-by":"publisher","first-page":"852","DOI":"10.1038\/s42256-024-00881-z","volume":"6","author":"I Augenstein","year":"2024","unstructured":"Augenstein I, Baldwin T, Cha M, Chakraborty T, Ciampaglia GL, Corney D, DiResta R, Ferrara E, Hale S, Halevy A et al (2024) Factuality challenges in the era of large language models and opportunities for fact-checking. Nat Mach Intell 6(8):852\u2013863","journal-title":"Nat Mach Intell"},{"key":"11454_CR5","doi-asserted-by":"crossref","unstructured":"Bai Y, Fu K (2024) A large language model-based fake news detection framework with rag fact-checking. In: 2024 IEEE international conference on big data (BigData). IEEE, pp 8617\u20138619","DOI":"10.1109\/BigData62323.2024.10826000"},{"key":"11454_CR6","doi-asserted-by":"crossref","unstructured":"Bang Y, Cahyawijaya S, Lee N, Dai W, Su D, Wilie B, Lovenia H, Ji Z, Yu T, Chung W, et\u00a0al (2023) A multitask, multilingual, multimodal evaluation of chatgpt on reasoning, hallucination, and interactivity. arXiv preprint arXiv:2302.04023","DOI":"10.18653\/v1\/2023.ijcnlp-main.45"},{"key":"11454_CR7","unstructured":"Bayat F, Zhang L, Munir S, Wang L (2025) Factbench: a dynamic benchmark for in-the-wild language model factuality evaluation. https:\/\/arxiv.org\/abs\/2410.22257"},{"key":"11454_CR8","doi-asserted-by":"crossref","unstructured":"Bender EM, Gebru T, McMillan-Major A, Shmitchell S (2021) On the dangers of stochastic parrots: Can language models be too big?. In: Proceedings of the 2021 ACM conference on fairness, accountability, and transparency, pp 610\u2013623","DOI":"10.1145\/3442188.3445922"},{"key":"11454_CR9","doi-asserted-by":"crossref","unstructured":"Cao H, Wei L, Zhou W, Hu S (2024) Multi-source knowledge enhanced graph attention networks for multimodal fact verification. 
In: 2024 IEEE international conference on multimedia and expo (ICME). IEEE, pp 1\u20136","DOI":"10.1109\/ICME57554.2024.10687974"},{"key":"11454_CR10","unstructured":"Chatrath , Lotif M, Raza S (2024) Fact or fiction? Can LLMs be reliable annotators for political truths?. In: Workshop on socially responsible language modelling research"},{"key":"11454_CR11","doi-asserted-by":"crossref","unstructured":"Cheung T-H, Lam K-M (2023) Factllama: optimizing instruction-following language models with external knowledge for automated fact-checking. In: Asia Pacific signal and information processing association annual summit and conference (APSIPA ASC). IEEE 2023, pp 846\u2013853","DOI":"10.1109\/APSIPAASC58517.2023.10317251"},{"issue":"1","key":"11454_CR12","doi-asserted-by":"publisher","first-page":"23337","DOI":"10.1038\/s41598-025-04189-9","volume":"15","author":"JC Cheung","year":"2025","unstructured":"Cheung JC, Ho SS (2025) The effectiveness of explainable ai on human factors in trust models. Sci Rep 15(1):23337","journal-title":"Sci Rep"},{"key":"11454_CR13","first-page":"1441","volume":"2024","author":"EC Choi","year":"2024","unstructured":"Choi EC, Ferrara E (2024) Automated claim matching with large language models: empowering fact-checkers in the fight against misinformation. In: Companion proceedings of the ACM web conference, pp 1441\u20131449","journal-title":"Companion Proceedings of the ACM Web Conference"},{"key":"11454_CR14","first-page":"883","volume":"2024","author":"EC Choi","year":"2024","unstructured":"Choi EC, Ferrara E (2024) Fact-gpt: Fact-checking augmentation via claim matching with llms. 
In: Companion proceedings of the ACM web conference, pp 883\u2013886","journal-title":"Companion Proceedings of the ACM Web Conference"},{"key":"11454_CR15","doi-asserted-by":"crossref","unstructured":"Chowa SS, Alvi R, Rahman SS, Rahman MA, Raiaan MAK, Islam MR, Hussain M, Azam S (2025) From language to action: a review of large language models as autonomous agents and tool users. arXiv preprint arXiv:2508.17281","DOI":"10.1007\/s10462-025-11471-9"},{"issue":"4","key":"11454_CR16","first-page":"28","volume":"14","author":"ME Conway","year":"1968","unstructured":"Conway ME (1968) How do committees invent. Datamation 14(4):28\u201331","journal-title":"Datamation"},{"issue":"50","key":"11454_CR17","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.2322823121","volume":"121","author":"MR DeVerna","year":"2024","unstructured":"DeVerna MR, Yan HY, Yang K-C, Menczer F (2024) Fact-checking information from large language models can decrease headline discernment. Proc Natl Acad Sci 121(50):e2322823121","journal-title":"Proc Natl Acad Sci"},{"issue":"22","key":"11454_CR18","doi-asserted-by":"publisher","first-page":"23 787","DOI":"10.1609\/aaai.v39i22.34550","volume":"39","author":"Y Ding","year":"2025","unstructured":"Ding Y, Facciani M, Joyce E, Poudel A, Bhattacharya S, Veeramani B, Aguinaga S, Weninger T (2025) Citations and trust in llm generated responses. In: Proceedings of the AAAI conference on artificial intelligence, vol 39(22), pp 23787-23795","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"11454_CR19","unstructured":"Dmonte A, Oruche R, Zampieri M, Calyam P, Augenstein I (2025) Claim verification in the age of large language models: a survey. arXiv preprint arXiv:2408.14317v2. [Online]. 
Available: https:\/\/arxiv.org\/abs\/2408.14317"},{"key":"11454_CR20","doi-asserted-by":"crossref","unstructured":"Eisenschlos JM, Dhingra B, Bulian J, B\u00f6rschinger B, Boyd-Graber J (2021) Fool me twice: entailment from Wikipedia Gamification. In: Proceedings of the 2021 conference of the North American Chapter of the Association for Computational Linguistics, pp 1234\u20131245. [Online]. Available: https:\/\/aclanthology.org\/2021.naacl-main.32\/","DOI":"10.18653\/v1\/2021.naacl-main.32"},{"key":"11454_CR21","doi-asserted-by":"crossref","unstructured":"Fadeeva E, Rubashevskii A, Shelmanov A, Petrakov S, Li H, Mubarak H, Tsymbalov E, Kuzmin G, Panchenko A, Baldwin T, et\u00a0al (2024) Fact-checking the output of large language models via token-level uncertainty quantification. In: Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand, pp 9367\u20139385","DOI":"10.18653\/v1\/2024.findings-acl.558"},{"issue":"033\u201314","key":"11454_CR22","first-page":"042","volume":"14","author":"Y Ge","year":"2024","unstructured":"Ge Y, Zeng X, Huffman JS, Lin T-Y, Liu M-Y, Cui Y (2024) Visual fact checker: enabling high-fidelity detailed caption generation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition 14(033\u201314):042","journal-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition"},{"key":"11454_CR23","unstructured":"Geng J, Kementchedjhieva Y, Nakov P, Gurevych I (2024) Multimodal large language models to support real-world fact-checking. arXiv preprint arXiv:2403.03627"},{"key":"11454_CR24","unstructured":"Ghosh B, Hasan S, Arafat NA, Khan A (2025) Logical consistency of large language models in fact-checking. 
In: The thirteenth international conference on learning representations"},{"key":"11454_CR25","doi-asserted-by":"crossref","unstructured":"Giarelis N, Mastrokostas C, Karacapilidis N (2024) A unified llm-kg framework to assist fact-checking in public deliberation. In: Proceedings of the first workshop on language-driven deliberation technology (DELITE)@ LREC-COLING, pp 13\u201319","DOI":"10.3390\/app13137620"},{"key":"11454_CR26","doi-asserted-by":"publisher","first-page":"1500","DOI":"10.1162\/tacl_a_00615","volume":"11","author":"NM Guerreiro","year":"2023","unstructured":"Guerreiro NM, Alves DM, Waldendorf J, Haddow B, Birch A, Colombo P, Martins AF (2023) Hallucinations in large multilingual translation models. Trans Assoc Comput Linguist 11:1500\u20131517","journal-title":"Trans Assoc Comput Linguist"},{"key":"11454_CR27","doi-asserted-by":"publisher","unstructured":"Gupta A, Srikumar V (2021) X-fact: a new benchmark dataset for multilingual fact checking. In: Proceedings of the 2021 conference on empirical methods in natural language processing, pp 732\u2013748. [Online]. https:\/\/doi.org\/10.48550\/arXiv.2106.09248","DOI":"10.48550\/arXiv.2106.09248"},{"issue":"20","key":"11454_CR28","doi-asserted-by":"publisher","first-page":"22 105","DOI":"10.1609\/aaai.v38i20.30214","volume":"38","author":"B Hu","year":"2024","unstructured":"Hu B, Sheng Q, Cao J, Shi Y, Li Y, Wang D, Qi P (2024) Bad actor, good advisor: exploring the role of large language models in fake news detection. In: Proceedings of the AAAI conference on artificial intelligence, vol 38(20), pp 22105-22113","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"issue":"10","key":"11454_CR29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10462-025-11328-1","volume":"58","author":"T Huang","year":"2025","unstructured":"Huang T (2025) Content moderation by llm: from accuracy to legitimacy. 
Artif Intell Rev 58(10):1\u201332","journal-title":"Artif Intell Rev"},{"issue":"2","key":"11454_CR30","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3703155","volume":"43","author":"L Huang","year":"2025","unstructured":"Huang L, Yu W, Ma W, Zhong W, Feng Z, Wang H, Chen Q, Peng W, Feng X, Qin B et al (2025) A survey on hallucination in large language models: principles, taxonomy, challenges, and open questions. ACM Trans Inf Syst 43(2):1\u201355","journal-title":"ACM Trans Inf Syst"},{"key":"11454_CR31","unstructured":"Huang Y, Feng X, Feng X, Qin B (2021) The factual inconsistency problem in abstractive text summarization: a survey. arXiv preprint arXiv:2104.14839"},{"key":"11454_CR32","doi-asserted-by":"crossref","unstructured":"Jannah SZ, Aco E, Peng S, Wakamiya S, Aramaki E (2025) Multilingual symptom detection on social media: enhancing health-related fact-checking with LLMs. In: Proceedings of the eighth fact extraction and verification workshop (FEVER), Vienna, Austria, pp 54\u201368","DOI":"10.18653\/v1\/2025.fever-1.4"},{"issue":"12","key":"11454_CR33","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3571730","volume":"55","author":"Z Ji","year":"2023","unstructured":"Ji Z, Lee N, Frieske R, Yu T, Su D, Xu Y, Ishii E, Bang YJ, Madotto A, Fung P (2023) Survey of hallucination in natural language generation. ACM Comput Surv 55(12):1\u201338","journal-title":"ACM Comput Surv"},{"key":"11454_CR34","doi-asserted-by":"crossref","unstructured":"Jiang Y, Bordia S, Zhong Z, Dognin C, Singh M, Bansal M (2020) Hover: a dataset for many-hop fact extraction and claim verification. In: Findings of the Association for Computational Linguistics: EMNLP, pp 3441\u20133450. [Online]. 
https:\/\/aclanthology.org\/2020.findings-emnlp.309\/","DOI":"10.18653\/v1\/2020.findings-emnlp.309"},{"key":"11454_CR35","doi-asserted-by":"crossref","unstructured":"Jin Z, Cao J, Guo H, Zhang Y, Luo J (2017) Multimodal fusion with recurrent neural networks for rumor detection on microblogs. In: Proceedings of the 25th ACM international conference on multimedia, pp 795\u2013816","DOI":"10.1145\/3123266.3123454"},{"key":"11454_CR36","doi-asserted-by":"crossref","unstructured":"Jin Q, Dhingra B, Liu Z, Cohen W, Lu X (2019) PubMedQA: a dataset for biomedical research question answering. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), 2567\u20132577. [Online]. https:\/\/aclanthology.org\/D19-1259\/","DOI":"10.18653\/v1\/D19-1259"},{"key":"11454_CR37","doi-asserted-by":"crossref","unstructured":"Jing X, Billa S, Godbout D (2025) On a scale from 1 to 5: quantifying hallucination in faithfulness evaluation. In: Findings of the Association for Computational Linguistics: NAACL Albuquerque. New Mexico, pp 7765\u20137780","DOI":"10.18653\/v1\/2025.findings-naacl.433"},{"issue":"646\u201329","key":"11454_CR38","first-page":"648","volume":"29","author":"K Kakizaki","year":"2025","unstructured":"Kakizaki K, Matsunaga Y, Furukawa R (2025) MAFT: multimodal automated fact-checking via textualization. In: Proceedings of the AAAI conference on artificial intelligence 29(646\u201329):648","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"11454_CR39","unstructured":"Kamoi R, Das SSS, Lou R, Ahn JJ, Zhao Y, et\u00a0al (2024) Evaluating LLMs at detecting errors in LLM responses. arXiv preprint arXiv:2404.03602. [Online]. 
https:\/\/arxiv.org\/abs\/2404.03602"},{"key":"11454_CR40","unstructured":"Kamoi R, Das SSS, Lou R, Ahn JJ, Zhao Y, Lu X, Zhang N, Zhang Y, Zhang HR, Vummanthala SR, Dave S, Qin S, Cohan A, Yin W, Zhang R (2024) Evaluating LLMs at detecting errors in LLM responses. In: First conference on language modeling"},{"issue":"8","key":"11454_CR41","doi-asserted-by":"publisher","first-page":"250","DOI":"10.1007\/s10462-025-11241-7","volume":"58","author":"D Kampelopoulos","year":"2025","unstructured":"Kampelopoulos D, Tsanousa A, Vrochidis S, Kompatsiaris I (2025) A review of llms and their applications in the architecture, engineering and construction industry. Artif Intell Rev 58(8):250","journal-title":"Artif Intell Rev"},{"key":"11454_CR42","first-page":"49 025","volume":"36","author":"J Kasai","year":"2023","unstructured":"Kasai J, Sakaguchi K, Le Bras R, Asai A, Yu X, Radev D, Smith NA, Choi Y, Inui K et al (2023) Realtime qa: What\u2019s the answer right now? Adv Neural Inf Process Syst 36:49 025-49 043","journal-title":"Adv Neural Inf Process Syst"},{"key":"11454_CR43","doi-asserted-by":"crossref","unstructured":"Khaliq MA, Chang PY, Ma M, Pflugfelder B, Mileti\u0107 F (2024) RAGAR, your falsehood radar: RAG-augmented reasoning for political fact-checking using multimodal large language models. In: Proceedings of the seventh fact extraction and verification workshop (FEVER), Miami, Florida, USA, pp 280\u2013296","DOI":"10.18653\/v1\/2024.fever-1.29"},{"issue":"1","key":"11454_CR44","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1016\/j.infsof.2008.09.009","volume":"51","author":"BA Kitchenham","year":"2009","unstructured":"Kitchenham BA, Brereton P, Budgen D, Turner M, Bailey J, Linkman SG (2009) Systematic literature reviews in software engineering: a systematic literature review. Inf Softw Technol 51(1):7\u201315. 
https:\/\/doi.org\/10.1016\/j.infsof.2008.09.009","journal-title":"Inf Softw Technol"},{"key":"11454_CR45","doi-asserted-by":"crossref","unstructured":"Krishnamurthy V, Balaji V (2024) Yours truly: a credibility framework for effortless llm-powered fact checking. IEEE Access","DOI":"10.1109\/ACCESS.2024.3520187"},{"key":"11454_CR46","unstructured":"Kupershtein L, Zalepa O, Sorokolit V, Prokopenko S (2025) Ai-agent-based system for fact-checking support using large language models. In: CEUR workshop proceedings, pp 321\u2013331"},{"key":"11454_CR47","doi-asserted-by":"crossref","unstructured":"Ladhak F, Durmus E, Suzgun M, Zhang T, Jurafsky D, McKeown K, Hashimoto TB (2023) When do pre-training biases propagate to downstream tasks? A case study in text summarization. In: Proceedings of the 17th conference of the European Chapter of the Association for Computational Linguistics, pp 3206\u20133219","DOI":"10.18653\/v1\/2023.eacl-main.234"},{"key":"11454_CR48","doi-asserted-by":"crossref","unstructured":"Lee N, Li BZ, Wang S, Yih W, Ma H, Khabsa M (2020) Language models as fact checkers?. In: Proceedings of the third workshop on fact extraction and VERification (FEVER), Online, pp 36\u201341","DOI":"10.18653\/v1\/2020.fever-1.5"},{"issue":"1","key":"11454_CR49","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1038\/s44168-025-00215-8","volume":"4","author":"M Leippold","year":"2025","unstructured":"Leippold M, Vaghefi SA, Stammbach D, Muccione V, Bingler J, Ni J, Senni CC, Wekhof T, Schimanski T, Gostlow G et al (2025) Automated fact-checking of climate claims with large language models. Npj Climate Action 4(1):17","journal-title":"Npj Climate Action"},{"key":"11454_CR50","doi-asserted-by":"crossref","unstructured":"Leite JA, Razuvayevskaya O, Bontcheva K, Scarton C (2024) EUvsDisinfo: A dataset for multilingual detection of pro-Kremlin disinformation in news articles. 
In: Proceedings of the 33rd ACM international conference on information and knowledge management, pp 5380\u20135384","DOI":"10.1145\/3627673.3679167"},{"key":"11454_CR51","doi-asserted-by":"crossref","unstructured":"Leite J, Razuvayevskaya O, Bontcheva K, Scarton C (2024) Weakly supervised veracity classification with LLM-predicted credibility signals. https:\/\/arxiv.org\/abs\/2309.07601","DOI":"10.21203\/rs.3.rs-5389911\/v1"},{"key":"11454_CR52","doi-asserted-by":"crossref","unstructured":"Li J, Cheng X, Zhao WX, Nie J-Y, Wen J-R (2023) HaluEval: a large-scale hallucination evaluation benchmark for large language models. In: Proceedings of the 2023 conference on empirical methods in natural language processing, pp 4567\u20134578. [Online]. Available: https:\/\/aclanthology.org\/2023.emnlp-main.397\/","DOI":"10.18653\/v1\/2023.emnlp-main.397"},{"key":"11454_CR53","doi-asserted-by":"crossref","unstructured":"Li P, Gao Z, Zhang B, Yuan T, Wu Y, Harandi M, Jia Y, Zhu S, Li Q (2024) FIRE: a dataset for feedback integration and refinement evaluation of multimodal models. [Online]. https:\/\/arxiv.org\/abs\/2407.11522","DOI":"10.52202\/079017-3223"},{"key":"11454_CR54","doi-asserted-by":"crossref","unstructured":"Lin H, Deng Y, Gu Y, Zhang W, Ma J, Ng S, Chua T (2025) FACT-AUDIT: an adaptive multi-agent framework for dynamic fact-checking evaluation of large language models. In: Proceedings of the 63rd annual meeting of the association for computational linguistics (Volume 1: Long Papers), Vienna, Austria, pp 360\u2013381","DOI":"10.18653\/v1\/2025.acl-long.17"},{"key":"11454_CR55","doi-asserted-by":"crossref","unstructured":"Lin S, Hilton J, Evans O (2022) TruthfulQA: measuring how models mimic human falsehoods. 
In: Proceedings of the 60th annual meeting of the association for computational linguistics (Volume 1: Long Papers), Dublin, Ireland, pp 3214\u20133252","DOI":"10.18653\/v1\/2022.acl-long.229"},{"key":"11454_CR56","doi-asserted-by":"publisher","unstructured":"Li M, Peng B, Galley M, Gao J, Zhang Z (2024) Self-checker: plug-and-play modules for fact-checking with large language models. In: Findings of the Association for Computational Linguistics: NAACL 2024, pp 163\u2013181. [Online]. https:\/\/doi.org\/10.18653\/v1\/2024.findings-naacl.12","DOI":"10.18653\/v1\/2024.findings-naacl.12"},{"key":"11454_CR57","doi-asserted-by":"crossref","unstructured":"Li D, Rawat AS, Zaheer M, Wang X, Lukasik M, Veit A, Yu F, Kumar S (2022) Large language models with controllable working memory. arXiv preprint arXiv:2211.05110","DOI":"10.18653\/v1\/2023.findings-acl.112"},{"issue":"1","key":"11454_CR58","doi-asserted-by":"publisher","first-page":"541","DOI":"10.1609\/aaai.v39i1.32034","volume":"39","author":"Y Liu","year":"2025","unstructured":"Liu Y, Sun H, Guo W, Xiao X, Mao C, Yu Z, Yan R (2025) Bidev: Bilateral defusing verification for complex claim fact-checking. In: Proceedings of the AAAI conference on artificial intelligence Vol 39(1), pp 541\u2013549","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"11454_CR59","unstructured":"Li W, Wu W, Chen M, Liu J, Xiao X, Wu H (2022) Faithfulness in natural language generation: a systematic survey of analysis, evaluation and optimization methods. arXiv preprint arXiv:2203.05227"},{"key":"11454_CR60","unstructured":"Li X, Zhang Y, Malthouse EC (2024) Large language model agent for fake news detection. arXiv preprint arXiv:2405.01593"},{"key":"11454_CR61","doi-asserted-by":"crossref","unstructured":"Luo G, Darrell T, Rohrbach A (2021) NewsCLIPpings: automatic generation of out-of-context multimodal media. 
In: Proceedings of the 2021 conference on empirical methods in natural language processing, pp 6801\u20136817. [Online]. https:\/\/aclanthology.org\/2021.emnlp-main.545\/","DOI":"10.18653\/v1\/2021.emnlp-main.545"},{"key":"11454_CR62","first-page":"1614","volume":"2025","author":"J Ma","year":"2025","unstructured":"Ma J, Hu L, Li R, Fu W (2025) Local: logical and causal fact-checking with llm-based multi-agents. In: Proceedings of the ACM on web conference 2025, pp 1614\u20131625","journal-title":"Proceedings of the ACM on Web Conference"},{"key":"11454_CR63","doi-asserted-by":"crossref","unstructured":"Magomere J, La\u00a0Malfa E, Tonneau M, Kazemi A, Hale SA (2025) When claims evolve: evaluating and enhancing the robustness of embedding models against misinformation edits. Findings of the Association for Computational Linguistics: ACL 2025, 22374\u201322404","DOI":"10.18653\/v1\/2025.findings-acl.1150"},{"issue":"1","key":"11454_CR64","doi-asserted-by":"publisher","first-page":"16","DOI":"10.1007\/s11063-025-11732-2","volume":"57","author":"DE Mathew","year":"2025","unstructured":"Mathew DE, Ebem DU, Ikegwu AC, Ukeoma PE, Dibiaezue NF (2025) Recent emerging techniques in explainable artificial intelligence to enhance the interpretable and understanding of ai models for human. Neural Process Lett 57(1):16","journal-title":"Neural Process Lett"},{"key":"11454_CR65","unstructured":"Mishra S, Suryavardan S, Bhaskar A, Chopra P, Reganti AN, Patwa P, Das A, Chakraborty T, Sheth AP, Ekbal A, et\u00a0al (2022) FACTIFY: a multi-modal fact verification dataset. In: DE-FACTIFY@ AAAI"},{"key":"11454_CR66","doi-asserted-by":"crossref","unstructured":"Nakov P, Barr\u00f3n-Cede\u00f1o A, Da\u00a0San\u00a0Martino G, Alam F, Stru\u00df JM, Mandl T, M\u00edguez R, Caselli T, Kutlu M, Zaghouani W et\u00a0al (2022) Overview of the CLEF\u20132022 CheckThat! Lab on fighting the COVID-19 infodemic and fake news detection. 
In: International conference of the cross-language evaluation forum for european languages. Springer, pp 495\u2013520","DOI":"10.1007\/978-3-031-13643-6_29"},{"key":"11454_CR67","doi-asserted-by":"crossref","unstructured":"Nan Q, Cao J, Zhu Y, Wang Y, Li J (2021) Mdfend: multi-domain fake news detection. In: Proceedings of the 30th ACM international conference on information & knowledge management, pp 3343\u20133347","DOI":"10.1145\/3459637.3482139"},{"key":"11454_CR68","doi-asserted-by":"crossref","unstructured":"Onoe Y, Zhang MJ, Choi E, Durrett G (2022) Entity cloze by date: What lms know about unseen entities. arXiv preprint arXiv:2205.02832","DOI":"10.18653\/v1\/2022.findings-naacl.52"},{"key":"11454_CR69","unstructured":"Pal A, Umapathi LK, Sankarasubbu M (2022) MedMCQA: a large-scale multi-subject multi-choice dataset for medical domain question answering. In: Proceedings of the conference on health, inference, and learning, pp 248\u2013260. [Online]. https:\/\/proceedings.mlr.press\/v174\/pal22a.html"},{"key":"11454_CR70","doi-asserted-by":"publisher","unstructured":"Panchendrarajan R, M\u00edguez R, Zubiaga A (2025) MultiClaimNet: a massively multilingual dataset of fact-checked claim clusters. arXiv preprint arXiv:2503.22280. [Online]. https:\/\/doi.org\/10.48550\/arXiv.2503.22280","DOI":"10.48550\/arXiv.2503.22280"},{"key":"11454_CR71","doi-asserted-by":"crossref","unstructured":"Papadopoulos S, Koutlis C, Papadopoulos S, Petrantonakis PC (2025) RED-DOT: multimodal fact-checking via relevant evidence detection. IEEE Trans Computat Soc Syst, pp 1\u201310","DOI":"10.1109\/TCSS.2025.3553939"},{"key":"11454_CR72","doi-asserted-by":"crossref","unstructured":"Paullada A, Raji ID, Bender EM, Denton E, Hanna A (2021) Data and its (dis) contents: a survey of dataset development and use in machine learning research. 
Patterns, 2(11)","DOI":"10.1016\/j.patter.2021.100336"},{"key":"11454_CR73","unstructured":"Peng B, Galley M, He P, Cheng H, Xie Y, Hu Y, Huang Q, Liden L, Yu Z, Chen W, et\u00a0al (2023) Check your facts and try again: improving large language models with external knowledge and automated feedback. arXiv preprint arXiv:2302.12813"},{"key":"11454_CR74","unstructured":"Pisarevskaya D, Zubiaga A (2025) Zero-shot and few-shot learning with instruction-following llms for claim matching in automated fact-checking. In: Proceedings of the 31st international conference on computational linguistics, Abu Dhabi, UAE, pp 9721\u20139736"},{"issue":"052\u201313","key":"11454_CR75","first-page":"062","volume":"13","author":"P Qi","year":"2024","unstructured":"Qi P, Yan Z, Hsu W, Lee ML (2024) Sniffer: multimodal large language model for explainable out-of-context misinformation detection. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, vol 13(052\u201313), p 062","journal-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition"},{"key":"11454_CR76","doi-asserted-by":"publisher","first-page":"1341697","DOI":"10.3389\/frai.2024.1341697","volume":"7","author":"D Quelle","year":"2024","unstructured":"Quelle D, Bovet A (2024) The perils and promises of fact-checking with large language models. Front Artific Intell 7:1341697","journal-title":"Front Artific Intell"},{"key":"11454_CR77","doi-asserted-by":"crossref","unstructured":"Saakyan A, Chakrabarty T, Muresan S (2021) COVID-FACT: fact extraction and verification of real-world claims on COVID-19 pandemic. In: Proceedings of the 59th annual meeting of the association for computational linguistics, pp 2116\u20132129. [Online]. 
https:\/\/aclanthology.org\/2021.acl-long.165\/","DOI":"10.18653\/v1\/2021.acl-long.165"},{"key":"11454_CR78","unstructured":"Sairaj RT, Balasundaram S (2025) \u201cOntology Mapping for Retrieval Augmented Modelling to Reduce Factual Hallucinations in Pretrained Language Model-Based Auto-Generated Questions,\u201d Applied Ontology, 15705838251343009"},{"issue":"1","key":"11454_CR79","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10791-025-09683-2","volume":"28","author":"RT Sairaj","year":"2025","unstructured":"Sairaj RT, Balasundaram SR (2025) Ensemble learning with RAG model to reduce redundant question topics in auto-generated exam questions. Discov Comput 28(1):1\u201317","journal-title":"Discover Computing"},{"key":"11454_CR80","doi-asserted-by":"crossref","unstructured":"Sakurai T, Shiramatsu S, Kinoshita R (2024) Llm-based agent for recommending information related to web discussions at appropriate timing. In: 2024 IEEE international conference on agents (ICA). IEEE, pp 120\u2013123","DOI":"10.1109\/ICA63002.2024.00033"},{"key":"11454_CR81","doi-asserted-by":"publisher","first-page":"573","DOI":"10.1609\/icwsm.v13i01.3254","volume":"13","author":"FKA Salem","year":"2019","unstructured":"Salem FKA, Al Feel R, Elbassuoni S, Jaber M, Farah M (2019) Fa-kes: a fake news dataset around the syrian war. In: Proceedings of the international AAAI conference on web and social media, vol 13, pp 573\u2013582","journal-title":"Proceedings of the international AAAI conference on web and social media"},{"key":"11454_CR82","doi-asserted-by":"crossref","unstructured":"Sankararaman H, Yasin MN, Sorensen T, Di\u00a0Bari A, Stolcke A (2024) Provenance: a light-weight fact-checker for retrieval augmented LLM generation output. 
In: Proceedings of the 2024 conference on empirical methods in natural language processing: industry track, Miami, Florida, US, pp 1305\u20131313","DOI":"10.18653\/v1\/2024.emnlp-industry.97"},{"key":"11454_CR83","first-page":"65 128","volume":"36","author":"M Schlichtkrull","year":"2023","unstructured":"Schlichtkrull M, Guo Z, Vlachos A (2023) Averitec: A dataset for real-world claim verification with evidence from the web. Adv Neural Inf Process Syst 36:65 128-65 167","journal-title":"Adv Neural Inf Process Syst"},{"key":"11454_CR84","doi-asserted-by":"crossref","unstructured":"Setty V (2024) Surprising efficacy of fine-tuned transformers for fact-checking over larger language models. In: Proceedings of the 47th international ACM SIGIR conference on research and development in information retrieval, pp 2842\u20132846","DOI":"10.1145\/3626772.3661361"},{"key":"11454_CR85","unstructured":"Shafayat S, Kim E, Oh J, Oh A (2024) Multi-FAct: assessing factuality of multilingual LLMs using FActScore. In: First Conference on language modeling (COLM)"},{"issue":"410\u201314","key":"11454_CR86","first-page":"419","volume":"14","author":"P Sharma","year":"2024","unstructured":"Sharma P, Shaham TR, Baradad M, Fu S, Rodriguez-Munoz A, Duggal S, Isola P, Torralba A (2024) A vision check-up for language models. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, Vol 14(410\u201314), p 419","journal-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition"},{"key":"11454_CR87","doi-asserted-by":"crossref","unstructured":"Shcharbakova H, Anikina T, Skachkova N, Genabith JV (2025) When scale meets diversity: evaluating language models on fine-grained multilingual claim verification. 
In: Proceedings of the eighth fact extraction and verification workshop (FEVER), Vienna, Austria, pp 69\u201384","DOI":"10.18653\/v1\/2025.fever-1.5"},{"key":"11454_CR88","doi-asserted-by":"publisher","DOI":"10.1016\/j.aei.2025.103634","volume":"68","author":"L Sheng","year":"2025","unstructured":"Sheng L, Chang F, Sun Q, Wangzha D, Gu Z (2025) Uncertainty reports as explainable AI: a cognitive-adaptive framework for human-AI decision systems in context tasks. Adv Eng Inform 68:103634","journal-title":"Adv Eng Inform"},{"key":"11454_CR89","unstructured":"Shu K, Mahudeswaran D, Wang S, Lee D, Liu H (2018) FakeNewsNet: A data repository with news content, social context and spatialtemporal information for studying fake news on social media. arXiv"},{"key":"11454_CR90","doi-asserted-by":"crossref","unstructured":"Si C, Goyal N, Wu T, Zhao C, Feng S, Daume\u00a0Iii H, Boyd-Graber J (2024) Large language models help humans verify truthfulness \u2013 except when they are convincingly wrong. In: Proceedings of the 2024 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Mexico City, Mexico, pp 1459\u20131474. [Online]. https:\/\/aclanthology.org\/2024.naacl-long.81\/","DOI":"10.18653\/v1\/2024.naacl-long.81"},{"key":"11454_CR91","doi-asserted-by":"crossref","unstructured":"Siino M (2024) BrainLlama at SemEval-2024 Task 6: prompting Llama to detect hallucinations and related observable overgeneration mistakes. In: Proceedings of the 18th international workshop on semantic evaluation (SemEval-2024), Mexico City, Mexico, pp 82\u201387","DOI":"10.18653\/v1\/2024.semeval-1.14"},{"issue":"9","key":"11454_CR92","first-page":"426","volume":"13","author":"M Siino","year":"2022","unstructured":"Siino M, Di Nuovo E, Tinnirello I, La Cascia M (2022) Fake news spreaders detection: sometimes attention is not all you need. 
Inf 13(9):426","journal-title":"Inf"},{"key":"11454_CR93","unstructured":"Siino M, Tinnirello I (2024) GPT hallucination detection through prompt engineering. In: Working notes of the conference and labs of the evaluation forum (CLEF 2024)"},{"key":"11454_CR94","doi-asserted-by":"crossref","unstructured":"Singal R, Patwa P, Patwa P, Chadha A, Das A (2024) Evidence-backed fact checking using RAG and few-shot in-context learning with LLMs. In: Proceedings of the seventh fact extraction and verification workshop (FEVER), Miami, Florida, USA, pp 91\u201398","DOI":"10.18653\/v1\/2024.fever-1.10"},{"key":"11454_CR95","doi-asserted-by":"crossref","unstructured":"Singhal A, Law T, Kassner C, Gupta A, Duan E, Damle A, Li RL (2024) Multilingual fact-checking using llms. In: Proceedings of the third workshop on NLP for positive impact, pp 13\u201331","DOI":"10.18653\/v1\/2024.nlp4pi-1.2"},{"key":"11454_CR96","doi-asserted-by":"crossref","unstructured":"Tang L, Laban P, Durrett G (2024) MiniCheck: efficient fact-checking of LLMs on grounding documents. In: Proceedings of the 2024 conference on empirical methods in natural language processing, Miami, Florida, USA, pp 8818\u20138847","DOI":"10.18653\/v1\/2024.emnlp-main.499"},{"key":"11454_CR97","doi-asserted-by":"crossref","unstructured":"Thorne J, Vlachos A, Christodoulopoulos C, Mittal A (2018) FEVER: a large-scale dataset for fact extraction and verification. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 809\u2013819. [Online]. https:\/\/aclanthology.org\/N18-1074\/","DOI":"10.18653\/v1\/N18-1074"},{"key":"11454_CR98","doi-asserted-by":"crossref","unstructured":"Tran H, Wang J, Ting Y, Huang W, Chen T (2024) Leaf: Learning and evaluation augmented by fact-checking to improve factualness in large language models. 
arXiv preprint arXiv:2410.23526","DOI":"10.18653\/v1\/2025.emnlp-industry.23"},{"issue":"1","key":"11454_CR99","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12859-015-0564-6","volume":"16","author":"G Tsatsaronis","year":"2015","unstructured":"Tsatsaronis G, Balikas G, Malakasiotis P, Partalas I, Zschunke M, Alvers MR, Weissenborn D, Krithara A, Petridis S, Polychronopoulos D et al (2015) An overview of the BioASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinformatics 16(1):1\u201328","journal-title":"BMC Bioinformatics"},{"key":"11454_CR100","doi-asserted-by":"crossref","unstructured":"Vladika J, Hacajova I, Matthes F (2025) Step-by-step fact verification system for medical claims with explainable reasoning. In: Proceedings of the 2025 conference of the nations of the americas chapter of the association for computational linguistics: human language technologies (Volume 2: Short Papers), Albuquerque, New Mexico, pp 805\u2013816","DOI":"10.18653\/v1\/2025.naacl-short.68"},{"key":"11454_CR101","unstructured":"Vykopal I, Pikuliak M, Ostermann S, \u0160imko M (2024) Generative large language models in automated fact-checking: a survey. arXiv preprint arXiv:2407.02351v2. [Online]. https:\/\/arxiv.org\/abs\/2407.02351"},{"key":"11454_CR102","doi-asserted-by":"crossref","unstructured":"Wadden D, Lin S, Lo K, Wang LL, van Zuylen M, Cohan A, Hajishirzi H (2020) Fact or fiction: verifying scientific claims. In: Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP), pp 7534\u20137550. [Online]. https:\/\/aclanthology.org\/2020.emnlp-main.609\/","DOI":"10.18653\/v1\/2020.emnlp-main.609"},{"key":"11454_CR103","doi-asserted-by":"crossref","unstructured":"Wang WY (2017) Liar, Liar pants on fire a new benchmark dataset for fake news detection. In: Proceedings of the 55th annual meeting of the association for computational linguistics (Volume 2: Short Papers), pp 422\u2013426. 
[Online]. https:\/\/aclanthology.org\/P17-2067\/","DOI":"10.18653\/v1\/P17-2067"},{"issue":"10","key":"11454_CR104","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0312240","volume":"19","author":"J Wang","year":"2024","unstructured":"Wang J, Zhu Z, Liu C, Li R, Wu X (2024) LLM-Enhanced multimodal detection of fake news. PLoS ONE 19(10):e0312240","journal-title":"PLoS ONE"},{"issue":"8","key":"11454_CR105","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1007\/s10462-025-11222-w","volume":"58","author":"X Wang","year":"2025","unstructured":"Wang X, Jiang H, Yu Y, Yu J, Lin Y, Yi P, Wang Y, Qiao Y, Li L, Wang F-Y (2025) Building intelligence identification system via large language model watermarking: a survey and beyond. Artif Intell Rev 58(8):249","journal-title":"Artif Intell Rev"},{"key":"11454_CR106","unstructured":"Wang Y, Wang M, Iqbal H, Georgiev GN, Geng J, Gurevych I, Nakov P (2025) Openfactcheck: building, benchmarking customized fact-checking systems and evaluating the factuality of claims and llms. In: Proceedings of the 31st international conference on computational linguistics, vol 11399\u201311421"},{"key":"11454_CR107","doi-asserted-by":"crossref","unstructured":"Wang Y, Wang M, Manzoor MA, Liu F, Georgiev GN, Das RJ, Nakov P (2024) Factuality of large language models: a survey. In: Proceedings of the 2024 conference on empirical methods in natural language processing, Miami, Florida, USA, pp 19519\u201319529","DOI":"10.18653\/v1\/2024.emnlp-main.1088"},{"key":"11454_CR108","unstructured":"Wang Y, Zhong W, Li L, Mi F, Zeng X, Huang W, Shang L, Jiang X, Liu Q (2023) Aligning large language models with human: a survey. 
arXiv preprint arXiv:2307.12966"},{"key":"11454_CR109","unstructured":"Weidinger L, Mellor J, Rauh M, Griffin C, Uesato J, Huang P-S, Cheng M, Glaese M, Balle B, Kasirzadeh A, et\u00a0al (2021) \u201cEthical and social risks of harm from language models,\u201d arXiv preprint arXiv:2112.04359"},{"key":"11454_CR110","unstructured":"Wei J, Yang C, Song X, Lu Y, Hu N, Huang J, Tran D, Peng D, Liu R, Huang D, Du C, Le QV (2024) Long-form factuality in large language models. arXiv"},{"key":"11454_CR111","doi-asserted-by":"crossref","unstructured":"Xie Z, Xing R, Wang Y, Geng J, Iqbal H, Sahnan D, Gurevych I, Nakov P (2025) FIRE: fact-checking with iterative retrieval and verification. In: Findings of the association for computational linguistics: NAACL Albuquerque. New Mexico, pp 2901\u20132914","DOI":"10.18653\/v1\/2025.findings-naacl.158"},{"key":"11454_CR112","doi-asserted-by":"crossref","unstructured":"Xiong G, Jin Q, Lu Z, Zhang A (2024) Benchmarking retrieval-augmented generation for medicine. In: ACL anthology, pp 6233\u20136251","DOI":"10.18653\/v1\/2024.findings-acl.372"},{"key":"11454_CR113","doi-asserted-by":"crossref","unstructured":"Yang Q, Christensen T, Gilda S, Fernandes J, Oliveira D, Wilson R, Woodard D (2024) Are fact-checking tools helpful? an exploration of the usability of google fact check. In: International conference on data and information in online, pp 82\u201398","DOI":"10.1007\/978-3-031-97352-9_7"},{"key":"11454_CR114","unstructured":"Yang Z, et\u00a0al (2022) A coarse-to-fine cascaded evidence-distillation neural network for explainable fake news detection. In: Proceedings of the 29th international conference on computational linguistics, pp 2637\u20132647. [Online]. https:\/\/aclanthology.org\/2022.coling-1.230\/"},{"key":"11454_CR115","doi-asserted-by":"crossref","unstructured":"Yao BM, Shah A, Sun L, Cho J, Huang L (2023) End-to-end multimodal fact-checking and explanation generation: a challenging dataset and models. 
In: Proceedings of the 46th international ACM SIGIR conference on research and development in information retrieval, Taipei, Taiwan, pp 2733\u20132743","DOI":"10.1145\/3539618.3591879"},{"key":"11454_CR116","unstructured":"Yao J-Y, Ning K-P, Liu Z-H, Ning M-N, Liu Y-Y, Yuan L (2023) Llm lies: hallucinations are not bugs, but features as adversarial examples. arXiv preprint arXiv:2310.01469"},{"key":"11454_CR117","doi-asserted-by":"publisher","unstructured":"Yao B, Shah A, Sun L, Cho J, Huang L (2023) End-to-end multimodal fact-checking and explanation generation: a challenging dataset and models. In: Proceedings of the 46th international ACM SIGIR conference on research and development in information retrieval, pp 2733\u20132743. [Online]. https:\/\/doi.org\/10.1145\/3539618.3591879","DOI":"10.1145\/3539618.3591879"},{"issue":"1","key":"11454_CR118","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1038\/s41597-025-04417-x","volume":"12","author":"B Zhang","year":"2025","unstructured":"Zhang B, Bornet A, Yazdani A, Khlebnikov P, Milutinovic M, Rouhizadeh H, Amini P, Teodoro D (2025) A dataset for evaluating clinical research claims in large language models. Sci Data 12(1):86","journal-title":"Sci Data"},{"key":"11454_CR119","doi-asserted-by":"crossref","unstructured":"Zhang X, Gao W (2023) Towards LLM-based fact verification on news claims with a hierarchical step-by-step prompting method. In: Proceedings of the 13th international joint conference on natural language processing and the 3rd conference of the Asia-Pacific chapter of the association for computational linguistics (Volume 1: Long Papers), Nusa Dua, Bali, pp 996\u20131011","DOI":"10.18653\/v1\/2023.ijcnlp-main.64"},{"key":"11454_CR120","unstructured":"Zhang X, Gao W (2024) Reinforcement retrieval leveraging fine-grained feedback for fact checking news claims with black-box LLM. 
In: Proceedings of the 2024 joint international conference on computational linguistics, language resources and evaluation (LREC-COLING 2024), Torino, Italia, pp 13\u00a0861\u201313\u00a0873"},{"key":"11454_CR121","doi-asserted-by":"crossref","unstructured":"Zhang X, Li S, Hauer B, Shi N, Kondrak G (2023) Don\u2019t trust ChatGPT when your question is not in English: a study of multilingual abilities and types of LLMs. In: Proceedings of the 2023 conference on empirical methods in natural language processing (EMNLP), Singapore, pp 7915\u20137927","DOI":"10.18653\/v1\/2023.emnlp-main.491"},{"key":"11454_CR122","unstructured":"Zhao X, Wang L, Wang Z, Cheng H, Zhang R, Wong K-F (2024) Pacar: automated fact-checking with planning and customized action reasoning using large language models. In: Proceedings of the 2024 joint international conference on computational linguistics, language resources and evaluation (LREC-COLING 2024), pp 12\u00a0564\u201312\u00a0573"},{"key":"11454_CR123","doi-asserted-by":"crossref","unstructured":"Zhao X, Yu J, Liu Z, Wang J, Li D, Chen Y, Hu B, Zhang M (2024) Medico: towards hallucination detection and correction with multi-source evidence fusion. In: Proceedings of the 2024 conference on empirical methods in natural language processing: system demonstrations, Miami, Florida, USA, pp 34\u201345","DOI":"10.18653\/v1\/2024.emnlp-demo.4"},{"key":"11454_CR124","first-page":"55 006","volume":"36","author":"C Zhou","year":"2023","unstructured":"Zhou C, Liu P, Xu P, Iyer S, Sun J, Mao Y, Ma X, Efrat A, Yu P, Yu L et al (2023) Lima: Less is more for alignment. 
Adv Neural Inf Process Syst 36:55 006-55 021","journal-title":"Adv Neural Inf Process Syst"}],"container-title":["Artificial Intelligence Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10462-025-11454-w","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-025-11454-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-025-11454-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T05:46:30Z","timestamp":1771479990000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10462-025-11454-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,3]]},"references-count":124,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2026,2]]}},"alternative-id":["11454"],"URL":"https:\/\/doi.org\/10.1007\/s10462-025-11454-w","relation":{},"ISSN":["1573-7462"],"issn-type":[{"value":"1573-7462","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,3]]},"assertion":[{"value":"26 July 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 November 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 January 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"On behalf of all authors, the corresponding author states that there is no Conflict of 
interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval and consent to participate"}},{"value":"Not applicable.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Informed consents"}},{"value":"The authors declare no Conflict of interest.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"70"}}