{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,4]],"date-time":"2026-07-04T10:40:07Z","timestamp":1783161607921,"version":"3.54.6"},"reference-count":53,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,2,7]],"date-time":"2024-02-07T00:00:00Z","timestamp":1707264000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:p>Automated fact-checking, using machine learning to verify claims, has grown vital as misinformation spreads beyond human fact-checking capacity. Large language models (LLMs) like GPT-4 are increasingly trusted to write academic papers, lawsuits, and news articles and to verify information, emphasizing their role in discerning truth from falsehood and the importance of being able to verify their outputs. Understanding the capacities and limitations of LLMs in fact-checking tasks is therefore essential for ensuring the health of our information ecosystem. Here, we evaluate the use of LLM agents in fact-checking by having them phrase queries, retrieve contextual data, and make decisions. Importantly, in our framework, agents explain their reasoning and cite the relevant sources from the retrieved context. Our results show the enhanced prowess of LLMs when equipped with contextual information. GPT-4 outperforms GPT-3, but accuracy varies based on query language and claim veracity. While LLMs show promise in fact-checking, caution is essential due to inconsistent accuracy. Our investigation calls for further research, fostering a deeper comprehension of when agents succeed and when they fail.<\/jats:p>","DOI":"10.3389\/frai.2024.1341697","type":"journal-article","created":{"date-parts":[[2024,2,7]],"date-time":"2024-02-07T05:34:08Z","timestamp":1707284048000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":60,"title":["The perils and promises of fact-checking with large language models"],"prefix":"10.3389","volume":"7","author":[{"given":"Dorian","family":"Quelle","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Alexandre","family":"Bovet","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1965","published-online":{"date-parts":[[2024,2,7]]},"reference":[{"key":"B1","article-title":"\u201cProgress toward \u201cthe Holy Grail\u201d: the continued quest to automate fact-checking,\u201d","author":"Adair","year":"2017"},{"key":"B2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1475","article-title":"Multifc: a real-world multi-domain dataset for evidence-based fact checking of claims","author":"Augenstein","year":"2019","journal-title":"ArXiv"},{"key":"B3","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2302.04023","article-title":"A multitask, multilingual, multimodal evaluation of chatgpt on reasoning, hallucination, and interactivity","author":"Bang","year":"2023","journal-title":"ArXiv"},{"key":"B4","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1038\/s41467-018-07761-2","article-title":"Influence of fake news in Twitter during the 2016 US presidential election","volume":"10","author":"Bovet","year":"2019","journal-title":"Nat. Commun."},{"key":"B5","unstructured":"Language models are few-shot learners18771901\n            BrownT.\n            MannB.\n            RyderN.\n            SubbiahM.\n            KaplanJ. D.\n            DhariwalP.\n          Adv. Neural Inform. Process. Syst.332020"},{"key":"B6","unstructured":"N-gram counts and language models from the common crawl4\n            BuckC.\n            HeafieldK.\n            Van OoyenB.\n          LREC22014"},{"key":"B7","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2306.17176","article-title":"News verifiers showdown: a comparative performance evaluation of chatgpt 3.5, chatgpt 4.0, bing ai, and bard in news fact-checking","author":"Caramancion","year":"2023","journal-title":"ArXiv"},{"key":"B8","unstructured":"ChaseH.\n          Langchain2022"},{"key":"B9","doi-asserted-by":"publisher","DOI":"10.2139\/ssrn.4614239","article-title":"Automated claim matching with large language models: empowering fact-checkers in the fight against misinformation","author":"Choi","year":"2023","journal-title":"ArXiv"},{"key":"B10","doi-asserted-by":"publisher","first-page":"e47184","DOI":"10.2196\/47184","article-title":"Investigating the impact of user trust on the adoption and use of chatgpt: survey analysis","volume":"25","author":"Choudhury","year":"2023","journal-title":"J. Med. Internet Res."},{"key":"B11","doi-asserted-by":"publisher","first-page":"15","DOI":"10.3145\/epi.2023.sep.15","article-title":"Retraining fact-checkers: the emergence of chatgpt in information verification","volume":"2023","author":"Cuartielles Saura","year":"2023","journal-title":"UPF Digit. Reposit."},{"key":"B12","doi-asserted-by":"publisher","first-page":"103219","DOI":"10.1016\/j.ipm.2022.103219","article-title":"The state of human-centered NLP technology for fact-checking","volume":"60","author":"Das","year":"2023","journal-title":"Inform. Process. Manag."},{"key":"B13","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2305.12477","article-title":"Gpt-3.5 vs. gpt-4: evaluating chatgpt's reasoning performance in zero-shot learning","author":"Espejel","year":"2023","journal-title":"ArXiv"},{"key":"B14","doi-asserted-by":"publisher","first-page":"904","DOI":"10.1038\/s41562-023-01550-8","article-title":"Political polarization of news media and influencers on Twitter in the 2016 and 2020 US presidential elections","volume":"7","author":"Flamino","year":"2023","journal-title":"Nat. Hum. Behav."},{"key":"B15","doi-asserted-by":"crossref","first-page":"845","DOI":"10.18653\/v1\/S19-2147","article-title":"\u201cSemEval-2019 task 7: RumourEval, determining rumour veracity and support for rumours,\u201d","volume-title":"Proceedings of the 13th International Workshop on Semantic Evaluation","author":"Gorrell","year":"2019"},{"key":"B16","article-title":"\u201cThe rise of fact-checking sites in Europe,\u201d","volume-title":"Digital News Project Report, Reuters Institute for the Study of Journalism","author":"Graves","year":"2016"},{"key":"B17","doi-asserted-by":"publisher","first-page":"374","DOI":"10.1126\/science.aau2706","article-title":"Fake news on Twitter during the 2016 U.S. presidential election","volume":"363","author":"Grinberg","year":"2019","journal-title":"Science"},{"key":"B18","doi-asserted-by":"publisher","first-page":"178","DOI":"10.1162\/tacl_a_00454","article-title":"A survey on automated fact-checking","volume":"10","author":"Guo","year":"2021","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"B19","article-title":"\u201cThe quest to automate fact-checking,\u201d","author":"Hassan","year":"2015","journal-title":"Proceedings of the 2015 Computation+ Journalism Symposium"},{"key":"B20","doi-asserted-by":"publisher","first-page":"1945","DOI":"10.14778\/3137765.3137815","article-title":"Claimbuster: the first-ever end-to-end fact-checking system","volume":"10","author":"Hassan","year":"2017","journal-title":"Proc. VLDB Endowment"},{"key":"B21","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2111.09543","article-title":"Debertav3: improving deberta using electra-style pre-training with gradient-disentangled embedding sharing","author":"He","year":"2021","journal-title":"ArXiv"},{"key":"B22","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2006.03654","article-title":"Deberta: decoding-enhanced bert with disentangled attention","author":"He","year":"2020","journal-title":"arXiv preprint arXiv:2006.03654"},{"key":"B23","doi-asserted-by":"publisher","DOI":"10.31234\/osf.io\/qnjkf","article-title":"Leveraging chatgpt for efficient fact-checking","author":"Hoes","year":"2023","journal-title":"PsyArXiv"},{"key":"B24","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2301.08745","article-title":"Is chatgpt a good translator? Yes with gpt-4 as the engine","author":"Jiao","year":"2023","journal-title":"ArXiv"},{"key":"B25","article-title":"\u201cMatching tweets with applicable fact-checks across languages,\u201d","volume-title":"CEUR Workshop Proceedings","author":"Kazemi","year":"2022"},{"key":"B26","first-page":"2","article-title":"\u201cBert: pre-training of deep bidirectional transformers for language understanding,\u201d","volume-title":"Proceedings of naacL-HLT","author":"Kenton","year":"2019"},{"key":"B27","article-title":"\u201cOverview of the clef-2022 checkthat! lab task 3 on fake news detection,\u201d","volume-title":"CEUR Workshop Proceedings","author":"K\u00f6hler","year":"2022"},{"key":"B28","doi-asserted-by":"crossref","first-page":"5430","DOI":"10.18653\/v1\/2020.coling-main.474","article-title":"\u201cExplainable automated fact-checking: a survey,\u201d","volume-title":"Proceedings of the 28th International Conference on Computational Linguistics","author":"Kotonya","year":"2020"},{"key":"B29","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1907.11692","article-title":"Roberta: a robustly optimized bert pretraining approach","author":"Liu","year":"2019","journal-title":"ArXiv"},{"key":"B30","volume-title":"Fact-Checking 101","author":"Mantzarlis","year":"2018"},{"key":"B31","first-page":"651","article-title":"\u201cCross-platform multimodal misinformation: taxonomy, characteristics and detection for textual posts and videos,\u201d","volume-title":"Proceedings of the International AAAI Conference on Web and Social Media","author":"Micallef","year":"2022"},{"key":"B32","unstructured":"MisraR.\n          Politifact Fact Check Dataset2022"},{"key":"B33","doi-asserted-by":"publisher","first-page":"986","DOI":"10.1080\/21565503.2020.1803935","article-title":"A fake news inoculation? fact checkers, partisan identification, and the power of misinformation","volume":"8","author":"Morris","year":"2020","journal-title":"Polit. Gr. Ident."},{"key":"B34","article-title":"\u201cOverview of the clef-2022 checkthat! lab task 1 on identifying relevant claims in tweets,\u201d","volume-title":"CEUR Workshop Proceedings","author":"Nakov","year":""},{"key":"B35","first-page":"495","article-title":"\u201cOverview of the clef\u20132022 checkthat! lab on fighting the covid-19 infodemic and fake news detection,\u201d","volume-title":"International Conference of the Cross-Language Evaluation Forum for European Languages","author":"Nakov","year":""},{"key":"B36","article-title":"\u201cOverview of the clef-2022 checkthat! lab task 2 on detecting previously fact-checked claims,\u201d","volume-title":"CEUR Workshop Proceedings","author":"Nakov","year":""},{"key":"B37","volume-title":"Estimating Fact-Checking's Effects","author":"Nyhan","year":"2015"},{"key":"B38","doi-asserted-by":"publisher","first-page":"e2104235118","DOI":"10.1073\/pnas.2104235118","article-title":"The global effectiveness of fact-checking: evidence from simultaneous experiments in Argentina, Nigeria, South Africa, and the United Kingdom","volume":"118","author":"Porter","year":"2021","journal-title":"Proc. Natl. Acad. Sci. U. S. A."},{"key":"B39","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2310.18089","article-title":"Lost in translation\u2013multilingual misinformation and its evolution","author":"Quelle","year":"2023","journal-title":"arXiv preprint arXiv:2310.18089"},{"key":"B40","first-page":"2931","article-title":"\u201cTruth of varying shades: analyzing language in fake news and political fact-checking,\u201d","volume-title":"Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing","author":"Rashkin","year":"2017"},{"key":"B41","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1561\/1500000019","article-title":"The probabilistic relevance framework: Bm25 and beyond","volume":"3","author":"Robertson","year":"2009","journal-title":"Found. Trends Inform. Retriev."},{"key":"B42","article-title":"\u201cOpenfact at checkthat! 2023: head-to-head gpt vs. bert-a comparative study of transformers language models for the detection of check-worthy claims,\u201d","volume-title":"CEUR Workshop Proceedings","author":"Sawi\u0144ski","year":"2023"},{"key":"B43","doi-asserted-by":"crossref","first-page":"3607","DOI":"10.18653\/v1\/2020.acl-main.332","article-title":"\u201cThat is a known lie: detecting previously fact-checked claims,\u201d","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Shaar","year":"2020"},{"key":"B44","doi-asserted-by":"crossref","DOI":"10.37016\/mr-2020-69","volume-title":"How COVID Drove the Evolution of Fact-Checking","author":"Siwakoti","year":"2021"},{"key":"B45","first-page":"809","article-title":"\u201cFEVER: a large-scale dataset for fact extraction and VERification,\u201d","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)","author":"Thorne","year":""},{"key":"B46","first-page":"1","article-title":"\u201cThe fact extraction and VERification (FEVER) shared task,\u201d","volume-title":"Proceedings of the First Workshop on Fact Extraction and VERification (FEVER)","author":"Thorne","year":""},{"key":"B47","first-page":"5998","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inform. Process. Syst."},{"key":"B48","doi-asserted-by":"crossref","first-page":"7534","DOI":"10.18653\/v1\/2020.emnlp-main.609","article-title":"\u201cFact or fiction: verifying scientific claims,\u201d","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Wadden","year":"2020"},{"key":"B49","doi-asserted-by":"publisher","first-page":"108","DOI":"10.1561\/1900000064","article-title":"Machine knowledge: creation and curation of comprehensive knowledge bases","volume":"10","author":"Weikum","year":"2020","journal-title":"Found. Trends Databases"},{"key":"B50","year":"2024","journal-title":"The Global Risk Report"},{"key":"B51","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2210.03629","article-title":"React: synergizing reasoning and acting in language models","author":"Yao","year":"2023","journal-title":"ArXiv"},{"key":"B52","doi-asserted-by":"publisher","first-page":"e12438","DOI":"10.1111\/lnc3.12438","article-title":"Automated fact-checking: a survey","volume":"15","author":"Zeng","year":"2021","journal-title":"Lang. Linguist. Compass"},{"key":"B53","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2304.04675","article-title":"Multilingual machine translation with large language models: empirical results and analysis","author":"Zhu","year":"2023","journal-title":"ArXiv"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2024.1341697\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,7]],"date-time":"2024-02-07T05:34:30Z","timestamp":1707284070000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2024.1341697\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,7]]},"references-count":53,"alternative-id":["10.3389\/frai.2024.1341697"],"URL":"https:\/\/doi.org\/10.3389\/frai.2024.1341697","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,2,7]]},"article-number":"1341697"}}