{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T10:07:35Z","timestamp":1775815655418,"version":"3.50.1"},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"10","license":[{"start":{"date-parts":[[2023,10,19]],"date-time":"2023-10-19T00:00:00Z","timestamp":1697673600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,10,19]],"date-time":"2023-10-19T00:00:00Z","timestamp":1697673600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Nat Mach Intell"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Verifiability is a core content policy of Wikipedia: claims need to be backed by citations. Maintaining and improving the quality of Wikipedia references is an important challenge and there is a pressing need for better tools to assist humans in this effort. We show that the process of improving references can be tackled with the help of artificial intelligence (AI) powered by an information retrieval system and a language model. This neural-network-based system, which we call SIDE, can identify Wikipedia citations that are unlikely to support their claims, and subsequently recommend better ones from the web. We train this model on existing Wikipedia references, therefore learning from the contributions and combined wisdom of thousands of Wikipedia editors. Using crowdsourcing, we observe that for the top 10% most likely citations to be tagged as unverifiable by our system, humans prefer our system\u2019s suggested alternatives compared with the originally cited reference 70% of the time. 
To validate the applicability of our system, we built a demo to engage with the English-speaking Wikipedia community and find that SIDE\u2019s first citation recommendation is preferred twice as often as the existing Wikipedia citation for the same top 10% most likely unverifiable claims according to SIDE. Our results indicate that an AI-based system could be used, in tandem with humans, to improve the verifiability of Wikipedia.<\/jats:p>","DOI":"10.1038\/s42256-023-00726-1","type":"journal-article","created":{"date-parts":[[2023,10,19]],"date-time":"2023-10-19T12:02:20Z","timestamp":1697716940000},"page":"1142-1148","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Improving Wikipedia verifiability with AI"],"prefix":"10.1038","volume":"5","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-5996-9830","authenticated-orcid":false,"given":"Fabio","family":"Petroni","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Samuel","family":"Broscheit","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Aleksandra","family":"Piktus","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Patrick","family":"Lewis","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gautier","family":"Izacard","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lucas","family":"Hosseini","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jane","family":"Dwivedi-Yu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Maria","family":"Lomeli","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Timo","family":"Schick","sequen
ce":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michele","family":"Bevilacqua","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pierre-Emmanuel","family":"Mazar\u00e9","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Armand","family":"Joulin","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Edouard","family":"Grave","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sebastian","family":"Riedel","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,10,19]]},"reference":[{"key":"726_CR1","unstructured":"Top websites ranking. similarweb https:\/\/www.similarweb.com\/top-websites\/ (2023). Accessed 28 September 2023."},{"key":"726_CR2","unstructured":"Statistics. Wikimedia https:\/\/stats.wikimedia.org\/#\/all-projects\/reading\/total-page-views\/normal|bar\u22232-year\u2223~total\u2223monthly (2023). Accessed 28 September 2023."},{"key":"726_CR3","unstructured":"Verifiability. Wikipedia https:\/\/en.wikipedia.org\/wiki\/Wikipedia:Verifiability (2023). Accessed 28 September 2023."},{"key":"726_CR4","doi-asserted-by":"crossref","unstructured":"Piccardi, T., Redi, M., Colavizza, G. & West, R. Quantifying engagement with citations on Wikipedia. In Proc. Web Conference 2020 2365\u20132376 (2020).","DOI":"10.1145\/3366423.3380300"},{"key":"726_CR5","doi-asserted-by":"publisher","first-page":"263","DOI":"10.3390\/info11050263","volume":"11","author":"W Lewoniewski","year":"2020","unstructured":"Lewoniewski, W., W\u0119cel, K. & Abramowicz, W. Modeling popularity and reliability of sources in multilingual Wikipedia. 
Information 11, 263 (2020).","journal-title":"Information"},{"key":"726_CR6","doi-asserted-by":"crossref","unstructured":"Kaffee, L.-A. & Elsahar, H. References in Wikipedia: the editors\u2019 perspective. In Companion Proc. Web Conference 2021 535\u2013538 (2021).","DOI":"10.1145\/3442442.3452337"},{"key":"726_CR7","doi-asserted-by":"crossref","unstructured":"Bowman, S. R., Angeli, G., Potts, C. & Manning, C. D. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing 632\u2013642 (Association for Computational Linguistics, 2015).","DOI":"10.18653\/v1\/D15-1075"},{"key":"726_CR8","doi-asserted-by":"crossref","unstructured":"Wang, A. et al. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP 353\u2013355 (Association for Computational Linguistics, 2018).","DOI":"10.18653\/v1\/W18-5446"},{"key":"726_CR9","unstructured":"Camburu, O. M., Rockt\u00e4schel, T., Lukasiewicz, T. & Blunsom, P. e-snli: Natural language inference with natural language explanations. Adv. Neural Inf. Process. Syst. 31 (2018)."},{"key":"726_CR10","doi-asserted-by":"crossref","unstructured":"Nie, Y. et al. Adversarial NLI: A New Benchmark for Natural Language Understanding. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 4885\u20134901 (Association for Computational Linguistics, 2020).","DOI":"10.18653\/v1\/2020.acl-main.441"},{"key":"726_CR11","unstructured":"P\u00e9rez-Rosas, V., Kleinberg, B., Lefevre, A. & Mihalcea, R. Automatic detection of fake news. In Proceedings of the 27th International Conference on Computational Linguistics 3391\u20133401 (Association for Computational Linguistics, 2018)."},{"key":"726_CR12","doi-asserted-by":"crossref","unstructured":"Thorne, J., Vlachos, A., Christodoulopoulos, C. 
& Mittal, A. FEVER: a large-scale dataset for Fact Extraction and VERification. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies Vol. 1, 809\u2013819 (Association for Computational Linguistics, 2018).","DOI":"10.18653\/v1\/N18-1074"},{"key":"726_CR13","unstructured":"Thorne, J. & Vlachos, A. Automated fact checking: Task formulations, methods and future directions. In Proceedings of the 27th International Conference on Computational Linguistics 3346\u20133359 (Association for Computational Linguistics, 2018)."},{"key":"726_CR14","doi-asserted-by":"publisher","unstructured":"Piktus, A. et al. The web is your oyster - knowledge-intensive NLP against a very large web corpus. Preprint at https:\/\/doi.org\/10.48550\/arXiv.2112.09924 (2021).","DOI":"10.48550\/arXiv.2112.09924"},{"key":"726_CR15","doi-asserted-by":"crossref","unstructured":"Mao, Y. et al. Generation-augmented retrieval for open-domain question answering. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing Vol. 1, 4089\u20134100 (Association for Computational Linguistics, 2021).","DOI":"10.18653\/v1\/2021.acl-long.316"},{"key":"726_CR16","doi-asserted-by":"crossref","unstructured":"Lewis, M. et al. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 7871\u20137880 (Association for Computational Linguistics, 2020).","DOI":"10.18653\/v1\/2020.acl-main.703"},{"key":"726_CR17","doi-asserted-by":"crossref","unstructured":"Robertson, S. E. et al. Okapi at TREC-3 (National Institute of Standards and Technology, 1995).","DOI":"10.6028\/NIST.SP.500-225.routing-city"},{"key":"726_CR18","unstructured":"Baeza-Yates, R. et al. 
Modern Information Retrieval (Association for Computing Machinery, 1999)."},{"key":"726_CR19","doi-asserted-by":"crossref","unstructured":"Manning, C. D., Raghavan, P. & Sch\u00fctze, H. Introduction to Information Retrieval Vol. 39 (Cambridge Univ. Press, 2008).","DOI":"10.1017\/CBO9780511809071"},{"key":"726_CR20","doi-asserted-by":"crossref","unstructured":"Robertson, S. & Zaragoza, H. The Probabilistic Relevance Framework: BM25 and Beyond (Now Publishers, 2009).","DOI":"10.1561\/1500000019"},{"key":"726_CR21","doi-asserted-by":"crossref","unstructured":"Lin, J. et al. Pyserini: A Python toolkit for reproducible information retrieval research with sparse and dense representations. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21) 2356\u20132362 (Association for Computing Machinery, 2021).","DOI":"10.1145\/3404835.3463238"},{"key":"726_CR22","doi-asserted-by":"crossref","unstructured":"Wu, L., Petroni, F., Josifoski, M., Riedel, S. & Zettlemoyer, L. Scalable zero-shot entity linking with dense entity retrieval. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) 6397\u20136407 (Association for Computational Linguistics, 2020).","DOI":"10.18653\/v1\/2020.emnlp-main.519"},{"key":"726_CR23","doi-asserted-by":"crossref","unstructured":"Karpukhin, V. et al. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) 6769\u20136781 (Association for Computational Linguistics, 2020).","DOI":"10.18653\/v1\/2020.emnlp-main.550"},{"key":"726_CR24","doi-asserted-by":"crossref","unstructured":"Maillard, J. et al. Multi-task retrieval for knowledge-intensive tasks. In Proc. 
59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing 1098\u20131111 (Association for Computational Linguistics, 2021).","DOI":"10.18653\/v1\/2021.acl-long.89"},{"key":"726_CR25","doi-asserted-by":"crossref","unstructured":"O\u011fuz, B. et al. Domain-matched pre-training tasks for dense retrieval. In Findings of the Association for Computational Linguistics: NAACL 2022 1524\u20131534 (Association for Computational Linguistics, 2022).","DOI":"10.18653\/v1\/2022.findings-naacl.114"},{"key":"726_CR26","first-page":"329","volume":"9","author":"Y Luan","year":"2021","unstructured":"Luan, Y., Eisenstein, J., Toutanova, K. & Collins, M. Sparse, dense, and attentional representations for text retrieval. Trans. Assoc. Comput. Ling. 9, 329\u2013345 (2021).","journal-title":"Trans. Assoc. Comput. Ling."},{"key":"726_CR27","unstructured":"Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 4171\u20134186 (Association for Computational Linguistics, 2019)."},{"key":"726_CR28","doi-asserted-by":"crossref","unstructured":"MacCartney, B. & Manning, C. D. Modeling semantic containment and exclusion in natural language inference. In Proc. 22nd International Conference on Computational Linguistics (Coling 2008) 521\u2013528 (Coling 2008 Organizing Committee, 2008).","DOI":"10.3115\/1599081.1599147"},{"key":"726_CR29","doi-asserted-by":"crossref","unstructured":"Seo, M. et al. Real-time open-domain question answering with dense-sparse phrase index. In Proc. 
57th Annual Meeting of the Association for Computational Linguistics 4430\u20134441 (Association for Computational Linguistics, 2019).","DOI":"10.18653\/v1\/P19-1436"},{"key":"726_CR30","doi-asserted-by":"publisher","unstructured":"Petroni, F. et al. Improving Wikipedia verifiability with AI. Zenodo https:\/\/doi.org\/10.5281\/zenodo.8252866 (2022).","DOI":"10.5281\/zenodo.8252866"}],"container-title":["Nature Machine Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s42256-023-00726-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s42256-023-00726-1","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s42256-023-00726-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,31]],"date-time":"2024-10-31T01:20:41Z","timestamp":1730337641000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s42256-023-00726-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,19]]},"references-count":30,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2023,10]]}},"alternative-id":["726"],"URL":"https:\/\/doi.org\/10.1038\/s42256-023-00726-1","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-2116541\/v1","asserted-by":"object"}]},"ISSN":["2522-5839"],"issn-type":[{"value":"2522-5839","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,10,19]]},"assertion":[{"value":"29 September 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 September 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 October 
2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}