{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,24]],"date-time":"2026-06-24T03:52:57Z","timestamp":1782273177081,"version":"3.54.5"},"reference-count":52,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,8,8]],"date-time":"2025-08-08T00:00:00Z","timestamp":1754611200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,8,8]],"date-time":"2025-08-08T00:00:00Z","timestamp":1754611200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Clinical evidence synthesis largely relies on systematic reviews (SR) of clinical studies from medical literature. Here, we propose a generative artificial intelligence (AI) pipeline named  to streamline study search, study screening, and data extraction tasks in SR. We chose published SRs to build , which contains 100 SRs and 2,220 clinical studies. For study search, it achieves high recall rates (Ours 0.711\u20130.834 v.s. Human baseline 0.138\u20130.232). For study screening,  beats previous document ranking methods in a 1.5\u20132.6 fold change. For data extraction, it outperforms a GPT-4\u2019s accuracy by 16\u201332%. In a pilot study, human-AI collaboration with  improved recall by 71.4% and reduced screening time by 44.2%, while in data extraction, accuracy increased by 23.5% with a 63.4% time reduction. Medical experts preferred \u2019s synthesized evidence over GPT-4\u2019s in 62.5%-100% of cases. These findings show the promise of accelerating clinical evidence synthesis driven by human-AI collaboration.<\/jats:p>","DOI":"10.1038\/s41746-025-01840-7","type":"journal-article","created":{"date-parts":[[2025,8,7]],"date-time":"2025-08-07T23:00:30Z","timestamp":1754607630000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":39,"title":["Accelerating clinical evidence synthesis with large language models"],"prefix":"10.1038","volume":"8","author":[{"given":"Zifeng","family":"Wang","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lang","family":"Cao","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Benjamin","family":"Danek","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Qiao","family":"Jin","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zhiyong","family":"Lu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jimeng","family":"Sun","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2025,8,8]]},"reference":[{"key":"1840_CR1","doi-asserted-by":"publisher","first-page":"383","DOI":"10.1038\/d41586-021-03690-1","volume":"600","author":"J Elliott","year":"2021","unstructured":"Elliott, J. et al. Decision makers need constantly updated evidence synthesis. Nature 600, 383\u2013385 (2021).","journal-title":"Nature"},{"key":"1840_CR2","doi-asserted-by":"publisher","first-page":"665","DOI":"10.1348\/000711010X502733","volume":"63","author":"AP Field","year":"2010","unstructured":"Field, A. P. & Gillett, R. How to do a meta-analysis. Br. J. Math. Stat. Psychol. 63, 665\u2013694 (2010).","journal-title":"Br. J. Math. Stat. Psychol."},{"key":"1840_CR3","doi-asserted-by":"crossref","unstructured":"Concato, J., Shah, N. & Horwitz, R. I. Randomized, controlled trials, observational studies, and the hierarchy of research designs. In Research Ethics, 207\u2013212 (Routledge, 2017).","DOI":"10.4324\/9781315244426-20"},{"key":"1840_CR4","doi-asserted-by":"publisher","first-page":"e012545","DOI":"10.1136\/bmjopen-2016-012545","volume":"7","author":"R Borah","year":"2017","unstructured":"Borah, R., Brown, A. W., Capers, P. L. & Kaiser, K. A. Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the Prospero registry. BMJ Open 7, e012545 (2017).","journal-title":"BMJ Open"},{"key":"1840_CR5","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1111\/jebm.12447","volume":"14","author":"BD Hoffmeyer","year":"2021","unstructured":"Hoffmeyer, B. D., Andersen, M. Z., Fonnes, S. & Rosenberg, J. Most Cochrane reviews have not been updated for more than 5 years. J. Evid. Based Med. 14, 181\u2013184 (2021).","journal-title":"J. Evid. Based Med."},{"key":"1840_CR6","unstructured":"Medline PubMed production statistics. https:\/\/www.nlm.nih.gov\/bsd\/medline_pubmed_production_stats.html. Accessed: 2024-09-11."},{"key":"1840_CR7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13643-019-1074-9","volume":"8","author":"IJ Marshall","year":"2019","unstructured":"Marshall, I. J. & Wallace, B. C. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Syst. Rev. 8, 1\u201310 (2019).","journal-title":"Syst. Rev."},{"key":"1840_CR8","first-page":"1877","volume":"33","author":"T Brown","year":"2020","unstructured":"Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877\u20131901 (2020).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"1840_CR9","doi-asserted-by":"crossref","unstructured":"Wang, S., Scells, H., Koopman, B. & Zuccon, G. Can chatgpt write a good boolean query for systematic review literature search? In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1426\u20131436 (2023).","DOI":"10.1145\/3539618.3591703"},{"key":"1840_CR10","doi-asserted-by":"publisher","first-page":"ooae098","DOI":"10.1093\/jamiaopen\/ooae098","volume":"7","author":"GP Adam","year":"2024","unstructured":"Adam, G. P. et al. Literature search sandbox: a large language model that generates search queries for systematic reviews. JAMIA open 7, ooae098 (2024).","journal-title":"JAMIA open"},{"key":"1840_CR11","unstructured":"Wadhwa, S., DeYoung, J., Nye, B., Amir, S. & Wallace, B. C. Jointly extracting interventions, outcomes, and findings from RCT reports with LLMs. In Machine Learning for Healthcare Conference, 754\u2013771 (PMLR, 2023)."},{"key":"1840_CR12","doi-asserted-by":"publisher","first-page":"1163","DOI":"10.1093\/jamia\/ocae065","volume":"31","author":"G Zhang","year":"2024","unstructured":"Zhang, G. et al. A span-based model for extracting overlapping pico entities from randomized controlled trial publications. J. Am. Med. Inform. Assoc. 31, 1163\u20131171 (2024).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"1840_CR13","doi-asserted-by":"publisher","first-page":"576","DOI":"10.1002\/jrsm.1710","volume":"15","author":"G Gartlehner","year":"2024","unstructured":"Gartlehner, G. et al. Data extraction for evidence synthesis using a large language model: A proof-of-concept study. Res. Synth. Methods 15, 576\u2013589 (2024).","journal-title":"Res. Synth. Methods"},{"key":"1840_CR14","doi-asserted-by":"publisher","first-page":"818","DOI":"10.1002\/jrsm.1732","volume":"15","author":"A Konet","year":"2024","unstructured":"Konet, A. et al. Performance of two large language models for data extraction in evidence synthesis. Res. Synth. Methods 15, 818\u2013824 (2024).","journal-title":"Res. Synth. Methods"},{"key":"1840_CR15","doi-asserted-by":"crossref","unstructured":"Syriani, E., David, I. & Kumar, G. Screening articles for systematic reviews with ChatGPT. Journal of Computer Languages, 80, 101287 (2024).","DOI":"10.1016\/j.cola.2024.101287"},{"key":"1840_CR16","doi-asserted-by":"publisher","first-page":"893","DOI":"10.1093\/jamia\/ocaf050","volume":"32","author":"R Sanghera","year":"2025","unstructured":"Sanghera, R. et al. High-performance automated abstract screening with large language model ensembles. J. Am. Med. Inform. Assoc. 32, 893\u2013904 (2025).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"1840_CR17","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12874-025-02583-5","volume":"25","author":"F Trad","year":"2025","unstructured":"Trad, F. et al. Streamlining systematic reviews with large language models using prompt engineering and retrieval augmented generation. BMC Med. Res. Methodol. 25, 1\u20139 (2025).","journal-title":"BMC Med. Res. Methodol."},{"key":"1840_CR18","doi-asserted-by":"crossref","unstructured":"Shaib, C. et al. Summarizing, simplifying, and synthesizing medical evidence using GPT-3 (with varying success). In The 61st Annual Meeting Of The Association For Computational Linguistics (2023).","DOI":"10.18653\/v1\/2023.acl-short.119"},{"key":"1840_CR19","first-page":"605","volume":"2021","author":"BC Wallace","year":"2021","unstructured":"Wallace, B. C., Saha, S., Soboczenski, F. & Marshall, I. J. Generating (factual?) narrative summaries of RCTs: experiments with neural multi-document summarization. AMIA Summits Transl. Sci. Proc. 2021, 605 (2021).","journal-title":"AMIA Summits Transl. Sci. Proc."},{"key":"1840_CR20","doi-asserted-by":"publisher","DOI":"10.1038\/s41746-024-01239-w","volume":"7","author":"G Zhang","year":"2024","unstructured":"Zhang, G. et al. Closing the gap between open source and commercial large language models for medical evidence summarization. npj Digit. Med. 7, 239 (2024).","journal-title":"npj Digit. Med."},{"key":"1840_CR21","doi-asserted-by":"publisher","first-page":"1593","DOI":"10.1038\/s41591-023-02366-9","volume":"29","author":"Y Peng","year":"2023","unstructured":"Peng, Y., Rousseau, J. F., Shortliffe, E. H. & Weng, C. AI-generated text may have a role in evidence-based medicine. Nat. Med. 29, 1593\u20131594 (2023).","journal-title":"Nat. Med."},{"key":"1840_CR22","doi-asserted-by":"publisher","first-page":"115","DOI":"10.3390\/biomedinformatics3010009","volume":"3","author":"SC Christopoulou","year":"2023","unstructured":"Christopoulou, S. C. Towards automated meta-analysis of clinical trials: an overview. BioMedInformatics 3, 115\u2013140 (2023).","journal-title":"BioMedInformatics"},{"key":"1840_CR23","doi-asserted-by":"crossref","unstructured":"Page, M. J. et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Bmj 372, n71 (2021).","DOI":"10.1136\/bmj.n71"},{"key":"1840_CR24","doi-asserted-by":"publisher","first-page":"e56780","DOI":"10.2196\/56780","volume":"26","author":"X Luo","year":"2024","unstructured":"Luo, X. et al. Potential roles of large language models in the production of systematic reviews and meta-analyses. J. Med. Internet Res. 26, e56780 (2024).","journal-title":"J. Med. Internet Res."},{"key":"1840_CR25","doi-asserted-by":"crossref","unstructured":"Lieberum, J.-L. et al. Large language models for conducting systematic reviews: on the rise, but not yet ready for use\u2013a scoping review. J. Clin. Epidemiol. 181, 111746 (2025).","DOI":"10.1016\/j.jclinepi.2025.111746"},{"key":"1840_CR26","doi-asserted-by":"crossref","unstructured":"Yun, H., Marshall, I., Trikalinos, T. & Wallace, B. C. Appraising the potential uses and harms of LLMs for medical systematic reviews. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 10122\u201310139 (2023).","DOI":"10.18653\/v1\/2023.emnlp-main.626"},{"key":"1840_CR27","unstructured":"National Cancer Institute. Types of cancer treatment. https:\/\/www.cancer.gov\/about-cancer\/treatment\/types. Accessed: 2024-04-24."},{"key":"1840_CR28","doi-asserted-by":"publisher","first-page":"D267","DOI":"10.1093\/nar\/gkh061","volume":"32","author":"O Bodenreider","year":"2004","unstructured":"Bodenreider, O. The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Res. 32, D267\u2013D270 (2004).","journal-title":"Nucleic Acids Res."},{"key":"1840_CR29","unstructured":"Song, K., Tan, X., Qin, T., Lu, J. & Liu, T.-Y. Mpnet: Masked and permuted pre-training for language understanding 2004.09297 (2020)."},{"key":"1840_CR30","doi-asserted-by":"publisher","first-page":"btad651","DOI":"10.1093\/bioinformatics\/btad651","volume":"39","author":"Q Jin","year":"2023","unstructured":"Jin, Q. et al. Medcpt: Contrastive pre-trained transformers with large-scale PubMed search logs for zero-shot biomedical information retrieval. Bioinformatics 39, btad651 (2023).","journal-title":"Bioinformatics"},{"key":"1840_CR31","unstructured":"Deeks, J. J. & Higgins, J. P. Statistical algorithms in review manager 5. Statistical Methods Group of the Cochrane Collaboration. vol. 1 (2010)."},{"key":"1840_CR32","doi-asserted-by":"publisher","first-page":"1461","DOI":"10.1001\/jama.286.12.1461","volume":"286","author":"PG Shekelle","year":"2001","unstructured":"Shekelle, P. G. et al. Validity of the agency for healthcare research and quality clinical practice guidelines: How quickly do guidelines become outdated? JAMA 286, 1461\u20131467 (2001).","journal-title":"JAMA"},{"key":"1840_CR33","doi-asserted-by":"publisher","first-page":"S2","DOI":"10.1038\/d41586-024-00753-x","volume":"627","author":"M Hutson","year":"2024","unstructured":"Hutson, M. How AI is being used to accelerate clinical trials. Nature 627, S2\u2013S5 (2024).","journal-title":"Nature"},{"key":"1840_CR34","unstructured":"Wang, Z., Theodorou, B., Fu, T., Xiao, C. & Sun, J. Pytrial: Machine learning software and benchmark for clinical trial applications. arXiv preprint arXiv:2306.04018 (2023)."},{"key":"1840_CR35","doi-asserted-by":"crossref","unstructured":"Jin, Q., Leaman, R. & Lu, Z. Pubmed and beyond: biomedical literature search in the age of artificial intelligence. Ebiomedicine100 (2024).","DOI":"10.1016\/j.ebiom.2024.104988"},{"key":"1840_CR36","doi-asserted-by":"crossref","unstructured":"Scells, H. et al. A test collection for evaluating retrieval of studies for inclusion in systematic reviews. In Proc. of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1237\u20131240 (2017).","DOI":"10.1145\/3077136.3080707"},{"key":"1840_CR37","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-11-55","volume":"11","author":"BC Wallace","year":"2010","unstructured":"Wallace, B. C., Trikalinos, T. A., Lau, J., Brodley, C. & Schmid, C. H. Semi-automated screening of biomedical citations for systematic reviews. BMC Bioinforma. 11, 1\u201311 (2010).","journal-title":"BMC Bioinforma."},{"key":"1840_CR38","unstructured":"Kanoulas, E., Li, D., Azzopardi, L. & Spijker, R. Clef 2018 technologically assisted reviews in empirical medicine overview. In CEUR workshop proceedings, vol. 2125 (2018)."},{"key":"1840_CR39","unstructured":"Trikalinos, T. et al. Large scale empirical evaluation of machine learning for semi-automating citation screening in systematic reviews. In 41st Annual Meeting of the Society for Medical Decision Making (SMDM, 2019)."},{"key":"1840_CR40","doi-asserted-by":"publisher","first-page":"e35568","DOI":"10.2196\/35568","volume":"25","author":"S \u0160uster","year":"2023","unstructured":"\u0160uster, S. et al. Automating quality assessment of medical evidence in systematic reviews: model development and validation study. J. Med. Internet Res. 25, e35568 (2023).","journal-title":"J. Med. Internet Res."},{"key":"1840_CR41","unstructured":"Yun, H. S., Pogrebitskiy, D., Marshall, I. J. & Wallace, B. C. Automatically extracting numerical results from randomized controlled trials with large language models. In Machine Learning for Healthcare Conference. PMLR. (2024)."},{"key":"1840_CR42","doi-asserted-by":"crossref","unstructured":"Schmidt, L. et al. Data extraction methods for systematic review (semi) automation: update of a living systematic review. F1000Research 10, 401 (2021).","DOI":"10.12688\/f1000research.51117.1"},{"key":"1840_CR43","doi-asserted-by":"crossref","unstructured":"Zhang, G. et al. Leveraging generative AI for clinical evidence synthesis needs to ensure trustworthiness. J. Biomed. Inform. 153, 104640 (2024).","DOI":"10.1016\/j.jbi.2024.104640"},{"key":"1840_CR44","doi-asserted-by":"crossref","unstructured":"Joseph, S. A. et al. Factpico: Factuality evaluation for plain language summarization of medical evidence. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 8437-8464) (2024).","DOI":"10.18653\/v1\/2024.acl-long.459"},{"key":"1840_CR45","doi-asserted-by":"crossref","unstructured":"Ramprasad, S., Mcinerney, J., Marshall, I. & Wallace, B. C. Automatically summarizing evidence from clinical trials: A prototype highlighting current challenges. In Proc. of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, 236\u2013247 (2023).","DOI":"10.18653\/v1\/2023.eacl-demo.27"},{"key":"1840_CR46","doi-asserted-by":"publisher","first-page":"e53164","DOI":"10.2196\/53164","volume":"26","author":"M Chelli","year":"2024","unstructured":"Chelli, M. et al. Hallucination rates and reference accuracy of ChatGPT and Bard for systematic reviews: comparative analysis. J. Med. Internet Res. 26, e53164 (2024).","journal-title":"J. Med. Internet Res."},{"key":"1840_CR47","doi-asserted-by":"crossref","unstructured":"Spillias, S. et al. Human-AI collaboration to identify literature for evidence synthesis. Cell Rep. Sustain. 1, 100132 (2023).","DOI":"10.1016\/j.crsus.2024.100132"},{"key":"1840_CR48","first-page":"9459","volume":"33","author":"P Lewis","year":"2020","unstructured":"Lewis, P. et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv. Neural Inf. Process. Syst. 33, 9459\u20139474 (2020).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"1840_CR49","first-page":"24824","volume":"35","author":"J Wei","year":"2022","unstructured":"Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 35, 24824\u201324837 (2022).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"1840_CR50","unstructured":"OpenAI. Gpt-4 technical report 2303.08774 (2024)."},{"key":"1840_CR51","unstructured":"Anthropic. Introducing the Claude 3 family. https:\/\/www.anthropic.com\/news\/claude-3-family Accessed: 2024-04-24 (2023)."},{"key":"1840_CR52","unstructured":"National Center for Biotechnology Information (NCBI). Entrez programming utilities help. https:\/\/www.ncbi.nlm.nih.gov\/books\/NBK25501\/ Accessed: 2024-04-24 (2008)."}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01840-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01840-7","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01840-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,8]],"date-time":"2025-09-08T18:09:53Z","timestamp":1757354993000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01840-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,8]]},"references-count":52,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1840"],"URL":"https:\/\/doi.org\/10.1038\/s41746-025-01840-7","relation":{},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,8]]},"assertion":[{"value":"31 October 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 June 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 August 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"509"}}