{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,4]],"date-time":"2026-07-04T00:10:04Z","timestamp":1783123804848,"version":"3.54.6"},"reference-count":43,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2025,2,27]],"date-time":"2025-02-27T00:00:00Z","timestamp":1740614400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objectives<\/jats:title>\n                  <jats:p>We developed and validated a large language model (LLM)-assisted system for conducting systematic literature reviews in health technology assessment (HTA) submissions.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and Methods<\/jats:title>\n                  <jats:p>We developed a five-module system using abstracts acquired from PubMed: (1) literature search query setup; (2) study protocol setup using population, intervention\/comparison, outcome, and study type (PICOs) criteria; (3) LLM-assisted abstract screening; (4) LLM-assisted data extraction; and (5) data summarization. The system incorporates a human-in-the-loop design, allowing real-time PICOs criteria adjustment. This is achieved by collecting information on disagreements between the LLM and human reviewers regarding inclusion\/exclusion decisions and their rationales, enabling informed PICOs refinement. 
We generated four evaluation sets including relapsed and refractory multiple myeloma (RRMM) and advanced melanoma to evaluate the LLM's performance in three key areas: (1) recommending inclusion\/exclusion decisions during abstract screening, (2) providing valid rationales for abstract exclusion, and (3) extracting relevant information from included abstracts.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>The system demonstrated relatively high performance across all evaluation sets. For abstract screening, it achieved an average sensitivity of 90%, F1 score of 82, accuracy of 89%, and Cohen's \u03ba of 0.71, indicating substantial agreement between human reviewers and LLM-based results. In identifying specific exclusion rationales, the system attained accuracies of 97% and 84%, and F1 scores of 98 and 89 for RRMM and advanced melanoma, respectively. For data extraction, the system achieved an F1 score of 93.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Discussion<\/jats:title>\n                  <jats:p>Results showed high sensitivity, Cohen's \u03ba, and PABAK for abstract screening, and high F1 scores for data extraction. This human-in-the-loop AI-assisted SLR system demonstrates the potential of GPT-4's in-context learning capabilities by eliminating the need for manually annotated training data. 
In addition, this LLM-based system offers subject matter experts greater control through prompt adjustment and real-time feedback, enabling iterative refinement of PICOs criteria based on performance metrics.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Conclusion<\/jats:title>\n                  <jats:p>The system demonstrates potential to streamline systematic literature reviews, potentially reducing time, cost, and human errors while enhancing evidence generation for HTA submissions.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocaf030","type":"journal-article","created":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T21:08:23Z","timestamp":1741122503000},"page":"616-625","source":"Crossref","is-referenced-by-count":48,"title":["Enhancing systematic literature reviews with generative artificial intelligence: development, applications, and performance evaluation"],"prefix":"10.1093","volume":"32","author":[{"given":"Ying","family":"Li","sequence":"first","affiliation":[{"name":"Regeneron Pharmaceuticals, Inc. , Tarrytown, NY 10591,","place":["United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Surabhi","family":"Datta","sequence":"additional","affiliation":[{"name":"IMO Health, Inc. , Rosemont, IL 60018,","place":["United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Majid","family":"Rastegar-Mojarad","sequence":"additional","affiliation":[{"name":"IMO Health, Inc. , Rosemont, IL 60018,","place":["United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kyeryoung","family":"Lee","sequence":"additional","affiliation":[{"name":"IMO Health, Inc. 
, Rosemont, IL 60018,","place":["United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-9916-5654","authenticated-orcid":false,"given":"Hunki","family":"Paek","sequence":"additional","affiliation":[{"name":"IMO Health, Inc. , Rosemont, IL 60018,","place":["United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Julie","family":"Glasgow","sequence":"additional","affiliation":[{"name":"IMO Health, Inc. , Rosemont, IL 60018,","place":["United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chris","family":"Liston","sequence":"additional","affiliation":[{"name":"IMO Health, Inc. , Rosemont, IL 60018,","place":["United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Long","family":"He","sequence":"additional","affiliation":[{"name":"IMO Health, Inc. , Rosemont, IL 60018,","place":["United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xiaoyan","family":"Wang","sequence":"additional","affiliation":[{"name":"IMO Health, Inc. , Rosemont, IL 60018,","place":["United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yingxin","family":"Xu","sequence":"additional","affiliation":[{"name":"Regeneron Pharmaceuticals, Inc. 
, Tarrytown, NY 10591,","place":["United States"]}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2025,2,27]]},"reference":[{"key":"2025041716422131700_ocaf030-B1","volume-title":"Cochrane Handbook for Systematic Reviews of Interventions","author":"Chandler","year":"2019"},{"key":"2025041716422131700_ocaf030-B2","doi-asserted-by":"crossref","first-page":"e012545","DOI":"10.1136\/bmjopen-2016-012545","article-title":"Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry","volume":"7","author":"Borah","year":"2017","journal-title":"BMJ Open."},{"key":"2025041716422131700_ocaf030-B3","doi-asserted-by":"crossref","first-page":"100443","DOI":"10.1016\/j.conctc.2019.100443","article-title":"The significant cost of systematic reviews and meta-analyses: a call for greater involvement of machine learning to assess the promise of clinical trials","volume":"16","author":"Michelson","year":"2019","journal-title":"Contemp Clin Trials Commun."},{"key":"2025041716422131700_ocaf030-B4","doi-asserted-by":"crossref","first-page":"e2005343","DOI":"10.1371\/journal.pbio.2005343","article-title":"Best match: new relevance search for PubMed","volume":"16","author":"Fiorini","year":"2018","journal-title":"PLoS Biol."},{"key":"2025041716422131700_ocaf030-B5"},{"key":"2025041716422131700_ocaf030-B6","doi-asserted-by":"crossref","first-page":"S253","DOI":"10.1016\/j.jval.2024.03.1395","article-title":"HTA44 systematic literature review requirements for health technology assessment in European markets","volume":"27","author":"Wright","year":"2024","journal-title":"Value in Health"},{"key":"2025041716422131700_ocaf030-B7","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1007\/s40273-022-01229-4","article-title":"Living health technology assessment: issues, challenges and 
opportunities","volume":"41","author":"Thokala","year":"2023","journal-title":"Pharmacoeconomics."},{"key":"2025041716422131700_ocaf030-B8","doi-asserted-by":"crossref","first-page":"S404","DOI":"10.1016\/j.jval.2020.08.041","article-title":"ML1 do machines perform better than humans at systematic review of published literature? a case study of prostate cancer clinical evidence","volume":"23","author":"Abogunrin","year":"2020","journal-title":"Value Health"},{"key":"2025041716422131700_ocaf030-B9","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1186\/s41182-019-0165-6","article-title":"A step by step guide for conducting a systematic review and meta-analysis with simulation data","volume":"47","author":"Tawfik","year":"2019","journal-title":"Trop Med Health."},{"key":"2025041716422131700_ocaf030-B10","doi-asserted-by":"crossref","first-page":"e0227742","DOI":"10.1371\/journal.pone.0227742","article-title":"Error rates of human reviewers during abstract screening in systematic reviews","volume":"15","author":"Wang","year":"2020","journal-title":"PLoS One."},{"key":"2025041716422131700_ocaf030-B11","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1186\/2046-4053-4-6","article-title":"Better duplicate detection for systematic reviewers: evaluation of Systematic Review Assistant-Deduplication Module","volume":"4","author":"Rathbone","year":"2015","journal-title":"Syst Rev."},{"key":"2025041716422131700_ocaf030-B12","doi-asserted-by":"crossref","first-page":"330","DOI":"10.1002\/jrsm.1354","article-title":"Best practice guidelines for abstract screening large-evidence systematic reviews and meta-analyses","volume":"10","author":"Polanin","year":"2019","journal-title":"Res Synthesis Methods"},{"key":"2025041716422131700_ocaf030-B13","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1016\/j.jclinepi.2020.01.005","article-title":"Single-reviewer abstract screening missed 13 percent of relevant studies: a crowd-based, randomized controlled 
trial","volume":"121","author":"Gartlehner","year":"2020","journal-title":"J Clin Epidemiol."},{"key":"2025041716422131700_ocaf030-B14","doi-asserted-by":"crossref","first-page":"445","DOI":"10.1080\/19439342.2012.711342","article-title":"The benefits and challenges of using systematic reviews in international development research","volume":"4","author":"Mallett","year":"2012","journal-title":"J Develop Effect"},{"key":"2025041716422131700_ocaf030-B15","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1186\/s12874-024-02224-3","article-title":"Machine learning models for abstract screening task\u2014a systematic literature review application for health economics and outcome research","volume":"24","author":"Du","year":"2024","journal-title":"BMC Med Res Methodol."},{"key":"2025041716422131700_ocaf030-B16","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1186\/2046-4053-4-5","article-title":"Using text mining for study identification in systematic reviews: a systematic review of current approaches","volume":"4","author":"O'Mara-Eves","year":"2015","journal-title":"Syst Rev."},{"key":"2025041716422131700_ocaf030-B17","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1002\/jrsm.1553","article-title":"Using artificial intelligence methods for systematic review in health sciences: a systematic review","volume":"13","author":"Blaizot","year":"2022","journal-title":"Res Synth Methods."},{"key":"2025041716422131700_ocaf030-B18","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1002\/jrsm.1589","article-title":"In-depth evaluation of machine learning methods for semi-automating article screening in a systematic review of mechanistic literature","volume":"14","author":"Kebede","year":"2023","journal-title":"Res Synth Methods."},{"key":"2025041716422131700_ocaf030-B19","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1016\/j.jclinepi.2021.12.005","article-title":"Tools to support the automation of systematic reviews: a scoping 
review","volume":"144","author":"Khalil","year":"2022","journal-title":"J Clin Epidemiol."},{"key":"2025041716422131700_ocaf030-B20","doi-asserted-by":"crossref","first-page":"100162","DOI":"10.1016\/j.dajour.2023.100162","article-title":"A novel application of machine learning and zero-shot classification methods for automated abstract screening in systematic reviews","volume":"6","author":"Moreno-Garcia","year":"2023","journal-title":"Decision Anal J"},{"key":"2025041716422131700_ocaf030-B21","doi-asserted-by":"publisher","author":"Du","year":"2024","DOI":"10.21203\/rs.3.rs-4426541\/v1"},{"key":"2025041716422131700_ocaf030-B22","doi-asserted-by":"crossref","first-page":"e48996","DOI":"10.2196\/48996","article-title":"Automated paper screening for clinical reviews using large language models: data analysis study","volume":"26","author":"Guo","year":"2024","journal-title":"J Med Internet Res."},{"key":"2025041716422131700_ocaf030-B23","doi-asserted-by":"crossref","first-page":"351","DOI":"10.3390\/systems11070351","article-title":"Harnessing the power of ChatGPT for automating systematic review process: methodology, case study, limitations, and future directions","volume":"11","author":"Alshami","year":"2023","journal-title":"Systems"},{"key":"2025041716422131700_ocaf030-B24","doi-asserted-by":"crossref","first-page":"101287","DOI":"10.1016\/j.cola.2024.101287","article-title":"Screening articles for systematic reviews with ChatGPT","volume":"80","author":"Syriani","year":"2024","journal-title":"J Comput Languages"},{"key":"2025041716422131700_ocaf030-B25","doi-asserted-by":"crossref","first-page":"616","DOI":"10.1002\/jrsm.1715","article-title":"Can large language models replace humans in systematic reviews? 
Evaluating GPT-4's efficacy in screening and extracting data from peer-reviewed and grey literature in multiple languages","volume":"15","author":"Khraisha","year":"2024","journal-title":"Res Synth Methods."},{"key":"2025041716422131700_ocaf030-B26","doi-asserted-by":"crossref","first-page":"e076912","DOI":"10.1136\/bmjopen-2023-076912","article-title":"Inter-reviewer reliability of human literature reviewing and implications for the introduction of machine-assisted systematic reviews: a mixed-methods review","volume":"14","author":"Hanegraaf","year":"2024","journal-title":"BMJ Open."},{"key":"2025041716422131700_ocaf030-B27","doi-asserted-by":"crossref","first-page":"276","DOI":"10.11613\/BM.2012.031","article-title":"Interrater reliability: the kappa statistic","volume":"22","author":"McHugh","year":"2012","journal-title":"Biochem Med (Zagreb)."},{"key":"2025041716422131700_ocaf030-B28","doi-asserted-by":"crossref","first-page":"44","DOI":"10.7599\/hmr.2015.35.1.44","article-title":"Measurement of inter-rater reliability in systematic review","volume":"35","author":"Park","year":"2015","journal-title":"Hanyang Med Rev."},{"key":"2025041716422131700_ocaf030-B29","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1016\/0895-4356(93)90018-V","article-title":"Bias, prevalence and kappa","volume":"46","author":"Byrt","year":"1993","journal-title":"J Clin Epidemiol."},{"key":"2025041716422131700_ocaf030-B30","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1016\/j.zefq.2023.06.007","article-title":"A narrative review of recent tools and innovations toward automating living systematic reviews and evidence syntheses","volume":"181","author":"Schmidt","year":"2023","journal-title":"Z Evid Fortbild Qual Gesundhwes"},{"key":"2025041716422131700_ocaf030-B31","doi-asserted-by":"crossref","first-page":"S390","DOI":"10.1016\/j.jval.2023.09.2044","article-title":"HTA361 living systematic review (LSR) in health technology assessment (HTA): current guidance, 
methods, and challenges","volume":"26","author":"Sauca","year":"2023","journal-title":"Value Health"},{"key":"2025041716422131700_ocaf030-B32","doi-asserted-by":"crossref","first-page":"105623","DOI":"10.1016\/j.envint.2020.105623","article-title":"SWIFT-active screener: accelerated document screening through active learning and integrated recall estimation","volume":"138","author":"Howard","year":"2020","journal-title":"Environ Int."},{"key":"2025041716422131700_ocaf030-B33","doi-asserted-by":"crossref","first-page":"S532","DOI":"10.1016\/j.jval.2022.04.1277","article-title":"MSR70 pilot study to evaluate efficiency of DISTILLERSR\u00ae'S artificial intelligence (AI) tool over manual screening process in literature review","volume":"25","author":"Kamra","year":"2022","journal-title":"Value Health"},{"key":"2025041716422131700_ocaf030-B34","author":"Thomas","year":"2022"},{"key":"2025041716422131700_ocaf030-B35","author":"LaserAI"},{"key":"2025041716422131700_ocaf030-B36","author":"Made.AI"},{"key":"2025041716422131700_ocaf030-B37","author":"EasySLR"},{"key":"2025041716422131700_ocaf030-B38","author":"RobertReviewer"},{"key":"2025041716422131700_ocaf030-B39"},{"key":"2025041716422131700_ocaf030-B40","doi-asserted-by":"crossref","first-page":"S288","DOI":"10.1016\/j.jval.2023.03.1596","article-title":"MSR61 AI support reduced screening burden in a systematic review with costs and cost-effectiveness outcomes (SR-CCEO) for cost-effectiveness modeling","volume":"26","author":"Borowiack","year":"2023","journal-title":"Value Health"},{"key":"2025041716422131700_ocaf030-B41","doi-asserted-by":"crossref","first-page":"1812","DOI":"10.1111\/all.16100","article-title":"Patients' values and preferences for health states in allergic rhinitis\u2014an artificial intelligence supported systematic 
review","volume":"79","author":"Brozek","year":"2024","journal-title":"Allergy"},{"key":"2025041716422131700_ocaf030-B42","doi-asserted-by":"crossref","first-page":"S802","DOI":"10.1016\/j.jval.2019.09.2142","article-title":"PNS242 balancing global HTA requirements for literature reviews across Europe, North America, and Asia","volume":"22","author":"Ostawal","year":"2019","journal-title":"Value Health"},{"key":"2025041716422131700_ocaf030-B43"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/32\/4\/616\/62195702\/ocaf030.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/32\/4\/616\/62195702\/ocaf030.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,17]],"date-time":"2025-04-17T20:42:42Z","timestamp":1744922562000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/32\/4\/616\/8045049"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,27]]},"references-count":43,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,2,27]]},"published-print":{"date-parts":[[2025,4,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocaf030","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,4]]},"published":{"date-parts":[[2025,2,27]]}}}