{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T08:05:55Z","timestamp":1777363555232,"version":"3.51.4"},"reference-count":26,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T00:00:00Z","timestamp":1769817600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Institute of Information & communications Technology Planning & Evaluation","award":["IITP-2024-RS-2023-00256615"],"award-info":[{"award-number":["IITP-2024-RS-2023-00256615"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Systems"],"abstract":"<jats:p>This study proposes a stage-aware governance framework for large language models (LLMs) that structures human oversight and accountability across different decision stages in AI-assisted literature review systems. Large language models (LLMs) are increasingly embedded in systematic review workflows, yet how human oversight and accountability should be structured across different decision stages remains unclear. This study evaluates three LLMs in a controlled two-stage literature review workflow\u2014title-and-abstract screening and eligibility assessment\u2014using identical evidence inputs and fixed inclusion criteria, with outputs benchmarked against expert consensus under fully reproducible conditions with standardized prompts and comprehensive logging. While LLMs closely matched expert decisions during screening (precision 0.83\u20130.91; F1 up to 0.89; Cohen\u2019s \u03ba 0.65\u20130.85), performance degraded substantially at the eligibility stage (F1 0.58\u20130.65; \u03ba 0.52\u20130.62), indicating increased epistemic uncertainty when fine-grained criteria must be inferred from abstract-level information. 
Importantly, disagreements clustered in borderline cases rather than random error, supporting a stage-aware governance approach in which LLMs automate high-throughput screening while inter-model disagreement is operationalized as an actionable uncertainty signal that triggers human oversight in more consequential decision stages. These findings highlight the need for explicit oversight thresholds, responsibility allocation, and auditability in the responsible deployment of AI-assisted decision systems for evidence synthesis.<\/jats:p>","DOI":"10.3390\/systems14020153","type":"journal-article","created":{"date-parts":[[2026,2,2]],"date-time":"2026-02-02T09:48:08Z","timestamp":1770025688000},"page":"153","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Stage-Aware Governance of Large Language Models: Managing Uncertainty and Human Oversight in AI-Assisted Literature Review Systems"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7807-7794","authenticated-orcid":false,"given":"Junic","family":"Kim","sequence":"first","affiliation":[{"name":"School of Business, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05029, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Haeyong","family":"Shin","sequence":"additional","affiliation":[{"name":"Graduate School of Metaverse, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05029, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2026,1,31]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/polsoc\/puaf001","article-title":"Governance of Generative AI","volume":"44","author":"Taeihagh","year":"2025","journal-title":"Policy Soc."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Kim, J. (2025). 
Modeling Generative AI and Social Entrepreneurial Searches: A Contextualized Optimal Stopping Approach. Adm. Sci., 15.","DOI":"10.3390\/admsci15080302"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"413","DOI":"10.1109\/TSE.2024.3519464","article-title":"Look Before You Leap: An Exploratory Study of Uncertainty Analysis for Large Language Models","volume":"51","author":"Huang","year":"2025","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"e2420496","DOI":"10.1001\/jamanetworkopen.2024.20496","article-title":"Performance of a Large Language Model in Screening Citations","volume":"7","author":"Oami","year":"2024","journal-title":"JAMA Netw. Open"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"e100101","DOI":"10.1016\/j.joitmc.2023.100101","article-title":"How industry recipe and boundary belief influence similar modular business model innovations","volume":"9","author":"Kim","year":"2023","journal-title":"J. Open Innov."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"2613","DOI":"10.1038\/s41591-024-03097-1","article-title":"Evaluation and Mitigation of the Limitations of Large Language Models in Clinical Decision-Making","volume":"30","author":"Hager","year":"2024","journal-title":"Nat. Med."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1007\/s10648-024-09862-5","article-title":"Screening Smarter, Not Harder: A Comparative Analysis of Machine Learning Screening Algorithms and Heuristic Stopping Criteria for Systematic Reviews in Educational Research","volume":"36","author":"Campos","year":"2024","journal-title":"Educ. Psychol. Rev."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"791","DOI":"10.7326\/M23-3389","article-title":"Sensitivity and Specificity of Using GPT-3.5 Turbo Models for Title and Abstract Screening in Systematic Reviews and Meta-Analyses","volume":"177","author":"Tran","year":"2024","journal-title":"Ann. Intern. 
Med."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"e23196","DOI":"10.1200\/JCO.2024.42.16_suppl.e23196","article-title":"Screening Oncology Articles in a Qualitative Literature Review Using Large Language Models: A Comparison of GPT-4 versus Fine-Tuned Open Source Models Using Expert-Annotated Data","volume":"42","author":"Thorlund","year":"2024","journal-title":"J. Clin. Oncol."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"389","DOI":"10.7326\/ANNALS-24-02189","article-title":"Development of Prompt Templates for Large Language Model\u2013Driven Screening in Systematic Reviews","volume":"178","author":"Cao","year":"2025","journal-title":"Ann. Intern. Med."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"e13611","DOI":"10.1200\/JCO.2024.42.16_suppl.e13611","article-title":"Retrieval-Augmented Large Language Models for Clinical Trial Screening","volume":"42","author":"Tan","year":"2024","journal-title":"J. Clin. Oncol."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1145\/3673861","article-title":"Prompting Considered Harmful","volume":"67","author":"Morris","year":"2024","journal-title":"Commun. ACM"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"e241139","DOI":"10.1148\/radiol.241139","article-title":"Is Open-Source There Yet? A Comparative Study on Commercial and Open-Source LLMs in Their Ability to Label Chest X-Ray Reports","volume":"313","author":"Dorfner","year":"2024","journal-title":"Radiology"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Kim, J. (2025). Academic Library with Generative AI: From Passive Information Providers to Proactive Knowledge Facilitators. Publications, 13.","DOI":"10.3390\/publications13030037"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1007\/s12599-023-00790-2","article-title":"AI-Enhanced Hybrid Decision Management","volume":"65","author":"Bork","year":"2023","journal-title":"Bus. Inf. Syst. 
Eng."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1145\/2994031","article-title":"Incentivizing Reproducibility","volume":"59","author":"Boisvert","year":"2016","journal-title":"Commun. ACM"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"2293","DOI":"10.1038\/s41562-024-02024-1","article-title":"When Combinations of Humans and AI Are Useful: A Systematic Review and Meta-Analysis","volume":"8","author":"Vaccaro","year":"2024","journal-title":"Nat. Hum. Behav."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1109\/MIC.2024.3388282","article-title":"The Role of Computer Science in Responsible AI Governance","volume":"28","author":"Almeida","year":"2024","journal-title":"IEEE Internet Comput."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"3280","DOI":"10.1038\/s41467-025-56989-2","article-title":"Benchmarking Large Language Models for Biomedical Natural Language Processing Applications and Recommendations","volume":"16","author":"Chen","year":"2025","journal-title":"Nat. Commun."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3762813","article-title":"Balancing the Unknown: Exploring Human Reliance on AI Advice under Aleatoric and Epistemic Uncertainty","volume":"32","author":"Holstein","year":"2025","journal-title":"ACM Trans. Comput.-Hum. Interact."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"728","DOI":"10.1186\/s12967-023-04576-8","article-title":"Harnessing Large Language Models for Candidate Gene Prioritization and Selection","volume":"21","author":"Toufiq","year":"2023","journal-title":"J. Transl. 
Med."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1093\/polsoc\/puae040","article-title":"Responsible Governance of Generative AI: Conceptualizing GenAI as Complex Adaptive Systems","volume":"44","author":"Janssen","year":"2025","journal-title":"Policy Soc."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Rostami, M., and Hawamdeh, S. (2025). Debunk Lists as External Knowledge Structures for Health Misinformation Detection with Generative AI. Systems, 13.","DOI":"10.3390\/systems13100882"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Frenkenberg, A., and Hochman, G. (2025). It\u2019s Scary to Use It, It\u2019s Scary to Refuse It: The Psychological Dimensions of AI Adoption\u2014Anxiety, Motives, and Dependency. Systems, 13.","DOI":"10.3390\/systems13020082"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"e70016","DOI":"10.1002\/widm.70016","article-title":"AI-Assisted Literature Review: Integrating Visualization and Geometric Features for Insightful Analysis","volume":"15","author":"Papageorgiou","year":"2025","journal-title":"WIREs Data Min. Knowl. Discov."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"877","DOI":"10.1007\/s10796-022-10284-3","article-title":"Designing Transparency for Effective Human-AI Collaboration","volume":"24","author":"Lind","year":"2022","journal-title":"Inf. Syst. 
Front."}],"container-title":["Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-8954\/14\/2\/153\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,2]],"date-time":"2026-02-02T10:15:31Z","timestamp":1770027331000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-8954\/14\/2\/153"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,31]]},"references-count":26,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2026,2]]}},"alternative-id":["systems14020153"],"URL":"https:\/\/doi.org\/10.3390\/systems14020153","relation":{},"ISSN":["2079-8954"],"issn-type":[{"value":"2079-8954","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,31]]}}}