{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T07:22:34Z","timestamp":1780384954765,"version":"3.54.1"},"reference-count":16,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,1,9]],"date-time":"2025-01-09T00:00:00Z","timestamp":1736380800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,1,9]],"date-time":"2025-01-09T00:00:00Z","timestamp":1736380800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000002","name":"U.S. Department of Health & Human Services | National Institutes of Health","doi-asserted-by":"publisher","award":["R01 LM006910"],"award-info":[{"award-number":["R01 LM006910"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Using administrative claims and electronic health records for observational studies is common but challenging due to data limitations. Researchers rely on phenotype algorithms, requiring labor-intensive chart reviews for validation. This study investigates whether case adjudication using the previously introduced Knowledge-Enhanced Electronic Profile Review (KEEPER) system with large language models (LLMs) is feasible and could serve as a viable alternative to manual chart review. The task involves adjudicating cases identified by a phenotype algorithm, with KEEPER extracting predefined findings such as symptoms, comorbidities, and treatments from structured data. LLMs then evaluate KEEPER outputs to determine whether a patient truly qualifies as a case. We tested four LLMs including GPT-4, hosted locally to ensure privacy. Using zero-shot prompting and iterative prompt optimization, we found LLM performance, across ten diseases, varied by prompt and model, with sensitivities from 78 to 98% and specificities from 48 to 98%, indicating promise for automating phenotype evaluation.<\/jats:p>","DOI":"10.1038\/s41746-025-01433-4","type":"journal-article","created":{"date-parts":[[2025,1,9]],"date-time":"2025-01-09T05:11:04Z","timestamp":1736399464000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Standardized patient profile review using large language models for case adjudication in observational research"],"prefix":"10.1038","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0817-5361","authenticated-orcid":false,"given":"Martijn J.","family":"Schuemie","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Anna","family":"Ostropolets","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Aleh","family":"Zhuk","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Uladzislau","family":"Korsik","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Seung In","family":"Seo","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Marc A.","family":"Suchard","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"George","family":"Hripcsak","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Patrick B.","family":"Ryan","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2025,1,9]]},"reference":[{"key":"1433_CR1","unstructured":"FDA. Real-World Data: Assessing Electronic Health Records and Medical Claims Data To Support Regulatory Decision-Making for Drug and Biological Products: Guidance for Industry. https:\/\/www.fda.gov\/regulatory-information\/search-fda-guidance-documents\/real-world-data-assessing-electronic-health-records-and-medical-claims-data-support-regulatory (2021)."},{"key":"1433_CR2","doi-asserted-by":"publisher","first-page":"1009","DOI":"10.1002\/pds.3856","volume":"24","author":"S Lanes","year":"2015","unstructured":"Lanes, S., Brown, J. S., Haynes, K., Pollack, M. F. & Walker, A. M. Identifying health outcomes in healthcare databases. Pharmacoepidemiol. Drug Saf. 24, 1009\u20131016 (2015).","journal-title":"Pharmacoepidemiol. Drug Saf."},{"key":"1433_CR3","doi-asserted-by":"crossref","unstructured":"Ostropolets, A. et al. Scalable and interpretable alternative to chart review for phenotype evaluation using standardized structured data from electronic health records. J. Am. Med. Inform. Assoc. 31, 119\u2013129.","DOI":"10.1093\/jamia\/ocad202"},{"key":"1433_CR4","doi-asserted-by":"publisher","DOI":"10.2196\/45312","volume":"9","author":"A Gilson","year":"2023","unstructured":"Gilson, A. et al. How does ChatGPT perform on the United States Medical Licensing Examination? The implications of large language models for medical education and knowledge assessment. JMIR Med. Educ. 9, e45312 (2023).","journal-title":"JMIR Med. Educ."},{"key":"1433_CR5","doi-asserted-by":"crossref","unstructured":"Li\u00e9vin, V., Hother, C. E., Motzfeldt, A. G., & Winther, O. Can large language models reason about medical questions? Patterns. 5, https:\/\/www.cell.com\/patterns\/fulltext\/S2666-3899(24)00042-4 (2024).","DOI":"10.1016\/j.patter.2024.100943"},{"key":"1433_CR6","doi-asserted-by":"crossref","unstructured":"Eriksen, A., M\u00f6ller, S. & Ryg, J. Use of GPT-4 to Diagnose Complex Clinical Cases. NEJM AI (2023).","DOI":"10.1056\/AIp2300031"},{"key":"1433_CR7","doi-asserted-by":"crossref","unstructured":"Reich, C. et al. OHDSI Standardized Vocabularies-a large-scale centralized reference ontology for international data harmonization. J. Am. Med. Inform. Assoc. 31, 583\u2013590 (2024).","DOI":"10.1093\/jamia\/ocad247"},{"key":"1433_CR8","first-page":"24824","volume":"35","author":"J Wei","year":"2022","unstructured":"Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 35, 24824\u201324837 (2022).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"1433_CR9","doi-asserted-by":"publisher","unstructured":"Zhou, Y. et al. Large Language Models Are Human-Level Prompt Engineers. Preprint at https:\/\/doi.org\/10.48550\/arXiv.2211.01910 (2023).","DOI":"10.48550\/arXiv.2211.01910"},{"key":"1433_CR10","first-page":"574","volume":"216","author":"G Hripcsak","year":"2015","unstructured":"Hripcsak, G. et al. Observational Health Data Sciences and Informatics (OHDSI): opportunities for observational researchers. Stud. Health Technol. Inform. 216, 574\u2013578 (2015).","journal-title":"Stud. Health Technol. Inform."},{"key":"1433_CR11","unstructured":"Brown, T. B. et al. Language models are few-shot learners. Proc. of the 34th Int. Conf. on Neural Information Processing Systems. 1877\u20131901. https:\/\/dl.acm.org\/doi\/abs\/10.5555\/3495724.3495883 (2020)."},{"key":"1433_CR12","doi-asserted-by":"publisher","unstructured":"Touvron, H. et al. Llama 2: open foundation and fine-tuned chat models. Preprint at https:\/\/doi.org\/10.48550\/arXiv.2307.09288 (2023).","DOI":"10.48550\/arXiv.2307.09288"},{"key":"1433_CR13","doi-asserted-by":"publisher","unstructured":"Mukherjee, S. et al. Orca: progressive learning from complex explanation traces of GPT-4. Preprint at https:\/\/doi.org\/10.48550\/arXiv.2306.02707 (2023).","DOI":"10.48550\/arXiv.2306.02707"},{"key":"1433_CR14","unstructured":"Lee, A. N., Hunter, C. J. & Ruiz, N. Platypus: quick, cheap, and powerful refinement of LLMs. Preprint at https:\/\/arxiv.org\/abs\/2308.07317 (2023)."},{"key":"1433_CR15","unstructured":"Open LLM Leaderboard\u2014a Hugging Face Space by HuggingFaceH4. https:\/\/huggingface.co\/spaces\/HuggingFaceH4\/open_llm_leaderboard."},{"key":"1433_CR16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3324926","volume":"10","author":"W Wang","year":"2019","unstructured":"Wang, W., Zheng, V. W., Yu, H. & Miao, C. A survey of zero-shot learning: settings, methods, and applications. ACM Trans. Intell. Syst. Technol. 10, 1\u201337 (2019).","journal-title":"ACM Trans. Intell. Syst. Technol."}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01433-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01433-4","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01433-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,9]],"date-time":"2025-01-09T06:04:32Z","timestamp":1736402672000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-025-01433-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,9]]},"references-count":16,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1433"],"URL":"https:\/\/doi.org\/10.1038\/s41746-025-01433-4","relation":{},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,9]]},"assertion":[{"value":"4 March 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 January 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 January 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests. MAS receives contracts and grants from the US National Institutes of Health, US Food & Drug Administration, the US Department of Veterans Affairs and Janssen Research & Development, all outside the scope of this work.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"18"}}