{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T18:37:07Z","timestamp":1776191827315,"version":"3.50.1"},"reference-count":29,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,4,3]],"date-time":"2024-04-03T00:00:00Z","timestamp":1712102400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,4,3]],"date-time":"2024-04-03T00:00:00Z","timestamp":1712102400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100006492","name":"Division of Intramural Research, National Institute of Allergy and Infectious Diseases","doi-asserted-by":"publisher","award":["U01AI150741"],"award-info":[{"award-number":["U01AI150741"]}],"id":[{"id":"10.13039\/100006492","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000052","name":"U.S. Department of Health & Human Services | NIH | NIH Office of the Director","doi-asserted-by":"publisher","award":["OT2OD024611"],"award-info":[{"award-number":["OT2OD024611"]}],"id":[{"id":"10.13039\/100000052","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"DOI":"10.1038\/s41746-024-01083-y","type":"journal-article","created":{"date-parts":[[2024,4,3]],"date-time":"2024-04-03T04:01:33Z","timestamp":1712116893000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":111,"title":["Evaluating large language models as agents in the clinic"],"prefix":"10.1038","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7374-7127","authenticated-orcid":false,"given":"Nikita","family":"Mehandru","sequence":"first","affiliation":[]},{"given":"Brenda Y.","family":"Miao","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0978-1117","authenticated-orcid":false,"given":"Eduardo Rodriguez","family":"Almaraz","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7884-0526","authenticated-orcid":false,"given":"Madhumita","family":"Sushil","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7433-2740","authenticated-orcid":false,"given":"Atul J.","family":"Butte","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9936-7141","authenticated-orcid":false,"given":"Ahmed","family":"Alaa","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,4,3]]},"reference":[{"key":"1083_CR1","unstructured":"Singhal, et al. Towards expert-level medical question answering with large language models. Preprint at https:\/\/arxiv.org\/abs\/2305.09617 (2023)."},{"key":"1083_CR2","doi-asserted-by":"crossref","unstructured":"Agrawal, M., Hegselmann, S., Lang, H., Kim, Y. & Sontag, D. Large Language Models are Few-Shot Clinical Information Extractors. In 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1998\u20132022 (ACL, 2022).","DOI":"10.18653\/v1\/2022.emnlp-main.130"},{"key":"1083_CR3","unstructured":"Brown, T. B. et al. Language Models are Few-Shot Learners. In Proc. NeurIPS 2020. (2020)."},{"key":"1083_CR4","doi-asserted-by":"publisher","unstructured":"Bubeck, S. et al. Sparks of Artificial General Intelligence: Early experiments with GPT-4 Preprint at https:\/\/doi.org\/10.48550\/arXiv.2303.12712 (2023).","DOI":"10.48550\/arXiv.2303.12712"},{"key":"1083_CR5","doi-asserted-by":"publisher","first-page":"1233","DOI":"10.1056\/NEJMsr2214184","volume":"388","author":"P Lee","year":"2023","unstructured":"Lee, P., Bubeck, S. & Petro, J. Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine. N. Engl. J. Med. 388, 1233\u20131239 (2023).","journal-title":"N. Engl. J. Med."},{"key":"1083_CR6","doi-asserted-by":"publisher","unstructured":"Fleming, S. L. et al. Assessing the Potential of USMLE-Like Exam Questions Generated by GPT-4. 2023.04.25.23288588. Preprint at https:\/\/doi.org\/10.1101\/2023.04.25.23288588 (2023).","DOI":"10.1101\/2023.04.25.23288588"},{"key":"1083_CR7","doi-asserted-by":"publisher","unstructured":"Nori, H., King, N., McKinney, S. M., Carignan, D. & Horvitz, E. Capabilities of GPT-4 on Medical Challenge Problems. Preprint at https:\/\/doi.org\/10.48550\/arXiv.2303.13375 (2023).","DOI":"10.48550\/arXiv.2303.13375"},{"key":"1083_CR8","doi-asserted-by":"publisher","unstructured":"Dash, D. et al. Evaluation of GPT-3.5 and GPT-4 for supporting real-world information needs in healthcare delivery. Preprint at https:\/\/doi.org\/10.48550\/arXiv.2304.13714 (2023).","DOI":"10.48550\/arXiv.2304.13714"},{"key":"1083_CR9","doi-asserted-by":"crossref","unstructured":"Park, J. S. et al. Generative Agents: Interactive Simulacra of Human Behavior. In 36th Symposium on User Interface Software and Technology (UIST). 1\u201322 (ACM, 2023).","DOI":"10.1145\/3586183.3606763"},{"key":"1083_CR10","doi-asserted-by":"publisher","unstructured":"Yang, H., Yue, S. & He, Y. Auto-GPT for Online Decision Making: Benchmarks and Additional Opinions. Preprint at https:\/\/doi.org\/10.48550\/arXiv.2306.02224 (2023).","DOI":"10.48550\/arXiv.2306.02224"},{"key":"1083_CR11","unstructured":"Johri, S. et al. Testing the Limits of Language Models: A Conversational Framework for Medical AI Assessment. medRxiv https:\/\/www.medrxiv.org\/content\/10.1101\/2023.09.12.23295399v2 (2023)."},{"key":"1083_CR12","unstructured":"Introducing Dr. Chatbot (2023). https:\/\/today.ucsd.edu\/story\/introducing-dr-chatbot."},{"key":"1083_CR13","doi-asserted-by":"publisher","unstructured":"Levine, D. M. et al. The Diagnostic and Triage Accuracy of the GPT-3 Artificial Intelligence Model. Preprint at https:\/\/doi.org\/10.1101\/2023.01.30.23285067 (2023).","DOI":"10.1101\/2023.01.30.23285067"},{"key":"1083_CR14","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41746-021-00464-x","volume":"4","author":"DM Korngiebel","year":"2021","unstructured":"Korngiebel, D. M. & Mooney, S. D. Considering the possibilities and pitfalls of Generative Pre-trained Transformer 3 (GPT-3) in healthcare delivery. Npj Digit. Med. 4, 1\u20133 (2021).","journal-title":"Npj Digit. Med."},{"key":"1083_CR15","doi-asserted-by":"publisher","unstructured":"Bankes, S. C. Agent-based modeling: A revolution? PNAS. https:\/\/doi.org\/10.1073\/pnas.072081299.","DOI":"10.1073\/pnas.072081299"},{"key":"1083_CR16","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1146\/annurev-publhealth-040617-014317","volume":"39","author":"M Tracy","year":"2018","unstructured":"Tracy, M., Cerd\u00e1, M. & Keyes, K. M. Agent-Based Modeling in Public Health: Current Applications and Future Directions. Annu. Rev. Public Health 39, 77\u201394 (2018).","journal-title":"Annu. Rev. Public Health"},{"key":"1083_CR17","doi-asserted-by":"publisher","first-page":"7280","DOI":"10.1073\/pnas.082080899","volume":"99","author":"E Bonabeau","year":"2002","unstructured":"Bonabeau, E. Agent-based modeling: Methods and techniques for simulating human systems. Proc. Natl. Acad. Sci. 99, 7280\u20137287 (2002).","journal-title":"Proc. Natl. Acad. Sci."},{"key":"1083_CR18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.trc.2013.12.001","volume":"40","author":"DJ Fagnant","year":"2014","unstructured":"Fagnant, D. J. & Kockelman, K. M. The travel and environmental implications of shared autonomous vehicles, using agent-based model scenarios. Transp. Res. Part C. Emerg. Technol. 40, 1\u201313 (2014).","journal-title":"Transp. Res. Part C. Emerg. Technol."},{"key":"1083_CR19","doi-asserted-by":"crossref","unstructured":"Kaur, P. et al. A survey on simulators for testing self-driving cars. In 2021 Fourth International Conference on Connected and Autonomous Driving (MetroCAD) (IEEE, 2021).","DOI":"10.1109\/MetroCAD51599.2021.00018"},{"key":"1083_CR20","doi-asserted-by":"publisher","first-page":"ooad045","DOI":"10.1093\/jamiaopen\/ooad045","volume":"6","author":"L Radhakrishnan","year":"2023","unstructured":"Radhakrishnan, L. et al. A certified de-identification system for all clinical text documents for information extraction at scale. JAMIA Open 6, ooad045 (2023).","journal-title":"JAMIA Open"},{"key":"1083_CR21","doi-asserted-by":"publisher","DOI":"10.1038\/s41597-022-01899-x","volume":"10","author":"AEW Johnson","year":"2023","unstructured":"Johnson, A. E. W. et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci. Data 10, 1 (2023).","journal-title":"Sci. Data"},{"key":"1083_CR22","doi-asserted-by":"publisher","first-page":"219","DOI":"10.5001\/omj.2011.55","volume":"26","author":"M Zayyan","year":"2011","unstructured":"Zayyan, M. Objective Structured Clinical Examination: The Assessment of Choice. Oman Med. J. 26, 219\u2013222 (2011).","journal-title":"Oman Med. J."},{"key":"1083_CR23","unstructured":"Tu, et al. Towards Conversational Diagnostic AI. Preprint at https:\/\/arxiv.org\/abs\/2401.05654 (2024)."},{"key":"1083_CR24","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41746-023-00879-8","volume":"6","author":"M Wornow","year":"2023","unstructured":"Wornow, M. et al. The shaky foundations of large language models and foundation models for electronic health records. Npj Digit. Med. 6, 1\u201310 (2023).","journal-title":"Npj Digit. Med."},{"key":"1083_CR25","doi-asserted-by":"publisher","first-page":"172","DOI":"10.1038\/s41586-023-06291-2","volume":"620","author":"K Singhal","year":"2023","unstructured":"Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172\u2013180 (2023).","journal-title":"Nature"},{"key":"1083_CR26","doi-asserted-by":"crossref","unstructured":"Shen, H., et al. MultiTurnCleanup: A Benchmark for Multi-Turn Spoken Conversational Transcript Cleanup. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). 9895\u20139903. (ACL, 2023).","DOI":"10.18653\/v1\/2023.emnlp-main.613"},{"key":"1083_CR27","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1146\/annurev-biodatasci-092820-114757","volume":"4","author":"I Chen","year":"2021","unstructured":"Chen, I. et al. Ethical machine learning in healthcare. Annu. Rev. Biomed. Data Sci. 4, 123\u2013144 (2021).","journal-title":"Annu. Rev. Biomed. Data Sci."},{"key":"1083_CR28","doi-asserted-by":"crossref","unstructured":"Rebedea, Traian, et al. \"NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails.\" Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 2023.","DOI":"10.18653\/v1\/2023.emnlp-demo.40"},{"key":"1083_CR29","doi-asserted-by":"crossref","unstructured":"Webster, P. Six ways large language models are changing healthcare. Nat. Med., 29, 2969\u20132971 (2023).","DOI":"10.1038\/s41591-023-02700-1"}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-024-01083-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-024-01083-y","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-024-01083-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,3]],"date-time":"2024-04-03T04:03:13Z","timestamp":1712116993000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-024-01083-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,3]]},"references-count":29,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["1083"],"URL":"https:\/\/doi.org\/10.1038\/s41746-024-01083-y","relation":{},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,3]]},"assertion":[{"value":"25 August 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 March 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 April 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"A.J.B. is a co-founder and consultant to Personalis and NuMedii; consultant to Mango Tree Corporation, and in the recent past, Samsung, 10x Genomics, Helix, Pathway Genomics, and Verinata (Illumina); has served on paid advisory panels or boards for Geisinger Health, Regenstrief Institute, Gerson Lehman Group, AlphaSights, Covance, Novartis, Genentech, and Merck, and Roche; is a shareholder in Personalis and NuMedii; is a minor shareholder in Apple, Meta (Facebook), Alphabet (Google), Microsoft, Amazon, Snap, 10x Genomics, Illumina, Regeneron, Sanofi, Pfizer, Royalty Pharma, Moderna, Sutro, Doximity, BioNtech, Invitae, Pacific Biosciences, Editas Medicine, Nuna Health, Assay Depot, and Vet24seven, and several other non-health related companies and mutual funds; and has received honoraria and travel reimbursement for invited talks from Johnson and Johnson, Roche, Genentech, Pfizer, Merck, Lilly, Takeda, Varian, Mars, Siemens, Optum, Abbott, Celgene, AstraZeneca, AbbVie, Westat, and many academic institutions, medical or disease specific foundations and associations, and health systems. A.J.B. receives royalty payments through Stanford University, for several patents and other disclosures licensed to NuMedii and Personalis. A.J.B.\u2019s research has been funded by NIH, Peraton (as the prime on an NIH contract), Genentech, Johnson and Johnson, FDA, Robert Wood Johnson Foundation, Leon Lowenstein Foundation, Intervalien Foundation, Priscilla Chan and Mark Zuckerberg, the Barbara and Gerson Bakar Foundation, and in the recent past, the March of Dimes, Juvenile Diabetes Research Foundation, California Governor\u2019s Office of Planning and Research, California Institute for Regenerative Medicine, L\u2019Oreal, and Progenity. None of these entities had any bearing on the design of this study or the writing of the manuscript. All other authors have no conflicts of interest to disclose.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"84"}}