{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,7]],"date-time":"2026-07-07T23:23:01Z","timestamp":1783466581554,"version":"3.55.0"},"reference-count":32,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,5,1]],"date-time":"2024-05-01T00:00:00Z","timestamp":1714521600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,5,1]],"date-time":"2024-05-01T00:00:00Z","timestamp":1714521600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000057","name":"U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences","doi-asserted-by":"publisher","award":["R35GM136375"],"award-info":[{"award-number":["R35GM136375"]}],"id":[{"id":"10.13039\/100000057","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000057","name":"U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences","doi-asserted-by":"publisher","award":["R01GM141519"],"award-info":[{"award-number":["R01GM141519"]}],"id":[{"id":"10.13039\/100000057","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000057","name":"U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences","doi-asserted-by":"publisher","award":["R01GM140012"],"award-info":[{"award-number":["R01GM140012"]}],"id":[{"id":"10.13039\/100000057","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000054","name":"U.S. Department of Health & Human Services | NIH | National Cancer Institute","doi-asserted-by":"publisher","award":["P30CA142543"],"award-info":[{"award-number":["P30CA142543"]}],"id":[{"id":"10.13039\/100000054","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000054","name":"U.S. Department of Health & Human Services | NIH | National Cancer Institute","doi-asserted-by":"publisher","award":["P50CA70907"],"award-info":[{"award-number":["P50CA70907"]}],"id":[{"id":"10.13039\/100000054","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000054","name":"U.S. Department of Health & Human Services | NIH | National Cancer Institute","doi-asserted-by":"publisher","award":["U01CA249245"],"award-info":[{"award-number":["U01CA249245"]}],"id":[{"id":"10.13039\/100000054","id-type":"DOI","asserted-by":"publisher"}]},{"name":"U.S. Department of Health & Human Services | NIH | National Cancer Institute"},{"name":"U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences"},{"name":"U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences"},{"DOI":"10.13039\/100000072","name":"U.S. Department of Health & Human Services | NIH | National Institute of Dental and Craniofacial Research","doi-asserted-by":"publisher","award":["R01DE030656"],"award-info":[{"award-number":["R01DE030656"]}],"id":[{"id":"10.13039\/100000072","id-type":"DOI","asserted-by":"publisher"}]},{"name":"U.S. Department of Health & Human Services | NIH | National Cancer Institute"},{"DOI":"10.13039\/100004917","name":"Cancer Prevention and Research Institute of Texas","doi-asserted-by":"publisher","award":["RP230330"],"award-info":[{"award-number":["RP230330"]}],"id":[{"id":"10.13039\/100004917","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006492","name":"Division of Intramural Research, National Institute of Allergy and Infectious Diseases","doi-asserted-by":"publisher","award":["U01AI169298"],"award-info":[{"award-number":["U01AI169298"]}],"id":[{"id":"10.13039\/100006492","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Existing natural language processing (NLP) methods to convert free-text clinical notes into structured data often require problem-specific annotations and model training. This study aims to evaluate ChatGPT\u2019s capacity to extract information from free-text medical notes efficiently and comprehensively. We developed a large language model (LLM)-based workflow, utilizing systems engineering methodology and spiral \u201cprompt engineering\u201d process, leveraging OpenAI\u2019s API for batch querying ChatGPT. We evaluated the effectiveness of this method using a dataset of more than 1000 lung cancer pathology reports and a dataset of 191 pediatric osteosarcoma pathology reports, comparing the ChatGPT-3.5 (gpt-3.5-turbo-16k) outputs with expert-curated structured data. ChatGPT-3.5 demonstrated the ability to extract pathological classifications with an overall accuracy of 89%, in lung cancer dataset, outperforming the performance of two traditional NLP methods. The performance is influenced by the design of the instructive prompt. Our case analysis shows that most misclassifications were due to the lack of highly specialized pathology terminology, and erroneous interpretation of TNM staging rules. Reproducibility shows the relatively stable performance of ChatGPT-3.5 over time. In pediatric osteosarcoma dataset, ChatGPT-3.5 accurately classified both grades and margin status with accuracy of 98.6% and 100% respectively. Our study shows the feasibility of using ChatGPT to process large volumes of clinical notes for structured information extraction without requiring extensive task-specific human annotation and model training. The results underscore the potential role of LLMs in transforming unstructured healthcare data into structured formats, thereby supporting research and aiding clinical decision-making.<\/jats:p>","DOI":"10.1038\/s41746-024-01079-8","type":"journal-article","created":{"date-parts":[[2024,5,1]],"date-time":"2024-05-01T12:53:06Z","timestamp":1714567986000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":253,"title":["A critical assessment of using ChatGPT for extracting structured data from clinical notes"],"prefix":"10.1038","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2155-6107","authenticated-orcid":false,"given":"Jingwei","family":"Huang","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Donghan M.","family":"Yang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ruichen","family":"Rong","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6785-7362","authenticated-orcid":false,"given":"Kuroush","family":"Nezafati","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Colin","family":"Treager","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3601-3351","authenticated-orcid":false,"given":"Zhikai","family":"Chi","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0001-3261","authenticated-orcid":false,"given":"Shidan","family":"Wang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xian","family":"Cheng","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yujia","family":"Guo","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Laura J.","family":"Klesse","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Guanghua","family":"Xiao","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Eric D.","family":"Peterson","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xiaowei","family":"Zhan","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9456-1762","authenticated-orcid":false,"given":"Yang","family":"Xie","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2024,5,1]]},"reference":[{"key":"1079_CR1","unstructured":"Vaswani, A. et al. Attention is all you need. Adv. Neural Info. Processing Syst. 30, (2017)."},{"key":"1079_CR2","unstructured":"Devlin, J. et al. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018."},{"key":"1079_CR3","unstructured":"Radford, A. et al. Improving language understanding by generative pre-training. OpenAI: https:\/\/cdn.openai.com\/research-covers\/language-unsupervised\/language_understanding_paper.pdf (2018)."},{"key":"1079_CR4","unstructured":"Touvron, H. et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023)."},{"key":"1079_CR5","unstructured":"OpenAi, GPT-4 Technical Report. arXiv:2303.08774: https:\/\/arxiv.org\/pdf\/2303.08774.pdf (2023)."},{"key":"1079_CR6","unstructured":"Anil, R. et al. Palm 2 technical report. arXiv preprint arXiv:2305.10403 (2023)."},{"key":"1079_CR7","unstructured":"Turner, B. E. W. Epic, Microsoft bring GPT-4 to EHRs."},{"key":"1079_CR8","unstructured":"Landi, H. Microsoft\u2019s Nuance integrates OpenAI\u2019s GPT-4 into voice-enabled medical scribe software."},{"key":"1079_CR9","doi-asserted-by":"publisher","first-page":"e23898","DOI":"10.2196\/23898","volume":"9","author":"T Hao","year":"2021","unstructured":"Hao, T. et al. Health Natural Language Processing: Methodology Development and Applications. JMIR Med Inf. 9, e23898 (2021).","journal-title":"JMIR Med Inf."},{"key":"1079_CR10","doi-asserted-by":"publisher","first-page":"e206","DOI":"10.1136\/amiajnl-2013-002428","volume":"20","author":"J Pathak","year":"2013","unstructured":"Pathak, J., Kho, A. N. & Denny, J. C. Electronic health records-driven phenotyping: challenges, recent advances, and perspectives. J. Am. Med. Inform. Assoc. 20, e206\u2013e211 (2013).","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"1079_CR11","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-017-1776-8","volume":"18","author":"G Crichton","year":"2017","unstructured":"Crichton, G. et al. A neural network multi-task learning approach to biomedical named entity recognition. BMC Bioinforma. 18, 368 (2017).","journal-title":"BMC Bioinforma."},{"key":"1079_CR12","doi-asserted-by":"publisher","first-page":"e17638","DOI":"10.2196\/17638","volume":"8","author":"J Wang","year":"2020","unstructured":"Wang, J. et al. Document-Level Biomedical Relation Extraction Using Graph Convolutional Network and Multihead Attention: Algorithm Development and Validation. JMIR Med Inf. 8, e17638 (2020).","journal-title":"JMIR Med Inf."},{"key":"1079_CR13","unstructured":"Liu, Y. et al. Roberta: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)."},{"key":"1079_CR14","doi-asserted-by":"publisher","DOI":"10.1038\/s41746-021-00455-y","volume":"4","author":"L Rasmy","year":"2021","unstructured":"Rasmy, L. et al. Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. npj Digit. Med. 4, 86 (2021).","journal-title":"npj Digit. Med."},{"key":"1079_CR15","doi-asserted-by":"publisher","DOI":"10.1038\/s41746-022-00730-6","volume":"5","author":"H Wu","year":"2022","unstructured":"Wu, H. et al. A survey on clinical natural language processing in the United Kingdom from 2007 to 2022. npj Digit. Med. 5, 186 (2022).","journal-title":"npj Digit. Med."},{"key":"1079_CR16","unstructured":"Amin, M. B. et al. AJCC cancer staging manual. 1024: Springer 2017."},{"key":"1079_CR17","doi-asserted-by":"publisher","first-page":"706","DOI":"10.1097\/JTO.0b013e31812f3c1a","volume":"2","author":"P Goldstraw","year":"2007","unstructured":"Goldstraw, P. et al. The IASLC Lung Cancer Staging Project: Proposals for the Revision of the TNM Stage Groupings in the Forthcoming (Seventh) Edition of the TNM Classification of Malignant Tumours. J. Thorac. Oncol. 2, 706\u2013714 (2007).","journal-title":"J. Thorac. Oncol."},{"key":"1079_CR18","doi-asserted-by":"publisher","first-page":"e2300104","DOI":"10.1200\/CCI.23.00104","volume":"7","author":"DM Yang","year":"2023","unstructured":"Yang, D. M. et al. Osteosarcoma Explorer: A Data Commons With Clinical, Genomic, Protein, and Tissue Imaging Data for Osteosarcoma Research. JCO Clin. Cancer Inform. 7, e2300104 (2023).","journal-title":"JCO Clin. Cancer Inform."},{"key":"1079_CR19","doi-asserted-by":"crossref","unstructured":"The Lancet Digital, H., ChatGPT: friend or foe? Lancet Digital Health. 5, e102 (2023).","DOI":"10.1016\/S2589-7500(23)00023-7"},{"key":"1079_CR20","doi-asserted-by":"crossref","unstructured":"Nature, Will ChatGPT transform healthcare? Nat. Med. 29, 505\u2013506 (2023).","DOI":"10.1038\/s41591-023-02289-5"},{"key":"1079_CR21","doi-asserted-by":"publisher","first-page":"e107","DOI":"10.1016\/S2589-7500(23)00021-3","volume":"5","author":"SB Patel","year":"2023","unstructured":"Patel, S. B. & Lam, K. ChatGPT: the future of discharge summaries? Lancet Digit. Health 5, e107\u2013e108 (2023).","journal-title":"Lancet Digit. Health"},{"key":"1079_CR22","doi-asserted-by":"publisher","first-page":"e179","DOI":"10.1016\/S2589-7500(23)00048-1","volume":"5","author":"SR Ali","year":"2023","unstructured":"Ali, S. R. et al. Using ChatGPT to write patient clinic letters. Lancet Digit. Health 5, e179\u2013e181 (2023).","journal-title":"Lancet Digit. Health"},{"key":"1079_CR23","doi-asserted-by":"publisher","first-page":"405","DOI":"10.1016\/S1473-3099(23)00113-5","volume":"23","author":"A Howard","year":"2023","unstructured":"Howard, A., Hope, W. & Gerada, A. ChatGPT and antimicrobial advice: the end of the consulting infection doctor? Lancet Infect. Dis. 23, 405\u2013406 (2023).","journal-title":"Lancet Infect. Dis."},{"key":"1079_CR24","unstructured":"Mialon, G. et al. Augmented language models: a survey. arXiv preprint arXiv:2302.07842 (2023)."},{"key":"1079_CR25","unstructured":"Brown, T. et al. Language Models are Few-Shot Learners. Curran Associates, Inc. (2020)."},{"key":"1079_CR26","unstructured":"Wei, J. et al. Chain of thought prompting elicits reasoning in large language models. Adv Neural Inf Processing Syst 35, 24824\u201324837 (2022)."},{"key":"1079_CR27","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3571730","volume":"55","author":"Z Ji","year":"2023","unstructured":"Ji, Z. et al. Survey of Hallucination in Natural Language Generation. ACM Comput. Surv. 55, 1\u201338 (2023).","journal-title":"ACM Comput. Surv."},{"key":"1079_CR28","doi-asserted-by":"crossref","unstructured":"Alkaissi, H. & S. I. McFarlane, Artificial Hallucinations in ChatGPT: Implications in Scientific Writing. Cureus, (2023).","DOI":"10.7759\/cureus.35179"},{"key":"1079_CR29","doi-asserted-by":"crossref","unstructured":"Manakul, P. A. Liusie, & M. J. F. Gales, SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models. 2023.","DOI":"10.18653\/v1\/2023.emnlp-main.557"},{"key":"1079_CR30","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1109\/2.59","volume":"21","author":"BW Boehm","year":"1988","unstructured":"Boehm, B. W. A spiral model of software development and enhancement. Computer 21, 61\u201372 (1988).","journal-title":"Computer"},{"key":"1079_CR31","unstructured":"OpenAi. OpenAI API Documentation. Available from: https:\/\/platform.openai.com\/docs\/guides\/text-generation."},{"key":"1079_CR32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1126\/scisignal.2004088","volume":"6","author":"J Gao","year":"2013","unstructured":"Gao, J. et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal. 6, 1\u201319 (2013).","journal-title":"Sci. Signal."}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-024-01079-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-024-01079-8","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-024-01079-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,1]],"date-time":"2024-05-01T12:56:01Z","timestamp":1714568161000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-024-01079-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,1]]},"references-count":32,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["1079"],"URL":"https:\/\/doi.org\/10.1038\/s41746-024-01079-8","relation":{},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,1]]},"assertion":[{"value":"24 July 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 March 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 May 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"106"}}