{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,9]],"date-time":"2026-05-09T10:00:04Z","timestamp":1778320804738,"version":"3.51.4"},"reference-count":34,"publisher":"Oxford University Press (OUP)","issue":"10","license":[{"start":{"date-parts":[[2024,9,13]],"date-time":"2024-09-13T00:00:00Z","timestamp":1726185600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100014013","name":"United Kingdom Research and Innovation","doi-asserted-by":"crossref","award":["EP\/S02431X\/1"],"award-info":[{"award-number":["EP\/S02431X\/1"]}],"id":[{"id":"10.13039\/100014013","id-type":"DOI","asserted-by":"crossref"}]},{"name":"UKRI Centre for Doctoral Training in Biomedical AI at the University of Edinburgh, School of Informatics"},{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["EP\/V050869\/1"],"award-info":[{"award-number":["EP\/V050869\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Multimorbidity Doctoral Training Programme for Health Professionals"},{"DOI":"10.13039\/100010269","name":"Wellcome Trust","doi-asserted-by":"publisher","award":["223499\/Z\/21\/Z"],"award-info":[{"award-number":["223499\/Z\/21\/Z"]}],"id":[{"id":"10.13039\/100010269","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Legal and General PLC"},{"name":"Advanced Care Research Centre"},{"DOI":"10.13039\/501100000272","name":"National Institute for Health Research","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000272","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Artificial Intelligence and Multimorbidity: Clustering in Individuals, Space and Clinical 
Context","award":["NIHR202639"],"award-info":[{"award-number":["NIHR202639"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objectives<\/jats:title>\n                  <jats:p>The aim of this study was to investigate GPT-3.5 in generating and coding medical documents with International Classification of Diseases (ICD)-10 codes for data augmentation on low-resource labels.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and Methods<\/jats:title>\n                  <jats:p>Employing GPT-3.5 we generated and coded 9606 discharge summaries based on lists of ICD-10 code descriptions of patients with infrequent (or generation) codes within the MIMIC-IV dataset. Combined with the baseline training set, this formed an augmented training set. Neural coding models were trained on baseline and augmented data and evaluated on an MIMIC-IV test set. We report micro- and macro-F1 scores on the full codeset, generation codes, and their families. Weak Hierarchical Confusion Matrices determined within-family and outside-of-family coding errors in the latter codesets. The coding performance of GPT-3.5 was evaluated on prompt-guided self-generated data and real MIMIC-IV data. Clinicians evaluated the clinical acceptability of the generated documents.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Data augmentation results in slightly lower overall model performance but improves performance for the generation candidate codes and their families, including 1 absent from the baseline training data. Augmented models display lower out-of-family error rates. 
GPT-3.5 identifies ICD-10 codes by their prompted descriptions but underperforms on real data. Evaluators highlight the correctness of generated concepts while suffering in variety, supporting information, and narrative.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Discussion and Conclusion<\/jats:title>\n                  <jats:p>While GPT-3.5 alone given our prompt setting is unsuitable for ICD-10 coding, it supports data augmentation for training neural models. Augmentation positively affects generation code families but mainly benefits codes with existing examples. Augmentation reduces out-of-family errors. Documents generated by GPT-3.5 state prompted concepts correctly but lack variety, and authenticity in narratives.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocae132","type":"journal-article","created":{"date-parts":[[2024,9,14]],"date-time":"2024-09-14T08:30:46Z","timestamp":1726302646000},"page":"2284-2293","source":"Crossref","is-referenced-by-count":28,"title":["Can GPT-3.5 generate and code discharge summaries?"],"prefix":"10.1093","volume":"31","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-7649-6251","authenticated-orcid":false,"given":"Mat\u00fa\u0161","family":"Falis","sequence":"first","affiliation":[{"name":"School of Informatics, The University of Edinburgh , Edinburgh EH8 9AB, United Kingdom"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-1163-3531","authenticated-orcid":false,"given":"Aryo Pradipta","family":"Gema","sequence":"additional","affiliation":[{"name":"School of Informatics, The University of Edinburgh , Edinburgh EH8 9AB, United Kingdom"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6828-6891","authenticated-orcid":false,"given":"Hang","family":"Dong","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Exeter , Exeter EX4 4QF, United 
Kingdom"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0564-4000","authenticated-orcid":false,"given":"Luke","family":"Daines","sequence":"additional","affiliation":[{"name":"Centre for Medical Informatics, Usher Institute, University of Edinburgh , Edinburgh EH16 4UX, United Kingdom"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7639-1859","authenticated-orcid":false,"given":"Siddharth","family":"Basetti","sequence":"additional","affiliation":[{"name":"Department of Research, Development and Innovation, National Health Service Highland , Inverness IV2 3JH, United Kingdom"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9828-9153","authenticated-orcid":false,"given":"Michael","family":"Holder","sequence":"additional","affiliation":[{"name":"Centre for Population Health Sciences, Usher Institute, The University of Edinburgh , Edinburgh EH16 4UX, United Kingdom"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7023-7108","authenticated-orcid":false,"given":"Rose S","family":"Penfold","sequence":"additional","affiliation":[{"name":"Ageing and Health, Usher Institute, The University of Edinburgh , Edinburgh EH16 4UX, United Kingdom"},{"name":"Advanced Care Research Centre, The University of Edinburgh , Edinburgh EH16 4UX, United Kingdom"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9022-3405","authenticated-orcid":false,"given":"Alexandra","family":"Birch","sequence":"additional","affiliation":[{"name":"School of Informatics, The University of Edinburgh , Edinburgh EH8 9AB, United Kingdom"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7279-1476","authenticated-orcid":false,"given":"Beatrice","family":"Alex","sequence":"additional","affiliation":[{"name":"Edinburgh Futures Institute, The University of Edinburgh , Edinburgh EH3 9EF, United Kingdom"},{"name":"School of Literatures, Languages and Cultures, The University of Edinburgh , Edinburgh EH8 9LH, United 
Kingdom"}]}],"member":"286","published-online":{"date-parts":[[2024,9,13]]},"reference":[{"issue":"1","key":"2024092007535297200_ocae132-B1","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1038\/s41746-022-00705-7","article-title":"Automated clinical coding: what, why, and where we are?","volume":"5","author":"Dong","year":"2022","journal-title":"NPJ Digit Med"},{"issue":"1","key":"2024092007535297200_ocae132-B2","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1038\/s41597-023-01945-2","article-title":"MIMIC-IV, a freely accessible electronic health record dataset","volume":"10","author":"Johnson","year":"2023","journal-title":"Sci Data"},{"key":"2024092007535297200_ocae132-B3","first-page":"1101","author":"Mullenbach","year":"2018"},{"key":"2024092007535297200_ocae132-B4","first-page":"103728","article-title":"Explainable automated coding of clinical notes using hierarchical label-wise attention networks and label embedding initialisation","year":"2021"},{"key":"2024092007535297200_ocae132-B5","first-page":"196","author":"Kim","year":"2021"},{"key":"2024092007535297200_ocae132-B6","first-page":"31","author":"Rios","year":"2018"},{"key":"2024092007535297200_ocae132-B7","first-page":"4018","author":"Song","year":"2021"},{"key":"2024092007535297200_ocae132-B8","author":"Ren","year":"2022"},{"key":"2024092007535297200_ocae132-B9","first-page":"523","author":"Wang","year":"2022"},{"key":"2024092007535297200_ocae132-B10","author":"Falis","year":"2022"},{"key":"2024092007535297200_ocae132-B11","author":"Kim","year":"2022"},{"key":"2024092007535297200_ocae132-B12","first-page":"138","author":"Barros","year":"2022"},{"key":"2024092007535297200_ocae132-B13","author":"Afkanpour","year":"2022"},{"key":"2024092007535297200_ocae132-B14","first-page":"27730","volume-title":"Advances in Neural Information Processing 
Systems","author":"Ouyang","year":"2022"},{"key":"2024092007535297200_ocae132-B15","author":"Touvron","year":"2023"},{"key":"2024092007535297200_ocae132-B16","author":"Zhao","year":"2023"},{"key":"2024092007535297200_ocae132-B17","author":"Singhal","year":"2022"},{"issue":"12","key":"2024092007535297200_ocae132-B18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3571730","article-title":"Survey of hallucination in natural language generation","volume":"55","author":"Ji","year":"2023","journal-title":"ACM Comput Surv"},{"issue":"13","key":"2024092007535297200_ocae132-B19","doi-asserted-by":"crossref","first-page":"1233","DOI":"10.1056\/NEJMsr2214184","article-title":"Benefits, limits, and risks of gpt-4 as an ai chatbot for medicine","volume":"388","author":"Lee","year":"2023","journal-title":"N Engl J Med"},{"issue":"6","key":"2024092007535297200_ocae132-B20","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1016\/j.diii.2023.02.003","article-title":"Revolutionizing radiology with gpt-based models: Current applications, future possibilities and limitations of chatgpt","volume":"104","author":"Lecler","year":"2023","journal-title":"Diagn Interv Imaging"},{"key":"2024092007535297200_ocae132-B21","first-page":"2023","author":"Yeung","year":"2023"},{"key":"2024092007535297200_ocae132-B22","author":"Kraljevic","year":"2022"},{"key":"2024092007535297200_ocae132-B23","author":"Ghosh","year":"2023"},{"key":"2024092007535297200_ocae132-B24","author":"Edin","year":"2023"},{"key":"2024092007535297200_ocae132-B25","author":"Nguyen","year":"2023"},{"key":"2024092007535297200_ocae132-B26","author":"Vu","year":"2020"},{"key":"2024092007535297200_ocae132-B27","first-page":"8180","author":"Li","year":"2020"},{"issue":"2","key":"2024092007535297200_ocae132-B28","first-page":"3111","article-title":"Distributed representations of words and phrases and their compositionality","volume":"26","author":"Mikolov","year":"2013","journal-title":"Adv Neural Inf Process 
Syst"},{"key":"2024092007535297200_ocae132-B29","author":"Devlin","year":"2018"},{"key":"2024092007535297200_ocae132-B30","author":"Huang","year":"2022"},{"issue":"3","key":"2024092007535297200_ocae132-B31","doi-asserted-by":"crossref","first-page":"820","DOI":"10.1007\/s10618-014-0382-x","article-title":"Evaluation measures for hierarchical classification: a unified view and novel approaches","volume":"29","author":"Kosmopoulos","year":"2015","journal-title":"Data Min Knowl Disc"},{"key":"2024092007535297200_ocae132-B32","first-page":"907","author":"Falis","year":"2021"},{"issue":"5","key":"2024092007535297200_ocae132-B33","doi-asserted-by":"crossref","first-page":"378","DOI":"10.1037\/h0031619","article-title":"Measuring nominal scale agreement among many raters","volume":"76","author":"Fleiss","year":"1971","journal-title":"Psychol Bull"},{"key":"2024092007535297200_ocae132-B34","author":"Lewis","year":"2020"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/advance-article-pdf\/doi\/10.1093\/jamia\/ocae132\/59206416\/ocae132.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/advance-article-pdf\/doi\/10.1093\/jamia\/ocae132\/59206416\/ocae132.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,20]],"date-time":"2024-09-20T07:54:19Z","timestamp":1726818859000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/31\/10\/2284\/7756747"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,13]]},"references-count":34,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2024,9,13]]},"published-print":{"date-parts":[[2024,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocae132","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,10]]},"published":{"date-parts":[[2024,9,13]]}}}