{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T07:09:31Z","timestamp":1776755371949,"version":"3.51.2"},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,9,27]],"date-time":"2024-09-27T00:00:00Z","timestamp":1727395200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,9,27]],"date-time":"2024-09-27T00:00:00Z","timestamp":1727395200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Med Imaging"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>The impression section integrates key findings of a radiology report but can be subjective and variable. We sought to fine-tune and evaluate an open-source Large Language Model (LLM) in automatically generating impressions from the remainder of a radiology report across different imaging modalities and hospitals.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Methods<\/jats:title>\n                    <jats:p>\n                      In this institutional review board-approved retrospective study, we collated a dataset of CT, US, and MRI radiology reports from the University of California San Francisco Medical Center (UCSFMC) (\n                      <jats:italic>n<\/jats:italic>\n                      \u2009=\u2009372,716) and the Zuckerberg San Francisco General (ZSFG) Hospital and Trauma Center (\n                      <jats:italic>n<\/jats:italic>\n                      \u2009=\u200960,049), both under a single institution. 
The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score, an automatic natural language evaluation metric that measures word overlap, was used for automatic evaluation. A reader study with five cardiothoracic radiologists was performed to more strictly evaluate the model\u2019s performance on a specific modality (CT chest exams) with a radiologist subspecialist baseline. We stratified the results of the reader performance study based on the diagnosis category and the original impression length to gauge case complexity.\n                    <\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>The LLM achieved ROUGE-L scores of 46.51, 44.2, and 50.96 on UCSFMC and, upon external validation, ROUGE-L scores of 40.74, 37.89, and 24.61 on ZSFG across the CT, US, and MRI modalities respectively, implying a substantial degree of overlap between the model-generated impressions and impressions written by the subspecialist attending radiologists, but with a degree of degradation upon external validation. In our reader study, the model-generated impressions achieved overall mean scores of 3.56\/4, 3.92\/4, 3.37\/4, 18.29\u00a0s, 12.32 words, and 84 while the original impression written by a subspecialist radiologist achieved overall mean scores of 3.75\/4, 3.87\/4, 3.54\/4, 12.2\u00a0s, 5.74 words, and 89 for clinical accuracy, grammatical accuracy, stylistic quality, edit time, edit distance, and ROUGE-L score respectively. The LLM achieved the highest clinical accuracy ratings for acute\/emergent findings and on shorter impressions.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>An open-source fine-tuned LLM can generate impressions to a satisfactory level of clinical accuracy, grammatical accuracy, and stylistic quality. 
Our reader performance study demonstrates the potential of large language models in drafting radiology report impressions that can aid in streamlining radiologists\u2019 workflows.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s12880-024-01435-w","type":"journal-article","created":{"date-parts":[[2024,9,27]],"date-time":"2024-09-27T06:02:34Z","timestamp":1727416954000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":24,"title":["An open-source fine-tuned large language model for radiological impression generation: a multi-reader performance study"],"prefix":"10.1186","volume":"24","author":[{"given":"Adrian","family":"Serapio","sequence":"first","affiliation":[]},{"given":"Gunvant","family":"Chaudhari","sequence":"additional","affiliation":[]},{"given":"Cody","family":"Savage","sequence":"additional","affiliation":[]},{"given":"Yoo Jin","family":"Lee","sequence":"additional","affiliation":[]},{"given":"Maya","family":"Vella","sequence":"additional","affiliation":[]},{"given":"Shravan","family":"Sridhar","sequence":"additional","affiliation":[]},{"given":"Jamie Lee","family":"Schroeder","sequence":"additional","affiliation":[]},{"given":"Jonathan","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Adam","family":"Yala","sequence":"additional","affiliation":[]},{"given":"Jae Ho","family":"Sohn","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,9,27]]},"reference":[{"issue":"6","key":"1435_CR1","doi-asserted-by":"publisher","first-page":"1658","DOI":"10.1148\/rg.2020200020","volume":"40","author":"MP Hartung","year":"2020","unstructured":"Hartung MP, Bickle IC, Gaillard F, Kanne JP. How to create a great radiology report. RadioGraphics. 2020;40(6):1658\u201370. https:\/\/doi.org\/10.1148\/rg.2020200020. 
Radiological Society of North America.","journal-title":"RadioGraphics."},{"issue":"5","key":"1435_CR2","doi-asserted-by":"publisher","first-page":"1239","DOI":"10.2214\/ajr.175.5.1751239","volume":"175","author":"FM Hall","year":"2000","unstructured":"Hall FM. Language of the Radiology Report. Am J Roentgenol. 2000;175(5):1239\u201342. https:\/\/doi.org\/10.2214\/ajr.175.5.1751239. American Roentgen Ray Society.","journal-title":"Am J Roentgenol."},{"issue":"2","key":"1435_CR3","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1007\/s13244-011-0066-7","volume":"2","author":"Good practice for radiological reporting","year":"2011","unstructured":"Good practice for radiological reporting. Guidelines from the European Society of Radiology (ESR). Insights Imaging. 2011;2(2):93\u20136. https:\/\/doi.org\/10.1007\/s13244-011-0066-7.","journal-title":"Insights Imaging"},{"key":"1435_CR4","first-page":"465","volume":"2011","author":"EF Gershanik","year":"2011","unstructured":"Gershanik EF, Lacson R, Khorasani R. Critical finding capture in the impression section of radiology reports. AMIA Annu Symp Proc. 2011;2011:465\u20139.","journal-title":"AMIA Annu Symp Proc"},{"issue":"1","key":"1435_CR5","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1007\/s13244-016-0534-1","volume":"8","author":"AP Brady","year":"2016","unstructured":"Brady AP. Error and discrepancy in radiology: inevitable or avoidable? Insights Imaging. 2016;8(1):171\u201382. https:\/\/doi.org\/10.1007\/s13244-016-0534-1.","journal-title":"Insights Imaging"},{"issue":"4","key":"1435_CR6","doi-asserted-by":"publisher","first-page":"e230725","DOI":"10.1148\/radiol.230725","volume":"307","author":"LC Adams","year":"2023","unstructured":"Adams LC, Truhn D, Busch F, et al. Leveraging GPT-4 for post hoc transformation of free-text radiology reports into structured reporting: a multilingual feasibility study. Radiology. 2023;307(4):e230725. 
https:\/\/doi.org\/10.1148\/radiol.230725 Radiological Society of North America.","journal-title":"Radiology."},{"issue":"5","key":"1435_CR7","doi-asserted-by":"publisher","first-page":"e230582","DOI":"10.1148\/radiol.230582","volume":"307","author":"R Bhayana","year":"2023","unstructured":"Bhayana R, Krishna S, Bleakney RR. Performance of ChatGPT on a radiology board-style examination: insights into current strengths and limitations. Radiology. 2023;307(5):e230582. https:\/\/doi.org\/10.1148\/radiol.230582. Radiological Society of North America.","journal-title":"Radiology."},{"issue":"5","key":"1435_CR8","doi-asserted-by":"publisher","first-page":"e230922","DOI":"10.1148\/radiol.230922","volume":"307","author":"AA Rahsepar","year":"2023","unstructured":"Rahsepar AA, Tavakoli N, Kim GHJ, Hassani C, Abtin F, Bedayat A. How AI responds to common lung cancer questions: ChatGPT versus Google Bard. Radiology. 2023;307(5):e230922. https:\/\/doi.org\/10.1148\/radiol.230922. Radiological Society of North America.","journal-title":"Radiology."},{"issue":"5","key":"1435_CR9","doi-asserted-by":"publisher","first-page":"e231259","DOI":"10.1148\/radiol.231259","volume":"307","author":"Z Sun","year":"2023","unstructured":"Sun Z, Ong H, Kennedy P, et al. Evaluating GPT4 on impressions generation in radiology reports. Radiology. 2023;307(5):e231259. https:\/\/doi.org\/10.1148\/radiol.231259. Radiological Society of North America.","journal-title":"Radiology."},{"issue":"1","key":"1435_CR10","doi-asserted-by":"publisher","first-page":"e231147","DOI":"10.1148\/radiol.231147","volume":"309","author":"P Mukherjee","year":"2023","unstructured":"Mukherjee P, Hou B, Lanfredi RB, Summers RM. Feasibility of using the privacy-preserving large language model vicuna for labeling radiology reports. Radiology. 2023;309(1):e231147. https:\/\/doi.org\/10.1148\/radiol.231147. 
Radiological Society of North America.","journal-title":"Radiology."},{"key":"1435_CR11","doi-asserted-by":"publisher","unstructured":"Chung HW, Hou L, Longpre S, et al. Scaling Instruction-finetuned language models. arXiv; 2022. https:\/\/doi.org\/10.48550\/arXiv.2210.11416.","DOI":"10.48550\/arXiv.2210.11416"},{"issue":"1","key":"1435_CR12","first-page":"140:5485","volume":"21","author":"C Raffel","year":"2020","unstructured":"Raffel C, Shazeer N, Roberts A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res. 2020;21(1):140:5485-140:5551.","journal-title":"J Mach Learn Res."},{"key":"1435_CR13","doi-asserted-by":"publisher","unstructured":"Paszke A, Gross S, Massa F, et al. PyTorch: an imperative style, high-performance deep learning library. arXiv; 2019. https:\/\/doi.org\/10.48550\/arXiv.1912.01703.","DOI":"10.48550\/arXiv.1912.01703"},{"key":"1435_CR14","doi-asserted-by":"publisher","unstructured":"Wolf T, Debut L, Sanh V, et al. HuggingFace\u2019s transformers: state-of-the-art natural language processing. arXiv; 2020. https:\/\/doi.org\/10.48550\/arXiv.1910.03771.","DOI":"10.48550\/arXiv.1910.03771"},{"key":"1435_CR15","doi-asserted-by":"publisher","unstructured":"Loshchilov I, Hutter F. Decoupled weight decay regularization. arXiv; 2019. https:\/\/doi.org\/10.48550\/arXiv.1711.05101.","DOI":"10.48550\/arXiv.1711.05101"},{"key":"1435_CR16","unstructured":"Lin C-Y. ROUGE: A package for automatic evaluation of summaries. text summ branches out. Barcelona, Spain: Association for Computational Linguistics; 2004. p. 74\u201381. https:\/\/aclanthology.org\/W04-1013. Accessed 15\u00a0Apr 2023."},{"issue":"1","key":"1435_CR17","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1214\/aoms\/1177730491","volume":"18","author":"HB Mann","year":"1947","unstructured":"Mann HB, Whitney DR. On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat. 
1947;18(1):50\u201360. https:\/\/doi.org\/10.1214\/aoms\/1177730491. Institute of Mathematical Statistics.","journal-title":"Ann Math Stat."},{"issue":"1","key":"1435_CR18","doi-asserted-by":"publisher","first-page":"3","DOI":"10.2466\/pr0.1966.19.1.3","volume":"19","author":"JJ Bartko","year":"1966","unstructured":"Bartko JJ. The intraclass correlation coefficient as a measure of reliability. Psychol Rep. 1966;19(1):3\u201311. https:\/\/doi.org\/10.2466\/pr0.1966.19.1.3.","journal-title":"Psychol Rep"},{"issue":"3","key":"1435_CR19","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1038\/s41592-019-0686-2","volume":"17","author":"P Virtanen","year":"2020","unstructured":"Virtanen P, Gommers R, Oliphant TE, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020;17(3):261\u201372. https:\/\/doi.org\/10.1038\/s41592-019-0686-2. Nature Publishing Group.","journal-title":"Nat Methods"},{"issue":"31","key":"1435_CR20","doi-asserted-by":"publisher","first-page":"1026","DOI":"10.21105\/joss.01026","volume":"3","author":"R Vallat","year":"2018","unstructured":"Vallat R. Pingouin: statistics in Python. J Open Source Softw. 2018;3(31):1026. https:\/\/doi.org\/10.21105\/joss.01026.","journal-title":"J Open Source Softw"},{"issue":"7825","key":"1435_CR21","doi-asserted-by":"publisher","first-page":"357","DOI":"10.1038\/s41586-020-2649-2","volume":"585","author":"CR Harris","year":"2020","unstructured":"Harris CR, Millman KJ, van der Walt SJ, et al. Array programming with NumPy. Nature. 2020;585(7825):357\u201362. https:\/\/doi.org\/10.1038\/s41586-020-2649-2. Nature Publishing Group.","journal-title":"Nature."},{"key":"1435_CR22","doi-asserted-by":"publisher","unstructured":"Ma C, Wu Z, Wang J, et al. ImpressionGPT: an iterative optimizing framework for radiology report summarization with ChatGPT. arXiv; 2023. 
https:\/\/doi.org\/10.48550\/arXiv.2304.08448.","DOI":"10.48550\/arXiv.2304.08448"},{"issue":"11","key":"1435_CR23","doi-asserted-by":"publisher","first-page":"1008","DOI":"10.1136\/thx.2004.031039","volume":"62","author":"R du Bois","year":"2007","unstructured":"du Bois R, King TE. Challenges in pulmonary fibrosis \u00b7 5: The NSIP\/UIP debate. Thorax. 2007;62(11):1008\u201312. https:\/\/doi.org\/10.1136\/thx.2004.031039.","journal-title":"Thorax"},{"issue":"1","key":"1435_CR24","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41746-023-00879-8","volume":"6","author":"M Wornow","year":"2023","unstructured":"Wornow M, Xu Y, Thapa R, et al. The shaky foundations of large language models and foundation models for electronic health records. NPJ Digit Med. 2023;6(1):1\u201310. https:\/\/doi.org\/10.1038\/s41746-023-00879-8. Nature Publishing Group.","journal-title":"Npj Digit Med."},{"issue":"6","key":"1435_CR25","doi-asserted-by":"publisher","first-page":"e333","DOI":"10.1016\/S2589-7500(23)00083-3","volume":"5","author":"H Li","year":"2023","unstructured":"Li H, Moon JT, Purkayastha S, Celi LA, Trivedi H, Gichoya JW. Ethics of large language models in medicine and medical research. Lancet Digit Health. 2023;5(6):e333\u20135. https:\/\/doi.org\/10.1016\/S2589-7500(23)00083-3. Elsevier.","journal-title":"Lancet Digit Health."},{"key":"1435_CR26","doi-asserted-by":"publisher","unstructured":"Shen Y, Heacock L, Elias J, et al. ChatGPT and other large language models are double-edged swords. Radiology. 2023.\u00a0https:\/\/doi.org\/10.1148\/radiol.230163. 
Radiological Society of North America.","DOI":"10.1148\/radiol.230163"}],"container-title":["BMC Medical Imaging"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12880-024-01435-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12880-024-01435-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12880-024-01435-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,27]],"date-time":"2024-09-27T06:07:10Z","timestamp":1727417230000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcmedimaging.biomedcentral.com\/articles\/10.1186\/s12880-024-01435-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,27]]},"references-count":26,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["1435"],"URL":"https:\/\/doi.org\/10.1186\/s12880-024-01435-w","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-4656707\/v1","asserted-by":"object"}]},"ISSN":["1471-2342"],"issn-type":[{"value":"1471-2342","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,9,27]]},"assertion":[{"value":"28 June 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 September 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 September 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The radiology reports in this study were collected 
retrospectively following the University of California San Francisco\u2019s Institutional Review Board approval (reference #: 303383) and informed consent waiver, following the Helsinki Declaration of 1975 as revised in 2013. All methods were performed in accordance with the relevant guidelines and regulations.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"254"}}