{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T17:50:51Z","timestamp":1773856251816,"version":"3.50.1"},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2025,5,7]],"date-time":"2025-05-07T00:00:00Z","timestamp":1746576000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>Automated data extraction from echocardiography reports could facilitate large-scale registry creation and clinical surveillance of valvular heart diseases (VHD). We evaluated the performance of open-source large language models (LLMs) guided by prompt instructions and chain of thought (CoT) for this task.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Methods<\/jats:title>\n                    <jats:p>From consecutive transthoracic echocardiographies performed in our center, we utilized 200 random reports from 2019 for prompt optimization and 1000 from 2023 for evaluation. Five instruction-tuned LLMs (Qwen2.0-72B, Llama3.0-70B, Mixtral8-46.7B, Llama3.0-8B, and Phi3.0-3.8B) were guided by prompt instructions with and without CoT to classify prosthetic valve presence and VHD severity. Performance was evaluated using classification metrics against expert-labeled ground truth. 
Mean squared error (MSE) was also calculated for predicted severity\u2019s deviation from actual severity.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>With CoT prompting, Llama3.0-70B and Qwen2.0 achieved the highest performance (accuracy: 99.1% and 98.9% for VHD severity; 100% and 99.9% for prosthetic valve; MSE: 0.02 and 0.05, respectively). Smaller models showed lower accuracy for VHD severity (54.1%-85.9%) but maintained high accuracy for prosthetic valve detection (&gt;96%). Chain of thought reasoning yielded higher accuracy for larger models while increasing processing time from 2-25 to 67-154 seconds per report. Based on CoT reasonings, the wrong predictions were mainly due to model outputs being influenced by irrelevant information in the text or failure to follow the prompt instructions.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>Our study demonstrates the near-perfect performance of open-source LLMs for automated echocardiography report interpretation with the purpose of registry formation and disease surveillance. 
While larger models achieved exceptional accuracy through prompt optimization, practical implementation requires balancing performance with computational efficiency.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/jamia\/ocaf056","type":"journal-article","created":{"date-parts":[[2025,3,18]],"date-time":"2025-03-18T16:22:08Z","timestamp":1742314928000},"page":"1120-1129","source":"Crossref","is-referenced-by-count":2,"title":["A comparative analysis of privacy-preserving large language models for automated echocardiography report analysis"],"prefix":"10.1093","volume":"32","author":[{"given":"Elham","family":"Mahmoudi","sequence":"first","affiliation":[{"name":"Department of Radiology, Radiology Informatics Lab, Mayo Clinic , Rochester, MN 55905,","place":["United States"]},{"name":"Department of Cardiovascular Medicine, Mayo Clinic Rochester , Rochester, MN 55905,","place":["United States"]}]},{"given":"Sanaz","family":"Vahdati","sequence":"additional","affiliation":[{"name":"Department of Radiology, Radiology Informatics Lab, Mayo Clinic , Rochester, MN 55905,","place":["United States"]}]},{"given":"Chieh-Ju","family":"Chao","sequence":"additional","affiliation":[{"name":"Department of Radiology, Radiology Informatics Lab, Mayo Clinic , Rochester, MN 55905,","place":["United States"]},{"name":"Department of Cardiovascular Medicine, Mayo Clinic Rochester , Rochester, MN 55905,","place":["United States"]}]},{"given":"Bardia","family":"Khosravi","sequence":"additional","affiliation":[{"name":"Department of Radiology, Radiology Informatics Lab, Mayo Clinic , Rochester, MN 55905,","place":["United States"]}]},{"given":"Ajay","family":"Misra","sequence":"additional","affiliation":[{"name":"Department of Radiology, Radiology Informatics Lab, Mayo Clinic , Rochester, MN 55905,","place":["United 
States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5788-9734","authenticated-orcid":false,"given":"Francisco","family":"Lopez-Jimenez","sequence":"additional","affiliation":[{"name":"Department of Cardiovascular Medicine, Mayo Clinic Rochester , Rochester, MN 55905,","place":["United States"]}]},{"given":"Bradley J","family":"Erickson","sequence":"additional","affiliation":[{"name":"Department of Radiology, Radiology Informatics Lab, Mayo Clinic , Rochester, MN 55905,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2025,5,7]]},"reference":[{"key":"2025071409362542800_ocaf056-B1","doi-asserted-by":"publisher","first-page":"104622","DOI":"10.1016\/j.ijmedinf.2021.104622","article-title":"Comparing automated vs manual data collection for COVID-specific medications from electronic health records","volume":"157","author":"Yin","year":"2022","journal-title":"Int J Med Inform"},{"key":"2025071409362542800_ocaf056-B2","doi-asserted-by":"publisher","first-page":"329","DOI":"10.1148\/radiol.16142770","article-title":"Natural language processing in radiology: a systematic review","volume":"279","author":"Pons","year":"2016","journal-title":"Radiology"},{"key":"2025071409362542800_ocaf056-B3","doi-asserted-by":"crossref","first-page":"622","DOI":"10.1055\/s-0040-1715567","article-title":"A rule-based data quality assessment system for electronic health record data","volume":"11","author":"Wang","year":"2020","journal-title":"Appl Clin Informat"},{"key":"2025071409362542800_ocaf056-B4","doi-asserted-by":"publisher","first-page":"e232756","DOI":"10.1148\/radiol.232756","article-title":"Chatbots and large language models in radiology: a practical primer for clinical and research 
applications","volume":"310","author":"Bhayana","year":"2024","journal-title":"Radiology"},{"key":"2025071409362542800_ocaf056-B5","author":"Shekhar","year":"2024"},{"key":"2025071409362542800_ocaf056-B6","doi-asserted-by":"publisher","author":"Chao","DOI":"10.1101\/2024.01.18.24301503"},{"key":"2025071409362542800_ocaf056-B7","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume":"35","author":"Ouyang","year":"2022","journal-title":"Adv Neural Inform Process Syst"},{"key":"2025071409362542800_ocaf056-B8","doi-asserted-by":"crossref","first-page":"e428","DOI":"10.1016\/S2589-7500(24)00061-X","article-title":"Ethical and regulatory challenges of large language models in medicine","volume":"6","author":"Ong","year":"2024","journal-title":"Lancet Digital Health"},{"key":"2025071409362542800_ocaf056-B9","doi-asserted-by":"publisher","author":"Wiest","DOI":"10.1101\/2023.12.07.23299648"},{"key":"2025071409362542800_ocaf056-B10","doi-asserted-by":"publisher","first-page":"e231147","DOI":"10.1148\/radiol.231147","article-title":"Feasibility of using the privacy-preserving large language model vicuna for labeling radiology reports","volume":"309","author":"Mukherjee","year":"2023","journal-title":"Radiology"},{"key":"2025071409362542800_ocaf056-B11","doi-asserted-by":"publisher","author":"Artsi","DOI":"10.1101\/2024.01.05.24300884"},{"key":"2025071409362542800_ocaf056-B12","author":"Yang","year":"2024"},{"key":"2025071409362542800_ocaf056-B13","author":"Dubey","year":"2024"},{"key":"2025071409362542800_ocaf056-B14","author":"Jiang","year":"2024"},{"key":"2025071409362542800_ocaf056-B15","doi-asserted-by":"publisher","first-page":"2097","DOI":"10.1093\/jamia\/ocae085","article-title":"Local large language models for privacy-preserving accelerated review of historic echocardiogram reports","volume":"31","author":"Vaid","year":"2024","journal-title":"J Am Med Inform 
Assoc"},{"key":"2025071409362542800_ocaf056-B16","author":"Gu","year":"2024"},{"key":"2025071409362542800_ocaf056-B17","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1186\/s12911-021-01533-7","article-title":"A systematic review of natural language processing applied to radiology reports","volume":"21","author":"Casey","year":"2021","journal-title":"BMC Med Inform Decis Mak"},{"key":"2025071409362542800_ocaf056-B18","doi-asserted-by":"crossref","first-page":"626","DOI":"10.1093\/ehjdh\/ztac047","article-title":"Automated interpretation of stress echocardiography reports using natural language processing","volume":"3","author":"Zheng","year":"2022","journal-title":"Eur Heart J Digit Health"},{"key":"2025071409362542800_ocaf056-B19","doi-asserted-by":"crossref","first-page":"1307","DOI":"10.3390\/bioengineering10111307","article-title":"Development and evaluation of a natural language processing system for curating a trans-thoracic echocardiogram (TTE) database","volume":"10","author":"Dong","year":"2023","journal-title":"Bioengineering-Basel"},{"key":"2025071409362542800_ocaf056-B20","doi-asserted-by":"publisher","first-page":"156","DOI":"10.1016\/j.cvdhj.2021.03.003","article-title":"Large-scale identification of aortic stenosis and its severity using natural language processing on electronic health records","volume":"2","author":"Solomon","year":"2021","journal-title":"Cardiovasc Digit Health J"},{"key":"2025071409362542800_ocaf056-B21","doi-asserted-by":"crossref","first-page":"e0153749","DOI":"10.1371\/journal.pone.0153749","article-title":"A natural language processing tool for large-scale data extraction from echocardiography reports","volume":"11","author":"Nath","year":"2016","journal-title":"Plos One"},{"key":"2025071409362542800_ocaf056-B22","author":"Rawte","year":"2023"},{"key":"2025071409362542800_ocaf056-B23","author":"Rawte","year":"2023"}],"container-title":["Journal of the American Medical Informatics 
Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/32\/7\/1120\/63103763\/ocaf056.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/32\/7\/1120\/63103763\/ocaf056.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,14]],"date-time":"2025-07-14T13:36:36Z","timestamp":1752500196000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/32\/7\/1120\/8126598"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,7]]},"references-count":23,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2025,5,7]]},"published-print":{"date-parts":[[2025,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocaf056","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.12.19.24319181","asserted-by":"object"}]},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,7]]},"published":{"date-parts":[[2025,5,7]]}}}