{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T05:34:51Z","timestamp":1774589691377,"version":"3.50.1"},"reference-count":42,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2023,11,10]],"date-time":"2023-11-10T00:00:00Z","timestamp":1699574400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Bioengineering"],"abstract":"<jats:p>Background: Although electronic health records (EHR) provide useful insights into disease patterns and patient treatment optimisation, their reliance on unstructured data presents a difficulty. Echocardiography reports, which provide extensive pathology information for cardiovascular patients, are particularly challenging to extract and analyse, because of their narrative structure. Although natural language processing (NLP) has been utilised successfully in a variety of medical fields, it is not commonly used in echocardiography analysis. Objectives: To develop an NLP-based approach for extracting and categorising data from echocardiography reports by accurately converting continuous (e.g., LVOT VTI, AV VTI and TR Vmax) and discrete (e.g., regurgitation severity) outcomes in a semi-structured narrative format into a structured and categorised format, allowing for future research or clinical use. Methods: 135,062 Trans-Thoracic Echocardiogram (TTE) reports were derived from 146967 baseline echocardiogram reports and split into three cohorts: Training and Validation (n = 1075), Test Dataset (n = 98) and Application Dataset (n = 133,889). The NLP system was developed and was iteratively refined using medical expert knowledge. The system was used to curate a moderate-fidelity database from extractions of 133,889 reports. 
A hold-out validation set of 98 reports was blindly annotated and extracted by two clinicians for comparison with the NLP extraction. Agreement, discrimination, accuracy and calibration of outcome measure extractions were evaluated. Results: Continuous outcomes including LVOT VTI, AV VTI and TR Vmax exhibited perfect inter-rater reliability using intra-class correlation scores (ICC = 1.00, p &lt; 0.05) alongside high R\u00b2 values, demonstrating an ideal alignment between the NLP system and clinicians. A good level (ICC = 0.75\u20130.9, p &lt; 0.05) of inter-rater reliability was observed for outcomes such as LVOT Diam, Lateral MAPSE, Peak E Velocity, Lateral E\u2019 Velocity, PV Vmax, Sinuses of Valsalva and Ascending Aorta diameters. Furthermore, the accuracy rate for discrete outcome measures was 91.38% in the confusion matrix analysis, indicating effective performance. Conclusions: The NLP-based technique yielded good results when it came to extracting and categorising data from echocardiography reports. The system demonstrated a high degree of agreement and concordance with clinician extractions. This study contributes to the effective use of semi-structured data by providing a useful tool for converting semi-structured text to a structured echo report that can be used for data management. 
Additional validation and implementation in healthcare settings can improve data availability and support research and clinical decision-making.<\/jats:p>","DOI":"10.3390\/bioengineering10111307","type":"journal-article","created":{"date-parts":[[2023,11,13]],"date-time":"2023-11-13T02:02:42Z","timestamp":1699840962000},"page":"1307","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["Development and Evaluation of a Natural Language Processing System for Curating a Trans-Thoracic Echocardiogram (TTE) Database"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1953-0063","authenticated-orcid":false,"given":"Tim","family":"Dong","sequence":"first","affiliation":[{"name":"Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK"}]},{"given":"Nicholas","family":"Sunderland","sequence":"additional","affiliation":[{"name":"Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9435-726X","authenticated-orcid":false,"given":"Angus","family":"Nightingale","sequence":"additional","affiliation":[{"name":"Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK"}]},{"given":"Daniel P.","family":"Fudulu","sequence":"additional","affiliation":[{"name":"Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK"}]},{"given":"Jeremy","family":"Chan","sequence":"additional","affiliation":[{"name":"Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK"}]},{"given":"Ben","family":"Zhai","sequence":"additional","affiliation":[{"name":"School of Computing Science, Northumbria University, Newcastle upon Tyne NE1 8ST, 
UK"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2113-9653","authenticated-orcid":false,"given":"Alberto","family":"Freitas","sequence":"additional","affiliation":[{"name":"Faculty of Medicine, University of Porto, 4100 Porto, Portugal"}]},{"given":"Massimo","family":"Caputo","sequence":"additional","affiliation":[{"name":"Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK"}]},{"given":"Arnaldo","family":"Dimagli","sequence":"additional","affiliation":[{"name":"Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK"}]},{"given":"Stuart","family":"Mires","sequence":"additional","affiliation":[{"name":"Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK"}]},{"given":"Mike","family":"Wyatt","sequence":"additional","affiliation":[{"name":"University Hospitals Bristol and Weston, Marlborough St, Bristol BS1 3NU, UK"}]},{"given":"Umberto","family":"Benedetto","sequence":"additional","affiliation":[{"name":"Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK"}]},{"given":"Gianni D.","family":"Angelini","sequence":"additional","affiliation":[{"name":"Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol BS2 8HW, UK"}]}],"member":"1968","published-online":{"date-parts":[[2023,11,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"9253","DOI":"10.1038\/s41598-019-45705-y","article-title":"Relevant Word Order Vectorization for Improved Natural Language Processing in Electronic Health Records","volume":"9","author":"Thompson","year":"2019","journal-title":"Sci. 
Rep."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1038\/s43856-021-00043-x","article-title":"Development and multicenter validation of chest X-ray radiography interpretations based on natural language processing","volume":"1","author":"Zhang","year":"2021","journal-title":"Commun. Med."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"20265","DOI":"10.1038\/s41598-020-77258-w","article-title":"Validation of deep learning natural language processing algorithm for keyword extraction from pathology reports in electronic health records","volume":"10","author":"Kim","year":"2020","journal-title":"Sci. Rep."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"630","DOI":"10.1038\/s41398-021-01722-y","article-title":"Natural Language Processing markers in first episode psychosis and people at clinical high-risk","volume":"11","author":"Morgan","year":"2021","journal-title":"Transl. Psychiatry"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1038\/s41746-019-0179-9","article-title":"Language impairment in adults with end-stage liver disease: Application of natural language processing towards patient-generated health records","volume":"2","author":"Dickerson","year":"2019","journal-title":"NPJ Digit. Med."},{"key":"ref_6","unstructured":"Liu, L., Zhang, C., and Tao, D. (2023). GAN-MDF: A Method for Multi-fidelity Data Fusion in Digital Twins. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"7425","DOI":"10.1016\/j.eswa.2014.05.043","article-title":"A novel approach for multimodal medical image fusion","volume":"41","author":"Liu","year":"2014","journal-title":"Expert Syst. Appl."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1109\/MCG.2016.59","article-title":"Data-Driven Healthcare: Challenges and Opportunities for Interactive Visualization","volume":"36","author":"Gotz","year":"2016","journal-title":"IEEE Comput. Graph. 
Appl."},{"key":"ref_9","unstructured":"(2023, June 20). Large-Scale Identification of Aortic Stenosis and Its Severity Using Natural Language Processing on Electronic Health Records\u2014ScienceDirect. Available online: https:\/\/www.sciencedirect.com\/science\/article\/pii\/S2666693621000256."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"102584","DOI":"10.1016\/j.artmed.2023.102584","article-title":"A general text mining method to extract echocardiography measurement results from echocardiography documents","volume":"143","author":"Fogarassy","year":"2023","journal-title":"Artif. Intell. Med."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Nath, C., Albaghdadi, M.S., and Jonnalagadda, S.R. (2016). A Natural Language Processing Tool for Large-Scale Data Extraction from Echocardiography Reports. PLoS ONE, 11.","DOI":"10.1371\/journal.pone.0153749"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1016\/j.jbi.2017.01.017","article-title":"Extraction of left ventricular ejection fraction information from various types of clinical reports","volume":"67","author":"Kim","year":"2017","journal-title":"J. Biomed. Inform."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"626","DOI":"10.1093\/ehjdh\/ztac047","article-title":"Automated interpretation of stress echocardiography reports using natural language processing","volume":"3","author":"Zheng","year":"2022","journal-title":"Eur. Heart J. Digit. Health"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Arnaud, E., Elbattah, M., Gignon, M., and Dequen, G. (2022, January 9\u201311). Learning Embeddings from Free-text Triage Notes using Pretrained Transformer Models. 
Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2022), Workshop on Scaling-Up Health-IT, Vienna, Austria.","DOI":"10.5220\/0011012800003123"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Rietberg, M.T., Nguyen, V.B., Geerdink, J., Vijlbrief, O., and Seifert, C. (2023). Accurate and Reliable Classification of Unstructured Reports on Their Diagnostic Goal Using BERT Models. Diagnostics, 13.","DOI":"10.3390\/diagnostics13071251"},{"key":"ref_16","first-page":"1060","article-title":"Exploiting Unlabeled Texts with Clustering-based Instance Selection for Medical Relation Classification","volume":"2017","author":"Kim","year":"2018","journal-title":"AMIA Annu. Symp. Proc."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"G59","DOI":"10.1530\/ERP-20-0026","article-title":"A practical guideline for performing a comprehensive transthoracic echocardiogram in adults: The British Society of Echocardiography minimum dataset","volume":"7","author":"Robinson","year":"2020","journal-title":"Echo Res. Pract."},{"key":"ref_18","first-page":"851","article-title":"Gate-Based Rules for Extracting Attribute Values","volume":"25","author":"Andrade","year":"2021","journal-title":"Comput. Y Sist."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Cunningham, H., Tablan, V., Roberts, A., and Bontcheva, K. (2013). Getting More Out of Biomedical Documents with GATE\u2019s Full Lifecycle Open Source Text Analytics. PLoS Comput. Biol., 9.","DOI":"10.1371\/journal.pcbi.1002854"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1186\/s13195-021-00848-x","article-title":"Correlating natural language processing and automated speech analysis with clinician assessment to quantify speech-language changes in mild cognitive impairment and Alzheimer\u2019s dementia","volume":"13","author":"Yeung","year":"2021","journal-title":"Alzheimer\u2019s Res. 
Ther."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Rahman, M., Nowakowski, S., Agrawal, R., Naik, A., Sharafkhaneh, A., and Razjouyan, J. (2022). Validation of a Natural Language Processing Algorithm for the Extraction of the Sleep Parameters from the Polysomnography Reports. Healthcare, 10.","DOI":"10.3390\/healthcare10101837"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"939","DOI":"10.1093\/schbul\/sbac051","article-title":"Natural Language Processing and Psychosis: On the Need for Comprehensive Psychometric Evaluation","volume":"48","author":"Cohen","year":"2022","journal-title":"Schizophr. Bull."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1016\/j.jcm.2016.02.012","article-title":"A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research","volume":"15","author":"Koo","year":"2016","journal-title":"J. Chiropr. Med."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"S128","DOI":"10.1016\/j.jbi.2015.08.002","article-title":"Adapting existing natural language processing resources for cardiovascular risk factors identification in clinical notes","volume":"58","author":"Khalifa","year":"2015","journal-title":"J. Biomed. Inform."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"596","DOI":"10.1197\/jamia.M3096","article-title":"A Text Mining Approach to the Prediction of Disease Status from Clinical Discharge Summaries","volume":"16","author":"Yang","year":"2009","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"101311","DOI":"10.1016\/j.csl.2021.101311","article-title":"Natural language processing for under-resourced languages: Developing a Welsh natural language toolkit","volume":"72","author":"Cunliffe","year":"2021","journal-title":"Comput. 
Speech Lang."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"504","DOI":"10.1093\/jamia\/ocaa261","article-title":"Can reproducibility be improved in clinical natural language processing? A study of 7 clinical NLP suites","volume":"28","author":"Digan","year":"2020","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1016\/j.future.2019.02.040","article-title":"Analyse digital forensic evidences through a semantic-based methodology and NLP techniques","volume":"98","author":"Amato","year":"2019","journal-title":"Futur. Gener. Comput. Syst."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Drousiotis, E., Pentaliotis, P., Shi, L., and Cristea, A.I. (2022, January 27\u201331). Balancing Fined-Tuned Machine Learning Models Between Continuous and Discrete Variables\u2014A Comprehensive Analysis Using Educational Data. Proceedings of the International Conference on Artificial Intelligence in Education, Durham, UK.","DOI":"10.1007\/978-3-031-11644-5_21"},{"key":"ref_30","unstructured":"Belz, A., and Kow, E. (2011, January 19\u201324). Discrete vs. continuous rating scales for language evaluation in NLP. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers\u2014Volume 2, Portland, OR, USA."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1016\/j.aiopen.2021.07.002","article-title":"Discrete and continuous representations and processing in deep learning: Looking forward","volume":"2","author":"Cartuyvels","year":"2021","journal-title":"AI Open"},{"key":"ref_32","unstructured":"Sutton, R.S., McAllester, D., Singh, S., and Mansour, Y. (2023). Advances in Neural Information Processing Systems, MIT Press. 
Available online: https:\/\/papers.nips.cc\/paper_files\/paper\/1999\/hash\/464d828b85b0bed98e80ade0a5c43b0f-Abstract.html."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Hu, R., Andreas, J., Rohrbach, M., Darrell, T., and Saenko, K. (2017). Learning to Reason: End-to-End Module Networks for Visual Question Answering. arXiv.","DOI":"10.1109\/ICCV.2017.93"},{"key":"ref_34","unstructured":"Maddison, C.J., Mnih, A., and Teh, Y.W. (2017, January 24\u201326). The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables. Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France. Available online: https:\/\/openreview.net\/forum?id=S1jE5L5gl."},{"key":"ref_35","unstructured":"Bengio, Y., Ducharme, R., Vincent, P., and Jauvin, C. (2001). Advances in Neural Information Processing Systems 13 (NIPS 2000), MIT Press."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Johnson, J., Hariharan, B., Van Der Maaten, L., Hoffman, J., Fei-Fei, L., Zitnick, C.L., and Girshick, R. (2017, January 22\u201329). Inferring and Executing Programs for Visual Reasoning. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.325"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Andreas, J., Rohrbach, M., Darrell, T., and Klein, D. (2016, January 12\u201317). Learning to Compose Neural Networks for Question Answering. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA.","DOI":"10.18653\/v1\/N16-1181"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"e39","DOI":"10.1002\/ail2.39","article-title":"Explainable neural computation via stack neural module networks","volume":"2","author":"Hu","year":"2021","journal-title":"Appl. 
AI Lett."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Mascharka, D., Tran, P., Soklaski, R., and Majumdar, A. (2018, January 18\u201323). Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00519"},{"key":"ref_40","unstructured":"Yi, K., Wu, J., Gan, C., Torralba, A., Kohli, P., and Tenenbaum, J. (2023). Advances in Neural Information Processing Systems, Curran Associates, Inc.. Available online: https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2018\/hash\/5e388103a391daabe3de1d76a6739ccd-Abstract.html."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Peng, B., Alcaide, E., Anthony, Q., Albalak, A., Arcadinho, S., Cao, H., Cheng, X., Chung, M., Grella, M., and Kiran, K. (2023). RWKV: Reinventing RNNs for the Transformer Era. arXiv.","DOI":"10.18653\/v1\/2023.findings-emnlp.936"},{"key":"ref_42","unstructured":"Karypis, G. (2023, August 02). CLUTO\u2014A Clustering Toolkit. Report, Apr. 
Available online: http:\/\/conservancy.umn.edu\/handle\/11299\/215521."}],"container-title":["Bioengineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5354\/10\/11\/1307\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T21:21:19Z","timestamp":1760131279000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5354\/10\/11\/1307"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,10]]},"references-count":42,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2023,11]]}},"alternative-id":["bioengineering10111307"],"URL":"https:\/\/doi.org\/10.3390\/bioengineering10111307","relation":{},"ISSN":["2306-5354"],"issn-type":[{"value":"2306-5354","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,10]]}}}