{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T23:36:39Z","timestamp":1774481799806,"version":"3.50.1"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"S3","license":[{"start":{"date-parts":[[2022,9,27]],"date-time":"2022-09-27T00:00:00Z","timestamp":1664236800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,9,27]],"date-time":"2022-09-27T00:00:00Z","timestamp":1664236800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100006093","name":"Patient-Centered Outcomes Research Institute","doi-asserted-by":"publisher","award":["ME-2018C3-14754"],"award-info":[{"award-number":["ME-2018C3-14754"]}],"id":[{"id":"10.13039\/100006093","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000049","name":"National Institute on Aging","doi-asserted-by":"publisher","award":["1R56AG 069880"],"award-info":[{"award-number":["1R56AG 069880"]}],"id":[{"id":"10.13039\/100000049","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100001364","name":"University of Florida Foundation","doi-asserted-by":"publisher","award":["00129436"],"award-info":[{"award-number":["00129436"]}],"id":[{"id":"10.13039\/100001364","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Med Inform Decis Mak"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>Diabetic retinopathy (DR) is a leading cause of blindness in American adults. If detected, DR can be treated to prevent further damage causing blindness. 
There is increasing interest in developing artificial intelligence (AI) technologies to help detect DR using electronic health records. The lesion-related information documented in fundus image reports is a valuable resource that could support the diagnosis of DR in clinical decision support systems. However, most studies of AI-based DR diagnosis rely on medical images; few studies have explored the lesion-related information captured in free-text image reports.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Methods<\/jats:title>\n                <jats:p>In this study, we examined two state-of-the-art transformer-based natural language processing (NLP) models, BERT and RoBERTa, and compared them with a recurrent neural network implemented using long short-term memory (LSTM) for extracting DR-related concepts from clinical narratives. We identified four categories of DR-related clinical concepts (lesions, eye parts, laterality, and severity), developed annotation guidelines, annotated a DR corpus of 536 image reports, and developed transformer-based NLP models for clinical concept extraction and relation extraction. We also examined relation extraction under two settings: a \u2018gold-standard\u2019 setting, in which gold-standard concepts were used, and an end-to-end setting.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>For concept extraction, the BERT model pretrained with the MIMIC III dataset achieved the best performance (0.9503 and 0.9645 for strict\/lenient evaluation). For relation extraction, the BERT model pretrained using general English text achieved the best strict\/lenient F1-score of 0.9316. The end-to-end system, BERT_general_e2e, achieved the best strict\/lenient F1-scores of 0.8578 and 0.8881, respectively. 
Another end-to-end system based on the RoBERTa architecture, RoBERTa_general_e2e, matched BERT_general_e2e on the strict score.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusions<\/jats:title>\n                <jats:p>This study demonstrated the efficiency of transformer-based NLP models for clinical concept extraction and relation extraction. Our results show that it is necessary to pretrain transformer models using clinical text to optimize the performance for clinical concept extraction. For relation extraction, however, transformers pretrained using general English text perform better.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12911-022-01996-2","type":"journal-article","created":{"date-parts":[[2022,9,27]],"date-time":"2022-09-27T11:19:17Z","timestamp":1664277557000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Identify diabetic retinopathy-related clinical concepts and their attributes using transformer-based natural language processing methods"],"prefix":"10.1186","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7290-8005","authenticated-orcid":false,"given":"Zehao","family":"Yu","sequence":"first","affiliation":[]},{"given":"Xi","family":"Yang","sequence":"additional","affiliation":[]},{"given":"Gianna L.","family":"Sweeting","sequence":"additional","affiliation":[]},{"given":"Yinghan","family":"Ma","sequence":"additional","affiliation":[]},{"given":"Skylar E.","family":"Stolte","sequence":"additional","affiliation":[]},{"given":"Ruogu","family":"Fang","sequence":"additional","affiliation":[]},{"given":"Yonghui","family":"Wu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,9,27]]},"reference":[{"key":"1996_CR1","doi-asserted-by":"publisher","first-page":"e339","DOI":"10.1016\/S2214-109X(13)70113-X","volume":"1","author":"RRA 
Bourne","year":"2013","unstructured":"Bourne RRA, Stevens GA, White RA, Smith JL, Flaxman SR, Price H, et al. Causes of vision loss worldwide, 1990\u20132010: a systematic analysis. Lancet Glob Health. 2013;1:e339\u201349.","journal-title":"Lancet Glob Health"},{"key":"1996_CR2","doi-asserted-by":"publisher","first-page":"902","DOI":"10.1001\/jama.298.8.902","volume":"298","author":"Q Mohamed","year":"2007","unstructured":"Mohamed Q, Gillies MC, Wong TY. Management of diabetic retinopathy: a systematic review. JAMA. 2007;298:902.","journal-title":"JAMA"},{"key":"1996_CR3","doi-asserted-by":"publisher","first-page":"3360","DOI":"10.1109\/ACCESS.2018.2888639","volume":"7","author":"Z Gao","year":"2019","unstructured":"Gao Z, Li J, Guo J, Chen Y, Yi Z, Zhong J. Diagnosis of diabetic retinopathy using deep neural networks. IEEE Access. 2019;7:3360\u201370.","journal-title":"IEEE Access"},{"key":"1996_CR4","unstructured":"Yang B, Wright A. Development of deep learning algorithms to categorize free-text notes pertaining to diabetes: convolution neural networks achieve higher accuracy than support vector machines. arXiv:1809.05814. 2018"},{"key":"1996_CR5","first-page":"267","volume":"2019","author":"BT Bucher","year":"2020","unstructured":"Bucher BT, Shi J, Pettit RJ, Ferraro J, Chapman WW, Gundlapalli A. Determination of marital status of patients from structured and unstructured electronic healthcare data. AMIA Annu Symp Proc. 2020;2019:267\u201374.","journal-title":"AMIA Annu Symp Proc"},{"key":"1996_CR6","doi-asserted-by":"publisher","first-page":"1163","DOI":"10.1093\/jamia\/ocz163","volume":"26","author":"A Stubbs","year":"2019","unstructured":"Stubbs A, Filannino M, Soysal E, Henry S, Uzuner \u00d6. Cohort selection for clinical trials: n2c2 2018 shared task track 1. J Am Med Inform Assoc. 2019;26:1163\u201371.","journal-title":"J Am Med Inform Assoc"},{"key":"1996_CR7","doi-asserted-by":"crossref","unstructured":"Nguyen DQ, Verspoor K. 
End-to-end neural relation extraction using deep biaffine attention. arXiv:1812.11275. 2019;11437:729\u201338.","DOI":"10.1007\/978-3-030-15712-8_47"},{"key":"1996_CR8","doi-asserted-by":"publisher","first-page":"S128","DOI":"10.1016\/j.jbi.2015.08.002","volume":"58","author":"A Khalifa","year":"2015","unstructured":"Khalifa A, Meystre S. Adapting existing natural language processing resources for cardiovascular risk factors identification in clinical notes. J Biomed Inform. 2015;58:S128\u201332.","journal-title":"J Biomed Inform"},{"key":"1996_CR9","unstructured":"Shi P, Lin J. Simple BERT models for relation extraction and semantic role labeling. arXiv:1904.05255. 2019"},{"key":"1996_CR10","doi-asserted-by":"publisher","first-page":"106","DOI":"10.1016\/j.ins.2007.07.020","volume":"178","author":"WL Yun","year":"2008","unstructured":"Yun WL, Rajendra Acharya U, Venkatesh YV, Chee C, Min LC, Ng EYK. Identification of different stages of diabetic retinopathy using retinal optical images. Inf Sci. 2008;178:106\u201321.","journal-title":"Inf Sci"},{"key":"1996_CR11","doi-asserted-by":"publisher","first-page":"78","DOI":"10.1016\/j.compmedimag.2015.03.004","volume":"43","author":"E Imani","year":"2015","unstructured":"Imani E, Pourreza H-R, Banaee T. Fully automated diabetic retinopathy screening using morphological component analysis. Comput Med Imaging Graph. 2015;43:78\u201388.","journal-title":"Comput Med Imaging Graph"},{"key":"1996_CR12","doi-asserted-by":"publisher","first-page":"86115","DOI":"10.1109\/ACCESS.2019.2918625","volume":"7","author":"Y Sun","year":"2019","unstructured":"Sun Y, Zhang D. Diagnosis and analysis of diabetic retinopathy based on electronic health records. IEEE Access. 2019;7:86115\u201320.","journal-title":"IEEE Access"},{"key":"1996_CR13","unstructured":"Jin Y, Li F, Yu H. HYPE: a high performing NLP system for automatically detecting hypoglycemia events from electronic health record notes. arXiv:1811.11945. 
2018"},{"key":"1996_CR14","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1007\/s10916-018-0939-0","volume":"42","author":"H Wu","year":"2018","unstructured":"Wu H, Wei Y, Shang Y, Shi W, Wang L, Li J, et al. iT2DMS: a standard-based diabetic disease data repository and its pilot experiment on diabetic retinopathy phenotyping and examination results integration. J Med Syst. 2018;42:131.","journal-title":"J Med Syst"},{"key":"1996_CR15","doi-asserted-by":"publisher","first-page":"276","DOI":"10.11613\/BM.2012.031","volume":"22","author":"ML McHugh","year":"2012","unstructured":"McHugh ML. Interrater reliability: the kappa statistic. Biochem Med. 2012;22:276\u201382.","journal-title":"Biochem Med"},{"key":"1996_CR16","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1172\/jci.insight.93751","volume":"2","author":"EJ Duh","year":"2017","unstructured":"Duh EJ, Sun JK, Stitt AW. Diabetic retinopathy: current understanding, mechanisms, and treatment strategies. JCI Insight. 2017;2:55. https:\/\/doi.org\/10.1172\/jci.insight.93751.","journal-title":"JCI Insight"},{"key":"1996_CR17","doi-asserted-by":"publisher","first-page":"1816","DOI":"10.3390\/ijms19061816","volume":"19","author":"W Wang","year":"2018","unstructured":"Wang W, Lo ACY. Diabetic retinopathy: pathophysiology and treatments. Int J Mol Sci. 2018;19:1816.","journal-title":"Int J Mol Sci"},{"key":"1996_CR18","unstructured":"Stenetorp P, Pyysalo S, Topi\u0107 G, Ohta T, Ananiadou S, Tsujii J. Brat: a web-based tool for NLP-assisted text annotation. In: Proceedings of the demonstrations at the 13th conference of the European chapter of the association for computational linguistics. Avignon, France: Association for Computational Linguistics; 2012. p. 102\u20137."},{"key":"1996_CR19","unstructured":"Gehrmann S, Dernoncourt F, Li Y, Carlson ET, Wu JT, Welt J, et al. Comparing rule-based and deep learning models for patient phenotyping. arXiv:1703.08705. 
2017."},{"key":"1996_CR20","unstructured":"Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805. 2019"},{"key":"1996_CR21","unstructured":"Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, et al. RoBERTa: a robustly optimized BERT pretraining approach. arXiv:1907.11692. 2019"},{"key":"1996_CR22","doi-asserted-by":"publisher","first-page":"1935","DOI":"10.1093\/jamia\/ocaa189","volume":"27","author":"X Yang","year":"2020","unstructured":"Yang X, Bian J, Hogan WR, Wu Y. Clinical concept extraction using transformers. J Am Med Inform Assoc. 2020;27:1935\u201342.","journal-title":"J Am Med Inform Assoc"},{"key":"1996_CR23","first-page":"1812","volume":"2017","author":"Y Wu","year":"2018","unstructured":"Wu Y, Jiang M, Xu J, Zhi D, Xu H. Clinical named entity recognition using deep learning models. AMIA Annu Symp Proc. 2018;2017:1812\u20139.","journal-title":"AMIA Annu Symp Proc"},{"key":"1996_CR24","doi-asserted-by":"crossref","unstructured":"Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, et al. HuggingFace\u2019s transformers: state-of-the-art natural language processing. arXiv:1910.03771.. 2020","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"1996_CR25","unstructured":"Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An imperative style, high-performance deep learning library. arXiv:1912.01703. 2019"},{"key":"1996_CR26","unstructured":"Yang X, Yu Z, Guo Y, Bian J, Wu Y. Clinical relation extraction using transformer-based models. arXiv:2107.08957. 2021"},{"key":"1996_CR27","doi-asserted-by":"publisher","first-page":"232","DOI":"10.1186\/s12911-019-0935-4","volume":"19","author":"X Yang","year":"2019","unstructured":"Yang X, Lyu T, Li Q, Lee C-Y, Bian J, Hogan WR, et al. A study of deep learning methods for de-identification of clinical notes in cross-institute settings. BMC Med Inform Decis Mak. 
2019;19:232.","journal-title":"BMC Med Inform Decis Mak"},{"key":"1996_CR28","unstructured":"Joulin A, Grave E, Bojanowski P, Douze M, J\u00e9gou H, Mikolov T. FastText.zip: compressing text classification models.arXiv:1612.03651. 2016"},{"key":"1996_CR29","doi-asserted-by":"publisher","first-page":"e22982","DOI":"10.2196\/22982","volume":"8","author":"X Yang","year":"2020","unstructured":"Yang X, Zhang H, He X, Bian J, Wu Y. Extracting family history of patients from clinical narratives: exploring an end-to-end solution with deep learning models. JMIR Med Inform. 2020;8:e22982.","journal-title":"JMIR Med Inform"},{"key":"1996_CR30","doi-asserted-by":"publisher","first-page":"e19735","DOI":"10.2196\/19735","volume":"8","author":"X Yang","year":"2020","unstructured":"Yang X, He X, Zhang H, Ma Y, Bian J, Wu Y. Measurement of semantic textual similarity in clinical texts: comparison of transformer-based models. JMIR Med Inform. 2020;8:e19735.","journal-title":"JMIR Med Inform"},{"key":"1996_CR31","doi-asserted-by":"publisher","first-page":"160035","DOI":"10.1038\/sdata.2016.35","volume":"3","author":"AEW Johnson","year":"2016","unstructured":"Johnson AEW, Pollard TJ, Shen L, Lehman LH, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3:160035.","journal-title":"Sci Data"},{"key":"1996_CR32","doi-asserted-by":"crossref","unstructured":"Schuster M, Nakajima K. Japanese and Korean voice search. In: 2012 IEEE international conference on acoustics, speech and signal processing (ICASSP). 2012. p. 5149\u201352","DOI":"10.1109\/ICASSP.2012.6289079"},{"key":"1996_CR33","doi-asserted-by":"crossref","unstructured":"Sennrich R, Haddow B, Birch A. neural machine translation of rare words with subword units. arXiv:1508.07909. 
2016","DOI":"10.18653\/v1\/P16-1162"},{"key":"1996_CR34","doi-asserted-by":"publisher","first-page":"301","DOI":"10.1006\/jbin.2001.1029","volume":"34","author":"WW Chapman","year":"2001","unstructured":"Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform. 2001;34:301\u201310.","journal-title":"J Biomed Inform"},{"key":"1996_CR35","first-page":"269","volume":"2020","author":"Z Ji","year":"2020","unstructured":"Ji Z, Wei Q, Xu H. BERT-based ranking for biomedical entity normalization. AMIA Jt Summits Transl Sci Proc. 2020;2020:269\u201377.","journal-title":"AMIA Jt Summits Transl Sci Proc"},{"key":"1996_CR36","doi-asserted-by":"crossref","unstructured":"He Y, Zhu Z, Zhang Y, Chen Q, Caverlee J. Infusing disease knowledge into BERT for health question answering, medical inference and disease name recognition. arXiv: 2010.03746. 2020.","DOI":"10.18653\/v1\/2020.emnlp-main.372"}],"container-title":["BMC Medical Informatics and Decision 
Making"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-022-01996-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12911-022-01996-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-022-01996-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,9,27]],"date-time":"2022-09-27T11:21:23Z","timestamp":1664277683000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcmedinformdecismak.biomedcentral.com\/articles\/10.1186\/s12911-022-01996-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,27]]},"references-count":36,"journal-issue":{"issue":"S3","published-online":{"date-parts":[[2022,9]]}},"alternative-id":["1996"],"URL":"https:\/\/doi.org\/10.1186\/s12911-022-01996-2","relation":{},"ISSN":["1472-6947"],"issn-type":[{"value":"1472-6947","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,27]]},"assertion":[{"value":"20 June 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 September 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 September 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"This study was approved by the University of Florida Institutional Review Board (IRB201801358). 
This is a retrospective study using patients\u2019 electronic health records; a HIPAA waiver of authorization was approved to waive the consent to participate.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"255"}}