{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T05:49:17Z","timestamp":1773380957590,"version":"3.50.1"},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,9,2]],"date-time":"2023-09-02T00:00:00Z","timestamp":1693612800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,9,2]],"date-time":"2023-09-02T00:00:00Z","timestamp":1693612800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100000024","name":"Canadian Institutes of Health Research","doi-asserted-by":"publisher","award":["201809FDN-409926-FDN-CBBA-114817"],"award-info":[{"award-number":["201809FDN-409926-FDN-CBBA-114817"]}],"id":[{"id":"10.13039\/501100000024","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Brain Inf."],"published-print":{"date-parts":[[2023,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>Abstracting cerebrovascular disease (CeVD) from inpatient electronic medical records (EMRs) through natural language processing (NLP) is pivotal for automated disease surveillance and improving patient outcomes. Existing methods rely on coders\u2019 abstraction, which has time delays and under-coding issues. 
This study sought to develop an NLP-based method to detect CeVD using EMR clinical notes.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Methods<\/jats:title>\n                    <jats:p>\n                      CeVD status was confirmed through a chart review on randomly selected hospitalized patients who were 18\u00a0years or older and discharged from 3 hospitals in Calgary, Alberta, Canada, between January 1 and June 30, 2015. These patients\u2019 chart data were linked to administrative discharge abstract database (DAD) and Sunrise\n                      <jats:sup>\u2122<\/jats:sup>\n                      Clinical Manager (SCM) EMR database records by Personal Health Number (a unique lifetime identifier) and admission date. We trained multiple natural language processing (NLP) predictive models by combining two clinical concept extraction methods and two supervised machine learning (ML) methods: random forest and XGBoost. Using chart review as the reference standard, we compared the model performances with those of the commonly applied International Classification of Diseases (ICD-10-CA) codes, on the metrics of sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).\n                    <\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Result<\/jats:title>\n                    <jats:p>Of the study sample (n\u2009=\u20093036), the prevalence of CeVD was 11.8% (n\u2009=\u2009360); the median patient age was 63; and females accounted for 50.3% (n\u2009=\u20091528) based on chart data. Among 49 extracted clinical documents from the EMR, four document types were identified as the most influential text sources for identifying CeVD disease (\u201cnursing transfer report,\u201d \u201cdischarge summary,\u201d \u201cnursing notes,\u201d and \u201cinpatient consultation.\u201d). 
The best-performing NLP model was XGBoost, combining the Unified Medical Language System concepts extracted by cTAKES (e.g., top-ranked concepts, \u201cCerebrovascular accident\u201d and \u201cTransient ischemic attack\u201d), and the term frequency-inverse document frequency vectorizer. Compared with ICD codes, the model achieved higher validity overall, such as sensitivity (25.0% vs 70.0%), specificity (99.3% vs 99.1%), PPV (82.6% vs 87.8%), and NPV (90.8% vs 97.1%).<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusion<\/jats:title>\n                    <jats:p>The NLP algorithm developed in this study performed better than the ICD code algorithm in detecting CeVD. The NLP models could result in an automated EMR tool for identifying CeVD cases and be applied for future studies such as surveillance and longitudinal studies.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s40708-023-00203-w","type":"journal-article","created":{"date-parts":[[2023,9,2]],"date-time":"2023-09-02T11:01:59Z","timestamp":1693652519000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Cerebrovascular disease case identification in inpatient electronic medical record data using natural language processing"],"prefix":"10.1186","volume":"10","author":[{"given":"Jie","family":"Pan","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zilong","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Steven Ray","family":"Peters","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shabnam","family":"Vatanpour","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Robin 
L.","family":"Walker","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Seungwon","family":"Lee","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Elliot A.","family":"Martin","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hude","family":"Quan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,9,2]]},"reference":[{"key":"203_CR1","doi-asserted-by":"publisher","DOI":"10.1126\/scitranslmed.3001456","author":"CP Friedman","year":"2010","unstructured":"Friedman CP, Wong AK, Blumenthal D (2010) Policy: achieving a nationwide learning health system. Sci Transl Med. https:\/\/doi.org\/10.1126\/scitranslmed.3001456","journal-title":"Sci Transl Med"},{"issue":"2","key":"203_CR2","doi-asserted-by":"publisher","first-page":"457","DOI":"10.1093\/BRAIN\/AWAB439","volume":"145","author":"AK Bonkhoff","year":"2022","unstructured":"Bonkhoff AK, Grefkes C (2022) Precision medicine in stroke: towards personalized outcome predictions using artificial intelligence. Brain 145(2):457\u2013475. https:\/\/doi.org\/10.1093\/BRAIN\/AWAB439","journal-title":"Brain"},{"issue":"4","key":"203_CR3","doi-asserted-by":"publisher","first-page":"1424","DOI":"10.1111\/j.1475-6773.2007.00822.x","volume":"43","author":"H Quan","year":"2008","unstructured":"Quan H et al (2008) Assessing validity of ICD-9-CM and ICD-10 administrative data in recording clinical conditions in a unique dually coded database. Health Serv Res 43(4):1424\u20131441. 
https:\/\/doi.org\/10.1111\/j.1475-6773.2007.00822.x","journal-title":"Health Serv Res"},{"issue":"6","key":"203_CR4","doi-asserted-by":"publisher","first-page":"797","DOI":"10.1001\/jamaoncol.2016.0213","volume":"2","author":"WW Yim","year":"2016","unstructured":"Yim WW, Yetisgen M, Harris WP, Sharon WK (2016) Natural language processing in oncology review. JAMA Oncol 2(6):797\u2013804. https:\/\/doi.org\/10.1001\/jamaoncol.2016.0213","journal-title":"JAMA Oncol"},{"issue":"7","key":"203_CR5","doi-asserted-by":"publisher","first-page":"1946","DOI":"10.1161\/STROKEAHA.116.012390","volume":"47","author":"AYX Yu","year":"2016","unstructured":"Yu AYX et al (2016) Use and utility of administrative health data for stroke research and surveillance. Stroke 47(7):1946\u20131952. https:\/\/doi.org\/10.1161\/STROKEAHA.116.012390","journal-title":"Stroke"},{"key":"203_CR6","doi-asserted-by":"publisher","unstructured":"Kruse CS, Kothman K, Anerobi K, Abanaka L (2016) \u2018Adoption factors of the electronic health record: a systematic review\u2019, JMIR Med Inform 4(2):e19 https:\/\/medinform.jmir.org\/2016\/2\/e19, vol. 4, no. 2, p. e5525, Jun. 2016, doi: https:\/\/doi.org\/10.2196\/MEDINFORM.5525","DOI":"10.2196\/MEDINFORM.5525"},{"issue":"3","key":"203_CR7","doi-asserted-by":"publisher","first-page":"457","DOI":"10.1093\/JAMIA\/OCZ200","volume":"27","author":"S Wu","year":"2020","unstructured":"Wu S et al (2020) Deep learning in clinical natural language processing: a methodical review. J Am Med Inform Assoc 27(3):457\u2013470. https:\/\/doi.org\/10.1093\/JAMIA\/OCZ200","journal-title":"J Am Med Inform Assoc"},{"key":"203_CR8","doi-asserted-by":"publisher","unstructured":"S. Lee et al (2021) Electronic Medical record\u2013based case phenotyping for the Charlson conditions: scoping review. JMIR Med Inform 9(2): e23934 https:\/\/medinform.jmir.org\/2021\/2\/e23934, vol. 9, no. 2, p. e23934, Feb. 
2021, doi: https:\/\/doi.org\/10.2196\/23934","DOI":"10.2196\/23934"},{"issue":"1","key":"203_CR9","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1161\/STROKEAHA.120.030663","volume":"52","author":"W Guan","year":"2021","unstructured":"Guan W et al (2021) Automated electronic phenotyping of cardioembolic stroke. Stroke 52(1):181\u2013189. https:\/\/doi.org\/10.1161\/STROKEAHA.120.030663","journal-title":"Stroke"},{"issue":"7","key":"203_CR10","doi-asserted-by":"publisher","first-page":"2045","DOI":"10.1016\/J.JSTROKECEREBROVASDIS.2019.02.004","volume":"28","author":"R Garg","year":"2019","unstructured":"Garg R, Oh E, Naidech A, Kording K, Prabhakaran S (2019) Automating ischemic stroke subtype classification using machine learning and natural language processing. J Stroke Cerebrovasc Dis 28(7):2045\u20132051. https:\/\/doi.org\/10.1016\/J.JSTROKECEREBROVASDIS.2019.02.004","journal-title":"J Stroke Cerebrovasc Dis"},{"issue":"10","key":"203_CR11","doi-asserted-by":"publisher","first-page":"2922","DOI":"10.1109\/JBHI.2020.2976931","volume":"24","author":"SF Sung","year":"2020","unstructured":"Sung SF, Lin CY, Hu YH (2020) EMR-based phenotyping of ischemic stroke using supervised machine learning and text mining techniques. IEEE J Biomed Health Inform 24(10):2922\u20132931. https:\/\/doi.org\/10.1109\/JBHI.2020.2976931","journal-title":"IEEE J Biomed Health Inform"},{"issue":"2","key":"203_CR12","doi-asserted-by":"publisher","first-page":"164","DOI":"10.1212\/WNL.0000000000003490","volume":"88","author":"VM Castro","year":"2017","unstructured":"Castro VM et al (2017) Large-scale identification of patients with cerebral aneurysms using natural language processing. Neurology 88(2):164. 
https:\/\/doi.org\/10.1212\/WNL.0000000000003490","journal-title":"Neurology"},{"issue":"3","key":"203_CR13","doi-asserted-by":"publisher","first-page":"758","DOI":"10.1161\/STROKEAHA.118.024124","volume":"50","author":"S Bacchi","year":"2019","unstructured":"Bacchi S, Oakden-Rayner L, Zerner T, Kleinig T, Patel S, Jannes J (2019) Deep learning natural language processing successfully predicts the cerebrovascular cause of transient ischemic attack-like presentations. Stroke 50(3):758\u2013760. https:\/\/doi.org\/10.1161\/STROKEAHA.118.024124","journal-title":"Stroke"},{"issue":"2","key":"203_CR14","doi-asserted-by":"publisher","first-page":"291","DOI":"10.1007\/S12028-022-01513-3\/FIGURES\/3","volume":"37","author":"MI Miller","year":"2022","unstructured":"Miller MI et al (2022) Natural language processing of radiology reports to detect complications of ischemic stroke. Neurocrit Care 37(2):291\u2013302. https:\/\/doi.org\/10.1007\/S12028-022-01513-3\/FIGURES\/3","journal-title":"Neurocrit Care"},{"key":"203_CR15","doi-asserted-by":"publisher","DOI":"10.21203\/rs.3.rs-505934\/v1","author":"CA Eastwood","year":"2021","unstructured":"Eastwood CA, Southern DA, Khair S, Doktorchik C, Ghali WA, Quan H (2021) The ICD-11 field trial: creating a large dually coded database. Res Sq Prepr. https:\/\/doi.org\/10.21203\/rs.3.rs-505934\/v1","journal-title":"Res Sq Prepr"},{"key":"203_CR16","doi-asserted-by":"publisher","DOI":"10.23889\/IJPDS.V5I1.1123","author":"S Lee","year":"2020","unstructured":"Lee S et al (2020) Unlocking the potential of electronic health records for health research. Int J Popul Data Sci. 
https:\/\/doi.org\/10.23889\/IJPDS.V5I1.1123","journal-title":"Int J Popul Data Sci"},{"issue":"2","key":"203_CR17","doi-asserted-by":"publisher","first-page":"152","DOI":"10.1016\/j.cjca.2012.01.005","volume":"28","author":"H Quan","year":"2012","unstructured":"Quan H, Smith M, Bartlett-Esquilant G, Johansen H, Tu K, Lix L (2012) Mining administrative health databases to advance medical science: geographical considerations and untapped potential in Canada. Can J Cardiol 28(2):152\u2013154. https:\/\/doi.org\/10.1016\/j.cjca.2012.01.005","journal-title":"Can J Cardiol"},{"key":"203_CR18","doi-asserted-by":"publisher","first-page":"121","DOI":"10.1016\/B978-1-55860-335-6.50023-4","volume":"1994","author":"GH John","year":"1994","unstructured":"John GH, Kohavi R, Pfleger K (1994) Irrelevant features and the subset selection problem. Mach Learn Proc 1994:121\u2013129. https:\/\/doi.org\/10.1016\/B978-1-55860-335-6.50023-4","journal-title":"Mach Learn Proc"},{"key":"203_CR19","doi-asserted-by":"publisher","first-page":"76","DOI":"10.1016\/J.JCLINEPI.2015.10.002","volume":"71","author":"F Bagherzadeh-Khiabani","year":"2016","unstructured":"Bagherzadeh-Khiabani F, Ramezankhani A, Azizi F, Hadaegh F, Steyerberg EW, Khalili D (2016) A tutorial on variable selection for clinical prediction models: feature selection methods in data mining could improve the results. J Clin Epidemiol 71:76\u201385. 
https:\/\/doi.org\/10.1016\/J.JCLINEPI.2015.10.002","journal-title":"J Clin Epidemiol"},{"key":"203_CR20","unstructured":"Vijayarani S, Ilamathi MJ, Nithya M and undefined (2015) Preprocessing techniques for text mining-an overview\u2019, researchgate.net, Accessed 18 May 2023."},{"key":"203_CR21","doi-asserted-by":"publisher","unstructured":"Neumann M, King D, Beltagy I, Ammar W (2019) ScispaCy: fast and robust models for biomedical natural language processing\u2019, BioNLP 2019 - SIGBioMed Workshop on Biomedical Natural Language Processing, Proceedings of the 18th BioNLP Workshop and Shared Task, pp. 319\u2013327, Feb. 2019, doi: https:\/\/doi.org\/10.18653\/v1\/W19-5034","DOI":"10.18653\/v1\/W19-5034"},{"key":"203_CR22","doi-asserted-by":"publisher","first-page":"D267","DOI":"10.1093\/NAR\/GKH061","volume":"32","author":"O Bodenreider","year":"2004","unstructured":"Bodenreider O (2004) The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res 32:D267\u2013D270. https:\/\/doi.org\/10.1093\/NAR\/GKH061","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"203_CR23","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324\/METRICS","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L (2001) Random forests. Mach Learn 45(1):5\u201332. https:\/\/doi.org\/10.1023\/A:1010933404324\/METRICS","journal-title":"Mach Learn"},{"key":"203_CR24","doi-asserted-by":"publisher","unstructured":"Chen T, Guestrin C. XGBoost: A scalable tree boosting system, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 
doi: https:\/\/doi.org\/10.1145\/2939672","DOI":"10.1145\/2939672"},{"key":"203_CR25","doi-asserted-by":"publisher","unstructured":"Xu Z, Huang G, Weinberger KQ, Zheng AX (2014) Gradient boosted feature selection\u2019, Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 522\u2013531, 2014, doi: https:\/\/doi.org\/10.1145\/2623330.2623635","DOI":"10.1145\/2623330.2623635"},{"key":"203_CR26","doi-asserted-by":"publisher","first-page":"307","DOI":"10.1007\/978-1-4419-9326-7_11","volume-title":"Ensemble machine learning","author":"Y Qi","year":"2012","unstructured":"Qi Y (2012) Random forest for bioinformatics. In: Zhang C, Ma Y (eds) Ensemble machine learning. Springer, New York, pp 307\u2013323. https:\/\/doi.org\/10.1007\/978-1-4419-9326-7_11"},{"issue":"5","key":"203_CR27","doi-asserted-by":"publisher","first-page":"511","DOI":"10.1093\/INTQHC\/MZU064","volume":"26","author":"H Quan","year":"2014","unstructured":"Quan H et al (2014) International variation in the definition of \u201cmain condition\u201d in ICD-coded health data. Int J Qual Health Care 26(5):511\u2013515. https:\/\/doi.org\/10.1093\/INTQHC\/MZU064","journal-title":"Int J Qual Health Care"},{"key":"203_CR28","unstructured":"Huang K, Altosaar J, Ranganath R (2019) ClinicalBERT: modeling clinical notes and predicting hospital readmission. Apr. 2019, Accessed 24 May 2023"},{"key":"203_CR29","doi-asserted-by":"publisher","unstructured":"E. Nurmambetova et al. (2023) Developing an inpatient electronic medical record phenotype for hospital-acquired pressure injuries: case study using natural language processing models\u2019, JMIR AI 2: e41264 https:\/\/ai.jmir.org\/2023\/1\/e41264, vol. 2, no. 1, p. e41264, Mar. 
2023, doi: https:\/\/doi.org\/10.2196\/41264","DOI":"10.2196\/41264"},{"key":"203_CR30","doi-asserted-by":"publisher","first-page":"833","DOI":"10.1007\/978-1-4419-9530-8_40","volume-title":"Translational stroke research: from target selection to clinical trials","author":"P Mandava","year":"2012","unstructured":"Mandava P, Krumpelman CS, Murthy SB, Kent TA (2012) A critical review of stroke trial analytical methodology: outcome measures, study design, and correction for imbalances. In: Lapchak PA, Zhang JH (eds) Translational stroke research: from target selection to clinical trials. Springer, New York, pp 833\u2013861. https:\/\/doi.org\/10.1007\/978-1-4419-9530-8_40"}],"container-title":["Brain Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40708-023-00203-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40708-023-00203-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40708-023-00203-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,18]],"date-time":"2023-11-18T13:27:51Z","timestamp":1700314071000},"score":1,"resource":{"primary":{"URL":"https:\/\/braininformatics.springeropen.com\/articles\/10.1186\/s40708-023-00203-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,2]]},"references-count":30,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,12]]}},"alternative-id":["203"],"URL":"https:\/\/doi.org\/10.1186\/s40708-023-00203-w","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-2640617\/v1","asserted-by":"object"}]},"ISSN":["2198-4018","2198-4026"],"issn-type":[{"value":"2198-4018","type":"print"},{"value":"2198-4026",
"type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,2]]},"assertion":[{"value":"28 February 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 August 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 September 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"This study was approved by the Conjoint Health Research Ethics Board at the University of Calgary (REB19-0088).","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"The authors declare that they have no competing interests to disclose.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"22"}}