{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T13:19:29Z","timestamp":1771507169849,"version":"3.50.1"},"reference-count":12,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2019,11,15]],"date-time":"2019-11-15T00:00:00Z","timestamp":1573776000000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2019,11,15]],"date-time":"2019-11-15T00:00:00Z","timestamp":1573776000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Med Inform Decis Mak"],"published-print":{"date-parts":[[2019,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n<jats:title>Background<\/jats:title>\n<jats:p>Electronic medical records (EMR) contain numerical data important for clinical outcomes research, such as vital signs and cardiac ejection fractions (EF), which tend to be embedded in narrative clinical notes. In current practice, this data is often manually extracted for use in research studies. However, due to the large volume of notes in datasets, manually extracting numerical data often becomes infeasible. The objective of this study is to develop and validate a natural language processing (NLP) tool that can efficiently extract numerical clinical data from narrative notes.<\/jats:p>\n<\/jats:sec><jats:sec>\n<jats:title>Results<\/jats:title>\n<jats:p>To validate the accuracy of the tool EXTraction of EMR Numerical Data (EXTEND), we developed a reference standard by manually extracting vital signs from 285 notes, EF values from 300 notes, glycated hemoglobin (HbA1C), and serum creatinine from 890 notes. 
For each parameter of interest, we calculated the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and F<jats:sub>1<\/jats:sub> score of EXTEND using two metrics: (1) completion of data extraction, and (2) accuracy of data extraction compared to the actual values in the note verified by chart review. At the note level, extraction by EXTEND was considered correct only if it accurately detected and extracted all values of interest in a note.<\/jats:p>\n<jats:p>Using manually-annotated labels as the gold standard, the note-level accuracy of EXTEND in capturing the numerical vital sign values, EF, HbA1C and creatinine ranged from 0.88 to 0.95 for sensitivity, 0.95 to 1.0 for specificity, 0.95 to 1.0 for PPV, 0.89 to 0.99 for NPV, and 0.92 to 0.96 for F<jats:sub>1<\/jats:sub> score. At the value level, the sensitivity, PPV, and F<jats:sub>1<\/jats:sub> score of EXTEND ranged from 0.91 to 0.95, 0.95 to 1.0 and 0.95 to 0.96, respectively.<\/jats:p>\n<\/jats:sec><jats:sec>\n<jats:title>Conclusions<\/jats:title>\n<jats:p>EXTEND is an efficient, flexible tool that uses knowledge-based rules to extract clinical numerical parameters with high accuracy. 
By increasing dictionary terms and developing new rules, the usage of EXTEND can easily be expanded to extract additional numerical data important in clinical outcomes research.<\/jats:p>\n<\/jats:sec>","DOI":"10.1186\/s12911-019-0970-1","type":"journal-article","created":{"date-parts":[[2019,11,15]],"date-time":"2019-11-15T17:02:47Z","timestamp":1573837367000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["EXTraction of EMR numerical data: an efficient and generalizable tool to EXTEND clinical research"],"prefix":"10.1186","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5772-7460","authenticated-orcid":false,"given":"Tianrun","family":"Cai","sequence":"first","affiliation":[]},{"given":"Luwan","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Nicole","family":"Yang","sequence":"additional","affiliation":[]},{"given":"Kanako K.","family":"Kumamaru","sequence":"additional","affiliation":[]},{"given":"Frank J.","family":"Rybicki","sequence":"additional","affiliation":[]},{"given":"Tianxi","family":"Cai","sequence":"additional","affiliation":[]},{"given":"Katherine P.","family":"Liao","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2019,11,15]]},"reference":[{"issue":"2","key":"970_CR1","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1136\/jamia.1994.95236146","volume":"1","author":"C Friedman","year":"1994","unstructured":"Friedman C, Alderson PO, Austin JH, Cimino JJ, Johnson SB. A general natural-language text processor for clinical radiology. J Am Med Inform Assoc. 1994;1(2):161\u201374.","journal-title":"J Am Med Inform Assoc"},{"issue":"1","key":"970_CR2","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1197\/jamia.M3378","volume":"17","author":"H Xu","year":"2010","unstructured":"Xu H, Stenner SP, Doan S, Johnson KB, Waitman LR, Denny JC. 
MedEx: a medication information extraction system for clinical narratives. J Am Med Inform Assoc. 2010;17(1):19\u201324.","journal-title":"J Am Med Inform Assoc"},{"issue":"5","key":"970_CR3","doi-asserted-by":"publisher","first-page":"507","DOI":"10.1136\/jamia.2009.001560","volume":"17","author":"GK Savova","year":"2010","unstructured":"Savova GK, Masanz JJ, Ogren PV, Zheng J, Sohn S, Kipper-Schuler KC, Chute CG. Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc. 2010;17(5):507\u201313.","journal-title":"J Am Med Inform Assoc"},{"issue":"3","key":"970_CR4","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1136\/jamia.2009.002733","volume":"17","author":"AR Aronson","year":"2010","unstructured":"Aronson AR, Lang F-M. An overview of MetaMap: historical perspective and recent advances. J Am Med Inform Assoc. 2010;17(3):229\u201336.","journal-title":"J Am Med Inform Assoc"},{"issue":"5","key":"970_CR5","doi-asserted-by":"publisher","first-page":"580","DOI":"10.1136\/amiajnl-2011-000155","volume":"18","author":"M Torii","year":"2011","unstructured":"Torii M, Wagholikar K, Liu H. Using machine learning for concept extraction on clinical documents from multiple data sources. J Am Med Inform Assoc. 2011;18(5):580\u20137.","journal-title":"J Am Med Inform Assoc"},{"issue":"5","key":"970_CR6","doi-asserted-by":"publisher","first-page":"859","DOI":"10.1136\/amiajnl-2011-000535","volume":"19","author":"JH Garvin","year":"2012","unstructured":"Garvin JH, DuVall SL, South BR, Bray BE, Bolton D, Heavirland J, Pickard S, Heidenreich P, Shen S, Weir C, et al. Automated extraction of ejection fraction for quality measurement using regular expressions in unstructured information management architecture (UIMA) for heart failure. J Am Med Inform Assoc. 
2012;19(5):859\u201366.","journal-title":"J Am Med Inform Assoc"},{"issue":"4","key":"970_CR7","doi-asserted-by":"publisher","first-page":"319","DOI":"10.1177\/1460458216651917","volume":"23","author":"F Xie","year":"2017","unstructured":"Xie F, Zheng C, Yuh-Jer Shen A, Chen W. Extracting and analyzing ejection fraction values from electronic echocardiography reports in a large health maintenance organization. Health Informatics J. 2017;23(4):319\u201328.","journal-title":"Health Informatics J"},{"issue":"4","key":"970_CR8","doi-asserted-by":"publisher","first-page":"e0153749","DOI":"10.1371\/journal.pone.0153749","volume":"11","author":"C Nath","year":"2016","unstructured":"Nath C, Albaghdadi MS, Jonnalagadda SR. A natural language processing tool for large-scale data extraction from echocardiography reports. PLoS One. 2016;11(4):e0153749.","journal-title":"PLoS One"},{"issue":"6","key":"970_CR9","doi-asserted-by":"publisher","first-page":"473","DOI":"10.1016\/j.jcct.2016.08.007","volume":"10","author":"KK Kumamaru","year":"2016","unstructured":"Kumamaru KK, Saboo SS, Aghayev A, Cai P, Quesada CG, George E, Hussain Z, Cai T, Rybicki FJ. CT pulmonary angiography-based scoring system to predict the prognosis of acute pulmonary embolism. J Cardiovasc Comput Tomogr. 2016;10(6):473\u20139.","journal-title":"J Cardiovasc Comput Tomogr"},{"key":"970_CR10","unstructured":"Bird S, Klein E, Loper E. Natural language processing with Python: analyzing text with the natural language toolkit. Sebastopol: O'Reilly Media, Inc.; 2009."},{"issue":"1","key":"970_CR11","doi-asserted-by":"publisher","first-page":"176","DOI":"10.1148\/rg.2016150080","volume":"36","author":"T Cai","year":"2016","unstructured":"Cai T, Giannopoulos AA, Yu S, Kelil T, Ripley B, Kumamaru KK, Rybicki FJ, Mitsouras D. Natural language processing technologies in radiology research and clinical applications. RadioGraphics. 
2016;36(1):176\u201391.","journal-title":"RadioGraphics"},{"key":"970_CR12","doi-asserted-by":"publisher","DOI":"10.1201\/b18588","volume-title":"Healthcare data analytics","author":"CK Reddy","year":"2015","unstructured":"Reddy CK, Aggarwal CC. Healthcare data analytics, vol. 239. Philadelphia: CRC Press; 2015."}],"container-title":["BMC Medical Informatics and Decision Making"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-019-0970-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12911-019-0970-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-019-0970-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,11,14]],"date-time":"2020-11-14T00:06:15Z","timestamp":1605312375000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcmedinformdecismak.biomedcentral.com\/articles\/10.1186\/s12911-019-0970-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,15]]},"references-count":12,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2019,12]]}},"alternative-id":["970"],"URL":"https:\/\/doi.org\/10.1186\/s12911-019-0970-1","relation":{},"ISSN":["1472-6947"],"issn-type":[{"value":"1472-6947","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,11,15]]},"assertion":[{"value":"25 May 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 November 2019","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 November 2019","order":3,"name":"first_online","label":"First 
Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"This research is not human research and did not require IRB approval.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"Frank J. Rybicki is the Medical Director of Imagia Cybernetics. None of this research was performed as part of this employment. No other author declares a potential competing interest.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"226"}}