{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T17:14:36Z","timestamp":1776273276000,"version":"3.50.1"},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2019,5,30]],"date-time":"2019-05-30T00:00:00Z","timestamp":1559174400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objective<\/jats:title>\n                  <jats:p>Natural language processing (NLP) engines such as the clinical Text Analysis and Knowledge Extraction System are a solution for processing notes for research, but optimizing their performance for a clinical data warehouse remains a challenge. We aim to develop a high throughput NLP architecture using the clinical Text Analysis and Knowledge Extraction System and present a predictive model use case.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and Methods<\/jats:title>\n                  <jats:p>The CDW was comprised of 1 103 038 patients across 10 years. The architecture was constructed using the Hadoop data repository for source data and 3 large-scale symmetric processing servers for NLP. Each named entity mention in a clinical document was mapped to the Unified Medical Language System concept unique identifier (CUI).<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>The NLP architecture processed 83 867 802 clinical documents in 13.33 days and produced 37 721 886 606 CUIs across 8 standardized medical vocabularies. Performance of the architecture exceeded 500 000 documents per hour across 30 parallel instances of the clinical Text Analysis and Knowledge Extraction System including 10 instances dedicated to documents greater than 20 000 bytes. In a use\u2013case example for predicting 30-day hospital readmission, a CUI-based model had similar discrimination to n-grams with an area under the curve receiver operating characteristic of 0.75 (95% CI, 0.74\u20130.76).<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Discussion and Conclusion<\/jats:title>\n                  <jats:p>Our health system\u2019s high throughput NLP architecture may serve as a benchmark for large-scale clinical research using a CUI-based approach.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocz068","type":"journal-article","created":{"date-parts":[[2019,4,24]],"date-time":"2019-04-24T19:16:16Z","timestamp":1556133376000},"page":"1364-1369","source":"Crossref","is-referenced-by-count":27,"title":["Development and application of a high throughput natural language processing architecture to convert all clinical documents in a clinical data warehouse into standardized medical vocabularies"],"prefix":"10.1093","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6368-4652","authenticated-orcid":false,"given":"Majid","family":"Afshar","sequence":"first","affiliation":[{"name":"Center for Health Outcomes and Informatics Research, Health Sciences Division, Loyola University Chicago, Maywood, Illinois, USA"},{"name":"Department of Public Health Sciences, Stritch School of Medicine, Loyola University Chicago, Maywood, Illinois, USA"}]},{"given":"Dmitriy","family":"Dligach","sequence":"additional","affiliation":[{"name":"Center for Health Outcomes and Informatics Research, Health Sciences Division, Loyola University Chicago, Maywood, Illinois, USA"},{"name":"Department of Public Health Sciences, Stritch School of Medicine, Loyola University Chicago, Maywood, Illinois, USA"},{"name":"Department of Computer Science, Loyola University, Chicago, Illinois, USA"}]},{"given":"Brihat","family":"Sharma","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Loyola University, Chicago, Illinois, USA"}]},{"given":"Xiaoyuan","family":"Cai","sequence":"additional","affiliation":[{"name":"Informatics and Systems Development, Health Sciences Division, Loyola University Chicago, Maywood, Illinois, USA"}]},{"given":"Jason","family":"Boyda","sequence":"additional","affiliation":[{"name":"Informatics and Systems Development, Health Sciences Division, Loyola University Chicago, Maywood, Illinois, USA"}]},{"given":"Steven","family":"Birch","sequence":"additional","affiliation":[{"name":"Informatics and Systems Development, Health Sciences Division, Loyola University Chicago, Maywood, Illinois, USA"}]},{"given":"Daniel","family":"Valdez","sequence":"additional","affiliation":[{"name":"Informatics and Systems Development, Health Sciences Division, Loyola University Chicago, Maywood, Illinois, USA"}]},{"given":"Suzan","family":"Zelisko","sequence":"additional","affiliation":[{"name":"Informatics and Systems Development, Health Sciences Division, Loyola University Chicago, Maywood, Illinois, USA"}]},{"given":"Cara","family":"Joyce","sequence":"additional","affiliation":[{"name":"Center for Health Outcomes and Informatics Research, Health Sciences Division, Loyola University Chicago, Maywood, Illinois, USA"},{"name":"Department of Public Health Sciences, Stritch School of Medicine, Loyola University Chicago, Maywood, Illinois, USA"}]},{"given":"Fran\u00e7ois","family":"Modave","sequence":"additional","affiliation":[{"name":"Center for Health Outcomes and Informatics Research, Health Sciences Division, Loyola University Chicago, Maywood, Illinois, USA"},{"name":"Department of Public Health Sciences, Stritch School of Medicine, Loyola University Chicago, Maywood, Illinois, USA"}]},{"given":"Ron","family":"Price","sequence":"additional","affiliation":[{"name":"Center for Health Outcomes and Informatics Research, Health Sciences Division, Loyola University Chicago, Maywood, Illinois, USA"},{"name":"Informatics and Systems Development, Health Sciences Division, Loyola University Chicago, Maywood, Illinois, USA"}]}],"member":"286","published-online":{"date-parts":[[2019,5,30]]},"reference":[{"issue":"5","key":"2021012411195921000_ocz068-B1","doi-asserted-by":"crossref","first-page":"1007","DOI":"10.1093\/jamia\/ocv180","article-title":"Extracting information from the text of electronic medical records to improve case detection: a systematic review","volume":"23","author":"Ford","year":"2016","journal-title":"J Am Med Inform Assoc"},{"key":"2021012411195921000_ocz068-B2","first-page":"128","article-title":"Extracting information from textual documents in the electronic health record: a review of recent research","author":"Meystre","year":"2008","journal-title":"Yearb Med Inform"},{"key":"2021012411195921000_ocz068-B3","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1055\/s-0038-1626725","article-title":"Development and validation of a natural language processing tool to identify patients treated for pneumonia across VA emergency departments","volume":"9","author":"Jones","year":"2018","journal-title":"Appl Clin Inform"},{"issue":"2","key":"2021012411195921000_ocz068-B4","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1212\/WNL.0000000000003490","article-title":"Large-scale identification of patients with cerebral aneurysms using natural language processing","volume":"88","author":"Castro","year":"2017","journal-title":"Neurology"},{"issue":"12","key":"2021012411195921000_ocz068-B5","doi-asserted-by":"crossref","first-page":"1057","DOI":"10.1016\/j.ijmedinf.2015.09.002","article-title":"Using natural language processing to identify problem usage of prescription opioids","volume":"84","author":"Carrell","year":"2015","journal-title":"Int J Med Inform"},{"key":"2021012411195921000_ocz068-B6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/2018\/4302425","article-title":"Data processing and text mining technologies on electronic medical records: a review","volume":"2018","author":"Sun","year":"2018","journal-title":"J Healthc Eng"},{"issue":"5","key":"2021012411195921000_ocz068-B7","doi-asserted-by":"crossref","first-page":"507","DOI":"10.1136\/jamia.2009.001560","article-title":"Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications","volume":"17","author":"Savova","year":"2010","journal-title":"J Am Med Inform Assoc"},{"key":"2021012411195921000_ocz068-B8","first-page":"1179","article-title":"Detecting opioid-related aberrant behavior using natural language processing","volume":"2017","author":"Lingeman","year":"2018","journal-title":"AMIA Annu Symp Proc"},{"key":"2021012411195921000_ocz068-B9","author":"Yetisgen-Yildiz"},{"issue":"11","key":"2021012411195921000_ocz068-B10","doi-asserted-by":"crossref","first-page":"e78927","DOI":"10.1371\/journal.pone.0078927","article-title":"Modeling disease severity in multiple sclerosis using electronic health records","volume":"8","author":"Xia","year":"2013","journal-title":"PLoS One"},{"issue":"24","key":"2021012411195921000_ocz068-B11","doi-asserted-by":"crossref","first-page":"2647","DOI":"10.1001\/jama.2016.18533","article-title":"Association between hospital penalty status under the hospital readmission reduction program and readmission rates for target and nontarget conditions","volume":"316","author":"Desai","year":"2016","journal-title":"JAMA"},{"issue":"7","key":"2021012411195921000_ocz068-B12","doi-asserted-by":"crossref","first-page":"1108","DOI":"10.1097\/00005650-199807000-00016","article-title":"Casemix adjustment of managed care claims data using the clinical classification for health policy research method","volume":"36","author":"Cowen","year":"1998","journal-title":"Med Care"},{"issue":"11","key":"2021012411195921000_ocz068-B13","doi-asserted-by":"crossref","first-page":"e1002701","DOI":"10.1371\/journal.pmed.1002701","article-title":"Development and validation of machine learning models to identify high-risk surgical patients using automatically curated electronic health record data (Pythia): a retrospective, single-site study","volume":"15","author":"Corey","year":"2018","journal-title":"PLoS Med"},{"issue":"1","key":"2021012411195921000_ocz068-B14","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1007\/s00134-011-2390-2","article-title":"Effect of changes over time in the performance of a customized SAPS-II model on the quality of care assessment","volume":"38","author":"Minne","year":"2012","journal-title":"Intensive Care Med"},{"key":"2021012411195921000_ocz068-B15","first-page":"2825","article-title":"Scikit learn: machine learning in python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"JMLR"},{"issue":"6","key":"2021012411195921000_ocz068-B16","doi-asserted-by":"crossref","first-page":"548","DOI":"10.3414\/ME14-02-0018","article-title":"Scaling-up NLP pipelines to process large corpora of clinical notes","volume":"54","author":"Divita","year":"2015","journal-title":"Methods Inf Med"},{"issue":"1","key":"2021012411195921000_ocz068-B17","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1186\/s12911-018-0719-2","article-title":"Big data hurdles in precision medicine and precision public health","volume":"18","author":"Prosperi","year":"2018","journal-title":"BMC Med Inform Decis Mak"},{"key":"2021012411195921000_ocz068-B18","doi-asserted-by":"crossref","first-page":"214","DOI":"10.15265\/IY-2017-029","article-title":"Capturing the patient's perspective: a review of advances in natural language processing of health-related text","volume":"26","author":"Gonzalez-Hernandez","year":"2017","journal-title":"Yearb Med Inform"},{"key":"2021012411195921000_ocz068-B19","article-title":"Capturing social health data in electronic systems: a systematic review","author":"Venzon","journal-title":"Comput Inform Nurs"},{"key":"2021012411195921000_ocz068-B20","first-page":"13","article-title":"Toward a learning health-care system\u2014knowledge delivery at the point of care empowered by big data and NLP","volume":"8","author":"Kaggal","year":"2016","journal-title":"Biomed Inform Insights"},{"key":"2021012411195921000_ocz068-B21","first-page":"276","article-title":"HTP-NLP: a new NLP system for high throughput phenotyping","volume":"235","author":"Schlegel","year":"2017","journal-title":"Stud Health Technol Inform"},{"key":"2021012411195921000_ocz068-B22","article-title":"Automated feature selection of predictors in electronic medical records data","author":"Gronsbell","journal-title":"Biometrics"},{"issue":"1","key":"2021012411195921000_ocz068-B23","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1093\/jamia\/ocx111","article-title":"Enabling phenotypic big data with PheNorm","volume":"25","author":"Yu","year":"2018","journal-title":"J Am Med Inform Assoc"},{"issue":"5","key":"2021012411195921000_ocz068-B24","doi-asserted-by":"crossref","first-page":"993","DOI":"10.1093\/jamia\/ocv034","article-title":"Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources","volume":"22","author":"Yu","year":"2015","journal-title":"J Am Med Inform Assoc"},{"key":"2021012411195921000_ocz068-B25","doi-asserted-by":"crossref","first-page":"e143","DOI":"10.1093\/jamia\/ocw135","article-title":"Surrogate-assisted feature extraction for high-throughput phenotyping","volume":"24","author":"Yu","year":"2017","journal-title":"J Am Med Inform Assoc"},{"issue":"10","key":"2021012411195921000_ocz068-B26","doi-asserted-by":"crossref","first-page":"e921","DOI":"10.1038\/tp.2015.182","article-title":"Predicting early psychiatric readmission with natural language processing of narrative discharge summaries","volume":"6","author":"Rumshisky","year":"2016","journal-title":"Transl Psychiatry"},{"issue":"2","key":"2021012411195921000_ocz068-B27","doi-asserted-by":"crossref","first-page":"588","DOI":"10.1109\/JBHI.2017.2684121","article-title":"A natural language processing framework for assessing hospital readmissions for patients with COPD","volume":"22","author":"Agarwal","year":"2018","journal-title":"IEEE J Biomed Health Inform"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/26\/11\/1364\/36089053\/ocz068.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/26\/11\/1364\/36089053\/ocz068.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,1,24]],"date-time":"2021-01-24T16:20:09Z","timestamp":1611505209000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/26\/11\/1364\/5506581"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,5,30]]},"references-count":27,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2019,5,30]]},"published-print":{"date-parts":[[2019,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocz068","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"value":"1067-5027","type":"print"},{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,11]]},"published":{"date-parts":[[2019,5,30]]}}}