{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T12:19:26Z","timestamp":1771935566636,"version":"3.50.1"},"reference-count":21,"publisher":"Oxford University Press (OUP)","issue":"8-9","license":[{"start":{"date-parts":[[2019,4,26]],"date-time":"2019-04-26T00:00:00Z","timestamp":1556236800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Weill Cornell Medicine Clinical and Translational Science Center","award":["UL1 TR000457"],"award-info":[{"award-number":["UL1 TR000457"]}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01 GM105688"],"award-info":[{"award-number":["R01 GM105688"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01 MH105384"],"award-info":[{"award-number":["R01 MH105384"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,8,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Objective<\/jats:title><jats:p>We aimed to address deficiencies in structured electronic health record (EHR) data for race and ethnicity by identifying black and Hispanic patients from unstructured clinical notes and assessing differences between patients with or without structured race\/ethnicity data.<\/jats:p><\/jats:sec><jats:sec><jats:title>Materials and Methods<\/jats:title><jats:p>Using EHR notes for 16 665 patients with encounters at a primary care practice, we developed rule-based natural language processing (NLP) algorithms to classify patients as black\/Hispanic. 
We evaluated performance of the method against an annotated gold standard, compared race and ethnicity between NLP-derived and structured EHR data, and compared characteristics of patients identified as black or Hispanic using only NLP vs patients identified as such only in structured EHR data.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>For the sample of 16 665 patients, NLP identified 948 additional patients as black, a 26% increase, and 665 additional patients as Hispanic, a 20% increase. Compared with the patients identified as black or Hispanic in structured EHR data, patients identified as black or Hispanic via NLP only were older, more likely to be male, less likely to have commercial insurance, and more likely to have higher comorbidity.<\/jats:p><\/jats:sec><jats:sec><jats:title>Discussion<\/jats:title><jats:p>Structured EHR data for race and ethnicity are subject to data quality issues. Supplementing structured EHR race data with NLP-derived race and ethnicity may allow researchers to better assess the demographic makeup of populations and draw more accurate conclusions about intergroup differences in health outcomes.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>Black or Hispanic patients who are not documented as such in structured EHR race\/ethnicity fields differ significantly from those who are. 
Relatively simple NLP can help address this limitation.<\/jats:p><\/jats:sec>","DOI":"10.1093\/jamia\/ocz040","type":"journal-article","created":{"date-parts":[[2019,3,13]],"date-time":"2019-03-13T20:12:17Z","timestamp":1552507937000},"page":"722-729","source":"Crossref","is-referenced-by-count":66,"title":["Underserved populations with missing race ethnicity data differ significantly from those with structured race\/ethnicity documentation"],"prefix":"10.1093","volume":"26","author":[{"given":"Evan T","family":"Sholle","sequence":"first","affiliation":[{"name":"Information Technologies & Services Department, Weill Cornell Medicine, New York, New York, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Laura C","family":"Pinheiro","sequence":"additional","affiliation":[{"name":"Department of Medicine, Weill Cornell Medicine, New York, New York, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Prakash","family":"Adekkanattu","sequence":"additional","affiliation":[{"name":"Information Technologies & Services Department, Weill Cornell Medicine, New York, New York, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"suffix":"III","given":"Marcos A","family":"Davila","sequence":"additional","affiliation":[{"name":"Information Technologies & Services Department, Weill Cornell Medicine, New York, New York, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stephen B","family":"Johnson","sequence":"additional","affiliation":[{"name":"Department of Healthcare Policy & Research, Weill Cornell Medicine, New York, New York, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jyotishman","family":"Pathak","sequence":"additional","affiliation":[{"name":"Department of Healthcare Policy & Research, Weill Cornell Medicine, New York, New York, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sanjai","family":"Sinha","sequence":"additional","affiliation":[{"name":"Department of 
Medicine, Weill Cornell Medicine, New York, New York, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Cassidie","family":"Li","sequence":"additional","affiliation":[{"name":"Department of Medicine, Weill Cornell Medicine, New York, New York, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stasi A","family":"Lubansky","sequence":"additional","affiliation":[{"name":"Department of Medicine, Weill Cornell Medicine, New York, New York, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Monika M","family":"Safford","sequence":"additional","affiliation":[{"name":"Department of Medicine, Weill Cornell Medicine, New York, New York, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"suffix":"Jr","given":"Thomas R","family":"Campion","sequence":"additional","affiliation":[{"name":"Information Technologies & Services Department, Weill Cornell Medicine, New York, New York, USA"},{"name":"Department of Healthcare Policy & Research, Weill Cornell Medicine, New York, New York, USA"},{"name":"Department of Pediatrics, Weill Cornell Medicine, New York, New York, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2019,4,26]]},"reference":[{"issue":"8","key":"2020110613091096200_ocz040-B1","first-page":"666","article-title":"Unequal treatment: confronting racial and ethnic disparities in health care","volume":"94","author":"Nelson","year":"2002","journal-title":"J Natl Med Assoc"},{"key":"2020110613091096200_ocz040-B2","volume-title":"National Healthcare Disparities Report 2011","author":"U.S. 
Department of Health and Human Services","year":"2012"},{"key":"2020110613091096200_ocz040-B3","doi-asserted-by":"crossref","first-page":"1501","DOI":"10.1111\/j.1475-6773.2006.00552.x","article-title":"Obtaining data on patient race, ethnicity, and primary language in health care organizations: current challenges and proposed solutions","volume":"41","author":"Hasnain-Wynia","year":"2006","journal-title":"Health Serv Res"},{"issue":"6","key":"2020110613091096200_ocz040-B4","doi-asserted-by":"crossref","first-page":"719","DOI":"10.1007\/s11606-014-3102-8","article-title":"Accuracy of race, ethnicity, and language preference in an electronic health record","volume":"30","author":"Klinger","year":"2015","journal-title":"J Gen Intern Med"},{"issue":"9","key":"2020110613091096200_ocz040-B5","doi-asserted-by":"crossref","first-page":"769","DOI":"10.1016\/S0027-9684(15)30673-8","article-title":"Barriers to collecting patient race, ethnicity, and primary language data in physician practices: an exploratory study","volume":"102","author":"Hasnain-Wynia","year":"2010","journal-title":"J Natl Med Assoc"},{"issue":"6","key":"2020110613091096200_ocz040-B6","doi-asserted-by":"crossref","first-page":"501","DOI":"10.1056\/NEJMp1006114","article-title":"The \u201cmeaningful use\u201d regulation for electronic health records","volume":"363","author":"Blumenthal","year":"2010","journal-title":"N Engl J Med"},{"key":"2020110613091096200_ocz040-B7","article-title":"Revisions to the Standards for the Classification of Federal Data on Race and Ethnicity","author":"Office of Management and Budget","year":"1997","journal-title":"Federal Register"},{"issue":"10","key":"2020110613091096200_ocz040-B8","first-page":"1721","article-title":"Minorities are underrepresented in clinical trials of pharmaceutical agents for cystic fibrosis","volume":"13","author":"McGarry","year":"2016","journal-title":"Ann Am Thorac 
Soc"},{"key":"2020110613091096200_ocz040-B9","first-page":"537","article-title":"Integrating data from natural language processing into a clinical information system","author":"Johnson","year":"1996","journal-title":"Proc AMIA Annu Fall Symp"},{"issue":"5","key":"2020110613091096200_ocz040-B10","doi-asserted-by":"crossref","first-page":"544","DOI":"10.1136\/amiajnl-2011-000464","article-title":"Natural language processing: an introduction","volume":"18","author":"Nadkarni","year":"2011","journal-title":"J Am Med Inform Assoc"},{"issue":"e1","key":"2020110613091096200_ocz040-B11","doi-asserted-by":"crossref","first-page":"e163","DOI":"10.1136\/amiajnl-2013-001859","article-title":"Automated identification of patients with a diagnosis of binge eating disorder from narrative electronic health records","volume":"21","author":"Bellows","year":"2014","journal-title":"J Am Med Inform Assoc"},{"issue":"5","key":"2020110613091096200_ocz040-B12","doi-asserted-by":"crossref","first-page":"898","DOI":"10.1136\/amiajnl-2012-001076","article-title":"Longitudinal analysis of pain in patients with metastatic prostate cancer using natural language processing of medical record text","volume":"20","author":"Heintzelman","year":"2013","journal-title":"J Am Med Inform Assoc"},{"key":"2020110613091096200_ocz040-B13","first-page":"104","article-title":"From sour grapes to low-hanging fruit: a case study demonstrating a practical strategy for natural language processing portability","volume":"2017","author":"Johnson","year":"2018","journal-title":"AMIA Jt Summits Transl Sci Proc"},{"key":"2020110613091096200_ocz040-B14","first-page":"1581","article-title":"Secondary use of patients\u2019 electronic records (SUPER): an approach for meeting specific data needs of clinical and translational researchers","volume":"2017","author":"Sholle","year":"2017","journal-title":"AMIA Annu Symp 
Proc"},{"issue":"12","key":"2020110613091096200_ocz040-B15","doi-asserted-by":"crossref","first-page":"1057","DOI":"10.1016\/j.ijmedinf.2015.09.002","article-title":"Using natural language processing to identify problem usage of prescription opioids","volume":"84","author":"Carrell","year":"2015","journal-title":"Int J Med Inform"},{"key":"2020110613091096200_ocz040-B16","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1186\/s12872-017-0580-8","article-title":"Unlocking echocardiogram measurements for heart disease research through natural language processing","volume":"17","author":"Patterson","year":"2017","journal-title":"BMC Cardiovasc Disord"},{"issue":"5","key":"2020110613091096200_ocz040-B17","doi-asserted-by":"crossref","first-page":"859","DOI":"10.1136\/amiajnl-2011-000535","article-title":"Automated extraction of ejection fraction for quality measurement using regular expressions in Unstructured Information Management Architecture (UIMA) for heart failure","volume":"19","author":"Garvin","year":"2012","journal-title":"J Am Med Inform Assoc"},{"key":"2020110613091096200_ocz040-B18","first-page":"147","article-title":"Ascertaining Depression Severity by Extracting Patient Health Questionnaire-9 (PHQ-9) scores from clinical notes","volume":"2018","author":"Adekkanattu","year":"2018","journal-title":"AMIA Annu Symp Proc"},{"key":"2020110613091096200_ocz040-B19","article-title":"Standards for maintaining, collecting, and presenting federal data on race and ethnicity","author":"Office of Management and Budget","year":"1997","journal-title":"Federal Register"},{"key":"2020110613091096200_ocz040-B20"},{"issue":"3","key":"2020110613091096200_ocz040-B21","doi-asserted-by":"crossref","first-page":"448","DOI":"10.2105\/AJPH.2012.300943","article-title":"Tracking health disparities through natural-language processing","volume":"103","author":"Wieland","year":"2013","journal-title":"Am J Public Health"}],"container-title":["Journal of the American Medical 
Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/26\/8-9\/722\/34151890\/ocz040.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/26\/8-9\/722\/34151890\/ocz040.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,15]],"date-time":"2024-07-15T19:41:54Z","timestamp":1721072514000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/26\/8-9\/722\/5480563"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,4,26]]},"references-count":21,"journal-issue":{"issue":"8-9","published-online":{"date-parts":[[2019,4,26]]},"published-print":{"date-parts":[[2019,8,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocz040","relation":{},"ISSN":["1527-974X"],"issn-type":[{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,8]]},"published":{"date-parts":[[2019,4,26]]}}}