{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T12:32:31Z","timestamp":1769171551901,"version":"3.49.0"},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2019,10,16]],"date-time":"2019-10-16T00:00:00Z","timestamp":1571184000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["1R01LM012607"],"award-info":[{"award-number":["1R01LM012607"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["1R01AI130460"],"award-info":[{"award-number":["1R01AI130460"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["P50MH113840"],"award-info":[{"award-number":["P50MH113840"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["1R01AI116794"],"award-info":[{"award-number":["1R01AI116794"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01LM009012"],"award-info":[{"award-number":["R01LM009012"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01HD099348"],"award-info":[{"award-number":["R01HD099348"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01LM010098"],"award-info":[{"award-number":["R01LM010098"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006093","name":"Patient-Centered Outcomes Research Institute","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006093","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006093","name":"PCORI","doi-asserted-by":"publisher","award":["ME-1511-32666"],"award-info":[{"award-number":["ME-1511-32666"]}],"id":[{"id":"10.13039\/100006093","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"name":"PCORI Methodology Committee"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,2,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objectives<\/jats:title>\n                  <jats:p>The ability to identify novel risk factors for health outcomes is a key strength of electronic health record (EHR)-based research. However, the validity of such studies is limited by error in EHR-derived phenotypes. The objective of this study was to develop a novel procedure for reducing bias in estimated associations between risk factors and phenotypes in EHR data.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and Methods<\/jats:title>\n                  <jats:p>The proposed method combines the strengths of a gold-standard phenotype obtained through manual chart review for a small validation set of patients and an automatically-derived phenotype that is available for all patients but is potentially error-prone (hereafter referred to as the algorithm-derived phenotype). An augmented estimator of associations is obtained by optimally combining these 2 phenotypes. We conducted simulation studies to evaluate the performance of the augmented estimator and conducted an analysis of risk factors for second breast cancer events using data on a cohort from Kaiser Permanente Washington.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>The proposed method was shown to reduce bias relative to an estimator using only the algorithm-derived phenotype and reduce variance compared to an estimator using only the validation data.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Discussion<\/jats:title>\n                  <jats:p>Our simulation studies and real data application demonstrate that, compared to the estimator using validation data only, the augmented estimator has lower variance (ie, higher statistical efficiency). Compared to the estimator using error-prone EHR-derived phenotypes, the augmented estimator has smaller bias.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Conclusions<\/jats:title>\n                  <jats:p>The proposed estimator can effectively combine an error-prone phenotype with gold-standard data from a limited chart review in order to improve analyses of risk factors using EHR data.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocz180","type":"journal-article","created":{"date-parts":[[2019,9,15]],"date-time":"2019-09-15T19:21:40Z","timestamp":1568575300000},"page":"244-253","source":"Crossref","is-referenced-by-count":19,"title":["An augmented estimation procedure for EHR-based association studies accounting for differential misclassification"],"prefix":"10.1093","volume":"27","author":[{"given":"Jiayi","family":"Tong","sequence":"first","affiliation":[{"name":"Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jing","family":"Huang","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jessica","family":"Chubak","sequence":"additional","affiliation":[{"name":"Department of Epidemiology, Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xuan","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Statistics, School of Mathematical Sciences, Zhejiang University, Hangzhou, Zhejiang, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jason H","family":"Moore","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rebecca A","family":"Hubbard","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yong","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2019,10,16]]},"reference":[{"issue":"6","key":"2020110613101214500_ocz180-B1","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1038\/nrg3208","article-title":"Mining electronic health records: towards better research applications and clinical care","volume":"13","author":"Jensen","year":"2012","journal-title":"Nat Rev Genet"},{"issue":"1","key":"2020110613101214500_ocz180-B2","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1136\/amiajnl-2012-001145","article-title":"Next-generation phenotyping of electronic health records","volume":"20","author":"Hripcsak","year":"2013","journal-title":"J Am Med Inform Assoc"},{"issue":"4","key":"2020110613101214500_ocz180-B3","doi-asserted-by":"crossref","first-page":"560","DOI":"10.1016\/j.ajhg.2010.03.003","article-title":"Robust replication of genotype-phenotype associations across multiple diseases in an electronic medical record","volume":"86","author":"Ritchie","year":"2010","journal-title":"Am J Hum Genet"},{"key":"2020110613101214500_ocz180-B4","first-page":"1203.","article-title":"A general framework for considering selection bias in EHR-based studies: what data are observed and why?","volume":"4","author":"Haneuse","year":"2016","journal-title":"EGEMS (Wash DC)"},{"issue":"4","key":"2020110613101214500_ocz180-B5","doi-asserted-by":"crossref","first-page":"843","DOI":"10.1093\/biomet\/86.4.843","article-title":"Bias and efficiency loss due to misclassified responses in binary regression","volume":"86","author":"Neuhaus","year":"1999","journal-title":"Biometrika"},{"key":"2020110613101214500_ocz180-B6","first-page":"1764","article-title":"An empirical study for impacts of measurement errors on EHR based association studies","volume":"2016","author":"Duan","year":"2016","journal-title":"AMIA Annu Symp Proc"},{"issue":"2","key":"2020110613101214500_ocz180-B7","doi-asserted-by":"crossref","first-page":"414","DOI":"10.2307\/2529795","article-title":"The effects of misclassification on the estimation of relative risk","volume":"33","author":"Barron","year":"1977","journal-title":"Biometrics"},{"issue":"5","key":"2020110613101214500_ocz180-B8","doi-asserted-by":"crossref","first-page":"488","DOI":"10.1093\/oxfordjournals.aje.a112408","article-title":"Bias due to misclassification in the estimation of relative risk","volume":"105","author":"Copeland","year":"1977","journal-title":"Am J Epidemiol"},{"issue":"7","key":"2020110613101214500_ocz180-B9","doi-asserted-by":"crossref","first-page":"745","DOI":"10.1002\/sim.4780070704","article-title":"Variance estimation for epidemiologic effect estimates under misclassification","volume":"7","author":"Greenland","year":"1988","journal-title":"Stat Med"},{"issue":"8","key":"2020110613101214500_ocz180-B10","doi-asserted-by":"crossref","first-page":"1197","DOI":"10.1002\/sim.4780100804","article-title":"Adjustment for non-differential misclassification error in the generalized linear model","volume":"10","author":"Liu","year":"1991","journal-title":"Stat Med"},{"issue":"2","key":"2020110613101214500_ocz180-B11","doi-asserted-by":"crossref","first-page":"338","DOI":"10.1111\/j.0006-341X.1999.00338.x","article-title":"Matrix methods for estimating odds ratios with misclassified exposure data: extensions and comparisons","volume":"55","author":"Morrissey","year":"1999","journal-title":"Biometrics"},{"issue":"4","key":"2020110613101214500_ocz180-B12","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1111\/j.0006-341X.2001.01123.x","article-title":"Threshold model for misclassified binary responses with applications to animal breeding","volume":"57","author":"Rekaya","year":"2001","journal-title":"Biometrics"},{"issue":"4","key":"2020110613101214500_ocz180-B13","doi-asserted-by":"crossref","first-page":"1034","DOI":"10.1111\/j.0006-341X.2002.1034_1.x","article-title":"A note on estimating crude odds ratios in case-control studies with differentially misclassified exposure","volume":"58","author":"Lyles","year":"2002","journal-title":"Biometrics"},{"issue":"3","key":"2020110613101214500_ocz180-B14","doi-asserted-by":"crossref","first-page":"670","DOI":"10.1111\/1541-0420.00077","article-title":"Binomial regression with misclassification","volume":"59","author":"Paulino","year":"2003","journal-title":"Biometrics"},{"issue":"14","key":"2020110613101214500_ocz180-B15","doi-asserted-by":"crossref","first-page":"2221","DOI":"10.1002\/sim.2094","article-title":"Does it always help to adjust for misclassification of a binary outcome in logistic regression?","volume":"24","author":"Luan","year":"2005","journal-title":"Stat Med"},{"issue":"2","key":"2020110613101214500_ocz180-B16","doi-asserted-by":"crossref","first-page":"528","DOI":"10.1016\/j.jspi.2007.06.012","article-title":"Maximum-likelihood and closed-form estimators of epidemiologic measures under misclassification","volume":"138","author":"Greenland","year":"2008","journal-title":"J Stat Plan Inference"},{"issue":"22","key":"2020110613101214500_ocz180-B17","doi-asserted-by":"crossref","first-page":"2297","DOI":"10.1002\/sim.3971","article-title":"Sensitivity analysis for misclassification in logistic regression via likelihood methods and predictive value weighting","volume":"29","author":"Lyles","year":"2010","journal-title":"Stat Med"},{"issue":"2","key":"2020110613101214500_ocz180-B18","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1002\/pds.4680","article-title":"Inflation of type I error rates due to differential misclassification in EHR-derived outcomes: empirical illustration using breast cancer recurrence","volume":"28","author":"Chen","year":"2019","journal-title":"Pharmacoepidemiol Drug Saf"},{"issue":"11","key":"2020110613101214500_ocz180-B19","doi-asserted-by":"crossref","first-page":"1369","DOI":"10.1007\/s00439-014-1466-9","article-title":"Improving the power of genetic association tests with imperfect phenotype derived from electronic medical records","volume":"133","author":"Sinnott","year":"2014","journal-title":"Hum Genet"},{"issue":"2","key":"2020110613101214500_ocz180-B20","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1038\/nrg3868","article-title":"Methods of integrating data to uncover genotype-phenotype interactions","volume":"16","author":"Ritchie","year":"2015","journal-title":"Nat Rev Genet"},{"issue":"3","key":"2020110613101214500_ocz180-B21","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1038\/nrg.2015.36","article-title":"Unravelling the human genome-phenome relationship using phenome-wide association studies","volume":"17","author":"Bush","year":"2016","journal-title":"Nat Rev Genet"},{"issue":"2","key":"2020110613101214500_ocz180-B22","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1093\/oxfordjournals.aje.a009251","article-title":"Logistic regression when the outcome is measured with uncertainty","volume":"146","author":"Magder","year":"1997","journal-title":"Am J Epidemiol"},{"issue":"3","key":"2020110613101214500_ocz180-B23","doi-asserted-by":"crossref","first-page":"647","DOI":"10.1093\/biomet\/asr035","article-title":"Marginal methods for correlated binary data with misclassified responses","volume":"98","author":"Chen","year":"2011","journal-title":"Biometrika"},{"issue":"7","key":"2020110613101214500_ocz180-B24","doi-asserted-by":"crossref","first-page":"1095","DOI":"10.1002\/sim.1656","article-title":"Modelling risk when binary outcomes are subject to error","volume":"23","author":"McInturff","year":"2004","journal-title":"Stat Med"},{"issue":"4","key":"2020110613101214500_ocz180-B25","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1097\/EDE.0b013e3182117c85","article-title":"Validation data-based adjustments for outcome misclassification in logistic regression: an illustration","volume":"22","author":"Lyles","year":"2011","journal-title":"Epidemiology"},{"issue":"9","key":"2020110613101214500_ocz180-B26","doi-asserted-by":"crossref","first-page":"904","DOI":"10.1093\/aje\/kws340","article-title":"Accounting for misclassified outcomes in binary regression models using multiple imputation with internal validation data","volume":"177","author":"Edwards","year":"2013","journal-title":"Am J Epidemiol"},{"key":"2020110613101214500_ocz180-B27","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1016\/j.jmva.2015.05.017","article-title":"Semiparametric linear transformation model with differential measurement error and validation sampling","volume":"141","author":"Wang","year":"2015","journal-title":"J Multivar Anal"},{"key":"2020110613101214500_ocz180-B28","doi-asserted-by":"crossref","DOI":"10.1201\/9781420010138","volume-title":"Measurement Error in Nonlinear Models: A Modern Perspective","author":"Carroll","year":"2006"},{"issue":"3","key":"2020110613101214500_ocz180-B29","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1016\/j.jclinepi.2011.09.002","article-title":"Tradeoffs between accuracy measures for electronic health care data algorithms","volume":"65","author":"Chubak","year":"2012","journal-title":"J Clin Epidemiol"},{"issue":"2","key":"2020110613101214500_ocz180-B30","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1007\/s10549-014-2870-5","article-title":"Comparative safety of cardiovascular medication use and breast cancer outcomes among women with early stage breast cancer","volume":"144","author":"Boudreau","year":"2014","journal-title":"Breast Cancer Res Treat"},{"issue":"12","key":"2020110613101214500_ocz180-B31","doi-asserted-by":"crossref","first-page":"931","DOI":"10.1093\/jnci\/djs233","article-title":"Administrative data algorithms to identify second breast cancer events following early-stage invasive breast cancer","volume":"104","author":"Chubak","year":"2012","journal-title":"J Natl Cancer Inst"},{"issue":"8","key":"2020110613101214500_ocz180-B32","doi-asserted-by":"crossref","first-page":"e124.","DOI":"10.1371\/journal.pmed.0020124","article-title":"Why most published research findings are false","volume":"2","author":"Ioannidis","year":"2005","journal-title":"PLoS Med"},{"issue":"3","key":"2020110613101214500_ocz180-B33","doi-asserted-by":"crossref","first-page":"425","DOI":"10.1002\/sim.7522","article-title":"Weighted estimation for confounded binary outcomes subject to misclassification","volume":"37","author":"Gravel","year":"2018","journal-title":"Stat Med"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/27\/2\/244\/34152617\/ocz180.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/27\/2\/244\/34152617\/ocz180.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,11,6]],"date-time":"2020-11-06T19:32:24Z","timestamp":1604691144000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/27\/2\/244\/5588595"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,16]]},"references-count":33,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2019,10,16]]},"published-print":{"date-parts":[[2020,2,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocz180","relation":{},"ISSN":["1527-974X"],"issn-type":[{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,2]]},"published":{"date-parts":[[2019,10,16]]}}}