{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,15]],"date-time":"2026-01-15T08:49:49Z","timestamp":1768466989259,"version":"3.49.0"},"reference-count":30,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2021,10,26]],"date-time":"2021-10-26T00:00:00Z","timestamp":1635206400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["R21CA227613"],"award-info":[{"award-number":["R21CA227613"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["R01CA172073"],"award-info":[{"award-number":["R01CA172073"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["R01LM012607"],"award-info":[{"award-number":["R01LM012607"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["R01AI130460"],"award-info":[{"award-number":["R01AI130460"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["R01AG073435"],"award-info":[{"award-number":["R01AG073435"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["1R01LM013519"],"award-info":[{"award-number":["1R01LM013519"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["1R56AG074604"],"award-inf
o":[{"award-number":["1R56AG074604"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["R56AG069880"],"award-info":[{"award-number":["R56AG069880"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["R01GM140476"],"award-info":[{"award-number":["R01GM140476"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006093","name":"Patient-Centered Outcomes Research Institute","doi-asserted-by":"publisher","award":["ME-2019C3-18315"],"award-info":[{"award-number":["ME-2019C3-18315"]}],"id":[{"id":"10.13039\/100006093","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006093","name":"Patient-Centered Outcomes Research Institute","doi-asserted-by":"publisher","award":["ME-2018C3-14899"],"award-info":[{"award-number":["ME-2018C3-14899"]}],"id":[{"id":"10.13039\/100006093","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006093","name":"Patient-Centered Outcomes Research Institute","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006093","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,12,28]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Objectives<\/jats:title>\n                  <jats:p>Electronic health records (EHR) are commonly used for the identification of novel risk factors for disease, often referred to as an association study. 
A major challenge to EHR-based association studies is phenotyping error in EHR-derived outcomes. A manual chart review of phenotypes is necessary for unbiased evaluation of risk factor associations. However, this process is time-consuming and expensive. The objective of this paper is to develop an outcome-dependent sampling approach for designing manual chart review, where EHR-derived phenotypes can be used to guide the selection of charts to be reviewed in order to maximize statistical efficiency in the subsequent estimation of risk factor associations.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Materials and Methods<\/jats:title>\n                  <jats:p>After applying outcome-dependent sampling, an augmented estimator can be constructed by optimally combining the chart-reviewed phenotypes from the selected patients with the error-prone EHR-derived phenotype. We conducted simulation studies to evaluate the proposed method and applied our method to data on colon cancer recurrence in a cohort of patients treated for a primary colon cancer in the Kaiser Permanente Washington (KPW) healthcare system.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Simulations verify the coverage probability of the proposed method and show that, when disease prevalence is less than 30%, the proposed method has smaller variance than an existing method where the validation set for chart review is uniformly sampled. In addition, from a design perspective, the proposed method is able to achieve the same statistical power with 50% fewer charts to be validated than the uniform sampling method, thus leading to a substantial efficiency gain in chart review. 
These findings were also confirmed by the application of the competing methods to the KPW colon cancer data.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Discussion<\/jats:title>\n                  <jats:p>Our simulation studies and analysis of data from KPW demonstrate that, compared to an existing uniform sampling method, the proposed outcome-dependent method can lead to a more efficient chart review sampling design and unbiased association estimates with higher statistical efficiency.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Conclusion<\/jats:title>\n                  <jats:p>The proposed method not only optimally combines phenotypes from chart review with EHR-derived phenotypes but also suggests an efficient design for conducting chart review, with the goal of improving the efficiency of estimated risk factor associations using EHR data.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/jamia\/ocab222","type":"journal-article","created":{"date-parts":[[2021,9,30]],"date-time":"2021-09-30T15:18:19Z","timestamp":1633015099000},"page":"52-61","source":"Crossref","is-referenced-by-count":12,"title":["A cost-effective chart review sampling design to account for phenotyping error in electronic health records (EHR) data"],"prefix":"10.1093","volume":"29","author":[{"given":"Ziyan","family":"Yin","sequence":"first","affiliation":[{"name":"Department of Statistical Science, Temple University, Philadelphia, Pennsylvania, USA"}]},{"given":"Jiayi","family":"Tong","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0835-0788","authenticated-orcid":false,"given":"Yong","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Epidemiology 
and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA"}]},{"given":"Rebecca A","family":"Hubbard","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, The University of Pennsylvania, Philadelphia, Pennsylvania, USA"}]},{"given":"Cheng Yong","family":"Tang","sequence":"additional","affiliation":[{"name":"Department of Statistical Science, Temple University, Philadelphia, Pennsylvania, USA"}]}],"member":"286","published-online":{"date-parts":[[2021,10,26]]},"reference":[{"issue":"1","key":"2021122823065729000_ocab222-B1","doi-asserted-by":"crossref","first-page":"b81","DOI":"10.1136\/bmj.b81","article-title":"Use of primary care electronic medical record database in drug efficacy research on cardiovascular outcomes: comparison of database and randomised controlled trial findings","volume":"338","author":"Tannen","year":"2009","journal-title":"BMJ"},{"issue":"20","key":"2021122823065729000_ocab222-B2","doi-asserted-by":"crossref","first-page":"2016","DOI":"10.1161\/CIRCULATIONAHA.110.948828","article-title":"Identification of genomic predictors of atrioventricular conduction","volume":"122","author":"Denny","year":"2010","journal-title":"Circulation"},{"issue":"4","key":"2021122823065729000_ocab222-B3","doi-asserted-by":"crossref","first-page":"560","DOI":"10.1016\/j.ajhg.2010.03.003","article-title":"Robust replication of genotype-phenotype associations across multiple diseases in an electronic medical record","volume":"86","author":"Ritchie","year":"2010","journal-title":"Am J Hum Genet"},{"issue":"S4","key":"2021122823065729000_ocab222-B4","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1186\/1472-6947-15-S4-S1","article-title":"Predictive modeling of structured electronic health records for adverse drug event detection","volume":"15","author":"Zhao","year":"2015","journal-title":"BMC Med Inform Decis 
Mak"},{"issue":"1","key":"2021122823065729000_ocab222-B5","first-page":"20","article-title":"Comparative effectiveness research using observational data: active comparators to emulate target trials with inactive comparators","volume":"4","author":"Huitfeldt","year":"2016","journal-title":"EGEMS (Wash DC)"},{"issue":"1","key":"2021122823065729000_ocab222-B6","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1136\/bmjqs-2015-004332","article-title":"Electronic health record-based triggers to detect adverse events after outpatient orthopaedic surgery","volume":"25","author":"Menendez","year":"2016","journal-title":"BMJ Qual Saf"},{"key":"2021122823065729000_ocab222-B7","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1016\/j.csbj.2016.11.001","article-title":"Heart failure: diagnosis, severity estimation and prediction of adverse events through machine learning techniques","volume":"15","author":"Tripoliti","year":"2017","journal-title":"Comput Struct Biotechnol J"},{"issue":"6","key":"2021122823065729000_ocab222-B8","doi-asserted-by":"crossref","first-page":"1719","DOI":"10.1109\/JBHI.2017.2675340","article-title":"Prediction of adverse events in patients undergoing major cardiovascular procedures","volume":"21","author":"Mortazavi","year":"2017","journal-title":"IEEE J Biomed Health Inform"},{"key":"2021122823065729000_ocab222-B9","first-page":"1764","article-title":"An empirical study for impacts of measurement errors on EHR based association studies","volume":"2016","author":"Duan","year":"2017","journal-title":"AMIA Annu Symp Proc"},{"issue":"2","key":"2021122823065729000_ocab222-B10","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1002\/pds.4680","article-title":"Inflation of type I error rates due to differential misclassification in EHR-derived outcomes: empirical illustration using breast cancer recurrence","volume":"28","author":"Chen","year":"2019","journal-title":"Pharmacoepidemiol Drug 
Saf"},{"issue":"8","key":"2021122823065729000_ocab222-B11","doi-asserted-by":"crossref","first-page":"e124","DOI":"10.1371\/journal.pmed.0020124","article-title":"Why most published research findings are false","volume":"2","author":"Ioannidis","year":"2005","journal-title":"PLoS Med"},{"issue":"2","key":"2021122823065729000_ocab222-B12","doi-asserted-by":"crossref","first-page":"244","DOI":"10.1093\/jamia\/ocz180","article-title":"An augmented estimation procedure for EHR-based association studies accounting for differential misclassification","volume":"27","author":"Tong","year":"2020","journal-title":"J Am Med Inform Assoc"},{"issue":"12","key":"2021122823065729000_ocab222-B13","doi-asserted-by":"crossref","first-page":"e88","DOI":"10.1097\/MLR.0000000000000404","article-title":"Detecting lung and colorectal cancer recurrence using structured clinical\/administrative data to enable outcomes research and population health management","volume":"55","author":"Hassett","year":"2017","journal-title":"Med Care"},{"issue":"449","key":"2021122823065729000_ocab222-B14","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1080\/01621459.2000.10473898","article-title":"Estimation and inference for logistic regression with covariate misclassification and measurement error in main study\/validation study designs","volume":"95","author":"Spiegelman","year":"2000","journal-title":"J Am Stat Assoc"},{"issue":"3","key":"2021122823065729000_ocab222-B15","doi-asserted-by":"crossref","first-page":"647","DOI":"10.1093\/biomet\/asr035","article-title":"Marginal methods for correlated binary data with misclassified responses","volume":"98","author":"Chen","year":"2011","journal-title":"Biometrika"},{"issue":"4","key":"2021122823065729000_ocab222-B16","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1097\/EDE.0b013e3182117c85","article-title":"Validation data-based adjustments for outcome misclassification in logistic 
regression","volume":"22","author":"Lyles","year":"2011","journal-title":"Epidemiology"},{"issue":"1","key":"2021122823065729000_ocab222-B17","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1111\/biom.12971","article-title":"Semi-supervised validation of multiple surrogate outcomes with application to electronic medical records phenotyping","volume":"75","author":"Hong","year":"2019","journal-title":"Biometrics"},{"issue":"4","key":"2021122823065729000_ocab222-B18","doi-asserted-by":"crossref","first-page":"359","DOI":"10.2307\/3316021","article-title":"Semiparametric efficient estimation for the auxiliary outcome problem with the conditional mean model","volume":"32","author":"Chen","year":"2004","journal-title":"Can J Statistics"},{"key":"2021122823065729000_ocab222-B19","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1016\/j.jmva.2015.05.017","article-title":"Semiparametric linear transformation model with differential measurement error and validation sampling","volume":"141","author":"Wang","year":"2015","journal-title":"J Multivar Anal"},{"issue":"2","key":"2021122823065729000_ocab222-B20","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1093\/oxfordjournals.aje.a009251","article-title":"Logistic regression when the outcome is measured with uncertainty","volume":"146","author":"Magder","year":"1997","journal-title":"Am J Epidemiol"},{"issue":"4","key":"2021122823065729000_ocab222-B21","doi-asserted-by":"crossref","first-page":"963","DOI":"10.2307\/2532441","article-title":"The design and analysis of case-control studies with biased sampling","volume":"46","author":"Weinberg","year":"1990","journal-title":"Biometrics"},{"issue":"4","key":"2021122823065729000_ocab222-B22","doi-asserted-by":"crossref","first-page":"843","DOI":"10.1093\/biomet\/86.4.843","article-title":"Bias and efficiency loss due to misclassified responses in binary 
regression","volume":"86","author":"Neuhaus","year":"1999","journal-title":"Biometrika"},{"issue":"2","key":"2021122823065729000_ocab222-B23","doi-asserted-by":"crossref","first-page":"447","DOI":"10.1111\/1467-9868.00078","article-title":"Maximum likelihood estimation of logistic regression parameters under two-phase, outcome-dependent sampling","volume":"59","author":"Breslow","year":"1997","journal-title":"J R Stat Soc Ser B (Stat Methodol)"},{"key":"2021122823065729000_ocab222-B24","author":"Qin","year":"2017"},{"issue":"3","key":"2021122823065729000_ocab222-B25","doi-asserted-by":"crossref","first-page":"711","DOI":"10.1093\/biomet\/87.3.711","article-title":"Miscellanea. A robust imputation method for surrogate outcome data","volume":"87","author":"Chen","year":"2000","journal-title":"Biometrika"},{"issue":"1","key":"2021122823065729000_ocab222-B26","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1093\/biomet\/75.1.11","article-title":"Logistic regression for two-stage case-control data","volume":"75","author":"Breslow","year":"1988","journal-title":"Biometrika"},{"issue":"2","key":"2021122823065729000_ocab222-B27","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1007\/s10258-002-0008-x","article-title":"Inverse probability weighted M-estimators for sample selection, attrition, and stratification","volume":"1","author":"Wooldridge","year":"2002","journal-title":"Port Econ J"},{"issue":"4","key":"2021122823065729000_ocab222-B28","doi-asserted-by":"crossref","first-page":"1001","DOI":"10.1093\/biomet\/ass045","article-title":"An efficient empirical likelihood approach for estimating equations with missing data","volume":"99","author":"Tang","year":"2012","journal-title":"Biometrika"},{"issue":"1","key":"2021122823065729000_ocab222-B29","doi-asserted-by":"crossref","first-page":"1","DOI":"10.2307\/1912526","article-title":"Maximum likelihood estimation of misspecified 
models","volume":"50","author":"White","year":"1982","journal-title":"Econometrica"},{"issue":"3","key":"2021122823065729000_ocab222-B30","doi-asserted-by":"crossref","first-page":"449","DOI":"10.1111\/1467-9868.00243","article-title":"A unified approach to regression analysis under double-sampling designs","volume":"62","author":"Chen","year":"2000","journal-title":"J R Stat Soc B"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/29\/1\/52\/41955511\/ocab222.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article-pdf\/29\/1\/52\/41955511\/ocab222.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,12,28]],"date-time":"2021-12-28T23:09:58Z","timestamp":1640732998000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/29\/1\/52\/6410783"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,26]]},"references-count":30,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,10,26]]},"published-print":{"date-parts":[[2021,12,28]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocab222","relation":{},"ISSN":["1527-974X"],"issn-type":[{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,1,1]]},"published":{"date-parts":[[2021,10,26]]}}}