{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T22:36:32Z","timestamp":1775082992814,"version":"3.50.1"},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,10,11]],"date-time":"2021-10-11T00:00:00Z","timestamp":1633910400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,10,11]],"date-time":"2021-10-11T00:00:00Z","timestamp":1633910400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000093","name":"U.S. Department of Health & Human Services | NIH | Center for Information Technology","doi-asserted-by":"publisher","award":["R56HL116832"],"award-info":[{"award-number":["R56HL116832"]}],"id":[{"id":"10.13039\/100000093","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Laboratory data from Electronic Health Records (EHR) are often used in prediction models where estimation bias and model performance from missingness can be mitigated using imputation methods. We demonstrate the utility of imputation in two real-world EHR-derived cohorts of ischemic stroke from Geisinger and of heart failure from Sutter Health to: (1) characterize the patterns of missingness in laboratory variables; (2) simulate two missing mechanisms, arbitrary and monotone; (3) compare cross-sectional and multi-level multivariate missing imputation algorithms applied to laboratory data; (4) assess whether incorporation of latent information, derived from comorbidity data, can improve the performance of the algorithms. 
The latter was based on a case study of hemoglobin A1c under a univariate missing imputation framework. Overall, the pattern of missingness in EHR laboratory variables was <jats:italic>not at random<\/jats:italic> and was highly associated with patients\u2019 comorbidity data; and the multi-level imputation algorithm showed smaller imputation error than the cross-sectional method.<\/jats:p>","DOI":"10.1038\/s41746-021-00518-0","type":"journal-article","created":{"date-parts":[[2021,10,11]],"date-time":"2021-10-11T10:18:43Z","timestamp":1633947523000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":89,"title":["Imputation of missing values for electronic health record laboratory data"],"prefix":"10.1038","volume":"4","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7006-1285","authenticated-orcid":false,"given":"Jiang","family":"Li","sequence":"first","affiliation":[]},{"given":"Xiaowei S.","family":"Yan","sequence":"additional","affiliation":[]},{"given":"Durgesh","family":"Chaudhary","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5141-1502","authenticated-orcid":false,"given":"Venkatesh","family":"Avula","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2776-8952","authenticated-orcid":false,"given":"Satish","family":"Mudiganti","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8809-5582","authenticated-orcid":false,"given":"Hannah","family":"Husby","sequence":"additional","affiliation":[]},{"given":"Shima","family":"Shahjouei","sequence":"additional","affiliation":[]},{"given":"Ardavan","family":"Afshar","sequence":"additional","affiliation":[]},{"given":"Walter 
F.","family":"Stewart","sequence":"additional","affiliation":[]},{"given":"Mohammed","family":"Yeasin","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9477-0094","authenticated-orcid":false,"given":"Ramin","family":"Zand","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7689-933X","authenticated-orcid":false,"given":"Vida","family":"Abedi","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,10,11]]},"reference":[{"key":"518_CR1","doi-asserted-by":"publisher","first-page":"1678","DOI":"10.1161\/STROKEAHA.117.017033","volume":"48","author":"V Abedi","year":"2017","unstructured":"Abedi, V. et al. Novel screening tool for stroke using artificial neural network. Stroke 48, 1678\u20131681 (2017).","journal-title":"Stroke"},{"key":"518_CR2","doi-asserted-by":"publisher","first-page":"175628642093896","DOI":"10.1177\/1756286420938962","volume":"13","author":"V Abedi","year":"2020","unstructured":"Abedi, V. et al. Using artificial intelligence for improving stroke diagnosis in emergency departments: a practical framework. Ther. Adv. Neurol. Disord. 13, 1756286420938962 (2020).","journal-title":"Ther. Adv. Neurol. Disord."},{"key":"518_CR3","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1038\/s41746-019-0122-0","volume":"2","author":"D Chen","year":"2019","unstructured":"Chen, D. et al. Deep learning and alternative learning strategies for retrospective real-world clinical data. NPJ Digit. Med. 2, 43 (2019).","journal-title":"NPJ Digit. Med."},{"key":"518_CR4","doi-asserted-by":"publisher","first-page":"795","DOI":"10.1016\/j.amjmed.2019.01.017","volume":"132","author":"N Noorbakhsh-Sabet","year":"2019","unstructured":"Noorbakhsh-Sabet, N., Zand, R., Zhang, Y. & Abedi, V. Artificial intelligence transforms the future of health care. Am. J. Med. 132, 795\u2013801 (2019).","journal-title":"Am. J. 
Med."},{"key":"518_CR5","doi-asserted-by":"publisher","first-page":"130","DOI":"10.1038\/s41746-020-00343-x","volume":"3","author":"N Razavian","year":"2020","unstructured":"Razavian, N. et al. A validated, real-time prediction model for favorable outcomes in hospitalized COVID-19 patients. NPJ Digit. Med. 3, 130 (2020).","journal-title":"NPJ Digit. Med."},{"key":"518_CR6","doi-asserted-by":"publisher","first-page":"e0208141","DOI":"10.1371\/journal.pone.0208141","volume":"14","author":"MA Konerman","year":"2019","unstructured":"Konerman, M. A. et al. Machine learning models to predict disease progression among veterans with hepatitis C virus. PLoS ONE 14, e0208141 (2019).","journal-title":"PLoS ONE"},{"key":"518_CR7","doi-asserted-by":"publisher","unstructured":"Abedi, V. et al. Prediction of long-term stroke recurrence using machine learning models. J. Clin. Med. 10, https:\/\/doi.org\/10.3390\/jcm10061286 (2021).","DOI":"10.3390\/jcm10061286"},{"key":"518_CR8","doi-asserted-by":"publisher","unstructured":"Misra, D. et al. Early detection of septic shock onset using interpretable machine learners. J. Clin. Med. 10, https:\/\/doi.org\/10.3390\/jcm10020301 (2021).","DOI":"10.3390\/jcm10020301"},{"key":"518_CR9","doi-asserted-by":"publisher","first-page":"106","DOI":"10.1186\/s12955-019-1181-2","volume":"17","author":"OF Ayilara","year":"2019","unstructured":"Ayilara, O. F. et al. Impact of missing data on bias and precision when estimating change in patient-reported outcomes from a clinical registry. Health Qual. Life Outcomes 17, 106 (2019).","journal-title":"Health Qual. Life Outcomes"},{"key":"518_CR10","doi-asserted-by":"publisher","first-page":"297","DOI":"10.1080\/00223891.2018.1530680","volume":"102","author":"JR van Ginkel","year":"2020","unstructured":"van Ginkel, J. R., Linting, M., Rippe, R. C. A. & van der Voort, A. Rebutting existing misconceptions about multiple imputation as a method for handling missing data. J. Pers. Assess. 
102, 297\u2013308 (2020).","journal-title":"J. Pers. Assess."},{"key":"518_CR11","unstructured":"Ford, B. in Incomplete Data in Sample Surveys, Theory and Bibliographies Vol. 2 (Part IV) (eds. W. Madow, H. Nisselson, & I. Olkin) 185\u2013207 (Academic Press, 1983)."},{"key":"518_CR12","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1016\/j.csda.2013.10.025","volume":"72","author":"L Doove","year":"2014","unstructured":"Doove, L., Van Buuren, S. & Dusseldorp, E. Recursive partitioning for missing data imputation in the presence of interaction effects. Comput Stat. Data Anal. 72, 12 (2014).","journal-title":"Comput Stat. Data Anal."},{"key":"518_CR13","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","volume":"39","author":"AP Dempster","year":"1977","unstructured":"Dempster, A. P., Laird, N. M. & Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39, 38 (1977).","journal-title":"J. R. Stat. Soc. B"},{"key":"518_CR14","unstructured":"Arbuckle, J. L. in Advanced structural equation modeling: Issues and Techniques (eds. G. A. Marcoulides & R. E. Schumacker) (Lawrence Erlbaum Associates, 1996)."},{"key":"518_CR15","doi-asserted-by":"crossref","unstructured":"Rubin, D. B. Multiple Imputation for Nonresponse in Surveys. (Wiley, 1987).","DOI":"10.1002\/9780470316696"},{"key":"518_CR16","doi-asserted-by":"publisher","first-page":"260","DOI":"10.1038\/s41397-019-0101-5","volume":"20","author":"A Yoshikawa","year":"2020","unstructured":"Yoshikawa, A., Li, J. & Meltzer, H. Y. A functional HTR1A polymorphism, rs6295, predicts short-term response to lurasidone: confirmation with meta-analysis of other antipsychotic drugs. Pharmacogenomics J. 
20, 260\u2013270 (2020).","journal-title":"Pharmacogenomics J."},{"key":"518_CR17","doi-asserted-by":"publisher","first-page":"681","DOI":"10.1002\/(SICI)1097-0258(19990330)18:6<681::AID-SIM71>3.0.CO;2-R","volume":"18","author":"S van Buuren","year":"1999","unstructured":"van Buuren, S., Boshuizen, H. C. & Knook, D. L. Multiple imputation of missing blood pressure covariates in survival analysis. Stat. Med. 18, 681\u2013694 (1999).","journal-title":"Stat. Med"},{"key":"518_CR18","doi-asserted-by":"publisher","first-page":"219","DOI":"10.1177\/0962280206074463","volume":"16","author":"S van Buuren","year":"2007","unstructured":"van Buuren, S. Multiple imputation of discrete and continuous data by fully conditional specification. Stat. Methods Med. Res. 16, 219\u2013242 (2007).","journal-title":"Stat. Methods Med. Res."},{"key":"518_CR19","first-page":"11","volume":"27","author":"TE Raghunathan","year":"2001","unstructured":"Raghunathan, T. E., Lepkowski, J. M., Van Hoewyk, J. & Solenberger, P. A multivariate technique for multiply imputing missing values using a sequence of regression models. Surv. Methodol. 27, 11 (2001).","journal-title":"Surv. Methodol."},{"key":"518_CR20","doi-asserted-by":"crossref","unstructured":"Schafer, J. L. Analysis of Incomplete Multivariate Data. (Chapman & Hall, 1997).","DOI":"10.1201\/9781439821862"},{"key":"518_CR21","doi-asserted-by":"publisher","first-page":"371","DOI":"10.1080\/10543401003687129","volume":"21","author":"G Frank Liu","year":"2011","unstructured":"Frank Liu, G. & Zhan, X. Comparisons of methods for analysis of repeated binary responses with missing data. J. Biopharm. Stat. 21, 371\u2013392 (2011).","journal-title":"J. Biopharm. Stat."},{"key":"518_CR22","doi-asserted-by":"publisher","unstructured":"Buuren, S. V. & Groothuis-Oudshoorn, K. mice: Multivariate imputation by chained equations in R. J. Stat. 
Software 45, https:\/\/doi.org\/10.18637\/jss.v045.i03 (2011).","DOI":"10.18637\/jss.v045.i03"},{"key":"518_CR23","doi-asserted-by":"publisher","first-page":"778","DOI":"10.1093\/ajcp\/aqw064","volume":"145","author":"Y Luo","year":"2016","unstructured":"Luo, Y., Szolovits, P., Dighe, A. S. & Baron, J. M. Using machine learning to predict laboratory test results. Am. J. Clin. Pathol. 145, 778\u2013788 (2016).","journal-title":"Am. J. Clin. Pathol."},{"key":"518_CR24","doi-asserted-by":"publisher","unstructured":"Waljee, A. K. et al. Comparison of imputation methods for missing laboratory data in medicine. BMJ Open 3, https:\/\/doi.org\/10.1136\/bmjopen-2013-002847 (2013).","DOI":"10.1136\/bmjopen-2013-002847"},{"key":"518_CR25","doi-asserted-by":"publisher","first-page":"112","DOI":"10.1016\/j.jbi.2017.03.009","volume":"68","author":"Z Hu","year":"2017","unstructured":"Hu, Z. et al. Strategies for handling missing clinical data for automated surgical site infection detection from the electronic health record. J. Biomed. Inf. 68, 112\u2013120 (2017).","journal-title":"J. Biomed. Inf."},{"key":"518_CR26","doi-asserted-by":"publisher","first-page":"645","DOI":"10.1093\/jamia\/ocx133","volume":"25","author":"Y Luo","year":"2018","unstructured":"Luo, Y., Szolovits, P., Dighe, A. S. & Baron, J. M. 3D-MICE: integration of cross-sectional and longitudinal imputation for multi-analyte longitudinal clinical data. J. Am. Med. Inf. Assoc. 25, 645\u2013653 (2018).","journal-title":"J. Am. Med. Inf. Assoc."},{"key":"518_CR27","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1177\/1740774506070802","volume":"3","author":"NR Cook","year":"2006","unstructured":"Cook, N. R. Imputation strategies for blood pressure data nonignorably missing due to medication use. Clin. Trials 3, 411\u2013420 (2006).","journal-title":"Clin. Trials"},{"key":"518_CR28","first-page":"2389","volume":"366","author":"RM Yucel","year":"2008","unstructured":"Yucel, R. M. 
Multiple imputation inference for multivariate multilevel continuous data with ignorable non-response. Philos. Trans. A Math. Phys. Eng. Sci. 366, 2389\u20132403 (2008).","journal-title":"Philos. Trans. A Math. Phys. Eng. Sci."},{"key":"518_CR29","doi-asserted-by":"publisher","first-page":"444","DOI":"10.1002\/bimj.201900051","volume":"62","author":"MH Huque","year":"2020","unstructured":"Huque, M. H. et al. Multiple imputation methods for handling incomplete longitudinal and clustered data where the target analysis is a linear mixed effects model. Biom. J. 62, 444\u2013466 (2020).","journal-title":"Biom. J."},{"key":"518_CR30","doi-asserted-by":"crossref","unstructured":"van Buuren, S. Flexible Imputation of Missing Data. 2nd edn, (Chapman & Hall\/CRC, 2018).","DOI":"10.1201\/9780429492259"},{"key":"518_CR31","doi-asserted-by":"publisher","first-page":"353","DOI":"10.1016\/j.jmva.2013.11.006","volume":"124","author":"K-H Yuan","year":"2014","unstructured":"Yuan, K.-H. & Savalei, V. Consistency, bias and efficiency of the normal-distribution-based MLE: The role of auxiliary variables. J. Multivar. Anal. 124, 353\u2013370 (2014).","journal-title":"J. Multivar. Anal."},{"key":"518_CR32","doi-asserted-by":"publisher","first-page":"624","DOI":"10.1093\/aje\/kwp425","volume":"171","author":"KJ Lee","year":"2010","unstructured":"Lee, K. J. & Carlin, J. B. Multiple imputation for missing data: fully conditional specification versus multivariate normal imputation. Am. J. Epidemiol. 171, 624\u2013632 (2010).","journal-title":"Am. J. Epidemiol."},{"key":"518_CR33","doi-asserted-by":"publisher","first-page":"e0246877","DOI":"10.1371\/journal.pone.0246877","volume":"16","author":"D Chaudhary","year":"2021","unstructured":"Chaudhary, D. et al. Obesity and mortality after the first ischemic stroke: Is obesity paradox real? 
PLoS ONE 16, e0246877 (2021).","journal-title":"PLoS ONE"},{"key":"518_CR34","doi-asserted-by":"publisher","first-page":"117339","DOI":"10.1016\/j.jns.2021.117339","volume":"422","author":"D Chaudhary","year":"2021","unstructured":"Chaudhary, D. et al. Trends in ischemic stroke outcomes in a rural population in the United States. J. Neurol. Sci. 422, 117339 (2021).","journal-title":"J. Neurol. Sci."},{"key":"518_CR35","doi-asserted-by":"publisher","unstructured":"Li, J. et al. Polygenic risk scores augment stroke subtyping. Neurol. Genet. 7, https:\/\/doi.org\/10.1212\/NXG.0000000000000560 (2021).","DOI":"10.1212\/NXG.0000000000000560"},{"key":"518_CR36","doi-asserted-by":"publisher","first-page":"e005114","DOI":"10.1161\/CIRCOUTCOMES.118.005114","volume":"12","author":"R Chen","year":"2019","unstructured":"Chen, R., Stewart, W. F., Sun, J., Ng, K. & Yan, X. Recurrent neural networks for early detection of heart failure from longitudinal electronic health record data: implications for temporal modeling with respect to time before diagnosis, data density, data quantity, and data type. Circ. Cardiovasc. Qual. Outcomes 12, e005114 (2019).","journal-title":"Circ. Cardiovasc. Qual. Outcomes"},{"key":"518_CR37","doi-asserted-by":"publisher","first-page":"3725","DOI":"10.1002\/sim.6184","volume":"33","author":"CA Welch","year":"2014","unstructured":"Welch, C. A. et al. Evaluation of two-fold fully conditional specification multiple imputation for longitudinal electronic health record data. Stat. Med. 33, 3725\u20133737 (2014).","journal-title":"Stat. Med"},{"key":"518_CR38","doi-asserted-by":"publisher","first-page":"3657","DOI":"10.1002\/sim.3731","volume":"28","author":"J Nevalainen","year":"2009","unstructured":"Nevalainen, J., Kenward, M. G. & Virtanen, S. M. Missing values in longitudinal dietary data: a multiple imputation approach based on a fully conditional specification. Stat. Med. 28, 3657\u20133669 (2009).","journal-title":"Stat. 
Med."},{"key":"518_CR39","doi-asserted-by":"publisher","unstructured":"Abedi, V. et al. Increasing the density of laboratory measures for machine learning applications. J. Clin. Med. 10, https:\/\/doi.org\/10.3390\/jcm10010103 (2020).","DOI":"10.3390\/jcm10010103"},{"key":"518_CR40","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1093\/biomet\/63.3.581","volume":"63","author":"DB Rubin","year":"1976","unstructured":"Rubin, D. B. Inference with missing data. Biometrika 63, 11 (1976).","journal-title":"Biometrika"},{"key":"518_CR41","doi-asserted-by":"publisher","first-page":"67","DOI":"10.18637\/jss.v045.i03","volume":"45","author":"S Van Buuren","year":"2011","unstructured":"Van Buuren, S. & Groothuis-Oudshoorn, K. mice: multivariate imputation by chained equations in R. J. Stat. Softw. 45, 67 (2011).","journal-title":"J. Stat. Softw."},{"key":"518_CR42","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1198\/106186002760180608","volume":"11","author":"JL Schafer","year":"2002","unstructured":"Schafer, J. L. & Yucel, R. M. Computational strategies for multivariate linear mixed-effects models with missing values. J. Computational Graph. Stat. 11, 21 (2002).","journal-title":"J. Computational Graph. Stat."},{"key":"518_CR43","doi-asserted-by":"publisher","unstructured":"Kasim, R. M. & Raudenbush, S. W. Application of Gibbs sampling to nested variance components models with heterogeneous within-group variance. J. Educ. Behav. Stat. 23, https:\/\/doi.org\/10.2307\/1165316 (1998).","DOI":"10.2307\/1165316"},{"key":"518_CR44","doi-asserted-by":"publisher","unstructured":"Abedi, V. et al. Predicting short and long-term mortality after acute ischemic stroke using EHR. J. Neurol. Sci. 
427, https:\/\/doi.org\/10.1016\/j.jns.2021.117560 (2021).","DOI":"10.1016\/j.jns.2021.117560"},{"key":"518_CR45","doi-asserted-by":"publisher","first-page":"2735","DOI":"10.1161\/CIRCULATIONAHA.105.169404","volume":"112","author":"SM Grundy","year":"2005","unstructured":"Grundy, S. M. et al. Diagnosis and management of the metabolic syndrome: an American Heart Association\/National Heart, Lung, and Blood Institute Scientific Statement. Circulation 112, 2735\u20132752 (2005).","journal-title":"Circulation"},{"key":"518_CR46","doi-asserted-by":"publisher","first-page":"3007","DOI":"10.1002\/sim.6926","volume":"35","author":"I Bondarenko","year":"2016","unstructured":"Bondarenko, I. & Raghunathan, T. Graphical and numerical diagnostic tools to assess suitability of multiple imputations and imputation models. Stat. Med. 35, 3007\u20133020 (2016).","journal-title":"Stat. Med."}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-021-00518-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-021-00518-0","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-021-00518-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,9]],"date-time":"2024-09-09T20:04:33Z","timestamp":1725912273000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-021-00518-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,11]]},"references-count":46,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,12]]}},"alternative-id":["518"],"URL":"https:\/\/doi.org\/10.1038\/s41746-021-00518-0","relation":{},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic
"}],"subject":[],"published":{"date-parts":[[2021,10,11]]},"assertion":[{"value":"1 June 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 September 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 October 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"147"}}