{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,28]],"date-time":"2026-02-28T04:23:18Z","timestamp":1772252598186,"version":"3.50.1"},"reference-count":21,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2017,1,25]],"date-time":"2017-01-25T00:00:00Z","timestamp":1485302400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000092","name":"U.S. National Library of Medicine","doi-asserted-by":"publisher","award":["T15LM007059"],"award-info":[{"award-number":["T15LM007059"]}],"id":[{"id":"10.13039\/100000092","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000057","name":"National Institute of General Medical Sciences","doi-asserted-by":"publisher","award":["R01GM100387"],"award-info":[{"award-number":["R01GM100387"]}],"id":[{"id":"10.13039\/100000057","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>Many clinical research datasets have a large percentage of missing values that directly impacts their usefulness in yielding high accuracy classifiers when used for training in supervised machine learning. While missing value imputation methods have been shown to work well with smaller percentages of missing values, their ability to impute sparse clinical research data can be problem specific. We previously attempted to learn quantitative guidelines for ordering cardiac magnetic resonance imaging during the evaluation for pediatric cardiomyopathy, but missing data significantly reduced our usable sample size. In this work, we sought to determine if increasing the usable sample size through imputation would allow us to learn better guidelines. We first review several machine learning methods for estimating missing data. Then, we apply four popular methods (mean imputation, decision tree, k-nearest neighbors, and self-organizing maps) to a clinical research dataset of pediatric patients undergoing evaluation for cardiomyopathy. Using Bayesian Rule Learning (BRL) to learn ruleset models, we compared the performance of imputation-augmented models versus unaugmented models. We found that all four imputation-augmented models performed similarly to unaugmented models. While imputation did not improve performance, it did provide evidence for the robustness of our learned models.<\/jats:p>","DOI":"10.3390\/data2010008","type":"journal-article","created":{"date-parts":[[2017,1,25]],"date-time":"2017-01-25T09:50:44Z","timestamp":1485337844000},"page":"8","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":58,"title":["An Overview and Evaluation of Recent Machine Learning Imputation Methods Using Cardiac Imaging Data"],"prefix":"10.3390","volume":"2","author":[{"given":"Yuzhe","family":"Liu","sequence":"first","affiliation":[{"name":"Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15260, USA"},{"name":"Medical Scientist Training Program, University of Pittsburgh, Pittsburgh, PA 15260, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vanathi","family":"Gopalakrishnan","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15260, USA"},{"name":"Medical Scientist Training Program, University of Pittsburgh, Pittsburgh, PA 15260, USA"},{"name":"Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15260, USA"},{"name":"Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15260, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2017,1,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1016\/j.jsp.2009.10.001","article-title":"An introduction to modern missing data analyses","volume":"48","author":"Baraldi","year":"2010","journal-title":"J. Sch. Psychol."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"940","DOI":"10.1001\/jama.2015.10516","article-title":"Missing Data: How to Best Account for What Is Not Known","volume":"314","author":"Newgard","year":"2015","journal-title":"JAMA"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Liu, Y., Gopalakrishnan, V., and Madan, S. (2015, January 12). Quantitative clinical guidelines for imaging use in evaluation of pediatric cardiomyopathy. Proceedings of the 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Washington, DC, USA.","DOI":"10.1109\/BIBM.2015.7359910"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1161\/CIRCIMAGING.108.840975","article-title":"The prognostic implications of cardiovascular magnetic resonance","volume":"2","author":"Flett","year":"2009","journal-title":"Circ. Cardiovasc. Imaging"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"668","DOI":"10.1093\/bioinformatics\/btq005","article-title":"Bayesian rule learning for biomedical data mining","volume":"26","author":"Gopalakrishnan","year":"2010","journal-title":"Bioinformatics"},{"key":"ref_6","unstructured":"Little, R.J.A., and Rubin, D.B. (2014). Statistical Analysis with Missing Data, John Wiley & Sons."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Gelman, A., and Hill, J. (2006). Data Analysis Using Regression and Multilevel\/Hierarchical Models, Cambridge University Press.","DOI":"10.1017\/CBO9780511790942"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"371","DOI":"10.1111\/j.1467-9868.2007.00640.x","article-title":"Every missingness not at random model has a missingness at random counterpart with equal fit","volume":"70","author":"Molenberghs","year":"2008","journal-title":"J. R. Stat. Soc. Ser. B (Stat. Methodol.)"},{"key":"ref_9","first-page":"263","article-title":"Pattern classification with missing data: A review","volume":"19","year":"2009","journal-title":"Neural Comput. Appl."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1464","DOI":"10.1109\/5.58325","article-title":"The self-organizing map","volume":"78","author":"Kohonen","year":"1990","journal-title":"Proc. IEEE"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"300","DOI":"10.1007\/s005210200002","article-title":"Self-organising map for data imputation and correction in surveys","volume":"10","author":"Fessant","year":"2002","journal-title":"Neural Comput. Appl."},{"key":"ref_12","unstructured":"Quinlan, J.R. (1993). C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers Inc."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1007\/BF00994110","article-title":"A Bayesian method for the induction of probabilistic networks from data","volume":"9","author":"Cooper","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1016\/j.envsoft.2012.03.012","article-title":"Good practice in Bayesian network modelling","volume":"37","author":"Chen","year":"2012","journal-title":"Environ. Model. Softw."},{"key":"ref_15","unstructured":"John, G.H., and Langley, P. (1995, January 18\u201320). Estimating Continuous Distributions in Bayesian Classifiers. Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, Montr\u00e9al, QC, Canada."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1186\/1477-7525-6-57","article-title":"Simple imputation methods were inadequate for missing not at random (MNAR) quality of life data","volume":"6","author":"Fielding","year":"2008","journal-title":"Health Qual. Life Outcomes"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1177\/1740774514566662","article-title":"Missing not at random models for masked clinical trials with dropouts","volume":"12","author":"Kang","year":"2015","journal-title":"Clin. Trials"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Little, R.J., Rubin, D.B., and Zangeneh, S.Z. (2016). Conditions for ignoring the missing-data mechanism in likelihood inferences for parameter Subsets. J. Am. Stat. Assoc.","DOI":"10.1080\/01621459.2015.1136826"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1145\/1656274.1656278","article-title":"The WEKA data mining software","volume":"11","author":"Hall","year":"2009","journal-title":"ACM SIGKDD Explor. Newsl."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1016\/j.neucom.2014.02.061","article-title":"Self-organization and missing values in SOM and GTM","volume":"147","author":"Vatanen","year":"2015","journal-title":"Neurocomputing"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Lustgarten, J.L., Visweswaran, S., Gopalakrishnan, V., and Cooper, G.F. (2011). Application of an efficient Bayesian discretization method to biomedical data. BMC Bioinform., 12.","DOI":"10.1186\/1471-2105-12-309"}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/2\/1\/8\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T18:26:58Z","timestamp":1760207218000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/2\/1\/8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,1,25]]},"references-count":21,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2017,3]]}},"alternative-id":["data2010008"],"URL":"https:\/\/doi.org\/10.3390\/data2010008","relation":{"has-preprint":[{"id-type":"doi","id":"10.20944\/preprints201612.0078.v1","asserted-by":"object"}]},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,1,25]]}}}