{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T01:05:51Z","timestamp":1774573551211,"version":"3.50.1"},"reference-count":29,"publisher":"Springer Science and Business Media LLC","issue":"8","license":[{"start":{"date-parts":[[2021,3,6]],"date-time":"2021-03-06T00:00:00Z","timestamp":1614988800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,3,6]],"date-time":"2021-03-06T00:00:00Z","timestamp":1614988800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Artif Intell Rev"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Longitudinal datasets of human ageing studies usually have a high volume of missing data, and one way to handle missing values in a dataset is to replace them with estimations. However, there are many methods to estimate missing values, and no single method is the best for all datasets. In this article, we propose a data-driven missing value imputation approach that performs a feature-wise selection of the best imputation method, using known information in the dataset to rank the five methods we selected, based on their estimation error rates. We evaluated the proposed approach in two sets of experiments: a classifier-independent scenario, where we compared the applicabilities and error rates of each imputation method; and a classifier-dependent scenario, where we compared the predictive accuracy of Random Forest classifiers generated with datasets prepared using each imputation method and a baseline approach of doing no imputation (letting the classification algorithm handle the missing values internally). Based on our results from both sets of experiments, we concluded that the proposed data-driven missing value imputation approach generally resulted in models with more accurate estimations for missing data and better performing classifiers, in longitudinal datasets of human ageing. We also observed that imputation methods devised specifically for longitudinal data had very accurate estimations. This reinforces the idea that using the temporal information intrinsic to longitudinal data is a worthwhile endeavour for machine learning applications, and that can be achieved through the proposed data-driven approach.<\/jats:p>","DOI":"10.1007\/s10462-021-09963-5","type":"journal-article","created":{"date-parts":[[2021,3,6]],"date-time":"2021-03-06T14:02:45Z","timestamp":1615039365000},"page":"6277-6307","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":32,"title":["A data-driven missing value imputation approach for longitudinal datasets"],"prefix":"10.1007","volume":"54","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8125-8059","authenticated-orcid":false,"given":"Caio","family":"Ribeiro","sequence":"first","affiliation":[]},{"given":"Alex A.","family":"Freitas","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,3,6]]},"reference":[{"issue":"4","key":"9963_CR1","doi-asserted-by":"publisher","first-page":"349","DOI":"10.1016\/0010-4809(88)90050-X","volume":"21","author":"KM Albridge","year":"1988","unstructured":"Albridge KM, Standish J, Fries JF (1988) Hierarchical time-oriented approaches to missing data inference. Computers and Biomedical Research 21(4):349\u2013366","journal-title":"Computers and Biomedical Research"},{"key":"9963_CR2","unstructured":"Banks J, Breeze E, Lessof C, Nazroo J (2016) The dynamics of ageing: Evidence from the English Longitudinal Study of Ageing 2002\u201315 (Wave 7). Institute for Fiscal Studies, London. http:\/\/www.elsa-project.ac.uk\/publicationDetails\/id\/8696"},{"key":"9963_CR3","unstructured":"Banks J, Batty G, Coughlin K, Deepchand K, Marmot M, Nazroo J, Oldfield Z, Steel N, Steptoe MA, Wood, Zaninotto P (2019) English longitudinal study of ageing: Waves 0\u20138, 1998\u20132017.[data collection]"},{"issue":"1","key":"9963_CR4","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1186\/s12874-016-0188-1","volume":"16","author":"M Belger","year":"2016","unstructured":"Belger M, Haro J, Reed C, Happich M, Kahle-Wrobleski K, Argimon J, Bruno G, Dodel R, Jones R, Vellas B et al (2016) How to deal with missing longitudinal data in cost of illness analysis in alzheimer\u2019s disease\u2013suggestions from the geras observational study. BMC Medical Research Methodology 16(1):83","journal-title":"BMC Medical Research Methodology"},{"key":"9963_CR5","doi-asserted-by":"crossref","unstructured":"Beyer K, Goldstein J, Ramakrishnan R, Shaft U (1999) When is \u201cnearest neighbor\u201d meaningful?. In: International conference on database theory. Springer, pp 217\u2013235","DOI":"10.1007\/3-540-49257-7_15"},{"issue":"1","key":"9963_CR6","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L (2001) Random forests. Machine learning 45(1):5\u201332","journal-title":"Machine learning"},{"issue":"1\u201312","key":"9963_CR7","first-page":"24","volume":"110","author":"C Chen","year":"2004","unstructured":"Chen C, Liaw A, Breiman L et al (2004) Using random forest to learn imbalanced data. University of California, Berkeley 110(1\u201312):24","journal-title":"University of California, Berkeley"},{"key":"9963_CR8","doi-asserted-by":"crossref","DOI":"10.1093\/oso\/9780198524847.001.0001","volume-title":"Analysis of longitudinal data","author":"P Diggle","year":"2002","unstructured":"Diggle P (2002) Analysis of longitudinal data. Oxford University Press"},{"issue":"10","key":"9963_CR9","doi-asserted-by":"publisher","first-page":"968","DOI":"10.1016\/S0895-4356(03)00170-7","volume":"56","author":"JM Engels","year":"2003","unstructured":"Engels JM, Diehr P (2003) Imputation of missing longitudinal data: a comparison of methods. Journal of clinical epidemiology 56(10):968\u2013976","journal-title":"Journal of clinical epidemiology"},{"issue":"1","key":"9963_CR10","first-page":"3133","volume":"15","author":"M Fern\u00e1ndez-Delgado","year":"2014","unstructured":"Fern\u00e1ndez-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? The journal of machine learning research 15(1):3133\u20133181","journal-title":"The journal of machine learning research"},{"issue":"1","key":"9963_CR11","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1214\/aoms\/1177731944","volume":"11","author":"M Friedman","year":"1940","unstructured":"Friedman M (1940) A comparison of alternative tests of significance for the problem of m rankings. The Annals of Mathematical Statistics 11(1):86\u201392","journal-title":"The Annals of Mathematical Statistics"},{"issue":"4","key":"9963_CR12","doi-asserted-by":"publisher","first-page":"72","DOI":"10.11648\/j.ijsd.20170304.13","volume":"3","author":"AM Gad","year":"2017","unstructured":"Gad AM, Abdelkhalek RHM (2017) Imputation methods for longitudinal data: A comparative study. International Journal of Statistical Distributions and Applications 3(4):72","journal-title":"International Journal of Statistical Distributions and Applications"},{"key":"9963_CR13","volume-title":"Introduction to modern nonparametric statistics","author":"JJ Higgins","year":"2004","unstructured":"Higgins JJ (2004) Introduction to modern nonparametric statistics, 1st edn. Brooks\/Cole, Pacific Grove, CA","edition":"1"},{"key":"9963_CR14","unstructured":"Holm S (1979) A simple sequentially rejective multiple test procedure. Scandinavian journal of statistics, pp 65\u201370"},{"key":"9963_CR15","doi-asserted-by":"publisher","first-page":"112","DOI":"10.1016\/j.jbi.2017.03.009","volume":"68","author":"Z Hu","year":"2017","unstructured":"Hu Z, Melton GB, Arsoniadis EG, Wang Y, Kwaan MR, Simon GJ (2017) Strategies for handling missing clinical data for automated surgical site infection detection from the electronic health record. Journal of Biomedical Informatics 68:112\u2013120","journal-title":"Journal of Biomedical Informatics"},{"key":"9963_CR16","doi-asserted-by":"crossref","unstructured":"Kouiroukidis N, Evangelidis G (2011) The effects of dimensionality curse in high dimensional knn search. In: 2011 15th Panhellenic Conference on Informatics. IEEE, pp 41\u201345","DOI":"10.1109\/PCI.2011.45"},{"key":"9963_CR17","volume-title":"Statistical analysis with missing data","author":"RJ Little","year":"2019","unstructured":"Little RJ, Rubin DB (2019) Statistical analysis with missing data, vol 793. John Wiley & Sons"},{"key":"9963_CR18","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1016\/j.ins.2013.07.007","volume":"250","author":"V L\u00f3pez","year":"2013","unstructured":"L\u00f3pez V, Fern\u00e1ndez A, Garc\u00eda S, Palade V, Herrera F (2013) An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Information Sciences 250:113\u2013141","journal-title":"Information Sciences"},{"key":"9963_CR19","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511975820","volume-title":"Statistical learning for biomedical data","author":"JD Malley","year":"2011","unstructured":"Malley JD, Malley KG, Pajevic S (2011) Statistical learning for biomedical data. Cambridge University Press"},{"key":"9963_CR20","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781139381666","volume-title":"Preventing and treating missing data in longitudinal clinical trials: a practical guide","author":"CH Mallinckrodt","year":"2013","unstructured":"Mallinckrodt CH (2013) Preventing and treating missing data in longitudinal clinical trials: a practical guide. Cambridge University Press"},{"key":"9963_CR21","doi-asserted-by":"crossref","unstructured":"Minhas S, Khanum A, Riaz F, Alvi A, Khan SA, Initiative ADN, et al. (2015) Early alzheimer\u2019s disease prediction in machine learning setup: Empirical analysis with missing value computation. In: International Conference on Intelligent Data Engineering and Automated Learning. Springer, pp 424\u2013432","DOI":"10.1007\/978-3-319-24834-9_49"},{"key":"9963_CR22","doi-asserted-by":"crossref","unstructured":"Pomsuwan T, Freitas AA (2017) Feature selection for the classification of longitudinal human ageing data. In: IEEE International Conference on Data Mining Workshops (ICDMW). IEEE, pp 739\u2013746","DOI":"10.1109\/ICDMW.2017.102"},{"key":"9963_CR23","volume-title":"C4.5: Programs for Machine Learning","author":"JR Quinlan","year":"1993","unstructured":"Quinlan JR (1993) C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (ISBN 1-55860-238-0)"},{"key":"9963_CR24","unstructured":"Ribeiro C, Freitas AA (2019) Comparing the effectiveness of six missing value imputation methods for longitudinal classification datasets. In: 3rd Workshop on AI for Aging, Rehabilitation and Independent Assisted Living (ARIAL), held as part of IJCAI-2019"},{"key":"9963_CR25","doi-asserted-by":"publisher","first-page":"285","DOI":"10.1007\/978-3-319-59758-4_33","volume-title":"Artificial Intelligence in Medicine","author":"MS Santos","year":"2017","unstructured":"Santos MS, Soares JP, Henriques Abreu P, Ara\u00fajo H, Santos J (2017) Influence of data distribution in missing data imputation. In: ten Teije A, Popow C, Holmes JH, Sacchi L (eds) Artificial Intelligence in Medicine. Springer International Publishing, Cham, pp 285\u2013294 (ISBN 978-3-319-59758-4)"},{"key":"9963_CR26","doi-asserted-by":"publisher","first-page":"315","DOI":"10.1613\/jair.1199","volume":"19","author":"GM Weiss","year":"2003","unstructured":"Weiss GM, Provost F (2003) Learning when training data are costly: The effect of class distribution on tree induction. Journal of artificial intelligence research 19:315\u2013354","journal-title":"Journal of artificial intelligence research"},{"key":"9963_CR27","doi-asserted-by":"publisher","first-page":"128","DOI":"10.1016\/j.eswa.2017.04.003","volume":"82","author":"C Zhang","year":"2017","unstructured":"Zhang C, Liu C, Zhang X, Almpanidis G (2017) An up-to-date comparison of state-of-the-art classification algorithms. Expert Systems with Applications 82:128\u2013150","journal-title":"Expert Systems with Applications"},{"key":"9963_CR28","doi-asserted-by":"publisher","unstructured":"Zhao J, Feng Q, Wu P, Lupu R, Wilke RA, Wells QS, Denny J, Wei W-Q (2018) Learning from longitudinal data in electronic health record and genetic data to improve cardiovascular event prediction. bioRxiv. https:\/\/doi.org\/10.1101\/366682. URL https:\/\/www.biorxiv.org\/content\/early\/2018\/07\/11\/366682","DOI":"10.1101\/366682"},{"issue":"11","key":"9963_CR29","doi-asserted-by":"publisher","first-page":"933","DOI":"10.4236\/ojs.2014.411088","volume":"4","author":"X Zhu","year":"2014","unstructured":"Zhu X (2014) Comparison of four methods for handing missing data in longitudinal data analysis through a simulation study. Open Journal of Statistics 4(11):933","journal-title":"Open Journal of Statistics"}],"container-title":["Artificial Intelligence Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-021-09963-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10462-021-09963-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-021-09963-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,25]],"date-time":"2024-08-25T14:34:20Z","timestamp":1724596460000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10462-021-09963-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3,6]]},"references-count":29,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["9963"],"URL":"https:\/\/doi.org\/10.1007\/s10462-021-09963-5","relation":{},"ISSN":["0269-2821","1573-7462"],"issn-type":[{"value":"0269-2821","type":"print"},{"value":"1573-7462","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,3,6]]},"assertion":[{"value":"21 January 2021","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 March 2021","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}