{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,3]],"date-time":"2026-05-03T05:43:04Z","timestamp":1777786984578,"version":"3.51.4"},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,6,12]],"date-time":"2020-06-12T00:00:00Z","timestamp":1591920000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,6,12]],"date-time":"2020-06-12T00:00:00Z","timestamp":1591920000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Ministry of Posts, Telecommunications and Information Technology, Bangladesh","award":["56.00.0000.028.33.025.14-154"],"award-info":[{"award-number":["56.00.0000.028.33.025.14-154"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In data analytics, missing data is a factor that degrades performance. Incorrect imputation of missing values could lead to a wrong prediction. In this era of big data, when a massive volume of data is generated in every second, and utilization of these data is a major concern to the stakeholders, efficiently handling missing values becomes more important. In this paper, we have proposed a new technique for missing data imputation, which is a hybrid approach of single and multiple imputation techniques. We have proposed an extension of popular <jats:italic>Multivariate Imputation by Chained Equation (MICE)<\/jats:italic> algorithm in two variations to impute categorical and numeric data. We have also implemented twelve existing algorithms to impute binary, ordinal, and numeric missing values. We have collected sixty-five thousand real health records from different hospitals and diagnostic centers of Bangladesh, maintaining the privacy of data. We have also collected three public datasets from the UCI Machine Learning Repository, ETH Zurich, and Kaggle. We have compared the performance of our proposed algorithms with existing algorithms using these datasets. Experimental results show that our proposed algorithm achieves 20% higher F-measure for binary data imputation and 11% less error for numeric data imputations than its competitors with similar execution time.<\/jats:p>","DOI":"10.1186\/s40537-020-00313-w","type":"journal-article","created":{"date-parts":[[2020,6,12]],"date-time":"2020-06-12T11:02:39Z","timestamp":1591959759000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":228,"title":["SICE: an improved missing data imputation technique"],"prefix":"10.1186","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8740-2744","authenticated-orcid":false,"given":"Shahidul Islam","family":"Khan","sequence":"first","affiliation":[]},{"given":"Abu Sayed Md Latiful","family":"Hoque","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,6,12]]},"reference":[{"issue":"1","key":"313_CR1","doi-asserted-by":"publisher","first-page":"3","DOI":"10.23876\/j.krcp.2017.36.1.3","volume":"36","author":"Choong Ho Lee","year":"2017","unstructured":"Lee Choong Ho, Yoon Hyung-Jin. Medical big data: promise and challenges. Kidney Res Clin Pract. 2017;36(1):3.","journal-title":"Kidney Res Clin Pract"},{"issue":"1","key":"313_CR2","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1186\/s40537-015-0030-3","volume":"2","author":"Chun-Wei Tsai","year":"2015","unstructured":"Tsai Chun-Wei, Lai Chin-Feng, Chao Han-Chieh, Vasilakos Athanasios V. Big data analytics: a survey. J Big Data. 2015;2(1):21.","journal-title":"J Big Data"},{"issue":"8","key":"313_CR3","doi-asserted-by":"publisher","first-page":"611","DOI":"10.1108\/02635570310497657","volume":"103","author":"ML Brown","year":"2003","unstructured":"Brown ML, Kros JF. Data mining and the impact of missing data. Ind Manag Data Syst. 2003;103(8):611\u201321.","journal-title":"Ind Manag Data Syst"},{"issue":"2","key":"313_CR4","doi-asserted-by":"publisher","first-page":"293","DOI":"10.1093\/nsr\/nwt032","volume":"1","author":"Jianqing Fan","year":"2014","unstructured":"Fan Jianqing, Han Fang, Liu Han. Challenges of big data analysis. National Sci Rev. 2014;1(2):293\u2013314.","journal-title":"National Sci Rev"},{"issue":"4","key":"313_CR5","first-page":"3","volume":"23","author":"Erhard Rahm","year":"2000","unstructured":"Rahm Erhard, Do Hong Hai. Data cleaning: problems and current approaches. IEEE Data Eng Bull. 2000;23(4):3\u201313.","journal-title":"IEEE Data Eng Bull"},{"issue":"5\u20136","key":"313_CR6","doi-asserted-by":"publisher","first-page":"375","DOI":"10.1080\/713827180","volume":"17","author":"Shichao Zhang","year":"2003","unstructured":"Zhang Shichao, Zhang Chengqi, Yang Qiang. Data preparation for data mining. Appl Artif Intell. 2003;17(5\u20136):375\u201381.","journal-title":"Appl Artif Intell"},{"key":"313_CR7","doi-asserted-by":"crossref","unstructured":"Graham John\u00a0W, Cumsille Patricio\u00a0E, Shevock Allison\u00a0E. Methods for handling missing data. Handbook of Psychology, Second Edition, 2, 2012.","DOI":"10.1002\/9781118133880.hop202004"},{"issue":"5","key":"313_CR8","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1109\/MSP.2014.2327238","volume":"31","author":"Konstantinos Slavakis","year":"2014","unstructured":"Slavakis Konstantinos, Giannakis Georgios B, Mateos Gonzalo. Modeling and optimization for big data analytics:(statistical) learning tools for our era of data deluge. IEEE Signal Process Mag. 2014;31(5):18\u201331.","journal-title":"IEEE Signal Process Mag"},{"key":"313_CR9","doi-asserted-by":"publisher","DOI":"10.3978\/j.issn.2305-5839.2015.12.11","author":"Z Zhang","year":"2015","unstructured":"Zhang Z. Missing values in big data research: some basic skills. Ann Transl Med. 2015;. https:\/\/doi.org\/10.3978\/j.issn.2305-5839.2015.12.11.","journal-title":"Ann Transl Med."},{"issue":"4","key":"313_CR10","doi-asserted-by":"publisher","first-page":"407","DOI":"10.4097\/kjae.2017.70.4.407","volume":"70","author":"Sang Kyu Kwak","year":"2017","unstructured":"Kwak Sang Kyu, Kim Jong Hae. Statistical data preparation: management of missing values and outliers. Korean J Anesthesiol. 2017;70(4):407.","journal-title":"Korean J Anesthesiol"},{"key":"313_CR11","volume-title":"Survey sampling theory and applications","author":"Raghunath Arnab","year":"2017","unstructured":"Arnab Raghunath. Survey sampling theory and applications. Cambridge: Academic Press; 2017."},{"issue":"1","key":"313_CR12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1111\/j.2044-8317.2005.tb00312.x","volume":"58","author":"Rebecca Holman","year":"2005","unstructured":"Holman Rebecca, Glas Cees AW. Modelling non-ignorable missing-data mechanisms with item response theory models. Br J Math Stat Psychol. 2005;58(1):1\u201317.","journal-title":"Br J Math Stat Psychol"},{"key":"313_CR13","doi-asserted-by":"crossref","unstructured":"Grzymala-Busse Jerzy\u00a0W, Grzymala-Busse Witold\u00a0J. Handling missing attribute values. In: Data mining and knowledge discovery handbook. Berlin: Springer. 2009. p. 33\u201351. 2009.","DOI":"10.1007\/978-0-387-09823-4_3"},{"key":"313_CR14","unstructured":"Orczyk T, Porwik P. Influence of missing data imputation method on the classification accuracy of the medical data. J Med Inform Technol. 2013;22."},{"key":"313_CR15","doi-asserted-by":"crossref","unstructured":"Rahman M\u00a0Mostafizur, Davis Darryl\u00a0N. Machine learning-based missing value imputation method for clinical datasets. In IAENG transactions on engineering technologies. Springer, 2013. p. 245\u2013257 .","DOI":"10.1007\/978-94-007-6190-2_19"},{"issue":"3","key":"313_CR16","doi-asserted-by":"publisher","first-page":"293","DOI":"10.1007\/s10618-009-0131-8","volume":"19","author":"Jens H\u00fchn","year":"2009","unstructured":"H\u00fchn Jens, H\u00fcllermeier Eyke. Furia: an algorithm for unordered fuzzy rule induction. Data Min Knowl Discov. 2009;19(3):293\u2013319.","journal-title":"Data Min Knowl Discov"},{"issue":"1","key":"313_CR17","first-page":"1","volume":"6","author":"Peter Schmitt","year":"2015","unstructured":"Schmitt Peter, Mandel Jonas, Guedj Mickael. A comparison of six methods for missing data imputation. J Biometrics Biostat. 2015;6(1):1.","journal-title":"J Biometrics Biostat"},{"key":"313_CR18","doi-asserted-by":"publisher","first-page":"152","DOI":"10.1016\/j.neucom.2016.04.015","volume":"205","author":"Mehran Amiri","year":"2016","unstructured":"Amiri Mehran, Jensen Richard. Missing data imputation using fuzzy-rough methods. Neurocomputing. 2016;205:152\u201364.","journal-title":"Neurocomputing"},{"key":"313_CR19","doi-asserted-by":"crossref","unstructured":"Triguero Isaac, Gonz\u00e1lez Sergio, Moyano Jose\u00a0M, Garc\u00eda\u00a0L\u00f3pez Salvador, Fern\u00e1ndez Jes\u00fas Alcal\u00e1, Mart\u00edn Juli\u00e1n Luengo, Hilario Alberto Fern\u00e1ndez, D\u00edaz Jes\u00fas, S\u00e1nchez Luciano, Herrera\u00a0Triguero Francisco, et\u00a0al. Keel 3.0: an open source software for multi-stage analysis in data mining. 2017.","DOI":"10.2991\/ijcis.10.1.82"},{"issue":"10","key":"313_CR20","doi-asserted-by":"publisher","first-page":"913","DOI":"10.1080\/08839514.2019.1637138","volume":"33","author":"Anil Jadhav","year":"2019","unstructured":"Jadhav Anil, Pramod Dhanya, Ramanathan Krishnan. Comparison of performance of data imputation methods for numeric dataset. Appl Artif Intell. 2019;33(10):913\u201333.","journal-title":"Appl Artif Intell"},{"key":"313_CR21","doi-asserted-by":"crossref","unstructured":"Honghai Feng, Guoshun Chen, Cheng Yin, Bingru Yang, Yumei Chen. A svm regression based approach to filling in missing values. In: International Conference on Knowledge-Based and Intelligent Information and Engineering Systems. Springer. 2005. p. 581\u2013587.","DOI":"10.1007\/11553939_83"},{"issue":"5\u20136","key":"313_CR22","doi-asserted-by":"publisher","first-page":"684","DOI":"10.1016\/j.neunet.2005.06.025","volume":"18","author":"Kristiaan Pelckmans","year":"2005","unstructured":"Pelckmans Kristiaan, De Brabanter Jos, Suykens Johan AK, De Moor Bart. Handling missing values in support vector machine classifiers. Neural Netw. 2005;18(5\u20136):684\u201392.","journal-title":"Neural Netw"},{"key":"313_CR23","doi-asserted-by":"crossref","unstructured":"Boser Bernhard\u00a0E, Guyon Isabelle\u00a0M, Vapnik Vladimir\u00a0N. A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on Computational learning theory, 1992. p. 144\u2013152.","DOI":"10.1145\/130385.130401"},{"key":"313_CR24","doi-asserted-by":"crossref","unstructured":"Buuren S\u00a0van, Groothuis-Oudshoorn Karin. mice: Multivariate imputation by chained equations in r. J Statist Softw. 2010; 1\u201368.","DOI":"10.18637\/jss.v045.i03"},{"issue":"1","key":"313_CR25","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1111\/stan.12023","volume":"68","author":"Gerko Vink","year":"2014","unstructured":"Vink Gerko, Frank Laurence E, Pannekoek Jeroen, Van Buuren Stef. Predictive mean matching imputation of semicontinuous variables. Statistica Neerlandica. 2014;68(1):61\u201390.","journal-title":"Statistica Neerlandica"},{"key":"313_CR26","unstructured":"Wright Raymond\u00a0E. Logistic regression. 1995."},{"issue":"1","key":"313_CR27","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1007\/BF00048682","volume":"44","author":"Dankmar B\u00f6hning","year":"1992","unstructured":"B\u00f6hning Dankmar. Multinomial logistic regression algorithm. Ann Inst Stat Math. 1992;44(1):197\u2013200.","journal-title":"Ann Inst Stat Math"},{"key":"313_CR28","first-page":"1","volume":"18","author":"Suresh Balakrishnama","year":"1998","unstructured":"Balakrishnama Suresh, Ganapathiraju Aravind. Linear discriminant analysis-a brief tutorial. Inst Signal Inform Process. 1998;18:1\u20138.","journal-title":"Inst Signal Inform Process"},{"issue":"10","key":"313_CR29","first-page":"1137","volume":"67","author":"Rick L Lawrence","year":"2001","unstructured":"Lawrence Rick L, Wright Andrea. Rule-based classification systems using classification and regression tree (cart) analysis. Photogramm Eng Remote Sens. 2001;67(10):1137\u201342.","journal-title":"Photogramm Eng Remote Sens"},{"issue":"5","key":"313_CR30","doi-asserted-by":"publisher","first-page":"1986","DOI":"10.1214\/15-AOS1334","volume":"43","author":"Isma\u00ebl Castillo","year":"2015","unstructured":"Castillo Isma\u00ebl, Schmidt-Hieber Johannes, Van der Vaart Aad, et al. Bayesian linear regression with sparse priors. Ann Stat. 2015;43(5):1986\u20132018.","journal-title":"Ann Stat"},{"issue":"7","key":"313_CR31","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v045.i07","volume":"45","author":"James Honaker","year":"2011","unstructured":"Honaker James, King Gary, Blackwell Matthew, et al. Amelia ii: a program for missing data. J Stat Softw. 2011;45(7):1\u201347.","journal-title":"J Stat Softw"},{"issue":"10","key":"313_CR32","doi-asserted-by":"publisher","first-page":"2415","DOI":"10.1093\/ndt\/gft221","volume":"28","author":"Moniek CM de Goeij","year":"2013","unstructured":"de Goeij Moniek CM, van Diepen Merel, Jager Kitty J, Tripepi Giovanni, Zoccali Carmine, Dekker Friedo W. Multiple imputation: dealing with missing data. Nephrol Dial Transplant. 2013;28(10):2415\u201320.","journal-title":"Nephrol Dial Transplant"},{"key":"313_CR33","doi-asserted-by":"publisher","first-page":"63","DOI":"10.1016\/j.jclinepi.2019.02.016","volume":"110","author":"Paul Madley-Dowd","year":"2019","unstructured":"Madley-Dowd Paul, Hughes Rachael, Tilling Kate, Heron Jon. The proportion of missing data should not be used to guide decisions on multiple imputation. J Clin Epidemiol. 2019;110:63\u201373.","journal-title":"J Clin Epidemiol"},{"key":"313_CR34","unstructured":"Hair and Eye Color of Statistics Students. 1997. https:\/\/stat.ethz.ch\/R-manual\/R-devel\/library\/datasets\/html\/HairEyeColor.html. Accessed 11 Dec 2019."},{"key":"313_CR35","unstructured":"Car Evaluation Data Set. 1997. http:\/\/archive.ics.uci.edu\/ml\/datasets\/Car+Evaluation. Accessed 4 Feb 2020."},{"key":"313_CR36","unstructured":"KC House Data. 1997. https:\/\/www.kaggle.com\/shivachandel\/kc-house-data. Accessed 7 March 2020."},{"key":"313_CR37","doi-asserted-by":"crossref","unstructured":"Brodersen Kay\u00a0Henning, Ong Cheng\u00a0Soon, Stephan Klaas\u00a0Enno, Buhmann Joachim\u00a0M. The balanced accuracy and its posterior distribution. In 2010 20th International Conference on Pattern Recognition. IEEE. 2010. p. 3121\u20133124.","DOI":"10.1109\/ICPR.2010.764"},{"key":"313_CR38","unstructured":"Kuhn M. A short introduction to the caret package. R Found Stat Comput. 2015: 1."},{"key":"313_CR39","unstructured":"National statistical office Canada, age categories, life cycle groupings. https:\/\/www.statcan.gc.ca\/eng\/concepts\/definitions\/ag%e. 2017. Accessed 14 Oct 2019."}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-020-00313-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-020-00313-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-020-00313-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,6,11]],"date-time":"2021-06-11T23:13:42Z","timestamp":1623453222000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-020-00313-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,12]]},"references-count":39,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["313"],"URL":"https:\/\/doi.org\/10.1186\/s40537-020-00313-w","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,6,12]]},"assertion":[{"value":"8 February 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 May 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 June 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors do not have any competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"37"}}