{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,16]],"date-time":"2025-12-16T12:52:19Z","timestamp":1765889539838,"version":"build-2065373602"},"reference-count":34,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2025,8,12]],"date-time":"2025-08-12T00:00:00Z","timestamp":1754956800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>Missing data imputation is a critical preprocessing task that directly impacts the quality and reliability of data-driven analyses, yet many existing methods treat numerical and categorical data separately and do not integrate advanced techniques. To overcome these limitations, we propose a novel imputation technique that combines regression imputation using HistGradientBoostingRegressor with fuzzy rule-based systems, enhanced by a tailored clustering process. This integrated approach handles mixed data types and complex data structures by using regression models to predict missing numerical values, fuzzy logic to incorporate expert knowledge and interpretability, and clustering to capture latent data patterns. Categorical variables are handled by mode imputation and label encoding. We evaluated the method on twelve tabular datasets with artificially introduced missingness, employing a comprehensive set of metrics focused on originally missing entries. The results demonstrate that our iterative imputer performs competitively with other established imputation techniques, achieving better or comparable error rates and accuracy. 
By combining statistical learning with fuzzy and clustering frameworks, the method achieves 15% lower Root Mean Square Error (RMSE), 10% lower Mean Absolute Error (MAE), and 80% higher precision on UCI datasets, thus offering a promising advance in data preprocessing for practical applications.<\/jats:p>","DOI":"10.3390\/computers14080325","type":"journal-article","created":{"date-parts":[[2025,8,12]],"date-time":"2025-08-12T16:30:36Z","timestamp":1755016236000},"page":"325","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["An Integrated Intuitionistic Fuzzy-Clustering Approach for Missing Data Imputation"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4505-3224","authenticated-orcid":false,"given":"Charl\u00e8ne B\u00e9atrice","family":"Bridge-Nduwimana","sequence":"first","affiliation":[{"name":"Laboratory for Artificial Intelligence, Data Science and Emerging Systems, Fes National School of Applied Sciences, Sidi Mohamed Ben Abdellah University, Fez 30050, Morocco"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7554-9300","authenticated-orcid":false,"given":"Aziza","family":"El Ouaazizi","sequence":"additional","affiliation":[{"name":"Laboratory for Artificial Intelligence, Data Science and Emerging Systems, Fes National School of Applied Sciences, Sidi Mohamed Ben Abdellah University, Fez 30050, Morocco"},{"name":"Laboratory of Engineering Sciences, Polydisciplinary Faculty of Taza, Sidi Mohamed Ben Abdellah University, Taza 35000, Morocco"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6696-9426","authenticated-orcid":false,"given":"Majid","family":"Benyakhlef","sequence":"additional","affiliation":[{"name":"Laboratory of Engineering Sciences, Polydisciplinary Faculty of Taza, Sidi Mohamed Ben Abdellah University, Taza 35000, 
Morocco"}]}],"member":"1968","published-online":{"date-parts":[[2025,8,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Meng, H. (2025, January 23\u201325). A Comparative Study on Missing Value Imputation Techniques in Machine Learning. Proceedings of the SHS Web of Conferences, Shanghai, China.","DOI":"10.1051\/shsconf\/202521802014"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Huang, J., Mao, B., Bai, Y., Zhang, T., and Miao, C. (2020). An Integrated Fuzzy C-Means Method for Missing Data Imputation Using Taxi GPS Data. Sensors, 20.","DOI":"10.3390\/s20071992"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"2518","DOI":"10.1016\/j.procs.2024.04.237","article-title":"Handling incomplete data using Radial basis Kernelized Intuitionistic Fuzzy C-Means","volume":"235","author":"Sethia","year":"2024","journal-title":"Procedia Comput. Sci."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1972","DOI":"10.1177\/15353702221121602","article-title":"Evaluation methodology for deep learning imputation models","volume":"247","author":"Boursalie","year":"2022","journal-title":"Exp. Biol. Med."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Li, J., Guo, S., Ma, R., He, J., Zhang, X., Rui, D., Ding, Y., Li, Y., Jian, L., and Cheng, J. (2024). Comparison of the effects of imputation methods for missing data in predictive modelling of cohort study datasets. BMC Med. Res. Methodol., 24.","DOI":"10.1186\/s12874-024-02173-x"},{"key":"ref_6","unstructured":"Yoon, J., Jordon, J., and Van der Schaar, M. (2018, January 10\u201315). GAIN: Missing Data Imputation using Generative Adversarial Nets. 
Proceedings of the International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1017\/pan.2020.49","article-title":"The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning","volume":"30","author":"Lall","year":"2022","journal-title":"Political Anal."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.17485\/ijst\/2017\/v10i19\/110646","article-title":"A Comparison of Multiple Imputation Methods for Data with Missing Values","volume":"10","author":"Chhabra","year":"2017","journal-title":"Indian J. Sci. Technol."},{"key":"ref_9","first-page":"4737963","article-title":"A Probabilistic Approach for Missing Data Imputation","volume":"2024","year":"2024","journal-title":"Complexity"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Chen, T., and Guestrin, C. (2016, January 13\u201317). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939785"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Luo, Y. (2022). Evaluating the state of the art in missing data imputation for clinical data. Brief. Bioinform., 23.","DOI":"10.1093\/bib\/bbab489"},{"key":"ref_12","unstructured":"Cauthen, K.R., Lambert, G.J., Ray, J., and Lefantzi, S. (2016). Imputing Data That Are Missing at High Rates Using a Boosting Algorithm, Sandia National Lab. (SNL-NM)."},{"key":"ref_13","unstructured":"Schwerter, J., Gurtskaia, K., Romero, A., Zeyer-Gliozzo, B., and Pauly, M. (2024). Evaluating tree-based imputation methods as an alternative to MICE PMM for drawing inference in empirical studies. arXiv."},{"key":"ref_14","unstructured":"Foge, N., Schwerter, J., Gurtskaia, K., Pauly, M., and Doebler, P. (2024). Adapting tree-based multiple imputation methods for multi-level data? A simulation study. 
arXiv."},{"key":"ref_15","first-page":"278","article-title":"An Intelligent Missing Data Imputation Techniques: A Review","volume":"6","author":"Seu","year":"2022","journal-title":"Int. J. Inf. Vis."},{"key":"ref_16","first-page":"11530","article-title":"What\u2019s a good imputation to predict with missing values?","volume":"34","author":"Morvan","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Lee, D., Kim, J., Moon, W.-J., and Ye, J.C. (2019). CollaGAN: Collaborative GAN for Missing Image Data Imputation. arXiv.","DOI":"10.1109\/CVPR.2019.00259"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1016\/j.neunet.2021.05.033","article-title":"PC-GAIN: Pseudo-label conditional generative adversarial imputation networks for incomplete data","volume":"141","author":"Wang","year":"2021","journal-title":"Neural Netw."},{"key":"ref_19","first-page":"87","article-title":"Missing Data Imputation via Denoising Autoencoders: The Untold Story","volume":"Volume 11191","author":"Duivesteijn","year":"2018","journal-title":"Advances in Intelligent Data Analysis XVII"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., and Bonchi, F. (2023). Leveraging Variational Autoencoders for Multiple Data Imputation. Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer Nature.","DOI":"10.1007\/978-3-031-43421-1"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"152","DOI":"10.1016\/j.neucom.2016.04.015","article-title":"Missing data imputation using fuzzy-rough methods","volume":"205","author":"Amiri","year":"2016","journal-title":"Neurocomputing"},{"key":"ref_22","first-page":"50","article-title":"Fuzzy based Techniques for Handling Missing Values","volume":"12","author":"Farid","year":"2021","journal-title":"Int. J. Adv. Comput. Sci. 
Appl."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"863","DOI":"10.1007\/s00500-011-0774-4","article-title":"Missing Data Imputation for Fuzzy Rule-Based Classification Systems","volume":"16","author":"Herrera","year":"2012","journal-title":"Soft Comput."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"2419","DOI":"10.1007\/s10115-019-01427-1","article-title":"Missing data imputation using decision trees and fuzzy clustering with iterative learning","volume":"62","author":"Nikfalazar","year":"2020","journal-title":"Knowl Inf. Syst."},{"key":"ref_25","first-page":"573","article-title":"Towards Missing Data Imputation: A Study of Fuzzy K-means Clustering Method","volume":"Volume 3066","author":"Tsumoto","year":"2004","journal-title":"Proceedings of the International Conference on Rough Sets and Current Trends in Computing"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"060003","DOI":"10.1063\/1.5139149","article-title":"Missing values imputation based on fuzzy C-Means algorithm for classification of chronic obstructive pulmonary disease (COPD)","volume":"2192","author":"Aristiawati","year":"2019","journal-title":"AIP Conf. Proc."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Rodrigues, A.K.G., Ospina, R., and Ferreira, M.R.P. (2021). Adaptive kernel fuzzy clustering for missing data. PLoS ONE, 16.","DOI":"10.1371\/journal.pone.0259266"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1007\/s10791-025-09639-6","article-title":"An effective imputation approach for handling missing data using intuitionistic fuzzy clustering algorithms","volume":"28","author":"Sethia","year":"2025","journal-title":"Discov. Comput."},{"key":"ref_29","unstructured":"Wang, Y., Li, J., Chen, X., and Zhang, H. (2024). Contextual Language Model for Accurate Imputation Method. arXiv."},{"key":"ref_30","unstructured":"Qin, H., Chen, Y., Zhang, M., and Li, J. (2024). 
NAIM: Transformer-based Neural Attention Imputation Model for Tabular Data with Missing Values. arXiv."},{"key":"ref_31","unstructured":"Zhou, S., Liu, X., Wang, H., Zhang, Y., and Chen, J. (2024). NuwaTS: A Foundation Model for Generalizable Time Series Imputation. arXiv."},{"key":"ref_32","unstructured":"Zhao, W., Chen, X., Li, Y., Wang, J., and Liu, S. (2025). UnIMP: Uncertainty-aware Imputation Model based on Graph Neural Networks for Incomplete Data. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/j.ins.2013.01.021","article-title":"A hybrid method for imputation of missing values using optimized fuzzy c-means with support vector regression and a genetic algorithm","volume":"233","author":"Arslan","year":"2013","journal-title":"Inf. Sci."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"53","DOI":"10.4236\/jcc.2024.1211004","article-title":"Missing Data Imputation: A Comprehensive Review","volume":"12","author":"Alwateer","year":"2024","journal-title":"J. Comp. Comm."}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/8\/325\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:25:26Z","timestamp":1760034326000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/8\/325"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,12]]},"references-count":34,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2025,8]]}},"alternative-id":["computers14080325"],"URL":"https:\/\/doi.org\/10.3390\/computers14080325","relation":{},"ISSN":["2073-431X"],"issn-type":[{"type":"electronic","value":"2073-431X"}],"subject":[],"published":{"date-parts":[[2025,8,12]]}}}