{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,25]],"date-time":"2026-01-25T04:31:15Z","timestamp":1769315475451,"version":"3.49.0"},"reference-count":98,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2018,4,6]],"date-time":"2018-04-06T00:00:00Z","timestamp":1522972800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>Today, data availability has gone from scarce to superabundant. Technologies like IoT, trends in social media and the capabilities of smart-phones are producing and digitizing lots of data that was previously unavailable. This massive increase of data creates opportunities to gain new business models, but also demands new techniques and methods of data quality in knowledge discovery, especially when the data comes from different sources (e.g., sensors, social networks, cameras, etc.). The data quality process of the data set proposes conclusions about the information they contain. This is increasingly done with the aid of data cleaning approaches. Therefore, guaranteeing a high data quality is considered as the primary goal of the data scientist. In this paper, we propose a process for data cleaning in regression models (DC-RM). The proposed data cleaning process is evaluated through a real datasets coming from the UCI Repository of Machine Learning Databases. With the aim of assessing the data cleaning process, the dataset that is cleaned by DC-RM was used to train the same regression models proposed by the authors of UCI datasets. The results achieved by the trained models with the dataset produced by DC-RM are better than or equal to that presented by the datasets\u2019 authors.<\/jats:p>","DOI":"10.3390\/sym10040099","type":"journal-article","created":{"date-parts":[[2018,4,11]],"date-time":"2018-04-11T12:16:50Z","timestamp":1523449010000},"page":"99","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":39,"title":["How to Address the Data Quality Issues in Regression Models: A Guided Process for Data Cleaning"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4717-3040","authenticated-orcid":false,"given":"David","family":"Corrales","sequence":"first","affiliation":[{"name":"Grupo de Ingenier\u00eda Telem\u00e1tica, Universidad del Cauca, Campus Tulc\u00e1n, 190002 Popay\u00e1n, Colombia"},{"name":"Departamento de Ciencias de la Computaci\u00f3n e Ingenier\u00eda, Universidad Carlos III de Madrid, Avenida de la Universidad, 30, 28911 Legan\u00e9s, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5608-9097","authenticated-orcid":false,"given":"Juan","family":"Corrales","sequence":"additional","affiliation":[{"name":"Grupo de Ingenier\u00eda Telem\u00e1tica, Universidad del Cauca, Campus Tulc\u00e1n, 190002 Popay\u00e1n, Colombia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0041-6829","authenticated-orcid":false,"given":"Agapito","family":"Ledezma","sequence":"additional","affiliation":[{"name":"Departamento de Ciencias de la Computaci\u00f3n e Ingenier\u00eda, Universidad Carlos III de Madrid, Avenida de la Universidad, 30, 28911 Legan\u00e9s, Spain"}]}],"member":"1968","published-online":{"date-parts":[[2018,4,6]]},"reference":[{"key":"ref_1","unstructured":"Gantz, J., and Reinsel, D. (2012). The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East, IDC."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"652","DOI":"10.1109\/ACCESS.2014.2332453","article-title":"Toward Scalable Systems for Big Data Analytics: A Technology Tutorial","volume":"2","author":"Hu","year":"2014","journal-title":"IEEE Access"},{"key":"ref_3","unstructured":"Marr, B. (2015, September 30). Big Data: 20 Mind-Boggling Facts Everyone Must Read. Available online: https:\/\/www.forbes.com\/sites\/bernardmarr\/2015\/09\/30\/big-data-20-mindbogglingfacts-everyone-must-read\/."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Maimon, O., and Rokach, L. (2005). Introduction to Knowledge Discovery in Databases. Data Mining and Knowledge Discovery Handbook, Springer.","DOI":"10.1007\/b107408"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Eyob, E. (2009). Social Implications of Data Mining and Information Privacy: Interdisciplinary Frameworks and Solutions: Interdisciplinary Frameworks and Solutions, Information Science Reference.","DOI":"10.4018\/978-1-60566-196-4"},{"key":"ref_6","unstructured":"Piateski, G., and Frawley, W. (1991). Knowledge Discovery in Databases, MIT Press."},{"key":"ref_7","unstructured":"Chapman, P. (2000). CRISP-DM 1.0: Step-by-Step Data Mining Guide, SPSS."},{"key":"ref_8","unstructured":"Olson, D.L., and Delen, D. (2008). Advanced Data Mining Techniques, Springer Science & Business Media."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"396","DOI":"10.17706\/jcp.10.6.396-405","article-title":"A Conceptual Framework for Data Quality in Knowledge Discovery Tasks (FDQ-KDT): A Proposal","volume":"10","author":"Corrales","year":"2015","journal-title":"J. Comput."},{"key":"ref_10","unstructured":"Asuncion, A., and Newman, D. (2007). UCI Machine Learning Repository, University of California, School of Information and Computer Science. Available online: http:\/\/www.ics.uci.edu\/~{}mlearn\/MLRepository.html."},{"key":"ref_11","unstructured":"Sen, A., and Srivastava, M. (2012). Regression Analysis: Theory, Methods, and Applications, Springer Science & Business Media."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1016\/j.eswa.2017.02.013","article-title":"A regression tree approach using mathematical programming","volume":"78","author":"Yang","year":"2017","journal-title":"Expert Syst. Appl."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1016\/0169-2070(94)90045-0","article-title":"Artificial neural network models for forecasting and decision making","volume":"10","author":"Hill","year":"1994","journal-title":"Int. J. Forecast."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1109\/72.80341","article-title":"Orthogonal least squares learning algorithm for radial basis function networks","volume":"2","author":"Chen","year":"1991","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_15","unstructured":"Quinlan, J.R. (1992). Learning With Continuous Classes, World Scientific."},{"key":"ref_16","unstructured":"Maydanchik, A. (2007). Data Quality Assessment, Technics Publications LLC."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Morbey, G. (2013). Data Quality for Decision Makers: A Dialog between a Board Member and a DQ Expert, B\u00fccher, Springer Fachmedien.","DOI":"10.1007\/978-3-658-01823-8"},{"key":"ref_18","first-page":"33","article-title":"Data Quality in Linear Regression Models: Effect of Errors in Test Data and Errors in Training Data on Predictive Accuracy","volume":"2","author":"Klein","year":"1999","journal-title":"Inf. Sci."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Taleb, I., Dssouli, R., and Serhani, M.A. (July, January 27). Big Data Pre-processing: A Quality Framework. Proceedings of the 2015 IEEE International Congress on Big Data, New York, NY, USA.","DOI":"10.1109\/BigDataCongress.2015.35"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1016\/j.future.2015.11.024","article-title":"A Data Quality in Use model for Big Data","volume":"63","author":"Merino","year":"2016","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Wang, J., Cellary, W., Wang, D., Wang, H., Chen, S.C., Li, T., and Zhang, Y. (2015, January 1\u20133). A Data Quality Framework for Customer Relationship Analytics. Proceedings of the 2015 16th International Conference on Web Information Systems Engineering (WISE), Miami, FL, USA. Part II.","DOI":"10.1007\/978-3-319-26187-4"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Guillet, F.J., and Hamilton, H.J. (2007). Measuring and Modelling Data Quality for Quality-Awareness in Data Mining. Quality Measures in Data Mining, Springer.","DOI":"10.1007\/978-3-540-44918-8"},{"key":"ref_23","unstructured":"Galhard, H., Florescu, D., Shasha, D., and Simon, E. (March, January 28). An extensible Framework for Data Cleaning. Proceedings of the 2000 16th International Conference on Data Engineering, Washington, DC, USA."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"8304","DOI":"10.1016\/j.eswa.2015.06.050","article-title":"DQ2S? A framework for data quality-aware information management","volume":"42","author":"Dong","year":"2015","journal-title":"Expert Syst. Appl."},{"key":"ref_25","unstructured":"Yang, Q., and Webb, G. (2006, January 7\u201311). An Object-Oriented Framework for Data Quality Management of Enterprise Data Warehouse. Proceedings of the PRICAI 2006 Trends in Artificial Intelligence 9th Pacific Rim International Conference on Artificial Intelligence, Guilin, China."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Sebastian-Coleman, L. (2012). Measuring Data Quality for Ongoing Improvement: A Data Quality Assessment Framework, Newnes.","DOI":"10.1016\/B978-0-12-397033-6.00020-1"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Myrseth, P., Stang, J., and Dalberg, V. (2011, January 6\u20138). A data quality framework applied to e-government metadata: A prerequsite to establish governance of interoperable e-services. Proceedings of the 2011 International Conference on E-Business and E-Government (ICEE), Shanghai, China.","DOI":"10.1109\/ICEBEG.2011.5881298"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1016\/j.giq.2016.02.001","article-title":"Open data quality measurement framework: Definition and application to Open Government Data","volume":"33","author":"Vetro","year":"2016","journal-title":"Gov. Inf. Q."},{"key":"ref_29","first-page":"4421","article-title":"A Framework to Construct Data Quality Dimensions Relationships","volume":"6","author":"Panahy","year":"2013","journal-title":"Indian J. Sci. Technol."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"623","DOI":"10.1109\/69.404034","article-title":"A framework for analysis of data quality research","volume":"7","author":"Wang","year":"1995","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1136\/amiajnl-2011-000681","article-title":"Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research","volume":"20","author":"Weiskopf","year":"2013","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1016\/j.ijmedinf.2016.03.006","article-title":"Data quality assessment framework to assess electronic medical record data for use in research","volume":"90","author":"Reimer","year":"2016","journal-title":"Int. J. Med. Inform."},{"key":"ref_33","unstructured":"Almutiry, O., Wills, G., and Alwabel, A. (2013, January 24\u201326). Toward a framework for data quality in cloud-based health information system. Proceedings of the 2013 International Conference on Information Society (i-Society), Toronto, ON, Canada."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1197\/jamia.M1087","article-title":"Defining and improving data quality in medical registries: A literature review, case study, and generic framework","volume":"9","author":"Arts","year":"2002","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1094","DOI":"10.1016\/j.ijmedinf.2015.09.008","article-title":"Structured data quality reports to improve EHR data quality","volume":"84","author":"Taggart","year":"2015","journal-title":"Int. J. Med. Inform."},{"key":"ref_36","first-page":"1","article-title":"Secondary use of EHR: Data quality issues and informatics opportunities","volume":"2010","author":"Botsis","year":"2010","journal-title":"Summit Transl. Bioinform."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"S21","DOI":"10.1097\/MLR.0b013e318257dd67","article-title":"A pragmatic framework for single-site and multisite data quality assessment in electronic health record-based clinical research","volume":"50","author":"Kahn","year":"2012","journal-title":"Med. Care"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1016\/j.canep.2018.02.002","article-title":"Evaluation of data quality at the National Cancer Registry of Ukraine","volume":"53","author":"Ryzhov","year":"2018","journal-title":"Cancer Epidemiol."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Rasta, K., Nguyen, T.H., and Prinz, A. (2013, January 29\u201331). A framework for data quality handling in enterprise service bus. Proceedings of the 2013 Third International Conference on Innovative Computing Technology (INTECH), London, UK.","DOI":"10.1109\/INTECH.2013.6653640"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1016\/j.cageo.2014.12.006","article-title":"The data quality analyzer: A quality control program for seismic data","volume":"76","author":"Ringler","year":"2015","journal-title":"Comput. Geosci."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1016\/j.rser.2016.10.054","article-title":"Data quality of electricity consumption data in a smart grid environment","volume":"75","author":"Chen","year":"2017","journal-title":"Renew. Sustain. Energy Rev."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1177\/160940690900800406","article-title":"Building a conceptual framework: philosophy, definitions, and procedure","volume":"8","author":"Jabareen","year":"2009","journal-title":"Int. J. Qual. Methods"},{"key":"ref_43","unstructured":"Schutt, R., and O\u2019Neil, C. (2013). Doing Data Science: Straight Talk from the Frontline, O\u2019Reilly Media, Inc."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Corrales, D., Ledezma, A., and Corrales, J. (2016). A Systematic Review of Data Quality Issues in Knowledge Discovery Tasks, Revista Ingenierias Universidad de Medellin.","DOI":"10.22395\/rium.v15n28a7"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/j.ins.2013.01.021","article-title":"A hybrid method for imputation of missing values using optimized fuzzy c-means with support vector regression and a genetic algorithm","volume":"233","author":"Aydilek","year":"2013","journal-title":"Inf. Sci."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Hawkins, D.M. (1980). Identification of Outliers, Springer.","DOI":"10.1007\/978-94-015-3994-4"},{"key":"ref_47","unstructured":"Barnett, V., and Lewis, T. (1994). Outliers in Statistical Data, Wiley."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Johnson, R.A., and Wichern, D.W. (2014). Applied Multivariate Statistical Analysis, Prentice-Hall.","DOI":"10.1002\/9781118445112.stat02623"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Khalid, S., Khalil, T., and Nasreen, S. (2014, January 27\u201329). A survey of feature selection and feature extraction techniques in machine learning. Proceedings of the Science and Information Conference (SAI), London, UK.","DOI":"10.1109\/SAI.2014.6918213"},{"key":"ref_50","unstructured":"Tang, J., Alelyani, S., and Liu, H. (2014). Feature selection for classification: A review. Data Classification: Algorithms and Applications, CRC Press."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Bosu, M.F., and MacDonell, S.G. (2013, January 4\u20137). A Taxonomy of Data Quality Challenges in Empirical Software Engineering. Proceedings of the 2013 22nd Australian Software Engineering Conference, Melbourne, Australia.","DOI":"10.1109\/ASWEC.2013.21"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"304","DOI":"10.1109\/TKDE.2006.46","article-title":"Enhancing data analysis with noise removal","volume":"18","author":"Xiong","year":"2006","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"15:1","DOI":"10.1145\/1541880.1541882","article-title":"Anomaly Detection: A Survey","volume":"41","author":"Chandola","year":"2009","journal-title":"ACM Comput. Surv."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Aljuaid, T., and Sasi, S. (2016, January 23\u201325). Proper imputation techniques for missing values in data sets. Proceedings of the 2016 International Conference on Data Science and Engineering (ICDSE), Cochin, India.","DOI":"10.1109\/ICDSE.2016.7823957"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"890","DOI":"10.1109\/32.962560","article-title":"Software cost estimation with incomplete data","volume":"27","author":"Strike","year":"2001","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Ziarko, W., and Yao, Y. (2001). A Comparison of Several Approaches to Missing Attribute Values in Data Mining, Springer. Rough Sets and Current Trends in Computing.","DOI":"10.1007\/3-540-45554-X"},{"key":"ref_57","unstructured":"Magnani, M. (2018, March 01). Techniques for Dealing With Missing Data in Knowledge Discovery Tasks. Available online: https:\/\/www.researchgate.net\/profile\/Matteo_Magnani\/publication\/228748415_Techniques_for_dealing_with_missing_data_in_knowledge_discovery_tasks\/links\/00b49521f12e9afa98000000.pdf."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Breunig, M.M., Kriegel, H.P., Ng, R.T., and Sander, J. (2000, January 15\u201318). LOF: Identifying density-based local outliers. Proceedings of the ACM Sigmod Record, Dallas, TX, USA.","DOI":"10.1145\/342009.335388"},{"key":"ref_59","unstructured":"Ester, M., Kriegel, H.P., Sander, J., and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. KDD-96 Proceedings, AAAI Press."},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Kriegel, H.P., Zimek, A., and Hubert, M.S. (2008, January 24\u201327). Angle-based outlier detection in high-dimensional data. Proceedings of the 14th ACM SIGKDD International Conference On Knowledge Discovery and Data Mining, Las Vegas, NV, USA.","DOI":"10.1145\/1401890.1401946"},{"key":"ref_61","unstructured":"Fayyad, U.M., Piatetsky-Shapiro, G., and Smyth, P. (1996). Advances in Knowledge Discovery and Data Mining, American Association for Artificial Intelligence. Chapter from Data Mining to Knowledge Discovery: An Overview."},{"key":"ref_62","first-page":"1787","article-title":"Feature Selection Methods And Algorithms","volume":"3","author":"Ladha","year":"2011","journal-title":"Int. J. Comput. Sci. Eng."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1016\/j.compeleceng.2013.11.024","article-title":"A survey on feature selection methods","volume":"40","author":"Chandrashekar","year":"2014","journal-title":"Comput. Electr. Eng."},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1016\/S0004-3702(97)00063-5","article-title":"Selection of relevant features and examples in machine learning","volume":"97","author":"Blum","year":"1997","journal-title":"Artif. Intell."},{"key":"ref_65","unstructured":"Jolliffe, I. (2002). Principal Component Analysis, Wiley Online Library."},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Huang, L., Jin, H., Yuan, P., and Chu, F. (2008, January 3\u20135). Duplicate Records Cleansing with Length Filtering and Dynamic Weighting. Proceedings of the 2008 Fourth International Conference on Semantics, Knowledge and Grid, Beijing, China.","DOI":"10.1109\/SKG.2008.88"},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"John, G.H., Kohavi, R., and Pfleger, K. (1994, January 10\u201313). Irrelevant Features and the Subset Selection Problem. Proceedings of the Eleventh International Machine Learning, Morgan Kaufmann, New Brunswick, NJ, USA.","DOI":"10.1016\/B978-1-55860-335-6.50023-4"},{"key":"ref_68","unstructured":"Guyon, I., Gunn, S., Nikravesh, M., and Zadeh, L.A. (2008). Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing), Springer."},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Yin, H., Tino, P., Corchado, E., Byrne, W., and Yao, X. (2007). Filter Methods for Feature Selection\u2014A Comparative Study. Intelligent Data Engineering and Automated Learning\u2014IDEAL 2007 8th International Conference, Birmingham, UK, 16\u201319 December 2007, Springer.","DOI":"10.1007\/978-3-540-77226-2"},{"key":"ref_70","unstructured":"Urbanek, S. (2018, March 01). Package \u2018Rserve\u2019 Manual. Available online: https:\/\/cran.r-project.org\/web\/packages\/Rserve\/Rserve.pdf."},{"key":"ref_71","unstructured":"Team, R.C. (2018, March 01). R: A Language and Environment for Statistical Computing. Available online: http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.470.5851&rep=rep1&type=pdf."},{"key":"ref_72","unstructured":"Stekhoven, D. (2018, March 01). Package \u2018missForest\u2019 Manual. Available online: https:\/\/cran.r-project.org\/web\/packages\/missForest\/missForest.pdf."},{"key":"ref_73","unstructured":"Hu, Y., Murray, W., and Shan, Y. (2018, March 01). Package \u2018Rlof\u2019 Manual. Available online: https:\/\/cran.r-project.org\/web\/packages\/Rlof\/Rlof.pdf."},{"key":"ref_74","unstructured":"Hennig, C. (2018, March 01). Package \u2018fpc\u2019 Manual. Available online: https:\/\/cran.r-project.org\/web\/packages\/fpc\/fpc.pdf."},{"key":"ref_75","unstructured":"Romanski, P., and Kotthoff, L. (2018, March 01). Package \u2018FSelector\u2019 Manual. Available online: https:\/\/cran.r-project.org\/web\/packages\/FSelector\/FSelector.pdf."},{"key":"ref_76","unstructured":"Singh, K., Kaur, R., and Kumar, D. (2015, January 25\u201327). Comment Volume Prediction Using Neural Networks and Decision Trees. Proceedings of the 2015 17th UKSIM\u201915 UKSIM-AMSS International Conference on Modelling and Simulation, IEEE Computer Society, Washington, DC, USA."},{"key":"ref_77","unstructured":"Ho, T.K. (1995, January 14\u201316). Random decision forests. Proceedings of the IEEE Third International Conference on Document Analysis and Recognition, Montreal, QC, Canada."},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Faubel, F., McDonough, J., and Klakow, D. (2009, January 19\u201324). Bounded conditional mean imputation with Gaussian mixture models: A reconstruction approach to partly occluded features. Proceedings of the ICASSP 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan.","DOI":"10.1109\/ICASSP.2009.4960472"},{"key":"ref_79","unstructured":"Zhao, Y. (2012). R and Data Mining: Examples and Case Studies, Academic Press."},{"key":"ref_80","doi-asserted-by":"crossref","first-page":"9","DOI":"10.18046\/syt.v13i33.2077","article-title":"Water quality warnings based on cluster analysis in Colombian river basins","volume":"13","author":"Castillo","year":"2015","journal-title":"Sist. Telemat."},{"key":"ref_81","doi-asserted-by":"crossref","unstructured":"Erman, J., Arlitt, M., and Mahanti, A. (2006, January 11\u201315). Traffic Classification Using Clustering Algorithms. Proceedings of the 2006 MineNet\u201906 SIGCOMM Workshop on Mining Network Data, Pisa, Italy.","DOI":"10.1145\/1162678.1162679"},{"key":"ref_82","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1145\/319983.319987","article-title":"Duplicate Record Elimination in Large Data Files","volume":"8","author":"Bitton","year":"1983","journal-title":"ACM Trans. Database Syst."},{"key":"ref_83","doi-asserted-by":"crossref","unstructured":"Corrales, D.C., Lasso, E., Ledezma, A., and Corrales, J.C. (2018). Feature selection for classification tasks: Expert knowledge or traditional methods?. J. Intell. Fuzzy Syst.","DOI":"10.3233\/JIFS-169470"},{"key":"ref_84","first-page":"1","article-title":"Caret package","volume":"28","author":"Kuhn","year":"2008","journal-title":"J. Stat. Softw."},{"key":"ref_85","doi-asserted-by":"crossref","first-page":"59","DOI":"10.2307\/1402731","article-title":"Karl Pearson and the chi-squared test","volume":"51","author":"Plackett","year":"1983","journal-title":"Int. Stat. Rev.\/Rev. Int. Stat."},{"key":"ref_86","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1093\/biomet\/70.1.163","article-title":"Information gain and a general measure of correlation","volume":"70","author":"Kent","year":"1983","journal-title":"Biometrika"},{"key":"ref_87","unstructured":"Mitchell, T.M. (1997). Machine Learning, McGraw Hill."},{"key":"ref_88","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1007\/978-3-540-35488-8_23","article-title":"Information gain, correlation and support vector machines","volume":"207","author":"Roobaert","year":"2006","journal-title":"Stud. Fuzziness Soft Comput."},{"key":"ref_89","first-page":"136","article-title":"Machine learning approaches for improving condition-based maintenance of naval propulsion plants","volume":"230","author":"Coraddu","year":"2016","journal-title":"Proc. Inst. Mech. Eng. Part M"},{"key":"ref_90","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1007\/s13748-013-0040-3","article-title":"Event labeling combining ensemble detectors and background knowledge","volume":"2","author":"Gama","year":"2014","journal-title":"Prog. Artif. Intell."},{"key":"ref_91","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1016\/j.enbuild.2017.01.083","article-title":"Data driven prediction models of energy use of appliances in a low-energy house","volume":"140","author":"Candanedo","year":"2017","journal-title":"Energy Build."},{"key":"ref_92","doi-asserted-by":"crossref","first-page":"3341","DOI":"10.1016\/j.jbusres.2016.02.010","article-title":"Predicting social media performance metrics and evaluation of the impact on brand building: A data mining approach","volume":"69","author":"Moro","year":"2016","journal-title":"J. Bus. Res."},{"key":"ref_93","doi-asserted-by":"crossref","unstructured":"Spiliopoulou, M., Schmidt-Thieme, L., and Janning, R. (2014). Feedback Prediction for Blogs. Data Analysis, Machine Learning and Knowledge Discovery, Springer International Publishing.","DOI":"10.1007\/978-3-319-01595-8"},{"key":"ref_94","doi-asserted-by":"crossref","first-page":"162","DOI":"10.1016\/j.enbuild.2014.04.034","article-title":"On-line learning of indoor temperature forecasting models towards energy efficiency","volume":"83","author":"Romeu","year":"2014","journal-title":"Energy Build."},{"key":"ref_95","first-page":"245","article-title":"Selection of relevant features in machine learning","volume":"184","author":"Langley","year":"1994","journal-title":"Proc. AAAI Fall Symp. Relev."},{"key":"ref_96","first-page":"1157","article-title":"An introduction to variable and feature selection","volume":"3","author":"Guyon","year":"2003","journal-title":"Introd. Var. Feature Sel."},{"key":"ref_97","doi-asserted-by":"crossref","first-page":"2507","DOI":"10.1093\/bioinformatics\/btm344","article-title":"A review of feature selection techniques in bioinformatics","volume":"23","author":"Saeys","year":"2007","journal-title":"Bioinformatics"},{"key":"ref_98","doi-asserted-by":"crossref","unstructured":"Da Silva Jacinto, A., da Silva Santos, R., and de Oliveira, J.M.P. (2014, January 10\u201312). Automatic and semantic pre-Selection of features using ontology for data mining on data sets related to cancer. Proceedings of the International Conference on Information Society (i-Society 2014), London, UK.","DOI":"10.1109\/i-Society.2014.7009060"}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/10\/4\/99\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T14:59:50Z","timestamp":1760194790000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/10\/4\/99"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,4,6]]},"references-count":98,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2018,4]]}},"alternative-id":["sym10040099"],"URL":"https:\/\/doi.org\/10.3390\/sym10040099","relation":{},"ISSN":["2073-8994"],"issn-type":[{"value":"2073-8994","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,4,6]]}}}