{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:48:28Z","timestamp":1760240908035,"version":"build-2065373602"},"reference-count":37,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2019,10,28]],"date-time":"2019-10-28T00:00:00Z","timestamp":1572220800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Results of experiments on numerical data sets discretized using two methods\u2014global versions of Equal Frequency per Interval and Equal Interval Width\u2014are presented. Globalization of both methods is based on entropy. For the discretized data sets, left and right reducts were computed. For each discretized data set and two data sets, based, respectively, on left and right reducts, we applied ten-fold cross-validation using the C4.5 decision tree generation system. Our main objective was to compare the quality of all three types of data sets in terms of the error rate. Additionally, we compared the complexity of the generated decision trees. 
We show that reduction of data sets may only increase the error rate and that the decision trees generated from reduced decision sets are not simpler than the decision trees generated from non-reduced data sets.<\/jats:p>","DOI":"10.3390\/e21111051","type":"journal-article","created":{"date-parts":[[2019,10,28]],"date-time":"2019-10-28T11:26:13Z","timestamp":1572261973000},"page":"1051","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Reduced Data Sets and Entropy-Based Discretization"],"prefix":"10.3390","volume":"21","author":[{"given":"Jerzy W.","family":"Grzymala-Busse","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA"},{"name":"Department of Artificial Intelligence, University of Information Technology and Management, 35\u2013225 Rzeszow, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zdzislaw S.","family":"Hippe","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence, University of Information Technology and Management, 35\u2013225 Rzeszow, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6064-9528","authenticated-orcid":false,"given":"Teresa","family":"Mroczek","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence, University of Information Technology and Management, 35\u2013225 Rzeszow, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,10,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1016\/0004-3702(94)90084-1","article-title":"Learning Boolean concepts in the presence of many irrelevant features","volume":"69","author":"Almuallim","year":"1994","journal-title":"Artif. 
Intell."},{"key":"ref_2","unstructured":"Kira, K., and Rendell, L.A. (1992, January 12\u201316). The feature selection problem: Traditional methods and a new algorithm. Proceedings of the 10th National Conference on AI, San Jose, CA, USA."},{"key":"ref_3","unstructured":"Garey, M., and Johnson, D. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Fuernkranz, J., Gamberger, D., and Lavrac, N. (2012). Foundations of Rule Learning, Springer.","DOI":"10.1007\/978-3-540-75197-7"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Stanczyk, U., and Jain, L.C. (2015). Feature Selection for Data and Pattern Recognition, Springer.","DOI":"10.1007\/978-3-662-45620-0"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Stanczyk, U., Zielosko, B., and Jain, L.C. (2018). Advances in Feature Selection for Data and Pattern Recognition, Springer.","DOI":"10.1007\/978-3-319-67588-6"},{"key":"ref_7","first-page":"389","article-title":"Integer programming models for feature selection: New extensions and a randomized solution algorithm","volume":"250","author":"Bertolazzi","year":"2016","journal-title":"Inf. Sci."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"437","DOI":"10.1051\/ro\/2015045","article-title":"Optimal discretization and selection of features by association rates of joint distributions","volume":"50","author":"Santoni","year":"2016","journal-title":"RAIRO-Oper. Res."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"642","DOI":"10.1109\/69.617056","article-title":"Feature selection via discretization","volume":"9","author":"Liu","year":"1997","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Sharmin, S., Ali, A.A., Khan, M.A.H., and Shoyaib, M. (2017, January 13\u201314). Feature selection and discretization based on mutual information. 
Proceedings of the IEEE International Conference on Imaging, Vision & Pattern Recognition, Dhaka, Bangladesh.","DOI":"10.1109\/ICIVPR.2017.7890885"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Weitschek, E., Felici, G., and Bertolazzi, P. (2012, January 3\u20136). MALA: A microarray clustering and classification software. Proceedings of the International Workshop on Database and Expert Systems Applications, Vienna, Austria.","DOI":"10.1109\/DEXA.2012.29"},{"key":"ref_12","unstructured":"Felici, G., and Weitschek, E. (2012, January 9\u201311). Mining logic models in the presence of noisy data. Proceedings of the International Symposium on Artificial Intelligence and Mathematics, Fort Lauderdale, FL, USA."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"863","DOI":"10.1016\/j.ijar.2011.03.001","article-title":"Core-generating approximated minimum entropy discretization for rough set feature selection in pattern classification","volume":"52","author":"Tian","year":"2011","journal-title":"Int. J. Approx. Reason."},{"key":"ref_14","unstructured":"Jensen, R., and Shen, Q. (2002, January 12\u201317). Fuzzy-rough sets for descriptive dimensionality reduction. Proceedings of the International Conference on Fuzzy Systems FUZZ-IEEE 2002, Honolulu, HI, USA."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Nguyen, H.S. (1998, January 22\u201326). Discretization problem for rough sets methods. Proceedings of the 1st International Conference RSCTC 1998 on Rough Sets and Current Trends in Computing, Warsaw, Poland.","DOI":"10.1007\/3-540-69115-4_75"},{"key":"ref_16","first-page":"565","article-title":"Rough set methods in feature reduction and classification","volume":"11","author":"Swiniarski","year":"2001","journal-title":"Int. J. Appl. Math. Comput. Sci."},{"key":"ref_17","unstructured":"Stanczyk, U., Zielosko, B., and Jain, L.C. (2017). Attribute selection based on reduction of numerical attribute during discretization. 
Advances in Feature Selection for Data and Pattern Recognition, Springer International Publishing AG."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1016\/j.patrec.2005.09.004","article-title":"Information-preserving hybrid data reduction based on fuzzy-rough techniques","volume":"27","author":"Hu","year":"2006","journal-title":"Pattern Recognit. Lett."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1016\/S0957-4174(00)00027-0","article-title":"Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index","volume":"19","author":"Kim","year":"2000","journal-title":"Expert Syst. Appl."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_21","unstructured":"Quinlan, J.R. (1993). C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"270","DOI":"10.1007\/978-3-642-31709-5_28","article-title":"An empirical comparison of rule induction using feature selection with the LEM2 algorithm","volume":"Volume 297","author":"Greco","year":"2012","journal-title":"Communications in Computer and Information Science"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1007\/BF01001956","article-title":"Rough sets","volume":"11","author":"Pawlak","year":"1982","journal-title":"Int. J. Comput. Inf. Sci."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Pawlak, Z. (1991). Rough Sets. Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers.","DOI":"10.1007\/978-94-011-3534-4"},{"key":"ref_25","unstructured":"Blajdo, P., Grzymala-Busse, J.W., Hippe, Z.S., Knap, M., Mroczek, T., and Piatek, L. (2008, January 17\u201319). 
A comparison of six approaches to discretization\u2014A rough set perspective. Proceedings of the Rough Sets and Knowledge Technology Conference, Chengdu, China."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1016\/S0888-613X(96)00074-6","article-title":"Global discretization of continuous attributes as preprocessing for machine learning","volume":"15","author":"Chmielewski","year":"1996","journal-title":"Int. J. Approx. Reason."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1002\/(SICI)1098-111X(200001)15:1<61::AID-INT4>3.0.CO;2-O","article-title":"Entropy and MDL discretization of continuous variables for Bayesian belief networks","volume":"15","author":"Clarke","year":"2000","journal-title":"Int. J. Intell. Syst."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1023\/A:1007674919412","article-title":"General and efficient multisplitting of numerical attributes","volume":"36","author":"Elomaa","year":"1999","journal-title":"Mach. Learn."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1007\/BF00994007","article-title":"On the handling of continuous-valued attributes in decision tree generation","volume":"8","author":"Fayyad","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_30","unstructured":"Fayyad, U.M., and Irani, K.B. (September, January 28). Multi-interval discretization of continuous-valued attributes for classification learning. Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, Chambery, France."},{"key":"ref_31","unstructured":"Kloesgen, W., and Zytkow, J. (2002). Discretization of numerical attributes. Handbook of Data Mining and Knowledge Discovery, Oxford University Press."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Grzymala-Busse, J.W. (2009, January 14\u201317). A multiple scanning strategy for entropy based discretization. 
Proceedings of the 18th International Symposium on Methodologies for Intelligent Systems, Prague, Czech Republic.","DOI":"10.1007\/978-3-642-04125-9_6"},{"key":"ref_33","unstructured":"Kohavi, R., and Sahami, M. (1996, January 2\u20134). Error-based and entropy-based discretization of continuous features. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA."},{"key":"ref_34","unstructured":"Polkowski, L., and Skowron, A. (1998). Discretization methods in data mining. Rough Sets in Knowledge Discovery 1: Methodology and Applications, Physica-Verlag."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Stefanowski, J. (1998, January 22\u201326). Handling continuous attributes in discovery of strong decision rules. Proceedings of the First Conference on Rough Sets and Current Trends in Computing, Warsaw, Poland.","DOI":"10.1007\/3-540-69115-4_54"},{"key":"ref_36","unstructured":"Stefanowski, J. (2001). Algorithms of Decision Rule Induction in Data Mining, Poznan University of Technology Press."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1111\/j.1540-6261.1968.tb00843.x","article-title":"Financial ratios, discriminant analysis and the prediction of corporate bankruptcy","volume":"23","author":"Altman","year":"1968","journal-title":"J. 
Financ."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/21\/11\/1051\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:29:53Z","timestamp":1760189393000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/21\/11\/1051"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,28]]},"references-count":37,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2019,11]]}},"alternative-id":["e21111051"],"URL":"https:\/\/doi.org\/10.3390\/e21111051","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2019,10,28]]}}}