{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,22]],"date-time":"2026-01-22T03:00:43Z","timestamp":1769050843834,"version":"3.49.0"},"reference-count":17,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2023,6,29]],"date-time":"2023-06-29T00:00:00Z","timestamp":1687996800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Machine learning algorithms are frequently used for classification problems on tabular datasets. In order to make informed decisions about model selection and design, it is crucial to gain meaningful insights into the complexity of these datasets. Feature-based complexity measures are a set of complexity measures that evaluates how useful features are at discriminating instances of different classes. This paper, however, shows that existing feature-based measures are inadequate in accurately measuring the complexity of various synthetic classification datasets, particularly those with multiple classes. This paper proposes a new feature-based complexity measure called the F5 measure, which evaluates the discriminative power of features for each class by identifying long sequences of uninterrupted instances of the same class. 
It is shown that the F5 measure better represents the feature complexity of a dataset.<\/jats:p>","DOI":"10.3390\/e25071000","type":"journal-article","created":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T00:43:30Z","timestamp":1688085810000},"page":"1000","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Feature-Based Complexity Measure for Multinomial Classification Datasets"],"prefix":"10.3390","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3725-3154","authenticated-orcid":false,"given":"Kyle","family":"Erwin","sequence":"first","affiliation":[{"name":"Computer Science Division, Stellenbosch University, Stellenbosch 7600, South Africa"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0242-3539","authenticated-orcid":false,"given":"Andries","family":"Engelbrecht","sequence":"additional","affiliation":[{"name":"Computer Science Division, Stellenbosch University, Stellenbosch 7600, South Africa"},{"name":"Department of Industrial Engineering, Stellenbosch University, Stellenbosch 7600, South Africa"},{"name":"Center for Applied Mathematics and Bioinformatics, Gulf University for Science and Technology, Mubarak Al-Abdullah 32093, Kuwait"}]}],"member":"1968","published-online":{"date-parts":[[2023,6,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1016\/j.knosys.2015.05.014","article-title":"Recent advances and emerging challenges of feature selection in the context of big data","volume":"86","year":"2015","journal-title":"Knowl.-Based Syst."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1109\/MCI.2014.2326099","article-title":"The Emerging \u201cBig Dimensionality\u201d","volume":"9","author":"Zhai","year":"2014","journal-title":"IEEE Comput. Intell. Mag."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Malan, K.M. (2021). A Survey of Advances in Landscape Analysis for Optimisation. 
Algorithms, 14.","DOI":"10.3390\/a14020040"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Mersmann, O., Bischl, B., Trautmann, H., Preuss, M., Weihs, C., and Rudolph, G. (2011, January 12\u201316). Exploratory landscape analysis. Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, Dublin Ireland.","DOI":"10.1145\/2001576.2001690"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1142\/S0218654305000761","article-title":"Persistence Barcodes for Shapes","volume":"11","author":"Carlsson","year":"2005","journal-title":"Int. J. Shape Model."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1090\/S0273-0979-07-01191-3","article-title":"Barcodes: The persistent topology of data","volume":"45","author":"Ghrist","year":"2007","journal-title":"Bull. Am. Math. Soc."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1007\/s10994-017-5681-1","article-title":"Data complexity meta-features for regression problems","volume":"107","author":"Lorena","year":"2018","journal-title":"Mach. Learn."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1109\/34.990132","article-title":"Complexity measures of supervised classification problems","volume":"24","author":"Ho","year":"2002","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3347711","article-title":"How Complex Is Your Classification Problem? A Survey on Measuring Classification Complexity","volume":"52","author":"Lorena","year":"2019","journal-title":"ACM Comput. Surv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Lang, R.D., and Engelbrecht, A.P. (2021). An Exploratory Landscape Analysis-Based Benchmark Suite. Algorithms, 14.","DOI":"10.3390\/a14030078"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Mollineda, R.A., S\u00e1nchez, J.S., and Sotoca, J.M. 
(2005, January 7\u20139). Data characterization for effective prototype selection. Proceedings of the IbPRIA\u201905: Second Iberian Conference on Pattern Recognition and Image Analysis, Part II, Estoril, Portugal.","DOI":"10.1007\/11492542_4"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1016\/j.neucom.2021.05.107","article-title":"How important is data quality? Best classifiers vs best features","volume":"470","year":"2022","journal-title":"Neurocomputing"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Okimoto, L.C., Savii, R.M., and Lorena, A.C. (2017, January 2\u20135). Complexity Measures Effectiveness in Feature Selection. Proceedings of the Brazilian Conference on Intelligent Systems, Uberlandia, Brazil.","DOI":"10.1109\/BRACIS.2017.66"},{"key":"ref_14","unstructured":"de Souto, M.C.P., Lorena, A.C., Spola\u00f4r, N., and Costa, I.G. Proceedings of the International Joint Conference on Neural Networks, Barcelona, Spain, 18\u201323 July 2010."},{"key":"ref_15","unstructured":"Orriols-Puig, A., Maci\u00e0, N., and Ho, T. (2010). DCoL: Data Complexity Library in C++ (Documentation), Technical Report; La Salle\u2014Universitat Ramon Llull."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"629","DOI":"10.1109\/3477.938265","article-title":"Two-parameter Fisher criterion","volume":"31","author":"Malina","year":"2001","journal-title":"IEEE Trans. Syst. Man Cybern. Part B (Cybern.)"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1109\/TSMCB.2009.2024166","article-title":"Selecting Discrete and Continuous Features Based on Neighborhood Decision Error Minimization","volume":"40","author":"Hu","year":"2010","journal-title":"IEEE Trans. Syst. Man Cybern. 
Part B (Cybern.)"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/25\/7\/1000\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:03:45Z","timestamp":1760126625000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/25\/7\/1000"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,29]]},"references-count":17,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2023,7]]}},"alternative-id":["e25071000"],"URL":"https:\/\/doi.org\/10.3390\/e25071000","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,29]]}}}