{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,22]],"date-time":"2026-03-22T08:07:27Z","timestamp":1774166847114,"version":"3.50.1"},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"7","license":[{"start":{"date-parts":[[2022,7,26]],"date-time":"2022-07-26T00:00:00Z","timestamp":1658793600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,7,26]],"date-time":"2022-07-26T00:00:00Z","timestamp":1658793600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100007511","name":"Universidad Rey Juan Carlos","doi-asserted-by":"publisher","award":["C1PREDOC2020"],"award-info":[{"award-number":["C1PREDOC2020"]}],"id":[{"id":"10.13039\/501100007511","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100007511","name":"Universidad Rey Juan Carlos","doi-asserted-by":"publisher","award":["C1PREDOC2020"],"award-info":[{"award-number":["C1PREDOC2020"]}],"id":[{"id":"10.13039\/501100007511","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100012818","name":"Comunidad de Madrid","doi-asserted-by":"publisher","award":["IND2019\/TIC-17194"],"award-info":[{"award-number":["IND2019\/TIC-17194"]}],"id":[{"id":"10.13039\/100012818","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100014440","name":"Ministerio de Ciencia, Innovaci\u00f3n y Universidades","doi-asserted-by":"publisher","award":["RTI-2018-094269-B-I00"],"award-info":[{"award-number":["RTI-2018-094269-B-I00"]}],"id":[{"id":"10.13039\/100014440","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100007511","name":"Universidad Rey Juan 
Carlos","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100007511","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Appl Intell"],"published-print":{"date-parts":[[2023,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Complexity measures aim to characterize the underlying complexity of supervised data. These measures tackle factors hindering the performance of <jats:italic>Machine Learning<\/jats:italic> (<jats:italic>ML<\/jats:italic>) classifiers like overlap, density, linearity, etc. The state-of-the-art has mainly focused on the dataset perspective of complexity, i.e., offering an estimation of the complexity of the whole dataset. Recently, the instance perspective has also been addressed. In this paper, the <jats:italic>hostility measure<\/jats:italic>, a complexity measure offering a multi-level (instance, class, and dataset) perspective of data complexity is proposed. The proposal is built by estimating the novel notion of <jats:italic>hostility<\/jats:italic>: the difficulty of correctly classifying a point, a class, or a whole dataset given their corresponding neighborhoods. The proposed measure is estimated at the instance level by applying the <jats:italic>k<\/jats:italic>-means algorithm in a recursive and hierarchical way, which allows to analyze how points from different classes are naturally grouped together across partitions. The instance information is aggregated to provide complexity knowledge at the class and the dataset levels. The validity of the proposal is evaluated through a variety of experiments dealing with the three perspectives and the corresponding comparative with the state-of-the-art measures. 
Throughout the experiments, the <jats:italic>hostility measure<\/jats:italic> has shown promising results and to be competitive, stable, and robust.<\/jats:p>","DOI":"10.1007\/s10489-022-03793-w","type":"journal-article","created":{"date-parts":[[2022,7,26]],"date-time":"2022-07-26T13:05:42Z","timestamp":1658840742000},"page":"8073-8096","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Hostility measure for multi-level study of data complexity"],"prefix":"10.1007","volume":"53","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4674-1598","authenticated-orcid":false,"given":"Carmen","family":"Lancho","sequence":"first","affiliation":[]},{"given":"Isaac","family":"Mart\u00edn De Diego","sequence":"additional","affiliation":[]},{"given":"Marina","family":"Cuesta","sequence":"additional","affiliation":[]},{"given":"V\u00edctor","family":"Ace\u00f1a","sequence":"additional","affiliation":[]},{"given":"Javier M.","family":"Moguerza","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,7,26]]},"reference":[{"key":"3793_CR1","doi-asserted-by":"crossref","unstructured":"Arruda J L, Prud\u00eancio R B, Lorena A C (2020) Measuring instance hardness using data complexity measures. In: Brazilian conference on intelligent systems. Springer, pp 483\u2013497","DOI":"10.1007\/978-3-030-61380-8_33"},{"key":"3793_CR2","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1016\/j.ins.2020.12.006","volume":"553","author":"VH Barella","year":"2021","unstructured":"Barella V H, Garcia L P, de Souto M C, Lorena A C, de Carvalho A C (2021) Assessing the data complexity of imbalanced datasets. Inf Sci 553:83\u2013109","journal-title":"Inf Sci"},{"key":"3793_CR3","doi-asserted-by":"crossref","unstructured":"Basu M, Ho TK (2006) Data complexity in pattern recognition. 
Springer Science & Business Media","DOI":"10.1007\/978-1-84628-172-3"},{"issue":"1","key":"3793_CR4","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1109\/TEVC.2004.840153","volume":"9","author":"E Bernad\u00f3-Mansilla","year":"2005","unstructured":"Bernad\u00f3-Mansilla E, Ho T K (2005) Domain of competence of xcs classifier system in complexity measurement space. IEEE Trans Evol Comput 9(1):82\u2013104","journal-title":"IEEE Trans Evol Comput"},{"issue":"2","key":"3793_CR5","doi-asserted-by":"publisher","first-page":"153","DOI":"10.1023\/A:1014043630878","volume":"6","author":"H Brighton","year":"2002","unstructured":"Brighton H, Mellish C (2002) Advances in instance selection for instance-based learning algorithms. Data Min Knowl Discov 6(2):153\u2013172","journal-title":"Data Min Knowl Discov"},{"key":"3793_CR6","doi-asserted-by":"publisher","first-page":"175","DOI":"10.1016\/j.patcog.2017.10.038","volume":"76","author":"AL Brun","year":"2018","unstructured":"Brun A L, Britto A S Jr, Oliveira L S, Enembreck F, Sabourin R (2018) A framework for dynamic classifier selection oriented by the classification problem difficulty. Pattern Recogn 76:175\u2013190","journal-title":"Pattern Recogn"},{"key":"3793_CR7","doi-asserted-by":"publisher","first-page":"396","DOI":"10.1016\/j.patrec.2019.05.021","volume":"125","author":"Z Cai","year":"2019","unstructured":"Cai Z, Long Y, Shao L (2019) Classification complexity assessment for hyper-parameter optimization. Pattern Recogn Lett 125:396\u2013403","journal-title":"Pattern Recogn Lett"},{"key":"3793_CR8","unstructured":"Dua D, Graff C (2017) UCI machine learning repository. http:\/\/archive.ics.uci.edu\/ml. Accessed 9 June 2022"},{"key":"3793_CR9","doi-asserted-by":"publisher","first-page":"101445","DOI":"10.1016\/j.jocs.2021.101445","volume":"55","author":"A Fahim","year":"2021","unstructured":"Fahim A (2021) K and starting means for k-means algorithm. 
J Comput Sci 55:101445","journal-title":"J Comput Sci"},{"key":"3793_CR10","unstructured":"Garcia L, Lorena A (2019) ECoL: complexity measures for supervised problems. https:\/\/CRAN.R-project.org\/package=ECoL, r package version 0.3.0. Accessed 9 June 2022"},{"key":"3793_CR11","doi-asserted-by":"publisher","first-page":"108","DOI":"10.1016\/j.neucom.2014.10.085","volume":"160","author":"LP Garcia","year":"2015","unstructured":"Garcia L P, de Carvalho A C, Lorena A C (2015) Effect of label noise in the complexity of classification problems. Neurocomputing 160:108\u2013119","journal-title":"Neurocomputing"},{"issue":"1","key":"3793_CR12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40537-019-0206-3","volume":"6","author":"RH Hariri","year":"2019","unstructured":"Hariri R H, Fredericks E M, Bowers K M (2019) Uncertainty in big data analytics: survey, opportunities, and challenges. J Big Data 6(1):1\u201316","journal-title":"J Big Data"},{"issue":"1","key":"3793_CR13","doi-asserted-by":"publisher","first-page":"101","DOI":"10.1006\/cviu.1998.0624","volume":"70","author":"TK Ho","year":"1998","unstructured":"Ho T K, Baird H S (1998) Pattern classification with compact distribution maps. Comput Vis Image Underst 70(1):101\u2013110","journal-title":"Comput Vis Image Underst"},{"issue":"3","key":"3793_CR14","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1109\/34.990132","volume":"24","author":"TK Ho","year":"2002","unstructured":"Ho T K, Basu M (2002) Complexity measures of supervised classification problems. IEEE Trans Pattern Anal Mach Intell 24(3):289\u2013300","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"3793_CR15","doi-asserted-by":"crossref","unstructured":"Hoekstra A, Duin R P (1996) On the nonlinearity of pattern classifiers. In: Proceedings of 13th international conference on pattern recognition, vol 4. 
IEEE, pp 271\u2013275","DOI":"10.1109\/ICPR.1996.547429"},{"issue":"2","key":"3793_CR16","doi-asserted-by":"publisher","first-page":"225","DOI":"10.1007\/s00180-008-0119-7","volume":"24","author":"K Hornik","year":"2009","unstructured":"Hornik K, Buchta C, Zeileis A (2009) Open-source machine learning: R meets Weka. Comput Stat 24(2):225\u2013232. https:\/\/doi.org\/10.1007\/s00180-008-0119-7","journal-title":"Comput Stat"},{"key":"3793_CR17","unstructured":"Kaplansky I (2020) Set theory and metric spaces, vol 298. American Mathematical Society"},{"key":"3793_CR18","doi-asserted-by":"publisher","first-page":"108114","DOI":"10.1016\/j.patcog.2021.108114","volume":"120","author":"M Koziarski","year":"2021","unstructured":"Koziarski M (2021) Potential anchoring for imbalanced data classification. Pattern Recogn 120:108114","journal-title":"Pattern Recogn"},{"issue":"4","key":"3793_CR19","doi-asserted-by":"publisher","first-page":"253","DOI":"10.3934\/jdg.2020018","volume":"7","author":"E Kropat","year":"2020","unstructured":"Kropat E, Weber G W, Tirkolaee E B (2020) Foundations of semialgebraic gene-environment networks. J Dyn Games 7(4):253","journal-title":"J Dyn Games"},{"key":"3793_CR20","doi-asserted-by":"crossref","unstructured":"Lancho C, Mart\u00edn de Diego I, Cuesta M, Ace\u00f1a V, Moguerza JM (2021) A complexity measure for binary classification problems based on lost points. In: International conference on intelligent data engineering and automated learning. Springer, pp 137\u2013146","DOI":"10.1007\/978-3-030-91608-4_14"},{"issue":"2","key":"3793_CR21","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1109\/TKDE.2014.2327034","volume":"27","author":"E Leyva","year":"2014","unstructured":"Leyva E, Gonz\u00e1lez A, Perez R (2014) A set of complexity measures designed for applying meta-learning to instance selection. 
IEEE Trans Knowl Data Eng 27(2):354\u2013367","journal-title":"IEEE Trans Knowl Data Eng"},{"issue":"4","key":"3793_CR22","doi-asserted-by":"publisher","first-page":"1523","DOI":"10.1016\/j.patcog.2014.10.001","volume":"48","author":"E Leyva","year":"2015","unstructured":"Leyva E, Gonz\u00e1lez A, P\u00e9rez R (2015) Three new instance selection methods based on local sets: a comparative study with several approaches from a bi-objective perspective. Pattern Recogn 48(4):1523\u20131537","journal-title":"Pattern Recogn"},{"issue":"1","key":"3793_CR23","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1016\/j.neucom.2011.03.054","volume":"75","author":"AC Lorena","year":"2012","unstructured":"Lorena A C, Costa I G, Spola\u00f4r N, De Souto M C (2012) Analysis of complexity indices for classification problems: cancer gene expression data. Neurocomputing 75(1):33\u201342","journal-title":"Neurocomputing"},{"issue":"1","key":"3793_CR24","doi-asserted-by":"publisher","first-page":"209","DOI":"10.1007\/s10994-017-5681-1","volume":"107","author":"AC Lorena","year":"2018","unstructured":"Lorena A C, Maciel A I, de Miranda P B, Costa I G, Prud\u00eancio R B (2018) Data complexity meta-features for regression problems. Mach Learn 107(1):209\u2013246","journal-title":"Mach Learn"},{"issue":"5","key":"3793_CR25","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3347711","volume":"52","author":"AC Lorena","year":"2019","unstructured":"Lorena A C, Garcia L P, Lehmann J, Souto M C, Ho T K (2019) How complex is your classification problem? A survey on measuring classification complexity. 
ACM Comput Surv (CSUR) 52(5):1\u201334","journal-title":"ACM Comput Surv (CSUR)"},{"issue":"9","key":"3793_CR26","doi-asserted-by":"publisher","first-page":"3525","DOI":"10.1109\/TNNLS.2019.2944962","volume":"31","author":"Y Lu","year":"2019","unstructured":"Lu Y, Cheung Y M, Tang Y Y (2019) Bayes imbalance impact index: a measure of class imbalanced data set for classification problem. IEEE Trans Neural Netw Learn Syst 31(9):3525\u20133539","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"issue":"1","key":"3793_CR27","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1007\/s10115-013-0700-4","volume":"42","author":"J Luengo","year":"2015","unstructured":"Luengo J, Herrera F (2015) An automatic extraction method of the domains of competence for learning classifiers using data complexity measures. Knowl Inf Syst 42(1):147\u2013180","journal-title":"Knowl Inf Syst"},{"issue":"2","key":"3793_CR28","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1016\/j.compbiomed.2010.12.006","volume":"41","author":"S Oh","year":"2011","unstructured":"Oh S (2011) A new dataset evaluation method based on category overlap. Comput Biol Med 41(2):115\u2013122","journal-title":"Comput Biol Med"},{"issue":"1\u201340","key":"3793_CR29","first-page":"12","volume":"196","author":"A Orriols-Puig","year":"2010","unstructured":"Orriols-Puig A, Macia N, Ho T K (2010) Documentation for the data complexity library in c++. Universitat Ramon Llull La Salle 196(1\u201340):12","journal-title":"Universitat Ramon Llull La Salle"},{"key":"3793_CR30","doi-asserted-by":"crossref","unstructured":"Pascual-Triana J D, Charte D, Arroyo M A, Fern\u00e1ndez A, Herrera F (2021) Revisiting data complexity metrics based on morphology for overlap and imbalance: snapshot, new overlap number of balls metrics and singular problems prospect. 
Knowl Inf Syst 1\u201329","DOI":"10.1007\/s10115-021-01577-1"},{"key":"3793_CR31","doi-asserted-by":"publisher","first-page":"83396","DOI":"10.1109\/ACCESS.2019.2925300","volume":"7","author":"JA S\u00e1ez","year":"2019","unstructured":"S\u00e1ez J A, Galar M, Krawczyk B (2019) Addressing the overlapping data problem in classification using the one-vs-one decomposition strategy. IEEE Access 7:83396\u201383411","journal-title":"IEEE Access"},{"issue":"4","key":"3793_CR32","doi-asserted-by":"publisher","first-page":"394","DOI":"10.1002\/sam.11463","volume":"13","author":"D Singh","year":"2020","unstructured":"Singh D, Gosain A, Saha A (2020) Weighted k-nearest neighbor based data complexity metrics for imbalanced datasets. Stat Anal Data Min: the ASA Data Science Journal 13(4):394\u2013404","journal-title":"Stat Anal Data Min: the ASA Data Science Journal"},{"issue":"2","key":"3793_CR33","doi-asserted-by":"publisher","first-page":"225","DOI":"10.1007\/s10994-013-5422-z","volume":"95","author":"MR Smith","year":"2014","unstructured":"Smith M R, Martinez T, Giraud-Carrier C (2014) An instance level analysis of data complexity. Mach Learn 95(2):225\u2013256","journal-title":"Mach Learn"},{"key":"3793_CR34","doi-asserted-by":"crossref","unstructured":"Tanwani A K, Farooq M (2009) Classification potential vs. classification accuracy: a comprehensive study of evolutionary algorithms with biomedical datasets. In: Learning classifier systems. Springer, pp 127\u2013144","DOI":"10.1007\/978-3-642-17508-4_9"},{"issue":"1","key":"3793_CR35","doi-asserted-by":"publisher","first-page":"1238","DOI":"10.2991\/ijcis.10.1.82","volume":"10","author":"I Triguero","year":"2017","unstructured":"Triguero I, Gonz\u00e1lez S, Moyano J M, Garc\u00eda S, Alcal\u00e1-Fdez J, Luengo J, Fern\u00e1ndez A, del Jes\u00fas MJ, S\u00e1nchez L, Herrera F (2017) Keel 3.0: an open source software for multi-stage analysis in data mining. 
Int J Comput Intell Syst 10(1):1238\u20131249","journal-title":"Int J Comput Intell Syst"},{"key":"3793_CR36","doi-asserted-by":"publisher","first-page":"47","DOI":"10.1016\/j.ins.2019.08.062","volume":"509","author":"P Vuttipittayamongkol","year":"2020","unstructured":"Vuttipittayamongkol P, Elyan E (2020) Neighbourhood-based undersampling approach for handling imbalanced and overlapped data. Inf Sci 509:47\u201370","journal-title":"Inf Sci"},{"key":"3793_CR37","doi-asserted-by":"publisher","first-page":"382","DOI":"10.1016\/j.future.2018.08.007","volume":"91","author":"S Wan","year":"2019","unstructured":"Wan S, Zhao Y, Wang T, Gu Z, Abbasi Q H, Choo K K R (2019) Multi-dimensional data indexing and range query processing via voronoi diagram for internet of things. Futur Gener Comput Syst 91:382\u2013391","journal-title":"Futur Gener Comput Syst"},{"key":"3793_CR38","unstructured":"Weitzman MS (1970) Measures of overlap of income distributions of white and Negro families in the United States, vol 22. US Bureau of the Census"},{"key":"3793_CR39","doi-asserted-by":"crossref","first-page":"204","DOI":"10.1016\/j.cam.2018.08.038","volume":"351","author":"X Zhang","year":"2019","unstructured":"Zhang X, Li R, Zhang B, Yang Y, Guo J, Ji X (2019) An instance-based learning recommendation algorithm of imbalance handling methods. 
Appl Math Comput 351:204\u2013218","journal-title":"Appl Math Comput"}],"container-title":["Applied Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-022-03793-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10489-022-03793-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-022-03793-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,16]],"date-time":"2023-03-16T02:28:09Z","timestamp":1678933689000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10489-022-03793-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,26]]},"references-count":39,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2023,4]]}},"alternative-id":["3793"],"URL":"https:\/\/doi.org\/10.1007\/s10489-022-03793-w","relation":{},"ISSN":["0924-669X","1573-7497"],"issn-type":[{"value":"0924-669X","type":"print"},{"value":"1573-7497","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,26]]},"assertion":[{"value":"21 May 2022","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 July 2022","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}