{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T15:24:46Z","timestamp":1772119486996,"version":"3.50.1"},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2024,1,12]],"date-time":"2024-01-12T00:00:00Z","timestamp":1705017600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,1,12]],"date-time":"2024-01-12T00:00:00Z","timestamp":1705017600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100007157","name":"Instituto Polit\u00e9cnico do Porto","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100007157","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Knowl Inf Syst"],"published-print":{"date-parts":[[2024,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Imbalanced data are present in various business sectors and must be handled with the proper resampling methods and classification algorithms. To handle imbalanced data, there are numerous resampling and learning method combinations; nonetheless, their effective use necessitates specialised knowledge. In this paper, several approaches, ranging from more accessible to more advanced in the domain of data resampling techniques, will be considered to handle imbalanced data. The application developed delivers recommendations of the most suitable combinations of techniques for a specific dataset by extracting and comparing dataset meta-feature values recorded in a knowledge base. It facilitates effortless classification and automates part of the machine learning pipeline with comparable or better results than state-of-the-art solutions and with a much smaller execution time.<\/jats:p>","DOI":"10.1007\/s10115-023-02046-7","type":"journal-article","created":{"date-parts":[[2024,1,12]],"date-time":"2024-01-12T09:02:17Z","timestamp":1705050137000},"page":"2747-2767","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["An automated approach for binary classification on imbalanced data"],"prefix":"10.1007","volume":"66","author":[{"given":"Pedro Marques","family":"Vieira","sequence":"first","affiliation":[]},{"given":"F\u00e1tima","family":"Rodrigues","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,1,12]]},"reference":[{"issue":"2","key":"2046_CR1","doi-asserted-by":"publisher","first-page":"151","DOI":"10.2478\/fcds-2019-0009","volume":"44","author":"M Lango","year":"2019","unstructured":"Lango M (2019) Tackling the problem of class imbalance in multi-class sentiment classification: an experimental study. Found Comput Decis Sci 44(2):151\u2013178. https:\/\/doi.org\/10.2478\/fcds-2019-0009","journal-title":"Found Comput Decis Sci"},{"issue":"4","key":"2046_CR2","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1007\/s13748-016-0094-0","volume":"5","author":"B Krawczyk","year":"2016","unstructured":"Krawczyk B (2016) Learning from imbalanced data: open challenges and future directions. Prog Artif Intell 5(4):221\u2013232. https:\/\/doi.org\/10.1007\/s13748-016-0094-0","journal-title":"Prog Artif Intell"},{"key":"2046_CR3","doi-asserted-by":"publisher","unstructured":"Fern\u00e1ndez A, Garc\u00eda S, Galar M, Prati RC, Krawczyk B, Herrera F (2018) Learning from imbalanced data sets, vol 10. Springer. https:\/\/doi.org\/10.1007\/978-3-319-98074-4","DOI":"10.1007\/978-3-319-98074-4"},{"issue":"2","key":"2046_CR4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2907070","volume":"49","author":"P Branco","year":"2016","unstructured":"Branco P, Torgo L, Ribeiro RP (2016) A survey of predictive modeling on imbalanced domains. ACM Comput Surv (CSUR) 49(2):1\u201350. https:\/\/doi.org\/10.1145\/2907070","journal-title":"ACM Comput Surv (CSUR)"},{"key":"2046_CR5","doi-asserted-by":"publisher","first-page":"220","DOI":"10.1016\/j.eswa.2016.12.035","volume":"73","author":"G Haixiang","year":"2017","unstructured":"Haixiang G, Yijing L, Shang J, Mingyun G, Yuanyue H, Bing G (2017) Learning from class-imbalanced data: review of methods and applications. Expert Syst Appl 73:220\u2013239. https:\/\/doi.org\/10.1016\/j.eswa.2016.12.035","journal-title":"Expert Syst Appl"},{"key":"2046_CR6","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1613\/jair.953","volume":"16","author":"NV Chawla","year":"2002","unstructured":"Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) Smote: synthetic minority over-sampling technique. J Artif Intell Res (JAIR) 16:321\u2013357. https:\/\/doi.org\/10.1613\/jair.953","journal-title":"J Artif Intell Res (JAIR)"},{"key":"2046_CR7","doi-asserted-by":"publisher","unstructured":"Chaplot A, Choudhary N, Jain K (2019) A review on data level approaches for managing imbalanced classification problem. Int J Sci Res Sci Eng Technol 6(2):91-97. https:\/\/doi.org\/10.32628\/IJSRSET196225","DOI":"10.32628\/IJSRSET196225"},{"key":"2046_CR8","doi-asserted-by":"publisher","first-page":"204","DOI":"10.1016\/j.amc.2018.12.020","volume":"351","author":"X Zhang","year":"2019","unstructured":"Zhang X, Li R, Zhang B, Yang Y, Guo J, Ji X (2019) An instance-based learning recommendation algorithm of imbalance handling methods. Appl Math Comput 351:204\u2013218. https:\/\/doi.org\/10.1016\/j.amc.2018.12.020","journal-title":"Appl Math Comput"},{"issue":"3","key":"2046_CR9","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1609\/aimag.v17i3.1230","volume":"17","author":"U Fayyad","year":"1996","unstructured":"Fayyad U, Piatetsky-Shapiro G, Smyth P (1996) From data mining to knowledge discovery in databases. AI Mag 17(3):37. https:\/\/doi.org\/10.1609\/aimag.v17i3.1230","journal-title":"AI Mag"},{"key":"2046_CR10","doi-asserted-by":"publisher","DOI":"10.1613\/jair.1.11854","author":"MA Z\u00f6ller","year":"2021","unstructured":"Z\u00f6ller MA, Huber MF (2021) Benchmark and survey of automated machine learning frameworks. J Artif Intell Res. https:\/\/doi.org\/10.1613\/jair.1.11854","journal-title":"J Artif Intell Res"},{"key":"2046_CR11","doi-asserted-by":"publisher","unstructured":"Tuggener L, Amirian M, Rombach K, L\u00f6rwald S, Varlet A, Westermann C, Stadelmann T (2019) Automated machine learning in practice: state of the art and recent results. In: 6th Swiss Conference on Data Science (SDS), pp 31-36. IEEE. https:\/\/doi.org\/10.21256\/zhaw-3156","DOI":"10.21256\/zhaw-3156"},{"key":"2046_CR12","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-05318-5","volume-title":"Automated machine learning: methods, systems, challenges","author":"F Hutter","year":"2019","unstructured":"Hutter F, Kotthoff L, Vanschoren J (2019) Automated machine learning: methods, systems, challenges. Springer Nature, New York. https:\/\/doi.org\/10.1007\/978-3-030-05318-5"},{"key":"2046_CR13","doi-asserted-by":"publisher","unstructured":"Vanschoren J (2018) Meta-learning: a survey. https:\/\/doi.org\/10.48550\/arXiv.1810.03548","DOI":"10.48550\/arXiv.1810.03548"},{"key":"2046_CR14","doi-asserted-by":"publisher","unstructured":"Thornton C, Hutter F, Hoos H, Leyton-Brown K (2013) Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In: ACM International Conference on Knowledge Discovery and Data Mining, pp 847\u2013855. https:\/\/doi.org\/10.1145\/2487575.2487629","DOI":"10.1145\/2487575.2487629"},{"key":"2046_CR15","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1016\/j.neucom.2014.12.100","volume":"176","author":"L Garcia","year":"2016","unstructured":"Garcia L, Carvalho A, Lorena A (2016) Noise detection in the meta-learning level. Neurocomputing 176:14\u201325. https:\/\/doi.org\/10.1016\/j.neucom.2014.12.100","journal-title":"Neurocomputing"},{"key":"2046_CR16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.eswa.2017.01.013","volume":"75","author":"AR Parmezan","year":"2017","unstructured":"Parmezan AR, Lee HD, Wu FC (2017) Metalearning for choosing feature selection algorithms in data mining: proposal of a new framework. Expert Syst Appl 75:1\u201324. https:\/\/doi.org\/10.1016\/j.eswa.2017.01.013","journal-title":"Expert Syst Appl"},{"key":"2046_CR17","doi-asserted-by":"publisher","unstructured":"Shen Z, Chen X, Garibaldi JM (2020) A novel meta learning framework for feature selection using data synthesis and fuzzy similarity. In: IEEE international conference on fuzzy systems (FUZZ-IEEE), pp 1\u20138. https:\/\/doi.org\/10.1109\/FUZZ48607.2020.9177769","DOI":"10.1109\/FUZZ48607.2020.9177769"},{"key":"2046_CR18","doi-asserted-by":"publisher","DOI":"10.3837\/tiis.2023.07.002","author":"I Khan","year":"2023","unstructured":"Khan I, Zhang X, Ayyasamy RK, Ali R (2023) AutoFe-Sel: a meta-learning based methodology for recommending feature subset selection algorithms. KSII Trans Internet Inform Syst. https:\/\/doi.org\/10.3837\/tiis.2023.07.002","journal-title":"KSII Trans Internet Inform Syst"},{"key":"2046_CR19","doi-asserted-by":"publisher","unstructured":"Moniz N, Cerqueira V. Automated imbalanced classification via meta-learning. Expert Syst Appl 178:115011 .https:\/\/doi.org\/10.1016\/j.eswa.2021.115011","DOI":"10.1016\/j.eswa.2021.115011"},{"key":"2046_CR20","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2020.106622","volume":"212","author":"X He","year":"2021","unstructured":"He X, Zhao K, Chu X (2021) AutoML: a survey of the state-of-the-art. Knowl-Based Syst 212:106622. https:\/\/doi.org\/10.1016\/j.knosys.2020.106622","journal-title":"Knowl-Based Syst"},{"key":"2046_CR21","unstructured":"M. Feurer, K. Eggensperger, S. Falkner, M. Lindauer, and F. Hutter, \u2018Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning\u2019, 2020, http:\/\/arxiv.org\/abs\/2007.04074 accessed: Feb. 13, 2022"},{"key":"2046_CR22","doi-asserted-by":"publisher","unstructured":"Olson, R.S., Bartley, N., Urbanowicz, R.J. and Moore, J.H., Evaluation of a tree-based pipeline optimisation tool for automating data science. In Proceedings of the genetic and evolutionary computation conference pp. 485-492, 2016. https:\/\/doi.org\/10.1145\/2908812.2908918","DOI":"10.1145\/2908812.2908918"},{"key":"2046_CR23","unstructured":"LeDell E, Poirier S (2020) H2o automl: Scalable automatic machine learning. In Proceedings of the AutoML Workshop at ICML (Vol. 2020). ICML. https:\/\/www.automl.org\/wp-content\/uploads\/2020\/07\/AutoML_2020_paper_61.pdf"},{"key":"2046_CR24","doi-asserted-by":"publisher","unstructured":"Gijsbers P, Bueno M L, Coors S, LeDell E, Poirier S, Thomas J, Vanschoren J (2022). Amlb: an automl benchmark. arXiv preprint. https:\/\/doi.org\/10.48550\/arXiv.2207.12560","DOI":"10.48550\/arXiv.2207.12560"},{"key":"2046_CR25","unstructured":"P. Vieira, PedroVieira1160634\/automated-imbalanced-classification: Automated Imbalanced Classification. https:\/\/github.com\/PedroVieira1160634\/automated-imbalanced-classification accessed Sep. 10, 2022"},{"key":"2046_CR26","unstructured":"GNU General Public License v3.0 - Project GNU - Free Software Foundation https:\/\/www.gnu.org\/licenses\/gpl-3.0.html accessed Sep. 10, 2022"},{"key":"2046_CR27","unstructured":"UCI Machine Learning Repository https:\/\/archive.ics.uci.edu\/ accessed Aug. 01, 2023"},{"key":"2046_CR28","unstructured":"KEEL: A software tool to assess evolutionary algorithms for Data Mining problems (regression, classification, clustering, pattern mining and so on) https:\/\/sci2s.ugr.es\/keel\/datasets.php accessed Feb. 14, 2022"},{"key":"2046_CR29","unstructured":"Find Open Datasets and Machine Learning Projects - Kaggle https:\/\/www.kaggle.com\/datasets accessed Feb. 14, 2022"},{"key":"2046_CR30","unstructured":"Dataset Search https:\/\/datasetsearch.research.google.com\/ accessed Feb. 14, 2022"},{"key":"2046_CR31","unstructured":"OpenML APIs - OpenML Documentation https:\/\/docs.openml.org\/APIs\/ accessed Jul. 30, 2022"},{"key":"2046_CR32","doi-asserted-by":"publisher","unstructured":"Rivolli A, Garcia L P, Soares C, Vanschoren J, Carvalho A C (2018) Characterizing classification datasets: a study of meta-features for meta-learning. arXiv preprint. https:\/\/doi.org\/10.48550\/arXiv.1808.10406","DOI":"10.48550\/arXiv.1808.10406"},{"key":"2046_CR33","unstructured":"The PyMFE example gallery \u2013 pymfe 0.4.1 documentation https:\/\/pymfe.readthedocs.io\/en\/latest\/auto_examples\/index.html accessed Aug. 20, 2022"},{"key":"2046_CR34","doi-asserted-by":"publisher","unstructured":"Gaudreault J G, Branco P, Gama J (2021) An analysis of performance metrics for imbalanced classification. In International Conference on Discovery Science (pp. 67-77). Cham: Springer International Publishing. https:\/\/doi.org\/10.1007\/978-3-030-88942-5_6","DOI":"10.1007\/978-3-030-88942-5_6"},{"issue":"10","key":"2046_CR35","doi-asserted-by":"publisher","first-page":"12049","DOI":"10.1007\/s10489-021-03041-7","volume":"52","author":"IM De Diego","year":"2022","unstructured":"De Diego IM, Redondo AR, Fern\u00e1ndez RR, Navarro J, Moguerza JM (2022) General Performance Score for classification problems. Appl Intell 52(10):12049\u201312063. https:\/\/doi.org\/10.1007\/s10489-021-03041-7","journal-title":"Appl Intell"},{"key":"2046_CR36","doi-asserted-by":"publisher","unstructured":"Brodersen K H, Ong C S, Stephan K E, Buhmann J M (2010) The balanced accuracy and its posterior distribution. In 20th international conference on pattern recognition (pp. 3121-3124). IEEE. https:\/\/doi.org\/10.1109\/ICPR.2010.764","DOI":"10.1109\/ICPR.2010.764"},{"issue":"1","key":"2046_CR37","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1016\/j.patrec.2008.08.010","volume":"30","author":"C Ferri","year":"2009","unstructured":"Ferri C, Hern\u00e1ndez-Orallo J, Modroiu R (2009) An experimental comparison of performance measures for classification. Pattern Recogn Lett 30(1):27\u201338. https:\/\/doi.org\/10.1016\/j.patrec.2008.08.010","journal-title":"Pattern Recogn Lett"},{"issue":"8","key":"2046_CR38","doi-asserted-by":"publisher","first-page":"861","DOI":"10.1016\/j.patrec.2005.10.010","volume":"27","author":"T Fawcett","year":"2006","unstructured":"Fawcett T (2006) An introduction to ROC analysis. Pattern Recogn Lett 27(8):861\u2013874. https:\/\/doi.org\/10.1016\/j.patrec.2005.10.010","journal-title":"Pattern Recogn Lett"},{"issue":"7","key":"2046_CR39","doi-asserted-by":"publisher","first-page":"1145","DOI":"10.1016\/S0031-3203(96)00142-2","volume":"30","author":"AP Bradley","year":"1997","unstructured":"Bradley AP (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn 30(7):1145\u20131159. https:\/\/doi.org\/10.1016\/S0031-3203(96)00142-2","journal-title":"Pattern Recogn"},{"issue":"1","key":"2046_CR40","doi-asserted-by":"publisher","first-page":"168","DOI":"10.1016\/j.aci.2018.08.003","volume":"17","author":"A Tharwat","year":"2020","unstructured":"Tharwat A (2020) Classification assessment methods. Applied computing and informatics 17(1):168\u2013192. https:\/\/doi.org\/10.1016\/j.aci.2018.08.003","journal-title":"Applied computing and informatics"},{"key":"2046_CR41","doi-asserted-by":"crossref","unstructured":"McHugh ML. Interrater reliability: the kappa statistic. Biochem Med (Zagreb). 2012;22(3):276-82. PMID: 23092060; PMCID: PMC3900052. https:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC3900052\/","DOI":"10.11613\/BM.2012.031"},{"key":"2046_CR42","unstructured":"Imbalanced-learn documentation \u2013 Version 0.9.1 https:\/\/imbalanced-learn.org\/stable\/ accessed Sep. 10, 2022"}],"container-title":["Knowledge and Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-023-02046-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10115-023-02046-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-023-02046-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,30]],"date-time":"2024-04-30T20:11:10Z","timestamp":1714507870000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10115-023-02046-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,12]]},"references-count":42,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2024,5]]}},"alternative-id":["2046"],"URL":"https:\/\/doi.org\/10.1007\/s10115-023-02046-7","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-3015970\/v1","asserted-by":"object"}]},"ISSN":["0219-1377","0219-3116"],"issn-type":[{"value":"0219-1377","type":"print"},{"value":"0219-3116","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,12]]},"assertion":[{"value":"2 June 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 September 2023","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 December 2023","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 January 2024","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no competing interests to declare.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}