{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T07:10:11Z","timestamp":1777360211387,"version":"3.51.4"},"reference-count":33,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2025,5,30]],"date-time":"2025-05-30T00:00:00Z","timestamp":1748563200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,5,30]],"date-time":"2025-05-30T00:00:00Z","timestamp":1748563200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100005855","name":"Universidade Nova de Lisboa","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005855","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["SN COMPUT. SCI."],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Addressing the challenge of class imbalance in binary classification, this paper introduces Genetic Methods for OverSampling\u00a0(GM4OS), an innovative technique leveraging the combined capabilities of Genetic Algorithms\u00a0(GAs) and Genetic Programming\u00a0(GP). Traditional oversampling methods like SMOTE and its variants depend on the selected data points and fixed synthetic data generation processes, often leading to suboptimal results. GM4OS advances this field by simultaneously evolving a resampling set and a synthetic data generation function. Individuals in GM4OS are made of two components, the GA component selects minority class observations for resampling, while the GP component evolves functions to create synthetic observations. This dual evolution process aims to optimize both the selection of data points and the creation of synthetic samples, enhancing the performance of classifiers on imbalanced datasets. We studied the performance of GM4OS across ten different test datasets and against five oversampling approaches commonly used in the literature. The results highlight how GM4OS is able to outperform the baseline methods in three out of ten test datasets, improving the algorithm performance.<\/jats:p>","DOI":"10.1007\/s42979-025-04048-4","type":"journal-article","created":{"date-parts":[[2025,5,30]],"date-time":"2025-05-30T05:00:41Z","timestamp":1748581241000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["An Empirical Study of GM4OS for Imbalanced Binary Classification"],"prefix":"10.1007","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2925-527X","authenticated-orcid":false,"given":"Davide","family":"Farinati","sequence":"first","affiliation":[]},{"given":"Leonardo","family":"Vanneschi","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,5,30]]},"reference":[{"key":"4048_CR1","doi-asserted-by":"crossref","unstructured":"Batista GEAPA, Prati RC, Monard MC. A study of the behavior of several methods for balancing machine learning training data. SIGKDD Explor. 2004;6:20\u20139.","DOI":"10.1145\/1007730.1007735"},{"key":"4048_CR2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1007730.1007733","volume":"6","author":"N Chawla","year":"2004","unstructured":"Chawla N, Japkowicz N, Ko\u0142cz A. Editorial: Special issue on learning from imbalanced data sets. SIGKDD Explor. 2004;6:1\u20136. https:\/\/doi.org\/10.1145\/1007730.1007733.","journal-title":"SIGKDD Explor."},{"issue":"1","key":"4048_CR3","first-page":"321","volume":"16","author":"NV Chawla","year":"2002","unstructured":"Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. Smote: Synthetic minority over-sampling technique. J Artif Int Res. 2002;16(1):321\u201357.","journal-title":"J Artif Int Res."},{"key":"4048_CR4","doi-asserted-by":"publisher","unstructured":"Han H, Wang W-Y, Mao B-H. Borderline-smote: A new over-sampling method in imbalanced data sets learning. In: Proceedings of the 2005 international conference on advances in intelligent computing\u2014volume part I. ICIC\u201905. Springer, Berlin; 2005. p. 878\u2013887. https:\/\/doi.org\/10.1007\/11538059_91 .","DOI":"10.1007\/11538059_91"},{"key":"4048_CR5","doi-asserted-by":"crossref","unstructured":"He H, Bai Y, Garcia EA, Li S. Adasyn: adaptive synthetic sampling approach for imbalanced learning. In: 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence); 2008. p. 1322\u20131328.","DOI":"10.1109\/IJCNN.2008.4633969"},{"key":"4048_CR6","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/3927.001.0001","volume-title":"An introduction to genetic algorithms","author":"M Mitchell","year":"1996","unstructured":"Mitchell M. An introduction to genetic algorithms. Cambridge: MIT Press; 1996."},{"key":"4048_CR7","volume-title":"A field guide to genetic programming","author":"R Poli","year":"2008","unstructured":"Poli R, Langdon WB, McPhee NF. A field guide to genetic programming. Morrisville: Lulu Enterprises UK Ltd; 2008."},{"key":"4048_CR8","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1007\/978-3-031-56852-7_5","volume-title":"Applications of evolutionary computation","author":"D Farinati","year":"2024","unstructured":"Farinati D, Vanneschi L. Gm4os: an evolutionary oversampling approach for imbalanced binary classification tasks. In: Smith S, Correia J, Cintrano C, editors. Applications of evolutionary computation. Cham: Springer; 2024. p. 68\u201382."},{"issue":"3","key":"4048_CR9","doi-asserted-by":"publisher","first-page":"299","DOI":"10.1109\/TKDE.2005.50","volume":"17","author":"J Huang","year":"2005","unstructured":"Huang J, Ling CX. Using AUC and accuracy in evaluating learning algorithms. IEEE Trans Knowl Data Eng. 2005;17(3):299\u2013310. https:\/\/doi.org\/10.1109\/TKDE.2005.50.","journal-title":"IEEE Trans Knowl Data Eng."},{"key":"4048_CR10","first-page":"176","volume":"7","author":"A Ali","year":"2015","unstructured":"Ali A, Shamsuddin SM, Ralescu A. Classification with class imbalance problem: a review. Int J Adv Soft Comput Appl. 2015;7:176\u2013204.","journal-title":"Int J Adv Soft Comput Appl"},{"key":"4048_CR11","doi-asserted-by":"publisher","first-page":"118","DOI":"10.1016\/j.ins.2019.06.007","volume":"501","author":"G Douzas","year":"2019","unstructured":"Douzas G, Bacao F. Geometric smote a geometrically enhanced drop-in replacement for smote. Inf Sci. 2019;501:118\u201335. https:\/\/doi.org\/10.1016\/j.ins.2019.06.007.","journal-title":"Inf Sci."},{"issue":"1","key":"4048_CR12","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1504\/IJKESDP.2011.039875","volume":"3","author":"HM Nguyen","year":"2011","unstructured":"Nguyen HM, Cooper EW, Kamei K. Borderline over-sampling for imbalanced data classification. Int J Knowl Eng Soft Data Paradigm. 2011;3(1):4\u201321. https:\/\/doi.org\/10.1504\/IJKESDP.2011.039875.","journal-title":"Int J Knowl Eng Soft Data Paradigm."},{"key":"4048_CR13","doi-asserted-by":"publisher","unstructured":"Evgeniou T, Pontil M. Support vector machines: theory and applications. In: Advanced course on artificial intelligence, vol. 2049; 2001. p. 249\u201357. https:\/\/doi.org\/10.1007\/3-540-44673-7_12.","DOI":"10.1007\/3-540-44673-7_12"},{"key":"4048_CR14","doi-asserted-by":"publisher","unstructured":"Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. In: Advances in neural information processing systems; 2014. pp. 2672\u20132680.https:\/\/doi.org\/10.48550\/arXiv.1406.2661","DOI":"10.48550\/arXiv.1406.2661"},{"key":"4048_CR15","doi-asserted-by":"publisher","unstructured":"Kingma DP, Welling M. Auto-encoding variational Bayes. arXiv preprint https:\/\/doi.org\/10.48550\/arXiv.1312.6114; 2013","DOI":"10.48550\/arXiv.1312.6114"},{"key":"4048_CR16","doi-asserted-by":"publisher","unstructured":"Frid-Adar M, Klang E, Amitai M, Goldberger J, Greenspan H, Synthetic data augmentation using GAN for improved liver lesion classification. In: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI). IEEE. 2018.p. 04\u201307. https:\/\/doi.org\/10.1109\/ISBI.2018.8363576.","DOI":"10.1109\/ISBI.2018.8363576"},{"key":"4048_CR17","doi-asserted-by":"publisher","first-page":"464","DOI":"10.1016\/j.eswa.2017.09.030","volume":"91","author":"G Douzas","year":"2018","unstructured":"Douzas G, Bacao F. Effective data generation for imbalanced learning using conditional generative adversarial networks. Expert Syst Appl. 2018;91:464\u201371. https:\/\/doi.org\/10.1016\/j.eswa.2017.09.030.","journal-title":"Expert Syst Appl."},{"issue":"4","key":"4048_CR18","doi-asserted-by":"publisher","first-page":"1349","DOI":"10.28991\/ESJ-2023-07-04-021","volume":"7","author":"F Frank","year":"2023","unstructured":"Frank F, Bacao F. Advanced genetic programming vs. state-of-the-art AutoML in imbalanced binary classification. Emerg Sci J. 2023;7(4):1349\u201363. https:\/\/doi.org\/10.28991\/ESJ-2023-07-04-021.","journal-title":"Emerg Sci J."},{"key":"4048_CR19","doi-asserted-by":"publisher","first-page":"1021","DOI":"10.1080\/0952813X.2022.2120087","volume":"36","author":"A Kumar","year":"2024","unstructured":"Kumar A. A new fitness function in genetic programming for classification of imbalanced data. J Exp Theor Artif Intell. 2024;36:1021\u201333. https:\/\/doi.org\/10.1080\/0952813X.2022.2120087.","journal-title":"J Exp Theor Artif Intell."},{"key":"4048_CR20","doi-asserted-by":"publisher","unstructured":"Pei W, Xue B, Shang L, Zhang M. New fitness functions in genetic programming for classification with high-dimensional unbalanced data. In: 2019 IEEE Congress on evolutionary computation (CEC); 2019. p. 2779\u20132786. https:\/\/doi.org\/10.1109\/CEC.2019.8789974","DOI":"10.1109\/CEC.2019.8789974"},{"key":"4048_CR21","unstructured":"Karia V, Zhang W, Naeim A, Ramezani R. GenSample: a genetic algorithm for oversampling in imbalanced datasets\u2019 2019."},{"key":"4048_CR22","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1007\/978-3-030-16670-0_14","volume-title":"Genetic program","author":"I Azzali","year":"2019","unstructured":"Azzali I, Vanneschi L, Silva S, Bakurov I, Giacobini M. A vectorial approach to genetic programming. In: Sekanina L, Hu T, Louren\u00e7o N, Richter H, Garc\u00eda-S\u00e1nchez P, editors. Genetic program. Cham: Springer; 2019. p. 213\u201327."},{"key":"4048_CR23","doi-asserted-by":"publisher","first-page":"106097","DOI":"10.1016\/j.asoc.2020.106097","volume":"89","author":"I Azzali","year":"2020","unstructured":"Azzali I, Vanneschi L, Bakurov I, Silva S, Ivaldi M, Giacobini M. Towards the use of vector based GP to predict physiological time series. Appl Soft Comput. 2020;89:106097. https:\/\/doi.org\/10.1016\/j.asoc.2020.106097.","journal-title":"Appl Soft Comput."},{"key":"4048_CR24","doi-asserted-by":"publisher","unstructured":"Quan W, Soule T. A study of the role of single node mutation in genetic programming. In: Genetic and evolutionary computation\u2014GECCO 2004. Berlin: Springer; 2004. p. 717\u2013718. https:\/\/doi.org\/10.1007\/978-3-540-24855-2_84","DOI":"10.1007\/978-3-540-24855-2_84"},{"key":"4048_CR25","unstructured":"Crossover operators in genetic algorithms: a review. 2025. https:\/\/www.researchgate.net\/publication\/288749263_CROSSOVER_OPERATORS_IN_GENETIC_ALGORITHMS_A_REVIEW"},{"key":"4048_CR26","unstructured":"GA one-point mutation of a numerical string; 2025. https:\/\/1library.net\/article\/ga-one-point-mutation-of-a-numerical-string.q0vvpl3z"},{"key":"4048_CR27","doi-asserted-by":"crossref","unstructured":"Romano JD, Le TT, La\u00a0Cava W, Gregg JT, Goldberg DJ, Chakraborty P, Ray NL, Himmelstein D, Fu W, Moore JH. Pmlb v1.0: an open source dataset collection for benchmarking machine learning methods. arXiv preprint arXiv:2012.00058v2; 2021.","DOI":"10.1093\/bioinformatics\/btab727"},{"key":"4048_CR28","doi-asserted-by":"publisher","first-page":"874","DOI":"10.2307\/2530946","volume":"40","author":"L Breiman","year":"1984","unstructured":"Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. Biometrics. 1984;40:874.","journal-title":"Biometrics."},{"issue":"2","key":"4048_CR29","doi-asserted-by":"publisher","first-page":"215","DOI":"10.1111\/j.2517-6161.1958.tb00292.x","volume":"20","author":"DR Cox","year":"1958","unstructured":"Cox DR. The regression analysis of binary sequences. J R Stat Soc Ser B (Methodol). 1958;20(2):215\u201332.","journal-title":"J R Stat Soc Ser B (Methodol)."},{"key":"4048_CR30","unstructured":"Elor Y, Averbuch-Elor H. To SMOTE, or Not to SMOTE?"},{"key":"4048_CR31","doi-asserted-by":"publisher","unstructured":"Chen T, Guestrin C. Xgboost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. KDD \u201916. ACM; 2016.https:\/\/doi.org\/10.1145\/2939672.2939785.","DOI":"10.1145\/2939672.2939785"},{"key":"4048_CR32","unstructured":"Ferrer L. Analysis and comparison of classification metrics; 2023."},{"issue":"1","key":"4048_CR33","doi-asserted-by":"publisher","first-page":"122","DOI":"10.1214\/aoms\/1177704248","volume":"34","author":"TW Anderson","year":"1963","unstructured":"Anderson TW. Asymptotic theory for principal component analysis. Ann Math Stat. 1963;34(1):122\u201348. https:\/\/doi.org\/10.1214\/aoms\/1177704248.","journal-title":"Ann Math Stat."}],"container-title":["SN Computer Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s42979-025-04048-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s42979-025-04048-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s42979-025-04048-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,30]],"date-time":"2025-05-30T05:00:45Z","timestamp":1748581245000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s42979-025-04048-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,30]]},"references-count":33,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2025,6]]}},"alternative-id":["4048"],"URL":"https:\/\/doi.org\/10.1007\/s42979-025-04048-4","relation":{},"ISSN":["2661-8907"],"issn-type":[{"value":"2661-8907","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,30]]},"assertion":[{"value":"25 June 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 May 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 May 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"On behalf of all authors, the corresponding author states that there is no Conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"510"}}