{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,29]],"date-time":"2026-06-29T11:48:08Z","timestamp":1782733688820,"version":"3.54.5"},"reference-count":66,"publisher":"Springer Science and Business Media LLC","issue":"7","license":[{"start":{"date-parts":[[2023,1,5]],"date-time":"2023-01-05T00:00:00Z","timestamp":1672876800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,1,5]],"date-time":"2023-01-05T00:00:00Z","timestamp":1672876800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100002386","name":"Cairo University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100002386","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2024,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Class imbalance occurs when the class distribution is not equal. Namely, one class is under-represented (minority class), and the other class has significantly more samples in the data (majority class). The class imbalance problem is prevalent in many real world applications. Generally, the under-represented minority class is the class of interest. The synthetic minority over-sampling technique (SMOTE) method is considered the most prominent method for handling unbalanced data. The SMOTE method generates new synthetic data patterns by performing linear interpolation between minority class samples and their K nearest neighbors. However, the SMOTE generated patterns do not necessarily conform to the original minority class distribution. This paper develops a novel theoretical analysis of the SMOTE method by deriving the probability distribution of the SMOTE generated samples. 
To the best of our knowledge, this is the first work deriving a mathematical formulation for the SMOTE patterns\u2019 probability distribution. This allows us to compare the density of the generated samples with the true underlying class-conditional density, in order to assess how representative the generated samples are. The derived formula is verified by computing it on a number of densities versus densities computed and estimated empirically.<\/jats:p>","DOI":"10.1007\/s10994-022-06296-4","type":"journal-article","created":{"date-parts":[[2023,1,5]],"date-time":"2023-01-05T18:02:42Z","timestamp":1672941762000},"page":"4903-4923","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":306,"title":["A theoretical distribution analysis of synthetic minority oversampling technique (SMOTE) for imbalanced learning"],"prefix":"10.1007","volume":"113","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5664-2457","authenticated-orcid":false,"given":"Dina","family":"Elreedy","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Amir F.","family":"Atiya","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Firuz","family":"Kamalov","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2023,1,5]]},"reference":[{"issue":"2013","key":"6296_CR1","first-page":"332","volume":"1","author":"SM Abd Elrahman","year":"2013","unstructured":"Abd Elrahman, S. M., & Abraham, A. (2013). A review of class imbalance problem. Journal of Network and Innovative Computing, 1(2013), 332\u2013340.","journal-title":"Journal of Network and Innovative Computing"},{"key":"6296_CR2","doi-asserted-by":"crossref","unstructured":"Ahsan, M., Gomes, R., & Denton, A. (2018). Smote implementation on phishing data to enhance cybersecurity. 
In 2018 IEEE International Conference on Electro\/Information Technology (EIT) (pp. 0531\u20130536). IEEE.","DOI":"10.1109\/EIT.2018.8500086"},{"issue":"70","key":"6296_CR3","doi-asserted-by":"crossref","first-page":"3489","DOI":"10.12988\/ams.2013.34221","volume":"7","author":"F Al-Sirehy","year":"2013","unstructured":"Al-Sirehy, F., & Fisher, B. (2013). Further results on the beta function and the incomplete beta function. Applied Mathematical Sciences, 7(70), 3489\u20133495.","journal-title":"Applied Mathematical Sciences"},{"issue":"2","key":"6296_CR4","doi-asserted-by":"crossref","first-page":"191","DOI":"10.12732\/ijam.v26i2.6","volume":"26","author":"F Al-Sirehy","year":"2013","unstructured":"Al-Sirehy, F., & Fisher, B. (2013). Results on the beta function and the incomplete beta function. International Journal of Applied Mathematics, 26(2), 191.","journal-title":"International Journal of Applied Mathematics"},{"issue":"1","key":"6296_CR5","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1007\/s13748-012-0034-6","volume":"2","author":"I Albisua","year":"2013","unstructured":"Albisua, I., Arbelaitz, O., Gurrutxaga, I., Lasarguren, A., Muguerza, J., & Perez, J. M. (2013). The quest for the optimal class distribution: An approach for enhancing the effectiveness of learning via resampling methods for imbalanced data sets. Progress in Artificial Intelligence, 2(1), 45\u201363.","journal-title":"Progress in Artificial Intelligence"},{"key":"6296_CR6","doi-asserted-by":"crossref","unstructured":"Atiya, A., Talaat, N., & Shaheen, S. (1997). An efficient stock market forecasting model using neural networks. In Proceedings of International Conference on Neural Networks (ICNN\u201997) (pp. 2112\u20132115). IEEE.","DOI":"10.1109\/ICNN.1997.614231"},{"key":"6296_CR7","doi-asserted-by":"crossref","unstructured":"Balogun, A.O., Lafenwa-Balogun, F. B., Mojeed, H. A., Adeyemo, V. E., Akande, O. N., Akintola, A. G., Bajeh, A. O., & Usman-Hamza, F. E. (2020). 
Smote-based homogeneous ensemble methods for software defect prediction. In International Conference on Computational Science and its Applications (pp. 615\u2013631). Springer","DOI":"10.1007\/978-3-030-58817-5_45"},{"issue":"3","key":"6296_CR8","doi-asserted-by":"crossref","first-page":"849","DOI":"10.1016\/S0031-3203(02)00257-1","volume":"36","author":"R Barandela","year":"2003","unstructured":"Barandela, R., S\u00e1nchez, J. S., Garc\u0131a, V., & Rangel, E. (2003). Strategies for learning in class imbalance problems. Pattern Recognition, 36(3), 849\u2013851.","journal-title":"Pattern Recognition"},{"issue":"1","key":"6296_CR9","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1145\/1007730.1007735","volume":"6","author":"G Batista","year":"2004","unstructured":"Batista, G., Prati, R., & Monard, M. (2004). A study of the behavior of several methods for balancing machine learning training data. ACM Sigkdd Explorations Newsletter, 6(1), 20\u201329.","journal-title":"ACM Sigkdd Explorations Newsletter"},{"issue":"2","key":"6296_CR10","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1007\/s10994-020-05913-4","volume":"110","author":"S Bej","year":"2021","unstructured":"Bej, S., Davtyan, N., Wolfien, M., Nassar, M., & Wolkenhauer, O. (2021). Loras: An oversampling approach for imbalanced datasets. Machine Learning, 110(2), 279\u2013301.","journal-title":"Machine Learning"},{"key":"6296_CR11","doi-asserted-by":"crossref","unstructured":"Bol\u00edvar, A., Garc\u00eda, V., Florencia, R., Alejo, R., Rivera, G., & Sanchez-Solis, J. P. (2022). A preliminary study of smote on imbalanced big datasets when dealing with sparse and dense high dimensionality. In Mexican Conference on Pattern Recognition (pp. 46\u201355). 
Springer.","DOI":"10.1007\/978-3-031-07750-0_5"},{"key":"6296_CR12","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1016\/j.neunet.2018.07.011","volume":"106","author":"M Buda","year":"2018","unstructured":"Buda, M., Maki, A., & Mazurowski, M. A. (2018). A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks, 106, 249\u2013259.","journal-title":"Neural Networks"},{"key":"6296_CR13","doi-asserted-by":"crossref","unstructured":"Bunkhumpornpat, C., Sinapiromsaran, K., & Lursinsap, C. (2009). Safe-level-smote: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 475\u2013482). Springer.","DOI":"10.1007\/978-3-642-01307-2_43"},{"key":"6296_CR14","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","volume":"16","author":"NV Chawla","year":"2002","unstructured":"Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). Smote: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321\u2013357.","journal-title":"Journal of Artificial Intelligence Research"},{"issue":"2","key":"6296_CR15","doi-asserted-by":"crossref","first-page":"2092","DOI":"10.1007\/s10489-021-02369-4","volume":"52","author":"VK Chennuru","year":"2022","unstructured":"Chennuru, V. K., & Timmappareddy, S. R. (2022). Simulated annealing based undersampling (SAUS): A hybrid multi-objective optimization method to tackle class imbalance. Applied Intelligence, 52(2), 2092\u20132110.","journal-title":"Applied Intelligence"},{"key":"6296_CR16","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2021.3136503","author":"D Dablain","year":"2022","unstructured":"Dablain, D., Krawczyk, B., & Chawla, N. V. (2022). Deepsmote: Fusing deep learning and smote for imbalanced data. IEEE Transactions on Neural Networks and Learning Systems. 
https:\/\/doi.org\/10.1109\/TNNLS.2021.3136503","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"6296_CR17","unstructured":"De\u00a0La\u00a0Calleja, J., & Fuentes, O. (2007). A distance-based over-sampling method for learning from imbalanced data sets. In FLAIRS Conference (pp. 634\u2013635)."},{"issue":"1","key":"6296_CR18","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1080\/0952813X.2020.1864783","volume":"34","author":"D Devi","year":"2022","unstructured":"Devi, D., Biswas, S. K., & Purkayastha, B. (2022). Correlation-based oversampling aided cost sensitive ensemble learning technique for treatment of class imbalance. Journal of Experimental & Theoretical Artificial Intelligence, 34(1), 143\u2013174.","journal-title":"Journal of Experimental & Theoretical Artificial Intelligence"},{"key":"6296_CR19","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1016\/j.neuroimage.2013.10.005","volume":"87","author":"R Dubey","year":"2014","unstructured":"Dubey, R., Zhou, J., Wang, Y., Thompson, P. M., Ye, J., & Alzheimer's Disease Neuroimaging Initiative (2014). Analysis of sampling techniques for imbalanced data: An n= 648 ADNI study. NeuroImage, 87, 220\u2013241.","journal-title":"NeuroImage"},{"issue":"1","key":"6296_CR20","doi-asserted-by":"crossref","first-page":"13","DOI":"10.32985\/ijeces.11.1.2","volume":"11","author":"M Dudjak","year":"2020","unstructured":"Dudjak, M., & Martinovi\u0107, G. (2020). In-depth performance analysis of smote-based oversampling algorithms in binary classification. International Journal of Electrical and Computer Engineering Systems, 11(1), 13\u201323.","journal-title":"International Journal of Electrical and Computer Engineering Systems"},{"key":"6296_CR21","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1007\/BF00327713","volume":"24","author":"J Dutka","year":"1981","unstructured":"Dutka, J. (1981). The incomplete beta function\u2014A historical profile. 
Archive for History of Exact Sciences, 24, 11\u201329.","journal-title":"Archive for History of Exact Sciences"},{"key":"6296_CR22","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1016\/j.ins.2019.07.070","volume":"505","author":"D Elreedy","year":"2019","unstructured":"Elreedy, D., & Atiya, A. F. (2019). A comprehensive analysis of synthetic minority oversampling technique (SMOTE) for handling class imbalance. Information Sciences, 505, 32\u201364.","journal-title":"Information Sciences"},{"issue":"7","key":"6296_CR23","doi-asserted-by":"crossref","first-page":"2839","DOI":"10.1007\/s00521-020-05130-z","volume":"33","author":"E Elyan","year":"2021","unstructured":"Elyan, E., Moreno-Garcia, C. F., & Jayne, C. (2021). Cdsmote: Class decomposition and synthetic minority class oversampling technique for imbalanced-data classification. Neural Computing and Applications, 33(7), 2839\u20132851.","journal-title":"Neural Computing and Applications"},{"key":"6296_CR24","doi-asserted-by":"crossref","first-page":"863","DOI":"10.1613\/jair.1.11192","volume":"61","author":"A Fern\u00e1ndez","year":"2018","unstructured":"Fern\u00e1ndez, A., Garcia, S., Herrera, F., & Chawla, N. V. (2018). Smote for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary. Journal of Artificial Intelligence Research, 61, 863\u2013905.","journal-title":"Journal of Artificial Intelligence Research"},{"issue":"103","key":"6296_CR25","first-page":"089","volume":"90","author":"S Fotouhi","year":"2019","unstructured":"Fotouhi, S., Asadi, S., & Kattan, M. W. (2019). A comprehensive data level analysis for cancer diagnosis on imbalanced data. Journal of Biomedical Informatics, 90(103), 089.","journal-title":"Journal of Biomedical Informatics"},{"issue":"3","key":"6296_CR26","doi-asserted-by":"crossref","first-page":"320","DOI":"10.1109\/TIT.1973.1055003","volume":"19","author":"K Fukunaga","year":"1973","unstructured":"Fukunaga, K., & Hostetler, L. (1973). 
Optimization of k nearest neighbor density estimates. IEEE Transactions on Information Theory, 19(3), 320\u2013326.","journal-title":"IEEE Transactions on Information Theory"},{"key":"6296_CR27","doi-asserted-by":"crossref","first-page":"107933","DOI":"10.1016\/j.asoc.2021.107933","volume":"113","author":"M Ganaie","year":"2021","unstructured":"Ganaie, M., Tanveer, M., & Alzheimer\u2019s Disease Neuroimaging Initiative (2021). Fuzzy least squares projection twin support vector machines for class imbalance learning. Applied Soft Computing, 113, 107933.","journal-title":"Applied Soft Computing"},{"key":"6296_CR28","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1016\/j.neucom.2014.02.006","volume":"138","author":"M Gao","year":"2014","unstructured":"Gao, M., Hong, X., Chen, S., et al. (2014). Pdfos: Pdf estimation based over-sampling for imbalanced two-class problems. Neurocomputing, 138, 248\u2013259.","journal-title":"Neurocomputing"},{"issue":"3","key":"6296_CR29","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1162\/evco.2009.17.3.275","volume":"17","author":"S Garc\u00eda","year":"2009","unstructured":"Garc\u00eda, S., & Herrera, F. (2009). Evolutionary undersampling for classification with imbalanced datasets: Proposals and taxonomy. Evolutionary Computation, 17(3), 275\u2013306.","journal-title":"Evolutionary Computation"},{"key":"6296_CR30","doi-asserted-by":"crossref","unstructured":"Garc\u00eda, V., S\u00e1nchez, J., & Mollineda, R. (2010). Exploring the performance of resampling strategies for the class imbalance problem. In Trends in applied intelligent systems (pp. 541\u2013549).","DOI":"10.1007\/978-3-642-13022-9_54"},{"issue":"4","key":"6296_CR31","first-page":"1","volume":"2","author":"J Goodman","year":"2022","unstructured":"Goodman, J., Sarkani, S., & Mazzuchi, T. (2022). Distance-based probabilistic data augmentation for synthetic minority oversampling. 
ACM\/IMS Transactions on Data Science (TDS), 2(4), 1\u201318.","journal-title":"ACM\/IMS Transactions on Data Science (TDS)"},{"key":"6296_CR32","doi-asserted-by":"crossref","unstructured":"Guo, G., Wang, H., Bell, D., Bi, Y., & Greer, K. (2003). KNN model-based approach in classification. In OTM Confederated International Conferences \u201cOn the Move to Meaningful Internet Systems\" (pp. 986\u2013996). Springer.","DOI":"10.1007\/978-3-540-39964-3_62"},{"issue":"114","key":"6296_CR33","first-page":"301","volume":"168","author":"A Guzm\u00e1n-Ponce","year":"2021","unstructured":"Guzm\u00e1n-Ponce, A., S\u00e1nchez, J. S., Valdovinos, R. M., & Marcial-Romero, J. R. (2021). DBIG-US: A two-stage under-sampling algorithm to face the class imbalance problem. Expert Systems with Applications, 168(114), 301.","journal-title":"Expert Systems with Applications"},{"key":"6296_CR34","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1016\/j.eswa.2016.12.035","volume":"73","author":"G Haixiang","year":"2017","unstructured":"Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue, H., & Bing, G. (2017). Learning from class-imbalanced data: Review of methods and applications. Expert Systems with Applications, 73, 220\u2013239.","journal-title":"Expert Systems with Applications"},{"key":"6296_CR35","doi-asserted-by":"crossref","unstructured":"Han, H., Wang, W. Y., & Mao, B. H. (2005). Borderline-smote: A new over-sampling method in imbalanced data sets learning. In International Conference on Intelligent Computing (pp. 878\u2013887). Springer.","DOI":"10.1007\/11538059_91"},{"key":"6296_CR36","unstructured":"He, H., Bai, Y., Garcia, E. A., & Li, S. A. (2008). Adasyn: Adaptive synthetic sampling approach for imbalanced learning. In IEEE International Joint Conference on Computational Intelligence, IJCNN (pp. 1322\u20131328). 
IEEE."},{"issue":"4","key":"6296_CR37","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1109\/5254.708428","volume":"13","author":"MA Hearst","year":"1998","unstructured":"Hearst, M. A., Dumais, S. T., Osuna, E., Platt, J., & Scholkopf, B. (1998). Support vector machines. IEEE Intelligent Systems and their Applications, 13(4), 18\u201328.","journal-title":"IEEE Intelligent Systems and their Applications"},{"key":"6296_CR38","doi-asserted-by":"crossref","unstructured":"Hu, S., Liang, Y., Ma, L., & He. Y. (2009). Msmote: Improving classification performance when training data is imbalanced. In 2009 Second International Workshop on Computer Science and Engineering (pp. 13\u201317). IEEE.","DOI":"10.1109\/WCSE.2009.756"},{"issue":"5","key":"6296_CR39","doi-asserted-by":"crossref","first-page":"429","DOI":"10.3233\/IDA-2002-6504","volume":"6","author":"N Japkowicz","year":"2002","unstructured":"Japkowicz, N., & Stephen, S. (2002). The class imbalance problem: A systematic study. Intelligent Data Analysis, 6(5), 429\u2013449.","journal-title":"Intelligent Data Analysis"},{"key":"6296_CR40","unstructured":"Kamalov, F., Atiya, A. F., & Elreedy, D. (2022). Partial resampling of imbalanced data. arXiv preprint arXiv:2207.04631"},{"issue":"4","key":"6296_CR41","first-page":"1","volume":"52","author":"H Kaur","year":"2019","unstructured":"Kaur, H., Pannu, H. S., & Malhi, A. K. (2019). A systematic review on imbalanced data challenges in machine learning: Applications and solutions. ACM Computing Surveys (CSUR), 52(4), 1\u201336.","journal-title":"ACM Computing Surveys (CSUR)"},{"key":"6296_CR42","doi-asserted-by":"publisher","DOI":"10.1007\/s13198-021-01174-z","author":"A Kishor","year":"2021","unstructured":"Kishor, A., & Chakraborty, C. (2021). Early and accurate prediction of diabetics based on FCBF feature selection and smote. International Journal of System Assurance Engineering and Management. 
https:\/\/doi.org\/10.1007\/s13198-021-01174-z","journal-title":"International Journal of System Assurance Engineering and Management"},{"issue":"11","key":"6296_CR43","doi-asserted-by":"crossref","first-page":"3059","DOI":"10.1007\/s10994-021-06012-8","volume":"110","author":"M Koziarski","year":"2021","unstructured":"Koziarski, M., Bellinger, C., & Wo\u017aniak, M. (2021). RB-CCR: Radial-based combined cleaning and resampling algorithm for imbalanced data classification. Machine Learning, 110(11), 3059\u20133093.","journal-title":"Machine Learning"},{"key":"6296_CR44","doi-asserted-by":"crossref","first-page":"114750","DOI":"10.1016\/j.eswa.2021.114750","volume":"175","author":"Z Li","year":"2021","unstructured":"Li, Z., Huang, M., Liu, G., & Jiang, C. (2021). A hybrid method with dynamic weighted entropy for handling the problem of class imbalance with overlap in credit card fraud detection. Expert Systems with Applications, 175, 114750.","journal-title":"Expert Systems with Applications"},{"issue":"1","key":"6296_CR45","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12911-021-01695-4","volume":"22","author":"L Liu","year":"2022","unstructured":"Liu, L., Wu, X., Li, S., Tan, S., & Bai, Y. (2022). Solving the class imbalance problem using ensemble algorithm: Application of screening for aortic dissection. BMC Medical Informatics and Decision Making, 22(1), 1\u201316.","journal-title":"BMC Medical Informatics and Decision Making"},{"issue":"10","key":"6296_CR46","doi-asserted-by":"crossref","first-page":"1909","DOI":"10.1007\/s00500-010-0625-8","volume":"15","author":"J Luengo","year":"2011","unstructured":"Luengo, J., Fern\u00e1ndez, A., Garc\u00eda, S., & Herrera, F. (2011). Addressing data complexity for imbalanced data sets: Analysis of smote-based oversampling and evolutionary undersampling. 
Soft Computing, 15(10), 1909\u20131936.","journal-title":"Soft Computing"},{"issue":"3","key":"6296_CR47","doi-asserted-by":"crossref","first-page":"497","DOI":"10.1109\/TNN.2002.1000120","volume":"13","author":"M Magdon-Ismail","year":"2002","unstructured":"Magdon-Ismail, M., & Atiya, A. (2002). Density estimation and random variate generation using multilayer networks. IEEE Transactions on Neural Networks, 13(3), 497\u2013520.","journal-title":"IEEE Transactions on Neural Networks"},{"key":"6296_CR48","doi-asserted-by":"crossref","first-page":"108217","DOI":"10.1016\/j.knosys.2022.108217","volume":"241","author":"S Mayabadi","year":"2022","unstructured":"Mayabadi, S., & Saadatfar, H. (2022). Two density-based sampling approaches for imbalanced and overlapping data. Knowledge-Based Systems, 241, 108217.","journal-title":"Knowledge-Based Systems"},{"key":"6296_CR49","doi-asserted-by":"crossref","first-page":"107222","DOI":"10.1016\/j.knosys.2021.107222","volume":"227","author":"N Moniz","year":"2021","unstructured":"Moniz, N., & Monteiro, H. (2021). No free lunch in imbalanced learning. Knowledge-Based Systems, 227, 107222.","journal-title":"Knowledge-Based Systems"},{"issue":"11","key":"6296_CR50","doi-asserted-by":"crossref","first-page":"5713","DOI":"10.1109\/TNNLS.2018.2812279","volume":"29","author":"SS Mullick","year":"2018","unstructured":"Mullick, S. S., Datta, S., & Das, S. (2018). Adaptive learning-based $$k$$-nearest neighbor classifiers with resilience to class imbalance. IEEE Transactions on Neural Networks and Learning Systems, 29(11), 5713\u20135725.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"issue":"1","key":"6296_CR51","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1504\/IJKESDP.2011.039875","volume":"3","author":"HM Nguyen","year":"2011","unstructured":"Nguyen, H. M., Cooper, E. W., & Kamei, K. (2011). Borderline over-sampling for imbalanced data classification. 
International Journal of Knowledge Engineering and Soft Data Paradigms, 3(1), 4\u201321.","journal-title":"International Journal of Knowledge Engineering and Soft Data Paradigms"},{"issue":"3","key":"6296_CR52","doi-asserted-by":"crossref","first-page":"1065","DOI":"10.1214\/aoms\/1177704472","volume":"33","author":"E Parzen","year":"1962","unstructured":"Parzen, E. (1962). On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33(3), 1065\u20131076.","journal-title":"The Annals of Mathematical Statistics"},{"key":"6296_CR53","doi-asserted-by":"crossref","unstructured":"Prati, R. C., Batista, G. E., & Monard, M. C. (2004). Learning with class skews and small disjuncts. In Brazilian Symposium on Artificial Intelligence (pp. 296\u2013306). Springer.","DOI":"10.1007\/978-3-540-28645-5_30"},{"issue":"1","key":"6296_CR54","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1145\/234313.234346","volume":"28","author":"JR Quinlan","year":"1996","unstructured":"Quinlan, J. R. (1996). Learning decision tree classifiers. ACM Computing Surveys (CSUR), 28(1), 71\u201372.","journal-title":"ACM Computing Surveys (CSUR)"},{"key":"6296_CR55","doi-asserted-by":"crossref","first-page":"832","DOI":"10.1214\/aoms\/1177728190","volume":"27","author":"M Rosenblatt","year":"1956","unstructured":"Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. The Annals of Mathematical Statistics, 27, 832\u2013837.","journal-title":"The Annals of Mathematical Statistics"},{"key":"6296_CR56","doi-asserted-by":"crossref","first-page":"429","DOI":"10.1016\/j.ins.2019.11.004","volume":"513","author":"F Thabtah","year":"2020","unstructured":"Thabtah, F., Hammoud, S., Kamalov, F., & Gonsalves, A. (2020). Data imbalance in classification: Experimental evaluation. 
Information Sciences, 513, 429\u2013441.","journal-title":"Information Sciences"},{"key":"6296_CR57","volume-title":"The theory of probability: Explorations and applications","author":"SS Venkatesh","year":"2013","unstructured":"Venkatesh, S. S. (2013). The theory of probability: Explorations and applications. Cambridge University Press."},{"key":"6296_CR58","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1016\/j.ins.2019.08.062","volume":"509","author":"P Vuttipittayamongkol","year":"2020","unstructured":"Vuttipittayamongkol, P., & Elyan, E. (2020). Neighbourhood-based undersampling approach for handling imbalanced and overlapped data. Information Sciences, 509, 47\u201370.","journal-title":"Information Sciences"},{"key":"6296_CR59","unstructured":"Wadsworth, G. P. (1960). Introduction to probability and random variables. Tech. rep."},{"key":"6296_CR60","doi-asserted-by":"crossref","unstructured":"Wan, Z., Zhang, Y., & He, H. (2017). Variational autoencoder based synthetic data generation for imbalanced learning. In 2017 IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 1\u20137). IEEE.","DOI":"10.1109\/SSCI.2017.8285168"},{"key":"6296_CR61","doi-asserted-by":"crossref","first-page":"64606","DOI":"10.1109\/ACCESS.2021.3074243","volume":"9","author":"L Wang","year":"2021","unstructured":"Wang, L., Han, M., Li, X., Zhang, N., & Cheng, H. (2021). Review of classification methods on unbalanced data sets. IEEE Access, 9, 64606\u201364628.","journal-title":"IEEE Access"},{"issue":"10","key":"6296_CR62","doi-asserted-by":"crossref","first-page":"4802","DOI":"10.1109\/TNNLS.2017.2771290","volume":"29","author":"S Wang","year":"2018","unstructured":"Wang, S., Minku, L. L., & Yao, X. (2018). A systematic study of online class imbalance learning with concept drift. 
IEEE Transactions on Neural Networks and Learning Systems, 29(10), 4802\u20134821.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"6296_CR63","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1613\/jair.1199","volume":"19","author":"GM Weiss","year":"2003","unstructured":"Weiss, G. M., & Provost, F. (2003). Learning when training data are costly: The effect of class distribution on tree induction. Journal of Artificial Intelligence Research, 19, 315\u2013354.","journal-title":"Journal of Artificial Intelligence Research"},{"key":"6296_CR64","unstructured":"Wu, X., & Meng, S. (2016). E-commerce customer churn prediction based on improved SMOTE and AdaBoost. In 2016 13th International Conference on Service Systems and Service Management (ICSSSM) (pp. 1\u20135). IEEE."},{"key":"6296_CR65","doi-asserted-by":"crossref","first-page":"116213","DOI":"10.1016\/j.eswa.2021.116213","volume":"191","author":"Y Yan","year":"2022","unstructured":"Yan, Y., Jiang, Y., Zheng, Z., Yu, C., Zhang, Y., & Zhang, Y. (2022). LDAS: Local density-based adaptive sampling for imbalanced data classification. Expert Systems with Applications, 191, 116213.","journal-title":"Expert Systems with Applications"},{"key":"6296_CR66","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1016\/j.inffus.2013.12.003","volume":"20","author":"H Zhang","year":"2014","unstructured":"Zhang, H., & Li, M. (2014). RWO-sampling: A random walk over-sampling approach to imbalanced data classification. 
Information Fusion, 20, 99\u2013116.","journal-title":"Information Fusion"}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-022-06296-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-022-06296-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-022-06296-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,31]],"date-time":"2024-05-31T18:44:12Z","timestamp":1717181052000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-022-06296-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,5]]},"references-count":66,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2024,7]]}},"alternative-id":["6296"],"URL":"https:\/\/doi.org\/10.1007\/s10994-022-06296-4","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,5]]},"assertion":[{"value":"11 July 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 November 2022","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 December 2022","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 January 2023","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors of this work declare no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"The presented work involves no experiments on individuals.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The experiments presented in this work do not involve individuals, neither humans nor animals.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}},{"value":"The presented work involves no experiments on individuals.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}}]}}