{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,4]],"date-time":"2025-12-04T18:49:47Z","timestamp":1764874187652,"version":"3.37.3"},"reference-count":86,"publisher":"Springer Science and Business Media LLC","issue":"10","license":[{"start":{"date-parts":[[2024,7,8]],"date-time":"2024-07-08T00:00:00Z","timestamp":1720396800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,7,8]],"date-time":"2024-07-08T00:00:00Z","timestamp":1720396800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2024,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In this paper, we propose systematic approaches for learning imbalanced data based on a two-regime process: regime 0, which generates excess zeros (majority class), and regime 1, which contributes to generating an outcome of one (minority class). The proposed model contains two latent equations: a split probit (logit) equation in the first stage and an ordinary probit (logit) equation in the second stage. Because boosting improves the accuracy of prediction versus using a single classifier, we combined a boosting strategy with the two-regime process. Thus, we developed the zero-inflated probit boost (ZIPBoost) and zero-inflated logit boost (ZILBoost) methods. We show that the weight functions of ZIPBoost have the desired properties for good predictive performance. Like AdaBoost, the weight functions upweight misclassified examples and downweight correctly classified examples. We show that the weight functions of ZILBoost have similar properties to those of LogitBoost. 
The algorithm will focus more on examples that are hard to classify in the next iteration, resulting in improved prediction accuracy. We provide the relative performance of ZIPBoost and ZILBoost, which rely on the excess kurtosis of the data distribution. Furthermore, we show the convergence and time complexity of our proposed methods. We demonstrate the performance of our proposed methods using a Monte Carlo simulation, mergers and acquisitions (M&amp;A) data application, and imbalanced datasets from the Keel repository. The results of the experiments show that our proposed methods yield better prediction accuracy compared to other learning algorithms.<\/jats:p>","DOI":"10.1007\/s10994-024-06558-3","type":"journal-article","created":{"date-parts":[[2024,7,8]],"date-time":"2024-07-08T21:01:30Z","timestamp":1720472490000},"page":"8233-8299","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["A systematic approach for learning imbalanced data: enhancing zero-inflated models through boosting"],"prefix":"10.1007","volume":"113","author":[{"given":"Yeasung","family":"Jeong","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9108-9554","authenticated-orcid":false,"given":"Kangbok","family":"Lee","sequence":"additional","affiliation":[]},{"given":"Young Woong","family":"Park","sequence":"additional","affiliation":[]},{"given":"Sumin","family":"Han","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,7,8]]},"reference":[{"key":"6558_CR1","doi-asserted-by":"publisher","first-page":"145","DOI":"10.1016\/j.jmva.2016.08.007","volume":"152","author":"F Alashwali","year":"2016","unstructured":"Alashwali, F., & Kent, J. T. (2016). The use of a common location measure in the invariant coordinate selection and projection pursuit. Journal of Multivariate Analysis, 152, 145\u2013161. 
https:\/\/doi.org\/10.1016\/j.jmva.2016.08.007","journal-title":"Journal of Multivariate Analysis"},{"issue":"1","key":"6558_CR2","doi-asserted-by":"publisher","first-page":"659","DOI":"10.1016\/j.amc.2006.05.116","volume":"183","author":"DKR Babajee","year":"2006","unstructured":"Babajee, D. K. R., & Dauhoo, M. Z. (2006). An analysis of the properties of the variants of Newton\u2019s method with third order convergence. Applied Mathematics and Computation, 183(1), 659\u2013684. https:\/\/doi.org\/10.1016\/j.amc.2006.05.116","journal-title":"Applied Mathematics and Computation"},{"issue":"4","key":"6558_CR3","doi-asserted-by":"publisher","first-page":"1645","DOI":"10.1111\/j.1540-6261.2006.00885.x","volume":"61","author":"M Baker","year":"2006","unstructured":"Baker, M., & Wurgler, J. (2006). Investor sentiment and the cross-section of stock returns. The Journal of Finance, 61(4), 1645\u20131680. https:\/\/doi.org\/10.1111\/j.1540-6261.2006.00885.x","journal-title":"The Journal of Finance"},{"issue":"3","key":"6558_CR4","doi-asserted-by":"publisher","first-page":"849","DOI":"10.1016\/S0031-3203(02)00257-1","volume":"36","author":"R Barandela","year":"2003","unstructured":"Barandela, R., S\u00e1nchez, J. S., Garc\u00eda, V., & Rangel, E. (2003). Strategies for learning in class imbalance problems. Pattern Recognition, 36(3), 849\u2013851. https:\/\/doi.org\/10.1016\/S0031-3203(02)00257-1","journal-title":"Pattern Recognition"},{"issue":"1","key":"6558_CR5","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1145\/1007730.1007735","volume":"6","author":"GE Batista","year":"2004","unstructured":"Batista, G. E., Prati, R. C., & Monard, M. C. (2004). A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter, 6(1), 20\u201329. 
https:\/\/doi.org\/10.1145\/1007730.1007735","journal-title":"ACM SIGKDD Explorations Newsletter"},{"issue":"6","key":"6558_CR6","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0177678","volume":"12","author":"S Boughorbel","year":"2017","unstructured":"Boughorbel, S., Jarray, F., & El-Anbari, M. (2017). Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric. PLoS ONE, 12(6), e0177678. https:\/\/doi.org\/10.1371\/journal.pone.0177678","journal-title":"PLoS ONE"},{"key":"6558_CR7","unstructured":"Brooks, R. J., Galbraith, D. A., Nancekivell, E. G., & Bishop, C. A. (1988). Developing management guidelines for snapping turtles.\u00a0General Technical Report RM-Rocky Mountain Forest and Range Experiment Station, US Department of Agriculture, Forest Service (USA)."},{"issue":"9\u201310","key":"6558_CR8","doi-asserted-by":"publisher","first-page":"1861","DOI":"10.1111\/j.0306-686X.2005.00650.x","volume":"32","author":"M Bugeja","year":"2005","unstructured":"Bugeja, M. (2005). The \u201cindependence\u201d of expert opinions in corporate takeovers: Agreeing with directors\u2019 recommendations. Journal of Business Finance & Accounting, 32(9\u201310), 1861\u20131885. https:\/\/doi.org\/10.1111\/j.0306-686X.2005.00650.x","journal-title":"Journal of Business Finance & Accounting"},{"key":"6558_CR9","unstructured":"Butler, F. C., & Sauska, P. (2014). Mergers and acquisitions: Termination fees and acquisition deal completion.\u00a0Journal of Managerial Issues, 44\u201354."},{"key":"6558_CR10","doi-asserted-by":"publisher","DOI":"10.1016\/j.amc.2021.125991","volume":"398","author":"F Casella","year":"2021","unstructured":"Casella, F., & Bachmann, B. (2021). On the choice of initial guesses for the Newton-Raphson algorithm. Applied Mathematics and Computation, 398, 125991. 
https:\/\/doi.org\/10.1016\/j.amc.2021.125991","journal-title":"Applied Mathematics and Computation"},{"key":"6558_CR11","doi-asserted-by":"publisher","unstructured":"Chawla, N. V., Lazarevic, A., Hall, L. O., & Bowyer, K. W. (2003). SMOTEBoost: Improving prediction of the minority class in boosting. In\u00a0European conference on principles of data mining and knowledge discovery\u00a0(pp. 107\u2013119). Springer. https:\/\/doi.org\/10.1007\/978-3-540-39804-2_12","DOI":"10.1007\/978-3-540-39804-2_12"},{"key":"6558_CR12","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1613\/jair.953","volume":"16","author":"NV Chawla","year":"2002","unstructured":"Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321\u2013357. https:\/\/doi.org\/10.1613\/jair.953","journal-title":"Journal of Artificial Intelligence Research"},{"issue":"2","key":"6558_CR13","doi-asserted-by":"publisher","first-page":"225","DOI":"10.1007\/s10618-008-0087-0","volume":"17","author":"NV Chawla","year":"2008","unstructured":"Chawla, N. V., Cieslak, D. A., Hall, L. O., & Joshi, A. (2008). Automatically countering imbalance and its empirical relationship to cost. Data Mining and Knowledge Discovery, 17(2), 225\u2013252. https:\/\/doi.org\/10.1007\/s10618-008-0087-0","journal-title":"Data Mining and Knowledge Discovery"},{"issue":"1","key":"6558_CR14","doi-asserted-by":"publisher","first-page":"159","DOI":"10.1080\/03610920903377799","volume":"40","author":"G Chen","year":"2010","unstructured":"Chen, G., & Tsurumi, H. (2010). Probit and logit model selection. Communications in Statistics\u2014Theory and Methods, 40(1), 159\u2013175. 
https:\/\/doi.org\/10.1080\/03610920903377799","journal-title":"Communications in Statistics\u2014Theory and Methods"},{"issue":"3","key":"6558_CR15","doi-asserted-by":"publisher","first-page":"397","DOI":"10.1093\/icb\/34.3.397","volume":"34","author":"JD Congdon","year":"1994","unstructured":"Congdon, J. D., Dunham, A. E., & Sels, R. V. L. (1994). Demographics of common snapping turtles (Chelydra serpentina): Implications for conservation and management of long-lived organisms. American Zoologist, 34(3), 397\u2013408. https:\/\/doi.org\/10.1093\/icb\/34.3.397","journal-title":"American Zoologist"},{"issue":"2","key":"6558_CR16","doi-asserted-by":"publisher","first-page":"225","DOI":"10.1111\/j.2517-6161.1988.tb01723.x","volume":"50","author":"JB Copas","year":"1988","unstructured":"Copas, J. B. (1988). Binary regression models for contaminated data. Journal of the Royal Statistical Society: Series B (methodological), 50(2), 225\u2013253.","journal-title":"Journal of the Royal Statistical Society: Series B (methodological)"},{"key":"6558_CR17","unstructured":"Dauphin, Y. N., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., & Bengio, Y. (2014). Identifying and attacking the saddle point problem in high-dimensional non-convex optimization.\u00a0Advances in Neural Information Processing Systems,\u00a027."},{"issue":"4","key":"6558_CR18","doi-asserted-by":"publisher","first-page":"393","DOI":"10.1016\/S0167-9473(01)00067-6","volume":"38","author":"H Drucker","year":"2002","unstructured":"Drucker, H. (2002). Effect of pruning and early stopping on performance of a boosting ensemble. Computational Statistics & Data Analysis, 38(4), 393\u2013406. https:\/\/doi.org\/10.1016\/S0167-9473(01)00067-6","journal-title":"Computational Statistics & Data Analysis"},{"key":"6558_CR19","doi-asserted-by":"crossref","unstructured":"Fern\u00e1ndez, A., Garc\u00eda, S., Galar, M., Prati, R. C., Krawczyk, B., & Herrera, F. (2018).\u00a0Learning from imbalanced data sets\u00a0(Vol. 
10, pp. 978\u20133). Springer.","DOI":"10.1007\/978-3-319-98074-4"},{"issue":"3","key":"6558_CR20","doi-asserted-by":"publisher","first-page":"561","DOI":"10.1016\/j.ijar.2008.11.004","volume":"50","author":"A Fern\u00e1ndez","year":"2009","unstructured":"Fern\u00e1ndez, A., del Jesus, M. J., & Herrera, F. (2009). Hierarchical fuzzy rule based classification systems with genetic rule selection for imbalanced datasets. International Journal of Approximate Reasoning, 50(3), 561\u2013577. https:\/\/doi.org\/10.1016\/j.ijar.2008.11.004","journal-title":"International Journal of Approximate Reasoning"},{"issue":"18","key":"6558_CR21","doi-asserted-by":"publisher","first-page":"2378","DOI":"10.1016\/j.fss.2007.12.023","volume":"159","author":"A Fern\u00e1ndez","year":"2008","unstructured":"Fern\u00e1ndez, A., Garc\u00eda, S., del Jesus, M. J., & Herrera, F. (2008). A study of the behaviour of linguistic fuzzy rule based classification systems in the framework of imbalanced data-sets. Fuzzy Sets and Systems, 159(18), 2378\u20132398. https:\/\/doi.org\/10.1016\/j.fss.2007.12.023","journal-title":"Fuzzy Sets and Systems"},{"key":"6558_CR22","unstructured":"Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In\u00a0International conference on machine learning\u00a0(Vol. 96, pp. 148\u2013156)."},{"key":"6558_CR23","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1016\/j.neucom.2018.09.013","volume":"321","author":"M Frid-Adar","year":"2018","unstructured":"Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Goldberger, J., & Greenspan, H. (2018). GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing, 321, 321\u2013331. https:\/\/doi.org\/10.1016\/j.neucom.2018.09.013","journal-title":"Neurocomputing"},{"key":"6558_CR24","doi-asserted-by":"crossref","unstructured":"Friedman, J., Hastie, T., & Tibshirani, R. (2000). Special invited paper. 
Additive logistic regression: A statistical view of boosting.\u00a0Annals of Statistics, 337\u2013374.","DOI":"10.1214\/aos\/1016120463"},{"issue":"4","key":"6558_CR25","doi-asserted-by":"publisher","first-page":"463","DOI":"10.1109\/TSMCC.2011.2161285","volume":"42","author":"M Galar","year":"2012","unstructured":"Galar, M., Fernandez, A., Barrenechea, E., Bustince, H., & Herrera, F. (2012). A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 42(4), 463\u2013484. https:\/\/doi.org\/10.1109\/TSMCC.2011.2161285","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews"},{"key":"6558_CR26","doi-asserted-by":"publisher","first-page":"248","DOI":"10.1016\/j.neucom.2014.02.006","volume":"138","author":"M Gao","year":"2014","unstructured":"Gao, M., Hong, X., Chen, S., Harris, C. J., & Khalaf, E. (2014). PDFOS: PDF estimation based over-sampling for imbalanced two-class problems. Neurocomputing, 138, 248\u2013259. https:\/\/doi.org\/10.1016\/j.neucom.2014.02.006","journal-title":"Neurocomputing"},{"key":"6558_CR27","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcorpfin.2020.101754","volume":"67","author":"N Gao","year":"2021","unstructured":"Gao, N., Hua, C., & Khurshed, A. (2021). Loan price in mergers and acquisitions. Journal of Corporate Finance, 67, 101754. https:\/\/doi.org\/10.1016\/j.jcorpfin.2020.101754","journal-title":"Journal of Corporate Finance"},{"issue":"4","key":"6558_CR28","doi-asserted-by":"publisher","first-page":"262","DOI":"10.2307\/1310589","volume":"37","author":"JW Gibbons","year":"1987","unstructured":"Gibbons, J. W. (1987). Why do turtles live so long? BioScience, 37(4), 262\u2013269. 
https:\/\/doi.org\/10.2307\/1310589","journal-title":"BioScience"},{"key":"6558_CR29","unstructured":"Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., & Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems, 27."},{"issue":"3","key":"6558_CR30","doi-asserted-by":"publisher","first-page":"3659","DOI":"10.1016\/j.eswa.2011.09.058","volume":"39","author":"L Guelman","year":"2012","unstructured":"Guelman, L. (2012). Gradient boosting trees for auto insurance loss cost modeling and prediction. Expert Systems with Applications, 39(3), 3659\u20133667. https:\/\/doi.org\/10.1016\/j.eswa.2011.09.058","journal-title":"Expert Systems with Applications"},{"issue":"2","key":"6558_CR31","doi-asserted-by":"publisher","first-page":"1073","DOI":"10.1016\/j.jeconom.2007.01.002","volume":"141","author":"MN Harris","year":"2007","unstructured":"Harris, M. N., & Zhao, X. (2007). A zero-inflated ordered probit model, with an application to modelling tobacco consumption. Journal of Econometrics, 141(2), 1073\u20131099. https:\/\/doi.org\/10.1016\/j.jeconom.2007.01.002","journal-title":"Journal of Econometrics"},{"issue":"2","key":"6558_CR32","doi-asserted-by":"publisher","first-page":"556","DOI":"10.2307\/2269391","volume":"6","author":"SS Heppell","year":"1996","unstructured":"Heppell, S. S., Crowder, L. B., & Crouse, D. T. (1996). Models to evaluate headstarting as a management tool for long-lived turtles. Ecological Applications, 6(2), 556\u2013565. https:\/\/doi.org\/10.2307\/2269391","journal-title":"Ecological Applications"},{"key":"6558_CR33","unstructured":"Hill, D. W., Bagozzi, B. E., Moore, W. H., & Mukherjee, B. (2011). Strategic incentives and modeling bias in ordinal data: The zero-inflated ordered probit (ZiOP) model in political science. In\u00a0New faces in political methodology meeting (Vol. 30). 
Penn State."},{"issue":"5","key":"6558_CR34","doi-asserted-by":"publisher","first-page":"543","DOI":"10.1002\/sam.11570","volume":"15","author":"Y Huang","year":"2022","unstructured":"Huang, Y., Fields, K. G., & Ma, Y. (2022). A tutorial on generative adversarial networks with application to classification of imbalanced data. Statistical Analysis and Data Mining: The ASA Data Science Journal, 15(5), 543\u2013552. https:\/\/doi.org\/10.1002\/sam.11570","journal-title":"Statistical Analysis and Data Mining: The ASA Data Science Journal"},{"issue":"7","key":"6558_CR35","doi-asserted-by":"publisher","first-page":"8580","DOI":"10.1016\/j.eswa.2011.01.061","volume":"38","author":"JP Hwang","year":"2011","unstructured":"Hwang, J. P., Park, S., & Kim, E. (2011). A new weighted approach to imbalanced data classification problem via support vector machine with quadratic cost function. Expert Systems with Applications, 38(7), 8580\u20138585. https:\/\/doi.org\/10.1016\/j.eswa.2011.01.061","journal-title":"Expert Systems with Applications"},{"issue":"2","key":"6558_CR36","doi-asserted-by":"publisher","first-page":"332","DOI":"10.2307\/1939296","volume":"74","author":"FJ Janzen","year":"1993","unstructured":"Janzen, F. J. (1993). An experimental analysis of natural selection on body size of hatchling turtles. Ecology, 74(2), 332\u2013341. https:\/\/doi.org\/10.2307\/1939296","journal-title":"Ecology"},{"key":"6558_CR37","doi-asserted-by":"publisher","first-page":"107262","DOI":"10.1016\/j.patcog.2020.107262","volume":"102","author":"M Koziarski","year":"2020","unstructured":"Koziarski, M. (2020). Radial-based undersampling for imbalanced data classification. Pattern Recognition, 102, 107262. 
https:\/\/doi.org\/10.1016\/j.patcog.2020.107262","journal-title":"Pattern Recognition"},{"issue":"11","key":"6558_CR38","doi-asserted-by":"publisher","first-page":"3059","DOI":"10.1007\/s10994-021-06012-8","volume":"110","author":"M Koziarski","year":"2021","unstructured":"Koziarski, M., Bellinger, C., & Wo\u017aniak, M. (2021). RB-CCR: Radial-Based Combined Cleaning and Resampling algorithm for imbalanced data classification. Machine Learning, 110(11), 3059\u20133093. https:\/\/doi.org\/10.1007\/s10994-021-06012-8","journal-title":"Machine Learning"},{"key":"6558_CR39","doi-asserted-by":"publisher","DOI":"10.1515\/amcs-2017-0050","author":"M Koziarski","year":"2017","unstructured":"Koziarski, M., & Wo\u017aniak, M. (2017). CCR: A combined cleaning and resampling algorithm for imbalanced data classification. International Journal of Applied Mathematics and Computer Science. https:\/\/doi.org\/10.1515\/amcs-2017-0050","journal-title":"International Journal of Applied Mathematics and Computer Science"},{"issue":"4","key":"6558_CR40","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1007\/s13748-016-0094-0","volume":"5","author":"B Krawczyk","year":"2016","unstructured":"Krawczyk, B. (2016). Learning from imbalanced data: Open challenges and future directions. Progress in Artificial Intelligence, 5(4), 221\u2013232. https:\/\/doi.org\/10.1007\/s13748-016-0094-0","journal-title":"Progress in Artificial Intelligence"},{"key":"6558_CR41","doi-asserted-by":"publisher","first-page":"554","DOI":"10.1016\/j.asoc.2013.08.014","volume":"14","author":"B Krawczyk","year":"2014","unstructured":"Krawczyk, B., Wo\u017aniak, M., & Schaefer, G. (2014). Cost-sensitive decision tree ensembles for effective imbalanced classification. Applied Soft Computing, 14, 554\u2013562. 
https:\/\/doi.org\/10.1016\/j.asoc.2013.08.014","journal-title":"Applied Soft Computing"},{"key":"6558_CR42","doi-asserted-by":"publisher","first-page":"271","DOI":"10.1016\/j.jbusres.2019.11.083","volume":"109","author":"K Lee","year":"2020","unstructured":"Lee, K., Joo, S., Baik, H., Han, S., & In, J. (2020). Unbalanced data, type II error, and nonlinearity in predicting M&A failure. Journal of Business Research, 109, 271\u2013287. https:\/\/doi.org\/10.1016\/j.jbusres.2019.11.083","journal-title":"Journal of Business Research"},{"key":"6558_CR43","unstructured":"Lin, J., Zhong, C., Hu, D., Rudin, C., & Seltzer, M. (2020). Generalized and scalable optimal sparse decision trees. In\u00a0International conference on machine learning\u00a0(pp. 6150\u20136160). PMLR."},{"issue":"1","key":"6558_CR44","doi-asserted-by":"publisher","first-page":"191","DOI":"10.1023\/A:1012406528296","volume":"46","author":"Y Lin","year":"2002","unstructured":"Lin, Y., Lee, Y., & Wahba, G. (2002). Support vector machines for classification in nonstandard situations. Machine Learning, 46(1), 191\u2013202. https:\/\/doi.org\/10.1023\/A:1012406528296","journal-title":"Machine Learning"},{"issue":"8","key":"6558_CR45","doi-asserted-by":"publisher","first-page":"1055","DOI":"10.1109\/TKDE.2006.131","volume":"18","author":"CX Ling","year":"2006","unstructured":"Ling, C. X., Sheng, V. S., & Yang, Q. (2006). Test strategies for cost-sensitive decision trees. IEEE Transactions on Knowledge and Data Engineering, 18(8), 1055\u20131067. https:\/\/doi.org\/10.1109\/TKDE.2006.131","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"6558_CR46","doi-asserted-by":"publisher","unstructured":"Liu, B., Ma, Y., & Wong, C. K. (2000). Improving an association rule based classifier. In\u00a0European conference on principles of data mining and knowledge discovery\u00a0(pp. 504\u2013509). Springer. 
https:\/\/doi.org\/10.1007\/3-540-45372-5_58","DOI":"10.1007\/3-540-45372-5_58"},{"key":"6558_CR47","unstructured":"Liu, G., Wu, J., & Zhou, Z. H. (2012). Key instance detection in multi-instance learning. In\u00a0Asian conference on machine learning\u00a0(pp. 253\u2013268). PMLR."},{"issue":"6","key":"6558_CR48","doi-asserted-by":"publisher","first-page":"1465","DOI":"10.1080\/02664763.2020.1870669","volume":"49","author":"X Liu","year":"2022","unstructured":"Liu, X., & He, W. (2022). Adaptive kernel scaling support vector machine with application to a prostate cancer image study. Journal of Applied Statistics, 49(6), 1465\u20131484. https:\/\/doi.org\/10.1080\/02664763.2020.1870669","journal-title":"Journal of Applied Statistics"},{"key":"6558_CR49","unstructured":"London, B., Lu, L., Sandler, T., & Joachims, T. (2023). Boosted off-policy learning. In\u00a0International conference on artificial intelligence and statistics\u00a0(pp. 5614\u20135640). PMLR."},{"issue":"7","key":"6558_CR50","doi-asserted-by":"publisher","first-page":"6585","DOI":"10.1016\/j.eswa.2011.12.043","volume":"39","author":"V L\u00f3pez","year":"2012","unstructured":"L\u00f3pez, V., Fern\u00e1ndez, A., Moreno-Torres, J. G., & Herrera, F. (2012). Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification: Open problems on intrinsic data characteristics. Expert Systems with Applications, 39(7), 6585\u20136608. https:\/\/doi.org\/10.1016\/j.eswa.2011.12.043","journal-title":"Expert Systems with Applications"},{"issue":"234","key":"6558_CR51","first-page":"1","volume":"21","author":"M Massias","year":"2020","unstructured":"Massias, M., Vaiter, S., Gramfort, A., & Salmon, J. (2020). Dual extrapolation for sparse generalized linear models. Journal of Machine Learning Research, 21(234), 1\u201333.","journal-title":"Journal of Machine Learning Research"},{"key":"6558_CR52","doi-asserted-by":"publisher","unstructured":"Napiera\u0142a, K., Stefanowski, J., & Wilk, S. (2010). 
Learning from imbalanced data in presence of noisy and borderline examples. In\u00a0International conference on rough sets and current trends in computing\u00a0(pp. 158\u2013167). Springer. https:\/\/doi.org\/10.1007\/978-3-642-13529-3_18","DOI":"10.1007\/978-3-642-13529-3_18"},{"issue":"1","key":"6558_CR53","first-page":"99","volume":"15","author":"R Oentaryo","year":"2014","unstructured":"Oentaryo, R., Lim, E. P., Finegold, M., Lo, D., Zhu, F., Phua, C., Cheu, E. Y., Yap, G. E., Sim, K., Nguyen, M. N., Perera, K., Neupane, B., Faisal, M., Aung, Z., Woon, W. L., Chen, W., Patel, D., & Berrar, D. (2014). Detecting click fraud in online advertising: A data mining approach. Journal of Machine Learning Research, 15(1), 99\u2013140.","journal-title":"Journal of Machine Learning Research"},{"issue":"1","key":"6558_CR54","doi-asserted-by":"publisher","first-page":"343","DOI":"10.1137\/17M1150116","volume":"29","author":"S Paternain","year":"2019","unstructured":"Paternain, S., Mokhtari, A., & Ribeiro, A. (2019). A Newton-based method for nonconvex optimization with fast evasion of saddle points. SIAM Journal on Optimization, 29(1), 343\u2013368.","journal-title":"SIAM Journal on Optimization"},{"key":"6558_CR55","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2020.106989","volume":"101","author":"W Pei","year":"2021","unstructured":"Pei, W., Xue, B., Shang, L., & Zhang, M. (2021). Genetic programming for development of cost-sensitive classifiers for binary high-dimensional unbalanced classification. Applied Soft Computing, 101, 106989. https:\/\/doi.org\/10.1016\/j.asoc.2020.106989","journal-title":"Applied Soft Computing"},{"issue":"3","key":"6558_CR56","doi-asserted-by":"publisher","first-page":"334","DOI":"10.1655\/HERPETOLOGICA-D-11-00046.1","volume":"68","author":"C Perez-Heydrich","year":"2012","unstructured":"Perez-Heydrich, C., Jackson, K., Wendland, L. D., & Brown, M. B. (2012). Gopher tortoise hatchling survival: Field study and meta-analysis. 
Herpetologica, 68(3), 334\u2013344. https:\/\/doi.org\/10.1655\/HERPETOLOGICA-D-11-00046.1","journal-title":"Herpetologica"},{"issue":"3","key":"6558_CR57","doi-asserted-by":"publisher","first-page":"199","DOI":"10.1023\/A:1024099825458","volume":"52","author":"F Provost","year":"2003","unstructured":"Provost, F., & Domingos, P. (2003). Tree induction for probability-based ranking. Machine Learning, 52(3), 199\u2013215. https:\/\/doi.org\/10.1023\/A:1024099825458","journal-title":"Machine Learning"},{"key":"6558_CR58","doi-asserted-by":"publisher","first-page":"1038","DOI":"10.1016\/j.neucom.2015.07.109","volume":"171","author":"D Ren","year":"2016","unstructured":"Ren, D., Qu, F., Lv, K., Zhang, Z., Xu, H., & Wang, X. (2016). A gradient descent boosting spectrum modeling method based on back interval partial least squares. Neurocomputing, 171, 1038\u20131046. https:\/\/doi.org\/10.1016\/j.neucom.2015.07.109","journal-title":"Neurocomputing"},{"key":"6558_CR59","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2022.108296","volume":"241","author":"Z Ren","year":"2022","unstructured":"Ren, Z., Zhu, Y., Kang, W., Fu, H., Niu, Q., Gao, D., Yan, K., & Hong, J. (2022). Adaptive cost-sensitive learning: Improving the convergence of intelligent diagnosis models under imbalanced data. Knowledge-Based Systems, 241, 108296. https:\/\/doi.org\/10.1016\/j.knosys.2022.108296","journal-title":"Knowledge-Based Systems"},{"key":"6558_CR60","doi-asserted-by":"publisher","first-page":"650","DOI":"10.1016\/j.jcorpfin.2019.07.010","volume":"58","author":"L Renneboog","year":"2019","unstructured":"Renneboog, L., & Vansteenkiste, C. (2019). Failure and success in mergers and acquisitions. Journal of Corporate Finance, 58, 650\u2013699. 
https:\/\/doi.org\/10.1016\/j.jcorpfin.2019.07.010","journal-title":"Journal of Corporate Finance"},{"key":"6558_CR61","doi-asserted-by":"publisher","first-page":"218","DOI":"10.1016\/j.jcorpfin.2013.11.012","volume":"28","author":"L Renneboog","year":"2014","unstructured":"Renneboog, L., & Zhao, Y. (2014). Director networks and takeovers. Journal of Corporate Finance, 28, 218\u2013234. https:\/\/doi.org\/10.1016\/j.jcorpfin.2013.11.012","journal-title":"Journal of Corporate Finance"},{"issue":"4","key":"6558_CR62","doi-asserted-by":"publisher","first-page":"628","DOI":"10.1016\/j.ijforecast.2013.01.008","volume":"29","author":"BD Rodrigues","year":"2013","unstructured":"Rodrigues, B. D., & Stevenson, M. J. (2013). Takeover prediction using forecast combinations. International Journal of Forecasting, 29(4), 628\u2013641. https:\/\/doi.org\/10.1016\/j.ijforecast.2013.01.008","journal-title":"International Journal of Forecasting"},{"issue":"1","key":"6558_CR63","first-page":"5975","volume":"17","author":"D Rohde","year":"2016","unstructured":"Rohde, D., & Wand, M. P. (2016). Semiparametric mean field variational Bayes: General principles and numerical issues. Journal of Machine Learning Research, 17(1), 5975\u20136021.","journal-title":"Journal of Machine Learning Research"},{"key":"6558_CR64","doi-asserted-by":"publisher","first-page":"60401","DOI":"10.1109\/ACCESS.2020.2983605","volume":"8","author":"MAS Saber","year":"2020","unstructured":"Saber, M. A. S., Ghorbani, M., Bayati, A., Nguyen, K. K., & Cheriet, M. (2020). Online data center traffic classification based on inter-flow correlations. IEEE Access, 8, 60401\u201360416. https:\/\/doi.org\/10.1109\/ACCESS.2020.2983605","journal-title":"IEEE Access"},{"issue":"1","key":"6558_CR65","doi-asserted-by":"publisher","first-page":"576","DOI":"10.1137\/110840054","volume":"23","author":"A Saha","year":"2013","unstructured":"Saha, A., & Tewari, A. (2013). 
On the nonasymptotic convergence of cyclic coordinate descent methods. SIAM Journal on Optimization, 23(1), 576\u2013601. https:\/\/doi.org\/10.1137\/110840054","journal-title":"SIAM Journal on Optimization"},{"issue":"10","key":"6558_CR66","doi-asserted-by":"publisher","first-page":"1587","DOI":"10.1080\/03610918.2011.589332","volume":"40","author":"J Song","year":"2011","unstructured":"Song, J., Lu, X., Liu, M., & Wu, X. (2011). Stratified normalization LogitBoost for two-class unbalanced data classification. Communications in Statistics-Simulation and Computation, 40(10), 1587\u20131593. https:\/\/doi.org\/10.1080\/03610918.2011.589332","journal-title":"Communications in Statistics-Simulation and Computation"},{"issue":"3","key":"6558_CR67","doi-asserted-by":"publisher","first-page":"395","DOI":"10.1007\/s11575-011-0099-7","volume":"52","author":"GK Stahl","year":"2012","unstructured":"Stahl, G. K., Chua, C. H., & Pablo, A. L. (2012). Does national context affect target firm employees\u2019 trust in acquisitions? Management International Review, 52(3), 395\u2013423. https:\/\/doi.org\/10.1007\/s11575-011-0099-7","journal-title":"Management International Review"},{"issue":"12","key":"6558_CR68","doi-asserted-by":"publisher","first-page":"R721","DOI":"10.1016\/j.cub.2020.04.088","volume":"30","author":"CB Stanford","year":"2020","unstructured":"Stanford, C. B., Iverson, J. B., Rhodin, A. G., van Dijk, P. P., Mittermeier, R. A., Kuchling, G., & Walde, A. D. (2020). Turtles and tortoises are in trouble. Current Biology, 30(12), R721\u2013R735. https:\/\/doi.org\/10.1016\/j.cub.2020.04.088","journal-title":"Current Biology"},{"key":"6558_CR69","doi-asserted-by":"publisher","unstructured":"Stefanowski, J., & Wilk, S. (2008). Selective pre-processing of imbalanced data for improving classification performance. In\u00a0International conference on data warehousing and knowledge discovery\u00a0(pp. 283\u2013292). Springer. 
https:\/\/doi.org\/10.1007\/978-3-540-85836-2_27","DOI":"10.1007\/978-3-540-85836-2_27"},{"key":"6558_CR70","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511801181","volume-title":"An introduction to numerical analysis","author":"E S\u00fcli","year":"2003","unstructured":"S\u00fcli, E., & Mayers, D. F. (2003). An introduction to numerical analysis. Cambridge University Press."},{"issue":"12","key":"6558_CR71","doi-asserted-by":"publisher","first-page":"3358","DOI":"10.1016\/j.patcog.2007.04.009","volume":"40","author":"Y Sun","year":"2007","unstructured":"Sun, Y., Kamel, M. S., Wong, A. K., & Wang, Y. (2007). Cost-sensitive boosting for classification of imbalanced data. Pattern Recognition, 40(12), 3358\u20133378. https:\/\/doi.org\/10.1016\/j.patcog.2007.04.009","journal-title":"Pattern Recognition"},{"issue":"5","key":"6558_CR72","doi-asserted-by":"publisher","first-page":"1623","DOI":"10.1016\/j.patcog.2014.11.014","volume":"48","author":"Z Sun","year":"2015","unstructured":"Sun, Z., Song, Q., Zhu, X., Sun, H., Xu, B., & Zhou, Y. (2015). A novel ensemble method for classifying imbalanced data. Pattern Recognition, 48(5), 1623\u20131637. https:\/\/doi.org\/10.1016\/j.patcog.2014.11.014","journal-title":"Pattern Recognition"},{"issue":"9","key":"6558_CR73","doi-asserted-by":"publisher","first-page":"1917","DOI":"10.1080\/00949655.2013.770514","volume":"84","author":"CY Tang","year":"2014","unstructured":"Tang, C. Y., & Wu, T. T. (2014). Nested coordinate descent algorithms for empirical likelihood. Journal of Statistical Computation and Simulation, 84(9), 1917\u20131930. https:\/\/doi.org\/10.1080\/00949655.2013.770514","journal-title":"Journal of Statistical Computation and Simulation"},{"issue":"12","key":"6558_CR74","doi-asserted-by":"publisher","first-page":"1339","DOI":"10.1016\/j.patrec.2013.04.019","volume":"34","author":"P Thanathamathee","year":"2013","unstructured":"Thanathamathee, P., & Lursinsap, C. (2013). 
Handling imbalanced data sets with synthetic boundary data generation using bootstrap re-sampling and AdaBoost techniques. Pattern Recognition Letters, 34(12), 1339\u20131347. https:\/\/doi.org\/10.1016\/j.patrec.2013.04.019","journal-title":"Pattern Recognition Letters"},{"key":"6558_CR75","first-page":"3333","volume":"15","author":"W Waegeman","year":"2014","unstructured":"Waegeman, W., Dembczy\u0144ski, K., Jachnik, A., Cheng, W., & H\u00fcllermeier, E. (2014). On the Bayes-optimality of f-measure maximizers. Journal of Machine Learning Research, 15, 3333\u20133388.","journal-title":"Journal of Machine Learning Research"},{"issue":"1","key":"6558_CR76","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10115-009-0198-y","volume":"25","author":"BX Wang","year":"2010","unstructured":"Wang, B. X., & Japkowicz, N. (2010). Boosting support vector machines for imbalanced data sets. Knowledge and Information Systems, 25(1), 1\u201320. https:\/\/doi.org\/10.1007\/s10115-009-0198-y","journal-title":"Knowledge and Information Systems"},{"issue":"5","key":"6558_CR77","doi-asserted-by":"publisher","first-page":"1356","DOI":"10.1109\/TKDE.2014.2345380","volume":"27","author":"S Wang","year":"2014","unstructured":"Wang, S., Minku, L. L., & Yao, X. (2014). Resampling-based ensemble methods for online class imbalance learning. IEEE Transactions on Knowledge and Data Engineering, 27(5), 1356\u20131368. https:\/\/doi.org\/10.1109\/TKDE.2014.2345380","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"6558_CR78","doi-asserted-by":"publisher","DOI":"10.1155\/2021\/6033860","author":"J Wei","year":"2021","unstructured":"Wei, J., Feng, G., Lu, Z., Han, P., Zhu, Y., & Huang, W. (2021). Evaluating drug risk using GAN and SMOTE based on CFDA\u2019s spontaneous reporting data. Journal of Healthcare Engineering. 
https:\/\/doi.org\/10.1155\/2021\/6033860","journal-title":"Journal of Healthcare Engineering"},{"issue":"67\u201368","key":"6558_CR79","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1137\/17M1150116","volume":"35","author":"S Wright","year":"2006","unstructured":"Wright, S., & Nocedal, J. (2006). Numerical optimization. Springer Science, 35(67\u201368), 7. https:\/\/doi.org\/10.1137\/17M1150116","journal-title":"Springer Science"},{"issue":"6","key":"6558_CR80","doi-asserted-by":"publisher","first-page":"1145","DOI":"10.1080\/00949655.2011.652114","volume":"83","author":"TT Wu","year":"2013","unstructured":"Wu, T. T. (2013). Lasso penalized semiparametric regression on high-dimensional recurrent event data via coordinate descent. Journal of Statistical Computation and Simulation, 83(6), 1145\u20131155. https:\/\/doi.org\/10.1080\/00949655.2011.652114","journal-title":"Journal of Statistical Computation and Simulation"},{"issue":"4","key":"6558_CR81","doi-asserted-by":"publisher","first-page":"1698","DOI":"10.1214\/10-AOAS345","volume":"4","author":"TT Wu","year":"2010","unstructured":"Wu, T. T., & Lange, K. (2010). Multicategory vertex discriminant analysis for high-dimensional data. The Annals of Applied Statistics, 4(4), 1698\u20131721. https:\/\/doi.org\/10.1214\/10-AOAS345","journal-title":"The Annals of Applied Statistics"},{"issue":"52","key":"6558_CR82","doi-asserted-by":"publisher","first-page":"5706","DOI":"10.1080\/00036846.2020.1770682","volume":"52","author":"D Xu","year":"2020","unstructured":"Xu, D. (2020). Modelling asset returns under price limits with mixture of truncated Gaussian distribution. Applied Economics, 52(52), 5706\u20135725. https:\/\/doi.org\/10.1080\/00036846.2020.1770682","journal-title":"Applied Economics"},{"key":"6558_CR83","doi-asserted-by":"publisher","unstructured":"Yang, H., & Zhou, Y. (2021). Ida-gan: A novel imbalanced data augmentation gan. 
In\u00a02020 25th international conference on pattern recognition (ICPR)\u00a0(pp. 8299\u20138305). IEEE. https:\/\/doi.org\/10.1109\/ICPR48806.2021.9411996","DOI":"10.1109\/ICPR48806.2021.9411996"},{"key":"6558_CR84","doi-asserted-by":"publisher","DOI":"10.1155\/2013\/761814","author":"QY Yin","year":"2013","unstructured":"Yin, Q. Y., Zhang, J. S., Zhang, C. X., & Liu, S. C. (2013). An empirical study on the performance of cost-sensitive boosting algorithms with different levels of class imbalance. Mathematical Problems in Engineering. https:\/\/doi.org\/10.1155\/2013\/761814","journal-title":"Mathematical Problems in Engineering"},{"key":"6558_CR85","doi-asserted-by":"publisher","unstructured":"Zhang, S., Liu, L., Zhu, X., & Zhang, C. (2008). A strategy for attributes selection in cost-sensitive decision trees induction. In\u00a02008 IEEE 8th international conference on computer and information technology workshops\u00a0(pp. 8\u201313). IEEE. https:\/\/doi.org\/10.1109\/CIT.2008.Workshops.51.","DOI":"10.1109\/CIT.2008.Workshops.51"},{"issue":"12","key":"6558_CR86","doi-asserted-by":"publisher","first-page":"4428","DOI":"10.1016\/j.patcog.2012.06.006","volume":"45","author":"S Zheng","year":"2012","unstructured":"Zheng, S., & Liu, W. (2012). Functional gradient ascent for Probit regression. Pattern Recognition, 45(12), 4428\u20134437. 
https:\/\/doi.org\/10.1016\/j.patcog.2012.06.006","journal-title":"Pattern Recognition"}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-024-06558-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-024-06558-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-024-06558-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,17]],"date-time":"2024-10-17T21:07:11Z","timestamp":1729199231000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-024-06558-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,8]]},"references-count":86,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2024,10]]}},"alternative-id":["6558"],"URL":"https:\/\/doi.org\/10.1007\/s10994-024-06558-3","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"type":"print","value":"0885-6125"},{"type":"electronic","value":"1573-0565"}],"subject":[],"published":{"date-parts":[[2024,7,8]]},"assertion":[{"value":"9 June 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 February 2024","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 April 2024","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 July 2024","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"We have no conflict of interest to declare.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Not Applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}},{"value":"Not Applicable.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}},{"value":"Not Applicable.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}}]}}