{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,31]],"date-time":"2025-12-31T00:15:06Z","timestamp":1767140106087,"version":"build-2238731810"},"reference-count":34,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,2,4]],"date-time":"2023-02-04T00:00:00Z","timestamp":1675468800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,2,4]],"date-time":"2023-02-04T00:00:00Z","timestamp":1675468800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100007065","name":"Universit\u00e0 degli Studi di Salerno","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100007065","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Comput Stat"],"published-print":{"date-parts":[[2024,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    This paper proposes and discusses a bootstrap scheme to make inferences when an imbalance in one of the levels of a binary variable affects both the dependent variable and some of the features. Specifically, the imbalance in the binary dependent variable is managed by adopting an asymmetric link function based on the quantile of the generalized extreme value (GEV) distribution, leading to a class of models called\n                    <jats:italic>GEV regression<\/jats:italic>\n                    . Within this framework, we propose using the fractional-random-weighted (FRW) bootstrap to obtain confidence intervals and implement a multiple testing procedure to identify the set of relevant features. 
The main advantages of the FRW bootstrap are as follows: (1) all observations belonging to the imbalanced class are always present in every bootstrap resample; (2) the bootstrap can be applied even when the complexity of the link function does not allow easy computation of the second-order derivatives for the Hessian; (3) the bootstrap resampling scheme does not change whatever the link function is, and can be applied beyond the GEV link function used in this study. The performance of the FRW bootstrap in GEV regression modelling is evaluated using a detailed Monte Carlo simulation study, where the imbalance is present in the dependent variable and features. An application of the proposed methodology to a real dataset to analyze student churn in an Italian university is also discussed.\n                  <\/jats:p>","DOI":"10.1007\/s00180-023-01330-y","type":"journal-article","created":{"date-parts":[[2023,2,4]],"date-time":"2023-02-04T14:02:31Z","timestamp":1675519351000},"page":"181-213","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Bootstrapping binary GEV regressions for imbalanced datasets"],"prefix":"10.1007","volume":"39","author":[{"given":"Michele","family":"La Rocca","sequence":"first","affiliation":[]},{"given":"Marcella","family":"Niglio","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1150-8278","authenticated-orcid":false,"given":"Marialuisa","family":"Restaino","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,2,4]]},"reference":[{"key":"1330_CR1","doi-asserted-by":"publisher","DOI":"10.1002\/0471249688","volume-title":"Categorical data analysis","author":"A Agresti","year":"2002","unstructured":"Agresti A (2002) Categorical data analysis, 2nd edn. 
Wiley, New York","edition":"2"},{"issue":"33","key":"1330_CR2","doi-asserted-by":"publisher","first-page":"528","DOI":"10.1080\/02664763.2017.1282441","volume":"45","author":"JS Bergtold","year":"2018","unstructured":"Bergtold JS, Yeager EA, Featherstone AM (2018) Inferences from logistic regression models in the presence of small samples, rare events, nonlinearity, and multicollinearity with observational data. J Appl Stat 45(33):528\u2013546","journal-title":"J Appl Stat"},{"issue":"11","key":"1330_CR4","doi-asserted-by":"publisher","first-page":"1783","DOI":"10.1057\/jors.2014.106","volume":"66","author":"R Calabrese","year":"2015","unstructured":"Calabrese R, Giudici P (2015) Estimating bank default with generalised extreme value regression models. J Oper Res Soc 66(11):1783\u20131792","journal-title":"J Oper Res Soc"},{"issue":"6","key":"1330_CR5","doi-asserted-by":"publisher","first-page":"1172","DOI":"10.1080\/02664763.2013.784894","volume":"40","author":"R Calabrese","year":"2013","unstructured":"Calabrese R, Osmetti S (2013) Modelling SME loan defaults as rare events: an application to credit defaults. J Appl Stat 40(6):1172\u20131188","journal-title":"J Appl Stat"},{"key":"1330_CR3","doi-asserted-by":"publisher","first-page":"604","DOI":"10.1057\/jors.2015.64","volume":"67","author":"R Calabrese","year":"2016","unstructured":"Calabrese R, Marra G, Osmetti SA (2016) Bankruptcy prediction of small and medium enterprises using a flexible binary generalized extreme value model. J Oper Res Soc 67:604\u2013615","journal-title":"J Oper Res Soc"},{"key":"1330_CR6","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1613\/jair.953","volume":"16","author":"NV Chawla","year":"2002","unstructured":"Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. 
J Artif Intell Res 16:321\u2013357","journal-title":"J Artif Intell Res"},{"issue":"448","key":"1330_CR7","doi-asserted-by":"publisher","first-page":"1172","DOI":"10.1080\/01621459.1999.10473872","volume":"94","author":"M-H Chen","year":"1999","unstructured":"Chen M-H, Dey DK, Shao Q-M (1999) A new skewed link model for dichotomous quantal response data. J Am Stat Assoc 94(448):1172\u20131186","journal-title":"J Am Stat Assoc"},{"key":"1330_CR8","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4471-3675-0","volume-title":"An introduction to statistical modeling of extreme values","author":"S Coles","year":"2001","unstructured":"Coles S (2001) An introduction to statistical modeling of extreme values. Springer, Berlin"},{"issue":"3","key":"1330_CR9","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1214\/ss\/1032280214","volume":"11","author":"TJ DiCiccio","year":"1996","unstructured":"DiCiccio TJ, Efron B (1996) Bootstrap confidence intervals. Stat Sci 11(3):189\u2013228","journal-title":"Stat Sci"},{"key":"1330_CR10","doi-asserted-by":"publisher","DOI":"10.1201\/9780367807849","volume-title":"An introduction to generalized linear models","author":"AJ Dobson","year":"2008","unstructured":"Dobson AJ, Barnett AG (2008) An introduction to generalized linear models, 3rd edn. CRC Press, New York","edition":"3"},{"key":"1330_CR11","doi-asserted-by":"crossref","unstructured":"Efron B (1982) The Jackknife, the bootstrap, and other resampling plans. CBMS-NF n038, S.I.A.M., Philadelphia","DOI":"10.1137\/1.9781611970319"},{"issue":"1","key":"1330_CR12","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1111\/j.0824-7935.2004.t01-1-00228.x","volume":"20","author":"A Estabrooks","year":"2004","unstructured":"Estabrooks A, Taeho J, Japkovicz N (2004) A multiple resampling method for learning form imbalanced data sets. 
Comput Intell 20(1):18\u201336","journal-title":"Comput Intell"},{"issue":"2","key":"1330_CR13","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1093\/biomet\/88.2.381","volume":"88","author":"Z Jin","year":"2001","unstructured":"Jin Z, Ying Z, Wei L (2001) A simple resampling method by perturbing the minimand. Biometrika 88(2):381\u2013390","journal-title":"Biometrika"},{"issue":"1","key":"1330_CR14","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1093\/biomet\/asm079","volume":"95","author":"S Kim","year":"2007","unstructured":"Kim S, Chen M-H, Dey DK (2007) Flexible generalized t-link models for binary response data. Biometrika 95(1):93\u2013106","journal-title":"Biometrika"},{"issue":"1","key":"1330_CR15","doi-asserted-by":"publisher","first-page":"220","DOI":"10.1016\/j.eswa.2016.12.035","volume":"73","author":"G Haixiang","year":"2017","unstructured":"Haixiang G, Yijing L, Shang J, Mingyun G, Yuanyue H, Bing G (2017) Learning from class-imbalanced data: review of methods and applications. Expert Syst Appl 73(1):220\u2013239","journal-title":"Expert Syst Appl"},{"key":"1330_CR16","doi-asserted-by":"publisher","first-page":"371","DOI":"10.1080\/00031305.2015.1089789","volume":"61","author":"TC Hesterberg","year":"2015","unstructured":"Hesterberg TC (2015) What teachers should know about the bootstrap: resampling in the undergraduate statistics curriculum. Am Stat 61:371\u2013386","journal-title":"Am Stat"},{"issue":"5","key":"1330_CR17","doi-asserted-by":"publisher","first-page":"429","DOI":"10.3233\/IDA-2002-6504","volume":"6","author":"N Japkowicz","year":"2002","unstructured":"Japkowicz N, Stephen S (2002) The class imbalance problem: a systematic study. 
Intell Data An Js 6(5):429\u2013449","journal-title":"Intell Data An Js"},{"issue":"2","key":"1330_CR18","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1093\/oxfordjournals.pan.a004868","volume":"9","author":"G King","year":"2001","unstructured":"King G, Zeng L (2001) Logistic regression in rare events data. Polit Anal 9(2):137\u2013163","journal-title":"Polit Anal"},{"key":"1330_CR19","doi-asserted-by":"publisher","DOI":"10.1142\/p191","volume-title":"Extreme values distributions. Theory and methods","author":"S Kotz","year":"2000","unstructured":"Kotz S, Nadarajah S (2000) Extreme values distributions. Theory and methods. Imperial College Press, London"},{"key":"1330_CR20","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1007\/s13748-016-0094-0","volume":"5","author":"B Krawczyk","year":"2001","unstructured":"Krawczyk B (2001) Learning from imbalanced data: open challenges and future directions. Prog Artif Intell 5:221\u2013232","journal-title":"Prog Artif Intell"},{"key":"1330_CR21","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4899-3242-6","volume-title":"Generalized linear models","author":"P McCullagh","year":"1989","unstructured":"McCullagh P, Nelder JA (1989) Generalized linear models. Chapmann Hall, New York"},{"issue":"1578\u20131590","key":"1330_CR22","doi-asserted-by":"publisher","first-page":"1578","DOI":"10.1080\/03610918.2019.1676438","volume":"51","author":"H Olmus","year":"2022","unstructured":"Olmus H, Nazman E, Erba\u015f S (2022) Comparison of penalized logistic regression models for rare event case. Commun Stat Simul Comput 51(1578\u20131590):1578\u20131590","journal-title":"Commun Stat Simul Comput"},{"key":"1330_CR23","unstructured":"R Core Team (2022) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 
https:\/\/www.R-project.org\/"},{"issue":"469","key":"1330_CR25","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1198\/016214504000000539","volume":"100","author":"JP Romano","year":"2005","unstructured":"Romano JP, Wolf M (2005a) Exact and approximate stepdown methods for multiple hypothesis testing. J Am Stat Assoc 100(469):94\u2013108","journal-title":"J Am Stat Assoc"},{"issue":"4","key":"1330_CR26","doi-asserted-by":"publisher","first-page":"1237","DOI":"10.1111\/j.1468-0262.2005.00615.x","volume":"73","author":"JP Romano","year":"2005","unstructured":"Romano JP, Wolf M (2005b) Stepwise multiple testing as formalized data snooping. Econometrica 73(4):1237\u20131282","journal-title":"Econometrica"},{"key":"1330_CR27","doi-asserted-by":"publisher","first-page":"38","DOI":"10.1016\/j.spl.2016.02.012","volume":"113","author":"JP Romano","year":"2016","unstructured":"Romano JP, Wolf M (2016) Efficient computation of adjusted p-values for resampling-based stepdown multiple testing. Stat Prob Lett 113:38\u201340","journal-title":"Stat Prob Lett"},{"issue":"2","key":"1330_CR24","doi-asserted-by":"publisher","first-page":"404","DOI":"10.1017\/S0266466608080171","volume":"24","author":"JP Romano","year":"2008","unstructured":"Romano JP, Shaikh AM, Wolf M (2008) Formalized data snooping based on generalized error rates. Econom Theory 24(2):404\u2013447","journal-title":"Econom Theory"},{"key":"1330_CR28","doi-asserted-by":"crossref","unstructured":"Seiffert C, Khoshgoftaar TM, Van Hulse J, Napolitano A (2008) Resampling or reweighting: a comparison of boosting implementations. In: 20th IEEE international conference tools with artificial intelligence, vol 1. IEEE, pp 445\u2013451","DOI":"10.1109\/ICTAI.2008.59"},{"key":"1330_CR29","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4612-0795-5","volume-title":"The Jackknife and bootstrap","author":"J Shao","year":"1995","unstructured":"Shao J, Tu D (1995) The Jackknife and bootstrap. 
Springer, New York"},{"key":"1330_CR30","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1093\/biomet\/72.1.67","volume":"72","author":"RL Smith","year":"1985","unstructured":"Smith RL (1985) Maximum likelihood estimation in a class of non-regular cases. Biometrika 72:67\u201390","journal-title":"Biometrika"},{"issue":"4","key":"1330_CR31","doi-asserted-by":"publisher","first-page":"687","DOI":"10.1142\/S0218001409007326","volume":"23","author":"Y Sun","year":"2009","unstructured":"Sun Y, Wong AC, Kamel MS (2009) Classification of imbalanced data: a review. Int J Pattern Recognit Artif Intell 23(4):687\u2013719","journal-title":"Int J Pattern Recognit Artif Intell"},{"key":"1330_CR32","doi-asserted-by":"crossref","unstructured":"Tahir MA, Kittler J, Mikolajczyk K, Yan F (2012) A multiple expert approach to the class imbalance problem using inverse random under sampling. In: Multiple classifier systems. Springer, pp 82\u201391","DOI":"10.1007\/978-3-642-02326-2_9"},{"issue":"4","key":"1330_CR33","doi-asserted-by":"publisher","first-page":"2000","DOI":"10.1214\/10-AOAS354","volume":"4","author":"X Wang","year":"2010","unstructured":"Wang X, Dey DK (2010) Generalised extreme value regression for binary response data: an application to b2b electronic payments system adoption. Ann Appl Stat 4(4):2000\u20132023","journal-title":"Ann Appl Stat"},{"issue":"4","key":"1330_CR34","doi-asserted-by":"publisher","first-page":"345","DOI":"10.1080\/00031305.2020.1731599","volume":"74","author":"L Xu","year":"2020","unstructured":"Xu L, Gotwalt C, Hong Y, King CB, Meeker WQ (2020) Applications of the fractional-random-weight bootstrap. 
Am Stat 74(4):345\u2013358","journal-title":"Am Stat"}],"updated-by":[{"DOI":"10.1007\/s00180-023-01365-1","type":"correction","label":"Correction","source":"publisher","updated":{"date-parts":[[2023,6,19]],"date-time":"2023-06-19T00:00:00Z","timestamp":1687132800000}}],"container-title":["Computational Statistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00180-023-01330-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00180-023-01330-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00180-023-01330-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,31]],"date-time":"2024-01-31T13:04:37Z","timestamp":1706706277000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00180-023-01330-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,4]]},"references-count":34,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,2]]}},"alternative-id":["1330"],"URL":"https:\/\/doi.org\/10.1007\/s00180-023-01330-y","relation":{},"ISSN":["0943-4062","1613-9658"],"issn-type":[{"value":"0943-4062","type":"print"},{"value":"1613-9658","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,4]]},"assertion":[{"value":"2 May 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 January 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 February 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 June 
2023","order":4,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Correction","order":5,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"A Correction to this paper has been published:","order":6,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"https:\/\/doi.org\/10.1007\/s00180-023-01365-1","URL":"https:\/\/doi.org\/10.1007\/s00180-023-01365-1","order":7,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}