{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,26]],"date-time":"2026-06-26T01:41:24Z","timestamp":1782438084622,"version":"3.54.5"},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"8","license":[{"start":{"date-parts":[[2021,7,7]],"date-time":"2021-07-07T00:00:00Z","timestamp":1625616000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,7,7]],"date-time":"2021-07-07T00:00:00Z","timestamp":1625616000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100008769","name":"Julius-Maximilians-Universit\u00e4t W\u00fcrzburg","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100008769","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2021,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In many real world settings,\n imbalanced data impedes model performance of learning algorithms, like neural networks, mostly for rare cases. This is especially problematic for tasks focusing on these rare occurrences. For example, when estimating precipitation, extreme rainfall events are scarce but important considering their potential consequences. While there are numerous well studied solutions for classification settings, most of them cannot be applied to regression easily. Of the few solutions for regression tasks, barely any have explored cost-sensitive learning which is known to have advantages compared to sampling-based methods in classification tasks. In this work, we propose a sample weighting approach for imbalanced regression datasets called <jats:italic>DenseWeight<\/jats:italic> and a cost-sensitive learning approach for neural network regression with imbalanced data called <jats:italic>DenseLoss<\/jats:italic> based on our weighting scheme. DenseWeight weights data points according to their target value rarities through kernel density estimation (KDE). DenseLoss adjusts each data point\u2019s influence on the loss according to DenseWeight, giving rare data points more influence on model\n training compared to common data points. We show on multiple differently distributed datasets that DenseLoss significantly improves model performance for rare data points through its density-based weighting scheme. Additionally, we compare DenseLoss to the state-of-the-art method SMOGN, finding that our method mostly yields better performance. Our approach provides more control over model training as it enables us to actively decide on the trade-off between focusing on common or rare cases through a single hyperparameter, allowing the training of better models for rare data points.<\/jats:p>","DOI":"10.1007\/s10994-021-06023-5","type":"journal-article","created":{"date-parts":[[2021,7,7]],"date-time":"2021-07-07T18:01:51Z","timestamp":1625680911000},"page":"2187-2211","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":175,"title":["Density-based weighting for imbalanced regression"],"prefix":"10.1007","volume":"110","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3102-481X","authenticated-orcid":false,"given":"Michael","family":"Steininger","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Konstantin","family":"Kobs","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Padraig","family":"Davidson","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Anna","family":"Krause","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Andreas","family":"Hotho","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2021,7,7]]},"reference":[{"key":"6023_CR1","unstructured":"Branco, P., Ribeiro, R. P., & Torgo, L. (2016a). UBL: An R package for utility-based learning. arXiv preprint arXiv:1604.08079."},{"key":"6023_CR2","unstructured":"Branco, P., Torgo, L., & Ribeiro, R. P. (2017). SMOGN: A pre-processing approach for imbalanced regression. In LIDTA."},{"issue":"2","key":"6023_CR3","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2907070","volume":"49","author":"P Branco","year":"2016","unstructured":"Branco, P., Torgo, L., & Ribeiro, R. P. (2016b). A survey of predictive modeling on imbalanced domains. ACM Computing Surveys (CSUR), 49(2), 1\u201350.","journal-title":"ACM Computing Surveys (CSUR)"},{"key":"6023_CR4","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1613\/jair.953","volume":"16","author":"NV Chawla","year":"2002","unstructured":"Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. JAIR, 16, 321\u2013357.","journal-title":"JAIR"},{"issue":"1","key":"6023_CR5","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1080\/24709360.2017.1396742","volume":"1","author":"Y-C Chen","year":"2017","unstructured":"Chen, Y.-C. (2017). A tutorial on kernel density estimation and recent advances. Biostatistics and Epidemiology, 1(1), 161\u2013187.","journal-title":"Biostatistics and Epidemiology"},{"key":"6023_CR6","first-page":"9268","volume":"2018","author":"Y Cui","year":"2019","unstructured":"Cui, Y., Jia, M., Lin, T.-Y., Song, Y., & Belongie, S. (2019). Class-balanced loss based on effective number of samples. CVPR, 2018, 9268\u20139277.","journal-title":"CVPR"},{"issue":"15","key":"6023_CR7","doi-asserted-by":"publisher","first-page":"2031","DOI":"10.1002\/joc.1688","volume":"28","author":"C Daly","year":"2008","unstructured":"Daly, C., et al. (2008). Physiographically sensitive mapping of climatological temperature and precipitation across the conterminous United States. International Journal of Climatology, 28(15), 2031\u20132064.","journal-title":"International Journal of Climatology"},{"key":"6023_CR8","first-page":"1851","volume":"2017","author":"Q Dong","year":"2017","unstructured":"Dong, Q., Gong, S., & Zhu, X. (2017). Class rectification hard mining for imbalanced deep learning. ICCV, 2017, 1851\u20131860.","journal-title":"ICCV"},{"key":"6023_CR10","doi-asserted-by":"crossref","unstructured":"Grinstead, C. M., & Snell, J. L. (2012). Introduction to probability. AMS.","DOI":"10.1090\/stml\/057"},{"key":"6023_CR11","unstructured":"He, H., Bai, Y., Garcia, E. A., & Li, S. (2008). ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In IJCNN 2008. IEEE (pp. 1322\u20131328)."},{"key":"6023_CR12","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In ICCV 2015.","DOI":"10.1109\/ICCV.2015.123"},{"key":"6023_CR13","doi-asserted-by":"crossref","unstructured":"Hern\u00e1ndez-Orallo, J. (2014). Probabilistic reframing for cost-sensitive regression. In TKDD 8.4.","DOI":"10.1145\/2641758"},{"issue":"12","key":"6023_CR14","doi-asserted-by":"publisher","first-page":"3395","DOI":"10.1016\/j.patcog.2013.06.014","volume":"46","author":"J Hern\u00e1ndez-Orallo","year":"2013","unstructured":"Hern\u00e1ndez-Orallo, J. (2013). ROC curves for regression. Pattern Recognition, 46(12), 3395\u20133411.","journal-title":"Pattern Recognition"},{"key":"6023_CR15","first-page":"5375","volume":"2016","author":"C Huang","year":"2016","unstructured":"Huang, C., Li, Y., Change Loy, C., & Tang, X. (2016). Learning deep representation for imbalanced classification. CVPR, 2016, 5375\u20135384.","journal-title":"CVPR"},{"key":"6023_CR16","doi-asserted-by":"publisher","first-page":"1192","DOI":"10.1016\/j.ins.2019.10.017","volume":"512","author":"F Kamalov","year":"2020","unstructured":"Kamalov, F. (2020). Kernel density estimation based sampling for imbalanced class distribution. Information Sciences, 512, 1192\u20131201.","journal-title":"Information Sciences"},{"key":"6023_CR17","unstructured":"Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980."},{"issue":"4","key":"6023_CR18","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1007\/s13748-016-0094-0","volume":"5","author":"B Krawczyk","year":"2016","unstructured":"Krawczyk, B. (2016). Learning from imbalanced data: Open challenges and future directions. Progress in Artificial Intelligence, 5(4), 221\u2013232.","journal-title":"Progress in Artificial Intelligence"},{"key":"6023_CR19","unstructured":"Kunz, N. (2019). Smogn. [Online; version 0.1.2]. https:\/\/git.io\/JOWoK."},{"key":"6023_CR20","first-page":"807","volume":"2010","author":"V Nair","year":"2010","unstructured":"Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. ICML, 2010, 807\u2013814.","journal-title":"ICML"},{"key":"6023_CR21","unstructured":"Odland, T. (2019). KDEpy. [Online; version 1.0.10]. https:\/\/git.io\/JOWrM."},{"key":"6023_CR22","doi-asserted-by":"crossref","unstructured":"Prechelt, L. (1998). Early stopping-but when? In Neural networks: Tricks of the trade (pp. 55\u201369). Springer.","DOI":"10.1007\/3-540-49430-8_3"},{"key":"6023_CR23","unstructured":"Ribeiro, R. P. (2011). Utility-based Regression. PhD thesis. University of Porto."},{"issue":"9","key":"6023_CR24","doi-asserted-by":"publisher","first-page":"1803","DOI":"10.1007\/s10994-020-05900-9","volume":"109","author":"RP Ribeiro","year":"2020","unstructured":"Ribeiro, R. P., & Moniz, N. (2020). Imbalanced regression and extreme value prediction. Machine Learning, 109(9), 1803\u20131835.","journal-title":"Machine Learning"},{"key":"6023_CR25","doi-asserted-by":"crossref","unstructured":"Silverman, B. W. (1986). Density estimation for statistics and data analysis (Vol. 26). CRC Press, London","DOI":"10.1007\/978-1-4899-3324-9"},{"issue":"04","key":"6023_CR26","first-page":"687","volume":"23","author":"Y Sun","year":"2009","unstructured":"Sun, Y., Wong, A. K., & Kamel, M. S. (2009). Classification of imbalanced data: A review. IJPRAI, 23(04), 687\u2013719.","journal-title":"IJPRAI"},{"key":"6023_CR27","doi-asserted-by":"crossref","unstructured":"Torgo, L., Ribeiro, R. P., Pfahringer, B., & Branco, P. (2013). Smote for regression. In Portuguese conference on artificial intelligence (pp. 378\u2013389). Springer.","DOI":"10.1007\/978-3-642-40669-0_33"},{"key":"6023_CR9","unstructured":"U.S. Geological Survey. (1996). GTOPO30. https:\/\/doi.org\/10.5066\/F7DF6PQS."},{"key":"6023_CR28","first-page":"1663","volume":"2017","author":"T Vandal","year":"2017","unstructured":"Vandal, T., Kodra, E., Ganguly, S., Michaelis, A., Nemani, R., & Ganguly, A. R. (2017). Deepsd: Generating high resolution climate change projections through single image super-resolution. KDD, 2017, 1663\u20131672.","journal-title":"KDD"},{"key":"6023_CR29","first-page":"7029","volume":"2017","author":"Y-X Wang","year":"2017","unstructured":"Wang, Y.-X., Ramanan, D., & Hebert, M. (2017). Learning to model the tail. NIPS, 2017, 7029\u20137039.","journal-title":"NIPS"},{"key":"6023_CR30","doi-asserted-by":"crossref","unstructured":"Wilcoxon, F. (1945). Individual comparisons by ranking methods. In Biometrics bulletin 1.6 (pp. 80\u201383). http:\/\/www.jstor.org\/stable\/3001968.","DOI":"10.2307\/3001968"},{"key":"6023_CR31","doi-asserted-by":"crossref","unstructured":"Zhao, H., Sinha, A. P., & Bansal, G. (2011). An extended tuning method for cost sensitive regression and forecasting. In Decision support systems 51.3.","DOI":"10.1016\/j.dss.2011.01.003"}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-021-06023-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-021-06023-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-021-06023-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,27]],"date-time":"2021-09-27T21:35:23Z","timestamp":1632778523000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-021-06023-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,7]]},"references-count":31,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2021,8]]}},"alternative-id":["6023"],"URL":"https:\/\/doi.org\/10.1007\/s10994-021-06023-5","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,7,7]]},"assertion":[{"value":"22 November 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 April 2021","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 June 2021","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 July 2021","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}