{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,8]],"date-time":"2026-06-08T14:10:14Z","timestamp":1780927814766,"version":"3.54.1"},"reference-count":40,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2025,4,3]],"date-time":"2025-04-03T00:00:00Z","timestamp":1743638400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,4,3]],"date-time":"2025-04-03T00:00:00Z","timestamp":1743638400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100002301","name":"Eesti Teadusagentuur","doi-asserted-by":"publisher","award":["PRG1604"],"award-info":[{"award-number":["PRG1604"]}],"id":[{"id":"10.13039\/501100002301","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003510","name":"Haridus- ja Teadusministeerium","doi-asserted-by":"publisher","award":["Estonian Centre of Excellence in Artificial Intelligence (EXAI)"],"award-info":[{"award-number":["Estonian Centre of Excellence in Artificial Intelligence (EXAI)"]}],"id":[{"id":"10.13039\/501100003510","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>In many binary classification applications, the costs of false positives and negatives are imbalanced. Furthermore, there is often uncertainty about the exact costs of these errors. A natural measure-of-interest to be minimised in such scenarios is the expected misclassification cost. We identify many situations where this measure has analytic gradients, and thus it can be used as a training loss and optimised directly using empirical risk minimisation. 
In particular, we derive such losses from the Beta, Gamma and Gaussian distributions to model different kinds of cost uncertainty. The Beta family includes commonly used losses such as cross-entropy, squared error and 0\u20131 loss as special cases. The question then arises as to when it is appropriate to directly optimize the measure-of-interest, versus using a standard surrogate like cross-entropy or focal loss during training. After revisiting the theory of surrogate losses, proper losses and cost-sensitive learning to obtain good candidate surrogates out of derived families, we conduct an empirical comparison of derived training losses that, to our knowledge, were never tried on deep neural networks before, with the aim to minimise cost-sensitive measures-of-interest. The findings show that using Beta losses in training leads to improved performance compared to traditional training objectives like cross-entropy, label smoothing, and focal loss. This improvement is seen not only in terms of misclassification cost metrics, but (perhaps surprisingly) also in conventional metrics such as accuracy, mean squared error, and the area under the ROC curve.<\/jats:p>","DOI":"10.1007\/s10994-024-06634-8","type":"journal-article","created":{"date-parts":[[2025,4,5]],"date-time":"2025-04-05T09:36:37Z","timestamp":1743845797000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Cost-sensitive classification with cost uncertainty: do we need surrogate 
losses?"],"prefix":"10.1007","volume":"114","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9210-1260","authenticated-orcid":false,"given":"Viacheslav","family":"Komisarenko","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Meelis","family":"Kull","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2025,4,3]]},"reference":[{"key":"6634_CR1","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1016\/j.neunet.2018.07.011","volume":"106","author":"M Buda","year":"2018","unstructured":"Buda, M., Maki, A., & Mazurowski, M. A. (2018). A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks, 106, 249\u2013259.","journal-title":"Neural Networks"},{"key":"6634_CR2","first-page":"13","volume":"3","author":"A Buja","year":"2005","unstructured":"Buja, A., Stuetzle, W., & Shen, Y. (2005). Loss functions for binary class probability estimation and classification: Structure and applications. Working Draft, 3, 13.","journal-title":"Working Draft"},{"key":"6634_CR3","unstructured":"Cao, K., Wei, C., Gaidon, A., Arechiga, N., & Ma, T. (2019). Learning imbalanced datasets with label-distribution-aware margin loss. Advances in Neural Information Processing Systems, 32."},{"key":"6634_CR4","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1613\/jair.953","volume":"16","author":"NV Chawla","year":"2002","unstructured":"Chawla, N. V., Bowyer, K. W., Hall, Lawrence O., & Philip Kegelmeyer, W. (2002). Smote: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321\u2013357.","journal-title":"Journal of Artificial Intelligence Research"},{"key":"6634_CR5","unstructured":"Collell, G., Prelec, D. & Patil, K. (2016). Reviving threshold-moving: A simple plug-in bagging ensemble for binary and multiclass imbalanced data. 
arXiv preprint arXiv:1606.08698."},{"key":"6634_CR6","doi-asserted-by":"crossref","unstructured":"Cui, Y., Jia, M., Lin, T.Y., Song, Y. & Belongie, S. (2019). Class-balanced loss based on effective number of samples. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 9268\u20139277.","DOI":"10.1109\/CVPR.2019.00949"},{"key":"6634_CR7","first-page":"1","volume":"7","author":"J Dem\u0161ar","year":"2006","unstructured":"Dem\u0161ar, J. (2006). Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research, 7, 1\u201330.","journal-title":"The Journal of Machine Learning Research"},{"key":"6634_CR8","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1007\/s10994-006-8199-5","volume":"65","author":"C Drummond","year":"2006","unstructured":"Drummond, C., & Holte, Robert C. (2006). Cost curves: An improved method for visualizing classifier performance. Machine Learning, 65, 95\u2013130.","journal-title":"Machine Learning"},{"key":"6634_CR9","unstructured":"Elkan, C. (2001). The foundations of cost-sensitive learning. In International joint conference on artificial intelligence, pp. 973\u2013978. Lawrence Erlbaum Associates Ltd."},{"key":"6634_CR10","unstructured":"Guo, C., Pleiss, G., Sun, Y. & Weinberger, K.Q. (2017). On calibration of modern neural networks. In International Conference on Machine Learning, PMLR, pp. 1321\u20131330."},{"issue":"1","key":"6634_CR11","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1007\/s10994-009-5119-5","volume":"77","author":"DJ Hand","year":"2009","unstructured":"Hand, D. J. (2009). Measuring classifier performance: A coherent alternative to the area under the roc curve. 
Machine Learning, 77(1), 103\u2013123.","journal-title":"Machine Learning"},{"key":"6634_CR12","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The elements of statistical learning: Data mining, inference, and prediction","author":"T Hastie","year":"2009","unstructured":"Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. Germany: Springer Science & Business Media."},{"key":"6634_CR13","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1016\/j.media.2016.05.004","volume":"35","author":"M Havaei","year":"2017","unstructured":"Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., Pal, C., Jodoin, P.-M., & Larochelle, H. (2017). Brain tumor segmentation with deep neural networks. Medical Image Analysis, 35, 18\u201331.","journal-title":"Medical Image Analysis"},{"key":"6634_CR14","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S. & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770\u2013778.","DOI":"10.1109\/CVPR.2016.90"},{"key":"6634_CR15","first-page":"2813","volume":"13","author":"J Hern\u00e1ndez-Orallo","year":"2012","unstructured":"Hern\u00e1ndez-Orallo, J., Flach, P., & Ferri, C. (2012). A unified view of performance metrics: Translating threshold choice into expected classification loss. Journal of Machine Learning Research, 13, 2813\u20132869.","journal-title":"Journal of Machine Learning Research"},{"key":"6634_CR16","doi-asserted-by":"crossref","unstructured":"Huang, C., Li, Y., Loy, C.C. & Tang, X. (2016). Learning deep representation for imbalanced classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 
5375\u20135384.","DOI":"10.1109\/CVPR.2016.580"},{"key":"6634_CR17","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1016\/j.neucom.2018.11.099","volume":"343","author":"A Iranmehr","year":"2019","unstructured":"Iranmehr, A., Masnadi-Shirazi, H., & Vasconcelos, N. (2019). Cost-sensitive support vector machines. Neurocomputing, 343, 50\u201364.","journal-title":"Neurocomputing"},{"issue":"5","key":"6634_CR18","doi-asserted-by":"publisher","first-page":"429","DOI":"10.3233\/IDA-2002-6504","volume":"6","author":"N Japkowicz","year":"2002","unstructured":"Japkowicz, N., & Stephen, S. (2002). The class imbalance problem: A systematic study. Intelligent Data Analysis, 6(5), 429\u2013449.","journal-title":"Intelligent Data Analysis"},{"issue":"1","key":"6634_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40537-019-0192-5","volume":"6","author":"JM Johnson","year":"2019","unstructured":"Johnson, J. M., & Khoshgoftaar, T. M. (2019). Survey on deep learning with class imbalance. Journal of Big Data, 6(1), 1\u201354.","journal-title":"Journal of Big Data"},{"key":"6634_CR20","unstructured":"Kingma, D.P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980."},{"key":"6634_CR21","unstructured":"Krizhevsky, A., Nair, V., & Hinton, G. (2009). Cifar-10 and cifar-100 datasets. https:\/\/www.cs.toronto.edu\/kriz\/cifar.html, 6."},{"key":"6634_CR22","unstructured":"Kukar, M., Kononenko, I. et\u00a0al. (1998). Cost-sensitive learning with neural networks. In ECAI, vol\/\u00a015, pp. 88\u201394. Citeseer."},{"key":"6634_CR23","doi-asserted-by":"crossref","unstructured":"Li, B., Liu, Y. & Wang, X. (2019). Gradient harmonized single-stage detector. In Proceedings of the AAAI Conference on Artificial Intelligence,33, pp. 8577\u20138584.","DOI":"10.1609\/aaai.v33i01.33018577"},{"key":"6634_CR24","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Doll\u00e1r, P. (2017). 
Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2980\u20132988.","DOI":"10.1109\/ICCV.2017.324"},{"issue":"1","key":"6634_CR25","doi-asserted-by":"publisher","first-page":"191","DOI":"10.1023\/A:1012406528296","volume":"46","author":"Y Lin","year":"2002","unstructured":"Lin, Y., Lee, Y., & Wahba, G. (2002). Support vector machines for classification in nonstandard situations. Machine Learning, 46(1), 191\u2013202.","journal-title":"Machine Learning"},{"key":"6634_CR26","doi-asserted-by":"crossref","unstructured":"Lipton, Z.C., Elkan, C., & Naryanaswamy, B. (2014). Optimal thresholding of classifiers to maximize f1 measure. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, pp. 225\u2013239.","DOI":"10.1007\/978-3-662-44851-9_15"},{"key":"6634_CR27","unstructured":"M\u00fcller, R., Kornblith, S., & Hinton, G.E. (2019) When does label smoothing help? Advances in neural information processing systems, 32."},{"issue":"8","key":"6634_CR28","doi-asserted-by":"publisher","first-page":"1123","DOI":"10.1057\/palgrave.jors.2602641","volume":"60","author":"P Pendharkar","year":"2009","unstructured":"Pendharkar, P. (2009). Misclassification cost minimizing fitness functions for genetic algorithm-based artificial neural network classifiers. Journal of the Operational Research Society, 60(8), 1123\u20131134.","journal-title":"Journal of the Operational Research Society"},{"issue":"5","key":"6634_CR29","doi-asserted-by":"publisher","first-page":"432","DOI":"10.1002\/nav.20154","volume":"53","author":"P Pendharkar","year":"2006","unstructured":"Pendharkar, P., & Nanda, S. (2006). A misclassification cost-minimizing evolutionary-neural classification approach. Naval Research Logistics (NRL), 53(5), 432\u2013447.","journal-title":"Naval Research Logistics (NRL)"},{"key":"6634_CR30","unstructured":"Pereyra, G., Tucker, G., Chorowski, J., Kaiser, \u0141., & Hinton, G. 
(2017). Regularizing neural networks by penalizing confident output distributions. arXiv preprint arXiv:1701.06548."},{"issue":"2","key":"6634_CR31","doi-asserted-by":"publisher","first-page":"245","DOI":"10.1007\/s10115-011-0465-6","volume":"33","author":"E Ramentol","year":"2012","unstructured":"Ramentol, E., Caballero, Y., Bello, R., & Herrera, F. (2012). Smote-RSB*: A hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using smote and rough sets theory. Knowledge and Information Systems, 33(2), 245\u2013265.","journal-title":"Knowledge and Information Systems"},{"key":"6634_CR32","unstructured":"Rybizki, L. (2014). Learning cost sensitive binary classification rules accounting for uncertain and unequal misclassification costs. IWQW Discussion Papers: Technical report."},{"issue":"4","key":"6634_CR33","doi-asserted-by":"publisher","first-page":"1856","DOI":"10.1214\/aos\/1176347398","volume":"17","author":"MJ Schervish","year":"1989","unstructured":"Schervish, M. J. (1989). A general method for comparing probability assessors. The Annals of Statistics, 17(4), 1856\u20131879.","journal-title":"The Annals of Statistics"},{"key":"6634_CR34","doi-asserted-by":"publisher","first-page":"118931","DOI":"10.1109\/ACCESS.2019.2933437","volume":"7","author":"Yu Shuang","year":"2019","unstructured":"Shuang, Yu., Li, X., Zhang, X., & Wang, H. (2019). The OCS-SVM: An objective-cost-sensitive SVM with sample-based misclassification cost invariance. IEEE Access, 7, 118931\u2013118942.","journal-title":"IEEE Access"},{"issue":"2","key":"6634_CR35","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1007\/BF02289503","volume":"31","author":"EH Shuford Jr","year":"1966","unstructured":"Shuford, E. H., Jr., Albert, A., & Massengill, H. E. (1966). Admissible probability measurement procedures. 
Psychometrika, 31(2), 125\u2013145.","journal-title":"Psychometrika"},{"key":"6634_CR36","first-page":"11809","volume":"34","author":"D-B Wang","year":"2021","unstructured":"Wang, D.-B., Feng, L., & Zhang, M.-L. (2021). Rethinking calibration of deep neural networks: Do not be afraid of overconfidence. Advances in Neural Information Processing Systems, 34, 11809\u201311820.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"6634_CR37","unstructured":"Xiao, H., Rasul, K., & Vollgraf, R. (2017). Fashion-mnist: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747."},{"issue":"1","key":"6634_CR38","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1038\/s41597-022-01721-8","volume":"10","author":"J Yang","year":"2023","unstructured":"Yang, J., Shi, R., Wei, D., Liu, Z., Zhao, L., Ke, B., Pfister, H., & Ni, B. (2023). Medmnist v2-a large-scale lightweight benchmark for 2d and 3d biomedical image classification. Scientific Data, 10(1), 41.","journal-title":"Scientific Data"},{"key":"6634_CR39","unstructured":"Ye, N., Chai, K.M., Lee, W.S., & Chieu, H.L. (2012). Optimizing f-measures: A tale of two approaches. In Proceedings of the 29th International Conference on Machine Learning, Omnipress, pp. 289\u2013296."},{"key":"6634_CR40","doi-asserted-by":"crossref","unstructured":"Zadrozny, B., Langford, J., & Abe, N. (2003). Cost-sensitive learning by cost-proportionate example weighting. In Third IEEE International Conference on Data Mining, IEEE, pp. 
435\u2013442.","DOI":"10.1109\/ICDM.2003.1250950"}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-024-06634-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-024-06634-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-024-06634-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,21]],"date-time":"2025-05-21T16:17:52Z","timestamp":1747844272000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-024-06634-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,3]]},"references-count":40,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["6634"],"URL":"https:\/\/doi.org\/10.1007\/s10994-024-06634-8","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,3]]},"assertion":[{"value":"27 February 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 November 2024","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 November 2024","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 April 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no 
conflicts of interest to declare that are relevant to the content of this article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}},{"value":"Not applicable.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}},{"value":"Not applicable.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"Code for running the experiments will be published upon acceptance.","order":6,"name":"Ethics","group":{"name":"EthicsHeading","label":"Code availability"}}],"article-number":"132"}}