{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,18]],"date-time":"2026-01-18T08:55:24Z","timestamp":1768726524804,"version":"3.49.0"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"7","license":[{"start":{"date-parts":[[2025,6,5]],"date-time":"2025-06-05T00:00:00Z","timestamp":1749081600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,6,5]],"date-time":"2025-06-05T00:00:00Z","timestamp":1749081600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100014440","name":"Ministerio de Ciencia, Innovaci\u00f3n y Universidades","doi-asserted-by":"publisher","award":["PID2023-153035NB-100"],"award-info":[{"award-number":["PID2023-153035NB-100"]}],"id":[{"id":"10.13039\/100014440","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100008530","name":"European Regional Development Fund","doi-asserted-by":"publisher","award":["PID2023-153035NB-100"],"award-info":[{"award-number":["PID2023-153035NB-100"]}],"id":[{"id":"10.13039\/501100008530","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003759","name":"Universidad Polit\u00e9cnica de Madrid","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100003759","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003759","name":"Universidad Polit\u00e9cnica de Madrid","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100003759","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2025,7]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>In supervised classification problems using Deep Neural Networks, the loss function is typically based on the 
Kullback\u2013Leibler divergence. However, alternative entropic divergence formulations, such as the Jensen\u2013Shannon Divergence (JSD), have recently garnered attention for their unique properties. In this study, we delve deeper into the interpretation of the JSD and its generalized form, the Jensen\u2013Tsallis Divergence (JTD), as alternative loss functions for supervised classification. When provided with one-hot encoded distributions for the true label probabilities, we demonstrate that these novel divergences impose an intrinsic output confidence regularization that prevents overfitting. Additionally, the <jats:italic>q<\/jats:italic> non-extensive parameter of the JTD directly influences the structure of the regularizer, offering increased flexibility in the formulation of the loss function. Through experiments conducted on artificially imbalanced versions of MNIST, Fashion-MNIST, SVHN and CIFAR-10 we showcase how JTD outperforms JSD and other traditional loss functions in terms of generalization performance, especially for highly imbalanced datasets.<\/jats:p>","DOI":"10.1007\/s10994-025-06791-4","type":"journal-article","created":{"date-parts":[[2025,6,5]],"date-time":"2025-06-05T16:32:54Z","timestamp":1749141174000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Jensen\u2013Tsallis divergence for supervised classification under data 
imbalance"],"prefix":"10.1007","volume":"114","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7870-6287","authenticated-orcid":false,"given":"Antonio","family":"Squicciarini","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9912-1499","authenticated-orcid":false,"given":"Tom","family":"Trigano","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7407-3630","authenticated-orcid":false,"given":"David","family":"Luengo","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,6,5]]},"reference":[{"key":"6791_CR1","doi-asserted-by":"publisher","unstructured":"Ahn, C., Kim, K., Baek, J.-W., Lim, J., & Han, S. (2023). Sample-wise label confidence incorporation for learning with noisy labels. In 2023 IEEE\/CVF international conference on computer vision (ICCV) (pp. 1823\u20131832). https:\/\/doi.org\/10.1109\/ICCV51070.2023.00175","DOI":"10.1109\/ICCV51070.2023.00175"},{"key":"6791_CR2","unstructured":"Amid, E., Warmuth, M. K., & Srinivasan, S. (2019). Two-temperature logistic regression based on the Tsallis divergence. In Proceedings of the twenty-second international conference on artificial intelligence and statistics (pp. 2388\u20132396)."},{"key":"6791_CR3","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1016\/j.neunet.2018.07.011","volume":"106","author":"M Buda","year":"2018","unstructured":"Buda, M., Maki, A., & Mazurowski, M. A. (2018). A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks, 106, 249\u2013259. https:\/\/doi.org\/10.1016\/j.neunet.2018.07.011","journal-title":"Neural Networks"},{"key":"6791_CR4","doi-asserted-by":"publisher","unstructured":"Choi, Y., Lee, K., & Oh, S. (2019). Distributional deep reinforcement learning with a mixture of Gaussians. In 2019 international conference on robotics and automation (ICRA) (pp. 9791\u20139797). 
https:\/\/doi.org\/10.1109\/ICRA.2019.8793505","DOI":"10.1109\/ICRA.2019.8793505"},{"key":"6791_CR5","unstructured":"Chu, X., Jin, Y., Zhu, W., Wang, Y., Wang, X., Zhang, S., & Mei, H. (2022). DNA: Domain generalization with diversified neural averaging. In Proceedings of the 39th international conference on machine learning (pp. 4010\u20134034)."},{"key":"6791_CR6","doi-asserted-by":"crossref","unstructured":"Cui, Y., Jia, M., Lin, T.-Y., Song, Y., & Belongie, S. (2019). Class-balanced loss based on effective number of samples. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (pp. 9268\u20139277).","DOI":"10.1109\/CVPR.2019.00949"},{"key":"6791_CR7","first-page":"10647","volume":"33","author":"J Deasy","year":"2020","unstructured":"Deasy, J., Simidjievski, N., & Li\u00f3, P. (2020). Constraining variational inference with geometric Jensen-Shannon divergence. Advances in Neural Information Processing Systems, 33, 10647\u201310658.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"6791_CR8","first-page":"1","volume":"7","author":"J Dem\u0161ar","year":"2006","unstructured":"Dem\u0161ar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1\u201330.","journal-title":"Journal of Machine Learning Research"},{"issue":"7","key":"6791_CR9","doi-asserted-by":"publisher","first-page":"1858","DOI":"10.1109\/TIT.2003.813506","volume":"49","author":"DM Endres","year":"2003","unstructured":"Endres, D. M., & Schindelin, J. E. (2003). A new metric for probability distributions. IEEE Transactions on Information Theory, 49(7), 1858\u20131860. https:\/\/doi.org\/10.1109\/TIT.2003.813506","journal-title":"IEEE Transactions on Information Theory"},{"key":"6791_CR10","first-page":"30284","volume":"34","author":"E Englesson","year":"2021","unstructured":"Englesson, E., & Azizpour, H. (2021). 
Generalized Jensen-Shannon divergence loss for learning with noisy labels. Advances in Neural Information Processing Systems, 34, 30284\u201330297.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"6791_CR11","unstructured":"Erichson, B., Lim, S. H., Xu, W., Utrera, F., Cao, Z., & Mahoney, M. (2024). NoisyMix: Boosting model robustness to common corruptions. In Proceedings of the 27th international conference on artificial intelligence and statistics (pp. 4033\u20134041)."},{"issue":"1\u20132","key":"6791_CR13","doi-asserted-by":"publisher","first-page":"388","DOI":"10.1016\/j.physa.2011.07.052","volume":"391","author":"S Furuichi","year":"2012","unstructured":"Furuichi, S., & Mitroi, F.-C. (2012). Mathematical inequalities for some divergences. Physica A: Statistical Mechanics and Its Applications, 391(1\u20132), 388\u2013400. https:\/\/doi.org\/10.1016\/j.physa.2011.07.052","journal-title":"Physica A: Statistical Mechanics and Its Applications"},{"issue":"12","key":"6791_CR14","doi-asserted-by":"publisher","first-page":"4868","DOI":"10.1063\/1.1805729","volume":"45","author":"S Furuichi","year":"2004","unstructured":"Furuichi, S., Yanagi, K., & Kuriyama, K. (2004). Fundamental properties of Tsallis relative entropy. Journal of Mathematical Physics, 45(12), 4868\u20134877. https:\/\/doi.org\/10.1063\/1.1805729","journal-title":"Journal of Mathematical Physics"},{"key":"6791_CR15","unstructured":"Gangwani, T., Liu, Q., & Peng, J. (2019). Learning self-imitating diverse policies. In International conference on learning representations. https:\/\/arxiv.org\/abs\/1805.10309"},{"key":"6791_CR16","doi-asserted-by":"publisher","unstructured":"Ghosh, A., Kumar, H., & Sastry, P. S. (2017). Robust loss functions under label noise for deep neural networks. In Proceedings of the AAAI conference on artificial intelligence (Vol. 31, No. 1). 
https:\/\/doi.org\/10.1609\/aaai.v31i1.10894","DOI":"10.1609\/aaai.v31i1.10894"},{"key":"6791_CR17","doi-asserted-by":"publisher","unstructured":"Golik, P., Doetsch, P., & Ney, H. (2013). Cross-entropy vs. squared error training: A theoretical and experimental comparison. Interspeech 2013, 1756\u20131760. https:\/\/doi.org\/10.21437\/Interspeech.2013-436","DOI":"10.21437\/Interspeech.2013-436"},{"issue":"11","key":"6791_CR18","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1145\/3422622","volume":"63","author":"I Goodfellow","year":"2020","unstructured":"Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2020). Generative adversarial networks. Communications of the ACM, 63(11), 139\u2013144. https:\/\/doi.org\/10.1145\/3422622","journal-title":"Communications of the ACM"},{"key":"6791_CR19","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770\u2013778).","DOI":"10.1109\/CVPR.2016.90"},{"key":"6791_CR20","unstructured":"Hendrycks, D., Mu, N., Cubuk, E. D., Zoph, B., Gilmer, J., & Lakshminarayanan, B. (2019). AugMix: A simple data processing method to improve robustness and uncertainty. In International conference on learning representations."},{"key":"6791_CR21","doi-asserted-by":"publisher","unstructured":"Hoyos-Osorio, J. K., Posso-Murillo, S., & Sanchez-Giraldo, L. G. (2023). The representation Jensen-Shannon divergence. https:\/\/doi.org\/10.48550\/arXiv.2305.16446. arXiv:2305.16446","DOI":"10.48550\/arXiv.2305.16446"},{"key":"6791_CR22","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.patrec.2023.03.018","volume":"169","author":"D Hwang","year":"2023","unstructured":"Hwang, D., Ha, J.-W., Shim, H., & Choe, J. (2023). Entropy regularization for weakly supervised object localization. 
Pattern Recognition Letters, 169, 1\u20137. https:\/\/doi.org\/10.1016\/j.patrec.2023.03.018","journal-title":"Pattern Recognition Letters"},{"issue":"1","key":"6791_CR23","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1016\/S0378-4371(03)00566-1","volume":"329","author":"PW Lamberti","year":"2003","unstructured":"Lamberti, P. W., & Majtey, A. P. (2003). Non-logarithmic Jensen-Shannon divergence. Physica A: Statistical Mechanics and its Applications, 329(1), 81\u201390. https:\/\/doi.org\/10.1016\/S0378-4371(03)00566-1","journal-title":"Physica A: Statistical Mechanics and its Applications"},{"issue":"11","key":"6791_CR24","doi-asserted-by":"publisher","first-page":"2278","DOI":"10.1109\/5.726791","volume":"86","author":"Y Lecun","year":"1998","unstructured":"Lecun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278\u20132324. https:\/\/doi.org\/10.1109\/5.726791","journal-title":"Proceedings of the IEEE"},{"key":"6791_CR25","first-page":"3163","volume":"34","author":"M Li","year":"2021","unstructured":"Li, M., Zhang, X., Thrampoulidis, C., Chen, J., & Oymak, S. (2021). AutoBalance: Optimized loss functions for imbalanced data. Advances in Neural Information Processing Systems, 34, 3163\u20133177.","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"1","key":"6791_CR26","doi-asserted-by":"publisher","first-page":"145","DOI":"10.1109\/18.61115","volume":"37","author":"J Lin","year":"1991","unstructured":"Lin, J. (1991). Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1), 145\u2013151. https:\/\/doi.org\/10.1109\/18.61115","journal-title":"IEEE Transactions on Information Theory"},{"key":"6791_CR27","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollar, P. (2017). Focal loss for dense object detection. 
In Proceedings of the IEEE international conference on computer vision (pp. 2980\u20132988).","DOI":"10.1109\/ICCV.2017.324"},{"key":"6791_CR29","first-page":"15288","volume":"33","author":"J Mukhoti","year":"2020","unstructured":"Mukhoti, J., Kulharia, V., Sanyal, A., Golodetz, S., Torr, P., & Dokania, P. (2020). Calibrating deep neural networks using focal loss. Advances in Neural Information Processing Systems, 33, 15288\u201315299.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"6791_CR31","unstructured":"Pereyra, G., Tucker, G., Chorowski, J., Kaiser, L., & Hinton, G. (2017). Regularizing neural networks by penalizing confident output distributions. In International conference on learning representations (ICLR), workshop track. https:\/\/arxiv.org\/abs\/1701.06548"},{"key":"6791_CR33","unstructured":"Sinn, M., & Rawat, A. (2018). Non-parametric estimation of Jensen-Shannon divergence in generative adversarial network training. In Proceedings of the twenty-first international conference on artificial intelligence and statistics (pp. 642\u2013651)."},{"issue":"8","key":"6791_CR34","doi-asserted-by":"publisher","first-page":"8414","DOI":"10.1609\/aaai.v36i8.20817","volume":"36","author":"J Tack","year":"2022","unstructured":"Tack, J., Yu, S., Jeong, J., Kim, M., Hwang, S. J., & Shin, J. (2022). Consistency regularization for adversarial robustness. Proceedings of the AAAI Conference on Artificial Intelligence, 36(8), 8414\u20138422. https:\/\/doi.org\/10.1609\/aaai.v36i8.20817","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"6791_CR35","doi-asserted-by":"publisher","unstructured":"Tsallis, C., Baldovin, F., Cerbino, R., & Pierobon, P. (2003). Introduction to nonextensive statistical mechanics and thermodynamics. https:\/\/doi.org\/10.48550\/arXiv.cond-mat\/0309093. 
arXiv:cond-mat\/0309093","DOI":"10.48550\/arXiv.cond-mat\/0309093"},{"key":"6791_CR36","doi-asserted-by":"publisher","unstructured":"Vila, M., Bardera, A., Feixas, M., & Sbert, M. (2011). Tsallis mutual information for document classification. Entropy, 13(9), 1694\u20131707. https:\/\/doi.org\/10.3390\/e13091694","DOI":"10.3390\/e13091694"},{"key":"6791_CR37","unstructured":"Weber, M. G., Li, L., Wang, B., Zhao, Z., Li, B., & Zhang, C. (2022). Certifying out-of-domain generalization for Blackbox functions. In Proceedings of the 39th international conference on machine learning (pp. 23527\u201323548)."},{"key":"6791_CR38","unstructured":"Weng, L. (2019). From GAN to WGAN. arXiv:1904.08994"},{"key":"6791_CR39","unstructured":"Zhang, Z., & Sabuncu, M. (2018). Generalized cross entropy loss for training deep neural networks with noisy labels. In Advances in neural information processing systems (Vol. 31)."}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-025-06791-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-025-06791-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-025-06791-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,10]],"date-time":"2025-07-10T08:57:23Z","timestamp":1752137843000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-025-06791-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,5]]},"references-count":35,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2025,7]]}},"alternative-id":["6791"],"URL":"https:\/\/doi.org\/10.1007\/s
10994-025-06791-4","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,5]]},"assertion":[{"value":"11 April 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 April 2025","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 April 2025","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 June 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that there is no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}},{"value":"Not applicable.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}},{"value":"The authors give their consent for publication.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}}],"article-number":"162"}}