{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,18]],"date-time":"2026-06-18T19:38:33Z","timestamp":1781811513469,"version":"3.54.5"},"reference-count":34,"publisher":"Springer Science and Business Media LLC","issue":"10","license":[{"start":{"date-parts":[[2022,8,18]],"date-time":"2022-08-18T00:00:00Z","timestamp":1660780800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,8,18]],"date-time":"2022-08-18T00:00:00Z","timestamp":1660780800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Comput Vis"],"published-print":{"date-parts":[[2022,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Long-tailed datasets are very frequently encountered in real-world use cases where few classes or categories (known as majority or head classes) have higher number of data samples compared to the other classes (known as minority or tail classes). Training deep neural networks on such datasets gives results biased towards the head classes. So far, researchers have come up with multiple weighted loss and data re-sampling techniques in efforts to reduce the bias. However, most of such techniques assume that the tail classes are always the most difficult classes to learn and therefore need more weightage or attention. Here, we argue that the assumption might not always hold true. Therefore, we propose a novel approach to dynamically measure the instantaneous difficulty of each class during the training phase of the model. Further, we use the difficulty measures of each class to design a novel weighted loss technique called \u2018class-wise difficulty based weighted (CDB-W) loss\u2019 and a novel data sampling technique called \u2018class-wise difficulty based sampling (CDB-S)\u2019. To verify the wide-scale usability of our CDB methods, we conducted extensive experiments on multiple tasks such as image classification, object detection, instance segmentation and video-action classification. Results verified that CDB-W loss and CDB-S could achieve state-of-the-art results on many class-imbalanced datasets such as ImageNet-LT, LVIS and EGTEA, that resemble real-world use cases.<\/jats:p>","DOI":"10.1007\/s11263-022-01643-3","type":"journal-article","created":{"date-parts":[[2022,8,18]],"date-time":"2022-08-18T09:02:53Z","timestamp":1660813373000},"page":"2517-2531","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":35,"title":["Class-Difficulty Based Methods for Long-Tailed Visual Recognition"],"prefix":"10.1007","volume":"130","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5207-1551","authenticated-orcid":false,"given":"Saptarshi","family":"Sinha","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hiroki","family":"Ohashi","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Katsuyuki","family":"Nakamura","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2022,8,18]]},"reference":[{"key":"1643_CR1","doi-asserted-by":"publisher","first-page":"405","DOI":"10.1109\/TKDE.2012.232","volume":"26","author":"S Barua","year":"2014","unstructured":"Barua, S., Islam, M. M., Yao, X., et al. (2014). Mwmote- majority weighted minority oversampling technique for imbalanced data set learning. IEEE Transactions on Knowledge and Data Engineering, 26, 405\u2013425. https:\/\/doi.org\/10.1109\/TKDE.2012.232","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"1643_CR2","unstructured":"Cao, K., Wei, C., Gaidon, A., et\u00a0al. (2019). Learning imbalanced datasets with label-distribution-aware margin loss. Paper presented at Advances in neural information processing systems (NeurIPS)."},{"key":"1643_CR3","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1613\/jair.953","volume":"16","author":"NV Chawla","year":"2002","unstructured":"Chawla, N. V., Bowyer, K. W., Hall, L. O., et al. (2002). Smote: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321\u2013357. https:\/\/doi.org\/10.1613\/jair.953","journal-title":"Journal of Artificial Intelligence Research"},{"key":"1643_CR4","doi-asserted-by":"crossref","unstructured":"Cui, Y., Jia, M., Lin, T. Y., et\u00a0al. (2019). Class-balanced loss based on effective number of samples. Paper presented at the International conference of computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR.2019.00949"},{"key":"1643_CR5","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., et\u00a0al. (2009). Imagenet: A large-scale hierarchical image database. Paper presented at the 2009 IEEE Conference on computer vision and pattern recognition (CVPR).","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"1643_CR6","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-98074-4","volume-title":"Learning from imbalanced data sets","author":"A Fern\u00e1ndez Hilario","year":"2018","unstructured":"Fern\u00e1ndez Hilario, A., Garc\u00eda L\u00f3pez, S., Galar, M., et al. (2018). Learning from imbalanced data sets. Switzerland: Springer."},{"key":"1643_CR7","doi-asserted-by":"crossref","unstructured":"Gupta, A., Dollar, P., Girshick, R. (2019). LVIS: A dataset for large vocabulary instance segmentation. Paper presented at the IEEE Conference on computer vision and pattern recognition (CVPR).","DOI":"10.1109\/CVPR.2019.00550"},{"key":"1643_CR8","doi-asserted-by":"publisher","first-page":"878","DOI":"10.1007\/11538059_91","volume-title":"Advances in intelligent computing","author":"H Han","year":"2005","unstructured":"Han, H., Wang, W. Y., & Mao, B. H. (2005). Borderline-smote: A new over-sampling method in imbalanced data sets learning. In D. S. Huang, X. P. Zhang, & G. B. Huang (Eds.), Advances in intelligent computing (pp. 878\u2013887). Berlin, Heidelberg: Springer."},{"key":"1643_CR9","doi-asserted-by":"crossref","unstructured":"Hara, K., Kataoka, H., Satoh, Y. (2018). Can spatiotemporal 3d cnns retrace the history of 2d CNNS and imagenet? Paper presented at the Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR).","DOI":"10.1109\/CVPR.2018.00685"},{"key":"1643_CR10","doi-asserted-by":"publisher","first-page":"1263","DOI":"10.1109\/TKDE.2008.239","volume":"21","author":"H He","year":"2009","unstructured":"He, H., & Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21, 1263\u20131284. https:\/\/doi.org\/10.1109\/TKDE.2008.239","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"1643_CR11","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., et\u00a0al. (2015). Deep residual learning for image recognition. Preprint at arXiv:1512.03385.","DOI":"10.1109\/CVPR.2016.90"},{"key":"1643_CR12","doi-asserted-by":"publisher","first-page":"2781","DOI":"10.1109\/TPAMI.2019.2914680","volume":"42","author":"C Huang","year":"2020","unstructured":"Huang, C., Li, Y., Loy, C. C., et al. (2020). Deep imbalanced learning for face recognition and attribute prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42, 2781\u20132791.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"1643_CR13","doi-asserted-by":"crossref","unstructured":"Jamal, M. A., Brown, M., Yang, M. H., et\u00a0al. (2020). Rethinking class-balanced methods for long-tailed visual recognition from a domain adaptation perspective. Paper presented at the IEEE Conference on computer vision and pattern recognition (CVPR).","DOI":"10.1109\/CVPR42600.2020.00763"},{"key":"1643_CR14","unstructured":"Kang, B., Xie, S., Rohrbach, M., et\u00a0al. (2020). Decoupling representation and classifier for long-tailed recognition. Paper presented at the Eighth International conference on learning representations (ICLR)."},{"key":"1643_CR15","unstructured":"Kay, W., Carreira, J., Simonyan, K., et\u00a0al. (2017). The kinetics human action video dataset. https:\/\/deepmind.com\/research\/open-source\/kinetics."},{"key":"1643_CR16","unstructured":"Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. https:\/\/www.cs.toronto.edu\/~kriz\/cifar.html."},{"key":"1643_CR17","unstructured":"LeCun, Y. (1998). The mnist database of handwritten digits. http:\/\/yann.lecun.com\/exdb\/mnist\/."},{"key":"1643_CR18","doi-asserted-by":"crossref","unstructured":"LeCun, Y., Bottou, L., Bengio, Y., et\u00a0al. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE.","DOI":"10.1109\/5.726791"},{"key":"1643_CR19","doi-asserted-by":"crossref","unstructured":"Li, Y., Wang, T., Kang, B., et\u00a0al. (2020). Overcoming classifier imbalance for long-tail object detection with balanced group softmax. Paper presented at the Proceedings of IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR42600.2020.01100"},{"key":"1643_CR20","doi-asserted-by":"crossref","unstructured":"Lin, T. Y., Goyal, P., Girshick, R., et\u00a0al. (2017). Focal loss for dense object detection. Paper presented at the IEEE International conference on computer vision (ICCV).","DOI":"10.1109\/ICCV.2017.324"},{"key":"1643_CR21","doi-asserted-by":"crossref","unstructured":"Lin, Y., Liu, M., Rehg, J. M. (2018). In the eye of beholder: Joint learning of gaze and actions in first person video. Paper presented at the European conference on computer vision (ECCV).","DOI":"10.1007\/978-3-030-01228-1_38"},{"key":"1643_CR22","doi-asserted-by":"publisher","first-page":"539","DOI":"10.1109\/TSMCB.2008.2007853","volume":"39","author":"XY Liu","year":"2009","unstructured":"Liu, X. Y., Wu, J., & Zhou, Z. H. (2009). Exploratory undersampling for class-imbalance learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 39, 539\u2013550. https:\/\/doi.org\/10.1109\/TSMCB.2008.2007853","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)"},{"key":"1643_CR23","doi-asserted-by":"crossref","unstructured":"Liu, Z., Miao, Z., Zhan, X., et\u00a0al. (2019). Large-scale long-tailed recognition in an open world. Paper presented at the IEEE Conference of computer vision and pattern recognition (CVPR).","DOI":"10.1109\/CVPR.2019.00264"},{"key":"1643_CR24","unstructured":"Mikolov, T., Sutskever, I., Chen, K., et\u00a0al. (2013). Distributed representations of words and phrases and their compositionality. Paper presented at Advances in Neural Information Processing Systems (NeurIPS)."},{"key":"1643_CR25","unstructured":"Ren, M., Zeng, W., Yang, B., et\u00a0al. (2018). Learning to reweight examples for robust deep learning. Paper presented at the International conference on machine learning (ICML)."},{"key":"1643_CR26","unstructured":"Ren, S., He, K., Girshick, R., et\u00a0al. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. Paper presented at Advances in neural information processing systems (NeurIPS)."},{"key":"1643_CR27","unstructured":"Shu, J., Xie, Q., Yi, L., et\u00a0al. (2019). Meta-weight-net: Learning an explicit mapping for sample weighting. Paper presented at Advances in neural information processing systems (NeurIPS)."},{"key":"1643_CR28","unstructured":"Sinha, S., Ohashi, H., Nakamura, K. (2020). Class-wise difficulty-balanced loss for solving class-imbalance. Paper presented at Proceedings of Asian Conference on computer vision (ACCV)."},{"key":"1643_CR29","doi-asserted-by":"crossref","unstructured":"Song, H. O., Xiang, Y., Jegelka, S., et\u00a0al. (2016). Deep metric learning via lifted structured feature embedding. Paper presented at Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR.2016.434"},{"key":"1643_CR30","doi-asserted-by":"crossref","unstructured":"Tan, J., Wang, C., Li, B., et\u00a0al. (2020). Equalization loss for long-tailed object recognition. Paper presented at the IEEE\/CVF Conference of computer vision and pattern recognition (CVPR).","DOI":"10.1109\/CVPR42600.2020.01168"},{"key":"1643_CR31","unstructured":"Wang, T., Li, Y., Kang, B., et\u00a0al. (2019). Classification calibration for long-tail instance segmentation. Preprint at arXiv:1910.13081."},{"key":"1643_CR32","doi-asserted-by":"crossref","unstructured":"Wang, T., Li, Y., Kang, B., et\u00a0al. (2020). The devil is in classification: A simple framework for long-tail instance segmentation. Paper presented at the Proceedings of IEEE\/CVF Conference on European conference on computer vision (ECCV).","DOI":"10.1007\/978-3-030-58568-6_43"},{"key":"1643_CR33","doi-asserted-by":"crossref","unstructured":"Wang, Y. X., Hebert, M. (2016). Learning to learn: Model regression networks for easy small sample learning. Paper presented at the European Conference on computer vision (ECCV).","DOI":"10.1007\/978-3-319-46466-4_37"},{"key":"1643_CR34","unstructured":"Wang, Y. X., Ramanan, D., Hebert, M. (2017). Learning to model the tail. Paper presented at Advances in neural information processing systems (NeurIPS)."}],"container-title":["International Journal of Computer Vision"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-022-01643-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11263-022-01643-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-022-01643-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,9,10]],"date-time":"2022-09-10T10:14:35Z","timestamp":1662804875000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11263-022-01643-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,18]]},"references-count":34,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2022,10]]}},"alternative-id":["1643"],"URL":"https:\/\/doi.org\/10.1007\/s11263-022-01643-3","relation":{},"ISSN":["0920-5691","1573-1405"],"issn-type":[{"value":"0920-5691","type":"print"},{"value":"1573-1405","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,18]]},"assertion":[{"value":"13 September 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 June 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 August 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical Approval"}},{"value":"Not applicable.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to Participate"}},{"value":"Not applicable.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for Publication"}}]}}