{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:46:48Z","timestamp":1760060808617,"version":"build-2065373602"},"reference-count":57,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2025,9,22]],"date-time":"2025-09-22T00:00:00Z","timestamp":1758499200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Laboratory of Big Data and Decision Making of National University of Defense Technology"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>In real-world visual recognition tasks, long-tailed distributions pose a widespread challenge, with extreme class imbalance severely limiting the representational learning capability of deep models. In practice, due to this imbalance, deep models often exhibit poor generalization performance on tail classes. To address this issue, data augmentation through the synthesis of new tail-class samples has become an effective method. One popular approach is CutMix, which explicitly mixes images from tail and other classes, constructing labels based on the ratio of the regions cropped from both images. However, region-based labels completely ignore the inherent semantic information of the augmented samples. To overcome this problem, we propose a saliency-guided local semantic mixing (LSM) method, which uses differentiable block decoupling and semantic-aware local mixing techniques. This method integrates head-class backgrounds while preserving the key discriminative features of tail classes and dynamically assigns labels to effectively augment tail-class samples. This results in efficient balancing of long-tailed data distributions and significant improvements in classification performance. 
The experimental validation shows that this method demonstrates significant advantages across three long-tailed benchmark datasets, improving classification accuracy by 5.0%, 7.3%, and 6.1%, respectively. Notably, the LSM framework is highly compatible, seamlessly integrating with existing classification models and providing significant performance gains, validating its broad applicability.<\/jats:p>","DOI":"10.3390\/make7030107","type":"journal-article","created":{"date-parts":[[2025,9,22]],"date-time":"2025-09-22T13:04:44Z","timestamp":1758546284000},"page":"107","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Saliency-Guided Local Semantic Mixing for Long-Tailed Image Classification"],"prefix":"10.3390","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-3214-0310","authenticated-orcid":false,"given":"Jiahui","family":"Lv","sequence":"first","affiliation":[{"name":"Laboratory for Big Data and Decision, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jun","family":"Lei","sequence":"additional","affiliation":[{"name":"Laboratory for Big Data and Decision, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1804-9198","authenticated-orcid":false,"given":"Jun","family":"Zhang","sequence":"additional","affiliation":[{"name":"Laboratory for Big Data and Decision, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chao","family":"Chen","sequence":"additional","affiliation":[{"name":"Laboratory for Big Data and Decision, National University of Defense Technology, Changsha 410073, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4958-8573","authenticated-orcid":false,"given":"Shuohao","family":"Li","sequence":"additional","affiliation":[{"name":"Laboratory for Big Data and Decision, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,9,22]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"ImageNet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. ACM"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1080\/01431160600746456","article-title":"A survey of image classification methods and techniques for improving classification performance","volume":"28","author":"Lu","year":"2007","journal-title":"Int. J. Remote Sens."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1921","DOI":"10.1016\/S0031-3203(98)00079-X","article-title":"On image classification: City images vs. landscapes","volume":"31","author":"Vailaya","year":"1998","journal-title":"Pattern Recognit."},{"key":"ref_5","first-page":"740","article-title":"Microsoft coco: Common objects in context","volume":"Volume 13","author":"Lin","year":"2014","journal-title":"Proceedings of the 13th European Conference"},{"doi-asserted-by":"crossref","unstructured":"Liu, Z., Miao, Z., Zhan, X., Wang, J., Gong, B., and Yu, S.X. (2019, January 15\u201320). Large-scale long-tailed recognition in an open world. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","key":"ref_6","DOI":"10.1109\/CVPR.2019.00264"},{"key":"ref_7","first-page":"1567","article-title":"Learning imbalanced datasets with label-distribution-aware margin loss","volume":"32","author":"Cao","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"doi-asserted-by":"crossref","unstructured":"Li, T., Cao, P., Yuan, Y., Fan, L., Yang, Y., Feris, R.S., Indyk, P., and Katabi, D. (2022, January 18\u201324). Targeted supervised contrastive learning for long-tailed recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","key":"ref_8","DOI":"10.1109\/CVPR52688.2022.00679"},{"doi-asserted-by":"crossref","unstructured":"Tan, J., Wang, C., Li, B., Li, Q., Ouyang, W., Yin, C., and Yan, J. (2020, January 13\u201319). Equalization loss for long-tailed object recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","key":"ref_9","DOI":"10.1109\/CVPR42600.2020.01168"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1016\/j.media.2017.07.005","article-title":"A survey on deep learning in medical image analysis","volume":"42","author":"Litjens","year":"2017","journal-title":"Med. Image Anal."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"110415","DOI":"10.1016\/j.asoc.2023.110415","article-title":"A broad review on class imbalance learning techniques","volume":"143","author":"Rezvani","year":"2023","journal-title":"Appl. Soft Comput."},{"doi-asserted-by":"crossref","unstructured":"Miao, W., Pang, G., Bai, X., Li, T., and Zheng, J. (2024, January 26\u201327). Out-of-distribution detection in long-tailed recognition with calibrated outlier class learning. 
Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada.","key":"ref_12","DOI":"10.1609\/aaai.v38i5.28217"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: Synthetic minority over-sampling technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J. Artif. Intell. Res."},{"doi-asserted-by":"crossref","unstructured":"Zhou, B., Cui, Q., Wei, X.S., and Chen, Z.M. (2020, January 13\u201319). Bbn: Bilateral-branch network with cumulative learning for long-tailed visual recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","key":"ref_14","DOI":"10.1109\/CVPR42600.2020.00974"},{"unstructured":"Kang, B., Xie, S., Rohrbach, M., Yan, Z., Gordo, A., Feng, J., and Kalantidis, Y. (2019). Decoupling representation and classifier for long-tailed recognition. arXiv.","key":"ref_15"},{"key":"ref_16","first-page":"5149","article-title":"Meta-learning in neural networks: A survey","volume":"44","author":"Hospedales","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_17","first-page":"4175","article-title":"Balanced meta-softmax for long-tailed visual recognition","volume":"33","author":"Ren","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"doi-asserted-by":"crossref","unstructured":"Vu, D.Q., and Thu, M.T.H. (2024, January 16\u201317). Smooth Balance Softmax for Long-Tailed Image Classification. Proceedings of the International Conference on Advances in Information and Communication Technology, Phu Tho, Vietnam.","key":"ref_18","DOI":"10.1007\/978-3-031-80943-9_36"},{"doi-asserted-by":"crossref","unstructured":"Zang, Y., Huang, C., and Loy, C.C. (2021, January 11\u201317). Fasa: Feature augmentation and sampling adaptation for long-tailed instance segmentation. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","key":"ref_19","DOI":"10.1109\/ICCV48922.2021.00344"},{"doi-asserted-by":"crossref","unstructured":"Wang, T., Li, Y., Kang, B., Li, J., Liew, J., Tang, S., Hoi, S., and Feng, J. (2020, January 23\u201328). The devil is in classification: A simple framework for long-tail instance segmentation. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","key":"ref_20","DOI":"10.1007\/978-3-030-58568-6_43"},{"doi-asserted-by":"crossref","unstructured":"Van Horn, G., Mac Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P., and Belongie, S. (2018, January 18\u201323). The inaturalist species classification and detection dataset. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","key":"ref_21","DOI":"10.1109\/CVPR.2018.00914"},{"doi-asserted-by":"crossref","unstructured":"Zhang, H., Cisse, M., Dauphin, Y.N., and Lopez-Paz, D. (2017). mixup: Beyond empirical risk minimization. arXiv.","key":"ref_22","DOI":"10.1007\/978-1-4899-7687-1_79"},{"unstructured":"Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., and Yoo, Y. (November, January 17). Cutmix: Regularization strategy to train strong classifiers with localizable features. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea.","key":"ref_23"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3643806","article-title":"A survey of mix-based data augmentation: Taxonomy, methods, applications, and explainability","volume":"57","author":"Cao","year":"2024","journal-title":"Acm Comput. Surv."},{"unstructured":"Qin, H., Jin, X., Zhu, H., Liao, H., El-Yacoubi, M.A., and Gao, X. (October, January 29). Sumix: Mixup with semantic and uncertain information. 
Proceedings of the European Conference on Computer Vision, Milan, Italy.","key":"ref_25"},{"unstructured":"Zhang, Y., Kang, B., Hooi, B., Yan, S., and Feng, J. (2021). Deep long-tailed learning: A survey. arXiv.","key":"ref_26"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1016\/j.neunet.2018.07.011","article-title":"A systematic study of the class imbalance problem in convolutional neural networks","volume":"106","author":"Buda","year":"2018","journal-title":"Neural Netw."},{"doi-asserted-by":"crossref","unstructured":"Guo, H., and Wang, S. (2021, January 20\u201325). Long-tailed multi-label visual recognition by collaborative training on uniform and re-balanced samplings. Proceedings of the IEEE\/CVF Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","key":"ref_28","DOI":"10.1109\/CVPR46437.2021.01484"},{"key":"ref_29","first-page":"1","article-title":"C4. 5, class imbalance, and cost sensitivity: Why under-sampling beats over-sampling","volume":"Volume 11","author":"Drummond","year":"2003","journal-title":"Imbalanced Datasets II"},{"unstructured":"Byrd, J., and Lipton, Z. (2019). What is the effect of importance weighting in deep learning?. International Conference on Machine Learning, PMLR.","key":"ref_30"},{"doi-asserted-by":"crossref","unstructured":"Cui, Y., Jia, M., Lin, T.Y., Song, Y., and Belongie, S. (2019, January 15\u201320). Class-balanced loss based on effective number of samples. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","key":"ref_31","DOI":"10.1109\/CVPR.2019.00949"},{"unstructured":"Menon, A.K., Jayasumana, S., Rawat, A.S., Jain, H., Veit, A., and Kumar, S. (2020). Long-tail learning via logit adjustment. 
arXiv.","key":"ref_32"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"5721","DOI":"10.1109\/TIP.2023.3321461","article-title":"Inverse image frequency for long-tailed image recognition","volume":"32","author":"Alexandridis","year":"2023","journal-title":"IEEE Trans. Image Process."},{"doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","key":"ref_34","DOI":"10.1109\/ICCV.2017.324"},{"unstructured":"Shu, J., Xie, Q., Yi, L., Zhao, Q., Zhou, S., Xu, Z., and Meng, D. (2019, January 8\u201314). Meta-weight-net: Learning an explicit mapping for sample weighting. Proceedings of the International Conference on Neural Information Processing Systems, Vancouver, BC, Canada.","key":"ref_35"},{"doi-asserted-by":"crossref","unstructured":"Zhu, J., Wang, Z., Chen, J., Chen, Y.P., and Jiang, Y.G. (2022, January 18\u201324). Balanced contrastive learning for long-tailed visual recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","key":"ref_36","DOI":"10.1109\/CVPR52688.2022.00678"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3592615","article-title":"Data augmentation-based novel deep learning method for deepfaked images detection","volume":"20","author":"Iqbal","year":"2024","journal-title":"Acm Trans. Multimed. Comput. Commun. Appl."},{"unstructured":"Wang, Z., Wang, P., Liu, K., Wang, P., Fu, Y., Lu, C.T., Aggarwal, C.C., Pei, J., and Zhou, Y. (2024). A comprehensive survey on data augmentation. arXiv.","key":"ref_38"},{"unstructured":"DeVries, T., and Taylor, G.W. (2017). Improved regularization of convolutional neural networks with cutout. arXiv.","key":"ref_39"},{"doi-asserted-by":"crossref","unstructured":"Salehin, I., and Kang, D.K. (2023). 
A review on dropout regularization approaches for deep neural networks within the scholarly domain. Electronics, 12.","key":"ref_40","DOI":"10.3390\/electronics12143106"},{"doi-asserted-by":"crossref","unstructured":"Li, L., and Li, A. (2023, January 17\u201324). A2-aug: Adaptive automated data augmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","key":"ref_41","DOI":"10.1109\/CVPRW59228.2023.00221"},{"doi-asserted-by":"crossref","unstructured":"Park, S., Hong, Y., Heo, B., Yun, S., and Choi, J.Y. (2022, January 18\u201324). The majority can help the minority: Context-rich minority oversampling for long-tailed classification. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","key":"ref_42","DOI":"10.1109\/CVPR52688.2022.00676"},{"doi-asserted-by":"crossref","unstructured":"Zhong, Z., Cui, J., Liu, S., and Jia, J. (2021, January 20\u201325). Improving calibration for long-tailed recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","key":"ref_43","DOI":"10.1109\/CVPR46437.2021.01622"},{"doi-asserted-by":"crossref","unstructured":"Chou, H.P., Chang, S.C., Pan, J.Y., Wei, W., and Juan, D.C. (2020, January 23\u201328). Remix: Rebalanced mixup. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","key":"ref_44","DOI":"10.1007\/978-3-030-65414-6_9"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"103951","DOI":"10.1016\/j.dsp.2023.103951","article-title":"CRmix: A regularization by clipping images and replacing mixed samples for imbalanced classification","volume":"135","author":"Li","year":"2023","journal-title":"Digit. 
Signal Process."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"n2","DOI":"10.1080\/10691898.2011.11889611","article-title":"Measuring skewness: A forgotten statistic?","volume":"19","author":"Doane","year":"2011","journal-title":"J. Stat. Educ."},{"doi-asserted-by":"crossref","unstructured":"Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. (2017, January 22\u201329). Grad-cam: Visual explanations from deep networks via gradient-based localization. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","key":"ref_47","DOI":"10.1109\/ICCV.2017.74"},{"doi-asserted-by":"crossref","unstructured":"Shi, W., Caballero, J., Husz\u00e1r, F., Totz, J., Aitken, A.P., Bishop, R., Rueckert, D., and Wang, Z. (2016, January 27\u201330). Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","key":"ref_48","DOI":"10.1109\/CVPR.2016.207"},{"doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","key":"ref_49","DOI":"10.1109\/CVPR.2016.90"},{"doi-asserted-by":"crossref","unstructured":"Hong, Y., Han, S., Choi, K., Seo, S., Kim, B., and Chang, B. (2021, January 20\u201325). Disentangling label distribution for long-tailed visual recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","key":"ref_50","DOI":"10.1109\/CVPR46437.2021.00656"},{"key":"ref_51","first-page":"18661","article-title":"Supervised contrastive learning","volume":"33","author":"Khosla","year":"2020","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"doi-asserted-by":"crossref","unstructured":"Wang, P., Han, K., Wei, X.S., Zhang, L., and Wang, L. (2021, January 20\u201325). Contrastive learning based hybrid networks for long-tailed image classification. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","key":"ref_52","DOI":"10.1109\/CVPR46437.2021.00100"},{"doi-asserted-by":"crossref","unstructured":"Hou, C., Zhang, J., Wang, H., and Zhou, T. (2023, January 1\u20136). Subclass-balancing contrastive learning for long-tailed recognition. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","key":"ref_53","DOI":"10.1109\/ICCV51070.2023.00497"},{"doi-asserted-by":"crossref","unstructured":"Zhou, Z., Li, L., Zhao, P., Heng, P.A., and Gong, W. (2023, January 17\u201324). Class-conditional sharpness-aware minimization for deep long-tailed recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","key":"ref_54","DOI":"10.1109\/CVPR52729.2023.00341"},{"doi-asserted-by":"crossref","unstructured":"Mahajan, D., Girshick, R., Ramanathan, V., He, K., Paluri, M., Li, Y., Bharambe, A., and Van Der Maaten, L. (2018, January 8\u201314). Exploring the limits of weakly supervised pretraining. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","key":"ref_55","DOI":"10.1007\/978-3-030-01216-8_12"},{"unstructured":"Wang, X., Lian, L., Miao, Z., Liu, Z., and Yu, S.X. (2020). Long-tailed recognition by routing diverse distribution-aware experts. arXiv.","key":"ref_56"},{"doi-asserted-by":"crossref","unstructured":"Sharma, S., Xian, Y., Yu, N., and Singh, A. (2023). Learning prototype classifiers for long-tailed recognition. 
arXiv.","key":"ref_57","DOI":"10.24963\/ijcai.2023\/151"}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/3\/107\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:47:07Z","timestamp":1760035627000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/3\/107"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,22]]},"references-count":57,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,9]]}},"alternative-id":["make7030107"],"URL":"https:\/\/doi.org\/10.3390\/make7030107","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2025,9,22]]}}}