{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T04:29:37Z","timestamp":1776140977133,"version":"3.50.1"},"reference-count":48,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2024,1,30]],"date-time":"2024-01-30T00:00:00Z","timestamp":1706572800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["2046059"],"award-info":[{"award-number":["2046059"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Evaluating classification accuracy is a key component of the training and validation stages of thematic map production, and the choice of metric has profound implications for both the success of the training process and the reliability of the final accuracy assessment. We explore key considerations in selecting and interpreting loss and assessment metrics in the context of data imbalance, which arises when the classes have unequal proportions within the dataset or landscape being mapped. The challenges involved in calculating single, integrated measures that summarize classification success, especially for datasets with considerable data imbalance, have led to much confusion in the literature. This confusion arises from a range of issues, including a lack of clarity over the redundancy of some accuracy measures, the importance of calculating final accuracy from population-based statistics, the effects of class imbalance on accuracy statistics, and the differing roles of accuracy measures when used for training and final evaluation. In order to characterize classification success at the class level, users typically generate averages from the class-based measures. These averages are sometimes generated at the macro-level, by taking averages of the individual-class statistics, or at the micro-level, by aggregating values within a confusion matrix, and then, calculating the statistic. We show that the micro-averaged producer\u2019s accuracy (recall), user\u2019s accuracy (precision), and F1-score, as well as weighted macro-averaged statistics where the class prevalences are used as weights, are all equivalent to each other and to the overall accuracy, and thus, are redundant and should be avoided. Our experiment, using a variety of loss metrics for training, suggests that the choice of loss metric is not as complex as it might appear to be, despite the range of choices available, which include cross-entropy (CE), weighted CE, and micro- and macro-Dice. The highest, or close to highest, accuracies in our experiments were obtained by using CE loss for models trained with balanced data, and for models trained with imbalanced data, the highest accuracies were obtained by using weighted CE loss. We recommend that, since weighted CE loss used with balanced training is equivalent to CE, weighted CE loss is a good all-round choice. Although Dice loss is commonly suggested as an alternative to CE loss when classes are imbalanced, micro-averaged Dice is similar to overall accuracy, and thus, is particularly poor for training with imbalanced data. Furthermore, although macro-Dice resulted in models with high accuracy when the training used balanced data, when the training used imbalanced data, the accuracies were lower than for weighted CE. In summary, the significance of this paper lies in its provision of readers with an overview of accuracy and loss metric terminology, insight regarding the redundancy of some measures, and guidance regarding best practices.<\/jats:p>","DOI":"10.3390\/rs16030533","type":"journal-article","created":{"date-parts":[[2024,1,30]],"date-time":"2024-01-30T11:35:34Z","timestamp":1706614534000},"page":"533","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":94,"title":["Selecting and Interpreting Multiclass Loss and Accuracy Assessment Metrics for Classifications with Class Imbalance: Guidance and Best Practices"],"prefix":"10.3390","volume":"16","author":[{"given":"Sarah","family":"Farhadpour","sequence":"first","affiliation":[{"name":"Department of Geology and Geography, West Virginia University, Morgantown, WV 26505, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0414-9748","authenticated-orcid":false,"given":"Timothy A.","family":"Warner","sequence":"additional","affiliation":[{"name":"Department of Geology and Geography, West Virginia University, Morgantown, WV 26505, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4412-5599","authenticated-orcid":false,"given":"Aaron E.","family":"Maxwell","sequence":"additional","affiliation":[{"name":"Department of Geology and Geography, West Virginia University, Morgantown, WV 26505, USA"}]}],"member":"1968","published-online":{"date-parts":[[2024,1,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Congalton, R., and Green, K. (2019). Assessing the Accuracy of Remotely Sensed Data: Principles and Practices, CRC Press. [3rd ed.].","DOI":"10.1201\/9780429052729"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Warner, T.A., Nellis, M.D., and Foody, G.M. (2009). The SAGE Handbook of Remote Sensing, SAGE Publications, Inc.","DOI":"10.4135\/9780857021052"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"102035","DOI":"10.1016\/j.media.2021.102035","article-title":"Loss odyssey in medical image segmentation","volume":"71","author":"Ma","year":"2021","journal-title":"Med. Image Anal."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"102026","DOI":"10.1016\/j.compmedimag.2021.102026","article-title":"Unified Focal loss: Generalising Dice and cross entropy-based losses to handle class imbalanced medical image segmentation","volume":"95","author":"Yeung","year":"2022","journal-title":"Comput. Med. Imaging Graph."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"111199","DOI":"10.1016\/j.rse.2019.05.018","article-title":"Key issues in rigorous accuracy assessment of land cover products","volume":"231","author":"Stehman","year":"2019","journal-title":"Remote Sens. Environ."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Maxwell, A.E., Warner, T.A., and Guill\u00e9n, L.A. (2021). Accuracy Assessment in Convolutional Neural Network-Based Deep Learning Remote Sensing Studies\u2014Part 1: Literature Review. Remote Sens., 13.","DOI":"10.3390\/rs13132450"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Gowda, T., You, W., Lignos, C., and May, J. (2021, January 12). Macro-Average: Rare Types Are Important Too. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online.","DOI":"10.18653\/v1\/2021.naacl-main.90"},{"key":"ref_8","unstructured":"Grandini, M., Bagli, E., and Visani, G. (2020). Metrics for Multi-Class Classification: An Overview. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1016\/S0034-4257(98)00010-8","article-title":"Design and Analysis for Thematic Map Accuracy Assessment: Fundamental Principles","volume":"64","author":"Stehman","year":"1998","journal-title":"Remote Sens. Environ."},{"key":"ref_10","first-page":"727","article-title":"Statistical Rigor and Practical Utility in Thematic Map Accuracy Assessment","volume":"67","author":"Stehman","year":"2001","journal-title":"Photogramm. Eng. Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1080\/01431161.2010.541950","article-title":"Impact of sample size allocation when using stratified random sampling to estimate accuracy and area of land-cover change","volume":"3","author":"Stehman","year":"2012","journal-title":"Remote Sens. Lett."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1016\/j.aci.2018.08.003","article-title":"Classification assessment methods","volume":"17","author":"Tharwat","year":"2021","journal-title":"Appl. Comput. Inform."},{"key":"ref_13","first-page":"1671","article-title":"Assessing Landsat classification accuracy using discrete multivariate analysis statistical techniques","volume":"49","author":"Congalton","year":"1983","journal-title":"Photogramm. Eng. Remote Sens."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"111630","DOI":"10.1016\/j.rse.2019.111630","article-title":"Explaining the unsuitability of the kappa coefficient in the assessment and comparison of the accuracy of thematic maps obtained by image classification","volume":"239","author":"Foody","year":"2020","journal-title":"Remote Sens. Environ."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"4407","DOI":"10.1080\/01431161.2011.552923","article-title":"Death to Kappa: Birth of quantity disagreement and allocation disagreement for accuracy assessment","volume":"32","author":"Pontius","year":"2011","journal-title":"Int. J. Remote Sens."},{"key":"ref_16","first-page":"5907313","article-title":"Novel Convolutions for Semantic Segmentation of Remote Sensing Images","volume":"61","author":"Xiao","year":"2023","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"7570","DOI":"10.1109\/TGRS.2020.2981082","article-title":"River Ice Segmentation with Deep Learning","volume":"58","author":"Singh","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"6008305","DOI":"10.1109\/LGRS.2023.3302432","article-title":"Cross-Scale Feature Propagation Network for Semantic Segmentation of High-Resolution Remote Sensing Images","volume":"20","author":"Zeng","year":"2023","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_19","unstructured":"Subramanian, V. (2018). Deep Learning with PyTorch: A Practical Approach to Building Neural Network Models Using PyTorch, Packt Publishing."},{"key":"ref_20","unstructured":"Antiga, L.P.G., Stevens, E., and Viehmann, T. (2020). Deep Learning with PyTorch, Manning."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zhao, R., Qian, B., Zhang, X., Li, Y., Wei, R., Liu, Y., and Pan, Y. (2020, January 17\u201320). Rethinking Dice Loss for Medical Image Segmentation. Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy.","DOI":"10.1109\/ICDM50108.2020.00094"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S., and Jorge Cardoso, M. (2017, January 14). Generalised Dice Overlap as a Deep Learning Loss Function for Highly Unbalanced Segmentations. Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Qu\u00e9bec City, QC, Canada.","DOI":"10.1007\/978-3-319-67558-9_28"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Li, X., Sun, X., Meng, Y., Liang, J., Wu, F., and Li, J. (2020). Dice Loss for Data-imbalanced NLP Tasks. arXiv.","DOI":"10.18653\/v1\/2020.acl-main.45"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Bertels, J., Eelbode, T., Berman, M., Vandermeulen, D., Maes, F., Bisschops, R., and Blaschko, M. (2019). Optimizing the Dice Score and Jaccard Index for Medical Image Segmentation: Theory & Practice. arXiv.","DOI":"10.1007\/978-3-030-32245-8_11"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Wang, P., and Chung, A.C.S. (2018, January 20). Focal Dice Loss and Image Dilation for Brain Tumor Segmentation. Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Granada, Spain.","DOI":"10.1007\/978-3-030-00889-5_14"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Salehi, S.S., Erdogmus, D., and Gholipour, A. (2017, January 10). Tversky Loss Function for Image Segmentation Using 3D Fully Convolutional Deep Networks. Proceedings of the 8th International Workshop, MLMI 2017, Quebec City, QC, Canada.","DOI":"10.1007\/978-3-319-67389-9_44"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Abraham, N., and Khan, N.M. (2019, January 8\u201311). A Novel Focal Tversky Loss Function with Improved Attention U-Net for Lesion Segmentation. Proceedings of the 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), Venice, Italy.","DOI":"10.1109\/ISBI.2019.8759329"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"2217","DOI":"10.1109\/JSTARS.2019.2918242","article-title":"EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification","volume":"12","author":"Helber","year":"2019","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/j.rse.2011.11.026","article-title":"Sentinel-2: ESA\u2019s Optical High-Resolution Mission for GMES Operational Services","volume":"120","author":"Drusch","year":"2012","journal-title":"Remote Sens. Environ."},{"key":"ref_30","unstructured":"(2020, December 31). PyTorch [WWW Document], n.d. Available online: https:\/\/www.pytorch.org."},{"key":"ref_31","unstructured":"(2021, January 05). Welcome to Python.org [WWW Document], n.d. Python.org. Available online: https:\/\/www.python.org\/."},{"key":"ref_32","unstructured":"Bjorck, J., Gomes, C., Selman, B., and Weinberger, K.Q. (2018, January 3\u20138). Understanding Batch Normalization. Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montr\u00e9al, QC, Canada."},{"key":"ref_33","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 6\u201311). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Proceedings of the 32nd International Conference on Machine Learning, Lille, France."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Buslaev, A., Iglovikov, V.I., Khvedchenya, E., Parinov, A., Druzhinin, M., and Kalinin, A.A. (2020). Albumentations: Fast and Flexible Image Augmentations. Information, 11.","DOI":"10.3390\/info11020125"},{"key":"ref_35","unstructured":"Kuhn, M., Vaughan, D., and Hvitfeldt, E. (2021). Yardstick: Tidy Characterizations of Model Performance. R Package Version 0.0. 2021, R Core Team."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v028.i05","article-title":"Building Predictive Models in R Using the caret Package","volume":"28","author":"Kuhn","year":"2008","journal-title":"J. Stat. Softw."},{"key":"ref_37","unstructured":"Evans, J.S., and Murphy, M.A. (2018). rfUtilities, R Core Team."},{"key":"ref_38","unstructured":"Pontius, R.G., and Santacruz, A. (2023). diffeR: Metrics of Difference for Comparing Pairs of Maps or Pairs of Variables, R Core Team."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"4923","DOI":"10.1080\/01431161.2014.930207","article-title":"Estimating area and map accuracy for stratified random sampling when the strata are different from the map classes","volume":"35","author":"Stehman","year":"2014","journal-title":"Int. J. Remote Sens."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"5243","DOI":"10.1080\/01431160903131000","article-title":"Sampling designs for accuracy assessment of land cover","volume":"30","author":"Stehman","year":"2009","journal-title":"Int. J. Remote Sens."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"743","DOI":"10.14358\/PERS.70.6.743","article-title":"A Critical Evaluation of the Normalized Error Matrix in Map Accuracy Assessment","volume":"70","author":"Stehman","year":"2004","journal-title":"Photogramm. Eng. Remote Sens."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"2423","DOI":"10.1080\/014311699212100","article-title":"Basic probability sampling designs for thematic map accuracy assessment","volume":"20","author":"Stehman","year":"1999","journal-title":"Int. J. Remote Sens."},{"key":"ref_43","first-page":"1343","article-title":"Comparison of systematic and random sampling for estimating the accuracy of maps generated from remotely sensed data","volume":"58","author":"Stehman","year":"1992","journal-title":"Photogramm. Eng. Remote Sens."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"113881","DOI":"10.1016\/j.rse.2023.113881","article-title":"Choosing a sample size allocation to strata based on trade-offs in precision when estimating accuracy and area of a rare class from a stratified sample","volume":"300","author":"Stehman","year":"2024","journal-title":"Remote Sens. Environ."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal Loss for Dense Object Detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Ghosh, K., Bellinger, C., Corizzo, R., Branco, P., Krawczyk, B., and Japkowicz, N. (2022). The class imbalance problem in deep learning. Mach. Learn.","DOI":"10.1007\/s10994-022-06268-8"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1186\/s40537-019-0192-5","article-title":"Survey on deep learning with class imbalance","volume":"6","author":"Johnson","year":"2019","journal-title":"J. Big Data"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Ding, W., Huang, D.Y., Chen, Z., Yu, X., and Lin, W. (2017, January 12\u201315). Facial action recognition using very deep networks for highly imbalanced class distribution. Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Kuala Lumpur, Malaysia.","DOI":"10.1109\/APSIPA.2017.8282246"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/3\/533\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T13:51:53Z","timestamp":1760104313000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/3\/533"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,30]]},"references-count":48,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2024,2]]}},"alternative-id":["rs16030533"],"URL":"https:\/\/doi.org\/10.3390\/rs16030533","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,30]]}}}