{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T01:30:30Z","timestamp":1773106230821,"version":"3.50.1"},"reference-count":44,"publisher":"MDPI AG","issue":"13","license":[{"start":{"date-parts":[[2024,6,24]],"date-time":"2024-06-24T00:00:00Z","timestamp":1719187200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of PR China","doi-asserted-by":"publisher","award":["42075130"],"award-info":[{"award-number":["42075130"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>In remote sensing image processing, the segmentation of clouds and their shadows is a fundamental and vital task. For cloud images, traditional deep learning methods often have weak generalization capabilities and are prone to interference from ground objects and noise, which not only results in poor boundary segmentation but also causes false and missed detections of small targets. To address these issues, we proposed a multi-branch attention fusion network (MAFNet). In the encoder section, the dual branches of ResNet50 and the Swin transformer extract features together. A multi-branch attention fusion module (MAFM) uses positional encoding to add position information. Additionally, multi-branch aggregation attention (MAA) in the MAFM fully fuses the same level of deep features extracted by ResNet50 and the Swin transformer, which enhances the boundary segmentation ability and small target detection capability. 
To address the challenge of detecting small cloud and shadow targets, an information deep aggregation module (IDAM) was introduced to perform multi-scale deep feature aggregation, which supplements high semantic information, improving small target detection. For the problem of rough segmentation boundaries, a recovery guided module (RGM) was designed in the decoder section, which enables the model to effectively allocate attention to complex boundary information, enhancing the network\u2019s focus on boundary information. Experimental results on the Cloud and Cloud Shadow dataset, HRC-WHU dataset, and SPARCS dataset indicate that MAFNet surpasses existing advanced semantic segmentation techniques.<\/jats:p>","DOI":"10.3390\/rs16132308","type":"journal-article","created":{"date-parts":[[2024,6,24]],"date-time":"2024-06-24T10:41:12Z","timestamp":1719225672000},"page":"2308","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Multi-Branch Attention Fusion Network for Cloud and Cloud Shadow Segmentation"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-2604-8167","authenticated-orcid":false,"given":"Hongde","family":"Gu","sequence":"first","affiliation":[{"name":"Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6366-1189","authenticated-orcid":false,"given":"Guowei","family":"Gu","sequence":"additional","affiliation":[{"name":"Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3056-7713","authenticated-orcid":false,"given":"Yi","family":"Liu","sequence":"additional","affiliation":[{"name":"Collaborative Innovation Center on Atmospheric Environment 
and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3835-6075","authenticated-orcid":false,"given":"Haifeng","family":"Lin","sequence":"additional","affiliation":[{"name":"College of Information Science and Technology, Nanjing Forestry University, Nanjing 210000, China"}]},{"given":"Yao","family":"Xu","sequence":"additional","affiliation":[{"name":"Collaborative Innovation Center on Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China"},{"name":"Department of Computer Science, University of Reading, Whiteknights, Reading RG6 6DH, UK"}]}],"member":"1968","published-online":{"date-parts":[[2024,6,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1016\/j.rse.2011.10.028","article-title":"Object-based cloud and cloud shadow detection in Landsat imagery","volume":"118","author":"Zhu","year":"2012","journal-title":"Remote Sens. Environ."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1117\/12.410332","article-title":"Comparative analysis of hyperspectral adaptive matched filter detectors","volume":"Volume 4049","author":"Manolakis","year":"2000","journal-title":"Proceedings of the Algorithms for Multispectral, Hyperspectral, and Ultraspectral Imagery VI"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1016\/j.isprsjprs.2018.07.006","article-title":"Cloud\/shadow detection based on spectral indices for multi\/hyperspectral optical remote sensing imagery","volume":"144","author":"Zhai","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Kegelmeyer, W. (1994). Extraction of Cloud Statistics from Whole Sky Imaging Cameras, Sandia National Lab. (SNL-CA). 
Technical Report.","DOI":"10.2172\/10141846"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"3322","DOI":"10.1109\/TGRS.2017.2669341","article-title":"Automatic Road Detection and Centerline Extraction via Cascaded End-to-End Convolutional Neural Network","volume":"55","author":"Cheng","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_6","first-page":"102597","article-title":"SUACDNet: Attentional change detection network based on siamese U-shaped structure","volume":"105","author":"Song","year":"2021","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_7","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems 25, Proceedings of the 26th Annual Conference on Neural Information Processing Systems 2012, Lake Tahoe, NV, USA, 3\u20136 December 2012, Curran Associates, Inc."},{"key":"ref_8","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_9","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, June 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. 
Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid scene parsing network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., and Sang, N. (2018, January 8\u201314). BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01261-8_20"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Gu, J., Kwon, H., Wang, D., Ye, W., Li, M., Chen, Y.H., Lai, L., Chandra, V., and Pan, D.Z. (2022, January 18\u201324). Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01178"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Yu, C., Xiao, B., Gao, C., Yuan, L., Zhang, L., Sang, N., and Wang, J. (2021, January 20\u201325). Lite-HRNet: A Lightweight High-Resolution Network. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01030"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs","volume":"40","author":"Chen","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"65347","DOI":"10.1109\/ACCESS.2019.2917952","article-title":"DenseU-net-based semantic segmentation of small objects in urban remote sensing images","volume":"7","author":"Dong","year":"2019","journal-title":"IEEE Access"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"3532","DOI":"10.1109\/TGRS.2020.3009143","article-title":"Adaptive Effective Receptive Field Convolution for Semantic Segmentation of VHR Remote Sensing Images","volume":"59","author":"Chen","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23\u201328). End-to-End Object Detection with Transformers. Proceedings of the Computer Vision\u2014ECCV 2020, Glasgow, UK.","DOI":"10.1007\/978-3-030-58604-1"},{"key":"ref_20","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16 \u00d7 16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Wang, W., Xie, E., Li, X., Fan, D.P., Song, K., Liang, D., Lu, T., Luo, P., and Shao, L. (2021, January 11\u201317). Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00061"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Wu, H., Xiao, B., Codella, N., Liu, M., Dai, X., Yuan, L., and Zhang, L. (2021, January 11\u201317). CvT: Introducing Convolutions to Vision Transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00009"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"5410012","DOI":"10.1109\/TGRS.2022.3175613","article-title":"Dual-branch network for cloud and cloud shadow segmentation","volume":"60","author":"Lu","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"5404215","DOI":"10.1109\/TGRS.2024.3378970","article-title":"Multi-path Multi-scale Attention Network for Cloud and Cloud shadow segmentation","volume":"62","author":"Gu","year":"2024","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Wu, C., Wu, F., and Huang, Y. (2020). Da-transformer: Distance-aware transformer. arXiv.","DOI":"10.18653\/v1\/2021.naacl-main.166"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1109\/JSTARS.2022.3224081","article-title":"Axial cross attention meets CNN: Bibranch fusion network for change detection","volume":"16","author":"Song","year":"2022","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"4408715","DOI":"10.1109\/TGRS.2022.3144165","article-title":"Swin transformer embedding for remote sensing image semantic segmentation","volume":"60","author":"He","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Chaman, A., and Dokmanic, I. (2021, January 20\u201325). 
Truly shift-invariant convolutional neural networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00377"},{"key":"ref_30","unstructured":"Li, B., Dai, Y., Cheng, X., Chen, H., Lin, Y., and He, M. (2017, January 10\u201314). Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep CNN. Proceedings of the 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Hong Kong."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Ha, S., Yun, J.M., and Choi, S. (2015, January 9\u201312). Multi-modal convolutional neural networks for activity recognition. Proceedings of the 2015 IEEE International Conference on Systems, Man, and Cybernetics, Kowloon Tong, Hong Kong.","DOI":"10.1109\/SMC.2015.525"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Hu, K., Zhang, D., and Xia, M. (2021). CDUNet: Cloud detection UNet for remote sensing imagery. Remote Sens., 13.","DOI":"10.3390\/rs13224533"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 8\u201314). Cbam: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1016\/j.isprsjprs.2019.02.017","article-title":"Deep learning based cloud detection for medium and high resolution remote sensing images of different sensors","volume":"150","author":"Li","year":"2019","journal-title":"ISPRS J. Photogramm. 
Remote Sens."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"4907","DOI":"10.3390\/rs6064907","article-title":"Automated detection of cloud and cloud shadow in single-date Landsat imagery using neural networks and spatial post-processing","volume":"6","author":"Hughes","year":"2014","journal-title":"Remote Sens."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1169","DOI":"10.1109\/TIP.2020.3042065","article-title":"Cgnet: A light-weight context guided network for semantic segmentation","volume":"30","author":"Wu","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Lee, Y., Kim, J., Willette, J., and Hwang, S.J. (2022, January 18\u201324). Mpvit: Multi-path vision transformer for dense prediction. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00714"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"883","DOI":"10.1175\/BAMS-88-6-883","article-title":"Cloudnet: Continuous evaluation of cloud profiles in seven operational models using ground-based observations","volume":"88","author":"Illingworth","year":"2007","journal-title":"Bull. Am. Meteorol. Soc."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"3051","DOI":"10.1007\/s11263-021-01515-2","article-title":"Bisenet v2: Bilateral network with guided aggregation for real-time semantic segmentation","volume":"129","author":"Yu","year":"2021","journal-title":"Int. J. Comput. 
Vis."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Guo, J., Han, K., Wu, H., Tang, Y., Chen, X., Wang, Y., and Xu, C. (2022, January 18\u201324). Cmt: Convolutional neural networks meet vision transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01186"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., and Wang, M. (2022, January 23\u201327). Swin-unet: Unet-like pure transformer for medical image segmentation. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-25066-8_9"},{"key":"ref_43","unstructured":"Li, H., Xiong, P., An, J., and Wang, L. (2018). Pyramid attention network for semantic segmentation. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"2294","DOI":"10.1016\/j.procs.2020.04.248","article-title":"OCR-nets: Variants of pre-trained CNN for Urdu handwritten character recognition via transfer learning","volume":"171","author":"KO","year":"2020","journal-title":"Procedia Comput. Sci."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/13\/2308\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:03:39Z","timestamp":1760108619000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/13\/2308"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,24]]},"references-count":44,"journal-issue":{"issue":"13","published-online":{"date-parts":[[2024,7]]}},"alternative-id":["rs16132308"],"URL":"https:\/\/doi.org\/10.3390\/rs16132308","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,24]]}}}