{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T14:35:54Z","timestamp":1773930954110,"version":"3.50.1"},"reference-count":49,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2024,3,8]],"date-time":"2024-03-08T00:00:00Z","timestamp":1709856000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Key R&D Program of China","award":["2022YFB3902300"],"award-info":[{"award-number":["2022YFB3902300"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>The complementary characteristics of SAR and optical images are beneficial in improving the accuracy of land cover classification. Deep learning-based models have achieved some notable results. However, how to effectively extract and fuse the unique features of multi-modal images for pixel-level classification remains challenging. In this article, a two-branch supervised semantic segmentation framework without any pretrained backbone is proposed. Specifically, a novel symmetric attention module is designed with improved strip pooling. The multiple long receptive fields can better perceive irregular objects and obtain more anisotropic contextual information. Meanwhile, to solve the semantic absence and inconsistency of different modalities, we construct a multi-scale fusion module, which is composed of atrous spatial pyramid pooling, varisized convolutions and skip connections. A joint loss function is introduced to constrain the backpropagation and reduce the impact of class imbalance. Validation experiments were implemented on the DFC2020 and WHU-OPT-SAR datasets. The proposed model achieved the best quantitative values on the metrics of OA, Kappa and mIoU, and its class accuracy was also excellent. 
It is worth mentioning that the number of parameters and the computational complexity of the method are relatively low. The adaptability of the model was verified on an RGB\u2013thermal segmentation task.<\/jats:p>","DOI":"10.3390\/rs16060957","type":"journal-article","created":{"date-parts":[[2024,3,8]],"date-time":"2024-03-08T11:47:30Z","timestamp":1709898450000},"page":"957","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["Multi-Scale Feature Fusion Network with Symmetric Attention for Land Cover Classification Using SAR and Optical Images"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1528-4363","authenticated-orcid":false,"given":"Dongdong","family":"Xu","sequence":"first","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3078-1886","authenticated-orcid":false,"given":"Zheng","family":"Li","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China"},{"name":"University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-7948-4090","authenticated-orcid":false,"given":"Hao","family":"Feng","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China"},{"name":"University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2231-7063","authenticated-orcid":false,"given":"Fanlu","family":"Wu","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1647-2956","authenticated-orcid":false,"given":"Yongcheng","family":"Wang","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,3,8]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"4340","DOI":"10.1109\/TGRS.2020.3016820","article-title":"More Diverse Means Better: Multimodal Deep Learning Meets Remote-Sensing Imagery Classification","volume":"59","author":"Hong","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Li, X., Zhang, G., Cui, H., Hou, S., Wang, S., Li, X., Chen, Y., Li, Z., and Zhang, L. (2022). MCANet: A joint semantic segmentation framework of optical and SAR images for land use classification. Int. J. Appl. Earth Obs., 106.","DOI":"10.1016\/j.jag.2021.102638"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1011","DOI":"10.1109\/JSTARS.2020.2975252","article-title":"Multimodal Bilinear Fusion Network With Second-Order Attention-Based Channel Selection for Land Cover Classification","volume":"13","author":"Li","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1016\/j.inffus.2021.12.004","article-title":"Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network","volume":"82","author":"Tang","year":"2022","journal-title":"Inform. Fusion"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"3829","DOI":"10.1109\/TGRS.2020.3015389","article-title":"Collaborative Attention-Based Heterogeneous Gated Fusion Network for Land Cover Classification","volume":"59","author":"Li","year":"2021","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Meng, H., Li, C., Liu, Y., Gong, Y., He, W., and Zou, M. (2023). Corn Land Extraction Based on Integrating Optical and SAR Remote Sensing Images. Land, 12.","DOI":"10.3390\/land12020398"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"272","DOI":"10.1016\/j.isprsjprs.2023.04.008","article-title":"Aligning semantic distribution in fusing optical and SAR images for land use classification","volume":"199","author":"Li","year":"2023","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_8","first-page":"1","article-title":"Dense Adaptive Grouping Distillation Network for Multimodal Land Cover Classification With Privileged Modality","volume":"60","author":"Li","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1562","DOI":"10.1109\/JSTARS.2022.3144587","article-title":"CFNet: A Cross Fusion Network for Joint Land Cover Classification Using Optical and SAR Images","volume":"15","author":"Kang","year":"2022","journal-title":"IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"2374","DOI":"10.1109\/JSTARS.2019.2915277","article-title":"Impervious Surface Estimation From Optical and Polarimetric SAR Data Using Small-Patched Deep Convolutional Networks: A Comparative Study","volume":"12","author":"Zhang","year":"2019","journal-title":"IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"937","DOI":"10.1109\/TGRS.2017.2756851","article-title":"Multisource Remote Sensing Data Classification Based on Convolutional Neural Network","volume":"56","author":"Xu","year":"2018","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1778","DOI":"10.1109\/TGRS.2004.831865","article-title":"Classification of hyperspectral remote sensing images with support vector machines","volume":"42","author":"Melgani","year":"2004","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Talukdar, S., Singha, P., Mahato, S., Pal, S., Liou, Y.A., and Rahman, A. (2020). Land-Use Land-Cover Classification by Machine Learning Classifiers for Satellite Observations\u2014A Review. Remote Sens., 12.","DOI":"10.3390\/rs12071135"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"492","DOI":"10.1109\/TGRS.2004.842481","article-title":"Investigation of the random forest framework for classification of hyperspectral data","volume":"43","author":"Ham","year":"2005","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1804","DOI":"10.1109\/TGRS.2008.916090","article-title":"Nearest Neighbor Classification of Remote Sensing Images With the Maximal Margin Principle","volume":"46","author":"Blanzieri","year":"2008","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Li, K., Wang, D., Wang, X., Liu, G., Wu, Z., and Wang, Q. (2023). Mixing Self-Attention and Convolution: A Unified Framework for Multisource Remote Sensing Data Classification. IEEE Trans. Geosci. Remote Sens., 61.","DOI":"10.1109\/TGRS.2023.3310521"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.isprsjprs.2019.09.016","article-title":"Combining Sentinel-1 and Sentinel-2 Satellite Image Time Series for land cover mapping via a multi-source deep learning architecture","volume":"158","author":"Ienco","year":"2019","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, Z., Li, X., and Yeh, A.G.O. (2019). 
Integration of Convolutional Neural Networks and Object-Based Post-Classification Refinement for Land Use and Land Cover Mapping with Optical and SAR Data. Remote Sens., 11.","DOI":"10.3390\/rs11060690"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Feng, Q., Yang, J., Zhu, D., Liu, J., Guo, H., Bayartungalag, B., and Li, B. (2019). Integrating Multitemporal Sentinel-1\/2 Data for Coastal Land Cover Classification Using a Multibranch Convolutional Neural Network: A Case of the Yellow River Delta. Remote Sens., 11.","DOI":"10.3390\/rs11091006"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Li, X., Lei, L., and Kuang, G. (2022). Locality-Constrained Bilinear Network for Land Cover Classification Using Heterogeneous Images. IEEE Geosci. Remote Sens. Lett., 19.","DOI":"10.1109\/LGRS.2021.3086592"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Gao, M., Xu, J., Yu, J., and Dong, Q. (2023). Distilled Heterogeneous Feature Alignment Network for SAR Image Semantic Segmentation. IEEE Geosci. Remote Sens. Lett., 20.","DOI":"10.1109\/LGRS.2023.3293160"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Kang, J., Wang, Z., Zhu, R., Xia, J., Sun, X., Fernandez-Beltran, R., and Plaza, A. (2022). DisOptNet: Distilling Semantic Knowledge From Optical Images for Weather-Independent Building Segmentation. IEEE Trans. Geosci. Remote Sens., 60.","DOI":"10.1109\/TGRS.2022.3165209"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Chen, Y., and Bruzzone, L. (2022). Self-Supervised SAR-Optical Data Fusion of Sentinel-1\/-2 Images. IEEE Trans. Geosci. Remote Sens., 60.","DOI":"10.1109\/TGRS.2021.3128072"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"7797","DOI":"10.1109\/JSTARS.2022.3204888","article-title":"Self-Supervised Learning for Invariant Representations From Multi-Spectral and SAR Images","volume":"15","author":"Jain","year":"2022","journal-title":"IEEE J. Sel. Top. Appl. Earth Observ. 
Remote Sens."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Liu, C., Sun, H., Xu, Y., and Kuang, G. (2022). Multi-Source Remote Sensing Pretraining Based on Contrastive Self-Supervised Learning. Remote Sens., 14.","DOI":"10.3390\/rs14184632"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"2269","DOI":"10.1109\/TGRS.2020.3000684","article-title":"Spectral Superresolution of Multispectral Imagery With Joint Sparse and Low-Rank Learning","volume":"59","author":"Gao","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"7355","DOI":"10.1109\/TGRS.2020.2982064","article-title":"Joint Classification of Hyperspectral and LiDAR Data Using Hierarchical Random Walk and Deep CNN Architecture","volume":"58","author":"Zhao","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Schmitt, M., Hughes, L.H., Qiu, C., and Zhu, X.X. (2019). SEN12MS\u2014A Curated Dataset of Georeferenced Multi-Spectral Sentinel-1\/2 Imagery for Deep Learning and Data Fusion. arXiv.","DOI":"10.5194\/isprs-annals-IV-2-W7-153-2019"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1109\/MGRS.2020.2970124","article-title":"2020 IEEE GRSS Data Fusion Contest: Global Land Cover Mapping With Weak Supervision [Technical Committees]","volume":"8","author":"Yokoya","year":"2020","journal-title":"IEEE Geosci. Remote Sens. Mag."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019, January 15\u201320). Dual Attention Network for Scene Segmentation. 
Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"6896","DOI":"10.1109\/TPAMI.2020.3007032","article-title":"CCNet: Criss-Cross Attention for Semantic Segmentation","volume":"45","author":"Huang","year":"2023","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Zhong, Z., Lin, Z.Q., Bidart, R., Hu, X., Daya, I.B., Li, Z., Zheng, W.S., Li, J., and Wong, A. (2020, January 13\u201319). Squeeze-and-Attention Networks for Semantic Segmentation. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01308"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"238","DOI":"10.1016\/j.isprsjprs.2021.05.004","article-title":"An attention-fused network for semantic segmentation of very-high-resolution remote sensing imagery","volume":"177","author":"Yang","year":"2021","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Xu, Z., Zhu, J., Geng, J., Deng, X., and Jiang, W. (2021, January 11\u201316). Triplet Attention Feature Fusion Network for SAR and Optical Image Land Cover Classification. Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium.","DOI":"10.1109\/IGARSS47720.2021.9555126"},{"key":"ref_35","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018). Computer Vision\u2014ECCV 2018, Proceedings of the 15th European Conference, Munich, Germany, 8\u201314 September 2018, Springer International Publishing."},{"key":"ref_36","unstructured":"Tao, A., Sapra, K., and Catanzaro, B. (2020). Hierarchical Multi-Scale Attention for Semantic Segmentation. 
arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Yuan, M., Ren, D., Feng, Q., Wang, Z., Dong, Y., Lu, F., and Wu, X. (2023). MCAFNet: A Multiscale Channel Attention Fusion Network for Semantic Segmentation of Remote Sensing Images. Remote Sens., 15.","DOI":"10.3390\/rs15020361"},{"key":"ref_38","unstructured":"Chen, L.C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv."},{"key":"ref_39","unstructured":"Lazebnik, S., Schmid, C., and Ponce, J. (2006, January 17\u201322). Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201906), New York, NY, USA."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid Scene Parsing Network. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Hou, Q., Zhang, L., Cheng, M.M., and Feng, J. (2020, January 13\u201319). Strip Pooling: Rethinking Spatial Pooling for Scene Parsing. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00406"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Song, Q., Mei, K., and Huang, R. (2021). AttaNet: Attention-Augmented Network for Fast and Accurate Scene Parsing. arXiv.","DOI":"10.1609\/aaai.v35i3.16359"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Chen, K., Dai, X., Xia, M., Weng, L., Hu, K., and Lin, H. (2023). MSFANet: Multi-Scale Strip Feature Attention Network for Cloud and Cloud Shadow Segmentation. 
Remote Sens., 15.","DOI":"10.3390\/rs15194853"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Vaezi Joze, H.R., Shaban, A., Iuzzolino, M.L., and Koishida, K. (2020, January 13\u201319). MMTM: Multimodal Transfer Module for CNN Fusion. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01330"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal Loss for Dense Object Detection. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Yu, J., Jiang, Y., Wang, Z., Cao, Z., and Huang, T. (2016, January 15\u201319). UnitBox: An Advanced Object Detection Network. Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands.","DOI":"10.1145\/2964284.2967274"},{"key":"ref_47","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018). Computer Vision\u2014ECCV 2018, Proceedings of the 15th European Conference, Munich, Germany, 8\u201314 September 2018, Springer International Publishing."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Yang, M., Yu, K., Zhang, C., Li, Z., and Yang, K. (2018, January 18\u201323). DenseASPP for Semantic Segmentation in Street Scenes. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00388"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Ha, Q., Watanabe, K., Karasawa, T., Ushiku, Y., and Harada, T. (2017, January 24\u201328). MFNet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes. 
Proceedings of the 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada.","DOI":"10.1109\/IROS.2017.8206396"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/6\/957\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T14:11:11Z","timestamp":1760105471000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/6\/957"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,8]]},"references-count":49,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2024,3]]}},"alternative-id":["rs16060957"],"URL":"https:\/\/doi.org\/10.3390\/rs16060957","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,8]]}}}