{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T08:04:39Z","timestamp":1773993879122,"version":"3.50.1"},"reference-count":35,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2023,1,6]],"date-time":"2023-01-06T00:00:00Z","timestamp":1672963200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Fundamental Research Funds for the Central Universities","award":["lzujbky-2021-ct09","21JR7RA457","kjcx2022010","62176108"],"award-info":[{"award-number":["lzujbky-2021-ct09"]},{"award-number":["21JR7RA457"]},{"award-number":["kjcx2022010"]},{"award-number":["62176108"]}]},{"name":"Science and Technology support program of Gansu Province of China","award":["lzujbky-2021-ct09","21JR7RA457","kjcx2022010","62176108"],"award-info":[{"award-number":["lzujbky-2021-ct09"]},{"award-number":["21JR7RA457"]},{"award-number":["kjcx2022010"]},{"award-number":["62176108"]}]},{"name":"Science and Technology innovation Project of Forestry and Grassland Bureau of Gansu Province","award":["lzujbky-2021-ct09","21JR7RA457","kjcx2022010","62176108"],"award-info":[{"award-number":["lzujbky-2021-ct09"]},{"award-number":["21JR7RA457"]},{"award-number":["kjcx2022010"]},{"award-number":["62176108"]}]},{"name":"National Natural Science Foundation of China","award":["lzujbky-2021-ct09","21JR7RA457","kjcx2022010","62176108"],"award-info":[{"award-number":["lzujbky-2021-ct09"]},{"award-number":["21JR7RA457"]},{"award-number":["kjcx2022010"]},{"award-number":["62176108"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Semantic segmentation of urban remote sensing images is one of the most crucial tasks in the field of remote sensing. High-resolution remote sensing images contain rich information on ground objects, such as their shape, location, and boundaries. The large intraclass variance and low interclass variance caused by these objects make remote sensing images exceedingly challenging to identify. In this article, we propose a multiscale hierarchical channel attention fusion network model based on a transformer and a CNN, which we name the multiscale channel attention fusion network (MCAFNet). MCAFNet uses ResNet-50 and ViT-B\/16 to learn the global\u2013local context, which strengthens the semantic feature representation. Specifically, a global\u2013local transformer block (GLTB) is deployed in the encoder stage. 
This design handles image details at low resolution and extracts global image features more effectively than previous methods. In the decoder module, a channel attention optimization module and a fusion module are added to better integrate high- and low-dimensional feature maps, which enhances the network\u2019s ability to capture small-scale semantic information. The proposed method was evaluated on the ISPRS Vaihingen and Potsdam datasets. Both quantitative and qualitative evaluations show that MCAFNet performs competitively against mainstream methods. In addition, we performed extensive ablation experiments on the Vaihingen dataset to test the effectiveness of the individual network components.<\/jats:p>","DOI":"10.3390\/rs15020361","type":"journal-article","created":{"date-parts":[[2023,1,9]],"date-time":"2023-01-09T04:47:08Z","timestamp":1673239628000},"page":"361","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":42,"title":["MCAFNet: A Multiscale Channel Attention Fusion Network for Semantic Segmentation of Remote Sensing Images"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7855-8678","authenticated-orcid":false,"given":"Min","family":"Yuan","sequence":"first","affiliation":[{"name":"School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China"}]},{"given":"Dingbang","family":"Ren","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China"}]},{"given":"Qisheng","family":"Feng","sequence":"additional","affiliation":[{"name":"College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730000, China"}]},{"given":"Zhaobin","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, 
China"}]},{"given":"Yongkang","family":"Dong","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China"}]},{"given":"Fuxiang","family":"Lu","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China"}]},{"given":"Xiaolin","family":"Wu","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,6]]},"reference":[{"key":"ref_1","first-page":"50","article-title":"A Survey on Semantic Segmentation using Deep Learning Techniques","volume":"9","author":"Tapasvi","year":"2021","journal-title":"Int. J. Eng. Res. Technol."},{"key":"ref_2","first-page":"5998","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_3","unstructured":"Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, L., Shazeer, N., Ku, A., and Tran, D. (2018, January 10\u201315). Image transformer. Proceedings of the International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., and Torr, P.H. (2021, January 20\u201325). Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00681"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Wang, H., Zhu, Y., Adam, H., Yuille, A., and Chen, L.C. (2021, January 20\u201325). Max-deeplab: End-to-end panoptic segmentation with mask transformers. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00542"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"He, L., Zhou, Q., Li, X., Niu, L., Cheng, G., Li, X., Liu, W., Tong, Y., Ma, L., and Zhang, L. (2021, January 20\u201324). End-to-end video object detection with spatial-temporal transformers. Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event.","DOI":"10.1145\/3474085.3475285"},{"key":"ref_7","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16 \u00d7 16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Esser, P., Rombach, R., and Ommer, B. (2021, January 20\u201325). Taming transformers for high-resolution image synthesis. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01268"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., and Xie, S. (2022, January 19\u201320). A ConvNet for the 2020s. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01167"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid scene parsing network. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_12","unstructured":"Li, H., Xiong, P., An, J., and Wang, L. (2018). Pyramid attention network for semantic segmentation. arXiv."},{"key":"ref_13","unstructured":"Chen, L.C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Yu, F., Koltun, V., and Funkhouser, T. (2017, January 21\u201326). Dilated residual networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.75"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Yang, M., Yu, K., Zhang, C., Li, Z., and Yang, K. (2018, January 18\u201323). Denseaspp for semantic segmentation in street scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00388"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1016\/j.patcog.2019.03.011","article-title":"Color image segmentation based on multi-level Tsallis\u2013Havrda\u2013Charv\u00e1t entropy and 2D histogram using PSO algorithms","volume":"92","author":"Borjigin","year":"2019","journal-title":"Pattern Recognit."},{"key":"ref_17","unstructured":"Wu, Z., Shen, C., and Hengel, A.V.d. (2017). Real-time semantic image segmentation via spatial sparsity. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Xu, Q., Ma, Y., Wu, J., and Long, C. (2021, January 18\u201322). Faster BiSeNet: A Faster Bilateral Segmentation Network for Real-time Semantic Segmentation. Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China.","DOI":"10.1109\/IJCNN52387.2021.9533819"},{"key":"ref_19","unstructured":"Poudel, R.P., Liwicki, S., and Cipolla, R. (2019). 
Fast-scnn: Fast semantic segmentation network. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_22","unstructured":"Badrinarayanan, V., Handa, A., and Cipolla, R. (2015). Segnet: A deep convolutional encoder\u2013decoder architecture for robust semantic pixel-wise labelling. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Lin, G., Milan, A., Shen, C., and Reid, I. (2017, January 21\u201326). Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.549"},{"key":"ref_24","first-page":"12077","article-title":"SegFormer: Simple and efficient design for semantic segmentation with transformers","volume":"34","author":"Xie","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 10\u201317). Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_26","unstructured":"Wang, L., Fang, S., Zhang, C., Li, R., and Duan, C. (2021). 
Efficient Hybrid Transformer: Learning Global-local Context for Urban Scene Segmentation. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"188","DOI":"10.1016\/j.neunet.2021.01.021","article-title":"Bilateral attention decoder: A lightweight decoder for real-time semantic segmentation","volume":"137","author":"Peng","year":"2021","journal-title":"Neural Netw."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"426","DOI":"10.1109\/TGRS.2020.2994150","article-title":"LANet: Local attention embedding to improve the semantic segmentation of remote sensing images","volume":"59","author":"Ding","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_29","first-page":"5601313","article-title":"Cross fusion net: A fast semantic segmentation network for small-scale semantic information capturing in aerial scenes","volume":"60","author":"Peng","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Yang, H., Wu, P., Yao, X., Wu, Y., Wang, B., and Xu, Y. (2018). Building extraction in very high resolution imagery by dense-attention networks. Remote Sens., 10.","DOI":"10.3390\/rs10111768"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Eigen, D., and Fergus, R. (2015, January 7\u201313). Predicting depth, surface normals and semantic labels with a common multiscale convolutional architecture. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.304"},{"key":"ref_32","unstructured":"Sun, K., Zhao, Y., Jiang, B., Cheng, T., Xiao, B., Liu, D., Mu, Y., Wang, X., Liu, W., and Wang, J. (2019). High-resolution representations for labeling pixels and regions. 
arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs","volume":"40","author":"Chen","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_34","unstructured":"Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., Lu, L., Yuille, A.L., and Zhou, Y. (2021). Transunet: Transformers make strong encoders for medical image segmentation. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. (2017, January 4\u20139). Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.11231"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/2\/361\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:02:18Z","timestamp":1760119338000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/2\/361"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,6]]},"references-count":35,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2023,1]]}},"alternative-id":["rs15020361"],"URL":"https:\/\/doi.org\/10.3390\/rs15020361","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,6]]}}}