{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T18:08:17Z","timestamp":1775326097742,"version":"3.50.1"},"reference-count":75,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T00:00:00Z","timestamp":1675296000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62171381"],"award-info":[{"award-number":["62171381"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Fundamental Research Funds for the Central Universities","award":["62171381"],"award-info":[{"award-number":["62171381"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Semantic segmentation of high-spatial-resolution (HSR) remote sensing (RS) images has been extensively studied, and most of the existing methods are based on convolutional neural network (CNN) models. However, the CNN is regarded to have less power in global representation modeling. In the past few years, methods using transformer have attracted increasing attention and generate improved results in semantic segmentation of natural images, owing to their powerful ability in global information acquisition. Nevertheless, these transformer-based methods exhibit limited performance in semantic segmentation of RS images, probably because of the lack of comprehensive understanding in the feature decoding process. In this paper, a novel transformer-based model named the bi-decoder transformer segmentor for remote sensing (BiTSRS) is proposed, aiming at alleviating the problem of flexible feature decoding, through a bi-decoder design for semantic segmentation of RS images. In the proposed BiTSRS, the Swin transformer is adopted as encoder to take both global and local representations into consideration, and a unique design module (ITM) is designed to deal with the limitation of input size for Swin transformer. Furthermore, BiTSRS adopts a bi-decoder structure consisting of a Dilated-Uper decoder and a fully deformable convolutional network (FDCN) module embedded with focal loss, with which it is capable of decoding a wide range of features and local detail deformations. Both ablation experiments and comparison experiments were conducted on three representative RS images datasets. The ablation analysis demonstrates the contributions of specifically designed modules in the proposed BiTSRS to performance improvement. The comparison experimental results illustrate that the proposed BiTSRS clearly outperforms some state-of-the-art semantic segmentation methods.<\/jats:p>","DOI":"10.3390\/rs15030840","type":"journal-article","created":{"date-parts":[[2023,2,3]],"date-time":"2023-02-03T03:36:57Z","timestamp":1675395417000},"page":"840","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["BiTSRS: A Bi-Decoder Transformer Segmentor for High-Spatial-Resolution Remote Sensing Images"],"prefix":"10.3390","volume":"15","author":[{"given":"Yuheng","family":"Liu","sequence":"first","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710129, China"}]},{"given":"Yifan","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710129, China"}]},{"given":"Ye","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710129, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8018-596X","authenticated-orcid":false,"given":"Shaohui","family":"Mei","sequence":"additional","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710129, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,2,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Liu, Y., Ren, Q., Geng, J., Ding, M., and Li, J. (2018). Efficient Patch-Wise Semantic Segmentation for Large-Scale Remote Sensing Images. Sensors, 18.","DOI":"10.3390\/s18103232"},{"key":"ref_2","unstructured":"Kampffmeyer, M., Salberg, A.B., and Jenssen, R. (July, January 26). Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"8301","DOI":"10.1109\/TGRS.2020.2985695","article-title":"Ms-rrfsegnet: Multiscale regional relation feature segmentation network for semantic segmentation of urban scene point clouds","volume":"58","author":"Luo","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_4","unstructured":"Khan, S.A., Shi, Y., Shahzad, M., and Zhu, X.X. (2020, January 14\u201319). FGCN: Deep Feature-based Graph Convolutional Network for Semantic Segmentation of Urban 3D Point Clouds. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3484440","article-title":"Multi-stage fusion and multi-source attention network for multi-modal remote sensing image segmentation","volume":"12","author":"Zhao","year":"2021","journal-title":"ACM Trans. Intell. Syst. Technol. TIST"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"6601","DOI":"10.1109\/TIP.2020.2992177","article-title":"Polarimetric SAR image semantic segmentation with 3D discrete wavelet transform and Markov random field","volume":"29","author":"Bi","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"9764982","DOI":"10.34133\/2022\/9764982","article-title":"A broadband green-red vegetation index for monitoring gross primary production phenology","volume":"2022","author":"Yin","year":"2022","journal-title":"J. Remote Sens."},{"key":"ref_8","unstructured":"Alemohammad, H., and Booth, K. (2020). LandCoverNet: A global benchmark land cover classification training dataset. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"111322","DOI":"10.1016\/j.rse.2019.111322","article-title":"Land-cover classification with high-resolution remote sensing images using transferable deep models","volume":"237","author":"Tong","year":"2020","journal-title":"Remote Sens. Environ."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"9769536","DOI":"10.34133\/2022\/9769536","article-title":"An Introduction to the Chinese High-Resolution Earth Observation System: Gaofen-1\u02dc 7 Civilian Satellites","volume":"2022","author":"Chen","year":"2022","journal-title":"J. Remote Sens."},{"key":"ref_11","first-page":"3349","article-title":"Deep high-resolution representation learning for visual recognition","volume":"3","author":"Wang","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"230","DOI":"10.1016\/j.isprsjprs.2020.04.016","article-title":"Efficient structure from motion for large-scale UAV images: A review and a comparison of SfM tools","volume":"167","author":"Jiang","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Hoeser, T., and Kuenzer, C. (2020). Object detection and image segmentation with deep learning on earth observation data: A review-part i: Evolution and recent trends. Remote Sens., 12.","DOI":"10.3390\/rs12101667"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"Imagenet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2012","journal-title":"Commun. ACM"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1789","DOI":"10.1109\/TIP.2022.3146012","article-title":"Graph Convolutional Dictionary Selection With L2,p Norm for Video Summarization","volume":"31","author":"Ma","year":"2022","journal-title":"IEEE Trans. Image Process."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Liu, H., Li, W., Xia, X.G., Zhang, M., Gao, C.Z., and Tao, R. (2022). Central attention network for hyperspectral imagery classification. IEEE Trans. Neural Netw. Learn. Syst.","DOI":"10.1109\/TNNLS.2022.3155114"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Li, W., Zhang, M., Wang, S., Tao, R., and Du, Q. (2022). Graph Information Aggregation Cross-Domain Few-Shot Learning for Hyperspectral Image Classification. IEEE Trans. Neural Netw. Learn. Syst.","DOI":"10.1109\/TNNLS.2022.3185795"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Li, W., Gao, Y., Zhang, M., Tao, R., and Du, Q. (2022). Asymmetric feature fusion network for hyperspectral and SAR image classification. IEEE Trans. Neural Netw. Learn. Syst.","DOI":"10.1109\/TNNLS.2022.3149394"},{"key":"ref_19","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015, January 7\u201312). Faster r-cnn: Towards real-time object detection with region proposal networks. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_20","unstructured":"Ciresan, D., Giusti, A., Gambardella, L., and Schmidhuber, J. (2015, January 7\u201312). Deep neural networks segment neuronal membranes in electron microscopy images. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1915","DOI":"10.1109\/TPAMI.2012.231","article-title":"Learning Hierarchical Features for Scene Labeling","volume":"35","author":"Farabet","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Ganin, Y., and Lempitsky, V.S. (2014, January 1\u20135). N4-Fields: Neural Network Nearest Neighbor Fields for Image Transforms. Proceedings of the 12th Asian Conference on Computer Vision, Singapore.","DOI":"10.1007\/978-3-319-16808-1_36"},{"key":"ref_23","unstructured":"Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A.L. (2014). Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_26","unstructured":"Zhou, Z., Siddiquee, M.M.R., Tajbakhsh, N., and Liang, J. (2018). Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Springer."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, January 21\u201326). Pyramid Scene Parsing Network. Proceedings of the Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1633","DOI":"10.1109\/JSTARS.2018.2810320","article-title":"Symmetrical dense-shortcut deep fully convolutional networks for semantic segmentation of very-high-resolution remote sensing images","volume":"11","author":"Chen","year":"2018","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"474","DOI":"10.1109\/LGRS.2018.2795531","article-title":"Fully convolutional networks for semantic segmentation of very high resolution remotely sensed images combined with DSM","volume":"15","author":"Sun","year":"2018","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1080\/09540091.2018.1510902","article-title":"Semantic segmentation of high-resolution remote sensing images using fully convolutional network with adaptive threshold","volume":"31","author":"Wu","year":"2019","journal-title":"Connect. Sci."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Liu, Y., Zhu, Q., Cao, F., Chen, J., and Lu, G. (2021). High-resolution remote sensing image segmentation framework based on attention mechanism and adaptive weighting. ISPRS Int. J. Geo Inf., 10.","DOI":"10.3390\/ijgi10040241"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"905","DOI":"10.1109\/LGRS.2020.2988294","article-title":"SCAttNet: Semantic Segmentation Network With Spatial and Channel Attention Mechanism for High-Resolution Remote Sensing Images","volume":"18","author":"Li","year":"2021","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"9809505","DOI":"10.34133\/2022\/9809505","article-title":"An Elliptic Centerness for Object Instance Segmentation in Aerial Images","volume":"2022","author":"Luo","year":"2022","journal-title":"J. Remote Sens."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Negin, F., Tabejamaat, M., Fraisse, R., and Bremond, F. (2022, January 19\u201320). Transforming Temporal Embeddings to Keypoint Heatmaps for Detection of Tiny Vehicles in Wide Area Motion Imagery (WAMI) Sequences. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA.","DOI":"10.1109\/CVPRW56347.2022.00149"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1346","DOI":"10.1109\/ACCESS.2021.3138980","article-title":"HM-Net: A Regression Network for Object Center Detection and Tracking on Wide Area Motion Imagery","volume":"10","author":"Motorcu","year":"2022","journal-title":"IEEE Access"},{"key":"ref_37","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Wang, W., Xie, E., Li, X., Fan, D.P., Song, K., Liang, D., Lu, T., Luo, P., and Shao, L. (2021, January 11\u201317). Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00061"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1007\/s41095-022-0274-8","article-title":"Pvt v2: Improved baselines with pyramid vision transformer","volume":"8","author":"Wang","year":"2022","journal-title":"Comput. Vis. Media"},{"key":"ref_40","first-page":"15908","article-title":"Transformer in Transformer","volume":"34","author":"Han","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Wu, H., Xiao, B., Codella, N., Liu, M., Dai, X., Yuan, L., and Zhang, L. (2021, January 11\u201317). CvT: Introducing Convolutions to Vision Transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00009"},{"key":"ref_42","first-page":"1","article-title":"Hyperspectral Image Classification Using Group-Aware Hierarchical Transformer","volume":"60","author":"Mei","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Liu, Z., Ning, J., Cao, Y., Wei, Y., Zhang, Z., Lin, S., and Hu, H. (2022, January 18\u201324). Video Swin Transformer. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00320"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Liang, J., Cao, J., Sun, G., Zhang, K., Gool, L.V., and Timofte, R. (2021, January 11\u201317). SwinIR: Image Restoration Using Swin Transformer. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCVW54120.2021.00210"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Xu, X., Feng, Z., Cao, C., Li, M., Wu, J., Wu, Z., Shang, Y., and Ye, S. (2021). An Improved Swin Transformer-Based Model for Remote Sensing Object Detection and Instance Segmentation. Remote Sens., 13.","DOI":"10.3390\/rs13234779"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TGRS.2022.3230846","article-title":"Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation","volume":"60","author":"He","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"10990","DOI":"10.1109\/JSTARS.2021.3119654","article-title":"STransFuse: Fusing Swin Transformer and Convolutional Neural Network for Remote Sensing Image Semantic Segmentation","volume":"14","author":"Gao","year":"2021","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_49","first-page":"1","article-title":"Transformer and CNN Hybrid Deep Neural Network for Semantic Segmentation of Very-high-resolution Remote Sensing Imagery","volume":"60","author":"Zhang","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Xu, Z., Zhang, W., Zhang, T., Yang, Z., and Li, J. (2021). Efficient transformer for remote sensing image segmentation. Remote Sens., 13.","DOI":"10.3390\/rs13183585"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Xiao, T., Liu, Y., Zhou, B., Jiang, Y., and Sun, J. (2018, January 8\u201314). Unified Perceptual Parsing for Scene Understanding. Proceedings of the European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01228-1_26"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"Segnet: A deep convolutional encoder-decoder architecture for image segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs","volume":"40","author":"Chen","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_54","unstructured":"Chen, L.C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1080\/17538947.2020.1831087","article-title":"Incorporating DeepLabv3+ and object-based image analysis for semantic segmentation of very high resolution remote sensing images","volume":"14","author":"Du","year":"2021","journal-title":"Int. J. Digit. Earth"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Mou, L., Hua, Y., and Zhu, X.X. (2019, January 15\u201320). A Relation-Augmented Fully Convolutional Network for Semantic Segmentation in Aerial Scenes. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01270"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Zheng, Z., Zhong, Y., Wang, J., and Ma, A. (2020, January 13\u201319). Foreground-Aware Relation Network for Geospatial Object Segmentation in High Spatial Resolution Remote Sensing Imagery. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition 2020, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00415"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"426","DOI":"10.1109\/TGRS.2020.2994150","article-title":"LANet: Local Attention Embedding to Improve the Semantic Segmentation of Remote Sensing Images","volume":"59","author":"Ding","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TGRS.2020.3035561","article-title":"Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images","volume":"60","author":"Li","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_60","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, January 4\u20139). Attention is All you Need. Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23\u201328). End-to-End Object Detection with Transformers. Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Strudel, R., Garcia, R., Laptev, I., and Schmid, C. (2021, January 11\u201317). Segmenter: Transformer for Semantic Segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00717"},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., and Torr, P.H.S. (2021, January 20\u201325). Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00681"},{"key":"ref_64","first-page":"12077","article-title":"SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers","volume":"34","author":"Xie","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Ding, L., Lin, D., Lin, S., Zhang, J., Cui, X., Wang, Y., Tang, H., and Bruzzone, L. (2021). Looking Outside the Window: Wider-Context Transformer for the Semantic Segmentation of High-Resolution Remote Sensing Images. arXiv.","DOI":"10.1109\/TGRS.2022.3168697"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2016, January 27\u201330). Feature Pyramid Networks for Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., and Wei, Y. (2017, January 22\u201329). Deformable Convolutional Networks. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.89"},{"key":"ref_68","unstructured":"(2022, December 27). ISPRS, Semantic Labeling Contest (2018). Available online: https:\/\/www.isprs.org\/education\/benchmarks\/UrbanSemLab\/default.aspx."},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"7092","DOI":"10.1109\/TGRS.2017.2740362","article-title":"High-resolution aerial image labeling with convolutional neural networks","volume":"55","author":"Maggiori","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_70","doi-asserted-by":"crossref","first-page":"7557","DOI":"10.1109\/TGRS.2020.2979552","article-title":"Relation matters: Relational context-aware fully convolutional network for semantic segmentation of high-resolution aerial images","volume":"58","author":"Mou","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_71","unstructured":"Wang, J., Zheng, Z., Ma, A., Lu, X., and Zhong, Y. (2021). LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation. arXiv."},{"key":"ref_72","unstructured":"Loshchilov, I., and Hutter, F. (2017). Decoupled weight decay regularization. arXiv."},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Mishra, P., and Sarawadekar, K. (2019, January 17\u201320). Polynomial learning rate policy with warm restart for deep neural network. Proceedings of the TENCON 2019-2019 IEEE Region 10 Conference (TENCON), Kerala, India.","DOI":"10.1109\/TENCON.2019.8929465"},{"key":"ref_74","unstructured":"Bao, H., Dong, L., and Wei, F. (2021). BEiT: BERT Pre-Training of Image Transformers. arXiv."},{"key":"ref_75","doi-asserted-by":"crossref","first-page":"196","DOI":"10.1016\/j.isprsjprs.2022.06.008","article-title":"UNetFormer: An UNet-like Transformer for Efficient Semantic Segmentation of Remote Sensing Urban Scene Imagery","volume":"190","author":"Wang","year":"2021","journal-title":"ISPRS J. Photogramm. Remote Sens."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/3\/840\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:22:52Z","timestamp":1760120572000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/3\/840"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,2]]},"references-count":75,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["rs15030840"],"URL":"https:\/\/doi.org\/10.3390\/rs15030840","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,2]]}}}