{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,18]],"date-time":"2026-05-18T18:32:01Z","timestamp":1779129121815,"version":"3.51.4"},"reference-count":51,"publisher":"MDPI AG","issue":"21","license":[{"start":{"date-parts":[[2022,10,28]],"date-time":"2022-10-28T00:00:00Z","timestamp":1666915200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"publisher","award":["2021YFE0117300"],"award-info":[{"award-number":["2021YFE0117300"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Natural imagery segmentation has been transferred to land cover classification in remote sensing imagery with excellent performance. However, two key issues have been overlooked in the transfer process: (1) some objects were easily overwhelmed by the complex backgrounds; (2) interclass information for indistinguishable classes was not fully utilized. The attention mechanism in the transformer is capable of modeling long-range dependencies on each sample for per-pixel context extraction. Notably, per-pixel context from the attention mechanism can aggregate category information. Therefore, we proposed a semantic segmentation method based on pixel representation augmentation. In our method, a simplified feature pyramid was designed to decode the hierarchical pixel features from the backbone, and then decode the category representations into learnable category object embedding queries by cross-attention in the transformer decoder. Finally, pixel representation is augmented by an additional cross-attention in the transformer encoder under the supervision of auxiliary segmentation heads. 
The results of extensive experiments on the aerial image dataset Potsdam and satellite image dataset Gaofen Image Dataset with 15 categories (GID-15) demonstrate that the cross-attention is effective, and our method achieved the mean intersection over union (mIoU) of 86.2% and 62.5% on the Potsdam test set and GID-15 validation set, respectively. Additionally, we achieved an inference speed of 76 frames per second (FPS) on the Potsdam test dataset, higher than all the state-of-the-art models we tested on the same device.<\/jats:p>","DOI":"10.3390\/rs14215415","type":"journal-article","created":{"date-parts":[[2022,10,30]],"date-time":"2022-10-30T09:01:42Z","timestamp":1667120502000},"page":"5415","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Pixel Representation Augmented through Cross-Attention for High-Resolution Remote Sensing Imagery Segmentation"],"prefix":"10.3390","volume":"14","author":[{"given":"Yiyun","family":"Luo","sequence":"first","affiliation":[{"name":"School of Geography and Remote Sensing, Guangzhou University, Guangzhou 510006, China"},{"name":"Centre for Remote Sensing Big Data Intelligence Applications, Guangzhou University, Guangzhou 510006, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jinnian","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Geography and Remote Sensing, Guangzhou University, Guangzhou 510006, China"},{"name":"Centre for Remote Sensing Big Data Intelligence Applications, Guangzhou University, Guangzhou 510006, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1303-195X","authenticated-orcid":false,"given":"Xiankun","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Geography and Remote Sensing, Guangzhou University, Guangzhou 510006, China"},{"name":"Centre for Remote Sensing Big Data Intelligence Applications, Guangzhou 
University, Guangzhou 510006, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9985-0165","authenticated-orcid":false,"given":"Zhenyu","family":"Yu","sequence":"additional","affiliation":[{"name":"School of Geography and Remote Sensing, Guangzhou University, Guangzhou 510006, China"},{"name":"Centre for Remote Sensing Big Data Intelligence Applications, Guangzhou University, Guangzhou 510006, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zixuan","family":"Tan","sequence":"additional","affiliation":[{"name":"School of Geography and Remote Sensing, Guangzhou University, Guangzhou 510006, China"},{"name":"Centre for Remote Sensing Big Data Intelligence Applications, Guangzhou University, Guangzhou 510006, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,10,28]]},"reference":[{"key":"ref_1","unstructured":"(2022, March 30). What Is the Difference between Land Cover and Land Use? Available online: https:\/\/oceanservice.noaa.gov\/facts\/lclu.html."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1016\/S0034-4257(01)00295-4","article-title":"Status of Land Cover Classification Accuracy Assessment","volume":"80","author":"Foody","year":"2002","journal-title":"Remote Sens. Environ."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1080\/014311600210876","article-title":"Toward Remote Sensing Methods for Land Cover Dynamic Monitoring: Application to Morocco","volume":"21","author":"Sobrino","year":"2000","journal-title":"Int. J. Remote Sens."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1425","DOI":"10.1080\/01431169608948714","article-title":"The Use of the Normalized Difference Water Index (NDWI) in the Delineation of Open Water Features","volume":"17","author":"McFeeters","year":"1996","journal-title":"Int. J. 
Remote Sens."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1016\/j.rse.2006.02.010","article-title":"Use of Impervious Surface in Urban Land-Use Classification","volume":"102","author":"Lu","year":"2006","journal-title":"Remote Sens. Environ."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Talukdar, S., Singha, P., Mahato, S., Pal, S., Liou, Y.-A., and Rahman, A. (2020). Land-Use Land-Cover Classification by Machine Learning Classifiers for Satellite Observations\u2014A Review. Remote Sens., 12.","DOI":"10.3390\/rs12071135"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1016\/j.patrec.2005.08.011","article-title":"Random Forests for Land Cover Classification","volume":"27","author":"Gislason","year":"2006","journal-title":"Pattern Recognit. Lett."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1016\/S0034-4257(00)00142-5","article-title":"Multiple Criteria for Evaluating Machine Learning Algorithms for Land Cover Classification from Satellite Data","volume":"74","author":"DeFries","year":"2000","journal-title":"Remote Sens. Environ."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"166","DOI":"10.1016\/j.isprsjprs.2019.04.015","article-title":"Deep Learning in Remote Sensing Applications: A Meta-Analysis and Review","volume":"152","author":"Ma","year":"2019","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"ImageNet Classification with Deep Convolutional Neural Networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. ACM"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, June 7\u201312). Fully Convolutional Networks for Semantic Segmentation. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1016\/j.neucom.2019.02.003","article-title":"Survey on Semantic Segmentation Using Deep Learning Techniques","volume":"338","author":"Lateef","year":"2019","journal-title":"Neurocomputing"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, October 5\u20139). U-Net: Convolutional Networks for Biomedical Image Segmentation. Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017, July 21\u201326). Pyramid Scene Parsing Network. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","article-title":"SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation","volume":"39","author":"Badrinarayanan","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs","volume":"40","author":"Chen","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1007\/s41095-022-0271-y","article-title":"Attention Mechanisms in Computer Vision: A Survey","volume":"8","author":"Guo","year":"2022","journal-title":"Comput. Vis. 
Media"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"270","DOI":"10.1007\/978-3-030-01240-3_17","article-title":"PSANet: Point-Wise Spatial Attention Network for Scene Parsing","volume":"Volume 11213","author":"Zhao","year":"2018","journal-title":"Proceedings of the European Conference on Computer Vision (ECCV 2018)"},{"key":"ref_19","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2021, May 3\u20137). An Image Is Worth 16 \u00d7 16 Words: Transformers for Image Recognition at Scale. Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Virtual Event."},{"key":"ref_20","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, December 4\u20139). Attention Is All You Need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, October 10\u201317). Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., and Torr, P.H.S. (2021, June 19\u201325). Rethinking Semantic Segmentation From a Sequence-to-Sequence Perspective With Transformers. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, Virtual.","DOI":"10.1109\/CVPR46437.2021.00681"},{"key":"ref_23","unstructured":"Xie, E., Wang, W., Yu, Z., Anandkumar, A., Alvarez, J.M., and Luo, P. (2021, December 6\u201314). 
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, Virtual."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Li, X., He, H., Li, X., Li, D., Cheng, G., Shi, J., Weng, L., Tong, Y., and Lin, Z. (2021, June 19\u201325). PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, Virtual.","DOI":"10.1109\/CVPR46437.2021.00420"},{"key":"ref_25","first-page":"17864","article-title":"Per-Pixel Classification Is Not All You Need for Semantic Segmentation","volume":"Volume 34","author":"Cheng","year":"2021","journal-title":"Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1016\/j.apgeog.2006.09.004","article-title":"Remote Sensing and GIS for Mapping and Monitoring Land Cover and Land-Use Changes in the Northwestern Coastal Zone of Egypt","volume":"27","author":"Shalaby","year":"2007","journal-title":"Appl. Geogr."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"935","DOI":"10.1080\/014311698215801","article-title":"Remote Sensing Techniques for Mangrove Mapping","volume":"19","author":"Green","year":"1998","journal-title":"Int. J. Remote Sens."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"322","DOI":"10.1016\/j.rse.2005.05.008","article-title":"Decision Tree Regression for Soft Classification of Remote Sensing Data","volume":"97","author":"Xu","year":"2005","journal-title":"Remote Sens. 
Environ."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"271","DOI":"10.1016\/S0169-2046(01)00160-8","article-title":"Predicting Land-Cover and Land-Use Change in the Urban Fringe: A Case in Morelia City, Mexico","volume":"55","author":"Bocco","year":"2001","journal-title":"Landsc. Urban Plan."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"2193","DOI":"10.1080\/01431160110078467","article-title":"Unsupervised Classification of Satellite Imagery: Choosing a Good Algorithm","volume":"23","author":"Duda","year":"2002","journal-title":"Int. J. Remote Sens."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"5628115","DOI":"10.1109\/TGRS.2022.3197334","article-title":"Hidden Path Selection Network for Semantic Segmentation of Remote Sensing Images","volume":"60","author":"Yang","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Bell, S., Zitnick, C.L., Bala, K., and Girshick, R. (2016, June 27\u201330). Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.314"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, July 21\u201326). Feature Pyramid Networks for Object Detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Huang, S., Lu, Z., Cheng, R., and He, C. (2021, October 10\u201317). FaPN: Feature-Aligned Pyramid Network for Dense Image Prediction. 
Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00090"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1007\/978-3-030-58539-6_11","article-title":"Object-Contextual Representations for Semantic Segmentation","volume":"Volume 12351","author":"Yuan","year":"2020","journal-title":"Proceedings of the European Conference on Computer Vision (ECCV 2020)"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1016\/j.patcog.2016.07.001","article-title":"Towards Better Exploiting Convolutional Neural Networks for Remote Sensing Scene Classification","volume":"61","author":"Nogueira","year":"2017","journal-title":"Pattern Recognit."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"3051","DOI":"10.1007\/s11263-021-01515-2","article-title":"BiSeNet V2: Bilateral Network with Guided Aggregation for Real-Time Semantic Segmentation","volume":"129","author":"Yu","year":"2021","journal-title":"Int. J. Comput. Vision"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019, June 16\u201320). Dual Attention Network for Scene Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zhang, F., Chen, Y., Li, Z., Hong, Z., Liu, J., Ma, F., Han, J., and Ding, E. (2019, October 27\u2013November 2). ACFNet: Attentional Class Feature Network for Semantic Segmentation. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00690"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020). End-to-End Object Detection with Transformers. 
European Conference on Computer Vision (ECCV 2020), Springer International Publishing.","DOI":"10.1007\/978-3-030-58548-8"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., and Liu, W. (2019, October 27\u2013November 2). CCNet: Criss-Cross Attention for Semantic Segmentation. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00069"},{"key":"ref_42","unstructured":"Zhang, Y., and Yang, Q. (2021). A Survey on Multi-Task Learning. IEEE Trans. Knowl. Data Eng."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Goyal, P., Girshick, R.B., He, K., and Doll\u00e1r, P. (2017, October 22\u201329). Focal Loss for Dense Object Detection. Proceedings of the IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Milletari, F., Navab, N., and Ahmadi, S.-A. (2016, October 25\u201328). V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA.","DOI":"10.1109\/3DV.2016.79"},{"key":"ref_45","unstructured":"(2022, January 28). 2D Semantic Labeling Contest-Potsdam. Available online: https:\/\/www.isprs.org\/education\/benchmarks\/UrbanSemLab\/2d-sem-label-potsdam.aspx."},{"key":"ref_46","unstructured":"Guo, M.-H., Lu, C., Liu, Z.-N., Cheng, M.-M., and Hu, S. (2022). Visual Attention Network. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Mou, L., Hua, Y., and Zhu, X.X. (2019, June 16\u201320). A Relation-Augmented Fully Convolutional Network for Semantic Segmentation in Aerial Scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01270"},{"key":"ref_48","unstructured":"Li, G., and Kim, J. 
(2019, September 9\u201312). DABNet: Depth-Wise Asymmetric Bottleneck for Real-Time Semantic Segmentation. Proceedings of the 30th British Machine Vision Conference 2019, BMVC 2019, Cardiff, UK."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1016\/j.isprsjprs.2021.09.005","article-title":"ABCNet: Attentive Bilateral Contextual Network for Efficient Semantic Segmentation of Fine-Resolution Remotely Sensed Imagery","volume":"181","author":"Li","year":"2021","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Xiao, T., Liu, Y., Zhou, B., Jiang, Y., and Sun, J. (2018, September 8\u201314). Unified Perceptual Parsing for Scene Understanding. Proceedings of the European Conference on Computer Vision (ECCV 2018), Munich, Germany.","DOI":"10.1007\/978-3-030-01228-1_26"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Xia, G.-S., Ding, J., Qian, M., Xue, N., Han, J., Bai, X., Yang, M.Y., Li, S., Belongie, S., and Luo, J. (2021, October 10\u201317). LUAI Challenge 2021 on Learning To Understand Aerial Images. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCVW54120.2021.00090"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/21\/5415\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:04:57Z","timestamp":1760144697000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/21\/5415"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,28]]},"references-count":51,"journal-issue":{"issue":"21","published-online":{"date-parts":[[2022,11]]}},"alternative-id":["rs14215415"],"URL":"https:\/\/doi.org\/10.3390\/rs14215415","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,10,28]]}}}