{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T04:27:26Z","timestamp":1774067246369,"version":"3.50.1"},"reference-count":33,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2024,4,25]],"date-time":"2024-04-25T00:00:00Z","timestamp":1714003200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Postgraduate Research &amp; Practice Innovation Program of Jiangsu Province","award":["KYCX23_0363"],"award-info":[{"award-number":["KYCX23_0363"]}]},{"name":"Postgraduate Research &amp; Practice Innovation Program of Jiangsu Province","award":["ZY202224"],"award-info":[{"award-number":["ZY202224"]}]},{"name":"Interdisciplinary Innovation Fund for Doctoral Students of Nanjing University of Aeronautics and Astronautics","award":["KYCX23_0363"],"award-info":[{"award-number":["KYCX23_0363"]}]},{"name":"Interdisciplinary Innovation Fund for Doctoral Students of Nanjing University of Aeronautics and Astronautics","award":["ZY202224"],"award-info":[{"award-number":["ZY202224"]}]},{"name":"Key Research and Development Project of Qixia 751 District in 2022","award":["KYCX23_0363"],"award-info":[{"award-number":["KYCX23_0363"]}]},{"name":"Key Research and Development Project of Qixia 751 District in 2022","award":["ZY202224"],"award-info":[{"award-number":["ZY202224"]}]},{"name":"Jiangsu Provincial Key Laboratory of Culture and Tourism for Non-destructive Testing and Safety Traceability of Cultural Relics","award":["KYCX23_0363"],"award-info":[{"award-number":["KYCX23_0363"]}]},{"name":"Jiangsu Provincial Key Laboratory of Culture and Tourism for Non-destructive Testing and Safety Traceability of Cultural Relics","award":["ZY202224"],"award-info":[{"award-number":["ZY202224"]}]},{"name":"Key Laboratory of Non-Destructive Testing and Monitoring Technology for High-Speed Transport Facilities of the Ministry of Industry and Information Technology","award":["KYCX23_0363"],"award-info":[{"award-number":["KYCX23_0363"]}]},{"name":"Key Laboratory of Non-Destructive Testing and Monitoring Technology for High-Speed Transport Facilities of the Ministry of Industry and Information Technology","award":["ZY202224"],"award-info":[{"award-number":["ZY202224"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Pixel-level classification of very-high-resolution images is a crucial yet challenging task in remote sensing. While transformers have demonstrated effectiveness in capturing dependencies, their tendency to partition images into patches may restrict their applicability to highly detailed remote sensing images. To extract latent contextual semantic information from high-resolution remote sensing images, we propose a gaze\u2013saccade transformer (GSV-Trans) with visual perceptual attention. GSV-Trans incorporates a visual perceptual attention (VPA) mechanism that dynamically allocates computational resources based on the semantic complexity of the image. The VPA mechanism includes both gaze attention and eye movement attention, enabling the model to focus on the most critical parts of the image and acquire competitive semantic information. Additionally, to capture contextual semantic information across different levels in the image, we designed an inter-layer short-term visual memory module with bidirectional affinity propagation to guide attention allocation. Furthermore, we introduced a dual-branch pseudo-label module (DBPL) that imposes pixel-level and category-level semantic constraints on both gaze and saccade branches. DBPL encourages the model to extract domain-invariant features and align semantic information across different domains in the feature space. Extensive experiments on multiple pixel-level classification benchmarks confirm the effectiveness and superiority of our method over the state of the art.<\/jats:p>","DOI":"10.3390\/rs16091514","type":"journal-article","created":{"date-parts":[[2024,4,25]],"date-time":"2024-04-25T05:26:13Z","timestamp":1714022773000},"page":"1514","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["A Bio-Inspired Visual Perception Transformer for Cross-Domain Semantic Segmentation of High-Resolution Remote Sensing Images"],"prefix":"10.3390","volume":"16","author":[{"given":"Xinyao","family":"Wang","sequence":"first","affiliation":[{"name":"College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China"}]},{"given":"Haitao","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China"}]},{"given":"Yuqian","family":"Jing","sequence":"additional","affiliation":[{"name":"College of Electronic Information and Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China"}]},{"given":"Xianming","family":"Yang","sequence":"additional","affiliation":[{"name":"China Greatwall Technology Group Co., Ltd., Shenzhen 518052, China"}]},{"given":"Jianbo","family":"Chu","sequence":"additional","affiliation":[{"name":"College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,4,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"834","DOI":"10.1109\/TPAMI.2017.2699184","article-title":"DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs","volume":"40","author":"Chen","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_2","unstructured":"Chen, L.C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1016\/j.neucom.2021.05.001","article-title":"Global context-aware multi-scale features aggregative network for salient object detection","volume":"455","author":"Ullah","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Lin, C.-Y., Chiu, Y.-C., Ng, H.-F., Shih, T.K., and Lin, K.-H. (2020). Global-and-Local Context Network for Semantic Segmentation of Street View Images. Sensors, 20.","DOI":"10.3390\/s20102907"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"108469","DOI":"10.1016\/j.knosys.2022.108469","article-title":"Combining deep learning and ontology reasoning for remote sensing image semantic segmentation","volume":"243","author":"Li","year":"2022","journal-title":"Knowl.-Based Syst."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"7871","DOI":"10.1109\/TGRS.2020.3034123","article-title":"AFNet: Adaptive Fusion Network for Remote Sensing Image Semantic Segmentation","volume":"59","author":"Liu","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1038\/s41598-022-27358-6","article-title":"Combining convolutional neural networks and self-attention for fundus diseases identification","volume":"13","author":"Wang","year":"2023","journal-title":"Sci. Rep."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., and Torr, P.H. (2021, January 20\u201325). Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00681"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"10990","DOI":"10.1109\/JSTARS.2021.3119654","article-title":"STransFuse: Fusing SWIN Transformer and Convolutional Neural Network for Remote Sensing Image Semantic Segmentation","volume":"14","author":"Gao","year":"2021","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"4408715","DOI":"10.1109\/TGRS.2022.3144165","article-title":"Swin Transformer Embedding Unet for Remote Sensing Image Semantic Segmentation","volume":"60","author":"He","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1016\/j.isprsjprs.2021.02.009","article-title":"Learning deep semantic segmentation network under multiple weakly-supervised constraints for cross-domain remote sensing image semantic segmentation","volume":"175","author":"Li","year":"2021","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_12","first-page":"5605116","article-title":"Enhancing Multiscale Representations With Transformer for Remote Sensing Image Semantic Segmentation","volume":"61","author":"Xiao","year":"2023","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_13","unstructured":"Wei, X., and Zhou, X. (2023). International Conference on Neural Information Processing, Springer Nature Singapore."},{"key":"ref_14","first-page":"5900314","article-title":"CTMFNet: CNN and Transformer Multiscale Fusion Network of Remote Sensing Urban Scene Imagery","volume":"61","author":"Song","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"5628313","DOI":"10.1109\/TGRS.2022.3198972","article-title":"Domain Adaptation for Remote Sensing Image Semantic Segmentation: An Integrated Approach of Contrastive Learning and Adversarial Learning","volume":"60","author":"Bai","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"5608416","DOI":"10.1109\/TGRS.2023.3271776","article-title":"Category-Level Assignment for Cross-Domain Semantic Segmentation in Remote Sensing Images","volume":"61","author":"Ni","year":"2023","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"5284","DOI":"10.1109\/JSTARS.2023.3280365","article-title":"Swin-Conv-Dspp and Global Local Transformer for Remote Sensing Image Semantic Segmentation","volume":"16","author":"Mo","year":"2023","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"9174","DOI":"10.1109\/JSTARS.2022.3214889","article-title":"High-Resolution Remote Sensing Image Semantic Segmentation via Multiscale Context and Linear Self-Attention","volume":"15","author":"Yin","year":"2022","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_19","first-page":"5998","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_20","first-page":"5621714","article-title":"Adaptive Context Transformer for Semisupervised Remote Sensing Image Segmentation","volume":"61","author":"Li","year":"2023","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1200","DOI":"10.1109\/JAS.2022.105686","article-title":"SwinFusion: Cross-domain Long-range Learning for General Image Fusion via Swin Transformer","volume":"9","author":"Ma","year":"2022","journal-title":"IEEE\/CAA J. Autom. Sin."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1167\/jov.20.12.2","article-title":"A review of interactions between peripheral and foveal vision","volume":"20","author":"Stewart","year":"2020","journal-title":"J. Vis."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1016\/j.bandc.2008.08.016","article-title":"Neurophysiology and neuroanatomy of reflexive and volitional saccades: Evidence from studies of humans","volume":"68","author":"McDowell","year":"2008","journal-title":"Brain Cogn."},{"key":"ref_24","unstructured":"Jonnalagadda, A., Wang, W.Y., Manjunath, B.S., and Eckstein, M.P. (2021). Foveater: Foveated transformer for image classification. arXiv."},{"key":"ref_25","unstructured":"Shi, Y., Sun, M., Wang, Y., Wang, R., Sun, H., and Chen, Z. (2023). EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention. arXiv."},{"key":"ref_26","unstructured":"Shi, D. (2023). TransNeXt: Robust Foveal Visual Perception for Vision Transformers. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1038\/scientificamerican0661-72","article-title":"Stabilized Images on the Retina","volume":"204","author":"Pritchard","year":"1961","journal-title":"Sci. Am."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"3558","DOI":"10.1109\/TGRS.2019.2958123","article-title":"Triplet adversarial domain adaptation for pixel-level classification of VHR remote sensing images","volume":"58","author":"Yan","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Saito, K., Watanabe, K., Ushiku, Y., and Harada, T. (2018, January 18\u201322). Maximum classifier discrepancy for unsupervised domain adaptation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00392"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Zhao, Y., Guo, P., Sun, Z., Chen, X., and Gao, H. (2023). ResiDualGAN: Resize-Residual DualGAN for Cross-Domain Remote Sensing Images Semantic Segmentation. Remote Sens., 15.","DOI":"10.3390\/rs15051428"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Hoyer, L., Dai, D., and Gool, L.V. (2022, January 19\u201324). Daformer: Improving network architectures and training strategies for domain-adaptive semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00969"},{"key":"ref_32","unstructured":"Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., and Wang, M. (2022). European Conference on Computer Vision, Springer Nature Switzerland."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1126","DOI":"10.1007\/s11263-023-01894-8","article-title":"SegViT v2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers","volume":"132","author":"Zhang","year":"2024","journal-title":"Int. J. Comput. Vis."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/9\/1514\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T14:33:48Z","timestamp":1760106828000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/9\/1514"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,25]]},"references-count":33,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2024,5]]}},"alternative-id":["rs16091514"],"URL":"https:\/\/doi.org\/10.3390\/rs16091514","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,25]]}}}