{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T13:55:50Z","timestamp":1770990950688,"version":"3.50.1"},"reference-count":49,"publisher":"MDPI AG","issue":"13","license":[{"start":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T00:00:00Z","timestamp":1688083200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Key Laboratory of Land Satellite Remote Sensing Application, Ministry of Natural Resources of the People\u2019s Republic of China","award":["KLSMNR-G202206"],"award-info":[{"award-number":["KLSMNR-G202206"]}]},{"name":"Key Laboratory of Land Satellite Remote Sensing Application, Ministry of Natural Resources of the People\u2019s Republic of China","award":["42001362"],"award-info":[{"award-number":["42001362"]}]},{"name":"Key Laboratory of Land Satellite Remote Sensing Application, Ministry of Natural Resources of the People\u2019s Republic of China","award":["1523142301011"],"award-info":[{"award-number":["1523142301011"]}]},{"name":"National Natural Science Foundation of China","award":["KLSMNR-G202206"],"award-info":[{"award-number":["KLSMNR-G202206"]}]},{"name":"National Natural Science Foundation of China","award":["42001362"],"award-info":[{"award-number":["42001362"]}]},{"name":"National Natural Science Foundation of China","award":["1523142301011"],"award-info":[{"award-number":["1523142301011"]}]},{"name":"Startup Foundation for Introducing Talent of NUIST","award":["KLSMNR-G202206"],"award-info":[{"award-number":["KLSMNR-G202206"]}]},{"name":"Startup Foundation for Introducing Talent of NUIST","award":["42001362"],"award-info":[{"award-number":["42001362"]}]},{"name":"Startup Foundation for Introducing Talent of 
NUIST","award":["1523142301011"],"award-info":[{"award-number":["1523142301011"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Convolutional neural networks (CNNs) have achieved great progress in the classification of surface objects with hyperspectral data, but due to the limitations of convolutional operations, CNNs cannot effectively interact with contextual information. Transformer succeeds in solving this problem, and thus has been widely used to classify hyperspectral surface objects in recent years. However, the huge computational load of Transformer poses a challenge in hyperspectral semantic segmentation tasks. In addition, the use of single Transformer discards the local correlation, making it ineffective for remote sensing tasks with small datasets. Therefore, we propose a new Transformer layered architecture that combines Transformer with CNN, adopts a feature dimensionality reduction module and a Transformer-style CNN module to extract shallow features and construct texture constraints, and employs the original Transformer Encoder to extract deep features. Furthermore, we also designed a simple Decoder to process shallow spatial detail information and deep semantic features separately. 
Experimental results on three publicly available hyperspectral datasets show that the proposed method has significant advantages over traditional CNN- and Transformer-based models.<\/jats:p>","DOI":"10.3390\/rs15133366","type":"journal-article","created":{"date-parts":[[2023,7,3]],"date-time":"2023-07-03T00:49:27Z","timestamp":1688345367000},"page":"3366","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["Shallow-Guided Transformer for Semantic Segmentation of Hyperspectral Remote Sensing Imagery"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5723-4789","authenticated-orcid":false,"given":"Yuhan","family":"Chen","sequence":"first","affiliation":[{"name":"School of Remote Sensing and Geomatics Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5443-5910","authenticated-orcid":false,"given":"Pengyuan","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Geographical Sciences, Nanjing University of Information Science and Technology, Nanjing 210044, China"}]},{"given":"Jiechen","family":"Zhao","sequence":"additional","affiliation":[{"name":"Qingdao Innovation and Development Base (Centre), Harbin Engineering University, Qingdao 266000, China"}]},{"given":"Kaijian","family":"Huang","sequence":"additional","affiliation":[{"name":"School of Electronic Information and Electrical Engineering, Huizhou University, Huizhou 516007, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6693-957X","authenticated-orcid":false,"given":"Qingyun","family":"Yan","sequence":"additional","affiliation":[{"name":"School of Remote Sensing and Geomatics Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, 
China"}]}],"member":"1968","published-online":{"date-parts":[[2023,6,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Cai, Y., Lin, J., Hu, X., Wang, H., Yuan, X., Zhang, Y., Timofte, R., and Van Gool, L. (2022, January 18\u201324). Mask-guided spectral-wise transformer for efficient hyperspectral image reconstruction. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01698"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"5543017","DOI":"10.1109\/TGRS.2022.3218795","article-title":"HyperNet: Self-supervised hyperspectral spatial\u2013spectral feature understanding network for hyperspectral change detection","volume":"60","author":"Hu","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Tian, S., Lu, Q., and Wei, L. (2022). Multiscale Superpixel-Based Fine Classification of Crops in the UAV-Based Hyperspectral Imagery. Remote Sens., 14.","DOI":"10.3390\/rs14143292"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Zhou, L., Zhang, C., and Wu, M. (2018, January 18\u201322). D-LinkNet: LinkNet with pretrained encoder and dilated convolution for high resolution satellite imagery road extraction. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPRW.2018.00034"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1016\/j.isprsjprs.2021.09.005","article-title":"ABCNet: Attentive bilateral contextual network for efficient semantic segmentation of Fine-Resolution remotely sensed imagery","volume":"181","author":"Li","year":"2021","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Ding, X., Zhang, X., Han, J., and Ding, G. (2022, January 18\u201324). 
Scaling up your kernels to 31 \u00d7 31: Revisiting large kernel design in cnns. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01166"},{"key":"ref_7","first-page":"1228002","article-title":"Hyperspectral Remote-Sensing Classification Combining Transformer and Multiscale Residual Mechanisms","volume":"60","author":"Chen","year":"2023","journal-title":"Laser Optoelectron. Prog."},{"key":"ref_8","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16 \u00d7 16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_9","first-page":"5514715","article-title":"Spectral\u2013spatial transformer network for hyperspectral image classification: A factorized architecture search framework","volume":"60","author":"Zhong","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_10","first-page":"5518615","article-title":"SpectralFormer: Rethinking hyperspectral image classification with transformers","volume":"60","author":"Hong","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_11","first-page":"5532513","article-title":"MSTNet: A Multilevel Spectral\u2013Spatial Transformer Network for Hyperspectral Image Classification","volume":"60","author":"Yu","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"5522214","DOI":"10.1109\/TGRS.2022.3221534","article-title":"Spectral\u2013spatial feature tokenization transformer for hyperspectral image classification","volume":"60","author":"Sun","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). 
Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Peng, Y., Ren, J., Wang, J., and Shi, M. (2023). Spectral-Swin Transformer with Spatial Feature Extraction Enhancement for Hyperspectral Image Classification. Remote Sens., 15.","DOI":"10.3390\/rs15102696"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"7570","DOI":"10.1109\/JSTARS.2021.3099118","article-title":"Hyperspectral image classification using a hybrid 3D-2D convolutional neural networks","volume":"14","author":"Ghaderizadeh","year":"2021","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"258619","DOI":"10.1155\/2015\/258619","article-title":"Deep convolutional neural networks for hyperspectral image classification","volume":"2015","author":"Hu","year":"2015","journal-title":"J. Sensors"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"4420","DOI":"10.1109\/TGRS.2018.2818945","article-title":"3-D deep learning approach for remote sensing image classification","volume":"56","author":"Hamida","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Gong, H., Li, Q., Li, C., Dai, H., He, Z., Wang, W., Li, H., Han, F., Tuniyazi, A., and Mu, T. (2021). Multiscale information fusion for hyperspectral image classification based on hybrid 2D-3D CNN. Remote Sens., 13.","DOI":"10.3390\/rs13122268"},{"key":"ref_19","first-page":"6005905","article-title":"Efficient Semantic Segmentation of Hyperspectral Images Using Adaptable Rectangular Convolution","volume":"19","author":"Paoletti","year":"2022","journal-title":"IEEE Geosci. Remote Sens. 
Lett."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1968","DOI":"10.1109\/LGRS.2019.2960528","article-title":"DSSNet: A Simple Dilated Semantic Segmentation Network for Hyperspectral Imagery Classification","volume":"17","author":"Pan","year":"2020","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"11709","DOI":"10.1109\/TCYB.2021.3070577","article-title":"A spectral-spatial-dependent global learning framework for insufficient and imbalanced hyperspectral image classification","volume":"52","author":"Zhu","year":"2021","journal-title":"IEEE Trans. Cybern."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"6517505","DOI":"10.1109\/LGRS.2022.3215200","article-title":"Class-Guided Swin Transformer for Semantic Segmentation of Remote Sensing Imagery","volume":"19","author":"Meng","year":"2022","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1016\/j.isprsjprs.2021.03.016","article-title":"A global context-aware and batch-independent network for road extraction from VHR satellite imagery","volume":"175","author":"Zhu","year":"2021","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1007\/s41095-022-0274-8","article-title":"Pvt v2: Improved baselines with pyramid vision transformer","volume":"8","author":"Wang","year":"2022","journal-title":"Comput. Vis. Media"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Yu, W., Luo, M., Zhou, P., Si, C., Zhou, Y., Wang, X., Feng, J., and Yan, S. (2022, January 18\u201324). Metaformer is actually what you need for vision. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01055"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., and Xie, S. (2022, January 18\u201324). A convnet for the 2020s. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01167"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Chang, S., Wang, P., Lin, M., Wang, F., Zhang, D.J., Jin, R., and Shou, M.Z. (2023, January 18\u201322). Making Vision Transformers Efficient from a Token Sparsification View. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00600"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Ding, M., Xiao, B., Codella, N., Luo, P., Wang, J., and Yuan, L. (2022, January 23\u201327). Davit: Dual attention vision transformers. Proceedings of the Computer Vision\u2013ECCV 2022: 17th European Conference, Tel Aviv, Israel. Proceedings, Part XXIV.","DOI":"10.1007\/978-3-031-20053-3_5"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Pan, X., Ge, C., Lu, R., Song, S., Chen, G., Huang, Z., and Huang, G. (2022, January 18\u201324). On the integration of self-attention and convolution. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00089"},{"key":"ref_30","unstructured":"Xu, J., Sun, X., Zhang, Z., Zhao, G., and Lin, J. (2019). Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Sun, J., Shen, Z., Wang, Y., Bao, H., and Zhou, X. (2021, January 20\u201325). LoFTR: Detector-free local feature matching with transformers. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00881"},{"key":"ref_32","unstructured":"d\u2019Ascoli, S., Touvron, H., Leavitt, M.L., Morcos, A.S., Biroli, G., and Sagun, L. (2021, January 18\u201324). Convit: Improving vision transformers with soft convolutional inductive biases. Proceedings of the International Conference on Machine Learning (PMLR), Virtual."},{"key":"ref_33","unstructured":"Pan, Z., Cai, J., and Zhuang, B. (2022). Fast vision transformers with hilo attention. arXiv."},{"key":"ref_34","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_35","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017). Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"He, X., Chen, Y., and Lin, Z. (2021). Spatial-spectral transformer for hyperspectral image classification. Remote Sens., 13.","DOI":"10.3390\/rs13030498"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Strudel, R., Garcia, R., Laptev, I., and Schmid, C. (2021, January 11\u201317). Segmenter: Transformer for semantic segmentation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00717"},{"key":"ref_38","first-page":"12077","article-title":"SegFormer: Simple and efficient design for semantic segmentation with transformers","volume":"34","author":"Xie","year":"2021","journal-title":"Advances in Neural Information Processing Systems"},{"key":"ref_39","unstructured":"Hendrycks, D., and Gimpel, K. (2016). Gaussian error linear units (gelus). 
arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Xiao, T., Liu, Y., Zhou, B., Jiang, Y., and Sun, J. (2018, January 8\u201314). Unified perceptual parsing for scene understanding. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01228-1_26"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Chaurasia, A., and Culurciello, E. (2017, January 10\u201313). Linknet: Exploiting encoder representations for efficient semantic segmentation. Proceedings of the 2017 IEEE Visual Communications and Image Processing (VCIP), St. Petersburg, FL, USA.","DOI":"10.1109\/VCIP.2017.8305148"},{"key":"ref_42","first-page":"1500305","article-title":"Inland Water Mapping Based on GA-LinkNet from CyGNSS Data","volume":"20","author":"Yan","year":"2022","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_43","unstructured":"Loshchilov, I., and Hutter, F. (2017). Decoupled weight decay regularization. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_45","unstructured":"Martinsson, J., and Mogren, O. (\u20132, January 27). Semantic segmentation of fashion images using feature pyramid networks. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea."},{"key":"ref_46","unstructured":"Chen, L.C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). 
Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_48","first-page":"15475","article-title":"Rest: An efficient transformer for visual recognition","volume":"34","author":"Zhang","year":"2021","journal-title":"Advances in Neural Information Processing Systems"},{"key":"ref_49","first-page":"9355","article-title":"Twins: Revisiting the design of spatial attention in vision transformers","volume":"34","author":"Chu","year":"2021","journal-title":"Advances in Neural Information Processing Systems"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/13\/3366\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:04:24Z","timestamp":1760126664000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/13\/3366"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,30]]},"references-count":49,"journal-issue":{"issue":"13","published-online":{"date-parts":[[2023,7]]}},"alternative-id":["rs15133366"],"URL":"https:\/\/doi.org\/10.3390\/rs15133366","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,30]]}}}