{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T13:55:53Z","timestamp":1770990953235,"version":"3.50.1"},"reference-count":58,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2023,2,22]],"date-time":"2023-02-22T00:00:00Z","timestamp":1677024000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China","award":["62062033"],"award-info":[{"award-number":["62062033"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Hyperspectral image (HSI) classification is a significant foundation for remote sensing image analysis, widely used in biology, aerospace, and other applications. Convolution neural networks (CNNs) and attention mechanisms have shown outstanding ability in HSI classification and have been widely studied in recent years. However, the existing CNN-based and attention mechanism-based methods cannot fully use spatial\u2013spectral information, which is not conducive to further improving HSI classification accuracy. This paper proposes a new spatial\u2013spectral Transformer network with multi-scale convolution (SS-TMNet), which can effectively extract local and global spatial\u2013spectral information. SS-TMNet includes two key modules, i.e., multi-scale 3D convolution projection module (MSCP) and spatial\u2013spectral attention module (SSAM). The MSCP uses multi-scale 3D convolutions with different depths to extract the fused spatial\u2013spectral features. The spatial\u2013spectral attention module includes three branches: height spatial attention, width spatial attention, and spectral attention, which can extract the fusion information of spatial and spectral features. The proposed SS-TMNet was tested on three widely used HSI datasets: Pavia University, IndianPines, and Houston2013. The experimental results show that the proposed SS-TMNet is superior to the existing methods.<\/jats:p>","DOI":"10.3390\/rs15051206","type":"journal-article","created":{"date-parts":[[2023,2,23]],"date-time":"2023-02-23T01:31:06Z","timestamp":1677115866000},"page":"1206","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["SS-TMNet: Spatial\u2013Spectral Transformer Network with Multi-Scale Convolution for Hyperspectral Image Classification"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7269-4484","authenticated-orcid":false,"given":"Xiaohui","family":"Huang","sequence":"first","affiliation":[{"name":"School of Information Engineering, East China Jiaotong University, Nanchang 330013, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yunfei","family":"Zhou","sequence":"additional","affiliation":[{"name":"School of Information Engineering, East China Jiaotong University, Nanchang 330013, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2458-6774","authenticated-orcid":false,"given":"Xiaofei","family":"Yang","sequence":"additional","affiliation":[{"name":"The Department of Computer and Information Science, University of Macau, Macau 519000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xianhong","family":"Zhu","sequence":"additional","affiliation":[{"name":"School of Information Engineering, East China Jiaotong University, Nanchang 330013, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ke","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Shenyang Aerospace University, Shenyang 110136, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,2,22]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1109\/MGRS.2013.2244672","article-title":"Hyperspectral remote sensing data analysis and future challenges","volume":"1","author":"Plaza","year":"2013","journal-title":"IEEE Geosci. Remote Sens. Mag."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1109\/JSTARS.2020.3037070","article-title":"TDSSC: A three-directions spectral\u2013spatial convolution neural network for hyperspectral image change detection","volume":"14","author":"Zhan","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"968","DOI":"10.1109\/JSTARS.2021.3133021","article-title":"Hyperspectral image classification\u2014Traditional to deep models: A survey for future prospects","volume":"15","author":"Ahmad","year":"2021","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1778","DOI":"10.1109\/TGRS.2004.831865","article-title":"Classification of hyperspectral remote sensing images with support vector machines","volume":"42","author":"Melgani","year":"2004","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"2112","DOI":"10.1109\/TGRS.2008.916629","article-title":"Supervised classification of remotely sensed imagery using a modified k-NN technique","volume":"46","author":"Samaniego","year":"2008","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_6","first-page":"4085","article-title":"Semisupervised hyperspectral image segmentation using multinomial logistic regression with active learning","volume":"48","author":"Li","year":"2010","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1109\/TGRS.2004.842478","article-title":"Classification of hyperspectral data from urban areas based on extended morphological profiles","volume":"43","author":"Benediktsson","year":"2005","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_8","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"106771","DOI":"10.1016\/j.knosys.2021.106771","article-title":"Image classification with deep learning in the presence of noisy labels: A survey","volume":"215","author":"Algan","year":"2021","journal-title":"Knowl.-Based Syst."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Touvron, H., Bojanowski, P., Caron, M., Cord, M., El-Nouby, A., Grave, E., Izacard, G., Joulin, A., Synnaeve, G., and Verbeek, J. (2022). Resmlp: Feedforward networks for image classification with data-efficient training. IEEE Trans. Pattern Anal. Mach. Intell.","DOI":"10.1109\/TPAMI.2022.3206148"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"3212","DOI":"10.1109\/TNNLS.2018.2876865","article-title":"Object detection with deep learning: A review","volume":"30","author":"Zhao","year":"2019","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_12","first-page":"3523","article-title":"Image segmentation using deep learning: A survey","volume":"44","author":"Minaee","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"3173","DOI":"10.1109\/TGRS.2018.2794326","article-title":"Hyperspectral image classification with deep feature fusion network","volume":"56","author":"Song","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"6232","DOI":"10.1109\/TGRS.2016.2584107","article-title":"Deep feature extraction and classification of hyperspectral images based on convolutional neural networks","volume":"54","author":"Chen","year":"2016","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"2094","DOI":"10.1109\/JSTARS.2014.2329330","article-title":"Deep learning-based classification of hyperspectral data","volume":"7","author":"Chen","year":"2014","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"3639","DOI":"10.1109\/TGRS.2016.2636241","article-title":"Deep recurrent neural networks for hyperspectral image classification","volume":"55","author":"Mou","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_17","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, January 4\u20139). Attention is All you Need. Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_18","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2021, January 3\u20137). An Image is Worth 16 \u00d7 16 Words: Transformers for Image Recognition at Scale. Proceedings of the 9th International Conference on Learning Representations, Virtual."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1109\/TGRS.2019.2934760","article-title":"HSI-BERT: Hyperspectral image classification using the bidirectional encoder representation from transformers","volume":"58","author":"He","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"104286","DOI":"10.1016\/j.infrared.2022.104286","article-title":"Investigation of the data fusion of spectral and textural data from hyperspectral imaging for the near geographical origin discrimination of wolfberries using 2D-CNN algorithms","volume":"125","author":"Hao","year":"2022","journal-title":"Infrared Phys. Technol."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"He, M., Li, B., and Chen, H. (2017, January 17\u201320). Multi-Scale 3D Deep Convolutional Neural Network for Hyperspectral Image Classification. Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China.","DOI":"10.1109\/ICIP.2017.8297014"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Fang, B., Liu, Y., Zhang, H., and He, J. (2022). Hyperspectral Image Classification Based on 3D Asymmetric Inception Network with Data Fusion Transfer Learning. Remote Sens., 14.","DOI":"10.3390\/rs14071711"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Chang, Y.L., Tan, T.H., Lee, W.H., Chang, L., Chen, Y.N., Fan, K.C., and Alkhaleefah, M. (2022). Consolidated Convolutional Neural Network for Hyperspectral Image Classification. Remote Sens., 14.","DOI":"10.3390\/rs14071571"},{"key":"ref_24","unstructured":"Zhou, D., Kang, B., Jin, X., Yang, L., Lian, X., Jiang, Z., Hou, Q., and Feng, J. (2021). Deepvit: Towards Deeper Vision Transformer. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"He, X., Chen, Y., and Lin, Z. (2021). Spatial-Spectral Transformer for Hyperspectral Image Classification. Remote Sens., 13.","DOI":"10.3390\/rs13030498"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Yu, D., Li, Q., Wang, X., Zhang, Z., Qian, Y., and Xu, C. (2023, January 3\u20137). DSTrans: Dual-Stream Transformer for Hyperspectral Image Restoration. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV56688.2023.00373"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Li, J., Xing, H., Ao, Z., Wang, H., Liu, W., and Zhang, A. (2023). Convolution-Transformer Adaptive Fusion Network for Hyperspectral Image Classification. Appl. Sci., 13.","DOI":"10.3390\/app13010492"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"5522214","DOI":"10.1109\/TGRS.2022.3221534","article-title":"Spectral-Spatial Feature Tokenization Transformer for Hyperspectral Image Classification","volume":"60","author":"Sun","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_29","unstructured":"Wang, Y., Jiang, S., Xu, M., Zhang, S., and Jia, S. (2022, January 23\u201329). A Center-Masked Convolutional Transformer for Hyperspectral Image Classification. Proceedings of the 31st International Joint Conference on Artificial Intelligence, Vienna, Austria."},{"key":"ref_30","first-page":"5516712","article-title":"Marginalized graph self-representation for unsupervised hyperspectral band selection","volume":"60","author":"Zhang","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1016\/j.neucom.2022.06.031","article-title":"Multi-feature fusion: Graph neural network and CNN combining for hyperspectral image classification","volume":"501","author":"Ding","year":"2022","journal-title":"Neurocomputing"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"119508","DOI":"10.1016\/j.eswa.2023.119508","article-title":"Multireceptive field: An adaptive path aggregation graph neural framework for hyperspectral image classification","volume":"217","author":"Zhang","year":"2023","journal-title":"Expert Syst. Appl."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"8500","DOI":"10.1109\/TCSVT.2022.3196679","article-title":"Spectral\u2013Spatial Feature Extraction With Dual Graph Autoencoder for Hyperspectral Image Clustering","volume":"32","author":"Zhang","year":"2022","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1016\/j.ins.2022.04.006","article-title":"AF2GNN: Graph convolution with adaptive filters and aggregator fusion for hyperspectral image classification","volume":"602","author":"Ding","year":"2022","journal-title":"Inf. Sci."},{"key":"ref_35","first-page":"5536716","article-title":"Unsupervised self-correlated learning smoothy enhanced locality preserving graph convolution embedding clustering for hyperspectral images","volume":"60","author":"Ding","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_36","first-page":"5518615","article-title":"SpectralFormer: Rethinking hyperspectral image classification with transformers","volume":"60","author":"Hong","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"6015005","DOI":"10.1109\/LGRS.2022.3217775","article-title":"Two-Branch Pure Transformer for Hyperspectral Image Classification","volume":"19","author":"He","year":"2022","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Feng, J., Luo, X., Li, S., Wang, Q., and Yin, J. (2022, January 17\u201322). Spectral Transformer with Dynamic Spatial Sampling and Gaussian Positional Embedding for Hyperspectral Image Classification. Proceedings of the International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia.","DOI":"10.1109\/IGARSS46834.2022.9883118"},{"key":"ref_39","first-page":"5536016","article-title":"Self-supervised locality preserving low-pass graph convolutional embedding for large-scale hyperspectral image clustering","volume":"60","author":"Ding","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_40","first-page":"2491","article-title":"SimpleMKL","volume":"9","author":"Rakotomamonjy","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"5975","DOI":"10.1080\/01431161.2010.512425","article-title":"Extended profiles with morphological attribute filters for the analysis of hyperspectral data","volume":"31","author":"Waske","year":"2010","journal-title":"Int. J. Remote Sens."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"4816","DOI":"10.1109\/TGRS.2012.2230268","article-title":"Generalized Composite Kernel Framework for Hyperspectral Image Classification","volume":"51","author":"Li","year":"2013","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"862","DOI":"10.1109\/TGRS.2008.2005729","article-title":"Classification of Hyperspectral Images With Regularized Linear Discriminant Analysis","volume":"47","author":"Bandos","year":"2009","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"4865","DOI":"10.1109\/TGRS.2011.2153861","article-title":"Hyperspectral Image Classification With Independent Component Discriminant Analysis","volume":"49","author":"Villa","year":"2011","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"447","DOI":"10.1109\/LGRS.2011.2172185","article-title":"Linear Versus Nonlinear PCA for the Classification of Hyperspectral Data Based on the Extended Morphological Profiles","volume":"9","author":"Licciardi","year":"2011","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1109\/TGRS.2014.2321405","article-title":"Automatic spatial\u2013spectral feature selection for hyperspectral image via discriminative sparse multimodal learning","volume":"53","author":"Zhang","year":"2014","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_47","first-page":"1","article-title":"Hyperspectral image classification based on mathematical morphology and tensor decomposition","volume":"4","author":"Jouni","year":"2020","journal-title":"Math.-Morphol.-Theory Appl."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Luo, F., Huang, H., Duan, Y., Liu, J., and Liao, Y. (2017). Local geometric structure feature for dimensionality reduction of hyperspectral imagery. Remote Sens., 9.","DOI":"10.3390\/rs9080790"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"258619","DOI":"10.1155\/2015\/258619","article-title":"Deep Convolutional Neural Networks for Hyperspectral Image Classification","volume":"2015","author":"Hu","year":"2015","journal-title":"J. Sens."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Graham, B., El-Nouby, A., Touvron, H., Stock, P., Joulin, A., J\u00e9gou, H., and Douze, M. (2021, January 11\u201317). Levit: A Vision Transformer in Convnet\u2019s Clothing for Faster Inference. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.01204"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Chen, C.F.R., Fan, Q., and Panda, R. (2021, January 11\u201317). CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00041"},{"key":"ref_52","unstructured":"Simonyan, K., and Zisserman, A. (2015, January 7\u20139). Very Deep Convolutional Networks for Large-Scale Image Recognition. Proceedings of the International Conference on Learning Representations, San Diego, CA, USA."},{"key":"ref_53","first-page":"5528715","article-title":"Hyperspectral Image Transformer Classification Networks","volume":"60","author":"Yang","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_54","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., and Antiga, L. (2019, January 8\u201314). PyTorch: An Imperative Style, High-Performance Deep Learning Library. Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, Vancouver, BC, Canada."},{"key":"ref_55","unstructured":"Sharma, V., Diba, A., Tuytelaars, T., and Van Gool, L. (2016). Hyperspectral CNN for Image Classification & Band Selection, with Application to Face Recognition, ESAT. Technical Report KUL\/ESAT\/PSI\/1604, KU Leuven."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1109\/LGRS.2019.2918719","article-title":"HybridSN: Exploring 3-D\u20132-D CNN feature hierarchy for hyperspectral image classification","volume":"17","author":"Roy","year":"2019","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Heo, B., Yun, S., Han, D., Chun, S., Choe, J., and Oh, S.J. (2021, January 11\u201317). Rethinking spatial dimensions of vision transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.01172"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"1328","DOI":"10.1109\/TPAMI.2022.3145427","article-title":"Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition","volume":"45","author":"Hou","year":"2023","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/5\/1206\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:39:17Z","timestamp":1760121557000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/5\/1206"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,22]]},"references-count":58,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["rs15051206"],"URL":"https:\/\/doi.org\/10.3390\/rs15051206","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,22]]}}}