{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T16:08:37Z","timestamp":1774022917534,"version":"3.50.1"},"reference-count":48,"publisher":"MDPI AG","issue":"19","license":[{"start":{"date-parts":[[2022,9,21]],"date-time":"2022-09-21T00:00:00Z","timestamp":1663718400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Youth Foundation for Defence Science and Technology Excellence","award":["2017-JCJQ-ZQ-034"],"award-info":[{"award-number":["2017-JCJQ-ZQ-034"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Hyperspectral images (HSIs) contain spatially structured information and pixel-level sequential spectral attributes. The continuous spectral features contain hundreds of wavelength bands and the differences between spectra are essential for achieving fine-grained classification. Due to the limited receptive field of backbone networks, convolutional neural networks (CNNs)-based HSI classification methods show limitations in modeling spectral-wise long-range dependencies with fixed kernel size and a limited number of layers. Recently, the self-attention mechanism of transformer framework is introduced to compensate for the limitations of CNNs and to mine the long-term dependencies of spectral signatures. Therefore, many joint CNN and Transformer architectures for HSI classification have been proposed to obtain the merits of both networks. However, these architectures make it difficult to capture spatial\u2013spectral correlation and CNNs distort the continuous nature of the spectral signature because of the over-focus on spatial information, which means that the transformer can easily encounter bottlenecks in modeling spectral-wise similarity and long-range dependencies. 
To address this problem, we propose a neighborhood enhancement hybrid transformer (NEHT) network. In particular, a simple 2D convolution module is adopted to achieve dimensionality reduction while minimizing the distortion of the original spectral distribution caused by stacked CNNs. Then, we extract group-wise spatial\u2013spectral features in a parallel design to enhance the representation capability of each token. Furthermore, a feature fusion strategy is introduced to amplify the subtle discrepancies between spectra. Finally, the self-attention of the transformer is employed to mine the long-term dependencies between the enhanced feature sequences. Extensive experiments are performed on three well-known datasets, and the proposed NEHT network shows superiority over state-of-the-art (SOTA) methods. Specifically, our proposed method outperforms the SOTA method by 0.46%, 1.05% and 0.75% on average in the overall accuracy, average accuracy and kappa coefficient metrics.<\/jats:p>","DOI":"10.3390\/rs14194732","type":"journal-article","created":{"date-parts":[[2022,9,22]],"date-time":"2022-09-22T23:07:55Z","timestamp":1663888075000},"page":"4732","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Hyperspectral Image Classification via Spectral Pooling and Hybrid Transformer"],"prefix":"10.3390","volume":"14","author":[{"given":"Chen","family":"Ma","sequence":"first","affiliation":[{"name":"The School of Astronautics, Harbin Institute of Technology, Harbin 150080, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5694-505X","authenticated-orcid":false,"given":"Junjun","family":"Jiang","sequence":"additional","affiliation":[{"name":"The School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150080, China"}]},{"given":"Huayi","family":"Li","sequence":"additional","affiliation":[{"name":"The School of Astronautics, Harbin Institute of Technology, Harbin 150080, 
China"}]},{"given":"Xiaoguang","family":"Mei","sequence":"additional","affiliation":[{"name":"The Electronic Information School, Wuhan University, Wuhan 430072, China"}]},{"given":"Chengchao","family":"Bai","sequence":"additional","affiliation":[{"name":"The School of Astronautics, Harbin Institute of Technology, Harbin 150080, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,9,21]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"6690","DOI":"10.1109\/TGRS.2019.2907932","article-title":"Deep Learning for Hyperspectral Image Classification: An Overview","volume":"57","author":"Li","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TGRS.2022.3172371","article-title":"SpectralFormer: Rethinking Hyperspectral Image Classification With Transformers","volume":"60","author":"Hong","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1579","DOI":"10.1109\/TGRS.2017.2765364","article-title":"Recent Advances on Spectral\u2013Spatial Hyperspectral Image Classification: An Overview and New Guidelines","volume":"56","author":"He","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1109\/MSP.2013.2279179","article-title":"Advances in Hyperspectral Image Classification: Earth Monitoring with Statistical Learning Methods","volume":"31","author":"Tuia","year":"2014","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1778","DOI":"10.1109\/TGRS.2004.831865","article-title":"Classification of hyperspectral remote sensing images with support vector machines","volume":"42","author":"Melgani","year":"2004","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"344","DOI":"10.1016\/j.patcog.2013.07.005","article-title":"Target detection based on a dynamic subspace","volume":"47","author":"Du","year":"2014","journal-title":"Pattern Recognit."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"809","DOI":"10.1109\/TGRS.2011.2162649","article-title":"Spectral\u2013spatial hyperspectral image segmentation using subspace multinomial logistic regression and Markov random fields","volume":"50","author":"Li","year":"2011","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"6232","DOI":"10.1109\/TGRS.2016.2584107","article-title":"Deep feature extraction and classification of hyperspectral images based on convolutional neural networks","volume":"54","author":"Chen","year":"2016","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"5408","DOI":"10.1109\/TGRS.2018.2815613","article-title":"Hyperspectral image classification with deep learning models","volume":"56","author":"Yang","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"6440","DOI":"10.1109\/TGRS.2018.2838665","article-title":"Active learning with convolutional neural networks for hyperspectral image classification using a new bayesian approach","volume":"56","author":"Haut","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"258619","DOI":"10.1155\/2015\/258619","article-title":"Deep convolutional neural networks for hyperspectral image classification","volume":"2015","author":"Hu","year":"2015","journal-title":"J. 
Sens."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"4843","DOI":"10.1109\/TIP.2017.2725580","article-title":"Going deeper with contextual CNN for hyperspectral image classification","volume":"26","author":"Lee","year":"2017","journal-title":"IEEE Trans. Image Process."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"740","DOI":"10.1109\/TGRS.2018.2860125","article-title":"Deep pyramidal residual networks for spectral\u2014Spatial hyperspectral image classification","volume":"57","author":"Paoletti","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"847","DOI":"10.1109\/TGRS.2017.2755542","article-title":"Spectral\u2013spatial residual network for hyperspectral image classification: A 3-D deep learning framework","volume":"56","author":"Zhong","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_16","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017). Attention is all you need. Adv. Neural Inf. Process. Syst., 30."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1109\/TGRS.2019.2934760","article-title":"HSI-BERT: Hyperspectral image classification using the bidirectional encoder representation from transformers","volume":"58","author":"He","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_18","unstructured":"Devlin, J., Chang, M., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 
arXiv."},{"key":"ref_19","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Qing, Y., Liu, W., Feng, L., and Gao, W. (2021). Improved Transformer Net for Hyperspectral Image Classification. Remote Sens., 13.","DOI":"10.3390\/rs13112216"},{"key":"ref_21","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"He, X., Chen, Y., and Lin, Z. (2021). Spatial-Spectral Transformer for Hyperspectral Image Classification. Remote Sens., 13.","DOI":"10.3390\/rs13030498"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TGRS.2022.3231215","article-title":"Spectral-Spatial Feature Tokenization Transformer for Hyperspectral Image Classification","volume":"60","author":"Sun","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"447","DOI":"10.1109\/LGRS.2011.2172185","article-title":"Linear versus nonlinear PCA for the classification of hyperspectral data based on the extended morphological profiles","volume":"9","author":"Licciardi","year":"2011","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_25","first-page":"1","article-title":"Hyperspectral Image Transformer Classification Networks","volume":"60","author":"Yang","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Graham, B., El-Nouby, A., Touvron, H., Stock, P., Joulin, A., J\u00e9gou, H., and Douze, M. (2021, January 10\u201317). LeViT: A Vision Transformer in ConvNet\u2019s Clothing for Faster Inference. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.01204"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Heo, B., Yun, S., Han, D., Chun, S., Choe, J., and Oh, S.J. (2021, January 10\u201317). Rethinking Spatial Dimensions of Vision Transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.01172"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Wu, H., Xiao, B., Codella, N., Liu, M., Dai, X., Yuan, L., and Zhang, L. (2021, January 10\u201317). Cvt: Introducing convolutions to vision transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00009"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Guo, J., Han, K., Wu, H., Tang, Y., Chen, X., Wang, Y., and Xu, C. (2022, January 21\u201324). Cmt: Convolutional neural networks meet vision transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01186"},{"key":"ref_30","unstructured":"Li, Y., Zhang, K., Cao, J., Timofte, R., and Van Gool, L. (2021). Localvit: Bringing locality to vision transformers. arXiv."},{"key":"ref_31","unstructured":"Chu, X., Tian, Z., Zhang, B., Wang, X., Wei, X., Xia, H., and Shen, C. (2021). Conditional positional encodings for vision transformers. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Wang, A., Xing, S., Zhao, Y., Wu, H., and Iwahori, Y. (2022). A Hyperspectral Image Classification Method Based on Adaptive Spectral Spatial Kernel Combined with Improved Vision Transformer. Remote Sens., 14.","DOI":"10.3390\/rs14153705"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Yang, L., Yang, Y., Yang, J., Zhao, N., Wu, L., Wang, L., and Wang, T. (2022). 
FusionNet: A Convolution\u2013Transformer Fusion Network for Hyperspectral Image Classification. Remote Sens., 14.","DOI":"10.3390\/rs14164066"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"7071485","DOI":"10.1155\/2022\/7071485","article-title":"Spectral-Spatial Attention Transformer with Dense Connection for Hyperspectral Image Classification","volume":"2022","author":"Dang","year":"2022","journal-title":"Comput. Intell. Neurosci."},{"key":"ref_35","unstructured":"Xue, X., Zhang, H., Bai, Z., and Li, Y. (2021). 3D-ANAS v2: Grafting Transformer Module on Automatically Designed ConvNet for Hyperspectral Image Classification. arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Li, T., Tang, X., Hu, X., and Peng, Y. (2022). CAEVT: Convolutional Autoencoder Meets Lightweight Vision Transformer for Hyperspectral Image Classification. Sensors, 22.","DOI":"10.3390\/s22103902"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Chen, Q., Wu, Q., Wang, J., Hu, Q., Hu, T., Ding, E., Cheng, J., and Wang, J. (2022). MixFormer: Mixing Features across Windows and Dimensions. arXiv.","DOI":"10.1109\/CVPR52688.2022.00518"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Chen, J., Wang, X., Guo, Z., Zhang, X., and Sun, J. (2021, January 10\u201317). Dynamic region-aware convolution. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/CVPR46437.2021.00797"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Wang, W., Xie, E., Li, X., Fan, D.P., Song, K., Liang, D., Lu, T., Luo, P., and Shao, L. (2021, January 10\u201317). Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00061"},{"key":"ref_40","unstructured":"Yan, H., Li, Z., Li, W., Wang, C., Wu, M., and Zhang, C. 
(2021). ConTNet: Why not use convolution and transformer at the same time?. arXiv."},{"key":"ref_41","unstructured":"Larsson, G., Maire, M., and Shakhnarovich, G. (2016). Fractalnet: Ultra-deep neural networks without residuals. arXiv."},{"key":"ref_42","unstructured":"Ba, J.L., Kiros, J.R., and Hinton, G.E. (2016). Layer normalization. arXiv."},{"key":"ref_43","unstructured":"Ke, G., He, D., and Liu, T.Y. (2020). Rethinking positional encoding in language pre-training. arXiv."},{"key":"ref_44","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 7\u20139). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_45","unstructured":"Hendrycks, D., and Gimpel, K. (2016). Gaussian error linear units (gelus). arXiv."},{"key":"ref_46","unstructured":"Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., and J\u00e9gou, H. (2021, January 18\u201324). Training data-efficient image transformers & distillation through attention. Proceedings of the International Conference on Machine Learning, Online."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"492","DOI":"10.1109\/TGRS.2004.842481","article-title":"Investigation of the random forest framework for classification of hyperspectral data","volume":"43","author":"Ham","year":"2005","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_48","unstructured":"Haut, J., Paoletti, M., Paz-Gallardo, A., Plaza, J., Plaza, A., and Vigo-Aguiar, J. (2017, January 4\u20138). Cloud implementation of logistic regression for hyperspectral image classification. 
Proceedings of the 17th International Conference on Computational and Mathematical Methods in Science and Engineering, CMMSE 2017, Rota, Spain."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/19\/4732\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:37:02Z","timestamp":1760143022000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/19\/4732"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,21]]},"references-count":48,"journal-issue":{"issue":"19","published-online":{"date-parts":[[2022,10]]}},"alternative-id":["rs14194732"],"URL":"https:\/\/doi.org\/10.3390\/rs14194732","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,21]]}}}