{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T18:08:55Z","timestamp":1771956535613,"version":"3.50.1"},"reference-count":50,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2022,5,20]],"date-time":"2022-05-20T00:00:00Z","timestamp":1653004800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China","award":["91948303-1"],"award-info":[{"award-number":["91948303-1"]}]},{"name":"National Natural Science Foundation of China","award":["61803375"],"award-info":[{"award-number":["61803375"]}]},{"name":"National Natural Science Foundation of China","award":["QL20210018"],"award-info":[{"award-number":["QL20210018"]}]},{"name":"Postgraduate Scientific Research Innovation Project of Hunan Province","award":["91948303-1"],"award-info":[{"award-number":["91948303-1"]}]},{"name":"Postgraduate Scientific Research Innovation Project of Hunan Province","award":["61803375"],"award-info":[{"award-number":["61803375"]}]},{"name":"Postgraduate Scientific Research Innovation Project of Hunan Province","award":["QL20210018"],"award-info":[{"award-number":["QL20210018"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Convolutional neural networks (CNNs) have been prominent in most hyperspectral image (HSI) processing applications due to their advantages in extracting local information. Despite their success, the locality of the convolutional layers within CNNs results in heavyweight models and time-consuming defects. In this study, inspired by the excellent performance of transformers that are used for long-range representation learning in computer vision tasks, we built a lightweight vision transformer for HSI classification that can extract local and global information simultaneously, thereby facilitating accurate classification. Moreover, as traditional dimensionality reduction methods are limited in their linear representation ability, a three-dimensional convolutional autoencoder was adopted to capture the nonlinear characteristics between spectral bands. Based on the aforementioned three-dimensional convolutional autoencoder and lightweight vision transformer, we designed an HSI classification network, namely the \u201cconvolutional autoencoder meets lightweight vision transformer\u201d (CAEVT). Finally, we validated the performance of the proposed CAEVT network using four widely used hyperspectral datasets. Our approach showed superiority, especially in the absence of sufficient labeled samples, which demonstrates the effectiveness and efficiency of the CAEVT network.<\/jats:p>","DOI":"10.3390\/s22103902","type":"journal-article","created":{"date-parts":[[2022,5,21]],"date-time":"2022-05-21T09:18:08Z","timestamp":1653124688000},"page":"3902","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":23,"title":["CAEVT: Convolutional Autoencoder Meets Lightweight Vision Transformer for Hyperspectral Image Classification"],"prefix":"10.3390","volume":"22","author":[{"given":"Zhiwen","family":"Zhang","sequence":"first","affiliation":[{"name":"The State Key Laboratory of High-Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, China"}]},{"given":"Teng","family":"Li","sequence":"additional","affiliation":[{"name":"Beijing Institute for Advanced Study, National University of Defense Technology, Beijing 100020, China"},{"name":"College of Advanced Interdisciplinary Studies, National University of Defense Technology, Changsha 410073, China"}]},{"given":"Xuebin","family":"Tang","sequence":"additional","affiliation":[{"name":"The State Key Laboratory of High-Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1798-8508","authenticated-orcid":false,"given":"Xiang","family":"Hu","sequence":"additional","affiliation":[{"name":"The State Key Laboratory of High-Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, China"}]},{"given":"Yuanxi","family":"Peng","sequence":"additional","affiliation":[{"name":"The State Key Laboratory of High-Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,5,20]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1016\/j.asoc.2017.11.045","article-title":"Computational intelligence in optical remote sensing image processing","volume":"64","author":"Zhong","year":"2018","journal-title":"Appl. Soft Comput."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1119","DOI":"10.3390\/rs10071119","article-title":"Very Deep Convolutional Neural Networks for Complex Land Cover Mapping Using Multispectral Remote Sensing Imagery","volume":"10","author":"Masoud","year":"2018","journal-title":"Remote Sens."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Pipitone, C., Maltese, A., Dardanelli, G., Brutto, M.L., and Loggia, G.L. (2018). Monitoring Water Surface and Level of a Reservoir Using Different Remote Sensing Approaches and Comparison with Dam Displacements Evaluated via GNSS. Remote Sens., 10.","DOI":"10.3390\/rs10010071"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"3966","DOI":"10.3390\/rs70403966","article-title":"Global and Local Real-Time Anomaly Detectors for Hyperspectral Remote Sensing Imagery","volume":"7","author":"Zhao","year":"2015","journal-title":"Remote Sens."},{"key":"ref_5","first-page":"261","article-title":"Hyperspectral image classification by a variable interval spectral average and spectral curve matching combined algorithm","volume":"12","author":"Kumar","year":"2010","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1016\/j.rse.2017.10.041","article-title":"Atmospheric correction for hyperspectral ocean color retrieval with application to the Hyperspectral Imager for the Coastal Ocean (HICO)","volume":"204","author":"Ibrahim","year":"2018","journal-title":"Remote Sens. Environ."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"725","DOI":"10.14358\/PERS.80.8.725","article-title":"Improved Capability in Stone Pine Forest Mapping and Management in Lebanon Using Hyperspectral CHRIS-Proba Data Relative to Landsat ETM","volume":"80","author":"Awad","year":"2014","journal-title":"Photogramm. Eng. Remote Sens."},{"key":"ref_8","first-page":"102154","article-title":"Improved k-means and spectral matching for hyperspectral mineral mapping","volume":"91","author":"Ren","year":"2020","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"466","DOI":"10.1109\/TGRS.2004.841417","article-title":"Dimensionality reduction and classification of hyperspectral image data using sequences of extended morphological transformations","volume":"43","author":"Plaza","year":"2005","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1044","DOI":"10.1109\/36.841984","article-title":"An experiment-based quantitative and comparative analysis of target detection and image classification algorithms for hyperspectral imagery","volume":"38","author":"Chang","year":"2000","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1109\/MGRS.2016.2540798","article-title":"Deep learning for remote sensing data: A technical tutorial on the state of the art","volume":"4","author":"Zhang","year":"2016","journal-title":"IEEE Geosci. Remote Sens. Mag."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"4073","DOI":"10.1109\/JSTARS.2016.2517204","article-title":"Spectral\u2013Spatial Classification of Hyperspectral Image Based on Deep Auto-Encoder","volume":"9","author":"Ma","year":"2016","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Ji, J., Mei, S., Hou, J., Li, X., and Du, Q. (2017, January 23\u201328). Learning sensor-specific features for hyperspectral images via 3-dimensional convolutional autoencoder. Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA.","DOI":"10.1109\/IGARSS.2017.8127329"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"5595026","DOI":"10.1155\/2021\/5595026","article-title":"Medicalguard: U-net model robust against adversarially perturbed images","volume":"2021","author":"Kwon","year":"2021","journal-title":"Secur. Commun. Netw."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"6217","DOI":"10.1007\/s11042-021-11135-0","article-title":"BlindNet backdoor: Attack on deep neural network using blind watermark","volume":"81","author":"Kwon","year":"2022","journal-title":"Multimed. Tools Appl."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1587\/transinf.2021EDL8054","article-title":"Multi-Model Selective Backdoor Attack with Different Trigger Positions","volume":"105","author":"KWON","year":"2022","journal-title":"IEICE Trans. Inf. Syst."},{"key":"ref_17","unstructured":"Kwon, H. (2021). Defending Deep Neural Networks against Backdoor Attack by Using De-trigger Autoencoder. IEEE Access."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Mu, C., Dong, Z., and Liu, Y. (2022). A Two-Branch Convolutional Neural Network Based on Multi-Spectral Entropy Rate Superpixel Segmentation for Hyperspectral Image Classification. Remote Sens., 14.","DOI":"10.3390\/rs14071569"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1482","DOI":"10.1109\/JSTARS.2020.3041344","article-title":"Learning a deep similarity network for hyperspectral image classification","volume":"14","author":"Yang","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"3006","DOI":"10.1109\/JSTARS.2021.3062872","article-title":"Sandwich Convolutional Neural Network for Hyperspectral Image Classification Using Spectral Feature Enhancement","volume":"14","author":"Gao","year":"2021","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_21","first-page":"5508614","article-title":"Spectral Feature Fusion Networks With Dual Attention for Hyperspectral Image Classification","volume":"60","author":"Li","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Chang, Y.L., Tan, T.H., Lee, W.H., Chang, L., Chen, Y.N., Fan, K.C., and Alkhaleefah, M. (2022). Consolidated Convolutional Neural Network for Hyperspectral Image Classification. Remote Sens., 14.","DOI":"10.3390\/rs14071571"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Fang, B., Liu, Y., Zhang, H., and He, J. (2022). Hyperspectral Image Classification Based on 3D Asymmetric Inception Network with Data Fusion Transfer Learning. Remote Sens., 14.","DOI":"10.3390\/rs14071711"},{"key":"ref_24","first-page":"102157","article-title":"Deep fusion of localized spectral features and multi-scale spatial features for effective classification of hyperspectral images","volume":"91","author":"Sun","year":"2020","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_25","first-page":"102459","article-title":"A combination method of stacked autoencoder and 3D deep residual network for hyperspectral image classification","volume":"102","author":"Zhao","year":"2021","journal-title":"Int. J. Appl. Earth Obs. Geoinf."},{"key":"ref_26","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, January 4\u20139). Attention Is All You Need. Proceedings of the Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA."},{"key":"ref_27","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., and Houlsby, N. (2020). An Image is Worth 16 \u00d7 16 Words: Transformers for Image Recognition at Scale. arXiv."},{"key":"ref_28","unstructured":"Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., and J\u00e9gou, H. (2021, January 17\u201319). Training data-efficient image transformers & distillation through attention. Proceedings of the International Conference on Machine Learning, PMLR, Virtual Event."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23\u201328). End-to-end object detection with transformers. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_30","unstructured":"Zhu, X., Su, W., Lu, L., Li, B., Wang, X., and Dai, J. (2020). Deformable detr: Deformable transformers for end-to-end object detection. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., and Torr, P.H. (2021, January 20\u201325). Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00681"},{"key":"ref_32","unstructured":"Jun, E., Jeong, S., Heo, D.W., and Suk, H.I. (2021). Medical Transformer: Universal Brain Encoder for 3D MRI Analysis. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Lin, C.H., Yumer, E., Wang, O., Shechtman, E., and Lucey, S. (2018, January 18\u201323). St-gan: Spatial transformer generative adversarial networks for image compositing. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00985"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Li, G., Xu, D., Cheng, X., Si, L., and Zheng, C. (2021). SimViT: Exploring a Simple Vision Transformer with sliding windows. arXiv.","DOI":"10.1109\/ICME52920.2022.9859907"},{"key":"ref_35","first-page":"5518615","article-title":"Spectralformer: Rethinking hyperspectral image classification with transformers","volume":"60","author":"Hong","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"He, X., Chen, Y., and Lin, Z. (2021). Spatial-Spectral Transformer for Hyperspectral Image Classification. Remote Sens., 13.","DOI":"10.3390\/rs13030498"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Qing, Y., Liu, W., Feng, L., and Gao, W. (2021). Improved Transformer Net for Hyperspectral Image Classification. Remote Sens., 13.","DOI":"10.3390\/rs13112216"},{"key":"ref_38","unstructured":"Mehta, S., and Rastegari, M. (2021). MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer. arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"12777","DOI":"10.1007\/s11042-019-08453-9","article-title":"Dropout vs. batch normalization: An empirical study of their impact to deep learning","volume":"79","author":"Garbin","year":"2020","journal-title":"Multimed. Tools Appl."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015, January 7\u201313). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.123"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1016\/j.neucom.2021.07.015","article-title":"ContrastNet: Unsupervised feature learning by autoencoder and prototypical contrastive learning for hyperspectral imagery classification","volume":"460","author":"Cao","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"847","DOI":"10.1109\/TGRS.2017.2755542","article-title":"Spectral-Spatial Residual Network for Hyperspectral Image Classification: A 3-D Deep Learning Framework","volume":"56","author":"Zhong","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Wenju, W., Shuguang, D., Zhongmin, J., and Liujie, S. (2018). A Fast Dense Spectral\u2013Spatial Convolution Network Framework for Hyperspectral Images Classification. Remote Sens., 10.","DOI":"10.3390\/rs10071068"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Ma, W., Yang, Q., Wu, Y., Zhao, W., and Zhang, X. (2019). Double-Branch Multi-Attention Mechanism Network for Hyperspectral Image Classification. Remote Sens., 11.","DOI":"10.3390\/rs11111307"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Li, R., Zheng, S., Duan, C., Yang, Y., and Wang, X. (2020). Classification of Hyperspectral Image Based on Double-Branch Dual-Attention Mechanism Network. Remote Sens., 12.","DOI":"10.20944\/preprints201912.0059.v2"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"5277","DOI":"10.1109\/TGRS.2019.2961681","article-title":"Lightweight spectral\u2013spatial squeeze-and-excitation residual bag-of-features learning for hyperspectral classification","volume":"58","author":"Roy","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_48","unstructured":"Li, R., and Duan, C. (2020). LiteDenseNet: A lightweight network for hyperspectral image classification. arXiv."},{"key":"ref_49","first-page":"5502915","article-title":"LiteDepthwiseNet: A Lightweight Network for Hyperspectral Image Classification","volume":"60","author":"Cui","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Chen, L., Wei, Z., and Xu, Y. (2020). A lightweight spectral\u2013spatial feature extraction and fusion network for hyperspectral image classification. Remote Sens., 12.","DOI":"10.3390\/rs12091395"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/10\/3902\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:15:56Z","timestamp":1760138156000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/10\/3902"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,20]]},"references-count":50,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2022,5]]}},"alternative-id":["s22103902"],"URL":"https:\/\/doi.org\/10.3390\/s22103902","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,20]]}}}