{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T18:31:17Z","timestamp":1773772277736,"version":"3.50.1"},"reference-count":57,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T00:00:00Z","timestamp":1675123200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China Major Program","award":["42192580"],"award-info":[{"award-number":["42192580"]}]},{"name":"National Natural Science Foundation of China Major Program","award":["42192583"],"award-info":[{"award-number":["42192583"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Convolutional neural networks (CNNs) have made significant advances in remote sensing scene classification (RSSC) in recent years. Nevertheless, the limitations of the receptive field cause CNNs to suffer from a disadvantage in capturing contextual information. To address this issue, vision transformer (ViT), a novel model that has piqued the interest of academics, is used to extract latent contextual information in remote sensing scene classification. However, when confronted with the challenges of large-scale variations and high interclass similarity in scene classification images, the original ViT has the drawback of ignoring important local features, thereby causing the model\u2019s performance to degrade. Consequently, we propose the hierarchical contextual feature-preserved network (HCFPN) by combining the advantages of CNNs and ViT. First, a hierarchical feature extraction module based on ResNet-34 is utilized to acquire the multilevel convolutional features and high-level semantic features. 
Second, a contextual feature-preserved module takes advantage of the first two multilevel features to capture abundant long-term contextual features. Then, the captured long-term contextual features are utilized for multiheaded cross-level attention computing to aggregate and explore the correlation of multilevel features. Finally, the multiheaded cross-level attention score and high-level semantic features are classified. Then, a category score average module is proposed to fuse the classification results, whereas a label smoothing approach is utilized prior to calculating the loss to produce discriminative scene representation. In addition, we conduct extensive experiments on two publicly available RSSC datasets. Our proposed HCFPN outperforms most state-of-the-art approaches.<\/jats:p>","DOI":"10.3390\/rs15030810","type":"journal-article","created":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T05:33:53Z","timestamp":1675229633000},"page":"810","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["HCFPN: Hierarchical Contextual Feature-Preserved Network for Remote Sensing Scene Classification"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1596-6073","authenticated-orcid":false,"given":"Jingwen","family":"Yuan","sequence":"first","affiliation":[{"name":"School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430072, China"}]},{"given":"Shugen","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430072, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,31]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"270","DOI":"10.1109\/MGRS.2022.3145854","article-title":"Artificial intelligence for remote sensing data analysis: A review of challenges and 
opportunities","volume":"10","author":"Zhang","year":"2022","journal-title":"IEEE Geosci. Remote Sens. Mag."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"111322","DOI":"10.1016\/j.rse.2019.111322","article-title":"Land-cover classification with high-resolution remote sensing images using transferable deep models","volume":"237","author":"Tong","year":"2020","journal-title":"Remote Sens. Environ."},{"key":"ref_3","first-page":"1","article-title":"Accelerating convolutional neural network-based hyperspectral image classification by step activation quantization","volume":"60","author":"Mei","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"3660","DOI":"10.1109\/TGRS.2016.2523563","article-title":"Semantic annotation of high-resolution satellite images via weakly supervised learning","volume":"54","author":"Yao","year":"2016","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"7405","DOI":"10.1109\/TGRS.2016.2601622","article-title":"Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images","volume":"54","author":"Cheng","year":"2016","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"570","DOI":"10.1109\/JPROC.2012.2189089","article-title":"Human settlements: A global challenge for EO data processing and interpretation","volume":"101","author":"Gamba","year":"2012","journal-title":"Proc. IEEE Inst. Electr. Electron. Eng."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1080\/10095020.2017.1329314","article-title":"Earth observation brain (EOB): An intelligent earth observation system","volume":"20","author":"Li","year":"2017","journal-title":"Geo Spat. Inf. 
Sci."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"2395","DOI":"10.1080\/01431161.2011.608740","article-title":"High-resolution satellite scene classification using a sparse coding based multiple feature combination","volume":"33","author":"Sheng","year":"2012","journal-title":"Int. J. Remote Sens."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"5798","DOI":"10.1109\/TGRS.2017.2714676","article-title":"Two-stage reranking for remote sensing image retrieval","volume":"55","author":"Tang","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"5525","DOI":"10.1109\/TGRS.2017.2709802","article-title":"Scene classification based on the fully sparse semantic topic model","volume":"55","author":"Zhu","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_11","first-page":"6180","article-title":"Adaptive deep sparse semantic modeling framework for high spatial resolution image scene classification","volume":"56","author":"Zhu","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"2689","DOI":"10.1109\/TGRS.2017.2781712","article-title":"Scene classification based on the sparse homogeneous\u2013heterogeneous topic feature model","volume":"56","author":"Zhu","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1109\/TGRS.2019.2931801","article-title":"Remote sensing scene classification by gated bidirectional network","volume":"58","author":"Sun","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Yang, Y., and Newsam, S. (2008, January 12\u201315). Comparing SIFT descriptors and Gabor texture features for classification of remote sensed imagery. 
Proceedings of the 2008 15th IEEE International Conference on Image Processing, San Diego, CA, USA.","DOI":"10.1109\/ICIP.2008.4712139"},{"key":"ref_15","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201325). Histograms of oriented gradients for human detection. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905), San Diego, CA, USA."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"3706","DOI":"10.1109\/TGRS.2006.881741","article-title":"Modeling and detection of geospatial objects using texture motifs","volume":"44","author":"Bhagavathy","year":"2006","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"3420","DOI":"10.1109\/TGRS.2020.3007533","article-title":"Deep hash learning for remote sensing image retrieval","volume":"59","author":"Liu","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"3735","DOI":"10.1109\/JSTARS.2020.3005403","article-title":"Remote sensing image scene classification meets deep learning: Challenges, methods, benchmarks, and opportunities","volume":"13","author":"Cheng","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_19","first-page":"1","article-title":"A cross-layer nonlocal network for remote sensing scene classification","volume":"19","author":"Li","year":"2021","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_20","first-page":"1","article-title":"Perturbation-seeking generative adversarial networks: A defense framework for remote sensing image scene classification","volume":"60","author":"Cheng","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_21","first-page":"1","article-title":"Meta-hashing for remote sensing image retrieval","volume":"60","author":"Tang","year":"2021","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_22","first-page":"1","article-title":"Class-Level Prototype Guided Multiscale Feature Learning for Remote Sensing Scene Classification With Limited Labels","volume":"60","author":"Tang","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_23","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_25","first-page":"1","article-title":"Semi-supervised locality preserving dense graph neural network with ARMA filters and context-aware learning for hyperspectral image classification","volume":"60","author":"Ding","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_26","first-page":"1","article-title":"Unsupervised self-correlated learning smoothy enhanced locality preserving graph convolution embedding clustering for hyperspectral images","volume":"60","author":"Ding","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_27","first-page":"1","article-title":"Self-supervised locality preserving low-pass graph convolutional embedding for large-scale hyperspectral image clustering","volume":"60","author":"Ding","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1016\/j.ins.2022.04.006","article-title":"AF2GNN: Graph convolution with adaptive filters and aggregator fusion for hyperspectral image classification","volume":"602","author":"Ding","year":"2022","journal-title":"Inf. 
Sci."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1016\/j.neucom.2022.06.031","article-title":"Multi-feature fusion: Graph neural network and CNN combining for hyperspectral image classification","volume":"501","author":"Ding","year":"2022","journal-title":"Neurocomputing"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R., Gupta, A., and He, K. (2018, January 18\u201323). Non-local neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_31","first-page":"6000","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"2223","DOI":"10.1109\/JSTARS.2022.3155665","article-title":"Homo\u2013Heterogenous Transformer Learning Framework for RS Scene Classification","volume":"15","author":"Ma","year":"2022","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Bazi, Y., Bashmal, L., Rahhal, M.M.A., Dayil, R.A., and Ajlan, N.A. (2021). Vision transformers for remote sensing image classification. Remote Sens., 13.","DOI":"10.3390\/rs13030516"},{"key":"ref_34","first-page":"1","article-title":"When CNNs meet vision transformer: A joint framework for remote sensing scene classification","volume":"19","author":"Deng","year":"2021","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_35","first-page":"1","article-title":"Vision transformer: An excellent teacher for guiding small networks in remote sensing image scene classification","volume":"60","author":"Xu","year":"2022","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1986","DOI":"10.1109\/JSTARS.2020.2988477","article-title":"Classification of high-spatial-resolution remote sensing scenes method using transfer learning and deep convolutional neural network","volume":"13","author":"Li","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"8358","DOI":"10.1109\/TGRS.2020.2987338","article-title":"Attribute-cooperated convolutional neural network for remote sensing image classification","volume":"58","author":"Zhang","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"7894","DOI":"10.1109\/TGRS.2019.2917161","article-title":"A feature aggregation convolutional neural network for remote sensing scene classification","volume":"57","author":"Lu","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"4104","DOI":"10.1109\/JSTARS.2017.2705419","article-title":"Aggregating rich hierarchical features for scene classification in remote sensing imagery","volume":"10","author":"Wang","year":"2017","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3159171","article-title":"A discriminatively learned cnn embedding for person reidentification","volume":"14","author":"Zheng","year":"2017","journal-title":"ACM Trans. Multimedia Comput. Commun. Appl."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"371","DOI":"10.1109\/TGRS.2017.2748120","article-title":"Diversity-promoting deep structural metric learning for remote sensing scene classification","volume":"56","author":"Gong","year":"2017","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1200","DOI":"10.1109\/LGRS.2019.2894399","article-title":"Siamese convolutional neural networks for remote sensing scene classification","volume":"16","author":"Liu","year":"2019","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1016\/j.ins.2020.06.018","article-title":"Sample generation based on a supervised Wasserstein Generative Adversarial Network for high-resolution remote-sensing scene classification","volume":"539","author":"Han","year":"2020","journal-title":"Inf. Sci."},{"key":"ref_44","first-page":"1","article-title":"A Supervised Progressive Growing Generative Adversarial Network for Remote Sensing Image Scene Classification","volume":"60","author":"Ma","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"1155","DOI":"10.1109\/TGRS.2018.2864987","article-title":"Scene classification with recurrent attention of VHR remote sensing images","volume":"57","author":"Wang","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1926","DOI":"10.1109\/LGRS.2020.3011405","article-title":"Remote sensing image scene classification based on an enhanced attention module","volume":"18","author":"Zhao","year":"2020","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"2030","DOI":"10.1109\/JSTARS.2021.3051569","article-title":"Attention consistent network for remote sensing scene classification","volume":"14","author":"Tang","year":"2021","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_48","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). 
An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 10\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Sun, Z., Cao, S., Yang, Y., and Kitani, K.M. (2021, January 10\u201317). Rethinking transformer-based set prediction for object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00359"},{"key":"ref_51","first-page":"1","article-title":"SCViT: A Spatial-Channel Feature Preserving Vision Transformer for Remote Sensing Image Scene Classification","volume":"60","author":"Lv","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"1884","DOI":"10.1109\/JSTARS.2022.3145042","article-title":"Transformer-Driven Semantic Relation Inference for Multilabel Classification of High-Resolution Remote Sensing Images","volume":"15","author":"Tan","year":"2022","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"1414","DOI":"10.1109\/TNNLS.2020.3042276","article-title":"Looking closer at the scene: Multiscale representation learning for remote sensing image scene classification","volume":"33","author":"Wang","year":"2020","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_54","unstructured":"Ba, J.L., Kiros, J.R., and Hinton, G.E. (2016). Layer normalization. 
arXiv."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"3965","DOI":"10.1109\/TGRS.2017.2685945","article-title":"AID: A benchmark data set for performance evaluation of aerial scene classification","volume":"55","author":"Xia","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"1575","DOI":"10.1109\/JSTARS.2022.3142898","article-title":"WH-MAVS: A Novel Dataset and Deep Learning Benchmark for Multiple Land Use and Land Cover Applications","volume":"15","author":"Yuan","year":"2022","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). Imagenet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/3\/810\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:20:00Z","timestamp":1760120400000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/3\/810"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,31]]},"references-count":57,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["rs15030810"],"URL":"https:\/\/doi.org\/10.3390\/rs15030810","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,31]]}}}