{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T18:15:18Z","timestamp":1770747318631,"version":"3.49.0"},"reference-count":55,"publisher":"MDPI AG","issue":"22","license":[{"start":{"date-parts":[[2022,11,17]],"date-time":"2022-11-17T00:00:00Z","timestamp":1668643200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Recent developments in remote sensing technology have allowed us to observe the Earth with very high-resolution (VHR) images. VHR imagery scene classification is a challenging problem in the field of remote sensing. Vision transformer (ViT) models have achieved breakthrough results in image recognition tasks. However, transformer\u2013encoder layers encode different levels of features, where the latest layer represents semantic information, in contrast to the earliest layers, which contain more detailed data but ignore the semantic information of an image scene. In this paper, a new deep framework is proposed for VHR scene understanding by exploring the strengths of ViT features in a simple and effective way. First, pre-trained ViT models are used to extract informative features from the original VHR image scene, where the transformer\u2013encoder layers are used to generate the feature descriptors of the input images. Second, we merged the obtained features as one signal data set. Third, some extracted ViT features do not describe well the image scenes, such as agriculture, meadows, and beaches, which could negatively affect the performance of the classification model. To deal with this challenge, we propose a new algorithm for feature- and image selection. Indeed, this gives us the possibility of eliminating the less important features and images, as well as those that are abnormal; based on the similarity of preserving the whole data set, we selected the most informative features and important images by dropping the irrelevant images that degraded the classification accuracy. The proposed method was tested on three VHR benchmarks. The experimental results demonstrate that the proposed method outperforms other state-of-the-art methods.<\/jats:p>","DOI":"10.3390\/rs14225817","type":"journal-article","created":{"date-parts":[[2022,11,18]],"date-time":"2022-11-18T04:08:40Z","timestamp":1668744520000},"page":"5817","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["On the Co-Selection of Vision Transformer Features and Images for Very High-Resolution Image Scene Classification"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5911-3128","authenticated-orcid":false,"given":"Souleyman","family":"Chaib","sequence":"first","affiliation":[{"name":"LabRi Laboratory, Ecole Sup\u00e8rieure en Informatique, Sidi Bel Abb\u00e8s 22000, Algeria"}]},{"given":"Dou El Kefel","family":"Mansouri","sequence":"additional","affiliation":[{"name":"Faculty of Science of Nature and Life, Ibn-Khaldoun University, Tiaret 14000, Algeria"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3243-990X","authenticated-orcid":false,"given":"Ibrahim","family":"Omara","sequence":"additional","affiliation":[{"name":"Department of Machine Intelligence, Faculty of Artificial Intelligence, Menoufia University, Shebin ElKom 32511, Egypt"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2631-1846","authenticated-orcid":false,"given":"Ahmed","family":"Hagag","sequence":"additional","affiliation":[{"name":"Department of Scientific Computing, Faculty of Computers and Artificial Intelligence, Benha University, Benha 13518, Egypt"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3620-1395","authenticated-orcid":false,"given":"Sahraoui","family":"Dhelim","sequence":"additional","affiliation":[{"name":"School of Computer Science, University College Dublin, D04 V1W8 Dublin, Ireland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2019-0636","authenticated-orcid":false,"given":"Djamel Amar","family":"Bensaber","sequence":"additional","affiliation":[{"name":"LabRi Laboratory, Ecole Sup\u00e8rieure en Informatique, Sidi Bel Abb\u00e8s 22000, Algeria"}]}],"member":"1968","published-online":{"date-parts":[[2022,11,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1023\/A:1011139631724","article-title":"Modeling the shape of the scene: A holistic representation of the spatial envelope","volume":"42","author":"Oliva","year":"2001","journal-title":"Int. J. Comput. Vis."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"971","DOI":"10.1109\/TPAMI.2002.1017623","article-title":"Multiresolution gray-scale and rotation invariant texture classification with local binary patterns","volume":"24","author":"Ojala","year":"2002","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1007\/BF00130487","article-title":"Color indexing","volume":"7","author":"Swain","year":"1991","journal-title":"Int. J. Comput. Vis."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Lowe, D.G. (1999, January 20\u201327). Object recognition from local scale-invariant features. Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece.","DOI":"10.1109\/ICCV.1999.790410"},{"key":"ref_5","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201325). Histograms of oriented gradients for human detection. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905), San Diego, CA, USA."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Yang, Y., and Newsam, S. (2008, January 12\u201315). Comparing SIFT descriptors and Gabor texture features for classification of remote sensed imagery. Proceedings of the 2008 15th IEEE International Conference on Image Processing, San Diego, CA, USA.","DOI":"10.1109\/ICIP.2008.4712139"},{"key":"ref_7","unstructured":"dos Santos, J.A., Penatti, O.A.B., and da Silva Torres, R. (2010, January 17\u201321). Evaluating the Potential of Texture and Color Descriptors for Remote Sensing Image Retrieval and Classification. Proceedings of the VISAPP, Angers, France."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Risojevi\u0107, V., Momi\u0107, S., and Babi\u0107, Z. (2011, January 14\u201316). Gabor descriptors for aerial image classification. Proceedings of the International Conference on Adaptive and Natural Computing Algorithms, Ljubljana, Slovenia.","DOI":"10.1007\/978-3-642-20267-4_6"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1007\/s11760-014-0704-x","article-title":"Block-based semantic classification of high-resolution multispectral aerial images","volume":"10","year":"2016","journal-title":"Signal Image Video Process."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"4837","DOI":"10.1109\/TGRS.2015.2411331","article-title":"Measuring the effectiveness of various features for thematic information extraction from very high resolution remote sensing imagery","volume":"53","author":"Chen","year":"2015","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1899","DOI":"10.1109\/JSTARS.2012.2228254","article-title":"Indexing of remote sensing images with different resolutions by multiple features","volume":"6","author":"Luo","year":"2013","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1465","DOI":"10.1109\/TIP.2008.925367","article-title":"Indexing of satellite images with different resolutions by wavelet features","volume":"17","author":"Luo","year":"2008","journal-title":"IEEE Trans. Image Process."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1137\/080730627","article-title":"Local scale measure from the topographic map and application to remote sensing images","volume":"8","author":"Luo","year":"2009","journal-title":"Multiscale Model. Simul."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"2403","DOI":"10.1109\/LGRS.2015.2478966","article-title":"Land-use scene classification in high-resolution remote sensing images using improved correlatons","volume":"12","author":"Qi","year":"2015","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1016\/j.isprsjprs.2014.10.002","article-title":"Multi-class geospatial object detection and geographic image classification based on collection of part detectors","volume":"98","author":"Cheng","year":"2014","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"330","DOI":"10.1016\/j.geomorph.2006.04.013","article-title":"Automated classification of landform elements using object-based image analysis","volume":"81","author":"Blaschke","year":"2006","journal-title":"Geomorphology"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"2343","DOI":"10.1109\/JSTARS.2016.2536943","article-title":"Semantic classification of high-resolution remote-sensing images based on mid-level features","volume":"9","author":"Zhang","year":"2016","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"5158","DOI":"10.1109\/JSTARS.2015.2495267","article-title":"Remote sensing image classification: No features, no clustering","volume":"8","author":"Cui","year":"2015","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Sivic, J., and Zisserman, A. (2003, January 13\u201316). Video Google: A text retrieval approach to object matching in videos. Proceedings of the Computer Vision, IEEE International Conference on. IEEE Computer Society, Nice, France.","DOI":"10.1109\/ICCV.2003.1238663"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1080\/01431161.2012.705443","article-title":"Automatic landslide detection from remote-sensing imagery using a scene classification method based on BoVW and pLSA","volume":"34","author":"Cheng","year":"2013","journal-title":"Int. J. Remote Sens."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"035004","DOI":"10.1117\/1.JRS.10.035004","article-title":"Feature significance-based multibag-of-visual-words model for remote sensing image scene classification","volume":"10","author":"Zhao","year":"2016","journal-title":"J. Appl. Remote Sens."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Wu, H., Liu, B., Su, W., Zhang, W., and Sun, J. (2016). Hierarchical coding vectors for scene level land-use classification. Remote Sens., 8.","DOI":"10.3390\/rs8050436"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1055","DOI":"10.1109\/LGRS.2012.2228625","article-title":"High-resolution remote-sensing image classification via an approximate earth mover\u2019s distance-based bag-of-features model","volume":"10","author":"Zhang","year":"2013","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"4238","DOI":"10.1109\/TGRS.2015.2393857","article-title":"Effective and efficient mid-level visual elements-oriented land-use classification using VHR remote sensing images","volume":"53","author":"Cheng","year":"2015","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"14680","DOI":"10.3390\/rs71114680","article-title":"Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery","volume":"7","author":"Hu","year":"2015","journal-title":"Remote Sens."},{"key":"ref_26","unstructured":"Deng, J., Berg, A., Satheesh, S., Su, H., Khosla, A., and Li, F. (2020, March 15). Imagenet Large Scale Visual Recognition Competition. ilsvrc2012. Available online: https:\/\/image-net.org\/challenges\/LSVRC\/."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1793","DOI":"10.1109\/TGRS.2015.2488681","article-title":"Scene classification via a gradient boosting random convolutional network framework","volume":"54","author":"Zhang","year":"2015","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Yang, Y., and Newsam, S. (2010, January 2\u20135). Bag-of-visual-words and spatial extensions for land-use classification. Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA.","DOI":"10.1145\/1869790.1869829"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1016\/j.patcog.2016.07.001","article-title":"Towards better exploiting convolutional neural networks for remote sensing scene classification","volume":"61","author":"Nogueira","year":"2017","journal-title":"Pattern Recognit."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"2149","DOI":"10.1080\/01431161.2016.1171928","article-title":"Using convolutional features and a sparse autoencoder for land-use scene classification","volume":"37","author":"Othman","year":"2016","journal-title":"Int. J. Remote Sens."},{"key":"ref_31","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_32","unstructured":"Tsai, Y.H.H., Bai, S., Liang, P.P., Kolter, J.Z., Morency, L.P., and Salakhutdinov, R. (August, January 28). Multimodal transformer for unaligned multimodal language sequences. Proceedings of the Association for Computational Linguistics Meeting, Florence, Italy."},{"key":"ref_33","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"4775","DOI":"10.1109\/TGRS.2017.2700322","article-title":"Deep feature fusion for VHR remote sensing scene classification","volume":"55","author":"Chaib","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"2899","DOI":"10.1109\/TKDE.2020.3014262","article-title":"sCOs: Semi-Supervised Co-Selection by a Similarity Preserving Approach","volume":"34","author":"Benabdeslem","year":"2022","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Tang, J., and Liu, H. (2013, January 2\u20134). Coselect: Feature selection with instance selection for social media data. Proceedings of the 2013 SIAM International Conference on Data Mining, Austin, TX, USA.","DOI":"10.1137\/1.9781611972832.77"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"626","DOI":"10.1198\/jasa.2011.tm10390","article-title":"Outlier Detection Using Nonconvex Penalized Regression","volume":"106","author":"She","year":"2011","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_38","unstructured":"Castelluccio, M., Poggi, G., Sansone, C., and Verdoliva, L. (2015). Land use classification in remote sensing images by convolutional neural networks. arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"3965","DOI":"10.1109\/TGRS.2017.2685945","article-title":"AID: A benchmark data set for performance evaluation of aerial scene classification","volume":"55","author":"Xia","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1865","DOI":"10.1109\/JPROC.2017.2675998","article-title":"Remote sensing image scene classification: Benchmark and state of the art","volume":"105","author":"Cheng","year":"2017","journal-title":"Proc. IEEE"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1961189.1961199","article-title":"LIBSVM: A library for support vector machines","volume":"2","author":"Chang","year":"2011","journal-title":"ACM Trans. Intell. Syst. Technol."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"2175","DOI":"10.1109\/TGRS.2014.2357078","article-title":"Saliency-guided unsupervised feature learning for scene classification","volume":"53","author":"Zhang","year":"2014","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1016\/j.isprsjprs.2018.01.023","article-title":"Binary patterns encoded convolutional neural networks for texture recognition and remote sensing scene classification","volume":"138","author":"Anwer","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"7109","DOI":"10.1109\/TGRS.2018.2848473","article-title":"Scene classification based on multiscale convolutional neural network","volume":"56","author":"Liu","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"6899","DOI":"10.1109\/TGRS.2018.2845668","article-title":"Remote sensing scene classification using multilayer stacked covariance pooling","volume":"56","author":"He","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"3508","DOI":"10.1109\/JSTARS.2019.2934165","article-title":"Aggregated deep fisher feature for VHR remote sensing scene classification","volume":"12","author":"Li","year":"2019","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1109\/LGRS.2020.2970810","article-title":"Multilayer feature fusion with weight adjustment based on a convolutional neural network for remote sensing scene classification","volume":"18","author":"Ma","year":"2020","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"8639367","DOI":"10.1155\/2018\/8639367","article-title":"A two-stream deep fusion framework for high-resolution aerial scene classification","volume":"2018","author":"Yu","year":"2018","journal-title":"Comput. Intell. Neurosci."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"1155","DOI":"10.1109\/TGRS.2018.2864987","article-title":"Scene classification with recurrent attention of VHR remote sensing images","volume":"57","author":"Wang","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"18195","DOI":"10.1109\/ACCESS.2021.3052977","article-title":"A multi-level convolution pyramid semantic fusion framework for high-resolution remote sensing image scene classification and annotation","volume":"9","author":"Sun","year":"2021","journal-title":"IEEE Access"},{"key":"ref_51","first-page":"1","article-title":"Multilevel feature fusion networks with adaptive channel dimensionality reduction for remote sensing scene classification","volume":"19","author":"Wang","year":"2021","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Lv, Y., Zhang, X., Xiong, W., Cui, Y., and Cai, M. (2019). An end-to-end local-global-fusion feature extraction network for remote sensing image scene classification. Remote Sens., 11.","DOI":"10.3390\/rs11243006"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Fan, R., Wang, L., Feng, R., and Zhu, Y. (August, January 28). Attention based residual network for high-resolution remote sensing imagery scene classification. Proceedings of the IGARSS 2019\u20142019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan.","DOI":"10.1109\/IGARSS.2019.8900199"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"67200","DOI":"10.1109\/ACCESS.2019.2918732","article-title":"Global-local attention network for aerial scene classification","volume":"7","author":"Guo","year":"2019","journal-title":"IEEE Access"},{"key":"ref_55","first-page":"1","article-title":"Transferring CNN With Adaptive Learning for Remote Sensing Scene Classification","volume":"60","author":"Wang","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/22\/5817\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:20:19Z","timestamp":1760145619000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/22\/5817"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,17]]},"references-count":55,"journal-issue":{"issue":"22","published-online":{"date-parts":[[2022,11]]}},"alternative-id":["rs14225817"],"URL":"https:\/\/doi.org\/10.3390\/rs14225817","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,11,17]]}}}