{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,25]],"date-time":"2026-02-25T00:36:02Z","timestamp":1771979762536,"version":"3.50.1"},"reference-count":110,"publisher":"MDPI AG","issue":"16","license":[{"start":{"date-parts":[[2022,8,17]],"date-time":"2022-08-17T00:00:00Z","timestamp":1660694400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001665","name":"ANR AI chair OTTOPIA","doi-asserted-by":"publisher","award":["ANR-20-CHIA-0030"],"award-info":[{"award-number":["ANR-20-CHIA-0030"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Deep learning methods have become an integral part of computer vision and machine learning research by providing significant improvement performed in many tasks such as classification, regression, and detection. These gains have been also observed in the field of remote sensing for Earth observation where most of the state-of-the-art results are now achieved by deep neural networks. However, one downside of these methods is the need for large amounts of annotated data, requiring lots of labor-intensive and expensive human efforts, in particular for specific domains that require expert knowledge such as medical imaging or remote sensing. In order to limit the requirement on data annotations, several self-supervised representation learning methods have been proposed to learn unsupervised image representations that can consequently serve for downstream tasks such as image classification, object detection or semantic segmentation. As a result, self-supervised learning approaches have been considerably adopted in the remote sensing domain within the last few years. In this article, we review the underlying principles developed by various self-supervised methods with a focus on scene classification task. We highlight the main contributions and analyze the experiments, as well as summarize the key conclusions, from each study. We then conduct extensive experiments on two public scene classification datasets to benchmark and evaluate different self-supervised models. Based on comparative results, we investigate the impact of individual augmentations when applied to remote sensing data as well as the use of self-supervised pre-training to boost the classification performance with limited number of labeled samples. We finally underline the current trends and challenges, as well as perspectives of self-supervised scene classification.<\/jats:p>","DOI":"10.3390\/rs14163995","type":"journal-article","created":{"date-parts":[[2022,8,17]],"date-time":"2022-08-17T22:53:30Z","timestamp":1660776810000},"page":"3995","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":64,"title":["Self-Supervised Learning for Scene Classification in Remote Sensing: Current State of the Art and Perspectives"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6848-5791","authenticated-orcid":false,"given":"Paul","family":"Berg","sequence":"first","affiliation":[{"name":"Institut de Recherche en Informatique et Syst\u00e8mes Al\u00e9atoires (IRISA), Universit\u00e9 Bretagne Sud, UMR 6074, F-56000 Vannes, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0266-767X","authenticated-orcid":false,"given":"Minh-Tan","family":"Pham","sequence":"additional","affiliation":[{"name":"Institut de Recherche en Informatique et Syst\u00e8mes Al\u00e9atoires (IRISA), Universit\u00e9 Bretagne Sud, UMR 6074, F-56000 Vannes, France"}]},{"given":"Nicolas","family":"Courty","sequence":"additional","affiliation":[{"name":"Institut de Recherche en Informatique et Syst\u00e8mes Al\u00e9atoires (IRISA), Universit\u00e9 Bretagne Sud, UMR 6074, F-56000 Vannes, France"}]}],"member":"1968","published-online":{"date-parts":[[2022,8,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"ImageNet Large Scale Visual Recognition Challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_2","unstructured":"Huh, M., Agrawal, P., and Efros, A.A. (2016). What makes ImageNet good for transfer learning?. arXiv."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1865","DOI":"10.1109\/JPROC.2017.2675998","article-title":"Remote Sensing Image Scene Classification: Benchmark and State of the Art","volume":"105","author":"Cheng","year":"2017","journal-title":"Proc. IEEE"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1080\/01431160412331269698","article-title":"Random forest classifier for remote sensing classification","volume":"26","author":"Pal","year":"2005","journal-title":"Int. J. Remote Sens."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1016\/j.isprsjprs.2010.11.001","article-title":"Support vector machines in remote sensing: A review","volume":"66","author":"Mountrakis","year":"2011","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"3735","DOI":"10.1109\/JSTARS.2020.3005403","article-title":"Remote sensing image scene classification meets deep learning: Challenges, methods, benchmarks, and opportunities","volume":"13","author":"Cheng","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_7","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201326). Histograms of oriented gradients for human detection. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905), San Diego, CA, USA."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Lowe, D. (1999, January 20\u201327). Object recognition from local scale-invariant features. Proceedings of the Seventh IEEE International Conference on Computer Vision, Corfu, Greece.","DOI":"10.1109\/ICCV.1999.790410"},{"key":"ref_9","unstructured":"(2003, January 13\u201316). Video Google: A text retrieval approach to object matching in videos. Proceedings of the Ninth IEEE International Conference on Computer Vision, Washington, DC, USA."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1007\/s11263-013-0636-x","article-title":"Image classification with the fisher vector: Theory and practice","volume":"105","author":"Perronnin","year":"2013","journal-title":"Int. J. Comput. Vis."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"ref_12","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Neumann, M., Pinto, A.S., Zhai, X., and Houlsby, N. (October, January 26). Training general representations for remote sensing using in-domain knowledge. Proceedings of the IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA.","DOI":"10.1109\/IGARSS39084.2020.9324501"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Yang, Y., and Newsam, S. (2010, January 2\u20135). Bag-of-visual-words and spatial extensions for land-use classification. Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA.","DOI":"10.1145\/1869790.1869829"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"3965","DOI":"10.1109\/TGRS.2017.2685945","article-title":"AID: A Benchmark Dataset for Performance Evaluation of Aerial Scene Classification","volume":"55","author":"Xia","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Li, H., Dou, X., Tao, C., Wu, Z., Chen, J., Peng, J., Deng, M., and Zhao, L. (2020). RSI-CB: A large-scale remote sensing image classification benchmark using crowdsourced data. Sensors, 20.","DOI":"10.3390\/s20061594"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"2217","DOI":"10.1109\/JSTARS.2019.2918242","article-title":"Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification","volume":"12","author":"Helber","year":"2019","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Sumbul, G., Charfuelan, M., Demir, B., and Markl, V. (August, January 28). Bigearthnet: A large-scale benchmark archive for remote sensing image understanding. Proceedings of the IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan.","DOI":"10.1109\/IGARSS.2019.8900532"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/j.rse.2011.11.026","article-title":"Sentinel-2: ESA\u2019s optical high-resolution mission for GMES operational services","volume":"120","author":"Drusch","year":"2012","journal-title":"Remote Sens. Environ."},{"key":"ref_20","unstructured":"Kaplan, J., McCandlish, S., Henighan, T., Brown, T.B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., and Amodei, D. (2020). Scaling laws for neural language models. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","article-title":"Representation learning: A review and new perspectives","volume":"35","author":"Bengio","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2168","DOI":"10.1109\/TPAMI.2020.3031898","article-title":"Small data challenges in big data era: A survey of recent progress on unsupervised and semi-supervised methods","volume":"44","author":"Qi","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"4037","DOI":"10.1109\/TPAMI.2020.2992393","article-title":"Self-supervised visual feature learning with deep neural networks: A survey","volume":"43","author":"Jing","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"107090","DOI":"10.1016\/j.knosys.2021.107090","article-title":"Review on self-supervised image recognition using deep neural networks","volume":"224","author":"Ohri","year":"2021","journal-title":"Knowl. Based Syst."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Liu, X., Zhang, F., Hou, Z., Mian, L., Wang, Z., Zhang, J., and Tang, J. (2021). Self-supervised learning: Generative or contrastive. IEEE Trans. Knowl. Data Eng.","DOI":"10.1109\/TKDE.2021.3090866"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.A. (2008, January 18\u201324). Extracting and composing robust features with denoising autoencoders. Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland.","DOI":"10.1145\/1390156.1390294"},{"key":"ref_27","unstructured":"Kingma, D.P., and Welling, M. (2014). Auto-Encoding Variational Bayes. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"He, K., Chen, X., Xie, S., Li, Y., Doll\u00e1r, P., and Girshick, R. (2021). Masked Autoencoders Are Scalable Vision Learners. arXiv.","DOI":"10.1109\/CVPR52688.2022.01553"},{"key":"ref_29","unstructured":"Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., and Weinberger, K. (2014). Generative Adversarial Nets. Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_30","unstructured":"Radford, A., Metz, L., and Chintala, S. (2015). Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Doersch, C., Gupta, A., and Efros, A.A. (2015, January 7\u201313). Unsupervised Visual Representation Learning by Context Prediction. Proceedings of the International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.167"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Zhang, R., Isola, P., and Efros, A.A. (2016, January 8\u201314). Colorful Image Colorization. Proceedings of the European Conference on Computer Vision ECCV, Munich, Germany.","DOI":"10.1007\/978-3-319-46487-9_40"},{"key":"ref_33","unstructured":"Gidaris, S., Singh, P., and Komodakis, N. (2018). Unsupervised representation learning by predicting image rotations. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Noroozi, M., and Favaro, P. (2016, January 8\u201314). Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles. Proceedings of the European Conference on Computer Vision ECCV, Munich, Germany.","DOI":"10.1007\/978-3-319-46466-4_5"},{"key":"ref_35","first-page":"766","article-title":"Discriminative unsupervised feature learning with convolutional neural networks","volume":"27","author":"Dosovitskiy","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_36","unstructured":"Jing, L., Vincent, P., LeCun, Y., and Tian, Y. (2021). Understanding dimensional collapse in contrastive self-supervised learning. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Dong, X., and Shen, J. (2018, January 8\u201314). Triplet Loss in Siamese Network for Object Tracking. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01261-8_28"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Wu, Z., Xiong, Y., Yu, S.X., and Lin, D. (2018, January 18\u201323). Unsupervised feature learning via non-parametric instance discrimination. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00393"},{"key":"ref_39","unstructured":"Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. (2020, January 13\u201318). A simple framework for contrastive learning of visual representations. Proceedings of the International Conference on Machine Learning PMLR, Virtual."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R. (2020, January 14\u201319). Momentum Contrast for Unsupervised Visual Representation Learning. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual.","DOI":"10.1109\/CVPR42600.2020.00975"},{"key":"ref_41","first-page":"9912","article-title":"Unsupervised learning of visual features by contrasting cluster assignments","volume":"33","author":"Caron","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1561\/2200000073","article-title":"Computational optimal transport: With applications to data science","volume":"11","author":"Cuturi","year":"2019","journal-title":"Found. Trends Mach. Learn."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Wang, X., Zhang, R., Shen, C., Kong, T., and Li, L. (2021, January 20\u201325). Dense contrastive learning for self-supervised visual pre-training. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00304"},{"key":"ref_44","first-page":"21271","article-title":"Bootstrap your own latent-a new approach to self-supervised learning","volume":"33","author":"Grill","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_45","unstructured":"Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling the knowledge in a neural network. arXiv, 2."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Chen, X., and He, K. (2021, January 20\u201325). Exploring simple siamese representation learning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01549"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Caron, M., Touvron, H., Misra, I., J\u00e9gou, H., Mairal, J., Bojanowski, P., and Joulin, A. (2021, January 11\u201317). Emerging properties in self-supervised vision transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Virtual.","DOI":"10.1109\/ICCV48922.2021.00951"},{"key":"ref_48","unstructured":"Zbontar, J., Jing, L., Misra, I., LeCun, Y., and Deny, S. (2021, January 18\u201324). Barlow twins: Self-supervised learning via redundancy reduction. Proceedings of the International Conference on Machine Learning, Virtual."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"124020","DOI":"10.1088\/1742-5468\/ab3985","article-title":"On the information bottleneck theory of deep learning","volume":"2019","author":"Saxe","year":"2019","journal-title":"J. Stat. Mech. Theory Exp."},{"key":"ref_50","unstructured":"Bardes, A., Ponce, J., and LeCun, Y. (2022). VICReg: Variance-Invariance-Covariance Regularization For Self-Supervised Learning. arXiv."},{"key":"ref_51","unstructured":"Krizhevsky, A. (2022, July 20). Learning Multiple Layers of Features from Tiny Images. Available online: http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.222.9220&rep=rep1&type=pdf."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"2092","DOI":"10.1109\/LGRS.2017.2752750","article-title":"MARTA GANs: Unsupervised Representation Learning for Remote Sensing Image Classification","volume":"14","author":"Lin","year":"2017","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Penatti, O.A., Nogueira, K., and Dos Santos, J.A. (2015, January 7\u201312). Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, MA, USA.","DOI":"10.1109\/CVPRW.2015.7301382"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Stojni\u0107, V., and Risojevi\u0107, V. (2018, January 16\u201319). Evaluation of Split-Brain Autoencoders for High-Resolution Remote Sensing Scene Classification. Proceedings of the 2018 International Symposium ELMAR, Zadar, Croatia.","DOI":"10.23919\/ELMAR.2018.8534634"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Zhang, R., Isola, P., and Efros, A.A. (2017, January 21\u201326). Split-brain autoencoders: Unsupervised learning by cross-channel prediction. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.76"},{"key":"ref_56","first-page":"8004005","article-title":"Remote sensing image scene classification with self-supervised paradigm under limited labeled samples","volume":"19","author":"Tao","year":"2020","journal-title":"IEEE Geosci. Remote. Sens. Lett."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Zhao, Z., Luo, Z., Li, J., Chen, C., and Piao, Y. (2020). When self-supervised learning meets scene classification: Remote sensing scene classification based on a multitask learning framework. Remote Sens., 12.","DOI":"10.3390\/rs12203276"},{"key":"ref_58","unstructured":"Xia, G.S., Yang, W., Delon, J., Gousseau, Y., Sun, H., and Ma\u00eetre, H. (2010, January 5\u20137). Structural high-resolution satellite image indexing. Proceedings of the ISPRS TC VII Symposium-100 Years ISPRS, Vienna, Austria."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"474","DOI":"10.1109\/JSTARS.2020.3036602","article-title":"Self-Supervised Pretraining of Transformers for Satellite Image Time Series Classification","volume":"14","author":"Yuan","year":"2021","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_60","first-page":"4171","article-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding","volume":"Volume 1","author":"Devlin","year":"2019","journal-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies"},{"key":"ref_61","first-page":"6000","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_62","unstructured":"Jean, N., Wang, S., Samar, A., Azzari, G., Lobell, D., and Ermon, S. (February, January 27). Tile2vec: Unsupervised representation learning for spatially distributed data. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1080\/10106049.2011.562309","article-title":"Monitoring US agriculture: The US department of agriculture, national agricultural statistics service, cropland data layer program","volume":"26","author":"Boryan","year":"2011","journal-title":"Geocarto Int."},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1049\/ell2.12108","article-title":"Self-supervised learning with randomised layers for remote sensing","volume":"57","author":"Jung","year":"2021","journal-title":"Electron. Lett."},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Stojnic, V., and Risojevic, V. (2021, January 19\u201325). Self-Supervised Learning of Remote Sensing Scene Representations Using Contrastive Multiview Coding. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN, USA.","DOI":"10.1109\/CVPRW53098.2021.00129"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Ayush, K., Uzkent, B., Meng, C., Tanmay, K., Burke, M., Lobell, D., and Ermon, S. (2021, January 11\u201317). Geography-aware self-supervised learning. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Virtual.","DOI":"10.1109\/ICCV48922.2021.01002"},{"key":"ref_67","unstructured":"Chen, X., Fan, H., Girshick, R., and He, K. (2020). Improved baselines with momentum contrastive learning. arXiv."},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Ma\u00f1as, O., Lacoste, A., Gir\u00f3-i Nieto, X., Vazquez, D., and Rodr\u00edguez, P. (2021, January 10). Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00928"},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Daudt, R.C., Le Saux, B., Boulch, A., and Gousseau, Y. (2018, January 22\u201327). Urban change detection for multi-spectral earth observation using convolutional neural networks. Proceedings of the IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain.","DOI":"10.1109\/IGARSS.2018.8518015"},{"key":"ref_70","first-page":"8010105","article-title":"Contrastive Self-Supervised Learning With Smoothed Representation for Remote Sensing","volume":"19","author":"Jung","year":"2021","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_71","unstructured":"Lam, D., Kuzma, R., McGee, K., Dooley, S., Laielli, M., Klaric, M., Bulatov, Y., and McCord, B. (2018). xview: Objects in context in overhead imagery. arXiv."},{"key":"ref_72","doi-asserted-by":"crossref","unstructured":"Tao, C., Qia, J., Zhang, G., Zhu, Q., Lu, W., and Li, H. (2022). TOV: The Original Vision Model for Optical Remote Sensing Image Understanding via Self-supervised Learning. arXiv.","DOI":"10.1109\/JSTARS.2023.3271312"},{"key":"ref_73","unstructured":"Miller, G.A. (1998). WordNet: An Electronic Lexical Database, MIT Press."},{"key":"ref_74","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1016\/j.isprsjprs.2018.01.004","article-title":"PatternNet: A benchmark dataset for performance evaluation of remote sensing image retrieval","volume":"145","author":"Zhou","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_75","doi-asserted-by":"crossref","first-page":"705","DOI":"10.5194\/isprs-annals-V-3-2022-705-2022","article-title":"Contrastive self-supervised data fusion for satellite imagery","volume":"3","author":"Scheibenreif","year":"2022","journal-title":"ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci."},{"key":"ref_76","doi-asserted-by":"crossref","first-page":"5866","DOI":"10.1109\/TGRS.2020.3024744","article-title":"Multisensor Data Fusion for Cloud Removal in Global and All-Season Sentinel-2 Imagery","volume":"59","author":"Ebel","year":"2020","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_77","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1109\/MGRS.2020.3033515","article-title":"Report on the 2020 IEEE GRSS data fusion contest-global land cover mapping with weak supervision [technical committees]","volume":"8","author":"Yokoya","year":"2020","journal-title":"IEEE Geosci. Remote Sens. Mag."},{"key":"ref_78","unstructured":"Windsor, R., Jamaludin, A., Kadir, T., and Zisserman, A. (October, January 27). Self-supervised multi-modal alignment for whole body medical imaging. Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Strasbourg, France."},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Scheibenreif, L., Hanna, J., Mommert, M., and Borth, D. (2022, January 21\u201324). Self-Supervised Vision Transformers for Land-Cover Segmentation and Classification. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPRW56347.2022.00148"},{"key":"ref_80","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Virtual.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_81","doi-asserted-by":"crossref","first-page":"6509805","DOI":"10.1109\/LGRS.2022.3173419","article-title":"Spatial-temporal Invariant Contrastive Learning for Remote Sensing Scene Classification","volume":"19","author":"Huang","year":"2022","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_82","first-page":"4204","article-title":"Mapping estimation for discrete optimal transport","volume":"29","author":"Perrot","year":"2016","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_83","unstructured":"Rubner, Y., Tomasi, C., and Guibas, L. (1998, January 7). A metric for distributions with applications to image databases. Proceedings of the Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271), Bombay, India."},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Zheng, X., Kellenberger, B., Gong, R., Hajnsek, I., and Tuia, D. (2021, January 11\u201317). Self-Supervised Pretraining and Controlled Augmentation Improve Rare Wildlife Recognition in UAV Images. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Virtual.","DOI":"10.1109\/ICCVW54120.2021.00087"},{"key":"ref_85","doi-asserted-by":"crossref","unstructured":"Wang, X., Liu, Z., and Yu, S.X. (2021, January 20\u201325). Unsupervised Feature Learning by Cross-Level Instance-Group Discrimination. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Virtual.","DOI":"10.1109\/CVPR46437.2021.01240"},{"key":"ref_86","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1016\/j.rse.2018.06.028","article-title":"Detecting mammals in UAV images: Best practices to address a substantially imbalanced dataset with deep learning","volume":"216","author":"Kellenberger","year":"2018","journal-title":"Remote Sens. Environ."},{"key":"ref_87","doi-asserted-by":"crossref","first-page":"106510","DOI":"10.1016\/j.compag.2021.106510","article-title":"Self-supervised contrastive learning on agricultural images","volume":"191","author":"Nalpantidis","year":"2021","journal-title":"Comput. Electron. Agric."},{"key":"ref_88","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015, January 7\u201313). Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.123"},{"key":"ref_89","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-018-38343-3","article-title":"DeepWeeds: A multiclass weed species image dataset for deep learning","volume":"9","author":"Olsen","year":"2019","journal-title":"Sci. Rep."},{"key":"ref_90","doi-asserted-by":"crossref","unstructured":"Chiu, M.T., Xu, X., Wei, Y., Huang, Z., Schwing, A.G., Brunner, R., Khachatrian, H., Karapetyan, H., Dozier, I., and Rose, G. (2020, January 13\u201319). Agriculture-vision: A large aerial image database for agricultural pattern analysis. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00290"},{"key":"ref_91","unstructured":"Risojevi\u0107, V., and Stojni\u0107, V. (2021). The role of pre-training in high-resolution remote sensing scene classification. arXiv."},{"key":"ref_92","doi-asserted-by":"crossref","first-page":"2508","DOI":"10.1109\/JSTARS.2021.3056883","article-title":"Self-supervised GANs with similarity loss for remote sensing image scene classification","volume":"14","author":"Guo","year":"2021","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_93","doi-asserted-by":"crossref","unstructured":"Chen, T., Zhai, X., Ritter, M., Lucic, M., and Houlsby, N. (2019, January 16\u201317). Self-supervised gans via auxiliary rotation loss. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01243"},{"key":"ref_94","unstructured":"Zhang, H., Goodfellow, I., Metaxas, D., and Odena, A. (2019, January 9\u201315). Self-attention generative adversarial networks. Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA."},{"key":"ref_95","doi-asserted-by":"crossref","unstructured":"Jain, P., Schoen-Phelan, B., and Ross, R. (2022). Self-Supervised Learning for Invariant Representations from Multi-Spectral and SAR Images. arXiv.","DOI":"10.1109\/JSTARS.2022.3204888"},{"key":"ref_96","doi-asserted-by":"crossref","unstructured":"Wang, Y., Albrecht, C.M., and Zhu, X.X. (2022). Self-supervised Vision Transformers for Joint SAR-optical Representation Learning. arXiv.","DOI":"10.1109\/IGARSS46834.2022.9883983"},{"key":"ref_97","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1109\/MGRS.2021.3089174","article-title":"BigEarthNet-MM: A Large-Scale, Multimodal, Multilabel Benchmark Archive for Remote Sensing Image Classification and Retrieval [Software and Datasets]","volume":"9","author":"Sumbul","year":"2021","journal-title":"IEEE Geosci. Remote Sens. Mag."},{"key":"ref_98","unstructured":"Goodfellow, I.J., Shlens, J., and Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv."},{"key":"ref_99","first-page":"16199","article-title":"Robust pre-training by adversarial contrastive learning","volume":"33","author":"Jiang","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_100","first-page":"2983","article-title":"Adversarial self-supervised contrastive learning","volume":"33","author":"Kim","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_101","doi-asserted-by":"crossref","unstructured":"Xu, Y., Sun, H., Chen, J., Lei, L., Kuang, G., and Ji, K. (2021, January 11\u201316). Robust remote sensing scene classification by adversarial self-supervised learning. Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium.","DOI":"10.1109\/IGARSS47720.2021.9553824"},{"key":"ref_102","unstructured":"Patel, C., Sharma, S., and Gulshan, V. (2021). Evaluating Self and Semi-Supervised Methods for Remote Sensing Segmentation Tasks. arXiv."},{"key":"ref_103","doi-asserted-by":"crossref","unstructured":"Wang, Y., Albrecht, C.M., Braham, N.A.A., Mou, L., and Zhu, X.X. (2022). Self-supervised Learning in Remote Sensing: A Review. arXiv.","DOI":"10.1109\/MGRS.2022.3198244"},{"key":"ref_104","unstructured":"Neumann, M., Pinto, A.S., Zhai, X., and Houlsby, N. (2019). In-domain representation learning for remote sensing. arXiv."},{"key":"ref_105","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1177\/001316446002000104","article-title":"A coefficient of agreement for nominal scales","volume":"20","author":"Cohen","year":"1960","journal-title":"Educ. Psychol. Meas."},{"key":"ref_106","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Nevada, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_107","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 17\u201323). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA."},{"key":"ref_108","first-page":"8026","article-title":"Pytorch: An imperative style, high-performance deep learning library","volume":"32","author":"Paszke","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_109","first-page":"857","article-title":"Stochastic neighbor embedding","volume":"15","author":"Hinton","year":"2002","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_110","unstructured":"Shen, K., Jones, R., Kumar, A., Xie, S.M., HaoChen, J.Z., Ma, T., and Liang, P. (2022, January 17\u201323). Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation. Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/16\/3995\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:10:55Z","timestamp":1760141455000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/16\/3995"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,17]]},"references-count":110,"journal-issue":{"issue":"16","published-online":{"date-parts":[[2022,8]]}},"alternative-id":["rs14163995"],"URL":"https:\/\/doi.org\/10.3390\/rs14163995","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,17]]}}}