{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,25]],"date-time":"2026-02-25T00:35:52Z","timestamp":1771979752423,"version":"3.50.1"},"reference-count":97,"publisher":"MDPI AG","issue":"22","license":[{"start":{"date-parts":[[2022,11,10]],"date-time":"2022-11-10T00:00:00Z","timestamp":1668038400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Chang Jiang Scholars Program","award":["T2012122"],"award-info":[{"award-number":["T2012122"]}]},{"name":"Chang Jiang Scholars Program","award":["B0201"],"award-info":[{"award-number":["B0201"]}]},{"name":"Chang Jiang Scholars Program","award":["2018-JCJQ-ZQ-046"],"award-info":[{"award-number":["2018-JCJQ-ZQ-046"]}]},{"name":"Chang Jiang Scholars Program","award":["62101046"],"award-info":[{"award-number":["62101046"]}]},{"name":"Chang Jiang Scholars Program","award":["62136001"],"award-info":[{"award-number":["62136001"]}]},{"name":"Civil Aviation Program","award":["T2012122"],"award-info":[{"award-number":["T2012122"]}]},{"name":"Civil Aviation Program","award":["B0201"],"award-info":[{"award-number":["B0201"]}]},{"name":"Civil Aviation Program","award":["2018-JCJQ-ZQ-046"],"award-info":[{"award-number":["2018-JCJQ-ZQ-046"]}]},{"name":"Civil Aviation Program","award":["62101046"],"award-info":[{"award-number":["62101046"]}]},{"name":"Civil Aviation Program","award":["62136001"],"award-info":[{"award-number":["62136001"]}]},{"name":"Space-based on orbit real-time processing technology program","award":["T2012122"],"award-info":[{"award-number":["T2012122"]}]},{"name":"Space-based on orbit real-time processing technology program","award":["B0201"],"award-info":[{"award-number":["B0201"]}]},{"name":"Space-based on orbit real-time processing technology program","award":["2018-JCJQ-ZQ-046"],"award-info":[{"award-number":["2018-JCJQ-ZQ-046"]}]},{"name":"Space-based on orbit real-time processing technology program","award":["62101046"],"award-info":[{"award-number":["62101046"]}]},{"name":"Space-based on orbit real-time processing technology program","award":["62136001"],"award-info":[{"award-number":["62136001"]}]},{"name":"National Science Foundation for Young Scientists of China","award":["T2012122"],"award-info":[{"award-number":["T2012122"]}]},{"name":"National Science Foundation for Young Scientists of China","award":["B0201"],"award-info":[{"award-number":["B0201"]}]},{"name":"National Science Foundation for Young Scientists of China","award":["2018-JCJQ-ZQ-046"],"award-info":[{"award-number":["2018-JCJQ-ZQ-046"]}]},{"name":"National Science Foundation for Young Scientists of China","award":["62101046"],"award-info":[{"award-number":["62101046"]}]},{"name":"National Science Foundation for Young Scientists of China","award":["62136001"],"award-info":[{"award-number":["62136001"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["T2012122"],"award-info":[{"award-number":["T2012122"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["B0201"],"award-info":[{"award-number":["B0201"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of 
China","doi-asserted-by":"publisher","award":["2018-JCJQ-ZQ-046"],"award-info":[{"award-number":["2018-JCJQ-ZQ-046"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62101046"],"award-info":[{"award-number":["62101046"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62136001"],"award-info":[{"award-number":["62136001"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Currently, under supervised learning, a model pre-trained by a large-scale nature scene dataset and then fine-tuned on a few specific task labeling data is the paradigm that has dominated knowledge transfer learning. Unfortunately, due to different categories of imaging data and stiff challenges of data annotation, there is not a large enough and uniform remote sensing dataset to support large-scale pre-training in the remote sensing domain (RSD). Moreover, pre-training models on large-scale nature scene datasets by supervised learning and then directly fine-tuning on diverse downstream tasks seems to be a crude method, which is easily affected by inevitable incorrect labeling, severe domain gaps and task-aware discrepancies. Thus, in this paper, considering the self-supervised pre-training and powerful vision transformer (ViT) architecture, a concise and effective knowledge transfer learning strategy called ConSecutive Pre-Training (CSPT) is proposed based on the idea of not stopping pre-training in natural language processing (NLP), which can gradually bridge the domain gap and transfer large-scale data knowledge to any specific domain (e.g., from nature scene domain to RSD) In addition, the proposed CSPT also can release the huge potential of unlabeled data for task-aware model training. Finally, extensive experiments were carried out on twelve remote sensing datasets involving three types of downstream tasks (e.g., scene classification, object detection and land cover classification) and two types of imaging data (e.g., optical and synthetic aperture radar (SAR)). 
The results show that by utilizing the proposed CSPT for task-aware model training, almost all downstream tasks in the RSD can outperform the previous knowledge transfer learning strategies based on model pre-training without any expensive manually labeling and even surpass the state-of-the-art (SOTA) performance without any careful network architecture designing.<\/jats:p>","DOI":"10.3390\/rs14225675","type":"journal-article","created":{"date-parts":[[2022,11,10]],"date-time":"2022-11-10T21:33:02Z","timestamp":1668115982000},"page":"5675","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":41,"title":["Consecutive Pre-Training: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain"],"prefix":"10.3390","volume":"14","author":[{"given":"Tong","family":"Zhang","sequence":"first","affiliation":[{"name":"Beijing Key Laboratory of Embedded Real-Time Information Processing Technology, Beijing Institute of Technology, Beijing 100081, China"}]},{"given":"Peng","family":"Gao","sequence":"additional","affiliation":[{"name":"Shang Hai AI Laboratory, Shanghai 100024, China"}]},{"given":"Hao","family":"Dong","sequence":"additional","affiliation":[{"name":"Center on Frontiers of Computing Studies (CFCS), School of Computer Science (CS), Peking University, Beijing 100871, China"}]},{"given":"Yin","family":"Zhuang","sequence":"additional","affiliation":[{"name":"Beijing Key Laboratory of Embedded Real-Time Information Processing Technology, Beijing Institute of Technology, Beijing 100081, China"}]},{"given":"Guanqun","family":"Wang","sequence":"additional","affiliation":[{"name":"Beijing Key Laboratory of Embedded Real-Time Information Processing Technology, Beijing Institute of Technology, Beijing 100081, China"}]},{"given":"Wei","family":"Zhang","sequence":"additional","affiliation":[{"name":"Advanced Research Institute of Multidisciplinary Sciences, Beijing Institute of Technology, Beijing 100081, China"}]},{"given":"He","family":"Chen","sequence":"additional","affiliation":[{"name":"Beijing Key Laboratory of Embedded Real-Time Information Processing Technology, Beijing Institute of Technology, Beijing 100081, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,11,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"958","DOI":"10.1080\/19475705.2018.1524400","article-title":"Seismic vulnerability assessment at urban scale using data mining and GIScience technology: Application to Urumqi (China)","volume":"10","author":"Liu","year":"2019","journal-title":"Geomat. Nat. Hazards Risk"},{"key":"ref_2","first-page":"63","article-title":"Urban planning and building smart cities based on the Internet of Things using Big Data analytics","volume":"101","author":"Rathore","year":"2016","journal-title":"Comput. Netw. Int. J. Comput. Telecommun. Netw."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Ozdarici-Ok, A., Ok, A.O., and Schindler, K. (2015). Mapping of Agricultural Crops from Single High-Resolution Multispectral Images\u2014Data-Driven Smoothing vs. Parcel-Based Smoothing. Remote Sens., 7.","DOI":"10.3390\/rs70505611"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1016\/j.compind.2018.03.014","article-title":"Real-time object detection in agricultural\/remote environments using the multiple-expert colour feature extreme learning machine (MEC-ELM)","volume":"98","author":"Sadgrove","year":"2018","journal-title":"Comput. 
Ind."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"186","DOI":"10.1007\/978-3-642-15558-1_14","article-title":"Detection and tracking of large number of targets in wide area surveillance","volume":"Volume 6313","author":"Daniilidis","year":"2010","journal-title":"Computer Vision\u2014ECCV 2010. ECCV 2010"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., and Darrell, T. (2014, January 3\u20137). Caffe: Convolutional architecture for fast feature embedding. Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA.","DOI":"10.1145\/2647868.2654889"},{"key":"ref_7","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"Imagenet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. ACM"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Xie, S., Girshick, R., Doll\u00e1r, P., Tu, Z., and He, K. (2017, January 21\u201326). Aggregated residual transformations for deep neural networks. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.634"},{"key":"ref_12","unstructured":"Zhang, H., Wu, C., Zhang, Z., Zhu, Y., Lin, H., Zhang, Z., Sun, Y., He, T., Mueller, J., and Manmatha, R. (2020). Resnest: Split-attention networks. arXiv."},{"key":"ref_13","unstructured":"Tan, M., and Le, Q. (2019, January 9\u201315). Efficientnet: Rethinking model scaling for convolutional neural networks. Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA."},{"key":"ref_14","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhang, Q., Xu, Y., Zhang, J., and Tao, D. (2022). Vitaev2: Vision transformer advanced by exploring inductive bias for image recognition and beyond. 
arXiv.","DOI":"10.1007\/s11263-022-01739-w"},{"key":"ref_17","unstructured":"Gao, P., Lu, J., Li, H., Mottaghi, R., and Kembhavi, A. (2021). Container: Context aggregation network. arXiv."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"ImageNet Large Scale Visual Recognition Challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis. (IJCV)"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"3965","DOI":"10.1109\/TGRS.2017.2685945","article-title":"AID: A Benchmark Data Set for Performance Evaluation of Aerial Scene Classification","volume":"55","author":"Xia","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1865","DOI":"10.1109\/JPROC.2017.2675998","article-title":"Remote Sensing Image Scene Classification: Benchmark and State of the Art","volume":"105","author":"Gong","year":"2017","journal-title":"Proc. IEEE"},{"key":"ref_21","first-page":"4","article-title":"ISPRS semantic labeling contest","volume":"1","author":"Rottensteiner","year":"2014","journal-title":"ISPRS Leopoldsh\u00f6he Ger."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"111322","DOI":"10.1016\/j.rse.2019.111322","article-title":"Land-cover classification with high-resolution remote sensing images using transferable deep models","volume":"237","author":"Tong","year":"2020","journal-title":"Remote Sens. Environ."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1016\/j.isprsjprs.2019.11.023","article-title":"Object Detection in Optical Remote Sensing Images: A Survey and A New Benchmark","volume":"159","author":"Li","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1016\/j.isprsjprs.2014.10.002","article-title":"Multi-class geospatial object detection and geographic image classification based on collection of part detectors","volume":"98","author":"Cheng","year":"2014","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Zhu, H., Chen, X., Dai, W., Fu, K., Ye, Q., and Jiao, J. (2015, January 27\u201330). Orientation robust object detection in aerial images using deep convolutional neural network. Proceedings of the IEEE International Conference on Image Processing, Quebec City, QC, Canada.","DOI":"10.1109\/ICIP.2015.7351502"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"324","DOI":"10.5220\/0006120603240331","article-title":"A high resolution optical satellite image dataset for ship recognition and some new baselines","volume":"Volume 2","author":"Liu","year":"2017","journal-title":"International Conference on Pattern Recognition Applications and Methods"},{"key":"ref_27","first-page":"566","article-title":"Standard SAR ATR evaluation experiments using the MSTAR public release data set","volume":"Volume 3370","author":"Ross","year":"1998","journal-title":"Algorithms for Synthetic Aperture Radar Imagery"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"120234","DOI":"10.1109\/ACCESS.2020.3005861","article-title":"HRSID: A high-resolution SAR images dataset for ship detection and instance segmentation","volume":"8","author":"Wei","year":"2020","journal-title":"IEEE Access"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Li, J., Qu, C., and Shao, J. (2017, January 13\u201314). 
Ship detection in SAR images based on an improved faster R-CNN. Proceedings of the 2017 SAR in Big Data Era: Models, Methods and Applications (BIGSARDATA), Beijing, China.","DOI":"10.1109\/BIGSARDATA.2017.8124934"},{"key":"ref_30","unstructured":"Long, Y., Xia, G.S., Zhang, L., Cheng, G., and Li, D. (2022). Aerial Scene Parsing: From Tile-level Scene Classification to Pixel-wise Semantic Labeling. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Ranjan, P., Patil, S., and Ansari, R.A. (2020, January 10\u201313). Building Footprint Extraction from Aerial Images using Multiresolution Analysis Based Transfer Learning. Proceedings of the 2020 IEEE 17th India Council International Conference (INDICON), New Delhi, India.","DOI":"10.1109\/INDICON49873.2020.9342581"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1016\/j.isprsjprs.2017.12.007","article-title":"Semantic labeling in very high resolution images via a self-cascaded convolutional neural network","volume":"145","author":"Liu","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Chen, Z., Zhang, T., and Ouyang, C. (2018). End-to-end airplane detection using transfer learning in remote sensing images. Remote Sens., 10.","DOI":"10.3390\/rs10010139"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Wang, D., Zhang, J., Du, B., Xia, G.S., and Tao, D. (2022). An Empirical Study of Remote Sensing Pre-Training. arXiv.","DOI":"10.1109\/TGRS.2022.3176603"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1007\/s11263-014-0733-5","article-title":"The pascal visual object classes challenge: A retrospective","volume":"111","author":"Everingham","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1452","DOI":"10.1109\/TPAMI.2017.2723009","article-title":"Places: A 10 million Image Database for Scene Recognition","volume":"40","author":"Zhou","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., and Zitnick, C.L. (2014). Microsoft COCO: Common Objects in Context. Computer Vision\u2013ECCV 2014. ECCV 2014, Springer. Lecture Notes in Computer Science.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_38","unstructured":"Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., and Clark, J. (2021, January 18\u201321). Learning transferable visual models from natural language supervision. Proceedings of the 38th International Conference on Machine Learning, Virtual Event."},{"key":"ref_39","unstructured":"Chakraborty, S., Uzkent, B., Ayush, K., Tanmay, K., Sheehan, E., and Ermon, S. (2020). Efficient conditional pre-training for transfer learning. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Ericsson, L., Gouk, H., and Hospedales, T.M. (2021, January 20\u201325). How well do self-supervised models transfer?. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00537"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Kotar, K., Ilharco, G., Schmidt, L., Ehsani, K., and Mottaghi, R. (2021, January 20\u201325). Contrasting contrastive self-supervised representation learning pipelines. 
Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/ICCV48922.2021.00980"},{"key":"ref_42","unstructured":"Asano, Y.M., Rupprecht, C., Zisserman, A., and Vedaldi, A. (2021). PASS: An ImageNet replacement for self-supervised pre-training without humans. arXiv."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Stojnic, V., and Risojevic, V. (2021, January 20\u201325). Self-supervised learning of remote sensing scene representations using contrastive multiview coding. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPRW53098.2021.00129"},{"key":"ref_44","first-page":"1","article-title":"Global and local contrastive self-supervised learning for semantic segmentation of HR remote sensing images","volume":"60","author":"Li","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_45","first-page":"1","article-title":"Remote sensing image scene classification with self-supervised paradigm under limited labeled samples","volume":"19","author":"Tao","year":"2020","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_46","first-page":"1","article-title":"Geographical Knowledge-Driven Representation Learning for Remote Sensing Images","volume":"60","author":"Li","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Manas, O., Lacoste, A., Giro-i Nieto, X., Vazquez, D., and Rodriguez, P. (2021, January 20\u201325). Seasonal contrast: Unsupervised pre-training from uncurated remote sensing data. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/ICCV48922.2021.00928"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Reed, C.J., Yue, X., Nrusimha, A., Ebrahimi, S., Vijaykumar, V., Mao, R., Li, B., Zhang, S., Guillory, D., and Metzger, S. (2022, January 3\u20138). Self-supervised pre-training improves self-supervised pre-training. Proceedings of the 2022 IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA.","DOI":"10.1109\/WACV51458.2022.00112"},{"key":"ref_49","first-page":"22243","article-title":"Big self-supervised models are strong semi-supervised learners","volume":"33","author":"Chen","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R. (2020, January 13\u201319). Momentum contrast for unsupervised visual representation learning. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00975"},{"key":"ref_51","first-page":"21271","article-title":"Bootstrap your own latent-a new approach to self-supervised learning","volume":"33","author":"Grill","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_52","first-page":"9912","article-title":"Unsupervised learning of visual features by contrasting cluster assignments","volume":"33","author":"Caron","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"He, K., Chen, X., Xie, S., Li, Y., Doll\u00e1r, P., and Girshick, R. (2021). Masked autoencoders are scalable vision learners. 
arXiv.","DOI":"10.1109\/CVPR52688.2022.01553"},{"key":"ref_54","unstructured":"Bao, H., Dong, L., and Wei, F. (2021). Beit: Bert pre-training of image transformers. arXiv."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Xie, Z., Zhang, Z., Cao, Y., Lin, Y., Bao, J., Yao, Z., Dai, Q., and Hu, H. (2021). Simmim: A simple framework for masked image modeling. arXiv.","DOI":"10.1109\/CVPR52688.2022.00943"},{"key":"ref_56","unstructured":"Gao, P., Ma, T., Li, H., Dai, J., and Qiao, Y. (2022). ConvMAE: Masked Convolution Meets Masked Autoencoders. arXiv."},{"key":"ref_57","unstructured":"Zhang, R., Guo, Z., Gao, P., Fang, R., Zhao, B., Wang, D., Qiao, Y., and Li, H. (2022). Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training. arXiv."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Gururangan, S., Marasovi\u0107, A., Swayamdipta, S., Lo, K., Beltagy, I., Downey, D., and Smith, N.A. (2020). Don\u2019t stop pre-training: Adapt language models to domains and tasks. arXiv.","DOI":"10.18653\/v1\/2020.acl-main.740"},{"key":"ref_59","unstructured":"Dery, L.M., Michel, P., Talwalkar, A., and Neubig, G. (2021). Should we be pre-training? an argument for end-task aware training as an alternative. arXiv."},{"key":"ref_60","unstructured":"Anand, M., and Garg, A. (2021). Recent advancements in self-supervised paradigms for visual feature representation. arXiv."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Ericsson, L., Gouk, H., Loy, C.C., and Hospedales, T.M. (2021). Self-Supervised Representation Learning: Introduction, Advances and Challenges. arXiv.","DOI":"10.1109\/MSP.2021.3134634"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Tao, C., Qia, J., Zhang, G., Zhu, Q., Lu, W., and Li, H. (2022). TOV: The Original Vision Model for Optical Remote Sensing Image Understanding via Self-supervised Learning. arXiv.","DOI":"10.1109\/JSTARS.2023.3271312"},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Xu, Y., Sun, H., Chen, J., Lei, L., Ji, K., and Kuang, G. (2021). Adversarial Self-Supervised Learning for Robust SAR Target Recognition. Remote Sens., 13.","DOI":"10.3390\/rs13204158"},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Ayush, K., Uzkent, B., Meng, C., Tanmay, K., Burke, M., Lobell, D., and Ermon, S. (2021, January 20\u201325). Geography-aware self-supervised learning. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/ICCV48922.2021.01002"},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Wang, L., Liang, F., Li, Y., Ouyang, W., Zhang, H., and Shao, J. (2022). RePre: Improving Self-Supervised Vision Transformer with Reconstructive Pre-training. arXiv.","DOI":"10.24963\/ijcai.2022\/200"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Wang, D., Zhang, Q., Xu, Y., Zhang, J., Du, B., Tao, D., and Zhang, L. (2022). Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model. arXiv.","DOI":"10.1109\/TGRS.2022.3222818"},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Zhou, L., Liu, H., Bae, J., He, J., Samaras, D., and Prasanna, P. (2022). Self Pre-training with Masked Autoencoders for Medical Image Analysis. arXiv.","DOI":"10.1109\/ISBI53787.2023.10230477"},{"key":"ref_68","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. 
arXiv."},{"key":"ref_69","unstructured":"Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., and Askell, A. (2020). Language models are few-shot learners. arXiv."},{"key":"ref_70","unstructured":"Loshchilov, I., and Hutter, F. (2017). Decoupled weight decay regularization. arXiv."},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Dollar, P., and Girshick, R. (2017, January 22\u201329). Mask R-CNN. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_72","unstructured":"Chen, K., Wang, J., Pang, J., Cao, Y., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., and Xu, J. (2019). MMDetection: Open MMLab Detection Toolbox and Benchmark. arXiv."},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Xiao, T., Liu, Y., Zhou, B., Jiang, Y., and Sun, J. (2018, January 8\u201314). Unified perceptual parsing for scene understanding. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01228-1_26"},{"key":"ref_74","unstructured":"Contributors, M. (2022, January 22). MMSegmentation: OpenMMLab Semantic Segmentation Toolbox and Benchmark. Available online: https:\/\/github.com\/open-mmlab\/mmsegmentation."},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Xia, G.S., Bai, X., Ding, J., Zhu, Z., Belongie, S., Luo, J., Datcu, M., Pelillo, M., and Zhang, L. (2018, January 18\u201323). DOTA: A large-scale dataset for object detection in aerial images. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00418"},{"key":"ref_76","doi-asserted-by":"crossref","first-page":"4121","DOI":"10.1109\/JSTARS.2020.3009352","article-title":"Channel-Attention-Based DenseNet Network for Remote Sensing Image Scene Classification","volume":"13","author":"Tong","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_77","doi-asserted-by":"crossref","first-page":"1926","DOI":"10.1109\/LGRS.2020.3011405","article-title":"Remote sensing image scene classification based on an enhanced attention module","volume":"18","author":"Zhao","year":"2020","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_78","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1109\/TIP.2021.3127851","article-title":"Remote Sensing Scene Classification via Multi-Branch Local Attention Network","volume":"31","author":"Chen","year":"2021","journal-title":"IEEE Trans. Image Process."},{"key":"ref_79","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TGRS.2022.3230378","article-title":"Embedded Self-Distillation in Compact Multi-Branch Ensemble Network for Remote Sensing Scene Classification","volume":"60","author":"Zhao","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_80","doi-asserted-by":"crossref","first-page":"9530","DOI":"10.1109\/JSTARS.2021.3109661","article-title":"A Multiscale Attention Network for Remote Sensing Scene Images Classification","volume":"14","author":"Zhang","year":"2021","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_81","doi-asserted-by":"crossref","first-page":"5396","DOI":"10.1109\/TIP.2020.2983560","article-title":"Multi-Granularity Canonical Appearance Pooling for Remote Sensing Scene Classification","volume":"29","author":"Wang","year":"2020","journal-title":"IEEE Trans. 
Image Process."},{"key":"ref_82","doi-asserted-by":"crossref","first-page":"9768","DOI":"10.1109\/JSTARS.2021.3114404","article-title":"Best representation branch model for remote sensing image scene classification","volume":"14","author":"Zhang","year":"2021","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_83","doi-asserted-by":"crossref","first-page":"8077","DOI":"10.1109\/TGRS.2020.2987060","article-title":"High-Resolution Remote Sensing Image Scene Classification via Key Filter Bank Based on Convolutional Neural Network","volume":"58","author":"Li","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Chen, X., Xie, S., and He, K. (2021, January 11\u201317). An empirical study of training self-supervised visual transformers. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00950"},{"key":"ref_85","doi-asserted-by":"crossref","unstructured":"Guo, Y., Xu, M., Li, J., Ni, B., Zhu, X., Sun, Z., and Xu, Y. (2022). HCSC: Hierarchical Contrastive Selective Coding. arXiv.","DOI":"10.1109\/CVPR52688.2022.00948"},{"key":"ref_86","doi-asserted-by":"crossref","unstructured":"Peng, X., Wang, K., Zhu, Z., and You, Y. (2022). Crafting Better Contrastive Views for Siamese Representation Learning. arXiv.","DOI":"10.1109\/CVPR52688.2022.01556"},{"key":"ref_87","first-page":"1","article-title":"When CNNs meet vision transformer: A joint framework for remote sensing scene classification","volume":"19","author":"Deng","year":"2021","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_88","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_89","unstructured":"Jocher, G., Chaurasia, A., Stoken, A., Borovec, J., Kwon, Y., Michael, K., Fang, J. (2022). ultralytics\/yolov5: V6.2\u2014YOLOv5 Classification Models, Apple M1, Reproducibility, ClearML and Deci.ai Integrations. Available online: https:\/\/doi.org\/10.5281\/zenodo.7002879."},{"key":"ref_90","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201323). Path aggregation network for instance segmentation. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_91","unstructured":"Zhou, X., Wang, D., and Kr\u00e4henb\u00fchl, P. (2019). Objects as points. arXiv."},{"key":"ref_92","first-page":"1","article-title":"Multiscale Semantic Fusion-Guided Fractal Convolutional Object Detection Network for Optical Remote Sensing Imagery","volume":"60","author":"Zhang","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_93","doi-asserted-by":"crossref","first-page":"2148","DOI":"10.1109\/JSTARS.2020.3046482","article-title":"Cross-layer attention network for small object detection in remote sensing imagery","volume":"14","author":"Li","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_94","first-page":"1","article-title":"FSoD-Net: Full-scale object detection from optical remote sensing imagery","volume":"60","author":"Wang","year":"2021","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_95","doi-asserted-by":"crossref","unstructured":"Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018, January 8\u201314). Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"ref_96","doi-asserted-by":"crossref","unstructured":"Cao, Y., Xu, J., Lin, S., Wei, F., and Hu, H. (2019\u20132, January 27). Gcnet: Non-local networks meet squeeze-excitation networks and beyond. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCVW.2019.00246"},{"key":"ref_97","doi-asserted-by":"crossref","unstructured":"Chen, F., Liu, H., Zeng, Z., Zhou, X., and Tan, X. (2022). BES-Net: Boundary Enhancing Semantic Context Network for High-Resolution Image Semantic Segmentation. Remote Sens., 14.","DOI":"10.3390\/rs14071638"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/22\/5675\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:13:54Z","timestamp":1760145234000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/22\/5675"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,10]]},"references-count":97,"journal-issue":{"issue":"22","published-online":{"date-parts":[[2022,11]]}},"alternative-id":["rs14225675"],"URL":"https:\/\/doi.org\/10.3390\/rs14225675","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,11,10]]}}}