{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T15:50:09Z","timestamp":1778601009125,"version":"3.51.4"},"reference-count":43,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2021,4,10]],"date-time":"2021-04-10T00:00:00Z","timestamp":1618012800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["41701508"],"award-info":[{"award-number":["41701508"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>In recent years, fully supervised object detection methods in remote sensing images with good performance have been developed. However, this approach requires a large number of instance-level annotated samples that are relatively expensive to acquire. Therefore, weakly supervised learning using only image-level annotations has attracted much attention. Most of the weakly supervised object detection methods are based on multi-instance learning methods, and their performance depends on the process of scoring the candidate region proposals during training. In this process, the use of only image-level labels for supervision usually cannot obtain optimal results due to the lack of location information of the object. To address the above problem, a dynamic sample pseudo-label generation framework is proposed to generate pseudo-labels for each proposal without additional annotations. First, we propose the pseudo-label generation algorithm (PLG) to generate the category labels of the proposal by using the localization information of the object. 
Specifically, we propose to use the pixel average of the object\u2019s localization map in the proposal as the proposal category confidence and calculate the pseudo-label by comparing the proposal category confidence with the preset threshold. In addition, an effective adaptive threshold selection strategy is designed to eliminate the effect of different category shape differences in computing sample pseudo-labels. Comparative experiments on the NWPU VHR-10 dataset demonstrate that our method can significantly improve the detection performance compared to existing methods.<\/jats:p>","DOI":"10.3390\/rs13081461","type":"journal-article","created":{"date-parts":[[2021,4,12]],"date-time":"2021-04-12T05:52:00Z","timestamp":1618206720000},"page":"1461","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":35,"title":["Dynamic Pseudo-Label Generation for Weakly Supervised Object Detection in Remote Sensing Images"],"prefix":"10.3390","volume":"13","author":[{"given":"Hui","family":"Wang","sequence":"first","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Network Information System Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"University of Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100190, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hao","family":"Li","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Network Information System Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wanli","family":"Qian","sequence":"additional","affiliation":[{"name":"Georgia Institute of Technology College of Computing, Atlanta, GA 30318, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wenhui","family":"Diao","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Network Information System Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Liangjin","family":"Zhao","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Network Information System Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jinghua","family":"Zhang","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Network Information System Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daobing","family":"Zhang","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Network Information System Technology, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,4,10]]},"reference":[{"key":"ref_1","unstructured":"Simonyan, K., and Zisserman, A. (2015, January 7\u20139). 
Very Deep Convolutional Networks for Large-Scale Image Recognition. Proceedings of the International Conference on Learning Representations, San Diego, CA, USA."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2016, January 27\u201330). Rethinking the inception architecture for computer vision. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.308"},{"key":"ref_4","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 6\u201311). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C., and Berg, A.C. (2016). SSD: Single Shot MultiBox Detector, Springer.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_6","unstructured":"Yang, X., Yang, J., Yan, J., Zhang, Y., Zhang, T., Guo, Z., Sun, X., and Fu, K. (2019, October 27\u2013November 2). Scrdet: Towards more robust detection for small, cluttered and rotated objects. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_7","first-page":"1","article-title":"SRAF-Net: Shape Robust Anchor-Free Network for Garbage Dumps in Remote Sensing Imagery","volume":"99","author":"Sun","year":"2020","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"3377","DOI":"10.1109\/TGRS.2019.2954328","article-title":"FMSSD: Feature-Merged Single-Shot Detection for Multiscale Objects in Large-Scale Remote Sensing Imagery","volume":"58","author":"Wang","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1016\/j.isprsjprs.2020.09.022","article-title":"Oriented Objects as pairs of Middle Lines","volume":"169","author":"Wei","year":"2020","journal-title":"J. Photogramm. Remote Sens."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"5398","DOI":"10.1109\/JSTARS.2020.3021098","article-title":"BAS4Net: Boundary-Aware Semi-Supervised Semantic Segmentation Network for Very High Resolution Remote Sensing Images","volume":"13","author":"Sun","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Yu, C.-N.J., and Joachims, T. (2009, January 14\u201318). Learning structural svms with latent variables. Proceedings of the 26th International Conference on Machine Learning (ICML), Montreal, QC, Canada.","DOI":"10.1145\/1553374.1553523"},{"key":"ref_13","first-page":"1637","article-title":"Weakly supervised discovery of visual pattern configurations","volume":"2014","author":"Song","year":"2014","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1109\/TPAMI.2015.2456908","article-title":"Weakly Supervised Large Scale Object Localization with Multiple Instance Learning and Bag Splitting","volume":"38","author":"Ren","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Ye, Q., Zhang, T., Qiu, Q., Zhang, B., Chen, J., and Sapiro, G. (2017, January 21\u201326). Self-learning scene-specific pedestrian detectors using a progressive latent model. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.222"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Bilen, H., and Vedaldi, A. (2016, January 27\u201330). Weakly supervised deep detection networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.311"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Tang, P., Wang, X., Bai, X., and Liu, W. (2017, January 21\u201326). Multiple instance detection network with online instance classifier refinement. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.326"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Gao, M., Li, A., Yu, R., Morariu, V.I., and Davis, L.S. (2018, January 8\u201314). C-wsl: Count-guided weakly supervised localization. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01246-5_10"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"176","DOI":"10.1109\/TPAMI.2018.2876304","article-title":"PCL: Proposal Cluster Learning for Weakly Supervised Object Detection","volume":"42","author":"Tang","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Wan, F., Wei, P., Jiao, J., Han, Z., and Ye, Q. (2018, January 18\u201323). Min entropy latent model for weakly supervised object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00141"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Wan, F., Liu, C., Ke, W., Ji, X., Jiao, J., and Ye, Q. (2019, January 15\u201320). C-mil: Continuation multiple instance learning for weakly supervised object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00230"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"8002","DOI":"10.1109\/TGRS.2020.2985989","article-title":"Progressive Contextual Instance Refinement for Weakly Supervised Object Detection in Remote Sensing Images","volume":"58","author":"Feng","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"675","DOI":"10.1109\/TGRS.2020.2991407","article-title":"Automatic Weakly Supervised Object Detection From High Spatial Resolution Remote Sensing Images via Dynamic Curriculum Learning","volume":"59","author":"Yao","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Feng, X., Han, J., Yao, X., and Cheng, G. (2020). TCANet: Triple Context-Aware Network for Weakly Supervised Object Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens.","DOI":"10.1109\/TGRS.2020.2985989"},{"key":"ref_25","first-page":"561","article-title":"Support vector machines for multiple-instance learning","volume":"2002","author":"Andrews","year":"2002","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Bai, Y., Ding, M., Li, Y., and Ghanem, B. 
(2018, January 18\u201323). W2f: A weakly-supervised to fully-supervised framework for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00103"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Diba, A., Sharma, V., Pazandeh, A., Pirsiavash, H., and Van Gool, L. (2017, January 21\u201326). Weakly supervised cascaded convolutional networks. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.545"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Shen, Y., Ji, R., Zhang, S., Zuo, W., and Wang, Y. (2018, January 18\u201323). Generative Adversarial Learning Towards Fast Weakly Supervised Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00604"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Ge, W., Yang, S., and Yu, Y. (2018, January 18\u201323). Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00139"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Shen, Y.H., Ji, R.R., Wang, Y., Wu, Y.J., and Cao, L. (2019, January 15\u201320). Cyclic guidance for weakly supervised joint detection and segmentation. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00079"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Li, X., Kan, M., Shan, S., and Chen, X. (2019). Weakly supervised object detection with segmentation collaboration. 
arXiv.","DOI":"10.1109\/ICCV.2019.00983"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1007\/s11263-013-0620-5","article-title":"Selective search for object recognition","volume":"104","author":"Uijlings","year":"2013","journal-title":"Int. J. Comput. Vis."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Torralba, A. (2016, January 27\u201330). Learning Deep Features for Discriminative Localization. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.319"},{"key":"ref_35","unstructured":"Lin, M., Chen, Q., and Yan, S. (2014, January 14\u201316). Network in network. Proceedings of the International Conference on Learning Representations, Banff, AB, Canada."},{"key":"ref_36","unstructured":"Selvaraju, R.R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., and Batra, D. (2016). Grad-CAM: Why did you say that?. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Chattopadhay, A., Sarkar, A., Howlader, P., and Balasubramanian, V. (2018, January 12\u201315). Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks. Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA.","DOI":"10.1109\/WACV.2018.00097"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1016\/j.isprsjprs.2014.10.002","article-title":"Multi-class geospatial object detection and geographic image classification based on collection of part detectors","volume":"98","author":"Cheng","year":"2014","journal-title":"J. Photogramm. 
Remote Sens."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.isprsjprs.2016.03.014","article-title":"A survey on object detection in optical remote sensing images","volume":"117","author":"Cheng","year":"2016","journal-title":"J. Photogramm. Remote Sens."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"7405","DOI":"10.1109\/TGRS.2016.2601622","article-title":"Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images","volume":"54","author":"Cheng","year":"2016","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1007\/s11263-012-0538-3","article-title":"Weakly supervised localization and learning with generic knowledge","volume":"100","author":"Deselaers","year":"2012","journal-title":"Int. J. Comput. Vis."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"ImageNet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Hosang, J., Benenson, R., and Schiele, B. (2017, January 21\u201326). Learning non-maximum suppression. 
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.685"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/8\/1461\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T14:12:14Z","timestamp":1760364734000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/8\/1461"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,10]]},"references-count":43,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2021,4]]}},"alternative-id":["rs13081461"],"URL":"https:\/\/doi.org\/10.3390\/rs13081461","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,4,10]]}}}