{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T14:26:24Z","timestamp":1760711184407,"version":"build-2065373602"},"reference-count":53,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2023,6,4]],"date-time":"2023-06-04T00:00:00Z","timestamp":1685836800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62271409","62262067"],"award-info":[{"award-number":["62271409","62262067"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Detecting sparse, small, lost persons with only a few pixels in high-resolution aerial images was, is, and remains an important and difficult mission, in which a vital role is played by accurate monitoring and intelligent co-rescuing for the search and rescue (SaR) system. However, many problems have not been effectively solved in existing remote-vision-based SaR systems, such as the shortage of person samples in SaR scenarios and the low tolerance of small objects for bounding boxes. To address these issues, a copy-paste mechanism (ISCP) with semi-supervised object detection (SSOD) via instance segmentation and maximum mean discrepancy distance is proposed (MMD), which can provide highly robust, multi-task, and efficient aerial-based person detection for the prototype SaR system. Specifically, numerous pseudo-labels are obtained by accurately segmenting the instances of synthetic ISCP samples to obtain their boundaries. The SSOD trainer then uses soft weights to balance the prediction entropy of the loss function between the ground truth and unreliable labels. 
Moreover, a novel evaluation metric MMD for anchor-based detectors is proposed to elegantly compute the IoU of the bounding boxes. Extensive experiments and ablation studies on Heridal and optimized public datasets demonstrate that our approach is effective and achieves state-of-the-art person detection performance in aerial images.<\/jats:p>","DOI":"10.3390\/rs15112928","type":"journal-article","created":{"date-parts":[[2023,6,5]],"date-time":"2023-06-05T02:18:29Z","timestamp":1685931509000},"page":"2928","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Semi-Supervised Person Detection in Aerial Images with Instance Segmentation and Maximum Mean Discrepancy Distance"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7273-6170","authenticated-orcid":false,"given":"Xiangqing","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710072, China"},{"name":"College of Mathematics and Computer Science, Yan\u2019an University, Yan\u2019an 716000, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0669-9970","authenticated-orcid":false,"given":"Yan","family":"Feng","sequence":"additional","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3380-8957","authenticated-orcid":false,"given":"Shun","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8739-6711","authenticated-orcid":false,"given":"Nan","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710072, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8018-596X","authenticated-orcid":false,"given":"Shaohui","family":"Mei","sequence":"additional","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2051-6955","authenticated-orcid":false,"given":"Mingyi","family":"He","sequence":"additional","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,6,4]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"114937","DOI":"10.1016\/j.eswa.2021.114937","article-title":"Search and rescue operation using UAVs: A case study","volume":"178","author":"Golcarenarenji","year":"2021","journal-title":"Expert Syst. Appl."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Niedzielski, T., Jurecka, M., Mizi\u0144ski, B., Pawul, W., and Motyl, T. (2021). First Successful Rescue of a Lost Person Using the Human Detection System: A Case Study from Beskid Niski (SE Poland). Remote. Sens., 13.","DOI":"10.3390\/rs13234903"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Varga, L.A., Kiefer, B., Messmer, M., and Zell, A. (2022, January 3\u20138). SeaDronesSee: A Maritime Benchmark for Detecting Humans in Open Water. Proceedings of the 2022 IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA.","DOI":"10.1109\/WACV51458.2022.00374"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"221","DOI":"10.3233\/ICA-210649","article-title":"An ensemble deep learning method with optimized weights for drone-based water rescue and surveillance","volume":"28","author":"Knapik","year":"2021","journal-title":"Integr. -Comput. 
-Aided Eng."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1256","DOI":"10.1007\/s11263-019-01177-1","article-title":"Deep Learning Approach in Aerial Imagery for Supporting Land Search and Rescue Missions","volume":"127","author":"Gotovac","year":"2019","journal-title":"Int. J. Comput. Vis."},{"key":"ref_6","unstructured":"Pyrr\u00f6, P., Naseri, H., and Jung, A. (2021). Rethinking Drone-Based Search and Rescue with Aerial Person Detection. arXiv."},{"key":"ref_7","unstructured":"Maru\u0161i\u0107, \u017d., Bo\u017ei\u0107-\u0160tuli\u0107, D., Gotovac, S., and Maru\u0161i\u0107, T. (2018, January 26\u201329). Region proposal approach for human detection on aerial imagery. Proceedings of the 2018 3rd International Conference on Smart and Sustainable Technologies (SpliTech), Split, Croatia."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Vasi\u0107, M.K., and Papi\u0107, V. (2020). Multimodel Deep Learning for Person Detection in Aerial Images. Electronics, 9.","DOI":"10.3390\/electronics9091459"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Mei, S., Geng, Y., Hou, J., and Du, Q. (2021). Learning hyperspectral images from RGB images via a coarse-to-fine CNN. Sci. China Inf. Sci., 65.","DOI":"10.1007\/s11432-020-3102-9"},{"key":"ref_10","first-page":"1","article-title":"Hyperspectral Image Classification Using Attention-Based Bidirectional Long Short-Term Memory Network","volume":"60","author":"Mei","year":"2022","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_11","first-page":"1","article-title":"Accelerating Convolutional Neural Network-Based Hyperspectral Image Classification by Step Activation Quantization","volume":"60","author":"Mei","year":"2022","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_12","first-page":"19","article-title":"Feature enhancement network for object detection in optical remote sensing images","volume":"48","author":"Cheng","year":"2021","journal-title":"J. Remote. 
Sens."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1331","DOI":"10.1109\/TGRS.2020.3005151","article-title":"An Anchor-Free Method Based on Feature Balancing and Refinement Network for Multiscale Ship Detection in SAR Images","volume":"59","author":"Fu","year":"2021","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TGRS.2023.3335484","article-title":"Few-Shot Class-Incremental SAR Target Recognition Based on Hierarchical Embedding and Incremental Evolutionary Network","volume":"61","author":"Wang","year":"2023","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"11162","DOI":"10.1109\/JSTARS.2021.3109469","article-title":"Scattering-Keypoint-Guided Network for Oriented Ship Detection in High-Resolution and Large-Scale SAR Images","volume":"14","author":"Fu","year":"2021","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Yu, Z., Chen, L., Cheng, Z., and Luo, J. (2020, January 13\u201319). TransMatch: A Transfer-Learning Scheme for Semi-Supervised Few-Shot Learning. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01287"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Choi, J., Elezi, I., Lee, H.J., Farabet, C., and Alvarez, J.M. (2021, January 10\u201317). Active Learning for Deep Object Detection via Probabilistic Modeling. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01010"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Abuduweili, A., Li, X., Shi, H., Xu, C.Z., and Dou, D. (2021, January 20\u201325). Adaptive Consistency Regularization for Semi-Supervised Transfer Learning. 
Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00685"},{"key":"ref_19","first-page":"1314","article-title":"Rethinking Pseudo Labels for Semi-supervised Object Detection","volume":"36","author":"Li","year":"2022","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_20","unstructured":"Wallach, H., Larochelle, H., Beygelzimer, A., dAlch\u00e9 Buc, F., Fox, E., and Garnett, R. (2019). Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_21","unstructured":"Sohn, K., Zhang, Z., Li, C.L., Zhang, H., Lee, C.Y., and Pfister, T. (2005). A Simple Semi-Supervised Learning Framework for Object Detection. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhou, Q., Yu, C., Wang, Z., Qian, Q., and Li, H. (2021, January 20\u201325). Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00407"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Wang, Z., Li, Y., Guo, Y., Fang, L., and Wang, S. (2021, January 20\u201325). Data-Uncertainty Guided Multi-Phase Learning for Semi-Supervised Object Detection. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00454"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Liu, Y.C., Ma, C.Y., and Kira, Z. (2022, January 18\u201324). Unbiased Teacher v2: Semi-supervised Object Detection for Anchor-free and Anchor-based Detectors. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00959"},{"key":"ref_25","unstructured":"Jiang, B., Luo, R., Mao, J., Xiao, T., and Jiang, Y. (2018). 
Computer Vision\u2013ECCV 2018, Springer International Publishing."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., and Savarese, S. (2019, January 15\u201320). Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00075"},{"key":"ref_27","first-page":"12993","article-title":"Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression","volume":"34","author":"Zheng","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1016\/j.neucom.2022.07.042","article-title":"Focal and efficient IOU loss for accurate bounding box regression","volume":"506","author":"Zhang","year":"2022","journal-title":"Neurocomputing"},{"key":"ref_29","unstructured":"Pan, S.J., Kwok, J.T., and Yang, Q. (2008). Proceedings of the 23rd National Conference on Artificial Intelligence\u2014Volume 2, AAAI Press."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Zhu, X., Lyu, S., Wang, X., and Zhao, Q. (2021, January 11\u201317). TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, QC, Canada.","DOI":"10.1109\/ICCVW54120.2021.00312"},{"key":"ref_31","unstructured":"Cheng, G., Yuan, X., Yao, X., Yan, K., Zeng, Q., and Han, J. (2022). Towards Large-Scale Small Object Detection: Survey and Benchmarks. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"3051","DOI":"10.1007\/s11263-021-01515-2","article-title":"Bisenet v2: Bilateral network with guided aggregation for real-time semantic segmentation","volume":"129","author":"Yu","year":"2021","journal-title":"Int. 
J. Comput. Vis."},{"key":"ref_33","unstructured":"Du, D., Zhu, P., Wen, L., Bian, X., and Lin, H. (2019, January 27\u201328). VisDrone-DET2019: The Vision Meets Drone Object Detection in Image Challenge Results. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea."},{"key":"ref_34","unstructured":"Bolya, D., Foley, S., Hays, J., and Hoffman, J. (2020). Computer Vision\u2013ECCV 2020, Springer."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Kisantal, M., Wojna, Z., Murawski, J., Naruniec, J., and Cho, K. (2019, January 21\u201322). Augmentation for small object detection. Proceedings of the 9th International Conference on Advances in Computing and Information Technology (ACITY 2019), Sydney, Australia.","DOI":"10.5121\/csit.2019.91713"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Ghiasi, G., Cui, Y., Srinivas, A., Qian, R., Lin, T.Y., Cubuk, E.D., Le, Q.V., and Zoph, B. (2021, January 20\u201325). Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00294"},{"key":"ref_37","unstructured":"Zhang, H., Cisse, M., Dauphin, Y.N., and Lopez-Paz, D. (2022). Mixup: Beyond Empirical Risk Minimization. arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Yun, S., Han, D., Chun, S., Oh, S.J., Yoo, Y., and Choe, J. (November, January 27). CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00612"},{"key":"ref_39","unstructured":"Bochkovskiy, A., Wang, C.Y., and Liao, H.Y.M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. 
arXiv."},{"key":"ref_40","first-page":"20230","article-title":"alphaIoU: A Family of Power Intersection over Union Losses for Bounding Box Regression","volume":"34","author":"He","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_41","unstructured":"Gevorgyan, Z. (2022). SIoU Loss: More Powerful Learning for Bounding Box Regression. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Xu, C., Wang, J., Yang, W., and Yu, L. (2021). Dot Distance for Tiny Object Detection in Aerial Images, IEEE.","DOI":"10.1109\/CVPRW53098.2021.00130"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1016\/j.isprsjprs.2022.06.002","article-title":"Detecting Tiny Objects in Aerial Images: A Normalized Wasserstein Distance and a New Benchmark","volume":"190","author":"Xu","year":"2022","journal-title":"ISPRS J. Photogramm. Remote. Sens."},{"key":"ref_44","unstructured":"Li, C., Li, L., Jiang, H., Weng, K., Geng, Y., Li, L., Ke, Z., Li, Q., Cheng, M., and Nie, W. (2022). YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Wang, C.Y., Bochkovskiy, A., and Liao, H.Y.M. (2023). YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. arXiv.","DOI":"10.1109\/CVPR52729.2023.00721"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"102692","DOI":"10.1016\/j.scs.2020.102692","article-title":"SSDMNV2: A real time DNN-based face mask detection system using single shot multibox detector and MobileNetV2","volume":"66","author":"Nagrath","year":"2021","journal-title":"Sustain. Cities Soc."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Dollar, P. (2017, January 22\u201329). Focal Loss for Dense Object Detection. 
Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Chen, Q., Wang, Y., Yang, T., Zhang, X., Cheng, J., and Sun, J. (2021, January 20\u201325). You Only Look One-Level Feature. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01284"},{"key":"ref_50","unstructured":"Ge, Z., Liu, S., Wang, F., Li, Z., and Sun, J. (2021). YOLOX: Exceeding YOLO Series in 2021. arXiv."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"4371","DOI":"10.1109\/JSTARS.2022.3175498","article-title":"Finding Nonrigid Tiny Person With Densely Cropped and Local Attention Object Detector Networks in Low-Altitude Aerial Images","volume":"15","author":"Zhang","year":"2022","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Tian, Z., Shen, C., Chen, H., and He, T. (November, January 27). FCOS: Fully Convolutional One-Stage Object Detection. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00972"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Liu, Z., Gao, G., Sun, L., and Fang, Z. (2021, January 5\u20139). HRDNet: High-Resolution Detection Network for Small Objects. 
Proceedings of the 2021 IEEE International Conference on Multimedia and Expo (ICME), Shenzhen, China.","DOI":"10.1109\/ICME51207.2021.9428241"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/11\/2928\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:48:00Z","timestamp":1760125680000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/11\/2928"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,4]]},"references-count":53,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2023,6]]}},"alternative-id":["rs15112928"],"URL":"https:\/\/doi.org\/10.3390\/rs15112928","relation":{},"ISSN":["2072-4292"],"issn-type":[{"type":"electronic","value":"2072-4292"}],"subject":[],"published":{"date-parts":[[2023,6,4]]}}}