{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T04:16:14Z","timestamp":1775708174131,"version":"3.50.1"},"reference-count":57,"publisher":"MDPI AG","issue":"13","license":[{"start":{"date-parts":[[2021,7,2]],"date-time":"2021-07-02T00:00:00Z","timestamp":1625184000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62001455, 41871245"],"award-info":[{"award-number":["62001455, 41871245"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Object detection based on remote sensing imagery has become increasingly popular over the past few years. Unlike natural images taken by humans or surveillance cameras, remote sensing images are very large, so training and inference must be performed on cropped image tiles. However, objects appearing in remote sensing imagery are often sparsely distributed and the labels for each class are imbalanced. This results in unstable training and inference. In this paper, we analyze the training characteristics of remote sensing images and propose the aggregated-mosaic training method, which fuses assigned-stitch augmentation with auto-target-duplication. In particular, based on the ground truth and mosaic image size, the assigned-stitch augmentation gives each training sample an appropriate number of objects, which facilitates a smooth training procedure. Hard-to-detect objects, or those in classes with rare samples, are randomly selected and duplicated by the auto-target-duplication, which addresses sample imbalance and classes with insufficient results. 
Thus, the training process is able to focus on weak classes. We employ VEDAI and NWPU VHR-10, remote sensing datasets with sparse objects, to verify the proposed method. YOLOv5 adopts Mosaic as its augmentation method and is one of the state-of-the-art detectors, so we choose Mosaic (YOLOv5) as the baseline. Results demonstrate that our method outperforms Mosaic (YOLOv5) by 2.72% and 5.44% on 512 \u00d7 512 and 1024 \u00d7 1024 resolution imagery, respectively. Moreover, the proposed method outperforms Mosaic (YOLOv5) by 5.48% on the NWPU VHR-10 dataset.<\/jats:p>","DOI":"10.3390\/rs13132602","type":"journal-article","created":{"date-parts":[[2021,7,2]],"date-time":"2021-07-02T10:06:34Z","timestamp":1625220394000},"page":"2602","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":28,"title":["An Improved Aggregated-Mosaic Method for the Sparse Object Detection of Remote Sensing Imagery"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5620-406X","authenticated-orcid":false,"given":"Boya","family":"Zhao","sequence":"first","affiliation":[{"name":"Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8427-9851","authenticated-orcid":false,"given":"Yuanfeng","family":"Wu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"given":"Xinran","family":"Guan","sequence":"additional","affiliation":[{"name":"Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"},{"name":"School of Electronic, 
Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3888-8124","authenticated-orcid":false,"given":"Lianru","family":"Gao","sequence":"additional","affiliation":[{"name":"Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"},{"name":"College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0319-7753","authenticated-orcid":false,"given":"Bing","family":"Zhang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China"},{"name":"College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,7,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1016\/j.isprsjprs.2020.06.014","article-title":"X-ModalNet: A semi-supervised deep cross-modal network for classification of remote sensing data","volume":"167","author":"Hong","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Hou, J.-B., Zhu, X., and Yin, X.-C. (2021). Self-Adaptive Aspect Ratio Anchor for Oriented Object Detection in Remote Sensing Images. Remote Sens., 13.","DOI":"10.3390\/rs13071318"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1109\/LGRS.2019.2919755","article-title":"Fourier-based rotation-invariant feature boosting: An efficient framework for geospatial object detection","volume":"17","author":"Wu","year":"2019","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Awad, M.M., and Lauteri, M. (2021). 
Self-Organizing Deep Learning (SO-UNet)\u2014A Novel Framework to Classify Urban and Peri-Urban Forests. Sustainability, 13.","DOI":"10.3390\/su13105548"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s40537-019-0197-0","article-title":"A survey on image data augmentation for deep learning","volume":"6","author":"Shorten","year":"2019","journal-title":"J. Big Data"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Nusrat, I., and Jang, S.-B. (2018). A comparison of regularization techniques in deep neural networks. Symmetry, 10.","DOI":"10.3390\/sym10110648"},{"key":"ref_7","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201325). Histograms of oriented gradients for human detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Yan, J., Lei, Z., Wen, L., and Li, S.Z. (2014, January 24\u201327). The fastest deformable part model for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.320"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Cao, J., Cholakkal, H., Anwer, R.M., Khan, F.S., Pang, Y., and Shao, L. (2020, January 14\u201319). D2det: Towards high quality object detection and instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01150"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 16\u201321). Mask R-CNN. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_11","unstructured":"Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., and Yoo, Y. (November, January 27). 
Cutmix: Regularization strategy to train strong classifiers with localizable features. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_12","unstructured":"Zhang, H., Cisse, M., Dauphin, Y.N., and Lopez-Paz, D. (May, January 30). Mixup: Beyond empirical risk minimization. Proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, BC, Canada."},{"key":"ref_13","unstructured":"Ultralytics (2021, May 08). YOLOv5. Available online: https:\/\/github.com\/ultralytics\/yolov5."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive image features from scale-invariant keypoints","volume":"60","author":"Lowe","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1023\/A:1007618119488","article-title":"Soft margins for AdaBoost","volume":"42","author":"Onoda","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1023\/A:1018628609742","article-title":"Least squares support vector machine classifiers","volume":"9","author":"Suykens","year":"1999","journal-title":"Neural Process. Lett."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Felzenszwalb, P., McAllester, D., and Ramanan, D. (2008, January 24\u201326). A discriminatively trained, multiscale, deformable part model. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, USA.","DOI":"10.1109\/CVPR.2008.4587597"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_19","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. 
(2012, January 3\u20136). Imagenet classification with deep convolutional neural networks. Proceedings of the Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A.C. (2016, January 11\u201314). Ssd: Single shot multibox detector. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 24\u201327). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 16\u201321). YOLO9000: Better, faster, stronger. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_24","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv."},{"key":"ref_25","unstructured":"Choi, J., Chun, D., Kim, H., and Lee, H.-J. (November, January 27). Gaussian yolov3: An accurate and fast object detector using localization uncertainty for autonomous driving. 
Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 16\u201321). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_28","unstructured":"Bochkovskiy, A., Wang, C.-Y., and Liao, H.-Y.M. (2020). Yolov4: Optimal speed and accuracy of object detection. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Wang, C.-Y., Bochkovskiy, A., and Liao, H.-Y.M. (2021, January 19\u201325). Scaled-yolov4: Scaling cross stage partial network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual.","DOI":"10.1109\/CVPR46437.2021.01283"},{"key":"ref_30","unstructured":"Zheng, Z., Wang, P., Ren, D., Liu, W., Ye, R., Hu, Q., and Zuo, W. (2020). Enhancing geometric factors in model learning and inference for object detection and instance segmentation. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., and Ren, D. (2020, January 7\u201312). Distance-IoU loss: Faster and better learning for bounding box regression. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6999"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards real-time object detection with region proposal networks","volume":"39","author":"Ren","year":"2016","journal-title":"IEEE Trans. 
Pattern Anal. Mach. Intell."},{"key":"ref_33","unstructured":"Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., and Tian, Q. (November, January 27). Centernet: Keypoint triplets for object detection. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_34","unstructured":"Ghiasi, G., Lin, T.-Y., and Le, Q.V. (2018, January 3\u20138). Dropblock: A regularization method for convolutional networks. Proceedings of the Advances in Neural Information Processing Systems (NIPS), Montr\u00e9al, QC, Canada."},{"key":"ref_35","first-page":"1929","article-title":"Dropout: A simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. Res."},{"key":"ref_36","unstructured":"Zhong, Z., Zheng, L., Kang, G., Li, S., and Yang, Y. (2020, January 7\u201312). Random erasing data augmentation. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA."},{"key":"ref_37","unstructured":"DeVries, T., and Taylor, G.W. (2017). Improved regularization of convolutional neural networks with cutout. arXiv."},{"key":"ref_38","unstructured":"Real, E., Aggarwal, A., Huang, Y., and Le, Q.V. (February, January 27). Regularized evolution for image classifier architecture search. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Dwibedi, D., Misra, I., and Hebert, M. (2017, January 22\u201329). Cut, paste and learn: Surprisingly easy synthesis for instance detection. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.146"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Dvornik, N., Mairal, J., and Schmid, C. (2018, January 8\u201314). Modeling visual context is key to augmenting object detection datasets. 
Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01258-8_23"},{"key":"ref_41","unstructured":"Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F.A., and Brendel, W. (2019, January 6\u20139). ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. Proceedings of the 7th International Conference on Learning Representations (ICLR), New Orleans, LA, USA."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Tokozume, Y., Ushiku, Y., and Harada, T. (2018, January 18\u201322). Between-class learning for image classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","DOI":"10.1109\/CVPR.2018.00575"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"2917","DOI":"10.1109\/TCSVT.2019.2935128","article-title":"Data augmentation using random image cropping and patching for deep cnns","volume":"30","author":"Takahashi","year":"2019","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_44","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 6\u201311). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the International Conference on Machine Learning (ICML), Lille, France."},{"key":"ref_45","unstructured":"Kingma, D.P., and Ba, J. (2015, January 7\u20139). Adam: A method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA."},{"key":"ref_46","unstructured":"Loshchilov, I., and Hutter, F. (2019, January 6\u20139). Decoupled weight decay regularization. Proceedings of the 7th International Conference on Learning Representations (ICLR), New Orleans, LA, USA."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Shafahi, A., Najibi, M., Xu, Z., Dickerson, J., Davis, L.S., and Goldstein, T. (2020, January 7\u201312). 
Universal adversarial training. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i04.6017"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Wang, J., Yang, Y., Chen, Y., and Han, Y. (2021). LighterGAN: An Illumination Enhancement Method for Urban UAV Imagery. Remote Sens., 13.","DOI":"10.3390\/rs13071371"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Awad, M.M., and De Jong, K. (2011, January 5\u20138). Optimization of spectral signatures selection using multi-objective genetic algorithms. Proceedings of the IEEE Congress of Evolutionary Computation (CEC), New Orleans, LA, USA.","DOI":"10.1109\/CEC.2011.5949809"},{"key":"ref_50","unstructured":"Ding, Y., Zhou, Y., Zhu, Y., Ye, Q., and Jiao, J. (November, January 27). Selective sparse sampling for fine-grained image recognition. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Zheng, S., Zhang, Y., Liu, W., and Zou, Y. (2020). Improved image representation and sparse representation for image classification. Appl. Intell., 1\u201312.","DOI":"10.1007\/s10489-019-01612-3"},{"key":"ref_52","unstructured":"Van Etten, A. (2018). You only look twice: Rapid multi-scale object detection in satellite imagery. arXiv."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Ghiasi, G., Cui, Y., Srinivas, A., Qian, R., Lin, T.-Y., Cubuk, E.D., Le, Q.V., and Zoph, B. (2021, January 19\u201325). Simple copy-paste is a strong data augmentation method for instance segmentation. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual.","DOI":"10.1109\/CVPR46437.2021.00294"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1016\/j.jvcir.2015.11.002","article-title":"Vehicle detection in aerial imagery: A small target detection benchmark","volume":"34","author":"Razakarivony","year":"2016","journal-title":"J. Vis. Commun. Image Represent."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.isprsjprs.2016.03.014","article-title":"A survey on object detection in optical remote sensing images","volume":"117","author":"Cheng","year":"2016","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_56","unstructured":"Chen, P., Liu, S., Zhao, H., and Jia, J. (2020). Gridmask data augmentation. arXiv."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Wang, J., Jin, S., Liu, W., Liu, W., Qian, C., and Luo, P. (2021, January 19\u201325). When human pose estimation meets robustness: Adversarial algorithms and benchmarks. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual.","DOI":"10.1109\/CVPR46437.2021.01168"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/13\/2602\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:25:23Z","timestamp":1760163923000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/13\/2602"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,2]]},"references-count":57,"journal-issue":{"issue":"13","published-online":{"date-parts":[[2021,7]]}},"alternative-id":["rs13132602"],"URL":"https:\/\/doi.org\/10.3390\/rs13132602","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,7,2]]}}}