{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T21:18:46Z","timestamp":1775942326331,"version":"3.50.1"},"reference-count":39,"publisher":"MDPI AG","issue":"19","license":[{"start":{"date-parts":[[2021,9,30]],"date-time":"2021-09-30T00:00:00Z","timestamp":1632960000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Remote sensing image target detection is widely used for both civil and military purposes. However, two factors need to be considered for remote sensing image target detection: real-time and accuracy for detecting targets that occupy few pixels. Considering the two above issues, the main research objective of this paper is to improve the performance of the YOLO algorithm in remote sensing image target detection. The reason is that the YOLO models can guarantee both detection speed and accuracy. More specifically, the YOLOv3 model with an auxiliary network is further improved in this paper. Our model improvement consists of four main components. Firstly, an image blocking module is used to feed fixed size images to the YOLOv3 network; secondly, to speed up the training of YOLOv3, DIoU is used, which can speed up the convergence and increase the training speed; thirdly, the Convolutional Block Attention Module (CBAM) is used to connect the auxiliary network to the backbone network, making it easier for the network to notice specific features so that some key information is not easily lost during the training of the network; and finally, the adaptive feature fusion (ASFF) method is applied to our network model with the aim of improving the detection speed by reducing the inference overhead. 
The experiments on the DOTA dataset were conducted to validate the effectiveness of our model on the DOTA dataset. Our model can achieve satisfactory detection performance on remote sensing images, and our model performs significantly better than the unimproved YOLOv3 model with an auxiliary network. The experimental results show that the mAP of the optimised network model is 5.36% higher than that of the original YOLOv3 model with the auxiliary network, and the detection frame rate was also increased by 3.07 FPS.<\/jats:p>","DOI":"10.3390\/rs13193908","type":"journal-article","created":{"date-parts":[[2021,10,8]],"date-time":"2021-10-08T21:26:20Z","timestamp":1633728380000},"page":"3908","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":37,"title":["Remote Sensing Image Target Detection: Improvement of the YOLOv3 Model with Auxiliary Networks"],"prefix":"10.3390","volume":"13","author":[{"given":"Zhenfang","family":"Qu","sequence":"first","affiliation":[{"name":"College of Electronic Engineering, Heilongjiang University, Harbin 150080, China"}]},{"given":"Fuzhen","family":"Zhu","sequence":"additional","affiliation":[{"name":"College of Electronic Engineering, Heilongjiang University, Harbin 150080, China"}]},{"given":"Chengxiao","family":"Qi","sequence":"additional","affiliation":[{"name":"College of Electronic Engineering, Heilongjiang University, Harbin 150080, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,9,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.isprsjprs.2016.03.014","article-title":"A survey on object detection in optical remote sensing images","volume":"117","author":"Cheng","year":"2016","journal-title":"ISPRS J. Photogram. 
Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1016\/j.isprsjprs.2019.11.023","article-title":"Object detection in optical remote sensing images: A survey and a new benchmark","volume":"159","author":"Li","year":"2020","journal-title":"ISPRS J. Photogram. Remote Sens."},{"key":"ref_3","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2017, January 21\u201326). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"4376","DOI":"10.1109\/TIP.2019.2910667","article-title":"Weakly supervised adversarial domain adaptation for semantic segmentation in urban scenes","volume":"28","author":"Wang","year":"2019","journal-title":"IEEE Trans. Image Process."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K.Q. (2017, January 21\u201326). Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 13\u201316). Fast R-CNN.
Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"947","DOI":"10.1109\/LGRS.2018.2889247","article-title":"A sample update-based convolutional neural network framework for object detection in large-area remote sensing images","volume":"16","author":"Hu","year":"2019","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Yoo, J.J., Ahn, N.H., and Sohn, K.A. (2020, April 23). Rethinking Data Augmentation for Image Super-Resolution: A Comprehensive Analysis and a New Strategy. Available online: https:\/\/arxiv.org\/abs\/2004.00448.","DOI":"10.1109\/CVPR42600.2020.00840"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., and Darrell, T. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 21\u201326). Mask R-CNN. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., and Girshick, R. (2016, January 27\u201330). You Only Look Once: Unified, Real-Time Object Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, Faster, Stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_15","unstructured":"Redmon, J. (2018, April 08). YOLOv3: An Incremental Improvement. Available online: https:\/\/arxiv.org\/abs\/1804.02767."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_17","first-page":"3630","article-title":"Matching networks for one shot learning","volume":"10","author":"Vinyals","year":"2016","journal-title":"Proc. Adv. Neural Inf. Process. Syst."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Dai, Z.G., Cai, B.L., Lin, Y.G., and Chen, J.Y. (2021, April 07). UP-DETR: Unsupervised Pre-Training for Object Detection with Transformers. Available online: https:\/\/arxiv.org\/abs\/2011.09094.","DOI":"10.1109\/CVPR46437.2021.00165"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Volpi, M., Morsier, F.D., Camps-Valls, G., Kanevski, M., and Tuia, D. (2013, January 21\u201326). Multi-sensor change detection based on nonlinear canonical correlations.
Proceedings of the 2013 IEEE International Geoscience and Remote Sensing Symposium-IGARSS, Melbourne, VIC, Australia.","DOI":"10.1109\/IGARSS.2013.6723187"},{"key":"ref_20","first-page":"6508","article-title":"VHR object detection based on structural feature extraction and query expansion","volume":"10","author":"Bai","year":"2014","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"749","DOI":"10.1109\/LGRS.2011.2180695","article-title":"A visual search inspired computational model for ship detection in optical satellite images","volume":"9","author":"Bi","year":"2012","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1977","DOI":"10.1080\/01431160802546837","article-title":"Road centreline extraction from high-resolution imagery based on multiscale structural features and support vector machines","volume":"30","author":"Huang","year":"2009","journal-title":"Int. J. Remote Sens."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"7405","DOI":"10.1109\/TGRS.2016.2601622","article-title":"Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images","volume":"54","author":"Cheng","year":"2016","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"5535","DOI":"10.1109\/TGRS.2019.2900302","article-title":"Hierarchical and robust convolutional neural network for very high-resolution remote sensing object detection","volume":"57","author":"Zhang","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"2337","DOI":"10.1109\/TGRS.2017.2778300","article-title":"Rotation insensitive and context augmented object detection in remote sensing images","volume":"56","author":"Li","year":"2018","journal-title":"IEEE Trans. Geosci.
Remote Sens."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Tang, T., Zhou, S., Deng, Z., Zou, H., and Lei, L. (2017). Vehicle detection in aerial images based on region convolutional neural networks and hard negative example mining. Sensors, 17.","DOI":"10.3390\/s17020336"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"5832","DOI":"10.1109\/TGRS.2016.2572736","article-title":"Ship detection in spaceborne optical image with SVD networks","volume":"54","author":"Zou","year":"2016","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1665","DOI":"10.1109\/LGRS.2017.2727515","article-title":"Fully convolutional network with task partitioning for inshore ship detection in optical remote sensing images","volume":"14","author":"Lin","year":"2017","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"937","DOI":"10.1109\/LGRS.2018.2813094","article-title":"Arbitrary oriented ship detection framework in optical remote sensing images","volume":"15","author":"Liu","year":"2018","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Tang, T., Zhou, S., Deng, Z., Lei, L., and Zou, H. (2017). Arbitrary oriented vehicle detection in aerial imagery with single convolutional neural networks. Remote Sens., 9.","DOI":"10.3390\/rs9111170"},{"key":"ref_31","first-page":"960","article-title":"Learning a rotation invariant detector with rotatable bounding box","volume":"9","author":"Liu","year":"2017","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Liu, W. (2016). SSD: Single Shot MultiBox Detector. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Zhong, J., Lei, T., and Yao, G. (2017).
Robust vehicle detection in aerial images based on cascaded convolutional neural networks. Sensors, 17.","DOI":"10.3390\/s17122720"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Han, X., Zhong, Y., and Zhang, L. (2017). An efficient and robust integrated geospatial object detection framework for high spatial resolution remote sensing imagery. Remote Sens., 9.","DOI":"10.3390\/rs9070666"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Xu, Z., Xu, X., Wang, L., Yang, R., and Pu, F. (2017). Deformable ConvNet with aspect ratio constrained NMS for object detection in remote sensing imagery. Remote Sens., 9.","DOI":"10.3390\/rs9121312"},{"key":"ref_36","first-page":"27574","article-title":"Research on Small Target Detection in Driving Scenarios Based on Improved Yolo Network","volume":"8","author":"Xun","year":"2019","journal-title":"IEEE Access"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., and Lee, J.Y. (2018). CBAM: Convolutional Block Attention Module, Springer.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_38","unstructured":"Liu, S., Huang, D., and Wang, Y. (2019, September 21). Learning Spatial Fusion for Single-Shot Object Detection. Available online: https:\/\/arxiv.org\/abs\/1911.09516."},{"key":"ref_39","first-page":"12993","article-title":"Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression","volume":"34","author":"Zheng","year":"2020","journal-title":"Proc. AAAI Conf. Artif.
Intell."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/19\/3908\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:07:41Z","timestamp":1760166461000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/19\/3908"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,30]]},"references-count":39,"journal-issue":{"issue":"19","published-online":{"date-parts":[[2021,10]]}},"alternative-id":["rs13193908"],"URL":"https:\/\/doi.org\/10.3390\/rs13193908","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,9,30]]}}}