{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T04:32:59Z","timestamp":1776400379962,"version":"3.51.2"},"reference-count":59,"publisher":"MDPI AG","issue":"14","license":[{"start":{"date-parts":[[2022,7,6]],"date-time":"2022-07-06T00:00:00Z","timestamp":1657065600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Nature Science Founding of China","doi-asserted-by":"publisher","award":["61573183"],"award-info":[{"award-number":["61573183"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Target detection based on unmanned aerial vehicle (UAV) images has increasingly become a hot topic with the rapid development of UAVs and related technologies. UAV aerial images often feature a large number of small targets and complex backgrounds due to the UAV\u2019s flying height and shooting angle of view. These characteristics make the advanced YOLOv4 detection method lack outstanding performance in UAV aerial images. In light of the aforementioned problems, this study adjusted YOLOv4 to the image\u2019s characteristics, making the improved method more suitable for target detection in UAV aerial images. Specifically, according to the characteristics of the activation function, different activation functions were used in the shallow network and the deep network, respectively. The loss for the bounding box regression was computed using the EIOU loss function. Improved Efficient Channel Attention (IECA) modules were added to the backbone. At the neck, the Spatial Pyramid Pooling (SPP) module was replaced with a pyramid pooling module. At the end of the model, Adaptive Spatial Feature Fusion (ASFF) modules were added. In addition, a dataset of forklifts based on UAV aerial imagery was also established. On the PASCAL VOC, VEDAI, and forklift datasets, we ran a series of experiments. The experimental results reveal that the proposed method (YOLO-DRONE, YOLOD) has better detection performance than YOLOv4 for the aforementioned three datasets, with the mean average precision (mAP) being improved by 3.06%, 3.75%, and 1.42%, respectively.<\/jats:p>","DOI":"10.3390\/rs14143240","type":"journal-article","created":{"date-parts":[[2022,7,6]],"date-time":"2022-07-06T21:15:52Z","timestamp":1657142152000},"page":"3240","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":55,"title":["YOLOD: A Target Detection Method for UAV Aerial Imagery"],"prefix":"10.3390","volume":"14","author":[{"given":"Xudong","family":"Luo","sequence":"first","affiliation":[{"name":"College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China"}]},{"given":"Yiquan","family":"Wu","sequence":"additional","affiliation":[{"name":"College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China"}]},{"given":"Langyue","family":"Zhao","sequence":"additional","affiliation":[{"name":"College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,7,6]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.isprsjprs.2021.01.024","article-title":"A CNN approach to simultaneously count plants and detect plantation-rows from UAV imagery","volume":"174","author":"Osco","year":"2021","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Sivakumar, A.N.V., Li, J.T., Scott, S., Psota, E., Jhala, A.J., Luck, J.D., and Shi, Y.Y. (2020). Comparison of Object Detection and Patch-Based Classification Deep Learning Models on Mid- to Late-Season Weed Detection in UAV Imagery. Remote Sens., 12.","DOI":"10.3390\/rs12132136"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Wang, L., Xiang, L.R., Tang, L., and Jiang, H.Y. (2021). A Convolutional Neural Network-Based Method for Corn Stand Counting in the Field. Sensors, 21.","DOI":"10.3390\/s21020507"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Wu, J.T., Yang, G.J., Yang, H., Zhu, Y.H., Li, Z.H., Lei, L., and Zhao, C.J. (2020). Extracting apple tree crown information from remote imagery using deep learning. Comput. Electron. Agric., 174.","DOI":"10.1016\/j.compag.2020.105504"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Ammour, N., Alhichri, H., Bazi, Y., Benjdira, B., Alajlan, N., and Zuair, M. (2017). Deep Learning Approach for Car Detection in UAV Imagery. Remote Sens., 9.","DOI":"10.3390\/rs9040312"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Audebert, N., Le Saux, B., and Lefevre, S. (2017). Segment-before-Detect: Vehicle Detection and Classification through Semantic Segmentation of Aerial Images. Remote Sens., 9.","DOI":"10.3390\/rs9040368"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"208643","DOI":"10.1109\/ACCESS.2020.3036075","article-title":"Multi-Scale Vehicle Detection in High-Resolution Aerial Images with Context Information","volume":"8","author":"Li","year":"2020","journal-title":"IEEE Access"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1256","DOI":"10.1007\/s11263-019-01177-1","article-title":"Deep Learning Approach in Aerial Imagery for Supporting Land Search and Rescue Missions","volume":"127","author":"Marusic","year":"2019","journal-title":"Int. J. Comput. Vis."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"De Oliveira, D.C., and Wehrmeister, M.A. (2018). Using Deep Learning and Low-Cost RGB and Thermal Cameras to Detect Pedestrians in Aerial Images Captured by Multirotor UAV. Sensors, 18.","DOI":"10.3390\/s18072244"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"37905","DOI":"10.1109\/ACCESS.2021.3063681","article-title":"Automatic Person Detection in Search and Rescue Operations Using Deep CNN Detectors","volume":"9","author":"Sambolek","year":"2021","journal-title":"IEEE Access"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1627","DOI":"10.1109\/TPAMI.2009.167","article-title":"Object Detection with Discriminatively Trained Part-Based Models","volume":"32","author":"Felzenszwalb","year":"2010","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1023\/A:1008162616689","article-title":"A trainable system for object detection","volume":"38","author":"Papageorgiou","year":"2000","journal-title":"Int. J. Comput. Vis."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1023\/B:VISI.0000013087.49260.fb","article-title":"Robust real-time face detection","volume":"57","author":"Viola","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_14","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201325). Histograms of Oriented Gradients for Human Detection. Proceedings of the Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive image features from scale-invariant keypoints","volume":"60","author":"Lowe","year":"2004","journal-title":"Int. J. Comput. Vis."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 11\u201318). Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_18","unstructured":"Ren, S.Q., He, K.M., Girshick, R., and Sun, J. (2015, January 7\u201312). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"He, K.M., Gkioxari, G., Dollar, P., and Girshick, R. (2017, January 22\u201329). Mask R-CNN. Proceedings of the 16th IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Cai, Z.W., and Vasconcelos, N. (2018, January 18\u201323). Cascade R-CNN: Delving Into High Quality Object Detection. Proceedings of the 31st IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00644"},{"key":"ref_21","unstructured":"Dai, J.F., Li, Y., He, K.M., and Sun, J. (2016, January 5\u201310). R-FCN: Object Detection via Region-Based Fully Convolutional Networks. Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS), Barcelona, Spain."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 8\u201316). SSD: Single Shot MultiBox Detector. Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_24","unstructured":"Huang, L., Yang, Y., Deng, Y., and Yu, Y. (2015). DenseBox: Unifying Landmark Localization with End to End Object Detection. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K.M., and Dollar, P. (2017, January 22\u201329). Focal Loss for Dense Object Detection. Proceedings of the 16th IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, Faster, Stronger. Proceedings of the 30th IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_27","unstructured":"Redmon, J., and Farhadi, A.J. (2018). YOLOv3: An Incremental Improvement. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"740","DOI":"10.1109\/LGRS.2016.2542358","article-title":"Convolutional Neural Network Based Automatic Object Detection on Aerial Images","volume":"13","author":"Sevo","year":"2016","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"LaLonde, R., Zhang, D., and Shah, M. (2018, January 18\u201323). ClusterNet: Detecting Small Objects in Large Scenes by Exploiting Spatio-Temporal Information. Proceedings of the 31st IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00421"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"1452","DOI":"10.1109\/TPAMI.2020.2974745","article-title":"Gliding Vertex on the Horizontal Bounding Box for Multi-Oriented Object Detection","volume":"43","author":"Xu","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Avola, D., Cinque, L., Diko, A., Fagioli, A., Foresti, G.L., Mecca, A., Pannone, D., and Piciarelli, C. (2021). MS-Faster R-CNN: Multi-Stream Backbone for Improved Faster R-CNN Object Detection and Aerial Tracking from UAV Images. Remote Sens., 13.","DOI":"10.3390\/rs13091670"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"555","DOI":"10.1016\/j.neucom.2019.03.102","article-title":"Coarse-to-fine object detection in unmanned aerial vehicle imagery using lightweight convolutional neural network and deep motion saliency","volume":"398","author":"Zhang","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_33","unstructured":"Bochkovskiy, A., Wang, C.-Y., and Liao, H.-Y. (2020). Yolov4: Optimal Speed and Accuracy of Object Detection. arXiv."},{"key":"ref_34","unstructured":"Ge, Z., Liu, S., Wang, F., Li, Z., and Sun, J. (2021). Yolox: Exceeding Yolo Series in 2021. arXiv."},{"key":"ref_35","unstructured":"Howard, A., Sandler, M., Chu, G., Chen, L.C., Chen, B., Tan, M.X., Wang, W.J., Zhu, Y.K., Pang, R.M., and Vasudevan, V. (November, January 27). Searching for MobileNetV3. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_36","unstructured":"Misra, D.J. (2019). Mish: A Self Regularized Non-Monotonic Activation Function. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Zhang, Y.-F., Ren, W., Zhang, Z., Jia, Z., Wang, L., and Tan, T.J. (2021). Focal and Efficient IOU Loss for Accurate Bounding Box Regression. arXiv.","DOI":"10.1016\/j.neucom.2022.07.042"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Zhao, H.S., Shi, J.P., Qi, X.J., Wang, X.G., and Jia, J.Y. (2017, January 21\u201326). Pyramid Scene Parsing Network. Proceedings of the 30th IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.660"},{"key":"ref_39","unstructured":"Liu, S., Huang, D., and Wang, Y.J. (2019). Learning Spatial Fusion for Single-Shot Object Detection. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"He, K.M., Zhang, X.Y., Ren, S.Q., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_41","unstructured":"Maas, A.L., Hannun, A.Y., and Ng, A.Y. (2013). Rectifier Nonlinearities Improve Neural Network Acoustic Models, Computer Science Department, Stanford University."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Wang, C.-Y., Liao, H.-Y.M., Wu, Y.-H., Chen, P.-Y., Hsieh, J.-W., and Yeh, I.-H. (2020, January 14\u201319). CSPNet: A New Backbone that Can Enhance Learning Capability of CNN. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00203"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H.F., Shi, J.P., and Jia, J.Y. (2018, January 18\u201323). Path Aggregation Network for Instance Segmentation. Proceedings of the 31st IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"He, K.M., Zhang, X.Y., Ren, S.Q., and Sun, J. (2015, January 11\u201318). Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.123"},{"key":"ref_45","unstructured":"Clevert, D.-A., Unterthiner, T., and Hochreiter, S.J. (2015). Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). arXiv."},{"key":"ref_46","unstructured":"Ramachandran, P., Zoph, B., and Le, Q.V.J. (2017). Searching for Activation Functions. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"4525","DOI":"10.1109\/TIP.2016.2593342","article-title":"Scale-Aware Pixelwise Object Proposal Networks","volume":"25","author":"Jie","year":"2016","journal-title":"IEEE Trans. Image Process."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Yu, J., Jiang, Y., Wang, Z., Cao, Z., and Huang, T.J. (2016). UnitBox: An Advanced Object Detection Network. arXiv.","DOI":"10.1145\/2964284.2967274"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., Savarese, S., and Soc, I.C. (2019, January 16\u201320). Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression. Proceedings of the 32nd IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00075"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Zheng, Z.H., Wang, P., Liu, W., Li, J.Z., Ye, R.G., and Ren, D.W. (2020, January 7\u201312). Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. Proceedings of the 34th AAAI Conference on Artificial Intelligence\/32nd Innovative Applications of Artificial Intelligence Conference\/10th AAAI Symposium on Educational Advances in Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6999"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-Excitation Networks. Proceedings of the 31st IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Woo, S.H., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 8\u201314). CBAM: Convolutional Block Attention Module. Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., and Hu, Q. (2020, January 13\u201319). ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Los Alamitos, CA, USA.","DOI":"10.1109\/CVPR42600.2020.01155"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"He, K.M., Zhang, X.Y., Ren, S.Q., and Sun, J. (2014, January 6\u201312). Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. Proceedings of the 13th European Conference on Computer Vision (ECCV), Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10578-9_23"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Liu, S.T., Huang, D., and Wang, Y.H. (2018, January 8\u201314). Receptive Field Block Net for Accurate and Fast Object Detection. Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01252-6_24"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1016\/j.jvcir.2015.11.002","article-title":"Vehicle detection in aerial imagery: A small target detection benchmark","volume":"34","author":"Razakarivony","year":"2016","journal-title":"J. Vis. Commun. Image Represent."},{"key":"ref_57","unstructured":"Wilson, A.C., Roelofs, R., Stern, M., Srebro, N., and Recht, B.J. (2017). The Marginal Value of Adaptive Gradient Methods in Machine Learning. arXiv."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"336","DOI":"10.1007\/s11263-019-01228-7","article-title":"Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization","volume":"128","author":"Selvaraju","year":"2020","journal-title":"Int. J. Comput. Vis."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"621","DOI":"10.1080\/07038992.2021.1922879","article-title":"Integration of Multi-Source Geospatial Data from GNSS Receivers, Terrestrial Laser Scanners, and Unmanned Aerial Vehicles","volume":"47","author":"Dabrowski","year":"2021","journal-title":"Can. J. Remote Sens."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/14\/3240\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:43:14Z","timestamp":1760139794000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/14\/3240"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,6]]},"references-count":59,"journal-issue":{"issue":"14","published-online":{"date-parts":[[2022,7]]}},"alternative-id":["rs14143240"],"URL":"https:\/\/doi.org\/10.3390\/rs14143240","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,6]]}}}