{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,28]],"date-time":"2026-03-28T18:27:51Z","timestamp":1774722471867,"version":"3.50.1"},"reference-count":56,"publisher":"MDPI AG","issue":"16","license":[{"start":{"date-parts":[[2024,8,11]],"date-time":"2024-08-11T00:00:00Z","timestamp":1723334400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Target detection in unmanned aerial vehicle (UAV) images is currently a research hotspot. Due to the significant scale variability of targets and the interference of complex backgrounds, current target detection models face challenges when applied to UAV images. To address these issues, we designed an effective and lightweight full-scale target detection network, FSTD-Net. The design of FSTD-Net is based on three principal aspects. Firstly, to optimize the extracted target features at different scales while minimizing background noise and sparse feature representations, a multi-scale contextual information extraction module (MSCIEM) is developed. The multi-scale information extraction module (MSIEM) in MSCIEM can better capture multi-scale features, and the contextual information extraction module (CIEM) in MSCIEM is designed to capture long-range contextual information. Secondly, to better adapt to various target shapes at different scales in UAV images, we propose the feature extraction module fitting different shapes (FEMFDS), based on deformable convolutions. Finally, considering that low-level features contain rich details, a low-level feature enhancement branch (LLFEB) is designed. The experiments demonstrate that, compared to the second-best model, the proposed FSTD-Net achieves improvements of 3.8%, 2.4%, and 2.0% in AP50, AP, and AP75 on the VisDrone2019 dataset, respectively. 
Additionally, FSTD-Net achieves gains of 3.4%, 1.7%, and 1.0% on the UAVDT dataset. The proposed FSTD-Net thus achieves better detection performance than state-of-the-art detection models. The experimental results indicate the effectiveness of FSTD-Net for target detection in UAV images.<\/jats:p>","DOI":"10.3390\/rs16162944","type":"journal-article","created":{"date-parts":[[2024,8,12]],"date-time":"2024-08-12T08:54:08Z","timestamp":1723452848000},"page":"2944","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["An Effective and Lightweight Full-Scale Target Detection Network for UAV Images Based on Deformable Convolutions and Multi-Scale Contextual Feature Optimization"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-6379-8650","authenticated-orcid":false,"given":"Wanwan","family":"Yu","sequence":"first","affiliation":[{"name":"School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150001, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1082-114X","authenticated-orcid":false,"given":"Junping","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150001, China"}]},{"given":"Dongyang","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150001, China"}]},{"given":"Yunqiao","family":"Xi","sequence":"additional","affiliation":[{"name":"School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150001, China"}]},{"given":"Yinhu","family":"Wu","sequence":"additional","affiliation":[{"name":"School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150001, 
China"}]}],"member":"1968","published-online":{"date-parts":[[2024,8,11]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/JPROC.2023.3238524","article-title":"Object detection in 20 years: A survey","volume":"111","author":"Zou","year":"2023","journal-title":"Proc. IEEE"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"103514","DOI":"10.1016\/j.dsp.2022.103514","article-title":"A survey of modern deep learning based object detection models","volume":"126","author":"Zaidi","year":"2022","journal-title":"Digit. Signal Process."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Waheed, M., Ahmad, R., Ahmed, W., Alam, M.M., and Magarini, M. (2023). On coverage of critical nodes in UAV-assisted emergency networks. Sensors, 23.","DOI":"10.3390\/s23031586"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"19683","DOI":"10.1007\/s11042-021-11146-x","article-title":"Monitoring and surveillance of urban road traffic using low altitude drone images: A deep learning approach","volume":"81","author":"Gupta","year":"2022","journal-title":"Multimed. Tools Appl."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Deng, A., Han, G., Chen, D., Ma, T., and Liu, Z. (2023). Slight aware enhancement transformer and multiple matching network for real-time UAV tracking. Remote Sens., 15.","DOI":"10.3390\/rs15112857"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"5506305","DOI":"10.1109\/LGRS.2021.3079317","article-title":"Multitask learning of alfalfa nutritive value from UAV-based hyperspectral images","volume":"19","author":"Feng","year":"2022","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards real-time object detection with region proposal networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1109\/MGRS.2021.3115137","article-title":"Deep learning for unmanned aerial vehicle-based object detection and tracking: A survey","volume":"10","author":"Wu","year":"2022","journal-title":"IEEE Geosci. Remote Sens. Mag."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Terven, J., and Cordova-Esparza, D. (2023). A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS. arXiv.","DOI":"10.3390\/make5040083"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2016). YOLO9000: Better, Faster, Stronger. arXiv.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_14","unstructured":"Redmon, J., and Farhadi, A. (2018). YOLOv3: An Incremental Improvement. arXiv."},{"key":"ref_15","unstructured":"Bochkovskiy, A., Wang, C.-Y., and Liao, H.-Y.M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. 
arXiv."},{"key":"ref_16","unstructured":"Li, C., Li, L., Jiang, H., Weng, K., Geng, Y., Li, L., and Wei, X. (2022). YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wang, C.-Y., Bochkovskiy, A., and Liao, H.-Y.M. (2023, January 17\u201324). YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00721"},{"key":"ref_18","unstructured":"Reis, D., Kupec, J., Hong, J., and Daoudi, A. (2023). Real-Time Flying Object Detection with YOLOv8. arXiv."},{"key":"ref_19","unstructured":"Wang, C.Y., Yeh, I.H., and Liao, H.Y.M. (2024). YOLOv9: Learning what you want to learn using programmable gradient information. arXiv."},{"key":"ref_20","unstructured":"Wang, A., Chen, H., Liu, L., Chen, K., Lin, Z., Han, J., and Ding, G. (2024). YOLOv10: Real-Time End-to-End Object Detection. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_22","first-page":"4704415","article-title":"A DeNoising FPN With Transformer R-CNN for Tiny Object Detection","volume":"62","author":"Liu","year":"2024","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zhao, G., Ge, W., and Yu, Y. (2021, January 11\u201317). GraphFPN: Graph feature pyramid network for object detection. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00276"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Yang, G., Lei, J., Zhu, Z., Cheng, S., Feng, Z., and Liang, R. (2023, January 1\u20134). AFPN: Asymptotic feature pyramid network for object detection. Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC), Honolulu, HI, USA.","DOI":"10.1109\/SMC53992.2023.10394415"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., and Wei, Y. (2017, January 22\u201329). Deformable convolutional networks. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.89"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Zhu, X., Hu, H., Lin, S., and Dai, J. (2019, January 16\u201320). Deformable convnets v2: More deformable, better results. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00953"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Wang, W., Dai, J., Chen, Z., Huang, Z., Li, Z., Zhu, X., Hu, X., Lu, T., Lu, L., and Li, H. (2023, January 17\u201324). InternImage: Exploring large-scale vision foundation models with deformable convolutions. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.01385"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Xiong, Y., Li, Z., Chen, Y., Wang, F., Zhu, X., Luo, J., and Dai, J. (2024). Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications. 
arXiv.","DOI":"10.1109\/CVPR52733.2024.00540"},{"key":"ref_29","first-page":"5603914","article-title":"Center-boundary dual attention for oriented object detection in remote sensing images","volume":"60","author":"Liu","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"2069","DOI":"10.1109\/TMM.2021.3075566","article-title":"Realtime and accurate UAV pedestrian detection for social distancing monitoring in COVID-19 pandemic","volume":"24","author":"Shao","year":"2022","journal-title":"IEEE Trans. Multimed."},{"key":"ref_31","first-page":"6001105","article-title":"Contrastive Learning and Similarity Feature Fusion for UAV Image Target Detection","volume":"21","author":"Wang","year":"2024","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_32","first-page":"6004305","article-title":"Self-attention guidance and multiscale feature fusion-based UAV image object detection","volume":"20","author":"Zhang","year":"2023","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1109\/JMASS.2023.3332948","article-title":"Hybrid CNN and Transformer Network for Semantic Segmentation of UAV Remote Sensing Images","volume":"5","author":"Zhou","year":"2024","journal-title":"IEEE J. Miniat. Air Space Syst."},{"key":"ref_34","first-page":"4406410","article-title":"SFSANet: Multi-scale object detection in remote sensing image based on semantic fusion and scale adaptability","volume":"62","author":"Zhang","year":"2024","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_35","first-page":"5603214","article-title":"Attention-free global multiscale fusion network for remote sensing object detection","volume":"62","author":"Gao","year":"2023","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"5613515","DOI":"10.1109\/TGRS.2023.3294241","article-title":"A task-balanced multiscale adaptive fusion network for object detection in remote sensing images","volume":"61","author":"Gao","year":"2023","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"5615614","DOI":"10.1109\/TGRS.2023.3294241","article-title":"Global to local: A scale-aware network for remote sensing object detection","volume":"61","author":"Gao","year":"2023","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"6510405","DOI":"10.1109\/LGRS.2022.3178479","article-title":"Multiscale deformable attention and multilevel features aggregation for remote sensing object detection","volume":"19","author":"Dong","year":"2022","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"6505305","DOI":"10.1109\/LGRS.2022.3141109","article-title":"Object detection deployed on UAVs for oblique images by fusing IMU information","volume":"19","author":"Shen","year":"2022","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"5015214","DOI":"10.1109\/TIM.2024.3381272","article-title":"MFFSODNet: Multi-Scale Feature Fusion Small Object Detection Network for UAV Aerial Images","volume":"73","author":"Jiang","year":"2024","journal-title":"IEEE Trans. Instrum. Meas."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"6517905","DOI":"10.1109\/LGRS.2022.3220661","article-title":"Find small objects in UAV images by feature mining and attention","volume":"19","author":"Liu","year":"2022","journal-title":"IEEE Geosci. Remote Sens. 
Lett."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"6006605","DOI":"10.1109\/LGRS.2024.3382090","article-title":"MFO-Net: A Multiscale Feature Optimization Network for UAV Image Object Detection","volume":"21","author":"Lan","year":"2024","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"13312","DOI":"10.1109\/JIOT.2023.3334742","article-title":"Split-and-Shuffle Detector for Real-Time Traffic Object Detection in Aerial Image","volume":"11","author":"Mao","year":"2024","journal-title":"IEEE Internet Things J."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"8018505","DOI":"10.1109\/LGRS.2021.3103069","article-title":"SSPNet: Scale selection pyramid network for tiny person detection from UAV images","volume":"19","author":"Hong","year":"2022","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"2300","DOI":"10.1109\/TCYB.2020.3004636","article-title":"Context-aware block net for small object detection","volume":"52","author":"Cui","year":"2022","journal-title":"IEEE Trans. Cybern."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Yang, C., Huang, Z., and Wang, N. (2022, January 18\u201324). QueryDet: Cascaded sparse query for accelerating high-resolution small object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01330"},{"key":"ref_47","first-page":"5621411","article-title":"Full-Scale Feature Aggregation and Grouping Feature Reconstruction-Based UAV Image Target Detection","volume":"62","author":"Zhang","year":"2024","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"3456","DOI":"10.1109\/TCSVT.2020.3038649","article-title":"Efficient selective context network for accurate object detection","volume":"31","author":"Nie","year":"2021","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"5602918","DOI":"10.1109\/TGRS.2022.3224815","article-title":"FSoD-Net: Full-scale object detection from optical remote sensing imagery","volume":"60","author":"Wang","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Cai, X., Lai, Q., Wang, Y., Wang, W., Sun, Z., and Yao, Y. (2024). Poly Kernel Inception Network for Remote Sensing Detection. arXiv.","DOI":"10.1109\/CVPR52733.2024.02617"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201323). Path aggregation network for instance segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2016, January 27\u201330). Rethinking the inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.308"},{"key":"ref_53","unstructured":"Yu, W., Zhou, P., Yan, S., and Wang, X. (2023). Inceptionnext: When inception meets convnext. arXiv."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Du, D., Qi, Y., Yu, H., Yang, Y., Duan, K., Li, G., Zhang, W., Huang, Q., and Tian, Q. (2018, January 8\u201314). The unmanned aerial vehicle benchmark: Object detection and tracking. 
Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01249-6_23"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 11\u201314). SSD: Single shot multibox detector. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The pascal visual object classes (voc) challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/16\/2944\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:35:00Z","timestamp":1760110500000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/16\/2944"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,11]]},"references-count":56,"journal-issue":{"issue":"16","published-online":{"date-parts":[[2024,8]]}},"alternative-id":["rs16162944"],"URL":"https:\/\/doi.org\/10.3390\/rs16162944","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,11]]}}}