{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T15:12:33Z","timestamp":1773414753517,"version":"3.50.1"},"reference-count":62,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2023,3,21]],"date-time":"2023-03-21T00:00:00Z","timestamp":1679356800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62072021"],"award-info":[{"award-number":["62072021"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Object detection in drone-captured images has been a popular task in recent years. As drones navigate at different altitudes, the object scale varies considerably, which burdens the optimization of models. Moreover, high-speed and low-altitude flight causes motion blur on densely packed objects, which poses great challenges. To solve these two issues, based on YOLOv5, we add an additional prediction head to detect tiny-scale objects and replace CNN-based prediction heads with transformer prediction heads (TPH), constructing the TPH-YOLOv5 model. TPH-YOLOv5++ is proposed to significantly reduce the computational cost and improve the detection speed of TPH-YOLOv5. In TPH-YOLOv5++, a cross-layer asymmetric transformer (CA-Trans) is designed to replace the additional prediction head while maintaining the knowledge of this head. By using a sparse local attention (SLA) module, the asymmetric information between the additional head and other heads can be captured efficiently, enriching the features of other heads. 
In the VisDrone Challenge 2021, TPH-YOLOv5 won 4th place and achieved well-matched results with the 1st place model (AP 39.43%). Based on the TPH-YOLOv5 and CA-Trans module, TPH-YOLOv5++ can further increase efficiency while achieving comparable and better results.<\/jats:p>","DOI":"10.3390\/rs15061687","type":"journal-article","created":{"date-parts":[[2023,3,21]],"date-time":"2023-03-21T06:56:48Z","timestamp":1679381808000},"page":"1687","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":99,"title":["TPH-YOLOv5++: Boosting Object Detection on Drone-Captured Scenarios with Cross-Layer Asymmetric Transformer"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3508-027X","authenticated-orcid":false,"given":"Qi","family":"Zhao","sequence":"first","affiliation":[{"name":"Department of Electronic and Information Engineering, Beihang University, Beijing 100191, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6590-0016","authenticated-orcid":false,"given":"Binghao","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Electronic and Information Engineering, Beihang University, Beijing 100191, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9769-7083","authenticated-orcid":false,"given":"Shuchang","family":"Lyu","sequence":"additional","affiliation":[{"name":"Department of Electronic and Information Engineering, Beihang University, Beijing 100191, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8955-9964","authenticated-orcid":false,"given":"Chunlei","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Electronic and Information Engineering, Beihang University, Beijing 100191, China"}]},{"given":"Hong","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Astronautics, Beihang University, Beijing 100191, 
China"}]}],"member":"1968","published-online":{"date-parts":[[2023,3,21]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1016\/j.isprsjprs.2017.11.011","article-title":"Beyond RGB: Very high resolution urban remote sensing with multimodal deep networks","volume":"140","author":"Audebert","year":"2018","journal-title":"ISPRS J. Photogramm. Remote. Sens."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1109\/MCOM.2018.1700422","article-title":"Multiple moving targets surveillance based on a cooperative network for multi-UAV","volume":"56","author":"Gu","year":"2018","journal-title":"IEEE Commun. Mag."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Hird, J.N., Montaghi, A., McDermid, G.J., Kariyeva, J., Moorman, B.J., Nielsen, S.E., and McIntosh, A.C. (2017). Use of unmanned aerial vehicles for monitoring recovery of forest vegetation on petroleum well sites. Remote Sens., 9.","DOI":"10.3390\/rs9050413"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1016\/j.rse.2018.06.028","article-title":"Detecting mammals in UAV images: Best practices to address a substantially imbalanced dataset with deep learning","volume":"216","author":"Kellenberger","year":"2018","journal-title":"Remote Sens. Environ."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1007\/s11263-014-0733-5","article-title":"The pascal visual object classes challenge: A retrospective","volume":"111","author":"Everingham","year":"2015","journal-title":"Int. J. Comput. 
Vis."},{"key":"ref_7","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015, January 7\u201312). Faster r-cnn: Towards real-time object detection with region proposal networks. Proceedings of the NIPS 2015, Advances in Neural Information Processing Systems 28, Montreal, QC, Canada."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 11\u201314). Ssd: Single shot multibox detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhang, H., Wang, Y., Dayoub, F., and Sunderhauf, N. (2021, January 20\u201325). Varifocalnet: An iou-aware dense object detector. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00841"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, faster, stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_13","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. 
arXiv."},{"key":"ref_14","unstructured":"Bochkovskiy, A., Wang, C.Y., and Liao, H.Y.M. (2020). Yolov4: Optimal speed and accuracy of object detection. arXiv."},{"key":"ref_15","unstructured":"Jocher, G., Stoken, A., Borovec, J., Chaurasia, A., Changyu, L., Laughing, A., Hogan, A., Hajek, J., Diaconu, L., and Marc, Y. (2021). ultralytics\/yolov5: V5.0-YOLOv5-P6 1280 models AWS Supervise.ly and YouTube integrations. Zenodo, 11."},{"key":"ref_16","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the NIPS 2017, Advances in Neural Information Processing Systems 30, Long Beach, CA, USA."},{"key":"ref_17","unstructured":"Alexey, D., Lucas, B., Alexander, K., Dirk, W., Xiaohua, Z., Thomas, U., Mostafa, D., Matthias, M., Georg, H., and Sylvain, G. (2021, January 3\u20137). An image is worth 16x16 words: Transformers for image recognition at scale. Proceedings of the 9th International Conference on Learning Representations (ICLR 2021), Vienna, Austria."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 8\u201314). Cbam: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Du, D., Qi, Y., Yu, H., Yang, Y., Duan, K., Li, G., Zhang, W., Huang, Q., and Tian, Q. (2018, January 8\u201314). The unmanned aerial vehicle benchmark: Object detection and tracking. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01249-6_23"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zhu, X., Lyu, S., Wang, X., and Zhao, Q. (2021, January 11\u201317). 
TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCVW54120.2021.00312"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"7380","DOI":"10.1109\/TPAMI.2021.3119563","article-title":"Detection and tracking meet drones challenge","volume":"44","author":"Zhu","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Cai, Z., and Vasconcelos, N. (2018, January 18\u201323). Cascade r-cnn: Delving into high quality object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00644"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Law, H., and Deng, J. (2018, January 8\u201314). Cornernet: Detecting objects as paired keypoints. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01264-9_45"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., and Tian, Q. (2019, January 27\u201328). Centernet: Keypoint triplets for object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00667"},{"key":"ref_25","unstructured":"Ge, Z., Liu, S., Wang, F., Li, Z., and Sun, J. (2021). Yolox: Exceeding yolo series in 2021. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Yang, Z., Liu, S., Hu, H., Wang, L., and Lin, S. (2019, January 27\u201328). Reppoints: Point set representation for object detection. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00975"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Wu, Y., Chen, Y., Yuan, L., Liu, Z., Wang, L., Li, H., and Fu, Y. (2020, January 14\u201319). Rethinking classification and localization for object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01020"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Chen, Q., Wang, Y., Yang, T., Zhang, X., Cheng, J., and Sun, J. (2021, January 20\u201325). You only look one-level feature. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01284"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Bodla, N., Singh, B., Chellappa, R., and Davis, L.S. (2017, January 22\u201329). Soft-NMS\u2013improving object detection with one line of code. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.593"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"104117","DOI":"10.1016\/j.imavis.2021.104117","article-title":"Weighted boxes fusion: Ensembling boxes from different object detection models","volume":"107","author":"Solovyev","year":"2021","journal-title":"Image Vis. Comput."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Yang, F., Fan, H., Chu, P., Blasch, E., and Ling, H. (2019, January 27\u201328). Clustered object detection in aerial images. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00840"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Zhang, J., Huang, J., Chen, X., and Zhang, D. (2019, January 27\u201328). How to fully exploit the abilities of aerial image detectors. Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea.","DOI":"10.1109\/ICCVW.2019.00007"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1016\/j.isprsjprs.2021.08.002","article-title":"Multi-scale adversarial network for vehicle detection in UAV imagery","volume":"180","author":"Zhang","year":"2021","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1556","DOI":"10.1109\/TIP.2020.3045636","article-title":"A global-local self-adaptive network for drone-view object detection","volume":"30","author":"Deng","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Li, C., Yang, T., Zhu, S., Chen, C., and Guan, S. (2020, January 13\u201319). Density map guided object detection in aerial images. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00103"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Yu, W., Yang, T., and Chen, C. (2021, January 5\u20139). Towards resolving the challenge of long-tail distribution in UAV images for object detection. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Virtual.","DOI":"10.1109\/WACV48630.2021.00330"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Chen, J., Hong, H., Song, B., Guo, J., Chen, C., and Xu, J. (2023). MDCT: Multi-Kernel Dilated Convolution and Transformer for One-Stage Object Detection of Remote Sensing Images. 
Remote Sens., 15.","DOI":"10.3390\/rs15020371"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Gallo, I., Rehman, A.U., Dehkordi, R.H., Landro, N., La Grassa, R., and Boschetti, M. (2023). Deep Object Detection of Crop Weeds: Performance of YOLOv7 on a Real Case Dataset from UAV Images. Remote Sens., 15.","DOI":"10.3390\/rs15020539"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Liu, R., Tao, F., Liu, X., Na, J., Leng, H., Wu, J., and Zhou, T. (2022). RAANet: A Residual ASPP with Attention Framework for Semantic Segmentation of High-Resolution Remote Sensing Images. Remote Sens., 14.","DOI":"10.3390\/rs14133109"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"364","DOI":"10.1016\/j.neucom.2020.06.011","article-title":"Novel up-scale feature aggregation for object detection in aerial images","volume":"411","author":"Lin","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Duan, C., Wei, Z., Zhang, C., Qu, S., and Wang, H. (2021, January 11\u201317). Coarse-grained Density Map Guided Object Detection in Aerial Images. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCVW54120.2021.00313"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Xi, Y., Jia, W., Miao, Q., Liu, X., Fan, X., and Li, H. (2022). FiFoNet: Fine-Grained Target Focusing Network for Object Detection in UAV Images. Remote Sens., 14.","DOI":"10.3390\/rs14163919"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Ringwald, T., Sommer, L., Schumann, A., Beyerer, J., and Stiefelhagen, R. (2019, January 16\u201317). UAV-Net: A fast aerial vehicle detector for mobile platforms. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA.","DOI":"10.1109\/CVPRW.2019.00080"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Zhang, R., Shao, Z., Huang, X., Wang, J., and Li, D. (2020). Object detection in UAV images via global density fused convolutional network. Remote Sens., 12.","DOI":"10.3390\/rs12193140"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Chen, P.Y., Hsieh, J.W., Wang, C.Y., and Liao, H.Y.M. (2020, January 14\u201319). Recursive hybrid fusion pyramid network for real-time small object detection on embedded devices. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00209"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Cao, J., Pang, Y., Han, J., and Li, X. (2021). Hierarchical Regression and Classification for Accurate Object Detection. IEEE Trans. Neural Netw. Learn. Syst., 1\u201315.","DOI":"10.1109\/TNNLS.2021.3106641"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Qi, G., Zhang, Y., Wang, K., Mazur, N., Liu, Y., and Malaviya, D. (2022). Small Object Detection Method Based on Adaptive Spatial Parallel Convolution and Fast Multi-Scale Fusion. Remote Sens., 14.","DOI":"10.3390\/rs14020420"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Huang, Y., Chen, J., and Huang, D. (2022, January 7\u201314). Ufpmp-det: Toward accurate and efficient object detection on drone imagery. Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA.","DOI":"10.1609\/aaai.v36i1.19986"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Dong, X., Bao, J., Chen, D., Zhang, W., Yu, N., Yuan, L., Chen, D., and Guo, B. (2022, January 18\u201324). Cswin transformer: A general vision transformer backbone with cross-shaped windows. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01181"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Chen, C.F.R., Fan, Q., and Panda, R. (2021, January 11\u201317). Crossvit: Cross-attention multi-scale vision transformer for image classification. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00041"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Xu, R., Xiang, H., Tu, Z., Xia, X., Yang, M.H., and Ma, J. (2022, January 23\u201327). V2X-ViT: Vehicle-to-everything cooperative perception with vision transformer. Proceedings of the Computer Vision\u2013ECCV 2022: 17th European Conference, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-19842-7_7"},{"key":"ref_54","unstructured":"Xu, R., Tu, Z., Xiang, H., Shao, W., Zhou, B., and Ma, J. (2022). CoBEVT: Cooperative bird\u2019s eye view semantic segmentation with sparse transformers. arXiv."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Tu, Z., Talebi, H., Zhang, H., Yang, F., Milanfar, P., Bovik, A., and Li, Y. (2022, January 23\u201327). Maxvit: Multi-axis vision transformer. Proceedings of the Computer Vision\u2013ECCV 2022: 17th European Conference, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-20053-3_27"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23\u201328). End-to-end object detection with transformers. 
Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_57","unstructured":"Wang, Y., Zhang, X., Yang, T., and Sun, J. (2021). Anchor DETR: Query Design for Transformer-Based Object Detection. arXiv."},{"key":"ref_58","first-page":"26183","article-title":"You only look at one sequence: Rethinking transformer in vision through object detection","volume":"34","author":"Fang","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Cao, Y., He, Z., Wang, L., Wang, W., Yuan, Y., Zhang, D., Zhang, J., Zhu, P., Van Gool, L., and Han, J. (2021, January 11\u201317). VisDrone-DET2021: The vision meets drone object detection challenge results. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCVW54120.2021.00319"},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"117106","DOI":"10.1016\/j.eswa.2022.117106","article-title":"Dilated convolution based RCNN using feature fusion for Low-Altitude aerial objects","volume":"199","author":"Mittal","year":"2022","journal-title":"Expert Syst. Appl."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Lu, X., Cao, G., Yang, Y., Jiao, L., and Liu, F. (2021, January 11\u201317). ViT-YOLO: Transformer-based YOLO for object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCVW54120.2021.00314"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Wan, J., Zhang, B., Zhao, Y., Du, Y., and Tong, Z. (2021, January 11\u201317). VistrongerDet: Stronger Visual Information for Object Detection in VisDrone Images. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCVW54120.2021.00316"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/6\/1687\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:59:41Z","timestamp":1760122781000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/6\/1687"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,21]]},"references-count":62,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["rs15061687"],"URL":"https:\/\/doi.org\/10.3390\/rs15061687","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,21]]}}}