{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T04:47:03Z","timestamp":1775710023498,"version":"3.50.1"},"reference-count":40,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2022,12,23]],"date-time":"2022-12-23T00:00:00Z","timestamp":1671753600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"the National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61772295"],"award-info":[{"award-number":["61772295"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"the National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["cstc2018jcyjAX0470"],"award-info":[{"award-number":["cstc2018jcyjAX0470"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"the National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["KJZD-M202000501"],"award-info":[{"award-number":["KJZD-M202000501"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"the National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["yjg193096"],"award-info":[{"award-number":["yjg193096"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"the National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["KJQN201800539"],"award-info":[{"award-number":["KJQN201800539"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012669","name":"the Natural Science Foundation Project of CQ 
CSTC","doi-asserted-by":"publisher","award":["61772295"],"award-info":[{"award-number":["61772295"]}],"id":[{"id":"10.13039\/501100012669","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012669","name":"the Natural Science Foundation Project of CQ CSTC","doi-asserted-by":"publisher","award":["cstc2018jcyjAX0470"],"award-info":[{"award-number":["cstc2018jcyjAX0470"]}],"id":[{"id":"10.13039\/501100012669","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012669","name":"the Natural Science Foundation Project of CQ CSTC","doi-asserted-by":"publisher","award":["KJZD-M202000501"],"award-info":[{"award-number":["KJZD-M202000501"]}],"id":[{"id":"10.13039\/501100012669","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012669","name":"the Natural Science Foundation Project of CQ CSTC","doi-asserted-by":"publisher","award":["yjg193096"],"award-info":[{"award-number":["yjg193096"]}],"id":[{"id":"10.13039\/501100012669","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012669","name":"the Natural Science Foundation Project of CQ CSTC","doi-asserted-by":"publisher","award":["KJQN201800539"],"award-info":[{"award-number":["KJQN201800539"]}],"id":[{"id":"10.13039\/501100012669","id-type":"DOI","asserted-by":"publisher"}]},{"name":"the Applying Basic Research Program of Chongqing Education Committee","award":["61772295"],"award-info":[{"award-number":["61772295"]}]},{"name":"the Applying Basic Research Program of Chongqing Education Committee","award":["cstc2018jcyjAX0470"],"award-info":[{"award-number":["cstc2018jcyjAX0470"]}]},{"name":"the Applying Basic Research Program of Chongqing Education Committee","award":["KJZD-M202000501"],"award-info":[{"award-number":["KJZD-M202000501"]}]},{"name":"the Applying Basic Research Program of Chongqing Education Committee","award":["yjg193096"],"award-info":[{"award-number":["yjg193096"]}]},{"name":"the Applying Basic Research Program of Chongqing Education 
Committee","award":["KJQN201800539"],"award-info":[{"award-number":["KJQN201800539"]}]},{"name":"the Chongqing Postgraduate Education Teaching Reform Research Project","award":["61772295"],"award-info":[{"award-number":["61772295"]}]},{"name":"the Chongqing Postgraduate Education Teaching Reform Research Project","award":["cstc2018jcyjAX0470"],"award-info":[{"award-number":["cstc2018jcyjAX0470"]}]},{"name":"the Chongqing Postgraduate Education Teaching Reform Research Project","award":["KJZD-M202000501"],"award-info":[{"award-number":["KJZD-M202000501"]}]},{"name":"the Chongqing Postgraduate Education Teaching Reform Research Project","award":["yjg193096"],"award-info":[{"award-number":["yjg193096"]}]},{"name":"the Chongqing Postgraduate Education Teaching Reform Research Project","award":["KJQN201800539"],"award-info":[{"award-number":["KJQN201800539"]}]},{"name":"the Science and Technology Research Program of Chongqing Municipal Education Commission","award":["61772295"],"award-info":[{"award-number":["61772295"]}]},{"name":"the Science and Technology Research Program of Chongqing Municipal Education Commission","award":["cstc2018jcyjAX0470"],"award-info":[{"award-number":["cstc2018jcyjAX0470"]}]},{"name":"the Science and Technology Research Program of Chongqing Municipal Education Commission","award":["KJZD-M202000501"],"award-info":[{"award-number":["KJZD-M202000501"]}]},{"name":"the Science and Technology Research Program of Chongqing Municipal Education Commission","award":["yjg193096"],"award-info":[{"award-number":["yjg193096"]}]},{"name":"the Science and Technology Research Program of Chongqing Municipal Education Commission","award":["KJQN201800539"],"award-info":[{"award-number":["KJQN201800539"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Object detection is one of the most widespread applications for numerous Unmanned Aerial Vehicle (UAV) tasks. 
Due to the shooting angle and flying height of the UAV, small objects account for a large proportion of aerial images compared with general scenarios, and common object detectors are not very effective on aerial images. Moreover, since the computing resources of UAV platforms are generally limited, deploying common detectors with a large number of parameters on UAV platforms is difficult. This paper proposes a lightweight object detector, YOLO-UAVlite, for aerial images. Firstly, the spatial attention module and coordinate attention module are modified and combined to form a novel Spatial-Coordinate Self-Attention (SCSA) module, which integrates spatial, location, and channel information to enhance object representation. On this basis, we construct a lightweight backbone, named SCSAshufflenet, which combines the Enhanced ShuffleNet (ES) network with the proposed SCSA module to improve feature extraction and reduce model size. Secondly, we propose an improved feature pyramid model, namely Slim-BiFPN, in which we construct new lightweight convolutional blocks to reduce information loss during the feature map fusion process while reducing the model weights. Finally, the localization loss function is modified to increase the bounding box regression rate while improving the localization accuracy. Extensive experiments conducted on the VisDrone-DET2021 dataset indicate that, compared with the YOLOv5-N baseline, the proposed YOLO-UAVlite reduces the number of parameters by 25.8% and achieves a gain of 10.9% in mAP0.50. 
Compared with other lightweight detectors, YOLO-UAVlite achieves a higher mAP with fewer parameters.<\/jats:p>","DOI":"10.3390\/rs15010083","type":"journal-article","created":{"date-parts":[[2022,12,27]],"date-time":"2022-12-27T07:31:56Z","timestamp":1672126316000},"page":"83","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":35,"title":["A Lightweight Object Detector Based on Spatial-Coordinate Self-Attention for UAV Aerial Images"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5538-0475","authenticated-orcid":false,"given":"Chen","family":"Liu","sequence":"first","affiliation":[{"name":"College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8582-4302","authenticated-orcid":false,"given":"Degang","family":"Yang","sequence":"additional","affiliation":[{"name":"College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China"}]},{"given":"Liu","family":"Tang","sequence":"additional","affiliation":[{"name":"College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7028-4641","authenticated-orcid":false,"given":"Xun","family":"Zhou","sequence":"additional","affiliation":[{"name":"Party School of Yibin Committee of Communist Party of China, Yibin 644000, China"}]},{"given":"Yi","family":"Deng","sequence":"additional","affiliation":[{"name":"College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,12,23]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Wang, L., Xiang, L., Tang, L., and Jiang, H. (2021). A Convolutional Neural Network-Based Method for Corn Stand Counting in the Field. 
Sensors, 21.","DOI":"10.3390\/s21020507"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"37905","DOI":"10.1109\/ACCESS.2021.3063681","article-title":"Automatic Person Detection in Search and Rescue Operations Using Deep CNN Detectors","volume":"9","author":"Sambolek","year":"2021","journal-title":"IEEE Access"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1007\/s41095-018-0116-x","article-title":"Traffic signal detection and classification in street views using an attention model","volume":"4","author":"Lu","year":"2018","journal-title":"Comput. Vis. Media"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, June 23\u201328). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, December 7\u201313). Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Dollar, P., and Girshick, R. (2017, October 22\u201329). Mask R-CNN. Proceedings of the International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, June 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, October 11\u201314). Ssd: Single shot multibox detector. 
Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, July 21\u201326). YOLO9000: Better, faster, stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_10","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, June 7\u201312). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_12","unstructured":"Ioffe, S., and Szegedy, C. (2015, July 7\u20139). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_13","unstructured":"Bochkovskiy, A., Wang, C.Y., and Liao, H.Y.M. (2020). Yolov4: Optimal speed and accuracy of object detection. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., and Ren, D. (2020, February 7\u201312). Distance-IoU loss: Faster and better learning for bounding box regression. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6999"},{"key":"ref_15","unstructured":"Jocher, G., Chaurasia, A., Stoken, A., Borovec, J., NanoCode012, Kwon, Y., Xie, T., Fang, J., imyhxy, and Michael, K. (2022, December 20). ultralytics\/yolov5: v6.1\u2014TensorRT, TensorFlow Edge TPU and OpenVINO Export and Inference. 
Available online: https:\/\/zenodo.org\/record\/6222936#.Y5GBLH1BxPZ."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, July 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_17","unstructured":"Wang, W., Xie, E., Song, X., Zang, Y., Wang, W., Lu, T., Yu, G., and Shen, C. (2019, October 27\u2013November 2). Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_18","unstructured":"(2022, December 20). NanoDet-Plus: Super Fast and High Accuracy Lightweight Anchor-Free Object Detection Model. Available online: https:\/\/github.com\/RangiLyu\/nanodet."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, September 6\u201312). Microsoft coco: Common objects in context. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"7380","DOI":"10.1109\/TPAMI.2021.3119563","article-title":"Detection and Tracking Meet Drones Challenge","volume":"44","author":"Zhu","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_21","unstructured":"Yang, F., Fan, H., Chu, P., Blasch, E., and Ling, H. (2019, October 27\u2013November 2). Clustered object detection in aerial images. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"5146","DOI":"10.1109\/TGRS.2019.2897139","article-title":"ORSIm detector: A novel object detection framework in optical remote sensing imagery using spatial-frequency channel features","volume":"57","author":"Wu","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"5512","DOI":"10.1109\/TGRS.2019.2899955","article-title":"R2-CNN: Fast Tiny Object Detection in Large-Scale Remote Sensing Images","volume":"57","author":"Pang","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"2300","DOI":"10.1109\/TCYB.2020.3004636","article-title":"Context-aware block net for small object detection","volume":"52","author":"Cui","year":"2020","journal-title":"IEEE Trans. Cybern."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"227288","DOI":"10.1109\/ACCESS.2020.3046515","article-title":"YOLO-ACN: Focusing on small target and occluded object detection","volume":"8","author":"Li","year":"2020","journal-title":"IEEE Access"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1452","DOI":"10.1109\/TPAMI.2020.2974745","article-title":"Gliding vertex on the horizontal bounding box for multi-oriented object detection","volume":"43","author":"Xu","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_27","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhang, X., Zhou, X., Lin, M., and Sun, J. (2018, June 18\u201323). Shufflenet: An extremely efficient convolutional neural network for mobile devices. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00716"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Ma, N., Zhang, X., Zheng, H.T., and Sun, J. (2018, September 8\u201314). Shufflenet v2: Practical guidelines for efficient cnn architecture design. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01264-9_8"},{"key":"ref_30","unstructured":"Yu, G., Chang, Q., Lv, W., Xu, C., Cui, C., Ji, W., Dang, Q., Deng, K., Wang, G., and Du, Y. (2021). PP-PicoDet: A Better Real-Time Object Detector on Mobile Devices. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, June 18\u201323). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., and Hu, Q. (2019). ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. arXiv.","DOI":"10.1109\/CVPR42600.2020.01155"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018, September 8\u201314). Cbam: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Hou, Q., Zhou, D., and Feng, J. (2021, June 20\u201325). Coordinate attention for efficient mobile network design. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01350"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., and Xu, C. (2020, June 13\u201319). 
Ghostnet: More features from cheap operations. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00165"},{"key":"ref_36","unstructured":"Howard, A., Sandler, M., Chu, G., Chen, L.C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., and Vasudevan, V. (2019, October 27\u2013November 2). Searching for mobilenetv3. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_37","first-page":"20230","article-title":"Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression","volume":"34","author":"He","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_38","unstructured":"Li, C., Li, L., Jiang, H., Weng, K., Geng, Y., Li, L., Ke, Z., Li, Q., Cheng, M., and Nie, W. (2022). YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Wang, C.Y., Bochkovskiy, A., and Liao, H.Y.M. (2022). YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv.","DOI":"10.1109\/CVPR52729.2023.00721"},{"key":"ref_40","unstructured":"Chen, X., and Gong, Z. (2022, December 20). YOLOv5-Lite: Lighter, Faster and Easier to Deploy. 
Available online: https:\/\/zenodo.org\/record\/5241425#.Y6-BhhVBxPY."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/1\/83\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:49:39Z","timestamp":1760147379000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/1\/83"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,23]]},"references-count":40,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,1]]}},"alternative-id":["rs15010083"],"URL":"https:\/\/doi.org\/10.3390\/rs15010083","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,23]]}}}