{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T19:18:18Z","timestamp":1776885498858,"version":"3.51.2"},"reference-count":66,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2023,6,9]],"date-time":"2023-06-09T00:00:00Z","timestamp":1686268800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation","award":["62076137"],"award-info":[{"award-number":["62076137"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>SSD is a classical single-stage object detection algorithm that makes predictions from feature maps of different scales generated on different convolutional layers. However, because of its insufficient non-linearity and the lack of semantic information in the shallow feature maps, and because small objects contain few pixels, its detection accuracy on small objects is significantly worse than on large- and medium-scale objects. To address these problems, we propose a novel object detector, self-attention combined feature fusion-based SSD for small object detection (SAFF-SSD), to boost the precision of small object detection. In this work, a novel self-attention module called the Local Lighted Transformer block (2L-Transformer) is proposed and is coupled with EfficientNetV2-S as our backbone for improved feature extraction. The CSP-PAN topology is adopted as the detection neck to equip feature maps with both low-level object detail features and high-level semantic features, improving the accuracy of object detection, with a clear effect on the detection of small targets. 
In addition, we substitute the normalized Wasserstein distance (NWD) for the commonly used Intersection over Union (IoU), which alleviates the problem that IoU-based metrics and their extensions are very sensitive to the positional deviation of small objects. Experiments show the promising performance of our detector on several datasets, including Pascal VOC 2007, TGRS-HRRSD and AI-TOD.<\/jats:p>","DOI":"10.3390\/rs15123027","type":"journal-article","created":{"date-parts":[[2023,6,9]],"date-time":"2023-06-09T08:37:33Z","timestamp":1686299853000},"page":"3027","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":55,"title":["SAFF-SSD: Self-Attention Combined Feature Fusion-Based SSD for Small Object Detection in Remote Sensing"],"prefix":"10.3390","volume":"15","author":[{"given":"Bihan","family":"Huo","sequence":"first","affiliation":[{"name":"School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chenglong","family":"Li","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianwei","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-1867-8517","authenticated-orcid":false,"given":"Yingjian","family":"Xue","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhoujin","family":"Lin","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,6,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"103910","DOI":"10.1016\/j.imavis.2020.103910","article-title":"Recent Advances in Small Object Detection Based on Deep Learning: A Review","volume":"97","author":"Tong","year":"2020","journal-title":"Image Vis. Comput."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-Based Learning Applied to Document Recognition","volume":"86","author":"Lecun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the 2016 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, Faster, Stronger. 
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_7","unstructured":"Redmon, J., and Farhadi, A. (2018). YOLOv3: An Incremental Improvement. arXiv."},{"key":"ref_8","unstructured":"Bochkovskiy, A., Wang, C.-Y., and Liao, H.Y.M. (2020). Yolov4: Optimal speed and accuracy of object detection. arXiv."},{"key":"ref_9","unstructured":"Glenn, J. (2020, June 10). YOLOv5 Release v6.1. Available online: https:\/\/github.com\/ultralytics\/yolov5\/releases\/tag\/v6.1."},{"key":"ref_10","unstructured":"Ge, Z., Liu, S., Wang, F., Li, Z., and Sun, J. (2021). Yolox: Exceeding yolo series in 2021. arXiv."},{"key":"ref_11","unstructured":"Xu, S., Wang, X., Lv, W., Chang, Q., Cui, C., Deng, K., Wang, G., Dang, Q., Wei, S., and Du, Y. (2022). Pp-yoloe: An evolved version of yolo. arXiv."},{"key":"ref_12","unstructured":"Wang, C.Y., Bochkovskiy, A., and Liao, H.Y.M. (2022). Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., and Erhan, D. (2016, January 11\u201314). SSD: Single shot multibox detector. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"6549","DOI":"10.1007\/s00521-018-3486-1","article-title":"An enhanced SSD with feature fusion and visual reasoning for object detection","volume":"31","author":"Leng","year":"2019","journal-title":"Neural Comput. Appl."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Shi, W., Bao, S., and Tan, D. (2019). FFESSD: An Accurate and Efficient Single-Shot Detector for Target Detection. Appl. 
Sci., 9.","DOI":"10.3390\/app9204276"},{"key":"ref_16","first-page":"310","article-title":"SSD small target detection algorithm based on deconvolution and feature fusion","volume":"15","author":"Zhao","year":"2020","journal-title":"CAAI Trans. Intell. Syst."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Jeong, J., Park, H., and Kwak, N. (2017, January 4\u20137). Enhancement of SSD by Concatenating Feature Maps for Object Detection. Proceedings of the British Machine Vision Conference, London, UK.","DOI":"10.5244\/C.31.76"},{"key":"ref_18","first-page":"94","article-title":"MDSSD: Multi-scale deconvolutional single shot detector for small objects","volume":"63","author":"Cui","year":"2020","journal-title":"Sci. China (Inf. Sci.)"},{"key":"ref_19","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An Image Is Worth 16 \u00d7 16 Words: Transformers for Image Recognition at Scale 2021. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1016\/j.isprsjprs.2022.06.002","article-title":"Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark","volume":"190","author":"Xu","year":"2022","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Dollar, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature Pyramid Networks for Object Detection. Proceedings of the 2017 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201320). Path Aggregation Network for Instance Segmentation. 
Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Tan, M., Pang, R., and Le, Q.V. (2020, January 14\u201319). EfficientDet: Scalable and Efficient Object Detection. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Ghiasi, G., Lin, T.-Y., and Le, Q.V. (2019, January 16\u201320). NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00720"},{"key":"ref_25","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS\u201917), Long Beach, CA, USA."},{"key":"ref_26","unstructured":"Jacob, D., Ming, C., Kenton, L., and Toutanova, K. (2019, January 2\u20137). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Minneapolis, MN, USA."},{"key":"ref_27","unstructured":"Alec, R., Karthik, N., Tim, S., and Ilya, S. (2018). Improving Language Understanding with Unsupervised Learning. Tech. 
Rep., 4."},{"key":"ref_28","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI Blog"},{"key":"ref_29","unstructured":"Tom, B., Benjamin, M., Nick, R., Melanie, S., Jared, K., Prafulla, D., Arvind, N., Pranav, S., Girish, S., and Amanda, A. (2020, January 6\u201312). Language models are few-shot learners. Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, BC, Canada."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R., Gupta, A., and He, K. (2018, January 18\u201320). Non-Local Neural Networks. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Cao, Y., Xu, J., Lin, S., Wei, F., and Hu, H. (November, January 27). GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCVW.2019.00246"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Bello, I., Zoph, B., Le, Q., Vaswani, A., and Shlens, J. (November, January 27). Attention Augmented Convolutional Networks. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00338"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Yin, M., Yao, Z., Cao, Y., Li, X., Zhang, Z., Lin, S., and Hu, H. (2020, January 23\u201328). Disentangled non-local neural networks. Proceedings of the European Conference on Computer Vision (ECCV), Online.","DOI":"10.1007\/978-3-030-58555-6_12"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019, January 16\u201320). 
Dual Attention Network for Scene Segmentation. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Hu, H., Gu, J., Zhang, Z., Dai, J., and Wei, Y. (2018, January 18\u201320). Relation Networks for Object Detection. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00378"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Gu, J., Hu, H., Wang, L., Wei, Y., and Dai, J. (2018, January 8\u201314). Learning Region Features for Object Detection. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01258-8_24"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23\u201328). End-to-End Object Detection with Transformers. Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK. Proceedings, Part I 16.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_38","unstructured":"Cheng, C., Fangyun, W., and Han, H. (2020, January 6\u201312). Relationnet++: Bridging visual representations for object detection via transformer decoder. Proceedings of the Thirty-Fourth Annual Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada."},{"key":"ref_39","unstructured":"Zhu, X., Su, W., Lu, L., Li, B., Wang, X., and Dai, J. (2021, January 3\u20137). Deformable {detr}: Deformable transformers for end-to-end object detection. Proceedings of the 2021 International Conference on Learning Representations, Online."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Sun, P., Zhang, R., Jiang, Y., Kong, T., Xu, C., Zhan, W., Tomizuka, M., Li, L., Yuan, Z., and Wang, C. (2021, January 19\u201325). 
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Online.","DOI":"10.1109\/CVPR46437.2021.01422"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., and Savarese, S. (2019, January 16\u201320). Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00075"},{"key":"ref_42","unstructured":"Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., and Ren, D. (2020, January 7\u201312). Distance-iou loss: Faster and better learning for bounding box regression. Proceedings of the 2020 AAAI Conference on Artificial Intelligence (AAAI), New York, NY, USA."},{"key":"ref_43","unstructured":"Yang, X., Yan, J., Ming, Q., Wang, W., Zhang, X., and Tian, Q. (2021, January 18\u201324). Rethinking rotated object detection with gaussian Wasserstein distance loss. Proceedings of the 2021 International Conference on Machine Learning (ICML), Online."},{"key":"ref_44","unstructured":"Tan, M., and Le, Q.V. (2021, January 18\u201324). EfficientNetV2: Smaller Models and Faster Training. Proceedings of the 2021 International Conference on Machine Learning (ICML), Online."},{"key":"ref_45","unstructured":"Hu, J., Shen, L., Albanie, S., Sun, G., and Wu, E. (2019, January 16\u201320). Squeeze-and-Excitation Networks 2019. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 19\u201325). Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. 
Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Online.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_47","unstructured":"Mehta, S., and Rastegari, M. (2022). MobileViT: Light-Weight, General-Purpose, and Mobile-Friendly Vision Transformer. arXiv."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition","volume":"37","author":"He","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going Deeper with Convolutions. Proceedings of the 2015 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Wang, C.Y., Liao, H.Y.M., Wu, Y.H., Chen, P.Y., Hsieh, J.W., and Yeh, I.H. (2020, January 14\u201319). CSPNet: A new backbone that can enhance learning capability of CNN. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00203"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The Pascal Visual Object Classes (VOC) Challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"5535","DOI":"10.1109\/TGRS.2019.2900302","article-title":"Hierarchical and Robust Convolutional Neural Network for Very High-Resolution Remote Sensing Object Detection","volume":"57","author":"Zhang","year":"2019","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Wang, J., Yang, W., Guo, H., Zhang, R., and Xia, G.-S. (2021, January 18\u201321). Tiny Object Detection in Aerial Images. Proceedings of the 2021 26th International Conference on Pattern Recognition (ICPR), Taichung, Taiwan.","DOI":"10.1109\/ICPR48806.2021.9413340"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 5\u201312). Microsoft COCO: Common Objects in Context. Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Xia, G.-S., Bai, X., Ding, J., Zhu, Z., Belongie, S., Luo, J., Datcu, M., Pelillo, M., and Zhang, L. (2018, January 18\u201320). DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00418"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"318","DOI":"10.1109\/TPAMI.2018.2858826","article-title":"Focal Loss for Dense Object Detection","volume":"42","author":"Lin","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_57","unstructured":"Li, C., Li, L., Jiang, H., Weng, K., Geng, Y., Li, L., Ke, Z., Li, Q., Cheng, M., and Nie, W. (2022). YOLOv6: A one-stage object detection framework for industrial applications. arXiv."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"7389","DOI":"10.1109\/TIP.2020.3002345","article-title":"FoveaBox: Beyound Anchor-Based Object Detection","volume":"29","author":"Kong","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_59","unstructured":"(2023, April 17). YOLO by Ultralytics (Version 8.0.0). 
Available online: https:\/\/github.com\/ultralytics\/ultralytics."},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Liu, K., Huang, J., and Li, X. (2022). Eagle-Eye-Inspired Attention for Object Detection in Remote Sensing. Remote Sens., 14.","DOI":"10.3390\/rs14071743"},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Li, Y., Chen, Y., Wang, N., and Zhang, Z.-X. (November, January 27). Scale-Aware Trident Networks for Object Detection. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00615"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Yang, Z., Liu, S., Hu, H., Wang, L., and Lin, S. (November, January 27). RepPoints: Point Set Representation for Object Detection. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00975"},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Tian, Z., Shen, C., Chen, H., and He, T. (November, January 27). FCOS: Fully Convolutional One-Stage Object Detection. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00972"},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Zhang, S., Chi, C., Yao, Y., Lei, Z., and Li, S.Z. (2020, January 14\u201319). Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00978"},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Cai, Z., and Vasconcelos, N. (2018, January 18\u201322). Cascade R-CNN: Delving into High Quality Object Detection. 
Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00644"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Qiao, S., Chen, L.-C., and Yuille, A. (2021, January 19\u201325). DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Online.","DOI":"10.1109\/CVPR46437.2021.01008"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/12\/3027\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:51:50Z","timestamp":1760125910000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/12\/3027"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,9]]},"references-count":66,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2023,6]]}},"alternative-id":["rs15123027"],"URL":"https:\/\/doi.org\/10.3390\/rs15123027","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,9]]}}}