{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T01:43:29Z","timestamp":1760233409930,"version":"build-2065373602"},"reference-count":49,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2021,1,6]],"date-time":"2021-01-06T00:00:00Z","timestamp":1609891200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61725105","41701508"],"award-info":[{"award-number":["61725105","41701508"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>As a precursor step for computer vision algorithms, object detection plays an important role in various practical application scenarios. As the objects to be detected become more complex, the problem of multi-scale object detection has attracted increasing attention, especially in the field of remote sensing detection. Early convolutional neural network detection algorithms mostly rely on manually preset anchor boxes to divide the image into regions and then obtain the prior position of the target. However, anchor boxes are difficult to set reasonably and cause a large amount of computational redundancy, which affects the generality of a detection model obtained under fixed parameters. In the past two years, anchor-free detection algorithms have achieved remarkable progress in detection on natural images. However, there has not been sufficient research on how to handle multi-scale detection more effectively in an anchor-free framework and how to use these detectors on remote sensing images. 
In this paper, we propose a specific-attention Feature Pyramid Network (FPN) module, which generates a feature pyramid based on the characteristics of objects of various sizes. This pyramid is better suited to multi-scale object detection. In addition, we propose a scale-aware detection head that contains a multi-receptive feature fusion module and a size-based feature compensation module. The new anchor-free detector can obtain a more effective multi-scale feature expression. Experiments on challenging datasets show that our approach performs favorably against other methods in multi-scale object detection.<\/jats:p>","DOI":"10.3390\/rs13020160","type":"journal-article","created":{"date-parts":[[2021,1,6]],"date-time":"2021-01-06T20:45:42Z","timestamp":1609965942000},"page":"160","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["AF-EMS Detector: Improve the Multi-Scale Detection Performance of the Anchor-Free Detector"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1453-3732","authenticated-orcid":false,"given":"Jiangqiao","family":"Yan","sequence":"first","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"University of Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100190, China"}]},{"given":"Liangjin","family":"Zhao","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Network Information System Technology (NIST), Aerospace 
Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"}]},{"given":"Wenhui","family":"Diao","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"}]},{"given":"Hongqi","family":"Wang","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"University of Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100190, China"}]},{"given":"Xian","family":"Sun","sequence":"additional","affiliation":[{"name":"Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"University of Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100190, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,1,6]]},"reference":[{"key":"ref_1","unstructured":"Simonyan, K., and Zisserman, A. (2015, January 7\u20139). Very Deep Convolutional Networks for Large-Scale Image Recognition. Proceedings of the 2015 International Conference on Learning Representations (ICLR), Santiago, Chile."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Zagoruyko, S., and Komodakis, N. (2016). 
Wide Residual Networks. Comput. Vis. Pattern Recognit.","DOI":"10.5244\/C.30.87"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Xie, S., Girshick, R., Doll\u00e1r, P., Tu, Z., and He, K. (2017, January 21\u201326). Aggregated Residual Transformations for Deep Neural Networks. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.634"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1007\/s11263-019-01247-4","article-title":"Deep Learning for Generic Object Detection: A Survey","volume":"128","author":"Liu","year":"2020","journal-title":"Int. J. Comput. Vis."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1016\/j.isprsjprs.2020.01.025","article-title":"Rotation-aware and multi-scale convolutional neural network for object detection in remote sensing images","volume":"161","author":"Kun","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Dai, J., Qi, H., Xiong, Y., Yi, L., and Wei, Y. (2017, January 22\u201329). Deformable Convolutional Networks. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.89"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature Pyramid Networks for Object Detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zhu, X., Hu, H., Lin, S., and Dai, J. (2019, January 16\u201320). Deformable ConvNets v2: More Deformable, Better Results. 
Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00953"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"50839","DOI":"10.1109\/ACCESS.2018.2869884","article-title":"Position Detection and Direction Prediction for Arbitrary-Oriented Ships via Multitask Rotation Region Convolutional Neural Network","volume":"6","author":"Yang","year":"2018","journal-title":"IEEE Access"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Yang, X., Liu, Q., Yan, J., and Li, A. (2021, January 2\u20139). R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object. Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21), Virtual Conference, Available online: https:\/\/aaai.org\/Conferences\/AAAI-21\/.","DOI":"10.1609\/aaai.v35i4.16426"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Prasomphan, S., Tathong, T., and Charoenprateepkit, P. (2019, January 22\u201324). Traffic Sign Detection for Panoramic Images Using Convolution Neural Network Technique. Proceedings of the 2019 3rd High Performance Computing and Cluster Technologies Conference, Guangzhou, China.","DOI":"10.1145\/3341069.3341090"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (July, January 26). You Only Look Once: Unified, Real-Time Object Detection. 
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., and Berg, A.C. (2016, January 8\u201316). SSD: Single Shot MultiBox Detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Tian, Z., Shen, C., Chen, H., and He, T. (November, January 27). FCOS: Fully Convolutional One-Stage Object Detection. Proceedings of the 2019 IEEE International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00972"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Yang, Z., Liu, S., Hu, H., Wang, L., and Lin, S. (November, January 27). RepPoints: Point Set Representation for Object Detection. Proceedings of the 2019 IEEE International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00975"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Singh, B., and Davis, L.S. (2018, January 18\u201322). An Analysis of Scale Invariance in Object Detection-SNIP. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00377"},{"key":"ref_18","unstructured":"Singh, B., Najibi, M., and Davis, L.S. (2018, January 2\u20138). SNIPER: Efficient Multi-Scale Training. 
Proceedings of the Thirty-second Conference on Neural Information Processing Systems (NeurIPS 2018), Montr\u00e9al, QC, Canada."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"80622","DOI":"10.1109\/ACCESS.2019.2923016","article-title":"SSD-MSN: An Improved Multi-Scale Object Detection Network Based on SSD","volume":"7","author":"Chen","year":"2019","journal-title":"IEEE Access"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"57552","DOI":"10.1109\/ACCESS.2020.2982658","article-title":"Adaptive Anchor Networks for Multi-Scale Object Detection in Remote Sensing Images","volume":"8","author":"Zhang","year":"2020","journal-title":"IEEE Access"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhang, S., Zhu, X., Lei, Z., Shi, H., Wang, X., and Li, S.Z. (2017, January 22\u201329). S3FD: Single Shot Scale-invariant Face Detector. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.30"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Li, J., Wang, Y., Wang, C., Tai, Y., Qian, J., Yang, J., and Huang, F. (2019, January 16\u201320). DSFD: Dual Shot Face Detector. Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00520"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Yang, X., Yang, J., Yan, J., Zhang, Y., Zhang, T., Guo, Z., Xian, S., and Fu, K. (November, January 27). SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects. 
Proceedings of the 2019 IEEE International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00832"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Yan, J., Wang, H., Yan, M., Diao, W., Sun, X., and Li, H. (2019). IoU-Adaptive Deformable R-CNN: Make Full Use of IoU for Multi-Class Object Detection in Remote Sensing Imagery. Remote Sens., 11.","DOI":"10.3390\/rs11030286"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Tychsen-Smith, L., and Petersson, L. (2017, January 22\u201329). DeNet: Scalable Real-Time Object Detection with Directed Sparse Sampling. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.54"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Law, H., and Deng, J. (2018, January 8\u201314). CornerNet: Detecting Objects as Paired Keypoints. Proceedings of the European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01264-9_45"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., and Tian, Q. (November, January 27). CenterNet: Keypoint Triplets for Object Detection. Proceedings of the 2019 IEEE International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00667"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Wang, J., Chen, K., Yang, S., Loy, C.C., and Lin, D. (2019, January 16\u201320). Region Proposal by Guided Anchoring. Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00308"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Lowe, D.G. (1999, January 20\u201327). Object recognition from local scale-invariant features. 
Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece.","DOI":"10.1109\/ICCV.1999.790410"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"ImageNet Classification with Deep Convolutional Neural Networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. ACM"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Laurens, V.D.M., and Weinberger, K.Q. (2017, January 21\u201326). Densely Connected Convolutional Networks. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. (2016, January 2\u20134). Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. Proceedings of the 2016 International Conference on Learning Representations (ICLR), San Juan, Puerto Rico.","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2016, January 27\u201330). Rethinking the Inception Architecture for Computer Vision. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.308"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Li, Y., Chen, Y., Wang, N., and Zhang, Z. (November, January 27). Scale-Aware Trident Networks for Object Detection. 
Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00615"},{"key":"ref_37","unstructured":"Gao, S., Cheng, M.M., Zhao, K., Zhang, X.Y., and Torr, P.H.S. (2019). Res2Net: A New Multi-scale Backbone Architecture. IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_38","unstructured":"Zhao, Q., Sheng, T., Wang, Y., Tang, Z., Cai, L., Chen, Y., and Ling, H. (February, January 27). M2Det: A Single-Shot Object detector based on Multi-Level Feature Pyramid Network. Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), Honolulu, HI, USA."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Pang, J., Chen, K., Shi, J., Feng, H., Ouyang, W., and Lin, D. (2019, January 16\u201320). Libra R-CNN: Towards Balanced Learning for Object Detection. Proceedings of the 2019 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00091"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201322). Path Aggregation Network for Instance Segmentation. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Yan, J., Zhang, Y., Chang, Z., Zhang, T., and Sun, X. (2020, January 7\u201312). FAS-Net: Construct Effective Features Adaptively for Multi-Scale Object Detection. Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6947"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., Albanie, S., Sun, G., and Wu, E. (2018, January 18\u201322). Squeeze-and-Excitation Networks. 
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Cao, Y., Xu, J., Lin, S., Wei, F., and Hu, H. (November, January 27). GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond. Proceedings of the 2019 IEEE International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCVW.2019.00246"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 8\u201314). CBAM: Convolutional Block Attention Module. Proceedings of the European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal Loss for Dense Object Detection. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Xia, G., Bai, X., Ding, J., Zhu, Z., Belongie, S.J., Luo, J., Datcu, M., Pelillo, M., and Zhang, L. (2018, January 18\u201322). DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00418"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The Pascal Visual Object Classes (VOC) Challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Lin, T., Maire, M., Belongie, S.J., Hays, J., Perona, P., Ramanan, D., Dollar, P., and Zitnick, C.L. (2014). Microsoft COCO: Common Objects in Context. 
European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Azimi, S.M., Vig, E., Bahmanyar, R., Korner, M., and Reinartz, P. (2018, January 2\u20136). Towards Multi-class Object Detection in Unconstrained Remote Sensing Imagery. Proceedings of the 14th Asian Conference on Computer Vision, Perth, Australia.","DOI":"10.1007\/978-3-030-20893-6_10"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/2\/160\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:07:24Z","timestamp":1760159244000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/2\/160"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,6]]},"references-count":49,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2021,1]]}},"alternative-id":["rs13020160"],"URL":"https:\/\/doi.org\/10.3390\/rs13020160","relation":{},"ISSN":["2072-4292"],"issn-type":[{"type":"electronic","value":"2072-4292"}],"subject":[],"published":{"date-parts":[[2021,1,6]]}}}