{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,15]],"date-time":"2026-06-15T04:32:05Z","timestamp":1781497925097,"version":"3.54.1"},"reference-count":50,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2021,2,25]],"date-time":"2021-02-25T00:00:00Z","timestamp":1614211200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2019YFC1510905"],"award-info":[{"award-number":["2019YFC1510905"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Air Force Equipment Pre-research Project","award":["303020401"],"award-info":[{"award-number":["303020401"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Object detection in optical remote sensing images (ORSIs) remains a difficult task because ORSIs have specific characteristics such as scale differences between classes, numerous instances in one image, and complex background texture. To address these problems, we propose a new Multi-Feature Pyramid Network (MFPNet) with a Receptive Field Block (RFB) that integrates both local and global features to detect scattered objects and targets with scale differences in ORSIs. We build a Multi-Feature Pyramid Module (M-FPM) with two cascaded convolution pyramids as the main structure of MFPNet, which handles objects of different scales effectively. The RFB is designed to construct local context information, which makes the network better suited to detecting objects against complex backgrounds. Asymmetric convolution kernels are introduced into the RFB to improve its feature extraction ability by adding nonlinear transformations. 
Then, a two-step detection network is constructed that combines the M-FPM and RFB to obtain more accurate results. A comprehensive evaluation on two publicly available remote sensing datasets, Levir and DIOR, shows that our method outperforms state-of-the-art networks by about 1.3% mAP on the Levir dataset and 4.1% mAP on the DIOR dataset. These experimental results demonstrate the effectiveness of our method for ORSIs of complex environments.<\/jats:p>","DOI":"10.3390\/rs13050862","type":"journal-article","created":{"date-parts":[[2021,2,26]],"date-time":"2021-02-26T04:36:24Z","timestamp":1614314184000},"page":"862","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":44,"title":["Object Detection in Remote Sensing Images via Multi-Feature Pyramid Network with Receptive Field Block"],"prefix":"10.3390","volume":"13","author":[{"given":"Zhichao","family":"Yuan","sequence":"first","affiliation":[{"name":"Image Processing Center, School of Astronautics, Beihang University, Beijing 100191, China"},{"name":"Beijing Key Laboratory of Digital Media, Beihang University, Beijing 100191, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ziming","family":"Liu","sequence":"additional","affiliation":[{"name":"Image Processing Center, School of Astronautics, Beihang University, Beijing 100191, China"},{"name":"Beijing Key Laboratory of Digital Media, Beihang University, Beijing 100191, China"},{"name":"Department of Computing, The Hong Kong Polytechnic University, Hong Kong 999077, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chunbo","family":"Zhu","sequence":"additional","affiliation":[{"name":"Image Processing Center, School of Astronautics, Beihang University, Beijing 100191, China"},{"name":"Beijing Key Laboratory of Digital Media, Beihang University, Beijing 100191, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jing","family":"Qi","sequence":"additional","affiliation":[{"name":"DFH Satellite Co., Ltd., Beijing 100094, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6701-0471","authenticated-orcid":false,"given":"Danpei","family":"Zhao","sequence":"additional","affiliation":[{"name":"Image Processing Center, School of Astronautics, Beihang University, Beijing 100191, China"},{"name":"Beijing Key Laboratory of Digital Media, Beihang University, Beijing 100191, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2021,2,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Lin, T., Goyal, P., Girshick, R.B., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal Loss for Dense Object Detection. Proceedings of the IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.E., Fu, C., and Berg, A.C. (2016, January 11\u201314). SSD: Single Shot MultiBox Detector. Proceedings of the Computer Vision-ECCV 2016-14th European Conference, Amsterdam, The Netherlands. Proceedings, Part I.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S.K., Girshick, R.B., and Farhadi, A. (2016, January 27\u201330). You Only Look Once: Unified, Real-Time Object Detection. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Zhang, S., Wen, L., Bian, X., Lei, Z., and Li, S.Z. (2018, January 18\u201322). Single-Shot Refinement Neural Network for Object Detection. 
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00442"},{"key":"ref_5","unstructured":"Dai, J., Li, Y., He, K., and Sun, J. (2016, January 5\u201310). R-FCN: Object Detection via Region-based Fully Convolutional Networks. Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, Barcelona, Spain."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Girshick, R.B. (2015, January 7\u201313). Fast R-CNN. Proceedings of the 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Girshick, R.B., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R.B. (2017, January 22\u201329). Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Lin, T., Doll\u00e1r, P., Girshick, R.B., He, K., Hariharan, B., and Belongie, S.J. (2017, January 21\u201326). Feature Pyramid Networks for Object Detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_10","unstructured":"Ren, S., He, K., Girshick, R.B., and Sun, J. (2015, January 7\u201312). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. 
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, Montreal, QC, Canada."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1007\/s11263-014-0733-5","article-title":"The Pascal Visual Object Classes Challenge: A Retrospective","volume":"111","author":"Everingham","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The Pascal Visual Object Classes (VOC) Challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Lin, T., Maire, M., Belongie, S.J., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft COCO: Common Objects in Context. Proceedings of the Computer Vision-ECCV 2014-13th European Conference, Zurich, Switzerland. Proceedings, Part V.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., and Tian, Q. (2019). CenterNet: Keypoint Triplets for Object Detection. arXiv.","DOI":"10.1109\/ICCV.2019.00667"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Tian, Z., Shen, C., Chen, H., and He, T. (2019). FCOS: Fully Convolutional One-Stage Object Detection. arXiv.","DOI":"10.1109\/ICCV.2019.00972"},{"key":"ref_16","unstructured":"Zhou, X., Wang, D., and Kr\u00e4henb\u00fchl, P. (2019). Objects as Points. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Bhagavatula, C., Zhu, C., Luu, K., and Savvides, M. (2017, January 22\u201329). Faster than Real-Time Facial Alignment: A 3D Spatial Transformer Network Approach in Unconstrained Poses. 
Proceedings of the IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy.","DOI":"10.1109\/ICCV.2017.429"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Zheng, Y., Pal, D.K., and Savvides, M. (2018, January 18\u201322). Ring Loss: Convex Feature Normalization for Face Recognition. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00534"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"He, T., Shen, C., Tian, Z., Gong, D., Sun, C., and Yan, Y. (2019, January 16\u201320). Knowledge Adaptation for Efficient Semantic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00067"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Liu, Y., Chen, K., Liu, C., Qin, Z., Luo, Z., and Wang, J. (2019, January 16\u201320). Structured Knowledge Distillation for Semantic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00271"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Tian, Z., He, T., Shen, C., and Yan, Y. (2019, January 16\u201320). Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00324"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Liang, X., Wang, T., Yang, L., and Xing, E.P. (2018, January 8\u201314). CIRL: Controllable Imitative Reinforcement Learning for Vision-Based Self-driving. Proceedings of the Computer Vision-ECCV 2018-15th European Conference, Munich, Germany. 
Proceedings, Part VII.","DOI":"10.1007\/978-3-030-01234-2_36"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Wang, D., Devin, C., Cai, Q., Yu, F., and Darrell, T. (2019, January 20\u201324). Deep Object-Centric Policies for Autonomous Driving. Proceedings of the International Conference on Robotics and Automation, ICRA 2019, Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8794224"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1100","DOI":"10.1109\/TIP.2017.2773199","article-title":"Random Access Memories: A New Paradigm for Target Detection in High Resolution Aerial Remote Sensing Images","volume":"27","author":"Zou","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Xia, G., Bai, X., Ding, J., Zhu, Z., Belongie, S.J., Luo, J., Datcu, M., Pelillo, M., and Zhang, L. (2018, January 18\u201322). DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00418"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Li, K., Wan, G., Cheng, G., Meng, L., and Han, J. (2019). Object Detection in Optical Remote Sensing Images: A Survey and A New Benchmark. arXiv.","DOI":"10.1016\/j.isprsjprs.2019.11.023"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1109\/TIP.2018.2867198","article-title":"Learning Rotation-Invariant and Fisher Discriminative Convolutional Neural Networks for Object Detection","volume":"28","author":"Cheng","year":"2019","journal-title":"IEEE Trans. Image Process."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"7405","DOI":"10.1109\/TGRS.2016.2601622","article-title":"Learning Rotation-Invariant Convolutional Neural Networks for Object Detection in VHR Optical Remote Sensing Images","volume":"54","author":"Cheng","year":"2016","journal-title":"IEEE Trans. 
Geosci. Remote Sens."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"2337","DOI":"10.1109\/TGRS.2017.2778300","article-title":"Rotation-Insensitive and Context-Augmented Object Detection in Remote Sensing Images","volume":"56","author":"Li","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_30","unstructured":"Liu, L., Pan, Z., and Lei, B. (2017). Learning a Rotation Invariant Detector with Rotatable Bounding Box. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"937","DOI":"10.1109\/LGRS.2018.2813094","article-title":"Arbitrary-Oriented Ship Detection Framework in Optical Remote-Sensing Images","volume":"15","author":"Liu","year":"2018","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Tang, T., Zhou, S., Deng, Z., Lei, L., and Zou, H. (2017). Arbitrary-Oriented Vehicle Detection in Aerial Imagery with Single Convolutional Neural Networks. Remote Sens., 9.","DOI":"10.3390\/rs9111170"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Cai, Z., and Vasconcelos, N. (2018, January 18\u201322). Cascade R-CNN: Delving Into High Quality Object Detection. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00644"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201322). Path Aggregation Network for Instance Segmentation. Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Law, H., and Deng, J. (2018, January 8\u201314). CornerNet: Detecting Objects as Paired Keypoints. Proceedings of the Computer Vision-ECCV 2018-15th European Conference, Munich, Germany. 
Proceedings, Part XIV.","DOI":"10.1007\/978-3-030-01264-9_45"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"791","DOI":"10.1109\/LGRS.2018.2882778","article-title":"Detection of Multiclass Objects in Optical Remote Sensing Images","volume":"16","author":"Liu","year":"2019","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, Faster, Stronger. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Han, X., Zhong, Y., and Zhang, L. (2017). An Efficient and Robust Integrated Geospatial Object Detection Framework for High Spatial Resolution Remote Sensing Imagery. Remote Sens., 9.","DOI":"10.3390\/rs9070666"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zhong, J., Lei, T., and Yao, G. (2017). Robust Vehicle Detection in Aerial Images Based on Cascaded Convolutional Neural Networks. Sensors, 17.","DOI":"10.3390\/s17122720"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Tang, T., Zhou, S., Deng, Z., Zou, H., and Lei, L. (2017). Vehicle Detection in Aerial Images Based on Region Convolutional Neural Networks and Hard Negative Example Mining. Sensors, 17.","DOI":"10.3390\/s17020336"},{"key":"ref_41","unstructured":"Fu, C., Liu, W., Ranga, A., Tyagi, A., and Berg, A.C. (2017). DSSD: Deconvolutional Single Shot Detector. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Zhao, Q., Sheng, T., Wang, Y., Tang, Z., Chen, Y., Cai, L., and Ling, H. (2019, January 27\u2013February 1). M2Det: A Single-Shot Object Detector Based on Multi-Level Feature Pyramid Network. 
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, Honolulu, HI, USA.","DOI":"10.1609\/aaai.v33i01.33019259"},{"key":"ref_43","unstructured":"Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A.A. (2017, January 4\u20139). Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., and Wei, Y. (2017, January 22\u201329). Deformable Convolutional Networks. Proceedings of the IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy. IEEE Computer Society.","DOI":"10.1109\/ICCV.2017.89"},{"key":"ref_45","unstructured":"Chen, L., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Liu, S., Huang, D., and Wang, Y. (2018, January 8\u201314). Receptive Field Block Net for Accurate and Fast Object Detection. Proceedings of the Computer Vision-ECCV 2018-15th European Conference, Munich, Germany. Proceedings, Part XI.","DOI":"10.1007\/978-3-030-01252-6_24"},{"key":"ref_47","unstructured":"Redmon, J., and Farhadi, A. (2018). YOLOv3: An Incremental Improvement. arXiv."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015, January 7\u201313). Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. Proceedings of the 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.123"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. 
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"4518","DOI":"10.1109\/JSTARS.2020.3015049","article-title":"A Contextual Bidirectional Enhancement Method for Remote Sensing Image Object Detection","volume":"13","author":"Zhang","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/5\/862\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:28:49Z","timestamp":1760160529000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/5\/862"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,25]]},"references-count":50,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2021,3]]}},"alternative-id":["rs13050862"],"URL":"https:\/\/doi.org\/10.3390\/rs13050862","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,2,25]]}}}