{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T00:33:09Z","timestamp":1774398789187,"version":"3.50.1"},"reference-count":37,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2019,2,8]],"date-time":"2019-02-08T00:00:00Z","timestamp":1549584000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Key Projects of Science and Technology Agency of Guangxi province, China","award":["Guike AA 17129002"],"award-info":[{"award-number":["Guike AA 17129002"]}]},{"name":"National Science and Technology Key Program of China","award":["2013GS500303"],"award-info":[{"award-number":["2013GS500303"]}]},{"name":"Municipal Science and Technology Project of CQMMC, China","award":["2017030502"],"award-info":[{"award-number":["2017030502"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Object detection has attracted increasing attention in the field of remote sensing image analysis. Complex backgrounds, vertical views, and variations in target kind and size in remote sensing images make object detection a challenging task. In this work, considering that the types of objects are often closely related to the scene in which they are located, we propose a convolutional neural network (CNN) by combining scene-contextual information for object detection. Specifically, we put forward the scene-contextual feature pyramid network (SCFPN), which aims to strengthen the relationship between the target and the scene and solve problems resulting from variations in target size. Additionally, to improve the capability of feature extraction, the network is constructed by repeating a building aggregated residual block. 
This block increases the receptive field, which can extract richer information for targets and achieve excellent performance with respect to small object detection. Moreover, to improve the proposed model performance, we use group normalization, which divides the channels into groups and computes the mean and variance for normalization within each group, to solve the limitation of the batch normalization. The proposed method is validated on a public and challenging dataset. The experimental results demonstrate that our proposed method outperforms other state-of-the-art object detection models.<\/jats:p>","DOI":"10.3390\/rs11030339","type":"journal-article","created":{"date-parts":[[2019,2,11]],"date-time":"2019-02-11T03:26:01Z","timestamp":1549855561000},"page":"339","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":62,"title":["Object Detection in Remote Sensing Images Based on a Scene-Contextual Feature Pyramid Network"],"prefix":"10.3390","volume":"11","author":[{"given":"Chaoyue","family":"Chen","sequence":"first","affiliation":[{"name":"Key Lab of Optoelectronic Technology & Systems of Education Ministry, Chongqing University, Chongqing 400044, China"}]},{"given":"Weiguo","family":"Gong","sequence":"additional","affiliation":[{"name":"Key Lab of Optoelectronic Technology & Systems of Education Ministry, Chongqing University, Chongqing 400044, China"}]},{"given":"Yongliang","family":"Chen","sequence":"additional","affiliation":[{"name":"Key Lab of Optoelectronic Technology & Systems of Education Ministry, Chongqing University, Chongqing 400044, China"}]},{"given":"Weihong","family":"Li","sequence":"additional","affiliation":[{"name":"Key Lab of Optoelectronic Technology & Systems of Education Ministry, Chongqing University, Chongqing 400044, China"}]}],"member":"1968","published-online":{"date-parts":[[2019,2,8]]},"reference":[{"key":"ref_1","unstructured":"Krizhevsky, A., Sutskever, 
I., and Hinton, G.E. (2012, January 3\u20136). ImageNet classification with deep convolutional neural networks. Proceedings of the International Conference on Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA."},{"key":"ref_2","unstructured":"Simonyan, K., and Zisserman, A. (arXiv, 2014). Very Deep Convolutional Networks for Large-Scale Image Recognition, arXiv."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Jia, Y.Q., and Sermanet, P. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_4","unstructured":"Dai, J.F., Li, Y., He, K.M., and Sun, J. (arXiv, 2016). R-FCN: Object Detection via Region-based Fully Convolutional Networks, arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"272","DOI":"10.1016\/j.patcog.2017.03.030","article-title":"Locality Constraint Distance Metric Learning for Traffic Congestion Detection","volume":"75","author":"Wang","year":"2018","journal-title":"Pattern Recognition."},{"key":"ref_7","unstructured":"Wang, Q., Chen, M., Nie, F., and Li, X. (2018). Detecting Coherent Groups in Crowd Scenes by Multiview Clustering. 
TPAMI."},{"key":"ref_8","first-page":"1918","article-title":"An Incremental framework for Video-based Traffic Sign Detection, Tracking and Recognition","volume":"18","author":"Yuan","year":"2016","journal-title":"ITSM"},{"key":"ref_9","first-page":"142","article-title":"Region-Based Convolutional Networks for Accurate Object Detection and Segmentation","volume":"38","author":"Girshick","year":"2015","journal-title":"TPAMI"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1007\/s11263-013-0620-5","article-title":"Selective Search for Object Recognition","volume":"104","author":"Uijlings","year":"2013","journal-title":"Int. J. Comput. Vis."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 11\u201318). Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_12","first-page":"1137","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"TPAMI"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Zeng, D., Zhao, F., Ge, S., and Shen, W. (2018). Fast cascade face detection with pyramid network. Pattern Recognit. Lett.","DOI":"10.1016\/j.patrec.2018.05.024"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Fu, C., and Berg, A.C. (2016, January 8\u201316). SSD: Single Shot Multibox Detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (arXiv, 2017). 
Feature pyramid networks for object detection, arXiv.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Vakalopoulou, M., Karantzalos, K., Komodakis, N., and Paragios, N. (2015, January 26\u201331). Building detection in very high resolution multispectral data with deep learning features. Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy.","DOI":"10.1109\/IGARSS.2015.7326158"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Ammour, N., Alhichri, H., Bazi, Y., Benjdira, B., Alajlan, N., and Zuair, M. (2017). Deep Learning Approach for Car Detection in UAV Imagery. Remote Sens., 9.","DOI":"10.3390\/rs9040312"},{"key":"ref_18","first-page":"1","article-title":"A Hierarchical Oil Tank Detector with Deep Surrounding Features for High-Resolution Optical Satellite Imagery","volume":"8","author":"Zhang","year":"2017","journal-title":"IEEE J. STARS"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"2486","DOI":"10.1109\/TGRS.2016.2645610","article-title":"Accurate Object Localization in Remote Sensing Images Based on Convolutional Neural Networks","volume":"55","author":"Long","year":"2017","journal-title":"IEEE Geosci. Remote Sens."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"5553","DOI":"10.1109\/TGRS.2016.2569141","article-title":"Weakly Supervised Learning Based on Coupled Convolutional Neural Networks for Aircraft Detection","volume":"54","author":"Zhang","year":"2016","journal-title":"IEEE Geosci. Remote Sens."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"645","DOI":"10.1109\/TGRS.2016.2612821","article-title":"Convolutional Neural Networks for Large-Scale Remote-Sensing Image Classification","volume":"55","author":"Maggiori","year":"2016","journal-title":"IEEE Geosci. 
Remote Sens."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1998","DOI":"10.1109\/LGRS.2017.2745900","article-title":"Rural Building Detection in High-Resolution Imagery Based on a Two-Stage CNN Model","volume":"14","author":"Sun","year":"2017","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"7405","DOI":"10.1109\/TGRS.2016.2601622","article-title":"Learning Rotation-Invariant Convolutional Neural Networks for Object Detection in VHR Optical Remote Sensing Images","volume":"54","author":"Cheng","year":"2016","journal-title":"IEEE Geosci. Remote Sens."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"461","DOI":"10.5194\/isprs-archives-XLII-1-W1-461-2017","article-title":"Learning Oriented Region-based Convolutional Neural Networks for Building Detection in Satellite Remote Sensing Images","volume":"XLII-1\/W1","author":"Chen","year":"2017","journal-title":"Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.isprsjprs.2018.04.003","article-title":"Multi-scale object detection in remote sensing imagery with convolutional neural networks","volume":"145","author":"Deng","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Guo, W., Yang, W., Zhang, H., and Hua, G. (2018). Geospatial Object Detection in High Resolution Satellite Images Based on Multi-Scale Convolutional Neural Network. Remote Sens., 10.","DOI":"10.3390\/rs10010131"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Yang, X., Sun, H., Fu, K., Yang, J., Sun, X., Yan, M., and Guo, Z. (2018). Automatic Ship Detection in Remote Sensing Images from Google Earth of Complex Scenes Based on Multiscale Rotation Dense Feature Pyramid Networks. 
Remote Sens., 10.","DOI":"10.3390\/rs10010132"},{"key":"ref_28","first-page":"1","article-title":"Semantic Segmentation for high spatial resolution remote sensing images based on convolution neural network and pyramid pooling module","volume":"99","author":"Yu","year":"2018","journal-title":"IEEE J. STARS"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Xia, G.S., Bai, X., Ding, J., Zhu, Z., Belongie, S., Luo, J., Datcu, M., Pelillo, M., and Zhang, L. (arXiv, 2017). DOTA: A Large-scale Dataset for Object Detection in Aerial Images, arXiv.","DOI":"10.1109\/CVPR.2018.00418"},{"key":"ref_30","first-page":"1-1","article-title":"Mask R-CNN","volume":"99","author":"He","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Xie, S., Girshick, R., Dollar, P., Tu, Z., and He, K. (arXiv, 2017). Aggregated Residual Transformations for Deep Neural Networks, arXiv.","DOI":"10.1109\/CVPR.2017.634"},{"key":"ref_32","unstructured":"Yu, F., and Koltun, V. (arXiv, 2015). Multi-Scale Context Aggregation by Dilated Convolutions, arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 26\u201330). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_34","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 6\u201311). Batch normalization: accelerating deep network training by reducing internal covariate shift. Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_35","unstructured":"Wu, Y.X., and He, K.M. (arXiv, 2018). 
Group Normalization, arXiv."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"39401","DOI":"10.1109\/ACCESS.2018.2856088","article-title":"An end-to-end neural network for road extraction from remote sensing imagery by multiple feature pyramid network","volume":"6","author":"Gao","year":"2018","journal-title":"IEEE Access."},{"key":"ref_37","unstructured":"Girshick, R., Radosavovic, I., Gkioxari, G., Dollar, P., and He, K. (2018, January 22). Detectron. Available online: https:\/\/github.com\/facebookresearch\/detectron."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/11\/3\/339\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:30:44Z","timestamp":1760185844000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/11\/3\/339"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,2,8]]},"references-count":37,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2019,2]]}},"alternative-id":["rs11030339"],"URL":"https:\/\/doi.org\/10.3390\/rs11030339","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,2,8]]}}}