{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,27]],"date-time":"2026-01-27T22:37:11Z","timestamp":1769553431752,"version":"3.49.0"},"reference-count":31,"publisher":"MDPI AG","issue":"17","license":[{"start":{"date-parts":[[2021,8,25]],"date-time":"2021-08-25T00:00:00Z","timestamp":1629849600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Modern Educational Technology Research Project","award":["2021-R-89410"],"award-info":[{"award-number":["2021-R-89410"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Deep convolutional neural networks (DCNNs) are driving progress in object detection of high-resolution remote sensing images. Region proposal generation, as one of the key steps in object detection, has also become the focus of research. High-resolution remote sensing images usually contain various sizes of objects and complex background, small objects are easy to miss or be mis-identified in object detection. If the recall rate of region proposal of small objects and multi-scale objects can be improved, it will bring an improvement on the performance of the accuracy in object detection. Spatial attention is the ability to focus on local features in images and can improve the learning efficiency of DCNNs. This study proposes a multi-scale spatial attention region proposal network (MSA-RPN) for high-resolution optical remote sensing imagery. The MSA-RPN is an end-to-end deep learning network with a backbone network of ResNet. It deploys three novel modules to fulfill its task. First, the Scale-specific Feature Gate (SFG) focuses on features of objects by processing multi-scale features extracted from the backbone network. Second, the spatial attention-guided model (SAGM) obtains spatial information of objects from the multi-scale attention maps. 
Third, the Selective Strong Attention Maps Model (SSAMM) adaptively selects sliding windows according to the loss values from the system\u2019s feedback, and sends the windowed samples to the spatial attention decoder. Finally, the candidate regions and their corresponding confidences can be obtained. We evaluate the proposed network in a public dataset LEVIR and compare with several state-of-the-art methods. The proposed MSA-RPN yields a higher recall rate of region proposal generation, especially for small targets in remote sensing images.<\/jats:p>","DOI":"10.3390\/rs13173362","type":"journal-article","created":{"date-parts":[[2021,8,25]],"date-time":"2021-08-25T04:24:27Z","timestamp":1629865467000},"page":"3362","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["A Multi-Scale Spatial Attention Region Proposal Network for High-Resolution Optical Remote Sensing Imagery"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5925-5495","authenticated-orcid":false,"given":"Ruchan","family":"Dong","sequence":"first","affiliation":[{"name":"Jinling Institute of Technology, Nanjing 211169, China"},{"name":"Software Testing Engineering Laboratory of Jiangsu Province, Nanjing 211169, China"}]},{"given":"Licheng","family":"Jiao","sequence":"additional","affiliation":[{"name":"Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education of China, Xidian University, Xi\u2019an 710071, China"}]},{"given":"Yan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Jinling Institute of Technology, Nanjing 211169, China"}]},{"given":"Jin","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics, Xi\u2019an Jiaotong University, Xi\u2019an 710049, China"}]},{"given":"Weiyan","family":"Shen","sequence":"additional","affiliation":[{"name":"Jinling Institute of Technology, Nanjing 211169, 
China"}]}],"member":"1968","published-online":{"date-parts":[[2021,8,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, Faster, Stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"814","DOI":"10.1109\/TPAMI.2015.2465908","article-title":"What Makes for Effective Detection Proposals?","volume":"38","author":"Hosang","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_4","unstructured":"Gu, C., Lim, J.J., Arbelaez, P., and Malik, J. (2009, January 20\u201325). Recognition Using Regions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1007\/s11263-013-0620-5","article-title":"Selective Search for Object Recognition","volume":"104","author":"Uijlings","year":"2013","journal-title":"Int. J. Comput. Vis."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Alexe, B., Deselaers, T., and Ferrari, V. (2010, January 13\u201318). What is an object?. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5540226"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Cheng, M.M., Zhang, Z., Lin, W.Y., and Torr, P. (2014, January 23\u201328). BING: Binarized Normed Gradients for Objectness Estimation at 300fps. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.414"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zitnick, C.L., and Doll\u00e1r, P. (2014). Edge Boxes: Locating Object Proposals from Edges. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-10602-1_26"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Gidaris, S., and Komodakis, N. (2016, January 19\u201322). Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization. Proceedings of the British Machine Vision Conference (BMVC), York, UK.","DOI":"10.5244\/C.30.90"},{"key":"ref_10","first-page":"1990","article-title":"Learning to Segment Object Candidates","volume":"Volume 2","author":"Pinheiro","year":"2015","journal-title":"Proceedings of the International Conference on Neural Information Processing Systems"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Pinheiro, P.O., Lin, T.Y., Collobert, R., and Doll\u00e1r, P. (2016, January 8\u201316). Learning to Refine Object Segments. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_5"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Hu, H., Lan, S., Jiang, Y., Cao, Z., and Sha, F. (2017, January 21\u201326). FastMask: Segment Multi-scale Object Candidates in One Shot. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.245"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.isprsjprs.2018.04.003","article-title":"Multi-scale object detection in remote sensing imagery with convolutional neural networks","volume":"145","author":"A","year":"2018","journal-title":"ISPRS J. Photogramm. 
Remote Sens."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1016\/j.isprsjprs.2018.02.014","article-title":"Multi-class geospatial object detection based on a position-sensitive balancing framework for high spatial resolution remote sensing imagery","volume":"138","author":"Zhong","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"2337","DOI":"10.1109\/TGRS.2017.2778300","article-title":"Rotation-Insensitive and Context-Augmented Object Detection in Remote Sensing Images","volume":"56","author":"Li","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Tang, T., Zhou, S., Deng, Z., Zou, H., and Lei, L. (2017). Vehicle Detection in Aerial Images Based on Region Convolutional Neural Networks and Hard Negative Example Mining. Sensors, 17.","DOI":"10.3390\/s17020336"},{"key":"ref_17","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is All You Need. Proceedings of the International Conference on Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_18","unstructured":"Zagoruyko, S., and Komodakis, N. (2017, January 24\u201316). Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer. Proceedings of the International Conference on Learning Representations, Toulon, France."},{"key":"ref_19","unstructured":"Jaderberg, M., Simonyan, K., Zisserman, A., and Kavukcuoglu, K. (2015, January 7\u201312). Spatial Transformer Networks. Proceedings of the International Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_20","unstructured":"Almahairi, A., Ballas, N., Cooijmans, T., Zheng, Y., Larochelle, H., and Courville, A. (2016, January 19\u201324). Dynamic Capacity Networks. 
Proceedings of the International Conference on Machine Learning, New York City, NY, USA."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"2011","DOI":"10.1109\/TPAMI.2019.2913372","article-title":"Squeeze-and-Excitation Networks","volume":"42","author":"Hu","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H., Wang, X., and Tang, X. (2017, January 21\u201326). Residual Attention Network for Image Classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.683"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 23\u201328). CBAM: Convolutional Block Attention Module. Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Wilms, C., and Frintrop, S. (2018, January 2\u20136). AttentionMask: Attentive, Efficient Object Proposal Generation Focusing on Small Objects. Proceedings of the Asian Conference on Computer Vision, Perth, Australia.","DOI":"10.1007\/978-3-030-20890-5_43"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1100","DOI":"10.1109\/TIP.2017.2773199","article-title":"Random Access Memories: A New Paradigm for Target Detection in High Resolution Aerial Remote Sensing Images","volume":"27","author":"Zou","year":"2017","journal-title":"IEEE Trans. Image Process."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) IEEE, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition","volume":"37","author":"He","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft COCO: Common objects in context. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"8534","DOI":"10.1109\/TGRS.2019.2921396","article-title":"Sig-NMS-Based Faster R-CNN Combining Transfer Learning for Small Target Detection in VHR Optical Remote Sensing Imagery","volume":"57","author":"Dong","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R.B., Guadarrama, S., and Darrell, T. (2014). Caffe: Convolutional Architecture for Fast Feature Embedding. 
arXiv.","DOI":"10.1145\/2647868.2654889"},{"key":"ref_31","first-page":"1097","article-title":"ImageNet Classification with Deep Convolutional Neural Networks","volume":"Volume 60","author":"Krizhevsky","year":"2012","journal-title":"Proceedings of the International Conference on Neural Information Processing Systems"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/17\/3362\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:51:10Z","timestamp":1760165470000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/17\/3362"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,25]]},"references-count":31,"journal-issue":{"issue":"17","published-online":{"date-parts":[[2021,9]]}},"alternative-id":["rs13173362"],"URL":"https:\/\/doi.org\/10.3390\/rs13173362","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,8,25]]}}}