{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T19:58:07Z","timestamp":1775937487520,"version":"3.50.1"},"reference-count":44,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2023,3,13]],"date-time":"2023-03-13T00:00:00Z","timestamp":1678665600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["42001408"],"award-info":[{"award-number":["42001408"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61806097"],"award-info":[{"award-number":["61806097"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["2022YFE0204600"],"award-info":[{"award-number":["2022YFE0204600"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Key R&amp;D Program of China","award":["42001408"],"award-info":[{"award-number":["42001408"]}]},{"name":"National Key R&amp;D Program of China","award":["61806097"],"award-info":[{"award-number":["61806097"]}]},{"name":"National Key R&amp;D Program of China","award":["2022YFE0204600"],"award-info":[{"award-number":["2022YFE0204600"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Remote sensing object detection is a difficult task because it often requires real-time feedback through numerous objects in complex environments. 
In object detection, Feature Pyramid Networks (FPN) have been widely used to obtain better representations for the multi-scale problem. However, multi-level features make detectors\u2019 structures complex and introduce redundant calculations that slow down the detector. This paper uses a single-layer feature to make detection lightweight and accurate without relying on feature pyramid structures. We propose a method called the Cross Stage Partial Strengthen Matching Detector (StrMCsDet). StrMCsDet generates a single-level feature map in the backbone with a cross stage partial network. As an alternative to the traditional feature pyramid, a multi-scale encoder is designed to compensate for the limited receptive field. Additionally, a stronger matching strategy is proposed so that anchors of various scales are matched equally. StrMCsDet differs from the conventional full pyramid structure and fully exploits the feature map processed by the multi-scale encoder. The method achieves both comparable precision and speed for practical applications. Experiments on the DIOR dataset and the NWPU-VHR-10 dataset achieved 65.6 and 73.5 mAP, respectively, on a 1080 Ti, matching the performance of state-of-the-art works. 
Moreover, StrMCsDet requires less computation and achieved 38.5 FPS on the DIOR dataset.<\/jats:p>","DOI":"10.3390\/rs15061574","type":"journal-article","created":{"date-parts":[[2023,3,14]],"date-time":"2023-03-14T02:33:36Z","timestamp":1678761216000},"page":"1574","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["A Cross Stage Partial Network with Strengthen Matching Detector for Remote Sensing Object Detection"],"prefix":"10.3390","volume":"15","author":[{"given":"Shougang","family":"Ren","sequence":"first","affiliation":[{"name":"College of Artificial Intelligence, Nanjing Agricultural University, Nanjing 210095, China"}]},{"given":"Zhiruo","family":"Fang","sequence":"additional","affiliation":[{"name":"College of Artificial Intelligence, Nanjing Agricultural University, Nanjing 210095, China"}]},{"given":"Xingjian","family":"Gu","sequence":"additional","affiliation":[{"name":"College of Artificial Intelligence, Nanjing Agricultural University, Nanjing 210095, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,3,13]]},"reference":[{"key":"ref_1","first-page":"5","article-title":"Support vector machines for classification and regression","volume":"14","author":"Gunn","year":"1998","journal-title":"ISIS Tech. Rep."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_4","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2016). 
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv, Available online: http:\/\/arxiv.org\/abs\/1506.01497."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE international Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 11\u201314). Ssd: Single shot multibox detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201326). YOLO9000: Better, faster, stronger. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_9","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv."},{"key":"ref_10","unstructured":"Bochkovskiy, A., Wang, C.-Y., and Liao, H.-J.M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv, Available online: http:\/\/arxiv.org\/abs\/2004.10934."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Wang, C.-Y., Bochkovskiy, A., and Liao, H.-Y.M. (2021). Scaled-YOLOv4: Scaling Cross Stage Partial Network. 
arXiv, Available online: http:\/\/arxiv.org\/abs\/2011.08036.","DOI":"10.1109\/CVPR46437.2021.01283"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Luo, X., Wu, Y., and Wang, F. (2022). Target Detection Method of UAV Aerial Imagery Based on Improved YOLOv5. Remote Sens., 14.","DOI":"10.3390\/rs14195063"},{"key":"ref_13","unstructured":"Wang, C.-Y., Bochkovskiy, A., and Liao, H.-Y.M. (2022). YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv, Available online: http:\/\/arxiv.org\/abs\/2207.02696."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Dollar, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature Pyramid Networks for Object Detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Chen, Q., Wang, Y., Yang, T., Zhang, X., Cheng, J., and Sun, J. (2021, January 20\u201325). You only look one-level feature. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01284"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Law, H., and Deng, J. (2018, January 8\u201314). 
Cornernet: Detecting objects as paired keypoints. Proceedings of the European conference on computer vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01264-9_45"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., and Tian, Q. (November, January 27). CenterNet: Keypoint Triplets for Object Detection. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00667"},{"key":"ref_20","unstructured":"Zhu, X., Su, W., Lu, L., Li, B., Wang, X., and Dai, J. (2021). Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv, Available online: http:\/\/arxiv.org\/abs\/2010.04159."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020). End-to-End Object Detection with Transformers. arXiv, Available online: http:\/\/arxiv.org\/abs\/2005.12872.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_22","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2021). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv, Available online: http:\/\/arxiv.org\/abs\/2010.11929."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1556","DOI":"10.1109\/TIP.2020.3045636","article-title":"A global-local self-adaptive network for drone-view object detection","volume":"30","author":"Deng","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Jiang, N., Yu, X., Peng, X., Gong, Y., and Han, Z. (2021, January 6\u201311). SM+: Refined scale match for tiny person detection. 
Proceedings of the ICASSP 2021\u20132021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada.","DOI":"10.1109\/ICASSP39728.2021.9414162"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Li, C., Yang, T., Zhu, S., Chen, C., and Guan, S. (2020, January 14\u201319). Density map guided object detection in aerial images. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00103"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"9542","DOI":"10.1080\/01431161.2021.1995071","article-title":"Dual-det: A fast detector for oriented object detection in aerial images","volume":"42","author":"Guan","year":"2021","journal-title":"Int. J. Remote Sens."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1016\/j.isprsjprs.2019.11.023","article-title":"Object detection in optical remote sensing images: A survey and a new benchmark","volume":"159","author":"Li","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1016\/j.isprsjprs.2016.03.014","article-title":"A survey on object detection in optical remote sensing images","volume":"117","author":"Cheng","year":"2016","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Wang, C.-Y., Liao, H.-Y.M., Wu, Y.-H., Chen, P.-Y., Hsieh, J.-W., and Yeh, I.-H. (2020, January 14\u201319). CSPNet: A new backbone that can enhance learning capability of CNN. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00203"},{"key":"ref_30","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). 
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv, Available online: http:\/\/arxiv.org\/abs\/1704.04861."},{"key":"ref_31","first-page":"1","article-title":"A new lightweight network based on MobileNetV3","volume":"16","author":"Zhao","year":"2022","journal-title":"KSII Trans. Internet Inf. Syst."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"3885","DOI":"10.1016\/j.apt.2021.08.038","article-title":"Efficient image segmentation based on deep learning for mineral image classification","volume":"32","author":"Liu","year":"2021","journal-title":"Adv. Powder Technol."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1398","DOI":"10.1111\/mice.12674","article-title":"Cross-scene pavement distress detection by a novel transfer learning framework","volume":"36","author":"Li","year":"2021","journal-title":"Comput. Civ. Infrastruct. Eng."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1086140","DOI":"10.3389\/fmars.2022.1086140","article-title":"Multi-scale ship target detection using SAR images based on improved Yolov5","volume":"9","author":"Yasir","year":"2023","journal-title":"Front. Mar. Sci."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Zhou, W., Guo, Q., Lei, J., Yu, L., and Hwang, J.-N. (2021). IRFR-Net: Interactive Recursive Feature-Reshaping Network for Detecting Salient Objects in RGB-D Images. IEEE Trans. Neural Netw. Learn. Syst., 1\u201313.","DOI":"10.1109\/TNNLS.2021.3105484"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/LGRS.2022.3229556","article-title":"RMCHN: A Residual Modular Cascaded Heterogeneous Network for Noise Suppression in DAS-VSP Records","volume":"20","author":"Zhong","year":"2023","journal-title":"IEEE Geosci. Remote Sens. 
Lett."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"107685","DOI":"10.1016\/j.compeleceng.2022.107685","article-title":"An improved 3D point cloud instance segmentation method for overhead catenary height detection","volume":"98","author":"Zong","year":"2022","journal-title":"Comput. Electr. Eng."},{"key":"ref_38","first-page":"23","article-title":"Container Ship Cell Guide Accuracy Check Technology Based on Improved 3D Point Cloud Instance Segmentation","volume":"73","author":"Zong","year":"2022","journal-title":"Brodogr. Teor. Praksa Brodogr. Pomor. Teh."},{"key":"ref_39","unstructured":"Tian, Z., Shen, C., Chen, H., and He, T. (November, January 27). Fcos: Fully convolutional one-stage object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_40","first-page":"9259","article-title":"M2Det: A Single-Shot Object Detector Based on Multi-Level Feature Pyramid Network","volume":"33","author":"Zhao","year":"2019","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Huang, W., Li, G., Chen, Q., Ju, M., and Qu, J. (2021). CF2PN: A Cross-Scale Feature Fusion Pyramid Network Based Remote Sensing Target Detection. Remote Sens., 13.","DOI":"10.3390\/rs13050847"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201322). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.-Y., and Kweon, I.S. (2018, January 8\u201314). Cbam: Convolutional block attention module. 
Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"12556","DOI":"10.1007\/s10489-021-03121-8","article-title":"SA-FPN: An effective feature pyramid network for crowded human detection","volume":"52","author":"Zhou","year":"2022","journal-title":"Appl. Intell."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/6\/1574\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:54:12Z","timestamp":1760122452000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/6\/1574"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,13]]},"references-count":44,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["rs15061574"],"URL":"https:\/\/doi.org\/10.3390\/rs15061574","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,13]]}}}