{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T21:32:09Z","timestamp":1775079129733,"version":"3.50.1"},"reference-count":50,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2021,2,25]],"date-time":"2021-02-25T00:00:00Z","timestamp":1614211200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100017700","name":"Henan Provincial Science and Technology Research Project","doi-asserted-by":"publisher","award":["212102210102"],"award-info":[{"award-number":["212102210102"]}],"id":[{"id":"10.13039\/501100017700","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>In the wake of developments in remote sensing, the application of target detection of remote sensing is of increasing interest. Unfortunately, unlike natural image processing, remote sensing image processing involves dealing with large variations in object size, which poses a great challenge to researchers. Although traditional multi-scale detection networks have been successful in solving problems with such large variations, they still have certain limitations: (1) The traditional multi-scale detection methods note the scale of features but ignore the correlation between feature levels. Each feature map is represented by a single layer of the backbone network, and the extracted features are not comprehensive enough. For example, the SSD network uses the features extracted from the backbone network at different scales directly for detection, resulting in the loss of a large amount of contextual information. (2) These methods combine with inherent backbone classification networks to perform detection tasks. 
RetinaNet is just a combination of the ResNet-101 classification network and FPN network to perform the detection tasks; however, there are differences in object classification and detection tasks. To address these issues, a cross-scale feature fusion pyramid network (CF2PN) is proposed. First and foremost, a cross-scale fusion module (CSFM) is introduced to extract sufficiently comprehensive semantic information from features for performing multi-scale fusion. Moreover, a feature pyramid for target detection utilizing thinning U-shaped modules (TUMs) performs the multi-level fusion of the features. Eventually, a focal loss in the prediction section is used to control the large number of negative samples generated during the feature fusion process. The new architecture of the network proposed in this paper is verified by DIOR and RSOD dataset. The experimental results show that the performance of this method is improved by 2\u201312% in the DIOR dataset and RSOD dataset compared with the current SOTA target detection methods.<\/jats:p>","DOI":"10.3390\/rs13050847","type":"journal-article","created":{"date-parts":[[2021,2,25]],"date-time":"2021-02-25T05:20:08Z","timestamp":1614230408000},"page":"847","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":97,"title":["CF2PN: A Cross-Scale Feature Fusion Pyramid Network Based Remote Sensing Target Detection"],"prefix":"10.3390","volume":"13","author":[{"given":"Wei","family":"Huang","sequence":"first","affiliation":[{"name":"School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, China"}]},{"given":"Guanyi","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, China"}]},{"given":"Qiqiang","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Computer and Communication 
Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, China"}]},{"given":"Ming","family":"Ju","sequence":"additional","affiliation":[{"name":"School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, China"}]},{"given":"Jiantao","family":"Qu","sequence":"additional","affiliation":[{"name":"School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,2,25]]},"reference":[{"key":"ref_1","first-page":"805","article-title":"Hyperspectral Mineral Target Detection Based on Density Peak","volume":"25","author":"Hou","year":"2019","journal-title":"Intell. Autom. Soft Comput."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Sun, L., Wu, F., He, C., Zhan, T., Liu, W., and Zhang, D. (2020). Weighted Collaborative Sparse and L1\/2 Low-Rank Regularizations With Superpixel Segmentation for Hyperspectral Unmixing. IEEE Geosci. Remote Sens. Lett.","DOI":"10.1109\/LGRS.2020.3019427"},{"key":"ref_3","unstructured":"Papageorgiou, C.P., Oren, M., and Poggio, T. (1998, January 4\u20137). A general framework for object detection. Proceedings of the Sixth International Conference on Computer Vision, Bombay, India."},{"key":"ref_4","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201326). Histograms of oriented gradients for human detection. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA."},{"key":"ref_5","unstructured":"Fu, L., Li, Z., and Ye, Q. (2020). Learning Robust Discriminant Subspace Based on Joint L2,p- and L2,s-Norm Distance Metrics. IEEE Trans. Neural Netw. Learn. 
Syst."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"3818","DOI":"10.1109\/TNNLS.2019.2944869","article-title":"Nonpeaked Discriminant Analysis for Data Representation","volume":"30","author":"Ye","year":"2019","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"114","DOI":"10.1109\/TCSVT.2016.2596158","article-title":"L1-Norm Distance Linear Discriminant Analysis Based on an Effective Iterative Algorithm","volume":"28","author":"Ye","year":"2018","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"4494","DOI":"10.1109\/TNNLS.2017.2749428","article-title":"L1-norm Distance Minimization Based Fast Robust Twin Support Vector k-plane clustering","volume":"29","author":"Ye","year":"2018","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_9","first-page":"5","article-title":"Support vector machines for classification and regression","volume":"14","author":"Gunn","year":"1998","journal-title":"ISIS Tech. Rep."},{"key":"ref_10","first-page":"697","article-title":"Investigation on the Chinese Text Sentiment Analysis Based on ConVolutional Neural Networks in Deep Learning","volume":"58","author":"Xu","year":"2019","journal-title":"Comput. Mater. Contin."},{"key":"ref_11","first-page":"829","article-title":"R2N: A Novel Deep Learning Architecture for Rain Removal from Single Image","volume":"58","author":"Guo","year":"2019","journal-title":"Comput. Mater. Contin."},{"key":"ref_12","first-page":"575","article-title":"A Review on Deep Learning Approaches to Image Classification and Object Segmentation","volume":"60","author":"Wu","year":"2019","journal-title":"Comput. Mater. Contin."},{"key":"ref_13","first-page":"601","article-title":"Deep Feature Fusion Model for Sentence Semantic Matching","volume":"61","author":"Zhang","year":"2019","journal-title":"Comput. Mater. 
Contin."},{"key":"ref_14","first-page":"329","article-title":"Modified PSO Algorithm on Recurrent Fuzzy Neural Network for System Identification","volume":"25","author":"Hung","year":"2019","journal-title":"Intell. Autom. Soft Comput."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201327). Rich feature hierarchies for accurate target detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The PASCAL Visual Object Classes (VOC) Challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"IJCV"},{"key":"ref_17","unstructured":"Li, X., Shang, M., Qin, H., and Chen, L. (2015). Fast Accurate Fish Detection and Recognition of Underwater Images with Fast R-CNN, IEEE."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Qian, R., Liu, Q., Yue, Y., Coenen, F., and Zhang, B. (2016). Road Surface Traffic Sign Detection with Hybrid Region Proposal and Fast R-CNN, IEEE.","DOI":"10.1109\/FSKD.2016.7603233"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Wang, K., Dong, Y., Bai, H., Zhao, Y., and Hu, K. (2016). Use Fast R-CNN and Cascade Structure for Face Detection, IEEE.","DOI":"10.1109\/VCIP.2016.7805472"},{"key":"ref_20","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards Real-Time Target detection with Region Proposal Networks. Advances in Neural Information Processing Systems, IEEE."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Mhalla, A., Chateau, T., Gazzah, S., Ben Amara, N.E., and Assoc Comp, M. (2016). 
PhD Forum: Scene-Specific Pedestrian Detector Using Monte Carlo Framework and Faster R-CNN Deep Model, IEEE.","DOI":"10.1145\/2967413.2974040"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhai, M., Liu, H., Sun, F., and Zhang, Y. (2020). Ship Detection Based on Faster R-CNN Network in Optical Remote Sensing Images, Springer.","DOI":"10.1007\/978-981-32-9050-1_3"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You Only Look Once: Unified, Real-Time Target detection. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zhang, X., Qiu, Z., Huang, P., Hu, J., and Luo, J. (2018, January 18\u201320). Application Research of YOLO v2 Combined with Color Identification. Proceedings of the 2018 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, Zhengzhou, China.","DOI":"10.1109\/CyberC.2018.00036"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Itakura, K., and Hosoi, F. (2020). Automatic Tree Detection from Three-Dimensional Images Reconstructed from 360 degrees Spherical Camera Using YOLO v2. Remote Sens., 12.","DOI":"10.3390\/rs12060988"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Bi, F., and Yang, J. (2019). Target Detection System Design and FPGA Implementation Based on YOLO v2 Algorithm, IEEE.","DOI":"10.1109\/ICISPC.2019.8935783"},{"key":"ref_27","unstructured":"Redmon, J., and Ali, F. (2018). Yolov3: An incremental improvement. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhang, X., Yang, W., Tang, X., and Liu, J. (2018). A Fast Learning Method for Accurate and Robust Lane Detection Using Two-Stage Feature Extraction with YOLO v3. 
Sensors, 18.","DOI":"10.3390\/s18124308"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Adarsh, P., Rathi, P., and Kumar, M. (2020). YOLO v3-Tiny: Target detection and Recognition Using One Stage Improved Model, IEEE.","DOI":"10.1109\/ICACCS48705.2020.9074315"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Liu, G., Nouaze, J.C., Mbouembe, P.L.T., and Kim, J.H. (2020). YOLO-Tomato: A Robust Algorithm for Tomato Detection Based on YOLOv3. Sensors, 20.","DOI":"10.3390\/s20072145"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Liu, M., Wang, X., Zhou, A., Fu, X., Ma, Y., and Piao, C. (2020). UAV-YOLO: Small Object Detection on Unmanned Aerial Vehicle Perspective. Sensors, 20.","DOI":"10.3390\/s20082238"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Li, J., Gu, J., Huang, Z., and Wen, J. (2019). Application Research of Improved YOLO V3 Algorithm in PCB Electronic Component Detection. Appl. Sci., 9.","DOI":"10.3390\/app9183750"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1016\/j.ins.2020.02.067","article-title":"DC-SPP-YOLO: Dense connection and spatial pyramid pooling based YOLO for target detection","volume":"522","author":"Huang","year":"2020","journal-title":"Inf. Sci."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Fu, C., and Berg, A.C. (2016, January 8\u201316). SSD: Single Shot Multibox Detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 2\u201329). Focal loss for dense object detection. 
Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"43607","DOI":"10.1109\/ACCESS.2019.2908016","article-title":"Multi-scale image block-level f-cnn for remote sensing images target detection","volume":"7","author":"Zhao","year":"2019","journal-title":"IEEE Access"},{"key":"ref_37","unstructured":"Sergievskiy, N., and Ponamarev, A. (2019). Reduced focal loss: 1st place solution to xview target detection in satellite imagery. arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Chen, C., Gong, W., Chen, Y., and Li, W. (2019). Target detection in remote sensing images based on a scene-contextual feature pyramid network. Remote Sensing, 11.","DOI":"10.3390\/rs11030339"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Yang, X., Yang, J., Yan, J., Zhang, Y., Zhang, T., and Guo, Z. (2019, January 22). SCRDet: Towards more robust detection for small, cluttered and rotated objects. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00832"},{"key":"ref_40","unstructured":"Yang, X., Yan, J., Yang, X., Tang, J., Liao, W., and He, T. (2020). SCRDet++: Detecting small, cluttered and rotated objects via instance-level feature denoising and rotation loss smoothing. arXiv."},{"key":"ref_41","first-page":"296","article-title":"Target detection in optical remote sensing images: A survey and a new benchmark","volume":"159","author":"Li","year":"2020","journal-title":"ISPRS"},{"key":"ref_42","unstructured":"Zhao, Q., Sheng, T., Wang, Y., Tang, Z., and Ling, H. (February, January 27). M2det: A single-shot object detector based on multi-level feature pyramid network. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Wang, C.Y., Bochkovskiy, A., and Liao, H.Y.M. (2020). 
Scaled-YOLOv4: Scaling Cross Stage Partial Network. arXiv.","DOI":"10.1109\/CVPR46437.2021.01283"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., Albanie, S., Sun, G., and Wu, E. (2017, January 21\u201326). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"1998","DOI":"10.1109\/LGRS.2017.2745900","article-title":"Rural Building Detection in High-Resolution Imagery Based on a Two-Stage CNN Model","volume":"14","author":"Sun","year":"2017","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"7405","DOI":"10.1109\/TGRS.2016.2601622","article-title":"Learning Rotation-Invariant Convolutional Neural Networks for Target detection in VHR Optical Remote Sensing Images","volume":"54","author":"Cheng","year":"2016","journal-title":"IEEE Geosci. Remote Sens."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"461","DOI":"10.5194\/isprs-archives-XLII-1-W1-461-2017","article-title":"Learning Oriented Region-based Convolutional Neural Networks for Building Detection in Satellite Remote Sensing Images","volume":"XLII-1\/W1","author":"Chen","year":"2017","journal-title":"Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.isprsjprs.2018.04.003","article-title":"Multi-scale target detection in remote sensing imagery with convolutional neural networks","volume":"145","author":"Deng","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Yang, X., Sun, H., Fu, K., Yang, J., Sun, X., Yan, M., and Guo, Z. (2018). Automatic Ship Detection in Remote Sensing Images from Google Earth of Complex Scenes Based on Multiscale Rotation Dense Feature Pyramid Networks. 
Remote Sens., 10.","DOI":"10.3390\/rs10010132"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"39401","DOI":"10.1109\/ACCESS.2018.2856088","article-title":"An end-to-end neural network for road extraction from remote sensing imagery by multiple feature pyramid network","volume":"6","author":"Gao","year":"2018","journal-title":"IEEE Access"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/5\/847\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:28:07Z","timestamp":1760160487000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/5\/847"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,25]]},"references-count":50,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2021,3]]}},"alternative-id":["rs13050847"],"URL":"https:\/\/doi.org\/10.3390\/rs13050847","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,2,25]]}}}