{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T02:26:09Z","timestamp":1760149569331,"version":"build-2065373602"},"reference-count":50,"publisher":"MDPI AG","issue":"17","license":[{"start":{"date-parts":[[2023,8,23]],"date-time":"2023-08-23T00:00:00Z","timestamp":1692748800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62201472","62272383"],"award-info":[{"award-number":["62201472","62272383"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Remote Sensing Image processing is a traditional research field, where RSI object detection is one of the most important directions. This paper focuses on an inherent problem of multi-stage object detection frameworks: the coupling error transmitting problem. In brief, because of the coupling method between the classifier and the regressor, the traditional multi-stage Detection frameworks tend to be fallible when encountering coarse object proposals. To deal with this problem, this article proposes a novel deep learning-based multi-stage object detection framework. Specifically, a novel network head architecture with a multi-to-one coupling method is proposed to avoid the coupling error of the traditional network head architecture. Moreover, it is found that the traditional network head architecture is more efficient than the novel network architecture when encountering fine object proposals. Considering this phenomenon, a proposal-consistent cooperation mechanism between the network heads is proposed. 
This mechanism lets the traditional network head and the novel network head exploit each other\u2019s advantages while avoiding their disadvantages. Experiments with different backbone networks on three publicly available data sets show the effectiveness of the proposed method: mAP is improved by 0.7% to 12.3% on most models and data sets.<\/jats:p>","DOI":"10.3390\/rs15174130","type":"journal-article","created":{"date-parts":[[2023,8,23]],"date-time":"2023-08-23T08:01:21Z","timestamp":1692777681000},"page":"4130","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Detector Consistency Research on Remote Sensing Object Detection"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0960-3636","authenticated-orcid":false,"given":"Yuanlin","family":"Zhang","sequence":"first","affiliation":[{"name":"Shaanxi Key Laboratory for Network Computing and Security Technology, Department of Computer Science and Engineering, Xi\u2019an University of Technology, No. 5 South Jinhua Road, Xi\u2019an 710048, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3742-4029","authenticated-orcid":false,"given":"Haiyan","family":"Jin","sequence":"additional","affiliation":[{"name":"Shaanxi Key Laboratory for Network Computing and Security Technology, Department of Computer Science and Engineering, Xi\u2019an University of Technology, No. 5 South Jinhua Road, Xi\u2019an 710048, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,8,23]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Xu, H., Zheng, W., Liu, F., Li, P., and Wang, R. (2023). Unmanned Aerial Vehicle Perspective Small Target Recognition Algorithm Based on Improved YOLOv5. Remote Sens., 15.","DOI":"10.3390\/rs15143583"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"K\u00f6rez, A., Bar\u0131\u015f\u00e7\u0131, N., \u00c7etin, A., and Erg\u00fcn, U. (2020). 
Weighted ensemble object detection with optimized coefficients for remote sensing images. ISPRS Int. J. Geo-Inf., 9.","DOI":"10.3390\/ijgi9060370"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1174","DOI":"10.1109\/TGRS.2014.2335751","article-title":"Compressed-domain ship detection on spaceborne optical image using deep neural network and extreme learning machine","volume":"53","author":"Tang","year":"2014","journal-title":"IEEE Trans. Geosci. Remote Sens. (TGRS)"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Chen, F., Ren, R., Van de Voorde, T., Xu, W., Zhou, G., and Zhou, Y. (2018). Fast automatic airport detection in remote sensing images using convolutional neural networks. Remote Sens., 10.","DOI":"10.3390\/rs10030443"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"382","DOI":"10.1016\/j.isprsjprs.2007.10.005","article-title":"On-line boosting-based car detection from aerial images","volume":"63","author":"Grabner","year":"2008","journal-title":"ISPRS J. Photogramm. Remote Sens. (P&RS)"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"140","DOI":"10.1109\/TPAMI.2018.2876253","article-title":"Motion segmentation & multiple object tracking by correlation co-clustering","volume":"42","author":"Keuper","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., and Fei-Fei, L. (2014, January 23\u201328). Large-scale video classification with convolutional neural networks. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.223"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Lu, J., Yang, J., Batra, D., and Parikh, D. (2018, January 18\u201322). Neural baby talk. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00754"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1645","DOI":"10.1109\/TMM.2017.2772796","article-title":"Multistage object detection with group recursive learning","volume":"20","author":"Li","year":"2017","journal-title":"IEEE Trans. Multimed."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Cai, Z., and Vasconcelos, N. (2018, January 18\u201322). Cascade r-cnn: Delving into high quality object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00644"},{"key":"ref_11","first-page":"1","article-title":"OLCN: An optimized low coupling network for small objects detection","volume":"19","author":"Yuan","year":"2021","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"7933","DOI":"10.1109\/TGRS.2020.3048384","article-title":"DCL-Net: Augmenting the Capability of Classification and Localization for Remote Sensing Object Detection","volume":"59","author":"Liu","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Lowe, D.G. (1999, January 20\u201327). Object recognition from local scale-invariant features. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Kerkyra, Greece.","DOI":"10.1109\/ICCV.1999.790410"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"4516","DOI":"10.1109\/TGRS.2011.2144607","article-title":"Uniform robust scale-invariant feature matching for optical remote sensing images","volume":"49","author":"Sedaghat","year":"2011","journal-title":"IEEE Trans. Geosci. Remote Sens. 
(TGRS)"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1109\/LGRS.2008.2011751","article-title":"Robust scale-invariant feature matching for remote sensing image registration","volume":"6","author":"Li","year":"2009","journal-title":"IEEE Geosci. Remote Sens. Lett. (GRSL)"},{"key":"ref_16","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201325). Histograms of oriented gradients for human detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"2327","DOI":"10.1109\/JSTARS.2013.2242846","article-title":"Airborne vehicle detection in dense urban areas using HoG features and disparity maps","volume":"6","author":"Tuermer","year":"2013","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. (J-STARS)"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1016\/j.isprsjprs.2014.10.002","article-title":"Multi-class geospatial object detection and geographic image classification based on collection of part detectors","volume":"98","author":"Cheng","year":"2014","journal-title":"ISPRS J. Photogramm. Remote Sens. (P&RS)"},{"key":"ref_19","unstructured":"Li, F.F., and Perona, P. (2005, January 20\u201325). A bayesian hierarchical model for learning natural scene categories. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"366","DOI":"10.1109\/LGRS.2009.2035644","article-title":"Object classification of aerial images with bag-of-visual words","volume":"7","author":"Xu","year":"2010","journal-title":"IEEE Geosci. Remote Sens. Lett. 
(GRSL)"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1109\/LGRS.2011.2161569","article-title":"Automatic target detection in high-resolution remote sensing images using spatial sparse coding bag-of-words model","volume":"9","author":"Sun","year":"2012","journal-title":"IEEE Geosci. Remote Sens. Lett. (GRSL)"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1080\/01431161.2012.705443","article-title":"Automatic landslide detection from remote-sensing imagery using a scene classification method based on BoVW and pLSA","volume":"34","author":"Cheng","year":"2013","journal-title":"Int. J. Remote Sens. (IJRS)"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"15014","DOI":"10.3390\/rs71115014","article-title":"Accurate annotation of remote sensing images via active spectral clustering with little expert knowledge","volume":"7","author":"Xia","year":"2015","journal-title":"Remote Sens."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1346","DOI":"10.1109\/TGRS.2014.2337883","article-title":"A sparse representation-based binary hypothesis model for target detection in hyperspectral images","volume":"53","author":"Zhang","year":"2015","journal-title":"IEEE Trans. Geosci. Remote Sens. (TGRS)"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"2053","DOI":"10.1109\/JSTARS.2015.2404578","article-title":"Object detection based on sparse representation and Hough voting for optical remote sensing imagery","volume":"8","author":"Yokoya","year":"2015","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. (J-STARS)"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1030","DOI":"10.1109\/TGRS.2013.2246837","article-title":"Sparse transfer manifold embedding for hyperspectral target detection","volume":"52","author":"Zhang","year":"2014","journal-title":"IEEE Trans. Geosci. Remote Sens. 
(TGRS)"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/j.isprsjprs.2013.12.011","article-title":"Efficient, simultaneous detection of multi-class geospatial targets based on visual saliency modeling and discriminative learning of sparse coding","volume":"89","author":"Han","year":"2014","journal-title":"ISPRS J. Photogramm. Remote Sens. (P&RS)"},{"key":"ref_28","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Chen, Q., Wang, Y., Yang, T., Zhang, X., Cheng, J., and Sun, J. (2021, January 19\u201325). You only look one-level feature. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01284"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 11\u201314). Ssd: Single shot multibox detector. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201312). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Boston, MA, USA.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_33","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015, January 7\u201312). Faster r-cnn: Towards real-time object detection with region proposal networks. 
Proceedings of the Advances in Neural Information Processing Systems (NIPS), Montreal, QC, Canada."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1109\/TIP.2018.2867198","article-title":"Learning rotation-invariant and fisher discriminative convolutional neural networks for object detection","volume":"28","author":"Cheng","year":"2018","journal-title":"IEEE Trans. Image Process. (TIP)"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"791","DOI":"10.1109\/LGRS.2018.2882778","article-title":"Detection of multiclass objects in optical remote sensing images","volume":"16","author":"Liu","year":"2018","journal-title":"IEEE Geosci. Remote Sens. Lett. (GRSL)"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"43607","DOI":"10.1109\/ACCESS.2019.2908016","article-title":"Multi-scale image block-level f-cnn for remote sensing images object detection","volume":"7","author":"Zhao","year":"2019","journal-title":"IEEE Access"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1109\/TGRS.2019.2935177","article-title":"Gated and axis-concentrated localization network for remote sensing object detection","volume":"58","author":"Lu","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens. (TGRS)"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"30980","DOI":"10.1109\/ACCESS.2019.2903422","article-title":"Object detection in aerial images using feature fusion deep networks","volume":"7","author":"Long","year":"2019","journal-title":"IEEE Access"},{"key":"ref_39","unstructured":"Zhang, W., Jiao, L., Liu, X., and Liu, J. (August, January 28). Multi-scale feature fusion network for object detection in vhr optical remote sensing images. 
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Yokohama, Japan."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1016\/j.isprsjprs.2019.11.023","article-title":"Object detection in optical remote sensing images: A survey and a new benchmark","volume":"159","author":"Li","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens. (P&RS)"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"5535","DOI":"10.1109\/TGRS.2019.2900302","article-title":"Hierarchical and robust convolutional neural network for very high-resolution remote sensing object detection","volume":"57","author":"Zhang","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens. (TGRS)"},{"key":"ref_42","unstructured":"Chen, K., Wang, J., Pang, J., Cao, Y., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., and Xu, J. (2019). MMDetection: Open mmlab detection toolbox and benchmark. arXiv."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"7405","DOI":"10.1109\/TGRS.2016.2601622","article-title":"Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images","volume":"54","author":"Cheng","year":"2016","journal-title":"IEEE Trans. Geosci. Remote Sens. (TGRS)"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Zhang, H., Chang, H., Ma, B., Wang, N., and Chen, X. (2020, January 23\u201328). Dynamic R-CNN: Towards high quality object detection via dynamic training. Proceedings of the ECCV 2020: Computer Vision European Conference, Glasgow, UK.","DOI":"10.1007\/978-3-030-58555-6_16"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Cheng, G., Zhou, P., and Han, J. (2016, January 27\u201330). Rifd-cnn: Rotation-invariant and fisher discriminative convolutional neural networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.315"},{"key":"ref_48","unstructured":"Li, X., Zhang, L., Chen, Y.P., Tai, Y.W., and Tang, C.K. (2020). One-shot object detection without fine-tuning. arXiv."},{"key":"ref_49","first-page":"2725","article-title":"One-shot object detection with co-attention and co-excitation","volume":"32","author":"Hsieh","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_50","first-page":"1","article-title":"Solo-to-collaborative dual-attention network for one-shot object detection in remote sensing images","volume":"60","author":"Li","year":"2021","journal-title":"IEEE Trans. Geosci. 
Remote Sens."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/17\/4130\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:40:37Z","timestamp":1760128837000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/17\/4130"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,23]]},"references-count":50,"journal-issue":{"issue":"17","published-online":{"date-parts":[[2023,9]]}},"alternative-id":["rs15174130"],"URL":"https:\/\/doi.org\/10.3390\/rs15174130","relation":{},"ISSN":["2072-4292"],"issn-type":[{"type":"electronic","value":"2072-4292"}],"subject":[],"published":{"date-parts":[[2023,8,23]]}}}