{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T22:31:14Z","timestamp":1776205874404,"version":"3.50.1"},"reference-count":60,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2024,3,7]],"date-time":"2024-03-07T00:00:00Z","timestamp":1709769600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["32271880"],"award-info":[{"award-number":["32271880"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62172231"],"award-info":[{"award-number":["62172231"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["BK20220107"],"award-info":[{"award-number":["BK20220107"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Natural Science Foundation of Jiangsu Province of China","award":["32271880"],"award-info":[{"award-number":["32271880"]}]},{"name":"Natural Science Foundation of Jiangsu Province of China","award":["62172231"],"award-info":[{"award-number":["62172231"]}]},{"name":"Natural Science Foundation of Jiangsu Province of China","award":["BK20220107"],"award-info":[{"award-number":["BK20220107"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Object detection is dedicated to finding objects in an image and estimating their categories and locations. 
Recent object detection algorithms suffer from a loss of semantic information in the deeper feature maps as the backbone network deepens. For example, when using complex backbone networks, existing feature fusion methods cannot fuse information from different layers effectively. In addition, anchor-free object detection methods fail to accurately predict the same object because the regression and centrality prediction branches have different learning mechanisms. To address the above problems, we propose a multi-scale fusion and interactive learning method for fully convolutional one-stage anchor-free object detection, called MFIL-FCOS. Specifically, we design a multi-scale fusion module to address the problem of local semantic information loss in high-level feature maps, which strengthens feature extraction by enhancing the local information of low-level features and fusing the rich semantic information of high-level features. Furthermore, we propose an interactive learning module to increase interactivity and produce more accurate predictions by generating a centrality-position weight adjustment regression task and a centrality prediction task. 
Following these strategic improvements, we conduct extensive experiments on the COCO and DIOR datasets, demonstrating the superior capabilities of MFIL-FCOS in 2D object detection tasks and remote sensing image detection, even under challenging conditions.<\/jats:p>","DOI":"10.3390\/rs16060936","type":"journal-article","created":{"date-parts":[[2024,3,7]],"date-time":"2024-03-07T11:33:06Z","timestamp":1709811186000},"page":"936","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["MFIL-FCOS: A Multi-Scale Fusion and Interactive Learning Method for 2D Object Detection and Remote Sensing Image Detection"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8741-8607","authenticated-orcid":false,"given":"Guoqing","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China"},{"name":"Jiangsu Key Laboratory of Image and Video Understanding for Social Safety, Nanjing University of Science and Technology, Nanjing 210094, China"},{"name":"Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science & Technology, Nanjing 210044, China"}]},{"given":"Wenyu","family":"Yu","sequence":"additional","affiliation":[{"name":"School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China"}]},{"given":"Ruixia","family":"Hou","sequence":"additional","affiliation":[{"name":"Research Institute of Resource Information Techniques, Chinese Academy of Forestry (CAF), Beijing 100091, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,3,7]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"58443","DOI":"10.1109\/ACCESS.2020.2983149","article-title":"A survey of autonomous driving: Common practices and emerging 
technologies","volume":"8","author":"Yurtsever","year":"2020","journal-title":"IEEE Access"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Ghafir, I., Prenosil, V., Svoboda, J., and Hammoudeh, M. (2016, January 22\u201324). A survey on network security monitoring systems. Proceedings of the 2016 IEEE 4th International Conference on Future Internet of Things and Cloud Workshops (FiCloudW), Vienna, Austria.","DOI":"10.1109\/W-FiCloud.2016.30"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"408","DOI":"10.1016\/j.forsciint.2018.06.020","article-title":"The application of low-altitude near-infrared aerial photography for detecting clandestine burials using a UAV and low-cost unmodified digital camera","volume":"289","author":"Evers","year":"2018","journal-title":"Forensic Sci. Int."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1186\/s40537-019-0276-2","article-title":"Deep convolutional neural network based medical image classification for disease diagnosis","volume":"6","author":"Yadav","year":"2019","journal-title":"J. Big Data"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1007\/s00170-013-4904-2","article-title":"An industrial vision system for surface quality inspection of transparent parts","volume":"68","author":"Ortega","year":"2013","journal-title":"Int. J. Adv. Manuf. Technol."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"8387","DOI":"10.1080\/01431161.2018.1550919","article-title":"The development of remote sensing in the last 40 years","volume":"39","author":"Cracknell","year":"2018","journal-title":"Int. J. Remote Sens."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Zhang, X., Peng, C., Xue, X., and Sun, J. (2018, January 8\u201314). Exfuse: Enhancing feature fusion for semantic segmentation. 
Proceedings of the European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01249-6_17"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"7068349","DOI":"10.1155\/2018\/7068349","article-title":"Deep learning for computer vision: A brief review","volume":"2018","author":"Voulodimos","year":"2018","journal-title":"Comput. Intell. Neurosci."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1146\/annurev-biodatasci-110920-093120","article-title":"Satellite monitoring for air quality and health","volume":"4","author":"Holloway","year":"2021","journal-title":"Annual Rev. Biomed. Data Sci."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1109\/MCOM.2016.7470933","article-title":"Wireless communications with unmanned aerial vehicles: Opportunities and challenges","volume":"54","author":"Zeng","year":"2016","journal-title":"IEEE Commun. Mag."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Canty, M.J. (2019). Image Analysis, Classification and Change Detection in Remote Sensing: With Algorithms for Python, CRC Press.","DOI":"10.1201\/9780429464348"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1016\/j.isprsjprs.2015.10.004","article-title":"Remote sensing platforms and sensors: A survey","volume":"115","author":"Toth","year":"2016","journal-title":"ISPRS J. Photogramm. 
Remote Sens."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1016\/S0896-6273(00)80832-6","article-title":"Differential processing of objects under various viewing conditions in the human lateral occipital complex","volume":"24","author":"Kushnir","year":"1999","journal-title":"Neuron"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"170461","DOI":"10.1109\/ACCESS.2020.3021508","article-title":"Exploring deep learning-based architecture, strategies, applications and current trends in generic object detection: A comprehensive review","volume":"8","author":"Aziz","year":"2020","journal-title":"IEEE Access"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Yang, X., Sun, H., Fu, K., Yang, J., Sun, X., Yan, M., and Guo, Z. (2018). Automatic ship detection in remote sensing images from google earth of complex scenes based on multiscale rotation dense feature pyramid networks. Remote Sens., 10.","DOI":"10.3390\/rs10010132"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1016\/j.neucom.2020.01.085","article-title":"Recent advances in deep learning for object detection","volume":"396","author":"Wu","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., and Zitnick, C. (2014, January 6\u201312). Microsoft COCO: Common objects in context. 
Proceedings of the Computer Vision\u2013ECCV 2014: 13th European Conference, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The pascal visual object classes (VOC) challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_21","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv."},{"key":"ref_22","unstructured":"Tian, Z., Shen, C., Chen, H., and He, T. (November, January 27). FCOS: Fully convolutional one-stage object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_23","unstructured":"Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., and Tian, Q. (November, January 27). Centernet: Keypoint triplets for object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Law, H., and Deng, J. (2018, January 8\u201314). Cornernet: Detecting objects as paired keypoints. Proceedings of the European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01264-9_45"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Zhou, X., Zhuo, J., and Krahenbuhl, P. (2019, January 15\u201320). Bottom-up object detection by grouping extreme and center points. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00094"},{"key":"ref_26","first-page":"3127232","article-title":"A New Spatial-Oriented Object Detection Framework for Remote Sensing Images","volume":"60","author":"Yu","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_27","first-page":"5610013","article-title":"Foreground Refinement Network for Rotated Object Detection in Remote Sensing Images","volume":"60","author":"Zhang","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"8021","DOI":"10.1109\/ACCESS.2022.3141059","article-title":"Multi-Size Object Detection in Large Scene Remote Sensing Images Under Dual Attention Mechanism","volume":"10","author":"Wang","year":"2022","journal-title":"IEEE Access"},{"key":"ref_29","first-page":"5405316","article-title":"Object Detection in Large-Scale Remote-Sensing Images Based on Time-Frequency Analysis and Feature Optimization","volume":"60","author":"Bai","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_30","first-page":"8013705","article-title":"Target detection in remote sensing image based on object-and-scene context constrained CNN","volume":"19","author":"Cheng","year":"2021","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"2148","DOI":"10.1109\/JSTARS.2020.3046482","article-title":"Cross-layer attention network for small object detection in remote sensing imagery","volume":"14","author":"Li","year":"2020","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Zhang, X., Gong, Z., Guo, H., Liu, X., Ding, L., Zhu, K., and Wang, J. (2023). Adaptive Adjacent Layer Feature Fusion for Object Detection in Remote Sensing Images. 
Remote Sens., 15.","DOI":"10.3390\/rs15174224"},{"key":"ref_33","first-page":"1","article-title":"Visual categorization with bags of keypoints","volume":"1","author":"Csurka","year":"2004","journal-title":"Workshop Stat. Learn. Comput. Vis."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.patcog.2017.06.036","article-title":"Multi-modal feature fusion for geographic image annotation","volume":"73","author":"Li","year":"2018","journal-title":"Pattern Recognit."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1375","DOI":"10.1109\/TITS.2020.2969993","article-title":"Railway traffic object detection using differential feature fusion convolution neural network","volume":"22","author":"Ye","year":"2020","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"612","DOI":"10.1016\/j.neucom.2014.03.090","article-title":"Hypergraph based feature fusion for 3-D object retrieval","volume":"151","author":"Wang","year":"2015","journal-title":"Neurocomputing"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Haussmann, E., Fenzi, M., Chitta, K., Ivanecky, J., Xu, H., Roy, D., and Alvarez, J.M. (November, January 19). Scalable active learning for object detection. Proceedings of the 2020 IEEE Intelligent Vehicles Symposium, Las Vegas, NV, USA.","DOI":"10.1109\/IV47402.2020.9304793"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Yao, A., Gall, J., Leistner, C., and Van Gool, L. (2012, January 16\u201321). Interactive object detection. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248060"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Li, Y., Huang, D., Qin, D., Wang, L., and Gong, B. (2020, January 23\u201328). Improving object detection with selective self-supervised self-training. 
Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58526-6_35"},{"key":"ref_40","first-page":"042609","article-title":"Comprehensive survey of deep learning in remote sensing: Theories, tools, and challenges for the community","volume":"11","author":"Ball","year":"2017","journal-title":"Remote Sens."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Li, S., He, C., Li, R., and Zhang, L. (2022, January 18\u201324). A dual weighting label assignment scheme for object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00917"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"7389","DOI":"10.1109\/TIP.2020.3002345","article-title":"Foveabox: Beyound anchor-based object detection","volume":"29","author":"Kong","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Zhang, S., Chi, C., Yao, Y., Lei, Z., and Li, S.Z. (2020, January 13\u201319). Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00978"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Kim, K., and Lee, H.S. (2020, January 23\u201328). Probabilistic anchor assignment with iou prediction for object detection. Proceedings of the Computer Vision\u2014ECCV 2020: 16th European Conference, Glasgow, UK.","DOI":"10.1007\/978-3-030-58595-2_22"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Ge, Z., Liu, S., Li, Z., Yoshie, O., and Sun, J. (2021, January 20\u201325). OTA: Optimal transport assignment for object detection. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00037"},{"key":"ref_46","unstructured":"Zhu, B., Wang, J., Jiang, Z., Zong, F., Liu, S., Li, Z., and Sun, J. (2007). Autoassign: Differentiable label assignment for dense object detection. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Ke, W., Zhang, T., Huang, Z., Ye, Q., Liu, J., and Huang, D. (2020, January 13\u201319). Multiple anchor learning for visual object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01022"},{"key":"ref_48","first-page":"21002","article-title":"Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection","volume":"33","author":"Li","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Zhang, H., Wang, Y., Dayoub, F., and Sunderhauf, N. (2021, January 20\u201325). Varifocalnet: An IoU-aware dense object detector. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00841"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Gao, Z., Wang, L., and Wu, G. (2021, January 20\u201325). Mutual supervision for dense object detection. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Nashville, TN, USA.","DOI":"10.1109\/ICCV48922.2021.00362"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Feng, C., Zhong, Y., Gao, Y., Scott, M.R., and Huang, W. (2021, January 20\u201325). TOOD: Task-aligned one-stage object detection. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision, Nashville, TN, USA.","DOI":"10.1109\/ICCV48922.2021.00349"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Ni, Z., Yang, F., Wen, S., and Zhang, G. 
(2023). Dual Relation Knowledge Distillation for Object Detection. arXiv.","DOI":"10.24963\/ijcai.2023\/142"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Ma, Y., Liu, S., Li, Z., and Sun, J. (2021, January 19\u201325). Iqdet: Instance-wise quality distribution sampling for object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00176"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A.C. (2016, January 11\u201314). SSD: Single shot multibox detector. Proceedings of the ECCV 2016: 14th European Conference, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_55","first-page":"5601920","article-title":"A novel nonlocal-aware pyramid and multiscale multitask refinement detector for object detection in remote sensing images","volume":"60","author":"Huang","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_56","unstructured":"Bochkovskiy, A., Wang, C.Y., and Liao, H.Y.M. (2020). Yolov4: Optimal speed and accuracy of object detection. arXiv."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Tan, M., Pang, R., and Le, Q.V. (2020, January 14\u201319). Efficientdet: Scalable and efficient object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Ye, Y., Ren, X., Zhu, B., Tang, T., Tan, X., Gui, Y., and Yao, Q. (2022). An adaptive attention fusion mechanism convolutional network for object detection in remote sensing images. 
Remote Sens., 14.","DOI":"10.3390\/rs14030516"},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"3522","DOI":"10.1049\/ipr2.12230","article-title":"MFP-Net: Multi-scale feature pyramid network for crowd counting","volume":"15","author":"Lei","year":"2021","journal-title":"IET Image Process."},{"key":"ref_60","first-page":"851","article-title":"HAWK-Net: Hierarchical Attention Weighted Top-K Network for High-resolution Image Classification","volume":"31","author":"Nakanishi","year":"2023","journal-title":"J. Inf. Process."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/6\/936\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T14:10:24Z","timestamp":1760105424000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/6\/936"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,7]]},"references-count":60,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2024,3]]}},"alternative-id":["rs16060936"],"URL":"https:\/\/doi.org\/10.3390\/rs16060936","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,7]]}}}