{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,15]],"date-time":"2026-06-15T04:32:04Z","timestamp":1781497924876,"version":"3.54.1"},"reference-count":46,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2022,1,21]],"date-time":"2022-01-21T00:00:00Z","timestamp":1642723200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["41971281"],"award-info":[{"award-number":["41971281"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>For remote sensing object detection, fusing the optimal feature information automatically and overcoming the sensitivity to adapt multi-scale objects remains a significant challenge for the existing convolutional neural networks. Given this, we develop a convolutional network model with an adaptive attention fusion mechanism (AAFM). The model is proposed based on the backbone network of EfficientDet. Firstly, according to the characteristics of object distribution in datasets, the stitcher is applied to make one image containing objects of various scales. Such a process can effectively balance the proportion of multi-scale objects and handle the scale-variable properties. In addition, inspired by channel attention, a spatial attention model is also introduced in the construction of the adaptive attention fusion mechanism. In this mechanism, the semantic information of the different feature maps is obtained via convolution and different pooling operations. Then, the parallel spatial and channel attention are fused in the optimal proportions by the fusion factors to get the further representative feature information. 
Finally, the Complete Intersection over Union (CIoU) loss is used to make the bounding box better cover the ground truth. The experimental results of the optical image dataset DIOR demonstrate that, compared with state-of-the-art detectors such as the Single Shot multibox Detector (SSD), You Only Look Once (YOLO) v4, and EfficientDet, the proposed module improves accuracy and has stronger robustness.<\/jats:p>","DOI":"10.3390\/rs14030516","type":"journal-article","created":{"date-parts":[[2022,1,23]],"date-time":"2022-01-23T20:34:40Z","timestamp":1642970080000},"page":"516","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":80,"title":["An Adaptive Attention Fusion Mechanism Convolutional Network for Object Detection in Remote Sensing Images"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6843-6722","authenticated-orcid":false,"given":"Yuanxin","family":"Ye","sequence":"first","affiliation":[{"name":"Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 610031, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xiaoyue","family":"Ren","sequence":"additional","affiliation":[{"name":"Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 610031, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3251-2452","authenticated-orcid":false,"given":"Bai","family":"Zhu","sequence":"additional","affiliation":[{"name":"Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 610031, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tengfeng","family":"Tang","sequence":"additional","affiliation":[{"name":"Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 610031, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xin","family":"Tan","sequence":"additional","affiliation":[{"name":"Faculty of Geosciences and Environmental Engineering, Southwest Jiaotong University, Chengdu 610031, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yang","family":"Gui","sequence":"additional","affiliation":[{"name":"The 9th System Design Department of China Aerospace Science Industry Corporation, Wuhan 430000, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Qin","family":"Yao","sequence":"additional","affiliation":[{"name":"Northwest Institute of Nuclear Technology, Xi\u2019an 710025, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2022,1,21]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Janakiramaiah, B., Kalyani, G., Karuna, A., Prasad, L.V.N., and Krishna, M. (2021). Military object detection in defense using multi-level capsule networks. Soft Comput., 1\u201315.","DOI":"10.1007\/s00500-021-05912-0"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1002","DOI":"10.1109\/TITS.2015.2496795","article-title":"Fast Detection of Multiple Objects in Traffic Scenes With a Common Detection Framework","volume":"17","author":"Hu","year":"2015","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"101009","DOI":"10.1016\/j.aei.2019.101009","article-title":"Convolutional neural networks for object detection in aerial imagery for disaster response and recovery","volume":"43","author":"Pi","year":"2020","journal-title":"Adv. Eng. Inform."},{"key":"ref_4","unstructured":"(2005, January 20\u201325). Histograms of Oriented Gradients for Human Detection. 
Proceedings of the IEEE Computer Society Conference on Computer Vision & Pattern Recognition, San Diego, CA, USA."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"471","DOI":"10.1109\/LGRS.2012.2210189","article-title":"Texture-Based Airport Runway Detection","volume":"10","author":"Aytekin","year":"2012","journal-title":"IEEE Geosci. Remote. Sens. Lett."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Weber, J., and Lefevre, S. (2008). A Multivariate Hit-or-Miss Transform for Conjoint Spatial and Spectral Template Matching, Springer.","DOI":"10.1007\/978-3-540-69905-7_26"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1016\/j.compag.2018.02.016","article-title":"Deep learning in agriculture: A survey","volume":"147","author":"Kamilaris","year":"2018","journal-title":"Comput. Electron. Agric."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"9059","DOI":"10.1109\/TGRS.2019.2924684","article-title":"Fast and Robust Matching for Multimodal Remote Sensing Image Registration","volume":"57","author":"Ye","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_9","first-page":"1","article-title":"Robust Matching for SAR and Optical Images Using Multiscale Convolutional Gradient Features","volume":"19","author":"Zhou","year":"2021","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Shen, H., Jiang, M., Li, J., Yuan, Q., Wei, Y., and Zhang, L. (2019). Spatial\u2013Spectral Fusion by Combining Deep Learning and Variational Model. 
IEEE Transactions on Geoscience and Remote Sensing, Institute of Electrical and Electronics Engineers (IEEE).","DOI":"10.1109\/TGRS.2019.2904659"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2013, January 23\u201328). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1007\/s11263-013-0620-5","article-title":"Selective search for object recognition","volume":"104","author":"Uijlings","year":"2013","journal-title":"Int. J. Comput. Vis."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015). Fast R-CNN. arXiv.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"813","DOI":"10.3390\/app8050813","article-title":"Small Object Detection in Optical Remote Sensing Images via Modified Faster R-CNN","volume":"8","author":"Yun","year":"2018","journal-title":"Appl. Sci."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 11\u201314). SSD: Single Shot MultiBox Detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You Only Look Once: Unified, Real-Time Object Detection. 
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_18","unstructured":"Redmon, J., and Farhadi, A. (2018). YOLOv3: An Incremental Improvement. arXiv."},{"key":"ref_19","unstructured":"Bochkovskiy, A., Wang, C.-Y., and Liao, H.-J.M. (2020). YOLOv4 Optimal Speed and Accuracy of Object Detection. In Proceedings of the Computer Vision and Pattern Recognition. arxiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Xu, D., and Wu, Y. (2020). Improved YOLO-V3 with DenseNet for Multi-Scale Remote Sensing Target Detection. Sensors, 20.","DOI":"10.3390\/s20154276"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Law, H., and Deng, J. (2018). CornerNet: Detecting Objects as Paired Keypoints. arXiv.","DOI":"10.1007\/978-3-030-01264-9_45"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., and Tian, Q. (2019, January 27\u201328). CenterNet: Keypoint Triplets for Object Detection. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00667"},{"key":"ref_23","unstructured":"Tan, M., and Le, Q.V. (2019). Efficientnet: Rethinking model scaling for convolutional neural networks. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Tan, M., Pang, R., and Le, Q.V. (2019). Efficientdet: Scalable and Efficient Object Detection. arXiv.","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., and Savarese, S. (2019, January 15\u201320). Generalized Intersection Over union: A metric and a Loss for Bounding Box Regression. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2019, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00075"},{"key":"ref_26","unstructured":"Chen, Y., Zhang, P., Li, Z., Li, Y., Zhang, X., Meng, G., Xiang, S., Sun, J., and Jia, J. (2020). Stitcher: Feedback-driven Data Provider for Object Detection. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., and Ren, D. (2020). Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. In Proceedings of the AAAI Conference on Artificial Intelligence. arXiv.","DOI":"10.1609\/aaai.v34i07.6999"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Liu, M., Wang, X., Zhou, A., Fu, X., Ma, Y., and Piao, C. (2020). UAV-YOLO: Small Object Detection on Unmanned Aerial Vehicle Perspective. Sensors, 20.","DOI":"10.3390\/s20082238"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Ju, M., Luo, H., and Wang, Z. (2020, January 24\u201326). An improved YOLO V3 for small vehicles detection in aerial images. Proceedings of the 3rd International Conference on Algorithms, Computing and Artificial Intelligence, Sanya, China.","DOI":"10.1145\/3446132.3446188"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Wang, G., Zhuang, Y., Wang, Z., Chen, H., Shi, H., and Chen, L. (August, January 28). Spatial Enhanced-SSD For Multiclass Object Detection in Remote Sensing Images. Proceedings of the IGARSS 2019\u20132019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan.","DOI":"10.1109\/IGARSS.2019.8898526"},{"key":"ref_31","unstructured":"Lin, T., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. Focal loss for dense object detection. Proceedings of the IEEE Transactions on Pattern Analysis & Machine Intelligence, Venice, Italy."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Khoroshevsky, F., Khoroshevsky, S., and Bar-Hillel, A. (2021). 
Parts-per-Object Count in Agricultural Images: Solving Phenotyping Problems via a Single Deep Neural Network. Remote. Sens., 13.","DOI":"10.3390\/rs13132496"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"5529383","DOI":"10.1155\/2021\/5529383","article-title":"Research on Mount Wilson Magnetic Classification Based on Deep Learning","volume":"2021","author":"He","year":"2021","journal-title":"Adv. Astron."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Liu, Y., Yang, J., and Cui, W. (October, January 26). Simple, Fast, Accurate Object Detection based on Anchor-Free Method for High Resolution Remote Sensing Images. Proceedings of the IGARSS 2020\u20132020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA.","DOI":"10.1109\/IGARSS39084.2020.9324301"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Lin, Z., and Guo, W. (2021). Cotton Stand Counting from Unmanned Aerial System Imagery Using MobileNet and CenterNet Deep Learning Models. Remote Sens., 13.","DOI":"10.3390\/rs13142822"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1109\/LGRS.2020.2975086","article-title":"A Specially Optimized One-Stage Network for Object Detection in Remote Sensing Images","volume":"18","author":"Qin","year":"2021","journal-title":"IEEE Geosci. Remote. Sens. Lett."},{"key":"ref_37","unstructured":"Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. arXiv."},{"key":"ref_38","first-page":"2204","article-title":"Recurrent Models of Visual Attention","volume":"2","author":"Mnih","year":"2014","journal-title":"Adv. Neural Inf. Processing Syst."},{"key":"ref_39","unstructured":"Chorowski, J., Bahdanau, D., Serdyuk, D., Cho, K., and Bengio, Y. (2015). Attention-based models for speech recognition. 
arXiv."},{"key":"ref_40","first-page":"2017","article-title":"Spatial Transformer Network","volume":"28","author":"Max","year":"2015","journal-title":"Adv. Neural Inf. Processing Syst."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"2011","DOI":"10.1109\/TPAMI.2019.2913372","article-title":"Squeeze-and-Excitation Networks","volume":"42","author":"Hu","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J., and Kweon, I.S. (2018, January 8\u201314). CBAM: Convolutional Block Attention Module. Proceedings of the 2018 European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., and Hu, Q. (2020, January 14\u201319). ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01155"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., and Le, Q.V. (2019). MnasNet: Platform-Aware Neural Architecture Search for Mobile. arXiv.","DOI":"10.1109\/CVPR.2019.00293"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L. (2018, January 18\u201323). MobileNetV2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00474"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Luong, M., Pham, H., and Manning, C.D. (2015). Effective Approaches to Attention-based Neural Machine Translation. 
arXiv.","DOI":"10.18653\/v1\/D15-1166"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/3\/516\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:05:31Z","timestamp":1760133931000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/3\/516"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,21]]},"references-count":46,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2022,2]]}},"alternative-id":["rs14030516"],"URL":"https:\/\/doi.org\/10.3390\/rs14030516","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,21]]}}}