{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,8]],"date-time":"2026-06-08T11:19:06Z","timestamp":1780917546406,"version":"3.54.1"},"reference-count":48,"publisher":"MDPI AG","issue":"15","license":[{"start":{"date-parts":[[2023,7,30]],"date-time":"2023-07-30T00:00:00Z","timestamp":1690675200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China Project","award":["62262064"],"award-info":[{"award-number":["62262064"]}]},{"name":"National Natural Science Foundation of China Project","award":["62266043"],"award-info":[{"award-number":["62266043"]}]},{"name":"National Natural Science Foundation of China Project","award":["61966035"],"award-info":[{"award-number":["61966035"]}]},{"name":"National Natural Science Foundation of China Project","award":["XJEDU2016S106"],"award-info":[{"award-number":["XJEDU2016S106"]}]},{"name":"National Natural Science Foundation of China Project","award":["2022D01C56"],"award-info":[{"award-number":["2022D01C56"]}]},{"name":"Key R&D projects in Xinjiang Uygur Autonomous Region","award":["62262064"],"award-info":[{"award-number":["62262064"]}]},{"name":"Key R&D projects in Xinjiang Uygur Autonomous Region","award":["62266043"],"award-info":[{"award-number":["62266043"]}]},{"name":"Key R&D projects in Xinjiang Uygur Autonomous Region","award":["61966035"],"award-info":[{"award-number":["61966035"]}]},{"name":"Key R&D projects in Xinjiang Uygur Autonomous Region","award":["XJEDU2016S106"],"award-info":[{"award-number":["XJEDU2016S106"]}]},{"name":"Key R&D projects in Xinjiang Uygur Autonomous Region","award":["2022D01C56"],"award-info":[{"award-number":["2022D01C56"]}]},{"name":"Natural Science Foundation of Xinjiang Uygur Autonomous Region of China","award":["62262064"],"award-info":[{"award-number":["62262064"]}]},{"name":"Natural Science 
Foundation of Xinjiang Uygur Autonomous Region of China","award":["62266043"],"award-info":[{"award-number":["62266043"]}]},{"name":"Natural Science Foundation of Xinjiang Uygur Autonomous Region of China","award":["61966035"],"award-info":[{"award-number":["61966035"]}]},{"name":"Natural Science Foundation of Xinjiang Uygur Autonomous Region of China","award":["XJEDU2016S106"],"award-info":[{"award-number":["XJEDU2016S106"]}]},{"name":"Natural Science Foundation of Xinjiang Uygur Autonomous Region of China","award":["2022D01C56"],"award-info":[{"award-number":["2022D01C56"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Remote sensing image object detection holds significant research value in resources and the environment. Nevertheless, complex background information and considerable size differences between objects in remote sensing images make it challenging. This paper proposes an efficient remote sensing image object detection model (MSA-YOLO) to improve detection performance. First, we propose a Multi-Scale Strip Convolution Attention Mechanism (MSCAM), which can reduce the introduction of background noise and fuse multi-scale features to enhance the focus of the model on foreground objects of various sizes. Second, we introduce the lightweight convolution module GSConv and propose an improved feature fusion layer, which makes the model more lightweight while improving detection accuracy. Finally, we propose the Wise-Focal CIoU loss function, which can reweight different samples to balance the contribution of different samples to the loss function, thereby improving the regression effect. 
Experimental results show that on the remote sensing image public datasets DIOR and HRRSD, the performance of our proposed MSA-YOLO model is significantly better than other existing methods.<\/jats:p>","DOI":"10.3390\/s23156811","type":"journal-article","created":{"date-parts":[[2023,7,31]],"date-time":"2023-07-31T03:30:02Z","timestamp":1690774202000},"page":"6811","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":44,"title":["MSA-YOLO: A Remote Sensing Object Detection Model Based on Multi-Scale Strip Attention"],"prefix":"10.3390","volume":"23","author":[{"given":"Zihang","family":"Su","sequence":"first","affiliation":[{"name":"School of Software, Xinjiang University, Urumqi 830091, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jiong","family":"Yu","sequence":"additional","affiliation":[{"name":"School of Software, Xinjiang University, Urumqi 830091, China"},{"name":"College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Haotian","family":"Tan","sequence":"additional","affiliation":[{"name":"College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xueqiang","family":"Wan","sequence":"additional","affiliation":[{"name":"School of Software, Xinjiang University, Urumqi 830091, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kaiyang","family":"Qi","sequence":"additional","affiliation":[{"name":"School of Software, Xinjiang University, Urumqi 830091, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2023,7,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Cheng, G., and Han, J. (2016). A Survey on Object Detection in Optical Remote Sensing Images. 
arXiv.","DOI":"10.1016\/j.isprsjprs.2016.03.014"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1016\/j.isprsjprs.2019.11.023","article-title":"Object detection in optical remote sensing images: A survey and a new benchmark","volume":"159","author":"Li","year":"2020","journal-title":"Isprs J. Photogramm. Remote Sens."},{"key":"ref_3","first-page":"91","article-title":"Faster r-cnn: Towards real-time object detection with region proposal networks","volume":"28","author":"Ren","year":"2015","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Cai, Z., and Vasconcelos, N. (2017, January 18\u201322). Cascade R-CNN: Delving Into High Quality Object Detection. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00644"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"386","DOI":"10.1109\/TPAMI.2018.2844175","article-title":"Mask R-CNN","volume":"42","author":"He","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"318","DOI":"10.1109\/TPAMI.2018.2858826","article-title":"Focal Loss for Dense Object Detection","volume":"42","author":"Lin","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.E., Fu, C.Y., and Berg, A.C. (2015, January 7\u201313). SSD: Single Shot MultiBox Detector. Proceedings of the European Conference on Computer Vision, Santiago, Chile.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S.K., Girshick, R.B., and Farhadi, A. (July, January 26). You Only Look Once: Unified, Real-Time Object Detection. 
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_9","unstructured":"Farhadi, A., and Redmon, J. (1997, January 17\u201319). Yolov3: An incremental improvement. Proceedings of the Computer Vision and Pattern Recognition, San Juan, PR, USA."},{"key":"ref_10","unstructured":"Bochkovskiy, A., Wang, C.Y., and Liao, H.Y.M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv."},{"key":"ref_11","unstructured":"Jocher, G., Stoken, G., Borovec, A., Chaurasia, J., Changyu, A., Hogan, L., Hajek, A., Diaconu, J., Kwon, L., and Defretin, Y. (2021). Ultralytics\/yolov5: V5.0\u2014YOLOv5-P6 1280 Models, AWS, Supervise.ly and YouTube Integrations. Zenodo."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wang, C.Y., Bochkovskiy, A., and Liao, H.Y.M. (2022). YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv.","DOI":"10.1109\/UV56588.2022.10185474"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Tian, Z., Shen, C., Chen, H., and He, T. (November, January 27). FCOS: Fully Convolutional One-Stage Object Detection. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00972"},{"key":"ref_14","unstructured":"Ge, Z., Liu, S., Wang, F., Li, Z., and Sun, J. (2021). YOLOX: Exceeding YOLO Series in 2021. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020). End-to-End Object Detection with Transformers. arXiv.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_16","unstructured":"Zhu, X., Su, W., Lu, L., Li, B., Wang, X., and Dai, J. (2020). Deformable DETR: Deformable Transformers for End-to-End Object Detection. 
arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"SushmaLeela, T., Chandrakanth, R., Saibaba, J., Varadan, G., and Mohan, S.S. (2013, January 18\u201321). Mean-shift based object detection and clustering from high resolution remote sensing imagery. Proceedings of the 2013 Fourth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), Jodhpur, India.","DOI":"10.1109\/NCVPRIPG.2013.6776271"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1300","DOI":"10.1109\/LGRS.2016.2582528","article-title":"Remote Sensing Optical Image Registration Using Modified Uniform Robust SIFT","volume":"13","author":"Paul","year":"2016","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Wang, Y., Xu, C., Liu, C., and Li, Z. (2022). Context Information Refinement for Few-Shot Object Detection in Remote Sensing Images. Remote Sens., 14.","DOI":"10.3390\/rs14143255"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Niu, R., Zhi, X., Jiang, S., Gong, J., Zhang, W., and Yu, L. (2023). Aircraft Target Detection in Low Signal-to-Noise Ratio Visible Remote Sensing Images. Remote Sens., 15.","DOI":"10.3390\/rs15081971"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"777","DOI":"10.1080\/13658816.2019.1624761","article-title":"A locally-constrained YOLO framework for detecting small and densely-distributed building footprints","volume":"34","author":"Xie","year":"2020","journal-title":"Int. J. Geogr. Inf. Sci."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhu, X., Lyu, S., Wang, X., and Zhao, Q. (2021, January 11\u201317). TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios. 
Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, QC, Canada.","DOI":"10.1109\/ICCVW54120.2021.00312"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Sun, Y., Liu, W., Gao, Y., Hou, X., and Bi, F. (2022). A Dense Feature Pyramid Network for Remote Sensing Object Detection. Appl. Sci., 12.","DOI":"10.3390\/app12104997"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Wan, X., Yu, J., Tan, H., and Wang, J. (2022). LAG: Layered Objects to Generate Better Anchors for Object Detection in Aerial Images. Sensors, 22.","DOI":"10.3390\/s22103891"},{"key":"ref_25","first-page":"1","article-title":"Multiscale Deformable Attention and Multilevel Features Aggregation for Remote Sensing Object Detection","volume":"19","author":"Dong","year":"2022","journal-title":"IEEE Geosci. Remote Sens. Lett."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TGRS.2020.3040273","article-title":"A new spatial-oriented object detection framework for remote sensing images","volume":"60","author":"Yu","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"4307","DOI":"10.1109\/TGRS.2020.3010051","article-title":"Learning Center Probability Map for Detecting Objects in Aerial Images","volume":"59","author":"Wang","year":"2020","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"2011","DOI":"10.1109\/TPAMI.2019.2913372","article-title":"Squeeze-and-Excitation Networks","volume":"42","author":"Hu","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Wang, Q., Wu, B., Zhu, P.F., Li, P., Zuo, W., and Hu, Q. (2019, January 13\u201319). ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. 
Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01155"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 8\u201314). Cbam: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Hou, Q., Zhou, D., and Feng, J. (2021, January 19\u201325). Coordinate Attention for Efficient Mobile Network Design. Proceedings of the 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Online.","DOI":"10.1109\/CVPR46437.2021.01350"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Misra, D., Nalamada, T., Arasanipalai, A.U., and Hou, Q. (2020, January 3\u20138). Rotate to Attend: Convolutional Triplet Attention Module. Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA.","DOI":"10.1109\/WACV48630.2021.00318"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R.B., Gupta, A.K., and He, K. (2017, January 18\u201322). Non-local Neural Networks. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Liu, J., Hou, Q., Cheng, M.M., Wang, C., and Feng, J. (2020, January 13\u201319). Improving Convolutional Networks With Self-Calibrated Convolutions. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01011"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Zhu, L., Wang, X., Ke, Z., Zhang, W., and Lau, R.W.H. (2023). BiFormer: Vision Transformer with Bi-Level Routing Attention. 
arXiv.","DOI":"10.1109\/CVPR52729.2023.00995"},{"key":"ref_36","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Zhang, X., Zhou, X., Lin, M., and Sun, J. (2017, January 18\u201322). ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. Proceedings of the 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00716"},{"key":"ref_38","unstructured":"Tan, M., and Le, Q.V. (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv."},{"key":"ref_39","unstructured":"Li, H., Li, J., Wei, H., Liu, Z., Zhan, Z., and Ren, Q. (2022). Slim-neck by GSConv: A better design paradigm of detector architectures for autonomous vehicles. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R.B., He, K., Hariharan, B., and Belongie, S.J. (2017, January 21\u201326). Feature Pyramid Networks for Object Detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Cao, Y., Chen, K., Loy, C.C., and Lin, D. (2019, January 13\u201319). Prime Sample Attention in Object Detection. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01160"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Pang, J., Chen, K., Shi, J., Feng, H., Ouyang, W., and Lin, D. (2019, January 16\u201320). Libra R-CNN: Towards Balanced Learning for Object Detection. 
Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00091"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Zhang, H., Chang, H., Ma, B., Wang, N., and Chen, X. (2020, January 23\u201328). Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training. Proceedings of the European Conference on Computer Vision, Online.","DOI":"10.1007\/978-3-030-58555-6_16"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Girshick, R.B. (2015, January 7\u201313). Fast R-CNN. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1016\/j.neucom.2022.07.042","article-title":"Focal and efficient IOU loss for accurate bounding box regression","volume":"506","author":"Zhang","year":"2022","journal-title":"Neurocomputing"},{"key":"ref_46","unstructured":"Jaderberg, M., Simonyan, K., Zisserman, A., and Kavukcuoglu, K. (2015, January 7\u201310). Spatial Transformer Networks. Proceedings of the NIPS, Montreal, QC, Canada."},{"key":"ref_47","unstructured":"Zhang, Y., Yuan, Y., Feng, Y., and Lu, X. (August, January 28). Hierarchical and robust convolutional neural network for very high-resolution remote sensing object detection. Proceedings of the IEEE Transactions on Geoscience and Remote Sensing, Yokohama, Japan."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the Computer Vision\u2013ECCV 2014: 13th European Conference, Zurich, Switzerland. 
Part V 13.","DOI":"10.1007\/978-3-319-10602-1_48"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/15\/6811\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:22:48Z","timestamp":1760127768000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/15\/6811"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,30]]},"references-count":48,"journal-issue":{"issue":"15","published-online":{"date-parts":[[2023,8]]}},"alternative-id":["s23156811"],"URL":"https:\/\/doi.org\/10.3390\/s23156811","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,7,30]]}}}