{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T16:13:53Z","timestamp":1776183233715,"version":"3.50.1"},"reference-count":23,"publisher":"MDPI AG","issue":"16","license":[{"start":{"date-parts":[[2024,8,7]],"date-time":"2024-08-07T00:00:00Z","timestamp":1722988800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Key Research and Development Program of China","award":["2022YFC3320802"],"award-info":[{"award-number":["2022YFC3320802"]}]},{"name":"National Key Research and Development Program of China","award":["2023YFB3905704"],"award-info":[{"award-number":["2023YFB3905704"]}]},{"name":"National Key Research and Development Program of China","award":["226Z5901G"],"award-info":[{"award-number":["226Z5901G"]}]},{"name":"Central Guiding Local Technology Development","award":["2022YFC3320802"],"award-info":[{"award-number":["2022YFC3320802"]}]},{"name":"Central Guiding Local Technology Development","award":["2023YFB3905704"],"award-info":[{"award-number":["2023YFB3905704"]}]},{"name":"Central Guiding Local Technology Development","award":["226Z5901G"],"award-info":[{"award-number":["226Z5901G"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Object detection via remote sensing encounters significant challenges due to factors such as small target sizes, uneven target distribution, and complex backgrounds. This paper introduces the K-CBST YOLO algorithm, which is designed to address these challenges. It features a novel architecture that integrates the Convolutional Block Attention Module (CBAM) and Swin-Transformer to enhance global semantic understanding of feature maps and maximize the utilization of contextual information. 
Such integration significantly improves the accuracy with which small targets are detected against complex backgrounds. Additionally, we propose an improved detection network that combines the improved K-Means algorithm with a smooth Non-Maximum Suppression (NMS) algorithm. This network employs an adaptive dynamic K-Means clustering algorithm to pinpoint target areas of concentration in remote sensing images that feature varied distributions and uses a smooth NMS algorithm to suppress the confidence of overlapping candidate boxes, thereby minimizing their interference with subsequent detection results. The enhanced algorithm substantially bolsters the model\u2019s robustness in handling multi-scale target distributions, preserves more potentially valid information, and diminishes the likelihood of missed detections. This study involved experiments performed on the publicly available DIOR remote sensing image dataset and the DOTA aerial image dataset. Our experimental results demonstrate that, compared with other advanced detection algorithms, K-CBST YOLO outperforms all its counterparts in handling both datasets. 
It achieved a 68.3% mean Average Precision (mAP) on the DIOR dataset and a 78.4% mAP on the DOTA dataset.<\/jats:p>","DOI":"10.3390\/rs16162885","type":"journal-article","created":{"date-parts":[[2024,8,8]],"date-time":"2024-08-08T07:01:25Z","timestamp":1723100485000},"page":"2885","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Enhancing Remote Sensing Object Detection with K-CBST YOLO: Integrating CBAM and Swin-Transformer"],"prefix":"10.3390","volume":"16","author":[{"given":"Aonan","family":"Cheng","sequence":"first","affiliation":[{"name":"National Engineering Research Center of Surveying and Mapping, China TopRS Technology Company Limited, Beijing 100039, China"},{"name":"Beijing Low-Altitude Remote Sensing Engineering Technology Research Center, Beijing 100039, China"}]},{"given":"Jincheng","family":"Xiao","sequence":"additional","affiliation":[{"name":"National Engineering Research Center of Surveying and Mapping, China TopRS Technology Company Limited, Beijing 100039, China"},{"name":"Beijing Low-Altitude Remote Sensing Engineering Technology Research Center, Beijing 100039, China"}]},{"given":"Yingcheng","family":"Li","sequence":"additional","affiliation":[{"name":"National Engineering Research Center of Surveying and Mapping, China TopRS Technology Company Limited, Beijing 100039, China"},{"name":"Beijing Low-Altitude Remote Sensing Engineering Technology Research Center, Beijing 100039, China"}]},{"given":"Yiming","family":"Sun","sequence":"additional","affiliation":[{"name":"National Engineering Research Center of Surveying and Mapping, China TopRS Technology Company Limited, Beijing 100039, China"},{"name":"Beijing Low-Altitude Remote Sensing Engineering Technology Research Center, Beijing 100039, China"}]},{"given":"Yafeng","family":"Ren","sequence":"additional","affiliation":[{"name":"National Engineering Research Center of Surveying and Mapping, China TopRS Technology 
Company Limited, Beijing 100039, China"},{"name":"Beijing Low-Altitude Remote Sensing Engineering Technology Research Center, Beijing 100039, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4108-3004","authenticated-orcid":false,"given":"Jianli","family":"Liu","sequence":"additional","affiliation":[{"name":"National Engineering Research Center of Surveying and Mapping, China TopRS Technology Company Limited, Beijing 100039, China"},{"name":"Beijing Low-Altitude Remote Sensing Engineering Technology Research Center, Beijing 100039, China"}]}],"member":"1968","published-online":{"date-parts":[[2024,8,7]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1007\/978-3-319-46448-0_2","article-title":"Ssd: Single shot multibox detector","volume":"Volume 14","author":"Liu","year":"2016","journal-title":"Proceedings of the Computer Vision\u2013ECCV 2016: 14th European Conference"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). 
Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster r-cnn: Towards real-time object detection with region proposal networks","volume":"39","author":"Ren","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23\u201328). End-to-end object detection with transformers. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., and Guo, B. (2021, January 11\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_9","first-page":"9355","article-title":"Twins: Revisiting the design of spatial attention in vision transformers","volume":"34","author":"Chu","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_10","unstructured":"Bochkovskiy, A., Wang, C.Y., and Liao, H.Y.M. (2020). Yolov4: Optimal speed and accuracy of object detection. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Ding, J., Xue, N., Long, Y., Xia, G.S., and Lu, Q. (2019, January 15\u201320). Learning RoI transformer for oriented object detection in aerial images. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00296"},{"key":"ref_12","unstructured":"Yang, X., Yang, J., Yan, J., Zhang, Y., Zhang, T., Guo, Z., and Fu, K. (November, January 27). 
Scrdet: Towards more robust detection for small, cluttered and rotated objects. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TGRS.2024.3432878","article-title":"Multistage enhancement network for tiny object detection in remote sensing images","volume":"62","author":"Zhang","year":"2024","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_14","first-page":"1","article-title":"Domain adaptation with contrastive learning for object detection in satellite imagery","volume":"62","author":"Biswas","year":"2024","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TGRS.2024.3496660","article-title":"Relation Learning Reasoning Meets Tiny Object Tracking in Satellite Videos","volume":"62","author":"Yang","year":"2024","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 8\u201314). Cbam: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_17","first-page":"38192","article-title":"A convnet for non-maximum suppression","volume":"Volume 38","author":"Hosang","year":"2016","journal-title":"Proceedings of the Pattern Recognition: 38th German Conference, GCPR 2016"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1016\/j.isprsjprs.2019.11.023","article-title":"Object detection in optical remote sensing images: A survey and a new benchmark","volume":"159","author":"Li","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Xia, G.S., Bai, X., Ding, J., Zhu, Z., Belongie, S., Luo, J., and Zhang, L. (2018, January 18\u201323). 
DOTA: A large-scale dataset for object detection in aerial images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00418"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Yang, X., Sun, H., Fu, K., Yang, J., Sun, X., Yan, M., and Guo, Z. (2018). Automatic ship detection in remote sensing images from google earth of complex scenes based on multiscale rotation dense feature pyramid networks. Remote Sens., 10.","DOI":"10.3390\/rs10010132"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TGRS.2021.3095186","article-title":"CFC-Net: A critical feature capturing network for arbitrary-oriented object detection in remote-sensing images","volume":"60","author":"Ming","year":"2021","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_22","first-page":"1","article-title":"On improving bounding box representations for oriented object detection","volume":"61","author":"Yao","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Azimi, S.M., Vig, E., Bahmanyar, R., K\u00f6rner, M., and Reinartz, P. (2018, January 2\u20136). Towards multi-class object detection in unconstrained remote sensing imagery. 
Proceedings of the Asian Conference on Computer Vision, Perth, Australia.","DOI":"10.1007\/978-3-030-20893-6_10"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/16\/2885\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:31:40Z","timestamp":1760110300000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/16\/16\/2885"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,7]]},"references-count":23,"journal-issue":{"issue":"16","published-online":{"date-parts":[[2024,8]]}},"alternative-id":["rs16162885"],"URL":"https:\/\/doi.org\/10.3390\/rs16162885","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,7]]}}}