{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,10]],"date-time":"2026-05-10T15:11:47Z","timestamp":1778425907661,"version":"3.51.4"},"reference-count":44,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,6,2]],"date-time":"2023-06-02T00:00:00Z","timestamp":1685664000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Neurorobot."],"abstract":"<jats:p>A synthetic aperture radar (SAR) image is crucial for ship detection in computer vision. Due to the background clutter, pose variations, and scale changes, it is a challenge to construct a SAR ship detection model with low false-alarm rates and high accuracy. Therefore, this paper proposes a novel SAR ship detection model called ST-YOLOA. First, the Swin Transformer network architecture and coordinate attention (CA) model are embedded in the STCNet backbone network to enhance the feature extraction performance and capture global information. Second, we used the PANet path aggregation network with a residual structure to construct the feature pyramid to increase global feature extraction capability. Next, to cope with the local interference and semantic information loss problems, a novel up\/down-sampling method is proposed. Finally, the decoupled detection head is used to achieve the predicted output of the target position and the boundary box to improve convergence speed and detection accuracy. To demonstrate the efficiency of the proposed method, we have constructed three SAR ship detection datasets: a norm test set (NTS), a complex test set (CTS), and a merged test set (MTS). The experimental results show that our ST-YOLOA achieved an accuracy of 97.37%, 75.69%, and 88.50% on the three datasets, respectively, superior to the effects of other state-of-the-art methods. Our ST-YOLOA performs favorably in complex scenarios, and the accuracy is 4.83% higher than YOLOX on the CTS. Moreover, ST-YOLOA achieves real-time detection with a speed of 21.4 FPS.<\/jats:p>","DOI":"10.3389\/fnbot.2023.1170163","type":"journal-article","created":{"date-parts":[[2023,6,2]],"date-time":"2023-06-02T14:11:28Z","timestamp":1685715088000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":27,"title":["ST-YOLOA: a Swin-transformer-based YOLO model with an attention mechanism for SAR ship detection under complex background"],"prefix":"10.3389","volume":"17","author":[{"given":"Kai","family":"Zhao","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ruitao","family":"Lu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Siyu","family":"Wang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaogang","family":"Yang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qingge","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiwei","family":"Fan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2023,6,2]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"8333","DOI":"10.1109\/TGRS.2019.2920534","article-title":"DRBox-v2: an improved detector with rotatable boxes for target detection in SAR images","volume":"57","author":"An","year":"2019","journal-title":"IEEE Trans. Geosci. Remote Sensing"},{"key":"B2","doi-asserted-by":"publisher","first-page":"10934","DOI":"10.48550\/arXiv.2004.10934","article-title":"Yolov4: Optimal speed and accuracy of object detection","volume":"2004","author":"Bochkovskiy","year":"2020","journal-title":"arXiv"},{"key":"B3","first-page":"108","article-title":"Digital processing of synthetic aperture radar data","volume":"1","author":"Cumming","year":"2005","journal-title":"Artech House"},{"key":"B4","doi-asserted-by":"publisher","first-page":"6569","DOI":"10.1109\/ICCV.2019.00667","article-title":"\u201cCenternet: Keypoint triplets for object detection\u201d","author":"Duan","year":"2019","journal-title":"Proceedings IEEE\/CVF"},{"key":"B5","doi-asserted-by":"publisher","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The pascal visual object classes (voc) challenge","volume":"88","author":"Everingham","year":"2009","journal-title":"Int. J. Computer Vision"},{"key":"B6","doi-asserted-by":"publisher","first-page":"5394","DOI":"10.1109\/TGRS.2018.2815592","article-title":"Adaptive ship detection in hybrid-polarimetric SAR images based on the power\u2013entropy decomposition","volume":"56","author":"Gao","year":"2018","journal-title":"IEEE Trans. Geosci. Remote Sensing"},{"key":"B7","doi-asserted-by":"publisher","first-page":"290","DOI":"10.1117\/12.2641456","article-title":"Enhanced attention one shot SAR ship detection algorithm based on cluster analysis and transformer","author":"Gao","year":"2022","journal-title":"Second International Conference on Digital Signal and Computer Communications (DSCC 2022): SPIE"},{"key":"B8","first-page":"08430","article-title":"Yolox: Exceeding yolo series in 2021","volume":"2107","author":"Ge","year":"2021","journal-title":"arXiv preprint arXiv:"},{"key":"B9","doi-asserted-by":"publisher","first-page":"1440","DOI":"10.1109\/ICCV.2015.169","article-title":"\u201cFast r-cnn\u201d","author":"Girshick","year":"2015","journal-title":"Proceedings of the IEEE International Conference on Computer Vision"},{"key":"B10","doi-asserted-by":"publisher","first-page":"580","DOI":"10.1109\/CVPR.2014.81","article-title":"\u201cRich feature hierarchies for accurate object detection and semantic segmentation\u201d","author":"Girshick","year":"2014","journal-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition"},{"key":"B11","doi-asserted-by":"publisher","first-page":"331","DOI":"10.1007\/s41095-022-0271-y","article-title":"Attention mechanisms in computer vision: a survey","volume":"8","author":"Guo","year":"2022","journal-title":"Comput. Visual Media"},{"key":"B12","doi-asserted-by":"publisher","first-page":"3127","DOI":"10.11834\/jig.210373","article-title":"Ship detection in SAR images based on adaptive weight pyramid and branch strong correlation","volume":"27","author":"Guo","year":"2022","journal-title":"J. Image Graphics"},{"key":"B13","first-page":"13713","article-title":"\u201cCoordinate attention for efficient mobile network design\u201d","author":"Hou","year":"2021","journal-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition"},{"key":"B14","first-page":"7132","article-title":"\u201cSqueeze-and-excitation networks\u201d","author":"Hu","year":"2018","journal-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition"},{"key":"B15","doi-asserted-by":"publisher","first-page":"502","DOI":"10.3788\/IRLA20210106","article-title":"Infrared dim and small target detection based on YOLO-IDSTD algorithm","volume":"51","author":"Jiang","year":"2022","journal-title":"Infrared Laser Eng."},{"key":"B16","unstructured":"JocherG.\n          YOLOv52020"},{"key":"B17","doi-asserted-by":"publisher","first-page":"3186","DOI":"10.3390\/rs14133186","article-title":"Ship detection in SAR images based on feature enhancement Swin transformer and adjacent feature fusion","volume":"14","author":"Li","year":"2022","journal-title":"Remote Sensing"},{"key":"B18","doi-asserted-by":"publisher","first-page":"2117","DOI":"10.1109\/CVPR.2017.106","article-title":"\u201cFeature pyramid networks for object detection\u201d","author":"Lin","year":"","journal-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition"},{"key":"B19","doi-asserted-by":"publisher","first-page":"2999","DOI":"10.1109\/ICCV.2017.324","article-title":"Focal loss for dense object detection","volume":"8","author":"Lin","year":"","journal-title":"IEEE Trans. Pattern Anal. Mach. Int."},{"key":"B20","doi-asserted-by":"publisher","first-page":"6974","DOI":"10.3390\/s22186974","article-title":"A domestic trash detection model based on improved YOLOX","volume":"22","author":"Liu","year":"2022","journal-title":"Sensors"},{"key":"B21","first-page":"8759","article-title":"\u201cPath aggregation network for instance segmentation\u201d","author":"Liu","year":"2018","journal-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition"},{"key":"B22","first-page":"21","article-title":"\u201cSsd: Single shot multibox detector\u201d, in Computer Vision\u2013ECCV 2016, 14th","volume-title":"European Conference","author":"Liu","year":"2016"},{"key":"B23","doi-asserted-by":"publisher","first-page":"10012","DOI":"10.1109\/ICCV48922.2021.00986","article-title":"\u201cSwin transformer: Hierarchical vision transformer using shifted windows\u201d","author":"Liu","year":"2021","journal-title":"Proceedings of the IEEE\/CVF International Conference Computer Vision"},{"key":"B24","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/LGRS.2020.3038784","article-title":"Infrared small target detection based on local hypergraph dissimilarity measure","volume":"19","author":"Lu","year":"","journal-title":"IEEE Geoscience Remote Sens Lett"},{"key":"B25","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/LGRS.2020.3026546","article-title":"Robust infrared small target detection via multidirectional derivative-based weighted contrast measure","volume":"19","author":"Lu","year":"","journal-title":"IEEE Geosci. Remote Sensing Letters"},{"key":"B26","doi-asserted-by":"publisher","first-page":"6","DOI":"10.1109\/MGRS.2013.2248301","article-title":"A tutorial on synthetic aperture radar","volume":"1","author":"Moreira","year":"2013","journal-title":"IEEE Geosci. Remote Sens. Mag."},{"key":"B27","doi-asserted-by":"publisher","first-page":"779","DOI":"10.1109\/CVPR.2016.91","article-title":"\u201cYou only look once: Unified, real-time object detection\u201d","author":"Redmon","year":"2016","journal-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition"},{"key":"B28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster r-cnn: Towards real-time object detection with region proposal networks","volume":"28","author":"Ren","year":"2015","journal-title":"Adv. Neural Inf. Proc. Syst."},{"key":"B29","doi-asserted-by":"publisher","first-page":"208","DOI":"10.1109\/7.135446","article-title":"A CFAR adaptive matched filter detector","volume":"28","author":"Robey","year":"1992","journal-title":"IEEE Trans. Aerospace Electr. Syst."},{"key":"B30","doi-asserted-by":"publisher","first-page":"3329","DOI":"10.1109\/JSTARS.2015.2417756","article-title":"Manifold adaptation for constant false alarm rate ship detection in South African oceans","volume":"8","author":"Schwegmann","year":"2015","journal-title":"IEEE J. Selected Topics Appl. Remote Sensing"},{"key":"B31","article-title":"\u201cEfficientDet: Scalable and Efficient Object Detection\u201d","author":"Tan","year":"2020","journal-title":"2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition"},{"key":"B32","first-page":"9627","article-title":"\u201cFcos: Fully convolutional one-stage object detection\u201d","author":"Tian","year":"2019","journal-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision"},{"key":"B33","first-page":"2","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Proc. Syst."},{"key":"B34","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1109\/LGRS.2017.2654450","article-title":"An intensity-space domain CFAR method for ship detection in HR SAR images","volume":"14","author":"Wang","year":"2017","journal-title":"IEEE Geosci. Remote Sensing Letters"},{"key":"B35","first-page":"390","article-title":"\u201cCSPNet: A new backbone that can enhance learning capability of CNN\u201d","author":"Wang","year":"2020","journal-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops"},{"key":"B36","doi-asserted-by":"publisher","first-page":"02696","DOI":"10.48550\/arXiv.2207.02696","article-title":"YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors","volume":"2207","author":"Wang","year":"2022","journal-title":"arXiv"},{"key":"B37","doi-asserted-by":"publisher","first-page":"492","DOI":"10.3390\/horticulturae7110492","article-title":"SwinGD: A robust grape bunch detection model based on swin transformer in complex vineyard environment","volume":"7","author":"Wang","year":"2021","journal-title":"Horticulturae"},{"key":"B38","doi-asserted-by":"publisher","first-page":"780","DOI":"10.1080\/2150704X.2018.1475770","article-title":"Combining a single shot multibox detector with transfer learning for ship detection using sentinel-1 SAR images","volume":"9","author":"Wang","year":"2018","journal-title":"Remote Sensing Lett."},{"key":"B39","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3231744","author":"Wu","year":"2022","journal-title":"Selecting High-Quality Proposals for Weakly Supervised Object Detection With Bottom-Up Aggregated Attention and Phase-Aware Loss. IEEE Transactions on Image Processing"},{"key":"B40","doi-asserted-by":"publisher","first-page":"1488","DOI":"10.3390\/rs14061488","article-title":"CRTransSar: a visual transformer based on contextual joint representation learning for SAR ship detection","volume":"14","author":"Xia","year":"2022","journal-title":"Remote Sensing"},{"key":"B41","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/LGRS.2021.3122190","article-title":"OLCN: An optimized low coupling network for small objects detection","volume":"19","author":"Yuan","year":"2021","journal-title":"IEEE Geosci. Remote Sensing Letters"},{"key":"B42","doi-asserted-by":"publisher","first-page":"09412","DOI":"10.48550\/arXiv.1710.09412","article-title":"mixup: Beyond empirical risk minimization","volume":"1710","author":"Zhang","year":"2017","journal-title":"Remote Sensing"},{"key":"B43","first-page":"9759","article-title":"\u201cBridging the gap between anchor-based and anchor-free detection via adaptive training sample selection\u201d","author":"Zhang","year":"2020","journal-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition"},{"key":"B44","doi-asserted-by":"publisher","first-page":"146","DOI":"10.1016\/j.neucom.2022.07.042","article-title":"Focal and efficient IOU loss for accurate bounding box regression","volume":"506","author":"Zhang","year":"2022","journal-title":"Neurocomputing"}],"container-title":["Frontiers in Neurorobotics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2023.1170163\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,2]],"date-time":"2023-06-02T14:11:51Z","timestamp":1685715111000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2023.1170163\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,2]]},"references-count":44,"alternative-id":["10.3389\/fnbot.2023.1170163"],"URL":"https:\/\/doi.org\/10.3389\/fnbot.2023.1170163","relation":{},"ISSN":["1662-5218"],"issn-type":[{"value":"1662-5218","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,2]]},"article-number":"1170163"}}