{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T03:09:22Z","timestamp":1780369762718,"version":"3.54.1"},"reference-count":42,"publisher":"MDPI AG","issue":"22","license":[{"start":{"date-parts":[[2022,11,17]],"date-time":"2022-11-17T00:00:00Z","timestamp":1668643200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Science and Technology Project of Shaanxi Province Yinhan Jiwei Engineering Construction Co., Ltd.","award":["SPS-D-15"],"award-info":[{"award-number":["SPS-D-15"]}]},{"name":"Science and Technology Project of Shaanxi Province Yinhan Jiwei Engineering Construction Co., Ltd.","award":["2020slkj-5"],"award-info":[{"award-number":["2020slkj-5"]}]},{"name":"Science and Technology Project of Shaanxi Province Yinhan Jiwei Engineering Construction Co., Ltd.","award":["ZD2020E005"],"award-info":[{"award-number":["ZD2020E005"]}]},{"name":"Science and Technology Project of Shaanxi Province Yinhan Jiwei Engineering Construction Co., Ltd.","award":["JCKYS2022604SSJS002"],"award-info":[{"award-number":["JCKYS2022604SSJS002"]}]},{"name":"Shaanxi Provincial Water Conservancy Science and Technology Program","award":["SPS-D-15"],"award-info":[{"award-number":["SPS-D-15"]}]},{"name":"Shaanxi Provincial Water Conservancy Science and Technology Program","award":["2020slkj-5"],"award-info":[{"award-number":["2020slkj-5"]}]},{"name":"Shaanxi Provincial Water Conservancy Science and Technology Program","award":["ZD2020E005"],"award-info":[{"award-number":["ZD2020E005"]}]},{"name":"Shaanxi Provincial Water Conservancy Science and Technology Program","award":["JCKYS2022604SSJS002"],"award-info":[{"award-number":["JCKYS2022604SSJS002"]}]},{"name":"Heilongjiang Provincial Natural Science Foundation","award":["SPS-D-15"],"award-info":[{"award-number":["SPS-D-15"]}]},{"name":"Heilongjiang Provincial Natural Science 
Foundation","award":["2020slkj-5"],"award-info":[{"award-number":["2020slkj-5"]}]},{"name":"Heilongjiang Provincial Natural Science Foundation","award":["ZD2020E005"],"award-info":[{"award-number":["ZD2020E005"]}]},{"name":"Heilongjiang Provincial Natural Science Foundation","award":["JCKYS2022604SSJS002"],"award-info":[{"award-number":["JCKYS2022604SSJS002"]}]},{"name":"Acoustics Science and Technology Laboratory","award":["SPS-D-15"],"award-info":[{"award-number":["SPS-D-15"]}]},{"name":"Acoustics Science and Technology Laboratory","award":["2020slkj-5"],"award-info":[{"award-number":["2020slkj-5"]}]},{"name":"Acoustics Science and Technology Laboratory","award":["ZD2020E005"],"award-info":[{"award-number":["ZD2020E005"]}]},{"name":"Acoustics Science and Technology Laboratory","award":["JCKYS2022604SSJS002"],"award-info":[{"award-number":["JCKYS2022604SSJS002"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Sonar imagery is the main way for underwater vehicles to obtain environmental information. Target detection in sonar images distinguishes multi-class targets in real time and locates them accurately, providing perception information for the decision-making system of underwater vehicles. However, sonar image target detection faces many challenges, such as the variety of sonar types, complex and severe noise interference in images, and scarce datasets. This paper proposes a sonar image target detection method based on the Dual Path Vision Transformer Network (DP-ViT) to accurately detect targets in forward-look sonar and side-scan sonar. 
DP-ViT increases the receptive field by adding multi-scale to the patch embedding, enhances the feature-extraction learning ability of the model by using a Dual Path Transformer Block, introduces Conv-Attention to reduce the number of model training parameters, and finally uses Generalized Focal Loss to address the imbalance between positive and negative samples. The experimental results show that this sonar target detection method outperforms other mainstream methods on both the forward-look sonar dataset and the side-scan sonar dataset, and that it maintains good performance when noise is added.<\/jats:p>","DOI":"10.3390\/rs14225807","type":"journal-article","created":{"date-parts":[[2022,11,18]],"date-time":"2022-11-18T04:08:40Z","timestamp":1668744520000},"page":"5807","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["DP-ViT: A Dual-Path Vision Transformer for Real-Time Sonar Target Detection"],"prefix":"10.3390","volume":"14","author":[{"given":"Yushan","family":"Sun","sequence":"first","affiliation":[{"name":"Science and Technology on Underwater Vehicle Laboratory, Harbin Engineering University, Harbin 150001, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Haotian","family":"Zheng","sequence":"additional","affiliation":[{"name":"Science and Technology on Underwater Vehicle Laboratory, Harbin Engineering University, Harbin 150001, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8604-5554","authenticated-orcid":false,"given":"Guocheng","family":"Zhang","sequence":"additional","affiliation":[{"name":"Science and Technology on Underwater Vehicle Laboratory, Harbin Engineering University, Harbin 150001, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jingfei","family":"Ren","sequence":"additional","affiliation":[{"name":"College of Intelligent Systems Science and Engineering, Harbin 
Engineering University, Harbin 150001, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8149-4609","authenticated-orcid":false,"given":"Hao","family":"Xu","sequence":"additional","affiliation":[{"name":"Marine Design and Research Institute of China, Shanghai 200011, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0174-5122","authenticated-orcid":false,"given":"Chao","family":"Xu","sequence":"additional","affiliation":[{"name":"College of Underwater Acoustic Engineering, Harbin Engineering University, Harbin 150001, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2022,11,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"103128","DOI":"10.1016\/j.apor.2022.103128","article-title":"Submarine pipeline tracking technology based on AUVs with forward looking sonar","volume":"122","author":"Zhang","year":"2022","journal-title":"Appl. Ocean Res."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1007\/s11804-022-00276-9","article-title":"A Review of Current Research and Advances in Unmanned Surface Vehicles","volume":"21","author":"Bai","year":"2022","journal-title":"J. Mar. Sci. Appl."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1109\/JOE.2021.3103269","article-title":"Automatic Target Recognition for Mine Countermeasure Missions Using Forward-Looking Sonar Data","volume":"47","author":"Palomeras","year":"2021","journal-title":"IEEE J. Ocean Eng."},{"key":"ref_4","unstructured":"Tang, Y., Jin, S., Xiao, F., Bian, G., and Zhang, Y. (2020, January 23\u201325). Recognition of Side-scan Sonar Shipwreck Image Using Convolutional Neural Network. 
Proceedings of the 2020 2nd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), Taiyuan, China."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Grz\u0105dziel, A. (2022). Application of Remote Sensing Techniques to Identification of Underwater Airplane Wreck in Shallow Water Environment: Case Study of the Baltic Sea, Poland. Remote Sens., 14.","DOI":"10.3390\/rs14205195"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition","volume":"37","author":"He","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_7","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2015, January 7\u201312). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Dollar, P., Girshick, R., and He, K. (2017, January 21\u201326). Feature Pyramid Networks for Object Detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2017, January 22\u201329). Mask r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Ma, L., Zhao, D., Li, S., and Yu, D. (2020, January 27\u201329). End-to-End Denoising of Dark Burst Images using Recurrent Fully Convolutional Networks. 
Proceedings of the 15th International Conference on Computer Vision Theory and Applications, Valletta, Malta.","DOI":"10.5220\/0008895901890196"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 11\u201314). Ssd: Single shot multibox detector. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Law, H., and Deng, J. (2018, January 8\u201314). Cornernet: Detecting objects as paired keypoints. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01264-9_45"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Wang, C.Y., Bochkovskiy, A., and Liao, H.Y.M. (2022). YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv.","DOI":"10.1109\/CVPR52729.2023.00721"},{"key":"ref_14","unstructured":"Radford, A., Narasimhan, K., Salimans, T., and Sutskever, I. (2018). Improving Language Understanding with Unsupervised Learning, Open AI."},{"key":"ref_15","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., and Unterthiner, T. (2021, January 3\u20137). An Image is Worth 16 \u00d7 16 Words: Transformers for Image Recognition at Scale. Proceedings of the International Conference on Learning Representations, Virtual Event, Austria."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23\u201328). End-to-end object detection with transformers. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., and Wei, Y. (2022, January 18\u201324). 
Swin transformer v2: Scaling up capacity and resolution. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01170"},{"key":"ref_18","unstructured":"Li, J., Xia, X., Li, W., Li, H., Wang, X., and Xiao, X. (2022). Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1013","DOI":"10.1109\/JSEN.2015.2496945","article-title":"Robust sonar-based underwater object recognition against angle-of-view variation","volume":"16","author":"Cho","year":"2015","journal-title":"IEEE Sens. J."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"6858","DOI":"10.1109\/JSEN.2019.2912325","article-title":"A statistically-based method for the detection of underwater objects in sonar imagery","volume":"19","author":"Abu","year":"2019","journal-title":"IEEE Sens. J."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Neupane, D., and Seok, J. (2020). A review on deep learning-based approaches for automatic sonar target recognition. Electronics, 9.","DOI":"10.3390\/electronics9111972"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Kim, J., and Yu, S.C. (2016, January 6\u20139). Convolutional neural network-based real-time ROV detection using forward-looking sonar image. Proceedings of the 2016 IEEE\/OES Autonomous Underwater Vehicles (AUV), Tokyo, Japan.","DOI":"10.1109\/AUV.2016.7778702"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"3745","DOI":"10.1109\/JSEN.2019.2960796","article-title":"YOLOv3-DPFIN: A dual-path feature fusion neural network for robust real-time sonar target detection","volume":"20","author":"Kong","year":"2019","journal-title":"IEEE Sens. 
J."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1135","DOI":"10.1007\/s11760-020-01841-x","article-title":"Detection and segmentation of underwater objects from forward-looking sonar based on a modified Mask RCNN","volume":"15","author":"Fan","year":"2021","journal-title":"Signal Image Video Process."},{"key":"ref_25","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., and Weyand, T. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Zhang, X., Zhou, X., Lin, M., and Sun, J. (2018, January 18\u201323). Shufflenet: An extremely efficient convolutional neural network for mobile devices. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00716"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., and Xie, S. (2022, January 18-24). A convnet for the 2020s. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01167"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Sun, P., Zhang, R., Jiang, Y., Kong, T., Xu, C., and Zhan, W. (2021, January 20\u201325). Sparse r-cnn: End-to-end object detection with learnable proposals. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01422"},{"key":"ref_29","unstructured":"Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., and J\u00e9gou, H. (2021, January 18\u201324). Training data-efficient image transformers & distillation through attention. Proceedings of the International Conference on Machine Learning, Online."},{"key":"ref_30","unstructured":"Zhu, X., Su, W., Lu, L., Li, B., Wang, X., and Dai, J. (2020). 
Deformable detr: Deformable transformers for end-to-end object detection. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Li, Y., Mao, H., Girshick, R., and He, K. (2022). Exploring plain vision transformer backbones for object detection. arXiv.","DOI":"10.1007\/978-3-031-20077-9_17"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Xu, W., Xu, Y., Chang, T., and Tu, Z. (2021, January 10\u201317). Co-scale conv-attentional image transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00983"},{"key":"ref_33","first-page":"21002","article-title":"Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection","volume":"33","author":"Li","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_34","first-page":"852","article-title":"Transfg: A transformer architecture for fine-grained recognition","volume":"36","author":"He","year":"2022","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Chollet, F. (2017, January 21\u201326). Xception: Deep learning with depthwise separable convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.195"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"ImageNet Large Scale Visual Recognition Challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Radosavovic, I., Kosaraju, R.P., Girshick, R., He, K., and Doll\u00e1r, P. (2020, January 13\u201319). Designing Network Design Spaces. 
Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01044"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Wang, X., Wang, G., and Zhang, W. (2018, January 9\u201311). Pseudo-color processing of forward looking sonar image: An adaptive hot metal coding algorithm. Proceedings of the 2018 Chinese Control and Decision Conference (CCDC), Shenyang, China.","DOI":"10.1109\/CCDC.2018.8407165"},{"key":"ref_39","unstructured":"Zhang, J., Sohel, F., Bian, H., Bennamoun, M., and An, S. (2016, January 19\u201323). Forward-looking sonar image registration using polar transform. Proceedings of the OCEANS 2016 MTS\/IEEE Monterey, Monterey, CA, USA."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Buslaev, A., Iglovikov, V.I., Khvedchenya, E., Parinov, A., Druzhinin, M., and Kalinin, A.A. (2020). Albumentations: Fast and flexible image augmentations. Information, 11.","DOI":"10.3390\/info11020125"},{"key":"ref_41","unstructured":"Ge, Z., Liu, S., Wang, F., Li, Z., and Sun, J. (2021). Yolox: Exceeding yolo series in 2021. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1109\/48.16818","article-title":"A statistical study of acoustic signals backscattered from the sea bottom","volume":"14","author":"Gensane","year":"1989","journal-title":"IEEE J. Ocean. 
Eng."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/22\/5807\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:20:12Z","timestamp":1760145612000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/22\/5807"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,17]]},"references-count":42,"journal-issue":{"issue":"22","published-online":{"date-parts":[[2022,11]]}},"alternative-id":["rs14225807"],"URL":"https:\/\/doi.org\/10.3390\/rs14225807","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,11,17]]}}}