{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,16]],"date-time":"2026-01-16T19:41:24Z","timestamp":1768592484387,"version":"3.49.0"},"reference-count":45,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2022,3,8]],"date-time":"2022-03-08T00:00:00Z","timestamp":1646697600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61976033, 51609033"],"award-info":[{"award-number":["61976033, 51609033"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100005047","name":"Natural Science Foundation of Liaoning Province","doi-asserted-by":"publisher","award":["20180520005"],"award-info":[{"award-number":["20180520005"]}],"id":[{"id":"10.13039\/501100005047","id-type":"DOI","asserted-by":"publisher"}]},{"name":"the Key Development Guidance Program of Liaoning Province of China","award":["2019JH8\/10100100"],"award-info":[{"award-number":["2019JH8\/10100100"]}]},{"name":"the Soft Science Research Program of Dalian City of China","award":["2019J11CY014"],"award-info":[{"award-number":["2019J11CY014"]}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","award":["3132020110"],"award-info":[{"award-number":["3132020110"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Autonomous collision avoidance technology provides an intelligent method for unmanned surface vehicles\u2019 (USVs) safe and efficient navigation. 
In this paper, the USV collision avoidance problem under the constraints of the International Regulations for Preventing Collisions at Sea (COLREGs) was studied, and a reinforcement learning collision avoidance (RLCA) algorithm that respects USV maneuverability is proposed. Notably, the reinforcement learning agent requires no prior human knowledge of USV collision avoidance to learn effective collision avoidance maneuvers. The double-DQN method was used to reduce overestimation of the action-value function, and a dueling network architecture was adopted to separate the value of a state from the advantage of each action. To address the problem of agent exploration, a category-based exploration method built on the characteristics of USV collision avoidance was designed to improve the exploration ability of the USV. Because a large number of turning behaviors in the early training steps may hinder learning, a method that discards some of these transitions was designed, improving the effectiveness of the algorithm. A finite Markov decision process (MDP) that conforms to the USVs\u2019 maneuverability and COLREGs was used for agent training. The RLCA algorithm was tested in a marine simulation environment across many different USV encounter situations, achieving a higher average reward. 
The RLCA algorithm bridged the gap between USV navigation status information and collision avoidance behavior, successfully planning a safe and economical path to the terminal.<\/jats:p>","DOI":"10.3390\/s22062099","type":"journal-article","created":{"date-parts":[[2022,3,9]],"date-time":"2022-03-09T01:50:53Z","timestamp":1646790653000},"page":"2099","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":31,"title":["A Novel Reinforcement Learning Collision Avoidance Algorithm for USVs Based on Maneuvering Characteristics and COLREGs"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7828-4902","authenticated-orcid":false,"given":"Yunsheng","family":"Fan","sequence":"first","affiliation":[{"name":"College of Marine Electrical Engineering, Dalian Maritime University, Dalian 116026, China"},{"name":"Key Laboratory of Technology and System for Intelligent Ships of Liaoning Province, Dalian 116026, China"}]},{"given":"Zhe","family":"Sun","sequence":"additional","affiliation":[{"name":"College of Marine Electrical Engineering, Dalian Maritime University, Dalian 116026, China"},{"name":"Key Laboratory of Technology and System for Intelligent Ships of Liaoning Province, Dalian 116026, China"}]},{"given":"Guofeng","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Marine Electrical Engineering, Dalian Maritime University, Dalian 116026, China"},{"name":"Key Laboratory of Technology and System for Intelligent Ships of Liaoning Province, Dalian 116026, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,8]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_2","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., 
Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","article-title":"Grandmaster level in StarCraft II using multi-agent reinforcement learning","volume":"575","author":"Vinyals","year":"2019","journal-title":"Nature"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1038\/nature24270","article-title":"Mastering the game of go without human knowledge","volume":"550","author":"Silver","year":"2017","journal-title":"Nature"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1016\/j.arcontrol.2016.04.018","article-title":"Unmanned surface vehicles: An overview of developments and challenges","volume":"41","author":"Liu","year":"2016","journal-title":"Annu. Rev. Control."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1016\/j.neucom.2017.06.066","article-title":"Concise deep reinforcement learning obstacle avoidance for underactuated unmanned marine vessels","volume":"272","author":"Cheng","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"109216","DOI":"10.1016\/j.oceaneng.2021.109216","article-title":"Deep reinforcement learning-based collision avoidance for an autonomous ship","volume":"234","author":"Chun","year":"2021","journal-title":"Ocean Eng."},{"key":"ref_8","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"102759","DOI":"10.1016\/j.apor.2021.102759","article-title":"A path planning strategy unified with a COLREGS collision avoidance function based on deep reinforcement learning and artificial potential field","volume":"113","author":"Li","year":"2021","journal-title":"Appl. 
Ocean Res."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1016\/j.neucom.2020.05.089","article-title":"A composite learning method for multi-ship collision avoidance based on reinforcement learning and inverse control","volume":"411","author":"Xie","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_11","unstructured":"Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. Int. Conf. Mach. Learn., 1928\u20131937."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"105210","DOI":"10.1016\/j.knosys.2019.105201","article-title":"The autonomous navigation and obstacle avoidance for USVs with ANOA deep reinforcement learning method","volume":"196","author":"Wu","year":"2020","journal-title":"Knowl.-Based Syst."},{"key":"ref_13","unstructured":"Rummery, G.A., and Niranjan, M. (1994). On-Line Q-Learning Using Connectionist Systems, Citeseer."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1513","DOI":"10.1142\/S0129183101002851","article-title":"Deep-Sarsa: A reinforcement learning algorithm for autonomous navigation","volume":"12","author":"Andrecut","year":"2001","journal-title":"Int. J. Mod. Phys. C"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Guo, S., Zhang, X., Zheng, Y., and Du, Y. (2020). An autonomous path planning model for unmanned ships based on deep reinforcement learning. Sensors, 20.","DOI":"10.3390\/s20020426"},{"key":"ref_16","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. 
arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"107001","DOI":"10.1016\/j.oceaneng.2020.107001","article-title":"Collision avoidance for an unmanned surface vehicle using deep reinforcement learning","volume":"199","author":"Woo","year":"2020","journal-title":"Ocean Eng."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"107704","DOI":"10.1016\/j.oceaneng.2020.107704","article-title":"Intelligent collision avoidance algorithms for USVs via deep reinforcement learning under COLREGs","volume":"217","author":"Xu","year":"2020","journal-title":"Ocean Eng."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Fossen, T.I. (2011). Handbook of Marine Craft Hydrodynamics and Motion Control, John Wiley & Sons.","DOI":"10.1002\/9781119994138"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"106436","DOI":"10.1016\/j.oceaneng.2019.106436","article-title":"COLREGs-compliant multiship collision avoidance based on deep reinforcement learning","volume":"191","author":"Zhao","year":"2019","journal-title":"Ocean Eng."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zhang, X., Wang, C., Liu, Y., and Chen, X. (2019). Decision-making for the autonomous navigation of maritime autonomous surface ships based on scene division and deep reinforcement learning. Sensors, 19.","DOI":"10.3390\/s19184055"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1016\/j.apor.2019.02.020","article-title":"Automatic collision avoidance of multiple ships based on deep Q-learning","volume":"86","author":"Shen","year":"2019","journal-title":"Appl. 
Ocean Res."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"165262","DOI":"10.1109\/ACCESS.2019.2953326","article-title":"Learn to navigate: Cooperative path planning for unmanned surface vehicles using deep reinforcement learning","volume":"7","author":"Zhou","year":"2019","journal-title":"IEEE Access"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"2834","DOI":"10.1109\/TITS.2020.2976567","article-title":"A formation autonomous navigation system for unmanned surface vehicles with distributed control strategy","volume":"22","author":"Sun","year":"2020","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1016\/j.oceaneng.2017.09.020","article-title":"Review of ship safety domains: Models and applications","volume":"145","author":"Szlapczynski","year":"2017","journal-title":"Ocean Eng."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Li, L., Zhou, Z., Wang, B., Miao, L., An, Z., and Xiao, X. (2021). Domain Adaptive Ship Detection in Optical Remote Sensing Images. Remote Sens., 13.","DOI":"10.3390\/rs13163168"},{"key":"ref_27","first-page":"807","article-title":"Theory and observations on the use of a mathematical model for ship manoeuvring in deep and confined waters","volume":"68","author":"Norrbin","year":"1971","journal-title":"SSPA Rep. Nr"},{"key":"ref_28","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, MIT Press."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"604","DOI":"10.1038\/s41586-020-03051-4","article-title":"Mastering atari, go, chess and shogi by planning with a learned model","volume":"588","author":"Schrittwieser","year":"2020","journal-title":"Nature"},{"key":"ref_30","unstructured":"Watkins, C.J.C.H. (1989). 
Learning from Delayed Rewards, University of Cambridge."},{"key":"ref_31","first-page":"1057","article-title":"Policy gradient methods for reinforcement learning with function approximation","volume":"12","author":"Sutton","year":"2000","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_32","first-page":"387","article-title":"Deterministic policy gradient algorithms","volume":"32","author":"Silver","year":"2014","journal-title":"Int. Conf. Mach. Learn."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1007\/BF00115009","article-title":"Learning to predict by the methods of temporal differences","volume":"3","author":"Sutton","year":"1988","journal-title":"Mach. Learn."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1007\/BF00992698","article-title":"Q-learning","volume":"8","author":"Watkins","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_35","first-page":"1","article-title":"Learning deep architectures for AI","volume":"2","author":"Bengio","year":"2007","journal-title":"Found. Trends Mach. Learn."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1007\/BF00992697","article-title":"Practical issues in temporal difference learning","volume":"8","author":"Tesauro","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1016\/j.neunet.2014.09.003","article-title":"Deep learning in neural networks: An overview","volume":"61","author":"Schmidhuber","year":"2015","journal-title":"Neural Netw."},{"key":"ref_38","first-page":"2613","article-title":"Double Q-learning","volume":"23","author":"Van","year":"2010","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_39","first-page":"1038","article-title":"Generalization in reinforcement learning: Successful examples using sparse coarse coding","volume":"8","author":"Sutton","year":"1996","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"ref_40","first-page":"1","article-title":"Deep reinforcement learning with double q-learning","volume":"30","author":"Van","year":"2016","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_41","first-page":"1995","article-title":"Dueling network architectures for deep reinforcement learning","volume":"48","author":"Wang","year":"2016","journal-title":"Int. Conf. Mach. Learn."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"580","DOI":"10.1038\/s41586-020-03157-9","article-title":"First return, then explore","volume":"590","author":"Ecoffet","year":"2021","journal-title":"Nature"},{"key":"ref_43","first-page":"1471","article-title":"Unifying count-based exploration and intrinsic motivation","volume":"29","author":"Bellemare","year":"2016","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_44","first-page":"2721","article-title":"Count-based exploration with neural density models","volume":"70","author":"Ostrovski","year":"2017","journal-title":"Int. Conf. Mach. Learn."},{"key":"ref_45","first-page":"1","article-title":"Exploration: A study of count-based exploration for deep reinforcement learning","volume":"30","author":"Tang","year":"2017","journal-title":"Conf. Neural Inf. Process. 
Syst."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/6\/2099\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:34:15Z","timestamp":1760135655000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/6\/2099"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,8]]},"references-count":45,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2022,3]]}},"alternative-id":["s22062099"],"URL":"https:\/\/doi.org\/10.3390\/s22062099","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,8]]}}}