{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T08:02:11Z","timestamp":1771488131027,"version":"3.50.1"},"reference-count":32,"publisher":"MDPI AG","issue":"15","license":[{"start":{"date-parts":[[2022,7,31]],"date-time":"2022-07-31T00:00:00Z","timestamp":1659225600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["51409033"],"award-info":[{"award-number":["51409033"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["52171342"],"award-info":[{"award-number":["52171342"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["3132019343"],"award-info":[{"award-number":["3132019343"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["3132022126"],"award-info":[{"award-number":["3132022126"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["2020RT08"],"award-info":[{"award-number":["2020RT08"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Fundamental Research Funds for the Central Universities","award":["51409033"],"award-info":[{"award-number":["51409033"]}]},{"name":"Fundamental Research Funds for the Central 
Universities","award":["52171342"],"award-info":[{"award-number":["52171342"]}]},{"name":"Fundamental Research Funds for the Central Universities","award":["3132019343"],"award-info":[{"award-number":["3132019343"]}]},{"name":"Fundamental Research Funds for the Central Universities","award":["3132022126"],"award-info":[{"award-number":["3132022126"]}]},{"name":"Fundamental Research Funds for the Central Universities","award":["2020RT08"],"award-info":[{"award-number":["2020RT08"]}]},{"name":"Dalian Innovation Team Support Plan in the Key Research Field","award":["51409033"],"award-info":[{"award-number":["51409033"]}]},{"name":"Dalian Innovation Team Support Plan in the Key Research Field","award":["52171342"],"award-info":[{"award-number":["52171342"]}]},{"name":"Dalian Innovation Team Support Plan in the Key Research Field","award":["3132019343"],"award-info":[{"award-number":["3132019343"]}]},{"name":"Dalian Innovation Team Support Plan in the Key Research Field","award":["3132022126"],"award-info":[{"award-number":["3132022126"]}]},{"name":"Dalian Innovation Team Support Plan in the Key Research Field","award":["2020RT08"],"award-info":[{"award-number":["2020RT08"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>With the development of artificial intelligence technology, the behavior decision-making of an intelligent smart marine autonomous surface ship (SMASS) has become particularly important. This research proposed local path planning and a behavior decision-making approach based on improved Proximal Policy Optimization (PPO), which could drive an unmanned SMASS to the target without requiring any human experiences. In addition, a generalized advantage estimation was added to the loss function of the PPO algorithm, which allowed baselines in PPO algorithms to be self-adjusted. At first, the SMASS was modeled with the Nomoto model in a simulation waterway. 
Then, distances, obstacles, and prohibited areas were regularized as rewards or punishments, which were used to judge the performance and manipulation decisions of the vessel. Subsequently, improved PPO was introduced to learn the action\u2013reward model, and the neural network model after training was used to manipulate the SMASS\u2019s movement. To achieve higher reward values, the SMASS could find an appropriate path or navigation strategy by itself. After a sufficient number of rounds of training, a convincing path and manipulation strategies would likely be produced. Compared with existing methods, the proposed approach is more effective in self-learning and continuous optimization and is thus closer to human manipulation.<\/jats:p>","DOI":"10.3390\/s22155732","type":"journal-article","created":{"date-parts":[[2022,8,1]],"date-time":"2022-08-01T23:49:27Z","timestamp":1659397767000},"page":"5732","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":24,"title":["Intelligent Smart Marine Autonomous Surface Ship Decision System Based on Improved PPO Algorithm"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6561-6349","authenticated-orcid":false,"given":"Wei","family":"Guan","sequence":"first","affiliation":[{"name":"Navigation College, Dalian Maritime University, Dalian 116026, China"}]},{"given":"Zhewen","family":"Cui","sequence":"additional","affiliation":[{"name":"Navigation College, Dalian Maritime University, Dalian 116026, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1577-571X","authenticated-orcid":false,"given":"Xianku","family":"Zhang","sequence":"additional","affiliation":[{"name":"Navigation College, Dalian Maritime University, Dalian 116026, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,7,31]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Seuwou, P., Banissi, E., Ubakanma, G., Sharif, 
M.S., and Healey, A. (2017). Actor-Network Theory as a Framework to Analyse Technology Acceptance Model\u2019s External Variables: The Case of Autonomous Vehicles. International Conference on Global Security, Safety, and Sustainability, Springer.","DOI":"10.1007\/978-3-319-51064-4_24"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1109\/MRA.2010.935792","article-title":"Avalon Navigation Strategy and Trajectory Following Controller for an Autonomous Sailing Vessel","volume":"17","author":"Erckens","year":"2010","journal-title":"IEEE Robot. Autom. Mag."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Wu, D.F., Gu, J.D., and Li, F.S. (2019). A Path-Planning Strategy for Unmanned Surface Vehicles Based on an Adaptive Hybrid Dynamic Stepsize and Target Attractive Force-RRT Algorithm. J. Mar. Sci. Eng., 7.","DOI":"10.3390\/jmse7050132"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"114945","DOI":"10.1109\/ACCESS.2019.2935964","article-title":"Self-Adaptive Dynamic Obstacle Avoidance and Path Planning for USV Under Complex Maritime Environment","volume":"7","author":"Liu","year":"2019","journal-title":"IEEE Access"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Xie, S.R., Wu, P., Peng, Y., Luo, J., Qu, D., Li, Q.M., and Gu, J. (2014, January 28\u201330). The Obstacle Avoidance Planning of USV Based on Improved Artificial Potential Field. Proceedings of the IEEE International Conference on Information and Automation (ICIA), Hailar, China.","DOI":"10.1109\/ICInfA.2014.6932751"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"588","DOI":"10.1017\/S0373463318000796","article-title":"COLREGS-Constrained Real-time Path Planning for Autonomous Ships Using Modified Artificial Potential Fields","volume":"72","author":"Lyu","year":"2019","journal-title":"J. 
Navig."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"106299","DOI":"10.1016\/j.oceaneng.2019.106299","article-title":"A knowledge-free path planning approach for smart ships based on reinforcement learning","volume":"189","author":"Chen","year":"2019","journal-title":"Ocean. Eng."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Everett, M., Chen, Y.F., and How, J.P. (2018, January 1\u20135). Motion Planning Among Dynamic, Decision-Making Agents with Deep Reinforcement Learning. Proceedings of the 25th IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8593871"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Zhang, J., Springenberg, J.T., Boedecker, J., and Burgard, W. (2017, January 24\u201328). Deep Reinforcement Learning with Successor Features for Navigation across Similar Environments. Proceedings of the 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada.","DOI":"10.1109\/IROS.2017.8206049"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1016\/j.apor.2019.02.020","article-title":"Automatic collision avoidance of multiple ships based on deep Q-learning","volume":"86","author":"Shen","year":"2019","journal-title":"Appl. Ocean. Res."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"102759","DOI":"10.1016\/j.apor.2021.102759","article-title":"A path planning strategy unified with a COLREGS collision avoidance function based on deep reinforcement learning and artificial potential field","volume":"113","author":"Li","year":"2021","journal-title":"Appl. Ocean. Res."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Hu, Z., Wan, K., Gao, X., Zhai, Y., and Wang, Q. (2020). Deep Reinforcement Learning Approach with Multiple Experience Pools for UAV\u2019s Autonomous Motion Planning in Complex Unknown Environments. 
Sensors, 20.","DOI":"10.3390\/s20071890"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"109216","DOI":"10.1016\/j.oceaneng.2021.109216","article-title":"Deep reinforcement learning-based collision avoidance for an autonomous ship","volume":"234","author":"Chun","year":"2021","journal-title":"Ocean. Eng."},{"key":"ref_14","first-page":"293","article-title":"Control method for path following and collision avoidance of autonomous ship based on deep reinforcement learning","volume":"27","author":"Zhao","year":"2019","journal-title":"J. Mar. Sci. Technol.-Taiwan"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"107704","DOI":"10.1016\/j.oceaneng.2020.107704","article-title":"Intelligent collision avoidance algorithms for USVs via deep reinforcement learning under COLREGs","volume":"217","author":"Xu","year":"2020","journal-title":"Ocean. Eng."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Long, P., Fan, T., Liao, X., Liu, W., Zhang, H., and Pan, J. (2018, January 21\u201325). Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia.","DOI":"10.1109\/ICRA.2018.8461113"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Guan, W., Peng, H.W., Zhang, X.K., and Sun, H. (2022). Ship Steering Adaptive CGS Control Based on EKF Identification Method. J. Mar. Sci. 
Eng., 10.","DOI":"10.3390\/jmse10020294"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"3821048","DOI":"10.1155\/2019\/3821048","article-title":"Ship Steering Control Based on Quantum Neural Network","volume":"2019","author":"Guan","year":"2019","journal-title":"Complexity"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"106349","DOI":"10.1016\/j.oceaneng.2019.106349","article-title":"Improvement of integrator backstepping control for ships with concise robust control and nonlinear decoration","volume":"189","author":"Zhang","year":"2019","journal-title":"Ocean. Eng."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"031302","DOI":"10.1115\/1.4029826","article-title":"System Identification of Nonlinear Vessel Steering","volume":"137","author":"Perera","year":"2015","journal-title":"J. Offshore Mech. Arct. Eng."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"354","DOI":"10.3233\/ISP-1957-43504","article-title":"On the steering qualities of ships","volume":"4","author":"Nomoto","year":"1957","journal-title":"Int. Shipbuild. Prog."},{"key":"ref_22","first-page":"250","article-title":"A novel approach for assistance with anti-collision decision making based on the International Regulations for Preventing Collisions at Sea","volume":"226","author":"Zhang","year":"2012","journal-title":"Proc. Inst. Mech. Eng. Part M J. Eng. Marit. Environ."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1292","DOI":"10.1007\/s00773-020-00787-6","article-title":"Path planning and collision avoidance for autonomous surface vehicles I: A review","volume":"26","author":"Vagale","year":"2021","journal-title":"J. Mar. Sci. Technol."},{"key":"ref_24","unstructured":"Dearden, R. (1998, January 26\u201330). Bayesian Q-learning. 
Proceedings of the Fifteenth National\/tenth Conference on Artificial Intelligence\/innovative Applications of Artificial Intelligence, Madison, WI, USA."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1038\/323533a0","article-title":"Learning Representations by Back Propagating Errors","volume":"323","author":"Rumelhart","year":"1986","journal-title":"Nature"},{"key":"ref_26","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. arXiv."},{"key":"ref_27","unstructured":"Hasselt, H.V., Guez, A., and Silver, D. (2016, January 12\u201317). Deep Reinforcement Learning with Double Q-learning. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA."},{"key":"ref_28","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv."},{"key":"ref_29","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal Policy Optimization Algorithms. arXiv."},{"key":"ref_30","unstructured":"Schulman, J., Moritz, P., Levine, S., Jordan, M., and Abbeel, P. (2015). High-Dimensional Continuous Control Using Generalized Advantage Estimation. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Fan, Y., Sun, Z., and Wang, G. (2022). A Novel Reinforcement Learning Collision Avoidance Algorithm for USVs Based on Maneuvering Characteristics and COLREGs. Sensors, 22.","DOI":"10.3390\/s22062099"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"104","DOI":"10.1016\/j.eswa.2016.06.021","article-title":"Neural networks based reinforcement learning for mobile robots obstacle avoidance","volume":"62","author":"Duguleana","year":"2016","journal-title":"Expert Syst. 
Appl."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/15\/5732\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:00:16Z","timestamp":1760140816000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/15\/5732"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,31]]},"references-count":32,"journal-issue":{"issue":"15","published-online":{"date-parts":[[2022,8]]}},"alternative-id":["s22155732"],"URL":"https:\/\/doi.org\/10.3390\/s22155732","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,31]]}}}