{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T17:06:38Z","timestamp":1774976798147,"version":"3.50.1"},"reference-count":54,"publisher":"MDPI AG","issue":"17","license":[{"start":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T00:00:00Z","timestamp":1630454400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["51779057"],"award-info":[{"award-number":["51779057"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"the Natural Science Foundation of Heilongjiang Province","award":["ZD2020E005"],"award-info":[{"award-number":["ZD2020E005"]}]},{"name":"Shaanxi Provincial Water Conservancy Science and Technology Plan","award":["2020slkj-5"],"award-info":[{"award-number":["2020slkj-5"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>This study aims to solve the problems of poor exploration ability, single strategy, and high training cost in autonomous underwater vehicle (AUV) motion planning tasks and to overcome certain difficulties, such as multiple constraints and a sparse reward environment. In this research, an end-to-end motion planning system based on deep reinforcement learning is proposed to solve the motion planning problem of an underactuated AUV. The system directly maps the state information of the AUV and the environment into the control instructions of the AUV. The system is based on the soft actor\u2013critic (SAC) algorithm, which enhances the exploration ability and robustness to the AUV environment. We also use the method of generative adversarial imitation learning (GAIL) to assist its training to overcome the problem that learning a policy for the first time is difficult and time-consuming in reinforcement learning. A comprehensive external reward function is then designed to help the AUV smoothly reach the target point, and the distance and time are optimized as much as possible. Finally, the end-to-end motion planning algorithm proposed in this research is tested and compared on the basis of the Unity simulation platform. Results show that the algorithm has an optimal decision-making ability during navigation, a shorter route, less time consumption, and a smoother trajectory. Moreover, GAIL can speed up the AUV training speed and minimize the training time without affecting the planning effect of the SAC algorithm.<\/jats:p>","DOI":"10.3390\/s21175893","type":"journal-article","created":{"date-parts":[[2021,9,2]],"date-time":"2021-09-02T23:05:12Z","timestamp":1630623912000},"page":"5893","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["End-to-End AUV Motion Planning Method Based on Soft Actor-Critic"],"prefix":"10.3390","volume":"21","author":[{"given":"Xin","family":"Yu","sequence":"first","affiliation":[{"name":"Science and Technology on Underwater Vehicle Laboratory, Harbin Engineering University, Harbin 150001, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yushan","family":"Sun","sequence":"additional","affiliation":[{"name":"Science and Technology on Underwater Vehicle Laboratory, Harbin Engineering University, Harbin 150001, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1737-4810","authenticated-orcid":false,"given":"Xiangbin","family":"Wang","sequence":"additional","affiliation":[{"name":"Science and Technology on Underwater Vehicle Laboratory, Harbin Engineering University, Harbin 150001, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Guocheng","family":"Zhang","sequence":"additional","affiliation":[{"name":"Science and Technology on Underwater Vehicle Laboratory, Harbin Engineering University, Harbin 150001, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,9,1]]},"reference":[{"key":"ref_1","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1007\/BF01386390","article-title":"A note on two problems in connexion with graphs","volume":"1","author":"Dijkstra","year":"1959","journal-title":"Numer. Math."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Scharff Willners, J., Gonzalez-Adell, D., Hern\u00e1ndez, J.D., Pairet, \u00c8., and Petillot, Y. (2021). Online 3-Dimensional Path Planning with Kinematic Constraints in Unknown Environments Using Hybrid A* with Tree Pruning. Sensors, 21.","DOI":"10.3390\/s21041152"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"993","DOI":"10.1109\/TSMC.2015.2500027","article-title":"Mutual information-based multi-AUV path planning for scalar field sampling using multidimensional RRT","volume":"46","author":"Cui","year":"2015","journal-title":"IEEE Trans. Syst. Man Cybern. Syst."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"6523158","DOI":"10.1155\/2020\/6523158","article-title":"Improved artificial potential field method applied for AUV path planning","volume":"2020","author":"Fan","year":"2020","journal-title":"Math. Probl. Eng."},{"key":"ref_6","unstructured":"Zeng, Z., Sammut, K., He, F., and Lammas, A. (2012, January 14\u201319). Efficient path evaluation for AUVs using adaptive B-spline approximation. Proceedings of the IEEE Oceans, Hampton Roads, VA, USA."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Cai, W., Zhang, M., and Zheng, Y.R. (2017). Task assignment and path planning for multiple autonomous underwater vehicles using 3D dubins curves. Sensors, 17.","DOI":"10.3390\/s17071607"},{"key":"ref_8","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, MIT Press."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Wang, L., Kan, J., Guo, J., and Wang, C. (2019). 3D path planning for the ground robot with improved ant colony optimization. Sensors, 19.","DOI":"10.3390\/s19040815"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Hao, K., Zhao, J., Yu, K., Li, C., and Wang, C. (2020). Path planning of mobile robots based on a multi-population migration genetic algorithm. Sensors, 20.","DOI":"10.3390\/s20205873"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1016\/j.ins.2018.04.044","article-title":"An integrated multi-population genetic algorithm for multi-vehicle task assignment in a drift field","volume":"453","author":"Bai","year":"2018","journal-title":"Inf. Sci."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"2166","DOI":"10.1109\/LRA.2017.2722541","article-title":"Clustering-based algorithms for multivehicle task assignment in a time-invariant drift field","volume":"2","author":"Bai","year":"2017","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Li, J., and Wang, H. (2020, January 2\u20135). Research on AUV Path Planning Based on Improved Ant Colony Algorithm. Proceedings of the 2020 IEEE International Conference on Mechatronics and Automation (ICMA), Beijing, China.","DOI":"10.1109\/ICMA49215.2020.9233546"},{"key":"ref_14","unstructured":"Camci, E., and Kayacan, E. (2019). End-to-End Motion Planning of Quadrotors Using Deep Reinforcement Learning. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Doukhi, O., and Lee, D. (2021). Deep Reinforcement Learning for End-to-End Local Motion Planning of Autonomous Aerial Robots in Unknown Outdoor Environments: Real-Time Flight Experiments. Sensors, 21.","DOI":"10.3390\/s21072534"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1016\/j.neucom.2017.06.066","article-title":"Concise deep reinforcement learning obstacle avoidance for underactuated unmanned marine vessels","volume":"272","author":"Cheng","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Sun, Y., Ran, X., Zhang, G., Xu, H., and Wang, X. (2020). AUV 3D path planning based on the improved hierarchical deep Q network. J. Mar. Sci. Eng., 8.","DOI":"10.3390\/jmse8020145"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"591","DOI":"10.1007\/s10846-019-01004-2","article-title":"Mapless motion planning system for an autonomous underwater vehicle using policy gradient-based deep reinforcement learning","volume":"96","author":"Sun","year":"2019","journal-title":"J. Intell. Robot. Syst."},{"key":"ref_19","unstructured":"Butyrev, L.T., and Mutschler, C. (2019). Deep reinforcement learning for motion planning of mobile robots. arXiv."},{"key":"ref_20","unstructured":"Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018, January 2). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. Proceedings of the PMLR, Montr\u00e9al, QC, Canada."},{"key":"ref_21","unstructured":"Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., and Abbeel, P. (2018). Soft actor-critic algorithms and applications. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Prianto, E., Kim, M., Park, J., Bae, J., and Kim, J. (2020). Path Planning for Multi-Arm Manipulators Using Deep Reinforcement Learning: Soft Actor\u2013Critic with Hindsight Experience Replay. Sensors, 20.","DOI":"10.3390\/s20205911"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"26871","DOI":"10.1109\/ACCESS.2021.3056903","article-title":"Motion Planning for Dual-Arm Robot Based on Soft Actor-Critic","volume":"9","author":"Wong","year":"2021","journal-title":"IEEE Access"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Liu, Q., Li, Y., and Liu, L. (2020, January 20\u201321). A 3D Simulation Environment and Navigation Approach for Robot Navigation via Deep Reinforcement Learning in Dense Pedestrian Environment. Proceedings of the 2020 IEEE 16th International Conference on Automation Science and Engineering (CASE), Hong Kong, China.","DOI":"10.1109\/CASE48305.2020.9217023"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Cheng, Y., and Song, Y. (2020, January 27\u201329). Autonomous Decision-Making Generation of UAV based on Soft Actor-Critic Algorithm. Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China.","DOI":"10.23919\/CCC50068.2020.9188886"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Gupta, A., Khwaja, A.S., Anpalagan, A., Guan, L., and Venkatesh, B. (2020). Policy-Gradient and Actor-Critic Based State Representation Learning for Safe Driving of Autonomous Vehicles. Sensors, 20.","DOI":"10.3390\/s20215991"},{"key":"ref_27","unstructured":"Chen, J., Li, S.E., and Tomizuka, M. (2021, January 19\u201322). Interpretable end-to-end urban autonomous driving with latent deep reinforcement learning. Proceedings of the IEEE Transactions on Intelligent Transportation Systems, Indianapolis, IN, USA."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"195000","DOI":"10.1109\/ACCESS.2020.3033888","article-title":"Using deep reinforcement learning for exploratory performance testing of software systems with multi-dimensional input spaces","volume":"8","author":"Ahmad","year":"2020","journal-title":"IEEE Access"},{"key":"ref_29","unstructured":"Baram, N., Anschel, O., Caspi, I., and Mannor, S. (2017, January 6\u201311). End-to-end differentiable adversarial imitation learning. Proceedings of the PMLR, Sydney, Australia."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1162\/neco.1991.3.1.88","article-title":"Efficient training of artificial neural networks for autonomous navigation","volume":"3","author":"Pomerleau","year":"1991","journal-title":"Neural Comput."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"661","DOI":"10.1109\/LRA.2015.2509024","article-title":"A machine learning approach to visual perception of forest trails for mobile robots","volume":"1","author":"Giusti","year":"2015","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_32","unstructured":"Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel, L.D., Monfort, M., Muller, U., and Zhang, J. (2016). End to end learning for self-driving cars. arXiv."},{"key":"ref_33","unstructured":"Ross, S.E.P., and Bagnell, D. (2010, January 13\u201315). Efficient reductions for imitation learning. Proceedings of the JMLR Workshop and Conference Proceedings, Chia Laguna Resort, Sardinia, Italy."},{"key":"ref_34","unstructured":"Ross, S.E.P., Gordon, G., and Bagnell, D. (2011, January 11\u201313). A reduction of imitation learning and structured prediction to no-regret online learning. Proceedings of the JMLR Workshop and Conference Proceedings, Lauderdale, FL, USA."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1016\/S0378-3758(00)00115-4","article-title":"Improving predictive inference under covariate shift by weighting the log-likelihood function","volume":"90","author":"Shimodaira","year":"2000","journal-title":"J. Stat. Plan. Inference"},{"key":"ref_36","unstructured":"Ng, A.Y., and Russell, S.J. (July, January 29). Algorithms for inverse reinforcement learning. Proceedings of the ICML, Stanford, CA, USA."},{"key":"ref_37","unstructured":"Ziebart, B.D., Maas, A.L., Bagnell, J.A., and Dey, A.K. (2008, January 13\u201317). Maximum entropy inverse reinforcement learning. Proceedings of the AAAI, Chicago, IL, USA."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1007\/s10514-009-9121-3","article-title":"Learning to search: Functional gradient techniques for imitation learning","volume":"27","author":"Ratliff","year":"2009","journal-title":"Auton. Robot."},{"key":"ref_39","first-page":"1","article-title":"Generative adversarial nets","volume":"27","author":"Goodfellow","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_40","unstructured":"Brock, A., Donahue, J., and Simonyan, K. (2018). Large scale GAN training for high fidelity natural image synthesis. arXiv, Available online: https:\/\/arxiv.org\/abs\/1809.11096."},{"key":"ref_41","first-page":"4565","article-title":"Generative adversarial imitation learning","volume":"29","author":"Ho","year":"2016","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_42","unstructured":"Ho, J., Gupta, J., and Ermon, S. (2016, January 20\u201322). Model-free imitation learning with policy optimization. Proceedings of the PMLR, New York, NY, USA."},{"key":"ref_43","unstructured":"Merel, J., Tassa, Y., TB, D., Srinivasan, S., Lemmon, J., Wang, Z., Wayne, G., and Heess, N. (2017). Learning human behaviors from motion capture by adversarial imitation. arXiv."},{"key":"ref_44","unstructured":"Peng, X.B., Kanazawa, A., Toyer, S., Abbeel, P., and Levine, S. (2018). Variational discriminator bottleneck: Improving imitation learning, inverse rl, and gans by constraining information flow. arXiv."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Karimshoushtari, M., Novara, C., and Tango, F. (2021). How Imitation Learning and Human Factors Can Be Combined in a Model Predictive Control Algorithm for Adaptive Motion Planning and Control. Sensors, 21.","DOI":"10.3390\/s21124012"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Zhou, Y., Fu, R., Wang, C., and Zhang, R. (2020). Modeling Car-Following Behaviors and Driving Styles with Generative Adversarial Imitation Learning. Sensors, 20.","DOI":"10.3390\/s20185034"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Fossen, T.I. (2011). Handbook of Marine Craft Hydrodynamics and Motion Control, John Wiley & Sons.","DOI":"10.1002\/9781119994138"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"1238","DOI":"10.1177\/0278364913495721","article-title":"Reinforcement learning in robotics: A survey","volume":"32","author":"Kober","year":"2013","journal-title":"Int. J. Robot. Res."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Chaffre, T., Moras, J., Chan-Hon-Tong, A., and Marzat, J. (2020). Sim-to-real transfer with incremental environment complexity for reinforcement learning of depth-based robot navigation. arXiv.","DOI":"10.5220\/0009821603140323"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1126\/science.153.3731.34","article-title":"Dynamic programming","volume":"153","author":"Bellman","year":"1966","journal-title":"Science"},{"key":"ref_51","unstructured":"Haarnoja, T., Tang, H., Abbeel, P., and Levine, S. (2017, January 6\u201311). Reinforcement learning with deep energy-based policies. Proceedings of the PMLR, Sydney, Australia."},{"key":"ref_52","unstructured":"Bhattacharyya, R., Wulfe, B., Phillips, D., Kuefler, A., Morton, J., Senanayake, R., and Kochenderfer, M. (2020). Modeling human driving behavior through generative adversarial imitation learning. arXiv."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Torabi, F., Warnell, G., and Stone, P. (2018). Generative adversarial imitation from observation. arXiv.","DOI":"10.24963\/ijcai.2018\/687"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"445","DOI":"10.1038\/nature14540","article-title":"Reinforcement learning improves behaviour from evaluative feedback","volume":"521","author":"Littman","year":"2015","journal-title":"Nature"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/17\/5893\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:54:26Z","timestamp":1760165666000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/17\/5893"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,1]]},"references-count":54,"journal-issue":{"issue":"17","published-online":{"date-parts":[[2021,9]]}},"alternative-id":["s21175893"],"URL":"https:\/\/doi.org\/10.3390\/s21175893","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,9,1]]}}}