{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T16:15:40Z","timestamp":1774455340850,"version":"3.50.1"},"reference-count":22,"publisher":"MDPI AG","issue":"23","license":[{"start":{"date-parts":[[2021,12,6]],"date-time":"2021-12-06T00:00:00Z","timestamp":1638748800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61973101"],"award-info":[{"award-number":["61973101"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012130","name":"Aeronautical Science Foundation of China","doi-asserted-by":"publisher","award":["20180577005"],"award-info":[{"award-number":["20180577005"]}],"id":[{"id":"10.13039\/501100012130","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],
"abstract":"<jats:p>Planetary soft landing has been studied extensively due to its promising application prospects. In this paper, a soft landing control algorithm based on deep reinforcement learning (DRL) with good convergence properties is proposed. First, the soft landing problem of the powered descent phase is formulated and the theoretical basis of the Reinforcement Learning (RL) methods used in this paper is introduced. Second, to ease convergence, a reward function is designed to include process rewards, such as a velocity tracking reward, solving the problem of sparse rewards. Then, by including a fuel consumption penalty and a constraint violation penalty, the lander learns to achieve the velocity tracking goal while saving fuel and keeping its attitude angle within safe ranges. Next, training simulations are carried out under the Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3), and Soft Actor-Critic (SAC) frameworks, all of which are classical RL frameworks, and all of which converged. Finally, the trained policy is deployed in velocity tracking and soft landing experiments, the results of which demonstrate the validity of the proposed algorithm.<\/jats:p>",
"DOI":"10.3390\/s21238161","type":"journal-article","created":{"date-parts":[[2021,12,7]],"date-time":"2021-12-07T02:48:13Z","timestamp":1638845293000},"page":"8161","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["Deep Reinforcement Learning-Based Accurate Control of Planetary Soft Landing"],"prefix":"10.3390","volume":"21","author":[{"given":"Xibao","family":"Xu","sequence":"first","affiliation":[{"name":"School of Astronautics, Harbin Institute of Technology, Harbin 150001, China"},{"name":"Beijing Institute of Astronautical Systems Engineering, Beijing 100076, China"}]},{"given":"Yushen","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Astronautics, Harbin Institute of Technology, Harbin 150001, China"}]},{"given":"Chengchao","family":"Bai","sequence":"additional","affiliation":[{"name":"School of Astronautics, Harbin Institute of Technology, Harbin 150001, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,12,6]]},
"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"172","DOI":"10.1016\/j.robot.2017.04.020","article-title":"50 years of rovers for planetary exploration: A retrospective review for future directions","volume":"94","author":"Sanguino","year":"2017","journal-title":"Robot. Auton. Syst."},
{"key":"ref_2","first-page":"12","article-title":"Review and prospect of the development of world lunar exploration","volume":"481","author":"Lu","year":"2019","journal-title":"Space Int."},
{"key":"ref_3","first-page":"719","article-title":"A Survey of Guidance Technology for Moon\/Mars Soft Landing","volume":"41","author":"Xu","year":"2020","journal-title":"J. Astronaut."},
{"key":"ref_4","unstructured":"Sostaric, R.R. (2007, January 3\u20137). Powered descent trajectory guidance and some considerations for human lunar landing. Proceedings of the 30th Annual AAS Guidance and Control Conference, Breckenridge, CO, USA."},
{"key":"ref_5","doi-asserted-by":"crossref","first-page":"116309","DOI":"10.1016\/j.ijms.2020.116309","article-title":"From vacuum to atmospheric pressure: A review of ambient ion soft landing","volume":"450","author":"Tata","year":"2020","journal-title":"Int. J. Mass Spectrom."},
{"key":"ref_6","first-page":"409","article-title":"Optimal Design of Direct Soft-Landing Trajectory of Lunar Prospector","volume":"2","author":"He","year":"2007","journal-title":"J. Astronaut."},
{"key":"ref_7","unstructured":"Leondes, C.T., and Vance, R.W. (1964). Lunar Terminal Guidance, Lunar Missions and Exploration. University of California Engineering and Physical Sciences Extension Series, Wiley."},
{"key":"ref_8","doi-asserted-by":"crossref","first-page":"503","DOI":"10.2514\/3.2362","article-title":"A terminal guidance technique for lunar landing","volume":"2","author":"Citron","year":"1964","journal-title":"AIAA J."},
{"key":"ref_9","unstructured":"Hull, D.G., and Speyer, J. (1981, January 3\u20135). Optimal reentry and plane-change trajectories. Proceedings of the AIAA Astrodynamics Specialist Conference, Lake Tahoe, NV, USA."},
{"key":"ref_10","doi-asserted-by":"crossref","first-page":"686","DOI":"10.1016\/j.actaastro.2019.12.037","article-title":"A multiple-shooting differential dynamic programming algorithm. Part 1: Theory","volume":"170","author":"Pellegrini","year":"2020","journal-title":"Acta Astronaut."},
{"key":"ref_11","unstructured":"Bolle, A., Circi, C., and Corrao, G. (2015). Adaptive Multiple Shooting Optimization Method for Determining Optimal Spacecraft Trajectories. (9,031,818), U.S. Patent."},
{"key":"ref_12","doi-asserted-by":"crossref","first-page":"2896","DOI":"10.1109\/TAES.2019.2955785","article-title":"Optimal Guidance for Planetary Landing in Hazardous Terrains","volume":"56","author":"Bai","year":"2020","journal-title":"IEEE Trans. Aerosp. Electron. Syst."},
{"key":"ref_13","doi-asserted-by":"crossref","first-page":"898","DOI":"10.2514\/3.28985","article-title":"Development of the iterative guidance mode with its application to various vehicles and missions","volume":"4","author":"Chandler","year":"1967","journal-title":"J. Spacecr. Rocket."},
{"key":"ref_14","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1016\/j.actaastro.2021.09.003","article-title":"Powered soft landing guidance method for launchers with non-cluster configured engines","volume":"189","author":"Song","year":"2021","journal-title":"Acta Astronaut."},
{"key":"ref_15","unstructured":"Amrutha, V., Sreeja, S., and Sabarinath, A. (2021, January 6\u201313). Trajectory Optimization of Lunar Soft Landing Using Differential Evolution. Proceedings of the 2021 IEEE Aerospace Conference (50100), Big Sky, MT, USA."},
{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1122","DOI":"10.2514\/1.G002357","article-title":"Real-time optimal control via deep neural networks: Study on landing problems","volume":"41","author":"Izzo","year":"2018","journal-title":"J. Guid. Control Dyn."},
{"key":"ref_17","unstructured":"Furfaro, R., Bloise, I., Orlandelli, M., Di Lizia, P., Topputo, F., and Linares, R. (2018, January 19\u201328). Deep learning for autonomous lunar landing. Proceedings of the 2018 AAS\/AIAA Astrodynamics Specialist Conference, Snowbird, UT, USA."},
{"key":"ref_18","unstructured":"Furfaro, R., Bloise, I., Orlandelli, M., Di Lizia, P., Topputo, F., and Linares, R. (2018, January 13\u201315). A recurrent deep architecture for quasi-optimal feedback guidance in planetary landing. Proceedings of the IAA SciTech Forum on Space Flight Mechanics and Space Structures and Materials, Moscow, Russia."},
{"key":"ref_19","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv."},
{"key":"ref_20","unstructured":"Kiran, B.R., Sobh, I., Talpaert, V., Mannion, P., Al Sallab, A.A., Yogamani, S., and P\u00e9rez, P. (2020). Deep reinforcement learning for autonomous driving: A survey. arXiv."},
{"key":"ref_21","doi-asserted-by":"crossref","first-page":"178450","DOI":"10.1109\/ACCESS.2020.3027923","article-title":"Review of Deep Reinforcement Learning-Based Object Grasping: Techniques, Open Challenges and Recommendations","volume":"8","author":"Mohammed","year":"2020","journal-title":"IEEE Access"},
{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1353","DOI":"10.2514\/1.27553","article-title":"Convex programming approach to powered descent guidance for Mars landing","volume":"30","author":"Acikmese","year":"2007","journal-title":"J. Guid. Control Dyn."}],
"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/23\/8161\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:42:04Z","timestamp":1760168524000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/23\/8161"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,6]]},"references-count":22,"journal-issue":{"issue":"23","published-online":{"date-parts":[[2021,12]]}},"alternative-id":["s21238161"],"URL":"https:\/\/doi.org\/10.3390\/s21238161","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12,6]]}}}