{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T02:57:30Z","timestamp":1760151450936,"version":"build-2065373602"},"reference-count":23,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2022,3,18]],"date-time":"2022-03-18T00:00:00Z","timestamp":1647561600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61473015","91646108","62073020"],"award-info":[{"award-number":["61473015","91646108","62073020"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Robotics"],"abstract":"<jats:p>Path planning is a key technology for the autonomous mobility of intelligent robots. However, there are few studies on how to carry out path planning in real time under the confrontation environment. Therefore, based on the deep deterministic policy gradient (DDPG) algorithm, this paper designs the reward function and adopts the incremental training and reward compensation method to improve the training efficiency and obtain the penetration strategy. The Monte Carlo experiment results show that the algorithm can effectively avoid static obstacles, break through the interception, and finally reach the target area. Moreover, the algorithm is also validated in the Webots simulator.<\/jats:p>","DOI":"10.3390\/robotics11020035","type":"journal-article","created":{"date-parts":[[2022,3,20]],"date-time":"2022-03-20T21:37:17Z","timestamp":1647812237000},"page":"35","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Research on Game-Playing Agents Based on Deep Reinforcement Learning"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2172-6607","authenticated-orcid":false,"given":"Kai","family":"Zhao","sequence":"first","affiliation":[{"name":"School of Astronautics, Beihang University (BUAA), Beijing 100191, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4019-970X","authenticated-orcid":false,"given":"Jia","family":"Song","sequence":"additional","affiliation":[{"name":"School of Astronautics, Beihang University (BUAA), Beijing 100191, China"}]},{"given":"Yuxie","family":"Luo","sequence":"additional","affiliation":[{"name":"School of Astronautics, Beihang University (BUAA), Beijing 100191, China"}]},{"given":"Yang","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Automation Science and Electrical Engineering, Beihang University (BUAA), Beijing 100191, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,18]]},"reference":[{"key":"ref_1","first-page":"022025","article-title":"The Trajectory Generation of UCAV Evading Missiles Based on Neural Networks","volume":"Volume 1486","author":"Zhang","year":"2020","journal-title":"Journal of Physics: Conference Series"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Yang, C., Wu, J., Liu, G., and Zhang, Y. (2018, January 10\u201312). Ballistic Missile Maneuver Penetration Based on Reinforcement Learning. Proceedings of the 2018 IEEE CSAA Guidance, Navigation and Control Conference (CGNCC), Xiamen, China.","DOI":"10.1109\/GNCC42960.2018.9018872"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"3423","DOI":"10.1016\/j.cja.2020.03.026","article-title":"Evasion guidance algorithms for air-breathing hypersonic vehicles in three-player pursuit-evasion games","volume":"33","author":"Yan","year":"2020","journal-title":"Chin. J. Aeronaut."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Nguyen, H., and La, H. (2019, January 25\u201327). Review of deep reinforcement learning for robot manipulation. Proceedings of the 2019 Third IEEE International Conference on Robotic Computing (IRC), Naples, Italy.","DOI":"10.1109\/IRC.2019.00120"},{"key":"ref_5","unstructured":"Li, Y. (2017). Deep reinforcement learning: An overview. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1016\/j.arcontrol.2018.09.005","article-title":"Reinforcement learning for control: Performance, stability, and deep approximators","volume":"46","author":"Kober","year":"2018","journal-title":"Annu. Rev. Control."},{"key":"ref_7","unstructured":"Dulac-Arnold, G., Mankowitz, D., and Hester, T. (2019). Challenges of real-world reinforcement learning. arXiv."},{"key":"ref_8","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv."},{"key":"ref_9","unstructured":"Zhang, C., Song, W., Cao, Z., Zhang, J., Tan, P.S., and Xu, C. (2020). Learning to dispatch for job shop scheduling via deep reinforcement learning. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1238","DOI":"10.1177\/0278364913495721","article-title":"Reinforcement learning in robotics: A survey","volume":"32","author":"Kober","year":"2013","journal-title":"Int. J. Robot. Res."},{"key":"ref_11","first-page":"5781591","article-title":"Dynamic path planning of unknown environment based on deep reinforcement learning","volume":"2018","author":"Lei","year":"2018","journal-title":"J. Robot."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"4577","DOI":"10.1109\/TNNLS.2020.3023711","article-title":"Robust formation control for cooperative underactuated quadrotors via reinforcement learning","volume":"32","author":"Zhao","year":"2020","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Gao, J., Ye, W., Guo, J., and Li, Z. (2020). Deep reinforcement learning for indoor mobile robot path planning. Sensors, 20.","DOI":"10.3390\/s20195493"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Choi, J., Park, K., Kim, M., and Seok, S. (2019, January 20\u201324). Deep reinforcement learning of navigation in a complex and crowded environment with a limited field of view. Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8793979"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Faust, A., Oslund, K., Ramirez, O., Francis, A., Tapia, L., Fiser, M., and Davidson, J. (2018, January 21\u201325). Prm-rl: Long-range robotic navigation tasks by combining reinforcement learning and sampling-based planning. Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia.","DOI":"10.1109\/ICRA.2018.8461096"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Feng, S., Sebastian, B., and Ben-Tzvi, P. (2021). A Collision Avoidance Method Based on Deep Reinforcement Learning. Robotics, 10.","DOI":"10.3390\/robotics10020073"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"2258","DOI":"10.1109\/TII.2019.2933443","article-title":"Distributed reinforcement learning algorithm for dynamic economic dispatch with unknown generation cost functions","volume":"16","author":"Dai","year":"2019","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"6932","DOI":"10.1109\/LRA.2020.3026638","article-title":"Mobile robot path planning in dynamic environments through globally guided reinforcement learning","volume":"5","author":"Wang","year":"2020","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1109\/TSMCC.2007.913919","article-title":"A comprehensive survey of multiagent reinforcement learning","volume":"38","author":"Busoniu","year":"2008","journal-title":"IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.)"},{"key":"ref_20","unstructured":"De Witt, C.S., Peng, B., Kamienny, P.A., Torr, P.H., B\u00f6hmer, W., and Whiteson, S. (2020). Deep multi-agent reinforcement learning for decentralized continuous cooperative control. arXiv."},{"key":"ref_21","unstructured":"Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., and Riedmiller, M. (2014, January 21\u201326). Deterministic policy gradient algorithms. Proceedings of the 31st International Conference on Machine Learning, Beijing, China."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"e12360","DOI":"10.1111\/exsy.12360","article-title":"Path planning of humanoids based on artificial potential field method in unknown environments","volume":"36","author":"Kumar","year":"2019","journal-title":"Expert Syst."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1038\/s41586-021-04301-9","article-title":"Magnetic control of tokamak plasmas through deep reinforcement learning","volume":"602","author":"Degrave","year":"2022","journal-title":"Nature"}],"container-title":["Robotics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2218-6581\/11\/2\/35\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:38:52Z","timestamp":1760135932000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2218-6581\/11\/2\/35"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,18]]},"references-count":23,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2022,4]]}},"alternative-id":["robotics11020035"],"URL":"https:\/\/doi.org\/10.3390\/robotics11020035","relation":{},"ISSN":["2218-6581"],"issn-type":[{"type":"electronic","value":"2218-6581"}],"subject":[],"published":{"date-parts":[[2022,3,18]]}}}