{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,23]],"date-time":"2026-03-23T12:51:01Z","timestamp":1774270261110,"version":"3.50.1"},"reference-count":38,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2021,10,29]],"date-time":"2021-10-29T00:00:00Z","timestamp":1635465600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62003267"],"award-info":[{"award-number":["62003267"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100007128","name":"Natural Science Foundation of Shaanxi Province","doi-asserted-by":"publisher","award":["2020JQ-220"],"award-info":[{"award-number":["2020JQ-220"]}],"id":[{"id":"10.13039\/501100007128","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Open Project of Science and Technology on Electronic Information Control Laboratory","award":["JS20201100339"],"award-info":[{"award-number":["JS20201100339"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>A pursuit\u2013evasion game is a classical maneuver confrontation problem in the multi-agent systems (MASs) domain. An online decision technique based on deep reinforcement learning (DRL) was developed in this paper to address the problem of environment sensing and decision-making in pursuit\u2013evasion games. A control-oriented framework built on the DRL-based multi-agent deep deterministic policy gradient (MADDPG) algorithm was developed to implement multi-agent cooperative decision-making, avoiding the tedious state-variable specification required by traditional, complicated modeling processes. 
To address the effects of modeling errors between the simulated and real scenarios, this paper introduces adversarial disturbances and proposes a novel adversarial attack trick together with an adversarial learning MADDPG (A2-MADDPG) algorithm. By applying the adversarial attack trick to the agents themselves, the algorithm models real-world uncertainties, which makes training more robust. During the training process, adversarial learning was incorporated into the algorithm to preprocess the actions of multiple agents, enabling them to respond properly to uncertain dynamic changes in MASs. Experimental results verified that the proposed approach is effective for both pursuers and evaders, each of which learns a corresponding confrontation strategy during training.<\/jats:p>","DOI":"10.3390\/e23111433","type":"journal-article","created":{"date-parts":[[2021,11,1]],"date-time":"2021-11-01T22:21:08Z","timestamp":1635805268000},"page":"1433","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":47,"title":["An Improved Approach towards Multi-Agent Pursuit\u2013Evasion Game Decision-Making Using Deep Reinforcement Learning"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1359-7112","authenticated-orcid":false,"given":"Kaifang","family":"Wan","sequence":"first","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6763-4372","authenticated-orcid":false,"given":"Dingwei","family":"Wu","sequence":"additional","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]},{"given":"Yiwei","family":"Zhai","sequence":"additional","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710072, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1415-4444","authenticated-orcid":false,"given":"Bo","family":"Li","sequence":"additional","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]},{"given":"Xiaoguang","family":"Gao","sequence":"additional","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8167-8566","authenticated-orcid":false,"given":"Zijian","family":"Hu","sequence":"additional","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,10,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.37965\/jait.2020.0065","article-title":"Technologies Supporting Artificial Intelligence and Robotics Application Development","volume":"1","author":"Chen","year":"2021","journal-title":"J. Artif. Intell. Technol."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Wu, D., Wan, K., Gao, X., and Hu, Z. (2021, January 16\u201318). Multiagent Motion Planning Based on Deep Reinforcement Learning in Complex Environments. Proceedings of the 2021 6th International Conference on Control and Robotics Engineering (ICCRE), Beijing, China.","DOI":"10.1109\/ICCRE51898.2021.9435656"},{"key":"ref_3","unstructured":"Czap, H. (2005). Self-Organization and Autonomic Informatics (I), IOS Press."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"12","DOI":"10.4304\/jsw.1.2.12-23","article-title":"A Jxta Based Asynchronous Peer-to-Peer Implementation of Genetic Programming","volume":"1","author":"Folino","year":"2006","journal-title":"J. Softw."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Forestiero, A., Mastroianni, C., Papuzzo, G., and Spezzano, G. (2010, January 17\u201320). 
A proximity-based self-organizing framework for service composition and discovery. Proceedings of the 2010 10th IEEE\/ACM International Conference on Cluster, Cloud and Grid Computing, Melbourne, VIC, Australia.","DOI":"10.1109\/CCGRID.2010.48"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1911","DOI":"10.1109\/TAC.2019.2926554","article-title":"Solutions for Multiagent Pursuit-Evasion Games on Communication Graphs: Finite-Time Capture and Asymptotic Behaviors","volume":"65","author":"Lopez","year":"2019","journal-title":"IEEE Trans. Autom. Control"},{"key":"ref_7","unstructured":"Zhou, Z., and Xu, H. (May, January 27). Mean Field Game and Decentralized Intelligent Adaptive Pursuit Evasion Strategy for Massive Multi-Agent System under Uncertain Environment with Detailed Proof. Proceedings of the Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications II, Online."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Liu, K., Jia, B., Chen, G., Pham, K., and Blasch, E. (2015, January 13\u201317). A real-time orbit SATellites Uncertainty propagation and visualization system using graphics computing unit and multi-threading processing. Proceedings of the IEEE\/AIAA Digital Avionics Systems Conference, Prague, Czech Republic.","DOI":"10.1109\/DASC.2015.7311467"},{"key":"ref_9","unstructured":"(1967). Differential Games. A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization. Math. Gaz., 51, 80."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"(2009). Unification of differential games, generalized solutions of the Hamilton-Jacobi equations, and a stochastic guide. Differ. 
Equ., 45, 1653\u20131668.","DOI":"10.1134\/S0012266109110111"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1016\/j.automatica.2016.04.012","article-title":"Multi-player pursuit\u2013evasion games with one superior evader","volume":"71","author":"Chen","year":"2016","journal-title":"Automatica"},{"key":"ref_12","unstructured":"Hao, W., Cheng, L., and Fang, B. (2014, January 19\u201321). An alliance generation algorithm based on modified particle swarm optimization for multiple emotional robots pursuit-evader problem. Proceedings of the 2014 11th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), Xiamen, China."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"Lecun","year":"2015","journal-title":"Nature"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"408","DOI":"10.1016\/j.tics.2019.02.006","article-title":"Reinforcement Learning, Fast and Slow","volume":"23","author":"Botvinick","year":"2019","journal-title":"Trends Cogn. Sci."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1016\/j.neucom.2020.06.031","article-title":"Cooperative control for multi-player pursuit\u2013evasion games with reinforcement learning","volume":"412","author":"Wang","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Xu, G., Zhao, Y., and Liu, H. (2019, January 22\u201324). Pursuit and evasion game between UVAs based on multi-agent reinforcement learning. Proceedings of the 2019 Chinese Automation Congress (CAC), Hangzhou, China.","DOI":"10.1109\/CAC48633.2019.8997447"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Park, J., Lee, J., Kim, T., Ahn, I., and Park, J. (2021). Co-Evolution of Predator-Prey Ecosystems by Reinforcement Learning Agents. 
Entropy, 23.","DOI":"10.3390\/e23040461"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Gu, S., Geng, M., and Lan, L. (2021). Attention-Based Fault-Tolerant Approach for Multi-Agent Reinforcement Learning Systems. Entropy, 23.","DOI":"10.3390\/e23091133"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Sewak, M. (2019). Deep q network (dqn), double dqn, and dueling dqn. Deep Reinforcement Learning, Springer.","DOI":"10.1007\/978-981-13-8285-7_8"},{"key":"ref_20","unstructured":"Liu, B., Ye, X., Dong, X., and Ni, L. (2017). Branching improved Deep Q Networks for solving pursuit\u2013evasion strategy solution of spacecraft. J. Ind. Manag. Optim., 13."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"3416","DOI":"10.1007\/s11227-018-2591-3","article-title":"A novel approach for multi-agent cooperative pursuit to capture grouped evaders","volume":"76","author":"Qadir","year":"2020","journal-title":"J. Supercomput."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Singh, G., Lofaro, D.M., and Sofge, D. (2020, January 22\u201324). Pursuit-evasion with Decentralized Robotic Swarm in Continuous State Space and Action Space via Deep Reinforcement Learning. Proceedings of the 12th International Conference on Agents and Artificial Intelligence, Valletta, Malta.","DOI":"10.5220\/0008971502260233"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"e27","DOI":"10.1002\/adc2.27","article-title":"Cooperatively pursuing a target unmanned aerial vehicle by multiple unmanned aerial vehicles based on multiagent reinforcement learning","volume":"2","author":"Wang","year":"2020","journal-title":"Adv. Control Appl. Eng. Ind. Syst."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"932","DOI":"10.1016\/j.dt.2020.05.022","article-title":"A new energy efficient management approach for wireless sensor networks in target tracking","volume":"17","author":"Pang","year":"2021","journal-title":"Def. 
Technol."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"419","DOI":"10.1007\/s10846-019-00996-1","article-title":"Optimizing evasive strategies for an evader with imperfect vision capacity","volume":"96","author":"Di","year":"2019","journal-title":"J. Intell. Robot. Syst."},{"key":"ref_26","unstructured":"Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., and Mordatch, I. (2017). Multi-agent actor\u2013critic for mixed cooperative-competitive environments. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zhang, B.H., Lemoine, B., and Mitchell, M. (2018, January 2\u20133). Mitigating unwanted biases with adversarial learning. Proceedings of the 2018 AAAI\/ACM Conference on AI, Ethics, and Society, New Orleans, LA, USA.","DOI":"10.1145\/3278721.3278779"},{"key":"ref_28","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Li, B., Gan, Z., Chen, D., and Sergey Aleksandrovich, D. (2020). UAV Maneuvering Target Tracking in Uncertain Environments Based on Deep Reinforcement Learning and Meta-Learning. Remote Sens., 12.","DOI":"10.3390\/rs12223789"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1016\/j.dt.2020.11.014","article-title":"Maneuvering target tracking of UAV based on MN-DDPG and transfer learning","volume":"17","author":"Li","year":"2021","journal-title":"Def. Technol."},{"key":"ref_31","unstructured":"Li, S., Wu, Y., Cui, X., Dong, H., Fang, F., and Russell, S. (February, January 27). Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_32","unstructured":"Goodfellow, I.J., Shlens, J., and Szegedy, C. (2014). Explaining and harnessing adversarial examples. 
arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Liu, Y., Mao, S., Mei, X., Yang, T., and Zhao, X. (2019, January 6\u20139). Sensitivity of Adversarial Perturbation in Fast Gradient Sign Method. Proceedings of the 2019 IEEE Symposium Series on Computational Intelligence (SSCI), Xiamen, China.","DOI":"10.1109\/SSCI44817.2019.9002856"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"441","DOI":"10.1287\/moor.12.3.441","article-title":"The complexity of Markov decision processes","volume":"12","author":"Papadimitriou","year":"1987","journal-title":"Math. Oper. Res."},{"key":"ref_35","unstructured":"Guo, H., Liu, T., Wang, Y., Chen, F., and Fan, J. (2006, January 21\u201323). Research on actor\u2013critic reinforcement learning in RoboCup. Proceedings of the 2006 6th World Congress on Intelligent Control and Automation, Dalian, China."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Wan, K., Gao, X., Hu, Z., and Wu, G. (2020). Robust motion control for UAV in dynamic uncertain environments using deep reinforcement learning. Remote Sens., 12.","DOI":"10.3390\/rs12040640"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"750","DOI":"10.1007\/s10458-019-09421-1","article-title":"A survey and critique of multiagent deep reinforcement learning","volume":"33","author":"Kartal","year":"2019","journal-title":"Auton. Agents Multi-Agent Syst."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Vivek, B., and Babu, R.V. (2020, January 13\u201319). Single-step adversarial training with dropout scheduling. 
Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00103"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/11\/1433\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:23:18Z","timestamp":1760167398000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/11\/1433"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,29]]},"references-count":38,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2021,11]]}},"alternative-id":["e23111433"],"URL":"https:\/\/doi.org\/10.3390\/e23111433","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,29]]}}}