{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,30]],"date-time":"2026-03-30T20:48:10Z","timestamp":1774903690095,"version":"3.50.1"},"reference-count":33,"publisher":"MDPI AG","issue":"16","license":[{"start":{"date-parts":[[2020,8,13]],"date-time":"2020-08-13T00:00:00Z","timestamp":1597276800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"This work was supported by The National Key Research and Development Program of China","award":["2017YFC0822403"],"award-info":[{"award-number":["2017YFC0822403"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Multiple unmanned aerial vehicle (UAV) collaboration has great potential. To increase the intelligence and environmental adaptability of multi-UAV control, we study the application of deep reinforcement learning algorithms in the field of multi-UAV cooperative control. Aiming at the problem of a non-stationary environment caused by the change of learning agent strategy in reinforcement learning in a multi-agent environment, the paper presents an improved multiagent reinforcement learning algorithm\u2014the multiagent joint proximal policy optimization (MAJPPO) algorithm with the centralized learning and decentralized execution. This algorithm uses the moving window averaging method to make each agent obtain a centralized state value function, so that the agents can achieve better collaboration. The improved algorithm enhances the collaboration and increases the sum of reward values obtained by the multiagent system. To evaluate the performance of the algorithm, we use the MAJPPO algorithm to complete the task of multi-UAV formation and the crossing of multiple-obstacle environments. To simplify the control complexity of the UAV, we use the six-degree of freedom and 12-state equations of the dynamics model of the UAV with an attitude control loop. The experimental results show that the MAJPPO algorithm has better performance and better environmental adaptability.<\/jats:p>","DOI":"10.3390\/s20164546","type":"journal-article","created":{"date-parts":[[2020,8,14]],"date-time":"2020-08-14T08:28:35Z","timestamp":1597393715000},"page":"4546","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":35,"title":["Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance"],"prefix":"10.3390","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0701-2763","authenticated-orcid":false,"given":"Weiwei","family":"Zhao","sequence":"first","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, No. 3888, Dongnanhu Rd., Changchun 130033, China"},{"name":"University of Chinese Academy of Sciences, No. 19, Yuquan Rd., Beijing 100049, China"}]},{"given":"Hairong","family":"Chu","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, No. 3888, Dongnanhu Rd., Changchun 130033, China"}]},{"given":"Xikui","family":"Miao","sequence":"additional","affiliation":[{"name":"School of Information Engineering, Henan University of Science and Technology, Luoyang 471000, China"}]},{"given":"Lihong","family":"Guo","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, No. 3888, Dongnanhu Rd., Changchun 130033, China"}]},{"given":"Honghai","family":"Shen","sequence":"additional","affiliation":[{"name":"Key Laboratory of Airborne Optical Imaging and Measurement, Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, No. 3888, Dong Nanhu Road, Changchun 130033, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5784-0527","authenticated-orcid":false,"given":"Chenhao","family":"Zhu","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, No. 3888, Dongnanhu Rd., Changchun 130033, China"},{"name":"University of Chinese Academy of Sciences, No. 19, Yuquan Rd., Beijing 100049, China"}]},{"given":"Feng","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Aviation Operations and Services, Aviation University of the Air Force, No. 2222, Dongnanhu Rd., Changchun 130022, China"}]},{"given":"Dongxin","family":"Liang","sequence":"additional","affiliation":[{"name":"Xi\u2019an Jiaotong University Health Science Center, No. 76, Yanta West Road, Xi\u2019an 710061, China"}]}],"member":"1968","published-online":{"date-parts":[[2020,8,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1109\/TAC.2005.864190","article-title":"Flocking for multi-agent dynamic systems: Algorithms and theory","volume":"51","year":"2006","journal-title":"IEEE Trans. Autom. Control"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Bin, F., Xiaofeng, F., and Shuo, X. (2017, January 9\u201310). Research on Cooperative Collision Avoidance Problem of Multiple UAV Based on Reinforcement Learning. Proceedings of the International Conference on Intelligent Computation Technology & Automation 2017, Changsha, China.","DOI":"10.1109\/ICICTA.2017.30"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1109\/TCST.2014.2312392","article-title":"Multirobot cooperative learning for predator avoidance","volume":"23","author":"La","year":"2015","journal-title":"IEEE Trans. Control Syst. Technol."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"186","DOI":"10.1109\/TCYB.2015.2509646","article-title":"A Q-learning approach to flocking with UAVs in a stochastic environment","volume":"47","author":"Hung","year":"2017","journal-title":"IEEE Trans. Cybern."},{"key":"ref_5","unstructured":"Pham, H.X., La, H.M., Feil-Seifer, D., and Nguyen, L.V. (2018). Autonomous uav navigation using reinforcement learning. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1145\/3301273","article-title":"Reinforcement learning for UAV attitude control","volume":"3","author":"Koch","year":"2019","journal-title":"ACM Trans. Cyber Phys. Syst."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1109\/TSMCC.2007.913919","article-title":"A comprehensive survey of multiagent reinforcement learning","volume":"38","author":"Busoniu","year":"2008","journal-title":"IEEE Trans. Syst. Man Cybern. Part C Appl. Rev."},{"key":"ref_8","unstructured":"Kapoor, S. (2018). Multi-agent reinforcement learning: A report on challenges and approaches. arXiv."},{"key":"ref_9","unstructured":"Nguyen, T.T., Nguyen, N.D., and Nahavandi, S. (2018). Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Tan, M. (1993, January 27\u201329). Multi-agent reinforcement learning: Independent vs. cooperative agents. Proceedings of the Tenth International Conference on Machine Learning, Amherst, MA, USA.","DOI":"10.1016\/B978-1-55860-307-3.50049-6"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Tampuu, A., Matiisen, T., Kodelja, D., Kuzovkin, I., Korjus, K., Aru, J., Aru, J., and Vicente, R. (2017). Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE, 12.","DOI":"10.1371\/journal.pone.0172395"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Foerster, J., Farquhar, G., Afouras, T., Nardelli, N., and Whiteson, S. (2018, January 2\u20137). Counterfactual multi-agent policy gradients. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.11794"},{"key":"ref_13","unstructured":"Lowe, R., Wu, Y.I., Tamar, A., Harb, J., Abbeel, O.P., and Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in Neural Information Processing Systems, Curran Associates Inc."},{"key":"ref_14","unstructured":"Li, S., Wu, Y., Cui, X., Dong, H., Fang, F., and Russell, S. (February, January 27). Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient. Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), Honolulu, HI, USA."},{"key":"ref_15","unstructured":"Yang, Y., Luo, R., Li, M., Zhou, M., Zhang, W., and Wang, J. (2018). Mean field multi-agent reinforcement learning. arXiv."},{"key":"ref_16","unstructured":"Sunehag, P., Lever, G., Gruslys, A., Czarnecki, W.M., Zambaldi, V., Jaderberg, M., Lanctot, M., Sonnerat, N., Leibo, J.Z., and Tuyls, K. (2017). Value-decomposition networks for cooperative multi-agent learning. arXiv."},{"key":"ref_17","unstructured":"Rashid, T., Samvelyan, M., De Witt, C.S., Farquhar, G., Foerster, J., and Whiteson, S. (2018). QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. arXiv."},{"key":"ref_18","unstructured":"Hong, Z.-W., Su, S.-Y., Shann, T.-Y., Chang, Y.-H., and Lee, C.-Y. (2018, January 10\u201315). A deep policy inference q-network for multi-agent systems. Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Stockholm, Sweden."},{"key":"ref_19","unstructured":"Bansal, T., Pachocki, J., Sidor, S., Sutskever, I., and Mordatch, I. (2017). Emergent complexity via multi-agent competition. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Wang, C., Wang, J., Zhang, X., and Zhang, X. (2017, January 14\u201316). Autonomous navigation of UAV in large-scale unknown complex environment with deep reinforcement learning. Proceedings of the 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Montreal, QC, Canada.","DOI":"10.1109\/GlobalSIP.2017.8309082"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"17798","DOI":"10.1109\/ACCESS.2019.2895643","article-title":"Flocking Control of Fixed-Wing UAVs with Cooperative Obstacle Avoidance Capability","volume":"7","author":"Zhao","year":"2019","journal-title":"IEEE Access"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Guerrero-Castellanos, J.F., Vega-Alonzo, A., Durand, S., Marchand, N., Gonzalez-Diaz, V.R., Casta\u00f1eda-Camacho, J., and Guerrero-S\u00e1nchez, W.F. (2019). Leader-Following Consensus and Formation Control of VTOL-UAVs with Event-Triggered Communications. Sensors, 19.","DOI":"10.3390\/s19245498"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_24","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, MIT Press."},{"key":"ref_25","unstructured":"Hausknecht, M., and Stone, P. (2015, January 12\u201314). Deep recurrent q-learning for partially observable mdps. Proceedings of the 2015 AAAI Fall Symposium Series, Arlington, VA, USA."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Oliehoek, F.A., and Amato, C. (2016). A Concise Introduction to Decentralized POMDPs, Springer International Publishing.","DOI":"10.1007\/978-3-319-28929-8"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1007\/BF00992696","article-title":"Simple statistical gradient-following algorithms for connectionist reinforcement learning","volume":"8","author":"Williams","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_28","unstructured":"Schulman, J., Moritz, P., Levine, S., Jordan, M., and Abbeel, P. (2015). High-dimensional continuous control using generalized advantage estimation. arXiv."},{"key":"ref_29","unstructured":"Schulman, J., Levine, S., Abbeel, P., Jordan, M., and Moritz, P. (2015, January 6\u201311). Trust region policy optimization. Proceedings of the 32nd International Conference on Machine Learning, Lille, France."},{"key":"ref_30","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv."},{"key":"ref_31","unstructured":"Heess, N., TB, D., Sriram, S., Lemmon, J., Merel, J., Wayne, G., Tassa, Y., Erez, T., Wang, Z., and Eslami, S.M. (2017). Emergence of locomotion behaviours in rich environments. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Beard, R.W., and McLain, T.W. (2012). Small Unmanned Aircraft: Theory and Practice, Princeton University Press.","DOI":"10.1515\/9781400840601"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1484","DOI":"10.1016\/j.neunet.2009.05.011","article-title":"Real-time reinforcement learning by sequential actor\u2013critics and experience replay","volume":"22","author":"Wawrzynski","year":"2009","journal-title":"Neural Netw."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/16\/4546\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:00:31Z","timestamp":1760176831000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/16\/4546"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,8,13]]},"references-count":33,"journal-issue":{"issue":"16","published-online":{"date-parts":[[2020,8]]}},"alternative-id":["s20164546"],"URL":"https:\/\/doi.org\/10.3390\/s20164546","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,8,13]]}}}