{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,1]],"date-time":"2025-10-01T15:24:10Z","timestamp":1759332250433,"version":"3.41.2"},"reference-count":32,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,1,10]],"date-time":"2023-01-10T00:00:00Z","timestamp":1673308800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62106284"],"award-info":[{"award-number":["62106284"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Neurorobot."],"abstract":"<jats:p>A system with multiple cooperating unmanned aerial vehicles (multi-UAVs) can use its advantages to accomplish complicated tasks. Recent developments in deep reinforcement learning (DRL) offer good prospects for decision-making for multi-UAV systems. However, the safety and training efficiency of DRL still need to be improved before practical use. This study presents a transfer-safe soft actor-critic (TSSAC) for multi-UAV decision-making. Decision-making by each UAV is modeled with a constrained Markov decision process (CMDP), in which the return is maximized subject to a safety constraint. The soft actor-critic-Lagrangian (SAC-Lagrangian) algorithm is combined with a modified Lagrangian multiplier in the CMDP model. Moreover, parameter-based transfer learning is used to enable cooperative and efficient training of the multi-UAV tasks. Simulation experiments indicate that the proposed method can improve the safety and training efficiency and allow the UAVs to adapt to a dynamic scenario.<\/jats:p>","DOI":"10.3389\/fnbot.2022.1105480","type":"journal-article","created":{"date-parts":[[2023,1,10]],"date-time":"2023-01-10T21:22:29Z","timestamp":1673385749000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Research on reinforcement learning-based safe decision-making methodology for multiple unmanned aerial vehicles"],"prefix":"10.3389","volume":"16","author":[{"given":"Longfei","family":"Yue","sequence":"first","affiliation":[]},{"given":"Rennong","family":"Yang","sequence":"additional","affiliation":[]},{"given":"Ying","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Jialiang","family":"Zuo","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2023,1,10]]},"reference":[{"key":"B1","first-page":"1","article-title":"Constrained policy optimization","author":"Achiam","year":"2017","journal-title":"Proceedings of the International Conference on Machine Learning"},{"volume-title":"Constrained Markov Decision Processes: Stochastic Modeling, 1st Edn.","year":"1999","author":"Altman","key":"B2"},{"key":"B3","doi-asserted-by":"publisher","first-page":"353","DOI":"10.1080\/00401706.1995.10484354","article-title":"Markov decision processes: discrete stochastic dynamic programming","volume":"37","author":"Baxter","year":"1995","journal-title":"Technometrics"},{"key":"B4","doi-asserted-by":"publisher","DOI":"10.1016\/B978-0-12-093480-5.50005-2","author":"Bertsekas","year":"1982","journal-title":"Constrained Optimization and Lagrange Multiplier Methods, 1st Edn"},{"key":"B5","doi-asserted-by":"publisher","first-page":"537","DOI":"10.1007\/s10115-013-0665-3","article-title":"Transfer learning for activity recognition: a survey","volume":"36","author":"Cook","year":"2013","journal-title":"Knowl. Inform. Syst."},{"key":"B6","doi-asserted-by":"publisher","DOI":"10.1109\/ICCNC.2016.7440563","article-title":"UAV-assisted disaster management: Applications and open issues","author":"Erdelj","year":"2016","journal-title":"Proceedings of the IEEE International Conference on Computing, Networking and Communications"},{"key":"B7","doi-asserted-by":"publisher","first-page":"2167","DOI":"10.4172\/2167-0374.1000144","article-title":"Genetic fuzzy based artificial intelligence for unmanned combat aerial vehicle control in simulated air combat missions","volume":"6","author":"Ernest","year":"2016","journal-title":"J. Defense Manag."},{"key":"B8","doi-asserted-by":"publisher","DOI":"10.1109\/MDM.2016.96","article-title":"The use of autonomous UAVs to improve pesticide application in crop fields","author":"Faical","year":"2016","journal-title":"Proceedings of 17th IEEE International Conference on Mobile Data Management"},{"journal-title":"UAV Swarm Tactics: An Agent-Based Simulation and Markov Process Analysis.","year":"2013","author":"Gaertner","key":"B9"},{"key":"B10","doi-asserted-by":"publisher","DOI":"10.1109\/BRACIS.2016.027","article-title":"Towards knowledge transfer in deep reinforcement learning","author":"Glatt","year":"2016","journal-title":"Proceedings of 2016 5th Brazilian Conference Intelligent Systems (BRACIS)"},{"article-title":"Learning to walk in the real world with minimal human effort","year":"2020","author":"Ha","key":"B11"},{"key":"B12","first-page":"1861","article-title":"Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor","author":"Haarnoja","year":"2018","journal-title":"Proceedings of the 35th International Conference on Machine Learning"},{"article-title":"Soft actor-critic algorithms and applications","year":"2018","author":"Haarnoja","key":"B13"},{"key":"B14","doi-asserted-by":"publisher","first-page":"678","DOI":"10.5139\/JKSAS.2019.47.9.678","article-title":"Analysis of SEAD mission procedures for manned-unmanned aerial vehicles teaming","volume":"47","author":"Kim","year":"2019","journal-title":"J. Korean Soc. Aeronaut. Space Sci."},{"article-title":"Adam: A method for stochastic optimization","year":"2014","author":"Kingma","key":"B15"},{"key":"B16","doi-asserted-by":"publisher","DOI":"10.1109\/IVCNZ.2008.4762118","article-title":"Knowledge-based power line detection for UAV surveillance and inspection systems","author":"Li","year":"2008","journal-title":"Proceedings of 23rd International Conference on Image and Vision Computing"},{"key":"B17","doi-asserted-by":"publisher","first-page":"445","DOI":"10.1038\/nature14540","article-title":"Reinforcement learning improves behaviour from evaluative feedback","volume":"521","author":"Littman","year":"2015","journal-title":"Nature"},{"key":"B18","doi-asserted-by":"publisher","first-page":"63504","DOI":"10.1109\/ACCESS.2019.2914352","article-title":"Cooperative routing problem for ground vehicle and unmanned aerial vehicle: the application on intelligence, surveillance, and reconnaissance missions","volume":"7","author":"Liu","year":"2019","journal-title":"IEEE Access"},{"key":"B19","doi-asserted-by":"publisher","first-page":"313","DOI":"10.1016\/j.cja.2014.02.011","article-title":"Optimization of beamforming and path planning for UAV-assisted wireless relay networks","volume":"27","author":"Ouyang","year":"2014","journal-title":"Chin. J. Aeronaut."},{"key":"B20","first-page":"612","article-title":"Constrained differential optimization","author":"Platt","year":"1987","journal-title":"Proceedings of Conference and Workshop on Neural Information Processing Systems"},{"key":"B21","doi-asserted-by":"publisher","first-page":"4883","DOI":"10.1007\/s00500-016-2376-7","article-title":"Solving complex multi-UAV mission planning problems using multi-objective genetic algorithms","volume":"21","author":"Ramirez","year":"2016","journal-title":"Soft Comput."},{"key":"B22","unstructured":"Ray A., Achiam J., Amodei D. Benchmarking Safe Exploration in Deep Reinforcement Learning, 1\u201325, 2019"},{"article-title":"Proximal policy optimization algorithms","year":"2017","author":"Schulman","key":"B23"},{"key":"B24","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/JIOT.2020.3020067","article-title":"Drone-cell trajectory planning and resource allocation for highly mobile networks: a hierarchical DRL approach","volume":"99","author":"Shi","year":"2020","journal-title":"IEEE Internet Things J."},{"key":"B25","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-37731-1_62","article-title":"Meta transfer learning for adaptive vehicle tracking in UAV videos","author":"Song","year":"2020","journal-title":"Proceedings of the 26th International Conference on MultiMedia Modeling (MMM 2020), Daejeon, South Korea"},{"key":"B26","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.engappai.2020.104112","article-title":"Multi-agent hierarchical policy gradient for air combat tactics emergence via self-play","volume":"98","author":"Sun","year":"2020","journal-title":"Eng. Appl. Artif. Intell."},{"key":"B27","first-page":"7","author":"Winnefeld","year":"2011","journal-title":"Unmanned Systems Integrated Roadmap"},{"key":"B28","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i12.17272","article-title":"WCSAC: Worst-case soft actor critic for safety-constrained reinforcement learning","author":"Yang","year":"2021","journal-title":"Proceedings of Thirty-Fifth AAAI Conference on Artificial Intelligence"},{"key":"B29","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-27535-8_48","article-title":"Multi-agent reinforcement learning for swarm confrontation environments","author":"Zhang","year":"2019","journal-title":"Proceedings of Intelligent Robotics and Applications (ICIRA 2019)"},{"key":"B30","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.cja.2020.03.031","article-title":"Adaptive level of autonomy for human-UAVs collaborative surveillance using situated fuzzy cognitive maps","volume":"33","author":"Zhao","year":"2020","journal-title":"Chin. J. Aeronaut."},{"key":"B31","doi-asserted-by":"publisher","first-page":"402","DOI":"10.1016\/j.ast.2018.01.035","article-title":"Cooperative search-attack mission planning for multi-UAV based on intelligent self-organized algorithm","volume":"76","author":"Zhen","year":"2018","journal-title":"Aerosp. Sci. Technol."},{"key":"B32","doi-asserted-by":"publisher","first-page":"35551","DOI":"10.1109\/ACCESS.2018.2843773","article-title":"Feature-based transfer learning based on distribution similarity","volume":"6","author":"Zhong","year":"2018","journal-title":"IEEE Access"}],"container-title":["Frontiers in Neurorobotics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2022.1105480\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,10]],"date-time":"2023-01-10T21:22:43Z","timestamp":1673385763000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2022.1105480\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,10]]},"references-count":32,"alternative-id":["10.3389\/fnbot.2022.1105480"],"URL":"https:\/\/doi.org\/10.3389\/fnbot.2022.1105480","relation":{},"ISSN":["1662-5218"],"issn-type":[{"type":"electronic","value":"1662-5218"}],"subject":[],"published":{"date-parts":[[2023,1,10]]},"article-number":"1105480"}}