{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,10]],"date-time":"2026-06-10T16:50:12Z","timestamp":1781110212414,"version":"3.54.1"},"reference-count":37,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,9,21]],"date-time":"2023-09-21T00:00:00Z","timestamp":1695254400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Neurorobot."],"abstract":"<jats:p>Unmanned Aerial Vehicles (UAVs) have gained popularity due to their low lifecycle cost and minimal human risk, resulting in their widespread use in recent years. In the UAV swarm cooperative decision domain, multi-agent deep reinforcement learning has significant potential. However, current approaches are challenged by the multivariate mission environment and mission time constraints. In light of this, the present study proposes a meta-learning based multi-agent deep reinforcement learning approach that provides a viable solution to this problem. This paper presents an improved MAML-based multi-agent deep deterministic policy gradient (MADDPG) algorithm that achieves an unbiased initialization network by automatically assigning weights to meta-learning trajectories. In addition, a Reward-TD prioritized experience replay technique is introduced, which takes into account immediate reward and TD-error to improve the resilience and sample utilization of the algorithm. Experiment results show that the proposed approach effectively accomplishes the task in the new scenario, with significantly improved task success rate, average reward, and robustness compared to existing methods.<\/jats:p>","DOI":"10.3389\/fnbot.2023.1243174","type":"journal-article","created":{"date-parts":[[2023,9,21]],"date-time":"2023-09-21T15:19:15Z","timestamp":1695309555000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":18,"title":["MW-MADDPG: a meta-learning based decision-making method for collaborative UAV swarm"],"prefix":"10.3389","volume":"17","author":[{"given":"Minrui","family":"Zhao","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Gang","family":"Wang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Qiang","family":"Fu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xiangke","family":"Guo","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yu","family":"Chen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tengda","family":"Li","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"XiangYu","family":"Liu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1965","published-online":{"date-parts":[[2023,9,21]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"17","DOI":"10.37105\/sd.4","article-title":"Military use of unmanned aerial vehicles-a historical study","volume":"4","author":"Aleksander","year":"2018","journal-title":"Saf. Def"},{"key":"B2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2301.08028","article-title":"Survey of meta-reinforcement learning","author":"Beck","year":"2023","journal-title":"arXiv"},{"key":"B3","doi-asserted-by":"publisher","first-page":"102324","DOI":"10.1016\/j.adhoc.2020.102324","article-title":"A comprehensive review of unmanned aerial vehicle attacks and neutralization techniques","volume":"111","author":"Chamola","year":"2021","journal-title":"Ad Hoc Netw"},{"key":"B4","doi-asserted-by":"publisher","first-page":"5374","DOI":"10.1109\/TNNLS.2021.3070584","article-title":"Multiagent meta-reinforcement learning for adaptive multipath routing optimization","volume":"33","author":"Chen","year":"2022","journal-title":"IEEE Trans. Neural Netw. Learn. Syst"},{"key":"B5","doi-asserted-by":"publisher","first-page":"47","DOI":"10.1038\/s41586-022-05172-4","article-title":"Discovering faster matrix multiplication algorithms with reinforcement learning","volume":"610","author":"Fawzi","year":"2022","journal-title":"Nature"},{"key":"B6","doi-asserted-by":"publisher","DOI":"10.1016\/j.cja.2023.03.044","article-title":"Electromagnetic interference modeling and elimination for a solar\/hydrogen hybrid powered small-scale UAV","author":"Ge","year":"2023","journal-title":"Chin. J. Aeronaut"},{"key":"B7","doi-asserted-by":"publisher","first-page":"271","DOI":"10.1002\/sys.21477","article-title":"A mission-based architecture for swarm unmanned systems","volume":"22","author":"Giles","year":"2019","journal-title":"Syst. Eng"},{"key":"B8","doi-asserted-by":"publisher","first-page":"5149","DOI":"10.1109\/TPAMI.2021.3079209","article-title":"Meta-learning in neural networks: a survey","volume":"44","author":"Hospedales","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"B9","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1109\/SMC.2017.8122622","article-title":"A novel DDPG method with prioritized experience replay","volume-title":"2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC)","author":"Hou","year":"2017"},{"key":"B10","doi-asserted-by":"publisher","first-page":"377","DOI":"10.1016\/j.cja.2022.09.008","article-title":"Imaginary filtered hindsight experience replay for UAV tracking dynamic targets in large-scale unknown environments","volume":"36","author":"Hu","year":"2023","journal-title":"Chin. J. Aeronaut"},{"key":"B11","doi-asserted-by":"publisher","first-page":"6388","DOI":"10.1109\/TNNLS.2021.3079148","article-title":"Attention-based meta-reinforcement learning for tracking control of AUV with time-varying dynamics","volume":"33","author":"Jiang","year":"2022","journal-title":"IEEE Trans. Neural Netw. Learn. Syst"},{"key":"B12","doi-asserted-by":"publisher","first-page":"109644","DOI":"10.1016\/j.comnet.2023.109644","article-title":"Equalizing service probability in UAV-assisted wireless powered mmWave networks for post-disaster rescue","volume":"225","author":"Jin","year":"2023","journal-title":"Comput. Netw"},{"key":"B13","doi-asserted-by":"publisher","first-page":"386","DOI":"10.1109\/MNET.011.2000388","article-title":"Toward intelligent cooperation of UAV swarms: when machine learning meets digital twin","volume":"35","author":"Lei","year":"2021","journal-title":"IEEE Netw"},{"key":"B14","doi-asserted-by":"publisher","first-page":"108875","DOI":"10.1016\/j.patcog.2022.108875","article-title":"Clustering experience replay for the effective exploitation in reinforcement learning","volume":"131","author":"Li","year":"2022","journal-title":"Pattern Recognit"},{"key":"B15","doi-asserted-by":"publisher","first-page":"2100","DOI":"10.1109\/TITS.2020.3040557","article-title":"Novel UAV-enabled data collection scheme for intelligent transportation system through UAV speed control","volume":"22","author":"Li","year":"2021","journal-title":"IEEE Trans. Intell. Transp. Syst"},{"key":"B16","doi-asserted-by":"publisher","first-page":"5926","DOI":"10.1109\/TITS.2020.3042670","article-title":"An iterative two-phase optimization method based on divide and conquer framework for integrated scheduling of multiple UAVs","volume":"22","author":"Liu","year":"2021","journal-title":"IEEE Trans. Intell. Transp. Syst"},{"key":"B17","doi-asserted-by":"publisher","first-page":"1406","DOI":"10.3390\/rs14061406","article-title":"Swarm scheduling method for remote sensing observations during emergency scenarios","volume":"14","author":"Liu","year":"","journal-title":"Remote Sens"},{"key":"B18","doi-asserted-by":"publisher","first-page":"8085","DOI":"10.1109\/JSTARS.2022.3206399","article-title":"YOLOv5-tassel: detecting tassels in RGB UAV imagery with improved YOLOv5 based on transfer learning","volume":"15","author":"Liu","year":"","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens"},{"key":"B19","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"B20","doi-asserted-by":"publisher","first-page":"570","DOI":"10.1002\/asjc.2806","article-title":"Formation control of unmanned aerial vehicle swarms: a comprehensive review","volume":"25","author":"Ouyang","year":"2023","journal-title":"Asian J. Control"},{"key":"B21","doi-asserted-by":"publisher","first-page":"1486","DOI":"10.1109\/TCDS.2021.3110959","article-title":"A dynamically adaptive approach to reducing strategic interference for multiagent systems","volume":"14","author":"Pan","year":"2022","journal-title":"IEEE Trans. Cogn. Develop. Syst"},{"key":"B22","doi-asserted-by":"publisher","first-page":"14224","DOI":"10.1109\/TITS.2022.3155072","article-title":"The drone scheduling problem: a systematic state-of-the-art review","volume":"23","author":"Pasha","year":"2022","journal-title":"IEEE Trans. Intell. Transp. Syst"},{"key":"B23","doi-asserted-by":"publisher","first-page":"990","DOI":"10.1126\/science.add4679","article-title":"Mastering the game of Stratego with model-free multiagent reinforcement learning","volume":"378","author":"Perolat","year":"2022","journal-title":"Science"},{"key":"B24","doi-asserted-by":"publisher","first-page":"100469","DOI":"10.1016\/j.vehcom.2022.100469","article-title":"Task assignment algorithms for unmanned aerial vehicle networks: a comprehensive survey","volume":"35","author":"Poudel","year":"2022","journal-title":"Veh. Commun"},{"key":"B25","doi-asserted-by":"publisher","first-page":"153","DOI":"10.1007\/s00521-021-06569-4","article-title":"A review of artificial intelligence applied to path planning in UAV swarms","volume":"34","author":"Puente-Castro","year":"2022","journal-title":"Neural Comput. Appl"},{"key":"B26","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1016\/j.eswa.2016.10.044","article-title":"Analysing temporal performance profiles of UAV operators using time series clustering","volume":"70","author":"Rodriguez-Fernandez","year":"2017","journal-title":"Expert Syst. Appl"},{"key":"B27","doi-asserted-by":"publisher","first-page":"106053","DOI":"10.1016\/j.ast.2020.106053","article-title":"Design and real-time implementation of a wireless autopilot using multivariable predictive generalized minimum variance control in the state-space","volume":"105","author":"Silveira","year":"2020","journal-title":"Aerosp. Sci. Technol"},{"key":"B28","doi-asserted-by":"publisher","first-page":"4295","DOI":"10.1007\/s10462-022-10281-7","article-title":"Swarm intelligence algorithms for multiple unmanned aerial vehicles collaboration: a comprehensive review","volume":"56","author":"Tang","year":"2023","journal-title":"Artif. Intell. Rev"},{"key":"B29","doi-asserted-by":"publisher","first-page":"3362","DOI":"10.3934\/jimo.2022089","article-title":"A mini review on UAV mission planning","volume":"19","author":"Wang","year":"2022","journal-title":"J. Ind. Manag. Optim"},{"key":"B30","doi-asserted-by":"publisher","first-page":"109072","DOI":"10.1016\/j.knosys.2022.109072","article-title":"A task allocation algorithm for a swarm of unmanned aerial vehicles based on bionic wolf pack method","volume":"250","author":"Wang","year":"2022","journal-title":"Knowl. Based Syst"},{"key":"B31","doi-asserted-by":"publisher","first-page":"108439","DOI":"10.1016\/j.comnet.2021.108439","article-title":"Computation offloading over multi-UAV MEC network: a distributed deep reinforcement learning approach","volume":"199","author":"Wei","year":"2021","journal-title":"Comput. Netw"},{"key":"B32","doi-asserted-by":"publisher","first-page":"223","DOI":"10.1038\/s41586-021-04357-7","article-title":"Outracing champion Gran Turismo drivers with deep reinforcement learning","volume":"602","author":"Wurman","year":"2022","journal-title":"Nature"},{"key":"B33","doi-asserted-by":"publisher","first-page":"124","DOI":"10.1016\/j.neucom.2020.08.034","article-title":"Meta weight learning via model-agnostic meta-learning","volume":"432","author":"Xu","year":"2021","journal-title":"Neurocomputing"},{"key":"B34","doi-asserted-by":"publisher","first-page":"1582","DOI":"10.1007\/s10489-021-02502-3","article-title":"A distributed task reassignment method in dynamic environment for multi-UAV system","volume":"52","author":"Yang","year":"2022","journal-title":"Appl. Intell"},{"key":"B35","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1109\/MWC.011.2100036","article-title":"Joint optimization of control and communication in autonomous UAV swarms: challenges, potentials, and framework","volume":"28","author":"Yao","year":"2021","journal-title":"IEEE Wirel. Commun"},{"key":"B36","doi-asserted-by":"publisher","first-page":"107994","DOI":"10.1016\/j.cie.2022.107994","article-title":"Helicopter-UAVs search and rescue task allocation considering UAVs operating environment and performance","volume":"167","author":"Zhang","year":"2022","journal-title":"Comput. Ind. Eng"},{"key":"B37","doi-asserted-by":"publisher","first-page":"47","DOI":"10.1016\/j.patrec.2022.11.031","article-title":"A multi-scenario text generation method based on meta reinforcement learning","volume":"165","author":"Zhao","year":"2023","journal-title":"Pattern Recognit. Lett"}],"container-title":["Frontiers in Neurorobotics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2023.1243174\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,21]],"date-time":"2023-09-21T15:19:19Z","timestamp":1695309559000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2023.1243174\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,21]]},"references-count":37,"alternative-id":["10.3389\/fnbot.2023.1243174"],"URL":"https:\/\/doi.org\/10.3389\/fnbot.2023.1243174","relation":{},"ISSN":["1662-5218"],"issn-type":[{"value":"1662-5218","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,21]]},"article-number":"1243174"}}