{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T20:00:39Z","timestamp":1766088039487,"version":"3.37.3"},"reference-count":40,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2023,2,21]],"date-time":"2023-02-21T00:00:00Z","timestamp":1676937600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,2,21]],"date-time":"2023-02-21T00:00:00Z","timestamp":1676937600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["No. 62176088"],"award-info":[{"award-number":["No. 62176088"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Program for Science & Technology Development of Henan Province","award":["212102210412","222102210067"],"award-info":[{"award-number":["212102210412","222102210067"]}]},{"name":"Program for Science & Technology Development of Henan Province","award":["222102210022"],"award-info":[{"award-number":["222102210022"]}]},{"DOI":"10.13039\/501100019005","name":"Young Elite Scientists Sponsorship Program by Tianjin","doi-asserted-by":"publisher","award":["No. 2022HYTP013"],"award-info":[{"award-number":["No. 2022HYTP013"]}],"id":[{"id":"10.13039\/501100019005","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Multi-agent multi-target search strategies can be utilized in complex scenarios such as post-disaster search and rescue by unmanned aerial vehicles. 
To address the problem of searching for fixed targets along fixed trajectories, current multi-agent multi-target search strategies are mainly based on deep reinforcement learning (DRL). However, agents trained by DRL tend to be brittle due to their sensitivity to the training environment, which frequently causes the learned strategies to fall into local optima and results in poor system robustness. Additionally, sparse rewards in DRL lead to problems such as slow system convergence and low utilization efficiency of the sampled data. To address the weakened robustness of agents and the sparse rewards in multi-target search environments, we propose a MiniMax Multi-agent Deep Deterministic Policy Gradient algorithm based on Parallel Hindsight Experience Replay (PHER-M3DDPG), which adopts the framework of centralized training and decentralized execution in a continuous action space. To enhance system robustness, the PHER-M3DDPG algorithm employs a minimax learning architecture, which adaptively adjusts the learning strategies of agents by introducing adversarial disturbances. In addition, to solve the sparse-rewards problem, the PHER-M3DDPG algorithm adopts a parallel hindsight experience replay mechanism that increases the efficiency of data utilization through virtual learning targets and batch processing of the sampled data. 
Simulation results show that the PHER-M3DDPG algorithm outperforms the existing algorithms in terms of convergence speed and the task completion time in a multi-target search environment.<\/jats:p>","DOI":"10.1007\/s40747-023-00985-w","type":"journal-article","created":{"date-parts":[[2023,2,21]],"date-time":"2023-02-21T19:46:16Z","timestamp":1677008776000},"page":"4887-4898","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["Cooperative multi-agent target searching: a deep reinforcement learning approach based on parallel hindsight experience replay"],"prefix":"10.1007","volume":"9","author":[{"given":"Yi","family":"Zhou","sequence":"first","affiliation":[]},{"given":"Zhixiang","family":"Liu","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5984-4588","authenticated-orcid":false,"given":"Huaguang","family":"Shi","sequence":"additional","affiliation":[]},{"given":"Si","family":"Li","sequence":"additional","affiliation":[]},{"given":"Nianwen","family":"Ning","sequence":"additional","affiliation":[]},{"given":"Fuqiang","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Xiaozhi","family":"Gao","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,2,21]]},"reference":[{"issue":"9","key":"985_CR1","doi-asserted-by":"publisher","first-page":"3826","DOI":"10.1109\/TCYB.2020.2977374","volume":"50","author":"TT Nguyen","year":"2020","unstructured":"Nguyen TT, Nguyen ND, Nahavandi S (2020) Deep reinforcement learning for multiagent systems: a review of challenges, solutions, and applications. IEEE Trans Cybern 50(9):3826\u20133839","journal-title":"IEEE Trans Cybern"},{"key":"985_CR2","doi-asserted-by":"publisher","first-page":"346","DOI":"10.1109\/TSMC.2020.2997855","volume":"52","author":"Y Shang","year":"2022","unstructured":"Shang Y (2022) Resilient cluster consensus of multiagent systems. 
IEEE Trans Syst Man Cybern Syst 52:346\u2013356","journal-title":"IEEE Trans Syst Man Cybern Syst"},{"issue":"2","key":"985_CR3","doi-asserted-by":"publisher","first-page":"847","DOI":"10.1109\/TCNS.2020.3038843","volume":"8","author":"S Papaioannou","year":"2021","unstructured":"Papaioannou S, Kolios P, Theocharides T, Panayiotou CG, Polycarpou MM (2021) A cooperative multiagent probabilistic framework for search and track missions. IEEE Trans. Control Netw. Syst. 8(2):847\u2013858","journal-title":"IEEE Trans Control Netw Syst"},{"issue":"13","key":"985_CR4","doi-asserted-by":"publisher","first-page":"660","DOI":"10.1016\/j.oceaneng.2018.12.035","volume":"172","author":"Q Jia","year":"2019","unstructured":"Jia Q, Xu X, Feng X (2019) Research on cooperative area search of multiple underwater robots based on the prediction of initial target information. Ocean Eng. 172(13):660\u2013670","journal-title":"Ocean Eng"},{"issue":"2","key":"985_CR5","doi-asserted-by":"publisher","first-page":"283","DOI":"10.1109\/JAS.2021.1004252","volume":"9","author":"B Li","year":"2022","unstructured":"Li B, Chen B (2022) An adaptive rapidly-exploring random tree. IEEE-CAA J. Autom. Sinica 9(2):283\u2013294","journal-title":"IEEE-CAA J Autom Sin"},{"issue":"2","key":"985_CR6","doi-asserted-by":"publisher","first-page":"1071","DOI":"10.1109\/LRA.2020.2966394","volume":"5","author":"A Wolek","year":"2020","unstructured":"Wolek A, Cheng S, Goswami D, Paley DA (2020) Cooperative mapping and target search over an unknown occupancy graph using mutual information. IEEE Robot. Autom. Lett. 5(2):1071\u20131078","journal-title":"IEEE Robot Autom Lett"},{"issue":"2","key":"985_CR7","doi-asserted-by":"publisher","first-page":"856","DOI":"10.1109\/TCYB.2018.2875625","volume":"50","author":"Z Kashino","year":"2020","unstructured":"Kashino Z, Nejat G, Benhabib B (2020) A hybrid strategy for target search using static and mobile sensors. IEEE Trans. Cybern. 
50(2):856\u2013868","journal-title":"IEEE Trans Cybern"},{"key":"985_CR8","doi-asserted-by":"publisher","first-page":"1995","DOI":"10.1007\/s00521-019-04376-6","volume":"32","author":"Y Yuan","year":"2020","unstructured":"Yuan Y, Tian Z, Wang C, Zheng F, Lv Y (2020) A q-learning-based approach for virtual network embedding in data center. Neural Comput. Appl. 32:1995\u20132004","journal-title":"Neural Comput Appl"},{"key":"985_CR9","doi-asserted-by":"crossref","unstructured":"Foerster J, Farquhar G, Afouras T, Nardelli N, Whiteson S (2018) Counterfactual multi agent policy gradients. In: Proceedings of the AAAI conference on artificial intelligence, New Orleans, Louisiana, USA","DOI":"10.1609\/aaai.v32i1.11794"},{"issue":"12","key":"985_CR10","doi-asserted-by":"publisher","first-page":"7363","DOI":"10.1109\/TSMC.2020.2967936","volume":"51","author":"J Sharma","year":"2021","unstructured":"Sharma J, Andersen P-A, Granmo O-C, Goodwin M (2021) Deep q-learning with q-matrix transfer learning for novel fire evacuation environment. IEEE Trans. Syst. Man Cybern. Syst. 51(12):7363\u20137381","journal-title":"IEEE Trans Syst Man Cybern Syst"},{"issue":"5","key":"985_CR11","doi-asserted-by":"publisher","first-page":"8577","DOI":"10.1109\/JIOT.2019.2921159","volume":"6","author":"C Qiu","year":"2019","unstructured":"Qiu C, Hu Y, Chen Y, Zeng B (2019) Deep deterministic policy gradient (ddpg)-based energy harvesting wireless communications. IEEE Internet Things J. 6(5):8577\u20138588","journal-title":"IEEE Internet Things J"},{"key":"985_CR12","unstructured":"Luo J-Q, Wei C (2019) Obstacle avoidance path planning based on target heuristic and repair genetic algorithms. 
In: IEEE international conference of intelligent applied systems on engineering, Fuzhou, China, pp 44\u20134"},{"issue":"2","key":"985_CR13","doi-asserted-by":"publisher","first-page":"363","DOI":"10.1016\/S1004-4132(08)60093-6","volume":"19","author":"P Li","year":"2008","unstructured":"Li P, Li S (2008) Grover quantum searching algorithm based on weighted targets. J. Syst. Eng. Electron. 19(2):363\u2013369","journal-title":"J Syst Eng Electron"},{"issue":"6","key":"985_CR14","doi-asserted-by":"publisher","first-page":"1032","DOI":"10.1109\/TRO.2010.2073050","volume":"26","author":"I Sisso","year":"2010","unstructured":"Sisso I, Shima T, Ben-Haim Y (2010) Info-gap approach to multiagent search under severe uncertainty. IEEE Trans. Robot. 26(6):1032\u20131041","journal-title":"IEEE Trans Robot"},{"key":"985_CR15","doi-asserted-by":"crossref","unstructured":"Baum M, Passino K (2002) A search-theoretic approach to cooperative control for uninhabited air vehicles. In: AIAA guidance, navigation, and control conference and exhibit, Monterey, California, USA","DOI":"10.2514\/6.2002-4589"},{"issue":"9","key":"985_CR16","doi-asserted-by":"publisher","first-page":"2142","DOI":"10.1109\/TAC.2010.2051094","volume":"55","author":"A Garcia","year":"2010","unstructured":"Garcia A, Li C, Pedraza F (2010) A bio-inspired scheme for coordinated online search. IEEE Trans. Autom. Contr. 55(9):2142\u20132147","journal-title":"IEEE Trans Autom Control"},{"key":"985_CR17","doi-asserted-by":"crossref","unstructured":"Sujit PB, Ghose D (2004) Multiple agent search of an unknown environment using game theoretical models. 
In: Proceedings of the 2004 American control conference, Boston, MA, USA, pp 5564\u20135569","DOI":"10.23919\/ACC.2004.1384740"},{"issue":"6","key":"985_CR18","doi-asserted-by":"publisher","first-page":"2396","DOI":"10.1109\/TAC.2018.2857760","volume":"64","author":"K Leahy","year":"2019","unstructured":"Leahy K, Schwager M (2019) Tracking a markov target in a discrete environment with multiple sensors. IEEE Trans. Autom. Contr. 64(6):2396\u20132411","journal-title":"IEEE Trans Autom Control"},{"issue":"3","key":"985_CR19","first-page":"5816","volume":"6","author":"Z Chen","year":"2021","unstructured":"Chen Z, Alonso-Mora J, Bai X, Harabor DD, Stuckey PJ (2021) Integrated task assignment and path planning for capacitated multi-agent pickup and delivery. IEEE Robot. 6(3):5816\u20135823","journal-title":"IEEE Robot"},{"issue":"5","key":"985_CR20","doi-asserted-by":"publisher","first-page":"1054","DOI":"10.1109\/TNN.1998.712192","volume":"9","author":"RS Sutton","year":"1998","unstructured":"Sutton RS, Barto AG (1998) Reinforcement learning: An introduction. IEEE Trans. Neural Netw. 9(5):1054","journal-title":"IEEE Trans Neural Netw"},{"key":"985_CR21","doi-asserted-by":"publisher","first-page":"96549","DOI":"10.1109\/ACCESS.2019.2929120","volume":"7","author":"X Cao","year":"2019","unstructured":"Cao X, Sun C, Yan M (2019) Target search control of auv in underwater environment with deep reinforcement learning. IEEE Access. 7:96549\u201396559","journal-title":"IEEE Access"},{"issue":"4","key":"985_CR22","first-page":"573","volume":"11","author":"Y Wang","year":"2019","unstructured":"Wang Y, Zhang L, Wang L, Wang Z (2019) Multitask learning for object localization with deep reinforcement learning. 
IEEE Trans Neural Netw Learn Syst 11(4):573\u2013580","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"issue":"12","key":"985_CR23","doi-asserted-by":"publisher","first-page":"4640","DOI":"10.1109\/TIM.2019.2899476","volume":"68","author":"S Sun","year":"2019","unstructured":"Sun S, Yin Y, Wang X, Xu D (2019) Robust visual detection and tracking strategies for autonomous aerial refueling of uavs. IEEE Trans Instrum Meas 68(12):4640\u20134652","journal-title":"IEEE Trans Instrum Meas"},{"issue":"3","key":"985_CR24","doi-asserted-by":"publisher","first-page":"391","DOI":"10.1109\/TAI.2021.3133509","volume":"3","author":"J Shi","year":"2022","unstructured":"Shi J, Fan Y, Zhou G, Shen J (2022) Distributed gan: Toward a faster reinforcement-learning-based architecture search. IEEE Trans. Artif. Intell. 3(3):391\u2013401","journal-title":"IEEE Trans Artif Intell"},{"issue":"11","key":"985_CR25","doi-asserted-by":"publisher","first-page":"13702","DOI":"10.1109\/TVT.2020.3023733","volume":"69","author":"Y-J Chen","year":"2020","unstructured":"Chen Y-J, Chang D-K, Zhang C (2020) Autonomous tracking using a swarm of uavs: A constrained multi-agent reinforcement learning approach. IEEE Trans. Veh. Technol. 69(11):13702\u201313717","journal-title":"IEEE Trans Veh Technol"},{"issue":"10","key":"985_CR26","doi-asserted-by":"publisher","first-page":"1686","DOI":"10.1109\/JAS.2021.1004141","volume":"8","author":"C Liu","year":"2021","unstructured":"Liu C, Zhu F, Liu Q, Fu Y (2021) Hierarchical reinforcement learning with automatic sub-goal identification. IEEE-CAA J. Autom. Sinica 8(10):1686\u20131696","journal-title":"IEEE-CAA J Autom Sin"},{"issue":"7","key":"985_CR27","doi-asserted-by":"publisher","first-page":"6180","DOI":"10.1109\/JIOT.2020.2973193","volume":"7","author":"C Wang","year":"2020","unstructured":"Wang C, Wang J, Wang J, Zhang X (2020) Deep-reinforcement-learning-based autonomous uav navigation with sparse rewards. 
IEEE Internet Things J 7(7):6180\u20136190","journal-title":"IEEE Internet Things J"},{"issue":"3","key":"985_CR28","doi-asserted-by":"publisher","first-page":"1515","DOI":"10.1109\/TCYB.2020.2990722","volume":"52","author":"LF Vecchietti","year":"2022","unstructured":"Vecchietti LF, Seo M, Har D (2022) Sampling rate decay in hindsight experience replay for robot control. IEEE Trans. Cybern. 52(3):1515\u20131526","journal-title":"IEEE Trans Cybern"},{"key":"985_CR29","doi-asserted-by":"publisher","first-page":"105669","DOI":"10.1109\/ACCESS.2019.2932257","volume":"7","author":"J Xie","year":"2019","unstructured":"Xie J, Shao Z, Li Y, Guan Y, Tan J (2019) Deep reinforcement learning with optimized reward functions for robotic trajectory planning. IEEE Access 7:105669\u2013105679","journal-title":"IEEE Access"},{"key":"985_CR30","doi-asserted-by":"publisher","first-page":"15392","DOI":"10.1109\/ACCESS.2020.2967642","volume":"8","author":"Y Zeng","year":"2020","unstructured":"Zeng Y, Xu K, Qin L, Yin Q (2020) A semi-markov decision model with inverse reinforcement learning for recognizing the destination of a maneuvering agent in real time strategy games. IEEE Access 8:15392\u201315409","journal-title":"IEEE Access"},{"key":"985_CR31","doi-asserted-by":"publisher","first-page":"1687","DOI":"10.1007\/s00521-021-06104-5","volume":"34","author":"Y Du","year":"2022","unstructured":"Du Y, Warnell G, Gebremedhin A, Stone P, Taylor M (2022) Lucid dreaming for experience replay: refreshing past states with the current policy. Neural Comput. Appl. 34:1687\u20131712","journal-title":"Neural Comput Appl"},{"issue":"3","key":"985_CR32","doi-asserted-by":"publisher","first-page":"2511","DOI":"10.1109\/TVT.2022.3145346","volume":"71","author":"S Na","year":"2022","unstructured":"Na S, Niu H, Lennox B, Arvin F (2022) Bio-inspired collision avoidance in swarm systems via deep reinforcement learning. IEEE Trans. Veh. Technol. 
71(3):2511\u20132526","journal-title":"IEEE Trans Veh Technol"},{"key":"985_CR33","unstructured":"Sutton RS, Mcallester D, Singh S, Mansour Y (1999) Policy gradient methods for reinforcement learning with function approximation. In: International conference on neural information processing systems, Denver, CO, pp 1057\u20131063"},{"issue":"3","key":"985_CR34","first-page":"826","volume":"8","author":"D Wang","year":"2022","unstructured":"Wang D, Liu B, Jia H, Zhang Z, Chen J, Huang D (2022) Peer-to-peer electricity transaction decisions of the user-side smart energy system based on the sarsa reinforcement learning. CSEE J. Power Energy Syst. 8(3):826\u2013837","journal-title":"CSEE J Power Energy Syst"},{"key":"985_CR35","doi-asserted-by":"crossref","unstructured":"Li S, Wu Y, Cui X, Dong H, Fang F, Russell S (2019) Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient. In: Proceedings of the 33th AAAI conference on artificial intelligence, Honolulu, Hawaii, USA, vol 33, no. 1. pp 4213\u20134220","DOI":"10.1609\/aaai.v33i01.33014213"},{"key":"985_CR36","unstructured":"Brockman G, Cheung V, Pettersson L (2016) Openai gym. arXiv:\u00a0Learning"},{"key":"985_CR37","unstructured":"Lowe R, Wu YI, Tamar A, Harb J, Abbeel P, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Proceedings of the 31st international conference on neural information processing systems, Long Beach, California, USA, pp 6382\u20136393"},{"key":"985_CR38","doi-asserted-by":"publisher","first-page":"1355","DOI":"10.1007\/s40747-021-00591-8","volume":"8","author":"W Liang","year":"2022","unstructured":"Liang W, Wang J, Bao W, Zhu X, Wang Q, Han B (2022) Continuous self-adaptive optimization to learn multi-task multi-agent. Complex Intell. Syst. 
8:1355\u20131367","journal-title":"Complex Intell Syst"},{"key":"985_CR39","doi-asserted-by":"crossref","unstructured":"Wen X, Qin S (2022) A projection-based continuous-time algorithm for distributed optimization over multi-agent systems. Complex Intell. Syst. 8:719\u2013729","DOI":"10.1007\/s40747-020-00265-x"},{"key":"985_CR40","unstructured":"Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: International conference on machine learning (ICML)"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-00985-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-023-00985-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-00985-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,22]],"date-time":"2023-09-22T17:11:45Z","timestamp":1695402705000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-023-00985-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,21]]},"references-count":40,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2023,10]]}},"alternative-id":["985"],"URL":"https:\/\/doi.org\/10.1007\/s40747-023-00985-w","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"type":"print","value":"2199-4536"},{"type":"electronic","value":"2198-6053"}],"subject":[],"published":{"date-parts":[[2023,2,21]]},"assertion":[{"value":"11 October 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 January 
2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 February 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Corresponding authors declare on behalf of all authors that there is no conflict of interest. We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}