{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T13:31:43Z","timestamp":1776778303561,"version":"3.51.2"},"reference-count":20,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2024,3,2]],"date-time":"2024-03-02T00:00:00Z","timestamp":1709337600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,3,2]],"date-time":"2024-03-02T00:00:00Z","timestamp":1709337600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62172072"],"award-info":[{"award-number":["62172072"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2024,6]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Intelligent agents and multi-agent systems are increasingly used in complex scenarios, such as controlling groups of drones and non-player characters in video games. In these applications, multi-agent navigation and obstacle avoidance are foundational functions. However, problems become more challenging with the increased complexity of the environment and the dynamic decision-making interactions among agents. The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm is a classical multi-agent reinforcement learning algorithm successfully used to improve agents\u2019 performance. However, it ignores the temporal message hidden in agents\u2019 interaction with the environment and needs to be more efficient in scenarios with many agents due to its training technique. To address the limitations of MADDPG, we propose to explore modified algorithms of MADDPG for multi-agent navigation and obstacle avoidance. By combining MADDPG with Long Short-Term Memory (LSTM), we obtain the MADDPG-LSTMactor algorithm, which leverages continuous observations over time as input for the policy network, enabling the LSTM layer to capture hidden temporal patterns. Moreover, by simplifying the input of the critic network, we obtain the MADDPG-L algorithm for efficiency improvement in scenarios with many agents. Experimental results demonstrate that these algorithms outperform existing networks in the OpenAI multi-agent particle environment. We also conducted a comparative study of the LSTM-based approach with Transformer and self-attention models in the task of multi-agent navigation and obstacle avoidance. The results reveal that Transformer and self-attention do not consistently outperform LSTM. The LSTM-based model exhibits a favorable tradeoff across varying sequence lengths. 
Overall, this work addresses the limitations of MADDPG in multi-agent navigation and obstacle avoidance tasks, providing insights for developing intelligent agents and multi-agent systems.<\/jats:p>","DOI":"10.1007\/s40747-024-01389-0","type":"journal-article","created":{"date-parts":[[2024,3,2]],"date-time":"2024-03-02T10:01:53Z","timestamp":1709373713000},"page":"4141-4155","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["Time-aware MADDPG with LSTM for multi-agent obstacle avoidance: a comparative study"],"prefix":"10.1007","volume":"10","author":[{"given":"Enyu","family":"Zhao","sequence":"first","affiliation":[]},{"given":"Ning","family":"Zhou","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8132-2380","authenticated-orcid":false,"given":"Chanjuan","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Houfu","family":"Su","sequence":"additional","affiliation":[]},{"given":"Yang","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Jinmiao","family":"Cong","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,3,2]]},"reference":[{"key":"1389_CR1","doi-asserted-by":"publisher","first-page":"1899","DOI":"10.1007\/s40747-021-00366-1","volume":"8","author":"J Zhong","year":"2022","unstructured":"Zhong J, Wang T, Cheng L (2022) Collision-free path planning for welding manipulator via hybrid algorithm of deep reinforcement learning and inverse kinematics. Complex Intell Syst 8:1899\u20131912","journal-title":"Complex Intell Syst"},{"issue":"3","key":"1389_CR2","doi-asserted-by":"publisher","first-page":"505","DOI":"10.1145\/3828.3830","volume":"32","author":"R Dechter","year":"1985","unstructured":"Dechter R, Pearl J (1985) Generalized best-first search strategies and the optimality of A. J ACM 32(3):505\u2013536","journal-title":"J ACM"},{"key":"1389_CR3","doi-asserted-by":"crossref","unstructured":"Tan M (1993) Multi-agent reinforcement learning: independent vs. cooperative agents. Machine Learning Proceedings, pp 330\u2013337","DOI":"10.1016\/B978-1-55860-307-3.50049-6"},{"issue":"3","key":"1389_CR4","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1007\/BF00992698","volume":"8","author":"CJCH Watkins","year":"1992","unstructured":"Watkins CJCH, Dayan P (1992) Q-learning. Mach Learn 8(3):279\u2013292","journal-title":"Mach Learn"},{"key":"1389_CR5","unstructured":"de Witt CS, Gupta T, Makoviichuk D, et al. (2020) Is independent learning all you need in the starcraft multi-agent challenge? arXiv preprint arXiv:2011.09533. Accessed 15 Dec 2023."},{"key":"1389_CR6","unstructured":"Lowe R, Wu Y, Tamar A, et al. (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. International Conference on Neural Information Processing Systems, Long Beach, pp 6382\u20136393"},{"key":"1389_CR7","unstructured":"Sunehag P, Lever G, Gruslys A, et al. (2017) Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296. Accessed 15 Dec 2023."},{"key":"1389_CR8","unstructured":"Rashid T, Samvelyan M, Schroeder C, et al. (2018) Qmix: monotonic value function factorisation for deep multi-agent reinforcement learning. International Conference on Machine Learning, Stockholm, pp 4295\u20134304"},{"key":"1389_CR9","unstructured":"Peng B, Rashid T, de Witt CAS, et al. (2021) FACMAC: factored multi-agent centralised policy gradients. 
{"key":"1389_CR10","doi-asserted-by":"crossref","unstructured":"Naveed K B, Qiao Z, Dolan J M (2021) Trajectory planning for autonomous vehicles using hierarchical reinforcement learning. 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, pp 601\u2013606","DOI":"10.1109\/ITSC48978.2021.9564634"},{"key":"1389_CR11","unstructured":"Parisotto E, Song F, Rae J, et al. (2020) Stabilizing transformers for reinforcement learning. International Conference on Machine Learning, Online, pp 7487\u20137498"},{"key":"1389_CR12","doi-asserted-by":"crossref","unstructured":"Yan Y, Li X, Qiu X, et al. (2022) Relative distributed formation and obstacle avoidance with multi-agent reinforcement learning. 2022 International Conference on Robotics and Automation (ICRA), IEEE, pp 1661\u20131667","DOI":"10.1109\/ICRA46639.2022.9812263"},{"key":"1389_CR13","unstructured":"Ezen-Can A (2020) A comparison of LSTM and BERT for small corpus. arXiv preprint arXiv:2009.05451. Accessed 2 Jan 2024."},{"key":"1389_CR14","doi-asserted-by":"crossref","unstructured":"Bilokon P, Qiu Y (2023) Transformers versus LSTMs for electronic trading. arXiv preprint arXiv:2309.11400. Accessed 2 Jan 2024.","DOI":"10.2139\/ssrn.4577922"},{"key":"1389_CR15","first-page":"1","volume":"12","author":"RS Sutton","year":"1999","unstructured":"Sutton RS, McAllester D, Singh S et al (1999) Policy gradient methods for reinforcement learning with function approximation. Adv Neural Inf Process Syst 12:1\u20137","journal-title":"Adv Neural Inf Process Syst"},{"key":"1389_CR16","doi-asserted-by":"crossref","unstructured":"Dey R, Salem FM (2017) Gate-variants of gated recurrent unit (GRU) neural networks. 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), IEEE, pp 1597\u20131600","DOI":"10.1109\/MWSCAS.2017.8053243"},{"key":"1389_CR17","unstructured":"OpenDILab, DI-engine-docs [OL]. https:\/\/di-engine-docs.readthedocs.io\/zh_CN\/latest\/env_tutorial\/multiagent_particle_zh.html. Accessed 12 Dec 2023."},{"issue":"4","key":"1389_CR18","doi-asserted-by":"publisher","first-page":"7445","DOI":"10.1109\/LRA.2021.3098332","volume":"6","author":"A Salar","year":"2021","unstructured":"Salar A, Mo C, Mehran M et al (2021) Toward observation based least restrictive collision avoidance using deep meta reinforcement learning. IEEE Robot Autom Lett 6(4):7445\u20137452","journal-title":"IEEE Robot Autom Lett"},{"key":"1389_CR19","doi-asserted-by":"publisher","first-page":"4065","DOI":"10.1002\/int.22450","volume":"36","author":"C Liu","year":"2021","unstructured":"Liu C, Zhu E, Zhang Q, Wei X (2021) Exploring the effects of computational costs in extensive games via modeling and simulation. Int J Intell Syst 36:4065\u20134087","journal-title":"Int J Intell Syst"},{"issue":"4","key":"1389_CR20","doi-asserted-by":"publisher","first-page":"8379","DOI":"10.1109\/LRA.2021.3102636","volume":"6","author":"Z Yuan","year":"2021","unstructured":"Yuan Z, Bo D, Xuan L et al (2021) Decentralized multi-robot collision avoidance in complex scenarios with selective communication. IEEE Robot Autom Lett 6(4):8379\u20138386","journal-title":"IEEE Robot Autom Lett"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01389-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-024-01389-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01389-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,16]],"date-time":"2024-05-16T18:23:35Z","timestamp":1715883815000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-024-01389-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,2]]},"references-count":20,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,6]]}},"alternative-id":["1389"],"URL":"https:\/\/doi.org\/10.1007\/s40747-024-01389-0","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,2]]},"assertion":[{"value":"7 April 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 February 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 March 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors would like to declare that the work was the authors\u2019 original research. We wish to confirm that the work described has not been published previously, that it is not under consideration for publication elsewhere, that its publication is approved by all authors and tacitly or explicitly by the responsible authorities where the work was carried out, and that, if accepted, it will not be published elsewhere in the same form, in English or in any other language, including electronically without the written consent of the copyright holder.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}