{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,25]],"date-time":"2026-01-25T12:56:00Z","timestamp":1769345760381,"version":"3.49.0"},"reference-count":34,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2021,8,31]],"date-time":"2021-08-31T00:00:00Z","timestamp":1630368000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"the National Natural Science Foundation of China (No. 61906210), National Grand R&D Plan (Grant No. 2020AAA0103501)","award":["No. 61906210,Grant No. 2020AAA0103501"],"award-info":[{"award-number":["No. 61906210,Grant No. 2020AAA0103501"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>The aim of multi-agent reinforcement learning systems is to provide interacting agents with the ability to collaboratively learn and adapt to the behavior of other agents. Typically, an agent receives its private observations providing a partial view of the true state of the environment. However, in realistic settings, the harsh environment might cause one or more agents to show arbitrarily faulty or malicious behavior, which may suffice to make the current coordination mechanisms fail. In this paper, we study a practical scenario of multi-agent reinforcement learning systems considering the security issues in the presence of agents with arbitrarily faulty or malicious behavior. The previous state-of-the-art work that coped with extremely noisy environments was designed on the basis that the noise intensity in the environment was known in advance. However, when the noise intensity changes, the existing method has to adjust the configuration of the model to learn in new environments, which limits its practical application. 
To overcome these difficulties, we present an Attention-based Fault-Tolerant (FT-Attn) model, which can select not only correct, but also relevant information for each agent at every time step in noisy environments. The multihead attention mechanism enables the agents to learn effective communication policies through experience, concurrently with the action policies. Empirical results showed that FT-Attn outperforms previous state-of-the-art methods in some extremely noisy environments in both cooperative and competitive scenarios and comes much closer to the upper-bound performance. Furthermore, FT-Attn maintains a more general fault tolerance ability and does not rely on prior knowledge of the noise intensity of the environment.<\/jats:p>","DOI":"10.3390\/e23091133","type":"journal-article","created":{"date-parts":[[2021,8,31]],"date-time":"2021-08-31T08:42:32Z","timestamp":1630399352000},"page":"1133","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Attention-Based Fault-Tolerant Approach for Multi-Agent Reinforcement Learning Systems"],"prefix":"10.3390","volume":"23","author":[{"given":"Shanzhi","family":"Gu","sequence":"first","affiliation":[{"name":"College of Computer, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7239-1819","authenticated-orcid":false,"given":"Mingyang","family":"Geng","sequence":"additional","affiliation":[{"name":"College of Computer, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Long","family":"Lan","sequence":"additional","affiliation":[{"name":"High Performance Computing Laboratory, National University of Defense Technology, Changsha 410073, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,8,31]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Geng, M., Zhou, X., Ding, B., Wang, H., and Zhang, L. (2018). Learning to cooperate in decentralized multirobot exploration of dynamic environments. International Conference on Neural Information Processing, Springer.","DOI":"10.1007\/978-3-030-04239-4_4"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Higgins, F., Tomlinson, A., and Martin, K.M. (2009, January 20\u201325). Survey on security challenges for swarm robotics. Proceedings of the 2009 Fifth International Conference on Autonomic and Autonomous Systems, Valencia, Spain.","DOI":"10.1109\/ICAS.2009.62"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"591","DOI":"10.1613\/jair.2502","article-title":"A multiagent approach to autonomous intersection management","volume":"31","author":"Dresner","year":"2008","journal-title":"J. Artif. Intell. Res."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Pipattanasomporn, M., Feroze, H., and Rahman, S. (2009, January 15\u201318). Multi-agent systems in a distributed smart grid: Design and implementation. Proceedings of the 2009 IEEE\/PES Power Systems Conference and Exposition, Seattle, WA, USA.","DOI":"10.1109\/PSCE.2009.4840087"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Geng, M., Xu, K., Zhou, X., Ding, B., Wang, H., and Zhang, L. (2019). Learning to cooperate via an attention-based communication neural network in decentralized multirobot exploration. 
Entropy, 21.","DOI":"10.3390\/e21030294"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.knosys.2019.03.033","article-title":"Multi-objective evolutionary computation for topology coverage assessment problem","volume":"177","author":"Zhou","year":"2019","journal-title":"Knowl.-Based Syst."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1016\/j.aei.2005.06.002","article-title":"Multi-agent robot systems as distributed autonomous systems","volume":"20","author":"Ota","year":"2006","journal-title":"Adv. Eng. Inform."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Amato, C. (2018, January 13\u201319). Decision-Making Under Uncertainty in Multi-Agent and Multi-Robot Systems: Planning and Learning. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden.","DOI":"10.24963\/ijcai.2018\/805"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"112202","DOI":"10.1007\/s11432-016-9024-y","article-title":"A large-scale multi-objective flights conflict avoidance approach supporting 4D trajectory operation","volume":"60","author":"Guan","year":"2017","journal-title":"Sci. China Inf. Sci."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"192104","DOI":"10.1007\/s11432-018-9720-6","article-title":"Solving multi-scenario cardinality constrained optimization problems via multi-objective evolutionary algorithms","volume":"62","author":"Zhou","year":"2019","journal-title":"Sci. China Inf. Sci."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"014201","DOI":"10.1007\/s11432-017-9263-6","article-title":"An optimization-based shared control framework with applications in multirobot systems","volume":"61","author":"Fang","year":"2018","journal-title":"Sci. China Inf. Sci."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Millard, A.G., Timmis, J., and Winfield, A.F. (2013, January 20\u201323). 
Towards exogenous fault detection in swarm robotic systems. Proceedings of the Conference towards Autonomous Robotic Systems, Oxford, UK.","DOI":"10.1007\/978-3-662-43645-5_44"},{"key":"ref_13","unstructured":"Kilinc, O., and Montana, G. (2018). Multi-agent deep reinforcement learning with extremely noisy observations. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Bu\u015foniu, L., Babu\u0161ka, R., and De Schutter, B. (2010). Multi-agent reinforcement learning: An overview. Innovations in Multi-Agent Systems and Applications-1, Springer.","DOI":"10.1007\/978-3-642-14435-6_7"},{"key":"ref_15","unstructured":"Fischer, F., Rovatsos, M., and Weiss, G. (2004, January 19\u201323). Hierarchical reinforcement learning in communication-mediated multiagent coordination. Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, New York, NY, USA."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Littman, M.L. (1994). Markov games as a framework for multi-agent reinforcement learning. Machine Learning Proceedings, Elsevier.","DOI":"10.1016\/B978-1-55860-335-6.50027-1"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1109\/TSMCC.2007.913919","article-title":"A comprehensive survey of multiagent reinforcement learning","volume":"38","author":"Busoniu","year":"2008","journal-title":"IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.)"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Tan, M. (1993, January 27\u201329). Multi-agent reinforcement learning: Independent vs. cooperative agents. Proceedings of the Tenth International Conference on Machine Learning, Amherst, MA, USA.","DOI":"10.1016\/B978-1-55860-307-3.50049-6"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Tampuu, A., Matiisen, T., Kodelja, D., Kuzovkin, I., Korjus, K., Aru, J., Aru, J., and Vicente, R. (2017). Multiagent cooperation and competition with deep reinforcement learning. 
PLoS ONE, 12.","DOI":"10.1371\/journal.pone.0172395"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Gupta, J.K., Egorov, M., and Kochenderfer, M. (2017, January 8\u201312). Cooperative multi-agent control using deep reinforcement learning. Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, Sao Paulo, Brazil.","DOI":"10.1007\/978-3-319-71682-4_5"},{"key":"ref_21","unstructured":"Sunehag, P., Lever, G., Gruslys, A., Czarnecki, W.M., Zambaldi, V.F., Jaderberg, M., Lanctot, M., Sonnerat, N., Leibo, J.Z., and Tuyls, K. (2018, January 10\u201315). Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward. Proceedings of the 17th International Conference on Autonomous Agents and Multi-Agent Systems, Stockholm, Sweden."},{"key":"ref_22","unstructured":"Sukhbaatar, S., Fergus, R., and Szlam, A. (2016, January 5\u201310). Learning multiagent communication with backpropagation. Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_23","unstructured":"Foerster, J.N., Assael, Y.M., de Freitas, N., and Whiteson, S. (2016). Learning to communicate to solve riddles with deep distributed recurrent q-networks. arXiv."},{"key":"ref_24","unstructured":"Peng, P., Yuan, Q., Wen, Y., Yang, Y., Tang, Z., Long, H., and Wang, J. (2017). Multiagent bidirectionally-coordinated nets for learning to play starcraft combat games. arXiv."},{"key":"ref_25","unstructured":"Lowe, R., Wu, Y.I., Tamar, A., Harb, J., Abbeel, O.P., and Mordatch, I. (2017, January 4\u20139). Multi-agent actor\u2013critic for mixed cooperative-competitive environments. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_26","unstructured":"Jiang, J., and Lu, Z. (2018, January 3\u20138). Learning attentional communication for multi-agent cooperation. 
Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_27","unstructured":"Iqbal, S., and Sha, F. (2019, January 10\u201315). Actor-attention-critic for multi-agent reinforcement learning. Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA."},{"key":"ref_28","unstructured":"Kim, D., Moon, S., Hostallero, D., Kang, W.J., Lee, T., Son, K., and Yi, Y. (2019). Learning to schedule communication in multi-agent reinforcement learning. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Luo, C., Liu, X., Chen, X., and Luo, J. (2020, January 2\u20134). Multi-agent Fault-tolerant Reinforcement Learning with Noisy Environments. Proceedings of the 2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS), Hong Kong, China.","DOI":"10.1109\/ICPADS51040.2020.00031"},{"key":"ref_30","unstructured":"Konda, V.R., and Tsitsiklis, J.N. (2000). Actor-critic algorithms. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Foerster, J., Farquhar, G., Afouras, T., Nardelli, N., and Whiteson, S. (2017). Counterfactual multi-agent policy gradients. arXiv.","DOI":"10.1609\/aaai.v32i1.11794"},{"key":"ref_32","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_33","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1103\/PhysRev.36.823","article-title":"On the theory of the Brownian motion","volume":"36","author":"Uhlenbeck","year":"1930","journal-title":"Phys. 
Rev."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/9\/1133\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:53:06Z","timestamp":1760165586000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/9\/1133"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,31]]},"references-count":34,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2021,9]]}},"alternative-id":["e23091133"],"URL":"https:\/\/doi.org\/10.3390\/e23091133","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,8,31]]}}}