{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T19:18:29Z","timestamp":1768072709468,"version":"3.49.0"},"reference-count":21,"publisher":"World Scientific Pub Co Pte Ltd","issue":"09","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62071470"],"award-info":[{"award-number":["62071470"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61971421"],"award-info":[{"award-number":["61971421"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100018636","name":"Xuzhou science and technology project","doi-asserted-by":"crossref","award":["KC20167"],"award-info":[{"award-number":["KC20167"]}],"id":[{"id":"10.13039\/501100018636","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Patt. Recogn. Artif. Intell."],"published-print":{"date-parts":[[2022,7]]},"abstract":"<jats:p> The rapid development of deep reinforcement learning makes it widely used in multi-agent environments to solve the multi-agent cooperation problem. However, due to the instability of multi-agent environments, the performance is insufficient when using deep reinforcement learning algorithms to train each agent independently. In this work, we use the framework of centralized training with decentralized execution to extend the maximum entropy deep reinforcement learning algorithm Soft Actor-Critic (SAC) and propose the multi-agent deep reinforcement learning algorithm MASAC based on the maximum entropy framework. 
The proposed model treats all the agents as part of the environment, and it can effectively solve the problem of poor convergence of algorithms due to environmental instability. At the same time, we have noticed a shortcoming of centralized training: when all the information of the agents is used as input to the critics, it is easy to lose the information related to the current agent. Inspired by the application of the self-attention mechanism in machine translation, we use the self-attention mechanism to improve the critic and propose the ATT-MASAC algorithm. Each agent can discover its relationship with other agents through encoder operation and attention calculation as part of the critic networks. Compared with recent multi-agent deep reinforcement learning algorithms, ATT-MASAC has a better convergence effect. It also has better stability when the number of agents in the environment increases. <\/jats:p>","DOI":"10.1142\/s0218001422520140","type":"journal-article","created":{"date-parts":[[2022,4,28]],"date-time":"2022-04-28T10:14:09Z","timestamp":1651140849000},"source":"Crossref","is-referenced-by-count":8,"title":["Actor-Critic for Multi-Agent Reinforcement Learning with Self-Attention"],"prefix":"10.1142","volume":"36","author":[{"given":"Juan","family":"Zhao","sequence":"first","affiliation":[{"name":"Department of Mechanical and Electrical Engineering, Henan Industrial and Trade Vocational College, Zhengzhou, China"}]},{"given":"Tong","family":"Zhu","sequence":"additional","affiliation":[{"name":"Zhengzhou Coal Industry (Group) Co., Ltd, Zhengzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7215-3685","authenticated-orcid":false,"given":"Shuo","family":"Xiao","sequence":"additional","affiliation":[{"name":"School of Computer Sciences and Technology, China University of Mining 
& Technology, Xuzhou, China"}]},{"given":"Hao","family":"Sun","sequence":"additional","affiliation":[{"name":"School of Computer Sciences and Technology, China University of Mining & Technology, Xuzhou, China"}]}],"member":"219","published-online":{"date-parts":[[2022,6,6]]},"reference":[{"issue":"1","key":"S0218001422520140BIB001","first-page":"1582","volume":"17","author":"Abdallah S.","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"S0218001422520140BIB002","first-page":"1045","volume-title":"Proc. 2013 Int. Conf. Autonomous Agents and Multi-agent Systems","author":"Abdallah S.","year":"2013"},{"key":"S0218001422520140BIB003","doi-asserted-by":"publisher","DOI":"10.1016\/S0921-8890(97)00043-2"},{"key":"S0218001422520140BIB004","doi-asserted-by":"publisher","DOI":"10.1109\/TVT.2019.2925903"},{"key":"S0218001422520140BIB005","doi-asserted-by":"publisher","DOI":"10.1109\/TVT.2019.2914936"},{"key":"S0218001422520140BIB006","doi-asserted-by":"publisher","DOI":"10.1063\/1.5085397"},{"key":"S0218001422520140BIB008","author":"Duan J.","year":"2021","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"S0218001422520140BIB009","first-page":"2137","volume-title":"Proc. 30th Int. Conf. Advances in Neural Information Processing Systems","author":"Foerster J.","year":"2016"},{"key":"S0218001422520140BIB010","first-page":"1146","volume-title":"Proc. 34th Int. Conf. Machine Learning","volume":"70","author":"Foerster J.","year":"2017"},{"key":"S0218001422520140BIB011","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-71682-4_5"},{"key":"S0218001422520140BIB012","first-page":"1856","volume-title":"Int. Conf. Machine Learning","author":"Haarnoja T.","year":"2018"},{"key":"S0218001422520140BIB013","first-page":"6379","volume-title":"Adv. Neural Inf. Process. 
Syst.","author":"Lowe R.","year":"2017"},{"key":"S0218001422520140BIB014","doi-asserted-by":"publisher","DOI":"10.3233\/JIFS-169339"},{"key":"S0218001422520140BIB015","first-page":"330","volume-title":"Proc. 10th Int. Conf. Machine Learning","author":"Tan M.","year":"2001"},{"key":"S0218001422520140BIB016","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"issue":"4","key":"S0218001422520140BIB017","first-page":"1","volume":"22","author":"Shuo X.","year":"2018","journal-title":"Cluster Comput."},{"key":"S0218001422520140BIB018","doi-asserted-by":"publisher","DOI":"10.1049\/cje.2018.02.009"},{"key":"S0218001422520140BIB019","doi-asserted-by":"publisher","DOI":"10.3233\/IFS-131021"},{"key":"S0218001422520140BIB020","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2015.2403394"},{"key":"S0218001422520140BIB021","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2014.2387277"},{"key":"S0218001422520140BIB022","doi-asserted-by":"publisher","DOI":"10.1049\/cje.2017.10.009"}],"container-title":["International Journal of Pattern Recognition and Artificial 
Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0218001422520140","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,8,5]],"date-time":"2022-08-05T01:47:22Z","timestamp":1659664042000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S0218001422520140"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,6]]},"references-count":21,"journal-issue":{"issue":"09","published-print":{"date-parts":[[2022,7]]}},"alternative-id":["10.1142\/S0218001422520140"],"URL":"https:\/\/doi.org\/10.1142\/s0218001422520140","relation":{},"ISSN":["0218-0014","1793-6381"],"issn-type":[{"value":"0218-0014","type":"print"},{"value":"1793-6381","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,6,6]]},"article-number":"2252014"}}