{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T07:40:11Z","timestamp":1755848411434,"version":"3.44.0"},"publisher-location":"New York, NY, USA","reference-count":28,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,12,23]],"date-time":"2022-12-23T00:00:00Z","timestamp":1671753600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"CAS Project for Young Scientists in Basic Research","award":["YSBR-040"],"award-info":[{"award-number":["YSBR-040"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,12,23]]},"DOI":"10.1145\/3579654.3579702","type":"proceedings-article","created":{"date-parts":[[2023,3,14]],"date-time":"2023-03-14T16:09:40Z","timestamp":1678810180000},"page":"1-7","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Trust Region Method Using K-FAC in Multi-Agent Reinforcement Learning"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3815-3574","authenticated-orcid":false,"given":"Jiali","family":"Yu","sequence":"first","affiliation":[{"name":"Institute of Software Chinese Academy of Sciences, University of Chinese Academy of Sciences, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1958-8321","authenticated-orcid":false,"given":"Fengge","family":"Wu","sequence":"additional","affiliation":[{"name":"Institute of Software Chinese Academy of Sciences, University of Chinese Academy of Sciences, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8950-4157","authenticated-orcid":false,"given":"Junsuo","family":"Zhao","sequence":"additional","affiliation":[{"name":"Institute of Software Chinese Academy of Sciences, University of Chinese Academy of Sciences, China"}]}],"member":"320","published-online":{"date-parts":[[2023,3,14]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Takuya Akiba Shuji Suzuki and Keisuke Fukuda. 2017. Extremely large minibatch sgd: Training resnet-50 on imagenet in 15 minutes. arXiv preprint arXiv:1711.04325(2017)."},{"key":"e_1_3_2_1_2_1","volume-title":"Natural gradient works efficiently in learning. Neural computation 10, 2","author":"Amari Shun-Ichi","year":"1998","unstructured":"Shun-Ichi Amari. 1998. Natural gradient works efficiently in learning. Neural computation 10, 2 (1998), 251\u2013276."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1137\/16M1080173"},{"key":"e_1_3_2_1_4_1","unstructured":"Christian\u00a0Schroeder de Witt Tarun Gupta Denys Makoviichuk Viktor Makoviychuk Philip\u00a0HS Torr Mingfei Sun and Shimon Whiteson. 2020. Is independent learning all you need in the starcraft multi-agent challenge?arXiv preprint arXiv:2011.09533(2020)."},{"key":"e_1_3_2_1_5_1","volume-title":"International conference on machine learning. PMLR, 1407\u20131416","author":"Espeholt Lasse","year":"2018","unstructured":"Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Vlad Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, 2018. Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures. In International conference on machine learning. PMLR, 1407\u20131416."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11794"},{"key":"e_1_3_2_1_7_1","volume-title":"International Conference on Machine Learning. PMLR, 573\u2013582","author":"Grosse Roger","year":"2016","unstructured":"Roger Grosse and James Martens. 2016. A kronecker-factored approximate fisher matrix for convolution layers. In International Conference on Machine Learning. PMLR, 573\u2013582."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-71682-4_5"},{"key":"e_1_3_2_1_9_1","unstructured":"Dan Horgan John Quan David Budden Gabriel Barth-Maron Matteo Hessel Hado Van\u00a0Hasselt and David Silver. 2018. Distributed prioritized experience replay. arXiv preprint arXiv:1803.00933(2018)."},{"key":"e_1_3_2_1_10_1","unstructured":"Jakub\u00a0Grudzien Kuba Ruiqing Chen Munning Wen Ying Wen Fanglei Sun Jun Wang and Yaodong Yang. 2021. Trust region policy optimisation in multi-agent reinforcement learning. arXiv preprint arXiv:2109.11251(2021)."},{"key":"e_1_3_2_1_11_1","unstructured":"Timothy\u00a0P Lillicrap Jonathan\u00a0J Hunt Alexander Pritzel Nicolas Heess Tom Erez Yuval Tassa David Silver and Daan Wierstra. 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971(2015)."},{"key":"e_1_3_2_1_12_1","volume-title":"Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in neural information processing systems 30","author":"Lowe Ryan","year":"2017","unstructured":"Ryan Lowe, Yi\u00a0I Wu, Aviv Tamar, Jean Harb, OpenAI Pieter\u00a0Abbeel, and Igor Mordatch. 2017. Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_3_2_1_13_1","volume-title":"International conference on machine learning. PMLR, 2408\u20132417","author":"Martens James","year":"2015","unstructured":"James Martens and Roger Grosse. 2015. Optimizing neural networks with kronecker-factored approximate curvature. In International conference on machine learning. PMLR, 2408\u20132417."},{"key":"e_1_3_2_1_14_1","unstructured":"Hiroaki Mikami Hisahiro Suganuma Yoshiki Tanaka Yuichi Kageyama 2018. Massively distributed SGD: ImageNet\/ResNet-50 training in a flash. arXiv preprint arXiv:1811.05233(2018)."},{"key":"e_1_3_2_1_15_1","volume-title":"International conference on machine learning. PMLR","author":"Mnih Volodymyr","year":"2016","unstructured":"Volodymyr Mnih, Adria\u00a0Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. In International conference on machine learning. PMLR, 1928\u20131937."},{"volume-title":"A concise introduction to decentralized POMDPs","author":"Oliehoek A","key":"e_1_3_2_1_16_1","unstructured":"Frans\u00a0A Oliehoek and Christopher Amato. 2016. A concise introduction to decentralized POMDPs. Springer."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/3433701.3433826"},{"key":"e_1_3_2_1_18_1","first-page":"12208","article-title":"Facmac: Factored multi-agent centralised policy gradients","volume":"34","author":"Peng Bei","year":"2021","unstructured":"Bei Peng, Tabish Rashid, Christian Schroeder\u00a0de Witt, Pierre-Alexandre Kamienny, Philip Torr, Wendelin B\u00f6hmer, and Shimon Whiteson. 2021. Facmac: Factored multi-agent centralised policy gradients. Advances in Neural Information Processing Systems 34 (2021), 12208\u201312221.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_19_1","unstructured":"Mikayel Samvelyan Tabish Rashid Christian\u00a0Schroeder De\u00a0Witt Gregory Farquhar Nantas Nardelli Tim\u00a0GJ Rudner Chia-Man Hung Philip\u00a0HS Torr Jakob Foerster and Shimon Whiteson. 2019. The starcraft multi-agent challenge. arXiv preprint arXiv:1902.04043(2019)."},{"key":"e_1_3_2_1_20_1","volume-title":"Fast curvature matrix-vector products for second-order gradient descent. Neural computation 14, 7","author":"Schraudolph N","year":"2002","unstructured":"Nicol\u00a0N Schraudolph. 2002. Fast curvature matrix-vector products for second-order gradient descent. Neural computation 14, 7 (2002), 1723\u20131738."},{"key":"e_1_3_2_1_21_1","unstructured":"John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347(2017)."},{"key":"e_1_3_2_1_22_1","volume-title":"Policy gradient methods for reinforcement learning with function approximation. Advances in neural information processing systems 12","author":"Sutton S","year":"1999","unstructured":"Richard\u00a0S Sutton, David McAllester, Satinder Singh, and Yishay Mansour. 1999. Policy gradient methods for reinforcement learning with function approximation. Advances in neural information processing systems 12 (1999)."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2012.6386109"},{"key":"e_1_3_2_1_24_1","volume-title":"Scalable trust-region method for deep reinforcement learning using kronecker-factored approximation. Advances in neural information processing systems 30","author":"Wu Yuhuai","year":"2017","unstructured":"Yuhuai Wu, Elman Mansimov, Roger\u00a0B Grosse, Shun Liao, and Jimmy Ba. 2017. Scalable trust-region method for deep reinforcement learning using kronecker-factored approximation. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_3_2_1_25_1","unstructured":"Chris Ying Sameer Kumar Dehao Chen Tao Wang and Youlong Cheng. 2018. Image classification at supercomputer scale. arXiv preprint arXiv:1811.06992(2018)."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356137"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3225058.3225069"},{"key":"e_1_3_2_1_28_1","unstructured":"Chao Yu Akash Velu Eugene Vinitsky Yu Wang Alexandre Bayen and Yi Wu. 2021. The surprising effectiveness of ppo in cooperative multi-agent games. arXiv preprint arXiv:2103.01955(2021)."}],"event":{"name":"ACAI 2022: 2022 5th International Conference on Algorithms, Computing and Artificial Intelligence","acronym":"ACAI 2022","location":"Sanya China"},"container-title":["Proceedings of the 2022 5th International Conference on Algorithms, Computing and Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3579654.3579702","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3579654.3579702","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T06:58:58Z","timestamp":1755845938000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3579654.3579702"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,23]]},"references-count":28,"alternative-id":["10.1145\/3579654.3579702","10.1145\/3579654"],"URL":"https:\/\/doi.org\/10.1145\/3579654.3579702","relation":{},"subject":[],"published":{"date-parts":[[2022,12,23]]},"assertion":[{"value":"2023-03-14","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}