{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T19:33:43Z","timestamp":1772739223232,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":32,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,10,13]],"date-time":"2019-10-13T00:00:00Z","timestamp":1570924800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,10,13]]},"DOI":"10.1145\/3356464.3357707","type":"proceedings-article","created":{"date-parts":[[2019,10,31]],"date-time":"2019-10-31T12:20:52Z","timestamp":1572524452000},"page":"1-7","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":34,"title":["Factorized Q-learning for large-scale multi-agent systems"],"prefix":"10.1145","author":[{"given":"Ming","family":"Zhou","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Yong","family":"Chen","sequence":"additional","affiliation":[{"name":"Beihang University"}]},{"given":"Ying","family":"Wen","sequence":"additional","affiliation":[{"name":"University College London"}]},{"given":"Yaodong","family":"Yang","sequence":"additional","affiliation":[{"name":"University College London"}]},{"given":"Yufeng","family":"Su","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Weinan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Dell","family":"Zhang","sequence":"additional","affiliation":[{"name":"Birkbeck, University of London"}]},{"given":"Jun","family":"Wang","sequence":"additional","affiliation":[{"name":"University College 
London"}]}],"member":"320","published-online":{"date-parts":[[2019,10,13]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1006\/game.1993.1023"},
{"key":"e_1_3_2_1_2_1","first-page":"746","volume-title":"Proceedings of the 15th National Conference on Artificial Intelligence (AAAI) and 10th Innovative Applications of Artificial Intelligence Conference (IAAI)","author":"Claus Caroline","year":"1998","unstructured":"Caroline Claus and Craig Boutilier. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the 15th National Conference on Artificial Intelligence (AAAI) and 10th Innovative Applications of Artificial Intelligence Conference (IAAI), pages 746--752, Madison, WI, US, 1998."},
{"key":"e_1_3_2_1_3_1","first-page":"2137","volume-title":"NIPS","author":"Foerster Jakob N.","year":"2016","unstructured":"Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, and Shimon Whiteson. Learning to communicate with deep multi-agent reinforcement learning. In NIPS, pages 2137--2145, 2016."},
{"key":"e_1_3_2_1_4_1","volume-title":"Adaptive and Learning Agents Workshop (ALA)","author":"HolmesParker Chris","year":"2014","unstructured":"Chris HolmesParker, Matthew E. Taylor, Yusen Zhan, and Kagan Tumer. Exploiting structure and agent-centric rewards to promote coordination in large multiagent systems. In Adaptive and Learning Agents Workshop (ALA), 2014."},
{"key":"e_1_3_2_1_5_1","volume-title":"Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research (JMLR), 4:1039--1069","author":"Hu Junling","year":"2003","unstructured":"Junling Hu and Michael P. Wellman. Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research (JMLR), 4:1039--1069, 2003."},
{"key":"e_1_3_2_1_6_1","volume-title":"ICML","author":"Jelle","year":"2004","unstructured":"Jelle R. Kok and Nikos A. Vlassis. Sparse cooperative Q-learning. In ICML, 2004."},
{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/3298483.3298548"},
{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.5555\/3091574.3091594"},
{"key":"e_1_3_2_1_9_1","first-page":"322","volume-title":"ICML","author":"Littman Michael L.","year":"2001","unstructured":"Michael L. Littman. Friend-or-Foe Q-learning in general-sum games. In ICML, pages 322--328, 2001."},
{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/S1389-0417(01)00015-8"},
{"key":"e_1_3_2_1_11_1","volume-title":"An asynchronous parallel stochastic coordinate descent algorithm. The Journal of Machine Learning Research (JMLR), 16(1):285--322","author":"Liu Ji","year":"2015","unstructured":"Ji Liu, Stephen J Wright, Christopher R\u00e9, Victor Bittorf, and Srikrishna Sridhar. An asynchronous parallel stochastic coordinate descent algorithm. The Journal of Machine Learning Research (JMLR), 16(1):285--322, 2015."},
{"key":"e_1_3_2_1_12_1","first-page":"6382","volume-title":"NIPS","author":"Lowe Ryan","year":"2017","unstructured":"Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. In NIPS, pages 6382--6393, 2017."},
{"key":"e_1_3_2_1_14_1","volume-title":"Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013."},
{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},
{"key":"e_1_3_2_1_16_1","volume-title":"Value function approximation via low-rank models. arXiv preprint arXiv:1509.00061","author":"Ong Hao Yi","year":"2015","unstructured":"Hao Yi Ong. Value function approximation via low-rank models. arXiv preprint arXiv:1509.00061, 2015."},
{"key":"e_1_3_2_1_17_1","volume-title":"Gregory Farquhar, Jakob N. Foerster, and Shimon Whiteson. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. arXiv preprint arXiv:1803.11485","author":"Rashid Tabish","year":"2018","unstructured":"Tabish Rashid, Mikayel Samvelyan, Christian Schroder de Witt, Gregory Farquhar, Jakob N. Foerster, and Shimon Whiteson. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. arXiv preprint arXiv:1803.11485, 2018."},
{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1718487.1718498"},
{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2168752.2168771"},
{"key":"e_1_3_2_1_20_1","volume-title":"Reinforcement learning with factored states and actions. Journal of Machine Learning Research (JMLR), 5(Aug):1063--1088","author":"Sallans Brian","year":"2004","unstructured":"Brian Sallans and Geoffrey E Hinton. Reinforcement learning with factored states and actions. Journal of Machine Learning Research (JMLR), 5(Aug):1063--1088, 2004."},
{"key":"e_1_3_2_1_21_1","unstructured":"Peter Sunehag Guy Lever Audrunas Gruslys Wojciech Marian Czarnecki Vinicius Flores Zambaldi Max Jaderberg Marc Lanctot Nicolas Sonnerat Joel Z. Leibo Karl Tuyls and Thore Graepel. Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296 2017."},
{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/551283"},
{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0172395"},
{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-307-3.50049-6"},
{"key":"e_1_3_2_1_25_1","first-page":"871","volume-title":"NIPS","author":"Tesauro Gerald","year":"2003","unstructured":"Gerald Tesauro. Extending Q-learning to general adaptive multi-agent systems. In NIPS, pages 871--878, 2003."},
{"key":"e_1_3_2_1_26_1","volume-title":"CMU-CS-03--107","author":"Uther William","year":"1997","unstructured":"William Uther and Manuela Veloso. Adversarial reinforcement learning. Technical report, CMU-CS-03--107, 1997."},
{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/3016100.3016191"},
{"key":"e_1_3_2_1_28_1","first-page":"2613","volume-title":"NIPS","author":"van Hasselt Hado","year":"2010","unstructured":"Hado van Hasselt. Double Q-learning. In NIPS, pages 2613--2621, 2010."},
{"key":"e_1_3_2_1_29_1","first-page":"1995","volume-title":"ICML","author":"Wang Ziyu","year":"2016","unstructured":"Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, and Nando de Freitas. Dueling network architectures for deep reinforcement learning. In ICML, pages 1995--2003, 2016."},
{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992698"},
{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10107-015-0892-3"},
{"key":"e_1_3_2_1_33_1","volume-title":"ICML","author":"Yang Yaodong","year":"2018","unstructured":"Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, and Jun Wang. Mean field multi-agent reinforcement learning. In ICML, Stockholm, Sweden, 2018."},
{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11371"}],"event":{"name":"DAI '19: First International Conference on Distributed Artificial Intelligence","location":"Beijing China","acronym":"DAI '19"},"container-title":["Proceedings of the First International Conference on Distributed Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3356464.3357707","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3356464.3357707","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:22:54Z","timestamp":1750202574000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3356464.3357707"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,13]]},"references-count":32,"alternative-id":["10.1145\/3356464.3357707","10.1145\/3356464"],"URL":"https:\/\/doi.org\/10.1145\/3356464.3357707","relation":{},"subject":[],"published":{"date-parts":[[2019,10,13]]},"assertion":[{"value":"2019-10-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}