{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T07:17:30Z","timestamp":1740122250907,"version":"3.37.3"},"reference-count":47,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2021,6,7]],"date-time":"2021-06-07T00:00:00Z","timestamp":1623024000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,6,7]],"date-time":"2021-06-07T00:00:00Z","timestamp":1623024000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["First Grant EP\/R001227\/1"],"award-info":[{"award-number":["First Grant EP\/R001227\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000781","name":"European Research Council","doi-asserted-by":"publisher","award":["European Union\u2019s Horizon 2020 (grant agreement No. 758824 - INFLUENCE)"],"award-info":[{"award-number":["European Union\u2019s Horizon 2020 (grant agreement No. 758824 - INFLUENCE)"]}],"id":[{"id":"10.13039\/501100000781","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Auton Agent Multi-Agent Syst"],"published-print":{"date-parts":[[2021,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Recent years have seen the application of deep reinforcement learning techniques to cooperative multi-agent systems, with great empirical success. However, given the lack of theoretical insight, it remains unclear what the employed neural networks are learning, or how we should enhance their learning power to address the problems on which they fail. 
In this work, we empirically investigate the learning power of various network architectures on a series of one-shot games. Despite their simplicity, these games capture many of the crucial problems that arise in the multi-agent setting, such as an exponential number of joint actions or the lack of an explicit coordination mechanism. Our results extend those in Castellini et al. (Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS\u201919. International Foundation for Autonomous Agents and Multiagent Systems, pp 1862\u20131864, 2019) and quantify how well various approaches can represent the requisite value functions, and help us identify the reasons that can impede good performance, like sparsity of the values or too tight coordination requirements.<\/jats:p>","DOI":"10.1007\/s10458-021-09506-w","type":"journal-article","created":{"date-parts":[[2021,6,7]],"date-time":"2021-06-07T07:04:57Z","timestamp":1623049497000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Analysing factorizations of action-value networks for cooperative multi-agent reinforcement learning"],"prefix":"10.1007","volume":"35","author":[{"given":"Jacopo","family":"Castellini","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Frans A.","family":"Oliehoek","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rahul","family":"Savani","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shimon","family":"Whiteson","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,6,7]]},"reference":[{"key":"9506_CR1","doi-asserted-by":"crossref","unstructured":"Amato, C., & Oliehoek, F.\u00a0A. (2015). 
Scalable planning and learning for multiagent POMDPs. In Proceedings of the 29th AAAI Conference on Artificial Intelligence, AAAI\u201915 (pp. 1995\u20132002). American Association for Artificial Intelligence.","DOI":"10.1609\/aaai.v29i1.9439"},{"key":"9506_CR2","unstructured":"Boehmer, W., Kurin, V., & Whiteson, S. (2020). Deep coordination graphs. In Proceedings of the 37th International Conference on Machine Learning, ICML\u201920 (pp. 980\u2013991). PMLR."},{"key":"9506_CR3","doi-asserted-by":"crossref","unstructured":"Busoniu, L., Babuska, R., & De Schutter, B. (2008). A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics Part C (Applications and Reviews), 38, 156\u2013172.","DOI":"10.1109\/TSMCC.2007.913919"},{"key":"9506_CR4","unstructured":"Castellini, J., Oliehoek, F.\u00a0A., Savani, R., & Whiteson, S. (2019). The representational capacity of action-value networks for multi-agent reinforcement learning - extended abstract. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS\u201919 (pp. 1862\u20131864). International Foundation for Autonomous Agents and Multiagent Systems."},{"key":"9506_CR5","unstructured":"Chung, J.\u00a0J., Mikli\u0107, D., Sabattini, L., Tumer, K., & Siegwart, R. (2019). The impact of agent definitions and interactions on multiagent learning for coordination. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS\u201919 (pp. 1752\u20131760). International Foundation for Autonomous Agents and MultiAgent Systems."},{"key":"9506_CR6","unstructured":"Claus, C., & Boutilier, C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the 15th\/10th AAAI Conference on Artificial Intelligence\/Innovative Applications of Artificial Intelligence, AAAI\u201998\/IAAI\u201998 (pp. 746\u2013752). 
American Association for Artificial Intelligence."},{"key":"9506_CR7","unstructured":"Foerster, J., Assael, I.\u00a0A., de\u00a0Freitas, N., & Whiteson, S. (2016). Learning to communicate with deep multi-agent reinforcement learning. In Advances in Neural Information Processing Systems 29, NIPS\u201916 (pp. 2137\u20132145). Curran Associates, Inc."},{"key":"9506_CR8","unstructured":"Foerster, J.\u00a0N., Nardelli, N., Farquhar, G., Torr, P. H.\u00a0S., Kohli, P., & Whiteson, S. (2017). Stabilising experience replay for deep multi-agent reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, ICML\u201917 (pp. 1146\u20131155). PMLR."},{"key":"9506_CR9","doi-asserted-by":"crossref","unstructured":"Foerster, J.\u00a0N., Farquhar, G., Afouras, T., Nardelli, N., & Whiteson, S. (2018). Counterfactual multi-agent policy gradients. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, AAAI\u201918 (pp. 2974\u20132982). American Association for Artificial Intelligence.","DOI":"10.1609\/aaai.v32i1.11794"},{"issue":"2","key":"9506_CR10","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1007\/s10458-006-7035-4","volume":"13","author":"M Ghavamzadeh","year":"2006","unstructured":"Ghavamzadeh, M., Mahadevan, S., & Makar, R. (2006). Hierarchical multi-agent reinforcement learning. Autonomous Agents and Multi-Agent Systems, 13(2), 197\u2013229.","journal-title":"Autonomous Agents and Multi-Agent Systems"},{"key":"9506_CR11","unstructured":"Grenager, T., Powers, R.\u00a0A., & Shoham, Y. (2002). Dispersion games: General definitions and some specific learning results. In Proceedings of the 18th AAAI Conference on Artificial Intelligence, AAAI\u201902 (pp. 398\u2013403). American Association for Artificial Intelligence."},{"key":"9506_CR12","unstructured":"Guestrin, C., Koller, D., & Parr, R. (2002). Multiagent planning with factored MDPs. In Advances in Neural Information Processing Systems 14, NIPS\u201901 (pp. 
1523\u20131530). Morgan Kaufmann Publishers Inc."},{"key":"9506_CR13","unstructured":"Guestrin, C., Lagoudakis, M.\u00a0G., & Parr, R. (2002). Coordinated reinforcement learning. In Proceedings of the 19th International Conference on Machine Learning, ICML\u201902 (pp. 227\u2013234). Morgan Kaufmann Publishers Inc."},{"key":"9506_CR14","doi-asserted-by":"crossref","unstructured":"Guestrin, C., Venkataraman, S., & Koller, D. (2002). Context-specific multiagent coordination and planning with factored MDPs. In Proceedings of the 19th\/10th AAAI Conference on Artificial Intelligence\/Innovative Applications of Artificial Intelligence, AAAI\u201902\/IAAI\u201902. American Association for Artificial Intelligence.","DOI":"10.1613\/jair.1000"},{"issue":"1","key":"9506_CR15","doi-asserted-by":"publisher","first-page":"399","DOI":"10.1613\/jair.1000","volume":"19","author":"C Guestrin","year":"2003","unstructured":"Guestrin, C., Koller, D., Parr, R., & Venkataraman, S. (2003). Efficient solution algorithms for factored MDPs. Journal of Artificial Intelligence Research, 19(1), 399\u2013468.","journal-title":"Journal of Artificial Intelligence Research"},{"key":"9506_CR16","doi-asserted-by":"crossref","unstructured":"Gupta, J.\u00a0K., Egorov, M., & Kochenderfer, M. (2017). Cooperative multi-agent control using deep reinforcement learning. In Autonomous Agents and Multi-Agent Systems (pp. 66\u201383). Springer.","DOI":"10.1007\/978-3-319-71682-4_5"},{"issue":"8","key":"9506_CR17","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735\u20131780.","journal-title":"Neural Computation"},{"key":"9506_CR18","unstructured":"Hofstadter, D. R. (1985). Metamagical Themas: Questing for the essence of mind and pattern. 
Basic Books, Inc."},{"key":"9506_CR19","unstructured":"Kendall, M., & Gibbons, J.\u00a0D. (1990). Rank Correlation Methods, 5th edn. A Charles Griffin Title."},{"key":"9506_CR20","first-page":"1789","volume":"7","author":"JR Kok","year":"2006","unstructured":"Kok, J. R., & Vlassis, N. (2006). Collaborative multiagent reinforcement learning by payoff propagation. Journal of Machine Learning Research, 7, 1789\u20131828.","journal-title":"Journal of Machine Learning Research"},{"key":"9506_CR21","unstructured":"Kok, J.\u00a0R., \u2019t\u00a0Hoen, P.\u00a0J., Bakker, B., & Vlassis, N. (2005). Utile coordination: Learning interdependencies among cooperative agents. In Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG) (pp. 29\u201336)."},{"key":"9506_CR22","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1016\/j.neucom.2016.01.031","volume":"190","author":"L Kraemer","year":"2016","unstructured":"Kraemer, L., & Banerjee, B. (2016). Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing, 190, 82\u201394.","journal-title":"Neurocomputing"},{"key":"9506_CR23","unstructured":"Leibo, J.\u00a0Z., Zambaldi, V., Lanctot, M., Marecki, J., & Graepel, T. (2017). Multi-agent reinforcement learning in sequential social dilemmas. In Proceedings of the 16th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS\u201917 (pp. 464\u2013473). International Foundation for Autonomous Agents and MultiAgent Systems."},{"key":"9506_CR24","unstructured":"Lillicrap, T.\u00a0P., Hunt, J.\u00a0J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D. (2016). Continuous control with deep reinforcement learning. In Proceedings of the 4th International Conference on Learning Representations, ICLR\u201916."},{"key":"9506_CR25","doi-asserted-by":"crossref","unstructured":"Littman, M.\u00a0L. (1994). Markov games as a framework for multi-agent reinforcement learning. 
In Proceedings of the 11th International Conference on Machine Learning, ICML\u201994 (pp. 157\u2013163). Morgan Kaufmann Publishers Inc.","DOI":"10.1016\/B978-1-55860-335-6.50027-1"},{"key":"9506_CR26","unstructured":"Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., & Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in Neural Information Processing Systems 30, NIPS\u201917 (pp. 6379\u20136390). Curran Associates, Inc."},{"key":"9506_CR27","unstructured":"Mahajan, A., Rashid, T., Samvelyan, M., & Whiteson, S. (2019). MAVEN: multi-agent variational exploration. In Advances in Neural Information Processing Systems 32, NIPS\u201919 (pp. 7611\u20137622). Curran Associates, Inc."},{"key":"9506_CR28","doi-asserted-by":"crossref","unstructured":"Matignon, L., Laurent, G. J., & Le Fort-Piat, N. (2012). Independent reinforcement learners in cooperative Markov games: A survey regarding coordination problems. Knowledge Engineering Review, 27(1), 1\u201331.","DOI":"10.1017\/S0269888912000057"},{"issue":"7540","key":"9506_CR29","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529\u2013533.","journal-title":"Nature"},{"key":"9506_CR30","unstructured":"Mnih, V., Badia, A.\u00a0P., Mirza, M., Graves, A., Harley, T., Lillicrap, T.\u00a0P., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In Proceedings of the 33rd International Conference on Machine Learning, volume\u00a048 of ICML\u201916 (pp. 1928\u20131937). PMLR."},{"key":"9506_CR31","doi-asserted-by":"crossref","unstructured":"Oliehoek, F.\u00a0A. (2010). 
Value-based planning for teams of agents in stochastic partially observable environments. PhD thesis, Informatics Institute, University of Amsterdam.","DOI":"10.5117\/9789056296100"},{"key":"9506_CR32","unstructured":"Oliehoek, F.\u00a0A., Whiteson, S., & Spaan, M. T.\u00a0J. (2011). Exploiting agent and type independence in collaborative graphical Bayesian games. CoRR, abs\/1108.0404."},{"key":"9506_CR33","unstructured":"Osborne, M. J., & Rubinstein, A. (1994). A Course in Game Theory. The MIT Press."},{"key":"9506_CR34","unstructured":"Palmer, G., Tuyls, K., Bloembergen, D., & Savani, R. (2018). Lenient multi-agent deep reinforcement learning. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS\u201918 (pp. 443\u2013451). International Foundation for Autonomous Agents and MultiAgent Systems."},{"key":"9506_CR35","unstructured":"Rashid, T., Samvelyan, M., Schr\u00f6der\u00a0de Witt, C., Farquhar, G., Foerster, J.\u00a0N., & Whiteson, S. (2018). QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning, ICML\u201918 (pp. 4292\u20134301). JMLR.org."},{"issue":"2","key":"9506_CR36","doi-asserted-by":"publisher","first-page":"730","DOI":"10.1016\/j.artint.2010.11.001","volume":"175","author":"A Rogers","year":"2011","unstructured":"Rogers, A., Farinelli, A., Stranders, R., & Jennings, N. R. (2011). Bounded approximate decentralised coordination via the max-sum algorithm. Artificial Intelligence, 175(2), 730\u2013759.","journal-title":"Artificial Intelligence"},{"key":"9506_CR37","doi-asserted-by":"crossref","unstructured":"Shahrampour, S., Rakhlin, A., & Jadbabaie, A. (2017). Multi-armed bandits in multi-agent networks. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 
2786\u20132790).","DOI":"10.1109\/ICASSP.2017.7952664"},{"key":"9506_CR38","unstructured":"Son, K., Kim, D., Kang, W.\u00a0J., Hostallero, D., & Yi, Y. (2019). QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In Proceedings of the 36th International Conference on Machine Learning, ICML\u201919 (pp. 5887\u20135896). JMLR.org."},{"key":"9506_CR39","unstructured":"Sukhbaatar, S., Szlam, A., & Fergus, R. (2016). Learning multiagent communication with backpropagation. In Advances in Neural Information Processing Systems 29, NIPS\u201916 (pp. 2252\u20132260). Curran Associates, Inc."},{"key":"9506_CR40","unstructured":"Sunehag, P., Lever, G., Gruslys, A., Czarnecki, W.\u00a0M., Zambaldi, V., Jaderberg, M., Lanctot, M., Sonnerat, N., Leibo, J.\u00a0Z., Tuyls, K., & Graepel, T. (2018). Value-decomposition networks for cooperative multi-agent learning based on team reward. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS\u201918 (pp. 2085\u20132087). International Foundation for Autonomous Agents and MultiAgent Systems."},{"issue":"4","key":"9506_CR41","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1371\/journal.pone.0172395","volume":"12","author":"A Tampuu","year":"2017","unstructured":"Tampuu, A., Matiisen, T., Kodelja, D., Kuzovkin, I., Korjus, K., Aru, J., et al. (2017). Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE, 12(4), 1\u201315.","journal-title":"PLoS ONE"},{"key":"9506_CR42","doi-asserted-by":"crossref","unstructured":"Tan, M. (1993). Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the 10th International Conference on Machine Learning, ICML\u201993 (pp. 330\u2013337). Morgan Kaufmann Publishers Inc.","DOI":"10.1016\/B978-1-55860-307-3.50049-6"},{"key":"9506_CR43","unstructured":"Van\u00a0der Pol, E., & Oliehoek, F.\u00a0A. (2016). 
Coordinated deep reinforcement learners for traffic light control. In Workshop on Learning, Inference and Control of Multi-Agent Systems, NIPS\u201916."},{"issue":"3","key":"9506_CR44","first-page":"279","volume":"8","author":"CJCH Watkins","year":"1992","unstructured":"Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3), 279\u2013292.","journal-title":"Machine Learning"},{"issue":"84","key":"9506_CR45","first-page":"1","volume":"17","author":"E Wei","year":"2016","unstructured":"Wei, E., & Luke, S. (2016). Lenient learning in independent-learner stochastic cooperative games. Journal of Machine Learning Research, 17(84), 1\u201342.","journal-title":"Journal of Machine Learning Research"},{"key":"9506_CR46","unstructured":"Wunder, M., Littman, M.\u00a0L., & Babes, M. (2010). Classes of multiagent q-learning dynamics with epsilon-greedy exploration. In Proceedings of the 27th International Conference on Machine Learning, ICML\u201910 (pp. 1167\u20131174). Omnipress."},{"issue":"5","key":"9506_CR47","doi-asserted-by":"publisher","first-page":"10026","DOI":"10.3390\/s150510026","volume":"15","author":"D Ye","year":"2015","unstructured":"Ye, D., Zhang, M., & Yang, Y. (2015). A multi-agent framework for packet routing in wireless sensor networks. 
Sensors, 15(5), 10026\u201310047.","journal-title":"Sensors"}],"container-title":["Autonomous Agents and Multi-Agent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10458-021-09506-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10458-021-09506-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10458-021-09506-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,30]],"date-time":"2022-12-30T03:50:41Z","timestamp":1672372241000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10458-021-09506-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,7]]},"references-count":47,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2021,10]]}},"alternative-id":["9506"],"URL":"https:\/\/doi.org\/10.1007\/s10458-021-09506-w","relation":{},"ISSN":["1387-2532","1573-7454"],"issn-type":[{"type":"print","value":"1387-2532"},{"type":"electronic","value":"1573-7454"}],"subject":[],"published":{"date-parts":[[2021,6,7]]},"assertion":[{"value":"10 May 2021","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 June 2021","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"25"}}