{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T18:14:35Z","timestamp":1771697675367,"version":"3.50.1"},"reference-count":86,"publisher":"Springer Science and Business Media LLC","issue":"9-10","license":[{"start":{"date-parts":[[2020,1,23]],"date-time":"2020-01-23T00:00:00Z","timestamp":1579737600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,1,23]],"date-time":"2020-01-23T00:00:00Z","timestamp":1579737600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100000741","name":"University of Warwick","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000741","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2020,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Deep reinforcement learning algorithms have recently been used to train multiple interacting agents in a centralised manner whilst keeping their execution decentralised. When the agents can only acquire partial observations and are faced with tasks requiring coordination and synchronisation skills, inter-agent communication plays an essential role. In this work, we propose a framework for multi-agent training using deep deterministic policy gradients that enables concurrent, end-to-end learning of an explicit communication protocol through a memory device. During training, the agents learn to perform read and write operations enabling them to infer a shared representation of the world. We empirically demonstrate that concurrent learning of the communication device and individual policies can improve inter-agent coordination and performance in small-scale systems. 
Our experimental results show that the proposed method achieves superior performance in scenarios with up to six agents. We illustrate how different communication patterns can emerge on six different tasks of increasing complexity. Furthermore, we study the effects of corrupting the communication channel, provide a visualisation of the time-varying memory content as the underlying task is being solved and validate the building blocks of the proposed memory device through ablation studies.<\/jats:p>","DOI":"10.1007\/s10994-019-05864-5","type":"journal-article","created":{"date-parts":[[2020,1,23]],"date-time":"2020-01-23T18:03:48Z","timestamp":1579802628000},"page":"1727-1747","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":42,"title":["Improving coordination in small-scale multi-agent deep reinforcement learning through memory-driven communication"],"prefix":"10.1007","volume":"109","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0314-8057","authenticated-orcid":false,"given":"Emanuele","family":"Pesce","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Giovanni","family":"Montana","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2020,1,23]]},"reference":[{"key":"5864_CR1","unstructured":"Ahilan, S., & Dayan, P. (2019). Feudal multi-agent hierarchies for cooperative reinforcement learning. arXiv preprint arXiv:1901.08492"},{"key":"5864_CR2","unstructured":"Brosig, J., Ockenfels, A., & Weimann, J., et\u00a0al. (2003). Information and communication in sequential bargaining. Citeseer"},{"key":"5864_CR3","unstructured":"Caicedo, J. C., & Lazebnik, S. (2015). Active object localization with deep reinforcement learning. In: Proceedings of the IEEE international conference on computer vision (pp. 
2488\u20132496)."},{"issue":"1","key":"5864_CR4","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1177\/1555412008325478","volume":"4","author":"MG Chen","year":"2009","unstructured":"Chen, M. G. (2009). Communication, coordination, and camaraderie in world of warcraft. Games and Culture, 4(1), 47\u201373.","journal-title":"Games and Culture"},{"key":"5864_CR5","unstructured":"Chu, X., & Ye, H. (2017). Parameter sharing deep deterministic policy gradient for cooperative multi-agent reinforcement learning. arXiv preprint arXiv:1710.00336"},{"key":"5864_CR6","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1111\/j.1540-6210.2007.00827.x","volume":"67","author":"LK Comfort","year":"2007","unstructured":"Comfort, L. K. (2007). Crisis management in hindsight: Cognition, communication, coordination, and control. Public Administration Review, 67, 189\u2013197.","journal-title":"Public Administration Review"},{"issue":"4","key":"5864_CR7","doi-asserted-by":"crossref","first-page":"568","DOI":"10.2307\/2555734","volume":"20","author":"R Cooper","year":"1989","unstructured":"Cooper, R., DeJong, D. V., Forsythe, R., & Ross, T. W. (1989). Communication in the battle of the sexes game: Some experimental results. The RAND Journal of Economics, 20(4), 568.","journal-title":"The RAND Journal of Economics"},{"issue":"2","key":"5864_CR8","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1016\/0165-1765(92)90217-M","volume":"40","author":"R Cooper","year":"1992","unstructured":"Cooper, R., De Jong, D. V., Forsythe, R., & Ross, T. W. (1992). Forward induction in coordination games. Economics Letters, 40(2), 167\u2013172.","journal-title":"Economics Letters"},{"key":"5864_CR9","doi-asserted-by":"crossref","unstructured":"Cortes, J., Martinez, S., Karatas, T., & Bullo, F. (2002). Coverage control for mobile sensing networks. In: Proceedings of IEEE international conference on robotics and automation, 2002. ICRA\u201902, IEEE, (Vol.\u00a02, pp. 
1327\u20131332)","DOI":"10.1109\/ROBOT.2002.1014727"},{"issue":"2\u20133","key":"5864_CR10","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1023\/A:1007518724497","volume":"33","author":"RH Crites","year":"1998","unstructured":"Crites, R. H., & Barto, A. G. (1998). Elevator group control using multiple reinforcement learning agents. Machine Learning, 33(2\u20133), 235\u2013262.","journal-title":"Machine Learning"},{"key":"5864_CR11","unstructured":"Das, A., Gervet, T., Romoff, J., Batra, D., Parikh, D., Rabbat, M., & Pineau, J. (2018). Tarmac: Targeted multi-agent communication. arXiv preprint arXiv:1810.11187"},{"issue":"1","key":"5864_CR12","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1075\/is.11.1.05rui","volume":"11","author":"JP De Ruiter","year":"2010","unstructured":"De Ruiter, J. P., Noordzij, M. L., Newman-Norlund, S., Newman-Norlund, R., Hagoort, P., Levinson, S. C., et al. (2010). Exploring the cognitive infrastructure of communication. Interaction Studies, 11(1), 51\u201377.","journal-title":"Interaction Studies"},{"key":"5864_CR13","unstructured":"Degris, T., White, M., & Sutton, R. S. (2012). Off-policy actor-critic. arXiv preprint arXiv:1205.4839"},{"issue":"4","key":"5864_CR14","doi-asserted-by":"crossref","first-page":"1292","DOI":"10.1257\/aer.98.4.1292","volume":"98","author":"S Demichelis","year":"2008","unstructured":"Demichelis, S., & Weibull, J. W. (2008). Language, meaning, and games: A model of communication, coordination, and evolution. American Economic Review, 98(4), 1292\u20131311.","journal-title":"American Economic Review"},{"key":"5864_CR15","unstructured":"Evans, R., & Gao, J. (2016). Deepmind ai reduces google data centre cooling bill by 40"},{"key":"5864_CR16","unstructured":"Foerster, J., Assael, I. A., de\u00a0Freitas, N., & Whiteson, S. (2016). Learning to communicate with deep multi-agent reinforcement learning. In: Advances in neural information processing systems (pp. 
2137\u20132145)."},{"key":"5864_CR17","unstructured":"Foerster, J., Farquhar, G., Afouras, T., Nardelli, N., & Whiteson, S. (2017). Counterfactual multi-agent policy gradients. arXiv preprint arXiv:1705.08926"},{"key":"5864_CR18","unstructured":"Foerster, J. N., Song, F., Hughes, E., Burch, N., Dunning, I., Whiteson, S., Botvinick, M., & Bowling, M. (2018). Bayesian action decoder for deep multi-agent reinforcement learning. arXiv preprint arXiv:1811.01458"},{"issue":"3","key":"5864_CR19","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1023\/A:1008937911390","volume":"8","author":"D Fox","year":"2000","unstructured":"Fox, D., Burgard, W., Kruppa, H., & Thrun, S. (2000). A probabilistic approach to collaborative multi-robot localization. Autonomous Robots, 8(3), 325\u2013344.","journal-title":"Autonomous Robots"},{"key":"5864_CR20","unstructured":"French, A., Macedo, M., Poulsen, J., Waterson, T., & Yu, A. (2008). Multivariate analysis of variance (manova). San Francisco State University"},{"issue":"8","key":"5864_CR21","doi-asserted-by":"crossref","first-page":"931","DOI":"10.1177\/0956797612436816","volume":"23","author":"R Fusaroli","year":"2012","unstructured":"Fusaroli, R., Bahrami, B., Olsen, K., Roepstorff, A., Rees, G., Frith, C., et al. (2012). Coming to terms: Quantifying the benefits of linguistic coordination. Psychological Science, 23(8), 931\u2013939.","journal-title":"Psychological Science"},{"issue":"5","key":"5864_CR22","doi-asserted-by":"crossref","first-page":"737","DOI":"10.1207\/s15516709cog0000_34","volume":"29","author":"B Galantucci","year":"2005","unstructured":"Galantucci, B. (2005). An experimental study of the emergence of human communication systems. 
Cognitive Science, 29(5), 737\u2013767.","journal-title":"Cognitive Science"},{"issue":"1","key":"5864_CR23","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1075\/is.11.1.04gar","volume":"11","author":"S Garrod","year":"2010","unstructured":"Garrod, S., Fay, N., Rogers, S., Walker, B., & Swoboda, N. (2010). Can iterated learning explain the emergence of graphical symbols? Interaction Studies, 11(1), 33\u201350.","journal-title":"Interaction Studies"},{"key":"5864_CR24","first-page":"227","volume":"2","author":"C Guestrin","year":"2002","unstructured":"Guestrin, C., Lagoudakis, M., & Parr, R. (2002). Coordinated reinforcement learning. ICML, Citeseer, 2, 227\u2013234.","journal-title":"ICML, Citeseer"},{"key":"5864_CR25","doi-asserted-by":"crossref","unstructured":"Gupta, J. K., Egorov, M., & Kochenderfer, M. (2017). Cooperative multi-agent control using deep reinforcement learning. In: International conference on autonomous agents and multiagent systems (pp. 66\u201383). Springer","DOI":"10.1007\/978-3-319-71682-4_5"},{"key":"5864_CR26","unstructured":"Hernandez-Leal, P., Kaisers, M., Baarslag, T., de\u00a0Cote, E. M. (2017). A survey of learning in multiagent environments: Dealing with non-stationarity. arXiv preprint arXiv:1707.09183"},{"issue":"8","key":"5864_CR27","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735\u20131780.","journal-title":"Neural Computation"},{"key":"5864_CR28","unstructured":"Iqbal, S., & Sha, F. (2019). Actor-attention-critic for multi-agent reinforcement learning. 
ICML"},{"key":"5864_CR29","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-642-15612-0","volume-title":"Innovations in agent-based complex automated negotiations","author":"T It\u014d","year":"2011","unstructured":"It\u014d, T., Zhang, M., Robu, V., Fatima, S., Matsuo, T., & Yamaki, H. (2011). Innovations in agent-based complex automated negotiations. Berlin: Springer."},{"key":"5864_CR30","unstructured":"Jang, E., Gu, S., & Poole, B. (2016). Categorical reparameterization with gumbel-softmax. arXiv preprint arXiv:1611.01144"},{"issue":"11","key":"5864_CR31","doi-asserted-by":"crossref","first-page":"e49945","DOI":"10.1371\/journal.pone.0049945","volume":"7","author":"N Jarrass\u00e9","year":"2012","unstructured":"Jarrass\u00e9, N., Charalambous, T., & Burdet, E. (2012). A framework to describe, analyze and generate interactive motor behaviors. PloS One, 7(11), e49945.","journal-title":"PloS One"},{"key":"5864_CR32","unstructured":"Jiang, J., & Lu, Z. (2018). Learning attentional communication for multi-agent cooperation. arXiv preprint arXiv:1805.07733"},{"issue":"10","key":"5864_CR33","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1145\/2347736.2347753","volume":"55","author":"M Kearns","year":"2012","unstructured":"Kearns, M. (2012). Experiments in social computation. Communications of the ACM, 55(10), 56\u201367.","journal-title":"Communications of the ACM"},{"key":"5864_CR34","unstructured":"Kim, D., Moon, S., Hostallero, D., Kang, W. J., Lee, T., Son, K., & Yi, Y. (2019). Learning to schedule communication in multi-agent reinforcement learning. ICLR"},{"key":"5864_CR35","unstructured":"Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980"},{"key":"5864_CR36","unstructured":"Kong, X., Xin, B., Liu, F., & Wang, Y. (2017). Revisiting the master-slave architecture in multi-agent deep reinforcement learning. 
arXiv preprint arXiv:1712.07305"},{"key":"5864_CR37","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1016\/j.neucom.2016.01.031","volume":"190","author":"L Kraemer","year":"2016","unstructured":"Kraemer, L., & Banerjee, B. (2016). Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing, 190, 82\u201394.","journal-title":"Neurocomputing"},{"issue":"1","key":"5864_CR38","first-page":"136","volume":"37","author":"HD Lasswell","year":"1948","unstructured":"Lasswell, H. D. (1948). The structure and function of communication in society. The Communication of Ideas, 37(1), 136\u201339.","journal-title":"The Communication of Ideas"},{"key":"5864_CR39","unstructured":"Lauer, M., & Riedmiller, M. (2000). An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In: Proceedings of the seventeenth international conference on machine learning, Citeseer"},{"issue":"1","key":"5864_CR40","doi-asserted-by":"crossref","first-page":"55","DOI":"10.3233\/KES-2010-0206","volume":"15","author":"GJ Laurent","year":"2011","unstructured":"Laurent, G. J., Matignon, L., Fort-Piat, L., et al. (2011). The world of independent learners is not markovian. International Journal of Knowledge-based and Intelligent Engineering Systems, 15(1), 55\u201364.","journal-title":"International Journal of Knowledge-based and Intelligent Engineering Systems"},{"key":"5864_CR41","unstructured":"Lazaridou, A., Peysakhovich, A., Baroni, M. (2016). Multi-agent cooperation and the emergence of (natural) language. arXiv preprint arXiv:1612.07182"},{"issue":"7553","key":"5864_CR42","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","volume":"521","author":"Y LeCun","year":"2015","unstructured":"LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436.","journal-title":"Nature"},{"key":"5864_CR43","unstructured":"Li, Y. (2017). Deep reinforcement learning: An overview. 
arXiv preprint arXiv:1701.07274"},{"key":"5864_CR44","unstructured":"Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2015). Continuous control with deep reinforcement learning. CoRR abs\/1509.02971"},{"key":"5864_CR45","doi-asserted-by":"crossref","unstructured":"Littman, M. L. (1994). Markov games as a framework for multi-agent reinforcement learning. In: Machine learning proceedings 1994 (pp. 157\u2013163). Elsevier","DOI":"10.1016\/B978-1-55860-335-6.50027-1"},{"key":"5864_CR46","unstructured":"Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, O. P., & Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in neural information processing systems (pp. 6379\u20136390)"},{"key":"5864_CR47","doi-asserted-by":"crossref","unstructured":"Matignon, L., Laurent, G., & Le\u00a0Fort-Piat, N. (2007). Hysteretic q-learning: an algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In: IEEE\/RSJ international conference on intelligent robots and systems (pp. 157\u2013163) IROS\u201907.","DOI":"10.1109\/IROS.2007.4399095"},{"issue":"5","key":"5864_CR48","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1002\/cplx.20034","volume":"9","author":"JH Miller","year":"2004","unstructured":"Miller, J. H., & Moser, S. (2004). Communication and coordination. Complexity, 9(5), 31\u201340.","journal-title":"Complexity"},{"key":"5864_CR49","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602"},{"issue":"7540","key":"5864_CR50","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., et al. (2015). 
Human-level control through deep reinforcement learning. Nature, 518(7540), 529.","journal-title":"Nature"},{"key":"5864_CR51","unstructured":"Mordatch, I., & Abbeel, P. (2017). Emergence of grounded compositional language in multi-agent populations. arXiv preprint arXiv:1703.04908"},{"issue":"1","key":"5864_CR52","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1109\/JPROC.2006.887293","volume":"95","author":"R Olfati-Saber","year":"2007","unstructured":"Olfati-Saber, R., Fax, J. A., & Murray, R. M. (2007). Consensus and cooperation in networked multi-agent systems. Proceedings of the IEEE, 95(1), 215\u2013233.","journal-title":"Proceedings of the IEEE"},{"key":"5864_CR53","doi-asserted-by":"crossref","unstructured":"Oliehoek, F. A., & Vlassis, N. (2007). Q-value functions for decentralized pomdps. In: Proceedings of the 6th international joint conference on autonomous agents and multiagent systems. ACM","DOI":"10.1145\/1329125.1329390"},{"key":"5864_CR54","unstructured":"Ono, N., & Fukumoto, K. (1996). Multi-agent reinforcement learning: A modular approach. In: Second international conference on multiagent systems (pp. 252\u2013258)."},{"issue":"3","key":"5864_CR55","doi-asserted-by":"crossref","first-page":"387","DOI":"10.1007\/s10458-005-2631-2","volume":"11","author":"L Panait","year":"2005","unstructured":"Panait, L., & Luke, S. (2005). Cooperative multi-agent learning: The state of the art. Autonomous Agents and Multi-agent Systems, 11(3), 387\u2013434.","journal-title":"Autonomous Agents and Multi-agent Systems"},{"issue":"2","key":"5864_CR56","doi-asserted-by":"crossref","first-page":"314","DOI":"10.1111\/1467-8306.9302004","volume":"93","author":"DC Parker","year":"2003","unstructured":"Parker, D. C., Manson, S. M., Janssen, M. A., Hoffmann, M. J., & Deadman, P. (2003). Multi-agent systems for the simulation of land-use and land-cover change: A review. 
Annals of the Association of American Geographers, 93(2), 314\u2013337.","journal-title":"Annals of the Association of American Geographers"},{"key":"5864_CR57","unstructured":"Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., & Lerer, A. (2017). Automatic differentiation in pytorch"},{"key":"5864_CR58","unstructured":"Peng, P., Yuan, Q., Wen, Y., Yang, Y., Tang, Z., Long, H., & Wang, J. (2017). Multiagent bidirectionally-coordinated nets for learning to play starcraft combat games. arXiv preprint arXiv:1703.10069"},{"key":"5864_CR59","unstructured":"Peshkin, L., Kim, K. E., Meuleau, N., & Kaelbling, L. P. (2000). Learning to cooperate via policy search. In: Proceedings of the sixteenth conference on uncertainty in artificial intelligence (pp. 489\u2013496). Morgan Kaufmann Publishers Inc."},{"key":"5864_CR60","doi-asserted-by":"crossref","first-page":"372","DOI":"10.1016\/j.trc.2017.11.009","volume":"86","author":"A Petrillo","year":"2018","unstructured":"Petrillo, A., Salvi, A., Santini, S., & Valente, A. S. (2018). Adaptive multi-agents synchronization for collaborative driving of autonomous vehicles with multiple communication delays. Transportation Research Part C: Emerging Technologies, 86, 372\u2013392.","journal-title":"Transportation Research Part C: Emerging Technologies"},{"key":"5864_CR61","unstructured":"Pipattanasomporn, M., Feroze, H., & Rahman, S. (2009). Multi-agent systems in a distributed smart grid: Design and implementation. In: Power systems conference and exposition (2009). PSCE\u201909 (pp. 1\u20138). IEEE: IEEE\/PES."},{"issue":"4","key":"5864_CR62","doi-asserted-by":"crossref","first-page":"324","DOI":"10.1016\/j.robot.2007.08.005","volume":"56","author":"W Ren","year":"2008","unstructured":"Ren, W., & Sorensen, N. (2008). Distributed coordination architecture for multi-robot formation control. 
Robotics and Autonomous Systems, 56(4), 324\u2013333.","journal-title":"Robotics and Autonomous Systems"},{"key":"5864_CR63","doi-asserted-by":"crossref","unstructured":"Scardovi, L., & Sepulchre, R. (2008). Synchronization in networks of identical linear systems. In: 47th IEEE conference on decision and control, 2008. CDC 2008 (pp. 546\u2013551). IEEE","DOI":"10.1109\/CDC.2008.4738875"},{"key":"5864_CR64","unstructured":"Schmidhuber, J. (1996). A general method for multi-agent reinforcement learning in unrestricted environments. In: Adaptation, coevolution and learning in multiagent systems: papers from the 1996 AAAI spring symposium (pp. 84\u201387)"},{"key":"5864_CR65","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1016\/j.neunet.2014.09.003","volume":"61","author":"J Schmidhuber","year":"2015","unstructured":"Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85\u2013117.","journal-title":"Neural Networks"},{"key":"5864_CR66","unstructured":"Schulman, J., Levine, S., Abbeel, P., Jordan, M., & Moritz, P. (2015). Trust region policy optimization. In: International conference on machine learning (pp. 1889\u20131897)"},{"issue":"18","key":"5864_CR67","doi-asserted-by":"crossref","first-page":"7361","DOI":"10.1073\/pnas.0702077104","volume":"104","author":"R Selten","year":"2007","unstructured":"Selten, R., & Warglien, M. (2007). The emergence of simple languages in an experimental coordination game. Proceedings of the National Academy of Sciences, 104(18), 7361\u20137366.","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"5864_CR68","unstructured":"Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., & Riedmiller, M. (2014). Deterministic policy gradient algorithms. In: ICML"},{"key":"5864_CR69","unstructured":"Singh, A., Jain, T., & Sukhbaatar, S. (2019). Learning when to communicate at scale in multiagent cooperative and competitive tasks. 
ICLR"},{"key":"5864_CR70","doi-asserted-by":"crossref","unstructured":"Singh, S. P., Jaakkola, T., & Jordan, M. I. (1994). Learning without state-estimation in partially observable markovian decision processes. In: Proceedings of machine learning 1994 (pp. 284\u2013292). Elsevier","DOI":"10.1016\/B978-1-55860-335-6.50042-8"},{"issue":"1","key":"5864_CR71","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1006\/ijhc.1997.0162","volume":"48","author":"P Stone","year":"1998","unstructured":"Stone, P., & Veloso, M. (1998). Towards collaborative and adversarial learning: A case study in robotic soccer. International Journal of Human-Computer Studies, 48(1), 83\u2013104.","journal-title":"International Journal of Human-Computer Studies"},{"key":"5864_CR72","unstructured":"Sukhbaatar, S., & Fergus, R., et\u00a0al. (2016). Learning multiagent communication with backpropagation. In: Advances in neural information processing systems (pp. 2244\u20132252)"},{"key":"5864_CR73","volume-title":"Introduction to reinforcement learning","author":"RS Sutton","year":"1998","unstructured":"Sutton, R. S., & Barto, A. G. (1998). Introduction to reinforcement learning (Vol. 135). Cambridge: MIT Press."},{"issue":"4","key":"5864_CR74","doi-asserted-by":"crossref","first-page":"366","DOI":"10.1162\/BIOT_a_00064","volume":"5","author":"S Sz\u00e1mad\u00f3","year":"2010","unstructured":"Sz\u00e1mad\u00f3, S. (2010). Pre-hunt communication provides context for the evolution of early human language. Biological Theory, 5(4), 366\u2013382.","journal-title":"Biological Theory"},{"issue":"4","key":"5864_CR75","doi-asserted-by":"crossref","first-page":"e0172395","DOI":"10.1371\/journal.pone.0172395","volume":"12","author":"A Tampuu","year":"2017","unstructured":"Tampuu, A., Matiisen, T., Kodelja, D., Kuzovkin, I., Korjus, K., Aru, J., et al. (2017). Multiagent cooperation and competition with deep reinforcement learning. 
PloS One, 12(4), e0172395.","journal-title":"PloS One"},{"key":"5864_CR76","unstructured":"Tan, M. (1993). Multi-agent reinforcement learning: Independent vs. cooperative agents. In: Proceedings of the tenth international conference on machine learning (pp. 330\u2013337)"},{"issue":"1","key":"5864_CR77","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1075\/is.11.1.08the","volume":"11","author":"CA Theisen","year":"2010","unstructured":"Theisen, C. A., Oberlander, J., & Kirby, S. (2010). Systematicity and arbitrariness in novel communication systems. Interaction Studies, 11(1), 14\u201332.","journal-title":"Interaction Studies"},{"issue":"3","key":"5864_CR78","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1609\/aimag.v33i3.2426","volume":"33","author":"K Tuyls","year":"2012","unstructured":"Tuyls, K., & Weiss, G. (2012). Multiagent learning: Basics, challenges, and prospects. Ai Magazine, 33(3), 41.","journal-title":"Ai Magazine"},{"issue":"5","key":"5864_CR79","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1103\/PhysRev.36.823","volume":"36","author":"GE Uhlenbeck","year":"1930","unstructured":"Uhlenbeck, G. E., & Ornstein, L. S. (1930). On the theory of the brownian motion. Physical Review, 36(5), 823.","journal-title":"Physical Review"},{"key":"5864_CR80","volume-title":"Python Tutorial","author":"G Van Rossum","year":"1995","unstructured":"Van Rossum, G., & Drake, F. L, Jr. (1995). Python Tutorial. The Netherlands: Centrum voor Wiskunde en Informatica Amsterdam."},{"issue":"2","key":"5864_CR81","doi-asserted-by":"crossref","first-page":"e0170780","DOI":"10.1371\/journal.pone.0170780","volume":"12","author":"Y Vorobeychik","year":"2017","unstructured":"Vorobeychik, Y., Joveski, Z., & Yu, S. (2017). Does communication help people coordinate? 
PloS One, 12(2), e0170780.","journal-title":"PloS One"},{"issue":"2","key":"5864_CR82","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1002\/rnc.1687","volume":"22","author":"G Wen","year":"2012","unstructured":"Wen, G., Duan, Z., Yu, W., & Chen, G. (2012). Consensus in multi-agent systems with communication constraints. International Journal of Robust and Nonlinear Control, 22(2), 170\u2013182.","journal-title":"International Journal of Robust and Nonlinear Control"},{"key":"5864_CR83","unstructured":"Wen, Y., Yang, Y., Luo, R., Wang, J., & Pan, W. (2019). Probabilistic recursive reasoning for multi-agent reinforcement learning. arXiv preprint arXiv:1901.09207"},{"issue":"5","key":"5864_CR84","doi-asserted-by":"crossref","first-page":"447","DOI":"10.1111\/1468-0017.00237","volume":"18","author":"T Wharton","year":"2003","unstructured":"Wharton, T. (2003). Natural pragmatics and natural codes. Mind & Language, 18(5), 447\u2013477.","journal-title":"Mind & Language"},{"key":"5864_CR85","unstructured":"Wunder, M., Littman, M., & Stone, M. (2009). Communication, credibility and negotiation using a cognitive hierarchy model. In: Workshop# 19: MSDM 2009, p\u00a073"},{"issue":"10","key":"5864_CR86","doi-asserted-by":"crossref","first-page":"2262","DOI":"10.1109\/TAC.2011.2164017","volume":"56","author":"K You","year":"2011","unstructured":"You, K., & Xie, L. (2011). Network topology and communication data rate for consensusability of discrete-time multi-agent systems. 
IEEE Transactions on Automatic Control, 56(10), 2262.","journal-title":"IEEE Transactions on Automatic Control"}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-019-05864-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s10994-019-05864-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-019-05864-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,1,22]],"date-time":"2021-01-22T00:50:28Z","timestamp":1611276628000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s10994-019-05864-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,1,23]]},"references-count":86,"journal-issue":{"issue":"9-10","published-print":{"date-parts":[[2020,9]]}},"alternative-id":["5864"],"URL":"https:\/\/doi.org\/10.1007\/s10994-019-05864-5","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,1,23]]},"assertion":[{"value":"21 January 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 October 2019","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 December 2019","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 January 2020","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}