{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,3]],"date-time":"2025-10-03T08:51:18Z","timestamp":1759481478626,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":27,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,10,13]],"date-time":"2019-10-13T00:00:00Z","timestamp":1570924800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,10,13]]},"DOI":"10.1145\/3356464.3357712","type":"proceedings-article","created":{"date-parts":[[2019,10,31]],"date-time":"2019-10-31T12:20:52Z","timestamp":1572524452000},"page":"1-7","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Achieving cooperation through deep multiagent reinforcement learning in sequential prisoner's dilemmas"],"prefix":"10.1145","author":[{"given":"Weixun","family":"Wang","sequence":"first","affiliation":[{"name":"Tianjin University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianye","family":"Hao","sequence":"additional","affiliation":[{"name":"Tianjin University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yixi","family":"Wang","sequence":"additional","affiliation":[{"name":"Tianjin University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Matthew","family":"Taylor","sequence":"additional","affiliation":[{"name":"Washington State University"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,10,13]]},"reference":[{"unstructured":"Robert Axelrod. 1984. 
The evolution of cooperation.","key":"e_1_3_2_1_1_1"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_2_1","DOI":"10.1007\/s10458-007-0020-8"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_3_1","DOI":"10.1109\/TSMCC.2007.913919"},{"key":"e_1_3_2_1_4_1","volume-title":"Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems-Volume 1. International Foundation for Autonomous Agents and Multiagent Systems, 399--406","author":"Crandall Jacob W","year":"2012","unstructured":"Jacob W Crandall. 2012. Just add Pepper: extending learning algorithms for repeated matrix games to repeated markov games. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems-Volume 1. International Foundation for Autonomous Agents and Multiagent Systems, 399--406."},{"key":"e_1_3_2_1_5_1","volume-title":"AAAI workshop on Multiagent Learning.","author":"Crandall Jacob W","year":"2005","unstructured":"Jacob W Crandall and Michael A Goodrich. 2005. Learning to teach and follow in repeated games. In AAAI workshop on Multiagent Learning."},{"unstructured":"Steven Damer and Maria L Gini. 2008. Achieving Cooperation in a Minimally Constrained Environment. In AAAI. 57--62.","key":"e_1_3_2_1_6_1"},{"key":"e_1_3_2_1_7_1","volume-title":"Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems. 
International Foundation for Autonomous Agents and Multiagent Systems, 1141--1148","author":"Elidrisi Mohamed","year":"2014","unstructured":"Mohamed Elidrisi, Nicholas Johnson, Maria Gini, and Jacob Crandall. 2014. Fast adaptive learning in repeated stochastic games by game abstraction. In Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems. International Foundation for Autonomous Agents and Multiagent Systems, 1141--1148."},{"key":"e_1_3_2_1_8_1","volume-title":"Counterfactual Multi-Agent Policy Gradients. arXiv preprint arXiv:1705.08926","author":"Foerster Jakob","year":"2017","unstructured":"Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. 2017. Counterfactual Multi-Agent Policy Gradients. arXiv preprint arXiv:1705.08926 (2017)."},{"unstructured":"Jakob Foerster Nantas Nardelli Gregory Farquhar Philip Torr Pushmeet Kohli Shimon Whiteson et al. 2017. Stabilising experience replay for deep multi-agent reinforcement learning. 
arXiv preprint arXiv:1702.08887 (2017).","key":"e_1_3_2_1_9_1"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_10_1","DOI":"10.1007\/s10458-014-9265-1"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_11_1","DOI":"10.1007\/978-3-319-71682-4_15"},{"key":"e_1_3_2_1_12_1","volume-title":"Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems. 1315--1316","author":"Hernandez-Leal Pablo","year":"2016","unstructured":"Pablo Hernandez-Leal, Benjamin Rosman, Matthew E Taylor, L Enrique Sucar, and Enrique Munoz de Cote. 2016. A Bayesian approach for learning and tracking switching, non-stationary opponents. In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems. 1315--1316."},{"doi-asserted-by":"crossref","unstructured":"Pablo Hernandez-Leal Matthew E Taylor Benjamin Rosman L Enrique Sucar and Enrique Munoz De Cote. 2016. Identifying and Tracking Switching Non-stationary Opponents: a Bayesian Approach. In Multiagent Interaction without Prior Coordination Workshop at AAAI.","key":"e_1_3_2_1_13_1","DOI":"10.1007\/s10458-016-9352-6"},{"key":"e_1_3_2_1_14_1","first-page":"1039","article-title":"Nash Q-learning for general-sum stochastic games","author":"Hu Junling","year":"2003","unstructured":"
Junling Hu and Michael P Wellman. 2003. Nash Q-learning for general-sum stochastic games. Journal of machine learning research 4, Nov (2003), 1039--1069.","journal-title":"Journal of machine learning research 4"},{"key":"e_1_3_2_1_15_1","volume-title":"Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 464--473","author":"Leibo Joel Z","year":"2017","unstructured":"Joel Z Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, and Thore Graepel. 2017. Multi-agent Reinforcement Learning in Sequential Social Dilemmas. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 464--473."},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_16_1","DOI":"10.5555\/3091574.3091594"},{"key":"e_1_3_2_1_17_1","volume-title":"Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. arXiv preprint arXiv:1706.02275","author":"Lowe Ryan","year":"2017","unstructured":"Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. 2017. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. arXiv preprint arXiv:1706.02275 (2017)."},{"key":"e_1_3_2_1_18_1","volume-title":"Playing atari with deep reinforcement learning. 
arXiv preprint arXiv:1312.5602","author":"Mnih Volodymyr","year":"2013","unstructured":"Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)."},{"doi-asserted-by":"crossref","unstructured":"Volodymyr Mnih Koray Kavukcuoglu David Silver Andrei A Rusu Joel Veness Marc G Bellemare Alex Graves Martin Riedmiller Andreas K Fidjeland Georg Ostrovski et al. 2015. Human-level control through deep reinforcement learning. Nature 518 7540 (2015) 529--533.","key":"e_1_3_2_1_19_1","DOI":"10.1038\/nature14236"},{"key":"e_1_3_2_1_20_1","volume-title":"A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner's Dilemma game. Nature 364, 6432","author":"Nowak Martin","year":"1993","unstructured":"Martin Nowak and Karl Sigmund. 1993. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner's Dilemma game. Nature 364, 6432 (1993), 56--58."},{"key":"e_1_3_2_1_21_1","volume-title":"A multi-agent reinforcement learning model of common-pool resource appropriation. arXiv preprint arXiv:1707.06600","author":"Perolat Julien","year":"2017","unstructured":"Julien Perolat, Joel Z Leibo, Vinicius Zambaldi, Charles Beattie, Karl Tuyls, and Thore Graepel. 
2017. A multi-agent reinforcement learning model of common-pool resource appropriation. arXiv preprint arXiv:1707.06600 (2017)."},{"key":"e_1_3_2_1_22_1","volume-title":"High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438","author":"Schulman John","year":"2015","unstructured":"John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. 2015. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438 (2015)."},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_23_1","DOI":"10.1023\/A:1008942012299"},{"key":"e_1_3_2_1_24_1","volume-title":"Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z Leibo, Karl Tuyls, et al.","author":"Sunehag Peter","year":"2017","unstructured":"Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z Leibo, Karl Tuyls, et al. 2017. Value-Decomposition Networks For Cooperative Multi-Agent Learning. arXiv preprint arXiv:1706.05296 (2017)."},{"unstructured":"Richard S Sutton David A McAllester Satinder P Singh and Yishay Mansour. 2000. 
Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems. 1057--1063.","key":"e_1_3_2_1_25_1"},{"key":"e_1_3_2_1_26_1","volume-title":"Sample efficient actor-critic with experience replay. arXiv preprint arXiv:1611.01224","author":"Wang Ziyu","year":"2016","unstructured":"Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, and Nando de Freitas. 2016. Sample efficient actor-critic with experience replay. arXiv preprint arXiv:1611.01224 (2016)."},{"key":"e_1_3_2_1_27_1","volume-title":"Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning 8, 3--4","author":"Williams Ronald J","year":"1992","unstructured":"Ronald J Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. 
Machine learning 8, 3--4 (1992), 229--256."}],"event":{"acronym":"DAI '19","name":"DAI '19: First International Conference on Distributed Artificial Intelligence","location":"Beijing China"},"container-title":["Proceedings of the First International Conference on Distributed Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3356464.3357712","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3356464.3357712","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:22:54Z","timestamp":1750202574000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3356464.3357712"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,13]]},"references-count":27,"alternative-id":["10.1145\/3356464.3357712","10.1145\/3356464"],"URL":"https:\/\/doi.org\/10.1145\/3356464.3357712","relation":{},"subject":[],"published":{"date-parts":[[2019,10,13]]},"assertion":[{"value":"2019-10-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}