{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T14:40:10Z","timestamp":1777560010055,"version":"3.51.4"},"reference-count":42,"publisher":"SAGE Publications","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AIC"],"published-print":{"date-parts":[[2022,9,20]]},"abstract":"<jats:p>The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning. Towards this goal, the Autonomous Agents Research Group develops novel machine learning algorithms for autonomous systems control, with a specific focus on deep reinforcement learning and multi-agent reinforcement learning. Research problems include scalable learning of coordinated agent policies and inter-agent communication; reasoning about the behaviours, goals, and composition of other agents from limited observations; and sample-efficient learning based on intrinsic motivation, curriculum learning, causal inference, and representation learning. 
This article provides a broad overview of the ongoing research portfolio of the group and discusses open problems for future directions.<\/jats:p>","DOI":"10.3233\/aic-220116","type":"journal-article","created":{"date-parts":[[2022,9,2]],"date-time":"2022-09-02T11:21:25Z","timestamp":1662117685000},"page":"357-368","source":"Crossref","is-referenced-by-count":15,"title":["Deep reinforcement learning for multi-agent interaction"],"prefix":"10.1177","volume":"35","author":[{"given":"Ibrahim H.","family":"Ahmed","sequence":"first","affiliation":[{"name":"Autonomous Agents Research Group, School of Informatics, University of Edinburgh, United Kingdom"}]},{"given":"Cillian","family":"Brewitt","sequence":"additional","affiliation":[{"name":"Autonomous Agents Research Group, School of Informatics, University of Edinburgh, United Kingdom"}]},{"given":"Ignacio","family":"Carlucho","sequence":"additional","affiliation":[{"name":"Autonomous Agents Research Group, School of Informatics, University of Edinburgh, United Kingdom"}]},{"given":"Filippos","family":"Christianos","sequence":"additional","affiliation":[{"name":"Autonomous Agents Research Group, School of Informatics, University of Edinburgh, United Kingdom"}]},{"given":"Mhairi","family":"Dunion","sequence":"additional","affiliation":[{"name":"Autonomous Agents Research Group, School of Informatics, University of Edinburgh, United Kingdom"}]},{"given":"Elliot","family":"Fosong","sequence":"additional","affiliation":[{"name":"Autonomous Agents Research Group, School of Informatics, University of Edinburgh, United Kingdom"}]},{"given":"Samuel","family":"Garcin","sequence":"additional","affiliation":[{"name":"Autonomous Agents Research Group, School of Informatics, University of Edinburgh, United Kingdom"}]},{"given":"Shangmin","family":"Guo","sequence":"additional","affiliation":[{"name":"Autonomous Agents Research Group, School of Informatics, University of Edinburgh, United 
Kingdom"}]},{"given":"Balint","family":"Gyevnar","sequence":"additional","affiliation":[{"name":"Autonomous Agents Research Group, School of Informatics, University of Edinburgh, United Kingdom"}]},{"given":"Trevor","family":"McInroe","sequence":"additional","affiliation":[{"name":"Autonomous Agents Research Group, School of Informatics, University of Edinburgh, United Kingdom"}]},{"given":"Georgios","family":"Papoudakis","sequence":"additional","affiliation":[{"name":"Autonomous Agents Research Group, School of Informatics, University of Edinburgh, United Kingdom"}]},{"given":"Arrasy","family":"Rahman","sequence":"additional","affiliation":[{"name":"Autonomous Agents Research Group, School of Informatics, University of Edinburgh, United Kingdom"}]},{"given":"Lukas","family":"Sch\u00e4fer","sequence":"additional","affiliation":[{"name":"Autonomous Agents Research Group, School of Informatics, University of Edinburgh, United Kingdom"}]},{"given":"Massimiliano","family":"Tamborski","sequence":"additional","affiliation":[{"name":"Autonomous Agents Research Group, School of Informatics, University of Edinburgh, United Kingdom"}]},{"given":"Giuseppe","family":"Vecchio","sequence":"additional","affiliation":[{"name":"Autonomous Agents Research Group, School of Informatics, University of Edinburgh, United Kingdom"}]},{"given":"Cheng","family":"Wang","sequence":"additional","affiliation":[{"name":"Autonomous Agents Research Group, School of Informatics, University of Edinburgh, United Kingdom"}]},{"given":"Stefano\u00a0V.","family":"Albrecht","sequence":"additional","affiliation":[{"name":"Autonomous Agents Research Group, School of Informatics, University of Edinburgh, United Kingdom"}]}],"member":"179","reference":[{"key":"10.3233\/AIC-220116_ref1","doi-asserted-by":"crossref","unstructured":"I.H.\u00a0Ahmed, J.P.\u00a0Hanna, E.\u00a0Fosong and S.V.\u00a0Albrecht, Towards quantum-secure authentication and key agreement via abstract multi-agent interaction, in: 
International Conference on Practical Applications of Agents and Multi-Agent Systems (PAAMS), 2021.","DOI":"10.1007\/978-3-030-85739-4_2"},{"key":"10.3233\/AIC-220116_ref2","doi-asserted-by":"crossref","unstructured":"S.V.\u00a0Albrecht, C.\u00a0Brewitt, J.\u00a0Wilhelm, B.\u00a0Gyevnar, F.\u00a0Eiras, M.\u00a0Dobre and S.\u00a0Ramamoorthy, Interpretable goal-based prediction and planning for autonomous driving, in: IEEE International Conference on Robotics and Automation (ICRA), 2021.","DOI":"10.1109\/ICRA48506.2021.9560849"},{"key":"10.3233\/AIC-220116_ref3","doi-asserted-by":"publisher","first-page":"765","DOI":"10.1007\/s10458-016-9358-0","article-title":"Special issue on multiagent interaction without prior coordination: Guest editorial","volume":"31","author":"Albrecht","year":"2017","journal-title":"Autonomous Agents and Multi-Agent Systems"},{"key":"10.3233\/AIC-220116_ref4","unstructured":"S.V.\u00a0Albrecht and S.\u00a0Ramamoorthy, A game-theoretic model and best-response learning method for ad hoc coordination in multiagent systems, in: Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems, St. Paul, Minnesota, USA, 2013."},{"key":"10.3233\/AIC-220116_ref5","unstructured":"S.V.\u00a0Albrecht and S.\u00a0Ramamoorthy, Are you doing what I think you are doing? 
Criticising uncertain agent models, in: Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence, 2015, pp.\u00a052\u201361."},{"key":"10.3233\/AIC-220116_ref6","doi-asserted-by":"publisher","first-page":"66","DOI":"10.1016\/j.artint.2018.01.002","article-title":"Autonomous agents modelling other agents: A comprehensive survey and open problems","volume":"258","author":"Albrecht","year":"2018","journal-title":"Artificial Intelligence"},{"key":"10.3233\/AIC-220116_ref7","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2020.103292"},{"key":"10.3233\/AIC-220116_ref8","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-32375-1_2"},{"key":"10.3233\/AIC-220116_ref9","doi-asserted-by":"publisher","DOI":"10.1109\/IV47402.2020.9304839"},{"key":"10.3233\/AIC-220116_ref10","doi-asserted-by":"crossref","unstructured":"C.\u00a0Brewitt, B.\u00a0Gyevnar, S.\u00a0Garcin and S.V.\u00a0Albrecht, GRIT: Fast, interpretable, and verifiable goal recognition with learned decision trees for autonomous driving, in: IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021.","DOI":"10.1109\/IROS51168.2021.9636279"},{"key":"10.3233\/AIC-220116_ref12","unstructured":"I.\u00a0Carlucho, A.\u00a0Rahman, W.\u00a0Ard, E.\u00a0Fosong, C.\u00a0Barbalata and S.V.\u00a0Albrecht, Cooperative marine operations via ad hoc teams, in: IJCAI Workshop on Ad Hoc Teamwork, 2022."},{"key":"10.3233\/AIC-220116_ref13","unstructured":"F.\u00a0Christianos, G.\u00a0Papoudakis, A.\u00a0Rahman and S.V.\u00a0Albrecht, Scaling multi-agent reinforcement learning with selective parameter sharing, in: International Conference on Machine Learning (ICML), 2021."},{"key":"10.3233\/AIC-220116_ref14","unstructured":"F.\u00a0Christianos, L.\u00a0Sch\u00e4fer and S.V.\u00a0Albrecht, Shared experience actor-critic for multi-agent reinforcement learning, in: 34th Conference on Neural Information Processing Systems (NeurIPS), 
2020."},{"key":"10.3233\/AIC-220116_ref15","doi-asserted-by":"publisher","first-page":"1086","DOI":"10.1109\/TITS.2019.2901791","article-title":"Multi-agent deep reinforcement learning for large-scale traffic signal control","volume":"21","author":"Chu","year":"2020","journal-title":"IEEE Transactions on Intelligent Transportation Systems"},{"key":"10.3233\/AIC-220116_ref16","unstructured":"M.\u00a0Dennis, N.\u00a0Jaques, E.\u00a0Vinitsky, A.\u00a0Bayen, S.\u00a0Russell, A.\u00a0Critch and S.\u00a0Levine, Emergent complexity and zero-shot transfer via unsupervised environment design, in: NIPS, 2020."},{"key":"10.3233\/AIC-220116_ref17","unstructured":"A.\u00a0Dosovitskiy, G.\u00a0Ros, F.\u00a0Codevilla, A.\u00a0Lopez and V.\u00a0Koltun, CARLA: An open urban driving simulator, in: Proceedings of the 1st Annual Conference on Robot Learning, 2017, pp.\u00a01\u201316."},{"key":"10.3233\/AIC-220116_ref19","unstructured":"E.\u00a0Fosong, A.\u00a0Rahman, I.\u00a0Carlucho and S.V.\u00a0Albrecht, Few-shot teamwork, in: IJCAI Workshop on Ad Hoc Teamwork, 2022."},{"key":"10.3233\/AIC-220116_ref20","unstructured":"D.\u00a0Ghosh, J.\u00a0Rahme, A.\u00a0Kumar, A.\u00a0Zhang, R.P.\u00a0Adams and S.\u00a0Levine, Why generalization in RL is difficult: Epistemic POMDPs and implicit partial observability, in: Advances in Neural Information Processing Systems, 2021."},{"key":"10.3233\/AIC-220116_ref21","unstructured":"S.\u00a0Guo, Y.\u00a0Ren, K.\u00a0Mathewson, S.\u00a0Kirby, S.V.\u00a0Albrecht and K.\u00a0Smith, Expressivity of emergent languages is a trade-off between contextual complexity and unpredictability, in: International Conference on Learning Representations (ICLR), 2022."},{"key":"10.3233\/AIC-220116_ref22","unstructured":"B.\u00a0Gyevnar, M.\u00a0Tamborski, C.\u00a0Wang, C.G.\u00a0Lucas, S.B.\u00a0Cohen and S.V.\u00a0Albrecht, A human-centric method for generating causal explanations in natural language for autonomous vehicle motion planning, in: IJCAI Workshop on 
Artificial Intelligence for Autonomous Driving, 2022."},{"key":"10.3233\/AIC-220116_ref23","doi-asserted-by":"crossref","unstructured":"J.P.\u00a0Hanna, A.\u00a0Rahman, E.\u00a0Fosong, F.\u00a0Eiras, M.\u00a0Dobre, J.\u00a0Redford, S.\u00a0Ramamoorthy and S.V.\u00a0Albrecht, Interpretable goal recognition in the presence of occluded factors for autonomous vehicles, in: IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021.","DOI":"10.1109\/IROS51168.2021.9635903"},{"key":"10.3233\/AIC-220116_ref24","doi-asserted-by":"crossref","unstructured":"M.\u00a0Jacob, S.\u00a0Devlin and K.\u00a0Hofmann, \u201cIt\u2019s unwieldy and it takes a lot of time\u201d \u2013 challenges and opportunities for creating agents in commercial games, in: Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, Vol.\u00a016, 2020, pp.\u00a088\u201394.","DOI":"10.1609\/aiide.v16i1.7415"},{"key":"10.3233\/AIC-220116_ref28","doi-asserted-by":"publisher","DOI":"10.1109\/ITSC45102.2020.9294728"},{"key":"10.3233\/AIC-220116_ref30","unstructured":"R.\u00a0Lowe, Y.\u00a0Wu, A.\u00a0Tamar, J.\u00a0Harb, P.\u00a0Abbeel and I.\u00a0Mordatch, Multi-agent actor-critic for mixed cooperative-competitive environments, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS\u201917, Curran Associates Inc., Red Hook, NY, USA, 2017, pp.\u00a06382\u20136393. ISBN 9781510860964."},{"key":"10.3233\/AIC-220116_ref31","doi-asserted-by":"crossref","unstructured":"W.\u00a0Macke, R.\u00a0Mirsky and P.\u00a0Stone, Expected value of communication for planning in ad hoc teamwork, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol.\u00a035, 2021, pp.\u00a011290\u201311298.","DOI":"10.1609\/aaai.v35i13.17346"},{"key":"10.3233\/AIC-220116_ref32","unstructured":"D.\u00a0Malik, Y.\u00a0Li and P.\u00a0Ravikumar, When is generalizable reinforcement learning tractable? 
in: Advances in Neural Information Processing Systems, 2021."},{"issue":"2","key":"10.3233\/AIC-220116_ref35","doi-asserted-by":"publisher","first-page":"175","DOI":"10.1007\/s10458-015-9280-x","article-title":"Ad hoc teamwork by learning teammates\u2019 task","volume":"30","author":"Melo","year":"2016","journal-title":"Autonomous Agents and Multi-Agent Systems"},{"key":"10.3233\/AIC-220116_ref36","doi-asserted-by":"crossref","unstructured":"R.\u00a0Mirsky, I.\u00a0Carlucho, A.\u00a0Rahman, E.\u00a0Fosong, W.\u00a0Macke, M.\u00a0Sridharan, P.\u00a0Stone and S.V.\u00a0Albrecht, A survey of ad hoc teamwork: Definitions, methods, and open problems, in: European Conference on Multi-Agent Systems (EUMAS), 2022.","DOI":"10.1007\/978-3-031-20614-6_16"},{"key":"10.3233\/AIC-220116_ref37","doi-asserted-by":"crossref","unstructured":"R.\u00a0Mirsky, W.\u00a0Macke, A.\u00a0Wang, H.\u00a0Yedidsion and P.\u00a0Stone, A penny for your thoughts: The value of communication in ad hoc teamwork, in: Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence, 2021, pp.\u00a0254\u2013260.","DOI":"10.24963\/ijcai.2020\/36"},{"key":"10.3233\/AIC-220116_ref40","doi-asserted-by":"crossref","unstructured":"P.-Y.\u00a0Oudeyer and F.\u00a0Kaplan, What is intrinsic motivation? 
A typology of computational approaches, Frontiers in Neurorobotics 1 (2009), 6.","DOI":"10.3389\/neuro.12.006.2007"},{"key":"10.3233\/AIC-220116_ref41","unstructured":"G.\u00a0Papoudakis, F.\u00a0Christianos and S.V.\u00a0Albrecht, Agent modelling under partial observability for deep reinforcement learning, in: Proceedings of the Neural Information Processing Systems (NeurIPS), 2021."},{"key":"10.3233\/AIC-220116_ref43","unstructured":"G.\u00a0Papoudakis, F.\u00a0Christianos, L.\u00a0Sch\u00e4fer and S.V.\u00a0Albrecht, Benchmarking multi-agent deep reinforcement learning algorithms in cooperative tasks, in: Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS), 2021."},{"key":"10.3233\/AIC-220116_ref45","unstructured":"L.\u00a0Pinto, J.\u00a0Davidson, R.\u00a0Sukthankar and A.K.\u00a0Gupta, Robust adversarial reinforcement learning, in: ICML, 2017."},{"key":"10.3233\/AIC-220116_ref46","unstructured":"D.\u00a0Precup, R.S.\u00a0Sutton and S.\u00a0Singh, Eligibility traces for off-policy policy evaluation, in: Proceedings of the 17th International Conference on Machine Learning (ICML), 2000, pp.\u00a0759\u2013766."},{"key":"10.3233\/AIC-220116_ref47","unstructured":"A.\u00a0Rahman, E.\u00a0Fosong, I.\u00a0Carlucho and S.V.\u00a0Albrecht, Towards robust ad hoc teamwork agents by creating diverse training teammates, in: IJCAI Workshop on Ad Hoc Teamwork, 2022."},{"key":"10.3233\/AIC-220116_ref48","unstructured":"A.\u00a0Rahman, N.\u00a0H\u00f6pner, F.\u00a0Christianos and S.V.\u00a0Albrecht, Towards open ad hoc teamwork using graph-based policy learning, in: International Conference on Machine Learning (ICML), 2021."},{"key":"10.3233\/AIC-220116_ref50","unstructured":"L.\u00a0Sch\u00e4fer, F.\u00a0Christianos, J.P.\u00a0Hanna and S.V.\u00a0Albrecht, Decoupled reinforcement learning to stabilise intrinsically-motivated exploration, in: International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 
2022."},{"key":"10.3233\/AIC-220116_ref53","doi-asserted-by":"crossref","unstructured":"P.\u00a0Stone, G.A.\u00a0Kaminka, S.\u00a0Kraus and J.S.\u00a0Rosenschein, Ad hoc autonomous agent teams: Collaboration without pre-coordination, in: AAAI Conference on Artificial Intelligence, AAAI Press, Atlanta, GA, USA, 2010, pp.\u00a01504\u20131509.","DOI":"10.1609\/aaai.v24i1.7529"},{"key":"10.3233\/AIC-220116_ref55","unstructured":"P.S.\u00a0Thomas and E.\u00a0Brunskill, Data-efficient off-policy policy evaluation for reinforcement learning, in: Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016."},{"key":"10.3233\/AIC-220116_ref56","doi-asserted-by":"crossref","unstructured":"J.\u00a0Tobin, R.\u00a0Fong, A.\u00a0Ray, J.\u00a0Schneider, W.\u00a0Zaremba and P.\u00a0Abbeel, Domain randomization for transferring deep neural networks from simulation to the real world, in: 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2017, pp.\u00a023\u201330.","DOI":"10.1109\/IROS.2017.8202133"},{"key":"10.3233\/AIC-220116_ref57","unstructured":"G.\u00a0Vecchio, S.\u00a0Palazzo, D.C.\u00a0Guastella, I.\u00a0Carlucho, S.V.\u00a0Albrecht, G.\u00a0Muscato and C.\u00a0Spampinato, MIDGARD: A simulation platform for autonomous navigation in unstructured environments, in: ICRA Workshop on Releasing Robots into the Wild: Simulations, Benchmarks, and Deployment (ICRA), 2022."},{"key":"10.3233\/AIC-220116_ref60","unstructured":"R.\u00a0Zhong, J.P.\u00a0Hanna, L.\u00a0Sch\u00e4fer and S.V.\u00a0Albrecht, Robust on-policy data collection for data-efficient policy evaluation, in: NeurIPS Workshop on Offline Reinforcement Learning (OfflineRL), 2021."}],"container-title":["AI 
Communications"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/AIC-220116","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T18:28:04Z","timestamp":1777400884000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/AIC-220116"}},"subtitle":[],"editor":[{"given":"Stefano V.","family":"Albrecht","sequence":"additional","affiliation":[]},{"given":"Michael","family":"Wooldridge","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,9,20]]},"references-count":42,"journal-issue":{"issue":"4"},"URL":"https:\/\/doi.org\/10.3233\/aic-220116","relation":{},"ISSN":["1875-8452","0921-7126"],"issn-type":[{"value":"1875-8452","type":"electronic"},{"value":"0921-7126","type":"print"}],"subject":[],"published":{"date-parts":[[2022,9,20]]}}}