{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T16:43:38Z","timestamp":1780591418430,"version":"3.54.1"},"reference-count":148,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2022,5,19]],"date-time":"2022-05-19T00:00:00Z","timestamp":1652918400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,5,19]],"date-time":"2022-05-19T00:00:00Z","timestamp":1652918400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001778","name":"Deakin University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001778","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Appl Intell"],"published-print":{"date-parts":[[2023,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Reinforcement learning (RL) has emerged as an effective approach for building an intelligent system, which involves multiple self-operated agents to collectively accomplish a designated task. More importantly, there has been a renewed focus on RL since the introduction of deep learning that essentially makes RL feasible to operate in high-dimensional environments. However, there are many diversified research directions in the current literature, such as multi-agent and multi-objective learning, and human-machine interactions. Therefore, in this paper, we propose a comprehensive software architecture that not only plays a vital role in designing a connect-the-dots deep RL architecture but also provides a guideline to develop a realistic RL application in a short time span. By inheriting the proposed architecture, software managers can foresee any challenges when designing a deep RL-based system. 
As a result, they can expedite the design process and actively control every stage of software development, which is especially critical in agile development environments. For this reason, we design a deep RL-based framework that strictly ensures flexibility, robustness, and scalability. To enforce generalization, the proposed architecture also does not depend on a specific RL algorithm, a network configuration, the number of agents, or the type of agents.<\/jats:p>","DOI":"10.1007\/s10489-022-03550-z","type":"journal-article","created":{"date-parts":[[2022,5,19]],"date-time":"2022-05-19T03:37:26Z","timestamp":1652931446000},"page":"2967-2988","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Towards designing a generic and comprehensive deep reinforcement learning framework"],"prefix":"10.1007","volume":"53","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4052-5819","authenticated-orcid":false,"given":"Ngoc Duy","family":"Nguyen","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Thanh Thi","family":"Nguyen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Nhat Truong","family":"Pham","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hai","family":"Nguyen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Dang Tu","family":"Nguyen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Thanh Dang","family":"Nguyen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chee 
Peng","family":"Lim","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Michael","family":"Johnstone","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Asim","family":"Bhatti","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Douglas","family":"Creighton","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Saeid","family":"Nahavandi","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2022,5,19]]},"reference":[{"key":"3550_CR1","doi-asserted-by":"crossref","unstructured":"Sutton RS, Barto AG, et al. (1998) Introduction to reinforcement learning. MIT press Cambridge, vol 135","DOI":"10.1109\/TNN.1998.712192"},{"key":"3550_CR2","doi-asserted-by":"publisher","first-page":"27091","DOI":"10.1109\/ACCESS.2017.2777827","volume":"5","author":"ND Nguyen","year":"2017","unstructured":"Nguyen ND, Nguyen T, Nahavandi S (2017) System design perspective for human-level agents using deep reinforcement learning: A survey. IEEE Access 5:27091\u201327102","journal-title":"IEEE Access"},{"key":"3550_CR3","doi-asserted-by":"crossref","unstructured":"Mao H, Alizadeh M, Menache I, Kandula S (2016) Resource management with deep reinforcement learning. In: Proceedings of the 15th ACM workshop on hot topics in networks, pp 50\u201356","DOI":"10.1145\/3005745.3005750"},{"key":"3550_CR4","doi-asserted-by":"publisher","unstructured":"Nguyen TT, Reddi VJ (2021) Deep reinforcement learning for cyber security. IEEE Transactions on Neural Networks and Learning Systems, pp 1\u201317. 
https:\/\/doi.org\/10.1109\/TNNLS.2021.3121870","DOI":"10.1109\/TNNLS.2021.3121870"},{"issue":"3","key":"3550_CR5","doi-asserted-by":"publisher","first-page":"325","DOI":"10.1023\/A:1008937911390","volume":"8","author":"D Fox","year":"2000","unstructured":"Fox D, Burgard W, Kruppa H, Thrun S (2000) A probabilistic approach to collaborative multi-robot localization. Autonomous robots 8(3):325\u2013344","journal-title":"Autonomous robots"},{"key":"3550_CR6","doi-asserted-by":"publisher","first-page":"105201","DOI":"10.1016\/j.knosys.2019.105201","volume":"196","author":"X Wu","year":"2020","unstructured":"Wu X, Chen H, Chen C, Zhong M, Xie S, Guo Y, Fujita H (2020) The autonomous navigation and obstacle avoidance for usvs with anoa deep reinforcement learning method. Knowl-Based Syst 196:105201","journal-title":"Knowl-Based Syst"},{"issue":"3","key":"3550_CR7","doi-asserted-by":"publisher","first-page":"263","DOI":"10.1177\/0278364912472380","volume":"32","author":"K M\u00fclling","year":"2013","unstructured":"M\u00fclling K, Kober J, Kroemer O, Peters J (2013) Learning to select and generalize striking movements in robot table tennis. The International Journal of Robotics Research 32(3):263\u2013279","journal-title":"The International Journal of Robotics Research"},{"issue":"1","key":"3550_CR8","doi-asserted-by":"publisher","first-page":"124","DOI":"10.1109\/TRO.2018.2878318","volume":"35","author":"TG Thuruthel","year":"2018","unstructured":"Thuruthel TG, Falotico E, Renda F, Laschi C (2018) Model-based reinforcement learning for closed-loop dynamic control of soft robotic manipulators. 
IEEE Trans Robot 35(1):124\u2013134","journal-title":"IEEE Trans Robot"},{"key":"3550_CR9","doi-asserted-by":"publisher","first-page":"104500","DOI":"10.1016\/j.engappai.2021.104500","volume":"106","author":"J Li","year":"2021","unstructured":"Li J, Yu T, Zhang X (2021) Emergency fault affected wide-area automatic generation control via large-scale deep reinforcement learning. Eng Appl Artif Intell 106:104500","journal-title":"Eng Appl Artif Intell"},{"key":"3550_CR10","doi-asserted-by":"publisher","first-page":"117541","DOI":"10.1016\/j.apenergy.2021.117541","volume":"304","author":"J Li","year":"2021","unstructured":"Li J, Yu T, Yang B (2021) A data-driven output voltage control of solid oxide fuel cell using multi-agent deep reinforcement learning. Appl Energy 304:117541","journal-title":"Appl Energy"},{"key":"3550_CR11","doi-asserted-by":"publisher","first-page":"117900","DOI":"10.1016\/j.apenergy.2021.117900","volume":"306","author":"J Li","year":"2022","unstructured":"Li J, Yu T, Zhang X (2022) Coordinated load frequency control of multi-area integrated energy system using multi-agent deep reinforcement learning. Appl Energy 306:117900","journal-title":"Appl Energy"},{"key":"3550_CR12","doi-asserted-by":"publisher","first-page":"1267","DOI":"10.1016\/j.egyr.2021.02.043","volume":"7","author":"J Li","year":"2021","unstructured":"Li J, Yu T (2021) A new adaptive controller based on distributed deep reinforcement learning for pemfc air supply system. Energy Reports 7:1267\u20131279","journal-title":"Energy Reports"},{"key":"3550_CR13","doi-asserted-by":"crossref","unstructured":"Zheng G, Zhang F, Zheng Z, Xiang Y, Yuan NJ, Xie X, Li Z (2018) Drn: A deep reinforcement learning framework for news recommendation. 
In: Proceedings of the 2018 World Wide Web Conference, pp 167\u2013176","DOI":"10.1145\/3178876.3185994"},{"key":"3550_CR14","doi-asserted-by":"publisher","first-page":"142","DOI":"10.1016\/j.ins.2020.05.066","volume":"538","author":"X Wu","year":"2020","unstructured":"Wu X, Chen H, Wang J, Troiano L, Loia V, Fujita H (2020) Adaptive stock trading strategies with deep reinforcement learning methods. Inf Sci 538:142\u2013158","journal-title":"Inf Sci"},{"key":"3550_CR15","doi-asserted-by":"crossref","unstructured":"Jin J, Song C, Li H, Gai K, Wang J, Zhang W (2018) Real-time bidding with multi-agent reinforcement learning in display advertising. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp 2193\u20132201","DOI":"10.1145\/3269206.3272021"},{"key":"3550_CR16","doi-asserted-by":"crossref","unstructured":"Xu P, Yin Q, Zhang J, Huang K (2021) Deep reinforcement learning with part-aware exploration bonus in video games. IEEE Transactions on Games","DOI":"10.1109\/TG.2021.3134259"},{"issue":"4-5","key":"3550_CR17","doi-asserted-by":"publisher","first-page":"698","DOI":"10.1177\/0278364920987859","volume":"40","author":"J Ibarz","year":"2021","unstructured":"Ibarz J, Tan J, Finn C, Kalakrishnan M, Pastor P, Levine S (2021) How to train your robot with deep reinforcement learning: lessons we have learned. The International Journal of Robotics Research 40(4-5):698\u2013721","journal-title":"The International Journal of Robotics Research"},{"issue":"6443","key":"3550_CR18","doi-asserted-by":"publisher","first-page":"859","DOI":"10.1126\/science.aau6249","volume":"364","author":"M Jaderberg","year":"2019","unstructured":"Jaderberg M, Czarnecki WM, Dunning I, Marris L, Lever G, Castaneda AG, Beattie C, Rabinowitz NC, Morcos AS, Ruderman A et al (2019) Human-level performance in 3d multiplayer games with population-based reinforcement learning. 
Science 364(6443):859\u2013865","journal-title":"Science"},{"key":"3550_CR19","doi-asserted-by":"crossref","unstructured":"Bellman RE (2010) Dynamic programming. Princeton University Press","DOI":"10.1515\/9781400835386"},{"key":"3550_CR20","unstructured":"Fowler M (2004) Uml distilled: a brief guide to the standard object modeling language. Addison-Wesley Professional"},{"key":"3550_CR21","unstructured":"Ross TJ (2005) Fuzzy logic with engineering applications. John Wiley & Sons"},{"issue":"4","key":"3550_CR22","doi-asserted-by":"publisher","first-page":"355","DOI":"10.1109\/TCIAIG.2013.2294713","volume":"6","author":"M Hausknecht","year":"2014","unstructured":"Hausknecht M, Lehman J, Miikkulainen R, Stone P (2014) A neuroevolution approach to general atari game playing. IEEE Transactions on Computational Intelligence and AI in Games 6(4):355\u2013366","journal-title":"IEEE Transactions on Computational Intelligence and AI in Games"},{"key":"3550_CR23","unstructured":"Bertsekas DP (1995) Dynamic programming and optimal control. Athena Scientific"},{"key":"3550_CR24","first-page":"2899","volume":"10","author":"J Duchi","year":"2009","unstructured":"Duchi J, Singer Y (2009) Efficient online and batch learning using forward backward splitting. The Journal of Machine Learning Research 10:2899\u20132934","journal-title":"The Journal of Machine Learning Research"},{"issue":"2","key":"3550_CR25","doi-asserted-by":"publisher","first-page":"201","DOI":"10.1109\/TSMCC.2011.2106494","volume":"42","author":"S Adam","year":"2011","unstructured":"Adam S, Busoniu L, Babuska R (2011) Experience replay for real-time reinforcement learning control. 
IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42(2):201\u2013212","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)"},{"issue":"7540","key":"3550_CR26","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. nature 518(7540):529\u2013533","journal-title":"nature"},{"key":"3550_CR27","first-page":"1097","volume":"25","author":"A Krizhevsky","year":"2012","unstructured":"Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25:1097\u20131105","journal-title":"Advances in neural information processing systems"},{"issue":"7587","key":"3550_CR28","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1038\/nature16961","volume":"529","author":"D Silver","year":"2016","unstructured":"Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of go with deep neural networks and tree search. nature 529(7587):484\u2013489","journal-title":"nature"},{"issue":"1","key":"3550_CR29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TCIAIG.2012.2186810","volume":"4","author":"CB Browne","year":"2012","unstructured":"Browne CB, Powley E, Whitehouse D, Lucas SM, Cowling PI, Rohlfshagen P, Tavener S, Perez D, Samothrakis S, Colton S (2012) A survey of monte carlo tree search methods. 
IEEE Transactions on Computational Intelligence and AI in games 4(1):1\u201343","journal-title":"IEEE Transactions on Computational Intelligence and AI in games"},{"issue":"2","key":"3550_CR30","doi-asserted-by":"publisher","first-page":"215","DOI":"10.1162\/neco.1994.6.2.215","volume":"6","author":"G Tesauro","year":"1994","unstructured":"Tesauro G (1994) Td-gammon, a self-teaching backgammon program, achieves master-level play. Neural computation 6(2):215\u2013219","journal-title":"Neural computation"},{"issue":"19","key":"3550_CR31","doi-asserted-by":"publisher","first-page":"70","DOI":"10.2352\/ISSN.2470-1173.2017.19.AVM-023","volume":"2017","author":"AhmadEL Sallab","year":"2017","unstructured":"Sallab Ahmad EL, Abdou M, Perot E, Yogamani S (2017) Deep reinforcement learning framework for autonomous driving. Electronic Imaging 2017(19):70\u201376","journal-title":"Electronic Imaging"},{"key":"3550_CR32","unstructured":"Shalev-Shwartz S, Shammah S, Shashua A (2016) Safe, multi-agent, reinforcement learning for autonomous driving. arXiv:1610.03295"},{"key":"3550_CR33","doi-asserted-by":"crossref","unstructured":"Ng AY, Coates A, Diel M, Ganapathi V, Schulte J, Tse B, Berger E, Liang E (2006) Autonomous inverted helicopter flight via reinforcement learning. In: Experimental robotics IX. Springer, pp 363\u2013372","DOI":"10.1007\/11552246_35"},{"key":"3550_CR34","unstructured":"Nazari M, Oroojlooy A, Tak\u00e1\u010d M, Snyder LV (2018) Reinforcement learning for solving the vehicle routing problem. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS\u201918. Curran Associates Inc., Red Hook, NY, USA, pp 9861\u20139871"},{"key":"3550_CR35","unstructured":"Bello I, Pham H, Le QV, Norouzi M, Bengio S (2017) Neural combinatorial optimization with reinforcement learning. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Workshop Track Proceedings. 
https:\/\/openreview.net\/forum?id=Bk9mxlSFx. OpenReview.net"},{"issue":"3","key":"3550_CR36","doi-asserted-by":"publisher","first-page":"387","DOI":"10.1007\/s10458-005-2631-2","volume":"11","author":"L Panait","year":"2005","unstructured":"Panait L, Luke S (2005) Cooperative multi-agent learning: The state of the art. Autonomous agents and multi-agent systems 11(3):387\u2013434","journal-title":"Autonomous agents and multi-agent systems"},{"key":"3550_CR37","unstructured":"Leibo JZ, Zambaldi V, Lanctot M, Marecki J, Graepel T (2017) Multi-agent reinforcement learning in sequential social dilemmas. In: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pp 464\u2013473"},{"key":"3550_CR38","first-page":"1603","volume":"15","author":"X Wang","year":"2002","unstructured":"Wang X, Sandholm T (2002) Reinforcement learning to play an optimal nash equilibrium in team markov games. Advances in neural information processing systems 15:1603\u20131610","journal-title":"Advances in neural information processing systems"},{"issue":"7-9","key":"3550_CR39","doi-asserted-by":"publisher","first-page":"1180","DOI":"10.1016\/j.neucom.2007.11.026","volume":"71","author":"J Peters","year":"2008","unstructured":"Peters J, Schaal S (2008) Natural actor-critic. Neurocomputing 71(7-9):1180\u20131190","journal-title":"Neurocomputing"},{"key":"3550_CR40","unstructured":"He H, Boyd-Graber J, Kwok K, Daum\u00e9 III H (2016) Opponent modeling in deep reinforcement learning. In: International conference on machine learning, PMLR, pp 1804\u20131813"},{"key":"3550_CR41","unstructured":"Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Advances in neural information processing systems, vol 27"},{"key":"3550_CR42","unstructured":"Palmer G, Tuyls K, Bloembergen D, Savani R (2018) Lenient multi-agent deep reinforcement learning. 
In: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, pp 443\u2013451"},{"issue":"3","key":"3550_CR43","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1609\/aimag.v33i3.2426","volume":"33","author":"K Tuyls","year":"2012","unstructured":"Tuyls K, Weiss G (2012) Multiagent learning: Basics, challenges, and prospects. Ai Magazine 33(3):41\u201341","journal-title":"Ai Magazine"},{"key":"3550_CR44","doi-asserted-by":"crossref","unstructured":"Natarajan S, Tadepalli P (2005) Dynamic preferences in multi-criteria reinforcement learning. In: Proceedings of the 22nd international conference on Machine learning, pp 601\u2013608","DOI":"10.1145\/1102351.1102427"},{"key":"3550_CR45","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1613\/jair.3987","volume":"48","author":"DM Roijers","year":"2013","unstructured":"Roijers DM, Vamplew P, Whiteson S, Dazeley R (2013) A survey of multi-objective sequential decision-making. J Artif Intell Res 48:67\u2013113","journal-title":"J Artif Intell Res"},{"issue":"1","key":"3550_CR46","first-page":"3483","volume":"15","author":"K Van Moffaert","year":"2014","unstructured":"Van Moffaert K, Now\u00e9 A (2014) Multi-objective reinforcement learning using sets of pareto dominating policies. The Journal of Machine Learning Research 15(1):3483\u20133512","journal-title":"The Journal of Machine Learning Research"},{"key":"3550_CR47","doi-asserted-by":"crossref","unstructured":"Barrett L, Narayanan S (2008) Learning all optimal policies with multiple criteria. 
In: Proceedings of the 25th international conference on Machine learning, pp 41\u201347","DOI":"10.1145\/1390156.1390162"},{"issue":"1","key":"3550_CR48","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1007\/s10994-010-5232-5","volume":"84","author":"P Vamplew","year":"2011","unstructured":"Vamplew P, Dazeley R, Berry A, Issabekov R, Dekker E (2011) Empirical evaluation methods for multiobjective reinforcement learning algorithms. Machine learning 84(1):51\u201380","journal-title":"Machine learning"},{"key":"3550_CR49","unstructured":"van Seijen H, Fatemi M, Romoff J, Laroche R, Barnes T, Tsang J (2017) Hybrid reward architecture for reinforcement learning. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp 5398\u20135408"},{"key":"3550_CR50","doi-asserted-by":"publisher","first-page":"103915","DOI":"10.1016\/j.engappai.2020.103915","volume":"96","author":"TT Nguyen","year":"2020","unstructured":"Nguyen TT, Nguyen ND, Vamplew P, Nahavandi S, Dazeley R, Lim CP (2020) A multi-objective deep reinforcement learning framework. Eng Appl Artif Intell 96:103915","journal-title":"Eng Appl Artif Intell"},{"key":"3550_CR51","unstructured":"Amodei D, Olah C, Steinhardt J, Christiano P, Schulman J, Man\u00e9 D (2016) Concrete problems in ai safety. arXiv:1606.06565"},{"key":"3550_CR52","unstructured":"Christiano PF, Leike J, Brown TB, Martic M, Legg S, Amodei D (2017) Deep reinforcement learning from human preferences. In: Proceedings of the 31st international conference on neural information processing systems, pp 4302\u20134310"},{"key":"3550_CR53","doi-asserted-by":"publisher","first-page":"58","DOI":"10.1016\/j.neucom.2019.05.062","volume":"359","author":"ND Nguyen","year":"2019","unstructured":"Nguyen ND, Nguyen T, Nahavandi S (2019) Multi-agent behavioral control system using deep reinforcement learning. 
Neurocomputing 359:58\u201368","journal-title":"Neurocomputing"},{"key":"3550_CR54","unstructured":"Nguyen ND, Nguyen TT (2020) Fruit-api. GitHub. https:\/\/github.com\/garlicdevs\/Fruit-API"},{"key":"3550_CR55","unstructured":"Castro PS, Moitra S, Gelada C, Kumar S, Bellemare MG (2018) Dopamine: A research framework for deep reinforcement learning. arXiv:1812.06110"},{"key":"3550_CR56","unstructured":"Liang E, Liaw R, Nishihara R, Moritz P, Fox R, Gonzalez J, Goldberg K, Stoica I (2017) Ray rllib: A composable and scalable reinforcement learning library. arXiv:1712.09381, p 85"},{"key":"3550_CR57","unstructured":"Dhariwal P, Hesse C, Klimov O, Nichol A, Plappert M, Radford A, Schulman J, Sidor S, Wu Y, Zhokhov P (2017) Openai baselines. GitHub. https:\/\/github.com\/openai\/baselines"},{"key":"3550_CR58","unstructured":"Tokui S, Oono K, Hido S, Clayton J (2015) Chainer: a next-generation open source framework for deep learning. In: Proceedings of workshop on machine learning systems (LearningSys) in the twenty-ninth annual conference on neural information processing systems (NIPS), vol 5, pp 1\u20136"},{"key":"3550_CR59","unstructured":"Sorokin I, Seleznev A, Pavlov M, Fedorov A, Ignateva A (2015) Deep attention recurrent q-network. arXiv:1512.01693"},{"key":"3550_CR60","unstructured":"Miyoshi K, Agarwal A, Toghiani-Rizi B (2017) Unreal. GitHub. https:\/\/github.com\/miyosuda\/unreal"},{"key":"3550_CR61","unstructured":"Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. In: International conference on machine learning, PMLR, pp 1587\u20131596"},{"key":"3550_CR62","unstructured":"Haarnoja T, Zhou A, Abbeel P, Levine S (2018) Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. 
In: International conference on machine learning, PMLR, pp 1861\u20131870"},{"key":"3550_CR63","unstructured":"Haarnoja T, Zhou A, Hartikainen K, Tucker G, Ha S, Tan J, Kumar V, Zhu H, Gupta A, Abbeel P et al (2018) Soft actor-critic algorithms and applications. arXiv:1812.05905"},{"key":"3550_CR64","doi-asserted-by":"crossref","unstructured":"Van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double q-learning. In: Proceedings of the AAAI conference on artificial intelligence, vol 30","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"3550_CR65","unstructured":"Wang Z, Schaul T, Hessel M, Hasselt H, Lanctot M, Freitas N (2016) Dueling network architectures for deep reinforcement learning. In: International conference on machine learning, PMLR, pp 1995\u20132003"},{"key":"3550_CR66","unstructured":"Schaul T, Quan J, Antonoglou I, Silver D (2016) Prioritized experience replay. In: Bengio Y, LeCun Y (eds) 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings. arXiv:1511.05952"},{"key":"3550_CR67","unstructured":"Hausknecht M, Stone P (2015) Deep recurrent q-learning for partially observable mdps. In: 2015 aaai fall symposium series"},{"key":"3550_CR68","doi-asserted-by":"crossref","unstructured":"Hessel M, Modayil J, Van Hasselt H, Schaul T, Ostrovski G, Dabney W, Horgan D, Piot B, Azar M, Silver D (2018) Rainbow: Combining improvements in deep reinforcement learning. In: Thirty-second AAAI conference on artificial intelligence","DOI":"10.1609\/aaai.v32i1.11796"},{"key":"3550_CR69","unstructured":"Mnih V, Badia AP, Mirza M, Graves A, Lillicrap T, Harley T, Silver D, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning. 
In: International conference on machine learning, PMLR, pp 1928\u20131937"},{"key":"3550_CR70","unstructured":"Jaderberg M, Mnih V, Czarnecki WM, Schaul T, Leibo JZ, Silver D, Kavukcuoglu K (2017) Reinforcement learning with unsupervised auxiliary tasks. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. https:\/\/openreview.net\/forum?id=SJ6yPD5xg. OpenReview.net"},{"key":"3550_CR71","unstructured":"Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M (2014) Deterministic policy gradient algorithms. In: International conference on machine learning, PMLR, pp 387\u2013395"},{"key":"3550_CR72","unstructured":"Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2016) Continuous control with deep reinforcement learning. In: Bengio Y, LeCun Y (eds) 4th International conference on learning representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings. arXiv:1509.02971"},{"key":"3550_CR73","unstructured":"Schulman J, Levine S, Abbeel P, Jordan M, Moritz P (2015) Trust region policy optimization. In: International conference on machine learning, PMLR, pp 1889\u20131897"},{"issue":"7","key":"3550_CR74","doi-asserted-by":"publisher","first-page":"3797","DOI":"10.1109\/TIT.2014.2320500","volume":"60","author":"T Van Erven","year":"2014","unstructured":"Van Erven T, Harremos P (2014) R\u00e9nyi divergence and kullback-leibler divergence. IEEE Trans Inf Theory 60(7):3797\u20133820","journal-title":"IEEE Trans Inf Theory"},{"key":"3550_CR75","first-page":"5279","volume":"30","author":"Y Wu","year":"2017","unstructured":"Wu Y, Mansimov E, Grosse RB, Liao S, Ba J (2017) Scalable trust-region method for deep reinforcement learning using kronecker-factored approximation. 
Advances in neural information processing systems 30:5279\u20135288","journal-title":"Advances in neural information processing systems"},{"key":"3550_CR76","unstructured":"Wang Z, Bapst V, Heess N, Mnih V, Munos R, Kavukcuoglu K, de Freitas N (2017) Sample efficient actor-critic with experience replay. In: 5th International conference on learning representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. https:\/\/openreview.net\/forum?id=HyM25Mqel. OpenReview.net"},{"key":"3550_CR77","unstructured":"Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv:1707.06347"},{"key":"3550_CR78","unstructured":"Nachum O, Norouzi M, Xu K, Schuurmans D (2017) Bridging the gap between value and policy based reinforcement learning. In: Proceedings of the 31st international conference on neural information processing systems, pp 2772\u20132782"},{"key":"3550_CR79","unstructured":"O\u2019Donoghue B, Munos R, Kavukcuoglu K, Mnih V (2017) Combining policy gradient and q-learning. In: 5th International conference on learning representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. https:\/\/openreview.net\/forum?id=B1kJ6H9ex. OpenReview.net"},{"key":"3550_CR80","unstructured":"Schulman J, Chen X, Abbeel P (2017) Equivalence between policy gradients and soft q-learning. arXiv:1704.06440"},{"key":"3550_CR81","unstructured":"Gruslys A, Dabney W, Azar MG, Piot B, Bellemare MG, Munos R (2018) The reactor: A fast and sample-efficient actor-critic agent for reinforcement learning. In: 6th International conference on learning representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. https:\/\/openreview.net\/forum?id=rkHVZWZAZ. 
OpenReview.net"},{"key":"3550_CR82","unstructured":"Gu S, Lillicrap T, Ghahramani Z, Turner RE, Sch\u00f6lkopf B, Levine S (2017) Interpolated policy gradient: merging on-policy and off-policy gradient estimation for deep reinforcement learning. In: Proceedings of the 31st international conference on neural information processing systems, pp 3849\u20133858"},{"key":"3550_CR83","unstructured":"Barth-Maron G, Hoffman MW, Budden D, Dabney W, Horgan D, TB D, Muldal A, Heess N, Lillicrap TP (2018) Distributed distributional deterministic policy gradients. In: 6th International conference on learning representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. https:\/\/openreview.net\/forum?id=SyZipzbCb. OpenReview.net"},{"key":"3550_CR84","unstructured":"Espeholt L, Marinier R, Stanczyk P, Wang K, Michalski M (2020) SEED RL: scalable and efficient deep-rl with accelerated central inference. In: 8th International conference on learning representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. https:\/\/openreview.net\/forum?id=rkgvXlrKwH. OpenReview.net"},{"key":"3550_CR85","unstructured":"Schwarzer M, Anand A, Goel R, Hjelm RD, Courville AC, Bachman P (2021) Data-efficient reinforcement learning with self-predictive representations. In: 9th International conference on learning representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. https:\/\/openreview.net\/forum?id=uCQfPZwRaUu. OpenReview.net"},{"issue":"4","key":"3550_CR86","doi-asserted-by":"publisher","first-page":"e0172395","DOI":"10.1371\/journal.pone.0172395","volume":"12","author":"A Tampuu","year":"2017","unstructured":"Tampuu A, Matiisen T, Kodelja D, Kuzovkin I, Korjus K, Aru J, Aru J, Vicente R (2017) Multiagent cooperation and competition with deep reinforcement learning. 
PloS one 12(4):e0172395","journal-title":"PloS one"},{"key":"3550_CR87","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1016\/j.neucom.2016.01.031","volume":"190","author":"L Kraemer","year":"2016","unstructured":"Kraemer L, Banerjee B (2016) Multi-agent reinforcement learning as a rehearsal for decentralized planning. Neurocomputing 190:82\u201394","journal-title":"Neurocomputing"},{"key":"3550_CR88","unstructured":"Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Proceedings of the 31st international conference on neural information processing systems, pp 6382\u20136393"},{"key":"3550_CR89","unstructured":"Foerster JN, Assael YM, de Freitas N, Whiteson S (2016) Learning to communicate with deep multi-agent reinforcement learning. In: Proceedings of the 30th international conference on neural information processing systems, pp 2145\u20132153"},{"key":"3550_CR90","first-page":"2244","volume":"29","author":"S Sukhbaatar","year":"2016","unstructured":"Sukhbaatar S, Fergus R, et al. (2016) Learning multiagent communication with backpropagation. Advances in neural information processing systems 29:2244\u20132252","journal-title":"Advances in neural information processing systems"},{"key":"3550_CR91","doi-asserted-by":"crossref","unstructured":"Gupta JK, Egorov M, Kochenderfer M (2017) Cooperative multi-agent control using deep reinforcement learning. In: International conference on autonomous agents and multiagent systems, Springer, pp 66\u201383","DOI":"10.1007\/978-3-319-71682-4_5"},{"issue":"9","key":"3550_CR92","doi-asserted-by":"publisher","first-page":"3826","DOI":"10.1109\/TCYB.2020.2977374","volume":"50","author":"TT Nguyen","year":"2020","unstructured":"Nguyen TT, Nguyen ND, Nahavandi S (2020) Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications. 
IEEE transactions on cybernetics 50(9):3826\u20133839","journal-title":"IEEE transactions on cybernetics"},{"key":"3550_CR93","unstructured":"Egorov M (2016) Multi-agent deep reinforcement learning. CS231n: convolutional neural networks for visual recognition, pp 1\u20138"},{"key":"3550_CR94","unstructured":"Shu T, Tian Y (2019) M\u02c63rl: Mind-aware multi-agent management reinforcement learning. In: 7th International conference on learning representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. https:\/\/openreview.net\/forum?id=BkzeUiRcY7. OpenReview.net"},{"key":"3550_CR95","unstructured":"Yang J, Nakhaei A, Isele D, Fujimura K, Zha H (2020) CM3: cooperative multi-goal multi-stage multi-agent reinforcement learning. In: 8th International conference on learning representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. https:\/\/openreview.net\/forum?id=S1lEX04tPr. OpenReview.net"},{"key":"3550_CR96","unstructured":"Long Q, Zhou Z, Gupta A, Fang F, Wu Y, Wang X (2020) Evolutionary population curriculum for scaling multi-agent reinforcement learning. In: 8th International conference on learning representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. https:\/\/openreview.net\/forum?id=SJxbHkrKDH. OpenReview.net"},{"key":"3550_CR97","unstructured":"Kim D, Moon S, Hostallero D, Kang WJ, Lee T, Son K, Yi Y (2019) Learning to schedule communication in multi-agent reinforcement learning. In: 7th International conference on learning representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. https:\/\/openreview.net\/forum?id=SJxu5iR9KQ. OpenReview.net"},{"key":"3550_CR98","first-page":"9927","volume":"32","author":"C Schroeder de Witt","year":"2019","unstructured":"Schroeder de Witt C, Foerster J, Farquhar G, Torr P, Boehmer W, Whiteson S (2019) Multi-agent common knowledge reinforcement learning. 
Advances in Neural Information Processing Systems 32:9927\u20139939","journal-title":"Advances in Neural Information Processing Systems"},{"key":"3550_CR99","unstructured":"Christianos F, Scha\u0308fer L, Albrecht SV (2020) Shared experience actor-critic for multi-agent reinforcement learning. In: Larochelle H, Ranzato M, Hadsell R, Balcan M-F, Lin H-T (eds) Advances in neural information processing systems 33: annual conference on neural information processing systems 2020, NeurIPS 2020, December 6-12, 2020, virtual. https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/7967cc8e3ab559e68cc944c44b1cf3e8-Abstract.html"},{"key":"3550_CR100","unstructured":"Wang J, Ren Z, Liu T, Yu Y, Zhang C (2021) QPLEX: duplex dueling multi-agent q-learning. In: 9th International conference on learning representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. https:\/\/openreview.net\/forum?id=Rcmk0xxIQV. OpenReview.net"},{"key":"3550_CR101","unstructured":"Wang J, Kurth-Nelson Z, Soyer H, Leibo JZ, Tirumala D, Munos R, Blundell C, Kumaran D, Botvinick MM (2017) Learning to reinforcement learn. In: Gunzelmann G, Howes A, Tenbrink T, Davelaar E J (eds) Proceedings of the 39th annual meeting of the cognitive science society, CogSci 2017, London, UK, 16-29 July 2017. https:\/\/mindmodeling.org\/cogsci2017\/papers\/0252\/index.html. cognitivesciencesociety.org"},{"key":"3550_CR102","unstructured":"Agarwal R, Liang C, Schuurmans D, Norouzi M (2019) Learning to generalize from sparse and underspecified rewards. In: International conference on machine learning, PMLR, pp 130\u2013140"},{"key":"3550_CR103","unstructured":"Rakelly K, Zhou A, Finn C, Levine S, Quillen D (2019) Efficient off-policy meta-reinforcement learning via probabilistic context variables. 
In: International conference on machine learning, PMLR, pp 5331\u20135340"},{"key":"3550_CR104","unstructured":"Liu EZ, Raghunathan A, Liang P, Finn C (2021) Decoupling exploration and exploitation for meta-reinforcement learning without sacrifices. In: International conference on machine learning, PMLR, pp 6925\u20136935"},{"key":"3550_CR105","unstructured":"Zintgraf LM, Feng L, Lu C, Igl M, Hartikainen K, Hofmann K, Whiteson S (2021) Exploration in approximate hyper-state space for meta reinforcement learning. In: International conference on machine learning, PMLR, pp 12991\u201313001"},{"key":"3550_CR106","unstructured":"Zintgraf L, Devlin S, Ciosek K, Whiteson S, Hofmann K (2021) Deep interactive bayesian reinforcement learning via meta-learning. In: Proceedings of the 20th international conference on autonomous agents and multiagent systems, pp 1712\u20131714"},{"key":"3550_CR107","first-page":"5302","volume":"31","author":"A Gupta","year":"2018","unstructured":"Gupta A, Mendonca R, Liu Y, Abbeel P, Levine S (2018) Meta-reinforcement learning of structured exploration strategies. Advances in Neural Information Processing Systems 31:5302\u20135311","journal-title":"Advances in Neural Information Processing Systems"},{"key":"3550_CR108","unstructured":"Lin Z, Thomas G, Yang G, Ma T (2020) Model-based adversarial meta-reinforcement learning. In: Larochelle H, Ranzato M, Hadsell R, Balcan M-F, Lin H-T (eds) Advances in neural information processing systems 33: Annual conference on neural information processing systems 2020, NeurIPS 2020, December 6-12, 2020, virtual. https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/73634c1dcbe056c1f7dcf5969da406c8-Abstract.html"},{"issue":"2","key":"3550_CR109","doi-asserted-by":"publisher","first-page":"219","DOI":"10.1162\/neco.1995.7.2.219","volume":"7","author":"F Girosi","year":"1995","unstructured":"Girosi F, Jones M, Poggio T (1995) Regularization theory and neural networks architectures. 
Neural computation 7(2):219\u2013269","journal-title":"Neural computation"},{"key":"3550_CR110","unstructured":"Goodfellow IJ, Mirza M, Da X, Courville AC, Bengio Y (2014) An empirical investigation of catastrophic forgetting in gradient-based neural networks. In: Bengio Y, LeCun Y (eds) 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings. arXiv:1312.6211"},{"key":"3550_CR111","doi-asserted-by":"crossref","unstructured":"Thrun S, Pratt L (1998) Learning to learn: Introduction and overview. In: Learning to learn. Springer, pp 3\u201317","DOI":"10.1007\/978-1-4615-5529-2_1"},{"key":"3550_CR112","unstructured":"Rusu AA, Rabinowitz NC, Desjardins G, Soyer H, Kirkpatrick J, Kavukcuoglu K, Pascanu R, Hadsell R (2016) Progressive neural networks. arXiv:1606.04671"},{"issue":"13","key":"3550_CR113","doi-asserted-by":"publisher","first-page":"3521","DOI":"10.1073\/pnas.1611835114","volume":"114","author":"J Kirkpatrick","year":"2017","unstructured":"Kirkpatrick J, Pascanu R, Rabinowitz N, Veness J, Desjardins G, Rusu AA, Milan K, Quan J, Ramalho T, Grabska-Barwinska A et al (2017) Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences 114(13):3521\u20133526","journal-title":"Proceedings of the national academy of sciences"},{"key":"3550_CR114","unstructured":"Fernando C, Banarse D, Blundell C, Zwols Y, Ha D, Rusu AA, Pritzel A, Wierstra D (2017) Pathnet: Evolution channels gradient descent in super neural networks. arXiv:1701.08734"},{"key":"3550_CR115","unstructured":"Rusu AA, Colmenarejo SG, G\u00fcl\u00e7ehre C, Desjardins G, Kirkpatrick J, Pascanu R, Mnih V, Kavukcuoglu K, Hadsell R (2016) Policy distillation. In: Bengio Y, LeCun Y (eds) 4th International conference on learning representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings. 
arXiv:1511.06295"},{"key":"3550_CR116","doi-asserted-by":"crossref","unstructured":"Yin H, Pan SJ (2017) Knowledge transfer for deep reinforcement learning with hierarchical experience replay. In: Thirty-first AAAI conference on artificial intelligence","DOI":"10.1609\/aaai.v31i1.10733"},{"key":"3550_CR117","unstructured":"Parisotto E, Ba LJ, Salakhutdinov R (2016) Actor-mimic: Deep multitask and transfer reinforcement learning. In: Bengio Y, LeCun Y (eds) 4th International conference on learning representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings. arXiv:1511.06342"},{"key":"3550_CR118","unstructured":"Wulfmeier M, Posner I, Abbeel P (2017) Mutual alignment transfer learning. In: Conference on robot learning, PMLR, pp 281\u2013290"},{"issue":"4","key":"3550_CR119","doi-asserted-by":"publisher","first-page":"541","DOI":"10.1016\/j.neunet.2010.01.001","volume":"23","author":"M Grze\u015b","year":"2010","unstructured":"Grze\u015b M, Kudenko D (2010) Online learning of shaping rewards in reinforcement learning. Neural Netw 23(4):541\u2013550","journal-title":"Neural Netw"},{"issue":"1","key":"3550_CR120","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1023\/A:1022140919877","volume":"13","author":"AG Barto","year":"2003","unstructured":"Barto AG, Mahadevan S (2003) Recent advances in hierarchical reinforcement learning. Discrete event dynamic systems 13(1):41\u201377","journal-title":"Discrete event dynamic systems"},{"key":"3550_CR121","first-page":"3675","volume":"29","author":"TD Kulkarni","year":"2016","unstructured":"Kulkarni TD, Narasimhan K, Saeedi A, Tenenbaum J (2016) Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. 
Advances in neural information processing systems 29:3675\u20133683","journal-title":"Advances in neural information processing systems"},{"key":"3550_CR122","unstructured":"Burda Y, Edwards H, Pathak D, Storkey AJ, Darrell T, Efros AA (2019) Large-scale study of curiosity-driven learning. In: 7th International conference on learning representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. https:\/\/openreview.net\/forum?id=rJNwDjAqYX. OpenReview.net"},{"key":"3550_CR123","doi-asserted-by":"crossref","unstructured":"Pathak D, Agrawal P, Efros AA, Darrell T (2017) Curiosity-driven exploration by self-supervised prediction. In: International conference on machine learning, PMLR, pp 2778\u20132787","DOI":"10.1109\/CVPRW.2017.70"},{"key":"3550_CR124","unstructured":"Ostrovski G, Bellemare MG, Oord A, Munos R (2017) Count-based exploration with neural density models. In: International conference on machine learning, PMLR, pp 2721\u20132730"},{"key":"3550_CR125","unstructured":"Andrychowicz M, Wolski F, Ray A, Schneider J, Fong R, Welinder P, McGrew B, Tobin J, Abbeel P, Zaremba W (2017) Hindsight experience replay. In: Proceedings of the 31st international conference on neural information processing systems, pp 5055\u20135065"},{"key":"3550_CR126","doi-asserted-by":"crossref","unstructured":"Bengio Y, Louradour J, Collobert R, Weston J (2009) Curriculum learning. In: Proceedings of the 26th annual international conference on machine learning, pp 41\u201348","DOI":"10.1145\/1553374.1553380"},{"key":"3550_CR127","first-page":"7299","volume":"31","author":"A Santoro","year":"2018","unstructured":"Santoro A, Faulkner R, Raposo D, Rae J, Chrzanowski M, Weber T, Wierstra D, Vinyals O, Pascanu R, Lillicrap T (2018) Relational recurrent neural networks. 
Advances in Neural Information Processing Systems 31:7299\u20137310","journal-title":"Advances in Neural Information Processing Systems"},{"key":"3550_CR128","unstructured":"Parisotto E, Salakhutdinov R (2018) Neural map: Structured memory for deep reinforcement learning. In: 6th International conference on learning representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. https:\/\/openreview.net\/forum?id=Bk9zbyZCZ. OpenReview.net"},{"key":"3550_CR129","unstructured":"Horgan D, Quan J, Budden D, Barth-Maron G, Hessel M, van Hasselt H, Silver D (2018) Distributed prioritized experience replay. In: 6th International conference on learning representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. https:\/\/openreview.net\/forum?id=H1Dy---0Z. OpenReview.net"},{"key":"3550_CR130","unstructured":"Stooke A, Abbeel P (2018) Accelerated methods for deep reinforcement learning. arXiv:1803.02811"},{"key":"3550_CR131","unstructured":"Liang E, Liaw R, Nishihara R, Moritz P, Fox R, Goldberg K, Gonzalez J, Jordan M, Stoica I (2018) Rllib: Abstractions for distributed reinforcement learning. In: International conference on machine learning, PMLR, pp 3053\u20133062"},{"key":"3550_CR132","first-page":"4565","volume":"29","author":"J Ho","year":"2016","unstructured":"Ho J, Ermon S (2016) Generative adversarial imitation learning. Advances in neural information processing systems 29:4565\u20134573","journal-title":"Advances in neural information processing systems"},{"key":"3550_CR133","doi-asserted-by":"publisher","first-page":"253","DOI":"10.1613\/jair.3912","volume":"47","author":"MG Bellemare","year":"2013","unstructured":"Bellemare MG, Naddaf Y, Veness J, Bowling M (2013) The arcade learning environment: An evaluation platform for general agents. 
J Artif Intell Res 47:253\u2013279","journal-title":"J Artif Intell Res"},{"key":"3550_CR134","unstructured":"Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, Zaremba W (2016) Openai gym. arXiv:1606.01540"},{"key":"3550_CR135","doi-asserted-by":"crossref","unstructured":"Todorov E, Erez T, Tassa Y (2012) Mujoco: A physics engine for model-based control. In: 2012 IEEE\/RSJ international conference on intelligent robots and systems, IEEE, pp 5026\u20135033","DOI":"10.1109\/IROS.2012.6386109"},{"key":"3550_CR136","unstructured":"Plappert M (2016) keras-rl. GitHub. https:\/\/github.com\/keras-rl\/keras-rl"},{"key":"3550_CR137","unstructured":"Kuhnle A, Schaarschmidt M, Fricke K (2017) Tensorforce: a tensorflow library for applied reinforcement learning. Web page. https:\/\/github.com\/tensorforce\/tensorforce"},{"key":"3550_CR138","unstructured":"Hill A, Raffin A, Ernestus M, Gleave A, Kanervisto A, Traore R, Dhariwal P, Hesse C, Klimov O, Nichol A, Plappert M, Radford A, Schulman J, Sidor S, Wu Y (2018) Stable baselines. GitHub. https:\/\/github.com\/hill-a\/stable-baselines"},{"issue":"268","key":"3550_CR139","first-page":"1","volume":"22","author":"A Raffin","year":"2021","unstructured":"Raffin A, Hill A, Gleave A, Kanervisto A, Ernestus M, Dormann N (2021) Stable-baselines3: Reliable reinforcement learning implementations. J Mach Learn Res 22(268):1\u20138. http:\/\/jmlr.org\/papers\/v22\/20-1364.html","journal-title":"J Mach Learn Res"},{"key":"3550_CR140","unstructured":"Duan Y, Chen X, Houthooft R, Schulman J, Abbeel P (2016) Benchmarking deep reinforcement learning for continuous control. In: International conference on machine learning, PMLR, pp 1329\u20131338"},{"key":"3550_CR141","unstructured":"Terry JK, Black B, Jayakumar M, Hari A, Sullivan R, Santos L, Dieffendahl C, Williams NL, Lokesh Y, Horsch C et al (2020) Pettingzoo: Gym for multi-agent reinforcement learning. 
arXiv:2009.14471"},{"key":"3550_CR142","doi-asserted-by":"crossref","unstructured":"Zheng L, Yang J, Cai H, Zhou M, Zhang W, Wang J, Yu Y (2018) Magent: A many-agent reinforcement learning platform for artificial collective intelligence. In: Proceedings of the AAAI conference on artificial intelligence, vol 32","DOI":"10.1609\/aaai.v32i1.11371"},{"key":"3550_CR143","unstructured":"Hoffman M, Shahriari B, Aslanides J, Barth-Maron G, Behbahani F, Norman T, Abdolmaleki A, Cassirer A, Yang F, Baumli K et al (2020) Acme: A research framework for distributed reinforcement learning. arXiv:2006.00979"},{"key":"3550_CR144","unstructured":"Petrenko A, Wijmans E, Shacklett B, Koltun V (2021) Megaverse: Simulating embodied agents at one million experiences per second. In: International conference on machine learning, PMLR, pp 8556\u20138566"},{"key":"3550_CR145","unstructured":"Weng J, Chen H, Yan D, You K, Duburcq A, Zhang M, Su H, Zhu J (2021) Tianshou: A highly modularized deep reinforcement learning library. arXiv:2107.14171"},{"key":"3550_CR146","first-page":"8026","volume":"32","author":"A Paszke","year":"2019","unstructured":"Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L et al (2019) Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems 32:8026\u20138037","journal-title":"Advances in neural information processing systems"},{"key":"3550_CR147","doi-asserted-by":"crossref","unstructured":"Ellis B, Stylos J, Myers B (2007) The factory pattern in api design: A usability evaluation. In: 29th International conference on software engineering (ICSE\u201907), IEEE, pp 302\u2013312","DOI":"10.1109\/ICSE.2007.85"},{"key":"3550_CR148","unstructured":"Nguyen ND, Nguyen TT (2020) Fruitlab. 
https:\/\/fruitlab.org\/"}],"container-title":["Applied Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-022-03550-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10489-022-03550-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-022-03550-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,11]],"date-time":"2023-01-11T11:49:05Z","timestamp":1673437745000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10489-022-03550-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,19]]},"references-count":148,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,2]]}},"alternative-id":["3550"],"URL":"https:\/\/doi.org\/10.1007\/s10489-022-03550-z","relation":{},"ISSN":["0924-669X","1573-7497"],"issn-type":[{"value":"0924-669X","type":"print"},{"value":"1573-7497","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,19]]},"assertion":[{"value":"21 March 2022","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 May 2022","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Yes","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for Publication"}},{"value":"There is no conflict of interest.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of Interests"}}]}}