{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T18:55:48Z","timestamp":1771700148601,"version":"3.50.1"},"reference-count":93,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2022,8,12]],"date-time":"2022-08-12T00:00:00Z","timestamp":1660262400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Science and Technology Innovation 2030","award":["2018AAA0100901"],"award-info":[{"award-number":["2018AAA0100901"]}]},{"name":"Science and Technology Innovation 2030","award":["2020BD003"],"award-info":[{"award-number":["2020BD003"]}]},{"name":"PKU-Baidu Fund","award":["2018AAA0100901"],"award-info":[{"award-number":["2018AAA0100901"]}]},{"name":"PKU-Baidu Fund","award":["2020BD003"],"award-info":[{"award-number":["2020BD003"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Games have long been benchmarks and test-beds for AI algorithms. With the development of AI techniques and the boost of computational power, modern game AI systems have achieved superhuman performance in many games played by humans. These games have various features and present different challenges to AI research, so the algorithms used in each of these AI systems vary. This survey aims to give a systematic review of the techniques and paradigms used in modern game AI systems. By decomposing each of the recent milestones into basic components and comparing them based on the features of games, we summarize the common paradigms to build game AI systems and their scope and limitations. We claim that deep reinforcement learning is the most general methodology to become a mainstream method for games with higher complexity. We hope this survey can both provide a review of game AI algorithms and bring inspiration to the game AI community for future directions.<\/jats:p>","DOI":"10.3390\/a15080282","type":"journal-article","created":{"date-parts":[[2022,8,14]],"date-time":"2022-08-14T21:09:06Z","timestamp":1660511346000},"page":"282","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Techniques and Paradigms in Modern Game AI Systems"],"prefix":"10.3390","volume":"15","author":[{"given":"Yunlong","family":"Lu","sequence":"first","affiliation":[{"name":"School of Computer Science, Peking University, Beijing 100871, China"}]},{"given":"Wenxin","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer Science, Peking University, Beijing 100871, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,8,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Turing, A.M. (2009). Computing machinery and intelligence. Parsing the Turing Test, Springer.","DOI":"10.1007\/978-1-4020-6710-5_3"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1038\/nature16961","article-title":"Mastering the game of Go with deep neural networks and tree search","volume":"529","author":"Silver","year":"2016","journal-title":"Nature"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1038\/nature24270","article-title":"Mastering the game of go without human knowledge","volume":"550","author":"Silver","year":"2017","journal-title":"Nature"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1140","DOI":"10.1126\/science.aar6404","article-title":"A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play","volume":"362","author":"Silver","year":"2018","journal-title":"Science"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"508","DOI":"10.1126\/science.aam6960","article-title":"Deepstack: Expert-level artificial intelligence in heads-up no-limit poker","volume":"356","author":"Schmid","year":"2017","journal-title":"Science"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"418","DOI":"10.1126\/science.aao1733","article-title":"Superhuman AI for heads-up no-limit poker: Libratus beats top professionals","volume":"359","author":"Brown","year":"2018","journal-title":"Science"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"885","DOI":"10.1126\/science.aay2400","article-title":"Superhuman AI for multiplayer poker","volume":"365","author":"Brown","year":"2019","journal-title":"Science"},{"key":"ref_8","unstructured":"Vinyals, O., Babuschkin, I., Chung, J., Mathieu, M., Jaderberg, M., Czarnecki, W.M., Dudzik, A., Huang, A., Georgiev, P., and Powell, R. (2019). Alphastar: Mastering the real-time strategy game starcraft ii. DeepMind Blog, 2, Available online: https:\/\/www.deepmind.com\/blog\/alphastar-mastering-the-real-time-strategy-game-starcraft-ii."},{"key":"ref_9","unstructured":"Berner, C., Brockman, G., Chan, B., Cheung, V., Debiak, P., Dennison, C., Farhi, D., Fischer, Q., Hashme, S., and Hesse, C. (2019). Dota 2 with large scale deep reinforcement learning. arXiv."},{"key":"ref_10","first-page":"621","article-title":"Towards playing full moba games with deep reinforcement learning","volume":"33","author":"Ye","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1007\/s13218-020-00647-w","article-title":"From chess and atari to starcraft and beyond: How game ai is driving the world of ai","volume":"34","author":"Risi","year":"2020","journal-title":"KI-K\u00fcnstliche Intell."},{"key":"ref_12","unstructured":"Yin, Q., Yang, J., Ni, W., Liang, B., and Huang, K. (2021). AI in Games: Techniques, Challenges and Opportunities. arXiv."},{"key":"ref_13","unstructured":"Copeland, B.J. (2022, July 10). The Modern History of Computing. Available online: https:\/\/plato.stanford.edu\/entries\/computing-history\/."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1145\/203330.203343","article-title":"Temporal difference learning and TD-Gammon","volume":"38","author":"Tesauro","year":"1995","journal-title":"Commun. ACM"},{"key":"ref_15","first-page":"21","article-title":"Chinook the world man-machine checkers champion","volume":"17","author":"Schaeffer","year":"1996","journal-title":"AI Mag."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1016\/S0004-3702(01)00129-1","article-title":"Deep blue","volume":"134","author":"Campbell","year":"2002","journal-title":"Artif. Intell."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1126\/science.1259433","article-title":"Heads-up limit hold\u2019em poker is solved","volume":"347","author":"Bowling","year":"2015","journal-title":"Science"},{"key":"ref_18","unstructured":"Li, J., Koyamada, S., Ye, Q., Liu, G., Wang, C., Yang, R., Zhao, L., Qin, T., Liu, T.Y., and Hon, H.W. (2020). Suphx: Mastering mahjong with deep reinforcement learning. arXiv."},{"key":"ref_19","unstructured":"Fu, H., Liu, W., Wu, S., Wang, Y., Yang, T., Li, K., Xing, J., Li, B., Ma, B., and Fu, Q. (2021, January 3\u20137). Actor-Critic Policy Optimization in a Large-Scale Imperfect-Information Game. Proceedings of the International Conference on Learning Representations, Virtual Event."},{"key":"ref_20","unstructured":"Zha, D., Xie, J., Ma, W., Zhang, S., Lian, X., Hu, X., and Liu, J. (2021, January 18\u201324). Douzero: Mastering doudizhu with self-play deep reinforcement learning. Proceedings of the International Conference on Machine Learning, Virtual Event."},{"key":"ref_21","unstructured":"Guan, Y., Liu, M., Hong, W., Zhang, W., Fang, F., Zeng, G., and Lin, Y. (2022). PerfectDou: Dominating DouDizhu with Perfect Information Distillation. arXiv."},{"key":"ref_22","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1006\/game.2000.0794","article-title":"Zermelo and the early history of game theory","volume":"34","author":"Schwalbe","year":"2001","journal-title":"Games Econ. Behav."},{"key":"ref_24","unstructured":"Osborne, M.J., and Rubinstein, A. (1994). A Course in Game Theory, MIT Press."},{"key":"ref_25","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, MIT Press."},{"key":"ref_26","unstructured":"Watson, J. (2002). Strategy: An Introduction to Game Theory, WW Norton."},{"key":"ref_27","unstructured":"Sunehag, P., Lever, G., Gruslys, A., Czarnecki, W.M., Zambaldi, V., Jaderberg, M., Lanctot, M., Sonnerat, N., Leibo, J.Z., and Tuyls, K. (2017). Value-decomposition networks for cooperative multi-agent learning. arXiv."},{"key":"ref_28","unstructured":"Rashid, T., Samvelyan, M., Schroeder, C., Farquhar, G., Foerster, J., and Whiteson, S. (2018, January 10\u201315). Qmix: Monotonic value function factorisation for deep multi-agent reinforcement learning. Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden."},{"key":"ref_29","first-page":"6382","article-title":"Multi-agent actor-critic for mixed cooperative-competitive environments","volume":"30","author":"Lowe","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Foerster, J., Farquhar, G., Afouras, T., Nardelli, N., and Whiteson, S. (2018, January 2\u20137). Counterfactual multi-agent policy gradients. Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.11794"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1109\/TSSC.1968.300136","article-title":"A formal basis for the heuristic determination of minimum cost paths","volume":"4","author":"Hart","year":"1968","journal-title":"IEEE Trans. Syst. Sci. Cybern."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1016\/0004-3702(79)90016-X","article-title":"A minimax algorithm better than alpha-beta?","volume":"12","author":"Stockman","year":"1979","journal-title":"Artif. Intell."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Kocsis, L., and Szepesv\u00e1ri, C. (2006, January 18\u201322). Bandit based monte-carlo planning. Proceedings of the European Conference on Machine Learning, Berlin, Germany.","DOI":"10.1007\/11871842_29"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Gelly, S., and Silver, D. (2007, January 20\u201324). Combining online and offline knowledge in UCT. Proceedings of the 24th International Conference on Machine Learning, Corvalis, OR, USA.","DOI":"10.1145\/1273496.1273531"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1856","DOI":"10.1016\/j.artint.2011.03.007","article-title":"Monte-Carlo tree search and rapid action value estimation in computer Go","volume":"175","author":"Gelly","year":"2011","journal-title":"Artif. Intell."},{"key":"ref_36","unstructured":"Chaslot, G.M.B., Winands, M.H., and Herik, H. (October, January 29). Parallel monte-carlo tree search. Proceedings of the International Conference on Computers and Games, Beijing, China."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1613\/jair.820","article-title":"GIB: Imperfect information in a computationally challenging game","volume":"14","author":"Ginsberg","year":"2001","journal-title":"J. Artif. Intell. Res."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Bjarnason, R., Fern, A., and Tadepalli, P. (2009, January 19\u201323). Lower bounding Klondike solitaire with Monte-Carlo planning. Proceedings of the Nineteenth International Conference on Automated Planning and Scheduling, Thessaloniki, Greece.","DOI":"10.1609\/icaps.v19i1.13363"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1016\/S0004-3702(97)00082-9","article-title":"Search in games with incomplete information: A case study using bridge card play","volume":"100","author":"Frank","year":"1998","journal-title":"Artif. Intell."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1109\/TCIAIG.2012.2200894","article-title":"Information set monte carlo tree search","volume":"4","author":"Cowling","year":"2012","journal-title":"IEEE Trans. Comput. Intell. Games"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Whitehouse, D., Powley, E.J., and Cowling, P.I. (2011\u20133, January 31). Determinization and information set Monte Carlo tree search for the card game Dou Di Zhu. Proceedings of the 2011 IEEE Conference on Computational Intelligence and Games (CIG\u201911), Seoul, Korea.","DOI":"10.1109\/CIG.2011.6031993"},{"key":"ref_42","unstructured":"Burch, N. (2022, July 10). Time and Space: Why Imperfect Information Games Are Hard. Available online: https:\/\/era.library.ualberta.ca\/items\/db44409f-b373-427d-be83-cace67d33c41."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Eiben, A.E., and Smith, J.E. (2003). Introduction to Evolutionary Computing, Springer.","DOI":"10.1007\/978-3-662-05094-1"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Rechenberg, I. (1978). Evolutionsstrategien. Simulationsmethoden in der Medizin und Biologie, Springer.","DOI":"10.1007\/978-3-642-81283-5_8"},{"key":"ref_45","first-page":"489","article-title":"Arms races between and within species","volume":"205","author":"Dawkins","year":"1979","journal-title":"Proc. R. Soc. Lond. Ser. B Biol. Sci."},{"key":"ref_46","unstructured":"Angeline, P., and Pollack, J. (1993, January 1). Competitive Environments Evolve Better Solutions for Complex Tasks. Proceedings of the 5th International Conference on Genetic Algorithms, San Francisco, CA, USA."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Reynolds, C.W. (1994, January 6\u20138). Competition, coevolution and the game of tag. Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems, Boston, MA, USA.","DOI":"10.7551\/mitpress\/1428.003.0010"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1162\/artl.1994.1.4.353","article-title":"Evolving 3D morphology and behavior by competition","volume":"1","author":"Sims","year":"1994","journal-title":"Artif. Life"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Smith, G., Avery, P., Houmanfar, R., and Louis, S. (2010, January 18\u201321). Using co-evolved rts opponents to teach spatial tactics. Proceedings of the 2010 IEEE Conference on Computational Intelligence and Games, Copenhagen, Denmark.","DOI":"10.1109\/ITW.2010.5593359"},{"key":"ref_50","unstructured":"Fern\u00e1ndez-Ares, A., Garc\u00eda-S\u00e1nchez, P., Mora, A.M., Castillo, P.A., and Merelo, J. (April, January 30). There can be only one: Evolving RTS bots via joust selection. Proceedings of the European Conference on the Applications of Evolutionary Computation, Porto, Portugal."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"105032","DOI":"10.1016\/j.knosys.2019.105032","article-title":"Optimizing hearthstone agents using an evolutionary algorithm","volume":"188","author":"Tonda","year":"2020","journal-title":"Knowl.-Based Syst."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1016\/0893-6080(89)90020-8","article-title":"Multilayer feedforward networks are universal approximators","volume":"2","author":"Hornik","year":"1989","journal-title":"Neural Netw."},{"key":"ref_53","unstructured":"Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling the knowledge in a neural network. arXiv."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1007\/BF00992696","article-title":"Simple statistical gradient-following algorithms for connectionist reinforcement learning","volume":"8","author":"Williams","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_55","unstructured":"Konda, V., and Tsitsiklis, J. (December, January 29). Actor-critic algorithms. Proceedings of the Advances in Neural Information Processing Systems 12 (NIPS 1999), Denver, CO, USA."},{"key":"ref_56","unstructured":"Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv."},{"key":"ref_57","unstructured":"Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016, January 20\u201322). Asynchronous methods for deep reinforcement learning. Proceedings of the International Conference on Machine Learning, PMLR, New York, NY, USA."},{"key":"ref_58","unstructured":"Espeholt, L., Soyer, H., Munos, R., Simonyan, K., Mnih, V., Ward, T., Doron, Y., Firoiu, V., Harley, T., and Dunning, I. (2018, January 10\u201315). Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures. Proceedings of the International Conference on Machine Learning. PMLR, Stockholm, Sweden."},{"key":"ref_59","unstructured":"Schulman, J., Levine, S., Abbeel, P., Jordan, M., and Moritz, P. (2015, January 7\u20139). Trust region policy optimization. Proceedings of the International Conference on Machine Learning, PMLR, Lille, France."},{"key":"ref_60","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"1127","DOI":"10.1111\/1468-0262.00153","article-title":"A simple adaptive procedure leading to correlated equilibrium","volume":"68","author":"Hart","year":"2000","journal-title":"Econometrica"},{"key":"ref_62","first-page":"1729","article-title":"Regret minimization in games with incomplete information","volume":"20","author":"Zinkevich","year":"2007","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_63","unstructured":"Tammelin, O. (2014). Solving large imperfect information games using CFR+. arXiv."},{"key":"ref_64","unstructured":"Brown, N., and Sandholm, T. (February, January 27). Solving imperfect-information games via discounted regret minimization. Proceedings of the Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_65","unstructured":"Lanctot, M., Waugh, K., Zinkevich, M., and Bowling, M.H. (2009, January 6\u201311). Monte Carlo Sampling for Regret Minimization in Extensive Games. Proceedings of the NIPS, Vancouver, BC, Canada."},{"key":"ref_66","unstructured":"Schmid, M., Burch, N., Lanctot, M., Moravcik, M., Kadlec, R., and Bowling, M. (February, January 27). Variance reduction in monte carlo counterfactual regret minimization (VR-MCCFR) for extensive form games using baselines. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_67","unstructured":"Waugh, K., Schnizlein, D., Bowling, M.H., and Szafron, D. (2009, January 10\u201315). Abstraction pathologies in extensive games. Proceedings of the AAMAS, Budapest, Hungary."},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Waugh, K., Morrill, D., Bagnell, J.A., and Bowling, M. (2015, January 25\u201330). Solving games with functional regret estimation. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA.","DOI":"10.1609\/aaai.v29i1.9445"},{"key":"ref_69","unstructured":"Brown, N., Lerer, A., Gross, S., and Sandholm, T. (2019, January 9\u201315). Deep counterfactual regret minimization. Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA."},{"key":"ref_70","unstructured":"Li, H., Hu, K., Ge, Z., Jiang, T., Qi, Y., and Song, L. (2018). Double neural counterfactual regret minimization. arXiv."},{"key":"ref_71","unstructured":"Steinberger, E. (2019). Single deep counterfactual regret minimization. arXiv."},{"key":"ref_72","unstructured":"Steinberger, E., Lerer, A., and Brown, N. (2020). DREAM: Deep regret minimization with advantage baselines and model-free learning. arXiv."},{"key":"ref_73","first-page":"374","article-title":"Iterative solution of games by fictitious play","volume":"13","author":"Brown","year":"1951","journal-title":"Act. Anal. Prod. Alloc."},{"key":"ref_74","unstructured":"Heinrich, J., Lanctot, M., and Silver, D. (2015, January 7\u20139). Fictitious self-play in extensive-form games. Proceedings of the International Conference on Machine Learning, PMLR, Lille, France."},{"key":"ref_75","unstructured":"Heinrich, J., and Silver, D. (2016). Deep reinforcement learning from self-play in imperfect-information games. arXiv."},{"key":"ref_76","unstructured":"McMahan, H.B., Gordon, G.J., and Blum, A. (2003, January 21\u201324). Planning in the presence of cost functions controlled by an adversary. Proceedings of the 20th International Conference on Machine Learning (ICML-03), Washington, DC, USA."},{"key":"ref_77","first-page":"4193","article-title":"A unified game-theoretic approach to multiagent reinforcement learning","volume":"30","author":"Lanctot","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_78","unstructured":"Bansal, T., Pachocki, J., Sidor, S., Sutskever, I., and Mordatch, I. (2017). Emergent complexity via multi-agent competition. arXiv."},{"key":"ref_79","unstructured":"Jaderberg, M., Dalibard, V., Osindero, S., Czarnecki, W.M., Donahue, J., Razavi, A., Vinyals, O., Green, T., Dunning, I., and Simonyan, K. (2017). Population based training of neural networks. arXiv."},{"key":"ref_80","unstructured":"Zhao, E., Yan, R., Li, J., Li, K., and Xing, J. (March, January 22). AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Texas Hold\u2019em from End-to-End Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event."},{"key":"ref_81","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_82","unstructured":"Johanson, M. (2013). Measuring the size of large no-limit poker games. arXiv."},{"key":"ref_83","doi-asserted-by":"crossref","unstructured":"Zha, D., Lai, K.H., Cao, Y., Huang, S., Wei, R., Guo, J., and Hu, X. (2019). Rlcard: A toolkit for reinforcement learning in card games. arXiv.","DOI":"10.24963\/ijcai.2020\/764"},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Zhou, H., Zhang, H., Zhou, Y., Wang, X., and Li, W. (2018, January 2\u20134). Botzone: An online multi-agent competitive platform for ai education. Proceedings of the 23rd Annual ACM Conference on Innovation and Technology in Computer Science Education, Larnaca, Cyprus.","DOI":"10.1145\/3197091.3197099"},{"key":"ref_85","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1007\/BF00115009","article-title":"Learning to predict by the methods of temporal differences","volume":"3","author":"Sutton","year":"1988","journal-title":"Mach. Learn."},{"key":"ref_86","doi-asserted-by":"crossref","first-page":"604","DOI":"10.1038\/s41586-020-03051-4","article-title":"Mastering atari, go, chess and shogi by planning with a learned model","volume":"588","author":"Schrittwieser","year":"2020","journal-title":"Nature"},{"key":"ref_87","first-page":"17443","article-title":"Real world games look like spinning tops","volume":"33","author":"Czarnecki","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_88","doi-asserted-by":"crossref","unstructured":"Lyu, X., Baisero, A., Xiao, Y., and Amato, C. (2022). A Deeper Understanding of State-Based Critics in Multi-Agent Reinforcement Learning. arXiv.","DOI":"10.1609\/aaai.v36i9.21171"},{"key":"ref_89","first-page":"12","article-title":"The bitter lesson","volume":"13","author":"Sutton","year":"2019","journal-title":"Incomplete Ideas"},{"key":"ref_90","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1109\/6.591665","article-title":"Moore\u2019s law: Past, present and future","volume":"34","author":"Schaller","year":"1997","journal-title":"IEEE Spectr."},{"key":"ref_91","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1038\/s41586-021-04357-7","article-title":"Outracing champion Gran Turismo drivers with deep reinforcement learning","volume":"602","author":"Wurman","year":"2022","journal-title":"Nature"},{"key":"ref_92","doi-asserted-by":"crossref","unstructured":"Kurach, K., Raichuk, A., Sta\u0144czyk, P., Zaj\u0105c, M., Bachem, O., Espeholt, L., Riquelme, C., Vincent, D., Michalski, M., and Bousquet, O. (2019). Google research football: A novel reinforcement learning environment. arXiv.","DOI":"10.1609\/aaai.v34i04.5878"},{"key":"ref_93","unstructured":"Baker, B., Kanitscheider, I., Markov, T., Wu, Y., Powell, G., McGrew, B., and Mordatch, I. (2019). Emergent tool use from multi-agent autocurricula. arXiv."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/8\/282\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:07:38Z","timestamp":1760141258000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/8\/282"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,12]]},"references-count":93,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2022,8]]}},"alternative-id":["a15080282"],"URL":"https:\/\/doi.org\/10.3390\/a15080282","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,12]]}}}