{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T16:30:52Z","timestamp":1753893052782,"version":"3.41.2"},"reference-count":41,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,5,12]],"date-time":"2023-05-12T00:00:00Z","timestamp":1683849600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003495","name":"Hessisches Ministerium f\u00fcr Wissenschaft und Kunst","doi-asserted-by":"publisher","award":["Cluster project \"The Third Wave of Artificial Intelligence - 3AI\""],"award-info":[{"award-number":["Cluster project \"The Third Wave of Artificial Intelligence - 3AI\""]}],"id":[{"id":"10.13039\/501100003495","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:p>In recent years, deep neural networks for strategy games have made significant progress. AlphaZero-like frameworks which combine Monte-Carlo tree search with reinforcement learning have been successfully applied to numerous games with perfect information. However, they have not been developed for domains where uncertainty and unknowns abound, and are therefore often considered unsuitable due to imperfect observations. Here, we challenge this view and argue that they are a viable alternative for games with imperfect information\u2014a domain currently dominated by heuristic approaches or methods explicitly designed for hidden information, such as oracle-based techniques. To this end, we introduce a novel algorithm based solely on reinforcement learning, called AlphaZe\u2217\u2217, which is an AlphaZero-based framework for games with imperfect information. We examine its learning convergence on the games Stratego and DarkHex and show that it is a surprisingly strong baseline, while using a model-based approach: it achieves similar win rates against other Stratego bots like Pipeline Policy Space Response Oracle (P2SRO), while not winning in direct comparison against P2SRO or reaching the much stronger numbers of DeepNash. Compared to heuristics and oracle-based approaches, AlphaZe\u2217\u2217 can easily deal with rule changes, e.g., when more information than usual is given, and drastically outperforms other approaches in this respect.<\/jats:p>","DOI":"10.3389\/frai.2023.1014561","type":"journal-article","created":{"date-parts":[[2023,5,12]],"date-time":"2023-05-12T06:22:56Z","timestamp":1683872576000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["AlphaZe\u2217\u2217: AlphaZero-like baselines for imperfect information games are surprisingly strong"],"prefix":"10.3389","volume":"6","author":[{"given":"Jannis","family":"Bl\u00fcml","sequence":"first","affiliation":[]},{"given":"Johannes","family":"Czech","sequence":"additional","affiliation":[]},{"given":"Kristian","family":"Kersting","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2023,5,12]]},"reference":[{"volume-title":"Competitive play in Stratego","year":"2010","author":"Arts","key":"B1"},{"key":"B2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1912.06680","article-title":"Dota 2 with large scale deep reinforcement learning","author":"Berner","year":"2019","journal-title":"arXiv:1912.06680v1"},{"key":"B3","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1709.09451","article-title":"Combining prediction of human decisions with ISMCTS in imperfect information games","author":"Bitan","year":"2017","journal-title":"arXiv preprint arXiv:1709.09451"},{"key":"B4","first-page":"57","article-title":"\u201cA comparison of Monte-Carlo methods for phantom GO,\u201d","volume-title":"Proceedings of BeNeLux Conference on Artificial Intelligence","author":"Borsboom","year":"2007"},{"key":"B5","first-page":"17057","article-title":"\u201cCombining deep reinforcement learning and search for imperfect-information games,\u201d","author":"Brown","year":"2020","journal-title":"Advances in Neural Information Processing Systems, Vol. 33"},{"key":"B6","first-page":"793","article-title":"\u201cDeep counterfactual regret minimization,\u201d","volume-title":"International Conference on Machine Learning","author":"Brown","year":"2019"},{"key":"B7","doi-asserted-by":"publisher","first-page":"885","DOI":"10.1126\/science.aay2400","article-title":"Superhuman ai for multiplayer poker","volume":"365","author":"Brown","year":"2019","journal-title":"Science"},{"key":"B8","article-title":"\u201cEfficient Monte Carlo counterfactual regret minimization in games with many player actions,\u201d","volume-title":"Advances in Neural Information Processing Systems, Vol. 25","author":"Burch","year":"2012"},{"key":"B9","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1109\/TCIAIG.2012.2200894","article-title":"Information set Monte Carlo tree search","volume":"4","author":"Cowling","year":"2012","journal-title":"IEEE Trans. Comput. Intell. AI Games"},{"key":"B10","doi-asserted-by":"crossref","first-page":"114","DOI":"10.1109\/CIG.2015.7317927","article-title":"\u201cEmergent bluffing and inference with Monte Carlo tree search,\u201d","volume-title":"2015 IEEE Conference on Computational Intelligence and Games (CIG)","author":"Cowling","year":"2015"},{"key":"B11","doi-asserted-by":"publisher","first-page":"24","DOI":"10.3389\/frai.2020.00024","article-title":"Learning to play the chess variant Crazyhouse above world champion level with deep neural networks and human data","volume":"3","author":"Czech","year":"2020","journal-title":"Front. Artif. Intell"},{"key":"B12","first-page":"28","article-title":"\u201cUsing response functions to measure strategy strength,\u201d","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Davis","year":"2014"},{"key":"B13","first-page":"1","article-title":"Invincible-a Stratego bot","volume":"5","author":"de Boer","year":"2008","journal-title":"Int. J. Intell. Games Simul"},{"key":"B14","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1016\/S0004-3702(97)00082-9","article-title":"Search in games with incomplete information: a case study using bridge card play","volume":"100","author":"Frank","year":"1998","journal-title":"Artif. Intell"},{"key":"B15","doi-asserted-by":"publisher","first-page":"3051","DOI":"10.1162\/neco.2007.19.11.3051","article-title":"Model-based reinforcement learning for partially observable games with sampling-based state estimation","volume":"19","author":"Fujita","year":"2007","journal-title":"Neural Comput"},{"key":"B16","doi-asserted-by":"publisher","first-page":"106","DOI":"10.1145\/2093548.2093574","article-title":"The grand challenge of computer go: Monte carlo tree search and extensions","volume":"55","author":"Gelly","year":"2012","journal-title":"Commun. ACM"},{"key":"B17","doi-asserted-by":"publisher","first-page":"303","DOI":"10.1613\/jair.820","article-title":"Gib: imperfect information in a computationally challenging game","volume":"14","author":"Ginsberg","year":"2001","journal-title":"J. Artif. Intell. Res"},{"key":"B18","first-page":"5927","article-title":"\u201cDeep pyramidal residual networks,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Han","year":"2017"},{"key":"B19","first-page":"805","article-title":"\u201cFictitious self-play in extensive-form games,\u201d","author":"Heinrich","year":"2015","journal-title":"Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, ICML'15"},{"key":"B20","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1603.01121","article-title":"Deep reinforcement learning from self-play in imperfect-information games","author":"Heinrich","year":"2016","journal-title":"arXiv: 1603.01121"},{"key":"B21","first-page":"7132","article-title":"\u201cSqueeze-and-excitation networks,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Hu","year":"2018"},{"key":"B22","first-page":"282","article-title":"\u201cBandit based Monte-Carlo planning,\u201d","volume-title":"European Conference on Machine Learning","author":"Kocsis","year":"2006"},{"key":"B23","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1908.09453","article-title":"OpenSpiel: A framework for reinforcement learning in games","author":"Lanctot","year":"2019","journal-title":"CoRR abs\/1908.09453"},{"key":"B24","first-page":"4193","article-title":"\u201cA unified game-theoretic approach to multiagent reinforcement learning,\u201d","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems, NeurIPS'17","author":"Lanctot","year":"2017"},{"key":"B25","first-page":"95","article-title":"\u201cThe million pound bridge program,\u201d","volume-title":"Heuristic Programming in Artificial Intelligence The First Computer Olympiad","author":"Levy","year":"1989"},{"key":"B26","doi-asserted-by":"crossref","first-page":"274","DOI":"10.1109\/MASS.2019.00040","article-title":"\u201cDeep neural network ensembles against deception: Ensemble diversity, accuracy and robustness,\u201d","volume-title":"2019 IEEE 16th International Conference on Mobile Ad Hoc and Sensor Systems (MASS)","author":"Liu","year":"2019"},{"key":"B27","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v24i1.7562","article-title":"\u201cUnderstanding the success of perfect information Monte Carlo sampling in game tree search,\u201d","volume-title":"Twenty-Fourth AAAI Conference on Artificial Intelligence","author":"Long","year":"2010"},{"key":"B28","article-title":"\u201cPipeline PSRO: a scalable approach for finding approximate Nash Equilibria in large games,\u201d","author":"McAleer","year":"2020","journal-title":"Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020"},{"key":"B29","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"B30","doi-asserted-by":"publisher","first-page":"990","DOI":"10.1126\/science.add4679","article-title":"Mastering the game of Stratego with model-free multiagent reinforcement learning","volume":"378","author":"Perolat","year":"2022","journal-title":"Science"},{"key":"B31","doi-asserted-by":"publisher","first-page":"575","DOI":"10.1613\/jair.3402","article-title":"Computing approximate Nash Equilibria and robust best-responses using sampling","volume":"42","author":"Ponsen","year":"2011","journal-title":"J. Artif. Intell. Res"},{"key":"B32","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1007\/s10472-011-9258-6","article-title":"Multi-armed bandits with episode context","volume":"61","author":"Rosin","year":"2011","journal-title":"Ann. Math. Artif. Intell"},{"key":"B33","first-page":"4510","article-title":"\u201cMobilenetv2: inverted residuals and linear bottlenecks,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Sandler","year":"2018"},{"key":"B34","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1038\/nature16961","article-title":"Mastering the game of Go with deep neural networks and tree search","volume":"529","author":"Silver","year":"2016","journal-title":"Nature"},{"key":"B35","article-title":"\u201cMonte-carlo planning in large pomdps,\u201d","volume-title":"Advances in Neural Information Processing Systems, Vol. 23","author":"Silver","year":"2010"},{"key":"B36","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1038\/nature24270","article-title":"Mastering the game of Go without human knowledge","volume":"550","author":"Silver","year":"2017","journal-title":"Nature"},{"key":"B37","doi-asserted-by":"publisher","first-page":"1140","DOI":"10.1126\/science.aar6404","article-title":"A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play","volume":"362","author":"Silver","year":"2018","journal-title":"Science"},{"key":"B38","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","article-title":"Grandmaster level in StarCraft ii using multi-agent reinforcement learning","volume":"575","author":"Vinyals","year":"2019","journal-title":"Nature"},{"journal-title":"Monte Carlo tree search for games with hidden information and uncertainty","year":"2014","author":"Whitehouse","key":"B39"},{"key":"B40","article-title":"\u201cAn expert-level card playing agent based on a variant of perfect information Monte Carlo sampling,\u201d","volume-title":"Twenty-Fourth International Joint Conference on Artificial Intelligence","author":"Wisser","year":"2015"},{"key":"B41","first-page":"1729","article-title":"\u201cRegret minimization in games with incomplete information,\u201d","volume-title":"Proceedings of the 20th International Conference on Neural Information Processing Systems, NeurIPS'07","author":"Zinkevich","year":"2007"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2023.1014561\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,12]],"date-time":"2023-05-12T06:23:34Z","timestamp":1683872614000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2023.1014561\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,12]]},"references-count":41,"alternative-id":["10.3389\/frai.2023.1014561"],"URL":"https:\/\/doi.org\/10.3389\/frai.2023.1014561","relation":{},"ISSN":["2624-8212"],"issn-type":[{"type":"electronic","value":"2624-8212"}],"subject":[],"published":{"date-parts":[[2023,5,12]]},"article-number":"1014561"}}