{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T18:29:03Z","timestamp":1767637743206,"version":"3.48.0"},"reference-count":36,"publisher":"Maximum Academic Press","license":[{"start":{"date-parts":[[2024,11,8]],"date-time":"2024-11-08T00:00:00Z","timestamp":1731024000000},"content-version":"unspecified","delay-in-days":312,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["The Knowledge Engineering Review"],"published-print":{"date-parts":[[2024]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Real-time strategy (RTS) games have provided a fertile ground for AI research with notable recent successes based on deep reinforcement learning (RL). However, RL remains a data-hungry approach featuring a high sample complexity. In this paper, we focus on a sample complexity reduction technique called reinforcement learning as a rehearsal (RLaR) and on the RTS game of MicroRTS to formulate and evaluate it. RLaR has been formulated in the context of action-value function based RL before. Here, we formulate it for a different RL framework, called actor-critic RL. We show that on the one hand the actor-critic framework allows RLaR to be much simpler, but on the other hand, it leaves room for a key component of RLaR\u2013a prediction function that relates a learner\u2019s observations to those of its opponent. This function, when leveraged for exploration, accelerates RL as our experiments in MicroRTS show. Further experiments provide evidence that RLaR may reduce actor noise compared to a variant that does not utilize RLaR\u2019s exploration. 
This study provides the first evaluation of RLaR\u2019s efficacy in a domain with a large strategy space.<\/jats:p>","DOI":"10.1017\/s0269888924000092","type":"journal-article","created":{"date-parts":[[2024,11,8]],"date-time":"2024-11-08T03:17:11Z","timestamp":1731035831000},"update-policy":"https:\/\/doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":0,"title":["Reinforcement actor-critic learning as a rehearsal in MicroRTS"],"prefix":"10.48130","volume":"39","author":[{"given":"Shiron","family":"Manandhar","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7999-0307","authenticated-orcid":false,"given":"Bikramjit","family":"Banerjee","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"27968","published-online":{"date-parts":[[2024,11,8]]},"reference":[{"key":"S0269888924000092_ref18","doi-asserted-by":"publisher","DOI":"10.1007\/s11721-021-00203-8"},{"key":"S0269888924000092_ref31","doi-asserted-by":"crossref","unstructured":"Synnaeve, G. & Bessiere, P. 2011. A Bayesian model for plan recognition in RTS games applied to StarCraft. In Proceedings of the 7th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE 2011, 79\u201384.","DOI":"10.1609\/aiide.v7i1.12429"},{"key":"S0269888924000092_ref33","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992698"},{"key":"S0269888924000092_ref10","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"S0269888924000092_ref24","doi-asserted-by":"crossref","unstructured":"Perkins, L. 2010. Terrain analysis in real-time strategy games: An integrated approach to choke point detection and region decomposition. 
In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), 6, 168\u2013173.","DOI":"10.1609\/aiide.v6i1.12405"},{"key":"S0269888924000092_ref13","unstructured":"Kelly, R. & Churchill, D. 2020. Transfer Learning Between RTS Combat Scenarios Using Component-Action Deep Reinforcement Learning. https:\/\/ceur-ws.org\/Vol-2862\/paper28.pdf."},{"key":"S0269888924000092_ref34","unstructured":"Weber, B. G. , Mateas, M. & Jhala, A. 2011a. Building human-level AI for real-time strategy games. In AAAI Fall Symposium Series."},{"key":"S0269888924000092_ref15","unstructured":"Marthi, B. , Russell, S. J. , Latham, D. & Guestrin, C. 2005. Concurrent hierarchical reinforcement learning. In International Joint Conference of Artificial Intelligence (IJCAI), 779\u2013785."},{"key":"S0269888924000092_ref4","unstructured":"Buro, M. 2003. Real-time strategy games: A new AI research challenge. In Proceedings of International Joint Conferences on Artificial Intelligence, 1534\u20131535."},{"key":"S0269888924000092_ref27","unstructured":"Sharma, M. , Holmes, M. , Santamaria, J. , Irani, A. Jr , C. & Ram, A. 2007. Transfer learning in real-time strategy games using hybrid CBR\/RL. In Proceedings of International Joint Conference on Artificial Intelligence, 1041\u20131046."},{"key":"S0269888924000092_ref22","doi-asserted-by":"crossref","unstructured":"Onta\u00f1\u00f3n, S. 2013. The combinatorial multi-armed Bandit problem and its application to real-time strategy games. In Proceedings of the Ninth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), AAAI, Boston, MA, 58\u201364.","DOI":"10.1609\/aiide.v9i1.12681"},{"key":"S0269888924000092_ref23","doi-asserted-by":"publisher","DOI":"10.1109\/TCIAIG.2013.2286295"},{"key":"S0269888924000092_ref25","doi-asserted-by":"publisher","DOI":"10.1609\/aiide.v19i1.27530"},{"key":"S0269888924000092_ref28","unstructured":"Sohn, K. , Lee, H. & Yan, X. 2015. 
Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems 28."},{"key":"S0269888924000092_ref8","doi-asserted-by":"crossref","unstructured":"Critch, L. & Churchill, D. 2020. Combining influence maps with heuristic search for executing sneak-attacks in RTS games. In Proceedings of 2020 IEEE Conference on Games (CoG-20), 740\u2013743.","DOI":"10.1109\/CoG47356.2020.9231889"},{"key":"S0269888924000092_ref7","doi-asserted-by":"crossref","unstructured":"Churchill, D. , Saffidine, A. & Buro, M. 2012. Fast heuristic search for RTS Game Combat scenarios. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment 8(1), 112\u2013117.","DOI":"10.1609\/aiide.v8i1.12527"},{"key":"S0269888924000092_ref14","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2016.01.031"},{"key":"S0269888924000092_ref19","doi-asserted-by":"crossref","unstructured":"Niel, R. , Krebbers, J. , Drugan, M. M. & Wiering, M. A. 2018. Hierarchical reinforcement learning for real-time strategy games. In Proceedings of ICAART-2018, 470\u2013477.","DOI":"10.5220\/0006593804700477"},{"key":"S0269888924000092_ref36","doi-asserted-by":"publisher","DOI":"10.1109\/CIG.2012.6374183"},{"key":"S0269888924000092_ref37","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992696"},{"key":"S0269888924000092_ref6","doi-asserted-by":"crossref","unstructured":"Churchill, D. & Buro, M. 2011. Build order optimization in StarCraft. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 7(1), 14\u201319.","DOI":"10.1609\/aiide.v7i1.12435"},{"key":"S0269888924000092_ref17","unstructured":"Ng, A. Y. , Harada, D. & Russell, S. 1999. Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the 16th International Conference on Machine Learning, 278\u2013287. 
Morgan Kaufmann."},{"key":"S0269888924000092_ref26","doi-asserted-by":"crossref","unstructured":"Richoux, F. 2020. MicroPhantom: Playing MicroRTS under uncertainty and chaos. In 2020 IEEE Conference on Games (CoG), 670\u2013677.","DOI":"10.1109\/CoG47356.2020.9231653"},{"key":"S0269888924000092_ref32","first-page":"575","article-title":"Grandmaster level in StarCraft II using multi-agent reinforcement learning","author":"Vinyals","year":"2019","journal-title":"Nature"},{"key":"S0269888924000092_ref21","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-77465-5_15"},{"key":"S0269888924000092_ref3","doi-asserted-by":"publisher","DOI":"10.1109\/MCI.2019.2919363"},{"key":"S0269888924000092_ref9","doi-asserted-by":"publisher","DOI":"10.1109\/MIS.2002.1024748"},{"key":"S0269888924000092_ref35","doi-asserted-by":"crossref","unstructured":"Weber, B. , Mateas, M. & Jhala, A. 2011b. A particle model for state estimation in real-time strategy games. In Proceedings of AIIDE, 103\u2013108.","DOI":"10.1609\/aiide.v7i1.12424"},{"key":"S0269888924000092_ref12","unstructured":"Jaidee, U. & Mu\u00f1oz-Avila, H. 2012. ClassQ-l: A Q-learning algorithm for adversarial real-time strategy games. In Eighth Artificial Intelligence and Interactive Digital Entertainment Conference."},{"key":"S0269888924000092_ref5","unstructured":"Chung, M. , Buro, M. & Schaeffer, J. 2005. Monte Carlo planning in RTS games. In IEEE Symposium on Computational Intelligence and Games (CIG). Citeseer."},{"key":"S0269888924000092_ref20","unstructured":"Oh, J. , Guo, Y. , Singh, S. & Lee, H. 2018. Self-imitation learning. In ICML."},{"key":"S0269888924000092_ref11","doi-asserted-by":"publisher","DOI":"10.1109\/CoG52621.2021.9619076"},{"key":"S0269888924000092_ref30","unstructured":"Sutton, R. S. , McAllester, D. , Singh, S. & Mansour, Y. 2000. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, 1057\u20131063. 
MIT Press."},{"key":"S0269888924000092_ref16","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"volume-title":"International Joint Conference of Artificial Intelligence (IJCAI)","year":"2009","author":"Balla","key":"S0269888924000092_ref2"},{"volume-title":"Reinforcement Learning: An Introduction","year":"1998","author":"Sutton","key":"S0269888924000092_ref29"}],"container-title":["The Knowledge Engineering Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S0269888924000092","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T14:42:25Z","timestamp":1767624145000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S0269888924000092\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"references-count":36,"alternative-id":["S0269888924000092"],"URL":"https:\/\/doi.org\/10.1017\/s0269888924000092","relation":{},"ISSN":["0269-8889","1469-8005"],"issn-type":[{"type":"print","value":"0269-8889"},{"type":"electronic","value":"1469-8005"}],"subject":[],"published":{"date-parts":[[2024]]},"assertion":[{"value":"\u00a9 The Author(s), 2024. 
Published by Cambridge University Press","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https:\/\/creativecommons.org\/licenses\/by\/4.0\/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.","name":"license","label":"License","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This content has been made available to all.","name":"free","label":"Free to read"}],"article-number":"e6"}}