{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T09:45:05Z","timestamp":1775295905298,"version":"3.50.1"},"reference-count":30,"publisher":"Maximum Academic Press","license":[{"start":{"date-parts":[[2018,12,4]],"date-time":"2018-12-04T00:00:00Z","timestamp":1543881600000},"content-version":"unspecified","delay-in-days":337,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["The Knowledge Engineering Review"],"published-print":{"date-parts":[[2018]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Reinforcement learning (RL) algorithms are often used to compute agents capable of acting in environments without prior knowledge of the environment dynamics. However, these algorithms struggle to converge in environments with large branching factors and their large resulting state-spaces. In this work, we develop an approach to compress the number of entries in a Q-value table using a deep auto-encoder. We develop a set of techniques to mitigate the large branching factor problem. We present the application of such techniques in the scenario of a real-time strategy (RTS) game, where both state space and branching factor are a problem. We empirically evaluate an implementation of the technique to control agents in an RTS game scenario where classical RL fails and provide a number of possible avenues of further work on this problem.<\/jats:p>","DOI":"10.1017\/s0269888918000280","type":"journal-article","created":{"date-parts":[[2018,12,4]],"date-time":"2018-12-04T06:40:14Z","timestamp":1543905614000},"source":"Crossref","is-referenced-by-count":6,"title":["Q-Table compression for reinforcement learning"],"prefix":"10.48130","volume":"33","author":[{"given":"Leonardo","family":"Amado","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3549-6168","authenticated-orcid":false,"given":"Felipe","family":"Meneguzzi","sequence":"additional","affiliation":[]}],"member":"27968","published-online":{"date-parts":[[2018,12,4]]},"reference":[{"key":"S0269888918000280_ref6","doi-asserted-by":"crossref","unstructured":"Jaidee U. & Munoz-Avila H. 2012. Classq-l: a q-learning algorithm for adversarial real-time strategy games.","DOI":"10.1609\/aiide.v8i3.12547"},{"key":"S0269888918000280_ref32","unstructured":"Kaelbling L. P. , Littman M. L. & Moore A. W. 1996. Reinforcement learning: a survey. Journal of Artificial Intelligence Research 4, 237\u2013285."},{"key":"S0269888918000280_ref3","unstructured":"Boyan J. A. & Moore A. W. 1995. Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems 7, Tesauro, G., Touretzky, D. S. & Leen, T. K. (eds). MIT Press, 369\u2013376."},{"key":"S0269888918000280_ref31","unstructured":"Zhang J. & Zong C. 2015. Deep neural networks in machine translation: an overview. IEEE Intelligent Systems 300, 16\u201325."},{"key":"S0269888918000280_ref8","doi-asserted-by":"crossref","unstructured":"Lange S. & Riedmiller M. A. 2010. Deep auto-encoder neural networks in reinforcement learning. In IJCNN, 1\u20138. IEEE.","DOI":"10.1109\/IJCNN.2010.5596468"},{"key":"S0269888918000280_ref18","doi-asserted-by":"crossref","unstructured":"Rumelhart D. E. , Hinton G. E. & Williams R. J. 1988. Neurocomputing: foundations of research. Learning Representations by Back-propagating Errors, 696\u2013699. MIT Press.","DOI":"10.7551\/mitpress\/4943.003.0042"},{"key":"S0269888918000280_ref16","unstructured":"Pan S. J. & Yang Q. 2010. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 220, 1345\u20131359."},{"key":"S0269888918000280_ref29","doi-asserted-by":"crossref","unstructured":"Wendel V. , Alef J. , G\u00f6bel S. & Steinemtz R. 2014. A method for simulating players in a collaborative multiplayer serious game. In Proceedings of the 2014 ACM International Workshop on Serious Games, SeriousGames \u201914, 15\u201320. ACM.","DOI":"10.1145\/2656719.2656723"},{"key":"S0269888918000280_ref4","unstructured":"Guestrin C. , Lagoudakis M. G. & Parr R. 2002. Coordinated reinforcement learning. In Proceedings of the Nineteenth International Conference on Machine Learning, ICML \u201902, 227\u2013234, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc."},{"key":"S0269888918000280_ref5","unstructured":"Hebb D. O. 1949. The Organization of Behavior: A Neuropsychological Theory. Wiley."},{"key":"S0269888918000280_ref22","unstructured":"Tesauro G. 1992. Practical issues in temporal difference learning. Machine Learning 8, 257\u2013277."},{"key":"S0269888918000280_ref24","doi-asserted-by":"crossref","unstructured":"Tokic M. 2010. Adaptive e-greedy exploration in reinforcement learning based on value differences. In Proceedings of the 33rd Annual German Conference on Advances in Artificial Intelligence, KI\u201910, pages 203\u2013210. Springer-Verlag.","DOI":"10.1007\/978-3-642-16111-7_23"},{"key":"S0269888918000280_ref11","unstructured":"Mnih V. , Kavukcuoglu K. , Silver D. , Graves A. , Antonoglou I. , Wierstra D. & Riedmiller M. 2013. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602."},{"key":"S0269888918000280_ref12","unstructured":"Mnih V. , Kavukcuoglu K. , Silver D. , Rusu A. A. , Veness J. , Bellemare M. G. , Graves A. , Riedmiller M. , Fidjeland A. K. , Ostrovski G. , Petersen S. , Beattie C. , Sadik A. , Antonoglou I. , King H. , Kumaran D. , Wierstra D. , Legg S. & Hassabis D. 2015. Human-level control through deep reinforcement learning. Nature 5180, 529\u2013533."},{"key":"S0269888918000280_ref26","unstructured":"Vinyals O. , Ewalds T. , Bartunov S. , Georgiev P. , Vezhnevets A. S. , Yeo M. , Makhzani A. , K\u00fcttler H. , Agapiou J. , Schrittwieser J. , Quan J. , Gaffney S. , Petersen S. , Simonyan K. , Schaul T. , van Hasselt H. , Silver D. , Lillicrap T. P. , Calderone K. , Keet P. , Brunasso A. , Lawrence D. , Ekermo A. , Repp J. & Tsing R. 2017. Starcraft II: a new challenge for reinforcement learning. CoRR abs\/1708.04782."},{"key":"S0269888918000280_ref1","doi-asserted-by":"crossref","unstructured":"Barriga N. A. , Stanescu M. & Buro M. 2017. Combining strategic learning with tactical search in real-time strategy games. In AIIDE, 9\u201315. AAAI Press.","DOI":"10.1609\/aiide.v13i1.12922"},{"key":"S0269888918000280_ref19","unstructured":"Sharma M. , Holmes M. , Santamaria J. , Irani A. , Isbell C. & Ram A. 2007. Transfer learning in real-time strategy games using hybrid cbr\/rl. In Proceedings of the 20th International Joint Conference on Artifical Intelligence, IJCAI'07, 1041\u20131046. Morgan Kaufmann Publishers Inc."},{"key":"S0269888918000280_ref20","volume-title":"Introduction to Reinforcement Learning","author":"R.","year":"1998"},{"key":"S0269888918000280_ref13","unstructured":"Nair A. , Srinivasan P. , Blackwell S. , Alcicek C. , Fearon R. , Maria A. D. , Panneershelvam V. , Suleyman M. , Beattie C. , Petersen S. , Legg S. , Mnih V. , Kavukcuoglu K. & Silver D. 2015. Massively parallel methods for deep reinforcement learning. CoRR, abs\/1507.04296."},{"key":"S0269888918000280_ref9","unstructured":"Legenstein R. , Wilbert N. & Wiskott L. 2010. Reinforcement Learning on Slow Features of High-Dimensional Input Streams. PLoS Comput Biol 6, e1000894."},{"key":"S0269888918000280_ref15","doi-asserted-by":"crossref","unstructured":"Onta\u00f1\u00f3n S. 2013. The combinatorial multi-armed bandit problem and its application to real-time strategy games. In AIIDE, Sukthankar, G. & Horswill, I. (eds),. AAAI.","DOI":"10.1609\/aiide.v9i1.12681"},{"key":"S0269888918000280_ref17","doi-asserted-by":"publisher","DOI":"10.1002\/9780470316887"},{"key":"S0269888918000280_ref21","unstructured":"Synnaeve G. , Nardelli N. , Auvolat A. , Chintala S. , Lacroix T. , Lin Z. , Richoux F. & Usunier N. 2016. Torchcraft: a library for machine learning research on real-time strategy games. arXiv preprint arXiv:1611.00625."},{"key":"S0269888918000280_ref23","unstructured":"Tesauro G. 1995. Temporal difference learning and td-gammon. Communications of the ACM 380, 58\u201368."},{"key":"S0269888918000280_ref2","unstructured":"Bianchi R. A. , Celiberto L. A. Jr. , Santos P. E. , Matsuura J. P. & de Mantaras R. L. 2015. Transferring knowledge as heuristics in reinforcement learning: a case-based approach. Artificial Intelligence 226, 0 102-121."},{"key":"S0269888918000280_ref28","unstructured":"Watkins J. C. H. & Dayan P. 1992. Technical note: Q-learning. Machine Learning 8, 279\u2013292."},{"key":"S0269888918000280_ref10","doi-asserted-by":"crossref","unstructured":"Mataric M. J. 1994. Reward functions for accelerated learning. In Proceedings of the Eleventh International Conference on Machine Learning, 181\u2013189. Morgan Kaufmann.","DOI":"10.1016\/B978-1-55860-335-6.50030-1"},{"key":"S0269888918000280_ref25","doi-asserted-by":"crossref","unstructured":"Vincent P. , Larochelle H. , Bengio Y. & Manzagol P.-A. 2008. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, ICML \u201908, 1096\u20131103. ACM.","DOI":"10.1145\/1390156.1390294"},{"key":"S0269888918000280_ref30","doi-asserted-by":"crossref","unstructured":"Zhang C. & Lesser V. 2013. Coordinating multi-agent reinforcement learning with limited communication. In Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems, AAMAS \u201913, 1101\u20131108. International Foundation for Autonomous Agents and Multiagent Systems.","DOI":"10.65109\/ZASH3647"},{"key":"S0269888918000280_ref7","unstructured":"Kok J. R. & Vlassis N. 2006. Collaborative multiagent reinforcement learning by payoff propagation. Journal of Machine Learning Research 7, 1789\u20131828."}],"container-title":["The Knowledge Engineering Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S0269888918000280","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T08:46:32Z","timestamp":1775292392000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S0269888918000280\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018]]},"references-count":30,"alternative-id":["S0269888918000280"],"URL":"https:\/\/doi.org\/10.1017\/s0269888918000280","relation":{},"ISSN":["0269-8889","1469-8005"],"issn-type":[{"value":"0269-8889","type":"print"},{"value":"1469-8005","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018]]},"article-number":"e22"}}