{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:01:29Z","timestamp":1760058089801,"version":"build-2065373602"},"reference-count":34,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2025,3,9]],"date-time":"2025-03-09T00:00:00Z","timestamp":1741478400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>In recent years, deep reinforcement learning (DRL) has made significant progress in the field of games. A prime example is AlphaZero, which, despite the formidable capabilities showcased, deters many from exploring its potential because of its demands for substantial computational resources. In this paper, we introduce BoxesZero, a computationally frugal Dots-and-Boxes agent that can achieve a high level of performance using relatively fewer computational resources. BoxesZero utilizes a novel and insightful training approach called \u201cbackward training\u201d, which starts by training from high-reward states near the end of the game and gradually trains earlier stages of the game. It also incorporates the domain knowledge of Dots-and-Boxes, such as endgame theorems, to accelerate the Monte Carlo Tree Search (MCTS) process. Furthermore, we extend the existing endgame theorems (which only include long chains) to encompass scenarios with 1-chains and 2-chains, providing corresponding proofs, which we refer to as the extended endgame theorems. This novel agent, BoxesZero, can achieve a high level of playing strength much faster than AlphaZero, substantially improving the model\u2019s learning efficiency. With carefully tuned parameters and limited GPU resources, BoxesZero surpasses the strongest open-source Boxes agents, PRsboxes and DabbleBoxes. Experimental results demonstrate that BoxesZero achieves an ELO rating comparable to AlphaZero in significantly less time. Furthermore, BoxesZero won the championship in the Dots-and-Boxes category of the 2024 Chinese Computer Game Competition.<\/jats:p>","DOI":"10.3390\/e27030285","type":"journal-article","created":{"date-parts":[[2025,3,10]],"date-time":"2025-03-10T06:59:59Z","timestamp":1741589999000},"page":"285","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["BoxesZero: An Efficient and Computationally Frugal Dots-and-Boxes Agent"],"prefix":"10.3390","volume":"27","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-9149-6058","authenticated-orcid":false,"given":"Xuefen","family":"Niu","sequence":"first","affiliation":[{"name":"School of Computer and Communication Engineering, Northeastern University, Qinhuangdao 066004, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-4704-1276","authenticated-orcid":false,"given":"Qirui","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer and Communication Engineering, Northeastern University, Qinhuangdao 066004, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-4013-8318","authenticated-orcid":false,"given":"Wei","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Computer and Communication Engineering, Northeastern University, Qinhuangdao 066004, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yujiao","family":"Zheng","sequence":"additional","affiliation":[{"name":"School of Computer and Communication Engineering, Northeastern University, Qinhuangdao 066004, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhanggen","family":"Jin","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Northeastern University, Qinhuangdao 066004, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,3,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1007\/BF00992698","article-title":"Q-learning","volume":"8","author":"Watkins","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_2","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing atari with deep reinforcement learning. arXiv."},{"key":"ref_3","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv."},{"key":"ref_4","first-page":"1861","article-title":"Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor","volume":"80","author":"Haarnoja","year":"2018","journal-title":"Int. Conf. Mach. Learn."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1038\/nature16961","article-title":"Mastering the game of Go with deep neural networks and tree search","volume":"529","author":"Silver","year":"2016","journal-title":"Nature"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1038\/nature24270","article-title":"Mastering the Game of Go without Human Knowledge","volume":"550","author":"Silver","year":"2017","journal-title":"Nature"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"18157","DOI":"10.1109\/ACCESS.2023.3246638","article-title":"Analyses of Tabular AlphaZero on Strongly-Solved Stochastic Games","volume":"11","author":"Hsueh","year":"2023","journal-title":"IEEE Access"},{"key":"ref_8","unstructured":"Wu, D.J. (2019). Accelerating Self-Play Learning in Go. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"604","DOI":"10.1038\/s41586-020-03051-4","article-title":"Mastering Atari, Go, Chess and Shogi by planning with a learned model","volume":"588","author":"Schrittwieser","year":"2020","journal-title":"Nature"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Coulom, R. (2006). Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. International Conference on Computers and Games, Springer.","DOI":"10.1007\/978-3-540-75538-8_7"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"2497","DOI":"10.1007\/s10462-022-10228-y","article-title":"Monte Carlo Tree Search: A Review of Recent Modifications and Applications","volume":"56","author":"Godlewski","year":"2023","journal-title":"Artif. Intell. Rev."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"e2206625119","DOI":"10.1073\/pnas.2206625119","article-title":"Acquisition of chess knowledge in AlphaZero","volume":"119","author":"McGrath","year":"2022","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_13","unstructured":"Zahavy, T., Veeriah, V., Hou, S., Waugh, K., Lai, M., Leurent, E., Tomasev, N., Schut, L., Hassabis, D., and Singh, S. (2024). Diversifying AI: Towards Creative Chess with AlphaZero. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1140","DOI":"10.1126\/science.aar6404","article-title":"A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play","volume":"362","author":"Silver","year":"2018","journal-title":"Science"},{"key":"ref_15","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017). Attention is All You Need. Advances in Neural Information Processing Systems (NeurIPS), Morgan Kaufmann Publishers Inc."},{"key":"ref_16","unstructured":"Tjong, E., Huang, J., Zhang, K., Wang, W., and Li, H. (2021, January 5\u20136). A Transformer-based Mahjong AI via Hierarchical Decision-Making and Fan Backward. Proceedings of the 2021 CAAI Conference on Artificial Intelligence (CAAI), Hangzhou, China."},{"key":"ref_17","unstructured":"Badia, A.P., Piot, B., Kapturowski, S., Sprechmann, P., Vitvitskyi, A., Guo, Z.D., and Blundell, C. (2020, January 13\u201318). Agent57: Outperforming the Atari Human Benchmark. Proceedings of the 37th International Conference on Machine Learning, Virtual."},{"key":"ref_18","unstructured":"Fan, L., Wang, G., Jiang, Y., Mandlekar, A., Yang, Y., Zhu, H., Tang, A., Huang, D., Zhu, Y., and Anandkumar, A. (2022). Minedojo: Building open-ended embodied agents with internet-scale knowledge. arXiv."},{"key":"ref_19","first-page":"G8","article-title":"Playing Simple Loony Dots-and-Boxes Endgames Optimally","volume":"14","author":"Buzzard","year":"2013","journal-title":"Integers"},{"key":"ref_20","unstructured":"Barker, J., and Korf, R. (2012, January 22\u201326). Solving dots-and-boxes. Proceedings of the AAAI Conference on Artificial Intelligence, Toronto, ON, Canada."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1016\/0004-3702(75)90019-3","article-title":"An analysis of alpha-beta pruning","volume":"6","author":"Knuth","year":"1975","journal-title":"Artif. Intell."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1181","DOI":"10.1613\/jair.1.14390","article-title":"An Overview of Environmental Features that Impact Deep Reinforcement Learning in Sparse-Reward Domains","volume":"76","author":"Capobianco","year":"2023","journal-title":"J. Artif. Intell. Res."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1007\/s00182-020-00730-4","article-title":"Best Play in Dots and Boxes Endgames","volume":"50","author":"Allcock","year":"2021","journal-title":"Int. J. Game Theory"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.inffus.2022.03.003","article-title":"Exploration in deep reinforcement learning: A survey","volume":"85","author":"Ladosz","year":"2022","journal-title":"Inf. Fusion"},{"key":"ref_25","unstructured":"Cazenave, T. (2007, January 5\u20139). EMCTS: Enhanced Monte Carlo Tree Search. Proceedings of the Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), Snowbird, UT, USA."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1016\/0004-3702(87)90004-X","article-title":"Game tree searching by min\/max approximation","volume":"34","author":"Rivest","year":"1987","journal-title":"Artif. Intell."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1016\/0004-3702(94)90004-3","article-title":"Proof-number search","volume":"66","author":"Allis","year":"1994","journal-title":"Artif. Intell."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Qi, Z., Huang, X., Shen, Y., and Shi, J. (2020, January 22\u201324). Optimization of connect6 based on principal variation search and transposition tables algorithms. Proceedings of the 2020 Chinese Control And Decision Conference (CCDC), Hefei, China.","DOI":"10.1109\/CCDC49329.2020.9163922"},{"key":"ref_29","unstructured":"Cazenave, T. (2015, January 25\u201331). Generalized rapid action value estimation. Proceedings of the 24th International Conference on Artificial Intelligence, Buenos Aires, Argentina."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Kocsis, L., and Szepesv\u00e1ri, C. (2006). Bandit based monte-carlo planning. European Conference on Machine Learning, Springer.","DOI":"10.1007\/11871842_29"},{"key":"ref_31","unstructured":"Gelly, S., and Wang, Y. (2006). Exploration exploitation in go: UCT for Monte-Carlo go. NIPS: Neural Information Processing Systems Conference On-line trading of Exploration and Exploitation Workshop, HAL."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015, January 7\u201313). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.123"},{"key":"ref_33","unstructured":"Loshchilov, I., and Hutter, F. (2016). Sgdr: Stochastic gradient descent with warm restarts. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Coulom, R. (2008, January 29). Whole-history rating: A Bayesian rating system for players of time-varying strength. Proceedings of the International Conference on Computers and Games, Beijing, China.","DOI":"10.1007\/978-3-540-87608-3_11"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/27\/3\/285\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:49:44Z","timestamp":1760028584000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/27\/3\/285"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,9]]},"references-count":34,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,3]]}},"alternative-id":["e27030285"],"URL":"https:\/\/doi.org\/10.3390\/e27030285","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2025,3,9]]}}}