{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,20]],"date-time":"2026-02-20T18:06:14Z","timestamp":1771610774089,"version":"3.50.1"},"reference-count":23,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2020,5,2]],"date-time":"2020-05-02T00:00:00Z","timestamp":1588377600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61603406"],"award-info":[{"award-number":["61603406"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61806212"],"award-info":[{"award-number":["61806212"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61603403"],"award-info":[{"award-number":["61603403"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61502516"],"award-info":[{"award-number":["61502516"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>The Monte Carlo Tree Search (MCTS) has demonstrated excellent performance in solving many planning problems. However, the state space and the branching factors are huge, and the planning horizon is long in many practical applications, especially in the adversarial environment. 
It is computationally expensive to cover a sufficient number of rewarded states that are far away from the root in the flat non-hierarchical MCTS. Therefore, the flat non-hierarchical MCTS is inefficient for dealing with planning problems with a long planning horizon, huge state space, and branching factors. In this work, we propose a novel hierarchical MCTS-based online planning method named the HMCTS-OP to tackle this issue. The HMCTS-OP integrates the MAXQ-based task hierarchies and the hierarchical MCTS algorithms into the online planning framework. Specifically, the MAXQ-based task hierarchies reduce the search space and guide the search process. Therefore, the computational complexity is significantly reduced. Moreover, the reduction in the computational complexity enables the MCTS to perform a deeper search to find better action in a limited time. We evaluate the performance of the HMCTS-OP in the domain of online planning in the asymmetric adversarial environment. The experiment results show that the HMCTS-OP outperforms other online planning methods in this domain.<\/jats:p>","DOI":"10.3390\/sym12050719","type":"journal-article","created":{"date-parts":[[2020,5,5]],"date-time":"2020-05-05T06:41:20Z","timestamp":1588660880000},"page":"719","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["HMCTS-OP: Hierarchical MCTS Based Online Planning in the Asymmetric Adversarial Environment"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1775-1831","authenticated-orcid":false,"given":"Lina","family":"Lu","sequence":"first","affiliation":[{"name":"College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, Hunan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wanpeng","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Intelligence Science and Technology, National 
University of Defense Technology, Changsha 410073, Hunan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xueqiang","family":"Gu","sequence":"additional","affiliation":[{"name":"College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, Hunan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiang","family":"Ji","sequence":"additional","affiliation":[{"name":"College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, Hunan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jing","family":"Chen","sequence":"additional","affiliation":[{"name":"College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, Hunan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2020,5,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Vien, N.A., and Toussaint, M. (2015, January 25\u201330). Hierarchical Monte-Carlo Planning. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA.","DOI":"10.1609\/aaai.v29i1.9687"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Hostetler, J., Fern, A., and Dietterich, T. (2014, January 27\u201331). State Aggregation in Monte Carlo Tree Search. Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Qu\u00e9bec City, QC, Canada.","DOI":"10.1609\/aaai.v28i1.9066"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"He, R., Brunskill, E., and Roy, N. (2010, January 11\u201315). PUMA: Planning under Uncertainty with Macro-Actions. 
Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, GA, USA.","DOI":"10.1609\/aaai.v24i1.7749"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1613\/jair.639","article-title":"Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition","volume":"13","author":"Dietterich","year":"2000","journal-title":"J. Artif. Intell. Res."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Puterman, M.L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons.","DOI":"10.1002\/9780470316887"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TCIAIG.2012.2186810","article-title":"A survey of monte carlo tree search methods","volume":"4","author":"Browne","year":"2012","journal-title":"IEEE Trans. Comput. Intell. AI Games"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Kocsis, L., and Szepesv\u00e1ri, C. (2006). Bandit Based Monte-Carlo Planning. European Conference on Machine Learning, Springer.","DOI":"10.1007\/11871842_29"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1023\/A:1013689704352","article-title":"Finite-time analysis of the multiarmed bandit problem","volume":"47","author":"Auer","year":"2002","journal-title":"Mach. Learn."},{"key":"ref_9","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, MIT Press. [2nd ed.]."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2501654.2501658","article-title":"Exploration and exploitation in evolutionary algorithms: A survey","volume":"45","author":"Liu","year":"2013","journal-title":"ACM Comput. Surv."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Li, Z., Narayan, A., and Leong, T. (2017, January 4\u201310). An Efficient Approach to Model-Based Hierarchical Reinforcement Learning. 
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.11024"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"2719","DOI":"10.1109\/TCYB.2014.2314294","article-title":"A learning-based semi-autonomous controller for robotic exploration of unknown disaster scenes while searching for victims","volume":"44","author":"Doroodgar","year":"2014","journal-title":"IEEE Trans. Cybern."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1569","DOI":"10.1007\/s10994-017-5650-8","article-title":"Offline reinforcement learning with task hierarchies","volume":"106","author":"Schwab","year":"2017","journal-title":"Mach. Learn."},{"key":"ref_14","unstructured":"Le, H.M., Jiang, N., Agarwal, A., Dud\u00edk, M., Yue, Y., and Daum\u00e9, H. (2018, January 10\u201315). Hierarchical imitation and reinforcement learning. Proceedings of the ICML 2018: The 35th International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2717316","article-title":"Online Planning for Large Markov Decision Processes with Hierarchical Decomposition","volume":"6","author":"Bai","year":"2015","journal-title":"ACM Trans. Intell. Syst. Technol."},{"key":"ref_16","unstructured":"Bai, A., Srivastava, S., and Russell, S. (2016, January 9\u201315). Markovian state and action abstractions for MDPs via hierarchical MCTS. Proceedings of the IJCAI: International Joint Conference on Artificial Intelligence, New York, NY, USA."},{"key":"ref_17","unstructured":"Menashe, J., and Stone, P. (2015, January 4\u20138). Monte Carlo hierarchical model learning. Proceedings of the fourteenth International Conference on Autonomous Agents and Multiagent Systems, Istanbul, Turkey."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Sironi, C.F., Liu, J., Perez-Liebana, D., and Gaina, R.D. (2018). 
Self-Adaptive MCTS for General Video Game Playing. Proceedings of the International Conference on the Applications of Evolutionary Computation, Springer.","DOI":"10.1007\/978-3-319-77538-8_25"},{"key":"ref_19","unstructured":"Neufeld, X., Mostaghim, S., and Perez-Liebana, D. (2019, January 23\u201324). A Hybrid Planning and Execution Approach Through HTN and MCTS. Proceedings of the IntEx Workshop at ICAPS-2019, London, UK."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Onta\u00f1\u00f3n, S. (2017, January 22\u201325). Informed Monte Carlo Tree Search for Real-Time Strategy games. Proceedings of the IEEE Conference on Computational Intelligence in Games CIG, New York, NY, USA.","DOI":"10.1109\/CIG.2016.7860394"},{"key":"ref_21","unstructured":"Onta\u00f1\u00f3n, S. (2013, January 14\u201315). The combinatorial Multi-armed Bandit problem and its application to real-time strategy games. Proceedings of the Ninth Artificial Intelligence and Interactive Digital Entertainment Conference AIIDE, Boston, MA, USA."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1016\/0169-2607(86)90081-7","article-title":"Kruskal-Wallis test: BASIC computer program to perform nonparametric one-way analysis of variance and multiple comparisons on ranks of several independent samples","volume":"23","year":"1986","journal-title":"Comput. Methods Programs Biomed."},{"key":"ref_23","unstructured":"Morris, H., and Degroot, M.J.S. (2011). Probability and Statistics, Addison Wesley. 
[4th ed.]."}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/12\/5\/719\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T14:09:50Z","timestamp":1760364590000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/12\/5\/719"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,5,2]]},"references-count":23,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2020,5]]}},"alternative-id":["sym12050719"],"URL":"https:\/\/doi.org\/10.3390\/sym12050719","relation":{},"ISSN":["2073-8994"],"issn-type":[{"value":"2073-8994","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,5,2]]}}}