{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,4]],"date-time":"2025-09-04T13:28:42Z","timestamp":1756992522746},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,7]]},"abstract":"<jats:p>Hierarchical reinforcement learning (HRL) is a promising approach to solving tasks with long time horizons and sparse rewards. It is often implemented as a high-level policy that assigns subgoals to a low-level policy. However, it suffers from the problem of high-level non-stationarity, since the low-level policy is constantly changing. This non-stationarity also leads to a data efficiency problem: policies need more data at non-stationary states to stabilize training. To address these issues, we propose a novel HRL method: Interactive Influence-based Hierarchical Reinforcement Learning (I^2HRL). First, inspired by agent modeling, we enable interaction between the low-level and high-level policies to stabilize high-level policy training. The high-level policy makes decisions conditioned on the received low-level policy representation as well as the state of the environment. Second, we further stabilize the high-level policy via an information-theoretic regularization with minimal dependence on the changing low-level policy. Third, we propose influence-based exploration to more frequently visit the non-stationary states where more transition data is needed. We experimentally validate the effectiveness of the proposed solution on several tasks in MuJoCo domains by demonstrating that our approach can significantly boost learning performance and accelerate learning compared with state-of-the-art HRL methods.<\/jats:p>","DOI":"10.24963\/ijcai.2020\/433","type":"proceedings-article","created":{"date-parts":[[2020,7,8]],"date-time":"2020-07-08T08:12:10Z","timestamp":1594195930000},"page":"3131-3138","source":"Crossref","is-referenced-by-count":6,"title":["I\u00b2HRL: Interactive Influence-based Hierarchical Reinforcement Learning"],"prefix":"10.24963","author":[{"given":"Rundong","family":"Wang","sequence":"first","affiliation":[{"name":"Nanyang Technological University"}]},{"given":"Runsheng","family":"Yu","sequence":"additional","affiliation":[{"name":"Nanyang Technological University"}]},{"given":"Bo","family":"An","sequence":"additional","affiliation":[{"name":"Nanyang Technological University"}]},{"given":"Zinovi","family":"Rabinovich","sequence":"additional","affiliation":[{"name":"Nanyang Technological University"}]}],"member":"10584","event":{"number":"28","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"acronym":"IJCAI-PRICAI-2020","name":"Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}","start":{"date-parts":[[2020,7,11]]},"theme":"Artificial Intelligence","location":"Yokohama, Japan","end":{"date-parts":[[2020,7,17]]}},"container-title":["Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2020,7,8]],"date-time":"2020-07-08T22:15:08Z","timestamp":1594246508000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2020\/433"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2020,7]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2020\/433","relation":{},"subject":[],"published":{"date-parts":[[2020,7]]}}}