{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T23:35:23Z","timestamp":1780443323565,"version":"3.54.1"},"reference-count":103,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2021,6,5]],"date-time":"2021-06-05T00:00:00Z","timestamp":1622851200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier-1","award":["19-C220-SMU-023"],"award-info":[{"award-number":["19-C220-SMU-023"]}]},{"name":"National Research Foundation, Singapore under its AI Singapore Programme AISG","award":["AISG2-RP-2020-019"],"award-info":[{"award-number":["AISG2-RP-2020-019"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2022,6,30]]},"abstract":"<jats:p>Hierarchical Reinforcement Learning (HRL) enables autonomous decomposition of challenging long-horizon decision-making tasks into simpler subtasks. During the past years, the landscape of HRL research has grown profoundly, resulting in copious approaches. A comprehensive overview of this vast landscape is necessary to study HRL in an organized manner. We provide a survey of the diverse HRL approaches concerning the challenges of learning hierarchical policies, subtask discovery, transfer learning, and multi-agent learning using HRL. The survey is presented according to a novel taxonomy of the approaches. Based on the survey, a set of important open problems is proposed to motivate the future research in HRL. 
Furthermore, we outline a few suitable task domains for evaluating the HRL approaches and a few interesting examples of the practical applications of HRL in the Supplementary Material.<\/jats:p>","DOI":"10.1145\/3453160","type":"journal-article","created":{"date-parts":[[2021,6,5]],"date-time":"2021-06-05T16:13:25Z","timestamp":1622909605000},"page":"1-35","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":411,"title":["Hierarchical Reinforcement Learning"],"prefix":"10.1145","volume":"54","author":[{"given":"Shubham","family":"Pateria","sequence":"first","affiliation":[{"name":"Nanyang Technological University, Nanyang Avenue, Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Budhitama","family":"Subagdja","sequence":"additional","affiliation":[{"name":"Singapore Management University, Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ah-hwee","family":"Tan","sequence":"additional","affiliation":[{"name":"Singapore Management University, Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chai","family":"Quek","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Nanyang Avenue, Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2021,6,5]]},"reference":[{"key":"e_1_2_2_1_1","volume-title":"Variational option discovery algorithms. arxiv:1807.10299","author":"Achiam Joshua","year":"2018","unstructured":"Joshua Achiam, Harrison Edwards, Dario Amodei, and Pieter Abbeel. 2018. Variational option discovery algorithms. arxiv:1807.10299 (2018)."},{"key":"e_1_2_2_2_1","volume-title":"Feudal multi-agent hierarchies for cooperative reinforcement learning. 
arxiv:1901.08492","author":"Ahilan Sanjeevan","year":"2019","unstructured":"Sanjeevan Ahilan and Peter Dayan. 2019. Feudal multi-agent hierarchies for cooperative reinforcement learning. arxiv:1901.08492 (2019)."},{"key":"e_1_2_2_3_1","volume-title":"Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI\u201917)","author":"Bacon Pierre-Luc","year":"2017","unstructured":"Pierre-Luc Bacon, Jean Harb, and Doina Precup. 2017. The option-critic architecture. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI\u201917). AAAI Press, 1726\u20131734."},{"key":"e_1_2_2_4_1","volume-title":"Proceedings of the 8th International Conference on Learning Representations.","author":"Bagaria Akhil","year":"2020","unstructured":"Akhil Bagaria and George Konidaris. 2020. Option discovery using deep skill chaining. In Proceedings of the 8th International Conference on Learning Representations."},{"key":"e_1_2_2_5_1","volume-title":"Proceedings of the IASTED International Conference on Neural Networks and Computational Intelligence. IASTED\/ACTA Press, 125\u2013130","author":"Bakker Bram","year":"2004","unstructured":"Bram Bakker and J\u00fcrgen Schmidhuber. 2004. 
Hierarchical reinforcement learning with subpolicies specializing for learned subgoals. In Proceedings of the IASTED International Conference on Neural Networks and Computational Intelligence. IASTED\/ACTA Press, 125\u2013130."},{"key":"e_1_2_2_6_1","volume-title":"Advances in Neural Information Processing Systems","volume":"32","author":"Barreto Andre","year":"2019","unstructured":"Andre Barreto, Diana Borsa, Shaobo Hou, Gheorghe Comanici, Eser Ayg\u00fcn, Philippe Hamel, Daniel Toyama, Jonathan Hunt, Shibl Mourad, David Silver, and Doina Precup. 2019. The option keyboard: Combining skills in reinforcement learning. In Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc., 13052\u201313062."},{"key":"e_1_2_2_7_1","first-page":"41","article-title":"Recent advances in hierarchical reinforcement learning","volume":"13","author":"Barto Andrew G.","year":"2003","unstructured":"Andrew G. Barto and Sridhar Mahadevan. 2003. Recent advances in hierarchical reinforcement learning. Discr. Event Dyn. Syst. 13, 1-2 (2003), 41\u201377. DOI:https:\/\/doi.org\/10.1023\/A:1025696116075","journal-title":"Discr. Event Dyn. Syst."},{"key":"e_1_2_2_8_1","volume-title":"A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Statist. 41, 1 (02","author":"Baum Leonard E.","year":"1970","unstructured":"Leonard E. 
Baum, Ted Petrie, George Soules, and Norman Weiss. 1970. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Statist. 41, 1 (Feb. 1970), 164\u2013171. DOI:https:\/\/doi.org\/10.1214\/aoms\/1177697196"},{"key":"e_1_2_2_9_1","volume-title":"Wiley Encyclopedia of Operations Research and Management Science (2010). DOI:https:\/\/doi.org\/10.1002\/9780470400531.eorms0757","author":"Baykal-G\u00fcrsoy Melike","unstructured":"Melike Baykal-G\u00fcrsoy. 2010. Semi-Markov decision processes. Wiley Encyclopedia of Operations Research and Management Science (2010). DOI:https:\/\/doi.org\/10.1002\/9780470400531.eorms0757"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157096.3157262"},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/321105.321111"},{"key":"e_1_2_2_12_1","volume-title":"The theory of dynamic programming. Bull. Amer. Math. Soc. 60, 6 (11","author":"Bellman Richard","year":"1954","unstructured":"Richard Bellman. 1954. The theory of dynamic programming. Bull. Amer. Math. Soc. 60, 6 (Nov. 1954), 503\u2013515."},{"key":"e_1_2_2_13_1","volume-title":"Elements of Robotics","author":"Ben-Ari Mordechai","unstructured":"Mordechai Ben-Ari and Francesco Mondada. 2018. Finite state machines. In Elements of Robotics. Springer, 55\u201361. 
DOI:https:\/\/doi.org\/10.1007\/978-3-319-62533-1_4"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/3398761.3398984"},{"key":"e_1_2_2_15_1","doi-asserted-by":"crossref","unstructured":"Z. Chen and B. Liu. 2018. Lifelong Machine Learning. Vol. 12. Morgan & Claypool Publishers. 1\u2013207 pages. DOI:https:\/\/doi.org\/10.2200\/S00737ED1V01Y201610AIM033","DOI":"10.2200\/S00737ED1V01Y201610AIM033"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-4012"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-016-5580-x"},{"key":"e_1_2_2_18_1","first-page":"225","article-title":"On the max flow min cut theorem of networks","volume":"38","author":"Dantzig G.","year":"2003","unstructured":"G. Dantzig and Delbert Ray Fulkerson. 2003. On the max flow min cut theorem of networks. Lin. Ineq. Relat. Syst. 38 (2003), 225\u2013231.","journal-title":"Lin. Ineq. Relat. Syst."},{"key":"e_1_2_2_19_1","volume-title":"Advances in Neural Information Processing Systems","author":"Dayan Peter","year":"1993","unstructured":"Peter Dayan and Geoffrey E. Hinton. 1993. Feudal reinforcement learning. In Advances in Neural Information Processing Systems, Vol. 5. 
Morgan-Kaufmann, 271\u2013278."},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.639"},{"key":"e_1_2_2_21_1","article-title":"Hierarchical reinforcement learning: A survey","volume":"4","author":"Al-Emran Mostafa","year":"2015","unstructured":"Mostafa Al-Emran. 2015. Hierarchical reinforcement learning: A survey. Int. J. Comput. Dig. Syst. 4, 02 (2015). DOI:https:\/\/doi.org\/10.12785\/IJCDS\/040207","journal-title":"Int. J. Comput. Dig. Syst."},{"key":"e_1_2_2_22_1","volume-title":"Why does unsupervised pre-training help deep learning? J. Mach. Learn. Res. 11 (Mar","author":"Erhan Dumitru","year":"2010","unstructured":"Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, and Samy Bengio. 2010. Why does unsupervised pre-training help deep learning? J. Mach. Learn. Res. 11 (Mar. 2010), 625\u2013660."},{"key":"e_1_2_2_23_1","volume-title":"Advances in Neural Information Processing Systems","volume":"32","author":"Eysenbach Ben","year":"2019","unstructured":"Ben Eysenbach, Russ R. Salakhutdinov, and Sergey Levine. 2019. Search on the replay buffer: Bridging planning and reinforcement learning. In Advances in Neural Information Processing Systems, Vol. 32. 
Curran Associates, Inc., 15246\u201315257."},{"key":"e_1_2_2_24_1","volume-title":"Diversity is all you need: Learning skills without a reward function. arxiv:1802.06070","author":"Eysenbach Benjamin","year":"2018","unstructured":"Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, and Sergey Levine. 2018. Diversity is all you need: Learning skills without a reward function. arxiv:1802.06070 (2018)."},{"key":"e_1_2_2_25_1","volume-title":"Stochastic neural networks for hierarchical reinforcement learning. arxiv:1704.03012","author":"Florensa Carlos","year":"2017","unstructured":"Carlos Florensa, Yan Duan, and Pieter Abbeel. 2017. Stochastic neural networks for hierarchical reinforcement learning. arxiv:1704.03012 (2017)."},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157096.3157336"},{"key":"e_1_2_2_27_1","volume-title":"Counterfactual multi-agent policy gradients. arxiv:1705.08926","author":"Foerster Jakob N.","year":"2017","unstructured":"Jakob N. Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. 2017. Counterfactual multi-agent policy gradients. arxiv:1705.08926 (2017)."},{"key":"e_1_2_2_28_1","volume-title":"Multi-level discovery of deep options. arxiv:1703.08294","author":"Fox Roy","year":"2017","unstructured":"Roy Fox, Sanjay Krishnan, Ion Stoica, and Ken Goldberg. 2017. 
Multi-level discovery of deep options. arxiv:1703.08294 (2017)."},{"key":"e_1_2_2_29_1","volume-title":"Meta learning shared hierarchies. arxiv:1710.09767","author":"Frans Kevin","year":"2017","unstructured":"Kevin Frans, Jonathan Ho, Xi Chen, Pieter Abbeel, and John Schulman. 2017. Meta learning shared hierarchies. arxiv:1710.09767 (2017)."},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10458-006-7035-4"},{"key":"e_1_2_2_31_1","volume-title":"Variational intrinsic control. arxiv:1611.07507","author":"Gregor Karol","year":"2016","unstructured":"Karol Gregor, Danilo Jimenez Rezende, and Daan Wierstra. 2016. Variational intrinsic control. arxiv:1611.07507 (2016)."},{"key":"e_1_2_2_32_1","volume-title":"Deep reinforcement learning for robotic manipulation. CoRR abs\/1610.00633","author":"Gu Shixiang","year":"2016","unstructured":"Shixiang Gu, Ethan Holly, Timothy P. Lillicrap, and Sergey Levine. 2016. Deep reinforcement learning for robotic manipulation. CoRR abs\/1610.00633 (2016)."},{"key":"e_1_2_2_33_1","volume-title":"Proceedings of the Conference on Robot Learning (Proceedings of Machine Learning Research)","volume":"100","author":"Gupta Abhishek","year":"2020","unstructured":"
Abhishek Gupta, Vikash Kumar, Corey Lynch, Sergey Levine, and Karol Hausman. 2020. Relay policy learning: Solving long-horizon tasks via imitation and reinforcement learning. In Proceedings of the Conference on Robot Learning (Proceedings of Machine Learning Research), Vol. 100. PMLR, 1025\u20131037."},{"key":"e_1_2_2_34_1","volume-title":"Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research)","volume":"80","author":"Haarnoja Tuomas","year":"2018","unstructured":"Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. 2018. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the 35th International Conference on Machine Learning (Proceedings of Machine Learning Research), Vol. 80. PMLR, 1861\u20131870."},{"key":"e_1_2_2_35_1","unstructured":"Tuomas Haarnoja, Aurick Zhou, Sehoon Ha, Jie Tan, George Tucker, and Sergey Levine. 2018. Learning to walk via deep reinforcement learning. arxiv:1812.11103 (2018)."},{"key":"e_1_2_2_36_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"32","author":"Harb Jean","year":"2018","unstructured":"Jean Harb, Pierre-Luc Bacon, Martin Klissarov, and Doina Precup. 2018. 
When waiting is not an option: Learning options with a deliberation cost. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32."},{"key":"e_1_2_2_37_1","volume-title":"Proceedings of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research)","volume":"119","author":"Hasenclever Leonard","year":"2020","unstructured":"Leonard Hasenclever, Fabio Pardo, Raia Hadsell, Nicolas Heess, and Josh Merel. 2020. CoMic: Complementary task learning & mimicry for reusable skills. In Proceedings of the 37th International Conference on Machine Learning (Proceedings of Machine Learning Research), Vol. 119. PMLR, 4105\u20134115."},{"key":"e_1_2_2_38_1","volume-title":"Proceedings of the 6th International Conference on Learning Representations.","author":"Hausman Karol","unstructured":"Karol Hausman, Jost Tobias Springenberg, Ziyu Wang, Nicolas Heess, and Martin A. Riedmiller. 2018. Learning an embedding space for transferable robot skills. In Proceedings of the 6th International Conference on Learning Representations."},{"key":"e_1_2_2_39_1","volume-title":"Encyclopedia of Machine Learning","author":"Hengst Bernhard","unstructured":"Bernhard Hengst. 2010. Hierarchical reinforcement learning. 
In Encyclopedia of Machine Learning. Springer US, Boston, MA, 495\u2013502. DOI:https:\/\/doi.org\/10.1007\/978-0-387-30164-8_363"},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_2_2_41_1","volume-title":"Advances in Neural Information Processing Systems","volume":"32","author":"Jiang YiDing","year":"2019","unstructured":"YiDing Jiang, Shixiang (Shane) Gu, Kevin P. Murphy, and Chelsea Finn. 2019. Language as an abstraction for hierarchical deep reinforcement learning. In Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc., 9419\u20139431."},{"key":"e_1_2_2_42_1","volume-title":"CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. arxiv:1612.06890","author":"Johnson Justin","year":"2016","unstructured":"Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Li Fei-Fei, C. Lawrence Zitnick, and Ross B. Girshick. 2016. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. arxiv:1612.06890 (2016)."},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.5555\/1402383.1402429"},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.5555\/1643275.1643301"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.5871"},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33019955"},{"key":"e_1_2_2_47_1","volume-title":"Proceedings of the 2nd International Conference on Learning Representations.","author":"Kingma Diederik P.","unstructured":"Diederik P. 
Kingma and Max Welling. 2014. Auto-encoding variational Bayes. In Proceedings of the 2nd International Conference on Learning Representations."},{"key":"e_1_2_2_48_1","volume-title":"Learnings options end-to-end for continuous action tasks. CoRR abs\/1712.00004","author":"Klissarov Martin","year":"2017","unstructured":"Martin Klissarov, Pierre-Luc Bacon, Jean Harb, and Doina Precup. 2017. Learnings options end-to-end for continuous action tasks. CoRR abs\/1712.00004 (2017)."},{"key":"e_1_2_2_49_1","volume-title":"Proceedings of the 20th International Joint Conference on Artifical Intelligence (IJCAI\u201907)","author":"Konidaris George","year":"2007","unstructured":"George Konidaris and Andrew Barto. 2007. Building portable options: Skill transfer in reinforcement learning. In Proceedings of the 20th International Joint Conference on Artifical Intelligence (IJCAI\u201907). Morgan Kaufmann Publishers Inc., San Francisco, CA, 895\u2013900."},{"key":"e_1_2_2_50_1","volume-title":"Proceedings of the 22nd International Conference on Neural Information Processing Systems (NIPS\u201909)","author":"Konidaris George","year":"2009","unstructured":"George Konidaris and Andrew Barto. 2009. Skill discovery in continuous reinforcement learning domains using skill chaining. In Proceedings of the 22nd International Conference on Neural Information Processing Systems (NIPS\u201909). 
Curran Associates Inc., Red Hook, NY, 1015\u20131023."},{"key":"e_1_2_2_51_1","volume-title":"Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS\u201916)","author":"Kulkarni Tejas D.","unstructured":"Tejas D. Kulkarni, Karthik R. Narasimhan, Ardavan Saeedi, and Joshua B. Tenenbaum. 2016. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. In Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS\u201916). Curran Associates Inc., Red Hook, NY, 3682\u20133690."},{"key":"e_1_2_2_52_1","volume-title":"Deep successor reinforcement learning. arxiv:1606.02396","author":"Kulkarni Tejas D.","year":"2016","unstructured":"Tejas D. Kulkarni, Ardavan Saeedi, Simanta Gautam, and Samuel J. Gershman. 2016. Deep successor reinforcement learning. arxiv:1606.02396 (2016)."},{"key":"e_1_2_2_53_1","volume-title":"Reinforcement Learning","author":"Lazaric Alessandro","unstructured":"Alessandro Lazaric. 2012. 
Transfer in reinforcement learning: A framework and a survey. In Reinforcement Learning. Springer, 143\u2013173. DOI:https:\/\/doi.org\/10.1007\/978-3-642-27645-3_5"},{"key":"e_1_2_2_54_1","volume-title":"Proceedings of the 7th International Conference on Learning Representations.","author":"Levy Andrew","year":"2019","unstructured":"Andrew Levy, George Dimitri Konidaris, Robert Platt Jr., and Kate Saenko. 2019. Learning multi-level hierarchies with hindsight. In Proceedings of the 7th International Conference on Learning Representations."},{"key":"e_1_2_2_55_1","volume-title":"Proceedings of the 9th European Conference on Recent Advances in Reinforcement Learning (EWRL\u201911)","author":"Levy Kfir Y.","unstructured":"Kfir Y. Levy and Nahum Shimkin. 2011. Unified inter and intra options learning using policy gradient methods. In Proceedings of the 9th European Conference on Recent Advances in Reinforcement Learning (EWRL\u201911). Springer-Verlag, Berlin, 153\u2013164. DOI:https:\/\/doi.org\/10.1007\/978-3-642-29946-9_17"},{"key":"e_1_2_2_56_1","volume-title":"Proceedings of the 4th International Conference on Learning Representations.","author":"Lillicrap Timothy P.","year":"2016","unstructured":"Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2016. Continuous control with deep reinforcement learning. 
In Proceedings of the 4th International Conference on Learning Representations."},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2000.859416"},{"key":"e_1_2_2_58_1","volume-title":"Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI\u201916)","author":"Liu Miao","unstructured":"Miao Liu, Christopher Amato, Emily P. Anesta, J. Daniel Griffith, and Jonathan P. How. 2016. Learning for decentralized control of multiagent systems in large, partially-observable stochastic environments. In Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI\u201916). AAAI Press, 2523\u20132529. Retrieved from https:\/\/dl.acm.org\/doi\/10.5555\/3016100.3016253."},{"key":"e_1_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295385"},{"key":"e_1_2_2_60_1","volume-title":"Proceedings of the 34th International Conference on Machine Learning (ICML\u201917)","author":"Machado Marlos C.","year":"2017","unstructured":"Marlos C. Machado, Marc G. Bellemare, and Michael Bowling. 2017. 
A Laplacian framework for option discovery in reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning (ICML\u201917). JMLR.org, 2295\u20132304."},{"key":"e_1_2_2_61_1","volume-title":"Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes. J. Mach. Learn. Res. 8 (Dec","author":"Mahadevan Sridhar","year":"2007","unstructured":"Sridhar Mahadevan and Mauro Maggioni. 2007. Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes. J. Mach. Learn. Res. 8 (Dec. 2007), 2169\u20132231."},{"key":"e_1_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/375735.376302"},{"key":"e_1_2_2_63_1","volume-title":"Advances in Neural Information Processing Systems","author":"Maron Oded","unstructured":"Oded Maron and Tom\u00e1s Lozano-P\u00e9rez. 1998. A framework for multiple-instance learning. In Advances in Neural Information Processing Systems, Vol. 10. The MIT Press, 570\u2013576."},{"key":"e_1_2_2_64_1","volume-title":"Proceedings of the 18th International Conference on Machine Learning (ICML\u201901)","author":"McGovern Amy","unstructured":"Amy McGovern and Andrew G. Barto. 2001. 
Automatic discovery of subgoals in reinforcement learning using diverse density. In Proceedings of the 18th International Conference on Machine Learning (ICML\u201901). Morgan Kaufmann Publishers Inc., San Francisco, CA, 361\u2013368."},{"key":"e_1_2_2_66_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-36755-1_25"},{"key":"e_1_2_2_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3291045"},{"key":"e_1_2_2_68_1","volume-title":"Georg Ostrovski et\u00a0al","author":"Mnih Volodymyr","year":"2015","unstructured":"Volodymyr Mnih , Koray Kavukcuoglu , David Silver , Andrei A. Rusu , Joel Veness , Marc G. Bellemare , Alex Graves , Martin Riedmiller , Andreas K. Fidjeland , Georg Ostrovski et\u00a0al . 2015 . Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529\u2013533. DOI:https:\/\/doi.org\/10.1038\/nature14236 Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski et\u00a0al. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529\u2013533. DOI:https:\/\/doi.org\/10.1038\/nature14236"},{"key":"e_1_2_2_69_1","doi-asserted-by":"publisher","DOI":"10.5555\/3327144.3327250"},{"key":"e_1_2_2_70_1","volume-title":"Near-optimal representation learning for hierarchical reinforcement learning. arxiv:1810.01257","author":"Nachum Ofir","year":"2018","unstructured":"Ofir Nachum , Shixiang Gu , Honglak Lee , and Sergey Levine . 2018. Near-optimal representation learning for hierarchical reinforcement learning. arxiv:1810.01257 ( 2018 ). Ofir Nachum, Shixiang Gu, Honglak Lee, and Sergey Levine. 2018. Near-optimal representation learning for hierarchical reinforcement learning. arxiv:1810.01257 (2018)."},{"key":"e_1_2_2_71_1","volume-title":"Why does hierarchy (sometimes) work so well in reinforcement learning? 
arxiv:1909.10618","author":"Nachum Ofir","year":"2019","unstructured":"Ofir Nachum , Haoran Tang , Xingyu Lu , Shixiang Gu , Honglak Lee , and Sergey Levine . 2019. Why does hierarchy (sometimes) work so well in reinforcement learning? arxiv:1909.10618 ( 2019 ). Ofir Nachum, Haoran Tang, Xingyu Lu, Shixiang Gu, Honglak Lee, and Sergey Levine. 2019. Why does hierarchy (sometimes) work so well in reinforcement learning? arxiv:1909.10618 (2019)."},{"key":"e_1_2_2_72_1","volume-title":"Proceedings of the IEEE International Conference on Robotics and Automation (ICRA\u201915)","author":"Omidshafiei S.","year":"2015","unstructured":"S. Omidshafiei , A. Agha-mohammadi, C. Amato , and J. P. How . 2015. Decentralized control of partially observable Markov decision processes using belief space macro-actions . In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA\u201915) . 5962\u20135969. DOI:https:\/\/doi.org\/10.1109\/ICRA. 2015 .7140035 S. Omidshafiei, A. Agha-mohammadi, C. Amato, and J. P. How. 2015. Decentralized control of partially observable Markov decision processes using belief space macro-actions. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA\u201915). 5962\u20135969. DOI:https:\/\/doi.org\/10.1109\/ICRA.2015.7140035"},{"key":"e_1_2_2_73_1","volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS\u201997)","author":"Parr Ronald","year":"1998","unstructured":"Ronald Parr and Stuart Russell . 1998 . Reinforcement learning with hierarchies of machines . In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS\u201997) . The MIT Press, Cambridge, MA, 1043\u20131049. Ronald Parr and Stuart Russell. 1998. Reinforcement learning with hierarchies of machines. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS\u201997). 
The MIT Press, Cambridge, MA, 1043\u20131049."},{"key":"e_1_2_2_74_1","doi-asserted-by":"publisher","DOI":"10.1109\/SSCI44817.2019.9002777"},{"key":"e_1_2_2_75_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2017.70"},{"key":"e_1_2_2_76_1","volume-title":"Gregory Farquhar, Jakob N. Foerster, and Shimon Whiteson.","author":"Rashid Tabish","year":"2018","unstructured":"Tabish Rashid , Mikayel Samvelyan , Christian Schr\u00f6der de Witt , Gregory Farquhar, Jakob N. Foerster, and Shimon Whiteson. 2018 . QMIX : Monotonic value function factorisation for deep multi-agent reinforcement learning. arxiv:1803.11485 (2018). Tabish Rashid, Mikayel Samvelyan, Christian Schr\u00f6der de Witt, Gregory Farquhar, Jakob N. Foerster, and Shimon Whiteson. 2018. QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. arxiv:1803.11485 (2018)."},{"key":"e_1_2_2_77_1","volume-title":"Self-consistent trajectory autoencoder: Hierarchical reinforcement learning with trajectory embeddings. arxiv:1806.02813","author":"Co-Reyes John D.","year":"2018","unstructured":"John D. Co-Reyes , Yuxuan Liu , Abhishek Gupta , Benjamin Eysenbach , Pieter Abbeel , and Sergey Levine . 2018. Self-consistent trajectory autoencoder: Hierarchical reinforcement learning with trajectory embeddings. arxiv:1806.02813 ( 2018 ). John D. Co-Reyes, Yuxuan Liu, Abhishek Gupta, Benjamin Eysenbach, Pieter Abbeel, and Sergey Levine. 2018. Self-consistent trajectory autoencoder: Hierarchical reinforcement learning with trajectory embeddings. arxiv:1806.02813 (2018)."},{"key":"e_1_2_2_78_1","volume-title":"Advances in Neural Information Processing Systems","volume":"31","author":"Riemer Matthew","year":"2018","unstructured":"Matthew Riemer , Miao Liu , and Gerald Tesauro . 2018 . Learning abstract options . In Advances in Neural Information Processing Systems , Vol. 31 . Curran Associates, Inc., 10424\u201310434. Matthew Riemer, Miao Liu, and Gerald Tesauro. 2018. 
Learning abstract options. In Advances in Neural Information Processing Systems, Vol. 31. Curran Associates, Inc., 10424\u201310434."},{"key":"e_1_2_2_79_1","doi-asserted-by":"publisher","DOI":"10.5555\/2968618.2968823"},{"key":"e_1_2_2_80_1","volume-title":"Artificial Intelligence: A Modern Approach","author":"Russell Stuart","year":"2009","unstructured":"Stuart Russell and Peter Norvig . 2009 . Artificial Intelligence: A Modern Approach ( 3 rd ed.). Prentice Hall Press . Stuart Russell and Peter Norvig. 2009. Artificial Intelligence: A Modern Approach (3rd ed.). Prentice Hall Press.","edition":"3"},{"key":"e_1_2_2_81_1","volume-title":"Proceedings of the 4th International Conference on Learning Representations.","author":"Rusu Andrei A.","year":"2016","unstructured":"Andrei A. Rusu , Sergio Gomez Colmenarejo , \u00c7aglar G\u00fcl\u00e7ehre , Guillaume Desjardins , James Kirkpatrick , Razvan Pascanu , Volodymyr Mnih , Koray Kavukcuoglu , and Raia Hadsell . 2016 . Policy distillation . In Proceedings of the 4th International Conference on Learning Representations. Andrei A. Rusu, Sergio Gomez Colmenarejo, \u00c7aglar G\u00fcl\u00e7ehre, Guillaume Desjardins, James Kirkpatrick, Razvan Pascanu, Volodymyr Mnih, Koray Kavukcuoglu, and Raia Hadsell. 2016. Policy distillation. In Proceedings of the 4th International Conference on Learning Representations."},{"key":"e_1_2_2_82_1","volume-title":"Proceedings of the 32nd International Conference on Machine Learning (Proceedings of Machine Learning Research)","volume":"37","author":"Schulman John","year":"2015","unstructured":"John Schulman , Sergey Levine , Pieter Abbeel , Michael Jordan , and Philipp Moritz . 2015 . Trust region policy optimization . In Proceedings of the 32nd International Conference on Machine Learning (Proceedings of Machine Learning Research) , Vol. 37 . PMLR,, 1889\u20131897. John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. 2015. Trust region policy optimization. 
In Proceedings of the 32nd International Conference on Machine Learning (Proceedings of Machine Learning Research), Vol. 37. PMLR,, 1889\u20131897."},{"key":"e_1_2_2_83_1","volume-title":"Dynamics-aware unsupervised discovery of skills. arxiv:1907.01657","author":"Sharma Archit","year":"2019","unstructured":"Archit Sharma , Shixiang Gu , Sergey Levine , Vikash Kumar , and Karol Hausman . 2019. Dynamics-aware unsupervised discovery of skills. arxiv:1907.01657 ( 2019 ). Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, and Karol Hausman. 2019. Dynamics-aware unsupervised discovery of skills. arxiv:1907.01657 (2019)."},{"key":"e_1_2_2_84_1","volume-title":"Proceedings of the 21st International Conference on Neural Information Processing Systems (NIPS\u201908)","author":"\u015eim\u015fek \u00d6zg\u00fcr","unstructured":"\u00d6zg\u00fcr \u015eim\u015fek and Andrew G. Barto . 2008. Skill characterization based on betweenness . In Proceedings of the 21st International Conference on Neural Information Processing Systems (NIPS\u201908) . Curran Associates Inc., Red Hook, NY, 1497\u20131504. \u00d6zg\u00fcr \u015eim\u015fek and Andrew G. Barto. 2008. Skill characterization based on betweenness. In Proceedings of the 21st International Conference on Neural Information Processing Systems (NIPS\u201908). Curran Associates Inc., Red Hook, NY, 1497\u20131504."},{"key":"e_1_2_2_85_1","volume-title":"Barto","author":"\u015eim\u015fek \u00d6zg\u00fcr","year":"2005","unstructured":"\u00d6zg\u00fcr \u015eim\u015fek , Alicia P. Wolfe , and Andrew G . Barto . 2005 . Identifying useful subgoals in reinforcement learning by local graph partitioning (ICML\u201905). Association for Computing Machinery , New York, NY, 816\u2013823. DOI:https:\/\/doi.org\/10.1145\/1102351.1102454 \u00d6zg\u00fcr \u015eim\u015fek, Alicia P. Wolfe, and Andrew G. Barto. 2005. Identifying useful subgoals in reinforcement learning by local graph partitioning (ICML\u201905). 
Association for Computing Machinery, New York, NY, 816\u2013823. DOI:https:\/\/doi.org\/10.1145\/1102351.1102454"},{"key":"e_1_2_2_86_1","doi-asserted-by":"publisher","DOI":"10.5555\/3327757.3327818"},{"key":"e_1_2_2_87_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45622-8_16"},{"key":"e_1_2_2_88_1","volume-title":"Learning goal embeddings via self-play for hierarchical reinforcement learning. arxiv:1811.09083","author":"Sukhbaatar Sainbayar","year":"2018","unstructured":"Sainbayar Sukhbaatar , Emily Denton , Arthur Szlam , and Rob Fergus . 2018. Learning goal embeddings via self-play for hierarchical reinforcement learning. arxiv:1811.09083 ( 2018 ). Sainbayar Sukhbaatar, Emily Denton, Arthur Szlam, and Rob Fergus. 2018. Learning goal embeddings via self-play for hierarchical reinforcement learning. arxiv:1811.09083 (2018)."},{"key":"e_1_2_2_89_1","volume-title":"Intrinsic motivation and automatic curricula via asymmetric self-play. arxiv:1703.05407","author":"Sukhbaatar Sainbayar","year":"2017","unstructured":"Sainbayar Sukhbaatar , Ilya Kostrikov , Arthur Szlam , and Rob Fergus . 2017. Intrinsic motivation and automatic curricula via asymmetric self-play. arxiv:1703.05407 ( 2017 ). Sainbayar Sukhbaatar, Ilya Kostrikov, Arthur Szlam, and Rob Fergus. 2017. Intrinsic motivation and automatic curricula via asymmetric self-play. arxiv:1703.05407 (2017)."},{"key":"e_1_2_2_90_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157096.3157348"},{"key":"e_1_2_2_91_1","volume-title":"Barto","author":"Sutton Richard S.","year":"2018","unstructured":"Richard S. Sutton and Andrew G . Barto . 2018 . Reinforcement Learning : An Introduction (2nd ed.) The MIT Press , Cambridge, MA. Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction (2nd ed.) 
The MIT Press, Cambridge, MA."},{"key":"e_1_2_2_92_1","doi-asserted-by":"publisher","DOI":"10.5555\/3009657.3009806"},{"key":"e_1_2_2_93_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(99)00052-1"},{"key":"e_1_2_2_94_1","volume-title":"Multi-agent Reinforcement Learning: Independent vs. Cooperative Agents","author":"Tan Ming","unstructured":"Ming Tan . 1997. Multi-agent Reinforcement Learning: Independent vs. Cooperative Agents . Morgan Kaufmann Publishers Inc ., San Francisco, CA, 487\u2013494. Ming Tan. 1997. Multi-agent Reinforcement Learning: Independent vs. Cooperative Agents. Morgan Kaufmann Publishers Inc., San Francisco, CA, 487\u2013494."},{"key":"e_1_2_2_95_1","volume-title":"Hierarchical deep multiagent reinforcement learning. arxiv:1809.09332","author":"Tang Hongyao","year":"2018","unstructured":"Hongyao Tang , Jianye Hao , Tangjie Lv , Yingfeng Chen , Zongzhang Zhang , Hangtian Jia , Chunxu Ren , Yan Zheng , Changjie Fan , and Li Wang . 2018. Hierarchical deep multiagent reinforcement learning. arxiv:1809.09332 ( 2018 ). Hongyao Tang, Jianye Hao, Tangjie Lv, Yingfeng Chen, Zongzhang Zhang, Hangtian Jia, Chunxu Ren, Yan Zheng, Changjie Fan, and Li Wang. 2018. Hierarchical deep multiagent reinforcement learning. arxiv:1809.09332 (2018)."},{"key":"e_1_2_2_96_1","volume-title":"Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI\u201917)","author":"Tessler Chen","year":"2017","unstructured":"Chen Tessler , Shahar Givony , Tom Zahavy , Daniel J. Mankowitz , and Shie Mannor . 2017 . A deep hierarchical approach to lifelong learning in minecraft . In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI\u201917) . AAAI Press, 1553\u20131561. Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J. Mankowitz, and Shie Mannor. 2017. A deep hierarchical approach to lifelong learning in minecraft. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI\u201917). 
AAAI Press, 1553\u20131561."},{"key":"e_1_2_2_97_1","volume-title":"Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems. 5026\u20135033","author":"Todorov E.","year":"2012","unstructured":"E. Todorov , T. Erez , and Y. Tassa . 2012. MuJoCo: A physics engine for model-based control . In Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems. 5026\u20135033 . DOI:https:\/\/doi.org\/10.1109\/IROS. 2012 .6386109 E. Todorov, T. Erez, and Y. Tassa. 2012. MuJoCo: A physics engine for model-based control. In Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems. 5026\u20135033. DOI:https:\/\/doi.org\/10.1109\/IROS.2012.6386109"},{"key":"e_1_2_2_98_1","volume-title":"Stochastic Models of Neural Networks","author":"Turchetti Claudio","unstructured":"Claudio Turchetti . 2004. Stochastic Models of Neural Networks . Vol. 102 . IOS Press . Claudio Turchetti. 2004. Stochastic Models of Neural Networks. Vol. 102. IOS Press."},{"key":"e_1_2_2_99_1","volume-title":"Proceedings of the 34th International Conference on Machine Learning (ICML\u201917)","author":"Vezhnevets Alexander Sasha","year":"2017","unstructured":"Alexander Sasha Vezhnevets , Simon Osindero , Tom Schaul , Nicolas Heess , Max Jaderberg , David Silver , and Koray Kavukcuoglu . 2017 . FeUdal networks for hierarchical reinforcement learning . In Proceedings of the 34th International Conference on Machine Learning (ICML\u201917) . JMLR.org, 3540\u20133549. Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, and Koray Kavukcuoglu. 2017. FeUdal networks for hierarchical reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning (ICML\u201917). 
JMLR.org, 3540\u20133549."},{"key":"e_1_2_2_100_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992698"},{"key":"e_1_2_2_101_1","volume-title":"Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 1566\u20131574","author":"Yang Jiachen","year":"2020","unstructured":"Jiachen Yang , Igor Borovikov , and Hongyuan Zha . 2020 . Hierarchical cooperative multi-agent reinforcement learning with skill discovery . In Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 1566\u20131574 . Jiachen Yang, Igor Borovikov, and Hongyuan Zha. 2020. Hierarchical cooperative multi-agent reinforcement learning with skill discovery. In Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 1566\u20131574."},{"key":"e_1_2_2_102_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2018.2805379"},{"key":"e_1_2_2_103_1","volume-title":"Proc. Mach. Learn. Res.","volume":"117","author":"Zahavy Tom","year":"2020","unstructured":"Tom Zahavy , Avinatan Hasidim , Haim Kaplan , and Yishay Mansour . 2020 . Planning in hierarchical reinforcement learning: Guarantees for using local policies . Proc. Mach. Learn. Res. , Vol. 117 . PMLR, 906\u2013934. Tom Zahavy, Avinatan Hasidim, Haim Kaplan, and Yishay Mansour. 2020. Planning in hierarchical reinforcement learning: Guarantees for using local policies. Proc. Mach. Learn. Res., Vol. 117. PMLR, 906\u2013934."},{"key":"e_1_2_2_104_1","volume-title":"Advances in Neural Information Processing Systems","volume":"32","author":"Zhang Shangtong","year":"2019","unstructured":"Shangtong Zhang and Shimon Whiteson . 2019 . DAC: The double actor-critic architecture for learning options . 
In Advances in Neural Information Processing Systems , Vol. 32 . Curran Associates, Inc. , 2012\u20132022. Shangtong Zhang and Shimon Whiteson. 2019. DAC: The double actor-critic architecture for learning options. In Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc., 2012\u20132022."}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3453160","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3453160","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:28:39Z","timestamp":1750195719000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3453160"}},"subtitle":["A Comprehensive Survey"],"short-title":[],"issued":{"date-parts":[[2021,6,5]]},"references-count":103,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,6,30]]}},"alternative-id":["10.1145\/3453160"],"URL":"https:\/\/doi.org\/10.1145\/3453160","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,6,5]]},"assertion":[{"value":"2020-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-06-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}