{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T16:31:09Z","timestamp":1753893069853,"version":"3.41.2"},"reference-count":47,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,12,22]],"date-time":"2023-12-22T00:00:00Z","timestamp":1703203200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Robot. AI"],"abstract":"<jats:p>The current paper proposes a hierarchical reinforcement learning (HRL) method to decompose a complex task into simpler sub-tasks and leverage those to improve the training of an autonomous agent in a simulated environment. For practical reasons (i.e., illustrating purposes, easy implementation, user-friendly interface, and useful functionalities), we employ two Python frameworks called TextWorld and MiniGrid. MiniGrid functions as a 2D simulated representation of the real environment, while TextWorld functions as a high-level abstraction of this simulated environment. Training on this abstraction disentangles manipulation from navigation actions and allows us to design a dense reward function instead of a sparse reward function for the lower-level environment, which, as we show, improves the performance of training. Formal methods are utilized throughout the paper to establish that our algorithm is not prevented from deriving solutions.<\/jats:p>","DOI":"10.3389\/frobt.2023.1280578","type":"journal-article","created":{"date-parts":[[2023,12,22]],"date-time":"2023-12-22T04:38:49Z","timestamp":1703219929000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Decomposing user-defined tasks in a reinforcement learning setup using TextWorld"],"prefix":"10.3389","volume":"10","author":[{"given":"Thanos","family":"Petsanis","sequence":"first","affiliation":[]},{"given":"Christoforos","family":"Keroglou","sequence":"additional","affiliation":[]},{"given":"Athanasios","family":"Ch. Kapoutsis","sequence":"additional","affiliation":[]},{"given":"Elias B.","family":"Kosmatopoulos","sequence":"additional","affiliation":[]},{"given":"Georgios Ch.","family":"Sirakoulis","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2023,12,22]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11797","article-title":"Safe reinforcement learning via shielding","volume":"32","author":"Alshiekh","year":"2018","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"B2","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2018.00387","article-title":"Vision-and-language navigation: interpreting visually-grounded navigation instructions in real environments","author":"Anderson","year":"2018"},{"key":"B3","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1023\/a:1022140919877","article-title":"Recent advances in hierarchical reinforcement learning","volume":"13","author":"Barto","year":"2003","journal-title":"Discrete event Dyn. Syst."},{"key":"B4","doi-asserted-by":"crossref","DOI":"10.1109\/ICRA.2011.5980058","article-title":"Towards autonomous robotic butlers: lessons learned with the pr2","volume-title":"Icra","author":"Bohren","year":"2011"},{"volume-title":"Miniworld: minimalistic 3d environment for rl robotics research","year":"2018","author":"Chevalier-Boisvert","key":"B5"},{"key":"B6","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1007\/978-3-030-24337-1_3","article-title":"Textworld: a learning environment for text-based games","volume-title":"Computer games","author":"C\u00f4t\u00e9","year":"2019"},{"key":"B7","article-title":"Decomposition techniques for planning in stochastic domains","volume-title":"International joint conference on artificial intelligence","author":"Dean","year":"1995"},{"key":"B8","article-title":"ProcTHOR: large-scale embodied AI using procedural generation","volume-title":"NeurIPS","author":"Deitke","year":"2022"},{"key":"B9","first-page":"118","article-title":"The maxq method for hierarchical reinforcement learning","volume":"98","author":"Dietterich","year":"1998","journal-title":"ICML"},{"key":"B10","doi-asserted-by":"publisher","first-page":"230","DOI":"10.1109\/TETCI.2022.3141105","article-title":"A survey of embodied ai: from simulators to research tasks","volume":"6","author":"Duan","year":"2022","journal-title":"IEEE Trans. Emerg. Top. Comput. Intell."},{"volume-title":"Dialfred: dialogue-enabled agents for embodied instruction following","year":"2022","author":"Gao","key":"B11"},{"key":"B12","doi-asserted-by":"publisher","first-page":"265","DOI":"10.1146\/annurev-control-091420-084139","article-title":"Integrated task and motion planning","volume":"4","author":"Garrett","year":"2021","journal-title":"Annu. Rev. Control, Robotics, Aut. Syst."},{"volume-title":"Navigating to objects in the real world","year":"2022","author":"Gervet","key":"B13"},{"key":"B14","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1109\/ICRA.2015.7139022","article-title":"Towards manipulation planning with temporal logic specifications","volume-title":"2015 IEEE Int. Conf. Robotics Automation (ICRA)","author":"He","year":"2015"},{"key":"B15","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"volume-title":"3d-llm: injecting the 3d world into large language models","year":"2023","author":"Hong","key":"B16"},{"key":"B17","first-page":"2107","article-title":"Using reward machines for high-level task specification and decomposition in reinforcement learning","volume-title":"Proceedings of the 35th international conference on machine learning","author":"Icarte","year":"2018"},{"key":"B18","doi-asserted-by":"publisher","first-page":"2081","DOI":"10.1016\/j.ifacol.2020.12.2526","article-title":"Communication policies in heterogeneous multi-agent systems in partially known environments under temporal logic specifications","volume":"53","author":"Keroglou","year":"","journal-title":"IFAC-PapersOnLine"},{"year":"","author":"Keroglou","article-title":"Communication policies in heterogeneous multi-agent systems in partially known environments under temporal logic specifications 21st IFAC World Congress","key":"B19"},{"key":"B20","doi-asserted-by":"publisher","first-page":"196","DOI":"10.1109\/tmrb.2023.3261342","article-title":"A survey on technical challenges of assistive robotics for elder people in domestic environments: the aspida concept","volume":"5","author":"Keroglou","year":"2023","journal-title":"IEEE Trans. Med. Robotics Bionics"},{"key":"B21","doi-asserted-by":"publisher","first-page":"240","DOI":"10.3390\/biomimetics8020240","article-title":"The task decomposition and dedicated reward-system-based reinforcement learning algorithm for pick-and-place","volume":"8","author":"Kim","year":"2023","journal-title":"Biomimetics"},{"key":"B22","doi-asserted-by":"publisher","first-page":"1238","DOI":"10.1177\/0278364913495721","article-title":"Reinforcement learning in robotics: a survey","volume":"32","author":"Kober","year":"2013","journal-title":"Int. J. Robotics Res."},{"key":"B23","doi-asserted-by":"publisher","first-page":"746","DOI":"10.3390\/machines10090746","article-title":"Assessment of industry 4.0 for modern manufacturing ecosystem: a systematic survey of surveys","volume":"10","author":"Konstantinidis","year":"2022","journal-title":"Machines"},{"volume-title":"Theory and application of reward shaping in reinforcement learning","year":"2004","author":"Laud","key":"B24"},{"volume-title":"Reinforcement learning with temporal logic rewards","year":"2016","author":"Li","key":"B25"},{"year":"2023","author":"Liu","article-title":"Summary of chatgpt\/gpt-4 research and perspective towards the future of large language models","key":"B26"},{"key":"B27","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1016\/B978-1-55860-335-6.50030-1","article-title":"Reward functions for accelerated learning","volume-title":"Machine learning proceedings 1994","author":"Mataric","year":"1994"},{"volume-title":"Asynchronous methods for deep reinforcement learning","year":"2016","author":"Mnih","key":"B28"},{"key":"B29","first-page":"278","article-title":"Policy invariance under reward transformations: theory and application to reward shaping","volume":"99","author":"Ng","year":"1999","journal-title":"Icml"},{"key":"B30","doi-asserted-by":"publisher","first-page":"137","DOI":"10.11113\/jt.v78.9285","article-title":"Industry 4.0: a review on industrial automation and robotic","volume":"78","author":"Othman","year":"2016","journal-title":"J. Teknol."},{"key":"B31","doi-asserted-by":"publisher","first-page":"1498","DOI":"10.1162\/neco_a_01387","article-title":"Reinforcement learning in sparse-reward environments with hindsight policy gradients","volume":"33","author":"Rauber","year":"2021","journal-title":"Neural Comput."},{"volume-title":"Reinforcement learning with sparse rewards using guidance from offline demonstration","year":"2022","author":"Rengarajan","key":"B32"},{"volume-title":"An open simulation-to-real embodied AI platform","year":"2020","key":"B33"},{"key":"B34","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1016\/0004-3702(74)90026-5","article-title":"Planning in a hierarchy of abstraction spaces","volume":"5","author":"Sacerdoti","year":"1974","journal-title":"Artif. Intell."},{"key":"B35","doi-asserted-by":"crossref","DOI":"10.1109\/ICCV.2019.00943","article-title":"Habitat: a platform for embodied AI research","volume-title":"Proceedings of the IEEE\/CVF international conference on computer vision (ICCV)","author":"Savva","year":"2019"},{"volume-title":"Proximal policy optimization algorithms","year":"2017","author":"Schulman","key":"B36"},{"key":"B37","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR42600.2020.01075","article-title":"Alfred: a benchmark for interpreting grounded instructions for everyday tasks","volume-title":"Cvpr","author":"Shridhar","year":""},{"volume-title":"Alfworld: aligning text and embodied environments for interactive learning","year":"","author":"Shridhar","key":"B38"},{"key":"B39","doi-asserted-by":"crossref","DOI":"10.15607\/RSS.2019.XV.073","article-title":"End-to-end robotic reinforcement learning without reward engineering","author":"Singh","year":"2019"},{"volume-title":"Reward machines: exploiting reward function structure in reinforcement learning","year":"2020","author":"Toro Icarte","key":"B40"},{"key":"B41","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1023\/a:1022676722315","article-title":"Q-learning","volume":"8","author":"Watkins","year":"1992","journal-title":"Mach. Learn."},{"volume-title":"Allenact: a framework for embodied ai research","year":"2020","author":"Weihs","key":"B42"},{"key":"B43","doi-asserted-by":"publisher","first-page":"105669","DOI":"10.1109\/ACCESS.2019.2932257","article-title":"Deep reinforcement learning with optimized reward functions for robotic trajectory planning","volume":"7","author":"Xie","year":"2019","journal-title":"IEEE Access"},{"volume-title":"Habitat challenge 2023","year":"2023","author":"Yadav","key":"B44"},{"key":"B45","article-title":"The homerobot open vocab mobile manipulation challenge","volume-title":"Thirty-seventh conference on neural information processing systems: competition track","author":"Yenamandra","year":""},{"volume-title":"Homerobot: open vocab mobile manipulation","year":"","author":"Yenamandra","key":"B46"},{"volume-title":"Tasklama: probing the complex task understanding of language models","year":"2023","author":"Yuan","key":"B47"}],"container-title":["Frontiers in Robotics and AI"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2023.1280578\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,22]],"date-time":"2023-12-22T04:39:07Z","timestamp":1703219947000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2023.1280578\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,22]]},"references-count":47,"alternative-id":["10.3389\/frobt.2023.1280578"],"URL":"https:\/\/doi.org\/10.3389\/frobt.2023.1280578","relation":{},"ISSN":["2296-9144"],"issn-type":[{"type":"electronic","value":"2296-9144"}],"subject":[],"published":{"date-parts":[[2023,12,22]]},"article-number":"1280578"}}