{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T19:02:59Z","timestamp":1757617379503,"version":"3.44.0"},"reference-count":32,"publisher":"Springer Science and Business Media LLC","issue":"23","license":[{"start":{"date-parts":[[2025,1,10]],"date-time":"2025-01-10T00:00:00Z","timestamp":1736467200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,1,10]],"date-time":"2025-01-10T00:00:00Z","timestamp":1736467200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100008383","name":"Bundesministerium f\u00fcr Verkehr und Digitale Infrastruktur","doi-asserted-by":"publisher","award":["45FGU121_E"],"award-info":[{"award-number":["45FGU121_E"]}],"id":[{"id":"10.13039\/100008383","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002347","name":"Bundesministerium f\u00fcr Bildung und Forschung","doi-asserted-by":"publisher","award":["01DD20003"],"award-info":[{"award-number":["01DD20003"]}],"id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004115","name":"Gottfried Wilhelm Leibniz Universit\u00e4t Hannover","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004115","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Comput &amp; Applic"],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Reinforcement learning (RL) agents naturally struggle with long-sequence sparse-reward tasks due to the lack of reward feedback during exploration and the problem of identifying the necessary action sequences required to reach the goal. 
Previous works have used abstract symbolic task knowledge models to speed up RL agents in these tasks, either by splitting the task into easier-to-solve sub-tasks or by creating an artificial dense reward function. These approaches are often limited by their requirement of perfect symbolic knowledge models, which cannot be guaranteed in real-world tasks or when the abstract symbolic models are provided by humans. We introduce <jats:italic>exponential plan-based reward shaping<\/jats:italic>, which leverages RL's ability to learn from experience to compensate for deficiencies in incomplete and incorrect abstract symbolic plans and uses them to solve difficult tasks faster, while guaranteeing convergence to the optimal policy. Our approach works with plans that miss important steps, include unnecessary extra steps, contain steps that refer ambiguously to both important and useless states, or encode an incorrect order of steps. We use action representations designed by human experts to automatically compute plans that capture the high-level task structure. The abstract symbolic subgoals defined by the plan are used to create dense reward feedback, which signals to the RL agent the important states that should be achieved and explored to reach the goal. 
We show the theoretical advantages of our approach for plans with many steps and show its effectiveness empirically on multiple tasks with different kinds of incomplete or incorrect knowledge.<\/jats:p>","DOI":"10.1007\/s00521-024-10615-2","type":"journal-article","created":{"date-parts":[[2025,1,10]],"date-time":"2025-01-10T09:55:03Z","timestamp":1736502903000},"page":"18851-18866","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Using incomplete and incorrect plans to shape reinforcement learning in long-sequence sparse-reward tasks"],"prefix":"10.1007","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5719-4278","authenticated-orcid":false,"given":"Henrik","family":"M\u00fcller","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lukas","family":"Berg","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daniel","family":"Kudenko","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,1,10]]},"reference":[{"key":"10615_CR1","unstructured":"Adhikari A, Yuan X, C\u00f4t\u00e9 MA, et\u00a0al (2020) Learning dynamic belief graphs to generalize on text-based games. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA, NIPS\u201920"},{"key":"10615_CR2","doi-asserted-by":"publisher","unstructured":"Ammanabrolu P, Riedl M (2019) Playing text-adventure games with graph-based deep reinforcement learning. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 
Association for Computational Linguistics, Minneapolis, Minnesota, 3557\u20133565, https:\/\/doi.org\/10.18653\/v1\/N19-1358","DOI":"10.18653\/v1\/N19-1358"},{"key":"10615_CR3","unstructured":"Andreas J, Klein D, Levine S (2017) Modular multitask reinforcement learning with policy sketches. In: Proceedings of the 34th International Conference on Machine Learning - Volume 70. JMLR.org, ICML\u201917, 166-175"},{"issue":"1","key":"10615_CR4","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1023\/A:1022140919877","volume":"13","author":"AG Barto","year":"2003","unstructured":"Barto AG, Mahadevan S (2003) Recent advances in hierarchical reinforcement learning. Discrete event dynamic sys 13 1 41\u201377","journal-title":"Discrete event dynamic sys"},{"key":"10615_CR5","doi-asserted-by":"publisher","unstructured":"Basu C, Singhal M, Dragan AD (2018) Learning from richer human guidance: Augmenting comparison-based learning with feature queries. In: Proceedings of the 2018 ACM\/IEEE International Conference on Human-Robot Interaction. Association for Computing Machinery, New York, NY, USA, HRI \u201918, 132-140, https:\/\/doi.org\/10.1145\/3171221.3171284","DOI":"10.1145\/3171221.3171284"},{"key":"10615_CR6","doi-asserted-by":"publisher","unstructured":"De Giacomo G, Iocchi L, Favorito M et al (2021) Foundations for restraining bolts: Reinforcement learning with ltlf\/ldlf restraining specifications. Proceedings of the International Conference on Automated Planning and Scheduling 29(1):128\u2013136 https:\/\/doi.org\/10.1609\/icaps.v29i1.3549","DOI":"10.1609\/icaps.v29i1.3549"},{"key":"10615_CR7","unstructured":"Efthymiadis K, Kudenko D (2015) Knowledge revision for reinforcement learning with abstract mdps. In: Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems. 
International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, AAMAS \u201915, p 763-770"},{"key":"10615_CR8","unstructured":"Efthymiadis K, Devlin S, Kudenko D (2013) Overcoming erroneous domain knowledge in plan-based reward shaping. In: Proceedings of the 2013 International Conference on Autonomous Agents and Multi-Agent Systems. International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, AAMAS \u201913, p 1245-1246"},{"key":"10615_CR9","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-022-07396-x","author":"M Elbarbari","year":"2022","unstructured":"Elbarbari M, Delgrange F, Vervlimmeren I et al (2022) A framework for flexibly guiding learning agents. Neural Comput Appl. https:\/\/doi.org\/10.1007\/s00521-022-07396-x","journal-title":"Neural Comput Appl"},{"key":"10615_CR10","doi-asserted-by":"crossref","unstructured":"Gehring C, Asai M, Chitnis R, et\u00a0al (2022) Reinforcement learning for classical planning: Viewing heuristics as dense reward generators. In: Kumar A, Thi\u00e9baux S, Varakantham P, et\u00a0al (eds) Proceedings of the Thirty-Second International Conference on Automated Planning and Scheduling, ICAPS 2022, Singapore (virtual), June 13-24, 2022. AAAI Press, pp 588\u2013596, https:\/\/ojs.aaai.org\/index.php\/ICAPS\/article\/view\/19846","DOI":"10.1609\/icaps.v32i1.19846"},{"key":"10615_CR11","doi-asserted-by":"crossref","unstructured":"Goyal P, Niekum S, Mooney RJ (2019) Using natural language for reward shaping in reinforcement learning. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19. International Joint Conferences on Artificial Intelligence Organization, pp 2385\u20132391, https:\/\/doi.org\/10.24963\/ijcai.2019\/331","DOI":"10.24963\/ijcai.2019\/331"},{"key":"10615_CR12","unstructured":"Grzes M (2017) Reward shaping in episodic reinforcement learning. In: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems. 
International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, AAMAS \u201917, p 565-573"},{"key":"10615_CR13","doi-asserted-by":"publisher","unstructured":"Grzes M, Kudenko D (2008) Plan-based reward shaping for reinforcement learning. In: 2008 4th International IEEE Conference Intelligent Systems, pp 10\u201322\u201310\u201329, https:\/\/doi.org\/10.1109\/IS.2008.4670492","DOI":"10.1109\/IS.2008.4670492"},{"key":"10615_CR14","doi-asserted-by":"publisher","unstructured":"Grzes M, Kudenko D (2009) Theoretical and empirical analysis of reward shaping in reinforcement learning. In: 2009 International Conference on Machine Learning and Applications, pp 337\u2013344, https:\/\/doi.org\/10.1109\/ICMLA.2009.33","DOI":"10.1109\/ICMLA.2009.33"},{"key":"10615_CR15","unstructured":"Guan L, Verma M, Kambhampati S (2020) Explanation augmented feedback in human-in-the-loop reinforcement learning. CoRR abs\/2006.14804. https:\/\/arxiv.org\/abs\/2006.14804, 2006.14804"},{"key":"10615_CR16","unstructured":"Guan L, Sreedharan S, Kambhampati S (2022) Leveraging approximate symbolic models for reinforcement learning via skill diversity. In: Chaudhuri K, Jegelka S, Song L, et\u00a0al (eds) International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, Proceedings of Machine Learning Research, vol 162. PMLR, pp 7949\u20137967, https:\/\/proceedings.mlr.press\/v162\/guan22c.html"},{"key":"10615_CR17","unstructured":"Hadfield-Menell D, Milli S, Abbeel P, et\u00a0al (2017) Inverse reward design. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA, NIPS\u201917, p 6768-6777"},{"key":"10615_CR18","doi-asserted-by":"publisher","unstructured":"Hafner D, Lee KH, Fischer I, et\u00a0al (2022) Deep hierarchical planning from pixels. 
https:\/\/doi.org\/10.48550\/ARXIV.2206.04114, https:\/\/arxiv.org\/abs\/2206.04114","DOI":"10.48550\/ARXIV.2206.04114"},{"key":"10615_CR19","doi-asserted-by":"publisher","unstructured":"Hasanbeig M, Yogananda Jeppu N, Abate A et al (2021) Deepsynth: Automata synthesis for automatic task segmentation in deep reinforcement learning. Proceedings of the AAAI Conference on Artificial Intelligence 35(9):7647\u20137656. https:\/\/doi.org\/10.1609\/aaai.v35i9.16935, https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/16935","DOI":"10.1609\/aaai.v35i9.16935"},{"key":"10615_CR20","unstructured":"Icarte RT, Klassen T, Valenzano R, et\u00a0al (2018) Using reward machines for high-level task specification and decomposition in reinforcement learning. In: Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol\u00a080. PMLR, pp 2107\u20132116, https:\/\/proceedings.mlr.press\/v80\/icarte18a.html"},{"key":"10615_CR21","doi-asserted-by":"publisher","unstructured":"Illanes L, Yan X, Toro Icarte R et al (2020) Symbolic plans as high-level instructions for reinforcement learning. Proceedings of the International Conference on Automated Planning and Scheduling 30(1):540\u2013550 https:\/\/doi.org\/10.1609\/icaps.v30i1.6750, https:\/\/ojs.aaai.org\/index.php\/ICAPS\/article\/view\/6750","DOI":"10.1609\/icaps.v30i1.6750"},{"key":"10615_CR22","unstructured":"Jiang Y, Bharadwaj S, Wu B, et\u00a0al (2020) Temporal-logic-based reward shaping for continuing learning tasks. CoRR abs\/2007.01498. https:\/\/arxiv.org\/abs\/2007.01498, 2007.01498"},{"key":"10615_CR23","doi-asserted-by":"publisher","unstructured":"Jin M, Ma Z, Jin K et al (2022) Creativity of ai: Automatic symbolic option discovery for facilitating deep reinforcement learning. 
Proceedings of the AAAI Conference on Artificial Intelligence 36(6):7042\u20137050 https:\/\/doi.org\/10.1609\/icaps.v30i1.6750, https:\/\/ojs.aaai.org\/index.php\/ICAPS\/article\/view\/6750","DOI":"10.1609\/icaps.v30i1.6750"},{"key":"10615_CR24","unstructured":"Jothimurugan K, Bansal S, Bastani O, et\u00a0al (2021) Compositional reinforcement learning from logical specifications. CoRR abs\/2106.13906. https:\/\/arxiv.org\/abs\/2106.13906, 2106.13906"},{"key":"10615_CR25","doi-asserted-by":"publisher","unstructured":"Kokel H, Manoharan A, Natarajan S et al (2021) Reprel: Integrating relational planning and reinforcement learning for effective abstraction. Proceedings of the International Conference on Automated Planning and Scheduling 31(1):533\u2013541 https:\/\/doi.org\/10.1609\/icaps.v31i1.16001","DOI":"10.1609\/icaps.v31i1.16001"},{"key":"10615_CR26","doi-asserted-by":"publisher","unstructured":"Lyu D, Yang F, Liu B, et\u00a0al (2019) Sdrl: Interpretable and data-efficient deep reinforcement learning leveraging symbolic planning. In: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence. AAAI Press, AAAI\u201919\/IAAI\u201919\/EAAI\u201919, https:\/\/doi.org\/10.1609\/aaai.v33i01.33012970","DOI":"10.1609\/aaai.v33i01.33012970"},{"key":"10615_CR27","doi-asserted-by":"publisher","unstructured":"Mitchener L, Tuckey D, Crosby M, et\u00a0al (2022) Detect, understand, act: A neuro-symbolic hierarchical reinforcement learning framework (extended abstract). In: Raedt LD (ed) Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22. 
International Joint Conferences on Artificial Intelligence Organization, pp 5314\u20135318, https:\/\/doi.org\/10.24963\/ijcai.2022\/742, sister Conferences Best Papers","DOI":"10.24963\/ijcai.2022\/742"},{"key":"10615_CR28","unstructured":"Ng AY, Harada D, Russell SJ (1999) Policy invariance under reward transformations: Theory and application to reward shaping. In: Proceedings of the Sixteenth International Conference on Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, ICML \u201999, p 278-287"},{"key":"10615_CR29","unstructured":"Tuli M, Li AC, Vaezipoor P, et\u00a0al (2022) Learning to follow instructions in text-based games. In: Oh AH, Agarwal A, Belgrave D, et\u00a0al (eds) Advances in Neural Information Processing Systems, https:\/\/openreview.net\/forum?id=StlwkcFsjaZ"},{"key":"10615_CR30","doi-asserted-by":"crossref","unstructured":"Yang F, Lyu D, Liu B, et\u00a0al (2018) Peorl: Integrating symbolic planning and hierarchical reinforcement learning for robust decision-making. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence. AAAI Press, IJCAI\u201918, p 4860-4866","DOI":"10.24963\/ijcai.2018\/675"},{"key":"10615_CR31","unstructured":"Zhang R, Torabi F, Guan L, et\u00a0al (2019) Leveraging human guidance for deep reinforcement learning tasks. CoRR abs\/1909.09906. http:\/\/arxiv.org\/abs\/1909.09906, 1909.09906"},{"key":"10615_CR32","doi-asserted-by":"publisher","unstructured":"Zhou S, Dai X, Chen H, et\u00a0al (2020) Interactive recommender system via knowledge graph-enhanced reinforcement learning. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 
Association for Computing Machinery, New York, NY, USA, SIGIR \u201920, p 179-188, https:\/\/doi.org\/10.1145\/3397271.3401174","DOI":"10.1145\/3397271.3401174"}],"container-title":["Neural Computing and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-024-10615-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00521-024-10615-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-024-10615-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,6]],"date-time":"2025-09-06T03:13:05Z","timestamp":1757128385000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00521-024-10615-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,10]]},"references-count":32,"journal-issue":{"issue":"23","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["10615"],"URL":"https:\/\/doi.org\/10.1007\/s00521-024-10615-2","relation":{},"ISSN":["0941-0643","1433-3058"],"issn-type":[{"type":"print","value":"0941-0643"},{"type":"electronic","value":"1433-3058"}],"subject":[],"published":{"date-parts":[[2025,1,10]]},"assertion":[{"value":"4 November 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 October 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 January 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that 
they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}