{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,5,31]],"date-time":"2024-05-31T19:56:53Z","timestamp":1717185413704},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"value":"9781643684369","type":"print"},{"value":"9781643684376","type":"electronic"}],"license":[{"start":{"date-parts":[[2023,9,28]],"date-time":"2023-09-28T00:00:00Z","timestamp":1695859200000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,9,28]]},"abstract":"<jats:p>Training reinforcement learning (RL) agents using scalar reward signals is often infeasible when an environment has sparse and non-Markovian rewards. Moreover, handcrafting these reward functions before training is prone to misspecification. We learn non-Markovian finite task specifications as finite-state \u2018task automata\u2019 from episodes of agent experience within environments with unknown dynamics. First, we learn a product MDP, a model composed of the specification\u2019s automaton and the environment\u2019s MDP (both initially unknown), by treating it as a partially observable MDP and employing a hidden Markov model learning algorithm. Second, we efficiently distil the task automaton (assumed to be a deterministic finite automaton) from the learnt product MDP. Our automaton enables a task to be decomposed into sub-tasks, so an RL agent can later synthesise an optimal policy more efficiently. It is also an interpretable encoding of high-level task features, so a human can verify that the agent\u2019s learnt tasks have no misspecifications. Finally, we also take steps towards ensuring that the automaton is environment-agnostic, making it well-suited for use in transfer learning.<\/jats:p>","DOI":"10.3233\/faia230247","type":"book-chapter","created":{"date-parts":[[2023,9,29]],"date-time":"2023-09-29T08:58:59Z","timestamp":1695977939000},"source":"Crossref","is-referenced-by-count":1,"title":["Learning Task Automata for Reinforcement Learning Using Hidden Markov Models"],"prefix":"10.3233","author":[{"given":"Alessandro","family":"Abate","sequence":"first","affiliation":[{"name":"University of Oxford, aabate@cs.ox.ac.uk, james.fox@cs.ox.ac.uk, david.hyland@cs.ox.ac.uk, mjw@cs.ox.ac.uk"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yousif","family":"Almulla","sequence":"additional","affiliation":[{"name":"University of Oxford, aabate@cs.ox.ac.uk, james.fox@cs.ox.ac.uk, david.hyland@cs.ox.ac.uk, mjw@cs.ox.ac.uk"},{"name":"Microsoft Azure Quantum, yalmulla@microsoft.com"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"James","family":"Fox","sequence":"additional","affiliation":[{"name":"University of Oxford, aabate@cs.ox.ac.uk, james.fox@cs.ox.ac.uk, david.hyland@cs.ox.ac.uk, mjw@cs.ox.ac.uk"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"David","family":"Hyland","sequence":"additional","affiliation":[{"name":"University of Oxford, aabate@cs.ox.ac.uk, james.fox@cs.ox.ac.uk, david.hyland@cs.ox.ac.uk, mjw@cs.ox.ac.uk"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael","family":"Wooldridge","sequence":"additional","affiliation":[{"name":"University of Oxford, aabate@cs.ox.ac.uk, james.fox@cs.ox.ac.uk, david.hyland@cs.ox.ac.uk, mjw@cs.ox.ac.uk"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"7437","container-title":["Frontiers in Artificial Intelligence and Applications","ECAI 2023"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/FAIA230247","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,29]],"date-time":"2023-09-29T08:59:12Z","timestamp":1695977952000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/FAIA230247"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,28]]},"ISBN":["9781643684369","9781643684376"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/faia230247","relation":{},"ISSN":["0922-6389","1879-8314"],"issn-type":[{"value":"0922-6389","type":"print"},{"value":"1879-8314","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,28]]}}}