{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T20:02:54Z","timestamp":1772654574811,"version":"3.50.1"},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,7]]},"abstract":"<jats:p>Model-based strategies for control are critical to obtain sample efficient learning. Dyna is a planning paradigm that naturally interleaves learning and planning, by simulating one-step experience to update the action-value function. This elegant planning strategy has been mostly explored in the tabular setting. The aim of this paper is to revisit sample-based planning, in stochastic and continuous domains with learned models. We first highlight the flexibility afforded by a model over Experience Replay (ER). Replay-based methods can be seen as stochastic planning methods that repeatedly sample from a buffer of recent agent-environment interactions and perform updates to improve data efficiency. We show that a model, as opposed to a replay buffer, is particularly useful for specifying which states to sample from during planning, such as predecessor states that propagate information in reverse from a state more quickly. We introduce a semi-parametric model learning approach, called Reweighted Experience Models (REMs), that makes it simple to sample next states or predecessors. We demonstrate that REM-Dyna exhibits similar advantages over replay-based methods in learning in continuous state problems, and that the performance gap grows when moving to stochastic domains, of increasing size.<\/jats:p>","DOI":"10.24963\/ijcai.2018\/666","type":"proceedings-article","created":{"date-parts":[[2018,7,5]],"date-time":"2018-07-05T01:49:10Z","timestamp":1530755350000},"page":"4794-4800","source":"Crossref","is-referenced-by-count":8,"title":["Organizing Experience: a Deeper Look at Replay Mechanisms for Sample-Based Planning in Continuous State Domains"],"prefix":"10.24963","author":[{"given":"Yangchen","family":"Pan","sequence":"first","affiliation":[{"name":"University of Alberta"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Muhammad","family":"Zaheer","sequence":"additional","affiliation":[{"name":"University of Alberta"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Adam","family":"White","sequence":"additional","affiliation":[{"name":"Deepmind"},{"name":"University of Alberta"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andrew","family":"Patterson","sequence":"additional","affiliation":[{"name":"University of Alberta"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Martha","family":"White","sequence":"additional","affiliation":[{"name":"University of Alberta"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"10584","event":{"name":"Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}","theme":"Artificial Intelligence","location":"Stockholm, Sweden","acronym":"IJCAI-2018","number":"27","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"start":{"date-parts":[[2018,7,13]]},"end":{"date-parts":[[2018,7,19]]}},"container-title":["Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2018,7,5]],"date-time":"2018-07-05T01:55:05Z","timestamp":1530755705000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2018\/666"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2018,7]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2018\/666","relation":{},"subject":[],"published":{"date-parts":[[2018,7]]}}}