{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T17:04:33Z","timestamp":1777655073621,"version":"3.51.4"},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,8]]},"abstract":"<jats:p>In this paper, we investigate Exploratory Conservative Policy Optimization (ECPO), a policy optimization strategy that improves exploration behavior while assuring monotonic progress in a principled objective. ECPO conducts maximum entropy exploration within a mirror descent framework, but updates policies using reversed KL projection. This formulation bypasses undesirable mode seeking behavior and avoids premature convergence to sub-optimal policies, while still supporting strong theoretical properties such as guaranteed policy improvement. Experimental evaluations demonstrate that the proposed method significantly improves practical exploration and surpasses the empirical performance of state-of-the-art policy optimization methods in a set of benchmark tasks.<\/jats:p>","DOI":"10.24963\/ijcai.2019\/434","type":"proceedings-article","created":{"date-parts":[[2019,7,28]],"date-time":"2019-07-28T03:46:05Z","timestamp":1564285565000},"page":"3130-3136","source":"Crossref","is-referenced-by-count":7,"title":["On Principled Entropy Exploration in Policy Optimization"],"prefix":"10.24963","author":[{"given":"Jincheng","family":"Mei","sequence":"first","affiliation":[{"name":"University of Alberta"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chenjun","family":"Xiao","sequence":"additional","affiliation":[{"name":"University of Alberta"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ruitong","family":"Huang","sequence":"additional","affiliation":[{"name":"Borealis AI Lab"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dale","family":"Schuurmans","sequence":"additional","affiliation":[{"name":"University of Alberta"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Martin","family":"M\u00fcller","sequence":"additional","affiliation":[{"name":"University of Alberta"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"10584","event":{"name":"Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}","theme":"Artificial Intelligence","location":"Macao, China","acronym":"IJCAI-2019","number":"28","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"start":{"date-parts":[[2019,8,10]]},"end":{"date-parts":[[2019,8,16]]}},"container-title":["Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2019,7,28]],"date-time":"2019-07-28T03:49:16Z","timestamp":1564285756000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2019\/434"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2019,8]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2019\/434","relation":{},"subject":[],"published":{"date-parts":[[2019,8]]}}}