{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,28]],"date-time":"2026-02-28T01:35:29Z","timestamp":1772242529718,"version":"3.50.1"},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,7]]},"abstract":"<jats:p>Policy optimization on high-dimensional continuous control tasks exhibits its difficulty caused by the large variance of the policy gradient estimators. We present the action subspace dependent gradient (ASDG) estimator which incorporates the Rao-Blackwell theorem (RB) and Control Variates (CV) into a unified framework to reduce the variance. To invoke RB, our proposed algorithm (POSA) learns the underlying factorization structure among the action space based on the second-order advantage information. POSA captures the quadratic information explicitly and efficiently by utilizing the wide &amp; deep architecture. Empirical studies show that our proposed approach demonstrates the performance improvements on high-dimensional synthetic settings and OpenAI Gym's MuJoCo continuous control tasks.<\/jats:p>","DOI":"10.24963\/ijcai.2018\/699","type":"proceedings-article","created":{"date-parts":[[2018,7,5]],"date-time":"2018-07-05T05:49:10Z","timestamp":1530769750000},"page":"5038-5044","source":"Crossref","is-referenced-by-count":3,"title":["Policy Optimization with Second-Order Advantage Information"],"prefix":"10.24963","author":[{"given":"Jiajin","family":"Li","sequence":"first","affiliation":[{"name":"The Chinese University of Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Baoxiang","family":"Wang","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shengyu","family":"Zhang","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong"},{"name":"Tencent"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"10584","event":{"name":"Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}","theme":"Artificial Intelligence","location":"Stockholm, Sweden","acronym":"IJCAI-2018","number":"27","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"start":{"date-parts":[[2018,7,13]]},"end":{"date-parts":[[2018,7,19]]}},"container-title":["Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2018,7,5]],"date-time":"2018-07-05T05:55:21Z","timestamp":1530770121000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2018\/699"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2018,7]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2018\/699","relation":{},"subject":[],"published":{"date-parts":[[2018,7]]}}}