{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,19]],"date-time":"2025-12-19T21:55:15Z","timestamp":1766181315160,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":30,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,10,13]],"date-time":"2019-10-13T00:00:00Z","timestamp":1570924800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,10,13]]},"DOI":"10.1145\/3356464.3357706","type":"proceedings-article","created":{"date-parts":[[2019,10,31]],"date-time":"2019-10-31T12:20:52Z","timestamp":1572524452000},"page":"1-10","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Generative adversarial exploration for reinforcement learning"],"prefix":"10.1145","author":[{"given":"Weijun","family":"Hong","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Menghui","family":"Zhu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Minghuan","family":"Liu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Weinan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Ming","family":"Zhou","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Yong","family":"Yu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Peng","family":"Sun","sequence":"additional","affiliation":[{"name":"Tencent AI Lab"}]}],"member":"320","published-online":{"date-parts":[[2019,10,13]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Marc Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, and Remi Munos. 2016. Unifying count-based exploration and intrinsic motivation. In NeurIPS. 1471--1479."},{"key":"e_1_3_2_1_2_1","volume-title":"OpenAI Gym. arXiv preprint arXiv:1606.01540","author":"Brockman Greg","year":"2016","unstructured":"Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016)."},{"key":"e_1_3_2_1_3_1","volume-title":"Exploration by random network distillation. ICLR","author":"Burda Yuri","year":"2019","unstructured":"Yuri Burda, Harrison Edwards, Amos Storkey, and Oleg Klimov. 2019. Exploration by random network distillation. ICLR (2019)."},{"key":"e_1_3_2_1_4_1","volume-title":"Contingency-aware exploration in reinforcement learning. arXiv preprint arXiv:1811.01483","author":"Choi Jongwook","year":"2018","unstructured":"Jongwook Choi, Yijie Guo, Marcin Moczulski, Junhyuk Oh, Neal Wu, Mohammad Norouzi, and Honglak Lee. 2018. Contingency-aware exploration in reinforcement learning. arXiv preprint arXiv:1811.01483 (2018)."},{"key":"e_1_3_2_1_5_1","unstructured":"Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In NeurIPS. 2672--2680."},{"key":"e_1_3_2_1_6_1","volume-title":"VIME: Variational information maximizing exploration. NeurIPS","author":"Houthooft Rein","year":"2016","unstructured":"Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, and Pieter Abbeel. 2016. VIME: Variational information maximizing exploration. In NeurIPS. 1109--1117."},{"key":"e_1_3_2_1_7_1","volume-title":"Control of exploitation-exploration meta-parameter in reinforcement learning. Neural Networks 15, 4-6","author":"Ishii Shin","year":"2002","unstructured":"Shin Ishii, Wako Yoshida, and Junichiro Yoshimoto. 2002. Control of exploitation-exploration meta-parameter in reinforcement learning. Neural Networks 15, 4-6 (2002), 665--687."},{"key":"e_1_3_2_1_8_1","unstructured":"Laurent Itti and Pierre F Baldi. 2006. Bayesian surprise attracts human attention. In NeurIPS. 547--554."},{"key":"e_1_3_2_1_9_1","first-page":"267","article-title":"Approximately optimal approximate reinforcement learning","volume":"2","author":"Kakade Sham","year":"2002","unstructured":"Sham Kakade and John Langford. 2002. Approximately optimal approximate reinforcement learning. In ICML, Vol. 2. 267--274.","journal-title":"ICML"},{"key":"e_1_3_2_1_10_1","unstructured":"Christian Kauten. 2018. Super Mario Bros for OpenAI Gym. https:\/\/github.com\/Kautenja\/gym-super-mario-bros."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"crossref","unstructured":"J Zico Kolter and Andrew Y Ng. 2009. Near-Bayesian exploration in polynomial time. In ICML. ACM, 513--520.","DOI":"10.1145\/1553374.1553441"},{"key":"e_1_3_2_1_12_1","volume-title":"Continuous control with deep reinforcement learning. ICLR","author":"Lillicrap Timothy P","year":"2016","unstructured":"Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2016. Continuous control with deep reinforcement learning. ICLR (2016)."},{"key":"e_1_3_2_1_13_1","volume-title":"Count-based exploration with the successor representation. arXiv preprint arXiv:1807.11622","author":"Machado Marlos C","year":"2018","unstructured":"Marlos C Machado, Marc G Bellemare, and Michael Bowling. 2018. Count-based exploration with the successor representation. arXiv preprint arXiv:1807.11622 (2018)."},{"key":"e_1_3_2_1_14_1","volume-title":"Asynchronous methods for deep reinforcement learning. ICML","author":"Mnih Volodymyr","year":"2016","unstructured":"Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. In ICML. 1928--1937."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529.","DOI":"10.1038\/nature14236"},{"key":"e_1_3_2_1_16_1","unstructured":"Shakir Mohamed and Danilo Jimenez Rezende. 2015. Variational information maximisation for intrinsically motivated reinforcement learning. In NeurIPS. 2125--2133."},{"key":"e_1_3_2_1_17_1","unstructured":"Ian Osband, Charles Blundell, Alexander Pritzel, and Benjamin Van Roy. 2016. Deep exploration via bootstrapped DQN. In NeurIPS. 4026--4034."},{"key":"e_1_3_2_1_18_1","volume-title":"Count-based exploration with neural density models. ICML","author":"Ostrovski Georg","year":"2018","unstructured":"Georg Ostrovski, Marc G Bellemare, Aaron van den Oord, and Remi Munos. 2018. Count-based exploration with neural density models. ICML (2018)."},{"key":"e_1_3_2_1_19_1","first-page":"6","article-title":"What is intrinsic motivation? A typology of computational approaches","volume":"1","author":"Oudeyer Pierre-Yves","year":"2009","unstructured":"Pierre-Yves Oudeyer and Frederic Kaplan. 2009. What is intrinsic motivation? A typology of computational approaches. Frontiers in Neurorobotics 1 (2009), 6.","journal-title":"Frontiers in Neurorobotics"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Deepak Pathak, Pulkit Agrawal, Alexei A Efros, and Trevor Darrell. 2017. Curiosity-driven exploration by self-supervised prediction. In CVPRW. 16--17.","DOI":"10.1109\/CVPRW.2017.70"},{"volume-title":"Curious model-building control systems","author":"Schmidhuber J\u00fcrgen","key":"e_1_3_2_1_21_1","unstructured":"J\u00fcrgen Schmidhuber. 1991. Curious model-building control systems. In IJCNN. IEEE, 1458--1463."},{"key":"e_1_3_2_1_22_1","first-page":"230","article-title":"Formal theory of creativity, fun, and intrinsic motivation (1990-2010)","volume":"2","author":"Schmidhuber J\u00fcrgen","year":"2010","unstructured":"J\u00fcrgen Schmidhuber. 2010. Formal theory of creativity, fun, and intrinsic motivation (1990-2010). IEEE Trans. Auton. Mental Develop. 2, 3 (2010), 230--247.","journal-title":"IEEE Trans. Auton. Mental Develop."},{"key":"e_1_3_2_1_23_1","volume-title":"Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347","author":"Schulman John","year":"2017","unstructured":"John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/s12064-011-0142-z"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"crossref","unstructured":"Alexander L Strehl and Michael L Littman. 2005. A theoretical analysis of model-based interval estimation. In ICML. ACM, 856--863.","DOI":"10.1145\/1102351.1102459"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcss.2007.08.009"},{"volume-title":"Reinforcement learning: An introduction","author":"Sutton Richard S","key":"e_1_3_2_1_27_1","unstructured":"Richard S Sutton and Andrew G Barto. 2018. Reinforcement learning: An introduction. MIT Press."},{"key":"e_1_3_2_1_28_1","volume-title":"#Exploration: A study of count-based exploration for deep reinforcement learning. NeurIPS","author":"Tang Haoran","year":"2017","unstructured":"Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, and Pieter Abbeel. 2017. #Exploration: A study of count-based exploration for deep reinforcement learning. In NeurIPS. 2753--2762."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/3016100.3016191"},{"key":"e_1_3_2_1_30_1","volume-title":"Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581","author":"Wang Ziyu","year":"2015","unstructured":"Ziyu Wang, Tom Schaul, Matteo Hessel, Hado Van Hasselt, Marc Lanctot, and Nando De Freitas. 2015. Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581 (2015)."}],"event":{"name":"DAI '19: First International Conference on Distributed Artificial Intelligence","acronym":"DAI '19","location":"Beijing China"},"container-title":["Proceedings of the First International Conference on Distributed Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3356464.3357706","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3356464.3357706","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:22:54Z","timestamp":1750202574000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3356464.3357706"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,13]]},"references-count":30,"alternative-id":["10.1145\/3356464.3357706","10.1145\/3356464"],"URL":"https:\/\/doi.org\/10.1145\/3356464.3357706","relation":{},"subject":[],"published":{"date-parts":[[2019,10,13]]},"assertion":[{"value":"2019-10-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}