{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:18:06Z","timestamp":1750220286964,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":36,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,2,18]],"date-time":"2022-02-18T00:00:00Z","timestamp":1645142400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,2,18]]},"DOI":"10.1145\/3529836.3529918","type":"proceedings-article","created":{"date-parts":[[2022,6,21]],"date-time":"2022-06-21T20:27:55Z","timestamp":1655843275000},"page":"51-58","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Pseudo Reward and Action Importance Classification for Sparse Reward Problem"],"prefix":"10.1145","author":[{"given":"Qingtong","family":"Wu","sequence":"first","affiliation":[{"name":"School of Computing, National University of Defense Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dawei","family":"Feng","sequence":"additional","affiliation":[{"name":"School of Computing, National University of Defense Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuanzhao","family":"Zhai","sequence":"additional","affiliation":[{"name":"School of Computing, National University of Defense Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bo","family":"Ding","sequence":"additional","affiliation":[{"name":"School of Computing, National University of Defense Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jie","family":"Luo","sequence":"additional","affiliation":[{"name":"School of Computing, National University of Defense Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,6,21]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553380"},{"key":"e_1_3_2_1_2_1","unstructured":"Christopher Berner Greg Brockman Brooke Chan Christy Cheung David Farhi Quirin Fischer Shariq Hashme Chris Hesse 2019. Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680(2019).  Christopher Berner Greg Brockman Brooke Chan Christy Cheung David Farhi Quirin Fischer Shariq Hashme Chris Hesse 2019. Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680(2019)."},{"key":"e_1_3_2_1_3_1","unstructured":"Mariusz Bojarski Davide Del\u00a0Testa Daniel Dworakowski Bernhard Firner Beat Flepp Prasoon Goyal Lawrence\u00a0D Jackel Mathew Monfort Urs Muller Jiakai Zhang 2016. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316(2016).  Mariusz Bojarski Davide Del\u00a0Testa Daniel Dworakowski Bernhard Firner Beat Flepp Prasoon Goyal Lawrence\u00a0D Jackel Mathew Monfort Urs Muller Jiakai Zhang 2016. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316(2016)."},{"key":"e_1_3_2_1_4_1","unstructured":"Yuri Burda Harri Edwards Deepak Pathak Amos Storkey Trevor Darrell and Alexei\u00a0A Efros. 2018. Large-scale study of curiosity-driven learning. arXiv preprint arXiv:1808.04355(2018).  Yuri Burda Harri Edwards Deepak Pathak Amos Storkey Trevor Darrell and Alexei\u00a0A Efros. 2018. Large-scale study of curiosity-driven learning. arXiv preprint arXiv:1808.04355(2018)."},{"key":"e_1_3_2_1_5_1","volume-title":"International Conference on Machine Learning. PMLR, 1407\u20131416","author":"Espeholt Lasse","year":"2018","unstructured":"Lasse Espeholt , Hubert Soyer , Remi Munos , Karen Simonyan , Vlad Mnih , Tom Ward , Yotam Doron , Vlad Firoiu , Tim Harley , Iain Dunning , 2018 . Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures . In International Conference on Machine Learning. PMLR, 1407\u20131416 . Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Vlad Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, 2018. Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures. In International Conference on Machine Learning. PMLR, 1407\u20131416."},{"key":"e_1_3_2_1_6_1","unstructured":"Yannis Flet-Berliac Johan Ferret Olivier Pietquin Philippe Preux and Matthieu Geist. 2021. Adversarially guided actor-critic. arXiv preprint arXiv:2102.04376(2021).  Yannis Flet-Berliac Johan Ferret Olivier Pietquin Philippe Preux and Matthieu Geist. 2021. Adversarially guided actor-critic. arXiv preprint arXiv:2102.04376(2021)."},{"key":"e_1_3_2_1_7_1","unstructured":"Kevin Frans Jonathan Ho Xi Chen Pieter Abbeel and John Schulman. 2017. Meta learning shared hierarchies. arXiv preprint arXiv:1710.09767(2017).  Kevin Frans Jonathan Ho Xi Chen Pieter Abbeel and John Schulman. 2017. Meta learning shared hierarchies. arXiv preprint arXiv:1710.09767(2017)."},{"key":"e_1_3_2_1_8_1","volume-title":"A review on generative adversarial networks: Algorithms, theory, and applications","author":"Gui Jie","year":"2021","unstructured":"Jie Gui , Zhenan Sun , Yonggang Wen , Dacheng Tao , and Jieping Ye. 2021. A review on generative adversarial networks: Algorithms, theory, and applications . IEEE Transactions on Knowledge and Data Engineering ( 2021 ). Jie Gui, Zhenan Sun, Yonggang Wen, Dacheng Tao, and Jieping Ye. 2021. A review on generative adversarial networks: Algorithms, theory, and applications. IEEE Transactions on Knowledge and Data Engineering (2021)."},{"key":"e_1_3_2_1_9_1","unstructured":"Tuomas Haarnoja Aurick Zhou Kristian Hartikainen George Tucker Sehoon Ha Jie Tan Vikash Kumar Henry Zhu Abhishek Gupta Pieter Abbeel 2018. Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905(2018).  Tuomas Haarnoja Aurick Zhou Kristian Hartikainen George Tucker Sehoon Ha Jie Tan Vikash Kumar Henry Zhu Abhishek Gupta Pieter Abbeel 2018. Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905(2018)."},{"key":"e_1_3_2_1_10_1","unstructured":"Dan Horgan John Quan David Budden Gabriel Barth-Maron Matteo Hessel Hado Van\u00a0Hasselt and David Silver. 2018. Distributed prioritized experience replay. arXiv preprint arXiv:1803.00933(2018).  Dan Horgan John Quan David Budden Gabriel Barth-Maron Matteo Hessel Hado Van\u00a0Hasselt and David Silver. 2018. Distributed prioritized experience replay. arXiv preprint arXiv:1803.00933(2018)."},{"key":"e_1_3_2_1_11_1","volume-title":"Vime: Variational information maximizing exploration. arXiv preprint arXiv:1605.09674(2016).","author":"Houthooft Rein","year":"2016","unstructured":"Rein Houthooft , Xi Chen , Yan Duan , John Schulman , Filip De\u00a0Turck , and Pieter Abbeel . 2016 . Vime: Variational information maximizing exploration. arXiv preprint arXiv:1605.09674(2016). Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De\u00a0Turck, and Pieter Abbeel. 2016. Vime: Variational information maximizing exploration. arXiv preprint arXiv:1605.09674(2016)."},{"key":"e_1_3_2_1_12_1","volume-title":"Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. Advances in neural information processing systems 29","author":"Kulkarni D","year":"2016","unstructured":"Tejas\u00a0 D Kulkarni , Karthik Narasimhan , Ardavan Saeedi , and Josh Tenenbaum . 2016. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. Advances in neural information processing systems 29 ( 2016 ), 3675\u20133683. Tejas\u00a0D Kulkarni, Karthik Narasimhan, Ardavan Saeedi, and Josh Tenenbaum. 2016. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. Advances in neural information processing systems 29 (2016), 3675\u20133683."},{"key":"e_1_3_2_1_13_1","unstructured":"Karol Kurach Olivier Raichuk Lasse Espeholt Carlos Riquelme Damien Vincent Marcin Michalski Olivier Bousquet 2019. Google research football: A novel reinforcement learning environment. arXiv preprint arXiv:1907.11180(2019).  Karol Kurach Olivier Raichuk Lasse Espeholt Carlos Riquelme Damien Vincent Marcin Michalski Olivier Bousquet 2019. Google research football: A novel reinforcement learning environment. arXiv preprint arXiv:1907.11180(2019)."},{"key":"e_1_3_2_1_14_1","volume-title":"Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. Neural Information Processing Systems (NIPS)","author":"Lowe Ryan","year":"2017","unstructured":"Ryan Lowe , Yi Wu , Aviv Tamar , Jean Harb , Pieter Abbeel , and Igor Mordatch . 2017. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. Neural Information Processing Systems (NIPS) ( 2017 ). Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. 2017. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. Neural Information Processing Systems (NIPS) (2017)."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2017.09.001"},{"key":"e_1_3_2_1_16_1","volume-title":"Human-level control through deep reinforcement learning. nature 518, 7540","author":"Mnih Volodymyr","year":"2015","unstructured":"Volodymyr Mnih , Koray Kavukcuoglu , David Silver , Andrei\u00a0 A Rusu , Joel Veness , Marc\u00a0 G Bellemare , Alex Graves , Martin Riedmiller , Andreas\u00a0 K Fidjeland , Georg Ostrovski , 2015. Human-level control through deep reinforcement learning. nature 518, 7540 ( 2015 ), 529\u2013533. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei\u00a0A Rusu, Joel Veness, Marc\u00a0G Bellemare, Alex Graves, Martin Riedmiller, Andreas\u00a0K Fidjeland, Georg Ostrovski, 2015. Human-level control through deep reinforcement learning. nature 518, 7540 (2015), 529\u2013533."},{"key":"e_1_3_2_1_17_1","unstructured":"Igor Mordatch and Pieter Abbeel. 2017. Emergence of Grounded Compositional Language in Multi-Agent Populations. arXiv preprint arXiv:1703.04908(2017).  Igor Mordatch and Pieter Abbeel. 2017. Emergence of Grounded Compositional Language in Multi-Agent Populations. arXiv preprint arXiv:1703.04908(2017)."},{"volume-title":"Shaping and policy search in reinforcement learning","author":"Y Ng.","key":"e_1_3_2_1_18_1","unstructured":"Andrew\u00a0 Y Ng. 2003. Shaping and policy search in reinforcement learning . University of California , Berkeley. Andrew\u00a0Y Ng. 2003. Shaping and policy search in reinforcement learning. University of California, Berkeley."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2017.70"},{"key":"e_1_3_2_1_20_1","unstructured":"Tim Salimans and Richard Chen. 2018. Learning Montezuma\u2019s Revenge from a Single Demonstration. arXiv preprint arXiv:1812.03381(2018).  Tim Salimans and Richard Chen. 2018. Learning Montezuma\u2019s Revenge from a Single Demonstration. arXiv preprint arXiv:1812.03381(2018)."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.2352\/ISSN.2470-1173.2017.19.AVM-023"},{"key":"e_1_3_2_1_22_1","unstructured":"Tom Schaul John Quan Ioannis Antonoglou and David Silver. 2015. Prioritized experience replay. arXiv preprint arXiv:1511.05952(2015).  Tom Schaul John Quan Ioannis Antonoglou and David Silver. 2015. Prioritized experience replay. arXiv preprint arXiv:1511.05952(2015)."},{"key":"e_1_3_2_1_23_1","unstructured":"John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347(2017).  John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347(2017)."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2019.01.011"},{"key":"e_1_3_2_1_25_1","volume-title":"Julian Schrittwieser, Ioannis Antonoglou","author":"Silver David","year":"2016","unstructured":"David Silver , Aja Huang , Chris\u00a0 J Maddison , Arthur Guez , Laurent Sifre , George Van Den\u00a0Driessche , Julian Schrittwieser, Ioannis Antonoglou , Veda Panneershelvam, Marc Lanctot , 2016 . Mastering the game of Go with deep neural networks and tree search. nature 529, 7587 (2016), 484\u2013489. David Silver, Aja Huang, Chris\u00a0J Maddison, Arthur Guez, Laurent Sifre, George Van Den\u00a0Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, 2016. Mastering the game of Go with deep neural networks and tree search. nature 529, 7587 (2016), 484\u2013489."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Paul\u00a0J Silvia. 2012. Curiosity and motivation. The Oxford handbook of human motivation(2012) 157\u2013166.  Paul\u00a0J Silvia. 2012. Curiosity and motivation. The Oxford handbook of human motivation(2012) 157\u2013166.","DOI":"10.1093\/oxfordhb\/9780195399820.013.0010"},{"key":"e_1_3_2_1_28_1","volume-title":"Proceedings of the annual conference of the cognitive science society. Cognitive Science Society, 2601\u20132606","author":"Singh Satinder","year":"2009","unstructured":"Satinder Singh , Richard\u00a0 L Lewis , and Andrew\u00a0 G Barto . 2009 . Where do rewards come from . In Proceedings of the annual conference of the cognitive science society. Cognitive Science Society, 2601\u20132606 . Satinder Singh, Richard\u00a0L Lewis, and Andrew\u00a0G Barto. 2009. Where do rewards come from. In Proceedings of the annual conference of the cognitive science society. Cognitive Science Society, 2601\u20132606."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i1.10933"},{"key":"e_1_3_2_1_30_1","volume-title":"Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 7782","author":"Vinyals Oriol","year":"2019","unstructured":"Oriol Vinyals , Igor Babuschkin , Wojciech\u00a0 M Czarnecki , Micha\u00ebl Mathieu , Andrew Dudzik , Junyoung Chung , David\u00a0 H Choi , Richard Powell , Timo Ewalds , Petko Georgiev , 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 7782 ( 2019 ), 350\u2013354. Oriol Vinyals, Igor Babuschkin, Wojciech\u00a0M Czarnecki, Micha\u00ebl Mathieu, Andrew Dudzik, Junyoung Chung, David\u00a0H Choi, Richard Powell, Timo Ewalds, Petko Georgiev, 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 7782 (2019), 350\u2013354."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3069908"},{"key":"e_1_3_2_1_32_1","volume-title":"Online reinforcement learning control for the personalization of a robotic knee prosthesis","author":"Wen Yue","year":"2019","unstructured":"Yue Wen , Jennie Si , Andrea Brandt , Xiang Gao , and He\u00a0Helen Huang . 2019. Online reinforcement learning control for the personalization of a robotic knee prosthesis . IEEE transactions on cybernetics 50, 6 ( 2019 ), 2346\u20132356. Yue Wen, Jennie Si, Andrea Brandt, Xiang Gao, and He\u00a0Helen Huang. 2019. Online reinforcement learning control for the personalization of a robotic knee prosthesis. IEEE transactions on cybernetics 50, 6 (2019), 2346\u20132356."},{"key":"e_1_3_2_1_33_1","unstructured":"Deheng Ye Guibin Chen Wen Zhang Sheng Chen Bo Yuan Bo Liu Jia Chen Zhao Liu Fuhao Qiu Hongsheng Yu 2020. Towards playing full moba games with deep reinforcement learning. arXiv preprint arXiv:2011.12692(2020).  Deheng Ye Guibin Chen Wen Zhang Sheng Chen Bo Yuan Bo Liu Jia Chen Zhao Liu Fuhao Qiu Hongsheng Yu 2020. Towards playing full moba games with deep reinforcement learning. arXiv preprint arXiv:2011.12692(2020)."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i4.16443"},{"key":"e_1_3_2_1_35_1","unstructured":"Oleksii Zhelo Jingwei Zhang Lei Tai Ming Liu and Wolfram Burgard. 2018. Curiosity-driven exploration for mapless navigation with deep reinforcement learning. arXiv preprint arXiv:1804.00456(2018).  Oleksii Zhelo Jingwei Zhang Lei Tai Ming Liu and Wolfram Burgard. 2018. Curiosity-driven exploration for mapless navigation with deep reinforcement learning. arXiv preprint arXiv:1804.00456(2018)."},{"key":"e_1_3_2_1_36_1","volume-title":"International Conference on Machine Learning. PMLR, 11436\u201311446","author":"Zheng Zeyu","year":"2020","unstructured":"Zeyu Zheng , Junhyuk Oh , Matteo Hessel , Zhongwen Xu , Manuel Kroiss , Hado Van\u00a0Hasselt , David Silver , and Satinder Singh . 2020 . What can learned intrinsic rewards capture? . In International Conference on Machine Learning. PMLR, 11436\u201311446 . Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado Van\u00a0Hasselt, David Silver, and Satinder Singh. 2020. What can learned intrinsic rewards capture?. In International Conference on Machine Learning. PMLR, 11436\u201311446."},{"key":"e_1_3_2_1_37_1","unstructured":"Zeyu Zheng Junhyuk Oh and Satinder Singh. 2018. On learning intrinsic rewards for policy gradient methods. arXiv preprint arXiv:1804.06459(2018).  Zeyu Zheng Junhyuk Oh and Satinder Singh. 2018. On learning intrinsic rewards for policy gradient methods. arXiv preprint arXiv:1804.06459(2018)."}],"event":{"name":"ICMLC 2022: 2022 14th International Conference on Machine Learning and Computing","acronym":"ICMLC 2022","location":"Guangzhou China"},"container-title":["2022 14th International Conference on Machine Learning and Computing (ICMLC)"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3529836.3529918","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3529836.3529918","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:31:25Z","timestamp":1750188685000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3529836.3529918"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,18]]},"references-count":36,"alternative-id":["10.1145\/3529836.3529918","10.1145\/3529836"],"URL":"https:\/\/doi.org\/10.1145\/3529836.3529918","relation":{},"subject":[],"published":{"date-parts":[[2022,2,18]]},"assertion":[{"value":"2022-06-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}