{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:47:17Z","timestamp":1750308437165,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":15,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,12,16]],"date-time":"2022-12-16T00:00:00Z","timestamp":1671148800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,12,16]]},"DOI":"10.1145\/3584376.3584599","type":"proceedings-article","created":{"date-parts":[[2023,4,19]],"date-time":"2023-04-19T22:54:51Z","timestamp":1681944891000},"page":"1269-1273","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["An Improved Soft Q Imitation Learning based on Normalized Reward"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2443-8261","authenticated-orcid":false,"given":"Xiangren","family":"Kong","sequence":"first","affiliation":[{"name":"School of Computer Science, South China Normal University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0572-8442","authenticated-orcid":false,"given":"Gang","family":"Feng","sequence":"additional","affiliation":[{"name":"School of Computer Science, South China Normal University, China"}]}],"member":"320","published-online":{"date-parts":[[2023,4,19]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"N. Le V. S. Rathour K. Yamazaki K. Luu and M. Savvides Deep reinforcement learning in computer vision: a comprehensive survey Artificial Intelligence Review (2021) 1\u201387.  N. Le V. S. Rathour K. Yamazaki K. Luu and M. Savvides Deep reinforcement learning in computer vision: a comprehensive survey Artificial Intelligence Review (2021) 1\u201387."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2021.114632"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2021.3116063"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1991.3.1.88"},{"key":"e_1_3_2_1_5_1","first-page":"670","volume-title":"Icml","author":"Ng Andrew Y","year":"2000","unstructured":"Andrew Y Ng , Stuart J Russell , Algorithms for inverse reinforcement learning . In Icml , pages 663\u2013 670 , 2000 . Andrew Y Ng, Stuart J Russell, Algorithms for inverse reinforcement learning. In Icml, pages 663\u2013670, 2000."},{"key":"e_1_3_2_1_6_1","volume-title":"A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models. arXiv preprint arXiv:1611.03852","author":"Finn Chelsea","year":"2016","unstructured":"Chelsea Finn , Paul Christiano , Pieter Abbeel , and Sergey Levine . A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models. arXiv preprint arXiv:1611.03852 , 2016 . Chelsea Finn, Paul Christiano, Pieter Abbeel, and Sergey Levine. A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models. arXiv preprint arXiv:1611.03852, 2016."},{"key":"e_1_3_2_1_7_1","volume-title":"Learning robust rewards with adversarial inverse reinforcement learning. arXiv preprint arXiv:1710.11248","author":"Fu Justin","year":"2017","unstructured":"Justin Fu , Katie Luo , and Sergey Levine . Learning robust rewards with adversarial inverse reinforcement learning. arXiv preprint arXiv:1710.11248 , 2017 . Justin Fu, Katie Luo, and Sergey Levine. Learning robust rewards with adversarial inverse reinforcement learning. arXiv preprint arXiv:1710.11248, 2017."},{"key":"e_1_3_2_1_8_1","volume-title":"Debidatta Dwibedi, Sergey Levine, and Jonathan Tompson. Discriminator-actor-critic: Addressing sample inefficiency and reward bias in adversarial imitation learning.","author":"Kostrikov Ilya","year":"2018","unstructured":"Ilya Kostrikov , Kumar Krishna Agrawal , Debidatta Dwibedi, Sergey Levine, and Jonathan Tompson. Discriminator-actor-critic: Addressing sample inefficiency and reward bias in adversarial imitation learning. 2018 . Ilya Kostrikov, Kumar Krishna Agrawal, Debidatta Dwibedi, Sergey Levine, and Jonathan Tompson. Discriminator-actor-critic: Addressing sample inefficiency and reward bias in adversarial imitation learning. 2018."},{"key":"e_1_3_2_1_9_1","volume-title":"Sqil: imitation learning via regularized behavioral cloning. arXiv preprint arXiv:1905.11108","author":"Reddy Siddharth","year":"2019","unstructured":"Siddharth Reddy , Anca D Dragan , and Sergey Levine . Sqil: imitation learning via regularized behavioral cloning. arXiv preprint arXiv:1905.11108 , 2019 . Siddharth Reddy, Anca D Dragan, and Sergey Levine. Sqil: imitation learning via regularized behavioral cloning. arXiv preprint arXiv:1905.11108, 2019."},{"key":"e_1_3_2_1_10_1","volume-title":"Discriminator soft actor critic without extrinsic rewards[C]\/\/2020 IEEE 9th Global Conference on Consumer Electronics (GCCE)","author":"Nishio","year":"2020","unstructured":"Nishio D, Tsuneda T, Kuyoshi D , Discriminator soft actor critic without extrinsic rewards[C]\/\/2020 IEEE 9th Global Conference on Consumer Electronics (GCCE) . IEEE , 2020 : 117-120. Nishio D, Tsuneda T, Kuyoshi D, Discriminator soft actor critic without extrinsic rewards[C]\/\/2020 IEEE 9th Global Conference on Consumer Electronics (GCCE). IEEE, 2020: 117-120."},{"key":"e_1_3_2_1_11_1","first-page":"4573","volume-title":"Advances in neural information processing systems","author":"Ho Jonathan","year":"2016","unstructured":"Jonathan Ho and Stefano Ermon . Generative adversarial imitation learning . In Advances in neural information processing systems , pages 4565\u2013 4573 , 2016 . Jonathan Ho and Stefano Ermon. Generative adversarial imitation learning. In Advances in neural information processing systems, pages 4565\u20134573, 2016."},{"key":"e_1_3_2_1_12_1","first-page":"822","article-title":"Learning guidance rewards with trajectory-space smoothing [J]","volume":"33","author":"Gangwani","year":"2020","unstructured":"Gangwani T, Zhou Y, Peng J . Learning guidance rewards with trajectory-space smoothing [J] . Advances in Neural Information Processing Systems , 2020 , 33 : 822 - 832 . Gangwani T, Zhou Y, Peng J. Learning guidance rewards with trajectory-space smoothing [J]. Advances in Neural Information Processing Systems, 2020, 33: 822-832.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_13_1","volume-title":"a python module for physics simulation for games, robotics and machine learning. GitHub repository","author":"Coumans Erwin","year":"2016","unstructured":"Erwin Coumans and Yunfei Bai . Pybullet , a python module for physics simulation for games, robotics and machine learning. GitHub repository , 2016 . Erwin Coumans and Yunfei Bai. Pybullet, a python module for physics simulation for games, robotics and machine learning. GitHub repository, 2016."},{"key":"e_1_3_2_1_14_1","volume-title":"Workshop on Deep Reinforcement Learning at the 33rd Conference on Neural Information Processing Systems","author":"Fujita Yasuhiro","year":"2019","unstructured":"Yasuhiro Fujita , Toshiki Kataoka , Prabhat Nagarajan , and Takahiro Ishikawa . Chainerrl : A deep reinforcement learning library . In Workshop on Deep Reinforcement Learning at the 33rd Conference on Neural Information Processing Systems , December 2019 . Yasuhiro Fujita, Toshiki Kataoka, Prabhat Nagarajan, and Takahiro Ishikawa. Chainerrl: A deep reinforcement learning library. In Workshop on Deep Reinforcement Learning at the 33rd Conference on Neural Information Processing Systems, December 2019."},{"key":"e_1_3_2_1_15_1","first-page":"5033","volume-title":"2012 IEEE\/RSJ International Conference on Intelligent Robots and Systems","author":"Todorov Emanuel","unstructured":"Emanuel Todorov , Tom Erez , and Yuval Tassa . Mujoco : A physics engine for model-based control . In 2012 IEEE\/RSJ International Conference on Intelligent Robots and Systems , pages 5026\u2013 5033 . IEEE, 2012. Emanuel Todorov, Tom Erez, and Yuval Tassa. Mujoco: A physics engine for model-based control. In 2012 IEEE\/RSJ International Conference on Intelligent Robots and Systems, pages 5026\u20135033. IEEE, 2012."}],"event":{"name":"RICAI 2022: 2022 4th International Conference on Robotics, Intelligent Control and Artificial Intelligence","acronym":"RICAI 2022","location":"Dongguan China"},"container-title":["Proceedings of the 2022 4th International Conference on Robotics, Intelligent Control and Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3584376.3584599","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3584376.3584599","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T17:49:58Z","timestamp":1750268998000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3584376.3584599"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,16]]},"references-count":15,"alternative-id":["10.1145\/3584376.3584599","10.1145\/3584376"],"URL":"https:\/\/doi.org\/10.1145\/3584376.3584599","relation":{},"subject":[],"published":{"date-parts":[[2022,12,16]]},"assertion":[{"value":"2023-04-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}