{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:13:29Z","timestamp":1750220009658,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":21,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,17]],"date-time":"2022-10-17T00:00:00Z","timestamp":1665964800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,17]]},"DOI":"10.1145\/3511808.3557616","type":"proceedings-article","created":{"date-parts":[[2022,10,16]],"date-time":"2022-10-16T01:29:57Z","timestamp":1665883797000},"page":"3796-3800","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Improving Imitation Learning by Merging Experts Trajectories"],"prefix":"10.1145","author":[{"given":"Pegah","family":"Alizadeh","sequence":"first","affiliation":[{"name":"LIPN, UMR CNRS 7030, Universit\u00e9 Sorbonne Paris Nord, Villetaneuse, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Aomar","family":"Osmani","sequence":"additional","affiliation":[{"name":"LIPN, UMR CNRS 7030, Universit\u00e9 Sorbonne Paris Nord, Villetaneuse, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sammy","family":"Taleb","sequence":"additional","affiliation":[{"name":"L\u00e9onard De Vinci P\u00f4le Universitaire, Research Center, Paris, La D\u00e9fense, France"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,10,17]]},"reference":[{"volume-title":"International conference on learning representations.","author":"Marcin","unstructured":"Marcin Andrychowicz et al. 2020. What matters for on-policy deep actorcritic methods? a large-scale study . In International conference on learning representations. Marcin Andrychowicz et al. 2020. What matters for on-policy deep actorcritic methods? a large-scale study. In International conference on learning representations.","key":"e_1_3_2_1_1_1"},{"unstructured":"Kamil Ciosek Quan Vuong Robert Loftin and Katja Hofmann. 2019. Better exploration with optimistic actor critic. Advances in Neural Information Processing Systems 32.  Kamil Ciosek Quan Vuong Robert Loftin and Katja Hofmann. 2019. Better exploration with optimistic actor critic. Advances in Neural Information Processing Systems 32.","key":"e_1_3_2_1_2_1"},{"unstructured":"Lasse Espeholt et al. 2018. IMPALA: scalable distributed deep-rl with importance weighted actor-learner architectures. CoRR.  Lasse Espeholt et al. 2018. IMPALA: scalable distributed deep-rl with importance weighted actor-learner architectures. CoRR.","key":"e_1_3_2_1_3_1"},{"unstructured":"William H Guss etal 2021. The minerl 2020 competition on sample efficient reinforcement learning using human priors. arXiv preprint arXiv:2101.11071.  William H Guss et al. 2021. The minerl 2020 competition on sample efficient reinforcement learning using human priors. arXiv preprint arXiv:2101.11071.","key":"e_1_3_2_1_4_1"},{"unstructured":"William H Guss etal 2019. The minerl competition on sample efficient reinforcement learning using human priors.  William H Guss et al. 2019. The minerl competition on sample efficient reinforcement learning using human priors.","key":"e_1_3_2_1_5_1"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_6_1","DOI":"10.1145\/3054912"},{"unstructured":"Max Jaderberg et al. 2017. Population based training of neural networks. arXiv preprint arXiv:1711.09846.  Max Jaderberg et al. 2017. Population based training of neural networks. arXiv preprint arXiv:1711.09846.","key":"e_1_3_2_1_7_1"},{"doi-asserted-by":"crossref","unstructured":"Hilbert J Kappen. 2011. Optimal control theory and the linear bellman equation.  Hilbert J Kappen. 2011. Optimal control theory and the linear bellman equation.","key":"e_1_3_2_1_8_1","DOI":"10.1017\/CBO9780511984679.018"},{"unstructured":"Vijay Konda and John Tsitsiklis. 1999. Actor-critic algorithms. Advances in neural information processing systems 12.  Vijay Konda and John Tsitsiklis. 1999. Actor-critic algorithms. Advances in neural information processing systems 12.","key":"e_1_3_2_1_9_1"},{"doi-asserted-by":"crossref","unstructured":"Andrew Melnik Augustin Harter Christian Limberg Krishan Rana Niko Suenderhauf and Helge Ritter. 2021. Critic guided segmentation of rewarding objects in first-person views. (2021). arXiv: 2107.09540 [cs.CV].  Andrew Melnik Augustin Harter Christian Limberg Krishan Rana Niko Suenderhauf and Helge Ritter. 2021. Critic guided segmentation of rewarding objects in first-person views. (2021). arXiv: 2107.09540 [cs.CV].","key":"e_1_3_2_1_10_1","DOI":"10.1007\/978-3-030-87626-5_25"},{"key":"e_1_3_2_1_11_1","volume-title":"International conference on machine learning. PMLR","author":"Mnih Volodymyr","year":"2016","unstructured":"Volodymyr Mnih , Adria Puigdomenech Badia , Mehdi Mirza , Alex Graves , Timothy Lillicrap , Tim Harley , David Silver , and Koray Kavukcuoglu . 2016 . Asynchronous methods for deep reinforcement learning . In International conference on machine learning. PMLR , 1928--1937. Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. In International conference on machine learning. PMLR, 1928--1937."},{"doi-asserted-by":"crossref","unstructured":"Alejandro Newell Kaiyu Yang and Jia Deng. 2016. Stacked hourglass networks for human pose estimation. (2016). arXiv: 1603.06937 [cs.CV].  Alejandro Newell Kaiyu Yang and Jia Deng. 2016. Stacked hourglass networks for human pose estimation. (2016). arXiv: 1603.06937 [cs.CV].","key":"e_1_3_2_1_12_1","DOI":"10.1007\/978-3-319-46484-8_29"},{"key":"e_1_3_2_1_13_1","volume-title":"Markov Decision Processes: Discrete Stochastic Dynamic Programming","author":"Puterman Martin L.","year":"1977","unstructured":"Martin L. Puterman . 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming . ( 1 st ed.). John Wiley & Sons, Inc. , New York, NY, USA . isbn: 04716 1977 9. Martin L. Puterman. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. (1st ed.). John Wiley & Sons, Inc., New York, NY, USA. isbn: 0471619779.","edition":"1"},{"key":"e_1_3_2_1_14_1","volume-title":"Proceedings of the NeurIPS 2019 Competition and Demonstration Track (Proceedings of Machine Learning Research). Hugo Jair Escalante and Raia Hadsell, (Eds.)","volume":"123","author":"Scheller Christian","year":"2020","unstructured":"Christian Scheller , Yanick Schraner , and Manfred Vogel . 2020 . Sample efficient reinforcement learning through learning from demonstrations in minecraft . In Proceedings of the NeurIPS 2019 Competition and Demonstration Track (Proceedings of Machine Learning Research). Hugo Jair Escalante and Raia Hadsell, (Eds.) Vol. 123 . PMLR, (Aug. 2020), 67--76. https:\/\/proceedings.mlr.press\/v123\/s cheller20a.html. Christian Scheller, Yanick Schraner, and Manfred Vogel. 2020. Sample efficient reinforcement learning through learning from demonstrations in minecraft. In Proceedings of the NeurIPS 2019 Competition and Demonstration Track (Proceedings of Machine Learning Research). Hugo Jair Escalante and Raia Hadsell, (Eds.) Vol. 123. PMLR, (Aug. 2020), 67--76. https:\/\/proceedings.mlr.press\/v123\/s cheller20a.html."},{"key":"e_1_3_2_1_15_1","volume-title":"International Conference on Machine Learning. PMLR, 9870--9879","author":"Stooke Adam","year":"2021","unstructured":"Adam Stooke , Kimin Lee , Pieter Abbeel , and Michael Laskin . 2021 . Decoupling representation learning from reinforcement learning . In International Conference on Machine Learning. PMLR, 9870--9879 . Adam Stooke, Kimin Lee, Pieter Abbeel, and Michael Laskin. 2021. Decoupling representation learning from reinforcement learning. In International Conference on Machine Learning. PMLR, 9870--9879."},{"key":"e_1_3_2_1_16_1","volume-title":"A unified analysis of valuefunction- based reinforcement-learning algorithms. Neural computation, 11, 8","author":"Szepesv\u00e1ri Csaba","year":"2017","unstructured":"Csaba Szepesv\u00e1ri and Michael L Littman . 1999. A unified analysis of valuefunction- based reinforcement-learning algorithms. Neural computation, 11, 8 , 2017 --2060. Csaba Szepesv\u00e1ri and Michael L Littman. 1999. A unified analysis of valuefunction- based reinforcement-learning algorithms. Neural computation, 11, 8, 2017--2060."},{"unstructured":"Hongyao Tang Zhaopeng Meng HAO Jianye Chen Chen Daniel Graves Dong Li Wulong Liu and Yaodong Yang. 2020. What about taking policy as input of value function: policy-extended value function approximator.  Hongyao Tang Zhaopeng Meng HAO Jianye Chen Chen Daniel Graves Dong Li Wulong Liu and Yaodong Yang. 2020. What about taking policy as input of value function: policy-extended value function approximator.","key":"e_1_3_2_1_17_1"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_18_1","DOI":"10.1109\/TVT.2018.2890773"},{"doi-asserted-by":"publisher","key":"e_1_3_2_1_19_1","DOI":"10.1109\/ICDE51399.2021.00065"},{"unstructured":"Erik Wijmans Abhishek Kadian Ari Morcos Stefan Lee Irfan Essa Devi Parikh Manolis Savva and Dhruv Batra. 2019. Dd-ppo: learning near-perfect pointgoal navigators from 2.5 billion frames. arXiv preprint arXiv:1911.00357.  Erik Wijmans Abhishek Kadian Ari Morcos Stefan Lee Irfan Essa Devi Parikh Manolis Savva and Dhruv Batra. 2019. Dd-ppo: learning near-perfect pointgoal navigators from 2.5 billion frames. arXiv preprint arXiv:1911.00357.","key":"e_1_3_2_1_20_1"},{"doi-asserted-by":"crossref","unstructured":"Tian Zhang Raghu Ramakrishnan and Miron Livny. 1996. Birch: an efficient data clustering method for very large databases. ACM sigmod record 25 2 103--114.  Tian Zhang Raghu Ramakrishnan and Miron Livny. 1996. Birch: an efficient data clustering method for very large databases. ACM sigmod record 25 2 103--114.","key":"e_1_3_2_1_21_1","DOI":"10.1145\/235968.233324"}],"event":{"sponsor":["SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGIR ACM Special Interest Group on Information Retrieval"],"acronym":"CIKM '22","name":"CIKM '22: The 31st ACM International Conference on Information and Knowledge Management","location":"Atlanta GA USA"},"container-title":["Proceedings of the 31st ACM International Conference on Information &amp; Knowledge Management"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3511808.3557616","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3511808.3557616","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:51:09Z","timestamp":1750182669000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3511808.3557616"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,17]]},"references-count":21,"alternative-id":["10.1145\/3511808.3557616","10.1145\/3511808"],"URL":"https:\/\/doi.org\/10.1145\/3511808.3557616","relation":{},"subject":[],"published":{"date-parts":[[2022,10,17]]},"assertion":[{"value":"2022-10-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}