{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T15:44:32Z","timestamp":1772120672933,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":55,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,3,8]],"date-time":"2021-03-08T00:00:00Z","timestamp":1615161600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,3,8]]},"DOI":"10.1145\/3437963.3441764","type":"proceedings-article","created":{"date-parts":[[2021,3,6]],"date-time":"2021-03-06T04:36:17Z","timestamp":1615005377000},"page":"121-129","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":37,"title":["User Response Models to Improve a REINFORCE Recommender System"],"prefix":"10.1145","author":[{"given":"Minmin","family":"Chen","sequence":"first","affiliation":[{"name":"Google, Inc., Mountain View, CA, USA"}]},{"given":"Bo","family":"Chang","sequence":"additional","affiliation":[{"name":"Google, Inc., Mountain View, CA, USA"}]},{"given":"Can","family":"Xu","sequence":"additional","affiliation":[{"name":"Google, Inc., Mountain View, CA, USA"}]},{"given":"Ed H.","family":"Chi","sequence":"additional","affiliation":[{"name":"Google, Inc., Mountain View, CA, USA"}]}],"member":"320","published-online":{"date-parts":[[2021,3,8]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Constrained policy optimization. arXiv preprint arXiv:1705.10528","author":"Achiam Joshua","year":"2017","unstructured":"Joshua Achiam , David Held , Aviv Tamar , and Pieter Abbeel . 2017. Constrained policy optimization. arXiv preprint arXiv:1705.10528 ( 2017 ). Joshua Achiam, David Held, Aviv Tamar, and Pieter Abbeel. 2017. Constrained policy optimization. 
arXiv preprint arXiv:1705.10528 (2017)."},{"key":"e_1_3_2_1_2_1","volume-title":"Striving for Simplicity in Off-policy Deep Reinforcement Learning. arXiv preprint arXiv:1907.04543","author":"Agarwal Rishabh","year":"2019","unstructured":"Rishabh Agarwal , Dale Schuurmans , and Mohammad Norouzi . 2019. Striving for Simplicity in Off-policy Deep Reinforcement Learning. arXiv preprint arXiv:1907.04543 ( 2019 ). Rishabh Agarwal, Dale Schuurmans, and Mohammad Norouzi. 2019. Striving for Simplicity in Off-policy Deep Reinforcement Learning. arXiv preprint arXiv:1907.04543 (2019)."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3289600.3290999"},{"key":"e_1_3_2_1_4_1","unstructured":"Minmin Chen Ramki Gummadi Chris Harris and Dale Schuurmans. 2019 b. Surrogate Objectives for Batch Policy Optimization in One-step Decision Making. In Advances in Neural Information Processing Systems. 8825--8835.  Minmin Chen Ramki Gummadi Chris Harris and Dale Schuurmans. 2019 b. Surrogate Objectives for Batch Policy Optimization in One-step Decision Making. In Advances in Neural Information Processing Systems. 8825--8835."},{"key":"e_1_3_2_1_5_1","volume-title":"Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555","author":"Chung Junyoung","year":"2014","unstructured":"Junyoung Chung , Caglar Gulcehre , KyungHyun Cho , and Yoshua Bengio . 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 ( 2014 ). Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014)."},{"key":"e_1_3_2_1_6_1","unstructured":"Corinna Cortes Yishay Mansour and Mehryar Mohri. 2010. Learning bounds for importance weighting. In Advances in neural information processing systems .  Corinna Cortes Yishay Mansour and Mehryar Mohri. 2010. 
Learning bounds for importance weighting. In Advances in neural information processing systems ."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2959100.2959190"},{"key":"e_1_3_2_1_8_1","volume-title":"Adapting auxiliary losses using gradient similarity. arXiv preprint arXiv:1812.02224","author":"Du Yunshu","year":"2018","unstructured":"Yunshu Du , Wojciech M Czarnecki , Siddhant M Jayakumar , Razvan Pascanu , and Balaji Lakshminarayanan . 2018. Adapting auxiliary losses using gradient similarity. arXiv preprint arXiv:1812.02224 ( 2018 ). Yunshu Du, Wojciech M Czarnecki, Siddhant M Jayakumar, Razvan Pascanu, and Balaji Lakshminarayanan. 2018. Adapting auxiliary losses using gradient similarity. arXiv preprint arXiv:1812.02224 (2018)."},{"key":"e_1_3_2_1_9_1","volume-title":"Deep reinforcement learning in large discrete action spaces. arXiv preprint arXiv:1512.07679","author":"Dulac-Arnold Gabriel","year":"2015","unstructured":"Gabriel Dulac-Arnold , Richard Evans , Hado van Hasselt , Peter Sunehag , Timothy Lillicrap , Jonathan Hunt , Timothy Mann , Theophane Weber , Thomas Degris , and Ben Coppin . 2015. Deep reinforcement learning in large discrete action spaces. arXiv preprint arXiv:1512.07679 ( 2015 ). Gabriel Dulac-Arnold, Richard Evans, Hado van Hasselt, Peter Sunehag, Timothy Lillicrap, Jonathan Hunt, Timothy Mann, Theophane Weber, Thomas Degris, and Ben Coppin. 2015. Deep reinforcement learning in large discrete action spaces. arXiv preprint arXiv:1512.07679 (2015)."},{"key":"e_1_3_2_1_10_1","article-title":"Tree-based batch mode reinforcement learning","volume":"6","author":"Ernst Damien","year":"2005","unstructured":"Damien Ernst , Pierre Geurts , and Louis Wehenkel . 2005 . Tree-based batch mode reinforcement learning . Journal of Machine Learning Research , Vol. 6 , Apr (2005). Damien Ernst, Pierre Geurts, and Louis Wehenkel. 2005. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research , Vol. 
6, Apr (2005).","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_1_11_1","volume-title":"Off-policy deep reinforcement learning without exploration. arXiv preprint arXiv:1812.02900","author":"Fujimoto Scott","year":"2018","unstructured":"Scott Fujimoto , David Meger , and Doina Precup . 2018. Off-policy deep reinforcement learning without exploration. arXiv preprint arXiv:1812.02900 ( 2018 ). Scott Fujimoto, David Meger, and Doina Precup. 2018. Off-policy deep reinforcement learning without exploration. arXiv preprint arXiv:1812.02900 (2018)."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3159652.3159687"},{"key":"e_1_3_2_1_13_1","volume-title":"World models. arXiv preprint arXiv:1803.10122","author":"Ha David","year":"2018","unstructured":"David Ha and J\u00fcrgen Schmidhuber . 2018. World models. arXiv preprint arXiv:1803.10122 ( 2018 ). David Ha and J\u00fcrgen Schmidhuber. 2018. World models. arXiv preprint arXiv:1803.10122 (2018)."},{"key":"e_1_3_2_1_14_1","volume-title":"International Conference on Machine Learning . 2555--2565","author":"Hafner Danijar","year":"2019","unstructured":"Danijar Hafner , Timothy Lillicrap , Ian Fischer , Ruben Villegas , David Ha , Honglak Lee , and James Davidson . 2019 . Learning latent dynamics for planning from pixels . In International Conference on Machine Learning . 2555--2565 . Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, and James Davidson. 2019. Learning latent dynamics for planning from pixels. In International Conference on Machine Learning . 2555--2565."},{"key":"e_1_3_2_1_15_1","volume-title":"Darla: Improving zero-shot transfer in reinforcement learning. arXiv preprint arXiv:1707.08475","author":"Higgins Irina","year":"2017","unstructured":"Irina Higgins , Arka Pal , Andrei A Rusu , Loic Matthey , Christopher P Burgess , Alexander Pritzel , Matthew Botvinick , Charles Blundell , and Alexander Lerchner . 2017 . 
Darla: Improving zero-shot transfer in reinforcement learning. arXiv preprint arXiv:1707.08475 (2017). Irina Higgins, Arka Pal, Andrei A Rusu, Loic Matthey, Christopher P Burgess, Alexander Pritzel, Matthew Botvinick, Charles Blundell, and Alexander Lerchner. 2017. Darla: Improving zero-shot transfer in reinforcement learning. arXiv preprint arXiv:1707.08475 (2017)."},{"key":"e_1_3_2_1_16_1","volume-title":"Long short-term memory. Neural computation","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber . 1997. Long short-term memory. Neural computation , Vol. 9 , 8 ( 1997 ), 1735--1780. Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory. Neural computation , Vol. 9, 8 (1997), 1735--1780."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219846"},{"key":"e_1_3_2_1_18_1","unstructured":"Eugene Ie Vihan Jain Jing Wang Sanmit Narvekar Ritesh Agarwal Rui Wu Heng-Tze Cheng Tushar Chandra and Craig Boutilier. 2019. SlateQ: A tractable decomposition for reinforcement learning with recommendation sets. (2019).  Eugene Ie Vihan Jain Jing Wang Sanmit Narvekar Ritesh Agarwal Rui Wu Heng-Tze Cheng Tushar Chandra and Craig Boutilier. 2019. SlateQ: A tractable decomposition for reinforcement learning with recommendation sets. (2019)."},{"key":"e_1_3_2_1_19_1","volume-title":"Reinforcement learning with unsupervised auxiliary tasks. In ICLR","author":"Jaderberg Max","year":"2017","unstructured":"Max Jaderberg , Volodymyr Mnih , Wojciech Marian Czarnecki , Tom Schaul, Joel Z Leibo, David Silver, and Koray Kavukcuoglu. 2017 . Reinforcement learning with unsupervised auxiliary tasks. In ICLR . Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z Leibo, David Silver, and Koray Kavukcuoglu. 2017. Reinforcement learning with unsupervised auxiliary tasks. 
In ICLR ."},{"key":"e_1_3_2_1_20_1","volume-title":"On using very large target vocabulary for neural machine translation. arXiv preprint arXiv:1412.2007","author":"Jean S\u00e9bastien","year":"2014","unstructured":"S\u00e9bastien Jean , Kyunghyun Cho , Roland Memisevic , and Yoshua Bengio . 2014. On using very large target vocabulary for neural machine translation. arXiv preprint arXiv:1412.2007 ( 2014 ). S\u00e9bastien Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio. 2014. On using very large target vocabulary for neural machine translation. arXiv preprint arXiv:1412.2007 (2014)."},{"key":"e_1_3_2_1_21_1","volume-title":"Model-based reinforcement learning for Atari. arXiv preprint arXiv:1903.00374","author":"Kaiser Lukasz","year":"2019","unstructured":"Lukasz Kaiser , Mohammad Babaeizadeh , Piotr Milos , Blazej Osinski , Roy H Campbell , Konrad Czechowski , Dumitru Erhan , Chelsea Finn , Piotr Kozakowski , Sergey Levine , et al. 2019 . Model-based reinforcement learning for Atari. arXiv preprint arXiv:1903.00374 (2019). Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, et al. 2019. Model-based reinforcement learning for Atari. arXiv preprint arXiv:1903.00374 (2019)."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1329125.1329241"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1611835114"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2009.263"},{"key":"e_1_3_2_1_25_1","volume-title":"Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction. arXiv preprint arXiv:1906.00949","author":"Kumar Aviral","year":"2019","unstructured":"Aviral Kumar , Justin Fu , George Tucker , and Sergey Levine . 2019. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction. arXiv preprint arXiv:1906.00949 ( 2019 ). Aviral Kumar, Justin Fu, George Tucker, and Sergey Levine. 2019. 
Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction. arXiv preprint arXiv:1906.00949 (2019)."},{"key":"e_1_3_2_1_26_1","volume-title":"Reinforcement learning","author":"Lange Sascha","unstructured":"Sascha Lange , Thomas Gabel , and Martin Riedmiller . 2012. Batch reinforcement learning . In Reinforcement learning . Springer , 45--73. Sascha Lange, Thomas Gabel, and Martin Riedmiller. 2012. Batch reinforcement learning. In Reinforcement learning . Springer, 45--73."},{"key":"e_1_3_2_1_27_1","volume-title":"Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643","author":"Levine Sergey","year":"2020","unstructured":"Sergey Levine , Aviral Kumar , George Tucker , and Justin Fu. 2020. Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643 ( 2020 ). Sergey Levine, Aviral Kumar, George Tucker, and Justin Fu. 2020. Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643 (2020)."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1177\/0278364917710318"},{"key":"e_1_3_2_1_29_1","volume-title":"Deep reinforcement learning based recommendation with explicit user-item interactions modeling. arXiv preprint arXiv:1810.12027","author":"Liu Feng","year":"2018","unstructured":"Feng Liu , Ruiming Tang , Xutao Li , Weinan Zhang , Yunming Ye , Haokun Chen , Huifeng Guo , and Yuzhou Zhang . 2018. Deep reinforcement learning based recommendation with explicit user-item interactions modeling. arXiv preprint arXiv:1810.12027 ( 2018 ). Feng Liu, Ruiming Tang, Xutao Li, Weinan Zhang, Yunming Ye, Haokun Chen, Huifeng Guo, and Yuzhou Zhang. 2018. Deep reinforcement learning based recommendation with explicit user-item interactions modeling. arXiv preprint arXiv:1810.12027 (2018)."},{"key":"e_1_3_2_1_30_1","unstructured":"Shikun Liu Andrew Davison and Edward Johns. 2019. 
Self-supervised generalisation with meta auxiliary learning. In Advances in Neural Information Processing Systems. 1679--1689.  Shikun Liu Andrew Davison and Edward Johns. 2019. Self-supervised generalisation with meta auxiliary learning. In Advances in Neural Information Processing Systems. 1679--1689."},{"key":"e_1_3_2_1_31_1","unstructured":"Taylor Mordan Nicolas Thome Gilles Henaff and Matthieu Cord. 2018. Revisiting multi-task learning with rock: a deep residual auxiliary block for visual detection. In Advances in Neural Information Processing Systems. 1310--1322.  Taylor Mordan Nicolas Thome Gilles Henaff and Matthieu Cord. 2018. Revisiting multi-task learning with rock: a deep residual auxiliary block for visual detection. In Advances in Neural Information Processing Systems. 1310--1322."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2019.01.012"},{"key":"e_1_3_2_1_33_1","volume-title":"Progressive neural networks. arXiv preprint arXiv:1606.04671","author":"Rusu Andrei A","year":"2016","unstructured":"Andrei A Rusu , Neil C Rabinowitz , Guillaume Desjardins , Hubert Soyer , James Kirkpatrick , Koray Kavukcuoglu , Razvan Pascanu , and Raia Hadsell . 2016. Progressive neural networks. arXiv preprint arXiv:1606.04671 ( 2016 ). Andrei A Rusu, Neil C Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, and Raia Hadsell. 2016. Progressive neural networks. arXiv preprint arXiv:1606.04671 (2016)."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/371920.372071"},{"key":"e_1_3_2_1_35_1","volume-title":"International Conference on Machine Learning . 1889--1897","author":"Schulman John","year":"2015","unstructured":"John Schulman , Sergey Levine , Pieter Abbeel , Michael Jordan , and Philipp Moritz . 2015 . Trust region policy optimization . In International Conference on Machine Learning . 1889--1897 . 
John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. 2015. Trust region policy optimization. In International Conference on Machine Learning . 1889--1897."},{"key":"e_1_3_2_1_36_1","volume-title":"Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347","author":"Schulman John","year":"2017","unstructured":"John Schulman , Filip Wolski , Prafulla Dhariwal , Alec Radford , and Oleg Klimov . 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 ( 2017 ). John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2018.8462891"},{"key":"e_1_3_2_1_38_1","article-title":"An MDP-based recommender system","volume":"6","author":"Shani Guy","year":"2005","unstructured":"Guy Shani , David Heckerman , and Ronen I Brafman . 2005 . An MDP-based recommender system . Journal of Machine Learning Research , Vol. 6 , Sep (2005). Guy Shani, David Heckerman, and Ronen I Brafman. 2005. An MDP-based recommender system. Journal of Machine Learning Research , Vol. 6, Sep (2005).","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_1_39_1","volume-title":"Mastering the game of Go with deep neural networks and tree search. Nature","author":"Silver David","year":"2016","unstructured":"David Silver , Aja Huang , Chris J Maddison , Arthur Guez , Laurent Sifre , George Van Den Driessche , Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016 . Mastering the game of Go with deep neural networks and tree search. Nature , Vol. 529 , 7587 (2016), 484. David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. 2016. 
Mastering the game of Go with deep neural networks and tree search. Nature , Vol. 529, 7587 (2016), 484."},{"key":"e_1_3_2_1_40_1","volume-title":"Mastering the game of Go without human knowledge. Nature","author":"Silver David","year":"2017","unstructured":"David Silver , Julian Schrittwieser , Karen Simonyan , Ioannis Antonoglou , Aja Huang , Arthur Guez , Thomas Hubert , Lucas Baker , Matthew Lai , Adrian Bolton , et al. 2017 . Mastering the game of Go without human knowledge. Nature , Vol. 550 , 7676 (2017), 354--359. David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. 2017. Mastering the game of Go without human knowledge. Nature , Vol. 550, 7676 (2017), 354--359."},{"key":"e_1_3_2_1_41_1","volume-title":"Curl: Contrastive unsupervised representations for reinforcement learning. arXiv preprint arXiv:2004.04136","author":"Srinivas Aravind","year":"2020","unstructured":"Aravind Srinivas , Michael Laskin , and Pieter Abbeel . 2020 . Curl: Contrastive unsupervised representations for reinforcement learning. arXiv preprint arXiv:2004.04136 (2020). Aravind Srinivas, Michael Laskin, and Pieter Abbeel. 2020. Curl: Contrastive unsupervised representations for reinforcement learning. arXiv preprint arXiv:2004.04136 (2020)."},{"key":"e_1_3_2_1_42_1","volume-title":"Reinforcement learning: An introduction","author":"Sutton Richard S","year":"1998","unstructured":"Richard S Sutton and Andrew G Barto . 1998 . Reinforcement learning: An introduction . MIT Press. Richard S Sutton and Andrew G Barto. 1998. Reinforcement learning: An introduction. MIT Press."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.5555\/2789272.2886805"},{"key":"e_1_3_2_1_44_1","unstructured":"Adith Swaminathan and Thorsten Joachims. 2015b. The self-normalized estimator for counterfactual learning. In Advances in Neural Information Processing Systems .  Adith Swaminathan and Thorsten Joachims. 2015b. 
The self-normalized estimator for counterfactual learning. In Advances in Neural Information Processing Systems ."},{"key":"e_1_3_2_1_45_1","unstructured":"Philip Thomas and Emma Brunskill. 2016. Data-efficient off-policy policy evaluation for reinforcement learning. In ICML . 2139--2148.  Philip Thomas and Emma Brunskill. 2016. Data-efficient off-policy policy evaluation for reinforcement learning. In ICML . 2139--2148."},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2017-1118"},{"key":"e_1_3_2_1_47_1","volume-title":"Learning Longer-term Dependencies in RNNs with Auxiliary Losses. In International Conference on Machine Learning. 4965--4974","author":"Trinh Trieu","year":"2018","unstructured":"Trieu Trinh , Andrew Dai , Thang Luong , and Quoc Le . 2018 . Learning Longer-term Dependencies in RNNs with Auxiliary Losses. In International Conference on Machine Learning. 4965--4974 . Trieu Trinh, Andrew Dai, Thang Luong, and Quoc Le. 2018. Learning Longer-term Dependencies in RNNs with Auxiliary Losses. In International Conference on Machine Learning. 4965--4974."},{"key":"e_1_3_2_1_48_1","volume-title":"Machine learning","author":"Watkins Christopher JCH","year":"1992","unstructured":"Christopher JCH Watkins and Peter Dayan . 1992. Q-learning. Machine learning , Vol. 8 , 3--4 ( 1992 ), 279--292. Christopher JCH Watkins and Peter Dayan. 1992. Q-learning. Machine learning , Vol. 8, 3--4 (1992), 279--292."},{"key":"e_1_3_2_1_49_1","volume-title":"Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning","author":"Williams Ronald J","year":"1992","unstructured":"Ronald J Williams . 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning , Vol. 8 , 3--4 ( 1992 ), 229--256. Ronald J Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning , Vol. 
8, 3--4 (1992), 229--256."},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3316481"},{"key":"e_1_3_2_1_51_1","volume-title":"International conference on machine learning . 612--621","author":"Zhang Yuting","year":"2016","unstructured":"Yuting Zhang , Kibok Lee , and Honglak Lee . 2016 . Augmenting supervised neural networks with unsupervised objectives for large-scale image classification . In International conference on machine learning . 612--621 . Yuting Zhang, Kibok Lee, and Honglak Lee. 2016. Augmenting supervised neural networks with unsupervised objectives for large-scale image classification. In International conference on machine learning . 612--621."},{"key":"e_1_3_2_1_52_1","volume-title":"\"Deep reinforcement learning for search, recommendation, and online advertising: a survey\" by Xiangyu Zhao, Long Xia, Jiliang Tang, and Dawei Yin with Martin Vesely as coordinator. ACM SIGWEB Newsletter Spring","author":"Zhao Xiangyu","year":"2019","unstructured":"Xiangyu Zhao , Long Xia , Jiliang Tang , and Dawei Yin . 2019 a. \"Deep reinforcement learning for search, recommendation, and online advertising: a survey\" by Xiangyu Zhao , Long Xia, Jiliang Tang, and Dawei Yin with Martin Vesely as coordinator. ACM SIGWEB Newsletter Spring ( 2019 ), 1--15. Xiangyu Zhao, Long Xia, Jiliang Tang, and Dawei Yin. 2019 a. \"Deep reinforcement learning for search, recommendation, and online advertising: a survey\" by Xiangyu Zhao, Long Xia, Jiliang Tang, and Dawei Yin with Martin Vesely as coordinator. ACM SIGWEB Newsletter Spring (2019), 1--15."},{"key":"e_1_3_2_1_53_1","volume-title":"Model-based reinforcement learning for whole-chain recommendations. arXiv preprint arXiv:1902.03987","author":"Zhao Xiangyu","year":"2019","unstructured":"Xiangyu Zhao , Long Xia , Dawei Yin , and Jiliang Tang . 2019 b. Model-based reinforcement learning for whole-chain recommendations. arXiv preprint arXiv:1902.03987 ( 2019 ). Xiangyu Zhao, Long Xia, Dawei Yin, and Jiliang Tang. 2019 b. Model-based reinforcement learning for whole-chain recommendations. 
arXiv preprint arXiv:1902.03987 (2019)."},{"key":"e_1_3_2_1_54_1","volume-title":"Deep reinforcement learning for list-wise recommendations. arXiv preprint arXiv:1801.00209","author":"Zhao Xiangyu","year":"2017","unstructured":"Xiangyu Zhao , Liang Zhang , Long Xia , Zhuoye Ding , Dawei Yin , and Jiliang Tang . 2017. Deep reinforcement learning for list-wise recommendations. arXiv preprint arXiv:1801.00209 ( 2017 ). Xiangyu Zhao, Liang Zhang, Long Xia, Zhuoye Ding, Dawei Yin, and Jiliang Tang. 2017. Deep reinforcement learning for list-wise recommendations. arXiv preprint arXiv:1801.00209 (2017)."},{"key":"e_1_3_2_1_55_1","volume-title":"DRN: A deep reinforcement learning framework for news recommendation","author":"Zheng Guanjie","year":"2018","unstructured":"Guanjie Zheng , Fuzheng Zhang , Zihan Zheng , Yang Xiang , Nicholas Jing Yuan , Xing Xie, and Zhenhui Li. 2018 . DRN: A deep reinforcement learning framework for news recommendation. (2018), 167--176. Guanjie Zheng, Fuzheng Zhang, Zihan Zheng, Yang Xiang, Nicholas Jing Yuan, Xing Xie, and Zhenhui Li. 2018. DRN: A deep reinforcement learning framework for news recommendation. 
(2018), 167--176."}],"event":{"name":"WSDM '21: The Fourteenth ACM International Conference on Web Search and Data Mining","location":"Virtual Event Israel","acronym":"WSDM '21","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data","SIGIR ACM Special Interest Group on Information Retrieval"]},"container-title":["Proceedings of the 14th ACM International Conference on Web Search and Data Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3437963.3441764","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3437963.3441764","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:47:35Z","timestamp":1750193255000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3437963.3441764"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3,8]]},"references-count":55,"alternative-id":["10.1145\/3437963.3441764","10.1145\/3437963"],"URL":"https:\/\/doi.org\/10.1145\/3437963.3441764","relation":{},"subject":[],"published":{"date-parts":[[2021,3,8]]},"assertion":[{"value":"2021-03-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}