{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T16:06:03Z","timestamp":1776096363324,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":54,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,8,4]],"date-time":"2023-08-04T00:00:00Z","timestamp":1691107200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,8,6]]},"DOI":"10.1145\/3580305.3599473","type":"proceedings-article","created":{"date-parts":[[2023,8,4]],"date-time":"2023-08-04T18:10:58Z","timestamp":1691172658000},"page":"2874-2884","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":20,"title":["PrefRec: Recommender Systems with Human Preferences for Reinforcing Long-term User Engagement"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3490-1088","authenticated-orcid":false,"given":"Wanqi","family":"Xue","sequence":"first","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6451-9299","authenticated-orcid":false,"given":"Qingpeng","family":"Cai","sequence":"additional","affiliation":[{"name":"Kuaishou Technology, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9340-0366","authenticated-orcid":false,"given":"Zhenghai","family":"Xue","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7153-1878","authenticated-orcid":false,"given":"Shuo","family":"Sun","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1440-911X","authenticated-orcid":false,"given":"Shuchang","family":"Liu","sequence":"additional","affiliation":[{"name":"Kuaishou Technology, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0424-9658","authenticated-orcid":false,"given":"Dong","family":"Zheng","sequence":"additional","affiliation":[{"name":"Kuaishou Technology, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9266-0780","authenticated-orcid":false,"given":"Peng","family":"Jiang","sequence":"additional","affiliation":[{"name":"Kuaishou Technology, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3636-3618","authenticated-orcid":false,"given":"Kun","family":"Gai","sequence":"additional","affiliation":[{"name":"Unaffiliated, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7064-7438","authenticated-orcid":false,"given":"Bo","family":"An","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2023,8,4]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Proc. of the 1st International Workshop on Novelty and Diversity in Recommender Systems. 3--10","author":"Adomavicius Gediminas","year":"2011","unstructured":"Gediminas Adomavicius and YoungOk Kwon . 2011 . Maximizing aggregate recommendation diversity: A graph-theoretic approach . In Proc. of the 1st International Workshop on Novelty and Diversity in Recommender Systems. 3--10 . Gediminas Adomavicius and YoungOk Kwon. 2011. Maximizing aggregate recommendation diversity: A graph-theoretic approach. In Proc. of the 1st International Workshop on Novelty and Diversity in Recommender Systems. 3--10."},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380281"},{"key":"e_1_3_2_2_3_1","unstructured":"Xueying Bai Jian Guan and Hongning Wang. 2019. A model-based reinforcement learning with adversarial training for online recommendation. In NeurIPS. 10734--10745. Xueying Bai Jian Guan and Hongning Wang. 2019. A model-based reinforcement learning with adversarial training for online recommendation. In NeurIPS. 10734--10745."},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.2307\/2334029"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3543873.3584640"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3543507.3583259"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3437963.3441764"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3460231.3474236"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3220122"},{"key":"e_1_3_2_2_10_1","unstructured":"Konstantina Christakopoulou Can Xu Sai Zhang Sriraj Badam Trevor Potter Daniel Li Hao Wan Xinyang Yi Ya Le Chris Berg etal 2022. Reward shaping for user satisfaction in a REINFORCE recommender. arXiv preprint arXiv:2209.15166 (2022). Konstantina Christakopoulou Can Xu Sai Zhang Sriraj Badam Trevor Potter Daniel Li Hao Wan Xinyang Yi Ya Le Chris Berg et al. 2022. Reward shaping for user satisfaction in a REINFORCE recommender. arXiv preprint arXiv:2209.15166 (2022)."},{"key":"e_1_3_2_2_11_1","volume-title":"Nature","volume":"590","author":"Ecoffet Adrien","year":"2021","unstructured":"Adrien Ecoffet , Joost Huizinga , Joel Lehman , Kenneth O Stanley , and Jeff Clune . 2021 . First return, then explore . Nature , Vol. 590 , 7847 (2021), 580--586. Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O Stanley, and Jeff Clune. 2021. First return, then explore. Nature, Vol. 590, 7847 (2021), 580--586."},{"key":"e_1_3_2_2_12_1","unstructured":"Mehrdad Farajtabar Yinlam Chow and Mohammad Ghavamzadeh. 2018. More robust doubly robust off-policy evaluation. In ICML. 1447--1456. Mehrdad Farajtabar Yinlam Chow and Mohammad Ghavamzadeh. 2018. More robust doubly robust off-policy evaluation. In ICML. 1447--1456."},{"key":"e_1_3_2_2_13_1","volume-title":"Nature","volume":"610","author":"Fawzi Alhussein","year":"2022","unstructured":"Alhussein Fawzi , Matej Balog , Aja Huang , Thomas Hubert , Bernardino Romera-Paredes , Mohammadamin Barekatain , Alexander Novikov , Francisco J. R. Ruiz , Julian Schrittwieser , Grzegorz Swirszcz , David Silver , Demis Hassabis , and Pushmeet Kohli . 2022 . Discovering faster matrix multiplication algorithms with reinforcement learning . Nature , Vol. 610 , 7930 (2022), 47--53. Alhussein Fawzi, Matej Balog, Aja Huang, Thomas Hubert, Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Francisco J. R. Ruiz, Julian Schrittwieser, Grzegorz Swirszcz, David Silver, Demis Hassabis, and Pushmeet Kohli. 2022. Discovering faster matrix multiplication algorithms with reinforcement learning. Nature, Vol. 610, 7930 (2022), 47--53."},{"key":"e_1_3_2_2_14_1","volume-title":"NeurIPS","volume":"34","author":"Fujimoto Scott","year":"2021","unstructured":"Scott Fujimoto and Shixiang Shane Gu . 2021 . A minimalist approach to offline reinforcement learning . NeurIPS , Vol. 34 (2021). Scott Fujimoto and Shixiang Shane Gu. 2021. A minimalist approach to offline reinforcement learning. NeurIPS, Vol. 34 (2021)."},{"key":"e_1_3_2_2_15_1","unstructured":"Scott Fujimoto Herke Hoof and David Meger. 2018. Addressing function approximation error in actor-critic methods. In ICML. 1587--1596. Scott Fujimoto Herke Hoof and David Meger. 2018. Addressing function approximation error in actor-critic methods. In ICML. 1587--1596."},{"key":"e_1_3_2_2_16_1","unstructured":"Scott Fujimoto David Meger and Doina Precup. 2019. Off-policy deep reinforcement learning without exploration. In ICML. 2052--2062. Scott Fujimoto David Meger and Doina Precup. 2019. Off-policy deep reinforcement learning without exploration. In ICML. 2052--2062."},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3159652.3159687"},{"key":"e_1_3_2_2_18_1","volume-title":"International conference on machine learning. PMLR","author":"Haarnoja Tuomas","year":"2018","unstructured":"Tuomas Haarnoja , Aurick Zhou , Pieter Abbeel , and Sergey Levine . 2018 . Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor . In International conference on machine learning. PMLR , 1861--1870. Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. 2018. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning. PMLR, 1861--1870."},{"key":"e_1_3_2_2_19_1","unstructured":"Bal\u00e1zs Hidasi Alexandros Karatzoglou Linas Baltrunas and Domonkos Tikk. 2016. Session-based recommendations with recurrent neural networks. In ICLR. Bal\u00e1zs Hidasi Alexandros Karatzoglou Linas Baltrunas and Domonkos Tikk. 2016. Session-based recommendations with recurrent neural networks. In ICLR."},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"crossref","unstructured":"Eugene Ie Vihan Jain Jing Wang Sanmit Narvekar Ritesh Agarwal Rui Wu Heng-Tze Cheng Tushar Chandra and Craig Boutilier. 2019. SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets. In IJCAI Sarit Kraus (Ed.). 2592--2599. Eugene Ie Vihan Jain Jing Wang Sanmit Narvekar Ritesh Agarwal Rui Wu Heng-Tze Cheng Tushar Chandra and Craig Boutilier. 2019. SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets. In IJCAI Sarit Kraus (Ed.). 2592--2599.","DOI":"10.24963\/ijcai.2019\/360"},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"crossref","unstructured":"Luo Ji Qi Qin Bingqing Han and Hongxia Yang. 2021. Reinforcement learning to optimize lifetime value in cold-start recommendation. In CIKM. ACM 782--791. Luo Ji Qi Qin Bingqing Han and Hongxia Yang. 2021. Reinforcement learning to optimize lifetime value in cold-start recommendation. In CIKM. ACM 782--791.","DOI":"10.1145\/3459637.3482292"},{"key":"e_1_3_2_2_22_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba . 2014 . Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014). Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1257\/jep.15.4.143"},{"key":"e_1_3_2_2_24_1","unstructured":"Ilya Kostrikov Ashvin Nair and Sergey Levine. 2022. Offline reinforcement learning with in-sample Q-Learning. In ICLR. Ilya Kostrikov Ashvin Nair and Sergey Levine. 2022. Offline reinforcement learning with in-sample Q-Learning. In ICLR."},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/2946645.2946684"},{"key":"e_1_3_2_2_26_1","volume-title":"Continuous control with deep reinforcement learning. ICLR","author":"Lillicrap Timothy P","year":"2016","unstructured":"Timothy P Lillicrap , Jonathan J Hunt , Alexander Pritzel , Nicolas Heess , Tom Erez , Yuval Tassa , David Silver , and Daan Wierstra . 2016. Continuous control with deep reinforcement learning. ICLR ( 2016 ). Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2016. Continuous control with deep reinforcement learning. ICLR (2016)."},{"key":"e_1_3_2_2_27_1","volume-title":"Doina Precup, and Adith Swaminathan.","author":"Mazoure Bogdan","year":"2021","unstructured":"Bogdan Mazoure , Paul Mineiro , Pavithra Srinath , Reza Sharifi Sedeh , Doina Precup, and Adith Swaminathan. 2021 . Improving long-term metrics in recommendation systems using short-horizon offline RL. arXiv preprint arXiv:2106.00589 (2021). Bogdan Mazoure, Paul Mineiro, Pavithra Srinath, Reza Sharifi Sedeh, Doina Precup, and Adith Swaminathan. 2021. Improving long-term metrics in recommendation systems using short-horizon offline RL. arXiv preprint arXiv:2106.00589 (2021)."},{"key":"e_1_3_2_2_28_1","volume-title":"Proceedings of the Seventeenth International Conference on Machine Learning. 663--670","author":"Andrew","unstructured":"Andrew Y. Ng and Stuart J. Russell. 2000. Algorithms for Inverse Reinforcement Learning . In Proceedings of the Seventeenth International Conference on Machine Learning. 663--670 . Andrew Y. Ng and Stuart J. Russell. 2000. Algorithms for Inverse Reinforcement Learning. In Proceedings of the Seventeenth International Conference on Machine Learning. 663--670."},{"key":"e_1_3_2_2_29_1","unstructured":"Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L Wainwright Pamela Mishkin Chong Zhang Sandhini Agarwal Katarina Slama Alex Ray etal 2022. Training language models to follow instructions with human feedback. arXiv preprint arXiv:2203.02155 (2022). Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L Wainwright Pamela Mishkin Chong Zhang Sandhini Agarwal Katarina Slama Alex Ray et al. 2022. Training language models to follow instructions with human feedback. arXiv preprint arXiv:2203.02155 (2022)."},{"key":"e_1_3_2_2_30_1","volume-title":"SURF: Semi-supervised reward learning with data augmentation for feedback-efficient preference-based reinforcement learning. ICLR.","author":"Park Jongjin","year":"2022","unstructured":"Jongjin Park , Younggyo Seo , Jinwoo Shin , Honglak Lee , Pieter Abbeel , and Kimin Lee . 2022 . SURF: Semi-supervised reward learning with data augmentation for feedback-efficient preference-based reinforcement learning. ICLR. Jongjin Park, Younggyo Seo, Jinwoo Shin, Honglak Lee, Pieter Abbeel, and Kimin Lee. 2022. SURF: Semi-supervised reward learning with data augmentation for feedback-efficient preference-based reinforcement learning. ICLR."},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/1046920.1088715"},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"crossref","unstructured":"Jing-Cheng Shi Yang Yu Qing Da Shi-Yong Chen and Anxiang Zeng. 2019. Virtual-Taobao: Virtualizing real-world online retail environment for Reinforcement Learning. In AAAI. 4902--4909. Jing-Cheng Shi Yang Yu Qing Da Shi-Yong Chen and Anxiang Zeng. 2019. Virtual-Taobao: Virtualizing real-world online retail environment for Reinforcement Learning. In AAAI. 4902--4909.","DOI":"10.1609\/aaai.v33i01.33014902"},{"key":"e_1_3_2_2_33_1","volume-title":"Science","volume":"362","author":"Silver David","year":"2018","unstructured":"David Silver , Thomas Hubert , Julian Schrittwieser , Ioannis Antonoglou , Matthew Lai , Arthur Guez , Marc Lanctot , Laurent Sifre , Dharshan Kumaran , Thore Graepel , 2018 . A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play . Science , Vol. 362 , 6419 (2018), 1140--1144. David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, et al. 2018. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, Vol. 362, 6419 (2018), 1140--1144."},{"key":"e_1_3_2_2_34_1","volume-title":"Nature","volume":"550","author":"Silver David","year":"2017","unstructured":"David Silver , Julian Schrittwieser , Karen Simonyan , Ioannis Antonoglou , Aja Huang , Arthur Guez , Thomas Hubert , Lucas Baker , Matthew Lai , Adrian Bolton , 2017 . Mastering the game of Go without human knowledge . Nature , Vol. 550 , 7676 (2017), 354--359. David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. 2017. Mastering the game of Go without human knowledge. Nature, Vol. 550, 7676 (2017), 354--359."},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3357384.3357895"},{"key":"e_1_3_2_2_36_1","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton Richard S","year":"2018","unstructured":"Richard S Sutton and Andrew G Barto . 2018 . Reinforcement Learning: An Introduction . MIT Press . Richard S Sutton and Andrew G Barto. 2018. Reinforcement Learning: An Introduction. MIT Press."},{"key":"e_1_3_2_2_37_1","volume-title":"NeurIPS","volume":"28","author":"Swaminathan Adith","year":"2015","unstructured":"Adith Swaminathan and Thorsten Joachims . 2015 . The self-normalized estimator for counterfactual learning . NeurIPS , Vol. 28 (2015). Adith Swaminathan and Thorsten Joachims. 2015. The self-normalized estimator for counterfactual learning. NeurIPS, Vol. 28 (2015)."},{"key":"e_1_3_2_2_38_1","volume-title":"Filip De Turck, and Pieter Abbeel","author":"Tang Haoran","year":"2017","unstructured":"Haoran Tang , Rein Houthooft , Davis Foote , Adam Stooke , Xi Chen , Yan Duan , John Schulman , Filip De Turck, and Pieter Abbeel . 2017 . Exploration : A study of count-based exploration for deep reinforcement learning. In NeurIPS. 2753--2762. Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, and Pieter Abbeel. 2017. Exploration: A study of count-based exploration for deep reinforcement learning. In NeurIPS. 2753--2762."},{"key":"e_1_3_2_2_39_1","volume-title":"Nature","volume":"575","author":"Vinyals Oriol","year":"2019","unstructured":"Oriol Vinyals , Igor Babuschkin , Wojciech M Czarnecki , Micha\u00ebl Mathieu , Andrew Dudzik , Junyoung Chung , David H Choi , Richard Powell , Timo Ewalds , Petko Georgiev , 2019 . Grandmaster level in StarCraft II using multi-agent reinforcement learning . Nature , Vol. 575 , 7782 (2019), 350--354. Oriol Vinyals, Igor Babuschkin, Wojciech M Czarnecki, Micha\u00ebl Mathieu, Andrew Dudzik, Junyoung Chung, David H Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et al. 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, Vol. 575, 7782 (2019), 350--354."},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"crossref","unstructured":"Shoujin Wang Liang Hu Yan Wang Longbing Cao Quan Z. Sheng and Mehmet Orgun. 2019. Sequential recommender systems: Challenges progress and prospects. In IJCAI. 6332--6338. Shoujin Wang Liang Hu Yan Wang Longbing Cao Quan Z. Sheng and Mehmet Orgun. 2019. Sequential recommender systems: Challenges progress and prospects. In IJCAI. 6332--6338.","DOI":"10.24963\/ijcai.2019\/883"},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539073"},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3132847.3133025"},{"key":"e_1_3_2_2_43_1","unstructured":"Wanqi Xue Bo An Shuicheng Yan and Zhongwen Xu. 2023 a. Reinforcement Learning from Diverse Human Preferences. arxiv: 2301.11774 Wanqi Xue Bo An Shuicheng Yan and Zhongwen Xu. 2023 a. Reinforcement Learning from Diverse Human Preferences. arxiv: 2301.11774"},{"key":"e_1_3_2_2_44_1","volume-title":"ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor. In The Eleventh International Conference on Learning Representations.","author":"Xue Wanqi","year":"2023","unstructured":"Wanqi Xue , Qingpeng Cai , Ruohan Zhan , Dong Zheng , Peng Jiang , Kun Gai , and Bo An . 2023 b . ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor. In The Eleventh International Conference on Learning Representations. Wanqi Xue, Qingpeng Cai, Ruohan Zhan, Dong Zheng, Peng Jiang, Kun Gai, and Bo An. 2023 b. ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor. In The Eleventh International Conference on Learning Representations."},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539092"},{"key":"e_1_3_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539040"},{"key":"e_1_3_2_2_47_1","volume-title":"SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 41--50.","author":"Zhang Xiao","unstructured":"Xiao Zhang , Haonan Jia , Hanjing Su , Wenhan Wang , Jun Xu , and Ji-Rong Wen . 2021. Counterfactual Reward Modification for Streaming Recommendation with Delayed Feedback . In SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 41--50. Xiao Zhang, Haonan Jia, Hanjing Su, Wenhan Wang, Jun Xu, and Ji-Rong Wen. 2021. Counterfactual Reward Modification for Streaming Recommendation with Delayed Feedback. In SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 41--50."},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401170"},{"key":"e_1_3_2_2_49_1","volume-title":"Rabbit Holes and Taste Distortion: Distribution-Aware Recommendation with Evolving Interests. In WWW '21: The Web Conference","author":"Zhao Xing","year":"2021","unstructured":"Xing Zhao , Ziwei Zhu , and James Caverlee . 2021 . Rabbit Holes and Taste Distortion: Distribution-Aware Recommendation with Evolving Interests. In WWW '21: The Web Conference 2021. 888--899. Xing Zhao, Ziwei Zhu, and James Caverlee. 2021. Rabbit Holes and Taste Distortion: Distribution-Aware Recommendation with Evolving Interests. In WWW '21: The Web Conference 2021. 888--899."},{"key":"e_1_3_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403329"},{"key":"e_1_3_2_2_51_1","volume-title":"Xing Xie, and Zhenhui Li.","author":"Zheng Guanjie","year":"2018","unstructured":"Guanjie Zheng , Fuzheng Zhang , Zihan Zheng , Yang Xiang , Nicholas Jing Yuan , Xing Xie, and Zhenhui Li. 2018 . DRN : A deep reinforcement learning framework for news recommendation. In WWW. 167--176. Guanjie Zheng, Fuzheng Zhang, Zihan Zheng, Yang Xiang, Nicholas Jing Yuan, Xing Xie, and Zhenhui Li. 2018. DRN: A deep reinforcement learning framework for news recommendation. In WWW. 167--176."},{"key":"e_1_3_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1000488107"},{"key":"e_1_3_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330668"},{"key":"e_1_3_2_2_54_1","volume-title":"Model Based Reinforcement Learning for Atari. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=S1xCPJHtDB","author":"Kaiser \u0141ukasz","year":"2020","unstructured":"\u0141ukasz Kaiser , Mohammad Babaeizadeh , Piotr Mi\u0140os , B\u0140a\u017cej Osi\u0144ski , Roy H Campbell , Konrad Czechowski , Dumitru Erhan , Chelsea Finn , Piotr Kozakowski , Sergey Levine , Afroz Mohiuddin , Ryan Sepassi , George Tucker , and Henryk Michalewski . 2020 . Model Based Reinforcement Learning for Atari. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=S1xCPJHtDB \u0141ukasz Kaiser, Mohammad Babaeizadeh, Piotr Mi\u0140os, B\u0140a\u017cej Osi\u0144ski, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, and Henryk Michalewski. 2020. Model Based Reinforcement Learning for Atari. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=S1xCPJHtDB"}],"event":{"name":"KDD '23: The 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","location":"Long Beach CA USA","acronym":"KDD '23","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data"]},"container-title":["Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580305.3599473","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3580305.3599473","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:37Z","timestamp":1750178257000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580305.3599473"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,4]]},"references-count":54,"alternative-id":["10.1145\/3580305.3599473","10.1145\/3580305"],"URL":"https:\/\/doi.org\/10.1145\/3580305.3599473","relation":{},"subject":[],"published":{"date-parts":[[2023,8,4]]},"assertion":[{"value":"2023-08-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}