{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T15:44:31Z","timestamp":1772120671565,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":68,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,7,6]],"date-time":"2022-07-06T00:00:00Z","timestamp":1657065600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,7,6]]},"DOI":"10.1145\/3477495.3531716","type":"proceedings-article","created":{"date-parts":[[2022,7,7]],"date-time":"2022-07-07T15:12:13Z","timestamp":1657206733000},"page":"2738-2748","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["State Encoders in Reinforcement Learning for Recommendation"],"prefix":"10.1145","author":[{"given":"Jin","family":"Huang","sequence":"first","affiliation":[{"name":"University of Amsterdam, Amsterdam, Netherlands"}]},{"given":"Harrie","family":"Oosterhuis","sequence":"additional","affiliation":[{"name":"Radboud University, Nijmegen, Netherlands"}]},{"given":"Bunyamin","family":"Cetinkaya","sequence":"additional","affiliation":[{"name":"University of Amsterdam, Amsterdam, Netherlands"}]},{"given":"Thijs","family":"Rood","sequence":"additional","affiliation":[{"name":"University of Amsterdam, Amsterdam, Netherlands"}]},{"given":"Maarten","family":"de Rijke","sequence":"additional","affiliation":[{"name":"University of Amsterdam, Amsterdam, Netherlands"}]}],"member":"320","published-online":{"date-parts":[[2022,7,7]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"crossref","unstructured":"Panagiotis Adamopoulos and Alexander Tuzhilin. 2014. 
On Over-Specialization and Concentration Bias of Recommendations: Probabilistic Neighborhood Selection in Collaborative Filtering Systems. In RecSys . ACM 153--160. Panagiotis Adamopoulos and Alexander Tuzhilin. 2014. On Over-Specialization and Concentration Bias of Recommendations: Probabilistic Neighborhood Selection in Collaborative Filtering Systems. In RecSys . ACM 153--160.","DOI":"10.1145\/2645710.2645752"},{"key":"e_1_3_2_2_2_1","volume-title":"Reinforcement Learning based Recommender Systems: A Survey. arXiv preprint arXiv:2101.06286","author":"Afsar M. Mehdi","year":"2021","unstructured":"M. Mehdi Afsar , Trafford Crump , and Behrouz Far . 2021. Reinforcement Learning based Recommender Systems: A Survey. arXiv preprint arXiv:2101.06286 ( 2021 ). M. Mehdi Afsar, Trafford Crump, and Behrouz Far. 2021. Reinforcement Learning based Recommender Systems: A Survey. arXiv preprint arXiv:2101.06286 (2021)."},{"key":"e_1_3_2_2_3_1","unstructured":"Dzmitry Bahdanau Kyunghyun Cho and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In ICLR . Dzmitry Bahdanau Kyunghyun Cho and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In ICLR ."},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/72.279181"},{"key":"e_1_3_2_2_5_1","volume-title":"Simulations in Recommender Systems: An Industry Perspective. arXiv preprint arXiv:2109.06723","author":"Bernardi Lucas","year":"2021","unstructured":"Lucas Bernardi , Sakshi Batra , and Cintia Alicia Bruscantini . 2021. Simulations in Recommender Systems: An Industry Perspective. arXiv preprint arXiv:2109.06723 ( 2021 ). Lucas Bernardi, Sakshi Batra, and Cintia Alicia Bruscantini. 2021. Simulations in Recommender Systems: An Industry Perspective. arXiv preprint arXiv:2109.06723 (2021)."},{"key":"e_1_3_2_2_6_1","volume-title":"Bias and Debias in Recommender System: A Survey and Future Directions. 
arXiv preprint arXiv:2010.03240","author":"Chen Jiawei","year":"2020","unstructured":"Jiawei Chen , Hande Dong , Xiang Wang , Fuli Feng , Meng Wang , and Xiangnan He. 2020. Bias and Debias in Recommender System: A Survey and Future Directions. arXiv preprint arXiv:2010.03240 ( 2020 ). Jiawei Chen, Hande Dong, Xiang Wang, Fuli Feng, Meng Wang, and Xiangnan He. 2020. Bias and Debias in Recommender System: A Survey and Future Directions. arXiv preprint arXiv:2010.03240 (2020)."},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"crossref","unstructured":"Minmin Chen Alex Beutel Paul Covington Sagar Jain Francois Belletti and Ed H Chi. 2019 b. Top-K Off-Policy Correction for a REINFORCE Recommender System. In WSDM . ACM 456--464. Minmin Chen Alex Beutel Paul Covington Sagar Jain Francois Belletti and Ed H Chi. 2019 b. Top-K Off-Policy Correction for a REINFORCE Recommender System. In WSDM . ACM 456--464.","DOI":"10.1145\/3289600.3290999"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"crossref","unstructured":"Ruey-Cheng Chen Qingyao Ai Gaya Jayasinghe and W Bruce Croft. 2019 a. Correcting for Recency Bias in Job Recommendation. In CIKM. ACM 2185--2188. Ruey-Cheng Chen Qingyao Ai Gaya Jayasinghe and W Bruce Croft. 2019 a. Correcting for Recency Bias in Job Recommendation. In CIKM. ACM 2185--2188.","DOI":"10.1145\/3357384.3358131"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"crossref","unstructured":"Shi-Yong Chen Yang Yu Qing Da Jun Tan Hai-Kuan Huang and Hai-Hong Tang. 2018b. Stabilizing Reinforcement Learning in Dynamic Environment with Application to Online Recommendation. In KDD. ACM 1187--1196. Shi-Yong Chen Yang Yu Qing Da Jun Tan Hai-Kuan Huang and Hai-Hong Tang. 2018b. Stabilizing Reinforcement Learning in Dynamic Environment with Application to Online Recommendation. In KDD. ACM 1187--1196.","DOI":"10.1145\/3219819.3220122"},{"key":"e_1_3_2_2_10_1","unstructured":"Xinshi Chen Shuang Li Hui Li Shaohua Jiang Yuan Qi and Le Song. 2019 c. 
Generative Adversarial User Model for Reinforcement Learning based Recommendation System. In ICML . PMLR 1052--1061. Xinshi Chen Shuang Li Hui Li Shaohua Jiang Yuan Qi and Le Song. 2019 c. Generative Adversarial User Model for Reinforcement Learning based Recommendation System. In ICML . PMLR 1052--1061."},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"crossref","unstructured":"Xu Chen Hongteng Xu Yongfeng Zhang Jiaxi Tang Yixin Cao Zheng Qin and Hongyuan Zha. 2018a. Sequential Recommendation with User Memory Networks. In WSDM. ACM 108--116. Xu Chen Hongteng Xu Yongfeng Zhang Jiaxi Tang Yixin Cao Zheng Qin and Hongyuan Zha. 2018a. Sequential Recommendation with User Memory Networks. In WSDM. ACM 108--116.","DOI":"10.1145\/3159652.3159668"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2988450.2988454"},{"key":"e_1_3_2_2_13_1","volume-title":"Deep Reinforcement Learning in Large Discrete Action Spaces. arXiv preprint arXiv:1512.07679","author":"Dulac-Arnold Gabriel","year":"2015","unstructured":"Gabriel Dulac-Arnold , Richard Evans , Hado van Hasselt , Peter Sunehag , Timothy Lillicrap , Jonathan Hunt , Timothy Mann , Theophane Weber , Thomas Degris , and Ben Coppin . 2015. Deep Reinforcement Learning in Large Discrete Action Spaces. arXiv preprint arXiv:1512.07679 ( 2015 ). Gabriel Dulac-Arnold, Richard Evans, Hado van Hasselt, Peter Sunehag, Timothy Lillicrap, Jonathan Hunt, Timothy Mann, Theophane Weber, Thomas Degris, and Ben Coppin. 2015. Deep Reinforcement Learning in Large Discrete Action Spaces. arXiv preprint arXiv:1512.07679 (2015)."},{"key":"e_1_3_2_2_14_1","volume-title":"Eigentaste: A Constant Time Collaborative Filtering Algorithm. information retrieval","author":"Goldberg Ken","year":"2001","unstructured":"Ken Goldberg , Theresa Roeder , Dhruv Gupta , and Chris Perkins . 2001 . Eigentaste: A Constant Time Collaborative Filtering Algorithm. information retrieval , Vol. 4 , 2 (2001), 133--151. 
Ken Goldberg, Theresa Roeder, Dhruv Gupta, and Chris Perkins. 2001. Eigentaste: A Constant Time Collaborative Filtering Algorithm. information retrieval , Vol. 4, 2 (2001), 133--151."},{"key":"e_1_3_2_2_15_1","volume-title":"Converse-Et-Impera: Exploiting Deep Learning and Hierarchical Reinforcement Learning for Conversational Recommender Systems","author":"Greco Claudio","unstructured":"Claudio Greco , Alessandro Suglia , Pierpaolo Basile , and Giovanni Semeraro . 2017. Converse-Et-Impera: Exploiting Deep Learning and Hierarchical Reinforcement Learning for Conversational Recommender Systems . In AIxIA. Springer , 372--386. Claudio Greco, Alessandro Suglia, Pierpaolo Basile, and Giovanni Semeraro. 2017. Converse-Et-Impera: Exploiting Deep Learning and Hierarchical Reinforcement Learning for Conversational Recommender Systems. In AIxIA. Springer, 372--386."},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2945386"},{"key":"e_1_3_2_2_17_1","first-page":"1","article-title":"The Movielens Datasets","volume":"5","author":"Maxwell Harper F","year":"2015","unstructured":"F Maxwell Harper and Joseph A Konstan . 2015 . The Movielens Datasets : History and Context. TiiS , Vol. 5 , 4 (2015), 1 -- 19 . F Maxwell Harper and Joseph A Konstan. 2015. The Movielens Datasets: History and Context. TiiS , Vol. 5, 4 (2015), 1--19.","journal-title":"History and Context. TiiS"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"crossref","unstructured":"Xiangnan He Xiaoyu Du Xiang Wang Feng Tian Jinhui Tang and Tat-Seng Chua. 2018. Outer Product-based Neural Collaborative Filtering. In IJCAI . ijcai.org 2227--2233. Xiangnan He Xiaoyu Du Xiang Wang Feng Tian Jinhui Tang and Tat-Seng Chua. 2018. Outer Product-based Neural Collaborative Filtering. In IJCAI . ijcai.org 2227--2233.","DOI":"10.24963\/ijcai.2018\/308"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"crossref","unstructured":"Xiangnan He Lizi Liao Hanwang Zhang Liqiang Nie Xia Hu and Tat-Seng Chua. 2017. 
Neural Collaborative Filtering. In WWW. ACM 173--182. Xiangnan He Lizi Liao Hanwang Zhang Liqiang Nie Xia Hu and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In WWW. ACM 173--182.","DOI":"10.1145\/3038912.3052569"},{"key":"e_1_3_2_2_20_1","unstructured":"Bal\u00e1zs Hidasi Alexandros Karatzoglou Linas Baltrunas and Domonkos Tikk. 2016. Session-based Recommendations with Recurrent Neural Networks. (2016). Bal\u00e1zs Hidasi Alexandros Karatzoglou Linas Baltrunas and Domonkos Tikk. 2016. Session-based Recommendations with Recurrent Neural Networks. (2016)."},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1952.10483446"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"crossref","unstructured":"Jin Huang Harrie Oosterhuis Maarten de Rijke and Herke van Hoof. 2020. Keeping Dataset Biases out of the Simulation: A Debiased Simulator for Reinforcement Learning based Recommender Systems. In RecSys. ACM 190--199. Jin Huang Harrie Oosterhuis Maarten de Rijke and Herke van Hoof. 2020. Keeping Dataset Biases out of the Simulation: A Debiased Simulator for Reinforcement Learning based Recommender Systems. In RecSys. ACM 190--199.","DOI":"10.1145\/3383313.3412252"},{"key":"e_1_3_2_2_23_1","volume-title":"Hongjian Dou, Ji-Rong Wen, and Edward Y Chang.","author":"Huang Jin","year":"2018","unstructured":"Jin Huang , Wayne Xin Zhao , Hongjian Dou, Ji-Rong Wen, and Edward Y Chang. 2018 . Improving Sequential Recommendation with Knowledge-Enhanced Memory Networks. In SIGIR . ACM , 505--514. Jin Huang, Wayne Xin Zhao, Hongjian Dou, Ji-Rong Wen, and Edward Y Chang. 2018. Improving Sequential Recommendation with Knowledge-Enhanced Memory Networks. In SIGIR . ACM, 505--514."},{"key":"e_1_3_2_2_24_1","volume-title":"2019 a. RecSim: A Configurable Simulation Platform for Recommender Systems. 
arXiv preprint arXiv:1909.04847","author":"Ie Eugene","year":"2019","unstructured":"Eugene Ie , Chih-wei Hsu, Martin Mladenov , Vihan Jain , Sanmit Narvekar , Jing Wang , Rui Wu , and Craig Boutilier . 2019 a. RecSim: A Configurable Simulation Platform for Recommender Systems. arXiv preprint arXiv:1909.04847 ( 2019 ). Eugene Ie, Chih-wei Hsu, Martin Mladenov, Vihan Jain, Sanmit Narvekar, Jing Wang, Rui Wu, and Craig Boutilier. 2019 a. RecSim: A Configurable Simulation Platform for Recommender Systems. arXiv preprint arXiv:1909.04847 (2019)."},{"key":"e_1_3_2_2_25_1","volume-title":"et al. 2019 b. Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology. arXiv preprint arXiv:1905.12767","author":"Ie Eugene","year":"2019","unstructured":"Eugene Ie , Vihan Jain , Jing Wang , Sanmit Narvekar , Ritesh Agarwal , Rui Wu , Heng-Tze Cheng , Morgane Lustman , Vince Gatto , Paul Covington , et al. 2019 b. Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology. arXiv preprint arXiv:1905.12767 ( 2019 ). Eugene Ie, Vihan Jain, Jing Wang, Sanmit Narvekar, Ritesh Agarwal, Rui Wu, Heng-Tze Cheng, Morgane Lustman, Vince Gatto, Paul Covington, et al. 2019 b. Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology. arXiv preprint arXiv:1905.12767 (2019)."},{"key":"e_1_3_2_2_26_1","volume-title":"Causal Inference in Statistics, Social, and Biomedical Sciences","author":"Imbens Guido W","unstructured":"Guido W Imbens and Donald B Rubin . 2015. Causal Inference in Statistics, Social, and Biomedical Sciences . Cambridge University Press . Guido W Imbens and Donald B Rubin. 2015. Causal Inference in Statistics, Social, and Biomedical Sciences .Cambridge University Press."},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"crossref","unstructured":"Thorsten Joachims Adith Swaminathan and Tobias Schnabel. 2017. 
Unbiased Learning-to-Rank with Biased Feedback. In WSDM. ACM 781--789. Thorsten Joachims Adith Swaminathan and Tobias Schnabel. 2017. Unbiased Learning-to-Rank with Biased Feedback. In WSDM. ACM 781--789.","DOI":"10.1145\/3018661.3018699"},{"key":"e_1_3_2_2_28_1","volume-title":"et al.","author":"Kang Joseph DY","year":"2007","unstructured":"Joseph DY Kang , Joseph L Schafer , et al . 2007 . Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data. Statistical science , Vol. 22 , 4 (2007), 523--539. Joseph DY Kang, Joseph L Schafer, et al. 2007. Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data. Statistical science , Vol. 22, 4 (2007), 523--539."},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"crossref","unstructured":"Lihong Li Wei Chu John Langford and Robert E Schapire. 2010. A Contextual-bandit Approach to Personalized News Article Recommendation. In WWW . ACM 661--670. Lihong Li Wei Chu John Langford and Robert E Schapire. 2010. A Contextual-bandit Approach to Personalized News Article Recommendation. In WWW . ACM 661--670.","DOI":"10.1145\/1772690.1772758"},{"key":"e_1_3_2_2_30_1","volume-title":"Drprofiling: Deep Reinforcement User Profiling for Recommendations in Heterogenous Information Networks. TKDE","author":"Liang Huizhi","year":"2020","unstructured":"Huizhi Liang . 2020 . Drprofiling: Deep Reinforcement User Profiling for Recommendations in Heterogenous Information Networks. TKDE (2020). Huizhi Liang. 2020. Drprofiling: Deep Reinforcement User Profiling for Recommendations in Heterogenous Information Networks. TKDE (2020)."},{"key":"e_1_3_2_2_31_1","unstructured":"Timothy P Lillicrap Jonathan J Hunt Alexander Pritzel Nicolas Heess Tom Erez Yuval Tassa David Silver and Daan Wierstra. 2016. Continuous Control with Deep Reinforcement Learning. In ICLR (Poster) . 
Timothy P Lillicrap Jonathan J Hunt Alexander Pritzel Nicolas Heess Tom Erez Yuval Tassa David Silver and Daan Wierstra. 2016. Continuous Control with Deep Reinforcement Learning. In ICLR (Poster) ."},{"key":"e_1_3_2_2_32_1","volume-title":"A Survey on Reinforcement Learning for Recommender Systems. arXiv preprint arXiv:2109.10665","author":"Lin Yuanguo","year":"2021","unstructured":"Yuanguo Lin , Yong Liu , Fan Lin , Pengcheng Wu , Wenhua Zeng , and Chunyan Miao . 2021. A Survey on Reinforcement Learning for Recommender Systems. arXiv preprint arXiv:2109.10665 ( 2021 ). Yuanguo Lin, Yong Liu, Fan Lin, Pengcheng Wu, Wenhua Zeng, and Chunyan Miao. 2021. A Survey on Reinforcement Learning for Recommender Systems. arXiv preprint arXiv:2109.10665 (2021)."},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2925019"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"crossref","unstructured":"Feng Liu Huifeng Guo Xutao Li Ruiming Tang Yunming Ye and Xiuqiang He. 2020 a. End-to-End Deep Reinforcement Learning based Recommendation with Supervised Embedding. In WSDM. ACM 384--392. Feng Liu Huifeng Guo Xutao Li Ruiming Tang Yunming Ye and Xiuqiang He. 2020 a. End-to-End Deep Reinforcement Learning based Recommendation with Supervised Embedding. In WSDM. ACM 384--392.","DOI":"10.1145\/3336191.3371858"},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2020.106170"},{"key":"e_1_3_2_2_36_1","volume-title":"TALE","author":"Liu Su","unstructured":"Su Liu , Ye Chen , Hui Huang , Liang Xiao , and Xiaojun Hei . 2018. Towards Smart Educational Recommendations with Reinforcement Learning in Classroom . In TALE . IEEE , 1079--1084. Su Liu, Ye Chen, Hui Huang, Liang Xiao, and Xiaojun Hei. 2018. Towards Smart Educational Recommendations with Reinforcement Learning in Classroom. In TALE . 
IEEE, 1079--1084."},{"key":"e_1_3_2_2_37_1","volume-title":"H Chi","author":"Ma Jiaqi","year":"2020","unstructured":"Jiaqi Ma , Zhe Zhao , Xinyang Yi , Ji Yang , Minmin Chen , Jiaxi Tang , Lichan Hong , and Ed H Chi . 2020 . Off-policy Learning in Two-stage Recommender Systems. In WWW. ACM \/ IW 3C2, 463--473. Jiaqi Ma, Zhe Zhao, Xinyang Yi, Ji Yang, Minmin Chen, Jiaxi Tang, Lichan Hong, and Ed H Chi. 2020. Off-policy Learning in Two-stage Recommender Systems. In WWW. ACM \/ IW3C2, 463--473."},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"crossref","unstructured":"Benjamin M Marlin and Richard S Zemel. 2009. Collaborative Prediction and Ranking with Non-Random Missing Data. In RecSys . ACM 5--12. Benjamin M Marlin and Richard S Zemel. 2009. Collaborative Prediction and Ranking with Non-Random Missing Data. In RecSys . ACM 5--12.","DOI":"10.1145\/1639714.1639717"},{"key":"e_1_3_2_2_39_1","volume-title":"et al.","author":"Mnih Volodymyr","year":"2015","unstructured":"Volodymyr Mnih , Koray Kavukcuoglu , David Silver , Andrei A Rusu , Joel Veness , Marc G Bellemare , Alex Graves , Martin Riedmiller , Andreas K Fidjeland , Georg Ostrovski , et al . 2015 . Human-level Control through Deep Reinforcement Learning . nature , Vol. 518 , 7540 (2015), 529--533. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. 2015. Human-level Control through Deep Reinforcement Learning. nature , Vol. 518, 7540 (2015), 529--533."},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"crossref","unstructured":"Tien T Nguyen Pik-Mai Hui F Maxwell Harper Loren Terveen and Joseph A Konstan. 2014. Exploring the Filter Bubble: the Effect of Using Recommender Systems on Content Diversity. In WWW. ACM 677--686. Tien T Nguyen Pik-Mai Hui F Maxwell Harper Loren Terveen and Joseph A Konstan. 2014. 
Exploring the Filter Bubble: the Effect of Using Recommender Systems on Content Diversity. In WWW. ACM 677--686.","DOI":"10.1145\/2566486.2568012"},{"key":"e_1_3_2_2_41_1","unstructured":"Feiyang Pan Qingpeng Cai Pingzhong Tang Fuzhen Zhuang and Qing He. 2019. Policy Gradients for Contextual Recommendations. In WWW. ACM 1421--1431. Feiyang Pan Qingpeng Cai Pingzhong Tang Fuzhen Zhuang and Qing He. 2019. Policy Gradients for Contextual Recommendations. In WWW. ACM 1421--1431."},{"key":"e_1_3_2_2_42_1","volume-title":"The Filter Bubble: How the New Personalized Web is Changing What We Read and How We Think","author":"Pariser Eli","unstructured":"Eli Pariser . 2011. The Filter Bubble: How the New Personalized Web is Changing What We Read and How We Think . Penguin . Eli Pariser. 2011. The Filter Bubble: How the New Personalized Web is Changing What We Read and How We Think .Penguin."},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"crossref","unstructured":"Bruno Pradel Nicolas Usunier and Patrick Gallinari. 2012. Ranking with Non-Random Missing Ratings: Influence of Popularity and Positivity on Evaluation Metrics. In RecSys. ACM 147--154. Bruno Pradel Nicolas Usunier and Patrick Gallinari. 2012. Ranking with Non-Random Missing Ratings: Influence of Popularity and Positivity on Evaluation Metrics. In RecSys. ACM 147--154.","DOI":"10.1145\/2365952.2365982"},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1186\/s13638-019-1619-6"},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1994.10476818"},{"key":"e_1_3_2_2_46_1","volume-title":"RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising. arXiv preprint arXiv:1808.00720","author":"Rohde David","year":"2018","unstructured":"David Rohde , Stephen Bonner , Travis Dunlop , Flavian Vasile , and Alexandros Karatzoglou . 2018. 
RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising. arXiv preprint arXiv:1808.00720 ( 2018 ). David Rohde, Stephen Bonner, Travis Dunlop, Flavian Vasile, and Alexandros Karatzoglou. 2018. RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising. arXiv preprint arXiv:1808.00720 (2018)."},{"key":"e_1_3_2_2_47_1","unstructured":"Tobias Schnabel Adith Swaminathan Ashudeep Singh Navin Chandak and Thorsten Joachims. 2016. Recommendations as Treatments: Debiasing Learning and Evaluation. In ICML . JMLR.org 1670--1679. Tobias Schnabel Adith Swaminathan Ashudeep Singh Navin Chandak and Thorsten Joachims. 2016. Recommendations as Treatments: Debiasing Learning and Evaluation. In ICML . JMLR.org 1670--1679."},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"crossref","unstructured":"Bichen Shi Makbule Gulcin Ozsoy Neil Hurley Barry Smyth Elias Z Tragos James Geraci and Aonghus Lawlor. 2019 a. PyRecGym: A Reinforcement Learning Gym for Recommender Systems. In RecSys . ACM 491--495. Bichen Shi Makbule Gulcin Ozsoy Neil Hurley Barry Smyth Elias Z Tragos James Geraci and Aonghus Lawlor. 2019 a. PyRecGym: A Reinforcement Learning Gym for Recommender Systems. In RecSys . ACM 491--495.","DOI":"10.1145\/3298689.3346981"},{"key":"e_1_3_2_2_49_1","volume-title":"AAAI","author":"Shi Jing-Cheng","unstructured":"Jing-Cheng Shi , Yang Yu , Qing Da , Shi-Yong Chen , and An-Xiang Zeng . 2019 b. Virtual-taobao: Virtualizing Real-world Online Retail Environment for Reinforcement Learning . In AAAI , Vol. 33 . AAAI Press , 4902--4909. Jing-Cheng Shi, Yang Yu, Qing Da, Shi-Yong Chen, and An-Xiang Zeng. 2019 b. Virtual-taobao: Virtualizing Real-world Online Retail Environment for Reinforcement Learning. In AAAI, Vol. 33. AAAI Press, 4902--4909."},{"key":"e_1_3_2_2_50_1","doi-asserted-by":"crossref","unstructured":"Harald Steck. 2010. 
Training and Testing of Recommender Systems on Data Missing Not at Random. In KDD . ACM 713--722. Harald Steck. 2010. Training and Testing of Recommender Systems on Data Missing Not at Random. In KDD . ACM 713--722.","DOI":"10.1145\/1835804.1835895"},{"key":"e_1_3_2_2_51_1","doi-asserted-by":"crossref","unstructured":"Harald Steck. 2011. Item Popularity and Recommendation Accuracy. In RecSys. ACM 125--132. Harald Steck. 2011. Item Popularity and Recommendation Accuracy. In RecSys. ACM 125--132.","DOI":"10.1145\/2043932.2043957"},{"key":"e_1_3_2_2_52_1","unstructured":"Yueming Sun and Yi Zhang. 2018. Conversational Recommender System. In SIGIR. ACM 235--244. Yueming Sun and Yi Zhang. 2018. Conversational Recommender System. In SIGIR. ACM 235--244."},{"key":"e_1_3_2_2_53_1","volume-title":"Deep Reinforcement Learning with Attention for Slate Markov Decision Processes with High-Dimensional States and Actions. arXiv preprint arXiv:1512.01124","author":"Sunehag Peter","year":"2015","unstructured":"Peter Sunehag , Richard Evans , Gabriel Dulac-Arnold , Yori Zwols , Daniel Visentin , and Ben Coppin . 2015. Deep Reinforcement Learning with Attention for Slate Markov Decision Processes with High-Dimensional States and Actions. arXiv preprint arXiv:1512.01124 ( 2015 ). Peter Sunehag, Richard Evans, Gabriel Dulac-Arnold, Yori Zwols, Daniel Visentin, and Ben Coppin. 2015. Deep Reinforcement Learning with Attention for Slate Markov Decision Processes with High-Dimensional States and Actions. arXiv preprint arXiv:1512.01124 (2015)."},{"key":"e_1_3_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.5555\/551283"},{"key":"e_1_3_2_2_55_1","volume-title":"Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine learning","author":"Williams Ronald J","year":"1992","unstructured":"Ronald J Williams . 1992. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine learning , Vol. 8 , 3 ( 1992 ), 229--256. 
Ronald J Williams. 1992. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine learning , Vol. 8, 3 (1992), 229--256."},{"key":"e_1_3_2_2_56_1","unstructured":"Chao-Yuan Wu Amr Ahmed Alex Beutel Alexander J Smola and How Jing. 2017. Recurrent Recommender Networks. In WSDM. ACM 495--503. Chao-Yuan Wu Amr Ahmed Alex Beutel Alexander J Smola and How Jing. 2017. Recurrent Recommender Networks. In WSDM. ACM 495--503."},{"key":"e_1_3_2_2_57_1","volume-title":"Gerard De Melo, and Yongfeng Zhang","author":"Xian Yikun","year":"2019","unstructured":"Yikun Xian , Zuohui Fu , Shan Muthukrishnan , Gerard De Melo, and Yongfeng Zhang . 2019 . Reinforcement Knowledge Graph Reasoning for Explainable Recommendation. In SIGIR . ACM , 285--294. Yikun Xian, Zuohui Fu, Shan Muthukrishnan, Gerard De Melo, and Yongfeng Zhang. 2019. Reinforcement Knowledge Graph Reasoning for Explainable Recommendation. In SIGIR . ACM, 285--294."},{"key":"e_1_3_2_2_58_1","unstructured":"Feng Yu Qiang Liu Shu Wu Liang Wang and Tieniu Tan. 2016. A Dynamic Recurrent Model for Next Basket Recommendation. In SIGIR . ACM 729--732. Feng Yu Qiang Liu Shu Wu Liang Wang and Tieniu Tan. 2016. A Dynamic Recurrent Model for Next Basket Recommendation. In SIGIR . ACM 729--732."},{"key":"e_1_3_2_2_59_1","doi-asserted-by":"crossref","unstructured":"Tong Yu Yilin Shen Ruiyi Zhang Xiangyu Zeng and Hongxia Jin. 2019. Vision-Language Recommendation via Attribute Augmented Multimodal Reinforcement Learning. In MM. ACM 39--47. Tong Yu Yilin Shen Ruiyi Zhang Xiangyu Zeng and Hongxia Jin. 2019. Vision-Language Recommendation via Attribute Augmented Multimodal Reinforcement Learning. In MM. ACM 39--47.","DOI":"10.1145\/3343031.3350935"},{"key":"e_1_3_2_2_60_1","volume-title":"A Novel Movie Recommendation System based on Deep Reinforcement Learning with Prioritized Experience Replay","author":"Yuyan Zhang","unstructured":"Zhang Yuyan , Su Xiayao , and Liu Yong . 2019. 
A Novel Movie Recommendation System based on Deep Reinforcement Learning with Prioritized Experience Replay . In ICCT. IEEE , 1496--1500. Zhang Yuyan, Su Xiayao, and Liu Yong. 2019. A Novel Movie Recommendation System based on Deep Reinforcement Learning with Prioritized Experience Replay. In ICCT. IEEE , 1496--1500."},{"key":"e_1_3_2_2_61_1","doi-asserted-by":"crossref","unstructured":"Shuo Zhang and Krisztian Balog. 2020. Evaluating Conversational Recommender Systems via User Simulation. In KDD . ACM 1512--1520. Shuo Zhang and Krisztian Balog. 2020. Evaluating Conversational Recommender Systems via User Simulation. In KDD . ACM 1512--1520.","DOI":"10.1145\/3394486.3403202"},{"key":"e_1_3_2_2_62_1","volume-title":"PRICAI","author":"Zhao Chenfei","unstructured":"Chenfei Zhao and Lan Hu. 2019. CapDRL: A Deep Capsule Reinforcement Learning for Movie Recommendation . In PRICAI . Springer , 734--739. Chenfei Zhao and Lan Hu. 2019. CapDRL: A Deep Capsule Reinforcement Learning for Movie Recommendation. In PRICAI . Springer, 734--739."},{"key":"e_1_3_2_2_63_1","volume-title":"Toward Simulating Environments in Reinforcement Learning based Recommendations. arXiv preprint arXiv:1906.11462","author":"Zhao Xiangyu","year":"2019","unstructured":"Xiangyu Zhao , Long Xia , Zhuoye Ding , Dawei Yin , and Jiliang Tang . 2019. Toward Simulating Environments in Reinforcement Learning based Recommendations. arXiv preprint arXiv:1906.11462 ( 2019 ). Xiangyu Zhao, Long Xia, Zhuoye Ding, Dawei Yin, and Jiliang Tang. 2019. Toward Simulating Environments in Reinforcement Learning based Recommendations. arXiv preprint arXiv:1906.11462 (2019)."},{"key":"e_1_3_2_2_64_1","doi-asserted-by":"crossref","unstructured":"Xiangyu Zhao Long Xia Liang Zhang Zhuoye Ding Dawei Yin and Jiliang Tang. 2018a. Deep Reinforcement Learning for Page-wise Recommendations. In RecSys . ACM 95--103. Xiangyu Zhao Long Xia Liang Zhang Zhuoye Ding Dawei Yin and Jiliang Tang. 2018a. 
Deep Reinforcement Learning for Page-wise Recommendations. In RecSys . ACM 95--103.","DOI":"10.1145\/3240323.3240374"},{"key":"e_1_3_2_2_65_1","doi-asserted-by":"crossref","unstructured":"Xiangyu Zhao Liang Zhang Zhuoye Ding Long Xia Jiliang Tang and Dawei Yin. 2018b. Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning. In KDD . ACM 1040--1048. Xiangyu Zhao Liang Zhang Zhuoye Ding Long Xia Jiliang Tang and Dawei Yin. 2018b. Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning. In KDD . ACM 1040--1048.","DOI":"10.1145\/3219819.3219886"},{"key":"e_1_3_2_2_66_1","volume-title":"Deep Reinforcement Learning for List-Wise Recommendations. arXiv preprint arXiv:1801.00209","author":"Zhao Xiangyu","year":"2017","unstructured":"Xiangyu Zhao , Liang Zhang , Long Xia , Zhuoye Ding , Dawei Yin , and Jiliang Tang . 2017. Deep Reinforcement Learning for List-Wise Recommendations. arXiv preprint arXiv:1801.00209 ( 2017 ). Xiangyu Zhao, Liang Zhang, Long Xia, Zhuoye Ding, Dawei Yin, and Jiliang Tang. 2017. Deep Reinforcement Learning for List-Wise Recommendations. arXiv preprint arXiv:1801.00209 (2017)."},{"key":"e_1_3_2_2_67_1","doi-asserted-by":"crossref","unstructured":"Xiangyu Zhao Xudong Zheng Xiwang Yang Xiaobing Liu and Jiliang Tang. 2020. Jointly Learning to Recommend and Advertise. In KDD. ACM 3319--3327. Xiangyu Zhao Xudong Zheng Xiwang Yang Xiaobing Liu and Jiliang Tang. 2020. Jointly Learning to Recommend and Advertise. In KDD. ACM 3319--3327.","DOI":"10.1145\/3394486.3403384"},{"key":"e_1_3_2_2_68_1","volume-title":"Xing Xie, and Zhenhui Li.","author":"Zheng Guanjie","year":"2018","unstructured":"Guanjie Zheng , Fuzheng Zhang , Zihan Zheng , Yang Xiang , Nicholas Jing Yuan , Xing Xie, and Zhenhui Li. 2018 . DRN : A Deep Reinforcement Learning Framework for News Recommendation. In WWW . ACM , 167--176. Guanjie Zheng, Fuzheng Zhang, Zihan Zheng, Yang Xiang, Nicholas Jing Yuan, Xing Xie, and Zhenhui Li. 2018. 
DRN: A Deep Reinforcement Learning Framework for News Recommendation. In WWW . ACM, 167--176."}],"event":{"name":"SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval","location":"Madrid Spain","acronym":"SIGIR '22","sponsor":["SIGIR ACM Special Interest Group on Information Retrieval"]},"container-title":["Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3477495.3531716","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3477495.3531716","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:02:07Z","timestamp":1750186927000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3477495.3531716"}},"subtitle":["A Reproducibility Study"],"short-title":[],"issued":{"date-parts":[[2022,7,6]]},"references-count":68,"alternative-id":["10.1145\/3477495.3531716","10.1145\/3477495"],"URL":"https:\/\/doi.org\/10.1145\/3477495.3531716","relation":{},"subject":[],"published":{"date-parts":[[2022,7,6]]},"assertion":[{"value":"2022-07-07","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}