{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T17:19:59Z","timestamp":1776100799682,"version":"3.50.1"},"reference-count":220,"publisher":"Association for Computing Machinery (ACM)","issue":"7","license":[{"start":{"date-parts":[[2022,12,15]],"date-time":"2022-12-15T00:00:00Z","timestamp":1671062400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2023,7,31]]},"abstract":"<jats:p>\n            <jats:bold>Recommender systems (RSs)<\/jats:bold>\n            have become an inseparable part of our everyday lives. They help us find our favorite items to purchase, our friends on social networks, and our favorite movies to watch. Traditionally, the recommendation problem was considered to be a classification or prediction problem, but it is now widely agreed that formulating it as a sequential decision problem can better reflect the user-system interaction. Therefore, it can be formulated as a\n            <jats:bold>Markov decision process (MDP)<\/jats:bold>\n            and be solved by\n            <jats:bold>reinforcement learning (RL)<\/jats:bold>\n            algorithms. Unlike traditional recommendation methods, including collaborative filtering and content-based filtering, RL is able to handle the sequential, dynamic user-system interaction and to take into account the long-term user engagement. Although the idea of using RL for recommendation is not new and has been around for about two decades, it was not very practical, mainly because of scalability problems of traditional RL algorithms. However, a new trend has emerged in the field since the introduction of\n            <jats:bold>deep reinforcement learning (DRL)<\/jats:bold>\n            , which made it possible to apply RL to the recommendation problem with large state and action spaces. In this paper, a survey on\n            <jats:bold>reinforcement learning based recommender systems (RLRSs)<\/jats:bold>\n            is presented. Our aim is to present an outlook on the field and to provide the reader with a fairly complete knowledge of key concepts of the field. We first recognize and illustrate that RLRSs can be generally classified into RL- and DRL-based methods. Then, we propose an RLRS framework with four components, i.e., state representation, policy optimization, reward formulation, and environment building, and survey RLRS algorithms accordingly. We highlight emerging topics and depict important trends using various graphs and tables. Finally, we discuss important aspects and challenges that can be addressed in the future.\n          <\/jats:p>","DOI":"10.1145\/3543846","type":"journal-article","created":{"date-parts":[[2022,6,15]],"date-time":"2022-06-15T12:23:29Z","timestamp":1655295809000},"page":"1-38","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":348,"title":["Reinforcement Learning based Recommender Systems: A Survey"],"prefix":"10.1145","volume":"55","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6108-5250","authenticated-orcid":false,"given":"M. Mehdi","family":"Afsar","sequence":"first","affiliation":[{"name":"University of Calgary, Calgary, AB, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6690-1926","authenticated-orcid":false,"given":"Trafford","family":"Crump","sequence":"additional","affiliation":[{"name":"University of Calgary, Calgary, AB, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1589-8039","authenticated-orcid":false,"given":"Behrouz","family":"Far","sequence":"additional","affiliation":[{"name":"University of Calgary, Calgary, AB, Canada"}]}],"member":"320","published-online":{"date-parts":[[2022,12,15]]},"reference":[{"key":"e_1_3_1_2_2","article-title":"The Zettabyte Era\u2013Trends and Analysis","author":"Index Cisco Visual Networking","year":"2013","unstructured":"Cisco Visual Networking Index. 2013. The Zettabyte Era\u2013Trends and Analysis. Cisco White Paper (2013).","journal-title":"Cisco White Paper"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511763113"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-85820-3_1"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-006-0082-7"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/502585.502625"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCE.2006.1706489"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/336992.337035"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2018.04.008"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-015-9440-z"},{"key":"e_1_3_1_11_2","first-page":"1","volume-title":"EHB\u201913","author":"Sezgin Emre","year":"2013","unstructured":"Emre Sezgin and Sevgi \u00d6zkan. 2013. A systematic literature review on health recommender systems. In EHB\u201913. 1\u20134."},{"key":"e_1_3_1_12_2","unstructured":"Netflix Update: Try This at Home. https:\/\/sifter.org\/simon\/journal\/20061211.html. ([n. d.])."},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2013.03.012"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.5555\/3086952"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3158369"},{"key":"e_1_3_1_16_2","article-title":"Deep reinforcement learning: An overview","author":"Li Yuxi","year":"2017","unstructured":"Yuxi Li. 2017. Deep reinforcement learning: An overview. arXiv preprint arXiv:1701.07274 (2017).","journal-title":"arXiv preprint arXiv:1701.07274"},{"key":"e_1_3_1_17_2","volume-title":"AAAI\u201918","author":"Henderson Peter","year":"2018","unstructured":"Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. 2018. Deep reinforcement learning that matters. In AAAI\u201918."},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.2352\/ISSN.2470-1173.2017.19.AVM-023"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2019.01.003"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364913495721"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.procir.2017.03.095"},{"key":"e_1_3_1_22_2","article-title":"A deep reinforcement learning framework for the financial portfolio management problem","author":"Jiang Zhengyao","year":"2017","unstructured":"Zhengyao Jiang, Dixing Xu, and Jinjun Liang. 2017. A deep reinforcement learning framework for the financial portfolio management problem. arXiv preprint arXiv:1706.10059 (2017).","journal-title":"arXiv preprint arXiv:1706.10059"},{"key":"e_1_3_1_23_2","first-page":"1671","volume-title":"AAAI\u201908","author":"Guez Arthur","year":"2008","unstructured":"Arthur Guez, Robert D. Vincent, Massimo Avoli, and Joelle Pineau. 2008. Adaptive treatment of epilepsy via batch-mode reinforcement learning. In AAAI\u201908. 1671\u20131678."},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41591-018-0310-5"},{"key":"e_1_3_1_25_2","article-title":"Deep reinforcement learning in large discrete action spaces","author":"Dulac-Arnold Gabriel","year":"2015","unstructured":"Gabriel Dulac-Arnold, Richard Evans, Hado van Hasselt, Peter Sunehag, Timothy Lillicrap, Jonathan Hunt, Timothy Mann, Theophane Weber, Thomas Degris, and Ben Coppin. 2015. Deep reinforcement learning in large discrete action spaces. arXiv preprint arXiv:1512.07679 (2015).","journal-title":"arXiv preprint arXiv:1512.07679"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289600.3290999"},{"key":"e_1_3_1_27_2","article-title":"Introduction to multi-armed bandits","author":"Slivkins Aleksandrs","year":"2019","unstructured":"Aleksandrs Slivkins. 2019. Introduction to multi-armed bandits. arXiv preprint arXiv:1904.07272 (2019).","journal-title":"arXiv preprint arXiv:1904.07272"},{"key":"e_1_3_1_28_2","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton Richard S.","year":"2018","unstructured":"Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. MIT press."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772758"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/2911451.2911548"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-29659-3"},{"key":"e_1_3_1_32_2","volume-title":"FLAIRS\u201921","author":"Afsar M. Mehdi","year":"2021","unstructured":"M. Mehdi Afsar, Trafford Crump, and Behrouz Far. 2021. An exploration on-demand article recommender system for cancer patients information provisioning. In FLAIRS\u201921, Vol. 34."},{"issue":"4","key":"e_1_3_1_33_2","first-page":"12","article-title":"Survey of multiarmed bandit algorithms applied to recommendation systems","volume":"9","author":"Elena Gangan","year":"2021","unstructured":"Gangan Elena, Kudus Milos, and Ilyushin Eugene. 2021. Survey of multiarmed bandit algorithms applied to recommendation systems. International Journal of Open Information Technologies 9, 4 (2021), 12\u201327.","journal-title":"International Journal of Open Information Technologies"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1155\/2009\/421425"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/2556270"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1023\/a:1021240730564"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2013.2281156"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3407190"},{"key":"e_1_3_1_39_2","article-title":"Explainable recommendation: A survey and new perspectives","author":"Zhang Yongfeng","year":"2018","unstructured":"Yongfeng Zhang and Xu Chen. 2018. Explainable recommendation: A survey and new perspectives. arXiv preprint arXiv:1804.11192 (2018).","journal-title":"arXiv preprint arXiv:1804.11192"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00799-015-0156-0"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3320496.3320500"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3190616"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3465401"},{"key":"e_1_3_1_44_2","first-page":"1187","volume-title":"SIGKDD\u201918","author":"Chen Shi-Yong","year":"2018","unstructured":"Shi-Yong Chen, Yang Yu, Qing Da, Jun Tan, Hai-Kuan Huang, and Hai-Hong Tang. 2018. Stabilizing reinforcement learning in dynamic environment with application to online recommendation. In SIGKDD\u201918. 1187\u20131196."},{"key":"e_1_3_1_45_2","article-title":"Reinforcement learning based recommender system using biclustering technique","author":"Choi Sungwoon","year":"2018","unstructured":"Sungwoon Choi, Heonseok Ha, Uiwon Hwang, Chanju Kim, Jung-Woo Ha, and Sungroh Yoon. 2018. Reinforcement learning based recommender system using biclustering technique. arXiv preprint arXiv:1801.05532 (2018).","journal-title":"arXiv preprint arXiv:1801.05532"},{"key":"e_1_3_1_46_2","first-page":"226","volume-title":"ICOIACT\u201918","author":"Munemasa Isshu","year":"2018","unstructured":"Isshu Munemasa, Yuta Tomomatsu, Kunioki Hayashi, and Tomohiro Takagi. 2018. Deep reinforcement learning for recommender systems. In ICOIACT\u201918. 226\u2013233."},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240323.3240374"},{"key":"e_1_3_1_48_2","first-page":"167","volume-title":"WWW\u201918","author":"Zheng Guanjie","year":"2018","unstructured":"Guanjie Zheng, Fuzheng Zhang, Zihan Zheng, Yang Xiang, Nicholas Jing Yuan, Xing Xie, and Zhenhui Li. 2018. DRN: A deep reinforcement learning framework for news recommendation. In WWW\u201918. 167\u2013176."},{"key":"e_1_3_1_49_2","first-page":"1040","volume-title":"SIGKDD\u201918","author":"Zhao Xiangyu","year":"2018","unstructured":"Xiangyu Zhao, Liang Zhang, Zhuoye Ding, Long Xia, Jiliang Tang, and Dawei Yin. 2018. Recommendations with negative feedback via pairwise deep reinforcement learning. In SIGKDD\u201918. 1040\u20131048."},{"key":"e_1_3_1_50_2","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1145\/2365952.2365971","volume-title":"RecSys","author":"Moling Omar","year":"2012","unstructured":"Omar Moling, Linas Baltrunas, and Francesco Ricci. 2012. Optimal radio channel recommendations with explicit and implicit feedback. In RecSys. 75\u201382."},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.5555\/1046920.1088715"},{"key":"e_1_3_1_52_2","first-page":"172","volume-title":"ICIS\u201917","author":"Hu Binbin","year":"2017","unstructured":"Binbin Hu, Chuan Shi, and Jian Liu. 2017. Playlist recommendation based on reinforcement learning. In ICIS\u201917. 172\u2013182."},{"key":"e_1_3_1_53_2","article-title":"Deep reinforcement learning for list-wise recommendations","author":"Zhao Xiangyu","year":"2017","unstructured":"Xiangyu Zhao, Liang Zhang, Long Xia, Zhuoye Ding, Dawei Yin, and Jiliang Tang. 2017. Deep reinforcement learning for list-wise recommendations. arXiv preprint arXiv:1801.00209 (2017).","journal-title":"arXiv preprint arXiv:1801.00209"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/245108.245121"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/138859.138867"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-72079-9_10"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-85820-3_3"},{"key":"e_1_3_1_58_2","volume-title":"Learning from Delayed Rewards","author":"Watkins Christopher","year":"1989","unstructured":"Christopher Watkins. 1989. Learning from Delayed Rewards. University of Cambridge."},{"key":"e_1_3_1_59_2","volume-title":"On-line Q-learning Using Connectionist Systems","author":"Rummery Gavin A.","year":"1994","unstructured":"Gavin A. Rummery and Mahesan Niranjan. 1994. On-line Q-learning Using Connectionist Systems. Vol. 37. University of Cambridge."},{"key":"e_1_3_1_60_2","first-page":"298","volume-title":"ICML\u201993","author":"Schwartz Anton","year":"1993","unstructured":"Anton Schwartz. 1993. A reinforcement learning method for maximizing undiscounted rewards. In ICML\u201993. 298\u2013305."},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCIAIG.2012.2186810"},{"key":"e_1_3_1_62_2","first-page":"216","article-title":"Monte-Carlo tree search: A new framework for game AI.","volume":"8","author":"Chaslot Guillaume","year":"2008","unstructured":"Guillaume Chaslot, Sander Bakkes, Istvan Szita, and Pieter Spronck. 2008. Monte-Carlo tree search: A new framework for game AI. AIIDE 8 (2008), 216\u2013217.","journal-title":"AIIDE"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature16961"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.5555\/933034"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.5555\/1046920.1088690"},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992696"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSMC.1983.6313077"},{"key":"e_1_3_1_68_2","article-title":"Playing Atari with deep reinforcement learning","author":"Mnih Volodymyr","year":"2013","unstructured":"Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013).","journal-title":"arXiv preprint arXiv:1312.5602"},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992699"},{"key":"e_1_3_1_71_2","volume-title":"Connectionist Models Summer School Hillsdale","author":"Thrun Sebastian","year":"1993","unstructured":"Sebastian Thrun and Anton Schwartz. 1993. Issues in using function approximation for reinforcement learning. In Connectionist Models Summer School Hillsdale."},{"key":"e_1_3_1_72_2","article-title":"Deep reinforcement learning with double Q-learning","author":"Hasselt Hado Van","year":"2015","unstructured":"Hado Van Hasselt, Arthur Guez, and David Silver. 2015. Deep reinforcement learning with double Q-learning. arXiv preprint arXiv:1509.06461 (2015).","journal-title":"arXiv preprint arXiv:1509.06461"},{"key":"e_1_3_1_73_2","first-page":"1995","volume-title":"ICML\u201916","author":"Wang Ziyu","year":"2016","unstructured":"Ziyu Wang, Tom Schaul, Matteo Hessel, Hado Hasselt, Marc Lanctot, and Nando Freitas. 2016. Dueling network architectures for deep reinforcement learning. In ICML\u201916. 1995\u20132003."},{"key":"e_1_3_1_74_2","article-title":"Prioritized experience replay","author":"Schaul Tom","year":"2015","unstructured":"Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. 2015. Prioritized experience replay. arXiv preprint arXiv:1511.05952 (2015).","journal-title":"arXiv preprint arXiv:1511.05952"},{"key":"e_1_3_1_75_2","article-title":"Continuous control with deep reinforcement learning","author":"Lillicrap Timothy P.","year":"2015","unstructured":"Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2015. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015).","journal-title":"arXiv preprint arXiv:1509.02971"},{"key":"e_1_3_1_76_2","volume-title":"ICML\u201914","author":"Silver David","year":"2014","unstructured":"David Silver et\u00a0al. 2014. Deterministic policy gradient algorithms. In ICML\u201914."},{"key":"e_1_3_1_77_2","article-title":"Proximal policy optimization algorithms","author":"Schulman John","year":"2017","unstructured":"John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017).","journal-title":"arXiv preprint arXiv:1707.06347"},{"key":"e_1_3_1_78_2","first-page":"1889","volume-title":"ICML\u201915","author":"Schulman John","year":"2015","unstructured":"John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. 2015. Trust region policy optimization. In ICML\u201915. 1889\u20131897."},{"key":"e_1_3_1_79_2","doi-asserted-by":"publisher","DOI":"10.21428\/594757db.6a36bb36"},{"key":"e_1_3_1_80_2","first-page":"580","volume-title":"UAI\u201901","author":"Zimdars A.","year":"2001","unstructured":"A. Zimdars, D. M. Chickering, and C. Meek. 2001. Using temporal data for making recommendations. In UAI\u201901. 580\u2013588."},{"key":"e_1_3_1_81_2","article-title":"Reinforcement learning for slate-based recommender systems: A tractable decomposition and practical methodology","author":"Ie Eugene","year":"2019","unstructured":"Eugene Ie et\u00a0al. 2019. Reinforcement learning for slate-based recommender systems: A tractable decomposition and practical methodology. arXiv preprint arXiv:1905.12767 (2019).","journal-title":"arXiv preprint arXiv:1905.12767"},{"key":"e_1_3_1_82_2","article-title":"Empirical evaluation of gated recurrent neural networks on sequence modeling","author":"Chung Junyoung","year":"2014","unstructured":"Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014).","journal-title":"arXiv preprint arXiv:1412.3555"},{"key":"e_1_3_1_83_2","first-page":"153","volume-title":"TPDL\u201915","author":"Beel Joeran","year":"2015","unstructured":"Joeran Beel and Stefan Langer. 2015. A comparison of offline evaluations, online evaluations, and user studies in the context of research-paper recommender systems. In TPDL\u201915. 153\u2013168."},{"key":"e_1_3_1_84_2","first-page":"770","volume-title":"IJCAI\u201997","author":"Joachims Thorsten","year":"1997","unstructured":"Thorsten Joachims, Dayne Freitag, Tom Mitchell, et\u00a0al. 1997. WebWatcher: A tour guide for the World Wide Web. In IJCAI\u201997. 770\u2013777."},{"key":"e_1_3_1_85_2","first-page":"692","volume-title":"WI\u201905","author":"Preda Mircea","year":"2005","unstructured":"Mircea Preda and Dan Popescu. 2005. Personalized web recommendations: Supporting epistemic information about end-users. In WI\u201905. 692\u2013695."},{"key":"e_1_3_1_86_2","doi-asserted-by":"publisher","DOI":"10.1145\/1297231.1297250"},{"key":"e_1_3_1_87_2","doi-asserted-by":"publisher","DOI":"10.1145\/1250910.1250923"},{"key":"e_1_3_1_88_2","doi-asserted-by":"publisher","DOI":"10.1145\/1363686.1363954"},{"key":"e_1_3_1_89_2","doi-asserted-by":"publisher","DOI":"10.1145\/1557914.1557930"},{"key":"e_1_3_1_90_2","first-page":"60","volume-title":"TAAI\u201910","author":"Chi Chung-Yi","year":"2010","unstructured":"Chung-Yi Chi, Richard Tzong-Han Tsai, Jeng-You Lai, and Jane Yung-jen Hsu. 2010. A reinforcement learning approach to emotion-based automatic playlist generation. In TAAI\u201910. 60\u201365."},{"key":"e_1_3_1_91_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-010-5229-0"},{"key":"e_1_3_1_92_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1541-0420.2011.01572.x"},{"key":"e_1_3_1_93_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10257-013-0222-3"},{"key":"e_1_3_1_94_2","article-title":"DJ-MC: A reinforcement-learning agent for music playlist recommendation","author":"Liebman Elad","year":"2014","unstructured":"Elad Liebman, Maytal Saar-Tsechansky, and Peter Stone. 2014. DJ-MC: A reinforcement-learning agent for music playlist recommendation. arXiv preprint arXiv:1401.1880 (2014).","journal-title":"arXiv preprint arXiv:1401.1880"},{"key":"e_1_3_1_95_2","volume-title":"IJCAI\u201915","author":"Theocharous Georgios","year":"2015","unstructured":"Georgios Theocharous, Philip S. Thomas, and Mohammad Ghavamzadeh. 2015. Personalized ad recommendation systems for life-time value optimization with guarantees. In IJCAI\u201915."},{"key":"e_1_3_1_96_2","article-title":"Partially observable Markov decision process for recommender systems","author":"Lu Zhongqi","year":"2016","unstructured":"Zhongqi Lu and Qiang Yang. 2016. Partially observable Markov decision process for recommender systems. arXiv preprint arXiv:1608.07793 (2016).","journal-title":"arXiv preprint arXiv:1608.07793"},{"key":"e_1_3_1_97_2","doi-asserted-by":"publisher","DOI":"10.1145\/3109859.3109914"},{"key":"e_1_3_1_98_2","first-page":"167","volume-title":"GWS","author":"Intayoad Wacharawan","year":"2018","unstructured":"Wacharawan Intayoad, Chayapol Kamyod, and Punnarumol Temdee. 2018. Reinforcement learning for online learning recommendation system. In GWS. 167\u2013170."},{"key":"e_1_3_1_99_2","first-page":"1","article-title":"Music recommender using deep embedding-based features and behavior-based reinforcement learning","author":"Chang Jia-Wei","year":"2019","unstructured":"Jia-Wei Chang, Ching-Yi Chiou, Jia-Yi Liao, Ying-Kai Hung, Chien-Che Huang, Kuan-Cheng Lin, and Ying-Hung Pu. 2019. Music recommender using deep embedding-based features and behavior-based reinforcement learning. Multimedia Tools and Applications (2019), 1\u201328.","journal-title":"Multimedia Tools and Applications"},{"key":"e_1_3_1_100_2","first-page":"197","volume-title":"iSCI\u201919","author":"Chen Jing","year":"2019","unstructured":"Jing Chen and Wenjun Jiang. 2019. Context-aware personalized POI sequence recommendation. In iSCI\u201919. Springer, 197\u2013210."},{"key":"e_1_3_1_101_2","first-page":"91","volume-title":"PAKDD\u201920","author":"Wang Yu","year":"2020","unstructured":"Yu Wang. 2020. A hybrid recommendation for music based on reinforcement learning. In PAKDD\u201920. 91\u2013103."},{"key":"e_1_3_1_102_2","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.2020.3727"},{"key":"e_1_3_1_103_2","doi-asserted-by":"publisher","DOI":"10.1145\/345124.345169"},{"key":"e_1_3_1_104_2","first-page":"1257","volume-title":"NIPS\u201908","author":"Mnih Andriy","year":"2008","unstructured":"Andriy Mnih and Russ R. Salakhutdinov. 2008. Probabilistic matrix factorization. In NIPS\u201908. 1257\u20131264."},{"key":"e_1_3_1_105_2","first-page":"3111","volume-title":"NIPS\u201913","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS\u201913. 3111\u20133119."},{"key":"e_1_3_1_106_2","article-title":"WaveNet: A generative model for raw audio","author":"Oord Aaron van den","year":"2016","unstructured":"Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. 2016. WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499 (2016).","journal-title":"arXiv preprint arXiv:1609.03499"},{"key":"e_1_3_1_107_2","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064009"},{"key":"e_1_3_1_108_2","volume-title":"Adaptive Control Processes","author":"Bellman Richard E.","year":"2015","unstructured":"Richard E. Bellman. 2015. Adaptive Control Processes. Princeton University Press."},{"key":"e_1_3_1_109_2","first-page":"407","volume-title":"Analysis, Design and Evaluation of Man-Machine Systems","author":"Barto Andrew G.","year":"1995","unstructured":"Andrew G. Barto. 1995. Reinforcement learning and dynamic programming. In Analysis, Design and Evaluation of Man-Machine Systems. 407\u2013412."},{"key":"e_1_3_1_110_2","first-page":"282","volume-title":"ECML\u201906","author":"Kocsis Levente","year":"2006","unstructured":"Levente Kocsis and Csaba Szepesv\u00e1ri. 2006. Bandit based Monte-Carlo planning. In ECML\u201906. 282\u2013293."},{"key":"e_1_3_1_111_2","unstructured":"MovieLens. https:\/\/grouplens.org\/datasets\/movielens\/. ([n. d.])."},{"key":"e_1_3_1_112_2","first-page":"591","volume-title":"ISMIR\u201911","author":"Bertin-Mahieux Thierry","year":"2011","unstructured":"Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere. 2011. The million song dataset. In ISMIR\u201911. 591\u2013596."},{"key":"e_1_3_1_113_2","article-title":"Deep reinforcement learning with attention for slate Markov decision processes with high-dimensional states and actions","author":"Sunehag Peter","year":"2015","unstructured":"Peter Sunehag, Richard Evans, Gabriel Dulac-Arnold, Yori Zwols, Daniel Visentin, and Ben Coppin. 2015. Deep reinforcement learning with attention for slate Markov decision processes with high-dimensional states and actions. arXiv preprint arXiv:1512.01124 (2015).","journal-title":"arXiv preprint arXiv:1512.01124"},{"key":"e_1_3_1_114_2","first-page":"2978","volume-title":"EMBC\u201916","author":"Nemati Shamim","year":"2016","unstructured":"Shamim Nemati, Mohammad M. Ghassemi, and Gari D. Clifford. 2016. Optimal medication dosing from suboptimal clinical examples: A deep reinforcement learning approach. In EMBC\u201916. 2978\u20132981."},{"key":"e_1_3_1_115_2","first-page":"372","volume-title":"AIxIA\u201917","author":"Greco Claudio","year":"2017","unstructured":"Claudio Greco, Alessandro Suglia, Pierpaolo Basile, and Giovanni Semeraro. 2017. Converse-Et-Impera: Exploiting deep learning and hierarchical reinforcement learning for conversational recommender systems. In AIxIA\u201917. 372\u2013386."},{"key":"e_1_3_1_116_2","article-title":"Deep reinforcement learning for sepsis treatment","author":"Raghu Aniruddh","year":"2017","unstructured":"Aniruddh Raghu, Matthieu Komorowski, Imran Ahmed, Leo Celi, Peter Szolovits, and Marzyeh Ghassemi. 2017. Deep reinforcement learning for sepsis treatment. arXiv preprint arXiv:1711.09602 (2017).","journal-title":"arXiv preprint arXiv:1711.09602"},{"key":"e_1_3_1_117_2","first-page":"2447","volume-title":"SIGKDD\u201918","author":"Wang Lu","year":"2018","unstructured":"Lu Wang, Wei Zhang, Xiaofeng He, and Hongyuan Zha. 2018. Supervised reinforcement learning with recurrent neural network for dynamic treatment recommendation. In SIGKDD\u201918. 2447\u20132456."},{"key":"e_1_3_1_118_2","doi-asserted-by":"publisher","DOI":"10.1145\/3209978.3210002"},{"key":"e_1_3_1_119_2","first-page":"734","volume-title":"PRICAI\u201919","author":"Zhao Chenfei","year":"2019","unstructured":"Chenfei Zhao and Lan Hu. 2019. CapDRL: A deep capsule reinforcement learning for movie recommendation. In PRICAI\u201919. Springer, 734\u2013739."},{"key":"e_1_3_1_120_2","first-page":"2810","volume-title":"SIGKDD\u201919","author":"Zou Lixin","year":"2019","unstructured":"Lixin Zou, Long Xia, Zhuoye Ding, Jiaxing Song, Weidong Liu, and Dawei Yin. 2019. Reinforcement learning to optimize long-term user engagement in recommender systems. In SIGKDD\u201919. 2810\u20132818."},{"key":"e_1_3_1_121_2","first-page":"1048","volume-title":"ICDM\u201919","author":"Gao Rong","year":"2019","unstructured":"Rong Gao, Haifeng Xia, Jing Li, Donghua Liu, Shuai Chen, and Gang Chun. 2019. DRCGR: Deep reinforcement learning framework incorporating CNN and GAN-based for interactive recommendation. In ICDM\u201919. 1048\u20131053."},{"key":"e_1_3_1_122_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350935"},{"key":"e_1_3_1_123_2","first-page":"51","volume-title":"WI\u201919","author":"Tsumita Daisuke","year":"2019","unstructured":"Daisuke Tsumita and Tomohiro Takagi. 2019. Dialogue based recommender system that flexibly mixes utterances and recommendations. In WI\u201919. 51\u201358."},{"key":"e_1_3_1_124_2","first-page":"435","volume-title":"AAAI\u201919","author":"Zhang Jing","year":"2019","unstructured":"Jing Zhang, Bowen Hao, Bo Chen, Cuiping Li, Hong Chen, and Jimeng Sun. 2019. Hierarchical reinforcement learning for course recommendation in MOOCs. In AAAI\u201919. 435\u2013442."},{"key":"e_1_3_1_125_2","article-title":"Deep reinforcement learning for personalized search story recommendation","author":"Zhang Jason","year":"2019","unstructured":"Jason Zhang, Junming Yin, Dongwon Lee, and Linhong Zhu. 2019. Deep reinforcement learning for personalized search story recommendation. Journal of Environmental Sciences (2019).","journal-title":"Journal of Environmental Sciences"},{"key":"e_1_3_1_126_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2925019"},{"key":"e_1_3_1_127_2","first-page":"1496","volume-title":"ICCT\u201919","author":"Yuyan Zhang","year":"2019","unstructured":"Zhang Yuyan, Su Xiayao, and Liu Yong. 2019. A novel movie recommendation system based on deep reinforcement learning with prioritized experience replay. In ICCT\u201919. 1496\u20131500."},{"key":"e_1_3_1_128_2","first-page":"535","volume-title":"SIGIR\u201919","author":"Gui Tao","year":"2019","unstructured":"Tao Gui, Peng Liu, Qi Zhang, Liang Zhu, Minlong Peng, Yunhua Zhou, and Xuanjing Huang. 2019. Mention recommendation in Twitter with cooperative multi-agent reinforcement learning. In SIGIR\u201919. 535\u2013544."},{"key":"e_1_3_1_129_2","volume-title":"NIPS\u201919","author":"Zhang Ruiyi","year":"2019","unstructured":"Ruiyi Zhang, Tong Yu, Yilin Shen, Hongxia Jin, and Changyou Chen. 2019. Text-based interactive recommendation via constraint-augmented reinforcement learning. In NIPS\u201919."},{"key":"e_1_3_1_130_2","doi-asserted-by":"publisher","DOI":"10.1145\/3359554"},{"key":"e_1_3_1_131_2","article-title":"Model-based reinforcement learning with adversarial training for online recommendation","author":"Bai Xueying","year":"2019","unstructured":"Xueying Bai, Jian Guan, and Hongning Wang. 2019. Model-based reinforcement learning with adversarial training for online recommendation. arXiv preprint arXiv:1911.03845 (2019).","journal-title":"arXiv preprint arXiv:1911.03845"},{"key":"e_1_3_1_132_2","first-page":"59","volume-title":"WI\u201919","author":"Hengst Floris Den","year":"2019","unstructured":"Floris Den Hengst, Mark Hoogendoorn, Frank Van Harmelen, and Joost Bosman. 2019. Reinforcement learning for personalized dialogue management. In WI\u201919. 59\u201367."},{"key":"e_1_3_1_133_2","first-page":"285","volume-title":"SIGIR\u201919","author":"Xian Yikun","year":"2019","unstructured":"Yikun Xian, Zuohui Fu, S. Muthukrishnan, Gerard De Melo, and Yongfeng Zhang. 2019. Reinforcement knowledge graph reasoning for explainable recommendation. In SIGIR\u201919. 285\u2013294."},{"key":"e_1_3_1_134_2","first-page":"3312","volume-title":"AAAI\u201919","author":"Chen Haokun","year":"2019","unstructured":"Haokun Chen, Xinyi Dai, Han Cai, Weinan Zhang, Xuejian Wang, Ruiming Tang, Yuzhou Zhang, and Yong Yu. 2019. Large-scale interactive recommendation with tree-structured policy gradient. In AAAI\u201919, Vol. 33. 3312\u20133320."},{"key":"e_1_3_1_135_2","first-page":"104","volume-title":"DASFAA\u201919","author":"Zou Lixin","year":"2019","unstructured":"Lixin Zou, Long Xia, Zhuoye Ding, Dawei Yin, Jiaxing Song, and Weidong Liu. 2019. Reinforcement learning to diversify top-n recommendation. In DASFAA\u201919. 104\u2013120."},{"key":"e_1_3_1_136_2","first-page":"1052","volume-title":"ICML\u201919","author":"Chen Xinshi","year":"2019","unstructured":"Xinshi Chen, Shuang Li, Hui Li, Shaohua Jiang, Yuan Qi, and Le Song. 2019. Generative adversarial user model for reinforcement learning based recommendation system. In ICML\u201919. 1052\u20131061."},{"key":"e_1_3_1_137_2","article-title":"Explainable knowledge graph-based recommendation via deep reinforcement learning","author":"Song Weiping","year":"2019","unstructured":"Weiping Song, Zhijian Duan, Ziqing Yang, Hao Zhu, Ming Zhang, and Jian Tang. 2019. Explainable knowledge graph-based recommendation via deep reinforcement learning. arXiv preprint arXiv:1906.09506 (2019).","journal-title":"arXiv preprint arXiv:1906.09506"},{"key":"e_1_3_1_138_2","doi-asserted-by":"publisher","DOI":"10.1145\/3336191.3371801"},{"key":"e_1_3_1_139_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2020.3012346"},{"key":"e_1_3_1_140_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403384"},{"key":"e_1_3_1_141_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2020.106302"},{"key":"e_1_3_1_142_2","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2020.2974848"},{"key":"e_1_3_1_143_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401237"},{"key":"e_1_3_1_144_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401170"},{"key":"e_1_3_1_145_2","article-title":"Deep reinforcement learning based recommendation with explicit user-item interactions modeling","author":"Liu Feng","year":"2018","unstructured":"Feng Liu, Ruiming Tang, Xutao Li, Weinan Zhang, Yunming Ye, Haokun Chen, Huifeng Guo, and Yuzhou Zhang. 2018. Deep reinforcement learning based recommendation with explicit user-item interactions modeling. arXiv preprint arXiv:1810.12027 (2018).","journal-title":"arXiv preprint arXiv:1810.12027"},{"key":"e_1_3_1_146_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2020.106170"},{"key":"e_1_3_1_147_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-47426-3_13"},{"key":"e_1_3_1_148_2","doi-asserted-by":"publisher","DOI":"10.1145\/3336191.3371769"},{"key":"e_1_3_1_149_2","doi-asserted-by":"publisher","DOI":"10.1145\/3383313.3412233"},{"key":"e_1_3_1_150_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3412044"},{"key":"e_1_3_1_151_2","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380098"},{"key":"e_1_3_1_152_2","first-page":"1","volume-title":"IJCNN\u201920","author":"Chen Xiaocong","year":"2020","unstructured":"Xiaocong Chen, Chaoran Huang, Lina Yao, Xianzhi Wang, Wenjie Zhang, et\u00a0al. 2020. Knowledge-guided deep reinforcement learning for interactive recommendation. In IJCNN\u201920. 1\u20138."},{"key":"e_1_3_1_153_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394592"},{"key":"e_1_3_1_154_2","first-page":"2073","volume-title":"SIGKDD\u201920","author":"Lei Wenqiang","year":"2020","unstructured":"Wenqiang Lei, Gangyi Zhang, Xiangnan He, Yisong Miao, Xiang Wang, Liang Chen, and Tat-Seng Chua. 2020. Interactive path reasoning on graph for conversational recommendation. In SIGKDD\u201920. 2073\u20132083."},{"key":"e_1_3_1_155_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401147"},{"key":"e_1_3_1_156_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401171"},{"key":"e_1_3_1_157_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401134"},{"key":"e_1_3_1_158_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2020.2998695"},{"key":"e_1_3_1_159_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401174"},{"key":"e_1_3_1_160_2","volume-title":"FAccTRec Workshop","author":"Singh Ashudeep","year":"2020","unstructured":"Ashudeep Singh, Yoni Halpern, Nithum Thain, Konstantina Christakopoulou, EH Chi, Jilin Chen, and Alex Beutel. 2020. Building healthy recommendation sequences for everyone: A safe reinforcement learning approach. In FAccTRec Workshop."},{"key":"e_1_3_1_161_2","doi-asserted-by":"publisher","DOI":"10.1145\/3336191.3371858"},{"key":"e_1_3_1_162_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2020.07.057"},{"key":"e_1_3_1_163_2","first-page":"1","volume-title":"CSICC\u201921","author":"Baghi Vahid","year":"2021","unstructured":"Vahid Baghi, Seyed Mohammad Seyed Motehayeri, Ali Moeini, and Rooholah Abedian. 2021. Improving ranking function and diversification in interactive recommendation systems based on deep reinforcement learning. In CSICC\u201921. 1\u20137."},{"key":"e_1_3_1_164_2","article-title":"Deep reinforcement learning based group recommender system","author":"Liu Zefang","year":"2021","unstructured":"Zefang Liu, Shuran Wen, and Yinzhu Quan. 2021. Deep reinforcement learning based group recommender system. arXiv preprint arXiv:2106.06900 (2021).","journal-title":"arXiv preprint arXiv:2106.06900"},{"key":"e_1_3_1_165_2","doi-asserted-by":"publisher","DOI":"10.1145\/3437963.3441824"},{"key":"e_1_3_1_166_2","article-title":"Deep reinforcement learning framework for category-based item recommendation","author":"Fu Mingsheng","year":"2021","unstructured":"Mingsheng Fu, Anubha Agrawal, Athirai A. Irissappane, Jie Zhang, Liwei Huang, and Hong Qu. 2021. Deep reinforcement learning framework for category-based item recommendation. IEEE Transactions on Cybernetics (2021).","journal-title":"IEEE Transactions on Cybernetics"},{"key":"e_1_3_1_167_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2021.107085"},{"key":"e_1_3_1_168_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2021.107217"},{"key":"e_1_3_1_169_2","doi-asserted-by":"publisher","DOI":"10.1145\/3442381.3449934"},{"key":"e_1_3_1_170_2","first-page":"1055","volume-title":"SIGKDD\u201921","author":"Liu Danyang","year":"2021","unstructured":"Danyang Liu, Jianxun Lian, Zheng Liu, Xiting Wang, Guangzhong Sun, and Xing Xie. 2021. Reinforced anchor knowledge graph generation for news recommendation reasoning. In SIGKDD\u201921. 1055\u20131065."},{"key":"e_1_3_1_171_2","article-title":"Unified conversational recommendation policy learning via graph-based reinforcement learning","author":"Deng Yang","year":"2021","unstructured":"Yang Deng, Yaliang Li, Fei Sun, Bolin Ding, and Wai Lam. 2021. Unified conversational recommendation policy learning via graph-based reinforcement learning. arXiv preprint arXiv:2105.09710 (2021).","journal-title":"arXiv preprint arXiv:2105.09710"},{"key":"e_1_3_1_172_2","volume-title":"AAAI\u201921","author":"Xie Ruobing","year":"2021","unstructured":"Ruobing Xie, Shaoliang Zhang, Rui Wang, Feng Xia, and Leyu Lin. 2021. Hierarchical reinforcement learning for integrated recommendation. In AAAI\u201921."},{"key":"e_1_3_1_173_2","article-title":"Reinforcement learning with a disentangled universal value function for item recommendation","author":"Wang Kai","year":"2021","unstructured":"Kai Wang, Zhene Zou, Qilin Deng, Runze Wu, Jianrong Tao, Changjie Fan, Liang Chen, and Peng Cui. 2021. Reinforcement learning with a disentangled universal value function for item recommendation. arXiv preprint arXiv:2104.02981 (2021).","journal-title":"arXiv preprint arXiv:2104.02981"},{"key":"e_1_3_1_174_2","first-page":"750","volume-title":"AAAI\u201921","author":"Zhao Xiangyu","year":"2021","unstructured":"Xiangyu Zhao, Changsheng Gu, Haoshenglun Zhang, Xiwang Yang, Xiaobing Liu, Hui Liu, and Jiliang Tang. 2021. DEAR: Deep reinforcement learning for online advertising impression in recommender systems. In AAAI\u201921. 750\u2013758."},{"key":"e_1_3_1_175_2","article-title":"Value penalized Q-learning for recommender systems","author":"Gao Chengqian","year":"2021","unstructured":"Chengqian Gao, Ke Xu, and Peilin Zhao. 2021. Value penalized Q-learning for recommender systems. arXiv preprint arXiv:2110.07923 (2021).","journal-title":"arXiv preprint arXiv:2110.07923"},{"key":"e_1_3_1_176_2","volume-title":"AAAI\u201921","author":"Xiao Teng","year":"2021","unstructured":"Teng Xiao and Donglin Wang. 2021. A general offline reinforcement learning framework for interactive recommendation. In AAAI\u201921."},{"key":"e_1_3_1_177_2","doi-asserted-by":"publisher","DOI":"10.1145\/3437963.3441764"},{"key":"e_1_3_1_178_2","unstructured":"Dankit K. Nassiuma. 2001. Survey sampling: Theory and methods. (2001)."},{"key":"e_1_3_1_179_2","article-title":"Hindsight experience replay","author":"Andrychowicz Marcin","year":"2017","unstructured":"Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. 2017. Hindsight experience replay. arXiv preprint arXiv:1707.01495 (2017).","journal-title":"arXiv preprint arXiv:1707.01495"},{"key":"e_1_3_1_180_2","first-page":"1201","volume-title":"ICML\u201909","author":"Yue Yisong","year":"2009","unstructured":"Yisong Yue and Thorsten Joachims. 2009. Interactively optimizing information retrieval systems as a dueling bandits problem. In ICML\u201909. 1201\u20131208."},{"key":"e_1_3_1_181_2","volume-title":"AAAI\u201917","author":"Yu Lantao","year":"2017","unstructured":"Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. 2017. SeqGAN: Sequence generative adversarial nets with policy gradient. In AAAI\u201917."},{"key":"e_1_3_1_182_2","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRev.36.823"},{"key":"e_1_3_1_183_2","first-page":"1861","volume-title":"ICML\u201918","author":"Haarnoja Tuomas","year":"2018","unstructured":"Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. 2018. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In ICML\u201918. 1861\u20131870."},{"key":"e_1_3_1_184_2","article-title":"Learning to communicate with deep multi-agent reinforcement learning","author":"Foerster Jakob N.","year":"2016","unstructured":"Jakob N. Foerster, Yannis M. Assael, Nando De Freitas, and Shimon Whiteson. 2016. Learning to communicate with deep multi-agent reinforcement learning. arXiv preprint arXiv:1605.06676 (2016).","journal-title":"arXiv preprint arXiv:1605.06676"},{"key":"e_1_3_1_185_2","article-title":"Multi-agent actor-critic for mixed cooperative-competitive environments","author":"Lowe Ryan","year":"2017","unstructured":"Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. 2017. Multi-agent actor-critic for mixed cooperative-competitive environments. arXiv preprint arXiv:1706.02275 (2017).","journal-title":"arXiv preprint arXiv:1706.02275"},{"key":"e_1_3_1_186_2","first-page":"4565","article-title":"Generative adversarial imitation learning","volume":"29","author":"Ho Jonathan","year":"2016","unstructured":"Jonathan Ho and Stefano Ermon. 2016. Generative adversarial imitation learning. NIPS\u201916 29 (2016), 4565\u20134573.","journal-title":"NIPS\u201916"},{"key":"e_1_3_1_187_2","volume-title":"Survival Analysis","author":"Kleinbaum David G.","year":"2010","unstructured":"David G. Kleinbaum and Mitchel Klein. 2010. Survival Analysis. Springer."},{"key":"e_1_3_1_188_2","doi-asserted-by":"publisher","DOI":"10.1145\/3018661.3018719"},{"key":"e_1_3_1_189_2","article-title":"RecSim: A configurable simulation platform for recommender systems","author":"Ie Eugene","year":"2019","unstructured":"Eugene Ie et\u00a0al. 2019. RecSim: A configurable simulation platform for recommender systems. arXiv preprint arXiv:1909.04847 (2019).","journal-title":"arXiv preprint arXiv:1909.04847"},{"key":"e_1_3_1_190_2","doi-asserted-by":"publisher","DOI":"10.1198\/106186008X320456"},{"key":"e_1_3_1_191_2","unstructured":"Yelp. https:\/\/www.yelp.com\/academic_dataset. ([n. d.])."},{"key":"e_1_3_1_192_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.39.10.1953"},{"key":"e_1_3_1_193_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-60990-0_12"},{"key":"e_1_3_1_194_2","doi-asserted-by":"publisher","DOI":"10.1016\/0304-4068(74)90037-8"},{"key":"e_1_3_1_195_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022140919877"},{"key":"e_1_3_1_196_2","first-page":"3540","volume-title":"ICML\u201917","author":"Vezhnevets Alexander Sasha","year":"2017","unstructured":"Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, and Koray Kavukcuoglu. 2017. Feudal networks for hierarchical reinforcement learning. In ICML\u201917. 3540\u20133549."},{"key":"e_1_3_1_197_2","article-title":"Data-efficient hierarchical reinforcement learning","author":"Nachum Ofir","year":"2018","unstructured":"Ofir Nachum, Shixiang Gu, Honglak Lee, and Sergey Levine. 2018. Data-efficient hierarchical reinforcement learning. arXiv preprint arXiv:1805.08296 (2018).","journal-title":"arXiv preprint arXiv:1805.08296"},{"key":"e_1_3_1_198_2","first-page":"3675","article-title":"Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation","volume":"29","author":"Kulkarni Tejas D.","year":"2016","unstructured":"Tejas D. Kulkarni, Karthik Narasimhan, Ardavan Saeedi, and Josh Tenenbaum. 2016. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. NIPS\u201916 29 (2016), 3675\u20133683.","journal-title":"NIPS\u201916"},{"key":"e_1_3_1_199_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(99)00052-1"},{"key":"e_1_3_1_200_2","first-page":"995","volume-title":"ICDM\u201910","author":"Rendle Steffen","year":"2010","unstructured":"Steffen Rendle. 2010. Factorization machines. In ICDM\u201910. 995\u20131000."},{"key":"e_1_3_1_201_2","first-page":"359","article-title":"Supervised actor-critic reinforcement learning","author":"Rosenstein Michael T.","year":"2004","unstructured":"Michael T. Rosenstein, Andrew G. Barto, Jennie Si, Andy Barto, Warren Powell, and Donald Wunsch. 2004. Supervised actor-critic reinforcement learning. Learning and Approximate Dynamic Programming: Scaling Up to the Real World (2004), 359\u2013380.","journal-title":"Learning and Approximate Dynamic Programming: Scaling Up to the Real World"},{"key":"e_1_3_1_202_2","doi-asserted-by":"publisher","DOI":"10.5555\/2789272.2886795"},{"key":"e_1_3_1_203_2","doi-asserted-by":"publisher","DOI":"10.1145\/3054912"},{"key":"e_1_3_1_204_2","doi-asserted-by":"publisher","DOI":"10.5555\/2016945.2016946"},{"key":"e_1_3_1_205_2","first-page":"585","volume-title":"SIGCHI\u201903","author":"Cosley Dan","year":"2003","unstructured":"Dan Cosley, Shyong K. Lam, Istvan Albert, Joseph A. Konstan, and John Riedl. 2003. Is seeing believing? How recommender system interfaces affect users\u2019 opinions. In SIGCHI\u201903. 585\u2013592."},{"key":"e_1_3_1_206_2","first-page":"135","volume-title":"ICETE\u201905","author":"Chen Li","year":"2005","unstructured":"Li Chen and Pearl Pu. 2005. Trust building in recommender agents. In ICETE\u201905. 135\u2013145."},{"key":"e_1_3_1_207_2","doi-asserted-by":"publisher","DOI":"10.1145\/1297231.1297259"},{"key":"e_1_3_1_208_2","doi-asserted-by":"publisher","DOI":"10.1145\/3236386.3241340"},{"key":"e_1_3_1_209_2","first-page":"587","volume-title":"ICDM\u201918","author":"Wang Xiting","year":"2018","unstructured":"Xiting Wang, Yiru Chen, Jie Yang, Le Wu, Zhengtao Wu, and Xing Xie. 2018. A reinforcement learning framework for explainable recommendation. In ICDM\u201918. 587\u2013596."},{"key":"e_1_3_1_210_2","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3313404"},{"key":"e_1_3_1_211_2","article-title":"Evolution strategies as a scalable alternative to reinforcement learning","author":"Salimans Tim","year":"2017","unstructured":"Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, and Ilya Sutskever. 2017. Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864 (2017).","journal-title":"arXiv preprint arXiv:1703.03864"},{"key":"e_1_3_1_212_2","article-title":"A load balanced recommendation approach","author":"Afsar Mehdi","year":"2021","unstructured":"Mehdi Afsar, Trafford Crump, and Behrouz Far. 2021. A load balanced recommendation approach. arXiv preprint arXiv:2105.09981 (2021).","journal-title":"arXiv preprint arXiv:2105.09981"},{"key":"e_1_3_1_213_2","article-title":"OpenAI Gym","author":"Brockman Greg","year":"2016","unstructured":"Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016).","journal-title":"arXiv preprint arXiv:1606.01540"},{"key":"e_1_3_1_214_2","article-title":"Horizon: Facebook\u2019s open source applied reinforcement learning platform","author":"Gauci Jason","year":"2018","unstructured":"Jason Gauci, Edoardo Conti, Yitao Liang, Kittipat Virochsiri, Yuchen He, Zachary Kaden, Vivek Narayanan, Xiaohui Ye, Zhengxing Chen, and Scott Fujimoto. 2018. Horizon: Facebook\u2019s open source applied reinforcement learning platform. arXiv preprint arXiv:1811.00260 (2018).","journal-title":"arXiv preprint arXiv:1811.00260"},{"key":"e_1_3_1_215_2","article-title":"RecoGym: A reinforcement learning environment for the problem of product recommendation in online advertising","author":"Rohde David","year":"2018","unstructured":"David Rohde, Stephen Bonner, Travis Dunlop, Flavian Vasile, and Alexandros Karatzoglou. 2018. RecoGym: A reinforcement learning environment for the problem of product recommendation in online advertising. arXiv preprint arXiv:1808.00720 (2018).","journal-title":"arXiv preprint arXiv:1808.00720"},{"key":"e_1_3_1_216_2","article-title":"Toward simulating environments in reinforcement learning based recommendations","author":"Zhao Xiangyu","year":"2019","unstructured":"Xiangyu Zhao, Long Xia, Zhuoye Ding, Dawei Yin, and Jiliang Tang. 2019. Toward simulating environments in reinforcement learning based recommendations. arXiv preprint arXiv:1906.11462 (2019).","journal-title":"arXiv preprint arXiv:1906.11462"},{"key":"e_1_3_1_217_2","doi-asserted-by":"publisher","DOI":"10.1145\/3298689.3346981"},{"key":"e_1_3_1_218_2","doi-asserted-by":"publisher","DOI":"10.1145\/3383313.3412252"},{"key":"e_1_3_1_219_2","article-title":"Progressive growing of GANs for improved quality, stability, and variation","author":"Karras Tero","year":"2017","unstructured":"Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. 2017. Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196 (2017).","journal-title":"arXiv preprint arXiv:1710.10196"},{"key":"e_1_3_1_220_2","first-page":"4401","volume-title":"CVPR\u201919","author":"Karras Tero","year":"2019","unstructured":"Tero Karras, Samuli Laine, and Timo Aila. 2019. A style-based generator architecture for generative adversarial networks. In CVPR\u201919. 4401\u20134410."},{"key":"e_1_3_1_221_2","first-page":"7799","volume-title":"CVPR\u201920","author":"Karnewar Animesh","year":"2020","unstructured":"Animesh Karnewar and Oliver Wang. 2020. MSG-GAN: Multi-scale gradients for generative adversarial networks. In CVPR\u201920. 7799\u20137808."}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3543846","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3543846","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T17:49:39Z","timestamp":1750268979000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3543846"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,15]]},"references-count":220,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2023,7,31]]}},"alternative-id":["10.1145\/3543846"],"URL":"https:\/\/doi.org\/10.1145\/3543846","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,15]]},"assertion":[{"value":"2020-12-03","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-06-03","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-12-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}