{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,5]],"date-time":"2025-11-05T21:10:08Z","timestamp":1762377008486,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":40,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,9,13]],"date-time":"2021-09-13T00:00:00Z","timestamp":1631491200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["EP\/R018634\/1"],"award-info":[{"award-number":["EP\/R018634\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,9,13]]},"DOI":"10.1145\/3460231.3474256","type":"proceedings-article","created":{"date-parts":[[2021,9,13]],"date-time":"2021-09-13T21:45:04Z","timestamp":1631569504000},"page":"241-251","source":"Crossref","is-referenced-by-count":11,"title":["Partially Observable Reinforcement Learning for Dialog-based Interactive Recommendation"],"prefix":"10.1145","author":[{"given":"Yaxiong","family":"Wu","sequence":"first","affiliation":[{"name":"University of Glasgow, United Kingdom"}]},{"given":"Craig","family":"Macdonald","sequence":"additional","affiliation":[{"name":"School of Computing Science, University of Glasgow, United Kingdom"}]},{"given":"Iadh","family":"Ounis","sequence":"additional","affiliation":[{"name":"University of Glasgow, United Kingdom"}]}],"member":"320","published-online":{"date-parts":[[2021,9,13]]},"reference":[{"volume-title":"Constrained Markov decision processes","author":"Altman Eitan","key":"e_1_3_2_2_1_1","unstructured":"Eitan Altman. 1999. 
Constrained Markov decision processes. CRC Press."},{"key":"e_1_3_2_2_2_1","volume-title":"Proc. ECCV. 663\u2013676","author":"Berg L","year":"2010","unstructured":"Tamara\u00a0L Berg, Alexander\u00a0C Berg, and Jonathan Shih. 2010. Automatic attribute discovery and characterization from noisy web data. In Proc. ECCV. 663\u2013676."},{"key":"e_1_3_2_2_3_1","volume-title":"Proc. WSDM. 456\u2013464","author":"Chen Minmin","year":"2019","unstructured":"Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, and Ed\u00a0H Chi. 2019. Top-k off-policy correction for a REINFORCE recommender system. In Proc. WSDM. 456\u2013464."},{"key":"e_1_3_2_2_4_1","volume-title":"Proc. WSDM. 121\u2013129","author":"Chen Minmin","year":"2021","unstructured":"Minmin Chen, Bo Chang, Can Xu, and Ed\u00a0H Chi. 2021. User Response Models to Improve a REINFORCE Recommender System. In Proc. WSDM. 121\u2013129."},{"key":"e_1_3_2_2_5_1","unstructured":"Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555(2014)."},{"key":"e_1_3_2_2_6_1","volume-title":"Proc. UAI. 
1061\u20131071","author":"Gangwani Tanmay","year":"2020","unstructured":"Tanmay Gangwani, Joel Lehman, Qiang Liu, and Jian Peng. 2020. Learning belief representations for imitation learning in pomdps. In Proc. UAI. 1061\u20131071."},{"key":"e_1_3_2_2_7_1","unstructured":"Chongming Gao, Wenqiang Lei, Xiangnan He, Maarten de Rijke, and Tat-Seng Chua. 2021. Advances and Challenges in Conversational Recommender Systems: A Survey. arXiv preprint arXiv:2101.09459(2021)."},{"volume-title":"Deep Learning","author":"Goodfellow Ian","key":"e_1_3_2_2_8_1","unstructured":"Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press."},{"key":"e_1_3_2_2_9_1","volume-title":"Proc. NeurIPS. 678\u2013688","author":"Guo Xiaoxiao","year":"2018","unstructured":"Xiaoxiao Guo, Hui Wu, Yu Cheng, Steven Rennie, Gerald Tesauro, and Rogerio Feris. 2018. Dialog-based interactive image retrieval. In Proc. NeurIPS. 678\u2013688."},{"key":"e_1_3_2_2_10_1","volume-title":"Fashion IQ: A New Dataset towards Retrieving Images by Natural Language Feedback. arXiv preprint arXiv:1905.12794(2019).","author":"Guo Xiaoxiao","year":"2019","unstructured":"
Xiaoxiao Guo, Hui Wu, Yupeng Gao, Steven Rennie, and Rogerio Feris. 2019. Fashion IQ: A New Dataset towards Retrieving Images by Natural Language Feedback. arXiv preprint arXiv:1905.12794(2019)."},{"key":"e_1_3_2_2_11_1","volume-title":"Double Q-learning. Proc. NeurIPS","author":"Hasselt Hado","year":"2010","unstructured":"Hado Hasselt. 2010. Double Q-learning. Proc. NeurIPS (2010), 2613\u20132621."},{"key":"e_1_3_2_2_12_1","volume-title":"Proc. AAAI. 29\u201337","author":"Hausknecht Matthew","year":"2015","unstructured":"Matthew Hausknecht and Peter Stone. 2015. Deep recurrent Q-learning for partially observable mdps. In Proc. AAAI. 29\u201337."},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_14_1","volume-title":"Proc. ICLR","author":"Hidasi Bal\u00e1zs","year":"2016","unstructured":"Bal\u00e1zs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2016. Session-based recommendations with recurrent neural networks. Proc. ICLR (2016)."},{"key":"e_1_3_2_2_15_1","volume-title":"Proc. ICML. 2117\u20132126","author":"Igl Maximilian","year":"2018","unstructured":"Maximilian Igl, Luisa Zintgraf, Tuan\u00a0Anh Le, Frank Wood, and Shimon Whiteson. 2018. Deep variational reinforcement learning for POMDPs. In Proc. ICML. 2117\u20132126."},{"key":"e_1_3_2_2_16_1","unstructured":"Dietmar Jannach, Ahtsham Manzoor, Wanling Cai, and Li Chen. 2020. 
A Survey on Conversational Recommender Systems. arXiv preprint arXiv:2004.00646(2020)."},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2018.00035"},{"key":"e_1_3_2_2_18_1","volume-title":"Proc. ICLR.","author":"Kingma P","year":"2014","unstructured":"Diederik\u00a0P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. In Proc. ICLR."},{"key":"e_1_3_2_2_19_1","volume-title":"Proc. NeurIPS. 1008\u20131014","author":"Konda R","year":"2000","unstructured":"Vijay\u00a0R Konda and John\u00a0N Tsitsiklis. 2000. Actor-critic algorithms. In Proc. NeurIPS. 1008\u20131014."},{"key":"e_1_3_2_2_20_1","volume-title":"Proc. WSDM. 304\u2013312","author":"Lei Wenqiang","year":"2020","unstructured":"Wenqiang Lei, Xiangnan He, Yisong Miao, Qingyun Wu, Richang Hong, Min-Yen Kan, and Tat-Seng Chua. 2020. Estimation-action-reflection: Towards deep interaction between conversational and recommender systems. In Proc. WSDM. 304\u2013312."},{"key":"e_1_3_2_2_21_1","volume-title":"Interactive recommendation with user-specific deep reinforcement learning. TKDD","author":"Lei Yu","year":"2019","unstructured":"Yu Lei and Wenjie Li. 2019. Interactive recommendation with user-specific deep reinforcement learning. 
TKDD (2019), 1\u201315."},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.124"},{"key":"e_1_3_2_2_23_1","unstructured":"Zhongqi Lu and Qiang Yang. 2016. Partially observable markov decision process for recommender systems. arXiv preprint arXiv:1608.07793(2016)."},{"key":"e_1_3_2_2_24_1","volume-title":"Proc. NeurIPS. 3111\u20133119","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proc. NeurIPS. 3111\u20133119."},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.131"},{"key":"e_1_3_2_2_26_1","volume-title":"An MDP-based recommender system. JMLR","author":"Shani Guy","year":"2005","unstructured":"Guy Shani, David Heckerman, Ronen\u00a0I Brafman, and Craig Boutilier. 2005. An MDP-based recommender system. JMLR (2005)."},{"key":"e_1_3_2_2_27_1","volume-title":"Proc. ICML. 387\u2013395","author":"Silver David","year":"2014","unstructured":"David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. 2014. Deterministic policy gradient algorithms. In Proc. ICML. 387\u2013395."},{"key":"e_1_3_2_2_28_1","volume-title":"Proc. SIGIR. 
235\u2013244","author":"Sun Yueming","year":"2018","unstructured":"Yueming Sun and Yi Zhang. 2018. Conversational recommender system. In Proc. SIGIR. 235\u2013244."},{"volume-title":"Reinforcement learning: An introduction","author":"Sutton S","key":"e_1_3_2_2_29_1","unstructured":"Richard\u00a0S Sutton and Andrew\u00a0G Barto. 2018. Reinforcement learning: An introduction. MIT Press."},{"key":"e_1_3_2_2_30_1","volume-title":"Proc. WSDM. 565\u2013573","author":"Tang Jiaxi","year":"2018","unstructured":"Jiaxi Tang and Ke Wang. 2018. Personalized top-n sequential recommendation via convolutional sequence embedding. In Proc. WSDM. 565\u2013573."},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00449"},{"key":"e_1_3_2_2_32_1","volume-title":"Proc. SIGIR. 931\u2013940","author":"Xin Xin","year":"2020","unstructured":"Xin Xin, Alexandros Karatzoglou, Ioannis Arapakis, and Joemon\u00a0M Jose. 2020. Self-Supervised Reinforcement Learning for Recommender Systems. In Proc. SIGIR. 931\u2013940."},{"key":"e_1_3_2_2_33_1","volume-title":"Proc. WSDM. 364\u2013372","author":"Xu Kerui","year":"2021","unstructured":"Kerui Xu, Jingxuan Yang, Jun Xu, Sheng Gao, Jun Guo, and Ji-Rong Wen. 2021. 
Adapting User Preference to Online Feedback in Multi-round Conversational Recommendation. In Proc. WSDM. 364\u2013372."},{"key":"e_1_3_2_2_34_1","volume-title":"Proc. KDD. 157\u2013165","author":"Yu Tong","year":"2019","unstructured":"Tong Yu, Yilin Shen, and Hongxia Jin. 2019. A visual dialog augmented interactive recommender system. In Proc. KDD. 157\u2013165."},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i01.5465"},{"key":"e_1_3_2_2_36_1","volume-title":"Proc. NeurIPS. 15214\u201315224","author":"Zhang Ruiyi","year":"2019","unstructured":"Ruiyi Zhang, Tong Yu, Yilin Shen, Hongxia Jin, and Changyou Chen. 2019. Text-based interactive recommendation via constraint-augmented reinforcement learning. In Proc. NeurIPS. 15214\u201315224."},{"key":"e_1_3_2_2_37_1","volume-title":"Proc. CIKM. 177\u2013186","author":"Zhang Yongfeng","year":"2018","unstructured":"Yongfeng Zhang, Xu Chen, Qingyao Ai, Liu Yang, and W\u00a0Bruce Croft. 2018. Towards conversational search and recommendation: System ask, user respond. In Proc. CIKM. 177\u2013186."},{"key":"e_1_3_2_2_38_1","volume-title":"Proc. WWW. 167\u2013176","author":"Zheng Guanjie","year":"2018","unstructured":"
Guanjie Zheng, Fuzheng Zhang, Zihan Zheng, Yang Xiang, Nicholas\u00a0Jing Yuan, Xing Xie, and Zhenhui Li. 2018. DRN: A deep reinforcement learning framework for news recommendation. In Proc. WWW. 167\u2013176."},{"key":"e_1_3_2_2_39_1","volume-title":"Proc. KDD. 2810\u20132818","author":"Zou Lixin","year":"2019","unstructured":"Lixin Zou, Long Xia, Zhuoye Ding, Jiaxing Song, Weidong Liu, and Dawei Yin. 2019. Reinforcement learning to optimize long-term user engagement in recommender systems. In Proc. KDD. 2810\u20132818."},{"key":"e_1_3_2_2_40_1","volume-title":"Proc. SIGIR. 749\u2013758","author":"Zou Lixin","year":"2020","unstructured":"Lixin Zou, Long Xia, Yulong Gu, Xiangyu Zhao, Weidong Liu, Jimmy\u00a0Xiangji Huang, and Dawei Yin. 2020. Neural Interactive Collaborative Filtering. In Proc. SIGIR. 
749\u2013758."}],"event":{"name":"RecSys '21: Fifteenth ACM Conference on Recommender Systems","sponsor":["SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGAI ACM Special Interest Group on Artificial Intelligence","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data","SIGIR ACM Special Interest Group on Information Retrieval","SIGCHI ACM Special Interest Group on Computer-Human Interaction","SIGecom Special Interest Group on Economics and Computation"],"location":"Amsterdam Netherlands","acronym":"RecSys '21"},"container-title":["Fifteenth ACM Conference on Recommender Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3460231.3474256","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3460231.3474256","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:12:17Z","timestamp":1750191137000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3460231.3474256"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,13]]},"references-count":40,"alternative-id":["10.1145\/3460231.3474256","10.1145\/3460231"],"URL":"https:\/\/doi.org\/10.1145\/3460231.3474256","relation":{},"subject":[],"published":{"date-parts":[[2021,9,13]]}}}