{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T14:49:54Z","timestamp":1776350994748,"version":"3.51.2"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2021,11,30]],"date-time":"2021-11-30T00:00:00Z","timestamp":1638230400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61672252"],"award-info":[{"award-number":["61672252"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"crossref","award":["2019kfyXKJC021"],"award-info":[{"award-number":["2019kfyXKJC021"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM\/IMS Trans. Data Sci."],"published-print":{"date-parts":[[2021,11,30]]},"abstract":"<jats:p>\n                    Due to its nature of learning from dynamic interactions and planning for long-run performance, Reinforcement Learning (RL) has attracted much attention in Interactive Recommender Systems (IRSs). However, most existing RL-based IRSs face a large discrete action space problem, which severely limits their efficiency. Moreover, data sparsity is another problem that most IRSs are confronted with. The utilization of recommendation-related textual knowledge can tackle this problem to some extent, but existing RL-based recommendation methods either neglect to combine textual information or are not suitable for incorporating it. 
To address these two problems, in this article, we propose a\n                    <jats:underline>T<\/jats:underline>\n                    ext-based deep\n                    <jats:underline>R<\/jats:underline>\n                    einforcement learning framework using self-supervised\n                    <jats:underline>G<\/jats:underline>\n                    raph representation for\n                    <jats:underline>I<\/jats:underline>\n                    nteractive\n                    <jats:underline>R<\/jats:underline>\n                    ecommendation (TRGIR). Specifically, we leverage textual information to map items and users into the same feature space by a self-supervised embedding method based on the graph convolutional network, which greatly alleviates the data sparsity problem. Moreover, we design an effective method to construct an action candidate set, which directly reduces the scale of the action space. Two types of representative reinforcement learning algorithms have been applied to implement TRGIR. Since the action space of IRS is discrete, it is natural to implement TRGIR with Deep Q-learning Network (DQN). In the TRGIR implementation with Deep Deterministic Policy Gradient (DDPG), denoted as TRGIR-DDPG, we design a policy vector, which can represent the user\u2019s preferences, to generate discrete actions from the candidate set. 
Through extensive experiments on three public datasets, we demonstrate that TRGIR-DDPG achieves state-of-the-art performance over several baselines in a time-efficient manner.\n                  <\/jats:p>","DOI":"10.1145\/3522596","type":"journal-article","created":{"date-parts":[[2022,5,17]],"date-time":"2022-05-17T09:52:21Z","timestamp":1652781141000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["A Text-based Deep Reinforcement Learning Framework Using Self-supervised Graph Representation for Interactive Recommendation"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4371-7514","authenticated-orcid":false,"given":"Chaoyang","family":"Wang","sequence":"first","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9393-4854","authenticated-orcid":false,"given":"Zhiqiang","family":"Guo","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5265-7624","authenticated-orcid":false,"given":"Jianjun","family":"Li","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6984-1914","authenticated-orcid":false,"given":"Guohui","family":"Li","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9598-2072","authenticated-orcid":false,"given":"Peng","family":"Pan","sequence":"additional","affiliation":[{"name":"Huazhong University of Science and Technology, Wuhan, 
China"}]}],"member":"320","published-online":{"date-parts":[[2022,5,17]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/276304.276314"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.3233\/IA-170031"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098170"},{"key":"e_1_3_2_5_2","article-title":"Graph convolutional matrix completion","author":"Berg Rianne van den","year":"2017","unstructured":"Rianne van den Berg, Thomas N. Kipf, and Max Welling. 2017. Graph convolutional matrix completion. arXiv preprint arXiv:1706.02263 (2017).","journal-title":"arXiv preprint arXiv:1706.02263"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33013312"},{"key":"e_1_3_2_7_2","first-page":"1052","volume-title":"Proceedings of the 36th International Conference on Machine Learning (ICML)","volume":"97","author":"Chen Xinshi","year":"2019","unstructured":"Xinshi Chen, Shuang Li, Hui Li, Shaohua Jiang, Yuan Qi, and Le Song. 2019. Generative adversarial user model for reinforcement learning based recommendation system. In Proceedings of the 36th International Conference on Machine Learning (ICML), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), Vol. 97. PMLR, 1052\u20131061."},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3308560.3316457"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3269206.3271810"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240323.3240353"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.330161"},{"key":"e_1_3_2_12_2","article-title":"Deep reinforcement learning in large discrete action spaces","author":"Dulac-Arnold Gabriel","year":"2015","unstructured":"Gabriel Dulac-Arnold, Richard Evans, Hado van Hasselt, Peter Sunehag, Timothy Lillicrap, Jonathan Hunt, Timothy Mann, Theophane Weber, Thomas Degris, and Ben Coppin. 2015. 
Deep reinforcement learning in large discrete action spaces. arXiv preprint arXiv:1512.07679 (2015).","journal-title":"arXiv preprint arXiv:1512.07679"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.4135\/9781412985475"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401063"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3038912.3052569"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219846"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/582415.582418"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2018.00035"},{"key":"e_1_3_2_20_2","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR)","author":"Kipf Thomas N.","year":"2017","unstructured":"Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In Proceedings of the International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2009.263"},{"key":"e_1_3_2_22_2","unstructured":"Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. Retrieved from http:\/\/snap.stanford.edu\/data."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772758"},{"key":"e_1_3_2_24_2","volume-title":"Proceedings of the 4th International Conference on Learning Representations (ICLR Poster)","author":"Lillicrap Timothy P.","year":"2016","unstructured":"Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2016. Continuous control with deep reinforcement learning. 
In Proceedings of the 4th International Conference on Learning Representations (ICLR Poster)."},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/1639714.1639717"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_3_2_27_2","article-title":"Representation learning with contrastive predictive coding","author":"Oord Aaron van den","year":"2018","unstructured":"Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018).","journal-title":"arXiv preprint arXiv:1807.03748"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-93417-4_38"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/2740908.2742726"},{"key":"e_1_3_2_31_2","first-page":"387","volume-title":"Proceedings of the 31st International Conference on Machine Learning (ICML)","author":"Silver David","year":"2014","unstructured":"David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. 2014. Deterministic policy gradient algorithms. In Proceedings of the 31st International Conference on Machine Learning (ICML). 387\u2013395."},{"key":"e_1_3_2_32_2","first-page":"926","volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems","author":"Socher Richard","year":"2013","unstructured":"Richard Socher, Danqi Chen, Christopher D. Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In Proceedings of the Conference on Advances in Neural Information Processing Systems. 
926\u2013934."},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080677"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3159652.3159656"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.3233\/FAIA200136"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i1.10936"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i5.16569"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3331184.3331267"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080685"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3462862"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i1.10952"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.10329"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2017\/447"},{"key":"e_1_3_2_44_2","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1145\/2835776.2835837","volume-title":"Proceedings of the 9th ACM International Conference on Web Search and Data Mining (WSDM)","author":"Yao Wu","year":"2016","unstructured":"Wu Yao, Christopher Dubois, Alice X. Zheng, and Martin Ester. 2016. Collaborative denoising auto-encoders for top-N recommender systems. In Proceedings of the 9th ACM International Conference on Web Search and Data Mining (WSDM). 153\u2013162."},{"key":"e_1_3_2_45_2","first-page":"15214","article-title":"Text-based interactive recommendation via constraint-augmented reinforcement learning","volume":"32","author":"Zhang Ruiyi","year":"2019","unstructured":"Ruiyi Zhang, Tong Yu, Yilin Shen, Hongxia Jin, and Changyou Chen. 2019. Text-based interactive recommendation via constraint-augmented reinforcement learning. Adv. Neural Inf. Process. Syst. 32 (2019), 15214\u201315224.","journal-title":"Adv. Neural Inf. Process. 
Syst."},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3320496.3320500"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240323.3240374"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219886"},{"key":"e_1_3_2_49_2","article-title":"Deep reinforcement learning for list-wise recommendations","author":"Zhao Xiangyu","year":"2018","unstructured":"Xiangyu Zhao, Liang Zhang, Zhuoye Ding, Dawei Yin, Yihong Zhao, and Jiliang Tang. 2018. Deep reinforcement learning for list-wise recommendations. arXiv preprint arXiv:1801.00209 (2018).","journal-title":"arXiv preprint arXiv:1801.00209"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2505690"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3178876.3185994"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1145\/3018661.3018665"}],"container-title":["ACM\/IMS Transactions on Data Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3522596","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3522596","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T13:55:51Z","timestamp":1776347751000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3522596"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,30]]},"references-count":51,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,11,30]]}},"alternative-id":["10.1145\/3522596"],"URL":"https:\/\/doi.org\/10.1145\/3522596","relation":{},"ISSN":["2691-1922"],"issn-type":[{"value":"2691-1922","type":"print"}],"subject":[],"published":{"date-parts":[[2021,11,30]]},"assertion":[{"value":"2021-03-01","order":0,"name":"received","label":"Received","group":{"name
":"publication_history","label":"Publication History"}},{"value":"2022-02-01","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-05-17","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}