{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T13:16:33Z","timestamp":1767964593592,"version":"3.49.0"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,5,5]],"date-time":"2021-05-05T00:00:00Z","timestamp":1620172800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2018YFC0830703"],"award-info":[{"award-number":["2018YFC0830703"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61872370, 61832017, 61872338"],"award-info":[{"award-number":["61872370, 61832017, 61872338"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Beijing Outstanding Young Scientist Program","award":["BJJWZYJH012019100020098"],"award-info":[{"award-number":["BJJWZYJH012019100020098"]}]},{"name":"Beijing Academy of Artificial Intelligence","award":["BAAI2019ZD0305"],"award-info":[{"award-number":["BAAI2019ZD0305"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2021,7,26]]},"abstract":"<jats:p>\n            Personalized search is a promising way to improve search qualities by taking user interests into consideration. Recently, machine learning and deep learning techniques have been successfully applied to search result personalization. Most existing models simply regard the personal search history as a static set of user behaviors and learn fixed ranking strategies based on all the recorded data. Though improvements have been achieved, the essence that the search process is a sequence of interactions between the search engine and user is ignored. The user\u2019s interests may dynamically change during the search process, therefore, it would be more helpful if a personalized search model could track the whole interaction process and adjust its ranking strategy continuously. In this article, we adapt reinforcement learning to personalized search and propose a framework, referred to as RLPS. It utilizes a\n            <jats:bold>Markov Decision Process<\/jats:bold>\n            (\n            <jats:bold>MDP<\/jats:bold>\n            ) to track sequential interactions between the user and search engine, and continuously update the underlying personalized ranking model with the user\u2019s real-time feedback to learn the user\u2019s dynamic interests. Within this framework, we implement two models: the listwise RLPS-L and the hierarchical RLPS-H. RLPS-L interacts with users and trains the ranking model with document lists, while RLPS-H improves model training by designing a layered structure and introducing document pairs. In addition, we also design a feedback-aware personalized ranking component to capture the user\u2019s feedback, which impacts the user interest profile for the next query. Significant improvements over existing personalized search models are observed in the experiments on the public AOL search log and a commercial log.\n          <\/jats:p>","DOI":"10.1145\/3446617","type":"journal-article","created":{"date-parts":[[2021,5,6]],"date-time":"2021-05-06T04:14:52Z","timestamp":1620274492000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["RLPS: A Reinforcement Learning\u2013Based Framework for Personalized Search"],"prefix":"10.1145","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0527-6095","authenticated-orcid":false,"given":"Jing","family":"Yao","sequence":"first","affiliation":[{"name":"School of Information, Renmin University of China, Beijing, P. R China"}]},{"given":"Zhicheng","family":"Dou","sequence":"additional","affiliation":[{"name":"Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, P. R China"}]},{"given":"Jun","family":"Xu","sequence":"additional","affiliation":[{"name":"Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, P. R China"}]},{"given":"Ji-Rong","family":"Wen","sequence":"additional","affiliation":[{"name":"Beijing Key Laboratory of Big Data Management and Analysis Methods, Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education of the People\u2019s Republic of China"}]}],"member":"320","published-online":{"date-parts":[[2021,5,5]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918)","author":"Ahmad Wasi Uddin","year":"2018"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3331184.3331246"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2009916.2009938"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the 19th International Conference on World Wide Web (WWW\u201910)","author":"Bennett Paul N."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2348283.2348312"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the 22nd International Conference on Machine Learning (ICML\u201905)","author":"Burges Christopher J. C."},{"key":"e_1_2_1_7_1","unstructured":"Chris J. C. Burges Krysta M. Svore Qiang Wu and Jianfeng Gao. 2008. Ranking Boosting and Model Adaptation. Technical Report MSR-TR-2008-109. 18 pages.  Chris J. C. Burges Krysta M. Svore Qiang Wu and Jianfeng Gao. 2008. Ranking Boosting and Model Adaptation. Technical Report MSR-TR-2008-109. 18 pages."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2600428.2609453"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273513"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1871437.1871745"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1390334.1390446"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM\u201911)","author":"Collins-Thompson Kevyn"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1341531.1341545"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3159652.3159659"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242651"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the 10th International World Wide Web Conference (WWW\u201910)","author":"Dwork Cynthia"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3269206.3271728"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2449396.2449413"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2505642"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3269206.3271808"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1076034.1076063"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1458082.1458176"},{"key":"e_1_2_1_23_1","volume-title":"Learning to Rank for Information Retrieval","author":"Liu Tie-Yan"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3331184.3331218"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1935826.1935840"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1146847.1146848"},{"key":"e_1_2_1_27_1","volume-title":"Manning","author":"Pennington Jeffrey","year":"2014"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390255"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS\u201911)","author":"Ross St\u00e9phane","year":"2011"},{"key":"e_1_2_1_30_1","volume-title":"An MDP-based recommender system. CoRR abs\/1301.0600","author":"Shani Guy","year":"2013"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM\u201907)","author":"Sieg Ahu"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2556195.2556234"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2806416.2806493"},{"key":"e_1_2_1_34_1","volume-title":"Barto","author":"Sutton Richard S.","year":"1998"},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of SIGIR","author":"Teevan Jaime","year":"2008"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1935826.1935848"},{"key":"e_1_2_1_37_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N. Gomez Lukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30.  Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N. Gomez Lukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30."},{"key":"e_1_2_1_38_1","volume-title":"Mark Johnson, Dawei Song, and Alistair Willis.","author":"Vu Thanh","year":"2017"},{"key":"e_1_2_1_39_1","volume-title":"Son Ngoc Tran, and Dawei Song","author":"Vu Thanh Tien","year":"2015"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2484028.2484068"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219961"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080685"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2488388.2488511"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/bf00992696"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080775"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080809"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2747874"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3234944.3234977"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240323.3240374"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219886"},{"key":"e_1_2_1_51_1","volume-title":"Deep reinforcement learning for list-wise recommendations. CoRR abs\/1801.00209","author":"Zhao Xiangyu","year":"2018"}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3446617","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3446617","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:47:31Z","timestamp":1750193251000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3446617"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,5]]},"references-count":51,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,7,26]]}},"alternative-id":["10.1145\/3446617"],"URL":"https:\/\/doi.org\/10.1145\/3446617","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"value":"1046-8188","type":"print"},{"value":"1558-2868","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,5,5]]},"assertion":[{"value":"2020-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-05-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}