{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T19:10:46Z","timestamp":1754161846997,"version":"3.41.2"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Recomm. Syst."],"published-print":{"date-parts":[[2026,3,31]]},"abstract":"<jats:p>Commodity recommendation contributes an important part of individuals\u2019 daily life. In this context, deep reinforcement learning methods have demonstrated substantial efficacy in enhancing recommender systems\u2019 performance. Nevertheless, several recommender systems directly utilize original feature information as a foundational element for decision-making, which seems simplistic and low efficient. Furthermore, the incorporation of sequential decision-making adds complexity to the task of recommendation.<\/jats:p>\n          <jats:p>In pursuit of maximizing the long-term sequential returns of recommender systems, our study introduces a novel architecture, named User Information Separating Architecture (UISA). This framework is tailored to align with classic reinforcement learning algorithms and aims to extract the user\u2019s interest value through the discrete processing of both static and dynamic user information. Through integration with deep reinforcement learning, the architecture is oriented towards the maximization of long-term profit and is applicable in sequential recommendation scenarios. We conduct experimental assessments by combining the proposed architecture with proximal policy optimization (PPO) and deep deterministic policy gradient (DDPG) algorithms. The outcomes illustrate marked improvements in commodity recommendation, showcasing enhancements ranging from approximately 5% to 40% in both reward and click-through rate metrics across a self-constructed JDEnv environment and the Virtual Taobao environment. Through comparison experiments, the UISA models demonstrate comparable performance.<\/jats:p>","DOI":"10.1145\/3654806","type":"journal-article","created":{"date-parts":[[2024,4,6]],"date-time":"2024-04-06T06:20:02Z","timestamp":1712384402000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["UISA: User Information Separating Architecture for Commodity Recommendation Policy with Deep Reinforcement Learning"],"prefix":"10.1145","volume":"4","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3392-7374","authenticated-orcid":false,"given":"Aobo","family":"Xu","sequence":"first","affiliation":[{"name":"China University of Petroleum East China","place":["Qingdao, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9385-5977","authenticated-orcid":false,"given":"Ling","family":"Jian","sequence":"additional","affiliation":[{"name":"China University of Petroleum East China","place":["Qingdao, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-9476-9336","authenticated-orcid":false,"given":"Yue","family":"Yin","sequence":"additional","affiliation":[{"name":"China University of Petroleum East China","place":["Qingdao, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-7158-7825","authenticated-orcid":false,"given":"Na","family":"Zhang","sequence":"additional","affiliation":[{"name":"China University of Petroleum East China","place":["Qingdao, China"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,7,29]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289600.3290999"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3488560.3501396"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3523227.3551485"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2022.118459"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3383313.3412252"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSC.2019.2918310"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2505665"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/CICT.2015.20"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3132847.3132926"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3595384"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3411919"},{"key":"e_1_3_2_13_2","article-title":"A first look at LLM-powered generative news recommendation","author":"Liu Qijiong","year":"2023","unstructured":"Qijiong Liu, Nuo Chen, Tetsuya Sakai, and Xiao-Ming Wu. 2023. A first look at LLM-powered generative news recommendation. arXiv preprint arXiv:2305.06566 (2023).","journal-title":"arXiv preprint arXiv:2305.06566"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/BigData52589.2021.9671558"},{"key":"e_1_3_2_15_2","article-title":"LLM-Rec: Personalized recommendation via prompting large language models","author":"Lyu Hanjia","year":"2023","unstructured":"Hanjia Lyu, Song Jiang, Hanqing Zeng, Yinglong Xia, and Jiebo Luo. 2023. LLM-Rec: Personalized recommendation via prompting large language models. arXiv preprint arXiv:2307.15780 (2023).","journal-title":"arXiv preprint arXiv:2307.15780"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403278"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-023-06004-9"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401098"},{"key":"e_1_3_2_19_2","article-title":"BERTERS: Multimodal representation learning for expert recommendation system with transformer","author":"Nikzad-Khasmakhia N.","year":"2020","unstructured":"N. Nikzad-Khasmakhia, M. Reza Feizi-Derakhshia, and Cina Motamedb. 2020. BERTERS: Multimodal representation learning for expert recommendation system with transformer. arXiv preprint arXiv:2007.07229 (2020).","journal-title":"arXiv preprint arXiv:2007.07229"},{"key":"e_1_3_2_20_2","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume":"35","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, and Peter Welinder. 2022. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems 35 (2022), 27730\u201327744.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/371920.372071"},{"key":"e_1_3_2_22_2","first-page":"27","volume-title":"Fifth International Conference on Computer and Information Science","volume":"1","author":"Sarwar Badrul","year":"2002","unstructured":"Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2002. Incremental singular value decomposition algorithms for highly scalable recommender systems. In Fifth International Conference on Computer and Information Science, Vol. 1. Citeseer, 27\u20138."},{"key":"e_1_3_2_23_2","article-title":"Proximal policy optimization algorithms","author":"Schulman John","year":"2017","unstructured":"John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. CoRR, abs\/1707.06347 (2017).","journal-title":"CoRR"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33014902"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature16961"},{"key":"e_1_3_2_26_2","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton Richard S.","year":"2018","unstructured":"Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. MIT Press."},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2021.107217"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/SMC52423.2021.9658757"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-019-1724-z"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10458-007-9021-x"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSS.2021.3064213"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/1148170.1148257"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401134"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11851"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2019\/883"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3532192"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2018.00074"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/SCC.2019.00042"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN54540.2023.10192007"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i01.5360"},{"key":"e_1_3_2_41_2","article-title":"A reusable model-agnostic framework for faithfully explainable recommendation and system scrutability","author":"Xu Zhichao","year":"2023","unstructured":"Zhichao Xu, Hansi Zeng, Juntao Tan, Zuohui Fu, Yongfeng Zhang, and Qingyao Ai. 2023. A reusable model-agnostic framework for faithfully explainable recommendation and system scrutability. ACM Transactions on Information Systems (2023).","journal-title":"ACM Transactions on Information Systems"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3298689.3346996"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219890"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3556536"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3158369"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3442381.3450125"},{"key":"e_1_3_2_47_2","article-title":"Toward simulating environments in reinforcement learning based recommendations","author":"Zhao Xiangyu","year":"2019","unstructured":"Xiangyu Zhao, Long Xia, Lixin Zou, Dawei Yin, and Jiliang Tang. 2019. Toward simulating environments in reinforcement learning based recommendations. arXiv preprint arXiv:1906.11462 (2019).","journal-title":"arXiv preprint arXiv:1906.11462"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/WKDD.2010.54"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3178876.3185994"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/BigData52589.2021.9671593"}],"container-title":["ACM Transactions on Recommender Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3654806","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,29]],"date-time":"2025-07-29T12:47:20Z","timestamp":1753793240000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3654806"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,29]]},"references-count":49,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,3,31]]}},"alternative-id":["10.1145\/3654806"],"URL":"https:\/\/doi.org\/10.1145\/3654806","relation":{},"ISSN":["2770-6699"],"issn-type":[{"type":"electronic","value":"2770-6699"}],"subject":[],"published":{"date-parts":[[2025,7,29]]},"assertion":[{"value":"2023-10-16","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-05","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-29","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}