{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T15:02:07Z","timestamp":1775228527564,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":42,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,8,20]],"date-time":"2020-08-20T00:00:00Z","timestamp":1597881600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science Foundation","award":["IIS1907704, IIS1928278, IIS1714741, IIS1715940, IIS1845081, CNS1815636"],"award-info":[{"award-number":["IIS1907704, IIS1928278, IIS1714741, IIS1715940, IIS1845081, CNS1815636"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,8,23]]},"DOI":"10.1145\/3394486.3403384","type":"proceedings-article","created":{"date-parts":[[2020,8,20]],"date-time":"2020-08-20T23:03:59Z","timestamp":1597964639000},"page":"3319-3327","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":58,"title":["Jointly Learning to Recommend and Advertise"],"prefix":"10.1145","author":[{"given":"Xiangyu","family":"Zhao","sequence":"first","affiliation":[{"name":"Michigan State University, East Lansing, MI, USA"}]},{"given":"Xudong","family":"Zheng","sequence":"additional","affiliation":[{"name":"Bytedance, Beijing, China"}]},{"given":"Xiwang","family":"Yang","sequence":"additional","affiliation":[{"name":"Bytedance, Beijing, China"}]},{"given":"Xiaobing","family":"Liu","sequence":"additional","affiliation":[{"name":"Bytedance, Beijing, China"}]},{"given":"Jiliang","family":"Tang","sequence":"additional","affiliation":[{"name":"Michigan State University, East Lansing, MI, 
USA"}]}],"member":"320","published-online":{"date-parts":[[2020,8,20]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Dynamic programming","author":"Bellman Richard","unstructured":"Richard Bellman. 2013. Dynamic programming. Courier Corporation."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3018661.3018702"},{"key":"e_1_3_2_1_3_1","volume-title":"Large-scale Interactive Recommendation with Tree-structured Policy Gradient. arXiv preprint arXiv:1811.05869","author":"Chen Haokun","year":"2018","unstructured":"Haokun Chen, Xinyi Dai, Han Cai, Weinan Zhang, Xuejian Wang, Ruiming Tang, Yuzhou Zhang, and Yong Yu. 2018b. Large-scale Interactive Recommendation with Tree-structured Policy Gradient. arXiv preprint arXiv:1811.05869 (2018)."},{"key":"e_1_3_2_1_4_1","volume-title":"Top-K Off-Policy Correction for a REINFORCE Recommender System. arXiv preprint arXiv:1812.02353","author":"Chen Minmin","year":"2018","unstructured":"Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, and Ed Chi. 2018a. Top-K Off-Policy Correction for a REINFORCE Recommender System. arXiv preprint arXiv:1812.02353 (2018)."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3220122"},{"key":"e_1_3_2_1_6_1","volume-title":"Generative Adversarial User Model for Reinforcement Learning Based Recommendation System. In International Conference on Machine Learning. 1052--1061","author":"Chen Xinshi","year":"2019","unstructured":"Xinshi Chen, Shuang Li, Hui Li, Shaohua Jiang, Yuan Qi, and Le Song. 2019. 
Generative Adversarial User Model for Reinforcement Learning Based Recommendation System. In International Conference on Machine Learning. 1052--1061."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2988450.2988454"},{"key":"e_1_3_2_1_8_1","volume-title":"Reinforcement Learning based Recommender System using Biclustering Technique. arXiv preprint arXiv:1801.05532","author":"Choi Sungwoon","year":"2018","unstructured":"Sungwoon Choi, Heonseok Ha, Uiwon Hwang, Chanju Kim, Jung-Woo Ha, and Sungroh Yoon. 2018. Reinforcement Learning based Recommender System using Biclustering Technique. arXiv preprint arXiv:1801.05532 (2018)."},{"key":"e_1_3_2_1_9_1","volume-title":"Off-policy actor-critic. arXiv preprint arXiv:1205.4839","author":"Degris Thomas","year":"2012","unstructured":"Thomas Degris, Martha White, and Richard S Sutton. 2012. Off-policy actor-critic. arXiv preprint arXiv:1205.4839 (2012)."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/2891460.2891493"},{"key":"e_1_3_2_1_11_1","volume-title":"Deep reinforcement learning in large discrete action spaces. arXiv preprint arXiv:1512.07679","author":"Dulac-Arnold Gabriel","year":"2015","unstructured":"Gabriel Dulac-Arnold, Richard Evans, Hado van Hasselt, Peter Sunehag, Timothy Lillicrap, Jonathan Hunt, Timothy Mann, Theophane Weber, Thomas Degris, and Ben Coppin. 2015. 
Deep reinforcement learning in large discrete action spaces. arXiv preprint arXiv:1512.07679 (2015)."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178876.3186165"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2018.8489092"},{"key":"e_1_3_2_1_14_1","unstructured":"Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: a factorization-machine based neural network for CTR prediction. arXiv preprint arXiv:1703.04247 (2017)."},{"key":"e_1_3_2_1_15_1","volume-title":"2015 AAAI Fall Symposium Series.","author":"Hausknecht Matthew","year":"2015","unstructured":"Matthew Hausknecht and Peter Stone. 2015. Deep recurrent q-learning for partially observable mdps. In 2015 AAAI Fall Symposium Series."},{"key":"e_1_3_2_1_16_1","volume-title":"Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939","author":"Hidasi Bal\u00e1zs","year":"2015","unstructured":"Bal\u00e1zs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2015. Session-based recommendations with recurrent neural networks. 
arXiv preprint arXiv:1511.06939 (2015)."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3269206.3272021"},{"key":"e_1_3_2_1_18_1","volume-title":"Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602","author":"Mnih Volodymyr","year":"2013","unstructured":"Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11888"},{"key":"e_1_3_2_1_20_1","volume-title":"RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising. arXiv preprint arXiv:1808.00720","author":"Rohde David","year":"2018","unstructured":"David Rohde, Stephen Bonner, Travis Dunlop, Flavian Vasile, and Alexandros Karatzoglou. 2018. RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising. 
arXiv preprint arXiv:1808.00720 (2018)."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1287\/opre.1070.0384"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2396761.2398561"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1287\/mksc.2016.1023"},{"key":"e_1_3_2_1_24_1","volume-title":"Solving Continual Combinatorial Selection via Deep Reinforcement Learning","author":"Song Hyungseok","year":"2019","unstructured":"Hyungseok Song, Hyeryung Jang, Hai Tran Hong, Seeun Yun, Donggyu Yun, Hyoju Chung, and Yung Yi. 2019. Solving Continual Combinatorial Selection via Deep Reinforcement Learning. (2019)."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2514700"},{"key":"e_1_3_2_1_26_1","volume-title":"Learning to Advertise with Adaptive Exposure via Constrained Two-Level Reinforcement Learning. arXiv preprint arXiv:1809.03149","author":"Wang Weixun","year":"2018","unstructured":"Weixun Wang, Junqi Jin, Jianye Hao, Chunjie Chen, Chuan Yu, Weinan Zhang, Jun Wang, Yixi Wang, Han Li, Jian Xu, et al. 2018b. Learning to Advertise with Adaptive Exposure via Constrained Two-Level Reinforcement Learning. arXiv preprint arXiv:1809.03149 (2018)."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2018.00074"},{"key":"e_1_3_2_1_28_1","volume-title":"Dueling Network Architectures for Deep Reinforcement Learning","author":"Wang Ziyu","year":"2015","unstructured":"Ziyu Wang, Nando De Freitas, and Marc Lanctot. 2015. 
Dueling Network Architectures for Deep Reinforcement Learning. (2015)."},{"key":"e_1_3_2_1_29_1","volume-title":"A Multi-Agent Reinforcement Learning Method for Impression Allocation in Online Display Advertising. arXiv preprint arXiv:1809.03152","author":"Wu Di","year":"2018","unstructured":"Di Wu, Cheng Chen, Xun Yang, Xiujun Chen, Qing Tan, Jian Xu, and Kun Gai. 2018a. A Multi-Agent Reinforcement Learning Method for Impression Allocation in Online Display Advertising. arXiv preprint arXiv:1809.03152 (2018)."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3269206.3271748"},{"key":"e_1_3_2_1_31_1","unstructured":"Min Xu, Tao Qin, and Tie-Yan Liu. 2013. Estimation bias in multi-armed bandit algorithms for search advertising. In Advances in Neural Information Processing Systems. 2400--2408."},{"key":"e_1_3_2_1_32_1","volume-title":"Adaptive keywords extraction with contextual bandits for advertising on parked domains. arXiv preprint arXiv:1307.3573","author":"Yuan Shuai","year":"2013","unstructured":"Shuai Yuan, Jun Wang, and Maurice van der Meer. 2013. Adaptive keywords extraction with contextual bandits for advertising on parked domains. arXiv preprint arXiv:1307.3573 (2013)."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401467"},{"key":"e_1_3_2_1_34_1","volume-title":"
Deep Reinforcement Learning for Online Advertising in Recommender Systems. arXiv preprint arXiv:1909.03602","author":"Zhao Xiangyu","year":"2019","unstructured":"Xiangyu Zhao, Changsheng Gu, Haoshenglun Zhang, Xiaobing Liu, Xiwang Yang, and Jiliang Tang. 2019a. Deep Reinforcement Learning for Online Advertising in Recommender Systems. arXiv preprint arXiv:1909.03602 (2019)."},{"key":"e_1_3_2_1_35_1","volume-title":"Toward Simulating Environments in Reinforcement Learning Based Recommendations. arXiv preprint arXiv:1906.11462","author":"Zhao Xiangyu","year":"2019","unstructured":"Xiangyu Zhao, Long Xia, Zhuoye Ding, Dawei Yin, and Jiliang Tang. 2019b. Toward Simulating Environments in Reinforcement Learning Based Recommendations. arXiv preprint arXiv:1906.11462 (2019)."},{"key":"e_1_3_2_1_36_1","volume-title":"ACM SIGWEB Newsletter Spring","author":"Zhao Xiangyu","year":"2019","unstructured":"Xiangyu Zhao, Long Xia, Jiliang Tang, and Dawei Yin. 2019c. Deep reinforcement learning for search, recommendation, and online advertising: a survey by Xiangyu Zhao, Long Xia, Jiliang Tang, and Dawei Yin with Martin Vesely as coordinator. 
ACM SIGWEB Newsletter Spring (2019), 4."},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240323.3240374"},{"key":"e_1_3_2_1_38_1","volume-title":"Model-Based Reinforcement Learning for Whole-Chain Recommendations. arXiv preprint arXiv:1902.03987","author":"Zhao Xiangyu","year":"2019","unstructured":"Xiangyu Zhao, Long Xia, Yihong Zhao, Dawei Yin, and Jiliang Tang. 2019d. Model-Based Reinforcement Learning for Whole-Chain Recommendations. arXiv preprint arXiv:1902.03987 (2019)."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219886"},{"key":"e_1_3_2_1_40_1","volume-title":"Deep Reinforcement Learning for List-wise Recommendations. arXiv preprint arXiv:1801.00209","author":"Zhao Xiangyu","year":"2017","unstructured":"Xiangyu Zhao, Liang Zhang, Zhuoye Ding, Dawei Yin, Yihong Zhao, and Jiliang Tang. 2017. Deep Reinforcement Learning for List-wise Recommendations. 
arXiv preprint arXiv:1801.00209 (2017)."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178876.3185994"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401181"}],"event":{"name":"KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","location":"Virtual Event CA USA","acronym":"KDD '20","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data"]},"container-title":["Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394486.3403384","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3394486.3403384","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3394486.3403384","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:31:30Z","timestamp":1750195890000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394486.3403384"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,8,20]]},"references-count":42,"alternative-id":["10.1145\/3394486.3403384","10.1145\/3394486"],"URL":"https:\/\/doi.org\/10.1145\/3394486.3403384","relation":{},"subject":[],"published":{"date-parts":[[2020,8,20]]},"assertion":[{"value":"2020-08-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}