{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T15:05:46Z","timestamp":1764687946762,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":84,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,9,13]],"date-time":"2021-09-13T00:00:00Z","timestamp":1631491200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100002913","name":"Vlaamse Overheid","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100002913","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,9,13]]},"DOI":"10.1145\/3460231.3474247","type":"proceedings-article","created":{"date-parts":[[2021,9,13]],"date-time":"2021-09-13T21:45:02Z","timestamp":1631569502000},"page":"63-74","source":"Crossref","is-referenced-by-count":30,"title":["Pessimistic Reward Models for Off-Policy Learning in Recommendation"],"prefix":"10.1145","author":[{"given":"Olivier","family":"Jeunen","sequence":"first","affiliation":[{"name":"University of Antwerp, Belgium"}]},{"given":"Bart","family":"Goethals","sequence":"additional","affiliation":[{"name":"University of Antwerp, Belgium"}]}],"member":"320","published-online":{"date-parts":[[2021,9,13]]},"reference":[{"volume-title":"Proc. of the 23rd ACM SIGKDD International Conference on Knowledge Discovery & Data Mining(KDD \u201917)","author":"Agarwal A.","key":"e_1_3_2_2_1_1","unstructured":"A. Agarwal , S. Basu , T. Schnabel , and T. Joachims . 2017. Effective Evaluation Using Logged Bandit Feedback from Multiple Loggers . In Proc. of the 23rd ACM SIGKDD International Conference on Knowledge Discovery & Data Mining(KDD \u201917) . ACM, 687\u2013696. A. Agarwal, S. Basu, T. Schnabel, and T. Joachims. 2017. 
Effective Evaluation Using Logged Bandit Feedback from Multiple Loggers. In Proc. of the 23rd ACM SIGKDD International Conference on Knowledge Discovery & Data Mining(KDD \u201917). ACM, 687\u2013696."},{"volume-title":"Proc. of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval(SIGIR\u201919)","author":"Agarwal A.","key":"e_1_3_2_2_2_1","unstructured":"A. Agarwal , K. Takatsu , I. Zaitsev , and T. Joachims . 2019. A General Framework for Counterfactual Learning-to-Rank . In Proc. of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval(SIGIR\u201919) . ACM, 5\u201314. A. Agarwal, K. Takatsu, I. Zaitsev, and T. Joachims. 2019. A General Framework for Counterfactual Learning-to-Rank. In Proc. of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval(SIGIR\u201919). ACM, 5\u201314."},{"volume-title":"Proc. of the 2019 World Wide Web Conference(WWW \u201919)","author":"Agarwal A.","key":"e_1_3_2_2_3_1","unstructured":"A. Agarwal , X. Wang , C. Li , M. Bendersky , and M. Najork . 2019. Addressing Trust Bias for Unbiased Learning-to-Rank . In Proc. of the 2019 World Wide Web Conference(WWW \u201919) . ACM, 4\u201314. A. Agarwal, X. Wang, C. Li, M. Bendersky, and M. Najork. 2019. Addressing Trust Bias for Unbiased Learning-to-Rank. In Proc. of the 2019 World Wide Web Conference(WWW \u201919). ACM, 4\u201314."},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"crossref","unstructured":"J.\u00a0O. Berger and R.\u00a0L. Wolpert. 1988. The Likelihood Principle. IMS. J.\u00a0O. Berger and R.\u00a0L. Wolpert. 1988. The Likelihood Principle. IMS.","DOI":"10.1214\/lnms\/1215466210"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/2567709.2567766"},{"volume-title":"Proc. of the 12th ACM Conference on Recommender Systems(RecSys \u201918)","author":"Chaney A.","key":"e_1_3_2_2_6_1","unstructured":"A. Chaney , B. Stewart , and B. 
Engelhardt . 2018. How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases Utility . In Proc. of the 12th ACM Conference on Recommender Systems(RecSys \u201918) . ACM, 224\u2013232. A. Chaney, B. Stewart, and B. Engelhardt. 2018. How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases Utility. In Proc. of the 12th ACM Conference on Recommender Systems(RecSys \u201918). ACM, 224\u2013232."},{"volume-title":"Proc. of the 24th International Conference on Neural Information Processing Systems(NIPS\u201911)","author":"Chapelle O.","key":"e_1_3_2_2_7_1","unstructured":"O. Chapelle and L. Li . 2011. An Empirical Evaluation of Thompson Sampling . In Proc. of the 24th International Conference on Neural Information Processing Systems(NIPS\u201911) . 2249\u20132257. O. Chapelle and L. Li. 2011. An Empirical Evaluation of Thompson Sampling. In Proc. of the 24th International Conference on Neural Information Processing Systems(NIPS\u201911). 2249\u20132257."},{"key":"e_1_3_2_2_8_1","volume-title":"Proc. of the 12th ACM International Conference on Web Search and Data Mining(WSDM \u201919)","author":"Chen M.","year":"2019","unstructured":"M. Chen , A. Beutel , P. Covington , S. Jain , F. Belletti , and E.\u00a0 H. Chi . 2019 . Top-K Off-Policy Correction for a REINFORCE Recommender System . In Proc. of the 12th ACM International Conference on Web Search and Data Mining(WSDM \u201919) . ACM, 456\u2013464. M. Chen, A. Beutel, P. Covington, S. Jain, F. Belletti, and E.\u00a0H. Chi. 2019. Top-K Off-Policy Correction for a REINFORCE Recommender System. In Proc. of the 12th ACM International Conference on Web Search and Data Mining(WSDM \u201919). ACM, 456\u2013464."},{"key":"e_1_3_2_2_9_1","article-title":"Block-Aware Item Similarity Models for Top-N Recommendation","volume":"38","author":"Chen Y.","year":"2020","unstructured":"Y. Chen , Y. Wang , X. Zhao , J. Zou , and M. de Rijke . 2020 . 
Block-Aware Item Similarity Models for Top-N Recommendation . ACM Trans. Inf. Syst. 38 , 4, Article 42 (Sept. 2020), 26\u00a0pages. Y. Chen, Y. Wang, X. Zhao, J. Zou, and M. de Rijke. 2020. Block-Aware Item Similarity Models for Top-N Recommendation. ACM Trans. Inf. Syst. 38, 4, Article 42 (Sept. 2020), 26\u00a0pages.","journal-title":"ACM Trans. Inf. Syst."},{"volume-title":"Proc. of the 14th ACM International Conference on Web Search and Data Mining(WSDM \u201921)","author":"Chen Z.","key":"e_1_3_2_2_10_1","unstructured":"Z. Chen , Y. Wang , D. Lin , D.\u00a0 Z. Cheng , L. Hong , E.\u00a0 H. Chi , and C. Cui . 2021. Beyond Point Estimate: Inferring Ensemble Prediction Variation from Neuron Activation Strength in Recommender Systems . In Proc. of the 14th ACM International Conference on Web Search and Data Mining(WSDM \u201921) . ACM, 76\u201384. Z. Chen, Y. Wang, D. Lin, D.\u00a0Z. Cheng, L. Hong, E.\u00a0H. Chi, and C. Cui. 2021. Beyond Point Estimate: Inferring Ensemble Prediction Variation from Neuron Activation Strength in Recommender Systems. In Proc. of the 14th ACM International Conference on Web Search and Data Mining(WSDM \u201921). ACM, 76\u201384."},{"volume-title":"Proc. of the 2021 World Wide Web Conference(WWW \u201921)","author":"Choi M.","key":"e_1_3_2_2_11_1","unstructured":"M. Choi , J. Kim , J. Lee , H. Shim , and J. Lee . 2021. Session-aware Linear Item-Item Models for Session-based Recommendation . In Proc. of the 2021 World Wide Web Conference(WWW \u201921) . M. Choi, J. Kim, J. Lee, H. Shim, and J. Lee. 2021. Session-aware Linear Item-Item Models for Session-based Recommendation. In Proc. of the 2021 World Wide Web Conference(WWW \u201921)."},{"volume-title":"Proc. of the 13th ACM Conference on Recommender Systems(RecSys \u201919)","author":"Dacrema F.","key":"e_1_3_2_2_12_1","unstructured":"M.\u00a0 F. Dacrema , P. Cremonesi , and D. Jannach . 2019. Are We Really Making Much Progress? 
A Worrying Analysis of Recent Neural Recommendation Approaches . In Proc. of the 13th ACM Conference on Recommender Systems(RecSys \u201919) . ACM, 101\u2013109. M.\u00a0F. Dacrema, P. Cremonesi, and D. Jannach. 2019. Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches. In Proc. of the 13th ACM Conference on Recommender Systems(RecSys \u201919). ACM, 101\u2013109."},{"volume-title":"Proc. of the 28th International Conference on International Conference on Machine Learning(ICML\u201911)","author":"Dud\u00edk M.","key":"e_1_3_2_2_13_1","unstructured":"M. Dud\u00edk , J. Langford , and L. Li . 2011. Doubly Robust Policy Evaluation and Learning . In Proc. of the 28th International Conference on International Conference on Machine Learning(ICML\u201911) . 1097\u20131104. M. Dud\u00edk, J. Langford, and L. Li. 2011. Doubly Robust Policy Evaluation and Learning. In Proc. of the 28th International Conference on International Conference on Machine Learning(ICML\u201911). 1097\u20131104."},{"key":"e_1_3_2_2_14_1","volume-title":"Proc. of the 32nd International Conference on Neural Information Processing Systems(NIPS\u201918)","author":"Dumitrascu B.","year":"2018","unstructured":"B. Dumitrascu , K. Feng , and B.\u00a0 E. Engelhardt . 2018 . PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits . In Proc. of the 32nd International Conference on Neural Information Processing Systems(NIPS\u201918) . 4629\u20134638. B. Dumitrascu, K. Feng, and B.\u00a0E. Engelhardt. 2018. PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits. In Proc. of the 32nd International Conference on Neural Information Processing Systems(NIPS\u201918). 4629\u20134638."},{"volume-title":"An introduction to the bootstrap","author":"Efron B.","key":"e_1_3_2_2_15_1","unstructured":"B. Efron and R.\u00a0 J. Tibshirani . 1994. An introduction to the bootstrap . CRC press . B. Efron and R.\u00a0J. Tibshirani. 1994. 
An introduction to the bootstrap. CRC press."},{"volume-title":"Proc. of the 13th ACM Conference on Recommender Systems(RecSys \u201919)","author":"Elahi E.","key":"e_1_3_2_2_16_1","unstructured":"E. Elahi , W. Wang , D. Ray , A. Fenton , and T. Jebara . 2019. Variational Low Rank Multinomials for Collaborative Filtering with Side-information . In Proc. of the 13th ACM Conference on Recommender Systems(RecSys \u201919) . ACM, 340\u2013347. E. Elahi, W. Wang, D. Ray, A. Fenton, and T. Jebara. 2019. Variational Low Rank Multinomials for Collaborative Filtering with Side-information. In Proc. of the 13th ACM Conference on Recommender Systems(RecSys \u201919). ACM, 340\u2013347."},{"key":"e_1_3_2_2_17_1","volume-title":"Generalized Multiple Importance Sampling. Statist. Sci. 34, 1 (02","author":"Elvira V.","year":"2019","unstructured":"V. Elvira , L. Martino , D. Luengo , and M.\u00a0 F. Bugallo . 2019. Generalized Multiple Importance Sampling. Statist. Sci. 34, 1 (02 2019 ), 129\u2013155. V. Elvira, L. Martino, D. Luengo, and M.\u00a0F. Bugallo. 2019. Generalized Multiple Importance Sampling. Statist. Sci. 34, 1 (02 2019), 129\u2013155."},{"volume-title":"Proc. of the 35th International Conference on Machine Learning(ICML\u201918","author":"Farajtabar M.","key":"e_1_3_2_2_18_1","unstructured":"M. Farajtabar , Y. Chow , and M. Ghavamzadeh . 2018. More Robust Doubly Robust Off-policy Evaluation . In Proc. of the 35th International Conference on Machine Learning(ICML\u201918 , Vol.\u00a080). PMLR, 1447\u20131456. M. Farajtabar, Y. Chow, and M. Ghavamzadeh. 2018. More Robust Doubly Robust Off-policy Evaluation. In Proc. of the 35th International Conference on Machine Learning(ICML\u201918, Vol.\u00a080). PMLR, 1447\u20131456."},{"volume-title":"Proc. of the 34th AAAI Conference on Artificial Intelligence(AAAI\u201920)","author":"Faury L.","key":"e_1_3_2_2_19_1","unstructured":"L. Faury , U. Tanielian , F. Vasile , E. Smirnova , and E. Dohmatob . 2020. 
Distributionally Robust Counterfactual Risk Minimization . In Proc. of the 34th AAAI Conference on Artificial Intelligence(AAAI\u201920) . AAAI Press. L. Faury, U. Tanielian, F. Vasile, E. Smirnova, and E. Dohmatob. 2020. Distributionally Robust Counterfactual Risk Minimization. In Proc. of the 34th AAAI Conference on Artificial Intelligence(AAAI\u201920). AAAI Press."},{"volume-title":"Proc. of The 33rd International Conference on Machine Learning(ICML \u201916)","author":"Gal Y.","key":"e_1_3_2_2_20_1","unstructured":"Y. Gal and Z. Ghahramani . 2016. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning . In Proc. of The 33rd International Conference on Machine Learning(ICML \u201916) . PMLR, 1050\u20131059. Y. Gal and Z. Ghahramani. 2016. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proc. of The 33rd International Conference on Machine Learning(ICML \u201916). PMLR, 1050\u20131059."},{"volume-title":"Proc. of the 8th ACM Conference on Recommender Systems(RecSys \u201914)","author":"Garcin F.","key":"e_1_3_2_2_21_1","unstructured":"F. Garcin , B. Faltings , O. Donatsch , A. Alazzawi , C. Bruttin , and A. Huber . 2014. Offline and Online Evaluation of News Recommender Systems at Swissinfo.Ch . In Proc. of the 8th ACM Conference on Recommender Systems(RecSys \u201914) . 169\u2013176. F. Garcin, B. Faltings, O. Donatsch, A. Alazzawi, C. Bruttin, and A. Huber. 2014. Offline and Online Evaluation of News Recommender Systems at Swissinfo.Ch. In Proc. of the 8th ACM Conference on Recommender Systems(RecSys \u201914). 169\u2013176."},{"volume-title":"Proc. of the 11th ACM International Conference on Web Search and Data Mining(WSDM \u201918)","author":"Gilotte A.","key":"e_1_3_2_2_22_1","unstructured":"A. Gilotte , C. Calauz\u00e8nes , T. Nedelec , A. Abraham , and S. Doll\u00e9 . 2018. Offline A\/B Testing for Recommender Systems . In Proc. 
of the 11th ACM International Conference on Web Search and Data Mining(WSDM \u201918) . ACM, 198\u2013206. A. Gilotte, C. Calauz\u00e8nes, T. Nedelec, A. Abraham, and S. Doll\u00e9. 2018. Offline A\/B Testing for Recommender Systems. In Proc. of the 11th ACM International Conference on Web Search and Data Mining(WSDM \u201918). ACM, 198\u2013206."},{"volume-title":"Proc. of the 14th ACM Conference on Recommender Systems. ACM, 456\u2013461","author":"Guo D.","key":"e_1_3_2_2_23_1","unstructured":"D. Guo , S.\u00a0 I. Ktena , P.\u00a0 K. Myana , F. Huszar , W. Shi , A. Tejani , M. Kneier , and S. Das . 2020. Deep Bayesian Bandits: Exploring in Online Personalized Recommendations . In Proc. of the 14th ACM Conference on Recommender Systems. ACM, 456\u2013461 . D. Guo, S.\u00a0I. Ktena, P.\u00a0K. Myana, F. Huszar, W. Shi, A. Tejani, M. Kneier, and S. Das. 2020. Deep Bayesian Bandits: Exploring in Online Personalized Recommendations. In Proc. of the 14th ACM Conference on Recommender Systems. ACM, 456\u2013461."},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2648584.2648589"},{"key":"e_1_3_2_2_25_1","volume-title":"Proc. of the 9th International Conference on Learning Representations(ICLR \u201921)","author":"Hui L.","year":"2021","unstructured":"L. Hui and M. Belkin . 2021. Evaluation of Neural Architectures Trained with Square Loss vs Cross-Entropy in Classification Tasks . In Proc. of the 9th International Conference on Learning Representations(ICLR \u201921) . arxiv: 2006 .07322\u00a0[cs.LG] L. Hui and M. Belkin. 2021. Evaluation of Neural Architectures Trained with Square Loss vs Cross-Entropy in Classification Tasks. In Proc. of the 9th International Conference on Learning Representations(ICLR \u201921). arxiv:2006.07322\u00a0[cs.LG]"},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1198\/106186008X320456"},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3298689.3347069"},{"volume-title":"Proc. 
of the ACM RecSys Workshop on Bandit Learning from User Interactions(REVEAL \u201920)","author":"Jeunen O.","key":"e_1_3_2_2_28_1","unstructured":"O. Jeunen and B. Goethals . 2020. An Empirical Evaluation of Doubly Robust Learning for Recommendation . In Proc. of the ACM RecSys Workshop on Bandit Learning from User Interactions(REVEAL \u201920) . O. Jeunen and B. Goethals. 2020. An Empirical Evaluation of Doubly Robust Learning for Recommendation. In Proc. of the ACM RecSys Workshop on Bandit Learning from User Interactions(REVEAL \u201920)."},{"volume-title":"Proc. of the ACM RecSys Workshop on Reinforcement Learning and Robust Estimators for Recommendation(REVEAL \u201919)","author":"Jeunen O.","key":"e_1_3_2_2_29_1","unstructured":"O. Jeunen , D. Mykhaylov , D. Rohde , F. Vasile , A. Gilotte , and M. Bompaire . 2019. Learning from Bandit Feedback: An Overview of the State-of-the-art . In Proc. of the ACM RecSys Workshop on Reinforcement Learning and Robust Estimators for Recommendation(REVEAL \u201919) . O. Jeunen, D. Mykhaylov, D. Rohde, F. Vasile, A. Gilotte, and M. Bompaire. 2019. Learning from Bandit Feedback: An Overview of the State-of-the-art. In Proc. of the ACM RecSys Workshop on Reinforcement Learning and Robust Estimators for Recommendation(REVEAL \u201919)."},{"volume-title":"Proc. of the ACM RecSys Workshop on Reinforcement Learning and Robust Estimators for Recommendation(REVEAL \u201919)","author":"Jeunen O.","key":"e_1_3_2_2_30_1","unstructured":"O. Jeunen , D. Rohde , and F. Vasile . 2019. On the Value of Bandit Feedback for Offline Recommender System Evaluation . In Proc. of the ACM RecSys Workshop on Reinforcement Learning and Robust Estimators for Recommendation(REVEAL \u201919) . O. Jeunen, D. Rohde, and F. Vasile. 2019. On the Value of Bandit Feedback for Offline Recommender System Evaluation. In Proc. of the ACM RecSys Workshop on Reinforcement Learning and Robust Estimators for Recommendation(REVEAL \u201919)."},{"volume-title":"Proc. 
of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining(KDD \u201920)","author":"Jeunen O.","key":"e_1_3_2_2_31_1","unstructured":"O. Jeunen , D. Rohde , F. Vasile , and M. Bompaire . 2020. Joint Policy-Value Learning for Recommendation . In Proc. of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining(KDD \u201920) . ACM, 1223\u20131233. O. Jeunen, D. Rohde, F. Vasile, and M. Bompaire. 2020. Joint Policy-Value Learning for Recommendation. In Proc. of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining(KDD \u201920). ACM, 1223\u20131233."},{"volume-title":"Proc. of the 14th ACM Conference on Recommender Systems(RecSys \u201920)","author":"Jeunen O.","key":"e_1_3_2_2_32_1","unstructured":"O. Jeunen , J. Van\u00a0Balen , and B. Goethals . 2020. Closed-Form Models for Collaborative Filtering with Side-Information . In Proc. of the 14th ACM Conference on Recommender Systems(RecSys \u201920) . ACM, 651\u2013656. O. Jeunen, J. Van\u00a0Balen, and B. Goethals. 2020. Closed-Form Models for Collaborative Filtering with Side-Information. In Proc. of the 14th ACM Conference on Recommender Systems(RecSys \u201920). ACM, 651\u2013656."},{"key":"e_1_3_2_2_33_1","unstructured":"Y. Jin Z. Yang and Z. Wang. 2020. Is Pessimism Provably Efficient for Offline RL?arxiv:2012.15085\u00a0[cs.LG] Y. Jin Z. Yang and Z. Wang. 2020. Is Pessimism Provably Efficient for Offline RL?arxiv:2012.15085\u00a0[cs.LG]"},{"key":"e_1_3_2_2_34_1","volume-title":"Proc. of the 6th International Conference on Learning Representations(ICLR \u201918)","author":"Joachims T.","year":"2018","unstructured":"T. Joachims , A. Swaminathan , and M. de Rijke . 2018 . Deep Learning with Logged Bandit Feedback . In Proc. of the 6th International Conference on Learning Representations(ICLR \u201918) . T. Joachims, A. Swaminathan, and M. de Rijke. 2018. Deep Learning with Logged Bandit Feedback. In Proc. 
of the 6th International Conference on Learning Representations(ICLR \u201918)."},{"volume-title":"Proc. of the 10th ACM International Conference on Web Search and Data Mining(WSDM \u201917)","author":"Joachims T.","key":"e_1_3_2_2_35_1","unstructured":"T. Joachims , A. Swaminathan , and T. Schnabel . 2017. Unbiased Learning-to-Rank with Biased Feedback . In Proc. of the 10th ACM International Conference on Web Search and Data Mining(WSDM \u201917) . ACM, 781\u2013789. T. Joachims, A. Swaminathan, and T. Schnabel. 2017. Unbiased Learning-to-Rank with Biased Feedback. In Proc. of the 10th ACM International Conference on Web Search and Data Mining(WSDM \u201917). ACM, 781\u2013789."},{"key":"e_1_3_2_2_36_1","unstructured":"R. Kidambi A. Rajeswaran P. Netrapalli and T. Joachims. 2020. MOReL: Model-Based Offline Reinforcement Learning. In Advances in Neural Information Processing Systems(NeurIPS \u201920 Vol.\u00a033). R. Kidambi A. Rajeswaran P. Netrapalli and T. Joachims. 2020. MOReL: Model-Based Offline Reinforcement Learning. In Advances in Neural Information Processing Systems(NeurIPS \u201920 Vol.\u00a033)."},{"key":"e_1_3_2_2_37_1","unstructured":"A. Kumar A. Zhou G. Tucker and S. Levine. 2020. Conservative Q-Learning for Offline Reinforcement Learning. In Advances in Neural Information Processing Systems(NeurIPS \u201920 Vol.\u00a033). A. Kumar A. Zhou G. Tucker and S. Levine. 2020. Conservative Q-Learning for Offline Reinforcement Learning. In Advances in Neural Information Processing Systems(NeurIPS \u201920 Vol.\u00a033)."},{"key":"e_1_3_2_2_38_1","unstructured":"D. Lefortier A. Swaminathan X. Gu T. Joachims and M. de Rijke. 2016. Large-scale validation of counterfactual learning methods: A test-bed. arXiv preprint arXiv:1612.00367(2016). D. Lefortier A. Swaminathan X. Gu T. Joachims and M. de Rijke. 2016. Large-scale validation of counterfactual learning methods: A test-bed. arXiv preprint arXiv:1612.00367(2016)."},{"key":"e_1_3_2_2_39_1","unstructured":"S. 
Levine A. Kumar G. Tucker and J. Fu. 2020. Offline Reinforcement Learning: Tutorial Review and Perspectives on Open Problems. arxiv:2005.01643\u00a0[cs.LG] S. Levine A. Kumar G. Tucker and J. Fu. 2020. Offline Reinforcement Learning: Tutorial Review and Perspectives on Open Problems. arxiv:2005.01643\u00a0[cs.LG]"},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772758"},{"volume-title":"Proc. of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval(SIGIR \u201916)","author":"Li S.","key":"e_1_3_2_2_41_1","unstructured":"S. Li , A. Karatzoglou , and C. Gentile . 2016. Collaborative Filtering Bandits . In Proc. of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval(SIGIR \u201916) . ACM, 539\u2013548. S. Li, A. Karatzoglou, and C. Gentile. 2016. Collaborative Filtering Bandits. In Proc. of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval(SIGIR \u201916). ACM, 539\u2013548."},{"volume-title":"Proc. of the 2018 World Wide Web Conference(WWW \u201918)","author":"Liang D.","key":"e_1_3_2_2_42_1","unstructured":"D. Liang , R.\u00a0 G. Krishnan , M.\u00a0 D Hoffman , and T. Jebara . 2018. Variational autoencoders for collaborative filtering . In Proc. of the 2018 World Wide Web Conference(WWW \u201918) . ACM, 689\u2013698. D. Liang, R.\u00a0G. Krishnan, M.\u00a0D Hoffman, and T. Jebara. 2018. Variational autoencoders for collaborative filtering. In Proc. of the 2018 World Wide Web Conference(WWW \u201918). ACM, 689\u2013698."},{"key":"e_1_3_2_2_43_1","unstructured":"Y. Liu A. Swaminathan A. Agarwal and E. Brunskill. 2020. Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration. In Advances in Neural Information Processing Systems(NeurIPS \u201920 Vol.\u00a033). Y. Liu A. Swaminathan A. Agarwal and E. Brunskill. 2020. 
Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration. In Advances in Neural Information Processing Systems(NeurIPS \u201920 Vol.\u00a033)."},{"volume-title":"Proc. of the 36th International Conference on Machine Learning(ICML \u201919","author":"London B.","key":"e_1_3_2_2_44_1","unstructured":"B. London and T. Sandler . 2019. Bayesian Counterfactual Risk Minimization . In Proc. of the 36th International Conference on Machine Learning(ICML \u201919 , Vol.\u00a097). PMLR, 4125\u20134133. B. London and T. Sandler. 2019. Bayesian Counterfactual Risk Minimization. In Proc. of the 36th International Conference on Machine Learning(ICML \u201919, Vol.\u00a097). PMLR, 4125\u20134133."},{"key":"e_1_3_2_2_45_1","volume-title":"Proc. of the 35th AAAI Conference on Artificial Intelligence(AAAI\u201921)","author":"Lopez R.","year":"2021","unstructured":"R. Lopez , I. Dhillion , and M.\u00a0 I. Jordan . 2021 . Learning from eXtreme Bandit Feedback . In Proc. of the 35th AAAI Conference on Artificial Intelligence(AAAI\u201921) . AAAI Press. R. Lopez, I. Dhillion, and M.\u00a0I. Jordan. 2021. Learning from eXtreme Bandit Feedback. In Proc. of the 35th AAAI Conference on Artificial Intelligence(AAAI\u201921). AAAI Press."},{"volume-title":"Proc. of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining(KDD \u201920)","author":"Ma C.","key":"e_1_3_2_2_46_1","unstructured":"C. Ma , L. Ma , Y. Zhang , R. Tang , X. Liu , and M. Coates . 2020. Probabilistic Metric Learning with Adaptive Margin for Top-K Recommendation . In Proc. of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining(KDD \u201920) . ACM, 1036\u20131044. C. Ma, L. Ma, Y. Zhang, R. Tang, X. Liu, and M. Coates. 2020. Probabilistic Metric Learning with Adaptive Margin for Top-K Recommendation. In Proc. of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining(KDD \u201920). 
ACM, 1036\u20131044."},{"key":"e_1_3_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3220007"},{"key":"e_1_3_2_2_48_1","volume-title":"Proc. of the 2020 World Wide Web Conference(WWW \u201920)","author":"Ma J.","year":"2020","unstructured":"J. Ma , Z. Zhao , X. Yi , J. Yang , M. Chen , J. Tang , L. Hong , and E.\u00a0 H. Chi . 2020 . Off-Policy Learning in Two-Stage Recommender Systems . In Proc. of the 2020 World Wide Web Conference(WWW \u201920) . ACM. J. Ma, Z. Zhao, X. Yi, J. Yang, M. Chen, J. Tang, L. Hong, and E.\u00a0H. Chi. 2020. Off-Policy Learning in Two-Stage Recommender Systems. In Proc. of the 2020 World Wide Web Conference(WWW \u201920). ACM."},{"volume-title":"Proc. of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS)(AIStats \u201919","author":"Ma Y.","key":"e_1_3_2_2_49_1","unstructured":"Y. Ma , Y. Wang , and B. Narayanaswamy . 2019. Imitation-Regularized Offline Learning . In Proc. of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS)(AIStats \u201919 , Vol.\u00a089). PMLR, 2956\u20132965. Y. Ma, Y. Wang, and B. Narayanaswamy. 2019. Imitation-Regularized Offline Learning. In Proc. of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS)(AIStats \u201919, Vol.\u00a089). PMLR, 2956\u20132965."},{"volume-title":"Proc. of the 29th ACM International Conference on Information & Knowledge Management(CIKM \u201920)","author":"Mansoury M.","key":"e_1_3_2_2_50_1","unstructured":"M. Mansoury , H. Abdollahpouri , M. Pechenizkiy , B. Mobasher , and R. Burke . 2020. Feedback Loop and Bias Amplification in Recommender Systems . In Proc. of the 29th ACM International Conference on Information & Knowledge Management(CIKM \u201920) . ACM, 2145\u20132148. M. Mansoury, H. Abdollahpouri, M. Pechenizkiy, B. Mobasher, and R. Burke. 2020. Feedback Loop and Bias Amplification in Recommender Systems. In Proc. 
of the 29th ACM International Conference on Information & Knowledge Management(CIKM \u201920). ACM, 2145\u20132148."},{"key":"e_1_3_2_2_51_1","unstructured":"A. Masegosa. 2020. Learning under Model Misspecification: Applications to Variational and Ensemble methods. In Advances in Neural Information Processing Systems(NeurIPS \u201920 Vol.\u00a033). 5479\u20135491. A. Masegosa. 2020. Learning under Model Misspecification: Applications to Variational and Ensemble methods. In Advances in Neural Information Processing Systems(NeurIPS \u201920 Vol.\u00a033). 5479\u20135491."},{"key":"e_1_3_2_2_52_1","first-page":"21","article-title":"Empirical Bernstein Bounds and Sample Variance","volume":"1050","author":"Maurer A.","year":"2009","unstructured":"A. Maurer and M. Pontil . 2009 . Empirical Bernstein Bounds and Sample Variance Penalization. Stat. 1050 (2009), 21 . A. Maurer and M. Pontil. 2009. Empirical Bernstein Bounds and Sample Variance Penalization. Stat. 1050(2009), 21.","journal-title":"Penalization. Stat."},{"key":"e_1_3_2_2_53_1","first-page":"1","article-title":"Optimistic Bayesian Sampling in Contextual-Bandit Problems","volume":"13","author":"May C.","year":"2012","unstructured":"B.\u00a0 C. May , N. Korda , A. Lee , and D.\u00a0 S. Leslie . 2012 . Optimistic Bayesian Sampling in Contextual-Bandit Problems . J. Mach. Learn. Res. 13 , 1 (June 2012), 2069\u20132106. B.\u00a0C. May, N. Korda, A. Lee, and D.\u00a0S. Leslie. 2012. Optimistic Bayesian Sampling in Contextual-Bandit Problems. J. Mach. Learn. Res. 13, 1 (June 2012), 2069\u20132106.","journal-title":"J. Mach. Learn. Res."},{"volume-title":"Proc. of the 12th ACM Conference on Recommender Systems(RecSys \u201918)","author":"McInerney J.","key":"e_1_3_2_2_54_1","unstructured":"J. McInerney , B. Lacker , S. Hansen , K. Higley , H. Bouchard , A. Gruson , and R. Mehrotra . 2018. Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits . In Proc. 
of the 12th ACM Conference on Recommender Systems(RecSys \u201918) . ACM, 31\u201339. J. McInerney, B. Lacker, S. Hansen, K. Higley, H. Bouchard, A. Gruson, and R. Mehrotra. 2018. Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits. In Proc. of the 12th ACM Conference on Recommender Systems(RecSys \u201918). ACM, 31\u201339."},{"key":"e_1_3_2_2_55_1","volume-title":"Proc. of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1222\u20131230","author":"McMahan B.","year":"2013","unstructured":"H.\u00a0 B. McMahan , G. Holt , D. Sculley , M. Young , D. Ebner , J. Grady , L. Nie , T. Phillips , E. Davydov , D. Golovin , 2013 . Ad click prediction: a view from the trenches . In Proc. of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1222\u20131230 . H.\u00a0B. McMahan, G. Holt, D. Sculley, M. Young, D. Ebner, J. Grady, L. Nie, T. Phillips, E. Davydov, D. Golovin, 2013. Ad click prediction: a view from the trenches. In Proc. of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1222\u20131230."},{"volume-title":"Proc. of the 27th ACM International Conference on Information and Knowledge Management(CIKM \u201918)","author":"Mehrotra R.","key":"e_1_3_2_2_56_1","unstructured":"R. Mehrotra , J. McInerney , H. Bouchard , M. Lalmas , and F. Diaz . 2018. Towards a Fair Marketplace: Counterfactual Evaluation of the Trade-off between Relevance, Fairness & Satisfaction in Recommendation Systems . In Proc. of the 27th ACM International Conference on Information and Knowledge Management(CIKM \u201918) . ACM, 2243\u20132251. R. Mehrotra, J. McInerney, H. Bouchard, M. Lalmas, and F. Diaz. 2018. Towards a Fair Marketplace: Counterfactual Evaluation of the Trade-off between Relevance, Fairness & Satisfaction in Recommendation Systems. In Proc. 
of the 27th ACM International Conference on Information and Knowledge Management(CIKM \u201918). ACM, 2243\u20132251."},{"volume-title":"Proc. of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining(KDD \u201920)","author":"Mehrotra R.","key":"e_1_3_2_2_57_1","unstructured":"R. Mehrotra, N. Xue, and M. Lalmas. 2020. Bandit Based Optimization of Multiple Objectives on a Music Streaming Platform. In Proc. of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining(KDD \u201920). ACM, 3224\u20133233."},{"volume-title":"Probabilistic Machine Learning: An introduction","author":"Murphy P.","key":"e_1_3_2_2_58_1","unstructured":"K.\u00a0P. Murphy. 2021. Probabilistic Machine Learning: An introduction. MIT Press."},{"volume-title":"Proc. of the NeurIPS Workshop on Causality and Machine Learning(CausalML \u201919)","author":"Mykhaylov D.","key":"e_1_3_2_2_59_1","unstructured":"D. Mykhaylov, D. Rohde, F. Vasile, M. Bompaire, and O. Jeunen. 2019. Three Methods for Training on Bandit Feedback. In Proc. of the NeurIPS Workshop on Causality and Machine Learning(CausalML \u201919)."},{"volume-title":"Proc. of the 2011 IEEE 11th International Conference on Data Mining(ICDM \u201911)","author":"Ning X.","key":"e_1_3_2_2_60_1","unstructured":"X. Ning and G. Karypis. 2011. SLIM: Sparse Linear Methods for Top-N Recommender Systems. In Proc. 
of the 2011 IEEE 11th International Conference on Data Mining(ICDM \u201911). IEEE Computer Society, 497\u2013506."},{"key":"e_1_3_2_2_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401102"},{"key":"e_1_3_2_2_62_1","unstructured":"I. Osband, C. Blundell, A. Pritzel, and B. Van\u00a0Roy. 2016. Deep Exploration via Bootstrapped DQN. In Advances in Neural Information Processing Systems Vol.\u00a029. 4026\u20134034."},{"key":"e_1_3_2_2_63_1","unstructured":"A.\u00a0B. Owen. 2013. Monte Carlo theory, methods and examples."},{"volume-title":"Proc. of the ACM RecSys Workshop on Offline Evaluation for Recommender Systems(REVEAL \u201918)","author":"Rohde D.","key":"e_1_3_2_2_64_1","unstructured":"D. Rohde, S. Bonner, T. Dunlop, F. Vasile, and A. Karatzoglou. 2018. RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising. In Proc. of the ACM RecSys Workshop on Offline Evaluation for Recommender Systems(REVEAL \u201918)."},{"volume-title":"Proc. of the 10th ACM Conference on Recommender Systems(RecSys \u201916)","author":"Rossetti M.","key":"e_1_3_2_2_65_1","unstructured":"M. Rossetti, F. Stella, and M. Zanker. 2016. 
Contrasting Offline and Online Results when Evaluating Recommendation Algorithms. In Proc. of the 10th ACM Conference on Recommender Systems(RecSys \u201916). ACM, 31\u201334."},{"volume-title":"Proc. of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 965\u2013975","author":"Sachdeva N.","key":"e_1_3_2_2_66_1","unstructured":"N. Sachdeva, Y. Su, and T. Joachims. 2020. Off-Policy Bandits with Deficient Support. In Proc. of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 965\u2013975."},{"key":"e_1_3_2_2_67_1","unstructured":"Y. Saito, S. Aihara, M. Matsutani, and Y. Narita. 2020. Large-scale Open Dataset Pipeline and Benchmark for Bandit Algorithms. arxiv:2008.07146\u00a0[cs.LG]"},{"volume-title":"Proc. of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining(KDD \u201920)","author":"Sakhi O.","key":"e_1_3_2_2_68_1","unstructured":"O. Sakhi, S. Bonner, D. Rohde, and F. Vasile. 2020. BLOB: A Probabilistic Model for Recommendation that Combines Organic and Bandit Signals. In Proc. 
of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining(KDD \u201920). ACM, 783\u2013793."},{"key":"e_1_3_2_2_69_1","volume-title":"Proc. of the AAAI Conference on Artificial Intelligence 30","author":"Sedhain S.","year":"2016","unstructured":"S. Sedhain, A. Menon, S. Sanner, and D. Braziunas. 2016. On the Effectiveness of Linear Models for One-Class Collaborative Filtering. Proc. of the AAAI Conference on Artificial Intelligence 30, 1(2016)."},{"key":"e_1_3_2_2_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/3336191.3371831"},{"key":"e_1_3_2_2_71_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0378-3758(00)00115-4"},{"volume-title":"Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits. In International Conference on Machine Learning(ICML\u201920)","author":"Si N.","key":"e_1_3_2_2_72_1","unstructured":"N. Si, F. Zhang, Z. Zhou, and J. Blanchet. 2020. Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits. In International Conference on Machine Learning(ICML\u201920)."},{"key":"e_1_3_2_2_73_1","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.1050.0451"},{"key":"e_1_3_2_2_74_1","volume-title":"Embarrassingly Shallow Autoencoders for Sparse Data. In The World Wide Web Conference(WWW \u201919)","author":"Steck H.","year":"2019","unstructured":"H. Steck. 2019. Embarrassingly Shallow Autoencoders for Sparse Data. 
In The World Wide Web Conference(WWW \u201919). ACM, 3251\u20133257."},{"volume-title":"Proc. of the 37th International Conference on Machine Learning(ICML \u201920)","author":"Su Y.","key":"e_1_3_2_2_75_1","unstructured":"Y. Su, M. Dimakopoulou, A. Krishnamurthy, and M. Dudik. 2020. Doubly robust off-policy evaluation with shrinkage. In Proc. of the 37th International Conference on Machine Learning(ICML \u201920). PMLR, 9167\u20139176."},{"volume-title":"CAB: Continuous Adaptive Blending for Policy Evaluation and Learning. In International Conference on Machine Learning(ICML\u201919)","author":"Su Y.","key":"e_1_3_2_2_76_1","unstructured":"Y. Su, L. Wang, M. Santacatterina, and T. Joachims. 2019. CAB: Continuous Adaptive Blending for Policy Evaluation and Learning. In International Conference on Machine Learning(ICML\u201919). 6005\u20136014."},{"volume-title":"Proc. of the 32nd International Conference on International Conference on Machine Learning(ICML\u201915)","author":"Swaminathan A.","key":"e_1_3_2_2_77_1","unstructured":"A. Swaminathan and T. Joachims. 2015. Counterfactual Risk Minimization: Learning from Logged Bandit Feedback. In Proc. of the 32nd International Conference on International Conference on Machine Learning(ICML\u201915). 
JMLR.org, 814\u2013823."},{"key":"e_1_3_2_2_78_1","unstructured":"A. Swaminathan and T. Joachims. 2015. The Self-Normalized Estimator for Counterfactual Learning. In Advances in Neural Information Processing Systems. 3231\u20133239."},{"volume-title":"Proc. of the 14th ACM Conference on Recommender Systems(RecSys \u201920)","author":"Tang H.","key":"e_1_3_2_2_79_1","unstructured":"H. Tang, J. Liu, M. Zhao, and X. Gong. 2020. Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations. In Proc. of the 14th ACM Conference on Recommender Systems(RecSys \u201920). ACM, 269\u2013278."},{"volume-title":"Proc. of the 28th ACM Conference on User Modeling, Adaptation and Personalization(UMAP \u201920)","author":"Vasile F.","key":"e_1_3_2_2_80_1","unstructured":"F. Vasile, D. Rohde, O. Jeunen, and A. Benhalloum. 2020. A Gentle Introduction to Recommendation as Counterfactual Policy Learning. In Proc. of the 28th ACM Conference on User Modeling, Adaptation and Personalization(UMAP \u201920). ACM, 392\u2013393."},{"key":"e_1_3_2_2_81_1","volume-title":"Proc. of the 25th Conference on Uncertainty in Artificial Intelligence(UAI \u201909)","author":"Walsh J.","year":"2009","unstructured":"T.\u00a0J. Walsh, I. Szita, C. Diuk, and M.\u00a0L. Littman. 
2009. Exploring Compact Reinforcement-Learning Representations with Linear Regression. In Proc. of the 25th Conference on Uncertainty in Artificial Intelligence(UAI \u201909). AUAI Press, 591\u2013598."},{"key":"e_1_3_2_2_82_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401147"},{"key":"e_1_3_2_2_83_1","volume-title":"MOPO: Model-Based Offline Policy Optimization. In Advances in Neural Information Processing Systems(NeurIPS \u201920, Vol.\u00a033).","author":"Yu T.","year":"2020","unstructured":"T. Yu, G. Thomas, L. Yu, S. Ermon, J.\u00a0Y. Zou, S. Levine, C. Finn, and T. Ma. 2020. MOPO: Model-Based Offline Policy Optimization. 
In Advances in Neural Information Processing Systems(NeurIPS \u201920, Vol.\u00a033)."},{"key":"e_1_3_2_2_84_1","doi-asserted-by":"publisher","DOI":"10.1145\/3298689.3346997"}],"event":{"name":"RecSys '21: Fifteenth ACM Conference on Recommender Systems","sponsor":["SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGAI ACM Special Interest Group on Artificial Intelligence","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data","SIGIR ACM Special Interest Group on Information Retrieval","SIGCHI ACM Special Interest Group on Computer-Human Interaction","SIGecom Special Interest Group on Economics and Computation"],"location":"Amsterdam Netherlands","acronym":"RecSys '21"},"container-title":["Fifteenth ACM Conference on Recommender Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3460231.3474247","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3460231.3474247","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:12:17Z","timestamp":1750191137000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3460231.3474247"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,13]]},"references-count":84,"alternative-id":["10.1145\/3460231.3474247","10.1145\/3460231"],"URL":"https:\/\/doi.org\/10.1145\/3460231.3474247","relation":{},"subject":[],"published":{"date-parts":[[2021,9,13]]}}}