{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,2]],"date-time":"2026-01-02T07:39:54Z","timestamp":1767339594997,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":21,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,7,25]],"date-time":"2019-07-25T00:00:00Z","timestamp":1564012800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSFC","award":["61876077"],"award-info":[{"award-number":["61876077"]}]},{"name":"National Key R&D Program of China","award":["2017YFB1002201"],"award-info":[{"award-number":["2017YFB1002201"]}]},{"name":"Jiangsu SF","award":["BK20170013"],"award-info":[{"award-number":["BK20170013"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,7,25]]},"DOI":"10.1145\/3292500.3330933","type":"proceedings-article","created":{"date-parts":[[2019,7,26]],"date-time":"2019-07-26T13:17:26Z","timestamp":1564147046000},"page":"566-576","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":38,"title":["Environment Reconstruction with Hidden Confounders for Reinforcement Learning based Recommendation"],"prefix":"10.1145","author":[{"given":"Wenjie","family":"Shang","sequence":"first","affiliation":[{"name":"Nanjing University, Nanjing, China"}]},{"given":"Yang","family":"Yu","sequence":"additional","affiliation":[{"name":"Nanjing University, Nanjing, China"}]},{"given":"Qingyang","family":"Li","sequence":"additional","affiliation":[{"name":"AI Labs, Didi Chuxing, Beijing, China"}]},{"given":"Zhiwei","family":"Qin","sequence":"additional","affiliation":[{"name":"AI Labs, Didi Chuxing, Beijing, China"}]},{"given":"Yiping","family":"Meng","sequence":"additional","affiliation":[{"name":"AI Labs, Didi Chuxing, Beijing, 
China"}]},{"given":"Jieping","family":"Ye","sequence":"additional","affiliation":[{"name":"AI Labs, Didi Chuxing, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2019,7,25]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2008.10.024"},{"key":"e_1_3_2_1_2_1","first-page":"1342","article-title":"Bandits with Unobserved Confounders","volume":"28","author":"Bareinboim Elias","year":"2015","unstructured":"Elias Bareinboim, Andrew Forney, and Judea Pearl. 2015. Bandits with Unobserved Confounders: A Causal Approach. In Advances in Neural Information Processing Systems 28. 1342--1350.","journal-title":"A Causal Approach. In Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_3_1","volume-title":"Inverse Reinforcement Learning, and Energy-Based Models. arXiv","author":"Finn Chelsea","year":"2016","unstructured":"Chelsea Finn, Paul F. Christiano, Pieter Abbeel, and Sergey Levine. 2016. A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models. arXiv, Vol. abs\/1611.03852 (2016)."},{"key":"e_1_3_2_1_4_1","volume-title":"Proceedings of the 34th International Conference on Machine Learning. 1156--1164","author":"Forney Andrew","year":"2017","unstructured":"Andrew Forney, Judea Pearl, and Elias Bareinboim. 2017. 
Counterfactual Data-Fusion for Online Reinforcement Learners. In Proceedings of the 34th International Conference on Machine Learning. 1156--1164."},{"key":"e_1_3_2_1_5_1","unstructured":"Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. In Advances in Neural Information Processing Systems 27. 2672--2680."},{"key":"e_1_3_2_1_6_1","unstructured":"Jonathan Ho and Stefano Ermon. 2016. Generative Adversarial Imitation Learning. In Advances in Neural Information Processing Systems 29. 4565--4573."},{"key":"e_1_3_2_1_7_1","unstructured":"Christos Louizos, Uri Shalit, Joris M. Mooij, David Sontag, Richard S. Zemel, and Max Welling. 2017. Causal Effect Inference with Deep Latent-Variable Models. In Advances in Neural Information Processing Systems 30. 6449--6459."},{"key":"e_1_3_2_1_8_1","volume-title":"Bernhard Sch\u00f6lkopf, and Jos\u00e9 Miguel Hern\u00e1ndez-Lobato","author":"Lu Chaochao","year":"2018","unstructured":"Chaochao Lu, Bernhard Sch\u00f6lkopf, and Jos\u00e9 Miguel Hern\u00e1ndez-Lobato. 2018. Deconfounding Reinforcement Learning in Observational Settings. 
arXiv, Vol. abs\/1812.10576 (2018)."},{"key":"e_1_3_2_1_9_1","volume-title":"Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling. arXiv","author":"Menick Jacob","year":"2018","unstructured":"Jacob Menick and Nal Kalchbrenner. 2018. Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling. arXiv, Vol. abs\/1812.01608 (2018)."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_3_2_1_11_1","volume-title":"Causal inference in statistics: An overview. Statistics surveys","author":"Pearl Judea","year":"2009","unstructured":"Judea Pearl. 2009. Causal inference in statistics: An overview. Statistics surveys, Vol. 3 (2009), 96--146."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1991.3.1.88"},{"key":"e_1_3_2_1_13_1","volume-title":"Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. 627--635","author":"Ross St\u00e9","year":"2011","unstructured":"St\u00e9phane Ross, Geoffrey J. Gordon, and Drew Bagnell. 2011. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. 
627--635."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/279943.279964"},{"key":"e_1_3_2_1_15_1","volume-title":"Is imitation learning the route to humanoid robots? Trends in cognitive sciences","author":"Schaal Stefan","year":"1999","unstructured":"Stefan Schaal. 1999. Is imitation learning the route to humanoid robots? Trends in cognitive sciences, Vol. 3, 6 (1999), 233--242."},{"key":"e_1_3_2_1_16_1","volume-title":"Proceedings of the 32nd International Conference on Machine Learning, ICML 2015","author":"Schulman John","year":"2015","unstructured":"John Schulman, Sergey Levine, Pieter Abbeel, Michael I. Jordan, and Philipp Moritz. 2015. Trust Region Policy Optimization. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6--11 July 2015. 1889--1897."},{"key":"e_1_3_2_1_17_1","volume-title":"Virtual-Taobao: Virtualizing Real-world Online Retail Environment for Reinforcement Learning. arXiv","author":"Shi Jing-Cheng","year":"2018","unstructured":"Jing-Cheng Shi, Yang Yu, Qing Da, Shi-Yong Chen, and Anxiang Zeng. 2018. Virtual-Taobao: Virtualizing Real-world Online Retail Environment for Reinforcement Learning. arXiv, Vol. abs\/1805.10000 (2018)."},{"key":"e_1_3_2_1_18_1","volume-title":"Mastering the game of Go with deep neural networks and tree search. 
Nature","author":"Silver David","year":"2016","unstructured":"David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Vedavyas Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. 2016. Mastering the game of Go with deep neural networks and tree search. Nature, Vol. 529, 7587 (2016), 484--489."},{"key":"e_1_3_2_1_19_1","volume-title":"Barto","author":"Sutton Richard S.","year":"2018","unstructured":"Richard S. Sutton and Andrew G. Barto. 2018. 
Reinforcement Learning: An Introduction (2nd Edition). MIT Press."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2018.2827047"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3220111"}],"event":{"name":"KDD '19: The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data"],"location":"Anchorage AK USA","acronym":"KDD '19"},"container-title":["Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3292500.3330933","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3292500.3330933","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:26:03Z","timestamp":1750206363000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3292500.3330933"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,7,25]]},"references-count":21,"alternative-id":["10.1145\/3292500.3330933","10.1145\/3292500"],"URL":"https:\/\/doi.org\/10.1145\/3292500.3330933","relation":{},"subject":[],"published":{"date-parts":[[2019,7,25]]},"assertion":[{"value":"2019-07-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}