{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T14:31:38Z","timestamp":1754145098115,"version":"3.41.2"},"reference-count":94,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2025,12,31]]},"abstract":"<jats:p>The literature on bandits has developed largely independently of advances in causal inference. Work in the last few years has started investigating the close connections between these two areas and that has led to fruitful ideas that have produced advances in bandit algorithms. We present the first survey focusing specifically on the intersection of these two areas. We first provide a taxonomy for categorizing research in this area, and then place important works within this structure. We also describe various algorithms and methods, and provide the highlights. 
Finally, we point out promising directions for future research in this area.<\/jats:p>","DOI":"10.1145\/3744917","type":"journal-article","created":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T21:18:43Z","timestamp":1750281523000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Causality in Bandits: A Survey"],"prefix":"10.1145","volume":"57","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-8572-125X","authenticated-orcid":false,"given":"Chandrasekar","family":"Subramanian","sequence":"first","affiliation":[{"name":"Indian Institute of Technology Madras","place":["Chennai, India"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5364-7639","authenticated-orcid":false,"given":"Balaraman","family":"Ravindran","sequence":"additional","affiliation":[{"name":"Wadhwani School of Data Science and Artificial Intelligence, Indian Institute of Technology Madras","place":["Chennai, India"]}]}],"member":"320","published-online":{"date-parts":[[2025,7,11]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","unstructured":"Ryan Prescott Adams and David J. C. MacKay. 2007. Bayesian Online Changepoint Detection. 10.48550\/arXiv.0710.3742","DOI":"10.48550\/arXiv.0710.3742"},{"key":"e_1_3_2_3_2","series-title":"PMLR","first-page":"1638","volume-title":"Proceedings of the 31st International Conference on Machine Learning","volume":"32","author":"Agarwal Alekh","year":"2014","unstructured":"Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, and Robert Schapire. 2014. Taming the monster: A fast and simple algorithm for contextual bandits. In Proceedings of the 31st International Conference on Machine Learning(PMLR, Vol. 32). 
1638\u20131646."},{"key":"e_1_3_2_4_2","series-title":"Proceedings of Machine Learning Research","first-page":"39.1\u201339.26","volume-title":"Proceedings of the 25th Annual Conference on Learning Theory","volume":"23","author":"Agrawal Shipra","year":"2012","unstructured":"Shipra Agrawal and Navin Goyal. 2012. Analysis of thompson sampling for the multi-armed bandit problem. In Proceedings of the 25th Annual Conference on Learning Theory(Proceedings of Machine Learning Research, Vol. 23). PMLR, Edinburgh, Scotland, 39.1\u201339.26. Retrieved from http:\/\/proceedings.mlr.press\/v23\/agrawal12\/agrawal12.pdf"},{"key":"e_1_3_2_5_2","first-page":"127","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Agrawal Shipra","year":"2013","unstructured":"Shipra Agrawal and Navin Goyal. 2013. Thompson sampling for contextual bandits with linear payoffs. In Proceedings of the International Conference on Machine Learning. PMLR, 127\u2013135."},{"key":"e_1_3_2_6_2","volume-title":"Proceedings of the NeurIPS 2022 Workshop on Causality for Real-world Impact","author":"ALAMI Reda","year":"2022","unstructured":"Reda ALAMI. 2022. Non-stationary causal bandits. In Proceedings of the NeurIPS 2022 Workshop on Causality for Real-world Impact. Retrieved from https:\/\/openreview.net\/forum?id=DnH-JVrh9kq"},{"key":"e_1_3_2_7_2","volume-title":"Proceedings of the 23rd Conference on Learning Theory","author":"Audibert Jean-Yves","year":"2010","unstructured":"Jean-Yves Audibert, Sebastien Bubeck, and Remi Munos. 2010. Best arm identification in multi-armed bandits. 
In Proceedings of the 23rd Conference on Learning Theory."},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1013689704352"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3501714.3501743"},{"key":"e_1_3_2_10_2","first-page":"1342","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 28","author":"Bareinboim Elias","year":"2015","unstructured":"Elias Bareinboim, Andrew Forney, and Judea Pearl. 2015. Bandits with unobserved confounders: A causal approach. In Proceedings of the Advances in Neural Information Processing Systems 28. 1342\u20131350."},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1510507113"},{"key":"e_1_3_2_12_2","first-page":"44356","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 36","author":"Bellot Alexis","year":"2023","unstructured":"Alexis Bellot, Alan Malek, and Silvia Chiappa. 2023. Transportability for bandits with data from different environments. In Proceedings of the Advances in Neural Information Processing Systems 36. 44356\u201344381."},{"key":"e_1_3_2_13_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","volume":"27","author":"Besbes Omar","year":"2014","unstructured":"Omar Besbes, Yonatan Gur, and Assaf Zeevi. 2014. Stochastic multi-armed-bandit problem with non-stationary rewards. In Proceedings of the Advances in Neural Information Processing Systems. Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K.Q. Weinberger (Eds.), Vol. 27, Curran Associates, Inc. 
Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2014\/file\/903ce9225fca3e988c2af215d4e544d3-Paper.pdf"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.5555\/3524938.3525017"},{"issue":"101","key":"e_1_3_2_15_2","first-page":"3207","article-title":"Counterfactual reasoning and learning systems: The example of computational advertising","volume":"14","author":"Bottou L\u00e9on","year":"2013","unstructured":"L\u00e9on Bottou, Jonas Peters, Joaquin Qui\u00f1onero-Candela, Denis X. Charles, D. Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, and Ed Snelson. 2013. Counterfactual reasoning and learning systems: The example of computational advertising. Journal of Machine Learning Research 14, 101 (2013), 3207\u20133260. Retrieved from http:\/\/jmlr.org\/papers\/v14\/bottou13a.html","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CEC48606.2020.9185782"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-024-09800-0"},{"key":"e_1_3_2_18_2","first-page":"208","volume-title":"Proceedings of the 14th International Conference on Artificial Intelligence and Statistics","author":"Chu Wei","year":"2011","unstructured":"Wei Chu, Lihong Li, Lev Reyzin, and Robert Schapire. 2011. Contextual bandits with linear payoff functions. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, 208\u2013214."},{"key":"e_1_3_2_19_2","first-page":"407","volume-title":"Proceedings of the Conference on Causal Learning and Reasoning","author":"Kroon Arnoud De","year":"2022","unstructured":"Arnoud De Kroon, Joris Mooij, and Danielle Belgrave. 2022. Causal bandits without prior knowledge using separating sets. In Proceedings of the Conference on Causal Learning and Reasoning. 
PMLR, 407\u2013427."},{"key":"e_1_3_2_20_2","first-page":"47","article-title":"Causal reinforcement learning: A survey","author":"Deng Zhihong","year":"2023","unstructured":"Zhihong Deng, Jing Jiang, Guodong Long, and Chengqi Zhang. 2023. Causal reinforcement learning: A survey. Transactions on Machine Learning Research (2023), 47 pages. https:\/\/openreview.net\/forum?id=qqnttX9LPo","journal-title":"Transactions on Machine Learning Research"},{"key":"e_1_3_2_21_2","volume-title":"Proceedings of the 38th Annual Conference on Neural Information Processing Systems","author":"Dhawan Nikita","year":"2024","unstructured":"Nikita Dhawan, Leonardo Cotta, Karen Ullrich, Rahul Krishnan, and Chris J. Maddison. 2024. End-to-end causal effect estimation from unstructured natural language data. In Proceedings of the 38th Annual Conference on Neural Information Processing Systems. Retrieved from https:\/\/openreview.net\/forum?id=gzQARCgIsI"},{"key":"e_1_3_2_22_2","first-page":"1939","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 34","author":"Dimakopoulou Maria","year":"2021","unstructured":"Maria Dimakopoulou, Zhimei Ren, and Zhengyuan Zhou. 2021. Online multi-armed bandits with adaptive inference. In Proceedings of the Advances in Neural Information Processing Systems 34. 1939\u20131951."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33013445"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i8.16892"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-021-05961-4"},{"key":"e_1_3_2_26_2","series-title":"Proceedings of Machine Learning Research","first-page":"67","volume-title":"Proceedings of the 3rd Machine Learning for Healthcare Conference.","volume":"85","author":"Durand Audrey","year":"2018","unstructured":"Audrey Durand, Charis Achilleos, Demetris Iacovides, Katerina Strati, Georgios D. Mitsis, and Joelle Pineau. 2018. 
Contextual bandits for adapting treatment in a mouse model of de novo carcinogenesis. In Proceedings of the 3rd Machine Learning for Healthcare Conference. Finale Doshi-Velez, Jim Fackler, Ken Jung, David Kale, Rajesh Ranganath, Byron Wallace, and Jenna Wiens (Eds.), Proceedings of Machine Learning Research, Vol. 85, PMLR, 67\u201382. Retrieved from https:\/\/proceedings.mlr.press\/v85\/durand18a.html"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.5797"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i6.25917"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33012454"},{"key":"e_1_3_2_30_2","first-page":"1156","volume-title":"Proceedings of the 34th International Conference on Machine Learning, ICML 2017","author":"Forney Andrew","year":"2017","unstructured":"Andrew Forney, Judea Pearl, and Elias Bareinboim. 2017. Counterfactual data-fusion for online reinforcement learners. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017. 1156\u20131164."},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i17.17775"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.2014602118"},{"key":"e_1_3_2_33_2","article-title":"Bayesian causal bandits with backdoor adjustment prior","author":"Huang Jireh","year":"2023","unstructured":"Jireh Huang and Qing Zhou. 2023. Bayesian causal bandits with backdoor adjustment prior. Transactions on Machine Learning Research (2023). 
https:\/\/openreview.net\/forum?id=sMsGv5Kfm3","journal-title":"Transactions on Machine Learning Research"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i18.30027"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i6.20653"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","unstructured":"Xu Huang Weiwen Liu Xiaolong Chen Xingmei Wang Hao Wang Defu Lian Yasheng Wang Ruiming Tang and Enhong Chen. 2024. Understanding the planning of LLM agents: A survey. 10.48550\/arXiv.2402.02716","DOI":"10.48550\/arXiv.2402.02716"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1098\/rsos.171377"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781139025751"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1198\/106186008X320456"},{"key":"e_1_3_2_40_2","first-page":"423","volume-title":"Proceedings of the Causal Learning and Reasoning","author":"Jamshidi Fateme","year":"2024","unstructured":"Fateme Jamshidi, Jalal Etesami, and Negar Kiyavash. 2024. Confounded budgeted causal bandits. In Proceedings of the Causal Learning and Reasoning. PMLR, 423\u2013461."},{"key":"e_1_3_2_41_2","volume-title":"Proceedings of the 6th International Conference on Learning Representations (ICLR 2018)","author":"Joachims Thorsten","year":"2018","unstructured":"Thorsten Joachims, Adith Swaminathan, and Maarten de Rijke. 2018. Deep learning with logged bandit feedback. In Proceedings of the 6th International Conference on Learning Representations (ICLR 2018)."},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.5555\/3045390.3045708"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","unstructured":"Khashayar Khosravi Renato Paes Leme Chara Podimata and Apostolis Tsorvantzis. 2024. Preferences Evolve And So Should Your Bandits: Bandits with Evolving States for Online Platforms. 
10.48550\/arXiv.2307.11655","DOI":"10.48550\/arXiv.2307.11655"},{"key":"e_1_3_2_44_2","first-page":"1189","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 29","author":"Lattimore Finnian","year":"2016","unstructured":"Finnian Lattimore, Tor Lattimore, and Mark D. Reid. 2016. Causal bandits: Learning good interventions via causal inference. In Proceedings of the Advances in Neural Information Processing Systems 29. 1189\u20131197."},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1017\/9781108571401"},{"key":"e_1_3_2_46_2","first-page":"2573","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 31","author":"Lee Sanghack","year":"2018","unstructured":"Sanghack Lee and Elias Bareinboim. 2018. Structural causal bandits: Where to intervene?. In Proceedings of the Advances in Neural Information Processing Systems 31. 2573\u20132583."},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33014164"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i06.6582"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11699"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","unstructured":"Chenxi Liu Yongqiang Chen Tongliang Liu Mingming Gong James Cheng Bo Han and Kun Zhang. 2024. Discovery of the Hidden World with Large Language Models. 10.48550\/arXiv.2402.03941","DOI":"10.48550\/arXiv.2402.03941"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2012.2230215"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","unstructured":"Yueyang Liu Xu Kuang and Benjamin Van Roy. 2023. A Definition of Non-Stationary Bandits. 
10.48550\/arXiv.2302.12202","DOI":"10.48550\/arXiv.2302.12202"},{"key":"e_1_3_2_53_2","first-page":"24817","article-title":"Causal bandits with unknown graph structure","volume":"34","author":"Lu Yangyi","year":"2021","unstructured":"Yangyi Lu, Amirhossein Meisami, and Ambuj Tewari. 2021. Causal bandits with unknown graph structure. Advances in Neural Information Processing Systems 34 (2021), 24817\u201324828.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_54_2","series-title":"PMLR","first-page":"141","volume-title":"Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)","volume":"124","author":"Lu Yangyi","year":"2020","unstructured":"Yangyi Lu, Amirhossein Meisami, Ambuj Tewari, and William Yan. 2020. Regret analysis of bandit problems with causal background knowledge. In Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)(PMLR, Vol. 124). 141\u2013150."},{"key":"e_1_3_2_55_2","series-title":"Proceedings of Machine Learning Research","first-page":"1739","volume-title":"Proceedings of the 31st Conference On Learning Theory.","volume":"75","author":"Luo Haipeng","year":"2018","unstructured":"Haipeng Luo, Chen-Yu Wei, Alekh Agarwal, and John Langford. 2018. Efficient contextual bandits in non-stationary worlds. In Proceedings of the 31st Conference On Learning Theory.S\u00e9bastien Bubeck, Vianney Perchet, and Philippe Rigollet (Eds.), Proceedings of Machine Learning Research, Vol. 75, PMLR, 1739\u20131776. Retrieved from https:\/\/proceedings.mlr.press\/v75\/luo18a.html"},{"key":"e_1_3_2_56_2","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"Ma Yecheng Jason","year":"2024","unstructured":"Yecheng Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2024. Eureka: Human-level reward design via coding large language models. 
In Proceedings of the 12th International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=IEduRUO55F"},{"key":"e_1_3_2_57_2","article-title":"Causal contextual bandits with adaptive context","volume":"1","author":"Madhavan Rahul","year":"2024","unstructured":"Rahul Madhavan, Aurghya Maiti, Gaurav Sinha, and Siddharth Barman. 2024. Causal contextual bandits with adaptive context. Reinforcement Learning Journal 1, 1 (2024). https:\/\/openreview.net\/forum?id=zph66oWSpI","journal-title":"Reinforcement Learning Journal"},{"key":"e_1_3_2_58_2","volume-title":"Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence","author":"Maiti Aurghya","year":"2022","unstructured":"Aurghya Maiti, Vineet Nair, and Gaurav Sinha. 2022. A causal bandit approach to learning good atomic interventions in presence of unobserved confounders. In Proceedings of the 38th Conference on Uncertainty in Artificial Intelligence."},{"key":"e_1_3_2_59_2","first-page":"23574","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Malek Alan","year":"2023","unstructured":"Alan Malek, Virginia Aglietti, and Silvia Chiappa. 2023. Additive causal bandits with unknown graph. In Proceedings of the International Conference on Machine Learning. PMLR, 23574\u201323589."},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","unstructured":"Shervin Minaee Tomas Mikolov Narjes Nikzad Meysam Chenaghlu Richard Socher Xavier Amatriain and Jianfeng Gao. 2024. Large Language Models: A Survey. 10.48550\/arXiv.2402.06196","DOI":"10.48550\/arXiv.2402.06196"},{"key":"e_1_3_2_61_2","volume-title":"Proceedings of the 24th International Conference on Artificial Intelligence and Statistics","author":"Nair Vineet","year":"2021","unstructured":"Vineet Nair, Vishakha Patil, and Gaurav Sinha. 2021. Budgeted and non-budgeted causal bandits. 
In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics."},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33014634"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2009.191"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/82.4.669"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1214\/09-SS057"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511803161"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1515\/jci-2019-2002"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v25i1.7861"},{"key":"e_1_3_2_69_2","volume-title":"Causal Inference in Statistics: A Primer (1st ed.)","author":"Pearl Judea","year":"2016","unstructured":"Judea Pearl, Madelyn Glymour, and Nicholas P. Jewell. 2016. Causal Inference in Statistics: A Primer (1st ed.). John Wiley and Sons, Ltd."},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0049-237X(06)80074-1"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.5555\/3202377"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.5555\/3202377"},{"key":"e_1_3_2_73_2","doi-asserted-by":"publisher","unstructured":"Libo Qin Qiguang Chen Xiachong Feng Yang Wu Yongheng Zhang Yinghui Li Min Li Wanxiang Che and Philip S. Yu. 2024. Large Language Models Meet NLP: A Survey. 10.48550\/arXiv.2405.12819","DOI":"10.48550\/arXiv.2405.12819"},{"key":"e_1_3_2_74_2","volume-title":"Proceedings of the 13th International Conference on Learning Representations","author":"Raghavan Arvind","year":"2025","unstructured":"Arvind Raghavan and Elias Bareinboim. 2025. Counterfactual realizability. In Proceedings of the 13th International Conference on Learning Representations. 
Retrieved from https:\/\/openreview.net\/forum?id=uuriavczkL"},{"key":"e_1_3_2_75_2","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/70.1.41"},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403139"},{"key":"e_1_3_2_77_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2022.3232363"},{"key":"e_1_3_2_78_2","volume-title":"Proceedings of the International Conference on Machine Learning (ICML\u201918) Workshops","author":"Sawant Neela","year":"2018","unstructured":"Neela Sawant, Chitti Babu Namballa, Narayanan Sadagopan, and Houssam Nassif. 2018. Contextual multi-armed bandits for causal marketing. In Proceedings of the International Conference on Machine Learning (ICML\u201918) Workshops."},{"key":"e_1_3_2_79_2","series-title":"PMLR","first-page":"3057","volume-title":"Proceedings of the 34th International Conference on Machine Learning","volume":"70","author":"Sen Rajat","year":"2017","unstructured":"Rajat Sen, Karthikeyan Shanmugam, Alexandros G. Dimakis, and Sanjay Shakkottai. 2017. Identifying best interventions through online importance sampling. In Proceedings of the 34th International Conference on Machine Learning (PMLR, Vol. 70). 3057\u20133066."},{"key":"e_1_3_2_80_2","first-page":"3076","volume-title":"Proceedings of the 34th International Conference on Machine Learning - Volume 70 (ICML\u201917)","author":"Shalit Uri","year":"2017","unstructured":"Uri Shalit, Fredrik D Johansson, and David Sontag. 2017. Estimating individual treatment effect: Generalization bounds and algorithms. In Proceedings of the 34th International Conference on Machine Learning - Volume 70 (ICML\u201917). 3076\u20133085."},{"key":"e_1_3_2_81_2","doi-asserted-by":"publisher","unstructured":"Nihal Sharma Soumya Basu Karthikeyan Shanmugam and Sanjay Shakkottai. 2020. On under-exploration in bandits with mean bounds from confounded data. 
10.48550\/arXiv.2002.08405","DOI":"10.48550\/arXiv.2002.08405"},{"key":"e_1_3_2_82_2","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.2023.4678"},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1145\/3486622.3493926"},{"key":"e_1_3_2_84_2","volume-title":"Causal Contextual Bandits","author":"Subramanian Chandrasekar","year":"2024","unstructured":"Chandrasekar Subramanian. 2024. Causal Contextual Bandits. Ph. D. Dissertation. Indian Institute of Technology Madras, Chennai, India."},{"key":"e_1_3_2_85_2","volume-title":"Proceedings of the 10th International Conference on Learning Representations (ICLR 2022)","author":"Subramanian Chandrasekar","year":"2022","unstructured":"Chandrasekar Subramanian and Balaraman Ravindran. 2022. Causal contextual bandits with targeted interventions. In Proceedings of the 10th International Conference on Learning Representations (ICLR 2022)."},{"key":"e_1_3_2_86_2","volume-title":"Reinforcement Learning: An Introduction (2nd ed.)","author":"Sutton Richard S.","year":"2018","unstructured":"Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction (2nd ed.). MIT Press, Cambridge, MA."},{"key":"e_1_3_2_87_2","volume-title":"Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37","author":"Swaminathan Adith","year":"2015","unstructured":"Adith Swaminathan and Thorsten Joachims. 2015. Counterfactual risk minimization: Learning from logged bandit feedback. In Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37."},{"issue":"297","key":"e_1_3_2_88_2","first-page":"1","article-title":"Causal bandits for linear structural equation models","volume":"24","author":"Varici Burak","year":"2023","unstructured":"Burak Varici, Karthikeyan Shanmugam, Prasanna Sattigeri, and Ali Tajer. 2023. Causal bandits for linear structural equation models. 
Journal of Machine Learning Research 24, 297 (2023), 1\u201359.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_89_2","article-title":"Approximate allocation matching for structural causal bandits with unobserved confounders","volume":"36","author":"Wei Lai","year":"2024","unstructured":"Lai Wei, Muhammad Qasim Elahi, Mahsa Ghasemi, and Murat Kocaoglu. 2024. Approximate allocation matching for structural causal bandits with unobserved confounders. Advances in Neural Information Processing Systems 36 (2024), 68810\u201368832.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_90_2","series-title":"Proceedings of Machine Learning Research","first-page":"5508","volume-title":"Proceedings of the 35th International Conference on Machine Learning, {ICML} 2018","volume":"80","author":"Yabe Akihiro","year":"2018","unstructured":"Akihiro Yabe, Daisuke Hatano, Hanna Sumita, Shinji Ito, Naonori Kakimura, Takuro Fukunaga, and Ken-ichi Kawarabayashi. 2018. Causal bandits with propagating inference. In Proceedings of the 35th International Conference on Machine Learning, {ICML} 2018(Proceedings of Machine Learning Research, Vol. 80). 5508\u20135516."},{"key":"e_1_3_2_91_2","first-page":"4609","volume-title":"Proceedings of the International Conference on Artificial Intelligence and Statistics","author":"Yan Zirui","year":"2024","unstructured":"Zirui Yan, Dennis Wei, Dmitriy A. Katz, Prasanna Sattigeri, and Ali Tajer. 2024. Causal bandits with general causal models and interventions. In Proceedings of the International Conference on Artificial Intelligence and Statistics. PMLR, 4609\u20134617."},{"key":"e_1_3_2_92_2","doi-asserted-by":"publisher","unstructured":"Yan Zeng Ruichu Cai Fuchun Sun Libo Huang and Zhifeng Hao. 2023. A Survey on Causal Reinforcement Learning. 
10.48550\/arXiv.2302.05209","DOI":"10.48550\/arXiv.2302.05209"},{"key":"e_1_3_2_93_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2017\/186"},{"key":"e_1_3_2_94_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i13.17449"},{"key":"e_1_3_2_95_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3462892"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3744917","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,16]],"date-time":"2025-07-16T13:26:38Z","timestamp":1752672398000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3744917"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,11]]},"references-count":94,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2025,12,31]]}},"alternative-id":["10.1145\/3744917"],"URL":"https:\/\/doi.org\/10.1145\/3744917","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"type":"print","value":"0360-0300"},{"type":"electronic","value":"1557-7341"}],"subject":[],"published":{"date-parts":[[2025,7,11]]},"assertion":[{"value":"2023-07-31","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-13","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}