{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T16:31:22Z","timestamp":1753893082588,"version":"3.41.2"},"reference-count":34,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,12,6]],"date-time":"2024-12-06T00:00:00Z","timestamp":1733443200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:p>We study a contextual bandit setting where the agent has access to causal side information, in addition to the ability to perform multiple targeted experiments corresponding to potentially different context-action pairs\u2014simultaneously in one-shot within a budget. This new formalism provides a natural model for several real-world scenarios where parallel targeted experiments can be conducted and where some domain knowledge of causal relationships is available. We propose a new algorithm that utilizes a novel entropy-like measure that we introduce. We perform several experiments, both using purely synthetic data and using a real-world dataset. In addition, we study sensitivity of our algorithm's performance to various aspects of the problem setting. The results show that our algorithm performs better than baselines in all of the experiments. We also show that the algorithm is sound; that is, as budget increases, the learned policy eventually converges to an optimal policy. Further, we theoretically bound our algorithm's regret under additional assumptions. Finally, we provide ways to achieve two popular notions of fairness, namely counterfactual fairness and demographic parity, with our algorithm.<\/jats:p>","DOI":"10.3389\/frai.2024.1346700","type":"journal-article","created":{"date-parts":[[2024,12,6]],"date-time":"2024-12-06T06:51:50Z","timestamp":1733467910000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Causal contextual bandits with one-shot data integration"],"prefix":"10.3389","volume":"7","author":[{"given":"Chandrasekar","family":"Subramanian","sequence":"first","affiliation":[]},{"given":"Balaraman","family":"Ravindran","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2024,12,6]]},"reference":[{"key":"B1","first-page":"39.1","article-title":"\u201cAnalysis of Thompson sampling for the multi-armed Bandit problem,\u201d","author":"Agrawal","year":"2012"},{"key":"B2","first-page":"249","article-title":"\u201cOffline contextual multi-armed bandits for mobile health interventions: a case study on emotion regulation,\u201d","author":"Ameko","year":"2020"},{"key":"B3","first-page":"1","article-title":"\u201cSurvey on applications of multi-armed and contextual bandits,\u201d","author":"Bouneffouf","year":"2020"},{"key":"B4","doi-asserted-by":"publisher","first-page":"4209","DOI":"10.1038\/s41598-022-07939-1","article-title":"A clarification of the nuances in the fairness metrics landscape","volume":"12","author":"Castelnovo","year":"2022","journal-title":"Sci. Rep"},{"key":"B5","doi-asserted-by":"publisher","first-page":"2419","DOI":"10.1007\/s10994-021-05961-4","article-title":"Challenges of real-world reinforcement learning: definitions, benchmarks and analysis","volume":"110","author":"Dulac-Arnold","year":"2021","journal-title":"Machine Learn"},{"key":"B6","first-page":"214","article-title":"\u201cFairness through awareness,\u201d","author":"Dwork","year":"2012"},{"article-title":"\u201cThe case for process fairness in learning: feature selection for fair decision making,\u201d","year":"2016","author":"Grgi\u0107-Hla\u010da","key":"B7"},{"key":"B8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3397269","article-title":"A survey of learning causality with data: problems and methods","volume":"53","author":"Guo","year":"2020","journal-title":"ACM Comput. Surv"},{"key":"B9","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2004.06321","article-title":"Sequential batch learning in finite-action linear contextual bandits","author":"Han","year":"2020","journal-title":"arXiv [preprint]"},{"article-title":"\u201cDeep learning with logged bandit feedback,\u201d","year":"2018","author":"Joachims","key":"B10"},{"volume-title":"Probabilistic Graphical Models: Principles and Techniques","year":"2009","author":"Koller","key":"B11"},{"key":"B12","first-page":"4069","article-title":"\u201cCounterfactual fairness,\u201d","author":"Kusner","year":"2017","journal-title":"Advances in Neural Information Processing Systems, Vol. 30"},{"key":"B13","first-page":"1189","article-title":"\u201cCausal bandits: learning good interventions via causal inference,\u201d","author":"Lattimore","year":"2016","journal-title":"Advances in Neural Information Processing Systems 29, Vol. 29"},{"key":"B14","doi-asserted-by":"crossref","DOI":"10.1017\/9781108571401","volume-title":"Bandit Algorithms","author":"Lattimore","year":"2020"},{"key":"B15","first-page":"3619","article-title":"\u201cTransferable contextual bandit for cross-domain recommendation,\u201d","author":"Liu","year":"2018"},{"key":"B16","first-page":"141","article-title":"\u201cRegret analysis of bandit problems with causal background knowledge,\u201d","author":"Lu","year":"2020"},{"key":"B17","doi-asserted-by":"publisher","first-page":"96","DOI":"10.1214\/09-SS057","article-title":"Causal inference in statistics: an overview","volume":"3","author":"Pearl","year":"","journal-title":"Stat. Surv"},{"volume-title":"Causality, 2nd Edn","year":"","author":"Pearl","key":"B18"},{"key":"B19","doi-asserted-by":"publisher","first-page":"2002","DOI":"10.1515\/jci-2019-2002","article-title":"On the Interpretation of do(x)","volume":"7","author":"Pearl","year":"2019","journal-title":"J. Causal Infer"},{"key":"B20","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1109\/LCSYS.2020.3047601","article-title":"Batched learning in generalized linear contextual bandits with general decision sets","volume":"6","author":"Ren","year":"2022","journal-title":"IEEE Contr. Syst. Lett"},{"key":"B21","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1707.02038","article-title":"A tutorial on thompson sampling","author":"Russo","year":"2017","journal-title":"arXiv [preprint]"},{"key":"B22","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1810.01859","article-title":"Contextual multi-armed bandits for causal marketing","author":"Sawant","year":"2018","journal-title":"arXiv [preprint]"},{"key":"B23","first-page":"3057","article-title":"\u201cIdentifying best interventions through online importance sampling,\u201d","author":"Sen","year":"2017"},{"key":"B24","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-031-01560-1","volume-title":"Active Learning, 1st Edn","author":"Settles","year":"2012"},{"key":"B25","doi-asserted-by":"publisher","first-page":"341","DOI":"10.1023\/A:1008202821328","article-title":"Differential evolution\u2014a simple and efficient heuristic for global optimization over continuous spaces","volume":"11","author":"Storn","year":"1997","journal-title":"J. Glob. Optimizat"},{"volume-title":"Causal Contextual Bandits","year":"2024","author":"Subramanian","key":"B26"},{"article-title":"\u201cCausal contextual bandits with targeted interventions,\u201d","year":"2022","author":"Subramanian","key":"B27"},{"key":"B28","doi-asserted-by":"publisher","first-page":"1731","DOI":"10.5555\/2789272.2886805","article-title":"Batch learning from logged bandit feedback through counterfactual risk minimization","volume":"16","author":"Swaminathan","year":"","journal-title":"J. Machine Learn. Res"},{"key":"B29","first-page":"814","article-title":"\u201cCounterfactual risk minimization: learning from logged bandit feedback,\u201d","author":"Swaminathan","year":""},{"key":"B30","first-page":"433","article-title":"\u201cAlgorithms with logarithmic or sublinear regret for constrained contextual bandits,\u201d","author":"Wu","year":"2015"},{"key":"B31","first-page":"5512","article-title":"\u201cCausal bandits with propagating inference,\u201d","author":"Yabe","year":"2018"},{"key":"B32","first-page":"1340","article-title":"\u201cTransfer learning in multi-armed bandits: a causal approach,\u201d","author":"Zhang","year":"2017"},{"key":"B33","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2110.08057","article-title":"Almost optimal batch-regret tradeoff for batch linear contextual bandits","author":"Zhang","year":"2022","journal-title":"arXiv [preprint]"},{"key":"B34","first-page":"1238","article-title":"\u201cCounterfactual fairness with partially known causal graph,\u201d","author":"Zuo","year":"2022","journal-title":"Advances in Neural Information Processing Systems, Vol. 35"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2024.1346700\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,6]],"date-time":"2024-12-06T06:51:57Z","timestamp":1733467917000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2024.1346700\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,6]]},"references-count":34,"alternative-id":["10.3389\/frai.2024.1346700"],"URL":"https:\/\/doi.org\/10.3389\/frai.2024.1346700","relation":{},"ISSN":["2624-8212"],"issn-type":[{"type":"electronic","value":"2624-8212"}],"subject":[],"published":{"date-parts":[[2024,12,6]]},"article-number":"1346700"}}