{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,8]],"date-time":"2025-09-08T06:46:18Z","timestamp":1757313978061,"version":"3.41.0"},"reference-count":24,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,6,1]],"date-time":"2023-06-01T00:00:00Z","timestamp":1685577600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGIR Forum"],"published-print":{"date-parts":[[2023,6]]},"abstract":"<jats:p>\n            Online experiments such as Randomised Controlled Trials (RCTs) or A\/B-tests are the bread and butter of modern platforms on the web. They are conducted continuously to allow platforms to estimate the causal effect of replacing system variant \"A\" with variant \"B\", on some metric of interest. These\n            <jats:italic>variants<\/jats:italic>\n            can differ in many aspects. In this paper, we focus on the common use-case where they correspond to machine learning models. The online experiment then serves as the final arbiter to decide which model is superior, and should thus be shipped. The statistical literature on causal effect estimation from RCTs has a substantial history, which contributes deservedly to the level of trust researchers and practitioners have in this \"gold standard\" of evaluation practices. Nevertheless, in the particular case of machine learning experiments, we remark that certain critical issues remain. Specifically, the assumptions that are required to ascertain that A\/B-tests yield unbiased estimates of the causal effect, are seldom met in practical applications. We argue that, because variants typically learn using pooled data, a lack of\n            <jats:italic>model interference<\/jats:italic>\n            cannot be guaranteed. 
This undermines the conclusions we can draw from online experiments with machine learning models. We discuss the implications this has for practitioners, and for the research literature.\n          <\/jats:p>","DOI":"10.1145\/3636341.3636358","type":"journal-article","created":{"date-parts":[[2023,12,4]],"date-time":"2023-12-04T17:07:47Z","timestamp":1701709667000},"page":"1-9","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["A Common Misassumption in Online Experiments with Machine Learning Models"],"prefix":"10.1145","volume":"57","author":[{"given":"Olivier","family":"Jeunen","sequence":"first","affiliation":[{"name":"ShareChat, Edinburgh, UK"}]}],"member":"320","published-online":{"date-parts":[[2023,12,4]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Multiple randomization designs","author":"Bajari Patrick","year":"2021","unstructured":"Patrick Bajari, Brian Burdick, Guido W. Imbens, Lorenzo Masoero, James McQueen, Thomas Richardson, and Ido M. Rosen. Multiple randomization designs, 2021. URL https:\/\/arxiv.org\/abs\/2112.13495."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2532508.2532511"},{"key":"e_1_2_1_3_1","volume-title":"Advances in Neural Information Processing Systems","author":"Chapelle Olivier","year":"2011","unstructured":"Olivier Chapelle and Lihong Li.
An empirical evaluation of thompson sampling. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 24. Curran Associates, Inc., 2011. URL https:\/\/proceedings.neurips.cc\/paper\/2011\/file\/e53a0a2978c28872a4505bdb51db06dc-Paper.pdf."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3582900.3582905"},{"key":"e_1_2_1_5_1","volume-title":"Workshop on Perspectives on Offline Evaluation for Recommender Systems at RecSys '21, PERSPECTIVES '21","author":"Diaz Fernando","year":"2021","unstructured":"Fernando Diaz. On evaluating session-based recommendation with implicit feedback. In Workshop on Perspectives on Offline Evaluation for Recommender Systems at RecSys '21, PERSPECTIVES '21, 2021."},{"key":"e_1_2_1_6_1","volume-title":"Statistical methods for research workers","author":"Fisher Ronald Aylmer","year":"1925","unstructured":"Ronald Aylmer Fisher. Statistical methods for research workers.
Springer, 1925."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1136\/bmj.1.3923.554-a"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2645710.2645745"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3159652.3159687"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3331651.3331655"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781139025751"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3298689.3347069"},{"key":"e_1_2_1_14_1","volume-title":"Workshop on Offline Evaluation for Recommender Systems at RecSys '18, REVEAL '18","author":"Jeunen Olivier","year":"2018","unstructured":"Olivier Jeunen, Koen Verstrepen, and Bart Goethals. Fair offline evaluation methodologies for implicit-feedback recommender systems with mnar data. In Workshop on Offline Evaluation for Recommender Systems at RecSys '18, REVEAL '18, 2018."},{"key":"e_1_2_1_15_1","first-page":"592","volume-title":"Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics","volume":"22","author":"Kaufmann Emilie","year":"2012","unstructured":"Emilie Kaufmann, Olivier Cappe, and Aurelien Garivier. On bayesian upper confidence bounds for bandit problems. In Neil D.
Lawrence and Mark Girolami, editors, Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, volume 22 of ICML '12, pages 592--600. PMLR, 21--23 Apr 2012. URL https:\/\/proceedings.mlr.press\/v22\/kaufmann12.html."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1017\/9781108653985"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539160"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467193"},{"key":"e_1_2_1_19_1","volume-title":"Deep learning: A critical appraisal","author":"Marcus Gary","year":"2018","unstructured":"Gary Marcus. Deep learning: A critical appraisal, 2018. URL https:\/\/arxiv.org\/abs\/1801.00631."},{"key":"e_1_2_1_20_1","first-page":"73","article-title":"On small differences in sensation","volume":"3","author":"Peirce Charles Sanders","year":"1884","unstructured":"Charles Sanders Peirce and Joseph Jastrow. On small differences in sensation. Memoirs of the National Academy of Sciences, 3:73--83, 1884.","journal-title":"Memoirs of the National Academy of Sciences"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1090\/S0002-9904-1952-09620-8"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2959100.2959176"},{"key":"e_1_2_1_23_1","volume-title":"Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of educational Psychology, 66(5):688","author":"Rubin Donald B","year":"1974","unstructured":"Donald B Rubin.
Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of educational Psychology, 66(5):688, 1974."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3542601"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3340631.3398666"}],"container-title":["ACM SIGIR Forum"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3636341.3636358","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3636341.3636358","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:35:41Z","timestamp":1750178141000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3636341.3636358"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6]]},"references-count":24,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,6]]}},"alternative-id":["10.1145\/3636341.3636358"],"URL":"https:\/\/doi.org\/10.1145\/3636341.3636358","relation":{},"ISSN":["0163-5840"],"issn-type":[{"type":"print","value":"0163-5840"}],"subject":[],"published":{"date-parts":[[2023,6]]},"assertion":[{"value":"2023-12-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}