{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,23]],"date-time":"2025-06-23T16:07:04Z","timestamp":1750694824570,"version":"3.41.0"},"reference-count":34,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2025,1,25]],"date-time":"2025-01-25T00:00:00Z","timestamp":1737763200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF","award":["1942123, 2211237, and 2224246"],"award-info":[{"award-number":["1942123, 2211237, and 2224246"]}]},{"name":"Jane Street Graduate Research Fellowship"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. ACM"],"published-print":{"date-parts":[[2025,2,28]]},"abstract":"<jats:p>\n            Ensuring that analyses performed on a dataset are representative of the entire population is one of the central problems in statistics. Most classical techniques assume that the dataset is independent of the analyst\u2019s query and break down in the common setting where a dataset is reused for multiple, adaptively chosen, queries. This problem of\n            <jats:italic>adaptive data analysis<\/jats:italic>\n            was formalized in the seminal works of Dwork et\u00a0al. (STOC 2015) and Hardt and Ullman (FOCS 2014).\n          <\/jats:p>\n          <jats:p>We identify a remarkably simple set of assumptions under which the queries will continue to be representative even when chosen adaptively: the only requirements are that each query takes as input a random subsample and outputs few bits. This result shows that the noise inherent in subsampling is sufficient to guarantee that query responses generalize. The simplicity of this subsampling-based framework allows it to model a variety of real-world scenarios not covered by prior work.<\/jats:p>\n          <jats:p>In addition to its simplicity, we demonstrate the utility of this framework by designing mechanisms for two foundational tasks: statistical queries and median finding. In particular, our mechanism for answering the broadly applicable class of statistical queries is both extremely simple and state of the art in many parameter regimes.<\/jats:p>","DOI":"10.1145\/3698104","type":"journal-article","created":{"date-parts":[[2024,10,1]],"date-time":"2024-10-01T10:50:38Z","timestamp":1727779838000},"page":"1-45","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Subsampling Suffices for Adaptive Data Analysis"],"prefix":"10.1145","volume":"72","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3185-1097","authenticated-orcid":false,"given":"Guy","family":"Blanc","sequence":"first","affiliation":[{"name":"Computer Science, Stanford University, Stanford, United States"}]}],"member":"320","published-online":{"date-parts":[[2025,1,25]]},"reference":[{"key":"e_1_3_4_2_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.spl.2022.109528"},{"key":"e_1_3_4_3_2","article-title":"Privacy amplification by subsampling: Tight analyses via couplings and divergences","volume":"31","author":"Balle Borja","year":"2018","unstructured":"Borja Balle, Gilles Barthe, and Marco Gaboardi. 2018. Privacy amplification by subsampling: Tight analyses via couplings and divergences. Advances in Neural Information Processing Systems 31 (2018), 1\u201311.","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"10","key":"e_1_3_4_4_2","article-title":"Clustering with Bregman divergences.","volume":"6","author":"Banerjee Arindam","year":"2005","unstructured":"Arindam Banerjee, Srujana Merugu, Inderjit S. Dhillon, Joydeep Ghosh, and John Lafferty. 2005. Clustering with Bregman divergences. Journal of Machine Learning Research 6, 10 (2005), 1\u201312.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_4_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/2897518.2897566"},{"key":"e_1_3_4_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3564246.3585226"},{"key":"e_1_3_4_7_2","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1972.10482404"},{"key":"e_1_3_4_8_2","first-page":"625","volume-title":"Proceedings of the Conference on Learning Theory","author":"Dagan Yuval","year":"2022","unstructured":"Yuval Dagan and Gil Kur. 2022. A bounded-noise mechanism for differential privacy. In Proceedings of the Conference on Learning Theory. 625\u2013661."},{"key":"e_1_3_4_9_2","article-title":"Generalization in adaptive data analysis and holdout reuse","volume":"28","author":"Dwork Cynthia","year":"2015","unstructured":"Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toni Pitassi, Omer Reingold, and Aaron Roth. 2015. Generalization in adaptive data analysis and holdout reuse. Advances in Neural Information Processing Systems 28 (2015), 2350\u20132358.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_4_10_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.aaa9375"},{"key":"e_1_3_4_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/2746539.2746580"},{"key":"e_1_3_4_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2010.12"},{"key":"e_1_3_4_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3046674"},{"key":"e_1_3_4_14_2","first-page":"728","volume-title":"Proceedings of the Conference on Learning Theory","author":"Feldman Vitaly","year":"2017","unstructured":"Vitaly Feldman and Thomas Steinke. 2017. Generalization for adaptively-chosen estimators via stable median. In Proceedings of the Conference on Learning Theory. 728\u2013757."},{"key":"e_1_3_4_15_2","first-page":"535","volume-title":"Proceedings of the Conference on Learning Theory","author":"Feldman Vitaly","year":"2018","unstructured":"Vitaly Feldman and Thomas Steinke. 2018. Calibrating noise to variance in adaptive data analysis. In Proceedings of the Conference on Learning Theory. 535\u2013544."},{"key":"e_1_3_4_16_2","first-page":"297","volume-title":"Proceedings of the 31st International Conference on Algorithmic Learning Theory","author":"Fish Benjamin","year":"2020","unstructured":"Benjamin Fish, Lev Reyzin, and Benjamin I. P. Rubinstein. 2020. Sampling without compromising accuracy in adaptive data analysis. In Proceedings of the 31st International Conference on Algorithmic Learning Theory. 297\u2013318."},{"key":"e_1_3_4_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2008.929943"},{"key":"e_1_3_4_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2014.55"},{"key":"e_1_3_4_19_2","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177730196"},{"key":"e_1_3_4_20_2","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1963.10500830"},{"key":"e_1_3_4_21_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-9574.1980.tb00681.x"},{"key":"e_1_3_4_22_2","first-page":"881","volume-title":"Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms","author":"Karp Richard M.","year":"2007","unstructured":"Richard M. Karp and Robert Kleinberg. 2007. Noisy binary search and its applications. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms. 881\u2013890."},{"key":"e_1_3_4_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/293347.293351"},{"key":"e_1_3_4_24_2","volume-title":"Theory of U-statistics","author":"Korolyuk Vladimir S.","year":"2013","unstructured":"Vladimir S. Korolyuk and Yu V. Borovskich. 2013. Theory of U-statistics. Vol. 273. Springer Science & Business Media."},{"key":"e_1_3_4_25_2","first-page":"29","article-title":"\u00dcber den Median der Binomial- and Poissonverteilung","volume":"19","author":"Neumann Peter","year":"1966","unstructured":"Peter Neumann. 1966. \u00dcber den Median der Binomial- and Poissonverteilung. Wissenschaftliche Zeitschrift der Technischen Universit\u00e4t Dresden (in German) 19 (1966), 29\u201333.","journal-title":"Wissenschaftliche Zeitschrift der Technischen Universit\u00e4t Dresden (in German)"},{"key":"e_1_3_4_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2016.59"},{"key":"e_1_3_4_27_2","first-page":"1232","volume-title":"Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, Arthur Gretton and Christian C. Robert (Eds.).","volume":"51","author":"Russo Daniel","year":"2016","unstructured":"Daniel Russo and James Zou. 2016. Controlling bias in adaptive data analysis using information theory. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, Arthur Gretton and Christian C. Robert (Eds.). Proceedings of Machine Learning Research, Vol. 51. PMLR, 1232\u20131240. https:\/\/proceedings.mlr.press\/v51\/russo16.html"},{"key":"e_1_3_4_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00037-003-0175-x"},{"key":"e_1_3_4_29_2","doi-asserted-by":"publisher","DOI":"10.1006\/jagm.2000.1138"},{"key":"e_1_3_4_30_2","article-title":"Information, privacy and stability in adaptive data analysis","author":"Smith Adam","year":"2017","unstructured":"Adam Smith. 2017. Information, privacy and stability in adaptive data analysis. arXiv preprint arXiv:1706.00820 (2017).","journal-title":"arXiv preprint arXiv:1706.00820"},{"key":"e_1_3_4_31_2","article-title":"Composition of differential privacy & privacy amplification by subsampling","author":"Steinke Thomas","year":"2022","unstructured":"Thomas Steinke. 2022. Composition of differential privacy & privacy amplification by subsampling. arXiv preprint arXiv:2210.00597 (2022).","journal-title":"arXiv preprint arXiv:2210.00597"},{"key":"e_1_3_4_32_2","article-title":"Between pure and approximate differential privacy","author":"Steinke Thomas","year":"2015","unstructured":"Thomas Steinke and Jonathan Ullman. 2015. Between pure and approximate differential privacy. arXiv preprint arXiv:1501.06095 (2015).","journal-title":"arXiv preprint arXiv:1501.06095"},{"key":"e_1_3_4_33_2","first-page":"1588","volume-title":"Proceedings of the Conference on Learning Theory","author":"Steinke Thomas","year":"2015","unstructured":"Thomas Steinke and Jonathan Ullman. 2015. Interactive fingerprinting codes and the hardness of preventing false discovery. In Proceedings of the Conference on Learning Theory. 1588\u20131628."},{"key":"e_1_3_4_34_2","article-title":"A note on optimal probability lower bounds for centered random variables","author":"Veraar Mark","year":"2008","unstructured":"Mark Veraar. 2008. A note on optimal probability lower bounds for centered random variables. arXiv preprint arXiv:0803.0727 (2008).","journal-title":"arXiv preprint arXiv:0803.0727"},{"key":"e_1_3_4_35_2","first-page":"7634","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Zhu Yuqing","year":"2019","unstructured":"Yuqing Zhu and Yu-Xiang Wang. 2019. Poission subsampled R\u00e9nyi differential privacy. In Proceedings of the International Conference on Machine Learning. 7634\u20137642."}],"container-title":["Journal of the ACM"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3698104","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3698104","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:10:27Z","timestamp":1750295427000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3698104"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,25]]},"references-count":34,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,2,28]]}},"alternative-id":["10.1145\/3698104"],"URL":"https:\/\/doi.org\/10.1145\/3698104","relation":{},"ISSN":["0004-5411","1557-735X"],"issn-type":[{"type":"print","value":"0004-5411"},{"type":"electronic","value":"1557-735X"}],"subject":[],"published":{"date-parts":[[2025,1,25]]},"assertion":[{"value":"2023-09-20","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-09-17","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-01-25","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}