{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T07:16:12Z","timestamp":1773299772622,"version":"3.50.1"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,8]]},"abstract":"<jats:p>\n            We design, to the best of our knowledge, the first differentially private (DP) stream aggregation processing system at scale. Our system -\n            <jats:italic>Differential Privacy SQL Pipelines (DP-SQLP)<\/jats:italic>\n            - is built using a streaming framework similar to Spark streaming, and is built on top of the Spanner database and the F1 query engine from Google.\n          <\/jats:p>\n          <jats:p>Towards designing DP-SQLP we make both algorithmic and systemic advances, namely, we (i) design a novel (user-level) DP key selection algorithm that can operate on an unbounded set of possible keys, and can scale to one billion keys that users have contributed, (ii) design a preemptive execution scheme for DP key selection that avoids enumerating all the keys at each triggering time, and (iii) use algorithmic techniques from DP continual observation to release a continual DP histogram of user contributions to different keys over the stream length. We empirically demonstrate the efficacy by obtaining at least 16\u00d7 reduction in error over meaningful baselines we consider. We implemented a streaming differentially private user impressions for Google Shopping with DP-SQLP. The streaming DP algorithms are further applied to Google Trends.<\/jats:p>","DOI":"10.14778\/3685800.3685833","type":"journal-article","created":{"date-parts":[[2024,11,8]],"date-time":"2024-11-08T17:25:21Z","timestamp":1731086721000},"page":"4145-4158","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Differentially Private Stream Processing at Scale"],"prefix":"10.14778","volume":"17","author":[{"given":"Bing","family":"Zhang","sequence":"first","affiliation":[{"name":"Google"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vadym","family":"Doroshenko","sequence":"additional","affiliation":[{"name":"Google"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peter","family":"Kairouz","sequence":"additional","affiliation":[{"name":"Google Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thomas","family":"Steinke","sequence":"additional","affiliation":[{"name":"Google DeepMind"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Abhradeep","family":"Thakurta","sequence":"additional","affiliation":[{"name":"Google DeepMind"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ziyin","family":"Ma","sequence":"additional","affiliation":[{"name":"Google"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Eidan","family":"Cohen","sequence":"additional","affiliation":[{"name":"Google"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Himani","family":"Apte","sequence":"additional","affiliation":[{"name":"Google"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jodi","family":"Spacek","sequence":"additional","affiliation":[{"name":"Google"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,11,8]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"International Conference on Machine Learning. PMLR, 32--40","author":"Agarwal Naman","year":"2017","unstructured":"Naman Agarwal and Karan Singh. 2017. The price of differential privacy for online learning. In International Conference on Machine Learning. PMLR, 32--40."},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Tyler Akidau Robert Bradshaw Craig Chambers Slava Chernyak Rafael J Fern\u00e1ndez-Moctezuma Reuven Lax Sam McVeety Daniel Mills Frances Perry Eric Schmidt et al. 2015. The dataflow model: a practical approach to balancing correctness latency and cost in massive-scale unbounded out-of-order data processing. (2015).","DOI":"10.14778\/2824032.2824076"},{"key":"e_1_2_1_3_1","volume-title":"Plume: Differential Privacy at Scale. arXiv preprint arXiv:2201.11603","author":"Amin Kareem","year":"2022","unstructured":"Kareem Amin, Jennifer Gillenwater, Matthew Joseph, Alex Kulesza, and Sergei Vassilvitskii. 2022. Plume: Differential Privacy at Scale. arXiv preprint arXiv:2201.11603 (2022)."},{"key":"e_1_2_1_4_1","volume-title":"International Conference on Machine Learning. PMLR, 263--271","author":"Amin Kareem","year":"2019","unstructured":"Kareem Amin, Alex Kulesza, Andres Munoz, and Sergei Vassilvtiskii. 2019. Bounding user contributions: A bias-variance trade-off in differential privacy. In International Conference on Machine Learning. PMLR, 263--271."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2508859.2516735"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-53641-4_24"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2011.40"},{"key":"e_1_2_1_8_1","first-page":"28","article-title":"Apache Flink\u2122: Stream and Batch Processing in a Single Engine","volume":"38","author":"Carbone Paris","year":"2015","unstructured":"Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. 2015. Apache Flink\u2122: Stream and Batch Processing in a Single Engine. IEEE Data Eng. Bull. 38, 4 (2015), 28--38. http:\/\/sites.computer.org\/debull\/A15dec\/p28.pdf","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_9_1","volume-title":"International Conference on Artificial Intelligence and Statistics. PMLR, 2397--2419","author":"Cardoso Adrian Rivera","year":"2022","unstructured":"Adrian Rivera Cardoso and Ryan Rogers. 2022. Differentially private histograms under continual observation: Streaming selection into the unknown. In International Conference on Artificial Intelligence and Statistics. PMLR, 2397--2419."},{"key":"e_1_2_1_10_1","first-page":"3","article-title":"Private and Continual Release of Statistics","volume":"14","author":"Hubert Chan T.-H.","year":"2011","unstructured":"T.-H. Hubert Chan, Elaine Shi, and Dawn Song. 2011. Private and Continual Release of Statistics. ACM Trans. on Information Systems Security 14, 3 (Nov. 2011), 26:1--26:24.","journal-title":"ACM Trans. on Information Systems Security"},{"key":"e_1_2_1_11_1","volume-title":"Contact tracing mobile apps for COVID-19: Privacy considerations and related trade-offs. arXiv preprint arXiv:2003.11511","author":"Cho Hyunghoon","year":"2020","unstructured":"Hyunghoon Cho, Daphne Ippolito, and Yun William Yu. 2020. Contact tracing mobile apps for COVID-19: Privacy considerations and related trade-offs. arXiv preprint arXiv:2003.11511 (2020)."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2491245"},{"key":"e_1_2_1_13_1","volume-title":"Improved differential privacy for sgd via optimal private linear operators on adaptive streams. arXiv preprint arXiv:2202.08312","author":"Denisov Sergey","year":"2022","unstructured":"Sergey Denisov, Brendan McMahan, Keith Rush, Adam Smith, and Abhradeep Thakurta. 2022. Improved differential privacy for sgd via optimal private linear operators on adaptive streams. arXiv preprint arXiv:2202.08312 (2022)."},{"key":"e_1_2_1_14_1","unstructured":"Damien Desfontaines. 2024. A list of real-world uses of differential privacy. https:\/\/desfontain.es\/blog\/real-world-differential-privacy.html"},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Cynthia Dwork Krishnaram Kenthapadi Frank McSherry Ilya Mironov and Moni Naor. 2006. Our data ourselves: Privacy via distributed noise generation. In Advances in Cryptology---EUROCRYPT. 486--503.","DOI":"10.1007\/11761679_29"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/11681878_14"},{"key":"e_1_2_1_17_1","volume-title":"Proc. of the Forty-Second ACM Symp. on Theory of Computing (STOC'10)","author":"Dwork Cynthia","unstructured":"Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N. Rothblum. 2010. Differential Privacy Under Continual Observation. In Proc. of the Forty-Second ACM Symp. on Theory of Computing (STOC'10). 715--724."},{"key":"e_1_2_1_18_1","unstructured":"Cynthia Dwork Moni Naor Toniann Pitassi Guy N Rothblum and Sergey Yekhanin. 2010. Pan-Private Streaming Algorithms.. In ics. 66--80."},{"key":"e_1_2_1_19_1","first-page":"3","article-title":"The algorithmic foundations of differential privacy","volume":"9","author":"Dwork Cynthia","year":"2014","unstructured":"Cynthia Dwork and Aaron Roth. 2014. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9, 3--4 (2014), 211--407.","journal-title":"Foundations and Trends in Theoretical Computer Science"},{"key":"e_1_2_1_20_1","volume-title":"Concentrated differential privacy. arXiv preprint arXiv: 1603.01887","author":"Dwork Cynthia","year":"2016","unstructured":"Cynthia Dwork and Guy N Rothblum. 2016. Concentrated differential privacy. arXiv preprint arXiv: 1603.01887 (2016)."},{"key":"e_1_2_1_21_1","volume-title":"Martin P\u00e1l, and Miroslava Sotakova.","author":"Epasto Alessandro","year":"2023","unstructured":"Alessandro Epasto, Kevin Graney, Jieming Mao, Andres Munoz Medina, Martin P\u00e1l, and Miroslava Sotakova. 2023. Differentially Private Algorithms for k-Anonymity Server. https:\/\/github.com\/WICG\/turtledove\/blob\/main\/DP_kanon_server.pdf"},{"key":"e_1_2_1_22_1","volume-title":"Vahab Mirrokni, Sergei Vassilvitskii, and Peilin Zhong.","author":"Epasto Alessandro","year":"2023","unstructured":"Alessandro Epasto, Jieming Mao, Andres Munoz Medina, Vahab Mirrokni, Sergei Vassilvitskii, and Peilin Zhong. 2023. Differentially private continual releases of streaming frequency moment estimations. arXiv preprint arXiv:2301.05605 (2023)."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2016.2601103"},{"key":"e_1_2_1_24_1","volume-title":"A survey on the evolution of stream processing systems. arXiv preprint arXiv:2008.00842","author":"Fragkoulis Marios","year":"2020","unstructured":"Marios Fragkoulis, Paris Carbone, Vasiliki Kalavri, and Asterios Katsifodimos. 2020. A survey on the evolution of stream processing systems. arXiv preprint arXiv:2008.00842 (2020)."},{"key":"e_1_2_1_25_1","volume-title":"Constant matters: Finegrained Complexity of Differentially Private Continual Observation Using Completely Bounded Norms. arXiv preprint arXiv:2202.11205","author":"Henzinger Monika","year":"2022","unstructured":"Monika Henzinger and Jalaj Upadhyay. 2022. Constant matters: Finegrained Complexity of Differentially Private Continual Observation Using Completely Bounded Norms. arXiv preprint arXiv:2202.11205 (2022)."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611977554.ch183"},{"key":"e_1_2_1_27_1","volume-title":"Efficient use of differentially private binary trees. Theory and Practice of Differential Privacy (TPDP","author":"Honaker James","year":"2015","unstructured":"James Honaker. 2015. Efficient use of differentially private binary trees. Theory and Practice of Differential Privacy (TPDP 2015), London, UK (2015)."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2946884"},{"key":"e_1_2_1_29_1","volume-title":"Proc. of the 25th Annual Conf. on Learning Theory (COLT)","volume":"23","author":"Jain Prateek","year":"2012","unstructured":"Prateek Jain, Pravesh Kothari, and Abhradeep Thakurta. 2012. Differentially Private Online Learning. In Proc. of the 25th Annual Conf. on Learning Theory (COLT), Vol. 23. 24.1--24.34."},{"key":"e_1_2_1_30_1","volume-title":"International Conference on Machine Learning. PMLR, 5213--5225","author":"Kairouz Peter","year":"2021","unstructured":"Peter Kairouz, Brendan McMahan, Shuang Song, Om Thakkar, Abhradeep Thakurta, and Zheng Xu. 2021. Practical and private (deep) learning without sampling or shuffling. In International Conference on Machine Learning. PMLR, 5213--5225."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2017.2685505"},{"key":"e_1_2_1_32_1","volume-title":"A Bias-Variance-Privacy Trilemma for Statistical Estimation. arXiv preprint arXiv:2301.13334","author":"Kamath Gautam","year":"2023","unstructured":"Gautam Kamath, Argyris Mouzakis, Matthew Regehr, Vikrant Singhal, Thomas Steinke, and Jonathan Ullman. 2023. A Bias-Variance-Privacy Trilemma for Statistical Estimation. arXiv preprint arXiv:2301.13334 (2023)."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1526709.1526733"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME46990.2020.00034"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-015-0398-x"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2018.2799820"},{"key":"e_1_2_1_37_1","volume-title":"Wennan Zhu, Peter Kairouz, and Marco Gruteser.","author":"Liu Yuhan","year":"2022","unstructured":"Yuhan Liu, Ananda Theertha Suresh, Wennan Zhu, Peter Kairouz, and Marco Gruteser. 2022. Histogram Estimation under User-level Privacy with Heterogeneous Data. arXiv preprint arXiv:2206.03008 (2022)."},{"key":"e_1_2_1_38_1","volume-title":"Factorization norms and hereditary discrepancy. arXiv preprint arXiv:1408.1376","author":"Matousek Jiri","year":"2014","unstructured":"Jiri Matousek, Aleksandar Nikolov, and Kunal Talwar. 2014. Factorization norms and hereditary discrepancy. arXiv preprint arXiv:1408.1376 (2014)."},{"key":"e_1_2_1_39_1","unstructured":"H Brendan McMahan Krishna Pillutla Thomas Steinke Abhradeep Thakurta et al. 2024. Efficient and near-optimal noise generation for streaming differential privacy. arXiv preprint arXiv:2404.16706 (2024)."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/CSF.2017.11"},{"key":"e_1_2_1_41_1","volume-title":"Hassan Jameel Asghar, and Dali Kaafar","author":"Perrier Victor","year":"2018","unstructured":"Victor Perrier, Hassan Jameel Asghar, and Dali Kaafar. 2018. Private continual release of real-valued data streams. arXiv preprint arXiv:1811.03197 (2018)."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/1639714.1639794"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.14778\/3229863.3229871"},{"key":"e_1_2_1_44_1","unstructured":"Adam Smith and Abhradeep Thakurta. 2013. (Nearly) optimal algorithms for private online learning in full-information and bandit settings. In Advances in Neural Information Processing Systems. 2733--2741."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/1055558.1055596"},{"key":"e_1_2_1_46_1","volume-title":"Zenodo","author":"Syed Shahbaz","year":"2017","unstructured":"Shahbaz Syed, Michael Voelske, Martin Potthast, and Benno Stein. 2017. Webis-tldr-17 corpus. Zenodo, November (2017)."},{"key":"e_1_2_1_47_1","volume-title":"Xiaofan Cheng, Fangpeng Ming, Liang Zhao, and Xiaokang Zhou.","author":"Tan Liang","year":"2021","unstructured":"Liang Tan, Keping Yu, Ali Kashif Bashir, Xiaofan Cheng, Fangpeng Ming, Liang Zhao, and Xiaokang Zhou. 2021. Toward real-time and efficient cardiovascular monitoring for COVID-19 patients by 5G-enabled wearable medical devices: a deep learning approach. Neural Computing and Applications (2021), 1--14."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-018-0514-9"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-57048-8_7"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2016.0169"},{"key":"e_1_2_1_51_1","volume-title":"William Lam, Damien Desfontaines, Daniel Simmons-Marengo, and Bryant Gipson.","author":"Wilson Royce J","year":"2019","unstructured":"Royce J Wilson, Celia Yuxin Zhang, William Lam, Damien Desfontaines, Daniel Simmons-Marengo, and Bryant Gipson. 2019. Differentially private SQL with bounded user contribution. arXiv preprint arXiv:1909.01917 (2019)."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522737"},{"key":"e_1_2_1_53_1","unstructured":"\u00dalfar Erlingsson Ilya Mironov Ananth Raghunathan and Shuang Song. 2019. That which we call private. arXiv:1908.03566 [cs.LG]"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3685800.3685833","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,31]],"date-time":"2024-12-31T05:31:19Z","timestamp":1735623079000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3685800.3685833"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8]]},"references-count":53,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,8]]}},"alternative-id":["10.14778\/3685800.3685833"],"URL":"https:\/\/doi.org\/10.14778\/3685800.3685833","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,8]]},"assertion":[{"value":"2024-11-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}