{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T23:40:02Z","timestamp":1755906002828,"version":"3.44.0"},"reference-count":70,"publisher":"Association for Computing Machinery (ACM)","issue":"2","funder":[{"name":"Danmarks Frie Forskningsr\u00e5d","award":["DFF-8021-002498"],"award-info":[{"award-number":["DFF-8021-002498"]}]},{"DOI":"10.13039\/100008398","name":"Villum Foundation","doi-asserted-by":"crossref","award":["VIL51463"],"award-info":[{"award-number":["VIL51463"]}],"id":[{"id":"10.13039\/100008398","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2025,6,9]]},"abstract":"<jats:p>\n            Differential privacy is the gold standard for privacy in data analysis. In many data analysis applications, the data is a database of\n            <jats:italic toggle=\"yes\">documents.<\/jats:italic>\n            For databases consisting of many documents, one of the most fundamental problems is that of pattern matching and computing (i) how often a pattern appears as a substring in the database (\n            <jats:italic toggle=\"yes\">substring counting<\/jats:italic>\n            ) and (ii) how many documents in the collection contain the pattern as a substring (\n            <jats:italic toggle=\"yes\">document counting<\/jats:italic>\n            ). In this paper, we initiate the theoretical study of substring and document counting under differential privacy. We give an \u03b5-differentially private data structure solving this problem for all patterns simultaneously with a maximum additive error of O(\ud835\udcc1 \u2022 polylog(n\ud835\udcc1|\u03a3|)), where \ud835\udcc1 is the maximum length of a document in the database,\n            <jats:italic toggle=\"yes\">n<\/jats:italic>\n            is the number of documents, and |\u03a3| is the size of the alphabet. We show that this is optimal up to a O(polylog(\ud835\udcc3\ud835\udcc1)) factor. Further, we show that for (\u03b5,\u03b4)-differential privacy, the bound for document counting can be improved to O(\u221a\ud835\udcc1 \u2022 polylog(\ud835\udcc3\ud835\udcc1|\u03a3|)). Additionally, our data structures are efficient. In particular, our data structures use O(\ud835\udcc3\ud835\udcc1\n            <jats:sup>2<\/jats:sup>\n            ) space, O(\ud835\udcc3\n            <jats:sup>2<\/jats:sup>\n            \ud835\udcc1\n            <jats:sup>4<\/jats:sup>\n            ) preprocessing time, and O(|P|) query time where\n            <jats:italic toggle=\"yes\">P<\/jats:italic>\n            is the query pattern. Along the way, we develop a new technique for differentially privately computing a general class of counting functions on trees of independent interest. Our data structures immediately lead to improved algorithms for related problems, such as privately mining frequent substrings and\n            <jats:italic toggle=\"yes\">q<\/jats:italic>\n            -grams. For\n            <jats:italic toggle=\"yes\">q<\/jats:italic>\n            -grams, we further improve the preprocessing time of the data structure.\n          <\/jats:p>","DOI":"10.1145\/3725232","type":"journal-article","created":{"date-parts":[[2025,6,9]],"date-time":"2025-06-09T15:20:31Z","timestamp":1749482431000},"page":"1-27","source":"Crossref","is-referenced-by-count":0,"title":["Differentially Private Substring and Document Counting"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6647-088X","authenticated-orcid":false,"given":"Giulia","family":"Bernardini","sequence":"first","affiliation":[{"name":"University of Milan, Milan, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1120-5154","authenticated-orcid":false,"given":"Philip","family":"Bille","sequence":"additional","affiliation":[{"name":"Technical University of Denmark, Kgs. Lyngby, Denmark"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8322-4952","authenticated-orcid":false,"given":"Inge Li","family":"G\u00f8rtz","sequence":"additional","affiliation":[{"name":"Technical University of Denmark, Kgs. Lyngby, Denmark"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1078-4075","authenticated-orcid":false,"given":"Teresa Anna","family":"Steiner","sequence":"additional","affiliation":[{"name":"University of Southern Denmark, Odense, Denmark"}]}],"member":"320","published-online":{"date-parts":[[2025,6,9]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Abowd et al","author":"John","year":"2020","unstructured":"John M. Abowd et al., 2020. The modernization of statistical disclosure limitation at the US Census Bureau. Census Working Papers, (2020)."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313276.3316312"},{"key":"e_1_2_1_3_1","volume-title":"Apple Machine Learning","author":"Privacy Team Apple Differential","year":"2017","unstructured":"Apple Differential Privacy Team. 2017. Learning with privacy at scale. Apple Machine Learning, (2017)."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.4086\/TOC.2016.V012A001"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.4230\/LIPICS.CPM.2021.8"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/10719839_9"},{"key":"e_1_2_1_7_1","volume-title":"Inge Li G\u00f8rtz, and Teresa Anna Steiner","author":"Bernardini Giulia","year":"2024","unstructured":"Giulia Bernardini, Philip Bille, Inge Li G\u00f8rtz, and Teresa Anna Steiner. 2024. Differentially Private Substring and Document Counting. arxiv:2412.13813 [cs.DS] https:\/\/arxiv.org\/abs\/2412.13813"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/S00453-017-0380--7"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00224-013-9498-4"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065167.1065184"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2505553"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03367-4_9"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3188745.3188946"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2015.45"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/SATML59370.2024.00032"},{"key":"e_1_2_1_16_1","article-title":"Private and Continual Release of Statistics","volume":"14","author":"Hubert Chan T.-H.","year":"2011","unstructured":"T.-H. Hubert Chan, Elaine Shi, and Dawn Song. 2011. Private and Continual Release of Statistics. ACM Trans. Inf. Syst. Secur., Vol. 14, 3 (2011), 26:1-26:24.","journal-title":"ACM Trans. Inf. Syst. Secur."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM51629.2021.00014"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2382196.2382263"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2339530.2339564"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/J.COSE.2014.12.005"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/J.COSE.2015.08.005"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3564246.3585148"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-68786-5_23"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/773153.773173"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1536414.1536466"},{"key":"e_1_2_1_26_1","volume-title":"Smith","author":"Dwork Cynthia","year":"2006","unstructured":"Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In Proc. 3rd TCC, Vol. 3876. Springer, 265-284."},{"key":"e_1_2_1_27_1","volume-title":"Rothblum","author":"Dwork Cynthia","year":"2010","unstructured":"Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N. Rothblum. 2010. Differential privacy under continual observation. In Proc. 42nd STOC. 715-724."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/SFCS.1997.646102"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/355541.355547"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1082036.1082039"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1240233.1240243"},{"key":"e_1_2_1_32_1","volume-title":"Proc. 40th ICML. 10072-10092","author":"Fichtenberger Hendrik","year":"2023","unstructured":"Hendrik Fichtenberger, Monika Henzinger, and Jalaj Upadhyay. 2023. Constant Matters: Fine-grained Error Bound on Differentially Private Continual Observation. In Proc. 40th ICML. 10072-10092. https:\/\/proceedings.mlr.press\/v202\/fichtenberger23a.html"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-21458-5_18"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2012.08.004"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3375890"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-38906-1_30"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","unstructured":"Younan Gao and Meng He. 2021. Space Efficient Two-Dimensional Orthogonal Colored Range Counting. In Proc. 29th ESA . 46:1-46:17. https:\/\/doi.org\/10.4230\/LIPICS.ESA.2021.46","DOI":"10.4230\/LIPICS.ESA.2021.46"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/978--3--662--44777--2_38"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.4230\/LIPICS.ICALP.2023.66"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","unstructured":"Badih Ghazi Ravi Kumar Jelani Nelson and Pasin Manurangsi. 2023b. Private Counting of Distinct and k-Occurring Items in Time Windows. In Proc. 14th ITCS . 55:1-55:24. https:\/\/doi.org\/10.4230\/LIPIcs.ITCS.2023.55","DOI":"10.4230\/LIPIcs.ITCS.2023.55"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1006\/jagm.1995.1038"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/1806689.1806786"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1137\/0213024"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3187009.3177733"},{"key":"e_1_2_1_45_1","volume-title":"Proc. 33rd COLT,. 2263-2285","author":"Kaplan Haim","year":"2020","unstructured":"Haim Kaplan, Katrina Ligett, Yishay Mansour, Moni Naor, and Uri Stemmer. 2020. Privately Learning Thresholds: Closing the Exponential Gap. In Proc. 33rd COLT,. 2263-2285. http:\/\/proceedings.mlr.press\/v125\/kaplan20a.html"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1137\/070684483"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/978--3-030--37228--6_28"},{"key":"e_1_2_1_48_1","volume-title":"Proc. 34th NeurIPS,. 5102-5111","author":"Kim Kunho","year":"2021","unstructured":"Kunho Kim, Sivakanth Gopi, Janardhan Kulkarni, and Sergey Yekhanin. 2021. Differentially Private n-gram Extraction. In Proc. 34th NeurIPS,. 5102-5111. https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/28ce9bc954876829eeb56ff46da8e1ab-Abstract.html"},{"key":"e_1_2_1_49_1","volume-title":"Proc. 12th CPM. 181-192","author":"Landau Gad M","year":"2001","unstructured":"Gad M Landau, Toru Kasai, Gunho Lee, Hiroki Arimura, Setsuo Arikawa, and Kunsoo Park. 2001. Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications. In Proc. 12th CPM. 181-192."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611973105.20"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","unstructured":"Hieu Hanh Le Muneo Kushima Kenji Araki and Haruo Yokota. 2019. Differentially private sequential pattern mining considering time interval for electronic medical record systems. In Proc. 23rd IDEAS. 13:1-13:9. https:\/\/doi.org\/10.1145\/3331076.3331098","DOI":"10.1145\/3331076.3331098"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-91458-9_6"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-023--43030-z"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1089\/cmb.2009.0169"},{"key":"e_1_2_1_55_1","first-page":"657","volume-title":"Proc. 13th SODA","volume":"2","author":"Muthukrishnan S.","year":"2002","unstructured":"S. Muthukrishnan. 2002. Efficient algorithms for document retrieval problems. In Proc. 13th SODA, Vol. 2. 657-666."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/2535933"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.3390\/app12042131"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jda.2006.03.011"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1016\/0022-0000(83)90006-5"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","unstructured":"Teresa Anna Steiner. 2024. Differentially Private Approximate Pattern Matching. In Proc. 15th ITCS . 94:1-94:18. https:\/\/doi.org\/10.4230\/LIPICS.ITCS.2024.94","DOI":"10.4230\/LIPICS.ITCS.2024.94"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1007\/978--3--319--57048--8_7"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2018.00079"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1016\/J.JKSUCI.2022.04.013"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2018.00035"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2839752"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2015.7113354"},{"key":"e_1_2_1_67_1","unstructured":"Timothy Yang Galen Andrew Hubert Eichner Haicheng Sun Wei Li Nicholas Kong Daniel Ramage and Fran\u00e7oise Beaufays. 2018. Applied federated learning: Improving google keyboard query suggestions. arXiv preprint arXiv:1812.02903 (2018)."},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.14778\/2428536.2428539"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2882928"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-95930-6_42"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3725232","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T23:18:49Z","timestamp":1755904729000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3725232"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,9]]},"references-count":70,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6,9]]}},"alternative-id":["10.1145\/3725232"],"URL":"https:\/\/doi.org\/10.1145\/3725232","relation":{},"ISSN":["2836-6573"],"issn-type":[{"type":"electronic","value":"2836-6573"}],"subject":[],"published":{"date-parts":[[2025,6,9]]}}}