{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T18:45:50Z","timestamp":1776969950622,"version":"3.51.4"},"reference-count":29,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGMOD Rec."],"published-print":{"date-parts":[[2026,4,23]]},"abstract":"<jats:p>For databases consisting of many text documents, one of the most fundamental data analysis tasks is counting (i) how often a pattern appears as a substring in the database (substring counting) and (ii) how many documents in the collection contain the pattern as a substring (document counting). If such a database contains sensitive data, it is crucial to protect the privacy of individuals in the database. Differential privacy is the gold standard for privacy in data analysis. It gives rigorous privacy guarantees, but comes at the cost of yielding less accurate results.<\/jats:p>\n                  <jats:p>In this paper, we study the problem of substring and document counting under differential privacy. We give the first differentially private data structures for these problems and provide bounds on their additive error. For \u03b5-differential privacy, we show that the error of our data structure is optimal up to a poly-logarithmic factor in the number of documents and length of the longest document. Our data structures immediately lead to improved algorithms for related problems, such as privately mining frequent substrings and q-grams.<\/jats:p>","DOI":"10.1145\/3810900.3810908","type":"journal-article","created":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T18:16:38Z","timestamp":1776968198000},"page":"41-49","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["A Differentially Private Data Structure for Substring and Document Counting"],"prefix":"10.1145","volume":"55","author":[{"given":"Giulia","family":"Bernardini","sequence":"first","affiliation":[{"name":"University of Milan, Milan, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Philip","family":"Bille","sequence":"additional","affiliation":[{"name":"Technical University of Denmark, Lyngby, Denmark"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Inge Li","family":"G\u00f8rtz","sequence":"additional","affiliation":[{"name":"Technical University of Denmark, Lyngby, Denmark"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Teresa Anna","family":"Steiner","sequence":"additional","affiliation":[{"name":"University of Southern Denmark, Odense, Denmark"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,4,23]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313276.3316312"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.4086\/toc.2016.v012a001"},{"key":"e_1_2_1_3_1","series-title":"LIPIcs","first-page":"1","volume-title":"Proc. 32nd CPM","author":"Belazzougui D.","year":"2021","unstructured":"D. Belazzougui, D. Kosolobov, S. J. Puglisi, and R. Raman. Weighted ancestors in suffix trees revisited. In Proc. 32nd CPM, volume 191 of LIPIcs, pages 8:1-8:15. Schloss Dagstuhl - Leibniz-Zentrum f\u00a8ur Informatik, 2021."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/10719839_9"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3725232"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00453-017-0380-7"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2505553"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3188745.3188946"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2015.45"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2382196.2382263"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2339530.2339564"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3564246.3585148"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1536414.1536466"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/11681878_14"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1806689.1806787"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/SFCS.1997.646102"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/355541.355547"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-38906-1_30"},{"key":"e_1_2_1_19_1","first-page":"1","volume-title":"Proc. 29th ESA","author":"Gao Y.","year":"2021","unstructured":"Y. Gao and M. He. Space efficient two-dimensional orthogonal colored range counting. In Proc. 29th ESA, pages 46:1-46:17, 2021."},{"key":"e_1_2_1_20_1","first-page":"1","volume-title":"Proc. 50th ICALP","volume":"261","author":"Ghazi B.","year":"2023","unstructured":"B. Ghazi, P. Kamath, R. Kumar, P. Manurangsi, and K. Wu. On differentially private counting on trees. In Proc. 50th ICALP, volume 261, pages 66:1-66:18, 2023."},{"key":"e_1_2_1_21_1","first-page":"1","volume-title":"Proc. 14th ITCS","author":"Ghazi B.","year":"2023","unstructured":"B. Ghazi, R. Kumar, J. Nelson, and P. Manurangsi. Private counting of distinct and k-occurring items in time windows. In Proc. 14th ITCS, pages 55:1-55:24, 2023."},{"key":"e_1_2_1_22_1","first-page":"2263","volume-title":"Proc. 33rd COLT","author":"Kaplan H.","year":"2020","unstructured":"H. Kaplan, K. Ligett, Y. Mansour, M. Naor, and U. Stemmer. Privately learning thresholds: Closing the exponential gap. In Proc. 33rd COLT, pages 2263-2285, 2020."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1137\/070684483"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-37228-6_28"},{"key":"e_1_2_1_25_1","first-page":"5102","volume-title":"Proc. 34th NeurIPS","author":"Kim K.","year":"2021","unstructured":"K. Kim, S. Gopi, J. Kulkarni, and S. Yekhanin. Differentially private n-gram extraction. In Proc. 34th NeurIPS, pages 5102-5111, 2021."},{"key":"e_1_2_1_26_1","first-page":"181","volume-title":"Proc. 12th CPM","author":"Landau G. M.","year":"2001","unstructured":"G. M. Landau, T. Kasai, G. Lee, H. Arimura, S. Arikawa, and K. Park. Linear-time longest-common-prefix computation in suffix arrays and its applications. In Proc. 12th CPM, pages 181-192, 2001."},{"key":"e_1_2_1_27_1","volume-title":"A graph-based differentially private algorithm for mining frequent sequential patterns. Applied Sciences, 12(4)","author":"del Prado M.","year":"2022","unstructured":"M. Nunez-del Prado, Y. Maehara-Aliaga, J. Salas, H. Alatrista-Salas, and D. Meg\u00b4as. A graph-based differentially private algorithm for mining frequent sequential patterns. Applied Sciences, 12(4), 2022."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/0022-0000(83)90006-5"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2882928"}],"container-title":["ACM SIGMOD Record"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3810900.3810908","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T18:17:00Z","timestamp":1776968220000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3810900.3810908"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,4,23]]},"references-count":29,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,4,23]]}},"alternative-id":["10.1145\/3810900.3810908"],"URL":"https:\/\/doi.org\/10.1145\/3810900.3810908","relation":{},"ISSN":["0163-5808"],"issn-type":[{"value":"0163-5808","type":"print"}],"subject":[],"published":{"date-parts":[[2026,4,23]]},"assertion":[{"value":"2026-04-23","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}