{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T19:51:55Z","timestamp":1757620315476,"version":"3.44.0"},"reference-count":25,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2025,7,29]],"date-time":"2025-07-29T00:00:00Z","timestamp":1753747200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,7,29]],"date-time":"2025-07-29T00:00:00Z","timestamp":1753747200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Stat Comput"],"published-print":{"date-parts":[[2025,10]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Micro and survey datasets often contain private information about individuals, like their health status, income, or political preferences. Previous studies have shown that, even after data anonymization, a malicious intruder could still be able to identify individuals in the dataset by matching their variables to external information. Disclosure risk measures are statistical measures meant to quantify how big such a risk is for a specific dataset. One of the most common measures is the number of sample unique values that are also population unique. Mixed membership models can provide very accurate estimates of this measure. A limitation of this approach is that the number of extreme profiles has to be chosen by the modeller. In this article, we propose a non-parametric version of the model, based on the Hierarchical Dirichlet Process (HDP). 
The proposed approach does not require any tuning parameter or model selection step and provides accurate estimates of the disclosure risk measure, even with samples as small as 1<jats:inline-formula>\n              <jats:alternatives>\n                <jats:tex-math>$$\\%$$<\/jats:tex-math>\n                <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mo>%<\/mml:mo>\n                <\/mml:math>\n              <\/jats:alternatives>\n            <\/jats:inline-formula> of the population size. Moreover, a data augmentation scheme to address the presence of structural zeros is presented. The proposed methodology is tested on a real dataset from the New York microdata.<\/jats:p>","DOI":"10.1007\/s11222-025-10693-9","type":"journal-article","created":{"date-parts":[[2025,7,29]],"date-time":"2025-07-29T12:35:58Z","timestamp":1753792558000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Disclosure risk assessment with Bayesian non-parametric hierarchical modelling"],"prefix":"10.1007","volume":"35","author":[{"given":"Marco","family":"Battiston","sequence":"first","affiliation":[]},{"given":"Lorenzo","family":"Rimella","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,7,29]]},"reference":[{"issue":"409","key":"10693_CR1","doi-asserted-by":"publisher","first-page":"38","DOI":"10.1080\/01621459.1990.10475304","volume":"85","author":"JG Bethlehem","year":"1990","unstructured":"Bethlehem, J.G., Keller, W.J., Pannekoek, J.: Disclosure control of microdata. J. Am. Stat. Assoc. 85(409), 38\u201345 (1990)","journal-title":"J. Am. Stat. 
Assoc."},{"key":"10693_CR2","doi-asserted-by":"publisher","first-page":"525","DOI":"10.1214\/15-AOAS807","volume":"9","author":"C Carota","year":"2015","unstructured":"Carota, C., Filippone, M., Leombruni, R., Polettini, S.: Bayesian nonparametric disclosure risk estimation via mixed effects log-linear models. Ann. Appl. Stat. 9, 525\u2013546 (2015)","journal-title":"Ann. Appl. Stat."},{"issue":"1","key":"10693_CR3","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1111\/insr.12471","volume":"90","author":"C Carota","year":"2022","unstructured":"Carota, C., Filippone, M., Polettini, S.: Assessing bayesian semi-parametric log-linear models: an application to disclosure risk estimation. Int. Stat. Rev. 90(1), 165\u2013183 (2022)","journal-title":"Int. Stat. Rev."},{"issue":"393","key":"10693_CR4","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1080\/01621459.1986.10478229","volume":"81","author":"GT Duncan","year":"1986","unstructured":"Duncan, G.T., Lambert, D.: Disclosure-limited data dissemination. J. Am. Stat. Assoc. 81(393), 10\u201318 (1986)","journal-title":"J. Am. Stat. Assoc."},{"issue":"2","key":"10693_CR5","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1080\/07350015.1989.10509729","volume":"7","author":"G Duncan","year":"1989","unstructured":"Duncan, G., Lambert, D.: The risk of disclosure for microdata. J. Bus. Econ. Stat. 7(2), 207\u2013217 (1989)","journal-title":"J. Bus. Econ. Stat."},{"issue":"2","key":"10693_CR6","doi-asserted-by":"publisher","first-page":"209","DOI":"10.1214\/aos\/1176342360","volume":"1","author":"TS Ferguson","year":"1973","unstructured":"Ferguson, T.S.: A bayesian analysis of some nonparametric problems. Ann. Stat. 1(2), 209\u2013230 (1973)","journal-title":"Ann. Stat."},{"key":"10693_CR7","first-page":"313","volume":"9","author":"D Lambert","year":"1993","unstructured":"Lambert, D.: Measures of disclosure risk and harm. J. Off. Stat. 9, 313\u2013313 (1993)","journal-title":"J. Off. 
Stat."},{"issue":"3","key":"10693_CR8","doi-asserted-by":"publisher","first-page":"635","DOI":"10.1111\/rssa.12352","volume":"181","author":"D Manrique-Vallier","year":"2018","unstructured":"Manrique-Vallier, D., Hu, J.: Bayesian non-parametric generation of fully synthetic multivariate categorical data in the presence of structural zeros. J. R. Stat. Soc. Ser. A Stat. Soc. 181(3), 635\u2013647 (2018). https:\/\/doi.org\/10.1111\/rssa.12352. (https:\/\/academic.oup.com\/jrsssa\/article-pdf\/181\/3\/635\/49449396\/jrsssa_181_3_635.pdf)","journal-title":"J. R. Stat. Soc. Ser. A Stat. Soc."},{"issue":"500","key":"10693_CR9","doi-asserted-by":"publisher","first-page":"1385","DOI":"10.1080\/01621459.2012.710508","volume":"107","author":"D Manrique-Vallier","year":"2012","unstructured":"Manrique-Vallier, D., Reiter, J.P.: Estimating identification disclosure risk using mixed membership models. J. Am. Stat. Assoc. 107(500), 1385\u20131394 (2012)","journal-title":"J. Am. Stat. Assoc."},{"issue":"4","key":"10693_CR10","doi-asserted-by":"publisher","first-page":"1061","DOI":"10.1080\/10618600.2013.844700","volume":"23","author":"D Manrique-Vallier","year":"2014","unstructured":"Manrique-Vallier, D., Reiter, J.P.: Bayesian estimation of discrete multivariate latent structure models with structural zeros. J. Comput. Graph. Stat. 23(4), 1061\u20131079 (2014)","journal-title":"J. Comput. Graph. Stat."},{"key":"10693_CR11","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1214\/11-SS074","volume":"5","author":"GJ Matthews","year":"2011","unstructured":"Matthews, G.J., Harel, O., et al.: Data confidentiality: a review of methods for statistical disclosure limitation and methods for assessing privacy. Stat. Surveys 5, 1\u201329 (2011)","journal-title":"Stat. 
Surveys"},{"key":"10693_CR12","doi-asserted-by":"publisher","first-page":"1103","DOI":"10.1198\/016214505000000619","volume":"100","author":"J Reiter","year":"2005","unstructured":"Reiter, J.: Estimating risks of identification disclosure in microdata. J. Am. Stat. Assoc. 100, 1103\u20131112 (2005)","journal-title":"J. Am. Stat. Assoc."},{"key":"10693_CR13","doi-asserted-by":"crossref","unstructured":"Rinott, Y., Shlomo, N.: A Generalized Negative Binomial Smoothing Model for Sample Disclosure Risk Estimation. In Privacy in Statistical Databases. Lecture Notes in Computer Science, Springer, Heidelberg (2006)","DOI":"10.1007\/11930242_8"},{"key":"10693_CR14","doi-asserted-by":"publisher","unstructured":"Ruggles, S., Flood, S., Sobek, M., Backman, D., Chen, A., Cooper, G., Richards, S., Rodgers, R., Schouweiler, M.: IPUMS USA: Version 15.0 [dataset]. IPUMS. Minneapolis, MN. (2024). https:\/\/doi.org\/10.18128\/D010.V15.0","DOI":"10.18128\/D010.V15.0"},{"key":"10693_CR15","first-page":"639","volume":"4","author":"J Sethuraman","year":"1994","unstructured":"Sethuraman, J.: A constructive definition of dirichlet priors. Stat. Sin. 4, 639\u2013650 (1994)","journal-title":"Stat. Sin."},{"key":"10693_CR16","doi-asserted-by":"crossref","unstructured":"Shlomo, N., Skinner, C., et al.: Assessing the protection provided by misclassification-based disclosure limitation methods for survey microdata. Ann. Appl. Stat. 4(3), 1291\u20131310 (2010)","DOI":"10.1214\/09-AOAS317"},{"key":"10693_CR17","doi-asserted-by":"crossref","unstructured":"Skinner, C.J., Elliot, M.: A measure of disclosure risk for microdata. J. R. Stat. 
Soc.: series B (statistical methodology) 64(4), 855\u2013867 (2002)","DOI":"10.1111\/1467-9868.00365"},{"issue":"483","key":"10693_CR18","doi-asserted-by":"publisher","first-page":"989","DOI":"10.1198\/016214507000001328","volume":"103","author":"C Skinner","year":"2008","unstructured":"Skinner, C., Shlomo, N.: Assessing identification risk in survey microdata using log-linear models. J. Am. Stat. Assoc. 103(483), 989\u20131001 (2008)","journal-title":"J. Am. Stat. Assoc."},{"issue":"483","key":"10693_CR19","doi-asserted-by":"publisher","first-page":"989","DOI":"10.1198\/016214507000001328","volume":"103","author":"C Skinner","year":"2008","unstructured":"Skinner, C., Shlomo, N.: Assessing identification risk in survey microdata using log-linear models. J. Am. Stat. Assoc. 103(483), 989\u20131001 (2008)","journal-title":"J. Am. Stat. Assoc."},{"key":"10693_CR20","first-page":"31","volume":"10","author":"C Skinner","year":"1994","unstructured":"Skinner, C., Marsh, C., Openshaw, S., Wymer, C.: Disclosure control for census microdata. J. Off. Stat. 10, 31\u201331 (1994)","journal-title":"J. Off. Stat."},{"key":"10693_CR21","doi-asserted-by":"crossref","unstructured":"Snoke, J., Meijer, E., Phillips, D., Wilkens, J., Lee, J.: Synthesizing surveys with multiple units of observation: An application to the longitudinal aging study in india. Journal of Survey Statistics and Methodology, 047 (2025)","DOI":"10.1093\/jssam\/smae047"},{"key":"10693_CR22","unstructured":"Sweeney, L.: Computational disclosure control: Theory and practice. PhD dissertation, Massachusetts Institute of Technology (2001)"},{"key":"10693_CR23","doi-asserted-by":"publisher","first-page":"158","DOI":"10.1017\/CBO9780511802478.006","volume-title":"5. Bayesian Nonparametrics.","author":"YW Teh","year":"2010","unstructured":"Teh, Y.W., Jordan, M.I.: 5. Bayesian Nonparametrics., pp. 158\u2013207. 
Cambridge University Press, Cambridge (2010)"},{"key":"10693_CR24","doi-asserted-by":"publisher","first-page":"1566","DOI":"10.1198\/016214506000000302","volume":"101","author":"YW Teh","year":"2006","unstructured":"Teh, Y.W., Jordan, M.I., Beal, M.J., Blei, D.M.: Hierarchical dirichlet processes. J. Am. Stat. Assoc. 101, 1566\u20131581 (2006)","journal-title":"J. Am. Stat. Assoc."},{"key":"10693_CR25","doi-asserted-by":"publisher","unstructured":"Willenborg, L., De\u00a0Waal, T.: Elements of Statistical Disclosure Control vol. 155. Springer, New York (2012). https:\/\/doi.org\/10.1007\/978-1-4613-0121-9","DOI":"10.1007\/978-1-4613-0121-9"}],"container-title":["Statistics and Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11222-025-10693-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11222-025-10693-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11222-025-10693-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,8]],"date-time":"2025-09-08T06:37:06Z","timestamp":1757313426000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11222-025-10693-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,29]]},"references-count":25,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,10]]}},"alternative-id":["10693"],"URL":"https:\/\/doi.org\/10.1007\/s11222-025-10693-9","relation":{},"ISSN":["0960-3174","1573-1375"],"issn-type":[{"type":"print","value":"0960-3174"},{"type":"electronic","value":"1573-1375"}],"subject":[],"published":{"date-parts":[[2025,7,29]]},"assertion":[{"value":"3 February 
2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 July 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 July 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"158"}}