{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,10]],"date-time":"2026-06-10T04:14:44Z","timestamp":1781064884525,"version":"3.54.1"},"reference-count":47,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2022,12,16]],"date-time":"2022-12-16T00:00:00Z","timestamp":1671148800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,12,16]],"date-time":"2022-12-16T00:00:00Z","timestamp":1671148800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100008678","name":"Universit\u00e4t Leipzig","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100008678","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Data Sci Anal"],"published-print":{"date-parts":[[2023,5]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Privacy-preserving record linkage (PPRL) is the process aimed at identifying records that represent the same real-world entity across different data sources while guaranteeing the privacy of sensitive information about these entities. A popular PPRL method is to encode sensitive plain-text data into Bloom filters (BFs), bit vectors that enable the efficient calculation of similarities between records that is required for PPRL. However, BF encoding cannot completely prevent the re-identification of plain-text values because sets of BFs can contain bit patterns that can be mapped to plain-text values using cryptanalysis attacks. Various hardening techniques have therefore been proposed that modify the bit patterns in BFs with the aim to prevent such attacks. However, it has been shown that even hardened BFs can still be vulnerable to attacks. To avoid any such attacks, we propose a novel encoding technique for PPRL based on autoencoders that transforms BFs into vectors of real numbers. To achieve a high comparison quality of the generated numerical vectors, we propose a method that guarantees the comparability of encodings generated by the different data owners. Experiments on real-world data sets show that our technique achieves high linkage quality and prevents known cryptanalysis attacks on BF encoding.<\/jats:p>","DOI":"10.1007\/s41060-022-00377-2","type":"journal-article","created":{"date-parts":[[2022,12,16]],"date-time":"2022-12-16T03:02:47Z","timestamp":1671159767000},"page":"347-357","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Privacy-preserving record linkage using autoencoders"],"prefix":"10.1007","volume":"15","author":[{"given":"Victor","family":"Christen","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tim","family":"H\u00e4ntschel","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Peter","family":"Christen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Erhard","family":"Rahm","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2022,12,16]]},"reference":[{"key":"377_CR1","unstructured":"Bank, D., Koenigstein, N., Giryes, R.: Autoencoders. CoRR arXiv:2003.05991 (2020)"},{"issue":"8","key":"377_CR2","doi-asserted-by":"publisher","first-page":"6391","DOI":"10.1007\/s10462-021-09975-1","volume":"54","author":"MM Bejani","year":"2021","unstructured":"Bejani, M.M., Ghatee, M.: A systematic review on overfitting control in shallow and deep neural networks. Artif. Intell. Rev. 54(8), 6391\u20136438 (2021)","journal-title":"Artif. Intell. Rev."},{"issue":"12","key":"377_CR3","doi-asserted-by":"publisher","first-page":"eabi8021","DOI":"10.1126\/sciadv.abi8021","volume":"8","author":"O Binette","year":"2022","unstructured":"Binette, O., Steorts, R.C.: (Almost) all of entity resolution. Sci. Adv. 8(12), eabi8021 (2022)","journal-title":"Sci. Adv."},{"issue":"7","key":"377_CR4","doi-asserted-by":"publisher","first-page":"422","DOI":"10.1145\/362686.362692","volume":"13","author":"BH Bloom","year":"1970","unstructured":"Bloom, B.H.: Space\/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422\u2013426 (1970)","journal-title":"Commun. ACM"},{"key":"377_CR5","volume-title":"Medical Data Privacy Handbook","author":"JH Boyd","year":"2015","unstructured":"Boyd, J.H., Randall, S.M., Ferrante, A.M.: Application of privacy-preserving techniques in operational record linkage centres. In: Gkoulalas-Divanis, A., Loukides, G. (eds.) Medical Data Privacy Handbook. Springer, New York (2015)"},{"key":"377_CR6","doi-asserted-by":"crossref","unstructured":"Christen, P., Vidanage, A., Ranbaduge, T., Schnell, R.: Pattern-mining based cryptanalysis of Bloom filters for privacy-preserving record linkage. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. pp. 628\u2013640. Springer, Melbourne (2018)","DOI":"10.1007\/978-3-319-57454-7_49"},{"key":"377_CR7","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-59706-1","volume-title":"Linking Sensitive Data","author":"P Christen","year":"2020","unstructured":"Christen, P., Ranbaduge, T., Schnell, R.: Linking Sensitive Data. Springer, Heidelberg (2020)"},{"key":"377_CR8","doi-asserted-by":"crossref","unstructured":"Christen, P., Ranbaduge, T., Vatsalan, D., Schnell, R.: Precise and fast cryptanalysis for Bloom filter based privacy-preserving record linkage. Transactions Knowl. Data Eng. 18(11), 2164\u20132177(2018)","DOI":"10.1109\/TKDE.2018.2874004"},{"key":"377_CR9","unstructured":"Christen, P., Schnell, R.: Common misconceptions about population data. arXiv preprint arXiv:2112.10912 (2021)"},{"key":"377_CR10","doi-asserted-by":"crossref","unstructured":"Christen, P., Schnell, R., Vatsalan, D., Ranbaduge, T.: Efficient cryptanalysis of Bloom filters for privacy-preserving record linkage. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. , vol. LNAI 10234, pp. 628\u2013640. Springer, Jeju, Korea (2017)","DOI":"10.1007\/978-3-319-57454-7_49"},{"key":"377_CR11","unstructured":"Culnane, C., Rubinstein, B.I., Teague, V.: Vulnerabilities in the use of similarity tables in combination with pseudonymisation to preserve data privacy in the UK Office for National Statistics\u2019 privacy-preserving record linkage. arXiv Preprint (2017)"},{"key":"377_CR12","doi-asserted-by":"crossref","unstructured":"Davis, J., Goadrich, M.: The relationship between precision-recall and ROC curves. In: International Conference on Machine Learning. pp. 233\u2013240. ACM, Pittsburgh (2006)","DOI":"10.1145\/1143844.1143874"},{"key":"377_CR13","doi-asserted-by":"crossref","unstructured":"Dwork, C., Talwar, K., Thakurta, A., Zhang, L.: Analyze Gauss: optimal bounds for privacy-preserving principal component analysis. In: Symposium on Theory of Computing. pp. 11\u201320. ACM, New York (2014)","DOI":"10.1145\/2591796.2591883"},{"key":"377_CR14","unstructured":"Franke, M., Sehili, Z., Rohde, F., Rahm, E.: Evaluation of hardening techniques for privacy-preserving record linkage. In: Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021, Nicosia, Cyprus, 23-26 March 2021, pp. 289\u2013300 (2021)"},{"key":"377_CR15","doi-asserted-by":"publisher","first-page":"453","DOI":"10.1007\/BF01025868","volume":"57","author":"D Freedman","year":"1981","unstructured":"Freedman, D., Diaconis, P.: On the histogram as a density estimator:l2 theory. Zeitschrift f\u00fcr Wahrscheinlichkeitstheorie und Verwandte Gebiete 57, 453\u2013476 (1981)","journal-title":"Zeitschrift f\u00fcr Wahrscheinlichkeitstheorie und Verwandte Gebiete"},{"key":"377_CR16","doi-asserted-by":"crossref","unstructured":"Gkoulalas-Divanis, A., Vatsalan, D., Karapiperis, D., Kantarcioglu, M.: Modern privacy-preserving record linkage techniques: an overview. Transactions Informations Forensics Secur. (2021)","DOI":"10.1109\/TIFS.2021.3114026"},{"key":"377_CR17","doi-asserted-by":"crossref","unstructured":"Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Symposium on the Theory of Computing. pp. 604\u2013613. ACM, Dallas (1998)","DOI":"10.1145\/276698.276876"},{"key":"377_CR18","doi-asserted-by":"crossref","unstructured":"Karakasidis, A., Verykios, V.S., Christen, P.: Fake injection strategies for private phonetic matching. In: International Workshop on Data Privacy Management. Leuven, Belgium (2011)","DOI":"10.1007\/978-3-642-28879-1_2"},{"key":"377_CR19","doi-asserted-by":"crossref","unstructured":"Karapiperis, D., Gkoulalas-Divanis, A., Verykios, V.S.: Distance-aware encoding of numerical values for privacy-preserving record linkage. In: IEEE International Conference on Data Engineering. pp. 135\u2013138. San Diego (2017)","DOI":"10.1109\/ICDE.2017.58"},{"key":"377_CR20","doi-asserted-by":"crossref","unstructured":"Kroll, M., Steinmetzer, S.: Who is 1011011111...1110110010? Automated cryptanalysis of Bloom filter encryptions of databases with several personal identifiers. In: International Joint Conference on Biomedical Engineering Systems and Technologies. pp. 341\u2013356. Lisbon (2015)","DOI":"10.1007\/978-3-319-27707-3_21"},{"key":"377_CR21","doi-asserted-by":"crossref","unstructured":"Kuzu, M., Kantarcioglu, M., Durham, E., Malin, B.: A constraint satisfaction cryptanalysis of Bloom filters in private record linkage. In: International Symposium on Privacy Enhancing Technologies Symposium. pp. 226\u2013245. Springer (2011)","DOI":"10.1007\/978-3-642-22263-4_13"},{"issue":"2","key":"377_CR22","doi-asserted-by":"publisher","first-page":"285","DOI":"10.1136\/amiajnl-2012-000917","volume":"20","author":"M Kuzu","year":"2013","unstructured":"Kuzu, M., Kantarcioglu, M., Durham, E.A., Toth, C., Malin, B.: A practical approach to achieve private medical record linkage in light of public resources. J. Am. Med. Inform. Assoc. 20(2), 285\u2013292 (2013)","journal-title":"J. Am. Med. Inform. Assoc."},{"issue":"1","key":"377_CR23","doi-asserted-by":"publisher","first-page":"78","DOI":"10.1214\/ss\/1177013818","volume":"1","author":"L Le Cam","year":"1986","unstructured":"Le Cam, L.: The central limit theorem around 1935. Statistical Sci. 1(1), 78\u201391 (1986)","journal-title":"Statistical Sci."},{"key":"377_CR24","first-page":"49","volume":"2","author":"PC Mahalanobis","year":"1936","unstructured":"Mahalanobis, P.C.: On the generalized distance in statistics. Proc. Nat. Inst. Sci. (Calcutta) 2, 49\u201355 (1936)","journal-title":"Proc. Nat. Inst. Sci. (Calcutta)"},{"issue":"4","key":"377_CR25","doi-asserted-by":"publisher","first-page":"217","DOI":"10.1504\/IJBDI.2017.086956","volume":"4","author":"W Mitchell","year":"2017","unstructured":"Mitchell, W., Dewri, R., Thurimella, R., Roschke, M.: A graph traversal attack on Bloom filter-based medical data aggregation. Int. J. Big Data Intell. 4(4), 217\u2013226 (2017)","journal-title":"Int. J. Big Data Intell."},{"issue":"6","key":"377_CR26","doi-asserted-by":"publisher","first-page":"24","DOI":"10.1145\/1743546.1743558","volume":"53","author":"A Narayanan","year":"2010","unstructured":"Narayanan, A., Shmatikov, V.: Myths and fallacies of personally identifiable information. Commun. ACM 53(6), 24\u201326 (2010)","journal-title":"Commun. ACM"},{"key":"377_CR27","doi-asserted-by":"crossref","unstructured":"Newcombe, H., Kennedy, J., Axford, S., James, A.: Automatic linkage of vital records. Science 130(3381), 954\u2013959 (1959)","DOI":"10.1126\/science.130.3381.954"},{"key":"377_CR28","doi-asserted-by":"crossref","unstructured":"Niedermeyer, F., Steinmetzer, S., Kroll, M., Schnell, R.: Cryptanalysis of basic Bloom filters used for privacy preserving record linkage. German Record Linkage Center, Working Paper Series, No. WP-GRLC-2014-04 (2014)","DOI":"10.2139\/ssrn.3530867"},{"issue":"2","key":"377_CR29","doi-asserted-by":"publisher","first-page":"346","DOI":"10.1109\/JBHI.2018.2796941","volume":"22","author":"R Pita","year":"2018","unstructured":"Pita, R., Pinto, C., Sena, S., Fiaccone, R., Amorim, L., Reis, S., Barreto, M., Denaxas, S., Barreto, M.: On the accuracy and scalability of probabilistic data linkage over the Brazilian 114 million cohort. J. Biomed. Health Inform. 22(2), 346\u2013353 (2018)","journal-title":"J. Biomed. Health Inform."},{"key":"377_CR30","doi-asserted-by":"crossref","unstructured":"Ranbaduge, T., Schnell, R.: Securing Bloom filters for privacy-preserving record linkage. In: International Conference on Information and Knowledge Management. pp. 2185\u20132188. ACM, Galway (2020)","DOI":"10.1145\/3340531.3412105"},{"key":"377_CR31","doi-asserted-by":"publisher","first-page":"205","DOI":"10.1016\/j.jbi.2013.12.003","volume":"50","author":"SM Randall","year":"2014","unstructured":"Randall, S.M., Ferrante, A.M., Boyd, J.H., Bauer, J.K., Semmens, J.B.: Privacy-preserving record linkage on large real world datasets. J. Biomed. Inform. 50, 205\u2013212 (2014)","journal-title":"J. Biomed. Inform."},{"issue":"1","key":"377_CR32","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1472-6947-9-1","volume":"9","author":"R Schnell","year":"2009","unstructured":"Schnell, R., Bachteler, T., Reiher, J.: Privacy-preserving record linkage using Bloom filters. BMC Med. Inform. Decisi. Mak. 9(1), 1\u201311 (2009)","journal-title":"BMC Med. Inform. Decisi. Mak."},{"key":"377_CR33","doi-asserted-by":"crossref","unstructured":"Schnell, R., Borgs, C.: Randomized response and balanced Bloom filters for privacy preserving record linkage. In: International Conference on Data Mining Workshops. pp. 218\u2013224. IEEE, Barcelona (2016)","DOI":"10.1109\/ICDMW.2016.0038"},{"key":"377_CR34","doi-asserted-by":"crossref","unstructured":"Schnell, R., Borgs, C.: XOR-folding for Bloom filter-based encryptions for privacy-preserving record linkage. German Record Linkage Center 22 (2016)","DOI":"10.2139\/ssrn.3527984"},{"key":"377_CR35","doi-asserted-by":"crossref","unstructured":"Schnell, R., Borgs, C.: Hardening encrypted patient names against cryptographic attacks using cellular automata. In: International Conference on Data Mining Workshops. pp. 518\u2013522. IEEE, Singapore (2018)","DOI":"10.1109\/ICDMW.2018.00082"},{"key":"377_CR36","doi-asserted-by":"crossref","unstructured":"Schnell, R., Borgs, C.: Encoding hierarchical classification codes for privacy-preserving record linkage using Bloom filters. In: Workshop on Data Integration and Applications. held at ECML\/PKDD, pp. 142\u2013156. Springer, W\u00fcrzburg (2019)","DOI":"10.1007\/978-3-030-43887-6_12"},{"issue":"3\/4","key":"377_CR37","doi-asserted-by":"publisher","first-page":"591","DOI":"10.2307\/2333709","volume":"52","author":"SS Shapiro","year":"1965","unstructured":"Shapiro, S.S., Wilk, M.B.: An analysis of variance test for normality. Biometrika 52(3\/4), 591\u2013611 (1965)","journal-title":"Biometrika"},{"issue":"1","key":"377_CR38","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1007\/BF00130487","volume":"7","author":"MJ Swain","year":"1991","unstructured":"Swain, M.J., Ballard, D.H.: Color indexing. Int. J. Comput. Vis. 7(1), 11\u201332 (1991)","journal-title":"Int. J. Comput. Vis."},{"key":"377_CR39","doi-asserted-by":"crossref","unstructured":"Vaiwsri, S., Ranbaduge, T., Christen, P.: Reference values based hardening for Bloom filters based privacy-preserving record linkage. In: Australasian Conference on Data Mining. pp. 189\u2013202. Springer, Bathurst (2018)","DOI":"10.1007\/978-981-13-6661-1_15"},{"key":"377_CR40","doi-asserted-by":"crossref","unstructured":"Vaiwsri, S., Ranbaduge, T., Christen, P.: Accurate and efficient privacy-preserving string matching. Int. J. Data Sci. Anal. 14, 191\u2013125(2022)","DOI":"10.1007\/s41060-022-00320-5"},{"key":"377_CR41","doi-asserted-by":"crossref","unstructured":"Vatsalan, D., Christen, P.: Privacy-preserving matching of similar patients. J. Biomed. Inform. 59, 285\u2013298 (2016)","DOI":"10.1016\/j.jbi.2015.12.004"},{"key":"377_CR42","doi-asserted-by":"crossref","unstructured":"Vatsalan, D., Christen, P., Verykios, V.S.: A taxonomy of privacy-preserving record linkage techniques. Information Syst. 38(6), 946\u2013969 (2013)","DOI":"10.1016\/j.is.2012.11.005"},{"key":"377_CR43","doi-asserted-by":"crossref","unstructured":"Vatsalan, D., Sehili, Z., Christen, P., Rahm, E.: Privacy-preserving record linkage for Big Data: current approaches and research challenges. In: Zomaya, A.Y., Sakr, S. (eds.) Handbook of Big Data Technologies. Springer, New York (2017)","DOI":"10.1007\/978-3-319-49340-4_25"},{"key":"377_CR44","doi-asserted-by":"crossref","unstructured":"Vidanage, A., Christen, P., Ranbaduge, T., Schnell, R.: A graph matching attack on privacy-preserving record linkage. In: International Conference on Information and Knowledge Management. pp. 1485\u20131494. ACM (2020)","DOI":"10.1145\/3340531.3411931"},{"key":"377_CR45","doi-asserted-by":"crossref","unstructured":"Vidanage, A., Ranbaduge, T., Christen, P., Randall, S.: A privacy attack on multiple dynamic match-key based privacy-preserving record linkage. Int. J. Popul. Data Sci. 5(1),13 (2020)","DOI":"10.23889\/ijpds.v5i1.1345"},{"key":"377_CR46","doi-asserted-by":"crossref","unstructured":"Vidanage, A., Ranbaduge, T., Christen, P., Schnell, R.: Efficient pattern mining based cryptanalysis for privacy-preserving record linkage. In: International Conference on Data Engineering. IEEE, Macau (2019)","DOI":"10.1109\/ICDE.2019.00176"},{"key":"377_CR47","doi-asserted-by":"crossref","unstructured":"Vidanage, A., Ranbaduge, T., Christen, P., Schnell, R.: A taxonomy of attacks on privacy-preserving record linkage. J. Priv. Confid. 12(1), 35 (2022)","DOI":"10.29012\/jpc.764"}],"container-title":["International Journal of Data Science and Analytics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41060-022-00377-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s41060-022-00377-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41060-022-00377-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,4,26]],"date-time":"2023-04-26T14:30:26Z","timestamp":1682519426000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s41060-022-00377-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,16]]},"references-count":47,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,5]]}},"alternative-id":["377"],"URL":"https:\/\/doi.org\/10.1007\/s41060-022-00377-2","relation":{},"ISSN":["2364-415X","2364-4168"],"issn-type":[{"value":"2364-415X","type":"print"},{"value":"2364-4168","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,16]]},"assertion":[{"value":"11 August 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 November 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 December 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflicts of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}