{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,11]],"date-time":"2025-11-11T13:58:01Z","timestamp":1762869481450,"version":"3.37.0"},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,12,3]],"date-time":"2024-12-03T00:00:00Z","timestamp":1733184000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,12,3]],"date-time":"2024-12-03T00:00:00Z","timestamp":1733184000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"ES Ministry of Science, Innovation and Universities","award":["TED2021-132073B-I00"],"award-info":[{"award-number":["TED2021-132073B-I00"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int. J. Inf. Secur."],"published-print":{"date-parts":[[2025,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>This paper explores the limitations faced by current solutions for selecting quasi-identifying attributes in the context of Privacy-Preserving Data Publishing (PPDP). PPDP stipulates that any published personal data should not be linkable to other available data sources in a manner that could potentially lead to individual re-identification or compromise sensitive data. The state-of-the-art methods for selecting quasi-identifying attributes commonly rely on heuristic evaluations to assess the risk of re-identification associated with each attribute. We hypothesize that these heuristic-based methods could be significantly improved by complementing them with empirical methods capable of quantifying the external linkability of dataset attributes. 
This empirical layer would enable a fine-tuning of the obfuscation of attributes within the dataset, thereby preventing the unnecessary privatization of attributes beyond potential attackers\u2019 reach while ensuring privatization of those easily accessible. For this purpose, we explore recent advancements in identifying semantically related datasets across heterogeneous data sources. Although initially developed for purposes beyond privacy preservation, these methods support our initiative by uncovering potential links with external data and thus providing empirical evidence for the identification of attributes as quasi-identifiers. Finally, we discuss potential pathways to implement this empirical layer in quasi-identifier identification systems.<\/jats:p>","DOI":"10.1007\/s10207-024-00944-7","type":"journal-article","created":{"date-parts":[[2024,12,3]],"date-time":"2024-12-03T17:42:31Z","timestamp":1733247751000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Protecting privacy in the age of big data: exploring data linking methods for quasi-identifier selection"],"prefix":"10.1007","volume":"24","author":[{"given":"Antonio","family":"Borrero-Foncubierta","sequence":"first","affiliation":[]},{"given":"Mercedes","family":"Rodriguez-Garcia","sequence":"additional","affiliation":[]},{"given":"Andr\u00e9s","family":"Mu\u00f1oz","sequence":"additional","affiliation":[]},{"given":"Juan Manuel","family":"Dodero","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,12,3]]},"reference":[{"key":"944_CR1","unstructured":"European Parliament and Council of the European Union. Regulation (EU) 2016\/679 of the European Parliament and of the Council. European Union (2016). 
https:\/\/data.europa.eu\/eli\/reg\/2016\/679\/oj"},{"key":"944_CR2","doi-asserted-by":"crossref","unstructured":"Bukaty, P.: The California Privacy Rights Act (CPRA)\u2014An Implementation and Compliance Guide. IT Governance Publishing, New York (2021)","DOI":"10.2307\/j.ctv1kv1d14"},{"key":"944_CR3","doi-asserted-by":"publisher","first-page":"51071","DOI":"10.1109\/ACCESS.2020.2980235","volume":"8","author":"A Zigomitros","year":"2020","unstructured":"Zigomitros, A., Casino, F., Solanas, A., Patsakis, C.: A survey on privacy properties for data publishing of relational data. IEEE Access 8, 51071\u201351099 (2020)","journal-title":"IEEE Access"},{"key":"944_CR4","unstructured":"Motwani, R., Ying, X.: Efficient algorithms for masking and finding quasi-identifiers. In: Proceedings of the Conference on Very Large Data Bases (VLDB), pp. 83\u201393. SIAM, Vienna (2007)"},{"issue":"2","key":"944_CR5","first-page":"512","volume":"89","author":"AM Omer","year":"2016","unstructured":"Omer, A.M., Mohamad, M.M.B.: Simple and effective method for selecting quasi-identifier. J. Theor. Appl. Inf. Technol. 89(2), 512 (2016)","journal-title":"J. Theor. Appl. Inf. Technol."},{"key":"944_CR6","doi-asserted-by":"crossref","unstructured":"Hildebrant, R., Le, Q. T., Ta, D. H., Vu, H. T.: Towards better bounds for finding quasi-identifiers. In: Proceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pp. 155\u2013167. ACM, New York (2023)","DOI":"10.1145\/3584372.3588668"},{"issue":"2","key":"944_CR7","doi-asserted-by":"publisher","first-page":"295","DOI":"10.1166\/jmihi.2020.2966","volume":"10","author":"J Jung","year":"2020","unstructured":"Jung, J., Park, P., Lee, J., Lee, H., Lee, G.K., Cha, H.S.: A determination scheme for quasi-identifiers using uniqueness and influence for de-identification of clinical data. J. Med. Imaging Health Inform. 10(2), 295\u2013303 (2020)","journal-title":"J. Med. 
Imaging Health Inform."},{"issue":"3","key":"944_CR8","doi-asserted-by":"publisher","first-page":"392","DOI":"10.1109\/TKDE.2009.120","volume":"22","author":"D Sacharidis","year":"2009","unstructured":"Sacharidis, D., Mouratidis, K., Papadias, D.: K-anonymity in the presence of external databases. IEEE Trans. Knowl. Data Eng. 22(3), 392\u2013403 (2009)","journal-title":"IEEE Trans. Knowl. Data Eng."},{"issue":"1","key":"944_CR9","first-page":"150","volume":"26","author":"Y Yan","year":"2018","unstructured":"Yan, Y., Wang, W., Hao, X., Zhang, L.: Finding quasi-identifiers for k-anonymity model by the set of cut-vertex. Eng. Lett. 26(1), 150\u2013160 (2018)","journal-title":"Eng. Lett."},{"key":"944_CR10","doi-asserted-by":"crossref","unstructured":"Lee, Y. J., Lee, K.\u00a0H.: Re-identification of medical records by optimum quasi-identifiers. In: 2017 19th International Conference on Advanced Communication Technology (ICACT), pp. 428\u2013435. IEEE, New York (2017)","DOI":"10.23919\/ICACT.2017.7890125"},{"issue":"2","key":"944_CR11","first-page":"12","volume":"9","author":"E Gachanga","year":"2019","unstructured":"Gachanga, E., Kimwele, M., Nderu, L.: Feature based data anonymization for high dimensional data. J. Inf. Eng. Appl. 9(2), 12\u201321 (2019)","journal-title":"J. Inf. Eng. Appl."},{"key":"944_CR12","doi-asserted-by":"publisher","first-page":"103027","DOI":"10.1016\/j.cose.2022.103027","volume":"126","author":"S Srijayanthi","year":"2023","unstructured":"Srijayanthi, S., Sethukarasi, T.: Design of privacy preserving model based on clustering involved anonymization along with feature selection. Comput. Secur. 126, 103027 (2023)","journal-title":"Comput. Secur."},{"key":"944_CR13","doi-asserted-by":"crossref","unstructured":"Hulsebos, M., Hu, K., Bakker, M., Zgraggen, E., Satyanarayan, A., Kraska, T., Demiralp, \u00c7., Hidalgo, C.: Sherlock: A deep learning approach to semantic data type detection. 
In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1500\u20131508. ACM, New York (2019)","DOI":"10.1145\/3292500.3330993"},{"key":"944_CR14","first-page":"3111","volume":"26","author":"T Mikolov","year":"2013","unstructured":"Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. Adv. Neural. Inf. Process. Syst. 26, 3111\u20133119 (2013)","journal-title":"Adv. Neural. Inf. Process. Syst."},{"issue":"7","key":"944_CR15","first-page":"579","volume":"8","author":"M-C Popescu","year":"2009","unstructured":"Popescu, M.-C., Balas, V.E., Perescu-Popescu, L., Mastorakis, N.: Multilayer perceptron and neural networks. WSEAS Trans. Circuits Syst. 8(7), 579\u2013588 (2009)","journal-title":"WSEAS Trans. Circuits Syst."},{"key":"944_CR16","unstructured":"Lafferty, J. D., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the Eighteenth International Conference on Machine Learning, ICML \u201901, pp. 282\u2013289. Morgan Kaufmann Publishers Inc, San Francisco (2001). ISBN 1558607781"},{"issue":"11","key":"944_CR17","doi-asserted-by":"publisher","first-page":"1835","DOI":"10.14778\/3407790.3407793","volume":"13","author":"D Zhang","year":"2020","unstructured":"Zhang, D., Suhara, Y., Li, J., Hulsebos, M., Demiralp, C., Tan, W.-C.: Sato: contextual semantic type detection in tables. Proc. VLDB Endow. 13(11), 1835\u20131848 (2020)","journal-title":"Proc. VLDB Endow."},{"key":"944_CR18","doi-asserted-by":"crossref","unstructured":"Wang, D., Shiralkar, P., Lockard, C., Huang, B., Dong, X.\u00a0L., Jiang, M.: Tcn: table convolutional network for web table interpretation. In: Proceedings of the Web Conference 2021, pp. 4020\u20134032. 
ACM, New York (2021)","DOI":"10.1145\/3442381.3450090"},{"issue":"1","key":"944_CR19","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1145\/3542700.3542709","volume":"51","author":"X Deng","year":"2022","unstructured":"Deng, X., Sun, H., Lees, A., You, W., Cong, Yu.: Turl: table understanding through representation learning. ACM SIGMOD Rec. 51(1), 33\u201340 (2022)","journal-title":"ACM SIGMOD Rec."},{"issue":"6","key":"944_CR20","doi-asserted-by":"publisher","first-page":"1319","DOI":"10.14778\/3583140.3583149","volume":"16","author":"Y Sun","year":"2023","unstructured":"Sun, Y., Xin, H., Chen, L.: Reca: Related tables enhanced column semantic type annotation framework. Proc. VLDB Endow. 16(6), 1319\u20131331 (2023)","journal-title":"Proc. VLDB Endow."},{"key":"944_CR21","doi-asserted-by":"crossref","unstructured":"Su, Y., Rafiei, D., Nazari, B.\u00a0K.: Sand: semantic annotation of numeric data in web tables. In: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pp. 2342\u20132351. ACM, New York (2023)","DOI":"10.1145\/3583780.3615046"},{"key":"944_CR22","unstructured":"Fernandez, R. C., Abedjan, Z., Koko, F., Yuan, G., Madden, S., Stonebraker, M.: Aurum: A data discovery system. In: 2018 IEEE 34th International Conference on Data Engineering (ICDE), pp. 1001\u20131012. IEEE, New York (2018)"},{"issue":"3","key":"944_CR23","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3388870","volume":"38","author":"A Alserafi","year":"2020","unstructured":"Alserafi, A., Abell\u00f3, A., Romero, O., Calders, T.: Keeping the data lake in form: proximity mining for pre-filtering schema matching. ACM Trans. Inf. Syst. (TOIS) 38(3), 1\u201330 (2020)","journal-title":"ACM Trans. Inf. Syst. (TOIS)"},{"key":"944_CR24","doi-asserted-by":"crossref","unstructured":"Trabelsi, M., Chen, Z., Zhang, S., Davison, B.\u00a0D., Heflin, J.: Strubert: structure-aware bert for table search and matching. 
In: Proceedings of the ACM Web Conference 2022, pp. 442\u2013451. ACM, New York (2022)","DOI":"10.1145\/3485447.3511972"},{"issue":"10","key":"944_CR25","doi-asserted-by":"publisher","first-page":"2458","DOI":"10.14778\/3603581.3603587","volume":"16","author":"Y Dong","year":"2022","unstructured":"Dong, Y., Xiao, C., Nozawa, T., Enomoto, M., Oyamada, M.: Deepjoin: joinable table discovery with pre-trained language models. Proc. VLDB Endow. 16(10), 2458\u20132470 (2022)","journal-title":"Proc. VLDB Endow."},{"issue":"11","key":"944_CR26","doi-asserted-by":"publisher","first-page":"3377","DOI":"10.14778\/3611479.3611533","volume":"16","author":"MY Eltabakh","year":"2023","unstructured":"Eltabakh, M.Y., Kunjir, M., Elmagarmid, A., Ahmad, M.S.: Cross modal data discovery over structured and unstructured data lakes. Proc. VLDB Endow. 16(11), 3377\u20133390 (2023)","journal-title":"Proc. VLDB Endow."},{"issue":"7","key":"944_CR27","doi-asserted-by":"publisher","first-page":"1726","DOI":"10.14778\/3587136.3587146","volume":"16","author":"G Fan","year":"2023","unstructured":"Fan, G., Wang, J., Li, Y., Zhang, D., Miller, R.: Semantics-aware dataset discovery from data lakes with contextualized column-based representation learning. Proc. VLDB Endow. 16(7), 1726\u20131739 (2023)","journal-title":"Proc. VLDB Endow."},{"key":"944_CR28","doi-asserted-by":"crossref","unstructured":"Parmar, A., Katariya, R., Patel, V.: A review on random forest: An ensemble classifier. In: International Conference on Intelligent Data Communication Technologies and Internet of Things (ICICI) 2018, pp. 758\u2013763. Springer, New York (2019)","DOI":"10.1007\/978-3-030-03146-6_86"},{"key":"944_CR29","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1007\/978-1-0716-0826-5_3","volume":"2190","author":"D Chicco","year":"2021","unstructured":"Chicco, D.: Siamese neural networks: an overview. Artif. Neural Netw. 2190, 73\u201394 (2021)","journal-title":"Artif. 
Neural Netw."},{"key":"944_CR30","doi-asserted-by":"crossref","unstructured":"Kramer, O., Kramer, O.: K-nearest neighbors. In: Dimensionality Reduction with Unsupervised Nearest Neighbors, vol. 51, pp. 13\u201323 (2013)","DOI":"10.1007\/978-3-642-38652-7_2"},{"issue":"13","key":"944_CR31","doi-asserted-by":"publisher","first-page":"1581","DOI":"10.14778\/3007263.3007314","volume":"9","author":"P Konda","year":"2016","unstructured":"Konda, P., Das, S., Doan, A.H., Ardalan, A., Ballard, J.R., Li, H., Panahi, F., Zhang, H., Naughton, J., Prasad, S., et al.: Magellan: toward building entity matching management systems over data science stacks. Proc. VLDB Endow. 9(13), 1581\u20131584 (2016)","journal-title":"Proc. VLDB Endow."},{"key":"944_CR32","doi-asserted-by":"crossref","unstructured":"Mudgal, S., Li, H., Rekatsinas, T., Doan, A., Park, Y., Krishnan, G., Deep, R., Arcaute, E., Raghavendra, V.: Deep learning for entity matching: A design space exploration. In: Proceedings of the 2018 International Conference on Management of Data, pp. 19\u201334. ACM, New York (2018)","DOI":"10.1145\/3183713.3196926"},{"key":"944_CR33","doi-asserted-by":"crossref","unstructured":"Kong, C., Chen, B.-X., Zhang, L.-P.: DEM: Deep entity matching across heterogeneous information networks. J. Comput. Sci. Technol. 35, 739\u2013750 (2020)","DOI":"10.1007\/s11390-020-0139-5"},{"key":"944_CR34","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2020.101565","volume":"93","author":"G Papadakis","year":"2020","unstructured":"Papadakis, G., Mandilaras, G., Gagliardelli, L., Simonini, G., Thanos, E., Giannakopoulos, G., Bergamaschi, S., Palpanas, T., Koubarakis, M.: Three-dimensional entity resolution with JedAI. Inf. Syst. 93, 101565 (2020)","journal-title":"Inf. Syst."},{"key":"944_CR35","doi-asserted-by":"crossref","unstructured":"Jurek-Loughrey, A.: Siamese neural network for unstructured data linkage. 
In: Proceedings of the 22nd International Conference on Information Integration and Web-Based Applications and Services, pp. 417\u2013425. ACM, New York (2020)","DOI":"10.1145\/3428757.3429106"},{"issue":"4","key":"944_CR36","doi-asserted-by":"publisher","first-page":"822","DOI":"10.1007\/s11390-021-1321-0","volume":"36","author":"C-C Sun","year":"2021","unstructured":"Sun, C.-C., Shen, D.-R.: Mixed hierarchical networks for deep entity matching. J. Comput. Sci. Technol. 36(4), 822\u2013838 (2021)","journal-title":"J. Comput. Sci. Technol."},{"key":"944_CR37","doi-asserted-by":"crossref","unstructured":"Yao, D., Gu, Y., Cong, G., Jin, H., Lv, X.: Entity resolution with hierarchical graph attention networks. In: Proceedings of the 2022 International Conference on Management of Data, pp. 429\u2013442. ACM, New York (2022)","DOI":"10.1145\/3514221.3517872"},{"key":"944_CR38","doi-asserted-by":"publisher","unstructured":"Mugeni, J.\u00a0B., Lynden, S., Amagasa, T., Matono, A.: Adapterem: pre-trained language model adaptation for generalized entity matching using adapter-tuning. In: Proceedings of the 27th International Database Engineered Applications Symposium, IDEAS \u201923, pp. 140\u2013147. Association for Computing Machinery, New York (2023). ISBN 9798400707445. https:\/\/doi.org\/10.1145\/3589462.3589498","DOI":"10.1145\/3589462.3589498"},{"key":"944_CR39","unstructured":"Trouillon, T., Welbl, J., Riedel, S., Gaussier, \u00c9., Bouchard, G.: Complex embeddings for simple link prediction. In: International Conference on Machine Learning, pp. 2071\u20132080. 
PMLR, USA (2016)"}],"container-title":["International Journal of Information Security"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10207-024-00944-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10207-024-00944-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10207-024-00944-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,12]],"date-time":"2025-02-12T05:20:23Z","timestamp":1739337623000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10207-024-00944-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,3]]},"references-count":39,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,2]]}},"alternative-id":["944"],"URL":"https:\/\/doi.org\/10.1007\/s10207-024-00944-7","relation":{},"ISSN":["1615-5262","1615-5270"],"issn-type":[{"type":"print","value":"1615-5262"},{"type":"electronic","value":"1615-5270"}],"subject":[],"published":{"date-parts":[[2024,12,3]]},"assertion":[{"value":"3 December 2024","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no Conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"This article does not contain any studies with human participants and\/or animals performed by any of the authors.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical standards"}}],"article-number":"37"}}