{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T01:04:19Z","timestamp":1759971859733,"version":"build-2065373602"},"reference-count":45,"publisher":"Public Library of Science (PLoS)","issue":"10","license":[{"start":{"date-parts":[[2025,10,8]],"date-time":"2025-10-08T00:00:00Z","timestamp":1759881600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Funda\u00e7\u00e3o para a Ci\u00eancia e Tecnologia","award":["UID\/50008"],"award-info":[{"award-number":["UID\/50008"]}]},{"name":"Funda\u00e7\u00e3o para a Ci\u00eancia e Tecnologia","award":["UID\/50008"],"award-info":[{"award-number":["UID\/50008"]}]},{"name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","award":["UID\/06522\/2023"],"award-info":[{"award-number":["UID\/06522\/2023"]}]}],"content-domain":{"domain":["www.plosone.org"],"crossmark-restriction":false},"short-container-title":["PLoS One"],"abstract":"<jats:p>This work investigates the trade-off between data anonymization and utility, particularly focusing on the implications for equity-related research in education. Using microdata from the 2019 Brazilian National Student Performance Exam (ENADE), the study applies the (\u03b5, \u03b4)-Differential Privacy model to explore the impact of anonymization on the dataset\u2019s utility for socio-educational equity analysis. By clustering both the original and anonymized datasets, the research evaluates how group categories related to students\u2019 sociodemographic variables, such as gender, race, income, and parental education, are affected by the anonymization process. 
The results reveal that while anonymization techniques can preserve overall data structure, they can also lead to the suppression or misrepresentation of minority groups, introducing biases that may jeopardise the promotion of educational equity. This finding highlights the importance of involving domain experts in the interpretation of anonymized data, particularly in studies aimed at reducing socio-economic inequalities. The study concludes that careful attention is needed to prevent anonymization efforts from distorting key group categories, which could undermine the validity of data-driven policies aimed at promoting equity.<\/jats:p>","DOI":"10.1371\/journal.pone.0332441","type":"journal-article","created":{"date-parts":[[2025,10,8]],"date-time":"2025-10-08T17:24:59Z","timestamp":1759944299000},"page":"e0332441","update-policy":"https:\/\/doi.org\/10.1371\/journal.pone.corrections_policy","source":"Crossref","is-referenced-by-count":0,"title":["Subtle biases introduced in equity studies through data anonymization"],"prefix":"10.1371","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6054-7188","authenticated-orcid":true,"given":"Paulo","family":"Fazendeiro","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3072-0186","authenticated-orcid":true,"given":"Paula","family":"Prata","sequence":"additional","affiliation":[]},{"given":"Maria Eug\u00e9nia","family":"Ferr\u00e3o","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2025,10,8]]},"reference":[{"issue":"4","key":"pone.0332441.ref001","doi-asserted-by":"crossref","first-page":"1185","DOI":"10.1007\/s13524-016-0484-8","article-title":"Racial inequality in education in Brazil: a twins fixed-effects approach","volume":"53","author":"LJ Marteleto","year":"2016","journal-title":"Demography"},{"issue":"1","key":"pone.0332441.ref002","first-page":"41","article-title":"Inequality of opportunities and educational outcomes in 
Brazil","volume":"54","author":"CAC Ribeiro","year":"2011","journal-title":"Dados"},{"issue":"1","key":"pone.0332441.ref003","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1332\/174426421X16149632470114","article-title":"Understanding knowledge brokerage and its transformative potential: a Bourdieusian perspective","volume":"18","author":"S Chew","year":"2022","journal-title":"Evid Policy"},{"issue":"2","key":"pone.0332441.ref004","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1111\/ejed.12550","article-title":"Policies for inclusive education practices in teacher education in the United Kingdom and France","volume":"58","author":"R Malet","year":"2023","journal-title":"Eur J Educ"},{"issue":"1","key":"pone.0332441.ref005","doi-asserted-by":"crossref","first-page":"254","DOI":"10.3390\/su11010254","article-title":"About the triggering of UN sustainable development goals and regenerative sustainability in higher education","volume":"11","author":"G Sonetti","year":"2019","journal-title":"Sustainability"},{"key":"pone.0332441.ref006","doi-asserted-by":"crossref","DOI":"10.1177\/23328584221121344","article-title":"Conceptions of Educational Equity","volume":"8","author":"M Levinson","year":"2022","journal-title":"AERA Open"},{"key":"pone.0332441.ref007","volume-title":"Handbook on European data protection law \u2013 2018 edition [Internet]","author":"of Europe C, of Human Rights EC, Supervisor EDP, for Fundamental Rights EUA","year":"2018"},{"issue":"1","key":"pone.0332441.ref008","doi-asserted-by":"crossref","first-page":"117","DOI":"10.2307\/799885","article-title":"Resistance to community surveys","volume":"18","author":"E Josephson","year":"1970","journal-title":"Soc Probl"},{"issue":"409","key":"pone.0332441.ref009","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1080\/01621459.1990.10475304","article-title":"Disclosure control of microdata","volume":"85","author":"JG Bethlehem","year":"1990","journal-title":"J Am Stat 
Assoc"},{"issue":"1","key":"pone.0332441.ref010","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1111\/j.1467-9574.1992.tb01327.x","article-title":"Disclosure risk for microdata stemming from official statistics","volume":"46","author":"U Blien","year":"1992","journal-title":"Stat Neerl"},{"issue":"3","key":"pone.0332441.ref011","first-page":"329","article-title":"Finding a needle in a haystack","volume":"2","author":"T Dalenius","year":"1986","journal-title":"J Off Stat"},{"issue":"1","key":"pone.0332441.ref012","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1111\/j.1467-9574.1992.tb01325.x","article-title":"Strategies for measuring risk in public use microdata files","volume":"46","author":"BV Greenberg","year":"1992","journal-title":"Stat Neerl"},{"issue":"1","key":"pone.0332441.ref013","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1111\/j.1467-9574.1992.tb01324.x","article-title":"On identification disclosure and prediction disclosure for microdata","volume":"46","author":"CJ Skinner","year":"1992","journal-title":"Stat Neerl"},{"issue":"1","key":"pone.0332441.ref014","doi-asserted-by":"crossref","DOI":"10.1186\/s40537-019-0177-4","article-title":"Big Data and discrimination: perils, promises and solutions. 
A systematic review","volume":"6","author":"M Favaretto","year":"2019","journal-title":"J Big Data"},{"issue":"1","key":"pone.0332441.ref015","doi-asserted-by":"crossref","first-page":"456","DOI":"10.1038\/s41597-022-01561-6","article-title":"Utility-driven assessment of anonymized data via clustering","volume":"9","author":"ME Ferr\u00e3o","year":"2022","journal-title":"Sci Data"},{"key":"pone.0332441.ref016","first-page":"1","article-title":"Data Anonymization: K-anonymity Sensitivity Analysis.","volume-title":"2020 15th Iberian Conference on Information Systems and Technologies (CISTI) [Internet]","author":"W Santos","year":"2020"},{"issue":"1","key":"pone.0332441.ref017","article-title":"Ethics and privacy as enablers of learning analytics","volume":"3","author":"D Gasevic","year":"2016","journal-title":"J Learn Anal"},{"issue":"1","key":"pone.0332441.ref018","article-title":"Guest editorial: ethics and privacy in learning analytics","volume":"3","author":"R Ferguson","year":"2016","journal-title":"J Learn Anal"},{"issue":"3","key":"pone.0332441.ref019","doi-asserted-by":"crossref","first-page":"169","DOI":"10.18608\/jla.2022.7751","article-title":"Privacy in LA research","volume":"9","author":"O Viberg","year":"2022","journal-title":"J Learn Anal"},{"issue":"2","key":"pone.0332441.ref020","doi-asserted-by":"crossref","first-page":"83","DOI":"10.18608\/jla.2021.7353","article-title":"De-identification is insufficient to protect student privacy, or \u2013 what can a field trip reveal?","volume":"8","author":"E Yacobson","year":"2021","journal-title":"J Learn Anal"},{"key":"pone.0332441.ref021","unstructured":"INEP - Instituto Nacional de Estudos e Pesquisas Educacionais An\u00edsio Teixeira. ANRESC (Prova Brasil) [Internet]. 2023. 
Available from: https:\/\/www.gov.br\/inep\/pt-br\/acesso-a-informacao\/dados-abertos\/microdados"},{"key":"pone.0332441.ref022","article-title":"Exame Nacional de Desempenho de Estudantes (Enade): Tend\u00eancias da produ\u00e7\u00e3o cient\u00edfica brasileira (2004-2018)","volume":"30","author":"ADO Fernandes","year":"2022","journal-title":"Educ Policy Anal Arch"},{"key":"pone.0332441.ref023","first-page":"130","article-title":"Anonymized Data Assessment via Analysis of Variance: An Application to Higher Education Evaluation.","volume-title":"Lecture Notes in Computer Science","author":"ME Ferr\u00e3o","year":"2023"},{"key":"pone.0332441.ref024","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-57959-7","volume-title":"The EU General Data Protection Regulation (GDPR) [Internet]","author":"P Voigt","year":"2017"},{"issue":"1","key":"pone.0332441.ref025","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1109\/TLT.2016.2607747","article-title":"Privacy-preserving learning analytics: challenges and techniques","volume":"10","author":"ME Gursoy","year":"2017","journal-title":"IEEE Trans Learning Technol"},{"key":"pone.0332441.ref026","doi-asserted-by":"crossref","first-page":"10562","DOI":"10.1109\/ACCESS.2017.2706947","article-title":"Privacy-preserving data mining: methods, metrics, and applications","volume":"5","author":"R Mendes","year":"2017","journal-title":"IEEE Access"},{"issue":"6","key":"pone.0332441.ref027","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3460427","article-title":"A comprehensive survey of privacy-preserving federated learning","volume":"54","author":"X Yin","year":"2021","journal-title":"ACM Comput Surv"},{"issue":"6","key":"pone.0332441.ref028","doi-asserted-by":"crossref","first-page":"1010","DOI":"10.1109\/69.971193","article-title":"Protecting respondents identities in microdata release","volume":"13","author":"P Samarati","year":"2001","journal-title":"IEEE Trans Knowl Data 
Eng"},{"issue":"05","key":"pone.0332441.ref029","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1142\/S0218488502001648","article-title":"k-Anonymity: a model for protecting privacy","volume":"10","author":"L Sweeney","year":"2002","journal-title":"Int J Unc Fuzz Knowl Based Syst"},{"key":"pone.0332441.ref030","doi-asserted-by":"crossref","first-page":"102193","DOI":"10.1016\/j.is.2023.102193","article-title":"Real-world K-Anonymity applications: The KGen approach and its evaluation in fraudulent transactions","volume":"115","author":"D De Pascale","year":"2023","journal-title":"Inf Syst"},{"key":"pone.0332441.ref031","doi-asserted-by":"crossref","first-page":"012279","DOI":"10.1088\/1757-899X\/225\/1\/012279","article-title":"An extensive study on data anonymization algorithms based on K-Anonymity","volume":"225","author":"MS Simi","year":"2017","journal-title":"IOP Conf Ser: Mater Sci Eng"},{"key":"pone.0332441.ref032","first-page":"1","article-title":"Differential privacy","author":"C Dwork","year":"2006","journal-title":"Lecture Notes in Computer Science"},{"issue":"2","key":"pone.0332441.ref033","article-title":"Differential privacy in practice: Expose your Epsilons!","volume":"9","author":"C Dwork","year":"2019","journal-title":"J Priv Confidentiality"},{"key":"pone.0332441.ref034","doi-asserted-by":"crossref","unstructured":"Dwork C, Kenthapadi K, McSherry F, Mironov I, Naor M. Our Data, Ourselves: Privacy Via Distributed Noise Generation. 2006. pp. 486\u2013503. Available from: http:\/\/link.springer.com\/10.1007\/11761679_29","DOI":"10.1007\/11761679_29"},{"key":"pone.0332441.ref035","unstructured":"Near J, Darais D. Guidelines for Evaluating Differential Privacy Guarantees [Internet]. 2023. 
Available from: https:\/\/nvlpubs.nist.gov\/nistpubs\/SpecialPublications\/NIST.SP.800-226.ipd.pdf"},{"key":"pone.0332441.ref036","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/978-3-540-78478-4_1","article-title":"An Ad Omnia Approach to Defining and Achieving Private Data Analysis.","volume-title":"Privacy, Security, and Trust in KDD [Internet]","author":"C Dwork","year":"2008"},{"key":"pone.0332441.ref037","first-page":"88","article-title":"On syntactic anonymity and differential privacy.","volume-title":"2013 IEEE 29th International Conference on Data Engineering Workshops (ICDEW) [Internet]","author":"C Clifton","year":"2013"},{"issue":"1","key":"pone.0332441.ref038","first-page":"67","article-title":"SafePub: a truthful data anonymization algorithm with strong privacy guarantees","volume":"2018","author":"R Bild","year":"2018","journal-title":"Proc Priv Enhancing Techno"},{"issue":"7","key":"pone.0332441.ref039","doi-asserted-by":"crossref","first-page":"1277","DOI":"10.1002\/spe.2812","article-title":"Flexible data anonymization using ARX\u2014Current status and challenges ahead","volume":"50","author":"F Prasser","year":"2020","journal-title":"Softw Pract Exp"},{"issue":"1","key":"pone.0332441.ref040","doi-asserted-by":"crossref","DOI":"10.1186\/s40537-018-0124-9","article-title":"Differential privacy: its technological prescriptive using big data","volume":"5","author":"P Jain","year":"2018","journal-title":"J Big Data"},{"issue":"24","key":"pone.0332441.ref041","doi-asserted-by":"crossref","first-page":"7030","DOI":"10.3390\/s20247030","article-title":"A comprehensive survey on local differential privacy toward data statistics and analysis","volume":"20","author":"T Wang","year":"2020","journal-title":"Sensors (Basel)"},{"issue":"3","key":"pone.0332441.ref042","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3168389","article-title":"Technical privacy metrics","volume":"51","author":"I 
Wagner","year":"2018","journal-title":"ACM Comput Surv"},{"key":"pone.0332441.ref043","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1002\/9780470724163.ch7","volume-title":"Handbook of Granular Computing","author":"P Fazendeiro","year":"2008"},{"key":"pone.0332441.ref044","first-page":"21","article-title":"Clustering large data sets with mixed numeric and categorical values.","volume-title":"First Pacific Asia Knowledge Discovery and Data Mining Conference [Internet]","author":"ZX Huang","year":"1997"},{"issue":"1","key":"pone.0332441.ref045","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1016\/j.patcog.2012.07.021","article-title":"An extensive comparative study of cluster validity indices","volume":"46","author":"O Arbelaitz","year":"2013","journal-title":"Pattern Recognit"}],"container-title":["PLOS One"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pone.0332441","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,8]],"date-time":"2025-10-08T17:25:06Z","timestamp":1759944306000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pone.0332441"}},"subtitle":[],"editor":[{"given":"Micah","family":"Altman","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,10,8]]},"references-count":45,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2025,10,8]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pone.0332441","relation":{},"ISSN":["1932-6203"],"issn-type":[{"value":"1932-6203","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,8]]}}}