{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T19:17:16Z","timestamp":1757618236325,"version":"3.44.0"},"reference-count":50,"publisher":"Springer Science and Business Media LLC","issue":"7","license":[{"start":{"date-parts":[[2025,6,6]],"date-time":"2025-06-06T00:00:00Z","timestamp":1749168000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,6,6]],"date-time":"2025-06-06T00:00:00Z","timestamp":1749168000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100006752","name":"Universidade do Porto","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100006752","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2025,7]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Protecting user data privacy can be achieved via many methods, from statistical transformations to generative models. However, they all have critical drawbacks. For example, creating a transformed data set using traditional techniques is highly time-consuming. Also, recent deep learning-based solutions require significant computational resources in addition to long training phases, and differentially private-based solutions may undermine data utility. 
In this paper, we propose <jats:inline-formula>\n              <jats:alternatives>\n                <jats:tex-math>$$\\epsilon$$<\/jats:tex-math>\n                <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mi>\u03f5<\/mml:mi>\n                <\/mml:math>\n              <\/jats:alternatives>\n            <\/jats:inline-formula>-PrivateSMOTE, a technique designed to protect against re-identification and linkage attacks, particularly addressing cases with a high re-identification risk. Our proposal combines synthetic data generation via noise-induced interpolation with differential privacy principles to obfuscate high-risk cases. We demonstrate how <jats:inline-formula>\n              <jats:alternatives>\n                <jats:tex-math>$$\\epsilon$$<\/jats:tex-math>\n                <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mi>\u03f5<\/mml:mi>\n                <\/mml:math>\n              <\/jats:alternatives>\n            <\/jats:inline-formula>-PrivateSMOTE is capable of achieving competitive results in privacy risk and better predictive performance when compared to multiple traditional and state-of-the-art privacy-preservation methods, including generative adversarial networks, variational autoencoders, and differential privacy baselines. 
We also show how our method improves time requirements by at least a factor of 9 and is a resource-efficient solution that ensures high performance without specialised hardware.<\/jats:p>","DOI":"10.1007\/s10994-025-06799-w","type":"journal-article","created":{"date-parts":[[2025,6,6]],"date-time":"2025-06-06T10:35:34Z","timestamp":1749206134000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Differentially-private data synthetisation for efficient re-identification risk control"],"prefix":"10.1007","volume":"114","author":[{"given":"T\u00e2nia","family":"Carvalho","sequence":"first","affiliation":[]},{"given":"Nuno","family":"Moniz","sequence":"additional","affiliation":[]},{"given":"Lu\u00eds","family":"Antunes","sequence":"additional","affiliation":[]},{"given":"Nitesh","family":"Chawla","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,6,6]]},"reference":[{"key":"6799_CR1","unstructured":"Agresti, A. (1996). An introduction to categorical data analysis. Wiley."},{"key":"6799_CR2","doi-asserted-by":"crossref","unstructured":"Basha, S. J., Madala, S. R., Vivek, K., Kumar, E. S., Ammannamma, T. (2022). A Review on Imbalanced Data Classification Techniques. In 2022 International Conference on Advanced Computing Technologies and Applications (ICACTA) (pp. 1\u20136).","DOI":"10.1109\/ICACTA54488.2022.9753392"},{"issue":"1","key":"6799_CR3","first-page":"2653","volume":"18","author":"A Benavoli","year":"2017","unstructured":"Benavoli, A., Corani, G., Dem\u0161ar, J., & Zaffalon, M. (2017). Time for a change: A tutorial for comparing multiple classifiers through Bayesian analysis. The Journal of Machine Learning Research., 18(1), 2653\u20132688.","journal-title":"The Journal of Machine Learning Research."},{"key":"6799_CR4","unstructured":"Bird, T., Kingma, F.H., & Barber, D. (2020). 
Reducing the computational cost of deep generative models with binary neural networks. arXiv preprint arXiv:2010.13476."},{"key":"6799_CR5","doi-asserted-by":"crossref","unstructured":"Brickell, J., & Shmatikov, V. (2008). The cost of privacy: destruction of data-mining utility in anonymized data publishing. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 70\u201378).","DOI":"10.1145\/1401890.1401904"},{"key":"6799_CR6","doi-asserted-by":"crossref","unstructured":"Cai, K., Lei, X., Wei, J., Xiao, X. (2021). Data synthesis via differentially private markov random fields. Proceedings of VLDB Endowment 14(11), 2190\u20132202. https:\/\/doi.org\/10.14778\/3476249.3476272","DOI":"10.14778\/3476249.3476272"},{"issue":"14","key":"6799_CR7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3588765","volume":"55","author":"T Carvalho","year":"2023","unstructured":"Carvalho, T., Moniz, N., Faria, P., & Antunes, L. (2023). Survey on Privacy-Preserving Techniques for Microdata Publication. ACM Computer Survey, 55(14), 1. https:\/\/doi.org\/10.1145\/3588765","journal-title":"ACM Computer Survey"},{"issue":"6","key":"6799_CR8","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0252169","volume":"16","author":"T Carvalho","year":"2021","unstructured":"Carvalho, T., Faria, P., Antunes, L., & Moniz, N. (2021). Fundamental privacy rights in a pandemic state. PLoS ONE, 16(6), Article e0252169.","journal-title":"PLoS ONE"},{"key":"6799_CR9","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1613\/jair.953","volume":"16","author":"NV Chawla","year":"2002","unstructured":"Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. 
Journal of Artificial Intelligence Research, 16, 321\u2013357.","journal-title":"Journal of Artificial Intelligence Research"},{"key":"6799_CR10","doi-asserted-by":"crossref","unstructured":"Chen, T., & Guestrin, C. (2016). Xgboost: A scalable tree boosting system. In Proceedings of 22nd ACM International Conference on SIGKDD (pp. 785\u2013794).","DOI":"10.1145\/2939672.2939785"},{"key":"6799_CR11","doi-asserted-by":"crossref","unstructured":"Domingo-Ferrer, J. (2008). A survey of inference control methods for privacy-preserving data mining. In Privacy-preserving data mining (pp. 53\u201380). Springer.","DOI":"10.1007\/978-0-387-70992-5_3"},{"issue":"3\u20134","key":"6799_CR12","first-page":"211","volume":"9","author":"C Dwork","year":"2014","unstructured":"Dwork, C., Roth, A., et al. (2014). The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3\u20134), 211\u2013407.","journal-title":"Foundations and Trends in Theoretical Computer Science"},{"key":"6799_CR13","doi-asserted-by":"crossref","unstructured":"Dwork, C. (2008). Differential privacy: A survey of results. In International conference on theory and applications of models of computation (pp. 1\u201319). Springer.","DOI":"10.1007\/978-3-540-79228-4_1"},{"key":"6799_CR14","unstructured":"El\u00a0Emam, K., Mosquera, L., & Hoptroff, R. (2020). Practical synthetic data generation: Balancing privacy and the broad availability of data. O\u2019Reilly Media."},{"key":"6799_CR15","unstructured":"European Commission. (2023). Opinion 05\/2014 on Anonymisation Techniques. Accessed January 2023. https:\/\/ec.europa.eu\/justice\/article-29\/documentation\/opinion-recommendation\/files\/2014\/wp216_en.pdf."},{"key":"6799_CR16","doi-asserted-by":"crossref","unstructured":"Giomi, M., Boenisch, F., Wehmeyer, C., & Tasn\u00e1di, B. (2022). Anonymeter. Accessed Jun 2023. 
https:\/\/github.com\/statice\/anonymeter.","DOI":"10.56553\/popets-2023-0055"},{"key":"6799_CR17","doi-asserted-by":"crossref","unstructured":"Giomi, M., Boenisch, F., Wehmeyer, C., Tasn\u00e1di, B. (2022). A Unified Framework for Quantifying Privacy Risk in Synthetic Data. arXiv preprint arXiv:2211.10459.","DOI":"10.56553\/popets-2023-0055"},{"key":"6799_CR18","doi-asserted-by":"crossref","unstructured":"Han, H., Wang, W.Y., Mao, B.H. (2005). Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In International conference on intelligent computing (pp. 878\u2013887). Springer.","DOI":"10.1007\/11538059_91"},{"key":"6799_CR19","doi-asserted-by":"publisher","first-page":"4069","DOI":"10.1109\/TSP.2020.3006760","volume":"68","author":"J He","year":"2020","unstructured":"He, J., Cai, L., & Guan, X. (2020). Differential private noise adding mechanism and its application on consensus algorithm. IEEE Transactions on Signal Processing, 68, 4069\u20134082.","journal-title":"IEEE Transactions on Signal Processing"},{"key":"6799_CR20","unstructured":"Holohan, N., Antonatos, S., Braghin, S., & Mac\u00a0Aonghusa, P. (2017). ($$k$$, $$\\epsilon$$)-Anonymity: $$k$$-Anonymity with $$\\epsilon$$-Differential Privacy. arXiv preprint arXiv:1710.01615."},{"key":"6799_CR21","doi-asserted-by":"crossref","unstructured":"Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832\u2013844.","DOI":"10.1109\/34.709601"},{"key":"6799_CR22","unstructured":"Imbalanced-learn developers.: Imbalanced learn. Accessed Jan 2023. https:\/\/imbalanced-learn.org\/stable\/index.html."},{"key":"6799_CR23","unstructured":"Jordon, J., Yoon, J., Van Der\u00a0Schaar, M. (2018). PATE-GAN: Generating synthetic data with differential privacy guarantees. In International conference on learning representations."},{"key":"6799_CR24","unstructured":"Kruschke, J., & Liddell T. 
The bayesian new statistics: Two historical trends converge. SSRN Electronic Journal."},{"key":"6799_CR25","unstructured":"Long, Y., Bindschaedler, V., & Gunter, C.A. (2017). Towards measuring membership privacy. arXiv preprint arXiv:1712.09136."},{"key":"6799_CR26","unstructured":"Mahiou, S., Xu, K., & Ganev, G. (2022). dpart: Differentially Private Autoregressive Tabular, a General Framework for Synthetic Data Generation. arXiv preprint arXiv:2207.05810."},{"key":"6799_CR27","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1007\/BF02478259","volume":"5","author":"W Mcculloch","year":"1943","unstructured":"Mcculloch, W., & Pitts, W. (1943). A Logical Calculus of Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics., 5, 127\u2013147.","journal-title":"Bulletin of Mathematical Biophysics."},{"key":"6799_CR28","doi-asserted-by":"crossref","unstructured":"Muralidhar, K., Domingo-Ferrer, J., & Mart\u00ednez, S. (2020). $$\\epsilon$$-Differential Privacy for Microdata Releases Does Not Guarantee Confidentiality (Let Alone Utility). In International Conference on Privacy in Statistical Databases (pp. 21\u201331). Springer.","DOI":"10.1007\/978-3-030-57521-2_2"},{"key":"6799_CR29","unstructured":"Nikolenko, S.I. (2019). Synthetic data for deep learning. arXiv preprint arXiv:1909.11512."},{"key":"6799_CR30","doi-asserted-by":"crossref","unstructured":"Patki, N., Wedge, R., & Veeramachaneni, K. (2016). The Synthetic Data Vault. In 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) (pp. 399\u2013410).","DOI":"10.1109\/DSAA.2016.49"},{"key":"6799_CR31","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011). Scikit-learn: Machine learning in Python. 
The Journal of machine Learning research., 12, 2825\u20132830.","journal-title":"The Journal of machine Learning research."},{"issue":"7","key":"6799_CR32","first-page":"1277","volume":"50","author":"F Prasser","year":"2020","unstructured":"Prasser, F., Eicher, J., Spengler, H., Bild, R., & Kuhn, K. A. (2020). Flexible data anonymization using ARX-Current status and challenges ahead. Software: Practice and Experience, 50(7), 1277\u20131304.","journal-title":"Software: Practice and Experience"},{"issue":"5","key":"6799_CR33","doi-asserted-by":"publisher","first-page":"441","DOI":"10.22266\/ijies2020.1031.39","volume":"13","author":"GA Pradipta","year":"2020","unstructured":"Pradipta, G. A., Wardoyo, R., Musdholifah, A., & Sanjaya, I. N. H. (2020). Improving classifiaction performance of fetal umbilical cord using combination of SMOTE method and multiclassifier voting in imbalanced data and small dataset. International Journal of Intelligent Engineering and Systems, 13(5), 441\u2013454.","journal-title":"International Journal of Intelligent Engineering and Systems"},{"key":"6799_CR34","unstructured":"Qian, Z., Cebere, B.C., & van\u00a0der Schaar, M.: Synthcity: facilitating innovative use cases of synthetic data in different data modalities. arxiv:org\/abs\/2301.07573."},{"key":"6799_CR35","unstructured":"Rastogi, V., Suciu, D., & Hong, S. (2007). The boundary between privacy and utility in data publishing. In Proceedings of the 33rd international conference on Very large data bases (pp. 531\u2013542)."},{"issue":"6","key":"6799_CR36","doi-asserted-by":"publisher","first-page":"1010","DOI":"10.1109\/69.971193","volume":"13","author":"P Samarati","year":"2001","unstructured":"Samarati, P. (2001). Protecting respondents identities in microdata release. 
IEEE Transactions on Knowledge and Data Engineering, 13(6), 1010\u20131027.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"issue":"5","key":"6799_CR37","doi-asserted-by":"publisher","first-page":"771","DOI":"10.1007\/s00778-014-0351-4","volume":"23","author":"J Soria-Comas","year":"2014","unstructured":"Soria-Comas, J., Domingo-Ferrer, J., S\u00e1nchez, D., & Mart\u00ednez, S. (2014). Enhancing data utility in differential privacy via microaggregation-based k-anonymity. The VLDB Journal, 23(5), 771\u2013794.","journal-title":"The VLDB Journal"},{"key":"6799_CR38","doi-asserted-by":"crossref","unstructured":"Spelmen, V. S., Porkodi, R. A., & Review on handling imbalanced data. (2018). International conference on current trends towards converging technologies (ICCTCT). IEEE, 2018, 1\u201311.","DOI":"10.1109\/ICCTCT.2018.8551020"},{"key":"6799_CR39","unstructured":"Stadler, T., Oprisanu, B., & Troncoso, C. (2022) Synthetic data\u2013anonymisation groundhog day. In 31st USENIX Security Symposium (USENIX Security 22) (pp. 1451\u20131468)."},{"key":"6799_CR40","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2021.107965","volume":"118","author":"AN Tarekegn","year":"2021","unstructured":"Tarekegn, A. N., Giacobini, M., & Michalak, K. (2021). A review of methods for imbalanced multi-label classification. Pattern Recognition, 118, Article 107965.","journal-title":"Pattern Recognition"},{"key":"6799_CR41","doi-asserted-by":"crossref","unstructured":"Torkzadehmahani, R., Kairouz, P., & Paten, B. (2019) Dp-cgan: Differentially private synthetic data and label generation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops.","DOI":"10.1109\/CVPRW.2019.00018"},{"key":"6799_CR42","unstructured":"University of Pennsylvania. (2023). Lecture1: Introduction to Differential Privacy and the Laplace Mechanism. Accessed Jun 2023. 
https:\/\/www.cis.upenn.edu\/~aaroth\/chatgpt_lecture_notes.pdf."},{"key":"6799_CR43","unstructured":"van Breugel, B., Sun, H., Qian, Z., & van\u00a0der Schaar, M. (2023). Membership inference attacks against synthetic data through overfitting detection. arXiv preprint arXiv:2302.12580."},{"issue":"2","key":"6799_CR44","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1145\/2641190.2641198","volume":"15","author":"J Vanschoren","year":"2013","unstructured":"Vanschoren, J., van Rijn, J. N., Bischl, B., & Torgo, L. (2013). OpenML: Networked Science in Machine Learning. SIGKDD Explorations, 15(2), 49\u201360. https:\/\/doi.org\/10.1145\/2641190.2641198","journal-title":"SIGKDD Explorations"},{"key":"6799_CR45","unstructured":"Weng, C.G., & Poon, J. (2008). A new evaluation measure for imbalanced datasets. In Proceedings of the 7th Australasian Data Mining Conference (Vol. 87, pp. 27\u201332)."},{"key":"6799_CR46","unstructured":"Xie, L., Lin, K., Wang, S., Wang, F., & Zhou, J. (2018). Differentially private generative adversarial network. arXiv preprint arXiv:1802.06739."},{"key":"6799_CR47","first-page":"1","volume":"32","author":"L Xu","year":"2019","unstructured":"Xu, L., Skoularidou, M., Cuesta-Infante, A., & Veeramachaneni, K. (2019). Modeling tabular data using conditional gan. Advances in Neural Information Processing Systems, 32, 1.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"6799_CR48","doi-asserted-by":"crossref","unstructured":"Yale, A., Dash, S., Dutta, R., Guyon, I., Pavao, A., Bennett, K.P. (2019). Assessing privacy and quality of synthetic health data. In Proceedings of the Conference on Artificial Intelligence for Data Discovery and Reuse (pp. 1\u20134).","DOI":"10.1145\/3359115.3359124"},{"key":"6799_CR49","unstructured":"Zhang, Z., Wang, T., Li, N., Honorio, J., Backes, M., He, S., et\u00a0al. (2021). PrivSyn: Differentially Private Data Synthesis. In 30th usenix security symposium (usenix security 21). 
USENIX ASSOCIATION (pp. 929\u2013946). Available from: https:\/\/www.usenix.org\/conference\/usenixsecurity21\/presentation\/zhang-zhikun."},{"issue":"4","key":"6799_CR50","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3134428","volume":"42","author":"J Zhang","year":"2017","unstructured":"Zhang, J., Cormode, G., Procopiuc, C. M., Srivastava, D., & Xiao, X. (2017). Privbayes: Private data release via bayesian networks. ACM Transactions on Database Systems (TODS), 42(4), 1\u201341.","journal-title":"ACM Transactions on Database Systems (TODS)"}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-025-06799-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-025-06799-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-025-06799-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,6]],"date-time":"2025-09-06T19:04:19Z","timestamp":1757185459000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-025-06799-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,6]]},"references-count":50,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2025,7]]}},"alternative-id":["6799"],"URL":"https:\/\/doi.org\/10.1007\/s10994-025-06799-w","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"type":"print","value":"0885-6125"},{"type":"electronic","value":"1573-0565"}],"subject":[],"published":{"date-parts":[[2025,6,6]]},"assertion":[{"value":"12 February 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 
July 2024","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 May 2025","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 June 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"I declare that the authors have no Conflict of interest as defined by Springer, or other interests that might be perceived to influence the results and\/or discussion reported in this paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics Approval"}},{"value":"Not applicable.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to Participate"}},{"value":"All of the material is owned by the authors and\/or no permissions are required.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for Publication"}},{"value":"The code of the proposed method is available at .","order":6,"name":"Ethics","group":{"name":"EthicsHeading","label":"Code Availability"}}],"article-number":"164"}}