{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T10:46:50Z","timestamp":1776336410093,"version":"3.51.2"},"reference-count":43,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2017,10,26]],"date-time":"2017-10-26T00:00:00Z","timestamp":1508976000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"United States National Science Foundation","award":["CNS-1116991,CNS-1640374"],"award-info":[{"award-number":["CNS-1116991,CNS-1640374"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Priv. Secur."],"published-print":{"date-parts":[[2017,11,30]]},"abstract":"<jats:p>\n            <jats:italic>k<\/jats:italic>\n            -means clustering is a widely used clustering analysis technique in machine learning. In this article, we study the problem of differentially private\n            <jats:italic>k<\/jats:italic>\n            -means clustering. Several state-of-the-art methods follow the\n            <jats:italic>single-workload<\/jats:italic>\n            approach, which adapts an existing machine-learning algorithm by making each step private. However, most of them do not have satisfactory empirical performance. In this work, we develop techniques to analyze the empirical error behaviors of one of the state-of-the-art single-workload approaches, DPLloyd, which is a differentially private version of the Lloyd algorithm for\n            <jats:italic>k<\/jats:italic>\n            &gt;-means clustering. Based on the analysis, we propose an improvement of DPLloyd. We also propose a new algorithm for\n            <jats:italic>k<\/jats:italic>\n            -means clustering from the perspective of the\n            <jats:italic>noninteractive<\/jats:italic>\n            approach, which publishes a synopsis of the input dataset and then runs\n            <jats:italic>k<\/jats:italic>\n            -means on synthetic data generated from the synopsis. We denote this approach by EUGkM. After analyzing the empirical error behaviors of EUGkM, we further propose a\n            <jats:italic>hybrid<\/jats:italic>\n            approach that combines our DPLloyd improvement and EUGkM. Results from extensive and systematic experiments support our analysis and demonstrate the effectiveness of the DPLloyd improvement, EUGkM, and the hybrid approach.\n          <\/jats:p>","DOI":"10.1145\/3133201","type":"journal-article","created":{"date-parts":[[2017,10,27]],"date-time":"2017-10-27T12:48:13Z","timestamp":1509108493000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":44,"title":["Differentially Private K-Means Clustering and a Hybrid Approach to Private Optimization"],"prefix":"10.1145","volume":"20","author":[{"given":"Dong","family":"Su","sequence":"first","affiliation":[{"name":"Purdue University"}]},{"given":"Jianneng","family":"Cao","sequence":"additional","affiliation":[{"name":"Institute for Infocomm Research"}]},{"given":"Ninghui","family":"Li","sequence":"additional","affiliation":[{"name":"Purdue University"}]},{"given":"Elisa","family":"Bertino","sequence":"additional","affiliation":[{"name":"Purdue University"}]},{"given":"Min","family":"Lyu","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Anhui, China"}]},{"given":"Hongxia","family":"Jin","sequence":"additional","affiliation":[{"name":"Samsung Research America"}]}],"member":"320","published-online":{"date-parts":[[2017,10,26]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.4086\/toc.2012.v008a006"},{"key":"e_1_2_1_2_1","volume-title":"UCI Machine Learning Repository [http:\/\/archive.ics.uci.edu\/ml]","author":"Lichman M.","unstructured":"M. Lichman . 2013. UCI Machine Learning Repository [http:\/\/archive.ics.uci.edu\/ml] . Irvine, CA : University of California , School of Information and Computer Science. M. Lichman. 2013. UCI Machine Learning Repository [http:\/\/archive.ics.uci.edu\/ml]. Irvine, CA: University of California, School of Information and Computer Science."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1835804.1835869"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065167.1065184"},{"key":"e_1_2_1_5_1","unstructured":"United States Census. 1991. Topologically Integrated Geographic Encoding and Referencing. Retrieved from http:\/\/www.census.gov\/geo\/maps-data\/data\/tiger.html.  United States Census. 1991. Topologically Integrated Geographic Encoding and Referencing. Retrieved from http:\/\/www.census.gov\/geo\/maps-data\/data\/tiger.html."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/2981780.2981817"},{"key":"e_1_2_1_7_1","volume-title":"Sarwate","author":"Chaudhuri Kamalika","year":"2011","unstructured":"Kamalika Chaudhuri , Claire Monteleoni , and Anand D . Sarwate . 2011 . Differentially private empirical risk minimization. J. Mach. Learn. Res . 12 (July 2011), 1069--1109. Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. 2011. Differentially private empirical risk minimization. J. Mach. Learn. Res. 12 (July 2011), 1069--1109."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2012.16"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/773153.773173"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/11787006_1"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1866739.1866758"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/11681878_14"},{"key":"e_1_2_1_13_1","volume-title":"Privacy-Preserving Datamining on Vertically Partitioned Databases","author":"Dwork Cynthia","unstructured":"Cynthia Dwork and Kobbi Nissim . 2004. Privacy-Preserving Datamining on Vertically Partitioned Databases . Springer , Berlin , 528--544. DOI:https:\/\/doi.org\/10.1007\/978-3-540-28628-8_32 10.1007\/978-3-540-28628-8_32 Cynthia Dwork and Kobbi Nissim. 2004. Privacy-Preserving Datamining on Vertically Partitioned Databases. Springer, Berlin, 528--544. DOI:https:\/\/doi.org\/10.1007\/978-3-540-28628-8_32"},{"key":"e_1_2_1_14_1","unstructured":"Pasi Fr\u00e4nti. 2006. Clustering datasets. Retrieved from http:\/\/cs.joensuu.fi\/sipu\/datasets\/.  Pasi Fr\u00e4nti. 2006. Clustering datasets. Retrieved from http:\/\/cs.joensuu.fi\/sipu\/datasets\/."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1835804.1835868"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS\u201912)","author":"Hardt Moritz","year":"2012","unstructured":"Moritz Hardt , Katrina Ligett , and Frank McSherry . 2012 . A simple and practical algorithm for differentially private data release . In Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS\u201912) . Curran Associates, 2339--2347. Moritz Hardt, Katrina Ligett, and Frank McSherry. 2012. A simple and practical algorithm for differentially private data release. In Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS\u201912). Curran Associates, 2339--2347."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2882931"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920970"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/2986459.2986500"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/3477.764879"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/2350229.2350251"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544872"},{"key":"e_1_2_1_23_1","first-page":"2","article-title":"Least squares quantization in PCM","volume":"28","author":"Lloyd S.","year":"2006","unstructured":"S. Lloyd . 2006 . Least squares quantization in PCM . IEEE Trans. Inf. Theor. 28 , 2 (Sept. 2006), 129--137. DOI:https:\/\/doi.org\/10.1109\/TIT.1982.1056489 10.1109\/TIT.1982.1056489 S. Lloyd. 2006. Least squares quantization in PCM. IEEE Trans. Inf. Theor. 28, 2 (Sept. 2006), 129--137. DOI:https:\/\/doi.org\/10.1109\/TIT.1982.1056489","journal-title":"IEEE Trans. Inf. Theor."},{"key":"e_1_2_1_24_1","volume-title":"Introduction to Information Retrieval","author":"Manning Christopher D.","unstructured":"Christopher D. Manning , Prabhakar Raghavan , and Hinrich Sch\u00fctze . 2008. Introduction to Information Retrieval . Cambridge University Press , New York . Christopher D. Manning, Prabhakar Raghavan, and Hinrich Sch\u00fctze. 2008. Introduction to Information Retrieval. Cambridge University Press, New York."},{"key":"e_1_2_1_25_1","unstructured":"Frank McSherry. 2009. Privacy Integrated Queries (PINQ) Infrastructure. Retrieved from http:\/\/research.microsoft.com\/en-us\/downloads\/73099525-fd8d-4966-9b93-574e6023147f\/.  Frank McSherry. 2009. Privacy Integrated Queries (PINQ) Infrastructure. Retrieved from http:\/\/research.microsoft.com\/en-us\/downloads\/73099525-fd8d-4966-9b93-574e6023147f\/."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1557019.1557090"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2007.41"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1559845.1559850"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213876"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213876"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1250790.1250803"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8655(99)00069-0"},{"key":"e_1_2_1_33_1","unstructured":"Weiliang Qiu. 2015. clusterGeneration: Random Cluster Generation (with Specified Degree of Separation). Retrieved from http:\/\/cran.r-project.org\/web\/packages\/clusterGeneration\/index.html.  Weiliang Qiu. 2015. clusterGeneration: Random Cluster Generation (with Specified Degree of Separation). Retrieved from http:\/\/cran.r-project.org\/web\/packages\/clusterGeneration\/index.html."},{"key":"e_1_2_1_34_1","volume-title":"The 4th International Conference on Advances in Pattern Recognition and Digital Techniques. 137--143","author":"Ray Siddheswar","unstructured":"Siddheswar Ray and Rose H. Turi . 1999. Determination of number of clusters in K-means clustering and application in colour image segmentation . In The 4th International Conference on Advances in Pattern Recognition and Digital Techniques. 137--143 . Siddheswar Ray and Rose H. Turi. 1999. Determination of number of clusters in K-means clustering and application in colour image segmentation. In The 4th International Conference on Advances in Pattern Recognition and Digital Techniques. 137--143."},{"key":"e_1_2_1_35_1","unstructured":"Scipy.org. 2001. Scientific Computing Tools for Python. Retrieved from http:\/\/scipy.org\/.  Scipy.org. 2001. Scientific Computing Tools for Python. Retrieved from http:\/\/scipy.org\/."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1993636.1993743"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2857705.2857708"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1111\/1467-9868.00293"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2010.247"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2588573"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465330"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.14778\/2350229.2350253"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611973440.68"}],"container-title":["ACM Transactions on Privacy and Security"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3133201","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3133201","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:10:59Z","timestamp":1750212659000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3133201"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,10,26]]},"references-count":43,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2017,11,30]]}},"alternative-id":["10.1145\/3133201"],"URL":"https:\/\/doi.org\/10.1145\/3133201","relation":{},"ISSN":["2471-2566","2471-2574"],"issn-type":[{"value":"2471-2566","type":"print"},{"value":"2471-2574","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,10,26]]},"assertion":[{"value":"2016-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-10-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}