{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:02:39Z","timestamp":1750309359693,"version":"3.41.0"},"reference-count":39,"publisher":"Association for Computing Machinery (ACM)","issue":"9","license":[{"start":{"date-parts":[[2024,11,20]],"date-time":"2024-11-20T00:00:00Z","timestamp":1732060800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2024,11,30]]},"abstract":"<jats:p>\n            Large-scale learning algorithms are essential for modern data collections that may have billions of data points. Here, we study the design of parallel\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(k\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            -clustering algorithms, which include the\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(k\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            -median,\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(k\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            -medoids, and\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(k\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            -means clustering problems. We design efficient parallel algorithms for these problems and prove that they still compute constant-factor approximations to the optimal solution for stable clustering instances. In addition to our theoretic results, we present computational experiments that show that our\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(k\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            -median and\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(k\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            -means algorithms work well in practice\u2014we are able to find better clusterings than state-of-the-art coreset constructions using samples of the same size.\n          <\/jats:p>","DOI":"10.1145\/3674508","type":"journal-article","created":{"date-parts":[[2024,7,20]],"date-time":"2024-07-20T12:57:40Z","timestamp":1721480260000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Large-Scale K-Clustering"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7518-8242","authenticated-orcid":false,"given":"Konstantin","family":"Voevodski","sequence":"first","affiliation":[{"name":"Google Inc., Mountain View, CA, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,11,20]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"10","article-title":"Streaming K-Means Approximation","author":"Ailon Nir","year":"2009","unstructured":"Nir Ailon, Ragesh Jaiswal, and Claire Monteleoni. 2009. Streaming K-Means Approximation. In NIPS, 10\u201318.","journal-title":"NIPS"},{"key":"e_1_3_1_3_2","first-page":"1027","article-title":"K-Means++: The Advantages of Careful Seeding","author":"Arthur David","year":"2007","unstructured":"David Arthur and Sergei Vassilvitskii. 2007. K-Means++: The Advantages of Careful Seeding. In SODA, 1027\u20131035.","journal-title":"SODA"},{"doi-asserted-by":"publisher","key":"e_1_3_1_4_2","DOI":"10.1145\/380752.380755"},{"key":"e_1_3_1_5_2","first-page":"3216","article-title":"Clustering with Same-Cluster Queries","author":"Ashtiani Hassan","year":"2016","unstructured":"Hassan Ashtiani, Shrinu Kushagra, and Shai Ben-David. 2016. Clustering with Same-Cluster Queries. In NIPS, 3216\u20133224.","journal-title":"NIPS"},{"doi-asserted-by":"publisher","key":"e_1_3_1_6_2","DOI":"10.1016\/j.ipl.2011.10.006"},{"doi-asserted-by":"publisher","key":"e_1_3_1_7_2","DOI":"10.1145\/2835776.2835829"},{"key":"e_1_3_1_8_2","first-page":"1119","article-title":"Scalable K-Means Clustering via Lightweight Coresets","author":"Bachem Olivier","year":"2018","unstructured":"Olivier Bachem, Mario Lucic, and Andreas Krause. 2018. Scalable K-Means Clustering via Lightweight Coresets. In KDD, 1119\u20131127.","journal-title":"KDD"},{"key":"e_1_3_1_9_2","first-page":"622","article-title":"Scalable K-Means++","author":"Bahmani Bahman","year":"2012","unstructured":"Bahman Bahmani, Benjamin Moseley, Andrea Vattani, Ravi Kumar, and Sergei Vassilvitskii. 2012. Scalable K-Means++. In VLDB, 622\u2013633.","journal-title":"VLDB"},{"key":"e_1_3_1_10_2","first-page":"671","article-title":"A Discriminative Framework for Clustering via Similarity Functions","author":"Balcan Maria Florina","year":"2008","unstructured":"Maria Florina Balcan, Avrim Blum, and Santosh Vempala. 2008. A Discriminative Framework for Clustering via Similarity Functions. In STOC, 671\u2013680.","journal-title":"STOC"},{"key":"e_1_3_1_11_2","first-page":"1995","article-title":"Distributed K-Means and K-Median Clustering on General Topologies","author":"Balcan Maria Fiorina","year":"2013","unstructured":"Maria Fiorina Balcan, Steven Ehrlich, and Yingyu Liang. 2013. Distributed K-Means and K-Median Clustering on General Topologies. In NIPS, 1995\u20132003.","journal-title":"NIPS"},{"doi-asserted-by":"publisher","key":"e_1_3_1_12_2","DOI":"10.1145\/3381424"},{"key":"e_1_3_1_13_2","first-page":"63","article-title":"Clustering under Perturbation Resilience","author":"Balcan Maria Florina","year":"2012","unstructured":"Maria Florina Balcan and Yingyu Liang. 2012. Clustering under Perturbation Resilience. In ICALP, 63\u201374.","journal-title":"ICALP"},{"key":"e_1_3_1_14_2","first-page":"6867","article-title":"Affinity Clustering: Hierarchical Clustering at Scale","author":"Bateni Mohammad Hossein","year":"2017","unstructured":"Mohammad Hossein Bateni, Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi, Raimondas Kiveris, Silvio Lattanzi, and Vahab Mirrokni. 2017. Affinity Clustering: Hierarchical Clustering at Scale. In NIPS, 6867\u20136877.","journal-title":"NIPS"},{"doi-asserted-by":"publisher","key":"e_1_3_1_15_2","DOI":"10.1016\/j.tcs.2014.09.025"},{"doi-asserted-by":"publisher","key":"e_1_3_1_16_2","DOI":"10.1006\/jcss.2002.1882"},{"doi-asserted-by":"publisher","key":"e_1_3_1_17_2","DOI":"10.1145\/780542.780548"},{"key":"e_1_3_1_18_2","first-page":"1038","article-title":"Towards Optimal Lower Bounds for K-Median and K-Means Coresets","author":"Cohen-Addad Vincent","year":"2022","unstructured":"Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, and Chris Schwiegelshohn. 2022. Towards Optimal Lower Bounds for K-Median and K-Means Coresets. In STOC, 1038\u20131051.","journal-title":"STOC"},{"key":"e_1_3_1_19_2","first-page":"2679","article-title":"Improved Coresets for Euclidean K-Means","author":"Cohen-Addad Vincent","year":"2022","unstructured":"Vincent Cohen-Addad, Kasper Green Larsen, David Saulpic, Chris Schwiegelshohn, and Ali Sheikh-Omar. 2022. Improved Coresets for Euclidean K-Means. In NIPS, 2679\u20132694.","journal-title":"NIPS"},{"key":"e_1_3_1_20_2","first-page":"20333","article-title":"Parallel and Efficient Hierarchical K-Median Clustering","author":"Cohen-Addad Vincent","year":"2021","unstructured":"Vincent Cohen-Addad, Silvio Lattanzi, Ashkan Norouzi-Fard, Christian Sohler, and Ola Svensson. 2021. Parallel and Efficient Hierarchical K-Median Clustering. In NIPS, 20333\u201320345.","journal-title":"NIPS"},{"key":"e_1_3_1_21_2","first-page":"396","article-title":"Sublinear-Time Approximation for Clustering Via Random Sampling","author":"Czumaj Artur","year":"2004","unstructured":"Artur Czumaj and Christian Sohler. 2004. Sublinear-Time Approximation for Clustering Via Random Sampling. In ICALP, 396\u2013407.","journal-title":"ICALP"},{"unstructured":"D. Dheeru and E. K. Taniskidou. 2017. UCI Machine Learning Repository. Retrieved from http:\/\/archive.ics.uci.edu\/ml","key":"e_1_3_1_22_2"},{"key":"e_1_3_1_23_2","first-page":"681","article-title":"Fast Clustering using MapReduce","author":"Ene Alina","year":"2011","unstructured":"Alina Ene, Sungjin Im, and Benjamin Moseley. 2011. Fast Clustering using MapReduce. In KDD, 681\u2013689.","journal-title":"KDD"},{"key":"e_1_3_1_24_2","first-page":"569","article-title":"A Unified Framework for Approximating and Clustering Data","author":"Feldman Dan","year":"2011","unstructured":"Dan Feldman and Michael Langberg. 2011. A Unified Framework for Approximating and Clustering Data. In STOC, 569\u2013578.","journal-title":"STOC"},{"key":"e_1_3_1_25_2","first-page":"43","article-title":"Greedy and Local Ratio Algorithms in the MapReduce Model","author":"Harvey Nicholas J. A.","year":"2018","unstructured":"Nicholas J. A. Harvey, Christopher Liaw, and Paul Liu. 2018. Greedy and Local Ratio Algorithms in the MapReduce Model. In SPAA, 43\u201352.","journal-title":"SPAA"},{"unstructured":"Lingxiao Huang Jian Li and Xuan Wu. 2024. On Optimal Coreset Construction for Euclidean \\((k z)\\) -Clustering. arxiv:2211.11923. Retrieved from https:\/\/arxiv.org\/abs\/2211.11923","key":"e_1_3_1_26_2"},{"doi-asserted-by":"publisher","key":"e_1_3_1_27_2","DOI":"10.1145\/375827.375845"},{"doi-asserted-by":"publisher","key":"e_1_3_1_28_2","DOI":"10.1016\/j.comgeo.2004.03.003"},{"unstructured":"KDD Cup 2004. KDD Cup 2004: Particle Physics; Plus Protein Homology Prediction. Retrieved from https:\/\/www.kdd.org\/kdd-cup\/view\/kdd-cup-2004\/Data","key":"e_1_3_1_29_2"},{"key":"e_1_3_1_30_2","first-page":"1975","article-title":"Consistent K-Clustering","author":"Lattanzi Silvio","year":"2017","unstructured":"Silvio Lattanzi and Sergei Vassilvitskii. 2017. Consistent K-Clustering. In ICML, 1975\u20131984.","journal-title":"ICML"},{"key":"e_1_3_1_31_2","first-page":"901","article-title":"Approximating K-Median Via Pseudo-Approximation","author":"Li Shi","year":"2013","unstructured":"Shi Li and Ola Svensson. 2013. Approximating K-Median Via Pseudo-Approximation. In STOC, 901\u2013910.","journal-title":"STOC"},{"doi-asserted-by":"publisher","key":"e_1_3_1_32_2","DOI":"10.1609\/aaai.v35i10.17037"},{"doi-asserted-by":"publisher","key":"e_1_3_1_33_2","DOI":"10.1007\/s11263-022-01639-z"},{"key":"e_1_3_1_34_2","first-page":"160","article-title":"Training Gaussian Mixture Models at Scale via Coresets","volume":"18","author":"Lucic Mario","year":"2018","unstructured":"Mario Lucic, Matthew Faulkner, Andreas Krause, and Dan Feldman. 2018. Training Gaussian Mixture Models at Scale via Coresets. J. Machine Learning Res. 18, 160 (2018), 1\u201325.","journal-title":"J. Machine Learning Res."},{"key":"e_1_3_1_35_2","first-page":"1063","article-title":"Fast Distributed K-Center Clustering with Outliers on Massive Data","author":"Malkomes Gustavo","year":"2015","unstructured":"Gustavo Malkomes, Matt J. Kusner, Wenlin Chen, Kilian Q. Weinberger, and Benjamin Moseley. 2015. Fast Distributed K-Center Clustering with Outliers on Massive Data. In NIPS, 1063\u20131071.","journal-title":"NIPS"},{"doi-asserted-by":"publisher","key":"e_1_3_1_36_2","DOI":"10.1023\/B:MACH.0000033115.78247.f0"},{"key":"e_1_3_1_37_2","first-page":"439","article-title":"Sublinear Time Approximate Clustering","author":"Mishra Nina","year":"2001","unstructured":"Nina Mishra, Dan Oblinger, and Leonard Pitt. 2001. Sublinear Time Approximate Clustering. In SODA, 439\u2013447.","journal-title":"SODA"},{"issue":"6","key":"e_1_3_1_38_2","first-page":"1","article-title":"XAI Beyond Classification: Interpretable Neural Clustering","volume":"23","author":"Peng Xi","year":"2022","unstructured":"Xi Peng, Yunfan Li, Ivor W Tsang, Hongyuan Zhu, Jiancheng Lv, and Joey Tianyi Zhou. 2022. XAI Beyond Classification: Interpretable Neural Clustering. J. Machine Learning Res. 23, 6 (2022), 1\u201328.","journal-title":"J. Machine Learning Res"},{"key":"e_1_3_1_39_2","first-page":"268","article-title":"SCAN: Learning to Classify Images Without Labels","author":"Gansbeke Wouter Van","year":"2020","unstructured":"Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, Marc Proesmans, and Luc Van Gool. 2020. SCAN: Learning to Classify Images Without Labels. In ECCV, 268\u2013285.","journal-title":"ECCV"},{"key":"e_1_3_1_40_2","first-page":"2890","article-title":"Large Scale K-Median Clustering for Stable Clustering Instances","author":"Voevodski Konstantin","year":"2021","unstructured":"Konstantin Voevodski. 2021. Large Scale K-Median Clustering for Stable Clustering Instances. In AISTATS, 2890\u20132898.","journal-title":"AISTATS"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3674508","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3674508","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:05:58Z","timestamp":1750291558000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3674508"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,20]]},"references-count":39,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2024,11,30]]}},"alternative-id":["10.1145\/3674508"],"URL":"https:\/\/doi.org\/10.1145\/3674508","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2024,11,20]]},"assertion":[{"value":"2022-10-17","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-04-30","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-11-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}