{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T00:26:21Z","timestamp":1767831981299,"version":"3.49.0"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"7","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2023,3]]},"abstract":"<jats:p>How can we efficiently and scalably cluster high-dimensional data? The<jats:italic>k<\/jats:italic>-means algorithm clusters data by iteratively reducing intra-cluster Euclidean distances until convergence. While it finds applications from recommendation engines to image segmentation, its application to high-dimensional data is hindered by the need to repeatedly compute Euclidean distances among points and centroids. In this paper, we propose Marigold (<jats:italic>k<\/jats:italic>-means for high-dimensional data), a scalable algorithm for<jats:italic>k<\/jats:italic>-means clustering in high dimensions. Marigold prunes distance calculations by means of (i) a tight distance-bounding scheme; (ii) a stepwise calculation over a multiresolution transform; and (iii) exploiting the triangle inequality. To our knowledge, such an arsenal of pruning techniques has not been hitherto applied to<jats:italic>k<\/jats:italic>-means. Our work is motivated by time-critical Angle-Resolved Photoemission Spectroscopy (ARPES) experiments, where it is vital to detect clusters among high-dimensional spectra in real time. In a thorough experimental study with real-world data sets we demonstrate that Marigold efficiently clusters high-dimensional data, achieving approximately one order of magnitude improvement over prior art.<\/jats:p>","DOI":"10.14778\/3587136.3587147","type":"journal-article","created":{"date-parts":[[2023,5,8]],"date-time":"2023-05-08T23:11:35Z","timestamp":1683587495000},"page":"1740-1748","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Marigold: Efficient<i>k<\/i>-Means Clustering in High Dimensions"],"prefix":"10.14778","volume":"16","author":[{"given":"Kasper Overgaard","family":"Mortensen","sequence":"first","affiliation":[{"name":"Aarhus University"}]},{"given":"Fatemeh","family":"Zardbani","sequence":"additional","affiliation":[{"name":"Aarhus University"}]},{"given":"Mohammad Ahsanul","family":"Haque","sequence":"additional","affiliation":[{"name":"Aarhus University"}]},{"given":"Steinn Ymir","family":"Agustsson","sequence":"additional","affiliation":[{"name":"Aarhus Univeristy"}]},{"given":"Davide","family":"Mottin","sequence":"additional","affiliation":[{"name":"Aarhus University"}]},{"given":"Philip","family":"Hofmann","sequence":"additional","affiliation":[{"name":"Aarhus University"}]},{"given":"Panagiotis","family":"Karras","sequence":"additional","affiliation":[{"name":"Aarhus University"}]}],"member":"320","published-online":{"date-parts":[[2023,5,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1016\/1051-2004(91)90086-Z"},{"key":"e_1_2_1_2_1","volume-title":"Streaming k-means approximation. NeurIPS 22","author":"Ailon Nir","year":"2009","unstructured":"Nir Ailon , Ragesh Jaiswal , and Claire Monteleoni . 2009. Streaming k-means approximation. NeurIPS 22 ( 2009 ). Nir Ailon, Ragesh Jaiswal, and Claire Monteleoni. 2009. Streaming k-means approximation. NeurIPS 22 (2009)."},{"key":"e_1_2_1_3_1","volume-title":"Proc. Hawaii Int. Conf. System Sciences. 677--679","author":"Harry","unstructured":"Harry C. Andrews and William K. Pratt. 1968. Fourier transform coding of images . In Proc. Hawaii Int. Conf. System Sciences. 677--679 . Harry C. Andrews and William K. Pratt. 1968. Fourier transform coding of images. In Proc. Hawaii Int. Conf. System Sciences. 677--679."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.14778\/2180912.2180915"},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Luca Becchetti Marc Bury Vincent Cohen-Addad Fabrizio Grandoni and Chris Schwiegelshohn. 2019. Oblivious dimension reduction for k-means: beyond subspaces and the Johnson-Lindenstrauss lemma. In STOC. 1039--1050. Luca Becchetti Marc Bury Vincent Cohen-Addad Fabrizio Grandoni and Chris Schwiegelshohn. 2019. Oblivious dimension reduction for k -means: beyond subspaces and the Johnson-Lindenstrauss lemma. In STOC. 1039--1050.","DOI":"10.1145\/3313276.3316318"},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","unstructured":"Shai Ben-David D\u00e1vid P\u00e1l and Hans Ulrich Simon. 2007. Stability of k-means clustering. In COLT. 20--34. Shai Ben-David D\u00e1vid P\u00e1l and Hans Ulrich Simon. 2007. Stability of k -means clustering. In COLT. 20--34.","DOI":"10.1007\/978-3-540-72927-3_4"},{"key":"e_1_2_1_7_1","volume-title":"Random projections for k-means clustering. NeurIPS 23","author":"Boutsidis Christos","year":"2010","unstructured":"Christos Boutsidis , Anastasios Zouzias , and Petros Drineas . 2010. Random projections for k-means clustering. NeurIPS 23 ( 2010 ). Christos Boutsidis, Anastasios Zouzias, and Petros Drineas. 2010. Random projections for k-means clustering. NeurIPS 23 (2010)."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2014.2375327"},{"key":"e_1_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Andrei Broder Lluis Garcia-Pueyo Vanja Josifovski Sergei Vassilvitskii and Srihari Venkatesan. 2014. Scalable k-means by ranked retrieval. In WSDM. 233--242. Andrei Broder Lluis Garcia-Pueyo Vanja Josifovski Sergei Vassilvitskii and Srihari Venkatesan. 2014. Scalable k -means by ranked retrieval. In WSDM. 233--242.","DOI":"10.1145\/2556195.2556260"},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","first-page":"L092001","DOI":"10.1103\/PhysRevMaterials.6.L092001","article-title":"One-dimensional electronic states in a natural misfit structure","volume":"6","author":"Alla Chikina","year":"2022","unstructured":"Alla Chikina et al. 2022 . One-dimensional electronic states in a natural misfit structure . Physical Review Materials 6 , 9 (2022), L092001 . Alla Chikina et al. 2022. One-dimensional electronic states in a natural misfit structure. Physical Review Materials 6, 9 (2022), L092001.","journal-title":"Physical Review Materials"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2011.2163394"},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Michael B Cohen Sam Elder Cameron Musco Christopher Musco and Madalina Persu. 2015. Dimensionality reduction for k-means clustering and low rank approximation. In STOC. 163--172. Michael B Cohen Sam Elder Cameron Musco Christopher Musco and Madalina Persu. 2015. Dimensionality reduction for k -means clustering and low rank approximation. In STOC. 163--172.","DOI":"10.1145\/2746539.2746569"},{"key":"e_1_2_1_13_1","first-page":"236","article-title":"Accessing the Spectral Function in a Current-Carrying Device","volume":"125","author":"Davide Curcio","year":"2020","unstructured":"Davide Curcio et al. 2020 . Accessing the Spectral Function in a Current-Carrying Device . Physical Review Letters 125 , 23 (2020), 236 -- 403 . Davide Curcio et al. 2020. Accessing the Spectral Function in a Current-Carrying Device. Physical Review Letters 125, 23 (2020), 236--403.","journal-title":"Physical Review Letters"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2012.2211477"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11760-021-01951-0"},{"key":"e_1_2_1_16_1","unstructured":"Charles Elkan. 2003. Using the triangle inequality to accelerate k-means. In ICML. 147--153. Charles Elkan. 2003. Using the triangle inequality to accelerate k -means. In ICML. 147--153."},{"key":"e_1_2_1_17_1","first-page":"768","article-title":"Cluster analysis of multivariate data : efficiency versus interpretability of classifications","volume":"21","author":"Forgy Edward W.","year":"1965","unstructured":"Edward W. Forgy . 1965 . Cluster analysis of multivariate data : efficiency versus interpretability of classifications . Biometrics 21 , 3 (1965), 768 -- 769 . Edward W. Forgy. 1965. Cluster analysis of multivariate data : efficiency versus interpretability of classifications. Biometrics 21, 3 (1965), 768--769.","journal-title":"Biometrics"},{"key":"e_1_2_1_18_1","doi-asserted-by":"crossref","unstructured":"Greg Hamerly. 2010. Making k-means even faster. In SDM. 130--140. Greg Hamerly. 2010. Making k -means even faster. In SDM. 130--140.","DOI":"10.1137\/1.9781611972801.12"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1116\/5.0038637"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2022.11.001"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-005-0210-0"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1386118.1386124"},{"key":"e_1_2_1_23_1","doi-asserted-by":"crossref","unstructured":"Shrikant Kashyap and Panagiotis Karras. 2011. Scalable kNN search on vertically stored time series. In KDD. 1334--1342. Shrikant Kashyap and Panagiotis Karras. 2011. Scalable k NN search on vertically stored time series. In KDD. 1334--1342.","DOI":"10.1145\/2020408.2020607"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1063\/5.0054920"},{"key":"e_1_2_1_25_1","volume-title":"Volker Kr\u00fcger, Kamal Nasrollahi, Karen Andersen-Ranberg, Thomas B. Moeslund, and Erika Geraldina Spaich.","author":"Klonovs Juris","year":"2016","unstructured":"Juris Klonovs , Mohammad Ahsanul Haque , Volker Kr\u00fcger, Kamal Nasrollahi, Karen Andersen-Ranberg, Thomas B. Moeslund, and Erika Geraldina Spaich. 2016 . Distributed Computing and Monitoring Technologies for Older Patients. Springer . Juris Klonovs, Mohammad Ahsanul Haque, Volker Kr\u00fcger, Kamal Nasrollahi, Karen Andersen-Ranberg, Thomas B. Moeslund, and Erika Geraldina Spaich. 2016. Distributed Computing and Monitoring Technologies for Older Patients. Springer."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/abd87c"},{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Amit Kumar Yogish Sabharwal and Sandeep Sen. 2004. A simple linear-time (1+\u03b5)-approximation algorithm for k-means clustering in any dimensions. In FOCS. 454--462. Amit Kumar Yogish Sabharwal and Sandeep Sen. 2004. A simple linear-time (1+\u03b5)-approximation algorithm for k -means clustering in any dimensions. In FOCS. 454--462.","DOI":"10.1109\/FOCS.2004.7"},{"key":"e_1_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Ju-Hong Lee Deok-Hwan Kim and Chin-Wan Chung. 1999. Multi-dimensional Selectivity Estimation Using Compressed Histogram Information. In SIGMOD. 205--214. Ju-Hong Lee Deok-Hwan Kim and Chin-Wan Chung. 1999. Multi-dimensional Selectivity Estimation Using Compressed Histogram Information. In SIGMOD. 205--214.","DOI":"10.1145\/304181.304200"},{"key":"e_1_2_1_29_1","volume-title":"The Stanford Encyclopedia of Philosophy (Summer 2020 ed.). Metaphysics Research Lab","author":"Leonelli Sabina","unstructured":"Sabina Leonelli . 2020. Scientific Research and Big Data . In The Stanford Encyclopedia of Philosophy (Summer 2020 ed.). Metaphysics Research Lab , Stanford University . Sabina Leonelli. 2020. Scientific Research and Big Data. In The Stanford Encyclopedia of Philosophy (Summer 2020 ed.). Metaphysics Research Lab, Stanford University."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1982.1056489"},{"key":"e_1_2_1_31_1","unstructured":"Konstantin Makarychev Yury Makarychev and Ilya Razenshteyn. 2019. Performance of Johnson-Lindenstrauss transform for k-means and k-medians clustering. In STOC. 1027--1038. Konstantin Makarychev Yury Makarychev and Ilya Razenshteyn. 2019. Performance of Johnson-Lindenstrauss transform for k -means and k -medians clustering. In STOC. 1027--1038."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.11591\/ijeecs.v16.i2.pp752-758"},{"key":"e_1_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Arshad M. Mehar Kenan Matawie and Anthony Maeder. 2013. Determining an optimal value of k in k-means clustering. In BIBM. 51--55. Arshad M. Mehar Kenan Matawie and Anthony Maeder. 2013. Determining an optimal value of k in k -means clustering. In BIBM. 51--55.","DOI":"10.1109\/BIBM.2013.6732734"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/abab61"},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Dan Pelleg and Andrew Moore. 1999. Accelerating exact k-means algorithms with geometric reasoning. In KDD. 277--281. Dan Pelleg and Andrew Moore. 1999. Accelerating exact k -means algorithms with geometric reasoning. In KDD. 277--281.","DOI":"10.1145\/312129.312248"},{"key":"e_1_2_1_36_1","article-title":"Super resolution convolutional neural network for feature extraction in spectroscopic data","volume":"91","author":"Han Peng","year":"2020","unstructured":"Han Peng et al. 2020 . Super resolution convolutional neural network for feature extraction in spectroscopic data . Review of Scientific Instruments 91 , 3 (2020). Han Peng et al. 2020. Super resolution convolutional neural network for feature extraction in spectroscopic data. Review of Scientific Instruments 91, 3 (2020).","journal-title":"Review of Scientific Instruments"},{"key":"e_1_2_1_37_1","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1002\/9780470142691.ch8","article-title":"Angle-resolved photoemission as a tool for the study of surfaces","volume":"49","author":"Plummer Earl W.","year":"1982","unstructured":"Earl W. Plummer and William Eberhardt . 1982 . Angle-resolved photoemission as a tool for the study of surfaces . Advances in Chemical Physics 49 (1982), 533 . Earl W. Plummer and William Eberhardt. 1982. Angle-resolved photoemission as a tool for the study of surfaces. Advances in Chemical Physics 49 (1982), 533.","journal-title":"Advances in Chemical Physics"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1107\/S1600577514015409"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/0377-0427(87)90125-7"},{"key":"e_1_2_1_40_1","doi-asserted-by":"crossref","unstructured":"David Sculley. 2010. Web-scale k-means clustering. In TheWebConf. 1177--1178. David Sculley. 2010. Web-scale k -means clustering. In TheWebConf. 1177--1178.","DOI":"10.1145\/1772690.1772862"},{"issue":"8","key":"e_1_2_1_41_1","first-page":"57","article-title":"Spatial Modeling Principles in Earth Sciences. Springer","volume":"2","author":"Sen Zekai","year":"2016","unstructured":"Zekai Sen . 2016 . Spatial Modeling Principles in Earth Sciences. Springer , Chapter 2 . 8 .1, 57 -- 59 . Zekai Sen. 2016. Spatial Modeling Principles in Earth Sciences. Springer, Chapter 2.8.1, 57--59.","journal-title":"Chapter"},{"key":"e_1_2_1_42_1","volume-title":"Applications of Digital Image Processing XLIV","author":"Testolina Michela","unstructured":"Michela Testolina and Touradj Ebrahimi . 2021. Review of subjective quality assessment methodologies and standards for compressed images evaluation . In Applications of Digital Image Processing XLIV , Vol. 11842 . SPIE , 302--315. Michela Testolina and Touradj Ebrahimi. 2021. Review of subjective quality assessment methodologies and standards for compressed images evaluation. In Applications of Digital Image Processing XLIV, Vol. 11842. SPIE, 302--315."},{"key":"e_1_2_1_43_1","volume-title":"Hastie","author":"Tibshirani Robert","year":"2000","unstructured":"Robert Tibshirani , Guenther Walther , and Trevor J . Hastie . 2000 . Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society : Series B (Statistical Methodology) 63 (2000). Robert Tibshirani, Guenther Walther, and Trevor J. Hastie. 2000. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63 (2000)."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00357-020-09372-3"},{"key":"e_1_2_1_45_1","unstructured":"Sergei Vassilvitskii and David Arthur. 2006. k-means++: The advantages of careful seeding. In SODA. 1027--1035. Sergei Vassilvitskii and David Arthur. 2006. k -means++: The advantages of careful seeding. In SODA. 1027--1035."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.14778\/3425879.3425887"},{"key":"e_1_2_1_47_1","volume-title":"Hall","author":"Witten Ian H.","year":"2011","unstructured":"Ian H. Witten , Eibe Frank , and Mark A . Hall . 2011 . Data Mining : Practical Machine Learning Tools and Techniques (3rd ed.). Morgan Kaufmann Publishers Inc ., USA. Ian H. Witten, Eibe Frank, and Mark A. Hall. 2011. Data Mining: Practical Machine Learning Tools and Techniques (3rd ed.). Morgan Kaufmann Publishers Inc., USA."},{"key":"e_1_2_1_48_1","first-page":"338","article-title":"Initializing k-means clustering using affinity propagation","volume":"1","author":"Zhu Yan","year":"2009","unstructured":"Yan Zhu , Jian Yu , and Caiyan Jia . 2009 . Initializing k-means clustering using affinity propagation . In HIS , Vol. 1. 338 -- 343 . Yan Zhu, Jian Yu, and Caiyan Jia. 2009. Initializing k-means clustering using affinity propagation. In HIS, Vol. 1. 338--343.","journal-title":"HIS"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3587136.3587147","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,20]],"date-time":"2024-10-20T00:51:39Z","timestamp":1729385499000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3587136.3587147"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3]]},"references-count":48,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2023,3]]}},"alternative-id":["10.14778\/3587136.3587147"],"URL":"https:\/\/doi.org\/10.14778\/3587136.3587147","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2023,3]]},"assertion":[{"value":"2023-05-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}