{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T16:34:43Z","timestamp":1773246883059,"version":"3.50.1"},"reference-count":39,"publisher":"Association for Computing Machinery (ACM)","issue":"7","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2017,3]]},"abstract":"<jats:p>\n            We study the problem of\n            <jats:italic>k<\/jats:italic>\n            -means clustering in the presence of outliers. The goal is to cluster a set of data points to minimize the variance of the points assigned to the same cluster, with the freedom of ignoring a small set of data points that can be labeled as outliers. Clustering with outliers has received a lot of attention in the data processing community, but practical, efficient, and provably good algorithms remain unknown for the most popular\n            <jats:italic>k<\/jats:italic>\n            -means objective.\n          <\/jats:p>\n          <jats:p>\n            Our work proposes a simple local search-based algorithm for\n            <jats:italic>k<\/jats:italic>\n            -means clustering with outliers. We prove that this algorithm achieves constant-factor approximate solutions and can be combined with known sketching techniques to scale to large data sets. Using empirical evaluation on both synthetic and large-scale real-world data, we demonstrate that the algorithm dominates recently proposed heuristic approaches for the problem.\n          <\/jats:p>","DOI":"10.14778\/3067421.3067425","type":"journal-article","created":{"date-parts":[[2017,5,12]],"date-time":"2017-05-12T12:17:14Z","timestamp":1494591434000},"page":"757-768","source":"Crossref","is-referenced-by-count":76,"title":["Local search methods for k-means with outliers"],"prefix":"10.14778","volume":"10","author":[{"given":"Shalmoli","family":"Gupta","sequence":"first","affiliation":[{"name":"University of Illinois"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ravi","family":"Kumar","sequence":"additional","affiliation":[{"name":"Google"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kefu","family":"Lu","sequence":"additional","affiliation":[{"name":"Washington University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Benjamin","family":"Moseley","sequence":"additional","affiliation":[{"name":"Washington University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sergei","family":"Vassilvitskii","sequence":"additional","affiliation":[{"name":"Google"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2017,3]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03685-9_2"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/376284.375668"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.5555\/2535015"},{"key":"e_1_2_1_4_1","first-page":"164","volume-title":"KDD","author":"Arning A.","year":"1996","unstructured":"A. Arning , R. Agrawal , and P. Raghavan . A linear method for deviation detection in large databases . In KDD , pages 164 -- 169 , 1996 . A. Arning, R. Agrawal, and P. Raghavan. A linear method for deviation detection in large databases. In KDD, pages 164--169, 1996."},{"key":"e_1_2_1_5_1","first-page":"1027","volume-title":"SODA","author":"Arthur D.","year":"2007","unstructured":"D. Arthur and S. Vassilvitskii . k-means++: the advantages of careful seeding . In SODA , pages 1027 -- 1035 , 2007 . D. Arthur and S. Vassilvitskii. k-means++: the advantages of careful seeding. In SODA, pages 1027--1035, 2007."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.14778\/2180912.2180915"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/956750.956758"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-28349-8_2"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376638"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/335191.335388"},{"key":"e_1_2_1_11_1","first-page":"642","volume-title":"SODA","author":"Charikar M.","year":"2001","unstructured":"M. Charikar , S. Khuller , D. M. Mount , and G. Narasimhan . Algorithms for facility location problems with outliers . In SODA , pages 642 -- 651 , 2001 . M. Charikar, S. Khuller, D. M. Mount, and G. Narasimhan. Algorithms for facility location problems with outliers. In SODA, pages 642--651, 2001."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611972832.21"},{"key":"e_1_2_1_13_1","first-page":"826","volume-title":"SODA","author":"Chen K.","year":"2008","unstructured":"K. Chen . A constant factor approximation algorithm for k-median clustering with outliers . In SODA , pages 826 -- 835 , 2008 . K. Chen. A constant factor approximation algorithm for k-median clustering with outliers. In SODA, pages 826--835, 2008."},{"key":"e_1_2_1_14_1","first-page":"226","volume-title":"AAAI","author":"Ester M.","year":"1996","unstructured":"M. Ester , H.-P. Kriegel , J. Sander , and X. Xu . A density-based algorithm for discovering clusters in large spatial databases with noise . In AAAI , pages 226 -- 231 , 1996 . M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In AAAI, pages 226--231, 1996."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/360402.360419"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1247069.1247072"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2003.1198387"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/276305.276312"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1058150.1058157"},{"key":"e_1_2_1_20_1","volume-title":"Simpler analyses of local search algorithms for facility location. CoRR, abs\/0809.2554","author":"Gupta A.","year":"2008","unstructured":"A. Gupta and K. Tangwongsan . Simpler analyses of local search algorithms for facility location. CoRR, abs\/0809.2554 , 2008 . A. Gupta and K. Tangwongsan. Simpler analyses of local search algorithms for facility location. CoRR, abs\/0809.2554, 2008."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00454-006-1271-x"},{"key":"e_1_2_1_22_1","first-page":"58","volume-title":"KDD","author":"Hinneburg A.","year":"1998","unstructured":"A. Hinneburg and D. A. Keim . An efficient approach to clustering in large multimedia databases with noise . In KDD , pages 58 -- 65 , 1998 . A. Hinneburg and D. A. Keim. An efficient approach to clustering in large multimedia databases with noise. In KDD, pages 58--65, 1998."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/331499.331504"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.comgeo.2004.03.003"},{"key":"e_1_2_1_25_1","first-page":"392","volume-title":"VLDB","author":"Knorr E.","year":"1998","unstructured":"E. Knorr and R. Ng . Algorithms for mining distance-based outliers in large datasets . In VLDB , pages 392 -- 403 , 1998 . E. Knorr and R. Ng. Algorithms for mining distance-based outliers in large datasets. In VLDB, pages 392--403, 1998."},{"key":"e_1_2_1_26_1","first-page":"19","volume-title":"KDD","author":"Knorr E. M.","year":"1997","unstructured":"E. M. Knorr and R. T. Ng . A unified notion of outliers: Properties and computation . In KDD , pages 19 -- 22 , 1997 . E. M. Knorr and R. T. Ng. A unified notion of outliers: Properties and computation. In KDD, pages 19--22, 1997."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2004.7"},{"key":"e_1_2_1_28_1","volume-title":"UCI machine learning repository","author":"Lichman M.","year":"2013","unstructured":"M. Lichman . UCI machine learning repository , 2013 . M. Lichman. UCI machine learning repository, 2013."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1982.1056489"},{"key":"e_1_2_1_30_1","first-page":"211","volume-title":"VLDB","author":"K. E. M.","year":"1999","unstructured":"K. E. M. and N. R. T. Finding intensional knowledge of distance-based outliers . In VLDB , pages 211 -- 222 , 1999 . K. E. M. and N. R. T. Finding intensional knowledge of distance-based outliers. In VLDB, pages 211--222, 1999."},{"key":"e_1_2_1_31_1","first-page":"1063","volume-title":"NIPS","author":"Malkomes G.","year":"2015","unstructured":"G. Malkomes , M. Kusner , W. Chen , K. Weinberger , and B. Moseley . Fast distributed k-center clustering with outliers on massive data . In NIPS , pages 1063 -- 1071 , 2015 . G. Malkomes, M. Kusner, W. Chen, K. Weinberger, and B. Moseley. Fast distributed k-center clustering with outliers on massive data. In NIPS, pages 1063--1071, 2015."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-85363-3_14"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2006.31"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/335191.335468"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2004.25"},{"key":"e_1_2_1_36_1","first-page":"1359","volume-title":"NIPS","author":"Ott L.","year":"2014","unstructured":"L. Ott , L. Pang , F. T. Ramos , and S. Chawla . On integrated clustering and outlier detection . In NIPS , pages 1359 -- 1367 , 2014 . L. Ott, L. Pang, F. T. Ramos, and S. Chawla. On integrated clustering and outlier detection. In NIPS, pages 1359--1367, 2014."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2003.1260802"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/335191.335437"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-007-0114-2"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3067421.3067425","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:21:19Z","timestamp":1672219279000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3067421.3067425"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,3]]},"references-count":39,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2017,3]]}},"alternative-id":["10.14778\/3067421.3067425"],"URL":"https:\/\/doi.org\/10.14778\/3067421.3067425","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2017,3]]}}}