{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:28:54Z","timestamp":1750220934610,"version":"3.41.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2019,2,18]],"date-time":"2019-02-18T00:00:00Z","timestamp":1550448000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Canada Research Chairs program and an NSERC Discovery Grant"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Algorithms"],"published-print":{"date-parts":[[2019,4,30]]},"abstract":"<jats:p>\n            Clustering problems are well studied in a variety of fields, such as data science, operations research, and computer science. Such problems include variants of center location problems,\n            <jats:italic>k<\/jats:italic>\n            -median and\n            <jats:italic>k<\/jats:italic>\n            -means to name a few. In some cases, not all data points need to be clustered; some may be discarded for various reasons. For instance, some points may arise from noise in a dataset or one might be willing to discard a certain fraction of the points to avoid incurring unnecessary overhead in the cost of a clustering solution.\n          <\/jats:p>\n          <jats:p>\n            We study clustering problems with outliers. More specifically, we look at\n            <jats:sc>uncapacitated facility location<\/jats:sc>\n            (UFL),\n            <jats:italic>k<\/jats:italic>\n            -\n            <jats:sc>median<\/jats:sc>\n            , and\n            <jats:italic>k<\/jats:italic>\n            -\n            <jats:sc>means<\/jats:sc>\n            . In these problems, we are given a set\n            <jats:bold>\n              <jats:italic>X<\/jats:italic>\n            <\/jats:bold>\n            of data points in a metric space \u03b4(., .), a set\n            <jats:italic>\n              <jats:bold>C<\/jats:bold>\n            <\/jats:italic>\n            of possible centers (each maybe with an opening cost), maybe an integer parameter\n            <jats:italic>k<\/jats:italic>\n            , plus an additional parameter\n            <jats:italic>z<\/jats:italic>\n            as the number of outliers. In\n            <jats:sc>uncapacitated facility location<\/jats:sc>\n            with outliers, we have to open some centers, discard up to\n            <jats:italic>z<\/jats:italic>\n            points of\n            <jats:bold>\n              <jats:italic>X<\/jats:italic>\n            <\/jats:bold>\n            , and assign every other point to the nearest open center, minimizing the total assignment cost plus center opening costs. In\n            <jats:italic>k<\/jats:italic>\n            -\n            <jats:sc>median<\/jats:sc>\n            and\n            <jats:italic>k<\/jats:italic>\n            -\n            <jats:sc>means<\/jats:sc>\n            , we have to open up to\n            <jats:italic>k<\/jats:italic>\n            centers, but there are no opening costs. In\n            <jats:italic>k<\/jats:italic>\n            -\n            <jats:sc>means<\/jats:sc>\n            , the cost of assigning\n            <jats:italic>j<\/jats:italic>\n            to\n            <jats:italic>i<\/jats:italic>\n            is \u03b4\n            <jats:sup>2<\/jats:sup>\n            (\n            <jats:italic>j<\/jats:italic>\n            ,\n            <jats:italic>i<\/jats:italic>\n            ).\n          <\/jats:p>\n          <jats:p>\n            We present several results. Our main focus is on cases where \u03b4 is a doubling metric (this includes fixed dimensional Euclidean metrics as a special case) or is the shortest path metrics of graphs from a minor-closed family of graphs. For\n            <jats:sc>uniform-cost<\/jats:sc>\n            UFL with outliers on such metrics, we show that a multiswap simple local search heuristic yields a PTAS. With a bit more work, we extend this to bicriteria approximations for the\n            <jats:italic>k<\/jats:italic>\n            -\n            <jats:sc>median<\/jats:sc>\n            and\n            <jats:italic>k<\/jats:italic>\n            -\n            <jats:sc>means<\/jats:sc>\n            problems in the same metrics where, for any constant \u03f5 &gt; 0, we can find a solution using (1 + \u03f5)\n            <jats:italic>k<\/jats:italic>\n            centers whose cost is at most a (1 + \u03f5)-factor of the optimum and uses at most\n            <jats:italic>z<\/jats:italic>\n            outliers. Our algorithms are all based on natural multiswap local search heuristics. We also show that natural local search heuristics that do not violate the number of clusters and outliers for\n            <jats:italic>k<\/jats:italic>\n            -\n            <jats:sc>median<\/jats:sc>\n            (or\n            <jats:italic>k<\/jats:italic>\n            -\n            <jats:sc>means<\/jats:sc>\n            ) will have unbounded gap even in Euclidean metrics.\n          <\/jats:p>\n          <jats:p>\n            Furthermore, we show how our analysis can be extended to general metrics for\n            <jats:italic>k<\/jats:italic>\n            -\n            <jats:sc>means<\/jats:sc>\n            with outliers to obtain a (25 + \u03f5, 1 + \u03f5)-approximation: an algorithm that uses at most (1 + \u03f5)\n            <jats:italic>k<\/jats:italic>\n            clusters and whose cost is at most 25 + \u03f5 of optimum and uses no more than\n            <jats:italic>z<\/jats:italic>\n            outliers.\n          <\/jats:p>","DOI":"10.1145\/3301446","type":"journal-article","created":{"date-parts":[[2019,2,19]],"date-time":"2019-02-19T20:54:15Z","timestamp":1550609655000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":21,"title":["Approximation Schemes for Clustering with Outliers"],"prefix":"10.1145","volume":"15","author":[{"given":"Zachary","family":"Friggstad","sequence":"first","affiliation":[{"name":"Department of Computing Science, University of Alberta, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1289-6839","authenticated-orcid":false,"given":"Kamyar","family":"Khodamoradi","sequence":"additional","affiliation":[{"name":"Department of Computing Science, University of Alberta, Canada"}]},{"given":"Mohsen","family":"Rezapour","sequence":"additional","affiliation":[{"name":"Department of Mathematics, K.N. Toosi University of Technology, Tehran, Iran"}]},{"given":"Mohammad R.","family":"Salavatipour","sequence":"additional","affiliation":[{"name":"Department of Computing Science, University of Alberta, Canada"}]}],"member":"320","published-online":{"date-parts":[[2019,2,18]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of IEEE Symp. in Foundations of Computer Science (FOCS'17)","author":"Ahmadian Sara","year":"2016","unstructured":"Sara Ahmadian , Ashkan Norouzi-Fard , Ola Svensson , and Justin Ward . 2016 . Better guarantees for K-means and euclidean K-median by primal-dual algorithms . In Proceedings of IEEE Symp. in Foundations of Computer Science (FOCS'17) . 61--72. Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. 2016. Better guarantees for K-means and euclidean K-median by primal-dual algorithms. In Proceedings of IEEE Symp. in Foundations of Computer Science (FOCS'17). 61--72."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the 43rd International Colloquium on Automata, Languages, and Programming (ICALP\u201916)","author":"Ahmadian Sara","year":"2016","unstructured":"Sara Ahmadian and Chaitanya Swamy . 2016 . Approximation algorithms for clustering problems with lower bounds and outliers . In Proceedings of the 43rd International Colloquium on Automata, Languages, and Programming (ICALP\u201916) . 69:1--69:15. Sara Ahmadian and Chaitanya Swamy. 2016. Approximation algorithms for clustering problems with lower bounds and outliers. In Proceedings of the 43rd International Colloquium on Automata, Languages, and Programming (ICALP\u201916). 69:1--69:15."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-009-5103-0"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/290179.290180"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/276698.276718"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA\u201907)","author":"Arthur David","year":"2007","unstructured":"David Arthur and Sergei Vassilvitskii . 2007 . K-means++: The advantages of careful seeding . In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA\u201907) . SIAM, 1027--1035. David Arthur and Sergei Vassilvitskii. 2007. K-means++: The advantages of careful seeding. In Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA\u201907). SIAM, 1027--1035."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/380752.380755"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0097539702416402"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 32nd International Symposium on Computational Geometry (SoCG\u201916)","author":"Bandyapadhyay Sayan","year":"2016","unstructured":"Sayan Bandyapadhyay and Kasturi Varadarajan . 2016 . On variants of K-means clustering . In Proceedings of the 32nd International Symposium on Computational Geometry (SoCG\u201916) . 14:1--14:15. Sayan Bandyapadhyay and Kasturi Varadarajan. 2016. On variants of K-means clustering. In Proceedings of the 32nd International Symposium on Computational Geometry (SoCG\u201916). 14:1--14:15."},{"key":"e_1_2_1_10_1","first-page":"71","article-title":"A survey of clustering data mining techniques","volume":"25","author":"Pavel Berkhin","year":"2006","unstructured":"Pavel Berkhin et al. 2006 . A survey of clustering data mining techniques . Group. Multidimens. Data 25 (2006), 71 . Pavel Berkhin et al. 2006. A survey of clustering data mining techniques. Group. Multidimens. Data 25 (2006), 71.","journal-title":"Group. Multidimens. Data"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 737--756","author":"Byrka Jaros\u0142aw","year":"2014","unstructured":"Jaros\u0142aw Byrka , Thomas Pensyl , Bartosz Rybicki , Aravind Srinivasan , and Khoa Trinh . 2014 . An improved approximation for K-median, and positive correlation in budgeted optimization . In Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 737--756 . Jaros\u0142aw Byrka, Thomas Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. 2014. An improved approximation for K-median, and positive correlation in budgeted optimization. In Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 737--756."},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 43rd International Colloquium on Automata, Languages, and Programming (ICALP","author":"Chakrabarty Deeparnab","year":"2016","unstructured":"Deeparnab Chakrabarty , Prachi Goyal , and Ravishankar Krishnaswamy . 2016 . The non-uniform k-center problem . In Proceedings of the 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016). 67:1--67:15. Deeparnab Chakrabarty, Prachi Goyal, and Ravishankar Krishnaswamy. 2016. The non-uniform k-center problem. In Proceedings of the 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016). 67:1--67:15."},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 642--651","author":"Charikar Moses","year":"2001","unstructured":"Moses Charikar , Samir Khuller , David M. Mount , and Giri Narasimhan . 2001 . Algorithms for facility location problems with outliers . In Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 642--651 . Moses Charikar, Samir Khuller, David M. Mount, and Giri Narasimhan. 2001. Algorithms for facility location problems with outliers. In Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 642--651."},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms. 826--835","author":"Chen Ke","year":"2008","unstructured":"Ke Chen . 2008 . A constant factor approximation algorithm for K-median clustering with outliers . In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms. 826--835 . Ke Chen. 2008. A constant factor approximation algorithm for K-median clustering with outliers. In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms. 826--835."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/3174304.3175298"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2016.46"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/780542.780550"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:MACH.0000033113.59016.96"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the Knowledge Discovery and Data Mining Conference (KDD\u201996)","volume":"96","author":"Ester Martin","year":"1996","unstructured":"Martin Ester , Hans-Peter Kriegel , J\u00f6rg Sander , Xiaowei Xu , 1996 . A density-based algorithm for discovering clusters in large spatial databases with noise . In Proceedings of the Knowledge Discovery and Data Mining Conference (KDD\u201996) , Vol. 96 . 226--231. Martin Ester, Hans-Peter Kriegel, J\u00f6rg Sander, Xiaowei Xu, et al. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Knowledge Discovery and Data Mining Conference (KDD\u201996), Vol. 96. 226--231."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1247069.1247072"},{"volume-title":"Proceedings of the IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS\u201916)","author":"Friggstad Zachary","key":"e_1_2_1_22_1","unstructured":"Zachary Friggstad , Mohsen Rezapour , and Mohammad R. Salavatipour . 2016. Local search yields a PTAS for k-means in doubling metrics . In Proceedings of the IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS\u201916) . 365--374. Zachary Friggstad, Mohsen Rezapour, and Mohammad R. Salavatipour. 2016. Local search yields a PTAS for k-means in doubling metrics. In Proceedings of the IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS\u201916). 365--374."},{"key":"e_1_2_1_23_1","volume-title":"Salavatipour","author":"Friggstad Zachary","year":"2016","unstructured":"Zachary Friggstad , Mohsen Rezapour , and Mohammad R . Salavatipour . 2016 . Local search yields a PTAS for k-means in doubling metrics. CoRR abs\/1603.08976 (2016). http:\/\/arxiv.org\/abs\/1603.08976 Zachary Friggstad, Mohsen Rezapour, and Mohammad R. Salavatipour. 2016. Local search yields a PTAS for k-means in doubling metrics. CoRR abs\/1603.08976 (2016). http:\/\/arxiv.org\/abs\/1603.08976"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1006\/jagm.1998.0993"},{"key":"e_1_2_1_25_1","unstructured":"A. Gupta and T. Tangwongsan. 2008. Simpler analyses of local search algorithms for facility location. CoRR abs\/0809.2554 (2008).  A. Gupta and T. Tangwongsan. 2008. Simpler analyses of local search algorithms for facility location. CoRR abs\/0809.2554 (2008)."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/3067421.3067425"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1064092.1064114"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007352.1007400"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:AIRE.0000045502.10941.a9"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/177424.178042"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2009.09.011"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/509907.510012"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/375827.375845"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.comgeo.2004.03.003"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3188745.3188882"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2004.7"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/1667053.1667054"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipl.2016.11.009"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.5555\/2027223.2027230"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2488608.2488723"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1982.1056489"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-00202-1_24"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/s004540010019"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2395116.2395117"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/78.127962"},{"key":"e_1_2_1_46_1","unstructured":"Andrea Vattani. 2009. The hardness of K-means clustering in the plane (unpublished).  Andrea Vattani. 2009. The hardness of K-means clustering in the plane (unpublished)."}],"container-title":["ACM Transactions on Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3301446","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3301446","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:53:25Z","timestamp":1750204405000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3301446"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,2,18]]},"references-count":45,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2019,4,30]]}},"alternative-id":["10.1145\/3301446"],"URL":"https:\/\/doi.org\/10.1145\/3301446","relation":{},"ISSN":["1549-6325","1549-6333"],"issn-type":[{"type":"print","value":"1549-6325"},{"type":"electronic","value":"1549-6333"}],"subject":[],"published":{"date-parts":[[2019,2,18]]},"assertion":[{"value":"2018-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-02-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}