{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T06:48:37Z","timestamp":1774421317008,"version":"3.50.1"},"reference-count":29,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2022,1,31]],"date-time":"2022-01-31T00:00:00Z","timestamp":1643587200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Metric k-center clustering is a fundamental unsupervised learning primitive. Although widely used, this primitive is heavily affected by noise in the data, so a more sensible variant seeks for the best solution that disregards a given number z of points of the dataset, which are called outliers. We provide efficient algorithms for this important variant in the streaming model under the sliding window setting, where, at each time step, the dataset to be clustered is the window W of the most recent data items. For general metric spaces, our algorithms achieve O1 approximation and, remarkably, require a working memory linear in k+z and only logarithmic in |W|. For spaces of bounded doubling dimension, the approximation can be made arbitrarily close to 3. For these latter spaces, we show, as a by-product, how to estimate the effective diameter of the window W, which is a measure of the spread of the window points, disregarding a given fraction of noisy distances. We also provide experimental evidence of the practical viability of the improved clustering and diameter estimation algorithms.<\/jats:p>","DOI":"10.3390\/a15020052","type":"journal-article","created":{"date-parts":[[2022,1,31]],"date-time":"2022-01-31T08:20:29Z","timestamp":1643617229000},"page":"52","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["k-Center Clustering with Outliers in Sliding Windows"],"prefix":"10.3390","volume":"15","author":[{"given":"Paolo","family":"Pellizzoni","sequence":"first","affiliation":[{"name":"Department of Information Engineering, University of Padova, Via Gradenigo 6\/B, 35131 Padova, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9189-9618","authenticated-orcid":false,"given":"Andrea","family":"Pietracaprina","sequence":"additional","affiliation":[{"name":"Department of Information Engineering, University of Padova, Via Gradenigo 6\/B, 35131 Padova, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9189-6938","authenticated-orcid":false,"given":"Geppino","family":"Pucci","sequence":"additional","affiliation":[{"name":"Department of Information Engineering, University of Padova, Via Gradenigo 6\/B, 35131 Padova, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2022,1,31]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Henzinger, M., Raghavan, P., and Rajagopalan, S. (1998, January 20\u201322). Computing on Data Streams. Proceedings of the DIMACS Workshop on External Memory Algorithms, New Brunswick, NJ, USA.","DOI":"10.1090\/dimacs\/050\/05"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Garofalakis, M., Gehrke, J., and Rastogi, R. (2016). The Sliding-Window Computation Model and Results. Data Stream Management, Springer.","DOI":"10.1007\/978-3-540-28608-0"},{"key":"ref_3","unstructured":"Cao, M. (2016). Sliding Window Algorithms. Encyclopedia of Algorithms, Springer."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Snyder, L. (2011). Introduction to facility location. Wiley Enciclopedia of Operations Research and Management Science, Wiley.","DOI":"10.1002\/9780470400531.eorms0423"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Hennig, C., Meila, M., Murtagh, F., and Rocci, R. (2015). Handbook of Cluster Analysis, CRC Press.","DOI":"10.1201\/b19706"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Bateni, M., Esfandiari, H., Fischer, M., and Mirrokni, V. (2021, January 2\u20139). Extreme k-Center Clustering. Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI), Vancouver, BC, Canada.","DOI":"10.1609\/aaai.v35i5.16513"},{"key":"ref_7","unstructured":"Charikar, M., Khuller, S., Mount, D., and Narasimhan, G. (2001, January 7\u20139). Algorithms for Facility Location Problems with Outliers. Proceedings of the 12th ACM-SIAM Symposium on Discrete Algorithms (SODA), Washington, DC, USA."},{"key":"ref_8","unstructured":"Malkomes, G., Kusner, M., Chen, W., Weinberger, K., and Moseley, B. (2015, January 7\u201312). Fast Distributed k-Center Clustering with Outliers on Massive Data. Proceedings of the 28th Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Ceccarello, M., Pietracaprina, A., and Pucci, G. (2018). Solving k-center Clustering (with Outliers) in MapReduce and Streaming, almost as Accurately as Sequentially. arXiv.","DOI":"10.14778\/3317315.3317319"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis","volume":"20","author":"Rousseeuw","year":"1987","journal-title":"Comput. Appl. Math."},{"key":"ref_11","unstructured":"Altieri, F., Pietracaprina, A., Pucci, G., and Vandin, F. (May, January 29). Scalable Distributed Approximation of Internal Measures for Clustering Evaluation. Proceedings of the SIAM International Conference on Data Mining (SDM), Online."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1016\/0304-3975(85)90224-5","article-title":"Clustering to minimize the maximum intercluster distance","volume":"38","author":"Gonzalez","year":"1985","journal-title":"Theor. Comput. Sci."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"180","DOI":"10.1287\/moor.10.2.180","article-title":"A Best Possible Heuristic for the k-Center Problem","volume":"10","author":"Hochbaum","year":"1985","journal-title":"Math. Oper. Res."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Chan, T.H.H., Guerqin, A., and Sozio, M. (2018, January 23\u201327). Fully Dynamic k-Center Clustering. Proceedings of the World Wide Web Conference, Lyon, France.","DOI":"10.1145\/3178876.3186124"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1145\/3311953","article-title":"A Lottery Model for Center-Type Problems With Outliers","volume":"15","author":"Harris","year":"2019","journal-title":"ACM Trans. Algorithms"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1145\/3392720","article-title":"The Non-Uniform k-Center Problem","volume":"16","author":"Chakrabarty","year":"2020","journal-title":"ACM Trans. Algorithms"},{"key":"ref_17","unstructured":"Ding, H., Yu, H., and Wang, Z. (2019, January 9\u201311). Greedy Strategy Works for k-Center Clustering with Outliers and Coreset Construction. Proceedings of the 27th Annual European Symposium on Algorithms (ESA), Munich\/Garching, Germany."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"McCutchen, R., and Khuller, S. (2008). Streaming Algorithms for k-Center Clustering with Outliers and with Anonymity. Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, Springer.","DOI":"10.1007\/978-3-540-85363-3_14"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1989","DOI":"10.1007\/s00453-020-00683-w","article-title":"The Parameterized Hardness of the k-Center Problem in Transportation Networks","volume":"82","author":"Feldmann","year":"2020","journal-title":"Algorithmica"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3477541","article-title":"Near-Linear Time Approximation Schemes for Clustering in Doubling Metrics","volume":"68","author":"Feldmann","year":"2021","journal-title":"J. ACM"},{"key":"ref_21","unstructured":"Cohen-Addad, V., Schwiegelshohn, C., and Sohler, C. (2016, January 11\u201315). Diameter and k-Center in Sliding Windows. Proceedings of the 43th International Colloquium on Automata, Languages and Programming (ICALP), Rome, Italy."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Pellizzoni, P., Pietracaprina, A., and Pucci, G. (2020, January 6\u20139). Dimensionality-adaptive k-center in sliding windows. Proceedings of the 7th IEEE International Conference on Data Science and Advanced Analytics (DSAA), Sydney, Australia.","DOI":"10.1109\/DSAA49011.2020.00032"},{"key":"ref_23","unstructured":"de Berg, M., Monemizadeh, M., and Zhong, Y. (2021, January 29\u201330). k-Center Clustering with Outliers in the Sliding-Window Model. Proceedings of the 29th Annual European Symposium on Algorithms (ESA), Lisbon, Portugal."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Braverman, V., Lang, H., Levin, K., and Monemizadeh, M. (2016, January 10\u201312). Clustering Problems on Sliding Windows. Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Arlington, VA, USA.","DOI":"10.1137\/1.9781611974331.ch95"},{"key":"ref_25","unstructured":"Borassi, M., Epasto, A., Lattanzi, S., Vassilvitskii, S., and Zadimoghaddam, M. (2020, January 6\u201312). Sliding Window Algorithms for k-Clustering Problems. Proceedings of the 34th Neural Information Processing Systems (NeurIPS), Online."},{"key":"ref_26","unstructured":"Palmer, C., Gibbons, P., and Faloutsos, C. (2002, January 23\u201326). ANF: A fast and scalable tool for data mining in massive graphs. Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Edmonton, AB, Canada."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Braverman, V., and Ostrovsky, R. (2007, January 20\u201323). Smooth Histograms for Sliding Windows. Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS), Providence, RI, USA.","DOI":"10.1109\/FOCS.2007.55"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"5750","DOI":"10.1109\/TIT.2014.2339840","article-title":"Efficient Classification for Metric Data","volume":"60","author":"Gottlieb","year":"2014","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"40038","DOI":"10.1109\/ACCESS.2019.2906949","article-title":"An Internal Validity Index Based on Density-Involved Distance","volume":"7","author":"Hu","year":"2019","journal-title":"IEEE Access"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/2\/52\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:11:46Z","timestamp":1760134306000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/2\/52"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,31]]},"references-count":29,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2022,2]]}},"alternative-id":["a15020052"],"URL":"https:\/\/doi.org\/10.3390\/a15020052","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,31]]}}}