{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,17]],"date-time":"2026-01-17T20:25:16Z","timestamp":1768681516356,"version":"3.49.0"},"reference-count":35,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2021,2,8]],"date-time":"2021-02-08T00:00:00Z","timestamp":1612742400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>This work studies clustering algorithms which operates with ordinal or comparison-based queries (operations), a situation that arises in many active-learning applications where \u201cdissimilarities\u201d between data points are evaluated by humans. Typically, exact answers are costly (or difficult to obtain in large amounts) while possibly erroneous answers have low cost. Motivated by these considerations, we study algorithms with non-trivial trade-offs between the number of exact (high-cost) operations and noisy (low-cost) operations with provable performance guarantees. Specifically, we study a class of polynomial-time graph-based clustering algorithms (termed Single-Linkage) which are widely used in practice and that guarantee exact solutions for stable instances in several clustering problems (these problems are NP-hard in the worst case). We provide several variants of these algorithms using ordinal operations and, in particular, non-trivial trade-offs between the number of high-cost and low-cost operations that are used. Our algorithms still guarantee exact solutions for stable instances of k-medoids clustering, and they use a rather small number of high-cost operations, without increasing the low-cost operations too much.<\/jats:p>","DOI":"10.3390\/a14020055","type":"journal-article","created":{"date-parts":[[2021,2,9]],"date-time":"2021-02-09T04:18:50Z","timestamp":1612844330000},"page":"55","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Optimal Clustering in Stable Instances Using Combinations of Exact and Noisy Ordinal Queries"],"prefix":"10.3390","volume":"14","author":[{"given":"Enrico","family":"Bianchi","sequence":"first","affiliation":[{"name":"Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland"}]},{"given":"Paolo","family":"Penna","sequence":"additional","affiliation":[{"name":"Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland"}]}],"member":"1968","published-online":{"date-parts":[[2021,2,8]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Balcan, M.F., Blum, A., and Vempala, S. (2008, January 17\u201320). A Discriminative Framework for Clustering via Similarity Functions. Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, Victoria, BC, Canada. STOC \u201908.","DOI":"10.1145\/1374376.1374474"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1016\/j.ipl.2011.10.006","article-title":"Center-based clustering under perturbation stability","volume":"112","author":"Awasthi","year":"2012","journal-title":"Inf. Process. Lett."},{"key":"ref_3","unstructured":"Angelidakis, H., Makarychev, K., and Makarychev, Y. (June, January 19). Algorithms for Stable and Perturbation-Resilient Problems. Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC), Montreal, PQ, Canada."},{"key":"ref_4","unstructured":"Roughgarden, T. (2021, February 08). CS264: Beyond Worst-Case Analysis Lecture# 6: Perturbation-Stable Clustering. Available online: http:\/\/theory.stanford.edu\/~tim\/w17\/l\/l6.pdf."},{"key":"ref_5","unstructured":"Tamuz, O., Liu, C., Belongie, S.J., Shamir, O., and Kalai, A. (28\u20132, January 28). Adaptively Learning the Crowd Kernel. Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, DC, USA."},{"key":"ref_6","first-page":"2711","article-title":"Finite Sample Prediction and Recovery Bounds for Ordinal Embedding","volume":"Volume 29","author":"Lee","year":"2016","journal-title":"Advances in Neural Information Processing Systems"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Emamjomeh-Zadeh, E., and Kempe, D. (2018, January 7\u201310). Adaptive hierarchical clustering using ordinal queries. Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA.","DOI":"10.1137\/1.9781611975031.28"},{"key":"ref_8","unstructured":"Perrot, M., Esser, P., and Ghoshdastidar, D. (2020). Near-optimal comparison based clustering. arXiV."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Addanki, R., Galhotra, S., and Saha, B. (2021, February 08). How to Design Robust Algorithms Using Noisy Comparison Oracles. Available online: https:\/\/people.cs.umass.edu\/_sainyam\/comparison_fullversion.pdf.","DOI":"10.14778\/3467861.3467862"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"476","DOI":"10.1134\/S0361768818060142","article-title":"Active Learning and Crowdsourcing: A Survey of Optimization Methods for Data Labeling","volume":"44","author":"Gilyazev","year":"2018","journal-title":"Program. Comput. Softw."},{"key":"ref_11","unstructured":"Geissmann, B., Leucci, S., Liu, C., Penna, P., and Proietti, G. (2019, January 8\u201311). Dual-Mode Greedy Algorithms Can Save Energy. Proceedings of the 30th International Symposium on Algorithms and Computation (ISAAC), Shanghai, China."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Xu, Y., Zhang, H., Miller, K., Singh, A., and Dubrawski, A. (2017). Noise-tolerant interactive learning using pairwise comparisons. Advances in Neural Information Processing Systems, Curran Associates, Inc.","DOI":"10.1109\/ACSSC.2018.8645081"},{"key":"ref_13","unstructured":"Hopkins, M., Kane, D., Lovett, S., and Mahajan, G. (2020). Noise-tolerant, reliable active classification with comparison queries. arXiv."},{"key":"ref_14","first-page":"7456","article-title":"Foundations of Comparison-Based Hierarchical Clustering","volume":"Volume 32","author":"Ghoshdastidar","year":"2019","journal-title":"Advances in Neural Information Processing Systems"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Cohen-addad, V., Kanade, V., Mallmann-trenn, F., and Mathieu, C. (2019). Hierarchical Clustering: Objective Functions and Algorithms. J. ACM, 66.","DOI":"10.1145\/3321386"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1003","DOI":"10.1109\/TKDE.2002.1033770","article-title":"CLARANS: A method for clustering objects for spatial data mining","volume":"14","author":"Ng","year":"2002","journal-title":"Knowl. Data Eng. IEEE Trans."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"3336","DOI":"10.1016\/j.eswa.2008.01.039","article-title":"A simple and fast algorithm for K-medoids clustering","volume":"36","author":"Park","year":"2009","journal-title":"Expert Syst. Appl."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Schubert, E., and Rousseeuw, P.J. (2019). Faster k-medoids clustering: Improving the PAM, CLARA, and CLARANS algorithms. International Conference on Similarity Search and Applications, Springer.","DOI":"10.1007\/978-3-030-32047-8_16"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Wang, X., Wang, X., and Wilkes, D.M. (2020). An Efficient K-Medoids Clustering Algorithm for Large Scale Data. Machine Learning-based Natural Scene Recognition for Mobile Robot Localization in An Unknown Environment, Springer.","DOI":"10.1007\/978-981-13-9217-7"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Sammut, C., and Webb, G.I. (2010). K-Medoids Clustering. Encyclopedia of Machine Learning, Springer.","DOI":"10.1007\/978-0-387-30164-8"},{"key":"ref_21","unstructured":"Geissmann, B., Leucci, S., Liu, C., and Penna, P. (2019, January 9\u201311). Optimal Sorting with Persistent Comparison Errors. Proceedings of the 27th Annual European Symposium on Algorithms (ESA), Munich\/Garching, Germany."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Kane, D.M., Lovett, S., Moran, S., and Zhang, J. (2017, January 15\u201317). Active Classification with Comparison Queries. Proceedings of the 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), Berkeley, CA, USA.","DOI":"10.1109\/FOCS.2017.40"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Ukkonen, A. (2017, January 18\u201321). Crowdsourced Correlation Clustering with Relative Distance Comparisons. Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), New Orleans, LA, USA.","DOI":"10.1109\/ICDM.2017.148"},{"key":"ref_24","first-page":"3216","article-title":"Clustering with Same-Cluster Queries","volume":"Volume 29","author":"Lee","year":"2016","journal-title":"Advances in Neural Information Processing Systems"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Ailon, N., Bhattacharya, A., and Jaiswal, R. (2018). Approximate correlation clustering using same-cluster queries. Latin American Symposium on Theoretical Informatics (LATIN), Springer.","DOI":"10.1007\/978-3-319-77404-6_2"},{"key":"ref_26","unstructured":"Saha, B., and Subramanian, S. (2019, January 9\u201311). Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost. Proceedings of the 27th Annual European Symposium on Algorithms (ESA), Munich\/Garching, Germany."},{"key":"ref_27","unstructured":"Bressan, M., Cesa-Bianchi, N., Lattanzi, S., and Paudice, A. (2020). Exact Recovery of Mangled Clusters with Same-Cluster Queries. arXiV."},{"key":"ref_28","first-page":"5788","article-title":"Clustering with Noisy Queries","volume":"Volume 30","author":"Guyon","year":"2017","journal-title":"Advances in Neural Information Processing Systems"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"105833","DOI":"10.1016\/j.ipl.2019.105833","article-title":"On semi-supervised active clustering of stable instances with oracles","volume":"151","author":"Sanyal","year":"2019","journal-title":"Inf. Process. Lett."},{"key":"ref_30","first-page":"6649","article-title":"Query k-means clustering and the double dixie cup problem","volume":"31","author":"Chien","year":"2018","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_31","unstructured":"Epstein, L., and Erlebach, T. (2018). Longest Increasing Subsequence Under Persistent Comparison Errors. Approximation and Online Algorithms, Springer International Publishing."},{"key":"ref_32","unstructured":"Ngo, H.Q. (2009). The Weighted Coupon Collector\u2019s Problem and Applications. Computing and Combinatorics, Springer."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1051\/ps\/2016016","article-title":"The coupon collector\u2019s problem revisited: Generalizing the double Dixie cup problem of Newman and Shepp","volume":"20","author":"Doumas","year":"2016","journal-title":"ESAIM Probab. Stat."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"58","DOI":"10.2307\/2308930","article-title":"The Double Dixie Cup Problem","volume":"67","author":"Newman","year":"1960","journal-title":"Am. Math. Mon."},{"key":"ref_35","first-page":"138","article-title":"On subtrees of trees","volume":"34","author":"Wang","year":"2005","journal-title":"Adv. Appl. Math.\u2014Advan Appl Math"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/2\/55\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:21:17Z","timestamp":1760160077000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/2\/55"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,8]]},"references-count":35,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2021,2]]}},"alternative-id":["a14020055"],"URL":"https:\/\/doi.org\/10.3390\/a14020055","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,2,8]]}}}