{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,1,11]],"date-time":"2023-01-11T05:42:29Z","timestamp":1673415749948},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"10","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2021,6]]},"abstract":"\n Metric based comparison operations such as finding maximum, nearest and farthest neighbor are fundamental to studying various clustering techniques such as\n k<\/jats:italic>\n -center clustering and agglomerative hierarchical clustering. These techniques crucially rely on accurate estimation of pairwise distance between records. However, computing exact features of the records, and their pairwise distances is often challenging, and sometimes not possible. We circumvent this challenge by leveraging weak supervision in the form of a comparison oracle that compares the relative distance between the queried points such as `Is point\n u<\/jats:italic>\n closer to\n v<\/jats:italic>\n or\n w<\/jats:italic>\n closer to\n x<\/jats:italic>\n ?'.\n <\/jats:p>\n \n However, it is possible that some queries are easier to answer than others using a comparison oracle. We capture this by introducing two different noise models called adversarial and probabilistic noise. In this paper, we study various problems that include finding maximum, nearest\/farthest neighbor search under these noise models. Building upon the techniques we develop for these problems, we give robust algorithms for\n k<\/jats:italic>\n -center clustering and agglomerative hierarchical clustering. We prove that our algorithms achieve good approximation guarantees with a high probability and analyze their query complexity. We evaluate the effectiveness and efficiency of our techniques empirically on various real-world datasets.\n <\/jats:p>","DOI":"10.14778\/3467861.3467862","type":"journal-article","created":{"date-parts":[[2021,10,26]],"date-time":"2021-10-26T16:17:12Z","timestamp":1635265032000},"page":"1703-1716","source":"Crossref","is-referenced-by-count":3,"title":["How to design robust algorithms using noisy comparison Oracle"],"prefix":"10.14778","volume":"14","author":[{"given":"Raghavendra","family":"Addanki","sequence":"first","affiliation":[{"name":"UMass Amherst"}]},{"given":"Sainyam","family":"Galhotra","sequence":"additional","affiliation":[{"name":"UMass Amherst"}]},{"given":"Barna","family":"Saha","sequence":"additional","affiliation":[{"name":"UC Berkeley"}]}],"member":"320","published-online":{"date-parts":[[2021,10,26]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"[n.d.]. Google Vision API https:\/\/cloud.google.com\/vision. [n.d.]. Google Vision API https:\/\/cloud.google.com\/vision."},{"key":"e_1_2_1_2_1","unstructured":"https:\/\/simplemaps.com\/data\/us-cities. United States Cities Database. (https:\/\/simplemaps.com\/data\/us-cities). https:\/\/simplemaps.com\/data\/us-cities. United States Cities Database. (https:\/\/simplemaps.com\/data\/us-cities)."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/3467861.3467862"},{"key":"e_1_2_1_4_1","volume-title":"Approximate Clustering with Same-Cluster Queries. In 9th Innovations in Theoretical Computer Science Conference (ITCS","volume":"94","author":"Ailon Nir","year":"2018","unstructured":"Nir Ailon , Anup Bhattacharya , Ragesh Jaiswal , and Amit Kumar . 2018 . Approximate Clustering with Same-Cluster Queries. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), Vol. 94 . Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 40. Nir Ailon, Anup Bhattacharya, Ragesh Jaiswal, and Amit Kumar. 2018. Approximate Clustering with Same-Cluster Queries. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), Vol. 94. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 40."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-02927-1_5"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.14778\/3204028.3204034"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157382.3157458"},{"key":"e_1_2_1_8_1","volume-title":"Thirty-Second AAAI Conference on Artificial Intelligence.","author":"Ben-David Shai","year":"2018","unstructured":"Shai Ben-David . 2018 . Clustering-what both theoreticians and practitioners are doing wrong . In Thirty-Second AAAI Conference on Artificial Intelligence. Shai Ben-David. 2018. Clustering-what both theoreticians and practitioners are doing wrong. In Thirty-Second AAAI Conference on Artificial Intelligence."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897518.2897642"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/1347082.1347112"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3455410"},{"key":"e_1_2_1_12_1","volume-title":"Hierarchical clustering with structural constraints. arXiv preprint arXiv:1805.09476","author":"Chatziafratis Vaggos","year":"2018","unstructured":"Vaggos Chatziafratis , Rad Niazadeh , and Moses Charikar . 2018. Hierarchical clustering with structural constraints. arXiv preprint arXiv:1805.09476 ( 2018 ). Vaggos Chatziafratis, Rad Niazadeh, and Moses Charikar. 2018. Hierarchical clustering with structural constraints. arXiv preprint arXiv:1805.09476 (2018)."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.5555\/3327757.3327771"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/NCC.2019.8732224"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2015.2462357"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2684066"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3199672"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/3174304.3175295"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0097539791195877"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/2876473.2876474"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3183755"},{"key":"e_1_2_1_22_1","volume-title":"Sorting with Recurrent Comparison Errors. In 28th International Symposium on Algorithms and Computation (ISAAC","author":"Geissmann Barbara","year":"2017","unstructured":"Barbara Geissmann , Stefano Leucci , Chih-Hung Liu , and Paolo Penna . 2017 . Sorting with Recurrent Comparison Errors. In 28th International Symposium on Algorithms and Computation (ISAAC 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. Barbara Geissmann, Stefano Leucci, Chih-Hung Liu, and Paolo Penna. 2017. Sorting with Recurrent Comparison Errors. In 28th International Symposium on Algorithms and Computation (ISAAC 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik."},{"key":"e_1_2_1_23_1","volume-title":"Optimal Sorting with Persistent Comparison Errors. In 27th Annual European Symposium on Algorithms (ESA","volume":"144","author":"Geissmann Barbara","year":"2019","unstructured":"Barbara Geissmann , Stefano Leucci , Chih-Hung Liu , and Paolo Penna . 2019 . Optimal Sorting with Persistent Comparison Errors. In 27th Annual European Symposium on Algorithms (ESA 2019), Vol. 144 . Schloss Dagstuhl-Leibniz-Zentrum f\u00fcr Informatik, 49. Barbara Geissmann, Stefano Leucci, Chih-Hung Liu, and Paolo Penna. 2019. Optimal Sorting with Persistent Comparison Errors. In 27th Annual European Symposium on Algorithms (ESA 2019), Vol. 144. Schloss Dagstuhl-Leibniz-Zentrum f\u00fcr Informatik, 49."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00224-019-09957-5"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3454957"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2012.6224657"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2588576"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/0304-3975(85)90224-5"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380045"},{"key":"e_1_2_1_30_1","unstructured":"Gregory Griffin Alex Holub and Pietro Perona. 2007. Caltech-256 object category dataset. (2007). Gregory Griffin Alex Holub and Pietro Perona. 2007. Caltech-256 object category dataset. (2007)."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213880"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2872427.2883037"},{"key":"e_1_2_1_33_1","volume-title":"reliable active classification with comparison queries. arXiv preprint arXiv:2001.05497","author":"Hopkins Max","year":"2020","unstructured":"Max Hopkins , Daniel Kane , Shachar Lovett , and Gaurav Mahajan . 2020. Noise-tolerant , reliable active classification with comparison queries. arXiv preprint arXiv:2001.05497 ( 2020 ). Max Hopkins, Daniel Kane, Shachar Lovett, and Gaurav Mahajan. 2020. Noise-tolerant, reliable active classification with comparison queries. arXiv preprint arXiv:2001.05497 (2020)."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3455228"},{"key":"e_1_2_1_35_1","volume-title":"Metric learning for individual fairness. arXiv preprint arXiv:1906.00250","author":"Ilvento Christina","year":"2019","unstructured":"Christina Ilvento . 2019. Metric learning for individual fairness. arXiv preprint arXiv:1906.00250 ( 2019 ). Christina Ilvento. 2019. Metric learning for individual fairness. arXiv preprint arXiv:1906.00250 (2019)."},{"key":"e_1_2_1_36_1","volume-title":"Comparison based learning from weak oracles. arXiv preprint arXiv:1802.06942","author":"Kazemi Ehsan","year":"2018","unstructured":"Ehsan Kazemi , Lin Chen , Sanjoy Dasgupta , and Amin Karbasi . 2018. Comparison based learning from weak oracles. arXiv preprint arXiv:1802.06942 ( 2018 ). Ehsan Kazemi, Lin Chen, Sanjoy Dasgupta, and Amin Karbasi. 2018. Comparison based learning from weak oracles. arXiv preprint arXiv:1802.06942 (2018)."},{"key":"e_1_2_1_37_1","volume-title":"Relaxed oracles for semi-supervised clustering. arXiv preprint arXiv:1711.07433","author":"Kim Taewan","year":"2017","unstructured":"Taewan Kim and Joydeep Ghosh . 2017. Relaxed oracles for semi-supervised clustering. arXiv preprint arXiv:1711.07433 ( 2017 ). Taewan Kim and Joydeep Ghosh. 2017. Relaxed oracles for semi-supervised clustering. arXiv preprint arXiv:1711.07433 (2017)."},{"key":"e_1_2_1_38_1","volume-title":"Semi-supervised active clustering with weak oracles. arXiv preprint arXiv:1709.03202","author":"Kim Taewan","year":"2017","unstructured":"Taewan Kim and Joydeep Ghosh . 2017. Semi-supervised active clustering with weak oracles. arXiv preprint arXiv:1709.03202 ( 2017 ). Taewan Kim and Joydeep Ghosh. 2017. Semi-supervised active clustering with weak oracles. arXiv preprint arXiv:1709.03202 (2017)."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.5555\/2040572.2040653"},{"key":"e_1_2_1_40_1","volume-title":"International Conference on Machine Learning. 3448--3457","author":"Kleindessner Matth\u00e4us","year":"2019","unstructured":"Matth\u00e4us Kleindessner , Pranjal Awasthi , and Jamie Morgenstern . 2019 . Fair k-Center Clustering for Data Summarization . In International Conference on Machine Learning. 3448--3457 . Matth\u00e4us Kleindessner, Pranjal Awasthi, and Jamie Morgenstern. 2019. Fair k-Center Clustering for Data Summarization. In International Conference on Machine Learning. 3448--3457."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035953"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3455147"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295329"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.5555\/3294996.3295221"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.14778\/2947618.2947624"},{"key":"e_1_2_1_46_1","volume-title":"James Davis, Hector Garcia-Molina, and Neoklis Polyzotis.","author":"Polychronopoulos Vassilis","year":"2013","unstructured":"Vassilis Polychronopoulos , Luca De Alfaro , James Davis, Hector Garcia-Molina, and Neoklis Polyzotis. 2013 . Human-Powered Top-k Lists.. In WebDB. 25--30. Vassilis Polychronopoulos, Luca De Alfaro, James Davis, Hector Garcia-Molina, and Neoklis Polyzotis. 2013. Human-Powered Top-k Lists.. In WebDB. 25--30."},{"key":"e_1_2_1_47_1","volume-title":"A solution to the single-question crowd wisdom problem. Nature 541, 7638","author":"Prelec Dra\u017een","year":"2017","unstructured":"Dra\u017een Prelec , H Sebastian Seung , and John McCoy . 2017. A solution to the single-question crowd wisdom problem. Nature 541, 7638 ( 2017 ), 532--535. Dra\u017een Prelec, H Sebastian Seung, and John McCoy. 2017. A solution to the single-question crowd wisdom problem. Nature 541, 7638 (2017), 532--535."},{"key":"e_1_2_1_48_1","volume-title":"SLINK: an optimally efficient algorithm for the single-link cluster method. The computer journal 16, 1","author":"Sibson Robin","year":"1973","unstructured":"Robin Sibson . 1973. SLINK: an optimally efficient algorithm for the single-link cluster method. The computer journal 16, 1 ( 1973 ), 30--34. Robin Sibson. 1973. SLINK: an optimally efficient algorithm for the single-link cluster method. The computer journal 16, 1 (1973), 30--34."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.5555\/3104482.3104567"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2017.148"},{"key":"e_1_2_1_51_1","unstructured":"https:\/\/modal-python.readthedocs.io\/en\/latest\/. [n.d.]. modAL library. https:\/\/modal-python.readthedocs.io\/en\/latest\/. [n.d.]. modAL library."},{"key":"e_1_2_1_52_1","unstructured":"https:\/\/scikit-learn.org\/stable\/. [n.d.]. Scikit-learn. https:\/\/scikit-learn.org\/stable\/. [n.d.]. Scikit-learn."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.5555\/1965254"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/2187836.2187969"},{"key":"e_1_2_1_55_1","volume-title":"Skyline Computation with Noisy Comparisons. In Combinatorial Algorithms: 31st International Workshop, IWOCA 2020, Bordeaux, France, June 8--10, 2020, Proceedings. Springer, 289","author":"Verdugo Victor","unstructured":"Victor Verdugo . [n.d.]. Skyline Computation with Noisy Comparisons. In Combinatorial Algorithms: 31st International Workshop, IWOCA 2020, Bordeaux, France, June 8--10, 2020, Proceedings. Springer, 289 . Victor Verdugo. [n.d.]. Skyline Computation with Noisy Comparisons. In Combinatorial Algorithms: 31st International Workshop, IWOCA 2020, Bordeaux, France, June 8--10, 2020, Proceedings. Springer, 289."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2015.7113286"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732977.2732982"},{"key":"e_1_2_1_58_1","unstructured":"Ramya Korlakai Vinayak and Babak Hassibi. 2016. Crowdsourced clustering: Querying edges vs triangles. In Advances in Neural Information Processing Systems. 1316--1324. Ramya Korlakai Vinayak and Babak Hassibi. 2016. Crowdsourced clustering: Querying edges vs triangles. In Advances in Neural Information Processing Systems. 1316--1324."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.14778\/2350229.2350263"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.5555\/1971947"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3220064"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3467861.3467862","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:33:27Z","timestamp":1672223607000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3467861.3467862"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6]]},"references-count":61,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2021,6]]}},"alternative-id":["10.14778\/3467861.3467862"],"URL":"http:\/\/dx.doi.org\/10.14778\/3467861.3467862","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":["General Earth and Planetary Sciences","Water Science and Technology","Geography, Planning and Development"],"published":{"date-parts":[[2021,6]]}}}