{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T09:53:42Z","timestamp":1773482022524,"version":"3.50.1"},"reference-count":74,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2024,5,29]],"date-time":"2024-05-29T00:00:00Z","timestamp":1716940800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100006374","name":"Natural Science Foundation of Anhui Province","doi-asserted-by":"publisher","award":["2208085MF163"],"award-info":[{"award-number":["2208085MF163"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100006374","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62272432"],"award-info":[{"award-number":["62272432"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100006374","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2021YFA1000900"],"award-info":[{"award-number":["2021YFA1000900"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2024,5,29]]},"abstract":"<jats:p>DBSCAN is a popular density-based clustering algorithm that has many different applications in practice. However, the running time of DBSCAN in high-dimensional space or general metric space (\\em e.g., clustering a set of texts by using edit distance) can be as large as quadratic in the input size. Moreover, most of existing accelerating techniques for DBSCAN are only available for low-dimensional Euclidean space. In this paper, we study the DBSCAN problem under the assumption that the inliers (the core points and border points) have a low intrinsic dimension (which is a realistic assumption for many high-dimensional applications), where the outliers can locate anywhere in the space without any assumption. First, we propose a k-center clustering based algorithm that can reduce the time-consuming labeling and merging tasks of DBSCAN to be linear. Further, we propose a linear time approximate DBSCAN algorithm, where the key idea is building a novel small-size summary for the core points. Also, our algorithm can be efficiently implemented for streaming data and the required memory is independent of the input size. Finally, we conduct our experiments and compare our algorithms with several popular DBSCAN algorithms. The experimental results suggest that our proposed approach can significantly reduce the computational complexity in practice.<\/jats:p>","DOI":"10.1145\/3654981","type":"journal-article","created":{"date-parts":[[2024,5,30]],"date-time":"2024-05-30T09:44:53Z","timestamp":1717062293000},"page":"1-25","source":"Crossref","is-referenced-by-count":3,"title":["Towards Metric DBSCAN: Exact, Approximate, and Streaming Algorithms"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-4735-3212","authenticated-orcid":false,"given":"Guanlin","family":"Mo","sequence":"first","affiliation":[{"name":"University of Science and Technology of China, Hefei, Anhui, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-2076-2165","authenticated-orcid":false,"given":"Shihong","family":"Song","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, Anhui, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1307-6077","authenticated-orcid":false,"given":"Hu","family":"Ding","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, Anhui, China"}]}],"member":"320","published-online":{"date-parts":[[2024,5,30]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/98524.98567"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/304181.304187"},{"key":"e_1_2_2_3_1","first-page":"2055","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Babenko Artem","year":"2016","unstructured":"Artem Babenko and Victor Lempitsky. Efficient indexing of billion-scale datasets of deep descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2055--2063, 2016."},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143857"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-019-9059-3"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2017.2693418"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3313641"},{"key":"e_1_2_2_8_1","volume-title":"evostream--evolutionary stream clustering utilizing idle times. Big data research, 14:101--111","author":"Carnein Matthias","year":"2018","unstructured":"Matthias Carnein and Heike Trautmann. evostream--evolutionary stream clustering utilizing idle times. Big data research, 14:101--111, 2018."},{"key":"e_1_2_2_9_1","first-page":"12","article-title":"Solving k-center clustering (with outliers) in mapreduce and streaming, almost as accurately as sequentially","author":"Ceccarello Matteo","year":"2019","unstructured":"Matteo Ceccarello, Andrea Pietracaprina, and Geppino Pucci. Solving k-center clustering (with outliers) in mapreduce and streaming, almost as accurately as sequentially. Proceedings of the VLDB Endowment, 12, 2019.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_2_2_10_1","volume-title":"Anomaly detection: A survey. ACM computing surveys (CSUR), 41(3):1--58","author":"Chandola Varun","year":"2009","unstructured":"Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM computing surveys (CSUR), 41(3):1--58, 2009."},{"key":"e_1_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218195905001683"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1281192.1281210"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1132516.1132564"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.1000236"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218195919400028"},{"key":"e_1_2_2_16_1","volume-title":"27th Annual European Symposium on Algorithms (ESA 2019","author":"Ding Hu","year":"2019","unstructured":"Hu Ding, Haikuo Yu, and Zixiu Wang. Greedy strategy works for k-center clustering with outliers and coreset construction. In 27th Annual European Symposium on Algorithms (ESA 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019."},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/3491440.3491866"},{"key":"e_1_2_2_18_1","volume-title":"Proceedings of the Third International Workshop on Paraphrasing (IWP2005)","author":"Dolan William B","year":"2005","unstructured":"William B Dolan, Chris Quirk, and Chris Brockett. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the Third International Workshop on Paraphrasing (IWP2005), 2005."},{"key":"e_1_2_2_19_1","volume-title":"UCI machine learning repository","author":"Dua Dheeru","year":"2017","unstructured":"Dheeru Dua and Casey Graff. UCI machine learning repository, 2017. URL http:\/\/archive.ics.uci.edu\/ml."},{"key":"e_1_2_2_20_1","first-page":"9267","volume-title":"International Conference on Machine Learning","author":"Elkin Yury","year":"2023","unstructured":"Yury Elkin and Vitaliy Kurlin. A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree. In International Conference on Machine Learning, pages 9267--9311. PMLR, 2023."},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/220279.220293"},{"key":"e_1_2_2_22_1","first-page":"85","volume-title":"CCCG","volume":"95","author":"Erickson Jeff","year":"1995","unstructured":"Jeff Erickson. On the relative complexities of some geometric problems. In CCCG, volume 95, pages 85--90, 1995."},{"key":"e_1_2_2_23_1","first-page":"226","volume-title":"A density-based algorithm for discovering clusters in large spatial databases with noise. In kdd","author":"Ester Martin","year":"1996","unstructured":"Martin Ester, Hans-Peter Kriegel, J\u00f6rg Sander, Xiaowei Xu, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In kdd, volume 96, pages 226--231, 1996."},{"key":"e_1_2_2_24_1","volume-title":"Challenges of big data analysis. National science review, 1(2):293--314","author":"Fan Jianqing","year":"2014","unstructured":"Jianqing Fan, Fang Han, and Han Liu. Challenges of big data analysis. National science review, 1(2):293--314, 2014."},{"key":"e_1_2_2_25_1","first-page":"481","volume-title":"Proceedings 21","author":"Fichtenberger Hendrik","year":"2013","unstructured":"Hendrik Fichtenberger, Marc Gill\u00e9, Melanie Schmidt, Chris Schwiegelshohn, and Christian Sohler. Bico: Birch meets coresets for k-means clustering. In Algorithms--ESA 2013: 21st Annual European Symposium, Sophia Antipolis, France, September 2--4, 2013. Proceedings 21, pages 481--492. Springer, 2013."},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2737792"},{"key":"e_1_2_2_27_1","volume-title":"Clustering to minimize the maximum intercluster distance. Theoretical computer science, 38: 293--306","author":"Gonzalez Teofilo F","year":"1985","unstructured":"Teofilo F Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical computer science, 38: 293--306, 1985."},{"key":"e_1_2_2_28_1","volume-title":"A faster algorithm for dbscan, master' s thesis","author":"Gunawan A","year":"2013","unstructured":"A Gunawan and M de Berg. A faster algorithm for dbscan, master' s thesis. Technical University of Eindhoven, 2013."},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/SFCS.2003.1238226"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CIC.1997.647926"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2016.2522412"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0097539704446281"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-013-3158-3"},{"key":"e_1_2_2_34_1","volume-title":"A unified approach to approximation algorithms for bottleneck problems. Journal of the ACM (JACM), 33(3):533--550","author":"Hochbaum Dorit S","year":"1986","unstructured":"Dorit S Hochbaum and David B Shmoys. A unified approach to approximation algorithms for bottleneck problems. Journal of the ACM (JACM), 33(3):533--550, 1986."},{"key":"e_1_2_2_35_1","first-page":"1","volume-title":"6th International Conference on Learning Representations (ICLR 2018","volume":"6","author":"Houle Michael E","year":"1801","unstructured":"Michael E Houle. Characterizing adversarial subspaces using local intrinsic dimensionality. In 6th International Conference on Learning Representations (ICLR 2018), CoRR abs\/1801.02613, volume 6, pages 1--15, 2018."},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2018.00082"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01908075"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.291440"},{"key":"e_1_2_2_39_1","first-page":"1162","volume-title":"International Conference on Machine Learning","author":"Izbicki Mike","year":"2015","unstructured":"Mike Izbicki and Christian Shelton. Faster cover trees. In International Conference on Machine Learning, pages 1162--1170. PMLR, 2015."},{"key":"e_1_2_2_40_1","first-page":"3019","volume-title":"International conference on machine learning","author":"Jang Jennifer","year":"2019","unstructured":"Jennifer Jang and Heinrich Jiang. Dbscan: Towards fast and scalable density clustering. In International conference on machine learning, pages 3019--3029. PMLR, 2019."},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00409"},{"key":"e_1_2_2_42_1","volume-title":"Product quantization for nearest neighbor search","author":"Jegou Herve","year":"2010","unstructured":"Herve Jegou, Matthijs Douze, and Cordelia Schmid. Product quantization for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence, 33(1):117--128, 2010."},{"key":"e_1_2_2_43_1","first-page":"798","volume-title":"Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms","author":"Krauthgamer Robert","year":"2004","unstructured":"Robert Krauthgamer and James R Lee. Navigating nets: Simple algorithms for proximity search. In Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms, pages 798--807. Citeseer, 2004."},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1002\/widm.30"},{"key":"e_1_2_2_45_1","volume-title":"Learning multiple layers of features from tiny images. Technical report","author":"Krizhevsky Alex","year":"2009","unstructured":"Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. Technical report, 2009."},{"key":"e_1_2_2_46_1","volume-title":"Revisiting k-means: New algorithms via bayesian nonparametrics. arXiv preprint arXiv:1111.0352","author":"Kulis Brian","year":"2011","unstructured":"Brian Kulis and Michael I Jordan. Revisiting k-means: New algorithms via bayesian nonparametrics. arXiv preprint arXiv:1111.0352, 2011."},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_2_2_48_1","first-page":"707","volume-title":"Soviet physics doklady","author":"Levenshtein Vladimir I","year":"1966","unstructured":"Vladimir I Levenshtein et al. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet physics doklady, volume 10, pages 707--710. Soviet Union, 1966."},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.14778\/3021924.3021932"},{"key":"e_1_2_2_50_1","volume-title":"Quantitative structure-- activity relationship models for ready biodegradability of chemicals. Journal of chemical information and modeling, 53 :867--878","author":"Mansouri Kamel","year":"2013","unstructured":"Kamel Mansouri, Tine Ringsted, Davide Ballabio, Roberto Todeschini, and Viviana Consonni. Quantitative structure-- activity relationship models for ready biodegradability of chemicals. Journal of chemical information and modeling, 53 :867--878, 2013."},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.21105\/joss.00205"},{"key":"e_1_2_2_52_1","volume-title":"A guided tour to approximate string matching. ACM computing surveys (CSUR), 33(1):31--88","author":"Navarro Gonzalo","year":"2001","unstructured":"Gonzalo Navarro. A guided tour to approximate string matching. ACM computing surveys (CSUR), 33(1):31--88, 2001."},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511933912"},{"key":"e_1_2_2_55_1","volume-title":"Clustering by fast search and find of density peaks. science, 344(6191):1492--1496","author":"Rodriguez Alex","year":"2014","unstructured":"Alex Rodriguez and Alessandro Laio. Clustering by fast search and find of density peaks. science, 344(6191):1492--1496, 2014."},{"key":"e_1_2_2_56_1","volume-title":"Nonlinear dimensionality reduction by locally linear embedding. science, 290 (5500):2323--2326","author":"Roweis Sam T","year":"2000","unstructured":"Sam T Roweis and Lawrence K Saul. Nonlinear dimensionality reduction by locally linear embedding. science, 290 (5500):2323--2326, 2000."},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3188745.3188916"},{"key":"e_1_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2019.8891020"},{"key":"e_1_2_2_59_1","volume-title":"Hans Peter Kriegel, and Xiaowei Xu. Dbscan revisited, revisited: why and how you should (still) use dbscan. ACM Transactions on Database Systems (TODS), 42(3):1--21","author":"Schubert Erich","year":"2017","unstructured":"Erich Schubert, J\u00f6rg Sander, Martin Ester, Hans Peter Kriegel, and Xiaowei Xu. Dbscan revisited, revisited: why and how you should (still) use dbscan. ACM Transactions on Database Systems (TODS), 42(3):1--21, 2017."},{"key":"e_1_2_2_60_1","volume-title":"scikit learn. https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_moons. html#sklearn.datasets.make_moons","year":"2007","unstructured":"scikit-learn developers. scikit learn. https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.datasets.make_moons. html#sklearn.datasets.make_moons, 2007--2023."},{"key":"e_1_2_2_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196887"},{"key":"e_1_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007352.1007399"},{"key":"e_1_2_2_63_1","volume-title":"Pearson Education India","author":"Tan Pang-Ning","year":"2016","unstructured":"Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. Introduction to data mining. Pearson Education India, 2016."},{"key":"e_1_2_2_64_1","first-page":"210","volume-title":"Proceedings, Part II 11","author":"Veeling Bastiaan S","year":"2018","unstructured":"Bastiaan S Veeling, Jasper Linmans, Jim Winkens, Taco Cohen, and Max Welling. Rotation equivariant cnns for digital pathology. In Medical Image Computing and Computer Assisted Intervention--MICCAI 2018: 21st International Conference, Granada, Spain, September 16--20, 2018, Proceedings, Part II 11, pages 210--218. Springer, 2018."},{"key":"e_1_2_2_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553511"},{"key":"e_1_2_2_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380582"},{"key":"e_1_2_2_67_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00290"},{"key":"e_1_2_2_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2021.3058771"},{"key":"e_1_2_2_69_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1101"},{"key":"e_1_2_2_70_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICMLC.2007.4370588"},{"key":"e_1_2_2_71_1","volume-title":"Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms","author":"Xiao Han","year":"2017","unstructured":"Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms, 2017."},{"key":"e_1_2_2_72_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2019.00122"},{"key":"e_1_2_2_73_1","volume-title":"Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365","author":"Yu Fisher","year":"2015","unstructured":"Fisher Yu, Yinda Zhang, Shuran Song, Ari Seff, and Jianxiong Xiao. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015."},{"key":"e_1_2_2_74_1","volume-title":"Character-level convolutional networks for text classification. Advances in neural information processing systems, 28","author":"Zhang Xiang","year":"2015","unstructured":"Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level convolutional networks for text classification. Advances in neural information processing systems, 28, 2015."}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3654981","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3654981","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T14:39:18Z","timestamp":1755787158000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3654981"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,29]]},"references-count":74,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,5,29]]}},"alternative-id":["10.1145\/3654981"],"URL":"https:\/\/doi.org\/10.1145\/3654981","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,29]]}}}