{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:11:05Z","timestamp":1750219865436,"version":"3.41.0"},"reference-count":26,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,5,26]],"date-time":"2023-05-26T00:00:00Z","timestamp":1685059200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Austrian Science Fund","award":["P 34962"],"award-info":[{"award-number":["P 34962"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2023,5,26]]},"abstract":"<jats:p>Density-based clustering aims to find groups of similar objects (i.e., clusters) in a given dataset. Applications include, e.g., process mining and anomaly detection. It comes with two user parameters (\u03b5, MinPts) that determine the clustering result, but are typically unknown in advance. Thus, users need to interactively test various settings until satisfying clusterings are found. However, existing solutions suffer from the following limitations: (a) Ineffective pruning of expensive neighborhood computations. (b) Approximate clustering, where objects are falsely labeled noise. (c) Restricted parameter tuning that is limited to \u03b5 whereas MinPts is constant, which reduces the explorable clusterings. (d) Inflexibility in terms of applicable data types and distance functions. We propose FINEX, a linear-space index that overcomes these limitations. Our index provides exact clusterings and can be queried with either of the two parameters. FINEX avoids neighborhood computations where possible and reduces the complexities of the remaining computations by leveraging fundamental properties of density-based clusters. 
Hence, our solution is efficient and flexible regarding data types and distance functions. Moreover, FINEX respects the original and straightforward notion of density-based clustering. In our experiments on 12 large real-world datasets from various domains, FINEX frequently outperforms state-of-the-art techniques for exact clustering by orders of magnitude.<\/jats:p>","DOI":"10.1145\/3588925","type":"journal-article","created":{"date-parts":[[2023,5,30]],"date-time":"2023-05-30T17:42:05Z","timestamp":1685468525000},"page":"1-25","source":"Crossref","is-referenced-by-count":0,"title":["FINEX: A Fast Index for Exact & Flexible Density-Based Clustering"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-6572-2228","authenticated-orcid":false,"given":"Konstantin Emil","family":"Thiel","sequence":"first","affiliation":[{"name":"University of Salzburg, Salzburg, Austria"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-3742-5555","authenticated-orcid":false,"given":"Daniel","family":"Kocher","sequence":"additional","affiliation":[{"name":"University of Salzburg, Salzburg, Austria"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3036-6201","authenticated-orcid":false,"given":"Nikolaus","family":"Augsten","sequence":"additional","affiliation":[{"name":"University of Salzburg, Salzburg, Austria"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7190-6825","authenticated-orcid":false,"given":"Thomas","family":"H\u00fctter","sequence":"additional","affiliation":[{"name":"University of Salzburg, Salzburg, Austria"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6460-6306","authenticated-orcid":false,"given":"Willi","family":"Mann","sequence":"additional","affiliation":[{"name":"Celonis SE, Munich, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-7656-7526","authenticated-orcid":false,"given":"Daniel","family":"Schmitt","sequence":"additional","affiliation":[{"name":"University of Salzburg, Salzburg, 
Austria"}]}],"member":"320","published-online":{"date-parts":[[2023,5,30]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2005.111"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/304182.304187"},{"key":"e_1_2_2_3_1","volume-title":"Similarity Joins in Relational Database Systems","author":"Augsten Nikolaus","unstructured":"Nikolaus Augsten and Michael B\u00f6hlen. 2013. Similarity Joins in Relational Database Systems (3rd ed.). San Rafael: Morgan & Claypool Publishers.","edition":"3"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/361002.361007"},{"volume-title":"Advances in Knowledge Discovery and Data Mining (PAKDD '06)","author":"Brecheisen Stefan","key":"e_1_2_2_5_1","unstructured":"Stefan Brecheisen, Hans-Peter Kriegel, and Martin Pfeifle. 2006. Parallel Density-Based Clustering of Complex Objects. In Advances in Knowledge Discovery and Data Mining (PAKDD '06). Springer Berlin Heidelberg, 179--188."},{"volume-title":"Advances in Knowledge Discovery and Data Mining (PAKDD '13)","author":"Campello Ricardo J.G.B.","key":"e_1_2_2_6_1","unstructured":"Ricardo J.G.B. Campello, Davoud Moulavi, and J\u00f6rg Sander. 2013. Density-Based Clustering Based on Hierarchical Density Estimates. In Advances in Knowledge Discovery and Data Mining (PAKDD '13). Springer Berlin Heidelberg, 160--172."},{"key":"e_1_2_2_7_1","volume-title":"Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB '97)","author":"Ciaccia Paolo","year":"1997","unstructured":"Paolo Ciaccia, Marco Patella, and Pavel Zezula. 1997. M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. In Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB '97). 
Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 426--435."},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.5555\/3001460.3001507"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196922"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/602259.602266"},{"key":"e_1_2_2_11_1","volume-title":"Data Mining: Concepts and Techniques","author":"Han Jiawei","year":"2012","unstructured":"Jiawei Han, Micheline Kamber, and Jian Pei. 2012. Data Mining: Concepts and Techniques (3rd ed.). Morgan Kaufmann.","edition":"3"},{"key":"e_1_2_2_12_1","volume-title":"Discovering Deviating Cases and Process Variants Using Trace Clustering. In 27th Benelux Conference on Artificial Intelligence (BNAIC","author":"Hompes B. F. A.","year":"2015","unstructured":"B. F. A. Hompes, J. C. A. M. Buijs, W. M. P. van der Aalst, P. M. Dixit, and J. Buurman. 2015. Discovering Deviating Cases and Process Variants Using Trace Clustering. In 27th Benelux Conference on Artificial Intelligence (BNAIC 2015). Hasselt, Belgium."},{"key":"e_1_2_2_13_1","volume-title":"Proceedings of the 36th International Conference on Machine Learning","volume":"97","author":"Jang Jennifer","year":"2019","unstructured":"Jennifer Jang and Heinrich Jiang. 2019. DBSCAN++: Towards fast and scalable density clustering. In Proceedings of the 36th International Conference on Machine Learning, Vol. 97. PMLR, 3019--3029."},{"key":"e_1_2_2_14_1","volume-title":"KDTREE 2: Fortran 95 and C software to efficiently search for near neighbors in a multi-dimensional Euclidean space. arXiv preprint physics\/0408067","author":"Kennel Matthew B","year":"2004","unstructured":"Matthew B. Kennel. 2004. KDTREE 2: Fortran 95 and C software to efficiently search for near neighbors in a multi-dimensional Euclidean space. arXiv preprint physics\/0408067 (2004)."},{"key":"e_1_2_2_15_1","volume-title":"Proceedings of the 24th International Conference on Extending Database Technology. 
EDBT, 109--120","author":"Kocher Daniel","year":"2021","unstructured":"Daniel Kocher, Nikolaus Augsten, and Willi Mann. 2021. Scaling Density-Based Clustering to Large Collections of Sets. In Proceedings of the 24th International Conference on Extending Database Technology. EDBT, 109--120."},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939750"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.3023125"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/2947618.2947620"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1093\/mnras\/stz1260"},{"key":"e_1_2_2_20_1","volume-title":"http:\/\/archive.ics.uci.edu\/ml\/datasets\/gassensorsforhomeactivitymonitoring","author":"Machine Learning Repository UCI","year":"2022","unstructured":"UCI Machine Learning Repository. 2022. HT-SENSOR. http:\/\/archive.ics.uci.edu\/ml\/datasets\/gassensorsforhomeactivitymonitoring. Accessed: October 2022."},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03973-7_8"},{"volume-title":"or mean removal and variance scaling. https:\/\/scikit-learn.org\/stable\/modules\/preprocessing.html","year":"2022","key":"e_1_2_2_22_1","unstructured":"scikit-learn. 2022. Standardization, or mean removal and variance scaling. https:\/\/scikit-learn.org\/stable\/modules\/preprocessing.html. Accessed: October 2022."},{"key":"e_1_2_2_23_1","doi-asserted-by":"crossref","unstructured":"Kevin Sheridan, Tejas G. Puranik, Eugene Mangortey, Olivia J. Pinon-Fischer, Michelle Kirby, and Dimitri N. Mavris. 2020. An application of DBSCAN clustering for flight anomaly detection during the approach phase. In AIAA Scitech 2020 Forum.","DOI":"10.2514\/6.2020-1851"},{"key":"e_1_2_2_24_1","volume-title":"FINEX: A Fast Index for Exact & Flexible Density-Based Clustering (Extended Version with Proofs)*. 
arxiv: 2304.04817 [cs.DB]","author":"Thiel Konstantin Emil","year":"2023","unstructured":"Konstantin Emil Thiel, Daniel Kocher, Nikolaus Augsten, Thomas H\u00fctter, Willi Mann, and Daniel Ulrich Schmitt. 2023. FINEX: A Fast Index for Exact & Flexible Density-Based Clustering (Extended Version with Proofs)*. arXiv:2304.04817 [cs.DB]"},{"key":"e_1_2_2_25_1","doi-asserted-by":"crossref","unstructured":"Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang. 2019. Leveraging set relations in exact and dynamic set similarity join. In The VLDB Journal. 267--292.","DOI":"10.1007\/s00778-018-0529-2"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2000824.2000825"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3588925","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3588925","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:47:37Z","timestamp":1750178857000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3588925"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,26]]},"references-count":26,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,5,26]]}},"alternative-id":["10.1145\/3588925"],"URL":"https:\/\/doi.org\/10.1145\/3588925","relation":{},"ISSN":["2836-6573"],"issn-type":[{"type":"electronic","value":"2836-6573"}],"subject":[],"published":{"date-parts":[[2023,5,26]]}}}