{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T09:24:47Z","timestamp":1763976287148,"version":"3.41.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2021,10,31]],"date-time":"2021-10-31T00:00:00Z","timestamp":1635638400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF","award":["2105133, 2126474, 1901099, 1737633, IIS-1320580, IIS-0940818, IIS-1218168, and 1916518"],"award-info":[{"award-number":["2105133, 2126474, 1901099, 1737633, IIS-1320580, IIS-0940818, IIS-1218168, and 1916518"]}]},{"name":"Google\u2019s AI for Social Good Impact Scholars program"},{"name":"Dean\u2019s Research Initiative Award at the University of Maryland"},{"DOI":"10.13039\/100000203","name":"USGS","doi-asserted-by":"crossref","award":["G21AC10207"],"award-info":[{"award-number":["G21AC10207"]}],"id":[{"id":"10.13039\/100000203","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Pitt Momentum Fund Award"},{"DOI":"10.13039\/100000005","name":"USDOD","doi-asserted-by":"crossref","award":["HM0476-20-1-0009"],"award-info":[{"award-number":["HM0476-20-1-0009"]}],"id":[{"id":"10.13039\/100000005","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100000015","name":"USDOE","doi-asserted-by":"crossref","award":["DE-AR0000795"],"award-info":[{"award-number":["DE-AR0000795"]}],"id":[{"id":"10.13039\/100000015","id-type":"DOI","asserted-by":"crossref"}]},{"name":"NIH","award":["UL1 TR002494, KL2TR002492, and TL1 TR002493"],"award-info":[{"award-number":["UL1 TR002494, KL2TR002492, and TL1 
TR002493"]}]},{"DOI":"10.13039\/100000199","name":"USDA","doi-asserted-by":"crossref","award":["2017-51181-27222"],"award-info":[{"award-number":["2017-51181-27222"]}],"id":[{"id":"10.13039\/100000199","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Minnesota Super computing Institute"},{"name":"Safety Research using Simulation University Transportation Center"},{"name":"US-DOT\u2019s University Transportation Centers Program","award":["69A3551747131"],"award-info":[{"award-number":["69A3551747131"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2021,10,31]]},"abstract":"<jats:p>Cluster detection is important and widely used in a variety of applications, including public health, public safety, transportation, and so on. Given a collection of data points, we aim to detect density-connected spatial clusters with varying geometric shapes and densities, under the constraint that the clusters are statistically significant. The problem is challenging, because many societal applications and domain science studies have low tolerance for spurious results, and clusters may have arbitrary shapes and varying densities. As a classical topic in data mining and learning, a myriad of techniques have been developed to detect clusters with both varying shapes and densities (e.g., density-based, hierarchical, spectral, or deep clustering methods). However, the vast majority of these techniques do not consider statistical rigor and are susceptible to detecting spurious clusters formed as a result of natural randomness. On the other hand, scan statistic approaches explicitly control the rate of spurious results, but they typically assume a single \u201chotspot\u201d of over-density and many rely on further assumptions such as a tessellated input space. 
To unite the strengths of both lines of work, we propose a statistically robust formulation of a multi-scale DBSCAN, namely Significant DBSCAN+, to identify significant clusters that are density connected. As we will show, incorporation of statistical rigor is a powerful mechanism that allows the new Significant DBSCAN+ to outperform state-of-the-art clustering techniques in various scenarios. We also propose computational enhancements to speed-up the proposed approach. Experiment results show that Significant DBSCAN+ can simultaneously improve the success rate of true cluster detection (e.g., 10\u201320% increases in absolute F1 scores) and substantially reduce the rate of spurious results (e.g., from thousands\/hundreds of spurious detections to none or just a few across 100 datasets), and the acceleration methods can improve the efficiency for both clustered and non-clustered data.<\/jats:p>","DOI":"10.1145\/3474842","type":"journal-article","created":{"date-parts":[[2021,11,24]],"date-time":"2021-11-24T16:04:45Z","timestamp":1637769885000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["Significant DBSCAN+: Statistically Robust Density-based Clustering"],"prefix":"10.1145","volume":"12","author":[{"given":"Yiqun","family":"Xie","sequence":"first","affiliation":[{"name":"University of Maryland, College Park, MD"}]},{"given":"Xiaowei","family":"Jia","sequence":"additional","affiliation":[{"name":"University of Pittsburgh, S. 
Bouquet Street Pittsburgh, PA"}]},{"given":"Shashi","family":"Shekhar","sequence":"additional","affiliation":[{"name":"University of Minnesota, Minneapolis, MN"}]},{"given":"Han","family":"Bao","sequence":"additional","affiliation":[{"name":"University of Iowa, Iowa City, IA"}]},{"given":"Xun","family":"Zhou","sequence":"additional","affiliation":[{"name":"University of Iowa, Iowa City, IA"}]}],"member":"320","published-online":{"date-parts":[[2021,11,24]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2020. HDBSCAN. Retrieved from https:\/\/hdbscan.readthedocs.io\/en\/latest\/index.html."},{"key":"e_1_2_1_2_1","unstructured":"2020. National Cancer Institute. Retrieved from https:\/\/surveillance.cancer.gov\/satscan\/."},{"key":"e_1_2_1_3_1","unstructured":"2020. SaTScan. Retrieved from https:\/\/www.satscan.org\/."},{"key":"e_1_2_1_4_1","unstructured":"2021. Keras implementation for Deep Embedding Clustering (DEC). Retrieved from https:\/\/github.com\/XifengGuo\/DEC-keras."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3161602"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-37456-2_14"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2733381"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csda.2011.11.001"},{"key":"e_1_2_1_10_1","volume-title":"Anderson Ribeiro Duarte, and Ricardo Tavares","author":"Duczmal Luiz","year":"2009","unstructured":"Luiz Duczmal , Anderson Ribeiro Duarte, and Ricardo Tavares . 2009 . Extensions of the scan statistic for the detection and inference of spatial clusters. 
In Scan Statistics. Springer , 153\u2013177. Luiz Duczmal, Anderson Ribeiro Duarte, and Ricardo Tavares. 2009. Extensions of the scan statistic for the detection and inference of spatial clusters. In Scan Statistics. Springer, 153\u2013177."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1198\/106186006X112396"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2014.13"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2737792"},{"key":"e_1_2_1_14_1","volume-title":"Koutras","author":"Glaz Joseph","year":"2019","unstructured":"Joseph Glaz and Markos V . Koutras . 2019 . Handbook of Scan Statistics. Springer . Joseph Glaz and Markos V. Koutras. 2019. Handbook of Scan Statistics. Springer."},{"key":"e_1_2_1_15_1","first-page":"2892","article-title":"Partially view-aligned clustering","volume":"33","author":"Huang Zhenyu","year":"2020","unstructured":"Zhenyu Huang , Peng Hu , Joey Tianyi Zhou , Jiancheng Lv , and Xi Peng . 2020 . Partially view-aligned clustering . Advances in Neural Information Processing Systems 33 (2020), 2892 \u2013 2902 . https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/1e591403ff232de0f0f139ac51d99295-Abstract.html. Zhenyu Huang, Peng Hu, Joey Tianyi Zhou, Jiancheng Lv, and Xi Peng. 2020. Partially view-aligned clustering. Advances in Neural Information Processing Systems 33 (2020), 2892\u20132902. 
https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/1e591403ff232de0f0f139ac51d99295-Abstract.html.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.781637"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1080\/03610929708831995"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.mbs.2011.07.004"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1186\/1475-2875-13-53"},{"volume-title":"Bayesian Scan Statistics","author":"Neill Daniel B.","key":"e_1_2_1_21_1","unstructured":"Daniel B. Neill . 2018. Bayesian Scan Statistics . Springer , New York, NY , 1\u201321. Daniel B. Neill. 2018. Bayesian Scan Statistics. Springer, New York, NY, 1\u201321."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1014052.1014082"},{"volume-title":"Proceedings of the IEEE International Conference on Data Mining (ICDM\u201917)","author":"Araujo Neto Antonio Cavalcante","key":"e_1_2_1_23_1","unstructured":"Antonio Cavalcante Araujo Neto , Joerg Sander , Ricardo J. G. B. Campello , and Mario A. Nascimento . 2017. Efficient computation of multiple density-based clustering hierarchies . In Proceedings of the IEEE International Conference on Data Mining (ICDM\u201917) . IEEE, 991\u2013996. Antonio Cavalcante Araujo Neto, Joerg Sander, Ricardo J. G. B. Campello, and Mario A. Nascimento. 2017. Efficient computation of multiple density-based clustering hierarchies. In Proceedings of the IEEE International Conference on Data Mining (ICDM\u201917). 
IEEE, 991\u2013996."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/2980539.2980649"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2623330.2623726"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:EEST.0000027208.48919.7e"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3200488"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2020.107749"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2021.01.087"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2020.2968848"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/2540128.2540361"},{"volume-title":"Encyclopedia of Biometrics","author":"Reynolds Douglas A.","key":"e_1_2_1_32_1","unstructured":"Douglas A. Reynolds. 2009. Gaussian mixture models. In Encyclopedia of Biometrics, Stan Z. Li and Anil Jain (Eds.). Springer US, 659\u2013663. DOI:10.1007\/978-0-387-73003-5_196"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1111\/sjos.12027"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1198\/jcgs.2009.07071"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2756547"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.3390\/ijgi4042306"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2016.2631518"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11222-007-9033-z"},{"key":"e_1_2_1_39_1","first-page":"1010","article-title":"Optimal and fast detection of spatial clusters with scan statistics","volume":"38","author":"\u00a0al Guenther Walther","year":"2010","unstructured":"Guenther Walther et \u00a0al . 2010 . 
Optimal and fast detection of spatial clusters with scan statistics . Ann. Stat. 38 , 2 (2010), 1010 \u2013 1033 . Guenther Walther et\u00a0al. 2010. Optimal and fast detection of spatial clusters with scan statistics. Ann. Stat. 38, 2 (2010), 1010\u20131033.","journal-title":"Ann. Stat."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.5555\/3045390.3045442"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611975673.10"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3340964.3340968"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3379562"},{"key":"e_1_2_1_44_1","unstructured":"Yiqun Xie Shashi Shekhar and Yan Li. 2021. Statistically-robust clustering techniques for mapping spatial hotspots: A survey. arXiv:2103.12019. Retrieved from https:\/\/arxiv.org\/abs\/2103.12019.  Yiqun Xie Shashi Shekhar and Yan Li. 2021. Statistically-robust clustering techniques for mapping spatial hotspots: A survey. arXiv:2103.12019. 
Retrieved from https:\/\/arxiv.org\/abs\/2103.12019."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1007\/s40745-015-0040-1"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2005.845141"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3022749"}],"container-title":["ACM Transactions on Intelligent Systems and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474842","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3474842","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3474842","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:28:44Z","timestamp":1750195724000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474842"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,31]]},"references-count":45,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2021,10,31]]}},"alternative-id":["10.1145\/3474842"],"URL":"https:\/\/doi.org\/10.1145\/3474842","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"type":"print","value":"2157-6904"},{"type":"electronic","value":"2157-6912"}],"subject":[],"published":{"date-parts":[[2021,10,31]]},"assertion":[{"value":"2021-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-11-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}