{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,28]],"date-time":"2025-11-28T04:33:44Z","timestamp":1764304424973},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2009,8]]},"abstract":"<jats:p>Clustering high dimensional data is an emerging research field. <jats:italic>Subspace clustering<\/jats:italic> or <jats:italic>projected clustering<\/jats:italic> group similar objects in subspaces, i.e. projections, of the full space. In the past decade, several clustering paradigms have been developed in parallel, without thorough evaluation and comparison between these paradigms on a common basis.<\/jats:p><jats:p>Conclusive evaluation and comparison is challenged by three major issues. First, there is no ground truth that describes the \"true\" clusters in real world data. Second, a large variety of evaluation measures have been used that reflect different aspects of the clustering result. Finally, in typical publications authors have limited their analysis to their favored paradigm only, while paying other paradigms little or no attention.<\/jats:p><jats:p>In this paper, we take a systematic approach to evaluate the major paradigms in a common framework. We study representative clustering algorithms to characterize the different aspects of each paradigm and give a detailed comparison of their properties. We provide a benchmark set of results on a large variety of real world and synthetic data sets. Using different evaluation measures, we broaden the scope of the experimental analysis and create a common baseline for future developments and comparable evaluations in the field. 
For repeatability, all implementations, data sets and evaluation measures are available on our website.<\/jats:p>","DOI":"10.14778\/1687627.1687770","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"1270-1281","source":"Crossref","is-referenced-by-count":157,"title":["Evaluating clustering in subspace projections of high dimensional data"],"prefix":"10.14778","volume":"2","author":[{"given":"Emmanuel","family":"M\u00fcller","sequence":"first","affiliation":[{"name":"RWTH Aachen University, Germany"}]},{"given":"Stephan","family":"G\u00fcnnemann","sequence":"additional","affiliation":[{"name":"RWTH Aachen University, Germany"}]},{"given":"Ira","family":"Assent","sequence":"additional","affiliation":[{"name":"Aalborg University, Denmark"}]},{"given":"Thomas","family":"Seidl","sequence":"additional","affiliation":[{"name":"RWTH Aachen University, Germany"}]}],"member":"320","published-online":{"date-parts":[[2009,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/304182.304188"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/342009.335383"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/276304.276314"},{"key":"e_1_2_1_4_1","first-page":"487","volume-title":"VLDB","author":"Agrawal R.","year":"1994","unstructured":"R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In VLDB, pages 487--499, 1994."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2007.49"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2008.46"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-87481-2_44"},{"key":"e_1_2_1_8_1","volume-title":"UCI Machine Learning Repository","author":"Asuncion A.","year":"2007","unstructured":"A. 
Asuncion and D. Newman. UCI Machine Learning Repository, 2007."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/645503.656271"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2007.85"},{"key":"e_1_2_1_11_1","first-page":"93","volume-title":"International Conference on Intelligent Systems for Molecular Biology","author":"Cheng Y.","year":"2000","unstructured":"Y. Cheng and G. M. Church. Biclustering of expression data. In International Conference on Intelligent Systems for Molecular Biology, pages 93--103, 2000."},{"issue":"1","key":"e_1_2_1_12_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the em algorithm","volume":"39","author":"Dempster A. P.","year":"1977","unstructured":"A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the em algorithm. Journal of the Royal Statistical Society, 39(1):1--38, 1977.","journal-title":"Journal of the Royal Statistical Society"},{"key":"e_1_2_1_13_1","first-page":"226","volume-title":"KDD","author":"Ester M.","year":"1996","unstructured":"M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. 
In KDD, pages 226--231, 1996."},{"key":"e_1_2_1_14_1","volume-title":"Morgan Kaufmann","author":"Han J.","year":"2001","unstructured":"J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2001."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/342009.335372"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-1904-8"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611972740.23"},{"key":"e_1_2_1_18_1","first-page":"882","volume-title":"VLDB","author":"Keogh E.","year":"2006","unstructured":"E. Keogh, L. Wei, X. Xi, S.-H. Lee, and M. Vlachos. LB_Keogh supports exact indexing of shapes under rotation invariance with arbitrary representations and distance measures. In VLDB, pages 882--893, 2006."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2005.5"},{"key":"e_1_2_1_20_1","first-page":"281","volume-title":"Berkeley Symp. Math. stat. & prob.","author":"MacQueen J.","year":"1967","unstructured":"J. MacQueen. Some methods for classification and analysis of multivariate observations. In Berkeley Symp. Math. stat. & prob., pages 281--297, 1967."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1401890.1401956"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2006.123"},{"key":"e_1_2_1_23_1","first-page":"2","volume-title":"Open Source in Data Mining Workshop at PAKDD","author":"M\u00fcller E.","year":"2009","unstructured":"E. M\u00fcller, I. Assent, S. 
G\u00fcnnemann, T. Jansen, and T. Seidl. OpenSubspace: An open source framework for evaluation and exploration of subspace clustering algorithms in WEKA. In Open Source in Data Mining Workshop at PAKDD, pages 2--13, 2009."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611972795.16"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1401890.1402026"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-02279-1_36"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611972719.7"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2006.106"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/564691.564739"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/1032649.1033453"},{"key":"e_1_2_1_31_1","volume-title":"USA","author":"Witten I.","year":"2005","unstructured":"I. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, USA, 2005."},{"key":"e_1_2_1_32_1","first-page":"689","volume-title":"ICDM","author":"Yiu M. L.","year":"2003","unstructured":"M. L. Yiu and N. Mamoulis. Frequent-pattern based iterative projected clustering. 
In ICDM, pages 689--692, 2003."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/1687627.1687770","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,27]],"date-time":"2024-05-27T23:57:03Z","timestamp":1716854223000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/1687627.1687770"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,8]]},"references-count":32,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2009,8]]}},"alternative-id":["10.14778\/1687627.1687770"],"URL":"https:\/\/doi.org\/10.14778\/1687627.1687770","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2009,8]]}}}