{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,18]],"date-time":"2025-11-18T12:10:44Z","timestamp":1763467844316,"version":"3.41.0"},"reference-count":25,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2007,12,1]],"date-time":"2007-12-01T00:00:00Z","timestamp":1196467200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2007,12]]},"abstract":"<jats:p>\n            How do we find a\n            <jats:italic>natural<\/jats:italic>\n            clustering of a real-world point set which contains an unknown number of clusters with different shapes, and which may be contaminated by noise? As most clustering algorithms were designed with certain assumptions (Gaussianity), they often require the user to give input parameters, and are sensitive to noise. In this article, we propose a robust framework for determining a natural clustering of a given dataset, based on the minimum description length (MDL) principle. The proposed framework,\n            <jats:italic>robust information-theoretic clustering (RIC)<\/jats:italic>\n            , is orthogonal to any known clustering algorithm: Given a preliminary clustering, RIC purifies these clusters from noise, and adjusts the clusterings such that it simultaneously determines the most natural amount and shape (subspace) of the clusters. Our RIC method can be combined with any clustering technique ranging from K-means and K-medoids to advanced methods such as spectral clustering. In fact, RIC is even able to purify and improve an initial coarse clustering, even if we start with very simple methods. In an extension, we propose a fully automatic stand-alone clustering method and efficiency improvements. RIC scales well with the dataset size. Extensive experiments on synthetic and real-world datasets validate the proposed RIC framework.\n          <\/jats:p>","DOI":"10.1145\/1297332.1297334","type":"journal-article","created":{"date-parts":[[2007,12,7]],"date-time":"2007-12-07T19:19:01Z","timestamp":1197055141000},"page":"10","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["RIC"],"prefix":"10.1145","volume":"1","author":[{"given":"Christian","family":"B\u00f6hm","sequence":"first","affiliation":[{"name":"University of Munich, Munich, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christos","family":"Faloutsos","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jia-Yu","family":"Pan","sequence":"additional","affiliation":[{"name":"Google, Mountain View, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Claudia","family":"Plant","sequence":"additional","affiliation":[{"name":"University of Munich, Munich, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2007,12]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/342009.335383"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/276304.276314"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/304182.304187"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.2307\/2532201"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2005.151"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007620"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1014052.1014064"},{"volume-title":"Proceedings of the ACM SIGKDD Conference on International Knowledge Discovery and Data Mining.","author":"Ester M.","key":"e_1_2_1_8_1","unstructured":"Ester , M. , Kriegel , H.-P. , Sander , J. , and Xu , X . 1996. A density-based algorithm for discovering clusters in large spatial databases with noise . In Proceedings of the ACM SIGKDD Conference on International Knowledge Discovery and Data Mining. Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the ACM SIGKDD Conference on International Knowledge Discovery and Data Mining."},{"key":"e_1_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Gr\u00fcnwald P. 2005. A tutorial introduction to the minimum description length principle. Advances in Minimum Description Length: Theory and Applications.  Gr\u00fcnwald P. 2005. A tutorial introduction to the minimum description length principle. Advances in Minimum Description Length: Theory and Applications.","DOI":"10.7551\/mitpress\/1114.001.0001"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/276304.276312"},{"volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS).","author":"Hamerly G.","key":"e_1_2_1_11_1","unstructured":"Hamerly , G. and Elkan , C . 2003. Learning the k in k-means . In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS). Hamerly, G. and Elkan, C. 2003. Learning the k in k-means. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS)."},{"volume-title":"Clustering Algorithms","author":"Hartigan J. A.","key":"e_1_2_1_12_1","unstructured":"Hartigan , J. A. 1975. Clustering Algorithms . John Wiley . Hartigan, J. A. 1975. Clustering Algorithms. John Wiley."},{"volume-title":"Principal Component Analysis","author":"Jolliffe I.","key":"e_1_2_1_13_1","unstructured":"Jolliffe , I. 1986. Principal Component Analysis . Springer . Jolliffe, I. 1986. Principal Component Analysis. Springer."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/26.4.354"},{"volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems.","author":"Ng A.","key":"e_1_2_1_15_1","unstructured":"Ng , A. , Jordan , M. , and Weiss , Y . 2001. On spectral clustering: Analysis and an algorithm . In Proceedings of the Conference on Advances in Neural Information Processing Systems. Ng, A., Jordan, M., and Weiss, Y. 2001. On spectral clustering: Analysis and an algorithm. In Proceedings of the Conference on Advances in Neural Information Processing Systems."},{"volume-title":"Proceedings of the Conference on Very Large Databases (VLDB), 144--155","author":"Ng R. T.","key":"e_1_2_1_16_1","unstructured":"Ng , R. T. and Han , J . 1994. Efficient and effective clustering methods for spatial data mining . In Proceedings of the Conference on Very Large Databases (VLDB), 144--155 . Ng, R. T. and Han, J. 1994. Efficient and effective clustering methods for spatial data mining. In Proceedings of the Conference on Very Large Databases (VLDB), 144--155."},{"volume-title":"Proceedings of the 17th International Conference on Machine Learning (ICML), 727--734","author":"Pelleg D.","key":"e_1_2_1_17_1","unstructured":"Pelleg , D. and Moore , A . 2000. X-means: Extending K-means with efficient estimation of the number of clusters . In Proceedings of the 17th International Conference on Machine Learning (ICML), 727--734 . Pelleg, D. and Moore, A. 2000. X-means: Extending K-means with efficient estimation of the number of clusters. In Proceedings of the 17th International Conference on Machine Learning (ICML), 727--734."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/345508.345578"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1162\/0899766042321751"},{"key":"e_1_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Tibshirani R. Walther G. and Hastie T. 2000. Estimating the number of clusters in a dataset via the gap statistic. Tech. Rep. Stanford University.  Tibshirani R. Walther G. and Hastie T. 2000. Estimating the number of clusters in a dataset via the gap statistic. Tech. Rep. Stanford University.","DOI":"10.1111\/1467-9868.00293"},{"volume-title":"Proceedings of the 37th Allerton Conference on Communication, Control and Computing.","author":"Tishby N.","key":"e_1_2_1_21_1","unstructured":"Tishby , N. , Pereira , F. C. , and Bialek , W . 2000. The information bottleneck method . In Proceedings of the 37th Allerton Conference on Communication, Control and Computing. Tishby, N., Pereira, F. C., and Bialek, W. 2000. The information bottleneck method. In Proceedings of the 37th Allerton Conference on Communication, Control and Computing."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066211"},{"key":"e_1_2_1_23_1","volume-title":"Information Retrieval","author":"Van-Rijsbergen C.","unstructured":"Van-Rijsbergen , C. 1979. Information Retrieval , 2 nd ed. Butterworths , London . Van-Rijsbergen, C. 1979. Information Retrieval, 2nd ed. Butterworths, London.","edition":"2"},{"volume-title":"Proceedings of the 1st International Workshop on Temporal, Spatial, and Spatio-Temporal Data Mining-Revised Papers (TSDM), 31--45","author":"Zhang B.","key":"e_1_2_1_24_1","unstructured":"Zhang , B. , Hsu , M. , and Dayal , U . 2000. K-harmonic means---A spatial clustering algorithm with boosting . In Proceedings of the 1st International Workshop on Temporal, Spatial, and Spatio-Temporal Data Mining-Revised Papers (TSDM), 31--45 . Zhang, B., Hsu, M., and Dayal, U. 2000. K-harmonic means---A spatial clustering algorithm with boosting. In Proceedings of the 1st International Workshop on Temporal, Spatial, and Spatio-Temporal Data Mining-Revised Papers (TSDM), 31--45."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/233269.233324"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1297332.1297334","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1297332.1297334","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T15:14:02Z","timestamp":1750259642000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1297332.1297334"}},"subtitle":["Parameter-free noise-robust clustering"],"short-title":[],"issued":{"date-parts":[[2007,12]]},"references-count":25,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2007,12]]}},"alternative-id":["10.1145\/1297332.1297334"],"URL":"https:\/\/doi.org\/10.1145\/1297332.1297334","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2007,12]]},"assertion":[{"value":"2007-12-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}