{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T10:25:32Z","timestamp":1770978332040,"version":"3.50.1"},"reference-count":25,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2004,12,1]],"date-time":"2004-12-01T00:00:00Z","timestamp":1101859200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGKDD Explor. Newsl."],"published-print":{"date-parts":[[2004,12]]},"abstract":"<jats:p>Data clustering has been discussed extensively, but almost all known conventional clustering algorithms tend to break down in high dimensional spaces because of the inherent sparsity of the data points. Existing subspace clustering algorithms for handling high-dimensional data focus on numerical dimensions. In this paper, we designed an iterative algorithm called SUBCAD for clustering high dimensional categorical data sets, based on the minimization of an objective function for clustering. We deduced some cluster memberships changing rules using the objective function. We also designed an objective function to determine the subspace associated with each cluster. We proved various properties of this objective function that are essential for us to design a fast algorithm to find the subspace associated with each cluster. 
Finally, we carried out some experiments to show the effectiveness of the proposed method and the algorithm.<\/jats:p>","DOI":"10.1145\/1046456.1046468","type":"journal-article","created":{"date-parts":[[2007,1,17]],"date-time":"2007-01-17T18:32:02Z","timestamp":1169058722000},"page":"87-94","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":55,"title":["Subspace clustering for high dimensional categorical data"],"prefix":"10.1145","volume":"6","author":[{"given":"Guojun","family":"Gan","sequence":"first","affiliation":[{"name":"York University, Toronto, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianhong","family":"Wu","sequence":"additional","affiliation":[{"name":"York University, Toronto, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2004,12]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/304182.304188"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/342009.335383"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/276304.276314"},{"key":"e_1_2_1_4_1","volume-title":"Cluster analysis for applications","author":"Anderberg M.","year":"1973","unstructured":"M. Anderberg . Cluster analysis for applications . Academic Press , New York , 1973 .]] M. Anderberg. Cluster analysis for applications. Academic Press, New York, 1973.]]"},{"key":"e_1_2_1_5_1","unstructured":"K.\n      Beyer J.\n      Goldstein R.\n      Ramakrishnan and \n      U.\n      Shaft\n  . \n  When is \"nearest neighbor\" meaningful? In C\n  . Beeri and P. Buneman editors Database Theory - ICDT '99 7th International Conference Jerusalem Israel January 10--12 1999 Proceedings volume \n  1540\n   of \n  Lecture Notes in Computer Science pages \n  217\n  --\n  235\n  . \n  Springer 1999.]]   K. Beyer J. Goldstein R. Ramakrishnan and U. Shaft. When is \"nearest neighbor\" meaningful? 
In C. Beeri and P. Buneman editors Database Theory - ICDT '99 7th International Conference Jerusalem Israel January 10--12 1999 Proceedings volume 1540 of Lecture Notes in Computer Science pages 217--235. Springer 1999.]]"},{"key":"e_1_2_1_6_1","volume-title":"UCI repository of machine learning databases","author":"Blake C.","year":"1998","unstructured":"C. Blake and C. Merz . UCI repository of machine learning databases , 1998 . http:\/\/www.ics.uci.edu\/~mlearn\/MLRepository.html.]] C. Blake and C. Merz. UCI repository of machine learning databases, 1998. http:\/\/www.ics.uci.edu\/~mlearn\/MLRepository.html.]]"},{"key":"e_1_2_1_7_1","first-page":"91","volume-title":"Proc. 15th International Conf. on Machine Learning","author":"Bradley P.","year":"1998","unstructured":"P. Bradley and U. Fayyad . Refining initial points for K-Means clustering . In Proc. 15th International Conf. on Machine Learning , pages 91 -- 99 . Morgan Kaufmann, San Francisco, CA , 1998 .]] P. Bradley and U. Fayyad. Refining initial points for K-Means clustering. In Proc. 15th International Conf. on Machine Learning, pages 91--99. Morgan Kaufmann, San Francisco, CA, 1998.]]"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0893-6080(01)00108-3"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/312129.312199"},{"key":"e_1_2_1_10_1","volume-title":"Department of Mathematics and Statistics","author":"Gan G.","year":"2003","unstructured":"G. Gan . Subspace clustering for high dimensional categorical data. Master's thesis , Department of Mathematics and Statistics , York University , Toronto, Canada , October 2003 .]] G. Gan. Subspace clustering for high dimensional categorical data. 
Master's thesis, Department of Mathematics and Statistics, York University, Toronto, Canada, October 2003.]]"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/276304.276312"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009769707641"},{"key":"e_1_2_1_14_1","volume-title":"Algorithms for Clustering Data","author":"Jain A.","year":"1988","unstructured":"A. Jain and R. Dubes . Algorithms for Clustering Data . Prentice Hall , Englewood Cliffs , New Jersey, 1988 .]] A. Jain and R. Dubes. Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs, New Jersey, 1988.]]"},{"key":"e_1_2_1_15_1","series-title":"Wiley series in probability and mathematical statistics","doi-asserted-by":"crossref","DOI":"10.1002\/9780470316801","volume-title":"Finding Groups in Data--An Introduction to Cluster Analysis","author":"Kaufman L.","year":"1990","unstructured":"L. Kaufman and P. Rousseeuw . Finding Groups in Data--An Introduction to Cluster Analysis . Wiley series in probability and mathematical statistics . John Wiley & Sons, Inc. , New York , 1990 .]] L. Kaufman and P. Rousseeuw. Finding Groups in Data--An Introduction to Cluster Analysis. Wiley series in probability and mathematical statistics. 
John Wiley & Sons, Inc., New York, 1990.]]"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/198429.198435"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/354756.354775"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/211390"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/850941.852931"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/342009.335384"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/312129.312248"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/564691.564739"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3147.3165"},{"key":"e_1_2_1_24_1","first-page":"216","volume-title":"Exploratory Data Analysis in Empirical Research","author":"Wishart D.","year":"2002","unstructured":"D. Wishart . k-means clustering with outlier detection, mixed variables and missing values . In M. Schwaiger and O. Opitz, editors, Exploratory Data Analysis in Empirical Research , pages 216 -- 226 . Springer , 2002 .]] D. Wishart. k-means clustering with outlier detection, mixed variables and missing values. In M. Schwaiger and O. Opitz, editors, Exploratory Data Analysis in Empirical Research, pages 216--226. Springer, 2002.]]"},{"key":"e_1_2_1_26_1","first-page":"517","volume-title":"Data Engineering, 2002. Proceedings. 18th International Conference on","author":"Yang J.","year":"2002","unstructured":"J. Yang , W. Wang , H. Wang , and P. Yu . \u0394-clusters: capturing subspace correlation in a large data set . Data Engineering, 2002. Proceedings. 18th International Conference on , pages 517 -- 528 , 26 Feb. -1 March 2002 .]] J. Yang, W. Wang, H. Wang, and P. Yu. \u0394-clusters: capturing subspace correlation in a large data set. Data Engineering, 2002. Proceedings. 
18th International Conference on, pages 517--528, 26 Feb.-1 March 2002.]]"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/17.9.763"}],"container-title":["ACM SIGKDD Explorations Newsletter"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1046456.1046468","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1046456.1046468","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T17:23:55Z","timestamp":1750267435000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1046456.1046468"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,12]]},"references-count":25,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2004,12]]}},"alternative-id":["10.1145\/1046456.1046468"],"URL":"https:\/\/doi.org\/10.1145\/1046456.1046468","relation":{},"ISSN":["1931-0145","1931-0153"],"issn-type":[{"value":"1931-0145","type":"print"},{"value":"1931-0153","type":"electronic"}],"subject":[],"published":{"date-parts":[[2004,12]]},"assertion":[{"value":"2004-12-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}