{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,20]],"date-time":"2026-01-20T16:36:59Z","timestamp":1768927019893,"version":"3.49.0"},"reference-count":95,"publisher":"Association for Computing Machinery (ACM)","issue":"5","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62472064"],"award-info":[{"award-number":["62472064"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Dalian Young Science and Technology Talent Support Program","award":["2023RQ056"],"award-info":[{"award-number":["2023RQ056"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:p>\n            Categorical data clustering is a fundamental data mining problem, which has been extensively studied during the past decades. To date, many effective clustering algorithms for categorical data are available in the literature. However, almost all existing categorical data clustering algorithms did not address the issue of the statistical significance of detected clusters. In particular, how to assess the statistical significance of a set of non-overlapping categorical clusters still remains unaddressed. In this article, we formulate the categorical data clustering problem as a multiple hypothesis testing problem, where the null hypothesis is that each attribute is independent of the given partition of clusters. 
Then, all individual\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(p\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            -values from different attributes are integrated to obtain a consensus\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(p\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            -value through statistical meta-analysis. Thereafter, a significance-based clustering algorithm is proposed in which the combined\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(p\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            -value is efficiently optimized in an indirect and incremental manner. Experimental results on 25 real-world datasets demonstrate that our method achieves performance comparable to state-of-the-art categorical data clustering algorithms. 
Furthermore, our method has a good capability of determining whether there really exists a clustering structure and assessing whether a given set of clusters is statistically significant.\n          <\/jats:p>","DOI":"10.1145\/3735977","type":"journal-article","created":{"date-parts":[[2025,5,15]],"date-time":"2025-05-15T15:23:38Z","timestamp":1747322618000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Clustering Categorical Data via Multiple Hypothesis Testing"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7470-9395","authenticated-orcid":false,"given":"Lianyu","family":"Hu","sequence":"first","affiliation":[{"name":"School of Software, Dalian University of Technology, Dalian, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9474-8375","authenticated-orcid":false,"given":"Mudi","family":"Jiang","sequence":"additional","affiliation":[{"name":"School of Software, Dalian University of Technology, Dalian, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1386-812X","authenticated-orcid":false,"given":"Yan","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Software Engineering, Dalian University, Dalian, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6406-1142","authenticated-orcid":false,"given":"Quan","family":"Zou","sequence":"additional","affiliation":[{"name":"Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9526-8816","authenticated-orcid":false,"given":"Zengyou","family":"He","sequence":"additional","affiliation":[{"name":"School of Software, Dalian University of Technology, Dalian, China"}]}],"member":"320","published-online":{"date-parts":[[2025,6,16]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1016\/j.patcog.2018.10.026","article-title":"To 
cluster, or not to cluster: An analysis of clusterability methods","volume":"88","author":"Adolfsson Andreas","year":"2019","unstructured":"Andreas Adolfsson, Margareta Ackerman, and Naomi C Brownstein. 2019. To cluster, or not to cluster: An analysis of clusterability methods. Pattern Recognition 88 (2019), 13\u201326.","journal-title":"Pattern Recognition"},{"issue":"1","key":"e_1_3_2_3_2","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1016\/j.patrec.2006.06.006","article-title":"A method to compute distance between two categorical values of same attribute in unsupervised learning for categorical data set","volume":"28","author":"Ahmad Amir","year":"2007","unstructured":"Amir Ahmad and Lipika Dey. 2007. A method to compute distance between two categorical values of same attribute in unsupervised learning for categorical data set. Pattern Recognition Letters 28, 1 (2007), 110\u2013118.","journal-title":"Pattern Recognition Letters"},{"issue":"12","key":"e_1_3_2_4_2","doi-asserted-by":"crossref","first-page":"4462","DOI":"10.1016\/j.csda.2012.02.020","article-title":"Investigating the multimodality of multivariate data with principal curves","volume":"56","author":"Ahmed Murat O.","year":"2012","unstructured":"Murat O. Ahmed and Guenther Walther. 2012. Investigating the multimodality of multivariate data with principal curves. Computational Statistics & Data Analysis 56, 12 (2012), 4462\u20134469.","journal-title":"Computational Statistics & Data Analysis"},{"issue":"4","key":"e_1_3_2_5_2","doi-asserted-by":"crossref","first-page":"329","DOI":"10.1038\/nmeth.4239","article-title":"Tabular data","volume":"14","author":"Altman Naomi","year":"2017","unstructured":"Naomi Altman and Martin Krzywinski. 2017. Tabular data. 
Nature Methods 14, 4 (2017), 329\u2013331.","journal-title":"Nature Methods"},{"key":"e_1_3_2_6_2","doi-asserted-by":"crossref","DOI":"10.4324\/9780429330308","volume-title":"Categorical Data Analysis for the Behavioral and Social Sciences","author":"Azen Razia","year":"2021","unstructured":"Razia Azen and Cindy M. Walker. 2021. Categorical Data Analysis for the Behavioral and Social Sciences. Routledge."},{"issue":"6","key":"e_1_3_2_7_2","doi-asserted-by":"crossref","first-page":"1560","DOI":"10.1007\/s10618-014-0387-5","article-title":"Cluster validity functions for categorical data: A solution-space perspective","volume":"29","author":"Bai Liang","year":"2015","unstructured":"Liang Bai and Jiye Liang. 2015. Cluster validity functions for categorical data: A solution-space perspective. Data Mining and Knowledge Discovery 29, 6 (2015), 1560\u20131597.","journal-title":"Data Mining and Knowledge Discovery"},{"key":"e_1_3_2_8_2","doi-asserted-by":"crossref","first-page":"108694","DOI":"10.1016\/j.patcog.2022.108694","article-title":"A categorical data clustering framework on graph representation","volume":"128","author":"Bai Liang","year":"2022","unstructured":"Liang Bai and Jiye Liang. 2022. A categorical data clustering framework on graph representation. Pattern Recognition 128 (2022), 108694.","journal-title":"Pattern Recognition"},{"issue":"6","key":"e_1_3_2_9_2","doi-asserted-by":"crossref","first-page":"1509","DOI":"10.1109\/TPAMI.2012.228","article-title":"The impact of cluster representatives on the convergence of the K-modes type clustering","volume":"35","author":"Bai Liang","year":"2013","unstructured":"Liang Bai, Jiye Liang, Chuangyin Dang, and Fuyuan Cao. 2013. The impact of cluster representatives on the convergence of the K-modes type clustering. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 6 (2013), 1509\u20131522.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_10_2","first-page":"149","volume-title":"Proceedings of the 2004 IEEE International Conference on Fuzzy Systems","volume":"1","author":"Banerjee A.","year":"2004","unstructured":"A. Banerjee and R. N. Dave. 2004. Validating clusters using the Hopkins statistic. In Proceedings of the 2004 IEEE International Conference on Fuzzy Systems, Vol. 1, 149\u2013153."},{"key":"e_1_3_2_11_2","first-page":"582","volume-title":"Proceedings of the 11th International Conference on Information and Knowledge Management","author":"Barbar\u00e1 Daniel","year":"2002","unstructured":"Daniel Barbar\u00e1, Yi Li, and Julia Couto. 2002. COOLCAT: An entropy-based algorithm for categorical clustering. In Proceedings of the 11th International Conference on Information and Knowledge Management, 582\u2013589."},{"issue":"4","key":"e_1_3_2_12_2","doi-asserted-by":"crossref","first-page":"3658","DOI":"10.1109\/TKDE.2021.3132373","article-title":"Dimensionality reduction for categorical data","volume":"35","author":"Bera Debajyoti","year":"2023","unstructured":"Debajyoti Bera, Rameshwar Pratap, and Bhisham Dev Verma. 2023. Dimensionality reduction for categorical data. IEEE Transactions on Knowledge and Data Engineering 35, 4 (2023), 3658\u20133671.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"e_1_3_2_13_2","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1007\/BF01908065","article-title":"On some significance tests in cluster analysis","volume":"2","author":"Bock Hans-Hermann","year":"1985","unstructured":"Hans-Hermann Bock. 1985. On some significance tests in cluster analysis. 
Journal of Classification 2 (1985), 77\u2013108.","journal-title":"Journal of Classification"},{"issue":"1","key":"e_1_3_2_14_2","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1016\/0167-9473(96)88919-5","article-title":"Probabilistic models in cluster analysis","volume":"23","author":"Bock Hans H.","year":"1996","unstructured":"Hans H. Bock. 1996. Probabilistic models in cluster analysis. Computational Statistics & Data Analysis 23, 1 (1996), 5\u201328.","journal-title":"Computational Statistics & Data Analysis"},{"key":"e_1_3_2_15_2","doi-asserted-by":"crossref","DOI":"10.1002\/9781119558378","volume-title":"Introduction to Meta-Analysis","author":"Borenstein Michael","year":"2021","unstructured":"Michael Borenstein, Larry V. Hedges, Julian P. T. Higgins, and Hannah R. Rothstein. 2021. Introduction to Meta-Analysis. John Wiley & Sons."},{"key":"e_1_3_2_16_2","first-page":"243","volume-title":"Proceedings of the 2008 SIAM International Conference on Data Mining","author":"Boriah Shyam","year":"2008","unstructured":"Shyam Boriah, Varun Chandola, and Vipin Kumar. 2008. Similarity measures for categorical data: A comparative evaluation. In Proceedings of the 2008 SIAM International Conference on Data Mining. SIAM, 243\u2013254."},{"issue":"1","key":"e_1_3_2_17_2","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/s10618-013-0336-8","article-title":"Clustering categorical data in projected spaces","volume":"29","author":"Bouguessa Mohamed","year":"2015","unstructured":"Mohamed Bouguessa. 2015. Clustering categorical data in projected spaces. Data Mining and Knowledge Discovery 29, 1 (2015), 3\u201338.","journal-title":"Data Mining and Knowledge Discovery"},{"key":"e_1_3_2_18_2","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511804441","volume-title":"Convex Optimization","author":"Boyd Stephen P.","year":"2004","unstructured":"Stephen P. Boyd and Lieven Vandenberghe. 2004. Convex Optimization. 
Cambridge University Press."},{"issue":"2","key":"e_1_3_2_19_2","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1007\/s00357-015-9179-x","article-title":"DESPOTA: DEndrogram slicing through a permutation test approach","volume":"32","author":"Bruzzese Dario","year":"2015","unstructured":"Dario Bruzzese and Domenico Vistocco. 2015. DESPOTA: DEndrogram slicing through a permutation test approach. Journal of Classification 32, 2 (2015), 285\u2013304.","journal-title":"Journal of Classification"},{"issue":"12","key":"e_1_3_2_20_2","doi-asserted-by":"crossref","first-page":"1624","DOI":"10.1109\/TKDE.2005.198","article-title":"Document clustering using locality preserving indexing","volume":"17","author":"Cai Deng","year":"2005","unstructured":"Deng Cai, Xiaofei He, and Jiawei Han. 2005. Document clustering using locality preserving indexing. IEEE Transactions on Knowledge and Data Engineering 17, 12 (2005), 1624\u20131637.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"issue":"1","key":"e_1_3_2_21_2","first-page":"1","article-title":"Meta-analysis methods for combining multiple expression profiles: Comparisons, statistical characterization and an application guideline","volume":"14","author":"Chang Lun-Ching","year":"2013","unstructured":"Lun-Ching Chang, Hui-Min Lin, Etienne Sibille, and George C. Tseng. 2013. Meta-analysis methods for combining multiple expression profiles: Comparisons, statistical characterization and an application guideline. BMC Bioinformatics 14, 1 (2013), 1\u201315.","journal-title":"BMC Bioinformatics"},{"key":"e_1_3_2_22_2","doi-asserted-by":"crossref","first-page":"108272","DOI":"10.1016\/j.patcog.2021.108272","article-title":"The UU-test for statistical modeling of unimodal data","volume":"122","author":"Chasani Paraskevi","year":"2022","unstructured":"Paraskevi Chasani and Aristidis Likas. 2022. The UU-test for statistical modeling of unimodal data. 
Pattern Recognition 122 (2022), 108272.","journal-title":"Pattern Recognition"},{"key":"e_1_3_2_23_2","first-page":"153","article-title":"Bayesian classification (AutoClass): Theory and results","author":"Cheeseman Peter","year":"1996","unstructured":"Peter Cheeseman and John Stutz. 1996. Bayesian classification (AutoClass): Theory and results. In Advances in Knowledge Discovery and Data Mining, 153\u2013180.","journal-title":"In Advances in Knowledge Discovery and Data Mining"},{"key":"e_1_3_2_24_2","first-page":"2732","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Chen Chao","year":"2016","unstructured":"Chao Chen and Novi Quadrianto. 2016. Clustering high dimensional categorical data via topographical features. In Proceedings of the International Conference on Machine Learning, PMLR, 2732\u20132740."},{"issue":"11","key":"e_1_3_2_25_2","doi-asserted-by":"crossref","first-page":"1458","DOI":"10.1109\/TKDE.2008.81","article-title":"On data labeling for clustering categorical data","volume":"20","author":"Chen Hung-Leng","year":"2008","unstructured":"Hung-Leng Chen, Kun-Ta Chuang, and Ming-Syan Chen. 2008. On data labeling for clustering categorical data. IEEE Transactions on Knowledge and Data Engineering 20, 11 (2008), 1458\u20131472.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"issue":"6","key":"e_1_3_2_26_2","doi-asserted-by":"crossref","first-page":"1241","DOI":"10.1007\/s00778-009-0134-5","article-title":"He-tree: A framework for detecting changes in clustering structure for categorical data streams","volume":"18","author":"Chen Keke","year":"2009","unstructured":"Keke Chen and Ling Liu. 2009. He-tree: A framework for detecting changes in clustering structure for categorical data streams. 
The VLDB Journal 18, 6 (2009), 1241\u20131260.","journal-title":"The VLDB Journal"},{"issue":"1","key":"e_1_3_2_27_2","doi-asserted-by":"crossref","first-page":"2246","DOI":"10.1016\/j.artint.2011.09.003","article-title":"Model-based multidimensional clustering of categorical data","volume":"176","author":"Chen Tao","year":"2012","unstructured":"Tao Chen, Nevin L. Zhang, Tengfei Liu, Kin Man Poon, and Yi Wang. 2012. Model-based multidimensional clustering of categorical data. Artificial Intelligence 176, 1 (2012), 2246\u20132269.","journal-title":"Artificial Intelligence"},{"issue":"152","key":"e_1_3_2_28_2","first-page":"1","article-title":"Selective inference for k-means clustering","volume":"24","author":"Chen Yiqun T.","year":"2023","unstructured":"Yiqun T. Chen and Daniela M. Witten. 2023. Selective inference for k-means clustering. Journal of Machine Learning Research 24, 152 (2023), 1\u201341.","journal-title":"Journal of Machine Learning Research"},{"issue":"3","key":"e_1_3_2_29_2","doi-asserted-by":"crossref","first-page":"579","DOI":"10.1111\/1467-9868.00141","article-title":"Calibrating the excess mass and dip tests of modality","volume":"60","author":"Cheng M.-Y.","year":"1998","unstructured":"M.-Y. Cheng and Peter Hall. 1998. Calibrating the excess mass and dip tests of modality. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 60, 3 (1998), 579\u2013589.","journal-title":"Journal of the Royal Statistical Society: Series B (Statistical Methodology)"},{"issue":"4","key":"e_1_3_2_30_2","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1002\/sam.11379","article-title":"The next-generation K-means algorithm","volume":"11","author":"Demidenko Eugene","year":"2018","unstructured":"Eugene Demidenko. 2018. The next-generation K-means algorithm. 
Statistical Analysis and Data Mining 11, 4 (2018), 153\u2013166.","journal-title":"Statistical Analysis and Data Mining"},{"key":"e_1_3_2_31_2","first-page":"1","article-title":"Statistical comparisons of classifiers over multiple data sets","volume":"7","author":"Dem\u0161ar Janez","year":"2006","unstructured":"Janez Dem\u0161ar. 2006. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7 (2006), 1\u201330.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_32_2","doi-asserted-by":"crossref","first-page":"126608","DOI":"10.1016\/j.eswa.2025.126608","article-title":"Categorical data clustering: 25 years beyond K-modes","volume":"272","author":"Dinh Tai","year":"2025","unstructured":"Tai Dinh, Wong Hauchi, Philippe Fournier-Viger, Daniil Lisik, Minh-Quyet Ha, Hieu-Chi Dam, and Van-Nam Huynh. 2025. Categorical data clustering: 25 years beyond K-modes. Expert Systems with Applications 272 (2025), 126608.","journal-title":"Expert Systems with Applications"},{"key":"e_1_3_2_33_2","unstructured":"Dheeru Dua and Casey Graff. 2017. UCI machine learning repository. Retrieved from http:\/\/archive.ics.uci.edu\/ml"},{"issue":"1","key":"e_1_3_2_34_2","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1007\/BF01890074","article-title":"A test for spatial homogeneity in cluster analysis","volume":"4","author":"Dubes Richard C.","year":"1987","unstructured":"Richard C. Dubes and Guangzhou Zeng. 1987. A test for spatial homogeneity in cluster analysis. Journal of Classification 4, 1 (1987), 33\u201356.","journal-title":"Journal of Classification"},{"issue":"8","key":"e_1_3_2_35_2","doi-asserted-by":"crossref","first-page":"4875","DOI":"10.1109\/TIT.2019.2903113","article-title":"Adaptive nonparametric clustering","volume":"65","author":"Efimov Kirill","year":"2019","unstructured":"Kirill Efimov, Larisa Adamyan, and Vladimir Spokoiny. 2019. Adaptive nonparametric clustering. 
IEEE Transactions on Information Theory 65, 8 (2019), 4875\u20134892.","journal-title":"IEEE Transactions on Information Theory"},{"issue":"2","key":"e_1_3_2_36_2","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1007\/BF00114265","article-title":"Knowledge acquisition via incremental conceptual clustering","volume":"2","author":"Fisher Douglas H.","year":"1987","unstructured":"Douglas H. Fisher. 1987. Knowledge acquisition via incremental conceptual clustering. Machine Learning 2, 2 (1987), 139\u2013172.","journal-title":"Machine Learning"},{"issue":"374","key":"e_1_3_2_37_2","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1080\/01621459.1981.10477643","article-title":"Graphics for the multivariate two-sample problem","volume":"76","author":"Friedman Jerome H.","year":"1981","unstructured":"Jerome H. Friedman and Lawrence C. Rafsky. 1981. Graphics for the multivariate two-sample problem. Journal of the American Statistical Association 76, 374 (1981), 277\u2013287.","journal-title":"Journal of the American Statistical Association"},{"issue":"2","key":"e_1_3_2_38_2","first-page":"115","article-title":"Testing for the existence of clusters","volume":"33","author":"Fuentes Claudio","year":"2009","unstructured":"Claudio Fuentes and George Casella. 2009. Testing for the existence of clusters. SORT (Barcelona) 33, 2 (2009), 115.","journal-title":"SORT (Barcelona)"},{"issue":"2","key":"e_1_3_2_39_2","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1007\/s11634-016-0278-2","article-title":"Probabilistic clustering via Pareto solutions and significance tests","volume":"12","author":"Gallegos Mar\u00eda Teresa","year":"2018","unstructured":"Mar\u00eda Teresa Gallegos and Gunter Ritter. 2018. Probabilistic clustering via Pareto solutions and significance tests. 
Advances in Data Analysis and Classification 12, 2 (2018), 179\u2013202.","journal-title":"Advances in Data Analysis and Classification"},{"key":"e_1_3_2_40_2","first-page":"73","volume-title":"Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Ganti Venkatesh","year":"1999","unstructured":"Venkatesh Ganti, Johannes Gehrke, and Raghu Ramakrishnan. 1999. CACTUS\u2014Clustering categorical data using summaries. In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 73\u201383."},{"issue":"545","key":"e_1_3_2_41_2","doi-asserted-by":"crossref","first-page":"332","DOI":"10.1080\/01621459.2022.2116331","article-title":"Selective inference for hierarchical clustering","volume":"119","author":"Gao Lucy L.","year":"2024","unstructured":"Lucy L. Gao, Jacob Bien, and Daniela Witten. 2024. Selective inference for hierarchical clustering. Journal of the American Statistical Association 119, 545 (2024), 332\u2013342.","journal-title":"Journal of the American Statistical Association"},{"key":"e_1_3_2_42_2","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1007\/978-3-642-79999-0_3","volume-title":"From Data to Knowledge","author":"Gordon Allan D.","year":"1996","unstructured":"Allan D. Gordon. 1996. Null models in cluster validation. In From Data to Knowledge. Springer, 32\u201344."},{"issue":"5","key":"e_1_3_2_43_2","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1016\/S0306-4379(00)00022-3","article-title":"ROCK: A robust clustering algorithm for categorical attributes","volume":"25","author":"Guha Sudipto","year":"2000","unstructured":"Sudipto Guha, Rajeev Rastogi, and Kyuseok Shim. 2000. ROCK: A robust clustering algorithm for categorical attributes. 
Information Systems 25, 5 (2000), 345\u2013366.","journal-title":"Information Systems"},{"issue":"4","key":"e_1_3_2_44_2","doi-asserted-by":"crossref","first-page":"1215","DOI":"10.1111\/biom.13376","article-title":"Nonparametric cluster significance testing with reference to a unimodal null distribution","volume":"77","author":"Helgeson Erika S.","year":"2021","unstructured":"Erika S. Helgeson, David M. Vock, and Eric Bair. 2021. Nonparametric cluster significance testing with reference to a unimodal null distribution. Biometrics 77, 4 (2021), 1215\u20131226.","journal-title":"Biometrics"},{"key":"e_1_3_2_45_2","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/j.patrec.2015.04.009","article-title":"What are the true clusters?","volume":"64","author":"Hennig Christian","year":"2015","unstructured":"Christian Hennig. 2015. What are the true clusters? Pattern Recognition Letters 64 (2015), 53\u201362.","journal-title":"Pattern Recognition Letters"},{"issue":"5","key":"e_1_3_2_46_2","doi-asserted-by":"crossref","first-page":"4113","DOI":"10.1007\/s10115-024-02317-x","article-title":"Clusterability test for categorical data","volume":"67","author":"Hu Lianyu","year":"2025","unstructured":"Lianyu Hu, Junjie Dong, Mudi Jiang, Yan Liu, and Zengyou He. 2025. Clusterability test for categorical data. Knowledge and Information Systems 67, 5 (2025), 4113\u20134138.","journal-title":"Knowledge and Information Systems"},{"key":"e_1_3_2_47_2","doi-asserted-by":"crossref","first-page":"111364","DOI":"10.1016\/j.patcog.2025.111364","article-title":"Interpretable categorical data clustering via hypothesis testing","volume":"162","author":"Hu Lianyu","year":"2025","unstructured":"Lianyu Hu, Mudi Jiang, Junjie Dong, Xinying Liu, and Zengyou He. 2025. Interpretable categorical data clustering via hypothesis testing. 
Pattern Recognition 162 (2025), 111364.","journal-title":"Pattern Recognition"},{"key":"e_1_3_2_48_2","doi-asserted-by":"crossref","first-page":"121588","DOI":"10.1016\/j.ins.2024.121588","article-title":"Significance-based decision tree for interpretable categorical data clustering","volume":"690","author":"Hu Lianyu","year":"2025","unstructured":"Lianyu Hu, Mudi Jiang, Xinying Liu, and Zengyou He. 2025. Significance-based decision tree for interpretable categorical data clustering. Information Sciences 690 (2025), 121588.","journal-title":"Information Sciences"},{"issue":"4","key":"e_1_3_2_49_2","doi-asserted-by":"crossref","first-page":"975","DOI":"10.1080\/10618600.2014.948179","article-title":"Statistical significance of clustering using soft thresholding","volume":"24","author":"Huang Hanwen","year":"2015","unstructured":"Hanwen Huang, Yufeng Liu, Ming Yuan, and J. S. Marron. 2015. Statistical significance of clustering using soft thresholding. Journal of Computational and Graphical Statistics 24, 4 (2015), 975\u2013993.","journal-title":"Journal of Computational and Graphical Statistics"},{"issue":"5","key":"e_1_3_2_50_2","doi-asserted-by":"crossref","first-page":"657","DOI":"10.1109\/TPAMI.2005.95","article-title":"Automated variable weighting in k-means type clustering","volume":"27","author":"Huang J. Z.","year":"2005","unstructured":"J. Z. Huang, M. K. Ng, Hongqiang Rong, and Zichen Li. 2005. Automated variable weighting in k-means type clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 5 (2005), 657\u2013668.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_51_2","doi-asserted-by":"crossref","first-page":"107452","DOI":"10.1016\/j.asoc.2021.107452","article-title":"Discovery of arbitrarily shaped significant clusters in spatial point data with noise","volume":"108","author":"Huang Jincai","year":"2021","unstructured":"Jincai Huang and Jianbo Tang. 2021. 
Discovery of arbitrarily shaped significant clusters in spatial point data with noise. Applied Soft Computing 108 (2021), 107452.","journal-title":"Applied Soft Computing"},{"issue":"3","key":"e_1_3_2_52_2","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1023\/A:1009769707641","article-title":"Extensions to the k-means algorithm for clustering large data sets with categorical values","volume":"2","author":"Huang Zhexue","year":"1998","unstructured":"Zhexue Huang. 1998. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery 2, 3 (1998), 283\u2013304.","journal-title":"Data Mining and Knowledge Discovery"},{"issue":"1","key":"e_1_3_2_53_2","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/BF01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert Lawrence","year":"1985","unstructured":"Lawrence Hubert and Phipps Arabie. 1985. Comparing partitions. Journal of Classification 2, 1 (1985), 193\u2013218.","journal-title":"Journal of Classification"},{"issue":"12","key":"e_1_3_2_54_2","doi-asserted-by":"crossref","first-page":"2396","DOI":"10.1109\/TPAMI.2011.84","article-title":"A link-based approach to the cluster ensemble problem","volume":"33","author":"Iam-On Natthakan","year":"2011","unstructured":"Natthakan Iam-On, Tossapon Boongoen, Simon Garrett, and Chris Price. 2011. A link-based approach to the cluster ensemble problem. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 12 (2011), 2396\u20132409.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_55_2","first-page":"281","volume-title":"In Proceedings of the 2002 International Conference on Pattern Recognition","volume":"4","author":"Jain A. K.","year":"2002","unstructured":"A. K. Jain, Xiaowei Xu, Tin Kam Ho, and Fan Xiao. 2002. Uniformity testing using minimal spanning tree. 
In Proceedings of the 2002 International Conference on Pattern Recognition, Vol. 4, 281\u2013284."},{"issue":"8","key":"e_1_3_2_56_2","doi-asserted-by":"crossref","first-page":"651","DOI":"10.1016\/j.patrec.2009.09.011","article-title":"Data clustering: 50 years beyond K-means","volume":"31","author":"Jain Anil K.","year":"2010","unstructured":"Anil K. Jain. 2010. Data clustering: 50 years beyond K-means. Pattern Recognition Letters 31, 8 (2010), 651\u2013666.","journal-title":"Pattern Recognition Letters"},{"issue":"8","key":"e_1_3_2_57_2","doi-asserted-by":"crossref","first-page":"3308","DOI":"10.1109\/TNNLS.2017.2728138","article-title":"Subspace clustering of categorical and numerical data with an unknown number of clusters","volume":"29","author":"Jia Hong","year":"2018","unstructured":"Hong Jia and Yiu-Ming Cheung. 2018. Subspace clustering of categorical and numerical data with an unknown number of clusters. IEEE Transactions on Neural Networks and Learning Systems 29, 8 (2018), 3308\u20133325.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"issue":"5","key":"e_1_3_2_58_2","doi-asserted-by":"crossref","first-page":"1065","DOI":"10.1109\/TNNLS.2015.2436432","article-title":"A new distance metric for unsupervised learning of categorical data","volume":"27","author":"Jia Hong","year":"2016","unstructured":"Hong Jia, Yiu-Ming Cheung, and Jiming Liu. 2016. A new distance metric for unsupervised learning of categorical data. IEEE Transactions on Neural Networks and Learning Systems 27, 5 (2016), 1065\u20131079.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"issue":"9","key":"e_1_3_2_59_2","doi-asserted-by":"crossref","first-page":"1810","DOI":"10.1109\/TKDE.2018.2808532","article-title":"Unsupervised coupled metric similarity for non-iid categorical data","volume":"30","author":"Jian Songlei","year":"2018","unstructured":"Songlei Jian, Longbing Cao, Kai Lu, and Hang Gao. 2018. 
Unsupervised coupled metric similarity for non-iid categorical data. IEEE Transactions on Knowledge and Data Engineering 30, 9 (2018), 1810\u20131823.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"issue":"5","key":"e_1_3_2_60_2","doi-asserted-by":"crossref","first-page":"853","DOI":"10.1109\/TKDE.2018.2848902","article-title":"CURE: Flexible categorical data representation by hierarchical coupling learning","volume":"31","author":"Jian Songlei","year":"2019","unstructured":"Songlei Jian, Guansong Pang, Longbing Cao, Kai Lu, and Hang Gao. 2019. CURE: Flexible categorical data representation by hierarchical coupling learning. IEEE Transactions on Knowledge and Data Engineering 31, 5 (2019), 853\u2013866.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"e_1_3_2_61_2","article-title":"Dip-means: An incremental clustering method for estimating the number of clusters","volume":"25","author":"Kalogeratos Argyris","year":"2012","unstructured":"Argyris Kalogeratos and Aristidis Likas. 2012. Dip-means: An incremental clustering method for estimating the number of clusters. In Advances in Neural Information Processing Systems, Vol. 25.","journal-title":"In Advances in Neural Information Processing Systems"},{"issue":"3","key":"e_1_3_2_62_2","doi-asserted-by":"crossref","first-page":"811","DOI":"10.1111\/biom.12647","article-title":"Statistical significance for hierarchical clustering","volume":"73","author":"Kimes Patrick K.","year":"2017","unstructured":"Patrick K. Kimes, Yufeng Liu, David Neil Hayes, and James Stephen Marron. 2017. Statistical significance for hierarchical clustering. Biometrics 73, 3 (2017), 811\u2013821.","journal-title":"Biometrics"},{"key":"e_1_3_2_63_2","first-page":"68","volume-title":"Proceedings of the 21st International Conference on Machine Learning","author":"Li Tao","year":"2004","unstructured":"Tao Li, Sheng Ma, and Mitsunori Ogihara. 2004. 
Entropy-based criterion in categorical clustering. In Proceedings of the 21st International Conference on Machine Learning, 68."},{"issue":"341","key":"e_1_3_2_64_2","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1080\/01621459.1973.10481356","article-title":"A probability theory of cluster analysis","volume":"68","author":"Ling Robert F.","year":"1973","unstructured":"Robert F. Ling. 1973. A probability theory of cluster analysis. Journal of the American Statistical Association 68, 341 (1973), 159\u2013164.","journal-title":"Journal of the American Statistical Association"},{"issue":"354","key":"e_1_3_2_65_2","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1080\/01621459.1976.10480335","article-title":"Probability tables for cluster analysis based on a theory of random graphs","volume":"71","author":"Ling Robert F.","year":"1976","unstructured":"Robert F. Ling and George G. Killough. 1976. Probability tables for cluster analysis based on a theory of random graphs. Journal of the American Statistical Association 71, 354 (1976), 293\u2013300.","journal-title":"Journal of the American Statistical Association"},{"issue":"5","key":"e_1_3_2_66_2","doi-asserted-by":"crossref","first-page":"1129","DOI":"10.1109\/TKDE.2017.2650229","article-title":"Spectral ensemble clustering via weighted K-means: Theoretical and practical evidence","volume":"29","author":"Liu Hongfu","year":"2017","unstructured":"Hongfu Liu, Junjie Wu, Tongliang Liu, Dacheng Tao, and Yun Fu. 2017. Spectral ensemble clustering via weighted K-means: Theoretical and practical evidence. 
IEEE Transactions on Knowledge and Data Engineering 29, 5 (2017), 1129\u20131143.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"issue":"483","key":"e_1_3_2_67_2","doi-asserted-by":"crossref","first-page":"1281","DOI":"10.1198\/016214508000000454","article-title":"Statistical significance of clustering for high-dimension, low\u2013sample size data","volume":"103","author":"Liu Yufeng","year":"2008","unstructured":"Yufeng Liu, David Neil Hayes, Andrew Nobel, and James Stephen Marron. 2008. Statistical significance of clustering for high-dimension, low\u2013sample size data. Journal of the American Statistical Association 103, 483 (2008), 1281\u20131293.","journal-title":"Journal of the American Statistical Association"},{"issue":"497","key":"e_1_3_2_68_2","doi-asserted-by":"crossref","first-page":"378","DOI":"10.1080\/01621459.2011.646935","article-title":"Bootstrapping for significance of compact clusters in multidimensional datasets","volume":"107","author":"Maitra Ranjan","year":"2012","unstructured":"Ranjan Maitra, Volodymyr Melnykov, and Soumendra N. Lahiri. 2012. Bootstrapping for significance of compact clusters in multidimensional datasets. Journal of the American Statistical Association 107, 497 (2012), 378\u2013392.","journal-title":"Journal of the American Statistical Association"},{"key":"e_1_3_2_69_2","doi-asserted-by":"crossref","first-page":"1055","DOI":"10.1145\/2939672.2939740","volume-title":"Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Maurus Samuel","year":"2016","unstructured":"Samuel Maurus and Claudia Plant. 2016. Skinny-dip: Clustering in a sea of noise. 
In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1055\u20131064."},{"issue":"2","key":"e_1_3_2_70_2","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1023\/A:1010924920739","article-title":"Reinterpreting the category utility function","volume":"45","author":"Mirkin Boris","year":"2001","unstructured":"Boris Mirkin. 2001. Reinterpreting the category utility function. Machine Learning 45, 2 (2001), 219\u2013228.","journal-title":"Machine Learning"},{"issue":"3","key":"e_1_3_2_71_2","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1109\/TPAMI.2007.53","article-title":"On the impact of dissimilarity measure in k-modes clustering algorithm","volume":"29","author":"Ng Michael K.","year":"2007","unstructured":"Michael K. Ng, Mark Junjie Li, Joshua Zhexue Huang, and Zengyou He. 2007. On the impact of dissimilarity measure in k-modes clustering algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence 29, 3 (2007), 503\u2013507.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"3","key":"e_1_3_2_72_2","doi-asserted-by":"crossref","first-page":"409","DOI":"10.1093\/bioinformatics\/btv588","article-title":"A novel bi-level meta-analysis approach: Applied to biological pathway analysis","volume":"32","author":"Nguyen Tin","year":"2016","unstructured":"Tin Nguyen, Rebecca Tagett, Michele Donato, Cristina Mitrea, and Sorin Draghici. 2016. A novel bi-level meta-analysis approach: Applied to biological pathway analysis. Bioinformatics 32, 3 (2016), 409\u2013416.","journal-title":"Bioinformatics"},{"key":"e_1_3_2_73_2","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1016\/j.csda.2013.02.013","article-title":"An empirical study of tests for uniformity in multidimensional data","volume":"64","author":"Petrie Adam","year":"2013","unstructured":"Adam Petrie and Thomas R. Willemain. 2013. 
An empirical study of tests for uniformity in multidimensional data. Computational Statistics & Data Analysis 64 (2013), 253\u2013268.","journal-title":"Computational Statistics & Data Analysis"},{"issue":"10","key":"e_1_3_2_74_2","doi-asserted-by":"crossref","first-page":"2047","DOI":"10.1109\/TNNLS.2015.2451151","article-title":"Space structure and clustering of categorical data","volume":"27","author":"Qian Yuhua","year":"2016","unstructured":"Yuhua Qian, Feijiang Li, Jiye Liang, Bing Liu, and Chuangyin Dang. 2016. Space structure and clustering of categorical data. IEEE Transactions on Neural Networks and Learning Systems 27, 10 (2016), 2047\u20132059.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"e_1_3_2_75_2","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1007\/BF01201021","article-title":"The MAP test for multimodality","volume":"11","author":"Roz\u00e1l Gregory Paul M.","year":"1994","unstructured":"Gregory Paul M. Roz\u00e1l and J. A. Hartigan. 1994. The MAP test for multimodality. Journal of Classification 11, 1 (1994), 5\u201336.","journal-title":"Journal of Classification"},{"issue":"8","key":"e_1_3_2_76_2","doi-asserted-by":"crossref","first-page":"888","DOI":"10.1109\/34.868688","article-title":"Normalized cuts and image segmentation","volume":"22","author":"Shi Jianbo","year":"2000","unstructured":"Jianbo Shi and Jitendra Malik. 2000. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 8 (2000), 888\u2013905.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_77_2","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1016\/j.inffus.2021.11.011","article-title":"Tabular data: Deep learning is not all you need","volume":"81","author":"Shwartz-Ziv Ravid","year":"2022","unstructured":"Ravid Shwartz-Ziv and Amitai Armon. 2022. Tabular data: Deep learning is not all you need. 
Information Fusion 81 (2022), 84\u201390.","journal-title":"Information Fusion"},{"key":"e_1_3_2_78_2","first-page":"2210","volume-title":"Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Siffer Alban","year":"2018","unstructured":"Alban Siffer, Pierre-Alain Fouque, Alexandre Termier, and Christine Largou\u00ebt. 2018. Are your data gathered? In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2210\u20132218."},{"issue":"1","key":"e_1_3_2_79_2","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1111\/j.2517-6161.1981.tb01155.x","article-title":"Using kernel density estimates to investigate multimodality","volume":"43","author":"Silverman Bernard W.","year":"1981","unstructured":"Bernard W. Silverman. 1981. Using kernel density estimates to investigate multimodality. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 43, 1 (1981), 97\u201399.","journal-title":"Journal of the Royal Statistical Society: Series B (Statistical Methodology)"},{"issue":"1","key":"e_1_3_2_80_2","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1109\/34.184777","article-title":"Threshold validity for mutual neighborhood clustering","volume":"15","author":"Smith S. P.","year":"1993","unstructured":"S. P. Smith. 1993. Threshold validity for mutual neighborhood clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 15, 1 (1993), 89\u201392.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_81_2","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1109\/TPAMI.1984.4767477","article-title":"Testing for uniformity in multidimensional data","volume":"1","author":"Smith Stephen P.","year":"1984","unstructured":"Stephen P. Smith and Anil K. Jain. 1984. Testing for uniformity in multidimensional data. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 1 (1984), 73\u201381.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"2","key":"e_1_3_2_82_2","first-page":"777","article-title":"Hypothesis setting and order statistic for robust genomic meta-analysis","volume":"8","author":"Song Chi","year":"2014","unstructured":"Chi Song and George C. Tseng. 2014. Hypothesis setting and order statistic for robust genomic meta-analysis. The Annals of Applied Statistics 8, 2 (2014), 777.","journal-title":"The Annals of Applied Statistics"},{"key":"e_1_3_2_83_2","first-page":"583","article-title":"Cluster ensembles\u2014a knowledge reuse framework for combining multiple partitions","volume":"3","author":"Strehl Alexander","year":"2002","unstructured":"Alexander Strehl and Joydeep Ghosh. 2002. Cluster ensembles\u2014a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research 3 (Dec. 2002), 583\u2013617.","journal-title":"Journal of Machine Learning Research"},{"issue":"1","key":"e_1_3_2_84_2","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1080\/10618600.2020.1796398","article-title":"U-statistical inference for hierarchical clustering","volume":"30","author":"Valk Marcio","year":"2021","unstructured":"Marcio Valk and Gabriela Bettella Cybis. 2021. U-statistical inference for hierarchical clustering. Journal of Computational and Graphical Statistics 30, 1 (2021), 133\u2013143.","journal-title":"Journal of Computational and Graphical Statistics"},{"issue":"525","key":"e_1_3_2_85_2","doi-asserted-by":"crossref","first-page":"158","DOI":"10.1080\/01621459.2017.1385465","article-title":"Admissibility in partial conjunction testing","volume":"114","author":"Wang Jingshu","year":"2019","unstructured":"Jingshu Wang and Art B. Owen. 2019. Admissibility in partial conjunction testing. 
Journal of the American Statistical Association 114, 525 (2019), 158\u2013168.","journal-title":"Journal of the American Statistical Association"},{"key":"e_1_3_2_86_2","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1016\/j.patcog.2019.01.042","article-title":"Optimal mathematical programming and variable neighborhood search for k-modes categorical data clustering","volume":"90","author":"Xiao Yiyong","year":"2019","unstructured":"Yiyong Xiao, Changhao Huang, Jiaoying Huang, Ikou Kaku, and Yuchun Xu. 2019. Optimal mathematical programming and variable neighborhood search for k-modes categorical data clustering. Pattern Recognition 90 (2019), 183\u2013195.","journal-title":"Pattern Recognition"},{"issue":"5","key":"e_1_3_2_87_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3474842","article-title":"Significant DBSCAN+: Statistically robust density-based clustering","volume":"12","author":"Xie Yiqun","year":"2021","unstructured":"Yiqun Xie, Xiaowei Jia, Shashi Shekhar, Han Bao, and Xun Zhou. 2021. Significant DBSCAN+: Statistically robust density-based clustering. ACM Transactions on Intelligent Systems and Technology 12, 5 (2021), 1\u201326.","journal-title":"ACM Transactions on Intelligent Systems and Technology"},{"issue":"1","key":"e_1_3_2_88_2","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1007\/s10618-011-0221-2","article-title":"DHCC: Divisive hierarchical clustering of categorical data","volume":"24","author":"Xiong Tengke","year":"2012","unstructured":"Tengke Xiong, Shengrui Wang, Andr\u00e9 Mayers, and Ernest Monga. 2012. DHCC: Divisive hierarchical clustering of categorical data. 
Data Mining and Knowledge Discovery 24, 1 (2012), 103\u2013135.","journal-title":"Data Mining and Knowledge Discovery"},{"issue":"473","key":"e_1_3_2_89_2","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1198\/016214505000000312","article-title":"Clustering categorical data based on distance vectors","volume":"101","author":"Zhang Peng","year":"2006","unstructured":"Peng Zhang, Xiaogang Wang, and Peter X.-K Song. 2006. Clustering categorical data based on distance vectors. Journal of the American Statistical Association 101, 473 (2006), 355\u2013367.","journal-title":"Journal of the American Statistical Association"},{"issue":"7","key":"e_1_3_2_90_2","first-page":"3560","article-title":"Learnable weighting of intra-attribute distances for categorical data clustering with nominal and ordinal attributes","volume":"44","author":"Zhang Yiqun","year":"2022","unstructured":"Yiqun Zhang and Yiu-Ming Cheung. 2022. Learnable weighting of intra-attribute distances for categorical data clustering with nominal and ordinal attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 7 (2022), 3560\u20133576.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"9","key":"e_1_3_2_91_2","doi-asserted-by":"crossref","first-page":"6530","DOI":"10.1109\/TNNLS.2022.3202700","article-title":"Graph-based dissimilarity measurement for cluster analysis of any-type-attributed data","volume":"34","author":"Zhang Yiqun","year":"2023","unstructured":"Yiqun Zhang and Yiu-Ming Cheung. 2023. Graph-based dissimilarity measurement for cluster analysis of any-type-attributed data. 
IEEE Transactions on Neural Networks and Learning Systems 34, 9 (2023), 6530\u20136544.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"e_1_3_2_92_2","first-page":"3758","volume-title":"Proceedings of the 31st International Joint Conference on Artificial Intelligence","author":"Zhang Yiqun","year":"2022","unstructured":"Yiqun Zhang, Yiu-Ming Cheung, and An Zeng. 2022. Het2Hom: Representation of heterogeneous attributes into homogeneous concept spaces for categorical-and-numerical-attribute data clustering. In Proceedings of the 31st International Joint Conference on Artificial Intelligence, 3758\u20133765."},{"key":"e_1_3_2_93_2","first-page":"1943","volume-title":"Proceedings of the 27th European Conference on Artificial Intelligence","author":"Zhao Mingjie","year":"2024","unstructured":"Mingjie Zhao, Sen Feng, Yiqun Zhang, Mengke Li, Yang Lu, and Yiu-Ming Cheung. 2024. Learning order forest for qualitative-attribute data clustering. In Proceedings of the 27th European Conference on Artificial Intelligence, 1943\u20131950."},{"key":"e_1_3_2_94_2","doi-asserted-by":"crossref","first-page":"150","DOI":"10.1016\/j.patcog.2017.04.019","article-title":"Clustering ensemble selection for categorical data based on internal validity indices","volume":"69","author":"Zhao Xingwang","year":"2017","unstructured":"Xingwang Zhao, Jiye Liang, and Chuangyin Dang. 2017. Clustering ensemble selection for categorical data based on internal validity indices. Pattern Recognition 69 (2017), 150\u2013168.","journal-title":"Pattern Recognition"},{"issue":"3","key":"e_1_3_2_95_2","doi-asserted-by":"crossref","first-page":"927","DOI":"10.1109\/TNNLS.2019.2911118","article-title":"From whole to part: Reference-based representation for clustering categorical data","volume":"31","author":"Zheng Qibin","year":"2020","unstructured":"Qibin Zheng, Xingchun Diao, Jianjun Cao, Yi Liu, Hongmei Li, Junnan Yao, Chen Chang, and Guojun Lv. 2020. 
From whole to part: Reference-based representation for clustering categorical data. IEEE Transactions on Neural Networks and Learning Systems 31, 3 (2020), 927\u2013937.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"issue":"1","key":"e_1_3_2_96_2","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1109\/TPAMI.2020.3010953","article-title":"Unsupervised heterogeneous coupling learning for categorical representation","volume":"44","author":"Zhu Chengzhang","year":"2022","unstructured":"Chengzhang Zhu, Longbing Cao, and Jianping Yin. 2022. Unsupervised heterogeneous coupling learning for categorical representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 1 (2022), 533\u2013549.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3735977","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,16]],"date-time":"2025-06-16T16:56:26Z","timestamp":1750092986000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3735977"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,16]]},"references-count":95,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,6,30]]}},"alternative-id":["10.1145\/3735977"],"URL":"https:\/\/doi.org\/10.1145\/3735977","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"value":"1556-4681","type":"print"},{"value":"1556-472X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,16]]},"assertion":[{"value":"2024-11-26","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2025-05-08","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-16","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}