{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T05:15:23Z","timestamp":1775538923216,"version":"3.50.1"},"reference-count":62,"publisher":"Association for Computing Machinery (ACM)","issue":"6","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62476063, 62376233, 61806131, 62306181"],"award-info":[{"award-number":["62476063, 62376233, 61806131, 62306181"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Natural Science Foundation of China \/ Research Grants Council Joint Research Scheme","award":["N_HKBU214\/21"],"award-info":[{"award-number":["N_HKBU214\/21"]}]},{"DOI":"10.13039\/501100003453","name":"Natural Science Foundation of Guangdong Province","doi-asserted-by":"publisher","award":["2025A1515011293"],"award-info":[{"award-number":["2025A1515011293"]}],"id":[{"id":"10.13039\/501100003453","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003392","name":"Natural Science Foundation of Fujian Province","doi-asserted-by":"publisher","award":["2024J09001"],"award-info":[{"award-number":["2024J09001"]}],"id":[{"id":"10.13039\/501100003392","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Key Laboratory of Radar Signal Processing","award":["JKW202403"],"award-info":[{"award-number":["JKW202403"]}]},{"name":"General Research Fund of Research Grants Council","award":["12201321, 12202622, 12201323"],"award-info":[{"award-number":["12201321, 12202622, 12201323"]}]},{"name":"RGC Senior Research Fellow Scheme","award":["SRFS2324-2S02"],"award-info":[{"award-number":["SRFS2324-2S02"]}]},{"DOI":"10.13039\/501100017610","name":"Shenzhen Science and Technology Innovation Program","doi-asserted-by":"publisher","award":["RCBS20231211090659101"],"award-info":[{"award-number":["RCBS20231211090659101"]}],"id":[{"id":"10.13039\/501100017610","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Guangdong Provincial Key Laboratory","award":["2023B1212060076"],"award-info":[{"award-number":["2023B1212060076"]}]},{"name":"Xiaomi Young Talents Program"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2025,12,4]]},"abstract":"<jats:p>Clustering is a popular machine learning technique for data mining that can process and analyze datasets to automatically reveal sample distribution patterns. Since the ubiquitous categorical data naturally lack a well-defined metric space such as the Euclidean distance space of numerical data, the distribution of categorical data is usually under-represented, and thus valuable information can be easily twisted in clustering. This paper, therefore, introduces a novel order distance metric learning approach to intuitively represent categorical attribute values by learning their optimal order relationship and quantifying their distance in a line similar to that of the numerical attributes. Since subjectively created qualitative categorical values involve ambiguity and fuzziness, the order distance metric is learned in the context of clustering. Accordingly, a new joint learning paradigm is developed to alternatively perform clustering and order distance metric learning with low time complexity and a guarantee of convergence. Due to the clustering-friendly order learning mechanism and the homogeneous ordinal nature of the order distance and Euclidean distance, the proposed method achieves superior clustering accuracy on categorical and mixed datasets. More importantly, the learned order distance metric greatly reduces the difficulty of understanding and managing the non-intuitive categorical data. Experiments with ablation studies, significance tests, case studies, etc., have validated the efficacy of the proposed method. The source code is available at https:\/\/github.com\/csmjzhao\/OCL_Source_Code.<\/jats:p>","DOI":"10.1145\/3769772","type":"journal-article","created":{"date-parts":[[2025,12,6]],"date-time":"2025-12-06T04:32:13Z","timestamp":1764995533000},"page":"1-24","source":"Crossref","is-referenced-by-count":2,"title":["Categorical Data Clustering via Value Order Estimated Distance Metric Learning"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0328-987X","authenticated-orcid":false,"given":"Yiqun","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China and Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-5517-4845","authenticated-orcid":false,"given":"Mingjie","family":"Zhao","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9352-2366","authenticated-orcid":false,"given":"Hong","family":"Jia","sequence":"additional","affiliation":[{"name":"College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9433-9683","authenticated-orcid":false,"given":"Mengke","family":"Li","sequence":"additional","affiliation":[{"name":"College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3497-9611","authenticated-orcid":false,"given":"Yang","family":"Lu","sequence":"additional","affiliation":[{"name":"School of Informatics, Xiamen University, Xiamen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7629-4648","authenticated-orcid":false,"given":"Yiu-ming","family":"Cheung","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong"}]}],"member":"320","published-online":{"date-parts":[[2025,12,5]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1137\/0125042"},{"key":"e_1_2_1_2_1","volume-title":"Categorical data analysis","author":"Agresti Alan","unstructured":"Alan Agresti. 2003. Categorical data analysis. John Wiley & Sons."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2006.06.006"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2014.6889941"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2022.108694"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2011.04.024"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1002\/bs.3830120210"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611972788.22"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS60910.2024.00035"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3637528.3671839"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2014.09.012"},{"key":"e_1_2_1_12_1","unstructured":"Dheeru Dua and Efi Karra Taniskidou. 2017. UCI Machine Learning Repository. http:\/\/archive.ics.uci.edu\/ml"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2008.2005601"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3709741"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCC.2013.2247595"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/3122009.3176831"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/s007780050005"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/2976248.2976312"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2623330.2623652"},{"key":"e_1_2_1_20_1","volume-title":"Significance-Based Categorical Data Clustering. ArXiv","author":"Hu Lianyu","year":"2022","unstructured":"Lianyu Hu, Mudi Jiang, Yan Liu, and Zengyou He. 2022. Significance-Based Categorical Data Clustering. ArXiv, Vol. abs\/2211.03956 (2022)."},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the First Pacific-Asia Conference on Knowledge Discovery and Data Mining. 21-34","author":"Huang Zhexue","year":"1997","unstructured":"Zhexue Huang. 1997. Clustering large data sets with mixed numeric and categorical values. In Proceedings of the First Pacific-Asia Conference on Knowledge Discovery and Data Mining. 21-34."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009769707641"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03915-7_8"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2133360.2133361"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2017.2728138"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2015.2436432"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2018.2808532"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2017\/269"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2018.2848902"},{"key":"e_1_2_1_30_1","volume-title":"CM-CaFE: A Clustering Method with Causality-based Feature Embedding. ACM Transactions on Knowledge Discovery from Data","author":"Jing Xuechun","year":"2025","unstructured":"Xuechun Jing, Fuyuan Cao, Kui Yu, and Jiye Liang. 2025. CM-CaFE: A Clustering Method with Causality-based Feature Embedding. ACM Transactions on Knowledge Discovery from Data, Vol. 19, 3 (2025), 1-23."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/b98832"},{"key":"e_1_2_1_32_1","volume-title":"Amaresh Chandra Mishra, and Sraban Kumar Mohanty.","author":"Kar Amit Kumar","year":"2024","unstructured":"Amit Kumar Kar, Mohammad Maksood Akhter, Amaresh Chandra Mishra, and Sraban Kumar Mohanty. 2024. EDMD: An Entropy based Dissimilarity measure to cluster Mixed-categorical Data. Pattern Recognition (2024), 110674."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2005.06.002"},{"key":"e_1_2_1_34_1","volume-title":"MICCF: A Mutual Information Constrained Clustering Framework for Learning Clustering-Oriented Feature Representations. ACM Transactions on Knowledge Discovery from Data","author":"Li Hongyu","year":"2024","unstructured":"Hongyu Li, Lefei Zhang, Kehua Su, and Wei Yu. 2024. MICCF: A Mutual Information Constrained Clustering Framework for Learning Clustering-Oriented Feature Representations. ACM Transactions on Knowledge Discovery from Data, Vol. 18, 8 (2024), 1-22."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015404"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the 15th International Conference on Machine Learning. 296-304","author":"Lin Dekang","year":"1998","unstructured":"Dekang Lin. 1998. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning. 296-304."},{"key":"e_1_2_1_37_1","volume-title":"Paulo Oswaldo Boaventura-Netto, Peter Hahn, and Tania Querido.","author":"Loiola Eliane Maria","year":"2007","unstructured":"Eliane Maria Loiola, Nair Maria Maia De Abreu, Paulo Oswaldo Boaventura-Netto, Peter Hahn, and Tania Querido. 2007. A survey for the quadratic assignment problem. European journal of operational research, Vol. 176, 2 (2007), 657-690."},{"key":"e_1_2_1_38_1","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"van der Maaten Laurens","year":"2008","unstructured":"Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research, Vol. 9, 11 (2008), 2579-2605.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2015.2451151"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1971.10482356"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3725366"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2011.18"},{"key":"e_1_2_1_43_1","first-page":"5261","article-title":"Neural graph matching network: Learning lawler's quadratic assignment problem with extension to hypergraph and multiple-graph matching","volume":"44","author":"Wang Runzhong","year":"2021","unstructured":"Runzhong Wang, Junchi Yan, and Xiaokang Yang. 2021. Neural graph matching network: Learning lawler's quadratic assignment problem with extension to hypergraph and multiple-graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44, 9 (2021), 5261-5279.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_2_1_44_1","volume-title":"Individual comparisons by ranking methods. Biometrics bulletin","author":"Wilcoxon Frank","year":"1945","unstructured":"Frank Wilcoxon. 1945. Individual comparisons by ranking methods. Biometrics bulletin, Vol. 1, 6 (1945), 80-83."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/TFUZZ.2022.3189831"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/TFUZZ.2022.3189831"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2020.2983073"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2022.3202700"},{"key":"e_1_2_1_49_1","volume-title":"Proceedings of the 34th AAAI Conference on Artificial Intelligence. 6869-6876","author":"Zhang Yiqun","unstructured":"Yiqun Zhang, Yiu-ming Cheung, and et al., 2020a. An ordinal data clustering algorithm with automated distance learning. In Proceedings of the 34th AAAI Conference on Artificial Intelligence. 6869-6876."},{"key":"e_1_2_1_50_1","first-page":"3560","article-title":"Learnable Weighting of Intra-Attribute Distances for Categorical Data Clustering with Nominal and Ordinal Attributes","volume":"44","author":"Zhang Yiqun","year":"2022","unstructured":"Yiqun Zhang, Yiu-ming Cheung, and et al., 2022a. Learnable Weighting of Intra-Attribute Distances for Categorical Data Clustering with Nominal and Ordinal Attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44, 7 (2022), 3560-3576.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2019.2899381"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2022\/522"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2025.3563769"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/TETCI.2025.3598327"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v39i21.34429"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2025.126738"},{"key":"e_1_2_1_57_1","first-page":"124","article-title":"e. Adaptive micro partition and hierarchical merging for accurate mixed data clustering","volume":"11","author":"Zhang Yunfan","year":"2025","unstructured":"Yunfan Zhang, Rong Zou, Yiqun Zhang, Yue Zhang, Yiu-ming Cheung, and Kangshun Li. 2025 e. Adaptive micro partition and hierarchical merging for accurate mixed data clustering. Complex & Intelligent Systems, Vol. 11 (2025), 124-128.","journal-title":"Complex & Intelligent Systems"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.3233\/FAIA240709"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSMC.2020.3027860"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.3010953"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSAA49011.2020.00024"},{"key":"e_1_2_1_62_1","volume-title":"SDENK: Unbiased Subspace Density-Clustering. Neurocomputing","author":"Zou Rong","year":"2025","unstructured":"Rong Zou, Yunfan Zhang, Mingjie Zhao, Zexi Tan, Yiqun Zhang, and Yiu-ming Cheung. 2025. SDENK: Unbiased Subspace Density-Clustering. Neurocomputing (2025), 131225."}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3769772","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T04:36:02Z","timestamp":1775536562000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3769772"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,4]]},"references-count":62,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,12,4]]}},"alternative-id":["10.1145\/3769772"],"URL":"https:\/\/doi.org\/10.1145\/3769772","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,4]]}}}