{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,11]],"date-time":"2025-03-11T04:11:48Z","timestamp":1741666308260,"version":"3.38.0"},"reference-count":22,"publisher":"SAGE Publications","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IDA"],"published-print":{"date-parts":[[2021,10,29]]},"abstract":"<jats:p>To address the problem of clustering incomplete categorical matrix data sets, and to account for the uncertain relationship between samples and clusters, a set pair k-modes clustering algorithm (MD-SPKM) is proposed. First, the theory of set pair information granules is introduced into k-modes clustering: by improving the distance formula of the traditional k-modes algorithm, a set pair distance measure between incomplete matrix samples is defined. Second, to capture the uncertain relationship between a sample and a cluster, the intra-cluster average distance is defined, together with a threshold formula that determines whether a sample belongs to multiple clusters; the resulting set pair clustering consists of a positive region, a boundary region, and a negative region. 
Finally, experiments on three selected data sets against four comparison algorithms show that the set pair k-modes clustering algorithm can effectively handle incomplete categorical matrix data sets and achieves good clustering performance in terms of Accuracy, Recall, ARI, and NMI.<\/jats:p>","DOI":"10.3233\/ida-205340","type":"journal-article","created":{"date-parts":[[2021,11,2]],"date-time":"2021-11-02T20:03:10Z","timestamp":1635883390000},"page":"1507-1524","source":"Crossref","is-referenced-by-count":3,"title":["MD-SPKM: A set pair k-modes clustering algorithm for incomplete categorical matrix data"],"prefix":"10.1177","volume":"25","author":[{"given":"Chunying","family":"Zhang","sequence":"first","affiliation":[{"name":"College of Science, North China University of Science and Technology, Tangshan, Hebei, China"},{"name":"Key Laboratory of Data Science and Application of Hebei Province, Tangshan, Hebei, China"}]},{"given":"Ruiyan","family":"Gao","sequence":"additional","affiliation":[{"name":"College of Science, North China University of Science and Technology, Tangshan, Hebei, China"}]},{"given":"Jiahao","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Science, North China University of Science and Technology, Tangshan, Hebei, China"}]},{"given":"Song","family":"Chen","sequence":"additional","affiliation":[{"name":"College of Science, North China University of Science and Technology, Tangshan, Hebei, China"}]},{"given":"Fengchun","family":"Liu","sequence":"additional","affiliation":[{"name":"Qianan College, North China University of Science and Technology, Tangshan, Hebei, China"}]},{"given":"Jing","family":"Ren","sequence":"additional","affiliation":[{"name":"College of Science, North China University of Science and Technology, Tangshan, Hebei, China"}]},{"given":"Xiaoze","family":"Feng","sequence":"additional","affiliation":[{"name":"College of Science, North China University of Science and 
Technology, Tangshan, Hebei, China"}]}],"member":"179","reference":[{"key":"10.3233\/IDA-205340_ref1","doi-asserted-by":"crossref","unstructured":"T.X. Wang and J.Y. Gao, An improved k-means algorithm based on kurtosis test, Journal of Physics: Conference Series 1267 (2019), 012027.","DOI":"10.1088\/1742-6596\/1267\/1\/012027"},{"key":"10.3233\/IDA-205340_ref2","doi-asserted-by":"crossref","first-page":"691","DOI":"10.1007\/978-3-319-25751-8_83","article-title":"Genetic sampling k-means for clustering large data sets","volume":"9423","author":"Luchi","year":"2015","journal-title":"Lecture Notes in Computer Science"},{"key":"10.3233\/IDA-205340_ref3","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1023\/A:1009769707641","article-title":"Extensions to the k-means algorithm for clustering large data sets with categorical values","volume":"2","author":"Huang","year":"1998","journal-title":"Data Mining and Knowledge Discovery"},{"key":"10.3233\/IDA-205340_ref4","first-page":"112","article-title":"An improved k-modes clustering algorithm","volume":"28","author":"Shi","year":"2019","journal-title":"Operations Research and Management Science"},{"key":"10.3233\/IDA-205340_ref5","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1016\/j.ins.2015.11.005","article-title":"Initialization of k-modes clustering using outlier detection techniques","volume":"332","author":"Jiang","year":"2016","journal-title":"Information Sciences"},{"key":"10.3233\/IDA-205340_ref6","doi-asserted-by":"crossref","first-page":"6171","DOI":"10.1007\/s10586-018-1889-5","article-title":"Attribute weights-based clustering centres algorithm for initialising K-modes clustering","volume":"22","author":"Peng","year":"2019","journal-title":"Cluster Computing"},{"key":"10.3233\/IDA-205340_ref7","first-page":"73","article-title":"Improved cluster center initialization method for clustering categorical data","volume":"38","author":"Wang","year":"2018","journal-title":"Journal of Computer 
Applications"},{"key":"10.3233\/IDA-205340_ref8","doi-asserted-by":"crossref","first-page":"605","DOI":"10.1016\/j.asoc.2017.04.019","article-title":"k-mw-modes: An algorithm for clustering categorical matrix-object data","volume":"57","author":"Cao","year":"2017","journal-title":"Applied Soft Computing"},{"key":"10.3233\/IDA-205340_ref9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s12559-016-9397-5","article-title":"Three-way decisions and cognitive computing","volume":"8","author":"Yao","year":"2016","journal-title":"Cognitive Computation"},{"key":"10.3233\/IDA-205340_ref10","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1016\/j.ijar.2018.09.005","article-title":"Three-way decision and granular computing","volume":"103","author":"Yao","year":"2018","journal-title":"International Journal of Approximate Reasoning"},{"key":"10.3233\/IDA-205340_ref11","first-page":"31","article-title":"Three-way cluster analysis","author":"Yu","year":"2016","journal-title":"Peak Data Science"},{"key":"10.3233\/IDA-205340_ref12","first-page":"15","article-title":"Model of three-way decision based on the space of set pair information granule and its application","author":"Zhang","year":"2016","journal-title":"Journal on Communications"},{"key":"10.3233\/IDA-205340_ref13","first-page":"1790","article-title":"K-modes algorithm based on interdependence redundancy measure","volume":"37","author":"Huang","year":"2016","journal-title":"Journal of Chinese Computer Systems"},{"key":"10.3233\/IDA-205340_ref14","first-page":"1","article-title":"A global-relationship dissimilarity measure for the k-modes clustering algorithm","volume":"2017","author":"Zhou","year":"2017","journal-title":"Computational Intelligence and Neuroscience"},{"key":"10.3233\/IDA-205340_ref15","doi-asserted-by":"crossref","first-page":"4593","DOI":"10.1109\/TNNLS.2017.2770167","article-title":"An algorithm for clustering categorical data with set-valued 
features","volume":"29","author":"Cao","year":"2018","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"10.3233\/IDA-205340_ref16","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.amc.2016.09.023","article-title":"A fuzzy SV-k-modes algorithm for clustering categorical data with set-valued attributes","volume":"295","author":"Cao","year":"2017","journal-title":"Applied Mathematics and Computation"},{"key":"10.3233\/IDA-205340_ref17","first-page":"1325","article-title":"A MD fuzzy k-modes algorithm for clustering categorical matrix-object data","volume":"56","author":"Li","year":"2019","journal-title":"Journal of Computer Research and Development"},{"key":"10.3233\/IDA-205340_ref18","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1016\/j.knosys.2018.04.029","article-title":"CE3: A three-way clustering method based on mathematical morphology","volume":"155","author":"Wang","year":"2018","journal-title":"Knowledge-Based Systems"},{"key":"10.3233\/IDA-205340_ref19","doi-asserted-by":"crossref","first-page":"1568","DOI":"10.1016\/j.asoc.2019.105536","article-title":"A three-way c-means algorithm","volume":"82","author":"Zhang","year":"2019","journal-title":"Applied Soft Computing Journal"},{"key":"10.3233\/IDA-205340_ref20","first-page":"67","article-title":"Set pair analysis and its preliminary application","volume":"1","author":"Zhao","year":"1994","journal-title":"Exploration of Nature"},{"key":"10.3233\/IDA-205340_ref21","first-page":"81","article-title":"The fundamental operation of arithmetic on connection number a+b\u2062i+c\u2062j and its application","volume":"17","author":"Huang","year":"2000","journal-title":"Mechanical & Electrical Engineering Magazine"},{"key":"10.3233\/IDA-205340_ref22","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1007\/978-3-642-35380-2_38","article-title":"Rough set based fuzzy k-modes for categorical 
data","volume":"7677","author":"Saha","year":"2012","journal-title":"Swarm, Evolutionary, and Memetic Computing"}],"container-title":["Intelligent Data Analysis"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/IDA-205340","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,10]],"date-time":"2025-03-10T12:52:36Z","timestamp":1741611156000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/IDA-205340"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,29]]},"references-count":22,"journal-issue":{"issue":"6"},"URL":"https:\/\/doi.org\/10.3233\/ida-205340","relation":{},"ISSN":["1088-467X","1571-4128"],"issn-type":[{"type":"print","value":"1088-467X"},{"type":"electronic","value":"1571-4128"}],"subject":[],"published":{"date-parts":[[2021,10,29]]}}}