{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,29]],"date-time":"2026-01-29T18:16:46Z","timestamp":1769710606601,"version":"3.49.0"},"reference-count":22,"publisher":"SAGE Publications","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IFS"],"published-print":{"date-parts":[[2023,5,4]]},"abstract":"<jats:p>Many traditional clustering algorithms cannot process mixed-type datasets in parallel, which limits their application to big data. In this paper, we propose MR-CF, a MapReduce-based CF tree clustering algorithm for mixed-type datasets. MR-CF consists of two primary phases: a mapper phase and a reducer phase. In the mapper phase, the original CF tree algorithm is modified to collect intermediate CF entries; in the reducer phase, k-prototypes is extended to cluster these CF entries. To avoid the high costs of I\/O overhead and data serialization, MR-CF loads the dataset from HDFS only once. We first analyze the time, space, and I\/O complexity of MR-CF. We then compare it with sklearn BIRCH, Apache Mahout k-means, k-prototypes, and mrk-prototypes on several real-world and synthetic datasets. 
Experiments on two mixed-type big datasets show that MR-CF reduces execution time by 45.4% and 61.3% relative to k-prototypes, and by 73.8% and 55.0% relative to mrk-prototypes.<\/jats:p>","DOI":"10.3233\/jifs-224234","type":"journal-article","created":{"date-parts":[[2023,2,28]],"date-time":"2023-02-28T11:25:51Z","timestamp":1677583551000},"page":"8309-8320","source":"Crossref","is-referenced-by-count":0,"title":["A parallel CF tree clustering algorithm for mixed-type datasets"],"prefix":"10.1177","volume":"44","author":[{"given":"Yufeng","family":"Li","sequence":"first","affiliation":[{"name":"College of Artificial Intelligence, Tianjin University of Science & Technology, Tianjin, China"}]},{"given":"Keyi","family":"Xu","sequence":"additional","affiliation":[{"name":"College of Artificial Intelligence, Tianjin University of Science & Technology, Tianjin, China"}]},{"given":"Yumei","family":"Ding","sequence":"additional","affiliation":[{"name":"College of Sciences, Tianjin University of Science & Technology, Tianjin, China"}]},{"given":"Zhiwei","family":"Sun","sequence":"additional","affiliation":[{"name":"College of Artificial Intelligence, Tianjin University of Science & Technology, Tianjin, China"}]},{"given":"Ting","family":"Ke","sequence":"additional","affiliation":[{"name":"College of Artificial Intelligence, Tianjin University of Science & Technology, Tianjin, China"}]}],"member":"179","reference":[{"key":"10.3233\/JIFS-224234_ref8","doi-asserted-by":"crossref","first-page":"2963","DOI":"10.3233\/JIFS-211633","article-title":"Shadow detection of soil image based on density peak clustering and histogram fitting","volume":"43","author":"Zeng","year":"2022","journal-title":"Journal of Intelligent & Fuzzy Systems"},{"key":"10.3233\/JIFS-224234_ref10","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1016\/j.knosys.2019.04.020","article-title":"Hierarchical prediction based on two-level Gaussian mixture model clustering 
for bike-sharing system","volume":"175","author":"Jia","year":"2019","journal-title":"Knowledge-Based Systems"},{"key":"10.3233\/JIFS-224234_ref11","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1007\/BF00337288","article-title":"Self-organized formation of topologically correct feature maps","volume":"43","author":"Kohonen","year":"1982","journal-title":"Biological Cybernetics"},{"key":"10.3233\/JIFS-224234_ref12","doi-asserted-by":"crossref","first-page":"1270","DOI":"10.1109\/TII.2016.2547584","article-title":"A big data clustering algorithm for mitigating the risk of customer churn","volume":"12","author":"Bi","year":"2016","journal-title":"IEEE Transactions on Industrial Informatics"},{"key":"10.3233\/JIFS-224234_ref13","doi-asserted-by":"crossref","first-page":"1241","DOI":"10.1109\/TCBB.2018.2886006","article-title":"Intra-cluster distance minimization in DNA methylation analysis using an advanced tabu-based iterative k-medoids clustering algorithm","volume":"17","author":"Damgacioglu","year":"2020","journal-title":"IEEE\/ACM Transactions on Computational Biology & Bioinformatics"},{"key":"10.3233\/JIFS-224234_ref14","doi-asserted-by":"crossref","first-page":"887","DOI":"10.1109\/TCBB.2014.2359433","article-title":"Adaptive fuzzy consensus clustering framework for clustering analysis of cancer data","volume":"12","author":"Yu","year":"2015","journal-title":"IEEE\/ACM Transactions on Computational Biology & Bioinformatics"},{"key":"10.3233\/JIFS-224234_ref15","doi-asserted-by":"crossref","first-page":"2579","DOI":"10.3233\/JIFS-212709","article-title":"Tumor segmentation from brain MR images using STSA based modified K-means clustering approach","volume":"43","author":"Lather","year":"2022","journal-title":"Journal of Intelligent & Fuzzy Systems"},{"key":"10.3233\/JIFS-224234_ref16","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.knosys.2020.106549","article-title":"Feature-reduction fuzzy co-clustering approach for hyper-spectral image 
analysis","volume":"216","author":"Pham","year":"2021","journal-title":"Knowledge-Based Systems"},{"key":"10.3233\/JIFS-224234_ref17","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1109\/TIFS.2012.2223679","article-title":"Document clustering for forensic analysis: An approach for improving computer inspection","volume":"8","author":"Nassif","year":"2013","journal-title":"IEEE Transactions on Information Forensics and Security"},{"key":"10.3233\/JIFS-224234_ref18","first-page":"916","article-title":"Clustering documents in evolving languages by image texture analysis","volume":"46","author":"Brodic","year":"2017","journal-title":"Applied Intelligence"},{"key":"10.3233\/JIFS-224234_ref21","first-page":"1","article-title":"A New Adaptive Mixture Distance-Based Improved Density Peaks Clustering for Gearbox Fault Diagnosis","volume":"71","author":"Kumar","year":"2022","journal-title":"IEEE Transactions on Instrumentation and Measurement"},{"key":"10.3233\/JIFS-224234_ref22","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1016\/j.ins.2022.11.139","article-title":"K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data","volume":"622","author":"Abiodun","year":"2023","journal-title":"Information Sciences"},{"key":"10.3233\/JIFS-224234_ref25","first-page":"1345","article-title":"Divide and conquer k-means clustering method based on MapReduce","volume":"41","author":"Zang","year":"2020","journal-title":"Computer Engineering and Design"},{"key":"10.3233\/JIFS-224234_ref26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.is.2016.02.007","article-title":"Single-pass and linear-time k-means clustering based on MapReduce","volume":"60","author":"Shahrivari","year":"2016","journal-title":"Information Systems"},{"key":"10.3233\/JIFS-224234_ref30","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1016\/j.bdr.2018.05.002","article-title":"A novel clustering method using enhanced grey wolf optimizer and 
MapReduce","volume":"14","author":"Tripathi","year":"2018","journal-title":"Big Data Research"},{"key":"10.3233\/JIFS-224234_ref31","first-page":"771","article-title":"Parallel swarm intelligence strategies for large-scale clustering based on MapReduce with application to epigenetics of aging","volume":"69","author":"Benmounah","year":"2018","journal-title":"Applied Soft Computing"},{"key":"10.3233\/JIFS-224234_ref32","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1016\/j.eswa.2017.08.051","article-title":"Fwcmr: A scalable and robust fuzzy weighted clustering based on MapReduce with application to microarray gene expression","volume":"91","author":"Hosseini","year":"2018","journal-title":"Expert Systems with Applications"},{"key":"10.3233\/JIFS-224234_ref34","doi-asserted-by":"crossref","first-page":"1045","DOI":"10.1109\/TIT.2014.2375327","article-title":"Randomized dimensionality reduction for k-means clustering","volume":"61","author":"Boutsidis","year":"2014","journal-title":"IEEE Transactions on Information Theory"},{"key":"10.3233\/JIFS-224234_ref36","doi-asserted-by":"crossref","first-page":"619","DOI":"10.1007\/s10844-017-0472-5","article-title":"One-pass MapReduce-based clustering method for mixed large scale data","volume":"52","author":"HajKacem","year":"2019","journal-title":"Journal of Intelligent Information Systems"},{"key":"10.3233\/JIFS-224234_ref38","doi-asserted-by":"crossref","first-page":"109238","DOI":"10.1016\/j.patcog.2022.109238","article-title":"A Sampling-Based Density Peaks Clustering Algorithm for Large-Scale Data","volume":"136","author":"Ding","year":"2023","journal-title":"Pattern Recognition"},{"key":"10.3233\/JIFS-224234_ref42","first-page":"1694","article-title":"An effective clustering method over CF tree using multiple range queries","volume":"32","author":"Ryu","year":"2019","journal-title":"IEEE Transactions on Knowledge and Data 
Engineering"},{"key":"10.3233\/JIFS-224234_ref43","doi-asserted-by":"crossref","first-page":"5295","DOI":"10.3233\/JIFS-202079","article-title":"MR-BIRCH: A scalable MapReduce-based BIRCH clustering algorithm","volume":"40","author":"Li","year":"2021","journal-title":"Journal of Intelligent & Fuzzy Systems"}],"container-title":["Journal of Intelligent & Fuzzy Systems"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/JIFS-224234","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,29]],"date-time":"2026-01-29T06:54:09Z","timestamp":1769669649000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/JIFS-224234"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,4]]},"references-count":22,"journal-issue":{"issue":"5"},"URL":"https:\/\/doi.org\/10.3233\/jifs-224234","relation":{},"ISSN":["1064-1246","1875-8967"],"issn-type":[{"value":"1064-1246","type":"print"},{"value":"1875-8967","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,4]]}}}