{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,14]],"date-time":"2025-10-14T00:49:05Z","timestamp":1760402945458,"version":"build-2065373602"},"reference-count":36,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2022,1,2]],"date-time":"2022-01-02T00:00:00Z","timestamp":1641081600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China (NSFC)","award":["61572194."],"award-info":[{"award-number":["61572194."]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>Density clustering has been widely used in many research disciplines to determine the structure of real-world datasets. Existing density clustering algorithms only work well on complete datasets. In real-world datasets, however, there may be missing feature values due to technical limitations. Many imputation methods used for density clustering cause the aggregation phenomenon. To solve this problem, a two-stage novel density peak clustering approach with missing features is proposed: First, the density peak clustering algorithm is used for the data with complete features, while the labeled core points that can represent the whole data distribution are used to train the classifier. Second, we calculate a symmetrical FWPD distance matrix for incomplete data points, then the incomplete data are imputed by the symmetrical FWPD distance matrix and classified by the classifier. 
The experimental results show that the proposed approach performs well on both synthetic datasets and real datasets.<\/jats:p>","DOI":"10.3390\/sym14010060","type":"journal-article","created":{"date-parts":[[2022,1,9]],"date-time":"2022-01-09T23:35:09Z","timestamp":1641771309000},"page":"60","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Clustering with Missing Features: A Density-Based Approach"],"prefix":"10.3390","volume":"14","author":[{"given":"Kun","family":"Gao","sequence":"first","affiliation":[{"name":"School of Software Engineering, East China Normal University, Shanghai 200062, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3311-363X","authenticated-orcid":false,"given":"Hassan Ali","family":"Khan","sequence":"additional","affiliation":[{"name":"School of Software Engineering, East China Normal University, Shanghai 200062, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wenwen","family":"Qu","sequence":"additional","affiliation":[{"name":"School of Software Engineering, East China Normal University, Shanghai 200062, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,1,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Gan, G., Ma, C., and Wu, J. (2007). Data Clustering: Theory, Algorithms, and Applications, Society for Industrial and Applied Mathematics, American Statistical Association.","DOI":"10.1137\/1.9780898718348"},{"key":"ref_2","unstructured":"Ankerst, M., Breunig, M., Kriegel, H.P., Ng, R., and Sander, J. (2008, January 9\u201312). Ordering points to identify the clustering structure. Proceedings of the ACM International Conference on Management of Data SIGMOD, Vancouver, BC, Canada."},{"key":"ref_3","unstructured":"Han, J., Kamber, M., and Pei, J. (2011). 
Data Mining: Concepts and Techniques, Morgan Kaufmann. [3rd ed.]."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"651","DOI":"10.1016\/j.patrec.2009.09.011","article-title":"Data clustering: 50 years beyond k-means","volume":"31","author":"Jain","year":"2010","journal-title":"Pattern Recognit. Lett."},{"key":"ref_5","first-page":"100","article-title":"Algorithm AS 136: A k-means clustering algorithm","volume":"28","author":"Hartigan","year":"1979","journal-title":"J. R. Stat. Soc. Ser. C Appl. Stat."},{"key":"ref_6","first-page":"1","article-title":"Fuzzy c-means algorithm\u2014A review","volume":"2","author":"Suganya","year":"2012","journal-title":"Int. J. Sci. Res. Publ."},{"key":"ref_7","unstructured":"Wang, K., Zhang, J., Li, D., Zhang, X., and Guo, T. (2008). Adaptive affinity propagation clustering. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"917","DOI":"10.1093\/bioinformatics\/bth007","article-title":"Gaussian mixture clustering and imputation of microarray data","volume":"20","author":"Ouyang","year":"2004","journal-title":"Bioinformatics"},{"key":"ref_9","unstructured":"Ester, M., Kriegel, H.P., Sander, J., and Xu, X. (1996, January 2\u20134). A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of KDD, Portland, OR, USA."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"183","DOI":"10.26599\/BDMA.2021.9020001","article-title":"Effective density-based clustering algorithms for incomplete data","volume":"4","author":"Xue","year":"2021","journal-title":"Big Data Min. 
Anal."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1492","DOI":"10.1126\/science.1242072","article-title":"Clustering by fast search and find of density peaks","volume":"344","author":"Rodriguez","year":"2014","journal-title":"Science"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1016\/j.knosys.2016.02.001","article-title":"Study on density peaks clustering based on k-nearest neighbors and principal component analysis","volume":"99","author":"Du","year":"2016","journal-title":"Knowl.-Based Syst."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1016\/j.knosys.2017.07.010","article-title":"Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy","volume":"133","author":"Yaohui","year":"2017","journal-title":"Knowl.-Based Syst."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"702","DOI":"10.1016\/j.physa.2019.03.012","article-title":"A novel density peaks clustering algorithm based on k nearest neighbors for improving assignment process","volume":"523","author":"Jiang","year":"2019","journal-title":"Phys. A Stat. Mech. Its Appl."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Cao, L., Liu, Y., Wang, D., Wang, T., and Fu, C. (2020). A novel density peak fuzzy clustering algorithm for moving vehicles using traffic radar. Electronics, 9.","DOI":"10.3390\/electronics9010046"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"104824","DOI":"10.1016\/j.knosys.2019.06.032","article-title":"Fast density peak clustering for large scale data based on kNN","volume":"187","author":"Chen","year":"2020","journal-title":"Knowl.-Based Syst."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Lin, J.L., Kuo, J.C., and Chuang, H.W. (2020). Improving Density Peak Clustering by Automatic Peak Selection and Single Linkage Clustering. 
Symmetry, 12.","DOI":"10.3390\/sym12071168"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Shi, Z., Ma, D., Yan, X., Zhu, W., and Zhao, Z. (2021). A Density-Peak-Based Clustering Method for Multiple Densities Dataset. ISPRS Int. J. Geo-Inf., 10.","DOI":"10.3390\/ijgi10090589"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"2419","DOI":"10.1007\/s10115-019-01427-1","article-title":"Missing data imputation using decision trees and fuzzy clustering with iterative learning","volume":"62","author":"Nikfalazar","year":"2020","journal-title":"Knowl. Inform. Syst."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"216969","DOI":"10.1109\/ACCESS.2020.3042119","article-title":"CBRG: A Novel Algorithm for Handling Missing Data Using Bayesian Ridge Regression and Feature Selection Based on Gain Ratio","volume":"8","author":"Mostafa","year":"2020","journal-title":"IEEE Access"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"70316","DOI":"10.1109\/ACCESS.2020.2983319","article-title":"Credal Transfer Learning With Multi-Estimation for Missing Data","volume":"8","author":"Ma","year":"2020","journal-title":"IEEE Access"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1504\/IJBDM.2020.106883","article-title":"Missing data imputation by the aid of features similarities","volume":"1","author":"Mostafa","year":"2020","journal-title":"Int. J. Big Data Manag."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"418","DOI":"10.1016\/j.ins.2021.04.076","article-title":"Clustering mixed numerical and categorical data with missing values","volume":"571","author":"Dinh","year":"2021","journal-title":"Inform. Sci."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1087","DOI":"10.1016\/j.jclinepi.2006.01.014","article-title":"A gentle introduction to imputation of missing values","volume":"59","author":"Donders","year":"2006","journal-title":"J. Clin. 
Epidemiol."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"617","DOI":"10.1109\/TSMC.1979.4310090","article-title":"Pattern recognition with partly missing data","volume":"9","author":"Dixon","year":"1979","journal-title":"IEEE Trans. Syst. Man Cybernet."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. Ser. B Methodol."},{"key":"ref_27","first-page":"1","article-title":"Gaussian Mixture Model Clustering with Incomplete Data","volume":"17","author":"Zhang","year":"2021","journal-title":"ACM Trans. Multimedia Comput. Commun. Appl. TOMM"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"69162","DOI":"10.1109\/ACCESS.2019.2910287","article-title":"K-means clustering with incomplete data","volume":"7","author":"Wang","year":"2019","journal-title":"IEEE Access"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"735","DOI":"10.1109\/3477.956035","article-title":"Fuzzy c-means clustering of incomplete data","volume":"31","author":"Hathaway","year":"2001","journal-title":"IEEE Trans. Syst. Man Cybernet. Part B Cybernet."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"1987","DOI":"10.1007\/s10994-018-5722-4","article-title":"Clustering with missing features: A penalized dissimilarity measure based approach","volume":"107","author":"Datta","year":"2018","journal-title":"Mach. Learn."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1023\/A:1007465528199","article-title":"Bayesian network classifiers","volume":"29","author":"Friedman","year":"1997","journal-title":"Mach. Learn."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Bishop, C.M. (1995). 
Neural Networks for Pattern Recognition, Oxford University Press.","DOI":"10.1093\/oso\/9780198538493.001.0001"},{"key":"ref_33","unstructured":"Jakkula, V. (2006). Tutorial on Support Vector Machine (SVM), School of EECS, Washington State University."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-8-3","article-title":"FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data","volume":"8","author":"Fu","year":"2007","journal-title":"BMC Bioinform."},{"key":"ref_35","first-page":"583","article-title":"Cluster Ensembles\u2014A Knowledge Reuse Framework for Combining Multiple Partitions","volume":"3","author":"Strehl","year":"2002","journal-title":"J. Mach. Learn. Res."},{"key":"ref_36","first-page":"580","article-title":"A Method for Comparing Two Hierarchical Clusterings: Comment","volume":"78","author":"Jolliffe","year":"1983","journal-title":"J. Am. Stat. Assoc."}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/14\/1\/60\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T14:12:32Z","timestamp":1760364752000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/14\/1\/60"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,2]]},"references-count":36,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,1]]}},"alternative-id":["sym14010060"],"URL":"https:\/\/doi.org\/10.3390\/sym14010060","relation":{},"ISSN":["2073-8994"],"issn-type":[{"type":"electronic","value":"2073-8994"}],"subject":[],"published":{"date-parts":[[2022,1,2]]}}}