{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,9]],"date-time":"2026-05-09T20:53:35Z","timestamp":1778360015767,"version":"3.51.4"},"reference-count":57,"publisher":"Institution of Engineering and Technology (IET)","issue":"1","license":[{"start":{"date-parts":[[2024,11,19]],"date-time":"2024-11-19T00:00:00Z","timestamp":1731974400000},"content-version":"vor","delay-in-days":323,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61906175"],"award-info":[{"award-number":["61906175"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100017700","name":"Henan Provincial Science and Technology Research Project","doi-asserted-by":"publisher","award":["242102210033"],"award-info":[{"award-number":["242102210033"]}],"id":[{"id":"10.13039\/501100017700","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100017700","name":"Henan Provincial Science and Technology Research Project","doi-asserted-by":"publisher","award":["242102211050"],"award-info":[{"award-number":["242102211050"]}],"id":[{"id":"10.13039\/501100017700","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["ietresearch.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["IET Software"],"published-print":{"date-parts":[[2024,1]]},"abstract":"<jats:p>The technique of software defect prediction aims to assess and predict potential defects in software projects and has made significant progress in recent years within software development. In previous studies, this technique largely relied on supervised learning methods, requiring a substantial amount of labeled historical defect data to train the models. However, obtaining these labeled data often demands significant time and resources. 
In contrast, software defect prediction based on unsupervised learning does not depend on known labeled data, eliminating the need for large\u2010scale data labeling, thereby saving considerable time and resources while providing a more flexible solution for ensuring software quality. This paper conducts software defect prediction using unsupervised learning methods on data from 16 projects across two public datasets (PROMISE and NASA). During the feature selection step, a chi\u2010squared sparse feature selection method is proposed. This feature selection strategy combines chi\u2010squared tests with sparse principal component analysis (SPCA). Specifically, the chi\u2010squared test is first used to filter out the most statistically significant features, and then the SPCA is applied to reduce the dimensionality of these significant features. In the clustering step, the dot product matrix and Pearson correlation coefficient (PCC) matrix are used to construct weighted adjacency matrices, and a clustering overlap method is proposed. This method integrates spectral clustering, Newman clustering, fluid clustering, and Clauset\u2013Newman\u2013Moore (CNM) clustering through ensemble learning. 
Experimental results indicate that, in the absence of labeled data, using the chi\u2010squared sparse method for feature selection demonstrates superior performance, and the proposed clustering overlap method outperforms or is comparable to the effectiveness of the four baseline clustering methods.<\/jats:p>","DOI":"10.1049\/2024\/6294422","type":"journal-article","created":{"date-parts":[[2024,11,20]],"date-time":"2024-11-20T02:50:31Z","timestamp":1732071031000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Software Defect Prediction Method Based on Clustering Ensemble Learning"],"prefix":"10.1049","volume":"2024","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4722-5915","authenticated-orcid":false,"given":"Hongwei","family":"Tao","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-9245-2716","authenticated-orcid":false,"given":"Qiaoling","family":"Cao","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Haoran","family":"Chen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanting","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaoxu","family":"Niu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tao","family":"Wang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhenhao","family":"Geng","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Songtao","family":"Shang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"265","published-online":{"date-parts":[[2024,11,19]]},"reference":[{"key":"e_1_2_11_1_2","doi-asserted-by":"publis
her","DOI":"10.1002\/smr.2549"},{"key":"e_1_2_11_2_2","doi-asserted-by":"publisher","DOI":"10.2298\/CSIS141228061M"},{"key":"e_1_2_11_3_2","doi-asserted-by":"publisher","DOI":"10.1155\/2018\/9616938"},{"key":"e_1_2_11_4_2","doi-asserted-by":"publisher","DOI":"10.1007\/s13369-022-07337-9"},{"key":"e_1_2_11_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2022.109737"},{"key":"e_1_2_11_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-023-10341-8"},{"key":"e_1_2_11_7_2","doi-asserted-by":"publisher","DOI":"10.1049\/iet-sen.2017.0148"},{"key":"e_1_2_11_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3054730"},{"key":"e_1_2_11_9_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compgeo.2024.106175"},{"key":"e_1_2_11_10_2","doi-asserted-by":"publisher","DOI":"10.3390\/rs14092103"},{"key":"e_1_2_11_11_2","doi-asserted-by":"publisher","DOI":"10.34133\/2022\/9835014"},{"key":"e_1_2_11_12_2","doi-asserted-by":"publisher","DOI":"10.34133\/2022\/9780497"},{"key":"e_1_2_11_13_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-021-06015-5"},{"key":"e_1_2_11_14_2","first-page":"107","article-title":"An Adaptive Agent Decision Model Based on Deep Reinforcement Learning and Autonomous Learning","volume":"10","author":"Zhu C.","year":"2023","journal-title":"Journal of Logistics, Informatics and Service Science"},{"key":"e_1_2_11_15_2","doi-asserted-by":"crossref","unstructured":"NamJ.andClamiK. S. 
Defect Prediction on Unlabeled Datasets 30th IEEE\/ACM International Conference on Automated Software Engineering (ASE) 2015 IEEE 452\u2013463.","DOI":"10.1109\/ASE.2015.56"},{"key":"e_1_2_11_16_2","doi-asserted-by":"publisher","DOI":"10.1155\/2022\/5024399"},{"key":"e_1_2_11_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-021-04113-8"},{"key":"e_1_2_11_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2022.3150153"},{"key":"e_1_2_11_19_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2020.106287"},{"key":"e_1_2_11_20_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2020.110862"},{"key":"e_1_2_11_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2023.123041"},{"key":"e_1_2_11_22_2","doi-asserted-by":"crossref","unstructured":"Zhang F., Zheng Q., and Zou Y. et al. Cross-Project Defect Prediction Using a Connectivity-Based Unsupervised Classifier Proceedings of the 38th International Conference on Software Engineering 2016 ACM 309\u2013320.","DOI":"10.1145\/2884781.2884839"},{"key":"e_1_2_11_23_2","doi-asserted-by":"crossref","unstructured":"Ha D. A., Chen T. H., and Yuan S. M. Unsupervised Methods for Software Defect Prediction Proceedings of the 10th International Symposium on Information and Communication Technology 2019 ACM 49\u201355.","DOI":"10.1145\/3368926.3369711"},{"key":"e_1_2_11_24_2","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0020217"},{"key":"e_1_2_11_25_2","doi-asserted-by":"crossref","unstructured":"ACichockiR. Z. Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation 2009.","DOI":"10.1002\/9780470747278"},{"key":"e_1_2_11_26_2","doi-asserted-by":"crossref","unstructured":"Yang Y., Zhou Y., and Liu J. 
et al. Effort-Aware Just-In-Time Defect Prediction: Simple Unsupervised Models Could Be Better Than Supervised Models Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering 2016 ACM 157\u2013168.","DOI":"10.1145\/2950290.2950353"},{"key":"e_1_2_11_27_2","doi-asserted-by":"crossref","unstructured":"Yan M., Fang Y., and Lo D. et al. File-Level Defect Prediction: Unsupervised vs. Supervised Models 2017 ACM\/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) 2017 ACM 344\u2013353.","DOI":"10.1109\/ESEM.2017.48"},{"key":"e_1_2_11_28_2","doi-asserted-by":"crossref","unstructured":"Fu W. and Menzies T. Revisiting Unsupervised Learning for Defect Prediction Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering 2017 ACM 72\u201383.","DOI":"10.1145\/3106237.3106257"},{"key":"e_1_2_11_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3183339"},{"key":"e_1_2_11_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2020.3001739"},{"key":"e_1_2_11_31_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2018.08.003"},{"key":"e_1_2_11_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3007291"},{"key":"e_1_2_11_33_2","doi-asserted-by":"publisher","DOI":"10.1111\/exsy.12553"},{"key":"e_1_2_11_34_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2022.118157"},{"key":"e_1_2_11_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2863752"},{"key":"e_1_2_11_36_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2023.110100"},{"key":"e_1_2_11_37_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00607-022-01100-6"},{"key":"e_1_2_11_38_2","unstructured":"Kolonin A. and Ramesh V. 
Unsupervised Tokenization Learning 2022 arXiv preprint arXiv:2205.11443."},{"key":"e_1_2_11_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/34.868688"},{"key":"e_1_2_11_40_2","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.74.036104"},{"key":"e_1_2_11_41_2","unstructured":"Par\u00e9s F., Garcia-Gasulla D., and Vilalta A. et al. Fluid Communities: A Community Detection Algorithm 2017 arXiv preprint arXiv:1703.09307."},{"key":"e_1_2_11_42_2","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.70.066111"},{"key":"e_1_2_11_43_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2010.08.026"},{"key":"e_1_2_11_44_2","doi-asserted-by":"publisher","DOI":"10.1186\/s13104-019-4837-4"},{"key":"e_1_2_11_45_2","doi-asserted-by":"publisher","DOI":"10.1080\/09720502.2016.1259769"},{"key":"e_1_2_11_46_2","doi-asserted-by":"crossref","unstructured":"Zhai Y., Song W., and Liu X. et al. A Chi-Square Statistics Based Feature Selection Method in Text Classification 2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS) 2018 IEEE 160\u2013163.","DOI":"10.1109\/ICSESS.2018.8663882"},{"key":"e_1_2_11_47_2","doi-asserted-by":"publisher","DOI":"10.1198\/106186006X113430"},{"key":"e_1_2_11_48_2","doi-asserted-by":"publisher","DOI":"10.1201\/b18401"},{"key":"e_1_2_11_49_2","doi-asserted-by":"publisher","DOI":"10.1007\/s40840-023-01629-5"},{"key":"e_1_2_11_50_2","unstructured":"Kiran Y. and Riedel M. 
A Scalable Approach to Performing Multiplication and Matrix Dot-Products in Unary 2023 arXiv preprint arXiv:2307.03204."},{"key":"e_1_2_11_51_2","first-page":"1","article-title":"Inferential Procedures Based on the Weighted Pearson Correlation Coefficient Test Statistic","volume":"51","author":"Yu H.","year":"2022","journal-title":"Journal of Applied Statistics"},{"key":"e_1_2_11_52_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2017.06.085"},{"key":"e_1_2_11_53_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-021-02447-7"},{"key":"e_1_2_11_54_2","doi-asserted-by":"crossref","unstructured":"Hassan A. E. Predicting Faults Using the Complexity of Code Changes 2009 IEEE 31st International Conference on Software Engineering 2009 IEEE 78\u201388.","DOI":"10.1109\/ICSE.2009.5070510"},{"key":"e_1_2_11_55_2","doi-asserted-by":"crossref","unstructured":"Jureczko M. and Madeyski L. Towards Identifying Software Project Clusters With Regard to Defect Prediction Proceedings of the 6th International Conference on Predictive Models in Software Engineering 2010 IEEE 1\u201310.","DOI":"10.1145\/1868328.1868342"},{"key":"e_1_2_11_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2013.11"},{"key":"e_1_2_11_57_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.dam.2023.08.026"}],"container-title":["IET 
Software"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/pdf\/10.1049\/2024\/6294422","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,5]],"date-time":"2025-11-05T09:08:43Z","timestamp":1762333723000},"score":1,"resource":{"primary":{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/10.1049\/2024\/6294422"}},"subtitle":[],"editor":[{"given":"Antonio","family":"Galli","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2024,1]]},"references-count":57,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,1]]}},"alternative-id":["10.1049\/2024\/6294422"],"URL":"https:\/\/doi.org\/10.1049\/2024\/6294422","archive":["Portico"],"relation":{},"ISSN":["1751-8806","1751-8814"],"issn-type":[{"value":"1751-8806","type":"print"},{"value":"1751-8814","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1]]},"assertion":[{"value":"2024-05-04","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-09-30","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-11-19","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"6294422"}}