{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T10:14:02Z","timestamp":1777889642814,"version":"3.51.4"},"reference-count":54,"publisher":"SAGE Publications","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["KES"],"published-print":{"date-parts":[[2021,7,26]]},"abstract":"<jats:p>BACKGROUND: Fault data is vital to predicting the fault-proneness in large systems. Predicting faulty classes helps in allocating the appropriate testing resources for future releases. However, current fault data face challenges such as unlabeled instances and data imbalance. These challenges degrade the performance of the prediction models. Data imbalance happens because the majority of classes are labeled as not faulty whereas the minority of classes are labeled as faulty. AIM: The research proposes to improve fault prediction using software metrics in combination with threshold values. Statistical techniques are proposed to improve the quality of the datasets and therefore the quality of the fault prediction. METHOD: Threshold values of object-oriented metrics are used to label classes as faulty to improve the fault prediction models. The resulting datasets are used to build prediction models using five machine learning techniques. The use of threshold values is validated on ten large object-oriented systems. RESULTS: The models are built for the datasets with and without the use of thresholds. The combination of thresholds with machine learning has improved the fault prediction models significantly for the five classifiers. 
CONCLUSION: Threshold values can be used to label software classes as fault-prone and can be used to improve machine learners in predicting the fault-prone classes.<\/jats:p>","DOI":"10.3233\/kes-210061","type":"journal-article","created":{"date-parts":[[2021,7,27]],"date-time":"2021-07-27T13:18:16Z","timestamp":1627391896000},"page":"159-172","source":"Crossref","is-referenced-by-count":2,"title":["Software fault prediction using machine learning techniques with metric thresholds"],"prefix":"10.1177","volume":"25","author":[{"given":"Raed","family":"Shatnawi","sequence":"first","affiliation":[]}],"member":"179","reference":[{"key":"10.3233\/KES-210061_ref1","doi-asserted-by":"crossref","unstructured":"A. Folleco, T.M. Khoshgoftaar, J. Van Hulse and L. Bullard, Software quality modeling: The impact of class noise on the random forest classifier, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), Hong Kong, 2008, pp. 3853\u20133859.","DOI":"10.1109\/CEC.2008.4631321"},{"key":"10.3233\/KES-210061_ref2","doi-asserted-by":"crossref","unstructured":"B. Ghotra, S. McIntosh and A.E. Hassan, Revisiting the Impact of Classification Techniques on the Performance of Defect Prediction Models, 2015 IEEE\/ACM 37th IEEE International Conference on Software Engineering, Florence, 2015, pp. 
789\u2013800.","DOI":"10.1109\/ICSE.2015.91"},{"issue":"1","key":"10.3233\/KES-210061_ref3","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1147\/sj.411.0004","article-title":"Software debugging, testing, and verification","volume":"41","author":"Hailpern","year":"2002","journal-title":"IBM Systems Journal"},{"key":"10.3233\/KES-210061_ref4","doi-asserted-by":"crossref","first-page":"561","DOI":"10.1007\/s10664-008-9079-3","article-title":"Techniques for evaluating fault prediction models","volume":"13","author":"Jiang","year":"2008","journal-title":"Empir Softw Eng"},{"issue":"2","key":"10.3233\/KES-210061_ref5","doi-asserted-by":"crossref","first-page":"442","DOI":"10.1016\/0005-2795(75)90109-9","article-title":"Comparison of the predicted and observed secondary structure of T4 phage lysozyme","volume":"405","author":"Matthews","year":"1975","journal-title":"Biochimica et Biophysica Acta (BBA) \u2013 Protein Structure"},{"issue":"21","key":"10.3233\/KES-210061_ref6","doi-asserted-by":"crossref","first-page":"4867","DOI":"10.1016\/j.ins.2011.06.017","article-title":"Class noise detection based on software metrics and ROC curves","volume":"181","author":"Catal","year":"2011","journal-title":"Information Sciences"},{"key":"10.3233\/KES-210061_ref7","unstructured":"C. Seiffert, T.M. Khoshgoftaar, J. Van Hulse and A. Napolitano, Building useful models from imbalanced data with sampling and boosting, Proceedings of the Twenty-First International FLAIRS Conference, 2008, pp. 206\u2013311."},{"issue":"1","key":"10.3233\/KES-210061_ref8","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1007\/BF00153759","article-title":"Instance-based learning algorithms","volume":"6","author":"Aha","year":"1991","journal-title":"Machine Learning"},{"key":"10.3233\/KES-210061_ref9","doi-asserted-by":"crossref","unstructured":"D. Gray, D. Bowes, N. Davey, Y. Sun and B. 
Christianson, The misuse of the NASA metrics data program data sets for automated software defect prediction, in Evaluation and Assessment in Software Engineering (EASE), 2011.","DOI":"10.1049\/ic.2011.0012"},{"issue":"1","key":"10.3233\/KES-210061_ref10","first-page":"37","article-title":"Evaluation: From precision, recall and f-measure to roc, informedness, markedness & correlation","volume":"2","author":"Powers","year":"2011","journal-title":"Journal of Machine Learning Technologies"},{"key":"10.3233\/KES-210061_ref11","doi-asserted-by":"crossref","unstructured":"E. Tempero et al., The Qualitas Corpus: A Curated Collection of Java Code for Empirical Studies, 2010 Asia Pacific Software Engineering Conference, Sydney, NSW, 2010, pp. 336\u2013345.","DOI":"10.1109\/APSEC.2010.46"},{"issue":"5","key":"10.3233\/KES-210061_ref12","doi-asserted-by":"crossref","first-page":"2107","DOI":"10.1007\/s10664-015-9396-2","article-title":"Towards building a universal defect prediction model with rank transformed predictors","volume":"2","author":"Zhang","year":"2016","journal-title":"Empir Software Eng"},{"key":"10.3233\/KES-210061_ref13","unstructured":"G. John and P. Langley, Estimating continuous distributions in Bayesian classifiers, in: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, in: P. Besnard and S. Hanks, eds, San Francisco, CA: Morgan Kaufmann, 1995, pp. 
338\u2013345."},{"issue":"1","key":"10.3233\/KES-210061_ref14","first-page":"1","article-title":"On the proposal and evaluation of a benchmark-based threshold derivation method","volume":"27","author":"Vale","year":"2018","journal-title":"Software Quality Journal"},{"issue":"9","key":"10.3233\/KES-210061_ref15","first-page":"1264","article-title":"Learning from imbalanced data","volume":"21","author":"He","year":"2009","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"issue":"5","key":"10.3233\/KES-210061_ref16","doi-asserted-by":"crossref","first-page":"1042","DOI":"10.1016\/j.jss.2011.12.006","article-title":"The impact of accounting for special methods in the measurement of object-oriented class cohesion on refactoring and fault prediction activities","volume":"85","author":"Al Dallal","year":"2012","journal-title":"The Journal of Systems and Software"},{"key":"10.3233\/KES-210061_ref17","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.ins.2013.06.002","article-title":"Tackling the problem of classification with noisy data using Multiple Classifier Systems: Analysis of the performance and robustness","volume":"247","author":"S\u00e1ez","year":"2013","journal-title":"Information Sciences"},{"issue":"12","key":"10.3233\/KES-210061_ref18","doi-asserted-by":"crossref","first-page":"1513","DOI":"10.1016\/j.datak.2009.08.005","article-title":"Knowledge discovery from imbalanced and noisy data","volume":"68","author":"Van Hulse","year":"2009","journal-title":"Data and Knowledge Engineering"},{"key":"10.3233\/KES-210061_ref19","unstructured":"J.R. 
Quinlan, C4.5: Programs for machine learning, Morgan Kaufmann Publishers, San Mateo, 1993."},{"issue":"2","key":"10.3233\/KES-210061_ref20","doi-asserted-by":"crossref","first-page":"244","DOI":"10.1016\/j.jss.2011.05.044","article-title":"Identifying thresholds for object-oriented software metrics","volume":"85","author":"Ferreira","year":"2012","journal-title":"The Journal of Systems and Software"},{"key":"10.3233\/KES-210061_ref21","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/s11219-011-9132-0","article-title":"Predicting high-risk program classes by selecting the right software measurements","volume":"20","author":"Gao","year":"2012","journal-title":"Software Quality Journal"},{"issue":"2","key":"10.3233\/KES-210061_ref22","doi-asserted-by":"crossref","first-page":"553","DOI":"10.1007\/s11219-017-9361-y","article-title":"An empirical study of crash-inducing commits in Mozilla Firefox","volume":"26","author":"An","year":"2018","journal-title":"Software Quality Journal"},{"issue":"1","key":"10.3233\/KES-210061_ref23","doi-asserted-by":"crossref","first-page":"751","DOI":"10.1109\/32.544352","article-title":"A validation of object-oriented design metrics as quality indicators","volume":"22","author":"Basili","year":"1996","journal-title":"IEEE Transactions on Software Engineering"},{"issue":"1","key":"10.3233\/KES-210061_ref24","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1023\/A:1009815306478","article-title":"Replicated case studies for investigating quality factors in object-oriented designs","volume":"6","author":"Briand","year":"2001","journal-title":"Empirical Software Engineering"},{"issue":"3","key":"10.3233\/KES-210061_ref25","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1016\/S0164-1212(99)00102-8","article-title":"Exploring the relationship between design measures and software quality in object-oriented systems","volume":"51","author":"Briand","year":"2000","journal-title":"J Syst 
Softw"},{"issue":"5","key":"10.3233\/KES-210061_ref26","first-page":"49","article-title":"Exploring the missing link: An empirical study of software fixes","volume":"24","author":"Hamill","year":"2014","journal-title":"Software Testing, Verification and Reliability"},{"key":"10.3233\/KES-210061_ref27","unstructured":"M. Jureczko and D. Spinellis, Using object-oriented design metrics to predict software defects, In: Proceedings of the 5th International Conference on Dependability of Computer Systems, 2010, pp. 69\u201381."},{"key":"10.3233\/KES-210061_ref28","doi-asserted-by":"crossref","unstructured":"M. Jureczko and L. Madeyski, Towards identifying software project clusters with regard to defect prediction, In Proceedings of the 6th International Conference on Predictive Models in Software Engineering, 2010, pp. 1\u201310.","DOI":"10.1145\/1868328.1868342"},{"key":"10.3233\/KES-210061_ref29","first-page":"179","article-title":"Addressing the curse of imbalanced training sets: one-sided selection","author":"Kubat","year":"1997","journal-title":"Proceedings of the Fourteenth International Conference on Machine Learning"},{"key":"10.3233\/KES-210061_ref30","unstructured":"M. Lorenz and J. Kidd, Object-oriented Software Metrics, Prentice-Hall: Englewood Cliffs NJ, 1994."},{"issue":"6","key":"10.3233\/KES-210061_ref31","doi-asserted-by":"crossref","first-page":"603","DOI":"10.1109\/TSE.2014.2322358","article-title":"Researcher Bias: The Use of Machine Learning in Software Defect Prediction","volume":"40","author":"Shepperd","year":"2014","journal-title":"IEEE Transactions on Software Engineering"},{"key":"10.3233\/KES-210061_ref32","unstructured":"N. 
Chawla, C4.5 and Imbalanced Data sets: Investigating the effect of sampling method, probabilistic estimate, and decision tree structure, Workshop on Learning from Imbalanced Datasets II, ICML, 2003, Washington DC."},{"issue":"3","key":"10.3233\/KES-210061_ref33","doi-asserted-by":"crossref","first-page":"494","DOI":"10.1037\/0033-2909.114.3.494","article-title":"Dominance statistics: Ordinal analyses to answer ordinal questions","volume":"114","author":"Cliff","year":"1993","journal-title":"Psychological Bulletin"},{"key":"10.3233\/KES-210061_ref34","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1016\/j.asoc.2015.04.045","article-title":"Software defect prediction using cost-sensitive neural network","volume":"33","author":"Arar","year":"2015","journal-title":"Applied Soft Computing"},{"key":"10.3233\/KES-210061_ref35","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1016\/j.infsof.2014.11.006","article-title":"An empirical study on software defect prediction with a simplified metric set","volume":"59","author":"He","year":"2015","journal-title":"Information and Software Technology"},{"key":"10.3233\/KES-210061_ref36","doi-asserted-by":"crossref","unstructured":"P. Oliveira, M.T. Valente and F.P. Lima, Extracting relative thresholds for source code metrics, Software Evolution Week \u2013 IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE), Antwerp, 2014, pp. 254\u2013263.","DOI":"10.1109\/CSMR-WCRE.2014.6747177"},{"key":"10.3233\/KES-210061_ref37","doi-asserted-by":"crossref","unstructured":"P. Singh and S. 
Verma, ACO based comprehensive model for software fault prediction, International Journal of Knowledge-based and Intelligent Engineering Systems 24(1) (2020), 63\u201371.","DOI":"10.3233\/KES-200029"},{"issue":"12","key":"10.3233\/KES-210061_ref38","doi-asserted-by":"crossref","first-page":"1253","DOI":"10.1109\/TSE.2018.2836442","article-title":"A comprehensive investigation of the role of imbalanced learning for software defect prediction","volume":"45","author":"Song","year":"2019","journal-title":"IEEE Transactions on Software Engineering"},{"key":"10.3233\/KES-210061_ref39","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1111\/exsy.12078","article-title":"Fault prediction considering threshold effects of object-oriented metrics","volume":"32","author":"Malhotra","year":"2015","journal-title":"Expert Systems"},{"issue":"11","key":"10.3233\/KES-210061_ref40","doi-asserted-by":"crossref","first-page":"1868","DOI":"10.1016\/j.jss.2007.12.794","article-title":"The effectiveness of software metrics in identifying error-prone classes in post-release software evolution process","volume":"81","author":"Shatnawi","year":"2008","journal-title":"Journal of Systems and Software"},{"key":"10.3233\/KES-210061_ref41","doi-asserted-by":"crossref","unstructured":"R. 
Shatnawi, Deriving metrics thresholds using log transformation, J Softw Evol and Proc 27 (2015), 95\u2013113.","DOI":"10.1002\/smr.1702"},{"issue":"2","key":"10.3233\/KES-210061_ref42","doi-asserted-by":"crossref","first-page":"216","DOI":"10.1109\/TSE.2010.9","article-title":"Quantitative investigation of the acceptable risk levels of object-oriented metrics in open-source systems","volume":"36","author":"Shatnawi","year":"2010","journal-title":"IEEE Trans Software Eng"},{"issue":"1","key":"10.3233\/KES-210061_ref43","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1002\/smr.404","article-title":"Finding software metrics threshold values using ROC curves","volume":"22","author":"Shatnawi","year":"2010","journal-title":"Journal of Software Maintenance & Evolution, Research & Practice"},{"key":"10.3233\/KES-210061_ref44","unstructured":"S. Benlarbi, K. El Emam, N. Goel and S. Rai, Thresholds for object-oriented measures, Proceedings 11th International Symposium on Software Reliability Engineering, ISSRE 2000, San Jose, CA, USA, 2000, pp. 
24\u201338."},{"issue":"6","key":"10.3233\/KES-210061_ref45","doi-asserted-by":"crossref","first-page":"476","DOI":"10.1109\/32.295895","article-title":"A metrics suite for object oriented design","volume":"20","author":"Chidamber","year":"1994","journal-title":"IEEE Trans Software Eng"},{"issue":"4","key":"10.3233\/KES-210061_ref46","doi-asserted-by":"crossref","first-page":"485","DOI":"10.1109\/TSE.2008.35","article-title":"Benchmarking classification models for software defect prediction: a proposed framework and novel findings","volume":"34","author":"Lessmann","year":"2008","journal-title":"IEEE Transactions on Software Engineering"},{"issue":"6","key":"10.3233\/KES-210061_ref47","doi-asserted-by":"crossref","first-page":"1959","DOI":"10.1007\/s00500-016-2456-8","article-title":"Multi-objective cross-version defect prediction","volume":"22","author":"Shukla","year":"2018","journal-title":"Soft Comput"},{"issue":"2","key":"10.3233\/KES-210061_ref48","doi-asserted-by":"crossref","first-page":"434","DOI":"10.1109\/TR.2013.2259203","article-title":"Using class imbalance learning for software defect prediction","volume":"62","author":"Wang","year":"2013","journal-title":"IEEE Transactions on Reliability"},{"key":"10.3233\/KES-210061_ref49","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1016\/j.eswa.2017.04.014","article-title":"Towards an ensemble based system for predicting the number of software faults","volume":"82","author":"Rathore","year":"2017","journal-title":"Expert Syst Appl"},{"key":"10.3233\/KES-210061_ref50","doi-asserted-by":"crossref","unstructured":"T. Alves, C. Ypma and J. Visser, Deriving metric thresholds from benchmark data, 2010 IEEE International Conference on Software Maintenance, 2010, pp. 1\u201310.","DOI":"10.1109\/ICSM.2010.5609747"},{"key":"10.3233\/KES-210061_ref51","doi-asserted-by":"crossref","unstructured":"T. Fukushima, Y. Kamei, S. McIntosh, K. Yamashita and N. 
Ubayashi, An empirical study of just-in-time defect prediction using cross-project models, In Proceedings of the 11th Working Conference on Mining Software Repositories, 2014, pp. 172\u2013181.","DOI":"10.1145\/2597073.2597075"},{"issue":"1","key":"10.3233\/KES-210061_ref52","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1109\/TSE.2007.256941","article-title":"Data mining static code attributes to learn defect predictors","volume":"33","author":"Menzies","year":"2007","journal-title":"IEEE Transactions on Software Engineering"},{"key":"10.3233\/KES-210061_ref53","doi-asserted-by":"crossref","unstructured":"W. Cohen, Fast effective rule induction, In: Twelfth International Conference on Machine Learning, 1995, pp. 115\u2013123.","DOI":"10.1016\/B978-1-55860-377-6.50023-2"},{"issue":"2","key":"10.3233\/KES-210061_ref54","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1016\/S0164-1212(98)10052-3","article-title":"Another metric suite for object oriented programming","volume":"44","author":"Li","year":"1998","journal-title":"The Journal of Systems and Software"}],"container-title":["International Journal of Knowledge-based and Intelligent Engineering Systems"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/KES-210061","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T05:13:43Z","timestamp":1777612423000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/KES-210061"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,26]]},"references-count":54,"journal-issue":{"issue":"2"},"URL":"https:\/\/doi.org\/10.3233\/kes-210061","relation":{},"ISSN":["1327-2314","1875-8827"],"issn-type":[{"value":"1327-2314","type":"print"},{"value":"1875-8827","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,7,26]]}}}