{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T02:32:09Z","timestamp":1760236329021,"version":"build-2065373602"},"reference-count":56,"publisher":"MDPI AG","issue":"22","license":[{"start":{"date-parts":[[2021,11,12]],"date-time":"2021-11-12T00:00:00Z","timestamp":1636675200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Key- Area Research and Development Program of Guangdong Province","award":["2020B0101650001"],"award-info":[{"award-number":["2020B0101650001"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62002123"],"award-info":[{"award-number":["62002123"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Guangdong Basic and Applied Basic Research Foundation","award":["2019A1515110212"],"award-info":[{"award-number":["2019A1515110212"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Ranking-oriented cross-project defect prediction (ROCPDP), which ranks software modules of a new target industrial project based on the predicted defect number or density, has been suggested in the literature. A major concern of ROCPDP is the distribution difference between the source project (aka. within-project) data and target project (aka. cross-project) data, which evidently degrades prediction performance. To investigate the impacts of training data selection methods on the performances of ROCPDP models, we examined the practical effects of nine training data selection methods, including a global filter, which does not filter out any cross-project data. Additionally, the prediction performances of ROCPDP models trained on the filtered cross-project data using the training data selection methods were compared with those of ranking-oriented within-project defect prediction (ROWPDP) models trained on sufficient and limited within-project data. Eleven available defect datasets from the industrial projects were considered and evaluated using two ranking performance measures, i.e., FPA and Norm(Popt). The results showed no statistically significant differences among these nine training data selection methods in terms of FPA and Norm(Popt). The performances of ROCPDP models trained on filtered cross-project data were not comparable with those of ROWPDP models trained on sufficient historical within-project data. However, ROCPDP models trained on filtered cross-project data achieved better performance values than ROWPDP models trained on limited historical within-project data. Therefore, we recommended that software quality teams exploit other project datasets to perform ROCPDP when there is no or limited within-project data.<\/jats:p>","DOI":"10.3390\/s21227535","type":"journal-article","created":{"date-parts":[[2021,11,14]],"date-time":"2021-11-14T20:51:53Z","timestamp":1636923113000},"page":"7535","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["An Empirical Study of Training Data Selection Methods for Ranking-Oriented Cross-Project Defect Prediction"],"prefix":"10.3390","volume":"21","author":[{"given":"Haoyu","family":"Luo","sequence":"first","affiliation":[{"name":"School of Computer Science, South China Normal University, Guangzhou 510631, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Heng","family":"Dai","sequence":"additional","affiliation":[{"name":"School of Mechanical and Electrical Engineering, Wuhan Qingchuan University, Wuhan 430204, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Weiqiang","family":"Peng","sequence":"additional","affiliation":[{"name":"School of Computer Science, Wuhan University, Wuhan 430072, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wenhua","family":"Hu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fuyang","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,11,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1355","DOI":"10.1109\/TR.2020.2996261","article-title":"WR-ELM: Weighted regularization extreme learning machine for imbalance learning in software fault prediction","volume":"69","author":"Bal","year":"2020","journal-title":"IEEE Trans. Reliab."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"106163","DOI":"10.1016\/j.asoc.2020.106163","article-title":"Collaborative filtering based recommendation of sampling methods for software defect prediction","volume":"90","author":"Sun","year":"2020","journal-title":"Appl. Soft Comput."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1016\/j.infsof.2018.10.004","article-title":"Software defect prediction based on kernel PCA and weighted extreme learning machine","volume":"106","author":"Xu","year":"2019","journal-title":"Inf. Softw. Technol."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1016\/j.jss.2017.08.025","article-title":"The bayesian network based program dependence graph and its application to fault localization","volume":"134","author":"Yu","year":"2017","journal-title":"J. Syst. Softw."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"106295","DOI":"10.1016\/j.infsof.2020.106295","article-title":"Search-based fault localisation: A systematic mapping study","volume":"123","author":"Freitas","year":"2020","journal-title":"Inf. Softw. Technol."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"106312","DOI":"10.1016\/j.infsof.2020.106312","article-title":"Multiple fault localization of software programs: A systematic literature review","volume":"124","author":"Zakari","year":"2020","journal-title":"Inf. Softw. Technol."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Szajna, A., Kostrzewski, M., Ciebiera, K., Stryjski, R., and Sciubba, E. (2021). Application of the deep CNN-based method in industrial system for wire marking identification. Energies, 14.","DOI":"10.3390\/en14123659"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"300","DOI":"10.1016\/j.ress.2014.09.019","article-title":"Method for evaluating an extended fault tree to analyse the dependability of complex systems: Application to a satellite-based railway system","volume":"133","author":"Nguyen","year":"2015","journal-title":"Reliab. Eng. Syst. Saf."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1016\/j.infsof.2019.04.013","article-title":"Source code properties of defective infrastructure as code scripts","volume":"112","author":"Rahman","year":"2019","journal-title":"Inf. Softw. Technol."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"106255","DOI":"10.1016\/j.infsof.2019.106255","article-title":"Web service design defects detection: A bi-level multi-objective approach","volume":"121","author":"Rebai","year":"2020","journal-title":"Inf. Softw. Technol."},{"key":"ref_11","first-page":"288","article-title":"Software defect prediction models for quality improvement: A literature study","volume":"9","author":"Rawat","year":"2012","journal-title":"Int. J. Comput. Sci. Issues"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"142","DOI":"10.1016\/j.infsof.2018.10.001","article-title":"Automatically identifying code features for software defect prediction: Using ast n-grams","volume":"106","author":"Shippey","year":"2019","journal-title":"Inf. Softw. Technol."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1016\/j.infsof.2019.06.003","article-title":"SimSAX: A measure of project similarity based on symbolic approximation method and software defect inflow","volume":"115","author":"Ochodek","year":"2019","journal-title":"Inf. Softw. Technol."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"106441","DOI":"10.1016\/j.infsof.2020.106441","article-title":"Revisiting heterogeneous defect prediction methods: How far are we?","volume":"130","author":"Chen","year":"2021","journal-title":"Inf. Softw. Technol."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"106287","DOI":"10.1016\/j.infsof.2020.106287","article-title":"A systematic review of unsupervised learning techniques for software defect prediction","volume":"122","author":"Li","year":"2020","journal-title":"Inf. Softw. Technol."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1109\/TSE.2016.2597849","article-title":"An improved SDA based defect prediction framework for both within-project and cross-project class-imbalance problems","volume":"43","author":"Jing","year":"2017","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"603","DOI":"10.1109\/TSE.2014.2322358","article-title":"Researcher bias: The use of machine learning in software defect prediction","volume":"40","author":"Shepperd","year":"2014","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"234","DOI":"10.1109\/TR.2014.2370891","article-title":"A learning-to-rank approach to software defect prediction","volume":"64","author":"Yang","year":"2015","journal-title":"IEEE Trans. Reliab."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"540","DOI":"10.1007\/s10664-008-9103-7","article-title":"On the relative value of cross-company and within-company data for defect prediction","volume":"14","author":"Turhan","year":"2009","journal-title":"Empir. Softw. Eng."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"822","DOI":"10.1109\/TSE.2012.83","article-title":"Local versus global lessons for defect prediction and effort estimation","volume":"39","author":"Menzies","year":"2013","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_21","unstructured":"Zimmermann, T., Penta, M.D., and Kim, S. (2013, January 18\u201319). Better cross company defect prediction. Proceedings of the 10th Working Conference on Mining Software Repositories, MSR \u201913, San Francisco, CA, USA."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Kawata, K., Amasaki, S., and Yokogawa, T. (2015, January 12\u201316). Improving relevancy filter methods for cross-project defect prediction. Proceedings of the 3rd International Conference on Applied Computing and Information Technology, ACIT 2015\/2nd International Conference on Computational Science and Intelligence, CSI 2015, Okayama, Japan.","DOI":"10.1109\/ACIT-CSI.2015.104"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1427","DOI":"10.1142\/S0218194017400046","article-title":"Improving cross-company defect prediction with data filtering","volume":"27","author":"Yu","year":"2017","journal-title":"Int. J. Softw. Eng. Knowl. Eng."},{"key":"ref_24","unstructured":"He, P., Li, B., Zhang, D., and Ma, Y. (2014). Simplification of training data for cross-project defect prediction. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"25646","DOI":"10.1109\/ACCESS.2017.2771460","article-title":"Evaluating data filter on cross-project defect prediction: Comparison and improvements","volume":"5","author":"Li","year":"2017","journal-title":"IEEE Access"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1109\/TR.2019.2931559","article-title":"Improving ranking-oriented defect prediction using a cost-sensitive ranking SVM","volume":"69","author":"Yu","year":"2020","journal-title":"IEEE Trans. Reliab."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1511","DOI":"10.1142\/S0218194016400155","article-title":"An empirical study of ranking-oriented cross-project software defect prediction","volume":"26","author":"You","year":"2016","journal-title":"Int. J. Softw. Eng. Knowl. Eng."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"9847","DOI":"10.1007\/s10586-018-1696-z","article-title":"Deep neural network based hybrid approach for software defect prediction using software metrics","volume":"22","author":"Manjula","year":"2019","journal-title":"Clust. Comput."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1007\/978-3-642-13318-3_3","article-title":"Software defect prediction using fuzzy support vector regression","volume":"Volume 6064","author":"Zhang","year":"2010","journal-title":"Proceedings of the 7th International Symposium on Neural Networks, ISNN 2010"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Tang, A., and Muccini, H. (2012, January 27\u201329). Compressed C4.5 models for software defect prediction. Proceedings of the 2012 12th International Conference on Quality Software, Xi\u2019an, China.","DOI":"10.1109\/QSIC.2012.19"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1007\/s10664-012-9218-8","article-title":"Software defect prediction using bayesian networks","volume":"19","author":"Okutan","year":"2014","journal-title":"Empir. Softw. Eng."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Petric, J., Bowes, D., Hall, T., Christianson, B., and Baddoo, N. (2016, January 8\u20139). Building an ensemble for software defect prediction based on diversity selection. Proceedings of the 10th ACM\/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2016, Ciudad Real, Spain.","DOI":"10.1145\/2961111.2962610"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Yu, X., Bennin, K.E., Liu, J., Keung, J.W., Yin, X., and Xu, Z. (2019, January 24\u201327). An empirical study of learning to rank techniques for effort-aware defect prediction. Proceedings of the 26th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2019, Hangzhou, China.","DOI":"10.1109\/SANER.2019.8668033"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"106364","DOI":"10.1016\/j.infsof.2020.106364","article-title":"Effort-aware semi-supervised just-in-time defect prediction","volume":"126","author":"Li","year":"2020","journal-title":"Inf. Softw. Technol."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1016\/j.infsof.2018.10.003","article-title":"Software defect number prediction: Unsupervised vs. supervised methods","volume":"106","author":"Chen","year":"2019","journal-title":"Inf. Softw. Technol."},{"key":"ref_36","unstructured":"Crestani, F., Marchand-Maillet, S., Chen, H., Efthimiadis, E.N., and Savoy, J. (2010, January 19\u201323). Learning to rank for information retrieval. Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2010, Geneva, Switzerland."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1109\/TR.2007.896761","article-title":"A comprehensive empirical study of count models for software fault prediction","volume":"56","author":"Gao","year":"2007","journal-title":"IEEE Trans. Reliab."},{"key":"ref_38","first-page":"303","article-title":"Predicting number of faults in software system using genetic programming","volume":"Volume 62","author":"Rathore","year":"2015","journal-title":"Proceedings of the 2015 International Conference on Soft Computing and Software Engineering, SCSE\u201915"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2853073.2853083","article-title":"A decision tree regression based approach for the number of software faults prediction","volume":"41","author":"Rathore","year":"2016","journal-title":"ACM SIGSOFT Softw. Eng. Notes"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Xu, H. (2015, January 6\u20138). An empirical study on predicting defect numbers. Proceedings of the 27th International Conference on Software Engineering and Knowledge Engineering, SEKE 2015, Pittsburgh, PA, USA.","DOI":"10.18293\/SEKE2015-132"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"7417","DOI":"10.1007\/s00500-016-2284-x","article-title":"An empirical study of some software fault prediction techniques for the number of faults prediction","volume":"21","author":"Rathore","year":"2017","journal-title":"Soft Comput."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Nguyen, T.T., An, T.Q., Hai, V.T., and Phuong, T.M. (2014, January 15\u201317). Similarity-based and rank-based defect prediction. Proceedings of the 2014 International Conference on Advanced Technologies for Communications (ATC 2014), Hanoi, Vietnam.","DOI":"10.1109\/ATC.2014.7043405"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1016\/j.infsof.2018.11.005","article-title":"A two-phase transfer learning model for cross-project defect prediction","volume":"107","author":"Liu","year":"2019","journal-title":"Inf. Softw. Technol."},{"key":"ref_44","unstructured":"van Vliet, H., and Issarny, V. (2009, January 24\u201328). Cross-project defect prediction: A large scale experiment on data vs. domain vs. process. Proceedings of the 7th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT International Symposium on Foundations of Software Engineering 2009, Amsterdam, The Netherlands."},{"key":"ref_45","unstructured":"Bener, A., Turhan, B., and Biffl, S. (2017, January 9\u201310). Training data selection for cross-project defection prediction: Which approach is better?. Proceedings of the 2017 ACM\/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2017, Toronto, ON, Canada."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1007\/s10664-014-9346-4","article-title":"Value-cognitive boosting with a support vector machine for cross-project defect prediction","volume":"21","author":"Ryu","year":"2016","journal-title":"Empir. Softw. Eng."},{"key":"ref_47","unstructured":"Dillon, L.K., Visser, W., and Williams, L.A. (2016, January 14\u201322). Cross-project defect prediction using a connectivity-based unsupervised classifier. Proceedings of the 38th International Conference on Software Engineering, ICSE 2016, Austin, TX, USA."},{"key":"ref_48","unstructured":"Lanza, M., Penta, M.D., and Xie, T. (2012, January 2\u20133). Think locally, act globally: Improving defect and effort prediction models. Proceedings of the 9th IEEE Working Conference of Mining Software Repositories, MSR 2012, Zurich, Switzerland."},{"key":"ref_49","unstructured":"Ostrand, T.J. (2009, January 18\u201319). Revisiting the evaluation of defect prediction models. Proceedings of the 5th International Workshop on Predictive Models in Software Engineering, PROMISE 2009, Vancouver, BC, Canada."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"476","DOI":"10.1109\/32.295895","article-title":"A metrics suite for object oriented design","volume":"20","author":"Chidamber","year":"1994","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1109\/TSE.1987.233164","article-title":"The use of software complexity metrics in software maintenance","volume":"13","author":"Kafura","year":"1987","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1109\/TSE.1976.233837","article-title":"A complexity measure","volume":"2","author":"McCabe","year":"1976","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Kostrzewski, M. (2020). Sensitivity analysis of selected parameters in the order picking process simulation model, with randomly generated orders. Entropy, 22.","DOI":"10.3390\/e22040423"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"545","DOI":"10.11144\/Javeriana.upsy10-2.cdcp","article-title":"Cliff\u2019s delta calculator: A non-parametric effect size program for two groups of observations","volume":"10","author":"Macbeth","year":"2011","journal-title":"Univ. Psychol."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"507","DOI":"10.2307\/2529204","article-title":"A cluster analysis method for grouping means in the analysis of variance","volume":"30","author":"Scott","year":"1974","journal-title":"Biometrics"},{"key":"ref_56","unstructured":"Bertolino, A., Canfora, G., and Elbaum, S.G. (2015, January 16\u201324). Revisiting the impact of classification techniques on the performance of defect prediction models. Proceedings of the 37th IEEE\/ACM International Conference on Software Engineering, ICSE 2015, Florence, Italy."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/22\/7535\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:29:27Z","timestamp":1760167767000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/22\/7535"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,12]]},"references-count":56,"journal-issue":{"issue":"22","published-online":{"date-parts":[[2021,11]]}},"alternative-id":["s21227535"],"URL":"https:\/\/doi.org\/10.3390\/s21227535","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2021,11,12]]}}}