{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T16:40:19Z","timestamp":1772728819359,"version":"3.50.1"},"reference-count":32,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2022,9,7]],"date-time":"2022-09-07T00:00:00Z","timestamp":1662508800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Linnaeus University Centre"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>Modern systems produce and handle a large volume of sensitive enterprise data. Therefore, security vulnerabilities in the software systems must be identified and resolved early to prevent security breaches and failures. Predicting security vulnerabilities is an alternative to identifying them as developers write code. In this study, we studied the ability of several machine learning algorithms to predict security vulnerabilities. We created two datasets containing security vulnerability information from two open-source systems: (1) Apache Tomcat (versions 4.x and five 2.5.x minor versions). We also computed source code metrics for these versions of both systems. We examined four classifiers, including Naive Bayes, Decision Tree, XGBoost Classifier, and Logistic Regression, to show their ability to predict security vulnerabilities. Moreover, an ensemble learner was introduced using a stacking classifier to see whether the prediction performance could be improved. We performed cross-version and cross-project predictions to assess the effectiveness of the best-performing model. Our results showed that the XGBoost classifier performed best compared to other learners, i.e., with an average accuracy of 97% in both datasets. The stacking classifier performed with an average accuracy of 92% in Struts and 71% in Tomcat. Our best-performing model\u2014XGBoost\u2014could predict with an average accuracy of 87% in Tomcat and 99% in Struts in a cross-version setup.<\/jats:p>","DOI":"10.3390\/data7090127","type":"journal-article","created":{"date-parts":[[2022,9,7]],"date-time":"2022-09-07T20:52:03Z","timestamp":1662583923000},"page":"127","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Are Source Code Metrics \u201cGood Enough\u201d in Predicting Security Vulnerabilities?"],"prefix":"10.3390","volume":"7","author":[{"given":"Sundarakrishnan","family":"Ganesh","sequence":"first","affiliation":[{"name":"Department of Computer Science and Media Technology, Linnaeus University, 351 95 V\u00e4xj\u00f6, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7092-2244","authenticated-orcid":false,"given":"Francis","family":"Palma","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Media Technology, Linnaeus University, 351 95 V\u00e4xj\u00f6, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1154-5308","authenticated-orcid":false,"given":"Tobias","family":"Olsson","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Media Technology, Linnaeus University, 351 95 V\u00e4xj\u00f6, Sweden"}]}],"member":"1968","published-online":{"date-parts":[[2022,9,7]]},"reference":[{"key":"ref_1","unstructured":"Osborne, C. (2022, July 14). Open-Source Vulnerabilities Plague Enterprise Codebase Systems. Available online: https:\/\/www.zdnet.com\/article\/enterprise-codebases-plagued-by-open-source-vulnerabilities\/."},{"key":"ref_2","unstructured":"(2021, October 15). JSLint. Available online: https:\/\/www.jslint.com."},{"key":"ref_3","unstructured":"(2021, October 15). SonarQube. Available online: https:\/\/www.sonarqube.org\/."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Andrianto, I., Liem, M.I., and Asnar, Y.D.W. (2017, January 1\u20132). Web application fuzz testing. Proceedings of the 2017 International Conference on Data and Software Engineering (ICoDSE), Palembang, Indonesia.","DOI":"10.1109\/ICODSE.2017.8285893"},{"key":"ref_5","unstructured":"NordVPN (2022, July 14). Five Vulnerabilities Attackers Leveraged Most in 2020. Available online: https:\/\/www.qualitydigest.com\/inside\/management-article\/five-vulnerabilities-attackers-leveraged-most-2020-011321.html."},{"key":"ref_6","unstructured":"(2022, July 14). Nsrav. Available online: https:\/\/owasp.org\/www-community\/attacks\/Denial_of_Service."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Bayles, A.W., Brindley, E., Foster, J.C., Hurley, C., and Long, J. (2005). Chapter 8\u2014Classes of Attack. Infosec Career Hacking, Syngress.","DOI":"10.1016\/B978-159749011-5\/50014-3"},{"key":"ref_8","unstructured":"Harley (2022, July 14). WebApps 101: Information Disclosure Vulnerabilities and PortSwigger Lab Examples. Available online: https:\/\/infinitelogins.com\/2021\/01\/02\/information-disclosure-vulnerabilities-portswigger-lab-examples\/."},{"key":"ref_9","unstructured":"(2021, October 15). Apache Tomcat. Available online: https:\/\/tomcat.apache.org."},{"key":"ref_10","unstructured":"Aruna (2022, July 14). CSRF Protection by Using Double Submit Cookie. Available online: https:\/\/arunashansat.wordpress.com\/2018\/10\/12\/csrf-protection-by-using-double-submit-cookie\/."},{"key":"ref_11","unstructured":"PortSwigger Ltd. (2022, July 14). What Is HTTP Request Smuggling?. Available online: https:\/\/portswigger.net\/web-security\/request-smuggling."},{"key":"ref_12","unstructured":"Aniche, M. (2022, July 14). Java Code Metrics Calculator (CK). Available online: https:\/\/github.com\/mauricioaniche\/ck\/."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Lewis, D.D. (1998). Naive (Bayes) at forty: The independence assumption in information retrieval. Machine Learning: ECML-98, Springer.","DOI":"10.1007\/BFb0026666"},{"key":"ref_14","unstructured":"Swaminathan, S. (2022, July 14). Logistic Regression\u2014Detailed Overview. Available online: https:\/\/towardsdatascience.com\/logistic-regression-detailed-overview-46c4da4303bc."},{"key":"ref_15","unstructured":"(2022, July 14). Overfitting. Available online: https:\/\/corporatefinanceinstitute.com\/resources\/knowledge\/other\/overfitting\/."},{"key":"ref_16","unstructured":"(2022, July 14). Decision Trees. Available online: https:\/\/www.cs.cmu.edu\/~bhiksha\/courses\/10-601\/decisiontrees\/."},{"key":"ref_17","unstructured":"Brownlee, J. (2022, July 14). Gentle Introduction to the Bias-Variance Trade-Off in Machine Learning. Available online: https:\/\/machinelearningmastery.com\/gentle-introduction-to-the-bias-variance-trade-off-in-machine-learning\/."},{"key":"ref_18","unstructured":"Tarbani, N. (2022, July 14). Gradient Boosting Algorithm | How Gradient Boosting Algorithm Works. Available online: https:\/\/www.analyticsvidhya.com\/blog\/2021\/04\/how-the-gradient-boosting-algorithm-works\/."},{"key":"ref_19","unstructured":"Ceballos, F. (2022, July 14). Stacking Classifiers for Higher Predictive Performance. Available online: https:\/\/towardsdatascience.com\/stacking-classifiers-for-higher-predictive-performance-566f963e4840."},{"key":"ref_20","unstructured":"Luhaniwal, V. (2022, July 14). Feature Selection Using Wrapper Method\u2014Python Implementation. Available online: https:\/\/www.analyticsvidhya.com\/blog\/2020\/10\/a-comprehensive-guide-to-feature-selection-using-wrapper-methods-in-python\/."},{"key":"ref_21","unstructured":"Yemulwar, S. (2022, July 14). Feature Selection Techniques. Available online: https:\/\/medium.com\/analytics-vidhya\/feature-selection-techniques-2614b3b7efcd."},{"key":"ref_22","unstructured":"(2022, July 14). Bayes\u2019 Theorem. Available online: https:\/\/corporatefinanceinstitute.com\/resources\/knowledge\/other\/bayes-theorem\/#:~:text=Formula."},{"key":"ref_23","unstructured":"Roy, A. (2022, July 14). A Dive into Decision Trees. Available online: https:\/\/towardsdatascience.com\/a-dive-into-decision-trees-a128923c9298."},{"key":"ref_24","unstructured":"Harer, J.A., Kim, L.Y., Russell, R.L., Ozdemir, O., Kosta, L.R., Rangamani, A., Hamilton, L.H., Centeno, G.I., Key, J.R., and Ellingwood, P.M. (2018). Automated software vulnerability detection with machine learning. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Pang, Y., Xue, X., and Wang, H. (2017, January 2\u20134). Predicting Vulnerable Software Components through Deep Neural Network. Proceedings of the 2017 International Conference on Deep Learning Technologies, ICDLT \u201917, Chengdu, China.","DOI":"10.1145\/3094243.3094245"},{"key":"ref_26","unstructured":"Benjamin, V., and Lam, M.S. (2005). Finding Security Vulnerabilities in Java Applications with Static Analysis. USENIX Secur. Symp., 14."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Hammouri, A., Hammad, M., Alnabhan, M., and Alsarayrah, F. (2018). Software Bug Prediction using Machine Learning Approach. International Journal of Advanced Computer Science and Applications, 9.","DOI":"10.14569\/IJACSA.2018.090212"},{"key":"ref_28","first-page":"665","article-title":"Software Bug Prediction using object-oriented metrics","volume":"42","author":"Gupta","year":"2018","journal-title":"Sadhana"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Goyal, R., Chandra, P., and Singh, Y. (2013). Identifying influential metrics in the combined metrics approach of fault prediction. SpringerPlus, 2.","DOI":"10.1186\/2193-1801-2-627"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Shar, L.K., and Tan, H.B.K. (2012, January 3\u20137). Predicting common web application vulnerabilities from input validation and sanitization code patterns. Proceedings of the 27th IEEE\/ACM International Conference on Automated Software Engineering, Essen, Germany.","DOI":"10.1145\/2351676.2351733"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Yosifova, V. (October, January 30). Vulnerability Type Prediction in Common Vulnerabilities and Exposures Database with Ensemble Machine Learning. Proceedings of the 2021 International Conference Automatics and Informatics (ICAI), Varna, Bulgaria.","DOI":"10.1109\/ICAI52893.2021.9639588"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"150672","DOI":"10.1109\/ACCESS.2020.3016774","article-title":"Vulnerability Prediction From Source Code Using Machine Learning","volume":"8","author":"Bilgin","year":"2020","journal-title":"IEEE Access"}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/7\/9\/127\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:24:55Z","timestamp":1760142295000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/7\/9\/127"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,7]]},"references-count":32,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2022,9]]}},"alternative-id":["data7090127"],"URL":"https:\/\/doi.org\/10.3390\/data7090127","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,7]]}}}