{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T15:43:14Z","timestamp":1780501394141,"version":"3.54.1"},"reference-count":28,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2021,6,20]],"date-time":"2021-06-20T00:00:00Z","timestamp":1624147200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Successful cyber-attacks are caused by the exploitation of vulnerabilities in the software and\/or hardware of systems deployed on premises or in the cloud. Although thousands of vulnerabilities are discovered every year, only a small fraction of them are actually exploited, so there is a severe class imbalance between exploited and non-exploited vulnerabilities. The open-source National Vulnerability Database (NVD), the largest repository that indexes and maintains all known vulnerabilities, assigns a unique identifier to each vulnerability. Each registered vulnerability also receives a severity score, the Common Vulnerability Scoring System (CVSS) score, based on the impact it could inflict if exploited. Recent research has shown that the CVSS score is not the only factor in selecting a vulnerability for exploitation, and that other attributes in the NVD can be used effectively as predictive features to identify the most exploitable vulnerabilities. Since cybersecurity management is highly resource-intensive, organizations that operate cloud systems benefit when the vulnerabilities most likely to be exploited in their system software or hardware can be predicted as accurately and reliably as possible, so that the available resources are spent on fixing those first. 
Various existing studies have developed vulnerability exploitation prediction models that address this class imbalance with algorithmic and artificial data resampling techniques, but these models still suffer greatly from overfitting to the majority class, which renders them unreliable in practice. In this research, we designed a novel cost function to address the class imbalance. We also used the large text corpus available in the extracted dataset to train custom word vectors that better capture the context of the local text data, for use as an embedding layer in neural networks. Our vulnerability exploitation prediction models, powered by the novel cost function and custom-trained word vectors, achieved accuracy, precision, recall, F1-score and AUC values of 0.92, 0.89, 0.98, 0.94 and 0.97, respectively, outperforming existing models while overcoming the overfitting problem caused by class imbalance.<\/jats:p>","DOI":"10.3390\/s21124220","type":"journal-article","created":{"date-parts":[[2021,6,20]],"date-time":"2021-06-20T21:50:15Z","timestamp":1624225815000},"page":"4220","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["An Improved Vulnerability Exploitation Prediction Model with Novel Cost Function and Custom Trained Word Vector Embedding"],"prefix":"10.3390","volume":"21","author":[{"given":"Mohammad Shamsul","family":"Hoque","sequence":"first","affiliation":[{"name":"College of Computing & Informatics, Universiti Tenaga Nasional, Kajang 43000, Malaysia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Norziana","family":"Jamil","sequence":"additional","affiliation":[{"name":"College of Computing & Informatics, Universiti Tenaga Nasional, Kajang 43000, 
Malaysia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3250-1307","authenticated-orcid":false,"given":"Nowshad","family":"Amin","sequence":"additional","affiliation":[{"name":"Renewable Energy and Solar Photovoltaics, Institute of Sustainable Energy (ISE), Universiti Tenaga Nasional, Kajang 43000, Malaysia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kwok-Yan","family":"Lam","sequence":"additional","affiliation":[{"name":"Technopreneurship Centre, School of Computer Science and Engineering and Director of the Nanyang, Nanyang Technological University (NTU), Singapore 639798, Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2021,6,20]]},"reference":[{"key":"ref_1","unstructured":"(2020, August 01). HP Identifies Top Enterprise Security Threats. Available online: https:\/\/www.riskbasedsecurity.com\/2018\/08\/13\/more-than-10000-vulnerabilities-disclosed-so-far-in-2018-over-3000-you-may-not-know-about\/."},{"key":"ref_2","unstructured":"Sabottke, C., Suciu, O., and Dumitra\u0219, T. (2015, January 12\u201314). Vulnerability Disclosure in the Age of Social Media: Exploiting Twitter for Predicting Real-World Exploits. Proceedings of the 24th USENIX Security Symposium, Washington, DC, USA."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Kim, J., Malaiya, Y.K., and Ray, I. (2007, January 14\u201316). Vulnerability Discovery in Multi-Version Software Systems. 
Proceedings of the 10th IEEE High Assurance Systems Engineering Symposium (HASE \u201907), Plano, TX, USA.","DOI":"10.1109\/HASE.2007.55"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"772","DOI":"10.1109\/TSE.2010.81","article-title":"Evaluating complexity, code churn, and developer activity metrics as indicators of software vulnerabilities","volume":"37","author":"Shin","year":"2011","journal-title":"IEEE Trans. Softw. 
Eng."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"489","DOI":"10.1016\/S0167-4048(04)00175-0","article-title":"Vulnerability forecasting\u2014A conceptual model","volume":"23","author":"Venter","year":"2004","journal-title":"Comput. Secur."},{"key":"ref_6","unstructured":"O\u2019Conner, L. (2014, January 3\u20136). Predicting vulnerable components: Software metrics vs text mining. Proceedings of the Twenty-Fifth IEEE International Symposium on Software Reliability Engineering, Naples, Italy."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Han, Z., Li, X., Xing, Z., Liu, H., and Feng, Z. (2017, January 17\u201322). Learning to Predict Severity of Software Vulnerability Using Only Vulnerability Description. Proceedings of the 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME), Shanghai, China.","DOI":"10.1109\/ICSME.2017.52"},{"key":"ref_8","unstructured":"(2020, September 04). Predictive Modeling, Supervised Machine Learning, and Pattern Classification. Available online: https:\/\/sebastianraschka.com\/Articles\/2014_intro_supervised_learning.html."},{"key":"ref_9","unstructured":"Expert Systems (2020, September 03). What Is Machine Learning? A Definition. Available online: https:\/\/expertsystem.com\/machine-learning-definition\/."},{"key":"ref_10","unstructured":"(2020, September 02). Time Series Machine Learning Regression Framework. Available online: https:\/\/towardsdatascience.com\/time-series-machine-learning-regression-framework-9ea33929009a."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Almukaynizi, M., Nunes, E., Dharaiya, K., Senguttuvan, M., Shakarian, J., and Shakarian, P. (2017, January 7\u20138). Proactive Identification of Exploits in the Wild Through Vulnerability Mentions Online. 
Proceedings of the 2017 International Conference on Cyber Conflict (CyCon U.S.), Washington, DC, USA.","DOI":"10.1109\/CYCONUS.2017.8167501"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Tavabi, N., Goyal, P., Almukaynizi, M., Shakarian, P., and Lerman, K. (2018, January 2\u20137). DarkEmbed: Exploit Prediction with Neural Language Models. Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.11428"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Edkrantz, M., and Said, A. (2015, January 3\u20135). Predicting cyber vulnerability exploits with machine learning. Proceedings of the 2nd International Conference on Cyber Security and Cloud Computing, New York, NY, USA.","DOI":"10.1109\/CSCloud.2015.56"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Reinthal, A., Filippakis, E.L., and Almgren, M. (2018, January 28\u201330). Data Modelling for Predicting Exploits. Proceedings of the 23rd Nordic Conference on Secure IT Systems, NordSec 2018, Gothenburg, Sweden.","DOI":"10.1007\/978-3-030-03638-6_21"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Bullough, B.L., Yanchenko, A.K., Smith, C.L., and Zipkin, J.R. (2017, January 24). Predicting exploitation of disclosed software vulnerabilities using open-source data. Proceedings of the IWSPA \u201917: 3rd ACM on International Workshop on Security and Privacy Analytics, Scottsdale, AZ, USA.","DOI":"10.1145\/3041008.3041009"},{"key":"ref_16","unstructured":"Queiroz, A., Keegan, B., and Mtenzi, F. (2017, January 29\u201330). Predicting software vulnerability using security discussion in social media. 
Proceedings of the European Conference on Information Warfare and Security, Dublin, Ireland."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"tyaa015","DOI":"10.1093\/cybsec\/tyaa015","article-title":"Improving Vulnerability Remediation through Better Exploit Prediction","volume":"6","author":"Jacobs","year":"2020","journal-title":"J. Cybersecur."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Zhang, S., Caragea, D., and Ou, X. (2011). An empirical study on using the national vulnerability database to predict software vulnerabilities. DEXA 2011: Database and Expert Systems Applications, Springer.","DOI":"10.1007\/978-3-642-23088-2_15"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Bozorgi, M., Saul, L.K., Savage, S., and Voelker, G.M. (2010, January 24\u201328). Beyond heuristics: Learning to classify vulnerabilities and predict exploits. Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA.","DOI":"10.1145\/1835804.1835821"},{"key":"ref_20","unstructured":"(2020, September 01). Common Vulnerabilities and Exposures: History. Available online: https:\/\/cve.mitre.org\/about\/history.html."},{"key":"ref_21","unstructured":"(2021, February 01). Exploit Database. Available online: https:\/\/www.exploit-db.com\/."},{"key":"ref_22","unstructured":"(2021, February 02). Attack Signatures. Available online: https:\/\/www.broadcom.com\/support\/security-center\/attacksignaturescve-search."},{"key":"ref_23","unstructured":"(2021, February 02). cve-search: Tools to Perform Local Searches for Known Vulnerabilities. 
Available online: https:\/\/github.com\/cve-search\/."},{"key":"ref_24","unstructured":"(2020, August 10). Pre-trained Word Embeddings or Embedding Layer?\u2014A Dilemma. Available online: https:\/\/towardsdatascience.com\/pre-trained-word-embeddings-or-embedding-layer-a-dilemma-8406959fd76c."},{"key":"ref_25","unstructured":"(2020, August 08). 
Precision and Recall. Available online: https:\/\/en.wikipedia.org\/wiki\/Precision_and_recall."},{"key":"ref_26","unstructured":"(2020, August 09). Receiver Operating Characteristic. Available online: https:\/\/en.wikipedia.org\/wiki\/Receiver_operating_characteristic."},{"key":"ref_27","unstructured":"(2020, August 07). Understanding LSTM Networks. Available online: http:\/\/colah.github.io\/posts\/2015-08-Understanding-LSTMs\/."},{"key":"ref_28","unstructured":"(2021, February 10). CVE Details: The Ultimate Security Vulnerability Datasource. Available online: https:\/\/www.cvedetails.com\/."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/12\/4220\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:19:33Z","timestamp":1760163573000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/12\/4220"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,20]]},"references-count":28,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2021,6]]}},"alternative-id":["s21124220"],"URL":"https:\/\/doi.org\/10.3390\/s21124220","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,6,20]]}}}