{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T08:21:41Z","timestamp":1769847701696,"version":"3.49.0"},"reference-count":49,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2024,4,11]],"date-time":"2024-04-11T00:00:00Z","timestamp":1712793600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100002347","name":"Federal Ministry of Education and Research of Germany","doi-asserted-by":"publisher","award":["16KIS1337"],"award-info":[{"award-number":["16KIS1337"]}],"id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Given a set of software programs, each being labeled either as vulnerable or benign, deep learning technology can be used to automatically build a software vulnerability detector. A challenge in this context is that there are countless equivalent ways to implement a particular functionality in a program. For instance, the naming of variables is often a matter of the personal style of programmers, and thus, the detection of vulnerability patterns in programs is made difficult. Current deep learning approaches to software vulnerability detection rely on the raw text of a program and exploit general natural language processing capabilities to address the problem of dealing with different naming schemes in instances of vulnerability patterns. Relying on natural language processing, and learning how to reveal variable reference structures from the raw text, is often too high a burden, however. Thus, approaches based on deep learning still exhibit problems generating a detector with decent generalization properties due to the naming or, more generally formulated, the vocabulary explosion problem. 
In this work, we propose techniques to mitigate this problem by making the referential structure of variable references explicit in input representations for deep learning approaches. Evaluation results show that deep learning models based on techniques presented in this article outperform raw text approaches for vulnerability detection. In addition, the new techniques also induce a very small main memory footprint. The efficiency gain of memory usage can be up to four orders of magnitude compared to existing methods as our experiments indicate.<\/jats:p>","DOI":"10.3390\/info15040216","type":"journal-article","created":{"date-parts":[[2024,4,11]],"date-time":"2024-04-11T07:09:28Z","timestamp":1712819368000},"page":"216","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["There Are Infinite Ways to Formulate Code: How to Mitigate the Resulting Problems for Better Software Vulnerability Detection"],"prefix":"10.3390","volume":"15","author":[{"given":"Jinghua","family":"Groppe","sequence":"first","affiliation":[{"name":"Institute of Information Systems (IFIS), University of L\u00fcbeck, Ratzeburger Allee 160, 23562 L\u00fcbeck, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5196-1117","authenticated-orcid":false,"given":"Sven","family":"Groppe","sequence":"additional","affiliation":[{"name":"Institute of Information Systems (IFIS), University of L\u00fcbeck, Ratzeburger Allee 160, 23562 L\u00fcbeck, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daniel","family":"Senf","sequence":"additional","affiliation":[{"name":"Lufthansa Industry Solutions AS GmbH, Sch\u00fctzenwall 1, 22844 Norderstedt, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ralf","family":"M\u00f6ller","sequence":"additional","affiliation":[{"name":"Institute of Information Systems (IFIS), University of L\u00fcbeck, 
Ratzeburger Allee 160, 23562 L\u00fcbeck, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,4,11]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Brooks, T.N. (2018, January 7\u20138). Survey of automated vulnerability detection and exploit generation techniques in cyber reasoning systems. Proceedings of the Science and Information Conference, Semarang, Indonesia.","DOI":"10.1007\/978-3-030-01177-2_79"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Henzinger, T.A., Jhala, R., Majumdar, R., and Sutre, G. (2003, January 9\u201310). Software verification with BLAST. Proceedings of the Workshop on Model Checking of Software, Portland, OR, USA.","DOI":"10.1007\/3-540-44829-2_17"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"B\u00f6hme, M., Pham, V.T., and Roychoudhury, A. (2016, January 24\u201328). Coverage-based greybox fuzzing as markov chain. Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria.","DOI":"10.1145\/2976749.2978428"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Stephens, N., Grosen, J., Salls, C., Dutcher, A., Wang, R., Corbetta, J., Shoshitaishvili, Y., Kruegel, C., and Vigna, G. (2016, January 21\u201324). Driller: Augmenting fuzzing through selective symbolic execution. Proceedings of the NDSS, San Diego, CA, USA.","DOI":"10.14722\/ndss.2016.23368"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Johnson, B., Song, Y., Murphy-Hill, E., and Bowdidge, R. (2013, January 18\u201326). Why don\u2019t software developers use static analysis tools to find bugs?. Proceedings of the 2013 35th International Conference on Software Engineering (ICSE), San Francisco, CA, USA.","DOI":"10.1109\/ICSE.2013.6606613"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Smith, J., Johnson, B., Murphy-Hill, E., Chu, B., and Lipford, H.R. (September, January 30). 
Questions developers ask while diagnosing potential security vulnerabilities with static analysis. Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, Bergamo, Italy.","DOI":"10.1145\/2786805.2786812"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ayewah, N., Pugh, W., Morgenthaler, J.D., Penix, J., and Zhou, Y. (2007, January 13\u201314). Evaluating static analysis defect warnings on production software. Proceedings of the 7th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, San Diego, CA, USA.","DOI":"10.1145\/1251535.1251536"},{"key":"ref_8","first-page":"3","article-title":"Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software","volume":"5","author":"Newsome","year":"2005","journal-title":"Proc. Ndss. Citeseer"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Liu, B., Shi, L., Cai, Z., and Li, M. (2012, January 2\u20134). Software vulnerability discovery techniques: A survey. Proceedings of the 2012 Fourth International Conference on Multimedia Information Networking and Security, Nanjing, China.","DOI":"10.1109\/MINES.2012.202"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"3280","DOI":"10.1109\/TSE.2021.3087402","article-title":"Deep learning based vulnerability detection: Are we there yet","volume":"48","author":"Chakraborty","year":"2021","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Collobert, R., and Weston, J. (2008, January 5\u20139). A unified architecture for natural language processing: Deep neural networks with multitask learning. 
Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland.","DOI":"10.1145\/1390156.1390177"},{"key":"ref_12","first-page":"1","article-title":"Phone recognition with the mean-covariance restricted Boltzmann machine","volume":"23","author":"Dahl","year":"2010","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"Imagenet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. ACM"},{"key":"ref_14","first-page":"1","article-title":"Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks","volume":"32","author":"Zhou","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Li, Z., Zou, D., Xu, S., Ou, X., Jin, H., Wang, S., Deng, Z., and Zhong, Y. (2018). Vuldeepecker: A deep learning-based system for vulnerability detection. arXiv.","DOI":"10.14722\/ndss.2018.23158"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Russell, R., Kim, L., Hamilton, L., Lazovich, T., Harer, J., Ozdemir, O., Ellingwood, P., and McConley, M. (2018, January 17\u201320). Automated vulnerability detection in source code using deep representation learning. Proceedings of the 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA.","DOI":"10.1109\/ICMLA.2018.00120"},{"key":"ref_17","unstructured":"Dam, H.K., Tran, T., Pham, T., Ng, S.W., Grundy, J., and Ghose, A. (2017). Automatic feature learning for vulnerability prediction. arXiv."},{"key":"ref_18","first-page":"2224","article-title":"VulDeePecker: A Deep Learning-Based System for Multiclass Vulnerability Detection","volume":"18","author":"Zou","year":"2019","journal-title":"IEEE Trans. Dependable Secur. 
Comput."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1017\/S0963548304006674","article-title":"Complete disorder is impossible: The mathematical work of Walter Deuber","volume":"14","year":"2005","journal-title":"Comb. Probab. Comput."},{"key":"ref_20","unstructured":"Graham, R.L., Rothschild, B.L., and Spencer, J.H. (1991). Ramsey Theory, John Wiley & Sons."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Groppe, J., Groppe, S., and M\u00f6ller, R. (2023, January 28\u201330). Variables are a Curse in Software Vulnerability Prediction. Proceedings of the 34th International Conference on Database and Expert Systems Applications (DEXA 2023), Penang, Malaysia.","DOI":"10.1007\/978-3-031-39847-6_41"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Wang, S., Liu, T., and Tan, L. (2016, January 14\u201322). Automatically learning semantic features for defect prediction. Proceedings of the 38th International Conference on Software Engineering, Austin, TX, USA.","DOI":"10.1145\/2884781.2884804"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"3289","DOI":"10.1109\/TII.2018.2821768","article-title":"Cross-project transfer representation learning for vulnerable function discovery","volume":"14","author":"Lin","year":"2018","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3276517","article-title":"Deepbugs: A learning approach to name-based bug detection","volume":"2","author":"Pradel","year":"2018","journal-title":"Proc. ACM Program. Lang."},{"key":"ref_25","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_26","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. 
arXiv."},{"key":"ref_27","unstructured":"Kanade, A., Maniatis, P., Balakrishnan, G., and Shi, K. (2020). Learning and Evaluating Contextual Embedding of Source Code. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong, M., Shou, L., Qin, B., Liu, T., and Jiang, D. (2020). Codebert: A pre-trained model for programming and natural languages. arXiv.","DOI":"10.18653\/v1\/2020.findings-emnlp.139"},{"key":"ref_29","unstructured":"Guo, D., Ren, S., Lu, S., Feng, Z., Tang, D., Liu, S., Zhou, L., Duan, N., Svyatkovskiy, A., and Fu, S. (2020). Graphcodebert: Pre-training code representations with data flow. arXiv."},{"key":"ref_30","unstructured":"Wang, X., Wang, Y., Mi, F., Zhou, P., Wan, Y., Liu, X., Li, L., Wu, H., Liu, J., and Jiang, X. (2021). Syncobert: Syntax-guided multi-modal contrastive pre-training for code representation. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Du, Q., Kuang, X., and Zhao, G. (2022, January 22\u201327). Code Vulnerability Detection via Nearest Neighbor Mechanism. Proceedings of the Findings of the Association for Computational Linguistics, Dublin, Ireland.","DOI":"10.18653\/v1\/2022.findings-emnlp.459"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","article-title":"Representation learning: A review and new perspectives","volume":"35","author":"Bengio","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"2244","DOI":"10.1109\/TDSC.2021.3051525","article-title":"Sysevr: A framework for using deep learning to detect software vulnerabilities","volume":"19","author":"Li","year":"2021","journal-title":"IEEE Trans. Dependable Secur. Comput."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Yamaguchi, F., Golde, N., Arp, D., and Rieck, K. (2014, January 18\u201321). 
Modeling and Discovering Vulnerabilities with Code Property Graphs. Proceedings of the 2014 IEEE Symposium on Security and Privacy, San Jose, CA, USA.","DOI":"10.1109\/SP.2014.44"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Yamaguchi, F., Maier, A., Gascon, H., and Rieck, K. (2015, January 17\u201321). Automatic inference of search patterns for taint-style vulnerabilities. Proceedings of the 2015 IEEE Symposium on Security and Privacy, San Jose, CA, USA.","DOI":"10.1109\/SP.2015.54"},{"key":"ref_36","unstructured":"Fey, M., and Lenssen, J.E. (2019). Fast graph representation learning with PyTorch Geometric. arXiv."},{"key":"ref_37","unstructured":"Wang, M., Zheng, D., Ye, Z., Gan, Q., Li, M., Song, X., Zhou, J., Ma, C., Yu, L., and Gai, Y. (2019). Deep graph library: A graph-centric, highly-performant package for graph neural networks. arXiv."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Ehrig, H., Rozenberg, G., and Kreowski, H.J. (1999). Handbook of Graph Grammars and Computing by Graph Transformation, World Scientific.","DOI":"10.1142\/9789812815149"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1427","DOI":"10.1093\/logcom\/exr021","article-title":"An abstract view on syntax with sharing","volume":"22","author":"Garner","year":"2012","journal-title":"J. Log. Comput."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Wang, Y., and Li, H. (2021, January 8). Code completion by modeling flattened abstract syntax trees as graphs. Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event.","DOI":"10.1609\/aaai.v35i16.17650"},{"key":"ref_41","unstructured":"Fowler, M. (2018). Refactoring: Improving the Design of Existing Code, Addison-Wesley Professional."},{"key":"ref_42","unstructured":"Raghavan, S., Rohana, R., Leon, D., Podgurski, A., and Augustine, V. (2004, January 11\u201317). Dex: A semantic-graph differencing tool for studying changes in large code bases. 
Proceedings of the 20th IEEE International Conference on Software Maintenance, Chicago, IL, USA."},{"key":"ref_43","unstructured":"Li, Y., Tarlow, D., Brockschmidt, M., and Zemel, R. (2015). Gated graph sequence neural networks. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1007\/BF00342633","article-title":"Cognitron: A self-organizing multilayered neural network","volume":"20","author":"Fukushima","year":"1975","journal-title":"Biol. Cybern."},{"key":"ref_45","unstructured":"Groppe, J., Schlichting, R., Groppe, S., and M\u00f6ller, R. (2022). Lecture Notes in Electrical Engineering, Springer."},{"key":"ref_46","unstructured":"Murphy, K.P. (2012). Machine Learning: A Probabilistic Perspective, MIT Press."},{"key":"ref_47","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_48","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Hinton","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_49","unstructured":"McInnes, L., Healy, J., and Melville, J. (2020). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. 
arXiv."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/15\/4\/216\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T14:26:12Z","timestamp":1760106372000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/15\/4\/216"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,11]]},"references-count":49,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2024,4]]}},"alternative-id":["info15040216"],"URL":"https:\/\/doi.org\/10.3390\/info15040216","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,11]]}}}