{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,12]],"date-time":"2026-06-12T10:08:37Z","timestamp":1781258917756,"version":"3.54.1"},"reference-count":40,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2025,8,6]],"date-time":"2025-08-06T00:00:00Z","timestamp":1754438400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["2334243"],"award-info":[{"award-number":["2334243"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Rust\u2019s growing popularity in high-integrity systems requires automated vulnerability detection in order to maintain its strong safety guarantees. Although Rust\u2019s ownership model and compile-time checks prevent many errors, sometimes unexpected bugs may occasionally pass analysis, underlining the necessity for automated safe and unsafe code detection. This paper presents Rust-IR-BERT, a machine learning approach to detect security vulnerabilities in Rust code by analyzing its compiled LLVM intermediate representation (IR) instead of the raw source code. This approach offers novelty by employing LLVM IR\u2019s language-neutral, semantically rich representation of the program, facilitating robust detection by capturing core data and control-flow semantics and reducing language-specific syntactic noise. Our method leverages a graph-based transformer model, GraphCodeBERT, which is a transformer architecture pretrained model to encode structural code semantics via data-flow information, followed by a gradient boosting classifier, CatBoost, that is capable of handling complex feature interactions\u2014to classify code as vulnerable or safe. The model was evaluated using a carefully curated dataset of over 2300 real-world Rust code samples (vulnerable and non-vulnerable Rust code snippets) from RustSec and OSV advisory databases, compiled to LLVM IR and labeled with corresponding Common Vulnerabilities and Exposures (CVEs) identifiers to ensure comprehensive and realistic coverage. Rust-IR-BERT achieved an overall accuracy of 98.11%, with a recall of 99.31% for safe code and 93.67% for vulnerable code. Despite these promising results, this study acknowledges potential limitations such as focusing primarily on known CVEs. Built on a representative dataset spanning over 2300 real-world Rust samples from diverse crates, Rust-IR-BERT delivers consistently strong performance. Looking ahead, practical deployment could take the form of a Cargo plugin or pre-commit hook that automatically generates and scans LLVM IR artifacts during the development cycle, enabling developers to catch vulnerabilities at an early stage in the development cycle.<\/jats:p>","DOI":"10.3390\/make7030079","type":"journal-article","created":{"date-parts":[[2025,8,6]],"date-time":"2025-08-06T15:09:53Z","timestamp":1754492993000},"page":"79","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Machine Learning-Based Vulnerability Detection in Rust Code Using LLVM IR and Transformer Model"],"prefix":"10.3390","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3589-3120","authenticated-orcid":false,"given":"Young","family":"Lee","sequence":"first","affiliation":[{"name":"Department of Computational, Engineering, and Mathematical Sciences, Texas A&M University-San Antonio, One University Way, San Antonio, TX 78224, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Syeda Jannatul","family":"Boshra","sequence":"additional","affiliation":[{"name":"Department of Computational, Engineering, and Mathematical Sciences, Texas A&M University-San Antonio, One University Way, San Antonio, TX 78224, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3819-3544","authenticated-orcid":false,"given":"Jeong","family":"Yang","sequence":"additional","affiliation":[{"name":"Department of Computational, Engineering, and Mathematical Sciences, Texas A&M University-San Antonio, One University Way, San Antonio, TX 78224, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4542-7791","authenticated-orcid":false,"given":"Zechun","family":"Cao","sequence":"additional","affiliation":[{"name":"Department of Computational, Engineering, and Mathematical Sciences, Texas A&M University-San Antonio, One University Way, San Antonio, TX 78224, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6700-6664","authenticated-orcid":false,"given":"Gongbo","family":"Liang","sequence":"additional","affiliation":[{"name":"Department of Computational, Engineering, and Mathematical Sciences, Texas A&M University-San Antonio, One University Way, San Antonio, TX 78224, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2025,8,6]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Kishiyama, B., Lee, Y., and Yang, J. (2024). Improving VulRepair\u2019s Perfect Prediction by Leveraging the LION Optimizer. Appl. Sci., 14.","DOI":"10.20944\/preprints202406.0755.v1"},{"key":"ref_2","unstructured":"Yang, J., and Lodgher, A. (2019). Fundamental Defensive Programming Practicec with Secure Coding Modules. Int. Conf. Secur. Manag."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Bae, Y., Kim, Y., Askar, A., Lim, J., and Kim, T. (2021, January 26\u201329). Rudra: Finding Memory Safety Bugs in Rust at the Ecosystem Scale. Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles, Virtual Event.","DOI":"10.1145\/3477132.3483570"},{"key":"ref_4","unstructured":"Guo, D., Ren, S., Lu, S., Feng, Z., Tang, D., Liu, S., Zhou, L., Duan, N., Svyatkovskiy, A., and Fu, S. (2021). GraphCodeBERT: Pre-Training Code Representations with Data Flow. arXiv."},{"key":"ref_5","unstructured":"Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V., and Gulin, A. (2018, January 3\u20138). CatBoost: Unbiased boosting with categorical features. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_6","first-page":"34","article-title":"A Closer Look at the Security Risks in the Rust Ecosystem","volume":"33","author":"Zheng","year":"2023","journal-title":"ACM Trans. Softw. Eng. Methodol."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"713","DOI":"10.1142\/S0218194022500231","article-title":"The Safety and Performance of Prominent Programming Languages","volume":"32","author":"Bugden","year":"2022","journal-title":"Int. J. Softw. Eng. Knowl. Eng."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Zhu, S., Zhang, Z., Qin, B., Xiong, A., and Song, L. (2022, January 21\u201329). Learning and programming challenges of rust: A mixed-methods study. Proceedings of the 44th International Conference on Software Engineering, ICSE \u201922, Pittsburgh, PA, USA.","DOI":"10.1145\/3510003.3510164"},{"key":"ref_9","first-page":"1","article-title":"Memory-Safety Challenge Considered Solved? An In-Depth Study with All Rust CVEs","volume":"31","author":"Xu","year":"2022","journal-title":"ACM Trans. Softw. Eng. Methodol."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Qin, B., Chen, Y., Yu, Z., Song, L., and Zhang, Y. (2020, January 15\u201320). Understanding memory and thread safety practices and issues in real-world Rust programs. Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, London, UK.","DOI":"10.1145\/3385412.3386036"},{"key":"ref_11","unstructured":"Yu, Z., Song, L., and Zhang, Y. (2019). Fearless Concurrency? Understanding Concurrent Programming Safety in Real-World Rust Software. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Hassnain, M., and Stanford, C. (2024, January 27). Counterexamples in Safe Rust. Proceedings of the 39th IEEE\/ACM International Conference on Automated Software Engineering Workshops, ASEW \u201924, Sacramento, CA, USA.","DOI":"10.1145\/3691621.3694943"},{"key":"ref_13","unstructured":"(2025, April 25). How to Write a Timing-Attack-Proof Comparison Function (\u2018Ord::cmp\u2019, Lexicographic) for Byte Arrays?\u2014Help, 2023. Section: Help. Available online: https:\/\/users.rust-lang.org\/t\/how-to-write-a-timing-attack-proof-comparison-function-ord-cmp-lexicographic-for-byte-arrays\/100607."},{"key":"ref_14","unstructured":"Park, S., Cheng, X., and Kim, T. (2022). Unsafe\u2019s Betrayal: Abusing Unsafe Rust in Binary Reverse Engineering via Machine Learning. arXiv, Available online: https:\/\/www.semanticscholar.org\/paper\/0d3052a6c38876eed2c66e1ea3ee6e6c074d62f2."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"2244","DOI":"10.1109\/TDSC.2021.3051525","article-title":"SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities","volume":"19","author":"Li","year":"2022","journal-title":"IEEE Trans. Dependable Secur. Comput."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Li, Z., Zou, D., Xu, S., Ou, X., Jin, H., Wang, S., Deng, Z., and Zhong, Y. (2018, January 18\u201321). VulDeePecker: A Deep Learning-Based System for Vulnerability Detection. Proceedings of the 2018 Network and Distributed System Security Symposium, San Diego, CA, USA.","DOI":"10.14722\/ndss.2018.23158"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Hanif, H., and Maffeis, S. (2022, January 18\u201323). VulBERTa: Simplified Source Code Pre-Training for Vulnerability Detection. Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy.","DOI":"10.1109\/IJCNN55064.2022.9892280"},{"key":"ref_18","first-page":"66","article-title":"RustBelt: Securing the foundations of the rust programming language","volume":"2","author":"Jung","year":"2017","journal-title":"Proc. ACM Program. Lang."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"92","DOI":"10.1145\/3473597","article-title":"GhostCell: Separating permissions from data in Rust","volume":"5","author":"Yanovski","year":"2021","journal-title":"Proc. ACM Program. Lang."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1145\/3418295","article-title":"Safe systems programming in Rust","volume":"64","author":"Jung","year":"2021","journal-title":"Commun. ACM"},{"key":"ref_21","unstructured":"(2025, April 25). AWS\u2019 Sponsorship of the Rust Project|AWS Open Source Blog, 2019. Section: Developer Tools. Available online: https:\/\/aws.amazon.com\/cn\/blogs\/opensource\/aws-sponsorship-of-the-rust-project\/."},{"key":"ref_22","unstructured":"Klabnik, S. (2023). The Rust Programming Language, No Starch Press. [2nd ed.]."},{"key":"ref_23","unstructured":"(2025, May 15). RustSec Security Advisory Database. Available online: https:\/\/rustsec.org\/advisories\/."},{"key":"ref_24","unstructured":"(2025, May 15). Open Source Vulnerabilities (OSV) Database. Available online: https:\/\/osv.dev\/."},{"key":"ref_25","unstructured":"Computer Security Division (2025, April 20). NIST. 2008. Last Modified: 2022-04-11T08:23-04:00, Available online: https:\/\/www.nist.gov\/itl\/csd."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Cheng, X., Zhang, G., Wang, H., and Sui, Y. (2022, January 18\u201322). Path-sensitive code embedding via contrastive learning for software vulnerability detection. Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual.","DOI":"10.1145\/3533767.3534371"},{"key":"ref_27","unstructured":"Zhou, Y., Liu, S., Siow, J., Du, X., and Liu, Y. (2019). Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks. arXiv."},{"key":"ref_28","unstructured":"Lattner, C., and Adve, V. (2004, January 20\u201324). LLVM: A compilation framework for lifelong program analysis & transformation. Proceedings of the International Symposium on Code Generation and Optimization, CGO 2004, San Jose, CA, USA."},{"key":"ref_29","unstructured":"Moses, W.S. (2025, April 25). Understanding High-Level Properties of Low-Level Programs Through Transformers. Available online: https:\/\/math.mit.edu\/research\/highschool\/primes\/materials\/2022\/Guo-Moses.pdf."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Mahyari, A. (2022). A Hierarchical Deep Neural Network for Detecting Lines of Codes with Vulnerabilities. arXiv.","DOI":"10.1109\/QRS-C57518.2022.00011"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1007\/s10664-023-10346-3","article-title":"AIBugHunter: A Practical tool for predicting, classifying and repairing software vulnerabilities","volume":"29","author":"Fu","year":"2024","journal-title":"Empir. Softw. Eng."},{"key":"ref_32","unstructured":"Luo, Y., Zhou, H., Zhang, M., Rosa, D.D.L., Ahmed, H., Xu, W., and Xu, D. (2025). HALURust: Exploiting Hallucinations of Large Language Models to Detect Vulnerabilities in Rust. arXiv."},{"key":"ref_33","unstructured":"Suneja, S., Zheng, Y., Zhuang, Y., Laredo, J., and Morari, A. (2020). Learning to map source code to software vulnerability using code-as-a-graph. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Cipollone, D., Wang, C., Scazzariello, M., Ferlin, S., Izadi, M., Kostic, D., and Chiesa, M. (2025). Automating the Detection of Code Vulnerabilities by Analyzing GitHub Issues. arXiv.","DOI":"10.1109\/LLM4Code66737.2025.00010"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Chen, T., and Guestrin, C. (2016, January 13\u201317). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD\u201916, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939785"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"103994","DOI":"10.1016\/j.cose.2024.103994","article-title":"SCL-CVD: Supervised contrastive learning for code vulnerability detection via GraphCodeBERT","volume":"145","author":"Wang","year":"2024","journal-title":"Comput. Secur."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"5121","DOI":"10.22214\/ijraset.2024.61156","article-title":"Design and Development of Android App Malware Detector API Using Androguard and Catboost","volume":"12","author":"K","year":"2024","journal-title":"Int. J. Res. Appl. Sci. Eng. Technol."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Ullah, S., Han, M., Pujar, S., Pearce, H., Coskun, A., and Stringhini, G. (2024). LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks. arXiv.","DOI":"10.1109\/SP54263.2024.00210"},{"key":"ref_39","unstructured":"Mittal, A. (Artificial Intelligence, 2024). Code Embedding: A Comprehensive Guide, Artificial Intelligence."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"de Moor, O., Verbaere, M., Hajiyev, E., Avgustinov, P., Ekman, T., Ongkingco, N., Sereni, D., Tibble, J., Limited, S., and Centre, M. (October, January 30). Keynote Address: .QL for Source Code Analysis. Proceedings of the Seventh IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2007), Paris, France.","DOI":"10.1109\/SCAM.2007.4362893"}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/3\/79\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:24:57Z","timestamp":1760034297000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/3\/79"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,6]]},"references-count":40,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,9]]}},"alternative-id":["make7030079"],"URL":"https:\/\/doi.org\/10.3390\/make7030079","relation":{},"ISSN":["2504-4990"],"issn-type":[{"value":"2504-4990","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,6]]}}}