{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T15:38:39Z","timestamp":1777390719782,"version":"3.51.4"},"reference-count":28,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T00:00:00Z","timestamp":1761091200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100006012","name":"Christian Doppler Research Association","doi-asserted-by":"publisher","award":["CDL AsTra"],"award-info":[{"award-number":["CDL AsTra"]}],"id":[{"id":"10.13039\/501100006012","id-type":"DOI","asserted-by":"publisher"}]},{"name":"FFG","award":["SBA-K1 NGC"],"award-info":[{"award-number":["SBA-K1 NGC"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>We present a novel methodology for classifying code obfuscation techniques in LLVM IR program embeddings. We apply isolated and layered code obfuscations to C source code using the Tigress obfuscator, compile them to LLVM IR, and convert each IR code representation into a numerical embedding (vector representation) that captures intrinsic characteristics of the applied obfuscations. We then use two modern boost classifiers to identify which obfuscation, or layering of obfuscations, was used on the source code from the vector representation. To better analyze classifier behavior and error propagation, we employ a staged, cascading experimental design that separates the task into multiple decision levels, including obfuscation detection, single-versus-layered discrimination, and detailed technique classification. This structured evaluation allows a fine-grained view of classification uncertainty and model robustness across the inference stages. We achieve an overall accuracy of more than 90% in identifying the types of obfuscations. 
Our experiments show high classification accuracy for most obfuscations, including layered obfuscations, and even perfect scores for certain transformations, indicating that a vector representation of IR code preserves distinguishing features of the protections. In this article, we detail the workflow for applying obfuscations, generating embeddings, and training the model, and we discuss challenges such as obfuscation patterns covered by other obfuscations in layered protection scenarios.<\/jats:p>","DOI":"10.3390\/make7040125","type":"journal-article","created":{"date-parts":[[2025,10,23]],"date-time":"2025-10-23T01:14:02Z","timestamp":1761182042000},"page":"125","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Classification of Obfuscation Techniques in LLVM IR: Machine Learning on Vector Representations"],"prefix":"10.3390","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2206-9263","authenticated-orcid":false,"given":"Sebastian","family":"Raubitzek","sequence":"first","affiliation":[{"name":"SBA Research gGmbH, Floragasse 7\/5.OG, 1040 Vienna, Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-6488-8272","authenticated-orcid":false,"given":"Patrick","family":"Felbauer","sequence":"additional","affiliation":[{"name":"Christian Doppler Laboratory for Assurance and Transparency in Software Protection, Faculty of Computer Science, University of Vienna, Kolingasse 14\u201316, 1090 Vienna, Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kevin","family":"Mallinger","sequence":"additional","affiliation":[{"name":"SBA Research gGmbH, Floragasse 7\/5.OG, 1040 Vienna, 
Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2115-2022","authenticated-orcid":false,"given":"Sebastian","family":"Schrittwieser","sequence":"additional","affiliation":[{"name":"Christian Doppler Laboratory for Assurance and Transparency in Software Protection, Faculty of Computer Science, University of Vienna, Kolingasse 14\u201316, 1090 Vienna, Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,10,22]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Collberg, C., Martin, S., Myers, J., and Nagra, J. (2012, January 3\u20137). Distributed application tamper detection via continuous software updates. Proceedings of the 28th Annual Computer Security Applications Conference, Orlando, FL, USA.","DOI":"10.1145\/2420950.2420997"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Junod, P., Rinaldini, J., Wehrli, J., and Michielin, J. (2015, January 19). Obfuscator-LLVM\u2014Software protection for the masses. Proceedings of the 2015 IEEE\/ACM 1st International Workshop on Software Protection, Florence, Italy.","DOI":"10.1109\/SPRO.2015.10"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Salem, A., and Banescu, S. (2016, January 5\u20136). Metadata recovery from obfuscated programs using machine learning. Proceedings of the 6th Workshop on Software Security, Protection, and Reverse Engineering, Los Angeles, CA, USA.","DOI":"10.1145\/3015135.3015136"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Sagisaka, H., and Tamada, H. (2016, January 26\u201329). Identifying the applied obfuscation method towards de-obfuscation. 
Proceedings of the 2016 IEEE\/ACIS 15th International Conference on Computer and Information Science (ICIS), Okayama, Japan.","DOI":"10.1109\/ICIS.2016.7550869"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1109\/64.511768","article-title":"Neural networks for computer virus recognition","volume":"11","author":"Tesauro","year":"1996","journal-title":"IEEE Expert"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Jones, L., Christman, D., Banescu, S., and Carlisle, M. (2018, January 8\u201310). Bytewise: A case study in neural network obfuscation identification. Proceedings of the 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA.","DOI":"10.1109\/CCWC.2018.8301720"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Kim, H. (2021). LOM: Lightweight Classifier for Obfuscation Methods. Information Security Applications, Proceedings of the 22nd International Conference, WISA 2021, Springer.","DOI":"10.1007\/978-3-030-89432-0_1"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Schrittwieser, S., Wimmer, E., Mallinger, K., Kochberger, P., Lawitschka, C., Raubitzek, S., and Weippl, E.R. (2023, January 25\u201329). Modeling Obfuscation Stealth Through Code Complexity. Proceedings of the Computer Security. ESORICS 2023 International Workshops: CPS4CIP, ADIoT, SecAssure, WASP, TAURIN, PriST-AI, and SECAI, The Hague, The Netherlands. Revised Selected Papers, Part II.","DOI":"10.1007\/978-3-031-54129-2_23"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3418463","article-title":"IR2VEC: LLVM IR Based Scalable Program Embeddings","volume":"17","author":"VenkataKeerthy","year":"2020","journal-title":"ACM Trans. Archit. 
Code Optim."},{"key":"ref_10","first-page":"103850","article-title":"Obfuscation undercover: Unraveling the impact of obfuscation layering on structural code patterns","volume":"85","author":"Raubitzek","year":"2024","journal-title":"J. Inf. Secur. Appl."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Raubitzek, S., Schrittwieser, S., Lawitschka, C., Mallinger, K., Ekelhart, A., and Weippl, E.R. (2024, January 8\u201310). Code Obfuscation Classification Using Singular Value Decomposition on Grayscale Image Representations. Proceedings of the 21st International Conference on Security and Cryptography (SECRYPT), Dijon, France.","DOI":"10.5220\/0012856600003767"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Raitsis, T., Elgazari, Y., Toibin, G.E., Lurie, Y., Mark, S., and Margalit, O. (2025). Code Obfuscation: A Comprehensive Approach to Detection, Classification, and Ethical Challenges. Algorithms, 18.","DOI":"10.3390\/a18020054"},{"key":"ref_13","first-page":"102953","article-title":"Function-level obfuscation detection method based on Graph Convolutional Networks","volume":"61","author":"Jiang","year":"2021","journal-title":"J. Inf. Secur. Appl."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wang, Y., and Rountev, A. (2017, January 22\u201323). Who changed you? Obfuscator identification for Android. Proceedings of the 2017 IEEE\/ACM 4th International Conference on Mobile Software Engineering and Systems (MOBILESoft), Buenos Aires, Argentina.","DOI":"10.1109\/MOBILESoft.2017.18"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Bacci, A., Bartoli, A., Martinelli, F., Medvet, E., and Mercaldo, F. (2018, January 27\u201330). Detection of obfuscation techniques in android applications. 
Proceedings of the 13th International Conference on Availability, Reliability and Security, Hamburg, Germany.","DOI":"10.1145\/3230833.3232823"},{"key":"ref_16","first-page":"22","article-title":"A Framework for Identifying Obfuscation Techniques applied to Android Apps using Machine Learning","volume":"10","author":"Park","year":"2019","journal-title":"J. Wirel. Mob. Netw. Ubiquitous Comput. Dependable Appl."},{"key":"ref_17","unstructured":"Collberg, C. (2025, October 10). Tigress: The Diversifying C Virtualizer\/Obfuscator. Available online: https:\/\/tigress.wtf\/."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Banescu, S., Ochoa, M., and Pretschner, A. (2015, January 19). A framework for measuring software obfuscation resilience against automated attacks. Proceedings of the 2015 IEEE\/ACM 1st International Workshop on Software Protection, Florence, Italy.","DOI":"10.1109\/SPRO.2015.16"},{"key":"ref_19","unstructured":"Patra, B., Kanade, A., Maniatis, P., and Orso, A. (2020, January 8\u201313). IR2Vec: A Flow-Aware Representation Learning for LLVM IR. Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC\/FSE 2020), Virtual Event."},{"key":"ref_20","unstructured":"Banescu, S., Collberg, C., and Pretschner, A. (2016, January 28). Measuring the Strength of Software Obfuscation. Proceedings of the 2016 Workshop on Software Protection (SPRO \u201916), Vienna, Austria."},{"key":"ref_21","first-page":"76","article-title":"Machine Learning-Based Detection of Software Obfuscation Techniques","volume":"46","author":"Salem","year":"2019","journal-title":"J. Inf. Secur. Appl."},{"key":"ref_22","unstructured":"Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V., and Gulin, A. (2018, January 3\u20138). CatBoost: Unbiased boosting with categorical features. 
Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada."},{"key":"ref_23","unstructured":"Dorogush, A.V., Ershov, V., and Gulin, A. (2017, January 4\u20139). CatBoost: Gradient boosting with categorical features support. Proceedings of the Workshop on ML Systems at NeurIPS, 2017, Long Beach Convention Center, Long Beach, CA, USA."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/s10994-006-6226-1","article-title":"Extremely Randomized Trees","volume":"63","author":"Geurts","year":"2006","journal-title":"Mach. Learn."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Raubitzek, S., Corpaci, L., Hofer, R., and Mallinger, K. (2023). Scaling Exponents of Time Series Data: A Machine Learning Approach. Entropy, 25.","DOI":"10.20944\/preprints202311.0467.v1"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Raubitzek, S., and Mallinger, K. (2023). On the Applicability of Quantum Machine Learning. Entropy, 25.","DOI":"10.20944\/preprints202305.0833.v1"},{"key":"ref_27","unstructured":"Snoek, J., Larochelle, H., and Adams, R.P. (2012). Practical Bayesian Optimization of Machine Learning Algorithms. arXiv."},{"key":"ref_28","unstructured":"Head, T., Kumar, M., Nahrstaedt, H., Louppe, G., and Shcherbatyi, I. (2025, October 10). Scikit-Optimize\/Scikit-Optimize (v0.9.0). 
Available online: https:\/\/zenodo.org\/records\/5565057."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/4\/125\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,24]],"date-time":"2025-10-24T04:24:37Z","timestamp":1761279877000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/4\/125"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,22]]},"references-count":28,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["make7040125"],"URL":"https:\/\/doi.org\/10.3390\/make7040125","relation":{},"ISSN":["2504-4990"],"issn-type":[{"value":"2504-4990","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,22]]}}}