{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,14]],"date-time":"2026-02-14T10:26:38Z","timestamp":1771064798121,"version":"3.50.1"},"reference-count":54,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2025,7,18]],"date-time":"2025-07-18T00:00:00Z","timestamp":1752796800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,7,18]],"date-time":"2025-07-18T00:00:00Z","timestamp":1752796800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Autom Softw Eng"],"published-print":{"date-parts":[[2025,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Binary analysis, the process of examining software without its source code, plays a crucial role in understanding program behavior, e.g., evaluating the security properties of commercial software, and analyzing malware. One challenging aspect of this process is to classify data encoding schemes, such as encryption and compression, due to the absence of high-level semantic information. Existing approaches either rely on code similarity, which only works for known schemes, or heuristic rules, which lack scalability. In this paper, we propose\u00a0<jats:bold>DESCG<\/jats:bold>, a novel deep learning-based method for automatically classifying four widely employed kinds of data encoding schemes in binary programs: encryption, compression, decompression, and hashing. Our approach leverages dynamic analysis to extract execution traces from binary programs, builds data dependency graphs from these traces, and incorporates critical feature engineering. 
By combining the specialized graph representation with the Graph Neural Network (GNN), our approach enables accurate classification without requiring prior knowledge of specific encoding schemes. The evaluation results show that\u00a0<jats:bold>DESCG\u00a0<\/jats:bold>achieves 97.7% accuracy and an F1 score of 97.67%, outperforming baseline models. We also conducted an extensive evaluation of\u00a0<jats:bold>DESCG\u00a0<\/jats:bold>to explore which feature is more important for it and examine its performance and overhead.<\/jats:p>","DOI":"10.1007\/s10515-025-00538-0","type":"journal-article","created":{"date-parts":[[2025,7,18]],"date-time":"2025-07-18T06:28:05Z","timestamp":1752820085000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["DESCG: data encoding scheme classification with GNN in binary analysis"],"prefix":"10.1007","volume":"32","author":[{"given":"Xushu","family":"Dai","sequence":"first","affiliation":[]},{"given":"Nanqing","family":"Luo","sequence":"additional","affiliation":[]},{"given":"Haizhou","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Zhilong","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Chen","family":"Cao","sequence":"additional","affiliation":[]},{"given":"Peng","family":"Liu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,7,18]]},"reference":[{"key":"538_CR1","unstructured":"Abu-El-Haija, S., Perozzi, B., Kapoor, A., et\u00a0al.: Mixhop: Higher-order graph convolutional architectures via sparsified neighborhood mixing. In: International Conference on Machine Learning, pp. 21\u201329. PMLR (2019)"},{"key":"538_CR2","unstructured":"Allamanis, M., Brockschmidt, M., Khademi, M.: Learning to represent programs with graphs. 
arXiv preprint arXiv:1711.00740 (2017)"},{"key":"538_CR3","doi-asserted-by":"crossref","unstructured":"Allamanis, M.: Graph neural networks in program analysis. Graph neural networks: foundations, frontiers, and applications, pp. 483\u2013497 (2022)","DOI":"10.1007\/978-981-16-6054-2_22"},{"key":"538_CR4","doi-asserted-by":"publisher","first-page":"S61","DOI":"10.1016\/j.diin.2015.01.011","volume":"12","author":"S Alrabaee","year":"2015","unstructured":"Alrabaee, S., Shirani, P., Wang, L., et al.: Sigma: A semantic integrated graph matching approach for identifying reused functions in binary code. Digit. Investig. 12, S61\u2013S71 (2015)","journal-title":"Digit. Investig."},{"issue":"2","key":"538_CR5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3175492","volume":"21","author":"S Alrabaee","year":"2018","unstructured":"Alrabaee, S., Shirani, P., Wang, L., et al.: Fossil: A resilient and efficient system for identifying foss functions in malware binaries. ACM Trans. Priv. Secur. (TOPS) 21(2), 1\u201334 (2018)","journal-title":"ACM Trans. Priv. Secur. (TOPS)"},{"key":"538_CR6","doi-asserted-by":"publisher","first-page":"6249","DOI":"10.1109\/ACCESS.2019.2963724","volume":"8","author":"\u00d6A Aslan","year":"2020","unstructured":"Aslan, \u00d6.A., Samet, R.: A comprehensive review on malware detection approaches. IEEE Access 8, 6249\u20136271 (2020)","journal-title":"IEEE Access"},{"issue":"3","key":"538_CR7","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1007\/s11416-008-0084-2","volume":"4","author":"JM Borello","year":"2008","unstructured":"Borello, J.M., M\u00e9, L.: Code obfuscation techniques for metamorphic viruses. J. Comput. Virol. 4(3), 211\u2013220 (2008)","journal-title":"J. Comput. Virol."},{"key":"538_CR8","unstructured":"Chen, Q., Lacomis, J., Schwartz, E.J., et\u00a0al.: Augmenting decompiler output with learned variable names and types. In: 31st USENIX Security Symposium (USENIX Security 22), pp. 4327\u20134343. 
USENIX Association, Boston, MA (2022). https:\/\/www.usenix.org\/conference\/usenixsecurity22\/presentation\/chen-qibin"},{"key":"538_CR9","unstructured":"Chua, Z.L., Shen, S., Saxena, P., et\u00a0al.: Neural nets can learn function type signatures from binaries. In: 26th USENIX Security Symposium (USENIX Security 17), pp. 99\u2013116 (2017)"},{"key":"538_CR10","unstructured":"Cummins, C., Fisches, Z.V., Ben-Nun, T., et\u00a0al.: Programl: A graph-based program representation for data flow analysis and compiler optimizations. In: International Conference on Machine Learning, pp. 2244\u20132253. PMLR (2021)"},{"key":"538_CR11","unstructured":"Dai, W.: Crypto++ library 5.6.2. (1995). https:\/\/www.cryptopp.com\/. Accessed 2024 Aug 11"},{"issue":"2","key":"538_CR12","doi-asserted-by":"publisher","first-page":"159","DOI":"10.1075\/ijcl.14.2.02dav","volume":"14","author":"M Davies","year":"2009","unstructured":"Davies, M.: The 385+ million word corpus of contemporary american english (1990\u20132008+): Design, architecture, and linguistic insights. Int. J. Corpus Linguist. 14(2), 159\u2013190 (2009)","journal-title":"Int. J. Corpus Linguist."},{"key":"538_CR13","doi-asserted-by":"publisher","unstructured":"Ding, S.H.H., Fung, B.C.M., Charland, P.: Asm2vec: Boosting static representation robustness for binary clone search against code obfuscation and compiler optimization. In: 2019 IEEE Symposium on Security and Privacy (SP), pp. 472\u2013489 (2019). https:\/\/doi.org\/10.1109\/SP.2019.00003","DOI":"10.1109\/SP.2019.00003"},{"key":"538_CR14","unstructured":"Eyrolles, N.: Obfuscation with mixed boolean-arithmetic expressions: reconstruction, analysis and simplification tools. PhD thesis, Universit\u00e9 Paris Saclay (COmUE) (2017)"},{"key":"538_CR15","doi-asserted-by":"crossref","unstructured":"Feng, Q., Zhou, R., Xu, C., et\u00a0al.: Scalable graph-based bug search for firmware images. 
In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 480\u2013491 (2016)","DOI":"10.1145\/2976749.2978370"},{"issue":"3","key":"538_CR16","doi-asserted-by":"publisher","first-page":"319","DOI":"10.1145\/24039.24041","volume":"9","author":"J Ferrante","year":"1987","unstructured":"Ferrante, J., Ottenstein, K.J., Warren, J.D.: The program dependence graph and its use in optimization. ACM Trans. Program. Lang. Syst. (TOPLAS) 9(3), 319\u2013349 (1987)","journal-title":"ACM Trans. Program. Lang. Syst. (TOPLAS)"},{"key":"538_CR17","doi-asserted-by":"crossref","unstructured":"Gandotra, E., Bansal, D., Sofat, S.: Malware analysis and classification: A survey. J. Inf. Secur. 2014 (2014)","DOI":"10.4236\/jis.2014.52006"},{"key":"538_CR18","doi-asserted-by":"crossref","unstructured":"Guo, Y., Li, P., Luo, Y., et\u00a0al.: Exploring gnn based program embedding technologies for binary related tasks. In: Proceedings of the 30th IEEE\/ACM International Conference on Program Comprehension, pp. 366\u2013377 (2022)","DOI":"10.1145\/3524610.3527900"},{"key":"538_CR19","doi-asserted-by":"publisher","unstructured":"Han, X., Pasquier, T., Bates, A., et\u00a0al.: Unicorn: Runtime provenance-based detector for advanced persistent threats. In: 2020 Network and Distributed System Security Symposium (2020). https:\/\/doi.org\/10.14722\/ndss.2020.24046","DOI":"10.14722\/ndss.2020.24046"},{"key":"538_CR20","doi-asserted-by":"crossref","unstructured":"He, J., Ivanov, P., Tsankov, P., et\u00a0al.: Debin: Predicting debug information in stripped binaries. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 1667\u20131680 (2018)","DOI":"10.1145\/3243734.3243866"},{"key":"538_CR21","unstructured":"Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. 
arXiv preprint arXiv:1609.02907 (2016)"},{"issue":"19","key":"538_CR22","doi-asserted-by":"publisher","first-page":"4086","DOI":"10.3390\/app9194086","volume":"9","author":"Y Lee","year":"2019","unstructured":"Lee, Y., Kwon, H., Choi, S.H., et al.: Instruction2vec: Efficient preprocessor of assembly code to detect software weakness with cnn. Appl. Sci. 9(19), 4086 (2019)","journal-title":"Appl. Sci."},{"key":"538_CR23","doi-asserted-by":"crossref","unstructured":"Lestringant, P., Guih\u00e9ry, F., Fouque, P.A.: Automated identification of cryptographic primitives in binary code with data flow graph isomorphism. In: Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security, pp. 203\u2013214 (2015)","DOI":"10.1145\/2714576.2714639"},{"key":"538_CR24","unstructured":"Li, Y., Gu, C., Dullien, T., et\u00a0al.: Graph matching networks for learning the similarity of graph structured objects. In: International Conference on Machine Learning, pp. 3835\u20133845. PMLR (2019)"},{"key":"538_CR25","doi-asserted-by":"crossref","unstructured":"Li, J., Lin, Z., Caballero, J., et\u00a0al.: K-hunt: Pinpointing insecure cryptographic keys from execution traces. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pp. 412\u2013425 (2018)","DOI":"10.1145\/3243734.3243783"},{"key":"538_CR26","doi-asserted-by":"crossref","unstructured":"Li, X., Qu, Y., Yin, H.: Palmtree: Learning an assembly language model for instruction embedding. In: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, pp. 3236\u20133251 (2021)","DOI":"10.1145\/3460120.3484587"},{"key":"538_CR27","unstructured":"Liu, S., Chen, Y., Xie, X., et\u00a0al.: Retrieval-augmented generation for code summarization via hybrid gnn. 
arXiv preprint arXiv:2006.05405 (2020)"},{"issue":"7","key":"538_CR28","doi-asserted-by":"publisher","first-page":"1722","DOI":"10.3390\/electronics12071722","volume":"12","author":"G Liu","year":"2023","unstructured":"Liu, G., Zhou, X., Pang, J., et al.: Codeformer: A gnn-nested transformer model for binary code similarity detection. Electronics 12(7), 1722 (2023)","journal-title":"Electronics"},{"issue":"6","key":"538_CR29","doi-asserted-by":"publisher","first-page":"190","DOI":"10.1145\/1064978.1065034","volume":"40","author":"CK Luk","year":"2005","unstructured":"Luk, C.K., Cohn, R., Muth, R., et al.: Pin: building customized program analysis tools with dynamic instrumentation. Acm Sigplan Notices 40(6), 190\u2013200 (2005)","journal-title":"Acm Sigplan Notices"},{"key":"538_CR30","doi-asserted-by":"crossref","unstructured":"Machiry, A., Redini, N., Gustafson, E., et\u00a0al.: Using loops for malware classification resilient to feature-unaware perturbations. In: Proceedings of the 34th Annual Computer Security Applications Conference, pp. 112\u2013123 (2018)","DOI":"10.1145\/3274694.3274731"},{"key":"538_CR31","doi-asserted-by":"crossref","unstructured":"Manzoor, E., Milajerdi, S.M., Akoglu, L.: Fast memory-efficient anomaly detection in streaming heterogeneous graphs. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1035\u20131044 (2016)","DOI":"10.1145\/2939672.2939783"},{"key":"538_CR32","unstructured":"Marpaung, J.A., Sain, M., Lee, H.J.: Survey on malware evasion techniques: State of the art and challenges. In: 2012 14th International Conference on Advanced Communication Technology (ICACT), pp. 744\u2013749. IEEE (2012)"},{"key":"538_CR33","unstructured":"Meijer, C., Moonsamy, V., Wetzels, J.: Where\u2019s crypto?: Automated identification and classification of proprietary cryptographic primitives in binary code. In: 30th USENIX Security Symposium (USENIX Security 21), pp. 
555\u2013572 (2021)"},{"key":"538_CR34","unstructured":"Moseley, T., Grunwald, D., Connors, D.A., et\u00a0al.: Loopprof: Dynamic techniques for loop detection and profiling. In: Proceedings of the 2006 Workshop on Binary Instrumentation and Applications (WBIA). Citeseer (2006)"},{"key":"538_CR35","doi-asserted-by":"crossref","unstructured":"Nitin, V., Saieva, A., Ray, B., et\u00a0al.: Direct: A transformer-based model for decompiled variable name recovery. NLP4Prog 2021, p.\u00a048 (2021)","DOI":"10.18653\/v1\/2021.nlp4prog-1.6"},{"key":"538_CR36","unstructured":"Openssl: The open source toolkit for ssl\/tls. (1998). https:\/\/www.openssl.org\/. Accessed 2024 Aug 11"},{"key":"538_CR37","doi-asserted-by":"crossref","unstructured":"O\u2019sullivan, P., Anand, K., Kotha, A., et\u00a0al.: Retrofitting security in cots software with binary rewriting. In: Future Challenges in Security and Privacy for Academia and Industry: 26th IFIP TC 11 International Information Security Conference, SEC 2011, Lucerne, Switzerland, June 7-9, 2011. Proceedings 26, pp. 154\u2013172. Springer (2011)","DOI":"10.1007\/978-3-642-21424-0_13"},{"key":"538_CR38","doi-asserted-by":"crossref","unstructured":"Pei, K., Guan, J., Broughton, M., et\u00a0al.: Stateformer: Fine-grained type recovery from binaries using generative state modeling. In: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 690\u2013702 (2021)","DOI":"10.1145\/3468264.3468607"},{"key":"538_CR39","doi-asserted-by":"crossref","unstructured":"Pei, K., She, D., Wang, M., et\u00a0al.: Neudep: Neural binary memory dependence analysis. In: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 
747\u2013759 (2022)","DOI":"10.1145\/3540250.3549147"},{"key":"538_CR40","doi-asserted-by":"crossref","unstructured":"Sajadieh, M., Dakhilalian, M., Mala, H., et al.: Recursive diffusion layers for block ciphers and hash functions. In: Fast Software Encryption: 19th International Workshop, FSE 2012, Washington, DC, USA, March 19\u201321, 2012, pp. 385\u2013401. Springer, Revised Selected Papers (2012)","DOI":"10.1007\/978-3-642-34047-5_22"},{"issue":"1","key":"538_CR41","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1109\/TNN.2008.2005605","volume":"20","author":"F Scarselli","year":"2008","unstructured":"Scarselli, F., Gori, M., Tsoi, A.C., et al.: The graph neural network model. IEEE Trans. Neural Netw. 20(1), 61\u201380 (2008)","journal-title":"IEEE Trans. Neural Netw."},{"key":"538_CR42","unstructured":"Sethi, A.: Digital rights management and code obfuscation. Master\u2019s thesis, University of Waterloo (2004)"},{"key":"538_CR43","doi-asserted-by":"crossref","unstructured":"Sharma, A., Sahay, S.K.: Evolution and detection of polymorphic and metamorphic malwares: A survey. arXiv preprint arXiv:1406.7061 (2014)","DOI":"10.5120\/15544-4098"},{"key":"538_CR44","unstructured":"Shin, E.C.R., Song, D., Moazzezi, R.: Recognizing functions in binaries with neural networks. In: 24th USENIX Security Symposium (USENIX Security 15), pp. 611\u2013626 (2015)"},{"key":"538_CR45","unstructured":"Skibinski, P.: lzbench - an in-memory benchmark of open-source lz77\/lzss\/lzma compressors (2021). https:\/\/github.com\/inikep\/lzbench"},{"key":"538_CR46","unstructured":"Tallent, N.R.: Binary analysis for attribution and interpretation of performance measurements on fully-optimized code. In: Masters Abstracts International (2007)"},{"key":"538_CR47","unstructured":"Vranken, G.: Cryptofuzz. (2019). 
https:\/\/github.com\/guidovranken\/cryptofuzz"},{"key":"538_CR48","doi-asserted-by":"crossref","unstructured":"Wang, H., Qu, W., Katz, G., et\u00a0al.: Jtrans: Jump-aware transformer for binary code similarity detection. In: Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 1\u201313 (2022)","DOI":"10.1145\/3533767.3534367"},{"key":"538_CR49","doi-asserted-by":"crossref","unstructured":"Xu, D., Ming, J., Wu, D.: Cryptographic function detection in obfuscated binaries via bit-precise symbolic loop mapping. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 921\u2013937. IEEE (2017)","DOI":"10.1109\/SP.2017.56"},{"key":"538_CR50","doi-asserted-by":"crossref","unstructured":"Xu, Z., Zhang, X., Chen, L., et\u00a0al.: Python probabilistic type inference with natural language support. In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 607\u2013618 (2016)","DOI":"10.1145\/2950290.2950343"},{"key":"538_CR51","doi-asserted-by":"crossref","unstructured":"Yamaguchi, F., Golde, N., Arp, D., et\u00a0al.: Modeling and discovering vulnerabilities with code property graphs. In: 2014 IEEE Symposium on Security and Privacy, pp. 590\u2013604. IEEE (2014)","DOI":"10.1109\/SP.2014.44"},{"key":"538_CR52","unstructured":"Zhou, Y., Liu, S., Siow, J., et\u00a0al.: Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. Adv. Neural Inf. Process. Syst. 32 (2019)"},{"key":"538_CR53","unstructured":"Zhu, C., Li, Z., Xue, A., et\u00a0al.: TYGR: Type inference on stripped binaries using graph neural networks. In: 33rd USENIX Security Symposium (USENIX Security 24), pp. 4283\u20134300 (2024)"},{"key":"538_CR54","doi-asserted-by":"crossref","unstructured":"Zuo, F., Li, X., Young, P., et\u00a0al.: Neural machine translation inspired binary code similarity comparison beyond function pairs. 
arXiv preprint arXiv:1808.04706 (2018)","DOI":"10.14722\/ndss.2019.23492"}],"container-title":["Automated Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10515-025-00538-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10515-025-00538-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10515-025-00538-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T01:02:10Z","timestamp":1757552530000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10515-025-00538-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,18]]},"references-count":54,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,11]]}},"alternative-id":["538"],"URL":"https:\/\/doi.org\/10.1007\/s10515-025-00538-0","relation":{},"ISSN":["0928-8910","1573-7535"],"issn-type":[{"value":"0928-8910","type":"print"},{"value":"1573-7535","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,18]]},"assertion":[{"value":"18 September 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 July 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 July 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing 
interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"65"}}