{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T05:38:28Z","timestamp":1772775508830,"version":"3.50.1"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2024,8,16]],"date-time":"2024-08-16T00:00:00Z","timestamp":1723766400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Priv. Secur."],"published-print":{"date-parts":[[2024,8,31]]},"abstract":"<jats:p>Malware attacks pose a significant threat to national security, corporate networks, and public endpoint security. Identifying the Advanced Persistent Threat (APT) groups behind the attacks and grouping their activities into attack campaigns helps security investigators trace those activities, thus providing better protection against future attacks. Existing Cyber Threat Intelligence (CTI) components mainly focus on malware family identification and behavior characterization, which cannot solve the APT tracking problem: while APT tracking requires linking malware binaries of multiple families to a single threat actor, these behavior- or function-based techniques are tied to a specific attack technique and fail to connect different families. Binary Authorship Attribution (AA) solutions can discriminate between threat actors based on their stylometric traits. However, AA solutions assume that the author of a binary is within a fixed candidate author set, whereas real-world malware binaries may be created by a new, unknown threat actor.<\/jats:p>\n          <jats:p>To address this research gap, we propose VeriBin for the Binary Authorship Verification (BAV) problem. 
VeriBin is a novel adversarial neural network that extracts functionality-agnostic style representations from assembly code for the AV task. The extracted style representations can be visualized and are explainable with VeriBin\u2019s multi-head attention mechanism. We benchmark VeriBin with state-of-the-art coding style representations on a standard dataset and a recent malware-APT dataset. Given two anonymous binaries of out-of-sample authors, VeriBin can accurately determine whether they belong to the same author or not. VeriBin is resilient to compiler optimizations and robust against malware family variants.<\/jats:p>","DOI":"10.1145\/3669901","type":"journal-article","created":{"date-parts":[[2024,7,20]],"date-time":"2024-07-20T11:20:24Z","timestamp":1721474424000},"page":"1-37","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["VeriBin: A Malware Authorship Verification Approach for APT Tracking through Explainable and Functionality-Debiasing Adversarial Representation Learning"],"prefix":"10.1145","volume":"27","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6911-6146","authenticated-orcid":false,"given":"Weihan","family":"Ou","sequence":"first","affiliation":[{"name":"School of Computing, Queen's University, Kingston, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4513-200X","authenticated-orcid":false,"given":"Steven","family":"Ding","sequence":"additional","affiliation":[{"name":"School of Computing, Queen's University, Kingston, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1697-4101","authenticated-orcid":false,"given":"Mohammad","family":"Zulkernine","sequence":"additional","affiliation":[{"name":"School of Computing, Queen's University, Kingston, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7925-764X","authenticated-orcid":false,"given":"Li Tao","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computing, Queen's University, Kingston, 
Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3763-8828","authenticated-orcid":false,"given":"Sarah","family":"Labrosse","sequence":"additional","affiliation":[{"name":"School of Computing, Queen's University, Kingston, Canada"}]}],"member":"320","published-online":{"date-parts":[[2024,8,16]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"101","volume-title":"Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018","author":"Abuhamad Mohammed","year":"2018","unstructured":"Mohammed Abuhamad, Tamer AbuHmed, Aziz Mohaisen, and DaeHun Nyang. 2018. Large-scale and language-oblivious code authorship identification. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018. 101\u2013114."},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.2478\/popets-2020-0044"},{"key":"e_1_3_2_4_2","unstructured":"Naveed Akhtar and Ajmal Mian. 2018. Threat of adversarial attacks on deep learning in computer vision: A survey. CoRR abs\/1801.00553 (2018). Retrieved from http:\/\/arxiv.org\/abs\/1801.00553"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/J.DIIN.2019.01.028"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3486860"},{"key":"e_1_3_2_7_2","first-page":"47","volume-title":"Proceedings of the European Symposium on Research in Computer Security","author":"Alrabaee Saed","year":"2019","unstructured":"Saed Alrabaee, ElMouatez Billah Karbab, Lingyu Wang, and Mourad Debbabi. 2019. BinEye: Towards efficient binary authorship characterization using deep learning. In Proceedings of the European Symposium on Research in Computer Security. Springer, 47\u201367."},{"issue":"1","key":"e_1_3_2_8_2","first-page":"S94\u2013S103","article-title":"OBA2: An onion approach to binary code authorship attribution","volume":"11","author":"Alrabaee Saed","year":"2014","unstructured":"Saed Alrabaee, Noman Saleem, Stere Preda, Lingyu Wang, and Mourad Debbabi. 2014. 
OBA2: An onion approach to binary code authorship attribution. Digital Investigation 11, 1 (2014), S94\u2013S103.","journal-title":"Digital Investigation"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","unstructured":"Saed Alrabaee Paria Shirani Lingyu Wang Mourad Debbabi and Aiman Hanna. 2019. Decoupling coding habits from functionality for effective binary authorship attribution. J. Comput. Secur. 27 6 (2019) 613\u2013648. DOI:10.3233\/JCS-191292","DOI":"10.3233\/JCS-191292"},{"key":"e_1_3_2_10_2","unstructured":"Dzmitry Bahdanau Kyunghyun Cho and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations (ICLR\u201915) Yoshua Bengio and Yann LeCun (Eds.). San Diego CA USA May 7\u20139 2015. Retrieved from http:\/\/arxiv.org\/abs\/1409.0473"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1142\/S0218001493000339"},{"issue":"2","key":"e_1_3_2_12_2","first-page":"151","article-title":"Efficient plagiarism detection for large code repositories","volume":"37","author":"Burrows Steven","year":"2007","unstructured":"Steven Burrows, Seyed M. M. Tahaghoghi, and Justin Zobel. 2007. Efficient plagiarism detection for large code repositories. Software: Practice and Experience 37, 2 (2007), 151\u2013175.","journal-title":"Software: Practice and Experience"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1002\/spe.2146"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3412841.3441919"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/953049.800955"},{"key":"e_1_3_2_16_2","unstructured":"Bruce S. Elenbogen and Naeem Seliya. 2008. Detecting outsourced student programming assignments. J. Comput. Sci. Coll. 
23 3 (January 2008) 50\u201357."},{"key":"e_1_3_2_17_2","first-page":"85","volume-title":"Proceedings of the 1st International Conference on E-Business and Telecommunication Networks","author":"Frantzeskou Georgia","year":"2004","unstructured":"Georgia Frantzeskou and Stefanos Gritzalis. 2004. Source code authorship analysis for supporting the cybercrime investigation process. In Proceedings of the 1st International Conference on E-Business and Telecommunication Networks. 85\u201392."},{"key":"e_1_3_2_18_2","first-page":"508","volume-title":"Proceedings of the 3rd IFIP Conference on Artificial Intelligence Applications and Innovations (AIAI) 2006","author":"Frantzeskou Georgia","year":"2006","unstructured":"Georgia Frantzeskou, Efstathios Stamatatos, Stefanos Gritzalis, and Sokratis K. Katsikas. 2006. Source code author identification based on N-gram author profiles. In Proceedings of the 3rd IFIP Conference on Artificial Intelligence Applications and Innovations (AIAI) 2006. 508\u2013515."},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","unstructured":"Ian Goodfellow Jean Pouget-Abadie Mehdi Mirza Bing Xu David Warde-Farley Sherjil Ozair Aaron Courville and Y. Bengio. 2014. Generative adversarial networks. Advances in Neural Information Processing Systems 3 (June 2014). DOI:10.1145\/3422622","DOI":"10.1145\/3422622"},{"key":"e_1_3_2_20_2","first-page":"799","volume-title":"Proceedings of the 15th International Conference on Artificial Neural Networks: Formal Models and Their Applications - ICANN 2005,","author":"Graves Alex","year":"2005","unstructured":"Alex Graves, Santiago Fern\u00e1ndez, and J\u00fcrgen Schmidhuber. 2005. Bidirectional LSTM networks for improved phoneme classification and recognition. In Proceedings of the 15th International Conference on Artificial Neural Networks: Formal Models and Their Applications - ICANN 2005,. 
799\u2013804."},{"key":"e_1_3_2_21_2","first-page":"920","volume-title":"Proceedings of the International Conference on Intelligent Systems Design and Applications","author":"Gupta Sumit","year":"2022","unstructured":"Sumit Gupta, Tapas Kumar Patra, and Chitrita Chaudhuri. 2022. Role of machine learning in authorship attribution with select stylometric features. In Proceedings of the International Conference on Intelligent Systems Design and Applications. Springer, 920\u2013932."},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","unstructured":"Giacomo Iadarola Fabio Martinelli Francesco Mercaldo and Antonella Santone. 2021. Towards an interpretable deep learning model for mobile malware detection and family identification. Comput. Secur. 105 (2021) 102198. DOI:10.1016\/J.COSE.2021.102198","DOI":"10.1016\/J.COSE.2021.102198"},{"key":"e_1_3_2_24_2","first-page":"255","volume-title":"Proceedings of the 24th USENIX Security Symposium, USENIX Security 15.","author":"Islam Aylin Caliskan","year":"2015","unstructured":"Aylin Caliskan Islam, Richard E. Harang, Andrew Liu, Arvind Narayanan, Clare R. Voss, Fabian Yamaguchi, and Rachel Greenstadt. 2015. De-anonymizing programmers via code stylometry. In Proceedings of the 24th USENIX Security Symposium, USENIX Security 15. 255\u2013270."},{"key":"e_1_3_2_25_2","unstructured":"Aylin Caliskan Fabian Yamaguchi Edwin Dauber Richard E. Harang Konrad Rieck Rachel Greenstadt and Arvind Narayanan. 2018. When coding style survives compilation: De-anonymizing programmers from executable binaries. In 25th Annual Network and Distributed System Security Symposium (NDSS\u201918) San Diego California USA February 18-21 2018 The Internet Society. 
Retrieved from https:\/\/www.ndss-symposium.org\/wp-content\/uploads\/2018\/02\/ndss2018%5C_06B-2%5C_Caliskan%5C_paper.pdf"},{"issue":"1","key":"e_1_3_2_26_2","first-page":"3:1\u20133:36","article-title":"Code authorship attribution: Methods and challenges","volume":"52","author":"Kalgutkar Vaibhavi","year":"2019","unstructured":"Vaibhavi Kalgutkar, Ratinder Kaur, Hugo Gonzalez, Natalia Stakhanova, and Alina Matyukhina. 2019. Code authorship attribution: Methods and challenges. ACM Computing Surveys 52, 1 (2019), 3:1\u20133:36.","journal-title":"ACM Computing Surveys"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-2012"},{"key":"e_1_3_2_28_2","first-page":"1104","volume-title":"Proceedings of the International Symposium on Communications and Information Technologies, ISCIT 2012","author":"Layton Robert","year":"2012","unstructured":"Robert Layton, Paul A. Watters, and Richard Dazeley. 2012. Unsupervised authorship analysis of phishing webpages. In Proceedings of the International Symposium on Communications and Information Technologies, ISCIT 2012. 1104\u20131109."},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14539"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2017.2655046"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/2950290.2983962"},{"key":"e_1_3_2_32_2","first-page":"286","volume-title":"Proceedings of the 22nd European Symposium on Research in Computer Security","author":"Meng Xiaozhu","year":"2017","unstructured":"Xiaozhu Meng, Barton P. Miller, and Kwang-Sung Jun. 2017. Identifying multiple authors in a binary program. In Proceedings of the 22nd European Symposium on Research in Computer Security. 
286\u2013304."},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-66399-9_16"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11416-020-00376-6"},{"key":"e_1_3_2_35_2","unstructured":"Tomas Mikolov Kai Chen Gregory S. Corrado and Jeffrey A. Dean. 2015. Computing numeric representations of words in a high-dimensional space. (May 19 2015). US Patent 9 037 464."},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","unstructured":"Weihan Ou Steven H. H. Ding Yuan Tian and Leo Song. 2023. SCS-Gan: Learning functionality-agnostic stylometric representations for source code authorship verification. IEEE Trans. Software Eng. 49 4 (2023) 1426\u20131442. DOI:10.1109\/TSE.2022.3177228","DOI":"10.1109\/TSE.2022.3177228"},{"key":"e_1_3_2_37_2","unstructured":"Brian Pellin. 2006. Using Classification Techniques to Determine Source Code Authorship. Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:14399700"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_2_39_2","first-page":"479","volume-title":"Proceedings of the 28th USENIX Security Symposium, USENIX Security 2019","author":"Quiring Erwin","year":"2019","unstructured":"Erwin Quiring, Alwin Maier, and Konrad Rieck. 2019. Misleading authorship attribution of source code using adversarial learning. In Proceedings of the 28th USENIX Security Symposium, USENIX Security 2019, Nadia Heninger and Patrick Traynor (Eds.). USENIX Association, 479\u2013496. 
Retrieved from https:\/\/www.usenix.org\/conference\/usenixsecurity19\/presentation\/quiring"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1504\/IJAHUC.2021.119097"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-23822-2_10"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/78.650093"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1515\/popets-2018-0007"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2943639"},{"key":"e_1_3_2_45_2","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N. Gomez Lukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. Retrieved from https:\/\/arxiv.org\/pdf\/1706.03762.pdf"},{"key":"e_1_3_2_46_2","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N. Gomez Lukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems Isabelle Guyon Ulrike von Luxburg Samy Bengio Hanna M. Wallach Rob Fergus S. V. N. Vishwanathan and Roman Garnett (Eds.). Long Beach CA USA 5998\u20136008. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html"},{"key":"e_1_3_2_47_2","unstructured":"Wikipedia contributors. 2020. Siamese Neural Network \u2014 Wikipedia The Free Encyclopedia. (2020). https:\/\/en.wikipedia.org\/wiki\/Siamese_neural_network"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","unstructured":"Jie Zhou Ying Cao Xuguang Wang Peng Li and Wei Xu. 2016. Deep recurrent models with fast-forward connections for neural machine translation. Trans. Assoc. Comput. Linguistics 4 (2016) 371\u2013383. 
DOI:10.1162\/TACL_A_00105","DOI":"10.1162\/TACL_A_00105"}],"container-title":["ACM Transactions on Privacy and Security"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3669901","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3669901","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:58:36Z","timestamp":1750294716000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3669901"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,16]]},"references-count":47,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,8,31]]}},"alternative-id":["10.1145\/3669901"],"URL":"https:\/\/doi.org\/10.1145\/3669901","relation":{},"ISSN":["2471-2566","2471-2574"],"issn-type":[{"value":"2471-2566","type":"print"},{"value":"2471-2574","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,16]]},"assertion":[{"value":"2022-05-22","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-04-23","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-16","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}