{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,6]],"date-time":"2026-06-06T16:33:52Z","timestamp":1780763632792,"version":"3.54.1"},"reference-count":61,"publisher":"SAGE Publications","issue":"6","license":[{"start":{"date-parts":[[2019,9,26]],"date-time":"2019-09-26T00:00:00Z","timestamp":1569456000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Computer Security"],"published-print":{"date-parts":[[2019,10,11]]},"abstract":"<jats:p>Binary authorship attribution refers to the process of identifying the author of a given anonymous binary file based on stylistic characteristics. It aims to automate the laborious and error-prone reverse engineering task of discovering information related to the author(s) of a binary code. Existing works typically employ machine learning methods to extract features that are unique for each author and subsequently match them against a given binary to identify the author. However, most existing works share a common critical limitation, i.e., they cannot distinguish between features representing program functionality and those representing authorship (e.g., authors\u2019 coding habits). Such distinction is crucial for effective authorship attribution because what is unique in a particular binary may be attributed to the author, the compiler, or the function. In this study, we present BinAuthor, a system capable of decoupling program functionality from authors\u2019 coding habits in binary code. To capture coding habits, BinAuthor leverages a set of features that are based on collections of functionality-independent choices made by authors during coding. Our evaluation demonstrates that BinAuthor outperforms existing methods in several aspects. 
First, it successfully attributes a larger number of authors with a significantly higher accuracy (around [Formula: see text]) based on the large datasets extracted from selected open-source C[Formula: see text] projects in GitHub, Google Code Jam events, Planet Source Code contests, and several programming projects. Second, BinAuthor is more robust than previous methods; there is no significant drop in accuracy when the code is subjected to refactoring techniques, simple obfuscation, and processed with different compilers. Finally, decoupling authorship from functionality allows us to apply BinAuthor to real malware binaries (Citadel, Zeus, Stuxnet, Flame, Bunny, and Babar) to automatically generate evidence on similar coding habits.<\/jats:p>","DOI":"10.3233\/jcs-191292","type":"journal-article","created":{"date-parts":[[2019,9,27]],"date-time":"2019-09-27T14:11:47Z","timestamp":1569593507000},"page":"613-648","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":9,"title":["Decoupling coding habits from functionality for effective binary authorship attribution"],"prefix":"10.1177","volume":"27","author":[{"given":"Saed","family":"Alrabaee","sequence":"first","affiliation":[{"name":"Information Systems and Security, CIT, United Arab Emirates University, Al Ain, UAE. E-mail:\u00a0"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Paria","family":"Shirani","sequence":"additional","affiliation":[{"name":"CIISE, Concordia University, Montreal, QC, Canada. E-mails:\u00a0,\u00a0,\u00a0"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lingyu","family":"Wang","sequence":"additional","affiliation":[{"name":"CIISE, Concordia University, Montreal, QC, Canada. E-mails:\u00a0,\u00a0,\u00a0"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mourad","family":"Debbabi","sequence":"additional","affiliation":[{"name":"CIISE, Concordia University, Montreal, QC, Canada. 
E-mails:\u00a0,\u00a0,\u00a0"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Aiman","family":"Hanna","sequence":"additional","affiliation":[{"name":"Computer Science, Concordia University, Montreal, QC, Canada. E-mail:\u00a0"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"179","published-online":{"date-parts":[[2019,9,26]]},"reference":[{"key":"ref001","unstructured":"Advanced Windows software protection system, 2016, http:\/\/www.oreans.com\/themida.php."},{"key":"ref002","unstructured":"Adventure in Windows debugging and reverse engineering, 2016, http:\/\/www.nynaeve.net\/."},{"key":"ref003","unstructured":"S.Alrabaee, Efficient, scalable, and accurate program fingerprinting in binary code, Ph.D. dissertation, Concordia University, 2018."},{"key":"ref004","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2019.01.028"},{"key":"ref005","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2014.03.012"},{"key":"ref006","doi-asserted-by":"crossref","unstructured":"S.Alrabaee, P.Shirani, M.Debbabi and L.Wang, On the feasibility of malware authorship attribution, in: International Symposium on Foundations and Practice of Security, Springer, 2016, pp. 256\u2013272.","DOI":"10.1007\/978-3-319-51966-1_17"},{"key":"ref007","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2015.01.011"},{"key":"ref008","doi-asserted-by":"crossref","unstructured":"S.Alrabaee, P.Shirani, L.Wang, M.Debbabi and A.Hanna, On leveraging coding habits for effective binary authorship attribution, in: European Symposium on Research in Computer Security, Springer, 2018, pp. 26\u201347. doi:10.1007\/978-3-319-99073-6_2.","DOI":"10.1007\/978-3-319-99073-6_2"},{"key":"ref009","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2016.04.002"},{"key":"ref010","doi-asserted-by":"crossref","unstructured":"G.Balakrishnan and T.Reps, Wysinwyx: What you see is not what you execute, ACM Transactions on Programming Languages and Systems (TOPLAS) 32(6) (2010), 23. 
doi:10.1145\/1749608.1749612.","DOI":"10.1145\/1749608.1749612"},{"key":"ref011","unstructured":"Big Game Hunting: Nation-state malware research, BlackHat, 2015, https:\/\/www.blackhat.com\/docs\/us-15\/materials\/us-15-MarquisBoire-Big-Game-Hunting-The-Peculiarities-Of-Nation-State-Malware-Research.pdf."},{"key":"ref012","doi-asserted-by":"crossref","unstructured":"M.Brennan, S.Afroz and R.Greenstadt, Adversarial stylometry: Circumventing authorship recognition to preserve privacy and anonymity, ACM Transactions on Information and System Security (TISSEC) 15(3) (2012), 12. doi:10.1145\/2382448.2382450.","DOI":"10.1145\/2382448.2382450"},{"key":"ref013","unstructured":"C++ refactoring tools for visual studio, 2016, http:\/\/www.wholetomato.com\/."},{"key":"ref014","unstructured":"A.Caliskan-Islam, R.Harang, A.Liu, A.Narayanan, C.Voss, F.Yamaguchi and R.Greenstadt, De-anonymizing programmers via code stylometry, in: 24th USENIX Security Symposium (USENIX Security 15), 2015, pp. 255\u2013270."},{"key":"ref015","unstructured":"A.Caliskan-Islam, F.Yamaguchi, E.Dauber, R.Harang, K.Rieck, R.Greenstadt and A.Narayanan, When coding style survives compilation: De-anonymizing programmers from executable binaries, 2015, arXiv preprint arXiv:1512.08546."},{"key":"ref016","doi-asserted-by":"crossref","unstructured":"R.Chen, L.Hong, C.L\u00fc and W.Deng, Author identification of software source code with program dependence graphs, in: Computer Software and Applications Conference Workshops (COMPSACW), 2010 IEEE 34th Annual, IEEE, 2010, pp. 
281\u2013286.","DOI":"10.1109\/COMPSACW.2010.56"},{"key":"ref017","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2005.844059"},{"key":"ref018","unstructured":"Contagio: Malware dump, 2016, http:\/\/contagiodump.blogspot.ca."},{"key":"ref019","doi-asserted-by":"crossref","unstructured":"Y.David, N.Partush and E.Yahav, Similarity of binaries through re-optimization, in: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, ACM, 2017, pp. 79\u201394.","DOI":"10.1145\/3062341.3062387"},{"issue":"3","key":"ref020","first-page":"50","volume":"23","author":"Elenbogen B.S.","year":"2008","journal-title":"Journal of Computing Sciences in Colleges"},{"key":"ref021","unstructured":"GitHub-Build software better, 2011, https:\/\/github.com\/trending?l=cpp."},{"key":"ref022","unstructured":"Hex-Ray decompiler, 2015, https:\/\/www.hex-rays.com\/products\/decompiler\/."},{"key":"ref023","unstructured":"IDA pro Fast Library Identification and Recognition Technology, 2011, https:\/\/www.hex-rays.com\/products\/ida\/tech\/."},{"key":"ref024","doi-asserted-by":"crossref","unstructured":"P.Junod, J.Rinaldini, J.Wehrli and J.Michielin, Obfuscator-LLVM \u2013 software protection for the masses, in: Proceedings of the IEEE\/ACM 1st International Workshop on Software Protection, SPRO\u201915, Firenze, Italy, May 19, 2015, B.Wyseur, ed. IEEE, 2015, pp. 3\u20139.","DOI":"10.1109\/SPRO.2015.10"},{"key":"ref025","doi-asserted-by":"crossref","unstructured":"P.Junod, J.Rinaldini, J.Wehrli and J.Michielin, Obfuscator-llvm: Software protection for the masses, in: Proceedings of the 1st International Workshop on Software Protection, IEEE Press, 2015, pp. 3\u20139.","DOI":"10.1109\/SPRO.2015.10"},{"key":"ref026","doi-asserted-by":"crossref","unstructured":"T.A.Junttila and P.Kaski, Engineering an efficient canonical labeling tool for large and sparse graphs, in: ALENEX, Vol. 7, SIAM, 2007, pp. 
135\u2013149.","DOI":"10.1137\/1.9781611972870.13"},{"key":"ref027","doi-asserted-by":"publisher","DOI":"10.1145\/355588.365140"},{"key":"ref028","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-4048(97)00005-9"},{"key":"ref029","doi-asserted-by":"publisher","DOI":"10.1007\/s11416-010-0148-y"},{"key":"ref030","first-page":"49","volume":"2","author":"Mahalanobis P.C.","year":"1936","journal-title":"Proceedings of the National Institute of Sciences (Calcutta)"},{"key":"ref031","doi-asserted-by":"crossref","unstructured":"X.Meng, B.P.Miller and K.S.Jun, Identifying multiple authors in a binary program, in: European Symposium on Research in Computer Security, Springer, 2017, pp. 286\u2013304.","DOI":"10.1007\/978-3-319-66399-9_16"},{"key":"ref032","unstructured":"N.Moran and J.Bennett, Supply Chain Analysis: From Quartermaster to Sun-shop, FireEye Labs 11 (2013)."},{"key":"ref033","unstructured":"R.Muth, Register liveness analysis of executable code, 1998, Manuscript, Dept. of Computer Science, The University of Arizona."},{"key":"ref034","unstructured":"OllyDbg, an assembler-level analysing debugger, http:\/\/www.ollydbg.de\/."},{"key":"ref035","unstructured":"PELock is a software security solution designed for protection of any 32 bit Windows applications, 2016, https:\/\/www.pelock.com\/."},{"key":"ref036","unstructured":"Programmer De-anonymization from Binary Executables, 2015, https:\/\/github.com\/calaylin\/bda."},{"key":"ref037","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2015.05.015"},{"key":"ref038","doi-asserted-by":"crossref","unstructured":"V.Rajlich, Software evolution and maintenance, in: Proceedings of the Future of Software Engineering, ACM, 2014, pp. 
133\u2013144.","DOI":"10.1145\/2593882.2593893"},{"key":"ref039","unstructured":"Refactoring tool, https:\/\/www.devexpress.com\/Products\/CodeRush\/."},{"key":"ref040","doi-asserted-by":"crossref","unstructured":"N.Rosenblum, X.Zhu and B.P.Miller, Who wrote this code? Identifying the authors of program binaries, in: Computer Security\u2013ESORICS 2011, Springer, 2011, pp. 172\u2013189. doi:10.1007\/978-3-642-23822-2_10.","DOI":"10.1007\/978-3-642-23822-2_10"},{"key":"ref041","unstructured":"Script modifies GNU assembly files (.s) to confuse linear sweep disassemblers like objdump, 2016, https:\/\/github.com\/defuse\/gas-obfuscation."},{"key":"ref042","doi-asserted-by":"crossref","unstructured":"P.Shirani, L.Collard, B.L.Agba, B.Lebel, M.Debbabi, L.Wang and A.Hanna, Binarm: Scalable and efficient detection of vulnerabilities in firmware images of intelligent electronic devices, in: International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Springer, 2018, pp. 114\u2013138. doi:10.1007\/978-3-319-93411-2_6.","DOI":"10.1007\/978-3-319-93411-2_6"},{"key":"ref043","doi-asserted-by":"crossref","unstructured":"P.Shirani, L.Wang and M.Debbabi, Binshape: Scalable and robust binary library function identification using function shape, in: International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Springer, 2017, pp. 301\u2013324. 
doi:10.1007\/978-3-319-60876-1_14.","DOI":"10.1007\/978-3-319-60876-1_14"},{"key":"ref044","doi-asserted-by":"publisher","DOI":"10.1016\/0167-4048(93)90055-A"},{"key":"ref045","unstructured":"Technical report, Resource 207: Kaspersky Lab Research proves that Stuxnet and Flame developers are connected, http:\/\/www.kaspersky.com\/about\/news\/virus\/2012\/."},{"key":"ref046","unstructured":"Technical report, Mcafee, 2011, www.mcafee.com\/ca\/resources\/wp-citadel-trojan-summary.pdf."},{"key":"ref047","unstructured":"The C Language Library, Cplusplus website, 2011, http:\/\/www.cplusplus.com\/reference\/clibrary\/."},{"key":"ref048","unstructured":"The Gephi plugin for Neo4j, 2015, Available from https:\/\/marketplace.gephi.org\/plugin\/neo4j-graph-database-support\/."},{"key":"ref049","unstructured":"The Google Code Jam, 2008\u20132015, http:\/\/code.google.com\/codejam\/."},{"key":"ref050","unstructured":"The IDA Pro Book: The Unofficial Guide to the World\u2019s Most Popular Disassembler, 2011, http:\/\/www.amazon.ca\/The-IDA-Pro-Book-Disassembler\/dp\/1593272898."},{"key":"ref051","unstructured":"The materials supplement for the paper \u201cWho Wrote This Code? 
Identifying the Authors of Program Binaries\u201d, 2011, http:\/\/pages.cs.wisc.edu\/~nater\/esorics-supp\/."},{"key":"ref052","unstructured":"The planet source code, 2015, Available from http:\/\/www.planet-source-code.com\/vb\/default.asp?lngWId=3#ContentWinners."},{"key":"ref053","unstructured":"The Scalable Native Graph Database, 2015, Available from http:\/\/neo4j.com\/."},{"key":"ref054","unstructured":"Tigress is a diversifying virtualizer\/obfuscator for the C language, 2016, http:\/\/tigress.cs.arizona.edu\/."},{"key":"ref055","first-page":"2837","volume":"11","author":"Vinh N.X.","year":"2010","journal-title":"The Journal of Machine Learning Research"},{"key":"ref056","unstructured":"VirusSign: Malware Research & Data Center, Virus Free, 2016, http:\/\/www.virussign.com\/."},{"key":"ref057","doi-asserted-by":"publisher","DOI":"10.1147\/sj.402.0426"},{"key":"ref058","doi-asserted-by":"crossref","unstructured":"X.Wang and R.Karri, Detecting kernel control-flow modifying rootkits, in: Network Science and Cybersecurity, Springer, 2014, pp. 177\u2013187. 
doi:10.1007\/978-1-4614-7597-2_11.","DOI":"10.1007\/978-1-4614-7597-2_11"},{"key":"ref059","first-page":"207","volume":"10","author":"Weinberger K.Q.","year":"2009","journal-title":"The Journal of Machine Learning Research"},{"key":"ref060","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2007.1078"},{"key":"ref061","doi-asserted-by":"publisher","DOI":"10.1049\/iet-ifs.2012.0289"}],"container-title":["Journal of Computer Security"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.3233\/JCS-191292","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.3233\/JCS-191292","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.3233\/JCS-191292","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T20:45:20Z","timestamp":1777495520000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.3233\/JCS-191292"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,9,26]]},"references-count":61,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2019,10,11]]}},"alternative-id":["10.3233\/JCS-191292"],"URL":"https:\/\/doi.org\/10.3233\/jcs-191292","relation":{},"ISSN":["0926-227X","1875-8924"],"issn-type":[{"value":"0926-227X","type":"print"},{"value":"1875-8924","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,9,26]]}}}