{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T16:54:04Z","timestamp":1771952044945,"version":"3.50.1"},"reference-count":126,"publisher":"Association for Computing Machinery (ACM)","issue":"8","funder":[{"name":"Google PhD Fellowship"},{"name":"Suzuki Motor Corporation"},{"name":"SERB-DST","award":["CRG\/2022\/009400"],"award-info":[{"award-number":["CRG\/2022\/009400"]}]},{"DOI":"10.13039\/501100003593","name":"CNPq","doi-asserted-by":"crossref","award":["304441\/2021-0 and 444127\/2024-0"],"award-info":[{"award-number":["304441\/2021-0 and 444127\/2024-0"]}],"id":[{"id":"10.13039\/501100003593","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100004901","name":"FAPEMIG","doi-asserted-by":"crossref","award":["APQ-00440-23"],"award-info":[{"award-number":["APQ-00440-23"]}],"id":[{"id":"10.13039\/501100004901","id-type":"DOI","asserted-by":"crossref"}]},{"name":"AMD and Qualcomm"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2025,11,30]]},"abstract":"<jats:p>\n            Binary similarity involves determining whether two binary programs exhibit similar functionality with applications in vulnerability detection, malware analysis, and copyright detection. However, variations in compiler settings, target architectures, and deliberate code obfuscations significantly complicate the similarity measurement by effectively altering the syntax, semantics, and structure of the underlying binary. 
To address these challenges, we propose\n            <jats:sc>VexIR2Vec<\/jats:sc>\n            , a robust, architecture-neutral approach based on VEX-IR to solve binary similarity tasks.\n            <jats:sc>VexIR2Vec<\/jats:sc>\n            consists of three key components: a peephole extractor, a normalization engine (\n            <jats:sc>VexINE<\/jats:sc>\n            ), and an embedding model (\n            <jats:sc>VexNet<\/jats:sc>\n            ). The process to build program embeddings starts with the extraction of sequences of basic blocks, or\n            <jats:italic toggle=\"yes\">peepholes<\/jats:italic>\n            , from control-flow graphs via random walks, capturing structural information. These generated peepholes are then\n            <jats:italic toggle=\"yes\">normalized<\/jats:italic>\n            using\n            <jats:sc>VexINE<\/jats:sc>\n            , which applies compiler-inspired transformations to reduce architectural and compiler-induced variations. Embeddings of peepholes are generated using representation learning techniques, avoiding Out-of-Vocabulary (OOV) issues. These embeddings are then fine-tuned with\n            <jats:sc>VexNet<\/jats:sc>\n            , a feed-forward Siamese network that maps functions into a high-dimensional space for diffing and searching tasks in an application-independent manner.\n          <\/jats:p>\n          <jats:p>\n            We evaluate\n            <jats:sc>VexIR2Vec<\/jats:sc>\n            against five baselines\u2014BinDiff, DeepBinDiff, SAFE, BinFinder, and histograms of opcodes\u2014on a dataset comprising 2.7\n            <jats:italic toggle=\"yes\">M<\/jats:italic>\n            functions and 15.5\n            <jats:italic toggle=\"yes\">K<\/jats:italic>\n            binaries from 7 projects compiled across 12 compilers targeting x86 and ARM architectures. 
The experiments span four adversarial settings\u2014cross-optimization, cross-compilation, cross-architecture, and obfuscations\u2014that are typically exploited by malware and vulnerabilities. In diffing experiments,\n            <jats:sc>VexIR2Vec<\/jats:sc>\n            outperforms the nearest baseline in these four scenarios by\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(40\\%\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            ,\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(18\\%\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            ,\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(21\\%\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            , and\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(60\\%\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            , respectively. In the searching experiment,\n            <jats:sc>VexIR2Vec<\/jats:sc>\n            achieves a mean average precision of 0.76, outperforming the nearest baseline by\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(46\\%\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            . 
Our framework is highly scalable and is built as a lightweight, multi-threaded, parallel library using only open source tools.\n            <jats:sc>VexIR2Vec<\/jats:sc>\n            is\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\approx 3.1\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            \u2013\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(3.5\\times\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            faster than the closest baselines and orders-of-magnitude faster than other tools.\n          <\/jats:p>","DOI":"10.1145\/3721481","type":"journal-article","created":{"date-parts":[[2025,3,5]],"date-time":"2025-03-05T10:12:46Z","timestamp":1741169566000},"page":"1-54","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["<scp>VexIR2Vec<\/scp>\n            : An Architecture-Neutral Embedding Framework for Binary Similarity"],"prefix":"10.1145","volume":"34","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1393-7321","authenticated-orcid":false,"given":"S.","family":"VenkataKeerthy","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Indian Institute of Technology Hyderabad, Hyderabad, India"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-5772-2469","authenticated-orcid":false,"given":"Soumya","family":"Banerjee","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Indian Institute of Technology Hyderabad, Hyderabad, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4115-0588","authenticated-orcid":false,"given":"Sayan","family":"Dey","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Indian Institute of Technology Hyderabad, Hyderabad, 
India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1180-4197","authenticated-orcid":false,"given":"Yashas","family":"Andaluri","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Indian Institute of Technology Hyderabad, Hyderabad, India"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-3059-5025","authenticated-orcid":false,"given":"Raghul","family":"P. S.","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Indian Institute of Technology Hyderabad, Hyderabad, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9094-3368","authenticated-orcid":false,"given":"Subrahmanyam","family":"Kalyanasundaram","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Indian Institute of Technology Hyderabad, Hyderabad, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0375-1657","authenticated-orcid":false,"given":"Fernando Magno","family":"Quint\u00e3o Pereira","sequence":"additional","affiliation":[{"name":"UFMG, Belo Horizonte, Brazil"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5290-3266","authenticated-orcid":false,"given":"Ramakrishna","family":"Upadrasta","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Indian Institute of Technology Hyderabad, Hyderabad, India"}]}],"member":"320","published-online":{"date-parts":[[2025,10,4]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3564625.3567975"},{"key":"e_1_3_3_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330701"},{"key":"e_1_3_3_4_2","unstructured":"Annoy. 2024. ANNOY Library. 
Retrieved November 22 2024 from https:\/\/github.com\/spotify\/annoy"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3640537.3641573"},{"key":"e_1_3_3_6_2","volume-title":"3rd International Conference on Learning Representations (ICLR \u201915)","author":"Bahdanau Dzmitry","year":"2015","unstructured":"Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations (ICLR \u201915). Yoshua Bengio and Yann LeCun (Eds.), San Diego, CA. Retrieved from http:\/\/arxiv.org\/abs\/1409.0473"},{"key":"e_1_3_3_7_2","volume-title":"1998 USENIX Annual Technical Conference (USENIX ATC \u201998)","author":"Baker Brenda S.","year":"1998","unstructured":"Brenda S. Baker and Udi Manber. 1998. Deducing similarities in Java sources from bytecodes. In 1998 USENIX Annual Technical Conference (USENIX ATC \u201998). USENIX Association, New Orleans, LA. Retrieved from https:\/\/www.usenix.org\/conference\/1998-usenix-annual-technical-conference\/deducing-similarities-java-sources-bytecodes"},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","unstructured":"Yoshua Bengio Aaron Courville and Pascal Vincent. 2013. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 8 (Aug. 2013) 1798\u20131828. DOI: 10.1109\/TPAMI.2013.50","DOI":"10.1109\/TPAMI.2013.50"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/361002.361007"},{"key":"e_1_3_3_10_2","unstructured":"Binary Ninja. [n. d.]. Binary Ninja. Retrieved from https:\/\/binary.ninja"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00051"},{"key":"e_1_3_3_12_2","first-page":"2787","volume-title":"Conference on Neural Information Processing Systems (NIPS \u201913)","author":"Bordes A.","year":"2013","unstructured":"A. Bordes, N. Usunier, A. Garcia-Dur\u00e1n, J. Weston, and O. Yakhnenko. 
2013. Translating embeddings for modeling multi-relational data. In Conference on Neural Information Processing Systems (NIPS \u201913), 2787\u20132795. Retrieved from http:\/\/dl.acm.org\/citation.cfm?id=2999792.2999923"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/2430553.2430557"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-22110-1_37"},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3185768.3185771"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/2950290.2950350"},{"key":"e_1_3_3_17_2","first-page":"11","volume-title":"37th International Conference on Machine Learning (ICML \u201920)","author":"Chen Ting","year":"2020","unstructured":"Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. In 37th International Conference on Machine Learning (ICML \u201920). JMLR.org, Article 149, 11 pages."},{"key":"e_1_3_3_18_2","unstructured":"Zheng Leong Chua Shiqi Shen Prateek Saxena and Zhenkai Liang. 2017. Neural nets can learn function type signatures from binaries. In 26th USENIX Security Symposium (USENIX Security \u201917). USENIX Association Vancouver BC 99\u2013116. Retrieved from https:\/\/www.usenix.org\/conference\/usenixsecurity17\/technical-sessions\/presentation\/chua"},{"key":"e_1_3_3_19_2","unstructured":"Josh Collyer Tim Watson and Iain Phillips. 2023. FASER: Binary code similarity search through the use of intermediate representations. arXiv:2310.03605. Retrieved from https:\/\/arxiv.org\/abs\/2310.03605"},{"key":"e_1_3_3_20_2","unstructured":"Krish Coppieters. 1995. A Cross-Platform Binary Diff. Retrieved November 12 2023 from https:\/\/www.drdobbs.com\/embedded-systems\/a-cross-platform-binary-diff\/184409550"},{"key":"e_1_3_3_21_2","unstructured":"Coreutils. 2024. GNU Coreutils. 
Retrieved May 8 2024 from https:\/\/www.gnu.org\/software\/coreutils\/"},{"key":"e_1_3_3_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/115372.115320"},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3579990.3580012"},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/2980983.2908126"},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3062341.3062387"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/349214.349233"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1423"},{"key":"e_1_3_3_28_2","unstructured":"Diffutils. 2024. GNU Diffutils. Retrieved May 8 2024 from https:\/\/www.gnu.org\/software\/diffutils\/"},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2019.00003"},{"key":"e_1_3_3_30_2","unstructured":"Matthijs Douze Alexandr Guzhva Chengqi Deng Jeff Johnson Gergely Szilvasy Pierre-Emmanuel Mazar\u00e9 Maria Lomeli Lucas Hosseini and Herv\u00e9 J\u00e9gou. 2024. The Faiss library. arXiv:2401.08281. Retrieved from https:\/\/arxiv.org\/abs\/2401.08281"},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.14722\/ndss.2020.24311"},{"key":"e_1_3_3_32_2","unstructured":"Thomas Dullien. 2018. FuncSimSearch. Retrieved July 13 2022 from https:\/\/github.com\/googleprojectzero\/functionsimsearch"},{"key":"e_1_3_3_33_2","first-page":"58","article-title":"discovRE: Efficient cross-architecture identification of bugs in binary code","volume":"52","author":"Eschweiler Sebastian","year":"2016","unstructured":"Sebastian Eschweiler, Khaled Yakdan, and Elmar Gerhards-Padilla. 2016. discovRE: Efficient cross-architecture identification of bugs in binary code. In Network and Distributed System Security Symposium (NDSS), Vol. 
52, 58\u201379.","journal-title":"Network and Distributed System Security Symposium (NDSS)"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/SERE.2014.21"},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/2976749.2978370"},{"key":"e_1_3_3_36_2","unstructured":"Findutils. 2024. GNU Findutils. Retrieved May 8 2024 from https:\/\/www.gnu.org\/software\/findutils\/"},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3238147.3240480"},{"key":"e_1_3_3_38_2","unstructured":"Ghidra [n. d.]. Ghidra: Software Reverse Engineering Framework. Retrieved from https:\/\/ghidra-sre.org\/"},{"key":"e_1_3_3_39_2","doi-asserted-by":"publisher","DOI":"10.3390\/fi15090314"},{"key":"e_1_3_3_40_2","unstructured":"Wenbo Guo Dongliang Mu Xinyu Xing Min Du and Dawn Song. 2019. DEEPVSA: Facilitating value-set analysis with deep learning for postmortem program analysis. In 28th USENIX Security Symposium (USENIX Security \u201919). USENIX Association Santa Clara CA 1787\u20131804. Retrieved from https:\/\/www.usenix.org\/conference\/usenixsecurity19\/presentation\/guo"},{"key":"e_1_3_3_41_2","unstructured":"Gzip. 2024. GNU Gzip. Retrieved May 8 2024 from https:\/\/www.gnu.org\/software\/gzip\/"},{"key":"e_1_3_3_42_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-2024"},{"key":"e_1_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3446371"},{"key":"e_1_3_3_44_2","first-page":"1759","volume-title":"the 33rd USENIX Conference on Security Symposium (USENIX Security \u201924)","author":"He Haojie","year":"2024","unstructured":"Haojie He, Xingwei Lin, Ziang Weng, Ruijie Zhao, Shuitao Gan, Libo Chen, Yuede Ji, Jiashui Wang, and Zhi Xue. 2024. Code is not natural language: Unlock the power of semantics-oriented graph representation for binary code similarity detection. 
In the 33rd USENIX Conference on Security Symposium (USENIX Security \u201924), 1759\u20131776."},{"key":"e_1_3_3_45_2","unstructured":"Dan Hendrycks and Kevin Gimpel. 2016. Bridging nonlinearities and stochastic regularizers with Gaussian error linear units. arXiv:1606.08415. Retrieved from http:\/\/arxiv.org\/abs\/1606.08415"},{"key":"e_1_3_3_46_2","unstructured":"Hex-Rays. [n. d.]. IDA Pro. Retrieved from https:\/\/hex-rays.com\/ida-pro\/"},{"key":"e_1_3_3_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/359581.359603"},{"key":"e_1_3_3_48_2","first-page":"448","volume-title":"32nd International Conference on Machine Learning (Proceedings of Machine Learning Research), Vol","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In 32nd International Conference on Machine Learning (Proceedings of Machine Learning Research), Vol. 37. Francis Bach and David Blei (Eds.), PMLR, Lille, France, 448\u2013456. Retrieved from http:\/\/proceedings.mlr.press\/v37\/ioffe15.html"},{"key":"e_1_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3561385"},{"key":"e_1_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3639080"},{"key":"e_1_3_3_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICMLA.2012.70"},{"key":"e_1_3_3_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/SPRO.2015.10"},{"key":"e_1_3_3_53_2","doi-asserted-by":"publisher","DOI":"10.5555\/1214993"},{"key":"e_1_3_3_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/SANER.2018.8330221"},{"key":"e_1_3_3_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2022.3187689"},{"key":"e_1_3_3_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3533767.3534383"},{"key":"e_1_3_3_57_2","unstructured":"Gregory Koch Richard Zemel and Ruslan Salakhutdinov. 2015. Siamese neural networks for one-shot image recognition. In ICML Deep Learning Workshop Vol. 2. 
Lille."},{"key":"e_1_3_3_58_2","first-page":"3","article-title":"Obfuscating C++ programs via control flow flattening","volume":"30","author":"L\u00e1szl\u00f3 T\u0131mea","year":"2009","unstructured":"T\u0131mea L\u00e1szl\u00f3 and \u00c1kos Kiss. 2009. Obfuscating C++ programs via control flow flattening. Annales Universitatis Scientarum Budapestinensis de Rolando E\u00f6tv\u00f6s Nominatae, Sectio Computatorica 30, 1 (2009), 3\u201319.","journal-title":"Annales Universitatis Scientarum Budapestinensis de Rolando E\u00f6tv\u00f6s Nominatae, Sectio Computatorica"},{"key":"e_1_3_3_59_2","first-page":"75","volume-title":"International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization (CGO \u201904)","author":"Lattner Chris","year":"2004","unstructured":"Chris Lattner and Vikram Adve. 2004. LLVM: A compilation framework for lifelong program analysis & transformation. In International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization (CGO \u201904). IEEE Computer Society, 75."},{"key":"e_1_3_3_60_2","unstructured":"Lisha Li Kevin Jamieson Afshin Rostamizadeh Katya Gonina Moritz Hardt Benjamin Recht and Ameet Talwalkar. 2018. Massively Parallel Hyperparameter Tuning. Retrieved from https:\/\/openreview.net\/forum?id=S1Y7OOlRZ"},{"key":"e_1_3_3_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/3460120.3484587"},{"key":"e_1_3_3_62_2","first-page":"3835","volume-title":"36th International Conference on Machine Learning (Proceedings of Machine Learning Research), Vol. 97","author":"Li Yujia","year":"2019","unstructured":"Yujia Li, Chenjie Gu, Thomas Dullien, Oriol Vinyals, and Pushmeet Kohli. 2019. Graph matching networks for learning the similarity of graph structured objects. In 36th International Conference on Machine Learning (Proceedings of Machine Learning Research), Vol. 97. Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), PMLR, 3835\u20133845. 
Retrieved from https:\/\/proceedings.mlr.press\/v97\/li19d.html"},{"key":"e_1_3_3_63_2","doi-asserted-by":"publisher","DOI":"10.1145\/3664806"},{"key":"e_1_3_3_64_2","unstructured":"Richard Liaw Eric Liang Robert Nishihara Philipp Moritz Joseph E. Gonzalez and Ion Stoica. 2018. Tune: A research platform for distributed model selection and training. arXiv:1807.05118. Retrieved from http:\/\/arxiv.org\/abs\/1807.05118"},{"key":"e_1_3_3_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/TII.2019.2942800"},{"key":"e_1_3_3_66_2","doi-asserted-by":"publisher","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. In Chinese Computational Linguistics: 20th China National Conference CCL 2021 August 13\u201315 2021 Hohhot China. Springer-Verlag Berlin 471\u2013484. DOI: 10.1007\/978-3-030-84186-7_31","DOI":"10.1007\/978-3-030-84186-7_31"},{"key":"e_1_3_3_67_2","unstructured":"LLVM. 2018. LLVM Language Reference. Retrieved August 20 2019 from https:\/\/llvm.org\/docs\/LangRef.html"},{"key":"e_1_3_3_68_2","unstructured":"lua. 2024. lua. Retrieved May 8 2024 from https:\/\/www.lua.org\/source\/"},{"key":"e_1_3_3_69_2","doi-asserted-by":"publisher","DOI":"10.14722\/ndss.2023.24415"},{"key":"e_1_3_3_70_2","unstructured":"Yury A. Malkov and Dmitry A. Yashunin. 2016. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. arXiv:1603.09320. Retrieved from http:\/\/arxiv.org\/abs\/1603.09320"},{"key":"e_1_3_3_71_2","unstructured":"Andrea Marcelli Mariano Graziano Xabier Ugarte-Pedrero Yanick Fratantonio Mohamad Mansouri and Davide Balzarotti. 2022. How machine learning is solving the binary function similarity problem. In 31st USENIX Security Symposium (USENIX Security \u201922). USENIX Association Boston MA. 
Retrieved from https:\/\/www.usenix.org\/conference\/usenixsecurity22\/presentation\/marcelli"},{"key":"e_1_3_3_72_2","doi-asserted-by":"publisher","DOI":"10.1145\/3578360.3580262"},{"key":"e_1_3_3_73_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-22038-9_15"},{"key":"e_1_3_3_74_2","unstructured":"T. Mikolov K. Chen G. Corrado and J. Dean. 2013. Efficient estimation of word representations in vector space. arXiv:1301.3781. Retrieved from https:\/\/arxiv.org\/abs\/1301.3781"},{"key":"e_1_3_3_75_2","doi-asserted-by":"publisher","DOI":"10.1109\/TR.2016.2570554"},{"key":"e_1_3_3_76_2","doi-asserted-by":"publisher","DOI":"10.1145\/512927.512941"},{"key":"e_1_3_3_77_2","volume-title":"Advanced Compiler Design and Implementation","author":"Muchnick Steven S.","year":"1997","unstructured":"Steven S. Muchnick. 1997. Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers Inc., San Francisco, CA."},{"key":"e_1_3_3_78_2","doi-asserted-by":"publisher","DOI":"10.1145\/1250734.1250746"},{"key":"e_1_3_3_79_2","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2019.8661201"},{"key":"e_1_3_3_80_2","doi-asserted-by":"publisher","DOI":"10.3233\/SW-160218"},{"key":"e_1_3_3_81_2","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078195"},{"key":"e_1_3_3_82_2","unstructured":"Kexin Pei Zhou Xuan Junfeng Yang Suman Jana and Baishakhi Ray. 2020. Trex: Learning execution semantics from micro-traces for binary similarity. arXiv:2012.08680. Retrieved from https:\/\/arxiv.org\/abs\/2012.08680"},{"key":"e_1_3_3_83_2","first-page":"8476","volume-title":"International Conference on Machine Learning. PMLR","author":"Peng Dinglan","year":"2021","unstructured":"Dinglan Peng, Shuxin Zheng, Yatao Li, Guolin Ke, Di He, and Tie-Yan Liu. 2021. How could neural networks understand programs? In International Conference on Machine Learning. 
PMLR, 8476\u20138486."},{"key":"e_1_3_3_84_2","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2015.49"},{"key":"e_1_3_3_85_2","unstructured":"PuTTY. 2024. PuTTY. Retrieved May 8 2024 from https:\/\/www.chiark.greenend.org.uk\/\u223csgtatham\/putty\/"},{"key":"e_1_3_3_86_2","doi-asserted-by":"publisher","DOI":"10.1145\/3579856.3582818"},{"key":"e_1_3_3_87_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-80515-9"},{"key":"e_1_3_3_88_2","doi-asserted-by":"crossref","unstructured":"Kimberly Redmond Lannan Luo and Qiang Zeng. 2018. A cross-architecture instruction embedding model for natural language processing-inspired binary code analysis. arXiv:1812.09652. Retrieved from https:\/\/arxiv.org\/abs\/1812.09652","DOI":"10.14722\/bar.2019.23057"},{"key":"e_1_3_3_89_2","doi-asserted-by":"publisher","DOI":"10.1145\/3453483.3454035"},{"key":"e_1_3_3_90_2","doi-asserted-by":"publisher","DOI":"10.1090\/S0002-9947-1953-0053041-6"},{"key":"e_1_3_3_91_2","doi-asserted-by":"publisher","DOI":"10.1145\/3542637.3542649"},{"key":"e_1_3_3_92_2","doi-asserted-by":"publisher","DOI":"10.1145\/3465361"},{"key":"e_1_3_3_93_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298682"},{"key":"e_1_3_3_94_2","doi-asserted-by":"publisher","DOI":"10.1145\/3264820.3264821"},{"key":"e_1_3_3_95_2","volume-title":"22nd Annual Network and Distributed System Security Symposium (NDSS \u201915)","author":"Shoshitaishvili Yan","year":"2015","unstructured":"Yan Shoshitaishvili, Ruoyu Wang, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. 2015. Firmalice\u2014Automatic detection of authentication bypass vulnerabilities in binary firmware. In 22nd Annual Network and Distributed System Security Symposium (NDSS \u201915). The Internet Society. Retrieved from https:\/\/www.ndss-symposium.org\/ndss2015\/firmalice-automatic-detection-authentication-bypass-vulnerabilities-binary-firmware"},{"key":"e_1_3_3_96_2","unstructured":"Jashanpreet Singh Sraw and Keshav Kumar. 2021. 
Using static and dynamic malware features to perform malware ascription. arXiv:2112.02639. Retrieved from https:\/\/arxiv.org\/abs\/2112.02639"},{"key":"e_1_3_3_97_2","unstructured":"Daniel Stenberg. 2024. cURL. Retrieved May 8 2024 from https:\/\/curl.se\/"},{"key":"e_1_3_3_98_2","doi-asserted-by":"publisher","DOI":"10.1109\/APSEC.2018.00043"},{"key":"e_1_3_3_99_2","volume-title":"Radare2 Book","author":"Radare2 Team","year":"2017","unstructured":"Radare2 Team. 2017. Radare2 Book. GitHub."},{"key":"e_1_3_3_100_2","doi-asserted-by":"publisher","DOI":"10.5555\/3692070.3694037"},{"key":"e_1_3_3_101_2","doi-asserted-by":"publisher","DOI":"10.1145\/321921.321925"},{"issue":"86","key":"e_1_3_3_102_2","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"van der Maaten Laurens","year":"2008","unstructured":"Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, 86 (2008), 2579\u20132605. Retrieved from http:\/\/jmlr.org\/papers\/v9\/vandermaaten08a.html","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_3_103_2","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30, Curran Associates, Inc, 6000\u20136010. 
Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2017\/file\/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf"},{"key":"e_1_3_3_104_2","doi-asserted-by":"publisher","DOI":"10.1145\/3418463"},{"key":"e_1_3_3_105_2","doi-asserted-by":"publisher","DOI":"10.1145\/3640537.3641580"},{"key":"e_1_3_3_106_2","doi-asserted-by":"publisher","DOI":"10.1109\/SecDev.2017.14"},{"key":"e_1_3_3_107_2","doi-asserted-by":"crossref","unstructured":"Hao Wang Zeyu Gao Chao Zhang Mingyang Sun Yuchen Zhou Han Qiu and Xi Xiao. 2024. CEBin: A cost-effective framework for large-scale binary code similarity detection. arXiv:2402.18818. Retrieved from https:\/\/arxiv.org\/abs\/2402.18818","DOI":"10.1145\/3650212.3652117"},{"key":"e_1_3_3_108_2","doi-asserted-by":"publisher","DOI":"10.1145\/3569933"},{"key":"e_1_3_3_109_2","doi-asserted-by":"publisher","DOI":"10.1145\/3533767.3534367"},{"key":"e_1_3_3_110_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2017.2754499"},{"key":"e_1_3_3_111_2","first-page":"1","article-title":"Bmat-A binary matching tool for stale profile propagation","volume":"2","author":"Wang Zheng","year":"2000","unstructured":"Zheng Wang, Ken Pierce, and Scott McFarling. 2000. Bmat-A binary matching tool for stale profile propagation. 
The Journal of Instruction-Level Parallelism 2 (2000), 1\u201320.","journal-title":"The Journal of Instruction-Level Parallelism"},{"key":"e_1_3_3_112_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.1984.5010248"},{"key":"e_1_3_3_113_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-16-6054-2"},{"key":"e_1_3_3_114_2","doi-asserted-by":"publisher","DOI":"10.1145\/3133956.3134018"},{"key":"e_1_3_3_115_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58568-6_8"},{"key":"e_1_3_3_116_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV45572.2020.9093432"},{"key":"e_1_3_3_117_2","doi-asserted-by":"publisher","DOI":"10.1145\/3211346.3211347"},{"key":"e_1_3_3_118_2","doi-asserted-by":"publisher","DOI":"10.5555\/2832415.2832542"},{"key":"e_1_3_3_119_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2021.3056139"},{"key":"e_1_3_3_120_2","doi-asserted-by":"publisher","DOI":"10.1109\/DSN48987.2021.00036"},{"key":"e_1_3_3_121_2","doi-asserted-by":"crossref","unstructured":"Shouguo Yang Chaopeng Dong Yang Xiao Yiran Cheng Zhiqiang Shi Zhi Li and Limin Sun. 2023. Asteria-Pro: Enhancing deep-learning based binary code similarity detection by incorporating domain knowledge. arXiv:2301.00511. Retrieved from https:\/\/arxiv.org\/abs\/2301.00511","DOI":"10.1145\/3604611"},{"key":"e_1_3_3_122_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i01.5466"},{"key":"e_1_3_3_123_2","doi-asserted-by":"publisher","DOI":"10.1145\/3701992"},{"key":"e_1_3_3_124_2","doi-asserted-by":"publisher","DOI":"10.14722\/bar.2020.23002"},{"key":"e_1_3_3_125_2","unstructured":"Wenyu Zhu Hao Wang Yuchen Zhou Jiaming Wang Zihan Sha Zeyu Gao and Chao Zhang. 2023. kTrans: Knowledge-aware transformer for binary code embedding. arXiv:2308.12659. Retrieved from https:\/\/arxiv.org\/abs\/2308.12659"},{"key":"e_1_3_3_126_2","doi-asserted-by":"publisher","DOI":"10.14722\/ndss.2019.23492"},{"key":"e_1_3_3_127_2","unstructured":"Zynamics. 2024. Bindiff7. 
Retrieved June 17 2024 from https:\/\/www.zynamics.com\/bindiff.html"}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3721481","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,4]],"date-time":"2025-10-04T11:08:48Z","timestamp":1759576128000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3721481"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,4]]},"references-count":126,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2025,11,30]]}},"alternative-id":["10.1145\/3721481"],"URL":"https:\/\/doi.org\/10.1145\/3721481","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,4]]},"assertion":[{"value":"2024-07-08","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-13","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}