{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T01:36:04Z","timestamp":1773192964220,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":76,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,9,11]],"date-time":"2024-09-11T00:00:00Z","timestamp":1726012800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,9,11]]},"DOI":"10.1145\/3650212.3652145","type":"proceedings-article","created":{"date-parts":[[2024,9,11]],"date-time":"2024-09-11T11:44:25Z","timestamp":1726055065000},"page":"503-515","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":33,"title":["CLAP: Learning Transferable Binary Code Representations with Natural Language Supervision"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0536-5039","authenticated-orcid":false,"given":"Hao","family":"Wang","sequence":"first","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-2318-9061","authenticated-orcid":false,"given":"Zeyu","family":"Gao","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7894-8828","authenticated-orcid":false,"given":"Chao","family":"Zhang","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1020-9006","authenticated-orcid":false,"given":"Zihan","family":"Sha","sequence":"additional","affiliation":[{"name":"Information Engineering University, Zhengzhou, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-7858-9246","authenticated-orcid":false,"given":"Mingyang","family":"Sun","sequence":"additional","affiliation":[{"name":"University of Electronic Science and Technology of China, Chengdu, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7021-1183","authenticated-orcid":false,"given":"Yuchen","family":"Zhou","sequence":"additional","affiliation":[{"name":"Beijing University of Technology, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-0559-8915","authenticated-orcid":false,"given":"Wenyu","family":"Zhu","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-1923-1749","authenticated-orcid":false,"given":"Wenju","family":"Sun","sequence":"additional","affiliation":[{"name":"Tsinghua University, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2678-8070","authenticated-orcid":false,"given":"Han","family":"Qiu","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1521-9542","authenticated-orcid":false,"given":"Xi","family":"Xiao","sequence":"additional","affiliation":[{"name":"Tsinghua University, Shenzhen, China"}]}],"member":"320","published-online":{"date-parts":[[2024,9,11]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3564625.3567975"},{"key":"e_1_3_2_1_2_1","first-page":"23716","article-title":"Flamingo: a visual language model for few-shot learning","volume":"35","author":"Alayrac Jean-Baptiste","year":"2022","unstructured":"Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katherine Millican, and Malcolm Reynolds. 2022. Flamingo: a visual language model for few-shot learning. 
Advances in Neural Information Processing Systems, 35 (2022), 23716\u201323736.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_3_1","unstructured":"Sebastian Borgeaud Arthur Mensch Jordan Hoffmann Trevor Cai Eliza Rutherford Katie Millican George van den Driessche Jean-Baptiste Lespiau Bogdan Damoc Aidan Clark Diego de Las Casas Aurelia Guy Jacob Menick Roman Ring Tom Hennigan Saffron Huang Loren Maggiore Chris Jones Albin Cassirer Andy Brock Michela Paganini Geoffrey Irving Oriol Vinyals Simon Osindero Karen Simonyan Jack W. Rae Erich Elsen and Laurent Sifre. 2021. Improving Language Models by Retrieving from Trillions of Tokens. arxiv:2112.04426."},{"key":"e_1_3_2_1_4_1","unstructured":"Tom B. Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell Sandhini Agarwal Ariel Herbert-Voss Gretchen Krueger Tom Henighan Rewon Child Aditya Ramesh Daniel M. Ziegler Jeffrey Wu Clemens Winter Christopher Hesse Mark Chen Eric Sigler Mateusz Litwin Scott Gray Benjamin Chess Jack Clark Christopher Berner Sam McCandlish Alec Radford Ilya Sutskever and Dario Amodei. 2020. Language Models Are Few-Shot Learners. arxiv:2005.14165."},{"key":"e_1_3_2_1_5_1","volume-title":"Ubuntu: Enterprise Open Source and Linux. https:\/\/ubuntu.com\/ Accessed: 2023-06-01","year":"2023","unstructured":"Canonical. 2023. Ubuntu: Enterprise Open Source and Linux. https:\/\/ubuntu.com\/ Accessed: 2023-06-01"},{"key":"e_1_3_2_1_6_1","unstructured":"Guoqiang Chen Xiuwei Shang Shaoyin Cheng Yanming Zhang Weiming Zhang and Nenghai Yu. 2024. FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs. arxiv:2403.18403."},{"key":"e_1_3_2_1_7_1","volume-title":"Xing","author":"Chiang Wei-Lin","year":"2023","unstructured":"Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E. 
Gonzalez, Ion Stoica, and Eric P. Xing. 2023. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality. https:\/\/lmsys.org\/blog\/2023-03-30-vicuna\/"},{"key":"e_1_3_2_1_8_1","volume-title":"USENIX Security Symposium. 99\u2013116","author":"Chua Zheng Leong","year":"2017","unstructured":"Zheng Leong Chua, Shiqi Shen, Prateek Saxena, and Zhenkai Liang. 2017. Neural Nets Can Learn Function Type Signatures From Binaries. In USENIX Security Symposium. 99\u2013116."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/EuroSP53844.2022.00012"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2019.00003"},{"key":"e_1_3_2_1_11_1","volume-title":"Zhang","author":"Fan Angela","year":"2023","unstructured":"Angela Fan, Beliz Gokkaya, Mark Harman, Mitya Lyubarskiy, Shubho Sengupta, Shin Yoo, and Jie M. Zhang. 2023. Large Language Models for Software Engineering: Survey and Open Problems. arxiv:2310.03533."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","unstructured":"Zhangyin Feng Daya Guo Duyu Tang Nan Duan Xiaocheng Feng Ming Gong Linjun Shou Bing Qin Ting Liu Daxin Jiang and Ming Zhou. 2020. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. https:\/\/doi.org\/10.48550\/arXiv.2002.08155 arxiv:2002.08155. 10.48550\/arXiv.2002.08155","DOI":"10.48550\/arXiv.2002.08155"},{"key":"e_1_3_2_1_13_1","unstructured":"Zeyu Gao Hao Wang Yuchen Zhou Wenyu Zhu and Chao Zhang. 2023. How Far Have We Gone in Vulnerability Detection Using Large Language Models. arxiv:2311.12420."},{"key":"e_1_3_2_1_14_1","volume-title":"USENIX Security Symposium. 1787\u20131804","author":"Guo Wenbo","year":"2019","unstructured":"Wenbo Guo, Dongliang Mu, Xinyu Xing, Min Du, and Dawn Song. 2019. DEEPVSA: Facilitating Value-set Analysis with Deep Learning for Postmortem Program Analysis. In USENIX Security Symposium. 
1787\u20131804."},{"key":"e_1_3_2_1_15_1","volume-title":"33rd USENIX Security Symposium (USENIX Security 24)","author":"He Haojie","year":"2024","unstructured":"Haojie He, Xingwei Lin, Ziang Weng, Ruijie Zhao, Shuitao Gan, Libo Chen, Yuede Ji, Jiashui Wang, and Zhi Xue. 2024. Code is not Natural Language: Unlock the Power of Semantics-Oriented Graph Representation for Binary Code Similarity Detection. In 33rd USENIX Security Symposium (USENIX Security 24), PHILADELPHIA, PA."},{"key":"e_1_3_2_1_16_1","unstructured":"Hex-Rays. 2015. IDA Pro Disassembler and Debugger. https:\/\/www.hex-rays.com\/products\/ida\/index.shtml"},{"key":"e_1_3_2_1_17_1","volume-title":"Long short-term memory. Neural computation, 9, 8","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory. Neural computation, 9, 8 (1997), 1735\u20131780."},{"key":"e_1_3_2_1_18_1","unstructured":"Jordan Hoffmann Sebastian Borgeaud Arthur Mensch Elena Buchatskaya Trevor Cai Eliza Rutherford Diego de Las Casas Lisa Anne Hendricks Johannes Welbl Aidan Clark Tom Hennigan Eric Noland Katie Millican George van den Driessche Bogdan Damoc Aurelia Guy Simon Osindero Karen Simonyan Erich Elsen Jack W. Rae Oriol Vinyals and Laurent Sifre. 2022. Training Compute-Optimal Large Language Models. arxiv:2203.15556."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Xinyi Hou Yanjie Zhao Yue Liu Zhou Yang Kailong Wang Li Li Xiapu Luo David Lo John Grundy and Haoyu Wang. 2024. Large Language Models for Software Engineering: A Systematic Literature Review. arxiv:2308.10620.","DOI":"10.1145\/3695988"},{"key":"e_1_3_2_1_20_1","volume-title":"The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=A0HKeKl4Nl","author":"Jain Samyak","year":"2024","unstructured":"Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. 
Dick, Hidenori Tanaka, Tim Rockt\u00e4schel, Edward Grefenstette, and David Krueger. 2024. What happens when you fine-tuning your model? Mechanistic analysis of procedurally generated tasks. In The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=A0HKeKl4Nl"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"crossref","unstructured":"Ling Jiang Junwen An Huihui Huang Qiyi Tang Sen Nie Shi Wu and Yuqun Zhang. 2024. BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source Code Matching. arXiv preprint arXiv:2401.11161.","DOI":"10.1145\/3597503.3639100"},{"key":"e_1_3_2_1_22_1","unstructured":"Nan Jiang Chengxiao Wang Kevin Liu Xiangzhe Xu Lin Tan and Xiangyu Zhang. 2023. Nova^+: Generative Language Models for Binaries. arxiv:2311.13721."},{"key":"e_1_3_2_1_23_1","volume-title":"Swe-bench: Can language models resolve real-world github issues? arXiv preprint arXiv:2310.06770.","author":"Jimenez Carlos E","year":"2023","unstructured":"Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. 2023. Swe-bench: Can language models resolve real-world github issues? arXiv preprint arXiv:2310.06770."},{"key":"e_1_3_2_1_24_1","unstructured":"Xin Jin Jonathan Larson Weiwei Yang and Zhiqiang Lin. 2023. Binary code summarization: Benchmarking chatgpt\/gpt-4 and other large language models. arXiv preprint arXiv:2312.09601."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","unstructured":"Jared Kaplan Sam McCandlish Tom Henighan Tom B. Brown Benjamin Chess Rewon Child Scott Gray Alec Radford Jeffrey Wu and Dario Amodei. 2020. Scaling Laws for Neural Language Models. https:\/\/doi.org\/10.48550\/arXiv.2001.08361 arxiv:2001.08361. 10.48550\/arXiv.2001.08361","DOI":"10.48550\/arXiv.2001.08361"},{"key":"e_1_3_2_1_26_1","unstructured":"Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. 
arXiv preprint arXiv:1609.02907."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Taku Kudo. 2018. Subword regularization: Improving neural network translation models with multiple subword candidates. arXiv preprint arXiv:1804.10959.","DOI":"10.18653\/v1\/P18-1007"},{"key":"e_1_3_2_1_28_1","volume-title":"2019 34th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 628\u2013639","author":"Lacomis Jeremy","year":"2019","unstructured":"Jeremy Lacomis, Pengcheng Yin, Edward Schwartz, Miltiadis Allamanis, Claire Le Goues, Graham Neubig, and Bogdan Vasilescu. 2019. Dire: A neural approach to decompiled identifier naming. In 2019 34th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 628\u2013639."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_3_2_1_30_1","volume-title":"International Conference on Machine Learning. 12888\u201312900","author":"Li Junnan","year":"2022","unstructured":"Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. 2022. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International Conference on Machine Learning. 
12888\u201312900."},{"key":"e_1_3_2_1_31_1","unstructured":"Raymond Li Loubna Ben Allal Yangtian Zi Niklas Muennighoff Denis Kocetkov Chenghao Mou Marc Marone Christopher Akiki Jia Li Jenny Chim Qian Liu Evgenii Zheltonozhskii Terry Yue Zhuo Thomas Wang Olivier Dehaene Mishig Davaadorj Joel Lamy-Poirier Jo\u00e3o Monteiro Oleh Shliazhko Nicolas Gontier Nicholas Meade Armel Zebaze Ming-Ho Yee Logesh Kumar Umapathi Jian Zhu Benjamin Lipkin Muhtasham Oblokulov Zhiruo Wang Rudra Murthy Jason Stillerman Siva Sankalp Patel Dmitry Abulkhanov Marco Zocca Manan Dey Zhihan Zhang Nour Fahmy Urvashi Bhattacharyya Wenhao Yu Swayam Singh Sasha Luccioni Paulo Villegas Maxim Kunakov Fedor Zhdanov Manuel Romero Tony Lee Nadav Timor Jennifer Ding Claire Schlesinger Hailey Schoelkopf Jan Ebert Tri Dao Mayank Mishra Alex Gu Jennifer Robinson Carolyn Jane Anderson Brendan Dolan-Gavitt Danish Contractor Siva Reddy Daniel Fried Dzmitry Bahdanau Yacine Jernite Carlos Mu\u00f1oz Ferrandis Sean Hughes Thomas Wolf Arjun Guha Leandro von Werra and Harm de Vries. 2023. StarCoder: may the source be with you!. arxiv:2305.06161."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3460120.3484587"},{"key":"e_1_3_2_1_33_1","volume-title":"International conference on machine learning. 3835\u20133845","author":"Li Yujia","year":"2019","unstructured":"Yujia Li, Chenjie Gu, Thomas Dullien, Oriol Vinyals, and Pushmeet Kohli. 2019. Graph matching networks for learning the similarity of graph structured objects. In International conference on machine learning. 3835\u20133845."},{"key":"e_1_3_2_1_34_1","unstructured":"Yujia Li Chenjie Gu Thomas Dullien Oriol Vinyals and Pushmeet Kohli. 2019. Graph Matching Networks for Learning the Similarity of Graph Structured Objects. In arXiv:1904.12787 [Cs Stat]. arxiv:1904.12787."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","unstructured":"Zehan Li Xin Zhang Yanzhao Zhang Dingkun Long Pengjun Xie and Meishan Zhang. 2023. 
Towards General Text Embeddings with Multi-stage Contrastive Learning. https:\/\/doi.org\/10.48550\/arXiv.2308.03281 arxiv:2308.03281. 10.48550\/arXiv.2308.03281","DOI":"10.48550\/arXiv.2308.03281"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3238147.3238199"},{"key":"e_1_3_2_1_37_1","unstructured":"Haotian Liu Chunyuan Li Qingyang Wu and Yong Jae Lee. 2023. Visual Instruction Tuning. arxiv:2304.08485."},{"key":"e_1_3_2_1_38_1","unstructured":"Huidong Liu Shaoyuan Xu Jinmiao Fu Yang Liu Ning Xie Chien-Chih Wang Bryan Wang and Yi Sun. 2021. CMA-CLIP: Cross-Modality Attention CLIP for Image-Text Classification. arxiv:2112.03562."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. https:\/\/doi.org\/10.48550\/arXiv.1907.11692 arxiv:1907.11692. 10.48550\/arXiv.1907.11692","DOI":"10.48550\/arXiv.1907.11692"},{"key":"e_1_3_2_1_40_1","unstructured":"LLVM. 2023. Clang: a C language family frontend for LLVM. https:\/\/clang.llvm.org Accessed: 2023-06-01"},{"key":"e_1_3_2_1_41_1","volume-title":"Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, and Yuxiang Wei.","author":"Lozhkov Anton","year":"2024","unstructured":"Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, and Yuxiang Wei. 2024. StarCoder 2 and The Stack v2: The Next Generation. arXiv preprint arXiv:2402.19173."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2102.04664"},{"key":"e_1_3_2_1_43_1","unstructured":"Zhenhao Luo Pengfei Wang Baosheng Wang Yong Tang Wei Xie Xu Zhou Danjun Liu and Kai Lu. [n. d.]. 
VulHawk: Cross-architecture Vulnerability Detection with Entropy-based Binary Code Search."},{"key":"e_1_3_2_1_44_1","unstructured":"Andrea Marcelli Mariano Graziano Mohamad Mansouri Xabier Ugarte-Pedrero Davide Balzarotti and Yanick Fratantonio. 2022. How Machine Learning Is Solving the Binary Function Similarity Problem. 18."},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.14722\/bar.2019.23020"},{"key":"e_1_3_2_1_46_1","volume-title":"Fabio Petroni, Leonardo Querzoni, and Roberto Baldoni.","author":"Massarelli Luca","year":"2019","unstructured":"Luca Massarelli, Giuseppe Antonio Di Luna, Fabio Petroni, Leonardo Querzoni, and Roberto Baldoni. 2019. SAFE: Self-Attentive Function Embeddings for Binary Similarity. In arXiv:1811.05296 [Cs]. arxiv:1811.05296."},{"key":"e_1_3_2_1_47_1","unstructured":"Aaron van den Oord Yazhe Li and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748."},{"key":"e_1_3_2_1_49_1","unstructured":"OpenAI. 2023. ChatGPT. https:\/\/chat.openai.com Accessed: 2023-06-06"},{"key":"e_1_3_2_1_50_1","volume-title":"Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, and Luca Antiga. 2019. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32 (2019), 8026\u20138037."},{"key":"e_1_3_2_1_51_1","volume-title":"XDA: Accurate, Robust Disassembly with Transfer Learning. arxiv:2010.00770.","author":"Pei Kexin","year":"2020","unstructured":"Kexin Pei, Jonas Guan, David Williams-King, Junfeng Yang, and Suman Jana. 2020. XDA: Accurate, Robust Disassembly with Transfer Learning. 
arxiv:2010.00770."},{"key":"e_1_3_2_1_52_1","volume-title":"Trex: Learning Execution Semantics from Micro-Traces for Binary Similarity. arXiv:2012.08680 [cs], April, arxiv:2012.08680.","author":"Pei Kexin","year":"2021","unstructured":"Kexin Pei, Zhou Xuan, Junfeng Yang, Suman Jana, and Baishakhi Ray. 2021. Trex: Learning Execution Semantics from Micro-Traces for Binary Similarity. arXiv:2012.08680 [cs], April, arxiv:2012.08680."},{"key":"e_1_3_2_1_53_1","volume-title":"Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever.","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. arxiv:2103.00020."},{"key":"e_1_3_2_1_54_1","unstructured":"Alec Radford Jeffrey Wu Rewon Child David Luan Dario Amodei and Ilya Sutskever. [n. d.]. Language Models Are Unsupervised Multitask Learners."},{"key":"e_1_3_2_1_55_1","unstructured":"Edward Raff Jon Barker Jared Sylvester Robert Brandon Bryan Catanzaro and Charles Nicholas. 2017. Malware detection by eating a whole exe. arXiv preprint arXiv:1710.09435."},{"key":"e_1_3_2_1_56_1","volume-title":"Yossi Adi, Jingyu Liu, Tal Remez, and J\u00e9r\u00e9my Rapin.","author":"Roziere Baptiste","year":"2023","unstructured":"Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, and J\u00e9r\u00e9my Rapin. 2023. Code llama: Open foundation models for code. arXiv preprint arXiv:2308.12950."},{"key":"e_1_3_2_1_57_1","volume-title":"Sentence Transformer: MPNet-Base-V2. https:\/\/huggingface.co\/sentence-transformers\/all-mpnet-base-v2\/ Accessed: 2023-06-01","year":"2023","unstructured":"sentence transformers. 2023. 
Sentence Transformer: MPNet-Base-V2. https:\/\/huggingface.co\/sentence-transformers\/all-mpnet-base-v2\/ Accessed: 2023-06-01"},{"key":"e_1_3_2_1_58_1","volume-title":"24th USENIX Security Symposium (USENIX Security 15). 611\u2013626.","author":"Richard Shin Eui Chul","unstructured":"Eui Chul Richard Shin, Dawn Song, and Reza Moazzezi. 2015. Recognizing functions in binaries with neural networks. In 24th USENIX Security Symposium (USENIX Security 15). 611\u2013626."},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"crossref","unstructured":"Student. 1908. The probable error of a mean. Biometrika 1\u201325.","DOI":"10.2307\/2331554"},{"key":"e_1_3_2_1_60_1","unstructured":"Yuqiang Sun Daoyuan Wu Yue Xue Han Liu Wei Ma Lyuye Zhang Miaolei Shi and Yang Liu. 2024. LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs\u2019 Vulnerability Reasoning. arXiv preprint arXiv:2401.16185."},{"key":"e_1_3_2_1_61_1","unstructured":"Hanzhuo Tan Qi Luo Jing Li and Yuqun Zhang. 2024. LLM4Decompile: Decompiling Binary Code with Large Language Models. arxiv:2403.05286."},{"key":"e_1_3_2_1_62_1","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timothee Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar Aurelien Rodriguez Armand Joulin Edouard Grave and Guillaume Lample. [n. d.]. LLaMA: Open and Efficient Foundation Language Models."},{"key":"e_1_3_2_1_63_1","article-title":"Visualizing data using t-SNE","volume":"9","author":"der Maaten Laurens Van","year":"2008","unstructured":"Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of machine learning research, 9, 11 (2008).","journal-title":"Journal of machine learning research"},{"key":"e_1_3_2_1_64_1","volume-title":"Order Matters: Sequence to Sequence for Sets. 
arXiv:1511.06391 [cs, stat], Feb., arxiv:1511.06391.","author":"Vinyals Oriol","year":"2016","unstructured":"Oriol Vinyals, Samy Bengio, and Manjunath Kudlur. 2016. Order Matters: Sequence to Sequence for Sets. arXiv:1511.06391 [cs, stat], Feb., arxiv:1511.06391."},{"key":"e_1_3_2_1_65_1","doi-asserted-by":"crossref","unstructured":"Hao Wang Zeyu Gao Chao Zhang Mingyang Sun Yuchen Zhou Han Qiu and Xi Xiao. 2024. CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection. arxiv:2402.18818.","DOI":"10.1145\/3650212.3652117"},{"key":"e_1_3_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3533767.3534367"},{"key":"e_1_3_2_1_67_1","volume-title":"COSEA: Convolutional Code Search with Layer-wise Attention. arxiv:2010.09520.","author":"Wang Hao","year":"2020","unstructured":"Hao Wang, Jia Zhang, Yingce Xia, Jiang Bian, Chao Zhang, and Tie-Yan Liu. 2020. COSEA: Convolutional Code Search with Layer-wise Attention. arxiv:2010.09520."},{"key":"e_1_3_2_1_68_1","volume-title":"Nghi D. Q. Bui, Junnan Li, and Steven C. H. Hoi.","author":"Wang Yue","year":"2023","unstructured":"Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi D. Q. Bui, Junnan Li, and Steven C. H. Hoi. 2023. CodeT5+: Open Code Large Language Models for Code Understanding and Generation. arxiv:2305.07922."},{"key":"e_1_3_2_1_69_1","volume-title":"Hoi","author":"Wang Yue","year":"2021","unstructured":"Yue Wang, Weishi Wang, Shafiq Joty, and Steven C. H. Hoi. 2021. CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation. 
Sept., arxiv:2109.00859."},{"key":"e_1_3_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133956.3134018"},{"key":"e_1_3_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133956.3134018"},{"key":"e_1_3_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2021.3056139"},{"key":"e_1_3_2_1_73_1","volume-title":"31st USENIX Security Symposium (USENIX Security 22)","author":"Yu Sheng","year":"2022","unstructured":"Sheng Yu, Yu Qu, Xunchao Hu, and Heng Yin. 2022. DeepDi: Learning a Relational Graph Convolutional Network Model on Instructions for Fast and Accurate Disassembly. In 31st USENIX Security Symposium (USENIX Security 22). USENIX Association, Boston, MA. 2709\u20132725. isbn:978-1-939133-31-1 https:\/\/www.usenix.org\/conference\/usenixsecurity22\/presentation\/yu-sheng"},{"key":"e_1_3_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i01.5466"},{"key":"e_1_3_2_1_75_1","first-page":"3872","article-title":"CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching","author":"Yu Zeping","year":"2020","unstructured":"Zeping Yu, Wenxin Zheng, Jiaqi Wang, Qiyi Tang, Sen Nie, and Shi Wu. 2020. CodeCMR: Cross-Modal Retrieval For Function-Level Binary Source Code Matching. In Advances in Neural Information Processing Systems. 33, Curran Associates, Inc., 3872\u20133883.","journal-title":"Advances in Neural Information Processing Systems. 33, Curran Associates, Inc."},{"key":"e_1_3_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2305.11206"},{"key":"e_1_3_2_1_77_1","unstructured":"Wenyu Zhu Hao Wang Yuchen Zhou Jiaming Wang Zihan Sha Zeyu Gao and Chao Zhang. 2023. kTrans: Knowledge-Aware Transformer for Binary Code Embedding. 
arxiv:2308.12659."}],"event":{"name":"ISSTA '24: 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis","location":"Vienna Austria","acronym":"ISSTA '24","sponsor":["SIGSOFT ACM Special Interest Group on Software Engineering","AITO"]},"container-title":["Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3650212.3652145","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3650212.3652145","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:50:06Z","timestamp":1750287006000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3650212.3652145"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,11]]},"references-count":76,"alternative-id":["10.1145\/3650212.3652145","10.1145\/3650212"],"URL":"https:\/\/doi.org\/10.1145\/3650212.3652145","relation":{},"subject":[],"published":{"date-parts":[[2024,9,11]]},"assertion":[{"value":"2024-09-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}