{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,29]],"date-time":"2026-05-29T11:34:38Z","timestamp":1780054478247,"version":"3.54.0"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2025,4,28]],"date-time":"2025-04-28T00:00:00Z","timestamp":1745798400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2025,5,31]]},"abstract":"<jats:p>\n            Predicting function names in stripped binaries, which requires succinctly summarizing semantics of binary code in natural languages, is a crucial but challenging task. Recently, many machine learning based solutions have been proposed. However, they have poor generalizability, i.e., fail to handle unseen binaries. To advance the state of the art, we present\n            <jats:bold>\n              large assembly language Model (\n              <jats:monospace>llasm<\/jats:monospace>\n              )\n            <\/jats:bold>\n            , a novel framework which fuses encoder-only and decoder-only LLMs for function name prediction. It refines encoder-only models to preserve more binary information and learn better binary representations. Then it adopts a novel architecture to project the encoding to the input space of a decoder-only natural language model, which enables it to have better capability of inferring general knowledge and better generalizability. We have evaluated\n            <jats:monospace>llasm<\/jats:monospace>\n            in the BinaryCorp and Debin datasets.\n            <jats:monospace>llasm<\/jats:monospace>\n            outperforms the state-of-the-art function name prediction tools by up to 19.9%, 40.7%, and 36.5% in precision, recall, and F1 score, with significantly better generalizability in unseen binaries. Our case studies further demonstrate the practical use cases of\n            <jats:monospace>llasm<\/jats:monospace>\n            in analyzing real-world malware, showing the usefulness of function name prediction.\n          <\/jats:p>","DOI":"10.1145\/3702988","type":"journal-article","created":{"date-parts":[[2024,11,5]],"date-time":"2024-11-05T16:29:38Z","timestamp":1730824178000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["llasm: Naming Functions in Binaries by Fusing Encoder-only and Decoder-only LLMs"],"prefix":"10.1145","volume":"34","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1020-9006","authenticated-orcid":false,"given":"Zihan","family":"Sha","sequence":"first","affiliation":[{"name":"Key Laboratory of Cyberspace Security, Ministry of Education, Zhengzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0536-5039","authenticated-orcid":false,"given":"Hao","family":"Wang","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-2318-9061","authenticated-orcid":false,"given":"Zeyu","family":"Gao","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2797-1355","authenticated-orcid":false,"given":"Hui","family":"Shu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Cyberspace Security, Ministry of Education, Zhengzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-1287-096X","authenticated-orcid":false,"given":"Bolun","family":"Zhang","sequence":"additional","affiliation":[{"name":"Institute of Information Engineering CAS, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-3823-1069","authenticated-orcid":false,"given":"Ziqing","family":"Wang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Cyberspace Security, Ministry of Education, Zhengzhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7894-8828","authenticated-orcid":false,"given":"Chao","family":"Zhang","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,4,28]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"23716","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","volume":"35","author":"Alayrac Jean-Baptiste","year":"2022","unstructured":"Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katherine Millican, Malcolm Reynolds, et al. 2022. Flamingo: A visual language model for few-shot learning. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 35, 23716\u201323736."},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","unstructured":"S\u00e9bastien Bubeck Varun Chandrasekaran Ronen Eldan Johannes Gehrke Eric Horvitz Ece Kamar Peter Lee Yin Tat Lee Yuanzhi Li Scott Lundberg et al. 2023. Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv:2303.12712. Retrieved from 10.48550\/arXiv.2303.12712","DOI":"10.48550\/arXiv.2303.12712"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3428293"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1423"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1423"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2019.00003"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3243734.3243866"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939756"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01157"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3548606.3560612"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3579856.3582823"},{"key":"e_1_3_1_14_2","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR \u201917)","author":"Kipf Thomas N.","year":"2017","unstructured":"Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In Proceedings of the International Conference on Learning Representations (ICLR \u201917)."},{"key":"e_1_3_1_15_2","volume-title":"Proceedings of the 31st International Conference on International Conference on Machine Learning (ICML\u201914),","volume":"32","author":"Le Quoc","year":"2014","unstructured":"Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on International Conference on Machine Learning (ICML\u201914), Vol. 32. JMLR.org, II\u20131188\u2013II\u20131196."},{"key":"e_1_3_1_16_2","first-page":"12888","volume-title":"Proceedings of the International Conference on Machine Learning.","author":"Li Junnan","year":"2022","unstructured":"Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. 2022. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In Proceedings of the International Conference on Machine Learning. PMLR, 12888\u201312900."},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3460120.3484587"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.324"},{"key":"e_1_3_1_19_2","first-page":"34892","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","volume":"36","author":"Liu Haotian","year":"2023","unstructured":"Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. 2023. Visual instruction tuning. In Proceedings of the Advances in Neural Information Processing Systems. A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine (Eds.), Vol. 36. Curran Associates, Inc., 34892\u201334916. Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2023\/file\/6dcf277ea32ce3288914faf369fe6de0-Paper-Conference.pdf"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv:1907.11692. Retrieved from 10.48550\/arXiv.1907.11692","DOI":"10.48550\/arXiv.1907.11692"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.14722\/ndss.2023.24415"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.14722\/bar.2019.23020"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-22038-9_15"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME.2019.00041"},{"key":"e_1_3_1_25_2","unstructured":"OpenAI. 2023. ChatGPT. Retrieved from https:\/\/openai.com\/chatgpt"},{"key":"e_1_3_1_26_2","first-page":"8026","volume-title":"Proceedings of the Advances in Neural Information Processing Systems,","volume":"32","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 32, 8026\u20138037."},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/SP46215.2023.10179439"},{"key":"e_1_3_1_28_2","unstructured":"Kexin Pei Zhou Xuan Junfeng Yang Suman Sekhar Jana and Baishakhi Ray. 2020. Trex: Learning execution semantics from micro-traces for binary similarity. arXiv:abs\/2012.08680. Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:229220356"},{"key":"e_1_3_1_29_2","first-page":"8748","volume-title":"Proceedings of the International Conference on Machine Learning.","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning. PMLR, 8748\u20138763."},{"key":"e_1_3_1_30_2","unstructured":"Alec Radford Karthik Narasimhan Tim Salimans and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. (2018)."},{"issue":"8","key":"e_1_3_1_31_2","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1, 8 (2019), 9.","journal-title":"OpenAI Blog"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.5555\/3455716.3455856"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2008.2005605"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1162"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","unstructured":"Jianlin Su Yu Lu Shengfeng Pan Ahmed Murtadha Bo Wen and Yunfeng Liu. 2022. RoFormer: Enhanced transformer with rotary position embedding. arXiv:2104.09864 [cs]. Retrieved from 10.48550\/arXiv.2104.09864","DOI":"10.48550\/arXiv.2104.09864"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar et al. 2023. Llama: Open and efficient foundation language models. arXiv:2302.13971. Retrieved from 10.48550\/arXiv.2302.13971","DOI":"10.48550\/arXiv.2302.13971"},{"key":"e_1_3_1_37_2","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288 [cs.CL]."},{"key":"e_1_3_1_38_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems,","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 30."},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3650212.3652145"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3533767.3534367"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.685"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3658644.3670340"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3133956.3134018"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i01.5466"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/210134.210179"},{"key":"e_1_3_1_46_2","unstructured":"Hongxiang Zhang Yuyang Rong Yifeng He and Hao Chen. 2024. LLAMAFUZZ: Large language model enhanced greybox fuzzing. arXiv:abs\/2406.07714. Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:270391217"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","unstructured":"Qiming Zhang Jing Zhang Yufei Xu and Dacheng Tao. 2023. Vision transformer with quadrangle attention. arXiv:2303.15105. Retrieved from 10.48550\/arXiv.2303.15105","DOI":"10.48550\/arXiv.2303.15105"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/SP46215.2023.10179482"}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3702988","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3702988","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:04Z","timestamp":1750295884000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3702988"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,28]]},"references-count":47,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,5,31]]}},"alternative-id":["10.1145\/3702988"],"URL":"https:\/\/doi.org\/10.1145\/3702988","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,28]]},"assertion":[{"value":"2024-03-31","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-18","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-04-28","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}