{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T18:04:43Z","timestamp":1778609083477,"version":"3.51.4"},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,9]]},"abstract":"<jats:p>Binary analysis is crucial for software security, offering insights into compiled programs without source code. As large language models (LLMs) excel in language tasks, their potential for decoding complex binary data structures is growing. However, the lack of standardized benchmarks hinders their evaluation and progress in this domain.\n\nTo bridge this gap, we introduce BinMetric, the first comprehensive benchmark designed specifically to evaluate LLMs' performance on binary analysis tasks. BinMetric comprises 1,000 questions derived from 20 real-world open-source projects across 6 practical binary analysis tasks, such as decompilation and code summarization, which reflect actual reverse engineering scenarios. Our empirical study on this benchmark investigates various state-of-the-art LLMs, revealing their strengths and limitations. The findings indicate that while LLMs show strong potential, challenges remain, particularly in precise binary lifting and assembly synthesis. 
In summary, BinMetric makes a significant step forward in measuring binary analysis capabilities of LLMs, establishing a new benchmark leaderboard, and our study offers valuable insights for advancing LLMs in software security.<\/jats:p>","DOI":"10.24963\/ijcai.2025\/858","type":"proceedings-article","created":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T08:10:40Z","timestamp":1758269440000},"page":"7715-7723","source":"Crossref","is-referenced-by-count":4,"title":["BinMetric: A Comprehensive Binary Code Analysis Benchmark for Large Language Models"],"prefix":"10.24963","author":[{"given":"Xiuwei","family":"Shang","sequence":"first","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Guoqiang","family":"Chen","sequence":"additional","affiliation":[{"name":"QI-ANXIN Technology Research Institute, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shaoyin","family":"Cheng","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"},{"name":"Anhui Province Key Laboratory of Digital Security, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Benlong","family":"Wu","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Li","family":"Hu","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gangyang","family":"Li","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Weiming","family":"Zhang","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"},{"name":"Anhui 
Province Key Laboratory of Digital Security, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nenghai","family":"Yu","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"},{"name":"Anhui Province Key Laboratory of Digital Security, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"10584","event":{"name":"Thirty-Fourth International Joint Conference on Artificial Intelligence {IJCAI-25}","theme":"Artificial Intelligence","location":"Montreal, Canada","acronym":"IJCAI-2025","number":"34","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"start":{"date-parts":[[2025,8,16]]},"end":{"date-parts":[[2025,8,22]]}},"container-title":["Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2025,9,23]],"date-time":"2025-09-23T11:35:19Z","timestamp":1758627319000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2025\/858"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2025,9]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2025\/858","relation":{},"subject":[],"published":{"date-parts":[[2025,9]]}}}