{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T00:54:31Z","timestamp":1773708871410,"version":"3.50.1"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","funder":[{"name":"National Natural Science Foundation of China","award":["61932012, 62372228, U24A20337"],"award-info":[{"award-number":["61932012, 62372228, U24A20337"]}]},{"name":"the Fundamental Research Funds for the Central Universities","award":["14380029"],"award-info":[{"award-number":["14380029"]}]},{"name":"Open Project of State Key Laboratory for Novel Software Technology at Nanjing University","award":["KFKT2024B21"],"award-info":[{"award-number":["KFKT2024B21"]}]},{"name":"the Science, Technology and Innovation Commission of Shenzhen Municipality","award":["CJGJZD20200617103001003, 2021Szvup057"],"award-info":[{"award-number":["CJGJZD20200617103001003, 2021Szvup057"]}]},{"name":"National Research Foundation, Singapore, and DSO National Laboratories under the AI Singapore Programme","award":["AISG2-GC-2023-008"],"award-info":[{"award-number":["AISG2-GC-2023-008"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2025,6,19]]},"abstract":"<jats:p>\n            Neural code models (NCMs) have been widely used to address various code understanding tasks, such as defect detection. However, numerous recent studies reveal that such models are vulnerable to backdoor attacks. Backdoored NCMs function normally on normal\/clean code snippets, but exhibit adversary-expected behavior on poisoned code snippets injected with the adversary-crafted trigger. It poses a significant security threat. For example, a backdoored defect detection model may misclassify user-submitted defective code as non-defective. 
If this insecure code is then integrated into critical systems, like autonomous driving systems, it could jeopardize life safety. Therefore, there is an urgent need for effective techniques to detect and eliminate backdoors stealthily implanted in NCMs. To address this issue, in this paper, we innovatively propose a backdoor elimination technique for secure code understanding, called EliBadCode. EliBadCode eliminates backdoors in NCMs by inverting\/reverse-engineering and unlearning backdoor triggers. Specifically, EliBadCode first filters the model vocabulary for trigger tokens based on the naming conventions of specific programming languages to reduce the trigger search space and cost. Then, EliBadCode introduces a sample-specific trigger position identification method, which can reduce the interference of\n            <jats:italic toggle=\"yes\">non-backdoor<\/jats:italic>\n            (\n            <jats:italic toggle=\"yes\">adversarial<\/jats:italic>\n            )\n            <jats:italic toggle=\"yes\">perturbations<\/jats:italic>\n            for subsequent trigger inversion, thereby producing effective inverted backdoor triggers efficiently. Backdoor triggers can be viewed as\n            <jats:italic toggle=\"yes\">backdoor<\/jats:italic>\n            (\n            <jats:italic toggle=\"yes\">adversarial<\/jats:italic>\n            )\n            <jats:italic toggle=\"yes\">perturbations<\/jats:italic>\n            . Subsequently, EliBadCode employs a Greedy Coordinate Gradient algorithm to optimize the inverted trigger and designs a trigger anchoring method to purify the inverted trigger. Finally, EliBadCode eliminates backdoors through model unlearning. We evaluate the effectiveness of EliBadCode in eliminating backdoors implanted in multiple NCMs used for three safety-critical code understanding tasks. The results demonstrate that EliBadCode can effectively eliminate backdoors while having minimal adverse effects on the normal functionality of the model. 
For instance, on defect detection tasks, EliBadCode substantially decreases the average Attack Success Rate (ASR) of the advanced backdoor attack from 99.76% to 2.64%, significantly outperforming the three baselines. The clean model produced by EliBadCode exhibits an average decrease in defect prediction accuracy of only 0.01% (the same as the baseline).\n          <\/jats:p>","DOI":"10.1145\/3715782","type":"journal-article","created":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:15:34Z","timestamp":1750346134000},"page":"1386-1408","source":"Crossref","is-referenced-by-count":5,"title":["Eliminating Backdoors in Neural Code Models for Secure Code Understanding"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9236-8264","authenticated-orcid":false,"given":"Weisong","family":"Sun","sequence":"first","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3380-5564","authenticated-orcid":false,"given":"Yuchen","family":"Chen","sequence":"additional","affiliation":[{"name":"Nanjing University, Nanjing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9930-7111","authenticated-orcid":false,"given":"Chunrong","family":"Fang","sequence":"additional","affiliation":[{"name":"Nanjing University, Nanjing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7235-2377","authenticated-orcid":false,"given":"Yebo","family":"Feng","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-3166-8007","authenticated-orcid":false,"given":"Yuan","family":"Xiao","sequence":"additional","affiliation":[{"name":"Nanjing University, Nanjing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-8661-6133","authenticated-orcid":false,"given":"An","family":"Guo","sequence":"additional","affiliation":[{"name":"Nanjing University, Nanjing, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2495-3805","authenticated-orcid":false,"given":"Quanjun","family":"Zhang","sequence":"additional","affiliation":[{"name":"Nanjing University, Nanjing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9592-7022","authenticated-orcid":false,"given":"Zhenyu","family":"Chen","sequence":"additional","affiliation":[{"name":"Nanjing University, Nanjing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7743-1296","authenticated-orcid":false,"given":"Baowen","family":"Xu","sequence":"additional","affiliation":[{"name":"Nanjing University, Nanjing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7300-9215","authenticated-orcid":false,"given":"Yang","family":"Liu","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2025,6,19]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1811.03728"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2410.15631"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3395363.3397362"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3180155.3180167"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2406.03508"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2106.09685"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1909.09436"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2312.04004"},{"key":"e_1_2_1_11_1","volume-title":"Java Identifiers: Definition, Syntax, and Examples. https:\/\/docs.oracle.com\/cd\/E19798-01\/821-1841\/bnbuk\/index.html","author":"Java Oracle","year":"2010","unstructured":"Oracle Java. 2010. 
Java Identifiers: Definition, Syntax, and Examples. https:\/\/docs.oracle.com\/cd\/E19798-01\/821-1841\/bnbuk\/index.html"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1611835114"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3630008"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","unstructured":"Raymond Li Loubna Ben Allal Yangtian Zi Niklas Muennighoff Denis Kocetkov Chenghao Mou Marc Marone Christopher Akiki Jia Li Jenny Chim Qian Liu Evgenii Zheltonozhskii Terry Yue Zhuo Thomas Wang Olivier Dehaene Mishig Davaadorj Joel Lamy-Poirier Jo\u00e3o Monteiro and Oleh Shliazhko. 2023. StarCoder: may the source be with you!. CoRR abs\/2305.06161 1 (2023) 1\u201344. https:\/\/doi.org\/10.48550\/arXiv.2305.06161 10.48550\/arXiv.2305.06161","DOI":"10.48550\/arXiv.2305.06161"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2022.3182979"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3319535.3363216"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14722\/ndss.2018.23291"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP46214.2022.9833579"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2102.04664"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073083.1073135"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_24_1","volume-title":"Language Models are Unsupervised Multitask Learners. OpenAI blog, 1, 8","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners. 
OpenAI blog, 1, 8 (2019), 1\u201312."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR56361.2022.9956690"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","unstructured":"Baptiste Rozi\u00e8re Jonas Gehring Fabian Gloeckle Sten Sootla Itai Gat Xiaoqing Ellen Tan Yossi Adi Jingyu Liu Tal Remez J\u00e9r\u00e9my Rapin Artyom Kozhevnikov Ivan Evtimov Joanna Bitton Manish Bhatt Cristian Canton-Ferrer Aaron Grattafiori Wenhan Xiong Alexandre D\u00e9fossez Jade Copet Faisal Azhar Hugo Touvron Louis Martin Nicolas Usunier Thomas Scialom and Gabriel Synnaeve. 2023. Code Llama: Open Foundation Models for Code. CoRR abs\/2308.12950 1 (2023) 1\u201347. https:\/\/doi.org\/10.48550\/arXiv.2308.12950 10.48550\/arXiv.2308.12950","DOI":"10.48550\/arXiv.2308.12950"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2007.02220"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2202.05749"},{"key":"e_1_2_1_29_1","volume-title":"site:. https:\/\/www.graphpad.com Accessed","author":"Software GraphPad","year":"2025","unstructured":"GraphPad Software. 1995. GraphPad Prism. site:. https:\/\/www.graphpad.com Accessed March, 2025"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10515-024-00421-4"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2502.15830"},{"key":"e_1_2_1_32_1","unstructured":"Weisong Sun Yuchen Chen Chunrong Fang Yebo Feng Yuan Xiao An Guo Quanjun Zhang Zhenyu Chen Baowen Xu and Yang Liu. 2025. Artifacts of EliBadCode. site:. 
https:\/\/github.com\/wssun\/EliBadCode Accessed: 2025"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510140"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3656341"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2403.13271"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME.2014.77"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1811.00636"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2019.00012"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3549153"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2019.00031"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2884781.2884804"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2017"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2405.16112"},{"key":"e_1_2_1_47_1","volume-title":"Critical Values and Probability Levels for the Wilcoxon Rank Sum Test and the Wilcoxon Signed Rank Test","author":"Wilcoxon Frank","unstructured":"Frank Wilcoxon, SK Katti, and Roberta A Wilcox. 1963. Critical Values and Probability Levels for the Wilcoxon Rank Sum Test and the Wilcoxon Signed Rank Test. 
American Cyanamid Company."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3650212.3680304"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2024.3361661"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3428230"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1909.03496"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2307.15043"}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3715782","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:20:30Z","timestamp":1750346430000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3715782"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,19]]},"references-count":52,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2025,6,19]]}},"alternative-id":["10.1145\/3715782"],"URL":"https:\/\/doi.org\/10.1145\/3715782","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,19]]}}}