{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T02:30:47Z","timestamp":1775788247651,"version":"3.50.1"},"reference-count":37,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2023,12,21]],"date-time":"2023-12-21T00:00:00Z","timestamp":1703116800000},"content-version":"vor","delay-in-days":354,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,12,20]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Recent research has revealed that pre-trained models (PTMs) are vulnerable to backdoor attacks before the fine-tuning stage. The attackers can implant transferable task-agnostic backdoors in PTMs, and control model outputs on any downstream task, which poses severe security threats to all downstream applications. Existing backdoor-removal defenses focus on task-specific classification models and they are not suitable for defending PTMs against task-agnostic backdoor attacks. To this end, we propose the first task-agnostic backdoor removal method for PTMs. Based on the selective activation phenomenon in backdoored PTMs, we design a simple and effective backdoor eraser, which continually pre-trains the backdoored PTMs with a regularization term in an end-to-end approach. The regularization term removes backdoor functionalities from PTMs while the continual pre-training maintains the normal functionalities of PTMs. We conduct extensive experiments on pre-trained models across different modalities and architectures. The experimental results show that our method can effectively remove backdoors inside PTMs and preserve benign functionalities of PTMs with a few downstream-task-irrelevant auxiliary data, e.g., unlabeled plain texts. The average attack success rate on three downstream datasets is reduced from 99.88% to 8.10% after our defense on the backdoored BERT. The codes are publicly available at https:\/\/github.com\/thunlp\/RECIPE.<\/jats:p>","DOI":"10.1162\/tacl_a_00622","type":"journal-article","created":{"date-parts":[[2023,12,21]],"date-time":"2023-12-21T14:09:49Z","timestamp":1703167789000},"page":"1608-1623","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":2,"title":["Removing Backdoors in Pre-trained Models by Regularized Continual Pre-training"],"prefix":"10.1162","volume":"11","author":[{"given":"Biru","family":"Zhu","sequence":"first","affiliation":[{"name":"School of Software, Tsinghua University, China. zbr19@mails.tsinghua.edu.cn"}]},{"given":"Ganqu","family":"Cui","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Tsinghua University, China. cgq22@mails.tsinghua.edu.cn"}]},{"given":"Yangyi","family":"Chen","sequence":"additional","affiliation":[{"name":"University of Illinois Urbana-Champaign, USA"}]},{"given":"Yujia","family":"Qin","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Tsinghua University, China"}]},{"given":"Lifan","family":"Yuan","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Tsinghua University, China"}]},{"given":"Chong","family":"Fu","sequence":"additional","affiliation":[{"name":"Zhejiang University, China"}]},{"given":"Yangdong","family":"Deng","sequence":"additional","affiliation":[{"name":"School of Software, Tsinghua University, China. dengyd@tsinghua.edu.cn"}]},{"given":"Zhiyuan","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Tsinghua University, China. liuzy@tsinghua.edu.cn"}]},{"given":"Maosong","family":"Sun","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Tsinghua University, China"}]},{"given":"Ming","family":"Gu","sequence":"additional","affiliation":[{"name":"School of Software, Tsinghua University, China"}]}],"member":"281","published-online":{"date-parts":[[2023,12,20]]},"reference":[{"key":"2023122114093961100_bib1","article-title":"On the opportunities and risks of foundation models","author":"Bommasani","year":"2021","journal-title":"arXiv preprint arXiv:2108.07258"},{"key":"2023122114093961100_bib2","article-title":"Poisoning and backdooring contrastive learning","volume-title":"Proceedings of ICLR","author":"Carlini","year":"2022"},{"key":"2023122114093961100_bib3","article-title":"One-shot neural backdoor erasing via adversarial weight masking","volume-title":"Advances in Neural Information Processing Systems","author":"Chai","year":"2022"},{"key":"2023122114093961100_bib4","article-title":"Badpre: Task-agnostic backdoor attacks to pre-trained nlp foundation models","volume-title":"Proceedings of ICLR","author":"Chen","year":"2022"},{"key":"2023122114093961100_bib5","doi-asserted-by":"publisher","first-page":"512","DOI":"10.1609\/icwsm.v11i1.14955","article-title":"Automated hate speech detection and the problem of offensive language","volume-title":"Proceedings of the 11th International AAAI Conference on Web and Social Media","author":"Davidson","year":"2017"},{"key":"2023122114093961100_bib6","first-page":"4171","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of NAACL-HLT","author":"Devlin","year":"2019"},{"key":"2023122114093961100_bib7","article-title":"An image is worth 16x16 words: Transformers for image recognition at scale","volume-title":"9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3\u20137, 2021","author":"Dosovitskiy","year":"2021"},{"key":"2023122114093961100_bib8","article-title":"Badnets: Identifying vulnerabilities in the machine learning model supply chain","author":"Tianyu","year":"2017","journal-title":"arXiv preprint arXiv:1708.06733"},{"key":"2023122114093961100_bib9","article-title":"Threats to pre-trained language models: Survey and taxonomy","author":"Guo","year":"2022","journal-title":"arXiv preprint arXiv:2202.06862"},{"key":"2023122114093961100_bib10","article-title":"Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding","author":"Han","year":"2016","journal-title":"International Conference on Learning Representations (ICLR)"},{"key":"2023122114093961100_bib11","doi-asserted-by":"publisher","DOI":"10.1016\/j.aiopen.2021.08.002","article-title":"Pre-trained models: Past, present and future","author":"Han","year":"2021","journal-title":"AI Open"},{"key":"2023122114093961100_bib12","article-title":"Batch normalization: Accelerating deep network training by reducing internal covariate shift","volume-title":"International Conference on Machine Learning","author":"Ioffe","year":"2015"},{"key":"2023122114093961100_bib13","doi-asserted-by":"publisher","first-page":"2043","DOI":"10.1109\/SP46214.2022.9833644","article-title":"Badencoder: Backdoor attacks to pre-trained encoders in self-supervised learning","author":"Jia","year":"2022","journal-title":"2022 IEEE Symposium on Security and Privacy (SP)"},{"key":"2023122114093961100_bib14","article-title":"Learning multiple layers of features from tiny images","author":"Krizhevsky","year":"2009"},{"key":"2023122114093961100_bib15","article-title":"Neural attention distillation: Erasing backdoor triggers from deep neural networks","author":"Li","year":"2021","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2023122114093961100_bib16","article-title":"Backdoor learning: A survey","author":"Li","year":"2020","journal-title":"arXiv preprint arXiv:2007.08745"},{"key":"2023122114093961100_bib17","doi-asserted-by":"publisher","first-page":"740","DOI":"10.1007\/978-3-319-10602-1_48","article-title":"Microsoft coco: Common objects in context","volume-title":"European Conference on Computer Vision","author":"Lin","year":"2014"},{"key":"2023122114093961100_bib18","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1007\/978-3-030-00470-5_13","article-title":"Fine-pruning: Defending against backdooring attacks on deep neural networks","volume-title":"International Symposium on Research in Attacks, Intrusions, and Defenses","author":"Liu","year":"2018"},{"key":"2023122114093961100_bib19","article-title":"Roberta: A robustly optimized bert pretraining approach","author":"Liu","year":"2019","journal-title":"arXiv preprint arXiv:1907.11692"},{"key":"2023122114093961100_bib20","article-title":"Pointer sentinel mixture models","volume-title":"International Conference on Learning Representations","author":"Merity","year":"2017"},{"key":"2023122114093961100_bib21","first-page":"8748","article-title":"Learning transferable visual models from natural language supervision","volume-title":"International Conference on Machine Learning","author":"Radford","year":"2021"},{"issue":"8","key":"2023122114093961100_bib22","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI blog"},{"issue":"3","key":"2023122114093961100_bib23","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"ImageNet Large Scale Visual Recognition Challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"International Journal of Computer Vision (IJCV)"},{"key":"2023122114093961100_bib24","doi-asserted-by":"publisher","first-page":"3141","DOI":"10.1145\/3460120.3485370","article-title":"Backdoor pre-trained models can transfer to all","volume-title":"CCS \u201921: 2021 ACM SIGSAC Conference on Computer and Communications Security","author":"Shen","year":"2021"},{"key":"2023122114093961100_bib25","article-title":"Very deep convolutional networks for large-scale image recognition","volume-title":"3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7\u20139, 2015, Conference Track Proceedings","author":"Simonyan","year":"2015"},{"key":"2023122114093961100_bib26","first-page":"1631","article-title":"Recursive deep models for semantic compositionality over a sentiment treebank","volume-title":"Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing","author":"Socher","year":"2013"},{"key":"2023122114093961100_bib27","doi-asserted-by":"publisher","first-page":"323","DOI":"10.1016\/j.neunet.2012.02.016","article-title":"Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition","volume":"32","author":"Stallkamp","year":"2012","journal-title":"Neural Networks"},{"key":"2023122114093961100_bib28","doi-asserted-by":"publisher","first-page":"3949","DOI":"10.18653\/v1\/2022.naacl-main.290","article-title":"On transferability of prompt tuning for natural language processing","volume-title":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Yusheng","year":"2022"},{"key":"2023122114093961100_bib29","article-title":"Neural pruning via growing regularization","volume-title":"9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3\u20137, 2021","author":"Wang","year":"2021"},{"key":"2023122114093961100_bib30","doi-asserted-by":"publisher","first-page":"11132","DOI":"10.18653\/v1\/2022.emnlp-main.765","article-title":"Finding skill neurons in pre-trained transformer-based language models","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing","author":"Wang","year":"2022"},{"key":"2023122114093961100_bib31","first-page":"16913","article-title":"Adversarial neuron pruning purifies backdoored deep models","volume":"34","author":"Dongxian","year":"2021","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2023122114093961100_bib32","article-title":"Adversarial unlearning of backdoors via implicit hypergradient","volume-title":"International Conference on Learning Representations","author":"Yi","year":"2022"},{"key":"2023122114093961100_bib33","first-page":"649","article-title":"Character-level convolutional networks for text classification","volume-title":"Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7\u201312, 2015, Montreal, Quebec, Canada","author":"Zhang","year":"2015"},{"key":"2023122114093961100_bib34","doi-asserted-by":"publisher","first-page":"877","DOI":"10.18653\/v1\/2022.findings-acl.71","article-title":"Moefication: Transformer feed-forward layers are mixtures of experts","volume-title":"Findings of the Association for Computational Linguistics: ACL 2022","author":"Zhang","year":"2022"},{"key":"2023122114093961100_bib35","article-title":"Red alarm for pre-trained models: Universal vulnerabilities by neuron-level backdoor attacks","author":"Zhang","year":"2021","journal-title":"arXiv preprint arXiv:2101.06969"},{"key":"2023122114093961100_bib36","article-title":"Pre-activation distributions expose backdoor neurons","volume-title":"Advances in Neural Information Processing Systems","author":"Zheng","year":"2022"},{"key":"2023122114093961100_bib37","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.11","article-title":"Aligning books and movies: Towards story-like visual explanations by watching movies and reading books","volume-title":"The IEEE International Conference on Computer Vision (ICCV)","author":"Zhu","year":"2015"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00622\/2199607\/tacl_a_00622.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00622\/2199607\/tacl_a_00622.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,21]],"date-time":"2023-12-21T14:10:02Z","timestamp":1703167802000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00622\/118798\/Removing-Backdoors-in-Pre-trained-Models-by"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023]]},"references-count":37,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00622","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023]]},"published":{"date-parts":[[2023]]}}}