{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,8]],"date-time":"2026-06-08T18:58:36Z","timestamp":1780945116764,"version":"3.54.1"},"reference-count":57,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2024,5,6]],"date-time":"2024-05-06T00:00:00Z","timestamp":1714953600000},"content-version":"vor","delay-in-days":126,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,5,3]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Large pretrained language models are widely used in downstream NLP tasks via task-specific fine-tuning, but such procedures can be costly. Recently, Parameter-Efficient Fine-Tuning (PEFT) methods have achieved strong task performance while updating much fewer parameters than full model fine-tuning (FFT). However, it is non-trivial to make informed design choices on the PEFT configurations, such as their architecture, the number of tunable parameters, and even the layers in which the PEFT modules are inserted. Consequently, it is highly likely that the current, manually designed configurations are suboptimal in terms of their performance-efficiency trade-off. Inspired by advances in neural architecture search, we propose AutoPEFT for automatic PEFT configuration selection: We first design an expressive configuration search space with multiple representative PEFT modules as building blocks. Using multi-objective Bayesian optimization in a low-cost setup, we then discover a Pareto-optimal set of configurations with strong performance-cost trade-offs across different numbers of parameters that are also highly transferable across different tasks. 
Empirically, on GLUE and SuperGLUE tasks, we show that AutoPEFT-discovered configurations significantly outperform existing PEFT methods and are on par or better than FFT without incurring substantial training efficiency costs.<\/jats:p>","DOI":"10.1162\/tacl_a_00662","type":"journal-article","created":{"date-parts":[[2024,5,6]],"date-time":"2024-05-06T20:13:34Z","timestamp":1715026414000},"page":"525-542","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":19,"title":["<scp>AutoPEFT<\/scp>: Automatic Configuration Search for Parameter-Efficient Fine-Tuning"],"prefix":"10.1162","volume":"12","author":[{"given":"Han","family":"Zhou","sequence":"first","affiliation":[{"name":"Language Technology Lab, University of Cambridge, UK. hz416@cam.ac.uk"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xingchen","family":"Wan","sequence":"additional","affiliation":[{"name":"Machine Learning Research Group, University of Oxford, UK. xwan@robots.ox.ac.uk"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ivan","family":"Vuli\u0107","sequence":"additional","affiliation":[{"name":"Language Technology Lab, University of Cambridge, UK. iv250@cam.ac.uk"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Anna","family":"Korhonen","sequence":"additional","affiliation":[{"name":"Language Technology Lab, University of Cambridge, UK. 
alk23@cam.ac.uk"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"281","published-online":{"date-parts":[[2024,5,3]]},"reference":[{"key":"2024050620131718600_bib1","doi-asserted-by":"publisher","first-page":"1778","DOI":"10.18653\/v1\/2022.acl-long.125","article-title":"Composable sparse fine-tuning for cross-lingual transfer","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Ansell","year":"2022"},{"key":"2024050620131718600_bib2","first-page":"21524","article-title":"Botorch: A framework for efficient monte-carlo bayesian optimization","volume":"33","author":"Balandat","year":"2020","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2024050620131718600_bib3","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18653\/v1\/2022.acl-short.1","article-title":"BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"Zaken","year":"2022"},{"key":"2024050620131718600_bib4","article-title":"Language models are few-shot learners","volume-title":"Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6\u201312, 2020, virtual","author":"Brown","year":"2020"},{"key":"2024050620131718600_bib5","doi-asserted-by":"publisher","first-page":"2612","DOI":"10.18653\/v1\/2022.emnlp-main.168","article-title":"Revisiting parameter-efficient tuning: Are we really there yet?","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing","author":"Chen","year":"2022"},{"key":"2024050620131718600_bib6","article-title":"Parameter-efficient fine-tuning design spaces","volume-title":"The Eleventh International Conference on Learning 
Representations","author":"Chen","year":"2023"},{"key":"2024050620131718600_bib7","article-title":"Adaptformer: Adapting vision transformers for scalable visual recognition","volume-title":"Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022","author":"Chen","year":"2022"},{"key":"2024050620131718600_bib8","first-page":"2187","article-title":"Parallel bayesian optimization of multiple noisy objectives with expected hypervolume improvement","volume-title":"Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6\u201314, 2021, virtual","author":"Daulton","year":"2021"},{"key":"2024050620131718600_bib9","doi-asserted-by":"publisher","first-page":"4171","DOI":"10.18653\/v1\/N19-1423","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"key":"2024050620131718600_bib10","article-title":"Nas-bench-201: Extending the scope of reproducible neural architecture search","volume-title":"8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26\u201330, 2020","author":"Dong","year":"2020"},{"issue":"1","key":"2024050620131718600_bib11","doi-asserted-by":"publisher","first-page":"1997","DOI":"10.1007\/978-3-030-05318-5_11","article-title":"Neural architecture search: A survey","volume":"20","author":"Elsken","year":"2019","journal-title":"The Journal of Machine Learning Research"},{"key":"2024050620131718600_bib12","article-title":"Latency-aware neural architecture search with multi-objective bayesian optimization","volume-title":"8th ICML Workshop 
on Automated Machine Learning (AutoML)","author":"Eriksson","year":"2021"},{"key":"2024050620131718600_bib13","first-page":"493","article-title":"High-dimensional bayesian optimization with sparse axis-aligned subspaces","volume-title":"Uncertainty in Artificial Intelligence","author":"Eriksson","year":"2021"},{"key":"2024050620131718600_bib14","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1807.02811","article-title":"A tutorial on bayesian optimization","author":"Frazier","year":"2018","journal-title":"CoRR"},{"key":"2024050620131718600_bib15","doi-asserted-by":"publisher","DOI":"10.1017\/9781108348973","volume-title":"Bayesian Optimization","author":"Garnett","year":"2023"},{"key":"2024050620131718600_bib16","doi-asserted-by":"publisher","first-page":"4884","DOI":"10.18653\/v1\/2021.acl-long.378","article-title":"Parameter-efficient transfer learning with diff pruning","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Guo","year":"2021"},{"key":"2024050620131718600_bib17","article-title":"Towards a unified view of parameter-efficient transfer learning","volume-title":"The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25\u201329, 2022","author":"He","year":"2022"},{"key":"2024050620131718600_bib18","first-page":"2790","article-title":"Parameter-efficient transfer learning for NLP","volume-title":"Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9\u201315 June 2019, Long Beach, California, USA","author":"Houlsby","year":"2019"},{"key":"2024050620131718600_bib19","article-title":"LoRA: Low-rank adaptation of large language models","volume-title":"International Conference on Learning Representations","author":"Edward","year":"2022"},{"key":"2024050620131718600_bib20","article-title":"Sparse structure search for delta 
tuning","volume-title":"Advances in Neural Information Processing Systems","author":"Shengding","year":"2022"},{"key":"2024050620131718600_bib21","article-title":"Bag of baselines for multi-objective joint neural architecture search and hyperparameter optimization","volume-title":"8th ICML Workshop on Automated Machine Learning (AutoML)","author":"Izquierdo","year":"2021"},{"key":"2024050620131718600_bib22","doi-asserted-by":"publisher","first-page":"3045","DOI":"10.18653\/v1\/2021.emnlp-main.243","article-title":"The power of scale for parameter-efficient prompt tuning","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Lester","year":"2021"},{"key":"2024050620131718600_bib23","first-page":"367","article-title":"Random search and reproducibility for neural architecture search","volume-title":"Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI 2019, Tel Aviv, Israel, July 22\u201325, 2019","author":"Li","year":"2019"},{"key":"2024050620131718600_bib24","doi-asserted-by":"publisher","first-page":"4582","DOI":"10.18653\/v1\/2021.acl-long.353","article-title":"Prefix-tuning: Optimizing continuous prompts for generation","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Li","year":"2021"},{"key":"2024050620131718600_bib25","article-title":"DARTS: Differentiable architecture search","volume-title":"7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6\u20139, 2019","author":"Liu","year":"2019"},{"key":"2024050620131718600_bib26","article-title":"Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning","volume-title":"Advances in Neural Information Processing 
Systems","author":"Liu","year":"2022"},{"key":"2024050620131718600_bib27","article-title":"Roberta: A robustly optimized BERT pretraining approach","author":"Liu","year":"2019","journal-title":"CoRR"},{"key":"2024050620131718600_bib28","first-page":"1022","article-title":"Compacter: Efficient low-rank hypercomplex adapter layers","volume-title":"Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6\u201314, 2021, virtual","author":"Mahabadi","year":"2021"},{"key":"2024050620131718600_bib29","doi-asserted-by":"publisher","first-page":"6253","DOI":"10.18653\/v1\/2022.acl-long.433","article-title":"UniPELT: A unified framework for parameter-efficient language model tuning","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Mao","year":"2022"},{"key":"2024050620131718600_bib30","doi-asserted-by":"publisher","first-page":"3742","DOI":"10.18653\/v1\/2022.naacl-main.274","article-title":"Adaptable adapters","volume-title":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Moosavi","year":"2022"},{"key":"2024050620131718600_bib31","doi-asserted-by":"publisher","first-page":"3479","DOI":"10.18653\/v1\/2022.naacl-main.255","article-title":"Lifting the curse of multilinguality by pre-training modular transformers","volume-title":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Pfeiffer","year":"2022"},{"key":"2024050620131718600_bib32","doi-asserted-by":"publisher","first-page":"46","DOI":"10.18653\/v1\/2020.emnlp-demos.7","article-title":"AdapterHub: A framework for adapting transformers","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: 
System Demonstrations","author":"Pfeiffer","year":"2020"},{"key":"2024050620131718600_bib33","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2302.11529","article-title":"Modular deep learning","author":"Pfeiffer","year":"2023","journal-title":"Transactions on Machine Learning Research"},{"key":"2024050620131718600_bib34","doi-asserted-by":"publisher","first-page":"7654","DOI":"10.18653\/v1\/2020.emnlp-main.617","article-title":"MAD-X: An adapter-based framework for multi-task cross-lingual transfer","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Pfeiffer","year":"2020"},{"key":"2024050620131718600_bib35","first-page":"140:1\u2013140:67","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"Journal of Machine Learning Research"},{"issue":"4","key":"2024050620131718600_bib36","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3447582","article-title":"A comprehensive survey of neural architecture search: Challenges and solutions","volume":"54","author":"Ren","year":"2021","journal-title":"ACM Computing Surveys (CSUR)"},{"key":"2024050620131718600_bib37","article-title":"Interpretable neural architecture search via bayesian optimisation with weisfeiler-lehman kernels","volume-title":"9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3\u20137, 2021","author":"Bin Xin","year":"2021"},{"key":"2024050620131718600_bib38","article-title":"Neural architecture generator optimization","volume-title":"Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6\u201312, 2020, 
virtual","author":"Robin","year":"2020"},{"key":"2024050620131718600_bib39","doi-asserted-by":"publisher","first-page":"7930","DOI":"10.18653\/v1\/2021.emnlp-main.626","article-title":"AdapterDrop: On the efficiency of adapters in transformers","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"R\u00fcckl\u00e9","year":"2021"},{"key":"2024050620131718600_bib40","article-title":"Multitask prompted training enables zero-shot task generalization","volume-title":"The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25\u201329, 2022","author":"Sanh","year":"2022"},{"key":"2024050620131718600_bib41","first-page":"24193","article-title":"Training neural networks with fixed sparse masks","volume-title":"Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6\u201314, 2021, virtual","author":"Sung","year":"2021"},{"key":"2024050620131718600_bib42","doi-asserted-by":"publisher","first-page":"4593","DOI":"10.18653\/v1\/P19-1452","article-title":"BERT rediscovers the classical NLP pipeline","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Tenney","year":"2019"},{"key":"2024050620131718600_bib43","doi-asserted-by":"publisher","first-page":"3274","DOI":"10.18653\/v1\/2023.eacl-main.239","article-title":"DyLoRA: Parameter-efficient tuning of pre-trained models using dynamic search-free low-rank adaptation","volume-title":"Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics","author":"Valipour","year":"2023"},{"key":"2024050620131718600_bib44","doi-asserted-by":"publisher","first-page":"7222","DOI":"10.18653\/v1\/2020.emnlp-main.586","article-title":"Probing pretrained language models for lexical semantics","volume-title":"Proceedings of the 2020 Conference on Empirical 
Methods in Natural Language Processing (EMNLP)","author":"Vuli\u0107","year":"2020"},{"key":"2024050620131718600_bib45","first-page":"10663","article-title":"Think global and act local: Bayesian optimisation over high-dimensional categorical and mixed search spaces","volume-title":"International Conference on Machine Learning","author":"Wan","year":"2021"},{"key":"2024050620131718600_bib46","article-title":"On redundancy and diversity in cell-based neural architecture search","volume-title":"The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25\u201329, 2022","author":"Wan","year":"2022"},{"key":"2024050620131718600_bib47","first-page":"3261","article-title":"Superglue: A stickier benchmark for general-purpose language understanding systems","volume-title":"Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8\u201314, 2019, Vancouver, BC, Canada","author":"Wang","year":"2019"},{"key":"2024050620131718600_bib48","doi-asserted-by":"publisher","first-page":"353","DOI":"10.18653\/v1\/W18-5446","article-title":"GLUE: A multi-task benchmark and analysis platform for natural language understanding","volume-title":"Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP","author":"Wang","year":"2018"},{"key":"2024050620131718600_bib49","doi-asserted-by":"publisher","first-page":"5744","DOI":"10.18653\/v1\/2022.emnlp-main.388","article-title":"AdaMix: Mixture-of-adaptations for parameter-efficient model tuning","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing","author":"Wang","year":"2022"},{"key":"2024050620131718600_bib50","doi-asserted-by":"publisher","first-page":"10293","DOI":"10.1609\/aaai.v35i12.17233","article-title":"BANANAS: Bayesian optimization with neural architectures for neural architecture search","volume-title":"Thirty-Fifth 
AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2\u20139, 2021","author":"White","year":"2021"},{"key":"2024050620131718600_bib51","doi-asserted-by":"publisher","first-page":"1284","DOI":"10.1109\/ICCV.2019.00137","article-title":"Exploring randomly wired neural networks for image recognition","volume-title":"2019 IEEE\/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019","author":"Xie","year":"2019"},{"key":"2024050620131718600_bib52","article-title":"NAS evaluation is frustratingly hard","volume-title":"8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26\u201330, 2020","author":"Yang","year":"2020"},{"key":"2024050620131718600_bib53","article-title":"Adaptive budget allocation for parameter-efficient fine-tuning","volume-title":"The Eleventh International Conference on Learning Representations","author":"Zhang","year":"2023"},{"key":"2024050620131718600_bib54","doi-asserted-by":"publisher","first-page":"2226","DOI":"10.18653\/v1\/2020.emnlp-main.174","article-title":"Masking as an efficient alternative to finetuning for pretrained language models","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Zhao","year":"2020"},{"key":"2024050620131718600_bib55","doi-asserted-by":"publisher","first-page":"13064","DOI":"10.18653\/v1\/2023.findings-emnlp.870","article-title":"Survival of the most influential prompts: Efficient black-box prompt search via clustering and pruning","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 
2023","author":"Zhou","year":"2023"},{"key":"2024050620131718600_bib56","doi-asserted-by":"publisher","first-page":"292","DOI":"10.1007\/BFb0056872","article-title":"Multiobjective optimization using evolutionary algorithms - A comparative case study","volume-title":"Parallel Problem Solving from Nature - PPSN V, 5th International Conference, Amsterdam, The Netherlands, September 27\u201330, 1998, Proceedings","author":"Zitzler","year":"1998"},{"key":"2024050620131718600_bib57","article-title":"Neural architecture search with reinforcement learning","volume-title":"5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24\u201326, 2017, Conference Track Proceedings","author":"Zoph","year":"2017"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00662\/2369530\/tacl_a_00662.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00662\/2369530\/tacl_a_00662.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,6]],"date-time":"2024-05-06T20:14:34Z","timestamp":1715026474000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00662\/120914\/AutoPEFT-Automatic-Configuration-Search-for"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"references-count":57,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00662","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024]]},"published":{"date-parts":[[2024]]}}}