{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T16:13:37Z","timestamp":1777738417391,"version":"3.51.4"},"reference-count":43,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2024,11,25]],"date-time":"2024-11-25T00:00:00Z","timestamp":1732492800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61806065 and No. 62120106008"],"award-info":[{"award-number":["61806065 and No. 62120106008"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"crossref","award":["JZ2022HGTB0239"],"award-info":[{"award-number":["JZ2022HGTB0239"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Evol. Learn. Optim."],"published-print":{"date-parts":[[2024,12,31]]},"abstract":"<jats:p>The superior performance of large-scale pre-trained models, such as Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer (GPT), has received increasing attention in both academic and industrial research and has become one of the current research hotspots. A pre-trained model refers to a model trained on large-scale unlabeled data, whose purpose is to learn general language representation or features for fine-tuning or transfer learning in subsequent tasks. After pre-training is complete, a small amount of labeled data can be used to fine-tune the model for a specific task or domain. This two-stage method of \u201cpre-training+fine-tuning\u201d has achieved advanced results in natural language processing (NLP) tasks. Despite widespread adoption, existing fixed fine-tuning schemes that adapt well to one NLP task may perform inconsistently on other NLP tasks given that different tasks have different latent semantic structures. In this article, we explore the effectiveness of automatic fine-tuning pattern search for layer-wise learning rates from an evolutionary optimization perspective. Our goal is to use evolutionary algorithms to search for better task-dependent fine-tuning patterns for specific NLP tasks than typical fixed fine-tuning patterns. Experimental results on two real-world language benchmarks and three advanced pre-training language models show the effectiveness and generality of the proposed framework.<\/jats:p>","DOI":"10.1145\/3689827","type":"journal-article","created":{"date-parts":[[2024,8,24]],"date-time":"2024-08-24T10:44:27Z","timestamp":1724496267000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Layer-Wise Learning Rate Optimization for Task-Dependent Fine-Tuning of Pre-Trained Models: An Evolutionary Approach"],"prefix":"10.1145","volume":"4","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8203-0956","authenticated-orcid":false,"given":"Chenyang","family":"Bu","sequence":"first","affiliation":[{"name":"Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education and School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0292-0834","authenticated-orcid":false,"given":"Yuxin","family":"Liu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education and School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-9103-8595","authenticated-orcid":false,"given":"Manzong","family":"Huang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education and School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-0214-346X","authenticated-orcid":false,"given":"Jianxuan","family":"Shao","sequence":"additional","affiliation":[{"name":"Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education and School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4942-9767","authenticated-orcid":false,"given":"Shengwei","family":"Ji","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence and Big Data, Hefei University, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8357-1655","authenticated-orcid":false,"given":"Wenjian","family":"Luo","sequence":"additional","affiliation":[{"name":"Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2396-1704","authenticated-orcid":false,"given":"Xindong","family":"Wu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education, Hefei University of Technology, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,11,25]]},"reference":[{"key":"e_1_3_1_2_1","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Aghajanyan Armen","year":"2021","unstructured":"Armen Aghajanyan, Akshat Shrivastava, Anchit Gupta, Naman Goyal, Luke Zettlemoyer, and Sonal Gupta. 2021. Better fine-tuning by reducing representational collapse. In Proceedings of the International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=OQ08SN70M1V"},{"key":"e_1_3_1_3_1","doi-asserted-by":"crossref","unstructured":"Claudio Angione Eric Silverman and Elisabeth Yaneske. 2022. Using machine learning as a surrogate model for agent-based simulations. PLoS One 17 (2022). Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:246748043","DOI":"10.1371\/journal.pone.0263150"},{"key":"e_1_3_1_4_1","first-page":"632","volume-title":"Proceedings of EMNLP","author":"Bowman Samuel R.","year":"2015","unstructured":"Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of EMNLP. Llu\u00eds M\u00e0rquez, Chris Callison-Burch, Jian Su, Daniele Pighin, and Yuval Marton (Eds.), ACL, 632\u2013642."},{"key":"e_1_3_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/3495724.3495883"},{"key":"e_1_3_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10898-020-00916-w"},{"key":"e_1_3_1_7_1","first-page":"657","article-title":"Revisiting pre-trained models for Chinese natural language processing","author":"Cui Yiming","year":"2020","unstructured":"Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, and Guoping Hu. 2020. Revisiting pre-trained models for Chinese natural language processing. In Findings of ACL: EMNLP. ACL, 657\u2013668.","journal-title":"Findings of ACL: EMNLP"},{"key":"e_1_3_1_8_1","first-page":"3079","volume-title":"Proceedings of NeurIPS","author":"Dai Andrew M.","year":"2015","unstructured":"Andrew M. Dai and Quoc V. Le. 2015. Semi-supervised sequence learning. In Proceedings of NeurIPS. Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett (Eds.), 3079\u20133087."},{"key":"e_1_3_1_9_1","first-page":"4171","volume-title":"Proceedings of NAACL-HLT","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT. Jill Burstein, Christy Doran, and Thamar Solorio (Eds.), ACL, 4171\u20134186."},{"key":"e_1_3_1_10_1","unstructured":"Jesse Dodge Gabriel Ilharco Roy Schwartz Ali Farhadi Hannaneh Hajishirzi and Noah A. Smith. 2020. Fine-tuning pretrained language models: Weight initializations data orders and early stopping. arXiv:2002.06305."},{"key":"e_1_3_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.compchemeng.2014.05.021"},{"key":"e_1_3_1_12_1","unstructured":"John M. Giorgi Xindi Wang Nicola Sahar Won Young Shin Gary D. Bader and Bo Wang. 2019. End-to-end named entity recognition and relation extraction using pre-trained language models. arXiv:1912.13415."},{"key":"e_1_3_1_13_1","doi-asserted-by":"crossref","unstructured":"Tomohiro Harada. 2023. A pairwise ranking estimation model for surrogate-assisted evolutionary algorithms. Complex & Intelligent Systems 9 (2023) 6875 \u2013 6890. Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:259809093","DOI":"10.1007\/s40747-023-01113-4"},{"key":"e_1_3_1_14_1","first-page":"328","volume-title":"Proceedings of ACL","author":"Howard Jeremy","year":"2018","unstructured":"Jeremy Howard and Sebastian Ruder. 2018. Universal language model fine-tuning for text classification. In Proceedings of ACL. Iryna Gurevych and Yusuke Miyao (Eds.), ACL, 328\u2013339."},{"key":"e_1_3_1_15_1","volume-title":"Proceedings of ICLR","author":"Hu Edward J.","year":"2022","unstructured":"Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-rank adaptation of large language models. In Proceedings of ICLR. OpenReview.net. Retrieved from https:\/\/openreview.net\/forum?id=nZeVKeeFYf9"},{"key":"e_1_3_1_16_1","doi-asserted-by":"crossref","unstructured":"Beichen Huang Ran Cheng Zhuozhao Li Yaochu Jin and Kay Chen Tan. 2024. EvoX: A distributed GPU-accelerated framework for scalable evolutionary computation. IEEE Transactions on Evolutionary Computation (published online) (2024). Retrieved from https:\/\/ieeexplore.ieee.org\/document\/10499977","DOI":"10.1109\/TEVC.2024.3388550"},{"key":"e_1_3_1_17_1","doi-asserted-by":"crossref","unstructured":"William Frost Jenkins Peter Gerstoft and Yongsung Park. 2023. Bayesian optimization with Gaussian process surrogate model for source localization. The Journal of the Acoustical Society of America 154 3 (2023) 1459\u20131470. Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:261580132","DOI":"10.1121\/10.0020839"},{"key":"e_1_3_1_18_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2021.EMNLP-MAIN.243"},{"key":"e_1_3_1_19_1","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692."},{"key":"e_1_3_1_20_1","first-page":"61","volume-title":"Proceedings of ACL (Volume 2: Short Papers)","author":"Liu Xiao","year":"2022","unstructured":"Xiao Liu, Kaixuan Ji, Yicheng Fu, Zhengxiao Du, Zhilin Yang, and Jie Tang. 2022a. P-Tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks. In Proceedings of ACL (Volume 2: Short Papers), 61\u201368."},{"key":"e_1_3_1_21_1","doi-asserted-by":"crossref","unstructured":"Ye Liu Gang Zhao Gang Li Wanxin He and Changting Zhong. 2022b. Analytical robust design optimization based on a hybrid surrogate model by combining polynomial chaos expansion and Gaussian kernel. Structural and Multidisciplinary Optimization 65 (2022b) 1\u201320. Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:253512424","DOI":"10.1007\/s00158-022-03400-z"},{"key":"e_1_3_1_22_1","first-page":"364","volume-title":"Proceedings of PPSN Part I","volume":"6238","author":"Loshchilov Ilya","year":"2010","unstructured":"Ilya Loshchilov, Marc Schoenauer, and Mich\u00e8le Sebag. 2010. Comparison-based optimizers need comparison-based surrogates. In Proceedings of PPSN Part I. Robert Schaefer, Carlos Cotta, Joanna Kolodziej, and G\u00fcnter Rudolph (Eds.), Vol. 6238, Springer, 364\u2013373."},{"key":"e_1_3_1_23_1","first-page":"722","volume-title":"Proceedings of the GECCO","author":"Lu Yongfan","year":"2023","unstructured":"Yongfan Lu, Bingdong Li, Hong Qian, Wenjing Hong, Peng Yang, and Aimin Zhou. 2023. RM-SAEA: Regularity model based surrogate-assisted evolutionary algorithms for expensive multi-objective optimization. In Proceedings of the GECCO. ACM, 722\u2013730."},{"key":"e_1_3_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2022.3197427"},{"key":"e_1_3_1_25_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1202"},{"key":"e_1_3_1_26_1","unstructured":"Alec Radford and Karthik Narasimhan. 2018. Improving language understanding by generative pre-training. Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:49313245"},{"key":"e_1_3_1_27_1","volume-title":"Gaussian Processes for Machine Learning","author":"Rasmussen Carl Edward","year":"2006","unstructured":"Carl Edward Rasmussen and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning. MIT Press."},{"key":"e_1_3_1_28_1","unstructured":"Philipp Reiser Javier Enrique Aguilar Anneli Guthke and Paul-Christian Burkner. 2023. Uncertainty quantification and propagation in surrogate-based Bayesian inference. arXiv:2312.05153. Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:266149965"},{"key":"e_1_3_1_29_1","unstructured":"Victor Sanh Lysandre Debut Julien Chaumond and Thomas Wolf. 2019. DistilBERT a distilled version of BERT: Smaller faster cheaper and lighter. arXiv:1910.01108."},{"key":"e_1_3_1_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijepes.2021.107401"},{"key":"e_1_3_1_31_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2020.EMNLP-MAIN.346"},{"key":"e_1_3_1_32_1","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1007\/978-3-030-32381-3_16","volume-title":"Proceedings of CCL","volume":"11856","author":"Sun Chi","year":"2019","unstructured":"Chi Sun, Xipeng Qiu, Yige Xu, and Xuanjing Huang. 2019. How to fine-tune BERT for text classification? In Proceedings of CCL. Maosong Sun, Xuanjing Huang, Heng Ji, Zhiyuan Liu, and Yang Liu (Eds.), Lecture Notes in Computer Science, Vol. 11856, Springer, 194\u2013206."},{"key":"e_1_3_1_33_1","first-page":"1393","article-title":"Investigating transferability in pretrained language models","author":"Tamkin Alex","year":"2020","unstructured":"Alex Tamkin, Trisha Singh, Davide Giovanardi, and Noah D. Goodman. 2020. Investigating transferability in pretrained language models. In Findings of ACL: EMNLP, Vol. EMNLP, ACL, 1393\u20131401.","journal-title":"Findings of ACL: EMNLP"},{"key":"e_1_3_1_34_1","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Wang Alex","year":"2019","unstructured":"Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=rJ4km2R5t7"},{"key":"e_1_3_1_35_1","doi-asserted-by":"crossref","unstructured":"Liu Xingpo Luca Muzi Chai Yaozhi Tan Jue and Gao Jinyan. 2021. A comprehensive framework for HSPF hydrological parameter sensitivity optimization and uncertainty evaluation based on SVM surrogate model- A case study in Qinglong River watershed China. Environmental Modelling & Software 143 (2021) 105126. Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:237319683","DOI":"10.1016\/j.envsoft.2021.105126"},{"key":"e_1_3_1_36_1","unstructured":"Liang Xu Yu Tong Qianqian Dong Yixuan Liao Cong Yu Yin Tian Weitang Liu Lu Li and Xuanwei Zhang. 2020. CLUENER2020: Fine-grained named entity recognition dataset and benchmark for Chinese. arXiv:2001.04351."},{"key":"e_1_3_1_37_1","doi-asserted-by":"crossref","unstructured":"Shu-Bo Yang Zukui Li and Wei Wu. 2021. Data-driven process optimization considering surrogate model prediction uncertainty: A mixture density network-based approach. Industrial & Engineering Chemistry Research 60 (2021) 2206\u20132222.","DOI":"10.1021\/acs.iecr.0c04214"},{"key":"e_1_3_1_38_1","first-page":"3320","volume-title":"Proceedings of NeurIPS","author":"Yosinski Jason","year":"2014","unstructured":"Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. 2014. How transferable are features in deep neural networks? In Proceedings of NeurIPS. Zoubin Ghahramani, Max Welling, Corinna Cortes, Neil D. Lawrence, and Kilian Q. Weinberger (Eds.), 3320\u20133328."},{"key":"e_1_3_1_39_1","doi-asserted-by":"crossref","unstructured":"Changhai Yu Xiaolong Lv Dan Huang and Dongju Jiang. 2023. Reliability-based design optimization of offshore wind turbine support structures using RBF surrogate model. Frontiers of Structural and Civil Engineering 17 (2023) 1086\u20131099. Retrieved from https:\/\/api.semanticscholar.org\/CorpusID:265665484","DOI":"10.1007\/s11709-023-0976-8"},{"key":"e_1_3_1_40_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2022.ACL-SHORT.1"},{"key":"e_1_3_1_41_1","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Zhang Tianyi","year":"2021","unstructured":"Tianyi Zhang, Felix Wu, Arzoo Katiyar, Kilian Q. Weinberger, and Yoav Artzi. 2021a. Revisiting few-sample BERT fine-tuning. In Proceedings of the International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=cO1IH43yUF"},{"key":"e_1_3_1_42_1","first-page":"421","article-title":"AMBERT: A pre-trained language model with multi-grained tokenization","author":"Zhang Xinsong","year":"2021","unstructured":"Xinsong Zhang, Pengshuai Li, and Hang Li. 2021b. AMBERT: A pre-trained language model with multi-grained tokenization. In Findings of ACL. Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli (Eds.), ACL, 421\u2013435.","journal-title":"Findings of ACL"},{"key":"e_1_3_1_43_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.coco.2021.100671"},{"key":"e_1_3_1_44_1","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Zhu Chen","year":"2020","unstructured":"Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Tom Goldstein, and Jingjing Liu. 2020. FreeLB: Enhanced adversarial training for natural language understanding. In Proceedings of the International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=BygzbyHFvB"}],"container-title":["ACM Transactions on Evolutionary Learning and Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3689827","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3689827","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:06:11Z","timestamp":1750291571000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3689827"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,25]]},"references-count":43,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,12,31]]}},"alternative-id":["10.1145\/3689827"],"URL":"https:\/\/doi.org\/10.1145\/3689827","relation":{},"ISSN":["2688-3007"],"issn-type":[{"value":"2688-3007","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,11,25]]},"assertion":[{"value":"2023-05-02","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-02","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-11-25","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}