{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T01:58:34Z","timestamp":1773367114173,"version":"3.50.1"},"reference-count":51,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T00:00:00Z","timestamp":1773273600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Low-Rank Adaptation (LoRA) has become a widely adopted parameter-efficient fine-tuning (PEFT) technique for large language models (LLMs). LoRA\u2019s benefits stem from its light weight and modular adapters. Standard LoRA applies adapters uniformly across all Transformer layers, implicitly assuming that each layer contributes equally to task adaptation. However, LLMs are found to have internal substructures that contribute in a disproportionate manner. In this work, we provide a theoretical analysis of how LoRA weight updates are influenced by a layer\u2019s activation magnitude. We propose Act-LoRA, a simple activation-guided layer selection strategy for selective Low-Rank Adaptation. We evaluate this strategy for both encoder-only and decoder-only architectures using the GLUE benchmark. Our method achieved a 20% GPUh saving with a 1% drop in GLUE score using DeBERTaV3-Base on a single-instance GPU with 50% less LoRA parameters. It also achieved 2% GPUh savings with a less than 0.15% drop in GLUE score with the Llama-3.1-8B model in Distributed Data Parallel mode with 25% fewer LoRA parameters. Our experiments and analysis show that the compute and memory requirements of LoRA adapters increase linearly with the number of selected layers. We further compare activation-guided selection against gradient-guided importance metrics and show that activation norms yield more stable and reproducible layer rankings across seeds and datasets. 
Overall, our results demonstrate that activation-guided layer selection is a practical and effective way to improve the efficiency of LoRA fine-tuning, making it immediately compatible with some existing PEFT techniques and distributed training frameworks.<\/jats:p>","DOI":"10.3390\/info17030283","type":"journal-article","created":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T13:12:24Z","timestamp":1773321144000},"page":"283","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Activation-Guided Layer Selection for LoRA"],"prefix":"10.3390","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7208-6681","authenticated-orcid":false,"given":"Aditya","family":"Dawadikar","sequence":"first","affiliation":[{"name":"Department of Computer Science, San Jose State University, San Jose, CA 95192, USA"}]},{"given":"Pooja","family":"Shyamsundar","sequence":"additional","affiliation":[{"name":"IBM, Armonk, NY 10504, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-2955-5462","authenticated-orcid":false,"given":"Rashmi Vishwanath","family":"Bhat","sequence":"additional","affiliation":[{"name":"Salesforce, San Francisco, CA 94105, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4875-0420","authenticated-orcid":false,"given":"Navrati","family":"Saxena","sequence":"additional","affiliation":[{"name":"Department of Computer Science, San Jose State University, San Jose, CA 95192, USA"}]}],"member":"1968","published-online":{"date-parts":[[2026,3,12]]},"reference":[{"key":"ref_1","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017). Attention is all you need. arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., and Funtowicz, M. (2020). Transformers: State-of-the-Art Natural Language Processing. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, 16\u201320 November 2020, Association for Computational Linguistics.","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"ref_3","unstructured":"Sanh, V., Debut, L., Chaumond, J., and Wolf, T. (2019). DistilBERT: A distilled version of BERT. arXiv."},{"key":"ref_4","unstructured":"Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_5","unstructured":"Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., de Laroussilhe, Q., Gesmundo, A., Attariyan, M., and Gelly, S. (2019). Parameter-Efficient Transfer Learning for NLP. arXiv."},{"key":"ref_6","unstructured":"Hu, E., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. (2022, January 25\u201329). LoRA: Low-rank adaptation of large language models. Proceedings of the International Conference on Learning Representations (ICLR), Online. Available online: https:\/\/openreview.net\/forum?id=nZeVKeeFYf9."},{"key":"ref_7","unstructured":"Liu, H., Tamkin, A., Hajishirzi, M., and Smith, N.A. (2022). Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Lester, B., Al-Rfou, R., and Constant, N. (2021). The Power of Scale for Parameter-Efficient Prompt Tuning. 
arXiv.","DOI":"10.18653\/v1\/2021.emnlp-main.243"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Li, J., and Liang, Y. (2021). Prefix-Tuning: Optimizing Continuous Prompts for Generation. arXiv.","DOI":"10.18653\/v1\/2021.acl-long.353"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"70909","DOI":"10.1109\/ACCESS.2025.3533701","article-title":"Activation-guided low-rank parameter adaptation for efficient model fine-tuning","volume":"13","author":"Wang","year":"2025","journal-title":"IEEE Access"},{"key":"ref_11","unstructured":"Meta AI (2026, January 20). The LLaMA 3 Herd of Models: Open and Efficient Foundation Language Models. Available online: https:\/\/ai.meta.com\/llama\/."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Voita, E., Talbot, D., Moiseev, F., Sennrich, D., and Titov, I. (2019). Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting. arXiv.","DOI":"10.18653\/v1\/P19-1580"},{"key":"ref_13","unstructured":"Zhang, R., Han, S., Gao, H., Zhang, W., and Liu, S. (2023, January 1\u20135). AdaLoRA: Adaptive budget allocation for parameter-efficient fine-tuning. Proceedings of the International Conference on Learning Representations (ICLR), Online. Available online: https:\/\/openreview.net\/forum?id=lq62uWRJjiY."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Liao, X., Wang, C., Zhou, S., Hu, J., Zheng, H., and Gao, J. (2025). Dynamic Adaptation of LoRA Fine-Tuning for Efficient and Task-Specific Optimization of Large Language Models. arXiv.","DOI":"10.1145\/3730436.3730456"},{"key":"ref_15","unstructured":"Kaplan, J., McCandlish, S., Henighan, T., Brown, T.B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., and Amodei, D. (2020). Scaling laws for neural language models. arXiv."},{"key":"ref_16","unstructured":"Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., Casas, D.d., Hendricks, L., Welbl, J., and Rings, F. (2020). Training compute-optimal large language models. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"101429","DOI":"10.1016\/j.csl.2022.101429","article-title":"On the Effect of Dropping Layers of Pre-trained Transformer Models","volume":"77","author":"Sajjad","year":"2023","journal-title":"Comput. Speech Lang."},{"key":"ref_18","unstructured":"Fan, A., Grave, E., and Joulin, A. (2020, January 26\u201330). Reducing Transformer Depth on Demand with Structured Dropout. Proceedings of the International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia."},{"key":"ref_19","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv."},{"key":"ref_20","unstructured":"Yao, K., Gao, P., Li, L., Zhao, Y., Wang, X., Wang, W., and Zhu, J. (2020). Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"131575","DOI":"10.52202\/079017-4182","article-title":"Unveiling LoRA intrinsic ranks via salience analysis","volume":"37","author":"Ke","year":"2024","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_22","unstructured":"Chen, H., and Garner, P.N. (2024). A Bayesian interpretation of adaptive low-rank adaptation. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Valipour, M., Rezagholizadeh, M., Kobyzev, I., and Ghodsi, A. (2023). 
DyLoRA: Parameter-efficient tuning of pre-trained models using dynamic search-free low-rank adaptation. Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Dubrovnik, Croatia, 2\u20136 May 2023, Association for Computational Linguistics.","DOI":"10.18653\/v1\/2023.eacl-main.239"},{"key":"ref_24","first-page":"1746","article-title":"Learning ordered representations with nested dropout","volume":"32","author":"Rippel","year":"2014","journal-title":"Proc. ICML (PMLR)"},{"key":"ref_25","unstructured":"Liang, J., Liu, C., Yu, Y., and Xu, Y. (2024). ALoRA: Allocating low-rank adaptation for fine-tuning large language models. arXiv."},{"key":"ref_26","unstructured":"Xu, H., Xu, H., Chen, L., and Kong, L. (2024). AutoLoRA: Automatically tuning matrix ranks in low-rank adaptation based on meta learning. arXiv."},{"key":"ref_27","unstructured":"Chen, H., Zhang, J., and Chen, Y. (2024). LoRA-drop: Efficient LoRA parameter pruning based on output evaluation. arXiv."},{"key":"ref_28","unstructured":"Yang, A., Chen, L., Liu, Z., and Tang, J. (2023). LoRA-FA: Memory-efficient low-rank adaptation for large language model fine-tuning. arXiv."},{"key":"ref_29","unstructured":"Zhao, J., Ren, Z., Zhao, K., Ma, R., Wu, J., and Huang, H. (2024, January 7\u201311). GaLore: Memory-efficient LLM training by gradient low-rank projection. Proceedings of the International Conference on Machine Learning (ICML), Online. Available online: https:\/\/openreview.net\/forum?id=hYHsrKDiX7."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Dettmers, T., Pagnoni, A., Holtzman, A., and Zettlemoyer, L. (2023, January 10\u201316). QLoRA: Efficient fine-tuning of quantized large language models. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA.","DOI":"10.52202\/075280-0441"},{"key":"ref_31","unstructured":"Meng, C., Deng, C., Shen, Y., Yang, H., and Zhang, Y. (2024). LoRA+: Efficient low-rank adaptation of large models. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"026119","DOI":"10.1063\/5.0203126","article-title":"X-LoRA: Mixture of low-rank adapter experts, a flexible framework for large language models with applications in protein mechanics and molecular design","volume":"2","author":"Buehler","year":"2024","journal-title":"Apl. Mach. Learn."},{"key":"ref_33","unstructured":"Zhang, Z., Hu, Y., Li, X., and Zhao, J. (2023). Delta-LoRA: Fine-tuning high-rank parameters with the delta of low-rank matrices. arXiv."},{"key":"ref_34","unstructured":"Liu, Z., Xu, H., Wang, X., and Yu, Y. (2024). DoRA: Weight-decomposed low-rank adaptation. arXiv."},{"key":"ref_35","unstructured":"Kopiczko, R., Tjandra, M., and Simonyan, K. (2023). VeRA: Vector-based random matrix adaptation. arXiv."},{"key":"ref_36","unstructured":"Li, L., Lin, C., Li, D., Huang, Y.-L., Li, W., Wu, T., Zou, J., Xue, W., Han, S., and Guo, Y. (2025). Efficient Fine-Tuning of Large Models via Nested Low-Rank Adaptation. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Honolulu, HI, USA, 19\u201323 October 2025, IEEE."},{"key":"ref_37","unstructured":"Hyeon-Woo, N., Moon, Y.-B., and Oh, T.-H. (2021). FedPara: Low-Rank Hadamard Product for Communication-Efficient Federated Learning. arXiv."},{"key":"ref_38","unstructured":"Yeh, S.-Y., Hsieh, Y.-G., Gao, Z., Yang, B.B.W., Oh, G., and Gong, Y. (2023). Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation. 
arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"3521","DOI":"10.1073\/pnas.1611835114","article-title":"Overcoming catastrophic forgetting in neural networks","volume":"114","author":"Kirkpatrick","year":"2017","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Tenney, I., Das, D., and Pavlick, E. (2019). BERT Rediscovers the Classical NLP Pipeline. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July\u20132 August 2019, Association for Computational Linguistics.","DOI":"10.18653\/v1\/P19-1452"},{"key":"ref_41","unstructured":"Michel, M., Levy, O., and Neubig, G. (2019). Are Sixteen Heads Really Better than One?. arXiv."},{"key":"ref_42","unstructured":"Hao, Y., Zeng, A., and Sakti, S. (2020). Self-Attention Attribution: Interpreting Information Interactions Inside Transformer. arXiv."},{"key":"ref_43","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2021). Multi-Head or Single-Head? An Empirical Comparison. arXiv."},{"key":"ref_44","unstructured":"LeCun, Y., Denker, J.S., and Solla, S.A. (1990, January 2\u20137). Optimal brain damage. Proceedings of the NIPS\u201989: The 3rd International Conference on Neural Information Processing Systems, San Diego, CA, USA."},{"key":"ref_45","unstructured":"Hassibi, B., and Stork, D.G. (1993). Second-order derivatives for network pruning: Optimal brain surgeon. Proceedings of the Advances in Neural Information Processing Systems 5 (NIPS 1992), San Francisco, CA, USA, 30 November\u20133 December 1992, Morgan Kaufmann Publishers Inc."},{"key":"ref_46","unstructured":"Theis, L., Korshunova, I., Tejani, A., and Husz\u00e1r, F. (2018). Faster gaze prediction with dense networks and Fisher pruning. arXiv."},{"key":"ref_47","unstructured":"Li, H., Kadav, A., Durdanovic, I., Samet, H., and Graf, H.P. (2017). Pruning filters for efficient ConvNets. arXiv."},{"key":"ref_48","unstructured":"Olsson, K., Weller, A., and Rockt\u00e4schel, T. (2022). In-Context Learning and Induction Heads. arXiv."},{"key":"ref_49","unstructured":"Xu, Y., Liang, Y., Dai, S., Hu, T., Chan, T.N., and Ma, C. (2020). Understanding and Guiding Layer Placement in Parameter-Efficient Fine-Tuning of Large Language Models. arXiv."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., and Bowman, S.R. (2019). GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv.","DOI":"10.18653\/v1\/W18-5446"},{"key":"ref_51","unstructured":"He, P., Liu, X., Gao, J., and Chen, W. (2021). DeBERTa: Decoding-enhanced BERT with disentangled attention. 
arXiv."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/17\/3\/283\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T13:51:16Z","timestamp":1773323476000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/17\/3\/283"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,12]]},"references-count":51,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2026,3]]}},"alternative-id":["info17030283"],"URL":"https:\/\/doi.org\/10.3390\/info17030283","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,12]]}}}