{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,20]],"date-time":"2026-05-20T01:50:09Z","timestamp":1779241809395,"version":"3.51.4"},"reference-count":70,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2025,4,15]],"date-time":"2025-04-15T00:00:00Z","timestamp":1744675200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001663","name":"VolkswagenStiftung","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001663","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Responsible Computing Challenge"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:p>In the rapidly evolving field of Natural Language Processing (NLP), optimizing methods for fine-tuning Large Language Models (LLMs) is increasingly critical for improving generalization and performance. Fine-tuning LLMs is challenging due to high costs, overfitting, and difficulty adapting to diverse tasks. These challenges grow as LLMs scale, making traditional fine-tuning methods inefficient and expensive. To address these issues, a novel Information Bottleneck (IB) method for fine-tuning LLMs is proposed, focusing on retaining only the most critical and relevant information in the model\u2019s internal representations. By striking a balance between information compression and predictive relevance, the IB method aims to reduce overfitting and enhance generalization. This approach also integrates reinforcement learning and continual learning to enhance LLM performance further. 
The proposed framework considers two key metrics: (1) compression effectiveness, which reduces redundancy and improves generalization, and (2) predictive relevance, which ensures high task-specific performance. The proposed scheme achieves scalable fine-tuning across diverse NLP tasks using a lightweight proxy model to enhance computational efficiency. Empirical evaluations and ablation studies of the proposed framework show that the IB method improves accuracy and convergence while significantly reducing computational costs, enabling efficient, interpretable, and adaptable LLM optimization.<\/jats:p>","DOI":"10.1145\/3718096","type":"journal-article","created":{"date-parts":[[2025,2,18]],"date-time":"2025-02-18T16:24:37Z","timestamp":1739895877000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Efficiency and Performance Optimization in Large Language Models through IB Fine-Tuning"],"prefix":"10.1145","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-8641-5226","authenticated-orcid":false,"given":"Ashly Ann","family":"Jo","sequence":"first","affiliation":[{"name":"FACTS-H Lab, Indian Institute of Information Technology, Kottayam, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3339-8601","authenticated-orcid":false,"given":"Ebin Deni","family":"Raj","sequence":"additional","affiliation":[{"name":"FACTS-H Lab, Indian Institute of Information Technology, Kottayam, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4514-3916","authenticated-orcid":false,"given":"Jayakrushna","family":"Sahoo","sequence":"additional","affiliation":[{"name":"FACTS-H Lab, Indian Institute of Information Technology, Kottayam, 
India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,4,15]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2784440"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.3390\/info14080462"},{"key":"e_1_3_1_4_2","unstructured":"Alexander A. Alemi Ian Fischer Joshua V. Dillon and Kevin Murphy. 2016. Deep variational information bottleneck. arXiv:1612.00410. Retrieved from https:\/\/doi.org\/10.48550\/arXiv.1612.00410"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2909031"},{"key":"e_1_3_1_6_2","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1007\/978-981-99-1203-2_23","volume-title":"Proceedings of Advances in Distributed Computing and Machine Learning (ICADCML \u201923)","author":"Jo Ashly Ann","year":"2023","unstructured":"Ashly Ann Jo and Ebin Deni Raj. 2023. Post hoc interpretability: Review on new Frontiers of interpretable AI. In Proceedings of Advances in Distributed Computing and Machine Learning (ICADCML \u201923), 261\u2013276."},{"key":"e_1_3_1_7_2","doi-asserted-by":"crossref","first-page":"9114","DOI":"10.18653\/v1\/2021.emnlp-main.717","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Arefyev Nikolay","year":"2021","unstructured":"Nikolay Arefyev, Dmitrii Kharchev, and Artem Shelmanov. 2021. NB-MLM: Efficient domain adaptation of masked language models for sentiment analysis. 
In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 9114\u20139124."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10660-022-09560-w"},{"key":"e_1_3_1_9_2","doi-asserted-by":"crossref","first-page":"2418","DOI":"10.1109\/OJCOMS.2024.3390069","article-title":"Opportunistic information-bottleneck for goal-oriented feature extraction and communication","volume":"5","author":"Binucci Francesco","year":"2024","unstructured":"Francesco Binucci, Paolo Banelli, Paolo Di Lorenzo, and Sergio Barbarossa. 2024. Opportunistic information-bottleneck for goal-oriented feature extraction and communication. IEEE Open Journal of the Communications Society 5 (2024), 2418\u20132432.","journal-title":"IEEE Open Journal of the Communications Society"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3641289"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.5555\/1046920.1046926"},{"key":"e_1_3_1_12_2","doi-asserted-by":"crossref","unstructured":"Jin Chen Zheng Liu Xu Huang Chenwang Wu Qi Liu Gangwei Jiang Yuanhao Pu Yuxuan Lei Xiaolong Chen Xingmei Wang et al. 2023. When large language models meet personalization: Perspectives of challenges and opportunities. arXiv:2307.16376. Retrieved from https:\/\/doi.org\/10.48550\/arXiv.2307.16376","DOI":"10.1007\/s11280-024-01276-1"},{"key":"e_1_3_1_13_2","unstructured":"Yukang Chen Shengju Qian Haotian Tang Xin Lai Zhijian Liu Song Han and Jiaya Jia. 2023. Longlora: Efficient fine-tuning of long-context large language models. arXiv:2309.12307. Retrieved from https:\/\/doi.org\/10.48550\/arXiv.2309.12307"},{"key":"e_1_3_1_14_2","first-page":"435","volume-title":"Computer Vision\u2013ECCV 2016 Workshops","author":"Chu Brian","year":"2016","unstructured":"Brian Chu, Vashisht Madhavan, Oscar Beijbom, Judy Hoffman, and Trevor Darrell. 2016. Best practices for fine-tuning visual classifiers to new domains. In Computer Vision\u2013ECCV 2016 Workshops. 
Springer, 435\u2013442."},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00432"},{"issue":"7","key":"e_1_3_1_16_2","first-page":"1708","article-title":"Bidirectional encoder representations from transformers (BERT) language model for sentiment analysis task: Review","volume":"12","author":"Deepa D.","year":"2021","unstructured":"D. Deepa and A. Tamilarasi. 2021. Bidirectional encoder representations from transformers (BERT) language model for sentiment analysis task: Review. Turkish Journal of Computer and Mathematics Education 12, 7 (2021), 1708\u20131721.","journal-title":"Turkish Journal of Computer and Mathematics Education"},{"key":"e_1_3_1_17_2","first-page":"4171","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technologies, Long and Short Papers, Vol. 1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technologies, Long and Short Papers, Vol. 1, Association for Computational Linguistics, 4171\u20134186."},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-023-00626-4"},{"key":"e_1_3_1_19_2","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1109\/ICICT48043.2020.9112469","volume-title":"Proceedings of the 2020 International Conference on Inventive Computation Technologies (ICICT)","author":"Dutta Pronnoy","year":"2020","unstructured":"Pronnoy Dutta, Pradumn Upadhyay, Madhurima De, and RG Khalkar. 2020. Medical image analysis using deep convolutional neural networks: CNN architectures and transfer learning. In Proceedings of the 2020 International Conference on Inventive Computation Technologies (ICICT). 
IEEE, 175\u2013180."},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.3390\/info13020083"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.aiopen.2021.08.002"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ebiom.2023.104512"},{"issue":"178","key":"e_1_3_1_23_2","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1002\/ev.20556","article-title":"Large language model applications for evaluation: Opportunities and ethical implications","volume":"2023","author":"Head Cari Beth","year":"2023","unstructured":"Cari Beth Head, Paul Jasper, Matthew McConnachie, Linda Raftree, and Grace Higdon. 2023. Large language model applications for evaluation: Opportunities and ethical implications. New Directions for Evaluation 2023, 178-179 (2023), 33\u201346.","journal-title":"New Directions for Evaluation"},{"key":"e_1_3_1_24_2","first-page":"1","volume-title":"Proceedings of the 31st Conference on Neural Information Processing Systems","author":"Hodas Nathan O.","year":"2017","unstructured":"Nathan O. Hodas, Kyle Shaffer, Artem Yankov, Courtney D. Corley, Aryk Anderson, and Washington Cheney. 2017. Beyond fine tuning: Adding capacity to leverage few labels. In Proceedings of the 31st Conference on Neural Information Processing Systems, 1\u20137."},{"issue":"2","key":"e_1_3_1_25_2","first-page":"3","article-title":"LoRA: Low-rank adaptation of large language models","volume":"1","author":"Hu Edward J.","year":"2022","unstructured":"Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-rank adaptation of large language models. ICLR 1, 2 (2022), 3.","journal-title":"ICLR"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00886"},{"key":"e_1_3_1_27_2","unstructured":"Zhiqiang Hu Yihuai Lan Lei Wang Wanyu Xu Ee-Peng Lim Roy Ka-Wei Lee Lidong Bing and Soujanya Poria. 2023. 
LLM-adapters: An adapter family for parameter-efficient fine-tuning of large language models. arXiv:2304.01933. Retrieved from https:\/\/doi.org\/10.48550\/arXiv.2304.01933"},{"issue":"12","key":"e_1_3_1_28_2","doi-asserted-by":"crossref","first-page":"5068","DOI":"10.3390\/app14125068","article-title":"From large language models to large multimodal models: A literature review","volume":"14","author":"Huang Dawei","year":"2024","unstructured":"Dawei Huang, Chuan Yan, Qing Li, and Xiaojiang Peng. 2024. From large language models to large multimodal models: A literature review. Applied Sciences 14, 12 (2024), 5068.","journal-title":"Applied Sciences"},{"key":"e_1_3_1_29_2","doi-asserted-by":"crossref","first-page":"122666","DOI":"10.1016\/j.eswa.2023.122666","article-title":"A comprehensive survey on applications of transformers for deep learning tasks","volume":"241","author":"Islam Saidul","year":"2023","unstructured":"Saidul Islam, Hanae Elmekki, Ahmed Elsebai, Jamal Bentahar, Nagat Drawel, Gaith Rjoub, and Witold Pedrycz. 2023. A comprehensive survey on applications of transformers for deep learning tasks. Expert Systems with Applications 241 (2023), 122666.","journal-title":"Expert Systems with Applications"},{"key":"e_1_3_1_30_2","first-page":"1","volume-title":"Proceedings of the 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT)","author":"Jo Ashly Ann","year":"2024","unstructured":"Ashly Ann Jo and Ebin Deni Raj. 2024. Ethical dimensions in data-driven business: Balancing fairness, accountability, transparency, and explainability. In Proceedings of the 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT). IEEE, 1\u20136."},{"key":"e_1_3_1_31_2","unstructured":"Katikapalli Subramanyam Kalyan Ajit Rajasekharan and Sivanesan Sangeetha. 2021. Ammus: A survey of transformer-based pretrained models in natural language processing. arXiv:2108.05542. 
Retrieved from https:\/\/doi.org\/10.48550\/arXiv.2108.05542"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.3390\/e21121181"},{"key":"e_1_3_1_33_2","unstructured":"Ananya Kumar Aditi Raghunathan Robbie Jones Tengyu Ma and Percy Liang. 2022. Fine-tuning can distort pretrained features and underperform out-of-distribution. arXiv:2202.10054. Retrieved from https:\/\/doi.org\/10.48550\/arXiv.2202.10054"},{"key":"e_1_3_1_34_2","first-page":"27249","article-title":"Improved regularization and robustness for fine-tuning in neural networks","volume":"34","author":"Li Dongyue","year":"2021","unstructured":"Dongyue Li and Hongyang Zhang. 2021. Improved regularization and robustness for fine-tuning in neural networks. In Advances in Neural Information Processing Systems, Vol. 34, 27249\u201327262.","journal-title":"Advances in Neural Information Processing Systems, Vol"},{"key":"e_1_3_1_35_2","unstructured":"Hao Li Pratik Chaudhari Hao Yang Michael Lam Avinash Ravichandran Rahul Bhotika and Stefano Soatto. 2020. Rethinking the hyperparameters for fine-tuning. arXiv:2002.11770. Retrieved from https:\/\/doi.org\/10.48550\/arXiv.2002.11770"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41562-024-01847-2"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3663530.3665021"},{"key":"e_1_3_1_38_2","first-page":"32568","article-title":"Improved fine-tuning by better leveraging pre-training data","volume":"35","author":"Liu Ziquan","year":"2022","unstructured":"Ziquan Liu, Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Xiangyang Ji, Antoni Chan, and Rong Jin. 2022. Improved fine-tuning by better leveraging pre-training data. In Advances in Neural Information Processing Systems, Vol. 
35, 32568\u201332581.","journal-title":"Advances in Neural Information Processing Systems, Vol"},{"key":"e_1_3_1_39_2","first-page":"340","volume-title":"Proceedings of the 2020 4th International Conference on Computing Methodologies and Communication (ICCMC)","author":"Mathew Leeja","year":"2020","unstructured":"Leeja Mathew and V. R. Bindu. 2020. A review of natural language processing techniques for sentiment analysis using pre-trained models. In Proceedings of the 2020 4th International Conference on Computing Methodologies and Communication (ICCMC). IEEE, 340\u2013345."},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/3605943"},{"key":"e_1_3_1_41_2","first-page":"9234","article-title":"Graph information bottleneck for subgraph recognition","volume":"34","author":"Misra Dipendra","year":"2021","unstructured":"Dipendra Misra and Dheeraj Roy. 2021. Graph information bottleneck for subgraph recognition. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 34, 9234\u20139245.","journal-title":"Advances in Neural Information Processing Systems (NeurIPS)"},{"key":"e_1_3_1_42_2","unstructured":"Subhabrata Mukherjee and Ahmed Hassan Awadallah. 2019. Distilling Bert into simple neural networks with unlabeled transfer data. arXiv:1910.01769. Retrieved from https:\/\/doi.org\/10.48550\/arXiv.1910.01769"},{"key":"e_1_3_1_43_2","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume":"35","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022. Training language models to follow instructions with human feedback. In Advances in Neural Information Processing Systems, Vol. 
35, 27730\u201327744.","journal-title":"Advances in Neural Information Processing Systems, Vol"},{"issue":"3","key":"e_1_3_1_44_2","first-page":"1043","article-title":"Unsupervised feature selection based on nonlinear information theory","volume":"32","author":"Pan Xiaodan","year":"2021","unstructured":"Xiaodan Pan, Makoto Yamada, Xin Wang, and Masashi Sugiyama. 2021. Unsupervised feature selection based on nonlinear information theory. IEEE Transactions on Neural Networks and Learning Systems 32, 3 (2021), 1043\u20131055.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.7"},{"key":"e_1_3_1_46_2","unstructured":"Jason Phang Thibault F\u00e9vry and Samuel R. Bowman. 2018. Sentence encoders on stilts: Supplementary training on intermediate labeled-data tasks. arXiv:1811.01088. Retrieved from https:\/\/doi.org\/10.48550\/arXiv.1811.01088"},{"issue":"8","key":"e_1_3_1_47_2","article-title":"Leveraging the potential of large language models","volume":"48","author":"Prasad Shreya","year":"2024","unstructured":"Shreya Prasad, Himank Gupta, and Arup Ghosh. 2024. Leveraging the potential of large language models. Informatica 48, 8 (2024).","journal-title":"Informatica"},{"key":"e_1_3_1_48_2","unstructured":"Xiangyu Qi Yi Zeng Tinghao Xie Pin-Yu Chen Ruoxi Jia Prateek Mittal and Peter Henderson. 2023. Fine-tuning aligned language models compromises safety even when users do not intend to! arXiv:2310.03693. Retrieved from https:\/\/doi.org\/10.48550\/arXiv.2310.03693"},{"key":"e_1_3_1_49_2","first-page":"23231","article-title":"Meta-learning to improve pre-training","volume":"34","author":"Raghu Aniruddh","year":"2021","unstructured":"Aniruddh Raghu, Jonathan Lorraine, Simon Kornblith, Matthew McDermott, and David K. Duvenaud. 2021. Meta-learning to improve pre-training. In Advances in Neural Information Processing Systems, Vol. 
34, 23231\u201323244.","journal-title":"Advances in Neural Information Processing Systems, Vol"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2024.3365742"},{"key":"e_1_3_1_51_2","unstructured":"Vinay V. Ramasesh Ethan Dyer and Maithra Raghu. 2020. Anatomy of catastrophic forgetting: Hidden representations and task semantics. arXiv:2007.07400. Retrieved from https:\/\/doi.org\/10.48550\/arXiv.2007.07400"},{"key":"e_1_3_1_52_2","first-page":"304","volume-title":"Proceedings of the 2024 IEEE 3rd World Conference on Applied Intelligence and Computing (AIC)","author":"Ranjith R.","year":"2024","unstructured":"R. Ranjith, Ashly Ann Jo, and Ebin Deni Raj. 2024. Bipol-driven bias analysis in transformer models: A quantitative, global, and local interpretability approach for textual data. In Proceedings of the 2024 IEEE 3rd World Conference on Applied Intelligence and Computing (AIC). IEEE, 304\u2013309."},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/3411763.3451760"},{"key":"e_1_3_1_54_2","first-page":"76737","article-title":"Interpretable prototype-based graph information bottleneck","volume":"36","author":"Seo Sangwoo","year":"2024","unstructured":"Sangwoo Seo, Sungwon Kim, and Chanyoung Park. 2024. Interpretable prototype-based graph information bottleneck. Advances in Neural Information Processing Systems 36 (2023), 76737\u201376748.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_55_2","unstructured":"Ravid Shwartz-Ziv and Naftali Tishby. 2017. Opening the Black Box of deep neural networks via information. arXiv:1703.00810. Retrieved from https:\/\/doi.org\/10.48550\/arXiv.1703.00810"},{"key":"e_1_3_1_56_2","first-page":"1","article-title":"Transparency by design for large language models","author":"South Tobin","year":"2023","unstructured":"Tobin South, Robert Mahari, and Alex Pentland. 2023. Transparency by design for large language models. 
Computational Legal Futures, Network Law Review (2023), 1\u20136.","journal-title":"Computational Legal Futures, Network Law Review"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSAC.2023.3288238"},{"key":"e_1_3_1_58_2","first-page":"680","volume-title":"Proceedings of the 2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC)","author":"Thiruthuvaraj Rajasekhar","year":"2023","unstructured":"Rajasekhar Thiruthuvaraj, Ashly Ann Jo, and Ebin Deni Raj. 2023. Explainability to business: Demystify transformer models with attention-based explanations. In Proceedings of the 2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC). IEEE, 680\u2013686."},{"key":"e_1_3_1_59_2","first-page":"368","volume-title":"Proceedings of the Annual Allerton Conference on Communication, Control, and Computing","author":"Tishby N.","year":"1999","unstructured":"N. Tishby, F. C. Pereira, and W. Bialek. 1999. The information bottleneck method. In Proceedings of the Annual Allerton Conference on Communication, Control, and Computing, 368\u2013377."},{"key":"e_1_3_1_60_2","first-page":"1050","volume-title":"Proceedings of the International Conference on Applied Engineering and Natural Sciences","volume":"1","author":"Topsakal Oguzhan","year":"2023","unstructured":"Oguzhan Topsakal and Tahir Cetin Akinci. 2023. Creating large language model applications utilizing langchain: A primer on developing LLM apps fast. In Proceedings of the International Conference on Applied Engineering and Natural Sciences, Vol. 1, 1050\u20131056."},{"issue":"3","key":"e_1_3_1_61_2","first-page":"1","article-title":"Pre-trained language models in biomedical domain: A systematic survey","volume":"56","author":"Wang Benyou","year":"2023","unstructured":"Benyou Wang, Qianqian Xie, Jiahuan Pei, Zhihong Chen, Prayag Tiwari, Zhao Li, and Jie Fu. 2023. Pre-trained language models in biomedical domain: A systematic survey. 
ACM Computing Surveys 56, 3 (2023), 1\u201352.","journal-title":"ACM"},{"key":"e_1_3_1_62_2","doi-asserted-by":"crossref","first-page":"7747","DOI":"10.18653\/v1\/2022.acl-long.534","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Long Papers","volume":"1","author":"Wang Jiapeng","year":"2022","unstructured":"Jiapeng Wang, Lianwen Jin, and Kai Ding. 2022. LiLT: A simple yet effective language-independent layout transformer for structured document understanding. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Long Papers, Vol. 1. Association for Computational Linguistics, 7747\u20137757."},{"key":"e_1_3_1_63_2","unstructured":"Jason Wei Maarten Bosma Vincent Y. Zhao Kelvin Guu Adams Wei Yu Brian Lester Nan Du Andrew M. Dai and Quoc V. Le. 2021. Finetuned language models are zero-shot learners. arXiv:2109.01652. Retrieved from https:\/\/doi.org\/10.48550\/arXiv.2109.01652"},{"key":"e_1_3_1_64_2","first-page":"10271","article-title":"Stable and low-precision training for large-scale vision-language models","volume":"36","author":"Wortsman Mitchell","year":"2024","unstructured":"Mitchell Wortsman, Tim Dettmers, Luke Zettlemoyer, Ari Morcos, Ali Farhadi, and Ludwig Schmidt. 2024. Stable and low-precision training for large-scale vision-language models. Advances in Neural Information Processing Systems 36 (2023), 10271\u201310298","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00780"},{"key":"e_1_3_1_66_2","first-page":"1050","volume-title":"Uncertainty in Artificial Intelligence","author":"Wu Tailin","year":"2020","unstructured":"Tailin Wu, Ian Fischer, Isaac L. Chuang, and Max Tegmark. 2020. Learnability for the information bottleneck. In Uncertainty in Artificial Intelligence. 
PMLR, 1050\u20131060."},{"issue":"4","key":"e_1_3_1_67_2","first-page":"1","article-title":"A survey of knowledge enhanced pre-trained models","volume":"37","author":"Yang Jian","year":"2024","unstructured":"Jian Yang, Gang Xiao, Yulong Shen, Wei Jiang, Xinyu Hu, Ying Zhang, and Jinghui Peng. 2024. A survey of knowledge enhanced pre-trained models. ACM Transactions on Asian and Low-Resource Language Information Processing 37, 4 (2024), Article 111, 1\u201330.","journal-title":"ACM Transactions on Asian and Low-Resource Language Information Processing"},{"issue":"209","key":"e_1_3_1_68_2","first-page":"1","article-title":"Ranking and tuning pre-trained models: A new paradigm for exploiting model hubs","volume":"23","author":"You Kaichao","year":"2022","unstructured":"Kaichao You, Yong Liu, Ziyang Zhang, Jianmin Wang, Michael I. Jordan, and Mingsheng Long. 2022. Ranking and tuning pre-trained models: A new paradigm for exploiting model hubs. Journal of Machine Learning Research 23, 209 (2022), 1\u201347.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2020.04.157"},{"key":"e_1_3_1_70_2","doi-asserted-by":"crossref","unstructured":"Haode Zhang Haowen Liang Liming Zhan Xiao-Ming Wu and Albert Lam. 2023. Revisit few-shot intent classification with PLMs: Direct fine-tuning vs. continual pre-training. arXiv:2306.05278. 
Retrieved from https:\/\/doi.org\/10.48550\/arXiv.2306.05278","DOI":"10.18653\/v1\/2023.findings-acl.706"},{"key":"e_1_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.1145\/3639372"}],"container-title":["ACM Transactions on Intelligent Systems and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3718096","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3718096","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:37Z","timestamp":1750295917000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3718096"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,15]]},"references-count":70,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,6,30]]}},"alternative-id":["10.1145\/3718096"],"URL":"https:\/\/doi.org\/10.1145\/3718096","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"value":"2157-6904","type":"print"},{"value":"2157-6912","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,15]]},"assertion":[{"value":"2024-05-28","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-01-17","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-04-15","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}