{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T08:08:57Z","timestamp":1779350937443,"version":"3.51.4"},"reference-count":96,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T00:00:00Z","timestamp":1772755200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T00:00:00Z","timestamp":1772755200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Institute for Information and Communications Technology Planning & Evaluation","award":["IITP-2025-RS-2024-00436857"],"award-info":[{"award-number":["IITP-2025-RS-2024-00436857"]}]},{"DOI":"10.13039\/100031839","name":"Korea Research Institute for Defense Technology Planning and Advancement","doi-asserted-by":"crossref","award":["KRIT-CT-23-021"],"award-info":[{"award-number":["KRIT-CT-23-021"]}],"id":[{"id":"10.13039\/100031839","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Comput Vis"],"published-print":{"date-parts":[[2026,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Pre-trained vision-language models\u00a0(\n                    <jats:italic>e.g<\/jats:italic>\n                    ., CLIP) have shown impressive success in various computer vision tasks with their generalization capability. Recently, parameter-efficient fine-tuning\u00a0(PEFT) approaches have been actively explored to effectively and efficiently adapt the pre-trained vision-language models to a variety of downstream tasks. 
However, most existing PEFT approaches suffer from a task overfitting issue since the general knowledge of the pre-trained models is forgotten while a small number of learnable parameters in soft prompts\/adapters are fine-tuned on a small data set from a specific target task. Thus, we propose a <jats:bold>P<\/jats:bold>arameter-<jats:bold>E<\/jats:bold>fficient <jats:bold>F<\/jats:bold>ine-<jats:bold>T<\/jats:bold>uning via <jats:bold>Meta<\/jats:bold>-<jats:bold>R<\/jats:bold>egularization\u00a0(PEFT-MetaR) to improve the generalizability of parameter-efficient fine-tuning methods for vision-language models. Specifically, PEFT-MetaR meta-learns both the regularizer and learnable parameters to harness the task-specific knowledge from the downstream tasks and task-agnostic general knowledge from the pretrained models. Further, PEFT-MetaR augments the task to generate multiple virtual tasks to alleviate the meta-overfitting. In addition, we provide the analysis to comprehend how PEFT-MetaR improves the generalizability from the perspective of the gradient alignment. 
Our experiments demonstrate that PEFT-MetaR improves the generalizability of parameter-efficient fine-tuning methods on various datasets.\n                  <\/jats:p>","DOI":"10.1007\/s11263-025-02693-z","type":"journal-article","created":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T09:38:38Z","timestamp":1772789918000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Parameter-Efficient Fine-Tuning via Meta-Regularizer"],"prefix":"10.1007","volume":"134","author":[{"given":"Jinyoung","family":"Park","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Juyeon","family":"Ko","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sanghyeok","family":"Lee","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Joonmyung","family":"Choi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hyunwoo J.","family":"Kim","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2026,3,6]]},"reference":[{"key":"2693_CR1","doi-asserted-by":"crossref","unstructured":"Abdul\u00a0Samadh, J., Gani, M.H., Hussein, N., Khattak, M.U., Naseer, M.M., Shahbaz\u00a0Khan, F., & Khan, S.H. (2023) Align your prompts: Test-time prompting with distribution alignment for zero-shot generalization. In: NeurIPS","DOI":"10.52202\/075280-3525"},{"key":"2693_CR2","unstructured":"Antoniou, A., Edwards, H., & Storkey, A. (2019) How to train your maml. In: ICLR"},{"key":"2693_CR3","unstructured":"Balaji, Y., Sankaranarayanan, S., & Chellappa, R. (2018) Metareg: Towards domain generalization using meta-regularization. 
In: NeurIPS"},{"key":"2693_CR4","unstructured":"Bechtle, S., Molchanov, A., Chebotar, Y., Grefenstette, E., Righetti, L., Sukhatme, G.S., & Meier, F. (2020) Meta learning via learned loss. In: ICPR"},{"key":"2693_CR5","doi-asserted-by":"crossref","unstructured":"Bossard, L., Guillaumin, M., & Van\u00a0Gool, L. (2014) Food-101\u2013mining discriminative components with random forests. In: ECCV","DOI":"10.1007\/978-3-319-10599-4_29"},{"key":"2693_CR6","doi-asserted-by":"crossref","unstructured":"Chen, S., Ge, C., Tong, Z., Wang, J., Song, Y., Wang, J., & Luo, P. (2022) Adaptformer: Adapting vision transformers for scalable visual recognition. In: NeurIPS","DOI":"10.52202\/068431-1212"},{"key":"2693_CR7","doi-asserted-by":"crossref","unstructured":"Cho, E., Kim, J., & Kim, H.J. (2023) Distribution-aware prompt tuning for vision-language models. In: ICCV","DOI":"10.1109\/ICCV51070.2023.02011"},{"key":"2693_CR8","doi-asserted-by":"crossref","unstructured":"Choi, H.K., Choi, J., & Kim, H.J. (2023) Tokenmixup: Efficient attention-guided token-level data augmentation for transformers. In: NeurIPS","DOI":"10.52202\/068431-1034"},{"key":"2693_CR9","doi-asserted-by":"crossref","unstructured":"Chowdhury, S., Nag, S., & Manocha, D. (2023) Apollo: unified adapter and prompt learning for vision language models. In: EMNLP","DOI":"10.18653\/v1\/2023.emnlp-main.629"},{"key":"2693_CR10","doi-asserted-by":"crossref","unstructured":"Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., & Vedaldi, A. (2014) Describing textures in the wild. In: CVPR","DOI":"10.1109\/CVPR.2014.461"},{"key":"2693_CR11","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009) Imagenet: A large-scale hierarchical image database. In: CVPR","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"2693_CR12","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al. 
(2021) An image is worth 16x16 words: Transformers for image recognition at scale. In: ICLR"},{"key":"2693_CR13","doi-asserted-by":"crossref","unstructured":"Du, Y., Wei, F., Zhang, Z., Shi, M., Gao, Y., & Li, G. (2022) Learning to prompt for open-vocabulary object detection with vision-language model. In: CVPR","DOI":"10.1109\/CVPR52688.2022.01369"},{"key":"2693_CR14","unstructured":"Fei-Fei, L., Fergus, R., & Perona, P. (2004) Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories. In: CVPRW"},{"key":"2693_CR15","doi-asserted-by":"crossref","unstructured":"Feng, C., Zhong, Y., Jie, Z., Chu, X., Ren, H., Wei, X., Xie, W., & Ma, L. (2022) Promptdet: Towards open-vocabulary detection using uncurated images. In: ECCV","DOI":"10.1007\/978-3-031-20077-9_41"},{"key":"2693_CR16","unstructured":"Finn, C., Abbeel, P., & Levine, S. (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In: ICML"},{"issue":"2","key":"2693_CR17","doi-asserted-by":"publisher","first-page":"581","DOI":"10.1007\/s11263-023-01891-x","volume":"132","author":"P Gao","year":"2024","unstructured":"Gao, P., Geng, S., Zhang, R., Ma, T., Fang, R., Zhang, Y., Li, H., & Qiao, Y. (2024). Clip-adapter: Better vision-language models with feature adapters. IJCV, 132(2), 581\u2013595.","journal-title":"IJCV"},{"key":"2693_CR18","unstructured":"Grant, E., Finn, C., Levine, S., Darrell, T., & Griffiths, T. (2018) Recasting gradient-based meta-learning as hierarchical bayes. In: ICLR"},{"key":"2693_CR19","unstructured":"Gu, X., Lin, T.-Y., Kuo, W., & Cui, Y. (2022) Open-vocabulary object detection via vision and language knowledge distillation. In: ICLR"},{"issue":"7","key":"2693_CR20","first-page":"2217","volume":"12","author":"P Helber","year":"2019","unstructured":"Helber, P., Bischke, B., Dengel, A., & Borth, D. (2019). Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification. 
JSTARS, 12(7), 2217\u20132226.","journal-title":"JSTARS"},{"key":"2693_CR21","doi-asserted-by":"crossref","unstructured":"Hendrycks, D., Basart, S., Mu, N., Kadavath, S., Wang, F., Dorundo, E., Desai, R., Zhu, T., Parajuli, S., Guo, M., et al. (2021) The many faces of robustness: A critical analysis of out-of-distribution generalization. In: ICCV","DOI":"10.1109\/ICCV48922.2021.00823"},{"key":"2693_CR22","doi-asserted-by":"crossref","unstructured":"Hendrycks, D., Zhao, K., Basart, S., Steinhardt, J., & Song, D. (2021) Natural adversarial examples. In: CVPR","DOI":"10.1109\/CVPR46437.2021.01501"},{"key":"2693_CR23","doi-asserted-by":"crossref","unstructured":"Hochreiter, S., Younger, A.S., & Conwell, P.R. (2001) Learning to learn using gradient descent. In: ICANN","DOI":"10.1007\/3-540-44668-0_13"},{"issue":"9","key":"2693_CR24","first-page":"5149","volume":"44","author":"T Hospedales","year":"2021","unstructured":"Hospedales, T., Antoniou, A., Micaelli, P., & Storkey, A. (2021). Meta-learning in neural networks: A survey. TPAMI, 44(9), 5149\u20135169.","journal-title":"TPAMI"},{"key":"2693_CR25","unstructured":"Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De\u00a0Laroussilhe, Q., Gesmundo, A., Attariyan, M., & Gelly, S. (2019) Parameter-efficient transfer learning for nlp. In: ICML"},{"key":"2693_CR26","unstructured":"Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W., et al. (2022) Lora: Low-rank adaptation of large language models. In: ICLR"},{"key":"2693_CR27","unstructured":"Hwang, D., Park, J., Kwon, S., Kim, K., Ha, J.-W., & Kim, H.J. (2020) Self-supervised auxiliary learning with meta-paths for heterogeneous graphs. In: NeurIPS"},{"key":"2693_CR28","unstructured":"Hwang, D., Park, J., Kwon, S., Kim, K.-M., Ha, J.-W., & Kim, H.J. (2021) Self-supervised auxiliary learning for graph neural networks via meta-learning. 
arXiv:2103.00771"},{"key":"2693_CR29","unstructured":"Ilharco, G., Wortsman, M., Gadre, S.Y., Song, S., Hajishirzi, H., Kornblith, S., Farhadi, A., & Schmidt, L. (2022) Patching open-vocabulary models by interpolating weights. In: NeurIPS"},{"key":"2693_CR30","doi-asserted-by":"crossref","unstructured":"Jia, M., Tang, L., Chen, B.-C., Cardie, C., Belongie, S., Hariharan, B., & Lim, S.-N. (2022) Visual prompt tuning. In: ECCV","DOI":"10.1007\/978-3-031-19827-4_41"},{"key":"2693_CR31","unstructured":"Jia, C., Yang, Y., Xia, Y., Chen, Y.-T., Parekh, Z., Pham, H., Le, Q., Sung, Y.-H., Li, Z., & Duerig, T. (2021) Scaling up visual and vision-language representation learning with noisy text supervision. In: ICML"},{"key":"2693_CR32","unstructured":"Karimi\u00a0Mahabadi, R., Henderson, J., & Ruder, S. (2021) Compacter: Efficient low-rank hypercomplex adapter layers. In: NeurIPS"},{"key":"2693_CR33","doi-asserted-by":"crossref","unstructured":"Khattak, M.U., Rasheed, H., Maaz, M., Khan, S., & Khan, F.S. (2023) Maple: Multi-modal prompt learning. In: CVPR","DOI":"10.1109\/CVPR52729.2023.01832"},{"key":"2693_CR34","doi-asserted-by":"crossref","unstructured":"Khattak, M.U., Wasim, S.T., Naseer, M., Khan, S., Yang, M.-H., & Khan, F.S. (2023) Self-regulating prompts: Foundational model adaptation without forgetting. In: ICCV","DOI":"10.1109\/ICCV51070.2023.01394"},{"key":"2693_CR35","unstructured":"Kim, J.-H., Choo, W., Jeong, H., & Song, H.O. (2021) Co-mixup: Saliency guided joint mixup with supermodular diversity. In: ICLR"},{"key":"2693_CR36","doi-asserted-by":"crossref","unstructured":"Ko, D., Choi, J., Choi, H.K., On, K.-W., Roh, B., & Kim, H.J. (2023) Meltr: Meta loss transformer for learning to fine-tune video foundation models. In: CVPR","DOI":"10.1109\/CVPR52729.2023.01925"},{"key":"2693_CR37","unstructured":"Koch, G., Zemel, R., Salakhutdinov, R., et al. (2015) Siamese neural networks for one-shot image recognition. 
In: ICMLW"},{"key":"2693_CR38","doi-asserted-by":"crossref","unstructured":"Krause, J., Stark, M., Deng, J., & Fei-Fei, L. (2013) 3d object representations for fine-grained categorization. In: ICCVW","DOI":"10.1109\/ICCVW.2013.77"},{"key":"2693_CR39","doi-asserted-by":"crossref","unstructured":"Lee, K., Maji, S., Ravichandran, A.,& Soatto, S. (2019) Meta-learning with differentiable convex optimization. In: CVPR","DOI":"10.1109\/CVPR.2019.01091"},{"key":"2693_CR40","doi-asserted-by":"crossref","unstructured":"Lee, D., Song, S., Suh, J., Choi, J., Lee, S., & Kim, H.J. (2023) Read-only prompt optimization for vision-language few-shot learning. In: ICCV","DOI":"10.1109\/ICCV51070.2023.00135"},{"key":"2693_CR41","doi-asserted-by":"crossref","unstructured":"Lester, B., Al-Rfou, R., & Constant, N. (2021) The power of scale for parameter-efficient prompt tuning. In: EMNLP","DOI":"10.18653\/v1\/2021.emnlp-main.243"},{"key":"2693_CR42","doi-asserted-by":"crossref","unstructured":"Li, X.L., & Liang, P. (2021) Prefix-tuning: Optimizing continuous prompts for generation. In: ACL","DOI":"10.18653\/v1\/2021.acl-long.353"},{"key":"2693_CR43","doi-asserted-by":"crossref","unstructured":"Li, J., Gao, M., Wei, L., Tang, S., Zhang, W., Li, M., Ji, W., Tian, Q., Chua, T.-S., & Zhuang, Y. (2023) Gradient-regulated meta-prompt learning for generalizable vision-language models. In: ICCV","DOI":"10.1109\/ICCV51070.2023.00241"},{"key":"2693_CR44","doi-asserted-by":"crossref","unstructured":"Li, Z., Li, X., Fu, X., Zhang, X., Wang, W., Chen, S., & Yang, J. (2024) Promptkd: Unsupervised prompt distillation for vision-language models. In: CVPR","DOI":"10.1109\/CVPR52733.2024.02513"},{"key":"2693_CR45","unstructured":"Li, J., Li, D., Savarese, S., & Hoi, S. (2023) Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In: ICML"},{"key":"2693_CR46","doi-asserted-by":"crossref","unstructured":"Li, D., Yang, Y., Song, Y.-Z., & Hospedales, T. 
(2018) Learning to generalize: Meta-learning for domain generalization. In: AAAI","DOI":"10.1609\/aaai.v32i1.11596"},{"key":"2693_CR47","unstructured":"Lian, D., Zhou, D., Feng, J., & Wang, X. (2022) Scaling & shifting your features: A new baseline for efficient model tuning. In: NeurIPS"},{"key":"2693_CR48","doi-asserted-by":"crossref","unstructured":"Liu, X., Zheng, Y., Du, Z., Ding, M., Qian, Y., Yang, Z., & Tang, J. (2023) Gpt understands, too. AI Open","DOI":"10.1016\/j.aiopen.2023.08.012"},{"key":"2693_CR49","unstructured":"Loshchilov, I., & Hutter, F. (2019) Decoupled weight decay regularization. In: ICLR"},{"key":"2693_CR50","doi-asserted-by":"crossref","unstructured":"Lu, Y., Liu, J., Zhang, Y., Liu, Y., & Tian, X. (2022) Prompt distribution learning. In: CVPR","DOI":"10.1109\/CVPR52688.2022.00514"},{"key":"2693_CR51","doi-asserted-by":"crossref","unstructured":"L\u00fcddecke, T., & Ecker, A. (2022) Image segmentation using text and image prompts. In: CVPR","DOI":"10.1109\/CVPR52688.2022.00695"},{"key":"2693_CR52","unstructured":"Maji, S., Kannala, J., Rahtu, E., Blaschko, M., & Vedaldi, A. (2013) Fine-grained visual classification of aircraft. arXiv:1306.5151"},{"key":"2693_CR53","unstructured":"Mishra, N., Rohaninejad, M., Chen, X., & Abbeel, P. (2018) A simple neural attentive meta-learner. In: ICLR"},{"key":"2693_CR54","unstructured":"Mokady, R., Hertz, A., & Bermano, A.H. (2021) Clipcap: Clip prefix for image captioning. arXiv:2111.09734"},{"key":"2693_CR55","unstructured":"Munkhdalai, T., & Yu, H. (2017) Meta networks. In: ICML"},{"key":"2693_CR56","unstructured":"Munkhdalai, T., Yuan, X., Mehri, S., & Trischler, A. (2018) Rapid adaptation with conditionally shifted neurons. In: ICML"},{"key":"2693_CR57","unstructured":"Nichol, A., Achiam, J., & Schulman, J. (2018) On first-order meta-learning algorithms. arXiv:1803.02999"},{"key":"2693_CR58","doi-asserted-by":"crossref","unstructured":"Nilsback, M.-E., & Zisserman, A. 
(2008) Automated flower classification over a large number of classes. In: ICVGIP","DOI":"10.1109\/ICVGIP.2008.47"},{"key":"2693_CR59","doi-asserted-by":"crossref","unstructured":"Park, J., Ko, J., & Kim, H.J. (2024) Prompt learning via meta-regularization. In: CVPR","DOI":"10.1109\/CVPR52733.2024.02544"},{"key":"2693_CR60","unstructured":"Park, H., Lee, S., Kim, S., Park, J., Jeong, J., Kim, K.-M., Ha, J.-W., & Kim, H.J. (2022) Metropolis-hastings data augmentation for graph neural networks. In: NeurIPS"},{"key":"2693_CR61","doi-asserted-by":"crossref","unstructured":"Parkhi, O.M., Vedaldi, A., Zisserman, A., & Jawahar, C. (2012) Cats and dogs. In: CVPR","DOI":"10.1109\/CVPR.2012.6248092"},{"key":"2693_CR62","unstructured":"Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: (2021) Learning transferable visual models from natural language supervision. In: ICML"},{"key":"2693_CR63","unstructured":"Ravi, S., & Larochelle, H. (2016) Optimization as a model for few-shot learning. In: ICLR"},{"key":"2693_CR64","unstructured":"Recht, B., Roelofs, R., Schmidt, L., & Shankar, V. (2019) Do imagenet classifiers generalize to imagenet? In: ICML"},{"key":"2693_CR65","unstructured":"Roy, S., & Etemad, A. (2024) Consistency-guided prompt learning for vision-language models. In: ICLR"},{"key":"2693_CR66","unstructured":"Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., & Lillicrap, T. (2016) Meta-learning with memory-augmented neural networks. In: ICML"},{"key":"2693_CR67","unstructured":"Shu, J., Xie, Q., Yi, L., Zhao, Q., Zhou, S., Xu, Z., & Meng, D. (2019) Meta-weight-net: Learning an explicit mapping for sample weighting. In: NeurIPS"},{"key":"2693_CR68","doi-asserted-by":"crossref","unstructured":"Singh, A., Hu, R., Goswami, V., Couairon, G., Galuba, W., Rohrbach, M., & Kiela, D. (2022) Flava: A foundational language and vision alignment model. 
In: CVPR","DOI":"10.1109\/CVPR52688.2022.01519"},{"key":"2693_CR69","unstructured":"Snell, J., Swersky, K., & Zemel, R. (2017) Prototypical networks for few-shot learning. In: NeurIPS"},{"key":"2693_CR70","unstructured":"Soomro, K., Zamir, A.R., & Shah, M. (2013) Ucf101: A dataset of 101 human actions classes from videos in the wild. In: ICCVW"},{"issue":"1","key":"2693_CR71","first-page":"1929","volume":"15","author":"N Srivastava","year":"2014","unstructured":"Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. JMLR, 15(1), 1929\u20131958.","journal-title":"JMLR"},{"key":"2693_CR72","doi-asserted-by":"crossref","unstructured":"Sung, Y.-L., Cho, J., & Bansal, M. (2022) Vl-adapter: Parameter-efficient transfer learning for vision-and-language tasks. In: CVPR","DOI":"10.1109\/CVPR52688.2022.00516"},{"key":"2693_CR73","doi-asserted-by":"crossref","unstructured":"Sung, F., Yang, Y., Zhang, L., Xiang, T., Torr, P.H., & Hospedales, T.M. (2018) Learning to compare: Relation network for few-shot learning. In: CVPR","DOI":"10.1109\/CVPR.2018.00131"},{"key":"2693_CR74","unstructured":"Uddin, A., Monira, M., Shin, W., Chung, T., Bae, S.-H., et al. (2021) Saliencymix: A saliency guided data augmentation strategy for better regularization. In: ICLR"},{"key":"2693_CR75","doi-asserted-by":"crossref","unstructured":"Upadhyay, U., Karthik, S., Mancini, M., & Akata, Z. (2023) Probvlm: Probabilistic adapter for frozen vison-language models. In: ICCV","DOI":"10.1109\/ICCV51070.2023.00182"},{"key":"2693_CR76","unstructured":"Verma, V., Lamb, A., Beckham, C., Najafi, A., Courville, A., Mitliagkas, I., & Bengio, Y. (2019) Manifold mixup: learning better representations by interpolating hidden states. In: ICML"},{"key":"2693_CR77","unstructured":"Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D., et al. (2016) Matching networks for one shot learning. 
In: NeurIPS"},{"key":"2693_CR78","unstructured":"Wang, H., Ge, S., Lipton, Z., & Xing, E.P. (2019) Learning robust global representations by penalizing local predictive power. In: NeurIPS"},{"key":"2693_CR79","doi-asserted-by":"crossref","unstructured":"Wortsman, M., Ilharco, G., Kim, J.W., Li, M., Kornblith, S., Roelofs, R., Lopes, R.G., Hajishirzi, H., Farhadi, A., Namkoong, H., et al. (2022) Robust fine-tuning of zero-shot models. In: CVPR","DOI":"10.1109\/CVPR52688.2022.00780"},{"key":"2693_CR80","doi-asserted-by":"crossref","unstructured":"Xian, Y., Schiele, B., & Akata, Z. (2017) Zero-shot learning-the good, the bad and the ugly. In: CVPR","DOI":"10.1109\/CVPR.2017.328"},{"key":"2693_CR81","doi-asserted-by":"crossref","unstructured":"Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., & Torralba, A. (2010) Sun database: Large-scale scene recognition from abbey to zoo. In: CVPR","DOI":"10.1109\/CVPR.2010.5539970"},{"key":"2693_CR82","doi-asserted-by":"crossref","unstructured":"Yang, L., Zhang, R.-Y., Wang, Y., & Xie, X. (2024) Mma: Multi-modal adapter for vision-language models. In: CVPR","DOI":"10.1109\/CVPR52733.2024.02249"},{"key":"2693_CR83","unstructured":"Yao, H., Zhang, L., & Finn, C. (2022) Meta-learning with fewer tasks through task interpolation. In: ICLR"},{"key":"2693_CR84","doi-asserted-by":"crossref","unstructured":"Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., & Yoo, Y. (2019) Cutmix: Regularization strategy to train strong classifiers with localizable features. In: ICCV","DOI":"10.1109\/ICCV.2019.00612"},{"key":"2693_CR85","unstructured":"Zang, Y., Li, W., Zhou, K., Huang, C., & Loy, C.C. (2022) Unified vision and language prompt learning. arXiv:2210.07225"},{"key":"2693_CR86","doi-asserted-by":"crossref","unstructured":"Zhai, X., Wang, X., Mustafa, B., Steiner, A., Keysers, D., Kolesnikov, A., & Beyer, L. (2022) Lit: Zero-shot transfer with locked-image text tuning. 
In: CVPR","DOI":"10.1109\/CVPR52688.2022.01759"},{"key":"2693_CR87","unstructured":"Zhang, H., Cisse, M., Dauphin, Y.N., & Lopez-Paz, D. (2018) mixup: Beyond empirical risk minimization. In: ICLR"},{"key":"2693_CR88","unstructured":"Zhang, R., Han, J., Zhou, A., Hu, X., Yan, S., Lu, P., Li, H., Gao, P., & Qiao, Y. (2024) Llama-adapter: Efficient fine-tuning of language models with zero-init attention. In: ICLR"},{"key":"2693_CR89","unstructured":"Zhang, G., Wang, C., Xu, B., & Grosse, R. (2019) Three mechanisms of weight decay regularization. In: ICLR"},{"key":"2693_CR90","doi-asserted-by":"crossref","unstructured":"Zhang, R., Zhang, W., Fang, R., Gao, P., Li, K., Dai, J., Qiao, Y., & Li, H. (2022) Tip-adapter: Training-free adaption of clip for few-shot classification. In: ECCV","DOI":"10.1007\/978-3-031-19833-5_29"},{"key":"2693_CR91","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Zhang, C., Yu, K., Tang, Y., & He, Z. (2024) Concept-guided prompt learning for generalization in vision-language models. In: AAAI","DOI":"10.1609\/aaai.v38i7.28568"},{"key":"2693_CR92","doi-asserted-by":"crossref","unstructured":"Zhong, Y., Yang, J., Zhang, P., Li, C., Codella, N., Li, L.H., Zhou, L., Dai, X., Yuan, L., Li, Y., et al.: (2022) Regionclip: Region-based language-image pretraining. In: CVPR","DOI":"10.1109\/CVPR52688.2022.01629"},{"key":"2693_CR93","doi-asserted-by":"crossref","unstructured":"Zhou, K., Yang, J., Loy, C.C., & Liu, Z. (2022) Conditional prompt learning for vision-language models. In: CVPR","DOI":"10.1109\/CVPR52688.2022.01631"},{"issue":"9","key":"2693_CR94","doi-asserted-by":"publisher","first-page":"2337","DOI":"10.1007\/s11263-022-01653-1","volume":"130","author":"K Zhou","year":"2022","unstructured":"Zhou, K., Yang, J., Loy, C. C., & Liu, Z. (2022). Learning to prompt for vision-language models. 
IJCV, 130(9), 2337\u20132348.","journal-title":"IJCV"},{"key":"2693_CR95","doi-asserted-by":"crossref","unstructured":"Zhu, B., Niu, Y., Han, Y., Wu, Y., & Zhang, H. (2023) Prompt-aligned gradient for prompt tuning. In: ICCV","DOI":"10.1109\/ICCV51070.2023.01435"},{"key":"2693_CR96","unstructured":"Zintgraf, L., Shiarli, K., Kurin, V., Hofmann, K., & Whiteson, S. (2019) Fast context adaptation via meta-learning. In: ICML"}],"container-title":["International Journal of Computer Vision"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-025-02693-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11263-025-02693-z","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11263-025-02693-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T07:32:27Z","timestamp":1779348747000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11263-025-02693-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,6]]},"references-count":96,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2026,4]]}},"alternative-id":["2693"],"URL":"https:\/\/doi.org\/10.1007\/s11263-025-02693-z","relation":{},"ISSN":["0920-5691","1573-1405"],"issn-type":[{"value":"0920-5691","type":"print"},{"value":"1573-1405","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,6]]},"assertion":[{"value":"21 April 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 September 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article 
History"}},{"value":"6 March 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"148"}}