{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:02:10Z","timestamp":1760058130131,"version":"build-2065373602"},"reference-count":36,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2025,3,14]],"date-time":"2025-03-14T00:00:00Z","timestamp":1741910400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Prompt tuning visual-language models (VLMs) for specialized tasks often involves leveraging task-specific textual tokens, which can tailor the pre-existing, broad capabilities of a VLM to more narrowly focused applications. This approach, exemplified by CoOp-based methods, integrates mutable textual tokens with categorical tokens to foster nuanced textual comprehension. Nonetheless, such specialized textual insights often fail to generalize beyond the scope of familiar categories, as they tend to overshadow the versatile, general textual knowledge intrinsic to the model\u2019s wide-ranging applicability. Addressing this base-novel dilemma, we propose Sparse Knowledge-guided Context Optimization (Sparse-KgCoOp). This technique aims to strengthen the adaptable prompts\u2019 capacity to generalize to previously unencountered categories. The cornerstone of Sparse-KgCoOp is the premise that reducing the differences between adaptive prompts and their hand-crafted counterparts through sparsification operations can mitigate the erosion of fundamental knowledge. Specifically, Sparse-KgCoOp seeks to narrow the gap between the textual embeddings produced by the dynamic prompts and the manually devised ones, thus preserving the foundational knowledge while maintaining adaptability. 
Extensive experiments on several benchmarks demonstrate that the proposed Sparse-KgCoOp is an efficient method for prompt tuning.<\/jats:p>","DOI":"10.3390\/e27030301","type":"journal-article","created":{"date-parts":[[2025,3,14]],"date-time":"2025-03-14T07:02:16Z","timestamp":1741935736000},"page":"301","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Enhancing Visual-Language Prompt Tuning Through Sparse Knowledge-Guided Context Optimization"],"prefix":"10.3390","volume":"27","author":[{"given":"Qiangxing","family":"Tian","sequence":"first","affiliation":[{"name":"School of Information and Electrical Engineering, Hangzhou City University, Hangzhou 310015, China"}]},{"given":"Min","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, East China Normal University, Shanghai 200062, China"}]}],"member":"1968","published-online":{"date-parts":[[2025,3,14]]},"reference":[{"key":"ref_1","unstructured":"Meila, M., and Zhang, T. (2021, January 18\u201324). Learning Transferable Visual Models From Natural Language Supervision. Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Virtual Event. PMLR Proceedings of Machine Learning Research."},{"key":"ref_2","unstructured":"Alayrac, J., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., Lenc, K., Mensch, A., Millican, K., and Reynolds, M. (2022). Flamingo: A Visual Language Model for Few-Shot Learning. arXiv."},{"key":"ref_3","unstructured":"Jia, C., Yang, Y., Xia, Y., Chen, Y., Parekh, Z., Pham, H., Le, Q.V., Sung, Y., Li, Z., and Duerig, T. (2021, January 18\u201324). Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision. Proceedings of the 38th International Conference on Machine Learning, ICML, Virtual."},{"key":"ref_4","unstructured":"Meila, M., and Zhang, T. (2021, January 18\u201324). 
Unifying Vision-and-Language Tasks via Text Generation. Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Virtual Event. PMLR Proceedings of Machine Learning Research."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Gan, Z., Li, L., Li, C., Wang, L., Liu, Z., and Gao, J. (2022). Vision-Language Pre-training: Basics, Recent Advances, and Future Trends. arXiv.","DOI":"10.1561\/9781638281337"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"709","DOI":"10.1007\/978-3-031-19827-4_41","article-title":"Visual Prompt Tuning","volume":"Volume 13693","author":"Avidan","year":"2022","journal-title":"Proceedings of the Computer Vision-ECCV 2022-17th European Conference"},{"key":"ref_7","unstructured":"Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., and Neubig, G. (2021). Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. arXiv."},{"key":"ref_8","unstructured":"Inui, K., Jiang, J., Ng, V., and Wan, X. (2019, January 3\u20137). Language Models as Knowledge Bases?. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Rao, Y., Zhao, W., Chen, G., Tang, Y., Zhu, Z., Huang, G., Zhou, J., and Lu, J. (2022, January 18\u201324). DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01755"},{"key":"ref_10","unstructured":"Ranzato, M., Beygelzimer, A., Dauphin, Y.N., Liang, P., and Vaughan, J.W. (2021, January 6\u201314). Multimodal Few-Shot Learning with Frozen Language Models. 
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, Virtual."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Yao, Y., Zhang, A., Zhang, Z., Liu, Z., Chua, T., and Sun, M. (2021). CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models. arXiv.","DOI":"10.18653\/v1\/2022.findings-acl.273"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"2337","DOI":"10.1007\/s11263-022-01653-1","article-title":"Learning to Prompt for Vision-Language Models","volume":"130","author":"Zhou","year":"2022","journal-title":"Int. J. Comput. Vis."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Zhou, K., Yang, J., Loy, C.C., and Liu, Z. (2022, January 18\u201324). Conditional Prompt Learning for Vision-Language Models. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01631"},{"key":"ref_14","unstructured":"Guyon, I., von Luxburg, U., Bengio, S., Wallach, H.M., Fergus, R., Vishwanathan, S.V.N., and Garnett, R. (2017, January 4\u20139). Attention is All you Need. Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA."},{"key":"ref_15","unstructured":"Chen, T., Kornblith, S., Norouzi, M., and Hinton, G.E. (2020, January 13\u201318). A Simple Framework for Contrastive Learning of Visual Representations. Proceedings of the 37th International Conference on Machine Learning, ICML 2020, Virtual Event. PMLR Proceedings of Machine Learning Research."},{"key":"ref_16","unstructured":"Wang, Z., Yu, J., Yu, A.W., Dai, Z., Tsvetkov, Y., and Cao, Y. (2022, January 25\u201329). SimVLM: Simple Visual Language Model Pretraining with Weak Supervision. 
Proceedings of the Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event."},{"key":"ref_17","unstructured":"Meila, M., and Zhang, T. (2021, January 18\u201324). ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision. Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Virtual Event. PMLR Proceedings of Machine Learning Research."},{"key":"ref_18","unstructured":"Wallach, H.M., Larochelle, H., Beygelzimer, A., d\u2019Alch\u00e9-Buc, F., Fox, E.B., and Garnett, R. (2019, January 8\u201314). ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, BC, Canada."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"He, K., Chen, X., Xie, S., Li, Y., Doll\u00e1r, P., and Girshick, R.B. (2022, January 18\u201324). Masked Autoencoders Are Scalable Vision Learners. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01553"},{"key":"ref_20","unstructured":"Zang, Y., Li, W., Zhou, K., Huang, C., and Loy, C.C. (2022). Unified Vision and Language Prompt Learning. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Lu, Y., Liu, J., Zhang, Y., Liu, Y., and Tian, X. (2022, January 18\u201324). Prompt Distribution Learning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00514"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zhu, B., Niu, Y., Han, Y., Wu, Y., and Zhang, H. (2022). Prompt-Aligned Gradient for Prompt Tuning. 
arXiv.","DOI":"10.1109\/ICCV51070.2023.01435"},{"key":"ref_23","unstructured":"Gao, P., Geng, S., Zhang, R., Ma, T., Fang, R., Zhang, Y., Li, H., and Qiao, Y. (2021). CLIP-Adapter: Better Vision-Language Models with Feature Adapters. arXiv."},{"key":"ref_24","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2021, January 3\u20137). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L., Li, K., and Fei-Fei, L. (2009, January 20\u201325). ImageNet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1016\/j.cviu.2005.09.012","article-title":"Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories","volume":"106","author":"Li","year":"2007","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Parkhi, O.M., Vedaldi, A., Zisserman, A., and Jawahar, C.V. (2012, January 16\u201321). Cats and dogs. 
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248092"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Krause, J., Stark, M., Deng, J., and Fei-Fei, L. (2013, January 1\u20138). 3D Object Representations for Fine-Grained Categorization. Proceedings of the 2013 IEEE International Conference on Computer Vision Workshops, ICCV Workshops 2013, Sydney, Australia.","DOI":"10.1109\/ICCVW.2013.77"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Nilsback, M., and Zisserman, A. (2008, January 16\u201319). Automated Flower Classification over a Large Number of Classes. Proceedings of the Sixth Indian Conference on Computer Vision, Graphics & Image Processing, ICVGIP 2008, Bhubaneswar, India.","DOI":"10.1109\/ICVGIP.2008.47"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"446","DOI":"10.1007\/978-3-319-10599-4_29","article-title":"Food-101-Mining Discriminative Components with Random Forests","volume":"Volume 8694","author":"Fleet","year":"2014","journal-title":"Proceedings of the Computer Vision-ECCV 2014-13th European Conference"},{"key":"ref_32","unstructured":"Maji, S., Rahtu, E., Kannala, J., Blaschko, M.B., and Vedaldi, A. (2013). Fine-Grained Visual Classification of Aircraft. arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"2217","DOI":"10.1109\/JSTARS.2019.2918242","article-title":"EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification","volume":"12","author":"Helber","year":"2019","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_34","unstructured":"Soomro, K., Zamir, A.R., and Shah, M. (2012). UCF101: A Dataset of 101 Human Actions Classes from Videos in The Wild. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., and Vedaldi, A. (2014, January 23\u201328). 
Describing Textures in the Wild. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.461"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., and Torralba, A. (2010, January 13\u201318). SUN database: Large-scale scene recognition from abbey to zoo. Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5539970"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/27\/3\/301\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:53:35Z","timestamp":1760028815000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/27\/3\/301"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,14]]},"references-count":36,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,3]]}},"alternative-id":["e27030301"],"URL":"https:\/\/doi.org\/10.3390\/e27030301","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2025,3,14]]}}}