{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T15:31:26Z","timestamp":1774539086533,"version":"3.50.1"},"reference-count":301,"publisher":"Association for Computing Machinery (ACM)","issue":"12","license":[{"start":{"date-parts":[[2024,7,25]],"date-time":"2024-07-25T00:00:00Z","timestamp":1721865600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2022ZD0160300"],"award-info":[{"award-number":["2022ZD0160300"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62276004, 61932020"],"award-info":[{"award-number":["62276004, 61932020"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100003816","name":"Huawei Technologies","doi-asserted-by":"crossref","award":["P0038941"],"award-info":[{"award-number":["P0038941"]}],"id":[{"id":"10.13039\/501100003816","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2024,12,31]]},"abstract":"<jats:p>Fine-tuning visual models has been widely shown promising performance on many downstream visual tasks. With the surprising development of pre-trained visual foundation models, visual tuning jumped out of the standard modus operandi that fine-tunes the whole pre-trained model or just the fully connected layer. 
Instead, recent advances can achieve superior performance than full-tuning the whole pre-trained parameters by updating far fewer parameters, enabling edge devices and downstream applications to reuse the increasingly large foundation models deployed on the cloud. With the aim of helping researchers get the full picture and future directions of visual tuning, this survey characterizes a large and thoughtful selection of recent works, providing a systematic and comprehensive overview of existing work and models. Specifically, it provides a detailed background of visual tuning and categorizes recent visual tuning techniques into five groups: fine-tuning, prompt tuning, adapter tuning, parameter tuning, and remapping tuning. Meanwhile, it offers some exciting research directions for prospective pre-training and various interactions in visual tuning.<\/jats:p>","DOI":"10.1145\/3657632","type":"journal-article","created":{"date-parts":[[2024,4,12]],"date-time":"2024-04-12T12:23:59Z","timestamp":1712924639000},"page":"1-38","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":28,"title":["Visual Tuning"],"prefix":"10.1145","volume":"56","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9905-8154","authenticated-orcid":false,"given":"Bruce X.B.","family":"Yu","sequence":"first","affiliation":[{"name":"Zhejiang University-University of Illinois Urbana-Champaign Institute, Zhejiang University, Haining, China and Zhejiang Provincial Engineering Research Center for Multimodal Transport Logistics Large Models, Haining China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0610-907X","authenticated-orcid":false,"given":"Jianlong","family":"Chang","sequence":"additional","affiliation":[{"name":"Huawei, Shenzhen China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5714-0149","authenticated-orcid":false,"given":"Haixin","family":"Wang","sequence":"additional","affiliation":[{"name":"Peking University, National Engineering Research 
Center for Software Engineering, Beijing China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8179-6685","authenticated-orcid":false,"given":"Lingbo","family":"Liu","sequence":"additional","affiliation":[{"name":"Peng Cheng Laboratory, Shenzhen China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7254-4715","authenticated-orcid":false,"given":"Shijie","family":"Wang","sequence":"additional","affiliation":[{"name":"Huawei, Shenzhen China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-7606-5114","authenticated-orcid":false,"given":"Zhiyu","family":"Wang","sequence":"additional","affiliation":[{"name":"Huawei, Shenzhen China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8717-8351","authenticated-orcid":false,"given":"Junfan","family":"Lin","sequence":"additional","affiliation":[{"name":"Peng Cheng Laboratory, Shenzhen China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4831-9451","authenticated-orcid":false,"given":"Lingxi","family":"Xie","sequence":"additional","affiliation":[{"name":"Huawei, Shenzhen China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3882-2205","authenticated-orcid":false,"given":"Haojie","family":"Li","sequence":"additional","affiliation":[{"name":"Shandong University of Science and Technology, College of Computer Science and Engineering, Qingdao China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1493-7569","authenticated-orcid":false,"given":"Zhouchen","family":"Lin","sequence":"additional","affiliation":[{"name":"National Key Lab of General AI, Peking University, School of Intelligence Science and Technology, Peking University, Beijing, China and Pazhou Laboratory (Huangpu), Guangzhou China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7252-5047","authenticated-orcid":false,"given":"Qi","family":"Tian","sequence":"additional","affiliation":[{"name":"Huawei, Shenzhen China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6720-234X","authenticated-orcid":false,"given":"Chang Wen","family":"Chen","sequence":"additional","affiliation":[{"name":"The Hong Kong 
Polytechnic University, Department of Computing, Hong Kong, Hong Kong"}]}],"member":"320","published-online":{"date-parts":[[2024,7,25]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"Jean-Baptiste Alayrac Jeff Donahue Pauline Luc Antoine Miech Iain Barr Yana Hasson Karel Lenc Arthur Mensch Katie Millican Malcolm Reynolds et\u00a0al. 2022. Flamingo: A visual language model for few-shot learning. Proc. NeurIPS 35 (2022) 23716\u201323736."},{"key":"e_1_3_2_3_2","first-page":"1","volume-title":"Proceedings of the ICET","author":"Albawi Saad","year":"2017","unstructured":"Saad Albawi, Tareq Abed Mohammed, and Saad Al-Zawi. 2017. Understanding of a convolutional neural network. In Proceedings of the ICET. IEEE, 1\u20136."},{"key":"e_1_3_2_4_2","first-page":"1051","volume-title":"Proceedings of the COLING","author":"Amplayo Reinald Kim","year":"2022","unstructured":"Reinald Kim Amplayo, Kang Min Yoo, and Sang-Woo Lee. 2022. Attribute injection for pretrained language models: A new benchmark and an efficient method. In Proceedings of the COLING. 1051\u20131064."},{"key":"e_1_3_2_5_2","doi-asserted-by":"crossref","unstructured":"Simone Angarano Francesco Salvetti Mauro Martini and Marcello Chiaberge. 2023. Generative adversarial superresolution at the edge with knowledge distillation. Engineering Applications of Artificial Intelligence 123 (2023) 106407.","DOI":"10.1016\/j.engappai.2023.106407"},{"key":"e_1_3_2_6_2","volume-title":"Proceedings of the ICLR","author":"Ashok Anubhav","year":"2018","unstructured":"Anubhav Ashok, Nicholas Rhinehart, Fares Beainy, and Kris M. Kitani. 2018. N2n learning: Network to network compression via policy gradient reinforcement learning. In Proceedings of the ICLR."},{"key":"e_1_3_2_7_2","volume-title":"Proceedings of the NeurIPS","author":"Ba Jimmy","year":"2014","unstructured":"Jimmy Ba and Rich Caruana. 2014. Do deep nets really need to be deep?. 
In Proceedings of the NeurIPS."},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.50"},{"key":"e_1_3_2_9_2","unstructured":"Hugo Berg Siobhan Mackenzie Hall Yash Bhalgat Wonsuk Yang Hannah Rose Kirk Aleksandar Shtedritski and Max Bain. 2022. A prompt array keeps the bias away: Debiasing vision-language models with adversarial learning. (2022). Preprint at https:\/\/arxiv.org\/abs\/2203.11933"},{"key":"e_1_3_2_10_2","first-page":"e1484","article-title":"Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges","author":"Bischl Bernd","year":"2021","unstructured":"Bernd Bischl, Martin Binder, Michel Lang, Tobias Pielok, Jakob Richter, Stefan Coors, Janek Thomas, Theresa Ullmann, Marc Becker, Anne-Laure Boulesteix, et\u00a0al. 2021. Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery (2021), e1484.","journal-title":"Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery"},{"key":"e_1_3_2_11_2","unstructured":"Rishi Bommasani Drew A Hudson Ehsan Adeli Russ Altman Simran Arora Sydney von Arx Michael S Bernstein Jeannette Bohg Antoine Bosselut Emma Brunskill et\u00a0al. 2021. On the opportunities and risks of foundation models. (2021). Preprint at https:\/\/arxiv.org\/abs\/2108.07258"},{"key":"e_1_3_2_12_2","doi-asserted-by":"crossref","unstructured":"Benjamin Bowman Alessandro Achille Luca Zancato Matthew Trager Pramuditha Perera Giovanni Paolini and Stefano Soatto. 2023. A-la-carte prompt tuning (apt): Combining distinct data via composable prompting. In Proc. CVPR. 
14984\u201314993.","DOI":"10.1109\/CVPR52729.2023.01439"},{"key":"e_1_3_2_13_2","first-page":"1877","volume-title":"Proceedings of the NeurIPS","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et\u00a0al. 2020. Language models are few-shot learners. In Proceedings of the NeurIPS. 1877\u20131901."},{"issue":"01","key":"e_1_3_2_14_2","first-page":"1","article-title":"EGCN++: A new fusion strategy for ensemble learning in skeleton-based rehabilitation exercise assessment","author":"Bruce XB","year":"2024","unstructured":"XB Bruce, Yan Liu, Keith CC Chan, and Chang Wen Chen. 2024. EGCN++: A new fusion strategy for ensemble learning in skeleton-based rehabilitation exercise assessment. IEEE Trans. Patt. Anal. Mach. Intell.01 (2024), 1\u201316.","journal-title":"IEEE Trans. Patt. Anal. Mach. Intell."},{"key":"e_1_3_2_15_2","article-title":"Mmnet: A model-based multimodal network for human action recognition in rgb-d videos","author":"Bruce XB","year":"2022","unstructured":"XB Bruce, Yan Liu, Xiang Zhang, Sheng-hua Zhong, and Keith CC Chan. 2022. Mmnet: A model-based multimodal network for human action recognition in rgb-d videos. IEEE Trans. Patt. Anal. Mach. Intell. (2022).","journal-title":"IEEE Trans. Patt. Anal. Mach. Intell."},{"key":"e_1_3_2_16_2","unstructured":"Zhiqi Bu Yu-Xiang Wang Sheng Zha and George Karypis. 2022. Differentially private bias-term only fine-tuning of foundation models. In Workshop on Trustworthy and Socially Responsible Machine Learning (NeurIPS\u201922)."},{"key":"e_1_3_2_17_2","first-page":"535","volume-title":"Proceedings of the ACM SIGKDD","author":"Bucilu\u01ce Cristian","year":"2006","unstructured":"Cristian Bucilu\u01ce, Rich Caruana, and Alexandru Niculescu-Mizil. 2006. Model compression. In Proceedings of the ACM SIGKDD. 
535\u2013541."},{"key":"e_1_3_2_18_2","doi-asserted-by":"crossref","unstructured":"Adrian Bulat and Georgios Tzimiropoulos. 2023. Lasp: Text-to-text optimization for language-aware soft prompting of vision & language models. In Proc. CVPR. 23232\u201323241.","DOI":"10.1109\/CVPR52729.2023.02225"},{"key":"e_1_3_2_19_2","volume-title":"Proceedings of the AAAI","author":"Cai Han","year":"2018","unstructured":"Han Cai, Tianyao Chen, Weinan Zhang, Yong Yu, and Jun Wang. 2018. Efficient architecture search by network transformation. In Proceedings of the AAAI."},{"key":"e_1_3_2_20_2","first-page":"678","volume-title":"Proceedings of the ICML","author":"Cai Han","year":"2018","unstructured":"Han Cai, Jiacheng Yang, Weinan Zhang, Song Han, and Yong Yu. 2018. Path-level network transformation for efficient architecture search. In Proceedings of the ICML. PMLR, 678\u2013687."},{"key":"e_1_3_2_21_2","first-page":"778","volume-title":"Proceedings of the ECCV","author":"Calonder Michael","year":"2010","unstructured":"Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua. 2010. Brief: Binary robust independent elementary features. In Proceedings of the ECCV. Springer, 778\u2013792."},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.502"},{"key":"e_1_3_2_23_2","first-page":"874","volume-title":"Proceedings of the NeurIPS","author":"Chang Jianlong","year":"2019","unstructured":"Jianlong Chang, Xinbang Zhang, Yiwen Guo, Gaofeng Meng, Shiming Xiang, and Chunhong Pan. 2019. DATA: Differentiable architecture approximation. In Proceedings of the NeurIPS. 874\u2013884."},{"key":"e_1_3_2_24_2","first-page":"7028","volume-title":"Proceedings of the AAAI","author":"Chen Defang","year":"2021","unstructured":"Defang Chen, Jian-Ping Mei, Yuan Zhang, Can Wang, Zhe Wang, Yan Feng, and Chun Chen. 2021. Cross-layer distillation with semantic calibration. In Proceedings of the AAAI. 
7028\u20137036."},{"key":"e_1_3_2_25_2","doi-asserted-by":"crossref","first-page":"647","DOI":"10.1109\/ICCSEE.2012.193","volume-title":"Proceedings of the 2012 International Conference on Computer Science and Electronics Engineering","author":"Chen Deyan","year":"2012","unstructured":"Deyan Chen and Hong Zhao. 2012. Data security and privacy protection issues in cloud computing. In Proceedings of the 2012 International Conference on Computer Science and Electronics Engineering. IEEE, 647\u2013651."},{"key":"e_1_3_2_26_2","unstructured":"Guangyi Chen Weiran Yao Xiangchen Song Xinyue Li Yongming Rao and Kun Zhang. 2022. Prompt learning with optimal transport for vision-language models. (2022). Preprint at https:\/\/arxiv.org\/abs\/2210.01253"},{"key":"e_1_3_2_27_2","unstructured":"Hao Chen Ran Tao Han Zhang Yidong Wang Wei Ye Jindong Wang Guosheng Hu and Marios Savvides. 2022. Conv-adapter: Exploring parameter efficient transfer learning for ConvNets. (2022). Preprint at https:\/\/arxiv.org\/abs\/2208.07463"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2020.2970494"},{"key":"e_1_3_2_29_2","unstructured":"Haoran Chen Zuxuan Wu and Yu-Gang Jiang. 2022. Multi-prompt alignment for multi-source unsupervised domain adaptation. In Proc. NeurIPS 36 (2024)."},{"key":"e_1_3_2_30_2","first-page":"1691","volume-title":"Proceedings of the ICML","author":"Chen Mark","year":"2020","unstructured":"Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, and Ilya Sutskever. 2020. Generative pretraining from pixels. In Proceedings of the ICML. PMLR, 1691\u20131703."},{"key":"e_1_3_2_31_2","first-page":"5008","volume-title":"Proceedings of the CVPR","author":"Chen Pengguang","year":"2021","unstructured":"Pengguang Chen, Shu Liu, Hengshuang Zhao, and Jiaya Jia. 2021. Distilling knowledge via knowledge review. In Proceedings of the CVPR. 
5008\u20135017."},{"key":"e_1_3_2_32_2","unstructured":"Shoufa Chen Chongjian Ge Zhan Tong Jiangliu Wang Yibing Song Jue Wang and Ping Luo. 2022. AdaptFormer: Adapting vision transformers for scalable visual recognition. Proc. NeurIPS 35 (2022) 16664\u201316678."},{"key":"e_1_3_2_33_2","volume-title":"Proceedings of the ICLR","author":"Chen Tianqi","year":"2016","unstructured":"Tianqi Chen, Ian Goodfellow, and Jonathon Shlens. 2016. Net2net: Accelerating learning via knowledge transfer. In Proceedings of the ICLR."},{"key":"e_1_3_2_34_2","first-page":"12020","volume-title":"Proceedings of the CVPR","author":"Chen Tianlong","year":"2022","unstructured":"Tianlong Chen, Zhenyu Zhang, Yu Cheng, Ahmed Awadallah, and Zhangyang Wang. 2022. The principle of diversity: Training stronger vision transformers calls for reducing all levels of redundancy. In Proceedings of the CVPR. 12020\u201312030."},{"key":"e_1_3_2_35_2","first-page":"1294","volume-title":"Proceedings of the CVPR","author":"Chen Xin","year":"2019","unstructured":"Xin Chen, Lingxi Xie, Jun Wu, and Qi Tian. 2019. Progressive differentiable architecture search: Bridging the depth gap between search and evaluation. In Proceedings of the CVPR. 1294\u20131303."},{"key":"e_1_3_2_36_2","article-title":"Good visual guidance makes a better extractor: Hierarchical visual prefix for multimodal entity and relation extraction","author":"Chen Xiang","year":"2022","unstructured":"Xiang Chen, Ningyu Zhang, Lei Li, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, Luo Si, and Huajun Chen. 2022. Good visual guidance makes a better extractor: Hierarchical visual prefix for multimodal entity and relation extraction. North American Chapter of the Association for Computational Linguistics (2022).","journal-title":"North American Chapter of the Association for Computational Linguistics"},{"key":"e_1_3_2_37_2","unstructured":"Zhe Chen Yuchen Duan Wenhai Wang Junjun He Tong Lu Jifeng Dai and Yu Qiao. 2022. 
Vision transformer adapter for dense predictions. (2022). Preprint at https:\/\/arxiv.org\/abs\/2205.08534"},{"key":"e_1_3_2_38_2","article-title":"NVIDIA hopper H100 GPU: Scaling performance","author":"Choquette Jack","year":"2023","unstructured":"Jack Choquette. 2023. NVIDIA hopper H100 GPU: Scaling performance. IEEE Micro (2023).","journal-title":"IEEE Micro"},{"key":"e_1_3_2_39_2","first-page":"9355","volume-title":"Proceedings of the NeurIPS","author":"Chu Xiangxiang","year":"2021","unstructured":"Xiangxiang Chu, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, and Chunhua Shen. 2021. Twins: Revisiting the design of spatial attention in vision transformers. In Proceedings of the NeurIPS. 9355\u20139366."},{"key":"e_1_3_2_40_2","unstructured":"Xiangxiang Chu Zhi Tian Bo Zhang Xinlong Wang and Chunhua Shen. 2023. Conditional positional encodings for vision transformers. In The Eleventh Proc. ICLR.https:\/\/openreview.net\/forum?id=3KWnuT-R1bh"},{"key":"e_1_3_2_41_2","volume-title":"Proceedings of the ICLR","author":"Chu Xiangxiang","year":"2021","unstructured":"Xiangxiang Chu, Xiaoxing Wang, Bo Zhang, Shun Lu, Xiaolin Wei, and Junchi Yan. 2021. Darts-: Robustly stepping out of performance collapse without indicators. In Proceedings of the ICLR."},{"key":"e_1_3_2_42_2","first-page":"12239","volume-title":"Proceedings of the CVPR","author":"Chu Xiangxiang","year":"2021","unstructured":"Xiangxiang Chu, Bo Zhang, and Ruijun Xu. 2021. Fairnas: Rethinking evaluation fairness of weight sharing neural architecture search. In Proceedings of the CVPR. 
12239\u201312248."},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1137\/S0895479896305696"},{"key":"e_1_3_2_44_2","unstructured":"Mostafa Dehghani Josip Djolonga Basil Mustafa Piotr Padlewski Jonathan Heek Justin Gilmer Andreas Peter Steiner Mathilde Caron Robert Geirhos Ibrahim Alabdulmohsin Rodolphe Jenatton Lucas Beyer Michael Tschannen Anurag Arnab Xiao Wang Carlos Riquelme Ruiz Matthias Minderer Joan Puigcerver Utku Evci Manoj Kumar Sjoerd Van Steenkiste Gamaleldin Fathy Elsayed Aravindh Mahendran Fisher Yu Avital Oliver Fantine Huot Jasmijn Bastings Mark Collier Alexey A. Gritsenko Vighnesh Birodkar Cristina Nader Vasconcelos Yi Tay Thomas Mensink Alexander Kolesnikov Filip Pavetic Dustin Tran Thomas Kipf Mario Lucic Xiaohua Zhai Daniel Keysers Jeremiah J. Harmsen and Neil Houlsby. 2023. Scaling vision transformers to 22 billion parameters. In Proceedings of the 40th International Conference on Machine Learning (Proceedings of Machine Learning Research Vol. 202) Andreas Krause Emma Brunskill Kyunghyun Cho Barbara Engelhardt Sivan Sabato and Jonathan Scarlett (Eds.). PMLR 7480\u20137512. https:\/\/proceedings.mlr.press\/v202\/dehghani23a.html"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1007\/1-84628-168-7"},{"key":"e_1_3_2_46_2","first-page":"248","volume-title":"Proceedings of the 2009 CVPR","author":"Deng Jia","year":"2009","unstructured":"Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 CVPR. IEEE, 248\u2013255."},{"key":"e_1_3_2_47_2","first-page":"1","article-title":"Parameter-efficient fine-tuning of large-scale pre-trained language models","author":"Ding Ning","year":"2023","unstructured":"Ning Ding, Yujia Qin, Guang Yang, Fuchao Wei, Zonghan Yang, Yusheng Su, Shengding Hu, Yulin Chen, Chi-Min Chan, Weize Chen, et\u00a0al. 2023. Parameter-efficient fine-tuning of large-scale pre-trained language models. 
Nature Machine Intelligence (2023), 1\u201316.","journal-title":"Nature Machine Intelligence"},{"key":"e_1_3_2_48_2","unstructured":"Bowen Dong Pan Zhou Shuicheng Yan and Wangmeng Zuo. 2022. LPT: Long-tailed prompt tuning for image classification. In The Eleventh Proc. ICLR."},{"key":"e_1_3_2_49_2","unstructured":"Runpei Dong Zekun Qi Linfeng Zhang Junbo Zhang Jianjian Sun Zheng Ge Li Yi and Kaisheng Ma. 2022. Autoencoders as cross-modal teachers: Can pretrained 2D image transformers help 3D representation learning? In The Eleventh Proc. ICLR."},{"key":"e_1_3_2_50_2","volume-title":"Proceedings of the ICLR","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et\u00a0al. 2021. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the ICLR."},{"key":"e_1_3_2_51_2","unstructured":"Danny Driess Fei Xia Mehdi S. M. Sajjadi Corey Lynch Aakanksha Chowdhery Brian Ichter AyzaanWahid Jonathan Tompson Quan Vuong Tianhe Yu et\u00a0al. 2023. PaLM-e: An embodied multimodal language model. In International Conference on Machine Learning. PMLR 8469\u20138488."},{"key":"e_1_3_2_52_2","unstructured":"Zane Durante Qiuyuan Huang Naoki Wake Ran Gong Jae Sung Park Bidipta Sarkar Rohan Taori Yusuke Noda Demetri Terzopoulos Yejin Choi et\u00a0al. 2024. Agent ai: Surveying the horizons of multimodal interaction. arXiv preprint arXiv:2401.03568 (2024)."},{"key":"e_1_3_2_53_2","unstructured":"Ali Edalati Marzieh Tahaei Ivan Kobyzev Vahid Partovi Nia James J. Clark and Mehdi Rezagholizadeh. 2022. KronA: Parameter efficient tuning with kronecker adapter. 
In The Third Workshop on Efficient Natural Language and Speech Processing (ENLSP-III)."},{"key":"e_1_3_2_54_2","doi-asserted-by":"crossref","unstructured":"Constantin Eichenberg Sidney Black Samuel Weinbach Letitia Parcalabescu and Anette Frank. 2022. MAGMA\u2013 multimodal augmentation of generative models through adapter-based finetuning. In Findings of the Association for Computational Linguistics: (EMNLP\u201922). 2416\u20132428.","DOI":"10.18653\/v1\/2022.findings-emnlp.179"},{"key":"e_1_3_2_55_2","volume-title":"Proceedings of the ICLR, Workshop Track","author":"Elsken Thomas","year":"2018","unstructured":"Thomas Elsken, Jan-Hendrik Metzen, and Frank Hutter. 2018. Simple and efficient architecture search for convolutional neural networks. In Proceedings of the ICLR, Workshop Track."},{"key":"e_1_3_2_56_2","first-page":"3774","volume-title":"Proceedings of the CVPR Workshops","author":"Ermis Beyza","year":"2022","unstructured":"Beyza Ermis, Giovanni Zappella, Martin Wistuba, Aditya Rawal, and C\u00e9dric Archambeau. 2022. Continual learning with transformers for image classification. In Proceedings of the CVPR Workshops. 3774\u20133781."},{"issue":"1","key":"e_1_3_2_57_2","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1107\/S0907444905036693","article-title":"Scaling and assessment of data quality","volume":"62","author":"Evans Philip","year":"2006","unstructured":"Philip Evans. 2006. Scaling and assessment of data quality. Acta Crystallographica Section D: Biological Crystallography 62, 1 (2006), 72\u201382.","journal-title":"Acta Crystallographica Section D: Biological Crystallography"},{"key":"e_1_3_2_58_2","volume-title":"Proceedings of the ICLR","author":"Fang Jiemin","year":"2020","unstructured":"Jiemin Fang, Yuzhu Sun, Kangjian Peng, Qian Zhang, Yuan Li, Wenyu Liu, and Xinggang Wang. 2020. Fast neural network adaptation via parameter remapping and architecture search. 
In Proceedings of the ICLR."},{"issue":"9","key":"e_1_3_2_59_2","doi-asserted-by":"crossref","first-page":"2990","DOI":"10.1109\/TPAMI.2020.3044416","article-title":"FNA++: Fast network adaptation via parameter remapping and architecture search","volume":"43","author":"Fang Jiemin","year":"2020","unstructured":"Jiemin Fang, Yuzhu Sun, Qian Zhang, Kangjian Peng, Yuan Li, Wenyu Liu, and Xinggang Wang. 2020. FNA++: Fast network adaptation via parameter remapping and architecture search. IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 9 (2020), 2990\u20133004.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_60_2","first-page":"203","volume-title":"Proc. CVPR","author":"Feichtenhofer Christoph","year":"2020","unstructured":"Christoph Feichtenhofer. 2020. X3d: Expanding architectures for efficient video recognition. In Proc. CVPR. 203\u2013213."},{"key":"e_1_3_2_61_2","unstructured":"Chin-Lun Fu Zih-Ching Chen Yun-Ru Lee and Hung-yi Lee. 2022. AdapterBias: Parameter-efficient token-dependent representation shift for adapters in NLP tasks. In Findings of the Association for Computational Linguistics: (NAACL\u201922). 2608\u20132621."},{"key":"e_1_3_2_62_2","doi-asserted-by":"crossref","unstructured":"Yulu Gan Yan Bai Yihang Lou Xianzheng Ma Renrui Zhang Nian Shi and Lin Luo. 2023. Decorate the newcomers: Visual domain prompt for continual test time adaptation. In Proc. AAAI Vol. 37 7595\u20137603.","DOI":"10.1609\/aaai.v37i6.25922"},{"key":"e_1_3_2_63_2","unstructured":"Kaifeng Gao Long Chen Hanwang Zhang Jun Xiao and Qianru Sun. 2022. Compositional prompt tuning with motion cues for Open-vocabulary video relation detection. In The Eleventh Proc. ICLR."},{"key":"e_1_3_2_64_2","doi-asserted-by":"crossref","unstructured":"Peng Gao Shijie Geng Renrui Zhang Teli Ma Rongyao Fang Yongfeng Zhang Hongsheng Li and Yu Qiao. 2024. Clip-adapter: Better vision-language models with feature adapters. 
International Journal of Computer Vision 132 2 (2024) 581\u2013595.","DOI":"10.1007\/s11263-023-01891-x"},{"key":"e_1_3_2_65_2","unstructured":"Yunhe Gao Xingjian Shi Yi Zhu Hao Wang Zhiqiang Tang Xiong Zhou Mu Li and Dimitris N. Metaxas. 2022. Visual Prompt Tuning for Test-time Domain Adaptation. (2022). Preprint at https:\/\/arxiv.org\/abs\/2210.04831"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.4324\/9781315740218"},{"key":"e_1_3_2_67_2","unstructured":"Yunchao Gong Liu Liu Ming Yang and Lubomir Bourdev. 2014. Compressing deep convolutional networks using vector quantization. (2014). Preprint at https:\/\/arxiv.org\/abs\/1412.6115"},{"key":"e_1_3_2_68_2","article-title":"PHNNs: Lightweight neural networks via parameterized hypercomplex convolutions","author":"Grassucci Eleonora","year":"2022","unstructured":"Eleonora Grassucci, Aston Zhang, and Danilo Comminiello. 2022. PHNNs: Lightweight neural networks via parameterized hypercomplex convolutions. IEEE Trans. Neur. Netw. Learn. Syst. (2022).","journal-title":"IEEE Trans. Neur. Netw. Learn. Syst."},{"key":"e_1_3_2_69_2","unstructured":"Xiuye Gu Tsung-Yi Lin Weicheng Kuo and Yin Cui. 2021. Open-vocabulary object detection via vision and language knowledge distillation. In Proc. ICLR."},{"key":"e_1_3_2_70_2","first-page":"469","volume-title":"Proceedings of the ECCV 16","author":"Guan Yushuo","year":"2020","unstructured":"Yushuo Guan, Pengyu Zhao, Bingxuan Wang, Yuanxing Zhang, Cong Yao, Kaigui Bian, and Jian Tang. 2020. Differentiable feature aggregation search for knowledge distillation. In Proceedings of the ECCV 16. Springer, 469\u2013484."},{"key":"e_1_3_2_71_2","first-page":"2154","volume-title":"Proceedings of the CVPR","author":"Guo Jianyuan","year":"2021","unstructured":"Jianyuan Guo, Kai Han, Yunhe Wang, Han Wu, Xinghao Chen, Chunjing Xu, and Chang Xu. 2021. Distilling object detectors via decoupled features. In Proceedings of the CVPR. 
2154\u20132164."},{"key":"e_1_3_2_72_2","first-page":"12175","volume-title":"Proceedings of the CVPR","author":"Guo Jianyuan","year":"2022","unstructured":"Jianyuan Guo, Kai Han, Han Wu, Yehui Tang, Xinghao Chen, Yunhe Wang, and Chang Xu. 2022. Cmt: Convolutional neural networks meet vision transformers. In Proceedings of the CVPR. 12175\u201312185."},{"key":"e_1_3_2_73_2","article-title":"A survey on vision transformer","author":"Han Kai","year":"2022","unstructured":"Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, et\u00a0al. 2022. A survey on vision transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence (2022).","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_74_2","first-page":"15908","volume-title":"Proceedings of the NeurIPS","author":"Han Kai","year":"2021","unstructured":"Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, and Yunhe Wang. 2021. Transformer in transformer. In Proceedings of the NeurIPS. 15908\u201315919."},{"key":"e_1_3_2_75_2","article-title":"Dynamic neural networks: A survey","author":"Han Yizeng","year":"2021","unstructured":"Yizeng Han, Gao Huang, Shiji Song, Le Yang, Honghui Wang, and Yulin Wang. 2021. Dynamic neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021).","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_76_2","volume-title":"Proceedings of the ICLR","author":"Hao Tianxiang","year":"2023","unstructured":"Tianxiang Hao, Hui Chen, Yuchen Guo, and Guiguang Ding. 2023. Consolidator: Mergable adapter with group connections for visual adaptation. In Proceedings of the ICLR. 
https:\/\/openreview.net\/forum?id=J_Cja7cpgW"},{"key":"e_1_3_2_77_2","volume-title":"Proceedings of the NeurIPS","author":"Hao Zhiwei","year":"2022","unstructured":"Zhiwei Hao, Jianyuan Guo, Ding Jia, Kai Han, Yehui Tang, Chao Zhang, Han Hu, and Yunhe Wang. 2022. Learning efficient vision transformers via fine-grained manifold distillation. In Proceedings of the NeurIPS."},{"key":"e_1_3_2_78_2","first-page":"11993","volume-title":"Proceedings of the CVPR","author":"He Chaoyang","year":"2020","unstructured":"Chaoyang He, Haishan Ye, Li Shen, and Tong Zhang. 2020. Milenas: Efficient neural architecture search via mixed-level reformulation. In Proceedings of the CVPR. 11993\u201312002."},{"key":"e_1_3_2_79_2","volume-title":"Proceedings of the ICLR","author":"He Junxian","year":"2022","unstructured":"Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, and Graham Neubig. 2022. Towards a unified view of parameter-efficient transfer learning. In Proceedings of the ICLR. Retrieved from https:\/\/openreview.net\/forum?id=0RDcd5Axok"},{"key":"e_1_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01553"},{"key":"e_1_3_2_81_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_82_2","doi-asserted-by":"publisher","unstructured":"Xuehai He Chuanyuan Li Pengchuan Zhang Jianwei Yang and Xin Eric Wang. 2023. Parameter-efficient model adaptation for vision transformers. In (AAAI\u201923\/IAAI\u201923\/EAAI\u201923). AAAI Press Article 91 9 pages. 10.1609\/aaai.v37i1.25160","DOI":"10.1609\/aaai.v37i1.25160"},{"key":"e_1_3_2_83_2","doi-asserted-by":"crossref","unstructured":"Xuehai He Diji Yang Weixi Feng Tsu-Jui Fu Arjun Akula Varun Jampani Pradyumna Narayana Sugato Basu William Yang Wang and Xin Eric Wang. 2022. CPL: Counterfactual prompt learning for vision and language models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 
3407\u20133418.","DOI":"10.18653\/v1\/2022.emnlp-main.224"},{"key":"e_1_3_2_84_2","unstructured":"Roei Herzig Ofir Abramovich Elad Ben-Avraham Assaf Arbelle Leonid Karlinsky Ariel Shamir Trevor Darrell and Amir Globerson. 2022. PromptonomyViT: Multi-task prompt learning improves video transformers using synthetic scene data. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. 6803\u20136815."},{"key":"e_1_3_2_85_2","unstructured":"Geoffrey Hinton Oriol Vinyals and Jeff Dean. 2015. Distilling the knowledge in a neural network. (2015). Preprint at https:\/\/arxiv.org\/abs\/1503.02531"},{"key":"e_1_3_2_86_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00517"},{"key":"e_1_3_2_87_2","first-page":"2790","volume-title":"Proceedings of the ICML","author":"Houlsby Neil","year":"2019","unstructured":"Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-efficient transfer learning for NLP. In Proceedings of the ICML. PMLR, 2790\u20132799."},{"key":"e_1_3_2_88_2","volume-title":"Proceedings of the ICLR","author":"Hu Edward J.","year":"2022","unstructured":"Edward J. Hu, yelong shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-rank adaptation of large language models. In Proceedings of the ICLR. Retrieved from https:\/\/openreview.net\/forum?id=nZeVKeeFYf9"},{"key":"e_1_3_2_89_2","doi-asserted-by":"crossref","unstructured":"Junjie Hu Chenyou Fan Hualie Jiang Xiyue Guo Yuan Gao Xiangyong Lu and Tin Lun Lam. 2023. Boosting LightWeight depth estimation via knowledge distillation. In Knowledge Science Engineering and Management Zhi Jin Yuncheng Jiang Robert Andrei Buchmann Yaxin Bi Ana-Maria Ghiran and Wenjun Ma (Eds.). Springer Nature Switzerland Cham 27\u201339.","DOI":"10.1007\/978-3-031-40283-8_3"},{"key":"e_1_3_2_90_2","unstructured":"Shishuai Hu Zehui Liao and Yong Xia. 
2022. ProSFDA: Prompt learning based source-free domain adaptation for medical image segmentation. (2022). Preprint at https:\/\/arxiv.org\/abs\/2211.11514"},{"key":"e_1_3_2_91_2","unstructured":"Zilong Huang Youcheng Ben Guozhong Luo Pei Cheng Gang Yu and Bin Fu. 2021. Shuffle transformer: Rethinking spatial shuffle for vision transformer. (2021). Preprint at https:\/\/arxiv.org\/abs\/2106.03650"},{"key":"e_1_3_2_92_2","doi-asserted-by":"publisher","DOI":"10.1113\/jphysiol.1959.sp006308"},{"key":"e_1_3_2_93_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.248"},{"key":"e_1_3_2_94_2","first-page":"4904","volume-title":"Proceedings of the ICML","author":"Jia Chao","year":"2021","unstructured":"Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. 2021. Scaling up visual and vision-language representation learning with noisy text supervision. In Proceedings of the ICML. PMLR, 4904\u20134916."},{"key":"e_1_3_2_95_2","first-page":"709","volume-title":"Proceedings of the ECCV","author":"Jia Menglin","year":"2022","unstructured":"Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, and Ser-Nam Lim. 2022. Visual prompt tuning. In Proceedings of the ECCV. Springer, 709\u2013727."},{"key":"e_1_3_2_96_2","unstructured":"Haojun Jiang Jianke Zhang Rui Huang Chunjiang Ge Zanlin Ni Jiwen Lu Jie Zhou Shiji Song and Gao Huang. 2022. Cross-Modal Adapter for Text-Video Retrieval. (2022). Preprint at https:\/\/arxiv.org\/abs\/2211.09623"},{"key":"e_1_3_2_97_2","first-page":"239","volume-title":"Proceedings of the ECCV","author":"Jiang Ziyu","year":"2022","unstructured":"Ziyu Jiang, Tianlong Chen, Xuxi Chen, Yu Cheng, Luowei Zhou, Lu Yuan, Ahmed Awadallah, and Zhangyang Wang. 2022. DnA: Improving few-shot transfer learning with low-rank decomposition and alignment. In Proceedings of the ECCV. 
Springer, 239\u2013256."},{"key":"e_1_3_2_98_2","volume-title":"Proceedings of the NeurIPS","author":"Jiang Ziyu","year":"2022","unstructured":"Ziyu Jiang, Xuxi Chen, Xueqin Huang, Xianzhi Du, Denny Zhou, and Zhangyang Wang. 2022. Back razor: Memory-efficient transfer learning by self-sparsified backpropagation. In Proceedings of the NeurIPS."},{"key":"e_1_3_2_99_2","unstructured":"Shibo Jie and Zhi-Hong Deng. 2022. Convolutional bypasses are better vision transformer adapters. (2022). Preprint at https:\/\/arxiv.org\/abs\/2207.07039"},{"key":"e_1_3_2_100_2","doi-asserted-by":"crossref","unstructured":"Shibo Jie and Zhi-Hong Deng. 2023. Fact: Factor-tuning for lightweight adaptation on vision transformer. In Proc. AAAI Vol. 37. 1060\u20131068.","DOI":"10.1609\/aaai.v37i1.25187"},{"key":"e_1_3_2_101_2","first-page":"105","volume-title":"Proceedings of the ECCV","author":"Ju Chen","year":"2022","unstructured":"Chen Ju, Tengda Han, Kunhao Zheng, Ya Zhang, and Weidi Xie. 2022. Prompting visual-language models for efficient video understanding. In Proceedings of the ECCV. Springer, 105\u2013124."},{"key":"e_1_3_2_102_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-020-0186-1"},{"key":"e_1_3_2_103_2","first-page":"1022","volume-title":"Proceedings of the NeurIPS","author":"Mahabadi Rabeeh Karimi","year":"2021","unstructured":"Rabeeh Karimi Mahabadi, James Henderson, and Sebastian Ruder. 2021. Compacter: Efficient low-rank hypercomplex adapter layers. In Proceedings of the NeurIPS. 1022\u20131035."},{"key":"e_1_3_2_104_2","unstructured":"Will Kay Joao Carreira Karen Simonyan Brian Zhang Chloe Hillier Sudheendra Vijayanarasimhan Fabio Viola Tim Green Trevor Back Paul Natsev et\u00a0al. 2017. The kinetics human action video dataset. (2017). Preprint at https:\/\/arxiv.org\/abs\/1705.06950"},{"key":"e_1_3_2_105_2","first-page":"II\u2013II","volume-title":"Proceedings of the CVPR","author":"Ke Yan","year":"2004","unstructured":"Yan Ke and Rahul Sukthankar. 2004. 
PCA-SIFT: A more distinctive representation for local image descriptors. In Proceedings of the CVPR. IEEE, II\u2013II."},{"key":"e_1_3_2_106_2","doi-asserted-by":"publisher","DOI":"10.1145\/3505244"},{"key":"e_1_3_2_107_2","doi-asserted-by":"crossref","unstructured":"Muhammad Uzair Khattak Hanoona Rasheed Muhammad Maaz Salman Khan and Fahad Shahbaz Khan. 2023. Maple: Multi-modal prompt learning. In Proc. CVPR. 19113\u201319122.","DOI":"10.1109\/CVPR52729.2023.01832"},{"key":"e_1_3_2_108_2","volume-title":"Proceedings of the NeurIPS","author":"Kim Jangho","year":"2018","unstructured":"Jangho Kim, SeongUk Park, and Nojun Kwak. 2018. Paraphrasing complex network: Network compression via factor transfer. In Proceedings of the NeurIPS."},{"key":"e_1_3_2_109_2","unstructured":"Konwoo Kim Michael Laskin Igor Mordatch and Deepak Pathak. 2021. How to adapt your large-scale vision-and-language model. (2021). Preprint at https:\/\/openreview.net\/forum?id=EhwEUb2ynIa. Access date: 14 Feb 2023."},{"key":"e_1_3_2_110_2","unstructured":"Kwanyoung Kim Yujin Oh and Jong Chul Ye. 2023. ZegOT: Zero-shot segmentation through optimal transport of text prompts. (2023). Preprint at https:\/\/arxiv.org\/abs\/2301.12171"},{"key":"e_1_3_2_111_2","unstructured":"Minsu Kim Hyung-Il Kim and Yong Man Ro. 2023. Prompt tuning of deep neural networks for speaker-adaptive visual speech recognition. (2023). Preprint at https:\/\/arxiv.org\/abs\/2302.08102"},{"key":"e_1_3_2_112_2","unstructured":"Parth Kothari Danya Li Yuejiang Liu and Alexandre Alahi. 2023. Motion style transfer: Modular low-rank adaptation for deep motion forecasting. In Proceedings of The 6th Conference on Robot Learning (Proceedings of Machine Learning Research Vol. 205) Karen Liu Dana Kulic and Jeff Ichnowski (Eds.). PMLR 774\u2013784. 
https:\/\/proceedings.mlr.press\/v205\/kothari23a.html"},{"key":"e_1_3_2_113_2","doi-asserted-by":"publisher","DOI":"10.1145\/3065386"},{"key":"e_1_3_2_114_2","volume-title":"Proceedings of the ICLR","author":"Kumar Ananya","year":"2021","unstructured":"Ananya Kumar, Aditi Raghunathan, Robbie Matthew Jones, Tengyu Ma, and Percy Liang. 2021. Fine-tuning can distort pretrained features and underperform out-of-distribution. In Proceedings of the ICLR."},{"key":"e_1_3_2_115_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.aab3050"},{"key":"e_1_3_2_116_2","first-page":"4013","volume-title":"Proceedings of the CVPR","author":"Lavin Andrew","year":"2016","unstructured":"Andrew Lavin and Scott Gray. 2016. Fast algorithms for convolutional neural networks. In Proceedings of the CVPR. 4013\u20134021."},{"key":"e_1_3_2_117_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14539"},{"key":"e_1_3_2_118_2","volume-title":"Proceedings of the ICLR","author":"Li Chunyuan","year":"2022","unstructured":"Chunyuan Li, Jianwei Yang, Pengchuan Zhang, Mei Gao, Bin Xiao, Xiyang Dai, Lu Yuan, and Jianfeng Gao. 2022. Efficient self-supervised vision transformers for representation learning. In Proceedings of the ICLR."},{"key":"e_1_3_2_119_2","first-page":"1620","volume-title":"Proceedings of the CVPR","author":"Li Guohao","year":"2020","unstructured":"Guohao Li, Guocheng Qian, Itzel C Delgadillo, Matthias Muller, Ali Thabet, and Bernard Ghanem. 2020. Sgas: Sequential greedy architecture search. In Proceedings of the CVPR. 1620\u20131630."},{"key":"e_1_3_2_120_2","first-page":"6356","volume-title":"Proceedings of the CVPR","author":"Li Quanquan","year":"2017","unstructured":"Quanquan Li, Shengying Jin, and Junjie Yan. 2017. Mimicking very efficient network for object detection. In Proceedings of the CVPR. 
6356\u20136364."},{"key":"e_1_3_2_121_2","first-page":"13434","volume-title":"Proceedings of the CVPR","author":"Li Tianjiao","year":"2021","unstructured":"Tianjiao Li, Qiuhong Ke, Hossein Rahmani, Rui En Ho, Henghui Ding, and Jun Liu. 2021. Else-net: Elastic semantic network for continual action recognition from skeleton data. In Proceedings of the CVPR. 13434\u201313443."},{"key":"e_1_3_2_122_2","article-title":"Low dimensional trajectory hypothesis is true: Dnns can be trained in tiny subspaces","author":"Li Tao","year":"2022","unstructured":"Tao Li, Lei Tan, Zhehao Huang, Qinghua Tao, Yipeng Liu, and Xiaolin Huang. 2022. Low dimensional trajectory hypothesis is true: Dnns can be trained in tiny subspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence (2022).","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_123_2","first-page":"280","volume-title":"Proceedings of the ECCV","author":"Li Yanghao","year":"2022","unstructured":"Yanghao Li, Hanzi Mao, Ross Girshick, and Kaiming He. 2022. Exploring plain vision transformer backbones for object detection. In Proceedings of the ECCV. Springer, 280\u2013296."},{"key":"e_1_3_2_124_2","unstructured":"Yanghao Li Saining Xie Xinlei Chen Piotr Dollar Kaiming He and Ross Girshick. 2021. Benchmarking detection transfer learning with vision transformers. (2021). Preprint at https:\/\/arxiv.org\/abs\/2111.11429"},{"key":"e_1_3_2_125_2","article-title":"A survey of convolutional neural networks: analysis, applications, and prospects","author":"Li Zewen","year":"2021","unstructured":"Zewen Li, Fan Liu, Wenjie Yang, Shouheng Peng, and Jun Zhou. 2021. A survey of convolutional neural networks: analysis, applications, and prospects. IEEE Trans. Neur. Netw. Learn. Syst. (2021).","journal-title":"IEEE Trans. Neur. Netw. Learn. 
Syst."},{"key":"e_1_3_2_126_2","volume-title":"Proceedings of the NeurIPS","author":"Lian Dongze","year":"2022","unstructured":"Dongze Lian, Daquan Zhou, Jiashi Feng, and Xinchao Wang. 2022. Scaling and shifting your features: A new baseline for efficient model tuning. In Proceedings of the NeurIPS. Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho (Eds.). Retrieved from https:\/\/openreview.net\/forum?id=XtyeppctGgc"},{"key":"e_1_3_2_127_2","unstructured":"Hanwen Liang Shifeng Zhang Jiacheng Sun Xingqiu He Weiran Huang Kechen Zhuang and Zhenguo Li. 2019. Darts+: Improved differentiable architecture search with early stopping. (2019). Preprint at https:\/\/arxiv.org\/abs\/1909.06035"},{"key":"e_1_3_2_128_2","volume-title":"Proceedings of the CVPR","author":"Lin Junfan","year":"2023","unstructured":"Junfan Lin, Jianlong Chang, Lingbo Liu, Guanbin Li, Liang Lin, Qi Tian, and Chang-wen Chen. 2023. Being comes from not-being: Open-vocabulary text-to-motion generation with wordless training. In Proceedings of the CVPR."},{"key":"e_1_3_2_129_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_130_2","unstructured":"Yan-Bo Lin Yi-Lin Sung Jie Lei Mohit Bansal and Gedas Bertasius. 2023. Vision transformers are parameter-efficient audio-visual learners. In Proc. CVPR. 2299\u20132309."},{"key":"e_1_3_2_131_2","first-page":"441","volume-title":"Findings of the Association for Computational Linguistics","author":"Lin Zhaojiang","year":"2020","unstructured":"Zhaojiang Lin, Andrea Madotto, and Pascale Fung. 2020. Exploring versatile generative language model via parameter-efficient transfer learning. In Findings of the Association for Computational Linguistics. 441\u2013459."},{"key":"e_1_3_2_132_2","volume-title":"Proceedings of the ICLR","author":"Liu Hanxiao","year":"2019","unstructured":"Hanxiao Liu, Karen Simonyan, and Yiming Yang. 2019. Darts: Differentiable architecture search. 
In Proceedings of the ICLR."},{"key":"e_1_3_2_133_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2916873"},{"key":"e_1_3_2_134_2","first-page":"2645","volume-title":"Proceedings of the ACM MM","author":"Liu Lingbo","year":"2020","unstructured":"Lingbo Liu, Jiaqi Chen, Hefeng Wu, Tianshui Chen, Guanbin Li, and Liang Lin. 2020. Efficient crowd counting via structured knowledge transfer. In Proceedings of the ACM MM. 2645\u20132654."},{"key":"e_1_3_2_135_2","first-page":"4823","volume-title":"Proceedings of the CVPR","author":"Liu Lingbo","year":"2021","unstructured":"Lingbo Liu, Jiaqi Chen, Hefeng Wu, Guanbin Li, Chenglong Li, and Liang Lin. 2021. Cross-modal collaborative representation learning and a large-scale rgbt benchmark for crowd counting. In Proceedings of the CVPR. 4823\u20134833."},{"key":"e_1_3_2_136_2","unstructured":"Lingbo Liu Bruce XB Yu Jianlong Chang Qi Tian and Chang-Wen Chen. 2022. Prompt-matched semantic segmentation. (2022). Preprint at https:\/\/arxiv.org\/abs\/2208.10159"},{"key":"e_1_3_2_137_2","doi-asserted-by":"publisher","unstructured":"Pengfei Liu Weizhe Yuan Jinlan Fu Zhengbao Jiang Hiroaki Hayashi and Graham Neubig. 2021. Pre-train prompt and predict: A systematic survey of prompting methods in natural language processing. ACM Comput. Surv. 55 9 Article 195 (Jan 2023) 35 pages. 10.1145\/3560815","DOI":"10.1145\/3560815"},{"key":"e_1_3_2_138_2","unstructured":"Shiwei Liu and Zhangyang Wang. 2023. Ten lessons we have learned in the new \u201csparseland\u201d: A short handbook for sparse neural network researchers. In ICLR 2023 Workshop on Sparsity in Neural Networks: On Practical Limitations and Tradeoffs Between Sustainability and Efficiency. ICLR. 
Spotlight."},{"key":"e_1_3_2_139_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2021.108293"},{"key":"e_1_3_2_140_2","first-page":"7096","volume-title":"Proceedings of the CVPR","author":"Liu Yufan","year":"2019","unstructured":"Yufan Liu, Jiajiong Cao, Bing Li, Chunfeng Yuan, Weiming Hu, Yangxi Li, and Yunqiang Duan. 2019. Knowledge distillation via instance relationship graph. In Proceedings of the CVPR. 7096\u20137104."},{"key":"e_1_3_2_141_2","first-page":"2604","volume-title":"Proceedings of the CVPR","author":"Liu Yifan","year":"2019","unstructured":"Yifan Liu, Ke Chen, Chris Liu, Zengchang Qin, Zhenbo Luo, and Jingdong Wang. 2019. Structured knowledge distillation for semantic segmentation. In Proceedings of the CVPR. 2604\u20132613."},{"key":"e_1_3_2_142_2","first-page":"7539","volume-title":"Proceedings of the CVPR","author":"Liu Yu","year":"2020","unstructured":"Yu Liu, Xuhui Jia, Mingxing Tan, Raviteja Vemulapalli, Yukun Zhu, Bradley Green, and Xiaogang Wang. 2020. Search to distill: Pearls are everywhere but not the eyes. In Proceedings of the CVPR. 7539\u20137548."},{"key":"e_1_3_2_143_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.trc.2021.103070"},{"key":"e_1_3_2_144_2","article-title":"Structured knowledge distillation for dense prediction","author":"Liu Yifan","year":"2020","unstructured":"Yifan Liu, Changyong Shu, Jingdong Wang, and Chunhua Shen. 2020. Structured knowledge distillation for dense prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence (2020).","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_145_2","unstructured":"Yixin Liu Kai Zhang Yuan Li Zhiling Yan Chujie Gao Ruoxi Chen Zhengqing Yuan Yue Huang Hanchi Sun Jianfeng Gao et\u00a0al. 2024. Sora: A review on background technology limitations and opportunities of large vision models. 
arXiv preprint arXiv:2402.17177 (2024)."},{"key":"e_1_3_2_146_2","unstructured":"Yen-Cheng Liu Chih-Yao Ma Junjiao Tian Zijian He and Zsolt Kira. 2022. Polyhistor: Parameter-efficient multi-task adaptation for dense vision tasks. Proc. NeurIPS 35 (2022) 36889\u201336901."},{"key":"e_1_3_2_147_2","first-page":"10012","volume-title":"Proceedings of the CVPR","author":"Liu Ze","year":"2021","unstructured":"Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the CVPR. 10012\u201310022."},{"key":"e_1_3_2_148_2","first-page":"3202","volume-title":"Proceedings of the CVPR","author":"Liu Ze","year":"2022","unstructured":"Ze Liu, Jia Ning, Yue Cao, Yixuan Wei, Zheng Zhang, Stephen Lin, and Han Hu. 2022. Video swin transformer. In Proceedings of the CVPR. 3202\u20133211."},{"key":"e_1_3_2_149_2","unstructured":"Jochem Loedeman Maarten C. Stol Tengda Han and Yuki M. Asano. 2022. Prompt generation networks for efficient adaptation of frozen vision transformers. (2022). Preprint at https:\/\/arxiv.org\/abs\/2210.06466"},{"key":"e_1_3_2_150_2","unstructured":"Haoyu Lu Mingyu Ding Yuqi Huo Guoxing Yang Zhiwu Lu Masayoshi Tomizuka and Wei Zhan. 2024. UniAdapter: Unified parameter-efficient transfer learning for cross-modal modeling. (2024)."},{"key":"e_1_3_2_151_2","unstructured":"Gen Luo Minglang Huang Yiyi Zhou Xiaoshuai Sun Guannan Jiang Zhiyu Wang and Rongrong Ji. 2023. Towards efficient visual adaption via structural re-parameterization. (2023). 
Preprint at https:\/\/arxiv.org\/abs\/2302.08106"},{"key":"e_1_3_2_152_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3139234"},{"key":"e_1_3_2_153_2","doi-asserted-by":"publisher","DOI":"10.1145\/3532624"},{"key":"e_1_3_2_154_2","article-title":"Understanding and mitigating overfitting in prompt tuning for vision-language models","author":"Ma Chengcheng","year":"2023","unstructured":"Chengcheng Ma, Yang Liu, Jiankang Deng, Lingxi Xie, Weiming Dong, and Changsheng Xu. 2023. Understanding and mitigating overfitting in prompt tuning for vision-language models. TCSVT (2023).","journal-title":"TCSVT"},{"key":"e_1_3_2_155_2","unstructured":"Teli Ma Shijie Geng Mengmeng Wang Jing Shao Jiasen Lu Hongsheng Li Peng Gao and Yu Qiao. 2021. A simple long-tailed recognition baseline via vision-language model. (2021). Preprint at https:\/\/arxiv.org\/abs\/2111.14745"},{"key":"e_1_3_2_156_2","first-page":"4783","volume-title":"Proceedings of the ICASSP","author":"Ma Zeyu","year":"2022","unstructured":"Zeyu Ma, Yuhang Guo, Xiao Luo, Chong Chen, Minghua Deng, Wei Cheng, and Guangming Lu. 2022. DHWP: Learning high-quality short hash codes via weight pruning. In Proceedings of the ICASSP. IEEE, 4783\u20134787."},{"key":"e_1_3_2_157_2","unstructured":"Chengzhi Mao Scott Geng Junfeng Yang Xin Wang and Carl Vondrick. 2022. Understanding zero-shot adversarial robustness for large-scale models. In Proc. ICLR."},{"key":"e_1_3_2_158_2","doi-asserted-by":"crossref","unstructured":"Yuning Mao Lambert Mathias Rui Hou Amjad Almahairi Hao Ma Jiawei Han Scott Yih and Madian Khabsa. 2022. UniPELT: A unified framework for parameter-efficient language model tuning. In Proceedings of Annual Meeting of the Association for Computational Linguistics. 6253\u20136264.","DOI":"10.18653\/v1\/2022.acl-long.433"},{"key":"e_1_3_2_159_2","unstructured":"Imad Eddine Marouf Enzo Tartaglione and St\u00e9phane Lathuili\u00e8re. 2023. Tiny Adapters for Vision Transformers. 
Retrieved 14 Feb 2023 from https:\/\/openreview.net\/forum?id=V0Vo9eW2nzL"},{"key":"e_1_3_2_160_2","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/9780262514620.001.0001"},{"key":"e_1_3_2_161_2","first-page":"12663","volume-title":"Proceedings of the CVPR","author":"Metzer Gal","year":"2023","unstructured":"Gal Metzer, Elad Richardson, Or Patashnik, Raja Giryes, and Daniel Cohen-Or. 2023. Latent-nerf for shape-guided generation of 3d shapes and textures. In Proceedings of the CVPR. 12663\u201312673."},{"key":"e_1_3_2_162_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503250"},{"key":"e_1_3_2_163_2","doi-asserted-by":"crossref","unstructured":"Mahdi Namazifar Devamanyu Hazarika and Dilek Hakkani-Tur. 2023. Role of bias terms in dot-product attention. (2023). Preprint at https:\/\/arxiv.org\/abs\/2302.08626","DOI":"10.1109\/ICASSP49357.2023.10097125"},{"key":"e_1_3_2_164_2","unstructured":"Tung Nguyen Johannes Brandstetter Ashish Kapoor Jayesh K. Gupta and Aditya Grover. 2023. ClimaX: A foundation model for weather and climate. (2023). Preprint at https:\/\/arxiv.org\/abs\/2301.10343"},{"key":"e_1_3_2_165_2","first-page":"1","volume-title":"Computer Vision\u2013ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23\u201327, 2022, Proceedings, Part IV","author":"Ni Bolin","year":"2022","unstructured":"Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, and Haibin Ling. 2022. Expanding language-image pretrained models for general video recognition. In Computer Vision\u2013ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23\u201327, 2022, Proceedings, Part IV. Springer, 1\u201318."},{"key":"e_1_3_2_166_2","unstructured":"Xing Nie Bolin Ni Jianlong Chang Gaomeng Meng Chunlei Huo Zhaoxiang Zhang Shiming Xiang Qi Tian and Chunhong Pan. 2022. Pro-tuning: Unified prompt tuning for vision tasks. arXiv:2207.14381. 
Retrieved from https:\/\/arxiv.org\/abs\/2207.14381"},{"key":"e_1_3_2_167_2","volume-title":"Proceedings of the NeurIPS","author":"Pan Junting","year":"2022","unstructured":"Junting Pan, Ziyi Lin, Xiatian Zhu, Jing Shao, and Hongsheng Li. 2022. ST-Adapter: Parameter-efficient image-to-video transfer learning. In Proceedings of the NeurIPS."},{"key":"e_1_3_2_168_2","unstructured":"Junting Pan Ziyi Lin Xiatian Zhu Jing Shao and Hongsheng Li. 2022. St-adapter: Parameter-efficient image-to-video transfer learning. Proc. NeurIPS 35 (2022) 26462\u201326477."},{"key":"e_1_3_2_169_2","unstructured":"Omiros Pantazis Gabriel Brostow Katherine Jones and Oisin Mac Aodha. 2022. SVL-Adapter: Self-supervised adapter for vision-language pretrained models. In Proceedings of The 33rd British Machine Vision Conference. The British Machine Vision Association (BMVA)."},{"key":"e_1_3_2_170_2","doi-asserted-by":"crossref","unstructured":"Pinelopi Papalampidi and Mirella Lapata. 2022. Hierarchical3D adapters for long video-to-text summarization. arXiv:2210.04829. Retrieved from https:\/\/arxiv.org\/abs\/2210.04829","DOI":"10.18653\/v1\/2023.findings-eacl.96"},{"key":"e_1_3_2_171_2","doi-asserted-by":"crossref","unstructured":"Jihye Park Sunwoo Kim Soohyun Kim Seokju Cho Jaejun Yoo Youngjung Uh and Seungryong Kim. 2023. Lanit: Language-driven image-to-image translation for unlabeled data. In Proc. CVPR. 23401\u201323411.","DOI":"10.1109\/CVPR52729.2023.02241"},{"key":"e_1_3_2_172_2","first-page":"3967","volume-title":"Proceedings of the CVPR","author":"Park Wonpyo","year":"2019","unstructured":"Wonpyo Park, Dongju Kim, Yan Lu, and Minsu Cho. 2019. Relational knowledge distillation. In Proceedings of the CVPR. 3967\u20133976."},{"key":"e_1_3_2_173_2","first-page":"2339","volume-title":"Proceedings of the CVPR","author":"Passalis Nikolaos","year":"2020","unstructured":"Nikolaos Passalis, Maria Tzelepi, and Anastasios Tefas. 2020. 
Heterogeneous knowledge distillation using information flow modeling. In Proceedings of the CVPR. 2339\u20132348."},{"key":"e_1_3_2_174_2","first-page":"4195","volume-title":"Proceedings of the CVPR","author":"Peebles William","year":"2023","unstructured":"William Peebles and Saining Xie. 2023. Scalable diffusion models with transformers. In Proceedings of the CVPR. 4195\u20134205."},{"key":"e_1_3_2_175_2","first-page":"5007","volume-title":"Proceedings of the CVPR","author":"Peng Baoyun","year":"2019","unstructured":"Baoyun Peng, Xiao Jin, Jiaheng Liu, Dongsheng Li, Yichao Wu, Yu Liu, Shunfeng Zhou, and Zhaoning Zhang. 2019. Correlation congruence for knowledge distillation. In Proceedings of the CVPR. 5007\u20135016."},{"key":"e_1_3_2_176_2","doi-asserted-by":"crossref","unstructured":"Fang Peng Xiaoshan Yang and Changsheng Xu. 2022. SgVA-CLIP: Semantic-guided visual adapting of vision-language models for few-shot image classification. IEEE Transactions on Multimedia (2023).","DOI":"10.1109\/TMM.2023.3311646"},{"key":"e_1_3_2_177_2","doi-asserted-by":"crossref","unstructured":"Jonas Pfeiffer Andreas R\u00fcckl\u00e9 Clifton Poth Aishwarya Kamath Ivan Vuli\u0107 Sebastian Ruder Kyunghyun Cho and Iryna Gurevych. 2020. Adapterhub: A framework for adapting transformers. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 46\u201354.","DOI":"10.18653\/v1\/2020.emnlp-demos.7"},{"key":"e_1_3_2_178_2","first-page":"11557","volume-title":"Proceedings of the CVPR","author":"Pham Hieu","year":"2021","unstructured":"Hieu Pham, Zihang Dai, Qizhe Xie, and Quoc V. Le. 2021. Meta pseudo labels. In Proceedings of the CVPR. 11557\u201311568."},{"key":"e_1_3_2_179_2","first-page":"4095","volume-title":"Proceedings of the ICML","author":"Pham Hieu","year":"2018","unstructured":"Hieu Pham, Melody Guan, Barret Zoph, Quoc Le, and Jeff Dean. 2018. Efficient neural architecture search via parameters sharing. 
In Proceedings of the ICML. PMLR, 4095\u20134104."},{"key":"e_1_3_2_180_2","first-page":"93","volume-title":"Proceedings of the ECCV","author":"Porrello Angelo","year":"2020","unstructured":"Angelo Porrello, Luca Bergamini, and Simone Calderara. 2020. Robust re-identification by multiple views knowledge distillation. In Proceedings of the ECCV. Springer, 93\u2013110."},{"key":"e_1_3_2_181_2","unstructured":"Yujia Qin Shengding Hu Yankai Lin Weize Chen Ning Ding Ganqu Cui Zheni Zeng Yufei Huang Chaojun Xiao Chi Han et\u00a0al. 2023. Tool learning with foundation models. arXiv preprint arXiv:2304.08354 (2023)."},{"key":"e_1_3_2_182_2","unstructured":"Ziran Qin Mingbao Lin and Weiyao Lin. 2023. Low-rank winograd transformation for 3D convolutional neural networks. (2023). Preprint at https:\/\/arxiv.org\/abs\/2301.11180"},{"key":"e_1_3_2_183_2","unstructured":"Haoxuan Qu Hossein Rahmani Li Xu Bryan Williams and Jun Liu. 2021. Recent advances of continual learning in computer vision: An overview. (2021). Preprint at https:\/\/arxiv.org\/abs\/2109.11369"},{"key":"e_1_3_2_184_2","first-page":"8748","volume-title":"Proceedings of the ICML","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et\u00a0al. 2021. Learning transferable visual models from natural language supervision. In Proceedings of the ICML. PMLR, 8748\u20138763."},{"key":"e_1_3_2_185_2","doi-asserted-by":"crossref","unstructured":"Jun Rao Xv Meng Liang Ding Shuhan Qi Xuebo Liu Min Zhang and Dacheng Tao. 2023. Parameter-efficient and student-friendly knowledge distillation. IEEE Transactions on Multimedia (2023).","DOI":"10.1109\/TMM.2023.3321480"},{"key":"e_1_3_2_186_2","first-page":"50","volume-title":"Proceedings of the ECCV","author":"Rao Yongming","year":"2022","unstructured":"Yongming Rao, Wenliang Zhao, Jie Zhou, and Jiwen Lu. 2022. 
AMixer: Adaptive weight mixing for self-attention free vision transformers. In Proceedings of the ECCV. Springer, 50\u201367."},{"key":"e_1_3_2_187_2","volume-title":"Proceedings of the NeurIPS","author":"Rebuffi Sylvestre-Alvise","year":"2017","unstructured":"Sylvestre-Alvise Rebuffi, Hakan Bilen, and Andrea Vedaldi. 2017. Learning multiple visual domains with residual adapters. In Proceedings of the NeurIPS."},{"key":"e_1_3_2_188_2","first-page":"8119","volume-title":"Proceedings of the CVPR","author":"Rebuffi Sylvestre-Alvise","year":"2018","unstructured":"Sylvestre-Alvise Rebuffi, Hakan Bilen, and Andrea Vedaldi. 2018. Efficient parametrization of multi-domain deep neural networks. In Proceedings of the CVPR. 8119\u20138127."},{"key":"e_1_3_2_189_2","unstructured":"Scott Reed Konrad Zolna Emilio Parisotto Sergio Gomez Colmenarejo Alexander Novikov Gabriel Barth-Maron Mai Gimenez Yury Sulsky Jackie Kay Jost Tobias Springenberg et\u00a0al. 2022. A generalist agent. (2022). Preprint at https:\/\/arxiv.org\/abs\/2205.06175"},{"key":"e_1_3_2_190_2","volume-title":"Proceedings of the ICIP","author":"Remigereau F\u00e9lix","year":"2022","unstructured":"F\u00e9lix Remigereau, Djebril Mekhazni, Sajjad Abdoli, Rafael M. O. Cruz, Eric Granger, et\u00a0al. 2022. Knowledge distillation for multi-target domain adaptation in real-time person re-identification. In Proceedings of the ICIP. IEEE, 3853\u20133557."},{"key":"e_1_3_2_191_2","first-page":"10684","volume-title":"Proceedings of the CVPR","author":"Rombach Robin","year":"2022","unstructured":"Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj\u00f6rn Ommer. 2022. High-resolution image synthesis with latent diffusion models. In Proceedings of the CVPR. 10684\u201310695."},{"key":"e_1_3_2_192_2","volume-title":"Proceedings of the ICLR","author":"Romero Adriana","year":"2015","unstructured":"Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. 2015. 
Fitnets: Hints for thin deep nets. In Proceedings of the ICLR."},{"key":"e_1_3_2_193_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2884462"},{"key":"e_1_3_2_194_2","first-page":"2564","volume-title":"Proceedings of the ICCV","author":"Rublee Ethan","year":"2011","unstructured":"Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. 2011. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the ICCV. IEEE, 2564\u20132571."},{"key":"e_1_3_2_195_2","first-page":"22500","volume-title":"Proceedings of the CVPR","author":"Ruiz Nataniel","year":"2023","unstructured":"Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. 2023. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the CVPR. 22500\u201322510."},{"key":"e_1_3_2_196_2","volume-title":"Artificial Intelligence a Modern Approach","author":"Russell Stuart J.","year":"2010","unstructured":"Stuart J. Russell. 2010. Artificial Intelligence a Modern Approach. Pearson Education, Inc."},{"key":"e_1_3_2_197_2","doi-asserted-by":"crossref","unstructured":"Kuniaki Saito Kihyuk Sohn Xiang Zhang Chun-Liang Li Chen-Yu Lee Kate Saenko and Tomas Pfister. 2023. Prefix conditioning unifies language and label supervision. In Proc. CVPR. 2861\u20132870.","DOI":"10.1109\/CVPR52729.2023.00280"},{"key":"e_1_3_2_198_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_2_199_2","doi-asserted-by":"publisher","DOI":"10.1109\/78.650093"},{"key":"e_1_3_2_200_2","volume-title":"Proceedings of the ICLR","author":"Sharma Mohit","year":"2023","unstructured":"Mohit Sharma, Claudio Fantacci, Yuxiang Zhou, Skanda Koppula, Nicolas Heess, Jon Scholz, and Yusuf Aytar. 2023. Lossless adaptation of pretrained vision models for robotic manipulation. In Proceedings of the ICLR. 
Retrieved from https:\/\/openreview.net\/forum?id=5IND3TXJRb-"},{"key":"e_1_3_2_201_2","doi-asserted-by":"crossref","unstructured":"Sheng Shen Shijia Yang Tianjun Zhang Bohan Zhai Joseph E. Gonzalez Kurt Keutzer and Trevor Darrell. 2024. Multitask vision-language prompt tuning. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. 5656\u20135667.","DOI":"10.1109\/WACV57701.2024.00556"},{"key":"e_1_3_2_202_2","first-page":"6327","volume-title":"Proceedings of the CVPR Workshop","author":"Shi Yifeng","year":"2023","unstructured":"Yifeng Shi, Feng Lv, Xinliang Wang, Chunlong Xia, Shaojie Li, Shujie Yang, Teng Xi, and Gang Zhang. 2023. Open-transmind: A new baseline and benchmark for 1st foundation model challenge of intelligent transportation. In Proceedings of the CVPR Workshop. 6327\u20136334."},{"key":"e_1_3_2_203_2","doi-asserted-by":"crossref","unstructured":"Erica K. Shimomoto Edison Marrese-Taylor Hiroya Takamura Ichiro Kobayashi Hideki Nakayama and Yusuke Miyao. 2022. Towards parameter-efficient integration of pre-trained language models in temporal video grounding. (2022). Preprint at https:\/\/arxiv.org\/abs\/2209.13359","DOI":"10.18653\/v1\/2023.findings-acl.829"},{"key":"e_1_3_2_204_2","unstructured":"Manli Shu Weili Nie De-An Huang Zhiding Yu Tom Goldstein Anima Anandkumar and Chaowei Xiao. 2022. Test-time prompt tuning for zero-shot generalization in vision-language models. arXiv:2209.07511. Retrieved from https:\/\/arxiv.org\/abs\/2209.07511"},{"key":"e_1_3_2_205_2","volume-title":"Proceedings of the ICLR","author":"Shysheya Aliaksandra","year":"2023","unstructured":"Aliaksandra Shysheya, John F Bronskill, Massimiliano Patacchiola, Sebastian Nowozin, and Richard E Turner. 2023. FiT: Parameter efficient few-shot transfer learning for personalized and federated image classification. In Proceedings of the ICLR. 
Retrieved from https:\/\/openreview.net\/forum?id=9aokcgBVIj1"},{"key":"e_1_3_2_206_2","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556. Retrieved from https:\/\/arxiv.org\/abs\/1409.1556"},{"key":"e_1_3_2_207_2","first-page":"15638","volume-title":"Proceedings of the CVPR","author":"Singh Amanpreet","year":"2022","unstructured":"Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela. 2022. Flava: A foundational language and vision alignment model. In Proceedings of the CVPR. 15638\u201315650."},{"key":"e_1_3_2_208_2","first-page":"804","volume-title":"Proceedings of the CVPR","author":"Singh Mannat","year":"2022","unstructured":"Mannat Singh, Laura Gustafson, Aaron Adcock, Vinicius de Freitas Reis, Bugra Gedik, Raj Prateek Kosaraju, Dhruv Mahajan, Ross Girshick, Piotr Doll\u00e1r, and Laurens van der Maaten. 2022. Revisiting weakly supervised pre-training of visual perception models. In Proceedings of the CVPR. 804\u2013814."},{"key":"e_1_3_2_209_2","doi-asserted-by":"crossref","unstructured":"Kihyuk Sohn Yuan Hao Jos\u00e9 Lezama Luisa Polania Huiwen Chang Han Zhang Irfan Essa and Lu Jiang. 2022. Visual prompt tuning for generative transfer learning. arXiv:2210.00990. Retrieved from https:\/\/arxiv.org\/abs\/2210.00990","DOI":"10.1109\/CVPR52729.2023.01900"},{"key":"e_1_3_2_210_2","first-page":"843","volume-title":"Proceedings of the ICCV","author":"Sun Chen","year":"2017","unstructured":"Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta. 2017. Revisiting unreasonable effectiveness of data in deep learning era. In Proceedings of the ICCV. 843\u2013852."},{"key":"e_1_3_2_211_2","unstructured":"Ximeng Sun Ping Hu and Kate Saenko. 2022. Dualcoop: Fast adaptation to multi-label recognition with limited annotations. arXiv:2206.09541. 
Retrieved from https:\/\/arxiv.org\/abs\/2206.09541"},{"key":"e_1_3_2_212_2","doi-asserted-by":"publisher","DOI":"10.1155\/2014\/190903"},{"key":"e_1_3_2_213_2","volume-title":"Proceedings of the NeurIPS","author":"Sung Yi-Lin","year":"2022","unstructured":"Yi-Lin Sung, Jaemin Cho, and Mohit Bansal. 2022. Lst: Ladder side-tuning for parameter and memory efficient transfer learning. In Proceedings of the NeurIPS."},{"key":"e_1_3_2_214_2","first-page":"5227","volume-title":"Proceedings of the CVPR","author":"Sung Yi-Lin","year":"2022","unstructured":"Yi-Lin Sung, Jaemin Cho, and Mohit Bansal. 2022. Vl-adapter: Parameter-efficient transfer learning for vision-and-language tasks. In Proceedings of the CVPR. 5227\u20135237."},{"key":"e_1_3_2_215_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_2_216_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_3_2_217_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01424-7_27"},{"key":"e_1_3_2_218_2","first-page":"6105","volume-title":"Proceedings of the ICML","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc Le. 2019. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the ICML. PMLR, 6105\u20136114."},{"key":"e_1_3_2_219_2","doi-asserted-by":"crossref","unstructured":"Ming Tao Bing-Kun Bao Hao Tang and Changsheng Xu. 2023. GALIP: Generative adversarial clips for text-to-image synthesis. arXiv:2301.12959. Retrieved from https:\/\/arxiv.org\/abs\/2301.12959","DOI":"10.1109\/CVPR52729.2023.01366"},{"key":"e_1_3_2_220_2","doi-asserted-by":"publisher","DOI":"10.1145\/3530811"},{"key":"e_1_3_2_221_2","unstructured":"Zhan Tong Yibing Song Jue Wang and Limin Wang. 2022. Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. arXiv:2203.12602. 
Retrieved from https:\/\/arxiv.org\/abs\/2203.12602"},{"key":"e_1_3_2_222_2","first-page":"10347","volume-title":"Proceedings of the ICML","author":"Touvron Hugo","year":"2021","unstructured":"Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Herv\u00e9 J\u00e9gou. 2021. Training data-efficient image transformers and distillation through attention. In Proceedings of the ICML. PMLR, 10347\u201310357."},{"key":"e_1_3_2_223_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.510"},{"key":"e_1_3_2_224_2","first-page":"7852","volume-title":"Proceedings of the NeurIPS","author":"Tripuraneni Nilesh","year":"2020","unstructured":"Nilesh Tripuraneni, Michael Jordan, and Chi Jin. 2020. On the theory of transfer learning: The importance of task diversity. In Proceedings of the NeurIPS. 7852\u20137862."},{"key":"e_1_3_2_225_2","first-page":"2529","volume-title":"Proceedings of the WACV","author":"Tsubota Koki","year":"2023","unstructured":"Koki Tsubota, Hiroaki Akutsu, and Kiyoharu Aizawa. 2023. Universal deep image compression via content-adaptive optimization with adapters. In Proceedings of the WACV. 2529\u20132538."},{"key":"e_1_3_2_226_2","unstructured":"Cheng-Hao Tu Zheda Mai and Wei-Lun Chao. 2022. Visual query tuning: Towards effective usage of intermediate representations for parameter and memory efficient transfer learning. arXiv:2212.03220. Retrieved from https:\/\/arxiv.org\/abs\/2212.03220"},{"key":"e_1_3_2_227_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-021-00390-3"},{"key":"e_1_3_2_228_2","doi-asserted-by":"crossref","unstructured":"Mojtaba Valipour Mehdi Rezagholizadeh Ivan Kobyzev and Ali Ghodsi. 2022. DyLoRA: Parameter efficient tuning of pre-trained models using dynamic search-free low-rank adaptation. arXiv:2210.07558. Retrieved from https:\/\/arxiv.org\/abs\/2210.07558","DOI":"10.18653\/v1\/2023.eacl-main.239"},{"key":"e_1_3_2_229_2","doi-asserted-by":"crossref","unstructured":"Sai H. 
Vemprala Rogerio Bonatti Arthur Bucker and Ashish Kapoor. 2024. Chatgpt for robotics: Design principles and model abilities. IEEE Access (2024).","DOI":"10.1109\/ACCESS.2024.3387941"},{"key":"e_1_3_2_230_2","article-title":"Language models generalize beyond natural proteins","author":"Verkuil Robert","year":"2022","unstructured":"Robert Verkuil, Ori Kabeli, Yilun Du, Basile IM Wicky, Lukas F Milles, Justas Dauparas, David Baker, Sergey Ovchinnikov, Tom Sercu, and Alexander Rives. 2022. Language models generalize beyond natural proteins. bioRxiv (2022), 2022\u201312.","journal-title":"bioRxiv"},{"key":"e_1_3_2_231_2","unstructured":"Feng Wang Manling Li Xudong Lin Hairong Lv Alexander G. Schwing and Heng Ji. 2022. Learning to decompose visual features with latent textual prompts. arXiv:2210.04287. Retrieved from https:\/\/arxiv.org\/abs\/2210.04287"},{"key":"e_1_3_2_232_2","unstructured":"Haixin Wang Jianlong Chang Xiao Luo Jinan Sun Zhouchen Lin and Qi Tian. 2023. LION: Implicit vision prompt tuning. arXiv:2303.09992. Retrieved from https:\/\/arxiv.org\/abs\/2303.09992"},{"key":"e_1_3_2_233_2","unstructured":"Haixin Wang Xinlong Yang Jianlong Chang Dian Jin Jinan Sun Shikun Zhang Xiao Luo and Qi Tian. 2023. Mode approximation makes good vision-language prompts. arXiv:2305.08381. Retrieved from https:\/\/arxiv.org\/abs\/2305.08381"},{"key":"e_1_3_2_234_2","first-page":"446","volume-title":"Proceedings of the ECCV","author":"Wang Haixin","year":"2020","unstructured":"Haixin Wang, Tianhao Zhang, Muzhi Yu, Jinan Sun, Wei Ye, Chen Wang, and Shikun Zhang. 2020. Stacking networks dynamically for image restoration based on the Plug-and-Play framework. In Proceedings of the ECCV. 
Springer, 446\u2013462."},{"key":"e_1_3_2_235_2","first-page":"19381","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023","author":"Wang Shijie","year":"2023","unstructured":"Shijie Wang, Jianlong Chang, Haojie Li, Zhihui Wang, Wanli Ouyang, and Qi Tian. 2023. Open-set fine-grained retrieval via prompting vision-language evaluator. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023. IEEE, 19381\u201319391."},{"key":"e_1_3_2_236_2","unstructured":"Shijie Wang Jianlong Chang Zhihui Wang Haojie Li Wanli Ouyang and Qi Tian. 2022. Fine-grained retrieval prompt tuning. arXiv:2207.14465. Retrieved from https:\/\/arxiv.org\/abs\/2207.14465"},{"key":"e_1_3_2_237_2","doi-asserted-by":"crossref","unstructured":"Wenhui Wang Hangbo Bao Li Dong Johan Bjorck Zhiliang Peng Qiang Liu Kriti Aggarwal Owais Khan Mohammed Saksham Singhal Subhojit Som et\u00a0al. 2022. Image as a foreign language: Beit pretraining for all vision and vision-language tasks. arXiv:2208.10442. Retrieved from https:\/\/arxiv.org\/abs\/2208.10442","DOI":"10.1109\/CVPR52729.2023.01838"},{"key":"e_1_3_2_238_2","first-page":"568","volume-title":"Proceedings of the CVPR","author":"Wang Wenhai","year":"2021","unstructured":"Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, and Ling Shao. 2021. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the CVPR. 568\u2013578."},{"key":"e_1_3_2_239_2","doi-asserted-by":"crossref","unstructured":"Xinlong Wang Wen Wang Yue Cao Chunhua Shen and Tiejun Huang. 2022. Images speak in images: A generalist painter for in-context visual learning. arXiv:2212.02499. 
Retrieved from https:\/\/arxiv.org\/abs\/2212.02499","DOI":"10.1109\/CVPR52729.2023.00660"},{"key":"e_1_3_2_240_2","unstructured":"Yabin Wang Zhiwu Huang and Xiaopeng Hong. 2022. S-prompts learning with pre-trained transformers: An occam\u2019s razor for domain incremental learning. arXiv:2207.12819. Retrieved from https:\/\/arxiv.org\/abs\/2207.12819"},{"key":"e_1_3_2_241_2","first-page":"2457","volume-title":"Proceedings of the CVPR","author":"Wang Yiran","year":"2021","unstructured":"Yiran Wang, Xingyi Li, Min Shi, Ke Xian, and Zhiguo Cao. 2021. Knowledge distillation for fast and accurate monocular depth estimation on mobile devices. In Proceedings of the CVPR. 2457\u20132465."},{"key":"e_1_3_2_242_2","first-page":"346","volume-title":"Proceedings of the ECCV","author":"Wang Yukang","year":"2020","unstructured":"Yukang Wang, Wei Zhou, Tao Jiang, Xiang Bai, and Yongchao Xu. 2020. Intra-class feature variation distillation for semantic segmentation. In Proceedings of the ECCV. Springer, 346\u2013362."},{"key":"e_1_3_2_243_2","unstructured":"Ziyi Wang Xumin Yu Yongming Rao Jie Zhou and Jiwen Lu. 2022. P2p: Tuning pre-trained image models for point cloud analysis with point-to-pixel prompting. arXiv:2208.02812. Retrieved from https:\/\/arxiv.org\/abs\/2208.02812"},{"key":"e_1_3_2_244_2","volume-title":"Proceedings of the NeurIPS","author":"Wen Wei","year":"2016","unstructured":"Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. 2016. Learning structured sparsity in deep neural networks. In Proceedings of the NeurIPS."},{"key":"e_1_3_2_245_2","unstructured":"Chenfei Wu Shengming Yin Weizhen Qi Xiaodong Wang Zecheng Tang and Nan Duan. 2023. Visual ChatGPT: Talking drawing and editing with visual foundation models. arXiv:2303.04671. 
Retrieved from https:\/\/arxiv.org\/abs\/2303.04671"},{"key":"e_1_3_2_246_2","volume-title":"Proceedings of the NeurIPS","author":"Wu Chen Henry","year":"2022","unstructured":"Chen Henry Wu, Saman Motamed, Shaunak Srivastava, and Fernando De la Torre. 2022. Generative visual prompt: Unifying distributional control of pre-trained generative models. In Proceedings of the NeurIPS."},{"issue":"2","key":"e_1_3_2_247_2","doi-asserted-by":"crossref","first-page":"63","DOI":"10.3390\/a15020063","article-title":"Pruning adapters with lottery ticket","volume":"15","author":"Wu Jiarun","year":"2022","unstructured":"Jiarun Wu and Qingliang Chen. 2022. Pruning adapters with lottery ticket. Algorithms 15, 2 (2022), 63.","journal-title":"Algorithms"},{"key":"e_1_3_2_248_2","unstructured":"Junyang Wu Xianhang Li Chen Wei Huiyu Wang Alan Yuille Yuyin Zhou and Cihang Xie. 2022. Unleashing the power of visual prompting at the pixel level. arXiv:2212.10556. Retrieved from https:\/\/arxiv.org\/abs\/2212.10556"},{"key":"e_1_3_2_249_2","first-page":"7623","volume-title":"Proceedings of the CVPR","author":"Wu Jay Zhangjie","year":"2023","unstructured":"Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Stan Weixian Lei, Yuchao Gu, Yufei Shi, Wynne Hsu, Ying Shan, Xiaohu Qie, and Mike Zheng Shou. 2023. Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation. In Proceedings of the CVPR. 7623\u20137633."},{"issue":"9","key":"e_1_3_2_250_2","first-page":"1","article-title":"Weight-sharing neural architecture search: A battle to shrink the optimization gap","volume":"54","author":"Xie Lingxi","year":"2021","unstructured":"Lingxi Xie, Xin Chen, Kaifeng Bi, Longhui Wei, Yuhui Xu, Lanfei Wang, Zhengsu Chen, An Xiao, Jianlong Chang, Xiaopeng Zhang, et\u00a0al. 2021. Weight-sharing neural architecture search: A battle to shrink the optimization gap. 
ACM Computing Surveys 54, 9 (2021), 1\u201337.","journal-title":"ACM Computing Surveys"},{"key":"e_1_3_2_251_2","first-page":"305","volume-title":"Proceedings of the ECCV","author":"Xie Saining","year":"2018","unstructured":"Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu, and Kevin Murphy. 2018. Rethinking spatiotemporal feature learning: Speed-accuracy tradeoffs in video classification. In Proceedings of the ECCV. 305\u2013321."},{"key":"e_1_3_2_252_2","volume-title":"Proceedings of the ICLR","author":"Xie Sirui","year":"2019","unstructured":"Sirui Xie, Hehui Zheng, Chunxiao Liu, and Liang Lin. 2019. SNAS: Stochastic neural architecture search. In Proceedings of the ICLR."},{"key":"e_1_3_2_253_2","unstructured":"Yinghui Xing Qirui Wu De Cheng Shizhou Zhang Guoqiang Liang and Yanning Zhang. 2022. Class-aware visual prompt tuning for vision-language pre-trained model. arXiv:2208.08340. Retrieved from https:\/\/arxiv.org\/abs\/2208.08340"},{"key":"e_1_3_2_254_2","unstructured":"Chengming Xu Siqian Yang Yabiao Wang Zhanxiong Wang Yanwei Fu and Xiangyang Xue. 2023. Exploring efficient few-shot adaptation for vision transformers. arXiv:2301.02419. Retrieved from https:\/\/arxiv.org\/abs\/2301.02419"},{"key":"e_1_3_2_255_2","first-page":"664","volume-title":"Proceedings of the ECCV","author":"Xu Kunran","year":"2020","unstructured":"Kunran Xu, Lai Rui, Yishi Li, and Lin Gu. 2020. Feature normalized knowledge distillation for image classification. In Proceedings of the ECCV. Springer, 664\u2013680."},{"key":"e_1_3_2_256_2","unstructured":"Mengde Xu Zheng Zhang Fangyun Wei Han Hu and Xiang Bai. 2023. Side adapter network for open-vocabulary semantic segmentation. arXiv:2302.12242. Retrieved from https:\/\/arxiv.org\/abs\/2302.12242"},{"key":"e_1_3_2_257_2","volume-title":"Proceedings of the ICLR","author":"Xu Yuhui","year":"2020","unstructured":"Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, and Hongkai Xiong. 2020. 
Pc-darts: Partial channel connections for memory-efficient architecture search. In Proceedings of the ICLR."},{"key":"e_1_3_2_258_2","volume-title":"Proceedings of the AAAI","author":"Yan Sijie","year":"2018","unstructured":"Sijie Yan, Yuanjun Xiong, and Dahua Lin. 2018. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the AAAI."},{"key":"e_1_3_2_259_2","first-page":"12319","volume-title":"Proceedings of the CVPR","author":"Yang Chuanguang","year":"2022","unstructured":"Chuanguang Yang, Helong Zhou, Zhulin An, Xue Jiang, Yongjun Xu, and Qian Zhang. 2022. Cross-image relational knowledge distillation for semantic segmentation. In Proceedings of the CVPR. 12319\u201312328."},{"key":"e_1_3_2_260_2","unstructured":"Jing Yang Brais Martinez Adrian Bulat and Georgios Tzimiropoulos. 2020. Knowledge distillation via adaptive instance normalization. arXiv:2003.04289. Retrieved from https:\/\/arxiv.org\/abs\/2003.04289"},{"key":"e_1_3_2_261_2","doi-asserted-by":"publisher","DOI":"10.1145\/3626235"},{"key":"e_1_3_2_262_2","volume-title":"Proceedings of the ICLR","author":"Yang Taojiannan","year":"2023","unstructured":"Taojiannan Yang, Yi Zhu, Yusheng Xie, Aston Zhang, Chen Chen, and Mu Li. 2023. AIM: Adapting image models for efficient video action recognition. In Proceedings of the ICLR. https:\/\/openreview.net\/forum?id=CIoSZ_HKHS7"},{"key":"e_1_3_2_263_2","first-page":"25739","volume-title":"Proceedings of the NeurIPS","author":"Yang Xingyi","year":"2022","unstructured":"Xingyi Yang, Daquan Zhou, Songhua Liu, Jingwen Ye, and Xinchao Wang. 2022. Deep model reassembly. In Proceedings of the NeurIPS. 25739\u201325753."},{"key":"e_1_3_2_264_2","first-page":"23519","volume-title":"Proceedings of the NeurIPS","author":"Ye Haotian","year":"2021","unstructured":"Haotian Ye, Chuanlong Xie, Tianle Cai, Ruichen Li, Zhenguo Li, and Liwei Wang. 2021. Towards a theoretical framework of out-of-distribution generalization. 
In Proceedings of the NeurIPS. 23519\u201323531."},{"key":"e_1_3_2_265_2","unstructured":"Bruce XB Yu Jianlong Chang Lingbo Liu Qi Tian and Chang Wen Chen. 2022. Towards a unified view on visual parameter-efficient transfer learning. arXiv:2210.00788. Retrieved from https:\/\/arxiv.org\/abs\/2210.00788"},{"key":"e_1_3_2_266_2","first-page":"8818","volume-title":"Proceedings of the ICCV","author":"Yu Bruce XB","year":"2023","unstructured":"Bruce XB Yu, Zhi Zhang, Yongxu Liu, Sheng-hua Zhong, Yan Liu, and Chang Wen Chen. 2023. Gla-gcn: Global-local adaptive graph convolutional network for 3d human pose estimation from monocular video. In Proceedings of the ICCV. 8818\u20138829."},{"key":"e_1_3_2_267_2","unstructured":"Jiahui Yu Zirui Wang Vijay Vasudevan Legg Yeung Mojtaba Seyedhosseini and Yonghui Wu. 2022. Coca: Contrastive captioners are image-text foundation models. arXiv:2205.01917. Retrieved from https:\/\/arxiv.org\/abs\/2205.01917"},{"issue":"4","key":"e_1_3_2_268_2","doi-asserted-by":"crossref","first-page":"799","DOI":"10.26599\/TST.2022.9010044","article-title":"Prompting and tuning: A two-stage unsupervised domain adaptive person re-identification method on vision transformer backbone","volume":"28","author":"Yu Shengming","year":"2023","unstructured":"Shengming Yu, Zhaopeng Dou, and Shengjin Wang. 2023. Prompting and tuning: A two-stage unsupervised domain adaptive person re-identification method on vision transformer backbone. Tsinghua Science and Technology 28, 4 (2023), 799\u2013810.","journal-title":"Tsinghua Science and Technology"},{"key":"e_1_3_2_269_2","unstructured":"Zitong Yu Rizhao Cai Yawen Cui Xin Liu Yongjian Hu and Alex Kot. 2023. Rethinking vision transformer and masked autoencoder in multimodal face anti-spoofing. arXiv:2302.05744. 
Retrieved from https:\/\/arxiv.org\/abs\/2302.05744"},{"key":"e_1_3_2_270_2","first-page":"579","volume-title":"Proceedings of the CVPR","author":"Yuan Kun","year":"2021","unstructured":"Kun Yuan, Shaopeng Guo, Ziwei Liu, Aojun Zhou, Fengwei Yu, and Wei Wu. 2021. Incorporating convolution designs into visual transformers. In Proceedings of the CVPR. 579\u2013588."},{"key":"e_1_3_2_271_2","unstructured":"Lu Yuan Dongdong Chen Yi-Ling Chen Noel Codella Xiyang Dai Jianfeng Gao Houdong Hu Xuedong Huang Boxin Li Chunyuan Li et\u00a0al. 2021. Florence: A new foundation model for computer vision. arXiv:2111.11432. Retrieved from https:\/\/arxiv.org\/abs\/2111.11432"},{"key":"e_1_3_2_272_2","first-page":"558","volume-title":"Proceedings of the CVPR","author":"Yuan Li","year":"2021","unstructured":"Li Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Zi-Hang Jiang, Francis EH Tay, Jiashi Feng, and Shuicheng Yan. 2021. Tokens-to-token vit: Training vision transformers from scratch on imagenet. In Proceedings of the CVPR. 558\u2013567."},{"key":"e_1_3_2_273_2","article-title":"Volo: Vision outlooker for visual recognition","author":"Yuan Li","year":"2022","unstructured":"Li Yuan, Qibin Hou, Zihang Jiang, Jiashi Feng, and Shuicheng Yan. 2022. Volo: Vision outlooker for visual recognition. IEEE Trans. Patt. Anal. Mach. Intell. (2022).","journal-title":"IEEE Trans. Patt. Anal. Mach. Intell."},{"key":"e_1_3_2_274_2","unstructured":"Sha Yuan Hanyu Zhao Shuai Zhao Jiahong Leng Yangxiao Liang Xiaozhi Wang Jifan Yu Xin Lv Zhou Shao Jiaao He et\u00a0al. 2022. A roadmap for big model. arXiv:2203.14101. Retrieved from https:\/\/arxiv.org\/abs\/2203.14101"},{"key":"e_1_3_2_275_2","first-page":"1","volume-title":"Proceedings of the ACL (Volume 2: Short Papers)","author":"Zaken Elad Ben","year":"2022","unstructured":"Elad Ben Zaken, Yoav Goldberg, and Shauli Ravfogel. 2022. BitFit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. 
In Proceedings of the ACL (Volume 2: Short Papers). 1\u20139."},{"key":"e_1_3_2_276_2","unstructured":"Yuhang Zang Wei Li Kaiyang Zhou Chen Huang and Chen Change Loy. 2022. Unified vision and language prompt learning. arXiv:2210.07225. Retrieved from https:\/\/arxiv.org\/abs\/2210.07225"},{"key":"e_1_3_2_277_2","volume-title":"Proceedings of the ICLR","author":"Zhang Aston","year":"2021","unstructured":"Aston Zhang, Yi Tay, SHUAI Zhang, Alvin Chan, Anh Tuan Luu, Siu Hui, and Jie Fu. 2021. Beyond fully-connected layers with quaternions: Parameterization of hypercomplex multiplications with \\(1\/n\\) parameters. In Proceedings of the ICLR."},{"key":"e_1_3_2_278_2","unstructured":"Bowen Zhang Xiaojie Jin Weibo Gong Kai Xu Zhao Zhang Peng Wang Xiaohui Shen and Jiashi Feng. 2023. Multimodal video adapter for parameter efficient video text retrieval. arXiv:2301.07868. Retrieved from https:\/\/arxiv.org\/abs\/2301.07868"},{"key":"e_1_3_2_279_2","unstructured":"Jian-Wei Zhang Yifan Sun Yi Yang and Wei Chen. 2022. Feature-proxy transformer for few-shot segmentation. arXiv:2210.06908. Retrieved from https:\/\/arxiv.org\/abs\/2210.06908"},{"key":"e_1_3_2_280_2","volume-title":"Proceedings of the ICLR","author":"Zhang Linfeng","year":"2021","unstructured":"Linfeng Zhang and Kaisheng Ma. 2021. Improve object detection with feature-based knowledge distillation: Towards accurate and efficient detectors. In Proceedings of the ICLR."},{"key":"e_1_3_2_281_2","unstructured":"Renrui Zhang Hanqiu Deng Bohao Li Wei Zhang Hao Dong Hongsheng Li Peng Gao and Yu Qiao. 2022. Collaboration of pre-trained models makes better few-shot learner. arXiv:2209.12255. Retrieved from https:\/\/arxiv.org\/abs\/2209.12255"},{"key":"e_1_3_2_282_2","unstructured":"Renrui Zhang Rongyao Fang Wei Zhang Peng Gao Kunchang Li Jifeng Dai Yu Qiao and Hongsheng Li. 2021. Tip-adapter: Training-free clip-adapter for better vision-language modeling. arXiv:2111.03930. 
Retrieved from https:\/\/arxiv.org\/abs\/2111.03930"},{"key":"e_1_3_2_283_2","first-page":"8552","volume-title":"Proceedings of the CVPR","author":"Zhang Renrui","year":"2022","unstructured":"Renrui Zhang, Ziyu Guo, Wei Zhang, Kunchang Li, Xupeng Miao, Bin Cui, Yu Qiao, Peng Gao, and Hongsheng Li. 2022. Pointclip: Point cloud understanding by clip. In Proceedings of the CVPR. 8552\u20138562."},{"issue":"9","key":"e_1_3_2_284_2","doi-asserted-by":"crossref","first-page":"2905","DOI":"10.1109\/TPAMI.2020.3020315","article-title":"DATA: Differentiable architecture approximation with distribution guided sampling","volume":"43","author":"Zhang Xinbang","year":"2021","unstructured":"Xinbang Zhang, Jianlong Chang, Yiwen Guo, Gaofeng Meng, Shiming Xiang, Zhouchen Lin, and Chunhong Pan. 2021. DATA: Differentiable architecture approximation with distribution guided sampling. IEEE Trans. Pattern Anal. Mach. Intell. 43, 9 (2021), 2905\u20132920.","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"e_1_3_2_285_2","first-page":"7852","volume-title":"Proceedings of the CVPR","author":"Zhang Yiman","year":"2021","unstructured":"Yiman Zhang, Hanting Chen, Xinghao Chen, Yiping Deng, Chunjing Xu, and Yunhe Wang. 2021. Data-free knowledge distillation for image super-resolution. In Proceedings of the CVPR. 7852\u20137861."},{"key":"e_1_3_2_286_2","unstructured":"Yue Zhang Hongliang Fei Dingcheng Li Tan Yu and Ping Li. 2022. Prompting through prototype: A prototype-based prompt learning on pretrained vision-language models. arXiv:2210.10841. Retrieved from https:\/\/arxiv.org\/abs\/2210.10841"},{"key":"e_1_3_2_287_2","first-page":"7404","volume-title":"Proceedings of the ICML","author":"Zhang Yuchen","year":"2019","unstructured":"Yuchen Zhang, Tianle Liu, Mingsheng Long, and Michael Jordan. 2019. Bridging theory and algorithm for domain adaptation. In Proceedings of the ICML. 
PMLR, 7404\u20137413."},{"key":"e_1_3_2_288_2","article-title":"A survey on multi-task learning","author":"Zhang Yu","year":"2021","unstructured":"Yu Zhang and Qiang Yang. 2021. A survey on multi-task learning. IEEE Trans. Know. and Data Engin. (2021).","journal-title":"IEEE Trans. Know. and Data Engin."},{"key":"e_1_3_2_289_2","unstructured":"Yuanhan Zhang Kaiyang Zhou and Ziwei Liu. 2022. Neural prompt search. arXiv:2206.04673. Retrieved from https:\/\/arxiv.org\/abs\/2206.04673"},{"key":"e_1_3_2_290_2","unstructured":"Yuanhan Zhang Kaiyang Zhou and Ziwei Liu. 2023. What makes good examples for visual in-context learning? arXiv:2301.13670. Retrieved from https:\/\/arxiv.org\/abs\/2301.13670"},{"key":"e_1_3_2_291_2","doi-asserted-by":"crossref","unstructured":"Zhengkun Zhang Wenya Guo Xiaojun Meng Yasheng Wang Yadao Wang Xin Jiang Qun Liu and Zhenglu Yang. 2022. Hyperpelt: Unified parameter-efficient language model tuning for both language and vision-and-language tasks. arXiv:2203.03878. Retrieved from https:\/\/arxiv.org\/abs\/2203.03878","DOI":"10.18653\/v1\/2023.findings-acl.725"},{"key":"e_1_3_2_292_2","first-page":"11953","volume-title":"Proceedings of the CVPR","author":"Zhao Borui","year":"2022","unstructured":"Borui Zhao, Quan Cui, Renjie Song, Yiyu Qiu, and Jiajun Liang. 2022. Decoupled knowledge distillation. In Proceedings of the CVPR. 11953\u201311962."},{"key":"e_1_3_2_293_2","unstructured":"Cairong Zhao Yubin Wang Xinyang Jiang Yifei Shen Kaitao Song Dongsheng Li and Duoqian Miao. 2022. Learning domain invariant prompt for vision-language models. arXiv:2212.04196. Retrieved from https:\/\/arxiv.org\/abs\/2212.04196"},{"key":"e_1_3_2_294_2","first-page":"9407","volume-title":"Proceedings of the CVPR","author":"Zheng Zhaohui","year":"2022","unstructured":"Zhaohui Zheng, Rongguang Ye, Ping Wang, Dongwei Ren, Wangmeng Zuo, Qibin Hou, and Ming-Ming Cheng. 2022. Localization distillation for dense object detection. In Proceedings of the CVPR. 
9407\u20139416."},{"key":"e_1_3_2_295_2","unstructured":"Ce Zhou Qian Li Chen Li Jun Yu Yixin Liu Guangjing Wang Kai Zhang Cheng Ji Qiben Yan Lifang He et\u00a0al. 2023. A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT. arXiv:2302.09419. Retrieved from https:\/\/arxiv.org\/abs\/2302.09419"},{"key":"e_1_3_2_296_2","first-page":"16816","volume-title":"Proceedings of the CVPR","author":"Zhou Kaiyang","year":"2022","unstructured":"Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. 2022. Conditional prompt learning for vision-language models. In Proceedings of the CVPR. 16816\u201316825."},{"key":"e_1_3_2_297_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-022-01653-1"},{"key":"e_1_3_2_298_2","first-page":"10387","volume-title":"Proceedings of the CVPR","author":"Zhou Sheng","year":"2021","unstructured":"Sheng Zhou, Yucheng Wang, Defang Chen, Jiawei Chen, Xin Wang, Can Wang, and Jiajun Bu. 2021. Distilling holistic knowledge with graph neural networks. In Proceedings of the CVPR. 10387\u201310396."},{"key":"e_1_3_2_299_2","doi-asserted-by":"crossref","unstructured":"Ziqin Zhou Bowen Zhang Yinjie Lei Lingqiao Liu and Yifan Liu. 2022. ZegCLIP: Towards adapting CLIP for zero-shot semantic segmentation. arXiv:2212.03588. Retrieved from https:\/\/arxiv.org\/abs\/2212.03588","DOI":"10.1109\/CVPR52729.2023.01075"},{"key":"e_1_3_2_300_2","doi-asserted-by":"crossref","unstructured":"Beier Zhu Yulei Niu Yucheng Han Yue Wu and Hanwang Zhang. 2022. Prompt-aligned gradient for prompt tuning. arXiv:2205.14865. Retrieved from https:\/\/arxiv.org\/abs\/2205.14865","DOI":"10.1109\/ICCV51070.2023.01435"},{"key":"e_1_3_2_301_2","unstructured":"Xiangyang Zhu Renrui Zhang Bowei He Ziyao Zeng Shanghang Zhang and Peng Gao. 2022. PointCLIP V2: Adapting CLIP for powerful 3D open-world learning. arXiv:2211.11682. 
Retrieved from https:\/\/arxiv.org\/abs\/2211.11682"},{"key":"e_1_3_2_302_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2020.3004555"}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3657632","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3657632","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:05:53Z","timestamp":1750291553000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3657632"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,25]]},"references-count":301,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,12,31]]}},"alternative-id":["10.1145\/3657632"],"URL":"https:\/\/doi.org\/10.1145\/3657632","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,25]]},"assertion":[{"value":"2023-06-07","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-04-07","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-07-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}