{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T18:48:17Z","timestamp":1755802097074,"version":"3.44.0"},"publisher-location":"New York, NY, USA","reference-count":53,"publisher":"ACM","funder":[{"DOI":"10.13039\/501100006374","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62072394"],"award-info":[{"award-number":["62072394"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,6,30]]},"DOI":"10.1145\/3731715.3733300","type":"proceedings-article","created":{"date-parts":[[2025,6,25]],"date-time":"2025-06-25T18:31:39Z","timestamp":1750876299000},"page":"1795-1803","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["DASPL: Enhancing Few-Shot Learning with Dual Adapters and a Single-Step Pseudo-Label Cycle"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-2139-6319","authenticated-orcid":false,"given":"Yanbo","family":"Zhang","sequence":"first","affiliation":[{"name":"Yanshan University, Qinhuangdao, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-3368-6582","authenticated-orcid":false,"given":"Yuhao","family":"Liu","sequence":"additional","affiliation":[{"name":"Yanshan University, Qinhuangdao, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-5867-5073","authenticated-orcid":false,"given":"Zhaoyang","family":"Liu","sequence":"additional","affiliation":[{"name":"Yanshan University, Qinhuangdao, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8123-7768","authenticated-orcid":false,"given":"Huiying","family":"Li","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-6000-9644","authenticated-orcid":false,"given":"Ruilin","family":"Chai","sequence":"additional","affiliation":[{"name":"Yanshan University, Qinhuangdao, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9532-8273","authenticated-orcid":false,"given":"Guanghua","family":"Gu","sequence":"additional","affiliation":[{"name":"Yanshan University, Qinhuangdao, China"}]}],"member":"320","published-online":{"date-parts":[[2025,6,30]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Jean-Baptiste Alayrac Jeff Donahue Pauline Luc Antoine Miech Iain Barr Yana Hasson Karel Lenc Arthur Mensch Katherine Millican et al. 2022. Flamingo: a visual language model for few-shot learning. In NeurIPS."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10599-4_29"},{"key":"e_1_3_2_1_3_1","first-page":"1877","article-title":"Language Models are Few-Shot Learners","volume":"33","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems (NeurIPS), Vol. 33. 1877--1901.","journal-title":"Advances in Neural Information Processing Systems (NeurIPS)"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"crossref","unstructured":"Mathilde Caron Hugo Touvron Ishan Misra Herv\u00e9 J\u00e9gou Julien Mairal Piotr Bojanowski and Armand Joulin. 2021. Emerging Properties in Self-Supervised Vision Transformers. In ICCV. 9650--9660.","DOI":"10.1109\/ICCV48922.2021.00951"},{"key":"e_1_3_2_1_5_1","volume-title":"PLOT: Prompt Learning with Optimal Transport for Vision-Language Models. In ICLR.","author":"Chen Guangyi","year":"2023","unstructured":"Guangyi Chen, Weiran Yao, Xiangchen Song, Xinyue Li, Yongming Rao, and Kun Zhang. 2023b. PLOT: Prompt Learning with Optimal Transport for Vision-Language Models. In ICLR."},{"key":"e_1_3_2_1_6_1","unstructured":"Xi Chen Xiao Wang Soravit Changpinyo A. J. Piergiovanni Piotr Padlewski Daniel Salz Sebastian Goodman Adam Grycner Basil Mustafa Lucas Beyer Alexander Kolesnikov et al. 2023a. PaLI: A jointly-scaled multilingual language-image model. In ICLR."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Xinlei Chen Saining Xie and Kaiming He. 2021. An empirical study of training self-supervised vision transformers. In ICCV. 9640--9649.","DOI":"10.1109\/ICCV48922.2021.00950"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.461"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_1_10_1","volume-title":"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv preprint arXiv:2010.11929","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv preprint arXiv:2010.11929 (2021)."},{"key":"e_1_3_2_1_11_1","volume-title":"Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories. 2004 conference on computer vision and pattern recognition workshop","author":"Fei-Fei Li","year":"2004","unstructured":"Li Fei-Fei, Rob Fergus, and Pietro Perona. 2004. Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories. 2004 conference on computer vision and pattern recognition workshop (2004), 178."},{"key":"e_1_3_2_1_12_1","volume-title":"CLIP-Adapter: Better vision-language models with feature adapters. IJCV","author":"Gao Peng","year":"2023","unstructured":"Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, and Yu Qiao. 2023. CLIP-Adapter: Better vision-language models with feature adapters. IJCV (2023), 1--15."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Ziyu Guo Renrui Zhang Longtian Qiu Xianzheng Ma Xupeng Miao Xuming He and Bin Cui. 2023. CALIP: zero-shot enhancement of CLIP with parameter-free attention. In AAAI. 746--754.","DOI":"10.1609\/aaai.v37i1.25152"},{"key":"e_1_3_2_1_14_1","unstructured":"Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSTARS.2019.2918242"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Dan Hendrycks Steven Basart Norman Mu Saurav Kadavath Frank Wang Evan Dorundo Rahul Desai Tyler Zhu Samyak Parajuli Mike Guo et al. 2021a. The many faces of robustness: A critical analysis of out-of-distribution generalization. In ICCV. 8340--8349.","DOI":"10.1109\/ICCV48922.2021.00823"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Dan Hendrycks Kevin Zhao Steven Basart Jacob Steinhardt and Dawn Song. 2021b. Natural adversarial examples. In CVPR. 15262--15271.","DOI":"10.1109\/CVPR46437.2021.01501"},{"key":"e_1_3_2_1_18_1","volume-title":"Unsupervised Prompt Learning for Vision-Language Models. arXiv preprint arXiv:2204.03649","author":"Huang Tony","year":"2022","unstructured":"Tony Huang, Jack Chu, and Fangyun Wei. 2022a. Unsupervised Prompt Learning for Vision-Language Models. arXiv preprint arXiv:2204.03649 (2022)."},{"key":"e_1_3_2_1_19_1","volume-title":"Unsupervised Prompt Learning for Vision-Language Models. arXiv preprint arXiv:2204.03649","author":"Huang Tony","year":"2022","unstructured":"Tony Huang, Jack Chu, and Fangyun Wei. 2022b. Unsupervised Prompt Learning for Vision-Language Models. arXiv preprint arXiv:2204.03649 (2022). https:\/\/arxiv.org\/pdf\/2204.03649v1"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Ahmet Iscen Giorgos Tolias Yannis Avrithis and Ondrej Chum. 2019. Label Propagation for Deep Semi-Supervised Learning. In CVPR. 5065--5074.","DOI":"10.1109\/CVPR.2019.00521"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"crossref","unstructured":"Zhao Jin Munawar Hayat Yuwei Yang Yulan Guo and Yinjie Lei. 2023. Context-aware alignment and mutual masking for 3d-language pre-training. In CVPR. 10984--10994.","DOI":"10.1109\/CVPR52729.2023.01057"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"crossref","unstructured":"Muhammad Uzair Khattak Hanoona Rasheed Muhammad Maaz Salman Khan and Fahad Shahbaz Khan. 2023a. MaPLe: Multi-modal Prompt Learning. In CVPR. 19113--19122.","DOI":"10.1109\/CVPR52729.2023.01832"},{"key":"e_1_3_2_1_23_1","volume-title":"Muhammad Maaz, Salman H. Khan, and Fahad Shahbaz Khan.","author":"Khattak Muhammad Uzair","year":"2023","unstructured":"Muhammad Uzair Khattak, Hanoona Abdul Rasheed, Muhammad Maaz, Salman H. Khan, and Fahad Shahbaz Khan. 2023b. Maple: Multi-modal prompt learning. In CVPR. IEEE, 19113--19122."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"crossref","unstructured":"Jonathan Krause Michael Stark Jia Deng and Li Fei-Fei. 2013. 3d object representations for fine-grained categorization. Technical Report. Stanford University.","DOI":"10.1109\/ICCVW.2013.77"},{"key":"e_1_3_2_1_25_1","volume-title":"Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks. In ICML 2013 Workshop: Challenges in Representation Learning (WREPL).","author":"Lee Dong-Hyun","year":"2013","unstructured":"Dong-Hyun Lee. 2013. Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks. In ICML 2013 Workshop: Challenges in Representation Learning (WREPL)."},{"key":"e_1_3_2_1_26_1","volume-title":"Enabling Calibration in the Zero-Shot Inference of Large Vision-Language Models. In ICLR 2023 Workshop on Pitfalls of limited data and computation for Trustworthy ML.","author":"LeVine Will","year":"2023","unstructured":"Will LeVine, Benjamin Pikus, Pranav Vishnu Raja, and Fernando Amat. 2023. Enabling Calibration in the Zero-Shot Inference of Large Vision-Language Models. In ICLR 2023 Workshop on Pitfalls of limited data and computation for Trustworthy ML."},{"key":"e_1_3_2_1_27_1","volume-title":"Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In ICCV. 10012--10022.","author":"Liu Ze","year":"2021","unstructured":"Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In ICCV. 10012--10022."},{"key":"e_1_3_2_1_28_1","volume-title":"Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151","author":"Maji Subhransu","year":"2013","unstructured":"Subhransu Maji, Esa Rahtu, Juho Kannala, Matthew Blaschko, and Andrea Vedaldi. 2013. Fine-grained visual classification of aircraft. arXiv preprint arXiv:1306.5151 (2013)."},{"key":"e_1_3_2_1_29_1","volume-title":"SLIP: Self-Supervision Meets Language-Image Pretraining","author":"Mu Norman","year":"2022","unstructured":"Norman Mu, Alexander Kirillov, David Wagner, and Saining Xie. 2022. SLIP: Self-Supervision Meets Language-Image Pretraining. In ECCV. Springer, 529--544."},{"key":"e_1_3_2_1_30_1","volume-title":"On first-order meta-learning algorithms. CoRR","author":"Nichol Alex","year":"2018","unstructured":"Alex Nichol, Joshua Achiam, and John Schulman. 2018. On first-order meta-learning algorithms. CoRR, Vol. abs\/1803.02999 (2018)."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICVGIP.2008.47"},{"key":"e_1_3_2_1_32_1","volume-title":"Pau Rodr\u00edguez L\u00f3pez, and Alexandre Lacoste","author":"Oreshkin Boris","year":"2018","unstructured":"Boris Oreshkin, Pau Rodr\u00edguez L\u00f3pez, and Alexandre Lacoste. 2018. Tadam: Task dependent adaptive metric for improved few-shot learning. In NeurIPS."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2012.6248092"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1002\/tea.3660020306"},{"volume-title":"The Theory of Stages in Cognitive Development","author":"Piaget Jean","key":"e_1_3_2_1_35_1","unstructured":"Jean Piaget. 1971. The Theory of Stages in Cognitive Development. In Measurement and Piaget, D. Green, M. P. Ford, and G. B. Flamer (Eds.). McGraw-Hill, New York, NY, 1--11."},{"key":"e_1_3_2_1_36_1","volume-title":"VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts. arXiv preprint arXiv:2112.02399","author":"Qiu Longtian","year":"2021","unstructured":"Longtian Qiu, Renrui Zhang, Ziyu Guo, Ziyao Zeng, Yafeng Li, and Guangnan Zhang. 2021. VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts. arXiv preprint arXiv:2112.02399 (2021)."},{"key":"e_1_3_2_1_37_1","volume-title":"Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al.","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In ICML. 8748--8763."},{"key":"e_1_3_2_1_38_1","volume-title":"Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research","volume":"8831","author":"Ramesh Aditya","year":"2021","unstructured":"Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. 2021. Zero-Shot Text-to-Image Generation. In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 8821--8831. https:\/\/proceedings.mlr.press\/v139\/ramesh21a.html"},{"key":"e_1_3_2_1_39_1","unstructured":"Benjamin Recht Rebecca Roelofs Ludwig Schmidt and Vaishaal Shankar. 2019. Do ImageNet classifiers generalize to ImageNet?. In ICML. PMLR 5389--5400."},{"key":"e_1_3_2_1_40_1","unstructured":"Weiwei Shi Yihong Gong C. Ding Zhiheng Ma Xiaoyu Tao and Nanning Zheng. 2018. Transductive Semi-Supervised Deep Learning Using Min-Max Features. In ECCV."},{"key":"e_1_3_2_1_41_1","volume-title":"Alexey Kurakin, and Chun-Liang Li.","author":"Sohn Kihyuk","year":"2020","unstructured":"Kihyuk Sohn, David Berthelot, Nicholas Carlini, Zizhao Zhang, Han Zhang, Colin A. Raffel, Ekin Dogus Cubuk, Alexey Kurakin, and Chun-Liang Li. 2020. FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence. In NIPS."},{"key":"e_1_3_2_1_42_1","volume-title":"Amir Roshan Zamir, and Mubarak Shah","author":"Soomro Khurram","year":"2012","unstructured":"Khurram Soomro, Amir Roshan Zamir, and Mubarak Shah. 2012. Ucf101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402 (2012)."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"crossref","unstructured":"Yuwei Tang Zhenyi Lin Qilong Wang Pengfei Zhu and Qinghua Hu. 2024. AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning. In CVPR. 23323--23333.","DOI":"10.1109\/CVPR52733.2024.02201"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"crossref","unstructured":"Vishaal Udandarao Ankush Gupta and Samuel Albanie. 2023. SuS-X: Training-free name-only transfer of vision-language models. In ICCV. 2725--2736.","DOI":"10.1109\/ICCV51070.2023.00257"},{"key":"e_1_3_2_1_45_1","volume-title":"NeurIPS","volume":"32","author":"Wang Haohan","year":"2019","unstructured":"Haohan Wang, Songwei Ge, Zachary Lipton, and Eric P Xing. 2019. Learning robust global representations by penalizing local predictive power. In NeurIPS, Vol. 32."},{"key":"e_1_3_2_1_46_1","volume-title":"Yu","author":"Wang Xudong","year":"2022","unstructured":"Xudong Wang, Zhirong Wu, Long Lian, and Stella X. Yu. 2022. Debiased Learning From Naturally Imbalanced Pseudo-Labels. In CVPR. 14647--14657."},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2010.5539970"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"crossref","unstructured":"Yuwei Yang Munawar Hayat Zhao Jin Hongyuan Zhu and Yinjie Lei. 2023. Zero-shot point cloud segmentation by semantic-visual aware synthesis. In ICCV. 11586--11596.","DOI":"10.1109\/ICCV51070.2023.01064"},{"key":"e_1_3_2_1_49_1","volume-title":"CoCa: Contrastive Captioners are Image-Text Foundation Models. Transactions of Machine Learning Research","author":"Yu Jiahui","year":"2022","unstructured":"Jiahui Yu, Zirui Wang, Vijay Vasudevan, Legg Yeung, Mojtaba Seyedhosseini, and Yonghui Wu. 2022. CoCa: Contrastive Captioners are Image-Text Foundation Models. Transactions of Machine Learning Research (2022)."},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"crossref","unstructured":"Renrui Zhang Xiangfei Hu Bohao Li Siyuan Huang Hanqiu Deng Yu Qiao Peng Gao and Hongsheng Li. 2023. Prompt generate then cache: Cascade of foundation models makes strong few-shot learners. In CVPR. 15211--15222.","DOI":"10.1109\/CVPR52729.2023.01460"},{"volume-title":"TipAdapter: Training-free adaption of CLIP for few-shot classification","author":"Zhang Renrui","key":"e_1_3_2_1_51_1","unstructured":"Renrui Zhang, Wei Zhang, Rongyao Fang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, and Hongsheng Li. 2022. TipAdapter: Training-free adaption of CLIP for few-shot classification. In ECCV. Springer, 493--510."},{"key":"e_1_3_2_1_52_1","volume-title":"Chen Change Loy, and Ziwei Liu","author":"Zhou Kaiyang","year":"2022","unstructured":"Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. 2022a. Conditional prompt learning for vision-language models. In CVPR. 16816--16825."},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-022-01653-1"}],"event":{"name":"ICMR '25: International Conference on Multimedia Retrieval","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Chicago IL USA","acronym":"ICMR '25"},"container-title":["Proceedings of the 2025 International Conference on Multimedia Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3731715.3733300","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T04:15:12Z","timestamp":1755749712000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3731715.3733300"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,30]]},"references-count":53,"alternative-id":["10.1145\/3731715.3733300","10.1145\/3731715"],"URL":"https:\/\/doi.org\/10.1145\/3731715.3733300","relation":{},"subject":[],"published":{"date-parts":[[2025,6,30]]},"assertion":[{"value":"2025-06-30","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}