{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,23]],"date-time":"2025-08-23T05:25:11Z","timestamp":1755926711353,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":46,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T00:00:00Z","timestamp":1665360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Youth Innovation Promotion Association Chinese Academy of Sciences","award":["Y2021122"],"award-info":[{"award-number":["Y2021122"]}]},{"DOI":"10.13039\/501100002858","name":"China Postdoctoral Science Foundation","doi-asserted-by":"publisher","award":["2021M703081"],"award-info":[{"award-number":["2021M703081"]}],"id":[{"id":"10.13039\/501100002858","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Nature Science Foundation of China","award":["62121002, 62022076, U1936210"],"award-info":[{"award-number":["62121002, 62022076, U1936210"]}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","award":["WK3480000011, WK2100000026"],"award-info":[{"award-number":["WK3480000011, WK2100000026"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,10]]},"DOI":"10.1145\/3503161.3547945","type":"proceedings-article","created":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T15:42:46Z","timestamp":1665416566000},"page":"4185-4193","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Proxy Probing Decoder for Weakly Supervised Object Localization: A Baseline Investigation"],"prefix":"10.1145","author":[{"given":"Jingyuan","family":"Xu","sequence":"first","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}]},{"given":"Hongtao","family":"Xie","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}]},{"given":"Chuanbin","family":"Liu","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}]},{"given":"Yongdong","family":"Zhang","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58604-1_37"},{"key":"e_1_3_2_2_2_1","unstructured":"Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared D Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell etal 2020. Language models are few-shot learners. Advances in neural information processing systems Vol. 33 (2020) 1877--1901.  Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared D Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell et al. 2020. Language models are few-shot learners. Advances in neural information processing systems Vol. 33 (2020) 1877--1901."},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2015.2463223"},{"key":"e_1_3_2_2_4_1","volume-title":"Proceedings, Part I (Lecture Notes in Computer Science), Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.)","volume":"12346","author":"Carion Nicolas","year":"2020","unstructured":"Nicolas Carion , Francisco Massa , Gabriel Synnaeve , Nicolas Usunier , Alexander Kirillov , and Sergey Zagoruyko . 2020 . End-to-End Object Detection with Transformers. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23--28, 2020 , Proceedings, Part I (Lecture Notes in Computer Science), Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.) , Vol. 12346 . Springer, 213--229. https:\/\/doi.org\/10.1007\/978--3-030--58452--8_13 Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. 2020. End-to-End Object Detection with Transformers. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part I (Lecture Notes in Computer Science), Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.), Vol. 12346. Springer, 213--229. https:\/\/doi.org\/10.1007\/978--3-030--58452--8_13"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00951"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.2999099"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00232"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2535231"},{"key":"e_1_3_2_2_9_1","volume-title":"9th International Conference on Learning Representations, ICLR 2021","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy , Lucas Beyer , Alexander Kolesnikov , Dirk Weissenborn , Xiaohua Zhai , Thomas Unterthiner , Mostafa Dehghani , Matthias Minderer , Georg Heigold , Sylvain Gelly , Jakob Uszkoreit , and Neil Houlsby . 2021 . An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale . In 9th International Conference on Learning Representations, ICLR 2021 , Virtual Event, Austria, May 3--7 , 2021. OpenReview.net. https:\/\/openreview.net\/forum?id=YicbFdNTTy Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3--7, 2021. OpenReview.net. https:\/\/openreview.net\/forum?id=YicbFdNTTy"},{"key":"e_1_3_2_2_10_1","volume-title":"TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization. arXiv preprint arXiv:2103.14862","author":"Gao Wei","year":"2021","unstructured":"Wei Gao , Fang Wan , Xingjia Pan , Zhiliang Peng , Qi Tian , Zhenjun Han , Bolei Zhou , and Qixiang Ye. 2021. TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization. arXiv preprint arXiv:2103.14862 ( 2021 ). Wei Gao, Fang Wan, Xingjia Pan, Zhiliang Peng, Qi Tian, Zhenjun Han, Bolei Zhou, and Qixiang Ye. 2021. TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization. arXiv preprint arXiv:2103.14862 (2021)."},{"key":"e_1_3_2_2_11_1","volume-title":"Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, et al.","author":"Grill Jean-Bastien","year":"2020","unstructured":"Jean-Bastien Grill , Florian Strub , Florent Altch\u00e9 , Corentin Tallec , Pierre H Richemond , Elena Buchatskaya , Carl Doersch , Bernardo Avila Pires , Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, et al. 2020 . Bootstrap your own latent: A new approach to self-supervised learning. arXiv preprint arXiv:2006.07733 (2020). Jean-Bastien Grill, Florian Strub, Florent Altch\u00e9, Corentin Tallec, Pierre H Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, et al. 2020. Bootstrap your own latent: A new approach to self-supervised learning. arXiv preprint arXiv:2006.07733 (2020)."},{"key":"e_1_3_2_2_12_1","volume-title":"Strengthen Learning Tolerance for Weakly Supervised Object Localization. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021","author":"Guo Guangyu","year":"2021","unstructured":"Guangyu Guo , Junwei Han , Fang Wan , and Dingwen Zhang . 2021 . Strengthen Learning Tolerance for Weakly Supervised Object Localization. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021 , virtual, June 19 --25 , 2021. Computer Vision Foundation \/ IEEE, 7403--7412. https:\/\/openaccess.thecvf.com\/content\/CVPR2021\/html\/Guo_Strengthen_Learning_Tolerance_for_Weakly_Supervised_Object_Localization_CVPR_2021_paper.html Guangyu Guo, Junwei Han, Fang Wan, and Dingwen Zhang. 2021. Strengthen Learning Tolerance for Weakly Supervised Object Localization. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19--25, 2021. Computer Vision Foundation \/ IEEE, 7403--7412. https:\/\/openaccess.thecvf.com\/content\/CVPR2021\/html\/Guo_Strengthen_Learning_Tolerance_for_Weakly_Supervised_Object_Localization_CVPR_2021_paper.html"},{"key":"e_1_3_2_2_13_1","volume-title":"Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey E Hinton . 2012. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems , Vol. 25 ( 2012 ). Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, Vol. 25 (2012)."},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475409"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i2.20025"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.3043084"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2954747"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMI.2020.3046672"},{"key":"e_1_3_2_2_19_1","volume-title":"2021 e. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv preprint arXiv:2107.13586","author":"Liu Pengfei","year":"2021","unstructured":"Pengfei Liu , Weizhe Yuan , Jinlan Fu , Zhengbao Jiang , Hiroaki Hayashi , and Graham Neubig . 2021 e. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv preprint arXiv:2107.13586 ( 2021 ). Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. 2021 e. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv preprint arXiv:2107.13586 (2021)."},{"key":"e_1_3_2_2_20_1","volume-title":"Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation. CoRR","author":"Liu Zhenguang","year":"2022","unstructured":"Zhenguang Liu , Runyang Feng , Haoming Chen , Shuang Wu , Yixing Gao , Yunjun Gao , and Xiang Wang . 2022. Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation. CoRR , Vol. abs\/ 2203 .15227 ( 2022 ). Zhenguang Liu, Runyang Feng, Haoming Chen, Shuang Wu, Yixing Gao, Yunjun Gao, and Xiang Wang. 2022. Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation. CoRR, Vol. abs\/2203.15227 (2022)."},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"crossref","unstructured":"Ze Liu Han Hu Yutong Lin Zhuliang Yao Zhenda Xie Yixuan Wei Jia Ning Yue Cao Zheng Zhang Li Dong etal 2021a. Swin Transformer V2: Scaling Up Capacity and Resolution. arXiv preprint arXiv:2111.09883 (2021).  Ze Liu Han Hu Yutong Lin Zhuliang Yao Zhenda Xie Yixuan Wei Jia Ning Yue Cao Zheng Zhang Li Dong et al. 2021a. Swin Transformer V2: Scaling Up Capacity and Resolution. arXiv preprint arXiv:2111.09883 (2021).","DOI":"10.1109\/CVPR52688.2022.01170"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"e_1_3_2_2_23_1","volume-title":"Proceedings, Part XXVI 16","author":"Lu Weizeng","year":"2020","unstructured":"Weizeng Lu , Xi Jia , Weicheng Xie , Linlin Shen , Yicong Zhou , and Jinming Duan . 2020 . Geometry constrained weakly supervised object localization. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020 , Proceedings, Part XXVI 16 . Springer, 481--496. https:\/\/doi.org\/10.1007\/978--3-030--58574--7_29 Weizeng Lu, Xi Jia, Weicheng Xie, Linlin Shen, Yicong Zhou, and Jinming Duan. 2020. Geometry constrained weakly supervised object localization. In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XXVI 16. Springer, 481--496. https:\/\/doi.org\/10.1007\/978--3-030--58574--7_29"},{"key":"e_1_3_2_2_24_1","volume-title":"Erasing Integrated Learning: A Simple Yet Effective Approach for Weakly Supervised Object Localization. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020","author":"Mai Jinjie","year":"2020","unstructured":"Jinjie Mai , Meng Yang , and Wenfeng Luo . 2020 . Erasing Integrated Learning: A Simple Yet Effective Approach for Weakly Supervised Object Localization. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 , Seattle, WA, USA, June 13--19 , 2020. Computer Vision Foundation \/ IEEE, 8763--8772. https:\/\/doi.org\/10.1109\/CVPR42600.2020.00879 Jinjie Mai, Meng Yang, and Wenfeng Luo. 2020. Erasing Integrated Learning: A Simple Yet Effective Approach for Weakly Supervised Object Localization. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13--19, 2020. Computer Vision Foundation \/ IEEE, 8763--8772. https:\/\/doi.org\/10.1109\/CVPR42600.2020.00879"},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00818"},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00337"},{"key":"e_1_3_2_2_27_1","volume-title":"Unveiling the Potential of Structure Preserving for Weakly Supervised Object Localization. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021","author":"Pan Xingjia","year":"2021","unstructured":"Xingjia Pan , Yingguo Gao , Zhiwen Lin , Fan Tang , Weiming Dong , Haolei Yuan , Feiyue Huang , and Changsheng Xu . 2021 . Unveiling the Potential of Structure Preserving for Weakly Supervised Object Localization. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021 , virtual, June 19 --25 , 2021. Computer Vision Foundation \/ IEEE, 11642--11651. https:\/\/openaccess.thecvf.com\/content\/CVPR2021\/html\/Pan_Unveiling_the_Potential_of_Structure_Preserving_for_Weakly_Supervised_Object_CVPR_2021_paper.html Xingjia Pan, Yingguo Gao, Zhiwen Lin, Fan Tang, Weiming Dong, Haolei Yuan, Feiyue Huang, and Changsheng Xu. 2021. Unveiling the Potential of Structure Preserving for Weakly Supervised Object Localization. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19--25, 2021. Computer Vision Foundation \/ IEEE, 11642--11651. https:\/\/openaccess.thecvf.com\/content\/CVPR2021\/html\/Pan_Unveiling_the_Potential_of_Structure_Preserving_for_Weakly_Supervised_Object_CVPR_2021_paper.html"},{"key":"e_1_3_2_2_28_1","volume-title":"Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021","author":"Raghu Maithra","year":"2021","unstructured":"Maithra Raghu , Thomas Unterthiner , Simon Kornblith , Chiyuan Zhang , and Alexey Dosovitskiy . 2021 . Do Vision Transformers See Like Convolutional Neural Networks? . In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021 , NeurIPS 2021, December 6--14, 2021, virtual, Marc'Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan (Eds.). 12116--12128. https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/652cf38361a209088302ba2b8b7f51e0-Abstract.html Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, and Alexey Dosovitskiy. 2021. Do Vision Transformers See Like Convolutional Neural Networks?. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6--14, 2021, virtual, Marc'Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan (Eds.). 12116--12128. https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/652cf38361a209088302ba2b8b7f51e0-Abstract.html"},{"key":"e_1_3_2_2_29_1","volume-title":"An Overview of Multi-Task Learning in Deep Neural Networks. CoRR","author":"Ruder Sebastian","year":"2017","unstructured":"Sebastian Ruder . 2017. An Overview of Multi-Task Learning in Deep Neural Networks. CoRR , Vol. abs\/ 1706 .05098 ( 2017 ). showeprint[arXiv]1706.05098 http:\/\/arxiv.org\/abs\/1706.05098 Sebastian Ruder. 2017. An Overview of Multi-Task Learning in Deep Neural Networks. CoRR, Vol. abs\/1706.05098 (2017). showeprint[arXiv]1706.05098 http:\/\/arxiv.org\/abs\/1706.05098"},{"key":"e_1_3_2_2_30_1","volume-title":"32nd British Machine Vision Conference 2021, BMVC 2021, Online, November 22--25","author":"Oriane Sim\u00e9","year":"2021","unstructured":"Oriane Sim\u00e9 oni, Gilles Puy , Huy V. Vo , Simon Roburin , Spyros Gidaris , Andrei Bursuc , Patrick P\u00e9 rez, Renaud Marlet , and Jean Ponce . 2021 . Localizing Objects with Self-supervised Transformers and no Labels . In 32nd British Machine Vision Conference 2021, BMVC 2021, Online, November 22--25 , 2021. BMVA Press, 310. https:\/\/www.bmvc 2021-virtualconference.com\/assets\/papers\/1339.pdf Oriane Sim\u00e9 oni, Gilles Puy, Huy V. Vo, Simon Roburin, Spyros Gidaris, Andrei Bursuc, Patrick P\u00e9 rez, Renaud Marlet, and Jean Ponce. 2021. Localizing Objects with Self-supervised Transformers and no Labels. In 32nd British Machine Vision Conference 2021, BMVC 2021, Online, November 22--25, 2021. BMVA Press, 310. https:\/\/www.bmvc2021-virtualconference.com\/assets\/papers\/1339.pdf"},{"key":"e_1_3_2_2_31_1","volume-title":"Dual-Gradients Localization Framework for Weakly Supervised Object Localization. In MM '20: The 28th ACM International Conference on Multimedia, Virtual Event \/ Seattle, WA, USA, October 12--16","author":"Tan Chuangchuang","year":"2020","unstructured":"Chuangchuang Tan , Guanghua Gu , Tao Ruan , Shikui Wei , and Yao Zhao . 2020 . Dual-Gradients Localization Framework for Weakly Supervised Object Localization. In MM '20: The 28th ACM International Conference on Multimedia, Virtual Event \/ Seattle, WA, USA, October 12--16 , 2020, Chang Wen Chen, Rita Cucchiara, Xian-Sheng Hua, Guo-Jun Qi, Elisa Ricci, Zhengyou Zhang, and Roger Zimmermann (Eds.). ACM , 1976--1984. https:\/\/doi.org\/10.1145\/3394171.3413622 Chuangchuang Tan, Guanghua Gu, Tao Ruan, Shikui Wei, and Yao Zhao. 2020. Dual-Gradients Localization Framework for Weakly Supervised Object Localization. In MM '20: The 28th ACM International Conference on Multimedia, Virtual Event \/ Seattle, WA, USA, October 12--16, 2020, Chang Wen Chen, Rita Cucchiara, Xian-Sheng Hua, Guo-Jun Qi, Elisa Ricci, Zhengyou Zhang, and Roger Zimmermann (Eds.). ACM, 1976--1984. https:\/\/doi.org\/10.1145\/3394171.3413622"},{"key":"e_1_3_2_2_32_1","volume-title":"International Conference on Machine Learning. PMLR, 10347--10357","author":"Touvron Hugo","year":"2021","unstructured":"Hugo Touvron , Matthieu Cord , Matthijs Douze , Francisco Massa , Alexandre Sablayrolles , and Herv\u00e9 J\u00e9gou . 2021 . Training data-efficient image transformers & distillation through attention . In International Conference on Machine Learning. PMLR, 10347--10357 . http:\/\/proceedings.mlr.press\/v139\/touvron21a.html Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Herv\u00e9 J\u00e9gou. 2021. Training data-efficient image transformers & distillation through attention. In International Conference on Machine Learning. PMLR, 10347--10357. http:\/\/proceedings.mlr.press\/v139\/touvron21a.html"},{"key":"e_1_3_2_2_33_1","volume-title":"Advances in Neural Information Processing Systems","volume":"34","author":"Tsimpoukelli Maria","year":"2021","unstructured":"Maria Tsimpoukelli , Jacob Menick , Serkan Cabi , SM Eslami , Oriol Vinyals , and Felix Hill . 2021 . Multimodal few-shot learning with frozen language models . Advances in Neural Information Processing Systems , Vol. 34 (2021). https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/01b7575c38dac42f3cfb7d500438b875-Abstract.html Maria Tsimpoukelli, Jacob Menick, Serkan Cabi, SM Eslami, Oriol Vinyals, and Felix Hill. 2021. Multimodal few-shot learning with frozen language models. Advances in Neural Information Processing Systems, Vol. 34 (2021). https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/01b7575c38dac42f3cfb7d500438b875-Abstract.html"},{"key":"e_1_3_2_2_34_1","volume-title":"Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan N. Gomez , Lukasz Kaiser , and Illia Polosukhin . 2017 . Attention is All you Need . In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017 , December 4 --9 , 2017, Long Beach, CA, USA, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.). 5998--6008. https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4--9, 2017, Long Beach, CA, USA, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.). 5998--6008. https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html"},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01231-1_29"},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00020"},{"key":"e_1_3_2_2_37_1","volume-title":"Multi-task hourglass network for online automatic diagnosis of developmental dysplasia of the hip. World Wide Web","author":"Xu Jingyuan","year":"2022","unstructured":"Jingyuan Xu , Hongtao Xie , Qingfeng Tan , Hai Wu , Chuanbin Liu , Sicheng Zhang , Zhendong Mao , and Yongdong Zhang . 2022. Multi-task hourglass network for online automatic diagnosis of developmental dysplasia of the hip. World Wide Web ( 2022 ), 1--21. https:\/\/doi.org\/10.1007\/s11280-022-01051-0 Jingyuan Xu, Hongtao Xie, Qingfeng Tan, Hai Wu, Chuanbin Liu, Sicheng Zhang, Zhendong Mao, and Yongdong Zhang. 2022. Multi-task hourglass network for online automatic diagnosis of developmental dysplasia of the hip. World Wide Web (2022), 1--21. https:\/\/doi.org\/10.1007\/s11280-022-01051-0"},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00669"},{"key":"e_1_3_2_2_39_1","volume-title":"Rethinking the Route Towards Weakly Supervised Object Localization. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020","author":"Zhang Chen-Lin","year":"2020","unstructured":"Chen-Lin Zhang , Yun-Hao Cao , and Jianxin Wu . 2020 a. Rethinking the Route Towards Weakly Supervised Object Localization. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 , Seattle, WA, USA, June 13--19 , 2020. Computer Vision Foundation \/ IEEE, 13457--13466. https:\/\/doi.org\/10.1109\/CVPR42600.2020.01347 Chen-Lin Zhang, Yun-Hao Cao, and Jianxin Wu. 2020a. Rethinking the Route Towards Weakly Supervised Object Localization. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13--19, 2020. Computer Vision Foundation \/ IEEE, 13457--13466. https:\/\/doi.org\/10.1109\/CVPR42600.2020.01347"},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3074313"},{"key":"e_1_3_2_2_41_1","volume-title":"Leonidas J. Guibas, and Jitendra Malik.","author":"Zhang Jeffrey O.","year":"2020","unstructured":"Jeffrey O. Zhang , Alexander Sax , Amir Roshan Zamir , Leonidas J. Guibas, and Jitendra Malik. 2020 b. Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part III (Lecture Notes in Computer Science), Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.), Vol. 12348 . Springer , 698--714. https:\/\/doi.org\/10.1007\/978--3-030--58580--8_41 Jeffrey O. Zhang, Alexander Sax, Amir Roshan Zamir, Leonidas J. Guibas, and Jitendra Malik. 2020b. Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part III (Lecture Notes in Computer Science), Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.), Vol. 12348. Springer, 698--714. https:\/\/doi.org\/10.1007\/978--3-030--58580--8_41"},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00144"},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01258-8_37"},{"key":"e_1_3_2_2_44_1","volume-title":"Proceedings, Part XIX (Lecture Notes in Computer Science), Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.)","volume":"12364","author":"Zhang Xiaolin","year":"2020","unstructured":"Xiaolin Zhang , Yunchao Wei , and Yi Yang . 2020 c. Inter-Image Communication for Weakly Supervised Localization. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23--28, 2020 , Proceedings, Part XIX (Lecture Notes in Computer Science), Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.) , Vol. 12364 . Springer, 271--287. https:\/\/doi.org\/10.1007\/978--3-030--58529--7_17 Xiaolin Zhang, Yunchao Wei, and Yi Yang. 2020c. Inter-Image Communication for Weakly Supervised Localization. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XIX (Lecture Notes in Computer Science), Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm (Eds.), Vol. 12364. Springer, 271--287. https:\/\/doi.org\/10.1007\/978--3-030--58529--7_17"},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"crossref","unstructured":"Xiawu Zheng Rongrong Ji Xiaoshuai Sun Yongjian Wu Feiyue Huang and Yanhua Yang. 2018. Centralized Ranking Loss with Weakly Supervised Localization for Fine-Grained Object Retrieval. In IJCAI. 1226--1233. https:\/\/doi.org\/10.24963\/ijcai.2018\/171  Xiawu Zheng Rongrong Ji Xiaoshuai Sun Yongjian Wu Feiyue Huang and Yanhua Yang. 2018. Centralized Ranking Loss with Weakly Supervised Localization for Fine-Grained Object Retrieval. In IJCAI. 1226--1233. https:\/\/doi.org\/10.24963\/ijcai.2018\/171","DOI":"10.24963\/ijcai.2018\/171"},{"key":"e_1_3_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.319"}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Lisboa Portugal","acronym":"MM '22"},"container-title":["Proceedings of the 30th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3547945","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3503161.3547945","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:00:31Z","timestamp":1750186831000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3547945"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":46,"alternative-id":["10.1145\/3503161.3547945","10.1145\/3503161"],"URL":"https:\/\/doi.org\/10.1145\/3503161.3547945","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}