{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,15]],"date-time":"2025-12-15T14:13:00Z","timestamp":1765807980374,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":28,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,10,17]],"date-time":"2021-10-17T00:00:00Z","timestamp":1634428800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,17]]},"DOI":"10.1145\/3474085.3481034","type":"proceedings-article","created":{"date-parts":[[2021,10,18]],"date-time":"2021-10-18T06:57:34Z","timestamp":1634540254000},"page":"2974-2978","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Dynamic Knowledge Distillation with Cross-Modality Knowledge Transfer"],"prefix":"10.1145","author":[{"given":"Guangzhi","family":"Wang","sequence":"first","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,10,17]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"crossref","unstructured":"Zeynep Akata Scott Reed Daniel Walter Honglak Lee and Bernt Schiele. 2015. Evaluation of Output Embeddings for Fine-grained Image Classification. In CVPR. 2927--2936.","DOI":"10.1109\/CVPR.2015.7298911"},{"key":"e_1_3_2_1_2_1","unstructured":"
Marcin Andrychowicz Misha Denil Sergio Gomez Matthew W Hoffman David Pfau Tom Schaul Brendan Shillingford and Nando De Freitas. 2016. Learning to Learn by Gradient Descent by Gradient Descent. In NIPS. 3981--3989."},{"volume-title":"Mind, Experience, and School: Expanded Edition","author":"National Research Council. 2000.","key":"e_1_3_2_1_3_1","unstructured":"National Research Council. 2000. How People Learn: Brain, Mind, Experience, and School: Expanded Edition. The National Academies Press."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"crossref","unstructured":"Thomas Deselaers and Vittorio Ferrari. 2011. Visual and Semantic Similarity in ImageNet. In CVPR. 1777--1784.","DOI":"10.1109\/CVPR.2011.5995474"},{"key":"e_1_3_2_1_5_1","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT. 4171--4186.","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT. 4171--4186."},{"key":"e_1_3_2_1_6_1","volume-title":"Improved Regularization of Convolutional Neural Networks with Cutout. arXiv preprint arXiv:1708.04552","author":"DeVries Terrance","year":"2017","unstructured":"Terrance DeVries and Graham W Taylor. 2017. 
Improved Regularization of Convolutional Neural Networks with Cutout. arXiv preprint arXiv:1708.04552 (2017)."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Ali Farhadi Ian Endres Derek Hoiem and David Forsyth. 2009. Describing Objects by Their Attributes. In CVPR. 1778--1785.","DOI":"10.1109\/CVPR.2009.5206772"},{"key":"e_1_3_2_1_8_1","unstructured":"Andrea Frome Greg S Corrado Jon Shlens Samy Bengio Jeff Dean Marc'Aurelio Ranzato and Tomas Mikolov. 2013. DeViSE: A Deep Visual-Semantic Embedding Model. In NIPS. 2121--2129."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2015.2408354"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Yanwei Fu and Leonid Sigal. 2016. Semi-Supervised Vocabulary-Informed Learning. In CVPR. 5337--5346.","DOI":"10.1109\/CVPR.2016.576"},{"key":"e_1_3_2_1_11_1","unstructured":"Zhenyong Fu Tao Xiang Elyor Kodirov and Shaogang Gong. 2015b. Zero-shot Object Recognition by Semantic Manifold Distance. In CVPR. 2635--2644."},{"key":"e_1_3_2_1_12_1","unstructured":"Golnaz Ghiasi Tsung-Yi Lin and Quoc V Le. 2018. DropBlock: A regularization method for convolutional networks. In NeurIPS. 10727--10737."},{"key":"e_1_3_2_1_13_1","unstructured":"Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. 
Deep Residual Learning for Image Recognition. In CVPR. 770--778."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_3_2_1_17_1","unstructured":"Jiasen Lu Dhruv Batra Devi Parikh and Stefan Lee. 2019. ViLBERT: Pretraining Task-agnostic Visiolinguistic Representations for Vision-and-Language Tasks. In NeurIPS. 13--23."},{"key":"e_1_3_2_1_18_1","unstructured":"Jiasen Lu Vedanuj Goswami Marcus Rohrbach Devi Parikh and Stefan Lee. 2020. 12-in-1: Multi-Task Vision and Language Representation Learning. In CVPR. 10437--10446."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Duy-Kien Nguyen and Takayuki Okatani. 2019. Multi-task Learning of Hierarchical Vision-Language Representation. In CVPR. 10492--10501.","DOI":"10.1109\/CVPR.2019.01074"},{"key":"e_1_3_2_1_20_1","unstructured":"Mohammad Norouzi Tomas Mikolov Samy Bengio Yoram Singer Jonathon Shlens Andrea Frome Greg S Corrado and Jeffrey Dean. 2014. Zero-shot Learning by Convex Combination of Semantic Embeddings. 
In ICLR."},{"key":"e_1_3_2_1_21_1","volume-title":"et al.","author":"Russakovsky Olga","year":"2015","unstructured":"Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. 2015. Imagenet Large Scale Visual Recognition Challenge. International journal of computer vision, Vol. 115, 3 (2015), 211--252."},{"key":"e_1_3_2_1_22_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In ICLR."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/2627435.2670313"},{"key":"e_1_3_2_1_24_1","unstructured":"Weijie Su Xizhou Zhu Yue Cao Bin Li Lewei Lu Furu Wei and Jifeng Dai. 2020. VL-BERT: Pre-training of Generic Visual-Linguistic Representations. In ICLR."},{"key":"e_1_3_2_1_25_1","volume-title":"Carl Vondrick, Kevin Murphy, and Cordelia Schmid.","author":"Sun Chen","year":"2019","unstructured":"Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, and Cordelia Schmid. 2019. VideoBERT: A Joint Model for Video and Language Representation Learning. In ICCV. 
7464--7473."},{"key":"e_1_3_2_1_26_1","volume-title":"LXMERT: Learning Cross-Modality Encoder Representations from Transformers. In EMNLP. 5103--5114.","author":"Tan Hao","year":"2019","unstructured":"Hao Tan and Mohit Bansal. 2019. LXMERT: Learning Cross-Modality Encoder Representations from Transformers. In EMNLP. 5103--5114."},{"key":"e_1_3_2_1_27_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is All You Need. In NIPS. 6000--6010."},{"key":"e_1_3_2_1_28_1","volume-title":"ICML (JMLR Workshop and Conference Proceedings","volume":"1066","author":"Wan Li","year":"2013","unstructured":"Li Wan, Matthew D. Zeiler, Sixin Zhang, Yann LeCun, and Rob Fergus. 2013. Regularization of Neural Networks using DropConnect. In ICML (JMLR Workshop and Conference Proceedings, Vol. 28). 1058--1066."},{"key":"e_1_3_2_1_29_1","unstructured":"Yulin Wang Xuran Pan Shiji Song Hong Zhang Cheng Wu and Gao Huang. 2019. Implicit Semantic Data Augmentation for Deep Networks. In NeurIPS. 
12614--12623."}],"event":{"name":"MM '21: ACM Multimedia Conference","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Virtual Event China","acronym":"MM '21"},"container-title":["Proceedings of the 29th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474085.3481034","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3474085.3481034","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:35Z","timestamp":1750191455000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474085.3481034"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,17]]},"references-count":28,"alternative-id":["10.1145\/3474085.3481034","10.1145\/3474085"],"URL":"https:\/\/doi.org\/10.1145\/3474085.3481034","relation":{},"subject":[],"published":{"date-parts":[[2021,10,17]]},"assertion":[{"value":"2021-10-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}