{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,4]],"date-time":"2025-10-04T08:16:25Z","timestamp":1759565785781,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":33,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,10,18]],"date-time":"2021-10-18T00:00:00Z","timestamp":1634515200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Bavarian Ministry of Economic Affairs, Regional Development and Energy","award":["IUK-1902-003\/\/ IUK625\/002."],"award-info":[{"award-number":["IUK-1902-003\/\/ IUK625\/002."]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,18]]},"DOI":"10.1145\/3462244.3479904","type":"proceedings-article","created":{"date-parts":[[2021,10,15]],"date-time":"2021-10-15T15:01:58Z","timestamp":1634310118000},"page":"34-42","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["A Contrastive Learning Approach for Compositional Zero-Shot Learning"],"prefix":"10.1145","author":[{"given":"Muhammad Umer","family":"Anwaar","sequence":"first","affiliation":[{"name":"Technical University of Munich, Germany"}]},{"given":"Rayyan Ahmad","family":"Khan","sequence":"additional","affiliation":[{"name":"Technical University of Munich, Germany"}]},{"given":"Zhihui","family":"Pan","sequence":"additional","affiliation":[{"name":"Technical University of Munich, Germany"}]},{"given":"Martin","family":"Kleinsteuber","sequence":"additional","affiliation":[{"name":"Mercateo AG, Germany and Technical University of Munich, 
Germany"}]}],"member":"320","published-online":{"date-parts":[[2021,10,18]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV48630.2021.00118"},{"key":"e_1_3_2_1_2_1","unstructured":"Wei-Lun Chao Soravit Changpinyo Boqing Gong and Fei Sha. 2017. An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild. arxiv:1605.04253\u00a0[cs.CV]"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.33"},{"key":"e_1_3_2_1_4_1","unstructured":"Ting Chen Simon Kornblith Mohammad Norouzi and Geoffrey Hinton. 2020. A Simple Framework for Contrastive Learning of Visual Representations. arxiv:2002.05709\u00a0[cs.LG]"},{"key":"e_1_3_2_1_5_1","volume-title":"Imagenet: A large-scale hierarchical image database. In CVPR.","author":"Deng Jia","year":"2009","unstructured":"Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In CVPR."},
{"key":"e_1_3_2_1_6_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 4171\u20134186."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206772"},{"key":"e_1_3_2_1_8_1","unstructured":"Xiaoxiao Guo Hui Wu Yu Cheng Steven Rennie Gerald Tesauro and Rogerio Feris. 2018. Dialog-based interactive image retrieval. In Advances in Neural Information Processing Systems. 678\u2013688."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.163"},{"key":"e_1_3_2_1_10_1","unstructured":"Kaiming He Haoqi Fan Yuxin Wu Saining Xie and Ross Girshick. 2020. Momentum Contrast for Unsupervised Visual Representation Learning. arxiv:1911.05722\u00a0[cs.CV]"},
{"key":"e_1_3_2_1_11_1","unstructured":"Alexander Hermans Lucas Beyer and Bastian Leibe. 2017. In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737 (2017)."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Phillip Isola Joseph\u00a0J Lim and Edward\u00a0H Adelson. 2015. Discovering states and transformations in image collections. In CVPR.","DOI":"10.1109\/CVPR.2015.7298744"},{"key":"e_1_3_2_1_13_1","volume-title":"Adam: A method for stochastic optimization. ICLR","author":"Kingma P","year":"2015","unstructured":"Diederik\u00a0P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. ICLR (2015)."},{"key":"e_1_3_2_1_14_1","volume-title":"Proceedings of the 34th International Conference on Machine Learning","author":"Koushik Jayanth","year":"2017","unstructured":"Jayanth Koushik, Hiroaki Hayashi, and Devendra\u00a0Singh Sachan. 2017. Compositional Reasoning for Visual Question Answering. In Proceedings of the 34th International Conference on Machine Learning, 2017."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-88693-8_25"},{"key":"e_1_3_2_1_16_1","unstructured":"Yong-Lu Li Yue Xu Xiaohan Mao and Cewu Lu. 2020. Symmetry and Group in Attribute-Object Compositions. In CVPR."},
{"key":"e_1_3_2_1_17_1","volume-title":"Deep Learning Face Attributes in the Wild. In 2015 IEEE International Conference on Computer Vision (ICCV). 3730\u20133738","author":"Liu Ziwei","year":"2015","unstructured":"Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. 2015. Deep Learning Face Attributes in the Wild. In 2015 IEEE International Conference on Computer Vision (ICCV). 3730\u20133738. https:\/\/doi.org\/10.1109\/ICCV.2015.425"},{"key":"e_1_3_2_1_18_1","unstructured":"Tomas Mikolov Ilya Sutskever Kai Chen Greg Corrado and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In NeurIPS."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Ishan Misra Abhinav Gupta and Martial Hebert. 2017. From red wine to red tomato: Composition with context. In CVPR.","DOI":"10.1109\/CVPR.2017.129"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Tushar Nagarajan and Kristen Grauman. 2018. Attributes as operators: factorizing unseen attribute-object compositions. In ECCV.","DOI":"10.1007\/978-3-030-01246-5_11"},
{"key":"e_1_3_2_1_21_1","doi-asserted-by":"crossref","unstructured":"Shah Nawaz Kamran Janjua Ignazio Gallo Arif Mahmood Alessandro Calefati and Faisal Shafait. 2019. Do Cross Modal Systems Leverage Semantic Relationships? arxiv:1909.01976\u00a0[cs.CV]","DOI":"10.1109\/ICCVW.2019.00551"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"crossref","unstructured":"Senthil Purushwalkam Maximilian Nickel Abhinav Gupta and Marc\u2019Aurelio Ranzato. 2019. Task-driven modular networks for zero-shot compositional learning. In ICCV.","DOI":"10.1109\/ICCV.2019.00369"},{"key":"e_1_3_2_1_24_1","volume-title":"Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries. In Advances in Neural Information Processing Systems. 2647\u20132657.","author":"Tan Fuwen","year":"2019","unstructured":"Fuwen Tan, Paola Cascante-Bonilla, Xiaoxiao Guo, Hui Wu, Song Feng, and Vicente Ordonez. 2019. Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries. In Advances in Neural Information Processing Systems. 2647\u20132657."},
{"key":"e_1_3_2_1_25_1","unstructured":"Aaron van\u00a0den Oord Yazhe Li and Oriol Vinyals. 2019. Representation Learning with Contrastive Predictive Coding. arxiv:1807.03748\u00a0[cs.LG]"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00660"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_30"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.541"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Xin Wang Fisher Yu Ruth Wang Trevor Darrell and Joseph\u00a0E. Gonzalez. 2019. TAFE-Net: Task-Aware Feature Embeddings for Low Shot Learning. arxiv:1904.05967\u00a0[cs.CV]","DOI":"10.1109\/CVPR.2019.00193"},{"key":"e_1_3_2_1_30_1","unstructured":"Han Xiao. 2018. bert-as-service. https:\/\/github.com\/hanxiao\/bert-as-service."},{"key":"e_1_3_2_1_31_1","unstructured":"Aron Yu and Kristen Grauman. 2017. Semantic jitter: Dense supervision for visual comparisons via synthetic images. In CVPR."},
{"key":"e_1_3_2_1_32_1","unstructured":"Yuhao Zhang Hang Jiang Yasuhide Miura Christopher\u00a0D. Manning and Curtis\u00a0P. Langlotz. 2020. Contrastive Learning of Medical Visual Representations from Paired Images and Text. arxiv:2010.00747\u00a0[cs.CV]"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.652"}],"event":{"name":"ICMI '21: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION","sponsor":["SIGCHI ACM Special Interest Group on Computer-Human Interaction"],"location":"Montr\u00e9al QC Canada","acronym":"ICMI '21"},"container-title":["Proceedings of the 2021 International Conference on Multimodal Interaction"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3462244.3479904","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3462244.3479904","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:54Z","timestamp":1750193334000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3462244.3479904"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,18]]},"references-count":33,"alternative-id":["10.1145\/3462244.3479904","10.1145\/3462244"],"URL":"https:\/\/doi.org\/10.1145\/3462244.3479904","relation":{},"subject":[],"published":{"date-parts":[[2021,10,18]]},"assertion":[{"value":"2021-10-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}