{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T01:33:39Z","timestamp":1777599219589,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":73,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T00:00:00Z","timestamp":1665360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Fundamental Research Funds for the Central Universities"},{"name":"National Natural Science Foundation of China","award":["62022083U21B203861931008"],"award-info":[{"award-number":["62022083U21B203861931008"]}]},{"name":"National Key R&D Program of China under Grant","award":["Grant 2018AAA0102000"],"award-info":[{"award-number":["Grant 2018AAA0102000"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,10]]},"DOI":"10.1145\/3503161.3547814","type":"proceedings-article","created":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T15:42:35Z","timestamp":1665416555000},"page":"4355-4364","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Synthesizing Counterfactual Samples for Effective Image-Text Matching"],"prefix":"10.1145","author":[{"given":"Hao","family":"Wei","sequence":"first","affiliation":[{"name":"Key Lab of Intell. Info. Process., Inst. of Comput. Tech., CAS & University of Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shuhui","family":"Wang","sequence":"additional","affiliation":[{"name":"Key Lab of Intell. Info. Process., Inst. of Comput. 
Tech., CAS & Peng Cheng Laboratory, Beijing;Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xinzhe","family":"Han","sequence":"additional","affiliation":[{"name":"Key Lab of Intell. Info. Process., Inst. of Comput. Tech., CAS & University of Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhe","family":"Xue","sequence":"additional","affiliation":[{"name":"Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, BUPT, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bin","family":"Ma","sequence":"additional","affiliation":[{"name":"Meituan Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaoming","family":"Wei","sequence":"additional","affiliation":[{"name":"Meituan Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaolin","family":"Wei","sequence":"additional","affiliation":[{"name":"Meituan Inc., Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 6077--6086","author":"Anderson Peter","year":"2018","unstructured":"Peter Anderson , Xiaodong He , Chris Buehler , Damien Teney , Mark Johnson , Stephen Gould , and Lei Zhang . 2018 . Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 6077--6086 . Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. 2018. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 
6077--6086."},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2016.7477688"},{"key":"e_1_3_2_2_3_1","volume-title":"8th International Conference on Learning Representations, ICLR 2020","author":"Besserve Michel","year":"2020","unstructured":"Michel Besserve , Arash Mehrjou , R\u00e9my Sun , and Bernhard Sch\u00f6lkopf. 2020 a. Counterfactuals uncover the modular structure of deep generative models . In 8th International Conference on Learning Representations, ICLR 2020 , Addis Ababa, Ethiopia, April 26--30 , 2020. OpenReview.net. Michel Besserve, Arash Mehrjou, R\u00e9my Sun, and Bernhard Sch\u00f6lkopf. 2020a. Counterfactuals uncover the modular structure of deep generative models. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26--30, 2020. OpenReview.net."},{"key":"e_1_3_2_2_4_1","volume-title":"8th International Conference on Learning Representations, ICLR 2020","author":"Besserve Michel","year":"2020","unstructured":"Michel Besserve , Arash Mehrjou , R\u00e9my Sun , and Bernhard Sch\u00f6lkopf. 2020 b. Counterfactuals uncover the modular structure of deep generative models . In 8th International Conference on Learning Representations, ICLR 2020 , Addis Ababa, Ethiopia, April 26--30 , 2020. OpenReview.net. Michel Besserve, Arash Mehrjou, R\u00e9my Sun, and Bernhard Sch\u00f6lkopf. 2020b. Counterfactuals uncover the modular structure of deep generative models. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26--30, 2020. OpenReview.net."},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/1756006.1756042"},{"key":"e_1_3_2_2_6_1","volume-title":"IMRAM: Iterative Matching With Recurrent Attention Memory for Cross-Modal Image-Text Retrieval. In Conference on Computer Vision and Pattern Recognition, CVPR. 
12652--12660","author":"Chen Hui","year":"2020","unstructured":"Hui Chen , Guiguang Ding , Xudong Liu , Zijia Lin , Ji Liu , and Jungong Han . 2020 b. IMRAM: Iterative Matching With Recurrent Attention Memory for Cross-Modal Image-Text Retrieval. In Conference on Computer Vision and Pattern Recognition, CVPR. 12652--12660 . Hui Chen, Guiguang Ding, Xudong Liu, Zijia Lin, Ji Liu, and Jungong Han. 2020b. IMRAM: Iterative Matching With Recurrent Attention Memory for Cross-Modal Image-Text Retrieval. In Conference on Computer Vision and Pattern Recognition, CVPR. 12652--12660."},{"key":"e_1_3_2_2_7_1","volume-title":"Learning the Best Pooling Strategy for Visual Semantic Embedding. In Conference on Computer Vision and Pattern Recognition CVPR. 15789--15798","author":"Chen Jiacheng","year":"2021","unstructured":"Jiacheng Chen , Hexiang Hu , Hao Wu , Yuning Jiang , and Changhu Wang . 2021 . Learning the Best Pooling Strategy for Visual Semantic Embedding. In Conference on Computer Vision and Pattern Recognition CVPR. 15789--15798 . Jiacheng Chen, Hexiang Hu, Hao Wu, Yuning Jiang, and Changhu Wang. 2021. Learning the Best Pooling Strategy for Visual Semantic Embedding. In Conference on Computer Vision and Pattern Recognition CVPR. 15789--15798."},{"key":"e_1_3_2_2_8_1","volume-title":"Counterfactual Samples Synthesizing for Robust Visual Question Answering. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020","author":"Chen Long","year":"2020","unstructured":"Long Chen , Xin Yan , Jun Xiao , Hanwang Zhang , Shiliang Pu , and Yueting Zhuang . 2020 c. Counterfactual Samples Synthesizing for Robust Visual Question Answering. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 , Seattle, WA, USA, June 13--19 , 2020. Computer Vision Foundation \/ IEEE, 10797--10806. Long Chen, Xin Yan, Jun Xiao, Hanwang Zhang, Shiliang Pu, and Yueting Zhuang. 2020c. 
Counterfactual Samples Synthesizing for Robust Visual Question Answering. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13--19, 2020. Computer Vision Foundation \/ IEEE, 10797--10806."},{"key":"e_1_3_2_2_9_1","volume-title":"Counterfactual Samples Synthesizing for Robust Visual Question Answering. In Conference on Computer Vision and Pattern Recognition, CVPR. 10797--10806","author":"Chen Long","year":"2020","unstructured":"Long Chen , Xin Yan , Jun Xiao , Hanwang Zhang , Shiliang Pu , and Yueting Zhuang . 2020 d. Counterfactual Samples Synthesizing for Robust Visual Question Answering. In Conference on Computer Vision and Pattern Recognition, CVPR. 10797--10806 . Long Chen, Xin Yan, Jun Xiao, Hanwang Zhang, Shiliang Pu, and Yueting Zhuang. 2020d. Counterfactual Samples Synthesizing for Robust Visual Question Answering. In Conference on Computer Vision and Pattern Recognition, CVPR. 10797--10806."},{"key":"e_1_3_2_2_10_1","volume-title":"Adaptive Offline Quintuplet Loss for Image-Text Matching. In Computer Vision ECCV 16th European Conference (Lecture Notes in Computer Science","volume":"565","author":"Chen Tianlang","year":"2020","unstructured":"Tianlang Chen , Jiajun Deng , and Jiebo Luo . 2020 a. Adaptive Offline Quintuplet Loss for Image-Text Matching. In Computer Vision ECCV 16th European Conference (Lecture Notes in Computer Science , Vol. 12358). 549-- 565 . Tianlang Chen, Jiajun Deng, and Jiebo Luo. 2020a. Adaptive Offline Quintuplet Loss for Image-Text Matching. In Computer Vision ECCV 16th European Conference (Lecture Notes in Computer Science, Vol. 12358). 549--565."},{"key":"e_1_3_2_2_11_1","volume-title":"Expressing Objects Just Like Words: Recurrent Visual Embedding for Image-Text Matching. In The Thirty-Fourth AAAI Conference on Artificial Intelligence. 10583--10590","author":"Chen Tianlang","year":"2020","unstructured":"Tianlang Chen and Jiebo Luo . 2020 . 
Expressing Objects Just Like Words: Recurrent Visual Embedding for Image-Text Matching. In The Thirty-Fourth AAAI Conference on Artificial Intelligence. 10583--10590 . Tianlang Chen and Jiebo Luo. 2020. Expressing Objects Just Like Words: Recurrent Visual Embedding for Image-Text Matching. In The Thirty-Fourth AAAI Conference on Artificial Intelligence. 10583--10590."},{"key":"e_1_3_2_2_12_1","volume-title":"Beyond Triplet Loss: A Deep Quadruplet Network for Person Re-identification. In 2017 IEEE Conference on Computer Vision and Pattern Recognition CVPR. 1320--1329","author":"Chen Weihua","year":"2017","unstructured":"Weihua Chen , Xiaotang Chen , Jianguo Zhang , and Kaiqi Huang . 2017 . Beyond Triplet Loss: A Deep Quadruplet Network for Person Re-identification. In 2017 IEEE Conference on Computer Vision and Pattern Recognition CVPR. 1320--1329 . Weihua Chen, Xiaotang Chen, Jianguo Zhang, and Kaiqi Huang. 2017. Beyond Triplet Loss: A Deep Quadruplet Network for Person Re-identification. In 2017 IEEE Conference on Computer Vision and Pattern Recognition CVPR. 1320--1329."},{"key":"e_1_3_2_2_13_1","volume-title":"Piotr Doll\u00e1r, and C. Lawrence Zitnick","author":"Chen Xinlei","year":"2015","unstructured":"Xinlei Chen , Hao Fang , Tsung-Yi Lin , Ramakrishna Vedantam , Saurabh Gupta , Piotr Doll\u00e1r, and C. Lawrence Zitnick . 2015 . Microsoft COCO Captions: Data Collection and Evaluation Server. CoRR , Vol. abs\/ 1504 .00325 (2015). Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Doll\u00e1r, and C. Lawrence Zitnick. 2015. Microsoft COCO Captions: Data Collection and Evaluation Server. CoRR, Vol. abs\/1504.00325 (2015)."},{"key":"e_1_3_2_2_14_1","volume-title":"Control and Tell: A Framework for Generating Controllable and Grounded Captions. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 
8307--8316","author":"Cornia Marcella","year":"2019","unstructured":"Marcella Cornia , Lorenzo Baraldi , and Rita Cucchiara . 2019 . Show , Control and Tell: A Framework for Generating Controllable and Grounded Captions. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 8307--8316 . Marcella Cornia, Lorenzo Baraldi, and Rita Cucchiara. 2019. Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 8307--8316."},{"key":"e_1_3_2_2_15_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT. 4171--4186","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2019 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding . In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT. 4171--4186 . Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT. 4171--4186."},{"key":"e_1_3_2_2_16_1","volume-title":"Similarity Reasoning and Filtration for Image-Text Matching. In Thirty-Fifth AAAI Conference on Artificial Intelligence. 1218--1226","author":"Diao Haiwen","year":"2021","unstructured":"Haiwen Diao , Ying Zhang , Lin Ma , and Huchuan Lu . 2021 . Similarity Reasoning and Filtration for Image-Text Matching. In Thirty-Fifth AAAI Conference on Artificial Intelligence. 1218--1226 . Haiwen Diao, Ying Zhang, Lin Ma, and Huchuan Lu. 2021. Similarity Reasoning and Filtration for Image-Text Matching. 
In Thirty-Fifth AAAI Conference on Artificial Intelligence. 1218--1226."},{"key":"e_1_3_2_2_17_1","volume-title":"VSE: Improving Visual-Semantic Embeddings with Hard Negatives. In British Machine Vision Conference, BMVC. 12","author":"Faghri Fartash","year":"2018","unstructured":"Fartash Faghri , David J. Fleet , Jamie Ryan Kiros , and Sanja Fidler . 2018 . VSE: Improving Visual-Semantic Embeddings with Hard Negatives. In British Machine Vision Conference, BMVC. 12 . Fartash Faghri, David J. Fleet, Jamie Ryan Kiros, and Sanja Fidler. 2018. VSE: Improving Visual-Semantic Embeddings with Hard Negatives. In British Machine Vision Conference, BMVC. 12."},{"key":"e_1_3_2_2_18_1","volume-title":"Causal inference in statistics: A primer","author":"Glymour Madelyn","unstructured":"Madelyn Glymour , Judea Pearl , and Nicholas P Jewell . 2016. Causal inference in statistics: A primer . John Wiley & Sons . Madelyn Glymour, Judea Pearl, and Nicholas P Jewell. 2016. Causal inference in statistics: A primer. John Wiley & Sons."},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.63"},{"key":"e_1_3_2_2_20_1","volume-title":"Greedy Gradient Ensemble for Robust Visual Question Answering. CoRR","author":"Han Xinzhe","year":"2021","unstructured":"Xinzhe Han , Shuhui Wang , Chi Su , Qingming Huang , and Qi Tian . 2021. Greedy Gradient Ensemble for Robust Visual Question Answering. CoRR , Vol. abs\/ 2107 .12651 ( 2021 ). Xinzhe Han, Shuhui Wang, Chi Su, Qingming Huang, and Qi Tian. 2021. Greedy Gradient Ensemble for Robust Visual Question Answering. CoRR, Vol. abs\/2107.12651 (2021)."},{"key":"e_1_3_2_2_21_1","volume-title":"Momentum Contrast for Unsupervised Visual Representation Learning. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020","author":"He Kaiming","year":"2020","unstructured":"Kaiming He , Haoqi Fan , Yuxin Wu , Saining Xie , and Ross B. Girshick . 2020 . 
Momentum Contrast for Unsupervised Visual Representation Learning. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 , Seattle, WA, USA, June 13--19 , 2020 . Computer Vision Foundation \/ IEEE, 9726--9735. Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross B. Girshick. 2020. Momentum Contrast for Unsupervised Visual Representation Learning. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13--19, 2020. Computer Vision Foundation \/ IEEE, 9726--9735."},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24261-3_7"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2019\/111"},{"key":"e_1_3_2_2_24_1","volume-title":"Deconfounded Visual Grounding. CoRR","author":"Huang Jianqiang","year":"2021","unstructured":"Jianqiang Huang , Yu Qin , Jiaxin Qi , Qianru Sun , and Hanwang Zhang . 2021. Deconfounded Visual Grounding. CoRR , Vol. abs\/ 2112 .15324 ( 2021 ). Jianqiang Huang, Yu Qin, Jiaxin Qi, Qianru Sun, and Hanwang Zhang. 2021. Deconfounded Visual Grounding. CoRR, Vol. abs\/2112.15324 (2021)."},{"key":"e_1_3_2_2_25_1","volume-title":"Computer Vision - ECCV 2020 - 16th European Conference (Lecture Notes in Computer Science","author":"Huang Zeyi","unstructured":"Zeyi Huang , Haohan Wang , Eric P. Xing , and Dong Huang . 2020. Self-challenging Improves Cross-Domain Generalization . In Computer Vision - ECCV 2020 - 16th European Conference (Lecture Notes in Computer Science , Vol. 12347). 124-- 140 . Zeyi Huang, Haohan Wang, Eric P. Xing, and Dong Huang. 2020. Self-challenging Improves Cross-Domain Generalization. In Computer Vision - ECCV 2020 - 16th European Conference (Lecture Notes in Computer Science, Vol. 12347). 
124--140."},{"key":"e_1_3_2_2_26_1","volume-title":"Computer Vision - ECCV - 15th European Conference (Lecture Notes in Computer Science","author":"Lee Kuang-Huei","unstructured":"Kuang-Huei Lee , Xi Chen , Gang Hua , Houdong Hu , and Xiaodong He. 2018. Stacked Cross Attention for Image-Text Matching . In Computer Vision - ECCV - 15th European Conference (Lecture Notes in Computer Science , Vol. 11208). 212-- 228 . Kuang-Huei Lee, Xi Chen, Gang Hua, Houdong Hu, and Xiaodong He. 2018. Stacked Cross Attention for Image-Text Matching. In Computer Vision - ECCV - 15th European Conference (Lecture Notes in Computer Science, Vol. 11208). 212--228."},{"key":"e_1_3_2_2_27_1","volume-title":"Visual Semantic Reasoning for Image-Text Matching. In International Conference on Computer Vision, ICCV. 4653--4661","author":"Li Kunpeng","year":"2019","unstructured":"Kunpeng Li , Yulun Zhang , Kai Li , Yuanyuan Li , and Yun Fu . 2019 . Visual Semantic Reasoning for Image-Text Matching. In International Conference on Computer Vision, ICCV. 4653--4661 . Kunpeng Li, Yulun Zhang, Kai Li, Yuanyuan Li, and Yun Fu. 2019. Visual Semantic Reasoning for Image-Text Matching. In International Conference on Computer Vision, ICCV. 4653--4661."},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.265"},{"key":"e_1_3_2_2_29_1","volume-title":"Piotr Doll\u00e1r, and C. Lawrence Zitnick","author":"Lin Tsung-Yi","year":"2014","unstructured":"Tsung-Yi Lin , Michael Maire , Serge J. Belongie , James Hays , Pietro Perona , Deva Ramanan , Piotr Doll\u00e1r, and C. Lawrence Zitnick . 2014 . Microsoft COCO: Common Objects in Context. In Computer Vision - ECCV 2014 - 13th European Conference (Lecture Notes in Computer Science , Vol. 8693). 740-- 755 . Tsung-Yi Lin, Michael Maire, Serge J. Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. 
In Computer Vision - ECCV 2014 - 13th European Conference (Lecture Notes in Computer Science, Vol. 8693). 740--755."},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350869"},{"key":"e_1_3_2_2_31_1","volume-title":"Prophet Attention: Predicting Attention with Future Attention. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems","author":"Liu Fenglin","year":"2020","unstructured":"Fenglin Liu , Xuancheng Ren , Xian Wu , Shen Ge , Wei Fan , Yuexian Zou , and Xu Sun . 2020 . Prophet Attention: Predicting Attention with Future Attention. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS. Fenglin Liu, Xuancheng Ren, Xian Wu, Shen Ge, Wei Fan, Yuexian Zou, and Xu Sun. 2020. Prophet Attention: Predicting Attention with Future Attention. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS."},{"key":"e_1_3_2_2_32_1","volume-title":"Improving Referring Expression Grounding With Cross-Modal Attention-Guided Erasing. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019","author":"Liu Xihui","year":"2019","unstructured":"Xihui Liu , Zihao Wang , Jing Shao , Xiaogang Wang , and Hongsheng Li . 2019 b. Improving Referring Expression Grounding With Cross-Modal Attention-Guided Erasing. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019 , Long Beach, CA, USA, June 16--20 , 2019. Computer Vision Foundation \/ IEEE, 1950--1959. Xihui Liu, Zihao Wang, Jing Shao, Xiaogang Wang, and Hongsheng Li. 2019b. Improving Referring Expression Grounding With Cross-Modal Attention-Guided Erasing. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16--20, 2019. 
Computer Vision Foundation \/ IEEE, 1950--1959."},{"key":"e_1_3_2_2_33_1","volume-title":"Discovering Causal Signals in Images. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017","author":"Lopez-Paz David","year":"2017","unstructured":"David Lopez-Paz , Robert Nishihara , Soumith Chintala , Bernhard Sch\u00f6lkopf, and L\u00e9on Bottou . 2017 . Discovering Causal Signals in Images. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 , Honolulu, HI, USA, July 21--26 , 2017. IEEE Computer Society, 58--66. David Lopez-Paz, Robert Nishihara, Soumith Chintala, Bernhard Sch\u00f6lkopf, and L\u00e9on Bottou. 2017. Discovering Causal Signals in Images. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21--26, 2017. IEEE Computer Society, 58--66."},{"key":"e_1_3_2_2_34_1","volume-title":"Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems","author":"Lu Jiasen","year":"2019","unstructured":"Jiasen Lu , Dhruv Batra , Devi Parikh , and Stefan Lee . 2019. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks . In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019 , NeurIPS 2019, December 8--14, 2019, Vancouver, BC, Canada . 13--23. Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. 2019. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8--14, 2019, Vancouver, BC, Canada. 13--23."},{"key":"e_1_3_2_2_35_1","volume-title":"Sampling Matters in Deep Embedding Learning. In International Conference on Computer Vision ICCV. 2859--2867","author":"Manmatha R.","year":"2017","unstructured":"R. 
Manmatha , Chao-Yuan Wu , Alexander J. Smola , and Philipp Kr\u00e4henb\u00fchl. 2017 . Sampling Matters in Deep Embedding Learning. In International Conference on Computer Vision ICCV. 2859--2867 . R. Manmatha, Chao-Yuan Wu, Alexander J. Smola, and Philipp Kr\u00e4henb\u00fchl. 2017. Sampling Matters in Deep Embedding Learning. In International Conference on Computer Vision ICCV. 2859--2867."},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01251"},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01251"},{"key":"e_1_3_2_2_38_1","volume-title":"Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsm\u00e4ssan","author":"Parascandolo Giambattista","year":"2018","unstructured":"Giambattista Parascandolo , Niki Kilbertus , Mateo Rojas-Carulla , and Bernhard Sch\u00f6lkopf. 2018 . Learning Independent Causal Mechanisms . In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsm\u00e4ssan , Stockholm, Sweden, July 10--15 , 2018 (Proceedings of Machine Learning Research, Vol. 80). PMLR, 4033--4041. Giambattista Parascandolo, Niki Kilbertus, Mateo Rojas-Carulla, and Bernhard Sch\u00f6lkopf. 2018. Learning Independent Causal Mechanisms. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsm\u00e4ssan, Stockholm, Sweden, July 10--15, 2018 (Proceedings of Machine Learning Research, Vol. 80). PMLR, 4033--4041."},{"key":"e_1_3_2_2_39_1","volume-title":"Interpretation and identification of causal mediation. Psychological methods","author":"Pearl Judea","year":"2014","unstructured":"Judea Pearl . 2014. Interpretation and identification of causal mediation. Psychological methods , Vol. 19 , 4 ( 2014 ), 459. Judea Pearl. 2014. Interpretation and identification of causal mediation. Psychological methods, Vol. 
19, 4 (2014), 459."},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"crossref","unstructured":"Judea Pearl. 2022. Direct and indirect effects. In Probabilistic and Causal Inference: The Works of Judea Pearl. 373--392.  Judea Pearl. 2022. Direct and indirect effects. In Probabilistic and Causal Inference: The Works of Judea Pearl. 373--392.","DOI":"10.1145\/3501714.3501736"},{"key":"e_1_3_2_2_41_1","unstructured":"Judea Pearl and Dana Mackenzie. 2018. The book of why: the new science of cause and effect. Basic books.  Judea Pearl and Dana Mackenzie. 2018. The book of why: the new science of cause and effect. Basic books."},{"key":"e_1_3_2_2_42_1","volume-title":"Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI. 3846--3853","author":"Peng Yuxin","year":"2016","unstructured":"Yuxin Peng , Xin Huang , and Jinwei Qi . 2016 . Cross-Media Shared Representation by Hierarchical Learning with Multiple Deep Networks . In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI. 3846--3853 . Yuxin Peng, Xin Huang, and Jinwei Qi. 2016. Cross-Media Shared Representation by Hierarchical Learning with Multiple Deep Networks. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI. 3846--3853."},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2742704"},{"key":"e_1_3_2_2_44_1","volume-title":"Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18--24","volume":"8763","author":"Radford Alec","year":"2021","unstructured":"Alec Radford , Jong Wook Kim , Chris Hallacy , Aditya Ramesh , Gabriel Goh , Sandhini Agarwal , Girish Sastry , Amanda Askell , Pamela Mishkin , Jack Clark , Gretchen Krueger , and Ilya Sutskever . 2021 . Learning Transferable Visual Models From Natural Language Supervision . 
In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18--24 July 2021, Virtual Event (Proceedings of Machine Learning Research , Vol. 139). PMLR, 8748-- 8763 . Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18--24 July 2021, Virtual Event (Proceedings of Machine Learning Research, Vol. 139). PMLR, 8748--8763."},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298682"},{"key":"e_1_3_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.74"},{"key":"e_1_3_2_2_47_1","volume-title":"Improved Deep Metric Learning with Multi-class N-pair Loss Objective. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems. 1849--1857","author":"Sohn Kihyuk","year":"2016","unstructured":"Kihyuk Sohn . 2016 . Improved Deep Metric Learning with Multi-class N-pair Loss Objective. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems. 1849--1857 . Kihyuk Sohn. 2016. Improved Deep Metric Learning with Multi-class N-pair Loss Objective. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems. 1849--1857."},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.434"},{"key":"e_1_3_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.5555\/2627435.2670313"},{"key":"e_1_3_2_2_50_1","volume-title":"VL-BERT: Pre-training of Generic Visual-Linguistic Representations. 
In 8th International Conference on Learning Representations, ICLR 2020","author":"Su Weijie","year":"2020","unstructured":"Weijie Su , Xizhou Zhu , Yue Cao , Bin Li , Lewei Lu , Furu Wei , and Jifeng Dai . 2020 . VL-BERT: Pre-training of Generic Visual-Linguistic Representations. In 8th International Conference on Learning Representations, ICLR 2020 , Addis Ababa, Ethiopia, April 26--30 , 2020. Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, and Jifeng Dai. 2020. VL-BERT: Pre-training of Generic Visual-Linguistic Representations. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26--30, 2020."},{"key":"e_1_3_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1514"},{"key":"e_1_3_2_2_52_1","volume-title":"Unbiased Scene Graph Generation From Biased Training. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020","author":"Tang Kaihua","year":"2020","unstructured":"Kaihua Tang , Yulei Niu , Jianqiang Huang , Jiaxin Shi , and Hanwang Zhang . 2020 . Unbiased Scene Graph Generation From Biased Training. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 , Seattle, WA, USA, June 13--19 , 2020. Computer Vision Foundation \/ IEEE, 3713--3722. Kaihua Tang, Yulei Niu, Jianqiang Huang, Jiaxin Shi, and Hanwang Zhang. 2020. Unbiased Scene Graph Generation From Biased Training. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13--19, 2020. Computer Vision Foundation \/ IEEE, 3713--3722."},{"key":"e_1_3_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123326"},{"key":"e_1_3_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58586-0_2"},{"key":"e_1_3_2_2_55_1","volume-title":"Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval. In Winter Conference on Applications of Computer Vision, WACV. 
1497--1506","author":"Wang Sijin","year":"2020","unstructured":"Sijin Wang , Ruiping Wang , Ziwei Yao , Shiguang Shan , and Xilin Chen . 2020 a. Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval. In Winter Conference on Applications of Computer Vision, WACV. 1497--1506 . Sijin Wang, Ruiping Wang, Ziwei Yao, Shiguang Shan, and Xilin Chen. 2020a. Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval. In Winter Conference on Applications of Computer Vision, WACV. 1497--1506."},{"key":"e_1_3_2_2_56_1","volume-title":"Multi-Similarity Loss With General Pair Weighting for Deep Metric Learning. In IEEE Conference on Computer Vision and Pattern Recognition CVPR. 5022--5030","author":"Wang Xun","unstructured":"Xun Wang , Xintong Han , Weilin Huang , Dengke Dong , and Matthew R. Scott . 2019a . Multi-Similarity Loss With General Pair Weighting for Deep Metric Learning. In IEEE Conference on Computer Vision and Pattern Recognition CVPR. 5022--5030 . Xun Wang, Xintong Han, Weilin Huang, Dengke Dong, and Matthew R. Scott. 2019a. Multi-Similarity Loss With General Pair Weighting for Deep Metric Learning. In IEEE Conference on Computer Vision and Pattern Recognition CVPR. 5022--5030."},{"key":"e_1_3_2_2_57_1","volume-title":"Cross-Batch Memory for Embedding Learning. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020","author":"Wang Xun","year":"2020","unstructured":"Xun Wang , Haozhi Zhang , Weilin Huang , and Matthew R. Scott . 2020b . Cross-Batch Memory for Embedding Learning. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 , Seattle, WA, USA, June 13--19 , 2020 . Computer Vision Foundation \/ IEEE, 6387--6396. Xun Wang, Haozhi Zhang, Weilin Huang, and Matthew R. Scott. 2020b. Cross-Batch Memory for Embedding Learning. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13--19, 2020. 
Computer Vision Foundation \/ IEEE, 6387--6396."},{"key":"e_1_3_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2019\/526"},{"key":"e_1_3_2_2_59_1","volume-title":"CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval. In International Conference on Computer Vision, ICCV. 5763--5772","author":"Wang Zihao","year":"2019","unstructured":"Zihao Wang , Xihui Liu , Hongsheng Li , Lu Sheng , Junjie Yan , Xiaogang Wang , and Jing Shao . 2019 b. CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval. In International Conference on Computer Vision, ICCV. 5763--5772 . Zihao Wang, Xihui Liu, Hongsheng Li, Lu Sheng, Junjie Yan, Xiaogang Wang, and Jing Shao. 2019b. CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval. In International Conference on Computer Vision, ICCV. 5763--5772."},{"key":"e_1_3_2_2_60_1","volume-title":"Language-Agnostic Visual-Semantic Embeddings. In International Conference on Computer Vision, ICCV. 5803--5812","author":"Wehrmann Jonatas","unstructured":"Jonatas Wehrmann , Maur\u00edcio Armani Lopes , Douglas M. Souza , and Rodrigo C. Barros . 2019 . Language-Agnostic Visual-Semantic Embeddings. In International Conference on Computer Vision, ICCV. 5803--5812 . Jonatas Wehrmann, Maur\u00edcio Armani Lopes, Douglas M. Souza, and Rodrigo C. Barros. 2019. Language-Agnostic Visual-Semantic Embeddings. In International Conference on Computer Vision, ICCV. 5803--5812."},{"key":"e_1_3_2_2_61_1","volume-title":"Universal Weighting Metric Learning for Cross-Modal Matching. In Conference on Computer Vision and Pattern Recognition CVPR. 13002--13011","author":"Wei Jiwei","year":"2020","unstructured":"Jiwei Wei , Xing Xu , Yang Yang , Yanli Ji , Zheng Wang , and Heng Tao Shen . 2020 a. Universal Weighting Metric Learning for Cross-Modal Matching. In Conference on Computer Vision and Pattern Recognition CVPR. 13002--13011 . Jiwei Wei, Xing Xu, Yang Yang, Yanli Ji, Zheng Wang, and Heng Tao Shen. 2020a. 
Universal Weighting Metric Learning for Cross-Modal Matching. In Conference on Computer Vision and Pattern Recognition CVPR. 13002--13011."},{"key":"e_1_3_2_2_62_1","volume-title":"Multi-Modality Cross Attention Network for Image and Sentence Matching. In Conference on Computer Vision and Pattern Recognition, CVPR. 10938--10947","author":"Wei Xi","year":"2020","unstructured":"Xi Wei , Tianzhu Zhang , Yan Li , Yongdong Zhang , and Feng Wu . 2020 b. Multi-Modality Cross Attention Network for Image and Sentence Matching. In Conference on Computer Vision and Pattern Recognition, CVPR. 10938--10947 . Xi Wei, Tianzhu Zhang, Yan Li, Yongdong Zhang, and Feng Wu. 2020b. Multi-Modality Cross Attention Network for Image and Sentence Matching. In Conference on Computer Vision and Pattern Recognition, CVPR. 10938--10947."},{"key":"e_1_3_2_2_63_1","volume-title":"Unified Visual-Semantic Embeddings: Bridging Vision and Language With Structured Meaning Representations. In Conference on Computer Vision and Pattern Recognition CVPR. 6609--6618","author":"Wu Hao","year":"2019","unstructured":"Hao Wu , Jiayuan Mao , Yufeng Zhang , Yuning Jiang , Lei Li , Weiwei Sun , and Wei-Ying Ma . 2019 . Unified Visual-Semantic Embeddings: Bridging Vision and Language With Structured Meaning Representations. In Conference on Computer Vision and Pattern Recognition CVPR. 6609--6618 . Hao Wu, Jiayuan Mao, Yufeng Zhang, Yuning Jiang, Lei Li, Weiwei Sun, and Wei-Ying Ma. 2019. Unified Visual-Semantic Embeddings: Bridging Vision and Language With Structured Meaning Representations. In Conference on Computer Vision and Pattern Recognition CVPR. 6609--6618."},{"key":"e_1_3_2_2_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00393"},{"key":"e_1_3_2_2_65_1","volume-title":"Semi-Autoregressive Image Captioning. In MM '21: ACM Multimedia Conference. 2708--2716","author":"Yan Xu","year":"2021","unstructured":"Xu Yan , Zhengcong Fei , Zekang Li , Shuhui Wang , Qingming Huang , and Qi Tian . 
2021 . Semi-Autoregressive Image Captioning. In MM '21: ACM Multimedia Conference. 2708--2716 . Xu Yan, Zhengcong Fei, Zekang Li, Shuhui Wang, Qingming Huang, and Qi Tian. 2021. Semi-Autoregressive Image Captioning. In MM '21: ACM Multimedia Conference. 2708--2716."},{"key":"e_1_3_2_2_66_1","volume-title":"Causal Attention for Vision-Language Tasks. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021","author":"Yang Xu","year":"2021","unstructured":"Xu Yang , Hanwang Zhang , Guojun Qi , and Jianfei Cai . 2021 . Causal Attention for Vision-Language Tasks. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021 , virtual, June 19 --25 , 2021. Computer Vision Foundation \/ IEEE, 9847--9857. Xu Yang, Hanwang Zhang, Guojun Qi, and Jianfei Cai. 2021. Causal Attention for Vision-Language Tasks. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19--25, 2021. Computer Vision Foundation \/ IEEE, 9847--9857."},{"key":"e_1_3_2_2_67_1","volume-title":"Context and Attribute Grounded Dense Captioning. In Conference on Computer Vision and Pattern Recognition, CVPR. 6241--6250","author":"Yin Guojun","year":"2019","unstructured":"Guojun Yin , Lu Sheng , Bin Liu , Nenghai Yu , Xiaogang Wang , and Jing Shao . 2019 . Context and Attribute Grounded Dense Captioning. In Conference on Computer Vision and Pattern Recognition, CVPR. 6241--6250 . Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, and Jing Shao. 2019. Context and Attribute Grounded Dense Captioning. In Conference on Computer Vision and Pattern Recognition, CVPR. 6241--6250."},{"key":"e_1_3_2_2_68_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00166"},{"key":"e_1_3_2_2_69_1","volume-title":"Counterfactual Zero-Shot and Open-Set Visual Recognition. 
In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021","author":"Yue Zhongqi","year":"2021","unstructured":"Zhongqi Yue , Tan Wang , Qianru Sun , Xian-Sheng Hua , and Hanwang Zhang . 2021 . Counterfactual Zero-Shot and Open-Set Visual Recognition. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021 , virtual, June 19 --25 , 2021. Computer Vision Foundation \/ IEEE, 15404--15414. Zhongqi Yue, Tan Wang, Qianru Sun, Xian-Sheng Hua, and Hanwang Zhang. 2021. Counterfactual Zero-Shot and Open-Set Visual Recognition. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19--25, 2021. Computer Vision Foundation \/ IEEE, 15404--15414."},{"key":"e_1_3_2_2_70_1","volume-title":"Causal Intervention for Weakly-Supervised Semantic Segmentation. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020","author":"Zhang Dong","year":"2020","unstructured":"Dong Zhang , Hanwang Zhang , Jinhui Tang , Xian-Sheng Hua , and Qianru Sun . 2020 b. Causal Intervention for Weakly-Supervised Semantic Segmentation. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020 , NeurIPS 2020, December 6--12, 2020, virtual, Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Eds.). Dong Zhang, Hanwang Zhang, Jinhui Tang, Xian-Sheng Hua, and Qianru Sun. 2020b. Causal Intervention for Weakly-Supervised Semantic Segmentation. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6--12, 2020, virtual, Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Eds.)."},{"key":"e_1_3_2_2_71_1","volume-title":"Context-Aware Attention Network for Image-Text Retrieval. In Conference on Computer Vision and Pattern Recognition, CVPR. 
3533--3542","author":"Zhang Qi","unstructured":"Qi Zhang , Zhen Lei , Zhaoxiang Zhang , and Stan Z. Li . 2020a . Context-Aware Attention Network for Image-Text Retrieval. In Conference on Computer Vision and Pattern Recognition, CVPR. 3533--3542 . Qi Zhang, Zhen Lei, Zhaoxiang Zhang, and Stan Z. Li. 2020a. Context-Aware Attention Network for Image-Text Retrieval. In Conference on Computer Vision and Pattern Recognition, CVPR. 3533--3542."},{"key":"e_1_3_2_2_72_1","volume-title":"Deep Supervised Cross-Modal Retrieval. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 10394--10403","author":"Zhen Liangli","year":"2019","unstructured":"Liangli Zhen , Peng Hu , Xu Wang , and Dezhong Peng . 2019 . Deep Supervised Cross-Modal Retrieval. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 10394--10403 . Liangli Zhen, Peng Hu, Xu Wang, and Dezhong Peng. 2019. Deep Supervised Cross-Modal Retrieval. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR. 10394--10403."},{"key":"e_1_3_2_2_73_1","volume-title":"Learning Deep Features for Discriminative Localization. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016","author":"Zhou Bolei","year":"2016","unstructured":"Bolei Zhou , Aditya Khosla , \u00c0gata Lapedriza , Aude Oliva , and Antonio Torralba . 2016 . Learning Deep Features for Discriminative Localization. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016 , Las Vegas, NV, USA, June 27--30 , 2016. IEEE Computer Society, 2921--2929. Bolei Zhou, Aditya Khosla, \u00c0gata Lapedriza, Aude Oliva, and Antonio Torralba. 2016. Learning Deep Features for Discriminative Localization. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27--30, 2016. 
IEEE Computer Society, 2921--2929."}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","location":"Lisboa Portugal","acronym":"MM '22","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 30th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3547814","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3503161.3547814","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:02:34Z","timestamp":1750186954000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3547814"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":73,"alternative-id":["10.1145\/3503161.3547814","10.1145\/3503161"],"URL":"https:\/\/doi.org\/10.1145\/3503161.3547814","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}