{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,6]],"date-time":"2025-12-06T05:04:34Z","timestamp":1764997474703,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":42,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T00:00:00Z","timestamp":1665360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"The National Key R&D Program of China","award":["2018AAA0100604"],"award-info":[{"award-number":["2018AAA0100604"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,10]]},"DOI":"10.1145\/3503161.3548396","type":"proceedings-article","created":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T15:43:12Z","timestamp":1665416592000},"page":"4996-5004","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Counterfactually Measuring and Eliminating Social Bias in Vision-Language Pre-training Models"],"prefix":"10.1145","author":[{"given":"Yi","family":"Zhang","sequence":"first","affiliation":[{"name":"Beijing Jiaotong University, Beijing, China"}]},{"given":"Junyang","family":"Wang","sequence":"additional","affiliation":[{"name":"Beijing Jiaotong University, Beijing, China"}]},{"given":"Jitao","family":"Sang","sequence":"additional","affiliation":[{"name":"Beijing Jiaotong University &amp; Peng Cheng Lab, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Science","volume":"356","author":"Caliskan Aylin","year":"2017","unstructured":"Aylin Caliskan , Joanna J Bryson , and Arvind Narayanan . 2017 . Semantics derived automatically from language corpora contain human-like biases . Science , Vol. 356 , 6334 (2017), 183--186. Aylin Caliskan, Joanna J Bryson, and Arvind Narayanan. 2017. Semantics derived automatically from language corpora contain human-like biases. Science, Vol. 356, 6334 (2017), 183--186."},{"key":"e_1_3_2_2_2_1","volume-title":"Faisal Ahmed, Zhe Gan, Yu Cheng, and Jingjing Liu.","author":"Chen Yen-Chun","year":"2020","unstructured":"Yen-Chun Chen , Linjie Li , Licheng Yu , Ahmed El Kholy , Faisal Ahmed, Zhe Gan, Yu Cheng, and Jingjing Liu. 2020 . Uniter : Universal image-text representation learning. In ECCV. Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, and Jingjing Liu. 2020. Uniter: Universal image-text representation learning. In ECCV."},{"key":"e_1_3_2_2_3_1","volume-title":"International Conference on Machine Learning. PMLR","author":"Choi Kristy","year":"2020","unstructured":"Kristy Choi , Aditya Grover , Trisha Singh , Rui Shu , and Stefano Ermon . 2020 . Fair generative modeling via weak supervision . In International Conference on Machine Learning. PMLR , 1887--1898. Kristy Choi, Aditya Grover, Trisha Singh, Rui Shu, and Stefano Ermon. 2020. Fair generative modeling via weak supervision. In International Conference on Machine Learning. PMLR, 1887--1898."},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475251"},{"key":"e_1_3_2_2_5_1","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. [n.d.]. Bert: Pre-training of deep bidirectional transformers for language understanding. ([n. d.]) 4171--4186.  Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. [n.d.]. Bert: Pre-training of deep bidirectional transformers for language understanding. ([n. d.]) 4171--4186."},{"key":"e_1_3_2_2_6_1","volume-title":"3rd International Conference on Learning Representations.","author":"Goodfellow Ian J.","year":"2015","unstructured":"Ian J. Goodfellow , Jonathon Shlens , and Christian Szegedy . 2015 . Explaining and harnessing adversarial examples . In 3rd International Conference on Learning Representations. Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and harnessing adversarial examples. In 3rd International Conference on Learning Representations."},{"key":"e_1_3_2_2_7_1","volume-title":"Bias correction of learned generative models using likelihood-free importance weighting. Advances in neural information processing systems","author":"Grover Aditya","year":"2019","unstructured":"Aditya Grover , Jiaming Song , Ashish Kapoor , Kenneth Tran , Alekh Agarwal , Eric J Horvitz , and Stefano Ermon . 2019. Bias correction of learned generative models using likelihood-free importance weighting. Advances in neural information processing systems , Vol. 32 ( 2019 ). Aditya Grover, Jiaming Song, Ashish Kapoor, Kenneth Tran, Alekh Agarwal, Eric J Horvitz, and Stefano Ermon. 2019. Bias correction of learned generative models using likelihood-free importance weighting. Advances in neural information processing systems, Vol. 32 (2019)."},{"key":"e_1_3_2_2_8_1","volume-title":"Vivo: Surpassing human performance in novel object captioning with visual vocabulary pre-training.","author":"Hu Xiaowei","year":"2020","unstructured":"Xiaowei Hu , Xi Yin , Kevin Lin , Lijuan Wang , Lei Zhang , Jianfeng Gao , and Zicheng Liu . 2020 . Vivo: Surpassing human performance in novel object captioning with visual vocabulary pre-training. (2020). Xiaowei Hu, Xi Yin, Kevin Lin, Lijuan Wang, Lei Zhang, Jianfeng Gao, and Zicheng Liu. 2020. Vivo: Surpassing human performance in novel object captioning with visual vocabulary pre-training. (2020)."},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3478871"},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2021.103652"},{"key":"e_1_3_2_2_12_1","volume-title":"International Conference on Machine Learning. PMLR, 4904--4916","author":"Jia Chao","year":"2021","unstructured":"Chao Jia , Yinfei Yang , Ye Xia , Yi-Ting Chen , Zarana Parekh , Hieu Pham , Quoc Le , Yun-Hsuan Sung , Zhen Li , and Tom Duerig . 2021 . Scaling up visual and vision-language representation learning with noisy text supervision . In International Conference on Machine Learning. PMLR, 4904--4916 . Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc Le, Yun-Hsuan Sung, Zhen Li, and Tom Duerig. 2021. Scaling up visual and vision-language representation learning with noisy text supervision. In International Conference on Machine Learning. PMLR, 4904--4916."},{"key":"e_1_3_2_2_13_1","volume-title":"International Conference on Machine Learning. PMLR, 5583--5594","author":"Kim Wonjae","year":"2021","unstructured":"Wonjae Kim , Bokyung Son , and Ildoo Kim . 2021 . Vilt: Vision-and-language transformer without convolution or region supervision . In International Conference on Machine Learning. PMLR, 5583--5594 . Wonjae Kim, Bokyung Son, and Ildoo Kim. 2021. Vilt: Vision-and-language transformer without convolution or region supervision. In International Conference on Machine Learning. PMLR, 5583--5594."},{"key":"e_1_3_2_2_14_1","volume-title":"5th International Conference on Learning Representations","author":"Kurakin Alexey","year":"2017","unstructured":"Alexey Kurakin , Ian Goodfellow , and Samy Bengio . 2017 . Adversarial examples in the physical world . 5th International Conference on Learning Representations (2017). Alexey Kurakin, Ian Goodfellow, and Samy Bengio. 2017. Adversarial examples in the physical world. 5th International Conference on Learning Representations (2017)."},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W19-3823"},{"key":"e_1_3_2_2_16_1","volume-title":"Advances in Neural Information Processing Systems","volume":"34","author":"Li Junnan","year":"2021","unstructured":"Junnan Li , Ramprasaath Selvaraju , Akhilesh Gotmare , Shafiq Joty , Caiming Xiong , and Steven Chu Hong Hoi . 2021 . Align before fuse: Vision and language representation learning with momentum distillation . Advances in Neural Information Processing Systems , Vol. 34 (2021). Junnan Li, Ramprasaath Selvaraju, Akhilesh Gotmare, Shafiq Joty, Caiming Xiong, and Steven Chu Hong Hoi. 2021. Align before fuse: Vision and language representation learning with momentum distillation. Advances in Neural Information Processing Systems, Vol. 34 (2021)."},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58577-8_8"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.425"},{"key":"e_1_3_2_2_20_1","volume-title":"Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. Advances in neural information processing systems","author":"Lu Jiasen","year":"2019","unstructured":"Jiasen Lu , Dhruv Batra , Devi Parikh , and Stefan Lee . 2019 . Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. Advances in neural information processing systems , Vol. 32 (2019). Jiasen Lu, Dhruv Batra, Devi Parikh, and Stefan Lee. 2019. Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. Advances in neural information processing systems, Vol. 32 (2019)."},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475703"},{"key":"e_1_3_2_2_22_1","volume-title":"Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083","author":"Madry Aleksander","year":"2017","unstructured":"Aleksander Madry , Aleksandar Makelov , Ludwig Schmidt , Dimitris Tsipras , and Adrian Vladu . 2017. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 ( 2017 ). Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)."},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00251"},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_2_2_25_1","volume-title":"International Conference on Machine Learning. PMLR, 8748--8763","author":"Radford Alec","year":"2021","unstructured":"Alec Radford , Jong Wook Kim , Chris Hallacy , Aditya Ramesh , Gabriel Goh , Sandhini Agarwal , Girish Sastry , Amanda Askell , Pamela Mishkin , Jack Clark , 2021 . Learning transferable visual models from natural language supervision . In International Conference on Machine Learning. PMLR, 8748--8763 . Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR, 8748--8763."},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"crossref","unstructured":"Candace Ross Boris Katz and Andrei Barbu. 2020. Measuring Social Biases in Grounded Vision and Language Embeddings. arXiv preprint arXiv:2002.08911.  Candace Ross Boris Katz and Andrei Barbu. 2020. Measuring Social Biases in Grounded Vision and Language Embeddings. arXiv preprint arXiv:2002.08911.","DOI":"10.18653\/v1\/2021.naacl-main.78"},{"key":"e_1_3_2_2_27_1","volume-title":"Worst of both worlds: Biases compound in pre-trained vision-and-language models. arXiv preprint arXiv:2104.08666","author":"Srinivasan Tejas","year":"2021","unstructured":"Tejas Srinivasan and Yonatan Bisk . 2021. Worst of both worlds: Biases compound in pre-trained vision-and-language models. arXiv preprint arXiv:2104.08666 ( 2021 ). Tejas Srinivasan and Yonatan Bisk. 2021. Worst of both worlds: Biases compound in pre-trained vision-and-language models. arXiv preprint arXiv:2104.08666 (2021)."},{"key":"e_1_3_2_2_28_1","volume-title":"Vl-bert: Pre-training of generic visual-linguistic representations. arXiv preprint arXiv:1908.08530","author":"Su Weijie","year":"2019","unstructured":"Weijie Su , Xizhou Zhu , Yue Cao , Bin Li , Lewei Lu , Furu Wei , and Jifeng Dai . 2019 . Vl-bert: Pre-training of generic visual-linguistic representations. arXiv preprint arXiv:1908.08530 (2019). Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, and Jifeng Dai. 2019. Vl-bert: Pre-training of generic visual-linguistic representations. arXiv preprint arXiv:1908.08530 (2019)."},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1644"},{"key":"e_1_3_2_2_30_1","volume-title":"Mitigating Gender Bias in Natural Language Processing: Literature Review","author":"Sun Tony","year":"2019","unstructured":"Tony Sun , Andrew Gaut , Shirlyn Tang , Yuxin Huang , Mai ElSherief , Jieyu Zhao , Diba Mirza , Elizabeth Belding , Kai-Wei Chang , and William Yang Wang . 2019. Mitigating Gender Bias in Natural Language Processing: Literature Review . Association for Computational Linguistics (ACL 2019 ) (2019). Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, and William Yang Wang. 2019. Mitigating Gender Bias in Natural Language Processing: Literature Review. Association for Computational Linguistics (ACL 2019) (2019)."},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3377811.3380420"},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1514"},{"key":"e_1_3_2_2_33_1","volume-title":"Attention is all you need. Advances in neural information processing systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan N Gomez , \u0141ukasz Kaiser , and Illia Polosukhin . 2017. Attention is all you need. Advances in neural information processing systems , Vol. 30 ( 2017 ). Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems, Vol. 30 (2017)."},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00078"},{"key":"e_1_3_2_2_35_1","volume-title":"Chi, and Slav Petrov","author":"Webster Kellie","year":"2020","unstructured":"Kellie Webster , Xuezhi Wang , Ian Tenney , Alex Beutel , Emily Pitler , Ellie Pavlick , Jilin Chen , Ed Chi, and Slav Petrov . 2020 . Measuring and reducing gendered correlations in pre-trained models. arXiv preprint arXiv:2010.06032 (2020). Kellie Webster, Xuezhi Wang, Ian Tenney, Alex Beutel, Emily Pitler, Ellie Pavlick, Jilin Chen, Ed Chi, and Slav Petrov. 2020. Measuring and reducing gendered correlations in pre-trained models. arXiv preprint arXiv:2010.06032 (2020)."},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01522"},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00166"},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413518"},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413772"},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.463"},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1064"},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-2003"},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.244"}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Lisboa Portugal","acronym":"MM '22"},"container-title":["Proceedings of the 30th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3548396","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3503161.3548396","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:00:44Z","timestamp":1750186844000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3548396"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":42,"alternative-id":["10.1145\/3503161.3548396","10.1145\/3503161"],"URL":"https:\/\/doi.org\/10.1145\/3503161.3548396","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}