{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T18:40:51Z","timestamp":1772822451859,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":54,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,7,6]],"date-time":"2022-07-06T00:00:00Z","timestamp":1657065600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"ANR","award":["ANR-19-CE23-0028"],"award-info":[{"award-number":["ANR-19-CE23-0028"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,7,6]]},"DOI":"10.1145\/3477495.3531753","type":"proceedings-article","created":{"date-parts":[[2022,7,7]],"date-time":"2022-07-07T15:12:13Z","timestamp":1657206733000},"page":"3108-3120","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities"],"prefix":"10.1145","author":[{"given":"Paul","family":"Lerner","sequence":"first","affiliation":[{"name":"Universit\u00e9 Paris-Saclay, CNRS, LISN, Orsay, France"}]},{"given":"Olivier","family":"Ferret","sequence":"additional","affiliation":[{"name":"Universit\u00e9 Paris-Saclay, CEA, List, Palaiseau, France"}]},{"given":"Camille","family":"Guinaudeau","sequence":"additional","affiliation":[{"name":"Universit\u00e9 Paris-Saclay, CNRS, LISN, Orsay, France"}]},{"given":"Herv\u00e9","family":"Le Borgne","sequence":"additional","affiliation":[{"name":"Universit\u00e9 Paris-Saclay, CEA, List, Palaiseau, France"}]},{"given":"Romaric","family":"Besan\u00e7on","sequence":"additional","affiliation":[{"name":"Universit\u00e9 Paris-Saclay, CEA, List, Palaiseau, France"}]},{"given":"Jose 
G.","family":"Moreno","sequence":"additional","affiliation":[{"name":"IRIT, Universit\u00e9 Paul Sabatier, Toulouse, France"}]},{"given":"Jes\u00fas","family":"Lov\u00f3n Melgarejo","sequence":"additional","affiliation":[{"name":"IRIT, Universit\u00e9 Paul Sabatier, Toulouse, France"}]}],"member":"320","published-online":{"date-parts":[[2022,7,7]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00636"},{"key":"e_1_3_2_2_2_1","volume-title":"VQA: Visual Question Answering. In 2015 IEEE International Conference on Computer Vision (ICCV). IEEE","author":"Antol Stanislaw","year":"2015","unstructured":"Stanislaw Antol , Aishwarya Agrawal , Jiasen Lu , Margaret Mitchell , Dhruv Batra , C. Lawrence Zitnick , and Devi Parikh . 2015 . VQA: Visual Question Answering. In 2015 IEEE International Conference on Computer Vision (ICCV). IEEE , Santiago, Chile, 2425--2433. https:\/\/doi.org\/10.1109\/ICCV. 2015.279 10.1109\/ICCV.2015.279 Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. 2015. VQA: Visual Question Answering. In 2015 IEEE International Conference on Computer Vision (ICCV). IEEE, Santiago, Chile, 2425--2433. https:\/\/doi.org\/10.1109\/ICCV.2015.279"},{"key":"e_1_3_2_2_3_1","volume-title":"Advances in Information Retrieval (Lecture Notes in Computer Science), Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil N\u00f8rv\u00e5g","author":"Bassani Elias","unstructured":"Elias Bassani . 2022. ranx: A Blazing-Fast Python Library for Ranking Evaluation and Comparison . In Advances in Information Retrieval (Lecture Notes in Computer Science), Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil N\u00f8rv\u00e5g , and Vinay Setty (Eds.). Springer International Publishing , Cham , 259--264. https:\/\/doi.org\/10.1007\/978--3-030--99739--7_30 10.1007\/978--3-030--99739--7_30 Elias Bassani. 
2022. ranx: A Blazing-Fast Python Library for Ranking Evaluation and Comparison. In Advances in Information Retrieval (Lecture Notes in Computer Science), Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil N\u00f8rv\u00e5g, and Vinay Setty (Eds.). Springer International Publishing, Cham, 259--264. https:\/\/doi.org\/10.1007\/978--3-030--99739--7_30"},{"key":"e_1_3_2_2_4_1","unstructured":"Rishi Bommasani Drew A. Hudson Ehsan Adeli Russ Altman Simran Arora Sydney von Arx Michael S. Bernstein Jeannette Bohg Antoine Bosselut Emma Brunskill Erik Brynjolfsson Shyamal Buch Dallas Card Rodrigo Castellon Niladri Chatterji Annie Chen Kathleen Creel Jared Quincy Davis Dora Demszky Chris Donahue Moussa Doumbouya Esin Durmus Stefano Ermon John Etchemendy Kawin Ethayarajh Li Fei-Fei Chelsea Finn Trevor Gale Lauren Gillespie Karan Goel Noah Goodman Shelby Grossman Neel Guha Tatsunori Hashimoto Peter Henderson John Hewitt Daniel E. Ho Jenny Hong Kyle Hsu Jing Huang Thomas Icard Saahil Jain Dan Jurafsky Pratyusha Kalluri Siddharth Karamcheti Geoff Keeling Fereshte Khani Omar Khattab Pang Wei Koh Mark Krass Ranjay Krishna Rohith Kuditipudi Ananya Kumar Faisal Ladhak Mina Lee Tony Lee Jure Leskovec Isabelle Levent Xiang Lisa Li Xuechen Li Tengyu Ma Ali Malik Christopher D. Manning Suvir Mirchandani Eric Mitchell Zanele Munyikwa Suraj Nair Avanika Narayan Deepak Narayanan Ben Newman Allen Nie Juan Carlos Niebles Hamed Nilforoshan Julian Nyarko Giray Ogut Laurel Orr Isabel Papadimitriou Joon Sung Park Chris Piech Eva Portelance Christopher Potts Aditi Raghunathan Rob Reich Hongyu Ren Frieda Rong Yusuf Roohani Camilo Ruiz Jack Ryan Christopher R\u00e9 Dorsa Sadigh Shiori Sagawa Keshav Santhanam Andy Shih Krishnan Srinivasan Alex Tamkin Rohan Taori Armin W. Thomas Florian Tram\u00e8r Rose E. 
Wang William Wang Bohan Wu Jiajun Wu Yuhuai Wu Sang Michael Xie Michihiro Yasunaga Jiaxuan You Matei Zaharia Michael Zhang Tianyi Zhang Xikun Zhang Yuhui Zhang Lucia Zheng Kaitlyn Zhou and Percy Liang. 2021. On the Opportunities and Risks of Foundation Models. arXiv:2108.07258 [cs] (Aug. 2021). http:\/\/arxiv.org\/abs\/2108.07258 arXiv: 2108.07258.  Rishi Bommasani Drew A. Hudson Ehsan Adeli Russ Altman Simran Arora Sydney von Arx Michael S. Bernstein Jeannette Bohg Antoine Bosselut Emma Brunskill Erik Brynjolfsson Shyamal Buch Dallas Card Rodrigo Castellon Niladri Chatterji Annie Chen Kathleen Creel Jared Quincy Davis Dora Demszky Chris Donahue Moussa Doumbouya Esin Durmus Stefano Ermon John Etchemendy Kawin Ethayarajh Li Fei-Fei Chelsea Finn Trevor Gale Lauren Gillespie Karan Goel Noah Goodman Shelby Grossman Neel Guha Tatsunori Hashimoto Peter Henderson John Hewitt Daniel E. Ho Jenny Hong Kyle Hsu Jing Huang Thomas Icard Saahil Jain Dan Jurafsky Pratyusha Kalluri Siddharth Karamcheti Geoff Keeling Fereshte Khani Omar Khattab Pang Wei Koh Mark Krass Ranjay Krishna Rohith Kuditipudi Ananya Kumar Faisal Ladhak Mina Lee Tony Lee Jure Leskovec Isabelle Levent Xiang Lisa Li Xuechen Li Tengyu Ma Ali Malik Christopher D. Manning Suvir Mirchandani Eric Mitchell Zanele Munyikwa Suraj Nair Avanika Narayan Deepak Narayanan Ben Newman Allen Nie Juan Carlos Niebles Hamed Nilforoshan Julian Nyarko Giray Ogut Laurel Orr Isabel Papadimitriou Joon Sung Park Chris Piech Eva Portelance Christopher Potts Aditi Raghunathan Rob Reich Hongyu Ren Frieda Rong Yusuf Roohani Camilo Ruiz Jack Ryan Christopher R\u00e9 Dorsa Sadigh Shiori Sagawa Keshav Santhanam Andy Shih Krishnan Srinivasan Alex Tamkin Rohan Taori Armin W. Thomas Florian Tram\u00e8r Rose E. Wang William Wang Bohan Wu Jiajun Wu Yuhuai Wu Sang Michael Xie Michihiro Yasunaga Jiaxuan You Matei Zaharia Michael Zhang Tianyi Zhang Xikun Zhang Yuhui Zhang Lucia Zheng Kaitlyn Zhou and Percy Liang. 2021. 
On the Opportunities and Risks of Foundation Models. arXiv:2108.07258 [cs] (Aug. 2021). http:\/\/arxiv.org\/abs\/2108.07258 arXiv: 2108.07258."},{"key":"e_1_3_2_2_5_1","unstructured":"Tom B Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell etal 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020).  Tom B Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell et al. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020)."},{"key":"e_1_3_2_2_6_1","volume-title":"WebQA: Multihop and Multimodal QA. arXiv:2109.00590 [cs] (Sept","author":"Chang Yingshan","year":"2021","unstructured":"Yingshan Chang , Mridu Narang , Hisami Suzuki , Guihong Cao , Jianfeng Gao , and Yonatan Bisk . 2021. WebQA: Multihop and Multimodal QA. arXiv:2109.00590 [cs] (Sept . 2021 ). http:\/\/arxiv.org\/abs\/2109.00590 Yingshan Chang, Mridu Narang, Hisami Suzuki, Guihong Cao, Jianfeng Gao, and Yonatan Bisk. 2021. WebQA: Multihop and Multimodal QA. arXiv:2109.00590 [cs] (Sept. 2021). http:\/\/arxiv.org\/abs\/2109.00590"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1171"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1078"},{"key":"e_1_3_2_2_9_1","volume-title":"The CLEF Cross Language Image Retrieval Track (ImageCLEF)","author":"Clough Paul","year":"2004","unstructured":"Paul Clough , Mark Sanderson , and Henning M\u00fcller . 2004. The CLEF Cross Language Image Retrieval Track (ImageCLEF) 2004 . In Image and Video Retrieval (Lecture Notes in Computer Science), Peter Enser, Yiannis Kompatsiaris, Noel E. O'Connor, Alan F. Smeaton, and Arnold W. M. Smeulders (Eds.). Springer , Berlin, Heidelberg, 243--251. 
https:\/\/doi.org\/10.1007\/978--3--540--27814--6_31 10.1007\/978--3--540--27814--6_31 Paul Clough, Mark Sanderson, and Henning M\u00fcller. 2004. The CLEF Cross Language Image Retrieval Track (ImageCLEF) 2004. In Image and Video Retrieval (Lecture Notes in Computer Science), Peter Enser, Yiannis Kompatsiaris, Noel E. O'Connor, Alan F. Smeaton, and Arnold W. M. Smeulders (Eds.). Springer, Berlin, Heidelberg, 243--251. https:\/\/doi.org\/10.1007\/978--3--540--27814--6_31"},{"key":"e_1_3_2_2_10_1","volume-title":"Generating Referring Expressions Involving Relations. In Fifth Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics","author":"Dale Robert","year":"1991","unstructured":"Robert Dale and Nicholas Haddock . 1991 . Generating Referring Expressions Involving Relations. In Fifth Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics , Berlin, Germany. https:\/\/aclanthology.org\/E91--1028 Robert Dale and Nicholas Haddock. 1991. Generating Referring Expressions Involving Relations. In Fifth Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Berlin, Germany. https:\/\/aclanthology.org\/E91--1028"},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"crossref","unstructured":"Jiankang Deng Jia Guo Niannan Xue and Stefanos Zafeiriou. 2019. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https: \/\/openaccess.thecvf.com\/content_CVPR_2019\/html\/Deng_ArcFace_Additive_ Angular_Margin_Loss_for_Deep_Face_Recognition_CVPR_2019_paper.html  Jiankang Deng Jia Guo Niannan Xue and Stefanos Zafeiriou. 2019. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https: \/\/openaccess.thecvf.com\/content_CVPR_2019\/html\/Deng_ArcFace_Additive_ Angular_Margin_Loss_for_Deep_Face_Recognition_CVPR_2019_paper.html","DOI":"10.1109\/CVPR.2019.00482"},{"key":"e_1_3_2_2_13_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2019 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding . In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies , Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171--4186. https:\/\/doi.org\/10. 18653\/v1\/N19--1423 10.18653\/v1 Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171--4186. https:\/\/doi.org\/10.18653\/v1\/N19--1423"},{"key":"e_1_3_2_2_14_1","unstructured":"The European Parliament. 2016. Regulation (EU) 2016\/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data and repealing Directive 95\/46\/EC (General Data Protection Regulation) (Text with EEA relevance). http:\/\/data.europa.eu\/eli\/reg\/2016\/679\/oj\/eng Legislative Body: EP CONSIL.  The European Parliament. 2016. 
Regulation (EU) 2016\/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data and repealing Directive 95\/46\/EC (General Data Protection Regulation) (Text with EEA relevance). http:\/\/data.europa.eu\/eli\/reg\/2016\/679\/oj\/eng Legislative Body: EP CONSIL."},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1871437.1871689"},{"key":"e_1_3_2_2_16_1","volume-title":"The design of experiments. The design of experiments","author":"Fisher R. A.","year":"1937","unstructured":"R. A. Fisher . 1937. The design of experiments. The design of experiments . 2 nd Ed ( 1937 ). https:\/\/www.cabdirect.org\/cabdirect\/abstract\/19371601600 Publisher : Oliver & Boyd, Edinburgh & London .. R. A. Fisher. 1937. The design of experiments. The design of experiments. 2nd Ed (1937). https:\/\/www.cabdirect.org\/cabdirect\/abstract\/19371601600 Publisher: Oliver & Boyd, Edinburgh & London..","edition":"2"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1037\/h0031619"},{"key":"e_1_3_2_2_18_1","volume-title":"ConceptBert: Concept-Aware Representation for Visual Question Answering. Findings of the Association for Computational Linguistics: EMNLP 2020 (2020","author":"Gard\u00e8res Fran\u00e7ois","year":"2020","unstructured":"Fran\u00e7ois Gard\u00e8res and Maryam Ziaeefard . 2020 . ConceptBert: Concept-Aware Representation for Visual Question Answering. Findings of the Association for Computational Linguistics: EMNLP 2020 (2020 ), 10. https:\/\/aclanthology.org\/2020. findings-emnlp.44\/ Fran\u00e7ois Gard\u00e8res and Maryam Ziaeefard. 2020. ConceptBert: Concept-Aware Representation for Visual Question Answering. Findings of the Association for Computational Linguistics: EMNLP 2020 (2020), 10. https:\/\/aclanthology.org\/2020. 
findings-emnlp.44\/"},{"key":"e_1_3_2_2_19_1","volume-title":"Computer Vision -- ECCV 2016 (Lecture Notes in Computer Science)","author":"Guo Yandong","unstructured":"Yandong Guo , Lei Zhang , Yuxiao Hu , Xiaodong He , and Jianfeng Gao . 2016. MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition . In Computer Vision -- ECCV 2016 (Lecture Notes in Computer Science) , Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling (Eds.). Springer International Publishing , Cham , 87--102. https:\/\/doi.org\/10.1007\/978--3--319--46487--9_6 10.1007\/978--3--319--46487--9_6 Yandong Guo, Lei Zhang, Yuxiao Hu, Xiaodong He, and Jianfeng Gao. 2016. MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition. In Computer Vision -- ECCV 2016 (Lecture Notes in Computer Science), Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling (Eds.). Springer International Publishing, Cham, 87--102. https:\/\/doi.org\/10.1007\/978--3--319--46487--9_6"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2019.2921572"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1147"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.550"},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.571"},{"key":"e_1_3_2_2_25_1","volume-title":"Proceedings of the 3rd International Conference for Learning Representations. http:\/\/arxiv.org\/abs\/1412","author":"Diederik","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization . In Proceedings of the 3rd International Conference for Learning Representations. http:\/\/arxiv.org\/abs\/1412 .6980 Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference for Learning Representations. 
http:\/\/arxiv.org\/abs\/1412.6980"},{"key":"e_1_3_2_2_26_1","volume-title":"The Oxford Handbook of Reference","author":"Krahmer Emiel","year":"2019","unstructured":"Emiel Krahmer and Kees van Deemter . 2019. Computational Generation of Referring Expressions: An Updated Survey . In The Oxford Handbook of Reference , B. Abbott and J. Gundel (Eds.). Oxford University Press . https:\/\/www.oxfordhandbooks.com\/view\/10.1093\/oxfordhb\/9780199687305.001.0001\/oxfordhb-9780199687305-e-19 Emiel Krahmer and Kees van Deemter. 2019. Computational Generation of Referring Expressions: An Updated Survey. In The Oxford Handbook of Reference, B. Abbott and J. Gundel (Eds.). Oxford University Press. https:\/\/www.oxfordhandbooks.com\/view\/10.1093\/oxfordhb\/9780199687305.001.0001\/oxfordhb-9780199687305-e-19"},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00276"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-main.86"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"crossref","unstructured":"Quentin Lhoest Albert Villanova del Moral Yacine Jernite Abhishek Thakur Patrick von Platen Suraj Patil Julien Chaumond Mariama Drame Julien Plu Lewis Tunstall Joe Davison Mario Gunjan Chhablani Bhavitvya Malik Simon Brandeis Teven Le Scao Victor Sanh Canwen Xu Nicolas Patry Angelina McMillan-Major Philipp Schmid Sylvain Gugger Cl\u00e9ment Delangue Th\u00e9o Matussi\u00e8re Lysandre Debut Stas Bekman Pierric Cistac Thibault Goehringer Victor Mustar Fran\u00e7ois Lagunas Alexander Rush and Thomas Wolf. 2021. Datasets: A Community Library for Natural Language Processing. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics Online and Punta Cana Dominican Republic 175--184. 
https:\/\/aclanthology.org\/2021.emnlp-demo.21  Quentin Lhoest Albert Villanova del Moral Yacine Jernite Abhishek Thakur Patrick von Platen Suraj Patil Julien Chaumond Mariama Drame Julien Plu Lewis Tunstall Joe Davison Mario Gunjan Chhablani Bhavitvya Malik Simon Brandeis Teven Le Scao Victor Sanh Canwen Xu Nicolas Patry Angelina McMillan-Major Philipp Schmid Sylvain Gugger Cl\u00e9ment Delangue Th\u00e9o Matussi\u00e8re Lysandre Debut Stas Bekman Pierric Cistac Thibault Goehringer Victor Mustar Fran\u00e7ois Lagunas Alexander Rush and Thomas Wolf. 2021. Datasets: A Community Library for Natural Language Processing. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics Online and Punta Cana Dominican Republic 175--184. https:\/\/aclanthology.org\/2021.emnlp-demo.21","DOI":"10.18653\/v1\/2021.emnlp-demo.21"},{"key":"e_1_3_2_2_30_1","volume-title":"Incorporating External Knowledge to Answer Open-Domain Visual Questions with Dynamic Memory Networks. arXiv:1712.00733 [cs] (Dec","author":"Li Guohao","year":"2017","unstructured":"Guohao Li , Hang Su , and Wenwu Zhu . 2017. Incorporating External Knowledge to Answer Open-Domain Visual Questions with Dynamic Memory Networks. arXiv:1712.00733 [cs] (Dec . 2017 ). http:\/\/arxiv.org\/abs\/1712.00733 Guohao Li, Hang Su, and Wenwu Zhu. 2017. Incorporating External Knowledge to Answer Open-Domain Visual Questions with Dynamic Memory Networks. arXiv:1712.00733 [cs] (Dec. 2017). http:\/\/arxiv.org\/abs\/1712.00733"},{"key":"e_1_3_2_2_31_1","volume-title":"Computer Vision -- ECCV 2014 (Lecture Notes in Computer Science), David Fleet, Tomas Pajdla","author":"Lin Tsung-Yi","unstructured":"Tsung-Yi Lin , Michael Maire , Serge Belongie , James Hays , Pietro Perona , Deva Ramanan , Piotr Doll\u00e1r , and C. Lawrence Zitnick . 2014. Microsoft COCO: Common Objects in Context . 
In Computer Vision -- ECCV 2014 (Lecture Notes in Computer Science), David Fleet, Tomas Pajdla , Bernt Schiele , and Tinne Tuytelaars (Eds.). Springer International Publishing , Cham, 740--755. https:\/\/doi.org\/10.1007\/978- 3--319--10602--1_48 10.1007\/978- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll\u00e1r, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. In Computer Vision -- ECCV 2014 (Lecture Notes in Computer Science), David Fleet, Tomas Pajdla, Bernt Schiele, and Tinne Tuytelaars (Eds.). Springer International Publishing, Cham, 740--755. https:\/\/doi.org\/10.1007\/978- 3--319--10602--1_48"},{"key":"e_1_3_2_2_32_1","volume-title":"A Replication Study of Dense Passage Retriever. arXiv:2104.05740 [cs] (April","author":"Ma Xueguang","year":"2021","unstructured":"Xueguang Ma , Kai Sun , Ronak Pradeep , and Jimmy Lin . 2021. A Replication Study of Dense Passage Retriever. arXiv:2104.05740 [cs] (April 2021 ). http: \/\/arxiv.org\/abs\/2104.05740 Xueguang Ma, Kai Sun, Ronak Pradeep, and Jimmy Lin. 2021. A Replication Study of Dense Passage Retriever. arXiv:2104.05740 [cs] (April 2021). http: \/\/arxiv.org\/abs\/2104.05740"},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00331"},{"key":"e_1_3_2_2_34_1","volume-title":"PyTorch: An Imperative Style","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke , Sam Gross , Francisco Massa , Adam Lerer , James Bradbury , Gregory Chanan , Trevor Killeen , Zeming Lin , Natalia Gimelshein , Luca Antiga , Alban Desmaison , Andreas Kopf , Edward Yang , Zachary DeVito , Martin Raison , Alykhan Tejani , Sasank Chilamkurthy , Benoit Steiner , Lu Fang , Junjie Bai , and Soumith Chintala . 2019. PyTorch: An Imperative Style , High-Performance Deep Learning Library . Advances in Neural Information Processing Systems 32 ( 2019 ). https:\/\/papers. 
nips.cc\/paper\/2019\/hash\/bdbca288fee7f92f2bfa9f7012727740-Abstract.html Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Advances in Neural Information Processing Systems 32 (2019). https:\/\/papers. nips.cc\/paper\/2019\/hash\/bdbca288fee7f92f2bfa9f7012727740-Abstract.html"},{"key":"e_1_3_2_2_35_1","volume-title":"James Thorne, Yacine Jernite, Vladimir Karpukhin, Jean Maillard, Vassilis Plachouras, Tim Rockt\u00e4schel, and Sebastian Riedel.","author":"Petroni Fabio","year":"2021","unstructured":"Fabio Petroni , Aleksandra Piktus , Angela Fan , Patrick Lewis , Majid Yazdani , Nicola De Cao , James Thorne, Yacine Jernite, Vladimir Karpukhin, Jean Maillard, Vassilis Plachouras, Tim Rockt\u00e4schel, and Sebastian Riedel. 2021 . KILT: a Benchmark for Knowledge Intensive Language Tasks. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics , Online, 2523--2544. https:\/\/doi.org\/10.18653\/v1\/2021.naacl-main.200 10.18653\/v1 Fabio Petroni, Aleksandra Piktus, Angela Fan, Patrick Lewis, Majid Yazdani, Nicola De Cao, James Thorne, Yacine Jernite, Vladimir Karpukhin, Jean Maillard, Vassilis Plachouras, Tim Rockt\u00e4schel, and Sebastian Riedel. 2021. KILT: a Benchmark for Knowledge Intensive Language Tasks. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, Online, 2523--2544. 
https:\/\/doi.org\/10.18653\/v1\/2021.naacl-main.200"},{"key":"e_1_3_2_2_36_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https:\/\/openaccess.thecvf.com\/content_cvpr_2018\/html\/Radenovic_ Revisiting_Oxford_and_CVPR_2018_paper.html","author":"Filip","year":"2018","unstructured":"Filip Radenovi?, Ahmet Iscen , Giorgos Tolias , Yannis Avrithis , and Ondej Chum . 2018 . Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https:\/\/openaccess.thecvf.com\/content_cvpr_2018\/html\/Radenovic_ Revisiting_Oxford_and_CVPR_2018_paper.html Filip Radenovi?, Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, and Ondej Chum. 2018. Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https:\/\/openaccess.thecvf.com\/content_cvpr_2018\/html\/Radenovic_ Revisiting_Oxford_and_CVPR_2018_paper.html"},{"key":"e_1_3_2_2_37_1","volume-title":"International Conference on Machine Learning. PMLR, 8748--8763","author":"Radford Alec","year":"2021","unstructured":"Alec Radford , Jong Wook Kim , Chris Hallacy , Aditya Ramesh , Gabriel Goh , Sandhini Agarwal , Girish Sastry , Amanda Askell , Pamela Mishkin , Jack Clark , 2021 . Learning transferable visual models from natural language supervision . In International Conference on Machine Learning. PMLR, 8748--8763 . Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. 
PMLR, 8748--8763."},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1264"},{"key":"e_1_3_2_2_39_1","volume-title":"Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research)","volume":"139","author":"Ramesh Aditya","year":"2021","unstructured":"Aditya Ramesh , Mikhail Pavlov , Gabriel Goh , Scott Gray , Chelsea Voss , Alec Radford , Mark Chen , and Ilya Sutskever . 2021 . Zero-Shot Text-to-Image Generation . In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research , Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 8821--8831 . https:\/\/proceedings.mlr.press\/v139\/ramesh21a.html Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. 2021. Zero-Shot Text-to-Image Generation. In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 8821--8831. https:\/\/proceedings.mlr.press\/v139\/ramesh21a.html"},{"key":"e_1_3_2_2_40_1","volume-title":"MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding","author":"Reddy Revanth Gangi","year":"2021","unstructured":"Revanth Gangi Reddy , Xilin Rui , Manling Li , Xudong Lin , Haoyang Wen , Jaemin Cho , Lifu Huang , Mohit Bansal , Avirup Sil , Shih-Fu Chang , Alexander Schwing , and Heng Ji. 2021. MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding. (Dec . 2021 ). https:\/\/arxiv.org\/abs\/2112.10728v1 Revanth Gangi Reddy, Xilin Rui, Manling Li, Xudong Lin, Haoyang Wen, Jaemin Cho, Lifu Huang, Mohit Bansal, Avirup Sil, Shih-Fu Chang, Alexander Schwing, and Heng Ji. 2021. MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding. (Dec. 2021). 
https: \/\/arxiv.org\/abs\/2112.10728v1"},{"key":"e_1_3_2_2_41_1","volume-title":"Third Text REtrieval Conference (TREC-3) (NIST Special Publication","volume":"126","author":"Robertson Stephen E.","year":"1995","unstructured":"Stephen E. Robertson , Steve Walker , Susan Jones , Micheline M. Hancock-Beaulieu , and Mike Gatford . 1995 . Okapi at TREC-3 . In Third Text REtrieval Conference (TREC-3) (NIST Special Publication , Vol. 500--225), Donna K. Harman (Ed.). National Institute of Standards and Technology (NIST), 109-- 126 . https:\/\/citeseerx.ist.psu. edu\/viewdoc\/download?doi=10.1.1.32.9922&rep=rep1&type=pdf Stephen E. Robertson, Steve Walker, Susan Jones, Micheline M. Hancock-Beaulieu, and Mike Gatford. 1995. Okapi at TREC-3. In Third Text REtrieval Conference (TREC-3) (NIST Special Publication, Vol. 500--225), Donna K. Harman (Ed.). National Institute of Standards and Technology (NIST), 109--126. https:\/\/citeseerx.ist.psu. edu\/viewdoc\/download?doi=10.1.1.32.9922&rep=rep1&type=pdf"},{"key":"e_1_3_2_2_42_1","first-page":"413","volume-title":"Visuo-Linguistic Question Answering (VLQA) Challenge. In Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Sampat Shailaja Keyur","year":"2020","unstructured":"Shailaja Keyur Sampat , Yezhou Yang , and Chitta Baral . 2020 . Visuo-Linguistic Question Answering (VLQA) Challenge. In Findings of the Association for Computational Linguistics: EMNLP 2020 . Association for Computational Linguistics, Online, 4606--4616. https:\/\/doi.org\/10. 18653\/v1\/2020.findings-emnlp. 413 10.18653\/v1 Shailaja Keyur Sampat, Yezhou Yang, and Chitta Baral. 2020. Visuo-Linguistic Question Answering (VLQA) Challenge. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 4606--4616. 
https:\/\/doi.org\/10.18653\/v1\/2020.findings-emnlp.413"},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018876"},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"crossref","unstructured":"Ali Sharif Razavian Hossein Azizpour Josephine Sullivan and Stefan Carlsson. 2014. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. https:\/\/www.cv-foundation.org\/openaccess\/ content_cvpr_workshops_2014\/W15\/html\/Razavian_CNN_Features_Off-theShelf_2014_CVPR_paper.html  Ali Sharif Razavian Hossein Azizpour Josephine Sullivan and Stefan Carlsson. 2014. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. https:\/\/www.cv-foundation.org\/openaccess\/ content_cvpr_workshops_2014\/W15\/html\/Razavian_CNN_Features_Off-theShelf_2014_CVPR_paper.html","DOI":"10.1109\/CVPRW.2014.131"},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/1321440.1321528"},{"key":"e_1_3_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009962928226"},{"key":"e_1_3_2_2_47_1","volume-title":"Tables and Images. In ICLR","author":"Talmor Alon","year":"2021","unstructured":"Alon Talmor , Ori Yoran , Amnon Catav , Dan Lahav , Yizhong Wang , Akari Asai , Gabriel Ilharco , Hannaneh Hajishirzi , and Jonathan Berant . 2021 . MultiModalQA: Complex Question Answering over Text , Tables and Images. In ICLR 2021. https: \/\/openreview.net\/forum?id=ee6W5UgQLa Alon Talmor, Ori Yoran, Amnon Catav, Dan Lahav, Yizhong Wang, Akari Asai, Gabriel Ilharco, Hannaneh Hajishirzi, and Jonathan Berant. 2021. MultiModalQA: Complex Question Answering over Text, Tables and Images. In ICLR 2021. 
https:\/\/openreview.net\/forum?id=ee6W5UgQLa"},{"key":"e_1_3_2_2_48_1","volume-title":"Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'00)","author":"Ellen","unstructured":"Ellen M. Voorhees and Dawn M. Tice. 2000. Building a question answering test collection. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'00). ACM Press, Athens, Greece, 200--207. https:\/\/doi.org\/10.1145\/345508.345577"},{"key":"e_1_3_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2017\/179"},{"key":"e_1_3_2_2_50_1","volume-title":"FVQA: Fact-Based Visual Question Answering","author":"Wang Peng","year":"2018","unstructured":"Peng Wang, Qi Wu, Chunhua Shen, Anthony Dick, and Anton van den Hengel. 2018. FVQA: Fact-Based Visual Question Answering. IEEE transactions on pattern analysis and machine intelligence 40, 10 (2018), 2413--2427. https:\/\/doi.org\/10.1109\/TPAMI.2017.2754246"},{"key":"e_1_3_2_2_51_1","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Wang Zhiguo","year":"2019","unstructured":"Zhiguo Wang, Patrick Ng, Xiaofei Ma, Ramesh Nallapati, and Bing Xiang. 2019. 
Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 5878--5882. https:\/\/doi.org\/10.18653\/v1\/D19-1599"},{"key":"e_1_3_2_2_52_1","volume-title":"Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush.","author":"Wolf Thomas","year":"2020","unstructured":"Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, R\u00e9mi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2020. HuggingFace's Transformers: State-of-the-art Natural Language Processing. arXiv:1910.03771 [cs] (July 2020). 
http:\/\/arxiv.org\/abs\/1910.03771"},{"key":"e_1_3_2_2_53_1","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics","author":"Yang Zhilin","year":"2018","unstructured":"Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William Cohen, Ruslan Salakhutdinov, and Christopher D. Manning. 2018. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, 2369--2380. https:\/\/doi.org\/10.18653\/v1\/D18-1259"},
{"key":"e_1_3_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2016.2603342"}],"event":{"name":"SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval","location":"Madrid Spain","acronym":"SIGIR '22","sponsor":["SIGIR ACM Special Interest Group on Information Retrieval"]},"container-title":["Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3477495.3531753","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3477495.3531753","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:02:07Z","timestamp":1750186927000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3477495.3531753"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,6]]},"references-count":54,"alternative-id":["10.1145\/3477495.3531753","10.1145\/3477495"],"URL":"https:\/\/doi.org\/10.1145\/3477495.3531753","relation":{},"subject":[],"published":{"date-parts":[[2022,7,6]]},"assertion":[{"value":"2022-07-07","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}