{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,5]],"date-time":"2025-11-05T06:54:55Z","timestamp":1762325695502,"version":"3.41.0"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2024,10,8]],"date-time":"2024-10-08T00:00:00Z","timestamp":1728345600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Key Research and Development Program of China","award":["2020YFB1406600"],"award-info":[{"award-number":["2020YFB1406600"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["U21B2024, 62002257"],"award-info":[{"award-number":["U21B2024, 62002257"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100019062","name":"Tianjin Research Innovation Project for Postgraduate Students","doi-asserted-by":"crossref","award":["2022BKY121"],"award-info":[{"award-number":["2022BKY121"]}],"id":[{"id":"10.13039\/501100019062","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Web"],"published-print":{"date-parts":[[2024,11,30]]},"abstract":"<jats:p>Visual Question Answering (VQA) is the task of predicting the answer to a question based on the content of an image. However, recent VQA methods have come to rely more on language priors between questions and answers than on the image content. To address this issue, many debiasing methods have been proposed to reduce language bias in model reasoning. However, bias can be divided into two categories: good bias and bad bias. Good bias can benefit answer prediction, while bad bias may lead the model to rely on unrelated information. 
Therefore, instead of indiscriminately excluding both good and bad bias as existing debiasing methods do, we propose a bias discrimination module to distinguish between them. Additionally, bad bias may reduce the model\u2019s reliance on image content during answer reasoning, so that the model pays little attention to updating image features. To tackle this, we leverage Markov theory to construct a Markov field with image regions and question words as nodes. This helps update the features of both image regions and question words, thereby facilitating more accurate and comprehensive reasoning about both the image content and the question. We evaluate our network on the VQA v2 and VQA-CP v2 datasets and conduct extensive quantitative and qualitative studies to verify its effectiveness. Experimental results show that our network performs significantly better than previous state-of-the-art methods.<\/jats:p>","DOI":"10.1145\/3616399","type":"journal-article","created":{"date-parts":[[2023,8,28]],"date-time":"2023-08-28T11:46:39Z","timestamp":1693223199000},"page":"1-13","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Multi-stage reasoning on introspecting and revising bias for visual question answering"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5755-9145","authenticated-orcid":false,"given":"L.","family":"An-An","sequence":"first","affiliation":[{"name":"School of Electrical and Information Engineering and also with the Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Tianjin, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8479-2807","authenticated-orcid":false,"given":"Lu","family":"Zimu","sequence":"additional","affiliation":[{"name":"School of Electrical and Information Engineering, Tianjin University, Tianjin, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7526-4356","authenticated-orcid":false,"given":"Xu","family":"Ning","sequence":"additional","affiliation":[{"name":"School of Electrical and Information Engineering, Tianjin University, Tianjin, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6406-4896","authenticated-orcid":false,"given":"Liu","family":"Min","sequence":"additional","affiliation":[{"name":"College of Electrical and Information Engineering, Hunan University, Changsha and also with the National Engineering Laboratory for Robot Visual Perception and Control Technology, Hunan University, Changsha, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1204-0512","authenticated-orcid":false,"given":"Yan","family":"Chenggang","sequence":"additional","affiliation":[{"name":"the Institute of Information and Control, Hangzhou Dianzi University, Hangzhou China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8788-1725","authenticated-orcid":false,"given":"Zheng","family":"Bolun","sequence":"additional","affiliation":[{"name":"the Institute of Information and Control, Hangzhou Dianzi University, Hangzhou China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7647-0602","authenticated-orcid":false,"given":"Lv","family":"Bo","sequence":"additional","affiliation":[{"name":"the 30th research institute of CETC, Chengdu China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-3563-8488","authenticated-orcid":false,"given":"Duan","family":"Yulong","sequence":"additional","affiliation":[{"name":"the 30th research institute of CETC, Chengdu China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7824-0985","authenticated-orcid":false,"given":"Shao","family":"Zhuang","sequence":"additional","affiliation":[{"name":"Warwick Manufacturing Group, University of Warwick, Coventry United Kingdom of Great Britain and Northern 
Ireland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9765-0998","authenticated-orcid":false,"given":"Li","family":"Xuanya","sequence":"additional","affiliation":[{"name":"Baidu Inc, Beijing China"}]}],"member":"320","published-online":{"date-parts":[[2024,10,8]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.279"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00444"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.10"},{"key":"e_1_3_2_5_2","article-title":"Weakly supervised object localization and detection: A survey","volume":"2104","author":"Zhang Dingwen","year":"2021","unstructured":"Dingwen Zhang, Junwei Han, Gong Cheng, and Ming-Hsuan Yang. 2021. Weakly supervised object localization and detection: A survey. CoRR abs\/2104.07918 (2021).","journal-title":"CoRR"},{"key":"e_1_3_2_6_2","article-title":"Counterfactual visual dialog: Robust commonsense knowledge learning from unbiased training","author":"Liu An-An","year":"2023","unstructured":"An-An Liu, Chenxi Huang, Ning Xu, Hongshuo Tian, Jing Liu, and Yongdong Zhang. 2023. Counterfactual visual dialog: Robust commonsense knowledge learning from unbiased training. IEEE Trans. Multim (2023).","journal-title":"IEEE Trans. 
Multim"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01081"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01251"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3460231.3473325"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3336191.3371769"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3050918"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2021.3121062"},{"key":"e_1_3_2_13_2","first-page":"1106","volume-title":"NeurIPS","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In NeurIPS. 1106\u20131114."},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/JBHI.2019.2891526"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1179"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.499"},{"volume-title":"CVPR.","author":"Yu Zhou","key":"e_1_3_2_18_2","unstructured":"Zhou Yu, Jun Yu, Yuhao Cui, Dacheng Tao, and Qi Tian. Deep modular co-attention networks for visual question answering. In CVPR. 
6281\u20136290."},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.540"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2021.3107035"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3220036"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00331"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01237-3_28"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00807"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2754246"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00268"},{"key":"e_1_3_2_27_2","first-page":"8601","volume-title":"NeurIPS","author":"Wu Jialin","year":"2019","unstructured":"Jialin Wu and Raymond J. Mooney. 2019. Self-critical reasoning for robust visual question answering. In NeurIPS. 8601\u20138611."},{"key":"e_1_3_2_28_2","unstructured":"R\u00e9mi Cad\u00e8ne Corentin Dancette Hedi BenYounes Matthieu Cord and Devi Parikh. 2019. RUBi: Reducing unimodal biases in visual question answering. Advances in neural information processing systems 2019 32."},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1418"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.63"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICME51207.2021.9428165"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.265"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2020\/151"},{"key":"e_1_3_2_34_2","first-page":"1548","volume-title":"NeurIPS","author":"Ramakrishnan Sainandan","year":"2018","unstructured":"Sainandan Ramakrishnan, Aishwarya Agrawal, and Stefan Lee. 2018. Overcoming language priors in visual question answering with adversarial regularization. In NeurIPS. 
1548\u20131558."},{"volume-title":"AAAI.","author":"Jing Chenchen","key":"e_1_3_2_35_2","unstructured":"Chenchen Jing, Yuwei Wu, Xiaoxun Zhang, Yunde Jia, and Qi Wu. Overcoming language priors in VQA via decomposed linguistic representations. In AAAI. 11181\u201311188."},{"key":"e_1_3_2_36_2","first-page":"18","volume-title":"ECCV","author":"V. Gouthaman K.","year":"2020","unstructured":"Gouthaman K. V. and Anurag Mittal. 2020. Reducing language biases in visual question answering with visually-grounded question encoder. In ECCV, Vol. 12358. 18\u201334."},{"key":"e_1_3_2_37_2","article-title":"Relational inductive biases, deep learning, and graph networks","volume":"1806","author":"Battaglia Peter W.","year":"2018","unstructured":"Peter W. Battaglia, Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vin\u00edcius Flores Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, \u00c7aglar G\u00fcl\u00e7ehre, H. Francis Song, Andrew J. Ballard, Justin Gilmer, George E. Dahl, Ashish Vaswani, Kelsey R. Allen, Charles Nash, Victoria Langston, Chris Dyer, Nicolas Heess, Daan Wierstra, Pushmeet Kohli, Matthew M. Botvinick, Oriol Vinyals, Yujia Li, and Razvan Pascanu. 2018. Relational inductive biases, deep learning, and graph networks. CoRR abs\/1806.01261 (2018).","journal-title":"CoRR"},{"key":"e_1_3_2_38_2","first-page":"91","volume-title":"NeurIPS","author":"Ren Shaoqing","year":"2015","unstructured":"Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In NeurIPS. 91\u201399."},{"key":"e_1_3_2_39_2","volume-title":"ICLR","author":"Zhang Yan","year":"2018","unstructured":"Yan Zhang, Jonathon S. Hare, and Adam Pr\u00fcgel-Bennett. 2018. Learning to count objects in natural images for visual question answering. In ICLR."},{"key":"e_1_3_2_40_2","unstructured":"Norman E. Fenton Martin Neil Anthony C. Constantinou. 
The Book of Why: The New Science of Cause and Effect Basic Books (2018)."},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00522"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00636"},{"key":"e_1_3_2_43_2","volume-title":"ICLR","author":"Kingma Diederik P.","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In ICLR."},{"key":"e_1_3_2_44_2","unstructured":"PaddlePaddle An Easy-to-use Easy-to-learn Deep Learning Platform. 2019. Retrieved from http:\/\/www.paddlepaddle.org\/"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-018-1116-0"},{"key":"e_1_3_2_46_2","first-page":"8344","volume-title":"NeurIPS","author":"Norcliffe-Brown Will","year":"2018","unstructured":"Will Norcliffe-Brown, Stathis Vafeias, and Sarah Parisot. 2018. Learning conditioned graph structures for interpretable visual question answering. In NeurIPS. 8344\u20138353."},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00209"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00637"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350925"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00851"},{"key":"e_1_3_2_51_2","article-title":"Beyond bilinear: Generalized multi-modal factorized high-order pooling for visual question answering","volume":"1708","author":"Yu Zhou","year":"2017","unstructured":"Zhou Yu, Jun Yu, Chenchao Xiang, Jianping Fan, and Dacheng Tao. 2017. Beyond bilinear: Generalized multi-modal factorized high-order pooling for visual question answering. CoRR abs\/1708.03619 (2017).","journal-title":"CoRR"},{"key":"e_1_3_2_52_2","first-page":"1571","volume-title":"NeurIPS","author":"Kim Jin-Hwa","year":"2018","unstructured":"Jin-Hwa Kim, Jaehyun Jun, and Byoung-Tak Zhang. 2018. Bilinear attention networks. In NeurIPS. 
1571\u20131581."},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2943456"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.3004830"},{"key":"e_1_3_2_55_2","first-page":"16292","volume-title":"NeurIPS","author":"Niu Yulei","year":"2021","unstructured":"Yulei Niu and Hanwang Zhang. 2021. Introspective distillation for robust question answering. In NeurIPS. 16292\u201316304."}],"container-title":["ACM Transactions on the Web"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3616399","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3616399","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:45:31Z","timestamp":1750178731000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3616399"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,8]]},"references-count":54,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,11,30]]}},"alternative-id":["10.1145\/3616399"],"URL":"https:\/\/doi.org\/10.1145\/3616399","relation":{},"ISSN":["1559-1131","1559-114X"],"issn-type":[{"type":"print","value":"1559-1131"},{"type":"electronic","value":"1559-114X"}],"subject":[],"published":{"date-parts":[[2024,10,8]]},"assertion":[{"value":"2022-06-04","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-07-29","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}