{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:30:32Z","timestamp":1750221032777,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":39,"publisher":"ACM","license":[{"start":{"date-parts":[[2018,10,15]],"date-time":"2018-10-15T00:00:00Z","timestamp":1539561600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Natural Science Foundation of China","award":["61472422","61332016"],"award-info":[{"award-number":["61472422","61332016"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2018,10,15]]},"DOI":"10.1145\/3240508.3240662","type":"proceedings-article","created":{"date-parts":[[2018,10,18]],"date-time":"2018-10-18T17:52:08Z","timestamp":1539885128000},"page":"1002-1010","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Enhancing Visual Question Answering Using Dropout"],"prefix":"10.1145","author":[{"given":"Zhiwei","family":"Fang","sequence":"first","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences &amp; University of Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jing","family":"Liu","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences &amp; University of Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanyuan","family":"Qiao","sequence":"additional","affiliation":[{"name":"University of Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qu","family":"Tang","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yong","family":"Li","sequence":"additional","affiliation":[{"name":"Business Growth BU, JD.com, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hanqing","family":"Lu","sequence":"additional","affiliation":[{"name":"Institute of Automation, Chinese Academy of Sciences &amp; University of Chinese Academy of Sciences, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2018,10,15]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Don't Just Assume","author":"Agrawal Aishwarya","year":"2017","unstructured":"Aishwarya Agrawal , Dhruv Batra , Devi Parikh , and Aniruddha Kembhavi . 2017. Don't Just Assume ; Look and Answer: Overcoming Priors for Visual Question Answering . arXiv preprint arXiv:1712.00377 ( 2017 ). Aishwarya Agrawal, Dhruv Batra, Devi Parikh, and Aniruddha Kembhavi. 2017. Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering. arXiv preprint arXiv:1712.00377 (2017)."},{"key":"e_1_3_2_1_2_1","volume-title":"Bottom-up and top-down attention for image captioning and VQA. arXiv preprint arXiv:1707.07998","author":"Anderson Peter","year":"2017","unstructured":"Peter Anderson , Xiaodong He , Chris Buehler , Damien Teney , Mark Johnson , Stephen Gould , and Lei Zhang . 2017. Bottom-up and top-down attention for image captioning and VQA. arXiv preprint arXiv:1707.07998 ( 2017 ). Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. 2017. Bottom-up and top-down attention for image captioning and VQA. arXiv preprint arXiv:1707.07998 (2017)."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.12"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.279"},{"key":"e_1_3_2_1_5_1","volume-title":"The IEEE International Conference on Computer Vision (ICCV)","volume":"1","author":"Hedi","year":"2017","unstructured":"Hedi Ben-younes, R\u00e9mi Cadene , Matthieu Cord , and Nicolas Thome . 2017 . Mutan: Multimodal tucker fusion for visual question answering . In The IEEE International Conference on Computer Vision (ICCV) , Vol. 1 . 3. Hedi Ben-younes, R\u00e9mi Cadene, Matthieu Cord, and Nicolas Thome. 2017. Mutan: Multimodal tucker fusion for visual question answering. In The IEEE International Conference on Computer Vision (ICCV), Vol. 1. 3."},{"key":"e_1_3_2_1_6_1","volume-title":"Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio.","author":"Cho Kyunghyun","year":"2014","unstructured":"Kyunghyun Cho , Bart Van Merri\u00ebnboer , Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014 . Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014). Kyunghyun Cho, Bart Van Merri\u00ebnboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014)."},{"key":"e_1_3_2_1_7_1","volume-title":"Visual madlibs: Fill in the blank image generation and question answering. arXiv preprint. arXiv","author":"Yu L","year":"2015","unstructured":"L ea Yu . 2015. Visual madlibs: Fill in the blank image generation and question answering. arXiv preprint. arXiv , Vol. 1506 ( 2015 ), 3. L ea Yu. 2015. Visual madlibs: Fill in the blank image generation and question answering. arXiv preprint. arXiv , Vol. 1506 (2015), 3."},{"key":"e_1_3_2_1_8_1","volume-title":"Daylen Yang, Anna Rohrbach, Trevor Darrell, and Marcus Rohrbach.","author":"Fukui Akira","year":"2016","unstructured":"Akira Fukui , Dong Huk Park , Daylen Yang, Anna Rohrbach, Trevor Darrell, and Marcus Rohrbach. 2016 . Multimodal compact bilinear pooling for visual question answering and visual grounding. arXiv preprint arXiv:1606.01847 (2016). Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, and Marcus Rohrbach. 2016. Multimodal compact bilinear pooling for visual question answering and visual grounding. arXiv preprint arXiv:1606.01847 (2016)."},{"key":"e_1_3_2_1_9_1","unstructured":"Haoyuan Gao Junhua Mao Jie Zhou Zhiheng Huang Lei Wang and Wei Xu. 2015. Are you talking to a machine? dataset and methods for multilingual image question. In Advances in neural information processing systems. 2296--2304.   Haoyuan Gao Junhua Mao Jie Zhou Zhiheng Huang Lei Wang and Wei Xu. 2015. Are you talking to a machine? dataset and methods for multilingual image question. In Advances in neural information processing systems. 2296--2304."},{"key":"e_1_3_2_1_10_1","first-page":"9","article-title":"Making the V in VQA matter: Elevating the role of image understanding in Visual Question Answering","volume":"1","author":"Goyal Yash","year":"2017","unstructured":"Yash Goyal , Tejas Khot , Douglas Summers-Stay , Dhruv Batra , and Devi Parikh . 2017 . Making the V in VQA matter: Elevating the role of image understanding in Visual Question Answering . In CVPR , Vol. 1. 9 . Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, and Devi Parikh. 2017. Making the V in VQA matter: Elevating the role of image understanding in Visual Question Answering. In CVPR, Vol. 1. 9.","journal-title":"CVPR"},{"key":"e_1_3_2_1_11_1","volume-title":"Deep Residual Learning for Image Recognition. CoRR","author":"He Kaiming","year":"2015","unstructured":"Kaiming He , Xiangyu Zhang , Shaoqing Ren , and Jian Sun . 2015. Deep Residual Learning for Image Recognition. CoRR , Vol. abs\/ 1512 .03385 ( 2015 ). arxiv: 1512.03385 http:\/\/arxiv.org\/abs\/1512.03385 Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. CoRR , Vol. abs\/1512.03385 (2015). arxiv: 1512.03385 http:\/\/arxiv.org\/abs\/1512.03385"},{"key":"e_1_3_2_1_12_1","volume-title":"Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580","author":"Hinton Geoffrey E","year":"2012","unstructured":"Geoffrey E Hinton , Nitish Srivastava , Alex Krizhevsky , Ilya Sutskever , and Ruslan R Salakhutdinov . 2012. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580 ( 2012 ). Geoffrey E Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R Salakhutdinov. 2012. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580 (2012)."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"volume-title":"Advances in psychology .","author":"Jordan Michael I","key":"e_1_3_2_1_14_1","unstructured":"Michael I Jordan . 1997. Serial order: A parallel distributed processing approach . In Advances in psychology . Vol. 121 . Elsevier , 471--495. Michael I Jordan. 1997. Serial order: A parallel distributed processing approach. In Advances in psychology . Vol. 121. Elsevier, 471--495."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.217"},{"key":"e_1_3_2_1_16_1","volume-title":"The 5th International Conference on Learning Representations .","author":"Kim Jin-Hwa","year":"2017","unstructured":"Jin-Hwa Kim , Kyoung Woon On , Woosang Lim , Jeonghee Kim , Jung-Woo Ha , and Byoung-Tak Zhang . 2017 . Hadamard Product for Low-rank Bilinear Pooling . In The 5th International Conference on Learning Representations . Jin-Hwa Kim, Kyoung Woon On, Woosang Lim, Jeonghee Kim, Jung-Woo Ha, and Byoung-Tak Zhang. 2017. Hadamard Product for Low-rank Bilinear Pooling . In The 5th International Conference on Learning Representations ."},{"key":"e_1_3_2_1_17_1","volume-title":"Kingma and Jimmy Ba","author":"Diederik","year":"2014","unstructured":"Diederik P. Kingma and Jimmy Ba . 2014 . Adam : A Method for Stochastic Optimization. CoRR , Vol. abs\/ 1412 .6980 (2014). arxiv: 1412.6980 http:\/\/arxiv.org\/abs\/1412.6980 Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. CoRR , Vol. abs\/1412.6980 (2014). arxiv: 1412.6980 http:\/\/arxiv.org\/abs\/1412.6980"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0981-7"},{"key":"e_1_3_2_1_19_1","unstructured":"Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097--1105.   Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097--1105."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.0910734106"},{"key":"e_1_3_2_1_22_1","unstructured":"Jiasen Lu Jianwei Yang Dhruv Batra and Devi Parikh. 2016. Hierarchical question-image co-attention for visual question answering. In Advances In Neural Information Processing Systems. 289--297.   Jiasen Lu Jianwei Yang Dhruv Batra and Devi Parikh. 2016. Hierarchical question-image co-attention for visual question answering. In Advances In Neural Information Processing Systems. 289--297."},{"key":"e_1_3_2_1_23_1","unstructured":"Mateusz Malinowski and Mario Fritz. 2014. A multi-world approach to question answering about real-world scenes based on uncertain input. In Advances in neural information processing systems. 1682--1690.   Mateusz Malinowski and Mario Fritz. 2014. A multi-world approach to question answering about real-world scenes based on uncertain input. In Advances in neural information processing systems. 1682--1690."},{"key":"e_1_3_2_1_24_1","volume-title":"Dual attention networks for multimodal reasoning and matching. arXiv preprint arXiv:1611.00471","author":"Nam Hyeonseob","year":"2016","unstructured":"Hyeonseob Nam , Jung-Woo Ha , and Jeonghee Kim . 2016. Dual attention networks for multimodal reasoning and matching. arXiv preprint arXiv:1611.00471 ( 2016 ). Hyeonseob Nam, Jung-Woo Ha, and Jeonghee Kim. 2016. Dual attention networks for multimodal reasoning and matching. arXiv preprint arXiv:1611.00471 (2016)."},{"key":"e_1_3_2_1_25_1","first-page":"5","article-title":"Image question answering: A visual semantic embedding model and a new dataset","volume":"1","author":"Ren Mengye","year":"2015","unstructured":"Mengye Ren , Ryan Kiros , and Richard Zemel . 2015 . Image question answering: A visual semantic embedding model and a new dataset . Proc. Advances in Neural Inf. Process. Syst , Vol. 1 , 2 (2015), 5 . Mengye Ren, Ryan Kiros, and Richard Zemel. 2015. Image question answering: A visual semantic embedding model and a new dataset. Proc. Advances in Neural Inf. Process. Syst , Vol. 1, 2 (2015), 5.","journal-title":"Proc. Advances in Neural Inf. Process. Syst"},{"key":"e_1_3_2_1_26_1","unstructured":"Idan Schwartz Alexander Schwing and Tamir Hazan. 2017. High-Order Attention Models for Visual Question Answering. In Advances in Neural Information Processing Systems. 3667--3677.  Idan Schwartz Alexander Schwing and Tamir Hazan. 2017. High-Order Attention Models for Visual Question Answering. In Advances in Neural Information Processing Systems. 3667--3677."},{"key":"e_1_3_2_1_27_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman . 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 ( 2014 ). Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.5555\/2627435.2670313"},{"key":"e_1_3_2_1_29_1","unstructured":"Dendi Suhubdy. 2018. Fraternal Dropout. (2018).  Dendi Suhubdy. 2018. Fraternal Dropout. (2018)."},{"key":"e_1_3_2_1_30_1","volume-title":"et almbox","author":"Szegedy Christian","year":"2015","unstructured":"Christian Szegedy , Wei Liu , Yangqing Jia , Pierre Sermanet , Scott Reed , Dragomir Anguelov , Dumitru Erhan , Vincent Vanhoucke , Andrew Rabinovich , et almbox . 2015 . Going deeper with convolutions. Cvpr . Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, et almbox. 2015. Going deeper with convolutions. Cvpr."},{"key":"e_1_3_2_1_31_1","unstructured":"Damien Teney Peter Anderson Xiaodong He and Anton van den Hengel. 2017. Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge . (2017). arxiv: 1708.02711 http:\/\/arxiv.org\/abs\/1708.02711  Damien Teney Peter Anderson Xiaodong He and Anton van den Hengel. 2017. Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge . (2017). arxiv: 1708.02711 http:\/\/arxiv.org\/abs\/1708.02711"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2754246"},{"key":"e_1_3_2_1_33_1","volume-title":"Explicit knowledge-based reasoning for visual question answering. arXiv preprint arXiv:1511.02570","author":"Wang Peng","year":"2015","unstructured":"Peng Wang , Qi Wu , Chunhua Shen , Anton van den Hengel , and Anthony Dick . 2015. Explicit knowledge-based reasoning for visual question answering. arXiv preprint arXiv:1511.02570 ( 2015 ). Peng Wang, Qi Wu, Chunhua Shen, Anton van den Hengel, and Anthony Dick. 2015. Explicit knowledge-based reasoning for visual question answering. arXiv preprint arXiv:1511.02570 (2015)."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2017.05.001"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.10"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.202"},{"key":"e_1_3_2_1_37_1","volume-title":"Recurrent neural network regularization. arXiv preprint arXiv:1409.2329","author":"Zaremba Wojciech","year":"2014","unstructured":"Wojciech Zaremba , Ilya Sutskever , and Oriol Vinyals . 2014. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329 ( 2014 ). Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. 2014. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329 (2014)."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.542"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.540"}],"event":{"name":"MM '18: ACM Multimedia Conference","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Seoul Republic of Korea","acronym":"MM '18"},"container-title":["Proceedings of the 26th ACM international conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3240508.3240662","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3240508.3240662","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:43:31Z","timestamp":1750207411000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3240508.3240662"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,10,15]]},"references-count":39,"alternative-id":["10.1145\/3240508.3240662","10.1145\/3240508"],"URL":"https:\/\/doi.org\/10.1145\/3240508.3240662","relation":{},"subject":[],"published":{"date-parts":[[2018,10,15]]},"assertion":[{"value":"2018-10-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}