{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,5]],"date-time":"2025-10-05T19:53:37Z","timestamp":1759694017153,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":38,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,10,15]],"date-time":"2019-10-15T00:00:00Z","timestamp":1571097600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Key Research and Development Program of China","award":["2017YFB1300201"],"award-info":[{"award-number":["2017YFB1300201"]}]},{"name":"Ministry of Education of P.R. China","award":["WK2100100030"],"award-info":[{"award-number":["WK2100100030"]}]},{"name":"National Natural Science Foundation of China","award":["61622211,61620106009"],"award-info":[{"award-number":["61622211,61620106009"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,10,15]]},"DOI":"10.1145\/3343031.3350969","type":"proceedings-article","created":{"date-parts":[[2019,10,21]],"date-time":"2019-10-21T16:32:26Z","timestamp":1571675546000},"page":"1184-1192","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":20,"title":["Question-Aware Tube-Switch Network for Video Question Answering"],"prefix":"10.1145","author":[{"given":"Tianhao","family":"Yang","sequence":"first","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}]},{"given":"Zheng-Jun","family":"Zha","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}]},{"given":"Hongtao","family":"Xie","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}]},{"given":"Meng","family":"Wang","sequence":"additional","affiliation":[{"name":"Hefei University of Technology, Hefei, China"}]},{"given":"Hanwang","family":"Zhang","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2019,10,15]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.279"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/2002472.2002497"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2018.06.069"},{"volume-title":"Empirical evaluation of gated recurrent neural networks on sequence modeling. NIPS","year":"2014","author":"Chung Junyoung","key":"e_1_3_2_1_4_1"},{"volume-title":"Devi Parikh, and Dhruv Batra.","year":"2017","author":"Das Abhishek","key":"e_1_3_2_1_5_1"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"volume-title":"Daylen Yang, Anna Rohrbach, Trevor Darrell, and Marcus Rohrbach.","year":"2016","author":"Fukui Akira","key":"e_1_3_2_1_7_1"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00688"},{"volume-title":"Statistical theory of extreme values and some practical applications: a series of lectures","author":"Gumbel Emil Julius","key":"e_1_3_2_1_9_1"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"volume-title":"Long short-term memory. Neural computation","year":"1997","author":"Hochreiter Sepp","key":"e_1_3_2_1_11_1"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.470"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.149"},{"volume-title":"Hadamard product for low-rank bilinear pooling. arXiv preprint arXiv:1610.04325","year":"2016","author":"Kim Jin-Hwa","key":"e_1_3_2_1_14_1"},{"volume-title":"Adam: A method for stochastic optimization. In ICLR .","year":"2015","author":"Kingma Diederik P","key":"e_1_3_2_1_15_1"},{"key":"e_1_3_2_1_16_1","unstructured":"Xiangpeng Li Jingkuan Song Lianli Gao Xianglong Liu Wenbing Huang Xiangnan He and Chuang Gan. 2019. Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering. In AAAI .  Xiangpeng Li Jingkuan Song Lianli Gao Xianglong Liu Wenbing Huang Xiangnan He and Chuang Gan. 2019. Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering. In AAAI ."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2017.2749509"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00642"},{"volume-title":"Explainability by Parsing: Neural Module Tree Networks for Natural Language Visual Grounding. arXiv preprint arXiv:1812.03299","year":"2018","author":"Liu Daqing","key":"e_1_3_2_1_19_1"},{"key":"e_1_3_2_1_20_1","unstructured":"Jiasen Lu Anitha Kannan Jianwei Yang Devi Parikh and Dhruv Batra. 2017. Best of both worlds: Transferring knowledge from discriminative learning to a generative visual dialog model. In NIPS .  Jiasen Lu Anitha Kannan Jianwei Yang Devi Parikh and Dhruv Batra. 2017. Best of both worlds: Transferring knowledge from discriminative learning to a generative visual dialog model. In NIPS ."},{"key":"e_1_3_2_1_21_1","unstructured":"Jiasen Lu Jianwei Yang Dhruv Batra and Devi Parikh. 2016. Hierarchical question-image co-attention for visual question answering. In NIPS .  Jiasen Lu Jianwei Yang Dhruv Batra and Devi Parikh. 2016. Hierarchical question-image co-attention for visual question answering. In NIPS ."},{"key":"e_1_3_2_1_22_1","unstructured":"Hyeonseob Nam Jung-Woo Ha and Jeonghee Kim. 2017. Dual attention networks for multimodal reasoning and matching. In CVPR .  Hyeonseob Nam Jung-Woo Ha and Jeonghee Kim. 2017. Dual attention networks for multimodal reasoning and matching. In CVPR ."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_2_1_24_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In ICLR .  Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In ICLR ."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . 4223--4232","author":"Teney Damien","key":"e_1_3_2_1_25_1"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.510"},{"key":"e_1_3_2_1_27_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 5998--6008.  Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 5998--6008."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2012.2185041"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/1641661.1641671"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123427"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.571"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Zichao Yang Xiaodong He Jianfeng Gao Li Deng and Alex Smola. 2016. Stacked attention networks for image question answering. In CVPR .  Zichao Yang Xiaodong He Jianfeng Gao Li Deng and Alex Smola. 2016. Stacked attention networks for image question answering. In CVPR .","DOI":"10.1109\/CVPR.2016.10"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00142"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.202"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/3298023.3298196"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"crossref","unstructured":"Zhou Zhao Qifan Yang Deng Cai Xiaofei He and Yueting Zhuang. 2017. Video Question Answering via Hierarchical Spatio-Temporal Attention Networks.. In IJCAI . 3518--3524.  Zhou Zhao Qifan Yang Deng Cai Xiaofei He and Yueting Zhuang. 2017. Video Question Answering via Hierarchical Spatio-Temporal Attention Networks.. In IJCAI . 3518--3524.","DOI":"10.24963\/ijcai.2017\/492"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"crossref","unstructured":"Chen Zhu Yanpeng Zhao Shuaiyi Huang Kewei Tu and Yi Ma. 2017b. Structured attentions for visual question answering. In ICCV .  Chen Zhu Yanpeng Zhao Shuaiyi Huang Kewei Tu and Yi Ma. 2017b. Structured attentions for visual question answering. In ICCV .","DOI":"10.1109\/ICCV.2017.145"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-017-1033-7"}],"event":{"name":"MM '19: The 27th ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Nice France","acronym":"MM '19"},"container-title":["Proceedings of the 27th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3343031.3350969","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3343031.3350969","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:13:18Z","timestamp":1750201998000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3343031.3350969"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,15]]},"references-count":38,"alternative-id":["10.1145\/3343031.3350969","10.1145\/3343031"],"URL":"https:\/\/doi.org\/10.1145\/3343031.3350969","relation":{},"subject":[],"published":{"date-parts":[[2019,10,15]]},"assertion":[{"value":"2019-10-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}