{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,21]],"date-time":"2026-07-21T08:17:37Z","timestamp":1784621857755,"version":"3.55.0"},"reference-count":43,"publisher":"Wiley","issue":"4","license":[{"start":{"date-parts":[[2023,2,22]],"date-time":"2023-02-22T00:00:00Z","timestamp":1677024000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61972016"],"award-info":[{"award-number":["61972016"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62032016"],"award-info":[{"award-number":["62032016"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004826","name":"Natural Science Foundation of Beijing","doi-asserted-by":"publisher","award":["L191007"],"award-info":[{"award-number":["L191007"]}],"id":[{"id":"10.13039\/501100004826","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["advanced.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Advanced Intelligent Systems"],"published-print":{"date-parts":[[2023,4]]},"abstract":"<jats:sec><jats:label\/><jats:p>Video question answering (VideoQA) is a typical task that integrates language and vision. The key for VideoQA is to extract relevant and effective visual information for answering a specific question. Information selection is believed to be necessary for this task due to the large amount of irrelevant information in the video, and explicitly learning an attention model can be a reasonable and effective solution for the selection. Herein, a novel VideoQA model called Text\u2010Assisted Spatial and Temporal Attention Network (TASTA) is proposed, which shows the great potential of explicitly modeling attention. TASTA is made to be simple, small, clean, and efficient for clear performance justification and possible easy extension. Its success is mainly from two new strategies of better using the textual information. Experimental results on a large and most representative dataset, TGIF\u2010QA, show the significant superiority of TASTA w.r.t. the state\u2010of\u2010the\u2010art and demonstrate the effectiveness of its key components via ablation studies.<\/jats:p><\/jats:sec>","DOI":"10.1002\/aisy.202200131","type":"journal-article","created":{"date-parts":[[2023,2,22]],"date-time":"2023-02-22T22:45:22Z","timestamp":1677105922000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":104,"title":["TASTA: Text\u2010Assisted Spatial and Temporal Attention Network for Video Question Answering"],"prefix":"10.1002","volume":"5","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8427-4495","authenticated-orcid":false,"given":"Tian","family":"Wang","sequence":"first","affiliation":[{"name":"Institute of Artificial Intelligence Beihang University  Beijing 100083 China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Boyao","family":"Hou","sequence":"additional","affiliation":[{"name":"School of Automation Science and Electrical Engineering Beihang University  Beijing 100083 China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jiakun","family":"Li","sequence":"additional","affiliation":[{"name":"School of Automation Science and Electrical Engineering Beihang University  Beijing 100083 China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Peng","family":"Shi","sequence":"additional","affiliation":[{"name":"College of Computer and Cyber Security Fujian Normal University  Fuzhou Fujian 350117 China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Baochang","family":"Zhang","sequence":"additional","affiliation":[{"name":"Institute of Artificial Intelligence Beihang University  Beijing 100083 China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hichem","family":"Snoussi","sequence":"additional","affiliation":[{"name":"Institute Charles Delaunay University of Technology of Troyes  10004 Troyes France"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"311","published-online":{"date-parts":[[2023,2,22]]},"reference":[{"key":"e_1_2_9_2_1","doi-asserted-by":"publisher","DOI":"10.1002\/aisy.202000043"},{"key":"e_1_2_9_3_1","doi-asserted-by":"publisher","DOI":"10.1002\/aisy.202100228"},{"key":"e_1_2_9_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2022.03.065"},{"key":"e_1_2_9_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3158546"},{"key":"e_1_2_9_6_1","first-page":"1573","author":"Yusuf A. A.","year":"2022","journal-title":"Artif. Intell. Rev."},{"key":"e_1_2_9_7_1","unstructured":"Y.Jang Y.Song Y.Yu Y.Kim G.Kim inProc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) IEEE Piscataway NJ2017 pp.2758\u20132766."},{"key":"e_1_2_9_8_1","unstructured":"J.Gao R.Ge K.Chen R.Nevatia inProc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) IEEE Piscataway NJ2018 pp.6576\u20136585."},{"key":"e_1_2_9_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3076556"},{"key":"e_1_2_9_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2021.3120867"},{"key":"e_1_2_9_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2022.108540"},{"key":"e_1_2_9_12_1","doi-asserted-by":"crossref","unstructured":"L.Gao P.Zeng J.Song Y.-F.Li W.Liu T.Mei H. T.Shen inProc. of the AAAI Conf. on Artificial Intelligence AAAI Press Palo Alto CA2019 Vol.33 pp.6391\u20136398.","DOI":"10.1609\/aaai.v33i01.33016391"},{"key":"e_1_2_9_13_1","unstructured":"M.Malinowski M.Fritz inProc. of the 27th Inter. Conf. on Neural Information Processing Systems Curran Associates Inc. Red Hook NY2014 Vol.1 pp.1682\u20131690."},{"key":"e_1_2_9_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0966-6"},{"key":"e_1_2_9_15_1","unstructured":"M.Ren R.Kiros R.Zemel inProc. of the 28th Inter. Conf. on Neural Information Processing Systems MIT Press Cambridge MA2015 Vol.2 pp.2953\u20132961."},{"key":"e_1_2_9_16_1","unstructured":"H.Gao J.Mao J.Zhou Z.Huang L.Wang W.Xu inProc. of the 28th Inter. Conf. on Neural Information Processing Systems MIT Press Cambridge MA2015 Vol.2 pp.2296\u20132304."},{"key":"e_1_2_9_17_1","doi-asserted-by":"crossref","unstructured":"Y.Goyal T.Khot D.Summers-Stay D.Batra D.Parikh inProc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) IEEE Piscataway NJ2017 pp.6325\u20136334.","DOI":"10.1109\/CVPR.2017.670"},{"key":"e_1_2_9_18_1","doi-asserted-by":"crossref","unstructured":"Y.Gao O.Beijbom N.Zhang T.Darrell inProc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)IEEE Piscataway NJ2016 pp.317\u2013326.","DOI":"10.1109\/CVPR.2016.41"},{"key":"e_1_2_9_19_1","unstructured":"J.-H.Kim S.-W.Lee D.Kwak M.-O.Heo J.Kim J.-W.Ha B.-T.Zhang inProc. of the 30th Inter. Conf. on Neural Information Processing Systems Curran Associates Inc. Red Hook NY2016 pp.361\u2013369."},{"key":"e_1_2_9_20_1","unstructured":"A.Fukui D. H.Park D.Yang A.Rohrbach T.Darrell M.Rohrbach inProc. of the 2016 Conf. on Empirical Methods in Natural Language Processing Association for Computational Linguistics Austin Texas2016 pp.457\u2013468."},{"key":"e_1_2_9_21_1","doi-asserted-by":"crossref","unstructured":"D.Teney P.Anderson X.He A.van den Hengel inProc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) IEEE Piscataway NJ2018 pp.4223\u20134232.","DOI":"10.1109\/CVPR.2018.00444"},{"key":"e_1_2_9_22_1","unstructured":"J.Lu J.Yang D.Batra D.Parikh inProc. of the 30th Inter. Conf. on Neural Information Processing Systems Curran Associates Inc. Red Hook NY2016 pp.289\u2013297."},{"key":"e_1_2_9_23_1","unstructured":"T.Rahman S.-H.Chou L.Sigal G.Carenini inProc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) IEEE Piscataway NJ2021 pp.1653\u20131662."},{"key":"e_1_2_9_24_1","unstructured":"M.Tapaswi Y.Zhu R.Stiefelhagen A.Torralba R.Urtasun S.Fidler inProc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) IEEE Piscataway NJ2016 pp.4631\u20134640."},{"key":"e_1_2_9_25_1","unstructured":"J.Lei L.Yu M.Bansal T.Berg inProc. of the 2018 Conf. on Empirical Methods in Natural Language Processing Association for Computational Linguistics Brussels Belgium2018 pp.1369\u20131379."},{"key":"e_1_2_9_26_1","unstructured":"H.Yun Y.Yu W.Yang K.Lee G.Kim inProc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) IEEE Piscataway NJ2021 pp.2031\u20132041."},{"key":"e_1_2_9_27_1","unstructured":"K.Yi C.Gan Y.Li P.Kohli J.Wu A.Torralba J. B.Tenenbaum in8th Inter. Conf. on Learning Representations ICLR 2020 Addis Ababa Ethiopia April2020 OpenReview.net."},{"key":"e_1_2_9_28_1","volume-title":"Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks","author":"Wu B.","year":"2021"},{"key":"e_1_2_9_29_1","unstructured":"Y.Yu H.Ko J.Choi G.Kim inProc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) IEEE Piscataway NJ2017 pp.3165\u20133173."},{"key":"e_1_2_9_30_1","doi-asserted-by":"crossref","unstructured":"W.Jin Z.Zhao M.Gu J.Yu J.Xiao Y.Zhuang inProc. of the 27th ACM Inter. Conf. on Multimedia Association for Computing Machinery New York NY2019 pp.1193\u20131201.","DOI":"10.1145\/3343031.3351065"},{"key":"e_1_2_9_31_1","doi-asserted-by":"crossref","unstructured":"X.Li J.Song L.Gao X.Liu W.Huang X.He C.Gan inProc. of the AAAI Conf. on Artificial Intelligence AAAI Press Palo Alto CA2019 33 p.8658 number: 01.","DOI":"10.1609\/aaai.v33i01.33018658"},{"key":"e_1_2_9_32_1","unstructured":"J.Lei L.Li L.Zhou Z.Gan T. L.Berg M.Bansal J.Liu in2021 IEEE\/CVF Conf. on Computer Vision and Pattern Recognition (CVPR) IEEE Piscataway NJ2021 pp.7327\u20137337 ISSN: 2575-7075."},{"key":"e_1_2_9_33_1","unstructured":"Z.Chen J.Mao J.Wu K.-Y. K.Wong J. B.Tenenbaum C.Gan in9th Inter. Conf. on Learning Representations ICLR 2021 Virtual Event Austria May 2021 OpenReview.net."},{"key":"e_1_2_9_34_1","first-page":"887","volume-title":"Advances in Neural Information Processing Systems","author":"Ding M.","year":"2021"},{"key":"e_1_2_9_35_1","unstructured":"J.Pennington R.Socher C.Manning inProc. of the 2014 Conf. on Empirical Methods in Natural Language Processing Association for Computational Linguistics Doha Qatar2014 pp.1532\u20131543."},{"key":"e_1_2_9_36_1","doi-asserted-by":"crossref","unstructured":"K.He X.Zhang S.Ren J.Sun inProc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) IEEE Piscataway NJ2016 pp.770\u2013778.","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_9_37_1","doi-asserted-by":"crossref","unstructured":"V.Pham T.Bluche C.Kermorvant J.Louradour in2014 14th Inter. Conf. on Frontiers in Handwriting Recognition IEEE Piscataway NJ2014 pp.285\u2013290.","DOI":"10.1109\/ICFHR.2014.55"},{"key":"e_1_2_9_38_1","unstructured":"J. L.Ba J. R.Kiros G. E.Hinton arXiv preprint arXiv:1607.064502016."},{"key":"e_1_2_9_39_1","unstructured":"X.Glorot Y.Bengio inProc. of the thirteenth international conference on artificial intelligence and statistics PMLR Chia Laguna Resort Sardinia Italy2010 pp.249\u2013256."},{"key":"e_1_2_9_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/4235.585893"},{"key":"e_1_2_9_41_1","unstructured":"K.Cho B.van Merri\u00ebnboer C.Gulcehre D.Bahdanau F.Bougares H.Schwenk Y.Bengio inProc. of the 2014 Conf. on Empirical Methods in Natural Language Processing (EMNLP) Association for Computational Linguistics Doha Qatar2014 pp.1724\u20131734."},{"key":"e_1_2_9_42_1","unstructured":"J.Chung C.Gulcehre K.Cho Y.Bengio inNIPS 2014 Workshop on Deep LearningDecember2014."},{"key":"e_1_2_9_43_1","doi-asserted-by":"crossref","unstructured":"D.Tran L.Bourdev R.Fergus L.Torresani M.Paluri in2015 Inter. Conf. on Computer Vision ICCV 2015 Proceedings of the IEEE Inter. Conf. on Computer Vision Institute of Electrical and Electronics Engineers Inc. Piscataway NJ2015 pp.4489\u20134497.","DOI":"10.1109\/ICCV.2015.510"},{"key":"e_1_2_9_44_1","unstructured":"A.Karpathy G.Toderici S.Shetty T.Leung R.Sukthankar L.Fei-Fei in2014 IEEE Conf. on Computer Vision and Pattern Recognition IEEE Piscataway NJ2014 pp.1725\u20131732."}],"container-title":["Advanced Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/aisy.202200131","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/full-xml\/10.1002\/aisy.202200131","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/advanced.onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/aisy.202200131","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,7]],"date-time":"2025-10-07T16:07:48Z","timestamp":1759853268000},"score":1,"resource":{"primary":{"URL":"https:\/\/advanced.onlinelibrary.wiley.com\/doi\/10.1002\/aisy.202200131"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,22]]},"references-count":43,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,4]]}},"alternative-id":["10.1002\/aisy.202200131"],"URL":"https:\/\/doi.org\/10.1002\/aisy.202200131","archive":["Portico"],"relation":{},"ISSN":["2640-4567","2640-4567"],"issn-type":[{"value":"2640-4567","type":"print"},{"value":"2640-4567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,22]]},"assertion":[{"value":"2022-05-19","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-02-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"2200131"}}