{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:24:25Z","timestamp":1750220665671,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":35,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,10,21]],"date-time":"2020-10-21T00:00:00Z","timestamp":1603238400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100006435","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IIP-1631674"],"award-info":[{"award-number":["IIP-1631674"]}],"id":[{"id":"10.13039\/100006435","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,10,21]]},"DOI":"10.1145\/3382507.3418886","type":"proceedings-article","created":{"date-parts":[[2020,10,22]],"date-time":"2020-10-22T10:04:34Z","timestamp":1603361074000},"page":"510-518","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Temporal Attention and Consistency Measuring for Video Question Answering"],"prefix":"10.1145","author":[{"given":"Lingyu","family":"Zhang","sequence":"first","affiliation":[{"name":"Rensselaer Polytechnic Institute, Troy, NY, USA"}]},{"given":"Richard J.","family":"Radke","sequence":"additional","affiliation":[{"name":"Rensselaer Polytechnic Institute, Troy, NY, 
USA"}]}],"member":"320","published-online":{"date-parts":[[2020,10,22]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.279"},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2740062"},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2013.01.013"},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1179"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.340"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2014.6853739"},{"key":"e_1_3_2_2_7_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)."},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3340555.3353748"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.670"},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.337"},{"key":"e_1_3_2_2_11_1","volume-title":"Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision. 2961--2969","author":"He Kaiming","year":"2017","unstructured":"Kaiming He, Georgia Gkioxari, Piotr Doll\u00e1r, and Ross Girshick. 2017. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision. 
2961--2969."},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.243"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/2540128.2540483"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3242969.3243018"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1186\/s13640-017-0224-z"},{"key":"e_1_3_2_2_17_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_2_18_1","volume-title":"Berg","author":"Lei Jie","year":"2018","unstructured":"Jie Lei, Licheng Yu, Mohit Bansal, and Tamara L. Berg. 2018. TVQA: Localized, Compositional Video Question Answering. In EMNLP."},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.502"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2818346.2820757"},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.447"},{"key":"e_1_3_2_2_22_1","volume-title":"Workshop on Multimodal Corpora for Machine Learning: Taking Stock and Road Mapping the Future, ICMI-MLMI .","author":"Sanchez-Cortes Dairazalia","year":"2011","unstructured":"
Dairazalia Sanchez-Cortes, Oya Aran, and Daniel Gatica-Perez. 2011. An audio visual corpus for emergent leader analysis. In Workshop on Multimodal Corpora for Machine Learning: Taking Stock and Road Mapping the Future, ICMI-MLMI."},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/78.650093"},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3242969.3242989"},{"key":"e_1_3_2_2_25_1","volume-title":"CLARIN Annual Conference","author":"V\u00e1radi T","year":"2018","unstructured":"T V\u00e1radi, Gy Kov\u00e1cs, I Szekr\u00e9nyes, H Kiss, and K Tak\u00e1cs. [n.d.]. Human-human, human-machine communication: on the HuComTech multimodal corpus. In CLARIN Annual Conference 2018. 56."},{"key":"e_1_3_2_2_26_1","volume-title":"Thirty-Second AAAI Conference on Artificial Intelligence .","author":"Wang Bo","year":"2018","unstructured":"Bo Wang, Youjiang Xu, Yahong Han, and Richang Hong. 2018b. Movie question answering: Remembering the textual cues for layered visual contents. 
In Thirty-Second AAAI Conference on Artificial Intelligence."},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00443"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3340555.3353739"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2019.00048"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080655"},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00901"},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1115"},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-74889-2_41"},{"key":"e_1_3_2_2_34_1","volume-title":"Radke","author":"Zhang Lingyu","year":"2020","unstructured":"Lingyu Zhang and Richard J. Radke. 2020. A Multi-Stream Recurrent Neural Network for Social Role Detection in Multiparty Interactions. IEEE Journal of Selected Topics in Signal Processing (2020), 1--14."},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"crossref","unstructured":"Zhou Zhao, Qifan Yang, Deng Cai, Xiaofei He, and Yueting Zhuang. 2017. Video Question Answering via Hierarchical Spatio-Temporal Attention Networks. In IJCAI. 
3518--3524.","DOI":"10.24963\/ijcai.2017\/492"}],"event":{"name":"ICMI '20: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION","sponsor":["SIGCHI ACM Special Interest Group on Computer-Human Interaction"],"location":"Virtual Event Netherlands","acronym":"ICMI '20"},"container-title":["Proceedings of the 2020 International Conference on Multimodal Interaction"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3382507.3418886","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3382507.3418886","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:02:49Z","timestamp":1750197769000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3382507.3418886"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,21]]},"references-count":35,"alternative-id":["10.1145\/3382507.3418886","10.1145\/3382507"],"URL":"https:\/\/doi.org\/10.1145\/3382507.3418886","relation":{},"subject":[],"published":{"date-parts":[[2020,10,21]]},"assertion":[{"value":"2020-10-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}