{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T15:54:43Z","timestamp":1778082883661,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":109,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T00:00:00Z","timestamp":1665360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Research Foundation Singapore","award":["Strategic Capability Research Centres Funding Initiative"],"award-info":[{"award-number":["Strategic Capability Research Centres Funding Initiative"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,10]]},"DOI":"10.1145\/3503161.3549202","type":"proceedings-article","created":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T15:43:12Z","timestamp":1665416592000},"page":"6875-6882","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["Compute to Tell the Tale: Goal-Driven Narrative Generation"],"prefix":"10.1145","author":[{"given":"Yongkang","family":"Wong","sequence":"first","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shaojing","family":"Fan","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yangyang","family":"Guo","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ziwei","family":"Xu","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, 
Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Karen","family":"Stephen","sequence":"additional","affiliation":[{"name":"NEC Corporation, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rishabh","family":"Sheoran","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anusha","family":"Bhamidipati","sequence":"additional","affiliation":[{"name":"NEC Corporation, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vivek","family":"Barsopia","sequence":"additional","affiliation":[{"name":"NEC Corporation, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianquan","family":"Liu","sequence":"additional","affiliation":[{"name":"NEC Corporation, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mohan","family":"Kankanhalli","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-22362-4_1"},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"crossref","unstructured":"Peter Anderson Xiaodong He Chris Buehler Damien Teney Mark Johnson Stephen Gould and Lei Zhang. 2018. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. In CVPR. 6077--6086. Peter Anderson Xiaodong He Chris Buehler Damien Teney Mark Johnson Stephen Gould and Lei Zhang. 2018. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. In CVPR. 6077--6086.","DOI":"10.1109\/CVPR.2018.00636"},{"key":"e_1_3_2_2_3_1","volume-title":"VQA: Visual Question Answering. In ICCV. 
2425--2433.","author":"Antol Stanislaw","year":"2015","unstructured":"Stanislaw Antol , Aishwarya Agrawal , Jiasen Lu , Margaret Mitchell , Dhruv Batra , C. Lawrence Zitnick , and Devi Parikh . 2015 . VQA: Visual Question Answering. In ICCV. 2425--2433. Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. 2015. VQA: Visual Question Answering. In ICCV. 2425--2433."},{"key":"e_1_3_2_2_4_1","first-page":"113","article-title":"Persuasive effects of fictional narratives increase over time","volume":"10","author":"Appel Markus","year":"2007","unstructured":"Markus Appel and Tobias Richter . 2007 . Persuasive effects of fictional narratives increase over time . Media Psychology 10 , 1 (2007), 113 -- 134 . Markus Appel and Tobias Richter. 2007. Persuasive effects of fictional narratives increase over time. Media Psychology 10, 1 (2007), 113--134.","journal-title":"Media Psychology"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"crossref","first-page":"e01542","DOI":"10.1002\/bes2.1542","article-title":"Storytelling: A Natural Tool to Weave the Threads of Science and Community Together","volume":"100","author":"Bayer Skylar","year":"2019","unstructured":"Skylar Bayer and Annaliese Hettinger . 2019 . Storytelling: A Natural Tool to Weave the Threads of Science and Community Together . The Bulletin of the Ecological Society of America 100 , 2 (2019), e01542 . Skylar Bayer and Annaliese Hettinger. 2019. Storytelling: A Natural Tool to Weave the Threads of Science and Community Together. The Bulletin of the Ecological Society of America 100, 2 (2019), e01542.","journal-title":"The Bulletin of the Ecological Society of America"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-0-444-62713-1.00011-8"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"crossref","unstructured":"Jo\u00e3o Carreira and Andrew Zisserman. 2017. Quo Vadis Action Recognition? A New Model and the Kinetics Dataset. In CVPR. 
4724--4733. Jo\u00e3o Carreira and Andrew Zisserman. 2017. Quo Vadis Action Recognition? A New Model and the Kinetics Dataset. In CVPR. 4724--4733.","DOI":"10.1109\/CVPR.2017.502"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"crossref","unstructured":"Gregory D. Casta\u00f1\u00f3n Yuting Chen Ziming Zhang and Venkatesh Saligrama. 2015. Efficient Activity Retrieval through Semantic Graph Queries. In ACM Multimedia. 391--400. Gregory D. Casta\u00f1\u00f3n Yuting Chen Ziming Zhang and Venkatesh Saligrama. 2015. Efficient Activity Retrieval through Semantic Graph Queries. In ACM Multimedia. 391--400.","DOI":"10.1145\/2733373.2806229"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"crossref","unstructured":"Hong Chen Yifei Huang Hiroya Takamura and Hideki Nakayama. 2021. Commonsense Knowledge Aware Concept Selection For Diverse and Informative Visual Storytelling. In AAAI. 999--1008. Hong Chen Yifei Huang Hiroya Takamura and Hideki Nakayama. 2021. Commonsense Knowledge Aware Concept Selection For Diverse and Informative Visual Storytelling. In AAAI. 999--1008.","DOI":"10.1609\/aaai.v35i2.16184"},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"crossref","unstructured":"Shizhe Chen Yida Zhao Qin Jin and Qi Wu. 2020. Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning. In CVPR. 10635--10644. Shizhe Chen Yida Zhao Qin Jin and Qi Wu. 2020. Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning. In CVPR. 10635--10644.","DOI":"10.1109\/CVPR42600.2020.01065"},{"key":"e_1_3_2_2_11_1","volume-title":"The distinction of fiction","author":"Cohn Dorrit","unstructured":"Dorrit Cohn . 2000. The distinction of fiction . JHU Press . Dorrit Cohn. 2000. The distinction of fiction. JHU Press."},{"key":"e_1_3_2_2_12_1","volume-title":"O'Leary","author":"Conroy John M.","year":"2001","unstructured":"John M. Conroy and Dianne P . O'Leary . 2001 . Text Summarization via Hidden Markov Models. In SIGIR. 406--407. John M. Conroy and Dianne P. O'Leary. 2001. 
Text Summarization via Hidden Markov Models. In SIGIR. 406--407."},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1017\/S0261444800007801"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"crossref","unstructured":"Mariam Daoud Lynda Tamine-Lechani Mohand Boughanem and Bilal Chebaro. 2009. A session based personalized search using an ontological user profile. In SAC. 1732--1736. Mariam Daoud Lynda Tamine-Lechani Mohand Boughanem and Bilal Chebaro. 2009. A session based personalized search using an ontological user profile. In SAC. 1732--1736.","DOI":"10.1145\/1529282.1529670"},{"key":"e_1_3_2_2_15_1","unstructured":"Abhishek Das Satwik Kottur Khushi Gupta Avi Singh Deshraj Yadav Jos\u00e9 M. F. Moura Devi Parikh and Dhruv Batra. 2017. Visual Dialog. In CVPR. 1080--1089. Abhishek Das Satwik Kottur Khushi Gupta Avi Singh Deshraj Yadav Jos\u00e9 M. F. Moura Devi Parikh and Dhruv Batra. 2017. Visual Dialog. In CVPR. 1080--1089."},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"crossref","unstructured":"Jiankang Deng Jia Guo Niannan Xue and Stefanos Zafeiriou. 2019. ArcFace: Additive angular margin loss for deep face recognition. In CVPR. 4690--4699. Jiankang Deng Jia Guo Niannan Xue and Stefanos Zafeiriou. 2019. ArcFace: Additive angular margin loss for deep face recognition. In CVPR. 4690--4699.","DOI":"10.1109\/CVPR.2019.00482"},{"key":"e_1_3_2_2_17_1","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT. 4171--4186.","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2019 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT. 4171--4186. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT. 
4171--4186."},{"key":"e_1_3_2_2_18_1","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly Jakob Uszkoreit and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In ICLR. Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly Jakob Uszkoreit and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In ICLR."},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"crossref","unstructured":"Christoph Feichtenhofer Haoqi Fan Jitendra Malik and Kaiming He. 2019. SlowFast Networks for Video Recognition. In ICCV. 6201--6210. Christoph Feichtenhofer Haoqi Fan Jitendra Malik and Kaiming He. 2019. SlowFast Networks for Video Recognition. In ICCV. 6201--6210.","DOI":"10.1109\/ICCV.2019.00630"},{"key":"e_1_3_2_2_20_1","volume-title":"Kankanhalli","author":"Gan Tian","year":"2013","unstructured":"Tian Gan , Yongkang Wong , Daqing Zhang , and Mohan S . Kankanhalli . 2013 . Temporal encoded F-formation system for social interaction detection. In ACM Multimedia . 937--946. Tian Gan, Yongkang Wong, Daqing Zhang, and Mohan S. Kankanhalli. 2013. Temporal encoded F-formation system for social interaction detection. In ACM Multimedia. 
937--946."},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-72079-9_2"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1207\/s15516709cog0801_3"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1177\/0261927X99018002003"},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.1.12850"},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1460-2466.2006.00288.x"},{"key":"e_1_3_2_2_26_1","volume-title":"UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation. In EMNLP. 9157--9166.","author":"Guan Jian","year":"2020","unstructured":"Jian Guan and Minlie Huang . 2020 . UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation. In EMNLP. 9157--9166. Jian Guan and Minlie Huang. 2020. UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation. In EMNLP. 9157--9166."},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"crossref","unstructured":"Jian Guan Yansen Wang and Minlie Huang. 2019. Story Ending Generation with Incremental Encoding and Commonsense Knowledge. In AAAI. 6473--6480. Jian Guan Yansen Wang and Minlie Huang. 2019. Story Ending Generation with Incremental Encoding and Commonsense Knowledge. In AAAI. 6473--6480.","DOI":"10.1609\/aaai.v33i01.33016473"},{"key":"e_1_3_2_2_28_1","volume-title":"Kankanhalli","author":"Guo Yangyang","year":"2018","unstructured":"Yangyang Guo , Zhiyong Cheng , Liqiang Nie , Xin-Shun Xu , and Mohan S . Kankanhalli . 2018 . Multi-modal Preference Modeling for Product Search. In ACM Multimedia . 1865--1873. Yangyang Guo, Zhiyong Cheng, Liqiang Nie, Xin-Shun Xu, and Mohan S. Kankanhalli. 2018. Multi-modal Preference Modeling for Product Search. In ACM Multimedia. 1865--1873."},{"key":"e_1_3_2_2_29_1","unstructured":"Yangyang Guo Liqiang Nie Yongkang Wong Yibing Liu Zhiyong Cheng and Mohan Kankanhalli. 2022. A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA. In ACM Multimedia. 
Yangyang Guo Liqiang Nie Yongkang Wong Yibing Liu Zhiyong Cheng and Mohan Kankanhalli. 2022. A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA. In ACM Multimedia."},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2844175"},{"key":"e_1_3_2_2_31_1","unstructured":"Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR. 770--778. Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR. 770--778."},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"crossref","unstructured":"Xiangnan He Lizi Liao Hanwang Zhang Liqiang Nie Xia Hu and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In WWW. 173--182. Xiangnan He Lizi Liao Hanwang Zhang Liqiang Nie Xia Hu and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In WWW. 173--182.","DOI":"10.1145\/3038912.3052569"},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1111\/jcom.12114"},{"key":"e_1_3_2_2_35_1","volume-title":"Jianfeng Wang, and Xiaodong He.","author":"Huang Qiuyuan","year":"2019","unstructured":"Qiuyuan Huang , Zhe Gan , Asli Celikyilmaz , Dapeng Oliver Wu , Jianfeng Wang, and Xiaodong He. 2019 . Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation. In AAAI. 8465--8472. Qiuyuan Huang, Zhe Gan, Asli Celikyilmaz, Dapeng Oliver Wu, Jianfeng Wang, and Xiaodong He. 2019. Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation. In AAAI. 8465--8472."},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"crossref","unstructured":"Qingbao Huang Chuan Huang Linzhang Mo Jielong Wei Yi Cai Ho-fung Leung and Qing Li. 2021. IgSEG: Image-guided Story Ending Generation. In ACL\/IJCNLP (Findings). 3114--3123. Qingbao Huang Chuan Huang Linzhang Mo Jielong Wei Yi Cai Ho-fung Leung and Qing Li. 
2021. IgSEG: Image-guided Story Ending Generation. In ACL\/IJCNLP (Findings). 3114--3123.","DOI":"10.18653\/v1\/2021.findings-acl.274"},{"key":"e_1_3_2_2_37_1","volume-title":"Jeff Da, Keisuke Sakaguchi, Antoine Bosselut, and Yejin Choi.","author":"Hwang D.","year":"2021","unstructured":"Jena D. Hwang , Chandra Bhagavatula , Ronan Le Bras , Jeff Da, Keisuke Sakaguchi, Antoine Bosselut, and Yejin Choi. 2021 . (Comet-) Atomic 2020: On Symbolic and Neural Commonsense Knowledge Graphs. In AAAI. 6384--6392. Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jeff Da, Keisuke Sakaguchi, Antoine Bosselut, and Yejin Choi. 2021. (Comet-) Atomic 2020: On Symbolic and Neural Commonsense Knowledge Graphs. In AAAI. 6384--6392."},{"key":"e_1_3_2_2_38_1","volume-title":"Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs. In CVPR. 10233--10244.","author":"Ji Jingwei","year":"2020","unstructured":"Jingwei Ji , Ranjay Krishna , Li Fei-Fei , and Juan Carlos Niebles . 2020 . Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs. In CVPR. 10233--10244. Jingwei Ji, Ranjay Krishna, Li Fei-Fei, and Juan Carlos Niebles. 2020. Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs. In CVPR. 10233--10244."},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"crossref","unstructured":"Andrej Karpathy and Li Fei-Fei. 2015. Deep visual-semantic alignments for generating image descriptions. In CVPR. 3128--3137. Andrej Karpathy and Li Fei-Fei. 2015. Deep visual-semantic alignments for generating image descriptions. In CVPR. 3128--3137.","DOI":"10.1109\/CVPR.2015.7298932"},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.3184603"},{"key":"e_1_3_2_2_41_1","volume-title":"ICML (Proceedings of Machine Learning Research","volume":"5594","author":"Kim Wonjae","year":"2021","unstructured":"Wonjae Kim , Bokyung Son , and Ildoo Kim . 2021 . ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision . 
In ICML (Proceedings of Machine Learning Research , Vol. 139). PMLR, 5583-- 5594 . Wonjae Kim, Bokyung Son, and Ildoo Kim. 2021. ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision. In ICML (Proceedings of Machine Learning Research, Vol. 139). PMLR, 5583--5594."},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0981-7"},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"crossref","unstructured":"Wojciech Kryscinski Bryan McCann Caiming Xiong and Richard Socher. 2020. Evaluating the Factual Consistency of Abstractive Text Summarization. In EMNLP. 9332--9346. Wojciech Kryscinski Bryan McCann Caiming Xiong and Richard Socher. 2020. Evaluating the Factual Consistency of Abstractive Text Summarization. In EMNLP. 9332--9346.","DOI":"10.18653\/v1\/2020.emnlp-main.750"},{"key":"e_1_3_2_2_44_1","volume-title":"Narrative theory","author":"Garcia Landa Jose Angel","year":"2005","unstructured":"Jose Angel Garcia Landa . 2005. Narrative theory . University of Zaragoza . On Line Edition ( 2005 ). Jose Angel Garcia Landa. 2005. Narrative theory. University of Zaragoza. On Line Edition (2005)."},{"key":"e_1_3_2_2_45_1","volume-title":"Distributed Representations of Sentences and Documents. In ICML (JMLR Workshop and Conference Proceedings","volume":"1196","author":"Quoc","unstructured":"Quoc V. Le and Tom\u00e1s Mikolov. 2014 . Distributed Representations of Sentences and Documents. In ICML (JMLR Workshop and Conference Proceedings , Vol. 32). JMLR.org, 1188-- 1196 . Quoc V. Le and Tom\u00e1s Mikolov. 2014. Distributed Representations of Sentences and Documents. In ICML (JMLR Workshop and Conference Proceedings, Vol. 32). JMLR.org, 1188--1196."},{"key":"e_1_3_2_2_46_1","unstructured":"Thao Minh Le Vuong Le Svetha Venkatesh and Truyen Tran. 2020. Action- Centric Relation Transformer Network for Video Question Answering. In CVPR. 9972--9981. Thao Minh Le Vuong Le Svetha Venkatesh and Truyen Tran. 2020. 
Action-Centric Relation Transformer Network for Video Question Answering. In CVPR. 9972--9981."},{"key":"e_1_3_2_2_47_1","volume-title":"Kankanhalli","author":"Li Junnan","year":"2017","unstructured":"Junnan Li , Yongkang Wong , Qi Zhao , and Mohan S . Kankanhalli . 2017 . Dual-Glance Model for Deciphering Social Relationships. In ICCV. 2669--2678. Junnan Li, Yongkang Wong, Qi Zhao, and Mohan S. Kankanhalli. 2017. Dual-Glance Model for Deciphering Social Relationships. In ICCV. 2669--2678."},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2930041"},{"key":"e_1_3_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-020-01295-1"},{"key":"e_1_3_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3158546"},{"key":"e_1_3_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_51"},{"key":"e_1_3_2_2_52_1","doi-asserted-by":"crossref","unstructured":"Kenneth Marino Ruslan Salakhutdinov and Abhinav Gupta. 2017. The More You Know: Using Knowledge Graphs for Image Classification. In CVPR. 20--28. Kenneth Marino Ruslan Salakhutdinov and Abhinav Gupta. 2017. The More You Know: Using Knowledge Graphs for Image Classification. In CVPR. 20--28.","DOI":"10.1109\/CVPR.2017.10"},{"key":"e_1_3_2_2_53_1","doi-asserted-by":"crossref","unstructured":"Tim Meinhardt Alexander Kirillov Laura Leal-Taix\u00e9 and Christoph Feichtenhofer. 2022. TrackFormer: Multi-Object Tracking with Transformers. In CVPR. 8844--8854. Tim Meinhardt Alexander Kirillov Laura Leal-Taix\u00e9 and Christoph Feichtenhofer. 2022. TrackFormer: Multi-Object Tracking with Transformers. In CVPR. 8844--8854.","DOI":"10.1109\/CVPR52688.2022.00864"},{"key":"e_1_3_2_2_54_1","volume-title":"Ikram Amous, and Fa\u00efez Gargouri.","author":"Mezghani Manel","year":"2012","unstructured":"Manel Mezghani , Corinne Amel Zayani , Ikram Amous, and Fa\u00efez Gargouri. 2012 . A user profile modelling using social annotations: a survey. 
In WWW (Companion Volume) . 969--976. Manel Mezghani, Corinne Amel Zayani, Ikram Amous, and Fa\u00efez Gargouri. 2012. A user profile modelling using social annotations: a survey. In WWW (Companion Volume). 969--976."},{"key":"e_1_3_2_2_55_1","unstructured":"Tom\u00e1s Mikolov Ilya Sutskever Kai Chen Gregory S. Corrado and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In NIPS. 3111--3119. Tom\u00e1s Mikolov Ilya Sutskever Kai Chen Gregory S. Corrado and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In NIPS. 3111--3119."},{"key":"e_1_3_2_2_56_1","volume-title":"Explaining the effects of narrative in an entertainment television program: Overcoming resistance to persuasion. Human communication research 36, 1","author":"Moyer-Gus\u00e9 Emily","year":"2010","unstructured":"Emily Moyer-Gus\u00e9 and Robin L Nabi . 2010. Explaining the effects of narrative in an entertainment television program: Overcoming resistance to persuasion. Human communication research 36, 1 ( 2010 ), 26--52. Emily Moyer-Gus\u00e9 and Robin L Nabi. 2010. Explaining the effects of narrative in an entertainment television program: Overcoming resistance to persuasion. Human communication research 36, 1 (2010), 26--52."},{"key":"e_1_3_2_2_57_1","volume-title":"Practical strategies: STELLA narratives. Literacy Learning: the Middle Years 9, 2","author":"Murphy J","year":"2001","unstructured":"J Murphy , S McDonough , R van Haren , B Triglone , and J Salinas . 2001. Practical strategies: STELLA narratives. Literacy Learning: the Middle Years 9, 2 ( 2001 ). J Murphy, S McDonough, R van Haren, B Triglone, and J Salinas. 2001. Practical strategies: STELLA narratives. Literacy Learning: the Middle Years 9, 2 (2001)."},{"key":"e_1_3_2_2_58_1","volume-title":"Schwing","author":"Narasimhan Medhini","year":"2018","unstructured":"Medhini Narasimhan , Svetlana Lazebnik , and Alexander G . Schwing . 2018 . 
Out of the Box : Reasoning with Graph Convolution Nets for Factual Visual Question Answering. In NeurIPS. 2659--2670. Medhini Narasimhan, Svetlana Lazebnik, and Alexander G. Schwing. 2018. Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering. In NeurIPS. 2659--2670."},{"key":"e_1_3_2_2_59_1","volume-title":"Speech and audio coding for wireless and network applications","author":"Panzer Ira L","unstructured":"Ira L Panzer , Alan D Sharpley , and William D Voiers . 1993. A comparison of subjective methods for evaluating speech quality . In Speech and audio coding for wireless and network applications . Springer , 59--65. Ira L Panzer, Alan D Sharpley, and William D Voiers. 1993. A comparison of subjective methods for evaluating speech quality. In Speech and audio coding for wireless and network applications. Springer, 59--65."},{"key":"e_1_3_2_2_60_1","volume-title":"Park and Gunhee Kim","author":"Cesc","year":"2015","unstructured":"Cesc C. Park and Gunhee Kim . 2015 . Expressing an Image Stream with a Sequence of Natural Sentences. In NIPS. 73--81. Cesc C. Park and Gunhee Kim. 2015. Expressing an Image Stream with a Sequence of Natural Sentences. In NIPS. 73--81."},{"key":"e_1_3_2_2_61_1","unstructured":"Jiaxin Qi Yulei Niu Jianqiang Huang and Hanwang Zhang. 2020. Two Causal Principles for Improving Visual Dialog. In CVPR. 10857--10866. Jiaxin Qi Yulei Niu Jianqiang Huang and Hanwang Zhang. 2020. Two Causal Principles for Improving Visual Dialog. In CVPR. 10857--10866."},{"key":"e_1_3_2_2_62_1","first-page":"168","article-title":"Recent concepts of narrative and the narratives of narrative theory","volume":"34","author":"Richardson Brian","year":"2000","unstructured":"Brian Richardson . 2000 . Recent concepts of narrative and the narratives of narrative theory . Style 34 , 2 (2000), 168 -- 175 . Brian Richardson. 2000. Recent concepts of narrative and the narratives of narrative theory. 
Style 34, 2 (2000), 168--175.","journal-title":"Style"},{"key":"e_1_3_2_2_63_1","doi-asserted-by":"publisher","DOI":"10.5555\/1946417.1946422"},{"key":"e_1_3_2_2_64_1","unstructured":"Devendra Singh Sachan Siva Reddy William L. Hamilton Chris Dyer and Dani Yogatama. 2021. End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering. In NeurIPS. Devendra Singh Sachan Siva Reddy William L. Hamilton Chris Dyer and Dani Yogatama. 2021. End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering. In NeurIPS."},{"key":"e_1_3_2_2_65_1","volume-title":"Emily Allaway, Chandra Bhagavatula, Nicholas Lourie, Hannah Rashkin, Brendan Roof, Noah A. Smith, and Yejin Choi.","author":"Sap Maarten","year":"2019","unstructured":"Maarten Sap , Ronan Le Bras , Emily Allaway, Chandra Bhagavatula, Nicholas Lourie, Hannah Rashkin, Brendan Roof, Noah A. Smith, and Yejin Choi. 2019 . ATOMIC : An Atlas of Machine Commonsense for If-Then Reasoning. In AAAI. 3027--3035. Maarten Sap, Ronan Le Bras, Emily Allaway, Chandra Bhagavatula, Nicholas Lourie, Hannah Rashkin, Brendan Roof, Noah A. Smith, and Yejin Choi. 2019. ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning. In AAAI. 3027--3035."},{"key":"e_1_3_2_2_66_1","volume-title":"Coherent Multi-Sentence Video Description with Variable Level of Detail. In German Conference on Pattern Recognition. 184--195","author":"Senina Anna","year":"2014","unstructured":"Anna Senina , Marcus Rohrbach , Wei Qiu , Annemarie Friedrich , Sikandar Amin , Mykhaylo Andriluka , Manfred Pinkal , and Bernt Schiele . 2014 . Coherent Multi-Sentence Video Description with Variable Level of Detail. In German Conference on Pattern Recognition. 184--195 . Anna Senina, Marcus Rohrbach, Wei Qiu, Annemarie Friedrich, Sikandar Amin, Mykhaylo Andriluka, Manfred Pinkal, and Bernt Schiele. 2014. Coherent Multi-Sentence Video Description with Variable Level of Detail. 
In German Conference on Pattern Recognition. 184--195."},{"key":"e_1_3_2_2_67_1","doi-asserted-by":"crossref","unstructured":"Xindi Shang Zehuan Yuan Anran Wang and Changhu Wang. 2021. Multimodal Video Summarization via Time-Aware Transformers. In ACM Multimedia. 1756--1765. Xindi Shang Zehuan Yuan Anran Wang and Changhu Wang. 2021. Multimodal Video Summarization via Time-Aware Transformers. In ACM Multimedia. 1756--1765.","DOI":"10.1145\/3474085.3475321"},{"key":"e_1_3_2_2_68_1","doi-asserted-by":"crossref","unstructured":"Yang Shao and DeLiang Wang. 2008. Robust speaker identification using auditory features and computational auditory scene analysis. In ICASSP. 1589--1592. Yang Shao and DeLiang Wang. 2008. Robust speaker identification using auditory features and computational auditory scene analysis. In ICASSP. 1589--1592.","DOI":"10.1109\/ICASSP.2008.4517928"},{"key":"e_1_3_2_2_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2020.3030403"},{"key":"e_1_3_2_2_70_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In ICLR. Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In ICLR."},{"key":"e_1_3_2_2_71_1","volume-title":"Video Google: A Text Retrieval Approach to Object Matching in Videos. In ICCV. 1470--1477.","author":"Sivic Josef","year":"2003","unstructured":"Josef Sivic and Andrew Zisserman . 2003 . Video Google: A Text Retrieval Approach to Object Matching in Videos. In ICCV. 1470--1477. Josef Sivic and Andrew Zisserman. 2003. Video Google: A Text Retrieval Approach to Object Matching in Videos. In ICCV. 1470--1477."},{"key":"e_1_3_2_2_72_1","volume-title":"Smitten and Ann Daghistany","author":"Jeffrey","year":"1981","unstructured":"Jeffrey R. Smitten and Ann Daghistany . 1981 . Spatial Form in Narrative. Cornell Univ Press . Jeffrey R. Smitten and Ann Daghistany. 1981. Spatial Form in Narrative. 
Cornell Univ Press."},{"key":"e_1_3_2_2_73_1","volume-title":"Andrew Y. Ng, and Christopher D. Manning.","author":"Socher Richard","year":"2011","unstructured":"Richard Socher , Cliff Chiung-Yu Lin , Andrew Y. Ng, and Christopher D. Manning. 2011 . Parsing Natural Scenes and Natural Language with Recursive Neural Networks. In ICML. 129--136. Richard Socher, Cliff Chiung-Yu Lin, Andrew Y. Ng, and Christopher D. Manning. 2011. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. In ICML. 129--136."},{"key":"e_1_3_2_2_74_1","doi-asserted-by":"crossref","unstructured":"Robyn Speer Joshua Chin and Catherine Havasi. 2017. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. In AAAI. 4444--4451. Robyn Speer Joshua Chin and Catherine Havasi. 2017. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. In AAAI. 4444--4451.","DOI":"10.1609\/aaai.v31i1.11164"},{"key":"e_1_3_2_2_75_1","volume-title":"Le","author":"Sutskever Ilya","year":"2014","unstructured":"Ilya Sutskever , Oriol Vinyals , and Quoc V . Le . 2014 . Sequence to Sequence Learning with Neural Networks. In NIPS. 3104--3112. Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to Sequence Learning with Neural Networks. In NIPS. 3104--3112."},{"key":"e_1_3_2_2_76_1","volume-title":"Plummer","author":"Tan Reuben","year":"2021","unstructured":"Reuben Tan , Huijuan Xu , Kate Saenko , and Bryan A . Plummer . 2021 . LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval. In WACV. 2082--2091. Reuben Tan, Huijuan Xu, Kate Saenko, and Bryan A. Plummer. 2021. LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval. In WACV. 2082--2091."},{"key":"e_1_3_2_2_77_1","doi-asserted-by":"crossref","unstructured":"Thomas Pellissier Tanon Denny Vrandecic Sebastian Schaffert Thomas Steiner and Lydia Pintscher. 2016. From Freebase to Wikidata: The Great Migration. In WWW. 1419--1428. 
Thomas Pellissier Tanon Denny Vrandecic Sebastian Schaffert Thomas Steiner and Lydia Pintscher. 2016. From Freebase to Wikidata: The Great Migration. In WWW. 1419--1428.","DOI":"10.1145\/2872427.2874809"},{"key":"e_1_3_2_2_78_1","volume-title":"RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems. In AAAI. 722--729.","author":"Tao Chongyang","year":"2018","unstructured":"Chongyang Tao , Lili Mou , Dongyan Zhao , and Rui Yan . 2018 . RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems. In AAAI. 722--729. Chongyang Tao, Lili Mou, Dongyan Zhao, and Rui Yan. 2018. RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems. In AAAI. 722--729."},{"key":"e_1_3_2_2_79_1","doi-asserted-by":"crossref","unstructured":"Du Tran Heng Wang Lorenzo Torresani Jamie Ray Yann LeCun and Manohar Paluri. 2018. A Closer Look at Spatiotemporal Convolutions for Action Recognition. In CVPR. 6450--6459. Du Tran Heng Wang Lorenzo Torresani Jamie Ray Yann LeCun and Manohar Paluri. 2018. A Closer Look at Spatiotemporal Convolutions for Action Recognition. In CVPR. 6450--6459.","DOI":"10.1109\/CVPR.2018.00675"},{"key":"e_1_3_2_2_80_1","doi-asserted-by":"crossref","unstructured":"Tao Tu Qing Ping Govindarajan Thattai G\u00f6khan T\u00fcr and Prem Natarajan. 2021. Learning Better Visual Dialog Agents With Pretrained Visual-Linguistic Representation. In CVPR. 5622--5631. Tao Tu Qing Ping Govindarajan Thattai G\u00f6khan T\u00fcr and Prem Natarajan. 2021. Learning Better Visual Dialog Agents With Pretrained Visual-Linguistic Representation. In CVPR. 5622--5631.","DOI":"10.1109\/CVPR46437.2021.00557"},{"key":"e_1_3_2_2_81_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N. Gomez Lukasz Kaiser and Illia Polosukhin. 2017. Attention is All you Need. In NIPS. 5998--6008. Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N. 
Gomez Lukasz Kaiser and Illia Polosukhin. 2017. Attention is All you Need. In NIPS. 5998--6008."},{"key":"e_1_3_2_2_82_1","volume-title":"Andreas Geiger, and Bastian Leibe.","author":"Voigtlaender Paul","year":"2019","unstructured":"Paul Voigtlaender , Michael Krause , Aljosa Osep , Jonathon Luiten , Berin Balachandar Gnana Sekar , Andreas Geiger, and Bastian Leibe. 2019 . MOTS : Multi- Object Tracking and Segmentation. In CVPR. 7942--7951. Paul Voigtlaender, Michael Krause, Aljosa Osep, Jonathon Luiten, Berin Balachandar Gnana Sekar, Andreas Geiger, and Bastian Leibe. 2019. MOTS: Multi- Object Tracking and Segmentation. In CVPR. 7942--7951."},{"key":"e_1_3_2_2_83_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2018.2842159"},{"key":"e_1_3_2_2_84_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2754246"},{"key":"e_1_3_2_2_85_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2012.2221459"},{"key":"e_1_3_2_2_86_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCIAIG.2013.2277051"},{"key":"e_1_3_2_2_87_1","volume-title":"Tung","author":"Wu Shuang","year":"2020","unstructured":"Shuang Wu , Shaojing Fan , Zhiqi Shen , Mohan S. Kankanhalli , and Anthony K. H . Tung . 2020 . Who You Are Decides How You Tell. In ACM Multimedia . 4013--4022. Shuang Wu, Shaojing Fan, Zhiqi Shen, Mohan S. Kankanhalli, and Anthony K. H. Tung. 2020. Who You Are Decides How You Tell. In ACM Multimedia. 4013--4022."},{"key":"e_1_3_2_2_88_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2021.103349"},{"key":"e_1_3_2_2_89_1","doi-asserted-by":"crossref","unstructured":"Yu Xiang Alexandre Alahi and Silvio Savarese. 2015. Learning to Track: Online Multi-object Tracking by Decision Making. In ICCV. 4705--4713. Yu Xiang Alexandre Alahi and Silvio Savarese. 2015. Learning to Track: Online Multi-object Tracking by Decision Making. In ICCV. 4705--4713.","DOI":"10.1109\/ICCV.2015.534"},{"key":"e_1_3_2_2_90_1","unstructured":"Yaqi Xie Ziwei Xu Kuldeep S. Meel Mohan S. 
Kankanhalli and Harold Soh. 2019. Embedding Symbolic Knowledge into Deep Networks. In NeurIPS. 4235--4245. Yaqi Xie Ziwei Xu Kuldeep S. Meel Mohan S. Kankanhalli and Harold Soh. 2019. Embedding Symbolic Knowledge into Deep Networks. In NeurIPS. 4235--4245."},{"key":"e_1_3_2_2_91_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2943753"},{"key":"e_1_3_2_2_92_1","volume-title":"Kankanhalli","author":"Xu Bingjie","year":"2019","unstructured":"Bingjie Xu , Yongkang Wong , Junnan Li , Qi Zhao , and Mohan S . Kankanhalli . 2019 . Learning to Detect Human-Object Interactions With Knowledge. In CVPR. 2019--2028. Bingjie Xu, Yongkang Wong, Junnan Li, Qi Zhao, and Mohan S. Kankanhalli. 2019. Learning to Detect Human-Object Interactions With Knowledge. In CVPR. 2019--2028."},{"key":"e_1_3_2_2_93_1","doi-asserted-by":"crossref","unstructured":"Chunpu Xu Min Yang Chengming Li Ying Shen Xiang Ao and Ruifeng Xu. 2021. Imagine Reason and Write: Visual Storytelling with Graph Knowledge and Relational Reasoning. In AAAI. 3022--3029. Chunpu Xu Min Yang Chengming Li Ying Shen Xiang Ao and Ruifeng Xu. 2021. Imagine Reason and Write: Visual Storytelling with Graph Knowledge and Relational Reasoning. In AAAI. 3022--3029.","DOI":"10.1609\/aaai.v35i4.16410"},{"key":"e_1_3_2_2_94_1","doi-asserted-by":"crossref","unstructured":"Jun Xu Tao Mei Ting Yao and Yong Rui. 2016. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language. In CVPR. 5288--5296. Jun Xu Tao Mei Ting Yao and Yong Rui. 2016. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language. In CVPR. 5288--5296.","DOI":"10.1109\/CVPR.2016.571"},{"key":"e_1_3_2_2_95_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2018.2867286"},{"key":"e_1_3_2_2_96_1","volume-title":"Kankanhalli","author":"Xu Ziwei","year":"2021","unstructured":"Ziwei Xu , Xudong Shen , Yongkang Wong , and Mohan S . Kankanhalli . 2021 . Unsupervised Motion Representation Learning with Capsule Autoencoders. In NeurIPS. 
3205--3217. Ziwei Xu, Xudong Shen, Yongkang Wong, and Mohan S. Kankanhalli. 2021. Unsupervised Motion Representation Learning with Capsule Autoencoders. In NeurIPS. 3205--3217."},{"key":"e_1_3_2_2_97_1","doi-asserted-by":"crossref","unstructured":"Su Yan Xin Chen Ran Huo Xu Zhang and Leyu Lin. 2020. Learning to Build User-tag Profile in Recommendation System. In CIKM. 2877--2884. Su Yan Xin Chen Ran Huo Xu Zhang and Leyu Lin. 2020. Learning to Build User-tag Profile in Recommendation System. In CIKM. 2877--2884.","DOI":"10.1145\/3340531.3412719"},{"key":"e_1_3_2_2_98_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2020.104091"},{"key":"e_1_3_2_2_99_1","doi-asserted-by":"crossref","unstructured":"Xu Yang Chongyang Gao Hanwang Zhang and Jianfei Cai. 2020. Hierarchical Scene Graph Encoder-Decoder for Image Paragraph Captioning. In ACM Multimedia. 4181--4189. Xu Yang Chongyang Gao Hanwang Zhang and Jianfei Cai. 2020. Hierarchical Scene Graph Encoder-Decoder for Image Paragraph Captioning. In ACM Multimedia. 4181--4189.","DOI":"10.1145\/3394171.3413859"},{"key":"e_1_3_2_2_100_1","volume-title":"Le","author":"Yang Zhilin","year":"2019","unstructured":"Zhilin Yang , Zihang Dai , Yiming Yang , Jaime G. Carbonell , Ruslan Salakhutdinov , and Quoc V . Le . 2019 . XLNet: Generalized Autoregressive Pretraining for Language Understanding. In NeurIPS. 5754--5764. Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In NeurIPS. 5754--5764."},{"key":"e_1_3_2_2_101_1","doi-asserted-by":"crossref","unstructured":"Li Yao Atousa Torabi Kyunghyun Cho Nicolas Ballas Christopher Pal Hugo Larochelle and Aaron Courville. 2015. Describing Videos by Exploiting Temporal Structure. In ICCV. 4507--4515. Li Yao Atousa Torabi Kyunghyun Cho Nicolas Ballas Christopher Pal Hugo Larochelle and Aaron Courville. 2015. Describing Videos by Exploiting Temporal Structure. 
In ICCV. 4507--4515.","DOI":"10.1109\/ICCV.2015.512"},{"key":"e_1_3_2_2_102_1","doi-asserted-by":"crossref","unstructured":"Ting Yao Tao Mei and Yong Rui. 2016. Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization. In CVPR. 982--990. Ting Yao Tao Mei and Yong Rui. 2016. Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization. In CVPR. 982--990.","DOI":"10.1109\/CVPR.2016.112"},{"key":"e_1_3_2_2_103_1","volume-title":"CVPR Workshop.","author":"Yeung Serena","year":"2014","unstructured":"Serena Yeung , Alireza Fathi , and Li Fei-Fei . 2014 . In VideoSET: Video Summary Evaluation through Text . CVPR Workshop. Serena Yeung, Alireza Fathi, and Li Fei-Fei. 2014. In VideoSET: Video Summary Evaluation through Text. CVPR Workshop."},{"key":"e_1_3_2_2_104_1","doi-asserted-by":"crossref","unstructured":"Yawen Zeng Da Cao Xiaochi Wei Meng Liu Zhou Zhao and Zheng Qin. 2021. Multi-Modal Relational Graph for Cross-Modal Video Moment Retrieval. In CVPR. 2215--2224. Yawen Zeng Da Cao Xiaochi Wei Meng Liu Zhou Zhao and Zheng Qin. 2021. Multi-Modal Relational Graph for Cross-Modal Video Moment Retrieval. In CVPR. 2215--2224.","DOI":"10.1109\/CVPR46437.2021.00225"},{"key":"e_1_3_2_2_105_1","doi-asserted-by":"crossref","unstructured":"Xiaohua Zhai Yuxin Peng and Jianguo Xiao. 2013. Heterogeneous Metric Learning with Joint Graph Regularization for Cross-Media Retrieval. In AAAI. 1198--1204. Xiaohua Zhai Yuxin Peng and Jianguo Xiao. 2013. Heterogeneous Metric Learning with Joint Graph Regularization for Cross-Media Retrieval. In AAAI. 1198--1204.","DOI":"10.1609\/aaai.v27i1.8464"},{"key":"e_1_3_2_2_106_1","volume-title":"Davis","author":"Zhang Da","year":"2019","unstructured":"Da Zhang , Xiyang Dai , Xin Wang , Yuan-Fang Wang , and Larry S . Davis . 2019 . MAN : Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment. In CVPR. 1247--1257. 
Da Zhang, Xiyang Dai, Xin Wang, Yuan-Fang Wang, and Larry S. Davis. 2019. MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment. In CVPR. 1247--1257."},{"key":"e_1_3_2_2_107_1","doi-asserted-by":"crossref","unstructured":"Ziqi Zhang Yaya Shi Chunfeng Yuan Bing Li Peijin Wang Weiming Hu and Zheng-Jun Zha. 2020. Object Relational Graph With Teacher-Recommended Learning for Video Captioning. In CVPR. 13278--13288. Ziqi Zhang Yaya Shi Chunfeng Yuan Bing Li Peijin Wang Weiming Hu and Zheng-Jun Zha. 2020. Object Relational Graph With Teacher-Recommended Learning for Video Captioning. In CVPR. 13278--13288.","DOI":"10.1109\/CVPR42600.2020.01329"},{"key":"e_1_3_2_2_108_1","doi-asserted-by":"crossref","unstructured":"Ke Zhou Shuang-Hong Yang and Hongyuan Zha. 2011. Functional matrix factorizations for cold-start recommendation. In SIGIR. 315--324. Ke Zhou Shuang-Hong Yang and Hongyuan Zha. 2011. Functional matrix factorizations for cold-start recommendation. In SIGIR. 315--324.","DOI":"10.1145\/2009916.2009961"},{"key":"e_1_3_2_2_109_1","volume-title":"Hauptmann","author":"Zhu Linchao","year":"2017","unstructured":"Linchao Zhu , Zhongwen Xu , Yi Yang , and Alexander G . Hauptmann . 2017 . Uncovering Temporal Context for Video Question and Answering. In International Journal of Computer Vision, Vol. 124 . Springer , 409--421. Linchao Zhu, Zhongwen Xu, Yi Yang, and Alexander G. Hauptmann. 2017. Uncovering Temporal Context for Video Question and Answering. In International Journal of Computer Vision, Vol. 124. 
Springer, 409--421."}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","location":"Lisboa Portugal","acronym":"MM '22","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 30th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3549202","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3503161.3549202","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:49:18Z","timestamp":1750182558000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3549202"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":109,"alternative-id":["10.1145\/3503161.3549202","10.1145\/3503161"],"URL":"https:\/\/doi.org\/10.1145\/3503161.3549202","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}