{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,23]],"date-time":"2025-08-23T05:21:38Z","timestamp":1755926498208,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":18,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,10,12]],"date-time":"2020-10-12T00:00:00Z","timestamp":1602460800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Natural Science Foundation of China","award":["61876177"],"award-info":[{"award-number":["61876177"]}]},{"name":"Beijing Natural Science Foundation","award":["4202034"],"award-info":[{"award-number":["4202034"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,10,12]]},"DOI":"10.1145\/3394171.3416284","type":"proceedings-article","created":{"date-parts":[[2020,10,12]],"date-time":"2020-10-12T12:26:25Z","timestamp":1602505585000},"page":"4590-4594","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["Video Relation Detection with Trajectory-aware Multi-modal Features"],"prefix":"10.1145","author":[{"given":"Wentao","family":"Xie","sequence":"first","affiliation":[{"name":"Beihang University, Beijing, China"}]},{"given":"Guanghui","family":"Ren","sequence":"additional","affiliation":[{"name":"YITU Technology, Beijing, China"}]},{"given":"Si","family":"Liu","sequence":"additional","affiliation":[{"name":"Beihang University, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2020,10,12]]},"reference":[{"key":"e_1_3_2_2_1_1","unstructured":"Zhaowei Cai and Nuno Vasconcelos. 2018. Cascade R-CNN: Delving Into High Quality Object Detection. (2018) 6154--6162."},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"crossref","unstructured":"Joao Carreira and Andrew Zisserman. 2017. Quo Vadis Action Recognition? A New Model and the Kinetics Dataset. (2017) 4724--4733.","DOI":"10.1109\/CVPR.2017.502"},{"key":"e_1_3_2_2_3_1","volume-title":"Prajit Ramachandran, Mohammad Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, and Thomas S Huang.","author":"Han Wei","year":"2016","unstructured":"Wei Han, Pooya Khorrami, Tom Le Paine, Prajit Ramachandran, Mohammad Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, and Thomas S Huang. 2016. Seq-NMS for Video Object Detection. arXiv: Computer Vision and Pattern Recognition (2016)."},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0981-7"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00056"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.324"},{"key":"e_1_3_2_2_7_1","volume-title":"Beyond Short-Term Snippet: Video Relation Detection With Spatio-Temporal Global Context. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Liu Chenchen","year":"2020","unstructured":"Chenchen Liu, Yang Jin, Kehan Xu, Guoqiang Gong, and Yadong Mu. 2020. Beyond Short-Term Snippet: Video Relation Detection With Spatio-Temporal Global Context. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"crossref","unstructured":"Xufeng Qian Yueting Zhuang Yimeng Li Shaoning Xiao Shiliang Pu and Jun Xiao. 2019. Video Relation Detection with Spatio-Temporal Graph. (2019) 84--93.","DOI":"10.1145\/3343031.3351058"},{"key":"e_1_3_2_2_9_1","volume-title":"Scene Graph Generation With Hierarchical Context","author":"Ren Guanghui","year":"2020","unstructured":"Guanghui Ren, Lejian Ren, Yue Liao, Si Liu, Bo Li, Jizhong Han, and Shuicheng Yan. 2020. Scene Graph Generation With Hierarchical Context. IEEE Transactions on Neural Networks and Learning Systems (2020)."},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"crossref","unstructured":"Xindi Shang Donglin Di Junbin Xiao Yu Cao Xun Yang and Tatseng Chua. 2019 a. Annotating Objects and Relations in User-Generated Videos. (2019) 279--287.","DOI":"10.1145\/3323873.3325056"},{"key":"e_1_3_2_2_11_1","unstructured":"Xindi Shang Tongwei Ren Jingfan Guo Hanwang Zhang and Tatseng Chua. 2017. Video Visual Relation Detection. (2017) 1300--1308."},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3356082"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"crossref","unstructured":"Xu Sun Tongwei Ren Yuan Zi and Gangshan Wu. 2019. Video Visual Relation Detection via Multi-modal Feature Fusion. (2019) 2657--2661.","DOI":"10.1145\/3343031.3356076"},{"key":"e_1_3_2_2_14_1","volume-title":"Unbiased Scene Graph Generation from Biased Training. arXiv: Computer Vision and Pattern Recognition","author":"Tang Kaihua","year":"2020","unstructured":"Kaihua Tang, Yulei Niu, Jianqiang Huang, Jiaxin Shi, and Hanwang Zhang. 2020. Unbiased Scene Graph Generation from Biased Training. arXiv: Computer Vision and Pattern Recognition (2020)."},{"key":"e_1_3_2_2_15_1","unstructured":"Yaohung Hubert Tsai Santosh K Divvala Louisphilippe Morency Ruslan Salakhutdinov and Ali Farhadi. 2019. Video Relationship Reasoning Using Gated Spatio-Temporal Energy Graph. (2019) 10424--10433."},{"key":"e_1_3_2_2_16_1","volume-title":"Neural Motifs: Scene Graph Parsing with Global Context.","author":"Zellers Rowan","year":"2018","unstructured":"Rowan Zellers, Mark Yatskar, Sam Thomson, and Yejin Choi. 2018. Neural Motifs: Scene Graph Parsing with Global Context. (2018), 5831--5840."},{"key":"e_1_3_2_2_17_1","volume-title":"Tong He, Jonas Mueller, R. Manmatha, Mengnan Li, and Alexander J. Smola.","author":"Zhang Hang","year":"2020","unstructured":"Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Zhi-Li Zhang, Haibin Lin, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mengnan Li, and Alexander J. Smola. 2020. ResNeSt: Split-Attention Networks. ArXiv, Vol. abs\/2004.08955 (2020)."},{"key":"e_1_3_2_2_18_1","unstructured":"Sipeng Zheng Xiangyu Chen Shizhe Chen and Qin Jin. 2019. Relation Understanding in Videos. (2019) 2662--2666."}],"event":{"name":"MM '20: The 28th ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Seattle WA USA","acronym":"MM '20"},"container-title":["Proceedings of the 28th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394171.3416284","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3394171.3416284","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:01:24Z","timestamp":1750197684000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394171.3416284"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,12]]},"references-count":18,"alternative-id":["10.1145\/3394171.3416284","10.1145\/3394171"],"URL":"https:\/\/doi.org\/10.1145\/3394171.3416284","relation":{},"subject":[],"published":{"date-parts":[[2020,10,12]]},"assertion":[{"value":"2020-10-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}