{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,14]],"date-time":"2025-10-14T01:09:31Z","timestamp":1760404171994,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":71,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,10,15]],"date-time":"2019-10-15T00:00:00Z","timestamp":1571097600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,10,15]]},"DOI":"10.1145\/3343031.3351054","type":"proceedings-article","created":{"date-parts":[[2019,10,21]],"date-time":"2019-10-21T16:32:26Z","timestamp":1571675546000},"page":"592-600","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Sparse Temporal Causal Convolution for Efficient Action Modeling"],"prefix":"10.1145","author":[{"given":"Changmao","family":"Cheng","sequence":"first","affiliation":[{"name":"Fudan University & Jilian Technology Group (Video++), Shanghai, China"}]},{"given":"Chi","family":"Zhang","sequence":"additional","affiliation":[{"name":"Megvii Technology, Beijing, China"}]},{"given":"Yichen","family":"Wei","sequence":"additional","affiliation":[{"name":"Megvii Technology, Beijing, China"}]},{"given":"Yu-Gang","family":"Jiang","sequence":"additional","affiliation":[{"name":"Fudan University & Jilian Technology Group (Video++), Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2019,10,15]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"crossref","unstructured":"Pulkit Agrawal Joao Carreira and Jitendra Malik. 2015. Learning to see by moving. In ICCV .  Pulkit Agrawal Joao Carreira and Jitendra Malik. 2015. Learning to see by moving. 
In ICCV .","DOI":"10.1109\/ICCV.2015.13"},{"volume-title":"Jamie Ryan Kiros, and Geoffrey E Hinton","year":"2016","author":"Ba Jimmy Lei","key":"e_1_3_2_1_2_1"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-25446-8_4"},{"volume-title":"An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271","year":"2018","author":"Bai Shaojie","key":"e_1_3_2_1_4_1"},{"volume-title":"Delving deeper into convolutional networks for learning video representations. arXiv preprint arXiv:1511.06432","year":"2015","author":"Ballas Nicolas","key":"e_1_3_2_1_5_1"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"crossref","unstructured":"Joao Carreira Viorica Patraucean Laurent Mazare Andrew Zisserman and Simon Osindero. 2018. Massively Parallel Video Networks. In ECCV .  Joao Carreira Viorica Patraucean Laurent Mazare Andrew Zisserman and Simon Osindero. 2018. Massively Parallel Video Networks. In ECCV .","DOI":"10.1007\/978-3-030-01225-0_40"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Joao Carreira and Andrew Zisserman. 2017. Quo vadis action recognition? a new model and the kinetics dataset. In CVPR .  Joao Carreira and Andrew Zisserman. 2017. Quo vadis action recognition? a new model and the kinetics dataset. In CVPR .","DOI":"10.1109\/CVPR.2017.502"},{"volume-title":"Multitask learning. Machine learning","year":"1997","author":"Caruana Rich","key":"e_1_3_2_1_8_1"},{"volume-title":"Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174","year":"2016","author":"Chen Tianqi","key":"e_1_3_2_1_9_1"},{"key":"e_1_3_2_1_10_1","unstructured":"Zhao Chen Vijay Badrinarayanan Chen-Yu Lee and Andrew Rabinovich. 2018. GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks. ICML .  Zhao Chen Vijay Badrinarayanan Chen-Yu Lee and Andrew Rabinovich. 2018. 
GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks. ICML ."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Jifeng Dai Haozhi Qi Yuwen Xiong Yi Li Guodong Zhang Han Hu and Yichen Wei. 2017. Deformable Convolutional Networks. In ICCV .  Jifeng Dai Haozhi Qi Yuwen Xiong Yi Li Guodong Zhang Han Hu and Yichen Wei. 2017. Deformable Convolutional Networks. In ICCV .","DOI":"10.1109\/ICCV.2017.89"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Navneet Dalal Bill Triggs and Cordelia Schmid. 2006. Human detection using oriented histograms of flow and appearance. In ECCV .  Navneet Dalal Bill Triggs and Cordelia Schmid. 2006. Human detection using oriented histograms of flow and appearance. In ECCV .","DOI":"10.1007\/11744047_33"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Carl Doersch and Andrew Zisserman. 2017. Multi-task self-supervised visual learning. In ICCV .  Carl Doersch and Andrew Zisserman. 2017. Multi-task self-supervised visual learning. In ICCV .","DOI":"10.1109\/ICCV.2017.226"},{"volume-title":"Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell.","year":"2015","author":"Donahue Jeffrey","key":"e_1_3_2_1_14_1"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Christoph Feichtenhofer Axel Pinz and Richard Wildes. 2016a. Spatiotemporal residual networks for video action recognition. In NIPS .  Christoph Feichtenhofer Axel Pinz and Richard Wildes. 2016a. Spatiotemporal residual networks for video action recognition. In NIPS .","DOI":"10.1109\/CVPR.2017.787"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Christoph Feichtenhofer Axel Pinz and Richard P Wildes. 2017. Spatiotemporal multiplier networks for video action recognition. In CVPR .  Christoph Feichtenhofer Axel Pinz and Richard P Wildes. 2017. Spatiotemporal multiplier networks for video action recognition. 
In CVPR .","DOI":"10.1109\/CVPR.2017.787"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Christoph Feichtenhofer Axel Pinz and Andrew Zisserman. 2016b. Convolutional two-stream network fusion for video action recognition. In CVPR .  Christoph Feichtenhofer Axel Pinz and Andrew Zisserman. 2016b. Convolutional two-stream network fusion for video action recognition. In CVPR .","DOI":"10.1109\/CVPR.2016.213"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"crossref","unstructured":"Junyu Gao Tianzhu Zhang and Changsheng Xu. 2018. Watch Think and Attend: End-to-End Video Classification via Dynamic Knowledge Evolution Modeling. In ACM MM .  Junyu Gao Tianzhu Zhang and Changsheng Xu. 2018. Watch Think and Attend: End-to-End Video Classification via Dynamic Knowledge Evolution Modeling. In ACM MM .","DOI":"10.1145\/3240508.3240566"},{"key":"e_1_3_2_1_19_1","unstructured":"Jonas Gehring Michael Auli David Grangier Denis Yarats and Yann Dauphin. 2017. Convolutional Sequence to Sequence Learning. In ICML .  Jonas Gehring Michael Auli David Grangier Denis Yarats and Yann Dauphin. 2017. Convolutional Sequence to Sequence Learning. In ICML ."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Raghav Goyal Samira Ebrahimi Kahou Vincent Michalski Joanna Materzynska Susanne Westphal Heuna Kim Valentin Haenel Ingo Fruend Peter Yianilos Moritz Mueller-Freitag Florian Hoppe Christian Thurau Ingo Bax and Roland Memisevic. 2017. The \u201cSomething Something\" Video Database for Learning and Evaluating Visual Common Sense. In ICCV .  Raghav Goyal Samira Ebrahimi Kahou Vincent Michalski Joanna Materzynska Susanne Westphal Heuna Kim Valentin Haenel Ingo Fruend Peter Yianilos Moritz Mueller-Freitag Florian Hoppe Christian Thurau Ingo Bax and Roland Memisevic. 2017. The \u201cSomething Something\" Video Database for Learning and Evaluating Visual Common Sense. 
In ICCV .","DOI":"10.1109\/ICCV.2017.622"},{"key":"e_1_3_2_1_21_1","unstructured":"Michelle Guo Albert Haque De-An Huang Serena Yeung and Li Fei-Fei. 2018. Dynamic Task Prioritization for Multitask Learning. In ECCV .  Michelle Guo Albert Haque De-An Huang Serena Yeung and Li Fei-Fei. 2018. Dynamic Task Prioritization for Multitask Learning. In ECCV ."},{"key":"e_1_3_2_1_22_1","unstructured":"Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2015. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In ICCV .  Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2015. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In ICCV ."},{"key":"e_1_3_2_1_23_1","unstructured":"Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. Deep residual learning for image recognition. In CVPR .  Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. Deep residual learning for image recognition. In CVPR ."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"crossref","unstructured":"De-An Huang Vignesh Ramanathan Dhruv Mahajan Lorenzo Torresani Manohar Paluri Li Fei-Fei and Juan Carlos Niebles. 2018. What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets. In CVPR .  De-An Huang Vignesh Ramanathan Dhruv Mahajan Lorenzo Torresani Manohar Paluri Li Fei-Fei and Juan Carlos Niebles. 2018. What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets. In CVPR .","DOI":"10.1109\/CVPR.2018.00769"},{"volume-title":"Laurens Van Der Maaten, and Kilian Q Weinberger","year":"2017","author":"Huang Gao","key":"e_1_3_2_1_25_1"},{"volume-title":"et al.","year":"2015","author":"Jaderberg Max","key":"e_1_3_2_1_26_1"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.59"},{"volume-title":"Video pixel networks. 
arXiv preprint arXiv:1610.00527","year":"2016","author":"Kalchbrenner Nal","key":"e_1_3_2_1_28_1"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Andrej Karpathy George Toderici Sanketh Shetty Thomas Leung Rahul Sukthankar and Li Fei-Fei. 2014. Large-scale Video Classification with Convolutional Neural Networks. In CVPR .  Andrej Karpathy George Toderici Sanketh Shetty Thomas Leung Rahul Sukthankar and Li Fei-Fei. 2014. Large-scale Video Classification with Convolutional Neural Networks. In CVPR .","DOI":"10.1109\/CVPR.2014.223"},{"key":"e_1_3_2_1_30_1","unstructured":"Alex Kendall Yarin Gal and Roberto Cipolla. 2018. Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics. In CVPR .  Alex Kendall Yarin Gal and Roberto Cipolla. 2018. Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics. In CVPR ."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"crossref","unstructured":"Alexander Klaser Marcin Marsza\u0142ek and Cordelia Schmid. 2008. A spatio-temporal descriptor based on 3d-gradients. In BMVC .  Alexander Klaser Marcin Marsza\u0142ek and Cordelia Schmid. 2008. A spatio-temporal descriptor based on 3d-gradients. In BMVC .","DOI":"10.5244\/C.22.99"},{"volume-title":"Ubernet: Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision Using Diverse Datasets and Limited Memory. In CVPR .","year":"2017","author":"Kokkinos Iasonas","key":"e_1_3_2_1_32_1"},{"volume-title":"Hinton","year":"2012","author":"Krizhevsky Alex","key":"e_1_3_2_1_33_1"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"crossref","unstructured":"H. Kuehne H. Jhuang E. Garrote T. Poggio and T. Serre. 2011. HMDB: a large video database for human motion recognition. In ICCV .  H. Kuehne H. Jhuang E. Garrote T. Poggio and T. Serre. 2011. HMDB: a large video database for human motion recognition. 
In ICCV .","DOI":"10.1109\/ICCV.2011.6126543"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-005-1838-7"},{"key":"e_1_3_2_1_36_1","unstructured":"Junnan Li Yongkang Wong Qi Zhao and Mohan S Kankanhalli. 2017. Attention transfer from web images for video recognition. In ACM MM .  Junnan Li Yongkang Wong Qi Zhao and Mohan S Kankanhalli. 2017. Attention transfer from web images for video recognition. In ACM MM ."},{"key":"e_1_3_2_1_37_1","unstructured":"Yuke Li. 2018. Video Forecasting with Forward-Backward-Net: Delving Deeper into Spatiotemporal Consistency. In ACM MM .  Yuke Li. 2018. Video Forecasting with Forward-Backward-Net: Delving Deeper into Spatiotemporal Consistency. In ACM MM ."},{"key":"e_1_3_2_1_38_1","unstructured":"William Lotter Gabriel Kreiman and David Cox. 2017. Deep predictive coding networks for video prediction and unsupervised learning. In ICLR .  William Lotter Gabriel Kreiman and David Cox. 2017. Deep predictive coding networks for video prediction and unsupervised learning. In ICLR ."},{"key":"e_1_3_2_1_39_1","unstructured":"Michael Mathieu Camille Couprie and Yann LeCun. 2016. Deep multi-scale video prediction beyond mean square error. In ICLR .  Michael Mathieu Camille Couprie and Yann LeCun. 2016. Deep multi-scale video prediction beyond mean square error. In ICLR ."},{"key":"e_1_3_2_1_40_1","unstructured":"Liqiang Nie Xiang Wang Jianglong Zhang Xiangnan He Hanwang Zhang Richang Hong and Qi Tian. 2017. Enhancing micro-video understanding by harnessing external sounds. In ACM MM .  Liqiang Nie Xiang Wang Jianglong Zhang Xiangnan He Hanwang Zhang Richang Hong and Qi Tian. 2017. Enhancing micro-video understanding by harnessing external sounds. In ACM MM ."},{"volume-title":"Wavenet: A generative model for raw audio. 
arXiv preprint arXiv:1609.03499","year":"2016","author":"van den Oord Aaron","key":"e_1_3_2_1_41_1"},{"volume-title":"NIPS Workshops .","year":"2017","author":"Paszke Adam","key":"e_1_3_2_1_42_1"},{"volume-title":"Freeman","year":"2014","author":"Pickup Lyndsey C.","key":"e_1_3_2_1_43_1"},{"volume-title":"Memory-efficient implementation of densenets. arXiv preprint arXiv:1707.06990","year":"2017","author":"Pleiss Geoff","key":"e_1_3_2_1_44_1"},{"key":"e_1_3_2_1_45_1","unstructured":"Zhaofan Qiu Ting Yao and Tao Mei. 2017. Learning spatio-temporal representation with pseudo-3d residual networks. In ICCV .  Zhaofan Qiu Ting Yao and Tao Mei. 2017. Learning spatio-temporal representation with pseudo-3d residual networks. In ICCV ."},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"crossref","unstructured":"Sreemanananth Sadanand and Jason J Corso. 2012. Action bank: A high-level representation of activity in video. In CVPR .  Sreemanananth Sadanand and Jason J Corso. 2012. Action bank: A high-level representation of activity in video. In CVPR .","DOI":"10.1109\/CVPR.2012.6247806"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"crossref","unstructured":"Paul Scovanner Saad Ali and Mubarak Shah. 2007. A 3-dimensional sift descriptor and its application to action recognition. In ACM MM .  Paul Scovanner Saad Ali and Mubarak Shah. 2007. A 3-dimensional sift descriptor and its application to action recognition. In ACM MM .","DOI":"10.1145\/1291233.1291311"},{"key":"e_1_3_2_1_48_1","unstructured":"Ozan Sener and Vladlen Koltun. 2018. Multi-Task Learning as Multi-Objective Optimization. In NIPS .  Ozan Sener and Vladlen Koltun. 2018. Multi-Task Learning as Multi-Objective Optimization. In NIPS ."},{"key":"e_1_3_2_1_49_1","unstructured":"Wenzhe Shi Jose Caballero Ferenc Husz\u00e1r Johannes Totz Andrew P Aitken Rob Bishop Daniel Rueckert and Zehan Wang. 2016. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In CVPR .  
Wenzhe Shi Jose Caballero Ferenc Husz\u00e1r Johannes Totz Andrew P Aitken Rob Bishop Daniel Rueckert and Zehan Wang. 2016. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In CVPR ."},{"key":"e_1_3_2_1_50_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos. In NIPS .  Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos. In NIPS ."},{"key":"e_1_3_2_1_51_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In ICLR .  Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In ICLR ."},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"crossref","unstructured":"Leslie N Smith. 2017. Cyclical learning rates for training neural networks. In WACV .  Leslie N Smith. 2017. Cyclical learning rates for training neural networks. In WACV .","DOI":"10.1109\/WACV.2017.58"},{"volume-title":"Amir Roshan Zamir, and Mubarak Shah","year":"2012","author":"Soomro Khurram","key":"e_1_3_2_1_53_1"},{"key":"e_1_3_2_1_54_1","unstructured":"Nitish Srivastava Elman Mansimov and Ruslan Salakhudinov. 2015. Unsupervised learning of video representations using lstms. In ICML .  Nitish Srivastava Elman Mansimov and Ruslan Salakhudinov. 2015. Unsupervised learning of video representations using lstms. In ICML ."},{"key":"e_1_3_2_1_55_1","unstructured":"Deqing Sun Stefan Roth and Michael J Black. 2010. Secrets of optical flow estimation and their principles. In CVPR .  Deqing Sun Stefan Roth and Michael J Black. 2010. Secrets of optical flow estimation and their principles. In CVPR ."},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"crossref","unstructured":"Du Tran Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri. 2015. 
Learning spatiotemporal features with 3d convolutional networks. In ICCV .  Du Tran Lubomir Bourdev Rob Fergus Lorenzo Torresani and Manohar Paluri. 2015. Learning spatiotemporal features with 3d convolutional networks. In ICCV .","DOI":"10.1109\/ICCV.2015.510"},{"volume-title":"Convnet architecture search for spatiotemporal feature learning. arXiv preprint arXiv:1708.05038","year":"2017","author":"Tran Du","key":"e_1_3_2_1_57_1"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"crossref","unstructured":"Du Tran Heng Wang Lorenzo Torresani Jamie Ray Yann LeCun and Manohar Paluri. 2018. A Closer Look at Spatiotemporal Convolutions for Action Recognition. In CVPR .  Du Tran Heng Wang Lorenzo Torresani Jamie Ray Yann LeCun and Manohar Paluri. 2018. A Closer Look at Spatiotemporal Convolutions for Action Recognition. In CVPR .","DOI":"10.1109\/CVPR.2018.00675"},{"key":"e_1_3_2_1_59_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In NIPS .  Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In NIPS ."},{"key":"e_1_3_2_1_60_1","doi-asserted-by":"crossref","unstructured":"Carl Vondrick and Antonio Torralba. 2017. Generating the Future With Adversarial Transformers. In CVPR .  Carl Vondrick and Antonio Torralba. 2017. Generating the Future With Adversarial Transformers. In CVPR .","DOI":"10.1109\/CVPR.2017.319"},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"crossref","unstructured":"Heng Wang and Cordelia Schmid. 2013. Action recognition with improved trajectories. In ICCV .  Heng Wang and Cordelia Schmid. 2013. Action recognition with improved trajectories. In ICCV .","DOI":"10.1109\/ICCV.2013.441"},{"key":"e_1_3_2_1_62_1","doi-asserted-by":"crossref","unstructured":"Limin Wang Yuanjun Xiong Zhe Wang Yu Qiao Dahua Lin Xiaoou Tang and Luc Van Gool. 2016. 
Temporal segment networks: Towards good practices for deep action recognition. In ECCV .  Limin Wang Yuanjun Xiong Zhe Wang Yu Qiao Dahua Lin Xiaoou Tang and Luc Van Gool. 2016. Temporal segment networks: Towards good practices for deep action recognition. In ECCV .","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"e_1_3_2_1_63_1","doi-asserted-by":"crossref","unstructured":"Xiaolong Wang Ross Girshick Abhinav Gupta and Kaiming He. 2018. Non-local Neural Networks. In CVPR .  Xiaolong Wang Ross Girshick Abhinav Gupta and Kaiming He. 2018. Non-local Neural Networks. In CVPR .","DOI":"10.1109\/CVPR.2018.00813"},{"volume-title":"Freeman","year":"2018","author":"Wei Donglai","key":"e_1_3_2_1_64_1"},{"key":"e_1_3_2_1_65_1","unstructured":"Yuxin Wu and Kaiming He. 2018. Group Normalization. In ECCV .  Yuxin Wu and Kaiming He. 2018. Group Normalization. In ECCV ."},{"key":"e_1_3_2_1_66_1","doi-asserted-by":"crossref","unstructured":"Z. Wu Y.-G. Jiang X. Wang H. Ye and X. Xue. 2016. Multi-Stream Multi-Class Fusion of Deep Networks for Video Classification. ACM MM (2016).  Z. Wu Y.-G. Jiang X. Wang H. Ye and X. Xue. 2016. Multi-Stream Multi-Class Fusion of Deep Networks for Video Classification. ACM MM (2016).","DOI":"10.1145\/2964284.2964328"},{"key":"e_1_3_2_1_67_1","unstructured":"Zuxuan Wu Xi Wang Yu-Gang Jiang Hao Ye and Xiangyang Xue. 2015. Modeling spatial-temporal clues in a hybrid deep learning framework for video classification. In ACM MM .  Zuxuan Wu Xi Wang Yu-Gang Jiang Hao Ye and Xiangyang Xue. 2015. Modeling spatial-temporal clues in a hybrid deep learning framework for video classification. In ACM MM ."},{"key":"e_1_3_2_1_68_1","unstructured":"Saining Xie Chen Sun Jonathan Huang Zhuowen Tu and Kevin Murphy. 2018. Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification. In ECCV .  Saining Xie Chen Sun Jonathan Huang Zhuowen Tu and Kevin Murphy. 2018. 
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification. In ECCV ."},{"key":"e_1_3_2_1_69_1","doi-asserted-by":"crossref","unstructured":"Joe Yue-Hei Ng Matthew Hausknecht Sudheendra Vijayanarasimhan Oriol Vinyals Rajat Monga and George Toderici. 2015. Beyond short snippets: Deep networks for video classification. In CVPR .  Joe Yue-Hei Ng Matthew Hausknecht Sudheendra Vijayanarasimhan Oriol Vinyals Rajat Monga and George Toderici. 2015. Beyond short snippets: Deep networks for video classification. In CVPR .","DOI":"10.1109\/CVPR.2015.7299101"},{"volume-title":"Taskonomy: Disentangling Task Transfer Learning. In CVPR .","year":"2018","author":"Zamir Amir R","key":"e_1_3_2_1_70_1"},{"key":"e_1_3_2_1_71_1","doi-asserted-by":"crossref","unstructured":"Bolei Zhou Alex Andonian Aude Oliva and Antonio Torralba. 2018. Temporal Relational Reasoning in Videos. In ECCV .  Bolei Zhou Alex Andonian Aude Oliva and Antonio Torralba. 2018. Temporal Relational Reasoning in Videos. 
In ECCV .","DOI":"10.1007\/978-3-030-01246-5_49"}],"event":{"name":"MM '19: The 27th ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Nice France","acronym":"MM '19"},"container-title":["Proceedings of the 27th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3343031.3351054","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3343031.3351054","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:13:12Z","timestamp":1750201992000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3343031.3351054"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,15]]},"references-count":71,"alternative-id":["10.1145\/3343031.3351054","10.1145\/3343031"],"URL":"https:\/\/doi.org\/10.1145\/3343031.3351054","relation":{},"subject":[],"published":{"date-parts":[[2019,10,15]]},"assertion":[{"value":"2019-10-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}