{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,16]],"date-time":"2026-05-16T00:34:24Z","timestamp":1778891664926,"version":"3.51.4"},"reference-count":187,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2024,10,7]],"date-time":"2024-10-07T00:00:00Z","timestamp":1728259200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2025,1,31]]},"abstract":"<jats:p>Action recognition refers to the process of categorizing a video by identifying and classifying the specific actions it encompasses. Videos originate from several domains, and within each domain of video analysis, comprehending actions holds paramount significance. The primary aim of this research is to assist scholars in understanding, comparing, and using action recognition models within the several fields of video analysis. This article provides a comprehensive analysis of action recognition models, comparing their performance and computational requirements. Additionally, it presents a detailed overview of benchmark datasets, which can aid in selecting the most suitable action recognition model. 
This review additionally examines the diverse applications of action recognition, the datasets available, the research that has been undertaken, potential future prospects, and the challenges encountered.<\/jats:p>","DOI":"10.1145\/3679011","type":"journal-article","created":{"date-parts":[[2024,8,8]],"date-time":"2024-08-08T11:33:00Z","timestamp":1723116780000},"page":"1-36","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["How to Improve Video Analytics with Action Recognition: A Survey"],"prefix":"10.1145","volume":"57","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2515-3746","authenticated-orcid":false,"given":"Gayathri","family":"T","sequence":"first","affiliation":[{"name":"Computer Science and Engineering, PES University, Bangalore, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5409-1329","authenticated-orcid":false,"given":"Mamatha","family":"HR","sequence":"additional","affiliation":[{"name":"Computer Science and Engineering, PES University, Bangalore, India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,10,7]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"20143","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201922)","author":"Acsintoae Andra","year":"2022","unstructured":"Andra Acsintoae, Andrei Florescu, Mariana-Iuliana Georgescu, Tudor Mare, Paul Sumedrea, Radu Tudor Ionescu, Fahad Shahbaz Khan, and Mubarak Shah. 2022. UBnormal: New benchmark for supervised open-set video anomaly detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201922). 
IEEE, 20143\u201320153."},{"issue":"3","key":"e_1_3_2_3_2","doi-asserted-by":"crossref","first-page":"555","DOI":"10.1109\/TPAMI.2007.70825","article-title":"Robust real-time unusual event detection using multiple fixed-location monitors","volume":"30","author":"Adam Amit","year":"2008","unstructured":"Amit Adam, Ehud Rivlin, Ilan Shimshoni, and Daviv Reinitz. 2008. Robust real-time unusual event detection using multiple fixed-location monitors. IEEE Trans. Pattern Anal. Mach. Intell. 30, 3 (2008), 555\u2013560.","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"e_1_3_2_4_2","first-page":"469","volume-title":"Proceedings of the 8th European Conference on Computer Vision (ECCV\u201904)","author":"Ahonen Timo","year":"2004","unstructured":"Timo Ahonen, Abdenour Hadid, and Matti Pietik\u00e4inen. 2004. Face recognition with local binary patterns. In Proceedings of the 8th European Conference on Computer Vision (ECCV\u201904). Springer, 469\u2013481."},{"key":"e_1_3_2_5_2","first-page":"179","volume-title":"Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV\u201919)","author":"Ahsan Unaiza","year":"2019","unstructured":"Unaiza Ahsan, Rishi Madhok, and Irfan Essa. 2019. Video jigsaw: Unsupervised learning of spatiotemporal context for video action recognition. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV\u201919). IEEE, 179\u2013189."},{"key":"e_1_3_2_6_2","doi-asserted-by":"crossref","first-page":"20877","DOI":"10.1007\/s11042-019-7392-z","article-title":"Highly refined human action recognition model to handle intraclass variability & interclass similarity","volume":"78","author":"Akila K.","year":"2019","unstructured":"K. Akila and S. Chitrakala. 2019. Highly refined human action recognition model to handle intraclass variability & interclass similarity. Multim. Tools Applic. 78 (2019), 20877\u201320894.","journal-title":"Multim. 
Tools Applic."},{"issue":"2","key":"e_1_3_2_7_2","first-page":"288","article-title":"Human action recognition in videos using kinematic features and multiple instance learning","volume":"32","author":"Ali Saad","year":"2008","unstructured":"Saad Ali and Mubarak Shah. 2008. Human action recognition in videos using kinematic features and multiple instance learning. IEEE Trans. Pattern Anal. Mach. Intell. 32, 2 (2008), 288\u2013303.","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"issue":"6","key":"e_1_3_2_8_2","doi-asserted-by":"crossref","first-page":"1433","DOI":"10.1109\/TMM.2019.2944745","article-title":"2D pose-based real-time human action recognition with occlusion-handling","volume":"22","author":"Angelini Federico","year":"2019","unstructured":"Federico Angelini, Zeyu Fu, Yang Long, Ling Shao, and Syed Mohsen Naqvi. 2019. 2D pose-based real-time human action recognition with occlusion-handling. IEEE Trans. Multim. 22, 6 (2019), 1433\u20131446.","journal-title":"IEEE Trans. Multim."},{"key":"e_1_3_2_9_2","article-title":"Towards smart city security: Violence and weaponized violence detection using DCNN","author":"Aremu Toluwani","year":"2022","unstructured":"Toluwani Aremu, Li Zhiyuan, Reem Alameeri, Moayad Aloqaily, and Mohsen Guizani. 2022. Towards smart city security: Violence and weaponized violence detection using DCNN. arXiv preprint arXiv:2207.12850 (2022).","journal-title":"arXiv preprint arXiv:2207.12850"},{"key":"e_1_3_2_10_2","first-page":"6836","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201921)","author":"Arnab Anurag","year":"2021","unstructured":"Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lu\u010di\u0107, and Cordelia Schmid. 2021. ViViT: A video vision transformer. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201921). 
IEEE, 6836\u20136846."},{"key":"e_1_3_2_11_2","first-page":"751","volume-title":"Proceedings of the 16th European Conference on Computer Vision (ECCV\u201920)","author":"Arnab Anurag","year":"2020","unstructured":"Anurag Arnab, Chen Sun, Arsha Nagrani, and Cordelia Schmid. 2020. Uncertainty-aware weakly supervised action detection from untrimmed videos. In Proceedings of the 16th European Conference on Computer Vision (ECCV\u201920). Springer, 751\u2013768."},{"key":"e_1_3_2_12_2","unstructured":"Robotics Artificial Intelligence Department of Computer Science Vision Laboratory University of Minnesota and Engineering. 2014. Unusual Event Datasets. Retrieved from: http:\/\/mha.cs.umn.edu\/Movies\/Crowd-Activity-All.avi"},{"key":"e_1_3_2_13_2","first-page":"3580","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201922)","author":"Askari Farzaneh","year":"2022","unstructured":"Farzaneh Askari, Rohit Ramaprasad, James J. Clark, and Martin D. Levine. 2022. Interaction classification with key actor detection in multi-person sports videos. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201922). IEEE, 3580\u20133588."},{"key":"e_1_3_2_14_2","first-page":"24","article-title":"Multiple cameras fall dataset","volume":"1350","author":"Auvinet Edouard","year":"2010","unstructured":"Edouard Auvinet, Caroline Rougier, Jean Meunier, Alain St-Arnaud, and Jacqueline Rousseau. 2010. Multiple cameras fall dataset. DIRO-Universit\u00e9 de Montr\u00e9al, Tech. Rep 1350 (2010), 24.","journal-title":"DIRO-Universit\u00e9 de Montr\u00e9al, Tech. Rep"},{"key":"e_1_3_2_15_2","first-page":"29","volume-title":"Proceedings of the 2nd International Workshop on Human Behavior Understanding (HBU\u201911)","author":"Baccouche Moez","year":"2011","unstructured":"Moez Baccouche, Franck Mamalet, Christian Wolf, Christophe Garcia, and Atilla Baskurt. 2011. 
Sequential deep learning for human action recognition. In Proceedings of the 2nd International Workshop on Human Behavior Understanding (HBU\u201911). Springer, 29\u201339."},{"key":"e_1_3_2_16_2","first-page":"404","volume-title":"Proceedings of the 9th European Conference on Computer Vision (ECCV\u201906)","author":"Bay Herbert","year":"2006","unstructured":"Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. 2006. SURF: Speeded up robust features. In Proceedings of the 9th European Conference on Computer Vision (ECCV\u201906). Springer, 404\u2013417."},{"key":"e_1_3_2_17_2","first-page":"2280","volume-title":"Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201913)","author":"Bojanowski Piotr","year":"2013","unstructured":"Piotr Bojanowski, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid, and Josef Sivic. 2013. Finding actors and actions in movies. In Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201913). IEEE, 2280\u20132287."},{"key":"e_1_3_2_18_2","first-page":"628","volume-title":"Proceedings of the 13th European Conference on Computer Vision (ECCV\u201914)","author":"Bojanowski Piotr","year":"2014","unstructured":"Piotr Bojanowski, R\u00e9mi Lajugie, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid, and Josef Sivic. 2014. Weakly supervised action labeling in videos under ordering constraints. In Proceedings of the 13th European Conference on Computer Vision (ECCV\u201914). Springer, 628\u2013643."},{"key":"e_1_3_2_19_2","first-page":"7291","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"Cao Zhe","year":"2017","unstructured":"Zhe Cao, Tomas Simon, Shih-En Wei, and Yaser Sheikh. 2017. Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917). 
IEEE, 7291\u20137299."},{"key":"e_1_3_2_20_2","article-title":"A short note about Kinetics-600","author":"Carreira Joao","year":"2018","unstructured":"Joao Carreira, Eric Noland, Andras Banki-Horvath, Chloe Hillier, and Andrew Zisserman. 2018. A short note about Kinetics-600. arXiv preprint arXiv:1808.01340 (2018).","journal-title":"arXiv preprint arXiv:1808.01340"},{"key":"e_1_3_2_21_2","first-page":"4724","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"Carreira Joao","year":"2017","unstructured":"Joao Carreira and Andrew Zisserman. 2017. Quo vadis, action recognition? A new model and the Kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917). CVPR, 4724\u20134733."},{"key":"e_1_3_2_22_2","unstructured":"Paola Cascante-Bonilla Kalpathy Sitaraman Mengjia Luo and Vicente Ordonez. 2019. MovieScope: Large-scale Analysis of Movies using Multiple Modalities. arXiv preprint arXiv:1908.03180."},{"key":"e_1_3_2_23_2","first-page":"100134","article-title":"Deep learning in computer vision: A critical review of emerging techniques and application scenarios","volume":"6","author":"Chai Junyi","year":"2021","unstructured":"Junyi Chai, Hao Zeng, Anming Li, and Eric W. T. Ngai. 2021. Deep learning in computer vision: A critical review of emerging techniques and application scenarios. Mach. Learn. Applic. 6 (2021), 100134.","journal-title":"Mach. Learn. Applic."},{"key":"e_1_3_2_24_2","first-page":"168","volume-title":"Proceedings of the IEEE International Conference on Image Processing (ICIP\u201915)","author":"Chen Chen","year":"2015","unstructured":"Chen Chen, Roozbeh Jafari, and Nasser Kehtarnavaz. 2015. UTD-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor. In Proceedings of the IEEE International Conference on Image Processing (ICIP\u201915). 
IEEE, 168\u2013172."},{"key":"e_1_3_2_25_2","first-page":"846","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201921)","author":"Chen Rui","year":"2021","unstructured":"Rui Chen, Jiajun Chen, Zixi Liang, Huaien Gao, and Shan Lin. 2021. DarkLight networks for action recognition in the dark. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201921). IEEE, 846\u2013852."},{"key":"e_1_3_2_26_2","doi-asserted-by":"crossref","first-page":"2846","DOI":"10.1007\/s11263-021-01486-4","article-title":"SportsCap: Monocular 3D human motion capture and fine-grained understanding in challenging sports videos","volume":"129","author":"Chen Xin","year":"2021","unstructured":"Xin Chen, Anqi Pang, Wei Yang, Yuexin Ma, Lan Xu, and Jingyi Yu. 2021. SportsCap: Monocular 3D human motion capture and fine-grained understanding in challenging sports videos. Int. J. Comput. Vis. 129 (2021), 2846\u20132864.","journal-title":"Int. J. Comput. Vis."},{"key":"e_1_3_2_27_2","first-page":"4183","volume-title":"Proceedings of the 25th International Conference on Pattern Recognition (ICPR\u201921)","author":"Cheng Ming","year":"2021","unstructured":"Ming Cheng, Kunjing Cai, and Ming Li. 2021. RWF-2000: An open large scale video database for violence detection. In Proceedings of the 25th International Conference on Pattern Recognition (ICPR\u201921). IEEE, 4183\u20134190."},{"key":"e_1_3_2_28_2","first-page":"1","volume-title":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","author":"Cheng Yi-Bin","year":"2021","unstructured":"Yi-Bin Cheng, Xipeng Chen, Dongyu Zhang, and Liang Lin. 2021. Motion-transformer: Self-supervised pre-training for skeleton-based action recognition. In Proceedings of the 2nd ACM International Conference on Multimedia in Asia. 
ACM, 1\u20136."},{"key":"e_1_3_2_29_2","first-page":"1717","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision","author":"Choi Jinwoo","year":"2020","unstructured":"Jinwoo Choi, Gaurav Sharma, Manmohan Chandraker, and Jia-Bin Huang. 2020. Unsupervised and semi-supervised domain adaptation for action recognition from drones. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. IEEE, 1717\u20131726."},{"key":"e_1_3_2_30_2","first-page":"7024","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918)","author":"Choutas Vasileios","year":"2018","unstructured":"Vasileios Choutas, Philippe Weinzaepfel, J\u00e9r\u00f4me Revaud, and Cordelia Schmid. 2018. PoTion: Pose moTion representation for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918). IEEE, 7024\u20137033."},{"key":"e_1_3_2_31_2","first-page":"13465","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201921)","author":"Chung Jihoon","year":"2021","unstructured":"Jihoon Chung, Cheng-hsin Wuu, Hsuan-Ru Yang, Yu-Wing Tai, and Chi-Keung Tang. 2021. HAA500: Human-centric atomic action dataset with curated videos. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201921). IEEE, 13465\u201313474."},{"key":"e_1_3_2_32_2","first-page":"295","article-title":"Pose-appearance relational modeling for video action recognition","volume":"32","author":"Cui Mengmeng","year":"2022","unstructured":"Mengmeng Cui, Wei Wang, Kunbo Zhang, Zhenan Sun, and Liang Wang. 2022. Pose-appearance relational modeling for video action recognition. IEEE Trans. Image Process. 32 (2022), 295\u2013308.","journal-title":"IEEE Trans. 
Image Process."},{"key":"e_1_3_2_33_2","first-page":"886","volume-title":"Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905)","author":"Dalal Navneet","year":"2005","unstructured":"Navneet Dalal and Bill Triggs. 2005. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905). IEEE, 886\u2013893."},{"key":"e_1_3_2_34_2","first-page":"428","volume-title":"Proceedings of the 9th European Conference on Computer Vision (ECCV\u201906)","author":"Dalal Navneet","year":"2006","unstructured":"Navneet Dalal, Bill Triggs, and Cordelia Schmid. 2006. Human detection using oriented histograms of flow and appearance. In Proceedings of the 9th European Conference on Computer Vision (ECCV\u201906). Springer, 428\u2013441."},{"key":"e_1_3_2_35_2","first-page":"720","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV\u201918)","author":"Damen Dima","year":"2018","unstructured":"Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, and Michael Wray. 2018. Scaling egocentric vision: The EPIC-KITCHENS dataset. In Proceedings of the European Conference on Computer Vision (ECCV\u201918). IEEE, 720\u2013736."},{"key":"e_1_3_2_36_2","volume-title":"Weakly and Partially Supervised Learning Frameworks for Anomaly Detection","author":"Degardin Bruno Manuel","year":"2020","unstructured":"Bruno Manuel Degardin. 2020. Weakly and Partially Supervised Learning Frameworks for Anomaly Detection. Ph. D. Dissertation. 
Universidade da Beira Interior (Portugal)."},{"key":"e_1_3_2_37_2","doi-asserted-by":"crossref","first-page":"7379","DOI":"10.1007\/s11042-014-1984-4","article-title":"VSD, a public dataset for the detection of violent scenes in movies: Design, annotation, analysis and evaluation","volume":"74","author":"Demarty Claire-H\u00e9l\u00e8ne","year":"2015","unstructured":"Claire-H\u00e9l\u00e8ne Demarty, C\u00e9dric Penet, Mohammad Soleymani, and Guillaume Gravier. 2015. VSD, a public dataset for the detection of violent scenes in movies: Design, annotation, analysis and evaluation. Multim. Tools Applic. 74 (2015), 7379\u20137404.","journal-title":"Multim. Tools Applic."},{"key":"e_1_3_2_38_2","doi-asserted-by":"crossref","first-page":"3835","DOI":"10.1109\/TIP.2020.2965299","article-title":"View-invariant deep architecture for human action recognition using two-stream motion and shape temporal dynamics","volume":"29","author":"Dhiman Chhavi","year":"2020","unstructured":"Chhavi Dhiman and Dinesh Kumar Vishwakarma. 2020. View-invariant deep architecture for human action recognition using two-stream motion and shape temporal dynamics. IEEE Trans. Image Process. 29 (2020), 3835\u20133844.","journal-title":"IEEE Trans. Image Process."},{"issue":"3","key":"e_1_3_2_39_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3441628","article-title":"Part-wise spatio-temporal attention driven CNN-based 3D human action recognition","volume":"17","author":"Dhiman Chhavi","year":"2021","unstructured":"Chhavi Dhiman, Dinesh Kumar Vishwakarma, and Paras Agarwal. 2021. Part-wise spatio-temporal attention driven CNN-based 3D human action recognition. ACM Trans. Multim. Comput. Commun. Applic. 17, 3 (2021), 1\u201324.","journal-title":"ACM Trans. Multim. Comput. Commun. 
Applic."},{"key":"e_1_3_2_40_2","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly Jakob Uszkoreit and Neil Houlsby. 2020. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv preprint arXiv:2010.11929."},{"key":"e_1_3_2_41_2","first-page":"1110","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201915)","author":"Du Yong","year":"2015","unstructured":"Yong Du, Wei Wang, and Liang Wang. 2015. Hierarchical recurrent neural network for skeleton based action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201915). IEEE, 1110\u20131118."},{"key":"e_1_3_2_42_2","first-page":"6824","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201921)","author":"Fan Haoqi","year":"2021","unstructured":"Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, and Christoph Feichtenhofer. 2021. Multiscale vision transformers. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201921). IEEE, 6824\u20136835."},{"key":"e_1_3_2_43_2","first-page":"1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201908)","author":"Fathi Alireza","year":"2008","unstructured":"Alireza Fathi and Greg Mori. 2008. Action recognition by learning mid-level motion features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201908). IEEE, 1\u20138."},{"key":"e_1_3_2_44_2","article-title":"Flow-pose Net: An effective two-stream network for fall detection","author":"Fei Kexin","year":"2023","unstructured":"Kexin Fei, Chao Wang, Jiaxu Zhang, Yuanzhong Liu, Xing Xie, and Zhigang Tu. 2023. Flow-pose Net: An effective two-stream network for fall detection. 
The Visual Computer. 39, 6 (2023), 2305\u201320.","journal-title":"The Visual Computer."},{"key":"e_1_3_2_45_2","first-page":"203","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201920)","author":"Feichtenhofer Christoph","year":"2020","unstructured":"Christoph Feichtenhofer. 2020. X3D: Expanding architectures for efficient video recognition. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201920). IEEE, 203\u2013213."},{"key":"e_1_3_2_46_2","first-page":"6202","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201919)","author":"Feichtenhofer Christoph","year":"2019","unstructured":"Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, and Kaiming He. 2019. SlowFast networks for video recognition. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201919). IEEE, 6202\u20136211."},{"issue":"11","key":"e_1_3_2_47_2","doi-asserted-by":"crossref","first-page":"2188","DOI":"10.1109\/TPAMI.2011.70","article-title":"Hough forests for object detection, tracking, and action recognition","volume":"33","author":"Gall Juergen","year":"2011","unstructured":"Juergen Gall, Angela Yao, Nima Razavi, Luc Van Gool, and Victor Lempitsky. 2011. Hough forests for object detection, tracking, and action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 33, 11 (2011), 2188\u20132202.","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"e_1_3_2_48_2","first-page":"137","volume-title":"Proceedings of the 15th International Work-Conference on Artificial Neural Networks: Advances in Computational Intelligence (IWANN\u201919)","author":"Ganesh Yaparla","year":"2019","unstructured":"Yaparla Ganesh, Allaparthi Sri Teja, Sai Krishna Munnangi, and Garimella Rama Murthy. 2019. A novel framework for fine grained action recognition in soccer. 
In Proceedings of the 15th International Work-Conference on Artificial Neural Networks: Advances in Computational Intelligence (IWANN\u201919). Springer, 137\u2013150."},{"key":"e_1_3_2_49_2","first-page":"12742","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201921)","author":"Georgescu Mariana-Iuliana","year":"2021","unstructured":"Mariana-Iuliana Georgescu, Antonio Barbalau, Radu Tudor Ionescu, Fahad Shahbaz Khan, Marius Popescu, and Mubarak Shah. 2021. Anomaly detection in video via self-supervised and multi-task learning. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201921). IEEE, 12742\u201312752."},{"key":"e_1_3_2_50_2","first-page":"1711","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops","author":"Giancola Silvio","year":"2018","unstructured":"Silvio Giancola, Mohieddine Amine, Tarek Dghaily, and Bernard Ghanem. 2018. SoccerNet: A scalable dataset for action spotting in soccer videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. IEEE, 1711\u20131721."},{"key":"e_1_3_2_51_2","doi-asserted-by":"crossref","first-page":"103442","DOI":"10.1016\/j.cviu.2022.103442","article-title":"Multi-human fall detection and localization in videos","volume":"220","author":"Gomes Mouglas Eug\u00eanio Nas\u00e1rio","year":"2022","unstructured":"Mouglas Eug\u00eanio Nas\u00e1rio Gomes, David Mac\u00eado, Cleber Zanchettin, Paulo Salgado Gomes de Mattos-Neto, and Adriano Oliveira. 2022. Multi-human fall detection and localization in videos. Comput. Vis. Image Underst. 220 (2022), 103442.","journal-title":"Comput. Vis. 
Image Underst."},{"issue":"12","key":"e_1_3_2_52_2","doi-asserted-by":"crossref","first-page":"2247","DOI":"10.1109\/TPAMI.2007.70711","article-title":"Actions as space-time shapes","volume":"29","author":"Gorelick Lena","year":"2007","unstructured":"Lena Gorelick, Moshe Blank, Eli Shechtman, Michal Irani, and Ronen Basri. 2007. Actions as space-time shapes. IEEE Trans. Pattern Anal. Mach. Intell. 29, 12 (2007), 2247\u20132253.","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"e_1_3_2_53_2","first-page":"5842","volume-title":"Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201917)","author":"Goyal Raghav","year":"2017","unstructured":"Raghav Goyal, Samira Ebrahimi Kahou, Vincent Michalski, Joanna Materzynska, Susanne Westphal, Heuna Kim, Valentin Haenel, Ingo Fruend, Peter Yianilos, Moritz Mueller-Freitag, Florian Hoppe, Christian Thurau, Ingo Bax, and Roland Memisevic. 2017. The \u201cSomething Something\u201d video database for learning and evaluating visual common sense. In Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201917). IEEE, 5842\u20135850."},{"key":"e_1_3_2_54_2","first-page":"6047","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918)","author":"Gu Chunhui","year":"2018","unstructured":"Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, and Jitendra Malik. 2018. AVA: A video dataset of spatio-temporally localized atomic visual actions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918). IEEE, 6047\u20136056."},{"key":"e_1_3_2_55_2","first-page":"762","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Guo Tianyu","year":"2022","unstructured":"Tianyu Guo, Hong Liu, Zhan Chen, Mengyuan Liu, Tao Wang, and Runwei Ding. 
2022. Contrastive learning from extremely augmented skeleton sequences for self-supervised action recognition. In Proceedings of the AAAI Conference on Artificial Intelligence. AAAI, 762\u2013770."},{"key":"e_1_3_2_56_2","doi-asserted-by":"crossref","first-page":"110333","DOI":"10.1016\/j.patcog.2024.110333","article-title":"Improving self-supervised action recognition from extremely augmented skeleton sequences","volume":"150","author":"Guo Tianyu","year":"2024","unstructured":"Tianyu Guo, Mengyuan Liu, Hong Liu, Guoquan Wang, and Wenhao Li. 2024. Improving self-supervised action recognition from extremely augmented skeleton sequences. Pattern Recog. 150 (2024), 110333.","journal-title":"Pattern Recog."},{"key":"e_1_3_2_57_2","first-page":"628","volume-title":"Proceedings of the 28th ACM International Conference on Multimedia","author":"Hao Yanbin","year":"2020","unstructured":"Yanbin Hao, Hao Zhang, Chong-Wah Ngo, Qiang Liu, and Xiaojun Hu. 2020. Compact bilinear augmented query structured attention for sport highlights classification. In Proceedings of the 28th ACM International Conference on Multimedia. ACM, 628\u2013636."},{"key":"e_1_3_2_58_2","first-page":"6546","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918)","author":"Hara Kensho","year":"2018","unstructured":"Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh. 2018. Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918). IEEE, 6546\u20136555."},{"key":"e_1_3_2_59_2","first-page":"9254","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201921)","author":"Hong James","year":"2021","unstructured":"James Hong, Matthew Fisher, Micha\u00ebl Gharbi, and Kayvon Fatahalian. 2021. Video pose distillation for few-shot, fine-grained sports action recognition. 
In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201921). IEEE, 9254\u20139263."},{"key":"e_1_3_2_60_2","first-page":"137","volume-title":"Proceedings of the 14th European Conference on Computer Vision (ECCV\u201916)","author":"Huang De-An","year":"2016","unstructured":"De-An Huang, Li Fei-Fei, and Juan Carlos Niebles. 2016. Connectionist temporal modeling for weakly supervised action labeling. In Proceedings of the 14th European Conference on Computer Vision (ECCV\u201916). Springer, 137\u2013153."},{"key":"e_1_3_2_61_2","first-page":"709","volume-title":"Proceedings of the 16th European Conference on Computer Vision (ECCV\u201920)","author":"Huang Qingqiu","year":"2020","unstructured":"Qingqiu Huang, Yu Xiong, Anyi Rao, Jiaze Wang, and Dahua Lin. 2020. MovieNet: A holistic dataset for movie understanding. In Proceedings of the 16th European Conference on Computer Vision (ECCV\u201920). Springer, 709\u2013727."},{"key":"e_1_3_2_62_2","first-page":"2462","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"Ilg Eddy","year":"2017","unstructured":"Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, and Thomas Brox. 2017. FlowNet 2.0: Evolution of optical flow estimation with deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917). IEEE, 2462\u20132470."},{"key":"e_1_3_2_63_2","first-page":"10285","volume-title":"Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS\u201920)","author":"Islam Md Mofijul","year":"2020","unstructured":"Md Mofijul Islam and Tariq Iqbal. 2020. Hamlet: A hierarchical multimodal attention-based human activity recognition algorithm. In Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS\u201920). 
IEEE, 10285\u201310292."},{"issue":"2","key":"e_1_3_2_64_2","doi-asserted-by":"crossref","first-page":"1729","DOI":"10.1109\/LRA.2021.3059624","article-title":"Multi-GAT: A graphical attention-based hierarchical multimodal representation learning approach for human activity recognition","volume":"6","author":"Islam Md Mofijul","year":"2021","unstructured":"Md Mofijul Islam and Tariq Iqbal. 2021. Multi-GAT: A graphical attention-based hierarchical multimodal representation learning approach for human activity recognition. IEEE Robot. Autom. Lett. 6, 2 (2021), 1729\u20131736.","journal-title":"IEEE Robot. Autom. Lett."},{"key":"e_1_3_2_65_2","first-page":"10990","volume-title":"Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS\u201920)","author":"Jang Jinhyeok","year":"2020","unstructured":"Jinhyeok Jang, Dohyung Kim, Cheonshu Park, Minsu Jang, Jaeyeon Lee, and Jaehong Kim. 2020. ETRI-activity3D: A large-scale RGB-D dataset for robots to recognize daily activities of the elderly. In Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS\u201920). IEEE, 10990\u201310997."},{"issue":"1","key":"e_1_3_2_66_2","first-page":"221","article-title":"3D convolutional neural networks for human action recognition","volume":"35","author":"Ji Shuiwang","year":"2012","unstructured":"Shuiwang Ji, Wei Xu, Ming Yang, and Kai Yu. 2012. 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1 (2012), 221\u2013231.","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"e_1_3_2_67_2","first-page":"574","volume-title":"Proceedings of the 27th ACM International Conference on Multimedia","author":"Ji Yanli","year":"2019","unstructured":"Yanli Ji, Feixiang Xu, Yang Yang, Ning Xie, Heng Tao Shen, and Tatsuya Harada. 2019. Attention transfer (ANT) network for view-invariant action recognition. 
In Proceedings of the 27th ACM International Conference on Multimedia. ACM, 574\u2013582."},{"key":"e_1_3_2_68_2","doi-asserted-by":"crossref","first-page":"201","DOI":"10.3758\/BF03212378","article-title":"Visual perception of biological motion and a model for its analysis","volume":"14","author":"Johansson Gunnar","year":"1973","unstructured":"Gunnar Johansson. 1973. Visual perception of biological motion and a model for its analysis. Percept. Psychophys. 14 (1973), 201\u2013211.","journal-title":"Percept. Psychophys."},{"key":"e_1_3_2_69_2","first-page":"1725","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201914)","author":"Karpathy Andrej","year":"2014","unstructured":"Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. 2014. Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201914). IEEE, 1725\u20131732."},{"key":"e_1_3_2_70_2","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1016\/j.cviu.2017.04.007","article-title":"Fine-grained action recognition of boxing punches from depth imagery","volume":"159","author":"Kasiri Soudeh","year":"2017","unstructured":"Soudeh Kasiri, Clinton Fookes, Sridha Sridharan, and Stuart Morgan. 2017. Fine-grained action recognition of boxing punches from depth imagery. Comput. Vis. Image Underst. 159 (2017), 143\u2013153.","journal-title":"Comput. Vis. Image Underst."},{"issue":"6","key":"e_1_3_2_71_2","doi-asserted-by":"crossref","first-page":"2453","DOI":"10.1007\/s00530-022-00978-8","article-title":"ENet: Event based highlight generation network for broadcast sports videos","volume":"28","author":"Khan Abdullah Aman","year":"2022","unstructured":"Abdullah Aman Khan, Yunbo Rao, and Jie Shao. 2022. ENet: Event based highlight generation network for broadcast sports videos. Multim. Syst. 
28, 6 (2022), 2453\u20132464.","journal-title":"Multim. Syst."},{"key":"e_1_3_2_72_2","doi-asserted-by":"crossref","first-page":"107779","DOI":"10.1016\/j.compeleceng.2022.107779","article-title":"SPNet: A deep network for broadcast sports video highlight generation","volume":"99","author":"Khan Abdullah Aman","year":"2022","unstructured":"Abdullah Aman Khan and Jie Shao. 2022. SPNet: A deep network for broadcast sports video highlight generation. Comput. Electric. Eng. 99 (2022), 107779.","journal-title":"Comput. Electric. Eng."},{"key":"e_1_3_2_73_2","doi-asserted-by":"crossref","first-page":"108068","DOI":"10.1016\/j.patcog.2021.108068","article-title":"Weakly-supervised temporal attention 3D network for human action recognition","volume":"119","author":"Kim Jonghyun","year":"2021","unstructured":"Jonghyun Kim, Gen Li, Inyong Yun, Cheolkon Jung, and Joongkyu Kim. 2021. Weakly-supervised temporal attention 3D network for human action recognition. Pattern Recog. 119 (2021), 108068.","journal-title":"Pattern Recog."},{"issue":"2","key":"e_1_3_2_74_2","doi-asserted-by":"crossref","first-page":"532","DOI":"10.1109\/TCSVT.2019.2893318","article-title":"A joint framework for athlete tracking and action recognition in sports videos","volume":"30","author":"Kong Longteng","year":"2019","unstructured":"Longteng Kong, Di Huang, Jie Qin, and Yunhong Wang. 2019. A joint framework for athlete tracking and action recognition in sports videos. IEEE Trans. Circ. Syst. Video Technol. 30, 2 (2019), 532\u2013548.","journal-title":"IEEE Trans. Circ. Syst. Video Technol."},{"key":"e_1_3_2_75_2","first-page":"2046","volume-title":"Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201910)","author":"Kovashka Adriana","year":"2010","unstructured":"Adriana Kovashka and Kristen Grauman. 2010. Learning a hierarchy of discriminative space-time neighborhood features for human action recognition. 
In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201910). IEEE, 2046\u20132053."},{"key":"e_1_3_2_76_2","first-page":"2556","volume-title":"Proceedings of the International Conference on Computer Vision (ICCV\u201911)","author":"Kuehne Hildegard","year":"2011","unstructured":"Hildegard Kuehne, Hueihan Jhuang, Est\u00edbaliz Garrote, Tomaso Poggio, and Thomas Serre. 2011. HMDB: A large video database for human motion recognition. In Proceedings of the International Conference on Computer Vision (ICCV\u201911). IEEE, 2556\u20132563."},{"key":"e_1_3_2_77_2","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1016\/j.cviu.2017.06.004","article-title":"Weakly supervised learning of actions from transcripts","volume":"163","author":"Kuehne Hilde","year":"2017","unstructured":"Hilde Kuehne, Alexander Richard, and Juergen Gall. 2017. Weakly supervised learning of actions from transcripts. Comput. Vis. Image Underst. 163 (2017), 78\u201389.","journal-title":"Comput. Vis. Image Underst."},{"key":"e_1_3_2_78_2","first-page":"1","article-title":"Multi-label movie genre detection from a movie poster using knowledge transfer learning","volume":"5","author":"Kundalia Kaushil","year":"2020","unstructured":"Kaushil Kundalia, Yash Patel, and Manan Shah. 2020. Multi-label movie genre detection from a movie poster using knowledge transfer learning. Augment. Hum. Res. 5 (2020), 1\u20139.","journal-title":"Augment. Hum. Res."},{"issue":"11","key":"e_1_3_2_79_2","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun Yann","year":"1998","unstructured":"Yann LeCun, L\u00e9on Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proc. IEEE 86, 11 (1998), 2278\u20132324.","journal-title":"Proc. 
IEEE"},{"key":"e_1_3_2_80_2","first-page":"1","volume-title":"Proceedings of the 5th International Workshop on Biometrics and Forensics (IWBF\u201917)","author":"Leyva Roberto","year":"2017","unstructured":"Roberto Leyva, Victor Sanchez, and Chang-Tsun Li. 2017. The LV dataset: A realistic surveillance video dataset for abnormal event detection. In Proceedings of the 5th International Workshop on Biometrics and Forensics (IWBF\u201917). IEEE, 1\u20136."},{"key":"e_1_3_2_81_2","first-page":"434","volume-title":"Proceedings of the Asian Conference on Computer Vision","author":"Li Jianwei","year":"2022","unstructured":"Jianwei Li, Haiqing Hu, Jinyang Li, and Xiaomei Zhao. 2022. 3D-Yoga: A 3D yoga dataset for visual-based hierarchical sports action analysis. In Proceedings of the Asian Conference on Computer Vision. Springer, 434\u2013450."},{"key":"e_1_3_2_82_2","first-page":"2723","volume-title":"Proceedings of the 31st ACM International Conference on Multimedia","author":"Li Qiankun","year":"2023","unstructured":"Qiankun Li, Xiaolong Huang, Zhifan Wan, Lanqing Hu, Shuzhe Wu, Jie Zhang, Shiguang Shan, and Zengfu Wang. 2023. Data-efficient masked video modeling for self-supervised action recognition. In Proceedings of the 31st ACM International Conference on Multimedia. ACM, 2723\u20132733."},{"key":"e_1_3_2_83_2","first-page":"1173","volume-title":"Proceedings of the IEEE 16th International Conference on Rehabilitation Robotics (ICORR\u201919)","author":"Li Shengchao","year":"2019","unstructured":"Shengchao Li, Hao Xiong, and Xiumin Diao. 2019. Pre-impact fall detection using 3D convolutional neural network. In Proceedings of the IEEE 16th International Conference on Rehabilitation Robotics (ICORR\u201919). 
IEEE, 1173\u20131178."},{"issue":"1","key":"e_1_3_2_84_2","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1109\/TPAMI.2013.111","article-title":"Anomaly detection and localization in crowded scenes","volume":"36","author":"Li Weixin","year":"2013","unstructured":"Weixin Li, Vijay Mahadevan, and Nuno Vasconcelos. 2013. Anomaly detection and localization in crowded scenes. IEEE Trans. Pattern Anal. Mach. Intell. 36, 1 (2013), 18\u201332.","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"e_1_3_2_85_2","first-page":"9","volume-title":"Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","author":"Li Wanqing","year":"2010","unstructured":"Wanqing Li, Zhengyou Zhang, and Zicheng Liu. 2010. Action recognition based on a bag of 3D points. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. IEEE, 9\u201314."},{"key":"e_1_3_2_86_2","first-page":"13536","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201921)","author":"Li Yixuan","year":"2021","unstructured":"Yixuan Li, Lei Chen, Runyu He, Zhenzhi Wang, Gangshan Wu, and Limin Wang. 2021. MultiSports: A multi-person video dataset of spatio-temporally localized sports actions. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201921). IEEE, 13536\u201313545."},{"key":"e_1_3_2_87_2","first-page":"909","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201920)","author":"Li Yan","year":"2020","unstructured":"Yan Li, Bin Ji, Xintian Shi, Jianguo Zhang, Bin Kang, and Limin Wang. 2020. TEA: Temporal excitation and aggregation for action recognition. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201920). 
IEEE, 909\u2013918."},{"key":"e_1_3_2_88_2","first-page":"513","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV\u201918)","author":"Li Yingwei","year":"2018","unstructured":"Yingwei Li, Yi Li, and Nuno Vasconcelos. 2018. RESOUND: Towards action recognition without representation bias. In Proceedings of the European Conference on Computer Vision (ECCV\u201918). ECCV, 513\u2013528."},{"key":"e_1_3_2_89_2","first-page":"4804","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201922)","author":"Li Yanghao","year":"2022","unstructured":"Yanghao Li, Chao-Yuan Wu, Haoqi Fan, Karttikeya Mangalam, Bo Xiong, Jitendra Malik, and Christoph Feichtenhofer. 2022. MViTv2: Improved multiscale vision transformers for classification and detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201922). IEEE, 4804\u20134814."},{"key":"e_1_3_2_90_2","doi-asserted-by":"crossref","first-page":"107293","DOI":"10.1016\/j.patcog.2020.107293","article-title":"Learning shape and motion representations for view invariant skeleton-based action recognition","volume":"103","author":"Li Yanshan","year":"2020","unstructured":"Yanshan Li, Rongjie Xia, and Xing Liu. 2020. Learning shape and motion representations for view invariant skeleton-based action recognition. Pattern Recog. 103 (2020), 107293.","journal-title":"Pattern Recog."},{"key":"e_1_3_2_91_2","first-page":"7083","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201919)","author":"Lin Ji","year":"2019","unstructured":"Ji Lin, Chuang Gan, and Song Han. 2019. TSM: Temporal shift module for efficient video understanding. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201919). 
IEEE, 7083\u20137093."},{"key":"e_1_3_2_92_2","first-page":"2490","volume-title":"Proceedings of the 28th ACM International Conference on Multimedia","author":"Lin Lilang","year":"2020","unstructured":"Lilang Lin, Sijie Song, Wenhan Yang, and Jiaying Liu. 2020. MS2L: Multi-task self-supervised learning for skeleton based action recognition. In Proceedings of the 28th ACM International Conference on Multimedia. ACM, 2490\u20132498."},{"key":"e_1_3_2_93_2","first-page":"2363","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201923)","author":"Lin Lilang","year":"2023","unstructured":"Lilang Lin, Jiahang Zhang, and Jiaying Liu. 2023. Actionlet-dependent contrastive learning for unsupervised skeleton-based action recognition. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201923). IEEE, 2363\u20132372."},{"issue":"10","key":"e_1_3_2_94_2","first-page":"2684","article-title":"NTU RGB+ D 120: A large-scale benchmark for 3D human activity understanding","volume":"42","author":"Liu Jun","year":"2019","unstructured":"Jun Liu, Amir Shahroudy, Mauricio Perez, Gang Wang, Ling-Yu Duan, and Alex C. Kot. 2019. NTU RGB+ D 120: A large-scale benchmark for 3D human activity understanding. IEEE Trans. Pattern Anal. Mach. Intell. 42, 10 (2019), 2684\u20132701.","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"e_1_3_2_95_2","doi-asserted-by":"crossref","first-page":"82246","DOI":"10.1109\/ACCESS.2019.2923651","article-title":"R-STAN: Residual spatial-temporal attention network for action recognition","volume":"7","author":"Liu Quanle","year":"2019","unstructured":"Quanle Liu, Xiangjiu Che, and Mei Bie. 2019. R-STAN: Residual spatial-temporal attention network for action recognition. 
IEEE Access 7 (2019), 82246\u201382255.","journal-title":"IEEE Access"},{"key":"e_1_3_2_96_2","doi-asserted-by":"crossref","first-page":"360","DOI":"10.1016\/j.neucom.2020.06.108","article-title":"FSD-10: A fine-grained classification dataset for figure skating","volume":"413","author":"Liu Shenglan","year":"2020","unstructured":"Shenglan Liu, Xiang Liu, Gao Huang, Hong Qiao, Lianyu Hu, Dong Jiang, Aibin Zhang, Yang Liu, and Ge Guo. 2020. FSD-10: A fine-grained classification dataset for figure skating. Neurocomputing 413 (2020), 360\u2013367.","journal-title":"Neurocomputing"},{"key":"e_1_3_2_97_2","first-page":"12009","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201922)","author":"Liu Ze","year":"2022","unstructured":"Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, and Baining Guo. 2022. Swin transformer V2: Scaling up capacity and resolution. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201922). IEEE, 12009\u201312019."},{"key":"e_1_3_2_98_2","first-page":"10012","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201921)","author":"Liu Ze","year":"2021","unstructured":"Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201921). IEEE, 10012\u201310022."},{"key":"e_1_3_2_99_2","first-page":"12426","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201919)","author":"Lohit Suhas","year":"2019","unstructured":"Suhas Lohit, Qiao Wang, and Pavan Turaga. 2019. Temporal transformer networks: Joint learning of invariant and discriminative time warping. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201919). IEEE, 12426\u201312435."},{"key":"e_1_3_2_100_2","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive image features from scale-invariant keypoints","volume":"60","author":"Lowe David G.","year":"2004","unstructured":"David G. Lowe. 2004. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60 (2004), 91\u2013110.","journal-title":"Int. J. Comput. Vis."},{"key":"e_1_3_2_101_2","first-page":"2720","volume-title":"Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201913)","author":"Lu Cewu","year":"2013","unstructured":"Cewu Lu, Jianping Shi, and Jiaya Jia. 2013. Abnormal event detection at 150 fps in MatLab. In Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201913). IEEE, 2720\u20132727."},{"key":"e_1_3_2_102_2","unstructured":"Camillo Lugaresi, Jiuqiang Tang, Hadon Nash, Chris McClanahan, Esha Uboweja, Michael Hays, Fan Zhang, Chuo-Ling Chang, Ming Guang Yong, Juhyun Lee, et\u00a0al. 2019. MediaPipe: A Framework for Building Perception Pipelines. arXiv preprint arXiv:1906.08172."},{"key":"e_1_3_2_103_2","first-page":"341","volume-title":"Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201917)","author":"Luo Weixin","year":"2017","unstructured":"Weixin Luo, Wen Liu, and Shenghua Gao. 2017. A revisit of sparse coding based anomaly detection in stacked RNN framework. In Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201917). IEEE, 341\u2013349."},{"issue":"8","key":"e_1_3_2_104_2","first-page":"2752","article-title":"Multi-task deep learning for real-time 3D human pose estimation and action recognition","volume":"43","author":"Luvizon Diogo C.","year":"2020","unstructured":"Diogo C. Luvizon, David Picard, and Hedi Tabia. 2020. 
Multi-task deep learning for real-time 3D human pose estimation and action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 43, 8 (2020), 2752\u20132764.","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"e_1_3_2_105_2","doi-asserted-by":"crossref","first-page":"282","DOI":"10.1049\/cvi2.12162","article-title":"Violence 4D: Violence detection in surveillance using 4D convolutional neural networks","volume":"17","author":"Magdy Mai","year":"2022","unstructured":"Mai Magdy, Mohamed Waleed Fakhr, and Fahima A. Maghraby. 2022. Violence 4D: Violence detection in surveillance using 4D convolutional neural networks. IET Comput. Vis. 17 (2022), 282\u2013294.","journal-title":"IET Comput. Vis."},{"key":"e_1_3_2_106_2","first-page":"6884","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"Maharaj Tegan","year":"2017","unstructured":"Tegan Maharaj, Nicolas Ballas, Anna Rohrbach, Aaron Courville, and Christopher Pal. 2017. A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917). IEEE, 6884\u20136893."},{"key":"e_1_3_2_107_2","first-page":"1","volume-title":"Proceedings of the 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG\u201921)","author":"Majhi Snehashis","year":"2021","unstructured":"Snehashis Majhi, Srijan Das, Fran\u00e7ois Br\u00e9mond, Ratnakar Dash, and Pankaj Kumar Sa. 2021. Weakly-supervised joint anomaly detection and classification. In Proceedings of the 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG\u201921). 
IEEE, 1\u20137."},{"issue":"14","key":"e_1_3_2_108_2","doi-asserted-by":"crossref","first-page":"19071","DOI":"10.1007\/s11042-020-10086-2","article-title":"A multimodal approach for multi-label movie genre classification","volume":"81","author":"Mangolin Rafael B.","year":"2022","unstructured":"Rafael B. Mangolin, Rodolfo M. Pereira, Alceu S. Britto Jr, Carlos N. Silla Jr, Val\u00e9ria D. Feltrim, Diego Bertolini, and Yandre M. G. Costa. 2022. A multimodal approach for multi-label movie genre classification. Multim. Tools Applic. 81, 14 (2022), 19071\u201319096.","journal-title":"Multim. Tools Applic."},{"key":"e_1_3_2_109_2","doi-asserted-by":"crossref","first-page":"20429","DOI":"10.1007\/s11042-020-08917-3","article-title":"Fine grained sport action recognition with Twin spatio-temporal convolutional neural networks: Application to table tennis","volume":"79","author":"Martin Pierre-Etienne","year":"2020","unstructured":"Pierre-Etienne Martin, Jenny Benois-Pineau, Renaud P\u00e9teri, and Julien Morlier. 2020. Fine grained sport action recognition with Twin spatio-temporal convolutional neural networks: Application to table tennis. Multim. Tools Applic. 79 (2020), 20429\u201320447.","journal-title":"Multim. Tools Applic."},{"issue":"9","key":"e_1_3_2_110_2","doi-asserted-by":"crossref","first-page":"1988","DOI":"10.3390\/s19091988","article-title":"UP-fall detection dataset: A multimodal approach","volume":"19","author":"Mart\u00ednez-Villase\u00f1or Lourdes","year":"2019","unstructured":"Lourdes Mart\u00ednez-Villase\u00f1or, Hiram Ponce, Jorge Brieva, Ernesto Moya-Albor, Jos\u00e9 N\u00fa\u00f1ez-Mart\u00ednez, and Carlos Pe\u00f1afort-Asturiano. 2019. UP-fall detection dataset: A multimodal approach. 
Sensors 19, 9 (2019), 1988.","journal-title":"Sensors"},{"key":"e_1_3_2_111_2","first-page":"6321","volume-title":"Proceedings of the 25th International Conference on Pattern Recognition (ICPR\u201921)","author":"Mehta Vineet","year":"2021","unstructured":"Vineet Mehta, Abhinav Dhall, Sujata Pal, and Shehroz S. Khan. 2021. Motion and region aware adversarial learning for fall detection with thermal imaging. In Proceedings of the 25th International Conference on Pattern Recognition (ICPR\u201921). IEEE, 6321\u20136328."},{"key":"e_1_3_2_112_2","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1016\/j.neucom.2023.03.070","article-title":"Focalized contrastive view-invariant learning for self-supervised skeleton-based action recognition","volume":"537","author":"Men Qianhui","year":"2023","unstructured":"Qianhui Men, Edmond S. L. Ho, Hubert P. H. Shum, and Howard Leung. 2023. Focalized contrastive view-invariant learning for self-supervised skeleton-based action recognition. Neurocomputing 537 (2023), 198\u2013209.","journal-title":"Neurocomputing"},{"issue":"2","key":"e_1_3_2_113_2","doi-asserted-by":"crossref","first-page":"502","DOI":"10.1109\/TPAMI.2019.2901464","article-title":"Moments in time dataset: One million videos for event understanding","volume":"42","author":"Monfort Mathew","year":"2019","unstructured":"Mathew Monfort, Alex Andonian, Bolei Zhou, Kandan Ramakrishnan, Sarah Adel Bargal, Tom Yan, Lisa Brown, Quanfu Fan, Dan Gutfreund, Carl Vondrick, and Aude Oliva. 2019. Moments in time dataset: One million videos for event understanding. IEEE Trans. Pattern Anal. Mach. Intell. 42, 2 (2019), 502\u2013508.","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"e_1_3_2_114_2","first-page":"675","volume-title":"Proceedings of the 22nd International Conference on Control Systems and Computer Science (CSCS\u201919)","author":"Nan Mihai","year":"2019","unstructured":"Mihai Nan, Alexandra Stefania Ghi\u021b\u0103, Alexandru-Florin Gavril, Mihai Trascau, Alexandru Sorici, Bogdan Cramariuc, and Adina Magda Florea. 2019. Human action recognition for social robots. In Proceedings of the 22nd International Conference on Control Systems and Computer Science (CSCS\u201919). IEEE, 675\u2013681."},{"issue":"1","key":"e_1_3_2_115_2","doi-asserted-by":"crossref","first-page":"393","DOI":"10.1109\/TII.2019.2938527","article-title":"Spatiotemporal anomaly detection using deep learning for real-time video surveillance","volume":"16","author":"Nawaratne Rashmika","year":"2019","unstructured":"Rashmika Nawaratne, Damminda Alahakoon, Daswin De Silva, and Xinghuo Yu. 2019. Spatiotemporal anomaly detection using deep learning for real-time video surveillance. IEEE Trans. Industr. Inform. 16, 1 (2019), 393\u2013402.","journal-title":"IEEE Trans. Industr. Inform."},{"key":"e_1_3_2_116_2","first-page":"1273","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201919)","author":"Nguyen Trong-Nguyen","year":"2019","unstructured":"Trong-Nguyen Nguyen and Jean Meunier. 2019. Anomaly detection in video sequence with appearance-motion correspondence. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201919). IEEE, 1273\u20131283."},{"key":"e_1_3_2_117_2","doi-asserted-by":"crossref","first-page":"299","DOI":"10.1007\/s11263-007-0122-4","article-title":"Unsupervised learning of human action categories using spatial-temporal words","volume":"79","author":"Niebles Juan Carlos","year":"2008","unstructured":"Juan Carlos Niebles, Hongcheng Wang, and Li Fei-Fei. 2008. Unsupervised learning of human action categories using spatial-temporal words. Int. J. 
Comput. Vis. 79 (2008), 299\u2013318.","journal-title":"Int. J. Comput. Vis."},{"issue":"23","key":"e_1_3_2_118_2","doi-asserted-by":"crossref","first-page":"7786","DOI":"10.3390\/s21237786","article-title":"A study of the recent trends of immunology: Key challenges, domains, applications, datasets, and future directions","volume":"21","author":"Pandya Sharnil","year":"2021","unstructured":"Sharnil Pandya, Aanchal Thakur, Santosh Saxena, Nandita Jassal, Chirag Patel, Kirit Modi, Pooja Shah, Rahul Joshi, Sudhanshu Gonge, Kalyani Kadam, and Prachi Kadam. 2021. A study of the recent trends of immunology: Key challenges, domains, applications, datasets, and future directions. Sensors 21, 23 (2021), 7786.","journal-title":"Sensors"},{"issue":"2","key":"e_1_3_2_119_2","doi-asserted-by":"crossref","first-page":"2365","DOI":"10.1109\/LRA.2021.3060410","article-title":"Recognition and prediction of surgical actions based on online robotic tool detection","volume":"6","author":"Park Juyoun","year":"2021","unstructured":"Juyoun Park and Chung Hyuk Park. 2021. Recognition and prediction of surgical actions based on online robotic tool detection. IEEE Robot. Autom. Lett. 6, 2 (2021), 2365\u20132372.","journal-title":"IEEE Robot. Autom. Lett."},{"issue":"5","key":"e_1_3_2_120_2","doi-asserted-by":"crossref","first-page":"1780","DOI":"10.3390\/s22051780","article-title":"DBGC: Dimension-based generic convolution block for object recognition","volume":"22","author":"Patel Chirag","year":"2022","unstructured":"Chirag Patel, Dulari Bhatt, Urvashi Sharma, Radhika Patel, Sharnil Pandya, Kirit Modi, Nagaraj Cholli, Akash Patel, Urvi Bhatt, Muhammad Ahmed Khan, Shubhankar Majumdar, Mohd Zuhair, Khushi Patel, Syed Aziz Shah, and Hemant Ghayvat. 2022. DBGC: Dimension-based generic convolution block for object recognition. 
Sensors 22, 5 (2022), 1780.","journal-title":"Sensors"},{"key":"e_1_3_2_121_2","first-page":"2662","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP\u201919)","author":"Perez Mauricio","year":"2019","unstructured":"Mauricio Perez, Alex C. Kot, and Anderson Rocha. 2019. Detection of real-world fights in surveillance videos. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP\u201919). IEEE, 2662\u20132666."},{"key":"e_1_3_2_122_2","first-page":"2569","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision","author":"Ramachandra Bharathkumar","year":"2020","unstructured":"Bharathkumar Ramachandra and Michael Jones. 2020. Street Scene: A new dataset and evaluation protocol for video anomaly detection. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. IEEE, 2569\u20132578."},{"key":"e_1_3_2_123_2","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1016\/j.ins.2021.04.023","article-title":"Augmented skeleton based contrastive action learning with momentum LSTM for unsupervised action recognition","volume":"569","author":"Rao Haocong","year":"2021","unstructured":"Haocong Rao, Shihao Xu, Xiping Hu, Jun Cheng, and Bin Hu. 2021. Augmented skeleton based contrastive action learning with momentum LSTM for unsupervised action recognition. Inf. Sci. 569 (2021), 90\u2013109.","journal-title":"Inf. Sci."},{"key":"e_1_3_2_124_2","first-page":"1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201908)","author":"Rodriguez Mikel D.","year":"2008","unstructured":"Mikel D. Rodriguez, Javed Ahmed, and Mubarak Shah. 2008. Action MACH: A spatio-temporal maximum average correlation height filter for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201908). 
IEEE, 1\u20138."},{"key":"e_1_3_2_125_2","first-page":"1234","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201912)","author":"Sadanand Sreemanananth","year":"2012","unstructured":"Sreemanananth Sadanand and Jason J. Corso. 2012. Action bank: A high-level representation of activity in video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201912). IEEE, 1234\u20131241."},{"issue":"10","key":"e_1_3_2_126_2","doi-asserted-by":"crossref","first-page":"1225","DOI":"10.1109\/TCSVT.2005.854237","article-title":"Event detection in field sports video using audio-visual features and a support vector machine","volume":"15","author":"Sadlier David A.","year":"2005","unstructured":"David A. Sadlier and Noel E. O\u2019Connor. 2005. Event detection in field sports video using audio-visual features and a support vector machine. IEEE Trans. Circ. Syst. Video Technol. 15, 10 (2005), 1225\u20131233.","journal-title":"IEEE Trans. Circ. Syst. Video Technol."},{"key":"e_1_3_2_127_2","first-page":"1","volume-title":"Proceedings of the IEEE Workshop on Motion and Video Computing","author":"Savarese Silvio","year":"2008","unstructured":"Silvio Savarese, Andrey DelPozo, Juan Carlos Niebles, and Li Fei-Fei. 2008. Spatial-temporal correlations for unsupervised action classification. In Proceedings of the IEEE Workshop on Motion and Video Computing. IEEE, 1\u20138."},{"key":"e_1_3_2_128_2","first-page":"1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201908)","author":"Schindler Konrad","year":"2008","unstructured":"Konrad Schindler and Luc Van Gool. 2008. Action snippets: How many frames does human action recognition require? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201908). 
IEEE, 1\u20138."},{"key":"e_1_3_2_129_2","first-page":"32","volume-title":"Proceedings of the 17th International Conference on Pattern Recognition (ICPR\u201904)","author":"Schuldt Christian","year":"2004","unstructured":"Christian Schuldt, Ivan Laptev, and Barbara Caputo. 2004. Recognizing human actions: A local SVM approach. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR\u201904). IEEE, 32\u201336."},{"key":"e_1_3_2_130_2","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1145\/1291233.1291311","volume-title":"Proceedings of the 15th ACM International Conference on Multimedia","author":"Scovanner Paul","year":"2007","unstructured":"Paul Scovanner, Saad Ali, and Mubarak Shah. 2007. A 3-dimensional sift descriptor and its application to action recognition. In Proceedings of the 15th ACM International Conference on Multimedia. IEEE, 357\u2013360."},{"key":"e_1_3_2_131_2","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1007\/978-981-19-0475-2_6","article-title":"Violence Detection in Video Footages Using I3D ConvNet","author":"Selvaraj Joel","year":"2022","unstructured":"Joel Selvaraj and J. Anuradha. 2022. Violence Detection in Video Footages Using I3D ConvNet. Innovations in Computational Intelligence and Computer Vision: Proceedings of (ICICV'21). Singapore: Springer Nature Singapore, 63\u201375 pages.","journal-title":"Innovations in Computational Intelligence and Computer Vision: Proceedings of (ICICV'21)"},{"key":"e_1_3_2_132_2","first-page":"1010","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916)","author":"Shahroudy Amir","year":"2016","unstructured":"Amir Shahroudy, Jun Liu, Tian-Tsong Ng, and Gang Wang. 2016. NTU RGB+ D: A large scale dataset for 3D human activity analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916). 
IEEE, 1010\u20131019."},{"key":"e_1_3_2_133_2","first-page":"2616","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201920)","author":"Shao Dian","year":"2020","unstructured":"Dian Shao, Yue Zhao, Bo Dai, and Dahua Lin. 2020. FineGym: A hierarchical video dataset for fine-grained action understanding. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201920). IEEE, 2616\u20132625."},{"key":"e_1_3_2_134_2","first-page":"35","volume-title":"Proceedings of the 16th European Conference on Computer Vision (ECCV\u201920)","author":"Si Chenyang","year":"2020","unstructured":"Chenyang Si, Xuecheng Nie, Wei Wang, Liang Wang, Tieniu Tan, and Jiashi Feng. 2020. Adversarial self-supervised learning for semi-supervised 3D action recognition. In Proceedings of the 16th European Conference on Computer Vision (ECCV\u201920). Springer, 35\u201351."},{"key":"e_1_3_2_135_2","first-page":"510","volume-title":"Proceedings of the 14th European Conference on Computer Vision (ECCV\u201916)","author":"Sigurdsson Gunnar A.","year":"2016","unstructured":"Gunnar A. Sigurdsson, G\u00fcl Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, and Abhinav Gupta. 2016. Hollywood in homes: Crowdsourcing data collection for activity understanding. In Proceedings of the 14th European Conference on Computer Vision (ECCV\u201916). Springer, 510\u2013526."},{"key":"e_1_3_2_136_2","first-page":"568","article-title":"Two-stream convolutional networks for action recognition in videos","volume":"27","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos. Advan. Neural Inf. Process. Syst. 27 (2014), 568\u2013576.","journal-title":"Advan. Neural Inf. Process. 
Syst."},{"key":"e_1_3_2_137_2","first-page":"80","volume-title":"Proceedings of the 9th International Conference on Intelligent Computing and Information Systems (ICICIS\u201919)","author":"Soliman Mohamed Mostafa","year":"2019","unstructured":"Mohamed Mostafa Soliman, Mohamed Hussein Kamal, Mina Abd El-Massih Nashed, Youssef Mohamed Mostafa, Bassel Safwat Chawky, and Dina Khattab. 2019. Violence recognition from videos using deep learning techniques. In Proceedings of the 9th International Conference on Intelligent Computing and Information Systems (ICICIS\u201919). IEEE, 80\u201385."},{"issue":"5","key":"e_1_3_2_138_2","doi-asserted-by":"crossref","first-page":"1915","DOI":"10.1109\/TCSVT.2020.3015051","article-title":"Richly activated graph convolutional network for robust skeleton-based action recognition","volume":"31","author":"Song Yi-Fan","year":"2020","unstructured":"Yi-Fan Song, Zhang Zhang, Caifeng Shan, and Liang Wang. 2020. Richly activated graph convolutional network for robust skeleton-based action recognition. IEEE Trans. Circ. Syst. Video Technol. 31, 5 (2020), 1915\u20131925.","journal-title":"IEEE Trans. Circ. Syst. Video Technol."},{"key":"e_1_3_2_139_2","unstructured":"Khurram Soomro Amir Roshan Zamir and Mubarak Shah. 2012. UCF101: A Dataset of 101 Human Actions Classes From Videos in the Wild. arXiv:1212.0402"},{"key":"e_1_3_2_140_2","first-page":"9631","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201920)","author":"Su Kun","year":"2020","unstructured":"Kun Su, Xiulong Liu, and Eli Shlizerman. 2020. Predict & cluster: Unsupervised skeleton based action recognition. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201920). 
IEEE, 9631\u20139640."},{"key":"e_1_3_2_141_2","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1007\/978-3-031-21967-2_38","volume-title":"Proceedings of the 14th Asian Conference on Intelligent Information and Database Systems (ACIIDS\u201922)","author":"Suarez Jessie James P.","year":"2022","unstructured":"Jessie James P. Suarez, Nathaniel S. Orillaza Jr, and Prospero C. Naval Jr. 2022. FASENet: A two-stream fall detection and activity monitoring model using pose keypoints and squeeze-and-excitation networks. In Proceedings of the 14th Asian Conference on Intelligent Information and Database Systems (ACIIDS\u201922). Springer, 470\u2013483."},{"key":"e_1_3_2_142_2","first-page":"6479","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918)","author":"Sultani Waqas","year":"2018","unstructured":"Waqas Sultani, Chen Chen, and Mubarak Shah. 2018. Real-world anomaly detection in surveillance videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918). IEEE, 6479\u20136488."},{"key":"e_1_3_2_143_2","first-page":"2004","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201909)","author":"Sun Ju","year":"2009","unstructured":"Ju Sun, Xiao Wu, Shuicheng Yan, Loong-Fah Cheong, Tat-Seng Chua, and Jintao Li. 2009. Hierarchical spatio-temporal context modeling for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201909). IEEE, 2004\u20132011."},{"key":"e_1_3_2_144_2","first-page":"787","volume-title":"Proceedings of the 13th European Conference on Computer Vision (ECCV\u201914)","author":"Sun Min","year":"2014","unstructured":"Min Sun, Ali Farhadi, and Steve Seitz. 2014. Ranking domain-specific highlights by analyzing edited videos. In Proceedings of the 13th European Conference on Computer Vision (ECCV\u201914). 
Springer, 787\u2013802."},{"key":"e_1_3_2_145_2","first-page":"429","volume-title":"Proceedings of the ACM International Conference on Multimedia Retrieval","author":"Sun Shan","year":"2017","unstructured":"Shan Sun, Feng Wang, Qi Liang, and Liang He. 2017. TaiChi: A fine-grained action recognition dataset. In Proceedings of the ACM International Conference on Multimedia Retrieval. ACM, 429\u2013433."},{"key":"e_1_3_2_146_2","first-page":"4631","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916)","author":"Tapaswi Makarand","year":"2016","unstructured":"Makarand Tapaswi, Yukun Zhu, Rainer Stiefelhagen, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. 2016. MovieQA: Understanding stories in movies through question-answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916). IEEE, 4631\u20134640."},{"key":"e_1_3_2_147_2","doi-asserted-by":"crossref","first-page":"19457","DOI":"10.1109\/ACCESS.2021.3054040","article-title":"Anomaly detection with particle filtering for online video surveillance","volume":"9","author":"Tariq Sameema","year":"2021","unstructured":"Sameema Tariq, Haroon Farooq, Abdul Jaleel, Syed Muhammad Wasif, and Ata-Ur-Rehman. 2021. Anomaly detection with particle filtering for online video surveillance. IEEE Access 9 (2021), 19457\u201319468.","journal-title":"IEEE Access"},{"key":"e_1_3_2_148_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2022.3210271"},{"key":"e_1_3_2_149_2","first-page":"4489","volume-title":"Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201915)","author":"Tran Du","year":"2015","unstructured":"Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. 2015. Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201915). 
IEEE, 4489\u20134497."},{"key":"e_1_3_2_150_2","first-page":"5552","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201919)","author":"Tran Du","year":"2019","unstructured":"Du Tran, Heng Wang, Lorenzo Torresani, and Matt Feiszli. 2019. Video classification with channel-separated convolutional networks. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201919). IEEE, 5552\u20135561."},{"key":"e_1_3_2_151_2","first-page":"6450","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918)","author":"Tran Du","year":"2018","unstructured":"Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, and Manohar Paluri. 2018. A closer look at spatiotemporal convolutions for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918). IEEE, 6450\u20136459."},{"key":"e_1_3_2_152_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2022.3168137"},{"issue":"11","key":"e_1_3_2_153_2","doi-asserted-by":"crossref","first-page":"1473","DOI":"10.1109\/TCSVT.2008.2005594","article-title":"Machine recognition of human activities: A survey","volume":"18","author":"Turaga Pavan","year":"2008","unstructured":"Pavan Turaga, Rama Chellappa, Venkatramana S. Subrahmanian, and Octavian Udrea. 2008. Machine recognition of human activities: A survey. IEEE Trans. Circ. Syst. Video Technol. 18, 11 (2008), 1473\u20131488.","journal-title":"IEEE Trans. Circ. Syst. Video Technol."},{"key":"e_1_3_2_154_2","first-page":"5998","article-title":"Attention is all you need","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advan. Neural Inf. Process. Syst. 30 (2017), 5998\u20136008.","journal-title":"Advan. Neural Inf. Process. 
Syst."},{"key":"e_1_3_2_155_2","first-page":"882","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops","author":"Vats Kanav","year":"2020","unstructured":"Kanav Vats, Mehrnaz Fani, Pascale Walters, David A. Clausi, and John Zelek. 2020. Event detection in coarsely annotated sports videos via parallel multi-receptive field 1D convolutions. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops. IEEE, 882\u2013883."},{"key":"e_1_3_2_156_2","first-page":"8581","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918)","author":"Vicol Paul","year":"2018","unstructured":"Paul Vicol, Makarand Tapaswi, Lluis Castrejon, and Sanja Fidler. 2018. MovieGraphs: Towards understanding human-centric situations from videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918). IEEE, 8581\u20138590."},{"key":"e_1_3_2_157_2","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1007\/s11263-012-0594-8","article-title":"Dense trajectories and motion boundary descriptors for action recognition","volume":"103","author":"Wang Heng","year":"2013","unstructured":"Heng Wang, Alexander Kl\u00e4ser, Cordelia Schmid, and Cheng-Lin Liu. 2013. Dense trajectories and motion boundary descriptors for action recognition. Int. J. Comput. Vis. 103 (2013), 60\u201379.","journal-title":"Int. J. Comput. Vis."},{"key":"e_1_3_2_158_2","first-page":"1430","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918)","author":"Wang Limin","year":"2018","unstructured":"Limin Wang, Wei Li, Wen Li, and Luc Van Gool. 2018. Appearance-and-relation networks for video classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201918). 
IEEE, 1430\u20131439."},{"key":"e_1_3_2_159_2","first-page":"4325","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"Wang Limin","year":"2017","unstructured":"Limin Wang, Yuanjun Xiong, Dahua Lin, and Luc Van Gool. 2017. UntrimmedNets for weakly supervised action recognition and detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917). IEEE, 4325\u20134334."},{"key":"e_1_3_2_160_2","first-page":"20","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV\u201916)","author":"Wang Limin","year":"2016","unstructured":"Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool. 2016. Temporal segment networks: Towards good practices for deep action recognition. In Proceedings of the European Conference on Computer Vision (ECCV\u201916). Springer, 20\u201336."},{"key":"e_1_3_2_161_2","doi-asserted-by":"crossref","first-page":"6224","DOI":"10.1109\/TIP.2022.3207577","article-title":"Contrast-reconstruction representation learning for self-supervised skeleton-based action recognition","volume":"31","author":"Wang Peng","year":"2022","unstructured":"Peng Wang, Jun Wen, Chenyang Si, Yuntao Qian, and Liang Wang. 2022. Contrast-reconstruction representation learning for self-supervised skeleton-based action recognition. IEEE Trans. Image Process. 31 (2022), 6224\u20136238.","journal-title":"IEEE Trans. Image Process."},{"issue":"2","key":"e_1_3_2_162_2","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1016\/j.cviu.2006.07.013","article-title":"Free viewpoint action recognition using motion history volumes","volume":"104","author":"Weinland Daniel","year":"2006","unstructured":"Daniel Weinland, Remi Ronfard, and Edmond Boyer. 2006. Free viewpoint action recognition using motion history volumes. Comput. Vis. Image Underst. 104, 2-3 (2006), 249\u2013257.","journal-title":"Comput. Vis. 
Image Underst."},{"key":"e_1_3_2_163_2","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1016\/j.neunet.2023.03.042","article-title":"Robust fall detection in video surveillance based on weakly supervised learning","volume":"163","author":"Wu Lian","year":"2023","unstructured":"Lian Wu, Chao Huang, Shuping Zhao, Jinkai Li, Jianchuan Zhao, Zhongwei Cui, Zhen Yu, Yong Xu, and Min Zhang. 2023. Robust fall detection in video surveillance based on weakly supervised learning. Neural Netw. 163 (2023), 286\u2013297.","journal-title":"Neural Netw."},{"key":"e_1_3_2_164_2","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1016\/j.neucom.2020.07.003","article-title":"Fusing motion patterns and key visual information for semantic event recognition in basketball videos","volume":"413","author":"Wu Lifang","year":"2020","unstructured":"Lifang Wu, Zhou Yang, Qi Wang, Meng Jian, Boxuan Zhao, Junchi Yan, and Chang Wen Chen. 2020. Fusing motion patterns and key visual information for semantic event recognition in basketball videos. Neurocomputing 413 (2020), 217\u2013229.","journal-title":"Neurocomputing"},{"key":"e_1_3_2_165_2","first-page":"322","volume-title":"Proceedings of the 16th European Conference on Computer Vision (ECCV\u201920)","author":"Wu Peng","year":"2020","unstructured":"Peng Wu, Jing Liu, Yujia Shi, Yujia Sun, Fangtao Shao, Zhaoyang Wu, and Zhiwei Yang. 2020. Not only look, but also listen: Learning multimodal violence detection under weak supervision. In Proceedings of the 16th European Conference on Computer Vision (ECCV\u201920). Springer, 322\u2013339."},{"key":"e_1_3_2_166_2","first-page":"20","volume-title":"Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops","author":"Xia Lu","year":"2012","unstructured":"Lu Xia, Chia-Chih Chen, and Jake K. Aggarwal. 2012. View invariant human action recognition using histograms of 3D joints. 
In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. IEEE, 20\u201327."},{"key":"e_1_3_2_167_2","first-page":"9727","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201922)","author":"Xiao Fanyi","year":"2022","unstructured":"Fanyi Xiao, Kaustav Kundu, Joseph Tighe, and Davide Modolo. 2022. Hierarchical self-supervised representation learning for movie understanding. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201922). IEEE, 9727\u20139736."},{"key":"e_1_3_2_168_2","first-page":"3252","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201922)","author":"Xiao Junfei","year":"2022","unstructured":"Junfei Xiao, Longlong Jing, Lin Zhang, Ju He, Qi She, Zongwei Zhou, Alan Yuille, and Yingwei Li. 2022. Learning from temporal gradient for semi-supervised action recognition. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201922). IEEE, 3252\u20133262."},{"key":"e_1_3_2_169_2","first-page":"18816","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201923)","author":"Xing Zhen","year":"2023","unstructured":"Zhen Xing, Qi Dai, Han Hu, Jingjing Chen, Zuxuan Wu, and Yu-Gang Jiang. 2023. SVFormer: Semi-supervised video transformer for action recognition. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201923). IEEE, 18816\u201318826."},{"key":"e_1_3_2_170_2","doi-asserted-by":"crossref","first-page":"605","DOI":"10.1016\/j.jmsy.2020.04.007","article-title":"Transferable two-stream convolutional neural network for human action recognition","volume":"56","author":"Xiong Qianqian","year":"2020","unstructured":"Qianqian Xiong, Jianjing Zhang, Peng Wang, Dongdong Liu, and Robert X. Gao. 2020. 
Transferable two-stream convolutional neural network for human action recognition. J. Manuf. Syst. 56 (2020), 605\u2013614.","journal-title":"J. Manuf. Syst."},{"issue":"7","key":"e_1_3_2_171_2","doi-asserted-by":"crossref","first-page":"1342","DOI":"10.1109\/TMM.2008.2004912","article-title":"Using webcast text for semantic event detection in broadcast sports video","volume":"10","author":"Xu Changsheng","year":"2008","unstructured":"Changsheng Xu, Yi-Fan Zhang, Guangyu Zhu, Yong Rui, Hanqing Lu, and Qingming Huang. 2008. Using webcast text for semantic event detection in broadcast sports video. IEEE Trans. Multim. 10, 7 (2008), 1342\u20131355.","journal-title":"IEEE Trans. Multim."},{"key":"e_1_3_2_172_2","first-page":"10334","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201919)","author":"Xu Dejing","year":"2019","unstructured":"Dejing Xu, Jun Xiao, Zhou Zhao, Jian Shao, Di Xie, and Yueting Zhuang. 2019. Self-supervised spatiotemporal learning via video clip order prediction. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201919). IEEE, 10334\u201310343."},{"key":"e_1_3_2_173_2","first-page":"568","volume-title":"Proceedings of the IEEE International Conference on Multimedia and Expo (ICME\u201919)","author":"Xu Qichao","year":"2019","unstructured":"Qichao Xu, John See, and Weiyao Lin. 2019. Localization guided fight action detection in surveillance videos. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME\u201919). IEEE, 568\u2013573."},{"key":"e_1_3_2_174_2","first-page":"2959","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201922)","author":"Xu Yinghao","year":"2022","unstructured":"Yinghao Xu, Fangyun Wei, Xiao Sun, Ceyuan Yang, Yujun Shen, Bo Dai, Bolei Zhou, and Stephen Lin. 2022. Cross-model pseudo-labeling for semi-supervised action recognition. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201922). IEEE, 2959\u20132968."},{"key":"e_1_3_2_175_2","first-page":"9332","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201921)","author":"Xu Yuecong","year":"2021","unstructured":"Yuecong Xu, Jianfei Yang, Haozhi Cao, Zhenghua Chen, Qi Li, and Kezhi Mao. 2021. Partial video domain adaptation with partial adversarial temporal attentive network. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201921). IEEE, CVF, 9332\u20139341."},{"key":"e_1_3_2_176_2","first-page":"70","volume-title":"Proceedings of the 2nd International Workshop on Deep Learning for Human Activity Recognition (DL-HAR\u201920)","author":"Xu Yuecong","year":"2021","unstructured":"Yuecong Xu, Jianfei Yang, Haozhi Cao, Kezhi Mao, Jianxiong Yin, and Simon See. 2021. ARID: A new dataset for recognizing action in the dark. In Proceedings of the 2nd International Workshop on Deep Learning for Human Activity Recognition (DL-HAR\u201920). Springer, 70\u201384."},{"key":"e_1_3_2_177_2","first-page":"147","volume-title":"Proceedings of the 17th European Conference on Computer Vision (ECCV\u201922)","author":"Xu Yuecong","year":"2022","unstructured":"Yuecong Xu, Jianfei Yang, Haozhi Cao, Keyu Wu, Min Wu, and Zhenghua Chen. 2022. Source-free video domain adaptation by learning temporal consistency for action recognition. In Proceedings of the 17th European Conference on Computer Vision (ECCV\u201922). Springer, 147\u2013164."},{"key":"e_1_3_2_178_2","doi-asserted-by":"crossref","first-page":"106624","DOI":"10.1016\/j.asoc.2020.106624","article-title":"A unified framework of deep networks for genre classification using movie trailer","volume":"96","author":"Yadav Ashima","year":"2020","unstructured":"Ashima Yadav and Dinesh Kumar Vishwakarma. 2020. A unified framework of deep networks for genre classification using movie trailer. Appl. 
Soft Comput. 96 (2020), 106624.","journal-title":"Appl. Soft Comput."},{"key":"e_1_3_2_179_2","first-page":"14063","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201922)","author":"Yang Jiewen","year":"2022","unstructured":"Jiewen Yang, Xingbo Dong, Liujun Liu, Chao Zhang, Jiajun Shen, and Dahai Yu. 2022. Recurring the transformer for video action recognition. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201922). IEEE, 14063\u201314073."},{"key":"e_1_3_2_180_2","unstructured":"Dahua Lin Yue Zhao Yuanjun Xiong. 2019. MMAction. Retrieved from: https:\/\/github.com\/open-mmlab\/mmaction"},{"key":"e_1_3_2_181_2","first-page":"13577","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201921)","author":"Zhang Yanyi","year":"2021","unstructured":"Yanyi Zhang, Xinyu Li, Chunhui Liu, Bing Shuai, Yi Zhu, Biagio Brattoli, Hao Chen, Ivan Marsic, and Joseph Tighe. 2021. VidTr: Video transformer without convolutions. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201921). IEEE, 13577\u201313587."},{"key":"e_1_3_2_182_2","first-page":"8668","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201919)","author":"Zhao Hang","year":"2019","unstructured":"Hang Zhao, Antonio Torralba, Lorenzo Torresani, and Zhicheng Yan. 2019. HACS: Human action clips and segments dataset for recognition and temporal localization. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV\u201919). IEEE, 8668\u20138678."},{"key":"e_1_3_2_183_2","first-page":"7733","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201919)","author":"Zhao Rui","year":"2019","unstructured":"Rui Zhao, Wanru Xu, Hui Su, and Qiang Ji. 2019. Bayesian hierarchical dynamic model for human action recognition. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201919). IEEE, 7733\u20137742."},{"key":"e_1_3_2_184_2","first-page":"1237","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201919)","author":"Zhong Jia-Xing","year":"2019","unstructured":"Jia-Xing Zhong, Nannan Li, Weijie Kong, Shan Liu, Thomas H. Li, and Ge Li. 2019. Graph convolutional label noise cleaner: Train a plug-and-play action classifier for anomaly detection. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201919). IEEE, 1237\u20131246."},{"key":"e_1_3_2_185_2","first-page":"803","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV\u201918)","author":"Zhou Bolei","year":"2018","unstructured":"Bolei Zhou, Alex Andonian, Aude Oliva, and Antonio Torralba. 2018. Temporal relational reasoning in videos. In Proceedings of the European Conference on Computer Vision (ECCV\u201918). IEEE, 803\u2013818."},{"issue":"10","key":"e_1_3_2_186_2","doi-asserted-by":"crossref","first-page":"2537","DOI":"10.1109\/TIFS.2019.2900907","article-title":"AnomalyNet: An anomaly detection network for video surveillance","volume":"14","author":"Zhou Joey Tianyi","year":"2019","unstructured":"Joey Tianyi Zhou, Jiawei Du, Hongyuan Zhu, Xi Peng, Yong Liu, and Rick Siow Mong Goh. 2019. AnomalyNet: An anomaly detection network for video surveillance. IEEE Trans. Inf. Forens. Secur. 14, 10 (2019), 2537\u20132550.","journal-title":"IEEE Trans. Inf. Forens. Secur."},{"key":"e_1_3_2_187_2","first-page":"3589","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201923)","author":"Zhu Kevin","year":"2022","unstructured":"Kevin Zhu, Alexander Wong, and John McPhee. 2022. FenceNet: Fine-grained footwork recognition in fencing. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR\u201923). IEEE, 3589\u20133598."},{"key":"e_1_3_2_188_2","first-page":"695","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV\u201918)","author":"Zolfaghari Mohammadreza","year":"2018","unstructured":"Mohammadreza Zolfaghari, Kamaljeet Singh, and Thomas Brox. 2018. ECO: Efficient convolutional network for online video understanding. In Proceedings of the European Conference on Computer Vision (ECCV\u201918). IEEE, 695\u2013712."}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3679011","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3679011","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:58:14Z","timestamp":1750294694000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3679011"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,7]]},"references-count":187,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,1,31]]}},"alternative-id":["10.1145\/3679011"],"URL":"https:\/\/doi.org\/10.1145\/3679011","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,7]]},"assertion":[{"value":"2023-05-11","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-07-05","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2024-10-07","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}