{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:13:06Z","timestamp":1750219986901,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":26,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,11,17]],"date-time":"2022-11-17T00:00:00Z","timestamp":1668643200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,11,17]]},"DOI":"10.1145\/3581807.3581810","type":"proceedings-article","created":{"date-parts":[[2023,5,23]],"date-time":"2023-05-23T00:02:28Z","timestamp":1684800148000},"page":"14-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["An Efficient Lightweight Spatio-temporal Attention Module for Action Recognition"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6515-8859","authenticated-orcid":false,"given":"Zhonghua","family":"Sun","sequence":"first","affiliation":[{"name":"Faculty of Information Technology, Beijing University of Technology, China and \rBeijing Laboratory of Advanced Information Networks, Beijing University of Technology, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3942-1841","authenticated-orcid":false,"given":"Meng","family":"Dai","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Beijing University of Technology, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2249-8340","authenticated-orcid":false,"given":"Ziwen","family":"Yi","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Beijing University of Technology, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4746-9124","authenticated-orcid":false,"given":"Tianyi","family":"Wang","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Beijing University of Technology, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5603-8874","authenticated-orcid":false,"given":"Jinchao","family":"Feng","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Beijing University of Technology, China and \rBeijing Key Laboratory of Computational Intelligence and Intelligent System, Faculty of Information Technology, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7620-2221","authenticated-orcid":false,"given":"Kebin","family":"Jia","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Beijing University of Technology, China and \rBeijing Laboratory of Advanced Information Networks, Beijing University of Technology, China"}]}],"member":"320","published-online":{"date-parts":[[2023,5,22]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Proceedings of the European Conference on Computer Vision. Springer, Cham, 20-36","author":"Wang Limin","year":"2016","unstructured":"Limin Wang , Yuanjun Xiong , Zhe Wang , Yu Qiao , Dahua Lin , Xiaoou Tang , Luc Van Gool . 2016 . Temporal segment networks: Towards good practices for deep action recognition . In Proceedings of the European Conference on Computer Vision. Springer, Cham, 20-36 . https:\/\/doi.org\/10.1007\/978-3-319-46484-8_2 10.1007\/978-3-319-46484-8_2 Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool. 2016. Temporal segment networks: Towards good practices for deep action recognition. In Proceedings of the European Conference on Computer Vision. Springer, Cham, 20-36. https:\/\/doi.org\/10.1007\/978-3-319-46484-8_2"},{"key":"e_1_3_2_1_2_1","first-page":"568","article-title":"Two-stream convolutional networks for action recognition in videos","volume":"27","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan , Andrew Zisserman . 2014 . Two-stream convolutional networks for action recognition in videos . In Proceedings of Advances in Neural Information Processing Systems 27 , 568 - 576 . Karen Simonyan, Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos. In Proceedings of Advances in Neural Information Processing Systems 27, 568-576.","journal-title":"Proceedings of Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_3_1","volume-title":"Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2625-2634","author":"Donahue Jeffrey","year":"2015","unstructured":"Jeffrey Donahue , Lisa Anne Hendricks , Sergio Guadarrama , Marcus Rohrbach , Subhashini, Venugopalan, Kate Saenko , Trevor Darrell . 2015 . Long-term recurrent convolutional networks for visual recognition and description . In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2625-2634 . https:\/\/doi: 10.1109\/CVPR.2015.7298878. 10.1109\/CVPR.2015.7298878 Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini, Venugopalan, Kate Saenko, Trevor Darrell. 2015. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2625-2634. https:\/\/doi: 10.1109\/CVPR.2015.7298878."},{"key":"e_1_3_2_1_4_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6546\u20136555","author":"Hara Kensho","year":"2018","unstructured":"Kensho Hara , Hirokatsu Kataoka , Yutaka Satoh . 2018 . Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6546\u20136555 Kensho Hara, Hirokatsu Kataoka, Yutaka Satoh. 2018. Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6546\u20136555"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00745"},{"volume-title":"So Kweon. 2018. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision, 3-19","author":"Woo Sanghyun","key":"e_1_3_2_1_6_1","unstructured":"Sanghyun Woo , Jongchan Park , Joon-Young Lee , In So Kweon. 2018. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision, 3-19 Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon. 2018. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision, 3-19"},{"volume-title":"the Proceedings of the International Conference on Computer Vision, 2556-2563","author":"Kuehne H.","key":"e_1_3_2_1_7_1","unstructured":"H. Kuehne , H. Jhuang , E. Garrote , T. Poggio and T. Serre . 2011. HMDB: A large video database for human motion recognition . In the Proceedings of the International Conference on Computer Vision, 2556-2563 H. Kuehne, H. Jhuang, E. Garrote, T. Poggio and T. Serre. 2011. HMDB: A large video database for human motion recognition. In the Proceedings of the International Conference on Computer Vision, 2556-2563"},{"key":"e_1_3_2_1_8_1","volume-title":"the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. arXiv:1212","author":"Soomro Khurram","year":"2012","unstructured":"Khurram Soomro ,\u00a0 Amir Roshan Zamir ,\u00a0 Mubarak Shah . 2012 . UCF101: A Dataset of 101 human actions classes from videos in the wild . In the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. arXiv:1212 .0402, https:\/\/arxiv.org\/abs\/1212.0402. Khurram Soomro,\u00a0Amir Roshan Zamir,\u00a0Mubarak Shah. 2012. UCF101: A Dataset of 101 human actions classes from videos in the wild. In the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. arXiv:1212.0402, https:\/\/arxiv.org\/abs\/1212.0402."},{"key":"e_1_3_2_1_9_1","volume-title":"the Proceedings of the International Conference on Computer Vision, 4489\u20134497","author":"Tran Du","year":"2015","unstructured":"Du Tran , Lubomir Bourdev , Rob Fergus , Lorenzo Torresani , Manohar Paluri . 2015 . Learning spatiotemporal features with 3D convolutional networks . In the Proceedings of the International Conference on Computer Vision, 4489\u20134497 Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri. 2015. Learning spatiotemporal features with 3D convolutional networks. In the Proceedings of the International Conference on Computer Vision, 4489\u20134497"},{"key":"e_1_3_2_1_10_1","volume-title":"the Proceedings of the International Conference on Computer Vision, 5534\u20135542","author":"Qiu Zhaofan","year":"2017","unstructured":"Zhaofan Qiu , Ting Yao , Tao Mei . 2017 . Learning spatio-temporal representation with pseudo-3D residual networks . In the Proceedings of the International Conference on Computer Vision, 5534\u20135542 Zhaofan Qiu, Ting Yao, Tao Mei. 2017. Learning spatio-temporal representation with pseudo-3D residual networks. In the Proceedings of the International Conference on Computer Vision, 5534\u20135542"},{"key":"e_1_3_2_1_11_1","volume-title":"the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6450\u20136459","author":"Tran Du","year":"2018","unstructured":"Du Tran , Heng Wang , Lorenzo Torresani , Jamie Ray , Yann LeCun , Manohar Paluri . 2018 . A Closer look at spatiotemporal convolutions for action recognition . In the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6450\u20136459 Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri. 2018. A Closer look at spatiotemporal convolutions for action recognition. In the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6450\u20136459"},{"key":"e_1_3_2_1_12_1","volume-title":"the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 449\u2013458","author":"Zhou Yizhou","year":"2018","unstructured":"Yizhou Zhou , Xiaoyan Sun , Zheng-Jun Zha , Wenjun Zeng . 2018 . MiCT: Mixed 3D\/2D convolutional tube for human action recognition . In the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 449\u2013458 Yizhou Zhou, Xiaoyan Sun, Zheng-Jun Zha, Wenjun Zeng. 2018. MiCT: Mixed 3D\/2D convolutional tube for human action recognition. In the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 449\u2013458"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","first-page":"1379","DOI":"10.1007\/s11760-021-01868-8","article-title":"RGB+2D skeleton: local hand-crafted and 3D convolution feature coding for action recognition","volume":"15","author":"Zhang Yixiang","year":"2021","unstructured":"Yixiang Zhang , Hongbo Zhang , Jixiang Du , Qing Lei , Lijie Yang , Bineng Zhong . 2021 . RGB+2D skeleton: local hand-crafted and 3D convolution feature coding for action recognition . Signal, Image and Video Processing 15 , 1379 \u2013 1386 . https:\/\/doi.org\/10.1007\/s11760-021-01868-8 10.1007\/s11760-021-01868-8 Yixiang Zhang, Hongbo Zhang, Jixiang Du, Qing Lei, Lijie Yang, Bineng Zhong. 2021. RGB+2D skeleton: local hand-crafted and 3D convolution feature coding for action recognition. Signal, Image and Video Processing 15, 1379\u20131386. https:\/\/doi.org\/10.1007\/s11760-021-01868-8","journal-title":"Signal, Image and Video Processing"},{"key":"e_1_3_2_1_14_1","first-page":"2204","article-title":"Recurrent models of visual attention","volume":"27","author":"Mnih Volodymyr","year":"2014","unstructured":"Volodymyr Mnih , Nicolas Heess , Alex Graves , koray kavukcuoglu. 2014 . Recurrent models of visual attention . In the Proceedings of Advances in Neural Information Processing Systems 27 , 2204 \u2013 2212 Volodymyr Mnih, Nicolas Heess, Alex Graves, koray kavukcuoglu. 2014. Recurrent models of visual attention. In the Proceedings of Advances in Neural Information Processing Systems 27, 2204\u20132212","journal-title":"the Proceedings of Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_15_1","first-page":"2017","article-title":"Spatial transformer networks","volume":"28","author":"Jaderberg Max","year":"2015","unstructured":"Max Jaderberg , Karen Simonyan , Andrew Zisserman , koray kavukcuoglu. 2015 . Spatial transformer networks . In the Proceedings of Advances in Neural Information Processing Systems 28 , 2017 \u2013 2025 Max Jaderberg, Karen Simonyan, Andrew Zisserman, koray kavukcuoglu. 2015. Spatial transformer networks. In the Proceedings of Advances in Neural Information Processing Systems 28, 2017\u20132025","journal-title":"the Proceedings of Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_16_1","unstructured":"Shikhar Sharma Ryan Kiros Ruslan Salakhutdinov. 2015. Action Recognition using Visual Attention. arXiv preprint arXiv:1511.04119  Shikhar Sharma Ryan Kiros Ruslan Salakhutdinov. 2015. Action Recognition using Visual Attention. arXiv preprint arXiv:1511.04119"},{"key":"e_1_3_2_1_17_1","volume-title":"the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7794\u20137803","author":"Wang Xiaolong","year":"2018","unstructured":"Xiaolong Wang , Ross Girshick , Abhinav Gupta , Kaiming He . 2018 . Non-local neural networks . In the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7794\u20137803 Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He. 2018. Non-local neural networks. In the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7794\u20137803"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"crossref","first-page":"865","DOI":"10.1007\/s11760-021-02028-8","article-title":"Comparison of 2D and 3D attention mechanisms for human (collective) activity recognition","volume":"16","author":"Zalluhoglu Cemil","year":"2021","unstructured":"Cemil Zalluhoglu , Nazli Ikizler-Cinbis . 2021 . Comparison of 2D and 3D attention mechanisms for human (collective) activity recognition . Signal, Image and Video Processing 16 , 865 - 872 . https:\/\/doi.org\/10.1007\/s11760-021-02028-8 10.1007\/s11760-021-02028-8 Cemil Zalluhoglu, Nazli Ikizler-Cinbis. 2021. Comparison of 2D and 3D attention mechanisms for human (collective) activity recognition. Signal, Image and Video Processing 16, 865-872. https:\/\/doi.org\/10.1007\/s11760-021-02028-8","journal-title":"Signal, Image and Video Processing"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_20_1","volume-title":"the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4724\u20134733","author":"Carreira Joao","year":"2017","unstructured":"Joao Carreira , Andrew Zisserman . 2017 . Quo vadis, action recognition? a new model and the kinetics dataset . In the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4724\u20134733 Joao Carreira, Andrew Zisserman. 2017. Quo vadis, action recognition? a new model and the kinetics dataset. In the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4724\u20134733"},{"key":"e_1_3_2_1_21_1","volume-title":"the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1-6","author":"Wang Limin","year":"2015","unstructured":"Limin Wang , Yuanjun Xiong , Zhe Wang , Yu Qiao . 2015 . Towards good practices for very deep two-stream convnets . In the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1-6 Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao. 2015. Towards good practices for very deep two-stream convnets. In the Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1-6"},{"key":"e_1_3_2_1_22_1","volume-title":"the Proceedings of the International Conference on Computer Vision, 3551\u20133558","author":"Wang Heng","year":"2013","unstructured":"Heng Wang , Cordelia Schmid . 2013 . Action recognition with improved trajectories . In the Proceedings of the International Conference on Computer Vision, 3551\u20133558 Heng Wang, Cordelia Schmid. 2013. Action recognition with improved trajectories. In the Proceedings of the International Conference on Computer Vision, 3551\u20133558"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"crossref","unstructured":"Ramprasaath R. Selvaraju Michael Cogswell Abhishek Das Ramakrishna Vedantam Devi Parikh Dhruv Batra. 2020. Grad-CAM: Visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision 336\u2013359  Ramprasaath R. Selvaraju Michael Cogswell Abhishek Das Ramakrishna Vedantam Devi Parikh Dhruv Batra. 2020. Grad-CAM: Visual explanations from deep networks via gradient-based localization. International Journal of Computer Vision 336\u2013359","DOI":"10.1007\/s11263-019-01228-7"},{"key":"e_1_3_2_1_24_1","volume-title":"the Proceedings of the IEEE International Conference on Computer Vision, 3628\u20133636","author":"Gao Jiyang","year":"2017","unstructured":"Jiyang Gao , Zhenheng Yang , Kan Chen , Chen Sun , Ram Nevatia . 2017 . Turn tap: Temporal unit regression network for temporal action proposals . In the Proceedings of the IEEE International Conference on Computer Vision, 3628\u20133636 Jiyang Gao, Zhenheng Yang, Kan Chen, Chen Sun, Ram Nevatia. 2017. Turn tap: Temporal unit regression network for temporal action proposals. In the Proceedings of the IEEE International Conference on Computer Vision, 3628\u20133636"},{"key":"e_1_3_2_1_25_1","article-title":"Channel separable convolutional neural network for action recognition","volume":"36","author":"Sun Z. Y, Z.H.","year":"2020","unstructured":"Z. Y, Z.H. Sun , J.C. Feng , K. Jia . 2020 . Channel separable convolutional neural network for action recognition . Journal of Signal Processing 36 , 9( September 2020), 1497-1502 Z. Y, Z.H. Sun, J.C. Feng, K. Jia. 2020. Channel separable convolutional neural network for action recognition. Journal of Signal Processing 36, 9(September 2020), 1497-1502","journal-title":"Journal of Signal Processing"},{"key":"e_1_3_2_1_26_1","volume-title":"Yusuf Sinan Akgul","author":"Yucer Seyma","year":"2018","unstructured":"Seyma Yucer , Yusuf Sinan Akgul . 2018 . 3D human action recognition with Siamese-LSTM based deep metric learning. arXiv preprint arXiv:1807.02131, https:\/\/doi.org\/10.18178\/joig.6.1.21-26 10.18178\/joig.6.1.21-26 Seyma Yucer, Yusuf Sinan Akgul. 2018. 3D human action recognition with Siamese-LSTM based deep metric learning. arXiv preprint arXiv:1807.02131, https:\/\/doi.org\/10.18178\/joig.6.1.21-26"}],"event":{"name":"ICCPR 2022: 2022 11th International Conference on Computing and Pattern Recognition","acronym":"ICCPR 2022","location":"Beijing China"},"container-title":["Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3581807.3581810","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3581807.3581810","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:49:29Z","timestamp":1750182569000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3581807.3581810"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,17]]},"references-count":26,"alternative-id":["10.1145\/3581807.3581810","10.1145\/3581807"],"URL":"https:\/\/doi.org\/10.1145\/3581807.3581810","relation":{},"subject":[],"published":{"date-parts":[[2022,11,17]]},"assertion":[{"value":"2023-05-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}