{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,3]],"date-time":"2025-11-03T13:46:22Z","timestamp":1762177582537,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":25,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,1,6]],"date-time":"2023-01-06T00:00:00Z","timestamp":1672963200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Science and Technology Project of State Grid Information & Telecommunication Branch","award":["52993920002N"],"award-info":[{"award-number":["52993920002N"]}]},{"name":"Natural Science Foundation of China","award":["61872333 and 61830612"],"award-info":[{"award-number":["61872333 and 61830612"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,1,6]]},"DOI":"10.1145\/3582649.3582656","type":"proceedings-article","created":{"date-parts":[[2023,4,7]],"date-time":"2023-04-07T16:23:28Z","timestamp":1680884608000},"page":"161-167","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Lighter Transformer for Online Action Detection"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4887-6496","authenticated-orcid":false,"given":"Ruixin","family":"Li","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, University of Chinese Academy of Sciences, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4324-1632","authenticated-orcid":false,"given":"Longchuan","family":"Yan","sequence":"additional","affiliation":[{"name":"Technology Department, State Grid Information and Telecommunication Branch, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4723-8676","authenticated-orcid":false,"given":"Yuanlong","family":"Peng","sequence":"additional","affiliation":[{"name":"Technology Department, State Grid Information and Telecommunication Branch, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9923-5034","authenticated-orcid":false,"given":"Laiyun","family":"Qing","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, University of Chinese Academy of Sciences, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,4,7]]},"reference":[{"key":"e_1_3_2_1_1_1","first-page":"2634","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition","author":"Donahue Jeffrey","year":"2015","unstructured":"Donahue., Jeffrey and Anne Hendricks., Lisa and Guadarrama., Sergio and Rohrbach., Marcus and Venugopalan., Subhashini and Saenko., Kate and Darrell.: Trevor. Long-term recurrent convolutional networks for visual recognition and description. Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2625\u20132634, 2015"},{"key":"e_1_3_2_1_2_1","first-page":"5541","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Xu Mingze","year":"2019","unstructured":"Xu., Mingze and Gao., Mingfei and Chen., Yi-Ting and Davis., Larry S and Crandall., David J.: Temporal recurrent networks for online action detection. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 5532\u20135541, 2019"},{"key":"e_1_3_2_1_3_1","first-page":"284","volume-title":"European Conference on Computer Vision","author":"De Geest Roeland","year":"2016","unstructured":"De Geest., Roeland and Gavves., Efstratios and Ghodrati., Amir and Li., Zhenyang and Snoek., Cees and Tuytelaars., Tinne.: Online action detection. European Conference on Computer Vision, pp. 269\u2013284, 2016"},{"key":"e_1_3_2_1_4_1","volume-title":"Red: Reinforced encoder-decoder networks for action anticipation. arXiv preprint arXiv:1707.04818","author":"Gao Jiyang","year":"2017","unstructured":"Gao., Jiyang and Yang., Zhenheng and Nevatia., Ram. Red: Reinforced encoder-decoder networks for action anticipation. arXiv preprint arXiv:1707.04818, 2017"},{"key":"e_1_3_2_1_5_1","first-page":"7565","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Wang X.","year":"2021","unstructured":"Wang, X., Zhang, S., Qing, Z., Shao, Y., Zuo, Z., Gao, C., & Sang, N..: Oadtr: Online action detection with transformers. 
In Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 7565-7575, 2021."},{"key":"e_1_3_2_1_6_1","first-page":"818","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Eun Hyunjun","year":"2020","unstructured":"Eun., Hyunjun and Moon., Jinyoung and Park., Jongyoul and Jung., Chanho and Kim., Changick.: Learning to discriminate information for online action detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 809\u2013818, 2020"},{"key":"e_1_3_2_1_7_1","first-page":"1557","volume-title":"2018 IEEE Winter Conference on Applications of Computer Vision","author":"De Geest Roeland","year":"2018","unstructured":"De Geest., Roeland and Tuytelaars., Tinne.: Modeling temporal structure with lstm for online action detection. 2018 IEEE Winter Conference on Applications of Computer Vision, pp. 1549\u20131557, 2018"},{"key":"e_1_3_2_1_8_1","volume-title":"Advances in Neural Information Processing Systems","volume":"34","author":"Xu Mingze","year":"2021","unstructured":"Xu., Mingze and Xiong., Yuanjun and Chen., Hao and Li., Xinyu and Xia., Wei and Tu., Zhuowen and Soatto., Stefano.: Long short-term transformer for online action detection. 
Advances in Neural Information Processing Systems, vol.34, 2021"},{"key":"e_1_3_2_1_9_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision","author":"Arnab Anurag","year":"2021","unstructured":"Arnab., Anurag and Dehghani., Mostafa and Heigold., Georg and Sun., Chen and Lu\u010di\u0107., Mario and Schmid., Cordelia.: Vivit: A video vision transformer. Proceedings of the IEEE Conference on Computer Vision, 2021."},{"key":"e_1_3_2_1_10_1","first-page":"23","volume-title":"Computer Vision and Image Understanding","author":"Idrees Haroon","year":"2017","unstructured":"Idrees., Haroon and Zamir., Amir R and Jiang., Yu-Gang and Gorban., Alex and Laptev., Ivan and Sukthankar., Rahul and., Mubarak.: The THUMOS challenge on action recognition for videos \u201cin the wild\u201d. Computer Vision and Image Understanding, pp. 1\u201323, 2017."},{"key":"e_1_3_2_1_11_1","first-page":"36","volume-title":"European conference on computer vision","author":"Wang Limin","year":"2016","unstructured":"Wang., Limin and Xiong., Yuanjun and Wang., Zhe and Qiao., Yu and Lin., Dahua and Tang., Xiaoou and Van Gool., Luc.: Temporal segment networks: Towards good practices for deep action recognition. 
European conference on computer vision, pp. 20\u201336, 2016"},{"key":"e_1_3_2_1_12_1","volume-title":"Proceedings of the IEEE International Conference on Computer Vision, 2914\u20132923","author":"Zhao Yue","year":"2017","unstructured":"Zhao., Yue and Xiong., Yuanjun and Wang., Limin and Wu., Zhirong and Tang., Xiaoou and Lin., Dahua.: Temporal action detection with structured segment networks. Proceedings of the IEEE International Conference on Computer Vision, 2914\u20132923, 2017"},{"key":"e_1_3_2_1_13_1","volume-title":"In\u00a0Proceedings of the European conference on computer vision (ECCV)\u00a0(pp. 3-19)","author":"Lin T.","year":"2018","unstructured":"Lin, T., Zhao, X., Su, H., Wang, C., & Yang, M. (2018). Bsn: Boundary sensitive network for temporal action proposal generation. In\u00a0Proceedings of the European conference on computer vision (ECCV)\u00a0(pp. 3-19)."},{"key":"e_1_3_2_1_14_1","volume-title":"CVPR","author":"Chen Minghao","year":"2022","unstructured":"Chen, Minghao and Wei, Fangyun and Li, Chong and Cai, Deng. 
Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning, CVPR, 2022"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298698"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_17_1","first-page":"456","volume-title":"International conference on machine learning","author":"Ioffe Sergey","year":"2015","unstructured":"Ioffe., Sergey and Szegedy., Christian.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. International conference on machine learning, pp. 448\u2013456, 2015"},{"key":"e_1_3_2_1_18_1","volume-title":"A pursuit of temporal accuracy in general activity detection. arXiv preprint arXiv:1703.02716","author":"Xiong Yuanjun","year":"2017","unstructured":"Xiong., Yuanjun and Zhao., Yue and Wang., Limin and Lin., Dahua and Tang., Xiaoou.: A pursuit of temporal accuracy in general activity detection. arXiv preprint arXiv:1703.02716, 2017."},{"key":"e_1_3_2_1_19_1","first-page":"3898","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Lin Tianwei","year":"2019","unstructured":"Lin., Tianwei and Liu., Xiao and Li., Xin and Ding., Errui and Wen., Shilei.: Actionvlad: Bmn: Boundary-matching network for temporal action proposal generation. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 3889\u20133898, 2019"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.502"},{"key":"e_1_3_2_1_21_1","first-page":"3169","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Yang Le","year":"2022","unstructured":"Yang., Le and Han., Junwei and Zhang., Dingwen.: Colar: Effective and Efficient Online Action Detection by Consulting Exemplars. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3160\u20133169, 2022"},{"key":"e_1_3_2_1_22_1","volume-title":"Philipp.: Real-time Online Video Detection with Temporal Smoothing Transformers. European Conference on Computer Vision","author":"Zhao Yue","year":"2022","unstructured":"Zhao., Yue and Kr\u00e4henb\u00fchl., Philipp.: Real-time Online Video Detection with Temporal Smoothing Transformers. European Conference on Computer Vision, 2022"},{"key":"e_1_3_2_1_23_1","volume-title":"Sylvain and others.: An image is worth 16x16 words: Transformers for image recognition at scale. 
The International Conference on Learning Representations (ICLR)","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Dosovitskiy., Alexey and Beyer., Lucas and Kolesnikov., Alexander and Weissenborn., Dirk and Zhai., Xiaohua and Unterthiner., Thomas and Dehghani., Mostafa and Minderer., Matthias and Heigold., Georg and Gelly., Sylvain and others.: An image is worth 16x16 words: Transformers for image recognition at scale. The International Conference on Learning Representations (ICLR), 2021"},{"key":"e_1_3_2_1_24_1","volume-title":"ICML","author":"Bertasius Gedas","year":"2021","unstructured":"Bertasius, Gedas and Wang, Heng and Torresani, Lorenzo. Is space-time attention all you need for video understanding? 
ICML, 2021"},{"key":"e_1_3_2_1_25_1","volume-title":"CVPR","author":"Liu Ze","year":"2021","unstructured":"Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining, Swin transformer: Hierarchical vision transformer using shifted windows, CVPR, 2021"}],"event":{"name":"ICIGP 2023: 2023 The 6th International Conference on Image and Graphics Processing","acronym":"ICIGP 2023","location":"Chongqing China"},"container-title":["Proceedings of the 2023 6th International Conference on Image and Graphics Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3582649.3582656","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3582649.3582656","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:09:14Z","timestamp":1750183754000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3582649.3582656"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,6]]},"references-count":25,"alternative-id":["10.1145\/3582649.3582656","10.1145\/3582649"],"URL":"https:\/\/doi.org\/10.1145\/3582649.3582656","relation":{},"subject":[],"published":{"date-parts":[[2023,1,6]]},"assertion":[{"value":"2023-04-07","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}