{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T20:39:27Z","timestamp":1768336767513,"version":"3.49.0"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62372325, 62402255, 62502344, and U25A20444"],"award-info":[{"award-number":["62372325, 62402255, 62502344, and U25A20444"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100006606","name":"Natural Science Foundation of Tianjin Municipality","doi-asserted-by":"crossref","award":["23JCZDJC00280"],"award-info":[{"award-number":["23JCZDJC00280"]}],"id":[{"id":"10.13039\/501100006606","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100007129","name":"Shandong Provincial Natural Science Foundation","doi-asserted-by":"crossref","award":["ZR2024QF020"],"award-info":[{"award-number":["ZR2024QF020"]}],"id":[{"id":"10.13039\/501100007129","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Shandong Province National Talents Supporting Program","award":["2023GJJLJRC-070"],"award-info":[{"award-number":["2023GJJLJRC-070"]}]},{"name":"Shandong Project towards the Integration of Education and Industry","award":["801822020100000024 and 2024ZDZX11"],"award-info":[{"award-number":["801822020100000024 and 2024ZDZX11"]}]},{"name":"Young Talent of Lifting Engineering for Science and Technology in Shandong","award":["SDAST2024QTB001"],"award-info":[{"award-number":["SDAST2024QTB001"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. 
Appl."],"published-print":{"date-parts":[[2026,1,31]]},"abstract":"<jats:p>Temporal action localization is a fundamental task in video understanding that focuses on classifying and temporally localizing action instances in untrimmed videos. Compared to temporal action localization, the Weakly Supervised Temporal Action Localization (WTAL) task presents greater challenges, as its training data lacks detailed information about action boundaries. Existing WTAL methods ignore the complementary relationship between modalities and the dependency between snippets, resulting in inaccurate localization results. To solve these issues, we propose a Collaborative Hierarchical Aggregation Network (CHA-Net). Specifically, we first use a modality complementary module to learn the synergies between modalities. Then, a collaborative enhance module is proposed to remove the information irrelevant to actions in the RGB modality. Finally, a hierarchical aggregation module is proposed to capture the complete temporal information of action instances to better mine the temporal dependencies between snippets. Extensive experiments on THUMOS14, ActivityNet1.2, and ActivityNet1.3 datasets demonstrate the effectiveness of our method. 
Compared with F3-Net (TMM2024, Avg{0.1:0.5}) and SPCC-Net (TMM2024, Avg{0.1:0.7}) on the THUMOS14 dataset, the proposed method can achieve improvements of 3.2% and 2.4%, respectively.<\/jats:p>","DOI":"10.1145\/3778170","type":"journal-article","created":{"date-parts":[[2025,11,27]],"date-time":"2025-11-27T09:19:54Z","timestamp":1764235194000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["A Collaborative Hierarchical Aggregation Network for Weakly Supervised Temporal Action Localization"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2182-5741","authenticated-orcid":false,"given":"Zan","family":"Gao","sequence":"first","affiliation":[{"name":"Tianjin University of Technology, Tianjin, China and Qilu University of Technology, Jinan, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-4760-7324","authenticated-orcid":false,"given":"Xiaoyi","family":"Xu","sequence":"additional","affiliation":[{"name":"Tianjin University of Technology, Tianjin, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4187-1980","authenticated-orcid":false,"given":"Yibo","family":"Zhao","sequence":"additional","affiliation":[{"name":"Tianjin University of Technology, Tianjin, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6348-671X","authenticated-orcid":false,"given":"Chunjie","family":"Ma","sequence":"additional","affiliation":[{"name":"Qilu University of Technology, Jinan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2249-1480","authenticated-orcid":false,"given":"Yanbing","family":"Xue","sequence":"additional","affiliation":[{"name":"Tianjin University of Technology, Tianjin, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-5303-0265","authenticated-orcid":false,"given":"Riwei","family":"Wang","sequence":"additional","affiliation":[{"name":"Wenzhou University of Technology, Wenzhou, 
China"}]}],"member":"320","published-online":{"date-parts":[[2026,1,13]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.502"},{"key":"e_1_3_1_3_2","first-page":"1130","volume-title":"2018 IEEE Conference on Computer Vision and Pattern Recognition","author":"Chao Yu-Wei","year":"2018","unstructured":"Yu-Wei Chao, Sudheendra Vijayanarasimhan, Bryan Seybold, David A. Ross, Jia Deng, and Rahul Sukthankar. 2018. Rethinking the faster R-CNN architecture for temporal action localization. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, 1130\u20131139."},{"key":"e_1_3_1_4_2","first-page":"248","volume-title":"36th AAAI Conference on Artificial Intelligence (AAAI)","author":"Chen Guo","year":"2022","unstructured":"Guo Chen, Yin-Dong Zheng, Limin Wang, and Tong Lu. 2022. DCAN: Improving temporal action detection via dual context aggregation. In 36th AAAI Conference on Artificial Intelligence (AAAI), 248\u2013257."},{"key":"e_1_3_1_5_2","first-page":"192","volume-title":"17th European Conference on Computer Vision (ECCV \u201922)","author":"Chen Mengyuan","year":"2022","unstructured":"Mengyuan Chen, Junyu Gao, Shicai Yang, and Changsheng Xu. 2022. Dual-evidential learning for weakly-supervised temporal action localization. In 17th European Conference on Computer Vision (ECCV \u201922), 192\u2013208."},{"key":"e_1_3_1_6_2","doi-asserted-by":"crossref","unstructured":"Yuanjie Dang Chunxia Huang Peng Chen Dongdong Zhao Nan Gao Ronghua Liang and Ruohong Huan. 2024. Discriminative action snippet propagation network for weakly supervised temporal action localization. ACM Transactions on Multimedia Computing Communications and Applications 20 6 Article 180 (2024) 1\u201321.","DOI":"10.1145\/3643815"},{"key":"e_1_3_1_7_2","doi-asserted-by":"crossref","unstructured":"Peng Dou Ying Zeng Zhuoqun Wang and Haifeng Hu. 2023. Multiple temporal pooling mechanisms for weakly supervised temporal action localization. 
ACM Transactions on Multimedia Computing Communications and Applications 19 3 Article 108 (2023) 1\u201319.","DOI":"10.1145\/3567828"},{"key":"e_1_3_1_8_2","doi-asserted-by":"crossref","unstructured":"Jia-Run Du Jia-Chang Feng Kun-Yu Lin Fa-Ting Hong Zhongang Qi Ying Shan Jian-Fang Hu and Wei-Shi Zheng. 2025. Weakly-supervised temporal action localization by progressive complementary learning. IEEE Transactions on Circuits and Systems for Video Technology 35 1 (2025) 938\u2013952.","DOI":"10.1109\/TCSVT.2024.3456795"},{"key":"e_1_3_1_9_2","doi-asserted-by":"crossref","unstructured":"Zan Gao Xinglei Cui Tao Zhuo Zhiyong Cheng An-An Liu Meng Wang and Shengyong Chen. 2023. A multitemporal scale and spatial-temporal transformer network for temporal action localization. IEEE Transactions on Human-Machine Systems 53 3 (2023) 569\u2013580.","DOI":"10.1109\/THMS.2023.3266037"},{"key":"e_1_3_1_10_2","doi-asserted-by":"crossref","unstructured":"Jiachang Hao Haifeng Sun Pengfei Ren Yiming Zhong Jingyu Wang Qi Qi and Jianxin Liao. 2023. Fine-grained text-to-video temporal grounding from coarse boundary. ACM Transactions on Multimedia Computing Communications and Applications 19 5 Article 157 (2023) 1\u201321.","DOI":"10.1145\/3579825"},{"key":"e_1_3_1_11_2","first-page":"13915","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"He Bo","year":"2022","unstructured":"Bo He, Xitong Yang, Le Kang, Zhiyu Cheng, Xin Zhou, and Abhinav Shrivastava. 2022. ASM-Loc: Action-aware segment modeling for weakly-supervised temporal action localization. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13915\u201313925."},{"key":"e_1_3_1_12_2","first-page":"961","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Caba Heilbron Fabian","year":"2015","unstructured":"Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem, and Juan Carlos Niebles. 2015. 
ActivityNet: A large-scale video benchmark for human activity understanding. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 961\u2013970."},{"key":"e_1_3_1_13_2","first-page":"1591","volume-title":"ACM Multimedia Conference (MM \u201921)","author":"Hong Fa-Ting","year":"2021","unstructured":"Fa-Ting Hong, Jia-Chang Feng, Dan Xu, Ying Shan, and Wei-Shi Zheng. 2021. Cross-modal consensus network for weakly supervised temporal action localization. In ACM Multimedia Conference (MM \u201921), 1591\u20131599."},{"key":"e_1_3_1_14_2","first-page":"2704","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Hu Xin","year":"2024","unstructured":"Xin Hu, Kai Li, Deep Patel, Erik Kruus, Martin Renqiang Min, and Zhengming Ding. 2024. Weakly-supervised temporal action localization with multi-modal Plateau transformers. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2704\u20132713."},{"key":"e_1_3_1_15_2","first-page":"3262","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Huang Linjiang","year":"2022","unstructured":"Linjiang Huang, Liang Wang, and Hongsheng Li. 2022. Weakly supervised temporal action localization via representative snippet knowledge propagation. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 3262\u20133271."},{"key":"e_1_3_1_16_2","doi-asserted-by":"crossref","unstructured":"Haroon Idrees Amir R. Zamir Yu-Gang Jiang Alex Gorban Ivan Laptev Rahul Sukthankar and Mubarak Shah. 2017. The THUMOS challenge on action recognition. Computer Vision and Image Understanding 155 (2017) 1\u201323.","DOI":"10.1016\/j.cviu.2016.10.018"},{"key":"e_1_3_1_17_2","first-page":"1637","volume-title":"35th AAAI Conference on Artificial Intelligence","author":"Islam Ashraful","year":"2021","unstructured":"Ashraful Islam, Chengjiang Long, and Richard J. Radke. 2021. 
A hybrid attention mechanism for weakly-supervised temporal action localization. In 35th AAAI Conference on Artificial Intelligence, 1637\u20131645."},{"key":"e_1_3_1_18_2","first-page":"1","volume-title":"3rd International Conference on Learning Representations","author":"Kingma Diederik P.","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, 1\u20138."},{"key":"e_1_3_1_19_2","first-page":"11320","volume-title":"34th AAAI Conference on Artificial Intelligence (AAAI)","author":"Lee Pilhyeon","year":"2020","unstructured":"Pilhyeon Lee, Youngjung Uh, and Hyeran Byun. 2020. Background suppression network for weakly-supervised temporal action localization. In 34th AAAI Conference on Artificial Intelligence (AAAI), 11320\u201311327."},{"key":"e_1_3_1_20_2","doi-asserted-by":"crossref","unstructured":"Chuankun Li Yonghong Hou Pichao Wang and Wanqing Li. 2019. Multiview-based 3D action recognition using deep networks. IEEE Transactions on Human-Machine Systems 49 1 (2019) 95\u2013104.","DOI":"10.1109\/THMS.2018.2883001"},{"key":"e_1_3_1_21_2","first-page":"3","volume-title":"15th European Conference on Computer Vision (ECCV \u201918)","author":"Lin Tianwei","year":"2018","unstructured":"Tianwei Lin, Xu Zhao, Haisheng Su, Chongjing Wang, and Ming Yang. 2018. BSN: Boundary sensitive network for temporal action proposal generation. In 15th European Conference on Computer Vision (ECCV \u201918), 3\u201321."},{"key":"e_1_3_1_22_2","first-page":"344","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Long Fuchen","year":"2019","unstructured":"Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, and Tao Mei. 2019. Gaussian temporal awareness networks for action localization. 
In IEEE Conference on Computer Vision and Pattern Recognition, 344\u2013353."},{"key":"e_1_3_1_23_2","first-page":"420","volume-title":"16th European Conference on Computer Vision (ECCV \u201920)","author":"Ma Fan","year":"2020","unstructured":"Fan Ma, Linchao Zhu, Yi Yang, Shengxin Zha, Gourab Kundu, Matt Feiszli, and Zheng Shou. 2020. SF-Net: Single-frame supervision for temporal action localization. In 16th European Conference on Computer Vision (ECCV \u201920), 420\u2013437."},{"key":"e_1_3_1_24_2","doi-asserted-by":"crossref","unstructured":"Md Moniruzzaman and Zhaozheng Yin. 2024. Feature weakening contextualization and discrimination for weakly supervised temporal action localization. IEEE Transactions on Multimedia 26 (2024) 270\u2013283.","DOI":"10.1109\/TMM.2023.3263965"},{"key":"e_1_3_1_25_2","first-page":"6752","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Nguyen Phuc","year":"2018","unstructured":"Phuc Nguyen, Ting Liu, Gautam Prasad, and Bohyung Han. 2018. Weakly supervised action localization by sparse temporal pooling network. In IEEE Conference on Computer Vision and Pattern Recognition, 6752\u20136761."},{"key":"e_1_3_1_26_2","first-page":"588","volume-title":"15th European Conference on Computer Vision (ECCV \u201918)","author":"Paul Sujoy","year":"2018","unstructured":"Sujoy Paul, Sourya Roy, and Amit K. Roy-Chowdhury. 2018. W-TALC: Weakly-supervised temporal activity localization and classification. In 15th European Conference on Computer Vision (ECCV \u201918). Vittorio Ferrari, Martial Hebert, Cristian Sminchisescu, and Yair Weiss (Eds.), 588\u2013607."},{"key":"e_1_3_1_27_2","doi-asserted-by":"crossref","unstructured":"Asanka G. Perera Yee Wei Law Titilayo T. Ogunwa and Javaan S. Chahl. 2020. A multiviewpoint outdoor dataset for human action recognition. 
IEEE Transactions on Human-Machine Systems 50 (2020) 405\u2013413.","DOI":"10.1109\/THMS.2020.2971958"},{"key":"e_1_3_1_28_2","first-page":"485","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Qing Zhiwu","year":"2021","unstructured":"Zhiwu Qing, Haisheng Su, Weihao Gan, Dongliang Wang, Wei Wu, Xiang Wang, Yu Qiao, Junjie Yan, Changxin Gao, and Nong Sang. 2021. Temporal context aggregation network for temporal action proposal refinement. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 485\u2013494."},{"key":"e_1_3_1_29_2","first-page":"1008","volume-title":"IEEE International Conference on Multimedia and Expo (ICME)","author":"Ren Hao","year":"2023","unstructured":"Hao Ren, Wu Ran, Xingson Liu, Haoran Ren, Hong Lu, Rui Zhang, and Cheng Jin. 2023. Weakly-supervised temporal action localization with adaptive clustering and refining network. In IEEE International Conference on Multimedia and Expo (ICME), 1008\u20131013."},{"key":"e_1_3_1_30_2","first-page":"281","volume-title":"19th Pacific Rim International Conference on Artificial Intelligence","author":"Ren Hao","year":"2022","unstructured":"Hao Ren, Haoran Ren, Wu Ran, Hong Lu, and Cheng Jin. 2022. Weakly-supervised temporal action localization with multi-head cross-modal attention. In 19th Pacific Rim International Conference on Artificial Intelligence, 281\u2013295."},{"key":"e_1_3_1_31_2","first-page":"2394","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Ren Huan","year":"2023","unstructured":"Huan Ren, Wenfei Yang, Tianzhu Zhang, and Yongdong Zhang. 2023. Proposal-based multiple instance learning for weakly-supervised temporal action localization. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2394\u20132404."},{"key":"e_1_3_1_32_2","doi-asserted-by":"crossref","unstructured":"Yuxiang Shao Feifei Zhang and Changsheng Xu. 2024. 
Snippet-to-prototype contrastive consensus network for weakly supervised temporal action localization. IEEE Transactions on Multimedia 26 (2024) 6717\u20136729.","DOI":"10.1109\/TMM.2024.3355628"},{"key":"e_1_3_1_33_2","first-page":"1049","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Shou Zheng","year":"2016","unstructured":"Zheng Shou, Dongang Wang, and Shih-Fu Chang. 2016. Temporal action localization in untrimmed videos via multi-stage CNNs. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1049\u20131058."},{"key":"e_1_3_1_34_2","first-page":"13719","volume-title":"IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"Sridhar Deepak","year":"2021","unstructured":"Deepak Sridhar, Niamul Quader, Srikanth Muralidharan, Yaoxin Li, Peng Dai, and Juwei Lu. 2021. Class semantics-based attention for action detection. In IEEE\/CVF International Conference on Computer Vision (ICCV), 13719\u201313728."},{"key":"e_1_3_1_35_2","first-page":"6599","volume-title":"IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"Tang Xiaojun","year":"2023","unstructured":"Xiaojun Tang, Junsong Fan, Chuanchen Luo, Zhaoxiang Zhang, Man Zhang, and Zongyuan Yang. 2023. DDG-Net: Discriminability-driven graph network for weakly-supervised temporal action localization. In IEEE\/CVF International Conference on Computer Vision (ICCV), 6599\u20136609."},{"key":"e_1_3_1_36_2","first-page":"6402","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Wang Limin","year":"2017","unstructured":"Limin Wang, Yuanjun Xiong, Dahua Lin, and Luc Van Gool. 2017. UntrimmedNets for weakly supervised action recognition and detection. 
In IEEE Conference on Computer Vision and Pattern Recognition, 6402\u20136411."},{"key":"e_1_3_1_37_2","first-page":"18440","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Xia Ziying","year":"2024","unstructured":"Ziying Xia, Jian Cheng, Siyu Liu, Yongxiang Hu, Shiguang Wang, Yijie Zhang, and Liwan Dang. 2024. Realigning confidence with temporal saliency information for point-level weakly-supervised temporal action localization. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 18440\u201318450."},{"key":"e_1_3_1_38_2","doi-asserted-by":"crossref","unstructured":"Chi Xie Zikun Zhuang Shengjie Zhao and Shuang Liang. 2023. Temporal dropout for weakly supervised action localization. ACM Transactions on Multimedia Computing Communications and Applications 19 3 Article 102 (2023) 1\u201324.","DOI":"10.1145\/3567827"},{"key":"e_1_3_1_39_2","doi-asserted-by":"crossref","unstructured":"Le Yang Junwei Han Tao Zhao Tianwei Lin Dingwen Zhang and Jianxin Chen. 2022. Background-click supervision for temporal action localization. IEEE Transactions on Pattern Analysis and Machine Intelligence 44 12 (2022) 9814\u20139829.","DOI":"10.1109\/TPAMI.2021.3132058"},{"key":"e_1_3_1_40_2","first-page":"3090","volume-title":"36th AAAI Conference on Artificial Intelligence","author":"Yang Zichen","year":"2022","unstructured":"Zichen Yang, Jie Qin, and Di Huang. 2022. ACGNet: Action complement graph network for weakly-supervised temporal action localization. In 36th AAAI Conference on Artificial Intelligence, 3090\u20133098."},{"key":"e_1_3_1_41_2","first-page":"37","volume-title":"16th European Conference on Computer Vision (ECCV \u201920)","author":"Zhai Yuanhao","year":"2020","unstructured":"Yuanhao Zhai, Le Wang, Wei Tang, Qilin Zhang, Junsong Yuan, and Gang Hua. 2020. Two-stream consensus network for weakly-supervised temporal action localization. 
In 16th European Conference on Computer Vision (ECCV \u201920), 37\u201354."},{"key":"e_1_3_1_42_2","doi-asserted-by":"crossref","unstructured":"Yuanhao Zhai Le Wang Wei Tang Qilin Zhang Nanning Zheng David S. Doermann Junsong Yuan and Gang Hua. 2023. Adaptive two-stream consensus network for weakly-supervised temporal action localization. IEEE Transactions on Pattern Analysis and Machine Intelligence 45 4 (2023) 4136\u20134151.","DOI":"10.1109\/TPAMI.2022.3189662"},{"key":"e_1_3_1_43_2","first-page":"16010","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Zhang Can","year":"2021","unstructured":"Can Zhang, Meng Cao, Dongming Yang, Jie Chen, and Yuexian Zou. 2021. CoLA: Weakly-supervised temporal action localization with snippet contrastive learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 16010\u201316019."},{"key":"e_1_3_1_44_2","first-page":"492","volume-title":"17th European Conference on Computer Vision (ECCV)","author":"Zhang Chen-Lin","year":"2022","unstructured":"Chen-Lin Zhang, Jianxin Wu, and Yin Li. 2022. ActionFormer: Localizing moments of actions with transformers. In 17th European Conference on Computer Vision (ECCV), 492\u2013510."},{"key":"e_1_3_1_45_2","first-page":"24139","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Zhang Quan","year":"2025","unstructured":"Quan Zhang and Yuxin Qi. 2025. Weakly supervised temporal action localization via dual-prior collaborative learning guided by multimodal large language models. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 24139\u201324148."},{"key":"e_1_3_1_46_2","doi-asserted-by":"crossref","unstructured":"Shihui Zhang Bingchun Luo Houlin Wang Yu Gu and Jiacheng He. 2024. Temporal action detection in videos with generative denoising diffusion. 
Knowledge-Based Systems 269 (2024) 111767.","DOI":"10.1016\/j.knosys.2024.111767"},{"key":"e_1_3_1_47_2","first-page":"13638","volume-title":"IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"Zhao Chen","year":"2021","unstructured":"Chen Zhao, Ali K. Thabet, and Bernard Ghanem. 2021. Video self-stitching graph network for temporal action localization. In IEEE\/CVF International Conference on Computer Vision (ICCV), 13638\u201313647."},{"key":"e_1_3_1_48_2","doi-asserted-by":"crossref","unstructured":"Yibo Zhao Hua Zhang Zan Gao Wen Gao Meng Wang and Shengyong Chen. 2023. A novel action saliency and context-aware network for weakly-supervised temporal action localization. IEEE Transactions on Multimedia 25 (2023) 8253\u20138266.","DOI":"10.1109\/TMM.2023.3234362"},{"key":"e_1_3_1_49_2","doi-asserted-by":"crossref","unstructured":"Yibo Zhao Hua Zhang Zan Gao Weili Guan Jie Nie Anan Liu Meng Wang and Shengyong Chen. 2022. A temporal-aware relation and attention network for temporal action localization. IEEE Transactions on Image Processing 31 (2022) 4746\u20134760.","DOI":"10.1109\/TIP.2022.3182866"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2024.3374870"},{"key":"e_1_3_1_51_2","first-page":"2933","volume-title":"IEEE International Conference on Computer Vision","author":"Zhao Yue","year":"2017","unstructured":"Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, and Dahua Lin. 2017. Temporal action detection with structured segment networks. In IEEE International Conference on Computer Vision, 2933\u20132942."},{"key":"e_1_3_1_52_2","doi-asserted-by":"crossref","unstructured":"Qi Zheng Jianfeng Dong Xiaoye Qu Xun Yang Yabing Wang Pan Zhou Baolong Liu and Xun Wang. 2023. Progressive localization networks for language-based moment localization. 
ACM Transactions on Multimedia Computing Communications and Applications 19 2 Article 55 (2023) 1\u201321.","DOI":"10.1145\/3543857"},{"key":"e_1_3_1_53_2","doi-asserted-by":"crossref","first-page":"6017","DOI":"10.1109\/WACV56688.2023.00597","volume-title":"IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV)","author":"Zhou Jianxiong","year":"2023","unstructured":"Jianxiong Zhou and Ying Wu. 2023. Temporal feature enhancement dilated convolution network for weakly-supervised temporal action localization. In IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV), 6017\u20136026."},{"key":"e_1_3_1_54_2","first-page":"23003","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zhou Jingqiu","year":"2023","unstructured":"Jingqiu Zhou, Linjiang Huang, Liang Wang, Si Liu, and Hongsheng Li. 2023. Improving weakly supervised temporal action localization by bridging train-test gap in pseudo labels. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 23003\u201323012."},{"key":"e_1_3_1_55_2","first-page":"13496","volume-title":"IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"Zhu Zixin","year":"2021","unstructured":"Zixin Zhu, Wei Tang, Le Wang, Nanning Zheng, and Gang Hua. 2021. Enriching local and global contexts for temporal action localization. 
In IEEE\/CVF International Conference on Computer Vision (ICCV), 13496\u201313505."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3778170","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T14:20:39Z","timestamp":1768314039000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3778170"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,13]]},"references-count":54,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1,31]]}},"alternative-id":["10.1145\/3778170"],"URL":"https:\/\/doi.org\/10.1145\/3778170","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,13]]},"assertion":[{"value":"2025-02-20","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-16","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-01-13","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}