{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T16:18:53Z","timestamp":1768407533998,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":40,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,10,17]],"date-time":"2021-10-17T00:00:00Z","timestamp":1634428800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"the National Key R&D Program of China","award":["Grant No. 2018AAA0102001"],"award-info":[{"award-number":["Grant No. 2018AAA0102001"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,17]]},"DOI":"10.1145\/3474085.3475261","type":"proceedings-article","created":{"date-parts":[[2021,10,18]],"date-time":"2021-10-18T11:31:01Z","timestamp":1634556661000},"page":"853-861","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":25,"title":["Weakly-Supervised Temporal Action Localization via Cross-Stream Collaborative Learning"],"prefix":"10.1145","author":[{"given":"Yuan","family":"Ji","sequence":"first","affiliation":[{"name":"Dalian University of Technology, Dalian, China"}]},{"given":"Xu","family":"Jia","sequence":"additional","affiliation":[{"name":"Dalian University of Technology, Dalian, China"}]},{"given":"Huchuan","family":"Lu","sequence":"additional","affiliation":[{"name":"Dalian University of Technology & Peng Cheng Laboratory, Dalian & Shenzhen, China"}]},{"given":"Xiang","family":"Ruan","sequence":"additional","affiliation":[{"name":"Tiwaki Co.Ltd., Kusatsu, Shiga, Japan"}]}],"member":"320","published-online":{"date-parts":[[2021,10,17]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Activitynet: A large-scale video benchmark for human activity understanding. In CVPR. 
961--970.","author":"Heilbron Fabian Caba","year":"2015","unstructured":"Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem, and Juan Carlos Niebles. 2015. Activitynet: A large-scale video benchmark for human activity understanding. In CVPR. 961--970."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Joao Carreira and Andrew Zisserman. 2017. Quo vadis action recognition? a new model and the kinetics dataset. In CVPR. 6299--6308.","DOI":"10.1109\/CVPR.2017.502"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"crossref","unstructured":"Christoph Feichtenhofer Axel Pinz and Andrew Zisserman. 2016. Convolutional two-stream network fusion for video action recognition. In CVPR. 1933--1941.","DOI":"10.1109\/CVPR.2016.213"},{"key":"e_1_3_2_1_4_1","volume":"202","author":"Islam Ashraful","unstructured":"Ashraful Islam, Chengjiang Long, and Richard J. Radke. 2021. A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action Localization. In Proceedings of the AAAI Conference on Artificial Intelligence. 1637--1645.","journal-title":"Richard J. Radke."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV45572.2020.9093620"},{"key":"e_1_3_2_1_6_1","unstructured":"Y.-G. Jiang J. Liu A. Roshan Zamir G. Toderici I. Laptev M. Shah and R. Sukthankar. 2014. 
THUMOS Challenge: Action Recognition with a Large Number of Classes. http:\/\/crcv.ucf.edu\/THUMOS14\/."},{"key":"e_1_3_2_1_7_1","volume-title":"Adam: A method for stochastic optimization. In ICLR.","author":"Kingma Diederik P","year":"2015","unstructured":"Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In ICLR."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6793"},{"key":"e_1_3_2_1_9_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence. 1854--1862","author":"Lee Pilhyeon","year":"2021","unstructured":"Pilhyeon Lee, Jinglu Wang, Yan Lu, and Hyeran Byun. 2021. Weakly-supervised Temporal Action Localization by Uncertainty Modeling. In Proceedings of the AAAI Conference on Artificial Intelligence. 1854--1862."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123343"},{"key":"e_1_3_2_1_11_1","volume-title":"Bsn: Boundary sensitive network for temporal action proposal generation. In ECCV. 3--19.","author":"Lin Tianwei","year":"2018","unstructured":"Tianwei Lin, Xu Zhao, Haisheng Su, Chongjing Wang, and Ming Yang. 2018. Bsn: Boundary sensitive network for temporal action proposal generation. In ECCV. 
3--19."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00139"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00400"},{"key":"e_1_3_2_1_14_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence. 2233--2241","author":"Liu Ziyi","year":"2021","unstructured":"Ziyi Liu, Le Wang, Qilin Zhang, Wei Tang, Junsong Yuan, Nanning Zheng, and Gang Hua. 2021. ACSNet: Action-Context Separation Network for Weakly Supervised Temporal Action Localization. In Proceedings of the AAAI Conference on Artificial Intelligence. 2233--2241."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Fuchen Long Ting Yao Zhaofan Qiu Xinmei Tian Jiebo Luo and Tao Mei. 2019. Gaussian temporal awareness networks for action localization. In CVPR. 344--353.","DOI":"10.1109\/CVPR.2019.00043"},{"key":"e_1_3_2_1_16_1","volume-title":"Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning. In European Conference on Computer Vision. Springer, 729--745","author":"Luo Zhekun","year":"2020","unstructured":"Zhekun Luo, Devin Guillory, Baifeng Shi, Wei Ke, Fang Wan, Trevor Darrell, and Huijuan Xu. 2020. 
Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning. In European Conference on Computer Vision. Springer, 729--745."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Fan Ma Linchao Zhu Yi Yang Shengxin Zha Gourab Kundu Matt Feiszli and Zheng Shou. 2020. SF-Net: Single-frame supervision for temporal action localization. In ECCV. 420--437.","DOI":"10.1007\/978-3-030-58548-8_25"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58568-6_17"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413687"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00877"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00706"},{"key":"e_1_3_2_1_22_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV). 5502--5511","author":"Nguyen Phuc Xuan","unstructured":"Phuc Xuan Nguyen, Deva Ramanan, and Charless C. Fowlkes. 2019. Weakly-Supervised Action Localization With Background Modeling. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV). 5502--5511."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV48630.2021.00336"},{"key":"e_1_3_2_1_24_1","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV). 563--579","author":"Paul Sujoy","unstructured":"Sujoy Paul, Sourya Roy, and Amit K. Roy-Chowdhury. 2018. W-TALC: Weakly-supervised Temporal Activity Localization and Classification. 
In Proceedings of the European Conference on Computer Vision (ECCV). 563--579."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00109"},{"key":"e_1_3_2_1_26_1","volume-title":"Cdc: Convolutional-de-convolutional networks for precise temporal action localization in untrimmed videos. In CVPR. 5734--5743.","author":"Shou Zheng","year":"2017","unstructured":"Zheng Shou, Jonathan Chan, Alireza Zareian, Kazuyuki Miyazawa, and Shih-Fu Chang. 2017. Cdc: Convolutional-de-convolutional networks for precise temporal action localization in untrimmed videos. In CVPR. 5734--5743."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01270-0_10"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.5555\/3326943.3327112"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Haisheng Su Weihao Gan Wei Wu Junjie Yan and Yu Qiao. 2021. BSN+: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation. In AAAI. 
2602--2610.","DOI":"10.1609\/aaai.v35i3.16363"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.678"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03061-1_2"},{"key":"e_1_3_2_1_32_1","volume-title":"G-tad: Sub-graph localization for temporal action detection. In CVPR. 10156--10165.","author":"Xu Mengmeng","year":"2020","unstructured":"Mengmeng Xu, Chen Zhao, David S Rojas, Ali Thabet, and Bernard Ghanem. 2020. G-tad: Sub-graph localization for temporal action detection. In CVPR. 10156--10165."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Yunlu Xu Chengwei Zhang Zhanzhan Cheng Jianwen Xie Yi Niu Shiliang Pu and Fei Wu. 2019. Segregated temporal assembly recurrent networks for weakly supervised multiple action detection. In AAAI. 9070--9078.","DOI":"10.1609\/aaai.v33i01.33019070"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3351043"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58539-6_3"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01575"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3351044"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475192"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"crossref","unstructured":"
Yue Zhao Yuanjun Xiong Limin Wang Zhirong Wu Xiaoou Tang and Dahua Lin. 2017. Temporal action detection with structured segment networks. In ICCV. 2914--2923.","DOI":"10.1109\/ICCV.2017.317"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240511"}],"event":{"name":"MM '21: ACM Multimedia Conference","location":"Virtual Event China","acronym":"MM '21","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 29th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474085.3475261","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3474085.3475261","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:17Z","timestamp":1750193297000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474085.3475261"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,17]]},"references-count":40,"alternative-id":["10.1145\/3474085.3475261","10.1145\/3474085"],"URL":"https:\/\/doi.org\/10.1145\/3474085.3475261","relation":{},"subject":[],"published":{"date-parts":[[2021,10,17]]},"assertion":[{"value":"2021-10-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}