{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T17:21:53Z","timestamp":1777569713708,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":60,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,8,24]],"date-time":"2021-08-24T00:00:00Z","timestamp":1629763200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["EP\/R025290\/1"],"award-info":[{"award-number":["EP\/R025290\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,24]]},"DOI":"10.1145\/3460426.3463643","type":"proceedings-article","created":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T22:50:28Z","timestamp":1630536628000},"page":"339-348","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Few-Shot Action Localization without Knowing Boundaries"],"prefix":"10.1145","author":[{"given":"Ting-Ting","family":"Xie","sequence":"first","affiliation":[{"name":"Queen Mary University of London, London, United Kingdom"}]},{"given":"Christos","family":"Tzelepis","sequence":"additional","affiliation":[{"name":"Queen Mary University of London, London, United Kingdom"}]},{"given":"Fan","family":"Fu","sequence":"additional","affiliation":[{"name":"City, University of London, London, United Kingdom"}]},{"given":"Ioannis","family":"Patras","sequence":"additional","affiliation":[{"name":"Queen Mary University of London, London, United Kingdom"}]}],"member":"320","published-online":{"date-parts":[[2021,9]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Proceedings of the British Machine Vision Conference","author":"Bishay Mina","year":"2019","unstructured":"Mina Bishay , Georgios Zoumpourlis , and Ioannis Patras . 2019 . TARN: Temporal Attentive Relation Network for Few-Shot and Zero-Shot Action Recognition . Proceedings of the British Machine Vision Conference (2019). Mina Bishay, Georgios Zoumpourlis, and Ioannis Patras. 2019. TARN: Temporal Attentive Relation Network for Few-Shot and Zero-Shot Action Recognition. Proceedings of the British Machine Vision Conference (2019)."},{"key":"e_1_3_2_1_2_1","volume-title":"Rethinking Zero-Shot Video Classification: End-to-End Training for Realistic Applications. In IEEE Conference on Computer Vision and Pattern Recognition .","author":"Brattoli Biagio","year":"2020","unstructured":"Biagio Brattoli , Joseph Tighe , Fedor Zhdanov , Pietro Perona , and Krzysztof Chalupka . 2020 . Rethinking Zero-Shot Video Classification: End-to-End Training for Realistic Applications. In IEEE Conference on Computer Vision and Pattern Recognition . Biagio Brattoli, Joseph Tighe, Fedor Zhdanov, Pietro Perona, and Krzysztof Chalupka. 2020. Rethinking Zero-Shot Video Classification: End-to-End Training for Realistic Applications. In IEEE Conference on Computer Vision and Pattern Recognition ."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298698"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01063"},{"key":"e_1_3_2_1_5_1","volume-title":"Metric-Based Few-Shot Learning for Video Action Recognition. arXiv preprint arXiv:1909.09602","author":"Careaga Chris","year":"2019","unstructured":"Chris Careaga , Brian Hutchinson , Nathan Hodas , and Lawrence Phillips . 2019. Metric-Based Few-Shot Learning for Video Action Recognition. arXiv preprint arXiv:1909.09602 ( 2019 ). Chris Careaga, Brian Hutchinson, Nathan Hodas, and Lawrence Phillips. 2019. Metric-Based Few-Shot Learning for Video Action Recognition. arXiv preprint arXiv:1909.09602 (2019)."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.502"},{"key":"e_1_3_2_1_7_1","volume-title":"Rethinking the Faster R-CNN Architecture for Temporal Action Localization. In IEEE Conference on Computer Vision and Pattern Recognition. 1130--1139","author":"Chao Yu-Wei","year":"2018","unstructured":"Yu-Wei Chao , Sudheendra Vijayanarasimhan , Bryan Seybold , David A Ross , Jia Deng , and Rahul Sukthankar . 2018 . Rethinking the Faster R-CNN Architecture for Temporal Action Localization. In IEEE Conference on Computer Vision and Pattern Recognition. 1130--1139 . Yu-Wei Chao, Sudheendra Vijayanarasimhan, Bryan Seybold, David A Ross, Jia Deng, and Rahul Sukthankar. 2018. Rethinking the Faster R-CNN Architecture for Temporal Action Localization. In IEEE Conference on Computer Vision and Pattern Recognition. 1130--1139."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2006.79"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_4"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01216-8_5"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5244\/C.31.52"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.392"},{"key":"e_1_3_2_1_13_1","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Hahn Meera","year":"2019","unstructured":"Meera Hahn , Andrew Silva , and James M Rehg . 2019 . Action2vec: A crossmodal embedding approach to action learning . IEEE Conference on Computer Vision and Pattern Recognition , Workshop (2019). Meera Hahn, Andrew Silva, and James M Rehg. 2019. Action2vec: A crossmodal embedding approach to action learning. IEEE Conference on Computer Vision and Pattern Recognition, Workshop (2019)."},{"key":"e_1_3_2_1_14_1","unstructured":"Ruibing Hou Hong Chang MA Bingpeng Shiguang Shan and Xilin Chen. 2019. Cross attention network for few-shot classification. In Advances in Neural Information Processing Systems. 4003--4014.  Ruibing Hou Hong Chang MA Bingpeng Shiguang Shan and Xilin Chen. 2019. Cross attention network for few-shot classification. In Advances in Neural Information Processing Systems. 4003--4014."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6763"},{"key":"e_1_3_2_1_16_1","unstructured":"Y.-G. Jiang J. Liu A. Roshan Zamir G. Toderici I. Laptev M. Shah and R. Sukthankar. 2014. THUMOS Challenge: Action Recognition with a Large Number of Classes. http:\/\/crcv.ucf.edu\/THUMOS14\/.  Y.-G. Jiang J. Liu A. Roshan Zamir G. Toderici I. Laptev M. Shah and R. Sukthankar. 2014. THUMOS Challenge: Action Recognition with a Large Number of Classes. http:\/\/crcv.ucf.edu\/THUMOS14\/."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.223"},{"key":"e_1_3_2_1_18_1","volume-title":"et almbox","author":"Kay Will","year":"2017","unstructured":"Will Kay , Joao Carreira , Karen Simonyan , Brian Zhang , Chloe Hillier , Sudheendra Vijayanarasimhan , Fabio Viola , Tim Green , Trevor Back , Paul Natsev , et almbox . 2017 . The kinetics human action video dataset. arXiv preprint arXiv:1705.06950 (2017). Will Kay, Joao Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, et almbox. 2017. The kinetics human action video dataset. arXiv preprint arXiv:1705.06950 (2017)."},{"key":"e_1_3_2_1_19_1","volume-title":"International Conference on Learning Representations","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba . 2014 . Adam: A method for stochastic optimization . International Conference on Learning Representations (2014). Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. International Conference on Learning Representations (2014)."},{"key":"e_1_3_2_1_20_1","volume-title":"International Conference on Learning Representations","author":"Kingma Diederik P","year":"2013","unstructured":"Diederik P Kingma and Max Welling . 2013 . Auto-encoding variational bayes . International Conference on Learning Representations (2013). Diederik P Kingma and Max Welling. 2013. Auto-encoding variational bayes. International Conference on Learning Representations (2013)."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00645"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-51811-4_21"},{"key":"e_1_3_2_1_23_1","volume-title":"National Conference on Artificial Intelligence .","author":"Larochelle Hugo","year":"2008","unstructured":"Hugo Larochelle , Dumitru Erhan , and Yoshua Bengio . 2008 . Zero-data learning of new tasks .. In National Conference on Artificial Intelligence . Hugo Larochelle, Dumitru Erhan, and Yoshua Bengio. 2008. Zero-data learning of new tasks.. In National Conference on Artificial Intelligence ."},{"key":"e_1_3_2_1_24_1","volume-title":"BMN: Boundary-Matching Network for Temporal Action Proposal Generation. IEEE International Conference on Computer Vision","author":"Lin Tianwei","year":"2019","unstructured":"Tianwei Lin , Xiao Liu , Xin Li , Errui Ding , and Shilei Wen . 2019 . BMN: Boundary-Matching Network for Temporal Action Proposal Generation. IEEE International Conference on Computer Vision (2019). Tianwei Lin, Xiao Liu, Xin Li, Errui Ding, and Shilei Wen. 2019. BMN: Boundary-Matching Network for Temporal Action Proposal Generation. IEEE International Conference on Computer Vision (2019)."},{"key":"e_1_3_2_1_25_1","volume-title":"BSN: Boundary Sensitive Network for Temporal Action Proposal Generation. European Conference on Computer Vision","author":"Lin Tianwei","year":"2018","unstructured":"Tianwei Lin , Xu Zhao , Haisheng Su , Chongjing Wang , and Ming Yang . 2018 . BSN: Boundary Sensitive Network for Temporal Action Proposal Generation. European Conference on Computer Vision (2018). Tianwei Lin, Xu Zhao, Haisheng Su, Chongjing Wang, and Ming Yang. 2018. BSN: Boundary Sensitive Network for Temporal Action Proposal Generation. European Conference on Computer Vision (2018)."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00139"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00043"},{"key":"e_1_3_2_1_28_1","volume-title":"Closing the Generalization Gap in One-Shot Object Detection. arXiv preprint arXiv:2011.04267","author":"Michaelis Claudio","year":"2020","unstructured":"Claudio Michaelis , Matthias Bethge , and Alexander S Ecker . 2020. Closing the Generalization Gap in One-Shot Object Detection. arXiv preprint arXiv:2011.04267 ( 2020 ). Claudio Michaelis, Matthias Bethge, and Alexander S Ecker. 2020. Closing the Generalization Gap in One-Shot Object Detection. arXiv preprint arXiv:2011.04267 (2020)."},{"key":"e_1_3_2_1_29_1","volume-title":"Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization. European Conference on Computer Vision","author":"Min Kyle","year":"2020","unstructured":"Kyle Min and Jason J Corso . 2020 . Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization. European Conference on Computer Vision (2020). Kyle Min and Jason J Corso. 2020. Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization. European Conference on Computer Vision (2020)."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00877"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00706"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00560"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2009.5204262"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01225-0_35"},{"key":"e_1_3_2_1_35_1","unstructured":"Alec Radford Jong Wook Kim et almbox. [n.d.]. Learning Transferable Visual Models From Natural Language Supervision. Image ( [n. d.]).  Alec Radford Jong Wook Kim et almbox. [n.d.]. Learning Transferable Visual Models From Natural Language Supervision. Image ( [n. d.])."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00269"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00081"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00109"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.155"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01270-0_10"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.119"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00049"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00131"},{"key":"e_1_3_2_1_44_1","volume-title":"et almbox","author":"Vinyals Oriol","year":"2016","unstructured":"Oriol Vinyals , Charles Blundell , Timothy Lillicrap , Daan Wierstra , et almbox . 2016 . Matching networks for one shot learning. In Advances in Neural Information Processing Systems . 3630--3638. Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et almbox. 2016. Matching networks for one shot learning. In Advances in Neural Information Processing Systems. 3630--3638."},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.678"},{"key":"e_1_3_2_1_46_1","volume-title":"Exploring Feature Representation and Training Strategies in Temporal Action Localization. In International Conference on Image Processing . IEEE, 1605--1609","author":"Xie Tingting","year":"2019","unstructured":"Tingting Xie , Xiaoshan Yang , Tianzhu Zhang , Changsheng Xu , and Ioannis Patras . 2019 . Exploring Feature Representation and Training Strategies in Temporal Action Localization. In International Conference on Image Processing . IEEE, 1605--1609 . Tingting Xie, Xiaoshan Yang, Tianzhu Zhang, Changsheng Xu, and Ioannis Patras. 2019. Exploring Feature Representation and Training Strategies in Temporal Action Localization. In International Conference on Image Processing . IEEE, 1605--1609."},{"key":"e_1_3_2_1_47_1","volume-title":"Temporal Action Localization with Variance-Aware Networks. arXiv preprint arXiv:2008.11254","author":"Xie Ting-Ting","year":"2020","unstructured":"Ting-Ting Xie , Christos Tzelepis , and Ioannis Patras . 2020. Temporal Action Localization with Variance-Aware Networks. arXiv preprint arXiv:2008.11254 ( 2020 ). Ting-Ting Xie, Christos Tzelepis, and Ioannis Patras. 2020. Temporal Action Localization with Variance-Aware Networks. arXiv preprint arXiv:2008.11254 (2020)."},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.617"},{"key":"e_1_3_2_1_49_1","volume-title":"2020 a. Revisiting Few-shot Activity Detection with Class Similarity Control. arXiv preprint arXiv:2004.00137","author":"Xu Huijuan","year":"2020","unstructured":"Huijuan Xu , Ximeng Sun , Eric Tzeng , Abir Das , Kate Saenko , and Trevor Darrell . 2020 a. Revisiting Few-shot Activity Detection with Class Similarity Control. arXiv preprint arXiv:2004.00137 ( 2020 ). Huijuan Xu, Ximeng Sun, Eric Tzeng, Abir Das, Kate Saenko, and Trevor Darrell. 2020 a. Revisiting Few-shot Activity Detection with Class Similarity Control. arXiv preprint arXiv:2004.00137 (2020)."},{"key":"e_1_3_2_1_50_1","volume-title":"G-TAD: Sub-Graph Localization for Temporal Action Detection. IEEE Conference on Computer Vision and Pattern Recognition","author":"Xu Mengmeng","year":"2020","unstructured":"Mengmeng Xu , Chen Zhao , David S Rojas , Ali Thabet , and Bernard Ghanem . 2020 b . G-TAD: Sub-Graph Localization for Temporal Action Detection. IEEE Conference on Computer Vision and Pattern Recognition (2020). Mengmeng Xu, Chen Zhao, David S Rojas, Ali Thabet, and Bernard Ghanem. 2020 b. G-TAD: Sub-Graph Localization for Temporal Action Detection. IEEE Conference on Computer Vision and Pattern Recognition (2020)."},{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00157"},{"key":"e_1_3_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58571-6_30"},{"key":"e_1_3_2_1_53_1","volume-title":"Joint pattern recognition symposium","author":"Zach Christopher","unstructured":"Christopher Zach , Thomas Pock , and Horst Bischof . 2007. A duality based approach for realtime tv-l 1 optical flow . In Joint pattern recognition symposium . Springer , 214--223. Christopher Zach, Thomas Pock, and Horst Bischof. 2007. A duality based approach for realtime tv-l 1 optical flow. In Joint pattern recognition symposium. Springer, 214--223."},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58558-7_31"},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00876"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.317"},{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.319"},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_46"},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00983"},{"key":"e_1_3_2_1_60_1","volume-title":"Compositional Few-Shot Recognition with Primitive Discovery and Enhancing. In ACM on Multimedia Conference. 156--164","author":"Zou Yixiong","year":"2020","unstructured":"Yixiong Zou , Shanghang Zhang , Ke Chen , Yonghong Tian , Yaowei Wang , and Jos\u00e9 MF Moura . 2020 . Compositional Few-Shot Recognition with Primitive Discovery and Enhancing. In ACM on Multimedia Conference. 156--164 . Yixiong Zou, Shanghang Zhang, Ke Chen, Yonghong Tian, Yaowei Wang, and Jos\u00e9 MF Moura. 2020. Compositional Few-Shot Recognition with Primitive Discovery and Enhancing. In ACM on Multimedia Conference. 156--164."}],"event":{"name":"ICMR '21: International Conference on Multimedia Retrieval","location":"Taipei Taiwan","acronym":"ICMR '21","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 2021 International Conference on Multimedia Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3460426.3463643","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3460426.3463643","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:04Z","timestamp":1750191424000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3460426.3463643"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,24]]},"references-count":60,"alternative-id":["10.1145\/3460426.3463643","10.1145\/3460426"],"URL":"https:\/\/doi.org\/10.1145\/3460426.3463643","relation":{},"subject":[],"published":{"date-parts":[[2021,8,24]]},"assertion":[{"value":"2021-09-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}