{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,7]],"date-time":"2025-11-07T19:20:52Z","timestamp":1762543252460,"version":"3.41.0"},"reference-count":79,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2020,5,22]],"date-time":"2020-05-22T00:00:00Z","timestamp":1590105600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2020,5,31]]},"abstract":"<jats:p>This article aims for the detection and search of events in videos, where video examples are either scarce or even absent during training. To enable such event detection and search, ImageNet concept banks have been shown to be effective. Rather than employing the standard concept bank of 1,000 ImageNet classes, we leverage the full 21,841-class dataset. We identify two problems with using the full dataset: (i) there is an imbalance between the number of examples per concept, and (ii) not all concepts are equally relevant for events. In this article, we propose to balance large-scale image hierarchies for pre-training. We shuffle concepts based on bottom-up and top-down operations to overcome the problems of example imbalance and concept relevance. Using this strategy, we arrive at the shuffled ImageNet bank, a concept bank with an order of magnitude more concepts compared to standard ImageNet banks. Compared to standard ImageNet pre-training, our shuffles result in more discriminative representations to train event models from the limited video event examples. For event search, the broad range of concepts enables a closer match between textual queries of events and concept detections in videos. 
Experimentally, we show the benefit of the proposed bank for event detection and event search, with state-of-the-art performance for both tasks on the challenging TRECVID Multimedia Event Detection and Ad-Hoc Video Search benchmarks.<\/jats:p>","DOI":"10.1145\/3377875","type":"journal-article","created":{"date-parts":[[2020,5,25]],"date-time":"2020-05-25T22:07:21Z","timestamp":1590444441000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":25,"title":["Shuffled ImageNet Banks for Video Event Detection and Search"],"prefix":"10.1145","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9275-5942","authenticated-orcid":false,"given":"Pascal","family":"Mettes","sequence":"first","affiliation":[{"name":"University of Amsterdam, Amsterdam, the Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dennis C.","family":"Koelma","sequence":"additional","affiliation":[{"name":"University of Amsterdam, Amsterdam, the Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Cees G. M.","family":"Snoek","sequence":"additional","affiliation":[{"name":"University of Amsterdam, Amsterdam, the Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,5,22]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the TRECVID.","author":"Awad George","year":"2017","unstructured":"George Awad , Asad Butt , Jonathan Fiscus , David Joy , Andrew Delgado , Martial Michel , Alan F. Smeaton , Yvette Graham , Wessel Kraaij , Georges Qu\u00e9not , 2017 . Trecvid 2017: Evaluating ad-hoc and instance video search, events detection, video captioning and hyperlinking . In Proceedings of the TRECVID. George Awad, Asad Butt, Jonathan Fiscus, David Joy, Andrew Delgado, Martial Michel, Alan F. Smeaton, Yvette Graham, Wessel Kraaij, Georges Qu\u00e9not, et al. 2017. 
Trecvid 2017: Evaluating ad-hoc and instance video search, events detection, video captioning and hyperlinking. In Proceedings of the TRECVID."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2578726.2578740"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1961189.1961199"},{"key":"e_1_2_1_4_1","first-page":"1180","article-title":"Bi-level semantic representation analysis for multimedia event detection","volume":"47","author":"Chang Xiaojun","year":"2017","unstructured":"Xiaojun Chang , Zhigang Ma , Yi Yang , Zhiqiang Zeng , and Alexander G. Hauptmann . 2017 . Bi-level semantic representation analysis for multimedia event detection . ToC 47 , 5 (2017), 1180 -- 1197 . Xiaojun Chang, Zhigang Ma, Yi Yang, Zhiqiang Zeng, and Alexander G. Hauptmann. 2017. Bi-level semantic representation analysis for multimedia event detection. ToC 47, 5 (2017), 1180--1197.","journal-title":"ToC"},{"key":"e_1_2_1_5_1","volume-title":"Hauptmann","author":"Chang Xiaojun","year":"2016","unstructured":"Xiaojun Chang , Yi Yang , Guodong Long , Chengqi Zhang , and Alexander G . Hauptmann . 2016 . Dynamic concept composition for zero-example event detection. In Proceedings of the AAAI. Xiaojun Chang, Yi Yang, Guodong Long, Chengqi Zhang, and Alexander G. Hauptmann. 2016. Dynamic concept composition for zero-example event detection. In Proceedings of the AAAI."},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the ICML. 1348--1357","author":"Chang Xiaojun","year":"2015","unstructured":"Xiaojun Chang , Yi Yang , Eric Xing , and Yaoliang Yu . 2015 . Complex event detection using semantic saliency and nearly isotonic SVM . In Proceedings of the ICML. 1348--1357 . Xiaojun Chang, Yi Yang, Eric Xing, and Yaoliang Yu. 2015. Complex event detection using semantic saliency and nearly isotonic SVM. In Proceedings of the ICML. 
1348--1357."},{"key":"e_1_2_1_7_1","volume-title":"Xing","author":"Chang Xiaojun","year":"2016","unstructured":"Xiaojun Chang , Yao-Liang Yu , Yi Yang , and Eric P . Xing . 2016 . They are not equally reliable: Semantic event search using differentiated concept classifiers. In Proceedings of the CVPR. Xiaojun Chang, Yao-Liang Yu, Yi Yang, and Eric P. Xing. 2016. They are not equally reliable: Semantic event search using differentiated concept classifiers. In Proceedings of the CVPR."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2608901"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the CoRR.","author":"Chen Tianqi","year":"2015","unstructured":"Tianqi Chen , Mu Li , Yutian Li , Min Lin , Naiyan Wang , Minjie Wang , Tianjun Xiao , Bing Xu , Chiyuan Zhang , and Zheng Zhang . 2015 . Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems . In Proceedings of the CoRR. Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. In Proceedings of the CoRR."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298878"},{"key":"e_1_2_1_11_1","volume-title":"Snoek","author":"Dong Jianfeng","year":"2016","unstructured":"Jianfeng Dong , Xirong Li , Weiyu Lan , Yujia Huo , and Cees G. M . Snoek . 2016 . Early embedding and late reranking for video captioning. In Proceedings of the MM. Jianfeng Dong, Xirong Li, Weiyu Lan, Yujia Huo, and Cees G. M. Snoek. 2016. Early embedding and late reranking for video captioning. In Proceedings of the MM."},{"key":"e_1_2_1_12_1","volume-title":"Hauptmann","author":"Fan Hehe","year":"2017","unstructured":"Hehe Fan , Xiaojun Chang , De Cheng , Yi Yang , Dong Xu , and Alexander G . Hauptmann . 2017 . Complex event detection by identifying reliable shots from untrimmed videos. 
In Proceedings of the ICCV. Hehe Fan, Xiaojun Chang, De Cheng, Yi Yang, Dong Xu, and Alexander G. Hauptmann. 2017. Complex event detection by identifying reliable shots from untrimmed videos. In Proceedings of the ICCV."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1117\/12.290336"},{"key":"e_1_2_1_14_1","volume-title":"Hauptmann","author":"Gan Chuang","year":"2015","unstructured":"Chuang Gan , Naiyan Wang , Yi Yang , Dit-Yan Yeung , and Alex G . Hauptmann . 2015 . Devnet : A deep event network for multimedia event detection and evidence recounting. In Proceedings of the CVPR. Chuang Gan, Naiyan Wang, Yi Yang, Dit-Yan Yeung, and Alex G. Hauptmann. 2015. Devnet: A deep event network for multimedia event detection and evidence recounting. In Proceedings of the CVPR."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.106"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the BMVC.","author":"Girard Julien","year":"2018","unstructured":"Julien Girard , Youssef Tamaazousti , Herv\u00e9 Le Borgne , and C\u00e9line Hudelot . 2018 . Learning finer-class networks for universal representations . In Proceedings of the BMVC. Julien Girard, Youssef Tamaazousti, Herv\u00e9 Le Borgne, and C\u00e9line Hudelot. 2018. Learning finer-class networks for universal representations. In Proceedings of the BMVC."},{"key":"e_1_2_1_17_1","volume-title":"Snoek","author":"Habibian Amirhossein","year":"2014","unstructured":"Amirhossein Habibian , Thomas Mensink , and Cees G. M . Snoek . 2014 . Composite concept discovery for zero-shot video event detection. In Proceedings of the ICMR. Amirhossein Habibian, Thomas Mensink, and Cees G. M. Snoek. 2014. Composite concept discovery for zero-shot video event detection. In Proceedings of the ICMR."},{"key":"e_1_2_1_18_1","volume-title":"Snoek","author":"Habibian Amirhossein","year":"2014","unstructured":"Amirhossein Habibian , Thomas Mensink , and Cees G. M . Snoek . 2014 . 
Videostory : A new multimedia embedding for few-example recognition and translation of events. In Proceedings of the MM. Amirhossein Habibian, Thomas Mensink, and Cees G. M. Snoek. 2014. Videostory: A new multimedia embedding for few-example recognition and translation of events. In Proceedings of the MM."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2627563"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2014.02.003"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2007.900150"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2967226"},{"key":"e_1_2_1_24_1","volume-title":"Smith","author":"Jaimes Alejandro","year":"2003","unstructured":"Alejandro Jaimes , Belle L. Tseng , and John R . Smith . 2003 . Modal keywords, ontologies, and reasoning for video understanding. In Proceedings of the CIVR. Alejandro Jaimes, Belle L. Tseng, and John R. Smith. 2003. Modal keywords, ontologies, and reasoning for video understanding. In Proceedings of the CIVR."},{"volume-title":"Proceedings of the ICCV.","author":"Jain Mihir","key":"e_1_2_1_25_1","unstructured":"Mihir Jain , Jan C. van Gemert , Thomas Mensink , and Cees G. M. Snoek . 2015. Objects2action: Classifying and localizing actions without any video example . In Proceedings of the ICCV. Mihir Jain, Jan C. van Gemert, Thomas Mensink, and Cees G. M. Snoek. 2015. Objects2action: Classifying and localizing actions without any video example. In Proceedings of the ICCV."},{"key":"e_1_2_1_26_1","volume-title":"Hauptmann","author":"Jiang Lu","year":"2015","unstructured":"Lu Jiang , Shoou- I. Yu , Deyu Meng , Yi Yang , Teruko Mitamura , and Alexander G . Hauptmann . 2015 . Fast and accurate content-based semantic search in 100m internet videos. In Proceedings of the MM. Lu Jiang, Shoou-I. 
Yu, Deyu Meng, Yi Yang, Teruko Mitamura, and Alexander G. Hauptmann. 2015. Fast and accurate content-based semantic search in 100m internet videos. In Proceedings of the MM."},{"key":"e_1_2_1_27_1","unstructured":"Yu-Gang Jiang Chong-Wah Ngo and Jun Yang. 2007. VIREO-374: LSCOM semantic concept detectors using local keypoint features.  Yu-Gang Jiang Chong-Wah Ngo and Jun Yang. 2007. VIREO-374: LSCOM semantic concept detectors using local keypoint features."},{"key":"e_1_2_1_28_1","volume-title":"CU-VIREO374: Fusing Columbia374 and VIREO374 for large scale semantic concept detection","author":"Jiang Yu-Gang","year":"2008","unstructured":"Yu-Gang Jiang , Akira Yanagawa , Shih-Fu Chang , and Chong-Wah Ngo . 2008. CU-VIREO374: Fusing Columbia374 and VIREO374 for large scale semantic concept detection . Columbia University ADVENT ( 2008 ), 223--2008. Yu-Gang Jiang, Akira Yanagawa, Shih-Fu Chang, and Chong-Wah Ngo. 2008. CU-VIREO374: Fusing Columbia374 and VIREO374 for large scale semantic concept detection. Columbia University ADVENT (2008), 223--2008."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.223"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the CVPR.","author":"Lan Zhengzhong","year":"2015","unstructured":"Zhengzhong Lan , Ming Lin , Xuanchong Li , Alex G. Hauptmann , and Bhiksha Raj . 2015 . Beyond gaussian pyramid: Multi-skip feature stacking for action recognition . In Proceedings of the CVPR. Zhengzhong Lan, Ming Lin, Xuanchong Li, Alex G. Hauptmann, and Bhiksha Raj. 2015. Beyond gaussian pyramid: Multi-skip feature stacking for action recognition. 
In Proceedings of the CVPR."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.394"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2967289"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2013.6475038"},{"key":"e_1_2_1_34_1","volume-title":"Hauptmann","author":"Ma Zhigang","year":"2018","unstructured":"Zhigang Ma , Xiaojun Chang , Zhongwen Xu , Nicu Sebe , and Alexander G . Hauptmann . 2018 . Joint attributes and event analysis for multimedia event detection. TNNLS 10 (2018). Zhigang Ma, Xiaojun Chang, Zhongwen Xu, Nicu Sebe, and Alexander G. Hauptmann. 2018. Joint attributes and event analysis for multimedia event detection. TNNLS 10 (2018)."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2659221"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3078971.3079041"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2014.2359771"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2016.2559947"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2011.2168948"},{"key":"e_1_2_1_40_1","volume-title":"Snoek","author":"Mettes Pascal","year":"2016","unstructured":"Pascal Mettes , Dennis C. Koelma , and Cees G. M . Snoek . 2016 . The imagenet shuffle: Reorganized pre-training for video event detection. In Proceedings of the ICMR. Pascal Mettes, Dennis C. Koelma, and Cees G. M. Snoek. 2016. The imagenet shuffle: Reorganized pre-training for video event detection. In Proceedings of the ICMR."},{"key":"e_1_2_1_41_1","volume-title":"Snoek","author":"Mettes Pascal","year":"2017","unstructured":"Pascal Mettes and Cees G. M . Snoek . 2017 . Spatial-aware object embeddings for zero-shot localization and classification of actions. In Proceedings of the ICCV. Pascal Mettes and Cees G. M. Snoek. 2017. Spatial-aware object embeddings for zero-shot localization and classification of actions. 
In Proceedings of the ICCV."},{"volume-title":"Proceedings of the ICMR.","author":"Mettes Pascal","key":"e_1_2_1_42_1","unstructured":"Pascal Mettes , Jan C. van Gemert , Spencer Cappallo , Thomas Mensink , and Cees G. M. Snoek . 2015. Bag-of-fragments: Selecting and encoding video fragments for event detection and recounting . In Proceedings of the ICMR. Pascal Mettes, Jan C. van Gemert, Spencer Cappallo, Thomas Mensink, and Cees G. M. Snoek. 2015. Bag-of-fragments: Selecting and encoding video fragments for event detection and recounting. In Proceedings of the ICMR."},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the NIPS.","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov , Ilya Sutskever , Kai Chen , Greg S. Corrado , and Jeff Dean . 2013 . Distributed representations of words and phrases and their compositionality . In Proceedings of the NIPS. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of the NIPS."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/219717.219748"},{"key":"e_1_2_1_45_1","volume-title":"Snoek","author":"Nagel Markus","year":"2015","unstructured":"Markus Nagel , Thomas Mensink , and Cees G. M . Snoek . 2015 . Event fisher vectors: Robust encoding visual diversity of visual streams. In Proceedings of the BMVC. Markus Nagel, Thomas Mensink, and Cees G. M. Snoek. 2015. Event fisher vectors: Robust encoding visual diversity of visual streams. In Proceedings of the BMVC."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/MMUL.2006.63"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/11788034_15"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.228"},{"key":"e_1_2_1_49_1","volume-title":"Berg","author":"Ordonez Vicente","year":"2013","unstructured":"Vicente Ordonez , Jia Deng , Yejin Choi , Alexander C. 
Berg , and Tamara L . Berg . 2013 . From large scale image categorization to entry-level categories. In Proceedings of the ICCV. Vicente Ordonez, Jia Deng, Yejin Choi, Alexander C. Berg, and Tamara L. Berg. 2013. From large scale image categorization to entry-level categories. In Proceedings of the ICCV."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1016\/0010-0285(76)90013-X"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-013-0636-x"},{"key":"e_1_2_1_54_1","volume-title":"Davis","author":"Singh Bharat","year":"2015","unstructured":"Bharat Singh , Xintong Han , Zhe Wu , Vlad I. Morariu , and Larry S . Davis . 2015 . Selecting relevant web trained concepts for automated event retrieval. In Proceedings of the ICCV. Bharat Singh, Xintong Han, Zhe Wu, Vlad I. Morariu, and Larry S. Davis. 2015. Selecting relevant web trained concepts for automated event retrieval. In Proceedings of the ICCV."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2007.900156"},{"volume-title":"Proceedings of the ACM MM.","author":"Snoek Cees G. M.","key":"e_1_2_1_56_1","unstructured":"Cees G. M. Snoek , Marcel Worring , Jan C. Van Gemert , Jan-Mark Geusebroek , and Arnold W. M. Smeulders . 2006. The challenge problem for automated detection of 101 semantic concepts in multimedia . In Proceedings of the ACM MM. Cees G. M. Snoek, Marcel Worring, Jan C. Van Gemert, Jan-Mark Geusebroek, and Arnold W. M. Smeulders. 2006. The challenge problem for automated detection of 101 semantic concepts in multimedia. 
In Proceedings of the ACM MM."},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2013.6474994"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2012.6248114"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.510"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.13188"},{"key":"e_1_2_1_62_1","volume-title":"Smeulders","author":"Vreeswijk Daan T. J.","year":"2012","unstructured":"Daan T. J. Vreeswijk , Cees G. M. Snoek , Koen E. A. van de Sande , and Arnold W. M . Smeulders . 2012 . All vehicles are cars: Subclass preferences in container concepts. In Proceedings of the ICMR. Daan T. J. Vreeswijk, Cees G. M. Snoek, Koen E. A. van de Sande, and Arnold W. M. Smeulders. 2012. All vehicles are cars: Subclass preferences in container concepts. In Proceedings of the ICMR."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/1291233.1291293"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.441"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00521"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2008.2001382"},{"key":"e_1_2_1_67_1","volume-title":"McDonnell","author":"Wong Sebastien C.","year":"2016","unstructured":"Sebastien C. Wong , Adam Gatt , Victor Stamatescu , and Mark D . McDonnell . 2016 . Understanding data augmentation for classification: When to warp? In Proceedings of the DICTA. Sebastien C. Wong, Adam Gatt, Victor Stamatescu, and Mark D. McDonnell. 2016. Understanding data augmentation for classification: When to warp? In Proceedings of the DICTA."},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.634"},{"key":"e_1_2_1_69_1","volume-title":"Hauptmann","author":"Xu Zhongwen","year":"2015","unstructured":"Zhongwen Xu , Yi Yang , and Alex G . 
Hauptmann . 2015 . A discriminative CNN video representation for event detection. In Proceedings of the CVPR. Zhongwen Xu, Yi Yang, and Alex G. Hauptmann. 2015. A discriminative CNN video representation for event detection. In Proceedings of the CVPR."},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33712-3_52"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/2733373.2806221"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2016.03.102"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2016.2614136"},{"key":"e_1_2_1_75_1","volume-title":"Hauptmann","author":"Yu I.","year":"2015","unstructured":"Shoou- I. Yu , Lu Jiang , Zhongwen Xu , Yi Yang , and Alexander G . Hauptmann . 2015 . Content-based video search over 1 million videos with 1 core in 1 second. In Proceedings of the ICMR. Shoou-I. Yu, Lu Jiang, Zhongwen Xu, Yi Yang, and Alexander G. Hauptmann. 2015. Content-based video search over 1 million videos with 1 core in 1 second. 
In Proceedings of the ICMR."},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.5244\/C.29.60"},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2015.2449660"},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.147"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-017-1033-7"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_46"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3377875","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3377875","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:38:52Z","timestamp":1750199932000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3377875"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,5,22]]},"references-count":79,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2020,5,31]]}},"alternative-id":["10.1145\/3377875"],"URL":"https:\/\/doi.org\/10.1145\/3377875","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2020,5,22]]},"assertion":[{"value":"2019-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-05-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}