{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:35:27Z","timestamp":1750221327658,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":41,"publisher":"ACM","license":[{"start":{"date-parts":[[2017,10,19]],"date-time":"2017-10-19T00:00:00Z","timestamp":1508371200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"STW","award":["Story project"],"award-info":[{"award-number":["Story project"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2017,10,19]]},"DOI":"10.1145\/3123266.3123437","type":"proceedings-article","created":{"date-parts":[[2017,10,20]],"date-time":"2017-10-20T13:04:26Z","timestamp":1508504666000},"page":"28-36","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Future-Supervised Retrieval of Unseen Queries for Live Video"],"prefix":"10.1145","author":[{"given":"Spencer","family":"Cappallo","sequence":"first","affiliation":[{"name":"University of Amsterdam, Amsterdam, Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Cees G.M.","family":"Snoek","sequence":"additional","affiliation":[{"name":"University of Amsterdam, Amsterdam, Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2017,10,19]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Activitynet: A large-scale video benchmark for human activity understanding CVPR.","author":"Heilbron Fabian Caba","year":"2015","unstructured":"Fabian Caba Heilbron , Victor Escorcia , Bernard Ghanem , and Juan Carlos Niebles . 2015 . Activitynet: A large-scale video benchmark for human activity understanding CVPR. Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem, and Juan Carlos Niebles. 2015. Activitynet: A large-scale video benchmark for human activity understanding CVPR."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2733373.2806335"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"crossref","unstructured":"Spencer Cappallo Thomas Mensink and Cees GM Snoek. 2016. Video Stream Retrieval of Unseen Queries using Semantic Memory BMVC.  Spencer Cappallo Thomas Mensink and Cees GM Snoek. 2016. Video Stream Retrieval of Unseen Queries using Semantic Memory BMVC.","DOI":"10.5244\/C.30.143"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2578726.2578729"},{"key":"e_1_3_2_1_5_1","unstructured":"Franccois Chollet. 2015. Keras. https:\/\/github.com\/fchollet\/keras. (2015).  Franccois Chollet. 2015. Keras. https:\/\/github.com\/fchollet\/keras. (2015)."},{"key":"e_1_3_2_1_6_1","volume-title":"Building A Large Concept Bank for Representing Events in Video. arXiv:1403.7591","author":"Cui Yin","year":"2014","unstructured":"Yin Cui , Dong Liu , Jiawei Chen , and Shih-Fu Chang . 2014. Building A Large Concept Bank for Representing Events in Video. arXiv:1403.7591 ( 2014 ). Yin Cui, Dong Liu, Jiawei Chen, and Shih-Fu Chang. 2014. Building A Large Concept Bank for Representing Events in Video. arXiv:1403.7591 (2014)."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Roeland De Geest Efstratios Gavves Amir Ghodrati Zhenyang Li Cees Snoek and Tinne Tuytelaars. 2016. Online action detection. In ECCV.  Roeland De Geest Efstratios Gavves Amir Ghodrati Zhenyang Li Cees Snoek and Tinne Tuytelaars. 2016. Online action detection. In ECCV.","DOI":"10.1007\/978-3-319-46454-1_17"},{"key":"e_1_3_2_1_8_1","volume-title":"Imagenet: A large-scale hierarchical image database CVPR.","author":"Deng Jia","year":"2009","unstructured":"Jia Deng , Wei Dong , Richard Socher , Li-Jia Li , Kai Li , and Li Fei-Fei . 2009 . Imagenet: A large-scale hierarchical image database CVPR. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database CVPR."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2578726.2578746"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.521"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Mihir Jain Jan C. van Gemert and Cees G. M. Snoek. 2015 a. What do 15 000 object categories tell us about classifying and localizing actions? CVPR.  Mihir Jain Jan C. van Gemert and Cees G. M. Snoek. 2015 a. What do 15 000 object categories tell us about classifying and localizing actions? CVPR.","DOI":"10.1109\/CVPR.2015.7298599"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Lu Jiang Deyu Meng Qian Zhao Shiguang Shan and Alexander G Hauptmann. 2015. Self-paced curriculum learning. In AAAI.   Lu Jiang Deyu Meng Qian Zhao Shiguang Shan and Alexander G Hauptmann. 2015. Self-paced curriculum learning. In AAAI.","DOI":"10.1609\/aaai.v29i1.9608"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2578726.2578764"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2733373.2806237"},{"key":"e_1_3_2_1_16_1","volume-title":"Columbia-UCF TRECVID2010 Multimedia Event Detection: Combining Multiple Modalities, Contextual Concepts, and Temporal Matching NIST TRECVID Workshop.","author":"Jiang Y.-G.","year":"2010","unstructured":"Y.-G. Jiang , X. Zeng , G. Ye , S. Bhattacharya , D. Ellis , M. Shah , and S.-F. Chang . 2010 . Columbia-UCF TRECVID2010 Multimedia Event Detection: Combining Multiple Modalities, Contextual Concepts, and Temporal Matching NIST TRECVID Workshop. Y.-G. Jiang, X. Zeng, G. Ye, S. Bhattacharya, D. Ellis, M. Shah, and S.-F. Chang. 2010. Columbia-UCF TRECVID2010 Multimedia Event Detection: Combining Multiple Modalities, Contextual Concepts, and Temporal Matching NIST TRECVID Workshop."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126472"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2671188.2749402"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2911996.2912036"},{"key":"e_1_3_2_1_20_1","unstructured":"Tomas Mikolov Ilya Sutskever Kai Chen Greg S Corrado and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality NIPS.   Tomas Mikolov Ilya Sutskever Kai Chen Greg S Corrado and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality NIPS."},{"key":"e_1_3_2_1_21_1","unstructured":"Mohammad Norouzi Tomas Mikolov Samy Bengio Yoram Singer Jonathon Shlens Andrea Frome Greg S Corrado and Jeffrey Dean. 2014. Zero-Shot Learning by Convex Combination of Semantic Embeddings ICLR.  Mohammad Norouzi Tomas Mikolov Samy Bengio Yoram Singer Jonathon Shlens Andrea Frome Greg S Corrado and Jeffrey Dean. 2014. Zero-Shot Learning by Convex Combination of Semantic Embeddings ICLR."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.228"},{"key":"e_1_3_2_1_23_1","unstructured":"Paul Over Jon Fiscus Greg Sanders David Joy Martial Michel George Awad Alan Smeaton Wessel Kraaij and Georges Qu\u00e9not. 2014. Trecvid 2014--an overview of the goals tasks data evaluation mechanisms and metrics TRECVID.  Paul Over Jon Fiscus Greg Sanders David Joy Martial Michel George Awad Alan Smeaton Wessel Kraaij and Georges Qu\u00e9not. 2014. Trecvid 2014--an overview of the goals tasks data evaluation mechanisms and metrics TRECVID."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10578-9_12"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.318"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Mikel D Rodriguez Javed Ahmed and Mubarak Shah. 2008. Action mach a spatio-temporal maximum average correlation height filter for action recognition. In CVPR.  Mikel D Rodriguez Javed Ahmed and Mubarak Shah. 2008. Action mach a spatio-temporal maximum average correlation height filter for action recognition. In CVPR.","DOI":"10.1109\/CVPR.2008.4587727"},{"key":"e_1_3_2_1_27_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman . 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 ( 2014 ). Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"crossref","unstructured":"K. Soomro H. Idrees and M. Shah. 2016. Predicting the where and what of actors and actions through online action localization CVPR.  K. Soomro H. Idrees and M. Shah. 2016. Predicting the where and what of actors and actions through online action localization CVPR.","DOI":"10.1109\/CVPR.2016.290"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Christian Szegedy Wei Liu Yangqing Jia Pierre Sermanet Scott Reed Dragomir Anguelov Dumitru Erhan Vincent Vanhoucke and Andrew Rabinovich. 2015. Going deeper with convolutions. In CVPR.  Christian Szegedy Wei Liu Yangqing Jia Pierre Sermanet Scott Reed Dragomir Anguelov Dumitru Erhan Vincent Vanhoucke and Andrew Rabinovich. 2015. Going deeper with convolutions. In CVPR.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2812802"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"crossref","unstructured":"Carl Vondrick Hamed Pirsiavash and Antonio Torralba. 2016. Anticipating visual representations from unlabeled video CVPR.  Carl Vondrick Hamed Pirsiavash and Antonio Torralba. 2016. Anticipating visual representations from unlabeled video CVPR.","DOI":"10.1109\/CVPR.2016.18"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Jacob Walker Carl Doersch Abhinav Gupta and Martial Hebert. 2016. An uncertain future: Forecasting from static images using variational autoencoders ECCV.  Jacob Walker Carl Doersch Abhinav Gupta and Martial Hebert. 2016. An uncertain future: Forecasting from static images using variational autoencoders ECCV.","DOI":"10.1007\/978-3-319-46478-7_51"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.281"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.5591\/978-1-57735-516-8\/IJCAI11-460"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.341"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964328"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"crossref","unstructured":"Xun Xu Timothy Hospedales and Shaogang Gong. 2015 a. Semantic embedding space for zero-shot action recognition ICIP.  Xun Xu Timothy Hospedales and Shaogang Gong. 2015 a. Semantic embedding space for zero-shot action recognition ICIP.","DOI":"10.1109\/ICIP.2015.7350760"},{"key":"e_1_3_2_1_38_1","unstructured":"Zhongwen Xu Yi Yang and Alexander G Hauptmann. 2015 b. A discriminative CNN video representation for event detection CVPR.  Zhongwen Xu Yi Yang and Alexander G Hauptmann. 2015 b. A discriminative CNN video representation for event detection CVPR."},{"key":"e_1_3_2_1_39_1","unstructured":"Tianfan Xue Jiajun Wu Katherine Bouman and Bill Freeman. 2016. Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks NIPS.  Tianfan Xue Jiajun Wu Katherine Bouman and Bill Freeman. 2016. Visual dynamics: Probabilistic future frame synthesis via cross convolutional networks NIPS."},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2733373.2806221"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"crossref","unstructured":"Joe Yue-Hei Ng Matthew Hausknecht Sudheendra Vijayanarasimhan Oriol Vinyals Rajat Monga and George Toderici. 2015. Beyond short snippets: Deep networks for video classification CVPR.  Joe Yue-Hei Ng Matthew Hausknecht Sudheendra Vijayanarasimhan Oriol Vinyals Rajat Monga and George Toderici. 2015. Beyond short snippets: Deep networks for video classification CVPR.","DOI":"10.1109\/CVPR.2015.7299101"}],"event":{"name":"MM '17: ACM Multimedia Conference","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Mountain View California USA","acronym":"MM '17"},"container-title":["Proceedings of the 25th ACM international conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3123266.3123437","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3123266.3123437","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:14:04Z","timestamp":1750212844000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3123266.3123437"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,10,19]]},"references-count":41,"alternative-id":["10.1145\/3123266.3123437","10.1145\/3123266"],"URL":"https:\/\/doi.org\/10.1145\/3123266.3123437","relation":{},"subject":[],"published":{"date-parts":[[2017,10,19]]},"assertion":[{"value":"2017-10-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}