{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T16:17:46Z","timestamp":1772727466410,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":49,"publisher":"ACM","license":[{"start":{"date-parts":[[2018,10,15]],"date-time":"2018-10-15T00:00:00Z","timestamp":1539561600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2018,10,15]]},"DOI":"10.1145\/3240508.3240713","type":"proceedings-article","created":{"date-parts":[[2018,10,18]],"date-time":"2018-10-18T17:52:08Z","timestamp":1539885128000},"page":"1829-1837","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["Deep Adaptive Temporal Pooling for Activity Recognition"],"prefix":"10.1145","author":[{"given":"Sibo","family":"Song","sequence":"first","affiliation":[{"name":"Singapore University of Technology and Design, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ngai-Man","family":"Cheung","sequence":"additional","affiliation":[{"name":"Singapore University of Technology and Design, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vijay","family":"Chandrasekhar","sequence":"additional","affiliation":[{"name":"Institute for Infocomm Research, Singapore, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bappaditya","family":"Mandal","sequence":"additional","affiliation":[{"name":"Keele University, Keele, Staffordshire, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2018,10,15]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Jimmy Ba and Rich Caruana. 
2014. Do deep nets really need to be deep? In Advances in neural information processing systems. 2654--2662."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-25446-8_4"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.502"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00234"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/11744047_33"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Ali Diba Vivek Sharma and Luc Van Gool. 2017. Deep temporal linear encoding networks. In Computer Vision and Pattern Recognition.","DOI":"10.1109\/CVPR.2017.168"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298878"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.341"},{"key":"e_1_3_2_1_10_1","unstructured":"Christoph Feichtenhofer Axel Pinz and Richard Wildes. 2016a. Spatiotemporal Residual Networks for Video Action Recognition. In Advances in Neural Information Processing Systems. 3468--3476."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.787"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.213"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Rohit Girdhar Deva Ramanan Abhinav Gupta Josef Sivic and Bryan Russell. 2017. 
ActionVLAD: Learning spatio-temporal aggregation for action classification. arXiv preprint arXiv:1704.02895 (2017).","DOI":"10.1109\/CVPR.2017.337"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.169"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00248"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_17_1","unstructured":"Geoffrey Hinton Oriol Vinyals and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)."},{"key":"e_1_3_2_1_18_1","unstructured":"Max Jaderberg Karen Simonyan Andrew Zisserman et al. 2015. Spatial transformer networks. In Advances in Neural Information Processing Systems. 2017--2025."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Amlan Kar Nishant Rai Karan Sikka and Gaurav Sharma. 2016. AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos. arXiv preprint arXiv:1611.08240 (2016).","DOI":"10.1109\/CVPR.2017.604"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.223"},{"key":"e_1_3_2_1_21_1","unstructured":"Will Kay Joao Carreira Karen Simonyan Brian Zhang Chloe Hillier Sudheendra Vijayanarasimhan Fabio Viola Tim Green Trevor Back Paul Natsev et al. 2017. The kinetics human action video dataset. arXiv preprint arXiv:1705.06950 (2017)."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126543"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2008.4587756"},{"key":"e_1_3_2_1_24_1","unstructured":"Multimedia-Laboratory-CUHK. 2016. TSN Pretrained Models on Kinetics Dataset. http:\/\/yjxiong.me\/others\/kinetics_action\/. (2016)."},{"key":"e_1_3_2_1_25_1","unstructured":"Wenjie Pei Tadas Baltru\u0161aitis David MJ Tax and Louis-Philippe Morency. 2017. Temporal Attention-Gated Model for Robust Sequence Classification. (2017)."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.590"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-013-0636-x"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1291233.1291311"},{"key":"e_1_3_2_1_29_1","unstructured":"
Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos. In Advances in neural information processing systems. 568--576."},{"key":"e_1_3_2_1_30_1","unstructured":"Khurram Soomro Amir Roshan Zamir and Mubarak Shah. 2012. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402 (2012)."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.510"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00675"},{"key":"e_1_3_2_1_33_1","unstructured":"Gul Varol Ivan Laptev and Cordelia Schmid. 2017. Long-term temporal convolutions for action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017)."},{"key":"e_1_3_2_1_34_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 6000--6010."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.515"},{"key":"e_1_3_2_1_36_1","volume-title":"International journal of computer vision","author":"Wang Heng","year":"2013"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.441"},{"key":"e_1_3_2_1_38_1","unstructured":"Limin Wang Wei Li Wen Li and Luc Van Gool. 2017a. 
Appearance-and-Relation Networks for Video Classification. arXiv preprint arXiv:1711.09125 (2017)."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299059"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.678"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.291"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.226"},{"key":"e_1_3_2_1_44_1","unstructured":"Yilin Wang Suhang Wang Jiliang Tang Neil O'Hare Yi Chang and Baoxin Li. 2016b. Hierarchical attention network for action recognition in videos. arXiv preprint arXiv:1607.06416 (2016)."},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.512"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.5555\/1771530.1771554"},{"key":"e_1_3_2_1_47_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . 
318--327","author":"Zhou Yizhou","year":"2017"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00054"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.219"}],"event":{"name":"MM '18: ACM Multimedia Conference","location":"Seoul Republic of Korea","acronym":"MM '18","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 26th ACM international conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3240508.3240713","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3240508.3240713","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:43:32Z","timestamp":1750207412000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3240508.3240713"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,10,15]]},"references-count":49,"alternative-id":["10.1145\/3240508.3240713","10.1145\/3240508"],"URL":"https:\/\/doi.org\/10.1145\/3240508.3240713","relation":{},"subject":[],"published":{"date-parts":[[2018,10,15]]},"assertion":[{"value":"2018-10-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}