{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T22:18:37Z","timestamp":1775341117281,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":48,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,7,25]],"date-time":"2019-07-25T00:00:00Z","timestamp":1564012800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,7,25]]},"DOI":"10.1145\/3292500.3330653","type":"proceedings-article","created":{"date-parts":[[2019,7,26]],"date-time":"2019-07-26T13:17:26Z","timestamp":1564147046000},"page":"2394-2402","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Large-Scale Training Framework for Video Annotation"],"prefix":"10.1145","author":[{"given":"Seong Jae","family":"Hwang","sequence":"first","affiliation":[{"name":"University of Wisconsin-Madison, Madison, WI, USA"}]},{"given":"Joonseok","family":"Lee","sequence":"additional","affiliation":[{"name":"Google Research, Mountain View, CA, USA"}]},{"given":"Balakrishnan","family":"Varadarajan","sequence":"additional","affiliation":[{"name":"Google Research, Mountain View, CA, USA"}]},{"given":"Ariel","family":"Gordon","sequence":"additional","affiliation":[{"name":"Google Research, Mountain View, CA, USA"}]},{"given":"Zheng","family":"Xu","sequence":"additional","affiliation":[{"name":"Google Research, Mountain View, CA, USA"}]},{"given":"Apostol (Paul)","family":"Natsev","sequence":"additional","affiliation":[{"name":"Google Research, Mountain View, CA, USA"}]}],"member":"320","published-online":{"date-parts":[[2019,7,25]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Youtube-8M: A Large-scale Video Classification Benchmark. 
arXiv preprint arXiv:1609.08675","author":"Abu-El-Haija Sami","year":"2016","unstructured":"Sami Abu-El-Haija , Nisarg Kothari , Joonseok Lee , Paul Natsev , George Toderici , Balakrishnan Varadarajan , and Sudheendra Vijayanarasimhan . 2016. Youtube-8M: A Large-scale Video Classification Benchmark. arXiv preprint arXiv:1609.08675 ( 2016 ). Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Paul Natsev, George Toderici, Balakrishnan Varadarajan, and Sudheendra Vijayanarasimhan. 2016. Youtube-8M: A Large-scale Video Classification Benchmark. arXiv preprint arXiv:1609.08675 (2016)."},{"key":"e_1_3_2_2_2_1","volume-title":"Proc. of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding.","author":"Aliev Vladimir","year":"2018","unstructured":"Vladimir Aliev , Pavel Ostyakov , Roman Suvorov , Gleb Sterkin , Elizaveta Logacheva , Oleg Khomenko , and Sergey Nikolenko . 2018 . Label Denoising with Large Ensembles of Heterogeneous Neural Networks . In Proc. of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding. Vladimir Aliev, Pavel Ostyakov, Roman Suvorov, Gleb Sterkin, Elizaveta Logacheva, Oleg Khomenko, and Sergey Nikolenko. 2018. Label Denoising with Large Ensembles of Heterogeneous Neural Networks. In Proc. of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding."},{"key":"e_1_3_2_2_3_1","volume-title":"Proc. of the CVPR Workshop on YouTube-8M Large-Scale Video Understanding.","author":"Chen Shaoxiang","year":"2017","unstructured":"Shaoxiang Chen , Xi Wang , Yongyi Tang , Xinpeng Chen , Zuxuan Wu , and Yu-Gang Jiang . 2017 . Aggregating Frame-level Features for Large-Scale Video Classification . In Proc. of the CVPR Workshop on YouTube-8M Large-Scale Video Understanding. Shaoxiang Chen, Xi Wang, Yongyi Tang, Xinpeng Chen, Zuxuan Wu, and Yu-Gang Jiang. 2017. Aggregating Frame-level Features for Large-Scale Video Classification. In Proc. 
of the CVPR Workshop on YouTube-8M Large-Scale Video Understanding."},{"key":"e_1_3_2_2_4_1","volume-title":"Proc. of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding.","author":"Cho Choongyeun","year":"2018","unstructured":"Choongyeun Cho , Benjamin Antin , Sanchit Arora , Shwan Ashrafi , Peilin Duan , Dang The Huynh , Lee James , Hang Tuan Nguyen , Moji Solgi , and Cuong Van Than . 2018 . Axon AI's Solution to the 2nd YouTube-8M Video Understanding Challenge . In Proc. of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding. Choongyeun Cho, Benjamin Antin, Sanchit Arora, Shwan Ashrafi, Peilin Duan, Dang The Huynh, Lee James, Hang Tuan Nguyen, Moji Solgi, and Cuong Van Than. 2018. Axon AI's Solution to the 2nd YouTube-8M Video Understanding Challenge. In Proc. of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding."},{"key":"e_1_3_2_2_5_1","unstructured":"Cheng-Tao Chu Sang K Kim Yi-An Lin YuanYuan Yu Gary Bradski Kunle Olukotun and Andrew Y Ng. 2007. Map-Reduce for machine learning on multicore. In Advances in neural information processing systems (NIPS).   Cheng-Tao Chu Sang K Kim Yi-An Lin YuanYuan Yu Gary Bradski Kunle Olukotun and Andrew Y Ng. 2007. Map-Reduce for machine learning on multicore. In Advances in neural information processing systems (NIPS)."},{"key":"e_1_3_2_2_6_1","volume-title":"Large scale distributed deep networks.","author":"Dean Jeffrey","year":"2012","unstructured":"Jeffrey Dean , Greg Corrado , Rajat Monga , Kai Chen , Matthieu Devin , Mark Mao , Andrew Senior , Paul Tucker , Ke Yang , Quoc V Le , et al. 2012 . Large scale distributed deep networks. In Advances in neural information processing systems (NIPS) . Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, Paul Tucker, Ke Yang, Quoc V Le, et al. 2012. Large scale distributed deep networks. 
In Advances in neural information processing systems (NIPS)."},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2558148"},{"key":"e_1_3_2_2_9_1","volume-title":"Proc. of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding.","author":"Garg Shivam","year":"2018","unstructured":"Shivam Garg . 2018 . Learning Video Features for Multi-Label Classification . In Proc. of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding. Shivam Garg. 2018. Learning Video Features for Multi-Label Classification. In Proc. of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding."},{"key":"e_1_3_2_2_10_1","volume-title":"large minibatch SGD: training imagenet in 1 hour. arXiv preprint arXiv:1706.02677","author":"Goyal Priya","year":"2017","unstructured":"Priya Goyal , Piotr Doll\u00e1r , Ross Girshick , Pieter Noordhuis , Lukasz Wesolowski , Aapo Kyrola , Andrew Tulloch , Yangqing Jia , and Kaiming He. 2017. Accurate , large minibatch SGD: training imagenet in 1 hour. arXiv preprint arXiv:1706.02677 ( 2017 ). Priya Goyal, Piotr Doll\u00e1r, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. 2017. Accurate, large minibatch SGD: training imagenet in 1 hour. 
arXiv preprint arXiv:1706.02677 (2017)."},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952132"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.243"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0925-2312(01)00700-7"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1994.6.2.181"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.223"},{"key":"e_1_3_2_2_18_1","volume-title":"Proc. of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding.","author":"Kim Eun-Sol","year":"2018","unstructured":"Eun-Sol Kim , Jongseok Kim , Kyoung-Woon On , Yu-Jung Heo , Seong-Ho Choi , Hyun-Dong Lee , and Byoung-Tak Zhang . 2018 . Temporal Attention Mechanism with Conditional Inference for Large-Scale Multi-Label Video Classification . In Proc. of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding. Eun-Sol Kim, Jongseok Kim, Kyoung-Woon On, Yu-Jung Heo, Seong-Ho Choi, Hyun-Dong Lee, and Byoung-Tak Zhang. 2018. Temporal Attention Mechanism with Conditional Inference for Large-Scale Multi-Label Video Classification. In Proc. of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding."},{"key":"e_1_3_2_2_19_1","volume-title":"Proc. of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding.","author":"Kmiec Sebastian","year":"2018","unstructured":"Sebastian Kmiec and Juhan Bae . 2018 . Learnable Pooling Methods for Video Classification . In Proc. of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding. Sebastian Kmiec and Juhan Bae. 2018. Learnable Pooling Methods for Video Classification. In Proc. 
of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding."},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0981-7"},{"key":"e_1_3_2_2_21_1","unstructured":"Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems.   Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems."},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2008.4587756"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219856"},{"key":"e_1_3_2_2_24_1","volume-title":"Proc. of the European Conference on Computer Vision (ECCV).","author":"Lee Joonseok","year":"2018","unstructured":"Joonseok Lee , Apostol Natsev , Walter Reade , Rahul Sukthankar , and George Toderici . 2018 b. The 2nd YouTube-8M Large-Scale Video Understanding Challenge . In Proc. of the European Conference on Computer Vision (ECCV). Joonseok Lee, Apostol Natsev, Walter Reade, Rahul Sukthankar, and George Toderici. 2018b. The 2nd YouTube-8M Large-Scale Video Understanding Challenge. In Proc. of the European Conference on Computer Vision (ECCV)."},{"key":"e_1_3_2_2_25_1","volume-title":"Proc. of the CVPR'17 Workshop on YouTube-8M Large-Scale Video Understanding.","author":"Li Fu","year":"2017","unstructured":"Fu Li , Chuang Gan , Xiao Liu , Yunlong Bian , Xiang Long , Yandong Li , Zhichao Li , Jie Zhou , and Shilei Wen . 2017 . Temporal modeling approaches for large-scale Youtube-8M video understanding . In Proc. of the CVPR'17 Workshop on YouTube-8M Large-Scale Video Understanding. Fu Li, Chuang Gan, Xiao Liu, Yunlong Bian, Xiang Long, Yandong Li, Zhichao Li, Jie Zhou, and Shilei Wen. 2017. Temporal modeling approaches for large-scale Youtube-8M video understanding. In Proc. 
of the CVPR'17 Workshop on YouTube-8M Large-Scale Video Understanding."},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.469"},{"key":"e_1_3_2_2_27_1","volume-title":"Proc. of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding.","author":"Lin Rongcheng","year":"2018","unstructured":"Rongcheng Lin , Jing Xiao , and Jianping Fan . 2018 . NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification . In Proc. of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding. Rongcheng Lin, Jing Xiao, and Jianping Fan. 2018. NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification. In Proc. of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding."},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3041021.3051099"},{"key":"e_1_3_2_2_29_1","volume-title":"Proc. of the CVPR Workshop on YouTube-8M Large-Scale Video Understanding.","author":"Miech Antoine","year":"2017","unstructured":"Antoine Miech , Ivan Laptev , and Josef Sivic . 2017 . Learnable Pooling with Context Gating for Video Classification . In Proc. of the CVPR Workshop on YouTube-8M Large-Scale Video Understanding. Antoine Miech, Ivan Laptev, and Josef Sivic. 2017. Learnable Pooling with Context Gating for Video Classification. In Proc. of the CVPR Workshop on YouTube-8M Large-Scale Video Understanding."},{"key":"e_1_3_2_2_30_1","volume-title":"Cyclades: Conflict-free asynchronous machine learning. In Advances in Neural Information Processing Systems. 2568--2576.","author":"Pan Xinghao","year":"2016","unstructured":"Xinghao Pan , Maximilian Lam , Stephen Tu , Dimitris Papailiopoulos , Ce Zhang , Michael I Jordan , Kannan Ramchandran , and Christopher R\u00e9 . 2016 . Cyclades: Conflict-free asynchronous machine learning. In Advances in Neural Information Processing Systems. 2568--2576. 
Xinghao Pan, Maximilian Lam, Stephen Tu, Dimitris Papailiopoulos, Ce Zhang, Michael I Jordan, Kannan Ramchandran, and Christopher R\u00e9. 2016. Cyclades: Conflict-free asynchronous machine learning. In Advances in Neural Information Processing Systems. 2568--2576."},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.590"},{"key":"e_1_3_2_2_32_1","unstructured":"Shaoqing Ren Kaiming He Ross Girshick and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (NIPS).   Shaoqing Ren Kaiming He Ross Girshick and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (NIPS)."},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICNN.1993.298623"},{"key":"e_1_3_2_2_34_1","volume-title":"Increase the Batch Size. arXiv preprint arXiv:1711.00489","author":"Smith Samuel L","year":"2017","unstructured":"Samuel L Smith , Pieter-Jan Kindermans , and Quoc V Le. 2017. Don't Decay the Learning Rate , Increase the Batch Size. arXiv preprint arXiv:1711.00489 ( 2017 ). Samuel L Smith, Pieter-Jan Kindermans, and Quoc V Le. 2017. Don't Decay the Learning Rate, Increase the Batch Size. arXiv preprint arXiv:1711.00489 (2017)."},{"key":"e_1_3_2_2_35_1","volume-title":"Proc. of the AAAI Conference on Artificial Intelligence.","author":"Szegedy Christian","year":"2017","unstructured":"Christian Szegedy , Sergey Ioffe , Vincent Vanhoucke , and Alexander A Alemi . 2017 . Inception-v4, Inception-ResNet and the impact of residual connections on learning .. In Proc. of the AAAI Conference on Artificial Intelligence. Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi. 2017. Inception-v4, Inception-ResNet and the impact of residual connections on learning.. In Proc. 
of the AAAI Conference on Artificial Intelligence."},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_2_2_37_1","volume-title":"Proc. of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding.","author":"Tang Yongyi","year":"2018","unstructured":"Yongyi Tang , Xing Zhang , Jingwen Wang , Shaoxiang Chen , Lin Ma , and Yu-Gang Jiang . 2018 . Non-local NetVLAD Encoding for Video Classification . In Proc. of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding. Yongyi Tang, Xing Zhang, Jingwen Wang, Shaoxiang Chen, Lin Ma, and Yu-Gang Jiang. 2018. Non-local NetVLAD Encoding for Video Classification. In Proc. of the 2nd Workshop on YouTube-8M Large-Scale Video Understanding."},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2712608"},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.5244\/C.23.124"},{"key":"e_1_3_2_2_40_1","volume-title":"Proc. of the CVPR Workshop on YouTube-8M Large-Scale Video Understanding.","author":"Wang He-Da","year":"2017","unstructured":"He-Da Wang , Teng Zhang , and Ji Wu . 2017 . The Monkeytyping Solution to the Youtube-8M Video Understanding Challenge . In Proc. of the CVPR Workshop on YouTube-8M Large-Scale Video Understanding. He-Da Wang, Teng Zhang, and Ji Wu. 2017. The Monkeytyping Solution to the Youtube-8M Video Understanding Challenge. In Proc. of the CVPR Workshop on YouTube-8M Large-Scale Video Understanding."},{"key":"e_1_3_2_2_41_1","volume-title":"Large batch training of convolutional networks. arXiv preprint arXiv:1708.03888","author":"You Yang","year":"2017","unstructured":"Yang You , Igor Gitman , and Boris Ginsburg . 2017. Large batch training of convolutional networks. arXiv preprint arXiv:1708.03888 ( 2017 ). Yang You, Igor Gitman, and Boris Ginsburg. 2017. Large batch training of convolutional networks. 
arXiv preprint arXiv:1708.03888 (2017)."},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299101"},{"key":"e_1_3_2_2_43_1","volume-title":"Proc. of the International Conference on Machine Learning (ICML).","author":"Zhang Ruiliang","year":"2014","unstructured":"Ruiliang Zhang and James Kwok . 2014 . Asynchronous distributed ADMM for consensus optimization . In Proc. of the International Conference on Machine Learning (ICML). Ruiliang Zhang and James Kwok. 2014. Asynchronous distributed ADMM for consensus optimization. In Proc. of the International Conference on Machine Learning (ICML)."},{"key":"e_1_3_2_2_44_1","unstructured":"Sixin Zhang Anna E Choromanska and Yann LeCun. 2015. Deep learning with elastic averaging SGD. In Advances in Neural Information Processing Systems.   Sixin Zhang Anna E Choromanska and Yann LeCun. 2015. Deep learning with elastic averaging SGD. In Advances in Neural Information Processing Systems."},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.660"},{"key":"e_1_3_2_2_46_1","volume-title":"Places: A 10 million image database for scene recognition","author":"Zhou Bolei","year":"2017","unstructured":"Bolei Zhou , Agata Lapedriza , Aditya Khosla , Aude Oliva , and Antonio Torralba . 2017 . Places: A 10 million image database for scene recognition . IEEE Transactions on Pattern Analysis and Machine Intelligence ( 2017). Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. 2017. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017)."},{"key":"e_1_3_2_2_47_1","volume-title":"Proc. of the CVPR'17 Workshop on YouTube-8M Large-Scale Video Understanding.","author":"Zhu Linchao","year":"2017","unstructured":"Linchao Zhu , Yanbin Liu , and Yi Yang . 2017 . UTS submission to Google YouTube-8M Challenge 2017 . In Proc. of the CVPR'17 Workshop on YouTube-8M Large-Scale Video Understanding. 
Linchao Zhu, Yanbin Liu, and Yi Yang. 2017. UTS submission to Google YouTube-8M Challenge 2017. In Proc. of the CVPR'17 Workshop on YouTube-8M Large-Scale Video Understanding."},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.540"}],"event":{"name":"KDD '19: The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","location":"Anchorage AK USA","acronym":"KDD '19","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data"]},"container-title":["Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3292500.3330653","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3292500.3330653","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:02:09Z","timestamp":1750208529000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3292500.3330653"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,7,25]]},"references-count":48,"alternative-id":["10.1145\/3292500.3330653","10.1145\/3292500"],"URL":"https:\/\/doi.org\/10.1145\/3292500.3330653","relation":{},"subject":[],"published":{"date-parts":[[2019,7,25]]},"assertion":[{"value":"2019-07-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}