{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T16:48:46Z","timestamp":1777654126024,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":30,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,12,1]],"date-time":"2021-12-01T00:00:00Z","timestamp":1638316800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,12]]},"DOI":"10.1145\/3469877.3490580","type":"proceedings-article","created":{"date-parts":[[2022,1,10]],"date-time":"2022-01-10T18:24:29Z","timestamp":1641839069000},"page":"1-5","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels"],"prefix":"10.1145","author":[{"given":"Mohit","family":"Sharma","sequence":"first","affiliation":[{"name":"IIIT Delhi, IN"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Raj Aaryaman","family":"Patra","sequence":"additional","affiliation":[{"name":"National Institute Of Technology,Rourkela, IN"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Harshal","family":"Desai","sequence":"additional","affiliation":[{"name":"National Institute of Technology Jamshedpur, IN"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shruti","family":"Vyas","sequence":"additional","affiliation":[{"name":"University of Central Florida, US"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yogesh","family":"Rawat","sequence":"additional","affiliation":[{"name":"University of Central Florida, US"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rajiv Ratn","family":"Shah","sequence":"additional","affiliation":[{"name":"IIIT Delhi, IN"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,1,10]]},"reference":[{"key":"e_1_3_2_2_1_1","unstructured":"Sami Abu-El-Haija Nisarg Kothari Joonseok Lee Paul Natsev George Toderici Balakrishnan Varadarajan and Sudheendra Vijayanarasimhan. 2016. Youtube-8m: A large-scale video classification benchmark. arXiv preprint arXiv:1609.08675(2016).  Sami Abu-El-Haija Nisarg Kothari Joonseok Lee Paul Natsev George Toderici Balakrishnan Varadarajan and Sudheendra Vijayanarasimhan. 2016. Youtube-8m: A large-scale video classification benchmark. arXiv preprint arXiv:1609.08675(2016)."},{"key":"e_1_3_2_2_2_1","volume-title":"Activitynet: A large-scale video benchmark for human activity understanding. In CVPR. 961\u2013970.","author":"Caba\u00a0Heilbron Fabian","year":"2015","unstructured":"Fabian Caba\u00a0Heilbron , Victor Escorcia , Bernard Ghanem , and Juan Carlos\u00a0Niebles . 2015 . Activitynet: A large-scale video benchmark for human activity understanding. In CVPR. 961\u2013970. Fabian Caba\u00a0Heilbron, Victor Escorcia, Bernard Ghanem, and Juan Carlos\u00a0Niebles. 2015. Activitynet: A large-scale video benchmark for human activity understanding. In CVPR. 961\u2013970."},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"crossref","unstructured":"Xinlei Chen and Abhinav Gupta. 2015. Webly supervised learning of convolutional networks. In CVPR. 1431\u20131439.  Xinlei Chen and Abhinav Gupta. 2015. Webly supervised learning of convolutional networks. In CVPR. 1431\u20131439.","DOI":"10.1109\/ICCV.2015.168"},{"key":"e_1_3_2_2_4_1","volume-title":"Imagenet: A large-scale hierarchical image database. In CVPR. Ieee, 248\u2013255.","author":"Deng Jia","year":"2009","unstructured":"Jia Deng , Wei Dong , Richard Socher , Li-Jia Li , Kai Li , and Li Fei-Fei . 2009 . Imagenet: A large-scale hierarchical image database. In CVPR. Ieee, 248\u2013255. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In CVPR. Ieee, 248\u2013255."},{"key":"e_1_3_2_2_5_1","unstructured":"Ali Diba Mohsen Fayyaz Vivek Sharma Manohar Paluri J\u00fcrgen Gall Rainer Stiefelhagen and Luc Van\u00a0Gool. 2019. Large Scale Holistic Video Understanding. arXiv preprint arXiv:1904.11451(2019).  Ali Diba Mohsen Fayyaz Vivek Sharma Manohar Paluri J\u00fcrgen Gall Rainer Stiefelhagen and Luc Van\u00a0Gool. 2019. Large Scale Holistic Video Understanding. arXiv preprint arXiv:1904.11451(2019)."},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"crossref","unstructured":"Santosh\u00a0K Divvala Ali Farhadi and Carlos Guestrin. 2014. Learning everything about anything: Webly-supervised visual concept learning. In CVPR. 3270\u20133277.  Santosh\u00a0K Divvala Ali Farhadi and Carlos Guestrin. 2014. Learning everything about anything: Webly-supervised visual concept learning. In CVPR. 3270\u20133277.","DOI":"10.1109\/CVPR.2014.412"},{"key":"e_1_3_2_2_7_1","volume-title":"Classification in the presence of label noise: a survey","author":"Fr\u00e9nay Beno\u00eet","year":"2013","unstructured":"Beno\u00eet Fr\u00e9nay and Michel Verleysen . 2013. Classification in the presence of label noise: a survey . IEEE transactions on neural networks and learning systems 25, 5( 2013 ), 845\u2013869. Beno\u00eet Fr\u00e9nay and Michel Verleysen. 2013. Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25, 5(2013), 845\u2013869."},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"crossref","unstructured":"Deepti Ghadiyaram Du Tran and Dhruv Mahajan. 2019. Large-scale weakly-supervised pre-training for video action recognition. In CVPR. 12046\u201312055.  Deepti Ghadiyaram Du Tran and Dhruv Mahajan. 2019. Large-scale weakly-supervised pre-training for video action recognition. In CVPR. 12046\u201312055.","DOI":"10.1109\/CVPR.2019.01232"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW53098.2021.00304"},{"key":"e_1_3_2_2_10_1","volume-title":"Something Something","author":"Goyal Raghav","unstructured":"Raghav Goyal , Samira\u00a0Ebrahimi Kahou , Vincent Michalski , Joanna Materzynska , Susanne Westphal , Heuna Kim , Valentin Haenel , Ingo Fruend , Peter Yianilos , Moritz Mueller-Freitag , 2017. The \u201d Something Something \u201d Video Database for Learning and Evaluating Visual Common Sense. In ICCV, Vol .\u00a01. 5. Raghav Goyal, Samira\u00a0Ebrahimi Kahou, Vincent Michalski, Joanna Materzynska, Susanne Westphal, Heuna Kim, Valentin Haenel, Ingo Fruend, Peter Yianilos, Moritz Mueller-Freitag, 2017. The\u201d Something Something\u201d Video Database for Learning and Evaluating Visual Common Sense. In ICCV, Vol.\u00a01. 5."},{"key":"e_1_3_2_2_11_1","volume-title":"Ava: A video dataset of spatio-temporally localized atomic visual actions. In CVPR. 6047\u20136056.","author":"Gu Chunhui","year":"2018","unstructured":"Chunhui Gu , Chen Sun , David\u00a0 A Ross , Carl Vondrick , Caroline Pantofaru , Yeqing Li , Sudheendra Vijayanarasimhan , George Toderici , Susanna Ricco , Rahul Sukthankar , 2018 . Ava: A video dataset of spatio-temporally localized atomic visual actions. In CVPR. 6047\u20136056. Chunhui Gu, Chen Sun, David\u00a0A Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, 2018. Ava: A video dataset of spatio-temporally localized atomic visual actions. In CVPR. 6047\u20136056."},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"crossref","unstructured":"Andrej Karpathy George Toderici Sanketh Shetty Thomas Leung Rahul Sukthankar and Li Fei-Fei. 2014. Large-scale video classification with convolutional neural networks. In CVPR. 1725\u20131732.  Andrej Karpathy George Toderici Sanketh Shetty Thomas Leung Rahul Sukthankar and Li Fei-Fei. 2014. Large-scale video classification with convolutional neural networks. In CVPR. 1725\u20131732.","DOI":"10.1109\/CVPR.2014.223"},{"key":"e_1_3_2_2_13_1","unstructured":"Will Kay Joao Carreira Karen Simonyan Brian Zhang Chloe Hillier Sudheendra Vijayanarasimhan Fabio Viola Tim Green Trevor Back Paul Natsev 2017. The kinetics human action video dataset. arXiv preprint arXiv:1705.06950(2017).  Will Kay Joao Carreira Karen Simonyan Brian Zhang Chloe Hillier Sudheendra Vijayanarasimhan Fabio Viola Tim Green Trevor Back Paul Natsev 2017. The kinetics human action video dataset. arXiv preprint arXiv:1705.06950(2017)."},{"key":"e_1_3_2_2_14_1","volume-title":"HMDB: a large video database for human motion recognition","author":"Kuehne Hildegard","unstructured":"Hildegard Kuehne , Hueihan Jhuang , Est\u00edbaliz Garrote , Tomaso Poggio , and Thomas Serre . 2011. HMDB: a large video database for human motion recognition . In ICCV. IEEE , 2556\u20132563. Hildegard Kuehne, Hueihan Jhuang, Est\u00edbaliz Garrote, Tomaso Poggio, and Thomas Serre. 2011. HMDB: a large video database for human motion recognition. In ICCV. IEEE, 2556\u20132563."},{"key":"e_1_3_2_2_15_1","volume-title":"Deep learning. nature 521, 7553","author":"LeCun Yann","year":"2015","unstructured":"Yann LeCun , Yoshua Bengio , and Geoffrey Hinton . 2015. Deep learning. nature 521, 7553 ( 2015 ), 436\u2013444. Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. nature 521, 7553 (2015), 436\u2013444."},{"key":"e_1_3_2_2_16_1","volume-title":"Cleannet: Transfer learning for scalable image classifier training with label noise. In CVPR. 5447\u20135456.","author":"Lee Kuang-Huei","year":"2018","unstructured":"Kuang-Huei Lee , Xiaodong He , Lei Zhang , and Linjun Yang . 2018 . Cleannet: Transfer learning for scalable image classifier training with label noise. In CVPR. 5447\u20135456. Kuang-Huei Lee, Xiaodong He, Lei Zhang, and Linjun Yang. 2018. Cleannet: Transfer learning for scalable image classifier training with label noise. In CVPR. 5447\u20135456."},{"key":"e_1_3_2_2_17_1","unstructured":"Wen Li Limin Wang Wei Li Eirikur Agustsson and Luc Van\u00a0Gool. 2017. Webvision database: Visual learning and understanding from web data. arXiv preprint arXiv:1708.02862(2017).  Wen Li Limin Wang Wei Li Eirikur Agustsson and Luc Van\u00a0Gool. 2017. Webvision database: Visual learning and understanding from web data. arXiv preprint arXiv:1708.02862(2017)."},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.324"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"crossref","unstructured":"Dhruv Mahajan Ross Girshick Vignesh Ramanathan Kaiming He Manohar Paluri Yixuan Li Ashwin Bharambe and Laurens van\u00a0der Maaten. 2018. Exploring the limits of weakly supervised pretraining. In ECCV. 181\u2013196.  Dhruv Mahajan Ross Girshick Vignesh Ramanathan Kaiming He Manohar Paluri Yixuan Li Ashwin Bharambe and Laurens van\u00a0der Maaten. 2018. Exploring the limits of weakly supervised pretraining. In ECCV. 181\u2013196.","DOI":"10.1007\/978-3-030-01216-8_12"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/219717.219748"},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2901464"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"crossref","unstructured":"Giorgio Patrini Alessandro Rozza Aditya Krishna\u00a0Menon Richard Nock and Lizhen Qu. 2017. Making deep neural networks robust to label noise: A loss correction approach. In CVPR. 1944\u20131952.  Giorgio Patrini Alessandro Rozza Aditya Krishna\u00a0Menon Richard Nock and Lizhen Qu. 2017. Making deep neural networks robust to label noise: A loss correction approach. In CVPR. 1944\u20131952.","DOI":"10.1109\/CVPR.2017.240"},{"key":"e_1_3_2_2_23_1","unstructured":"David Rolnick Andreas Veit Serge Belongie and Nir Shavit. 2017. Deep learning is robust to massive label noise. arXiv preprint arXiv:1705.10694(2017).  David Rolnick Andreas Veit Serge Belongie and Nir Shavit. 2017. Deep learning is robust to massive label noise. arXiv preprint arXiv:1705.10694(2017)."},{"key":"e_1_3_2_2_24_1","volume-title":"Hollywood in homes: Crowdsourcing data collection for activity understanding","author":"Sigurdsson A","unstructured":"Gunnar\u00a0 A Sigurdsson , G\u00fcl Varol , Xiaolong Wang , Ali Farhadi , Ivan Laptev , and Abhinav Gupta . 2016. Hollywood in homes: Crowdsourcing data collection for activity understanding . In ECCV. Springer , 510\u2013526. Gunnar\u00a0A Sigurdsson, G\u00fcl Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, and Abhinav Gupta. 2016. Hollywood in homes: Crowdsourcing data collection for activity understanding. In ECCV. Springer, 510\u2013526."},{"key":"e_1_3_2_2_25_1","unstructured":"Khurram Soomro Amir\u00a0Roshan Zamir and Mubarak Shah. 2012. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402(2012).  Khurram Soomro Amir\u00a0Roshan Zamir and Mubarak Shah. 2012. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402(2012)."},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2812802"},{"key":"e_1_3_2_2_27_1","unstructured":"Arash Vahdat. 2017. Toward robustness against label noise in training deep discriminative neural networks. In NeurIPS. 5596\u20135605.  Arash Vahdat. 2017. Toward robustness against label noise in training deep discriminative neural networks. In NeurIPS. 5596\u20135605."},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58548-8_10"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"crossref","unstructured":"Tong Xiao Tian Xia Yi Yang Chang Huang and Xiaogang Wang. 2015. Learning from Massive Noisy Labeled Data for Image Classification. In CVPR.  Tong Xiao Tian Xia Yi Yang Chang Huang and Xiaogang Wang. 2015. Learning from Massive Noisy Labeled Data for Image Classification. In CVPR.","DOI":"10.1109\/CVPR.2015.7298885"},{"key":"e_1_3_2_2_30_1","volume-title":"Hacs: Human action clips and segments dataset for recognition and temporal localization. In ICCV. 8668\u20138678.","author":"Zhao Hang","year":"2019","unstructured":"Hang Zhao , Antonio Torralba , Lorenzo Torresani , and Zhicheng Yan . 2019 . Hacs: Human action clips and segments dataset for recognition and temporal localization. In ICCV. 8668\u20138678. Hang Zhao, Antonio Torralba, Lorenzo Torresani, and Zhicheng Yan. 2019. Hacs: Human action clips and segments dataset for recognition and temporal localization. In ICCV. 8668\u20138678."}],"event":{"name":"MMAsia '21: ACM Multimedia Asia","location":"Gold Coast Australia","acronym":"MMAsia '21","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["ACM Multimedia Asia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3469877.3490580","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3469877.3490580","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:16Z","timestamp":1750188616000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3469877.3490580"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12]]},"references-count":30,"alternative-id":["10.1145\/3469877.3490580","10.1145\/3469877"],"URL":"https:\/\/doi.org\/10.1145\/3469877.3490580","relation":{},"subject":[],"published":{"date-parts":[[2021,12]]},"assertion":[{"value":"2022-01-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}