{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T16:18:23Z","timestamp":1761581903272,"version":"3.41.0"},"reference-count":74,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2019,11,30]],"date-time":"2019-11-30T00:00:00Z","timestamp":1575072000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Key Research Program of Frontier Sciences, CAS","award":["QYZDY-SSW-JSC044"],"award-info":[{"award-number":["QYZDY-SSW-JSC044"]}]},{"name":"National Key R&D Program of China","award":["2017YFB0502900"],"award-info":[{"award-number":["2017YFB0502900"]}]},{"name":"CAS \u201cLight of West China\u201d Program","award":["XAB2017B15"],"award-info":[{"award-number":["XAB2017B15"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61772510 and 61702498"],"award-info":[{"award-number":["61772510 and 61702498"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Young Top-notch Talent Program of Chinese Academy of Sciences","award":["QYZDB-SSW-JSC015"],"award-info":[{"award-number":["QYZDB-SSW-JSC015"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2019,11,30]]},"abstract":"<jats:p>In this article, we propose a novel method for unsupervised learning of human action categories in still images. In contrast to previous methods, the proposed method explores distinctive information of actions directly from unlabeled image databases, attempting to learn discriminative deep representations in an unsupervised manner to distinguish different actions. 
In the proposed method, action image collections can be used without manual annotations. Specifically, (i) to deal with the problem that unsupervised discriminative deep representations are difficult to learn, the proposed method builds a training dataset with surrogate labels from the unlabeled dataset, then learns discriminative representations by alternately updating convolutional neural network (CNN) parameters and the surrogate training dataset in an iterative manner; (ii) to explore the discriminatory information among different action categories, training batches for updating the CNN parameters are built with triplet groups and the triplet loss function is introduced to update the CNN parameters; and (iii) to learn more discriminative deep representations, a Random Forest classifier is adopted to update the surrogate training dataset, and more beneficial triplet groups then can be built with the updated surrogate training dataset. Extensive experiments on four benchmark datasets demonstrate the effectiveness of the proposed method.<\/jats:p>","DOI":"10.1145\/3362161","type":"journal-article","created":{"date-parts":[[2019,12,16]],"date-time":"2019-12-16T13:12:30Z","timestamp":1576501950000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Unsupervised Learning of Human Action Categories in Still Images with Deep Representations"],"prefix":"10.1145","volume":"15","author":[{"given":"Yunpeng","family":"Zheng","sequence":"first","affiliation":[{"name":"Key Laboratory of Spectral Imaging Technology CAS, Xi\u2019an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences and the University of Chinese Academy of Sciences, Beijing, China"}]},{"given":"Xuelong","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer Science and Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Shaanxi, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7037-5188","authenticated-orcid":false,"given":"Xiaoqiang","family":"Lu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Spectral Imaging Technology CAS, Xi\u2019an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Shaanxi, China"}]}],"member":"320","published-online":{"date-parts":[[2019,12,16]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3199668"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.208"},{"key":"e_1_2_1_3_1","unstructured":"Miguel \u00c1ngel Bautista Artsiom Sanakoyeu Ekaterina Tikhoncheva and Bj\u00f6rn Ommer. 2016. CliqueCNN: Deep unsupervised exemplar learning. In Advances in Neural Information Processing Systems. NIPSF 3846--3854.  Miguel \u00c1ngel Bautista Artsiom Sanakoyeu Ekaterina Tikhoncheva and Bj\u00f6rn Ommer. 2016. CliqueCNN: Deep unsupervised exemplar learning. In Advances in Neural Information Processing Systems. NIPSF 3846--3854."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2007.4409066"},{"key":"e_1_2_1_5_1","volume-title":"Van Gool","author":"Bossard Lukas","year":"2014","unstructured":"Lukas Bossard , Matthieu Guillaumin , and Luc J . Van Gool . 2014 . Food-101\u2014 mining discriminative components with random forests. In Proceedings of the European Conference on Computer Vision. Springer , 446--461. Lukas Bossard, Matthieu Guillaumin, and Luc J. Van Gool. 2014. Food-101\u2014 mining discriminative components with random forests. In Proceedings of the European Conference on Computer Vision. 
Springer, 446--461."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010933404324"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2005.198"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_9"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5244\/C.24.97"},{"key":"e_1_2_1_10_1","unstructured":"Vincent Delaitre Josef Sivic and Ivan Laptev. 2011. Learning person-object interactions for action recognition in still images. In Advances in Neural Information Processing Systems. NIPSF 1503--1511.  Vincent Delaitre Josef Sivic and Ivan Laptev. 2011. Learning person-object interactions for action recognition in still images. In Advances in Neural Information Processing Systems. NIPSF 1503--1511."},{"key":"e_1_2_1_11_1","volume-title":"Efros","author":"Doersch Carl","year":"2015","unstructured":"Carl Doersch , Abhinav Gupta , and Alexei A . Efros . 2015 . Unsupervised visual representation learning by context prediction. In Advances in Neural Information Processing Systems. NIPSF , 1422--1430. Carl Doersch, Abhinav Gupta, and Alexei A. Efros. 2015. Unsupervised visual representation learning by context prediction. In Advances in Neural Information Processing Systems. NIPSF, 1422--1430."},{"key":"e_1_2_1_12_1","volume-title":"Martin A. Riedmiller, and Thomas Brox.","author":"Dosovitskiy Alexey","year":"2014","unstructured":"Alexey Dosovitskiy , Jost Tobias Springenberg , Martin A. Riedmiller, and Thomas Brox. 2014 . Discriminative unsupervised feature learning with convolutional neural networks. In Advances in Neural Information Processing Systems. NIPSF , 766--774. Alexey Dosovitskiy, Jost Tobias Springenberg, Martin A. Riedmiller, and Thomas Brox. 2014. Discriminative unsupervised feature learning with convolutional neural networks. In Advances in Neural Information Processing Systems. 
NIPSF, 766--774."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01249-6_4"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2017.205"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.284"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.129"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2014.04.018"},{"volume-title":"Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, 2980--2988","author":"He Kaiming","key":"e_1_2_1_18_1","unstructured":"Kaiming He , Georgia Gkioxari , Piotr Doll\u00e1r , and Ross B. Girshick . 2017 . Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, 2980--2988 . Kaiming He, Georgia Gkioxari, Piotr Doll\u00e1r, and Ross B. Girshick. 2017. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, 2980--2988."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3177757"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2008.4761663"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2009.09.011"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2010.5540039"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3152114"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2018.2812099"},{"volume-title":"Proceedings of the European Symposium on Artificial Neural Networks. i6doc.com publication, 489--494","author":"Krizhevsky Alex","key":"e_1_2_1_25_1","unstructured":"Alex Krizhevsky and Geoffrey E. Hinton . 2011. Using very deep autoencoders for content-based image retrieval . In Proceedings of the European Symposium on Artificial Neural Networks. i6doc.com publication, 489--494 . Alex Krizhevsky and Geoffrey E. Hinton. 2011. 
Using very deep autoencoders for content-based image retrieval. In Proceedings of the European Symposium on Artificial Neural Networks. i6doc.com publication, 489--494."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2006.68"},{"volume-title":"Proceedings of the International Conference on Multimedia Retrieval. ACM, 231--238","author":"Le Dieu-Thu","key":"e_1_2_1_27_1","unstructured":"Dieu-Thu Le , Raffaella Bernardi , and Jasper R. R. Uijlings . 2013. Exploiting language models to recognize unseen actions . In Proceedings of the International Conference on Multimedia Retrieval. ACM, 231--238 . Dieu-Thu Le, Raffaella Bernardi, and Jasper R. R. Uijlings. 2013. Exploiting language models to recognize unseen actions. In Proceedings of the International Conference on Multimedia Retrieval. ACM, 231--238."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2013.6639343"},{"volume-title":"Proceedings of the International Conference on Machine Learning. ACM, 609--616","author":"Lee Honglak","key":"e_1_2_1_29_1","unstructured":"Honglak Lee , Roger B. Grosse , Rajesh Ranganath , and Andrew Y. Ng . 2009. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations . In Proceedings of the International Conference on Machine Learning. ACM, 609--616 . Honglak Lee, Roger B. Grosse, Rajesh Ranganath, and Andrew Y. Ng. 2009. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the International Conference on Machine Learning. ACM, 609--616."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.79"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. IEEE, 178--178","author":"Li Fei-Fei","year":"2004","unstructured":"Fei-Fei Li , Rob Fergus , and Pietro Perona . 2004 . 
Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. IEEE, 178--178 . Fei-Fei Li, Rob Fergus, and Pietro Perona. 2004. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. IEEE, 178--178."},{"key":"e_1_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Piji Li Jun Ma and Shuai Gao. 2011. Actions in still web images: Visualization detection and retrieval. In Web-Age Information Management. 302--313.  Piji Li Jun Ma and Shuai Gao. 2011. Actions in still web images: Visualization detection and retrieval. In Web-Age Information Management. 302--313.","DOI":"10.1007\/978-3-642-23535-1_27"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3131344"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2018.00046"},{"key":"e_1_2_1_35_1","article-title":"Skeleton-based online action prediction using scale selection network","author":"Liu Jun","year":"2019","unstructured":"Jun Liu , Amir Shahroudy , Gang Wang , Ling-Yu Duan , and Alex C. Kot . 2019 . Skeleton-based online action prediction using scale selection network . IEEE Transactions on Pattern Analysis and Machine Intelligence. Jun Liu, Amir Shahroudy, Gang Wang, Ling-Yu Duan, and Alex C. Kot. 2019. Skeleton-based online action prediction using scale selection network. 
IEEE Transactions on Pattern Analysis and Machine Intelligence.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3231741"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:VISI.0000029664.99615.94"},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the International Conference on Machine Learning. IMLS, 2275--2284","author":"Ma Fan","year":"2017","unstructured":"Fan Ma , Deyu Meng , Qi Xie , Zina Li , and Xuanyi Dong . 2017 . Self-paced co-training . In Proceedings of the International Conference on Machine Learning. IMLS, 2275--2284 . Fan Ma, Deyu Meng, Qi Xie, Zina Li, and Xuanyi Dong. 2017. Self-paced co-training. In Proceedings of the International Conference on Machine Learning. IMLS, 2275--2284."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2017.01.027"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995631"},{"volume-title":"Introduction to Information Retrieval","author":"Manning Christopher D.","key":"e_1_2_1_41_1","unstructured":"Christopher D. Manning , Prabhakar Raghavan , and Hinrich Sch\u00fctze . 2008. Introduction to Information Retrieval . Cambridge University Press . Christopher D. Manning, Prabhakar Raghavan, and Hinrich Sch\u00fctze. 2008. Introduction to Information Retrieval. Cambridge University Press."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-007-0122-4"},{"key":"e_1_2_1_43_1","volume-title":"Papadimitriou and Kenneth Steiglitz","author":"Christos","year":"1998","unstructured":"Christos H. Papadimitriou and Kenneth Steiglitz . 1998 . Combinatorial Optimization : Algorithms and Complexity. Prentice-Hall . Christos H. Papadimitriou and Kenneth Steiglitz. 1998. Combinatorial Optimization: Algorithms and Complexity. 
Prentice-Hall."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2011.158"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3271553.3271563"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.621"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1505--1512","author":"Razavi Nima","key":"e_1_2_1_47_1","unstructured":"Nima Razavi , Juergen Gall , and Luc J . Van Gool. 2011. Scalable multi-class object detection . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1505--1512 . Nima Razavi, Juergen Gall, and Luc J. Van Gool. 2011. Scalable multi-class object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1505--1512."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2015.2459678"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298682"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33885-4_27"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2537325"},{"key":"e_1_2_1_52_1","volume-title":"Proceedings of the International Conference on Learning Representations. ICLR.","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman . 2015 . Very deep convolutional networks for large-scale image recognition . In Proceedings of the International Conference on Learning Representations. ICLR. Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations. 
ICLR."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.82"},{"key":"e_1_2_1_54_1","volume-title":"Amir Roshan Zamir, and Mubarak Shah","author":"Soomro Khurram","year":"2012","unstructured":"Khurram Soomro , Amir Roshan Zamir, and Mubarak Shah . 2012 . UCF101: a dataset of 101 human actions classes from videos in the wild. In CRCV-TR- 12-01. Khurram Soomro, Amir Roshan Zamir, and Mubarak Shah. 2012. UCF101: a dataset of 101 human actions classes from videos in the wild. In CRCV-TR-12-01."},{"key":"e_1_2_1_55_1","first-page":"583","article-title":"Cluster ensembles \u2014 a knowledge reuse framework for combining multiple partitions","volume":"3","author":"Strehl Alexander","year":"2002","unstructured":"Alexander Strehl and Joydeep Ghosh . 2002 . Cluster ensembles \u2014 a knowledge reuse framework for combining multiple partitions . Journal of Machine Learning Research 3 , 583 -- 617 . Alexander Strehl and Joydeep Ghosh. 2002. Cluster ensembles \u2014 a knowledge reuse framework for combining multiple partitions. 
Journal of Machine Learning Research 3, 583--617.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2010.5540018"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3152127"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.149"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2006.321"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299065"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00543"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.556"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126386"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.67"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.597"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.5555\/3034194.3034338"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2018.06.071"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2016.2605305"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2017.2682196"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2016.08.020"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2012.6466977"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/3159171"},{"key":"e_1_2_1_73_1","article-title":"Fusing multiple features for depth-based action recognition","volume":"6","author":"Zhu Yu","year":"2015","unstructured":"Yu Zhu , Wenbin Chen , and Guodong Guo . 2015 . Fusing multiple features for depth-based action recognition . ACM Transactions on Multimedia Computing, Communications, and Applications 6 , 2, 18:1--18:20. Yu Zhu, Wenbin Chen, and Guodong Guo. 2015. 
Fusing multiple features for depth-based action recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 6, 2, 18:1--18:20.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2015.03.006"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3362161","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3362161","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:44:54Z","timestamp":1750203894000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3362161"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,30]]},"references-count":74,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2019,11,30]]}},"alternative-id":["10.1145\/3362161"],"URL":"https:\/\/doi.org\/10.1145\/3362161","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2019,11,30]]},"assertion":[{"value":"2018-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-09-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-12-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}