{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T21:12:06Z","timestamp":1775509926061,"version":"3.50.1"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2022,3,4]],"date-time":"2022-03-04T00:00:00Z","timestamp":1646352000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Ministry of Electronics and Information Technology, Govt. of India","award":["MIT1100CSE"],"award-info":[{"award-number":["MIT1100CSE"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2022,8,31]]},"abstract":"<jats:p>Many applications of action recognition, especially broad domains like surveillance or anomaly-detection, favor unsupervised methods considering that exhaustive labeling of actions is not possible. However, very limited work has happened in this domain. Moreover, the existing self-supervised approaches suffer from their dependency upon labeled data for finetuning. To this end, this paper puts forward a manifold based unsupervised pose-sequence recognition approach that leverages only the natural biases present in the data. It works by clustering the projections of temporal derivatives of the fragmented data on the Grassmann manifold. Temporal derivatives are formed by the inter-frame gradients with local and global metrics. To commensurate with this, a dynamic view-invariant pose representation is proposed. Additionally, a variable aggregation step is introduced for better feature vector quantization. Extensive empirical evaluation and ablations on several challenging datasets under three categories confirm the superiority of the proposed approach in contrast to current methods.<\/jats:p>","DOI":"10.1145\/3491227","type":"journal-article","created":{"date-parts":[[2022,3,4]],"date-time":"2022-03-04T10:26:32Z","timestamp":1646389592000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["GraSP: Local Grassmannian Spatio-Temporal Patterns for Unsupervised Pose Sequence Recognition"],"prefix":"10.1145","volume":"18","author":[{"given":"Himanshu","family":"Buckchash","sequence":"first","affiliation":[{"name":"Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India"}]},{"given":"Balasubramanian","family":"Raman","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India"}]}],"member":"320","published-online":{"date-parts":[[2022,3,4]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/2808797.2809344"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2006.50"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00300"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP40778.2020.9190765"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11554-013-0370-1"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICME51207.2021.9428459"},{"key":"e_1_3_2_8_2","volume-title":"Carnegie Mellon Univ. Pittsburgh, PA, USA","year":"2007","unstructured":"CMU. 2007. Carnegie-Mellon motion capture database. In Carnegie Mellon Univ. Pittsburgh, PA, USA. Last online: Jan. 2021. http:\/\/mocap.cs.cmu.edu\/."},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1080\/10586458.1996.10504585"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-019-01208-x"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2014.2350774"},{"key":"e_1_3_2_12_2","first-page":"1110","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Du Yong","year":"2015","unstructured":"Yong Du, Wei Wang, and Liang Wang. 2015. Hierarchical recurrent neural network for skeleton based action recognition. In IEEE Conference on Computer Vision and Pattern Recognition. 1110\u20131118."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1983.10478008"},{"key":"e_1_3_2_14_2","first-page":"72","volume-title":"Summer School on Machine Learning","author":"Ghahramani Zoubin","year":"2003","unstructured":"Zoubin Ghahramani. 2003. Unsupervised learning. In Summer School on Machine Learning. Springer, 72\u2013112."},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cogsys.2020.05.002"},{"key":"e_1_3_2_16_2","first-page":"2066","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Gong Boqing","year":"2012","unstructured":"Boqing Gong, Yuan Shi, Fei Sha, and Kristen Grauman. 2012. Geodesic flow kernel for unsupervised domain adaptation. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2066\u20132073."},{"key":"e_1_3_2_17_2","first-page":"1351","volume-title":"23rd International Joint Conference on Artificial Intelligence","author":"Gowayyed Mohammad A.","year":"2013","unstructured":"Mohammad A. Gowayyed, Marwan Torki, Mohamed E. Hussein, and Motaz El-Saban. 2013. Histogram of oriented displacements (HOD) describing trajectories of human joints for action recognition. In 23rd International Joint Conference on Artificial Intelligence. 1351\u20131357."},{"key":"e_1_3_2_18_2","first-page":"1","volume-title":"SIGGRAPH Asia 2015 Technical Briefs","author":"Holden Daniel","year":"2015","unstructured":"Daniel Holden, Jun Saito, Taku Komura, and Thomas Joyce. 2015. Learning motion manifolds with convolutional autoencoders. In SIGGRAPH Asia 2015 Technical Briefs. 1\u20134."},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.137"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF01908075"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2019.10.047"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/3DV50981.2020.00102"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.486"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.115"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.5555\/3326943.3327059"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413548"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46487-9_50"},{"key":"e_1_3_2_28_2","first-page":"1","article-title":"Enhanced 3D human pose estimation from videos by using attention-based neural network with dilated convolutions","author":"Liu Ruixu","year":"2021","unstructured":"Ruixu Liu, Ju Shen, He Wang, Chen Chen, Sen-ching Cheung, and Vijayan K. Asari. 2021. Enhanced 3D human pose estimation from videos by using attention-based neural network with dilated convolutions. International Journal of Computer Vision (2021), 1\u201320.","journal-title":"International Journal of Computer Vision"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/ITSC.2019.8917128"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/FG.2011.5771378"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2010.5540131"},{"key":"e_1_3_2_32_2","volume-title":"Conference on Mathematical Theory of Networks and Systems","author":"Ma Yi","year":"1998","unstructured":"Yi Ma, Jana Kosecka, and Shankar Sastry. 1998. Optimal motion from image sequences: A Riemannian viewpoint. In Conference on Mathematical Theory of Networks and Systems. CiteSeer."},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01227"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58529-7_7"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.3390\/s19081932"},{"key":"e_1_3_2_36_2","article-title":"Augmented skeleton based contrastive action learning with momentum LSTM for unsupervised action recognition","author":"Rao Haocong","year":"2020","unstructured":"Haocong Rao, Shihao Xu, Xiping Hu, Jun Cheng, and Bin Hu. 2020. Augmented skeleton based contrastive action learning with momentum LSTM for unsupervised action recognition. CoRR (2020). arxiv:2008.00188.","journal-title":"CoRR"},{"key":"e_1_3_2_37_2","first-page":"410","volume-title":"Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning","author":"Rosenberg Andrew","year":"2007","unstructured":"Andrew Rosenberg and Julia Hirschberg. 2007. V-measure: A conditional entropy-based external cluster evaluation measure. In Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. 410\u2013420."},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2013.77"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.115"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00132"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2014.08.011"},{"key":"e_1_3_2_42_2","first-page":"843","volume-title":"International Conference on Machine Learning","author":"Srivastava Nitish","year":"2015","unstructured":"Nitish Srivastava, Elman Mansimov, and Ruslan Salakhudinov. 2015. Unsupervised learning of video representations using LSTMs. In International Conference on Machine Learning. PMLR, 843\u2013852."},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvcir.2020.102925"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00965"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2015.7301354"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2008.4587733"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.82"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.10267"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2018.03.030"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.198"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.339"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.641"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46478-7_23"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2977856"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2012.6239234"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-020-01398-9"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00698"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11853"},{"key":"e_1_3_2_59_2","article-title":"Auto-conditioned recurrent networks for extended complex human motion synthesis","author":"Zhou Yi","year":"2018","unstructured":"Yi Zhou, Zimo Li, Shuangjiu Xiao, Chong He, Zeng Huang, and Hao Li. 2018. Auto-conditioned recurrent networks for extended complex human motion synthesis. International Conference on Learning Representations (2018).","journal-title":"International Conference on Learning Representations"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.10451"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.14257\/ijsip.2014.7.3.11"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3491227","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3491227","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:09:19Z","timestamp":1750183759000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3491227"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,4]]},"references-count":60,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,8,31]]}},"alternative-id":["10.1145\/3491227"],"URL":"https:\/\/doi.org\/10.1145\/3491227","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,4]]},"assertion":[{"value":"2021-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-03-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}