{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,20]],"date-time":"2026-05-20T16:14:48Z","timestamp":1779293688278,"version":"3.51.4"},"reference-count":34,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2019,1,21]],"date-time":"2019-01-21T00:00:00Z","timestamp":1548028800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100013293","name":"Active and Assisted Living programme","doi-asserted-by":"publisher","award":["CAMI"],"award-info":[{"award-number":["CAMI"]}],"id":[{"id":"10.13039\/100013293","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100006595","name":"UEFISCDI","doi-asserted-by":"publisher","award":["Bridge Grant SPARC"],"award-info":[{"award-number":["Bridge Grant SPARC"]}],"id":[{"id":"10.13039\/501100006595","id-type":"DOI","asserted-by":"publisher"}]},{"name":"UEFISCI","award":["Robin Social"],"award-info":[{"award-number":["Robin Social"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Robust action recognition methods lie at the cornerstone of Ambient Assisted Living (AAL) systems employing optical devices. Using 3D skeleton joints extracted from depth images taken with time-of-flight (ToF) cameras has been a popular solution for accomplishing these tasks. Though seemingly scarce in terms of information availability compared to its RGB or depth image counterparts, the skeletal representation has proven to be effective in the task of action recognition. This paper explores different interpretations of both the spatial and the temporal dimensions of a sequence of frames describing an action. We show that rather intuitive approaches, often borrowed from other computer vision tasks, can improve accuracy. 
We report results based on these modifications and propose an architecture that uses temporal convolutions with results comparable to the state of the art.<\/jats:p>","DOI":"10.3390\/s19020423","type":"journal-article","created":{"date-parts":[[2019,1,22]],"date-time":"2019-01-22T03:08:22Z","timestamp":1548126502000},"page":"423","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Spatio-Temporal Features in Action Recognition Using 3D Skeletal Joints"],"prefix":"10.3390","volume":"19","author":[{"given":"Mihai","family":"Tr\u0103sc\u0103u","sequence":"first","affiliation":[{"name":"Faculty of Automatic Control and Computers, University Politehnica Bucharest, Bucure\u0219ti 060042, Romania"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mihai","family":"Nan","sequence":"additional","affiliation":[{"name":"Faculty of Automatic Control and Computers, University Politehnica Bucharest, Bucure\u0219ti 060042, Romania"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Adina Magda","family":"Florea","sequence":"additional","affiliation":[{"name":"Faculty of Automatic Control and Computers, University Politehnica Bucharest, Bucure\u0219ti 060042, Romania"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,1,21]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"3585","DOI":"10.1109\/JSEN.2017.2697077","article-title":"Radar and RGB-depth sensors for fall detection: A review","volume":"17","author":"Cippitelli","year":"2017","journal-title":"IEEE Sens. J."},{"key":"ref_2","unstructured":"Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F., and Weinberger, K.Q. (2011). Learning person-object interactions for action recognition in still images. 
Advances in Neural Information Processing Systems 24, Curran Associates, Inc."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Vitri\u00e0, J., Sanches, J.M., and Hern\u00e1ndez, M. (2011). On Importance of Interactions and Context in Human Action Recognition. Pattern Recognition and Image Analysis, Springer.","DOI":"10.1007\/978-3-642-21257-4"},{"key":"ref_4","unstructured":"Duong, T.V., Bui, H.H., Phung, D.Q., and Venkatesh, S. (2005, January 20\u201326). Activity recognition and abnormality detection with the switching hidden semi-Markov model. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905), San Diego, CA, USA."},{"key":"ref_5","unstructured":"Yamato, J., Ohya, J., and Ishii, K. (1992, January 15\u201318). Recognizing human action in time-sequential images using hidden Markov model. Proceedings of the 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Champaign, IL, USA."},{"key":"ref_6","unstructured":"Bai, S., Kolter, J.Z., and Koltun, V. (arXiv, 2018). An empirical evaluation of generic convolutional and recurrent networks for sequence modeling, arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"201","DOI":"10.3758\/BF03212378","article-title":"Visual perception of biological motion and a model for its analysis","volume":"14","author":"Johansson","year":"1973","journal-title":"Percept. Psychophys."},{"key":"ref_8","unstructured":"Shahroudy, A., Liu, J., Ng, T., and Wang, G. (July, January 26). NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Veeriah, V., Zhuang, N., and Qi, G. (2015, January 7\u201312). Differential Recurrent Neural Networks for Action Recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/ICCV.2015.460"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Zhu, W., Lan, C., Xing, J., Zeng, W., Li, Y., Shen, L., and Xie, X. (2016, January 12\u201317). Co-occurrence Feature Learning for Skeleton based Action Recognition using Regularized Deep LSTM Networks. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA.","DOI":"10.1609\/aaai.v30i1.10451"},{"key":"ref_11","first-page":"1110","article-title":"Hierarchical recurrent neural network for skeleton based action recognition","volume":"1","author":"Du","year":"2015","journal-title":"Comput. Vis. Pattern Recognit."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Song, S., Lan, C., Xing, J., Zeng, W., and Liu, J. (2017, January 4\u20139). An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.11212"},{"key":"ref_13","unstructured":"Liu, J., Akhtar, N., and Mian, A. (arXiv, 2017). Skepxels: Spatio-temporal Image Representation of Human Skeleton Joints for Action Recognition, arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Yang, Z., Li, Y., Yang, J., and Luo, J. (arXiv, 2018). Action Recognition with Spatio-Temporal Visual Attention on Skeleton Image Sequences, arXiv.","DOI":"10.1109\/ICPR.2018.8546012"},{"key":"ref_15","unstructured":"Kipf, T.N., and Welling, M. (2017, January 24\u201326). Semi-Supervised Classification with Graph Convolutional Networks. Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France."},{"key":"ref_16","unstructured":"Van den Berg, R., Kipf, T.N., and Welling, M. (arXiv, 2017). 
Graph Convolutional Matrix Completion, arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Schlichtkrull, M., Kipf, T.N., Bloem, P., van den Berg, R., Titov, I., and Welling, M. (arXiv, 2017). Modeling Relational Data with Graph Convolutional Networks, arXiv.","DOI":"10.1007\/978-3-319-93417-4_38"},{"key":"ref_18","unstructured":"Defferrard, M., Bresson, X., and Vandergheynst, P. (arXiv, 2016). Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering, arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Li, C., Cui, Z., Zheng, W., Xu, C., and Yang, J. (2018, January 2\u20137). Spatio-Temporal Graph Convolution for Skeleton Based Action Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.11776"},{"key":"ref_20","unstructured":"Zhang, X., Xu, C., Tian, X., and Tao, D. (arXiv, 2018). Graph Edge Convolutional Neural Networks for Skeleton Based Action Recognition, arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"2330","DOI":"10.1109\/TMM.2018.2802648","article-title":"Fusing Geometric Features for Skeleton-Based Action Recognition using Multilayer LSTM Networks","volume":"20","author":"Zhang","year":"2018","journal-title":"IEEE Trans. Multimed."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K.Q. (2017, January 21\u201326). Densely Connected Convolutional Networks. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zanfir, M., Leordeanu, M., and Sminchisescu, C. (2013, January 26\u201328). The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection. 
Proceedings of the IEEE International Conference on Computer Vision, Tamilnadu, India.","DOI":"10.1109\/ICCV.2013.342"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Liu, J., Shahroudy, A., Xu, D., and Wang, G. (2016, January 8\u201316). Spatio-temporal LSTM with trust gates for 3D human action recognition. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46487-9_50"},{"key":"ref_25","unstructured":"Battaglia, P., Pascanu, R., Lai, M., Rezende, D.J., and Kavukcuoglu, K. (2016, January 5\u201310). Interaction networks for learning about objects, relations and physics. Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain."},{"key":"ref_26","unstructured":"Kingma, D.P., and Ba, J. (arXiv, 2014). Adam: A Method for Stochastic Optimization, arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Huang, Z., Wan, C., Probst, T., and Van Gool, L. (2017, January 21\u201326). Deep Learning on Lie Groups for Skeleton-based Action Recognition. Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR) 2017, Hawaii, HI, USA.","DOI":"10.1109\/CVPR.2017.137"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"2186","DOI":"10.1109\/TPAMI.2016.2640292","article-title":"Jointly Learning Heterogeneous Features for RGB-D Activity Recognition","volume":"39","author":"Hu","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Liu, J., Shahroudy, A., Xu, D., Kot, A.C., and Wang, G. (arXiv, 2017). Skeleton-Based Action Recognition Using Spatio-Temporal LSTM Network with Trust Gates, arXiv.","DOI":"10.1109\/TPAMI.2017.2771306"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Ke, Q., Bennamoun, M., An, S., Sohel, F.A., and Boussa\u00efd, F. (arXiv, 2017). 
A New Representation of Skeleton Sequences for 3D Action Recognition, arXiv.","DOI":"10.1109\/CVPR.2017.486"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Yan, S., Xiong, Y., and Lin, D. (arXiv, 2018). Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition, arXiv.","DOI":"10.1609\/aaai.v32i1.12328"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1045","DOI":"10.1109\/TPAMI.2017.2691321","article-title":"Deep Multimodal Feature Analysis for Action Recognition in RGB+D Videos","volume":"40","author":"Shahroudy","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_33","unstructured":"Li, C., Zhong, Q., Xie, D., and Pu, S. (arXiv, 2017). Skeleton-based Action Recognition with Convolutional Neural Networks, arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Wang, P., Li, W., Gao, Z., Tang, C., and Ogunbona, P. (2018). Depth Pooling Based Large-Scale 3-D Action Recognition with Convolutional Neural Networks. IEEE Trans. Multimed., 20.","DOI":"10.1109\/TMM.2018.2818329"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/2\/423\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:27:39Z","timestamp":1760185659000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/2\/423"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,1,21]]},"references-count":34,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2019,1]]}},"alternative-id":["s19020423"],"URL":"https:\/\/doi.org\/10.3390\/s19020423","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,1,21]]}}}