{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,18]],"date-time":"2026-07-18T14:37:09Z","timestamp":1784385429849,"version":"3.55.0"},"reference-count":57,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2021,3,15]],"date-time":"2021-03-15T00:00:00Z","timestamp":1615766400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100006595","name":"Unitatea Executiva pentru Finantarea Invatamantului Superior, a Cercetarii, Dezvoltarii si Inovarii","doi-asserted-by":"publisher","award":["PN-III-P1-1.2-PCCDI2017-0734"],"award-info":[{"award-number":["PN-III-P1-1.2-PCCDI2017-0734"]}],"id":[{"id":"10.13039\/501100006595","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100006595","name":"Unitatea Executiva pentru Finantarea Invatamantului Superior, a Cercetarii, Dezvoltarii si Inovarii","doi-asserted-by":"publisher","award":["PN-III-P2-2.1-PED2019-4995"],"award-info":[{"award-number":["PN-III-P2-2.1-PED2019-4995"]}],"id":[{"id":"10.13039\/501100006595","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Action recognition plays an important role in various applications such as video monitoring, automatic video indexing, crowd analysis, human-machine interaction, smart homes and personal assistive robotics. In this paper, we propose improvements to some methods for human action recognition from videos that work with data represented in the form of skeleton poses. These methods are based on the most widely used techniques for this problem\u2014Graph Convolutional Networks (GCNs), Temporal Convolutional Networks (TCNs) and Recurrent Neural Networks (RNNs). 
Initially, the paper explores and compares different ways to extract the most relevant spatial and temporal characteristics for a sequence of frames describing an action. Based on this comparative analysis, we show how a TCN type unit can be extended to work even on the characteristics extracted from the spatial domain. To validate our approach, we test it against a benchmark often used for human action recognition problems and we show that our solution obtains comparable results to the state-of-the-art, but with a significant increase in the inference speed.<\/jats:p>","DOI":"10.3390\/s21062051","type":"journal-article","created":{"date-parts":[[2021,3,15]],"date-time":"2021-03-15T02:51:48Z","timestamp":1615776708000},"page":"2051","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":39,"title":["Comparison between Recurrent Networks and Temporal Convolutional Networks Approaches for Skeleton-Based Action Recognition"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3260-9705","authenticated-orcid":false,"given":"Mihai","family":"Nan","sequence":"first","affiliation":[{"name":"Faculty of Automatic Control and Computers, University POLITEHNICA of Bucharest, RO-060042 Bucharest, Romania"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mihai","family":"Tr\u0103sc\u0103u","sequence":"additional","affiliation":[{"name":"Faculty of Automatic Control and Computers, University POLITEHNICA of Bucharest, RO-060042 Bucharest, Romania"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Adina Magda","family":"Florea","sequence":"additional","affiliation":[{"name":"Faculty of Automatic Control and Computers, University POLITEHNICA of Bucharest, RO-060042 Bucharest, Romania"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Cezar C\u0103t\u0103lin","family":"Iacob","sequence":"additional","affiliation":[{"name":"Faculty of Automatic Control 
and Computers, University POLITEHNICA of Bucharest, RO-060042 Bucharest, Romania"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2021,3,15]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1922649.1922653","article-title":"Human activity analysis: A review","volume":"43","author":"Aggarwal","year":"2011","journal-title":"ACM Comput. Surv."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"2684","DOI":"10.1109\/TPAMI.2019.2916873","article-title":"Ntu rgb+ d 120: A large-scale benchmark for 3d human activity understanding","volume":"42","author":"Liu","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Blank, M., Gorelick, L., Shechtman, E., Irani, M., and Basri, R. (2005, January 17\u201321). Actions as space-time shapes. Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV\u201905), Beijing, China.","DOI":"10.1109\/ICCV.2005.28"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Li, W., Zhang, Z., and Liu, Z. (2010, January 13\u201318). Action recognition based on a bag of 3D points. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, San Francisco, CA, USA.","DOI":"10.1109\/CVPRW.2010.5543273"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1016\/j.cviu.2006.07.013","article-title":"Free Viewpoint Action Recognition Using Motion History Volumes","volume":"104","author":"Weinland","year":"2006","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Shahroudy, A., Liu, J., Ng, T., and Wang, G. (2016). NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis. arXiv.","DOI":"10.1109\/CVPR.2016.115"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Song, Y.F., Zhang, Z., Shan, C., and Wang, L. 
(2020, January 12\u201316). Stronger, Faster and More Explainable: A Graph Convolutional Baseline for Skeleton-Based Action Recognition. Proceedings of the 28th ACM International Conference on Multimedia (ACMMM), New York, NY, USA, October 2020.","DOI":"10.1145\/3394171.3413802"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Yan, S., Xiong, Y., and Lin, D. (2018). Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition. arXiv.","DOI":"10.1609\/aaai.v32i1.12328"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"538","DOI":"10.1007\/s11390-020-0405-6","article-title":"Two-Stream Temporal Convolutional Networks for Skeleton-Based Human Action Recognition","volume":"35","author":"Jia","year":"2020","journal-title":"J. Comput. Sci. Technol."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Lea, C., Flynn, M.D., Vidal, R., Reiter, A., and Hager, G.D. (2017, January 22\u201325). Temporal convolutional networks for action segmentation and detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.113"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Tr\u0103sc\u0103u, M., Nan, M., and Florea, A.M. (2019). Spatio-Temporal Features in Action Recognition Using 3D Skeletal Joints. Sensors, 19.","DOI":"10.3390\/s19020423"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Ghi\u021b\u0103, A.\u0218., Gavril, A.F., Nan, M., Hoteit, B., Awada, I.A., Sorici, A., Mocanu, I.G., and Florea, A.M. (2020). The AMIRO Social Robotics Framework: Deployment and Evaluation on the Pepper Robot. Sensors, 20.","DOI":"10.3390\/s20247271"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Carreras, M., Deriu, G., Raffo, L., Benini, L., and Meloni, P. (2020). Optimizing Temporal Convolutional Network inference on FPGA-based accelerators. 
arXiv.","DOI":"10.1109\/JETCAS.2020.3014503"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Lara-Ben\u00edtez, P., Carranza-Garc\u00eda, M., Luna-Romera, J.M., and Riquelme, J.C. (2020). Temporal convolutional networks applied to energy-related time series forecasting. Appl. Sci., 10.","DOI":"10.20944\/preprints202003.0096.v1"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zhang, L., Shi, Z., Han, J., Shi, A., and Ma, D. (2020). FurcaNeXt: End-to-end monaural speech separation with dynamic gated dilated temporal convolutional networks. International Conference on Multimedia Modeling, Proceedings of the 26th International Conference, MMM 2020, Daejeon, Korea, 5\u20138 January 2020, Springer.","DOI":"10.1007\/978-3-030-37731-1_53"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"7432","DOI":"10.1109\/JIOT.2020.2984544","article-title":"Temporal Convolutional Networks for Multiperson Activity Recognition Using a 2-D LIDAR","volume":"7","author":"Luo","year":"2020","journal-title":"IEEE Internet Things J."},{"key":"ref_17","unstructured":"Li, S.J., AbuFarha, Y., Liu, Y., Cheng, M.M., and Gall, J. (2020). Ms-tcn++: Multi-stage temporal convolutional network for action segmentation. IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_18","first-page":"3656","article-title":"Spatiotemporal multi-graph convolution network for ride-hailing demand forecasting","volume":"33","author":"Geng","year":"2019","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Shi, L., Zhang, Y., Cheng, J., and Lu, H. (2019, January 16\u201320). Two-stream adaptive graph convolutional networks for skeleton-based action recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01230"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Li, M., Chen, S., Chen, X., Zhang, Y., Wang, Y., and Tian, Q. 
(2019, January 16\u201320). Actional-structural graph convolutional networks for skeleton-based action recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00371"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"3047","DOI":"10.1109\/TNNLS.2019.2935173","article-title":"Graph edge convolutional neural networks for skeleton-based action recognition","volume":"31","author":"Zhang","year":"2019","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_22","unstructured":"Oord, A.v.d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., and Kavukcuoglu, K. (2016). Wavenet: A generative model for raw audio. arXiv."},{"key":"ref_23","unstructured":"Aksan, E., and Hilliges, O. (2019). Stcn: Stochastic temporal convolutional networks. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Plizzari, C., Cannici, M., and Matteucci, M. (2020). Spatial temporal transformer network for skeleton-based action recognition. arXiv.","DOI":"10.1007\/978-3-030-68796-0_50"},{"key":"ref_25","unstructured":"Zhang, H., Wu, C., Zhang, Z., Zhu, Y., Zhang, Z., Lin, H., Sun, Y., He, T., Muller, J., and Manmatha, R. (2020). ResNeSt: Split-Attention Networks. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1586","DOI":"10.1109\/TIP.2017.2785279","article-title":"Skeleton-based human action recognition with global context-aware attention LSTM networks","volume":"27","author":"Liu","year":"2017","journal-title":"IEEE Trans. Image Process."},{"key":"ref_27","unstructured":"Li, C., Zhong, Q., Xie, D., and Pu, S. (2017). Skeleton-based Action Recognition with Convolutional Neural Networks. 
arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"3459","DOI":"10.1109\/TIP.2018.2818328","article-title":"Spatio-temporal attention-based LSTM networks for 3D action recognition and detection","volume":"27","author":"Song","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Veeriah, V., Zhuang, N., and Qi, G. (2015). Differential Recurrent Neural Networks for Action Recognition. arXiv.","DOI":"10.1109\/ICCV.2015.460"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Zhu, W., Lan, C., Xing, J., Zeng, W., Li, Y., Shen, L., and Xie, X. (2016). Co-occurrence Feature Learning for Skeleton based Action Recognition using Regularized Deep LSTM Networks. arXiv.","DOI":"10.1609\/aaai.v30i1.10451"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"2405","DOI":"10.1109\/TCSVT.2018.2864148","article-title":"Action Recognition with Spatio-Temporal Visual Attention on Skeleton Image Sequences","volume":"29","author":"Yang","year":"2018","journal-title":"IEEE Trans. Circ. Syst. Video Technol."},{"key":"ref_32","unstructured":"Peng, Y., Liu, Q., Lu, H., Sun, Z., Liu, C., Chen, X., Zha, H., and Yang, J. (2020). Graph-Temporal LSTM Networks for Skeleton-Based Action Recognition. Pattern Recognition and Computer Vision, Springer International Publishing."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Huang, J., Xiang, X., Gong, X., and Zhang, B. (2020, January 1\u20135). Long-Short Graph Memory Network for Skeleton-based Action Recognition. Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA.","DOI":"10.1109\/WACV45572.2020.9093598"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Si, C., Chen, W., Wang, W., Wang, L., and Tan, T. (2019, January 16\u201320). An attention enhanced graph convolutional lstm network for skeleton-based action recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00132"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Li, C., Cui, Z., Zheng, W., Xu, C., and Yang, J. (2018, January 13\u201319). Spatio-Temporal Graph Convolution for Skeleton Based Action Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, Stockholm, Sweden.","DOI":"10.1609\/aaai.v32i1.11776"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Feng, L., Yuan, Q., Liu, Y., Huang, Q., Liu, S., and Li, Y. (2020). A Discriminative STGCN for Skeleton Oriented Action Recognition. International Conference on Neural Information Processing, Proceedings of the 27th International Conference, ICONIP 2020, Bangkok, Thailand, 18\u201322 November 2020, Springer.","DOI":"10.1007\/978-3-030-63823-8_1"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Ghosh, P., Yao, Y., Davis, L., and Divakaran, A. (2020, January 1\u20135). Stacked spatio-temporal graph convolutional networks for action segmentation. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA.","DOI":"10.1109\/WACV45572.2020.9093361"},{"key":"ref_38","unstructured":"Bai, S., Kolter, J.Z., and Koltun, V. (2018). An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv."},{"key":"ref_39","unstructured":"Loshchilov, I., and Hutter, F. (2016). Sgdr: Stochastic gradient descent with warm restarts. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015, January 7\u201313). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.123"},{"key":"ref_41","unstructured":"Du, Y., Wang, W., and Wang, L. (2015, January 7\u201312). 
Hierarchical recurrent neural network for skeleton based action recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Liu, J., Shahroudy, A., Xu, D., and Wang, G. (2016). Spatio-temporal lstm with trust gates for 3D human action recognition. European Conference on Computer Vision, Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11\u201314 October 2016, Springer.","DOI":"10.1007\/978-3-319-46487-9_50"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Caetano, C., Br\u00e9mond, F., and Schwartz, W.R. (2019, January 28\u201330). Skeleton image representation for 3D action recognition based on tree structure and reference joints. Proceedings of the 32nd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Rio de Janeiro, Brazil.","DOI":"10.1109\/SIBGRAPI.2019.00011"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Caetano, C., Sena, J., Br\u00e9mond, F., Dos Santos, J.A., and Schwartz, W.R. (2019, January 18\u201321). Skelemotion: A new representation of skeleton joint sequences based on motion information for 3D action recognition. Proceedings of the 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Taipei, Taiwan.","DOI":"10.1109\/AVSS.2019.8909840"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"1963","DOI":"10.1109\/TPAMI.2019.2896631","article-title":"View adaptive neural networks for high performance skeleton-based human action recognition","volume":"41","author":"Zhang","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Si, C., Jing, Y., Wang, W., Wang, L., and Tan, T. (2018, January 8\u201314). Skeleton-based action recognition with spatial reasoning and temporal stack learning. 
Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01246-5_7"},{"key":"ref_47","unstructured":"Thakkar, K., and Narayanan, P. (2018). Part-based graph convolutional network for action recognition. arXiv."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Song, Y.F., Zhang, Z., and Wang, L. (2019, January 22\u201325). Richly activated graph convolutional network for action recognition with incomplete skeletons. Proceedings of the IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan.","DOI":"10.1109\/ICIP.2019.8802917"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Gao, X., Hu, W., Tang, J., Liu, J., and Guo, Z. (2019, January 21\u201325). Optimized skeleton-based action recognition via sparsified graph regression. Proceedings of the 27th ACM International Conference on Multimedia, Nice, France.","DOI":"10.1145\/3343031.3351170"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Shi, L., Zhang, Y., Cheng, J., and Lu, H. (2019, January 16\u201320). Skeleton-based action recognition with directed graph neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00810"},{"key":"ref_51","unstructured":"Papadopoulos, K., Ghorbel, E., Aouada, D., and Ottersten, B. (2019). Vertex feature encoding and hierarchical temporal modeling in a spatial-temporal graph convolutional network for action recognition. arXiv."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Zhang, P., Lan, C., Zeng, W., Xing, J., Xue, J., and Zheng, N. (2020, January 13\u201319). Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00119"},{"key":"ref_53","first-page":"11045","article-title":"Part-Level Graph Convolutional Network for Skeleton-Based Action Recognition","volume":"34","author":"Huang","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_54","first-page":"2669","article-title":"Learning Graph Convolutional Network for Skeleton-Based Human Action Recognition by Neural Searching","volume":"34","author":"Peng","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Das, S., Sharma, S., Dai, R., Bremond, F., and Thonnat, M. (2020). Vpn: Learning video-pose embedding for activities of daily living. European Conference on Computer Vision, Proceedings of the 16th European Conference, Glasgow, UK, 23\u201328 August 2020, Springer.","DOI":"10.1007\/978-3-030-58545-7_5"},{"key":"ref_56","unstructured":"Shi, L., Zhang, Y., Cheng, J., and Lu, H. (2020). Decoupled Spatial-Temporal Attention Network for Skeleton-Based Action Recognition. arXiv."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Liu, Z., Zhang, H., Chen, Z., Wang, Z., and Ouyang, W. (2020, January 14\u201319). Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00022"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/6\/2051\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:35:39Z","timestamp":1760160939000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/6\/2051"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3,15]]},"references-count":57,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2021,3]]}},"alternative-id":["s21062051"],"URL":"https:\/\/doi.org\/10.3390\/s21062051","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,3,15]]}}}