{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T08:34:45Z","timestamp":1777365285570,"version":"3.51.4"},"reference-count":51,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2025,12,15]],"date-time":"2025-12-15T00:00:00Z","timestamp":1765756800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computation"],"abstract":"<jats:p>Skeleton-based action recognition has achieved remarkable advances with graph convolutional networks (GCNs). However, most existing models process spatial and temporal information within a single coupled stream, which often obscures the distinct patterns of joint configuration and motion dynamics. This paper introduces the Dual-Path Cross-Attention Graph Convolutional Network (DPCA-GCN), an architecture that explicitly separates spatial and temporal modeling into two specialized pathways while maintaining rich bidirectional interaction between them. The spatial branch integrates graph convolution and spatial transformers to capture intra-frame joint relationships, whereas the temporal branch combines temporal convolution and temporal transformers to model inter-frame dependencies. A bidirectional cross-attention mechanism facilitates explicit information exchange between both paths, and an adaptive gating module balances their respective contributions according to the action context. Unlike traditional approaches that process spatial\u2013temporal information sequentially, our dual-path design enables specialized processing while maintaining cross-modal coherence through memory-efficient chunked attention mechanisms. 
Extensive experiments on the NTU RGB+D 60 and NTU RGB+D 120 datasets demonstrate that DPCA-GCN achieves competitive joint-only accuracies of 88.72%\/94.31% and 82.85%\/83.65%, respectively, with exceptional top-5 scores of 96.97%\/99.14% and 95.59%\/95.96%, while maintaining significantly lower computational complexity compared to multi-modal approaches.<\/jats:p>","DOI":"10.3390\/computation13120293","type":"journal-article","created":{"date-parts":[[2025,12,15]],"date-time":"2025-12-15T13:32:37Z","timestamp":1765805557000},"page":"293","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["DPCA-GCN: Dual-Path Cross-Attention Graph Convolutional Networks for Skeleton-Based Action Recognition"],"prefix":"10.3390","volume":"13","author":[{"given":"Khadija","family":"Lasri","sequence":"first","affiliation":[{"name":"I3A Laboratory, Department of Computer Sciences, Faculty of Sciences, Dhar El Mahraz, Sidi Mohamed ben Abdellah University, Fez 30050, Morocco"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6333-2235","authenticated-orcid":false,"given":"Khalid","family":"El Fazazy","sequence":"additional","affiliation":[{"name":"I3A Laboratory, Department of Computer Sciences, Faculty of Sciences, Dhar El Mahraz, Sidi Mohamed ben Abdellah University, Fez 30050, Morocco"}]},{"given":"Adnane","family":"Mohamed Mahraz","sequence":"additional","affiliation":[{"name":"I3A Laboratory, Department of Computer Sciences, Faculty of Sciences, Dhar El Mahraz, Sidi Mohamed ben Abdellah University, Fez 30050, Morocco"}]},{"given":"Hamid","family":"Tairi","sequence":"additional","affiliation":[{"name":"I3A Laboratory, Department of Computer Sciences, Faculty of Sciences, Dhar El Mahraz, Sidi Mohamed ben Abdellah University, Fez 30050, Morocco"}]},{"given":"Jamal","family":"Riffi","sequence":"additional","affiliation":[{"name":"I3A Laboratory, Department of Computer Sciences, Faculty of Sciences, Dhar El Mahraz, Sidi 
Mohamed ben Abdellah University, Fez 30050, Morocco"}]}],"member":"1968","published-online":{"date-parts":[[2025,12,15]]},"reference":[{"key":"ref_1","first-page":"59562","article-title":"A Comprehensive Survey of RGB-Based and Skeleton-Based Human Action Recognition","volume":"11","author":"Wang","year":"2023","journal-title":"IEEE Access"},{"key":"ref_2","unstructured":"Zhang, H., Zhang, Y., Zhong, B., Lei, Q., Yang, L., Du, J., and Chen, D. (2023). A comprehensive survey of vision-based human action recognition methods. Sensors, 23."},{"key":"ref_3","unstructured":"Zhao, H., Torralba, A., and Gupta, A. (November, January 27). Making the Invisible Visible: Action Recognition Through Walls and Occlusions. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"0100","DOI":"10.34133\/cbsystems.0100","article-title":"A Survey on 3D Skeleton-Based Action Recognition Using Learning Method","volume":"5","author":"Ren","year":"2024","journal-title":"Cyborg Bionic Syst."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"914","DOI":"10.1109\/TPAMI.2013.198","article-title":"Learning actionlet ensemble for 3D human action recognition","volume":"36","author":"Wang","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Xia, L., Chen, C.-C., and Aggarwal, J.K. (2012, January 16\u201321). View invariant human action recognition using histograms of 3D joints. 
Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA.","DOI":"10.1109\/CVPRW.2012.6239233"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"6957","DOI":"10.1016\/j.eswa.2015.04.039","article-title":"Hybrid classifier based human activity recognition using the silhouette and cells","volume":"42","author":"Vishwakarma","year":"2015","journal-title":"Expert Syst. Appl."},{"key":"ref_8","unstructured":"Kipf, T.N., and Welling, M. (2017, January 24\u201326). Semi-supervised classification with graph convolutional networks. Proceedings of the International Conference on Learning Representations, Toulon, France."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Yan, S., Xiong, Y., and Lin, D. (2018, January 2\u20137). Spatial temporal graph convolutional networks for skeleton-based action recognition. Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.12328"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Shi, L., Zhang, Y., Cheng, J., and Lu, H. (2019, January 15\u201320). Two-stream adaptive graph convolutional networks for skeleton-based action recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01230"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Ye, Y., Liu, P., and Wang, X. (2020, January 12\u201316). Dynamic GCN: Context-enriched topology learning for skeleton-based action recognition. Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA.","DOI":"10.1145\/3394171.3413941"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Li, M., Chen, S., Chen, X., Zhang, Y., Wang, Y., and Tian, Q. (2019, January 15\u201320). Actional-Structural Graph Convolutional Networks for Skeleton-Based Action Recognition. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00371"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Cheng, K., Zhang, Y., He, X., Chen, W., Cheng, J., and Lu, H. (2020, January 13\u201319). Skeleton-Based Action Recognition with Shift Graph Convolutional Network. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00026"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Chen, Y., Zhang, Z., Yuan, C., Li, B., Deng, Y., and Hu, W. (2021, January 10\u201317). Channel-wise topology refinement graph convolution for skeleton-based action recognition. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01311"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Chi, N., Hou, Z., Xu, H., and Song, J. (2022, January 18\u201324). InfoGCN: Representation learning for human skeleton-based action recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01955"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Kilic, U., Oztimur Karadag, O., and Tumuklu Ozyer, G. (2025). AGMS-GCN: Attention-Guided Multi-Scale Graph Convolutional Networks for Skeleton-Based Action Recognition. Knowl.-Based Syst., 113045.","DOI":"10.1016\/j.knosys.2025.113045"},{"key":"ref_17","unstructured":"Wang, S., Li, B.Z., Khabsa, M., Fang, H., and Ma, H. (2020). Linformer: Self-attention with linear complexity. arXiv."},{"key":"ref_18","unstructured":"Beltagy, I., Peters, M.E., and Cohan, A. (2020). Longformer: The long-document transformer. 
arXiv."},{"key":"ref_19","first-page":"109173","article-title":"Multi-scale temporal graph neural network for skeleton-based action recognition","volume":"135","author":"Feng","year":"2023","journal-title":"Pattern Recognit."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Liu, D., Chen, P., Yao, M., Lu, Y., Cai, Z., and Tian, Y. (2023). TSGCNeXt: Dynamic-static graph convolution for efficient skeleton-based action recognition. arXiv.","DOI":"10.2139\/ssrn.4984425"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zhou, Y., Yan, X., Cheng, Z.-Q., Yan, Y., Dai, Q., and Hua, X.-S. (2024, January 16\u201322). BlockGCN: Redefine topology awareness for skeleton-based action recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.00200"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2477","DOI":"10.1109\/TIP.2024.3378886","article-title":"Deformable graph convolutional networks for skeleton-based action recognition","volume":"33","author":"Myung","year":"2024","journal-title":"IEEE Trans. Image Process."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Li, Y., and Li, N. (2025, January 28\u201330). DF-GCN: A Lightweight Dynamic Dual-Stream Graph Convolutional Network for Skeleton-Based Action Recognition. Proceedings of the IEEE 3rd International Conference on Image Processing and Computer Applications (ICIPCA), Shenyang, China.","DOI":"10.1109\/ICIPCA65645.2025.11138799"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"4982","DOI":"10.1038\/s41598-025-87752-8","article-title":"Two-stream spatio-temporal GCN-transformer networks for skeleton-based action recognition","volume":"15","author":"Chen","year":"2025","journal-title":"Sci. 
Rep."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"42757","DOI":"10.1007\/s11042-025-20653-0","article-title":"Hybrid attention-inflated 3D architecture for human action recognition","volume":"84","author":"Lasri","year":"2025","journal-title":"Multimed. Tools Appl."},{"key":"ref_26","first-page":"113","article-title":"Mutually reinforcing motion-pose framework for pose invariant action recognition","volume":"11","author":"Ramanathan","year":"2019","journal-title":"Int. J. Biom."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Lasri, K., Riffi, J., Fazazy, K.E., Mahraz, A.M., and Tairi, H. (2024, January 29\u201330). Comparative Study of I3D and SlowFast Networks for Spatial-Temporal Video Action Recognition. Proceedings of the 4th International Conference on Advances in Communication Technology and Computer Engineering (ICACTCE\u201924), Meknes, Morocco.","DOI":"10.1007\/978-3-031-94623-3_38"},{"key":"ref_28","first-page":"651","article-title":"Spatial Temporal Transformer Network for Skeleton-Based Action Recognition","volume":"12663","author":"Cucchiara","year":"2021","journal-title":"Proceedings of the Pattern Recognition. ICPR International Workshops and Challenges (ICPR 2021), Virtual, 10-15 January 2021"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Si, C., Chen, W., Wang, W., Wang, L., and Tan, T. (2019, January 15\u201320). An attention enhanced graph convolutional LSTM network for skeleton-based action recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00132"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"103219","DOI":"10.1016\/j.cviu.2021.103219","article-title":"Skeleton-based action recognition via spatial and temporal transformer networks","volume":"208","author":"Plizzari","year":"2021","journal-title":"Comput. Vis. 
Image Underst."},{"key":"ref_31","unstructured":"Shi, L., Zhang, Y., Cheng, J., and Lu, H. (December, January 30). Decoupled Spatial-Temporal Attention Network for Skeleton-Based Action Recognition. Proceedings of the Asian Conference on Computer Vision (ACCV), Kyoto, Japan."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Ahn, D., Kim, S., Hong, H., and Ko, B.C. (2023, January 2\u20137). STAR-Transformer: A Spatio-Temporal Cross-Attention Transformer for Human Action Recognition. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA.","DOI":"10.1109\/WACV56688.2023.00333"},{"key":"ref_33","unstructured":"Bertasius, G., Wang, H., and Torresani, L. (2021, January 18\u201324). Is space-time attention all you need for video understanding? Proceedings of the International Conference on Machine Learning, Virtual."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Arnab, A., Dehghani, M., Heigold, G., Sun, C., Lu\u010di\u0107, M., and Schmid, C. (2021, January 11\u201317). ViViT: A video vision transformer. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00676"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Fan, H., Xiong, B., Mangalam, K., Li, Y., Yan, Z., Malik, J., and Feichtenhofer, C. (2021, January 11\u201317). Multiscale Vision Transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00675"},{"key":"ref_36","unstructured":"Feichtenhofer, C., Fan, H., Malik, J., and He, K. (November, January 27). SlowFast networks for video recognition. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Liu, Z., Zhang, H., Chen, Z., Wang, Z., and Ouyang, W. (2020, January 13\u201319). 
Disentangling and unifying graph convolutions for skeleton-based action recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00022"},{"key":"ref_38","unstructured":"Duan, H., Wang, J., Chen, K., and Lin, D. (2022). DG-STGCN: Dynamic spatial-temporal graph convolutional network for skeleton-based action recognition. arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Wang, J., Bergeret, E., and Falih, I. (2024). Skeleton-Based Action Recognition with Spatial-Structural Graph Convolution. arXiv.","DOI":"10.1109\/IJCNN60899.2024.10651306"},{"key":"ref_40","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS), Long Beach, CA, USA."},{"key":"ref_41","unstructured":"Chen, T., Xu, B., Zhang, C., and Guestrin, C. (2016). Training deep nets with sublinear memory cost. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Shahroudy, A., Liu, J., Ng, T.-T., and Wang, G. (2016, January 27\u201330). NTU RGB+D: A large scale dataset for 3D human activity analysis. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.115"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"2684","DOI":"10.1109\/TPAMI.2019.2916873","article-title":"NTU RGB+D 120: A large-scale benchmark for 3D human activity understanding","volume":"42","author":"Liu","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Xing, Y., and Zhu, J. (2021). Deep Learning-Based Action Recognition with 3D Skeleton: A Survey. CAAI Trans. Intell. 
Technol.","DOI":"10.1049\/cit2.12014"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Liu, J., Shahroudy, A., Xu, D., and Wang, G. (2016, January 11\u201314). Spatio-temporal LSTM with trust gates for 3D human action recognition. Proceedings of the European Conference Computer Vision\u2014ECCV, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46487-9_50"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1963","DOI":"10.1109\/TPAMI.2019.2896631","article-title":"View adaptive neural networks for high performance skeleton-based human action recognition","volume":"41","author":"Zhang","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Tang, Y., Tian, Y., Lu, J., Li, P., and Zhou, J. (2018, January 18\u201323). Deep progressive reinforcement learning for skeleton-based action recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00558"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Yu, F., Chen, H., Wang, X., Xian, W., Chen, Y., Liu, F., Madhavan, V., and Darrell, T. (2020, January 14\u201319). BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA. Available online: https:\/\/openaccess.thecvf.com\/content_CVPR_2020\/html\/Yu_BDD100K_A_Diverse_Driving_Dataset_for_Heterogeneous_Multitask_Learning_CVPR_2020_paper.html.","DOI":"10.1109\/CVPR42600.2020.00271"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Song, Y.-F., Zhang, Z., and Wang, L. (2019, January 22\u201325). Richly activated graph convolutional network for action recognition with incomplete skeletons. 
Proceedings of the IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan.","DOI":"10.1109\/ICIP.2019.8802917"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Zhang, P., Lan, C., Zeng, W., Xing, J., Xue, J., and Zheng, N. (2020, January 13\u201319). Semantics-guided neural networks for efficient skeleton-based human action recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00119"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Zhang, P., Lan, C., Xing, J., Zeng, W., Xue, J., and Zheng, N. (2017, January 22\u201329). View adaptive recurrent neural networks for high performance human action recognition from skeleton data. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.233"}],"container-title":["Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-3197\/13\/12\/293\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,15]],"date-time":"2025-12-15T14:02:24Z","timestamp":1765807344000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-3197\/13\/12\/293"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,15]]},"references-count":51,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["computation13120293"],"URL":"https:\/\/doi.org\/10.3390\/computation13120293","relation":{},"ISSN":["2079-3197"],"issn-type":[{"value":"2079-3197","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,15]]}}}