{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,15]],"date-time":"2026-03-15T10:51:45Z","timestamp":1773571905130,"version":"3.50.1"},"reference-count":48,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2022,12,15]],"date-time":"2022-12-15T00:00:00Z","timestamp":1671062400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Neurorobot."],"abstract":"<jats:p>Graph convolution networks (GCNs) have been widely used in the field of skeleton-based human action recognition. However, it is still difficult to improve recognition performance and reduce parameter complexity. In this paper, a novel multi-scale attention spatiotemporal GCN (MSA-STGCN) is proposed for human violence action recognition by learning spatiotemporal features from four different skeleton modality variants. Firstly, the original joint data are preprocessed to obtain joint position, bone vector, joint motion and bone motion datas as inputs of recognition framework. Then, a spatial multi-scale graph convolution network based on the attention mechanism is constructed to obtain the spatial features from joint nodes, while a temporal graph convolution network in the form of hybrid dilation convolution is designed to enlarge the receptive field of the feature map and capture multi-scale context information. Finally, the specific relationship in the different skeleton data is explored by fusing the information of multi-stream related to human joints and bones. To evaluate the performance of the proposed MSA-STGCN, a skeleton violence action dataset: Filtered NTU RGB+D was constructed based on NTU RGB+D120. We conducted experiments on constructed Filtered NTU RGB+D and Kinetics Skeleton 400 datasets to verify the performance of the proposed recognition framework. The proposed method achieves an accuracy of 95.3% on the Filtered NTU RGB+D with the parameters 1.21M, and an accuracy of 36.2% (Top-1) and 58.5% (Top-5) on the Kinetics Skeleton 400, respectively. The experimental results on these two skeleton datasets show that the proposed recognition framework can effectively recognize violence actions without adding parameters.<\/jats:p>","DOI":"10.3389\/fnbot.2022.1091361","type":"journal-article","created":{"date-parts":[[2022,12,15]],"date-time":"2022-12-15T06:48:57Z","timestamp":1671086937000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Multi-scale and attention enhanced graph convolution network for skeleton-based violence action recognition"],"prefix":"10.3389","volume":"16","author":[{"given":"Huaigang","family":"Yang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ziliang","family":"Ren","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huaqiang","family":"Yuan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wenhong","family":"Wei","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qieshi","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhaolong","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2022,12,15]]},"reference":[{"key":"B1","first-page":"16","article-title":"Skeleton image representation for 3D action recognition based on tree structure and reference joints,","volume-title":"2019 32nd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","author":"Caetano","year":""},{"key":"B2","first-page":"1","article-title":"Skelemotion: a new representation of skeleton joint sequences based on motion information for 3D action recognition,","volume-title":"2019 IEEE International Conference on Advanced Video and Signal-based Surveillance","author":"Caetano","year":""},{"key":"B3","doi-asserted-by":"publisher","first-page":"1095","DOI":"10.1109\/TCYB.2017.2756840","article-title":"Body joint guided 3-d deep convolutional descriptors for action recognition","volume":"48","author":"Cao","year":"2018","journal-title":"IEEE Trans. Cybern"},{"key":"B4","doi-asserted-by":"publisher","first-page":"172","DOI":"10.1109\/TPAMI.2019.2929257","article-title":"Openpose: realtime multi-person 2D pose estimation using part affinity fields","volume":"43","author":"Cao","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"B5","doi-asserted-by":"crossref","first-page":"4724","DOI":"10.1109\/CVPR.2017.502","article-title":"Quo vadis, action recognition? a new model and the kinetics dataset,","volume-title":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Carreira","year":"2017"},{"key":"B6","first-page":"13359","article-title":"Channel-wise topology refinement graph convolution for skeleton-based action recognition,","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Chen","year":"2021"},{"key":"B7","doi-asserted-by":"publisher","first-page":"1498","DOI":"10.1109\/TCSVT.2021.3076165","article-title":"Cross-modality compensation convolutional neural networks for RGB-D action recognition","volume":"32","author":"Cheng","year":"2021","journal-title":"IEEE Trans. Circ. Syst. Video Technol"},{"key":"B8","first-page":"183","article-title":"Skeleton-based action recognition with shift graph convolutional network,","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Cheng","year":"2020"},{"key":"B9","first-page":"20186","article-title":"Infogcn: representation learning for human skeleton-based action recognition,","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Chi","year":"2022"},{"key":"B10","doi-asserted-by":"publisher","first-page":"5442","DOI":"10.1109\/TIFS.2021.3130437","article-title":"Regina\u2013reasoning graph convolutional networks in human action recognition","volume":"16","author":"Degardin","year":"2021","journal-title":"IEEE Trans. Inf. Forensics Security"},{"key":"B11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/IJCNN55064.2022.9892660","article-title":"Violence detection and recognition from diverse video sources,","volume-title":"2022 International Joint Conference on Neural Networks (IJCNN)","author":"Gadelkarim","year":"2022"},{"key":"B12","doi-asserted-by":"crossref","first-page":"500","DOI":"10.1109\/SmartIoT.2019.00093","article-title":"3D skeleton-based video action recognition by graph convolution network,","volume-title":"2019 IEEE International Conference on Smart Internet of Things (SmartIoT)","author":"Gao","year":"2019"},{"key":"B13","doi-asserted-by":"publisher","first-page":"807","DOI":"10.1109\/TCSVT.2016.2628339","article-title":"Skeleton optical spectra-based action recognition using convolutional neural networks","volume":"28","author":"Hou","year":"2018","journal-title":"IEEE Trans. Circ. Syst. Video Technol"},{"key":"B14","doi-asserted-by":"publisher","first-page":"2011","DOI":"10.1109\/TPAMI.2019.2913372","article-title":"Squeeze-and-excitation networks","volume":"42","author":"Hu","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"B15","doi-asserted-by":"crossref","first-page":"2122","DOI":"10.1145\/3394171.3413666","article-title":"Spatio-temporal inception graph convolutional networks for skeleton-based action recognition,","volume-title":"Proceedings of the 28th ACM International Conference on Multimedia, MM '20","author":"Huang","year":"2020"},{"key":"B16","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1109\/TCSVT.2020.2975845","article-title":"Arbitrary-view human action recognition: a varying-view RGB-D action dataset","volume":"31","author":"Ji","year":"2021","journal-title":"IEEE Trans. Circ. Syst. Video Technol"},{"key":"B17","doi-asserted-by":"publisher","first-page":"2129","DOI":"10.1109\/TCSVT.2019.2914137","article-title":"Action recognition scheme based on skeleton representation with DS-LSTM network","volume":"30","author":"Jiang","year":"2020","journal-title":"IEEE Trans. Circ. Syst. Video Technol"},{"key":"B18","doi-asserted-by":"crossref","first-page":"1623","DOI":"10.1109\/CVPRW.2017.207","article-title":"Interpretable 3D human action analysis with temporal convolutional networks,","volume-title":"2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)","author":"Kim","year":"2017"},{"key":"B19","doi-asserted-by":"publisher","first-page":"8561","DOI":"10.1609\/aaai.v33i01.33018561","article-title":"Spatio-temporal graph routing for skeleton-based action recognition","volume":"33","author":"Li","year":"2019","journal-title":"Proc. AAAI Conf. Artif. Intell"},{"key":"B20","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1109\/THMS.2018.2883001","article-title":"Multiview-based 3-D action recognition using deep networks","volume":"49","author":"Li","year":"2019","journal-title":"IEEE Trans. Hum. Mach. Syst"},{"key":"B21","doi-asserted-by":"publisher","first-page":"4800","DOI":"10.1109\/TNNLS.2021.3061115","article-title":"Memory attention networks for skeleton-based action recognition","volume":"33","author":"Li","year":"2022","journal-title":"IEEE Trans. Neural Netw. Learn. Syst"},{"key":"B22","first-page":"782","article-title":"Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation,","volume-title":"IJCAI'18: Proceedings of the 27th International Joint Conference on Artificial Intelligence","author":"Li","year":"2018"},{"key":"B23","doi-asserted-by":"crossref","first-page":"3590","DOI":"10.1109\/CVPR.2019.00371","article-title":"Actional-structural graph convolutional networks for skeleton-based action recognition,","volume-title":"2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Li","year":"2019"},{"key":"B24","doi-asserted-by":"publisher","first-page":"3316","DOI":"10.1109\/TPAMI.2021.3053765","article-title":"Symbiotic graph neural networks for 3D skeleton-based human action recognition and motion prediction","volume":"44","author":"Li","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"B25","doi-asserted-by":"crossref","first-page":"5457","DOI":"10.1109\/CVPR.2018.00572","article-title":"Independently recurrent neural network (indrnn): building a longer and deeper RNN,","volume-title":"2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Li","year":"2018"},{"key":"B26","first-page":"13434","article-title":"Else-net: elastic semantic network for continual action recognition from skeleton data,","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Li","year":"2021"},{"key":"B27","doi-asserted-by":"publisher","first-page":"2684","DOI":"10.1109\/TPAMI.2019.2916873","article-title":"NTU RGB+D 120: a large-scale benchmark for 3D human activity understanding","volume":"42","author":"Liu","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"B28","doi-asserted-by":"publisher","first-page":"3007","DOI":"10.1109\/TPAMI.2017.2771306","article-title":"Skeleton-based action recognition using spatio-temporal LSTM network with trust gates","volume":"40","author":"Liu","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"B29","doi-asserted-by":"crossref","first-page":"140","DOI":"10.1109\/CVPR42600.2020.00022","article-title":"Disentangling and unifying graph convolutions for skeleton-based action recognition,","volume-title":"2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Liu","year":"2020"},{"key":"B30","doi-asserted-by":"publisher","first-page":"53942","DOI":"10.1109\/ACCESS.2020.2980996","article-title":"Multi-scale neural network with dilated convolutions for image deblurring","volume":"8","author":"Ople","year":"2020","journal-title":"IEEE Access"},{"key":"B31","doi-asserted-by":"publisher","first-page":"107560","DOI":"10.1109\/ACCESS.2019.2932114","article-title":"A review on state-of-the-art violence detection techniques","volume":"7","author":"Ramzan","year":"2019","journal-title":"IEEE Access"},{"key":"B32","doi-asserted-by":"publisher","first-page":"2945","DOI":"10.1109\/TIFS.2017.2725820","article-title":"Crowd violence detection using global motion-compensated lagrangian features and scale-sensitive video-level representation","volume":"12","author":"Senst","year":"2017","journal-title":"IEEE Trans. Inf. Forensics Security"},{"key":"B33","doi-asserted-by":"publisher","first-page":"4787","DOI":"10.1109\/TIP.2018.2845742","article-title":"Fight recognition in video using hough forests and 2D convolutional neural network","volume":"27","author":"Serrano","year":"2018","journal-title":"IEEE Trans. Image Process"},{"key":"B34","first-page":"7904","article-title":"Skeleton-based action recognition with directed graph neural networks,","volume-title":"2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Shi","year":""},{"key":"B35","first-page":"12018","article-title":"Two-stream adaptive graph convolutional networks for skeleton-based action recognition,","volume-title":"2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Shi","year":""},{"key":"B36","doi-asserted-by":"publisher","first-page":"9532","DOI":"10.1109\/TIP.2020.3028207","article-title":"Skeleton-based action recognition with multi-stream adaptive graph convolutional networks","volume":"29","author":"Shi","year":"2020","journal-title":"IEEE Trans. Image Process"},{"key":"B37","doi-asserted-by":"publisher","first-page":"663","DOI":"10.1109\/TNNLS.2020.2978942","article-title":"Host-parasite: Graph LSTM-in-LSTM for group activity recognition","volume":"32","author":"Shu","year":"2021","journal-title":"IEEE Trans. Neural Netw. Learn. Syst"},{"key":"B38","doi-asserted-by":"crossref","first-page":"1227","DOI":"10.1109\/CVPR.2019.00132","article-title":"An attention enhanced graph convolutional lstm network for skeleton-based action recognition,","author":"Si","year":"2019","journal-title":"2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)"},{"key":"B39","first-page":"1","article-title":"An end-to-end spatio-temporal attention model for human action recognition from skeleton data,","author":"Song","year":"2017","journal-title":"AAAI Conference on Artificial Intelligence"},{"key":"B40","doi-asserted-by":"publisher","first-page":"3459","DOI":"10.1109\/TIP.2018.2818328","article-title":"Spatio-temporal attention-based LSTM networks for 3D action recognition and detection","volume":"27","author":"Song","year":"2018","journal-title":"IEEE Trans. Image Process"},{"key":"B41","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2022.3157033","article-title":"Constructing stronger and faster baselines for skeleton-based action recognition","author":"Song","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"B42","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2022.3168137","article-title":"Joint-bone fusion graph convolutional network for semi-supervised skeleton action recognition","author":"Tu","year":"2022","journal-title":"IEEE Trans. Multimedia"},{"key":"B43","doi-asserted-by":"crossref","first-page":"3633","DOI":"10.1109\/CVPR.2017.387","article-title":"Modeling temporal dynamics and spatial configurations of actions using two-stream recurrent neural networks,","volume-title":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Wang","year":"2017"},{"key":"B44","doi-asserted-by":"crossref","first-page":"1451","DOI":"10.1109\/WACV.2018.00163","article-title":"Understanding convolution for semantic segmentation,","volume-title":"2018 IEEE Winter Conference on Applications of Computer Vision (WACV)","author":"Wang","year":"2018"},{"key":"B45","doi-asserted-by":"crossref","first-page":"1740","DOI":"10.1109\/ICCVW.2019.00216","article-title":"Spatial residual layer and dense connection block enhanced spatial temporal graph convolutional network for skeleton-based action recognition,","volume-title":"2019 IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW)","author":"Wu","year":"2019"},{"key":"B46","doi-asserted-by":"publisher","first-page":"1044","DOI":"10.1109\/LSP.2018.2841649","article-title":"Ensemble one-dimensional convolution neural networks for skeleton-based action recognition","volume":"25","author":"Xu","year":"2018","journal-title":"IEEE Signal Process. Lett"},{"key":"B47","first-page":"1","article-title":"Spatial temporal graph convolutional networks for skeleton-based action recognition,","volume-title":"2018 AAAI Conference on Artificial Intelligence","author":"Yan","year":"2018"},{"key":"B48","doi-asserted-by":"crossref","first-page":"1109","DOI":"10.1109\/CVPR42600.2020.00119","article-title":"Semantics-guided neural networks for efficient skeleton-based human action recognition,","volume-title":"2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Zhang","year":"2020"}],"container-title":["Frontiers in Neurorobotics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2022.1091361\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,15]],"date-time":"2022-12-15T06:49:29Z","timestamp":1671086969000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2022.1091361\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,15]]},"references-count":48,"alternative-id":["10.3389\/fnbot.2022.1091361"],"URL":"https:\/\/doi.org\/10.3389\/fnbot.2022.1091361","relation":{},"ISSN":["1662-5218"],"issn-type":[{"value":"1662-5218","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,15]]},"article-number":"1091361"}}