{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,21]],"date-time":"2026-01-21T10:47:08Z","timestamp":1768992428445,"version":"3.49.0"},"reference-count":57,"publisher":"MDPI AG","issue":"20","license":[{"start":{"date-parts":[[2021,10,12]],"date-time":"2021-10-12T00:00:00Z","timestamp":1633996800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Graph Convolutional Networks (GCNs) have attracted a lot of attention and shown remarkable performance for action recognition in recent years. For improving the recognition accuracy, how to build graph structure adaptively, select key frames and extract discriminative features are the key problems of this kind of method. In this work, we propose a novel Adaptive Attention Memory Graph Convolutional Networks (AAM-GCN) for human action recognition using skeleton data. We adopt GCN to adaptively model the spatial configuration of skeletons and employ Gated Recurrent Unit (GRU) to construct an attention-enhanced memory for capturing the temporal feature. With the memory module, our model can not only remember what happened in the past but also employ the information in the future using multi-bidirectional GRU layers. Furthermore, in order to extract discriminative temporal features, the attention mechanism is also employed to select key frames from the skeleton sequence. Extensive experiments on Kinetics, NTU RGB+D and HDM05 datasets show that the proposed network achieves better performance than some state-of-the-art methods.<\/jats:p>","DOI":"10.3390\/s21206761","type":"journal-article","created":{"date-parts":[[2021,10,13]],"date-time":"2021-10-13T06:38:41Z","timestamp":1634107121000},"page":"6761","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Adaptive Attention Memory Graph Convolutional Networks for Skeleton-Based Action Recognition"],"prefix":"10.3390","volume":"21","author":[{"given":"Di","family":"Liu","sequence":"first","affiliation":[{"name":"College of Information Sciences and Technology, Northeast Normal University, Changchun 130117, China"}]},{"given":"Hui","family":"Xu","sequence":"additional","affiliation":[{"name":"College of Information Sciences and Technology, Northeast Normal University, Changchun 130117, China"}]},{"given":"Jianzhong","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Information Sciences and Technology, Northeast Normal University, Changchun 130117, China"}]},{"given":"Yinghua","family":"Lu","sequence":"additional","affiliation":[{"name":"College of Information Sciences and Technology, Northeast Normal University, Changchun 130117, China"}]},{"given":"Jun","family":"Kong","sequence":"additional","affiliation":[{"name":"Institute for Intelligent Elderly Care, Changchun Humanities and Sciences College, Changchun 130117, China"},{"name":"Key Laboratory for Applied Statistics of MOE, Northeast Normal University, Changchun 130024, China"}]},{"given":"Miao","family":"Qi","sequence":"additional","affiliation":[{"name":"College of Information Sciences and Technology, Northeast Normal University, Changchun 130117, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,10,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Hu, J., Zhu, E., Wang, S., Liu, X., Guo, X., and Yin, J. (2019). An Efficient and Robust Unsupervised Anomaly Detection Method Using Ensemble Random Projection in Surveillance Videos. Sensors, 19.","DOI":"10.3390\/s19194145"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1272","DOI":"10.1109\/JPROC.2002.801449","article-title":"Integrating perceptual and cognitive modeling for adaptive and intelligent human-computer interaction","volume":"90","author":"Duric","year":"2002","journal-title":"Proc. IEEE"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.4018\/IJACI.2017100101","article-title":"Approaches and applications of virtual reality and gesture recognition: A review","volume":"8","author":"Sudha","year":"2017","journal-title":"Int. J. Ambient. Comput. Intell."},{"key":"ref_4","unstructured":"Simonyan, K., and Zisserman, A. (2014, January 8\u201313). Two-stream Convolutional Networks for Action Recognition in Videos. Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., and Van Gool, L. (2016, January 8\u201316). Temporal Segment Networks: Towards Good Practices for Deep Action Recognition. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Liu, J., Shahroudy, A., Xu, D., and Wang, G. (2016, January 8\u201316). Spatio-temporal LSTM with trust gates for 3D human action recognition. Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46487-9_50"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1016\/j.cviu.2017.01.011","article-title":"Space-time Representation of People based on 3D Skeletal Data: A Review","volume":"158","author":"Han","year":"2017","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1016\/j.patcog.2015.11.019","article-title":"3D skeleton-based human action classification: A survey","volume":"53","author":"Presti","year":"2016","journal-title":"Pattern Recognit."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Cao, Z., Simon, T., Wei, S.E., and Sheikh, Y. (2017). Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. arXiv.","DOI":"10.1109\/CVPR.2017.143"},{"key":"ref_10","unstructured":"Fang, H.S., Xie, S., Tai, Y.W., and Lu, C. (2021, March 23). RMPE: Regional Multi-Person Pose Estimation. ICCV. Available online: https:\/\/github.com\/MVIG-SJTU\/AlphaPose."},{"key":"ref_11","unstructured":"Du, Y., Wang, W., and Wang, L. (2015, January 7\u201312). Hierarchical recurrent neural network for skeleton based action recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Liu, J., Wang, G., Hu, P., Duan, L.Y., and Kot, A.C. (2017, January 21\u201326). Global context-aware attention lstm networks for 3d action recognition. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.391"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Xie, C., Li, C., Zhang, B., Chen, C., Han, J., Zou, C., and Liu, J. (2018). Memory attention networks for skeleton-based action recognition. arXiv.","DOI":"10.24963\/ijcai.2018\/227"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Fernando, B., Gavves, E., Oramas, J.M., Ghodrati, A., and Tuytelaars, T. (2015, January 7\u201312). Modeling Video Evolution for Action Recognition. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299176"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Wang, H., and Wang, L. (2017, January 21\u201326). Modeling temporal dynamics and spatial configurations of actions using two-stream recurrent neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.387"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Vemulapalli, R., Arrate, F., and Chellappa, R. (2014, January 23\u201328). Human action recognition by representing 3d skeletons as points in a lie group. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.82"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Li, M., Chen, S., Chen, X., Zhang, Y., Wang, Y., and Tian, Q. (2019, January 16\u201320). Actional-structural graph convolutional networks for skeleton-based action recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00371"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Si, C., Chen, W., Wang, W., Wang, L., and Tan, T. (2019, January 16\u201320). An attention enhanced graph convolutional lstm network for skeleton-based action recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00132"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Shi, L., Zhang, Y., Cheng, J., and Lu, H. (2019, January 16\u201320). Two-stream adaptive graph convolutional networks for skeleton-based action recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01230"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Yan, S., Xiong, Y., and Lin, D. (2018, January 2\u20137). Spatial temporal graph convolutional networks for skeleton-based action recognition. Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.12328"},{"key":"ref_21","unstructured":"Kipf, T.N., and Welling, M. (2016). Semi-Supervised Classification with Graph Convolutional Networks. arXiv."},{"key":"ref_22","unstructured":"Niepert, M., Ahmed, M., and Kutzkov, K. (2016, January 19\u201324). Learning Convolutional Neural Networks for Graphs. Proceedings of the International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Monti, F., Boscaini, D., Masci, J., Rodola, E., Svoboda, J., and Bronstein, M.M. (2017, January 21\u201326). Geometric Deep Learning on Graphs and Manifolds Using Mixture Model CNNs. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.576"},{"key":"ref_24","unstructured":"Li, B., Li, X., Zhang, Z., and Wu, F. (February, January 27). Spatio-temporal graph routing for skeleton-based action recognition. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Shi, L., Zhang, Y., Cheng, J., and Lu, H. (2019, January 16\u201320). Skeleton-based action recognition with directed graph neural networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00810"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Wen, Y., Gao, L., Fu, H., Zhang, F., and Xia, S. (February, January 27). Graph CNNs with motif and variable temporal block for skeleton-based action recognition. Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, HI, USA.","DOI":"10.1609\/aaai.v33i01.33018989"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Yang, W., Zhang, J., Cai, J., and Xu, Z. (2021). Shallow Graph Convolutional Network for Skeleton-Based Action Recognition. Sensors, 21.","DOI":"10.3390\/s21020452"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Shahroudy, A., Liu, J., Ng, T.T., and Wang, G. (2016, January 27\u201330). NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.115"},{"key":"ref_29","unstructured":"Kay, W., Carreira, J., Simonyan, K., Zhang, B., Hillier, C., Vijayanarasimhan, S., Viola, F., Green, T., Back, T., and Natsev, P. (2017). The Kinetics Human Action Video Dataset. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"M\u00fcller, M., R\u00f6der, T., Clausen, M., Eberhardt, B., and Kr\u00fcger, B.A. (2007). Weber: Documentation Mocap Database HDM05, Universit\u00e4t Bonn. Available online: http:\/\/resources.mpi-inf.mpg.de\/HDM05\/.","DOI":"10.36198\/9783838529523"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"2405","DOI":"10.1109\/TCSVT.2018.2864148","article-title":"Action Recognition with Spatio-Temporal Visual Attention on Skeleton Image Sequences","volume":"29","author":"Yang","year":"2018","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Baradel, F., Wolf, C., and Mille, J. (2017, January 22\u201329). Human Action Recognition: Pose-based Attention Draws Focus to Hands. Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), Venice, Italy.","DOI":"10.1109\/ICCVW.2017.77"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Li, C., Zhong, Q., Xie, D., and Pu, S. (2018, January 13\u201319). Co-Occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence IJCAI-18, Stockholm, Sweden.","DOI":"10.24963\/ijcai.2018\/109"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Kim, T.S., and Reiter, A. (2017, January 21\u201326). Interpretable 3D Human Action Analysis with Temporal Convolutional Networks. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.207"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1016\/j.patcog.2017.02.030","article-title":"Enhanced skeleton visualization for view invariant human action recognition","volume":"68","author":"Liu","year":"2017","journal-title":"Pattern Recognit."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Ke, Q., Bennamoun, M., An, S., Sohel, F., and Boussaid, F. (2017, January 21\u201326). A New Representation of Skeleton Sequences for 3D Action Recognition. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.486"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Li, W., Wen, L., Chang, M.C., Lim, S.N., and Lyu, S. (2017, January 22\u201329). Adaptive RNN tree for large-scale human action recognition. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.161"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Song, S., Lan, C., Xing, J., Zeng, W., and Liu, J. (2016). An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data. arXiv.","DOI":"10.1609\/aaai.v31i1.11212"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zhang, P., Lan, C., Xing, J., Zeng, W., Xue, J., and Zheng, N. (2017, January 22\u201329). View Adaptive Recurrent Neural Networks for High Performance Human Action Recognition From Skeleton Data. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.233"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Zhu, W., Lan, C., Xing, J., Zeng, W., Li, Y., Shen, L., and Xie, X. (2016, January 12\u201317). Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks. Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA.","DOI":"10.1609\/aaai.v30i1.10451"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Cho, K., Merrienboer, V.B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation. arXiv, Available online: https:\/\/arxiv.org\/abs\/1406.1078.","DOI":"10.3115\/v1\/D14-1179"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Liu, Z., Zhang, H., Chen, Z., and Wang, Z. (2020, January 14\u201319). Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition. Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual.","DOI":"10.1109\/CVPR42600.2020.00022"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Ye, F., Pu, S., Zhong, Q., and Li, C. (2020, January 12\u201316). Dynamic GCN: Context-enriched Topology Learning for Skeleton-based Action Recognition. Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA.","DOI":"10.1145\/3394171.3413941"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Plizzari, C., Cannici, M., and Matteucci, M. (2021). Spatial Temporal Transformer Network for Skeleton-Based Action Recognition. International Conference on Pattern Recognition, Springer.","DOI":"10.1007\/978-3-030-68796-0_50"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Chen, Y., Zhang, Z., and Yuan, C. (2021, January 11\u201317). Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Virtual.","DOI":"10.1109\/ICCV48922.2021.01311"},{"key":"ref_46","unstructured":"Sharma, S., Kiros, R., and Salakhutdinov, R. (2015). Action recognition using visual attention. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Luong, M.T., Pham, H., and Manning, C.D. (2015, January 17\u201321). Effective approaches to attention-based neural machine translation. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal.","DOI":"10.18653\/v1\/D15-1166"},{"key":"ref_48","unstructured":"Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., and Bengio, Y. (2015, January 6\u201311). Show, attend and tell: Neural image caption generation with visual attention. Proceedings of the 32nd International Conference on Machine Learning, Lille, France."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., and Courville, A. (2015, January 13\u201316). Describing videos by exploiting temporal structure. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.512"},{"key":"ref_50","unstructured":"Stollenga, M.F., Masci, J., Gomez, F., and Schmidhuber, J. (2014, January 8\u201313). Deep networks with internal selective attention through feedback connections. Proceedings of the Annual Conference on Neural Information Processing Systems 2014, Montreal, QC, Canada."},{"key":"ref_51","unstructured":"Wang, Y., Wang, S., Tang, J., O\u2019Hare, N., Chang, Y., and Li, B. (2016). Hierarchical Attention Network for Action Recognition in Videos. arXiv."},{"key":"ref_52","unstructured":"Goodfellow, I., Bengio, Y., and Courville, A. (2021, September 01). Deep Learning. MIT Press. Available online: http:\/\/www.deeplearningbook.org."},{"key":"ref_53","unstructured":"Cho, K., and Chen, X. (2014, January 5\u20138). Classifying and Visualizing Motion Capture Sequences using Deep Neural Networks. Proceedings of the 2014 International Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, Portugal."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1016\/j.neucom.2021.05.004","article-title":"Rethinking the ST-GCNs for 3D skeleton-based human action recognition","volume":"454","author":"Peng","year":"2021","journal-title":"Neurocomputing"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Pishchulin, L., Insafutdinov, E., and Tang, S. (2016, January 27\u201330). Deepcut: Joint Subset Partition and Labeling for Multi Person Pose Estimation. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.533"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Gao, X., Li, K., Miao, Q., and Sheng, L. (2019, January 9\u201311). 3D Skeleton-Based Video Action Recognition by Graph Convolution Network. Proceedings of the 2019 IEEE International Conference on Smart Internet of Things (SmartIoT), Tianjin, China.","DOI":"10.1109\/SmartIoT.2019.00093"},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"2129","DOI":"10.1109\/TCSVT.2019.2914137","article-title":"Action Recognition Scheme Based on Skeleton Representation with DS-LSTM Network","volume":"30","author":"Jiang","year":"2020","journal-title":"IEEE Trans. Circuits Syst. Video Technol."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/20\/6761\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:12:00Z","timestamp":1760166720000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/20\/6761"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,12]]},"references-count":57,"journal-issue":{"issue":"20","published-online":{"date-parts":[[2021,10]]}},"alternative-id":["s21206761"],"URL":"https:\/\/doi.org\/10.3390\/s21206761","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,12]]}}}