{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T04:25:11Z","timestamp":1773980711928,"version":"3.50.1"},"reference-count":46,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2023,1,30]],"date-time":"2023-01-30T00:00:00Z","timestamp":1675036800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"the National Key Research and Development Program of China","award":["2018YFC0407905"],"award-info":[{"award-number":["2018YFC0407905"]}]},{"name":"the 14th Five-Year Plan for Educational Science of Jiangsu Province","award":["D\/2021\/01\/39"],"award-info":[{"award-number":["D\/2021\/01\/39"]}]},{"name":"the Jiangsu Higher Education Reform Research Project","award":["2021JSJG143"],"award-info":[{"award-number":["2021JSJG143"]}]},{"name":"the 2022 Undergraduate Practice Teaching Reform Research Project of Hohai University"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>3D human action recognition is crucial in broad industrial application scenarios such as robotics, video surveillance, autonomous driving, or intellectual education, etc. In this paper, we present a new point cloud sequence network called PointMapNet for 3D human action recognition. In PointMapNet, two point cloud feature maps symmetrical to depth feature maps are proposed to summarize appearance and motion representations from point cloud sequences. Specifically, we first convert the point cloud frames to virtual action frames using static point cloud techniques. The virtual action frame is a 1D vector used to characterize the structural details in the point cloud frame. Then, inspired by feature map-based human action recognition on depth sequences, two point cloud feature maps are symmetrically constructed to recognize human action from the point cloud sequence, i.e., Point Cloud Appearance Map (PCAM) and Point Cloud Motion Map (PCMM). To construct PCAM, an MLP-like network architecture is designed and used to capture the spatio-temporal appearance feature of the human action in a virtual action sequence. To construct PCMM, the MLP-like network architecture is used to capture the motion feature of the human action in a virtual action difference sequence. Finally, the two point cloud feature map descriptors are concatenated and fed to a fully connected classifier for human action recognition. In order to evaluate the performance of the proposed approach, extensive experiments are conducted. 
The proposed method achieves impressive results on three benchmark datasets, namely NTU RGB+D 60 (89.4% cross-subject and 96.7% cross-view), UTD-MHAD (91.61%), and MSR Action3D (91.91%). The experimental results outperform existing state-of-the-art point cloud sequence classification networks, demonstrating the effectiveness of our method.<\/jats:p>","DOI":"10.3390\/sym15020363","type":"journal-article","created":{"date-parts":[[2023,1,30]],"date-time":"2023-01-30T07:34:41Z","timestamp":1675064081000},"page":"363","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["PointMapNet: Point Cloud Feature Map Network for 3D Human Action Recognition"],"prefix":"10.3390","volume":"15","author":[{"given":"Xing","family":"Li","sequence":"first","affiliation":[{"name":"The Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, Nanjing 211100, China"},{"name":"School of Computer and Information, Hohai University, Nanjing 211100, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5625-0402","authenticated-orcid":false,"given":"Qian","family":"Huang","sequence":"additional","affiliation":[{"name":"The Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, Nanjing 211100, China"},{"name":"School of Computer and Information, Hohai University, Nanjing 211100, China"}]},{"given":"Yunfei","family":"Zhang","sequence":"additional","affiliation":[{"name":"The Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, Nanjing 211100, China"},{"name":"School of Computer and Information, Hohai University, Nanjing 211100, China"}]},{"given":"Tianjin","family":"Yang","sequence":"additional","affiliation":[{"name":"The Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, Nanjing 211100, China"},{"name":"School of Computer and Information, Hohai University, Nanjing 
211100, China"}]},{"given":"Zhijian","family":"Wang","sequence":"additional","affiliation":[{"name":"The Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, Nanjing 211100, China"},{"name":"School of Computer and Information, Hohai University, Nanjing 211100, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Yang, W., Zhang, J., Cai, J., and Xu, Z. (2021). Relation Selective Graph Convolutional Network for Skeleton-Based Action Recognition. Symmetry, 13.","DOI":"10.3390\/sym13122275"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Yang, X., Zhang, C., and Tian, Y. (2012). Recognizing Actions Using Depth Motion Maps-Based Histograms of Oriented Gradients, Association for Computing Machinery.","DOI":"10.1145\/2393347.2396382"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/34.910878","article-title":"The recognition of human movement using temporal templates","volume":"23","author":"Bobick","year":"2001","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Shahroudy, A., Liu, J., Ng, T.T., and Wang, G. (2016, January 27\u201330). Ntu rgb+ d: A large scale dataset for 3d human activity analysis. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.115"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1806","DOI":"10.1109\/TSMC.2018.2850149","article-title":"Deep Convolutional Neural Networks for Human Action Recognition Using Depth Maps and Postures","volume":"49","author":"Kamel","year":"2019","journal-title":"IEEE Trans. Syst. Man Cybern. Syst."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Li, X., Shuai, B., and Tighe, J. (2020, January 23\u201328). Directional temporal modeling for action recognition. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58539-6_17"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Wang, Y., Xiao, Y., Xiong, F., Jiang, W., Cao, Z., Zhou, J.T., and Yuan, J. (2020, January 14\u201319). 3dv: 3d dynamic voxel for action recognition in depth video. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00059"},{"key":"ref_9","unstructured":"Liu, X., Yan, M., and Bohg, J. (November, January 27). Meteornet: Deep learning on dynamic 3d point cloud sequences. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_10","unstructured":"Fan, H., Yu, X., Ding, Y., Yang, Y., and Kankanhalli, M. (2022). PSTNet: Point spatio-temporal convolution on point cloud sequences. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Fan, H., Yang, Y., and Kankanhalli, M. (2021, January 19\u201325). Point 4d transformer networks for spatio-temporal modeling in point cloud videos. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01398"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wang, J., Liu, Z., Chorowski, J., Chen, Z., and Wu, Y. 
(2012, January 7\u201313). Robust 3d action recognition with random occupancy patterns. Proceedings of the European Conference on Computer Vision, Florence, Italy.","DOI":"10.1007\/978-3-642-33709-3_62"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Liu, H., He, Q., and Liu, M. (2017, January 5\u20139). Human action recognition using adaptive hierarchical depth motion maps and gabor filter. Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7952393"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Su, H., Maji, S., Kalogerakis, E., and Learned-Miller, E. (2015, January 7\u201313). Multi-View Convolutional Neural Networks for 3D Shape Recognition. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.114"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zhou, Y., and Tuzel, O. (2018, January 18\u201322). VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00472"},{"key":"ref_16","unstructured":"Qi, C.R., Su, H., Mo, K., and Guibas, L.J. (2017, January 21\u201326). Pointnet: Deep learning on point sets for 3d classification and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA."},{"key":"ref_17","unstructured":"Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. Proceedings of the Advances in Neural Information Processing Systems."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Xu, Y., Fan, T., Xu, M., Zeng, L., and Qiao, Y. (2018, January 8\u201314). 
SpiderCNN: Deep Learning on Point Sets with Parameterized Convolutional Filters. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01237-3_6"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1145\/3326362","article-title":"Dynamic Graph CNN for Learning on Point Clouds","volume":"38","author":"Wang","year":"2019","journal-title":"ACM Trans. Graph."},{"key":"ref_20","unstructured":"Zhang, K., Hao, M., Wang, J., de Silva, C.W., and Fu, C. (2019). Linked Dynamic Graph CNN: Learning on Point Cloud via Linking Hierarchical Features. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Jiang, M., Wu, Y., Zhao, T., Zhao, Z., and Lu, C. (2018). PointSIFT: A SIFT-like Network Module for 3D Point Cloud Semantic Segmentation. arXiv.","DOI":"10.1109\/IGARSS.2019.8900102"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2247","DOI":"10.1109\/TPAMI.2007.70711","article-title":"Actions as Space-Time Shapes","volume":"29","author":"Gorelick","year":"2007","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Wang, H., and Schmid, C. (2013, January 1\u20138). Action Recognition with Improved Trajectories. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia.","DOI":"10.1109\/ICCV.2013.441"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Luo, W., Yang, B., and Urtasun, R. (2018, January 18\u201322). Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting With a Single Convolutional Net. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00376"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Choy, C., Gwak, J., and Savarese, S. (2019, January 15\u201320). 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00319"},{"key":"ref_26","first-page":"24261","article-title":"Mlp-mixer: An all-mlp architecture for vision","volume":"34","author":"Tolstikhin","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Chen, C., Jafari, R., and Kehtarnavaz, N. (2015, January 27\u201330). UTD-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor. Proceedings of the 2015 IEEE International conference on image processing (ICIP), Quebec City, QC, Canada.","DOI":"10.1109\/ICIP.2015.7350781"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Li, W., Zhang, Z., and Liu, Z. (2010, January 13\u201318). Action recognition based on a bag of 3d points. Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, San Francisco, CA, USA.","DOI":"10.1109\/CVPRW.2010.5543273"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Oreifej, O., and Liu, Z. (2013, January 23\u201328). Hon4d: Histogram of oriented 4d normals for activity recognition from depth sequences. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.98"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1016\/j.ins.2018.12.050","article-title":"Action recognition for depth video using multi-view dynamic images","volume":"480","author":"Xiao","year":"2019","journal-title":"Inf. Sci."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1729881418825093","DOI":"10.1177\/1729881418825093","article-title":"Hierarchical dynamic depth projected difference images\u2013based action recognition in videos with convolutional neural networks","volume":"16","author":"Wu","year":"2019","journal-title":"Int. J. 
Adv. Robot. Syst."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1051","DOI":"10.1109\/TMM.2018.2818329","article-title":"Depth Pooling Based Large-Scale 3-D Action Recognition With Convolutional Neural Networks","volume":"20","author":"Wang","year":"2018","journal-title":"IEEE Trans. Multimed."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"24119","DOI":"10.1007\/s11042-022-12091-z","article-title":"3dfcnn: Real-time action recognition using 3d deep neural networks with raw depth information","volume":"81","author":"Sarker","year":"2022","journal-title":"Multimed. Tools Appl."},{"key":"ref_34","unstructured":"Sanchez-Caballero, A., Fuentes-Jimenez, D., and Losada-Guti\u00e9rrez, C. (2020). Exploiting the convlstm: Human action recognition using raw depth video-based recurrent neural networks. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1963","DOI":"10.1109\/TPAMI.2019.2896631","article-title":"View Adaptive Neural Networks for High Performance Skeleton-Based Human Action Recognition","volume":"41","author":"Zhang","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Zhang, P., Lan, C., Zeng, W., Xing, J., Xue, J., and Zheng, N. (2020, January 14\u201319). Semantics-guided neural networks for efficient skeleton-based human action recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00119"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Shi, L., Zhang, Y., Cheng, J., and Lu, H. (2019, January 15\u201320). Skeleton-based action recognition with directed graph neural networks. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00810"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Cheng, K., Zhang, Y., Cao, C., Shi, L., Cheng, J., and Lu, H. (2020, January 23\u201328). Decoupling gcn with dropgraph module for skeleton-based action recognition. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58586-0_32"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Yan, S., Xiong, Y., and Lin, D. (2018, January 2\u20137). Spatial temporal graph convolutional networks for skeleton-based action recognition. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.12328"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"4648","DOI":"10.1109\/TIP.2017.2718189","article-title":"Action Recognition Using 3D Histograms of Texture and A Multi-Class Boosting Classifier","volume":"26","author":"Zhang","year":"2017","journal-title":"IEEE Trans. Image Process."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"5275","DOI":"10.1109\/TIP.2018.2855438","article-title":"Information Fusion for Human Action Recognition via Biset\/Multiset Globality Locality Preserving Canonical Correlation Analysis","volume":"27","author":"Elmadany","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Rahmani, H., Mahmood, A., Du Huynh, Q., and Mian, A. (2014, January 6\u201312). HOPC: Histogram of oriented principal components of 3D pointclouds for action recognition. 
Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10605-2_48"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"135118","DOI":"10.1109\/ACCESS.2020.3006067","article-title":"Depth Sequential Information Entropy Maps and Multi-Label Subspace Learning for Human Action Recognition","volume":"8","author":"Yang","year":"2020","journal-title":"IEEE Access"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1250","DOI":"10.1109\/TCSVT.2021.3077512","article-title":"Spatiotemporal Multimodal Learning With 3D CNNs for Video Action Recognition","volume":"32","author":"Wu","year":"2022","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Xia, L., and Aggarwal, J. (2013, January 23\u201328). Spatio-temporal depth cuboid similarity feature for activity recognition using depth camera. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.365"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Xie, S., Sun, C., Huang, J., Tu, Z., and Murphy, K. (2018, January 8\u201314). Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification. 
Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01267-0_19"}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/15\/2\/363\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:19:31Z","timestamp":1760120371000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/15\/2\/363"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,30]]},"references-count":46,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["sym15020363"],"URL":"https:\/\/doi.org\/10.3390\/sym15020363","relation":{},"ISSN":["2073-8994"],"issn-type":[{"value":"2073-8994","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,30]]}}}