{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,29]],"date-time":"2025-10-29T03:47:29Z","timestamp":1761709649469,"version":"build-2065373602"},"reference-count":79,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2019,10,21]],"date-time":"2019-10-21T00:00:00Z","timestamp":1571616000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100009928","name":"Higher Committee for Education Development in Iraq","doi-asserted-by":"publisher","award":["2015"],"award-info":[{"award-number":["2015"]}],"id":[{"id":"10.13039\/501100009928","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Imaging"],"abstract":"<jats:p>Human action recognition (HAR) is an important yet challenging task. This paper presents a novel method. First, fuzzy weight functions are used in computations of depth motion maps (DMMs). Multiple length motion information is also used. These features are referred to as fuzzy weighted multi-resolution DMMs (FWMDMMs). This formulation allows for various aspects of individual actions to be emphasized. It also helps to characterise the importance of the temporal dimension. This is important to help overcome, e.g., variations in time over which a single type of action might be performed. A deep convolutional neural network (CNN) motion model is created and trained to extract discriminative and compact features. Transfer learning is also used to extract spatial information from RGB and depth data using the AlexNet network. Different late fusion techniques are then investigated to fuse the deep motion model with the spatial network. The result is a spatial temporal HAR model. The developed approach is capable of recognising both human action and human\u2013object interaction. 
Three public domain datasets are used to evaluate the proposed solution. The experimental results demonstrate the robustness of this approach compared with state-of-the-art algorithms.<\/jats:p>","DOI":"10.3390\/jimaging5100082","type":"journal-article","created":{"date-parts":[[2019,10,21]],"date-time":"2019-10-21T11:37:55Z","timestamp":1571657875000},"page":"82","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Deep Learning of Fuzzy Weighted Multi-Resolution Depth Motion Maps with Spatial Feature Fusion for Action Recognition"],"prefix":"10.3390","volume":"5","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4376-6228","authenticated-orcid":false,"given":"Mahmoud","family":"Al-Faris","sequence":"first","affiliation":[{"name":"School of Energy and Electronic Engineering, University of Portsmouth, Portsmouth PO1 3DJ, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9081-4136","authenticated-orcid":false,"given":"John","family":"Chiverton","sequence":"additional","affiliation":[{"name":"School of Energy and Electronic Engineering, University of Portsmouth, Portsmouth PO1 3DJ, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1047-2274","authenticated-orcid":false,"given":"Yanyan","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Computing, University of Portsmouth, Portsmouth PO1 3DJ, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1125-1978","authenticated-orcid":false,"given":"David","family":"Ndzi","sequence":"additional","affiliation":[{"name":"School of Computing, Engineering and Physical Sciences, University of the West of Scotland, Paisley PA1 2BE, 
UK"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,10,21]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1909","DOI":"10.1109\/TMM.2015.2477242","article-title":"A continuous learning framework for activity recognition using deep hybrid feature models","volume":"17","author":"Hasan","year":"2015","journal-title":"IEEE Trans. Multimed."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1016\/j.imavis.2016.04.004","article-title":"3D-based deep convolutional neural network for action recognition with depth sequences","volume":"55","author":"Liu","year":"2016","journal-title":"Image Vis. Comput."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1145\/2398356.2398381","article-title":"Real-time human pose recognition in parts from single depth images","volume":"56","author":"Shotton","year":"2013","journal-title":"Commun. ACM"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Yang, X., and Tian, Y. (2014, January 23\u201328). Super normal vector for activity recognition using depth sequences. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.108"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1527","DOI":"10.1162\/neco.2006.18.7.1527","article-title":"A fast learning algorithm for deep belief nets","volume":"18","author":"Hinton","year":"2006","journal-title":"Neural Comput."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun","year":"1998","journal-title":"Proc. 
IEEE"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"374","DOI":"10.1016\/j.neucom.2016.12.027","article-title":"Generalized extreme learning machine autoencoder and a new deep neural network","volume":"230","author":"Sun","year":"2017","journal-title":"Neurocomputing"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Le, Q.V., Zou, W.Y., Yeung, S.Y., and Ng, A.Y. (2011, January 20\u201325). Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA.","DOI":"10.1109\/CVPR.2011.5995496"},{"key":"ref_10","first-page":"1","article-title":"Deep learning for sensor-based activity recognition: A Survey","volume":"0167-8655","author":"Wang","year":"2018","journal-title":"Pattern Recognit. Lett."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"2352","DOI":"10.1162\/neco_a_00990","article-title":"Deep convolutional neural networks for Image Classification: A Comprehensive Review","volume":"29","author":"Rawat","year":"2017","journal-title":"Neural Comput."},{"key":"ref_12","first-page":"49","article-title":"Deep learning for environmentally robust speech recognition: An overview of recent developments","volume":"9","author":"Zhang","year":"2018","journal-title":"ACM Trans. Intell. Syst. Technol. (TIST)"},{"key":"ref_13","unstructured":"Goyal, S., and Benjamin, P. (2014). Object recognition using deep neural networks: A survey. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Sun, L., Jia, K., Chan, T.H., Fang, Y., Wang, G., and Yan, S. (2014, January 23\u201328). DL-SFA: Deeply-learned slow feature analysis for action recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.336"},{"key":"ref_15","unstructured":"Wang, P., Li, W., Gao, Z., Zhang, J., Tang, C., and Ogunbona, P. (2015). Deep convolutional neural networks for action recognition using depth map sequences. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1016\/j.cviu.2018.04.007","article-title":"RGB-D-based human motion recognition with deep learning: A survey","volume":"171","author":"Wang","year":"2018","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_17","unstructured":"Yang, X., Zhang, C., and Tian, Y. (November, January 29). Recognizing actions using depth motion maps-based histograms of oriented gradients. Proceedings of the 20th ACM international conference on Multimedia, Nara, Japan."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1007\/s11554-013-0370-1","article-title":"Real-time human action recognition based on depth motion maps","volume":"12","author":"Chen","year":"2016","journal-title":"J. Real Time Image Process."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"22590","DOI":"10.1109\/ACCESS.2017.2759058","article-title":"Multi-Temporal depth motion maps-Based local binary patterns for 3-D human action recognition","volume":"5","author":"Chen","year":"2017","journal-title":"IEEE Access"},{"key":"ref_20","unstructured":"Chen, C., Liu, M., Zhang, B., Han, J., Jiang, J., and Liu, H. (2016, January 9\u201315). 3D Action Recognition Using Multi-Temporal depth motion maps and Fisher Vector. 
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI \u201916), New York, NY, USA."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1109\/MMUL.2012.24","article-title":"Microsoft kinect sensor and its effect","volume":"19","author":"Zhang","year":"2012","journal-title":"IEEE Multimed."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Haggag, H., Hossny, M., Filippidis, D., Creighton, D., Nahavandi, S., and Puri, V. (2013, January 16\u201318). Measuring depth accuracy in RGB-D cameras. Proceedings of the 2013 7th International Conference on Signal Processing and Communication Systems (ICSPCS), Carrara, VIC, Australia.","DOI":"10.1109\/ICSPCS.2013.6723971"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1016\/j.fcij.2017.11.002","article-title":"Depth-based human activity recognition: A comparative perspective study on feature extraction","volume":"3","author":"Ali","year":"2017","journal-title":"Future Comput. Inform. J."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Ye, M., Zhang, Q., Wang, L., Zhu, J., Yang, R., and Gall, J. (2013). A Survey on Human Motion Analysis from Depth Data. Time-of-Flight and Depth Imaging. Sensors, Algorithms, and Applications, Springer.","DOI":"10.1007\/978-3-642-44964-2_8"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Chen, C., Jafari, R., and Kehtarnavaz, N. (2015, January 5\u20139). Action recognition from depth sequences using depth motion maps-based local binary patterns. Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA.","DOI":"10.1109\/WACV.2015.150"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"El Madany, N.E.D., He, Y., and Guan, L. (2015, January 19\u201321). human action recognition using temporal hierarchical pyramid of depth motion map and keca. 
Proceedings of the 2015 IEEE 17th International Workshop on Multimedia Signal Processing (MMSP), Xiamen, China.","DOI":"10.1109\/MMSP.2015.7340857"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Vemulapalli, R., Arrate, F., and Chellappa, R. (2014, January 23\u201328). human action recognition by representing 3D skeletons as points in a lie group. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.82"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1016\/j.imavis.2014.04.005","article-title":"Evaluating spatiotemporal interest point features for depth-based action recognition","volume":"32","author":"Zhu","year":"2014","journal-title":"Image Vis. Comput."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Wang, J., Liu, Z., and Wu, Y. (2014). Learning Actionlet Ensemble for 3D human action recognition. Human Action Recognition with Depth Cameras, Springer.","DOI":"10.1007\/978-3-319-04561-0_2"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Xia, L., and Aggarwal, J. (2013, January 23\u201328). Spatio-temporal depth cuboid similarity feature for activity recognition using depth camera. Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.365"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1028","DOI":"10.1109\/TPAMI.2016.2565479","article-title":"Super normal vector for human activity recognition with depth cameras","volume":"39","author":"Yang","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"438","DOI":"10.1016\/j.neucom.2017.08.063","article-title":"Deep appearance and motion learning for egocentric activity recognition","volume":"275","author":"Wang","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Vella, F., Augello, A., Maniscalco, U., Bentivenga, V., and Gaglio, S. (December, January 28). Classification of Indoor Actions through Deep Neural Networks. Proceedings of the 2016 12th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), Naples, Italy.","DOI":"10.1109\/SITIS.2016.22"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Gowda, S.N. (2017, January 21\u201326). Human activity recognition using combinatorial Deep Belief Networks. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.203"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1109\/TPAMI.2012.59","article-title":"3D convolutional neural networks for human action recognition","volume":"35","author":"Ji","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Wu, H., and Gu, X. (2015, January 9\u201312). Max-pooling dropout for regularization of convolutional neural networks. Proceedings of the International Conference on Neural Information Processing, Istanbul, Turkey.","DOI":"10.1007\/978-3-319-26532-2_6"},{"key":"ref_37","unstructured":"Yu, K., Xu, W., and Gong, Y. (2009). Deep Learning with Kernel Regularization for Visual Recognition. Advances in Neural Information Processing Systems, Neural Information Processing Systems Foundation, Inc. 
(NIPS)."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1016\/j.patrec.2017.08.015","article-title":"Going deeper with two-stream ConvNets for action recognition in video surveillance","volume":"107","author":"Han","year":"2018","journal-title":"Pattern Recognit. Lett."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Sun, L., Jia, K., Yeung, D.Y., and Shi, B.E. (2015, January 7\u201313). human action recognition using factorized spatio-temporal convolutional networks. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.522"},{"key":"ref_40","unstructured":"Simonyan, K., and Zisserman, A. (2014). Two-Stream Convolutional Networks for Action Recognition in Videos. Advances in Neural Information Processing Systems, Neural Information Processing Systems Foundation, Inc. (NIPS)."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Wang, L., Qiao, Y., and Tang, X. (2015, January 7\u201312). Action recognition with trajectory-pooled deep-convolutional descriptors. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299059"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Park, E., Han, X., Berg, T.L., and Berg, A.C. (2016, January 7\u201310). Combining multiple sources of knowledge in deep cnns for action recognition. Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA.","DOI":"10.1109\/WACV.2016.7477589"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Feichtenhofer, C., Pinz, A., and Wildes, R.P. (2017, January 21\u201326). Spatiotemporal multiplier networks for video action recognition. 
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.787"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Feichtenhofer, C., Pinz, A., and Zisserman, A. (July, January 26). Convolutional Two-Stream Network Fusion for Video Action Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.213"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., and Fei-Fei, L. (2014, January 23\u201328). Large-scale video classification with convolutional neural networks. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.223"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"773","DOI":"10.1109\/JSEN.2015.2487358","article-title":"A real-time human action recognition system using depth and inertial sensor fusion","volume":"16","author":"Chen","year":"2016","journal-title":"IEEE Sens. J."},{"key":"ref_47","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012). Imagenet Classification with Deep convolutional neural networks. Advances in Neural Information Processing Systems, Neural Information Processing Systems Foundation, Inc. (NIPS)."},{"key":"ref_48","unstructured":"Thaker, D., and Krishnakumar, K. (2017, September 01). k-Shot Learning for Action Recognition. Available online: https:\/\/pdfs.semanticscholar.org\/7576\/8ff4129ca6cd122c5ca729e9cfc66cc798fe.pdf."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Wang, J., Nie, X., Xia, Y., Wu, Y., and Zhu, S.C. (2014, January 23\u201328). Cross-view action modeling, learning and recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.339"},{"key":"ref_50","unstructured":"Wang, J., Liu, Z., Wu, Y., and Yuan, J. (2012, January 16\u201321). Mining actionlet ensemble for action recognition with depth cameras. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Li, W., Zhang, Z., and Liu, Z. (2010, January 13\u201318). Action recognition based on a bag of 3D points. Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), San Francisco, CA, USA.","DOI":"10.1109\/CVPRW.2010.5543273"},{"key":"ref_52","unstructured":"Li, R., and Zickler, T. (2012, January 16\u201321). Discriminative virtual views for cross-view action recognition. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA."},{"key":"ref_53","unstructured":"Li, B., Camps, O.I., and Sznaier, M. (2012, January 16\u201321). Cross-view activity recognition using hankelets. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Sadanand, S., and Corso, J.J. (2012, January 16\u201321). Action bank: A high-level representation of activity in video. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6247806"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Maji, S., Bourdev, L., and Malik, J. (2011, January 20\u201325). Action recognition from a distributed representation of pose and appearance. 
Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA.","DOI":"10.1109\/CVPR.2011.5995631"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1016\/j.patcog.2017.12.004","article-title":"Tensor-based linear dynamical systems for action recognition from 3D skeletons","volume":"77","author":"Ding","year":"2018","journal-title":"Pattern Recognit."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Wang, J., and Liu, Y. (2018, January 4\u20138). Kinematics Features for 3D Action Recognition Using Two-Stream CNN. Proceedings of the 2018 13th World Congress on Intelligent Control and Automation (WCICA), Changsha, China.","DOI":"10.1109\/WCICA.2018.8630333"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"667","DOI":"10.1109\/TPAMI.2017.2691768","article-title":"Learning a deep model for human action recognition from novel viewpoints","volume":"40","author":"Rahmani","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Demisse, G., Papadopoulos, K., Aouada, D., and Ottersten, B. (2018, January 18\u201322). Pose encoding for robust skeleton-based action recognition. Proceedings of the Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPRW.2018.00056"},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Baptista, R., Ghorbel, E., Papadopoulos, K., Demisse, G.G., Aouada, D., and Ottersten, B. (2019, January 12\u201317). View-invariant Action Recognition from RGB Data via 3D Pose Estimation. Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019), Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8682904"},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Lee, I., Kim, D., Kang, S., and Lee, S. (2017, January 22\u201329). 
Ensemble deep learning for skeleton-based action recognition using temporal sliding lstm networks. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.115"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Xia, L., Chen, C.C., and Aggarwal, J. (2012, January 16\u201321). View invariant human action recognition using histograms of 3D joints. Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Providence, RI, USA.","DOI":"10.1109\/CVPRW.2012.6239233"},{"key":"ref_63","unstructured":"Padilla-L\u00f3pez, J.R., Chaaraoui, A.A., and Fl\u00f3rez-Revuelta, F. (2014). A discussion on the validation tests employed to compare human action recognition methods using the MSR action 3D dataset. arXiv."},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"786","DOI":"10.1016\/j.eswa.2013.08.009","article-title":"Evolutionary joint selection to improve human action recognition with RGB-D devices","volume":"41","author":"Chaaraoui","year":"2014","journal-title":"Expert Syst. Appl."},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1016\/j.patrec.2013.07.011","article-title":"On the improvement of human action recognition from depth map sequences using space\u2013time occupancy patterns","volume":"36","author":"Vieira","year":"2014","journal-title":"Pattern Recognit. Lett."},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"498","DOI":"10.1109\/THMS.2015.2504550","article-title":"Action recognition from depth maps using deep convolutional neural networks","volume":"46","author":"Wang","year":"2016","journal-title":"IEEE Trans. Hum. Mach. Syst."},{"key":"ref_67","unstructured":"Du, Y., Wang, W., and Wang, L. (2015, January 7\u201312). Hierarchical recurrent neural network for skeleton based action recognition. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA."},{"key":"ref_68","first-page":"439","article-title":"Latent max-margin multitask learning with skelets for 3-D action recognition","volume":"47","author":"Yang","year":"2016","journal-title":"IEEE Trans. Cybern."},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Tomas, A., and Biswas, K. (2017, January 4\u20136). Human activity recognition using combined deep architectures. Proceedings of the 2017 IEEE 2nd International Conference on Signal and Image Processing (ICSIP), Singapore.","DOI":"10.1109\/SIPROCESS.2017.8124502"},{"key":"ref_70","doi-asserted-by":"crossref","first-page":"1197","DOI":"10.1007\/s11760-018-1271-3","article-title":"Combining 2D and 3D deep models for action recognition with depth information","volume":"12","author":"Kaya","year":"2018","journal-title":"Signal Image Video Process."},{"key":"ref_71","first-page":"77","article-title":"Action recognition using vague division DMMs","volume":"2017","author":"Jin","year":"2017","journal-title":"J. Eng."},{"key":"ref_72","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1016\/j.cviu.2018.03.003","article-title":"Exploiting deep residual networks for human action recognition from skeletal data","volume":"170","author":"Pham","year":"2018","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Yin, X., and Chen, Q. (2016, January 16\u201321). Deep metric learning autoencoder for nonlinear temporal alignment of human motion. Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden.","DOI":"10.1109\/ICRA.2016.7487366"},{"key":"ref_74","doi-asserted-by":"crossref","unstructured":"Klaser, A., Marsza\u0142ek, M., and Schmid, C. (2008, January 1\u20134). A spatio-temporal descriptor based on 3D-gradients. 
Proceedings of the 19th British Machine Vision Conference, Leeds, UK.","DOI":"10.5244\/C.22.99"},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Wang, J., Liu, Z., Chorowski, J., Chen, Z., and Wu, Y. (2012). Robust 3D Action Recognition with Random Occupancy Patterns. Computer Vision\u2014ECCV 2012, Springer.","DOI":"10.1007\/978-3-642-33709-3_62"},{"key":"ref_76","unstructured":"Doll\u00e1r, P., Rabaud, V., Cottrell, G., and Belongie, S. (2005, January 15\u201316). Behavior recognition via sparse spatio-temporal features. Proceedings of the 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, Beijing, China."},{"key":"ref_77","doi-asserted-by":"crossref","unstructured":"Oreifej, O., and Liu, Z. (2013, January 23\u201328). Hon4d: Histogram of oriented 4d normals for activity recognition from depth sequences. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.98"},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Asadi-Aghbolaghi, M., Bertiche, H., Roig, V., Kasaei, S., and Escalera, S. (2017, January 22\u201329). Action recognition from RGB-D data: Comparison and fusion of spatio-temporal handcrafted features and deep strategies. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCVW.2017.376"},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Mavroudi, E., Tao, L., and Vidal, R. (2017, January 24\u201331). Deep moving poselets for video based action recognition. 
Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, USA.","DOI":"10.1109\/WACV.2017.20"}],"container-title":["Journal of Imaging"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2313-433X\/5\/10\/82\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:28:12Z","timestamp":1760189292000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2313-433X\/5\/10\/82"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,21]]},"references-count":79,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2019,10]]}},"alternative-id":["jimaging5100082"],"URL":"https:\/\/doi.org\/10.3390\/jimaging5100082","relation":{},"ISSN":["2313-433X"],"issn-type":[{"type":"electronic","value":"2313-433X"}],"subject":[],"published":{"date-parts":[[2019,10,21]]}}}