{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T01:07:53Z","timestamp":1760231273424,"version":"build-2065373602"},"reference-count":84,"publisher":"MDPI AG","issue":"18","license":[{"start":{"date-parts":[[2022,9,9]],"date-time":"2022-09-09T00:00:00Z","timestamp":1662681600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Korea government (MSIT)","award":["2017-0-00897","2018-0-01290"],"award-info":[{"award-number":["2017-0-00897","2018-0-01290"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Depth video sequence-based deep models for recognizing human actions are scarce compared to RGB and skeleton video sequences-based models. This scarcity limits the research advancements based on depth data, as training deep models with small-scale data is challenging. In this work, we propose a sequence classification deep model using depth video data for scenarios when the video data are limited. Unlike summarizing the frame contents of each frame into a single class, our method can directly classify a depth video, i.e., a sequence of depth frames. Firstly, the proposed system transforms an input depth video into three sequences of multi-view temporal motion frames. Together with the three temporal motion sequences, the input depth frame sequence offers a four-stream representation of the input depth action video. Next, the DenseNet121 architecture is employed along with ImageNet pre-trained weights to extract the discriminating frame-level action features of depth and temporal motion frames. The extracted four sets of feature vectors about frames of four streams are fed into four bi-directional (BLSTM) networks. The temporal features are further analyzed through multi-head self-attention (MHSA) to capture multi-view sequence correlations. Finally, the concatenated genre of their outputs is processed through dense layers to classify the input depth video. 
The experimental results on two small-scale benchmark depth datasets, MSRAction3D and DHA, demonstrate that the proposed framework is efficacious even for insufficient training samples and superior to the existing depth data-based action recognition methods.<\/jats:p>","DOI":"10.3390\/s22186841","type":"journal-article","created":{"date-parts":[[2022,9,13]],"date-time":"2022-09-13T04:05:41Z","timestamp":1663041941000},"page":"6841","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["A Deep Sequence Learning Framework for Action Recognition in Small-Scale Depth Video Dataset"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1691-8695","authenticated-orcid":false,"given":"Mohammad Farhad","family":"Bulbul","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Pohang University of Science and Technology (POSTECH), 77 Cheongam, Pohang 37673, Korea"},{"name":"Department of Mathematics, Jashore University of Science and Technology, Jashore 7408, Bangladesh"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7538-2689","authenticated-orcid":false,"given":"Amin","family":"Ullah","sequence":"additional","affiliation":[{"name":"CORIS Institute, Oregon State University, Corvallis, OR 97331, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3058-5794","authenticated-orcid":false,"given":"Hazrat","family":"Ali","sequence":"additional","affiliation":[{"name":"College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha P.O. Box 34110, Qatar"}]},{"given":"Daijin","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Pohang University of Science and Technology (POSTECH), 77 Cheongam, Pohang 37673, Korea"}]}],"member":"1968","published-online":{"date-parts":[[2022,9,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Shaikh, M.B., and Chai, D. (2021). Rgb-d data-based action recognition: A review. Sensors, 21.","DOI":"10.20944\/preprints202101.0369.v1"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"458","DOI":"10.26599\/TST.2019.9010018","article-title":"Survey of pedestrian action recognition techniques for autonomous driving","volume":"25","author":"Chen","year":"2020","journal-title":"Tsinghua Sci. Technol."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Dawar, N., and Kehtarnavaz, N. (2017, January 17\u201320). Continuous detection and recognition of actions of interest among actions of non-interest using a depth camera. Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China.","DOI":"10.1109\/ICIP.2017.8297079"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Zhu, H., Vial, R., and Lu, S. (2017, January 22\u201329). Tornado: A spatio-temporal convolutional regression network for video action proposal. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.619"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"8895","DOI":"10.3390\/s140508895","article-title":"A vision-based system for intelligent monitoring: Human behaviour analysis and privacy by context","volume":"14","author":"Chaaraoui","year":"2014","journal-title":"Sensors"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Wei, H., Laszewski, M., and Kehtarnavaz, N. (2018, January 12). 
Deep learning-based person detection and classification for far field video surveillance. Proceedings of the 2018 IEEE 13th Dallas Circuits and Systems Conference (DCAS), Dallas, TX, USA.","DOI":"10.1109\/DCAS.2018.8620111"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/34.910878","article-title":"The recognition of human movement using temporal templates","volume":"23","author":"Bobick","year":"2001","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_8","unstructured":"Doll\u00e1r, P., Rabaud, V., Cottrell, G., and Belongie, S. (2005, January 15\u201316). Behavior recognition via sparse spatio-temporal features. Proceedings of the 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, Beijing, China."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Laptev, I., Marszalek, M., Schmid, C., and Rozenfeld, B. (2008, January 24\u201326). Learning realistic human actions from movies. Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA.","DOI":"10.1109\/CVPR.2008.4587756"},{"key":"ref_10","unstructured":"Liu, J., and Shah, M. (2008, January 24\u201326). Learning human actions via information maximization. Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Wu, H., Ma, X., and Li, Y. (2019). Hierarchical dynamic depth projected difference images\u2013based action recognition in videos with convolutional neural networks. Int. J. Adv. Robot. Syst., 16.","DOI":"10.1177\/1729881418825093"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Shen, X., and Ding, Y. (2022). Human skeleton representation for 3D action recognition based on complex network coding and LSTM. J. Vis. Commun. Image Represent., 82.","DOI":"10.1016\/j.jvcir.2021.103386"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Tasnim, N., Islam, M.K., and Baek, J.H. (2021). Deep learning based human activity recognition using spatio-temporal image formation of skeleton joints. Appl. Sci., 11.","DOI":"10.3390\/app11062675"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"LeCun, Y., Kavukcuoglu, K., and Farabet, C. (June, January 30). Convolutional networks and applications in vision. Proceedings of the 2010 IEEE International Symposium on Circuits and Systems, Paris, France.","DOI":"10.1109\/ISCAS.2010.5537907"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Tran, D., Bourdev, L., Fergus, R., Torresani, L., and Paluri, M. (2015, January 7\u201313). Learning spatiotemporal features with 3d convolutional networks. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.510"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Wang, L., Qiao, Y., and Tang, X. (2015, January 7\u201312). Action recognition with trajectory-pooled deep-convolutional descriptors. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299059"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Du, Y., Fu, Y., and Wang, L. (2015, January 3\u20136). Skeleton based action recognition with convolutional neural network. 
Proceedings of the 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Kuala Lumpur, Malaysia.","DOI":"10.1109\/ACPR.2015.7486569"},{"key":"ref_18","first-page":"568","article-title":"Two-stream convolutional networks for action recognition in videos","volume":"Volume 27","author":"Simonyan","year":"2014","journal-title":"Advances in Neural Information Processing Systems"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., and Van Gool, L. (2016). Temporal segment networks: Towards good practices for deep action recognition. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"807","DOI":"10.1109\/TCSVT.2016.2628339","article-title":"Skeleton optical spectra-based action recognition using convolutional neural networks","volume":"28","author":"Hou","year":"2016","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Ke, Q., Bennamoun, M., An, S., Sohel, F., and Boussaid, F. (2017, January 21\u201326). A new representation of skeleton sequences for 3d action recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.486"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Pham, H.H., Salmane, H., Khoudour, L., Crouzil, A., Zegers, P., and Velastin, S.A. (2019). Spatio\u2013temporal image representation of 3D skeletal movements for view-invariant action recognition with deep convolutional neural networks. Sensors, 19.","DOI":"10.20944\/preprints201903.0086.v1"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Tasnim, N., Islam, M., and Baek, J.H. (2020). Deep learning-based action recognition using 3D skeleton joints information. Inventions, 5.","DOI":"10.3390\/inventions5030049"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"3459","DOI":"10.1109\/TIP.2018.2818328","article-title":"Spatio-temporal attention-based LSTM networks for 3D action recognition and detection","volume":"27","author":"Song","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1007\/s00530-020-00677-2","article-title":"Deep learning-based multi-modal approach using RGB and skeleton sequences for human activity recognition","volume":"26","author":"Verma","year":"2020","journal-title":"Multimed. Syst."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"3835","DOI":"10.1109\/TIP.2020.2965299","article-title":"View-invariant deep architecture for human action recognition using two-stream motion and shape temporal dynamics","volume":"29","author":"Dhiman","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Yang, W., Zhang, J., Cai, J., and Xu, Z. (2022). HybridNet: Integrating GCN and CNN for skeleton-based action recognition. Appl. Intell., 1\u201312.","DOI":"10.1007\/s10489-022-03436-0"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"9875","DOI":"10.1007\/s11042-022-11937-w","article-title":"Deep learning network model based on fusion of spatiotemporal features for action recognition","volume":"81","author":"Yang","year":"2022","journal-title":"Multimed. Tools Appl."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Tasnim, N., and Baek, J.H. (2022). 
Deep Learning-Based Human Action Recognition with Key-Frames Sampling Using Ranking Methods. Appl. Sci., 12.","DOI":"10.3390\/app12094165"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"24119","DOI":"10.1007\/s11042-022-12091-z","article-title":"3DFCNN: Real-time action recognition using 3d deep neural networks with raw depth information","volume":"81","author":"Sarker","year":"2022","journal-title":"Multimed. Tools Appl."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Trelinski, J., and Kwolek, B. (2021, January 8\u201310). Embedded Features for 1D CNN-based Action Recognition on Depth Maps. Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Online.","DOI":"10.5220\/0010340105360543"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1051","DOI":"10.1109\/TMM.2018.2818329","article-title":"Depth pooling based large-scale 3-d action recognition with convolutional neural networks","volume":"20","author":"Wang","year":"2018","journal-title":"IEEE Trans. Multimed."},{"key":"ref_33","first-page":"1060916","article-title":"Action recognition in depth video from RGB perspective: A knowledge transfer manner","volume":"Volume 10609","author":"Chen","year":"2018","journal-title":"MIPPR 2017: Pattern Recognition and Computer Vision"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Imran, J., and Kumar, P. (2016, January 21\u201324). Human action recognition using RGB-D sensor and deep convolutional neural networks. Proceedings of the 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Jaipur, India.","DOI":"10.1109\/ICACCI.2016.7732038"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Treli\u0144ski, J., and Kwolek, B. (2020, January 23\u201326). Ensemble of Multi-channel CNNs for Multi-class Time-Series Classification. Depth-Based Human Activity Recognition. Proceedings of the Asian Conference on Intelligent Information and Database Systems, Phuket, Thailand.","DOI":"10.1007\/978-3-030-41964-6_39"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"14551","DOI":"10.1007\/s00521-021-06097-1","article-title":"CNN-based and DTW features for human activity recognition on depth maps","volume":"33","author":"Trelinski","year":"2021","journal-title":"Neural Comput. Appl."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"498","DOI":"10.1109\/THMS.2015.2504550","article-title":"Action recognition from depth maps using deep convolutional neural networks","volume":"46","author":"Wang","year":"2015","journal-title":"IEEE Trans. Hum.-Mach. Syst."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Shahroudy, A., Liu, J., Ng, T.T., and Wang, G. (2016, January 27\u201330). Ntu rgb+ d: A large scale dataset for 3d human activity analysis. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.115"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"2684","DOI":"10.1109\/TPAMI.2019.2916873","article-title":"Ntu rgb+ d 120: A large-scale benchmark for 3d human activity understanding","volume":"42","author":"Liu","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1250","DOI":"10.1109\/TCSVT.2021.3077512","article-title":"Spatiotemporal multimodal learning with 3D CNNs for video action recognition","volume":"32","author":"Wu","year":"2021","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Sun, X., Wang, B., Huang, L., Zhang, Q., Zhu, S., and Ma, Y. (2021). CrossFuNet: RGB and Depth Cross-Fusion Network for Hand Pose Estimation. Sensors, 21.","DOI":"10.3390\/s21186095"},{"key":"ref_42","first-page":"44","article-title":"Deep Multi-Model Fusion for Human Activity Recognition Using Evolutionary Algorithms","volume":"7","author":"Verma","year":"2021","journal-title":"Int. J. Interact. Multimed. Artif. Intell."},{"key":"ref_43","unstructured":"Yang, X., Zhang, C., and Tian, Y. (November, January 9). Recognizing actions using depth motion maps-based histograms of oriented gradients. Proceedings of the 20th ACM international conference on Multimedia, Nara, Japan."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Oreifej, O., and Liu, Z. (2013, January 23\u201328). Hon4d: Histogram of oriented 4d normals for activity recognition from depth sequences. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.98"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Yang, X., and Tian, Y. (2014, January 23\u201328). Super normal vector for activity recognition using depth sequences. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.108"},{"key":"ref_46","unstructured":"Chen, C., Liu, M., Zhang, B., Han, J., Jiang, J., and Liu, H. (2016, January 9\u201315). 3D Action Recognition Using Multi-Temporal Depth Motion Maps and Fisher Vector. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI\u201916), New York, NY, USA."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"14115","DOI":"10.1007\/s11042-017-5017-y","article-title":"Supervised spatio-temporal kernel descriptor for human action recognition from RGB-depth videos","volume":"77","author":"Kasaei","year":"2018","journal-title":"Multimed. Tools Appl."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Miao, J., Jia, X., Mathew, R., Xu, X., Taubman, D., and Qing, C. (2016, January 25\u201328). Efficient action recognition from compressed depth maps. Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA.","DOI":"10.1109\/ICIP.2016.7532310"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"23","DOI":"10.4018\/IJMDEM.2015100102","article-title":"DMMs-based multiple features fusion for human action recognition","volume":"6","author":"Bulbul","year":"2015","journal-title":"Int. J. Multimed. Data Eng. Manag. (IJMDEM)"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Chen, C., Hou, Z., Zhang, B., Jiang, J., and Yang, Y. (2015). Gradient local auto-correlations and extreme learning machine for depth-based activity recognition. International Symposium on Visual Computing, Springer.","DOI":"10.1007\/978-3-319-27857-5_55"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Chen, C., Jafari, R., and Kehtarnavaz, N. (2015, January 5\u20139). Action recognition from depth sequences using depth motion maps-based local binary patterns. 
Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV.2015.150"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1016\/j.patrec.2016.05.032","article-title":"Spatiotemporal representation of 3d skeleton joints-based action recognition using modified spherical harmonics","volume":"83","author":"Youssef","year":"2016","journal-title":"Pattern Recognit. Lett."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"4648","DOI":"10.1109\/TIP.2017.2718189","article-title":"Action recognition using 3D histograms of texture and a multi-class boosting classifier","volume":"26","author":"Zhang","year":"2017","journal-title":"IEEE Trans. Image Process."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"4651","DOI":"10.1007\/s11042-016-3284-7","article-title":"Action recognition from depth sequences using weighted fusion of 2D and 3D auto-correlation of gradients features","volume":"76","author":"Chen","year":"2017","journal-title":"Multimed. Tools Appl."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"1729","DOI":"10.1109\/TCSVT.2018.2855416","article-title":"Dynamic 3D hand gesture recognition by learning weighted depth motion maps","volume":"29","author":"Azad","year":"2018","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"203","DOI":"10.5194\/isprs-archives-XLII-2-W12-203-2019","article-title":"Action recognition using undecimated dual tree complex wavelet transform from depth motion maps\/depth sequences","volume":"XLII-2\/W12","author":"Shekar","year":"2019","journal-title":"Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Liu, H., Tian, L., Liu, M., and Tang, H. (2015, January 27\u201330). Sdm-bsm: A fusing depth scheme for human action recognition. Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Qu\u00e9bec, QC, Canada.","DOI":"10.1109\/ICIP.2015.7351693"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Liu, M., Liu, H., Chen, C., and Najafian, M. (2016, January 25\u201328). Energy-based global ternary image for action recognition using sole depth sequences. Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA.","DOI":"10.1109\/3DV.2016.14"},{"key":"ref_59","unstructured":"Wang, L., Ding, Z., Tao, Z., Liu, Y., and Fu, Y. (November, January 27). Generative multi-view human action recognition. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Al-Obaidi, S., and Abhayaratne, C. (2019, January 25). Privacy protected recognition of activities of daily living in video. Proceedings of the 3rd IET International Conference on Technologies for Active and Assisted Living (TechAAL 2019), London, UK.","DOI":"10.1049\/cp.2019.0101"},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Liu, Y., Wang, L., Bai, Y., Qin, C., Ding, Z., and Fu, Y. (2020). Generative View-Correlation Adaptation for Semi-supervised Multi-view Learning. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-030-58568-6_19"},{"key":"ref_62","unstructured":"Bai, Y., Tao, Z., Wang, L., Li, S., Yin, Y., and Fu, Y. (2020). Collaborative Attention Mechanism for Multi-View Action Recognition. 
arXiv."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1109\/TIP.2019.2925285","article-title":"A comparative review of recent kinect-based action recognition algorithms","volume":"29","author":"Wang","year":"2019","journal-title":"IEEE Trans. Image Process."},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Yang, R., and Yang, R. (2014, January 1\u20135). DMM-pyramid based deep architectures for action recognition with depth cameras. Proceedings of the Asian Conference on Computer Vision, Singapore.","DOI":"10.1007\/978-3-319-16814-2_3"},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1016\/j.ins.2018.12.050","article-title":"Action recognition for depth video using multi-view dynamic images","volume":"480","author":"Xiao","year":"2019","journal-title":"Inf. Sci."},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"1197","DOI":"10.1007\/s11760-018-1271-3","article-title":"Combining 2D and 3D deep models for action recognition with depth information","volume":"12","author":"Keceli","year":"2018","journal-title":"Signal Image Video Process."},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1023\/A:1008280620621","article-title":"Overcoming the myopia of inductive learning algorithms with RELIEFF","volume":"7","author":"Kononenko","year":"1997","journal-title":"Appl. Intell."},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"19587","DOI":"10.1007\/s11042-019-7356-3","article-title":"Action recognition from depth sequence using depth motion maps-based local ternary patterns and CNN","volume":"78","author":"Li","year":"2019","journal-title":"Multimed. Tools Appl."},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"2293","DOI":"10.1109\/TMM.2019.2953814","article-title":"Convolutional networks with channel and STIPs attention model for action recognition in videos","volume":"22","author":"Wu","year":"2020","journal-title":"IEEE Trans. Multimed."},{"key":"ref_70","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1016\/j.imavis.2016.04.004","article-title":"3D-based deep convolutional neural network for action recognition with depth sequences","volume":"55","author":"Liu","year":"2016","journal-title":"Image Vis. Comput."},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Al-Faris, M., Chiverton, J., Yang, Y., and Ndzi, D. (2019). Deep learning of fuzzy weighted multi-resolution depth motion maps with spatial feature fusion for action recognition. J. Imaging, 5.","DOI":"10.3390\/jimaging5100082"},{"key":"ref_72","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1007\/s00530-019-00645-5","article-title":"Combining CNN streams of dynamic image and depth data for action recognition","volume":"26","author":"Singh","year":"2020","journal-title":"Multimed. Syst."},{"key":"ref_73","doi-asserted-by":"crossref","first-page":"1155","DOI":"10.1109\/ACCESS.2017.2778011","article-title":"Action recognition in video sequences using deep bi-directional LSTM with CNN features","volume":"6","author":"Ullah","year":"2017","journal-title":"IEEE Access"},{"key":"ref_74","unstructured":"Li, C., Wang, P., Wang, S., Hou, Y., and Li, W. (2017, January 10\u201314). Skeleton-based action recognition using LSTM and CNN. 
Proceedings of the 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Hong Kong, China."},{"key":"ref_75","doi-asserted-by":"crossref","first-page":"3007","DOI":"10.1109\/TPAMI.2017.2771306","article-title":"Skeleton-based action recognition using spatio-temporal lstm network with trust gates","volume":"40","author":"Liu","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Zhang, S., Liu, X., and Xiao, J. (2017, January 24\u201331). On geometric features for skeleton-based action recognition using multilayer lstm networks. Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, USA.","DOI":"10.1109\/WACV.2017.24"},{"key":"ref_77","doi-asserted-by":"crossref","unstructured":"Si, C., Chen, W., Wang, W., Wang, L., and Tan, T. (2019, January 16\u201317). An attention enhanced graph convolutional lstm network for skeleton-based action recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00132"},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K.Q. (2017, January 21\u201326). Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Li, F.-F. (2009, January 20\u201325). ImageNet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_80","doi-asserted-by":"crossref","first-page":"602","DOI":"10.1016\/j.neunet.2005.06.042","article-title":"Framewise phoneme classification with bidirectional LSTM and other neural network architectures","volume":"18","author":"Graves","year":"2005","journal-title":"Neural Netw."},{"key":"ref_81","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_82","doi-asserted-by":"crossref","unstructured":"Li, W., Zhang, Z., and Liu, Z. (2010, January 13\u201318). Action recognition based on a bag of 3d points. Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, San Francisco, CA, USA.","DOI":"10.1109\/CVPRW.2010.5543273"},{"key":"ref_83","unstructured":"Lin, Y.C., Hu, M.C., Cheng, W.H., Hsieh, Y.H., and Chen, H.M. (November, January 29). Human action recognition and retrieval using sole depth information. Proceedings of the 20th ACM international conference on Multimedia, Nara, Japan."},{"key":"ref_84","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. 
arXiv."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/18\/6841\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:28:33Z","timestamp":1760142513000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/18\/6841"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,9]]},"references-count":84,"journal-issue":{"issue":"18","published-online":{"date-parts":[[2022,9]]}},"alternative-id":["s22186841"],"URL":"https:\/\/doi.org\/10.3390\/s22186841","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2022,9,9]]}}}