{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,1]],"date-time":"2026-07-01T00:32:42Z","timestamp":1782865962338,"version":"3.54.5"},"reference-count":150,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2021,6,21]],"date-time":"2021-06-21T00:00:00Z","timestamp":1624233600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Higher Education Commission, Pakistan","award":["No.5-1\/HRD\/UESTPI(Batch-VI)\/7108\/2018\/HEC"],"award-info":[{"award-number":["No.5-1\/HRD\/UESTPI(Batch-VI)\/7108\/2018\/HEC"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Classification of human actions is an ongoing research problem in computer vision. This review aims to scope the current literature on data fusion and action recognition techniques and to identify gaps and future research directions. Success in producing cost-effective and portable vision-based sensors has dramatically increased the number and size of datasets. The increase in the number of action recognition datasets intersects with advances in deep learning architectures and computational support, both of which offer significant research opportunities. Naturally, each action-data modality\u2014such as RGB, depth, skeleton, and infrared (IR)\u2014has distinct characteristics; therefore, it is important to exploit the value of each modality for better action recognition. In this paper, we focus solely on data fusion and recognition techniques in the context of vision with an RGB-D perspective. 
We conclude by discussing research challenges, emerging trends, and possible future research directions.<\/jats:p>","DOI":"10.3390\/s21124246","type":"journal-article","created":{"date-parts":[[2021,6,21]],"date-time":"2021-06-21T13:29:58Z","timestamp":1624282198000},"page":"4246","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":68,"title":["RGB-D Data-Based Action Recognition: A Review"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9042-5018","authenticated-orcid":false,"given":"Muhammad Bilal","family":"Shaikh","sequence":"first","affiliation":[{"name":"School of Engineering, Edith Cowan University, Perth, WA 6027, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9004-7608","authenticated-orcid":false,"given":"Douglas","family":"Chai","sequence":"additional","affiliation":[{"name":"School of Engineering, Edith Cowan University, Perth, WA 6027, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2021,6,21]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"4275","DOI":"10.1109\/JSEN.2015.2416651","article-title":"Evaluating and Improving the Depth Accuracy of Kinect for Windows v2","volume":"15","author":"Yang","year":"2015","journal-title":"IEEE Sens."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Carfagni, M., Furferi, R., Governi, L., Santarelli, C., Servi, M., Uccheddu, F., and Volpe, Y. (2019). Metrological and Critical Characterization of the Intel D415 Stereo Depth Camera. 
Sensors, 19.","DOI":"10.3390\/s19030489"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1016\/j.gaitpost.2021.04.005","article-title":"Effects of camera viewing angles on tracking kinematic gait patterns using Azure Kinect, Kinect v2 and Orbbec Astra Pro v2","volume":"87","author":"Yeung","year":"2021","journal-title":"Gait Posture"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1016\/j.imavis.2017.01.010","article-title":"Going Deeper into Action Recognition: A Survey","volume":"60","author":"Herath","year":"2017","journal-title":"Image Vis. Comput."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"428","DOI":"10.1006\/cviu.1998.0744","article-title":"Human Motion Analysis: A Review","volume":"73","author":"Aggarwal","year":"1999","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"3343","DOI":"10.1016\/j.patcog.2014.04.018","article-title":"A Survey on Still-Image-based Human Action Recognition","volume":"47","author":"Guo","year":"2014","journal-title":"Pattern Recognit."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"976","DOI":"10.1016\/j.imavis.2009.11.014","article-title":"A Survey on Vision-based Human Action Recognition","volume":"28","author":"Poppe","year":"2010","journal-title":"Image Vis. Comput."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1473","DOI":"10.1109\/TCSVT.2008.2005594","article-title":"Machine Recognition of Human Activities: A Survey","volume":"18","author":"Turaga","year":"2008","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_9","unstructured":"Wang, H., Kl\u00e4ser, A., Schmid, C., and Cheng-Lin, L. (2011, January 16\u201320). Action Recognition by Dense Trajectories. 
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Zhu, G., Zhang, L., Mei, L., Shao, J., Song, J., and Shen, P. (2016, January 4\u20138). Large-scale Isolated Gesture Recognition using Pyramidal 3D Convolutional Networks. Proceedings of the 23rd International Conference on Pattern Recognition, Cancun, Mexico.","DOI":"10.1109\/ICPR.2016.7899601"},{"key":"ref_11","unstructured":"Asadi-Aghbolaghi, M., Clap\u00e9s, A., Bellantonio, M., Escalante, H.J., Ponce-L\u00f3pez, V., Bar\u00f3, X., Guyon, I., Kasaei, S., and Escalera, S. (June, January 30). A Survey on Deep Learning Based Approaches for Action and Gesture Recognition in Image Sequences. Proceedings of the International Conference on Automatic Face and Gesture Recognition, Washington, DC, USA."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Prince, S. (2012). Computer Vision: Models, Learning, and Inference, Cambridge University Press. [1st ed.].","DOI":"10.1017\/CBO9780511996504"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Szeliski, R. (2010). Computer Vision: Algorithms and Applications, Springer. [1st ed.].","DOI":"10.1007\/978-1-84882-935-0"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1016\/j.cviu.2018.04.007","article-title":"RGB-D-based Human Motion Recognition with Deep Learning: A Survey","volume":"171","author":"Wang","year":"2018","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1016\/j.patrec.2014.04.011","article-title":"Human Activity Recognition from 3D Data: A Review","volume":"48","author":"Aggarwal","year":"2014","journal-title":"Pattern Recognit. 
Lett."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1995","DOI":"10.1016\/j.patrec.2013.02.006","article-title":"A Survey of Human Motion Analysis using Depth Imagery","volume":"34","author":"Chen","year":"2013","journal-title":"Pattern Recognit. Lett."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1016\/j.cviu.2017.01.011","article-title":"Space-time Representation of People based on 3D Skeletal Data: A Review","volume":"158","author":"Han","year":"2017","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1016\/j.patcog.2016.05.019","article-title":"RGB-D-based Action Recognition Datasets: A Survey","volume":"60","author":"Zhang","year":"2016","journal-title":"Pattern Recognit."},{"key":"ref_19","first-page":"149","article-title":"A Survey on Human Motion Analysis from Depth Data","volume":"Volume 8200","author":"Ye","year":"2013","journal-title":"Time-of-Flight and Depth Imaging. Sensors, Algorithms, and Applications, Lecture Notes in Computer Science"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1016\/j.imavis.2016.06.007","article-title":"From Handcrafted to Learned Representations for Human Action Recognition: A Survey","volume":"55","author":"Zhu","year":"2016","journal-title":"Image Vis. Comput."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zhang, H.B., Zhang, Y.X., Zhong, B., Lei, Q., Yang, L., Du, J.X., and Chen, D.S. (2019). A Comprehensive Survey of Vision-Based Human Action Recognition Methods. Sensors, 19.","DOI":"10.3390\/s19051005"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"4405","DOI":"10.1007\/s11042-015-3177-1","article-title":"A Survey of Depth and Inertial Sensor Fusion for Human Action Recognition","volume":"76","author":"Chen","year":"2017","journal-title":"Multimed. 
Tools Appl."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Ma, X., Song, R., Rong, X., Tian, X., Tian, G., and Li, Y. (2017, January 20\u201322). Deep Learning-based Human Action Recognition: A Survey. Proceedings of the Chinese Automation Congress (CAC), Jinan, China.","DOI":"10.1109\/CAC.2017.8243438"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"107561","DOI":"10.1016\/j.patcog.2020.107561","article-title":"Sensor-based and Vision-based Human Activity Recognition: A Comprehensive Survey","volume":"108","author":"Min","year":"2020","journal-title":"Pattern Recognit."},{"key":"ref_25","unstructured":"Sun, Z., Liu, J., Ke, Q., Rahmani, H., Bennamoun, M., and Wang, G. (2020). Human Action Recognition from Various Data Modalities: A Review. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.patcog.2019.05.020","article-title":"RGB-D sensing based human action and interaction analysis: A survey","volume":"94","author":"Liu","year":"2019","journal-title":"Pattern Recognit."},{"key":"ref_27","first-page":"1","article-title":"Recent evolution of modern datasets for human activity recognition: A deep survey","volume":"26","author":"Singh","year":"2019","journal-title":"Multimed. Syst."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1016\/j.patcog.2015.11.019","article-title":"3D skeleton-based human action classification: A survey","volume":"53","author":"Presti","year":"2016","journal-title":"Pattern Recognit."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"64241","DOI":"10.1109\/ACCESS.2021.3075766","article-title":"Content-based Management of Human Motion Data: Survey and Challenges","volume":"9","author":"Sedmidubsky","year":"2021","journal-title":"IEEE Access"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Rosin, P.L., Lai, Y.K., Shao, L., and Liu, Y. (2019). 
RGB-D Image Analysis and Processing, Springer.","DOI":"10.1007\/978-3-030-28603-3"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Liu, J., Shahroudy, A., Perez, M.L., Wang, G., Duan, L.Y., and Kot Chichung, A. (2019). NTU RGB + D 120: A Large-Scale Benchmark for 3D Human Activity Understanding. IEEE Trans. Pattern Anal. Mach. Intell. TPAMI, 2684\u20132701.","DOI":"10.1109\/TPAMI.2019.2916873"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"T\u00f6lgyessy, M., Dekan, M., Chovanec, L., and Hubinsk\u1ef3, P. (2021). Evaluation of the Azure Kinect and Its Comparison to Kinect V1 and Kinect V2. Sensors, 21.","DOI":"10.3390\/s21020413"},{"key":"ref_33","unstructured":"Microsoft (2021, June 14). Buy the Azure Kinect Developer kit\u2013Microsoft. Available online: https:\/\/www.microsoft.com\/en-us\/d\/azure-kinect-dk\/8pp5vxmd9nhq."},{"key":"ref_34","unstructured":"EB Games (2021, June 14). Kinect for Xbox One (Preowned)-Xbox One-EB Games Australia. Available online: https:\/\/www.ebgames.com.au\/product\/xbox-one\/202155-kinect-for-xbox-one-preowned."},{"key":"ref_35","unstructured":"EB Games (2021, June 14). Kinect for Xbox 360 without AC Adapter (Preowned)-Xbox 360-EB Games Australia. Available online: https:\/\/www.ebgames.com.au\/product\/xbox360\/151784-kinect-for-xbox-360-without-ac-adapter-preowned."},{"key":"ref_36","unstructured":"Intel Corporation (2021, June 14). LiDAR Camera L515 \u2013 Intel\u00ae RealSense\u2122 Depth and Tracking Cameras. Available online: https:\/\/www.intelrealsense.com\/lidar-camera-l515\/."},{"key":"ref_37","unstructured":"Orbbec 3D (2021, June 14). Astra Series-Orbbec. Available online: https:\/\/orbbec3d.com\/product-astra-pro."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Lee, I.J. (2020). Kinect-for-windows with augmented reality in an interactive roleplay system for children with an autism spectrum disorder. Interact. Learn. 
Environ., 1\u201317.","DOI":"10.1080\/10494820.2019.1710851"},{"key":"ref_39","first-page":"159","article-title":"Using game-based learning with kinect technology in foreign language education course","volume":"21","author":"Yukselturk","year":"2018","journal-title":"J. Educ. Technol. Soc."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Pal, M., Saha, S., and Konar, A. (2016, January 23\u201325). Distance matching based gesture recognition for healthcare using Microsoft\u2019s Kinect sensor. Proceedings of the International Conference on Microelectronics, Computing and Communications (MicroCom), Durgapur, India.","DOI":"10.1109\/MicroCom.2016.7522586"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Ketoma, V.K., Sch\u00e4fer, P., and Meixner, G. (2018, January 7\u20139). Development and evaluation of a virtual reality grocery shopping application using a multi-Kinect walking-in-place approach. Proceedings of the International Conference on Intelligent Human Systems Integration, Dubai, UAE.","DOI":"10.1007\/978-3-319-73888-8_57"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"3935","DOI":"10.1109\/TITS.2018.2791476","article-title":"A Kinect-based approach for 3D pavement surface reconstruction and cracking recognition","volume":"19","author":"Zhang","year":"2018","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Keselman, L., Woodfill, J.I., Grunnet-Jepsen, A., and Bhowmik, A. (2017, January 21\u201326). Intel(R) RealSense(TM) Stereoscopic Depth Cameras. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.167"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Drouin, M.A., and Seoud, L. (2020). Consumer-Grade RGB-D Cameras. 
3D Imaging, Analysis and Applications, Springer.","DOI":"10.1007\/978-3-030-44070-1_5"},{"key":"ref_45","unstructured":"Grunnet-Jepsen, A., Sweetser, J.N., and Woodfill, J. (2021, January 28). Best Known Methods for Tuning Intel\u00ae RealSense\u2122 Depth Cameras D415. Available online: https:\/\/www.intel.com.au\/content\/www\/au\/en\/support\/articles\/000027833\/emerging-technologies\/intel-realsense-technology.html."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Zabatani, A., Surazhsky, V., Sperling, E., Moshe, S.B., Menashe, O., Silver, D.H., Karni, T., Bronstein, A.M., Bronstein, M.M., and Kimmel, R. (2019). Intel\u00ae RealSense\u2122 SR300 Coded light depth Camera. IEEE Trans. Pattern Anal. Mach. Intell. TPAMI, 2333\u20132345.","DOI":"10.1109\/TPAMI.2019.2915841"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Coroiu, A.D.C.A., and Coroiu, A. (2018, January 6\u20138). Interchangeability of Kinect and Orbbec Sensors for Gesture Recognition. Proceedings of the 2018 IEEE 14th International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania.","DOI":"10.1109\/ICCP.2018.8516586"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Villena-Mart\u00ednez, V., Fuster-Guill\u00f3, A., Azor\u00edn-L\u00f3pez, J., Saval-Calvo, M., Mora-Pascual, J., Garcia-Rodriguez, J., and Garcia-Garcia, A. (2017). A Quantitative Comparison of Calibration Methods for RGB-D Sensors Using Different Technologies. Sensors, 17.","DOI":"10.3390\/s17020243"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Oreifej, O., and Liu, Z. (2013, January 23\u201328). HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.98"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1007\/s11554-013-0370-1","article-title":"Real-time Human Action Recognition Based on Depth Motion Maps","volume":"12","author":"Chen","year":"2016","journal-title":"J. Real Time Image Process."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1016\/j.jvcir.2013.03.001","article-title":"Effective 3D Action Recognition using EigenJoints","volume":"25","author":"Yang","year":"2014","journal-title":"J. Vis. Commun. Image Represent."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Li, M., Leung, H., and Shum, H.P. (2016, January 10\u201312). Human Action Recognition via Skeletal and Depth based Feature Fusion. Proceedings of the 9th International Conference on Motion in Games, Burlingame, CA, USA.","DOI":"10.1145\/2994258.2994268"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Yang, X., and Tian, Y. (2014, January 23\u201328). Super Normal Vector for Activity Recognition using Depth Sequences. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.108"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Rahmani, H., Mahmood, A., Huynh, D.Q., and Mian, A. (2014, January 24\u201326). Real Time Action Recognition using Histograms of Depth Gradients and Random Decision Forests. Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Steamboat Springs, CO, USA.","DOI":"10.1109\/WACV.2014.6836044"},{"key":"ref_55","unstructured":"Yang, X., Zhang, C., and Tian, Y. (November, January 29). Recognizing Actions using Depth Motion Maps-based Histograms of Oriented Gradients. 
Proceedings of the 20th ACM International Conference on Multimedia, Nara, Japan."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Chen, C., Jafari, R., and Kehtarnavaz, N. (2015, January 5\u20139). Action Recognition from Depth Sequences using Depth Motion Maps-based Local Binary Patterns. Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA.","DOI":"10.1109\/WACV.2015.150"},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1016\/j.jvcir.2014.11.008","article-title":"TriViews: A General Framework to use 3D Depth Data Effectively for Action Recognition","volume":"26","author":"Chen","year":"2015","journal-title":"J. Vis. Commun. Image Represent."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Miao, J., Jia, X., Mathew, R., Xu, X., Taubman, D., and Qing, C. (2016, January 25\u201328). Efficient Action Recognition from Compressed Depth Maps. Proceedings of the IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA.","DOI":"10.1109\/ICIP.2016.7532310"},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Xia, L., Chen, C., and Aggarwal, J. (2012, January 16\u201321). View Invariant Human Action Recognition using Histograms of 3D Joints. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA.","DOI":"10.1109\/CVPRW.2012.6239233"},{"key":"ref_60","unstructured":"Gowayyed, M.A., Torki, M., Hussein, M.E., and El-Saban, M. (2013, January 3\u20139). Histogram of Oriented Displacements (HOD): Describing Trajectories of Human Joints for Action Recognition. 
Proceedings of the 23rd International Joint Conference on Artificial Intelligence, Beijing, China."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1016\/j.jvcir.2015.03.002","article-title":"Joint Movement Similarities for Robust 3D Action Recognition using Skeletal Data","volume":"30","author":"Lam","year":"2015","journal-title":"J. Vis. Commun. Image Represent."},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Papadopoulos, G.T., Axenopoulos, A., and Daras, P. (2014, January 6\u201310). Real-time Skeleton-tracking-based Human Action Recognition using Kinect Data. Proceedings of the International Conference on Multimedia Modeling, Dublin, Ireland.","DOI":"10.1007\/978-3-319-04114-8_40"},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Chaaraoui, A., Padilla-Lopez, J., and Fl\u00f3rez-Revuelta, F. (2013, January 1\u20138). Fusion of Skeletal and Silhouette-based Features for Human Action Recognition with RGB-D Devices. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia.","DOI":"10.1109\/ICCVW.2013.19"},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"1800","DOI":"10.1016\/j.patcog.2013.11.032","article-title":"Human Activity Recognition using Multi-features and Multiple Kernel Learning","volume":"47","author":"Althloothi","year":"2014","journal-title":"Pattern Recognit."},{"key":"ref_65","unstructured":"Liu, L., and Shao, L. (2013, January 3\u20139). Learning Discriminative Representations from RGB-D Video Data. 
Proceedings of the International Joint Conference on Artificial Intelligence, Beijing, China."},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"295","DOI":"10.1016\/j.patcog.2016.08.003","article-title":"Robust Human Activity Recognition from Depth Video using Spatiotemporal Multi-fused Features","volume":"61","author":"Jalal","year":"2017","journal-title":"Pattern Recognit."},{"key":"ref_67","first-page":"1383","article-title":"Multilevel Depth and Image Fusion for Human Activity Detection","volume":"43","author":"Ni","year":"2013","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"2856","DOI":"10.1109\/TIP.2016.2556940","article-title":"Discriminative relational representation learning for RGB-D action recognition","volume":"25","author":"Kong","year":"2016","journal-title":"IEEE Trans. Image Process."},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"1651","DOI":"10.1109\/TPAMI.2015.2491925","article-title":"Structure-preserving binary representations for RGB-D action recognition","volume":"38","author":"Yu","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_70","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1016\/j.neucom.2015.09.116","article-title":"Deep learning for visual understanding: A review","volume":"187","author":"Guo","year":"2016","journal-title":"Neurocomputing"},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Yan, S., Xiong, Y., and Lin, D. (2018, January 2\u20137). Spatial temporal graph convolutional networks for skeleton-based action recognition. Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.12328"},{"key":"ref_72","doi-asserted-by":"crossref","unstructured":"Miikkulainen, R., Liang, J., Meyerson, E., Rawal, A., Fink, D., Francon, O., Raju, B., Shahrzad, H., Navruzyan, A., and Duffy, N. (2019). Chapter 15-Evolving Deep Neural Networks. 
Artificial Intelligence in the Age of Neural Networks and Brain Computing, Academic Press.","DOI":"10.1016\/B978-0-12-815480-9.00015-3"},{"key":"ref_73","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_74","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1016\/j.eswa.2016.04.032","article-title":"Human activity recognition with smartphone sensors using deep learning neural networks","volume":"59","author":"Ronao","year":"2016","journal-title":"Expert Syst. Appl."},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Zhu, W., Lan, C., Xing, J., Zeng, W., Li, Y., Shen, L., and Xie, X. (2016, January 12\u201317). Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks. Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA.","DOI":"10.1609\/aaai.v30i1.10451"},{"key":"ref_76","unstructured":"Kipf, T.N., and Welling, M. (2017). Semi-Supervised Classification with Graph Convolutional Networks. arXiv."},{"key":"ref_77","doi-asserted-by":"crossref","unstructured":"Caetano, C., Sena de Souza, J., Santos, J., and Schwartz, W. (2019, January 18\u201321). SkeleMotion: A New Representation of Skeleton Joint Sequences Based on Motion Information for 3D Action Recognition. Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Taipei, Taiwan.","DOI":"10.1109\/AVSS.2019.8909840"},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Li, W., Zhang, Z., and Liu, Z. (2010, January 13\u201318). Action recognition based on a bag of 3d points. 
Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.","DOI":"10.1109\/CVPRW.2010.5543273"},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Damen, D., Doughty, H., Farinella, G.M., Fidler, S., Furnari, A., Kazakos, E., Moltisanti, D., Munro, J., Perrett, T., and Price, W. (2018, January 8\u201314). Scaling egocentric vision: The epic-kitchens dataset. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01225-0_44"},{"key":"ref_80","unstructured":"Soomro, K., Zamir, A.R., and Shah, M. (2012). UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild. arXiv."},{"key":"ref_81","unstructured":"Das, S., Dai, R., Koperski, M., Minciullo, L., Garattoni, L., Bremond, F., and Francesca, G. (November, January 27). Toyota Smarthome: Real-World Activities of Daily Living. Proceedings of the International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_82","doi-asserted-by":"crossref","unstructured":"Ni, B., Wang, G., and Moulin, P. (2011, January 6\u201313). RGBD-HuDaAct: A color-depth video database for human daily activity recognition. Proceedings of the International Conference on Computer Vision (ICCV), Barcelona, Spain.","DOI":"10.1109\/ICCVW.2011.6130379"},{"key":"ref_83","doi-asserted-by":"crossref","unstructured":"Chen, C., Jafari, R., and Kehtarnavaz, N. (2015, January 27\u201330). UTD-MHAD: A Multimodal Dataset for Human Action Recognition Utilizing A Depth Camera and A Wearable Inertial Sensor. Proceedings of the Int. Conf. on Image Processing, Quebec City, QC, Canada.","DOI":"10.1109\/ICIP.2015.7350781"},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Sigurdsson, G.A., Varol, G., Wang, X., Farhadi, A., Laptev, I., and Gupta, A. (2016, January 11\u201314). Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding. 
Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_31"},{"key":"ref_85","unstructured":"Korbar, B., Tran, D., and Torresani, L. (November, January 27). SCSampler: Sampling Salient Clips from Video for Efficient Action Recognition. Proceedings of the International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_86","doi-asserted-by":"crossref","unstructured":"Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., and Fei-Fei, L. (2014, January 24\u201327). Large-scale Video Classification With Convolutional Neural Networks. Proceedings of the IEEE Computer Society Conference Computer Vision Pattern Recognit (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.223"},{"key":"ref_87","doi-asserted-by":"crossref","unstructured":"Kim, S., Yun, K., Park, J., and Choi, J. (2019, January 7\u201311). Skeleton-Based Action Recognition of People Handling Objects. Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA.","DOI":"10.1109\/WACV.2019.00014"},{"key":"ref_88","unstructured":"Zhu, J., Zou, W., Xu, L., Hu, Y., Zhu, Z., Chang, M., Huang, J., Huang, G., and Du, D. (2018). Action Machine: Rethinking Action Recognition in Trimmed Videos. arXiv."},{"key":"ref_89","doi-asserted-by":"crossref","unstructured":"Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., and Serre, T. (2011, January 6\u201313). HMDB: A Large Video Database for Human Motion Recognition. Proceedings of the International Conference on Computer Vision (ICCV), Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126543"},{"key":"ref_90","doi-asserted-by":"crossref","unstructured":"Shahroudy, A., Liu, J., Ng, T.T., and Wang, G. (2016, January 27\u201330). NTU RGB + D: A Large Scale Dataset for 3D Human Activity Analysis. 
Proceedings of the IEEE Computer Society Conference Computer Vision Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.115"},{"key":"ref_91","doi-asserted-by":"crossref","unstructured":"Wang, J., Nie, X., Xia, Y., Wu, Y., and Zhu, S.C. (2014, January 23\u201328). Cross-view Action Modeling, Learning and Recognition. Proceedings of the IEEE Computer Society Conference Computer Vision Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.339"},{"key":"ref_92","unstructured":"Zhao, Y., Liu, Z., Yang, L., and Cheng, H. (2012, January 3\u20136). Combing RGB and Depth Map Features for human activity recognition. Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, Hollywood, CA, USA."},{"key":"ref_93","doi-asserted-by":"crossref","unstructured":"Ye, J., Li, K., Qi, G.J., and Hua, K.A. (2015, January 23\u201326). Temporal order-preserving dynamic quantization for human action recognition from multimodal sensor streams. Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, Shanghai, China.","DOI":"10.1145\/2671188.2749340"},{"key":"ref_94","doi-asserted-by":"crossref","first-page":"1045","DOI":"10.1109\/TPAMI.2017.2691321","article-title":"Deep multimodal feature analysis for action recognition in RGB + D videos","volume":"40","author":"Shahroudy","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell. TPAMI"},{"key":"ref_95","unstructured":"Ryoo, M.S., Piergiovanni, A., Tan, M., and Angelova, A. (2020). AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures. arXiv."},{"key":"ref_96","unstructured":"Tran, D., Wang, H., Torresani, L., and Feiszli, M. (November, January 27). Video Classification with Channel-separated Convolutional Networks. Proceedings of the International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_97","unstructured":"Wang, L., Koniusz, P., and Huynh, D.Q. 
(November, January 27). Hallucinating iDT Descriptors and i3D Optical Flow Features for Action Recognition with CNNs. Proceedings of the International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_98","unstructured":"Kazakos, E., Nagrani, A., Zisserman, A., and Damen, D. (November, January 27). EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_99","doi-asserted-by":"crossref","unstructured":"Das, S., Sharma, S., Dai, R., Br\u00e9mond, F., and Thonnat, M. (2020). VPN: Learning Video-Pose Embedding for Activities of Daily Living. ECCV 2020, Springer.","DOI":"10.1007\/978-3-030-58545-7_5"},{"key":"ref_100","doi-asserted-by":"crossref","unstructured":"Islam, M.M., and Iqbal, T. (2020). HAMLET: A Hierarchical Multimodal Attention-based Human Activity Recognition Algorithm. arXiv.","DOI":"10.1109\/IROS45743.2020.9340987"},{"key":"ref_101","doi-asserted-by":"crossref","unstructured":"Davoodikakhki, M., and Yin, K. (2020). Hierarchical action classification with network pruning. International Symposium on Visual Computing, Springer.","DOI":"10.1007\/978-3-030-64556-4_23"},{"key":"ref_102","unstructured":"Wang, P., Li, W., Gao, Z., Zhang, J., Tang, C., and Ogunbona, P. (2015). Deep Convolutional Neural Networks for Action Recognition Using Depth Map Sequences. arXiv."},{"key":"ref_103","doi-asserted-by":"crossref","unstructured":"Wang, P., Wang, S., Gao, Z., Hou, Y., and Li, W. (2017, January 22\u201329). Structured Images for RGB-D Action Recognition. 
Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV 2017), Venice, Italy.","DOI":"10.1109\/ICCVW.2017.123"},{"key":"ref_104","doi-asserted-by":"crossref","first-page":"3459","DOI":"10.1109\/TIP.2018.2818328","article-title":"Spatio-Temporal Attention-Based LSTM Networks for 3D Action Recognition and Detection","volume":"27","author":"Song","year":"2018","journal-title":"IEEE Trans. Image Process. TIP"},{"key":"ref_105","doi-asserted-by":"crossref","unstructured":"Ye, Y., and Tian, Y. (2016, January 27\u201330). Embedding Sequential Information into Spatiotemporal Features for Action Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPRW.2016.142"},{"key":"ref_106","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Proceedings of the Advances in Neural Information Processing Systems."},{"key":"ref_107","doi-asserted-by":"crossref","unstructured":"Baccouche, M., Mamalet, F., Wolf, C., Garcia, C., and Baskurt, A. (2011). Sequential Deep Learning for Human Action Recognition. International Workshop on Human Behavior Understanding, Springer.","DOI":"10.1007\/978-3-642-25446-8_4"},{"key":"ref_108","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1109\/TPAMI.2012.59","article-title":"3D Convolutional Neural Networks for Human Action Recognition","volume":"35","author":"Ji","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell. TPAMI"},{"key":"ref_109","unstructured":"Simonyan, K., and Zisserman, A. (2014, January 8\u201313). Two-stream Convolutional Networks for Action Recognition in Videos. 
Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_110","doi-asserted-by":"crossref","first-page":"2326","DOI":"10.1109\/TIP.2018.2791180","article-title":"Real-Time Action Recognition With Deeply Transferred Motion Vector CNNs","volume":"27","author":"Zhang","year":"2018","journal-title":"IEEE Trans. Image Process. TIP"},{"key":"ref_111","doi-asserted-by":"crossref","unstructured":"Feichtenhofer, C., Pinz, A., and Zisserman, A. (2016, January 27\u201330). Convolutional Two-Stream Network Fusion for Video Action Recognition. Proceedings of the IEEE Computer Society Conference Computer Vision Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.213"},{"key":"ref_112","doi-asserted-by":"crossref","unstructured":"Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., and Gool, L.V. (2016, January 11\u201314). Temporal Segment Networks: Towards Good Practices for Deep Action Recognition. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"ref_113","doi-asserted-by":"crossref","unstructured":"Lan, Z., Zhu, Y., Hauptmann, A.G., and Newsam, S. (2017, January 21\u201326). Deep Local Video Feature for Action Recognition. Proceedings of the IEEE Computer Society Conference Computer Vision Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.161"},{"key":"ref_114","doi-asserted-by":"crossref","unstructured":"Zhou, B., Andonian, A., and Torralba, A. (2018, January 8\u201314). Temporal Relational Reasoning in Videos. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01246-5_49"},{"key":"ref_115","doi-asserted-by":"crossref","unstructured":"Carreira, J., and Zisserman, A. (2017, January 21\u201326). Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. 
Proceedings of the IEEE Computer Society Conference Computer Vision Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.502"},{"key":"ref_116","doi-asserted-by":"crossref","unstructured":"Zhu, Y., Lan, Z., Newsam, S., and Hauptmann, A. (2018, January 2\u20136). Hidden Two-Stream Convolutional Networks for Action Recognition. Proceedings of the Asian Conference on Computer Vision, Perth, Australia.","DOI":"10.1007\/978-3-030-20893-6_23"},{"key":"ref_117","unstructured":"Ng, J.Y.H., Hausknecht, M., Vijayanarasimhan, S., Vinyals, O., Monga, R., and Toderici, G. (2015, January 7\u201312). Beyond Short Snippets: Deep Networks for Video Classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA."},{"key":"ref_118","doi-asserted-by":"crossref","unstructured":"Qiu, Z., Yao, T., and Mei, T. (2017, January 22\u201329). Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks. Proceedings of the International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.590"},{"key":"ref_119","doi-asserted-by":"crossref","first-page":"3007","DOI":"10.1109\/TPAMI.2017.2771306","article-title":"Skeleton-Based Action Recognition Using Spatio-Temporal LSTM Network with Trust Gates","volume":"40","author":"Liu","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell. TPAMI"},{"key":"ref_120","doi-asserted-by":"crossref","first-page":"677","DOI":"10.1109\/TPAMI.2016.2599174","article-title":"Long-Term Recurrent Convolutional Networks for Visual Recognition and Description","volume":"39","author":"Donahue","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell. TPAMI"},{"key":"ref_121","unstructured":"Abu-El-Haija, S., Kothari, N., Lee, J., Natsev, P., Toderici, G., Varadarajan, B., and Vijayanarasimhan, S. (2016). YouTube-8M: A Large-Scale Video Classification Benchmark. 
arXiv."},{"key":"ref_122","doi-asserted-by":"crossref","unstructured":"Caba Heilbron, F., Escorcia, V., Ghanem, B., and Niebles, J.C. (2015, January 7\u201312). ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding. Proceedings of the IEEE Computer Society Conference Computer Vision Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298698"},{"key":"ref_123","doi-asserted-by":"crossref","first-page":"502","DOI":"10.4218\/etrij.17.0116.0054","article-title":"Extensible Hierarchical Method of Detecting Interactive Actions for Video Understanding","volume":"39","author":"Moon","year":"2017","journal-title":"ETRI J."},{"key":"ref_124","doi-asserted-by":"crossref","unstructured":"Moon, J., Kwon, Y., Kang, K., and Park, J. (2015, January 25\u201328). ActionNet-VE Dataset: A Dataset for Describing Visual Events by Extending VIRAT Ground 2.0. Proceedings of the 8th International Conference on Signal Processing, Image Processing and Pattern Recognition (SIP), Jeju, Korea.","DOI":"10.1109\/SIP.2015.9"},{"key":"ref_125","doi-asserted-by":"crossref","unstructured":"Liu, Y., Ma, L., Zhang, Y., Liu, W., and Chang, S. (2019, January 15\u201320). Multi-Granularity Generator for Temporal Action Proposal. Proceedings of the IEEE Computer Society Conference Computer Vision Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00372"},{"key":"ref_126","doi-asserted-by":"crossref","unstructured":"Qiu, Z., Yao, T., Ngo, C.W., Tian, X., and Mei, T. (2019, January 15\u201320). Learning Spatio-Temporal Representation With Local and Global Diffusion. Proceedings of the IEEE Computer Society Conference Computer Vision Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01233"},{"key":"ref_127","unstructured":"Lin, J., Gan, C., and Han, S. (November, January 27). TSM: Temporal Shift Module for Efficient Video Understanding. 
Proceedings of the International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_128","doi-asserted-by":"crossref","unstructured":"Girdhar, R., Carreira, J., Doersch, C., and Zisserman, A. (2019, January 15\u201320). Video Action Transformer Network. Proceedings of the IEEE Computer Society Conference Computer Vision Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00033"},{"key":"ref_129","doi-asserted-by":"crossref","unstructured":"Hu, J.F., Zheng, W.S., Pan, J., Lai, J., and Zhang, J. (2018, January 8\u201314). Deep Bilinear Learning for RGB-D Action Recognition. Proceedings of the European Conference Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_21"},{"key":"ref_130","unstructured":"Sudhakaran, S., Escalera, S., and Lanz, O. (July, January 26). Gate-Shift Networks for Video Action Recognition. Proceedings of the IEEE Computer Society Conference Computer Vision Pattern Recognition (CVPR), Los Alamitos, CA, USA."},{"key":"ref_131","doi-asserted-by":"crossref","unstructured":"Liu, X., Lee, J., and Jin, H. (2019, January 15\u201320). Learning Video Representations From Correspondence Proposals. Proceedings of the IEEE Computer Society Conference Computer Vision Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00440"},{"key":"ref_132","doi-asserted-by":"crossref","unstructured":"Materzynska, J., Berger, G., Bax, I., and Memisevic, R. (2019, January 27\u201328). The Jester Dataset: A Large-Scale Video Dataset of Human Gestures. Proceedings of the International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCVW.2019.00349"},{"key":"ref_133","unstructured":"Martin, M., Roitberg, A., Haurilet, M., Horne, M., Rei\u00df, S., Voit, M., and Stiefelhagen, R. (November, January 27). Drive & Act: A Multimodal Dataset for Fine-Grained Driver Behavior Recognition in Autonomous Vehicles. 
Proceedings of the International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_134","doi-asserted-by":"crossref","unstructured":"Munro, J., and Damen, D. (2020, January 14\u201319). Multi-modal Domain Adaptation for Fine-grained Action Recognition. Proceedings of the IEEE Computer Society Conference Computer Vision Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00020"},{"key":"ref_135","doi-asserted-by":"crossref","unstructured":"Jiang, H., Li, Y., Song, S., and Liu, J. (2018, January 21\u201322). Rethinking Fusion Baselines for Multimodal Human Action Recognition. Proceedings of the 19th Pacific-Rim Conference on Multimedia, Advances in Multimedia Information Processing, Hefei, China.","DOI":"10.1007\/978-3-030-00764-5_17"},{"key":"ref_136","first-page":"31","article-title":"Content based image retrieval: Classification using neural networks","volume":"6","author":"Shereena","year":"2014","journal-title":"Int. J. Multimed. Its Appl."},{"key":"ref_137","doi-asserted-by":"crossref","unstructured":"Bhaumik, H., Bhattacharyya, S., Nath, M.D., and Chakraborty, S. (2015, January 4\u20136). Real-time storyboard generation in videos using a probability distribution based threshold. Proceedings of the Fifth International Conference on Communication Systems and Network Technologies, Gwalior, India.","DOI":"10.1109\/CSNT.2015.169"},{"key":"ref_138","doi-asserted-by":"crossref","unstructured":"Lim, J.H., Teh, E.Y., Geh, M.H., and Lim, C.H. (2017, January 12\u201315). Automated classroom monitoring with connected visioning system. 
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Kuala Lumpur, Malaysia.","DOI":"10.1109\/APSIPA.2017.8282063"},{"key":"ref_139","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1016\/j.procs.2017.06.121","article-title":"Activity recognition and abnormal behaviour detection with recurrent neural networks","volume":"110","author":"Arifoglu","year":"2017","journal-title":"Procedia Comput. Sci."},{"key":"ref_140","doi-asserted-by":"crossref","first-page":"376","DOI":"10.1016\/j.compeleceng.2017.06.031","article-title":"A smartphone-based wearable sensors for monitoring real-time physiological data","volume":"65","author":"You","year":"2018","journal-title":"Comput. Electr. Eng."},{"key":"ref_141","doi-asserted-by":"crossref","unstructured":"Laptev, I., Marszalek, M., Schmid, C., and Rozenfeld, B. (2008, January 23\u201328). Learning realistic human actions from movies. Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA.","DOI":"10.1109\/CVPR.2008.4587756"},{"key":"ref_142","doi-asserted-by":"crossref","first-page":"2613","DOI":"10.1109\/TCSVT.2016.2576761","article-title":"Temporal pyramid pooling-based convolutional neural network for action recognition","volume":"27","author":"Wang","year":"2016","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_143","doi-asserted-by":"crossref","unstructured":"Kumar, K., Kishore, P., Kumar, D.A., and Kumar, E.K. (2018, January 4\u20135). Indian classical dance action identification using adaboost multiclass classifier on multifeature fusion. Proceedings of the 2018 Conference on Signal Processing And Communication Engineering Systems (SPACES), Vijayawada, India.","DOI":"10.1109\/SPACES.2018.8316338"},{"key":"ref_144","unstructured":"Castro, D., Hickson, S., Sangkloy, P., Mittal, B., Dai, S., Hays, J., and Essa, I. (2018). Let\u2019s Dance: Learning from Online Dance Videos. 
arXiv."},{"key":"ref_145","doi-asserted-by":"crossref","first-page":"548","DOI":"10.1016\/j.neucom.2016.09.063","article-title":"Learning deep event models for crowd anomaly detection","volume":"219","author":"Feng","year":"2017","journal-title":"Neurocomputing"},{"key":"ref_146","doi-asserted-by":"crossref","unstructured":"Li, T., Liu, J., Zhang, W., Ni, Y., Wang, W., and Li, Z. (2021). UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles. arXiv.","DOI":"10.1109\/CVPR46437.2021.01600"},{"key":"ref_147","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.cviu.2017.04.011","article-title":"Computer vision for sports: Current applications and research topics","volume":"159","author":"Thomas","year":"2017","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_148","doi-asserted-by":"crossref","first-page":"386","DOI":"10.1016\/j.future.2019.01.029","article-title":"Action recognition using optimized deep autoencoder and CNN for surveillance data streams of non-stationary environments","volume":"96","author":"Ullah","year":"2019","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_149","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1109\/TCSVT.2019.2894161","article-title":"stagNet: An attentive semantic RNN for group activity and individual action recognition","volume":"30","author":"Qi","year":"2019","journal-title":"IEEE Trans. Circuits Syst. 
Video Technol."},{"key":"ref_150","doi-asserted-by":"crossref","first-page":"194457","DOI":"10.1109\/ACCESS.2020.3031005","article-title":"A Combined Object Detection Method With Application to Pedestrian Detection","volume":"8","author":"Gao","year":"2020","journal-title":"IEEE Access"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/12\/4246\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:20:26Z","timestamp":1760163626000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/12\/4246"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,21]]},"references-count":150,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2021,6]]}},"alternative-id":["s21124246"],"URL":"https:\/\/doi.org\/10.3390\/s21124246","relation":{"has-preprint":[{"id-type":"doi","id":"10.20944\/preprints202101.0369.v1","asserted-by":"object"}]},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,6,21]]}}}