Human‐action recognition using a multi‐layered fusion scheme of Kinect modalities

10.1049/iet-cvi.2016.0326

Institution of Engineering and Technology (IET)

2025-10-27T11:16:26Z

2017-04-28T22:14:27Z

IET Computer Vision, ISSN 1751-9632, 1751-9640, volume 11, issue 7, October 2017

10.1049/cvi2.v11.7

https://ietresearch.onlinelibrary.wiley.com/toc/17519640/11/7

Bassem Seddik, LATIS Laboratory, National Engineering School of Sousse, University of Sousse, Sousse, Tunisia; National Engineering School of Sfax, University of Sfax, Sfax, Tunisia, http://orcid.org/0000-0003-0617-686X

Sami Gazzah, LATIS Laboratory, National Engineering School of Sousse, University of Sousse, Sousse, Tunisia

Najoua Essoukri Ben Amara, LATIS Laboratory, National Engineering School of Sousse, University of Sousse, Sousse, Tunisia

This study addresses the problem of efficiently combining the joint, RGB and depth modalities of the Kinect sensor to recognise human actions. To this end, a multi‐layered fusion scheme concatenates modality‐specific features, builds specialised local and global SVM models, and then iteratively fuses their scores. The authors contribute at two levels: (i) they combine the performance of local descriptors with the strength of global bag‐of‐visual‐words representations, which yields improved local decisions that can handle noisy frames; (ii) they study the performance of multiple fusion schemes based on feature concatenation, concatenation of Fisher vector representations, and later iterative score fusion. To demonstrate the efficiency of the approach, they evaluate it on two challenging public datasets, CAD‐60 and CGC‐2014, and obtain competitive results on both benchmarks.
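The score‐fusion step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the min‐max normalisation, the weighted averaging, the two‐layer ordering, the weights and all names (`normalise_scores`, `fuse_scores`, the toy score matrices) are assumptions chosen only to show what fusing per‐model scores in successive layers might look like.

```python
import numpy as np


def normalise_scores(scores):
    """Min-max normalise each row of an (n_samples, n_classes) score matrix to [0, 1]."""
    scores = np.asarray(scores, dtype=float)
    lo = scores.min(axis=1, keepdims=True)
    hi = scores.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on constant rows
    return (scores - lo) / span


def fuse_scores(score_matrices, weights=None):
    """Late fusion: weighted average of normalised per-model score matrices."""
    stacked = np.stack([normalise_scores(s) for s in score_matrices])
    if weights is None:
        weights = np.ones(len(score_matrices))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # Contract the model axis: (n_models,) x (n_models, n_samples, n_classes)
    return np.tensordot(weights, stacked, axes=1)


# Toy scores for 2 samples and 3 action classes from three hypothetical models.
joint_scores = np.array([[2.0, 0.5, 0.1], [0.2, 0.3, 1.5]])
rgb_scores = np.array([[0.9, 1.0, 0.2], [0.1, 0.2, 0.8]])
depth_scores = np.array([[1.2, 0.4, 0.3], [0.5, 0.1, 0.9]])

# Assumed layering: first fuse RGB and depth, then fuse the result with the joints.
appearance = fuse_scores([rgb_scores, depth_scores])
final = fuse_scores([joint_scores, appearance], weights=[0.6, 0.4])
predicted = final.argmax(axis=1)
```

In this toy setting the RGB model alone would pick class 1 for the first sample, while the fused scores pick class 0, which is the kind of correction that score fusion is meant to provide.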
Published 18 August 2017 (issue dated October 2017), pages 530-540

10.1049/iet-cvi.2016.0326

http://onlinelibrary.wiley.com/termsAndConditions#vor

https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/iet-cvi.2016.0326

https://ietresearch.onlinelibrary.wiley.com/doi/pdf/10.1049/iet-cvi.2016.0326

https://onlinelibrary.wiley.com/doi/pdf/10.1049/iet-cvi.2016.0326

https://onlinelibrary.wiley.com/doi/full-xml/10.1049/iet-cvi.2016.0326