{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T06:57:02Z","timestamp":1769583422127,"version":"3.49.0"},"reference-count":12,"publisher":"SAGE Publications","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IFS"],"published-print":{"date-parts":[[2024,1,10]]},"abstract":"<jats:p>Gestures have long been recognized as an interaction technique that can provide a more natural, creative, and intuitive way to communicate with computers. However, some existing difficulties include the high probability that the same type of movement done at different speeds will be recognized as a different category of movement; cluttered, occluded, and low-resolution backgrounds; and the near-impossibility of fusing different types of features. To this end, we propose a novel framework for integrating different scales of RGB and motion skeletons to obtain higher recognition accuracy using multiple features. Specifically, we provide a network architecture that combines a three-dimensional convolutional neural network (3DCNN) and post-fusion to better embed different features. Also, we combine RGB and motion skeleton information at different scales to mitigate speed and background issues. Experiments on several gesture recognition public datasets show desirable results, validating the superiority of the proposed gesture recognition method. 
Finally, we do a human-computer interaction experiment to prove its practicality.<\/jats:p>","DOI":"10.3233\/jifs-234791","type":"journal-article","created":{"date-parts":[[2023,11,24]],"date-time":"2023-11-24T12:38:31Z","timestamp":1700829511000},"page":"1647-1661","source":"Crossref","is-referenced-by-count":0,"title":["Gestures recognition based on multimodal fusion by using 3D CNNs"],"prefix":"10.1177","volume":"46","author":[{"given":"Yimin","family":"Zhu","sequence":"first","affiliation":[{"name":"School of Information Engineering, Shenyang University of Chemical Technology, Shenyang, China"},{"name":"State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China"},{"name":"Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang, China"}]},{"given":"Qing","family":"Gao","sequence":"additional","affiliation":[{"name":"School of Electronics and Communication Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, China"},{"name":"State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China"},{"name":"Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang, China"}]},{"given":"Hongyan","family":"Shi","sequence":"additional","affiliation":[{"name":"School of Information Engineering, Shenyang University of Chemical Technology, Shenyang, China"}]},{"given":"Jinguo","family":"Liu","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China"},{"name":"Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang, China"}]}],"member":"179","reference":[{"key":"10.3233\/JIFS-234791_ref1","first-page":"7291","article-title":"Realtime multi-person 2D pose estimation using part affinity fields","author":"Cao","year":"2017","journal-title":"Proceedings 
of the IEEE conference on computer vision and pattern recognition"},{"key":"10.3233\/JIFS-234791_ref5","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"International Journal of Computer Vision"},{"issue":"4","key":"10.3233\/JIFS-234791_ref12","doi-asserted-by":"crossref","first-page":"82","DOI":"10.3390\/jimaging9040082","article-title":"A 3dcnn-based knowledge distillation framework for human activity recognition","volume":"9","author":"Ullah","year":"2023","journal-title":"Journal of Imaging"},{"issue":"3","key":"10.3233\/JIFS-234791_ref18","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1006\/cviu.2000.0897","article-title":"A survey of computer vision-based human motion capture","volume":"81","author":"Moeslund","year":"2001","journal-title":"Computer Vision and Image Understanding"},{"issue":"2","key":"10.3233\/JIFS-234791_ref19","first-page":"4","article-title":"Microsoft kinect sensor and its effect","volume":"19","author":"Zhang","year":"2012","journal-title":"IEEE Multimedia"},{"issue":"4","key":"10.3233\/JIFS-234791_ref21","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3072959.3073596","article-title":"Vnect: Real-time 3d human pose estimation with a single rgb camera","volume":"36","author":"Mehta","year":"2017","journal-title":"Acm Transactions on Graphics (tog)"},{"issue":"4","key":"10.3233\/JIFS-234791_ref22","doi-asserted-by":"crossref","first-page":"956","DOI":"10.1109\/TPAMI.2018.2827052","article-title":"Real-time 3d hand pose estimation with 3d convolutional neural networks","volume":"41","author":"Ge","year":"2018","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"11","key":"10.3233\/JIFS-234791_ref23","doi-asserted-by":"crossref","first-page":"1676","DOI":"10.1109\/TVCG.2010.272","article-title":"Learning a 3d human pose 
distance metric from geometric pose descriptor","volume":"17","author":"Chen","year":"2010","journal-title":"IEEE Transactions on Visualization and Computer Graphics"},{"key":"10.3233\/JIFS-234791_ref25","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1016\/j.patcog.2017.02.030","article-title":"Enhanced skeleton visualization for view invariant human action recognition","volume":"68","author":"Liu","year":"2017","journal-title":"Pattern Recognition"},{"issue":"2","key":"10.3233\/JIFS-234791_ref26","doi-asserted-by":"crossref","first-page":"239","DOI":"10.3390\/s19020239","article-title":"Mfa-net: Motion feature augmented network for dynamic hand gesture recognition from skeletal data","volume":"19","author":"Chen","year":"2019","journal-title":"Sensors"},{"issue":"8","key":"10.3233\/JIFS-234791_ref29","doi-asserted-by":"crossref","first-page":"2405","DOI":"10.1109\/TCSVT.2018.2864148","article-title":"Action recognition with spatio-temporal visual attention on skeleton image sequences","volume":"29","author":"Yang","year":"2018","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"10.3233\/JIFS-234791_ref33","doi-asserted-by":"crossref","first-page":"25811","DOI":"10.1109\/ACCESS.2020.2971283","article-title":"Real-time detection and motion recognition of human moving objects based on deep learning and multi-scale feature fusion in video","volume":"8","author":"Gong","year":"2020","journal-title":"IEEE Access"}],"container-title":["Journal of Intelligent & Fuzzy 
Systems"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/JIFS-234791","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,27]],"date-time":"2026-01-27T17:43:04Z","timestamp":1769535784000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/JIFS-234791"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,10]]},"references-count":12,"journal-issue":{"issue":"1"},"URL":"https:\/\/doi.org\/10.3233\/jifs-234791","relation":{},"ISSN":["1064-1246","1875-8967"],"issn-type":[{"value":"1064-1246","type":"print"},{"value":"1875-8967","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,10]]}}}