{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,2]],"date-time":"2026-02-02T21:27:10Z","timestamp":1770067630611,"version":"3.49.0"},"reference-count":49,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2022,9,7]],"date-time":"2022-09-07T00:00:00Z","timestamp":1662508800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,9,7]],"date-time":"2022-09-07T00:00:00Z","timestamp":1662508800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Dynamic gesture recognition has become a new type of interaction to meet the needs of daily interaction. It is the most natural, easy to operate, and intuitive, so it has a wide range of applications. The accuracy of gesture recognition depends on the ability to accurately learn the short-term and long-term spatiotemporal features of gestures. Our work is different from improving the performance of a single type of network with convnets-based models and recurrent neural network-based models or serial stacking of two heterogeneous networks, we proposed a fusion architecture that can simultaneously learn short-term and long-term spatiotemporal features of gestures, which combined convnets-based models and recurrent neural network-based models in parallel. At each stage of feature learning, the short-term and long-term spatiotemporal features of gestures are captured simultaneously, and the contribution of two heterogeneous networks to the classification results in spatial and channel axes that can be learned automatically by using the attention mechanism. The sequence and pooling operation of the channel attention module and spatial attention module are compared through experiments. And the proportion of short-term and long-term features of gestures on channel and spatial axes in each stage of feature learning is quantitatively analyzed, and the final model is determined according to the experimental results. The module can be used for end-to-end learning and the proposed method was validated on the EgoGesture, SKIG, and IsoGD datasets and got very competitive performance.<\/jats:p>","DOI":"10.1007\/s40747-022-00858-8","type":"journal-article","created":{"date-parts":[[2022,9,7]],"date-time":"2022-09-07T08:03:54Z","timestamp":1662537834000},"page":"1377-1390","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Parallel temporal feature selection based on improved attention mechanism for dynamic gesture recognition"],"prefix":"10.1007","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9855-6223","authenticated-orcid":false,"given":"Gongzheng","family":"Chen","sequence":"first","affiliation":[]},{"given":"Zhenghong","family":"Dong","sequence":"additional","affiliation":[]},{"given":"Jue","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Lurui","family":"Xia","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,9,7]]},"reference":[{"issue":"4","key":"858_CR1","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2897824.2925953","volume":"35","author":"J Lien","year":"2016","unstructured":"Lien J, Gillian N, Karagozler ME, Amihood P, Schwesig C, Olson E, Raja H, Poupyrev I (2016) Soli: ubiquitous gesture sensing with millimeter wave radar. ACM Trans Graph 35(4):1\u201319","journal-title":"ACM Trans Graph"},{"key":"858_CR2","unstructured":"Nymoen K, Haugen MR, Jensenius AR (2015) Mumyo\u2013evaluating and exploring the myo armband for musical interaction. In: Proceedings of the international conference on new interfaces for musical expression"},{"key":"858_CR3","doi-asserted-by":"crossref","unstructured":"Parcheta Z, Mart\u00ednez-Hinarejos C-D (2017) Sign language gesture recognition using HMM. In: Iberian conference on pattern recognition and image analysis. Springer, pp.419\u2013426","DOI":"10.1007\/978-3-319-58838-4_46"},{"issue":"7","key":"858_CR4","doi-asserted-by":"publisher","first-page":"4820","DOI":"10.1109\/TII.2021.3129629","volume":"18","author":"M Wieczorek","year":"2021","unstructured":"Wieczorek M, Sika J, Wozniak M, Garg S, Hassan M (2021) Lightweight CNN model for human face detection in risk situations. IEEE Trans Ind Inf 18(7):4820\u20134829","journal-title":"IEEE Trans Ind Inf"},{"issue":"1","key":"858_CR5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41598-022-09293-8","volume":"12","author":"H Basak","year":"2022","unstructured":"Basak H, Kundu R, Singh PK, Ijaz MF, Wo\u017aniak M, Sarkar R (2022) A union of deep learning and swarm-based optimization for 3D human action recognition. Sci Rep 12(1):1\u201317","journal-title":"Sci Rep"},{"key":"858_CR6","doi-asserted-by":"crossref","unstructured":"Yan G, Wo\u017aniak M (2022) Accurate key frame extraction algorithm of video action for Aerobics online teaching. Mobile Netw Appl 1\u201310","DOI":"10.1007\/s11036-022-01939-1"},{"key":"858_CR7","doi-asserted-by":"crossref","unstructured":"Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3d convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 4489\u20134497","DOI":"10.1109\/ICCV.2015.510"},{"key":"858_CR8","doi-asserted-by":"crossref","unstructured":"Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132\u20137141","DOI":"10.1109\/CVPR.2018.00745"},{"key":"858_CR9","unstructured":"Park J, Woo S, Lee J-Y, Kweon IS (2018) BAM: Bottleneck attention module. http:\/\/arxiv.org\/abs\/1807.06514"},{"key":"858_CR10","doi-asserted-by":"crossref","unstructured":"Woo S, Park J, Lee J-Y, Kweon IS (2018) Cbam: convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV), pp 3\u201319","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"858_CR11","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2020.114499","volume":"169","author":"X Tang","year":"2021","unstructured":"Tang X, Yan Z, Peng J, Hao B, Wang H, Li J (2021) Selective spatiotemporal features learning for dynamic gesture recognition. Expert Syst Appl 169:114499","journal-title":"Expert Syst Appl"},{"issue":"5","key":"858_CR12","doi-asserted-by":"publisher","first-page":"1038","DOI":"10.1109\/TMM.2018.2808769","volume":"20","author":"Y Zhang","year":"2018","unstructured":"Zhang Y, Cao C, Cheng J, Lu H (2018) Egogesture: a new dataset and benchmark for egocentric hand gesture recognition. IEEE Trans Multimed 20(5):1038\u20131050","journal-title":"IEEE Trans Multimed"},{"key":"858_CR13","doi-asserted-by":"crossref","unstructured":"Klaser A, Marsza\u0142ek M, Schmid C (2008) A spatio-temporal descriptor based on 3d-gradients. In: BMVC 2008\u201319th British machine vision conference. British Machine Vision Association, pp 271\u2013275","DOI":"10.5244\/C.22.99"},{"key":"858_CR14","doi-asserted-by":"crossref","unstructured":"Wan J, Zhao Y, Zhou S, Guyon I, Escalera S, Li SZ (2016) Chalearn looking at people rgb-d isolated and continuous datasets for gesture recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 56\u201364","DOI":"10.1109\/CVPRW.2016.100"},{"issue":"4","key":"858_CR15","first-page":"470","volume":"30","author":"NB Ibrahim","year":"2018","unstructured":"Ibrahim NB, Selim MM, Zayed HH (2018) An automatic Arabic sign language recognition system (ArSLRS). J King Saud Univ Comput Inf Sci 30(4):470\u2013477","journal-title":"J King Saud Univ Comput Inf Sci"},{"key":"858_CR16","doi-asserted-by":"crossref","unstructured":"Yang X, Zhang C, Tian Y (2012) Recognizing actions using depth motion maps-based histograms of oriented gradients. In: Proceedings of the 20th ACM international conference on multimedia, pp 1057\u20131060","DOI":"10.1145\/2393347.2396382"},{"key":"858_CR17","doi-asserted-by":"crossref","unstructured":"Wang H, Schmid C (2013) Action recognition with improved trajectories. In: Proceedings of the IEEE international conference on computer vision, pp 3551\u20133558","DOI":"10.1109\/ICCV.2013.441"},{"key":"858_CR18","unstructured":"Wang L, Xiong Y, Wang Z, Qiao Y (2015) Towards good practices for very deep two-stream convnets. http:\/\/arxiv.org\/abs\/1507.02159."},{"key":"858_CR19","doi-asserted-by":"crossref","unstructured":"Wu J, Ishwar P, Konrad J (2016) Two-stream CNNs for gesture-based verification and identification: learning user style. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 42\u201350","DOI":"10.1109\/CVPRW.2016.21"},{"key":"858_CR20","doi-asserted-by":"crossref","unstructured":"Funke I, Bodenstedt S, Oehme F, von Bechtolsheim F, Weitz J, Speidel S (2019) Using 3D convolutional neural networks to learn spatiotemporal features for automatic surgical gesture recognition in video. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 467\u2013475","DOI":"10.1007\/978-3-030-32254-0_52"},{"key":"858_CR21","doi-asserted-by":"crossref","unstructured":"Miao Q, Li Y, Ouyang W, Ma Z, Xu X, Shi W, Cao X (2017) Multimodal gesture recognition based on the resc3d network. In: Proceedings of the IEEE international conference on computer vision workshops, pp 3047\u20133055","DOI":"10.1109\/ICCVW.2017.360"},{"issue":"2","key":"858_CR22","doi-asserted-by":"publisher","first-page":"430","DOI":"10.1007\/s11263-016-0957-7","volume":"126","author":"L Pigou","year":"2018","unstructured":"Pigou L, Van Den Oord A, Dieleman S, Van Herreweghe M, Dambre J (2018) Beyond temporal pooling: recurrence and temporal convolutions for gesture recognition in video. Int J Comput Vis 126(2):430\u2013439","journal-title":"Int J Comput Vis"},{"key":"858_CR23","doi-asserted-by":"crossref","unstructured":"Shi L, Zhang Y, Hu J, Cheng J, Lu H (2019) Gesture recognition using spatiotemporal deformable convolutional representation. In: 2019 IEEE international conference on image processing (ICIP). IEEE, pp 1900\u20131904","DOI":"10.1109\/ICIP.2019.8803152"},{"key":"858_CR24","doi-asserted-by":"crossref","unstructured":"Wan J, Escalera S, Anbarjafari G, Jair Escalante H, Bar\u00f3 X, Guyon I, Madadi M, Allik J, Gorbova J, Lin C (2017) Results and analysis of chalearn lap multi-modal isolated and continuous gesture recognition, and real versus fake expressed emotions challenges. In: Proceedings of the IEEE international conference on computer vision workshops, pp 3189\u20133197","DOI":"10.1109\/ICCVW.2017.377"},{"key":"858_CR25","unstructured":"Shi X, Chen Z, Wang H, Yeung DY, Wong WK, Woo WC (2015) Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In: Advances in neural information processing systems, pp 802\u2013810"},{"key":"858_CR26","doi-asserted-by":"crossref","unstructured":"Molchanov P, Yang X, Gupta S, Kim K, Tyree S, Kautz J (2016) Online detection and classification of dynamic hand gestures with recurrent 3d convolutional neural network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4207\u20134215","DOI":"10.1109\/CVPR.2016.456"},{"key":"858_CR27","doi-asserted-by":"publisher","first-page":"80","DOI":"10.1016\/j.patcog.2017.10.033","volume":"76","author":"JC Nunez","year":"2018","unstructured":"Nunez JC, Cabido R, Pantrigo JJ, Montemayor AS, Velez JF (2018) Convolutional neural networks and long short-term memory for skeleton-based human activity and hand gesture recognition. Pattern Recognit 76:80\u201394","journal-title":"Pattern Recognit"},{"key":"858_CR28","doi-asserted-by":"crossref","unstructured":"Zhang L, Zhu G, Shen P, Song J, Afaq Shah S, Bennamoun M (2017) Learning spatiotemporal features using 3dcnn and convolutional lstm for gesture recognition. In: Proceedings of the IEEE international conference on computer vision workshops, pp 3120\u20133128","DOI":"10.1109\/ICCVW.2017.369"},{"key":"858_CR29","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2019.112829","volume":"139","author":"A Elboushaki","year":"2020","unstructured":"Elboushaki A, Hannane R, Afdel K, Koutti L (2020) MultiD-CNN: a multi-dimensional feature learning approach based on deep convolutional networks for gesture recognition in RGB-D image sequences. Expert Syst Appl 139:112829","journal-title":"Expert Syst Appl"},{"issue":"11","key":"858_CR30","doi-asserted-by":"publisher","first-page":"2480","DOI":"10.1049\/iet-ipr.2019.1248","volume":"14","author":"Y Peng","year":"2020","unstructured":"Peng Y, Tao H, Li W, Yuan H, Li T (2020) Dynamic gesture recognition based on feature fusion network and variant ConvLSTM. IET Image Proc 14(11):2480\u20132486","journal-title":"IET Image Proc"},{"issue":"5","key":"858_CR31","doi-asserted-by":"publisher","first-page":"1051","DOI":"10.1109\/TMM.2018.2818329","volume":"20","author":"P Wang","year":"2018","unstructured":"Wang P, Li W, Gao Z, Tang C, Ogunbona PO (2018) Depth pooling based large-scale 3-d action recognition with convolutional neural networks. IEEE Trans Multimed 20(5):1051\u20131061","journal-title":"IEEE Trans Multimed"},{"key":"858_CR32","doi-asserted-by":"crossref","unstructured":"Hou J, Wang G, Chen X, Xue J-H, Zhu R, Yang H (2018) Spatial-temporal attention res-TCN for skeleton-based dynamic hand gesture recognition. In: Proceedings of the European conference on computer vision (ECCV) workshops","DOI":"10.1007\/978-3-030-11024-6_18"},{"key":"858_CR33","doi-asserted-by":"crossref","unstructured":"Wiederer J, Bouazizi A, Kressel U, Belagiannis V (2020) Traffic control gesture recognition for autonomous vehicles. In: 2020 IEEE\/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp 10676\u201310683","DOI":"10.1109\/IROS45743.2020.9341214"},{"key":"858_CR34","doi-asserted-by":"crossref","unstructured":"Dhingra N, Kunz A (2019) Res3atn-deep 3d residual attention network for hand gesture recognition in videos. In: 2019 international conference on 3D vision (3DV). IEEE, pp 491\u2013501","DOI":"10.1109\/3DV.2019.00061"},{"key":"858_CR35","unstructured":"Zhang L, Zhu G, Mei L, Shen P, Shah SAA, Bennamoun M (2018) Attention in convolutional LSTM for gesture recognition. In: Proceedings of the 32nd international conference on neural information processing systems, pp 1957\u20131966"},{"issue":"4","key":"858_CR36","doi-asserted-by":"publisher","first-page":"1323","DOI":"10.1109\/TNNLS.2019.2919764","volume":"31","author":"G Zhu","year":"2019","unstructured":"Zhu G, Zhang L, Yang L, Mei L, Shah SAA, Bennamoun M, Shen P (2019) Redundancy and attention in convolutional LSTM for gesture recognition. IEEE Trans Neural Netw Learn Syst 31(4):1323\u20131335","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"858_CR37","doi-asserted-by":"crossref","unstructured":"Materzynska J, Berger G, Bax I, Memisevic R (2019) The jester dataset: a large-scale video dataset of human gestures. In: Proceedings of the IEEE\/CVF international conference on computer vision workshops","DOI":"10.1109\/ICCVW.2019.00349"},{"key":"858_CR38","unstructured":"Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. http:\/\/arxiv.org\/abs\/1409.1556"},{"key":"858_CR39","unstructured":"Zhang L, Zhu G, Mei L, Shen P, Shah SAA, Bennamoun M (2018) Attention in convolutional LSTM for gesture recognition. In: Advances in neural information processing systems, p 31"},{"key":"858_CR40","doi-asserted-by":"crossref","unstructured":"Wang Z, She Q, Chalasani T, Smolic A (2020) Catnet: class incremental 3d convnets for lifelong egocentric gesture recognition. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition workshops, pp 230\u2013231","DOI":"10.1109\/CVPRW50498.2020.00123"},{"key":"858_CR41","doi-asserted-by":"crossref","unstructured":"Abavisani M, Joze HRV, Patel VM (2019) Improving the performance of unimodal dynamic hand-gesture recognition with multimodal training. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 1165\u20131174","DOI":"10.1109\/CVPR.2019.00126"},{"key":"858_CR42","doi-asserted-by":"crossref","unstructured":"K\u00f6p\u00fckl\u00fc O, Gunduz A, Kose N, Rigoll G (2019) Real-time hand gesture detection and classification using convolutional neural networks. In: 2019 14th IEEE international conference on automatic face & gesture recognition (FG 2019). IEEE, pp 1\u20138","DOI":"10.1109\/FG.2019.8756576"},{"key":"858_CR43","doi-asserted-by":"crossref","unstructured":"Han X, Lu F, Yin J, Tian G, Liu J (2022) Sign language recognition based on R (2+ 1) D With spatial\u2013temporal\u2013channel attention. IEEE Trans Hum Mach Syst 1\u201312","DOI":"10.1109\/THMS.2022.3144000"},{"key":"858_CR44","doi-asserted-by":"crossref","unstructured":"Wang Z, She Q, Smolic A (2021) Action-net: Multipath excitation for action recognition. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 13214\u201313223","DOI":"10.1109\/CVPR46437.2021.01301"},{"key":"858_CR45","unstructured":"Liu L, Shao L (2013) Learning discriminative representations from RGB-D video data. In: Twenty-third international joint conference on artificial intelligence, pp 1493\u20131500"},{"key":"858_CR46","doi-asserted-by":"crossref","unstructured":"Nishida N, Nakayama H (2015) Multimodal gesture recognition using multi-stream recurrent neural network. In: Image and video technology. Springer, pp 682\u2013694","DOI":"10.1007\/978-3-319-29451-3_54"},{"key":"858_CR47","doi-asserted-by":"crossref","unstructured":"Li D, Chen Y, Gao M, Jiang S, Huang C (2018) Multimodal gesture recognition using densely connected convolution and blstm. In: 2018 24th international conference on pattern recognition (ICPR). IEEE, pp 3365\u20133370","DOI":"10.1109\/ICPR.2018.8545502"},{"key":"858_CR48","doi-asserted-by":"crossref","unstructured":"Narayana P, Beveridge R, Draper BA (2018) Gesture recognition: focus on the hands. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5235\u20135244","DOI":"10.1109\/CVPR.2018.00549"},{"issue":"1","key":"858_CR49","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1007\/s11042-020-09700-0","volume":"80","author":"R Rastgoo","year":"2021","unstructured":"Rastgoo R, Kiani K, Escalera S (2021) Hand pose aware multimodal isolated sign language recognition. Multimed Tools Appl 80(1):127\u2013163","journal-title":"Multimed Tools Appl"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00858-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-022-00858-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00858-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,4,18]],"date-time":"2023-04-18T09:24:00Z","timestamp":1681809840000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-022-00858-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,7]]},"references-count":49,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,4]]}},"alternative-id":["858"],"URL":"https:\/\/doi.org\/10.1007\/s40747-022-00858-8","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,7]]},"assertion":[{"value":"1 May 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 August 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 September 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}