{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,18]],"date-time":"2026-04-18T10:01:26Z","timestamp":1776506486683,"version":"3.51.2"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2023,7,17]],"date-time":"2023-07-17T00:00:00Z","timestamp":1689552000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,7,17]],"date-time":"2023-07-17T00:00:00Z","timestamp":1689552000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100005711","name":"Universit\u00e4t Hamburg","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005711","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Cogn Comput"],"published-print":{"date-parts":[[2023,11]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>As robots are expected to get more involved in people\u2019s everyday lives, frameworks that enable intuitive user interfaces are in demand. Hand gesture recognition systems provide a natural way of communication and, thus, are an integral part of seamless human-robot interaction (HRI). Recent years have witnessed an immense evolution of computational models powered by deep learning. However, state-of-the-art models fall short of expanding across different gesture domains, such as emblems and co-speech. In this paper, we propose a novel hybrid hand gesture recognition system. Our <jats:italic>Snapture<\/jats:italic> architecture enables learning both static and dynamic gestures: by capturing a so-called <jats:italic>snapshot<\/jats:italic> of the gesture performance at its peak, we integrate the hand pose and the dynamic movement. Moreover, we present a method for analyzing the motion profile of a gesture to uncover its dynamic characteristics, which allows regulating a static channel based on the amount of motion. Our evaluation demonstrates the superiority of our approach on two gesture benchmarks compared to a state-of-the-art CNNLSTM baseline. Our analysis on a gesture class basis unveils the potential of our <jats:italic>Snapture<\/jats:italic> architecture for performance improvements using RGB data. Thanks to its modular implementation, our framework allows the integration of other multimodal data, like facial expressions and head tracking, which are essential cues in HRI scenarios, into one architecture. Thus, our work contributes both to integrative gesture recognition research and machine learning applications for non-verbal communication with robots.<\/jats:p>","DOI":"10.1007\/s12559-023-10174-z","type":"journal-article","created":{"date-parts":[[2023,7,17]],"date-time":"2023-07-17T16:03:07Z","timestamp":1689609787000},"page":"2014-2033","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["Snapture\u2014a Novel Neural Architecture for Combined Static and Dynamic Hand Gesture Recognition"],"prefix":"10.1007","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9907-1834","authenticated-orcid":false,"given":"Hassan","family":"Ali","sequence":"first","affiliation":[]},{"given":"Doreen","family":"Jirak","sequence":"additional","affiliation":[]},{"given":"Stefan","family":"Wermter","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,7,17]]},"reference":[{"key":"10174_CR1","volume-title":"Gesture recognition","author":"S Escalera","year":"2018","unstructured":"Escalera S, Guyon I, Athitsos V. Gesture recognition. 1st ed. Incorporated: Springer Publishing Company; 2018.","edition":"1"},{"key":"10174_CR2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10462-012-9356-9","volume":"43","author":"S Siddharth","year":"2015","unstructured":"Siddharth S, Agrawal A. Vision based hand gesture recognition for human computer interaction: a survey. Artif Intell Rev. 2015;43:1\u201354. https:\/\/doi.org\/10.1007\/s10462-012-9356-9.","journal-title":"Artif Intell Rev"},{"key":"10174_CR3","doi-asserted-by":"publisher","first-page":"365","DOI":"10.1007\/978-981-13-0776-8_33","volume-title":"Nanoelectronics, circuits and communication systems","author":"S Anwar","year":"2019","unstructured":"Anwar S, Sinha SK, Vivek S, Ashank V. Hand gesture recognition: a survey. In: Nath V, Mandal JK, editors. Nanoelectronics, circuits and communication systems. Singapore: Springer Singapore; 2019. p. 365\u201371."},{"key":"10174_CR4","doi-asserted-by":"publisher","unstructured":"Chakraborty B, Sarma D, Bhuyan M, MacDorman K. A review of constraints on vision-based gesture recognition for human-computer interaction. IET Comput Vis. 2017;12. https:\/\/doi.org\/10.1049\/iet-cvi.2017.0052.","DOI":"10.1049\/iet-cvi.2017.0052"},{"key":"10174_CR5","doi-asserted-by":"publisher","unstructured":"Abdulazeez AM, Faizi S. Vision-based mobile robot controllers: a scientific review. Turkish J Comput Math Educ (TURCOMAT). 2021;12. https:\/\/doi.org\/10.17762\/turcomat.v12i6.2695.","DOI":"10.17762\/turcomat.v12i6.2695"},{"key":"10174_CR6","doi-asserted-by":"publisher","unstructured":"Renard F, Guedria S, De Palma N, Vuillerme N. Variability and reproducibility in deep learning for medical image segmentation. Sci Rep. 2020;10. https:\/\/doi.org\/10.1038\/s41598-020-69920-0.","DOI":"10.1038\/s41598-020-69920-0"},{"key":"10174_CR7","doi-asserted-by":"publisher","unstructured":"Vanamsterdam B, Clarkson M, Stoyanov D. Gesture recognition in robotic surgery: a review. IEEE Trans Biomed Eng.\u00a02021;1\u20131. https:\/\/doi.org\/10.1109\/TBME.2021.3054828.","DOI":"10.1109\/TBME.2021.3054828"},{"key":"10174_CR8","doi-asserted-by":"publisher","unstructured":"Asadi-Aghbolaghi M, Clap\u00e9s A, Bellantonio M, Escalante HJ, Ponce-L\u00f3pez V, Bar\u00f3 X, Guyon I, Kasaei S, Escalera S. A survey on deep learning based approaches for action and gesture recognition in image sequences. In: 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017); 2017. p. 476\u201383. \nhttps:\/\/doi.org\/10.1109\/FG.2017.150.","DOI":"10.1109\/FG.2017.150"},{"key":"10174_CR9","unstructured":"Tsironi E, Barros P, Wermter S. Gesture recognition with a convolutional long short-term memory recurrent neural network. In: Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN). 2016.\u00a0p. 213\u20138."},{"key":"10174_CR10","doi-asserted-by":"publisher","unstructured":"dos Santos CC, Samatelo JLA, Vassallo RF. Dynamic gesture recognition by using CNNs and star RGB: a temporal information condensation. Neurocomputing. 2020;400:238\u201354. https:\/\/doi.org\/10.1016\/j.neucom.2020.03.038. www.sciencedirect.com\/science\/article\/pii\/S092523122030391X.","DOI":"10.1016\/j.neucom.2020.03.038"},{"key":"10174_CR11","doi-asserted-by":"publisher","unstructured":"Kendon A. Gesticulation and speech: two aspects of the process of utterance. In: The relationship of verbal and nonverbal communication. De Gruyter Mouton; 2011. p. 207\u201328. https:\/\/doi.org\/10.1515\/9783110813098.207.","DOI":"10.1515\/9783110813098.207"},{"key":"10174_CR12","doi-asserted-by":"publisher","unstructured":"Tsironi E, Barros P, Weber C, Wermter S. An analysis of convolutional long short-term memory recurrent neural networks for gesture recognition. Neurocomputing. 2017;268:76\u201386. https:\/\/doi.org\/10.1016\/j.neucom.2016.12.088. www.sciencedirect.com\/science\/article\/pii\/S0925231217307555.","DOI":"10.1016\/j.neucom.2016.12.088"},{"key":"10174_CR13","doi-asserted-by":"publisher","first-page":"459","DOI":"10.1007\/978-3-319-16178-5_32","volume-title":"Computer Vision - ECCV 2014 Workshops","author":"S Escalera","year":"2015","unstructured":"Escalera S, Bar\u00f3 X, Gonz\u00e0lez J, Bautista MA, Madadi M, Reyes M, Ponce-L\u00f3pez V, Escalante HJ, Shotton J, Guyon I. Chalearn looking at people challenge 2014: dataset and results. In: Agapito L, Bronstein MM, Rother C, editors. Computer Vision - ECCV 2014 Workshops. Cham: Springer International Publishing; 2015. p. 459\u201373."},{"key":"10174_CR14","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TPAMI.2016.2537340","volume":"38","author":"D Wu","year":"2016","unstructured":"Wu D, Pigou L, Kindermans PJ, Le N, Shao L, Dambre J, Odobez JM. Deep dynamic neural networks for multimodal gesture segmentation and recognition. IEEE Trans Pattern Anal Mach Intell. 2016;38:1\u20131. https:\/\/doi.org\/10.1109\/TPAMI.2016.2537340.","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"10174_CR15","doi-asserted-by":"publisher","first-page":"2227","DOI":"10.3390\/s21062227","volume":"21","author":"O Mazhar","year":"2021","unstructured":"Mazhar O, Ramdani S, Cherubini A. A deep learning framework for recognizing both static and dynamic gestures. Sensors. 2021;21:2227. https:\/\/doi.org\/10.3390\/s21062227.","journal-title":"Sensors"},{"key":"10174_CR16","doi-asserted-by":"publisher","unstructured":"Wan J, Li SZ, Zhao Y, Zhou S, Guyon I, Escalera S. Chalearn looking at people RGB-D isolated and continuous datasets for gesture recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). 2016. p. 761\u20139. https:\/\/doi.org\/10.1109\/CVPRW.2016.100.","DOI":"10.1109\/CVPRW.2016.100"},{"key":"10174_CR17","doi-asserted-by":"publisher","unstructured":"Mazhar O. OpenSign - Kinect v2 hand gesture data - American sign language. 2019. https:\/\/doi.org\/10.17632\/k793ybxx7t.1.","DOI":"10.17632\/k793ybxx7t.1"},{"key":"10174_CR18","doi-asserted-by":"publisher","unstructured":"D\u2019Eusanio A, Simoni A, Pini S, Borghi G, Vezzani R, Cucchiara R. A transformer-based network for dynamic hand gesture recognition. In: 2020 International Conference on 3D Vision (3DV). 2020. p. 623\u201332. https:\/\/doi.org\/10.1109\/3DV50981.2020.00072.","DOI":"10.1109\/3DV50981.2020.00072"},{"key":"10174_CR19","doi-asserted-by":"publisher","unstructured":"Molchanov P, Yang X, Gupta S, Kim K, Tyree S, Kautz J. Online detection and classification of dynamic hand gestures with recurrent 3D convolutional neural networks. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. p. 4207\u201315. https:\/\/doi.org\/10.1109\/CVPR.2016.456.","DOI":"10.1109\/CVPR.2016.456"},{"key":"10174_CR20","doi-asserted-by":"publisher","unstructured":"Manganaro F, Pini S, Borghi G, Vezzani R, Cucchiara R. Hand gestures for the human-car interaction: The Briareo dataset. In: Image Analysis and Processing \u2013 ICIAP 2019. Springer International Publishing; 2019. p. 560\u201371. https:\/\/doi.org\/10.1007\/978-3-030-30645-8_51.","DOI":"10.1007\/978-3-030-30645-8_51"},{"key":"10174_CR21","doi-asserted-by":"publisher","first-page":"6452","DOI":"10.3390\/s22176452","volume":"22","author":"W Aditya","year":"2022","unstructured":"Aditya W, Shih T, Thaipisutikul T, Fitriajie A, Gochoo M, Utaminingrum F, Lin CY. Novel spatio-temporal continuous sign language recognition using an attentive multi-feature network. Sensors. 2022;22:6452. https:\/\/doi.org\/10.3390\/s22176452.","journal-title":"Sensors"},{"key":"10174_CR22","doi-asserted-by":"crossref","unstructured":"Huang J, Zhou W, Zhang Q, Li H, Li W. Video-based sign language recognition without temporal segmentation. In: AAAI Conference on Artificial Intelligence (AAAI). 2018.","DOI":"10.1609\/aaai.v32i1.11903"},{"key":"10174_CR23","doi-asserted-by":"publisher","unstructured":"Pu J, Zhou W, Li H. Iterative alignment network for continuous sign language recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR). 2019. p. 4160\u20139. https:\/\/doi.org\/10.1109\/CVPR.2019.00429.","DOI":"10.1109\/CVPR.2019.00429"},{"key":"10174_CR24","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2019.00223","author":"H Zhou","year":"2019","unstructured":"Zhou H, Zhou W, Li H. Dynamic pseudo label decoding for continuous sign language recognition. Int Conf Multimedia Expo (ICME). 2019. https:\/\/doi.org\/10.1109\/ICME.2019.00223.","journal-title":"Int Conf Multimedia Expo (ICME)"},{"key":"10174_CR25","doi-asserted-by":"publisher","first-page":"108","DOI":"10.1016\/j.cviu.2015.09.013","volume":"141","author":"O Koller","year":"2015","unstructured":"Koller O, Forster J, Ney H. Continuous sign language recognition: towards large vocabulary statistical recognition systems handling multiple signers. Comput Vis Image Underst. 2015;141:108\u201325.","journal-title":"Comput Vis Image Underst"},{"key":"10174_CR26","doi-asserted-by":"publisher","unstructured":"Cao Z, Li Y, Shin BS. Content-adaptive and attention-based network for hand gesture recognition. Appl Sci. 2022;12(4). https:\/\/doi.org\/10.3390\/app12042041, \nhttps:\/\/www.mdpi.com\/2076-3417\/12\/4\/2041.","DOI":"10.3390\/app12042041"},{"issue":"5","key":"10174_CR27","doi-asserted-by":"publisher","first-page":"1038","DOI":"10.1109\/TMM.2018.2808769","volume":"20","author":"Y Zhang","year":"2018","unstructured":"Zhang Y, Cao C, Cheng J, Lu H. Egogesture: a new dataset and benchmark for egocentric hand gesture recognition. IEEE Trans Multimedia. 2018;20(5):1038\u201350. https:\/\/doi.org\/10.1109\/TMM.2018.2808769.","journal-title":"IEEE Trans Multimedia"},{"key":"10174_CR28","doi-asserted-by":"publisher","DOI":"10.1007\/s40747-022-00858-8","author":"G Chen","year":"2022","unstructured":"Chen G, Dong Z, Wang J, Xia L. Parallel temporal feature selection based on improved attention mechanism for dynamic gesture recognition. Complex Intell Syst. 2022. https:\/\/doi.org\/10.1007\/s40747-022-00858-8.","journal-title":"Complex Intell Syst"},{"key":"10174_CR29","doi-asserted-by":"publisher","unstructured":"Klaser A, Marszalek M, Schmid C. A spatio-temporal descriptor based on 3D-gradients. In: Everingham M, Needham C, Fraile R editors. BMVC 2008 - 19th British Machine Vision Conference. British Machine Vision Association, Leeds, United Kingdom; 2008. p. 275:1\u201310. https:\/\/doi.org\/10.5244\/C.22.99.","DOI":"10.5244\/C.22.99"},{"issue":"4","key":"10174_CR30","doi-asserted-by":"publisher","first-page":"600","DOI":"10.1109\/TIP.2003.819861","volume":"13","author":"Z Wang","year":"2004","unstructured":"Wang Z, Bovik A, Sheikh H, Simoncelli E. Image quality assessment: From error visibility to structural similarity. IEEE Trans Image Process. 2004;13(4):600\u201312. https:\/\/doi.org\/10.1109\/TIP.2003.819861.","journal-title":"IEEE Trans Image Process"},{"key":"10174_CR31","unstructured":"Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Bach F, Blei\u00a0D, editors. Proceedings of the 32nd International Conference on Machine Learning, Proceedings of Machine Learning Research (vol. 37). PMLR, Lille, France; 2015. p. 448\u201356. https:\/\/proceedings.mlr.press\/v37\/ioffe15.html."},{"key":"10174_CR32","first-page":"249","volume":"9","author":"X Glorot","year":"2010","unstructured":"Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. J Mach Learn Res - Proc Track. 2010;9:249\u201356.","journal-title":"J Mach Learn Res - Proc Track"},{"key":"10174_CR33","doi-asserted-by":"publisher","unstructured":"Pham V, Bluche T, Kermorvant C, Louradour J. Dropout improves recurrent neural networks for handwriting recognition. In: 2014 14th International Conference on Frontiers in Handwriting Recognition. 2014. p. 285\u201390. https:\/\/doi.org\/10.1109\/ICFHR.2014.55.","DOI":"10.1109\/ICFHR.2014.55"},{"key":"10174_CR34","doi-asserted-by":"publisher","first-page":"696","DOI":"10.1109\/34.1000242","volume":"1","author":"RL Hsu","year":"2002","unstructured":"Hsu RL, Abdel-Mottaleb M, Jain A. Face detection in color images. IEEE Trans Pattern Anal Mach Intell. 2002;1:696\u2013706. https:\/\/doi.org\/10.1109\/34.1000242.","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"10174_CR35","doi-asserted-by":"publisher","unstructured":"Qiu-yu Z, Lu J, Zhang M, Duan H, Lv L. Hand gesture segmentation method based on YCbCr color space and k-means clustering. Int J Signal Process Image Process Pattern Recog. 2015;8:105\u201316. https:\/\/doi.org\/10.14257\/ijsip.2015.8.5.11.","DOI":"10.14257\/ijsip.2015.8.5.11"},{"key":"10174_CR36","unstructured":"Basilio JAM, Torres GA, P\u00e9rez GS, Medina LKT, Meana HMP. Explicit image detection using YCbCr space color model as skin detection. In: Proceedings of the 2011 American Conference on Applied Mathematics and the 5th WSEAS International Conference on Computer Engineering and Applications, AMERICAN-MATH\u201911\/CEA\u201911. World Scientific and Engineering Academy and Society (WSEAS), Stevens Point, Wisconsin, USA; 2011.\u00a0p. 123\u20138."}],"container-title":["Cognitive Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s12559-023-10174-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s12559-023-10174-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s12559-023-10174-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,13]],"date-time":"2023-11-13T10:22:56Z","timestamp":1699870976000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s12559-023-10174-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,17]]},"references-count":36,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,11]]}},"alternative-id":["10174"],"URL":"https:\/\/doi.org\/10.1007\/s12559-023-10174-z","relation":{},"ISSN":["1866-9956","1866-9964"],"issn-type":[{"value":"1866-9956","type":"print"},{"value":"1866-9964","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,7,17]]},"assertion":[{"value":"25 May 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 June 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 July 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of Interest"}}]}}