{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,18]],"date-time":"2026-06-18T16:07:14Z","timestamp":1781798834986,"version":"3.54.5"},"reference-count":37,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2023,6,14]],"date-time":"2023-06-14T00:00:00Z","timestamp":1686700800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Telekom Malaysia Research & Development","award":["RDTC\/23107"],"award-info":[{"award-number":["RDTC\/23107"]}]},{"name":"Telekom Malaysia Research & Development","award":["RGP1\/357\/43"],"award-info":[{"award-number":["RGP1\/357\/43"]}]},{"name":"Deanship of Scientific Research, King Khalid University, Saudi Arabia","award":["RDTC\/23107"],"award-info":[{"award-number":["RDTC\/23107"]}]},{"name":"Deanship of Scientific Research, King Khalid University, Saudi Arabia","award":["RGP1\/357\/43"],"award-info":[{"award-number":["RGP1\/357\/43"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Hand gesture recognition (HGR) is a crucial area of research that enhances communication by overcoming language barriers and facilitating human-computer interaction. Although previous works in HGR have employed deep neural networks, they fail to encode the orientation and position of the hand in the image. To address this issue, this paper proposes HGR-ViT, a Vision Transformer (ViT) model with an attention mechanism for hand gesture recognition. Given a hand gesture image, it is first split into fixed size patches. Positional embedding is added to these embeddings to form learnable vectors that capture the positional information of the hand patches. The resulting sequence of vectors are then served as the input to a standard Transformer encoder to obtain the hand gesture representation. 
A multilayer perceptron head is added to the output of the encoder to classify the hand gesture to the correct class. The proposed HGR-ViT obtains an accuracy of 99.98%, 99.36% and 99.85% for the American Sign Language (ASL) dataset, ASL with Digits dataset, and National University of Singapore (NUS) hand gesture dataset, respectively.<\/jats:p>","DOI":"10.3390\/s23125555","type":"journal-article","created":{"date-parts":[[2023,6,14]],"date-time":"2023-06-14T02:01:40Z","timestamp":1686708100000},"page":"5555","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":40,"title":["HGR-ViT: Hand Gesture Recognition with Vision Transformer"],"prefix":"10.3390","volume":"23","author":[{"given":"Chun Keat","family":"Tan","sequence":"first","affiliation":[{"name":"Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, Melaka 75450, Malaysia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1929-7978","authenticated-orcid":false,"given":"Kian Ming","family":"Lim","sequence":"additional","affiliation":[{"name":"Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, Melaka 75450, Malaysia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Roy Kwang Yang","family":"Chang","sequence":"additional","affiliation":[{"name":"Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, Melaka 75450, Malaysia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3679-8977","authenticated-orcid":false,"given":"Chin Poo","family":"Lee","sequence":"additional","affiliation":[{"name":"Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, Melaka 75450, 
Malaysia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1052-2657","authenticated-orcid":false,"given":"Ali","family":"Alqahtani","sequence":"additional","affiliation":[{"name":"Department of Computer Science, King Khalid University, Abha 61421, Saudi Arabia"},{"name":"Center for Artificial Intelligence (CAI), King Khalid University, Abha 61421, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2023,6,14]]},"reference":[{"key":"ref_1","first-page":"22","article-title":"Gesture Recognition of RGB and RGB-D Static Images Using Convolutional Neural Networks","volume":"5","author":"Khari","year":"2019","journal-title":"Int. J. Interact. Multim. Artif. Intell."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"8955","DOI":"10.1007\/s00521-019-04427-y","article-title":"Transfer learning-based convolutional neural networks with heuristic optimization for hand gesture recognition","volume":"31","author":"Ozcan","year":"2019","journal-title":"Neural Comput. Appl."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"5339","DOI":"10.1007\/s00521-020-05337-0","article-title":"Convolutional neural network with spatial pyramid pooling for hand gesture recognition","volume":"33","author":"Tan","year":"2021","journal-title":"Neural Comput. Appl."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Mujahid, A., Awan, M.J., Yasin, A., Mohammed, M.A., Dama\u0161evi\u010dius, R., Maskeli\u016bnas, R., and Abdulkareem, K.H. (2021). Real-time hand gesture recognition based on deep learning YOLOv3 model. Appl. Sci., 11.","DOI":"10.3390\/app11094164"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Ewe, E.L.R., Lee, C.P., Kwek, L.C., and Lim, K.M. (2022). Hand Gesture Recognition via Lightweight VGG16 and Ensemble Classifier. Appl. 
Sci., 12.","DOI":"10.3390\/app12157643"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"114797","DOI":"10.1016\/j.eswa.2021.114797","article-title":"Hand gesture recognition via enhanced densely connected convolutional neural network","volume":"175","author":"Tan","year":"2021","journal-title":"Expert Syst. Appl."},{"key":"ref_7","first-page":"906","article-title":"Wide Residual Network for Vision-based Static Hand Gesture Recognition","volume":"48","author":"Tan","year":"2021","journal-title":"IAENG Int. J. Comput. Sci."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1016\/j.neucom.2017.06.012","article-title":"A four dukkha state-space model for hand tracking","volume":"267","author":"Lim","year":"2017","journal-title":"Neurocomputing"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Chen, X., Wang, G., Guo, H., Zhang, C., Wang, H., and Zhang, L. (2019). Mfa-net: Motion feature augmented network for dynamic hand gesture recognition from skeletal data. Sensors, 19.","DOI":"10.3390\/s19020239"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Rahim, M.A., Islam, M.R., and Shin, J. (2019). Non-touch sign word recognition based on dynamic hand gesture using hybrid segmentation and CNN feature fusion. Appl. Sci., 9.","DOI":"10.3390\/app9183790"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Vaitkevi\u010dius, A., Taroza, M., Bla\u017eauskas, T., Dama\u0161evi\u010dius, R., Maskeli\u016bnas, R., and Wo\u017aniak, M. (2019). Recognition of American sign language gestures in a virtual reality using leap motion. Appl. Sci., 9.","DOI":"10.3390\/app9030445"},{"key":"ref_12","first-page":"1","article-title":"Dynamic hand gesture recognition based on signals from specialized data glove and deep learning algorithms","volume":"70","author":"Dong","year":"2021","journal-title":"IEEE Trans. Instrum. 
Meas."},{"key":"ref_13","first-page":"771","article-title":"A signer independent sign language recognition with co-articulation elimination from live videos: An Indian scenario","volume":"34","author":"Athira","year":"2022","journal-title":"J. King Saud Univ.-Comput. Inf. Sci."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Ma, L., and Huang, W. (2016, January 27\u201328). A static hand gesture recognition method based on the depth information. Proceedings of the 2016 8th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), Hangzhou, China.","DOI":"10.1109\/IHMSC.2016.159"},{"key":"ref_15","first-page":"561","article-title":"Recognition of static hand gesture with using ANN and SVM","volume":"10","author":"Bamwenda","year":"2019","journal-title":"Dicle Univ. J. Eng."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"012022","DOI":"10.1088\/1742-6596\/1367\/1\/012022","article-title":"Discrete Wavelet Transform on static hand gesture recognition","volume":"1367","author":"Candrasari","year":"2019","journal-title":"J. Phys. Conf. Ser."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Gao, Q., Liu, J., Ju, Z., Li, Y., Zhang, T., and Zhang, L. (2017, January 16\u201318). Static hand gesture recognition with parallel CNNs for space human-robot interaction. Proceedings of the International Conference on Intelligent Robotics and Applications, Wuhan, China.","DOI":"10.1007\/978-3-319-65289-4_44"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1515","DOI":"10.1049\/joe.2018.8327","article-title":"RGB-D static gesture recognition based on convolutional neural network","volume":"2018","author":"Xie","year":"2018","journal-title":"J. 
Eng."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"2353","DOI":"10.1016\/j.procs.2020.04.255","article-title":"A deep convolutional neural network approach for static hand gesture recognition","volume":"171","author":"Adithya","year":"2020","journal-title":"Procedia Comput. Sci."},{"key":"ref_20","first-page":"34","article-title":"Eye-Tracking Signals Based Affective Classification Employing Deep Gradient Convolutional Neural Networks","volume":"7","author":"Li","year":"2021","journal-title":"Int. J. Interact. Multimed. Artif. Intell."},{"key":"ref_21","first-page":"112","article-title":"A Novel Technique to Detect and Track Multiple Objects in Dynamic Video Surveillance Systems","volume":"7","author":"Adimoolam","year":"2022","journal-title":"Int. J. Interact. Multimed. Artif. Intell."},{"key":"ref_22","first-page":"1","article-title":"Hand Gesture Recognition based on Invariant Features and Artifical Neural Network","volume":"9","author":"Kaur","year":"2016","journal-title":"Indian J. Sci. Technol."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Gupta, B., Shukla, P., and Mittal, A. (2016, January 7\u20139). K-nearest correlated neighbor classification for Indian sign language gesture recognition using feature fusion. Proceedings of the 2016 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India.","DOI":"10.1109\/ICCCI.2016.7479951"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1016\/j.procs.2018.07.259","article-title":"Hand gesture recognition method based on HOG-LBP features for mobile device","volume":"126","author":"Lahiani","year":"2018","journal-title":"Procedia Comput. 
Sci."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1780","DOI":"10.1049\/iet-ipr.2017.1312","article-title":"Hand gesture recognition using DWT and Fratio based feature descriptor","volume":"12","author":"Sahoo","year":"2018","journal-title":"IET Image Process."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"6793","DOI":"10.1007\/s12652-020-02314-2","article-title":"Development of hand gesture recognition system using machine learning","volume":"12","author":"Parvathy","year":"2021","journal-title":"J. Ambient Intell. Humaniz. Comput."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Flores, C.J.L., Cutipa, A.G., and Enciso, R.L. (2017, January 15\u201318). Application of convolutional neural networks for static hand gestures recognition under different invariant features. Proceedings of the 2017 IEEE XXIV International Conference on Electronics, Electrical Engineering and Computing (INTERCON), Cusco, Peru.","DOI":"10.1109\/INTERCON.2017.8079727"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Alani, A.A., Cosma, G., Taherkhani, A., and McGinnity, T.M. (2018, January 25\u201327). Hand gesture recognition using an adapted convolutional neural network with data augmentation. Proceedings of the 2018 4th International Conference on Information Management (ICIM), Oxford, UK.","DOI":"10.1109\/INFOMAN.2018.8392660"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"547","DOI":"10.12988\/ces.2018.8241","article-title":"Convolutional neural network with a dag architecture for control of a robotic arm by means of hand gestures","volume":"11","author":"Arenas","year":"2018","journal-title":"Contemp. Eng. Sci."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"700","DOI":"10.1049\/iet-cvi.2018.5796","article-title":"HGR-Net: A fusion network for hand gesture segmentation and recognition","volume":"13","author":"Dadashzadeh","year":"2019","journal-title":"IET Comput. 
Vis."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"60","DOI":"10.4018\/IJACI.2019070104","article-title":"Convolutional neural network based american sign language static hand gesture recognition","volume":"10","author":"Ahuja","year":"2019","journal-title":"Int. J. Ambient Comput. Intell. (IJACI)"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Osimani, C., Ojeda-Castelo, J.J., and Piedra-Fernandez, J.A. (2023). Point Cloud Deep Learning Solution for Hand Gesture Recognition. Int. J. Interact. Multimed. Artif. Intell., 1\u201310. in press.","DOI":"10.9781\/ijimai.2023.01.001"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1007\/s41060-016-0008-z","article-title":"Recent methods in vision-based hand gesture recognition","volume":"1","author":"Badi","year":"2016","journal-title":"Int. J. Data Sci. Anal."},{"key":"ref_34","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16 \u00d7 16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Pugeault, N., and Bowden, R. (2011, January 6\u201313). Spelling it out: Real-time ASL fingerspelling recognition. Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain.","DOI":"10.1109\/ICCVW.2011.6130290"},{"key":"ref_36","first-page":"12","article-title":"A New 2D Static Hand Gesture Colour Image Dataset for ASL Gestures","volume":"15","author":"Barczak","year":"2011","journal-title":"Res. Lett. Inf. Math. Sci"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1007\/s11263-012-0560-5","article-title":"Attention based detection and recognition of hand postures against complex backgrounds","volume":"101","author":"Pisharady","year":"2013","journal-title":"Int. J. Comput. 
Vis."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/12\/5555\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:54:25Z","timestamp":1760126065000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/12\/5555"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,14]]},"references-count":37,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2023,6]]}},"alternative-id":["s23125555"],"URL":"https:\/\/doi.org\/10.3390\/s23125555","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,14]]}}}