{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,6]],"date-time":"2025-12-06T16:49:31Z","timestamp":1765039771736,"version":"build-2065373602"},"reference-count":29,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T00:00:00Z","timestamp":1753315200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"JKA"},{"name":"KEIRIN RACE"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>In this work, we introduce IK-AUG, a unified algorithmic framework for kinematics-driven data augmentation tailored to sign language recognition (SLR). Departing from traditional augmentation techniques that operate at the pixel or feature level, our method integrates inverse kinematics (IK) and virtual simulation to synthesize anatomically valid gesture sequences within a structured 3D environment. The proposed system begins with sparse 3D keypoints extracted via a pose estimator and projects them into a virtual coordinate space. A differentiable IK solver based on forward-and-backward constrained optimization is then employed to reconstruct biomechanically plausible joint trajectories. To emulate natural signer variability and enhance data richness, we define a set of parametric perturbation operators spanning spatial displacement, depth modulation, and solver sensitivity control. These operators are embedded into a generative loop that transforms each original gesture sample into a diverse sequence cluster, forming a high-fidelity augmentation corpus. We benchmark our method across five deep sequence models (CNN3D, TCN, Transformer, Informer, and Sparse Transformer) and observe consistent improvements in accuracy and convergence. 
Notably, Informer achieves 94.1% validation accuracy with IK-AUG-enhanced training, underscoring the framework\u2019s efficacy. These results suggest that algorithmic augmentation via kinematic modeling offers a scalable, annotation-free pathway for improving SLR systems and lays the foundation for future integration with multi-sensor inputs in hybrid recognition pipelines.<\/jats:p>","DOI":"10.3390\/a18080463","type":"journal-article","created":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T14:11:44Z","timestamp":1753366304000},"page":"463","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Inverse Kinematics-Augmented Sign Language: A Simulation-Based Framework for Scalable Deep Gesture Recognition"],"prefix":"10.3390","volume":"18","author":[{"given":"Binghao","family":"Wang","sequence":"first","affiliation":[{"name":"Graduate School of Computer Science and Engineering, University of Aizu, Aizu-Wakamatsu 965-8580, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1181-2536","authenticated-orcid":false,"given":"Lei","family":"Jing","sequence":"additional","affiliation":[{"name":"Graduate School of Computer Science and Engineering, University of Aizu, Aizu-Wakamatsu 965-8580, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3919-2658","authenticated-orcid":false,"given":"Xiang","family":"Li","sequence":"additional","affiliation":[{"name":"Graduate School of Computer Science and Engineering, University of Aizu, Aizu-Wakamatsu 965-8580, Japan"}]}],"member":"1968","published-online":{"date-parts":[[2025,7,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Bragg, D., Koller, O., Bellard, M., Berke, L., Boudreault, P., Braffort, A., Caselli, N., Huenerfauth, M., Kacorri, H., and Verhoef, T. (2019, January 28\u201330). Sign language recognition, generation, and translation: An interdisciplinary perspective. 
Proceedings of the 21st International ACM Special Interest Group on Accessibility and Computing Conference on Computers and Accessibility, Pittsburgh, PA, USA.","DOI":"10.1145\/3308561.3353774"},{"key":"ref_2","unstructured":"Camgoz, N.C., Koller, O., Hadfield, S., and Bowden, R. (2020, January 13\u201319). Sign language transformers: Joint end-to-end sign language recognition and translation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"400","DOI":"10.1093\/deafed\/enu005","article-title":"Technology use among adults who are deaf and hard of hearing: A national survey","volume":"19","author":"Pagliaro","year":"2014","journal-title":"J. Deaf. Stud. Deaf. Educ."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1311","DOI":"10.1007\/s11263-018-1121-3","article-title":"Deep sign: Enabling robust statistical continuous sign language recognition via hybrid CNN-HMMs","volume":"126","author":"Koller","year":"2018","journal-title":"Int. J. Comput. Vis."},{"key":"ref_5","unstructured":"Li, D., Rodriguez, C., Yu, X., and Li, H. (2022, January 3\u20138). Word-level deep sign language recognition from video: A new large-scale dataset and methods comparison. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Ge, L., Ren, Z., Li, Y., Xue, Z., Wang, Y., Cai, J., and Yuan, J. (2019, January 15\u201320). 3D hand shape and pose estimation from a single RGB image. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01109"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"55524","DOI":"10.1109\/ACCESS.2025.3554046","article-title":"Deep learning approaches for continuous sign language recognition: A comprehensive review","volume":"13","author":"Khan","year":"2025","journal-title":"IEEE Access"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., Torres, J., and Giro-i Nieto, X. (2021, January 20\u201325). How2Sign: A large-scale multimodal dataset for continuous American Sign Language. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00276"},{"key":"ref_9","first-page":"3785","article-title":"Rwth-phoenix-weather: A large vocabulary sign language recognition and translation corpus","volume":"9","author":"Forster","year":"2012","journal-title":"Int. Conf. Lang. Resour. Eval."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"126917","DOI":"10.1109\/ACCESS.2021.3110912","article-title":"Deep learning for sign language recognition: Current techniques, benchmarks, and open issues","volume":"9","author":"Khalid","year":"2021","journal-title":"IEEE Access"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Jiang, S., Sun, B., Wang, L., Bai, Y., Li, K., and Fu, Y. (2021, January 20\u201325). Skeleton-aware multi-modal sign language recognition. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA.","DOI":"10.1109\/CVPRW53098.2021.00380"},{"key":"ref_12","first-page":"211","article-title":"Application of deep learning techniques on sign language recognition: A survey","volume":"Volume 70","author":"Barve","year":"2021","journal-title":"Data Management, Analytics and Innovation, Proceedings of the International Conference on Discrete Mathematics, Rupnagar, India, 11\u201313 February 2021"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1016\/j.jksuci.2018.11.008","article-title":"3D sign language recognition using spatio-temporal graph kernels","volume":"34","author":"Kumar","year":"2022","journal-title":"J. King Saud Univ. Comput. Inf. Sci."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1109\/TPAMI.2012.59","article-title":"3D convolutional neural networks for human action recognition","volume":"35","author":"Ji","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Lea, C., Vidal, R., Reiter, A., and Hager, G.D. (2016, October 8\u201316). Temporal convolutional networks: A unified approach to action segmentation. Proceedings of the European Conference on Computer Vision 2016 Workshops, Amsterdam, The Netherlands. Part III.","DOI":"10.1007\/978-3-319-49409-8_7"},{"key":"ref_16","first-page":"6000","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., and Zhang, W. (2021, January 2\u20139). Informer: Beyond efficient transformer for long sequence time-series forecasting. 
Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI) Conference on Artificial Intelligence, Online.","DOI":"10.1609\/aaai.v35i12.17325"},{"key":"ref_18","unstructured":"Child, R., Gray, S., Radford, A., and Sutskever, I. (2019). Generating long sequences with sparse transformers. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Alam, M.S., Lamberton, J., Wang, J., Leannah, C., Miller, S., Palagano, J., de Bastion, M., Smith, H.L., Malzkuhn, M., and Quandt, L.C. (2024). ASL champ!: A virtual reality game with deep-learning driven sign recognition. Comput. Educ. X Real., 4.","DOI":"10.1016\/j.cexr.2024.100059"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Vaitkevi\u010dius, A., Taroza, M., Bla\u017eauskas, T., Dama\u0161evi\u010dius, R., Maskeli\u016bnas, R., and Wo\u017aniak, M. (2019). Recognition of American sign language gestures in a virtual reality using leap motion. Appl. Sci., 9.","DOI":"10.3390\/app9030445"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Schioppo, J., Meyer, Z., Fabiano, D., and Canavan, S. (2019, January 4\u20139). Sign language recognition: Learning American sign language in a virtual environment. Proceedings of the Extended Abstracts of the 2019 Conference on Human Factors in Computing Systems, Glasgow, Scotland, UK.","DOI":"10.1145\/3290607.3313025"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Papadogiorgaki, M., Grammalidis, N., Makris, L., and Strintzis, M.G. (2006, January 5\u20136). Gesture synthesis from sign language notation using MPEG-4 humanoid animation parameters and inverse kinematics. Proceedings of the 2nd IET International Conference on Intelligent Environments, Athens, Greece.","DOI":"10.1049\/cp:20060637"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Ponton, J.L., Yun, H., Aristidou, A., Andujar, C., and Pelechano, N. (2023). Sparseposer: Real-time full-body motion reconstruction from sparse data. 
ACM Trans. Graph., 43.","DOI":"10.1145\/3625264"},{"key":"ref_24","unstructured":"Nunnari, F., Espa\u00f1a-Bonet, C., and Avramidis, E. (2021, January 1\u20133). A data augmentation approach for sign-language-to-text translation in-the-wild. Proceedings of the 3rd Conference on Language, Data and Knowledge. Schloss Dagstuhl\u2013Leibniz-Zentrum f\u00fcr Informatik, Zaragoza, Spain."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Awaluddin, B.A., Chao, C.T., and Chiou, J.S. (2024). A hybrid image augmentation technique for user-and environment-independent hand gesture recognition based on deep learning. Mathematics, 12.","DOI":"10.3390\/math12091393"},{"key":"ref_26","first-page":"1283","article-title":"Impact of face swapping and data augmentation on sign language recognition","volume":"24","year":"2024","journal-title":"Univers. Access Inf. Soc."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3377552","article-title":"Learning three-dimensional skeleton data from sign language video","volume":"11","author":"Brock","year":"2020","journal-title":"ACM Trans. Intell. Syst. Technol."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"15290","DOI":"10.1109\/ACCESS.2024.3481254","article-title":"Skeleton-based data augmentation for sign language recognition using adversarial learning","volume":"13","author":"Nakamura","year":"2024","journal-title":"IEEE Access"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Wen, F., Zhang, Z., He, T., and Lee, C. (2021). AI enabled sign language recognition and VR space bidirectional communication using triboelectric smart glove. Nat. 
Commun., 12.","DOI":"10.1038\/s41467-021-25637-w"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/8\/463\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:15:33Z","timestamp":1760033733000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/8\/463"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,24]]},"references-count":29,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2025,8]]}},"alternative-id":["a18080463"],"URL":"https:\/\/doi.org\/10.3390\/a18080463","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2025,7,24]]}}}