{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T15:19:03Z","timestamp":1777735143262,"version":"3.51.4"},"reference-count":113,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2021,10,28]],"date-time":"2021-10-28T00:00:00Z","timestamp":1635379200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science Foundation","award":["1846076"],"award-info":[{"award-number":["1846076"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Access. Comput."],"published-print":{"date-parts":[[2021,12,31]]},"abstract":"<jats:p>Many sign languages are bona fide natural languages with grammatical rules and lexicons hence can benefit from machine translation methods. Similarly, since sign language is a visual-spatial language, it can also benefit from computer vision methods for encoding it. With the advent of deep learning methods in recent years, significant advances have been made in natural language processing (specifically neural machine translation) and in computer vision methods (specifically image and video captioning). Researchers have therefore begun expanding these learning methods to sign language understanding. Sign language interpretation is especially challenging, because it involves a continuous visual-spatial modality where meaning is often derived based on context.<\/jats:p>\n          <jats:p>\n            The focus of this article, therefore, is to examine various deep learning\u2013based methods for encoding sign language as inputs, and to analyze the efficacy of several machine translation methods, over three different sign language datasets. 
The goal is to determine which combinations are sufficiently robust for sign language translation\n            <jats:italic>without<\/jats:italic>\n            any gloss-based information.\n          <\/jats:p>\n          <jats:p>To understand the role of the different input features, we perform ablation studies over the model architectures (input features + neural translation models) for improved continuous sign language translation. These input features include body and finger joints and facial points, as well as vector representations\/embeddings from convolutional neural networks. The machine translation models explored include several baseline sequence-to-sequence approaches as well as more complex networks using attention, reinforcement learning, and the transformer model. We implement the translation methods over multiple sign languages\u2014German (GSL), American (ASL), and Chinese sign languages (CSL). From our analysis, the transformer model combined with input embeddings from ResNet50 or pose-based landmark features outperformed all the other sequence-to-sequence models by achieving higher BLEU2-BLEU4 scores when applied to the controlled and constrained GSL benchmark dataset. 
These combinations also showed significant promise on the other less controlled ASL and CSL datasets.<\/jats:p>","DOI":"10.1145\/3477498","type":"journal-article","created":{"date-parts":[[2021,10,28]],"date-time":"2021-10-28T17:43:10Z","timestamp":1635442990000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":47,"title":["Deep Learning Methods for Sign Language Translation"],"prefix":"10.1145","volume":"14","author":[{"given":"Tejaswini","family":"Ananthanarayana","sequence":"first","affiliation":[{"name":"Rochester Institute of Technology, Rochester, New York"}]},{"given":"Priyanshu","family":"Srivastava","sequence":"additional","affiliation":[{"name":"Rochester Institute of Technology, Rochester, New York"}]},{"given":"Akash","family":"Chintha","sequence":"additional","affiliation":[{"name":"Rochester Institute of Technology, Rochester, New York"}]},{"given":"Akhil","family":"Santha","sequence":"additional","affiliation":[{"name":"Rochester Institute of Technology, Rochester, New York"}]},{"given":"Brian","family":"Landy","sequence":"additional","affiliation":[{"name":"Rochester Institute of Technology, Rochester, New York"}]},{"given":"Joseph","family":"Panaro","sequence":"additional","affiliation":[{"name":"Rochester Institute of Technology, Rochester, New York"}]},{"given":"Andre","family":"Webster","sequence":"additional","affiliation":[{"name":"Rochester Institute of Technology, Rochester, New York"}]},{"given":"Nikunj","family":"Kotecha","sequence":"additional","affiliation":[{"name":"Rochester Institute of Technology, Rochester, New York"}]},{"given":"Shagan","family":"Sah","sequence":"additional","affiliation":[{"name":"Rochester Institute of Technology, Rochester, New York"}]},{"given":"Thomastine","family":"Sarchet","sequence":"additional","affiliation":[{"name":"Rochester Institute of Technology, Rochester, New 
York"}]},{"given":"Raymond","family":"Ptucha","sequence":"additional","affiliation":[{"name":"Rochester Institute of Technology, Rochester, New York"}]},{"given":"Ifeoma","family":"Nwogu","sequence":"additional","affiliation":[{"name":"Rochester Institute of Technology, Rochester, New York"}]}],"member":"320","published-online":{"date-parts":[[2021,10,28]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"crossref","unstructured":"Biyi Fang Jillian Co and Mi Zhang. 2017. DeepASL: Enabling ubiquitous and non-intrusive word and sentence-level sign language translation. In Proceedings of the 15th ACM Conference on Embedded Network Sensor Systems . 1\u201313.","DOI":"10.1145\/3131672.3131693"},{"key":"e_1_3_3_3_2","unstructured":"Openpose 2D. 2018. Retrieved January 4 2020 from https:\/\/github.com\/CMU-Perceptual-Computing-Lab\/openpose."},{"key":"e_1_3_3_4_2","doi-asserted-by":"crossref","unstructured":"Nayyer Aafaq Syed Zulqarnain Gilani Wei Liu and Ajmal Mian. 2018. Video description: A survey of methods datasets and evaluation metrics. ACM Comput. Surv. 52 6 Article 115 (Oct. 2019) 37 pages. https:\/\/doi.org\/10.1145\/3355390","DOI":"10.1145\/3355390"},{"key":"e_1_3_3_5_2","unstructured":"Dzmitry Bahdanau Kyunghyun Cho and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915) San Diego CA USA May 7-9 2015 Yoshua Bengio and Yann LeCun (Eds.). http:\/\/arxiv.org\/abs\/1409.0473"},{"key":"e_1_3_3_6_2","unstructured":"Dzmitry Bahdanau Kyunghyun Cho and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations ICLR 2015 San Diego CA USA May 7-9 2015 Yoshua Bengio and Yann LeCun (Eds.). 
http:\/\/arxiv.org\/abs\/1409.0473"},{"key":"e_1_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.339"},{"key":"e_1_3_3_8_2","unstructured":"Tom B. Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell Sandhini Agarwal Ariel Herbert-Voss Gretchen Krueger Tom Henighan Rewon Child Aditya Ramesh Daniel M. Ziegler Jeffrey Wu Clemens Winter Christopher Hesse Mark Chen Eric Sigler Mateusz Litwin Scott Gray Benjamin Chess Jack Clark Christopher Berner Sam McCandlish Alec Radford Ilya Sutskever and Dario Amodei. 2020. Language models are few-shot learners. In Advances in Neural Information Processing Systems H. Larochelle M. Ranzato R. Hadsell M. F. Balcan and H. Lin (Eds.) Vol. 33. Curran Associates Inc. 1877\u20131901. https:\/\/proceedings.neurips.cc\/paper\/2020\/file\/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf."},{"key":"e_1_3_3_9_2","doi-asserted-by":"crossref","unstructured":"Necati Cihan Camgoz Oscar Koller Simon Hadfield and Richard Bowden. 2020. Sign language transformers: Joint end-to-end sign language recognition and translation. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition CVPR . Computer Vision Foundation \/ IEEE 10020\u201310030. https:\/\/doi.org\/10.1109\/CVPR42600.2020.01004","DOI":"10.1109\/CVPR42600.2020.01004"},{"key":"e_1_3_3_10_2","article-title":"OpenPose: Realtime multi-person 2D pose estimation using part affinity fields","author":"Cao Z.","year":"2019","unstructured":"Z. Cao, G. Hidalgo Martinez, T. Simon, S. Wei, and Y. A. Sheikh. 2019. OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. (2019).","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.143"},{"key":"e_1_3_3_12_2","first-page":"847","volume-title":"Proceedings of the 10th Asian Conference on Machine Learning","author":"Chen Ming","year":"2018","unstructured":"Ming Chen, Yingming Li, Zhongfei Zhang, and Siyu Huang. 2018. TVT: Two-view transformer network for video captioning. In Proceedings of the 10th Asian Conference on Machine Learning, Jun Zhu and Ichiro Takeuchi (Eds.). PMLR, 847\u2013862. http:\/\/proceedings.mlr.press\/v95\/chen18b.html."},{"key":"e_1_3_3_13_2","doi-asserted-by":"crossref","unstructured":"Kyunghyun Cho Bart van Merri\u00ebnboer Dzmitry Bahdanau and Yoshua Bengio. 2014. On the Properties of Neural Machine Translation: Encoder\u2013Decoder Approaches. In Proceedings of SSST-8 Eighth Workshop on Syntax Semantics and Structure in Statistical Translation . Association for Computational Linguistics Doha Qatar 103\u2013111. https:\/\/doi.org\/10.3115\/v1\/W14-4012","DOI":"10.3115\/v1\/W14-4012"},{"key":"e_1_3_3_14_2","doi-asserted-by":"crossref","unstructured":"KyungHyun Cho Bart van Merrienboer Dzmitry Bahdanau and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. In Proceedings of SSST-8 Eighth Workshop on Syntax Semantics and Structure in Statistical Translation . Association for Computational Linguistics Doha Qatar 103\u2013111. https:\/\/doi.org\/10.3115\/v1\/W14-4012","DOI":"10.3115\/v1\/W14-4012"},{"key":"e_1_3_3_15_2","unstructured":"Junyoung Chung Caglar Gulcehre KyungHyun Cho and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555. 
Retrieved from https:\/\/arxiv.org\/abs\/1412.3555."},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.367"},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.332"},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00812"},{"key":"e_1_3_3_19_2","doi-asserted-by":"crossref","unstructured":"Zihang Dai Zhilin Yang Yiming Yang Jaime G. Carbonell Quoc V. Le and Ruslan Salakhutdinov. 2019. Transformer-XL: Attentive language models beyond a fixed-length context. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics . Association for Computational Linguistics Florence Italy 2978\u20132988. https:\/\/doi.org\/10.18653\/v1\/P19-1285","DOI":"10.18653\/v1\/P19-1285"},{"issue":"2","key":"e_1_3_3_20_2","first-page":"251","article-title":"Spoken vs. sign languages\u2014What\u2019s the difference?","volume":"15","author":"Damian Simona","year":"2011","unstructured":"Simona Damian. 2011. Spoken vs. sign languages\u2014What\u2019s the difference? Cogn. Brain Behav. 15, 2 (2011), 251.","journal-title":"Cogn. Brain Behav."},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.321"},{"key":"e_1_3_3_22_2","unstructured":"American Sign Language Dataset. Retrieved May 9 2020 from http:\/\/www.bu.edu\/asllrp\/."},{"key":"e_1_3_3_23_2","unstructured":"American Sign Language Dataset. Retrieved May 9 2020 from http:\/\/www.bu.edu\/asllrp\/."},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_3_25_2","doi-asserted-by":"crossref","unstructured":"Jacob Devlin Ming-Wei Chang Kenton Lee and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies . 
Association for Computational Linguistics Minneapolis Minnesota 4171\u20134186. https:\/\/doi.org\/10.18653\/v1\/N19-1423","DOI":"10.18653\/v1\/N19-1423"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298878"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3131672.3131693"},{"key":"e_1_3_3_28_2","unstructured":"AlphaGo Zero: Starting from scratch. Retrieved May 1 2020 from https:\/\/deepmind.com\/blog\/article\/alphago-zero-starting-scratch."},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2729019"},{"key":"e_1_3_3_30_2","doi-asserted-by":"publisher","DOI":"10.5555\/3305381.3305510"},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143891"},{"key":"e_1_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2017.373"},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_3_35_2","unstructured":"Markus Hosemann and Jana Steinbach. [n.d.]. Atlas of Sign Language Structures."},{"key":"e_1_3_3_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2018.2797228"},{"key":"e_1_3_3_37_2","unstructured":"Five Parameters in Sign Language. Retrieved March 24 2021 from https:\/\/www.lifeprint.com\/asl101\/pages-layout\/parameters.htm\/."},{"key":"e_1_3_3_38_2","unstructured":"Glossing in Sign Language. Retrieved April 23 2020 from https:\/\/www.lifeprint.com\/asl101\/topics\/gloss.htm."},{"key":"e_1_3_3_39_2","unstructured":"Glossing in Sign Language. Retrieved April 23 2020 from https:\/\/www.startasl.com\/sign-language-symbols\/."},{"key":"e_1_3_3_40_2","doi-asserted-by":"crossref","unstructured":"Eldar Insafutdinov Mykhaylo Andriluka Leonid Pishchulin Siyu Tang Evgeny Levinkov Bjoern Andres and Bernt Schiele. 2016. Articulated multi-person tracking in the wild. 
In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) . 1293\u20131301.","DOI":"10.1109\/CVPR.2017.142"},{"key":"e_1_3_3_41_2","first-page":"1700","volume-title":"Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing","author":"Kalchbrenner Nal","year":"2013","unstructured":"Nal Kalchbrenner and Phil Blunsom. 2013. Recurrent continuous translation models. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 1700\u20131709."},{"key":"e_1_3_3_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.223"},{"key":"e_1_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.3390\/app9132683"},{"key":"e_1_3_3_44_2","doi-asserted-by":"publisher","DOI":"10.1177\/0278364913495721"},{"key":"e_1_3_3_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2911077"},{"key":"e_1_3_3_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2015.69"},{"key":"e_1_3_3_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.412"},{"key":"e_1_3_3_48_2","doi-asserted-by":"publisher","DOI":"10.3390\/robotics2030122"},{"key":"e_1_3_3_49_2","first-page":"1097","volume-title":"Advances in Neural Information Processing Systems","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097\u20131105."},{"key":"e_1_3_3_50_2","article-title":"Word-level deep sign language recognition from video: A new large-scale dataset and methods comparison","volume":"1910","author":"Li Dongxu","year":"2019","unstructured":"Dongxu Li, Cristian Rodriguez Opazo, Xin Yu, and Hongdong Li. 2019. Word-level deep sign language recognition from video: A new large-scale dataset and methods comparison. arXiv abs\/1910.11006. 
Retrieved from https:\/\/arxiv.org\/abs\/1910.11006.","journal-title":"arXiv"},{"key":"e_1_3_3_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2019.00042"},{"key":"e_1_3_3_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2019.00042"},{"key":"e_1_3_3_53_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-69923-3_77"},{"key":"e_1_3_3_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1982.1056489"},{"key":"e_1_3_3_55_2","unstructured":"Jiasen Lu Dhruv Batra Devi Parikh and Stefan Lee. 2019. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. In Advances in Neural Information Processing Systems H. Wallach H. Larochelle A. Beygelzimer F. d\u2019Alch\u00e9-Buc E. Fox and R. Garnett (Eds.) Vol. 32. Curran Associates Inc. https:\/\/proceedings.neurips.cc\/paper\/2019\/file\/c74d97b01eae257e44aa9d5bade97baf-Paper.pdf."},{"key":"e_1_3_3_56_2","doi-asserted-by":"crossref","unstructured":"Minh-Thang Luong Hieu Pham and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing . Association for Computational Linguistics Lisbon Portugal 1412\u20131421. https:\/\/doi.org\/10.18653\/v1\/D15-1166","DOI":"10.18653\/v1\/D15-1166"},{"key":"e_1_3_3_57_2","first-page":"281","volume-title":"Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability","volume":"1","author":"MacQueen James","year":"1967","unstructured":"James MacQueen et\u00a0al. 1967. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. 
Oakland, CA, 281\u2013297."},{"key":"e_1_3_3_58_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-10-7299-4_15"},{"key":"e_1_3_3_59_2","unstructured":"Volodymyr Mnih Koray Kavukcuoglu David Silver Alex Graves Ioannis Antonoglou Daan Wierstra and Martin Riedmiller. 2013. Playing atari with deep reinforcement learning. In NIPS Deep Learning Workshop . arXiv preprint arXiv:1312.5602."},{"key":"e_1_3_3_60_2","volume-title":"Proceedings of the Workshop on the Creating Meaning With Robot Assistants: The Gap Left by Smart Devices","author":"Mocialov Boris","year":"2017","unstructured":"Boris Mocialov, Graham Turner, Katrin Lohan, and Helen Hastie. 2017. Towards continuous sign language recognition with deep learning. In Proceedings of the Workshop on the Creating Meaning With Robot Assistants: The Gap Left by Smart Devices."},{"key":"e_1_3_3_61_2","unstructured":"Oscar Koller Hermann Ney Richard Bowden Necati Cihan Camg\u00f6z and Simon Hadfield. 2018. RWTH-PHOENIX-Weather 2014 T: Parallel corpus of sign language video gloss and translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Salt Lake City UT ."},{"key":"e_1_3_3_62_2","first-page":"682","volume-title":"Image and Video Technology","author":"Nishida Noriki","year":"2015","unstructured":"Noriki Nishida and Hideki Nakayama. 2015. Multimodal gesture recognition using multi-stream recurrent neural network. In Image and Video Technology. Springer, 682\u2013694."},{"key":"e_1_3_3_63_2","unstructured":"Chigozie Nwankpa Winifred Ijomah Anthony Gachagan and Stephen Marshall. 2018. Activation functions: Comparison of trends in practice and research for deep learning. CoRR abs\/1811.03378 (2018). arXiv:1811.03378 http:\/\/arxiv.org\/abs\/1811.03378."},{"key":"e_1_3_3_64_2","doi-asserted-by":"crossref","unstructured":"Silvio Olivastri Gurkirt Singh and Fabio Cuzzolin. 2019. End-to-End video captioning. In IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW\u201919) . 
1474\u20131482.","DOI":"10.1109\/ICCVW.2019.00185"},{"key":"e_1_3_3_65_2","article-title":"The virtualsign channel for the communication between deaf and hearing users","author":"Oliveira Tiago","year":"2019","unstructured":"Tiago Oliveira, Nuno Escudeiro, Paula Escudeiro, Emanuel Rocha, and Fernando Maciel Barbosa. 2019. The virtualsign channel for the communication between deaf and hearing users. IEEE Rev. Iberoam. Tecnol. Aprend. (2019).","journal-title":"IEEE Rev. Iberoam. Tecnol. Aprend."},{"key":"e_1_3_3_66_2","unstructured":"OpenFace. 2018. Retrieved January 4 2020 from https:\/\/github.com\/TadasBaltrusaitis\/OpenFace."},{"key":"e_1_3_3_67_2","unstructured":"World Health Organization. 2018. Retrieved January 4 2020 from https:\/\/www.who.int\/news-room\/facts-in-pictures\/detail\/deafness\/."},{"key":"e_1_3_3_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.111"},{"key":"e_1_3_3_69_2","first-page":"311","volume-title":"Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL\u201902)","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL\u201902). Association for Computational Linguistics, Stroudsburg, PA, USA, 311\u2013318. https:\/\/doi.org\/10.3115\/1073083.1073135"},{"key":"e_1_3_3_70_2","doi-asserted-by":"publisher","DOI":"10.3389\/fpsyg.2018.01433"},{"key":"e_1_3_3_71_2","first-page":"1","volume-title":"Proceedings of the 3rd IEEE-RAS International Conference on Humanoid Robots","author":"Peters Jan","year":"2003","unstructured":"Jan Peters, Sethu Vijayakumar, and Stefan Schaal. 2003. Reinforcement learning for humanoid robotics. In Proceedings of the 3rd IEEE-RAS International Conference on Humanoid Robots. 
1\u201320."},{"key":"e_1_3_3_72_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511712203.018"},{"key":"e_1_3_3_73_2","first-page":"572","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Pigou Lionel","year":"2014","unstructured":"Lionel Pigou, Sander Dieleman, Pieter-Jan Kindermans, and Benjamin Schrauwen. 2014. Sign language recognition using convolutional neural networks. In Proceedings of the European Conference on Computer Vision. Springer, 572\u2013578."},{"key":"e_1_3_3_74_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0957-7"},{"key":"e_1_3_3_75_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2017.365"},{"key":"e_1_3_3_76_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2018\/123"},{"key":"e_1_3_3_77_2","unstructured":"Alec Radford. 2018. Improving language understanding by generative pre-training. https:\/\/s3-us-west-2.amazonaws.com\/openai-assets\/research-covers\/languageunsupervised\/languageunderstandingpaper.pdf."},{"issue":"8","key":"e_1_3_3_78_2","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1, 8 (2019), 9.","journal-title":"OpenAI Blog"},{"key":"e_1_3_3_79_2","volume-title":"Proceedings of the 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016","author":"Ranzato Marc\u2019Aurelio","year":"2016","unstructured":"Marc\u2019Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2016. Sequence level training with recurrent neural networks. In Proceedings of the 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Yoshua Bengio and Yann LeCun (Eds.). 
http:\/\/arxiv.org\/abs\/1511.06732"},{"key":"e_1_3_3_80_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.128"},{"key":"e_1_3_3_81_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.131"},{"key":"e_1_3_3_82_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781139163910"},{"key":"e_1_3_3_83_2","doi-asserted-by":"publisher","DOI":"10.1021\/ac60214a047"},{"key":"e_1_3_3_84_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.548"},{"key":"e_1_3_3_85_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240876.3240900"},{"key":"e_1_3_3_86_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature24270"},{"key":"e_1_3_3_87_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.494"},{"key":"e_1_3_3_88_2","unstructured":"Chen Sun Fabien Baradel Kevin Murphy and Cordelia Schmid. 2019. Contrastive bidirectional transformer for temporal representation learning. arXiv:1906.05743. Retrieved from http:\/\/arxiv.org\/abs\/1906.05743."},{"key":"e_1_3_3_89_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00756"},{"key":"e_1_3_3_90_2","first-page":"3104","volume-title":"Advances in Neural Information Processing Systems","author":"Sutskever Ilya","year":"2014","unstructured":"Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems. 3104\u20133112."},{"key":"e_1_3_3_91_2","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton Richard S.","year":"2018","unstructured":"Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. MIT Press."},{"key":"e_1_3_3_92_2","doi-asserted-by":"crossref","unstructured":"Christian Szegedy Wei Liu Yangqing Jia Pierre Sermanet Scott E. Reed Dragomir Anguelov Dumitru Erhan Vincent Vanhoucke and Andrew Rabinovich. 2015. Going Deeper with Convolutions. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) . 1\u20139. 
https:\/\/ieeexplore.ieee.org\/document\/7298594.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_3_93_2","doi-asserted-by":"crossref","unstructured":"Christian Szegedy Vincent Vanhoucke Sergey Ioffe Jonathon Shlens and Zbigniew Wojna. 2015. Rethinking the inception architecture for computer vision. 2818\u20132826. https:\/\/doi.org\/10.1109\/CVPR.2016.308","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_3_3_94_2","unstructured":"Mingxing Tan and Quoc V. Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. 97 (2019) 6105\u20136114. https:\/\/proceedings.mlr.press\/v97\/tan19a.html."},{"key":"e_1_3_3_95_2","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N. Gomez Lukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (Long Beach California USA) (NIPS\u201917) . Curran Associates Inc. 6000\u20136010."},{"key":"e_1_3_3_96_2","unstructured":"Mel Vecerik Todd Hester Jonathan Scholz Fumin Wang Olivier Pietquin Bilal Piot Nicolas Heess Thomas Roth\u00f6rl Thomas Lampe and Martin Riedmiller. 2017. Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards. arXiv:1707.08817. Retrieved from https:\/\/arxiv.org\/abs\/1707.08817."},{"key":"e_1_3_3_97_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299087"},{"key":"e_1_3_3_98_2","doi-asserted-by":"crossref","unstructured":"Subhashini Venugopalan Marcus Rohrbach Jeff Donahue Raymond J. Mooney Trevor Darrell and Kate Saenko. 2015. Sequence to sequence\u2014Video to text. In ICCV . IEEE Computer Society 4534\u20134542. 
http:\/\/dblp.uni-trier.de\/db\/conf\/iccv\/iccv2015.html#VenugopalanRDMD15.","DOI":"10.1109\/ICCV.2015.515"},{"key":"e_1_3_3_99_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298935"},{"key":"e_1_3_3_100_2","volume-title":"Proceedings of the 5th Workshop on the Representation and Processing of Sign Languages: Interactions between Corpus and Lexicon","author":"Vogler Christian","year":"2012","unstructured":"Christian Vogler and Carol Neidle. 2012. A new web interface to facilitate access to corpora: Development of the ASLLRP data access interface. In Proceedings of the 5th Workshop on the Representation and Processing of Sign Languages: Interactions between Corpus and Lexicon."},{"key":"e_1_3_3_101_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240671"},{"key":"e_1_3_3_102_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00443"},{"key":"e_1_3_3_103_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.511"},{"key":"e_1_3_3_104_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992696"},{"key":"e_1_3_3_105_2","doi-asserted-by":"publisher","DOI":"10.1353\/sls.1993.0010"},{"key":"e_1_3_3_106_2","unstructured":"Yonghui Wu Mike Schuster Zhifeng Chen Quoc V. Le Mohammad Norouzi Wolfgang Macherey Maxim Krikun Yuan Cao Qin Gao Klaus Macherey Jeff Klingner Apurva Shah Melvin Johnson Xiaobing Liu Lukasz Kaiser Stephan Gouws Yoshikiyo Kato Taku Kudo Hideto Kazawa Keith Stevens George Kurian Nishant Patil Wei Wang Cliff Young Jason Smith Jason Riesa Alex Rudnick Oriol Vinyals Greg Corrado Macduff Hughes and Jeffrey Dean. 2016. Google\u2019s neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144. Retrieved from http:\/\/arxiv.org\/abs\/1609.08144."},{"key":"e_1_3_3_107_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2018.00280"},{"key":"e_1_3_3_108_2","unstructured":"Kayo Yin. 2020. Sign language translation with transformers. arXiv:2004.00588. 
Retrieved from https:\/\/arxiv.org\/abs\/2004.00588."},{"key":"e_1_3_3_109_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.496"},{"key":"e_1_3_3_110_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.347"},{"key":"e_1_3_3_111_2","doi-asserted-by":"publisher","DOI":"10.1109\/FG.2019.8756506"},{"key":"e_1_3_3_112_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2019.8802972"},{"key":"e_1_3_3_113_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2019.8802972"},{"key":"e_1_3_3_114_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00911"}],"container-title":["ACM Transactions on Accessible Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3477498","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3477498","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3477498","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:10:36Z","timestamp":1750183836000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3477498"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,28]]},"references-count":113,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,12,31]]}},"alternative-id":["10.1145\/3477498"],"URL":"https:\/\/doi.org\/10.1145\/3477498","relation":{},"ISSN":["1936-7228","1936-7236"],"issn-type":[{"value":"1936-7228","type":"print"},{"value":"1936-7236","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,28]]},"assertion":[{"value":"2020-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2021-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-10-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}