{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,29]],"date-time":"2026-05-29T20:57:57Z","timestamp":1780088277094,"version":"3.54.0"},"reference-count":76,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,7,22]],"date-time":"2021-07-22T00:00:00Z","timestamp":1626912000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2021,8,31]]},"abstract":"<jats:p>\n            Sign language recognition (SLR) is a challenging problem, involving complex manual features (i.e., hand gestures) and fine-grained non-manual features\u00a0(NMFs) (i.e., facial expression, mouth shapes,\n            <jats:italic>etc<\/jats:italic>\n            .). Although manual features are dominant, non-manual features also play an important role in the expression of a sign word. Specifically, many sign words convey different meanings due to non-manual features, even though they share the same hand gestures. This ambiguity introduces great challenges in the recognition of sign words. To tackle the above issue, we propose a simple yet effective architecture called Global-Local Enhancement Network\u00a0(GLE-Net), including two mutually promoted streams toward different crucial aspects of SLR. Of the two streams, one captures the global contextual relationship, while the other stream captures the discriminative fine-grained cues. Moreover, due to the lack of datasets explicitly focusing on this kind of feature, we introduce the first non-manual-feature-aware isolated Chinese sign language dataset\u00a0(NMFs-CSL) with a total vocabulary size of 1,067 sign words in daily life. 
Extensive experiments on NMFs-CSL and SLR500 datasets demonstrate the effectiveness of our method.\n          <\/jats:p>","DOI":"10.1145\/3436754","type":"journal-article","created":{"date-parts":[[2021,7,22]],"date-time":"2021-07-22T14:44:29Z","timestamp":1626965069000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":47,"title":["Global-Local Enhancement Network for NMF-Aware Sign Language Recognition"],"prefix":"10.1145","volume":"17","author":[{"given":"Hezhen","family":"Hu","sequence":"first","affiliation":[{"name":"University of Science and Technology of China, Hefei, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Wengang","family":"Zhou","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Junfu","family":"Pu","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Houqiang","family":"Li","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2021,7,22]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2856094"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206523"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.332"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.502"},{"key":"e_1_2_1_5_1","volume-title":"The devisign large vocabulary of Chinese sign language database and baseline evaluations. 
Technical report VIPL-TR-14-SLR-001","author":"Chai Xiujuan","unstructured":"Xiujuan Chai, Hanjie Wang, and Xilin Chen. 2014. The devisign large vocabulary of Chinese sign language database and baseline evaluations. Technical report VIPL-TR-14-SLR-001. Key Lab of Intelligent Information Processing of Chinese Academy of Sciences."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01246-5_22"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICMLA.2014.110"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.5555\/2503308.2503313"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2889563"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the European Conference on Computer Vision. 595\u2013607","author":"Evangelidis Georgios D.","year":"2014","unstructured":"Georgios D. Evangelidis, Gurkirt Singh, and Radu Horaud. 2014. Continuous gesture recognition from articulated poses. In Proceedings of the European Conference on Computer Vision. 
595\u2013607."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00630"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/72.182690"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.476"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.129"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.622"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3152121"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/645886.758306"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00378"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/1170745.1171537"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ChinaSIP.2015.7230384"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2018.2870740"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence.","author":"Huang Jie","year":"2018","unstructured":"Jie Huang, Wengang Zhou, Qilin Zhang, Houqiang Li, and Weiping Li. 2018. Video-based sign language recognition without temporal segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/2969442.2969465"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the British Machine Vision Conference. 1\u201316","author":"Vaezi Joze Hamid Reza","year":"2019","unstructured":"Hamid Reza Vaezi Joze and Oscar Koller. 2019. MS-ASL: A large-scale data set and benchmark for understanding American sign language. 
In Proceedings of the British Machine Vision Conference. 1\u201316."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.5772\/60091"},{"key":"e_1_2_1_28_1","volume-title":"et\u00a0al","author":"Kay Will","year":"2017","unstructured":"Will Kay, Joao Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, et\u00a0al. 2017. The Kinetics human action video dataset. arXiv preprint arXiv:1705.06950 (2017)."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/3477.485888"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2911077"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2015.09.013"},{"key":"e_1_2_1_32_1","first-page":"136","article-title":"Deep sign: Hybrid CNN-HMM for continuous sign language recognition. In Proceedings of the British Machine Vision Conference","volume":"136","author":"Koller Oscar","year":"2016","unstructured":"Oscar Koller, O. Zargaran, Hermann Ney, and Richard Bowden. 2016. Deep sign: Hybrid CNN-HMM for continuous sign language recognition. In Proceedings of the British Machine Vision Conference. 
Article 136, 136.1-136.12 pages.","journal-title":"Article"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.364"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-018-1121-3"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-005-1838-7"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2017.10.011"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00718"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2017.361"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2009.5204291"},{"key":"e_1_2_1_40_1","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"van der Maaten Laurens","year":"2008","unstructured":"Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. 
Journal of Machine Learning Research 9, 86 (2008), 2579\u20132605.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_25"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.5555\/2969033.2969073"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123313"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/HSI.2013.6577826"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.5555\/3304415.3304541"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00429"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.590"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01233"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01240-3_4"},{"key":"e_1_2_1_50_1","volume-title":"Laura Cristina Lanzarini, and Alejandro Rosete.","author":"Ronchetti Franco","year":"2016","unstructured":"Franco Ronchetti, Facundo Quiroga, C\u00e9sar Armando Estrebou, Laura Cristina Lanzarini, and Alejandro Rosete. 2016. LSA64: An Argentinian sign language dataset. In XXII Congreso Argentino de Ciencias de la Computaci\u00f3n."},{"key":"e_1_2_1_51_1","volume-title":"Action recognition using visual attention. arXiv preprint arXiv:1511.04119","author":"Sharma Shikhar","year":"2015","unstructured":"Shikhar Sharma, Ryan Kiros, and Ruslan Salakhutdinov. 2015. Action recognition using visual attention. 
arXiv preprint arXiv:1511.04119 (2015)."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00550"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.5555\/2968826.2968890"},{"key":"e_1_2_1_55_1","doi-asserted-by":"crossref","unstructured":"Kiriakos Stefanidis Dimitrios Konstantinidis Athanasios Kalvourtzis Kosmas Dimitropoulos and Petros Daras. 2020. 3D technologies and applications in sign language. In Recent Advances in 3D Imaging Modeling and Reconstruction. 50\u201378.","DOI":"10.4018\/978-1-5225-5294-9.ch003"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/2735952"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-13572-4_30"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.510"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295349"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.5555\/648109.747376"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2915032"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995407"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2014.2298382"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00813"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01267-0_19"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.5555\/3045118.3045336"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/IWCIA
.2015.7449458"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46478-7_27"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1007\/11550518_50"},{"key":"e_1_2_1_72_1","volume-title":"Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318","author":"Zhang Han","year":"2018","unstructured":"Han Zhang, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. 2018. Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318 (2018)."},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2016.7552950"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00454"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00515"},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2019.00223"},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00054"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3436754","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3436754","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T17:45:04Z","timestamp":1750268704000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3436754"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,22]]},"references-count":76,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,8,31]]}},"alternative-id":["10.1145\/3436754"],"URL":"https:\/\/doi.org\/10.1145\/3436754","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,7,22]]},"assertion":[{"value":"2020-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-07-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}