{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T12:25:56Z","timestamp":1769171156453,"version":"3.49.0"},"reference-count":22,"publisher":"Wiley","license":[{"start":{"date-parts":[[2020,12,18]],"date-time":"2020-12-18T00:00:00Z","timestamp":1608249600000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61976236"],"award-info":[{"award-number":["61976236"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Complexity"],"published-print":{"date-parts":[[2020,12,18]]},"abstract":"<jats:p>In this paper, we propose to incorporate the local attention in WaveNet-CTC to improve the performance of Tibetan speech recognition in multitask learning. With an increase in task number, such as simultaneous Tibetan speech content recognition, dialect identification, and speaker recognition, the accuracy rate of a single WaveNet-CTC decreases on speech recognition. Inspired by the attention mechanism, we introduce the local attention to automatically tune the weights of feature frames in a window and pay different attention on context information for multitask learning. The experimental results show that our method improves the accuracies of speech recognition for all Tibetan dialects in three-task learning, compared with the baseline model. Furthermore, our method significantly improves the accuracy for low-resource dialect by 5.11% against the specific-dialect model.<\/jats:p>","DOI":"10.1155\/2020\/8894566","type":"journal-article","created":{"date-parts":[[2020,12,19]],"date-time":"2020-12-19T01:50:10Z","timestamp":1608342610000},"page":"1-10","source":"Crossref","is-referenced-by-count":4,"title":["Multitask Learning with Local Attention for Tibetan Speech Recognition"],"prefix":"10.1155","volume":"2020","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8814-1988","authenticated-orcid":true,"given":"Hui","family":"Wang","sequence":"first","affiliation":[{"name":"School of Information Engineering, Minzu University of China, Beijing 100081, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7831-5721","authenticated-orcid":true,"given":"Fei","family":"Gao","sequence":"additional","affiliation":[{"name":"School of Information Engineering, Minzu University of China, Beijing 100081, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4007-7016","authenticated-orcid":true,"given":"Yue","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Information Engineering, Minzu University of China, Beijing 100081, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2305-3012","authenticated-orcid":true,"given":"Li","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Information Engineering, Minzu University of China, Beijing 100081, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6015-0746","authenticated-orcid":true,"given":"Jianjian","family":"Yue","sequence":"additional","affiliation":[{"name":"School of Information Engineering, Minzu University of China, Beijing 100081, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7133-2713","authenticated-orcid":true,"given":"Huilin","family":"Ma","sequence":"additional","affiliation":[{"name":"School of Information Engineering, Minzu University of China, Beijing 100081, China"}]}],"member":"311","reference":[{"key":"1","first-page":"1","article-title":"Multi-task recurrent model for speech and speaker recognition","author":"Z. Tang"},{"key":"2","first-page":"589","article-title":"Multitask learning and system combination for automatic speech recognition","author":"O. Siohan"},{"key":"3","doi-asserted-by":"publisher","DOI":"10.1109\/ASRU.2015.7404810"},{"key":"4","article-title":"Multi-task learning of deep neural networks for audio visual automatic speech recognition","author":"A. Thanda","year":"2020"},{"key":"5","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8462557"},{"key":"6","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2015.2422573"},{"key":"7","article-title":"Hierarchical multitask learning for ctc-based speech recognition","author":"K. Krishna","year":"2020"},{"key":"8","article-title":"Multi-dialect speech recognition with a single sequence-to-sequence model","author":"B. Li","year":"2017"},{"key":"9","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP.2018.8461972","article-title":"Multilingual speech recognition with a single end-to-end model","author":"S. Toshniwal","year":"2018"},{"key":"10","doi-asserted-by":"publisher","DOI":"10.1109\/access.2019.2952406"},{"key":"11","article-title":"Advancing connectionist temporal classification with attention","author":"A. Das"},{"issue":"3","key":"12","first-page":"249","article-title":"Long short-term memory with attention and multitask learning for distant speech recognition","volume":"58","author":"Y. Zhang","year":"2018","journal-title":"Journal of Tsinghua University (Science and Technology)"},{"key":"13","article-title":"End-to-end multi-task learning with attention","author":"S. Liu"},{"key":"14","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8682896"},{"key":"15","article-title":"Speech-to-text-wavenet: end-to-end sentence level Chinese speech recognition using deepmind\u2019s wavenet","author":"S. Xu","year":"2020"},{"key":"16","article-title":"Speech-to-text-WaveNet","author":"Kim","year":"2016"},{"key":"17","article-title":"WaveNet: a generative model for raw audio","author":"A. van den Oord","year":"2016"},{"key":"18","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-642-24797-2","volume-title":"Supervised Sequence Labelling with Recurrent Neural Networks","author":"A. Graves","year":"2012"},{"key":"19","doi-asserted-by":"publisher","DOI":"10.1108\/ijicc-02-2020-0017"},{"key":"20","doi-asserted-by":"publisher","DOI":"10.1108\/IJICC-11-2019-0119"},{"key":"21","article-title":"Effective approaches to attention-based neural machine translation","author":"M.-T. Luong","year":"2020"},{"key":"22","article-title":"Tibetan spoken language","author":"B. La","year":"2005"}],"container-title":["Complexity"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/journals\/complexity\/2020\/8894566.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/complexity\/2020\/8894566.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/complexity\/2020\/8894566.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,12,19]],"date-time":"2020-12-19T01:50:13Z","timestamp":1608342613000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.hindawi.com\/journals\/complexity\/2020\/8894566\/"}},"subtitle":[],"editor":[{"given":"Ning","family":"Cai","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,12,18]]},"references-count":22,"alternative-id":["8894566","8894566"],"URL":"https:\/\/doi.org\/10.1155\/2020\/8894566","relation":{},"ISSN":["1099-0526","1076-2787"],"issn-type":[{"value":"1099-0526","type":"electronic"},{"value":"1076-2787","type":"print"}],"subject":[],"published":{"date-parts":[[2020,12,18]]}}}