{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,2]],"date-time":"2026-01-02T07:29:07Z","timestamp":1767338947631,"version":"3.37.3"},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,7,17]],"date-time":"2024-07-17T00:00:00Z","timestamp":1721174400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,7,17]],"date-time":"2024-07-17T00:00:00Z","timestamp":1721174400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61976236"],"award-info":[{"award-number":["61976236"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J AUDIO SPEECH MUSIC PROC."],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The disparities in phonetics and corpuses across the three major dialects of Tibetan exacerbate the difficulty of a single task model for one dialect to accommodate other different dialects. To address this issue, this paper proposes task-diverse meta-learning. Our model can acquire more comprehensive and robust features, facilitating its adaptation to the variations among different dialects. This study uses Tibetan dialect ID recognition and Tibetan speaker recognition as the source tasks for meta-learning, which aims to augment the ability of the model to discriminate variations and differences among different dialects. Consequently, the model\u2019s performance in Tibetan multi-dialect speech recognition tasks is enhanced. The experimental results show that task-diverse meta-learning leads to improved performance in Tibetan multi-dialect speech recognition. This demonstrates the effectiveness and applicability of task-diverse meta-learning, thereby contributing to the advancement of speech recognition techniques in multi-dialect environments.<\/jats:p>","DOI":"10.1186\/s13636-024-00361-7","type":"journal-article","created":{"date-parts":[[2024,7,17]],"date-time":"2024-07-17T02:01:38Z","timestamp":1721181698000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Exploring task-diverse meta-learning on Tibetan multi-dialect speech recognition"],"prefix":"10.1186","volume":"2024","author":[{"given":"Yigang","family":"Liu","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7831-5721","authenticated-orcid":false,"given":"Yue","family":"Zhao","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaona","family":"Xu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Liang","family":"Xu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xubei","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qiang","family":"Ji","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,7,17]]},"reference":[{"key":"361_CR1","unstructured":"N. Zhou, Research on Tibetan non-specific person continuous speech recognition based on deep learning. Master\u2019s thesis, Central University for Nationalities (2017)"},{"issue":"5","key":"361_CR2","first-page":"189","volume":"32","author":"X Huang","year":"2018","unstructured":"X. Huang, J. Li, Acoustic model for Tibetan speech recognition based on recurrent neural network. J. Chin. Inf. 32(5), 189\u2013191 (2018)","journal-title":"J. Chin. Inf."},{"issue":"4","key":"361_CR3","first-page":"359","volume":"30","author":"Q Wang","year":"2017","unstructured":"Q. Wang, W. Guo, C. Xie, Tibetan speech recognition based on end-to-end technology. Pattern Recognit. Artif. Intell. 30(4), 359\u2013363 (2017)","journal-title":"Pattern Recognit. Artif. Intell."},{"issue":"3","key":"361_CR4","first-page":"209","volume":"28","author":"S Yuan","year":"2015","unstructured":"S. Yuan, W. Guo, L. Dai, Tibetan language recognition based on deep neural networks. Pattern Recognit. Artif. Intell. 28(3), 209\u2013213 (2015)","journal-title":"Pattern Recognit. Artif. Intell."},{"key":"361_CR5","doi-asserted-by":"crossref","unstructured":"S. Min, M. Lewis, L. Zettlemoyer et al., Metaicl: Learning to learn in context[J]. arXiv preprint arXiv:2110.15943\u00a0(2021)","DOI":"10.18653\/v1\/2022.naacl-main.201"},{"key":"361_CR6","unstructured":"C. Finn, K. Xu, S. Levine, Probabilistic model-agnostic meta-learning[J]. Adv. Neural Inf. Process. Syst. 31, (2018)"},{"key":"361_CR7","first-page":"18860","volume":"33","author":"L Collins","year":"2020","unstructured":"L. Collins, A. Mokhtari, S. Shakkottai, Task-robust model-agnostic meta-learning[J]. Adv. Neural Inf. Process. Syst. 33, 18860\u201318871 (2020)","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"361_CR8","unstructured":"C. Finn, P. Abbeel, S. Levine, in International conference on machine learning. Model-agnostic meta-learning for fast adaptation of deep networks[C] (PMLR, 2017), pp. 1126\u20131135"},{"key":"361_CR9","doi-asserted-by":"crossref","unstructured":"J. Gu, Y. Wang, Y. Chen et al., Meta-learning for low-resource neural machine translation[J]. arXiv preprint arXiv:1808.08437\u00a0(2018)","DOI":"10.18653\/v1\/D18-1398"},{"key":"361_CR10","doi-asserted-by":"publisher","first-page":"1558","DOI":"10.1109\/TASLP.2022.3167258","volume":"30","author":"SF Huang","year":"2022","unstructured":"S.F. Huang, C.J. Lin, D.R. Liu et al., Meta-TTS: Meta-learning for few-shot speaker adaptive text-to-speech[J]. IEEE\/ACM Trans. Audio Speech Lang. Process. 30, 1558\u20131571 (2022)","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"361_CR11","doi-asserted-by":"crossref","unstructured":"A. Naman, C. Sinha, in Machine Intelligence and Smart Systems: Proceedings of MISS 2021. Fixed-MAML for few-shot classification in multilingual speech emotion recognition[M] (Springer Nature Singapore, Singapore, 2022), pp. 473\u2013483","DOI":"10.1007\/978-981-16-9650-3_37"},{"key":"361_CR12","doi-asserted-by":"crossref","unstructured":"A. Kannan, A. Datta, T.N. Sainath et al., Large-scale multilingual speech recognition with a streaming end-to-end model[J]. (2019). arXiv preprint arXiv:1909.05330","DOI":"10.21437\/Interspeech.2019-2858"},{"key":"361_CR13","doi-asserted-by":"crossref","unstructured":"R. Imaizumi, R. Masumura, S. Shiota et al., End-to-end Japanese multi-dialect speech recognition and dialect identification with multi-task learning[J]. APSIPA Trans. Signal Inf. Process. 11(1), (2022)","DOI":"10.1561\/116.00000045"},{"key":"361_CR14","doi-asserted-by":"crossref","unstructured":"S. Dalmia, R. Sanabria, F. Metze et al., in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Sequence-based multi-lingual low resource speech recognition[C] (IEEE, 2018), pp. 4909\u20134913","DOI":"10.1109\/ICASSP.2018.8461802"},{"key":"361_CR15","doi-asserted-by":"publisher","first-page":"162519","DOI":"10.1109\/ACCESS.2019.2952406","volume":"7","author":"Y Zhao","year":"2019","unstructured":"Y. Zhao, J. Yue, X. Xu et al., End-to-end-based Tibetan multitask speech recognition[J]. IEEE Access 7, 162519\u2013162529 (2019)","journal-title":"IEEE Access"},{"key":"361_CR16","doi-asserted-by":"crossref","unstructured":"Z. Dan, Y. Zhao, X. Bi et al. in Natural Language Processing and Chinese Computing: 11th CCF International Conference, NLPCC 2022, Guilin, China, September 24-25, 2022, Proceedings, Part I. Multi-task learning with auxiliary cross-attention transformer for low-resource multi-dialect speech recognition[C] (Springer International Publishing, Cham, 2022), pp. 107\u2013118","DOI":"10.1007\/978-3-031-17120-8_9"},{"key":"361_CR17","doi-asserted-by":"crossref","unstructured":"D. Li, Y. Yang, Y.Z. Song et al., in Proceedings of the AAAI conference on artificial intelligence. Learning to generalize: Meta-learning for domain generalization[C], vol. 32(1) (2018)","DOI":"10.1609\/aaai.v32i1.11596"},{"key":"361_CR18","doi-asserted-by":"crossref","unstructured":"K. Yang, R. Wang, L. Wang, in 31st International Joint Conference on Artificial Intelligence (IJCAI-22). Metafinger: Fingerprinting the deep neural networks with meta-training[C] (2022)","DOI":"10.24963\/ijcai.2022\/109"},{"key":"361_CR19","unstructured":"H. Yao, L.K. Huang, L. Zhang et al., in International conference on machine learning. Improving generalization in meta-learning via task augmentation[C] (PMLR, 2021), pp. 11887\u201311897"},{"key":"361_CR20","doi-asserted-by":"crossref","unstructured":"J.Y. Hsu, Y.J. Chen, H. Lee, in ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Meta learning for end-to-end low-resource speech recognition[C] (IEEE, 2020), pp. 7844-7848","DOI":"10.1109\/ICASSP40776.2020.9053112"},{"key":"361_CR21","doi-asserted-by":"crossref","unstructured":"D. Eledath, A. Baby, S. Singh, in 2024 National Conference on Communications (NCC). Robust speech recognition using meta-learning for low-resource accents[C] (IEEE, 2024), pp. 1\u20136","DOI":"10.1109\/NCC60321.2024.10485786"},{"key":"361_CR22","doi-asserted-by":"crossref","unstructured":"S. Qin, L. Wang, S. Li et al., in Proc. INTERSPEECH. Finer-grained modeling units-based meta-learning for low-resource tibetan speech recognition[C] (2022)","DOI":"10.21437\/Interspeech.2022-10015"},{"key":"361_CR23","doi-asserted-by":"crossref","unstructured":"H. Bu, J. Du, X Na et al., in 2017 20th conference of the oriental chapter of the international coordinating committee on speech databases and speech I\/O systems and assessment (O-COCOSDA). Aishell-1: An open-source mandarin speech corpus and a speech recognition baseline[C] (IEEE, 2017), pp. 1\u20135","DOI":"10.1109\/ICSDA.2017.8384449"},{"key":"361_CR24","doi-asserted-by":"crossref","unstructured":"V. Panayotov, G. Chen, D. Povey et al., in 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP). Librispeech: An ASR corpus based on public domain audio books[C] (IEEE, 2015), pp. 5206\u20135210","DOI":"10.1109\/ICASSP.2015.7178964"},{"issue":"2\u20133","key":"361_CR25","first-page":"297","volume":"22","author":"Y Zhao","year":"2020","unstructured":"Y. Zhao, X. Xu, J. Yue et al., An open speech resource for Tibetan multi-dialect and multitask recognition[J]. Int. J. Comput. Sci. Eng. 22(2\u20133), 297\u2013304 (2020)","journal-title":"Int. J. Comput. Sci. Eng."},{"key":"361_CR26","doi-asserted-by":"crossref","unstructured":"A. Gulati, J. Qin, C.C. Chiu et al. Conformer: Convolution-augmented Transformer for speech recognition[J]. arXiv preprint arXiv:2005.08100 (2020)","DOI":"10.21437\/Interspeech.2020-3015"}],"container-title":["EURASIP Journal on Audio, Speech, and Music Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-024-00361-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13636-024-00361-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-024-00361-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,17]],"date-time":"2024-07-17T02:05:31Z","timestamp":1721181931000},"score":1,"resource":{"primary":{"URL":"https:\/\/asmp-eurasipjournals.springeropen.com\/articles\/10.1186\/s13636-024-00361-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,17]]},"references-count":26,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["361"],"URL":"https:\/\/doi.org\/10.1186\/s13636-024-00361-7","relation":{},"ISSN":["1687-4722"],"issn-type":[{"type":"electronic","value":"1687-4722"}],"subject":[],"published":{"date-parts":[[2024,7,17]]},"assertion":[{"value":"12 February 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 June 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 July 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"37"}}