{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T15:37:59Z","timestamp":1777390679935,"version":"3.51.4"},"reference-count":27,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,6,15]],"date-time":"2024-06-15T00:00:00Z","timestamp":1718409600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,6,15]],"date-time":"2024-06-15T00:00:00Z","timestamp":1718409600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Committee of Science of the Ministry of Science and Higher Education of the RK","award":["BR11765619"],"award-info":[{"award-number":["BR11765619"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Sci Rep"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>To obtain a reliable and accurate automatic speech recognition (ASR) machine learning model, it is necessary to have sufficient transcribed audio data for training. Many languages in the world, especially the agglutinative languages of the Turkic family, suffer from a lack of this type of data. Many studies have been conducted to obtain better models for low-resource languages, using different approaches; the most popular are multilingual training and transfer learning. In this study, we combined five agglutinative languages of the Turkic family\u2014Kazakh, Bashkir, Kyrgyz, Sakha, and Tatar\u2014for multilingual training using connectionist temporal classification and an attention mechanism with a language model, because these languages share cognate words, similar sentence formation rules, and a common alphabet (Cyrillic).
Data from the open-source Common Voice database was used for the study, to make the experiments reproducible. The results of the experiments showed that multilingual training could improve ASR performance for all languages included in the experiment, except for Bashkir. A dramatic result was achieved for the Kyrgyz language: the word error rate decreased to nearly one-fifth and the character error rate to one-fourth, which shows that this approach can be helpful for critically low-resource languages.<\/jats:p>","DOI":"10.1038\/s41598-024-64848-1","type":"journal-article","created":{"date-parts":[[2024,6,15]],"date-time":"2024-06-15T09:01:42Z","timestamp":1718442102000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Multilingual end-to-end ASR for low-resource Turkic languages with common alphabets"],"prefix":"10.1038","volume":"14","author":[{"given":"Akbayan","family":"Bekarystankyzy","sequence":"first","affiliation":[]},{"given":"Orken","family":"Mamyrbayev","sequence":"additional","affiliation":[]},{"given":"Mateus","family":"Mendes","sequence":"additional","affiliation":[]},{"given":"Anar","family":"Fazylzhanova","sequence":"additional","affiliation":[]},{"given":"Muhammad","family":"Assam","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,6,15]]},"reference":[{"issue":"115","key":"64848_CR1","first-page":"84","volume":"9","author":"O Mamyrbayev","year":"2022","unstructured":"Mamyrbayev, O., Alimhan, K., Oralbekova, D., Bekarystankyzy, A. & Zhumazhanov, B. Identifying the influence of transfer learning method in developing an end-to-end automatic speech recognition system with a low data level. East. Eur. J. Enterp. Technol. 9(115), 84\u201392 (2022).","journal-title":"East. Eur. J. Enterp.
Technol."},{"key":"64848_CR2","doi-asserted-by":"crossref","unstructured":"Musaev, M., Mussakhojayeva, S., Khujayorov, I., Khassanov, Y., Ochilov, M. & Atakan Varol, H. USC: An open-source Uzbek speech corpus and initial speech recognition experiments. In Proceedings of the Speech and Computer: 23rd International Conference, SPECOM 2021, 27\u201330 (St. Petersburg, 2021).","DOI":"10.1007\/978-3-030-87802-3_40"},{"key":"64848_CR3","doi-asserted-by":"crossref","unstructured":"Khassanov, Y., Mussakhojayeva, S., Mirzakhmetov, A., Adiyev, A., Nurpeiissov, M., Varol, H.A. A crowdsourced open-source Kazakh speech corpus and initial speech recognition baseline. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, 697\u2013706 (2021).","DOI":"10.18653\/v1\/2021.eacl-main.58"},{"key":"64848_CR4","doi-asserted-by":"crossref","unstructured":"Cho, J., Baskar, M.K., Li, R., Wiesner, M., Mallidi, S.H., Yalta, N., Karafi\u00e1t, M., Watanabe, S. & Hori, T. Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling. In Proceedings of the 2018 IEEE Spoken Language Technology Workshop (SLT) 521\u2013527 (2018).","DOI":"10.1109\/SLT.2018.8639655"},{"key":"64848_CR5","unstructured":"Henretty, M., Morais, R., Saunders, L., Tyers, F.M., Weber, G. Common Voice: a massively-multilingual speech corpus. In Proceedings of the LREC, 4218\u20134222 (ELRA, 2020)."},{"key":"64848_CR6","doi-asserted-by":"publisher","first-page":"45","DOI":"10.13064\/KSSS.2021.13.1.045","volume":"13","author":"H Yang","year":"2021","unstructured":"Yang, H. & Nam, H. Hyperparameter experiments on end-to-end automatic speech recognition. Phon. Speech Sci. 13, 45\u201351 (2021).","journal-title":"Phon. Speech Sci."},{"key":"64848_CR7","unstructured":"Carki, K., Geutner, P., Schultz, T. Turkish LVCSR: towards better speech recognition for agglutinative languages.
In Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (2000)."},{"issue":"4","key":"64848_CR8","doi-asserted-by":"publisher","first-page":"5880","DOI":"10.30534\/ijatcse\/2020\/249942020","volume":"9","author":"A Beibut","year":"2020","unstructured":"Beibut, A. Development of automatic speech recognition for Kazakh language using transfer learning. Int. J. Adv. Trends Comput. Sci. Eng. 9(4), 5880\u20135886 (2020).","journal-title":"Int. J. Adv. Trends Comput. Sci. Eng."},{"key":"64848_CR9","doi-asserted-by":"crossref","unstructured":"Conneau, A., Baevski, A., Collobert, R., Mohamed, A. & Auli, M. Unsupervised cross-lingual representation learning for speech recognition. https:\/\/arxiv.org\/abs\/2006.13979 (2020).","DOI":"10.21437\/Interspeech.2021-329"},{"key":"64848_CR10","doi-asserted-by":"crossref","unstructured":"\u017delasko, P., Feng, S., Vel\u00e1zquez, L.M., Abavisani, A., Bhati, S., Scharenborg, O., Hasegawa-Johnson, M.A. & Dehak, N. Discovering phonetic inventories with crosslingual automatic speech recognition. abs\/2201.11207 (2022).","DOI":"10.1016\/j.csl.2022.101358"},{"key":"64848_CR11","doi-asserted-by":"publisher","first-page":"103148","DOI":"10.1016\/j.ipm.2022.103148","volume":"60","author":"K Nowakowski","year":"2023","unstructured":"Nowakowski, K., Ptaszynski, M., Murasaki, K. & Nieuwa\u017cny, J. Adapting multilingual speech representation model for a new, underresourced language through multilingual fine-tuning and continued pretraining. Inform. Process. Manag. 60, 103148 (2023).","journal-title":"Inform. Process. Manag."},{"key":"64848_CR12","doi-asserted-by":"publisher","first-page":"71","DOI":"10.1016\/j.specom.2022.03.006","volume":"140","author":"MY Tachbelie","year":"2022","unstructured":"Tachbelie, M. Y., Abate, S. T. & Schultz, T. Multilingual speech recognition for GlobalPhone languages. Speech Commun.
140, 71\u201386 (2022).","journal-title":"Speech Commun."},{"key":"64848_CR13","doi-asserted-by":"crossref","unstructured":"Chowdhury, S.A., Hussein, A., Abdelali, A., Ali, A. Towards one model to rule all: multilingual strategy for dialectal code-switching Arabic ASR. In Proceedings of the 22nd Annual Conference of the International Speech Communication Association (Interspeech, Brno, 2021).","DOI":"10.21437\/Interspeech.2021-1809"},{"key":"64848_CR14","first-page":"19","volume":"12","author":"AJ Kumar","year":"2022","unstructured":"Kumar, A. J. & Aggarwal, R. K. An investigation of multilingual TDNN-BLSTM acoustic modeling for Hindi speech recognition. Int. J. Sens. Wirel. Commun. Control 12, 19\u201331 (2022).","journal-title":"Int. J. Sens. Wirel. Commun. Control"},{"key":"64848_CR15","doi-asserted-by":"crossref","unstructured":"Heigold, G., Vanhoucke, V., Senior, A.W., Nguyen, P., Ranzato, M., Devin, M. & Dean, J. Multilingual acoustic models using distributed deep neural networks. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (Google Inc., 2013).","DOI":"10.1109\/ICASSP.2013.6639348"},{"key":"64848_CR16","doi-asserted-by":"crossref","unstructured":"Mussakhojayeva, S., Khassanov, Y. & Varol, H.A. A study of multilingual end-to-end speech recognition for Kazakh, Russian, and English. In Proceedings of the 23rd International Conference on Speech and Computer, SPECOM 2021, Virtual (2021).","DOI":"10.1007\/978-3-030-87802-3_41"},{"key":"64848_CR17","doi-asserted-by":"publisher","first-page":"74","DOI":"10.3390\/info14020074","volume":"14","author":"S Mussakhojayeva","year":"2023","unstructured":"Mussakhojayeva, S., Dauletbek, K., Yeshpanov, R. & Varol, H. A. Multilingual speech recognition for Turkic languages. Information 14, 74.
https:\/\/doi.org\/10.3390\/info14020074 (2023).","journal-title":"Information"},{"key":"64848_CR18","doi-asserted-by":"crossref","unstructured":"Watanabe, S., Hori, T., Karita, S., Hayashi, T., Nishitoba, J., Unno, Y., Yalta, N., Heymann, J., Wiesner, M., Chen, N., Renduchintala, A. & Ochiai, T. ESPnet: End-to-end speech processing toolkit. Interspeech, arXiv:1804.00015 (2018).","DOI":"10.21437\/Interspeech.2018-1456"},{"key":"64848_CR19","doi-asserted-by":"crossref","unstructured":"Watanabe, S., Boyer, F., Chang, X., Guo, P., Hayashi, T., Higuchi, Y., Hori, T., Huang, W., Inaguma, H., Kamo, N., Karita, S., Li, C., Shi, J., Subramanian, A.S. & Zhang, W. The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans. In Proceedings of the 2021 IEEE Data Science and Learning Workshop (DSLW), (2021).","DOI":"10.1109\/DSLW51110.2021.9523402"},{"key":"64848_CR20","doi-asserted-by":"crossref","unstructured":"Guo, P., Boyer, F., Chang, X., Hayashi, T., Higuchi, Y., Inaguma, H., Kamo, N., Li, C., Garcia-Romero, D., Shi, J., Shi, J., Watanabe, S., Wei, K., Zhang, W. & Zhang, Y. Recent developments on espnet toolkit boosted by conformer. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2020).","DOI":"10.1109\/ICASSP39728.2021.9414858"},{"key":"64848_CR21","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13636-018-0141-9","volume":"2018","author":"C Qin","year":"2018","unstructured":"Qin, C., Qu, D. & Zhang, L. Towards end-to-end speech recognition with transfer learning. EURASIP J Audio Speech Music Process. 2018, 1\u20139 (2018).","journal-title":"EURASIP J Audio Speech Music Process."},{"key":"64848_CR22","first-page":"319","volume":"22","author":"UA Kimanuka","year":"2018","unstructured":"Kimanuka, U. A. & Buyuk, O. Turkish speech recognition based on deep neural networks. J. Nat. Appl. Sci. 22, 319\u2013329 (2018).","journal-title":"J. Nat. Appl. 
Sci."},{"key":"64848_CR23","doi-asserted-by":"crossref","unstructured":"Xiao, Z., Ou, Z., Chu, W., Lin, H. Hybrid CTC-attention based end-to-end speech recognition using subword units. In 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP), 146\u2013150 (2018)","DOI":"10.1109\/ISCSLP.2018.8706675"},{"key":"64848_CR24","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2021-775","volume-title":"A Comparative Study on Neural Architectures and Training Methods for Japanese Speech Recognition","author":"S Karita","year":"2021","unstructured":"Karita, S., Kubo, Y., Bacchiani, M. & Jones, L. A Comparative Study on Neural Architectures and Training Methods for Japanese Speech Recognition (Interspeech, 2021)."},{"key":"64848_CR25","unstructured":"Hannun, A.Y., Case, C., Casper, J., Catanzaro, B., Diamos, G.F., Elsen, E., Prenger, R.J., Satheesh, S., Sengupta, S., Coates, A. & Ng, A. Deep Speech: Scaling up end-to-end speech recognition. abs\/1412.5567, (2014)."},{"key":"64848_CR26","doi-asserted-by":"publisher","first-page":"634","DOI":"10.3390\/sym13040634","volume":"13","author":"A Valizada","year":"2021","unstructured":"Valizada, A., Akhundova, N. & Rustamov, S. Development of speech recognition systems in emergency call centers. Symmetry 13, 634 (2021).","journal-title":"Symmetry"},{"key":"64848_CR27","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-022-12260-y","author":"M Orken","year":"2022","unstructured":"Orken, M., Dina, O., Keylan, A., Tolganay, T. & Mohamed, O. A study of transformer-based end-to-end speech recognition system for Kazakh language. Sci. Rep. https:\/\/doi.org\/10.1038\/s41598-022-12260-y (2022).","journal-title":"Sci. 
Rep."}],"container-title":["Scientific Reports"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41598-024-64848-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41598-024-64848-1","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41598-024-64848-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,15]],"date-time":"2024-06-15T09:03:24Z","timestamp":1718442204000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41598-024-64848-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,15]]},"references-count":27,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["64848"],"URL":"https:\/\/doi.org\/10.1038\/s41598-024-64848-1","relation":{},"ISSN":["2045-2322"],"issn-type":[{"value":"2045-2322","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,15]]},"assertion":[{"value":"1 June 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 June 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 June 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"13835"}}