{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,20]],"date-time":"2025-07-20T04:32:38Z","timestamp":1752985958200,"version":"3.37.3"},"reference-count":50,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,8,30]],"date-time":"2021-08-30T00:00:00Z","timestamp":1630281600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,8,30]],"date-time":"2021-08-30T00:00:00Z","timestamp":1630281600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62071302"],"award-info":[{"award-number":["62071302"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61701306"],"award-info":[{"award-number":["61701306"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J AUDIO SPEECH MUSIC PROC."],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Code-switching (CS) refers to the phenomenon of using more than one language in an utterance, and it presents great challenge to automatic speech recognition (ASR) due to the code-switching property in one utterance, the pronunciation variation phenomenon of the embedding language words and the heavy training data sparse problem. This paper focuses on the Mandarin-English CS ASR task. We aim at dealing with the pronunciation variation and alleviating the sparse problem of code-switches by using pronunciation augmentation methods. An English-to-Mandarin mix-language phone mapping approach is first proposed to obtain a language-universal CS lexicon. Based on this lexicon, an acoustic data-driven lexicon learning framework is further proposed to learn new pronunciations to cover the accents, mis-pronunciations, or pronunciation variations of those embedding English words. Experiments are performed on real CS ASR tasks. Effectiveness of the proposed methods are examined on all of the conventional, hybrid, and the recent end-to-end speech recognition systems. Experimental results show that both the learned phone mapping and augmented pronunciations can significantly improve the performance of code-switching speech recognition.<\/jats:p>","DOI":"10.1186\/s13636-021-00222-7","type":"journal-article","created":{"date-parts":[[2021,8,30]],"date-time":"2021-08-30T08:16:54Z","timestamp":1630311414000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Pronunciation augmentation for Mandarin-English code-switching speech recognition"],"prefix":"10.1186","volume":"2021","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0924-408X","authenticated-orcid":false,"given":"Yanhua","family":"Long","sequence":"first","affiliation":[]},{"given":"Shuang","family":"Wei","sequence":"additional","affiliation":[]},{"given":"Jie","family":"Lian","sequence":"additional","affiliation":[]},{"given":"Yijie","family":"Li","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,8,30]]},"reference":[{"key":"222_CR1","first-page":"3","volume":"14","author":"D. Sankoff","year":"1981","unstructured":"D. Sankoff, S. Poplack, A formal grammar for code-switching. Res. Lang. Soc. Interact.14:, 3\u201345 (1981).","journal-title":"Res. Lang. Soc. Interact."},{"key":"222_CR2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511620867","volume-title":"One speaker, two languages: cross-disciplinary perspectives on code-switching","author":"L. Milroy","year":"1995","unstructured":"L. Milroy, P. Muysken, One speaker, two languages: cross-disciplinary perspectives on code-switching (Cambridge University Press, New York, 1995)."},{"key":"222_CR3","doi-asserted-by":"publisher","DOI":"10.4324\/9780203017883","volume-title":"Code-switching in conversation: language, interaction and identity","author":"P. Auer","year":"2013","unstructured":"P. Auer, Code-switching in conversation: language, interaction and identity (Routledge, New York, 2013)."},{"key":"222_CR4","doi-asserted-by":"publisher","unstructured":"M. Ma, B. Ramabhadran, J. Emond, A. Rosenberg, F. Biadsy, in Proceedings of the 44th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Comparison of data augmentation and adaptation strategies for code-switched automatic speech recognition (Brighton, 2019), pp. 6081\u20136085. https:\/\/doi.org\/10.1109\/icassp.2019.8682824.","DOI":"10.1109\/icassp.2019.8682824"},{"issue":"3","key":"222_CR5","doi-asserted-by":"publisher","first-page":"305","DOI":"10.1111\/1467-971X.00181","volume":"19","author":"D. C. Li","year":"2000","unstructured":"D. C. Li, Cantonese-english code-switching research in Hong Kong: a Y2K review. World Englishes. 19(3), 305\u2013322 (2000).","journal-title":"World Englishes"},{"key":"222_CR6","first-page":"2160","volume-title":"The 20th Annual Conference of the International Speech Communication Association(Interspeech)","author":"Y. Khassanov","year":"2019","unstructured":"Y. Khassanov, H. Xu, V. Pham, Z. Zeng, E. S. Chng, C. Ni, B. Ma, in The 20th Annual Conference of the International Speech Communication Association(Interspeech). Constrained output embeddings for end-to-end code-switching speech recognition with only monolingual data (ISCAGraz, 2019), pp. 2160\u20132164."},{"key":"222_CR7","doi-asserted-by":"publisher","first-page":"2165","DOI":"10.21437\/interspeech.2019-1429","volume-title":"The 20th Annual Conference of the International Speech Communication Association(Interspeech)","author":"Z. Zeng","year":"2019","unstructured":"Z. Zeng, Y. Khassanov, V. Pham, H. Xu, E. S. Chng, H. Li, in The 20th Annual Conference of the International Speech Communication Association(Interspeech). On the end-to-end solution to Mandarin-English code-switching speech recognition (ISCAGraz, 2019), pp. 2165\u20132169. https:\/\/doi.org\/10.21437\/interspeech.2019-1429."},{"key":"222_CR8","doi-asserted-by":"publisher","unstructured":"H. Chang, Y. H. Sung, B. Strope, F. Beaufays, in Proceedings of the 36th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Recognizing english queries in Mandarin voice search (Praque, 2011), pp. 5016\u20135019. https:\/\/doi.org\/10.1109\/icassp.2011.5947483.","DOI":"10.1109\/icassp.2011.5947483"},{"key":"222_CR9","volume-title":"Foundations of bilingual education and bilingualism","author":"C. Baker","year":"2011","unstructured":"C. Baker, Foundations of bilingual education and bilingualism (Multilingual Matters, Tonawanda, 2011)."},{"key":"222_CR10","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1109\/MSP.2012.2205597","volume":"29","author":"G. Hinton","year":"2012","unstructured":"G. Hinton, L. Deng, D. Yu, G. Dahl, A. R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, B. Kingsbury, Deep neural networks for acoustic modeling in speech recognition. Sig. Process. Mag.29:, 82\u201397 (2012).","journal-title":"Sig. Process. Mag."},{"issue":"3","key":"222_CR11","doi-asserted-by":"publisher","first-page":"396","DOI":"10.1109\/JAS.2017.7510508","volume":"4","author":"D. Yu","year":"2017","unstructured":"D. Yu, J. Li, Recent progresses in deep learning based acoustic models. IEEE\/CAA J. Autom. Sin.4(3), 396\u2013409 (2017).","journal-title":"IEEE\/CAA J. Autom. Sin."},{"key":"222_CR12","doi-asserted-by":"publisher","unstructured":"S. Yu, S. Zhang, B. Xu, in Proceedings of the 29th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Chinese-English bilingual phone modeling for cross-language speech recognition (Montreal, 2004), pp. 917\u201320. https:\/\/doi.org\/10.1109\/icassp.2004.1326136.","DOI":"10.1109\/icassp.2004.1326136"},{"key":"222_CR13","doi-asserted-by":"publisher","unstructured":"H. Lin, L. Deng, D. Yu, Y. Gong, A. Acero, C. -H. Lee, in Proceedings of the 34th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). A study on multilingual acoustic modeling for large vocabulary ASR (Taipei, 2009), pp. 4333\u20134336. https:\/\/doi.org\/10.1109\/icassp.2009.4960588.","DOI":"10.1109\/icassp.2009.4960588"},{"key":"222_CR14","doi-asserted-by":"publisher","unstructured":"N. Vu, D. Lyu, J. Weiner, D. Telaar, T. Schlippe, F. Blaicher, E. Chng, T. Schultz, H. Li, in Proceedings of the 37th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). A first speech recognition system for Mandarin-English code-switch conversational speech (Kyoto, 2012), pp. 4889\u20134892. https:\/\/doi.org\/10.1109\/icassp.2012.6289015.","DOI":"10.1109\/icassp.2012.6289015"},{"key":"222_CR15","doi-asserted-by":"publisher","unstructured":"S. Zhang, Y. Liu, M. Lei, B. Ma, L. Xie, in The 20th Annual Conference of the International Speech Communication Association(Interspeech). Towards language-universal Mandarin-English speech recognition (Graz, 2019), pp. 2170\u20132174. https:\/\/doi.org\/10.21437\/interspeech.2019-1365.","DOI":"10.21437\/interspeech.2019-1365"},{"key":"222_CR16","doi-asserted-by":"publisher","unstructured":"K. Li, J. Li, G. Ye, R. Zhao, Y. Gong, in Proceedings of the 44th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Towards code-switching ASR for end-to-end CTC models (Brighton, 2019), pp. 6076\u20136080. https:\/\/doi.org\/10.1109\/icassp.2019.8683223.","DOI":"10.1109\/icassp.2019.8683223"},{"key":"222_CR17","doi-asserted-by":"publisher","unstructured":"Z. Huang, X. Zhuang, D. Liu, X. Xiao, Y. Zhang, S. M. Siniscalchi, in Proceedings of the 44th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Exploring retraining-free speech recognition for intra-sentential code-switching (Brighton, 2019), pp. 6066\u20136070. https:\/\/doi.org\/10.1109\/icassp.2019.8682478.","DOI":"10.1109\/icassp.2019.8682478"},{"key":"222_CR18","doi-asserted-by":"publisher","unstructured":"X. Zhang, V. Manohar, D. Povey, S. Khudanpur, in The 18th Annual Conference of the International Speech Communication Association(Interspeech). Acoustic data-driven lexicon learning based on a greedy pronunciation selection framework (Stockholm, 2017), pp. 2541\u20132545. https:\/\/doi.org\/10.21437\/interspeech.2017-588.","DOI":"10.21437\/interspeech.2017-588"},{"key":"222_CR19","doi-asserted-by":"publisher","unstructured":"D. Wang, D. Tang, Z. Tang, Q. Chen, in The 19th Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA). OC16-CE80: a Chinese-English mixlingual database and a speech recognition baseline (Bali, 2016), pp. 84\u201388. https:\/\/doi.org\/10.1109\/icsda.2016.7918989.","DOI":"10.1109\/icsda.2016.7918989"},{"key":"222_CR20","first-page":"1986","volume-title":"The 11th Annual Conference of the International Speech Communication Association (Interspeech)","author":"D. Lyu","year":"2010","unstructured":"D. Lyu, T. Tan, E. Chng, H. Li, in The 11th Annual Conference of the International Speech Communication Association (Interspeech). SEAME a Mandarin-English code-switching speech corpus in south-east Asia (ISCAMakuhari, 2010), pp. 1986\u20131989."},{"key":"222_CR21","doi-asserted-by":"crossref","unstructured":"G. Lee, T. -N. Ho, E. -S. Chng, H. Li, in The 21th International Conference on Asian Language Processing (IALP). A review of the Mandarin-English code-switching corpus: SEAME (Singapore, 2017), pp. 210\u2013213.","DOI":"10.1109\/IALP.2017.8300581"},{"key":"222_CR22","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1016\/j.specom.2017.07.006","volume":"93","author":"M. Sperber","year":"2017","unstructured":"M. Sperber, G. Neubig, J. Niehues, S. Nakamura, A. Waibel, Transcribing against time. Speech Commun.93:, 20\u201330 (2017).","journal-title":"Speech Commun."},{"issue":"11","key":"222_CR23","doi-asserted-by":"publisher","first-page":"1892","DOI":"10.1109\/TASLP.2015.2456417","volume":"23","author":"C. Wu","year":"2015","unstructured":"C. Wu, H. Shen, C. Hsu, Code-switching event detection by using a latent language space model and the delta-Bayesian information criterion. IEEE\/ACM Trans. Audio Speech Lang. Process.23(11), 1892\u20131903 (2015).","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"222_CR24","doi-asserted-by":"publisher","unstructured":"S. Rallabandi, S. Sitaram, A. Black, in The Third Workshop on Computational Approaches to Linguistic Code-Switching(CALCS). Automatic detection of code-switching style from acoustics (Melbourne, 2018), pp. 76\u201381. https:\/\/doi.org\/10.18653\/v1\/w18-3209.","DOI":"10.18653\/v1\/w18-3209"},{"key":"222_CR25","doi-asserted-by":"publisher","unstructured":"Q. Wang, E. Yilmaz, A. Derinel, H. Li, in The 20th Annual Conference of the International Speech Communication Association(Interspeech). Code-switching detection using ASR-generated language posteriors (Graz, 2019), pp. 3740\u20133744. https:\/\/doi.org\/10.21437\/interspeech.2019-1161.","DOI":"10.21437\/interspeech.2019-1161"},{"key":"222_CR26","doi-asserted-by":"publisher","unstructured":"Y. Long, Y. Li, Q. Zhang, S. Wei, H. Ye, J. Yang, Acoustic data augmentation for Mandarin-English code-switching speech recognition. Appl. Acoust.161: (2020). https:\/\/doi.org\/10.1016\/j.apacoust.2019.107175.","DOI":"10.1016\/j.apacoust.2019.107175"},{"key":"222_CR27","first-page":"3586","volume-title":"The 16th Annual Conference of the International Speech Communication Association(Interspeech)","author":"T. Ko","year":"2015","unstructured":"T. Ko, V. Peddinti, D. Povey, S. Khudanpur, in The 16th Annual Conference of the International Speech Communication Association(Interspeech). Audio augmentation for speech recognition (ISCADresden, 2015), pp. 3586\u20133589."},{"key":"222_CR28","doi-asserted-by":"publisher","unstructured":"Y. R. Pandeya, D. Kim, J. Lee, Domestic cat sound classification using learned features from deep neural nets. Appl. Sci.8(10) (2018). https:\/\/doi.org\/10.3390\/app8101949.","DOI":"10.3390\/app8101949"},{"key":"222_CR29","doi-asserted-by":"publisher","unstructured":"D. S. Park, W. Chan, Y. Zhang, et.al., Specaugment: a simple data augmentation method for automatic speech recognition, 2613\u20132617 (2019). https:\/\/doi.org\/10.21437\/interspeech.2019-2680.","DOI":"10.21437\/interspeech.2019-2680"},{"key":"222_CR30","doi-asserted-by":"publisher","unstructured":"Y. R. Pandeya, B. Bhattarai, J. Lee, in IEEE 2020 International Conference on Information and Communication Technology Convergence (ICTC). Sound event detection in cowshed using synthetic data and convolutional neural network (Singapore, 2020), pp. 273\u2013276. https:\/\/doi.org\/10.1109\/ictc49870.2020.9289545.","DOI":"10.1109\/ictc49870.2020.9289545"},{"key":"222_CR31","first-page":"1415","volume-title":"The 15th Annual Conference of the International Speech Communication Association (Interspeech)","author":"H. Adel","year":"2014","unstructured":"H. Adel, D. Telaar, N. Vu, K. Kirchhoff, T. Schultz, in The 15th Annual Conference of the International Speech Communication Association (Interspeech). Combing recurrent neural networks and factored language models during decoding of code-switching speech (ISCASingapore, 2014), pp. 1415\u20131419."},{"key":"222_CR32","doi-asserted-by":"publisher","unstructured":"C. T. Chang, S. P. Chuang, H. Y. Lee, in The 20th Annual Conference of the International Speech Communication Association(Interspeech). Code-switching sentence generation by generative adversarial networks and its application to data augmentation (Graz, 2019), pp. 554\u2013558. https:\/\/doi.org\/10.21437\/interspeech.2019-3214.","DOI":"10.21437\/interspeech.2019-3214"},{"key":"222_CR33","doi-asserted-by":"publisher","unstructured":"A. Pratapa, G. Bhat, M. Choudhury, S. Sitaram, S. Dandapat, K. Bali, in The 56th Annual Meeting of the Association for Computational Linguistics (ACL). Language modeling for code-mixing: the role of linguistic theory based synthetic data (Melbourne, 2018), pp. 1543\u20131553. https:\/\/doi.org\/10.18653\/v1\/p18-1143.","DOI":"10.18653\/v1\/p18-1143"},{"key":"222_CR34","doi-asserted-by":"publisher","unstructured":"C. Yeh, C. Huang, L. Sun, L. Lee, in The 7th International Symposium on Chinese Spoken Language Processing (ISCSLP). An integrated framework for transcribing Mandarin-English code-mixed lectures with improved acoustic and language modeling (Tainai, 2010), pp. 214\u2013219. https:\/\/doi.org\/10.1109\/iscslp.2010.5684908.","DOI":"10.1109\/iscslp.2010.5684908"},{"key":"222_CR35","doi-asserted-by":"publisher","unstructured":"Y. Qian, J. Liu, in Proceedings of the 35th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Phone modeling and combining discriminative training for Mandarin-English bilingual speech recognition (Florence, 2010), pp. 4918\u20134921. https:\/\/doi.org\/10.1109\/icassp.2010.5495112.","DOI":"10.1109\/icassp.2010.5495112"},{"key":"222_CR36","doi-asserted-by":"publisher","unstructured":"C. F. Yeh, L. S. Lee, in Proceedings of the 39th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Transcribing code-switched bilingual lectures using deep neural networks with unit merging in acoustic modeling (Florence, 2014), pp. 220\u2013224. https:\/\/doi.org\/10.1109\/icassp.2014.6853590.","DOI":"10.1109\/icassp.2014.6853590"},{"key":"222_CR37","doi-asserted-by":"publisher","unstructured":"R. Sennrich, B. Haddow, A. Birch, in The 54th Annual Meeting of the Association for Computational Linguistics (ACL). Neural machine translation of rare words with subword units (Berlin, 2016), pp. 1715\u20131725. https:\/\/doi.org\/10.18653\/v1\/p16-1162.","DOI":"10.18653\/v1\/p16-1162"},{"key":"222_CR38","doi-asserted-by":"publisher","unstructured":"C. Shan, C. Weng, G. Wang, D. Su, M. Luo, D. Yu, L. Xie, in Proceedings of the 44th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Investigating end-to-end speech recognition for Mandarin-English code-switching (Brighton, 2019), pp. 6056\u20136060. https:\/\/doi.org\/10.1109\/icassp.2019.8682850.","DOI":"10.1109\/icassp.2019.8682850"},{"key":"222_CR39","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1016\/j.specom.2018.10.006","volume":"105","author":"E. Yilmaz","year":"2018","unstructured":"E. Yilmaz, M. McLaren, H. van den Heuvel, D. A. van Leeuwen, Semi-supervised acoustic model training for speech with code-switching. Speech Commun.105:, 12\u201322 (2018).","journal-title":"Speech Commun."},{"key":"222_CR40","doi-asserted-by":"publisher","unstructured":"P. Guo, H. Xu, L. Xie, E. S. Chng, in The 19th Annual Conference of the International Speech Communication Association(Interspeech). Study of semi-supervised approaches to improving English-Mandarin code-switching speech recognition (Hyderabad, 2018), pp. 1928\u20131932. https:\/\/doi.org\/10.21437\/interspeech.2018-1974.","DOI":"10.21437\/interspeech.2018-1974"},{"key":"222_CR41","doi-asserted-by":"publisher","unstructured":"H. Xu, S. DIng, S. Watanabe, in Proceedings of the 44th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Improving end-to-end speech recognition with pronunciation-assisted sub-word modeling (Brighton, 2019), pp. 7110\u20137114. https:\/\/doi.org\/10.1109\/icassp.2019.8682494.","DOI":"10.1109\/icassp.2019.8682494"},{"key":"222_CR42","doi-asserted-by":"publisher","unstructured":"T. N. Sainath, R. Prabhavalkar, S. Kumar, S. Lee, A. Kannan, D. Rybach, V. Schogol, P. Nguyen, B. Li, Y. Wu, Z. Chen, C. -C. Chiu, in Proceedings of the 43th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). No need for a lexicon? Evaluating the value of the pronunciation lexica in end-to-end models (Calgary, 2018), pp. 5859\u20135863. https:\/\/doi.org\/10.1109\/icassp.2018.8462380.","DOI":"10.1109\/icassp.2018.8462380"},{"key":"222_CR43","first-page":"4253","volume-title":"Proceedings of the 33th International Conference on Acoustics, Speech, and Signal Processing (ICASSP)","author":"Q. Zhang","year":"2008","unstructured":"Q. Zhang, J. Pan, Y. Yan, in Proceedings of the 33th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Mandarin-English bilingual speech recognition for real world music retrieval (IEEELas Vegas, 2008), pp. 4253\u20134256."},{"issue":"5","key":"222_CR44","doi-asserted-by":"publisher","first-page":"434","DOI":"10.1016\/j.specom.2008.01.002","volume":"50","author":"M. Bisani","year":"2008","unstructured":"M. Bisani, H. Ney, Joint-sequence models for grapheme-to-phoneme conversion. Speech Commun.50(5), 434\u2013451 (2008).","journal-title":"Speech Commun."},{"key":"222_CR45","volume-title":"Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)","author":"D. Povey","year":"2011","unstructured":"D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, et.al., in Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). The Kaldi speech recognition toolkit (IEEEWaikoloa, 2011)."},{"key":"222_CR46","doi-asserted-by":"publisher","unstructured":"D. Povey, V. Peddinti, D. Galvez, P. Ghahremani, et.al., in The 17th Annual Conference of the International Speech Communication Association(Interspeech). Purely sequence-trained neural networks for ASR based on lattice-free MMI (San Francisco, 2016), pp. 2751\u20132755. https:\/\/doi.org\/10.21437\/interspeech.2016-595.","DOI":"10.21437\/interspeech.2016-595"},{"key":"222_CR47","doi-asserted-by":"publisher","unstructured":", in The 18th Annual Conference of the International Speech Communication Association(Interspeech). An exploration of dropout with LSTMs (Stockholm, 2017), pp. 1586\u20131590. https:\/\/doi.org\/10.21437\/interspeech.2017-129.","DOI":"10.21437\/interspeech.2017-129"},{"key":"222_CR48","first-page":"5998","volume-title":"Advances in Neural Information Processing Systems (NIPS)","author":"A. Vaswani","year":"2017","unstructured":"A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, \u0141. Kaiser, I. Polosukhin, in Advances in Neural Information Processing Systems (NIPS). Attention is all you need (NIPSLong Beach, 2017), pp. 5998\u20136008."},{"key":"222_CR49","first-page":"5884","volume-title":"Proceedings of the 43th International Conference on Acoustics, Speech, and Signal Processing (ICASSP)","author":"L. Dong","year":"2018","unstructured":"L. Dong, S. Xu, B. Xu, in Proceedings of the 43th International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Speech-transformer: a no-recurrence sequence-to-sequence model for speech recognition (IEEECalgary, 2018), pp. 5884\u20135888."},{"key":"222_CR50","first-page":"2207","volume-title":"The 19th Annual Conference of the International Speech Communication Association(Interspeech)","author":"S. Watanabe","year":"2018","unstructured":"S. Watanabe, T. Hori, S. Karita, T. Hayashi, J. Nishitoba, Y. Unno, N. Enrique Yalta Soplin, J. Heymann, M. Wiesner, N. Chen, A. Renduchintala, T. Ochiai, in The 19th Annual Conference of the International Speech Communication Association(Interspeech). ESPnet: end-to-end speech processing toolkit (ISCAHyderabad, 2018), pp. 2207\u20132211."}],"container-title":["EURASIP Journal on Audio, Speech, and Music Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-021-00222-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13636-021-00222-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-021-00222-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,8,30]],"date-time":"2021-08-30T08:31:23Z","timestamp":1630312283000},"score":1,"resource":{"primary":{"URL":"https:\/\/asmp-eurasipjournals.springeropen.com\/articles\/10.1186\/s13636-021-00222-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,30]]},"references-count":50,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["222"],"URL":"https:\/\/doi.org\/10.1186\/s13636-021-00222-7","relation":{},"ISSN":["1687-4722"],"issn-type":[{"type":"electronic","value":"1687-4722"}],"subject":[],"published":{"date-parts":[[2021,8,30]]},"assertion":[{"value":"10 May 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 August 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 August 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"34"}}