{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:14:24Z","timestamp":1750220064201,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":18,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,12,23]],"date-time":"2022-12-23T00:00:00Z","timestamp":1671753600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,12,23]]},"DOI":"10.1145\/3578741.3578801","type":"proceedings-article","created":{"date-parts":[[2023,3,7]],"date-time":"2023-03-07T04:18:52Z","timestamp":1678162732000},"page":"317-322","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["A Streaming End-to-End Speech Recognition Approach Based on WeNet for Tibetan Amdo Dialect"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3174-1317","authenticated-orcid":false,"given":"Chao","family":"Wang","sequence":"first","affiliation":[{"name":"School of Information Science and Technology, Tibet University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6730-1201","authenticated-orcid":false,"given":"Yao","family":"Wen","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Tibet University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1568-2644","authenticated-orcid":false,"given":"Phurba","family":"Lhamo","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Tibet University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9288-6600","authenticated-orcid":false,"given":"Nyima","family":"Tashi","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Tibet University, 
China"}]}],"member":"320","published-online":{"date-parts":[[2023,3,6]]},"reference":[
{"key":"e_1_3_2_1_1_1","first-page":"199","volume-title":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","author":"Rao Kanishka","unstructured":"Kanishka Rao, Ha\u015fim Sak, and Rohit Prabhavalkar. Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer. In 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pages 193\u2013199. IEEE, 2017."},
{"key":"e_1_3_2_1_2_1","first-page":"4964","volume-title":"2016 IEEE international conference on acoustics, speech and signal processing (ICASSP)","author":"Chan William","unstructured":"William Chan, Navdeep Jaitly, Quoc Le, and Oriol Vinyals. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP), pages 4960\u20134964. IEEE, 2016."},
{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2022-10015"},
{"key":"e_1_3_2_1_4_1","first-page":"943","volume-title":"Interspeech","author":"Prabhavalkar Rohit","year":"2017","unstructured":"Rohit Prabhavalkar, Kanishka Rao, Tara N Sainath, Bo Li, Leif Johnson, and Navdeep Jaitly. A comparison of sequence-to-sequence models for speech recognition. In Interspeech, pages 939\u2013943, 2017."},
{"key":"e_1_3_2_1_5_1","first-page":"213","volume-title":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","author":"Battenberg Eric","unstructured":"Eric Battenberg, Jitong Chen, Rewon Child, Adam Coates, Yashesh Gaur, Yi Li, Hairong Liu, Sanjeev Satheesh, Anuroop Sriram, and Zhenyao Zhu. Exploring neural transducers for end-to-end speech recognition. In 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pages 206\u2013213. IEEE, 2017."},
{"key":"e_1_3_2_1_6_1","first-page":"5798","volume-title":"2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"Li Jinyu","unstructured":"Jinyu Li, Guoli Ye, Amit Das, Rui Zhao, and Yifan Gong. Advancing acoustic-to-word CTC model. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5794\u20135798. IEEE, 2018."},
{"key":"e_1_3_2_1_7_1","first-page":"6385","volume-title":"ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"He Yanzhang","unstructured":"Yanzhang He, Tara N Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, et al. Streaming end-to-end speech recognition for mobile devices. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6381\u20136385. IEEE, 2019."},
{"key":"e_1_3_2_1_8_1","volume-title":"Developing rnn-t models surpassing high-performance hybrid models with customization capability. arXiv preprint arXiv:2007.15188","author":"Li Jinyu","year":"2020","unstructured":"Jinyu Li, Rui Zhao, Zhong Meng, Yanqing Liu, Wenning Wei, Sarangarajan Parthasarathy, Vadim Mazalov, Zhenghao Wang, Lei He, Sheng Zhao, et al. Developing RNN-T models surpassing high-performance hybrid models with customization capability. arXiv preprint arXiv:2007.15188, 2020."},
{"key":"e_1_3_2_1_9_1","first-page":"6787","volume-title":"ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"Shi Yangyang","unstructured":"Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng Yeh, Julian Chan, Frank Zhang, Duc Le, and Mike Seltzer. Emformer: Efficient memory transformer based acoustic model for low latency streaming speech recognition. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6783\u20136787. IEEE, 2021."},
{"key":"e_1_3_2_1_10_1","volume-title":"Sequence transduction with recurrent neural networks. arXiv preprint arXiv:1211.3711","author":"Graves Alex","year":"2012","unstructured":"Alex Graves. Sequence transduction with recurrent neural networks. arXiv preprint arXiv:1211.3711, 2012."},
{"issue":"1","key":"e_1_3_2_1_11_1","volume":"11","author":"Jinyu Li","year":"2022","unstructured":"Jinyu Li. Recent advances in end-to-end automatic speech recognition. APSIPA Transactions on Signal and Information Processing, 11(1), 2022. [12] Xiong Wang, Zhuoyuan Yao, Xian Shi, and Lei Xie. Cascade RNN-transducer: Syllable based streaming on-device mandarin speech recognition with a syllable-to-character converter. In 2021 IEEE Spoken Language Technology Workshop (SLT), pages 15\u201321. IEEE, 2021.","journal-title":"APSIPA Transactions on Signal and Information Processing"},
{"key":"e_1_3_2_1_12_1","first-page":"1369","volume-title":"2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","author":"Wang Senmao","unstructured":"Senmao Wang, Pan Zhou, Wei Chen, Jia Jia, and Lei Xie. Exploring RNN-transducer for Chinese speech recognition. In 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pages 1364\u20131369. IEEE, 2019."},
{"key":"e_1_3_2_1_13_1","volume-title":"Attention-based models for speech recognition. Advances in neural information processing systems, 28","author":"Chorowski Jan K","year":"2015","unstructured":"Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. Attention-based models for speech recognition. Advances in neural information processing systems, 28, 2015."},
{"key":"e_1_3_2_1_14_1","volume-title":"More productive end-to-end speech recognition toolkit. arXiv preprint arXiv:2203.15455","author":"Zhang Binbin","year":"2022","unstructured":"Binbin Zhang, Di Wu, Zhendong Peng, Xingchen Song, Zhuoyuan Yao, Hang Lv, Lei Xie, Chao Yang, Fuping Pan, and Jianwei Niu. WeNet 2.0: More productive end-to-end speech recognition toolkit. arXiv preprint arXiv:2203.15455, 2022."},
{"key":"e_1_3_2_1_15_1","volume-title":"Attention is all you need. Advances in neural information processing systems, 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017."},
{"key":"e_1_3_2_1_16_1","volume-title":"Conformer: Convolution-augmented transformer for speech recognition. arXiv preprint arXiv:2005.08100","author":"Gulati Anmol","year":"2020","unstructured":"Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, et al. Conformer: Convolution-augmented transformer for speech recognition. arXiv preprint arXiv:2005.08100, 2020."},
{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143891"},
{"key":"e_1_3_2_1_18_1","first-page":"584","volume-title":"Speech recognition with weighted finite-state transducers","author":"Mohri Mehryar","unstructured":"Mehryar Mohri, Fernando Pereira, and Michael Riley. Speech recognition with weighted finite-state transducers. In Springer Handbook of Speech Processing, pages 559\u2013584. Springer, 2008."}],"event":{"name":"MLNLP 2022: 2022 5th International Conference on Machine Learning and Natural Language Processing","acronym":"MLNLP 2022","location":"Sanya China"},"container-title":["Proceedings of the 2022 5th International Conference on Machine Learning and Natural Language Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3578741.3578801","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3578741.3578801","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:08:51Z","timestamp":1750183731000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3578741.3578801"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,23]]},"references-count":18,"alternative-id":["10.1145\/3578741.3578801","10.1145\/3578741"],"URL":"https:\/\/doi.org\/10.1145\/3578741.3578801","relation":{},"subject":[],"published":{"date-parts":[[2022,12,23]]},"assertion":[{"value":"2023-03-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}