{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T21:38:07Z","timestamp":1773524287354,"version":"3.50.1"},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2015,4,20]],"date-time":"2015-04-20T00:00:00Z","timestamp":1429488000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2015,4,20]]},"abstract":"<jats:p>Nowadays, bilingual or multilingual speech recognition is confronted with the accent-related problem caused by non-native speech in a variety of real-world applications. Accent modeling of non-native speech is definitely challenging, because the acoustic properties in highly-accented speech pronounced by non-native speakers are quite divergent. The aim of this study is to generate highly Mandarin-accented English models for speakers whose mother tongue is Mandarin. First, a two-stage, state-based verification method is proposed to extract the state-level, highly-accented speech segments automatically. Acoustic features and articulatory features are successively used for robust verification of the extracted speech segments. Second, Gaussian components of the highly-accented speech models are generated from the corresponding Gaussian components of the native speech models using a linear transformation function. A decision tree is constructed to categorize the transformation functions and used for transformation function retrieval to deal with the data sparseness problem. Third, a discrimination function is further applied to verify the generated accented acoustic models. Finally, the successfully verified accented English models are integrated into the native bilingual phone model set for Mandarin-English bilingual speech recognition. Experimental results show that the proposed approach can effectively alleviate recognition performance degradation due to accents and can obtain absolute improvements of 4.1%, 1.8%, and 2.7% in word accuracy for bilingual speech recognition compared to that using traditional ASR approaches, MAP-adapted, and MLLR-adapted ASR methods, respectively.<\/jats:p>",
"DOI":"10.1145\/2661637","type":"journal-article","created":{"date-parts":[[2015,6,18]],"date-time":"2015-06-18T18:14:05Z","timestamp":1434651245000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Model Generation of Accented Speech using Model Transformation and Verification for Bilingual Speech Recognition"],"prefix":"10.1145","volume":"14","author":[{"given":"Han-ping","family":"Shen","sequence":"first","affiliation":[{"name":"National Cheng Kung University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chung-hsien","family":"Wu","sequence":"additional","affiliation":[{"name":"National Cheng Kung University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pei-shan","family":"Tsai","sequence":"additional","affiliation":[{"name":"National Cheng Kung University"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2015,4,20]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1155\/S1110865704310024"},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of Interspeech. 1449--1452","author":"Bouselmi G."},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of Interspeech.","author":"Bouselmi G."},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the IEEE Odyssey: Speaker and Language Recognition Workshop. 1--8.","author":"Campbell W. M."},
{"key":"e_1_2_1_5_1","first-page":"255","article-title":"A novel characterization of the alternative hypothesis using kernel discriminant analysis for LLR-based speaker verification","volume":"12","author":"Chao Y.-H.","year":"2007","journal-title":"Int. J. Comput. Linguist. Chinese Lang. Process."},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. 4396--4399","author":"Chen N. F."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-6393(01)00076-0"},{"key":"e_1_2_1_8_1","unstructured":"Chomsky N. and Halle M. 1968. The Sound Pattern of English. Harper & Row New York NY."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.2517-6161.1977.tb01600.x"},{"key":"e_1_2_1_10_1","unstructured":"English Across Taiwan. 2005. EAT. http:\/\/www.aclclp.org.tw\/use_mat.php#eat."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2012.2201474"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the DARPA Workshop on Speech Recognition. 93--99","author":"Fisher W. M."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.2035588"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-6393(98)00066-1"},{"key":"e_1_2_1_15_1","unstructured":"Hieronymus J. L. 1994. ASCII phonetic symbols for the world\u2019s languages: Worldbet. AT&T Tech.Rep. http:\/\/www.cslu.ogi.edu\/publications\/."},
{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2007.1064"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2012.2213247"},{"key":"e_1_2_1_18_1","first-page":"33","article-title":"Subphonetic modeling with Markov states-senone","volume":"1","author":"Hwang M.-Y.","year":"1992","journal-title":"Proceedings of ICASSP."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/89.544526"},{"key":"e_1_2_1_20_1","doi-asserted-by":"crossref","volume-title":"Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet","author":"International Phonetic Association (IPA) 1999.","DOI":"10.1017\/9780511807954"},{"key":"e_1_2_1_22_1","unstructured":"Joachims T. 1999. Making large-scale SVM learning practical. In Advances in Kernel Methods:Support Vector Learning. Sch\u00f6lkopf B. Burges C. and Smola A. Eds. MIT Press."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the Workshop on Phonetics and Phonology in ASR.","author":"Kirchhoff K.","year":"2000"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of Southern Africa Telecommunication Networks and Applications Conference (SATNAC).","author":"Lebese E."},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of Interspeech.","author":"Lee C.-H."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSA.2004.828638"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of ICASSP.","volume":"3","author":"Livescu K."},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the ESCA Workshop on Modeling Pronunciation Variation for Automatic Speech Recognition. 73--78","author":"Mokbel H."},
{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2006.10.006"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the Automatic Speech Recognition and Understanding Workshop.","volume":"1","author":"Ostendorf M.","year":"1999"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of Interspeech.","author":"Qian Y."},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of EuroSpeech. 2467--2470","author":"Ravishankar M."},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the 2nd International Conference on Spoken Language Processing.","author":"Rosenberg A. E."},{"key":"e_1_2_1_34_1","volume-title":"-H","author":"Siniscalchi S. M.","year":"2008"},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of International Conference of Phonetic Sciences (ICPhS). 2545--2548","author":"Stefan S.","year":"2003"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-6393(99)00038-2"},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of EuroSpeech.","author":"St\u00fcker S."},{"key":"e_1_2_1_38_1","first-page":"I","article-title":"Multilingual articulatory features","volume":"1","author":"St\u00fcker S.","year":"2003","journal-title":"Proceedings of ICASSP."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/89.661472"},{"key":"e_1_2_1_40_1","volume-title":"Proceedings of Multilinguality in Spoken Language Processing.","author":"Tomokiyo L. M."},{"key":"e_1_2_1_41_1","first-page":"1463","article-title":"Automatic alternative transcription generation and vocabulary selection for flexible word recognizers","volume":"2","author":"Torre D.","year":"1997","journal-title":"Proceedings of ICASSP."},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of ICASSP. 4889--4892","author":"Vu N. T."},
{"key":"e_1_2_1_43_1","first-page":"I","article-title":"Comparison of acoustic model adaptation techniques on non-native speech","volume":"1","author":"Wang Z.","year":"2003","journal-title":"Proceedings of ICASSP"},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of Eurospeech.","author":"Wang Z."},{"key":"e_1_2_1_45_1","volume-title":"Proceedings of the IEEE Workshop of Spoken Language Technology (SLT).","author":"Weiner J."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1017\/S0025100300005892"},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the ESCA Workshop on Modeling Pronunciation Variation for Automatic Speech Recognition. 151--156","author":"Williams G."},{"key":"e_1_2_1_48_1","volume-title":"-T","author":"Wu C.-H.","year":"2012"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/1967293.1967294"},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of International Symposium on Chinese Spoken Language Processing (ISCSLP). 1--4.","author":"Yang J."},{"key":"e_1_2_1_51_1","volume-title":"-S","author":"Yeh C.-F.","year":"2012"},{"key":"e_1_2_1_52_1","volume-title":"-S","author":"Yeh C.-F.","year":"2011"},{"key":"e_1_2_1_53_1","unstructured":"Young S. Evermann G. Gales M. Hain T. Kershaw D. Liu X.-Y. Moore G. Odell J. Ollason D. Povey D. Valtchev V. and Woodland P. 2006. The Hidden Markov Model Toolkit (HTK) Version 3.4. http:\/\/htk.eng.cam.ac.uk\/."},
{"key":"e_1_2_1_54_1","first-page":"I","article-title":"Chinese-English bilingual phone modeling for cross-language speech recognition","volume":"1","author":"Yu S.","year":"2004","journal-title":"Proceedings of ICASSP"},{"key":"e_1_2_1_55_1","volume-title":"Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).","author":"Zhang C."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCIT.2008.404"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2661637","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2661637","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T07:28:15Z","timestamp":1750231695000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2661637"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,4,20]]},"references-count":55,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2015,4,20]]}},"alternative-id":["10.1145\/2661637"],"URL":"https:\/\/doi.org\/10.1145\/2661637","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,4,20]]},"assertion":[{"value":"2013-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-04-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}