{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,4,2]],"date-time":"2022-04-02T17:43:48Z","timestamp":1648921428452},"reference-count":42,"publisher":"Cambridge University Press (CUP)","issue":"2","license":[{"start":{"date-parts":[[2012,12,12]],"date-time":"2012-12-12T00:00:00Z","timestamp":1355270400000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2014,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This paper presents morpheme-based language models developed for Amharic (a morphologically rich Semitic language) and their application to a speech recognition task. A substantial reduction in the out of vocabulary rate has been observed as a result of using subwords or morphemes. Thus a severe problem of morphologically rich languages has been addressed. Moreover, lower perplexity values have been obtained with morpheme-based language models than with word-based models. However, when comparing the quality based on the probability assigned to the test sets, word-based models seem to fare better. We have studied the utility of morpheme-based language models in speech recognition systems and found that the performance of a relatively small vocabulary (5k) speech recognition system improved significantly as a result of using morphemes as language modeling and dictionary units. However, as the size of the vocabulary increases (20k or more) the morpheme-based systems suffer from acoustic confusability and did not achieve a significant improvement over a word-based system with an equivalent vocabulary size even with the use of higher order (quadrogram) n-gram language models.<\/jats:p>","DOI":"10.1017\/s1351324912000356","type":"journal-article","created":{"date-parts":[[2012,12,12]],"date-time":"2012-12-12T10:28:30Z","timestamp":1355308110000},"page":"235-259","source":"Crossref","is-referenced-by-count":1,"title":["Using morphemes in language modeling and automatic speech recognition of Amharic"],"prefix":"10.1017","volume":"20","author":[{"given":"MARTHA YIFIRU","family":"TACHBELIE","sequence":"first","affiliation":[]},{"given":"SOLOMON TEFERRA","family":"ABATE","sequence":"additional","affiliation":[]},{"given":"WOLFGANG","family":"MENZEL","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2012,12,12]]},"reference":[{"key":"S1351324912000356_ref19","first-page":"183","article-title":"Amharic verb lexicon in the context of machine translation","volume":"2","author":"Fissaha","year":"2003","journal-title":"Proceedings of the 10th Conference on Traitement Automatique des Langues Naturelles"},{"key":"S1351324912000356_ref5","first-page":"47","volume-title":"Proceedings of International Conference on Recent Advances in Natural Language Processing","author":"Amsalu","year":"2005"},{"key":"S1351324912000356_ref11","first-page":"1481","volume-title":"Proceeding the 8th International Conference on Spoken Language Processing","author":"Bouwman","year":"2004"},{"key":"S1351324912000356_ref42","volume-title":"The HTK Book","author":"Young","year":"2006"},{"key":"S1351324912000356_ref35","unstructured":"Tachbelie M. Y . 2010. Morphology-Based Language Modeling for Amharic. PhD thesis, University of Hamburg. http:\/\/www2.sub.uni-hamburg.de\/opus\/volltexte\/2010\/4848\/pdf\/TachbelieDissertation.pdf. Last accessed on the 3rd of December, 2012."},{"key":"S1351324912000356_ref3","doi-asserted-by":"publisher","DOI":"10.3115\/1621787.1621797"},{"key":"S1351324912000356_ref9","unstructured":"Bayu T. 2002. Automatic Morphological Analyzer for Amharic: An Experiment Employing Unsupervised Learning and Autosegmental Analysis Approaches. Master's thesis, Addis Ababa University."},{"key":"S1351324912000356_ref17","doi-asserted-by":"crossref","first-page":"2679","DOI":"10.21437\/Interspeech.2009-123","volume-title":"Proceedings of INTERSPEECH 2009","author":"El-Desoky","year":"2009"},{"key":"S1351324912000356_ref29","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-78135-6_32"},{"key":"S1351324912000356_ref40","first-page":"170","volume-title":"Proceeding of International Conference on Spoken Language Processing","author":"Whittaker","year":"2000"},{"key":"S1351324912000356_ref23","first-page":"121","volume-title":"Proceedings of the International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning","author":"Hirsim\u00e4ki","year":"2005"},{"key":"S1351324912000356_ref27","volume-title":"Proceedings of Machine Translation Summit XI, 2007","author":"Labaka","year":"2007"},{"key":"S1351324912000356_ref38","doi-asserted-by":"publisher","DOI":"10.1093\/jss\/XXXII.1.1"},{"key":"S1351324912000356_ref31","unstructured":"Pellegrini T. , and Lamel L. 2007. Using phonetic features in unsupervised word decompounding for ASR with application to a less-represented language. In Proceedings of INTERSPEECH 2007, pp. 1797\u20131800."},{"key":"S1351324912000356_ref21","first-page":"445","article-title":"Using morphology towards better large-vocabulary speech recognition systems","volume":"1","author":"Geutner","year":"1995","journal-title":"Proceedings of IEEE International on Acoustics, Speech and Signal Processing"},{"key":"S1351324912000356_ref7","first-page":"153","volume-title":"Proceedings of ACL-08, HLT Short Paper","author":"Badr","year":"2008"},{"key":"S1351324912000356_ref15","unstructured":"Creutz M. , and Lagus K. 2005. Unsupervised morpheme segmentation and morphology induction from text corpora using Morfessor 1.1. Technical Report A81, Neural Networks Research Center, Helsinki University of Technology."},{"key":"S1351324912000356_ref20","first-page":"94","volume-title":"Proceedings of the Conference on Human Language Technology for Development - HLTD 2011","author":"Gasser","year":"2011"},{"key":"S1351324912000356_ref8","unstructured":"Bayou A. 2000. Developing Automatic Word Parser for Amharic Verbs and their Derivation. Master's thesis, Addis Ababa University."},{"key":"S1351324912000356_ref1","unstructured":"Abate S. T . 2006. Automatic Speech Recognition for Amharic. PhD. thesis. University of Hamburg."},{"key":"S1351324912000356_ref6","first-page":"104","volume-title":"Proceedings of the Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources - ACL 2007","author":"Argaw","year":"2007"},{"key":"S1351324912000356_ref10","volume-title":"Languages in Ethiopia","author":"Bender","year":"1976"},{"key":"S1351324912000356_ref12","unstructured":"Chen S. F. , and Goodman J. 1998. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Computer Science Group, Harvard University."},{"key":"S1351324912000356_ref13","unstructured":"Creutz M. 2006. Induction of the Morphology of Natural language: Unsupervised Morpheme Segmentation with Application to Automatic Speech Recognition. PhD thesis, Helsinki University of Technology."},{"key":"S1351324912000356_ref14","first-page":"380","volume-title":"Proceedings of NAACL HLT 2007","author":"Creutz","year":"2007"},{"key":"S1351324912000356_ref18","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1997.599552"},{"key":"S1351324912000356_ref22","unstructured":"Heintz I. 2010. Arabic Language Modeling with Stem-derived Morphemes for Automatic Speech Recognition. PhD thesis, Graduate Program in Linguistics, The Ohio State University."},{"key":"S1351324912000356_ref24","first-page":"487","volume-title":"Proceeding of the European Conference on Speech Communication and Technology","author":"Ircing","year":"2001"},{"key":"S1351324912000356_ref34","first-page":"901","article-title":"SRILM \u2013 an extensible language modeling toolkit","volume":"2","author":"Stolcke","year":"2002","journal-title":"Proceedings of International Conference on Spoken Language Processing"},{"key":"S1351324912000356_ref25","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4613-1297-0"},{"key":"S1351324912000356_ref26","unstructured":"Kirchhoff K. , Bilmes J. , Henderson J. , Schwartz R. , Noamany M. , Schone P. , Ji G. , Das S. , Egan M. , He F. , Vergyri D. , Liu D. , and Duta N. 2002. Novel speech recognition models for Arabic. Technical Report, Johns-Hopkins University Summer Research Workshop."},{"key":"S1351324912000356_ref28","volume-title":"Foundations of Statistical Natural Language Processing","author":"Manning","year":"1999"},{"key":"S1351324912000356_ref30","first-page":"285","volume-title":"Proceedings of INTERSPEECH 2006","author":"Pellegrini","year":"2006"},{"key":"S1351324912000356_ref32","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2009.2022295"},{"key":"S1351324912000356_ref33","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/4775.001.0001","volume-title":"Morphology and Computation","author":"Sproat","year":"1992"},{"key":"S1351324912000356_ref36","doi-asserted-by":"publisher","DOI":"10.1075\/cilt.309.25tac"},{"key":"S1351324912000356_ref37","volume-title":"Essentials of Amharic","author":"Teferra","year":"2007"},{"key":"S1351324912000356_ref39","first-page":"2245","volume-title":"Proceedings of International Conference on Spoken Language Processing","author":"Vergyri","year":"2004"},{"key":"S1351324912000356_ref4","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/17.1.1"},{"key":"S1351324912000356_ref2","first-page":"1601","volume-title":"Proceedings of 9th European Conference on Speech Communication and Technology, Interspeech-2005","author":"Abate","year":"2005"},{"key":"S1351324912000356_ref41","volume-title":"Y\u00e4amarI\u00f1a S\u00e4was\u00e4w","author":"Yimam","year":"2007"},{"key":"S1351324912000356_ref16","unstructured":"Creutz M. , and Lind K. 2004. Morpheme Segmentation Gold Standards for Finnish and English. Technical Report A77, Helsinki University of Technology."}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324912000356","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,2,3]],"date-time":"2022-02-03T02:28:18Z","timestamp":1643855298000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324912000356\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,12,12]]},"references-count":42,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2014,4]]}},"alternative-id":["S1351324912000356"],"URL":"https:\/\/doi.org\/10.1017\/s1351324912000356","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,12,12]]}}}