{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,30]],"date-time":"2026-05-30T14:24:00Z","timestamp":1780151040148,"version":"3.54.0"},"reference-count":41,"publisher":"Cambridge University Press (CUP)","issue":"6","license":[{"start":{"date-parts":[[2015,9,7]],"date-time":"2015-09-07T00:00:00Z","timestamp":1441584000000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2016,11]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This paper provides an analysis of several practical issues related to the theory and implementation of Grapheme-to-Phoneme (G2P) conversion systems utilizing the Weighted Finite-State Transducer paradigm. The paper addresses issues related to system accuracy, training time and practical implementation. The focus is on joint n-gram models which have proven to provide an excellent trade-off between system accuracy and training complexity. The paper argues in favor of simple, productive approaches to G2P, which favor a balance between training time, accuracy and model complexity. The paper also introduces the first instance of using joint sequence RnnLMs directly for G2P conversion, and achieves new state-of-the-art performance via ensemble methods combining RnnLMs and n-gram based models. In addition to detailed descriptions of the approach, minor yet novel implementation solutions, and experimental results, the paper introduces <jats:italic>Phonetisaurus<\/jats:italic>, a fully-functional, flexible, open-source, BSD-licensed G2P conversion toolkit, which leverages the OpenFst library. 
The work is intended to be accessible to a broad range of readers.<\/jats:p>","DOI":"10.1017\/s1351324915000315","type":"journal-article","created":{"date-parts":[[2015,9,7]],"date-time":"2015-09-07T07:51:56Z","timestamp":1441612316000},"page":"907-938","source":"Crossref","is-referenced-by-count":37,"title":["Phonetisaurus: Exploring grapheme-to-phoneme conversion with joint n-gram models in the WFST framework"],"prefix":"10.1017","volume":"22","author":[{"given":"JOSEF ROBERT","family":"NOVAK","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"NOBUAKI","family":"MINEMATSU","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"KEIKICHI","family":"HIROSE","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"56","published-online":{"date-parts":[[2015,9,7]]},"reference":[{"key":"S1351324915000315_ref031","first-page":"14","volume-title":"Proceedings of the 4th Workshop on Spoken Language Technologies for Under-resourced Languages","author":"Schlippe","year":"2014"},{"key":"S1351324915000315_ref016","first-page":"780","volume-title":"Proceedings of the ACL 2010","author":"Jiampojamarn","year":"2010"},{"key":"S1351324915000315_ref032","doi-asserted-by":"publisher","DOI":"10.1109\/78.650093"},{"key":"S1351324915000315_ref014","volume-title":"Proceedings of INTERSPEECH 2012","author":"Hahn","year":"2012"},{"key":"S1351324915000315_ref040","unstructured":"Wu J. 2002. Maximum Entropy Language Modeling with Non-Local Dependencies. PhD thesis, Baltimore, Maryland, USA."},{"key":"S1351324915000315_ref025","unstructured":"Novak J. 
(2011) Available at: http:\/\/code.google.com\/p\/phonetisaurus."},{"key":"S1351324915000315_ref006","first-page":"434","volume-title":"Speech Communication","author":"Bisani","year":"2008"},{"key":"S1351324915000315_ref005","volume-title":"Text Compression","author":"Bell","year":"1990"},{"key":"S1351324915000315_ref013","volume-title":"Proceedings of the 4th ISCA Tutorial and Research Workshop on Speech Synthesis","author":"Galescu","year":"2001"},{"key":"S1351324915000315_ref038","unstructured":"Weide R. L. 1998. The Carnegie Mellon pronouncing dictionary. Available at: http:\/\/www.speech.cs.cmu.edu\/cgi-bin\/cmudict."},{"key":"S1351324915000315_ref011","first-page":"167","volume-title":"Proceedings of the 10th International Conference on Speech and Computer (SPECOM 2005)","author":"Damper","year":"2005"},{"key":"S1351324915000315_ref035","first-page":"901","volume-title":"Proceedings of ICSLP 2002","author":"Stolcke","year":"2002"},{"key":"S1351324915000315_ref029","first-page":"61","volume-title":"Proceedings of the ACL 2012 - System Demonstrations","author":"Roark","year":"2012"},{"key":"S1351324915000315_ref017","first-page":"372","volume-title":"Proceedings of NAACL HLT 2007","author":"Jiampojamarn","year":"2007"},{"key":"S1351324915000315_ref015","doi-asserted-by":"crossref","first-page":"825","DOI":"10.21437\/Interspeech.2011-305","volume-title":"Proceedings of INTERSPEECH 2011","author":"Hixon","year":"2011"},{"key":"S1351324915000315_ref012","first-page":"2243","volume-title":"Proceedings of EUROSPEECH 1995","author":"Deligne","year":"1995"},{"key":"S1351324915000315_ref037","unstructured":"Tam Y. 2009. Rapid Unsupervised Topic Adaptation - A Latent Semantic Approach. 
PhD thesis, Carnegie Mellon University, Pittsburgh, PA, USA."},{"key":"S1351324915000315_ref023","first-page":"321","article-title":"Semiring frameworks and algorithms for shortest-distance problems","volume":"7","author":"Mohri","year":"2002","journal-title":"Journal of Automata, Languages and Combinatorics"},{"key":"S1351324915000315_ref010","first-page":"896","volume-title":"Proceedings of ICML 2014","author":"Cortes","year":"2014"},{"key":"S1351324915000315_ref019","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1995.479394"},{"key":"S1351324915000315_ref039","doi-asserted-by":"publisher","DOI":"10.1109\/18.87000"},{"key":"S1351324915000315_ref028","doi-asserted-by":"crossref","first-page":"1821","DOI":"10.21437\/Interspeech.2013-449","volume-title":"Proceedings of INTERSPEECH 2013","author":"Novak","year":"2013"},{"key":"S1351324915000315_ref007","volume-title":"Proceedings of the 2002 IEEE Workshop on Speech Synthesis","author":"Caseiro","year":"2002"},{"key":"S1351324915000315_ref020","unstructured":"Mikolov T. 2012. Statistical Language Models Based on Neural Networks. 
PhD Thesis, Brno University of Technology, Czech Republic."},{"key":"S1351324915000315_ref004","first-page":"1044","volume-title":"Proceedings of EMNLP 2013","author":"Auli","year":"2013"},{"key":"S1351324915000315_ref030","first-page":"522","article-title":"Learning string edit distance","volume":"20","author":"Ristad","year":"1998","journal-title":"IEEE Transactions PRMI"},{"key":"S1351324915000315_ref034","first-page":"1293","volume-title":"Proceedings of ICSLP 2002","author":"Shu","year":"2002"},{"key":"S1351324915000315_ref027","first-page":"45","volume-title":"Proceedings of FSMNLP 2012","author":"Novak","year":"2012"},{"key":"S1351324915000315_ref024","first-page":"69","article-title":"Weighted finite-state transducers in speech recognition","volume":"16","author":"Mohri","year":"2002","journal-title":"Computer Speech and Language"},{"key":"S1351324915000315_ref008","doi-asserted-by":"crossref","unstructured":"Chen S. 2003. Conditional and joint models for grapheme-to-phoneme conversion. In Proceedings of EUROSPEECH.","DOI":"10.21437\/Eurospeech.2003-584"},{"key":"S1351324915000315_ref036","first-page":"297","volume-title":"Proceedings of SPECOM","author":"St\u00fcker","year":"2004"},{"key":"S1351324915000315_ref026","doi-asserted-by":"crossref","first-page":"2526","DOI":"10.21437\/Interspeech.2012-654","volume-title":"Proceedings of INTERSPEECH 2012","author":"Novak","year":"2012"},{"key":"S1351324915000315_ref009","unstructured":"Chen S., and Goodman J. 1998. An empirical study of smoothing techniques for language modeling. Technical Report, Computer Science Group, Harvard University."},{"key":"S1351324915000315_ref033","unstructured":"Sejnowski T. J., and Rosenberg C. R. 1993. NETtalk corpus. 
Available at: ftp:\/\/svr-ftp.eng.cam.ac.uk\/pub\/comp.speech\/dictionar-ies\/beep.tar.gz."},{"key":"S1351324915000315_ref041","doi-asserted-by":"crossref","first-page":"1258","DOI":"10.21437\/Interspeech.2014-315","volume-title":"Proceedings of INTERSPEECH 2014","author":"Wu","year":"2014"},{"key":"S1351324915000315_ref002","first-page":"11","volume-title":"Proceedings of CIAA 2007","author":"Allauzen","year":"2007"},{"key":"S1351324915000315_ref003","volume-title":"Proceedings of Interspeech 2010","author":"Alum\u00e4e","year":"2010"},{"key":"S1351324915000315_ref022","volume-title":"ASRU 2011, demo session","author":"Mikolov","year":"2011"},{"key":"S1351324915000315_ref001","first-page":"40","volume-title":"Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics","author":"Allauzen","year":"2003"},{"key":"S1351324915000315_ref021","doi-asserted-by":"crossref","first-page":"1045","DOI":"10.21437\/Interspeech.2010-343","volume-title":"Proceedings of INTERSPEECH 2010","author":"Mikolov","year":"2010"},{"key":"S1351324915000315_ref018","first-page":"170","volume-title":"Proceedings of CIAA 2001","author":"Kempe","year":"2001"}],"container-title":["Natural Language 
Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324915000315","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,13]],"date-time":"2023-08-13T20:42:18Z","timestamp":1691959338000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324915000315\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,9,7]]},"references-count":41,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2016,11]]}},"alternative-id":["S1351324915000315"],"URL":"https:\/\/doi.org\/10.1017\/s1351324915000315","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,9,7]]}}}