{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T18:34:51Z","timestamp":1751481291812},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2012,9,12]],"date-time":"2012-09-12T00:00:00Z","timestamp":1347408000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/2.0"},{"start":{"date-parts":[[2012,9,12]],"date-time":"2012-09-12T00:00:00Z","timestamp":1347408000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/2.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Braz Comput Soc"],"published-print":{"date-parts":[[2013,6]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This paper addresses the problem of grapheme-to-phoneme conversion to create a pronunciation dictionary from a vocabulary of the most frequent words in European Portuguese. A system based on a mixed approach founded on a stochastic model with embedded rules for stressed vowel assignment is described. The implemented model can generate pronunciations for unrestricted words; however, a dictionary with the 40k most frequent words was constructed and corrected interactively. The dictionary includes homographs with multiple pronunciations. The vocabulary was defined using the CETEMP\u00fablico corpus. The model and dictionary are publicly available.<\/jats:p>","DOI":"10.1007\/s13173-012-0088-0","type":"journal-article","created":{"date-parts":[[2012,9,11]],"date-time":"2012-09-11T08:50:50Z","timestamp":1347353450000},"page":"127-134","update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Generating a pronunciation dictionary for European Portuguese using a joint-sequence model with embedded stress assignment"],"prefix":"10.1007","volume":"19","author":[{"given":"Arlindo","family":"Veiga","sequence":"first","affiliation":[]},{"given":"Sara","family":"Candeias","sequence":"additional","affiliation":[]},{"given":"Fernando","family":"Perdig\u00e3o","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2012,9,12]]},"reference":[{"key":"88_CR1","unstructured":"Andrade E, Viana MC (1985) Corso I\u2014Um Conversor de Texto Ortogr\u00e1fico em C\u00f3digo Fon\u00e9tico para o Portugu\u00eas. Technical Report, CLUL-INIC, Lisboa"},{"key":"88_CR2","volume-title":"Maximum entropy motivated grapheme-to-phoneme, stress and syllable boundary prediction for Portuguese text-to-speech","author":"MJ Barros","year":"2006","unstructured":"Barros MJ, Weiss C (2006) Maximum entropy motivated grapheme-to-phoneme, stress and syllable boundary prediction for Portuguese text-to-speech. IV Jornadas en Tecnolog\u00edas del Habla, Zaragoza"},{"key":"88_CR3","unstructured":"Bick E (2000) The parsing system \u201cPalavras\u201d: automatic grammatical analysis of Portuguese in a constraint grammar framework. Dr.phil. thesis, Aarhus University Press, Aarhus"},{"key":"88_CR4","doi-asserted-by":"crossref","unstructured":"Bisani M, Ney H (2002) Investigations on joint-multigram models for grapheme-to-phoneme conversion. In: Proceedings of the 7th international conference on spoken language processing (ICSLP\u201902), Denver, USA, pp 105\u2013108","DOI":"10.21437\/ICSLP.2002-78"},{"issue":"5","key":"88_CR5","doi-asserted-by":"publisher","first-page":"434","DOI":"10.1016\/j.specom.2008.01.002","volume":"50","author":"M Bisani","year":"2008","unstructured":"Bisani M, Ney H (2008) Joint-sequence models for grapheme-to-phoneme conversion. Speech Commun 50(5):434\u2013451","journal-title":"Speech Commun"},{"key":"88_CR6","doi-asserted-by":"publisher","DOI":"10.1109\/ITS.2006.4433293","volume-title":"A rule-based grapheme-to-phone converter for TTS systems in European Portuguese","author":"D Braga","year":"2006","unstructured":"Braga D, Coelho L (2006) A rule-based grapheme-to-phone converter for TTS systems in European Portuguese. VI International Telecommunications Symposium, Fortaleza"},{"key":"88_CR7","unstructured":"Braga D (2008) Algoritmos de Processamento da Linguagem Natural para Sistemas de Convers\u00e3o Texto-Fala em Portugu\u00eas. PhD thesis, Universidade da Coru\u00f1a"},{"key":"88_CR8","unstructured":"Caseiro D, Trancoso I, Oliveira L, Viana C (2002) Grapheme-to-phone using finite-state transducers. In: Proceedings of the IEEE 2002 workshop on speech synthesis, California USA, pp 215\u2013218"},{"key":"88_CR9","unstructured":"Chen S, Goodman J (1998) An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Center for Research in Computing Technology (Harvard University)"},{"key":"88_CR10","doi-asserted-by":"crossref","unstructured":"Chotimongkol A, Black A (2000) Statistically trained orthographic to sound models for Thai. In: Proceedings of ICSLP, vol 2. Beijing, China, pp 551\u2013554","DOI":"10.21437\/ICSLP.2000-328"},{"key":"88_CR11","unstructured":"Crystal D (2002) A dictionary of linguistics and phonetics, 5th edn. Blackwell, Oxford"},{"key":"88_CR12","unstructured":"Demberg V (2006) Letter-to-phoneme conversion for a German text-to-speech system. Stuttgart University, published as book by Verlag Dr. M\u00fcller (VDM), ISBN 978-3-8364-6428-4"},{"key":"88_CR13","unstructured":"Demberg V, Schmid H, M\u00f6hler G (2007) Phonological constraints and morphological preprocessing for grapheme-to-phoneme conversion. In: Proceedings of the 45th annual meeting of the association for computational linguistics (ACL-07), Prague, Czech Republic, pp 96\u2013103"},{"key":"88_CR14","unstructured":"Galescu L, Allen J (2001) Bi-directional conversion between graphemes and phonemes using a joint N-gram model. In: Proceedings of the 4th ISCA workshop on speech synthesis, Perthshire, Scotland"},{"issue":"3,4","key":"88_CR15","doi-asserted-by":"publisher","first-page":"237","DOI":"10.1093\/biomet\/40.3-4.237","volume":"40","author":"I Good","year":"1953","unstructured":"Good I (1953) The population frequencies of species and the estimation of population parameters. Biometrika 40(3,4):237\u2013264","journal-title":"Biometrika"},{"key":"88_CR16","unstructured":"Jiampojamarn S, Kondrak G, Sherif T (2007) Applying many-to-many alignments and hidden Markov models to letter-to-phoneme conversion. HLT-NAACL, Rochester, New York, pp 372\u2013379"},{"key":"88_CR17","doi-asserted-by":"crossref","unstructured":"Jiampojamarn S, Kondrak G (2009) Online discriminative training for grapheme-to-phoneme conversion. In: Proceedings of INTERSPEECH, Brighton, UK, pp 1303\u20131306","DOI":"10.21437\/Interspeech.2009-407"},{"key":"88_CR18","unstructured":"Kaplan RM, Kay M (1994) Regular models of phonological rule systems. Computational Linguistics 20(3):331\u2013378. MIT Press, Cambridge"},{"issue":"3","key":"88_CR19","doi-asserted-by":"publisher","first-page":"400","DOI":"10.1109\/TASSP.1987.1165125","volume":"35","author":"S Katz","year":"1987","unstructured":"Katz S (1987) Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Trans Acoust Speech Signal Process 35(3):400\u2013401","journal-title":"IEEE Trans Acoust Speech Signal Process"},{"key":"88_CR20","doi-asserted-by":"crossref","unstructured":"Kneser R, Ney H (1995) Improved backing-off for M-gram language modeling. In: Proceedings of ICASSP, vol 1. pp 181\u2013184","DOI":"10.1109\/ICASSP.1995.479394"},{"key":"88_CR21","doi-asserted-by":"crossref","unstructured":"Mateus MH, d\u2019Andrade E (2000) The phonology of Portuguese. Cambridge University Press, USA 18(2):309\u2013312","DOI":"10.1017\/S0952675701004109"},{"issue":"1","key":"88_CR22","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1145\/375360.375365","volume":"33","author":"G Navarro","year":"2001","unstructured":"Navarro G (2001) A guided tour to approximate string matching. ACM Comput Surveys 33(1):31\u201388","journal-title":"ACM Comput Surveys"},{"key":"88_CR23","doi-asserted-by":"crossref","unstructured":"Ney H, Essen U, Kneser R (1994) On structuring probabilistic dependences in stochastic language modelling. Computer Speech Lang 8(1):1\u201338","DOI":"10.1006\/csla.1994.1001"},{"key":"88_CR24","unstructured":"Oliveira C, Moutinho L, Teixeira A (2004) Um Novo Sistema de Convers\u00e3o Grafema-Fone para PE Baseado em Transdutores. Actas do II Congresso Internacional de Fon\u00e9tica e Fonologia, Maranh\u00e3o, Brazil"},{"key":"88_CR25","doi-asserted-by":"crossref","unstructured":"Oliveira LC, Viana MC, Trancoso IM (1992) A rule-based text-to-speech system for Portuguese. In: Proceedings of ICASSP, vol 2. San Francisco, USA, pp 73\u201376","DOI":"10.1109\/ICASSP.1992.226117"},{"key":"88_CR26","doi-asserted-by":"crossref","unstructured":"Santos D, Rocha P (2001) Evaluating CETEMP\u00fablico, a free resource for Portuguese. In: Proceedings of the 39th annual meeting of the association for computational linguistics, Toulouse, France, pp 442\u2013449","DOI":"10.3115\/1073012.1073070"},{"key":"88_CR27","unstructured":"SpeechDAT (1998) Portuguese SpeechDat(II) FDB-4000, European Language Resources Association. http:\/\/www.elda.org\/catalogue\/en\/speech\/S0092.html"},{"key":"88_CR28","doi-asserted-by":"crossref","unstructured":"Taylor P (2005) Hidden Markov models for grapheme to phoneme conversion. In: Proceedings of INTERSPEECH, Lisbon, Portugal, pp 1973\u20131976","DOI":"10.21437\/Interspeech.2005-615"},{"key":"88_CR29","unstructured":"Teixeira JP (2004) A prosody model to TTS systems. PhD Thesis, Faculdade de Engenharia da Universidade do Porto"},{"key":"88_CR30","unstructured":"Wells JC (1997) SAMPA computer readable phonetic alphabet. In: Gibbon D, Moore R, Winski R (eds) Handbook of standards and resources for spoken language systems, Part IV. Berlin, Mouton de Gruyter"},{"issue":"4","key":"88_CR31","doi-asserted-by":"publisher","first-page":"1085","DOI":"10.1109\/18.87000","volume":"37","author":"I Witten","year":"1991","unstructured":"Witten I, Bell T (1991) The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression. IEEE Trans Inf Theory 37(4):1085\u20131094","journal-title":"IEEE Trans Inf Theory"},{"key":"88_CR32","doi-asserted-by":"crossref","unstructured":"Ribeiro R, Oliveira LC, Trancoso I (2003) Using morphossyntactic information in TTS systems: comparing strategies for European Portuguese. In: PROPOR\u20192003\u20146th workshop on computational processing of the Portuguese Language. Springer, Heidelberg, pp 143\u2013150","DOI":"10.1007\/3-540-45011-4_21"},{"key":"88_CR33","unstructured":"Ribeiro R, Oliveira LC, Trancoso I (2002) Morphossyntactic Disambiguation for TTS Systems. In: Proceedings of the 3rd international conference on language resources and evaluation, vol V. pp 1427\u20131431 (ELRA)"},{"key":"88_CR34","unstructured":"Braga D, Marques MA (2007) Desambigua\u00e7\u00e3o hom\u00f3grafos para Sistemas de convers\u00e3o Texto-Fala em Portugu\u00eas. Diacr\u00edtica, 21.1 (S\u00e9rie Ci\u00eancias da Linguagem) Braga: CEHUM\/Universidade do Minho, pp 25\u201350"},{"key":"88_CR35","doi-asserted-by":"crossref","unstructured":"Seara I, Kafka S, Klein S, Seara R (2001) Considera\u00e7\u00f5es sobre os problemas de altern\u00e2ncia voc\u00e1lica das formas verbais do Portugu\u00eas falado no Brasil para aplica\u00e7\u00e3o em um sistema de convers\u00e3o Texto-Fala. SBrT 2001\u2014XIX Simp\u00f3sio Brasileiro de Telecomunica\u00e7\u00f5es, Fortaleza, Brazil","DOI":"10.14209\/jcis.2002.15"},{"issue":"1","key":"88_CR36","first-page":"79","volume":"17","author":"I Seara","year":"2002","unstructured":"Seara I, Kafka S, Klein S, Seara R (2002) Altern\u00e2ncia voc\u00e1lica das formas verbais e nominais do Portugu\u00eas Brasileiro para aplica\u00e7\u00e3o em convers\u00e3o Texto-Fala. Revista da Sociedade Brasileira de Telecomunica\u00e7\u00f5es 17(1):79\u201385","journal-title":"Revista da Sociedade Brasileira de Telecomunica\u00e7\u00f5es"},{"key":"88_CR37","doi-asserted-by":"crossref","unstructured":"Barbosa F, Ferrari L, Resende F Jr (2003) A methodology to analyze homographs for a Brazilian Portuguese TTS system. In: PROPOR\u20192003\u20146th workshop on computational processing of the Portuguese Language. Springer, Heidelberg","DOI":"10.1007\/3-540-45011-4_8"},{"key":"88_CR38","unstructured":"Ferrari L, Barbosa F, Resende F Jr (2003) Constru\u00e7\u00f5es gramaticais e sistemas de convers\u00e3o texto-fala: o caso dos hom\u00f3grafos. In: Proceedings of the international conference on cognitive linguistics, Braga"},{"key":"88_CR39","doi-asserted-by":"crossref","unstructured":"Silva D, Braga D, Resende F Jr (2009) Conjunto de Regras para Desambigua\u00e7\u00e3o de Hom\u00f3grafos Heter\u00f3fonos no Portugu\u00eas Brasileiro. In: XXVII Simp\u00f3sio Brasileiro de Telecomunica\u00e7\u00f5es \u2014 SBrT 2009, September 29\u2013October 2, Blumenau, Santa Catarina, Brazil, vol 1. pp 1\u20136","DOI":"10.14209\/sbrt.2009.58107"}],"container-title":["Journal of the Brazilian Computer Society"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13173-012-0088-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s13173-012-0088-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s13173-012-0088-0","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13173-012-0088-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,25]],"date-time":"2023-06-25T17:45:59Z","timestamp":1687715159000},"score":1,"resource":{"primary":{"URL":"https:\/\/journal-bcs.springeropen.com\/articles\/10.1007\/s13173-012-0088-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,9,12]]},"references-count":39,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2013,6]]}},"alternative-id":["88"],"URL":"https:\/\/doi.org\/10.1007\/s13173-012-0088-0","relation":{},"ISSN":["0104-6500","1678-4804"],"issn-type":[{"value":"0104-6500","type":"print"},{"value":"1678-4804","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,9,12]]},"assertion":[{"value":"30 December 2011","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 August 2012","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 September 2012","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}