{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T19:54:25Z","timestamp":1760385265806},"reference-count":24,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2012,12,1]],"date-time":"2012-12-01T00:00:00Z","timestamp":1354320000000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J AUDIO SPEECH MUSIC PROC."],"published-print":{"date-parts":[[2012,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>In this study, a consistency analysis of energy parameter for Mandarin speech is presented. Identified as a result of inspection of the human pronunciation process, the consistency can be interpreted as a high correlation of a warping curve between the spectrum and the prosody intra a syllable. Through three steps in the procedure of the consistency analysis, the hidden Markov model (HMM) algorithm is used first to decode HMM-state sequences within a syllable at the same time as to divide them into three segments. Second, based on a designated syllable, the vector quantization (VQ) with the Linde\u2013Buzo\u2013Gray algorithm is used to train the VQ codebooks of each segment. Third, the energy vector of each segment is encoded as an index by VQ codebooks, and then the probability of each possible path is evaluated as a prerequisite to analyze the consistency. It is demonstrated experimentally that a consistency is definitely acquired in case the syllable is located exactly in the same word. These results offer a research direction that the energy warping process intra a syllable must be considered in a text-to-speech system to improve the synthesized speech quality.<\/jats:p>","DOI":"10.1186\/1687-4722-2012-28","type":"journal-article","created":{"date-parts":[[2012,12,17]],"date-time":"2012-12-17T23:14:14Z","timestamp":1355786054000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["A study on the consistency analysis of energy parameter for Mandarin speech"],"prefix":"10.1186","volume":"2012","author":[{"given":"Li-Te","family":"Shen","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Cheng-Yu","family":"Yeh","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shaw-Hwa","family":"Hwang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2012,12,17]]},"reference":[{"key":"65_CR1","doi-asserted-by":"publisher","first-page":"737","DOI":"10.1121\/1.395275","volume":"82","author":"DH Klatt","year":"1987","unstructured":"Klatt DH: Review of text-to-speech conversion for English. J. Acoust. Soc. Am. 1987, 82: 737-793. 10.1121\/1.395275","journal-title":"J. Acoust. Soc. Am"},{"key":"65_CR2","doi-asserted-by":"publisher","first-page":"1309","DOI":"10.1109\/29.31286","volume":"37","author":"LS Lee","year":"1989","unstructured":"Lee LS, Tseng CY, Ming OY: The synthesis rules in a Chinese text-to-speech system. IEEE T. Acoust. Speech 1989, 37: 1309-1320. 10.1109\/29.31286","journal-title":"IEEE T. Acoust. Speech"},{"key":"65_CR3","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1109\/2.56867","volume":"23","author":"MH O\u2019Malley","year":"1990","unstructured":"O\u2019Malley MH: Text-to-speech conversion technology. Computer 1990, 23: 17-23.","journal-title":"Computer"},{"key":"65_CR4","first-page":"1421","volume-title":"Proceedings of the ICSLP","author":"SH Hwang","year":"1996","unstructured":"Hwang SH, Chen SH, Wang YR: A Mandarin text-to-speech system. In Proceedings of the ICSLP. Philadelphia, USA; 1996:1421-1424."},{"key":"65_CR5","doi-asserted-by":"publisher","DOI":"10.1155\/2009\/169819","author":"W Mattheyses","year":"2009","unstructured":"Mattheyses W, Latacz L, Verhelst W: On the importance of audiovisual coherence for the perceived quality of synthesized visual speech. EJASMP 2009. 10.1155\/2009\/169819","journal-title":"EJASMP"},{"key":"65_CR6","author":"JD Edge","year":"2009","unstructured":"Edge JD, Hilton A, Jackson P: Model-based synthesis of visual speech movements from 3D video. EJASMP 2009.","journal-title":"EJASMP"},{"key":"65_CR7","doi-asserted-by":"publisher","first-page":"613","DOI":"10.1109\/TCE.2009.5174430","volume":"55","author":"S Karabetsos","year":"2009","unstructured":"Karabetsos S, Tsiakoulis P, Chalamandaris A, Raptis S: Embedded unit selection text-to-speech synthesis for mobile devices. IEEE Trans. Consum. Electron. 2009, 55: 613-621.","journal-title":"IEEE Trans. Consum. Electron"},{"key":"65_CR8","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1109\/LES.2010.2052019","volume":"2","author":"C Spelta","year":"2010","unstructured":"Spelta C, Manzoni V, Corti A, Goggi A, Savaresi SM: Smartphone-based vehicle-to-driver\/environment interaction system for motorcycles. IEEE Embed. Syst. Lett. 2010, 2: 39-42.","journal-title":"IEEE Embed. Syst. Lett"},{"key":"65_CR9","first-page":"1652","volume-title":"Proceedings of the ICALIP","author":"DJ Yue","year":"2010","unstructured":"Yue DJ: Two stage concatenation speech synthesis for embedded devices. In Proceedings of the ICALIP. Shanghai, China; 2010:1652-1656."},{"key":"65_CR10","doi-asserted-by":"publisher","first-page":"1890","DOI":"10.1109\/TCE.2010.5606343","volume":"56","author":"A Chalamandaris","year":"2010","unstructured":"Chalamandaris A, Karabetsos S, Tsiakoulis P, Raptis S: A unit selection text-to-speech synthesis system optimized for use with screen readers. IEEE Trans. Consum. Electron. 2010, 56: 1890-1897.","journal-title":"IEEE Trans. Consum. Electron"},{"key":"65_CR11","doi-asserted-by":"publisher","first-page":"219","DOI":"10.1016\/S0167-6393(00)00075-3","volume":"35","author":"CH Wu","year":"2001","unstructured":"Wu CH, Chen JH: Automatic generation of synthesis units and prosodic information for Chinese concatenative synthesis. Speech Commun. 2001, 35: 219-237. 10.1016\/S0167-6393(00)00075-3","journal-title":"Speech Commun"},{"key":"65_CR12","doi-asserted-by":"publisher","first-page":"481","DOI":"10.1109\/TSA.2002.803437","volume":"10","author":"FC Chou","year":"2002","unstructured":"Chou FC, Tseng CY, Lee LS: A set of corpus-based text-to-speech synthesis technologies for Mandarin Chinese. IEEE Trans. Speech Audio Process. 2002, 10: 481-494. 10.1109\/TSA.2002.803437","journal-title":"IEEE Trans. Speech Audio Process"},{"key":"65_CR13","doi-asserted-by":"publisher","first-page":"1455","DOI":"10.1109\/TASL.2009.2035209","volume":"18","author":"JR Bellegarda","year":"2010","unstructured":"Bellegarda JR: A dynamic cost weighting framework for unit selection text-to-speech synthesis. IEEE Trans. Audio Speech Lang. Process. 2010, 18: 1455-1463.","journal-title":"IEEE Trans. Audio Speech Lang. Process"},{"key":"65_CR14","doi-asserted-by":"publisher","first-page":"453","DOI":"10.1016\/0167-6393(90)90021-Z","volume":"9","author":"E Moulines","year":"1990","unstructured":"Moulines E, Charpentier F: Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Commun. 1990, 9: 453-467. 10.1016\/0167-6393(90)90021-Z","journal-title":"Speech Commun"},{"key":"65_CR15","first-page":"204","volume-title":"Proceedings of the TENCON","author":"Y Zhu","year":"2002","unstructured":"Zhu Y, Zhao L, Xu Y, Niimi Y: A Chinese text-to-speech system based on TD-PSOLA. In Proceedings of the TENCON. Beijing, China; 2002:204-207."},{"key":"65_CR16","doi-asserted-by":"publisher","first-page":"226","DOI":"10.1109\/89.668817","volume":"6","author":"SH Chen","year":"1998","unstructured":"Chen SH, Hwang SH, Wang YR: An RNN-based prosodic information synthesizer for Mandarin text-to-speech. IEEE Trans. Speech Audio Process. 1998, 6: 226-239. 10.1109\/89.668817","journal-title":"IEEE Trans. Speech Audio Process"},{"key":"65_CR17","first-page":"809","volume-title":"Proceedings of the ICASSP","author":"Z Ying","year":"2001","unstructured":"Ying Z, Shi X: An RNN-based algorithm to detect prosodic phrase for Chinese TTS. In Proceedings of the ICASSP. Salt Lake City, Utah, USA; 2001:809-812."},{"key":"65_CR18","doi-asserted-by":"publisher","first-page":"793","DOI":"10.1049\/ip-vis:20045095","volume":"152","author":"CY Yeh","year":"2005","unstructured":"Yeh CY, Hwang SH: Efficient text analyzer with prosody generator-driven approach for Mandarin text-to-speech. IEE Proc. Vis. Image Signal Process. 2005, 152: 793-793. 10.1049\/ip-vis:20045095","journal-title":"IEE Proc. Vis. Image Signal Process"},{"key":"65_CR19","first-page":"377","volume-title":"Hidden Markov models, in Spoken Language Processing","author":"XD Huang","year":"2001","unstructured":"Huang XD, Acero A, Hon HW: Hidden Markov models, in Spoken Language Processing. Prentice Hall PTR, NJ; 2001:377-413."},{"key":"65_CR20","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1109\/5.18626","volume":"77","author":"LR Rabiner","year":"1989","unstructured":"Rabiner LR: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 1989, 77: 257-286. 10.1109\/5.18626","journal-title":"Proc. IEEE"},{"key":"65_CR21","volume-title":"EURASIP J. Adv. Signal Process","author":"U Simsekli","year":"2011","unstructured":"Simsekli U, Jylha A, Erkut C, Cemgil T: Real-time recognition of percussive sounds by a model-based method. EURASIP J. Adv. Signal Process 2011. 10.1155\/2011\/291860"},{"key":"65_CR22","volume-title":"EURASIP J. Adv. Signal Process","author":"S Winters-Hilt","year":"2010","unstructured":"Winters-Hilt S, Jiang Z, Baribault C: Hidden Markov model with duration side information for novel HMMD derivation, with application to eukaryotic gene finding. EURASIP J. Adv. Signal Process 2010. 10.1155\/2010\/761360"},{"key":"65_CR23","doi-asserted-by":"publisher","first-page":"1039","DOI":"10.1016\/j.specom.2009.04.004","volume":"51","author":"H Zen","year":"2009","unstructured":"Zen H, Tokuda K, Black AW: Statistical parametric speech synthesis. Speech Commun. 2009, 51: 1039-1064. 10.1016\/j.specom.2009.04.004","journal-title":"Speech Commun"},{"key":"65_CR24","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1109\/TCOM.1980.1094577","volume":"28","author":"Y Linde","year":"1980","unstructured":"Linde Y, Buzo A, Gray R: An algorithm for vector quantizer design. IEEE Trans. Commun. 1980, 28: 84-95. 10.1109\/TCOM.1980.1094577","journal-title":"IEEE Trans. Commun"}],"container-title":["EURASIP Journal on Audio, Speech, and Music Processing"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/1687-4722-2012-28.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/1687-4722-2012-28\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1687-4722-2012-28.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T22:14:06Z","timestamp":1630534446000},"score":1,"resource":{"primary":{"URL":"https:\/\/asmp-eurasipjournals.springeropen.com\/articles\/10.1186\/1687-4722-2012-28"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,12]]},"references-count":24,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2012,12]]}},"alternative-id":["65"],"URL":"https:\/\/doi.org\/10.1186\/1687-4722-2012-28","relation":{},"ISSN":["1687-4722"],"issn-type":[{"value":"1687-4722","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,12]]},"assertion":[{"value":"20 September 2012","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 November 2012","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 December 2012","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"28"}}