{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,27]],"date-time":"2025-03-27T07:53:15Z","timestamp":1743061995698,"version":"3.40.3"},"publisher-location":"Cham","reference-count":38,"publisher":"Springer International Publishing","isbn-type":[{"type":"print","value":"9783319026800"},{"type":"electronic","value":"9783319026817"}],"license":[{"start":{"date-parts":[[2014,1,1]],"date-time":"2014-01-01T00:00:00Z","timestamp":1388534400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2014,1,1]],"date-time":"2014-01-01T00:00:00Z","timestamp":1388534400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014]]},"DOI":"10.1007\/978-3-319-02681-7_13","type":"book-chapter","created":{"date-parts":[[2014,3,4]],"date-time":"2014-03-04T06:36:00Z","timestamp":1393914960000},"page":"179-193","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Text-To-Speech Synthesis"],"prefix":"10.1007","author":[{"given":"Florian","family":"Hinterleitner","sequence":"first","affiliation":[]},{"given":"Christoph","family":"Norrenbrock","sequence":"additional","affiliation":[]},{"given":"Sebastian","family":"M\u00f6ller","sequence":"additional","affiliation":[]},{"given":"Ulrich","family":"Heute","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2014,3,5]]},"reference":[{"key":"13_CR1","unstructured":"ASA S3.2-2009 (2009) American national standard method for measuring the intelligibility of speech over communication systems. 
American National Standards of the Acoustical Society of America, Washington"},{"issue":"4","key":"13_CR2","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1016\/0167-6393(96)00026-X","volume":"18","author":"C Benoit","year":"1996","unstructured":"Benoit C, Grice M, Hazan V (1996) The SUS test: a method for the assessment of text-to-speech synthesis intelligibility using semantically unpredictable sentences. Speech Communication 18(4):381\u2013392","journal-title":"Speech Communication"},{"key":"13_CR3","doi-asserted-by":"crossref","unstructured":"Black AW, Taylor PA (1994) CHATR: a generic speech synthesis system. In: COLING 1994, vol 2. pp 983\u2013986","DOI":"10.3115\/991250.991307"},{"key":"13_CR4","unstructured":"Burkhardt F (2013) Comparison of German TTS-systems. Cited 20 Apr 2013. http:\/\/syntheticspeech.de\/index.html"},{"key":"13_CR5","unstructured":"Cernak M, Rusko M (2005) An evaluation of synthetic speech using the PESQ measure. In: Proceedings of forum acusticum, Budapest, Hungary, pp 2725\u20132728"},{"key":"13_CR6","doi-asserted-by":"crossref","unstructured":"Chu M, Peng H (2001) An objective measure for estimating MOS of synthesized speech. In: Proceedings of the 7th international conference on speech communication and technology (Eurospeech 2001), Aalborg, Denmark, pp 2087\u20132090","DOI":"10.21437\/Eurospeech.2001-492"},{"key":"13_CR7","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-18463-5","volume-title":"Integral and diagnostic intrusive prediction of speech quality","author":"N C\u00f4t\u00e9","year":"2011","unstructured":"C\u00f4t\u00e9 N (2011) Integral and diagnostic intrusive prediction of speech quality. Springer, Heidelberg"},{"key":"13_CR8","doi-asserted-by":"crossref","first-page":"781","DOI":"10.1109\/LSP.2008.2006709","volume":"15","author":"TH Falk","year":"2008","unstructured":"Falk TH, M\u00f6ller S (2008) Towards signal-based instrumental quality diagnosis for text-to-speech systems. 
IEEE Signal Processing Letters 15:781\u2013784","journal-title":"IEEE Signal Processing Letters"},{"key":"13_CR9","unstructured":"Fujisaki H (1981) Dynamic characteristics of voice fundamental frequency in speech and singing. Acoustical analysis and physiological interpretations. In: STL-QPSR, vol 22. pp 1\u201320"},{"key":"13_CR10","volume-title":"Handbook of standards and resources for spoken language systems","author":"D Gibbon","year":"1997","unstructured":"Gibbon D, Moore R, Winski R (1997) Handbook of standards and resources for spoken language systems. De Gruyter Mouton, Berlin, Boston"},{"key":"13_CR11","doi-asserted-by":"crossref","unstructured":"Hinterleitner F, M\u00f6ller S, Norrenbrock C, Heute U (2011) Perceptual quality dimensions of text-to-speech systems. In: Proceedings of the 12th annual conference of the international speech communication association (Interspeech 2011), Florence, Italy, pp 2177\u20132180","DOI":"10.21437\/Interspeech.2011-570"},{"key":"13_CR12","doi-asserted-by":"crossref","unstructured":"Hinterleitner F, Neitzel G, M\u00f6ller S, Norrenbrock C (2011) An evaluation protocol for the subjective assessment of text-to-speech in audiobook reading tasks. In: Proceedings of the Blizzard challenge workshop, Florence, Italy","DOI":"10.21437\/Blizzard.2011-11"},{"key":"13_CR13","unstructured":"Hinterleitner F, Zabel S, M\u00f6ller S, Leutelt L, Norrenbrock C (2011) Predicting the quality of synthesized speech using reference-based prediction measures. In: Proceedings of the 22nd Konferenz Elektronische Sprachsignalverarbeitung (ESSV 2011), Aachen, Germany, pp 99\u2013106"},{"key":"13_CR14","unstructured":"Hinterleitner F, Norrenbrock C, M\u00f6ller S (2012) On the use of Fujisaki parameters for the quality prediction of synthetic speech. 
In: Proceedings of the 23rd Konferenz Elektronische Sprachsignalverarbeitung (ESSV 2012), Cottbus, Germany, pp 112\u2013119"},{"key":"13_CR15","doi-asserted-by":"crossref","unstructured":"Hinterleitner F, Norrenbrock C, M\u00f6ller S, Heute U (2012) What makes this voice sound so bad? A multidimensional analysis of state-of-the-art text-to-speech systems. In: Proceedings of the 2012 IEEE workshop on spoken language technology (SLT), Miami, USA, pp 240\u2013245","DOI":"10.1109\/SLT.2012.6424229"},{"key":"13_CR16","unstructured":"Hinterleitner F, Norrenbrock C, M\u00f6ller S (2013) Perceptual quality dimensions of text-to-speech in audiobook reading tasks. In: Proceedings of the 24th Konferenz Elektronische Sprachsignalverarbeitung (ESSV 2013), Bielefeld, Germany, pp 44\u201349"},{"key":"13_CR17","doi-asserted-by":"crossref","unstructured":"Hinterleitner F, Norrenbrock C, M\u00f6ller S, Heute U (2013) Predicting the quality of text-to-speech systems from a large-scale feature set. In: Proceedings of the 14th annual conference of the international speech communication association (Interspeech 2013), Lyon, France, pp 383\u2013387","DOI":"10.21437\/Interspeech.2013-105"},{"key":"13_CR18","unstructured":"ITU-T Recommendation P.85 (1994) A method for subjective performance assessment of the quality of speech voice output devices. International Telecommunication Union, Geneva"},{"key":"13_CR19","unstructured":"ITU-T Recommendation P.563 (2004) Single-ended method for objective speech quality assessment in narrow-band telephony. International Telecommunication Union, Geneva"},{"key":"13_CR20","unstructured":"ITU-T Recommendation P.862 (2001) Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs. International Telecommunication Union, Geneva"},{"key":"13_CR21","unstructured":"ITU-T Recommendation P.863 (2011) Perceptual objective listening quality assessment (POLQA). 
International Telecommunication Union, Geneva"},{"key":"13_CR22","doi-asserted-by":"crossref","unstructured":"Jekosch U (1993) Speech quality assessment and evaluation. In: Proceedings of Eurospeech, Berlin, Germany, pp 1387\u20131394","DOI":"10.21437\/Eurospeech.1993-11"},{"issue":"3","key":"13_CR23","doi-asserted-by":"crossref","first-page":"971","DOI":"10.1121\/1.383940","volume":"67","author":"DH Klatt","year":"1980","unstructured":"Klatt DH (1980) Software for a cascade\/parallel formant synthesizer. Journal of the Acoustical Society of America 67(3):971\u2013995","journal-title":"Journal of the Acoustical Society of America"},{"key":"13_CR24","first-page":"351","volume":"3","author":"V Kraft","year":"1995","unstructured":"Kraft V, Portele T (1995) Quality evaluation of five German speech synthesis systems. Acta Acustica 3:351\u2013365","journal-title":"Acta Acustica"},{"key":"13_CR25","doi-asserted-by":"crossref","unstructured":"Mariniak A (1993) A global framework for the assessment of synthetic speech without subjects. In: Proceedings of the 3rd European conference on speech processing and technology (Eurospeech), Berlin, Germany, pp 1683\u20131686","DOI":"10.21437\/Eurospeech.1993-379"},{"key":"13_CR26","unstructured":"Mayo C, Clark RAJ, King S (2005) Listener\u2019s weighting of acoustic cues to synthetic speech naturalness: a multidimensional scaling analysis. In: Proceedings of the 6th annual conference of the international speech communication association (Interspeech), Lisbon, Portugal, pp 1725\u20131728"},{"key":"13_CR27","doi-asserted-by":"crossref","unstructured":"Minker W, Lee GG, Mariani J, Nakamura S (2010) Salient features for anger recognition in German and English IVR portals. Spoken dialogue systems technology and design. Springer","DOI":"10.1007\/978-1-4419-7934-6"},{"key":"13_CR28","unstructured":"M\u00f6ller S, Hinterleitner F (2013) ITU-T Contribution COM 12\u201337: proposal for an appendix to Rec. 
P.85 of the evaluation of speech output for audiobook reading tasks. Deutsche Telekom AG, ITU-T SG12 meeting 19\u201328 Mar 2013, Geneva"},{"key":"13_CR29","doi-asserted-by":"crossref","unstructured":"M\u00f6ller S, Hinterleitner F, Falk TH, Polzehl T (2010) Comparison of approaches for instrumentally predicting the quality of text-to-speech systems. In: Proceedings of the 11th annual conference of the international speech communication association (Interspeech 2010), Makuhari, Japan, pp 1325\u20131328","DOI":"10.21437\/Interspeech.2010-413"},{"issue":"5\/6","key":"13_CR30","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1016\/0167-6393(90)90021-Z","volume":"9","author":"E Moulines","year":"1990","unstructured":"Moulines E, Charpentier N (1990) Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Communication 9(5\/6):453\u2013467","journal-title":"Speech Communication"},{"key":"13_CR31","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1109\/LSP.2012.2189562","volume":"19","author":"C Norrenbrock","year":"2012","unstructured":"Norrenbrock C, Hinterleitner F, Heute U, M\u00f6ller S (2012) Instrumental assessment of prosodic quality for text-to-speech signals. IEEE Signal Processing Letters 19:255\u2013258","journal-title":"IEEE Signal Processing Letters"},{"key":"13_CR32","doi-asserted-by":"crossref","unstructured":"Norrenbrock C, Hinterleitner F, Heute U, M\u00f6ller S (2012) Quality analysis of macroprosodic $$F_{0}$$ dynamics in text-to-speech signals. In: Proceedings of the 13th annual conference of the international speech communication association (Interspeech 2012), Portland, USA, pp 454\u2013457","DOI":"10.21437\/Interspeech.2012-156"},{"key":"13_CR33","doi-asserted-by":"crossref","unstructured":"Norrenbrock C, Hinterleitner F, Heute U, M\u00f6ller S (2012) Towards perceptual quality modeling of synthesized audiobooks. 
In: Proceedings of the blizzard challenge workshop, Portland, USA","DOI":"10.21437\/Blizzard.2012-11"},{"issue":"2","key":"13_CR34","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/5.18626","volume":"77","author":"L Rabiner","year":"1989","unstructured":"Rabiner L (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257\u2013286","journal-title":"Proc IEEE"},{"key":"13_CR35","doi-asserted-by":"crossref","unstructured":"Sityaev D, Knill K, Burrows T (2006) Comparison of the ITU-T P.85 standard to other methods for the evaluation of text-to-speech systems. In: Proceedings of the 9th international conference on spoken language processing (Interspeech), Pittsburgh, USA, pp 1077\u20131080","DOI":"10.21437\/Interspeech.2006-54"},{"key":"13_CR36","unstructured":"Tokuda K, Zen H, Black AW (2002) An HMM-based speech synthesis system applied to English. In: Proceedings of 2002 IEEE speech synthesis workshop, Santa Monica, USA, pp 227\u2013230"},{"issue":"3","key":"13_CR37","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1207\/S15327906MBR3503_02","volume":"35","author":"L Tsogo","year":"2000","unstructured":"Tsogo L, Masson MH, Bardot A (2000) Multidimensional scaling methods for many-objects sets: a review. Multivariate Behavioral Research 35(3):307\u2013319","journal-title":"Multivariate Behavioral Research"},{"issue":"1","key":"13_CR38","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1016\/j.csl.2003.12.001","volume":"19","author":"M Viswanathan","year":"2005","unstructured":"Viswanathan M, Viswanathan M (2005) Measuring speech quality for text-to-speech systems: development and assessment of a modified mean opinion score (MOS) scale. 
Computer Speech and Language 19(1):55\u201383","journal-title":"Computer Speech and Language"}],"container-title":["T-Labs Series in Telecommunication Services","Quality of Experience"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-319-02681-7_13","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,24]],"date-time":"2024-05-24T19:30:13Z","timestamp":1716579013000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-319-02681-7_13"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014]]},"ISBN":["9783319026800","9783319026817"],"references-count":38,"URL":"https:\/\/doi.org\/10.1007\/978-3-319-02681-7_13","relation":{},"ISSN":["2192-2810","2192-2829"],"issn-type":[{"type":"print","value":"2192-2810"},{"type":"electronic","value":"2192-2829"}],"subject":[],"published":{"date-parts":[[2014]]},"assertion":[{"value":"5 March 2014","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}}]}}