{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,3,29]],"date-time":"2022-03-29T23:18:05Z","timestamp":1648595885588},"reference-count":26,"publisher":"Institute of Electronics, Information and Communications Engineers (IEICE)","issue":"8","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IEICE Trans. Inf. &amp; Syst."],"published-print":{"date-parts":[[2019,8,1]]},"DOI":"10.1587\/transinf.2018edp7344","type":"journal-article","created":{"date-parts":[[2019,7,31]],"date-time":"2019-07-31T22:10:57Z","timestamp":1564611057000},"page":"1546-1553","source":"Crossref","is-referenced-by-count":1,"title":["Pre-Training of DNN-Based Speech Synthesis Based on Bidirectional Conversion between Text and Speech"],"prefix":"10.1587","volume":"E102.D","author":[{"given":"Kentaro","family":"SONE","sequence":"first","affiliation":[{"name":"Graduate School of Informatics and Engineering, The University of Electro-Communications"}]},{"given":"Toru","family":"NAKASHIKA","sequence":"additional","affiliation":[{"name":"Graduate School of Informatics and Engineering, The University of Electro-Communications"}]}],"member":"532","reference":[{"key":"1","doi-asserted-by":"publisher","unstructured":"[1] K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, and K. Oura, \u201cSpeech synthesis based on hidden Markov models,\u201d Proc. IEEE, vol.101, no.5, pp.1234-1252, 2013. 10.1109\/jproc.2013.2251852","DOI":"10.1109\/JPROC.2013.2251852"},{"key":"2","unstructured":"[2] A.J. Hunt and A.W. Black, \u201cUnit selection in a concatenative speech synthesis system using a large speech database,\u201d Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, pp.373-376, 1996. 10.1109\/icassp.1996.541110"},{"key":"3","unstructured":"[3] T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. 
Kitamura, \u201cSimultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis,\u201d Proc. Eurospeech, pp.2347-2350, 1999."},{"key":"4","unstructured":"[4] M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, \u201cAdaptation of pitch and spectrum for HMM-based speech synthesis using MLLR,\u201d Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, pp.805-808, 2001."},{"key":"5","doi-asserted-by":"publisher","unstructured":"[5] T. Nose, J. Yamagishi, T. Masuko, and T. Kobayashi, \u201cA style control technique for HMM-based expressive speech synthesis,\u201d IEICE Trans. Inf. &amp; Syst., vol.E90-D, no.9, pp.1406-1413, 2007. 10.1093\/ietisy\/e90-d.9.1406","DOI":"10.1093\/ietisy\/e90-d.9.1406"},{"key":"6","doi-asserted-by":"publisher","unstructured":"[6] H. Zen, K. Tokuda, and A.W. Black, \u201cStatistical parametric speech synthesis,\u201d Speech Commun., vol.51, no.11, pp.1039-1064, 2009. 10.1016\/j.specom.2009.04.004","DOI":"10.1016\/j.specom.2009.04.004"},{"key":"7","doi-asserted-by":"crossref","unstructured":"[7] H. Zen, A. Senior, and M. Schuster, \u201cStatistical parametric speech synthesis using deep neural networks,\u201d Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, pp.7962-7966, 2013. 10.1109\/icassp.2013.6639215","DOI":"10.1109\/ICASSP.2013.6639215"},{"key":"8","unstructured":"[8] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, \u201cWaveNet: A generative model for raw audio,\u201d arXiv:1609.03499, 2016."},{"key":"9","unstructured":"[9] S. Mehri, K. Kumar, I. Gulrajani, R. Kumar, S. Jain, J. Sotelo, A. Courville, and Y. Bengio, \u201cSampleRNN: An unconditional end-to-end neural audio generation model,\u201d arXiv:1612.07837, 2016."},{"key":"10","unstructured":"[10] J. Sotelo, S. Mehri, K. Kumar, J.F. Santos, K. Kastner, A. Courville, and Y. Bengio, \u201cChar2Wav: End-to-end speech synthesis,\u201d Proc. 
International Conference on Learning Representations, 2017."},{"key":"11","doi-asserted-by":"publisher","unstructured":"[11] A. Krizhevsky, I. Sutskever, and G.E. Hinton, \u201cImageNet classification with deep convolutional neural networks,\u201d Commun. ACM, vol.60, no.6, pp.84-90, 2017. 10.1145\/3065386","DOI":"10.1145\/3065386"},{"key":"12","doi-asserted-by":"publisher","unstructured":"[12] G. Hinton, L. Deng, D. Yu, G. Dahl, A.-R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, \u201cDeep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,\u201d IEEE Signal Processing Magazine, vol.29, no.6, pp.82-97, 2012. 10.1109\/msp.2012.2205597","DOI":"10.1109\/MSP.2012.2205597"},{"key":"13","doi-asserted-by":"crossref","unstructured":"[13] S.H. Mohammadi and A. Kain, \u201cVoice conversion using deep neural networks with speaker-independent pre-training,\u201d Proc. IEEE Spoken Language Technology Workshop, pp.19-23, 2014. 10.1109\/slt.2014.7078543","DOI":"10.1109\/SLT.2014.7078543"},{"key":"14","doi-asserted-by":"publisher","unstructured":"[14] T. Nakashika, \u201cDeep relational model: A joint probabilistic model with a hierarchical structure for bidirectional estimation of image and labels,\u201d IEICE Transactions on Information and Systems, vol.E101-D, no.2, pp.428-436, 2018. 10.1587\/transinf.2017edp7149","DOI":"10.1587\/transinf.2017EDP7149"},{"key":"15","unstructured":"[15] T. Masuko, K. Tokuda, T. Kobayashi, and S. Imai, \u201cSpeech synthesis using HMMs with dynamic features,\u201d Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, pp.389-392, 1996. 10.1109\/icassp.1996.541114"},{"key":"16","unstructured":"[16] K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, \u201cSpeech parameter generation algorithms for HMM-based speech synthesis,\u201d Proc. 
IEEE International Conference on Acoustics, Speech and Signal Processing, pp.1315-1318, 2000. 10.1109\/icassp.2000.861820"},{"key":"17","doi-asserted-by":"publisher","unstructured":"[17] G.E. Hinton, S. Osindero, and Y.-W. Teh, \u201cA fast learning algorithm for deep belief nets,\u201d Neural Computation, vol.18, no.7, pp.1527-1554, 2006. 10.1162\/neco.2006.18.7.1527","DOI":"10.1162\/neco.2006.18.7.1527"},{"key":"18","unstructured":"[18] R. Salakhutdinov and G.E. Hinton, \u201cDeep Boltzmann machines,\u201d Proc. International Conference on Artificial Intelligence and Statistics, pp.448-455, 2009."},{"key":"19","unstructured":"[19] R. Salakhutdinov and H. Larochelle, \u201cEfficient learning of deep Boltzmann machines,\u201d Proc. International Conference on Artificial Intelligence and Statistics, pp.693-700, 2010."},{"key":"20","doi-asserted-by":"publisher","unstructured":"[20] G.E. Hinton and R. Salakhutdinov, \u201cReducing the dimensionality of data with neural networks,\u201d Science, vol.313, no.5786, pp.504-507, 2006. 10.1126\/science.1127647","DOI":"10.1126\/science.1127647"},{"key":"21","doi-asserted-by":"crossref","unstructured":"[21] K. Cho, A. Ilin, and T. Raiko, \u201cImproved learning of Gaussian-Bernoulli restricted Boltzmann machines,\u201d Proc. International Conference on Artificial Neural Networks, pp.10-17, 2011. 10.1007\/978-3-642-21735-7_2","DOI":"10.1007\/978-3-642-21735-7_2"},{"key":"22","doi-asserted-by":"crossref","unstructured":"[22] S. Kang, X. Qian, and H. Meng, \u201cMulti-distribution deep belief network for speech synthesis,\u201d Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, pp.8012-8016, 2013. 10.1109\/icassp.2013.6639225","DOI":"10.1109\/ICASSP.2013.6639225"},{"key":"23","doi-asserted-by":"publisher","unstructured":"[23] B. Kosko, \u201cBidirectional associative memories,\u201d IEEE Trans. Systems, Man, Cybern., vol.18, no.1, pp.49-60, 1988. 
10.1109\/21.87054","DOI":"10.1109\/21.87054"},{"key":"24","doi-asserted-by":"publisher","unstructured":"[24] L.-H. Chen, T. Raitio, C. Valentini-Botinhao, Z.-H. Ling, and J. Yamagishi, \u201cA deep generative architecture for postfiltering in statistical parametric speech synthesis,\u201d IEEE\/ACM Transactions on Audio, Speech, and Language Processing, vol.23, no.11, pp.2003-2014, 2015. 10.1109\/taslp.2015.2461448","DOI":"10.1109\/TASLP.2015.2461448"},{"key":"25","doi-asserted-by":"crossref","unstructured":"[25] T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, \u201cAn adaptive algorithm for mel-cepstral analysis of speech,\u201d Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, pp.137-140, 1992. 10.1109\/icassp.1992.225953","DOI":"10.1109\/ICASSP.1992.225953"},{"key":"26","doi-asserted-by":"crossref","unstructured":"[26] S. Desai, E.V. Raghavendra, B. Yegnanarayana, A.W. Black, and K. Prahallad, \u201cVoice conversion using artificial neural networks,\u201d Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, pp.3893-3896, 2009. 
10.1109\/icassp.2009.4960478","DOI":"10.1109\/ICASSP.2009.4960478"}],"container-title":["IEICE Transactions on Information and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E102.D\/8\/E102.D_2018EDP7344\/_pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,8,3]],"date-time":"2019-08-03T03:29:52Z","timestamp":1564802992000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E102.D\/8\/E102.D_2018EDP7344\/_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,8,1]]},"references-count":26,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2019]]}},"URL":"https:\/\/doi.org\/10.1587\/transinf.2018edp7344","relation":{},"ISSN":["0916-8532","1745-1361"],"issn-type":[{"value":"0916-8532","type":"print"},{"value":"1745-1361","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,8,1]]}}}