{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T16:13:07Z","timestamp":1761581587911},"reference-count":20,"publisher":"Institute of Electronics, Information and Communications Engineers (IEICE)","issue":"8","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IEICE Trans. Inf. & Syst."],"published-print":{"date-parts":[[2017]]},"DOI":"10.1587\/transinf.2017edl8034","type":"journal-article","created":{"date-parts":[[2017,7,31]],"date-time":"2017-07-31T22:19:50Z","timestamp":1501539590000},"page":"1925-1928","source":"Crossref","is-referenced-by-count":21,"title":["Voice Conversion Using Input-to-Output Highway Networks"],"prefix":"10.1587","volume":"E100.D","author":[{"given":"Yuki","family":"SAITO","sequence":"first","affiliation":[{"name":"Graduate School of Information Science and Technology, The University of Tokyo"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shinnosuke","family":"TAKAMICHI","sequence":"additional","affiliation":[{"name":"Graduate School of Information Science and Technology, The University of Tokyo"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hiroshi","family":"SARUWATARI","sequence":"additional","affiliation":[{"name":"Graduate School of Information Science and Technology, The University of Tokyo"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"532","reference":[{"key":"1","doi-asserted-by":"publisher","unstructured":"[1] Z.-H. Ling, S.-Y. Kang, H. Zen, A. Senior, M. Schuster, X.-J. Qian, H. Meng, and L. Deng, \u201cDeep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends,\u201d IEEE Signal Process. Mag., vol.32, no.3, pp.35-52, May 2015. 10.1109\/msp.2014.2359987","DOI":"10.1109\/MSP.2014.2359987"},{"key":"2","doi-asserted-by":"publisher","unstructured":"[2] T. Toda, A.W. Black, and K. 
Tokuda, \u201cVoice conversion based on maximum likelihood estimation of spectral parameter trajectory,\u201d IEEE Trans. Audio Speech Lang. Process., vol.15, no.8, pp.2222-2235, Nov. 2007. 10.1109\/tasl.2007.907344","DOI":"10.1109\/TASL.2007.907344"},{"key":"3","doi-asserted-by":"publisher","unstructured":"[3] K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, and K. Oura, \u201cSpeech synthesis based on hidden Markov models,\u201d Proc. IEEE, vol.101, no.5, pp.1234-1252, April 2013. 10.1109\/jproc.2013.2251852","DOI":"10.1109\/JPROC.2013.2251852"},{"key":"4","unstructured":"[4] Y.-J. Wu and R.-H. Wang, \u201cMinimum generation error training for HMM-based speech synthesis,\u201d Proc. ICASSP, Toulouse, France, pp.89-92, May 2006. 10.1109\/icassp.2006.1659964"},{"key":"5","doi-asserted-by":"publisher","unstructured":"[5] Z. Wu and S. King, \u201cImproving trajectory modeling for DNN-based speech synthesis by using stacked bottleneck features and minimum trajectory error training,\u201d IEEE Trans. Audio Speech Lang. Process., vol.24, no.7, pp.1255-1265, July 2016. 10.1109\/taslp.2016.2551865","DOI":"10.1109\/TASLP.2016.2551865"},{"key":"6","doi-asserted-by":"publisher","unstructured":"[6] S. Takamichi, T. Toda, A.W. Black, G. Neubig, S. Sakti, and S. Nakamura, \u201cPostfilters to modify the modulation spectrum for statistical parametric speech synthesis,\u201d IEEE Trans. Audio Speech Lang. Process., vol.24, no.4, pp.755-767, April 2016. 10.1109\/taslp.2016.2522655","DOI":"10.1109\/TASLP.2016.2522655"},{"key":"7","unstructured":"[7] Y. Saito, S. Takamichi, and H. Saruwatari, \u201cTraining algorithm to deceive anti-spoofing verification for DNN-based speech synthesis,\u201d Proc. ICASSP, New Orleans, U.S.A., pp.4900-4904, March 2017."},{"key":"8","unstructured":"[8] R.K. Srivastava, K. Greff, and J. Schmidhuber, \u201cHighway networks,\u201d Proc. 
ICML Deep Learning Workshop, Lille, France, July 2015."},{"key":"9","doi-asserted-by":"crossref","unstructured":"[9] X. Wang, S. Takaki, and J. Yamagishi, \u201cInvestigating very deep highway networks for parametric speech synthesis,\u201d Proc. 9th ISCA Speech Synthesis Workshop, California, U.S.A., pp.166-171, Sept. 2016. 10.21437\/ssw.2016-27","DOI":"10.21437\/SSW.2016-27"},{"key":"10","doi-asserted-by":"crossref","unstructured":"[10] K. He, X. Zhang, S. Ren, and J. Sun, \u201cDeep residual learning for image recognition,\u201d Proc. CVPR, Las Vegas, U.S.A., pp.770-778, June 2016. 10.1109\/cvpr.2016.90","DOI":"10.1109\/CVPR.2016.90"},{"key":"11","doi-asserted-by":"publisher","unstructured":"[11] T. Kitamura and M. Akagi, \u201cSpeaker individualities in speech spectral envelopes,\u201d J. Acoust. Soc. Jpn (E), vol.16, no.5, pp.283-289, Sept. 1995. 10.1250\/ast.16.283","DOI":"10.1250\/ast.16.283"},{"key":"12","doi-asserted-by":"publisher","unstructured":"[12] D. Erro, A. Moreno, and A. Bonafonte, \u201cVoice conversion based on weighted frequency warping,\u201d IEEE Trans. Audio Speech Lang. Process., vol.18, no.5, pp.922-931, July 2010. 10.1109\/tasl.2009.2038663","DOI":"10.1109\/TASL.2009.2038663"},{"key":"13","doi-asserted-by":"crossref","unstructured":"[13] J. van Hout and A. Alwan, \u201cA novel approach to soft-mask estimation and log-spectral enhancement for robust speech recognition,\u201d Proc. ICASSP, Kyoto, Japan, pp.4105-4108, March 2012. 10.1109\/icassp.2012.6288821","DOI":"10.1109\/ICASSP.2012.6288821"},{"key":"14","unstructured":"[14] Y. Sagisaka, K. Takeda, M. Abe, S. Katagiri, T. Umeda, and H. Kuwahara, \u201cA large-scale Japanese speech database,\u201d ICSLP90, pp.1089-1092, Kobe, Japan, Nov. 1990."},{"key":"15","unstructured":"[15] H. Kawahara, J. Estill, and O. 
Fujimura, \u201cAperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT,\u201d MAVEBA 2001, Firenze, Italy, pp.1-6, Sept. 2001."},{"key":"16","unstructured":"[16] Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, \u201cMaximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation,\u201d Proc. INTERSPEECH, Pittsburgh, U.S.A., pp.2266-2269, Sept. 2006."},{"key":"17","doi-asserted-by":"publisher","unstructured":"[17] H. Kawahara, I. Masuda-Katsuse, and A.D. Cheveign\u00e9, \u201cRestructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds,\u201d Speech Communication, vol.27, no.3-4, pp.187-207, April 1999. 10.1016\/s0167-6393(98)00085-5","DOI":"10.1016\/S0167-6393(98)00085-5"},{"key":"18","doi-asserted-by":"crossref","unstructured":"[18] S. Takamichi, K. Kobayashi, K. Tanaka, T. Toda, and S. Nakamura, \u201cThe NAIST text-to-speech system for the Blizzard Challenge 2015,\u201d Proc. Blizzard Challenge workshop, Berlin, Germany, Sept. 2015.","DOI":"10.21437\/Blizzard.2015-7"},{"key":"19","unstructured":"[19] X. Glorot, A. Bordes, and Y. Bengio, \u201cDeep sparse rectifier neural networks,\u201d Proc. AISTATS, Lauderdale, U.S.A., pp.315-323, April 2011."},{"key":"20","unstructured":"[20] J. Duchi, E. Hazan, and Y. 
Singer, \u201cAdaptive subgradient methods for online learning and stochastic optimization,\u201d Journal of Machine Learning Research, vol.12, pp.2121-2159, July 2011."}],"container-title":["IEICE Transactions on Information and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E100.D\/8\/E100.D_2017EDL8034\/_pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,25]],"date-time":"2024-06-25T22:07:46Z","timestamp":1719353266000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E100.D\/8\/E100.D_2017EDL8034\/_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017]]},"references-count":20,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2017]]}},"URL":"https:\/\/doi.org\/10.1587\/transinf.2017edl8034","relation":{},"ISSN":["0916-8532","1745-1361"],"issn-type":[{"value":"0916-8532","type":"print"},{"value":"1745-1361","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017]]}}}