{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,2,29]],"date-time":"2024-02-29T23:41:18Z","timestamp":1709250078412},"reference-count":32,"publisher":"Institute of Electronics, Information and Communications Engineers (IEICE)","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IEICE Trans. Inf. &amp; Syst."],"published-print":{"date-parts":[[2020,3,1]]},"DOI":"10.1587\/transinf.2019edp7228","type":"journal-article","created":{"date-parts":[[2020,2,29]],"date-time":"2020-02-29T22:10:55Z","timestamp":1583014255000},"page":"639-647","source":"Crossref","is-referenced-by-count":3,"title":["Generative Moment Matching Network-Based Neural Double-Tracking for Synthesized and Natural Singing Voices"],"prefix":"10.1587","volume":"E103.D","author":[{"given":"Hiroki","family":"TAMARU","sequence":"first","affiliation":[{"name":"The University of Tokyo"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuki","family":"SAITO","sequence":"additional","affiliation":[{"name":"The University of Tokyo"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shinnosuke","family":"TAKAMICHI","sequence":"additional","affiliation":[{"name":"The University of Tokyo"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tomoki","family":"KORIYAMA","sequence":"additional","affiliation":[{"name":"The University of Tokyo"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hiroshi","family":"SARUWATARI","sequence":"additional","affiliation":[{"name":"The University of Tokyo"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"532","reference":[{"key":"1","doi-asserted-by":"crossref","unstructured":"[1] S. Takamichi, T. Koriyama, and H. Saruwatari, \u201cSampling-based speech parameter generation using moment-matching network,\u201d Proc. INTERSPEECH, pp.3961-3965, Stockholm, Sweden, Aug. 2017. 
10.21437\/interspeech.2017-362","DOI":"10.21437\/Interspeech.2017-362"},{"key":"2","unstructured":"[2] R. Brice, Music Engineering, Elsevier Science, Oct. 2001. 10.1016\/b978-075065040-3\/50021-x"},{"key":"3","unstructured":"[3] K. Womack, The Beatles Encyclopedia: Everything Fab Four, ABC-CLIO, June 2014."},{"key":"4","unstructured":"[4] H. Kenmochi and H. Ohshita, \u201cVOCALOID-commercial singing synthesizer based on sample concatenation,\u201d Proc. INTERSPEECH, pp.4011-4012, Antwerp, Belgium, Aug. 2007."},{"key":"5","doi-asserted-by":"crossref","unstructured":"[5] M. Blaauw, J. Bonada, and R. Daido, \u201cData efficient voice cloning for neural singing synthesis,\u201d Proc. ICASSP, pp.6840-6844, Brighton, U.K., May 2019. 10.1109\/icassp.2019.8682656","DOI":"10.1109\/ICASSP.2019.8682656"},{"key":"6","doi-asserted-by":"crossref","unstructured":"[6] D. Ayll\u00f3n, F. Villavicencio, and P. Lanchantin, \u201cA strategy for improved phone-level lyrics-to-audio alignment for speech-to-singing synthesis,\u201d Proc. INTERSPEECH, pp.2603-2607, Graz, Austria, Sept. 2019. 10.21437\/interspeech.2019-3049","DOI":"10.21437\/Interspeech.2019-3049"},{"key":"7","unstructured":"[7] K. Saino, H. Zen, Y. Nankaku, A. Lee, and K. Tokuda, \u201cAn HMM-based singing voice synthesis system,\u201d Proc. ICSLP, pp.2274-2277, Pittsburgh, U.S.A., Sept. 2006."},{"key":"8","unstructured":"[8] K. Oura, A. Mase, T. Yamada, S. Muto, Y. Nankaku, and K. Tokuda, \u201cRecent development of the HMM-based singing voice synthesis system-Sinsy,\u201d Proc. SSW7, pp.211-216, Kyoto, Japan, Sept. 2010."},{"key":"9","doi-asserted-by":"crossref","unstructured":"[9] M. Nishimura, K. Hashimoto, K. Oura, Y. Nankaku, and K. Tokuda, \u201cSinging voice synthesis based on deep neural networks,\u201d Proc. INTERSPEECH, pp.2478-2482, San Francisco, U.S.A., Sept. 2016. 10.21437\/interspeech.2016-1027","DOI":"10.21437\/Interspeech.2016-1027"},{"key":"10","doi-asserted-by":"publisher","unstructured":"[10] M. 
Blaauw and J. Bonada, \u201cA neural parametric singing synthesizer modeling timbre and expression from natural songs,\u201d Applied Sciences, vol.7, no.12, Dec. 2017. 10.3390\/app7121313","DOI":"10.3390\/app7121313"},{"key":"11","doi-asserted-by":"crossref","unstructured":"[11] R. Izhaki, Mixing Audio: Concepts, Practices, and Tools, Taylor &amp; Francis, Oct. 2017. 10.4324\/9781315716947","DOI":"10.4324\/9781315716947"},{"key":"12","unstructured":"[12] Y. Li, K. Swersky, and R. Zemel, \u201cGenerative moment matching networks,\u201d Proc. ICML, pp.1718-1727, Lille, France, July 2015."},{"key":"13","unstructured":"[13] Y. Ren, J. Li, Y. Luo, and J. Zhu, \u201cConditional generative moment-matching networks,\u201d Proc. NIPS, pp.2928-2936, Barcelona, Spain, Dec. 2016."},{"key":"14","unstructured":"[14] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, \u201cGenerative adversarial nets,\u201d Proc. NIPS, pp.2672-2680, Montreal, Canada, Dec. 2014."},{"key":"15","unstructured":"[15] D.P. Kingma and M. Welling, \u201cAuto-encoding variational Bayes,\u201d arXiv, vol.abs\/1312.6114, 2013."},{"key":"16","doi-asserted-by":"crossref","unstructured":"[16] Y. Saito, Y. Ijima, K. Nishida, and S. Takamichi, \u201cNon-parallel voice conversion using variational autoencoders conditioned by phonetic posteriorgrams and d-vectors,\u201d Proc. ICASSP, pp.5274-5278, Calgary, Canada, Apr. 2018. 10.1109\/icassp.2018.8461384","DOI":"10.1109\/ICASSP.2018.8461384"},{"key":"17","doi-asserted-by":"crossref","unstructured":"[17] S. Takamichi, T. Toda, A.W. Black, G. Neubig, S. Sakti, and S. Nakamura, \u201cPostfilters to modify the modulation spectrum for statistical parametric speech synthesis,\u201d IEEE Trans. Audio, Speech, Language Process., vol.24, no.4, pp.755-767, April 2016.","DOI":"10.1109\/TASLP.2016.2522655"},{"key":"18","doi-asserted-by":"crossref","unstructured":"[18] H. Tamaru, Y. Saito, S. Takamichi, T. Koriyama, and H. 
Saruwatari, \u201cGenerative moment matching network-based random modulation post-filter for DNN-based singing voice synthesis and neural double-tracking,\u201d Proc. ICASSP, pp.7070-7074, Brighton, U.K., May 2019. 10.1109\/icassp.2019.8683476","DOI":"10.1109\/ICASSP.2019.8683476"},{"key":"19","doi-asserted-by":"publisher","unstructured":"[19] K. Yu and S. Young, \u201cContinuous F0 modeling for HMM based statistical parametric speech synthesis,\u201d IEEE Trans. Audio Speech Lang. Process., vol.19, no.5, pp.1071-1079, July 2011. 10.1109\/tasl.2010.2076805","DOI":"10.1109\/TASL.2010.2076805"},{"key":"20","doi-asserted-by":"crossref","unstructured":"[20] L. Song, J. Huang, A. Smola, and K. Fukumizu, \u201cHilbert space embeddings of conditional distributions with applications to dynamical systems,\u201d Proc. ICML, pp.961-968, Montreal, Canada, June 2009. 10.1145\/1553374.1553497","DOI":"10.1145\/1553374.1553497"},{"key":"21","unstructured":"[21] \u201cHMM-based speech synthesis system (HTS),\u201d http:\/\/hts.sp.nitech.ac.jp\/."},{"key":"22","unstructured":"[22] \u201cJSUT-song,\u201d https:\/\/sites.google.com\/site\/shinnosuketakamichi\/publication\/jsut-song."},{"key":"23","doi-asserted-by":"publisher","unstructured":"[23] M. Morise, F. Yokomori, and K. Ozawa, \u201cWORLD: A vocoder-based high-quality speech synthesis system for real-time applications,\u201d IEICE Trans. Inf. &amp; Syst., vol.E99-D, no.7, pp.1877-1884, July 2016. 10.1587\/transinf.2015edp7457","DOI":"10.1587\/transinf.2015EDP7457"},{"key":"24","unstructured":"[24] Y.N. Dauphin, A. Fan, M. Auli, and D. Grangier, \u201cLanguage modeling with gated convolutional networks,\u201d arXiv, vol.abs\/1612.08083, 2016."},{"key":"25","doi-asserted-by":"publisher","unstructured":"[25] N. Hojo, Y. Ijima, and H. Mizuno, \u201cDNN-based speech synthesis using speaker codes,\u201d IEICE Trans. Inf. &amp; Syst., vol.E101-D, no.2, pp.462-472, 2018. 
10.1587\/transinf.2017edp7165","DOI":"10.1587\/transinf.2017EDP7165"},{"key":"26","doi-asserted-by":"publisher","unstructured":"[26] M. Morise, \u201cD4C, a band-aperiodicity estimator for high-quality speech synthesis,\u201d Speech Commun., vol.84, pp.57-65, 2016. 10.1016\/j.specom.2016.09.001","DOI":"10.1016\/j.specom.2016.09.001"},{"key":"27","unstructured":"[27] K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, \u201cSpeech parameter generation algorithms for HMM-based speech synthesis,\u201d Proc. ICASSP, Istanbul, Turkey, pp.1315-1318, June 2000. 10.1109\/icassp.2000.861820"},{"key":"28","unstructured":"[28] J. Duchi, E. Hazan, and Y. Singer, \u201cAdaptive subgradient methods for online learning and stochastic optimization,\u201d Journal of Machine Learning Research, vol.12, pp.2121-2159, July 2011."},{"key":"29","doi-asserted-by":"crossref","unstructured":"[29] T. Kaneko, S. Takaki, H. Kameoka, and J. Yamagishi, \u201cGenerative adversarial network-based postfilter for STFT spectrograms,\u201d Proc. INTERSPEECH, pp.3389-3393, Stockholm, Sweden, Aug. 2017. 10.21437\/interspeech.2017-962","DOI":"10.21437\/Interspeech.2017-962"},{"key":"30","unstructured":"[30] A. Rahimi and B. Recht, \u201cRandom features for large-scale kernel machines,\u201d Proc. NIPS, Vancouver, Canada, pp.1177-1184, Dec. 2008."},{"key":"31","unstructured":"[31] \u201cLancers,\u201d https:\/\/www.lancers.jp\/."},{"key":"32","unstructured":"[32] H. Kawahara, J. Estill, and O. Fujimura, \u201cAperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT,\u201d MAVEBA, pp.1-6, Firenze, Italy, Sept. 2001. 
10.21437\/ssw.2016-36"}],"container-title":["IEICE Transactions on Information and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E103.D\/3\/E103.D_2019EDP7228\/_pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,1]],"date-time":"2021-03-01T21:20:10Z","timestamp":1614633610000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E103.D\/3\/E103.D_2019EDP7228\/_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3,1]]},"references-count":32,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020]]}},"URL":"https:\/\/doi.org\/10.1587\/transinf.2019edp7228","relation":{},"ISSN":["0916-8532","1745-1361"],"issn-type":[{"value":"0916-8532","type":"print"},{"value":"1745-1361","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,3,1]]}}}