{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,3]],"date-time":"2025-08-03T01:11:52Z","timestamp":1754183512477,"version":"3.41.2"},"reference-count":32,"publisher":"Institute of Electronics, Information and Communications Engineers (IEICE)","issue":"8","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IEICE Trans. Inf. & Syst."],"published-print":{"date-parts":[[2025,8,1]]},"DOI":"10.1587\/transinf.2024edp7206","type":"journal-article","created":{"date-parts":[[2025,2,24]],"date-time":"2025-02-24T17:11:36Z","timestamp":1740417096000},"page":"1001-1010","source":"Crossref","is-referenced-by-count":0,"title":["Fast and Lightweight Non-Parallel Voice Conversion Based on Free-Energy Minimization of Speaker-Conditional Restricted Boltzmann Machine"],"prefix":"10.1587","volume":"E108.D","author":[{"given":"Takuya","family":"KISHIDA","sequence":"first","affiliation":[{"name":"Faculty of Human Informatics, Aichi Shukutoku University"}]},{"given":"Toru","family":"NAKASHIKA","sequence":"additional","affiliation":[{"name":"Graduate School of Informatics and Engineering, The University of Electro-Communications"}]}],"member":"532","reference":[{"key":"1","unstructured":"[1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, \u201cGenerative adversarial nets,\u201d Advances in neural information processing systems, pp.2672-2680, 2014."},{"key":"2","doi-asserted-by":"crossref","unstructured":"[2] T. Kaneko and H. Kameoka, \u201cParallel-data-free voice conversion using cycle-consistent adversarial networks,\u201d arXiv preprint arXiv:1711.11293, 2017.","DOI":"10.23919\/EUSIPCO.2018.8553236"},{"key":"3","doi-asserted-by":"crossref","unstructured":"[3] T. Kaneko, H. Kameoka, K. Tanaka, and N. 
Hojo, \u201cCycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion,\u201d ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.6820-6824, IEEE, 2019. 10.1109\/icassp.2019.8682897","DOI":"10.1109\/ICASSP.2019.8682897"},{"key":"4","doi-asserted-by":"crossref","unstructured":"[4] H. Kameoka, T. Kaneko, K. Tanaka, and N. Hojo, \u201cStarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks,\u201d arXiv preprint arXiv:1806.02169, 2018.","DOI":"10.1109\/SLT.2018.8639535"},{"key":"5","doi-asserted-by":"crossref","unstructured":"[5] D.-Y. Wu and H.-y. Lee, \u201cOne-shot voice conversion by vector quantization,\u201d ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.7734-7738, IEEE, 2020. 10.1109\/icassp40776.2020.9053854","DOI":"10.1109\/ICASSP40776.2020.9053854"},{"key":"6","unstructured":"[6] D.-Y. Wu, Y.-H. Chen, and H.-y. Lee, \u201cVQVC+: One-shot voice conversion by vector quantization and U-net architecture,\u201d arXiv preprint arXiv:2006.04154, 2020."},{"key":"7","unstructured":"[7] A. Van Den Oord, O. Vinyals, et al., \u201cNeural discrete representation learning,\u201d Advances in neural information processing systems, vol.30, 2017."},{"key":"8","unstructured":"[8] C. Peterson and J.R. Anderson, \u201cA mean field theory learning algorithm for neural networks,\u201d Complex Systems, vol.1, pp.995-1019, 1987."},{"key":"9","doi-asserted-by":"crossref","unstructured":"[9] J.-Y. Zhu, T. Park, P. Isola, and A.A. Efros, \u201cUnpaired image-to-image translation using cycle-consistent adversarial networks,\u201d Proc. IEEE international conference on computer vision, pp.2242-2251, 2017. 10.1109\/iccv.2017.244","DOI":"10.1109\/ICCV.2017.244"},{"key":"10","doi-asserted-by":"crossref","unstructured":"[10] Y. Choi, M. Choi, M. Kim, J.-W. Ha, S. Kim, and J. 
Choo, \u201cStarGAN: Unified generative adversarial networks for multi-domain image-to-image translation,\u201d Proc. IEEE conference on computer vision and pattern recognition, pp.8789-8797, 2018. 10.1109\/cvpr.2018.00916","DOI":"10.1109\/CVPR.2018.00916"},{"key":"11","unstructured":"[11] D.P. Kingma and M. Welling, \u201cAuto-encoding variational bayes,\u201d arXiv preprint arXiv:1312.6114, 2013."},{"key":"12","unstructured":"[12] K. Qian, Y. Zhang, S. Chang, X. Yang, and M. Hasegawa-Johnson, \u201cAutoVC: Zero-shot voice style transfer with only autoencoder loss,\u201d International Conference on Machine Learning, pp.5210-5219, PMLR, 2019."},{"key":"13","unstructured":"[13] A. Razavi, A. Van den Oord, and O. Vinyals, \u201cGenerating diverse high-fidelity images with VQ-VAE-2,\u201d Advances in neural information processing systems, vol.32, 2019."},{"key":"14","doi-asserted-by":"crossref","unstructured":"[14] M. Proszewska, G. Beringer, D. S\u00e1ez-Trigueros, T. Merritt, A. Ezzerg, and R. Barra-Chicote, \u201cGlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion,\u201d Proceedings of Interspeech 2022, pp.2973-2977, 2022. 10.21437\/interspeech.2022-322","DOI":"10.21437\/Interspeech.2022-322"},{"key":"15","unstructured":"[15] D.P. Kingma and P. Dhariwal, \u201cGlow: generative flow with invertible 1\u00d71 convolutions,\u201d Proc. 32nd International Conference on Neural Information Processing Systems, pp.10236-10245, 2018."},{"key":"16","doi-asserted-by":"publisher","unstructured":"[16] H. Kameoka, T. Kaneko, K. Tanaka, N. Hojo, and S. Seki, \u201cVoicegrad: Non-parallel any-to-many voice conversion with annealed langevin dynamics,\u201d IEEE\/ACM Trans. Audio, Speech, Language Process., vol.32, pp.2213-2226, 2024. 10.1109\/taslp.2024.3379901","DOI":"10.1109\/TASLP.2024.3379901"},{"key":"17","unstructured":"[17] Y. Song and S. 
Ermon, \u201cGenerative modeling by estimating gradients of the data distribution,\u201d Advances in neural information processing systems, pp.11918-11930, 2019."},{"key":"18","unstructured":"[18] Y. Song and S. Ermon, \u201cImproved techniques for training score-based generative models,\u201d arXiv preprint arXiv:2006.09011, 2020."},{"key":"19","unstructured":"[19] V. Popov, I. Vovk, V. Gogoryan, T. Sadekova, M. Kudinov, and J. Wei, \u201cDiffusion-based voice conversion with fast maximum likelihood sampling scheme,\u201d arXiv preprint arXiv:2109.13821, 2021."},{"key":"20","unstructured":"[20] J. Ho, A. Jain, and P. Abbeel, \u201cDenoising diffusion probabilistic models,\u201d Advances in neural information processing systems, vol.33, pp.6840-6851, 2020."},{"key":"21","doi-asserted-by":"publisher","unstructured":"[21] T. Nakashika, T. Takiguchi, and Y. Minami, \u201cNon-parallel training in voice conversion using an adaptive restricted Boltzmann machine,\u201d IEEE\/ACM Transactions on Audio, Speech and Language Processing (IEEE\/ACM Trans. Audio, Speech, Language Process.), vol.24, no.11, pp.2032-2045, 2016. 10.1109\/taslp.2016.2593263","DOI":"10.1109\/TASLP.2016.2593263"},{"key":"22","doi-asserted-by":"publisher","unstructured":"[22] T. Kishida and T. Nakashika, \u201cSpeech Chain VC: Linking linguistic and acoustic levels via latent distinctive features for RBM-based voice conversion,\u201d IEICE Trans. Inf. & Syst., vol.E103-D, no.11, pp.2340-2350, 2020. 10.1587\/transinf.2020edp7032","DOI":"10.1587\/transinf.2020EDP7032"},{"key":"23","doi-asserted-by":"crossref","unstructured":"[23] G.E. Hinton, S. Osindero, and Y.-W. Teh, \u201cA fast learning algorithm for deep belief nets,\u201d Neural computation, vol.18, no.7, pp.1527-1554, 2006. 10.1162\/neco.2006.18.7.1527","DOI":"10.1162\/neco.2006.18.7.1527"},{"key":"24","doi-asserted-by":"crossref","unstructured":"[24] K.H. Cho, A. Ilin, and T. 
Raiko, \u201cImproved learning of Gaussian-Bernoulli restricted Boltzmann machines,\u201d International conference on artificial neural networks, pp.10-17, Springer, 2011. 10.1007\/978-3-642-21735-7_2","DOI":"10.1007\/978-3-642-21735-7_2"},{"key":"25","unstructured":"[25] M.A. Carreira-Perpinan and G. Hinton, \u201cOn contrastive divergence learning,\u201d International workshop on artificial intelligence and statistics, pp.33-40, PMLR, 2005."},{"key":"26","unstructured":"[26] S. Takamichi, K. Mitsui, Y. Saito, T. Koriyama, N. Tanji, and H. Saruwatari, \u201cJVS corpus: free Japanese multi-speaker voice corpus,\u201d arXiv preprint arXiv:1908.06248, 2019."},{"key":"27","unstructured":"[27] J. Kominek and A.W. Black, \u201cThe CMU Arctic speech databases,\u201d Fifth ISCA workshop on speech synthesis, 2004."},{"key":"28","doi-asserted-by":"publisher","unstructured":"[28] M. Morise, F. Yokomori, and K. Ozawa, \u201cWORLD: A vocoder-based high-quality speech synthesis system for real-time applications,\u201d IEICE Trans. Inf. & Syst., vol.E99-D, no.7, pp.1877-1884, 2016. 10.1587\/transinf.2015edp7457","DOI":"10.1587\/transinf.2015EDP7457"},{"key":"29","unstructured":"[29] D.P. Kingma and J. Ba, \u201cAdam: A method for stochastic optimization,\u201d arXiv preprint arXiv:1412.6980, 2014."},{"key":"30","doi-asserted-by":"crossref","unstructured":"[30] G.E. Peterson and H.L. Barney, \u201cControl methods used in a study of the vowels,\u201d The Journal of the acoustical society of America, vol.24, no.2, pp.175-184, 1952. 10.1121\/1.1906875","DOI":"10.1121\/1.1906875"},{"key":"31","unstructured":"[31] T. Hirahara and R. Akahane-Yamada, \u201cAcoustic characteristics of Japanese vowels,\u201d Proc. 18th ICA, pp.3387-3290, 2004."},{"key":"32","unstructured":"[32] A. Radford, J.W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. 
Sutskever, \u201cRobust speech recognition via large-scale weak supervision,\u201d International conference on machine learning, pp.28492-28518, PMLR, 2023."}],"container-title":["IEICE Transactions on Information and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E108.D\/8\/E108.D_2024EDP7206\/_pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T03:29:15Z","timestamp":1754105355000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.jstage.jst.go.jp\/article\/transinf\/E108.D\/8\/E108.D_2024EDP7206\/_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,1]]},"references-count":32,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2025]]}},"URL":"https:\/\/doi.org\/10.1587\/transinf.2024edp7206","relation":{},"ISSN":["0916-8532","1745-1361"],"issn-type":[{"type":"print","value":"0916-8532"},{"type":"electronic","value":"1745-1361"}],"subject":[],"published":{"date-parts":[[2025,8,1]]},"article-number":"2024EDP7206"}}