{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T13:09:56Z","timestamp":1753880996781,"version":"3.41.2"},"reference-count":18,"publisher":"World Scientific Pub Co Pte Ltd","issue":"01","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. As. Lang. Proc."],"published-print":{"date-parts":[[2023,3]]},"abstract":"<jats:p> With the development of deep learning, nonparallel voice conversion (VC) has achieved a significant progress recently. Automatic speech recognition (ASR) and text-to-speech (TTS) for leveraging knowledge are the two mainstream methods in VC research. In this paper, we demonstrate that the two bottleneck features (BNFs) in the above methods are complementary. ASR-BNFs are more robust especially in any-to-many tasks, but suffer from leakage of source speaker\u2019s timbre information; TTS-BNFs are less likely to reveal speaker\u2019s timbre information, but lack robustness. Therefore, a nonparallel any-to-many voice conversion model is proposed by combining ASR-BNFs and TTS-BNFs. The whole modules in the proposed model can be trained jointly without any pre-trained models. Experiments are conducted on a private multi-speaker TTS dataset. It is demonstrated that the proposed model achieves the best balance in speech quality, timbre similarity and robustness compared to baseline models. <\/jats:p>","DOI":"10.1142\/s271755452350011x","type":"journal-article","created":{"date-parts":[[2023,8,19]],"date-time":"2023-08-19T05:29:00Z","timestamp":1692422940000},"source":"Crossref","is-referenced-by-count":0,"title":["Disentangling Content Information by Combining ASR and TTS Bottleneck Features for Voice Conversion"],"prefix":"10.1142","volume":"33","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9868-306X","authenticated-orcid":false,"given":"Zeqing","family":"Zhao","sequence":"first","affiliation":[{"name":"AI Lab, Lenovo Research, Haidian District, Beijing 100094, P. R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4716-5622","authenticated-orcid":false,"given":"Sifan","family":"Ma","sequence":"additional","affiliation":[{"name":"AI Lab, Lenovo Research, Haidian District, Beijing 100094, P. R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6632-2131","authenticated-orcid":false,"given":"Yan","family":"Jia","sequence":"additional","affiliation":[{"name":"AI Lab, Lenovo Research, Haidian District, Beijing 100094, P. R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7288-3717","authenticated-orcid":false,"given":"Jingyu","family":"Hou","sequence":"additional","affiliation":[{"name":"AI Lab, Lenovo Research, Haidian District, Beijing 100094, P. R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-5570-6063","authenticated-orcid":false,"given":"Lin","family":"Yang","sequence":"additional","affiliation":[{"name":"AI Lab, Lenovo Research, Haidian District, Beijing 100094, P. R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9374-9699","authenticated-orcid":false,"given":"Junjie","family":"Wang","sequence":"additional","affiliation":[{"name":"AI Lab, Lenovo Research, Haidian District, Beijing 100094, P. R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"219","published-online":{"date-parts":[[2023,9,26]]},"reference":[{"key":"S271755452350011XBIB001","first-page":"5210","volume-title":"Proc. 36th Int. Conf. Machine Learning","author":"Qian K.","year":"2019"},{"key":"S271755452350011XBIB004","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8682897"},{"key":"S271755452350011XBIB006","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2021.3066047"},{"key":"S271755452350011XBIB007","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2016.7552917"},{"key":"S271755452350011XBIB008","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2019.2892235"},{"key":"S271755452350011XBIB009","doi-asserted-by":"publisher","DOI":"10.1109\/ISCSLP49672.2021.9362095"},{"key":"S271755452350011XBIB010","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP39728.2021.9414788"},{"key":"S271755452350011XBIB011","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP43922.2022.9747625"},{"key":"S271755452350011XBIB013","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP43922.2022.9747140"},{"key":"S271755452350011XBIB014","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP43922.2022.9746139"},{"key":"S271755452350011XBIB015","doi-asserted-by":"publisher","DOI":"10.1109\/ISCSLP57327.2022.10038075"},{"key":"S271755452350011XBIB016","first-page":"3165","volume-title":"Proc. 33rd Conf. Neural Information Processing Systems","volume":"32","author":"Ren Y.","year":"2019"},{"key":"S271755452350011XBIB017","first-page":"5998","volume-title":"Advances in Neural Information Processing Systems","volume":"30","author":"Vaswani A.","year":"2017"},{"key":"S271755452350011XBIB019","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8461368"},{"key":"S271755452350011XBIB022","first-page":"864","volume-title":"Proc. 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conf. (APSIPA ASC)","author":"Zhao Z.","year":"2021"},{"key":"S271755452350011XBIB023","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143891"},{"key":"S271755452350011XBIB026","first-page":"17022","volume-title":"Advances in Neural Information Processing Systems","volume":"33","author":"Kong J.","year":"2020"},{"issue":"11","key":"S271755452350011XBIB033","first-page":"2579","volume":"9","author":"Van der Maaten L.","year":"2008","journal-title":"J. Mach. Learn. Res."}],"container-title":["International Journal of Asian Language Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S271755452350011X","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,11]],"date-time":"2024-01-11T07:44:38Z","timestamp":1704959078000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S271755452350011X"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3]]},"references-count":18,"journal-issue":{"issue":"01","published-print":{"date-parts":[[2023,3]]}},"alternative-id":["10.1142\/S271755452350011X"],"URL":"https:\/\/doi.org\/10.1142\/s271755452350011x","relation":{},"ISSN":["2717-5545","2424-791X"],"issn-type":[{"type":"print","value":"2717-5545"},{"type":"electronic","value":"2424-791X"}],"subject":[],"published":{"date-parts":[[2023,3]]},"article-number":"2350011"}}