{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:11:04Z","timestamp":1750219864141,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":22,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,12,9]],"date-time":"2022-12-09T00:00:00Z","timestamp":1670544000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,12,9]]},"DOI":"10.1145\/3577530.3577575","type":"proceedings-article","created":{"date-parts":[[2023,3,30]],"date-time":"2023-03-30T22:13:24Z","timestamp":1680214404000},"page":"284-288","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["High Quality and Similarity One-Shot Voice Conversion Using End-to-End Model"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3116-1961","authenticated-orcid":false,"given":"Renmingyue","family":"Du","sequence":"first","affiliation":[{"name":"School of Advanced Technology, Xi'an Jiaotong-Liverpool University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5324-7360","authenticated-orcid":false,"given":"Jixun","family":"Yao","sequence":"additional","affiliation":[{"name":"School of Computer, Northwestern Polytechnical University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,3,30]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"crossref","unstructured":"Ju-chieh Chou, Cheng-chieh Yeh, and Hung-yi Lee. 2019. One-shot voice conversion by separating speaker and content representations with instance normalization. arXiv preprint arXiv:1904.05742(2019).","DOI":"10.21437\/Interspeech.2019-2663"},{"key":"e_1_3_2_1_2_1","volume-title":"Ecapa-tdnn: Emphasized channel attention, propagation and aggregation in tdnn based speaker verification. arXiv preprint arXiv:2005.07143(2020).","author":"Desplanques Brecht","year":"2020","unstructured":"Brecht Desplanques, Jenthe Thienpondt, and Kris Demuynck. 2020. Ecapa-tdnn: Emphasized channel attention, propagation and aggregation in tdnn based speaker verification. arXiv preprint arXiv:2005.07143(2020)."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"crossref","unstructured":"Daniel Garcia-Romero, David Snyder, Gregory Sell, Alan McCree, Daniel Povey, and Sanjeev Khudanpur. 2019. x-Vector DNN Refinement with Full-Length Recordings for Speaker Recognition. In Interspeech. 1493\u20131496.","DOI":"10.21437\/Interspeech.2019-2205"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2019.2917232"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.23919\/EUSIPCO.2018.8553236"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"crossref","unstructured":"Takuhiro Kaneko, Hirokazu Kameoka, Kaoru Hiramatsu, and Kunio Kashino. 2017. Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks. In Interspeech Vol.\u00a02017. 1283\u20131287.","DOI":"10.21437\/Interspeech.2017-970"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, and Nobukatsu Hojo. 2019. Stargan-vc2: Rethinking conditional methods for stargan-based voice conversion. arXiv preprint arXiv:1907.12279(2019).","DOI":"10.21437\/Interspeech.2019-2236"},{"key":"e_1_3_2_1_8_1","first-page":"17022","article-title":"Hifi-gan: Generative adversarial networks for efficient and high fidelity speech synthesis","volume":"33","author":"Kong Jungil","year":"2020","unstructured":"Jungil Kong, Jaehyeon Kim, and Jaekyoung Bae. 2020. Hifi-gan: Generative adversarial networks for efficient and high fidelity speech synthesis. Advances in Neural Information Processing Systems 33 (2020), 17022\u201317033.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_9_1","volume-title":"Melgan: Generative adversarial networks for conditional waveform synthesis. Advances in neural information processing systems","author":"Kumar Kundan","year":"2019","unstructured":"Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei\u00a0Zhen Teoh, Jose Sotelo, Alexandre de Br\u00e9bisson, Yoshua Bengio, and Aaron\u00a0C Courville. 2019. Melgan: Generative adversarial networks for conditional waveform synthesis. Advances in neural information processing systems (2019)."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP43922.2022.9747020"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/APSIPAASC47483.2019.9023141"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9054734"},{"key":"e_1_3_2_1_13_1","volume-title":"International Conference on Machine Learning. PMLR, 7836\u20137846","author":"Qian Kaizhi","year":"2020","unstructured":"Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson, and David Cox. 2020. Unsupervised speech decomposition via triple information bottleneck. In International Conference on Machine Learning. PMLR, 7836\u20137846."},{"key":"e_1_3_2_1_14_1","volume-title":"International Conference on Machine Learning. PMLR, 5210\u20135219","author":"Qian Kaizhi","year":"2019","unstructured":"Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, and Mark Hasegawa-Johnson. 2019. Autovc: Zero-shot voice style transfer with only autoencoder loss. In International Conference on Machine Learning. PMLR, 5210\u20135219."},{"key":"e_1_3_2_1_15_1","volume-title":"Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion. Advances in Neural Information Processing Systems 32","author":"Serr\u00e0 Joan","year":"2019","unstructured":"Joan Serr\u00e0, Santiago Pascual, and Carlos Segura\u00a0Perales. 2019. Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion. Advances in Neural Information Processing Systems 32 (2019)."},{"key":"e_1_3_2_1_16_1","volume-title":"An overview of voice conversion and its challenges: From statistical modeling to deep learning","author":"Sisman Berrak","year":"2020","unstructured":"Berrak Sisman, Junichi Yamagishi, Simon King, and Haizhou Li. 2020. An overview of voice conversion and its challenges: From statistical modeling to deep learning. IEEE\/ACM Transactions on Audio, Speech, and Language Processing (2020), 132\u2013157."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8462665"},{"key":"e_1_3_2_1_18_1","volume-title":"Vqmivc: Vector quantization and mutual information-based unsupervised speech representation disentanglement for one-shot voice conversion. arXiv preprint arXiv:2106.10132(2021).","author":"Wang Disong","year":"2021","unstructured":"Disong Wang, Liqun Deng, Yu\u00a0Ting Yeung, Xiao Chen, Xunying Liu, and Helen Meng. 2021. Vqmivc: Vector quantization and mutual information-based unsupervised speech representation disentanglement for one-shot voice conversion. arXiv preprint arXiv:2106.10132(2021)."},{"key":"e_1_3_2_1_19_1","unstructured":"Da-Yi Wu, Yen-Hao Chen, and Hung-Yi Lee. 2020. Vqvc+: One-shot voice conversion by vector quantization and u-net architecture. arXiv preprint arXiv:2006.04154(2020)."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9053854"},{"key":"e_1_3_2_1_21_1","volume-title":"Wenet: Production oriented streaming and non-streaming end-to-end speech recognition toolkit. arXiv preprint arXiv:2102.01547(2021).","author":"Yao Zhuoyuan","year":"2021","unstructured":"Zhuoyuan Yao, Di Wu, Xiong Wang, Binbin Zhang, Fan Yu, Chao Yang, Zhendong Peng, Xiaoyu Chen, Lei Xie, and Xin Lei. 2021. Wenet: Production oriented streaming and non-streaming end-to-end speech recognition toolkit. arXiv preprint arXiv:2102.01547(2021)."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP43922.2022.9746682"}],"event":{"name":"CSAI 2022: 2022 6th International Conference on Computer Science and Artificial Intelligence","acronym":"CSAI 2022","location":"Beijing China"},"container-title":["Proceedings of the 2022 6th International Conference on Computer Science and Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3577530.3577575","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3577530.3577575","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:47:35Z","timestamp":1750178855000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3577530.3577575"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,9]]},"references-count":22,"alternative-id":["10.1145\/3577530.3577575","10.1145\/3577530"],"URL":"https:\/\/doi.org\/10.1145\/3577530.3577575","relation":{},"subject":[],"published":{"date-parts":[[2022,12,9]]},"assertion":[{"value":"2023-03-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}