{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,2]],"date-time":"2026-01-02T07:28:06Z","timestamp":1767338886910,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":21,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,10,18]],"date-time":"2021-10-18T00:00:00Z","timestamp":1634515200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,18]]},"DOI":"10.1145\/3461615.3491106","type":"proceedings-article","created":{"date-parts":[[2021,12,18]],"date-time":"2021-12-18T04:57:40Z","timestamp":1639803460000},"page":"75-79","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Improving Model Stability and Training Efficiency in Fast, High Quality Expressive Voice Conversion System"],"prefix":"10.1145","author":[{"given":"Zhiyuan","family":"Zhao","sequence":"first","affiliation":[{"name":"UBTECH Robotics Corp, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jingjun","family":"Liang","sequence":"additional","affiliation":[{"name":"UBTECH Robotics Corp, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zehong","family":"Zheng","sequence":"additional","affiliation":[{"name":"UBTECH Robotics Corp, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Linhuang","family":"Yan","sequence":"additional","affiliation":[{"name":"UBTECH Robotics Corp, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhiyong","family":"Yang","sequence":"additional","affiliation":[{"name":"UBTECH Robotics Corp, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wan","family":"Ding","sequence":"additional","affiliation":[{"name":"UBTECH Robotics Corp, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dongyan","family":"Huang","sequence":"additional","affiliation":[{"name":"UBTECH Robotics Corp, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,12,17]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"[n. d.]. Computing Error Rates. ([n. d.]). https:\/\/sites.google.com\/site\/textdigitisation\/qualitymeasures\/computingerrorrates  [n. d.]. Computing Error Rates. ([n. d.]). https:\/\/sites.google.com\/site\/textdigitisation\/qualitymeasures\/computingerrorrates"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5923\/j.ajsp.20120205.06"},{"key":"e_1_3_2_1_3_1","volume-title":"Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation.","author":"Biadsy Fadi","year":"2019","unstructured":"Fadi Biadsy , Ron\u00a0 J Weiss , Pedro\u00a0 J Moreno , Dimitri Kanvesky , and Ye Jia . 2019 . Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation. (2019). Fadi Biadsy, Ron\u00a0J Weiss, Pedro\u00a0J Moreno, Dimitri Kanvesky, and Ye Jia. 2019. Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation. (2019)."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2014.2353991"},{"volume-title":"Mapping frames with DNN-HMM recognizer for non-parallel voice conversion. In 2015 APSIPA","author":"Dong Minghui","key":"e_1_3_2_1_5_1","unstructured":"Minghui Dong , Chenyu Yang , Yanfeng Lu , Jochen\u00a0Walter Ehnes , Dongyan Huang , Huaiping Ming , Rong Tong , Siu\u00a0Wa Lee , and Haizhou Li. 2015. Mapping frames with DNN-HMM recognizer for non-parallel voice conversion. In 2015 APSIPA . IEEE , 488\u2013494. Minghui Dong, Chenyu Yang, Yanfeng Lu, Jochen\u00a0Walter Ehnes, Dongyan Huang, Huaiping Ming, Rong Tong, Siu\u00a0Wa Lee, and Haizhou Li. 2015. Mapping frames with DNN-HMM recognizer for non-parallel voice conversion. In 2015 APSIPA. IEEE, 488\u2013494."},{"key":"e_1_3_2_1_6_1","unstructured":"et\u00a0al. Junyi\u00a0Sun. 2013. \u201dJieba\u201d: Chinese text segmentation. (2013). https:\/\/github.com\/fxsjy\/jieba  et\u00a0al. Junyi\u00a0Sun. 2013. \u201dJieba\u201d: Chinese text segmentation. (2013). https:\/\/github.com\/fxsjy\/jieba"},{"key":"e_1_3_2_1_7_1","volume-title":"exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds. Acoustical science and technology 27, 6","author":"Kawahara Hideki","year":"2006","unstructured":"Hideki Kawahara . 2006. STRAIGHT , exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds. Acoustical science and technology 27, 6 ( 2006 ), 349\u2013353. Hideki Kawahara. 2006. STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds. Acoustical science and technology 27, 6 (2006), 349\u2013353."},{"volume-title":"Joint CTC-attention based end-to-end speech recognition using multi-task learning. In 2017 ICASSP","author":"Kim Suyoun","key":"e_1_3_2_1_8_1","unstructured":"Suyoun Kim , Takaaki Hori , and Shinji Watanabe . 2017. Joint CTC-attention based end-to-end speech recognition using multi-task learning. In 2017 ICASSP . IEEE , 4835\u20134839. Suyoun Kim, Takaaki Hori, and Shinji Watanabe. 2017. Joint CTC-attention based end-to-end speech recognition using multi-task learning. In 2017 ICASSP. IEEE, 4835\u20134839."},{"key":"e_1_3_2_1_9_1","unstructured":"Jungil Kong Jaehyeon Kim and Jaekyoung Bae. 2020. HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis. arXiv preprint arXiv:2010.05646(2020).  Jungil Kong Jaehyeon Kim and Jaekyoung Bae. 2020. HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis. arXiv preprint arXiv:2010.05646(2020)."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Javier Latorre Jakub Lachowicz Jaime Lorenzo-Trueba Thomas Merritt Thomas Drugman Srikanth Ronanki and Klimkov Viacheslav. 2018. Effect of data reduction on sequence-to-sequence neural TTS.  Javier Latorre Jakub Lachowicz Jaime Lorenzo-Trueba Thomas Merritt Thomas Drugman Srikanth Ronanki and Klimkov Viacheslav. 2018. Effect of data reduction on sequence-to-sequence neural TTS.","DOI":"10.1109\/ICASSP.2019.8682168"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33016706"},{"key":"e_1_3_2_1_12_1","unstructured":"Peng Liu Xixin Wu Shiyin Kang Guangzhi Li Dan Su and Dong Yu. 2019. Maximizing mutual information for tacotron. arXiv preprint arXiv:1909.01145(2019).  Peng Liu Xixin Wu Shiyin Kang Guangzhi Li Dan Su and Dong Yu. 2019. Maximizing mutual information for tacotron. arXiv preprint arXiv:1909.01145(2019)."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1587\/transinf.2015EDP7457"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/WASPAA.2013.6701851"},{"key":"e_1_3_2_1_15_1","unstructured":"ST-CMDS-20170001_1. 2017. Free ST Chinese Mandarin Corpus. (2017). https:\/\/openslr.org\/38\/  ST-CMDS-20170001_1. 2017. Free ST Chinese Mandarin Corpus. (2017). https:\/\/openslr.org\/38\/"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Lifa Sun Shiyin Kang Kun Li and Helen Meng. 2015. Voice conversion using deep bidirectional long short-term memory based recurrent neural networks. (2015) 4869\u20134873.  Lifa Sun Shiyin Kang Kun Li and Helen Meng. 2015. Voice conversion using deep bidirectional long short-term memory based recurrent neural networks. (2015) 4869\u20134873.","DOI":"10.1109\/ICASSP.2015.7178896"},{"volume-title":"Generalized end-to-end loss for speaker verification. In 2018 ICASSP","author":"Wan Li","key":"e_1_3_2_1_18_1","unstructured":"Li Wan , Quan Wang , Alan Papir , and Ignacio\u00a0Lopez Moreno . 2018. Generalized end-to-end loss for speaker verification. In 2018 ICASSP . IEEE , 4879\u20134883. Li Wan, Quan Wang, Alan Papir, and Ignacio\u00a0Lopez Moreno. 2018. Generalized end-to-end loss for speaker verification. In 2018 ICASSP. IEEE, 4879\u20134883."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2014.2333242"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.21437\/VCC_BC.2020-14"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"crossref","unstructured":"Yu Zhang Ron\u00a0J Weiss Heiga Zen Yonghui Wu Zhifeng Chen RJ Skerry-Ryan Ye Jia Andrew Rosenberg and Bhuvana Ramabhadran. 2019. Learning to speak fluently in a foreign language: Multilingual speech synthesis and cross-language voice cloning. arXiv preprint arXiv:1907.04448(2019).  Yu Zhang Ron\u00a0J Weiss Heiga Zen Yonghui Wu Zhifeng Chen RJ Skerry-Ryan Ye Jia Andrew Rosenberg and Bhuvana Ramabhadran. 2019. Learning to speak fluently in a foreign language: Multilingual speech synthesis and cross-language voice cloning. arXiv preprint arXiv:1907.04448(2019).","DOI":"10.21437\/Interspeech.2019-2668"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.21437\/VCC_BC.2020-1"}],"event":{"name":"ICMI '21: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION","sponsor":["SIGCHI ACM Special Interest Group on Computer-Human Interaction"],"location":"Montreal QC Canada","acronym":"ICMI '21"},"container-title":["Companion Publication of the 2021 International Conference on Multimodal Interaction"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3461615.3491106","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3461615.3491106","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:49:04Z","timestamp":1750193344000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3461615.3491106"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,18]]},"references-count":21,"alternative-id":["10.1145\/3461615.3491106","10.1145\/3461615"],"URL":"https:\/\/doi.org\/10.1145\/3461615.3491106","relation":{},"subject":[],"published":{"date-parts":[[2021,10,18]]},"assertion":[{"value":"2021-12-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}