{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,3]],"date-time":"2026-07-03T21:00:08Z","timestamp":1783112408828,"version":"3.54.6"},"publisher-location":"New York, NY, USA","reference-count":37,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T00:00:00Z","timestamp":1665360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Key R&D Program of China","award":["No.62072397"],"award-info":[{"award-number":["No.62072397"]}]},{"name":"National Key R&D Program of China","award":["No.61836002"],"award-info":[{"award-number":["No.61836002"]}]},{"name":"Zhejiang Natural Science Foundation","award":["LR19F020006"],"award-info":[{"award-number":["LR19F020006"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,10]]},"DOI":"10.1145\/3503161.3547854","type":"proceedings-article","created":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T15:42:35Z","timestamp":1665416555000},"page":"2525-2535","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":26,"title":["SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation"],"prefix":"10.1145","author":[{"given":"Rongjie","family":"Huang","sequence":"first","affiliation":[{"name":"Zhejiang University, HangZhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chenye","family":"Cui","sequence":"additional","affiliation":[{"name":"Zhejiang University, HangZhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"FeiYang","family":"Chen","sequence":"additional","affiliation":[{"name":"Huawei Cloud, HangZhou, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yi","family":"Ren","sequence":"additional","affiliation":[{"name":"Zhejiang University, HangZhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jinglin","family":"Liu","sequence":"additional","affiliation":[{"name":"Zhejiang University, HangZhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zhou","family":"Zhao","sequence":"additional","affiliation":[{"name":"Zhejiang University, HangZhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Baoxing","family":"Huai","sequence":"additional","affiliation":[{"name":"Huawei Cloud, HangZhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zhefeng","family":"Wang","sequence":"additional","affiliation":[{"name":"Huawei Cloud, HangZhou, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Hi-FiSinger: Towards High-Fidelity Neural Singing Voice Synthesis. arXiv preprint arXiv:2009.01776","author":"Chen Jiawei","year":"2020","unstructured":"Jiawei Chen , Xu Tan , Jian Luan , Tao Qin , and Tie-Yan Liu . 2020. Hi-FiSinger: Towards High-Fidelity Neural Singing Voice Synthesis. arXiv preprint arXiv:2009.01776 ( 2020 ). Jiawei Chen, Xu Tan, Jian Luan, Tao Qin, and Tie-Yan Liu. 2020. Hi-FiSinger: Towards High-Fidelity Neural Singing Voice Synthesis. arXiv preprint arXiv:2009.01776 (2020)."},{"key":"e_1_3_2_2_2_1","volume-title":"Adaspeech: Adaptive text to speech for custom voice. arXiv preprint arXiv:2103.00993","author":"Chen Mingjian","year":"2021","unstructured":"Mingjian Chen , Xu Tan , Bohan Li , Yanqing Liu , Tao Qin , Sheng Zhao , and Tie-Yan Liu . 2021 . Adaspeech: Adaptive text to speech for custom voice. arXiv preprint arXiv:2103.00993 (2021). Mingjian Chen, Xu Tan, Bohan Li, Yanqing Liu, Tao Qin, Sheng Zhao, and Tie-Yan Liu. 2021. 
Adaspeech: Adaptive text to speech for custom voice. arXiv preprint arXiv:2103.00993 (2021)."},{"key":"e_1_3_2_2_3_1","unstructured":"Nanxin Chen Yu Zhang Heiga Zen Ron J Weiss Mohammad Norouzi and William Chan. 2020. WaveGrad: Estimating Gradients for Waveform Generation. (2020).  Nanxin Chen Yu Zhang Heiga Zen Ron J Weiss Mohammad Norouzi and William Chan. 2020. WaveGrad: Estimating Gradients for Waveform Generation. (2020)."},{"key":"e_1_3_2_2_4_1","volume-title":"EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model. arXiv preprint arXiv:2106.09317","author":"Cui Chenye","year":"2021","unstructured":"Chenye Cui , Yi Ren , Jinglin Liu , Feiyang Chen , Rongjie Huang , Ming Lei , and Zhou Zhao . 2021 . EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model. arXiv preprint arXiv:2106.09317 (2021). Chenye Cui, Yi Ren, Jinglin Liu, Feiyang Chen, Rongjie Huang, Ming Lei, and Zhou Zhao. 2021. EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model. arXiv preprint arXiv:2106.09317 (2021)."},{"key":"e_1_3_2_2_5_1","unstructured":"Ian J. Goodfellow Jean Pouget-Abadie Mehdi Mirza Bing Xu David Warde-Farley Sherjil Ozair Aaron Courville and Yoshua Bengio. 2014. Generative Adversarial Networks. (2014). arXiv:1406.2661 [stat.ML]  Ian J. Goodfellow Jean Pouget-Abadie Mehdi Mirza Bing Xu David Warde-Farley Sherjil Ozair Aaron Courville and Yoshua Bengio. 2014. Generative Adversarial Networks. (2014). arXiv:1406.2661 [stat.ML]"},{"key":"e_1_3_2_2_6_1","volume-title":"ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders. arXiv preprint arXiv:2004.11012","author":"Gu Yu","year":"2020","unstructured":"Yu Gu , Xiang Yin , Yonghui Rao , Yuan Wan , Benlai Tang , Yang Zhang , Jitong Chen , Yuxuan Wang , and Zejun Ma. 2020. 
ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders. arXiv preprint arXiv:2004.11012 ( 2020 ). Yu Gu, Xiang Yin, Yonghui Rao, Yuan Wan, Benlai Tang, Yang Zhang, Jitong Chen, Yuxuan Wang, and Zejun Ma. 2020. ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders. arXiv preprint arXiv:2004.11012 (2020)."},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"crossref","unstructured":"Rongjie Huang Feiyang Chen Yi Ren Jinglin Liu Chenye Cui and Zhou Zhao. 2021. Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large Scale Corpus. (2021).  Rongjie Huang Feiyang Chen Yi Ren Jinglin Liu Chenye Cui and Zhou Zhao. 2021. Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large Scale Corpus. (2021).","DOI":"10.1145\/3474085.3475437"},{"key":"e_1_3_2_2_8_1","volume-title":"Jun Wang, Dan Su, Dong Yu, Yi Ren, and Zhou Zhao.","author":"Huang Rongjie","year":"2022","unstructured":"Rongjie Huang , Max WY Lam , Jun Wang, Dan Su, Dong Yu, Yi Ren, and Zhou Zhao. 2022 . FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis . arXiv preprint arXiv:2204.09934 (2022). Rongjie Huang, Max WY Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, and Zhou Zhao. 2022. FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis. arXiv preprint arXiv:2204.09934 (2022)."},{"key":"e_1_3_2_2_9_1","volume-title":"Gener-Speech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Synthesis. arXiv preprint arXiv:2205.07211","author":"Huang Rongjie","year":"2022","unstructured":"Rongjie Huang , Yi Ren , Jinglin Liu , Chenye Cui , and Zhou Zhao . 2022. Gener-Speech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Synthesis. arXiv preprint arXiv:2205.07211 ( 2022 ). Rongjie Huang, Yi Ren, Jinglin Liu, Chenye Cui, and Zhou Zhao. 2022. 
Gener-Speech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Synthesis. arXiv preprint arXiv:2205.07211 (2022)."},{"key":"e_1_3_2_2_10_1","volume-title":"TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation. arXiv preprint arXiv:2205.12523","author":"Huang Rongjie","year":"2022","unstructured":"Rongjie Huang , Zhou Zhao , Jinglin Liu , Huadai Liu , Yi Ren , Lichao Zhang , and Jinzheng He. 2022. TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation. arXiv preprint arXiv:2205.12523 ( 2022 ). Rongjie Huang, Zhou Zhao, Jinglin Liu, Huadai Liu, Yi Ren, Lichao Zhang, and Jinzheng He. 2022. TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation. arXiv preprint arXiv:2205.12523 (2022)."},{"key":"e_1_3_2_2_11_1","volume-title":"Efficient neural audio synthesis. arXiv preprint arXiv:1802.08435","author":"Kalchbrenner Nal","year":"2018","unstructured":"Nal Kalchbrenner , Erich Elsen , Karen Simonyan , Seb Noury , Norman Casagrande , Edward Lockhart , Florian Stimberg , Aaron van den Oord , Sander Dieleman , and Koray Kavukcuoglu . 2018. Efficient neural audio synthesis. arXiv preprint arXiv:1802.08435 ( 2018 ). Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Florian Stimberg, Aaron van den Oord, Sander Dieleman, and Koray Kavukcuoglu. 2018. Efficient neural audio synthesis. arXiv preprint arXiv:1802.08435 (2018)."},{"key":"e_1_3_2_2_12_1","unstructured":"Jaehyeon Kim Sungwon Kim Jungil Kong and Sungroh Yoon. 2020. Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search. (2020). arXiv:2005.11129 [eess.AS]  Jaehyeon Kim Sungwon Kim Jungil Kong and Sungroh Yoon. 2020. Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search. (2020). arXiv:2005.11129 [eess.AS]"},{"key":"e_1_3_2_2_13_1","volume-title":"HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis. 
arXiv preprint arXiv:2010.05646","author":"Kong Jungil","year":"2020","unstructured":"Jungil Kong , Jaehyeon Kim , and Jaekyoung Bae . 2020. HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis. arXiv preprint arXiv:2010.05646 ( 2020 ). Jungil Kong, Jaehyeon Kim, and Jaekyoung Bae. 2020. HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis. arXiv preprint arXiv:2010.05646 (2020)."},{"key":"e_1_3_2_2_14_1","unstructured":"Zhifeng Kong Wei Ping Jiaji Huang Kexin Zhao and Bryan Catanzaro. 2020. DiffWave: A Versatile Diffusion Model for Audio Synthesis. (2020).  Zhifeng Kong Wei Ping Jiaji Huang Kexin Zhao and Bryan Catanzaro. 2020. DiffWave: A Versatile Diffusion Model for Audio Synthesis. (2020)."},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACRIM.1993.407206"},{"key":"e_1_3_2_2_16_1","volume-title":"Jose Sotelo, Alexandre de Br\u00e9bisson, Yoshua Bengio, and Aaron C Courville.","author":"Kumar Kundan","year":"2019","unstructured":"Kundan Kumar , Rithesh Kumar , Thibault de Boissiere , Lucas Gestin , Wei Zhen Teoh , Jose Sotelo, Alexandre de Br\u00e9bisson, Yoshua Bengio, and Aaron C Courville. 2019 . Melgan : Generative adversarial networks for conditional waveform syn- thesis. (2019), 14910--14921. Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Br\u00e9bisson, Yoshua Bengio, and Aaron C Courville. 2019. Melgan: Generative adversarial networks for conditional waveform syn- thesis. (2019), 14910--14921."},{"key":"e_1_3_2_2_17_1","volume-title":"Neural speech synthesis with transformer network. 33, 01","author":"Li Naihan","year":"2019","unstructured":"Naihan Li , Shujie Liu , Yanqing Liu , Sheng Zhao , and Ming Liu . 2019. Neural speech synthesis with transformer network. 33, 01 ( 2019 ), 6706--6713. Naihan Li, Shujie Liu, Yanqing Liu, Sheng Zhao, and Ming Liu. 2019. 
Neural speech synthesis with transformer network. 33, 01 (2019), 6706--6713."},{"key":"e_1_3_2_2_18_1","volume-title":"Diffsinger: Singing voice synthesis via shallow diffusion mechanism. arXiv preprint arXiv:2105.02446 2","author":"Liu Jinglin","year":"2021","unstructured":"Jinglin Liu , Chengxi Li , Yi Ren , Feiyang Chen , Peng Liu , and Zhou Zhao . 2021 . Diffsinger: Singing voice synthesis via shallow diffusion mechanism. arXiv preprint arXiv:2105.02446 2 (2021). Jinglin Liu, Chengxi Li, Yi Ren, Feiyang Chen, Peng Liu, and Zhou Zhao. 2021. Diffsinger: Singing voice synthesis via shallow diffusion mechanism. arXiv preprint arXiv:2105.02446 2 (2021)."},{"key":"e_1_3_2_2_19_1","volume-title":"Eunho Yang, and Sung Ju Hwang.","author":"Min Dongchan","year":"2021","unstructured":"Dongchan Min , Dong Bok Lee , Eunho Yang, and Sung Ju Hwang. 2021 . Meta-stylespeech : Multi-speaker adaptive text-to-speech generation. (2021), 7748--7759. Dongchan Min, Dong Bok Lee, Eunho Yang, and Sung Ju Hwang. 2021. Meta-stylespeech: Multi-speaker adaptive text-to-speech generation. (2021), 7748--7759."},{"key":"e_1_3_2_2_20_1","unstructured":"Aaron Oord Yazhe Li Igor Babuschkin Karen Simonyan Oriol Vinyals Koray Kavukcuoglu George Driessche Edward Lockhart Luis Cobo Florian Stimberg etal 2018. Parallel wavenet: Fast high-fidelity speech synthesis. (2018) 3918--3926.  Aaron Oord Yazhe Li Igor Babuschkin Karen Simonyan Oriol Vinyals Koray Kavukcuoglu George Driessche Edward Lockhart Luis Cobo Florian Stimberg et al. 2018. Parallel wavenet: Fast high-fidelity speech synthesis. (2018) 3918--3926."},{"key":"e_1_3_2_2_21_1","volume-title":"Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499","author":"van den Oord Aaron","year":"2016","unstructured":"Aaron van den Oord , Sander Dieleman , Heiga Zen , Karen Simonyan , Oriol Vinyals , Alex Graves , Nal Kalchbrenner , Andrew Senior , and Koray Kavukcuoglu . 2016 . Wavenet: A generative model for raw audio. 
arXiv preprint arXiv:1609.03499 (2016). Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. 2016. Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499 (2016)."},{"key":"e_1_3_2_2_22_1","volume-title":"https:\/\/ github.com\/ philsyn\/ DiffWave-Vocoder","year":"2021","unstructured":"philsyn. 2021. DiffWave-Vocoder. https:\/\/ github.com\/ philsyn\/ DiffWave-Vocoder ( 2021 ). philsyn. 2021. DiffWave-Vocoder. https:\/\/ github.com\/ philsyn\/ DiffWave-Vocoder (2021)."},{"key":"e_1_3_2_2_23_1","unstructured":"Vadim Popov Ivan Vovk Vladimir Gogoryan Tasnima Sadekova and Mikhail Kudinov. 2021. Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech. (2021).  Vadim Popov Ivan Vovk Vladimir Gogoryan Tasnima Sadekova and Mikhail Kudinov. 2021. Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech. (2021)."},{"key":"e_1_3_2_2_24_1","volume-title":"Waveglow: A flow-based generative network for speech synthesis.","author":"Prenger Ryan","year":"2019","unstructured":"Ryan Prenger , Rafael Valle , and Bryan Catanzaro . 2019 . Waveglow: A flow-based generative network for speech synthesis. (2019), 3617--3621. Ryan Prenger, Rafael Valle, and Bryan Catanzaro. 2019. Waveglow: A flow-based generative network for speech synthesis. (2019), 3617--3621."},{"key":"e_1_3_2_2_25_1","unstructured":"Flavio Protasio Ribeiro Dinei Florencio Cha Zhang and Seltze. [n.d.]. CROWD-MOS: An Approach for Crowdsourcing Mean Opinion Score Studies. ([n. d.]). Edition: ICASSP.  Flavio Protasio Ribeiro Dinei Florencio Cha Zhang and Seltze. [n.d.]. CROWD-MOS: An Approach for Crowdsourcing Mean Opinion Score Studies. ([n. d.]). Edition: ICASSP."},{"key":"e_1_3_2_2_26_1","volume-title":"Fastspeech 2: Fast and high-quality end-to-end text to speech. 
arXiv preprint arXiv:2006.04558","author":"Ren Yi","year":"2020","unstructured":"Yi Ren , Chenxu Hu , Xu Tan , Tao Qin , Sheng Zhao , Zhou Zhao , and Tie-Yan Liu . 2020. Fastspeech 2: Fast and high-quality end-to-end text to speech. arXiv preprint arXiv:2006.04558 ( 2020 ). Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, and Tie-Yan Liu. 2020. Fastspeech 2: Fast and high-quality end-to-end text to speech. arXiv preprint arXiv:2006.04558 (2020)."},{"key":"e_1_3_2_2_27_1","volume-title":"Fastspeech: Fast, robust and controllable text to speech.","author":"Ren Yi","year":"2019","unstructured":"Yi Ren , Yangjun Ruan , Xu Tan , Tao Qin , Sheng Zhao , Zhou Zhao , and Tie-Yan Liu . 2019 . Fastspeech: Fast, robust and controllable text to speech. (2019), 3171--3180. Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, and Tie-Yan Liu. 2019. Fastspeech: Fast, robust and controllable text to speech. (2019), 3171--3180."},{"key":"e_1_3_2_2_28_1","volume-title":"Deepsinger: Singing voice synthesis with data mined from the web.","author":"Ren Yi","year":"2020","unstructured":"Yi Ren , Xu Tan , Tao Qin , Jian Luan , Zhou Zhao , and Tie-Yan Liu . 2020 . Deepsinger: Singing voice synthesis with data mined from the web. (2020), 1979--1989. Yi Ren, Xu Tan, Tao Qin, Jian Luan, Zhou Zhao, and Tie-Yan Liu. 2020. Deepsinger: Singing voice synthesis with data mined from the web. (2020), 1979--1989."},{"key":"e_1_3_2_2_29_1","unstructured":"Antony W Rix John G Beerends Michael P Hollier and Andries P Hekstra. 2001. Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs. (2001).  Antony W Rix John G Beerends Michael P Hollier and Andries P Hekstra. 2001. Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs. 
(2001)."},{"key":"e_1_3_2_2_30_1","volume-title":"et al","author":"Shen Jonathan","year":"2018","unstructured":"Jonathan Shen , Ruoming Pang , Ron J Weiss , Mike Schuster , Navdeep Jaitly , Zongheng Yang , Zhifeng Chen , Yu Zhang , Yuxuan Wang , Rj Skerrv-Ryan , et al . 2018 . Natural tts synthesis by conditioning wavenet on mel spectrogram predictions. (2018), 4779--4783. Jonathan Shen, Ruoming Pang, Ron J Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, Rj Skerrv-Ryan, et al . 2018. Natural tts synthesis by conditioning wavenet on mel spectrogram predictions. (2018), 4779--4783."},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"crossref","unstructured":"Cees H Taal Richard C Hendriks Richard Heusdens and Jesper Jensen. 2010. A short-time objective intelligibility measure for time-frequency weighted noisy speech. (2010).  Cees H Taal Richard C Hendriks Richard Heusdens and Jesper Jensen. 2010. A short-time objective intelligibility measure for time-frequency weighted noisy speech. (2010).","DOI":"10.1109\/ICASSP.2010.5495701"},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2019.2956145"},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"crossref","unstructured":"Ryuichi Yamamoto Eunwoo Song and Jae-Min Kim. 2020. Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram. (2020) 6199--6203.  Ryuichi Yamamoto Eunwoo Song and Jae-Min Kim. 2020. Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram. (2020) 6199--6203.","DOI":"10.1109\/ICASSP40776.2020.9053795"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"crossref","unstructured":"Geng Yang Shan Yang Kai Liu Peng Fang Wei Chen and Lei Xie. 2020. Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech. (2020). arXiv:2005.05106 [cs.SD]  Geng Yang Shan Yang Kai Liu Peng Fang Wei Chen and Lei Xie. 2020. 
Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech. (2020). arXiv:2005.05106 [cs.SD]","DOI":"10.1109\/SLT48900.2021.9383551"},{"key":"e_1_3_2_2_35_1","volume-title":"Poet: Product-oriented Video Captioner for E-commerce. In MM '20: The 28th ACM International Conference on Multimedia. ACM, 1292--1301","author":"Zhang Shengyu","year":"2020","unstructured":"Shengyu Zhang , Ziqi Tan , Jin Yu , Zhou Zhao , Kun Kuang , Jie Liu , Jingren Zhou , Hongxia Yang , and Fei Wu . 2020 . Poet: Product-oriented Video Captioner for E-commerce. In MM '20: The 28th ACM International Conference on Multimedia. ACM, 1292--1301 . Shengyu Zhang, Ziqi Tan, Jin Yu, Zhou Zhao, Kun Kuang, Jie Liu, Jingren Zhou, Hongxia Yang, and Fei Wu. 2020. Poet: Product-oriented Video Captioner for E-commerce. In MM '20: The 28th ACM International Conference on Multimedia. ACM, 1292--1301."},{"key":"e_1_3_2_2_36_1","volume-title":"Comprehensive Information Integration Modeling Framework for Video Titling. In KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM, 2744--2754","author":"Zhang Shengyu","year":"2020","unstructured":"Shengyu Zhang , Ziqi Tan , Zhou Zhao , Jin Yu , Kun Kuang , Tan Jiang , Jingren Zhou , Hongxia Yang , and Fei Wu . 2020 . Comprehensive Information Integration Modeling Framework for Video Titling. In KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM, 2744--2754 . Shengyu Zhang, Ziqi Tan, Zhou Zhao, Jin Yu, Kun Kuang, Tan Jiang, Jingren Zhou, Hongxia Yang, and Fei Wu. 2020. Comprehensive Information Integration Modeling Framework for Video Titling. In KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. ACM, 2744--2754."},{"key":"e_1_3_2_2_37_1","volume-title":"Re-construct for Multi-interest Recommendation. 
In WWW '22: The ACM Web Conference","author":"Zhang Shengyu","year":"2022","unstructured":"Shengyu Zhang , Lingxiao Yang , Dong Yao , Yujie Lu , Fuli Feng , Zhou Zhao , Tat-Seng Chua , and Fei Wu . 2022 . Re4: Learning to Re-contrast, Re-attend , Re-construct for Multi-interest Recommendation. In WWW '22: The ACM Web Conference 2022. ACM, 2216--2226. Shengyu Zhang, Lingxiao Yang, Dong Yao, Yujie Lu, Fuli Feng, Zhou Zhao, Tat-Seng Chua, and Fei Wu. 2022. Re4: Learning to Re-contrast, Re-attend, Re-construct for Multi-interest Recommendation. In WWW '22: The ACM Web Conference 2022. ACM, 2216--2226."}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","location":"Lisboa Portugal","acronym":"MM '22","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 30th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3547854","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3503161.3547854","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:02:35Z","timestamp":1750186955000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3547854"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":37,"alternative-id":["10.1145\/3503161.3547854","10.1145\/3503161"],"URL":"https:\/\/doi.org\/10.1145\/3503161.3547854","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}