{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T19:37:19Z","timestamp":1776886639268,"version":"3.51.2"},"publisher-location":"New York, NY, USA","reference-count":31,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,10,18]],"date-time":"2021-10-18T00:00:00Z","timestamp":1634515200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,18]]},"DOI":"10.1145\/3461615.3491114","type":"proceedings-article","created":{"date-parts":[[2021,12,18]],"date-time":"2021-12-18T04:57:40Z","timestamp":1639803460000},"page":"126-130","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["TeNC: Low Bit-Rate Speech Coding with VQ-VAE and GAN"],"prefix":"10.1145","author":[{"given":"Yi","family":"Chen","sequence":"first","affiliation":[{"name":"Northwestern Polytechnical University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shan","family":"Yang","sequence":"additional","affiliation":[{"name":"Tencent AI LAB, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Na","family":"Hu","sequence":"additional","affiliation":[{"name":"Tencent AI LAB, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lei","family":"Xie","sequence":"additional","affiliation":[{"name":"Northwestern Polytechnical University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dan","family":"Su","sequence":"additional","affiliation":[{"name":"Tencent AI LAB, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,12,17]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Report: The 32-kb\/s 
ADPCM coding standard. AT&T technical journal 65, 5","author":"Benvenuto Nevio","year":"1986","unstructured":"Nevio Benvenuto, Guido Bertocci, William\u00a0R Daumer, and Duncan\u00a0K Sparrell. 1986. Report: The 32-kb\/s ADPCM coding standard. AT&T technical journal 65, 5 (1986), 12\u201322."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Jonah Casebeer Vinjai Vale Umut Isik Jean-Marc Valin Ritwik Giri and Arvindh Krishnaswamy. 2021. Enhancing into the codec: Noise Robust Speech Coding with Vector-Quantized Autoencoders. arXiv preprint arXiv:2102.06610(2021).","DOI":"10.1109\/ICASSP39728.2021.9414605"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9053220"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8683277"},{"key":"e_1_3_2_1_5_1","unstructured":"Ian Goodfellow Jean Pouget-Abadie Mehdi Mirza Bing Xu David Warde-Farley Sherjil Ozair Aaron Courville and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in neural information processing systems. 2672\u20132680."},{"key":"e_1_3_2_1_6_1","volume-title":"International Conference on Machine Learning. 
PMLR, 2410\u20132419","author":"Kalchbrenner Nal","year":"2018","unstructured":"Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Florian Stimberg, Aaron Oord, Sander Dieleman, and Koray Kavukcuoglu. 2018. Efficient neural audio synthesis. In International Conference on Machine Learning. PMLR, 2410\u20132419."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8462529"},{"key":"e_1_3_2_1_8_1","volume-title":"Generative Speech Coding with Predictive Variance Regularization. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 6478\u20136482","author":"Kleijn W\u00a0Bastiaan","year":"2021","unstructured":"W\u00a0Bastiaan Kleijn, Andrew Storus, Michael Chinen, Tom Denton, Felicia\u00a0SC Lim, Alejandro Luebs, Jan Skoglund, and Hengchin Yeh. 2021. Generative Speech Coding with Predictive Variance Regularization. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 6478\u20136482."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8682435"},{"key":"e_1_3_2_1_10_1","unstructured":"Jungil Kong Jaehyeon Kim and Jaekyoung Bae. 2020. HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis. 
arXiv preprint arXiv:2010.05646(2020)."},{"key":"e_1_3_2_1_11_1","volume-title":"Melgan: Generative adversarial networks for conditional waveform synthesis. arXiv preprint arXiv:1910.06711(2019).","author":"Kumar Kundan","year":"2019","unstructured":"Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei\u00a0Zhen Teoh, Jose Sotelo, Alexandre de Br\u00e9bisson, Yoshua Bengio, and Aaron Courville. 2019. Melgan: Generative adversarial networks for conditional waveform synthesis. arXiv preprint arXiv:1910.06711(2019)."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.304"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1974.1162554"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1996.540325"},{"key":"e_1_3_2_1_15_1","unstructured":"Soroush Mehri Kundan Kumar Ishaan Gulrajani Rithesh Kumar Shubham Jain Jose Sotelo Aaron Courville and Yoshua Bengio. 2016. SampleRNN: An unconditional end-to-end neural audio generation model. arXiv preprint arXiv:1612.07837(2016)."},{"key":"e_1_3_2_1_16_1","volume-title":"Deep Vocoder: Low Bit Rate Compression of Speech with Deep Autoencoder. In 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). 
IEEE, 372\u2013377","author":"Min Gang","year":"2019","unstructured":"Gang Min, Changqing Zhang, Xiongwei Zhang, and Wei Tan. 2019. Deep Vocoder: Low Bit Rate Compression of Speech with Deep Autoencoder. In 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). IEEE, 372\u2013377."},{"key":"e_1_3_2_1_17_1","unstructured":"Takeru Miyato Toshiki Kataoka Masanori Koyama and Yuichi Yoshida. 2018. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957(2018)."},{"key":"e_1_3_2_1_18_1","volume-title":"Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499(2016).","author":"van\u00a0den Oord Aaron","year":"2016","unstructured":"Aaron van\u00a0den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. 2016. Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499(2016)."},{"key":"e_1_3_2_1_19_1","unstructured":"Aaron van\u00a0den Oord Oriol Vinyals and Koray Kavukcuoglu. 2017. Neural discrete representation learning. 
arXiv preprint arXiv:1711.00937(2017)."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/45.1890"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2015.7178964"},{"key":"e_1_3_2_1_22_1","volume-title":"BS","author":"Recommendation ITUR","year":"2001","unstructured":"ITUR Recommendation. 2001. Method for the subjective assessment of intermediate sound quality (MUSHRA). ITU, BS (2001), 1543\u20131."},{"key":"e_1_3_2_1_23_1","unstructured":"P Revised\u00a0Recommendation. 1996. 800 (Methods for Subjective Determination of Transmission Quality)."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2001.941023"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Jan Skoglund and Jean-Marc Valin. 2019. Improving Opus low bit rate quality with neural speech synthesis. arXiv preprint arXiv:1905.04628(2019).","DOI":"10.21437\/Interspeech.2020-2939"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Naoya Takahashi Mayank\u00a0Kumar Singh and Yuki Mitsufuji. 2021. Hierarchical disentangled representation learning for singing voice conversion. 
arXiv preprint arXiv:2101.06842(2021).","DOI":"10.1109\/IJCNN52387.2021.9533583"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8682804"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Jean-Marc Valin and Jan Skoglund. 2019. A real-time wideband neural vocoder at 1.6 kb\/s using LPCNet. arXiv preprint arXiv:1903.12087(2019).","DOI":"10.21437\/Interspeech.2019-1255"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9053795"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"crossref","unstructured":"Geng Yang Shan Yang Kai Liu Peng Fang Wei Chen and Lei Xie. 2020. Multi-band MelGAN: Faster waveform generation for high-quality text-to-speech. 
arXiv preprint arXiv:2005.05106(2020).","DOI":"10.1109\/SLT48900.2021.9383551"}],"event":{"name":"ICMI '21: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION","location":"Montreal QC Canada","acronym":"ICMI '21","sponsor":["SIGCHI ACM Special Interest Group on Computer-Human Interaction"]},"container-title":["Companion Publication of the 2021 International Conference on Multimodal Interaction"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3461615.3491114","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3461615.3491114","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:49:04Z","timestamp":1750193344000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3461615.3491114"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,18]]},"references-count":31,"alternative-id":["10.1145\/3461615.3491114","10.1145\/3461615"],"URL":"https:\/\/doi.org\/10.1145\/3461615.3491114","relation":{},"subject":[],"published":{"date-parts":[[2021,10,18]]},"assertion":[{"value":"2021-12-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}