{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,24]],"date-time":"2025-09-24T00:15:10Z","timestamp":1758672910070,"version":"3.44.0"},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,9]]},"abstract":"<jats:p>While previous diffusion-based neural vocoders typically follow a noise-to-data generation pipeline, the linear-degradation prior of the mel-spectrogram is often neglected, resulting in limited generation quality. By revisiting the vocoding task and excavating its connection with the signal restoration task, this paper proposes a time-frequency (T-F) domain-based neural vocoder with the Schr\u00f6dinger Bridge, called BridgeVoC, which is the first to follow the data-to-data generation paradigm. Specifically, the mel-spectrogram can be projected into the target linear-scale domain and regarded as a degraded spectral representation with a deficient rank distribution. Based on this, the Schr\u00f6dinger Bridge is leveraged to establish a connection between the degraded and target data distributions. During the inference stage, starting from the degraded representation, the target spectrum can be gradually restored rather than generated from a Gaussian noise process. 
Quantitative experiments on LJSpeech and LibriTTS show that BridgeVoC achieves faster inference and surpasses existing diffusion-based vocoder baselines, while also matching or exceeding non-diffusion state-of-the-art methods across evaluation metrics.<\/jats:p>","DOI":"10.24963\/ijcai.2025\/903","type":"proceedings-article","created":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T08:10:40Z","timestamp":1758269440000},"page":"8122-8130","source":"Crossref","is-referenced-by-count":0,"title":["BridgeVoC: Neural Vocoder with Schr\u00f6dinger Bridge"],"prefix":"10.24963","author":[{"given":"Tong","family":"Lei","sequence":"first","affiliation":[{"name":"Nanjing University; Tencent AI Lab"}]},{"given":"Zhiyu","family":"Zhang","sequence":"additional","affiliation":[{"name":"Southeast University"}]},{"given":"Rilin","family":"Chen","sequence":"additional","affiliation":[{"name":"Tencent AI Lab"}]},{"given":"Meng","family":"Yu","sequence":"additional","affiliation":[{"name":"Tencent AI Lab"}]},{"given":"Jing","family":"Lu","sequence":"additional","affiliation":[{"name":"Nanjing University"}]},{"given":"Chengshi","family":"Zheng","sequence":"additional","affiliation":[{"name":"Institute of Acoustics Chinese Academy of Sciences"}]},{"given":"Dong","family":"Yu","sequence":"additional","affiliation":[{"name":"Tencent AI Lab"}]},{"given":"Andong","family":"Li","sequence":"additional","affiliation":[{"name":"Institute of Acoustics Chinese Academy of Sciences"}]}],"member":"10584","event":{"number":"34","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"acronym":"IJCAI-2025","name":"Thirty-Fourth International Joint Conference on Artificial Intelligence {IJCAI-25}","start":{"date-parts":[[2025,8,16]]},"theme":"Artificial Intelligence","location":"Montreal, Canada","end":{"date-parts":[[2025,8,22]]}},"container-title":["Proceedings of the Thirty-Fourth International Joint Conference on Artificial 
Intelligence"],"original-title":[],"deposited":{"date-parts":[[2025,9,23]],"date-time":"2025-09-23T11:35:26Z","timestamp":1758627326000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2025\/903"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2025,9]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2025\/903","relation":{},"subject":[],"published":{"date-parts":[[2025,9]]}}}