{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T15:45:46Z","timestamp":1778082346312,"version":"3.51.4"},"reference-count":54,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,1,2]],"date-time":"2024-01-02T00:00:00Z","timestamp":1704153600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,1,2]],"date-time":"2024-01-02T00:00:00Z","timestamp":1704153600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100021171","name":"Basic and Applied Basic Research Foundation of Guangdong Province","doi-asserted-by":"publisher","award":["2021A1515011999"],"award-info":[{"award-number":["2021A1515011999"]}],"id":[{"id":"10.13039\/501100021171","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100011338","name":"Key Laboratory of Chemical Biology and Traditional Chinese Medicine Research, Ministry of Education","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100011338","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Guangdong Province Big Data Innovation Engineering Technology Research Center"},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","award":["21619412"],"award-info":[{"award-number":["21619412"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["EURASIP J. Adv. Signal Process."],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Text-to-speech synthesis plays an essential role in facilitating human-computer interaction. 
Currently, the predominant approach in Text-to-speech acoustic models selects only the Mel spectrum as an intermediate feature for converting text to speech. However, the Mel spectrograms obtained may exhibit ambiguity in some aspects owing to the limited capability of the Fourier transform to capture mutation signals during the acquisition of the Mel spectrograms. With the aim of improving the clarity of synthesized speech, this study proposes a multi-task learning optimization method and conducts experiments on the Tacotron2 speech synthesis system to demonstrate the effectiveness of the proposed method. The method in the study introduces an additional task: wavelet spectrograms. The continuous wavelet transform has gained significant popularity in various applications, including speech enhancement and speech recognition, which is primarily attributed to its capability to adaptively vary the time-frequency resolution and its excellent performance in capturing non-stationary signals. This study highlights that the clarity of Tacotron2 synthesized speech can be improved by introducing Wavelet-spectrogram as an auxiliary task through theoretical and experimental analysis: a feature extraction network is added, and Wavelet-spectrogram features are extracted from the Mel spectrum output generated by the decoder. Experimental findings indicate that the Mean Opinion Score achieved for the speech synthesized by the model using multi-task learning is 0.17 higher compared to the baseline model. 
Furthermore, by analyzing the factors contributing to the success of the continuous wavelet transform-based multi-task learning method in the Tacotron2 model, as well as the effectiveness of multi-task learning, the study conjectures that the proposed method has the potential to enhance the performance of other acoustic models.<\/jats:p>","DOI":"10.1186\/s13634-023-01096-x","type":"journal-article","created":{"date-parts":[[2024,1,2]],"date-time":"2024-01-02T06:02:24Z","timestamp":1704175344000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["A multi-task learning speech synthesis optimization method based on CWT: a case study of Tacotron2"],"prefix":"10.1186","volume":"2024","author":[{"given":"Guoqiang","family":"Hu","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhuofan","family":"Ruan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wenqiu","family":"Guo","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6206-7022","authenticated-orcid":false,"given":"Yujuan","family":"Quan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,1,2]]},"reference":[{"key":"1096_CR1","doi-asserted-by":"publisher","unstructured":"H. Zen, T. Toda, An Overview of Nitech HMM-based Speech Synthesis System for Blizzard Challenge 2005, in Proceeding of the Interspeech 2005. ISCA, Lisbon, Portugal, pp. 93\u201396 (2005). https:\/\/doi.org\/10.21437\/interspeech.2005-76","DOI":"10.21437\/interspeech.2005-76"},{"issue":"7","key":"1096_CR2","doi-asserted-by":"publisher","first-page":"5837","DOI":"10.1007\/s10462-022-10315-0","volume":"56","author":"N Kaur","year":"2023","unstructured":"N. Kaur, P. 
Singh, Conventional and contemporary approaches used in text to speech synthesis: a review. Artif. Intell. Rev. 2022, 1\u201344 (2022). https:\/\/doi.org\/10.1007\/s10462-022-10315-0","journal-title":"Artif. Intell. Rev."},{"key":"1096_CR3","doi-asserted-by":"publisher","first-page":"4050","DOI":"10.3390\/app9194050","volume":"9","author":"Y Ning","year":"2019","unstructured":"Y. Ning, S. He, Z. Wu, C. Xing, L.-J. Zhang, A review of deep learning based speech synthesis. Appl. Sci. 9, 4050 (2019). https:\/\/doi.org\/10.3390\/app9194050","journal-title":"Appl. Sci."},{"key":"1096_CR4","doi-asserted-by":"publisher","unstructured":"N. Li, S. Liu, Y. Liu, S. Zhao, M. Liu, Neural Speech Synthesis with Transformer Network, in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33. PKP Publishing, Honolulu, Hawaii, pp. 6706\u20136713 (2019). https:\/\/doi.org\/10.1609\/aaai.v33i01.33016706","DOI":"10.1609\/aaai.v33i01.33016706"},{"key":"1096_CR5","doi-asserted-by":"publisher","unstructured":"J. Shen, R. Pang, R.J. Weiss, M. Schuster, N. Jaitly, Z. Yang, Z. Chen, Y. Zhang, Y. Wang, R. Skerry-Ryan, R.A. Saurous, Y. Agiomyrgiannakis, Y. Wu, Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions, in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, Calgary, AB, Canada, pp. 4779\u20134783 (2018). https:\/\/doi.org\/10.1109\/icassp.2018.8461368","DOI":"10.1109\/icassp.2018.8461368"},{"key":"1096_CR6","unstructured":"Y. Ren, Y. Ruan, X. Tan, T. Qin, S. Zhao, Z. Zhao, T.-Y. Liu, Fastspeech: Fast, robust and controllable text to speech. Advances in Neural Information Processing Systems 32 (2019)"},{"key":"1096_CR7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.physrep.2022.08.001","volume":"985","author":"RC Guido","year":"2022","unstructured":"R.C. Guido, Wavelets behind the scenes: Practical aspects, insights, and perspectives. Phys. Rep. 985, 1\u201323 (2022). 
https:\/\/doi.org\/10.1016\/j.physrep.2022.08.001","journal-title":"Phys. Rep."},{"key":"1096_CR8","doi-asserted-by":"publisher","first-page":"1696","DOI":"10.1109\/tsp.2019.2896246","volume":"67","author":"X Zheng","year":"2019","unstructured":"X. Zheng, Y. Tang, J. Zhou, A framework of adaptive multiscale wavelet decomposition for signals on undirected graphs. IEEE Trans. Signal Process. 67, 1696\u20131711 (2019). https:\/\/doi.org\/10.1109\/tsp.2019.2896246","journal-title":"IEEE Trans. Signal Process."},{"key":"1096_CR9","doi-asserted-by":"publisher","first-page":"1950050","DOI":"10.1142\/s0219691319500504","volume":"17","author":"L Yang","year":"2019","unstructured":"L. Yang, H. Su, C. Zhong, Z. Meng, H. Luo, X. Li, Y.Y. Tang, Y. Lu, Hyperspectral image classification using wavelet transform-based smooth ordering. Int. J. Wavelets Multiresolut. Inf. Process. 17, 1950050 (2019). https:\/\/doi.org\/10.1142\/s0219691319500504","journal-title":"Int. J. Wavelets Multiresolut. Inf. Process."},{"key":"1096_CR10","doi-asserted-by":"publisher","first-page":"89","DOI":"10.1109\/msp.2017.2672759","volume":"34","author":"RC Guido","year":"2017","unstructured":"R.C. Guido, Effectively interpreting discrete wavelet transformed signals [lecture notes]. IEEE Signal Process. Mag. 34, 89\u2013100 (2017). https:\/\/doi.org\/10.1109\/msp.2017.2672759","journal-title":"IEEE Signal Process. Mag."},{"key":"1096_CR11","doi-asserted-by":"publisher","first-page":"162","DOI":"10.1109\/msp.2014.2368586","volume":"32","author":"RC Guido","year":"2015","unstructured":"R.C. Guido, Practical and useful tips on discrete wavelet transforms [sp tips & tricks]. IEEE Signal Process. Mag. 32, 162\u2013166 (2015). https:\/\/doi.org\/10.1109\/msp.2014.2368586","journal-title":"IEEE Signal Process. Mag."},{"key":"1096_CR12","doi-asserted-by":"publisher","first-page":"304","DOI":"10.3390\/e21030304","volume":"21","author":"E Guariglia","year":"2019","unstructured":"E. 
Guariglia, Primality, fractality, and image analysis. Entropy 21, 304 (2019). https:\/\/doi.org\/10.3390\/e21030304","journal-title":"Entropy"},{"key":"1096_CR13","doi-asserted-by":"publisher","first-page":"337","DOI":"10.1007\/978-3-319-42105-6_16","volume-title":"Engineering Mathematics II. Springer Proceedings in Mathematics & Statistics","author":"E Guariglia","year":"2016","unstructured":"E. Guariglia, S. Silvestrov, Fractional-wavelet analysis of positive definite distributions and wavelets on D\u2019(C), in Engineering Mathematics II. Springer Proceedings in Mathematics & Statistics, vol. 179, ed. by S. Silvestrov, M. Ran\u010di\u0107 (Springer, Cham, 2016), pp.337\u2013353. https:\/\/doi.org\/10.1007\/978-3-319-42105-6_16"},{"key":"1096_CR14","doi-asserted-by":"publisher","first-page":"5542054","DOI":"10.1155\/2022\/5542054","volume":"2022","author":"E Guariglia","year":"2022","unstructured":"E. Guariglia, R.C. Guido, Chebyshev wavelet analysis. J. Funct. Spaces 2022, 5542054 (2022). https:\/\/doi.org\/10.1155\/2022\/5542054","journal-title":"J. Funct. Spaces"},{"key":"1096_CR15","doi-asserted-by":"publisher","first-page":"674","DOI":"10.1109\/34.192463","volume":"11","author":"SG Mallat","year":"1989","unstructured":"S.G. Mallat, A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 11, 674\u2013693 (1989). https:\/\/doi.org\/10.1109\/34.192463","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"1096_CR16","doi-asserted-by":"publisher","unstructured":"A. Grossmann, R. Kronland-Martinet, J. Morlet, Reading and Understanding Continuous Wavelet Transforms. Wavelets. Inverse Problems and Theoretical Imaging (Springer, Berlin, Heidelberg, 1989), pp.2\u201320. 
https:\/\/doi.org\/10.1007\/978-3-642-97177-8_1","DOI":"10.1007\/978-3-642-97177-8_1"},{"key":"1096_CR17","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1023\/a:1007379606734","volume":"28","author":"R Caruana","year":"1997","unstructured":"R. Caruana, Multitask learning. Mach. Learn. 28, 41\u201375 (1997). https:\/\/doi.org\/10.1023\/a:1007379606734","journal-title":"Mach. Learn."},{"key":"1096_CR18","doi-asserted-by":"publisher","first-page":"130","DOI":"10.1080\/02564602.2018.1432422","volume":"36","author":"N Adiga","year":"2019","unstructured":"N. Adiga, S.R.M. Prasanna, Acoustic features modelling for statistical parametric speech synthesis: a review. IETE Tech. Rev. 36, 130\u2013149 (2019). https:\/\/doi.org\/10.1080\/02564602.2018.1432422","journal-title":"IETE Tech. Rev."},{"key":"1096_CR19","doi-asserted-by":"publisher","unstructured":"Y. Ren, C. Hu, X. Tan, T. Qin, S. Zhao, Z. Zhao, T.-Y. Liu, Fastspeech 2: Fast and high-quality end-to-end text to speech. arXiv:2006.04558 [eess.AS] (2022) https:\/\/doi.org\/10.48550\/arXiv.2006.04558","DOI":"10.48550\/arXiv.2006.04558"},{"key":"1096_CR20","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1250\/ast.24.7","volume":"24","author":"Y Chisaki","year":"2003","unstructured":"Y. Chisaki, H. Nakashima, S. Shiroshita, T. Usagawa, M. Ebata, A pitch detection method based on continuous wavelet transform for harmonic signal. Acoust. Sci. Technol. 24, 7\u201316 (2003). https:\/\/doi.org\/10.1250\/ast.24.7","journal-title":"Acoust. Sci. Technol."},{"key":"1096_CR21","doi-asserted-by":"publisher","first-page":"917","DOI":"10.1109\/18.119752","volume":"38","author":"S Kadambe","year":"1992","unstructured":"S. Kadambe, G.F. Boudreaux-Bartels, Application of the wavelet transform for pitch detection of speech signals. IEEE Trans. Inf. Theory 38, 917\u2013924 (1992). https:\/\/doi.org\/10.1109\/18.119752","journal-title":"IEEE Trans. Inf. 
Theory"},{"key":"1096_CR22","doi-asserted-by":"publisher","unstructured":"A. Mehrish, N. Majumder, R. Bhardwaj, R. Mihalcea, S. Poria, A review of deep learning techniques for speech processing. arXiv:2305.00359 [eess.AS] (2023) https:\/\/doi.org\/10.48550\/arXiv.2305.00359","DOI":"10.48550\/arXiv.2305.00359"},{"key":"1096_CR23","doi-asserted-by":"publisher","unstructured":"K. O\u2019Shea, R. Nash, An introduction to convolutional neural networks. arXiv:1511.08458 [cs.NE] (2015) https:\/\/doi.org\/10.48550\/arXiv.1511.08458","DOI":"10.48550\/arXiv.1511.08458"},{"key":"1096_CR24","doi-asserted-by":"publisher","first-page":"132306","DOI":"10.1016\/j.physd.2019.132306","volume":"404","author":"A Sherstinsky","year":"2020","unstructured":"A. Sherstinsky, Fundamentals of recurrent neural network (rnn) and long short-term memory (lstm) network. Phys. D Nonlinear Phenom. 404, 132306 (2020). https:\/\/doi.org\/10.1016\/j.physd.2019.132306","journal-title":"Phys. D Nonlinear Phenom."},{"key":"1096_CR25","unstructured":"A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need. Advances in Neural Information Processing Systems 30 (2017)"},{"key":"1096_CR26","unstructured":"S.O. Ar\u0131k, M. Chrzanowski, A.Coates, G. Diamos, A. Gibiansky, Y.Kang, X. Li, J. Miller, A. Ng, J. Raiman, S. Sengupta, M. Shoeybi, Deep Voice: Real-time Neural Text-to-Speech, in Proceedings of the 34th International Conference on Machine Learning, vol. 70. PMLR, Sydney, Australia, pp. 195\u2013204 (2017). https:\/\/proceedings.mlr.press\/v70\/arik17a.html"},{"key":"1096_CR27","unstructured":"A. Gibiansky, S. Arik, G. Diamos, J. Miller, K. Peng, W. Ping, J. Raiman, Y. Zhou, Deep voice 2: Multi-speaker neural text-to-speech. Advances in Neural Information Processing Systems 30 (2017)"},{"key":"1096_CR28","doi-asserted-by":"publisher","unstructured":"W. Ping, K. Peng, A. Gibiansky, S.O. Arik, A. Kannan, S. Narang, J. Raiman, J. 
Miller, Deep voice 3: Scaling text-to-speech with convolutional sequence learning. arXiv:1710.07654 [cs.SD] (2018) https:\/\/doi.org\/10.48550\/arXiv.1710.07654","DOI":"10.48550\/arXiv.1710.07654"},{"key":"1096_CR29","doi-asserted-by":"publisher","first-page":"199","DOI":"10.1016\/j.dsp.2018.04.005","volume":"79","author":"S Ouelha","year":"2018","unstructured":"S. Ouelha, A. A\u00efssa-El-Bey, B. Boashash, An improved time-frequency noise reduction method using a psycho-acoustic mel model. Digit. Signal Process. 79, 199\u2013212 (2018). https:\/\/doi.org\/10.1016\/j.dsp.2018.04.005","journal-title":"Digit. Signal Process."},{"key":"1096_CR30","doi-asserted-by":"publisher","unstructured":"J. Xiao, J. Liu, D. Li, L. Zhao, Q. Wang, Speech Intelligibility Enhancement By Non-Parallel Speech Style Conversion Using CWT and iMetricGAN Based CycleGAN, in MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol. 13141. Springer, Cham, pp. 544\u2013556 (2022). https:\/\/doi.org\/10.1007\/978-3-030-98358-1_43","DOI":"10.1007\/978-3-030-98358-1_43"},{"key":"1096_CR31","doi-asserted-by":"publisher","unstructured":"Y. Gu, Y. Kang, Multi-task wavenet: A multi-task generative model for statistical parametric speech synthesis without fundamental frequency conditions. arXiv:1806.08619 [eess.AS] (2018) https:\/\/doi.org\/10.48550\/arXiv.1806.08619","DOI":"10.48550\/arXiv.1806.08619"},{"key":"1096_CR32","doi-asserted-by":"publisher","unstructured":"Z. Huang, J. Li, S.M. Siniscalchi, I.-F. Chen, J. Wu, C.-H. Lee, Rapid adaptation for deep neural networks through multi-task learning, in Sixteenth Annual Conference of the International Speech Communication Association. INTERSPEECH, ISCA, Dresden, Germany, pp. 3625\u20133629 (2015). https:\/\/doi.org\/10.21437\/interspeech.2015-719","DOI":"10.21437\/interspeech.2015-719"},{"key":"1096_CR33","doi-asserted-by":"publisher","unstructured":"Z. Wu, C. Valentini-Botinhao, O. Watts, S. 
King, Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, South Brisbane, QLD, Australia, pp. 4460\u20134464 (2015). https:\/\/doi.org\/10.1109\/icassp.2015.7178814","DOI":"10.1109\/icassp.2015.7178814"},{"key":"1096_CR34","doi-asserted-by":"publisher","first-page":"101243","DOI":"10.1016\/j.csl.2021.101243","volume":"70","author":"J Chen","year":"2021","unstructured":"J. Chen, L. Ye, Z. Ming, Mass: Multi-task anthropomorphic speech synthesis framework. Comput. Speech Lang. 70, 101243 (2021). https:\/\/doi.org\/10.1016\/j.csl.2021.101243","journal-title":"Comput. Speech Lang."},{"key":"1096_CR35","doi-asserted-by":"publisher","unstructured":"J.-T. Huang, J. Li, D. Yu, L. Deng, Y. Gong, Cross-language Knowledge Transfer Using Multilingual Deep Neural Network with Shared Hidden Layers, in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, Vancouver, BC, Canada, pp. 7304\u20137308 (2013). https:\/\/doi.org\/10.1109\/icassp.2013.6639081","DOI":"10.1109\/icassp.2013.6639081"},{"key":"1096_CR36","doi-asserted-by":"publisher","unstructured":"C.J. Peng, Y.L. Shen, Y.J. Chan, C. Yu, Y. Tsao, T.S. Chi, Perceptual Characteristics Based Multi-objective Model for Speech Enhancement, in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2022-September, Incheon, Korea. ISCA, pp. 211\u2013215 (2022). https:\/\/doi.org\/10.21437\/interspeech.2022-11197","DOI":"10.21437\/interspeech.2022-11197"},{"key":"1096_CR37","doi-asserted-by":"publisher","unstructured":"J. Lee, S. Han, H. Cho, W. Jung, PHASEAUG: a differentiable augmentation for speech synthesis to simulate one-to-many mapping, in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, Rhodes Island, Greece, pp. 1\u20135 (2023). 
https:\/\/doi.org\/10.1109\/ICASSP49357.2023.10096374. https:\/\/ieeexplore.ieee.org\/abstract\/document\/10096374","DOI":"10.1109\/ICASSP49357.2023.10096374"},{"key":"1096_CR38","doi-asserted-by":"publisher","first-page":"101103","DOI":"10.1016\/j.csl.2020.101103","volume":"64","author":"G Pironkov","year":"2020","unstructured":"G. Pironkov, S.U. Wood, S. Dupont, Hybrid-task learning for robust automatic speech recognition. Comput. Speech Lang. 64, 101103 (2020). https:\/\/doi.org\/10.1016\/j.csl.2020.101103","journal-title":"Comput. Speech Lang."},{"key":"1096_CR39","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1002\/ecja.4400660203","volume":"66","author":"S Imai","year":"1983","unstructured":"S. Imai, K. Sumita, C. Furuichi, Mel log spectrum approximation (mlsa) filter for speech synthesis. Electron. Commun. Japan (Part I Commun.) 66, 10\u201318 (1983). https:\/\/doi.org\/10.1002\/ecja.4400660203","journal-title":"Electron. Commun. Japan (Part I Commun.)"},{"key":"1096_CR40","volume-title":"Spectral Analysis of Signals","author":"P Stoica","year":"2005","unstructured":"P. Stoica, R.L. Moses, Spectral Analysis of Signals, vol. 452 (Pearson Prentice Hall, Upper Saddle River, 2005)"},{"key":"1096_CR41","doi-asserted-by":"publisher","first-page":"961","DOI":"10.1109\/18.57199","volume":"36","author":"I Daubechies","year":"1990","unstructured":"I. Daubechies, The wavelet transform, time-frequency localization and signal analysis. IEEE Trans. Inf. Theory 36, 961\u20131005 (1990). https:\/\/doi.org\/10.1109\/18.57199","journal-title":"IEEE Trans. Inf. Theory"},{"key":"1096_CR42","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1016\/j.triboint.2015.12.037","volume":"96","author":"A Rai","year":"2016","unstructured":"A. Rai, S.H. Upadhyay, A review on signal processing techniques utilized in the fault diagnosis of rolling element bearings. Tribol. Int. 96, 289\u2013306 (2016). 
https:\/\/doi.org\/10.1016\/j.triboint.2015.12.037","journal-title":"Tribol. Int."},{"key":"1096_CR43","doi-asserted-by":"publisher","unstructured":"H. Soltau, G. Saon, T.N. Sainath, Joint Training of Convolutional and Non-convolutional Neural Networks, in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, Florence, Italy, pp. 5572\u20135576 (2014). https:\/\/doi.org\/10.1109\/icassp.2014.6854669","DOI":"10.1109\/icassp.2014.6854669"},{"key":"1096_CR44","doi-asserted-by":"publisher","first-page":"395","DOI":"10.1146\/annurev.fl.24.010192.002143","volume":"24","author":"M Farge","year":"1992","unstructured":"M. Farge, Wavelet transforms and their applications to turbulence. Annu. Rev. Fluid Mech. 24, 395\u2013458 (1992). https:\/\/doi.org\/10.1146\/annurev.fl.24.010192.002143","journal-title":"Annu. Rev. Fluid Mech."},{"key":"1096_CR45","doi-asserted-by":"publisher","first-page":"617","DOI":"10.1109\/18.119727","volume":"38","author":"S Mallat","year":"1992","unstructured":"S. Mallat, W.L. Hwang, Singularity detection and processing with wavelets. IEEE Trans. Inf. Theory 38, 617\u2013643 (1992). https:\/\/doi.org\/10.1109\/18.119727","journal-title":"IEEE Trans. Inf. Theory"},{"key":"1096_CR46","doi-asserted-by":"publisher","unstructured":"M.S. Ribeiro, O. Watts, J. Yamagishi, R.A.J. Clark, Wavelet-based Decomposition of F0 as a Secondary Task for DNN-based Speech Synthesis with Multi-task Learning, in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, Shanghai, China, pp. 5525\u20135529 (2016). https:\/\/doi.org\/10.1109\/ICASSP.2016.7472734. https:\/\/ieeexplore.ieee.org\/abstract\/document\/7472734","DOI":"10.1109\/ICASSP.2016.7472734"},{"key":"1096_CR47","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611970104","volume-title":"Ten Lectures on Wavelets","author":"I Daubechies","year":"1992","unstructured":"I. 
Daubechies, Ten Lectures on Wavelets (Society For Industrial And Applied Mathematics, Philadelphia, 1992)"},{"key":"1096_CR48","volume-title":"Time-Frequency Analysis","author":"L Cohen","year":"1995","unstructured":"L. Cohen, Time-Frequency Analysis (Prentice Hall PTR, Upper Saddle River, 1995)"},{"key":"1096_CR49","doi-asserted-by":"publisher","unstructured":"S. Qin, Z. Ji, Multi-resolution time-frequency analysis for detection of rhythms of EEG signals, in 3rd IEEE Signal Processing Education Workshop. 2004 IEEE 11th Digital Signal Processing Workshop, 2004. IEEE, Taos Ski Valley, NM, USA, pp. 338\u2013341 (2004). https:\/\/doi.org\/10.1109\/DSPWS.2004.1437971. https:\/\/ieeexplore.ieee.org\/abstract\/document\/1437971","DOI":"10.1109\/DSPWS.2004.1437971"},{"key":"1096_CR50","unstructured":"K. Ito, L. Johnson, The LJ Speech Dataset (2017). https:\/\/keithito.com\/LJ-Speech-Dataset\/"},{"key":"1096_CR51","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1007\/bf02163027","volume":"14","author":"GH Golub","year":"1970","unstructured":"G.H. Golub, C. Reinsch, Singular value decomposition and least squares solutions. Numer. Math. 14, 403\u2013420 (1970). https:\/\/doi.org\/10.1007\/bf02163027","journal-title":"Numer. Math."},{"key":"1096_CR52","unstructured":"S. Ioffe, C. Szegedy, Batch normalization: accelerating deep network training by reducing internal covariate shift, in Proceedings of the 32nd International Conference on Machine Learning, vol. 37. PMLR, Lille, France, pp. 448\u2013456 (2015). http:\/\/proceedings.mlr.press\/v37\/ioffe15.html"},{"key":"1096_CR53","doi-asserted-by":"publisher","first-page":"2673","DOI":"10.1109\/78.650093","volume":"45","author":"M Schuster","year":"1997","unstructured":"M. Schuster, K.K. Paliwal, Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45, 2673\u20132681 (1997). https:\/\/doi.org\/10.1109\/78.650093","journal-title":"IEEE Trans. 
Signal Process."},{"key":"1096_CR54","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9, 1735\u20131780 (1997). https:\/\/doi.org\/10.1162\/neco.1997.9.8.1735","journal-title":"Neural Comput."}],"container-title":["EURASIP Journal on Advances in Signal Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13634-023-01096-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13634-023-01096-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13634-023-01096-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,2]],"date-time":"2024-01-02T06:07:32Z","timestamp":1704175652000},"score":1,"resource":{"primary":{"URL":"https:\/\/asp-eurasipjournals.springeropen.com\/articles\/10.1186\/s13634-023-01096-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,2]]},"references-count":54,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["1096"],"URL":"https:\/\/doi.org\/10.1186\/s13634-023-01096-x","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-3369751\/v1","asserted-by":"object"}]},"ISSN":["1687-6180"],"issn-type":[{"value":"1687-6180","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,2]]},"assertion":[{"value":"19 September 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 December 
2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 January 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"4"}}