{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T20:33:32Z","timestamp":1776890012877,"version":"3.51.2"},"reference-count":58,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,7,19]],"date-time":"2024-07-19T00:00:00Z","timestamp":1721347200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,7,19]],"date-time":"2024-07-19T00:00:00Z","timestamp":1721347200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J AUDIO SPEECH MUSIC PROC."],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This paper presents the crossing scheme (X-scheme) for improving the performance of <jats:italic>deep neural network<\/jats:italic> (DNN)-based music source separation (MSS) with almost no increase in calculation cost. It consists of three components: (i) <jats:italic>multi-domain loss<\/jats:italic> (MDL), (ii) <jats:italic>bridging operation<\/jats:italic>, which couples the individual instrument networks, and (iii) <jats:italic>combination loss<\/jats:italic> (CL). MDL makes it possible to take advantage of both the frequency- and time-domain representations of audio signals. We modify the target network, i.e., the network architecture of the original DNN-based MSS, by adding bridging paths for each output instrument to share their information. MDL is then applied to the combinations of the output sources as well as each independent source; hence, we call it CL. MDL and CL can easily be applied to many DNN-based separation methods as they are merely loss functions that are only used during training and do not affect the inference step. 
The bridging operation does not increase the number of learnable parameters in the network. Experimental results showed the validity of Open-Unmix (UMX), densely connected dilated DenseNet (D3Net) and convolutional time-domain audio separation network (Conv-TasNet) extended with our X-scheme, respectively called X-UMX, X-D3Net and X-Conv-TasNet, by comparing them with their original versions. We also verified the effectiveness of the X-scheme in a large-scale data regime, showing its generality with respect to data size. X-UMX Large (X-UMXL), which was trained on large-scale internal data and used in our experiments, is newly available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/asteroid-team\/asteroid\/tree\/master\/egs\/musdb18\/X-UMX\">https:\/\/github.com\/asteroid-team\/asteroid\/tree\/master\/egs\/musdb18\/X-UMX<\/jats:ext-link>.<\/jats:p>","DOI":"10.1186\/s13636-024-00354-6","type":"journal-article","created":{"date-parts":[[2024,7,19]],"date-time":"2024-07-19T04:01:28Z","timestamp":1721361688000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["The whole is greater than the sum of its parts: improving music source separation by bridging 
networks"],"prefix":"10.1186","volume":"2024","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3230-4335","authenticated-orcid":false,"given":"Ryosuke","family":"Sawata","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Naoya","family":"Takahashi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stefan","family":"Uhlich","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shusuke","family":"Takahashi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuki","family":"Mitsufuji","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,7,19]]},"reference":[{"issue":"8","key":"354_CR1","doi-asserted-by":"publisher","first-page":"1256","DOI":"10.1109\/TASLP.2019.2915167","volume":"27","author":"Y Luo","year":"2019","unstructured":"Y. Luo, N. Mesgarani, Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation. IEEE\/ACM Trans. Audio Speech Lang. Process. 27(8), 1256\u20131266 (2019)","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"354_CR2","volume-title":"Discrete-Time Signal Processing","author":"AV Oppenheim","year":"1999","unstructured":"A.V. Oppenheim, R.W. Schafer, J.R. Buck, Discrete-Time Signal Processing, 2nd edn. (Prentice-hall Englewood Cliffs, USA, 1999)","edition":"2"},{"key":"354_CR3","unstructured":"G. Meseguer-Brocal, G. Peeters, in Proc. of the 20th International Society for Music Information Retrieval Conference (ISMIR), ed. by A. Flexer, G. Peeters, J. Urbano, A. Volk. Conditioned-U-Net: introducing a control mechanism in the U-Net for multiple source separations\u00a0(2019). pp. 159\u2013165.\u00a0http:\/\/archives.ismir.net\/ismir2019\/paper\/000017.pdf. 
Accessed 29 Apr 2024"},{"key":"354_CR4","doi-asserted-by":"publisher","first-page":"2083","DOI":"10.1109\/TASLP.2021.3082331","volume":"29","author":"O Slizovskaia","year":"2021","unstructured":"O. Slizovskaia, G. Haro, E. G\u00f3mez, Conditioned source separation for musical instrument performances. IEEE\/ACM Trans. Audio Speech Lang. Process. 29, 2083\u20132095 (2021). https:\/\/doi.org\/10.1109\/TASLP.2021.3082331","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"354_CR5","doi-asserted-by":"publisher","unstructured":"V.S. Kadandale, J.F. Montesinos, G. Haro, E. G\u00f3mez, in Proc. of IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP). Multi-channel U-Net for music source separation (2020), pp. 1\u20136. https:\/\/doi.org\/10.1109\/MMSP48831.2020.9287108","DOI":"10.1109\/MMSP48831.2020.9287108"},{"key":"354_CR6","doi-asserted-by":"publisher","unstructured":"E. Perez, F. Strub, H. de Vries, V. Dumoulin, A. Courville, FiLM: Visual reasoning with a general conditioning layer. Proc. AAAI Conf. Artif. Intell. 32(1) (2018). https:\/\/doi.org\/10.1609\/aaai.v32i1.11671","DOI":"10.1609\/aaai.v32i1.11671"},{"issue":"1","key":"354_CR7","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1109\/MSP.2018.2874719","volume":"36","author":"E Cano","year":"2019","unstructured":"E. Cano, D. FitzGerald, A. Liutkus, M.D. Plumbley, F.R. St\u00f6ter, Musical source separation: An introduction. IEEE Signal Process. Mag. 36(1), 31\u201340 (2019). https:\/\/doi.org\/10.1109\/MSP.2018.2874719","journal-title":"IEEE Signal Process. Mag."},{"issue":"7","key":"354_CR8","doi-asserted-by":"publisher","first-page":"1830","DOI":"10.1109\/TASL.2010.2050716","volume":"18","author":"NQK Duong","year":"2010","unstructured":"N.Q.K. Duong, E. Vincent, R. Gribonval, Under-determined reverberant audio source separation using a full-rank spatial covariance model. IEEE Trans. Audio Speech Lang. Process. 
18(7), 1830\u20131840 (2010)","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"354_CR9","doi-asserted-by":"crossref","unstructured":"D. FitzGerald, A. Liutkus, R. Badeau, in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). PROJET \u2014 Spatial audio separation using projections (Institute of Electrical and Electronics Engineers (IEEE),\u00a0Shanghai, 2016), pp. 36\u201340","DOI":"10.1109\/ICASSP.2016.7471632"},{"key":"354_CR10","doi-asserted-by":"crossref","unstructured":"A. Liutkus, D. Fitzgerald, R. Badeau, in Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Cauchy nonnegative matrix factorization (Institute of Electrical and Electronics Engineers (IEEE),\u00a0New Paltz, 2015), pp. 1\u20135","DOI":"10.1109\/WASPAA.2015.7336900"},{"key":"354_CR11","doi-asserted-by":"crossref","unstructured":"J. Le Roux, J.R. Hershey, F. Weninger, in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Deep NMF for speech separation (Institute of Electrical and Electronics Engineers (IEEE),\u00a0South Brisbane, 2015), pp. 66\u201370","DOI":"10.1109\/ICASSP.2015.7177933"},{"key":"354_CR12","doi-asserted-by":"crossref","unstructured":"Y. Mitsufuji, S. Koyama, H. Saruwatari, in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Multichannel blind source separation based on non-negative tensor factorization in wavenumber domain (Institute of Electrical and Electronics Engineers (IEEE),\u00a0Shanghai, 2016), pp. 56\u201360","DOI":"10.1109\/ICASSP.2016.7471636"},{"issue":"16","key":"354_CR13","doi-asserted-by":"publisher","first-page":"4298","DOI":"10.1109\/TSP.2014.2332434","volume":"62","author":"A Liutkus","year":"2014","unstructured":"A. Liutkus, D. Fitzgerald, Z. Rafii, B. Pardo, L. Daudet, Kernel additive models for source separation. IEEE Trans. Signal Process. 
62(16), 4298\u20134310 (2014)","journal-title":"IEEE Trans. Signal Process."},{"issue":"3","key":"354_CR14","doi-asserted-by":"publisher","first-page":"550","DOI":"10.1109\/TASL.2009.2031510","volume":"18","author":"A Ozerov","year":"2010","unstructured":"A. Ozerov, C. Fevotte, Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation. IEEE Trans. Audio Speech Lang. Process. 18(3), 550\u2013563 (2010)","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"354_CR15","doi-asserted-by":"crossref","unstructured":"A. Liutkus, D. Fitzgerald, Z. Rafii, in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Scalable audio separation with light kernel additive modelling (Institute of Electrical and Electronics Engineers (IEEE),\u00a0South Brisbane, 2015), pp. 76\u201380","DOI":"10.1109\/ICASSP.2015.7177935"},{"key":"354_CR16","doi-asserted-by":"crossref","unstructured":"C. Van Der Malsburg, in\u00a0Brain Theory,\u00a0ed. by G. Palm, A.\u00a0Aertsen.\u00a0Frank Rosenblatt: Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, (Springer Berlin Heidelberg,\u00a0Berlin, Heidelberg, 1986), pp. 245\u2013248","DOI":"10.1007\/978-3-642-70911-1_20"},{"key":"354_CR17","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1007\/BF00344251","volume":"36","author":"K Fukushima","year":"1980","unstructured":"K. Fukushima, Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 36, 193\u2013202 (1980)","journal-title":"Biol. Cybern."},{"issue":"6088","key":"354_CR18","doi-asserted-by":"publisher","first-page":"533","DOI":"10.1038\/323533a0","volume":"323","author":"DE Rumelhart","year":"1986","unstructured":"D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors. 
Nature 323(6088), 533\u2013536 (1986)","journal-title":"Nature"},{"key":"354_CR19","doi-asserted-by":"crossref","unstructured":"A.A. Nugraha, A. Liutkus, E. Vincent, in Proc. of 24th European Signal Processing Conference (EUSIPCO). Multichannel music separation with deep neural networks (Institute of Electrical and Electronics Engineers (IEEE),\u00a0Budapest, 2016), pp. 1748\u20131752","DOI":"10.1109\/EUSIPCO.2016.7760548"},{"key":"354_CR20","doi-asserted-by":"crossref","unstructured":"S. Uhlich, F. Giron, Y. Mitsufuji, in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Deep neural network based instrument extraction from music (Institute of Electrical and Electronics Engineers (IEEE),\u00a0South Brisbane, 2015), pp. 2135\u20132139","DOI":"10.1109\/ICASSP.2015.7178348"},{"key":"354_CR21","doi-asserted-by":"crossref","unstructured":"N. Takahashi, Y. Mitsufuji, in Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Multi-scale multi-band densenets for audio source separation (Institute of Electrical and Electronics Engineers (IEEE),\u00a0New Paltz, 2017), pp. 21\u201325","DOI":"10.1109\/WASPAA.2017.8169987"},{"key":"354_CR22","doi-asserted-by":"crossref","unstructured":"S. Uhlich, M. Porcu, F. Giron, M. Enenkl, T. Kemp, N. Takahashi, Y. Mitsufuji, in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Improving music source separation based on deep neural networks through data augmentation and network blending (Institute of Electrical and Electronics Engineers (IEEE),\u00a0New Orleans, 2017), pp. 261\u2013265","DOI":"10.1109\/ICASSP.2017.7952158"},{"key":"354_CR23","doi-asserted-by":"crossref","unstructured":"N. Takahashi, N. Goswami, Y. Mitsufuji, in Proc. of IWAENC. 
MMDenseLSTM: An efficient combination of convolutional and recurrent neural networks for audio source separation (2018)","DOI":"10.1109\/IWAENC.2018.8521383"},{"key":"354_CR24","doi-asserted-by":"publisher","first-page":"1667","DOI":"10.21105\/joss.01667","volume":"4","author":"FR St\u00f6ter","year":"2019","unstructured":"F.R. St\u00f6ter, S. Uhlich, A. Liutkus, Y. Mitsufuji, Open-Unmix - A reference implementation for music source separation. J. Open Source Softw. 4, 1667 (2019)","journal-title":"J. Open Source Softw."},{"key":"354_CR25","unstructured":"J.H. Kim, J. Yoo, S. Chun, A. Kim, J.W. Ha, Multi-domain processing via hybrid denoising networks for speech enhancement. arXiv\u00a0(2018)"},{"key":"354_CR26","doi-asserted-by":"publisher","unstructured":"J. Su, Z. Jin, A. Finkelstein, in Proc. of Interspeech. HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks (International Speech Communication Association (ISCA),\u00a0Shanghai, 2020), pp.\u00a04506\u20134510. https:\/\/doi.org\/10.21437\/Interspeech.2020-2143","DOI":"10.21437\/Interspeech.2020-2143"},{"key":"354_CR27","doi-asserted-by":"publisher","unstructured":"N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series: With Engineering Applications (The MIT Press, USA, 1949). https:\/\/doi.org\/10.7551\/mitpress\/2946.001.0001","DOI":"10.7551\/mitpress\/2946.001.0001"},{"key":"354_CR28","unstructured":"J. Lee, Y. Jung, M. Jung, H. Kim, in Proc. of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). Dynamic noise embedding: Noise aware training and adaptation for speech enhancement (Institute of Electrical and Electronics Engineers (IEEE),\u00a0Auckland, 2020), pp. 739\u2013746"},{"key":"354_CR29","doi-asserted-by":"crossref","unstructured":"H. Fang, G. Carbajal, S. Wermter, T. Gerkmann, in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 
Variational autoencoder for speech enhancement with a noise-aware encoder (Institute of Electrical and Electronics Engineers (IEEE), Toronto, 2021), pp. 676\u2013680","DOI":"10.1109\/ICASSP39728.2021.9414060"},{"key":"354_CR30","doi-asserted-by":"crossref","unstructured":"R. Sawata, S. Uhlich, S. Takahashi, Y. Mitsufuji, in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). All for one and one for all: Improving music separation by bridging networks (Institute of Electrical and Electronics Engineers (IEEE),\u00a0Toronto, 2021), pp. 51\u201355","DOI":"10.1109\/ICASSP39728.2021.9414044"},{"key":"354_CR31","unstructured":"A. D\u00e9fossez, in Proc. of the International Society for Music Information Retrieval (ISMIR) Conference Workshop on Music Source Separation. Hybrid Spectrogram and Waveform Source Separation (International Society for Music Information Retrieval (ISMIR), 2021)"},{"key":"354_CR32","doi-asserted-by":"crossref","unstructured":"S. Rouard, F. Massa, A. D\u00e9fossez, in\u00a0Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).\u00a0Hybrid transformers for music source separation\u00a0(IEEE,\u00a0Rhodes Island, 2023), pp. 1\u20135","DOI":"10.1109\/ICASSP49357.2023.10096956"},{"key":"354_CR33","doi-asserted-by":"crossref","unstructured":"Y. Luo, J. Yu, Music source separation with band-split rnn. IEEE\/ACM Trans. Audio Speech Lang. Process. (2023)","DOI":"10.1109\/TASLP.2023.3271145"},{"key":"354_CR34","doi-asserted-by":"publisher","unstructured":"N. Takahashi, Y. Mitsufuji, in Proc. of IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Densely connected multidilated convolutional networks for dense prediction tasks (2021), pp. 993\u20131002. https:\/\/doi.org\/10.1109\/CVPR46437.2021.00105","DOI":"10.1109\/CVPR46437.2021.00105"},{"key":"354_CR35","unstructured":"N. Takahashi, Y. 
Mitsufuji, D3Net:\u00a0Densely connected multidilated DenseNet for music source separation.\u00a0CoRR. abs\/2010.01733 (2020). https:\/\/arxiv.org\/abs\/2010.01733"},{"key":"354_CR36","doi-asserted-by":"crossref","unstructured":"F.R. St\u00f6ter, A. Liutkus, N. Ito, in\u00a0Proc. of International Conference on Latent Variable Analysis and Signal Separation. The 2018 signal separation evaluation campaign (Springer International Publishing,\u00a0Guildford, 2018), pp. 293\u2013305","DOI":"10.1007\/978-3-319-93764-9_28"},{"key":"354_CR37","doi-asserted-by":"crossref","unstructured":"F. Llu\u00eds, J. Pons, X. Serra, in Proc. of Interspeech. End-to-end music source separation: Is it possible in the waveform domain? (International Speech Communication Association (ISCA),\u00a0Graz, 2019), pp. 4619\u20134623","DOI":"10.21437\/Interspeech.2019-1177"},{"key":"354_CR38","unstructured":"D. Stoller, S. Ewert, S. Dixon, in Proc. of the 19th International Society for Music Information Retrieval (ISMIR) Conference. Wave-U-Net: A multi-scale neural network for end-to-end audio source separation (International Society for Music Information Retrieval (ISMIR),\u00a0Paris, 2018), pp. 334\u2013340"},{"key":"354_CR39","unstructured":"A. D\u00e9fossez, N. Usunier, L. Bottou, F. Bach, Demucs: Deep Extractor for Music Sources with extra unlabeled data remixed.\u00a0CoRR. abs\/1909.01174 (2019). http:\/\/arxiv.org\/abs\/1909.01174"},{"key":"354_CR40","unstructured":"Y.N. Dauphin, A. Fan, M. Auli, D. Grangier, in Proc. of the 34th International Conference on Machine Learning (ICML), vol. 70. Language modeling with gated convolutional networks (Proceedings of Machine Learning Research (PMLR),\u00a0Sydney, 2017), pp. 933\u2013941"},{"key":"354_CR41","unstructured":"M. Kim, W. Choi, J. Chung, D. Lee, S. Jung, in Proc. of the International Society for Music Information Retrieval (ISMIR) Conference Workshop on Music Source Separation. 
KUIELab-MDX-Net: A two-stream neural network for music demixing (International Society for Music Information Retrieval (ISMIR), 2021)"},{"key":"354_CR42","unstructured":"C.Y. Yu, K.W. Cheuk, in Proc. of the International Society for Music Information Retrieval (ISMIR) Conference Workshop on Music Source Separation. Danna-Sep: Unite to separate them all (International Society for Music Information Retrieval (ISMIR), 2021)"},{"key":"354_CR43","doi-asserted-by":"crossref","unstructured":"W. Choi, M. Kim, J. Chung, S. Jung, in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). LaSAFT: Latent source attentive frequency transformation for conditioned source separation (Institute of Electrical and Electronics Engineers (IEEE),\u00a0Toronto, 2021), pp. 171\u2013175","DOI":"10.1109\/ICASSP39728.2021.9413896"},{"key":"354_CR44","unstructured":"Y.S. Jeong, J. Kim, W. Choi, J. Chung, S. Jung, in Proc. of the International Society for Music Information Retrieval (ISMIR) Conference Workshop on Music Source Separation. LightSAFT: Lightweight latent source aware frequency transform for source separation (International Society for Music Information Retrieval (ISMIR), 2021)"},{"key":"354_CR45","unstructured":"H. Liu, Q. Kong, J. Liu, in Proc. of the International Society for Music Information Retrieval (ISMIR) Conference Workshop on Music Source Separation. CWS-PResUNet: Music source separation with channel-wise subband phase-aware ResUNet (International Society for Music Information Retrieval (ISMIR), 2021)"},{"key":"354_CR46","unstructured":"W. Choi, M. Kim, J. Chung, D. Lee, S. Jung, in Proc. of the 21st International Society for Music Information Retrieval (ISMIR) Conference. Investigating U-Nets with various intermediate blocks for spectrogram-based singing voice separation (International Society for Music Information Retrieval (ISMIR),\u00a0Montreal, 2020), pp. 
192\u2013198"},{"key":"354_CR47","doi-asserted-by":"crossref","unstructured":"O. Ronneberger, P. Fischer, T. Brox, in Proc. of Medical Image Computing and Computer-Assisted Intervention (MICCAI). U-Net: Convolutional networks for biomedical image segmentation (Springer,\u00a0Munich, 2015), pp. 234\u2013241","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"354_CR48","unstructured":"Q. Kong, Y. Cao, H. Liu, K. Choi, Y. Wang, in Proc. of the 22nd International Society for Music Information Retrieval (ISMIR) Conference. Decoupling magnitude and phase estimation with deep ResUNet for music source separation (International Society for Music Information Retrieval (ISMIR), 2021), pp. 342\u2013349"},{"key":"354_CR49","doi-asserted-by":"crossref","unstructured":"N. Takahashi, P. Agrawal, N. Goswami, Y. Mitsufuji, in Proc. Interspeech. PhaseNet: Discretized phase modeling with deep neural networks for audio source separation (International Speech Communication Association (ISCA),\u00a0Hyderabad, 2018), pp. 2713\u20132717","DOI":"10.21437\/Interspeech.2018-1773"},{"key":"354_CR50","doi-asserted-by":"crossref","unstructured":"D. Yin, C. Luo, Z. Xiong, W. Zeng, in Proc. AAAI. Phasen: A phase-and-harmonics-aware speech enhancement network (AAAI Press,\u00a0New York City, 2020), pp. 9458\u20139465","DOI":"10.1609\/aaai.v34i05.6489"},{"key":"354_CR51","doi-asserted-by":"publisher","unstructured":"T. Peer, S. Welker, T. Gerkmann, in\u00a0Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).\u00a0DiffPhase: Generative Diffusion-Based STFT Phase Retrieval\u00a0(Institute of Electrical and Electronics Engineers (IEEE),\u00a0Rhodes Island, 2023), pp. 1\u20135. https:\/\/doi.org\/10.1109\/ICASSP49357.2023.10095396","DOI":"10.1109\/ICASSP49357.2023.10095396"},{"key":"354_CR52","unstructured":"H.S. Choi, J.H. Kim, J. Huh, A. Kim, J.W. Ha, K. Lee,\u00a0Phase-aware Speech Enhancement with Deep Complex\u00a0U-Net.\u00a0CoRR abs\/1903.03107 (2019). 
http:\/\/arxiv.org\/abs\/1903.03107"},{"key":"354_CR53","unstructured":"A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, L.U. Kaiser, I. Polosukhin, in Proc. of Advances in Neural Information Processing Systems (NeurIPS), ed. by I. Guyon, U.V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett. Attention is all you need, vol. 30 (Neural Information Processing Systems Foundation, Inc. (NeurIPS),\u00a0Long Beach, 2017)"},{"key":"354_CR54","doi-asserted-by":"crossref","unstructured":"J. Hu, L. Shen, G. Sun, in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Squeeze-and-excitation networks (Institute of Electrical and Electronics Engineers (IEEE),\u00a0Salt Lake City, 2018)","DOI":"10.1109\/CVPR.2018.00745"},{"key":"354_CR55","doi-asserted-by":"crossref","unstructured":"Y. Luo, Z. Chen, N. Mesgarani, T. Yoshioka, in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). End-to-end microphone permutation and number invariant multi-channel speech separation (Institute of Electrical and Electronics Engineers (IEEE),\u00a0Barcelona, 2020), pp. 6394\u20136398","DOI":"10.1109\/ICASSP40776.2020.9054177"},{"issue":"1","key":"354_CR56","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1023\/A:1007379606734","volume":"28","author":"R Caruana","year":"1997","unstructured":"R. Caruana, Multitask learning. Mach. Learn. 28(1), 41\u201375 (1997)","journal-title":"Mach. Learn."},{"key":"354_CR57","doi-asserted-by":"publisher","unstructured":"Z. Rafii, A. Liutkus, F.R. St\u00f6ter, S.I. Mimilakis, R. Bittner. The MUSDB18 corpus for music separation (2017). https:\/\/doi.org\/10.5281\/zenodo.1117372","DOI":"10.5281\/zenodo.1117372"},{"key":"354_CR58","unstructured":"D.P. Kingma, J. Ba, in Proc. of 3rd International Conference on Learning Representations (ICLR), ed. by Y. Bengio, Y. LeCun. 
Adam: A method for stochastic optimization (OpenReview.net,\u00a0San Diego, 2015)"}],"container-title":["EURASIP Journal on Audio, Speech, and Music Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-024-00354-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13636-024-00354-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-024-00354-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,19]],"date-time":"2024-07-19T04:08:45Z","timestamp":1721362125000},"score":1,"resource":{"primary":{"URL":"https:\/\/asmp-eurasipjournals.springeropen.com\/articles\/10.1186\/s13636-024-00354-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,19]]},"references-count":58,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["354"],"URL":"https:\/\/doi.org\/10.1186\/s13636-024-00354-6","relation":{},"ISSN":["1687-4722"],"issn-type":[{"value":"1687-4722","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,19]]},"assertion":[{"value":"26 December 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 May 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 July 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing 
interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"39"}}