{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T20:58:46Z","timestamp":1774645126325,"version":"3.50.1"},"reference-count":50,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2022,9,14]],"date-time":"2022-09-14T00:00:00Z","timestamp":1663113600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,9,14]],"date-time":"2022-09-14T00:00:00Z","timestamp":1663113600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["NRF-2021R1A2C2006895"],"award-info":[{"award-number":["NRF-2021R1A2C2006895"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Circuits Syst Signal Process"],"published-print":{"date-parts":[[2023,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Music source separation has traditionally followed the encoder-decoder paradigm (e.g., hourglass, U-Net, DeconvNet, SegNet) to isolate individual music components from mixtures. Such networks, however, result in a loss of location-sensitivity, as low-resolution representation drops the useful harmonic patterns over the temporal dimension. We overcame this problem by performing singing voice separation using a high-resolution representation learning (HRNet) system coupled with a long short-term memory (LSTM) module to retain high-resolution feature map and capture the temporal behavior of the acoustic signal. We called this joint combination of HRNet and LSTM as HR-LSTM. 
The predicted spectrograms produced by this system are close to ground truth and successfully separate music sources, achieving results superior to those realized by past methods. The proposed network was tested using four datasets (DSD100, MIR-1K, Korean <jats:italic>Pansori<\/jats:italic>, and Nepal Idol singing voice). Our experiments confirmed that the proposed HR-LSTM outperforms state-of-the-art networks at singing voice separation when the DSD100 dataset is used, performs comparably to alternative methods when the MIR-1K dataset is used, and separates the voice and accompaniment components well when the <jats:italic>Pansori<\/jats:italic> and NISVS datasets are used. In addition to proposing and validating our network, we also developed and shared our Nepal Idol dataset.<\/jats:p>","DOI":"10.1007\/s00034-022-02166-5","type":"journal-article","created":{"date-parts":[[2022,9,14]],"date-time":"2022-09-14T17:03:33Z","timestamp":1663175013000},"page":"1083-1104","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["High-Resolution Representation Learning and Recurrent Neural Network for Singing Voice Separation"],"prefix":"10.1007","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7014-4868","authenticated-orcid":false,"given":"Bhuwan","family":"Bhattarai","sequence":"first","affiliation":[]},{"given":"Yagya Raj","family":"Pandeya","sequence":"additional","affiliation":[]},{"given":"You","family":"Jie","sequence":"additional","affiliation":[]},{"given":"Arjun Kumar","family":"Lamichhane","sequence":"additional","affiliation":[]},{"given":"Joonwhoan","family":"Lee","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,9,14]]},"reference":[{"key":"2166_CR1","doi-asserted-by":"publisher","first-page":"206016","DOI":"10.1109\/ACCESS.2020.3037773","volume":"8","author":"B Bhuwan","year":"2020","unstructured":"B. Bhuwan, R.P. Yagya, L. 
Joonwhoan, Parallel stacked hourglass network for music source separation. IEEE Access 8, 206016\u2013206027 (2020). https:\/\/doi.org\/10.1109\/ACCESS.2020.3037773","journal-title":"IEEE Access"},{"issue":"5","key":"2166_CR2","doi-asserted-by":"publisher","first-page":"2604","DOI":"10.1121\/1.4948445","volume":"139","author":"J Chen","year":"2016","unstructured":"J. Chen, Y. Wang et al., Large-scale training to increase speech intelligibility for hearing-impaired listeners in novel noises. J. Acoust. Soc. Am. 139(5), 2604\u20132612 (2016). https:\/\/doi.org\/10.1121\/1.4948445","journal-title":"J. Acoust. Soc. Am."},{"key":"2166_CR3","doi-asserted-by":"publisher","unstructured":"C. P. Dadula, E. P. Dadios, A genetic algorithm for blind source separation based on independent component analysis, in 2014 International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM), pp. 1\u20136. IEEE. https:\/\/doi.org\/10.1109\/HNICEM.2014.7016226","DOI":"10.1109\/HNICEM.2014.7016226"},{"key":"2166_CR4","doi-asserted-by":"publisher","unstructured":"C. Donahue, J. McAuley, M. Puckette, Adversarial audio synthesis, ICLR 2019. https:\/\/doi.org\/10.48550\/arXiv.1802.04208.","DOI":"10.48550\/arXiv.1802.04208"},{"key":"2166_CR5","doi-asserted-by":"publisher","unstructured":"Z.C. Fan, J.S.R. Jang, C.L. Lu, Singing voice separation and pitch extraction from monaural polyphonic audio music via DNN and adaptive pitch tracking, in IEEE International Conference on Multimedia Big Data (2016). https:\/\/doi.org\/10.1109\/BigMM.2016.56","DOI":"10.1109\/BigMM.2016.56"},{"key":"2166_CR6","doi-asserted-by":"publisher","first-page":"992","DOI":"10.1109\/TNN.2005.849840","volume":"16","author":"P Georgiev","year":"2005","unstructured":"P. Georgiev, F. Theis, A. Cichocki, Sparse component analysis and blind source separation of underdetermined mixtures. IEEE Trans. Neural Netw. 16, 992\u2013996 (2005). 
https:\/\/doi.org\/10.1109\/TNN.2005.849840","journal-title":"IEEE Trans. Neural Netw."},{"key":"2166_CR7","unstructured":"E. G\u00f3mez, F. Canadas, J. Salamon, J. Bonada, P. Vera, P. Cabanas, Predominant fundamental frequency estimation vs singing voice separation for the automatic transcription of accompanied flamenco singing, in 13th International Society for Music Information Retrieval Conference (ISMIR 2012)."},{"key":"2166_CR8","doi-asserted-by":"publisher","unstructured":"E.M. Grais, M.D. Plumbley, Single channel audio source separation using convolutional denoising autoencoders, in Proceedings of the IEEE GlobalSIP Symposium on Sparse Signal Processing and Deep Learning, 5th IEEE Global Conference on Signal and Information Processing (GlobalSIP 2017), 14\u201316 Nov. Montreal, Canada. https:\/\/doi.org\/10.1109\/GlobalSIP.2017.8309164","DOI":"10.1109\/GlobalSIP.2017.8309164"},{"key":"2166_CR9","doi-asserted-by":"publisher","unstructured":"E.M. Grais, D. Ward, M.D. Plumbley, Raw multi-channel audio source separation using multiresolution convolutional auto-encoders, in 26th European Signal Processing Conference (EUSIPCO), 2018. https:\/\/doi.org\/10.23919\/EUSIPCO.2018.8553571","DOI":"10.23919\/EUSIPCO.2018.8553571"},{"key":"2166_CR10","doi-asserted-by":"publisher","unstructured":"K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, CA, USA, 27\u201330 June 2016; pp. 770\u2013778. https:\/\/doi.org\/10.48550\/arXiv.1512.03385","DOI":"10.48550\/arXiv.1512.03385"},{"key":"2166_CR11","doi-asserted-by":"publisher","DOI":"10.3390\/app10051727","author":"WH Heo","year":"2020","unstructured":"W.H. Heo, H. Kim, O.W. Kwon, Source separation using dilated time-frequency DenseNet for music identification in broadcast contents. Appl. Sci. (2020). https:\/\/doi.org\/10.3390\/app10051727","journal-title":"Appl. 
Sci."},{"key":"2166_CR12","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2009.2026503","author":"CL Hsu","year":"2010","unstructured":"C.L. Hsu, J.S.R. Jang, On the improvement of singing voice separation for monaural recordings using MIR-1K dataset. IEEE Trans. Audio Speech Lang. Process. (2010). https:\/\/doi.org\/10.1109\/TASL.2009.2026503","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"2166_CR13","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1016\/S0893-6080(00)00026-5","volume":"13","author":"A Hyv\u00e4rinen","year":"2000","unstructured":"A. Hyv\u00e4rinen, E. Oja, Independent component analysis: algorithms and applications. Neural Netw. 13, 411\u2013430 (2000). https:\/\/doi.org\/10.1016\/S0893-6080(00)00026-5","journal-title":"Neural Netw."},{"key":"2166_CR14","unstructured":"A. Jansson, E. Humphrey, N. Montecchio, R. Bittner, A. Kumar, T. Weyde, Singing voice separation with deep U-Net convolutional networks, in 18th International Society for Music Information Retrieval Conferencing, Suzhou, China (2017)."},{"issue":"4","key":"2166_CR15","doi-asserted-by":"publisher","first-page":"2379","DOI":"10.1121\/1.2839887","volume":"123","author":"K Kokkinakis","year":"2008","unstructured":"K. Kokkinakis, P.C. Loizou, Using blind source separation techniques to improve speech recognition in bilateral cochlear implant patients. J. Acoust. Soc. Am. 123(4), 2379\u20132390 (2008). https:\/\/doi.org\/10.1121\/1.2839887","journal-title":"J. Acoust. Soc. Am."},{"key":"2166_CR16","unstructured":"D.D. Lee, H.S. Seung, Algorithms for non-negative matrix factorization, in Proceedings of the Neural Information Processing Systems (NIPS), Vancouver, BC, Canada, 3\u20138 December 2001; pp. 556\u2013562."},{"key":"2166_CR17","doi-asserted-by":"publisher","unstructured":"J.L. LeRoux, J.R. Hershey, F.J. Weninger, Deep NMF for speech separation, in Proceedings of ICASSP, 2015, p. 6670. 
https:\/\/doi.org\/10.1109\/ICASSP.2015.7177933","DOI":"10.1109\/ICASSP.2015.7177933"},{"key":"2166_CR18","unstructured":"K.W.E. Lin, H. Anderson, M.H.M. Hamzeen, S. Lui, Implementation and evaluation of real-time interactive user interface design in self-learning singing pitch training apps, in Joint Proceedings of International Computer Music Conference (ICMC) and Sound and Music Computing Conference (SMC) 2014. http:\/\/hdl.handle.net\/2027\/spo.bbp2372.2014.257"},{"key":"2166_CR19","doi-asserted-by":"publisher","unstructured":"K.W.E. Lin, H. Anderson, N. Agus, C. So, S. Lui, Visualising singing style under common musical events using pitch-dynamics trajectories and modified traclus clustering, in International conference on machine learning and applications (ICMLA), pp 237\u2013242 (2014). https:\/\/doi.org\/10.1109\/ICMLA.2014.44","DOI":"10.1109\/ICMLA.2014.44"},{"key":"2166_CR20","doi-asserted-by":"publisher","unstructured":"K. W. E. Lin, T. Feng, N. Agus, C. So, S. Lui, Modelling mutual information between voiceprint and optimal number of mel-frequency cepstral coefficients in voice discrimination, in International conference on machine learning and applications (ICMLA), pp 15\u201320 (2014). https:\/\/doi.org\/10.1109\/ICMLA.2014.9","DOI":"10.1109\/ICMLA.2014.9"},{"key":"2166_CR21","doi-asserted-by":"publisher","unstructured":"P.M.G. Lopez, H.M. Lozano, F.L.P. Sanchez, L.N. Oliva, Blind Source Separation of audio signals using independent component analysis and wavelets, in CONIELECOMP 2011, 21st International Conference on Electrical Communications and Computers, pp. 152\u2013157. IEEE. https:\/\/doi.org\/10.1109\/CONIELECOMP.2011.5749353","DOI":"10.1109\/CONIELECOMP.2011.5749353"},{"key":"2166_CR22","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8462116","author":"Y Luo","year":"2017","unstructured":"Y. Luo, N. Mesgarani, Tasnet: time-domain audio separation network for real-time, single-channel speech separation. CoRR (2017). 
https:\/\/doi.org\/10.1109\/ICASSP.2018.8462116","journal-title":"CoRR"},{"key":"2166_CR23","doi-asserted-by":"publisher","first-page":"546047","DOI":"10.1186\/1687-4722-2010-546047","volume":"1","author":"A Mesaros","year":"2010","unstructured":"A. Mesaros, T. Virtanen, Automatic recognition of lyrics in singing. EURASIP J. Audio Speech Music Process 1, 546047 (2010)","journal-title":"EURASIP J. Audio Speech Music Process"},{"key":"2166_CR24","doi-asserted-by":"publisher","unstructured":"A.A. Nugraha, A. Liutkus, E. Vincent, Multichannel music separation with deep neural networks, in Proceedings of EUSIPCO (2015). https:\/\/doi.org\/10.1109\/EUSIPCO.2016.7760548","DOI":"10.1109\/EUSIPCO.2016.7760548"},{"issue":"10","key":"2166_CR25","doi-asserted-by":"publisher","first-page":"1652","DOI":"10.1109\/TASLP.2016.2580946","volume":"24","author":"AA Nugraha","year":"2016","unstructured":"A.A. Nugraha, A. Liutkus, E. Vincent, Multichannel audio source separation with deep neural networks. IEEE\/ACM Trans. Audio Speech Lang. Process. Inst. Electr. Electron. Eng. 24(10), 1652\u20131664 (2016). https:\/\/doi.org\/10.1109\/TASLP.2016.2580946","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process. Inst. Electr. Electron. Eng."},{"key":"2166_CR26","doi-asserted-by":"publisher","unstructured":"N. Ono, Z. Koldovsky, S. Miyabe, N. Ito, The 2013 signal separation evaluation campaign, in Proc. MLSP, pp. 1\u20136 (2013). https:\/\/doi.org\/10.1109\/MLSP.2013.6661988","DOI":"10.1109\/MLSP.2013.6661988"},{"key":"2166_CR27","unstructured":"A.V.D. Oord, S. Dieleman, et al., Wavenet. A generative model for raw audio, in Proceedings of 9th ISCA Workshop on Speech Synthesis Workshop (SSW 9), 125 (2016)."},{"issue":"5","key":"2166_CR28","doi-asserted-by":"publisher","first-page":"1564","DOI":"10.1109\/TASL.2007.899291","volume":"15","author":"A Ozerov","year":"2007","unstructured":"A. Ozerov, P. Philippe, F. Bimbot, R. 
Gribonval, Adaptation of Bayesian Models for single-channel source separation and its application to voice\/music separation in popular songs. IEEE Trans. Audio Speech Lang. Process. 15(5), 1564\u20131578 (2007). https:\/\/doi.org\/10.1109\/TASL.2007.899291","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"2166_CR29","unstructured":"S. Park, T. Kim, K. Lee, N. Kwak, Music source separation using stacked hourglass networks, in Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Paris, France, 23\u201327 September (2018), pp. 289\u2013296."},{"key":"2166_CR30","doi-asserted-by":"publisher","unstructured":"S. Pascual, A. Bonafonte, J. Serra, SEGAN: Speech enhancement generative adversarial network, in Conference of the International Speech Communication Association, INTERSPEECH (2017). https:\/\/doi.org\/10.48550\/arXiv.1703.09452","DOI":"10.48550\/arXiv.1703.09452"},{"issue":"1","key":"2166_CR31","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1109\/TASL.2012.2213249","volume":"21","author":"Z Rafii","year":"2012","unstructured":"Z. Rafii, B. Pardo, Repeating pattern extraction technique (repet): A simple method for music\/voice separation. IEEE Trans. Audio Speech Lang. Process. 21(1), 73\u201384 (2012). https:\/\/doi.org\/10.1109\/TASL.2012.2213249","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"2166_CR32","unstructured":"B. Raj, P. Smaragdis, M. Shashanka, R. Singh, Separating a foreground singer from background music, in Proceedings of International symposium on Frontiers of Research in Speech and Music (2007), pp. 8\u20139."},{"key":"2166_CR33","doi-asserted-by":"publisher","unstructured":"D. Rethage, J. Pons, X. Serra, A wavenet for speech denoising, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2018). https:\/\/doi.org\/10.1109\/ICASSP.2018.8462417","DOI":"10.1109\/ICASSP.2018.8462417"},{"key":"2166_CR34","unstructured":"J. 
Salamon, R.M. Bittner, J. Bonada, J.J. Bosch, E. G\u00f3mez, J.P. Bello, An analysis\/synthesis framework for automatic F0 annotation of multitrack datasets, in International Society for Music Information Retrieval Conference (2017)."},{"key":"2166_CR35","doi-asserted-by":"publisher","unstructured":"J. Sebastian, H. A. Murthy, Group delay based music source separation using deep recurrent neural networks, in 2016 International Conference on Signal Processing and Communications (SPCOM). IEEE, (2016), pp. 1\u20135. https:\/\/doi.org\/10.1109\/SPCOM.2016.7746672","DOI":"10.1109\/SPCOM.2016.7746672"},{"issue":"7","key":"2166_CR36","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TCYB.2021.3119199","volume":"52","author":"H Shen","year":"2022","unstructured":"H. Shen, Z. Huang, Z. Wu, J. Cao, J.H. Park, Nonfragile synchronization of BAM inertial neural networks subject to persistent dwell-time switching regularity. IEEE Trans. Cybernet. 52(7), 1 (2022). https:\/\/doi.org\/10.1109\/TCYB.2021.3119199","journal-title":"IEEE Trans. Cybernet."},{"key":"2166_CR37","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2021.3107607","author":"H Shen","year":"2021","unstructured":"H. Shen, X. Hu, J. Wang, J. Cao, W. Qian, Non-fragile synchronization for Markov jump singularly perturbed coupled neural networks subject to double-layer switching regulation. IEEE Trans. Neural Netw. Learn. Syst. Early Access (2021). https:\/\/doi.org\/10.1109\/TNNLS.2021.3107607","journal-title":"IEEE Trans. Neural Netw. Learn. Syst. Early Access"},{"key":"2166_CR38","doi-asserted-by":"publisher","unstructured":"D. Stoller, S. Ewert, S. Dixon, Wave-u-net: a multi-scale neural network for end-to-end audio source separation, in 19th International Society for Music Information Retrieval Conference (ISMIR 2018). https:\/\/doi.org\/10.48550\/arXiv.1806.03185","DOI":"10.48550\/arXiv.1806.03185"},{"key":"2166_CR39","doi-asserted-by":"publisher","unstructured":"N. Takahashi, N. Goswami, Y. 
Mitsufuji, MMDENSELSTM: an efficient combination of convolutional and recurrent neural networks for audio source separation, in Proceedings of 16th International Workshop Acoustic Signal Enhancement (IWAENC), Tokyo, Japan (2018), pp. 106\u2013110. https:\/\/doi.org\/10.1109\/IWAENC.2018.8521383","DOI":"10.1109\/IWAENC.2018.8521383"},{"key":"2166_CR40","doi-asserted-by":"publisher","unstructured":"N. Takahashi, Y. Mitsufuji, Multi-scale multi-band DenseNets for audio source separation, in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 15\u201318 October 2017, pp. 21\u201325. https:\/\/doi.org\/10.1109\/WASPAA.2017.8169987","DOI":"10.1109\/WASPAA.2017.8169987"},{"key":"2166_CR41","doi-asserted-by":"publisher","unstructured":"S. Uhlich, M. Porcu, F. Giron, M. Enenkl, T. Kemp, N. Takahashi, Y. Mitsufuji, Improving music source separation based on deep neural networks through data augmentation and network blending, in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2017), pp. 261\u2013265. https:\/\/doi.org\/10.1109\/ICASSP.2017.7952158","DOI":"10.1109\/ICASSP.2017.7952158"},{"key":"2166_CR42","doi-asserted-by":"publisher","unstructured":"S. Uhlich, F. Giron, Y. Mitsufuji, Deep neural network based instrument extraction from music, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE (2015), pp. 2135\u20132139. https:\/\/doi.org\/10.1109\/ICASSP.2015.7178348","DOI":"10.1109\/ICASSP.2015.7178348"},{"issue":"4","key":"2166_CR43","doi-asserted-by":"publisher","first-page":"1462","DOI":"10.1109\/TSA.2005.858005","volume":"14","author":"E Vincent","year":"2006","unstructured":"E. Vincent, R. Gribonval, C. Fevotte, Performance measurement in blind audio source separation. IEEE Trans. Audio Speech Lang. Process. 14(4), 1462\u20131469 (2006). 
https:\/\/doi.org\/10.1109\/TSA.2005.858005","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"2166_CR44","doi-asserted-by":"publisher","unstructured":"Y. Wang, M. Y. Kan, T. L. Nwe, A. Shenoy, J. Yin, Lyrically: automatic synchronization of acoustic musical signals and textual lyrics, in ACM International Conference on Multimedia. ACM, Cambridge, pp 212\u2013219 (2004). https:\/\/doi.org\/10.1109\/TASL.2007.911559","DOI":"10.1109\/TASL.2007.911559"},{"key":"2166_CR45","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.2983686","author":"J Wang","year":"2020","unstructured":"J. Wang, K. Sun et al., Deep high-resolution representation learning for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. (2020). https:\/\/doi.org\/10.1109\/TPAMI.2020.2983686","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"2166_CR46","doi-asserted-by":"publisher","unstructured":"F. Weninger, J. R. Hershey, J. Le Roux, B. Schuller, Discriminatively trained recurrent neural networks for single-channel speech separation, in 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE (2014), pp. 577\u2013581. https:\/\/doi.org\/10.1109\/GlobalSIP.2014.7032183","DOI":"10.1109\/GlobalSIP.2014.7032183"},{"key":"2166_CR47","unstructured":"Wikipedia, https:\/\/en.wikipedia.org\/wiki\/Idols_(franchise)"},{"key":"2166_CR48","unstructured":"Y.H. Yang, Low-rank representation of both singing voice and music accompaniment via learned dictionaries, in ISMIR, pp. 427\u2013432 (2013)"},{"key":"2166_CR49","doi-asserted-by":"publisher","unstructured":"J. R. Zapata, E. Gomez, Using voice suppression algorithms to improve beat tracking in the presence of highly predominant vocals, in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 51\u201355. IEEE. 
https:\/\/doi.org\/10.1109\/ICASSP.2013.6637607","DOI":"10.1109\/ICASSP.2013.6637607"},{"key":"2166_CR50","doi-asserted-by":"publisher","unstructured":"H. Zhang, X. Zhang, S. Nie, G. Gao, W. Liu, A pairwise algorithm for pitch estimation and speech separation using deep stacking network, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE (2015), pp. 246\u2013250. https:\/\/doi.org\/10.1109\/ICASSP.2015.7177969","DOI":"10.1109\/ICASSP.2015.7177969"}],"container-title":["Circuits, Systems, and Signal Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00034-022-02166-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00034-022-02166-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00034-022-02166-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,3]],"date-time":"2023-02-03T03:07:26Z","timestamp":1675393646000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00034-022-02166-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,14]]},"references-count":50,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,2]]}},"alternative-id":["2166"],"URL":"https:\/\/doi.org\/10.1007\/s00034-022-02166-5","relation":{},"ISSN":["0278-081X","1531-5878"],"issn-type":[{"value":"0278-081X","type":"print"},{"value":"1531-5878","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,14]]},"assertion":[{"value":"24 January 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 August 
2022","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 August 2022","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 September 2022","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}