{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T07:27:03Z","timestamp":1740122823054,"version":"3.37.3"},"reference-count":68,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2018,2,28]],"date-time":"2018-02-28T00:00:00Z","timestamp":1519776000000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Multimed Tools Appl"],"published-print":{"date-parts":[[2019,2]]},"DOI":"10.1007\/s11042-018-5797-8","type":"journal-article","created":{"date-parts":[[2018,2,27]],"date-time":"2018-02-27T20:34:48Z","timestamp":1519763688000},"page":"2703-2718","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["SATIN: a persistent musical database for music information retrieval and a supporting deep learning experiment on song instrumental classification"],"prefix":"10.1007","volume":"78","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5143-2335","authenticated-orcid":false,"given":"Yann","family":"Bayle","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Matthias","family":"Robine","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pierre","family":"Hanna","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2018,2,28]]},"reference":[{"key":"5797_CR1","unstructured":"Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, Kudlur M, Levenberg J, Monga R, Moore S, Murray D G, Steiner B, Tucker P, Vasudevan V, Warden P, Wicke M, Yu Y, Zheng X (2016) Tensorflow: a system for large-scale machine learning. In: Proceedings of the 12th USENIX symposium on operating system design implementation, vol 16, pp 265\u2013 283"},{"key":"5797_CR2","unstructured":"Bayle Y, Hanna P, Robine M (2016) Classification \u00e0 grande \u00e9chelle de morceaux de musique en fonction de la pr\u00e9sence de chant. In: Journ\u00e9es d\u2019informatique musicale, Albi, France, pp 144\u2013152"},{"issue":"4","key":"5797_CR3","doi-asserted-by":"publisher","first-page":"858","DOI":"10.1109\/TPAMI.2010.208","volume":"33","author":"J Bekios-Calfa","year":"2011","unstructured":"Bekios-Calfa J, Buenaposada J M, Baumela L (2011) Revisiting linear discriminant techniques in gender recognition. IEEE Trans Pattern Anal Mach Intell 33(4):858\u2013864","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"issue":"2","key":"5797_CR4","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1109\/72.279181","volume":"5","author":"Y Bengio","year":"1994","unstructured":"Bengio Y, Simard P, Frasconi P (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 5(2):157\u2013166","journal-title":"IEEE Trans Neural Netw"},{"key":"5797_CR5","unstructured":"Bertin-Mahieux T, Ellis D P W, Whitman B, Lamere P (2011) The million song dataset. In: Proceedings of the 12th international society for music information retrieval conference, Miami, FL, USA, pp 591\u2013596"},{"key":"5797_CR6","unstructured":"Bittner R M, Salamon J, Tierney M, Mauch M, Cannam C, Bello J P (2014) MedleyDB: a multitrack dataset for annotation-intensive MIR research. In: Proceedings of the 15th international society for music information retrieval conference, Taipei, Taiwan, pp 155\u2013160"},{"issue":"4","key":"5797_CR7","doi-asserted-by":"publisher","first-page":"687","DOI":"10.1109\/TMM.2011.2125784","volume":"13","author":"D Bogdanov","year":"2011","unstructured":"Bogdanov D, Serr\u00e0 J, Wack N, Herrera P, Serra X (2011) Unifying low-level and high-level music similarity measures. IEEE Trans Multimedia 13(4):687\u2013701","journal-title":"IEEE Trans Multimedia"},{"key":"5797_CR8","unstructured":"Bogdanov D, Wack N, G\u00f3mez E, Gulati S, Herrera P, Mayor O, Roma G, Salomon J, Zapata J R, Serra X (2013) Essentia: an audio analysis library for music information retrieval. In: Proceedings of the 14th international society for music information retrieval conference, Curitiba, Brazil, pp 493\u2013 498"},{"key":"5797_CR9","doi-asserted-by":"crossref","unstructured":"Cheng Z, Shen J (2014) Just-for-me: an adaptive personalization system for location-aware social music recommendation. In: Proceedings of international conference on multimedia retrieval. ACM, p 185","DOI":"10.1145\/2578726.2578751"},{"key":"5797_CR10","unstructured":"Choi K, Fazekas G, Sandler M, Kim J (2015) Auralisation of deep convolutional neural networks: Listening to learned features. In: Proceedings of the 16th international society for music information retrieval conference, pp 26\u201330"},{"key":"5797_CR11","unstructured":"Choi K, Fazekas G, Sandler M B (2016) Automatic tagging using deep convolutional neural networks. In: Proceedings of the 17th international society for music information retrieval conference, New York, NY, USA, pp 805\u2013811"},{"key":"5797_CR12","unstructured":"Choi K, Fazekas G, Cho K, Sandler M (2017) A comparison on audio signal preprocessing methods for deep neural networks on music tagging. arXiv: 1709.01922"},{"key":"5797_CR13","unstructured":"Chollet F (2015) Keras: deep learning library for theano and tensorflow. Tech. Rep"},{"issue":"1","key":"5797_CR14","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1109\/TIT.1967.1053964","volume":"13","author":"T Cover","year":"1967","unstructured":"Cover T, Hart P E (1967) Nearest neighbor pattern classification. IEEE Trans Inform Theory 13(1):21\u201327","journal-title":"IEEE Trans Inform Theory"},{"key":"5797_CR15","unstructured":"Defferrard M, Benzi K, Vandergheynst P, Bresson X (2017) Fma: a dataset for music analysis. In: Proceedings of the 18th international society for music information retrieval conference"},{"key":"5797_CR16","doi-asserted-by":"crossref","unstructured":"Eronen A, Klapuri A (2000) Musical instrument recognition using cepstral coefficients and temporal features. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing, vol 2. IEEE, pp II753\u2013II756","DOI":"10.1109\/ICASSP.2000.859069"},{"key":"5797_CR17","doi-asserted-by":"crossref","unstructured":"Fern\u00e1ndez C, Huerta I, Prati A (2015) A comparative evaluation of regression learning algorithms for facial age estimation. In: Ji Q, Moeslund T, Hua G, Nasrollahi K (eds) Face and facial expression recognition from real world videos. Springer, Cham, pp 133\u2013144","DOI":"10.1007\/978-3-319-13737-7_12"},{"key":"5797_CR18","unstructured":"Foote J T (1997) Content-based retrieval of music and audio. In: Multimedia storage and archiving systems II, international society for optics and photonics, vol 3229, pp 138\u2013148"},{"issue":"526","key":"5797_CR19","first-page":"1","volume":"2","author":"A Ghosal","year":"2013","unstructured":"Ghosal A, Chakraborty R, Dhara B C, Saha S K (2013) A hierarchical approach for speech-instrumental-song classification. SpringerPlus 2(526):1\u201311","journal-title":"SpringerPlus"},{"key":"5797_CR20","unstructured":"Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: Proceedings of the 14th international conference on artificial intelligence and statistics, pp 315\u2013323"},{"key":"5797_CR21","unstructured":"Goto M, Hashiguchi H, Nishimura T, Oka R (2002) RWC music database: popular, classical and jazz music databases. In: Proceedings of the 3rd international conference on music information retrieval, Paris, France, pp 287\u2013288"},{"key":"5797_CR22","unstructured":"Gouyon F, Sturm B L, Oliveira J L, Hespanhol N, Langlois T (2014) On evaluation validity in music autotagging. arXiv: 1410.0001"},{"key":"5797_CR23","unstructured":"Hennequin R, Moussallam M (2015) Detection and characterization of singing voice using deep neural networks. Tech. rep., Deezer"},{"key":"5797_CR24","doi-asserted-by":"crossref","unstructured":"Hershey S, Chaudhuri S, Ellis D P W, Gemmeke J F, Jansen A, Moore R C, Plakal M, Platt D, Saurous R A, Seybold B, Slaney M, Weiss R J, Wilson K (2017) Cnn architectures for large-scale audio classification. In: ICASSP. IEEE, pp 131\u2013135","DOI":"10.1109\/ICASSP.2017.7952132"},{"key":"5797_CR25","unstructured":"Hespanhol N (2013) Using autotagging for classification of vocals in music signals. PhD Thesis, University of Porto, Portugal"},{"key":"5797_CR26","unstructured":"Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning, pp 448\u2013456"},{"key":"5797_CR27","unstructured":"Jeon B, Kim C, Kim A, Kim D, Park J, Ha J W (2017) Music emotion recognition via end-to-end multimodal neural networks. In: RECSYS"},{"key":"5797_CR28","unstructured":"Kim Y E, Whitman B (2002) Singer identification in popular music recordings using voice coding features. In: Proceedings of the 3rd international conference on music information retrieval, Paris, France, pp 17\u201323"},{"key":"5797_CR29","unstructured":"Krizhevsky A, Sutskever I, Hinton G E (2012) Imagenet classification with deep convolutional neural networks. In: Pereira F, Burges C J C, Bottou L, Weinberger K Q (eds) Proceedings of the 25th conference on advances neural information processing systems. Curran Associates, Inc., pp 1097\u20131105"},{"key":"5797_CR30","unstructured":"Law E, West K, Mandel M I, Bay M, Downie J S (2009) Evaluation of algorithms using games: the case of music tagging. In: Proceedings of the 10th international society for music information retrieval conference, Kobe, Japan, pp 387\u2013392"},{"key":"5797_CR31","doi-asserted-by":"crossref","unstructured":"Leglaive S, Hennequin R, Badeau R (2015) Singing voice detection with deep recurrent neural networks. In: Proceedings of the 40th IEEE international conference on acoustics, speech, and signal processing, Brisbane, Australia, pp 121\u2013125","DOI":"10.1109\/ICASSP.2015.7177944"},{"key":"5797_CR32","unstructured":"Lehner B, Widmer G (2015) Monaural blind source separation in the context of vocal detection. In: Proceedings of the 16th international society for music information retrieval conference, pp 309\u2013315"},{"key":"5797_CR33","doi-asserted-by":"crossref","unstructured":"Lehner B, Widmer G, B\u00f6ck S (2015) A low-latency, real-time-capable singing voice detection method with LSTM recurrent neural networks. In: Proceedings of the 23rd european signal processing conference, Nice, France, pp 21\u201325","DOI":"10.1109\/EUSIPCO.2015.7362337"},{"key":"5797_CR34","doi-asserted-by":"publisher","DOI":"10.1002\/9781118393550","volume-title":"An introduction to audio content analysis: applications in signal processing and music informatics","author":"A Lerch","year":"2012","unstructured":"Lerch A (2012) An introduction to audio content analysis: applications in signal processing and music informatics. Wiley, New York"},{"issue":"16","key":"5797_CR35","doi-asserted-by":"publisher","first-page":"4298","DOI":"10.1109\/TSP.2014.2332434","volume":"62","author":"A Liutkus","year":"2014","unstructured":"Liutkus A, Fitzgerald D, Rafii Z, Pardo B, Daudet L (2014) Kernel additive models for source separation. IEEE Trans Signal Process 62(16):4298\u20134310","journal-title":"IEEE Trans Signal Process"},{"key":"5797_CR36","unstructured":"Livshin A, Rodet X (2003) The importance of cross database evaluation in sound classification. In: Proceedings of the 4th international conference on music information retrieval, Baltimore, MD, USA, pp 1\u20132"},{"issue":"4","key":"5797_CR37","doi-asserted-by":"publisher","first-page":"658","DOI":"10.1109\/TITB.2012.2193408","volume":"16","author":"M Llamedo","year":"2012","unstructured":"Llamedo M, Khawaja A, Martinez J P (2012) Cross-database evaluation of a multilead heartbeat classifier. IEEE Trans Inf Technol Biomed 16(4):658\u2013664","journal-title":"IEEE Trans Inf Technol Biomed"},{"key":"5797_CR38","doi-asserted-by":"crossref","unstructured":"Lyu Q, Wu Z, Zhu J (2015) Polyphonic music modelling with lstm-rtrbm. In: Proceedings of the 23rd ACM international conference on multimedia. ACM, pp 991\u2013994","DOI":"10.1145\/2733373.2806383"},{"key":"5797_CR39","unstructured":"Marques G, Domingues M A, Langlois T, Gouyon F (2011) Three current issues in music autotagging. In: Proceedings of the 12th international society for music information retrieval conference, Miami, FL, USA, pp 795\u2013800"},{"key":"5797_CR40","unstructured":"Mathieu B, Essid S, Fillon T, Prado J, Richard G (2010) YAAFE, an easy to use and efficient audio feature extraction software. In: Proceedings of the 11th international society for music information retrieval conference, Utrecht, Netherlands, pp 441\u2013446"},{"key":"5797_CR41","unstructured":"McEnnis D, McKay C, Fujinaga I (2006) Overview of OMEN. In: Proceedings of the 7th international conference on music information retrieval, Victoria, BC, Canada, pp 7\u201312"},{"key":"5797_CR42","doi-asserted-by":"crossref","unstructured":"McFee B, Raffel C, Liang D, Ellis D P W, McVicar M, Battenberg E, Nieto O (2015) Librosa Audio and music signal analysis in python. In: Proceedings of the 14th python in science conference, pp 18\u201325","DOI":"10.25080\/Majora-7b98e3ed-003"},{"key":"5797_CR43","volume-title":"An introduction to the psychology of hearing","author":"BCJ Moore","year":"2012","unstructured":"Moore B C J (2012) An introduction to the psychology of hearing. Brill, Leiden"},{"key":"5797_CR44","unstructured":"Ng A Y (1997) Preventing \u201coverfitting\u201d of cross-validation data. In: Proceedings of the 14th international conference on machine learning, Nashville, TN, USA, pp 245\u2013253"},{"key":"5797_CR45","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay \u00c9 (2011) Scikit-learn: machine learning in Python. J Mach Learning Res 12:2825\u20132830","journal-title":"J Mach Learning Res"},{"key":"5797_CR46","volume-title":"Fundamentals of speech recognition","author":"LR Rabiner","year":"1993","unstructured":"Rabiner L R, Juang B H (1993) Fundamentals of speech recognition. PTR Prentice Hall, Englewood Cliffs"},{"key":"5797_CR47","doi-asserted-by":"crossref","unstructured":"Raina R, Madhavan A, Ng A Y (2009) Large-scale deep unsupervised learning using graphics processors. In: Proceedings of the 26th annual international conference on machine learning. ACM, pp 873\u2013880","DOI":"10.1145\/1553374.1553486"},{"key":"5797_CR48","doi-asserted-by":"crossref","unstructured":"Ramona M, Richard G, David B (2008) Vocal detection in music with support vector machines. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing, Las Vegas, NV, USA, pp 1885\u20131888","DOI":"10.1109\/ICASSP.2008.4518002"},{"key":"5797_CR49","unstructured":"Rocamora M, Herrera P (2007) Comparing audio descriptors for singing voice detection in music audio files. In: Proceedings of the 11th Brazilian symposium on computer music, San Pablo, Brazil, vol 26, p 27"},{"key":"5797_CR50","unstructured":"Roma G, Grais E M, Simpson A J, Plumbley M D (2016) Singing voice separation using deep neural networks and f0 estimation. In: MIREX"},{"key":"5797_CR51","unstructured":"Schl\u00fcter J (2016) Learning to pinpoint singing voice from weakly labeled examples. In: Proceedings of the 17th international society for music information retrieval conference, New York, NY, USA, pp 44\u201350"},{"key":"5797_CR52","unstructured":"Schl\u00fcter J, Grill T (2015) Exploring data augmentation for improved singing voice detection with neural networks. In: Proceedings of the 16th international society for music information retrieval conference, M\u00e1laga, Spain, pp 121\u2013126"},{"key":"5797_CR53","doi-asserted-by":"crossref","unstructured":"Shen J, Meng W, Yan S, Pang H, Hua X (2010) Effective music tagging through advanced statistical modeling. In: Proceedings of the 33rd international ACM SIGIR conference on research and development in information retrieval. ACM, pp 635\u2013642","DOI":"10.1145\/1835449.1835555"},{"key":"5797_CR54","doi-asserted-by":"crossref","unstructured":"Shen J, Pang H, Wang M, Yan S (2012) Modeling concept dynamics for large scale music search. In: Proceedings of the 35th international ACM SIGIR conference on research and development in information retrieval. ACM, pp 455\u2013464","DOI":"10.1145\/2348283.2348346"},{"key":"5797_CR55","unstructured":"Silla C N Jr, Koerich A L, Kaestner C A A (2008) The latin music database. In: Proceedings of the 9th international conference on music information retrieval, pp 451\u2013456"},{"issue":"1","key":"5797_CR56","first-page":"1929","volume":"15","author":"N Srivastava","year":"2014","unstructured":"Srivastava N, Hinton G E, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929\u20131958","journal-title":"J Mach Learn Res"},{"issue":"2","key":"5797_CR57","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1080\/09298215.2014.894533","volume":"43","author":"BL Sturm","year":"2014","unstructured":"Sturm B L (2014) The state of the art ten years after a state of the art: Future research in music information retrieval. Journal of New Music Research 43(2):147\u2013172","journal-title":"Journal of New Music Research"},{"key":"5797_CR58","unstructured":"Sturm B L (2015) Faults in the latin music database and with its use. In: Proceedings of the late breaking demo 16th international society for music information retrieval conference, M\u00e1laga, Spain, pp 1\u20132"},{"key":"5797_CR59","doi-asserted-by":"crossref","unstructured":"Tachibana H, Ono T, Ono N, Sagayama S (2010) Melody line estimation in homophonic music audio signals based on temporal-variability of melodic source. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing. IEEE, pp 425\u2013428","DOI":"10.1109\/ICASSP.2010.5495764"},{"issue":"2","key":"5797_CR60","doi-asserted-by":"publisher","first-page":"467","DOI":"10.1109\/TASL.2007.913750","volume":"16","author":"D Turnbull","year":"2008","unstructured":"Turnbull D, Barrington L, Torres D, Lanckriet G (2008) Semantic annotation and retrieval of music and sound effects. IEEE Trans Audio Speech Lang Process 16 (2):467\u2013476","journal-title":"IEEE Trans Audio Speech Lang Process"},{"issue":"3","key":"5797_CR61","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1017\/S1355771800003071","volume":"4","author":"G Tzanetakis","year":"2000","unstructured":"Tzanetakis G, Cook P (2000) Marsyas: a framework for audio analysis. Organised Sound 4(3):169\u2013175","journal-title":"Organised Sound"},{"key":"5797_CR62","doi-asserted-by":"crossref","unstructured":"Valin JM (2017) A hybrid dsp\/deep learning approach to real-time full-band speech enhancement. Tech. rep","DOI":"10.1109\/MMSP.2018.8547084"},{"key":"5797_CR63","unstructured":"Velarde G (2017) Convolutional methods for music analysis. PhD Thesis, Aalborg Universitetsforlag"},{"key":"5797_CR64","doi-asserted-by":"crossref","unstructured":"Wang X, Wang Y (2014) Improving content-based and hybrid music recommendation using deep learning. In: Proceedings of the 22nd ACM international conference on multimedia. ACM, pp 627\u2013636","DOI":"10.1145\/2647868.2654940"},{"key":"5797_CR65","unstructured":"West K, Cox S (2004) Features and classifiers for the automatic classification of musical audio signals. In: Proceedings of the 5th international conference on music information retrieval"},{"key":"5797_CR66","unstructured":"Yoshii K, Goto M, Komatani K, Ogata T, Okuno H G (2007) Improving efficiency and scalability of model-based music recommender system based on incremental training. In: Proceedings of the 8th international conference on music information retrieval, Vienna, Austria, pp 89\u201394"},{"key":"5797_CR67","volume-title":"The HTK book, vol 3","author":"S Young","year":"2002","unstructured":"Young S, Evermann G, Gales M, Hain T, Kershaw D, Liu X, Moore G, Odell J, Ollason D, Povey D et al (2002) The HTK book, vol 3. Cambridge University Engineering Department, Cambridge"},{"key":"5797_CR68","doi-asserted-by":"crossref","unstructured":"Zhao Z, Wang X, Xiang Q, Sarroff A M, Li Z, Wang Y (2010) Large-scale music tag recommendation with explicit multiple attributes. In: Proceedings of the 18th ACM international conference on multimedia. ACM, pp 401\u2013410","DOI":"10.1145\/1873951.1874006"}],"container-title":["Multimedia Tools and Applications"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s11042-018-5797-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s11042-018-5797-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s11042-018-5797-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,10,11]],"date-time":"2019-10-11T20:23:49Z","timestamp":1570825429000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s11042-018-5797-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,2,28]]},"references-count":68,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2019,2]]}},"alternative-id":["5797"],"URL":"https:\/\/doi.org\/10.1007\/s11042-018-5797-8","relation":{},"ISSN":["1380-7501","1573-7721"],"issn-type":[{"type":"print","value":"1380-7501"},{"type":"electronic","value":"1573-7721"}],"subject":[],"published":{"date-parts":[[2018,2,28]]},"assertion":[{"value":"18 October 2017","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 January 2018","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 February 2018","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 February 2018","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}