{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,4,5]],"date-time":"2022-04-05T18:09:57Z","timestamp":1649182197386},"reference-count":33,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,3,7]],"date-time":"2022-03-07T00:00:00Z","timestamp":1646611200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,3,7]],"date-time":"2022-03-07T00:00:00Z","timestamp":1646611200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J AUDIO SPEECH MUSIC PROC."],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Polyphonic sound event detection aims to detect the types of sound events that occur in given audio clips, and their onset and offset times, in which multiple sound events may occur simultaneously. Deep learning\u2013based methods such as convolutional neural networks (CNN) achieved state-of-the-art results in polyphonic sound event detection. However, two open challenges still remain: overlap between events and prone to overfitting problem. To solve the above two problems, we proposed a capsule network-based method for polyphonic sound event detection. With so-called <jats:italic>dynamic routing<\/jats:italic>, capsule networks have the advantage of handling overlapping objects and the generalization ability to reduce overfitting. However, dynamic routing also greatly slows down the training process. In order to speed up the training process, we propose a weakly labeled polyphonic sound event detection model based on the improved capsule routing. Our proposed method is evaluated on task 4 of the DCASE 2017 challenge and compared with several baselines, demonstrating competitive results in terms of <jats:italic>F<\/jats:italic>-score and computational efficiency.<\/jats:p>","DOI":"10.1186\/s13636-022-00239-6","type":"journal-article","created":{"date-parts":[[2022,3,7]],"date-time":"2022-03-07T09:02:58Z","timestamp":1646643778000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Improved capsule routing for weakly labeled sound event detection"],"prefix":"10.1186","volume":"2022","author":[{"given":"Haitao","family":"Li","sequence":"first","affiliation":[]},{"given":"Shuguo","family":"Yang","sequence":"additional","affiliation":[]},{"given":"Wenwu","family":"Wang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,3,7]]},"reference":[{"key":"239_CR1","doi-asserted-by":"publisher","first-page":"196","DOI":"10.1109\/TCE.2011.5735502","volume":"57","author":"N Cho","year":"2011","unstructured":"N. Cho, E.K. Kim, Enhanced voice activity detection using acoustic event detection and classification. IEEE Trans. Consum. Electron. 57, 196 (2011).","journal-title":"IEEE Trans. Consum. Electron."},{"key":"239_CR2","doi-asserted-by":"crossref","unstructured":"N. C. Phuong and T. Do Dat, Sound classification for event detection: Application into medical telemonitoring, 2013 Int. Conf. Comput. Manag. Telecommun. ComManTel 2013 330 (2013).","DOI":"10.1109\/ComManTel.2013.6482415"},{"key":"239_CR3","doi-asserted-by":"publisher","first-page":"5767","DOI":"10.1007\/s00521-018-3407-3","volume":"31","author":"TK Chan","year":"2019","unstructured":"T.K. Chan, C.S. Chin, Health stages diagnostics of underwater thruster using sound features with imbalanced dataset. Neural Comput. Appl. 31, 5767 (2019).","journal-title":"Neural Comput. Appl."},{"key":"239_CR4","doi-asserted-by":"publisher","first-page":"99","DOI":"10.1016\/j.ecoinf.2017.04.003","volume":"39","author":"Z Zhao","year":"2017","unstructured":"Z. Zhao, S. Zhang, Z. Xu, K. Bellisario, N. Dai, H. Omrani, B.C. Pijanowski, Automated bird acoustic event detection and robust species classification. Ecol. Inform. 39, 99 (2017).","journal-title":"Ecol. Inform."},{"key":"239_CR5","doi-asserted-by":"crossref","unstructured":"E. Cakir, G. Parascandolo, T. Heittola, H. Huttunen, T. Virtanen, Convolutional recurrent neural networks for polyphonic sound event detection. IEEE\/ACM Trans. Audio, Speech and Lang. Proc. 25, 1291 (2017).","DOI":"10.1109\/TASLP.2017.2690575"},{"key":"239_CR6","doi-asserted-by":"crossref","unstructured":"G. Parascandolo, H. Huttunen, T. Virtanen, Recurrent neural networks for polyphonic sound event detection in real life recordings. Paper presented at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),\u00a0pp. 6440\u20136444,\u00a02016.","DOI":"10.1109\/ICASSP.2016.7472917"},{"key":"239_CR7","unstructured":"T. Heittola, A. Mesaros, A. Eronen, T. Virtanen, Audio context recognition using audio event histograms. Paper presented at the\u00a018th European Signal Processing Conference, pp. 1272\u20131276, (2010)."},{"key":"239_CR8","doi-asserted-by":"publisher","first-page":"1228","DOI":"10.1109\/JSTSP.2011.2146229","volume":"5","author":"N Degara","year":"2011","unstructured":"N. Degara, M.E.P. Davies, A. Pena, M.D. Plumbley, Onset event decoding exploiting the rhythmic structure of polyphonic music. IEEE J. Sel. Top. Signal Process. 5, 1228 (2011).","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"239_CR9","doi-asserted-by":"publisher","first-page":"482","DOI":"10.1007\/s00530-002-0065-0","volume":"8","author":"L Lu","year":"2003","unstructured":"L. Lu, H.-J. Zhang, S.Z. Li, Content-based audio classification and segmentation by using support vector machines. Multimed. Syst. 8, 482 (2003).","journal-title":"Multimed. Syst."},{"key":"239_CR10","doi-asserted-by":"publisher","first-page":"1144","DOI":"10.1109\/JSTSP.2011.2159700","volume":"5","author":"JJ Carabias-Orti","year":"2011","unstructured":"J.J. Carabias-Orti, T. Virtanen, P. Vera-Candeas, N. Ruiz-Reyes, F.J. Canadas-Quesada, Musical instrument sound multi-excitation model for non-negative spectrogram factorization. IEEE J. Sel. Top. Signal Process. 5, 1144 (2011).","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"239_CR11","doi-asserted-by":"crossref","unstructured":"K.J. Piczak, Environmental sound classification with convolutional neural networks.\u00a0Paper presented at the\u00a0IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1\u20136, (2015).","DOI":"10.1109\/MLSP.2015.7324337"},{"key":"239_CR12","unstructured":"S. Sabour, N. Frosst, G.E. Hinton, Dynamic routing between capsules. Paper presented at the Proceedings\u00a0of the 31st International Conference on Neural Information Processing Systems, pp. 3859\u20133869, (2017)."},{"key":"239_CR13","doi-asserted-by":"publisher","first-page":"741","DOI":"10.1007\/978-3-030-00934-2_82","volume":"11071 LNCS","author":"A Mobiny","year":"2018","unstructured":"A. Mobiny, H. Van Nguyen, Fast CapsNet for lung cancer screening. Lect. Notes Comput. Sci. 11071 LNCS, 741 (2018).","journal-title":"Lect. Notes Comput. Sci."},{"key":"239_CR14","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1145\/3065386","volume":"60","author":"A Krizhevsky","year":"2012","unstructured":"A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 84 (2012).","journal-title":"Commun. ACM"},{"key":"239_CR15","doi-asserted-by":"crossref","unstructured":"K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition. Paper presented at the\u00a0IEEE Conference on Computer Vision and Pattern Recognition (CVPR),\u00a0pp. 770\u2013778, (2016).","DOI":"10.1109\/CVPR.2016.90"},{"key":"239_CR16","doi-asserted-by":"crossref","unstructured":"T. Iqbal, Y. Xu, Q. Kong, W. Wang, Capsule routing for sound event detection. Paper presented at the\u00a026th European Signal Processing Conference (EUSIPCO), pp. 2255\u20132259,\u00a0 (2018).","DOI":"10.23919\/EUSIPCO.2018.8553198"},{"key":"239_CR17","doi-asserted-by":"crossref","unstructured":"Y. Liu, J. Tang, Y. Song, L. Dai, A capsule based approach for polyphonic sound event detection. Paper presented at the\u00a0Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1853\u20131857, (2018).","DOI":"10.23919\/APSIPA.2018.8659533"},{"key":"239_CR18","doi-asserted-by":"crossref","unstructured":"K.-W. Liang, Y.-H. Tseng, P.-C. Chang, Parallel capsule neural networks for sound event detection. Paper presented at the\u00a0Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1933\u20131936, (2019).","DOI":"10.1109\/APSIPAASC47483.2019.9023176"},{"key":"239_CR19","doi-asserted-by":"crossref","unstructured":"J.F. Gemmeke, D.P.W. Ellis, D. Freedman, A. Jansen, W. Lawrence, R.C. Moore, M. Plakal, M. Ritter, Audio set: An ontology and human-labeled dataset for audio events, Paper presented at the\u00a0IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 776\u2013780, (2017).","DOI":"10.1109\/ICASSP.2017.7952261"},{"key":"239_CR20","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1016\/j.artint.2013.06.003","volume":"201","author":"J Amores","year":"2013","unstructured":"J. Amores, Multiple instance classification: Review, taxonomy and comparative study. Artif. Intell. 201, 81 (2013).","journal-title":"Artif. Intell."},{"key":"239_CR21","doi-asserted-by":"crossref","unstructured":"Y. Xu, Q. Kong, W. Wang, M.D. Plumbley, Large-scale weakly supervised audio classification using gated convolutional neural network. Paper presented at the\u00a0IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 121\u2013125, (2018).","DOI":"10.1109\/ICASSP.2018.8461975"},{"key":"239_CR22","volume-title":"End-to-end deep convolutional neural network with multi-scale structure for weakly labeled sound event detection","author":"S Lee","year":"2019","unstructured":"S. Lee, M. Kim, Y. Jeong, End-to-end deep convolutional neural network with multi-scale structure for weakly labeled sound event detection (2019)."},{"key":"239_CR23","doi-asserted-by":"publisher","first-page":"2180","DOI":"10.1109\/TASLP.2018.2858559","volume":"26","author":"B McFee","year":"2018","unstructured":"B. McFee, J. Salamon, J.P. Bello, Adaptive pooling operators for weakly labeled sound event detection. IEEE\/ACM Trans. Audio, Speech, Lang. Process. 26, 2180 (2018).","journal-title":"IEEE\/ACM Trans. Audio, Speech, Lang. Process."},{"key":"239_CR24","first-page":"1925","volume":"28","author":"I Mart\u00edn-Morat\u00f3","year":"2020","unstructured":"I. Mart\u00edn-Morat\u00f3, M. Cobos, F.J. Ferri, Adaptive distance-based pooling in convolutional neural networks for audio event classification. IEEE\/ACM Trans. Audio, Speech, Lang. Process. 28, 1925 (2020).","journal-title":"IEEE\/ACM Trans. Audio, Speech, Lang. Process."},{"key":"239_CR25","doi-asserted-by":"crossref","unstructured":"S. Liu, F. Yang, Y. Cao, and J. Yang, Frequency-dependent auto-pooling function for weakly supervised sound event detection, EURASIP J. Audio, Speech, Music Process. 2021, 19 (2021).","DOI":"10.1186\/s13636-021-00206-7"},{"key":"239_CR26","doi-asserted-by":"publisher","first-page":"6466","DOI":"10.1109\/TII.2020.2964117","volume":"16","author":"R Huang","year":"2020","unstructured":"R. Huang, J. Li, S. Wang, G. Li, W. Li, A Robust weight-shared capsule network for intelligent machinery fault diagnosis. IEEE Trans. Ind. Informatics 16, 6466 (2020).","journal-title":"IEEE Trans. Ind. Informatics"},{"key":"239_CR27","first-page":"85","volume-title":"Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017)","author":"A Mesaros","year":"2017","unstructured":"A. Mesaros, T. Heittola, A. Diment, B.M. Elizalde, A. Shah, E. Vincent, B. Raj, T. Virtanen, DCASE 2017 Challenge setup: Tasks, datasets and baseline system, in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017),\u00a0pp. 85\u201392, (2017)."},{"key":"239_CR28","unstructured":"Y.-C. Wu, P.-C. Chang, C.-Y. Wang, J.-C. Wang,\u00a0Asymmetrie kernel convolutional neural network for acoustic scenes classification (2017).\u00a0Paper presented at the\u00a0IEEE International Symposium on Consumer Electronics (ISCE),\u00a0pp. 11\u201312, (2017)."},{"key":"239_CR29","doi-asserted-by":"crossref","unstructured":"Y. Xu, Q. Kong, Q. Huang, W. Wang, M.D. Plumbley, Convolutional gated recurrent neural network incorporating spatial features for audio tagging. Paper presented at the\u00a0International Joint Conference on Neural Networks (IJCNN),\u00a0pp. 3461\u20133466, (2017).","DOI":"10.1109\/IJCNN.2017.7966291"},{"key":"239_CR30","doi-asserted-by":"crossref","unstructured":"A. Mesaros, T. Heittola, T. Virtanen, Metrics for polyphonic sound event detection. Appl. Sci. 6 (2016).","DOI":"10.3390\/app6060162"},{"key":"239_CR31","unstructured":"S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in Proceedings of the 32nd International Conference on International Conference on Machine Learning (ICML), Lille, France, pp. 448\u2013456, (2015)."},{"key":"239_CR32","unstructured":"P. Molchanov, S. Tyree, T. Karras, T. Aila, J. Kautz,\u00a0Pruning convolutional neural networks for resource efficient inference. Paper presented at the\u00a05th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings,\u00a0pp. 1\u201317, (2017)."},{"key":"239_CR33","unstructured":"G. E. Hinton, S. Sabour, and N. Frosst, Matrix capsules with EM routing, in ICLR (Poster) (OpenReview.net, 2018)."}],"container-title":["EURASIP Journal on Audio, Speech, and Music Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-022-00239-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13636-022-00239-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-022-00239-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,3,7]],"date-time":"2022-03-07T09:17:21Z","timestamp":1646644641000},"score":1,"resource":{"primary":{"URL":"https:\/\/asmp-eurasipjournals.springeropen.com\/articles\/10.1186\/s13636-022-00239-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,7]]},"references-count":33,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,12]]}},"alternative-id":["239"],"URL":"https:\/\/doi.org\/10.1186\/s13636-022-00239-6","relation":{},"ISSN":["1687-4722"],"issn-type":[{"value":"1687-4722","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,7]]},"assertion":[{"value":"30 May 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 February 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 March 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"5"}}