{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,20]],"date-time":"2026-04-20T23:55:16Z","timestamp":1776729316259,"version":"3.51.2"},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"19","license":[{"start":{"date-parts":[[2023,3,20]],"date-time":"2023-03-20T00:00:00Z","timestamp":1679270400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,3,20]],"date-time":"2023-03-20T00:00:00Z","timestamp":1679270400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Appl Intell"],"published-print":{"date-parts":[[2023,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In many real data science problems, it is common to encounter a domain mismatch between the training and testing datasets, which means that solutions designed for one may not transfer well to the other due to their differences. An example of such was in the BirdCLEF2021 Kaggle competition, where participants had to identify all bird species that could be heard in audio recordings. Thus, multi-label classifiers, capable of coping with domain mismatch, were required. In addition, classifiers needed to be resilient to a long-tailed (imbalanced) class distribution and weak labels. Throughout the competition, a diverse range of solutions based on convolutional neural networks were proposed. However, it is unclear how different solution components contribute to overall performance. In this work, we contextualise the problem with respect to the previously existing literature, analysing and discussing the choices made by the different participants. We also propose a modular solution architecture to empirically quantify the effects of different architectures. 
The results of this study provide insights into which components worked well for this challenge.<\/jats:p>","DOI":"10.1007\/s10489-023-04486-8","type":"journal-article","created":{"date-parts":[[2023,3,27]],"date-time":"2023-03-27T02:10:47Z","timestamp":1679883047000},"page":"21485-21499","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Identifying bird species by their calls in Soundscapes"],"prefix":"10.1007","volume":"53","author":[{"given":"Kyle","family":"Maclean","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0150-0651","authenticated-orcid":false,"given":"Isaac","family":"Triguero","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,3,20]]},"reference":[{"issue":"3","key":"4486_CR1","doi-asserted-by":"publisher","first-page":"606","DOI":"10.1007\/s10618-016-0483-9","volume":"31","author":"A Bagnall","year":"2017","unstructured":"Bagnall A, Lines J, Bostrom A (2017) The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Min Knowl Disc 31(3):606\u2013660","journal-title":"Data Min Knowl Disc"},{"key":"4486_CR2","doi-asserted-by":"publisher","first-page":"54,789","DOI":"10.1109\/ACCESS.2020.2979074","volume":"8","author":"JJ Bird","year":"2020","unstructured":"Bird JJ, Kobylarz J, Faria DR et al (2020) Cross-domain MLP and CNN transfer learning for biological signal processing: EEG and EMG. IEEE Access 8:54,789\u201354,801","journal-title":"IEEE Access"},{"key":"4486_CR3","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1016\/j.neunet.2018.07.011","volume":"106","author":"M Buda","year":"2018","unstructured":"Buda M, Maki A, Mazurowski MA (2018) A systematic study of the class imbalance problem in convolutional neural networks. 
Neural Netw 106:249\u2013259","journal-title":"Neural Netw"},{"issue":"6","key":"4486_CR4","doi-asserted-by":"publisher","first-page":"1291","DOI":"10.1109\/TASLP.2017.2690575","volume":"25","author":"E Cakir","year":"2017","unstructured":"Cakir E, Parascandolo G, Heittola T (2017) Convolutional recurrent neural networks for polyphonic sound event detection. IEEE\/ACM Transactions on Audio, Speech and Language Processing 25(6):1291\u20131303","journal-title":"IEEE\/ACM Transactions on Audio, Speech and Language Processing"},{"key":"4486_CR5","doi-asserted-by":"publisher","first-page":"e14","DOI":"10.1017\/ATSIP.2014.12","volume":"3","author":"S Chachada","year":"2014","unstructured":"Chachada S, Kuo CCJ (2014) Environmental sound recognition: a survey. APSIPA Transactions on Signal and Information Processing 3:e14","journal-title":"APSIPA Transactions on Signal and Information Processing"},{"key":"4486_CR6","doi-asserted-by":"crossref","unstructured":"Chen L, Gunduz S, Ozsu MT (2006) Mixed type audio classification with support vector machine. In: 2006 IEEE international conference on multimedia and expo, pp 781\u2013784","DOI":"10.1109\/ICME.2006.262954"},{"key":"4486_CR7","doi-asserted-by":"crossref","unstructured":"Dandashi A, AlJaam J (2017) A survey on audio content-based classification. In: 2017 International conference on computational science and computational intelligence (CSCI), pp 408\u2013413","DOI":"10.1109\/CSCI.2017.69"},{"key":"4486_CR8","unstructured":"Dosovitskiy A, Beyer L, Kolesnikov A et al (2021) An image is worth 16x16 words: transformers for image recognition at scale. In: International conference on learning representations (ICLR 21)"},{"issue":"5","key":"4486_CR9","doi-asserted-by":"publisher","first-page":"815","DOI":"10.3390\/app8050815","volume":"8","author":"W Feng","year":"2018","unstructured":"Feng W, Huang W, Ren J (2018) Class imbalance ensemble learning based on the margin theory. 
Appl Sci 8(5):815","journal-title":"Appl Sci"},{"key":"4486_CR10","doi-asserted-by":"crossref","unstructured":"Fern\u00e1ndez A, Garc\u00eda S, Galar M et al (2018) Learning from imbalanced data streams. In: Fern\u00e1ndez A, Garc\u00eda S, Galar M (eds) Learning from imbalanced data sets. Springer International Publishing, Cham, pp 279\u2013303","DOI":"10.1007\/978-3-319-98074-4_11"},{"key":"4486_CR11","unstructured":"Gouyon F, Pachet F, Delerue O (2000) On the use of zero-crossing rate for an application of classification of percussive sounds. Proceedings of the COST G-6 Conference on Digital Audio Effects"},{"key":"4486_CR12","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1016\/j.patcog.2017.10.013","volume":"77","author":"J Gu","year":"2018","unstructured":"Gu J, Wang Z, Kuen J et al (2018) Recent advances in convolutional neural networks. Pattern Recogn 77:354\u2013377","journal-title":"Pattern Recogn"},{"key":"4486_CR13","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR.2016.90"},{"key":"4486_CR14","doi-asserted-by":"publisher","first-page":"915","DOI":"10.1016\/j.asoc.2017.09.027","volume":"62","author":"AD Ignatov","year":"2018","unstructured":"Ignatov AD (2018) Real-time human activity recognition from accelerometer data using convolutional neural networks. Appl Soft Comput 62:915\u2013922","journal-title":"Appl Soft Comput"},{"issue":"4","key":"4486_CR15","doi-asserted-by":"publisher","first-page":"917","DOI":"10.1007\/s10618-019-00619-1","volume":"33","author":"H Ismail Fawaz","year":"2019","unstructured":"Ismail Fawaz H, Forestier G, Weber J (2019) Deep learning for time series classification: a review. 
Data Min Knowl Disc 33(4):917\u2013963","journal-title":"Data Min Knowl Disc"},{"key":"4486_CR16","unstructured":"Schl\u00fcter J (2021) Learning to monitor Birdcalls from weakly-labeled focused recordings. CEUR Workshop Proceedings 2936(CLEF 2021 Working Notes)"},{"key":"4486_CR17","unstructured":"Puget J-F (2021) STFT transformers for bird song recognition. CEUR Workshop Proceedings 2936(CLEF 2021 Working Notes)"},{"issue":"9","key":"4486_CR18","doi-asserted-by":"publisher","first-page":"10,082","DOI":"10.1007\/s10489-021-02926-x","volume":"52","author":"J Li","year":"2022","unstructured":"Li J, Pedrycz W, Gacek A (2022) Time series reconstruction and classification: a comprehensive comparative study. Appl Intell 52(9):10,082\u201310,097","journal-title":"Appl Intell"},{"key":"4486_CR19","unstructured":"Lin L, Xu B, Wu W et al (2019) Medical time series classification with hierarchical attention-based temporal convolutional networks: a case study of myotonic dystrophy diagnosis. In: IEEE conference on computer vision and pattern recognition workshops, CVPR workshops 2019, Long Beach, CA, USA, June 16-20, 2019, pp 83\u201386"},{"issue":"11","key":"4486_CR20","doi-asserted-by":"publisher","first-page":"7955","DOI":"10.1109\/TPAMI.2021.3119334","volume":"44","author":"W Liu","year":"2022","unstructured":"Liu W, Wang H, Shen X et al (2022) The emerging trends of multi-label learning. IEEE Trans Pattern Anal Mach Intell 44(11):7955\u20137974","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"4486_CR21","unstructured":"Shugaev MV, Tanahashi N, Dhingra P (2021) BirdCLEF 2021: building a birdcall segmentation model based on weak labels. 
CEUR Workshop Proceedings 2936(CLEF 2021 Working Notes)"},{"key":"4486_CR22","doi-asserted-by":"publisher","first-page":"101,909","DOI":"10.1016\/j.ecoinf.2022.101909","volume":"72","author":"G Morales","year":"2022","unstructured":"Morales G, Vargas V, Espejo D et al (2022) Method for passive acoustic monitoring of bird communities using UMAP and a deep neural network. Eco Inform 72:101,909","journal-title":"Eco Inform"},{"issue":"5","key":"4486_CR23","doi-asserted-by":"publisher","first-page":"340","DOI":"10.1007\/s42979-021-00735-0","volume":"2","author":"A Mumuni","year":"2021","unstructured":"Mumuni A, Mumuni F (2021) CNN architectures for geometric transformation-invariant feature representation in computer vision: a review. SN Computer Science 2(5):340","journal-title":"SN Computer Science"},{"key":"4486_CR24","doi-asserted-by":"crossref","unstructured":"Musaev M, Khujayorov I, Ochilov M (2020) Image Approach to Speech Recognition on CNN. In: Proceedings of the 2019 3rd international symposium on computer science and intelligent control. Association for Computing Machinery, New York, NY, USA, ISCSIC 2019, pp 1\u20136","DOI":"10.1145\/3386164.3389100"},{"key":"4486_CR25","unstructured":"Murakami N, Tanaka H, Nishimori M (2021) Birdcall identification using CNN and gradient boosting decision trees with weak and noisy supervision. CEUR Workshop Proceedings 2936(CLEF 2021 Working Notes)"},{"key":"4486_CR26","doi-asserted-by":"publisher","first-page":"101,093","DOI":"10.1016\/j.ecoinf.2020.101093","volume":"58","author":"J Qin","year":"2020","unstructured":"Qin J, Pan W, Xiang X (2020) A biological image classification method based on improved CNN. 
Eco Inform 58:101,093","journal-title":"Eco Inform"},{"issue":"11","key":"4486_CR27","doi-asserted-by":"publisher","first-page":"2000","DOI":"10.1109\/LSP.2015.2451591","volume":"22","author":"E Singer","year":"2015","unstructured":"Singer E, Reynolds DA (2015) Domain mismatch compensation for speaker recognition using a library of whiteners. IEEE Signal Process Lett 22(11):2000\u20132003","journal-title":"IEEE Signal Process Lett"},{"key":"4486_CR28","volume-title":"Spectral Audio Signal Processing","author":"JO Smith","year":"2011","unstructured":"Smith JO (2011) Spectral Audio Signal Processing. Stanford University, CCRMA"},{"issue":"3","key":"4486_CR29","doi-asserted-by":"publisher","first-page":"1552","DOI":"10.1007\/s10489-020-01878-y","volume":"51","author":"L Sun","year":"2021","unstructured":"Sun L, Lyu G, Feng S, et al. (2021) Beyond missing: weakly-supervised multi-label learning with incomplete and noisy labels. Appl Intell 51(3):1552\u20131564","journal-title":"Appl Intell"},{"key":"4486_CR30","doi-asserted-by":"publisher","first-page":"107,965","DOI":"10.1016\/j.patcog.2021.107965","volume":"118","author":"AN Tarekegn","year":"2021","unstructured":"Tarekegn AN, Giacobini M, Michalak K (2021) A review of methods for imbalanced multi-label classification. Pattern Recogn 118:107,965","journal-title":"Pattern Recogn"},{"issue":"1","key":"4486_CR31","doi-asserted-by":"publisher","first-page":"792","DOI":"10.1038\/s41467-022-27980-y","volume":"13","author":"D Tuia","year":"2022","unstructured":"Tuia D, Kellenberger B, Beery S et al (2022) Perspectives in machine learning for wildlife conservation. Nat Commun 13(1):792","journal-title":"Nat Commun"},{"key":"4486_CR32","doi-asserted-by":"crossref","unstructured":"Wang T, Li Y, Kang B (2020) The devil is in classification: a simple framework for long-tail instance segmentation. In: Computer vision \u2013 ECCV 2020. 
Springer International Publishing, Cham, pp 728\u2013744","DOI":"10.1007\/978-3-030-58568-6_43"},{"key":"4486_CR33","doi-asserted-by":"crossref","unstructured":"Yang Y, Liu X (1999) A re-examination of text categorization methods. In: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR \u201999. ACM Press, Berkeley California, United States, pp 42\u201349","DOI":"10.1145\/312624.312647"},{"key":"4486_CR34","doi-asserted-by":"crossref","unstructured":"Zhang H, Wu C, Zhang Z et al (2022) Resnest: Split-attention networks. In: 2022 IEEE\/CVF conference on computer vision and pattern recognition workshops (CVPRW), pp 2735\u20132745","DOI":"10.1109\/CVPRW56347.2022.00309"},{"key":"4486_CR35","unstructured":"Zhang Y, Kang B, Hooi B et al (2021) Deep long-tailed learning: a survey. https:\/\/doi.org\/10.48550\/arXiv.2110.04596"},{"key":"4486_CR36","unstructured":"Zhang Z, Sabuncu M (2020) Self-Distillation as instance-specific label smoothing. In: 34th Conference on neural information processing systems (NeurIPS 2020), Vancouver, Canada"},{"issue":"11","key":"4486_CR37","doi-asserted-by":"publisher","first-page":"1751","DOI":"10.3390\/f13111751","volume":"13","author":"Y Zhao","year":"2022","unstructured":"Zhao Y, Xu S, Huang Z, et al. (2022) Temporal and spatial characteristics of Soundscape ecology in urban forest areas and its landscape spatial influencing factors. Forests 13(11):1751","journal-title":"Forests"},{"issue":"1","key":"4486_CR38","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1093\/nsr\/nwx106","volume":"5","author":"ZH Zhou","year":"2017","unstructured":"Zhou ZH (2017) A brief introduction to weakly supervised learning. 
Natl Sci Rev 5(1):44\u201353","journal-title":"Natl Sci Rev"}],"container-title":["Applied Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-023-04486-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10489-023-04486-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-023-04486-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,18]],"date-time":"2023-10-18T13:10:25Z","timestamp":1697634625000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10489-023-04486-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,20]]},"references-count":38,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2023,10]]}},"alternative-id":["4486"],"URL":"https:\/\/doi.org\/10.1007\/s10489-023-04486-8","relation":{},"ISSN":["0924-669X","1573-7497"],"issn-type":[{"value":"0924-669X","type":"print"},{"value":"1573-7497","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,20]]},"assertion":[{"value":"23 January 2023","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 March 2023","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no competing interests to declare that are relevant to the content of this article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}