{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T10:29:52Z","timestamp":1775471392279,"version":"3.50.1"},"reference-count":31,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2021,11,13]],"date-time":"2021-11-13T00:00:00Z","timestamp":1636761600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100005089","name":"Beijing Municipal Natural Science Foundation","doi-asserted-by":"publisher","award":["6214040"],"award-info":[{"award-number":["6214040"]}],"id":[{"id":"10.13039\/501100005089","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","award":["2021ZY70"],"award-info":[{"award-number":["2021ZY70"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Deep convolutional neural networks (DCNNs) have achieved breakthrough performance on bird species identification using a spectrogram of bird vocalization. Aiming at the imbalance of the bird vocalization dataset, a single feature identification model (SFIM) with residual blocks and modified, weighted, cross-entropy function was proposed. To further improve the identification accuracy, two multi-channel fusion methods were built with three SFIMs. One of these fused the outputs of the feature extraction parts of three SFIMs (feature fusion mode), the other fused the outputs of the classifiers of three SFIMs (result fusion mode). The SFIMs were trained with three different kinds of spectrograms, which were calculated through short-time Fourier transform, mel-frequency cepstrum transform and chirplet transform, respectively. 
To overcome the shortage of the huge number of trainable model parameters, transfer learning was used in the multi-channel models. Using our own vocalization dataset as a sample set, it is found that the result fusion mode model outperforms the other proposed models, the best mean average precision (MAP) reaches 0.914. Choosing three durations of spectrograms, 100 ms, 300 ms and 500 ms for comparison, the results reveal that the 300 ms duration is the best for our own dataset. The duration is suggested to be determined based on the duration distribution of bird syllables. As for the performance with the training dataset of BirdCLEF2019, the highest classification mean average precision (cmAP) reached 0.135, which means the proposed model has certain generalization ability.<\/jats:p>","DOI":"10.3390\/e23111507","type":"journal-article","created":{"date-parts":[[2021,11,14]],"date-time":"2021-11-14T20:48:36Z","timestamp":1636922916000},"page":"1507","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["Bird Species Identification Using Spectrogram Based on Multi-Channel Fusion of DCNNs"],"prefix":"10.3390","volume":"23","author":[{"given":"Feiyu","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Technology, Beijing Forestry University, Beijing 100083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Luyang","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Technology, Beijing Forestry University, Beijing 100083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongxiang","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Technology, Beijing Forestry University, Beijing 100083, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1367-324X","authenticated-orcid":false,"given":"Jiangjian","family":"Xie","sequence":"additional","affiliation":[{"name":"School of Technology, Beijing Forestry University, Beijing 100083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,11,13]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/jav.01447","article-title":"Automated birdsong recognition in complex acoustic environments: A review","volume":"49","author":"Priyadarshani","year":"2018","journal-title":"J. Avian Biol."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"295","DOI":"10.1016\/0022-5193(61)90032-7","article-title":"The Analysis of Animal Communication","volume":"1","author":"Green","year":"1961","journal-title":"J. Theor. Biol."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Graciarena, M., Delplanch, M., Shriberg, E., and Stolcke, A. (2011, January 22\u201327). Bird species recognition combining acoustic and sequence modeling. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Prague, Czech.","DOI":"10.1109\/ICASSP.2011.5946410"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1016\/j.ecolind.2015.02.023","article-title":"Towards the automated detection and occupancy estimation of primates using passive acoustic monitoring","volume":"54","author":"Kalan","year":"2015","journal-title":"Ecol. 
Indic."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"901","DOI":"10.1111\/ibi.12728","article-title":"Vocal activity rate index: A useful method to infer terrestrial bird abundance with acoustic monitoring","volume":"161","author":"Giralt","year":"2019","journal-title":"Ibis"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"2412","DOI":"10.3906\/elk-1808-208","article-title":"Spatial-aware global contrast representation for saliency detection","volume":"27","author":"Dan","year":"2019","journal-title":"Turk. J. Electr. Eng. Comput. Sci."},{"key":"ref_7","first-page":"634","article-title":"A Deep Neural Network Approach to the LifeCLEF 2014 bird task","volume":"1180","author":"Koops","year":"2014","journal-title":"LifeClef Work. Notes"},{"key":"ref_8","first-page":"1","article-title":"Recognizing Bird Species in Audio Recordings Using Deep Convolutional Neural Networks","volume":"1609","author":"Piczak","year":"2016","journal-title":"CEUR Workshop Proc."},{"key":"ref_9","unstructured":"Toth, B.P., and Czeba, B. (2016, January 5\u20138). Convolutional Neural Networks for Large-Scale Bird Song Classification in Noisy Environment. Proceedings of the Conference and Labs of the Evaluation Forum, \u00c9vora, Portugal."},{"key":"ref_10","unstructured":"Sprengel, E., Jaggi, M., Kilcher, Y., and Hofmann, T. (2016, January 5\u20138). Audio Based Bird Species Identification using Deep Learning Techniques. Proceedings of the CEUR Workshop, Evora, Portugal."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Cakir, E., Adavanne, S., Parascandolo, G., Drossos, K., and Virtanen, T. (September, January 28). Convolutional recurrent neural networks for bird audio detection. Proceedings of the 2017 25th European Signal Processing Conference (EUSIPCO), Kos, Greek Island.","DOI":"10.23919\/EUSIPCO.2017.8081508"},{"key":"ref_12","unstructured":"Agnes, I., Henrietta-Bernadett, J., Zoltan, S., Attila, F., and Csaba, S. (2018, January 13\u201315). 
Bird sound recognition using a convolutional neural network. Proceedings of the 2018 IEEE 16th International Symposium on Intelligent Systems and Informatics (SISY), Subotica, Serbia."},{"key":"ref_13","first-page":"122","article-title":"Bird species recognition method based on Chirplet spectrogram feature and deep learning","volume":"40","author":"Xie","year":"2018","journal-title":"J. Beijing For. Univ."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"41062","DOI":"10.1109\/ACCESS.2020.2973243","article-title":"High accuracy individual identification model of crested ibis (Nipponia Nippon) based on autoencoder with self-attention","volume":"8","author":"Xie","year":"2020","journal-title":"IEEE Access"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"101236","DOI":"10.1016\/j.ecoinf.2021.101236","article-title":"BirdNET: A deep learning solution for avian diversity monitoring","volume":"61","author":"Kahl","year":"2021","journal-title":"Ecol. Inform."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"107866","DOI":"10.1016\/j.apacoust.2020.107866","article-title":"Multileveled ternary pattern and iterative ReliefF based bird sound classification","volume":"176","author":"Turker","year":"2021","journal-title":"Appl. Acoust."},{"key":"ref_17","first-page":"26","article-title":"Survey on transfer learning research","volume":"26","author":"Zhuang","year":"2015","journal-title":"J. Softw."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1716","DOI":"10.3906\/elk-1910-171","article-title":"Human activity recognition by using MHIs of frame sequences","volume":"28","author":"Zebhi","year":"2020","journal-title":"Turk. J. Electr. Eng. Comput. Sci."},{"key":"ref_19","first-page":"1","article-title":"Audio Bird Classification with Inception-v4 extended with Time and Time-Frequency Attention Mechanisms","volume":"1866","author":"Antoine","year":"2017","journal-title":"LifeClef Work. 
Notes"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.apacoust.2014.01.001","article-title":"Automatic bird sound detection in long real-field recordings: Applications and tools","volume":"80","author":"Potamitis","year":"2014","journal-title":"Appl. Acoust."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"731","DOI":"10.1109\/78.747779","article-title":"A four-parameter atomic decomposition of chirplets","volume":"47","author":"Bultan","year":"2002","journal-title":"IEEE Trans. Signal Process."},{"key":"ref_22","unstructured":"Glotin, H., Ricard, J., and Balestriero, R. (2016). Fast Chirplet Transform to Enhance CNN Machine Listening\u2014Validation on Animal calls and Speech. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"3195","DOI":"10.3906\/elk-1901-48","article-title":"A comparative study on handwritten Bangla character recognition","volume":"27","author":"Rizvi","year":"2019","journal-title":"Turk. J. Electr. Eng. Comput. Sci."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"917","DOI":"10.3906\/elk-1905-42","article-title":"An automated eye disease recognition system from visual content of facial images using machine learning techniques","volume":"28","author":"Akram","year":"2020","journal-title":"Turk. J. Electr. Eng. Comput. Sci."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"2395","DOI":"10.3906\/elk-1808-130","article-title":"Elimination of useless images from raw camera-trap data","volume":"27","author":"Tekeli","year":"2019","journal-title":"Turk. J. Electr. Eng. Comput. 
Sci."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Xie, J., Li, A., Zhang, J., and Cheng, Z. (2019). An Integrated Wildlife Recognition Model Based on Multi-Branch Aggregation and Squeeze-And-Excitation Network. Appl. Sci., 9.","DOI":"10.3390\/app9142794"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1016\/j.neucom.2019.01.090","article-title":"A novel scene classification model combining ResNet based transfer learning and data augmentation with a filter","volume":"338","author":"Liu","year":"2019","journal-title":"Neurocomputing"},{"key":"ref_29","unstructured":"Kingma, D., and Ba, J. (2015, January 7\u20139). Adam: A Method for Stochastic Optimization. Proceedings of the International Conference on Learning Representations, San Diego, CA, USA."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Cai, Z., Fan, Q., Feris, R.S., and Vasconcelos, N. (2016, January 11\u201314). A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection. Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46493-0_22"},{"key":"ref_31","unstructured":"Kahl, S., Stoter, F.R., Goeau, H., Glotin, H., Planque, R., Vellinga, W.P., and Joly, A. (2019, November 04). Overview of BirdCLEF 2019: Large-Scale Bird Recognition in Soundscapes. Technical Report for 2019BirdCLEF Challenge. 
Available online: https:\/\/hal.umontpellier.fr\/hal-02345644\/document."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/11\/1507\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:29:47Z","timestamp":1760167787000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/23\/11\/1507"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,13]]},"references-count":31,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2021,11]]}},"alternative-id":["e23111507"],"URL":"https:\/\/doi.org\/10.3390\/e23111507","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,11,13]]}}}