{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T15:32:19Z","timestamp":1778599939389,"version":"3.51.4"},"reference-count":69,"publisher":"MDPI AG","issue":"19","license":[{"start":{"date-parts":[[2023,9,27]],"date-time":"2023-09-27T00:00:00Z","timestamp":1695772800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China","award":["62066042"],"award-info":[{"award-number":["62066042"]}]},{"name":"National Natural Science Foundation of China","award":["2022YFG0045"],"award-info":[{"award-number":["2022YFG0045"]}]},{"name":"National Natural Science Foundation of China","award":["2022SCU12008"],"award-info":[{"award-number":["2022SCU12008"]}]},{"name":"Key Research and Development Program of Sichuan","award":["62066042"],"award-info":[{"award-number":["62066042"]}]},{"name":"Key Research and Development Program of Sichuan","award":["2022YFG0045"],"award-info":[{"award-number":["2022YFG0045"]}]},{"name":"Key Research and Development Program of Sichuan","award":["2022SCU12008"],"award-info":[{"award-number":["2022SCU12008"]}]},{"name":"Fundamental Research Funds for the Central Universities","award":["62066042"],"award-info":[{"award-number":["62066042"]}]},{"name":"Fundamental Research Funds for the Central Universities","award":["2022YFG0045"],"award-info":[{"award-number":["2022YFG0045"]}]},{"name":"Fundamental Research Funds for the Central Universities","award":["2022SCU12008"],"award-info":[{"award-number":["2022SCU12008"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Birds play a vital role in the study of ecosystems and biodiversity. Accurate bird identification helps monitor biodiversity, understand the functions of ecosystems, and develop effective conservation strategies. 
However, previous bird sound recognition methods often relied on single features and overlooked the spatial information associated with these features, leading to low accuracy. Recognizing this gap, the present study proposed a bird sound recognition method that employs multiple convolutional neural-based networks and a transformer encoder to provide a reliable solution for identifying and classifying birds based on their unique sounds. We manually extracted various acoustic features as model inputs, and feature fusion was applied to obtain the final set of feature vectors. Feature fusion combines the deep features extracted by various networks, resulting in a more comprehensive feature set, thereby improving recognition accuracy. The multiple integrated acoustic features, such as mel frequency cepstral coefficients (MFCC), chroma features (Chroma) and Tonnetz features, were encoded by a transformer encoder. The transformer encoder effectively extracted the positional relationships between bird sound features, resulting in enhanced recognition accuracy. The experimental results demonstrated the exceptional performance of our method with an accuracy of 97.99%, a recall of 96.14%, an F1 score of 96.88% and a precision of 97.97% on the Birdsdata dataset. 
Furthermore, our method achieved an accuracy of 93.18%, a recall of 92.43%, an F1 score of 93.14% and a precision of 93.25% on the Cornell Bird Challenge 2020 (CBC) dataset.<\/jats:p>","DOI":"10.3390\/s23198099","type":"journal-article","created":{"date-parts":[[2023,9,27]],"date-time":"2023-09-27T03:49:14Z","timestamp":1695786554000},"page":"8099","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":30,"title":["A Novel Bird Sound Recognition Method Based on Multifeature Fusion and a Transformer Encoder"],"prefix":"10.3390","volume":"23","author":[{"given":"Shaokai","family":"Zhang","sequence":"first","affiliation":[{"name":"College of Electronics and Information Engineering, Sichuan University, Chengdu 610041, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-2611-5901","authenticated-orcid":false,"given":"Yuan","family":"Gao","sequence":"additional","affiliation":[{"name":"College of Electronics and Information Engineering, Sichuan University, Chengdu 610041, China"}]},{"given":"Jianmin","family":"Cai","sequence":"additional","affiliation":[{"name":"College of Electronics and Information Engineering, Sichuan University, Chengdu 610041, China"}]},{"given":"Hangxiao","family":"Yang","sequence":"additional","affiliation":[{"name":"College of Computer Science, Sichuan University, Chengdu 610041, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4651-7163","authenticated-orcid":false,"given":"Qijun","family":"Zhao","sequence":"additional","affiliation":[{"name":"College of Computer Science, Sichuan University, Chengdu 610041, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9119-7387","authenticated-orcid":false,"given":"Fan","family":"Pan","sequence":"additional","affiliation":[{"name":"College of Electronics and Information Engineering, Sichuan University, Chengdu 610041, 
China"}]}],"member":"1968","published-online":{"date-parts":[[2023,9,27]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1111\/jofo.12146","article-title":"Bird conservation and biodiversity research in Mexico: Status and priorities","volume":"87","author":"Peterson","year":"2016","journal-title":"J. Field Ornithol."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1111\/j.1740-9713.2006.00178.x","article-title":"Birds as Biodiversity Indicators for Europe","volume":"3","author":"Gregory","year":"2006","journal-title":"Significance"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"132","DOI":"10.5122\/cbirds.2011.0024","article-title":"Individual identification on the basis of the songs of the Asian Stubtail (Urosphena squameiceps)","volume":"2","author":"Xia","year":"2011","journal-title":"Chin. Birds"},{"key":"ref_4","first-page":"279","article-title":"Individual acoustic monitoring of the European Eagle Owl Bubo bubo","volume":"150","author":"Grava","year":"2008","journal-title":"Int. J. Avian Sci."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"6217","DOI":"10.1038\/s41467-021-26488-1","article-title":"Bird population declines and species turnover are changing the acoustic properties of spring soundscapes","volume":"12","author":"Morrison","year":"2021","journal-title":"Nat. Commun."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Sainburg, T., Thielk, M., and Gentner, T.Q. (2020). Finding, visualizing, and quantifying latent structure across diverse animal vocal repertoires. PLoS Comput. Biol., 16.","DOI":"10.1371\/journal.pcbi.1008228"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"101009","DOI":"10.1016\/j.ecoinf.2019.101009","article-title":"Spectrogram-frame linear network and continuous frame sequence for bird sound classification","volume":"54","author":"Zhang","year":"2019","journal-title":"Ecol. 
Inform."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"2974","DOI":"10.1121\/1.2345831","article-title":"Semi-automatic classification of bird vocalizations using spectral peak tracks","volume":"120","author":"Chen","year":"2006","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1069","DOI":"10.1121\/1.4906168","article-title":"Dynamic time warping and sparse representation classification for birdsong phrase classification using limited training data","volume":"137","author":"Tan","year":"2015","journal-title":"J. Acoust. Soc. Am."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1016\/j.ecolind.2015.02.023","article-title":"Towards the automated detection and occupancy estimation of primates using passive acoustic monitoring","volume":"54","author":"Kalan","year":"2015","journal-title":"Ecol. Indic."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"454","DOI":"10.1109\/TMM.2012.2229969","article-title":"Continuous Birdsong Recognition Using Gaussian Mixture Modeling of Image Shape Features","volume":"15","author":"Lee","year":"2013","journal-title":"IEEE Trans. Multimed."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1016\/j.ecoinf.2017.04.003","article-title":"Automated bird acoustic event detection and robust species classification","volume":"39","author":"Zhao","year":"2017","journal-title":"Ecol. Inform."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Leng, Y.R., and Tran, H.D. (2014, January 9\u201312). Multi-label bird classification using an ensemble classifier with simple features. 
Proceedings of the Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific, Chiang Mai, Thailand.","DOI":"10.1109\/APSIPA.2014.7041649"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"e488","DOI":"10.7717\/peerj.488","article-title":"Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning","volume":"2","author":"Stowell","year":"2014","journal-title":"PeerJ"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Shaheen, F., Verma, B., and Asafuddoula, M. (December, January 30). Impact of Automatic Feature Extraction in Deep Learning Architecture. Proceedings of the 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Gold Coast, Australia.","DOI":"10.1109\/DICTA.2016.7797053"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhang, H., McLoughlin, I., and Song, Y. (2015, January 19\u201324). Robust sound event recognition using convolutional neural networks. Proceedings of the 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), South Brisbane, Australia.","DOI":"10.1109\/ICASSP.2015.7178031"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1007\/s10772-016-9354-4","article-title":"Robust acoustic bird recognition for habitat monitoring with wireless sensor networks","volume":"19","author":"Boulmaiz","year":"2016","journal-title":"Int. J. Speech Technol."},{"key":"ref_18","unstructured":"Stahl, V., Fischer, A., and Bippus, R. (2000, January 5\u20139). Quantile based noise estimation for spectral subtraction and Wiener filtering. Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, Istanbul, Turkey. Cat. No. 
00CH37100."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1524","DOI":"10.1016\/j.patrec.2009.09.014","article-title":"Detecting bird sounds in a complex acoustic environment and application to bioacoustic monitoring","volume":"31","author":"Bardeli","year":"2010","journal-title":"Pattern Recognit. Lett."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"175353","DOI":"10.1109\/ACCESS.2019.2957572","article-title":"Investigation of different CNN-based models for improved bird sound classification","volume":"7","author":"Xie","year":"2019","journal-title":"IEEE Access"},{"key":"ref_21","unstructured":"Koh, C.Y., Chang, J.Y., Tai, C.L., Huang, D.Y., Hsieh, H.H., and Liu, Y.W. (2019, January 9\u201312). Bird Sound Classification Using Convolutional Neural Networks. Proceedings of the Clef (Working Notes), Lugano, Switzerland."},{"key":"ref_22","unstructured":"Himawan, I., and Towsey, M. (2018, January November). 3D convolution recurrent neural networks for bird sound detection. Proceedings of the 3rd Workshop on Detection and Classification of Acoustic Scenes and Events, Surrey, UK."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1016\/j.ecoinf.2019.05.007","article-title":"Handcrafted features and late fusion with deep learning for bird sound classification","volume":"52","author":"Xie","year":"2019","journal-title":"Ecol. Inform."},{"key":"ref_24","unstructured":"Sankupellay, M., and Konovalov, D. (2018, January 7\u20139). Bird call recognition using deep convolutional neural network, ResNet-50. Proceedings of the Acoustics, Adelaide, Australia."},{"key":"ref_25","unstructured":"Puget, J.F. (2021, January 21\u201324th). STFT Transformers for Bird Song Recognition. 
Proceedings of the CLEF (Working Notes), Bucharest, Romania."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"102001","DOI":"10.1016\/j.ecoinf.2023.102001","article-title":"Transound: Hyper-head attention transformer for birds sound recognition","volume":"75","author":"Tang","year":"2023","journal-title":"Ecol. Inform."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"424","DOI":"10.1016\/j.procs.2022.12.154","article-title":"Repurposing transfer learning strategy of computer vision for owl sound classification","volume":"216","author":"Gunawan","year":"2023","journal-title":"Procedia Comput. Sci."},{"key":"ref_28","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Su, Y., Zhang, K., Wang, J., and Madani, K. (2019). Environment sound classification using a two-stream CNN based on decision-level fusion. Sensors, 19.","DOI":"10.3390\/s19071733"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"109121","DOI":"10.1016\/j.apacoust.2022.109121","article-title":"AMResNet: An automatic recognition model of bird sounds in real environment","volume":"201","author":"Xiao","year":"2022","journal-title":"Appl. Acoust."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1016\/j.procs.2020.12.010","article-title":"Convolutional Neural Networks for Scops Owl Sound Classification","volume":"179","author":"Hidayat","year":"2021","journal-title":"Procedia Comput. Sci."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Neal, L., Briggs, F., Raich, R., and Fern, X.Z. (2011, January 22\u201327). Time-frequency segmentation of bird song in noisy acoustic environments. 
Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic.","DOI":"10.1109\/ICASSP.2011.5946906"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"108550","DOI":"10.1016\/j.apacoust.2021.108550","article-title":"KD-CLDNN: Lightweight automatic recognition model based on bird vocalization","volume":"188","author":"Xie","year":"2022","journal-title":"Appl. Acoust."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Adavanne, S., Drossos, K., \u00c7akir, E., and Virtanen, T. (September, January 28). Stacked convolutional and recurrent neural networks for bird audio detection. Proceedings of the 2017 25th European Signal Processing Conference (EUSIPCO), Kos Island, Greece.","DOI":"10.23919\/EUSIPCO.2017.8081505"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"051806","DOI":"10.1155\/2007\/51806","article-title":"Wavelets in recognition of bird sounds","volume":"2007","author":"Selin","year":"2006","journal-title":"EURASIP J. Adv. Signal Process."},{"key":"ref_36","unstructured":"Sabour, S., Frosst, N., and Hinton, G.E. (2017, January 4\u20139). Dynamic routing between capsules. Proceedings of the Advances in neural information processing systems 2017, Long Beach, CA, USA."},{"key":"ref_37","unstructured":"Tan, M., and Le, Q. (2019, January 10\u201315). Efficientnet: Rethinking model scaling for convolutional neural networks. Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_39","unstructured":"Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.Y. (2017, January 4\u20139). 
Lightgbm: A highly efficient gradient boosting decision tree. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA."},{"key":"ref_40","unstructured":"Sprengel, E., Jaggi, M., Kilcher, Y., and Hofmann, T. (2016, January 5\u20138). Audio based bird species identification using deep learning techniques. Proceedings of the CEUR Workshop Proceedings, \u00c9vora, Portugal."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"17085","DOI":"10.1038\/s41598-021-96446-w","article-title":"Comparing recurrent convolutional neural networks for large scale bird species classification","volume":"11","author":"Gupta","year":"2021","journal-title":"Sci. Rep."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Kiapuchinski, D.M., Lima, C., and Kaestner, C. (2012, January 10\u201312). Spectral Noise Gate Technique Applied to Birdsong Preprocessing on Embedded Unit. Proceedings of the IEEE International Symposium on Multimedia, Irvine, CA, USA.","DOI":"10.1109\/ISM.2012.12"},{"key":"ref_43","unstructured":"Oppenheim, A.V. (2023, August 17). Discrete-Time Signal Processing; Pearson Education India: 1999. Available online: https:\/\/ds.amu.edu.et\/xmlui\/bitstream\/handle\/123456789\/5524\/1001326.pdf?sequence=1&isAllowed=y."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"18006","DOI":"10.15680\/IJIRSET.2014.0312034","article-title":"A comparative study of feature extraction techniques for speech recognition system","volume":"3","author":"Kurzekar","year":"2014","journal-title":"Int. J. Innov. Res. Sci. Eng. Technol."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Seo, S., Kim, C., and Kim, J.H. (2022). Convolutional Neural Networks Using Log Mel-Spectrogram Separation for Audio Event Classification with Unknown Devices. J. Web Eng., 97\u2013522.","DOI":"10.13052\/jwe1540-9589.21216"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Leung, H.C., Chigier, B., and Glass, J.R. 
(1993, January 27\u201330). A comparative study of signal representations and classification techniques for speech recognition. Proceedings of the IEEE International Conference on Acoustics, Minneapolis, MN, USA.","DOI":"10.1109\/ICASSP.1993.319402"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Ramirez, A.D.P., de la Rosa Vargas, J.I., Valdez, R.R., and Becerra, A. (2018, January 7\u20139). A comparative between mel frequency cepstral coefficients (MFCC) and inverse mel frequency cepstral coefficients (IMFCC) features for an automatic bird species recognition system. Proceedings of the 2018 IEEE Latin American Conference on Computational Intelligence (LA-CCI), Guadalajara, Mexico.","DOI":"10.1109\/LA-CCI.2018.8625230"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1109\/T-C.1974.223784","article-title":"Discrete cosine transform","volume":"100","author":"Ahmed","year":"1974","journal-title":"IEEE Trans. Comput."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1109\/TSA.2002.800560","article-title":"Musical genre classification of audio signals","volume":"10","author":"Tzanetakis","year":"2002","journal-title":"IEEE Trans. Speech Audio Process."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1016\/j.neucom.2014.12.042","article-title":"Adaptive energy detection for bird sound detection in complex environments","volume":"155","author":"Zhang","year":"2015","journal-title":"Neurocomputing"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"McFee, B., Raffel, C., Liang, D., Ellis, D.P., McVicar, M., Battenberg, E., and Nieto, O. (2015, January 6\u201312). librosa: Audio and music signal analysis in python. Proceedings of the 14th python in science conference, Austin, TX, USA.","DOI":"10.25080\/Majora-7b98e3ed-003"},{"key":"ref_52","unstructured":"Kwan, C., Mei, G., Zhao, X., Ren, Z., Xu, R., Stanford, V., Rochet, C., Aube, J., and Ho, K. 
(2004, January 17\u201321). Bird classification algorithms: Theory and experimental results. Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, QC, Canada."},{"key":"ref_53","unstructured":"Ioffe, S., and Szegedy, C. (2015, January 6\u201311). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the International Conference on Machine Learning, PMLR, Lille, France."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s10915-021-01628-3","article-title":"Stochastic gradient descent with polyak\u2019s learning rate","volume":"89","author":"Prazeres","year":"2021","journal-title":"J. Sci. Comput."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Maaten, L.V.D., and Weinberger, K.Q. (2017, January 21\u201326). Densely Connected Convolutional Networks. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_56","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"101236","DOI":"10.1016\/j.ecoinf.2021.101236","article-title":"BirdNET: A deep learning solution for avian diversity monitoring","volume":"61","author":"Kahl","year":"2021","journal-title":"Ecol. Inform."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Andono, P.N., Shidik, G.F., Prabowo, D.P., Yanuarsari, D.H., Sari, Y., and Pramunendar, R.A. (2023). Feature Selection on Gammatone Cepstral Coefficients for Bird Voice Classification Using Particle Swarm Optimization. Int. J. Intell. Eng. 
Syst., 16.","DOI":"10.22266\/ijies2023.0228.23"},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"775","DOI":"10.1111\/cobi.13643","article-title":"Importance of species translocations under rapid climate change","volume":"35","author":"Butt","year":"2021","journal-title":"Conserv. Biol."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"971","DOI":"10.1016\/j.tree.2019.07.014","article-title":"Climate change is breaking earth\u2019s beat","volume":"34","author":"Sueur","year":"2019","journal-title":"Trends Ecol. Evol."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"eaay9969","DOI":"10.1126\/sciadv.aay9969","article-title":"Integrating climate adaptation and biodiversity conservation in the global ocean","volume":"5","author":"Tittensor","year":"2019","journal-title":"Sci. Adv."},{"key":"ref_62","unstructured":"Kim, B., Yang, S., Kim, J., and Chang, S. (2022). QTI submission to DCASE 2021: Residual normalization for device-imbalanced acoustic scene classification with efficient design. arXiv."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"475","DOI":"10.1016\/j.anbehav.2013.04.017","article-title":"A method for automated individual, species and call type recognition in free-ranging animals","volume":"86","author":"Mielke","year":"2013","journal-title":"Anim. Behav."},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Nanni, L., Costa, Y.M., Lucio, D.R., Silla, C.N., and Brahnam, S. (2016, January 6\u20138). Combining visual and acoustic features for bird species classification. 
Proceedings of the 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI), San Jose, CA, USA.","DOI":"10.1109\/ICTAI.2016.0067"},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"338","DOI":"10.1080\/00063657.2018.1511682","article-title":"A cost-effective protocol for monitoring birds using autonomous recording units: A case study with a night-time singing passerine","volume":"65","author":"Bota","year":"2018","journal-title":"Bird Study"},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1002\/rse2.125","article-title":"Automated identification of avian vocalizations with deep convolutional neural networks","volume":"6","author":"Ruff","year":"2020","journal-title":"Remote Sens. Ecol. Conserv."},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Liu, H., Liu, F., Fan, X., and Huang, D. (2021). Polarized self-attention: Towards high-quality pixel-wise regression. arXiv.","DOI":"10.1016\/j.neucom.2022.07.054"},{"key":"ref_68","unstructured":"Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., and Funtowicz, M. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, 16\u201320 November 2020."},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"101927","DOI":"10.1016\/j.ecoinf.2022.101927","article-title":"A review of automatic recognition technology for bird vocalizations in the deep learning era","volume":"73","author":"Xie","year":"2022","journal-title":"Ecol. 
Inform."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/19\/8099\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:59:07Z","timestamp":1760129947000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/19\/8099"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,27]]},"references-count":69,"journal-issue":{"issue":"19","published-online":{"date-parts":[[2023,10]]}},"alternative-id":["s23198099"],"URL":"https:\/\/doi.org\/10.3390\/s23198099","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,27]]}}}