{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T22:54:38Z","timestamp":1768344878149,"version":"3.49.0"},"reference-count":38,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T00:00:00Z","timestamp":1768262400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"national funds through FCT\u2014Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia, I.P.","award":["UID\/6486\/2025"],"award-info":[{"award-number":["UID\/6486\/2025"]}]},{"name":"national funds through FCT\u2014Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia, I.P.","award":["UID\/PRR\/6486\/2025"],"award-info":[{"award-number":["UID\/PRR\/6486\/2025"]}]},{"name":"national funds through FCT\u2014Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia, I.P.","award":["UID\/50021\/2025"],"award-info":[{"award-number":["UID\/50021\/2025"]}]},{"name":"national funds through FCT\u2014Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia, I.P.","award":["UID\/PRR\/50021\/2025"],"award-info":[{"award-number":["UID\/PRR\/50021\/2025"]}]},{"name":"national funds through FCT\u2014Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia, I.P.","award":["2023.15325.PEX"],"award-info":[{"award-number":["2023.15325.PEX"]}]},{"name":"national funds through FCT\u2014Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia, I.P.","award":["LISBOA2030-FEDER-00692100"],"award-info":[{"award-number":["LISBOA2030-FEDER-00692100"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Electronics"],"abstract":"<jats:p>Monitoring wildlife has become increasingly important for understanding the evolution of species and ecosystem health. 
Acoustic monitoring offers several advantages over video-based approaches, enabling continuous 24\/7 observation and robust detection under challenging environmental conditions. Deep learning models have demonstrated strong performance in audio classification. However, their computational complexity poses significant challenges for deployment on low-power embedded platforms. This paper presents a low-power embedded system for real-time bird audio detection. A hybrid CNN\u2013RNN architecture is adopted, redesigned, and quantized to significantly reduce model complexity while preserving classification accuracy. To support efficient execution, a custom hardware accelerator was developed and integrated into a Zynq UltraScale+ ZU3CG FPGA. The proposed system achieves an accuracy of 87.4%, processes up to 5 audio samples per second, and operates at only 1.4 W, demonstrating its suitability for autonomous, energy-efficient wildlife monitoring applications.<\/jats:p>","DOI":"10.3390\/electronics15020354","type":"journal-article","created":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T15:52:57Z","timestamp":1768319577000},"page":"354","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Real-Time Bird Audio Detection with a CNN-RNN Model on a SoC-FPGA"],"prefix":"10.3390","volume":"15","author":[{"given":"Rodrigo Lopes da","family":"Silva","sequence":"first","affiliation":[{"name":"ISEL-IPL, 1959-007 Lisboa, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-9651-5043","authenticated-orcid":false,"given":"Gustavo","family":"Jacinto","sequence":"additional","affiliation":[{"name":"INESC-ID\/IST-ULisboa, 1000-029 Lisbon, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8556-4507","authenticated-orcid":false,"given":"M\u00e1rio","family":"V\u00e9stias","sequence":"additional","affiliation":[{"name":"ISEL-IPL, 1959-007 Lisboa, Portugal"},{"name":"INESC INOV, 1000-029 Lisboa, 
Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7060-4745","authenticated-orcid":false,"given":"Rui Policarpo","family":"Duarte","sequence":"additional","affiliation":[{"name":"ISEL-IPL, 1959-007 Lisboa, Portugal"},{"name":"INESC INOV, 1000-029 Lisboa, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2026,1,13]]},"reference":[{"key":"ref_1","unstructured":"T\u00f3th, B., and Czeba, B. (2016, September 5\u20138). Convolutional Neural Networks for Large-Scale Bird Song Classification in Noisy Environment. Proceedings of the Conference and Labs of the Evaluation Forum, \u00c9vora, Portugal."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"110285","DOI":"10.1016\/j.apacoust.2024.110285","article-title":"Research progress in bird sounds recognition based on acoustic monitoring technology: A systematic review","volume":"228","author":"Liu","year":"2025","journal-title":"Appl. Acoust."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Anusha, P., and ManiSai, K. (2022, January 21\u201323). Bird Species Classification Using Deep Learning. Proceedings of the 2022 International Conference on Intelligent Controller and Computing for Smart Power (ICICCSP), Hyderabad, India.","DOI":"10.1109\/ICICCSP53532.2022.9862344"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"V\u00e9stias, M., and Neto, H. (2014, September 2\u20134). Trends of CPU, GPU and FPGA for high-performance computing. Proceedings of the 2014 24th International Conference on Field Programmable Logic and Applications (FPL), Munich, Germany.","DOI":"10.1109\/FPL.2014.6927483"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"V\u00e9stias, M.P. (2019). A Survey of Convolutional Neural Networks on Edge with Reconfigurable Computing. 
Algorithms, 12.","DOI":"10.3390\/a12080154"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"454","DOI":"10.1109\/TMM.2012.2229969","article-title":"Continuous Birdsong Recognition Using Gaussian Mixture Modeling of Image Shape Features","volume":"15","author":"Lee","year":"2013","journal-title":"IEEE Trans. Multimed."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1016\/j.ecoinf.2017.04.003","article-title":"Automated bird acoustic event detection and robust species classification","volume":"39","author":"Zhao","year":"2017","journal-title":"Ecol. Inform."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"102471","DOI":"10.1016\/j.ecoinf.2024.102471","article-title":"Multi-label classification for acoustic bird species detection using transfer learning approach","volume":"80","author":"Swaminathan","year":"2024","journal-title":"Ecol. Inform."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Zhang, H., McLoughlin, I., and Song, Y. (2015, April 19\u201324). Robust sound event recognition using convolutional neural networks. Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, Australia.","DOI":"10.1109\/ICASSP.2015.7178031"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1524","DOI":"10.1016\/j.patrec.2009.09.014","article-title":"Detecting bird sounds in a complex acoustic environment and application to bioacoustic monitoring","volume":"31","author":"Bardeli","year":"2010","journal-title":"Pattern Recognit. Lett."},{"key":"ref_11","unstructured":"Lasseck, M. (2018). Acoustic Bird Detection with Deep Convolutional Neural Networks. Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018), Tampere University of Technology. Technical Report, DCASE2018 Challenge."},{"key":"ref_12","unstructured":"Liaqat, S., Bozorg, N., Jose, N., Conrey, P., Tamasi, A., and Johnson, M.T. (2018). 
Domain Tuning Methods for Bird Audio Detection. Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018), Tampere University of Technology. Technical Report, DCASE2018 Challenge."},{"key":"ref_13","unstructured":"Vesperini, F., Gabrielli, L., Principi, E., and Squartini, S. (2018). A Capsule Neural Networks Based Approach for Bird Audio Detection. Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018), Tampere University of Technology. Technical Report, DCASE2018 Challenge."},{"key":"ref_14","unstructured":"DCASE2018 (2025, November 20). Bird Audio Detection Challenge. Available online: https:\/\/dcase.community\/challenge2018\/task-bird-audio-detection."},{"key":"ref_15","unstructured":"DCASE2018 (2025, November 20). Bird Audio Detection Challenge\u2014Results. Available online: https:\/\/dcase.community\/challenge2018\/task-bird-audio-detection-results."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"175353","DOI":"10.1109\/ACCESS.2019.2957572","article-title":"Investigation of Different CNN-Based Models for Improved Bird Sound Classification","volume":"7","author":"Xie","year":"2019","journal-title":"IEEE Access"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1016\/j.ecoinf.2019.05.007","article-title":"Handcrafted features and late fusion with deep learning for bird sound classification","volume":"52","author":"Xie","year":"2019","journal-title":"Ecol. Inform."},{"key":"ref_18","unstructured":"Wood, M., Glotin, H., Stowell, D., and Stylianou, Y. (2018). 3D convolution recurrent neural networks for bird sound detection. Proceedings of the 3rd Workshop on Detection and Classification of Acoustic Scenes and Events, Tampere University of Technology."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Cakir, E., Adavanne, S., Parascandolo, G., Drossos, K., and Virtanen, T. (2017, August 28\u2013September 2). 
Convolutional recurrent neural networks for bird audio detection. Proceedings of the 2017 25th European Signal Processing Conference (EUSIPCO), Kos Island, Greece.","DOI":"10.23919\/EUSIPCO.2017.8081508"},{"key":"ref_20","unstructured":"Sankupellay, M., and Konovalov, D. (2018). Bird Call Recognition using Deep Convolutional Neural Network, ResNet-50. Proceedings of the Acoustics 2018, James Cook University."},{"key":"ref_21","unstructured":"Conde, M., Shubham, K., Agnihotri, P., Movva, N.D., and Bessenyei, S. (2021). Weakly-Supervised Classification and Detection of Bird Sounds in the Wild. A BirdCLEF 2021 Solution. arXiv."},{"key":"ref_22","first-page":"01407102","article-title":"Bird Detection and Species Classification: Using YOLOv5 and Deep Transfer Learning Models","volume":"14","author":"Vo","year":"2023","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_23","unstructured":"Puget, J.F. (2021, September 21\u201324). STFT Transformers for Bird Song Recognition. Proceedings of the Conference and Labs of the Evaluation Forum, Bucharest, Romania."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"102001","DOI":"10.1016\/j.ecoinf.2023.102001","article-title":"Transound: Hyper-head attention transformer for birds sound recognition","volume":"75","author":"Tang","year":"2023","journal-title":"Ecol. Inform."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Su, Y., Zhang, K., Wang, J., and Madani, K. (2019). Environment Sound Classification Using a Two-Stream CNN Based on Decision-Level Fusion. Sensors, 19.","DOI":"10.3390\/s19071733"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"109121","DOI":"10.1016\/j.apacoust.2022.109121","article-title":"AMResNet: An automatic recognition model of bird sounds in real environment","volume":"201","author":"Xiao","year":"2022","journal-title":"Appl. 
Acoust."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"109833","DOI":"10.1016\/j.apacoust.2023.109833","article-title":"Bird sound detection based on sub-band features and the perceptron model","volume":"217","author":"Han","year":"2024","journal-title":"Appl. Acoust."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Adavanne, S., Drossos, K., \u00c7akir, E., and Virtanen, T. (2017). Stacked Convolutional and Recurrent Neural Networks for Bird Audio Detection. arXiv.","DOI":"10.23919\/EUSIPCO.2017.8081505"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"101009","DOI":"10.1016\/j.ecoinf.2019.101009","article-title":"Spectrogram-frame linear network and continuous frame sequence for bird sound classification","volume":"54","author":"Zhang","year":"2019","journal-title":"Ecol. Inform."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Zhang, S., Gao, Y., Cai, J., Yang, H., Zhao, Q., and Pan, F. (2023). A Novel Bird Sound Recognition Method Based on Multi-feature Fusion and a Transformer Encoder. Sensors, 23.","DOI":"10.3390\/s23198099"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"53","DOI":"10.24018\/ejeng.2024.9.1.3150","article-title":"A deep learning-based embedded system for pest bird sound detection and proximity estimation","volume":"9","author":"Aman","year":"2024","journal-title":"Eur. J. Eng. Technol. Res."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Vandendriessche, J., Wouters, N., da Silva, B., Lamrini, M., Chkouri, M.Y., and Touhafi, A. (2021). Environmental Sound Recognition on Embedded Systems: From FPGAs to TPUs. Electronics, 10.","DOI":"10.3390\/electronics10212622"},{"key":"ref_33","unstructured":"MathWorks (2025, November 20). \u201cWhat Is Quantization?\u201d. Available online: https:\/\/www.mathworks.com\/discovery\/quantization.html."},{"key":"ref_34","unstructured":"Google (2026, January 04). QKeras: A Quantization Deep Learning Library for TensorFlow Keras. 
Available online: https:\/\/github.com\/google\/qkeras."},{"key":"ref_35","unstructured":"Stowell, D., and Plumbley, M.D. (2014, January 26\u201329). Freefield1010: An open dataset for research on audio field recording archives. Proceedings of the 53rd Audio Engineering Society Conference on Semantic Audio (AES 53), London, UK."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"368","DOI":"10.1111\/2041-210X.13103","article-title":"Automatic acoustic detection of birds through deep learning: The first Bird Audio Detection challenge","volume":"10","author":"Stowell","year":"2018","journal-title":"Methods Ecol. Evol."},{"key":"ref_37","unstructured":"microfaune (2025, November 20). \u201cmicrofaune_ai (updated fork)\u201d. Available online: https:\/\/github.com\/W-Alphonse\/microfaune_ai."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Cho, K., Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv.","DOI":"10.3115\/v1\/D14-1179"}],"container-title":["Electronics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-9292\/15\/2\/354\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T16:10:48Z","timestamp":1768320648000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-9292\/15\/2\/354"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,13]]},"references-count":38,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2026,1]]}},"alternative-id":["electronics15020354"],"URL":"https:\/\/doi.org\/10.3390\/electronics15020354","relation":{},"ISSN":["2079-9292"],"issn-type":[{"value":"2079-9292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,13]]}}}