{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T02:06:21Z","timestamp":1760234781076,"version":"build-2065373602"},"reference-count":62,"publisher":"MDPI AG","issue":"13","license":[{"start":{"date-parts":[[2021,6,23]],"date-time":"2021-06-23T00:00:00Z","timestamp":1624406400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","doi-asserted-by":"publisher","award":["SFRH\/BD\/120383\/2016"],"award-info":[{"award-number":["SFRH\/BD\/120383\/2016"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Electronics"],"abstract":"<jats:p>The extraction of the beat from musical audio signals represents a foundational task in the field of music information retrieval. While great advances in performance have been achieved due to the use of deep neural networks, significant shortcomings still remain. In particular, performance is generally much lower on musical content that differs from that which is contained in existing annotated datasets used for neural network training, as well as in the presence of challenging musical conditions such as rubato. In this paper, we positioned our approach to beat tracking from a real-world perspective where an end-user targets very high accuracy on specific music pieces and for which the current state of the art is not effective. To this end, we explored the use of targeted fine-tuning of a state-of-the-art deep neural network based on a very limited temporal region of annotated beat locations. We demonstrated the success of our approach via improved performance across existing annotated datasets and a new annotation-correction approach for evaluation. 
Furthermore, we highlighted the ability of content-specific fine-tuning to learn both what is and what is not the beat in challenging musical conditions.<\/jats:p>","DOI":"10.3390\/electronics10131518","type":"journal-article","created":{"date-parts":[[2021,6,23]],"date-time":"2021-06-23T11:28:41Z","timestamp":1624447721000},"page":"1518","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["User-Driven Fine-Tuning for Beat Tracking"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1629-8385","authenticated-orcid":false,"given":"Ant\u00f3nio","family":"Pinto","sequence":"first","affiliation":[{"name":"INESC TEC, Centre for Telecommunications and Multimedia, 4200-465 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6707-5427","authenticated-orcid":false,"given":"Sebastian","family":"B\u00f6ck","sequence":"additional","affiliation":[{"name":"enliteAI, 1000-1901 Vienna, Austria"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3760-2473","authenticated-orcid":false,"given":"Jaime","family":"Cardoso","sequence":"additional","affiliation":[{"name":"INESC TEC, Centre for Telecommunications and Multimedia, 4200-465 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1315-3992","authenticated-orcid":false,"given":"Matthew","family":"Davies","sequence":"additional","affiliation":[{"name":"Centre for Informatics and Systems, Department of Informatics Engineering, University of Coimbra, 3030-290 Coimbra, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2021,6,23]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Schl\u00fcter, J., and B\u00f6ck, S. (2014, January 4\u20139). Improved musical onset detection with Convolutional Neural Networks. 
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy.","DOI":"10.1109\/ICASSP.2014.6854953"},{"key":"ref_2","unstructured":"Schreiber, H., and M\u00fcller, M. (2019, January 28\u201331). Musical tempo and key estimation using convolutional neural networks with directional filters. Proceedings of the Sound and Music Computing Conference (SMC), Malaga, Spain."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1613\/jair.1121","article-title":"Monte Carlo Methods for Tempo Tracking and Rhythm Quantization","volume":"18","author":"Cemgil","year":"2003","journal-title":"J. Artif. Intell. Res."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Klapuri, A., and Davy, M. (2006). Beat Tracking and Musical Metre Analysis. Signal Processing Methods for Music Transcription, Springer US.","DOI":"10.1007\/0-387-32845-9"},{"key":"ref_5","unstructured":"Sethares, W.A. (2007). Rhythm and Transforms, Springer Science & Business Media."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"M\u00fcller, M. (2015). Tempo and Beat Tracking. Fundamentals of Music Processing, Springer International Publishing.","DOI":"10.1007\/978-3-319-21945-5_6"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"190","DOI":"10.1109\/TASL.2011.2159593","article-title":"Performance Following: Real-Time Prediction of Musical Sequences Without a Score","volume":"20","author":"Stark","year":"2011","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"ref_8","first-page":"246","article-title":"Audio-Based Music Structure Analysis: Current Trends, Open Challenges, and Applications","volume":"3","author":"Nieto","year":"2020","journal-title":"Trans. Int. Soc. Music. Inf. Retr."},{"key":"ref_9","unstructured":"Fuentes, M., Maia, L.S., Rocamora, M., Biscainho, L.W., Crayencour, H.C., Essid, S., and Bello, J.P. (2019, January 4\u20138). 
Tracking beats and microtiming in Afro-latin American music using conditional random fields and deep learning. Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), Delft, The Netherlands."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Vande Veire, L., and De Bie, T. (2018). From raw audio to a seamless mix: Creating an automated DJ system for Drum and Bass. EURASIP J. Audio Speech Music. Process., 2018.","DOI":"10.1186\/s13636-018-0134-8"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1726","DOI":"10.1109\/TASLP.2014.2347135","article-title":"AutoMashUpper: Automatic Creation of Multi-Song Music Mashups","volume":"22","author":"Davies","year":"2014","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"553","DOI":"10.1109\/LSP.2004.827951","article-title":"On the Use of Phase and Energy for Musical Onset Detection in the Complex Domain","volume":"11","author":"Bello","year":"2004","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_13","unstructured":"Dixon, S. (2006, January 18\u201320). Onset detection revisited. Proceedings of the 9th International Conference on Digital Audio Effects (DAFx), Montreal, QC, Canada."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1009","DOI":"10.1109\/TASL.2006.885257","article-title":"Context-Dependent Beat Tracking of Musical Audio","volume":"15","author":"Davies","year":"2007","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"342","DOI":"10.1109\/TSA.2005.854090","article-title":"Analysis of the meter of acoustic musical signals","volume":"14","author":"Klapuri","year":"2006","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"ref_16","unstructured":"Dixon, S. (2001, January 17\u201322). An Interactive Beat Tracking and Visualisation System. 
Proceedings of the International Computer Music Conference (ICMC), Havana, Cuba."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Goto, M., and Muraoka, Y. (1994). A beat tracking system for acoustic signals of music. Proceedings of the 2nd ACM International Conference on Multimedia (MULTIMEDIA \u201994), ACM Press.","DOI":"10.1145\/192593.192700"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1080\/09298210701653344","article-title":"Beat Tracking by Dynamic Programming","volume":"36","author":"Ellis","year":"2007","journal-title":"J. New Music Res."},{"key":"ref_19","unstructured":"B\u00f6ck, S., and Schedl, M. (2011, January 19\u201323). Enhanced beat tracking with context-aware neural networks. Proceedings of the 14th International Conference on Digital Audio Effects (DAFx), Paris, France."},{"key":"ref_20","unstructured":"B\u00f6ck, S., Krebs, F., and Widmer, G. (2014, January 27\u201331). A Multi-model Approach to Beat Tracking Considering Heterogeneous Music Styles. Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR), Taipei, Taiwan."},{"key":"ref_21","unstructured":"Krebs, F., B\u00f6ck, S., and Widmer, G. (2015, January 26\u201330). An Efficient State-Space Model for Joint Tempo and Meter Tracking. Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), Malaga, Spain."},{"key":"ref_22","unstructured":"B\u00f6ck, S., and Davies, M.E.P. (2020, January 12\u201316). Deconstruct, Analyse, Reconstruct: How To Improve Tempo, Beat, and Downbeat Estimation. Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR), Montreal, QC, Canada."},{"key":"ref_23","unstructured":"Hainsworth, S. (2004). Techniques for the Automated Analysis of Musical Audio. [Ph.D. Thesis, University of Cambridge]."},{"key":"ref_24","unstructured":"Davies, M.E.P., Degara, N., and Plumbley, M.D. (2009). 
Evaluation Methods for Musical Audio Beat Tracking Algorithms, Queen Mary University of London. Technical Report October."},{"key":"ref_25","unstructured":"Marchand, U., and Peeters, G. (December, January 30). Swing Ratio Estimation. Proceedings of the 18th International Conference on Digital Audio Effects (DAFx), Trondheim, Norway."},{"key":"ref_26","unstructured":"Krebs, F., B\u00f6ck, S., and Widmer, G. (2013, January 4\u20138). Rhythmic Pattern Modeling for Beat and Downbeat Tracking in Musical Audio. Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR), Curitiba, Brazil."},{"key":"ref_27","first-page":"3","article-title":"The Deep Learning Revolution in MIR: The Pros and Cons, the Needs and the Challenges","volume":"Volume 12631","author":"Ystad","year":"2021","journal-title":"Perception, Representations, Image, Sound, Music\u2014Proceedings of the 14th International Symposium (CMMR 2019), Marseille, France, 14\u201318 October 2019"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"2539","DOI":"10.1109\/TASL.2012.2205244","article-title":"Selective sampling for beat tracking evaluation","volume":"20","author":"Holzapfel","year":"2012","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"ref_29","unstructured":"Grosche, P., M\u00fcller, M., and Sapp, C.S. (2010, January 9\u201313). What Makes Beat Tracking Difficult? A Case Study on Chopin Mazurkas. Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR), Utrecht, The Netherlands."},{"key":"ref_30","unstructured":"Dalton, B., Johnson, D., and Tzanetakis, G. (2019, January 28\u201331). DAW-Integrated Beat Tracking for Music Production. 
Proceedings of the Sound and Music Computing Conference (SMC), Malaga, Spain."},{"key":"ref_31","first-page":"75","article-title":"Tapping Along to the Difficult Ones: Leveraging User-Input for Beat Tracking in Highly Expressive Musical Content","volume":"Volume 12631","author":"Ystad","year":"2021","journal-title":"Perception, Representations, Image, Sound, Music\u2014Proceedings of the 14th International Symposium, CMMR 2019, Marseille, France, 14\u201318 October 2019"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Pons, J., Serra, J., and Serra, X. (2019, January 12\u201317). Training Neural Audio Classifiers with Few Data. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8682591"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1345","DOI":"10.1109\/TKDE.2009.191","article-title":"A survey on transfer learning","volume":"22","author":"Pan","year":"2010","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_34","unstructured":"Van den Oord, A., Dieleman, S., and Schrauwen, B. (2014, January 27\u201331). Transfer Learning by Supervised Pre-training for Audio-based Music Classification. Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR), Taipei, Taiwan."},{"key":"ref_35","unstructured":"Choi, K., Fazekas, G., Sandler, M., and Cho, K. (2017, January 23\u201327). Transfer learning for music classification and regression tasks. Proceedings of the 18th International Conference on Music Information Retrieval (ISMIR), Suzhou, China."},{"key":"ref_36","unstructured":"Burloiu, G. (2021, May 25). Adaptive Drum Machine Microtiming with Transfer Learning and RNNs. Extended Abstracts for the Late-Breaking Demo Session of the International Society for Music Information Retrieval Conference (ISMIR). 
Available online: https:\/\/program.ismir2020.net\/static\/lbd\/ISMIR2020-LBD-422-abstract.pdf."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Fiocchi, D., Buccoli, M., Zanoni, M., Antonacci, F., and Sarti, A. (2018, January 3\u20137). Beat Tracking using Recurrent Neural Network: A Transfer Learning Approach. Proceedings of the 26th European Signal Processing Conference (EUSIPCO), Rome, Italy.","DOI":"10.23919\/EUSIPCO.2018.8553059"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Wang, Y., Yao, Q., Kwok, J., and Ni, L.M. (2019). Generalizing from a Few Examples: A Survey on Few-Shot Learning. arXiv.","DOI":"10.1145\/3386252"},{"key":"ref_39","unstructured":"Choi, J., Lee, J., Park, J., and Nam, J. (2019, January 4\u20138). Zero-shot learning for audio-based music classification and tagging. Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), Delft, The Netherlands."},{"key":"ref_40","unstructured":"Dhillon, G.S., Chaudhari, P., Ravichandran, A., and Soatto, S. (2020, January 26\u201330). A Baseline for Few-Shot Image Classification. Proceedings of the 8th International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia."},{"key":"ref_41","unstructured":"Manilow, E., and Pardo, B. (2020). Bespoke Neural Networks for Score-Informed Source Separation. arXiv."},{"key":"ref_42","unstructured":"Wang, Y., Salamon, J., Cartwright, M., Bryan, N.J., and Bello, J.P. (2020, January 12\u201316). Few-Shot Drum Transcription in Polyphonic Music. Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR), Montreal, QC, Canada."},{"key":"ref_43","unstructured":"Davies, M.E.P., and B\u00f6ck, S. (2019, January 2\u20136). Temporal convolutional networks for musical audio beat tracking. Proceedings of the 27th European Signal Processing Conference (EUSIPCO), A Coruna, Spain."},{"key":"ref_44","unstructured":"B\u00f6ck, S., Davies, M.E.P., and Knees, P. 
(2019, January 4\u20138). Multi-Task Learning of Tempo and Beat: Learning One To Improve the Other. Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), Delft, The Netherlands."},{"key":"ref_45","unstructured":"B\u00f6ck, S., Krebs, F., and Widmer, G. (2016, January 7\u201311). Joint Beat and Downbeat tracking with recurrent neural networks. Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), New York, NY, USA."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1832","DOI":"10.1109\/TSA.2005.858509","article-title":"An Experimental Comparison of Audio Tempo Induction Algorithms","volume":"14","author":"Gouyon","year":"2006","journal-title":"IEEE Trans. Audio Speech Lang. Process."},{"key":"ref_47","unstructured":"Hockman, J.A., Bello, J.P., Davies, M.E.P., and Plumbley, M.D. (2008, January 1\u20134). Automated Rhythmic Transformation of Musical Audio. Proceedings of the 11th International Conference on Digital Audio Effects (DAFx), Espoo, Finland."},{"key":"ref_48","unstructured":"Gouyon, F. (2005). A Computational Approach to Rhythm Description\u2014Audio Features for the Computation of Rhythm Periodicity Functions and their use in Tempo Induction and Music Content Processing. [Ph.D. Thesis, Universitat Pompeu Fabra]."},{"key":"ref_49","unstructured":"Yosinski, J., Clune, J., Bengio, Y., and Lipson, H. (2014, January 13). How transferable are features in deep neural networks?. Proceedings of the Advances in Neural Information Processing Systems (NIPS2014), Montreal, QC, Canada."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Howard, J., and Ruder, S. (2018). Universal language model fine-tuning for text classification. 
ACL 2018\u2014Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers), Association for Computational Linguistics Location.","DOI":"10.18653\/v1\/P18-1031"},{"key":"ref_51","unstructured":"Pinto, A.S., Domingues, I., and Davies, M.E.P. (2020). Shift If You Can: Counting and Visualising Correction Operations for Beat Tracking Evaluation. arXiv."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"3250","DOI":"10.1109\/TIT.2004.838101","article-title":"The similarity metric","volume":"50","author":"Li","year":"2004","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Valero-Mas, J.J., and I\u00f1esta, J.M. (2017). Interactive user correction of automatically detected onsets: Approach and evaluation. EURASIP J. Audio Speech Music Process., 2017.","DOI":"10.1186\/s13636-017-0111-7"},{"key":"ref_54","unstructured":"Driedger, J., Schreiber, H., De Haas, W.B., and M\u00fcller, M. (2019, January 4\u20138). Towards automatically correcting tapped beat annotations for music recordings. Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), Delft, The Netherlands."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"292","DOI":"10.1038\/nrn2258","article-title":"Noise in the nervous system","volume":"9","author":"Faisal","year":"2008","journal-title":"Nat. Rev. Neurosci."},{"key":"ref_56","unstructured":"Raffel, C., Mcfee, B., Humphrey, E.J., Salamon, J., Nieto, O., Liang, D., and Ellis, D.P.W. (2014, January 27\u201331). mir_eval: A Transparent Implementation of Common MIR Metrics. 
Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR), Taipei, Taiwan."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1109\/TSA.2002.800560","article-title":"Musical genre classification of audio signals","volume":"10","author":"Tzanetakis","year":"2002","journal-title":"IEEE Trans. Speech Audio Process."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Cannam, C., Landone, C., and Sandler, M. (2010). Sonic visualiser. Proceedings of the International Conference on Multimedia (MM \u201910), ACM Press.","DOI":"10.1145\/1873951.1874248"},{"key":"ref_59","unstructured":"Schreiber, H., and M\u00fcller, M. (2018, January 23\u201327). A single-step approach to musical tempo estimation using a convolutional neural network. Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1016\/S0079-7421(08)60536-8","article-title":"Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem","volume":"24","author":"McCloskey","year":"1989","journal-title":"Psychol. Learn. Motiv. Adv. Res. Theory"},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1016\/j.neunet.2019.01.012","article-title":"Continual lifelong learning with neural networks: A review","volume":"113","author":"Parisi","year":"2019","journal-title":"Neural Netw."},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Guo, Y., Shi, H., Kumar, A., Grauman, K., Rosing, T., and Feris, R. (2018). SpotTune: Transfer Learning through Adaptive Fine-tuning. 
arXiv.","DOI":"10.1109\/CVPR.2019.00494"}],"container-title":["Electronics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-9292\/10\/13\/1518\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:22:16Z","timestamp":1760163736000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-9292\/10\/13\/1518"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,23]]},"references-count":62,"journal-issue":{"issue":"13","published-online":{"date-parts":[[2021,7]]}},"alternative-id":["electronics10131518"],"URL":"https:\/\/doi.org\/10.3390\/electronics10131518","relation":{},"ISSN":["2079-9292"],"issn-type":[{"type":"electronic","value":"2079-9292"}],"subject":[],"published":{"date-parts":[[2021,6,23]]}}}