{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T22:37:17Z","timestamp":1767998237299,"version":"3.49.0"},"reference-count":25,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2022,6,22]],"date-time":"2022-06-22T00:00:00Z","timestamp":1655856000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Open Access Publication Fund of the University of Wuerzburg"},{"name":"German academy of science"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>This paper deals with the effect of exploiting background knowledge for improving an OMR (Optical Music Recognition) deep learning pipeline for transcribing medieval, monophonic, handwritten music from the 12th\u201314th century, whose usage has been neglected in the literature. Various types of background knowledge about overlapping notes and text, clefs, graphical connections (neumes) and their implications on the position in staff of the notes were used and evaluated. Moreover, the effect of different encoder\/decoder architectures and of different datasets for training a mixed model and for document-specific fine-tuning based on an extended OMR pipeline with an additional post-processing step were evaluated. The use of background models improves all metrics and in particular the melody accuracy rate (mAR), which is based on the insert, delete and replace operations necessary to convert the generated melody into the correct melody. When using a mixed model and evaluating on a different dataset, our best model achieves without fine-tuning and without post-processing a mAR of 90.4%, which is raised by nearly 30% to 93.2% mAR using background knowledge. 
With additional fine-tuning, the contribution of post-processing is even greater: the basic mAR of 90.5% is raised by more than 50% to 95.8% mAR.<\/jats:p>","DOI":"10.3390\/a15070221","type":"journal-article","created":{"date-parts":[[2022,6,22]],"date-time":"2022-06-22T21:31:06Z","timestamp":1655933466000},"page":"221","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Optical Medieval Music Recognition Using Background Knowledge"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9093-5077","authenticated-orcid":false,"given":"Alexander","family":"Hartelt","sequence":"first","affiliation":[{"name":"Department for Artificial Intelligence and Knowledge Systems, University of Wuerzburg, D-97074 Wuerzburg, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7106-3223","authenticated-orcid":false,"given":"Frank","family":"Puppe","sequence":"additional","affiliation":[{"name":"Department for Artificial Intelligence and Knowledge Systems, University of Wuerzburg, D-97074 Wuerzburg, Germany"}]}],"member":"1968","published-online":{"date-parts":[[2022,6,22]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_2","unstructured":"Pacha, A., and Calvo-Zaragoza, J. (2018, January 23\u201327). Optical Music Recognition in Mensural Notation with Region-Based Convolutional Neural Networks. Proceedings of the 19th International Society for Music Information Retrieval Conference, Paris, France."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Pacha, A., Choi, K.Y., Co\u00fcasnon, B., Ricquebourg, Y., Zanibbi, R., and Eidenberger, H.M. (2018, January 24\u201327). Handwritten Music Object Detection: Open Issues and Baseline Results. 
Proceedings of the 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), Vienna, Austria.","DOI":"10.1109\/DAS.2018.51"},{"key":"ref_4","unstructured":"Ren, S., He, K., Girshick, R.B., and Sun, J. (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S.K., Girshick, R.B., and Farhadi, A. (2015). You Only Look Once: Unified, Real-Time Object Detection. arXiv.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_6","unstructured":"Wel, E., and Ullrich, K. (2017). Optical Music Recognition with Convolutional Sequence-to-Sequence Models. arXiv."},{"key":"ref_7","first-page":"1","article-title":"Understanding Optical Music Recognition","volume":"53","author":"Pacha","year":"2021","journal-title":"ACM Comput. Surv."},{"key":"ref_8","unstructured":"Bar\u00f3-Mas, A. (2017). Optical Music Recognition by Long Short-Term Memory Recurrent Neural Networks. [Master\u2019s Thesis, Universitat Aut\u00f2noma de Barcelona]."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Calvo-Zaragoza, J., and Rizo, D. (2018). End-to-End Neural Optical Music Recognition of Monophonic Scores. Appl. Sci., 8.","DOI":"10.3390\/app8040606"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Graves, A., Fern\u00e1ndez, S., Gomez, F., and Schmidhuber, J. (2006, January 25\u201329). Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. Proceedings of the 23rd International Conference on Machine Learning (ICML \u201906), Pittsburgh, PA, USA.","DOI":"10.1145\/1143844.1143891"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Calvo-Zaragoza, J., Castellanos, F.J., Vigliensoni, G., and Fujinaga, I. (2018). Deep Neural Networks for Document Processing of Music Score Images. Appl. 
Sci., 8.","DOI":"10.3390\/app8050654"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wick, C., Hartelt, A., and Puppe, F. (2019). Staff, Symbol and Melody Detection of Medieval Manuscripts Written in Square Notation Using Deep Fully Convolutional Networks. Appl. Sci., 9.","DOI":"10.20944\/preprints201905.0231.v1"},{"key":"ref_13","unstructured":"Hajic, J., Dorfer, M., Widmer, G., and Pecina, P. (2018, January 23\u201327). Towards Full-Pipeline Handwritten OMR with Musical Symbol Detection by U-Nets. Proceedings of the ISMIR, Paris, France."},{"key":"ref_14","unstructured":"d\u2019Andecy, V., Camillerapp, J., and Leplumey, I. (1994, January 9\u201313). Kalman filtering for segment detection: Application to music scores analysis. Proceedings of the 12th International Conference on Pattern Recognition, Jerusalem, Israel."},{"key":"ref_15","unstructured":"Fujinaga, I. (1988). Optical Music Recognition Using Projections, Faculty of Music McGill University."},{"key":"ref_16","unstructured":"Bellini, P., Bruno, I., and Nesi, P. (2001, January 23\u201324). Optical music sheet segmentation. Proceedings of the First International Conference on WEB Delivering of Music. WEDELMUSIC 2001, Florence, Italy."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"23763","DOI":"10.3390\/s150923763","article-title":"Block-based connected-component labeling algorithm using binary decision trees","volume":"15","author":"Chang","year":"2015","journal-title":"Sensors"},{"key":"ref_18","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep Residual Learning for Image Recognition. arXiv.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_20","unstructured":"Tan, M., and Le, Q.V. (2019). 
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R.B. (2017). Mask R-CNN. arXiv.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Lin, T., Maire, M., Belongie, S.J., Bourdev, L.D., Girshick, R.B., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014). Microsoft COCO: Common Objects in Context. arXiv.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_23","unstructured":"Eipert, T., Herrman, F., Wick, C., Puppe, F., and Haug, A. (2019, January 2). Editor Support for Digital Editions of Medieval Monophonic Music. Proceedings of the 2nd International Workshop on Reading Music Systems, Delft, The Netherlands."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). Imagenet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_25","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A Method for Stochastic Optimization. 
arXiv."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/7\/221\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:37:19Z","timestamp":1760139439000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/7\/221"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,22]]},"references-count":25,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2022,7]]}},"alternative-id":["a15070221"],"URL":"https:\/\/doi.org\/10.3390\/a15070221","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,6,22]]}}}