{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,8]],"date-time":"2026-03-08T17:18:01Z","timestamp":1772990281530,"version":"3.50.1"},"reference-count":34,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,5,26]],"date-time":"2023-05-26T00:00:00Z","timestamp":1685059200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,5,26]],"date-time":"2023-05-26T00:00:00Z","timestamp":1685059200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100009092","name":"Universidad de Alicante","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100009092","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Multimed Info Retr"],"published-print":{"date-parts":[[2023,6]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The recognition of patterns that have a time dependency is common in areas like speech recognition or natural language processing. The equivalent situation in image analysis is present in tasks like text or video recognition. Recently, Convolutional Recurrent Neural Networks (CRNN) have been broadly applied to solve these tasks in an end-to-end fashion with successful performance. However, its application to Optical Music Recognition (OMR) is not so straightforward due to the presence of different elements sharing the same horizontal position, disrupting the linear flow of the timeline. In this paper, we study the ability of the state-of-the-art CRNN approach to learn codes that represent this disruption in homophonic scores. In our experiments, we study the lower bounds in the recognition task of real scores when the models are trained with synthetic data. Two relevant conclusions are drawn: (1) Our serialized ways of encoding the music content are appropriate for CRNN-based OMR; (2) the learning process is possible with synthetic data, but there exists a <jats:italic>glass ceiling<\/jats:italic> when recognizing real sheet music.<\/jats:p>","DOI":"10.1007\/s13735-023-00278-5","type":"journal-article","created":{"date-parts":[[2023,5,26]],"date-time":"2023-05-26T15:02:29Z","timestamp":1685113349000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["Optical music recognition for homophonic scores with neural networks and synthetic music generation"],"prefix":"10.1007","volume":"12","author":[{"given":"Mar\u00eda","family":"Alfaro-Contreras","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jos\u00e9 M.","family":"I\u00f1esta","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jorge","family":"Calvo-Zaragoza","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,5,26]]},"reference":[{"key":"278_CR1","unstructured":"Alfaro\u00a0Contreras M (2018) Construcci\u00f3n de un corpus de referencia para investigaci\u00f3n en reconocimiento autom\u00e1tico de partituras musicales. Technical report, Universidad de Alicante. (In Spanish)"},{"key":"278_CR2","doi-asserted-by":"crossref","unstructured":"Alfaro-Contreras M, Calvo-Zaragoza J, I\u00f1esta JM (2019) Approaching end-to-end optical music recognition for homophonic scores. In: Iberian conference on pattern recognition and image analysis, pp 147\u2013158. Springer","DOI":"10.1007\/978-3-030-31321-0_13"},{"key":"278_CR3","unstructured":"Alfaro-Contreras M, Rizo D, I\u00f1esta JM, Calvo-Zaragoza J (2021) OMR-assisted transcription: a case study with early prints. In: Proceedings of the 22nd international society for music information retrieval conference, pp 35\u201341, Online. ISMIR"},{"issue":"2","key":"278_CR4","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1023\/A:1002485918032","volume":"35","author":"D Bainbridge","year":"2001","unstructured":"Bainbridge D, Bell T (2001) The challenge of optical music recognition. Comput Humanit 35(2):95\u2013121","journal-title":"Comput Humanit"},{"key":"278_CR5","doi-asserted-by":"crossref","unstructured":"Bar\u00f3 A, Badal C, Forn\u00e9s A (2020) Handwritten historical music recognition by sequence-to-sequence with attention mechanism. In: 17th International conference on frontiers in handwriting recognition, ICFHR 2020, Dortmund, Germany, 2020, pp 205\u2013210","DOI":"10.1109\/ICFHR2020.2020.00046"},{"key":"278_CR6","doi-asserted-by":"crossref","unstructured":"Bar\u00f3 A, Riba P, Forn\u00e9s A (2018) A starting point for handwritten music recognition. In: 1st International workshop on reading music systems. France, Paris, pp 5\u20136","DOI":"10.1016\/j.patrec.2019.02.029"},{"key":"278_CR7","unstructured":"Burgoyne JA, Pugin L, Eustace G, Fujinaga I (2007) A comparative survey of image binarisation algorithms for optical recognition on degraded musical sources. In: Proceedings of the 8th international conference on music information retrieval, ISMIR 2007, Vienna, Austria, 2007, pp 509\u2013512"},{"issue":"3","key":"278_CR8","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1080\/09298215.2015.1045424","volume":"44","author":"D Byrd","year":"2015","unstructured":"Byrd D, Simonsen JG (2015) Towards a standard testbed for optical music recognition: Definitions, metrics, and page images. J New Music Res 44(3):169\u2013195","journal-title":"J New Music Res"},{"key":"278_CR9","doi-asserted-by":"crossref","unstructured":"Calvo-Zaragoza J, Jr JH, Pacha A (2020) Understanding optical music recognition. ACM Comput Surv, 53(4): 1\u201377","DOI":"10.1145\/3397499"},{"issue":"3","key":"278_CR10","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1007\/s10032-016-0266-2","volume":"19","author":"J Calvo-Zaragoza","year":"2016","unstructured":"Calvo-Zaragoza J, Mic\u00f3 L, Oncina J (2016) Music staff removal with supervised pixel classification. Int J Doc Anal Recognit 19(3):211\u2013219","journal-title":"Int J Doc Anal Recognit"},{"key":"278_CR11","unstructured":"Calvo-Zaragoza J, Rizo D (2018) Camera-PrIMuS: neural end-to-end optical music recognition on realistic monophonic scores. In: Proceedings of the 19th international society for music information retrieval conference, ISMIR 2018, Paris, France, 2018, pp 248\u2013255"},{"key":"278_CR12","doi-asserted-by":"crossref","unstructured":"Calvo-Zaragoza J, Rizo D (2018) Camera-PrIMuS: neural end-to-end optical music recognition on realistic monophonic scores. In: Proceedings of the 19th international society for music information retrieval conference, pp 248\u2013255, Paris, France","DOI":"10.3390\/app8040606"},{"key":"278_CR13","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1016\/j.patrec.2019.08.021","volume":"128","author":"J Calvo-Zaragoza","year":"2019","unstructured":"Calvo-Zaragoza J, Toselli AH, Vidal E (2019) Handwritten music recognition for mensural notation with convolutional recurrent neural networks. Pattern Recognit Lett 128:115\u2013121","journal-title":"Pattern Recognit Lett"},{"key":"278_CR14","first-page":"1","volume":"7","author":"J Dem\u0161ar","year":"2006","unstructured":"Dem\u0161ar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1\u201330","journal-title":"J Mach Learn Res"},{"key":"278_CR15","doi-asserted-by":"crossref","unstructured":"Dutta A, Pal U, Forn\u00e9s A, Llad\u00f3s J (2010) An efficient staff removal approach from printed musical documents. In: 20th International conference on pattern recognition, ICPR 2010, Istanbul, Turkey, 2010, pp 1965\u20131968","DOI":"10.1109\/ICPR.2010.484"},{"key":"278_CR16","doi-asserted-by":"crossref","unstructured":"Forn\u00e9s A, S\u00e1nchez G (2014) Analysis and recognition of music scores. In: Handbook of document image processing and recognition, pp 749\u2013774","DOI":"10.1007\/978-0-85729-859-1_24"},{"key":"278_CR17","doi-asserted-by":"publisher","first-page":"138","DOI":"10.1016\/j.eswa.2017.07.002","volume":"89","author":"A-J Gallego","year":"2017","unstructured":"Gallego A-J, Calvo-Zaragoza J (2017) Staff-line removal with selectional auto-encoders. Expert Syst Appl 89:138\u2013148","journal-title":"Expert Syst Appl"},{"key":"278_CR18","unstructured":"Good M et\u00a0al (2001) MusicXML: an internet-friendly format for sheet music. In: XML conference and expo, pp 03\u201304"},{"key":"278_CR19","unstructured":"Graves A (2008) Supervised sequence labelling with recurrent neural networks. PhD thesis, Technical University Munich"},{"key":"278_CR20","doi-asserted-by":"crossref","unstructured":"Graves A, Fern\u00e1ndez S, Gomez F, Schmidhuber J (2006) Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd International Conference on Machine Learning, ICML \u201906, pp 369\u2013376, New York, NY, USA. ACM","DOI":"10.1145\/1143844.1143891"},{"key":"278_CR21","unstructured":"Graves A, Schmidhuber J (2009) Offline handwriting recognition with multidimensional recurrent neural networks. In: Advances in neural information processing systems, pp 545\u2013552"},{"key":"278_CR22","unstructured":"Hankinson A, Roland P, Fujinaga I (2011) The music encoding initiative as a document-encoding framework. In: Proceedings of the 12th international society for music information retrieval conference, pp 293\u2013298"},{"key":"278_CR23","unstructured":"Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: Bengio, Y, LeCun, Y., (eds) In: 3rd International conference on learning representations, ICLR 2015, San Diego, CA, USA, 7-9, 2015, Conference Track Proceedings"},{"issue":"7553","key":"278_CR24","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","volume":"521","author":"Y LeCun","year":"2015","unstructured":"LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436\u2013444","journal-title":"Nature"},{"issue":"9","key":"278_CR25","doi-asserted-by":"publisher","first-page":"6383","DOI":"10.1007\/s11042-019-08200-0","volume":"79","author":"L Mengarelli","year":"2020","unstructured":"Mengarelli L, Kostiuk B, Vitorio JG, Tibola MA, Wolff W, Silla CN (2020) OMR metrics and evaluation: a systematic review. Multimed Tools Appl 79(9):6383\u20136408","journal-title":"Multimed Tools Appl"},{"key":"278_CR26","doi-asserted-by":"publisher","DOI":"10.4324\/9780080502403","volume-title":"Composing music with computers","author":"E Miranda","year":"2001","unstructured":"Miranda E (2001) Composing music with computers. Focal Press, New York"},{"key":"278_CR27","unstructured":"Pacha A, Calvo-Zaragoza J, Jr JH (2019) Learning notation graph construction for full-pipeline optical music recognition. In: Flexer A, Peeters G, Urbano J, Volk A, (eds) In: Proceedings of the 20th international society for music information retrieval conference, ISMIR 2019, Delft, The Netherlands, 2019, pp 75\u201382"},{"key":"278_CR28","doi-asserted-by":"crossref","unstructured":"Pacha A, Eidenberger H (2017) Towards a universal music symbol classifier. In: 2017 14th IAPR International conference on document analysis and recognition (ICDAR), 2, pp 35\u201336. IEEE","DOI":"10.1109\/ICDAR.2017.265"},{"issue":"4","key":"278_CR29","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1007\/s10032-016-0271-5","volume":"19","author":"F Pedersoli","year":"2016","unstructured":"Pedersoli F, Tzanetakis G (2016) Document segmentation and classification into musical scores and text. Int J Doc Anal Recognit 19(4):289\u2013304","journal-title":"Int J Doc Anal Recognit"},{"key":"278_CR30","unstructured":"Raphael C, Wang J (2011) New approaches to optical music recognition. In: Klapuri A, Leider C, editors, In: Proceedings of the 12th international society for music information retrieval conference, ISMIR 2011, Miami, Florida, USA, October 24-28, 2011, pp 305\u2013310"},{"issue":"1","key":"278_CR31","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1007\/s10032-009-0100-1","volume":"13","author":"A Rebelo","year":"2010","unstructured":"Rebelo A, Capela G, Cardoso JdS (2010) Optical recognition of music symbols. Int J Doc Anal Recognit 13(1):19\u201331","journal-title":"Int J Doc Anal Recognit"},{"key":"278_CR32","doi-asserted-by":"publisher","first-page":"173","DOI":"10.1007\/s13735-012-0004-6","volume":"1","author":"A Rebelo","year":"2012","unstructured":"Rebelo A, Fujinaga I, Paszkiewicz F, Mar\u00e7al A, Guedes C, Cardoso J (2012) Optical music recognition: state-of-the-art and open issues. Int J Multimed Inf Retr 1:173\u2013190","journal-title":"Int J Multimed Inf Retr"},{"issue":"11","key":"278_CR33","doi-asserted-by":"publisher","first-page":"2298","DOI":"10.1109\/TPAMI.2016.2646371","volume":"39","author":"B Shi","year":"2017","unstructured":"Shi B, Bai X, Yao C (2017) An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans Pattern Anal Mach Intell 39(11):2298\u20132304","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"278_CR34","unstructured":"Williams RJ, Zipser D (1995) Gradient-based learning algorithms for recurrent networks and their computational complexity. In: Chauvin Y, Rumelhart DE, (eds.) Back-propagation: Theory, architectures and applications, 13: 433\u2013486"}],"container-title":["International Journal of Multimedia Information Retrieval"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13735-023-00278-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s13735-023-00278-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13735-023-00278-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,14]],"date-time":"2023-06-14T15:29:22Z","timestamp":1686756562000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s13735-023-00278-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,26]]},"references-count":34,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,6]]}},"alternative-id":["278"],"URL":"https:\/\/doi.org\/10.1007\/s13735-023-00278-5","relation":{},"ISSN":["2192-6611","2192-662X"],"issn-type":[{"value":"2192-6611","type":"print"},{"value":"2192-662X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,26]]},"assertion":[{"value":"25 May 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 January 2023","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 March 2023","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 May 2023","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflicts of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interests"}}],"article-number":"12"}}