{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T16:19:54Z","timestamp":1775578794725,"version":"3.50.1"},"reference-count":48,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2023,5,12]],"date-time":"2023-05-12T00:00:00Z","timestamp":1683849600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,5,12]],"date-time":"2023-05-12T00:00:00Z","timestamp":1683849600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["IJDAR"],"published-print":{"date-parts":[[2023,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>End-to-end solutions have brought about significant advances in the field of Optical Music Recognition. These approaches directly provide the symbolic representation of a given image of a musical score. Despite this, several documents, such as pianoform musical scores, cannot yet benefit from these solutions since their structural complexity does not allow their effective transcription. This paper presents a neural method whose objective is to transcribe these musical scores in an end-to-end fashion. We also introduce the <jats:sc>GrandStaff<\/jats:sc> dataset, which contains 53,882 single-system piano scores in common western modern notation. The sources are encoded in both a standard digital music representation and its adaptation for current transcription technologies. The method proposed in this paper is trained and evaluated using this dataset. 
The results show that the approach presented is, for the first time, able to effectively transcribe pianoform notation in an end-to-end manner.<\/jats:p>","DOI":"10.1007\/s10032-023-00432-z","type":"journal-article","created":{"date-parts":[[2023,5,12]],"date-time":"2023-05-12T13:02:44Z","timestamp":1683896564000},"page":"347-362","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["End-to-end optical music recognition for pianoform sheet music"],"prefix":"10.1007","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7770-9726","authenticated-orcid":false,"given":"Antonio","family":"R\u00edos-Vila","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3448-2688","authenticated-orcid":false,"given":"David","family":"Rizo","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6184-4694","authenticated-orcid":false,"given":"Jos\u00e9 M.","family":"I\u00f1esta","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3183-2232","authenticated-orcid":false,"given":"Jorge","family":"Calvo-Zaragoza","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,5,12]]},"reference":[{"issue":"4","key":"432_CR1","first-page":"77","volume":"53","author":"J J H Calvo-Zaragoza Jr","year":"2020","unstructured":"Calvo-Zaragoza, J. J. H., Jr., Pacha, A.: Understanding optical music recognition. ACM Comput. Surv 53(4), 77\u201317735 (2020)","journal-title":"ACM Comput. Surv"},{"issue":"3","key":"432_CR2","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1080\/09298215.2015.1045424","volume":"44","author":"D Byrd","year":"2015","unstructured":"Byrd, D., Simonsen, J.G.: Towards a standard testbed for optical music recognition: definitions, metrics, and page images. J. N. Music Res. 44(3), 169\u2013195 (2015)","journal-title":"J. N. 
Music Res."},{"key":"432_CR3","doi-asserted-by":"crossref","unstructured":"Haji\u010d\u00a0jr., J., Pecina, P.: The MUSCIMA++ dataset for handwritten optical music recognition. In: 14th International Conference on Document Analysis and Recognition, Kyoto, Japan, pp. 39\u201346 (2017)","DOI":"10.1109\/ICDAR.2017.16"},{"issue":"3","key":"432_CR4","doi-asserted-by":"publisher","first-page":"173","DOI":"10.1007\/s13735-012-0004-6","volume":"1","author":"A Rebelo","year":"2012","unstructured":"Rebelo, A., Fujinaga, I., Paszkiewicz, F., Marcal, A.R., Guedes, C., Cardoso, J.S.: Optical music recognition: state-of-the-art and open issues. Int. J. Multim. Inf. Retr. 1(3), 173\u2013190 (2012)","journal-title":"Int. J. Multim. Inf. Retr."},{"issue":"5","key":"432_CR5","doi-asserted-by":"publisher","first-page":"753","DOI":"10.1109\/TPAMI.2007.70749","volume":"30","author":"C Dalitz","year":"2008","unstructured":"Dalitz, C., Droettboom, M., Pranzas, B., Fujinaga, I.: A comparative study of staff removal algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 30(5), 753\u2013766 (2008)","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"432_CR6","first-page":"1","volume":"2007","author":"F Rossant","year":"2007","unstructured":"Rossant, F., Bloch, I.: Robust and adaptive OMR system including fuzzy modeling, fusion of musical rules, and possible error detection. EURASIP J. Adv. Sig. Proc. 2007, 1\u201325 (2007)","journal-title":"EURASIP J. Adv. Sig. Proc."},{"key":"432_CR7","doi-asserted-by":"publisher","first-page":"138","DOI":"10.1016\/j.eswa.2017.07.002","volume":"89","author":"A-J Gallego","year":"2017","unstructured":"Gallego, A.-J., Calvo-Zaragoza, J.: Staff-line removal with selectional auto-encoders. Exp. Syst. Appl. 89, 138\u2013148 (2017)","journal-title":"Exp. Syst. Appl."},{"key":"432_CR8","doi-asserted-by":"crossref","unstructured":"Pacha, A., Eidenberger, H.: Towards a universal music symbol classifier. 
In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 2, pp. 35\u201336 (2017). IEEE","DOI":"10.1109\/ICDAR.2017.265"},{"key":"432_CR9","unstructured":"Chowdhury, A., Vig, L.: An efficient end-to-end neural model for handwritten text recognition. In: 29th British Machine Vision Conference (2018)"},{"key":"432_CR10","doi-asserted-by":"crossref","unstructured":"Chiu, C.-C., Sainath, T.N., Wu, Y., Prabhavalkar, R., Nguyen, P., Chen, Z., Kannan, A., Weiss, R.J., Rao, K., Gonina, E., Jaitly, N., Li, B., Chorowski, J., Bacchiani, M.: State-of-the-art speech recognition with sequence-to-sequence models. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4774\u20134778 (2018)","DOI":"10.1109\/ICASSP.2018.8462105"},{"key":"432_CR11","doi-asserted-by":"publisher","first-page":"196","DOI":"10.1016\/j.patcog.2017.06.017","volume":"71","author":"J Zhang","year":"2017","unstructured":"Zhang, J., Du, J., Zhang, S., Liu, D., Hu, Y., Hu, J., Wei, S., Dai, L.: Watch, attend and parse: an end-to-end neural network based approach to handwritten mathematical expression recognition. Pattern Recogn. 71, 196\u2013206 (2017)","journal-title":"Pattern Recogn."},{"issue":"11","key":"432_CR12","doi-asserted-by":"publisher","first-page":"2298","DOI":"10.1109\/TPAMI.2016.2646371","volume":"39","author":"B Shi","year":"2017","unstructured":"Shi, B., Bai, X., Yao, C.: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell. 39(11), 2298\u20132304 (2017)","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"issue":"4","key":"432_CR13","doi-asserted-by":"publisher","first-page":"606","DOI":"10.3390\/app8040606","volume":"8","author":"J Calvo-Zaragoza","year":"2018","unstructured":"Calvo-Zaragoza, J., Rizo, D.: End-to-end neural optical music recognition of monophonic scores. Appl. Sci. 
8(4), 606 (2018)","journal-title":"Appl. Sci."},{"key":"432_CR14","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1016\/j.patrec.2022.04.032","volume":"158","author":"M Alfaro-Contreras","year":"2022","unstructured":"Alfaro-Contreras, M., R\u00edos-Vila, A., Valero-Mas, J.J., I\u00f1esta, J.M., Calvo-Zaragoza, J.: Decoupling music notation to improve end-to-end optical music recognition. Pattern Recognit. Lett. 158, 157\u2013163 (2022)","journal-title":"Pattern Recognit. Lett."},{"key":"432_CR15","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.patrec.2019.02.029","volume":"123","author":"A Bar\u00f3","year":"2019","unstructured":"Bar\u00f3, A., Riba, P., Calvo-Zaragoza, J., Forn\u00e9s, A.: From optical music recognition to handwritten music recognition: a baseline. Pattern Recogn. Lett. 123, 1\u20138 (2019)","journal-title":"Pattern Recogn. Lett."},{"key":"432_CR16","doi-asserted-by":"crossref","unstructured":"Alfaro-Contreras, M., Calvo-Zaragoza, J., I\u00f1esta, J.M.: Approaching end-to-end optical music recognition for homophonic scores. In: 9th Iberian Conference Pattern Recognition and Image Analysis. Lecture Notes in Computer Science, vol. 11868, pp. 147\u2013158. Springer, Madrid, Spain (2019)","DOI":"10.1007\/978-3-030-31321-0_13"},{"key":"432_CR17","unstructured":"Edirisooriya, S., Dong, H., McAuley, J.J., Berg-Kirkpatrick, T.: An empirical evaluation of end-to-end polyphonic optical music recognition. In: Proceedings of the 22nd International Society for Music Information Retrieval Conference, ISMIR 2021, Online, pp. 167\u2013173 (2021)"},{"key":"432_CR18","doi-asserted-by":"crossref","unstructured":"Tuggener, L., Elezi, I., Schmidhuber, J., Pelillo, M., Stadelmann, T.: DeepScores-A Dataset for Segmentation, Detection and Classification of Tiny Objects. In: Proceedings of the 24th International Conference on Pattern Recognition, pp. 
3704\u20133709 (2018)","DOI":"10.1109\/ICPR.2018.8545307"},{"key":"432_CR19","doi-asserted-by":"crossref","unstructured":"Jan Haji\u010d, j., Pecina, P.: The MUSCIMA++ Dataset for Handwritten Optical Music Recognition. In: 14th International Conference on Document Analysis and Recognition, ICDAR 2017, Kyoto, Japan, November 13 - 15, 2017, pp. 39\u201346. IEEE Computer Society, New York, USA (2017). Dept. of Computer Science and Intelligent Systems, Graduate School of Engineering, Osaka Prefecture University","DOI":"10.1109\/ICDAR.2017.16"},{"key":"432_CR20","unstructured":"Parada-Cabaleiro, E., Batliner, A., Schuller, B.W.: A diplomatic edition of Il Lauro Secco: ground truth for OMR of white mensural notation. In: Proceedings of the 20th International Society for Music Information Retrieval Conference, pp. 557\u2013564 (2019)"},{"key":"432_CR21","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1016\/j.patrec.2019.08.021","volume":"128","author":"J Calvo-Zaragoza","year":"2019","unstructured":"Calvo-Zaragoza, J., Toselli, A.H., Vidal, E.: Handwritten music recognition for mensural notation with convolutional recurrent neural networks. Pattern Recognit. Lett. 128, 115\u2013121 (2019)","journal-title":"Pattern Recognit. Lett."},{"key":"432_CR22","volume-title":"Codified Spanish Music Heritage Through Verovio: The Online Platforms Fondo de M\u00fasica Tradicional IMF-CSIC and Books of Hispanic Polyphony IMF-CSIC","author":"E Ros-F\u00e1bregas","year":"2021","unstructured":"Ros-F\u00e1bregas, E.: Codified Spanish Music Heritage Through Verovio: The Online Platforms Fondo de M\u00fasica Tradicional IMF-CSIC and Books of Hispanic Polyphony IMF-CSIC. 
Alicante, Spain (2021)"},{"key":"432_CR23","volume-title":"Behind Bars: The Definitive Guide to Music Notation","author":"E Gould","year":"2011","unstructured":"Gould, E.: Behind Bars: The Definitive Guide to Music Notation, Faber Faber Music, London, United Kingdom (2011)","edition":"Faber"},{"key":"432_CR24","unstructured":"Hankinson, A., Roland, P., Fujinaga, I.: The music encoding initiative as a document-encoding framework. In: International Conference on Music Information Retrieval (2011)"},{"key":"432_CR25","doi-asserted-by":"crossref","unstructured":"Good, M., Actor, G.: Using MusicXML for file interchange. In: Web Delivering of Music, International Conference on 0, 153 (2003)","DOI":"10.1109\/WDM.2003.1233890"},{"key":"432_CR26","unstructured":"Huron, D.: Humdrum and Kern: Selective feature encoding BT - beyond MIDI: the handbook of musical codes. In: Beyond MIDI: The Handbook of Musical Codes, pp. 375\u2013401. MIT Press, Cambridge, MA, USA (1997)"},{"key":"432_CR27","unstructured":"Sapp, C.S.: Verovio humdrum viewer. In: Proceedings of Music Encoding Conference (MEC), Tours, France (2017)"},{"key":"432_CR28","unstructured":"Pugin, L., Zitellini, R., Roland, P.: Verovio - a library for engraving MEI music notation into SVG. In: International Society for Music Information Retrieval (2014)"},{"key":"432_CR29","unstructured":"Calvo-Zaragoza, J., Rizo, D.: Camera-PrIMuS: Neural end-to-end optical music recognition on realistic monophonic scores. In: Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018, Paris, France, pp. 248\u2013255 (2018)"},{"key":"432_CR30","doi-asserted-by":"publisher","first-page":"118211","DOI":"10.1016\/j.eswa.2022.118211","volume":"209","author":"FJ Castellanos","year":"2022","unstructured":"Castellanos, F.J., Garrido-Munoz, C., R\u00edos-Vila, A., Calvo-Zaragoza, J.: Region-based layout analysis of music score images. Exp. Syst. Appl. 209, 118211 (2022)","journal-title":"Exp. Syst. 
Appl."},{"key":"432_CR31","doi-asserted-by":"publisher","first-page":"122","DOI":"10.1016\/j.patcog.2019.05.025","volume":"94","author":"J S\u00e1nchez","year":"2019","unstructured":"S\u00e1nchez, J., Romero, V., Toselli, A.H., Villegas, M., Vidal, E.: A set of benchmarks for handwritten text recognition on historical documents. Pattern Recognit. 94, 122\u2013134 (2019)","journal-title":"Pattern Recognit."},{"key":"432_CR32","unstructured":"Graves, A., Fern\u00e1ndez, S., Gomez, F.J., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the Twenty-Third International Conference on Machine Learning, (ICML 2006), Pittsburgh, Pennsylvania, USA, June 25-29, 2006, pp. 369\u2013376 (2006)"},{"key":"432_CR33","unstructured":"Bluche, T.: Joint line segmentation and transcription for end-to-end handwritten paragraph recognition. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. NIPS\u201916, pp. 838\u2013846. Curran Associates Inc., Red Hook, NY, USA (2016)"},{"key":"432_CR34","doi-asserted-by":"crossref","unstructured":"Bluche, T., Louradour, J., Messina, R.O.: Scan, attend and read: End-to-end handwritten paragraph recognition with MDLSTM attention. In: 14th IAPR International Conference on Document Analysis and Recognition. ICDAR 2017, pp. 1050\u20131055. IEEE, Kyoto, Japan (2017)","DOI":"10.1109\/ICDAR.2017.174"},{"key":"432_CR35","doi-asserted-by":"crossref","unstructured":"Coquenet, D., Chatelain, C., Paquet, T.: End-to-end handwritten paragraph text recognition using a vertical attention network. In: IEEE Transactions on Pattern Analysis and Machine Intelligence (2022)","DOI":"10.1109\/TPAMI.2022.3144899"},{"key":"432_CR36","doi-asserted-by":"crossref","unstructured":"Singh, S.S., Karayev, S.: Full page handwriting recognition via image to sequence extraction. 
In: Document Analysis and Recognition - ICDAR 2021: 16th International Conference. Lausanne, Switzerland, September 5\u201310, 2021, Proceedings, Part III, pp. 55\u201369. Springer, Berlin, Heidelberg (2021)","DOI":"10.1007\/978-3-030-86334-0_4"},{"key":"432_CR37","doi-asserted-by":"crossref","unstructured":"Yousef, M., Bishop, T.E.: Origaminet: Weakly-supervised, segmentation-free, one-step, full page textrecognition by learning to unfold. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020)","DOI":"10.1109\/CVPR42600.2020.01472"},{"key":"432_CR38","doi-asserted-by":"crossref","unstructured":"Coquenet, D., Chatelain, C., Paquet, T.: Span: A simple predict & align network for handwritten paragraph recognition. In: 16th International Conference on Document Analysis and Recognition, ICDAR. Lecture Notes in Computer Science, vol. 12823, pp. 70\u201384 (2021)","DOI":"10.1007\/978-3-030-86334-0_5"},{"key":"432_CR39","first-page":"5998","volume":"30","author":"A Vaswani","year":"2017","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5998\u20136008 (2017)","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"432_CR40","doi-asserted-by":"crossref","unstructured":"R\u00edos-Vila, A., I\u00f1esta, J.M., Calvo-Zaragoza, J.: On the use of transformers for end-to-end optical music recognition. In: Pattern Recognition and Image Analysis, pp. 470\u2013481. Springer, Cham (2022)","DOI":"10.1007\/978-3-031-04881-4_37"},{"key":"432_CR41","doi-asserted-by":"publisher","first-page":"108766","DOI":"10.1016\/j.patcog.2022.108766","volume":"129","author":"L Kang","year":"2022","unstructured":"Kang, L., Riba, P., Rusi\u00f1ol, M., Forn\u00e9s, A., Villegas, M.: Pay attention to what you read: Non-recurrent handwritten text-line recognition. Pattern Recogn. 
129, 108766 (2022)","journal-title":"Pattern Recogn."},{"key":"432_CR42","unstructured":"Li, M., Lv, T., Chen, J., Cui, L., Lu, Y., Florencio, D., Zhang, C., Li, Z., Wei, F.: TrOCR: Transformer-Based Optical Character Recognition with Pre-trained Models. arXiv (2021)"},{"key":"432_CR43","doi-asserted-by":"crossref","unstructured":"Kudo, T., Richardson, J.: SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 66\u201371. Association for Computational Linguistics, Brussels, Belgium (2018)","DOI":"10.18653\/v1\/D18-2012"},{"key":"432_CR44","doi-asserted-by":"crossref","unstructured":"Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1715\u20131725. Association for Computational Linguistics, Berlin, Germany (2016)","DOI":"10.18653\/v1\/P16-1162"},{"key":"432_CR45","doi-asserted-by":"crossref","unstructured":"Kudo, T.: Subword regularization: Improving neural network translation models with multiple subword candidates. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 66\u201375. Association for Computational Linguistics, Melbourne, Australia (2018)","DOI":"10.18653\/v1\/P18-1007"},{"key":"432_CR46","doi-asserted-by":"crossref","unstructured":"R\u00edos-Vila, A., Rizo, D., Calvo-Zaragoza, J.: Complete optical music recognition via agnostic transcription and machine translation. In: Llad\u00f3s, J., Lopresti, D., Uchida, S. (eds.) 16th International Conference on Document Analysis and Recognition, ICDAR. Lecture Notes in Computer Science, vol. 12823, pp. 
661\u2013675 (2021)","DOI":"10.1007\/978-3-030-86334-0_43"},{"key":"432_CR47","doi-asserted-by":"crossref","unstructured":"Coquenet, D., Chatelain, C., Paquet, T.: DAN: A Segmentation-free Document Attention Network for Handwritten Document Recognition. arXiv (2022)","DOI":"10.1109\/TPAMI.2023.3235826"},{"key":"432_CR48","volume-title":"The Definitive ANTLR 4 Reference","author":"T Parr","year":"2013","unstructured":"Parr, T.: The Definitive ANTLR 4 Reference, 2nd edn. Pragmatic Bookshelf, Raleigh (2013)","edition":"2"}],"container-title":["International Journal on Document Analysis and Recognition (IJDAR)"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10032-023-00432-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10032-023-00432-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10032-023-00432-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,20]],"date-time":"2024-10-20T08:25:00Z","timestamp":1729412700000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10032-023-00432-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,12]]},"references-count":48,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,9]]}},"alternative-id":["432"],"URL":"https:\/\/doi.org\/10.1007\/s10032-023-00432-z","relation":{},"ISSN":["1433-2833","1433-2825"],"issn-type":[{"value":"1433-2833","type":"print"},{"value":"1433-2825","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,12]]},"assertion":[{"value":"14 November 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article 
History"}},{"value":"2 April 2023","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 April 2023","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 May 2023","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}