{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T16:05:53Z","timestamp":1776441953022,"version":"3.51.2"},"reference-count":51,"publisher":"Emerald","issue":"7","license":[{"start":{"date-parts":[[2025,8,14]],"date-time":"2025-08-14T00:00:00Z","timestamp":1755129600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licences\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,12,15]]},"abstract":"<jats:sec>\n                    <jats:title>Purpose<\/jats:title>\n                    <jats:p>The aim of this work is to provide an overview of the current capabilities of Multimodal Large Language Models (MLLMs) for Handwritten Text Recognition (HTR), assessing their potential when compared to traditional task-specific, supervised models.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Design\/methodology\/approach<\/jats:title>\n                    <jats:p>The approach is that of using a set of openly-available benchmarks to compare different LLMs with strong task-specific supervised baselines for the task of HTR.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Findings<\/jats:title>\n                    <jats:p>The results show that LLMs currently show a strong performance on English texts, yet they demonstrate a weaker performance on languages other than English, and do not possess a significant capability for self-correction. 
Moreover, their comparison with Transkribus\u2019s models highlight the fact that proprietary LLM models are the best performing, in particular on modern handwriting, while for historical documents the overall performance comparison between LLMs and Transkribus is not consistent.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Originality\/value<\/jats:title>\n                    <jats:p>The authors are not aware of a similar study relying on open benchmarks.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1108\/jd-03-2025-0082","type":"journal-article","created":{"date-parts":[[2025,9,17]],"date-time":"2025-09-17T06:16:25Z","timestamp":1758089785000},"page":"334-354","source":"Crossref","is-referenced-by-count":3,"title":["Benchmarking large language models for handwritten text recognition"],"prefix":"10.1108","volume":"81","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-8464-1636","authenticated-orcid":true,"given":"Giorgia","family":"Crosilla","sequence":"first","affiliation":[{"name":"University of Bologna Department of Classical Philology and Italian Studies","place":["Bologna, Italy"]}]},{"given":"Lukas","family":"Klic","sequence":"additional","affiliation":[{"name":"Digital Humanities Lab, Villa I Tatti","place":["Florence, Italy"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9806-084X","authenticated-orcid":true,"given":"Giovanni","family":"Colavizza","sequence":"additional","affiliation":[{"name":"University of Bologna Department of Classical Philology and Italian Studies","place":["Bologna, Italy"]},{"name":"University of Copenhagen Department of Communication","place":["Copenhagen, Denmark"]}]}],"member":"140","published-online":{"date-parts":[[2025,8,15]]},"reference":[{"issue":"1","key":"2025121000520758000_ref001","doi-asserted-by":"crossref","first-page":"18","DOI":"10.3390\/jimaging10010018","article-title":"Advancements and challenges in handwritten text recognition: 
a comprehensive survey","volume":"10","author":"AlKendi","year":"2024","journal-title":"Journal of Imaging"},{"key":"2025121000520758000_ref002","unstructured":"Augustin, E., Carr\u00e9, M., Grosicki, E., Brodin, J., Geoffrois, E. and Pr\u00eateux, F. (2006), \u201cRIMES evaluation campaign for handwritten mail processing\u201d, available at:\u00a0https:\/\/www.semanticscholar.org\/paper\/RIMES-evaluation-campaign-for-handwritten-mail-Augustin-Carr%C3%A9\/1a08e3055dd76c307f5f2993d54465dd407ad1ab (accessed\u00a014 January 2025)."},{"key":"2025121000520758000_ref003","doi-asserted-by":"crossref","unstructured":"Bluche, T., Louradour, J. and Messina, R. (2016), \u201cScan, attend and read: end-to-end handwritten paragraph recognition with MDLSTM attention\u201d, available at:\u00a0https:\/\/doi.org\/10.48550\/arXiv.1604.03286 (accessed\u00a014 January 2025).","DOI":"10.1109\/ICDAR.2017.174"},{"key":"2025121000520758000_ref004","unstructured":"Bourne, J.\n           (2025), \u201cCLOCR-C: context leveraging OCR correction with pre-trained language models\u201d, available at:\u00a0http:\/\/arxiv.org\/abs\/2408.17428 (accessed\u00a023 January 2025)."},{"key":"2025121000520758000_ref005","first-page":"29","article-title":"End-to-end handwritten text detection and transcription in full pages","author":"Carbonell","year":"2019"},{"key":"2025121000520758000_ref006","first-page":"340","volume-title":"Computer Analysis of Images and Patterns. CAIP 2021. 
Lecture Notes in Computer Science","author":"Cascianelli","year":"2021"},{"key":"2025121000520758000_ref007","first-page":"1506","article-title":"The LAM dataset: a novel benchmark for line-level handwritten text recognition","author":"Cascianelli","year":"2022"},{"key":"2025121000520758000_ref008","first-page":"43","article-title":"Handwriting recognition of historical documents with few labeled data","author":"Chammas","year":"2018"},{"key":"2025121000520758000_ref009","doi-asserted-by":"crossref","unstructured":"Chang, K.K., Cramer, M., Soni, S. and Bamman, D. (2023), \u201cSpeak, memory: an archaeology of books known to ChatGPT\/GPT-4\u201d, pp.\u00a07312-7327, doi: 10.18653\/v1\/2023.emnlp-main.453, available at:\u00a0http:\/\/arxiv.org\/abs\/2305.00118 (accessed\u00a024 November 2024).","DOI":"10.18653\/v1\/2023.emnlp-main.453"},{"issue":"1","key":"2025121000520758000_ref010","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3479010","article-title":"Archives and AI: an overview of current debates and future perspectives","volume":"15","author":"Colavizza","year":"2022","journal-title":"Journal on Computing and Cultural Heritage"},{"key":"2025121000520758000_ref011","doi-asserted-by":"crossref","unstructured":"Coquenet, D., Chatelain, C. and Paquet, T. (2021), \u201cEnd-to-end handwritten paragraph text recognition using a vertical attention network\u201d, Vol.\u00a045 No.\u00a01, pp.\u00a0508-524, available at:\u00a0https:\/\/doi.org\/10.1109\/TPAMI.2022.3144899 (accessed\u00a014 January 2025).","DOI":"10.1109\/TPAMI.2022.3144899"},{"key":"2025121000520758000_ref012","first-page":"8227","article-title":"DAN: a segmentation-free document attention network for handwritten document recognition","volume-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence","author":"Coquenet","year":"2023"},{"key":"2025121000520758000_ref013","unstructured":"Dao, T., Fu, D.Y., Ermon, S., Rudra, A. and R\u00e9, C. 
(2022), \u201cFlashAttention: fast and memory-efficient exact attention with IO-awareness\u201d, doi: 10.48550\/arXiv.2205.14135, available at:\u00a0http:\/\/arxiv.org\/abs\/2205.14135 (accessed\u00a020 January 2025)."},{"key":"2025121000520758000_ref014","unstructured":"Greif, G., Griesshaber, N. and Greif, R. (2025), \u201cMultimodal LLMs for OCR, OCR post-correction, and named entity recognition in historical documents\u201d, available at:\u00a0https:\/\/arxiv.org\/pdf\/2504.00414 (accessed\u00a016 June 2025)."},{"key":"2025121000520758000_ref015","doi-asserted-by":"crossref","first-page":"157","DOI":"10.14361\/9783839455845-007","volume-title":"Archives, Access and Artificial Intelligence: Working with Born-Digital and Digitized Archival Collections","author":"Hodel","year":"2022"},{"key":"2025121000520758000_ref016","doi-asserted-by":"publisher","first-page":"13","DOI":"10.5334\/johd.46","article-title":"General models for handwritten text recognition: feasibility and state-of-the art. German kurrent as an example","volume":"7","author":"Hodel","year":"2021","journal-title":"Journal of Open Humanities Data"},{"key":"2025121000520758000_ref017","unstructured":"Huang, J., Chen, X., Mishra, S., Zheng, H.S., Yu, A.W., Song, X. and Zhou, D. (2024), \u201cLarge language models cannot self-correct reasoning yet\u201d, available at:\u00a0https:\/\/arxiv.org\/abs\/2310.01798 (accessed\u00a021 October 2024)."},{"key":"2025121000520758000_ref018","unstructured":"Humphries, M., Leddy, L.C., Downton, Q., Legace, M., McConnell, J., Murray, I. and Spence, E. (2024), \u201cUnlocking the archives: using Large Language Models to transcribe handwritten historical documents\u201d, available at:\u00a0http:\/\/arxiv.org\/abs\/2411.03340 (accessed\u00a07 November 2024)."},{"key":"2025121000520758000_ref019","unstructured":"Kang, L., Riba, P., Rusi\u00f1ol, M., Forn\u00e9s, A. and Villegas, M. 
(2020), \u201cPay attention to what you read: non-recurrent handwritten text-line recognition\u201d, available at:\u00a0https:\/\/doi.org\/10.48550\/arXiv.2005.13044 (accessed\u00a014 January 2025)."},{"key":"2025121000520758000_ref020","unstructured":"Keskar, N.S., McCann, B., Varshney, L.R., Xiong, C. and Socher, R. (2019), \u201cCTRL: a conditional transformer language model for controllable generation\u201d, available at:\u00a0http:\/\/arxiv.org\/abs\/1909.05858 (accessed\u00a018 January 2025)."},{"key":"2025121000520758000_ref021","unstructured":"Kim, S., Baudru, J., Ryckbosch, W., Bersini, H. and Ginis, V. (2025), \u201cEarly evidence of how LLMs outperform traditional systems on OCR\/HTR tasks for historical records\u201d, available at:\u00a0http:\/\/arxiv.org\/abs\/2501.11623 (accessed\u00a05 February 2025)."},{"key":"2025121000520758000_ref022","first-page":"707","article-title":"Binary codes capable of correcting deletions, insertions, and reversals","volume":"10","author":"Levenshtein","year":"1965","journal-title":"Soviet Physics Doklady"},{"key":"2025121000520758000_ref023","unstructured":"Li, L.\n           (2024), \u201cHandwriting recognition in historical documents with multimodal LLM\u201d, available at:\u00a0http:\/\/arxiv.org\/abs\/2410.24034 (accessed\u00a024 November 2024)."},{"key":"2025121000520758000_ref024","doi-asserted-by":"publisher","first-page":"13094","DOI":"10.1609\/aaai.v37i11.26538","article-title":"TrOCR: transformer-based optical character recognition with pre-trained models","author":"Li","year":"2023"},{"key":"2025121000520758000_ref025","doi-asserted-by":"crossref","unstructured":"Li, Y., Chen, D., Tang, T. and Shen, X. 
(2024), \u201cHTR-VT: handwritten text recognition with vision transformer\u201d, Vol.\u00a0158, 110967, available at:\u00a0https:\/\/doi.org\/10.1016\/j.patcog.2024.110967 (accessed\u00a014 January 2025).","DOI":"10.1016\/j.patcog.2024.110967"},{"issue":"4","key":"2025121000520758000_ref026","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3690391","article-title":"How can generative artificial intelligence techniques facilitate intelligent research into ancient books?","volume":"17","author":"Liu","year":"2024","journal-title":"Journal on Computing and Cultural Heritage"},{"key":"2025121000520758000_ref027","doi-asserted-by":"publisher","first-page":"705","DOI":"10.1109\/icdar.1999.791885","article-title":"A full English sentence database for off-line handwriting recognition","author":"Marti","year":"1999"},{"issue":"1","key":"2025121000520758000_ref028","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1007\/s100320200071","article-title":"The IAM-database: an English sentence database for offline handwriting recognition","volume":"5","author":"Marti","year":"2002","journal-title":"International Journal on Document Analysis and Recognition"},{"key":"2025121000520758000_ref029","doi-asserted-by":"crossref","unstructured":"Moysset, B., Kermorvant, C. and Wolf, C. 
(2017), \u201cFull-page text recognition: learning where to start and when to stop\u201d, pp.\u00a0871-876, doi: 10.1109\/icdar.2017.147, available at:\u00a0http:\/\/arxiv.org\/abs\/1704.08628 (accessed\u00a019 October 2024).","DOI":"10.1109\/ICDAR.2017.147"},{"issue":"5","key":"2025121000520758000_ref030","doi-asserted-by":"publisher","first-page":"954","DOI":"10.1108\/JD-07-2018-0114","article-title":"Transforming scholarship in the archives through handwritten text recognition","volume":"75","author":"Muehlberger","year":"2019","journal-title":"Journal of Documentation"},{"key":"2025121000520758000_ref031","first-page":"13","article-title":"A survey of OCR evaluation tools and metrics","author":"Neudecker","year":"2021"},{"issue":"7","key":"2025121000520758000_ref032","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1108\/JD-09-2023-0183","article-title":"The implications of handwritten text recognition for accessing the past at scale","volume":"80","author":"Nockels","year":"2024","journal-title":"Journal of Documentation"},{"key":"2025121000520758000_ref033","first-page":"39","article-title":"Europeana newspapers OCR workflow evaluation","author":"Pletschacher","year":"2015"},{"key":"2025121000520758000_ref034","first-page":"67","article-title":"Are multidimensional recurrent layers really necessary for handwritten text recognition?","author":"Puigcerver","year":"2017"},{"key":"2025121000520758000_ref035","first-page":"630","article-title":"ICFHR2016 competition on handwritten text recognition on the READ dataset","author":"Sanchez","year":"2016"},{"key":"2025121000520758000_ref052","first-page":"785","author":"S\u00e1nchez","year":"2014","journal-title":"2014 14th International Conference on Frontiers in Handwriting Recognition"},{"key":"2025121000520758000_ref036","doi-asserted-by":"publisher","first-page":"1383","DOI":"10.1109\/icdar.2017.226","article-title":"ICDAR2017 competition on handwritten text recognition on the READ 
dataset","author":"S\u00e1nchez","year":"2017"},{"key":"2025121000520758000_ref037","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1007\/978-3-030-86334-0_4","volume-title":"Document Analysis and Recognition \u2013 ICDAR 2021","author":"Singh","year":"2021"},{"key":"2025121000520758000_ref038","unstructured":"Stechly, K., Marquez, M. and Kambhampati, S. (2023), \u201cGPT-4 doesn\u2019t know it\u2019s wrong: an analysis of iterative prompting for reasoning problems\u201d, available at:\u00a0http:\/\/arxiv.org\/abs\/2310.12397 (accessed\u00a05 January 2025)."},{"key":"2025121000520758000_ref039","unstructured":"Str\u00f6bel, P.B.\n           (2023), \u201cFlexible techniques for automatic text recognition of historical documents\u201d, PhD dissertation, University of Zurich, available at:\u00a0https:\/\/core.ac.uk\/outputs\/574072779\/?source=2 (accessed\u00a015 January 2025)."},{"key":"2025121000520758000_ref041","unstructured":"Str\u00f6bel, P.B., Clematide, S., Volk, M. and Hodel, T. (2022a), \u201cTransformer-based HTR for historical documents\u201d, available at:\u00a0https:\/\/doi.org\/10.48550\/arXiv.2203.11008 (accessed\u00a014 January 2025)."},{"key":"2025121000520758000_ref040","first-page":"4395","article-title":"Evaluation of HTR models without ground truth material","author":"Str\u00f6bel","year":"2022"},{"key":"2025121000520758000_ref042","first-page":"1","article-title":"Training full-page handwritten text recognition models without annotated line breaks","author":"Tensmeyer","year":"2019"},{"key":"2025121000520758000_ref043","doi-asserted-by":"crossref","first-page":"179","DOI":"10.14361\/9783839455845-008","volume-title":"Archives, Access and Artificial Intelligence: Working with Born-Digital and Digitized Archival Collections","author":"Terras","year":"2022"},{"key":"2025121000520758000_ref044","doi-asserted-by":"crossref","unstructured":"Terras, M., Anzinger, B., Gooding, P., M\u00fchlberger, G., Nockels, J., Romein, C.A., Stauder, A. 
and Stauder, F. (2025), \u201cThe artificial intelligence cooperative: READ-COOP, Transkribus, and the benefits of shared community infrastructure for automated text recognition [version 1; peer review: awaiting peer review]\u201d, Open Research Europe, Vol.\u00a05 No.\u00a016, p. 16, available at:\u00a0https:\/\/doi.org\/10.12688\/openreseurope.18747.1 (accessed\u00a031 January 2025).","DOI":"10.12688\/openreseurope.18747.1"},{"key":"2025121000520758000_ref045","first-page":"228","article-title":"Handwriting recognition with large multidimensional long short-term memory recurrent neural networks","author":"Voigtlaender","year":"2016"},{"key":"2025121000520758000_ref046","unstructured":"White, J., Fu, Q., Hays, S., Sandborn, M., Olea, C., Gilbert, H., Elnashar, A., Spencer-Smith, J. and Schmidt, D.C. (2023), \u201cA prompt pattern catalog to enhance prompt engineering with ChatGPT\u201d, available at:\u00a0http:\/\/arxiv.org\/abs\/2302.11382 (accessed\u00a09 October 2024)."},{"key":"2025121000520758000_ref047","doi-asserted-by":"publisher","first-page":"112","DOI":"10.1007\/978-3-030-86334-0_8","article-title":"Transformer for handwritten text recognition using bidirectional post-decoding","author":"Wick","year":"2021"},{"key":"2025121000520758000_ref048","doi-asserted-by":"publisher","first-page":"372","DOI":"10.1007\/978-3-030-01231-1_23","article-title":"Start, follow, read: end-to-end full-page handwriting recognition","volume":"11210","author":"Wigington","year":"2018"},{"key":"2025121000520758000_ref049","first-page":"211","article-title":"Language model integration for the recognition of handwritten medieval documents","author":"Wuthrich","year":"2009"},{"key":"2025121000520758000_ref050","doi-asserted-by":"crossref","first-page":"14698","DOI":"10.1109\/CVPR42600.2020.01472","article-title":"OrigamiNet: weakly-supervised, segmentation-free, one-step, full page text recognition by learning to unfold","author":"Yousef","year":"2020","journal-title":"2020 
IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)"}],"container-title":["Journal of Documentation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/jd\/article-pdf\/81\/7\/334\/10070996\/jd-03-2025-0082en.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/www.emerald.com\/jd\/article-pdf\/81\/7\/334\/10070996\/jd-03-2025-0082en.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,10]],"date-time":"2025-12-10T05:52:14Z","timestamp":1765345934000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.emerald.com\/jd\/article\/81\/7\/334\/1275080\/Benchmarking-large-language-models-for-handwritten"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,15]]},"references-count":51,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2025,12,15]]}},"URL":"https:\/\/doi.org\/10.1108\/jd-03-2025-0082","relation":{},"ISSN":["0022-0418","1758-7379"],"issn-type":[{"value":"0022-0418","type":"print"},{"value":"1758-7379","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,15]]}}}