{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T18:27:11Z","timestamp":1774549631582,"version":"3.50.1"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2023,11,16]],"date-time":"2023-11-16T00:00:00Z","timestamp":1700092800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Funds through FCT - Foundation for Science and Technology I.P.","award":["DSAIPA\/DS\/0023\/2018"],"award-info":[{"award-number":["DSAIPA\/DS\/0023\/2018"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Comput. Cult. Herit."],"published-print":{"date-parts":[[2023,12,31]]},"abstract":"<jats:p>Linked data is used in various fields as a new way of structuring and connecting data. Cultural heritage institutions have been using linked data to improve archival descriptions and facilitate the discovery of information. Most archival records have digital representations of physical artifacts in the form of scanned images that are non-machine-readable. Optical Character Recognition (OCR) recognizes text in images and translates it into machine-encoded text. This article evaluates the impact of image processing methods and parameter tuning in OCR applied to typewritten cultural heritage documents. The approach uses a multi-objective problem formulation to minimize Levenshtein edit distance and maximize the number of words correctly identified with a non-dominated sorting genetic algorithm (NSGA-II) to tune the methods\u2019 parameters. Evaluation results show that parameterization by digital representation typology benefits the performance of image pre-processing algorithms in OCR. 
Furthermore, our findings suggest that employing image pre-processing algorithms in OCR might be more suitable for typologies where the text recognition task without pre-processing does not produce good results. In particular, Adaptive Thresholding, Bilateral Filter, and Opening are the best-performing algorithms for the theater plays\u2019 covers, letters, and overall dataset, respectively, and should be applied before OCR to improve its performance.<\/jats:p>","DOI":"10.1145\/3606705","type":"journal-article","created":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T11:56:44Z","timestamp":1688126204000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Optimization of Image Processing Algorithms for Character Recognition in Cultural Typewritten Documents"],"prefix":"10.1145","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6329-2184","authenticated-orcid":false,"given":"Mariana","family":"Dias","sequence":"first","affiliation":[{"name":"Faculty of Engineering of the University of Porto and INESC-TEC, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4202-791X","authenticated-orcid":false,"given":"Carla Teixeira","family":"Lopes","sequence":"additional","affiliation":[{"name":"Faculty of Engineering of the University of Porto and INESC-TEC, Portugal"}]}],"member":"320","published-online":{"date-parts":[[2023,11,16]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2013.294"},{"key":"e_1_3_2_3_2","unstructured":"Konstantin Baierer. 2020. Models \u00b7 OCROPUS\/ocropy wiki. 
https:\/\/github.com\/ocropus\/ocropy\/wiki\/Models."},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.4018\/jswis.2009081901"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2990567"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1080\/01576895.2016.1233606"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2017.36"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/2595188.2595221"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2013.141"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2020.02.003"},{"key":"e_1_3_2_11_2","unstructured":"DGARQ. 2008. Arquivo Nacional Torre do Tombo. https:\/\/digitarq.arquivos.pt\/."},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.25747\/ZC25-1531"},{"key":"e_1_3_2_13_2","series-title":"CEUR Workshop Proceedings","first-page":"70","volume-title":"Proceedings of the 26th International Conference on Theory and Practice of Digital Libraries - Workshops and Doctoral Consortium","volume":"3246","author":"Dias Mariana","year":"2022","unstructured":"Mariana Dias and Carla Teixeira Lopes. 2022. Mining typewritten digital representations to support archival description. In Proceedings of the 26th International Conference on Theory and Practice of Digital Libraries - Workshops and Doctoral Consortium (CEUR Workshop Proceedings, Vol. 3246), Leonardo Candela and Gianmaria Silvello (Eds.). CEUR-WS.org, 70\u201376. https:\/\/ceur-ws.org\/Vol-3246\/09_Paper2.pdf."},{"key":"e_1_3_2_14_2","unstructured":"Instituto dos Arquivos Nacionais\/Torre do Tombo. 2006. Orienta\u00e7\u00f5es para a gest\u00e3o de documentos de Arquivo no contexto de uma reestrutura\u00e7\u00e3o da Administra\u00e7\u00e3o Central do Estado. 
55 pages."},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10032-020-00359-9"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.25747\/WPNA-JE39"},{"key":"e_1_3_2_17_2","unstructured":"FineReader PDF. 2019. How AI Powers PDF Software & Technology Trends: FineReader Blog. https:\/\/pdf.abbyy.com\/blog\/finereader-powered-by-ai\/."},{"key":"e_1_3_2_18_2","unstructured":"FineReader PDF. 2021. Technical Specifications and System Requirements | FineReader PDF. https:\/\/pdf.abbyy.com\/specifications\/."},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2016-431"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.3390\/w11081650"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2006.04.043"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.5121\/sipij.2015.6401"},{"key":"e_1_3_2_23_2","first-page":"6","article-title":"Brno mobile OCR dataset","volume":"1907","author":"Kiss Martin","year":"2019","unstructured":"Martin Kiss, Michal Hradis, and Oldrich Kodym. 2019. Brno mobile OCR dataset. CoRR abs\/1907.01307 (2019), 6 pages. http:\/\/arxiv.org\/abs\/1907.01307.","journal-title":"CoRR"},{"key":"e_1_3_2_24_2","first-page":"277","volume-title":"Proceedings of the 21st Nordic Conference on Computational Linguistics","author":"Koistinen Mika","year":"2017","unstructured":"Mika Koistinen, Kimmo Kettunen, and Tuula P\u00e4\u00e4kk\u00f6nen. 2017. Improving optical character recognition of Finnish historical newspapers with a combination of Fraktur & Antiqua models and image preprocessing. In Proceedings of the 21st Nordic Conference on Computational Linguistics. Association for Computational Linguistics, 277\u2013283. 
https:\/\/aclanthology.org\/W17-0238."},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-50137-6_7"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.13140\/RG.2.1.3896.7842"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1117\/12.2006228"},{"key":"e_1_3_2_28_2","first-page":"15","article-title":"Neural OCR post-hoc correction of historical corpora","volume":"2102","author":"Lyu Lijun","year":"2021","unstructured":"Lijun Lyu, Maria Koutraki, Martin Krickl, and Besnik Fetahu. 2021. Neural OCR post-hoc correction of historical corpora. CoRR abs\/2102.00583 (2021), 15 pages. https:\/\/arxiv.org\/abs\/2102.00583.","journal-title":"CoRR"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","unstructured":"Mumtazimah Mohamad. 2015. A Review on OpenCV. 17 pages. DOI:10.13140\/RG.2.1.2269.8721","DOI":"10.13140\/RG.2.1.2269.8721"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.3390\/mca27060103"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/JCDL.2019.00015"},{"key":"e_1_3_2_32_2","first-page":"83","article-title":"Linked data for archives","volume":"82","author":"Niu Jinfang","year":"2016","unstructured":"Jinfang Niu. 2016. Linked data for archives. Archivaria 82 (2016), 83\u2013110. 
https:\/\/archivaria.ca\/index.php\/archivaria\/article\/view\/13582.","journal-title":"Archivaria"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4612-4380-9_15"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1108\/17563780910939282"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2017.28"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/2809544.2809554"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.5555\/3322706.3361994"},{"key":"e_1_3_2_38_2","first-page":"231","volume-title":"Chapter 6 - Optimal Design of Heat Exchanger Networks","author":"Roetzel Wilfried","year":"2020","unstructured":"Wilfried Roetzel, Xing Luo, and Dezhen Chen. 2020. Chapter 6 - Optimal Design of Heat Exchanger Networks. Academic Press, 231\u2013317."},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.3390\/sym12050715"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/CYBER53097.2021.9588230"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.3390\/jimaging5040048"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1007\/s42979-020-00176-1"},{"key":"e_1_3_2_43_2","first-page":"88","volume-title":"4th Workshop on Very Large Corpora (VLC@COLING\u201996)","author":"Tong Xiang","year":"1996","unstructured":"Xiang Tong and David A. Evans. 1996. A statistical approach to automatic OCR error correction in context. In 4th Workshop on Very Large Corpora (VLC@COLING\u201996). Association for Computational Linguistics, 88\u2013100. http:\/\/www.aclweb.org\/anthology\/W\/W96\/W96-0108.pdf."},{"key":"e_1_3_2_44_2","first-page":"21","volume-title":"13th IAPR International Workshop on Document Analysis Systems (DAS\u201918)","author":"Walker Jake","year":"2018","unstructured":"Jake Walker, Yasuhisa Fujii, and Ashok C. Popat. 2018. A web-based OCR service for documents. 
In 13th IAPR International Workshop on Document Analysis Systems (DAS\u201918), Vol. 1. 21\u201322."},{"key":"e_1_3_2_45_2","unstructured":"Stefan Weil. 2021. Tesseract-OCR\/langdata: Source Training Data for Tesseract for Lots of Languages. https:\/\/github.com\/tesseract-ocr\/langdata."},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1177\/1748302620942467"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1016\/B978-0-12-821986-7.00013-5"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.1998.999063"}],"container-title":["Journal on Computing and Cultural Heritage"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3606705","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3606705","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:48:51Z","timestamp":1750182531000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3606705"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,16]]},"references-count":47,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,12,31]]}},"alternative-id":["10.1145\/3606705"],"URL":"https:\/\/doi.org\/10.1145\/3606705","relation":{},"ISSN":["1556-4673","1556-4711"],"issn-type":[{"value":"1556-4673","type":"print"},{"value":"1556-4711","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,16]]},"assertion":[{"value":"2022-12-30","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-06-02","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2023-11-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}