{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T17:44:28Z","timestamp":1754156668898,"version":"3.41.2"},"reference-count":20,"publisher":"Emerald","issue":"4","license":[{"start":{"date-parts":[[2020,7,16]],"date-time":"2020-07-16T00:00:00Z","timestamp":1594857600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AJIM"],"published-print":{"date-parts":[[2020,7,16]]},"abstract":"<jats:sec><jats:title content-type=\"abstract-subheading\">Purpose<\/jats:title><jats:p>The authors investigate optical character recognition (OCR) technology and discuss its implementation in the context of digitisation of archival materials.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Design\/methodology\/approach<\/jats:title><jats:p>The typewritten transcripts of the Croatian Writers' Society from the mid-60s of the 20th century are used as the test data. The optimal digitisation setup is investigated in order to obtain the best OCR results. This was done by using the sample of 123 pages digitised at different resolution settings and binarisation levels.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Findings<\/jats:title><jats:p>A series of tests showed that different settings produce significantly different results. The best OCR accuracy achieved at the test sample of the typewritten documents was 95.02%. The results show that the resolution is significantly more important than binarisation pre-processing procedure for achieving better OCR results.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Originality\/value<\/jats:title><jats:p>Based on the research results, the authors give recommendations for achieving optimal digitisation process setup with the aim of increasing the quality of OCR results. Finally, the authors put the research results in the context of digitisation of cultural heritage in general and discuss further investigation possibilities.<\/jats:p><\/jats:sec>","DOI":"10.1108\/ajim-11-2019-0326","type":"journal-article","created":{"date-parts":[[2020,7,17]],"date-time":"2020-07-17T05:56:41Z","timestamp":1594965401000},"page":"545-559","source":"Crossref","is-referenced-by-count":4,"title":["Optimisation of archival processes involving digitisation of typewritten documents"],"prefix":"10.1108","volume":"72","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6925-4402","authenticated-orcid":false,"given":"Hrvoje","family":"Stan\u010di\u0107","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1555-7130","authenticated-orcid":false,"given":"\u017deljko","family":"Trbu\u0161i\u0107","sequence":"additional","affiliation":[]}],"member":"140","reference":[{"key":"key2020111204340273400_ref001","unstructured":"Anderson, N. (2010a), \u201cIMPACT best practice guide: optical character recognition - part 1\u201d, available at: http:\/\/www.impact-project.eu\/uploads\/media\/IMPACT-ocr-bpg-pilot-s1_01.pdf (accessed 30 October 2019)."},{"key":"key2020111204340273400_ref003","unstructured":"Anderson, N. (2010b), \u201cIMPACT workflow resource: glossary for the mass digitisation of text and OCR\u201d, available at: https:\/\/www.digitisation.eu\/download\/website-files\/WorkflowResources\/GlossaryfortheMassDigitisationofText_OCR-ImpactWorkflowResource_01.pdf (accessed 30 October 2019)."},{"issue":"1","key":"key2020111204340273400_ref004","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1177\/0165551511429418","article-title":"Ocropodium: open source OCR for small-scale historical archives","volume":"38","year":"2012","journal-title":"Journal of Information Science"},{"key":"key2020111204340273400_ref005","first-page":"82970N","article-title":"Asymptotic cost in document conversion","volume-title":"Document Recognition and Retrieval XIX: Proceedings of SPIE","year":"2012"},{"key":"key2020111204340273400_ref006","first-page":"154","article-title":"QUARC: a remarkably effective method for increasing the OCR accuracy of degraded typewritten documents","volume-title":"1999 Symposium on Document Image Understanding Technology, April 1999, Annapolis, United States","year":"1999"},{"issue":"1","key":"key2020111204340273400_ref007","first-page":"106","article-title":"Optical character recognition applied to Romanian printed texts of the 18th-20th century","volume":"24","year":"2016","journal-title":"Computer Science Journal of Moldova"},{"issue":"3\/4","key":"key2020111204340273400_ref008","first-page":"1","article-title":"How good can it get? Analysing and improving OCR accuracy in large scale historic newspaper digitisation programs","volume":"15","year":"2008","journal-title":"D-Lib Magazine"},{"key":"key2020111204340273400_ref009","first-page":"3227","article-title":"Training and quality assessment of an optical character recognition model for Northern Haida","volume-title":"Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Paris","year":"2016"},{"first-page":"279","article-title":"How to improve optical character recognition of historical Finnish newspapers using open source tesseract OCR engine","year":"2017","key":"key2020111204340273400_ref010"},{"key":"key2020111204340273400_ref011","first-page":"23","article-title":"Improving optical character recognition of Finnish historical newspapers with a combination of Fraktur and Antiqua models and image preprocessing","volume-title":"Proceedings of the 21st Nordic Conference of Computational Linguistics, May 2017","year":"2017"},{"issue":"5","key":"key2020111204340273400_ref012","doi-asserted-by":"crossref","first-page":"954","DOI":"10.1108\/JD-07-2018-0114","article-title":"Transforming scholarship in the archives through handwritten text recognition: Transkribus as a case study","volume":"75","year":"2019","journal-title":"Journal of Documentation"},{"key":"key2020111204340273400_ref013","first-page":"58","article-title":"Optical character recognition: an illustrated guide to the frontier","volume-title":"Document Recognition and Retrieval VII, December 1999, San Jose, United States","year":"1999"},{"article-title":"Digitizing, coding, annotating, disseminating, and preserving documents","volume-title":"Proceedings of the 2006 International Workshop on Research Issues in Digital Libraries, Kolkota, India, 2006","year":"2007","key":"key2020111204340273400_ref014"},{"year":"1996","key":"key2020111204340273400_ref015","article-title":"The ISRI Analytic Tools for OCR Evaluation Version 5.1"},{"issue":"4","key":"key2020111204340273400_ref016","first-page":"38","article-title":"Where's the AI?","volume":"12","year":"1991","journal-title":"AI Magazine"},{"key":"key2020111204340273400_ref017","first-page":"629","article-title":"An overview of the Tesseract OCR engine","volume-title":"23-26 September 2007, Curitiba, Brazil","year":"2007"},{"issue":"5","key":"key2020111204340273400_ref018","doi-asserted-by":"crossref","first-page":"108","DOI":"10.17148\/IARJSET.2016.3523","article-title":"Document image analysis using ImageMagick and Tesseract-ocr","volume":"3","year":"2016","journal-title":"International Advanced Research Journal in Science, Engineering and Technology"},{"issue":"1","key":"key2020111204340273400_ref019","article-title":"Mining for the meanings of a murder: the impact of OCR quality on the use of digitized historical newspapers","volume":"8","year":"2014","journal-title":"DHQ: Digital Humanities Quarterly"},{"key":"key2020111204340273400_ref020","first-page":"252","article-title":"Impact analysis of OCR quality on research tasks in digital archives","volume-title":"Research and Advanced Technology for Digital Libraries, 19th International Conference on Theory and Practice of Digital Libraries, TPDL, 14-18 September 2015, Pozna\u0144, Poland","year":"2015"},{"key":"key2020111204340273400_ref021","unstructured":"Tweedie, M. (2018), \u201c6 technologies behind AI\u201d, available at: https:\/\/codebots.com\/ai-powered-bots\/6-technologies-behind-ai (accessed 30 October 2019)."}],"container-title":["Aslib Journal of Information Management"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/AJIM-11-2019-0326\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/AJIM-11-2019-0326\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T23:01:26Z","timestamp":1753398086000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/ajim\/article\/72\/4\/545-559\/39244"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,7,16]]},"references-count":20,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2020,7,16]]}},"alternative-id":["10.1108\/AJIM-11-2019-0326"],"URL":"https:\/\/doi.org\/10.1108\/ajim-11-2019-0326","relation":{},"ISSN":["2050-3806"],"issn-type":[{"type":"print","value":"2050-3806"}],"subject":[],"published":{"date-parts":[[2020,7,16]]}}}