{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T15:33:03Z","timestamp":1777735983685,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":27,"publisher":"ACM","license":[{"start":{"date-parts":[[2024,12,16]],"date-time":"2024-12-16T00:00:00Z","timestamp":1734307200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"FWF","award":["10.55776\/PAT1763723"],"award-info":[{"award-number":["10.55776\/PAT1763723"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,12,16]]},"DOI":"10.1145\/3677389.3702524","type":"proceedings-article","created":{"date-parts":[[2025,3,13]],"date-time":"2025-03-13T16:55:46Z","timestamp":1741884946000},"page":"1-3","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Text Extraction for Complex Historical Documents: A Modular Approach to Layout Detection and OCR"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-4787-9091","authenticated-orcid":false,"given":"David","family":"Fleischhacker","sequence":"first","affiliation":[{"name":"Graz University of Technology, Graz, Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9417-5316","authenticated-orcid":false,"given":"Wolfgang Thomas","family":"G\u00f6derle","sequence":"additional","affiliation":[{"name":"Structural Changes of the Technosphere, Max Planck Institute of Geoanthropology, Jena, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0202-6100","authenticated-orcid":false,"given":"Roman","family":"Kern","sequence":"additional","affiliation":[{"name":"Graz University of Technology, Graz, Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,3,13]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"[n.d.]. BirdFont. https:\/\/birdfont.org. Accessed: 2024-01-15."},{"key":"e_1_3_2_1_2_1","unstructured":"[n. d.]. Hof- und Staatsschematismus des \u00f6sterreichischen Kaiserthums. https:\/\/anno.onb.ac.at\/. Accessed: 2024-01-15."},{"key":"e_1_3_2_1_3_1","unstructured":"[n.d.]. Kraken OCR. https:\/\/kraken.re. Accessed: 2024-01-15."},{"key":"e_1_3_2_1_4_1","unstructured":"[n.d.]. Milit\u00e4r-Schematismus des \u00f6sterreichischen Kaiserthums. https:\/\/alex.onb.ac.at\/. Accessed: 2024-01-15."},{"key":"e_1_3_2_1_5_1","unstructured":"[n.d.]. OCR4all. https:\/\/github.com\/OCR4all. Accessed: 2024-01-15."},{"key":"e_1_3_2_1_6_1","unstructured":"[n.d.]. Tesseract OCR. https:\/\/github.com\/tesseract-ocr\/tesseract. Accessed: 2024-01-15."},{"key":"e_1_3_2_1_7_1","unstructured":"[n. d.]. Transkribus. https:\/\/readcoop.eu\/transkribus\/. Accessed: 2024-01-15."},{"key":"e_1_3_2_1_8_1","volume-title":"Beamte als historische Quelle: Der \u00d6sterreichische \"Amtskalender\". \u00d6sterreichische Zeitschrift f\u00fcr Geschichtswissenschaften 10, 3","author":"Bauer Ingrid","year":"1999","unstructured":"Ingrid Bauer. 1999. Beamte als historische Quelle: Der \u00d6sterreichische \"Amtskalender\". \u00d6sterreichische Zeitschrift f\u00fcr Geschichtswissenschaften 10, 3 (1999), 416--438."},{"key":"e_1_3_2_1_9_1","volume-title":"Proceedings of the 13th Language Resources and Evaluation Conference. 2327--2335","author":"Boros Emanuela","year":"2022","unstructured":"Emanuela Boros, Laurent Romary, and Beno\u00eet Sagot. 2022. Automatic OCR Post-Correction of Historical Documents at Scale. In Proceedings of the 13th Language Resources and Evaluation Conference. 2327--2335."},{"key":"e_1_3_2_1_10_1","unstructured":"Andreas B\u00fcttner Christian Reul and Frank Puppe. 2022. A Generic Approach to Layout Detection in Historical Documents. In Document Analysis Systems. 89--103."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Manuel Carbonell Marcel Villegas Alicia Forn\u00e9s and Josep Llad\u00f3s. 2020. Historical Document Layout Analysis Using Deep Learning. In Pattern Recognition and Image Analysis. 219--231.","DOI":"10.1016\/j.patrec.2020.05.001"},{"key":"e_1_3_2_1_12_1","volume-title":"Machine Learning and Libraries: A Report on the State of the Field","author":"Cordell Ryan","year":"2020","unstructured":"Ryan Cordell. 2020. Machine Learning and Libraries: A Report on the State of the Field. Library of Congress Labs (2020)."},{"key":"e_1_3_2_1_13_1","first-page":"5153","article-title":"Efficient Document Layout Analysis Using Transformers","volume":"44","author":"Douzon Jean-Paul","year":"2022","unstructured":"Jean-Paul Douzon and Mar\u00e7al Rusinol. 2022. Efficient Document Layout Analysis Using Transformers. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 9 (2022), 5153--5168.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_1_14_1","volume-title":"Proceedings of the 21st Nordic Conference on Computational Linguistics. 77--86","author":"Drobac Senka","year":"2020","unstructured":"Senka Drobac, Pekka Kauppinen, and Krister Lind\u00e9n. 2020. OCR and Post-Correction of Historical Finnish Texts. In Proceedings of the 21st Nordic Conference on Computational Linguistics. 77--86."},{"key":"e_1_3_2_1_15_1","volume-title":"Improving OCR Quality in 19th Century Historical Documents Using a Combined Machine Learning Based Approach. arXiv:2401.07787 [cs.CV]","author":"Fleischhacker David","unstructured":"David Fleischhacker, Wolfgang G\u00f6derle, and Roman Kern. 2024. Improving OCR Quality in 19th Century Historical Documents Using a Combined Machine Learning Based Approach. arXiv:2401.07787 [cs.CV]"},{"key":"e_1_3_2_1_16_1","first-page":"571","article-title":"An Analysis of OCR Accuracy by Historical Period for Printed English Books","volume":"38","author":"Grieggs Harrison","year":"2023","unstructured":"Harrison Grieggs, Pablo Ruiz, Nicholas Taylor, David Smith, Lauren Klein, and Shayoni Bhattacharyya. 2023. An Analysis of OCR Accuracy by Historical Period for Printed English Books. Digital Scholarship in the Humanities 38, 2 (2023), 571--587.","journal-title":"Digital Scholarship in the Humanities"},{"key":"e_1_3_2_1_17_1","volume-title":"IEEE\/ACM Joint Conference on Digital Libraries. 227--236","author":"Hamdi Ahmed","year":"2020","unstructured":"Ahmed Hamdi, Alix Jean-Caurant, Nicolas Sidere, Mickael Coustaty, and Antoine Doucet. 2020. An Analysis of the Performance of Named Entity Recognition over OCRed Documents. In IEEE\/ACM Joint Conference on Digital Libraries. 227--236."},{"key":"e_1_3_2_1_18_1","volume-title":"Comparison of OCR Tools and Post-correction Methods for Historical Texts. In Digital Humanities Conference.","author":"Jannidis Fotis","year":"2017","unstructured":"Fotis Jannidis, Leonard Konle, and Albin Zehe. 2017. Comparison of OCR Tools and Post-correction Methods for Historical Texts. In Digital Humanities Conference."},{"key":"e_1_3_2_1_19_1","volume-title":"Processing and Analysis of Visually Rich Historical Documents. In Digital Humanities Conference.","author":"Klut Meinard","year":"2023","unstructured":"Meinard Klut and Anna Bruhn. 2023. Processing and Analysis of Visually Rich Historical Documents. In Digital Humanities Conference."},{"key":"e_1_3_2_1_20_1","unstructured":"Old\u0159ich Kodym and Michal \u0160pan\u011bl. 2021. Page Layout Analysis of Historical Documents Using Deep Learning. In Advanced Concepts for Intelligent Vision Systems. 311--323."},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-2005"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-020-04910-x"},{"key":"e_1_3_2_1_23_1","volume-title":"Efficient Layout Analysis of Historical Documents Using Text Block Detection. In International Conference on Document Analysis and Recognition. 753--768","author":"Najem-Meyer Anis","year":"2022","unstructured":"Anis Najem-Meyer, Claude Barras, and Lori Lamel. 2022. Efficient Layout Analysis of Historical Documents Using Text Block Detection. In International Conference on Document Analysis and Recognition. 753--768."},{"key":"e_1_3_2_1_24_1","first-page":"3","article-title":"Layout Analysis and Text Column Detection in Historical Newspapers: Survey and Perspectives","volume":"56","author":"Neudecker Clemens","year":"2023","unstructured":"Clemens Neudecker and Christian Clausner. 2023. Layout Analysis and Text Column Detection in Historical Newspapers: Survey and Perspectives. Historical Methods: A Journal of Quantitative and Interdisciplinary History 56, 1 (2023), 3--29.","journal-title":"Historical Methods: A Journal of Quantitative and Interdisciplinary History"},{"key":"e_1_3_2_1_25_1","first-page":"1","article-title":"Document Layout Analysis: A Comprehensive Survey","volume":"55","author":"Rezanzehad Arash","year":"2023","unstructured":"Arash Rezanzehad and Ali Mohades. 2023. Document Layout Analysis: A Comprehensive Survey. Comput. Surveys 55, 11 (2023), 1--38.","journal-title":"Comput. Surveys"},{"key":"e_1_3_2_1_26_1","volume-title":"LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis. Document Analysis and Recognition","author":"Shen Zejiang","year":"2021","unstructured":"Zejiang Shen, Ruochen Zhang, Melissa Dell, Benjamin C. G. Lee, Jacob Carlson, and Weining Li. 2021. LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis. Document Analysis and Recognition (2021), 131--146."},{"key":"e_1_3_2_1_27_1","volume-title":"HO-RAE: An End-to-End Historical Document Processing Framework. In International Conference on Document Analysis and Recognition. 598--613","author":"Tarride Simon","year":"2023","unstructured":"Simon Tarride, Dominique Stutzmann, and Christopher Kermorvant. 2023. HO-RAE: An End-to-End Historical Document Processing Framework. In International Conference on Document Analysis and Recognition. 598--613."}],"event":{"name":"JCDL '24: 24th ACM\/IEEE Joint Conference on Digital Libraries","location":"Hong Kong China","acronym":"JCDL '24","sponsor":["SIGIR ACM Special Interest Group on Information Retrieval","SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","IEEE TCDL"]},"container-title":["Proceedings of the 24th ACM\/IEEE Joint Conference on Digital Libraries"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3677389.3702524","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3677389.3702524","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:19:07Z","timestamp":1750295947000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3677389.3702524"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,16]]},"references-count":27,"alternative-id":["10.1145\/3677389.3702524","10.1145\/3677389"],"URL":"https:\/\/doi.org\/10.1145\/3677389.3702524","relation":{},"subject":[],"published":{"date-parts":[[2024,12,16]]},"assertion":[{"value":"2025-03-13","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}