{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,8]],"date-time":"2026-05-08T22:12:49Z","timestamp":1778278369055,"version":"3.51.4"},"reference-count":77,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2024,6,6]],"date-time":"2024-06-06T00:00:00Z","timestamp":1717632000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,6,6]],"date-time":"2024-06-06T00:00:00Z","timestamp":1717632000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100016378","name":"Technische Universit\u00e4t Dortmund","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100016378","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["IJDAR"],"published-print":{"date-parts":[[2024,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Semantic analysis of handwritten document images offers a wide range of practical application scenarios. A sequential combination of handwritten text recognition (HTR) and a task-specific natural language processing system offers an intuitive solution in this domain. However, this HTR-based approach suffers from the problem of error propagation. An HTR-free model, which avoids explicit text recognition and solves the task end-to-end, tackles this problem, but often produces poor results. A possible reason for this is that it does not incorporate largely pre-trained semantic word embeddings, which turn out to be one of the most powerful advantages in the textual domain. 
In this work, we propose an HTR-based and an HTR-free model and compare them on a variety of segmentation-based handwritten document image benchmarks including semantic word spotting, named entity recognition, and question answering. Furthermore, we propose a cross-modal knowledge distillation approach to integrate semantic knowledge from textually pre-trained word embeddings into HTR-free models. In a series of experiments, we investigate optimization strategies for robust semantic word image representation. We show that the incorporation of semantic knowledge is beneficial for HTR-free approaches in achieving state-of-the-art results on a variety of benchmarks.<\/jats:p>","DOI":"10.1007\/s10032-024-00477-8","type":"journal-article","created":{"date-parts":[[2024,6,6]],"date-time":"2024-06-06T13:01:38Z","timestamp":1717678898000},"page":"245-263","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Neural models for semantic analysis of handwritten document images"],"prefix":"10.1007","volume":"27","author":[{"given":"Oliver","family":"T\u00fcselmann","sequence":"first","affiliation":[]},{"given":"Gernot A.","family":"Fink","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,6,6]]},"reference":[{"key":"477_CR1","doi-asserted-by":"crossref","unstructured":"Adak, C., Chaudhuri, B.B., Blumenstein, M.: Named entity recognition from unstructured handwritten document images. In: International Workshop on Document Analysis Systems, pp. 375\u2013380 (2016)","DOI":"10.1109\/DAS.2016.15"},{"key":"477_CR2","doi-asserted-by":"crossref","unstructured":"Adak, C., Chaudhuri, B.B., Lin, C., Blumenstein, M.: Detecting named entities in unstructured Bengali manuscript images. In: International Conference on Document Analysis and Recognition, pp. 
196\u2013201 (2019)","DOI":"10.1109\/ICDAR.2019.00040"},{"key":"477_CR3","unstructured":"Akbik, A., Bergmann, T., Blythe, D., Rasul, K., Schweter, S., Vollgraf, R.: FLAIR: An easy-to-use framework for state-of-the-art NLP. In: Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 54\u201359 (2019)"},{"key":"477_CR4","unstructured":"Akbik, A., Blythe, D., Vollgraf, R.: Contextual string embeddings for sequence labeling. In: International Conference on Computational Linguistics, pp. 1638\u20131649 (2018)"},{"issue":"12","key":"477_CR5","doi-asserted-by":"publisher","first-page":"2552","DOI":"10.1109\/TPAMI.2014.2339814","volume":"36","author":"J Almaz\u00e1n","year":"2014","unstructured":"Almaz\u00e1n, J., Gordo, A., Forn\u00e9s, A., Valveny, E.: Word spotting and recognition with embedded attributes. IEEE Trans. Pattern Anal. Mach. Intell. 36(12), 2552\u20132566 (2014)","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"477_CR6","doi-asserted-by":"crossref","unstructured":"Appalaraju, S., Jasani, B., Kota, B.U., Xie, Y., Manmatha, R.: DocFormer: End-to-end transformer for document understanding. In: International Conference on Computer Vision, pp. 973\u2013983 (2021)","DOI":"10.1109\/ICCV48922.2021.00103"},{"key":"477_CR7","unstructured":"Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: International Conference on Learning Representations (2015)"},{"issue":"6","key":"477_CR8","doi-asserted-by":"publisher","first-page":"683","DOI":"10.1017\/S1351324921000395","volume":"28","author":"R Baradaran","year":"2022","unstructured":"Baradaran, R., Ghiasi, R., Amirkhani, H.: A survey on machine reading comprehension systems. Nat. Lang. Eng. 28(6), 683\u2013732 (2022)","journal-title":"Nat. Lang. 
Eng."},{"key":"477_CR9","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1162\/tacl_a_00051","volume":"5","author":"P Bojanowski","year":"2017","unstructured":"Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135\u2013146 (2017)","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"477_CR10","doi-asserted-by":"crossref","unstructured":"Boros, E., Romero, V., Maarand, M., Zenklov\u00e1, K., Kreckov\u00e1, J., Vidal, E., Stutzmann, D., Kermorvant, C.: A comparison of sequential and combined approaches for named entity recognition in a corpus of handwritten medieval charters. In: International Conference on Frontiers in Handwriting Recognition, pp. 79\u201384 (2020)","DOI":"10.1109\/ICFHR2020.2020.00025"},{"key":"477_CR11","doi-asserted-by":"crossref","unstructured":"Bos, J., Basile, V., Evang, K., Venhuizen, N., Bjerva, J.: The Groningen meaning bank. In: Joint Symposium on Semantic Processing, pp. 463\u2013496 (2017)","DOI":"10.1007\/978-94-024-0881-2_18"},{"key":"477_CR12","doi-asserted-by":"publisher","first-page":"219","DOI":"10.1016\/j.patrec.2020.05.001","volume":"136","author":"M Carbonell","year":"2020","unstructured":"Carbonell, M., Forn\u00e9s, A., Villegas, M., Llad\u00f3s, J.: A neural model for text localization, transcription and named entity recognition in full pages. Pattern Recogn. Lett. 136, 219\u2013227 (2020)","journal-title":"Pattern Recogn. Lett."},{"key":"477_CR13","doi-asserted-by":"crossref","unstructured":"Carbonell, M., Villegas, M., Forn\u00e9s, A., Llad\u00f3s, J.: Joint recognition of handwritten text and named entities with a neural end-to-end model. In: International Workshop on Document Analysis Systems, pp. 
399\u2013404 (2018)","DOI":"10.1109\/DAS.2018.52"},{"key":"477_CR14","doi-asserted-by":"crossref","unstructured":"Chiron, G., Doucet, A., Coustaty, M., Visani, M., Moreux, J.: Impact of OCR errors on the use of digital libraries: Towards a better access to information. In: Joint Conference on Digital Libraries, pp. 249\u2013252 (2017)","DOI":"10.1109\/JCDL.2017.7991582"},{"key":"477_CR15","unstructured":"Cui, L., Xu, Y., Lv, T., Wei, F.: Document AI: Benchmarks, models and applications. CoRR abs\/2111.08609 (2021)"},{"key":"477_CR16","doi-asserted-by":"crossref","unstructured":"Davis, B.L., Morse, B.S., Price, B.L., Tensmeyer, C., Wigington, C., Morariu, V.I.: End-to-end document recognition and understanding with Dessurt. In: European Conference on Computer Vision, pp. 280\u2013296 (2022)","DOI":"10.1007\/978-3-031-25069-9_19"},{"key":"477_CR17","unstructured":"Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 4171\u20134186 (2019)"},{"key":"477_CR18","doi-asserted-by":"crossref","unstructured":"Dhiaf, M., Jemni, S.K., Kessentini, Y.: DocNER: A deep learning system for named entity recognition in handwritten document images. In: International Conference on Neural Information Processing, pp. 239\u2013246 (2021)","DOI":"10.1007\/978-3-030-92310-5_28"},{"issue":"2","key":"477_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3604931","volume":"56","author":"M Ehrmann","year":"2021","unstructured":"Ehrmann, M., Hamdi, A., Pontes, E.L., Romanello, M., Doucet, A.: Named entity recognition and classification in historical documents: A survey. ACM Comput. Surv. 56(2), 1\u201347 (2021)","journal-title":"ACM Comput. Surv."},{"key":"477_CR20","doi-asserted-by":"crossref","unstructured":"Ethayarajh, K.: How contextual are contextualized word representations? 
Comparing the geometry of BERT, ELMo, and GPT-2 embeddings. In: Conference on Empirical Methods in Natural Language Processing, pp. 55\u201365 (2019)","DOI":"10.18653\/v1\/D19-1006"},{"key":"477_CR21","doi-asserted-by":"crossref","unstructured":"Forn\u00e9s, A., Romero, V., Baro, A., Toledo, J.I., S\u00e1nchez, J., Vidal, E., Llad\u00f3s, J.: ICDAR2017 competition on information extraction in historical handwritten records. In: International Conference on Document Analysis and Recognition, pp. 1389\u20131394 (2017)","DOI":"10.1109\/ICDAR.2017.227"},{"key":"477_CR22","doi-asserted-by":"publisher","first-page":"310","DOI":"10.1016\/j.patcog.2017.02.023","volume":"68","author":"AP Giotis","year":"2017","unstructured":"Giotis, A.P., Sfikas, G., Gatos, B., Nikou, C.: A survey of document image word spotting techniques. Pattern Recogn. 68, 310\u2013332 (2017)","journal-title":"Pattern Recogn."},{"issue":"6","key":"477_CR23","doi-asserted-by":"publisher","first-page":"1789","DOI":"10.1007\/s11263-021-01453-z","volume":"129","author":"J Gou","year":"2021","unstructured":"Gou, J., Yu, B., Maybank, S.J., Tao, D.: Knowledge distillation: a survey. Int. J. Comput. Vis. 129(6), 1789\u20131819 (2021)","journal-title":"Int. J. Comput. Vis."},{"key":"477_CR24","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition, pp. 770\u2013778 (2016)","DOI":"10.1109\/CVPR.2016.90"},{"key":"477_CR25","unstructured":"Heinzerling, B., Strube, M.: BPEmb: Tokenization-free pre-trained subword embeddings in 275 languages. In: International Conference on Language Resources and Evaluation (2018)"},{"key":"477_CR26","doi-asserted-by":"crossref","unstructured":"Kang, L., Toledo, J.I., Riba, P., Villegas, M., Forn\u00e9s, A., Rusi\u00f1ol, M.: Convolve, attend and spell: An attention-based sequence-to-sequence model for handwritten word recognition. 
In: German Conference on Pattern Recognition, pp. 459\u2013472 (2018)","DOI":"10.1007\/978-3-030-12939-2_32"},{"key":"477_CR27","doi-asserted-by":"crossref","unstructured":"Kim, G., Hong, T., Yim, M., Nam, J., Park, J., Yim, J., Hwang, W., Yun, S., Han, D., Park, S.: OCR-free document understanding transformer. In: European Conference on Computer Vision, pp. 498\u2013517 (2022)","DOI":"10.1007\/978-3-031-19815-1_29"},{"key":"477_CR28","doi-asserted-by":"crossref","unstructured":"Krishnan, P., Dutta, K., Jawahar, C.V.: HWNet v3: A joint embedding framework for recognition and retrieval of handwritten text. Int. J. Document Anal. Recognit. pp. 1\u201317 (2023)","DOI":"10.1007\/s10032-022-00423-6"},{"key":"477_CR29","doi-asserted-by":"crossref","unstructured":"Krishnan, P., Jawahar, C.V.: Bringing semantics in word image retrieval. In: International Conference on Document Analysis and Recognition, pp. 733\u2013737 (2013)","DOI":"10.1109\/ICDAR.2013.150"},{"key":"477_CR30","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2020.107542","volume":"108","author":"P Krishnan","year":"2020","unstructured":"Krishnan, P., Jawahar, C.V.: Bringing semantics into word image representation. Pattern Recognit. 108, 107542 (2020)","journal-title":"Pattern Recognit."},{"key":"477_CR31","doi-asserted-by":"crossref","unstructured":"Landeghem, J.V., Tito, R., Borchmann, L., Pietruszka, M., Jurkiewicz, D., Powalski, R., J\u00f3ziak, P., Biswas, S., Coustaty, M., Stanislawek, T.: ICDAR 2023 competition on document understanding of everything (DUDE). In: International Conference on Document Analysis and Recognition, pp. 420\u2013434 (2023)","DOI":"10.1007\/978-3-031-41679-8_24"},{"issue":"18","key":"477_CR32","doi-asserted-by":"publisher","first-page":"3698","DOI":"10.3390\/app9183698","volume":"9","author":"S Liu","year":"2019","unstructured":"Liu, S., Zhang, X., Zhang, S., Wang, H., Zhang, W.: Neural machine reading comprehension: Methods and trends. Appl. Sci. 
9(18), 3698 (2019)","journal-title":"Appl. Sci."},{"key":"477_CR33","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: RoBERTa: A robustly optimized BERT pretraining approach. CoRR abs\/1907.11692 (2019)"},{"key":"477_CR34","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511809071","volume-title":"Introduction to Information Retrieval","author":"CD Manning","year":"2008","unstructured":"Manning, C.D., Raghavan, P., Sch\u00fctze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)"},{"issue":"1","key":"477_CR35","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1007\/s100320200071","volume":"5","author":"U Marti","year":"2002","unstructured":"Marti, U., Bunke, H.: The IAM-database: An English sentence database for offline handwriting recognition. Int. J. Document Anal. Recognit. 5(1), 39\u201346 (2002)","journal-title":"Int. J. Document Anal. Recognit."},{"key":"477_CR36","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1007\/s10032-021-00383-3","volume":"24","author":"M Mathew","year":"2021","unstructured":"Mathew, M., G\u00f3mez, L., Karatzas, D., Jawahar, C.V.: Asking questions on handwritten document collections. Int. J. Document Anal. Recognit. 24, 235\u2013249 (2021)","journal-title":"Int. J. Document Anal. Recognit."},{"key":"477_CR37","doi-asserted-by":"crossref","unstructured":"Mathew, M., Karatzas, D., Jawahar, C.V.: DocVQA: A dataset for VQA on document images. In: IEEE Winter Conference on Applications of Computer Vision, pp. 2199\u20132208 (2021)","DOI":"10.1109\/WACV48630.2021.00225"},{"key":"477_CR38","unstructured":"Mathew, M., Tito, R., Karatzas, D., Manmatha, R., Jawahar, C.V.: Document visual question answering challenge 2020. CoRR abs\/2008.08899 (2020)"},{"key":"477_CR39","unstructured":"Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. 
In: International Conference on Learning Representations (2013)"},{"key":"477_CR40","doi-asserted-by":"crossref","unstructured":"Monroc, C.B., Miret, B., Bonhomme, M., Kermorvant, C.: A comprehensive study of open-source libraries for named entity recognition on handwritten historical documents. In: International Workshop on Document Analysis Systems, pp. 429\u2013444 (2022)","DOI":"10.1007\/978-3-031-06555-2_29"},{"key":"477_CR41","doi-asserted-by":"crossref","unstructured":"Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. In: Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 2227\u20132237 (2018)","DOI":"10.18653\/v1\/N18-1202"},{"key":"477_CR42","doi-asserted-by":"crossref","unstructured":"Powalski, R., Borchmann, L., Jurkiewicz, D., Dwojak, T., Pietruszka, M., Palka, G.: Going Full-TILT boogie on document understanding with text-image-layout transformer. In: International Conference on Document Analysis and Recognition, pp. 732\u2013747 (2021)","DOI":"10.1007\/978-3-030-86331-9_47"},{"key":"477_CR43","unstructured":"Prasad, A., D\u00e9jean, H., Meunier, J., Weidemann, M., Michael, J., Leifert, G.: Bench-marking information extraction in semi-structured historical handwritten records. CoRR abs\/1807.06270 (2018)"},{"key":"477_CR44","doi-asserted-by":"crossref","unstructured":"Rajpurkar, P., Zhang, J., Lopyrev, K., Liang, P.: SQuAD: 100,000+ questions for machine comprehension of text. In: Conference on Empirical Methods in Natural Language Processing, pp. 2383\u20132392 (2016)","DOI":"10.18653\/v1\/D16-1264"},{"issue":"2\u20134","key":"477_CR45","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1007\/s10032-006-0027-8","volume":"9","author":"TM Rath","year":"2007","unstructured":"Rath, T.M., Manmatha, R.: Word spotting for historical documents. Int. J. Document Anal. Recognit. 
9(2\u20134), 139\u2013152 (2007)","journal-title":"Int. J. Document Anal. Recognit."},{"key":"477_CR46","doi-asserted-by":"publisher","first-page":"128","DOI":"10.1016\/j.patrec.2021.11.010","volume":"155","author":"AC Rouhou","year":"2022","unstructured":"Rouhou, A.C., Dhiaf, M., Kessentini, Y., Salem, S.B.: Transformer-based approach for joint handwriting and named entity recognition in historical document. Pattern Recogn. Lett. 155, 128\u2013134 (2022)","journal-title":"Pattern Recogn. Lett."},{"key":"477_CR47","unstructured":"Rowtula, V., Krishnan, P., Jawahar, C.V.: PoS tagging and named entity recognition on handwritten documents. In: International Conference on Natural Language Processing (2018)"},{"key":"477_CR48","doi-asserted-by":"crossref","unstructured":"Rowtula, V., Oota, S.R., Jawahar, C.V.: Towards automated evaluation of handwritten assessments. In: International Conference on Document Analysis and Recognition, pp. 426\u2013433 (2019)","DOI":"10.1109\/ICDAR.2019.00075"},{"key":"477_CR49","doi-asserted-by":"crossref","unstructured":"Sauer, A., Asaadi, S., K\u00fcch, F.: Knowledge distillation meets few-shot learning: An approach for few-shot intent classification within and across domains. In: Workshop on NLP for Conversational AI, pp. 108\u2013119 (2022)","DOI":"10.18653\/v1\/2022.nlp4convai-1.10"},{"key":"477_CR50","unstructured":"Seo, M., Kembhavi, A., Farhadi, A., Hajishirzi, H.: Bidirectional attention flow for machine comprehension. In: International Conference on Learning Representations (2017)"},{"key":"477_CR51","unstructured":"Sezerer, E., Tekir, S.: A survey on neural word embeddings. CoRR abs\/2110.01804 (2021)"},{"key":"477_CR52","doi-asserted-by":"crossref","unstructured":"Sharma, A., Jayagopi, D.B.: Automated grading of handwritten essays. In: International Conference on Frontiers in Handwriting Recognition, pp. 
279\u2013284 (2018)","DOI":"10.1109\/ICFHR-2018.2018.00056"},{"key":"477_CR53","unstructured":"Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (2015)"},{"key":"477_CR54","doi-asserted-by":"crossref","unstructured":"van Strien, D., Beelen, K., Ardanuy, M.C., Hosseini, K., McGillivray, B., Colavizza, G.: Assessing the impact of OCR quality on downstream NLP tasks. In: International Conference on Agents and Artificial Intelligence, pp. 484\u2013496 (2020)","DOI":"10.5220\/0009169004840496"},{"key":"477_CR55","unstructured":"Sudholt, S.: Learning attribute representations with deep convolutional neural networks for word spotting. Ph.D. thesis, TU Dortmund (2018)"},{"key":"477_CR56","doi-asserted-by":"crossref","unstructured":"Sudholt, S., Fink, G.A.: PHOCNet: A deep convolutional neural network for word spotting in handwritten documents. In: International Conference on Frontiers in Handwriting Recognition, pp. 277\u2013282 (2016)","DOI":"10.1109\/ICFHR.2016.0060"},{"key":"477_CR57","doi-asserted-by":"crossref","unstructured":"Sudholt, S., Fink, G.A.: Evaluating word string embeddings and loss functions for CNN-based word spotting. In: International Conference on Document Analysis and Recognition, pp. 493\u2013498 (2017)","DOI":"10.1109\/ICDAR.2017.87"},{"key":"477_CR58","doi-asserted-by":"crossref","unstructured":"Tang, L., Kender, J.R.: Educational video understanding: Mapping handwritten text to textbook chapters. In: International Conference on Document Analysis and Recognition, pp. 919\u2013923 (2005)","DOI":"10.1109\/ICDAR.2005.97"},{"key":"477_CR59","doi-asserted-by":"crossref","unstructured":"Tarride, S., Boillet, M., Kermorvant, C.: Key-value information extraction from full handwritten pages. In: International Conference on Document Analysis and Recognition, pp. 
185\u2013204 (2023)","DOI":"10.1007\/978-3-031-41679-8_11"},{"key":"477_CR60","doi-asserted-by":"crossref","unstructured":"Tarride, S., Lemaitre, A., Co\u00fcasnon, B., Tardivel, S.: A comparative study of information extraction strategies using an attention-based neural network. In: International Workshop on Document Analysis Systems, pp. 644\u2013658 (2022)","DOI":"10.1007\/978-3-031-06555-2_43"},{"key":"477_CR61","doi-asserted-by":"crossref","unstructured":"Tito, R., Mathew, M., Jawahar, C.V., Valveny, E., Karatzas, D.: ICDAR 2021 competition on document visual question answering. CoRR abs\/2111.05547 (2021)","DOI":"10.1007\/978-3-030-86337-1_42"},{"key":"477_CR62","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1016\/j.patcog.2018.08.020","volume":"86","author":"JI Toledo","year":"2019","unstructured":"Toledo, J.I., Carbonell, M., Forn\u00e9s, A., Llad\u00f3s, J.: Information extraction from historical handwritten document images with a context-aware neural model. Pattern Recogn. 86, 27\u201336 (2019)","journal-title":"Pattern Recogn."},{"key":"477_CR63","doi-asserted-by":"crossref","unstructured":"T\u00fcselmann, O., Brandenbusch, K., Chen, M., Fink, G.A.: A weighted combination of semantic and syntactic word image representations. In: International Conference on Frontiers in Handwriting Recognition, pp. 285\u2013299 (2022)","DOI":"10.1007\/978-3-031-21648-0_20"},{"key":"477_CR64","doi-asserted-by":"crossref","unstructured":"T\u00fcselmann, O., Fink, G.A.: Exploring semantic word representations for recognition-free NLP on handwritten document images. In: International Conference on Document Analysis and Recognition, pp. 85\u2013100 (2023)","DOI":"10.1007\/978-3-031-41685-9_6"},{"key":"477_CR65","doi-asserted-by":"crossref","unstructured":"T\u00fcselmann, O., Wolf, F., Fink, G.A.: Identifying and tackling key challenges in semantic word spotting. In: International Conference on Frontiers in Handwriting Recognition, pp. 
55\u201360 (2020)","DOI":"10.1109\/ICFHR2020.2020.00021"},{"key":"477_CR66","doi-asserted-by":"crossref","unstructured":"T\u00fcselmann, O., Wolf, F., Fink, G.A.: Are end-to-end systems really necessary for NER on handwritten document images? In: International Conference on Document Analysis and Recognition, pp. 808\u2013822 (2021)","DOI":"10.1007\/978-3-030-86331-9_52"},{"key":"477_CR67","doi-asserted-by":"crossref","unstructured":"T\u00fcselmann, O., M\u00fcller, F., Wolf, F., Fink, G.A.: Recognition-free question answering on handwritten document collections. In: International Conference on Frontiers in Handwriting Recognition, pp. 259\u2013273 (2022)","DOI":"10.1007\/978-3-031-21648-0_18"},{"key":"477_CR68","doi-asserted-by":"crossref","unstructured":"Villanova-Aparisi, D., Martinez-Hinarejos, C.D., Romero, V., Pastor-Gadea, M.: Evaluation of different tagging schemes for named entity recognition in handwritten documents. In: International Conference on Document Analysis and Recognition, pp. 3\u201316 (2023)","DOI":"10.1007\/978-3-031-41682-8_1"},{"key":"477_CR69","unstructured":"Wang, W., Bi, B., Yan, M., Wu, C., Xia, J., Bao, Z., Peng, L., Si, L.: StructBERT: Incorporating language structures into pre-training for deep language understanding. In: International Conference on Learning Representations (2020)"},{"key":"477_CR70","doi-asserted-by":"crossref","unstructured":"Wilkinson, T., Brun, A.: Semantic and verbatim word spotting using deep neural networks. In: International Conference on Frontiers in Handwriting Recognition, pp. 307\u2013312 (2016)","DOI":"10.1109\/ICFHR.2016.0065"},{"key":"477_CR71","doi-asserted-by":"crossref","unstructured":"Wolf, F., Fink, G.A.: Self-training of handwritten word recognition for synthetic-to-real adaptation. In: International Conference on Pattern Recognition, pp. 
3885\u20133892 (2022)","DOI":"10.1109\/ICPR56361.2022.9956168"},{"key":"477_CR72","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1016\/j.cviu.2017.05.001","volume":"163","author":"Q Wu","year":"2017","unstructured":"Wu, Q., Teney, D., Wang, P., Shen, C., Dick, A.R., van den Hengel, A.: Visual question answering: a survey of methods and datasets. Comput. Vis. Image Underst. 163, 21\u201340 (2017)","journal-title":"Comput. Vis. Image Underst."},{"key":"477_CR73","doi-asserted-by":"crossref","unstructured":"Xu, Y., Xu, Y., Lv, T., Cui, L., Wei, F., Wang, G., Lu, Y., Flor\u00eancio, D.A.F., Zhang, C., Che, W., Zhang, M., Zhou, L.: LayoutLMv2: Multi-modal pre-training for visually-rich document understanding. In: Annual Meeting of the Association for Computational Linguistics and International Joint Conference on Natural Language Processing, pp. 2579\u20132591 (2021)","DOI":"10.18653\/v1\/2021.acl-long.201"},{"key":"477_CR74","unstructured":"Yadav, V., Bethard, S.: A survey on recent advances in named entity recognition from deep learning models. In: International Conference on Computational Linguistics, pp. 2145\u20132158 (2018)"},{"key":"477_CR75","doi-asserted-by":"crossref","unstructured":"Yamada, I., Asai, A., Shindo, H., Takeda, H., Matsumoto, Y.: LUKE: Deep contextualized entity representations with entity-aware self-attention. In: Conference on Empirical Methods in Natural Language Processing, pp. 6442\u20136454 (2020)","DOI":"10.18653\/v1\/2020.emnlp-main.523"},{"issue":"21","key":"477_CR76","doi-asserted-by":"publisher","first-page":"7640","DOI":"10.3390\/app10217640","volume":"10","author":"C Zeng","year":"2020","unstructured":"Zeng, C., Li, S., Li, Q., Hu, J., Hu, J.: A survey on machine reading comprehension: tasks, evaluation metrics, and benchmark datasets. Appl. Sci. 10(21), 7640 (2020)","journal-title":"Appl. 
Sci."},{"key":"477_CR77","unstructured":"Zhu, F., Lei, W., Wang, C., Zheng, J., Poria, S., Chua, T.: Retrieving and reading: A comprehensive survey on open-domain question answering. CoRR abs\/2101.00774 (2021)"}],"container-title":["International Journal on Document Analysis and Recognition (IJDAR)"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10032-024-00477-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10032-024-00477-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10032-024-00477-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,21]],"date-time":"2024-11-21T07:50:09Z","timestamp":1732175409000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10032-024-00477-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,6]]},"references-count":77,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,9]]}},"alternative-id":["477"],"URL":"https:\/\/doi.org\/10.1007\/s10032-024-00477-8","relation":{},"ISSN":["1433-2833","1433-2825"],"issn-type":[{"value":"1433-2833","type":"print"},{"value":"1433-2825","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,6]]},"assertion":[{"value":"13 November 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 March 2024","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 May 2024","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 June 
2024","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}