{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,3]],"date-time":"2025-10-03T08:48:52Z","timestamp":1759481332794,"version":"3.40.3"},"publisher-location":"Cham","reference-count":24,"publisher":"Springer Nature Switzerland","isbn-type":[{"type":"print","value":"9783031264375"},{"type":"electronic","value":"9783031264382"}],"license":[{"start":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T00:00:00Z","timestamp":1672531200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,2,23]],"date-time":"2023-02-23T00:00:00Z","timestamp":1677110400000},"content-version":"vor","delay-in-days":53,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Multimodal integration of text, layout and visual information has achieved SOTA results in visually rich document understanding (VrDU) tasks, including relation extraction (RE). However, despite its importance, evaluation of the relative predictive capacity of these modalities is less prevalent. Here, we demonstrate the value of shared representations for RE tasks by conducting experiments in which each data type is iteratively excluded during training. In addition, text and layout data are evaluated in isolation. While a bimodal text and layout approach performs best (F1 = 0.684), we show that text is the most important single predictor of entity relations. Additionally, layout geometry is highly predictive and may even be a feasible unimodal approach. Despite being less effective, we highlight circumstances where visual information can bolster performance. In total, our results demonstrate the efficacy of training joint representations for RE.<\/jats:p>","DOI":"10.1007\/978-3-031-26438-2_35","type":"book-chapter","created":{"date-parts":[[2023,2,22]],"date-time":"2023-02-22T06:32:56Z","timestamp":1677047576000},"page":"450-461","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Unimodal and\u00a0Multimodal Representation Training for\u00a0Relation Extraction"],"prefix":"10.1007","author":[{"given":"Ciaran","family":"Cooney","sequence":"first","affiliation":[]},{"given":"Rachel","family":"Heyburn","sequence":"additional","affiliation":[]},{"given":"Liam","family":"Madigan","sequence":"additional","affiliation":[]},{"given":"Mairead","family":"O\u2019Cuinn","sequence":"additional","affiliation":[]},{"given":"Chloe","family":"Thompson","sequence":"additional","affiliation":[]},{"given":"Joana","family":"Cavadas","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,2,23]]},"reference":[{"key":"35_CR1","doi-asserted-by":"crossref","unstructured":"Audebert, N., Herold, C., Slimani, K., Vidal, C.: Multimodal deep networks for text and image-based document classification. arXiv preprint arXiv:1907.06370 (2019)","DOI":"10.1007\/978-3-030-43823-4_35"},{"key":"35_CR2","doi-asserted-by":"publisher","first-page":"219","DOI":"10.1016\/j.patrec.2020.05.001","volume":"136","author":"M Carbonell","year":"2020","unstructured":"Carbonell, M., Forn\u00e9s, A., Villegas, M., Llad\u00f3s, J.: A neural model for text localization, transcription and named entity recognition in full pages. Pattern Recogn. Lett. 
136, 219\u2013227 (2020)","journal-title":"Pattern Recogn. Lett."},{"key":"35_CR3","doi-asserted-by":"crossref","unstructured":"Carbonell, M., Riba, P., Villegas, M., Forn\u00e9s, A., Llad\u00f3s, J.: Named entity recognition and relation extraction with graph neural networks in semi structured documents. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 9622\u20139627. IEEE (2021)","DOI":"10.1109\/ICPR48806.2021.9412669"},{"key":"35_CR4","doi-asserted-by":"publisher","first-page":"1983","DOI":"10.1109\/TBME.2021.3132861","volume":"69","author":"C Cooney","year":"2021","unstructured":"Cooney, C., Folli, R., Coyle, D.: A bimodal deep learning architecture for EEG-fNIRS decoding of overt and imagined speech. IEEE Trans. Biomed. Eng. 69, 1983\u20131994 (2021)","journal-title":"IEEE Trans. Biomed. Eng."},{"key":"35_CR5","doi-asserted-by":"crossref","unstructured":"Dang, T.A.N., Hoang, D.T., Tran, Q.B., Pan, C.W., Nguyen, T.D.: End-to-end hierarchical relation extraction for generic form understanding. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 5238\u20135245. IEEE (2021)","DOI":"10.1109\/ICPR48806.2021.9412778"},{"key":"35_CR6","doi-asserted-by":"crossref","unstructured":"Davis, B., Morse, B., Price, B., Tensmeyer, C., Wiginton, C.: Visual fudge: form understanding via dynamic graph editing. arXiv preprint arXiv:2105.08194 (2021)","DOI":"10.1007\/978-3-030-86549-8_27"},{"key":"35_CR7","unstructured":"Gralinski, F., et al.: Kleister: a novel task for information extraction involving long documents with complex layout. CoRR abs\/2003.02356 (2020). https:\/\/arxiv.org\/abs\/2003.02356"},{"key":"35_CR8","doi-asserted-by":"crossref","unstructured":"Gu, Z., et al.: XYLayoutLM: Towards layout-aware multimodal networks for visually-rich document understanding. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 4583\u20134592 (2022)","DOI":"10.1109\/CVPR52688.2022.00454"},{"key":"35_CR9","doi-asserted-by":"crossref","unstructured":"Hong, T., Kim, D., Ji, M., Hwang, W., Nam, D., Park, S.: BROS: a pre-trained language model focusing on text and layout for better key information extraction from documents. arXiv preprint arXiv:2108.04539 (2021)","DOI":"10.1609\/aaai.v36i10.21322"},{"key":"35_CR10","doi-asserted-by":"crossref","unstructured":"Huang, Y., Lv, T., Cui, L., Lu, Y., Wei, F.: LayoutLMv3: pre-training for document ai with unified text and image masking. arXiv preprint arXiv:2204.08387 (2022)","DOI":"10.1145\/3503161.3548112"},{"key":"35_CR11","doi-asserted-by":"crossref","unstructured":"Jaume, G., Ekenel, H.K., Thiran, J.P.: FUNSD: a dataset for form understanding in noisy scanned documents. In: 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), vol. 2, pp. 1\u20136. IEEE (2019)","DOI":"10.1109\/ICDARW.2019.10029"},{"key":"35_CR12","unstructured":"Li, C., et al.: StructuralLM: Structural pre-training for form understanding. arXiv preprint arXiv:2105.11210 (2021)"},{"key":"35_CR13","doi-asserted-by":"crossref","unstructured":"Li, Y., et al.: StrucText: structured text understanding with multi-modal transformers. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 1912\u20131920 (2021)","DOI":"10.1145\/3474085.3475345"},{"key":"35_CR14","doi-asserted-by":"crossref","unstructured":"Liu, X., Gao, F., Zhang, Q., Zhao, H.: Graph convolution for multimodal information extraction from visually rich documents. 
arXiv preprint arXiv:1903.11279 (2019)","DOI":"10.18653\/v1\/N19-2005"},{"key":"35_CR15","unstructured":"Pramanik, S., Mujumdar, S., Patel, H.: Towards a multi-modal, multi-task learning based pre-training framework for document representation learning. arXiv preprint arXiv:2009.14457 (2020)"},{"key":"35_CR16","doi-asserted-by":"crossref","unstructured":"Sharif, M.I., Khan, M.A., Alhussein, M., Aurangzeb, K., Raza, M.: A decision support system for multimodal brain tumor classification using deep learning. Complex Intell. Syst. 1\u201314 (2021)","DOI":"10.1007\/s40747-021-00321-0"},{"issue":"1","key":"35_CR17","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41598-020-74399-w","volume":"11","author":"J Venugopalan","year":"2021","unstructured":"Venugopalan, J., Tong, L., Hassanzadeh, H.R., Wang, M.D.: Multimodal deep learning models for early detection of Alzheimer\u2019s disease stage. Sci. Rep. 11(1), 1\u201313 (2021)","journal-title":"Sci. Rep."},{"key":"35_CR18","doi-asserted-by":"crossref","unstructured":"Wang, Z., Zhan, M., Liu, X., Liang, D.: DocStruct: a multimodal method to extract hierarchy structure in document for general form understanding. arXiv preprint arXiv:2010.11685 (2020)","DOI":"10.18653\/v1\/2020.findings-emnlp.80"},{"key":"35_CR19","doi-asserted-by":"crossref","unstructured":"Wei, M., He, Y., Zhang, Q.: Robust layout-aware IE for visually rich documents with pre-trained language models. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2367\u20132376 (2020)","DOI":"10.1145\/3397271.3401442"},{"key":"35_CR20","doi-asserted-by":"crossref","unstructured":"Xie, S., Girshick, R., Doll\u00e1r, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. arXiv preprint arXiv:1611.05431 (2016)","DOI":"10.1109\/CVPR.2017.634"},{"key":"35_CR21","doi-asserted-by":"crossref","unstructured":"Xu, Y., et al.: LayoutLMv2: multi-modal pre-training for visually-rich document understanding. arXiv preprint arXiv:2012.14740 (2020)","DOI":"10.18653\/v1\/2021.acl-long.201"},{"key":"35_CR22","doi-asserted-by":"crossref","unstructured":"Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., Zhou, M.: LayoutLM: pre-training of text and layout for document image understanding. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1192\u20131200 (2020)","DOI":"10.1145\/3394486.3403172"},{"key":"35_CR23","unstructured":"Xu, Y., et al.: LayoutXLM: multimodal pre-training for multilingual visually-rich document understanding. arXiv preprint arXiv:2104.08836 (2021)"},{"key":"35_CR24","doi-asserted-by":"crossref","unstructured":"Zhang, P., et al.: TRIE: end-to-end text reading and information extraction for document understanding. In: Proceedings of the 28th ACM International Conference on Multimedia, pp. 
1413\u20131422 (2020)","DOI":"10.1145\/3394171.3413900"}],"container-title":["Communications in Computer and Information Science","Artificial Intelligence and Cognitive Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-26438-2_35","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,22]],"date-time":"2023-02-22T06:40:22Z","timestamp":1677048022000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-26438-2_35"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023]]},"ISBN":["9783031264375","9783031264382"],"references-count":24,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-26438-2_35","relation":{},"ISSN":["1865-0929","1865-0937"],"issn-type":[{"type":"print","value":"1865-0929"},{"type":"electronic","value":"1865-0937"}],"subject":[],"published":{"date-parts":[[2023]]},"assertion":[{"value":"23 February 2023","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"AICS","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Irish Conference on Artificial Intelligence and Cognitive Science","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Munster","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Ireland","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2022","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"8 December 2022","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"9 December 2022","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"30","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"aics2022","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/aics2022.mtu.ie\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Single-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"EasyChair","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"102","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"41","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers 
Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"0","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"40% - The value is computed by the equation \"Number of Full Papers Accepted \/ Number of Submissions Sent for Review * 100\" and then rounded to a whole number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"No","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}