{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T05:21:15Z","timestamp":1766035275414,"version":"3.48.0"},"reference-count":40,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2025,12,16]],"date-time":"2025-12-16T00:00:00Z","timestamp":1765843200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>A receipt information extraction task requires both textual and spatial analyses. Early receipt analysis systems primarily relied on template matching to extract data from spatially structured documents. However, these methods lack generalizability across various document layouts and require defining the specific spatial characteristics of unseen document sources. The advent of convolutional and recurrent neural networks has led to models that generalize better over unseen document layouts, and more recently, multi-modal transformer-based models, which consider a combination of text, visual, and layout inputs, have led to an even more significant boost in document-understanding capabilities. This work focuses on the joint use of a neural multi-modal transformer and a rule-based model and studies whether this combination achieves higher performance levels than the transformer on its own. A comprehensively annotated dataset, comprising real-world and synthetic receipts, was specifically developed for this study. The open-source optical character recognition model DocTR was used to textually scan receipts and, together with an image, provided input to the classifier model. The open-source pre-trained LayoutLMv3 transformer-based model was augmented with a classifier model head, which was trained for classifying textual data into 12 predefined labels, such as date, price, and shop name. 
The methods implemented in the rule-based model were manually designed and consisted of four types: pattern-matching rules based on regular expressions and logic, database search-based methods for named entities, spatial pattern discovery guided by statistical metrics, and error-correcting mechanisms based on confidence scores and local distance metrics. Following hyperparameter tuning of the classifier head and the integration of a rule-based model, the system achieved an overall F1 score of 0.98 in classifying textual data, including line items, from receipts.<\/jats:p>","DOI":"10.3390\/make7040167","type":"journal-article","created":{"date-parts":[[2025,12,16]],"date-time":"2025-12-16T08:46:52Z","timestamp":1765874812000},"page":"167","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Receipt Information Extraction with Joint Multi-Modal Transformer and Rule-Based Model"],"prefix":"10.3390","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4391-2988","authenticated-orcid":false,"given":"Xandru","family":"Mifsud","sequence":"first","affiliation":[{"name":"Department of Communications and Computer Engineering, University of Malta, MSD 2080 Msida, Malta"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4062-0787","authenticated-orcid":false,"given":"Leander","family":"Grech","sequence":"additional","affiliation":[{"name":"Department of Communications and Computer Engineering, University of Malta, MSD 2080 Msida, Malta"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-5602-0874","authenticated-orcid":false,"given":"Adriana","family":"Baldacchino","sequence":"additional","affiliation":[{"name":"Department of Communications and Computer Engineering, University of Malta, MSD 2080 Msida, Malta"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-2391-3026","authenticated-orcid":false,"given":"L\u00e9a","family":"Keller","sequence":"additional","affiliation":[{"name":"Department of Communications and 
Computer Engineering, University of Malta, MSD 2080 Msida, Malta"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3864-7785","authenticated-orcid":false,"given":"Gianluca","family":"Valentino","sequence":"additional","affiliation":[{"name":"Department of Communications and Computer Engineering, University of Malta, MSD 2080 Msida, Malta"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9157-2818","authenticated-orcid":false,"given":"Adrian","family":"Muscat","sequence":"additional","affiliation":[{"name":"Department of Communications and Computer Engineering, University of Malta, MSD 2080 Msida, Malta"}]}],"member":"1968","published-online":{"date-parts":[[2025,12,16]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Huang, Y., Lv, T., Cui, L., Lu, Y., and Wei, F. (2022). LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking. arXiv.","DOI":"10.1145\/3503161.3548112"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Huang, Z., Chen, K., He, J., Bai, X., Karatzas, D., Lu, S., and Jawahar, C.V. (2019, January 20\u201325). ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction. Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, NSW, Australia.","DOI":"10.1109\/ICDAR.2019.00244"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Huang, Z., Chen, K., He, J., Bai, X., Karatzas, D., Lu, S., and Jawahar, C.V. (2021). ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction. arXiv.","DOI":"10.1109\/ICDAR.2019.00244"},{"key":"ref_4","unstructured":"Park, S., Shin, S., Lee, B., Lee, J., Surh, J., Seo, M., and Lee, H. (2019, January 8\u201314). CORD: A Consolidated Receipt Dataset for Post-OCR Parsing. Proceedings of the Workshop on Document Intelligence at NeurIPS 2019, Vancouver, BC, Canada."},{"key":"ref_5","unstructured":"Yue, A. (2023, June 13). Automated Receipt Image Identification, Cropping, and Parsing. 
Available online: https:\/\/api.semanticscholar.org\/CorpusID:49555566."},{"key":"ref_6","unstructured":"Lazic, M. (2020). Using Natural Language Processing to Extract Information from Receipt Text. [Master\u2019s Thesis, KTH Royal Institute of Technology]. Available online: https:\/\/urn.kb.se\/resolve?urn=urn:nbn:se:kth:diva-279302."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Palm, R.B., Winther, O., and Laws, F. (2017). CloudScan\u2014A configuration-free invoice analysis system using recurrent neural networks. arXiv.","DOI":"10.1109\/ICDAR.2017.74"},{"key":"ref_8","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., and Zhou, M. (2020, January 6\u201310). LayoutLM: Pre-training of Text and Layout for Document Image Understanding. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event.","DOI":"10.1145\/3394486.3403172"},{"key":"ref_10","unstructured":"Zong, C., Xia, F., Li, W., and Navigli, R. (2021, January 1\u20136). LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Virtual Event."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Garncarek, \u0141., Powalski, R., Stanis\u0142awek, T., Topolski, B., Halama, P., Turski, M., and Grali\u0144ski, F. (2021). LAMBERT: Layout-Aware Language Modeling for Information Extraction. 
Document Analysis and Recognition\u2014ICDAR 2021, Springer International Publishing.","DOI":"10.1007\/978-3-030-86549-8_34"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Ma, J., Du, J., Wang, L., and Zhang, J. (2022). Multimodal Pre-training Based on Graph Attention Network for Document Understanding. arXiv.","DOI":"10.1109\/TMM.2022.3214102"},{"key":"ref_13","unstructured":"Abdallah, A., Abdalla, M., Elkasaby, M., Elbendary, Y., and Jatowt, A. (2024). AMuRD: Annotated Arabic-English Receipt Dataset for Key Information Extraction and Classification. arXiv."},{"key":"ref_14","unstructured":"Townsend, B., May, M., Mackowiak, K., and Wells, C. (2025). RealKIE: Five Novel Datasets for Enterprise Key Information Extraction. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Wang, D., Ma, Z., Nourbakhsh, A., Gu, K., and Shah, S. (2024). DocGraphLM: Documental Graph Language Model for Information Extraction. arXiv.","DOI":"10.1145\/3539618.3591975"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"104046","DOI":"10.1016\/j.ipm.2024.104046","article-title":"DocExtractNet: A novel framework for enhanced information extraction from business documents","volume":"62","author":"Yan","year":"2025","journal-title":"Inf. Process. Manag."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Yu, J.M., Ma, H.J., and Kong, J.L. (2025). Receipt Recognition Technology Driven by Multimodal Alignment and Lightweight Sequence Modeling. Electronics, 14.","DOI":"10.3390\/electronics14091717"},{"key":"ref_18","unstructured":"Berghaus, D., Berger, A., Hillebrand, L., Cvejoski, K., and Sifa, R. (2025). Multi-Modal Vision vs. Text-Based Parsing: Benchmarking LLM Strategies for Invoice Processing. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Kim, G., Hong, T., Yim, M., Nam, J., Park, J., Yim, J., Hwang, W., Yun, S., Han, D., and Park, S. (2022, January 23\u201327). 
OCR-Free Document Understanding Transformer. Proceedings of the Computer Vision\u2014ECCV 2022: 17th European Conference, Tel Aviv, Israel. Proceedings, Part XXVIII.","DOI":"10.1007\/978-3-031-19815-1_29"},{"key":"ref_20","unstructured":"Smith, R. (2007, January 23\u201326). An Overview of the Tesseract OCR Engine. Proceedings of the ICDAR \u201907: Proceedings of the Ninth International Conference on Document Analysis and Recognition, Curitiba, Brazil."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., and Lu, S. (2015, January 23\u201326). ICDAR 2015 competition on robust reading. Proceedings of the ICDAR, Tunis, Tunisia.","DOI":"10.1109\/ICDAR.2015.7333942"},{"key":"ref_22","unstructured":"(2024, November 07). JaidedAI: EasyOCR. Available online: https:\/\/github.com\/JaidedAI\/EasyOCR."},{"key":"ref_23","unstructured":"(2024, November 07). docTR: Document Text Recognition. Available online: https:\/\/github.com\/mindee\/doctr."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"2298","DOI":"10.1109\/TPAMI.2016.2646371","article-title":"An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition","volume":"39","author":"Shi","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Busta, M., Neumann, L., and Matas, J. (2017, January 22\u201329). Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.242"},{"key":"ref_26","unstructured":"Yao, C., Bai, X., Liu, W., Ma, Y., and Tu, Z. (2012, January 16\u201321). Detecting texts of arbitrary orientations in natural images. 
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W., and Liang, J. (2017, January 21\u201326). EAST: An Efficient and Accurate Scene Text Detector. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.283"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"3676","DOI":"10.1109\/TIP.2018.2825107","article-title":"TextBoxes++: A Single-Shot Oriented Scene Text Detector","volume":"27","author":"Liao","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"9029","DOI":"10.1109\/TMM.2023.3244322","article-title":"Text Growing on Leaf","volume":"25","author":"Cheng","year":"2023","journal-title":"IEEE Trans. Multimed."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"15745","DOI":"10.1109\/TNNLS.2023.3289327","article-title":"Zoom Text Detector","volume":"35","author":"Yang","year":"2024","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"2864","DOI":"10.1109\/TIP.2022.3141844","article-title":"CM-Net: Concentric Mask Based Arbitrary-Shaped Text Detection","volume":"31","author":"Yang","year":"2022","journal-title":"IEEE Trans. Image Process."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Zhang, X., Su, Y., Tripathi, S., and Tu, Z. (2022, January 18\u201324). Text Spotting Transformers. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00930"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Huang, M., Liu, Y., Peng, Z., Liu, C., Lin, D., Zhu, S., Yuan, N., Ding, K., and Jin, L. (2022, January 18\u201324). 
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition. Proceedings of the 2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00455"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 10\u201317). Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1007\/s11263-020-01369-0","article-title":"Scene Text Detection and Recognition: The Deep Learning Era","volume":"129","author":"Long","year":"2021","journal-title":"Int. J. Comput. Vis."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"126702","DOI":"10.1016\/j.neucom.2023.126702","article-title":"A survey of text detection and recognition algorithms based on deep learning technology","volume":"556","author":"Wang","year":"2023","journal-title":"Neurocomputing"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Ribeiro, M.R.M., J\u00falio, D., Abelha, V., Abelha, A., and Machado, J. (2019, January 8\u201311). A Comparative Study of Optical Character Recognition in Health Information System. Proceedings of the 2019 International Conference in Engineering Applications (ICEA), Sao Miguel, Portugal.","DOI":"10.1109\/CEAP.2019.8883448"},{"key":"ref_38","first-page":"447","article-title":"OCR-MRD: Performance analysis of different optical character recognition engines for medical report digitization","volume":"16","author":"Batra","year":"2024","journal-title":"Int. J. Inf. Technol."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Sfikas, G., and Retsinas, G. (2024). Confidence-Aware Document OCR Error Detection. 
International Workshop on Document Analysis Systems, Springer.","DOI":"10.1007\/978-3-031-70442-0"},{"key":"ref_40","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/4\/167\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T05:17:46Z","timestamp":1766035066000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/4\/167"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,16]]},"references-count":40,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["make7040167"],"URL":"https:\/\/doi.org\/10.3390\/make7040167","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2025,12,16]]}}}