{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,17]],"date-time":"2025-09-17T16:50:16Z","timestamp":1758127816158,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":38,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T00:00:00Z","timestamp":1665360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,10]]},"DOI":"10.1145\/3503161.3547877","type":"proceedings-article","created":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T15:42:35Z","timestamp":1665416555000},"page":"4261-4271","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["Query-driven Generative Network for Document Information Extraction in the Wild"],"prefix":"10.1145","author":[{"given":"Haoyu","family":"Cao","sequence":"first","affiliation":[{"name":"Tencent YouTu Lab, HeFei, China"}]},{"given":"Xin","family":"Li","sequence":"additional","affiliation":[{"name":"Tencent YouTu Lab, HeFei, China"}]},{"given":"Jiefeng","family":"Ma","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, HeFei, China"}]},{"given":"Deqiang","family":"Jiang","sequence":"additional","affiliation":[{"name":"Tencent YouTu Lab, HeFei, China"}]},{"given":"Antai","family":"Guo","sequence":"additional","affiliation":[{"name":"Tencent YouTu Lab, HeFei, China"}]},{"given":"Yiqing","family":"Hu","sequence":"additional","affiliation":[{"name":"Tencent YouTu Lab, HeFei, China"}]},{"given":"Hao","family":"Liu","sequence":"additional","affiliation":[{"name":"Tencent YouTu Lab, HeFei, China"}]},{"given":"Yinsong","family":"Liu","sequence":"additional","affiliation":[{"name":"Tencent YouTu Lab, HeFei, China"}]},{"given":"Bo","family":"Ren","sequence":"additional","affiliation":[{"name":"Tencent YouTu Lab, HeFei, China"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1117\/12.2003911"},{"key":"e_1_3_2_2_2_1","volume-title":"Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13--18","volume":"652","author":"Bao Hangbo","year":"2020","unstructured":"Hangbo Bao , Li Dong , Furu Wei , Wenhui Wang , Nan Yang , Xiaodong Liu , Yu Wang , Jianfeng Gao , Songhao Piao , Ming Zhou , and Hsiao-Wuen Hon . 2020 . UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training . In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13--18 July 2020, Virtual Event (Proceedings of Machine Learning Research , Vol. 119). PMLR, 642-- 652 . http:\/\/proceedings.mlr.press\/v119\/bao20a.html Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Jianfeng Gao, Songhao Piao, Ming Zhou, and Hsiao-Wuen Hon. 2020. UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13--18 July 2020, Virtual Event (Proceedings of Machine Learning Research, Vol. 119). PMLR, 642--652. http:\/\/proceedings.mlr.press\/v119\/bao20a.html"},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.naacl-main.276"},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413511"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.280"},{"key":"e_1_3_2_2_6_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2019 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding . In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019 , Minneapolis, MN, USA, June 2--7 , 2019, Volume 1 (Long and Short Papers), , Jill Burstein, Christy Doran, and Thamar Solorio (Eds.). Association for Computational Linguistics, 4171--4186. https:\/\/doi.org\/10.18653\/v1\/n19--1423 Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2--7, 2019, Volume 1 (Long and Short Papers), , Jill Burstein, Christy Doran, and Thamar Solorio (Eds.). Association for Computational Linguistics, 4171--4186. https:\/\/doi.org\/10.18653\/v1\/n19--1423"},{"key":"e_1_3_2_2_7_1","volume-title":"Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems","author":"Dong Li","year":"2019","unstructured":"Li Dong , Nan Yang , Wenhui Wang , Furu Wei , Xiaodong Liu , Yu Wang , Jianfeng Gao , Ming Zhou , and Hsiao-Wuen Hon . 2019. Unified Language Model Pre-training for Natural Language Understanding and Generation . In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019 , NeurIPS 2019, December 8--14, 2019, Vancouver, BC, Canada, Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alch\u00e9 -Buc, Emily B. Fox, and Roman Garnett (Eds .). 13042--13054. https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/c20bb2d9a50d5ac1f713f8b34d9aac5a-Abstract.html Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, and Hsiao-Wuen Hon. 2019. Unified Language Model Pre-training for Natural Language Understanding and Generation. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8--14, 2019, Vancouver, BC, Canada, Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alch\u00e9 -Buc, Emily B. Fox, and Roman Garnett (Eds.). 13042--13054. https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/c20bb2d9a50d5ac1f713f8b34d9aac5a-Abstract.html"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1117\/12.908542"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.295"},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-86331-9_45"},{"key":"e_1_3_2_2_11_1","volume-title":"BROS: A Layout-Aware Pre-trained Language Model for Understanding Documents. CoRR","author":"Hong Teakgyu","year":"2021","unstructured":"Teakgyu Hong , Donghyun Kim , Mingi Ji , Wonseok Hwang , Daehyun Nam , and Sungrae Park . 2021 . BROS: A Layout-Aware Pre-trained Language Model for Understanding Documents. CoRR , Vol. abs\/ 2108 .04539 (2021). showeprint[arXiv]2108.04539 https:\/\/arxiv.org\/abs\/2108.04539 Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, and Sungrae Park. 2021. BROS: A Layout-Aware Pre-trained Language Model for Understanding Documents. CoRR , Vol. abs\/2108.04539 (2021). showeprint[arXiv]2108.04539 https:\/\/arxiv.org\/abs\/2108.04539"},{"key":"e_1_3_2_2_12_1","volume-title":"ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction. In 2019 International Conference on Document Analysis and Recognition, ICDAR 2019","author":"Huang Zheng","year":"2019","unstructured":"Zheng Huang , Kai Chen , Jianhua He , Xiang Bai , Dimosthenis Karatzas , Shijian Lu , and C. V. Jawahar . 2019 . ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction. In 2019 International Conference on Document Analysis and Recognition, ICDAR 2019 , Sydney, Australia, September 20--25 , 2019 . IEEE, 1516--1520. https:\/\/doi.org\/10.1109\/ICDAR.2019.00244 Zheng Huang, Kai Chen, Jianhua He, Xiang Bai, Dimosthenis Karatzas, Shijian Lu, and C. V. Jawahar. 2019. ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction. In 2019 International Conference on Document Analysis and Recognition, ICDAR 2019, Sydney, Australia, September 20--25, 2019. IEEE, 1516--1520. https:\/\/doi.org\/10.1109\/ICDAR.2019.00244"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-acl.28"},{"key":"e_1_3_2_2_14_1","volume-title":"FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents. In 2nd International Workshop on Open Services and Tools for Document Analysis, OST@ICDAR 2019","author":"Jaume Guillaume","year":"2019","unstructured":"Guillaume Jaume , Hazim Kemal Ekenel , and Jean-Philippe Thiran . 2019 . FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents. In 2nd International Workshop on Open Services and Tools for Document Analysis, OST@ICDAR 2019 , Sydney, Australia, September 22--25 , 2019. IEEE, 1--6. https:\/\/doi.org\/10.1109\/ICDARW.2019.10029 Guillaume Jaume, Hazim Kemal Ekenel, and Jean-Philippe Thiran. 2019. FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents. In 2nd International Workshop on Open Services and Tools for Document Analysis, OST@ICDAR 2019, Sydney, Australia, September 22--25, 2019. IEEE, 1--6. https:\/\/doi.org\/10.1109\/ICDARW.2019.10029"},{"key":"e_1_3_2_2_15_1","volume-title":"Kingma and Jimmy Ba","author":"Diederik","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba . 2015 . Adam : A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds .). http:\/\/arxiv.org\/abs\/1412.6980 Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http:\/\/arxiv.org\/abs\/1412.6980"},{"volume-title":"Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Lewis D.","key":"e_1_3_2_2_16_1","unstructured":"D. Lewis , G. Agam , S. Argamon , O. Frieder , D. Grossman , and J. Heard . 2006. Building a Test Collection for Complex Document Information Processing . In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval ( Seattle, Washington, USA) (SIGIR '06). Association for Computing Machinery, New York, NY, USA, 665--666. https:\/\/doi.org\/10.1145\/1148170.1148307 D. Lewis, G. Agam, S. Argamon, O. Frieder, D. Grossman, and J. Heard. 2006. Building a Test Collection for Complex Document Information Processing. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Seattle, Washington, USA) (SIGIR '06). Association for Computing Machinery, New York, NY, USA, 665--666. https:\/\/doi.org\/10.1145\/1148170.1148307"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.493"},{"key":"e_1_3_2_2_18_1","volume-title":"SelfDoc: Self-Supervised Document Representation Learning. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021","author":"Li Peizhao","year":"2021","unstructured":"Peizhao Li , Jiuxiang Gu , Jason Kuen , Vlad I. Morariu , Handong Zhao , Rajiv Jain , Varun Manjunatha , and Hongfu Liu . 2021 b. SelfDoc: Self-Supervised Document Representation Learning. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021 , virtual, June 19 --25 , 2021. Computer Vision Foundation \/ IEEE, 5652--5660. https:\/\/openaccess.thecvf.com\/content\/CVPR2021\/html\/Li_SelfDoc_Self-Supervised_Document_Representation_Learning_CVPR_2021_paper.html Peizhao Li, Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Rajiv Jain, Varun Manjunatha, and Hongfu Liu. 2021b. SelfDoc: Self-Supervised Document Representation Learning. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19--25, 2021. Computer Vision Foundation \/ IEEE, 5652--5660. https:\/\/openaccess.thecvf.com\/content\/CVPR2021\/html\/Li_SelfDoc_Self-Supervised_Document_Representation_Learning_CVPR_2021_paper.html"},{"key":"e_1_3_2_2_19_1","volume-title":"StrucTexT: Structured Text Understanding with Multi-Modal Transformers. In MM '21: ACM Multimedia Conference","author":"Li Yulin","year":"2021","unstructured":"Yulin Li , Yuxi Qian , Yuechen Yu , Xiameng Qin , Chengquan Zhang , Yan Liu , Kun Yao , Junyu Han , Jingtuo Liu , and Errui Ding . 2021 c. StrucTexT: Structured Text Understanding with Multi-Modal Transformers. In MM '21: ACM Multimedia Conference , Virtual Event, China, October 20 - 24 , 2021, , Heng Tao Shen, Yueting Zhuang, John R. Smith, Yang Yang, Pablo Cesar, Florian Metze, and Balakrishnan Prabhakaran (Eds.). ACM, 1912--1920. https:\/\/doi.org\/10.1145\/3474085.3475345 Yulin Li, Yuxi Qian, Yuechen Yu, Xiameng Qin, Chengquan Zhang, Yan Liu, Kun Yao, Junyu Han, Jingtuo Liu, and Errui Ding. 2021c. StrucTexT: Structured Text Understanding with Multi-Modal Transformers. In MM '21: ACM Multimedia Conference, Virtual Event, China, October 20 - 24, 2021, , Heng Tao Shen, Yueting Zhuang, John R. Smith, Yang Yang, Pablo Cesar, Florian Metze, and Balakrishnan Prabhakaran (Eds.). ACM, 1912--1920. https:\/\/doi.org\/10.1145\/3474085.3475345"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2019\/423"},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.580"},{"key":"e_1_3_2_2_22_1","volume-title":"CORD: A Consolidated Receipt Dataset for Post-OCR Parsing. In Workshop on Document Intelligence at NeurIPS","author":"Park Seunghyun","year":"2019","unstructured":"Seunghyun Park , Seung Shin , Bado Lee , Junyeop Lee , Jaeheung Surh , Minjoon Seo , and Hwalsuk Lee . 2019 . CORD: A Consolidated Receipt Dataset for Post-OCR Parsing. In Workshop on Document Intelligence at NeurIPS 2019. Seunghyun Park, Seung Shin, Bado Lee, Junyeop Lee, Jaeheung Surh, Minjoon Seo, and Hwalsuk Lee. 2019. CORD: A Consolidated Receipt Dataset for Post-OCR Parsing. In Workshop on Document Intelligence at NeurIPS 2019."},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-86331-9_47"},{"key":"e_1_3_2_2_24_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019","volume":"1","author":"Qian Yujie","year":"2019","unstructured":"Yujie Qian , Enrico Santus , Zhijing Jin , Jiang Guo , and Regina Barzilay . 2019 . GraphIE: A Graph-Based Framework for Information Extraction . In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019 , Minneapolis, MN, USA, June 2--7 , 2019, Volume 1 (Long and Short Papers), , Jill Burstein, Christy Doran, and Thamar Solorio (Eds.). Association for Computational Linguistics, 751--761. https:\/\/doi.org\/10.18653\/v1\/n19--1082 Yujie Qian, Enrico Santus, Zhijing Jin, Jiang Guo, and Regina Barzilay. 2019. GraphIE: A Graph-Based Framework for Information Extraction. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2--7, 2019, Volume 1 (Long and Short Papers), , Jill Burstein, Christy Doran, and Thamar Solorio (Eds.). Association for Computational Linguistics, 751--761. https:\/\/doi.org\/10.18653\/v1\/n19--1082"},{"key":"e_1_3_2_2_25_1","article-title":"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer","volume":"21","author":"Raffel Colin","year":"2020","unstructured":"Colin Raffel , Noam Shazeer , Adam Roberts , Katherine Lee , Sharan Narang , Michael Matena , Yanqi Zhou , Wei Li , and Peter J. Liu . 2020 . Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer . J. Mach. Learn. Res. , Vol. 21 (2020), 140:1--140:67. http:\/\/jmlr.org\/papers\/v21\/20-074.html Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. , Vol. 21 (2020), 140:1--140:67. http:\/\/jmlr.org\/papers\/v21\/20-074.html","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.spnlp-1.6"},{"key":"e_1_3_2_2_27_1","volume-title":"Recurrent Neural Network Approach for Table Field Extraction in Business Documents. In 2019 International Conference on Document Analysis and Recognition, ICDAR 2019","author":"Sage Cl\u00e9","year":"2019","unstructured":"Cl\u00e9 ment Sage , Alexandre Aussem , Haytham Elghazel , V\u00e9 ronique Eglin , and J\u00e9 r\u00e9 my Espinas. 2019 . Recurrent Neural Network Approach for Table Field Extraction in Business Documents. In 2019 International Conference on Document Analysis and Recognition, ICDAR 2019 , Sydney, Australia, September 20--25 , 2019 . IEEE, 1308--1313. https:\/\/doi.org\/10.1109\/ICDAR.2019.00211 Cl\u00e9 ment Sage, Alexandre Aussem, Haytham Elghazel, V\u00e9 ronique Eglin, and J\u00e9 r\u00e9 my Espinas. 2019. Recurrent Neural Network Approach for Table Field Extraction in Business Documents. In 2019 International Conference on Document Analysis and Recognition, ICDAR 2019, Sydney, Australia, September 20--25, 2019. IEEE, 1308--1313. https:\/\/doi.org\/10.1109\/ICDAR.2019.00211"},{"key":"e_1_3_2_2_28_1","volume-title":"Intellix - End-User Trained Information Extraction for Document Archiving. In 12th International Conference on Document Analysis and Recognition, ICDAR 2013","author":"Schuster Daniel","year":"2013","unstructured":"Daniel Schuster , Klemens Muthmann , Daniel Esser , Alexander Schill , Michael Berger , Christoph Weidling , Kamil Aliyev , and Andreas Hofmeier . 2013 . Intellix - End-User Trained Information Extraction for Document Archiving. In 12th International Conference on Document Analysis and Recognition, ICDAR 2013 , Washington, DC, USA, August 25--28 , 2013. IEEE Computer Society, 101--105. https:\/\/doi.org\/10.1109\/ICDAR.2013.28 Daniel Schuster, Klemens Muthmann, Daniel Esser, Alexander Schill, Michael Berger, Christoph Weidling, Kamil Aliyev, and Andreas Hofmeier. 2013. Intellix - End-User Trained Information Extraction for Document Archiving. In 12th International Conference on Document Analysis and Recognition, ICDAR 2013, Washington, DC, USA, August 25--28, 2013. IEEE Computer Society, 101--105. https:\/\/doi.org\/10.1109\/ICDAR.2013.28"},{"key":"e_1_3_2_2_29_1","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017","volume":"1083","author":"Liu Peter J.","year":"1865","unstructured":"Abigail See, Peter J. Liu , and Christopher D. Manning . 2017. Get To The Point: Summarization with Pointer-Generator Networks . In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017 , Vancouver, Canada, July 30 - August 4 , Volume 1: Long Papers, , Regina Barzilay and Min-Yen Kan (Eds.). Association for Computational Linguistics, 1073-- 1083 . https:\/\/doi.org\/10. 1865 3\/v1\/P17--1099 Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get To The Point: Summarization with Pointer-Generator Networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, , Regina Barzilay and Min-Yen Kan (Eds.). Association for Computational Linguistics, 1073--1083. https:\/\/doi.org\/10.18653\/v1\/P17--1099"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2646371"},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2021\/144"},{"key":"e_1_3_2_2_32_1","volume-title":"Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan N. Gomez , Lukasz Kaiser , and Illia Polosukhin . 2017 . Attention is All you Need . In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017 , December 4 --9 , 2017, Long Beach, CA, USA, , Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.). 5998--6008. https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4--9, 2017, Long Beach, CA, USA, , Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.). 5998--6008. https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html"},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i4.16378"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2021\/150"},{"key":"e_1_3_2_2_35_1","volume-title":"LayoutLM: Pre-training of Text and Layout for Document Image Understanding. In KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","author":"Xu Yiheng","year":"2020","unstructured":"Yiheng Xu , Minghao Li , Lei Cui , Shaohan Huang , Furu Wei , and Ming Zhou . 2020 . LayoutLM: Pre-training of Text and Layout for Document Image Understanding. In KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining , Virtual Event, CA, USA, August 23--27 , 2020, , Rajesh Gupta, Yan Liu, Jiliang Tang, and B. Aditya Prakash (Eds.). ACM, 1192--1200. https:\/\/doi.org\/10.1145\/3394486.3403172 Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and Ming Zhou. 2020. LayoutLM: Pre-training of Text and Layout for Document Image Understanding. In KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23--27, 2020, , Rajesh Gupta, Yan Liu, Jiliang Tang, and B. Aditya Prakash (Eds.). ACM, 1192--1200. https:\/\/doi.org\/10.1145\/3394486.3403172"},{"key":"e_1_3_2_2_36_1","volume-title":"Cha Zhang, and Furu Wei.","author":"Xu Yiheng","year":"2021","unstructured":"Yiheng Xu , Tengchao Lv , Lei Cui , Guoxin Wang , Yijuan Lu , Dinei Flor\u00ea ncio , Cha Zhang, and Furu Wei. 2021 a. LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding. CoRR , Vol. abs\/ 2104 .08836 (2021). showeprint[arXiv]2104.08836 https:\/\/arxiv.org\/abs\/2104.08836 Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Flor\u00ea ncio, Cha Zhang, and Furu Wei. 2021a. LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding. CoRR , Vol. abs\/2104.08836 (2021). showeprint[arXiv]2104.08836 https:\/\/arxiv.org\/abs\/2104.08836"},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.201"},{"key":"e_1_3_2_2_38_1","volume-title":"25th International Conference on Pattern Recognition, ICPR 2020","author":"Yu Wenwen","year":"2020","unstructured":"Wenwen Yu , Ning Lu , Xianbiao Qi , Ping Gong , and Rong Xiao . 2020 . PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks . In 25th International Conference on Pattern Recognition, ICPR 2020 , Virtual Event \/ Milan, Italy, January 10--15 , 2021. IEEE, 4363--4370. https:\/\/doi.org\/10.1109\/ICPR48806.2021.9412927io Wenwen Yu, Ning Lu, Xianbiao Qi, Ping Gong, and Rong Xiao. 2020. PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks. In 25th International Conference on Pattern Recognition, ICPR 2020, Virtual Event \/ Milan, Italy, January 10--15, 2021. IEEE, 4363--4370. https:\/\/doi.org\/10.1109\/ICPR48806.2021.9412927io"}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Lisboa Portugal","acronym":"MM '22"},"container-title":["Proceedings of the 30th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3547877","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3503161.3547877","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:02:35Z","timestamp":1750186955000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3547877"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":38,"alternative-id":["10.1145\/3503161.3547877","10.1145\/3503161"],"URL":"https:\/\/doi.org\/10.1145\/3503161.3547877","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}