{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:19:06Z","timestamp":1750220346314,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":37,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,8,24]],"date-time":"2021-08-24T00:00:00Z","timestamp":1629763200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,24]]},"DOI":"10.1145\/3460426.3463612","type":"proceedings-article","created":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T22:50:28Z","timestamp":1630536628000},"page":"210-218","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Reading Scene Text by Fusing Visual Attention with Semantic Representations"],"prefix":"10.1145","author":[{"given":"Zhiguang","family":"Liu","sequence":"first","affiliation":[{"name":"Noah's Ark Lab & Huawei Technologies, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Liangwei","family":"Wang","sequence":"additional","affiliation":[{"name":"Noah's Ark Lab & Huawei Technologies, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jian","family":"Qiao","sequence":"additional","affiliation":[{"name":"Huawei Technologies, Shenzhen, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,9]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00163"},{"key":"e_1_3_2_1_2_1","volume-title":"Block: Bilinear superdiagonal fusion for visual question answering and visual relationship detection. 
arXiv preprint arXiv:1902.00038","author":"Ben-Younes Hedi","year":"2019","unstructured":"Hedi Ben-Younes, R\u00e9mi Cadene, Nicolas Thome, and Matthieu Cord. 2019. Block: Bilinear superdiagonal fusion for visual question answering and visual relationship detection. arXiv preprint arXiv:1902.00038 (2019)."},{"key":"e_1_3_2_1_3_1","volume-title":"Article arXiv:1312.3005 (Dec","author":"Chelba Ciprian","year":"2013","unstructured":"Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, and Tony Robinson. 2013. One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling. arXiv e-prints, Article arXiv:1312.3005 (Dec 2013), arXiv:1312.3005 pages.arxiv: 1312.3005 [cs.CL]"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.543"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00584"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/3305381.3305478"},{"key":"e_1_3_2_1_7_1","volume-title":"Universal transformers. arXiv preprint arXiv:1807.03819","author":"Dehghani Mostafa","year":"2018","unstructured":"Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and \u0141ukasz Kaiser. 2018. Universal transformers. 
arXiv preprint arXiv:1807.03819 (2018)."},{"key":"e_1_3_2_1_8_1","volume-title":"Proceedings of the 2019 Conference of the NAACL. ACL, 4171--4186","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the NAACL. ACL, 4171--4186. https:\/\/doi.org\/10.18653\/v1\/N19--1423"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-3210"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240571"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/3305381.3305510"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.254"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.5555\/3016387.3016396"},{"key":"e_1_3_2_1_14_1","unstructured":"M. Jaderberg K. Simonyan A. Vedaldi and A. Zisserman. 2014. Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition. arXiv preprint arXiv:1406.2227 (2014)."},{"key":"e_1_3_2_1_15_1","unstructured":"Max Jaderberg Karen Simonyan Andrew Zisserman et al. 2015. Spatial transformer networks. In Advances in neural information processing systems. 
2017--2025."},{"key":"e_1_3_2_1_16_1","volume-title":"Hazim Kemal Ekenel, and Jean-Philippe Thiran","author":"Jaume Guillaume","year":"2019","unstructured":"Guillaume Jaume, Hazim Kemal Ekenel, and Jean-Philippe Thiran. 2019. FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents. arXiv e-prints, Article arXiv:1905.13538 (May 2019), arXiv:1905.13538 pages.arxiv: 1905.13538 [cs.IR]"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.245"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.560"},{"key":"e_1_3_2_1_19_1","volume-title":"Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition. In AAAI Conference on Artificial Intelligence.","author":"Li Hui","year":"2019","unstructured":"Hui Li, Peng Wang, Chunhua Shen, and Guyu Zhang. 2019. Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition. 
In AAAI Conference on Artificial Intelligence."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018714"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.12246"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.12252"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2019.01.020"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2014.07.008"},{"key":"e_1_3_2_1_25_1","volume-title":"An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition","author":"Shi Baoguang","year":"2016","unstructured":"Baoguang Shi, Xiang Bai, and Cong Yao. 2016a. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE transactions on PAMI, Vol. 39, 11 (2016), 2298--2304."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.452"},{"key":"e_1_3_2_1_27_1","volume-title":"Aster: An attentional scene text recognizer with flexible rectification","author":"Shi Baoguang","year":"2018","unstructured":"Baoguang Shi, Mingkun Yang, Xinggang Wang, Pengyuan Lyu, Cong Yao, and Xiang Bai. 2018. Aster: An attentional scene text recognizer with flexible rectification. 
IEEE transactions on pattern analysis and machine intelligence (2018)."},{"key":"e_1_3_2_1_28_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998--6008."},{"key":"e_1_3_2_1_29_1","volume-title":"Coco-text: Dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140","author":"Veit Andreas","year":"2016","unstructured":"Andreas Veit, Tomas Matera, Lukas Neumann, Jiri Matas, and Serge Belongie. 2016. Coco-text: Dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140 (2016)."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6891"},{"key":"e_1_3_2_1_31_1","volume-title":"2011 International Conference on Computer Vision. IEEE, 1457--1464","author":"Wang Kai","year":"2011","unstructured":"Kai Wang, Boris Babenko, and Serge Belongie. 2011. End-to-end scene text recognition. In 2011 International Conference on Computer Vision. IEEE, 1457--1464."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Tianwei Wang Yuanzhi Zhu Lianwen Jin Canjie Luo Xiaoxue Chen Yaqiang Wu Qianying Wang and Mingxiang Cai. 2020. Decoupled Attention Network for Text Recognition. 
In AAAI. 12216--12224.","DOI":"10.1609\/aaai.v34i07.6903"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2671188.2749352"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2020.07.010"},{"key":"e_1_3_2_1_35_1","volume-title":"Proceedings of the 26th International Joint Conference on Artificial Intelligence","author":"Yang Xiao","year":"2017","unstructured":"Xiao Yang, Dafang He, Zihan Zhou, Daniel Kifer, and C. Lee Giles. 2017. Learning to Read Irregular Text with Attention Mechanisms. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (Melbourne, Australia) (IJCAI'17). AAAI Press, 3280--3286. http:\/\/dl.acm.org\/citation.cfm?id=3172077.3172347"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.515"},{"key":"e_1_3_2_1_37_1","volume-title":"RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition. In European Conference on Computer Vision. Springer, 135--151","author":"Yue Xiaoyu","year":"2020","unstructured":"Xiaoyu Yue, Zhanghui Kuang, Chenhao Lin, Hongbin Sun, and Wayne Zhang. 2020. RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition. 
In European Conference on Computer Vision. Springer, 135--151."}],"event":{"name":"ICMR '21: International Conference on Multimedia Retrieval","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Taipei Taiwan","acronym":"ICMR '21"},"container-title":["Proceedings of the 2021 International Conference on Multimedia Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3460426.3463612","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3460426.3463612","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:03Z","timestamp":1750191423000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3460426.3463612"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,24]]},"references-count":37,"alternative-id":["10.1145\/3460426.3463612","10.1145\/3460426"],"URL":"https:\/\/doi.org\/10.1145\/3460426.3463612","relation":{},"subject":[],"published":{"date-parts":[[2021,8,24]]},"assertion":[{"value":"2021-09-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}