{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,2]],"date-time":"2026-01-02T07:47:21Z","timestamp":1767340041908,"version":"3.41.0"},"reference-count":56,"publisher":"Association for Computing Machinery (ACM)","issue":"1s","license":[{"start":{"date-parts":[[2019,1,24]],"date-time":"2019-01-24T00:00:00Z","timestamp":1548288000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61525206,61771468,61622211,61472392 and 61620106009"],"award-info":[{"award-number":["61525206,61771468,61622211,61472392 and 61620106009"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"crossref","award":["WK2100100030"],"award-info":[{"award-number":["WK2100100030"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National Key Research and Development Program of China","award":["2017YFC0820600"],"award-info":[{"award-number":["2017YFC0820600"]}]},{"DOI":"10.13039\/501100004739","name":"Youth Innovation Promotion Association Chinese Academy of Sciences","doi-asserted-by":"crossref","award":["2017209"],"award-info":[{"award-number":["2017209"]}],"id":[{"id":"10.13039\/501100004739","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2019,1,31]]},"abstract":"<jats:p>\n            In this article, we present Convolutional Attention Networks (CAN) for unconstrained scene text recognition. 
Recent dominant approaches for scene text recognition are mainly based on Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), where the CNN encodes images and the RNN generates character sequences. Unlike these methods, CAN is built entirely on CNNs and includes an attention mechanism. The distinctive characteristics of our method are as follows: (i) CAN follows an encoder-decoder architecture, in which the encoder is a deep two-dimensional CNN and the decoder is a one-dimensional CNN; (ii) the attention mechanism is applied in every convolutional layer of the decoder, and we propose a novel spatial attention method using average pooling; and (iii) position embeddings are added to both the spatial encoder and the sequence decoder to give the network a sense of location. We conduct experiments on standard datasets for scene text recognition, including the\n            <jats:italic>Street View Text<\/jats:italic>\n            ,\n            <jats:italic>IIIT5K,<\/jats:italic>\n            and\n            <jats:italic>ICDAR<\/jats:italic>\n            datasets. 
The experimental results validate the effectiveness of different components and show that our convolutional-based method achieves state-of-the-art or competitive performance over prior works, even without the use of RNN.\n          <\/jats:p>","DOI":"10.1145\/3231737","type":"journal-article","created":{"date-parts":[[2019,1,28]],"date-time":"2019-01-28T14:01:39Z","timestamp":1548684099000},"page":"1-17","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":61,"title":["Convolutional Attention Networks for Scene Text Recognition"],"prefix":"10.1145","volume":"15","author":[{"given":"Hongtao","family":"Xie","sequence":"first","affiliation":[{"name":"School of Information Science and Technology, University of Science and Technology of China, Hefei, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3100-3664","authenticated-orcid":false,"given":"Shancheng","family":"Fang","sequence":"additional","affiliation":[{"name":"Institute of Information Engineering, Chinese Academy of Sciences, China and School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China"}]},{"given":"Zheng-Jun","family":"Zha","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, University of Science and Technology of China, Hefei, China"}]},{"given":"Yating","family":"Yang","sequence":"additional","affiliation":[{"name":"Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumchi, China"}]},{"given":"Yan","family":"Li","sequence":"additional","affiliation":[{"name":"Beijing Kuaishou Technology Co., Ltd. 
Beijing, China"}]},{"given":"Yongdong","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, University of Science and Technology of China, Hefei, China"}]}],"member":"320","published-online":{"date-parts":[[2019,1,24]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2014.2339814"},{"key":"e_1_2_1_2_1","volume-title":"End-to-end text recognition with hybrid HMM maxout models. CoRR abs\/1310.1811","author":"Alsharif Ouais","year":"2013","unstructured":"Ouais Alsharif and Joelle Pineau. 2013. End-to-end text recognition with hybrid HMM maxout models. CoRR abs\/1310.1811 (2013). http:\/\/arxiv.org\/abs\/1310.1811"},{"key":"e_1_2_1_3_1","volume-title":"Layer normalization. CoRR abs\/1607.06450","author":"Ba Lei Jimmy","year":"2016","unstructured":"Lei Jimmy Ba, Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. CoRR abs\/1607.06450 (2016). http:\/\/arxiv.org\/abs\/1607.06450"},{"key":"e_1_2_1_4_1","volume-title":"Neural machine translation by jointly learning to align and translate. CoRR abs\/1409.0473","author":"Bahdanau Dzmitry","year":"2014","unstructured":"Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. CoRR abs\/1409.0473 (2014). 
http:\/\/arxiv.org\/abs\/1409.0473"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.102"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-014-1468-z"},{"key":"e_1_2_1_7_1","first-page":"1","article-title":"Name-face association with web facial image supervision","volume":"4","author":"Chen Zhineng","year":"2017","unstructured":"Zhineng Chen, Wei Zhang, Bin Deng, Hongtao Xie, and Xiaoyan Gu. 2017. Name-face association with web facial image supervision. Multimedia Systems 4 (2017), 1--20.","journal-title":"Multimedia Systems"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-4012"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 34th International Conference on Machine Learning (ICML\u201917)","volume":"70","author":"Dauphin Yann N.","year":"2017","unstructured":"Yann N. Dauphin, Angela Fan, Michael Auli, and David Grangier. 2017. Language modeling with gated convolutional networks. In Proceedings of the 34th International Conference on Machine Learning (ICML\u201917) (Proceedings of Machine Learning Research), Sydney, NSW, Australia, August 6-11, 2017, Doina Precup and Yee Whye Teh (Eds.), Vol. 70. ACM, 933--941. http:\/\/proceedings.mlr.press\/v70\/dauphin17a.html."},{"key":"e_1_2_1_10_1","volume-title":"Uyghur text matching in graphic images for biomedical semantic analysis. 
Neuroinformatics (19","author":"Fang Shancheng","year":"2018","unstructured":"Shancheng Fang, Hongtao Xie, Zhineng Chen, Yizhi Liu, and Yan Li. 2018. Uyghur text matching in graphic images for biomedical semantic analysis. Neuroinformatics (19 Jan 2018)."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-017-4538-8"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 34th International Conference on Machine Learning (ICML\u201917)","author":"Gehring Jonas","year":"2017","unstructured":"Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. 2017. Convolutional sequence to sequence learning. In Proceedings of the 34th International Conference on Machine Learning (ICML\u201917) (Proceedings of Machine Learning Research), Sydney, NSW, Australia, August 6-11, 2017, Doina Precup and Yee Whye Teh (Eds.), Vol. 70. ACM, 1243--1252. http:\/\/proceedings.mlr.press\/v70\/gehring17a.html."},{"key":"e_1_2_1_13_1","volume-title":"Visual attention models for scene text recognition. CoRR abs\/1706.01487","author":"Ghosh Suman K.","year":"2017","unstructured":"Suman K. Ghosh, Ernest Valveny, and Andrew D. Bagdanov. 2017. Visual attention models for scene text recognition. CoRR abs\/1706.01487 (2017). 
http:\/\/arxiv.org\/abs\/1706.01487"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the International Conference on Machine Learning (ICML'13)","author":"Goodfellow Ian J.","year":"2013","unstructured":"Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron C. Courville, and Yoshua Bengio. 2013. Maxout networks. In Proceedings of the International Conference on Machine Learning (ICML'13). ACM, 1319--1327. https:\/\/arxiv.org\/pdf\/1302.4389."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298914"},{"key":"e_1_2_1_16_1","volume-title":"Generating sequences with recurrent neural networks. CoRR abs\/1308.0850","author":"Graves Alex","year":"2013","unstructured":"Alex Graves. 2013. Generating sequences with recurrent neural networks. CoRR abs\/1308.0850 (2013). http:\/\/arxiv.org\/abs\/1308.0850"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143891"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.123"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_38"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 30th AAAI Conference on Artificial Intelligence","author":"He Pan","year":"2016","unstructured":"Pan He, Weilin Huang, Yu Qiao, Chen Change Loy, and Xiaoou Tang. 2016a. Reading scene text in deep convolutional sequences. 
In Proceedings of the 30th AAAI Conference on Artificial Intelligence, Phoenix, AZ, February 12-17, 2016, Dale Schuurmans and Michael P. Wellman (Eds.). AAAI Press, 3501--3508. http:\/\/www.aaai.org\/ocs\/index.php\/AAAI\/AAAI16\/paper\/view\/12256."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the 32nd International Conference on Machine Learning (ICML\u201915) (JMLR Workshop and Conference Proceedings)","volume":"37","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML\u201915) (JMLR Workshop and Conference Proceedings), Lille, France, July 6-11, 2015, Francis R. Bach and David M. Blei (Eds.), Vol. 37. JMLR.org, 448--456. http:\/\/jmlr.org\/proceedings\/papers\/v37\/ioffe15.html."},{"key":"e_1_2_1_24_1","volume-title":"Deep structured output learning for unconstrained text recognition. 
CoRR abs\/1412.5903","author":"Jaderberg Max","year":"2014","unstructured":"Max Jaderberg, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014a. Deep structured output learning for unconstrained text recognition. CoRR abs\/1412.5903 (2014). http:\/\/arxiv.org\/abs\/1412.5903"},{"key":"e_1_2_1_25_1","volume-title":"Reading text in the wild with convolutional neural networks. CoRR abs\/1412.1842","author":"Jaderberg Max","year":"2014","unstructured":"Max Jaderberg, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014b. Reading text in the wild with convolutional neural networks. CoRR abs\/1412.1842 (2014). http:\/\/arxiv.org\/abs\/1412.1842"},{"key":"e_1_2_1_26_1","volume-title":"Synthetic data and artificial neural networks for natural scene text recognition. CoRR abs\/1406.2227","author":"Jaderberg Max","year":"2014","unstructured":"Max Jaderberg, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014c. Synthetic data and artificial neural networks for natural scene text recognition. CoRR abs\/1406.2227 (2014). 
http:\/\/arxiv.org\/abs\/1406.2227"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015","author":"Jaderberg Max","year":"2015","unstructured":"Max Jaderberg, Karen Simonyan, Andrew Zisserman, and Koray Kavukcuoglu. 2015. Spatial transformer networks. In Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, Montreal, Canada, December 7-12, 2015, Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett (Eds.). MIT Press, 2017--2025. http:\/\/papers.nips.cc\/paper\/5854-spatial-transformer-networks."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10593-2_34"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2013.221"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.245"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1126004.1126005"},{"key":"e_1_2_1_32_1","volume-title":"ICDAR 2003 robust reading competitions. In Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR\u201903)","author":"Lucas Simon M.","year":"2003","unstructured":"Simon M. Lucas, Alex Panaretos, Luis Sosa, Anthony Tang, Shirley Wong, and Robert Young. 2003. ICDAR 2003 robust reading competitions. 
In Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR\u201903), 2-Volume Set, Edinburgh, Scotland, August 3-6, 2003. IEEE, 682--687."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1166"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the British Machine Vision Conference (BMVC\u201912)","author":"Mishra Anand","year":"2012","unstructured":"Anand Mishra, Karteek Alahari, and C. V. Jawahar. 2012. Scene text recognition using higher order language priors. In Proceedings of the British Machine Vision Conference (BMVC\u201912), Surrey, UK, September 3-7, 2012, Richard Bowden, John P. Collomosse, and Krystian Mikolajczyk (Eds.). British Machine Vision Association Press, 1--11."},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the 27th International Conference on Machine Learning (ICML\u201910)","author":"Nair Vinod","year":"2010","unstructured":"Vinod Nair and Geoffrey E. Hinton. 2010. 
Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML\u201910), Haifa, Israel, June 21-24, 2010, Johannes F\u00fcrnkranz and Thorsten Joachims (Eds.). ACM, 807--814. http:\/\/www.icml2010.org\/papers\/432.pdf."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.5555\/2354409.2355095"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2010.5540009"},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the British Machine Vision Conference (BMVC\u201913)","author":"Jos\u00e9","year":"2013","unstructured":"Jos\u00e9 A. Rodr\u00edguez and Florent Perronnin. 2013. Label embedding for text recognition. In Proceedings of the British Machine Vision Conference (BMVC\u201913), Bristol, UK, September 9-13, 2013, Tilo Burghardt, Dima Damen, Walterio W. Mayol-Cuevas, and Majid Mirmehdi (Eds.). British Machine Vision Association Press."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-014-0793-6"},{"key":"e_1_2_1_40_1","volume-title":"Proceedings of the 29th Annual Conference on Advances in Neural Information Processing Systems","author":"Salimans Tim","year":"2016","unstructured":"Tim Salimans and Diederik P. Kingma. 2016. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. In Proceedings of the 29th Annual Conference on Advances in Neural Information Processing Systems, Barcelona, Spain, December 5-10, 2016, Daniel D. Lee, Masashi Sugiyama, Ulrike von Luxburg, Isabelle Guyon, and Roman Garnett (Eds.). 901. 
http:\/\/papers.nips.cc\/paper\/6114-weight-normalization-a-simple-reparameterization-to-accelerate-training-of-deep-neural-networks."},{"key":"e_1_2_1_41_1","volume-title":"An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. CoRR abs\/1507.05717","author":"Shi Baoguang","year":"2015","unstructured":"Baoguang Shi, Xiang Bai, and Cong Yao. 2015. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. CoRR abs\/1507.05717 (2015). http:\/\/arxiv.org\/abs\/1507.05717"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.452"},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the 12th Asian Conference on Computer Vision (ACCV\u201914)","volume":"9003","author":"Su Bolan","year":"2014","unstructured":"Bolan Su and Shijian Lu. 2014. Accurate scene text recognition based on recurrent neural network. In Proceedings of the 12th Asian Conference on Computer Vision (ACCV\u201914) Revised Selected Papers, Part I (Lecture Notes in Computer Science), Singapore, November 1-5, 2014, Daniel Cremers, Ian D. Reid, Hideo Saito, and Ming-Hsuan Yang (Eds.), Vol. 9003. 
Springer, 35--48."},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of the 30th International Conference on Machine Learning (ICML\u201913) (JMLR Workshop and Conference Proceedings)","volume":"28","author":"Sutskever Ilya","year":"2013","unstructured":"Ilya Sutskever, James Martens, George E. Dahl, and Geoffrey E. Hinton. 2013. On the importance of initialization and momentum in deep learning. In Proceedings of the 30th International Conference on Machine Learning (ICML\u201913) (JMLR Workshop and Conference Proceedings), Atlanta, GA, June 16-21, 2013, Vol. 28. ACM, 1139--1147. http:\/\/jmlr.org\/proceedings\/papers\/v28\/sutskever13.html."},{"key":"e_1_2_1_45_1","volume-title":"Attention is all you need. CoRR abs\/1706.03762","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. CoRR abs\/1706.03762 (2017). 
http:\/\/arxiv.org\/abs\/1706.03762"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126402"},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the 11th European Conference on Computer Vision (ECCV\u201910)","author":"Wang Kai","year":"2010","unstructured":"Kai Wang and Serge J. Belongie. 2010. Word spotting in the wild. In Proceedings of the 11th European Conference on Computer Vision (ECCV\u201910), Part I (Lecture Notes in Computer Science), Heraklion, Crete, September 5-11, 2010, Kostas Daniilidis, Petros Maragos, and Nikos Paragios (Eds.), Vol. 6311. IEEE, 591--604."},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of the 21st International Conference on Pattern Recognition (ICPR\u201912)","author":"Wang Tao","year":"2012","unstructured":"Tao Wang, David J. Wu, Adam Coates, and Andrew Y. Ng. 2012. End-to-end text recognition with convolutional neural networks. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR\u201912), Tsukuba, Japan, November 11-15, 2012. IEEE, 3304--3308. http:\/\/ieeexplore.ieee.org\/document\/6460871\/."},{"key":"e_1_2_1_49_1","volume-title":"Attention-based extraction of structured information from street view imagery. CoRR abs\/1704.03549","author":"Wojna Zbigniew","year":"2017","unstructured":"Zbigniew Wojna, Alexander N. 
Gorban, Dar-Shyang Lee, Kevin Murphy, Qian Yu, Yeqing Li, and Julian Ibarz. 2017. Attention-based extraction of structured information from street view imagery. CoRR abs\/1704.03549 (2017). http:\/\/arxiv.org\/abs\/1704.03549"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2017.2749977"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2017.2749965"},{"key":"e_1_2_1_52_1","unstructured":"Hongtao Xie, Dongbao Yang, Nannan Sun, Zhineng Chen, and Yongdong Zhang. 2014. Automated pulmonary nodule detection in CT images using deep convolutional neural networks. 
Pattern Recognition."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.515"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2016.2599102"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2015.2511585"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/2808210"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3231737","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3231737","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:08:16Z","timestamp":1750208896000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3231737"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,1,24]]},"references-count":56,"journal-issue":{"issue":"1s","published-print":{"date-parts":[[2019,1,31]]}},"alternative-id":["10.1145\/3231737"],"URL":"https:\/\/doi.org\/10.1145\/3231737","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2019,1,24]]},"assertion":[{"value":"2017-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-06-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-01-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}