{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,29]],"date-time":"2026-03-29T01:59:03Z","timestamp":1774749543696,"version":"3.50.1"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2019,11,30]],"date-time":"2019-11-30T00:00:00Z","timestamp":1575072000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100004739","name":"Youth Innovation Promotion Association of the Chinese Academy of Sciences","doi-asserted-by":"crossref","award":["2018497"],"award-info":[{"award-number":["2018497"]}],"id":[{"id":"10.13039\/501100004739","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100012659","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61836011, 61822208, 61632019, 61662082, U1703261"],"award-info":[{"award-number":["61836011, 61822208, 61632019, 61662082, U1703261"]}],"id":[{"id":"10.13039\/501100012659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2019,11,30]]},"abstract":"<jats:p>Detection of scene text in arbitrary shapes is a challenging task in the field of computer vision. Most existing scene text detection methods exploit the rectangle\/quadrangular bounding box to denote the detected text, which fails to accurately fit text with arbitrary shapes, such as curved text. In addition, recent progress on scene text detection has benefited from Fully Convolutional Network. Text cues contained in multi-level convolutional features are complementary for detecting scene text objects. How to explore these multi-level features is still an open problem. To tackle the above issues, we propose an Attention-based Bidirectional Long Short-Term Memory (AB-LSTM) model for scene text detection. First, word stroke regions (WSRs) and text center blocks (TCBs) are extracted by two AB-LSTM models, respectively. Then, the union of WSRs and TCBs are used to represent text objects. To verify the effectiveness of the proposed method, we perform experiments on four public benchmarks: CTW1500, Total-text, ICDAR2013, and MSRA-TD500, and compare it with existing state-of-the-art methods. Experiment results demonstrate that the proposed method can achieve competitive results, and well handle scene text objects with arbitrary shapes (i.e., curved, oriented, and horizontal forms).<\/jats:p>","DOI":"10.1145\/3356728","type":"journal-article","created":{"date-parts":[[2019,12,16]],"date-time":"2019-12-16T13:12:30Z","timestamp":1576501950000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["AB-LSTM"],"prefix":"10.1145","volume":"15","author":[{"given":"Zhandong","family":"Liu","sequence":"first","affiliation":[{"name":"University of Science and Technology of China, Shushan District, Hefei, China"}]},{"given":"Wengang","family":"Zhou","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Shushan District, Hefei, China"}]},{"given":"Houqiang","family":"Li","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Shushan District, Hefei, China"}]}],"member":"320","published-online":{"date-parts":[[2019,12,16]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Neural machine translation by jointly learning to align and translate. Retrieved from Arxiv Preprint Arxiv:1409.0473","author":"Bahdanau Dzmitry","year":"2014","unstructured":"Dzmitry Bahdanau , Kyunghyun Cho , and Yoshua Bengio . 2014. Neural machine translation by jointly learning to align and translate. Retrieved from Arxiv Preprint Arxiv:1409.0473 ( 2014 ). Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. Retrieved from Arxiv Preprint Arxiv:1409.0473 (2014)."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the International Conference on Computer Vision (ICCV\u201915)","author":"Busta Michal","year":"2015","unstructured":"Michal Busta , Lukas Neumann , and Jiri Matas . 2015 . Fastext: Efficient unconstrained scene text detector . In Proceedings of the International Conference on Computer Vision (ICCV\u201915) . 1206--1214. Michal Busta, Lukas Neumann, and Jiri Matas. 2015. Fastext: Efficient unconstrained scene text detector. In Proceedings of the International Conference on Computer Vision (ICCV\u201915). 1206--1214."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2017.157"},{"key":"e_1_2_1_4_1","first-page":"48","article-title":"Paying more attention to saliency: Image captioning with saliency and context attention. ACM Trans. Multimedia Comput., Commun","volume":"14","author":"Cornia Marcella","year":"2018","unstructured":"Marcella Cornia , Lorenzo Baraldi , Giuseppe Serra , and Rita Cucchiara . 2018 . Paying more attention to saliency: Image captioning with saliency and context attention. ACM Trans. Multimedia Comput., Commun ., Applic. 14 , 2 (2018), 48 . Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, and Rita Cucchiara. 2018. Paying more attention to saliency: Image captioning with saliency and context attention. ACM Trans. Multimedia Comput., Commun., Applic. 14, 2 (2018), 48.","journal-title":"Applic."},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence (AAAI\u201918)","author":"Deng Dan","year":"2018","unstructured":"Dan Deng , Haifeng Liu , Xuelong Li , and Deng Cai . 2018 . PixelLink: Detecting scene text via instance segmentation . In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI\u201918) . 6773--6780. Dan Deng, Haifeng Liu, Xuelong Li, and Deng Cai. 2018. PixelLink: Detecting scene text via instance segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI\u201918). 6773--6780."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2010.5540041"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-009-0275-4"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.254"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"He Dafang","unstructured":"Dafang He , Xiao Yang , Chen Liang , Zihan Zhou , G. Alexander , I. I. Ororbia , Daniel Kifer , and C. Lee Giles . 2017. Multi-scale FCN with cascaded instance aware segmentation for arbitrary oriented word spotting in the wild . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917) . 474--483. Dafang He, Xiao Yang, Chen Liang, Zihan Zhou, G. Alexander, I. I. Ororbia, Daniel Kifer, and C. Lee Giles. 2017. Multi-scale FCN with cascaded instance aware segmentation for arbitrary oriented word spotting in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917). 474--483."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.331"},{"key":"e_1_2_1_11_1","volume-title":"Accurate text localization in natural image with cascaded convolutional text network. Retrieved from: Arxiv Preprint Arxiv:1603.09423","author":"He Tong","year":"2016","unstructured":"Tong He , Weilin Huang , Yu Qiao , and Jian Yao . 2016. Accurate text localization in natural image with cascaded convolutional text network. Retrieved from: Arxiv Preprint Arxiv:1603.09423 ( 2016 ). Tong He, Weilin Huang, Yu Qiao, and Jian Yao. 2016. Accurate text localization in natural image with cascaded convolutional text network. Retrieved from: Arxiv Preprint Arxiv:1603.09423 (2016)."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00527"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.87"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.529"},{"key":"e_1_2_1_15_1","first-page":"10","article-title":"Egocentric hand detection via dynamic region growing. ACM Trans. Multimedia Comput., Commun","volume":"14","author":"Huang Shao","year":"2017","unstructured":"Shao Huang , Weiqiang Wang , Shengfeng He , and Rynson W. H. Lau . 2017 . Egocentric hand detection via dynamic region growing. ACM Trans. Multimedia Comput., Commun ., Applic. 14 , 1 (2017), 10 . Shao Huang, Weiqiang Wang, Shengfeng He, and Rynson W. H. Lau. 2017. Egocentric hand detection via dynamic region growing. ACM Trans. Multimedia Comput., Commun., Applic. 14, 1 (2017), 10.","journal-title":"Applic."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.157"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"key":"e_1_2_1_18_1","volume-title":"R2CNN: Rotational region CNN for orientation robust scene text detection. Retrieved from Arxiv Preprint Arxiv:1706.09579","author":"Jiang Yingying","year":"2017","unstructured":"Yingying Jiang , Xiangyu Zhu , Xiaobing Wang , Shuli Yang , Wei Li , Hua Wang , Pei Fu , and Zhenbo Luo . 2017. R2CNN: Rotational region CNN for orientation robust scene text detection. Retrieved from Arxiv Preprint Arxiv:1706.09579 ( 2017 ). Yingying Jiang, Xiangyu Zhu, Xiaobing Wang, Shuli Yang, Wei Li, Hua Wang, Pei Fu, and Zhenbo Luo. 2017. R2CNN: Rotational region CNN for orientation robust scene text detection. Retrieved from Arxiv Preprint Arxiv:1706.09579 (2017)."},{"key":"e_1_2_1_19_1","volume-title":"ICDAR 2015 competition on robust reading. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR\u201915)","author":"Karatzas Dimosthenis","unstructured":"Dimosthenis Karatzas , Lluis Gomez-Bigorda , Anguelos Nicolaou , Suman Ghosh , Andrew Bagdanov , Masakazu Iwamura , Jiri Matas , Lukas Neumann , Vijay Ramaseshan Chandrasekhar , Shijian Lu et al. 2015 . ICDAR 2015 competition on robust reading. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR\u201915) . 1156--1160. Dimosthenis Karatzas, Lluis Gomez-Bigorda, Anguelos Nicolaou, Suman Ghosh, Andrew Bagdanov, Masakazu Iwamura, Jiri Matas, Lukas Neumann, Vijay Ramaseshan Chandrasekhar, Shijian Lu et al. 2015. ICDAR 2015 competition on robust reading. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR\u201915). 1156--1160."},{"key":"e_1_2_1_20_1","volume-title":"ICDAR 2013 robust reading competition. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR\u201913)","author":"Karatzas Dimosthenis","year":"2013","unstructured":"Dimosthenis Karatzas , Faisal Shafait , Seiichi Uchida , Masakazu Iwamura , Lluis Gomez i Bigorda , Sergi Robles Mestre , Joan Mas , David Fernandez Mota , Jon Almazan Almazan , and Lluis Pere De Las Heras . 2013 . ICDAR 2013 robust reading competition. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR\u201913) . 1484--1493. Dimosthenis Karatzas, Faisal Shafait, Seiichi Uchida, Masakazu Iwamura, Lluis Gomez i Bigorda, Sergi Robles Mestre, Joan Mas, David Fernandez Mota, Jon Almazan Almazan, and Lluis Pere De Las Heras. 2013. ICDAR 2013 robust reading competition. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR\u201913). 1484--1493."},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS\u201912)","author":"Krizhevsky Alex","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey E. Hinton . 2012. Imagenet classification with deep convolutional neural networks . In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS\u201912) . 1097--1105. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS\u201912). 1097--1105."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.782"},{"key":"e_1_2_1_23_1","volume-title":"Shape robust text detection with progressive scale expansion network. Retrieved from Arxiv Preprint Arxiv:1806.02559","author":"Li Xiang","year":"2018","unstructured":"Xiang Li , Wenhai Wang , Wenbo Hou , Ruo-Ze Liu , Tong Lu , and Jian Yang . 2018. Shape robust text detection with progressive scale expansion network. Retrieved from Arxiv Preprint Arxiv:1806.02559 ( 2018 ). Xiang Li, Wenhai Wang, Wenbo Hou, Ruo-Ze Liu, Tong Lu, and Jian Yang. 2018. Shape robust text detection with progressive scale expansion network. Retrieved from Arxiv Preprint Arxiv:1806.02559 (2018)."},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence (AAAI\u201917)","author":"Liao Minghui","year":"2017","unstructured":"Minghui Liao , Baoguang Shi , Xiang Bai , Xinggang Wang , and Wenyu Liu . 2017 . TextBoxes: A fast text detector with a single deep neural network . In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI\u201917) . 4161--4167. Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, and Wenyu Liu. 2017. TextBoxes: A fast text detector with a single deep neural network. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI\u201917). 4161--4167."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00619"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV\u201916)","author":"Liu Wei","unstructured":"Wei Liu , Dragomir Anguelov , Dumitru Erhan , Christian Szegedy , Scott Reed , Cheng-Yang Fu , and Alexander C. Berg . 2016. SSD: Single shot multibox detector . In Proceedings of the European Conference on Computer Vision (ECCV\u201916) . 21--37. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. 2016. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision (ECCV\u201916). 21--37."},{"key":"e_1_2_1_27_1","volume-title":"Detecting curve text in the wild: New dataset and new solution. Retrieved from Arxiv Preprint Arxiv:1712.02170","author":"Liu Yuliang","year":"2017","unstructured":"Yuliang Liu , Lianwen Jin , Shuaitao Zhang , and Sheng Zhang . 2017. Detecting curve text in the wild: New dataset and new solution. Retrieved from Arxiv Preprint Arxiv:1712.02170 ( 2017 ). Yuliang Liu, Lianwen Jin, Shuaitao Zhang, and Sheng Zhang. 2017. Detecting curve text in the wild: New dataset and new solution. Retrieved from Arxiv Preprint Arxiv:1712.02170 (2017)."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-019-7177-4"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01216-8_2"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence (AAAI\u201918)","author":"Long Xiang","year":"2018","unstructured":"Xiang Long , Chuang Gan , Gerard de Melo , Xiao Liu , Yandong Li , Fu Li , and Shilei Wen . 2018 . Multimodal keyless attention fusion for video classification . In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI\u201918) . 7202--7209. Xiang Long, Chuang Gan, Gerard de Melo, Xiao Liu, Yandong Li, Fu Li, and Shilei Wen. 2018. Multimodal keyless attention fusion for video classification. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI\u201918). 7202--7209."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_5"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00788"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/tmm.2018.2818020"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8655(97)00131-1"},{"key":"e_1_2_1_35_1","volume-title":"Joseph Chazalon et al","author":"Nayef Nibal","year":"2017","unstructured":"Nibal Nayef , Fei Yin , Imen Bizid , Hyunsoo Choi , Yuan Feng , Dimosthenis Karatzas , Zhenbo Luo , Umapada Pal , Christophe Rigaud , Joseph Chazalon et al . 2017 . ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-RRC-MLT. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR\u2019 17). 1454--1459. Nibal Nayef, Fei Yin, Imen Bizid, Hyunsoo Choi, Yuan Feng, Dimosthenis Karatzas, Zhenbo Luo, Umapada Pal, Christophe Rigaud, Joseph Chazalon et al. 2017. ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-RRC-MLT. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR\u201917). 1454--1459."},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the Asian Conference on Computer Vision (ACCV\u201910)","author":"Neumann Lukas","year":"2010","unstructured":"Lukas Neumann and Jiri Matas . 2010 . A method for text localization and recognition in real-world images . In Proceedings of the Asian Conference on Computer Vision (ACCV\u201910) . 770--783. Lukas Neumann and Jiri Matas. 2010. A method for text localization and recognition in real-world images. In Proceedings of the Asian Conference on Computer Vision (ACCV\u201910). 770--783."},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS\u201915)","author":"Ren Shaoqing","year":"2015","unstructured":"Shaoqing Ren , Kaiming He , Ross Girshick , and Jian Sun . 2015 . Faster R-CNN: Towards real-time object detection with region proposal networks . In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS\u201915) . 91--99. Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS\u201915). 91--99."},{"key":"e_1_2_1_38_1","volume-title":"ICDAR 2011 robust reading competition challenge 2: Reading text in scene images. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR\u201911)","author":"Shahab Asif","year":"2011","unstructured":"Asif Shahab , Faisal Shafait , and Andreas Dengel . 2011 . ICDAR 2011 robust reading competition challenge 2: Reading text in scene images. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR\u201911) . 1491--1496. Asif Shahab, Faisal Shafait, and Andreas Dengel. 2011. ICDAR 2011 robust reading competition challenge 2: Reading text in scene images. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR\u201911). 1491--1496."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.371"},{"key":"e_1_2_1_40_1","volume-title":"Very deep convolutional networks for large-scale image recognition. Retrieved from Arxiv Preprint Arxiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman . 2014. Very deep convolutional networks for large-scale image recognition. Retrieved from Arxiv Preprint Arxiv:1409.1556 ( 2014 ). Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. Retrieved from Arxiv Preprint Arxiv:1409.1556 (2014)."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46484-8_4"},{"key":"e_1_2_1_43_1","first-page":"40","article-title":"Image captioning with deep bidirectional LSTMs and multi-task learning. ACM Trans. Multimedia Comput., Commun","volume":"14","author":"Wang Cheng","year":"2018","unstructured":"Cheng Wang , Haojin Yang , and Christoph Meinel . 2018 . Image captioning with deep bidirectional LSTMs and multi-task learning. ACM Trans. Multimedia Comput., Commun ., Applic. 14 , 2s (2018), 40 . Cheng Wang, Haojin Yang, and Christoph Meinel. 2018. Image captioning with deep bidirectional LSTMs and multi-task learning. ACM Trans. Multimedia Comput., Commun., Applic. 14, 2s (2018), 40.","journal-title":"Applic."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.5555\/2722900.2723092"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.164"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.335"},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201912)","author":"Yao Cong","year":"2012","unstructured":"Cong Yao , Xiang Bai , Wenyu Liu , Yi Ma , and Zhuowen Tu . 2012 . Detecting texts of arbitrary orientations in natural images . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201912) . 1083--1090. Cong Yao, Xiang Bai, Wenyu Liu, Yi Ma, and Zhuowen Tu. 2012. Detecting texts of arbitrary orientations in natural images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201912). 1083--1090."},{"key":"e_1_2_1_48_1","volume-title":"Scene text detection via holistic, multi-channel prediction. Retrieved from Arxiv Preprint Arxiv:1606.09002","author":"Yao Cong","year":"2016","unstructured":"Cong Yao , Xiang Bai , Nong Sang , Xinyu Zhou , Shuchang Zhou , and Zhimin Cao . 2016. Scene text detection via holistic, multi-channel prediction. Retrieved from Arxiv Preprint Arxiv:1606.09002 ( 2016 ). Cong Yao, Xiang Bai, Nong Sang, Xinyu Zhou, Shuchang Zhou, and Zhimin Cao. 2016. Scene text detection via holistic, multi-channel prediction. Retrieved from Arxiv Preprint Arxiv:1606.09002 (2016)."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.182"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2016.2554321"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00187"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.451"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.283"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11704-015-4488-0"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3356728","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3356728","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:22:55Z","timestamp":1750202575000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3356728"}},"subtitle":["Attention-based Bidirectional LSTM Model for Scene Text Detection"],"short-title":[],"issued":{"date-parts":[[2019,11,30]]},"references-count":54,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2019,11,30]]}},"alternative-id":["10.1145\/3356728"],"URL":"https:\/\/doi.org\/10.1145\/3356728","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,11,30]]},"assertion":[{"value":"2018-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-12-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}