{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T17:54:22Z","timestamp":1775325262085,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":46,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T00:00:00Z","timestamp":1665360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,10]]},"DOI":"10.1145\/3503161.3548266","type":"proceedings-article","created":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T15:42:46Z","timestamp":1665416566000},"page":"1319-1328","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["Decoupling Recognition from Detection: Single Shot Self-Reliant Scene Text Spotter"],"prefix":"10.1145","author":[{"given":"Jingjing","family":"Wu","sequence":"first","affiliation":[{"name":"Harbin Institute of Technology, Shenzhen, Shenzhen, China"}]},{"given":"Pengyuan","family":"Lyu","sequence":"additional","affiliation":[{"name":"Baidu Inc., Shenzhen, China"}]},{"given":"Guangming","family":"Lu","sequence":"additional","affiliation":[{"name":"Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies & Harbin Institute of Technology, Shenzhen, Shenzhen, China"}]},{"given":"Chengquan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Baidu Inc., Shenzhen, China"}]},{"given":"Kun","family":"Yao","sequence":"additional","affiliation":[{"name":"Baidu Inc., Shenzhen, China"}]},{"given":"Wenjie","family":"Pei","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, Shenzhen, Shenzhen, 
China"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Character region attention for text spotting","author":"Baek Youngmin","unstructured":"Youngmin Baek , Seung Shin , Jeonghun Baek , Sungrae Park , Junyeop Lee , Daehyun Nam , and Hwalsuk Lee . 2020. Character region attention for text spotting . In ECCV. Springer , 504--521. Youngmin Baek, Seung Shin, Jeonghun Baek, Sungrae Park, Junyeop Lee, Daehyun Nam, and Hwalsuk Lee. 2020. Character region attention for text spotting. In ECCV. Springer, 504--521."},{"key":"e_1_3_2_2_2_1","volume-title":"Yolact: Real-time instance segmentation. In CVPR. 9157--9166.","author":"Bolya Daniel","year":"2019","unstructured":"Daniel Bolya , Chong Zhou , Fanyi Xiao , and Yong Jae Lee . 2019 . Yolact: Real-time instance segmentation. In CVPR. 9157--9166. Daniel Bolya, Chong Zhou, Fanyi Xiao, and Yong Jae Lee. 2019. Yolact: Real-time instance segmentation. In CVPR. 9157--9166."},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.24792"},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"crossref","unstructured":"Michal Busta Lukas Neumann and Jiri Matas. 2017. Deep textspotter: An end-to-end trainable scene text localization and recognition framework. In ICCV. 2204--2212.  Michal Busta Lukas Neumann and Jiri Matas. 2017. Deep textspotter: An end-to-end trainable scene text localization and recognition framework. In ICCV. 2204--2212.","DOI":"10.1109\/ICCV.2017.242"},{"key":"e_1_3_2_2_5_1","volume-title":"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs","author":"Chen Liang-Chieh","year":"2017","unstructured":"Liang-Chieh Chen , George Papandreou , Iasonas Kokkinos , Kevin Murphy , and Alan L Yuille . 2017 . Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs . IEEE transactions on pattern analysis and machine intelligence, Vol. 
40 , 4 (2017), 834--848. Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. 2017. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence, Vol. 40, 4 (2017), 834--848."},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2017.157"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2019.00252"},{"key":"e_1_3_2_2_8_1","unstructured":"Shancheng Fang Hongtao Xie Yuxin Wang Zhendong Mao and Yongdong Zhang. 2021. Read like humans: Autonomous bidirectional and iterative language modeling for scene text recognition. In CVPR. 7098--7107.  Shancheng Fang Hongtao Xie Yuxin Wang Zhendong Mao and Yongdong Zhang. 2021. Read like humans: Autonomous bidirectional and iterative language modeling for scene text recognition. In CVPR. 7098--7107."},{"key":"e_1_3_2_2_9_1","volume-title":"Textdragon: An end-to-end framework for arbitrary shaped text spotting. In ICCV. 9076--9085.","author":"Feng Wei","year":"2019","unstructured":"Wei Feng , Wenhao He , Fei Yin , Xu-Yao Zhang , and Cheng-Lin Liu . 2019 . Textdragon: An end-to-end framework for arbitrary shaped text spotting. In ICCV. 9076--9085. Wei Feng, Wenhao He, Fei Yin, Xu-Yao Zhang, and Cheng-Lin Liu. 2019. Textdragon: An end-to-end framework for arbitrary shaped text spotting. In ICCV. 9076--9085."},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"crossref","unstructured":"Alex Graves Santiago Fern\u00e1ndez Faustino Gomez and J\u00fcrgen Schmidhuber. 2006. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In ICML. 369--376.  Alex Graves Santiago Fern\u00e1ndez Faustino Gomez and J\u00fcrgen Schmidhuber. 2006. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In ICML. 
369--376.","DOI":"10.1145\/1143844.1143891"},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"crossref","unstructured":"Ankush Gupta Andrea Vedaldi and Andrew Zisserman. 2016. Synthetic data for text localisation in natural images. In CVPR. 2315--2324.  Ankush Gupta Andrea Vedaldi and Andrew Zisserman. 2016. Synthetic data for text localisation in natural images. In CVPR. 2315--2324.","DOI":"10.1109\/CVPR.2016.254"},{"key":"e_1_3_2_2_12_1","unstructured":"Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. Deep residual learning for image recognition. In CVPR. 770--778.  Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. Deep residual learning for image recognition. In CVPR. 770--778."},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0823-z"},{"key":"e_1_3_2_2_14_1","volume-title":"ICDAR 2015 competition on robust reading. In 2015 13th international conference on document analysis and recognition (ICDAR). IEEE, 1156--1160","author":"Karatzas Dimosthenis","year":"2015","unstructured":"Dimosthenis Karatzas , Lluis Gomez-Bigorda , Anguelos Nicolaou , Suman Ghosh , Andrew Bagdanov , Masakazu Iwamura , Jiri Matas , Lukas Neumann , Vijay Ramaseshan Chandrasekhar , Shijian Lu , 2015 . ICDAR 2015 competition on robust reading. In 2015 13th international conference on document analysis and recognition (ICDAR). IEEE, 1156--1160 . Dimosthenis Karatzas, Lluis Gomez-Bigorda, Anguelos Nicolaou, Suman Ghosh, Andrew Bagdanov, Masakazu Iwamura, Jiri Matas, Lukas Neumann, Vijay Ramaseshan Chandrasekhar, Shijian Lu, et al. 2015. ICDAR 2015 competition on robust reading. In 2015 13th international conference on document analysis and recognition (ICDAR). IEEE, 1156--1160."},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"crossref","unstructured":"Hui Li Peng Wang and Chunhua Shen. 2017. Towards end-to-end text spotting with convolutional recurrent neural networks. In ICCV. 5238--5246.  Hui Li Peng Wang and Chunhua Shen. 2017. 
Towards end-to-end text spotting with convolutional recurrent neural networks. In ICCV. 5238--5246.","DOI":"10.1109\/ICCV.2017.560"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2937086"},{"key":"e_1_3_2_2_17_1","volume-title":"Mask textspotter v3: Segmentation proposal network for robust scene text spotting","author":"Liao Minghui","unstructured":"Minghui Liao , Guan Pang , Jing Huang , Tal Hassner , and Xiang Bai . 2020. Mask textspotter v3: Segmentation proposal network for robust scene text spotting . In ECCV. Springer , 706--722. Minghui Liao, Guan Pang, Jing Huang, Tal Hassner, and Xiang Bai. 2020. Mask textspotter v3: Segmentation proposal network for robust scene text spotting. In ECCV. Springer, 706--722."},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2018.2825107"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v31i1.11196"},{"key":"e_1_3_2_2_20_1","volume-title":"Ssd: Single shot multibox detector","author":"Liu Wei","year":"2016","unstructured":"Wei Liu , Dragomir Anguelov , Dumitru Erhan , Christian Szegedy , Scott Reed , Cheng-Yang Fu , and Alexander C Berg . 2016 . Ssd: Single shot multibox detector . In ECCV. Springer , 21--37. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. 2016. Ssd: Single shot multibox detector. In ECCV. Springer, 21--37."},{"key":"e_1_3_2_2_21_1","volume-title":"Fots: Fast oriented text spotting with a unified network. In CVPR. 5676--5685.","author":"Liu Xuebo","year":"2018","unstructured":"Xuebo Liu , Ding Liang , Shi Yan , Dagui Chen , Yu Qiao , and Junjie Yan . 2018 . Fots: Fast oriented text spotting with a unified network. In CVPR. 5676--5685. Xuebo Liu, Ding Liang, Shi Yan, Dagui Chen, Yu Qiao, and Junjie Yan. 2018. Fots: Fast oriented text spotting with a unified network. In CVPR. 
5676--5685."},{"key":"e_1_3_2_2_22_1","volume-title":"Abcnet: Real-time scene text spotting with adaptive bezier-curve network. In CVPR. 9809--9818.","author":"Liu Yuliang","year":"2020","unstructured":"Yuliang Liu , Hao Chen , Chunhua Shen , Tong He , Lianwen Jin , and Liangwei Wang . 2020 . Abcnet: Real-time scene text spotting with adaptive bezier-curve network. In CVPR. 9809--9818. Yuliang Liu, Hao Chen, Chunhua Shen, Tong He, Lianwen Jin, and Liangwei Wang. 2020. Abcnet: Real-time scene text spotting with adaptive bezier-curve network. In CVPR. 9809--9818."},{"key":"e_1_3_2_2_23_1","volume-title":"ABCNet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. arXiv preprint arXiv:2105.03620","author":"Liu Yuliang","year":"2021","unstructured":"Yuliang Liu , Chunhua Shen , Lianwen Jin , Tong He , Peng Chen , Chongyu Liu , and Hao Chen . 2021. ABCNet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. arXiv preprint arXiv:2105.03620 ( 2021 ). Yuliang Liu, Chunhua Shen, Lianwen Jin, Tong He, Peng Chen, Chongyu Liu, and Hao Chen. 2021. ABCNet v2: Adaptive bezier-curve network for real-time end-to-end text spotting. arXiv preprint arXiv:2105.03620 (2021)."},{"key":"e_1_3_2_2_24_1","unstructured":"Pengyuan Lyu Minghui Liao Cong Yao Wenhao Wu and Xiang Bai. 2018a. Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In ECCV. 67--83.  Pengyuan Lyu Minghui Liao Cong Yao Wenhao Wu and Xiang Bai. 2018a. Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. In ECCV. 67--83."},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00788"},{"key":"e_1_3_2_2_26_1","volume-title":"V-net: Fully convolutional neural networks for volumetric medical image segmentation. 
In 2016 fourth international conference on 3D vision (3DV)","author":"Milletari Fausto","year":"2016","unstructured":"Fausto Milletari , Nassir Navab , and Seyed-Ahmad Ahmadi . 2016 . V-net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 fourth international conference on 3D vision (3DV) . IEEE , 565--571. Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. 2016. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 fourth international conference on 3D vision (3DV). IEEE, 565--571."},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2017.237"},{"key":"e_1_3_2_2_28_1","volume-title":"Mango: a mask attention guided one-stage scene text spotter. arXiv preprint arXiv:2012.04350","author":"Qiao Liang","year":"2020","unstructured":"Liang Qiao , Ying Chen , Zhanzhan Cheng , Yunlu Xu , Yi Niu , Shiliang Pu , and Fei Wu. 2020a. Mango: a mask attention guided one-stage scene text spotter. arXiv preprint arXiv:2012.04350 ( 2020 ). Liang Qiao, Ying Chen, Zhanzhan Cheng, Yunlu Xu, Yi Niu, Shiliang Pu, and Fei Wu. 2020a. Mango: a mask attention guided one-stage scene text spotter. arXiv preprint arXiv:2012.04350 (2020)."},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6864"},{"key":"e_1_3_2_2_30_1","unstructured":"Siyang Qin Alessandro Bissacco Michalis Raptis Yasuhisa Fujii and Ying Xiao. 2019. Towards unconstrained end-to-end text spotting. In ICCV. 4704--4714.  Siyang Qin Alessandro Bissacco Michalis Raptis Yasuhisa Fujii and Ying Xiao. 2019. Towards unconstrained end-to-end text spotting. In ICCV. 4704--4714."},{"key":"e_1_3_2_2_31_1","unstructured":"Shaoqing Ren Kaiming He Ross Girshick and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. In NeurIPS. 91--99.  Shaoqing Ren Kaiming He Ross Girshick and Jian Sun. 2015. 
Faster r-cnn: Towards real-time object detection with region proposal networks. In NeurIPS. 91--99."},{"key":"e_1_3_2_2_32_1","volume-title":"An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition","author":"Shi Baoguang","year":"2016","unstructured":"Baoguang Shi , Xiang Bai , and Cong Yao . 2016. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition . IEEE transactions on pattern analysis and machine intelligence, Vol. 39 , 11 ( 2016 ), 2298--2304. Baoguang Shi, Xiang Bai, and Cong Yao. 2016. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE transactions on pattern analysis and machine intelligence, Vol. 39, 11 (2016), 2298--2304."},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2848939"},{"key":"e_1_3_2_2_34_1","volume-title":"Asian Conference on Computer Vision. Springer, 83--99","author":"Sun Yipeng","year":"2018","unstructured":"Yipeng Sun , Chengquan Zhang , Zuming Huang , Jiaming Liu , Junyu Han , and Errui Ding . 2018 . Textnet: Irregular text reading from images with an end-to-end trainable network . In Asian Conference on Computer Vision. Springer, 83--99 . Yipeng Sun, Chengquan Zhang, Zuming Huang, Jiaming Liu, Junyu Han, and Errui Ding. 2018. Textnet: Irregular text reading from images with an end-to-end trainable network. In Asian Conference on Computer Vision. Springer, 83--99."},{"key":"e_1_3_2_2_35_1","volume-title":"Efficientdet: Scalable and efficient object detection. In CVPR. 10781--10790.","author":"Tan Mingxing","year":"2020","unstructured":"Mingxing Tan , Ruoming Pang , and Quoc V Le . 2020 . Efficientdet: Scalable and efficient object detection. In CVPR. 10781--10790. Mingxing Tan, Ruoming Pang, and Quoc V Le. 2020. Efficientdet: Scalable and efficient object detection. In CVPR. 
10781--10790."},{"key":"e_1_3_2_2_36_1","volume-title":"Coco-text: Dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140","author":"Veit Andreas","year":"2016","unstructured":"Andreas Veit , Tomas Matera , Lukas Neumann , Jiri Matas , and Serge Belongie . 2016 . Coco-text: Dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140 (2016). Andreas Veit, Tomas Matera, Lukas Neumann, Jiri Matas, and Serge Belongie. 2016. Coco-text: Dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140 (2016)."},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6896"},{"key":"e_1_3_2_2_38_1","volume-title":"2011 International conference on computer vision. IEEE, 1457--1464","author":"Wang Kai","year":"2011","unstructured":"Kai Wang , Boris Babenko , and Serge Belongie . 2011 . End-to-end scene text recognition . In 2011 International conference on computer vision. IEEE, 1457--1464 . Kai Wang, Boris Babenko, and Serge Belongie. 2011. End-to-end scene text recognition. In 2011 International conference on computer vision. IEEE, 1457--1464."},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3095916"},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i4.16383"},{"key":"e_1_3_2_2_41_1","volume-title":"PAN: towards efficient and accurate End-to-End spotting of arbitrarily-shaped text","author":"Wang Wenhai","year":"2021","unstructured":"Wenhai Wang , Enze Xie , Xiang Li , Xuebo Liu , Ding Liang , Yang Zhibo , Tong Lu , and Chunhua Shen . 2021b. PAN: towards efficient and accurate End-to-End spotting of arbitrarily-shaped text . IEEE Transactions on Pattern Analysis and Machine Intelligence ( 2021 ). Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Yang Zhibo, Tong Lu, and Chunhua Shen. 2021b. 
PAN: towards efficient and accurate End-to-End spotting of arbitrarily-shaped text. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021)."},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.2995290"},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"crossref","unstructured":"Linjie Xing Zhi Tian Weilin Huang and Matthew R Scott. 2019. Convolutional character networks. In CVPR. 9126--9136.  Linjie Xing Zhi Tian Weilin Huang and Matthew R Scott. 2019. Convolutional character networks. In CVPR. 9126--9136.","DOI":"10.1109\/ICCV.2019.00922"},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2019.2900589"},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2967274"},{"key":"e_1_3_2_2_46_1","doi-asserted-by":"crossref","unstructured":"Xinyu Zhou Cong Yao He Wen Yuzhi Wang Shuchang Zhou Weiran He and Jiajun Liang. 2017. East: an efficient and accurate scene text detector. In CVPR. 5551--5560.  Xinyu Zhou Cong Yao He Wen Yuzhi Wang Shuchang Zhou Weiran He and Jiajun Liang. 2017. East: an efficient and accurate scene text detector. In CVPR. 
5551--5560.","DOI":"10.1109\/CVPR.2017.283"}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","location":"Lisboa Portugal","acronym":"MM '22","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 30th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3548266","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3503161.3548266","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:00:42Z","timestamp":1750186842000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3548266"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":46,"alternative-id":["10.1145\/3503161.3548266","10.1145\/3503161"],"URL":"https:\/\/doi.org\/10.1145\/3503161.3548266","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}