{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T18:05:48Z","timestamp":1775325948916,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":48,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T00:00:00Z","timestamp":1665360000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,10]]},"DOI":"10.1145\/3503161.3547787","type":"proceedings-article","created":{"date-parts":[[2022,10,10]],"date-time":"2022-10-10T15:43:01Z","timestamp":1665416581000},"page":"4154-4163","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":21,"title":["You Can even Annotate Text with Voice: Transcription-only-Supervised Text Spotting"],"prefix":"10.1145","author":[{"given":"Jingqun","family":"Tang","sequence":"first","affiliation":[{"name":"Ant Group, Hangzhou, China"}]},{"given":"Su","family":"Qiao","sequence":"additional","affiliation":[{"name":"Zhejiang Gongshang University, Hangzhou, China"}]},{"given":"Benlei","family":"Cui","sequence":"additional","affiliation":[{"name":"Alibaba Group, Hangzhou, China"}]},{"given":"Yuhang","family":"Ma","sequence":"additional","affiliation":[{"name":"University College London, London, United Kingdom"}]},{"given":"Sheng","family":"Zhang","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"given":"Dimitrios","family":"Kanoulas","sequence":"additional","affiliation":[{"name":"University College London, London, United Kingdom"}]}],"member":"320","published-online":{"date-parts":[[2022,10,10]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"International Conference on Machine 
Learning.","author":"Amodei Dario","year":"2016","unstructured":"Dario Amodei , Sundaram Ananthanarayanan , Rishita Anubhai , Jingliang Bai , Eric Battenberg , Carl Case , Jared Casper , Bryan Catanzaro , Qiang Cheng , Guoliang Chen , Jie Chen , Jingdong Chen , Zhijie Chen , Mike Chrzanowski , Adam Coates , Greg Diamos , Ke Ding , Niandong Du , Erich Elsen , Jesse Engel , Weiwei Fang , Linxi Fan , Christopher Fougner , Liang Gao , Caixia Gong , Awni Hannun , Tony X. Han , Lappi Vaino Johannes , Bing Jiang , Cai Ju , Billy Jun , Patrick LeGresley , Libby Lin , Junjie Liu , Yang Liu , Weigao Li , Xiangang Li , Dongpeng Ma , Sharan Narang , Andrew Y. Ng , Sherjil Ozair , Yiping Peng , Ryan Prenger , Sheng Qian , Zongfeng Quan , Jonathan Raiman , Vinay Rao , Sanjeev Satheesh , David Seetapun , Shubho Sengupta , Kavya Srinet , Anuroop Sriram , Haiyuan Tang , Liliang Tang , Chong Wang , Jidong Wang , Kaifu Wang , Yi Wang , Zhijian Wang , Zhiqian Wang , Shuang Wu , Likai Wei , Bo Xiao , Wen Xie , Yan Xie , Dani Yogatama , Bin Yuan , Jun Zhan , and Zhenyao Zhu . 2016 . Deep speech 2: end-to-end speech recognition in English and mandarin . In International Conference on Machine Learning. Dario Amodei, Sundaram Ananthanarayanan, Rishita Anubhai, Jingliang Bai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Qiang Cheng, Guoliang Chen, Jie Chen, Jingdong Chen, Zhijie Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Ke Ding, Niandong Du, Erich Elsen, Jesse Engel, Weiwei Fang, Linxi Fan, Christopher Fougner, Liang Gao, Caixia Gong, Awni Hannun, Tony X. Han, Lappi Vaino Johannes, Bing Jiang, Cai Ju, Billy Jun, Patrick LeGresley, Libby Lin, Junjie Liu, Yang Liu, Weigao Li, Xiangang Li, Dongpeng Ma, Sharan Narang, Andrew Y. 
Ng, Sherjil Ozair, Yiping Peng, Ryan Prenger, Sheng Qian, Zongfeng Quan, Jonathan Raiman, Vinay Rao, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Kavya Srinet, Anuroop Sriram, Haiyuan Tang, Liliang Tang, Chong Wang, Jidong Wang, Kaifu Wang, Yi Wang, Zhijian Wang, Zhiqian Wang, Shuang Wu, Likai Wei, Bo Xiao, Wen Xie, Yan Xie, Dani Yogatama, Bin Yuan, Jun Zhan, and Zhenyao Zhu. 2016. Deep speech 2: end-to-end speech recognition in English and mandarin. In International Conference on Machine Learning."},{"key":"e_1_3_2_2_2_1","volume-title":"Character Region Attention for Text Spotting. In European Conference on Computer Vision.","author":"Baek Young Min","year":"2020","unstructured":"Young Min Baek , Seung Shin , Jeonghun Baek , Sungrae Park , Junyeop Lee , Daehyun Nam , and Hwalsuk Lee . 2020 . Character Region Attention for Text Spotting. In European Conference on Computer Vision. Young Min Baek, Seung Shin, Jeonghun Baek, Sungrae Park, Junyeop Lee, Daehyun Nam, and Hwalsuk Lee. 2020. Character Region Attention for Text Spotting. In European Conference on Computer Vision."},{"key":"e_1_3_2_2_3_1","unstructured":"Alexei Baevski Yuhao Zhou Abdelrahman Mohamed and Michael Auli. 2020. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. In Neural Information Processing Systems.  Alexei Baevski Yuhao Zhou Abdelrahman Mohamed and Michael Auli. 2020. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. In Neural Information Processing Systems."},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.naacl-main.276"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"crossref","unstructured":"Nenglun Chen Xingjia Pan Runnan Chen Lei Yang Zhiwen Lin Ren Yuqiang Haolei Yuan Xiaowei Guo Feiyue Huang and Wenping Wang. 2021. Distributed Attention for Grounded Image Captioning. In ACM Multimedia.  
Nenglun Chen Xingjia Pan Runnan Chen Lei Yang Zhiwen Lin Ren Yuqiang Haolei Yuan Xiaowei Guo Feiyue Huang and Wenping Wang. 2021. Distributed Attention for Grounded Image Captioning. In ACM Multimedia.","DOI":"10.1145\/3474085.3475354"},{"key":"e_1_3_2_2_7_1","first-page":"935","article-title":"Total-text: A comprehensive dataset for scene text detection and recognition","volume":"1","author":"Ch'ng Chee Kheng","year":"2017","unstructured":"Chee Kheng Ch'ng and Chee Seng Chan . 2017 . Total-text: A comprehensive dataset for scene text detection and recognition . In Proc. ICDAR , Vol. 1. 935 -- 942 . Chee Kheng Ch'ng and Chee Seng Chan. 2017. Total-text: A comprehensive dataset for scene text detection and recognition. In Proc. ICDAR, Vol. 1. 935--942.","journal-title":"Proc. ICDAR"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.89"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00269"},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00917"},{"key":"e_1_3_2_2_11_1","volume-title":"Catastrophic forgetting in connectionist networks. Trends in cognitive sciences","author":"French Robert M","year":"1999","unstructured":"Robert M French . 1999. Catastrophic forgetting in connectionist networks. Trends in cognitive sciences , Vol. 3 , 4 ( 1999 ), 128--135. Robert M French. 1999. Catastrophic forgetting in connectionist networks. Trends in cognitive sciences, Vol. 3, 4 (1999), 128--135."},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00527"},{"key":"e_1_3_2_2_14_1","volume-title":"Long Short-term Memory. Neural computation","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber . 1997. Long Short-term Memory. Neural computation , Vol. 9 , 8 ( 1997 ), 1735--1780. Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. 
Long Short-term Memory. Neural computation, Vol. 9, 8 (1997), 1735--1780."},{"key":"e_1_3_2_2_15_1","volume-title":"ICDAR 2015 competition on robust reading. In ICDAR. 1156--1160","author":"Karatzas Dimosthenis","year":"2015","unstructured":"Dimosthenis Karatzas , Lluis Gomez-Bigorda , Anguelos Nicolaou , Suman Ghosh , Andrew Bagdanov , Masakazu Iwamura , Jiri Matas , Lukas Neumann , Vijay Ramaseshan Chandrasekhar , Shijian Lu , 2015 . ICDAR 2015 competition on robust reading. In ICDAR. 1156--1160 . Dimosthenis Karatzas, Lluis Gomez-Bigorda, Anguelos Nicolaou, Suman Ghosh, Andrew Bagdanov, Masakazu Iwamura, Jiri Matas, Lukas Neumann, Vijay Ramaseshan Chandrasekhar, Shijian Lu, et al. 2015. ICDAR 2015 competition on robust reading. In ICDAR. 1156--1160."},{"key":"e_1_3_2_2_16_1","volume-title":"ICDAR 2013 robust reading competition. In Proc. ICDAR. 1484--1493","author":"Karatzas Dimosthenis","year":"2013","unstructured":"Dimosthenis Karatzas , Faisal Shafait , Seiichi Uchida , Masakazu Iwamura , Lluis Gomez i Bigorda , Sergi Robles Mestre , Joan Mas , David Fernandez Mota , Jon Almazan Almazan , and Lluis Pere De Las Heras . 2013 . ICDAR 2013 robust reading competition. In Proc. ICDAR. 1484--1493 . Dimosthenis Karatzas, Faisal Shafait, Seiichi Uchida, Masakazu Iwamura, Lluis Gomez i Bigorda, Sergi Robles Mestre, Joan Mas, David Fernandez Mota, Jon Almazan Almazan, and Lluis Pere De Las Heras. 2013. ICDAR 2013 robust reading competition. In Proc. ICDAR. 1484--1493."},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00456"},{"key":"e_1_3_2_2_18_1","volume-title":"The Hungarian method for the assignment problem. Naval research logistics quarterly","author":"Kuhn Harold W","year":"1955","unstructured":"Harold W Kuhn . 1955. The Hungarian method for the assignment problem. Naval research logistics quarterly , Vol. 2 , 1--2 ( 1955 ), 83--97. Harold W Kuhn. 1955. The Hungarian method for the assignment problem. 
Naval research logistics quarterly, Vol. 2, 1--2 (1955), 83--97."},{"key":"e_1_3_2_2_19_1","volume-title":"Overcoming catastrophic forgetting by incremental moment matching. Advances in neural information processing systems","author":"Lee Sang-Woo","year":"2017","unstructured":"Sang-Woo Lee , Jin-Hwa Kim , Jaehyun Jun , Jung-Woo Ha , and Byoung-Tak Zhang . 2017. Overcoming catastrophic forgetting by incremental moment matching. Advances in neural information processing systems , Vol. 30 ( 2017 ). Sang-Woo Lee, Jin-Hwa Kim, Jaehyun Jun, Jung-Woo Ha, and Byoung-Tak Zhang. 2017. Overcoming catastrophic forgetting by incremental moment matching. Advances in neural information processing systems, Vol. 30 (2017)."},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.560"},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"crossref","unstructured":"Hui Li Peng Wang and Chunhua Shen. 2017b. Towards end-to-end text spotting with convolutional recurrent neural networks. (2017) 5238--5246.  Hui Li Peng Wang and Chunhua Shen. 2017b. Towards end-to-end text spotting with convolutional recurrent neural networks. (2017) 5238--5246.","DOI":"10.1109\/ICCV.2017.560"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58621-8_41"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01176"},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3481534"},{"key":"e_1_3_2_2_25_1","volume-title":"2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Liu X.","unstructured":"X. Liu , L. Ding , Y. Shi , D. Chen , and J. Yan . 2018. FOTS: Fast Oriented Text Spotting with a Unified Network . In 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). X. Liu, L. Ding, Y. Shi, D. Chen, and J. Yan. 2018. FOTS: Fast Oriented Text Spotting with a Unified Network. 
In 2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_3_2_2_26_1","volume-title":"ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Liu Y.","unstructured":"Y. Liu , H. Chen , C. Shen , T. He , and L. Wang . 2020 . ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Y. Liu, H. Chen, C. Shen, T. He, and L. Wang. 2020. ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network. In 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"crossref","unstructured":"Y. Liu C. Shen L. Jin T. He P. Chen C. Liu and H. Chen. 2021b. ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting. (2021).  Y. Liu C. Shen L. Jin T. He P. Chen C. Liu and H. Chen. 2021b. ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting. (2021).","DOI":"10.1109\/TPAMI.2021.3107437"},{"key":"e_1_3_2_2_28_1","unstructured":"Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In ICLR.  Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In ICLR."},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01264-9_5"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2019.00254"},{"key":"e_1_3_2_2_31_1","volume-title":"SPTS: Single-Point Text Spotting. arXiv preprint arXiv:2112.07917","author":"Peng Dezhi","year":"2021","unstructured":"Dezhi Peng , Xinyu Wang , Yuliang Liu , Jiaxin Zhang , Mingxin Huang , Songxuan Lai , Shenggao Zhu , Jing Li , Dahua Lin , Chunhua Shen , 2021 . SPTS: Single-Point Text Spotting. arXiv preprint arXiv:2112.07917 (2021). 
Dezhi Peng, Xinyu Wang, Yuliang Liu, Jiaxin Zhang, Mingxin Huang, Songxuan Lai, Shenggao Zhu, Jing Li, Dahua Lin, Chunhua Shen, et al. 2021. SPTS: Single-Point Text Spotting. arXiv preprint arXiv:2112.07917 (2021)."},{"key":"e_1_3_2_2_32_1","volume-title":"MANGO: A Mask Attention Guided One-Stage Scene Text Spotter. In National Conference on Artificial Intelligence.","author":"Qiao Liang","year":"2021","unstructured":"Liang Qiao , Ying Chen , Zhanzhan Cheng , Yunlu Xu , Yi Niu , Shiliang Pu , and Fei Wu . 2021 . MANGO: A Mask Attention Guided One-Stage Scene Text Spotter. In National Conference on Artificial Intelligence. Liang Qiao, Ying Chen, Zhanzhan Cheng, Yunlu Xu, Yi Niu, Shiliang Pu, and Fei Wu. 2021. MANGO: A Mask Attention Guided One-Stage Scene Text Spotter. In National Conference on Artificial Intelligence."},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6864"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"crossref","unstructured":"S. Qin A. Bissacco M. Raptis Y. Fujii and Y. Xiao. 2019. Towards Unconstrained End-to-End Text Spotting. IEEE (2019).  S. Qin A. Bissacco M. Raptis Y. Fujii and Y. Xiao. 2019. Towards Unconstrained End-to-End Text Spotting. IEEE (2019).","DOI":"10.1109\/ICCV.2019.00480"},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.368"},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475196"},{"key":"e_1_3_2_2_37_1","volume-title":"Asian Conference on Computer Vision. Springer, 83--99","author":"Sun Yipeng","year":"2018","unstructured":"Yipeng Sun , Chengquan Zhang , Zuming Huang , Jiaming Liu , Junyu Han , and Errui Ding . 2018 . Textnet: Irregular text reading from images with an end-to-end trainable network . In Asian Conference on Computer Vision. Springer, 83--99 . Yipeng Sun, Chengquan Zhang, Zuming Huang, Jiaming Liu, Junyu Han, and Errui Ding. 2018. Textnet: Irregular text reading from images with an end-to-end trainable network. 
In Asian Conference on Computer Vision. Springer, 83--99."},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00452"},{"key":"e_1_3_2_2_39_1","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani , Noam Shazeer , Niki Parmar , Jakob Uszkoreit , Llion Jones , Aidan N Gomez , \u0141ukasz Kaiser, and Illia Polosukhin . 2017 . Attention is All you Need. In Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30 . Curran Associates, Inc . https:\/\/proceedings.neurips.cc\/paper\/2017\/file\/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30. Curran Associates, Inc. https:\/\/proceedings.neurips.cc\/paper\/2017\/file\/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf"},{"key":"e_1_3_2_2_40_1","volume-title":"COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images. arXiv: Computer Vision and Pattern Recognition","author":"Veit Andreas","year":"2016","unstructured":"Andreas Veit , Tomas Matera , Lukas Neumann , Jiri Matas , and Serge Belongie . 2016. COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images. arXiv: Computer Vision and Pattern Recognition ( 2016 ). Andreas Veit, Tomas Matera, Lukas Neumann, Jiri Matas, and Serge Belongie. 2016. COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images. 
arXiv: Computer Vision and Pattern Recognition (2016)."},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6896"},{"key":"e_1_3_2_2_42_1","volume-title":"PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network. AAAI. AAAI","author":"Wang Pengfei","year":"2021","unstructured":"Pengfei Wang , Chengquan Zhang , Fei Qi , Shanshan Liu , Xiaoqiang Zhang , Pengyuan Lyu , Junyu Han , Jingtuo Liu , Errui Ding , and Guangming Shi . 2021b. PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network. AAAI. AAAI ( 2021 ), 2782--2790. Pengfei Wang, Chengquan Zhang, Fei Qi, Shanshan Liu, Xiaoqiang Zhang, Pengyuan Lyu, Junyu Han, Jingtuo Liu, Errui Ding, and Guangming Shi. 2021b. PGNet: Real-time Arbitrarily-Shaped Text Spotting with Point Gathering Network. AAAI. AAAI (2021), 2782--2790."},{"key":"e_1_3_2_2_43_1","volume-title":"PAN: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text","author":"Wang Wenhai","year":"2021","unstructured":"Wenhai Wang , Enze Xie , Xiang Li , Xuebo Liu , Ding Liang , Yang Zhibo , Tong Lu , and Chunhua Shen . 2021 a. PAN: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text . IEEE Transactions on Pattern Analysis and Machine Intelligence ( 2021), 1--1. https:\/\/doi.org\/10.1109\/TPAMI.2021.3077555 Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Yang Zhibo, Tong Lu, and Chunhua Shen. 2021a. PAN: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021), 1--1. https:\/\/doi.org\/10.1109\/TPAMI.2021.3077555"},{"key":"e_1_3_2_2_44_1","volume-title":"OpenWorld video text dataset and end-to-end video text spotter with transformer. arXiv preprint arXiv:2112.04888","author":"Wu Weijia","year":"2021","unstructured":"Weijia Wu , Yuanqiang Cai , Debing Zhang , Sibo Wang , Zhuang Li , Jiahong Li , Yejun Tang , and Hong Zhou . 2021. 
A bilingual , OpenWorld video text dataset and end-to-end video text spotter with transformer. arXiv preprint arXiv:2112.04888 ( 2021 ). Weijia Wu, Yuanqiang Cai, Debing Zhang, Sibo Wang, Zhuang Li, Jiahong Li, Yejun Tang, and Hong Zhou. 2021. A bilingual, OpenWorld video text dataset and end-to-end video text spotter with transformer. arXiv preprint arXiv:2112.04888 (2021)."},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00922"},{"key":"e_1_3_2_2_46_1","volume-title":"Language Matters: A Weakly Supervised Pre-training Approach for Scene Text Detection and Spotting. arXiv preprint arXiv:2203.03911","author":"Xue Chuhui","year":"2022","unstructured":"Chuhui Xue , Yu Hao , Shijian Lu , Philip Torr , and Song Bai . 2022 . Language Matters: A Weakly Supervised Pre-training Approach for Scene Text Detection and Spotting. arXiv preprint arXiv:2203.03911 (2022). Chuhui Xue, Yu Hao, Shijian Lu, Philip Torr, and Song Bai. 2022. Language Matters: A Weakly Supervised Pre-training Approach for Scene Text Detection and Spotting. arXiv preprint arXiv:2203.03911 (2022)."},{"key":"e_1_3_2_2_47_1","volume-title":"Detecting curve text in the wild: New dataset and new solution. arXiv preprint arXiv:1712.02170","author":"Yuliang Liu","year":"2017","unstructured":"Liu Yuliang , Jin Lianwen , Zhang Shuaitao , and Zhang Sheng . 2017. Detecting curve text in the wild: New dataset and new solution. arXiv preprint arXiv:1712.02170 ( 2017 ). Liu Yuliang, Jin Lianwen, Zhang Shuaitao, and Zhang Sheng. 2017. Detecting curve text in the wild: New dataset and new solution. arXiv preprint arXiv:1712.02170 (2017)."},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"crossref","unstructured":"Yiqin Zhu Jianyong Chen Lingyu Liang Zhanghui Kuang Lianwen Jin and Wayne Zhang. 2021. Fourier Contour Embedding for Arbitrary-Shaped Text Detection. In Computer Vision and Pattern Recognition.  Yiqin Zhu Jianyong Chen Lingyu Liang Zhanghui Kuang Lianwen Jin and Wayne Zhang. 2021. 
Fourier Contour Embedding for Arbitrary-Shaped Text Detection. In Computer Vision and Pattern Recognition.","DOI":"10.1109\/CVPR46437.2021.00314"}],"event":{"name":"MM '22: The 30th ACM International Conference on Multimedia","location":"Lisboa Portugal","acronym":"MM '22","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 30th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3547787","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3503161.3547787","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:41Z","timestamp":1750188641000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3503161.3547787"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,10]]},"references-count":48,"alternative-id":["10.1145\/3503161.3547787","10.1145\/3503161"],"URL":"https:\/\/doi.org\/10.1145\/3503161.3547787","relation":{},"subject":[],"published":{"date-parts":[[2022,10,10]]},"assertion":[{"value":"2022-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}