{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T05:55:30Z","timestamp":1767851730159,"version":"3.49.0"},"reference-count":48,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2024,2,17]],"date-time":"2024-02-17T00:00:00Z","timestamp":1708128000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,2,17]],"date-time":"2024-02-17T00:00:00Z","timestamp":1708128000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Tianjin \"Project+Team\" Key Training Special Project","award":["XB202007"],"award-info":[{"award-number":["XB202007"]}]},{"name":"Science and Technology Support of Tianjin Key Research and the Development Plan Project","award":["18YFZCGX00930"],"award-info":[{"award-number":["18YFZCGX00930"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Process Lett"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The existing text detection algorithms based on Convolutional Neural Networks (CNN) commonly have the problems of insufficient receptive fields and inadequate extraction of spatial positional information, which limit their ability to detect large-scale variation text instances, long-distance and wide-spaced text instances as well as effectively distinguish complex background textures. To address the above problems, in this paper, a scene text detection algorithm combining Swin Transformer and attention-weighted fusion is proposed. Firstly, an attention-weighted fusion (AWF) module is proposed, which embeds a modified coordinate attention module (CAM) in the feature pyramid network (FPN). 
This module learns spatial positional weights of foreground information in different-scale features while suppressing redundant background information. As a result, the fused features are more focused on the text regions, enhancing the localization ability for text regions and boundaries. Secondly, the window-based self-attention mechanism of the Swin Transformer is utilized to achieve global feature perception on the fused features of the pyramid network. This compensates for the insufficient receptive fields of CNN and enhances the representation capability of global contextual features, thereby further improving the performance of text detection. Experimental results demonstrate that the proposed algorithm achieves competitive performance on three public datasets, namely ICDAR2015, MSRA-TD500, and Total-Text, with F-measure reaching 87.9%, 91.4%, and 86.7%, respectively. Code is available at: <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/xgli411\/ST-AWFNet\">https:\/\/github.com\/xgli411\/ST-AWFNet<\/jats:ext-link>.<\/jats:p>","DOI":"10.1007\/s11063-024-11501-7","type":"journal-article","created":{"date-parts":[[2024,2,17]],"date-time":"2024-02-17T04:25:54Z","timestamp":1708143954000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Combining Swin Transformer and Attention-Weighted Fusion for Scene Text Detection"],"prefix":"10.1007","volume":"56","author":[{"given":"Xianguo","family":"Li","sequence":"first","affiliation":[]},{"given":"Xingchen","family":"Yao","sequence":"additional","affiliation":[]},{"given":"Yi","family":"Liu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,2,17]]},"reference":[{"key":"11501_CR1","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1007\/978-3-319-46484-8_4","volume-title":"Computer vision\u2014ECCV 2016","author":"Z 
Tian","year":"2016","unstructured":"Tian Z, Huang W, He T, He P, Qiao Y (2016) Detecting text in natural image with connectionist text proposal network. Computer vision\u2014ECCV 2016. Springer International Publishing, Cham, pp 56\u201372"},{"issue":"6","key":"11501_CR2","doi-asserted-by":"publisher","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","volume":"39","author":"S Ren","year":"2017","unstructured":"Ren S, He K, Girshick R, Sun J (2017) Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137\u20131149","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"11501_CR3","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1007\/978-3-319-46448-0_2","volume-title":"Computer vision\u2014ECCV 2016","author":"W Liu","year":"2016","unstructured":"Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) Ssd: single shot multibox detector. Computer vision\u2014ECCV 2016. Springer International Publishing, Cham, pp 21\u201337"},{"key":"11501_CR4","doi-asserted-by":"crossref","unstructured":"Liao M, Wan Z, Yao C, Chen K, Bai X (2020) Real-time scene text detection with differentiable binarization. In: Proceedings of the AAAI conference on artificial intelligence, pp 11474\u201311481","DOI":"10.1609\/aaai.v34i07.6812"},{"key":"11501_CR5","doi-asserted-by":"crossref","unstructured":"Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132\u20137141","DOI":"10.1109\/CVPR.2018.00745"},{"key":"11501_CR6","doi-asserted-by":"crossref","unstructured":"Woo S, Park J, Lee JY, Kweon IS (2018) Cbam: convolutional block attention module. 
In: 2018 European conference on computer vision (ECCV), pp 3\u201319","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"11501_CR7","doi-asserted-by":"crossref","unstructured":"Wang Q, Wu B, Zhu P, Li P, Zou W, Hu Q (2020) ECA-Net: efficient channel attention for deep convolutional neural networks. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 11534\u201311542","DOI":"10.1109\/CVPR42600.2020.01155"},{"key":"11501_CR8","doi-asserted-by":"crossref","unstructured":"Hou Q, Zhou D, Feng J (2021) Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 13713\u201313722","DOI":"10.1109\/CVPR46437.2021.01350"},{"key":"11501_CR9","doi-asserted-by":"crossref","unstructured":"Huang Z, Wang X, Huang L, Huang C, Wei Y, Liu W (2019) Ccnet: criss-cross attention for semantic segmentation. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 603\u2013612","DOI":"10.1109\/ICCV.2019.00069"},{"key":"11501_CR10","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A, Kaiser \u0141, Polosukhin I (2017) Attention is all you need. Advances in neural information processing systems, pp 5998\u20136008"},{"key":"11501_CR11","unstructured":"Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D et al (2021) An image is worth 16\u00d716 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929"},{"key":"11501_CR12","doi-asserted-by":"crossref","unstructured":"Raisi Z, Younes G, Zelek J (2022) Arbitrary shape text detection using transformers. In: 2022 26th International Conference on Pattern Recognition (ICPR). IEEE, pp 3238\u20133245","DOI":"10.1109\/ICPR56361.2022.9956488"},{"key":"11501_CR13","unstructured":"Zeng YX, Hsieh JW, Li X, Chang MC (2023) MixNet: toward accurate detection of challenging scene text in the wild. 
arXiv preprint arXiv:2308.12817"},{"key":"11501_CR14","doi-asserted-by":"crossref","unstructured":"Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE\/CVF international conference on computer vision (ICCV), pp 10012\u201310022","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"11501_CR15","doi-asserted-by":"crossref","unstructured":"Liao M, Shi B, Bai X, Wang X, Liu W (2017) Textboxes: a fast text detector with a single deep neural network. In: Proceedings of the thirty-first AAAI conference on artificial intelligence, pp 4161\u20134167","DOI":"10.1609\/aaai.v31i1.11196"},{"issue":"8","key":"11501_CR16","doi-asserted-by":"publisher","first-page":"3676","DOI":"10.1109\/TIP.2018.2825107","volume":"27","author":"M Liao","year":"2018","unstructured":"Liao M, Shi B, Bai X (2018) Textboxes++: a single-shot oriented scene text detector. IEEE Trans Image Process 27(8):3676\u20133690","journal-title":"IEEE Trans Image Process"},{"key":"11501_CR17","doi-asserted-by":"crossref","unstructured":"Zhou X, Yao C, Wen H, Wang Y, Zhou S, He W, Liang J (2017) East: an efficient and accurate scene text detector. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5551\u20135560","DOI":"10.1109\/CVPR.2017.283"},{"key":"11501_CR18","doi-asserted-by":"crossref","unstructured":"Shi B, Bai X, Belongie S (2017) Detecting oriented text in natural images by linking segments. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2550\u20132558","DOI":"10.1109\/CVPR.2017.371"},{"key":"11501_CR19","doi-asserted-by":"crossref","unstructured":"Zhang C, Liang B, Huang Z, En M, Han J, Ding E, Ding X (2019) Look more than once: an accurate detector for text of arbitrary shapes. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 10552\u201310561","DOI":"10.1109\/CVPR.2019.01080"},{"key":"11501_CR20","doi-asserted-by":"crossref","unstructured":"Wang Y, Xie H, Zha ZJ, Xing M, Fu Z, Zhang Y (2020) Contournet: taking a further step toward accurate arbitrary-shaped scene text detection. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 11753\u201311762","DOI":"10.1109\/CVPR42600.2020.01177"},{"key":"11501_CR21","doi-asserted-by":"crossref","unstructured":"Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3431\u20133440","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"11501_CR22","doi-asserted-by":"crossref","unstructured":"Deng D, Liu H, Li X, Cai D (2018) Pixellink: detecting scene text via instance segmentation. In: Proceedings of the thirty-second AAAI conference on artificial intelligence. AAAI Press, pp 6773\u20136780","DOI":"10.1609\/aaai.v32i1.12269"},{"key":"11501_CR23","doi-asserted-by":"crossref","unstructured":"Wang W, Xie E, Li X, Hou W, Lu T, Yu G, Shao S (2019) Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 9336\u20139345","DOI":"10.1109\/CVPR.2019.00956"},{"key":"11501_CR24","doi-asserted-by":"crossref","unstructured":"Wang W, Xie E, Song X, Zang Y, Wang W, Lu T, Yu G, Shen C (2019) Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In: Proceedings of the IEEE\/CVF International conference on computer vision(ICCV), pp 8439\u20138448","DOI":"10.1109\/ICCV.2019.00853"},{"key":"11501_CR25","doi-asserted-by":"crossref","unstructured":"Baek Y, Lee B, Han D, Yun S, Lee H (2019) Character region awareness for text detection. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 9357\u20139366","DOI":"10.1109\/CVPR.2019.00959"},{"key":"11501_CR26","first-page":"516","volume":"20","author":"J Ye","year":"2020","unstructured":"Ye J, Chen Z, Liu J, Du B (2020) TextFuseNet: scene text detection with richer fused features. Proc IJCAI 20:516\u2013522","journal-title":"Proc IJCAI"},{"issue":"1","key":"11501_CR27","doi-asserted-by":"publisher","first-page":"919","DOI":"10.1109\/TPAMI.2022.3155612","volume":"45","author":"M Liao","year":"2022","unstructured":"Liao M, Zou Z, Wan Z, Yao C, Bai X (2022) Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Trans Pattern Anal Mach Intell 45(1):919\u2013931","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"11501_CR28","unstructured":"Chen Z, Wang J, Wang W, Chen G, Xie E, Lou P, Lu T (2021) FAST: faster arbitrarily-shaped text detector with minimalist Kernel representation. arXiv preprint arXiv:2111.02394"},{"key":"11501_CR29","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"11501_CR30","doi-asserted-by":"crossref","unstructured":"Dai J, Qi H, Xiong Y, Li Y, Zhang G, Hu H, Wei Y (2017) Deformable convolutional networks. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 764\u2013773","DOI":"10.1109\/ICCV.2017.89"},{"key":"11501_CR31","doi-asserted-by":"crossref","unstructured":"Lin T, Doll\u00e1r P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 936\u2013944","DOI":"10.1109\/CVPR.2017.106"},{"issue":"7","key":"11501_CR32","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1145\/129902.129906","volume":"35","author":"BR Vatti","year":"1992","unstructured":"Vatti BR (1992) A generic solution to polygon clipping. Commun ACM 35(7):56\u201363","journal-title":"Commun ACM"},{"key":"11501_CR33","doi-asserted-by":"crossref","unstructured":"Shrivastava A, Gupta A, Girshick RB (2016) Training region-based object detectors with online hard example mining. In: Proceedings of the IEEE Conference on computer vision and pattern recognition (CVPR), pp 761\u2013769","DOI":"10.1109\/CVPR.2016.89"},{"key":"11501_CR34","doi-asserted-by":"crossref","unstructured":"Nayef N, Yin F, Bizid I, Choi H, Feng Y, Karatzas D, Luo Z, Pal U, Rigaud C, Chazalon J, Khlif W, Luqman MM, Burie J, Liu C, Ogier J (2017) ICDAR2017 robust reading challenge on multi-lingual scene text detection and script identification-rrc-mlt. In: 2017 14th IAPR International Conference on document analysis and recognition (ICDAR), pp 1454\u20131459","DOI":"10.1109\/ICDAR.2017.237"},{"key":"11501_CR35","doi-asserted-by":"crossref","unstructured":"Karatzas D, Gomez-Bigorda L, Nicolaou A, Ghosh S, Bagdanov A, Iwamura M, Matas J, Neumann L, Chandrasekhar VR, Lu S, Shafait F, Uchida S, Valveny E (2015) Icdar 2015 competition on robust reading. In: 2015 13th international conference on document analysis and recognition (ICDAR), pp 1156\u20131160","DOI":"10.1109\/ICDAR.2015.7333942"},{"key":"11501_CR36","unstructured":"Yao C, Bai X, Liu W, Ma Y, Tu Z (2012) Detecting texts of arbitrary orientations in natural images. 
In: 2012 IEEE conference on computer vision and pattern recognition (CVPR), pp 1083\u20131090"},{"issue":"11","key":"11501_CR37","doi-asserted-by":"publisher","first-page":"4737","DOI":"10.1109\/TIP.2014.2353813","volume":"23","author":"C Yao","year":"2014","unstructured":"Yao C, Bai X, Liu W (2014) A unified framework for multi oriented text detection and recognition. Image Process IEEE Trans 23(11):4737\u20134749","journal-title":"Image Process IEEE Trans"},{"key":"11501_CR38","doi-asserted-by":"crossref","unstructured":"Chng CK, Chan CS (2017) Total-text: a comprehensive dataset for scene text detection and recognition. In: 14th IAPR International conference on document analysis and recognition (ICDAR), pp 935\u2013942","DOI":"10.1109\/ICDAR.2017.157"},{"key":"11501_CR39","doi-asserted-by":"crossref","unstructured":"Zhang S, Zhu X, Hou J, Liu C, Yang C, Wang H, Yin X (2020) Deep relational reasoning graph network for arbitrary shape text detection. In: Proceedings of the IEEE\/CVF Conference on computer vision and pattern recognition(CVPR), pp 9696\u20139705","DOI":"10.1109\/CVPR42600.2020.00972"},{"key":"11501_CR40","doi-asserted-by":"crossref","unstructured":"Zhu Y, Chen J, Liang L, Kuang Z, Jin L, Zhang W (2021) Fourier contour embedding for arbitrary-shaped text detection. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 3123\u20133131","DOI":"10.1109\/CVPR46437.2021.00314"},{"key":"11501_CR41","doi-asserted-by":"publisher","first-page":"454","DOI":"10.1109\/TMM.2020.2978630","volume":"23","author":"S Zhang","year":"2020","unstructured":"Zhang S, Liu Y, Jin L, Wei Z, Shen C (2020) OPMP: an omnidirectional pyramid mask proposal network for arbitrary-shape scene text detection. 
IEEE Trans Multimed 23:454\u2013467","journal-title":"IEEE Trans Multimed"},{"key":"11501_CR42","doi-asserted-by":"publisher","first-page":"108608","DOI":"10.1016\/j.patcog.2022.108608","volume":"127","author":"Y Cai","year":"2022","unstructured":"Cai Y, Liu YY, Shen C, Jin L, Li Y, Ergu D (2022) Arbitrarily shaped scene text detection with dynamic convolution. Pattern Recognit 127:108608","journal-title":"Pattern Recognit"},{"key":"11501_CR43","doi-asserted-by":"crossref","unstructured":"Zhang SX, Zhu X, Hou JB, Yang C, Yin XC (2022) Kernel proposal network for arbitrary shape text detection. IEEE Trans Neural Netw Learn Syst 1\u201312","DOI":"10.1109\/ICCV48922.2021.00134"},{"key":"11501_CR44","doi-asserted-by":"crossref","unstructured":"Su Y, Shao Z, Zhou Y, Meng H, Zhu H, Liu B, Yao R (2022) Textdct: arbitrary-shaped text detection via discrete cosine transform mask. IEEE Trans Multimed 1\u201314","DOI":"10.1109\/TMM.2022.3186431"},{"key":"11501_CR45","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TIP.2022.3201467","volume":"32","author":"F Wang","year":"2022","unstructured":"Wang F, Xu X, Chen Y, Li X (2022) Fuzzy semantics for arbitrary-shaped scene text detection. IEEE Trans Image Process 32:1\u201312","journal-title":"IEEE Trans Image Process"},{"issue":"1s","key":"11501_CR46","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3524617","volume":"19","author":"Z Fu","year":"2023","unstructured":"Fu Z, Xie H, Fang S, Wang Y, Xing M, Zhang Y (2023) Learning pixel affinity pyramid for arbitrary-shaped text detection. ACM Trans Multimed Comput 19(1s):1\u201324","journal-title":"ACM Trans Multimed Comput"},{"key":"11501_CR47","doi-asserted-by":"crossref","unstructured":"Yang C, Chen M, Yuan Y, Wang Q (2023). Text growing on leaf. 
IEEE Trans Multimed 1\u201314","DOI":"10.1109\/TNNLS.2023.3289327"},{"key":"11501_CR48","doi-asserted-by":"crossref","unstructured":"Dai P, Zhang S, Zhang H, Cao C (2021) Progressive contour regression for arbitrary-shape scene text detection. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition (CVPR), pp 7393\u20137402","DOI":"10.1109\/CVPR46437.2021.00731"}],"container-title":["Neural Processing Letters"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-024-11501-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11063-024-11501-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-024-11501-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,16]],"date-time":"2024-05-16T20:22:56Z","timestamp":1715890976000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11063-024-11501-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,17]]},"references-count":48,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2024,4]]}},"alternative-id":["11501"],"URL":"https:\/\/doi.org\/10.1007\/s11063-024-11501-7","relation":{},"ISSN":["1573-773X"],"issn-type":[{"value":"1573-773X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,2,17]]},"assertion":[{"value":"25 November 2023","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 February 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"All authors declare that they have no conflict of interest in this research.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"This work does not involve the participation of any human or animal subject.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical Approval"}}],"article-number":"52"}}