{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T07:44:42Z","timestamp":1768031082768,"version":"3.49.0"},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2022,11,30]],"date-time":"2022-11-30T00:00:00Z","timestamp":1669766400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,11,30]],"date-time":"2022-11-30T00:00:00Z","timestamp":1669766400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,6]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Text recognition has been applied in many fields recently, such as robot vision, video retrieval, and scene understanding. However, minimal research has been conducted in the field of logistics wherein images of express sheets captured by cameras are mostly curved, distorted, and have low resolution. In this study, a new method is proposed to address the aforementioned research gap while simultaneously considering irregular and low-resolution English letters. The entire approach comprises a rectification module, a convolutional neural network (CNN) extractor, a semantic context module (SCM), a global context module (GCM), and a lightweight transformer decoder that can exhibit improved training speed. In particular, we propose the idea of context modeling in our proposed method. (1) The proposed SCM is introduced to capture full-image dependencies and generates rich semantic context information. (2) We propose the GCM, which not only enhances long-range dependencies from the output of SCM but also outputs abundant pixel information to the self-attention decoder. 
(3) To solve the low-resolution text recognition problem in a large number of express sheet scenes, we propose Chinese datasets for improving intelligent logistics. Experiments conducted on six public benchmarks demonstrate that the developed method achieves better robustness to low-resolution and irregular text images.<\/jats:p>","DOI":"10.1007\/s40747-022-00916-1","type":"journal-article","created":{"date-parts":[[2022,11,30]],"date-time":"2022-11-30T09:05:41Z","timestamp":1669799141000},"page":"3229-3248","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Scene text recognition via context modeling for low-quality image in logistics industry"],"prefix":"10.1007","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2165-0021","authenticated-orcid":false,"given":"Herui","family":"Heng","sequence":"first","affiliation":[]},{"given":"Peiji","family":"Li","sequence":"additional","affiliation":[]},{"given":"Tuxin","family":"Guan","sequence":"additional","affiliation":[]},{"given":"Tianyu","family":"Yang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,11,30]]},"reference":[{"key":"916_CR1","doi-asserted-by":"publisher","unstructured":"Baek J, Kim G, Lee J et al (2019) What is wrong with scene text recognition model comparisons? Dataset and model analysis. In: Proceedings of IEEE\/CVF international conference on computer vision, pp 4714\u20134722. https:\/\/doi.org\/10.1109\/ICCV.2019.00481","DOI":"10.1109\/ICCV.2019.00481"},{"key":"916_CR2","unstructured":"Jaderberg M, Simonyan K, Zisserman A et al (2015) Spatial transformer networks. In: Proceedings of conference and workshop on neural information processing systems, pp 2017\u20132025"},{"key":"916_CR3","doi-asserted-by":"publisher","unstructured":"He K, Zhang X, Ren S et al (2016) Deep residual learning for image recognition. 
In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 770\u2013778. https:\/\/doi.org\/10.1109\/CVPR.2016.90","DOI":"10.1109\/CVPR.2016.90"},{"issue":"11","key":"916_CR4","doi-asserted-by":"publisher","first-page":"2298","DOI":"10.1109\/TPAMI.2016.2646371","volume":"39","author":"B Shi","year":"2015","unstructured":"Shi B, Bai X, Yao C (2015) An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans Pattern Anal Mach Intell 39(11):2298\u20132304. https:\/\/doi.org\/10.1109\/TPAMI.2016.2646371","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"916_CR5","doi-asserted-by":"publisher","unstructured":"Sundermeyer M, Schl\u00fcter R, Ney H (2012) LSTM neural networks for language modeling. In: Thirteenth annual conference of the international speech communication association, pp 4. https:\/\/doi.org\/10.21437\/Interspeech.2012-65","DOI":"10.21437\/Interspeech.2012-65"},{"key":"916_CR6","doi-asserted-by":"publisher","unstructured":"Cheng Z, Bai F, Xu Y et al (2017) Focusing attention: towards accurate text recognition in natural images. In: Proceedings of IEEE international conference on computer vision, pp 5086\u20135094. https:\/\/doi.org\/10.1109\/ICCV.2017.543","DOI":"10.1109\/ICCV.2017.543"},{"key":"916_CR7","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L et al (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998\u20136008"},{"key":"916_CR8","doi-asserted-by":"publisher","unstructured":"Carion N, Massa F, Synnaeve G et al (2020) End-to-end object detection with transformers. In: Proceedings of European conference on computer vision, pp 213\u2013229. https:\/\/doi.org\/10.1007\/978-3-030-58452-8_13","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"916_CR9","unstructured":"Zhu X, Su W, Lu L et al (2021) Deformable DETR: deformable transformers for end-to-end object detection. 
abs\/2010.04159"},{"key":"916_CR10","doi-asserted-by":"publisher","unstructured":"Vaidwan H, Seth N, Parihar A S, Singh K (2021) A study on transformer-based object detection. In: Proceedings of 2021 international conference on intelligent technologies (CONIT), pp 1\u20136. https:\/\/doi.org\/10.1109\/CONIT51480.2021.9498550","DOI":"10.1109\/CONIT51480.2021.9498550"},{"key":"916_CR11","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2021.107980","volume":"117","author":"N Lu","year":"2021","unstructured":"Lu N, Yu W, Qi X et al (2021) MASTER: multi-aspect non-local network for scene text recognition. Patt Recogn 117:107980. https:\/\/doi.org\/10.1016\/j.patcog.2021.107980","journal-title":"Patt Recogn"},{"key":"916_CR12","doi-asserted-by":"publisher","unstructured":"Lee J, Park S, Baek J, Oh S J, Kim S, Lee H (2020) On recognizing texts of arbitrary shapes with 2D self-attention. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops (CVPRW), pp 2326\u20132335. https:\/\/doi.org\/10.1109\/CVPRW50498.2020.00281","DOI":"10.1109\/CVPRW50498.2020.00281"},{"key":"916_CR13","doi-asserted-by":"publisher","unstructured":"Kim Y G, Kim H, Kang M, Lee H J, Lee R, Park G (2021) Analysis of the novel transformer module combination for scene text recognition. In: Proceedings of 2021 IEEE international conference on image processing (ICIP), pp 1229\u20131233. https:\/\/doi.org\/10.1109\/ICIP42928.2021.9506779","DOI":"10.1109\/ICIP42928.2021.9506779"},{"key":"916_CR14","doi-asserted-by":"publisher","unstructured":"Ren L, Zhou H, Chen J et al (2021) A transformer-based decoupled attention network for text recognition in shopping receipt images. In: Proceedings of neural computing for advanced applications, pp 563\u2013577. 
https:\/\/doi.org\/10.1007\/978-981-16-5188-5_40","DOI":"10.1007\/978-981-16-5188-5_40"},{"key":"916_CR15","doi-asserted-by":"publisher","unstructured":"Zhu Y, Wang S, Huang Z, Chen K (2019) Text recognition in images based on transformer with hierarchical attention. In: Proceedings of 2019 IEEE international conference on image processing (ICIP), pp 1945\u20131949. https:\/\/doi.org\/10.1109\/ICIP.2019.8803203","DOI":"10.1109\/ICIP.2019.8803203"},{"key":"916_CR16","doi-asserted-by":"publisher","unstructured":"Huang Z, Wang X, Huang L, Huang C, Wei Y, Liu W (2019) CCNet: criss-cross attention for semantic segmentation. In: Proceedings of 2019 IEEE\/CVF international conference on computer vision, pp 603\u2013612. https:\/\/doi.org\/10.1109\/ICCV.2019.00069","DOI":"10.1109\/ICCV.2019.00069"},{"key":"916_CR17","doi-asserted-by":"publisher","unstructured":"Shi B, Wang X, Lyu P, Yao C, Bai X (2016) Robust scene text recognition with automatic rectification. In: Proceedings of conference on computer vision and pattern recognition, pp 4168\u20134176. https:\/\/doi.org\/10.1109\/CVPR.2016.452","DOI":"10.1109\/CVPR.2016.452"},{"issue":"9","key":"916_CR18","doi-asserted-by":"publisher","first-page":"2035","DOI":"10.1109\/TPAMI.2018.2848939","volume":"41","author":"B Shi","year":"2019","unstructured":"Shi B, Yang M, Wang X et al (2019) ASTER: an attentional scene text recognizer with flexible rectification. IEEE Trans Pattern Anal Mach Intell 41(9):2035\u20132048. https:\/\/doi.org\/10.1109\/TPAMI.2018.2848939","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"916_CR19","doi-asserted-by":"publisher","unstructured":"Zhan F, Lu S (2019) ESIR: end-to-end scene text recognition via iterative image rectification. In: CVPR, pp 2054\u20132063. 
https:\/\/doi.org\/10.1109\/CVPR.2019.00216","DOI":"10.1109\/CVPR.2019.00216"},{"key":"916_CR20","doi-asserted-by":"publisher","unstructured":"Zhang Y, Nie S, Liu W et al (2019) Sequence-to-sequence domain adaptation network for robust text image recognition. In: Proceedings of IEEE\/CVF conference on computer vision and pattern recognition, pp 2735\u20132744. https:\/\/doi.org\/10.1109\/CVPR.2019.00285","DOI":"10.1109\/CVPR.2019.00285"},{"key":"916_CR21","doi-asserted-by":"publisher","unstructured":"Wan Z, Zhang J, Zhang L et al (2020) On vocabulary reliance in scene text recognition. In: Proceedings of IEEE\/CVF conference on computer vision and pattern recognition, pp 11422\u201311431. https:\/\/doi.org\/10.1109\/CVPR42600.2020.01144","DOI":"10.1109\/CVPR42600.2020.01144"},{"key":"916_CR22","doi-asserted-by":"publisher","unstructured":"Yu D, Li X, Zhang C et al (2020) Towards accurate scene text recognition with semantic reasoning networks. In: Proceedings of IEEE\/CVF conference on computer vision and pattern recognition, pp 12110\u201312119. https:\/\/doi.org\/10.1109\/CVPR42600.2020.01213","DOI":"10.1109\/CVPR42600.2020.01213"},{"key":"916_CR23","doi-asserted-by":"publisher","unstructured":"Qiao Z, Zhou Y, Yang D et al (2020) SEED: semantics enhanced encoder-decoder framework for scene text recognition. In: 2020 IEEE\/CVF conference on computer vision and pattern recognition, pp 13525\u201313534. https:\/\/doi.org\/10.1109\/CVPR42600.2020.01354","DOI":"10.1109\/CVPR42600.2020.01354"},{"key":"916_CR24","doi-asserted-by":"publisher","unstructured":"Zhang H, Yao Q, Yang M et al (2020) AutoSTR: efficient backbone search for scene text recognition. In: Proceedings of European conference on computer vision, pp 751\u2013767. 
https:\/\/doi.org\/10.1007\/978-3-030-58586-0_44","DOI":"10.1007\/978-3-030-58586-0_44"},{"key":"916_CR25","doi-asserted-by":"crossref","unstructured":"Liu Z, Li Y, Ren F, Goh W L, Yu H (2018) SqueezedText: a real-time scene text recognition by binary convolutional encoder-decoder network. In: Proceedings of association for the advancement of artificial intelligence, pp 7194\u20137201","DOI":"10.1609\/aaai.v32i1.12252"},{"key":"916_CR26","unstructured":"Li B, Tang X, Qi X et al (2020) Hamming OCR: a locality sensitive hashing neural network for scene text recognition. arXiv:2009.10874"},{"issue":"18","key":"916_CR27","doi-asserted-by":"publisher","first-page":"8027","DOI":"10.1016\/j.eswa.2014.07.008","volume":"41","author":"A Risnumawan","year":"2014","unstructured":"Risnumawan A, Shivakumara P, Chan CS et al (2014) A robust arbitrary text detection system for natural scene images. Expert Syst Appl 41(18):8027\u20138048. https:\/\/doi.org\/10.1016\/j.eswa.2014.07.008","journal-title":"Expert Syst Appl"},{"key":"916_CR28","doi-asserted-by":"publisher","unstructured":"Lin T Y, Doll\u00e1r P, Girshick R, He K M, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of conference on computer vision and pattern recognition, pp 2117\u20132125. https:\/\/doi.org\/10.1109\/CVPR.2017.106","DOI":"10.1109\/CVPR.2017.106"},{"key":"916_CR29","doi-asserted-by":"crossref","unstructured":"He KM, Zhang XY, Ren SQ, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"916_CR30","doi-asserted-by":"publisher","unstructured":"Cao Y, Xu J, Lin S, Wei F, Hu H (2019) GCNet: non-local networks meet squeeze-excitation networks and beyond. In: Proceedings of international conference on computer vision workshop, pp 1971\u20131980. 
https:\/\/doi.org\/10.1109\/ICCVW.2019.00246","DOI":"10.1109\/ICCVW.2019.00246"},{"key":"916_CR31","doi-asserted-by":"publisher","unstructured":"Mishra A, Alahari K, Jawahar C V (2012) Scene text recognition using higher order language priors. In: Proceedings of British machine vision conference, pp 1\u201311. https:\/\/doi.org\/10.5244\/C.26.127","DOI":"10.5244\/C.26.127"},{"key":"916_CR32","unstructured":"Wang K, Babenko B, Belongie SJ (2011) End-to-end scene text recognition. In: Proceedings of the ICCV, pp 1457\u20131464"},{"key":"916_CR33","doi-asserted-by":"publisher","unstructured":"Karatzas D, Shafait F, Uchida S, Iwamura M, Bigorda GL, Mestre SR, Mas J, Mota DF, Almazan JA et al (2013) ICDAR 2013 robust reading competition. In: ICDAR, pp 1484\u20131493. https:\/\/doi.org\/10.1109\/ICDAR.2013.221","DOI":"10.1109\/ICDAR.2013.221"},{"key":"916_CR34","doi-asserted-by":"crossref","unstructured":"Lucas SM, Panaretos A, Sosa L, Tang A, Wong S, Young R (2003) ICDAR 2003 robust reading competitions. In: ICDAR, pp 682\u2013687","DOI":"10.1109\/ICDAR.2003.1227749"},{"key":"916_CR35","doi-asserted-by":"publisher","unstructured":"Karatzas D, Gomez-Bigorda L, Nicolaou A, Ghosh S, Bagdanov A, Iwamura M, Matas J, Neumann L, Chandrasekhar V R et al (2015) ICDAR 2015 competition on robust reading. In: ICDAR, pp 1156\u20131160. https:\/\/doi.org\/10.1109\/ICDAR.2015.7333942","DOI":"10.1109\/ICDAR.2015.7333942"},{"key":"916_CR36","doi-asserted-by":"publisher","unstructured":"Phan T Q, Shivakumara P, Tian S et al (2013) Recognizing text with perspective distortion in natural scenes. In: 2013 IEEE international conference on computer vision, pp 569\u2013576. https:\/\/doi.org\/10.1109\/ICCV.2013.76","DOI":"10.1109\/ICCV.2013.76"},{"key":"916_CR37","doi-asserted-by":"publisher","unstructured":"Gupta A, Vedaldi A, Zisserman A (2016) Synthetic data for text localisation in natural images. In: Proceedings of computer vision and pattern recognition, pp 2315\u20132324. 
https:\/\/doi.org\/10.1109\/CVPR.2016.254","DOI":"10.1109\/CVPR.2016.254"},{"key":"916_CR38","unstructured":"Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2014) Synthetic data and artificial neural networks for natural scene text recognition. In: Proceedings of advances in neural information processing systems workshop"},{"key":"916_CR39","unstructured":"Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: Proceedings of international conference on learning representations"},{"key":"916_CR40","doi-asserted-by":"publisher","first-page":"109","DOI":"10.1016\/j.patcog.2019.01.020","volume":"90","author":"C Luo","year":"2019","unstructured":"Luo C, Jin L, Sun Z (2019) MORAN: a multi-object rectified attention network for scene text recognition. Patt Recogn 90:109\u2013118. https:\/\/doi.org\/10.1016\/j.patcog.2019.01.020","journal-title":"Patt Recogn"},{"key":"916_CR41","doi-asserted-by":"publisher","unstructured":"Yang M, Guan Y, Liao M, He X, Bian K, Bai S, Yao C, Bai X (2019) Symmetry constrained rectification network for scene text recognition. In: Proceedings of IEEE international conference on computer vision, pp 9147\u20139156. https:\/\/doi.org\/10.1109\/ICCV.2019.00924","DOI":"10.1109\/ICCV.2019.00924"},{"key":"916_CR42","doi-asserted-by":"publisher","unstructured":"Wang TW, Zhu YZ, Jin LW, Luo CJ, Chen XX, Wu YQ, Wang QY, Cai MX (2020) Decoupled attention network for text recognition. In: Proceedings of the Association for the Advancement of Artificial Intelligence, pp 12216\u201312224. https:\/\/doi.org\/10.1609\/aaai.v34i07.6903","DOI":"10.1609\/aaai.v34i07.6903"},{"key":"916_CR43","doi-asserted-by":"publisher","unstructured":"Wang YZ, Lian ZH (2020) Exploring font-independent features for scene text recognition. In: European conference on computer vision, pp 1900\u20131920. 
https:\/\/doi.org\/10.1145\/3394171.3413592","DOI":"10.1145\/3394171.3413592"},{"key":"916_CR44","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1016\/j.neucom.2020.07.010","volume":"2020","author":"L Yang","year":"2020","unstructured":"Yang L, Wang P, Li H, Li Z, Zhang Y (2020) A holistic representation guided attention network for scene text recognition. Neurocomputing 2020:67\u201375. https:\/\/doi.org\/10.1016\/j.neucom.2020.07.010","journal-title":"Neurocomputing"},{"key":"916_CR45","doi-asserted-by":"publisher","first-page":"278","DOI":"10.1016\/j.neucom.2020.04.129","volume":"425","author":"C Wang","year":"2021","unstructured":"Wang C, Liu CL (2021) Multi-branch guided attention network for irregular text recognition. Neurocomputing 425:278\u2013289. https:\/\/doi.org\/10.1016\/j.neucom.2020.04.129","journal-title":"Neurocomputing"},{"issue":"10","key":"916_CR46","doi-asserted-by":"publisher","first-page":"6698","DOI":"10.1007\/s10489-021-02219-3","volume":"51","author":"X Ma","year":"2021","unstructured":"Ma X, He K, Zhang D et al (2021) PIEED: position information enhanced encoder-decoder framework for scene text recognition. Appl Intell 51(10):6698\u20136707. 
https:\/\/doi.org\/10.1007\/s10489-021-02219-3","journal-title":"Appl Intell"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00916-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-022-00916-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00916-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,9]],"date-time":"2023-06-09T17:09:49Z","timestamp":1686330589000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-022-00916-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,30]]},"references-count":46,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,6]]}},"alternative-id":["916"],"URL":"https:\/\/doi.org\/10.1007\/s40747-022-00916-1","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,11,30]]},"assertion":[{"value":"23 March 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 November 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 November 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflicts of interest to this 
work.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflicts of interest"}}]}}