{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T06:53:43Z","timestamp":1777704823970,"version":"3.51.4"},"reference-count":12,"publisher":"SAGE Publications","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IFS"],"published-print":{"date-parts":[[2021,1,4]]},"abstract":"<jats:p>This paper focuses on script identification in natural scene images. Traditional CNNs (Convolutional Neural Networks) cannot fully solve this problem for two reasons. First, scene images have arbitrary aspect ratios, which poses difficulty for traditional CNNs that take a fixed-size image as input. Second, some scripts with minor differences are easily confused because they share a subset of characters with the same shapes. We propose a novel approach combining Score CNN, Attention CNN and patches. Attention CNN is used to determine whether a patch is discriminative and to calculate the contribution weight of that patch to the script identification of the whole image. Score CNN takes a discriminative patch as input and predicts the score of each script type. First, patches of the same size are extracted from the scene images. Second, these patches are used as inputs to Score CNN and Attention CNN to train two patch-level classifiers. Finally, the results of multiple discriminative patches extracted from the same image via the above two classifiers are fused to obtain the script type of this image. Using patches of the same size as inputs to the CNNs avoids the problems caused by arbitrary aspect ratios of scene images. The trained classifiers can mine discriminative patches to accurately identify some confusing scripts. 
The experimental results show the good performance of our approach on four public datasets.<\/jats:p>","DOI":"10.3233\/jifs-200260","type":"journal-article","created":{"date-parts":[[2020,11,6]],"date-time":"2020-11-06T11:46:59Z","timestamp":1604663219000},"page":"551-563","source":"Crossref","is-referenced-by-count":6,"title":["Mining discriminative patches for script identification in natural scene images"],"prefix":"10.1177","volume":"40","author":[{"given":"Liqiong","family":"Lu","sequence":"first","affiliation":[{"name":"Department of Information Engineering, Lingnan Normal University, Zhanjiang, P.R. China"},{"name":"School of Printing and Packaging, Wuhan University, Wuhan, P.R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dong","family":"Wu","sequence":"additional","affiliation":[{"name":"Department of Information Engineering, Lingnan Normal University, Zhanjiang, P.R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ziwei","family":"Tang","sequence":"additional","affiliation":[{"name":"School of Printing and Packaging, Wuhan University, Wuhan, P.R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yaohua","family":"Yi","sequence":"additional","affiliation":[{"name":"School of Printing and Packaging, Wuhan University, Wuhan, P.R. China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Faliang","family":"Huang","sequence":"additional","affiliation":[{"name":"School of Computer and Information Engineering, Nanning Normal University, Nanning, P.R. 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","reference":[{"issue":"12","key":"10.3233\/JIFS-200260_ref1","doi-asserted-by":"crossref","first-page":"2142","DOI":"10.1109\/TPAMI.2010.30","article-title":"Script Recognition \u2013 A Review","volume":"32","author":"Ghosh","year":"2012","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"10.3233\/JIFS-200260_ref6","doi-asserted-by":"crossref","first-page":"66322","DOI":"10.1109\/ACCESS.2018.2878899","article-title":"Integrating Scene Text and Visual Appearance for Fine-Grained Image Classification","volume":"6","author":"Bai","year":"2017","journal-title":"IEEE Access"},{"key":"10.3233\/JIFS-200260_ref14","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1016\/j.patcog.2017.01.032","article-title":"Improving patch-based scene text script identification with ensembles of conjoined networks","volume":"67","author":"Gomez","year":"2017","journal-title":"Pattern Recognition"},{"key":"10.3233\/JIFS-200260_ref15","doi-asserted-by":"crossref","first-page":"448","DOI":"10.1016\/j.patcog.2015.11.005","article-title":"Script identification in the wild via discriminative convolutional neural network","volume":"52","author":"Shi","year":"2016","journal-title":"Pattern Recognition"},{"key":"10.3233\/JIFS-200260_ref17","doi-asserted-by":"crossref","first-page":"172","DOI":"10.1016\/j.patcog.2018.07.034","article-title":"Script Identification in Natural Scene Image and Video Frame using Attention based Convolutional-LSTM Network","volume":"85","author":"Kumar","year":"2019","journal-title":"Pattern Recognition"},{"issue":"8","key":"10.3233\/JIFS-200260_ref18","doi-asserted-by":"crossref","first-page":"1881","DOI":"10.1109\/TMM.2017.2692650","article-title":"Overlapping Community Detection for Multimedia Social Networks","volume":"19","author":"Huang","year":"2017","journal-title":"IEEE Transactions on 
Multimedia"},{"key":"10.3233\/JIFS-200260_ref21","first-page":"6546","article-title":"Script Identification of Multi-Script Documents: a Survey","volume":"5","author":"Ubul","year":"2017","journal-title":"IEEE Access"},{"key":"10.3233\/JIFS-200260_ref27","doi-asserted-by":"crossref","unstructured":"Verma M. , Sood N. , Roy P.P. and Raman B. , Script Identification in Natural Scene Images: A Dataset and Texture-Feature Based Performance Evaluation, Proceedings of International Conference on Computer Vision and Image Processing, 2017, 309\u2013319.","DOI":"10.1007\/978-981-10-2107-7_28"},{"key":"10.3233\/JIFS-200260_ref29","doi-asserted-by":"crossref","unstructured":"Fasil O.K. , Manjunath S. and Aradhya V.N.M. , Word-Level Script Identification from Scene Images, Proceedings of the 5th International Conference on Frontiers in Intelligent Computing: Theory and Applications, 2017, 417\u2013426.","DOI":"10.1007\/978-981-10-3156-4_43"},{"key":"10.3233\/JIFS-200260_ref30","doi-asserted-by":"crossref","unstructured":"Jia Y. , Shelhamer E. , Donahue J. , Karayev S. , Long J. , Girshick R. , Guadarrama S. and Darrell T. , Caffe: Convolutional architecture for fast feature embedding, the 22nd ACM international conference on Multimedia, Orlando, USA, 2014, pp. 675\u2013678.","DOI":"10.1145\/2647868.2654889"},{"key":"10.3233\/JIFS-200260_ref31","doi-asserted-by":"crossref","unstructured":"Karpathy A. , Toderici G. , Shetty S. , Leung T. , Sukthankar R. and Fei-Fei L. , Large-scale Video Classification with Convolutional Neural Networks, the IEEE conference on Computer Vision and Pattern Recognition (CVPR), Columbus, USA, 2014, pp. 1725\u20131732.","DOI":"10.1109\/CVPR.2014.223"},{"key":"10.3233\/JIFS-200260_ref32","doi-asserted-by":"crossref","unstructured":"Lowe D.G. , Object recognition from local scale-invariant features, the 7th IEEE International Conference on Computer Vision, Kerkyra, Greece, 1999, pp. 
1150\u20131157.","DOI":"10.1109\/ICCV.1999.790410"}],"container-title":["Journal of Intelligent & Fuzzy Systems"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/JIFS-200260","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T09:42:06Z","timestamp":1777455726000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/JIFS-200260"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,4]]},"references-count":12,"journal-issue":{"issue":"1"},"URL":"https:\/\/doi.org\/10.3233\/jifs-200260","relation":{},"ISSN":["1064-1246","1875-8967"],"issn-type":[{"value":"1064-1246","type":"print"},{"value":"1875-8967","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1,4]]}}}