{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T13:05:54Z","timestamp":1780491954103,"version":"3.54.1"},"reference-count":59,"publisher":"MDPI AG","issue":"16","license":[{"start":{"date-parts":[[2022,8,12]],"date-time":"2022-08-12T00:00:00Z","timestamp":1660262400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"UNIQUARE GmbH, Austria"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In this study, we propose a new model for optical character recognition (OCR) based on both CNNs (convolutional neural networks) and RNNs (recurrent neural networks). The distortions affecting the document image can take different forms, such as blur (focus blur, motion blur, etc.), shadow, bad contrast, etc. Document-image distortions significantly decrease the performance of OCR systems, to the extent that they reach a performance close to zero. Therefore, a robust OCR model that performs robustly even under hard (distortion) conditions is still sorely needed. However, our comprehensive study in this paper shows that various related works can somewhat improve their respective OCR recognition performance of degraded document images (e.g., captured by smartphone cameras under different conditions and, thus, distorted by shadows, contrast, blur, etc.), but it is worth underscoring, that improved recognition is neither sufficient nor always satisfactory\u2014especially in very harsh conditions. Therefore, in this paper, we suggest and develop a much better and fully different approach and model architecture, which significantly outperforms the aforementioned previous related works. Furthermore, a new dataset was gathered to show a series of different and well-representative real-world scenarios of hard distortion conditions. The new OCR model suggested performs in such a way that even document images (even from the hardest conditions) that were previously not recognizable by other OCR systems can be fully recognized with up to 97.5% accuracy\/precision by our new deep-learning-based OCR model.<\/jats:p>","DOI":"10.3390\/s22166025","type":"journal-article","created":{"date-parts":[[2022,8,15]],"date-time":"2022-08-15T23:44:03Z","timestamp":1660607043000},"page":"6025","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["A Smart Visual Sensing Concept Involving Deep Learning for a Robust Optical Character Recognition under Hard Real-World Conditions"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6015-1703","authenticated-orcid":false,"given":"Kabeh","family":"Mohsenzadegan","sequence":"first","affiliation":[{"name":"Institute for Smart Systems Technologies, University Klagenfurt, 9020 Klagenfurt, Austria"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Vahid","family":"Tavakkoli","sequence":"additional","affiliation":[{"name":"Institute for Smart Systems Technologies, University Klagenfurt, 9020 Klagenfurt, Austria"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0773-9476","authenticated-orcid":false,"given":"Kyandoghere","family":"Kyamakya","sequence":"additional","affiliation":[{"name":"Institute for Smart Systems Technologies, University Klagenfurt, 9020 Klagenfurt, Austria"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2022,8,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1007\/s10032-007-0043-3","article-title":"A generalised framework for script identification","volume":"10","author":"Joshi","year":"2007","journal-title":"Int. J. Doc. Anal. Recognit."},{"key":"ref_2","first-page":"132","article-title":"A new document authentication method by embedding deformation characters","volume":"6067","author":"Wang","year":"2006","journal-title":"Electron. Imaging"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1007\/s10032-010-0112-x","article-title":"Special issue on document recognition and retrieval 2009","volume":"13","author":"Berkner","year":"2010","journal-title":"Int. J. Doc. Anal. Recognit."},{"key":"ref_4","unstructured":"Chung, Y., Chi, S., Bae, K.S., Kim, K., Jang, D., Kim, K., and Choi, Y. (2005). Optical Information Systems III, SPIE Optics."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Sharma, P., and Sharma, S. (2016). Image Processing Based Degraded Camera Captured Document Enhancement for Improved OCR Accuracy, IEEE.","DOI":"10.1109\/CONFLUENCE.2016.7508160"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Visvanathan, A., Chattopadhyay, T., and Bhattacharya, U. (2013). Enhancement of Camera Captured Text Images with Specular Reflection, IEEE.","DOI":"10.1109\/NCVPRIPG.2013.6776189"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Tian, D., Hao, Y., Ha, M., Tian, X., and Ha, Y. (2007). Algorithm of Contrast Enhancement for Visual Document Images with Underexposure, SPIE.","DOI":"10.1117\/12.790761"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1080\/01431160600746456","article-title":"A survey of image classification methods and techniques for improving classification performance","volume":"28","author":"Lu","year":"2007","journal-title":"J. Remote Sens."},{"key":"ref_9","unstructured":"Fan, M., Huang, R., Feng, W., and Sun, J. (2017). Image Blur Classification and Blur Usefulness Assessment, IEEE."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W., and Liang, J. (2017, January 10). East: An Efficient and Accurate Scene Text Detector. Proceedings of the IEEE Conference on CVPR, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.283"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"107392","DOI":"10.1016\/j.patcog.2020.107392","article-title":"Reinterpreting CTC training as iterative fitting","volume":"105","author":"Li","year":"2020","journal-title":"Pattern Recognit."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1016\/j.neucom.2018.11.081","article-title":"Single infrared image enhancement using a deep convolutional neural network","volume":"332","author":"Kuang","year":"2019","journal-title":"Neurocomputing"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Lefkimmiatis, S. (2017, January 8\u201310). Non-local Color Image Denoising with Convolutional Neural Networks. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Bangalore, India.","DOI":"10.1109\/CVPR.2017.623"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1216","DOI":"10.1109\/LSP.2018.2850222","article-title":"Nonlocality-Reinforced Convolutional Neural Networks for Image Denoising","volume":"25","author":"Cruz","year":"2018","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1016\/j.image.2018.02.001","article-title":"A novel contrast enhancement forensics based on convolutional neural networks","volume":"63","author":"Sun","year":"2018","journal-title":"Signal Process.-Image Commun."},{"key":"ref_16","unstructured":"Leal, H.K., and Yang, X. (2018). Removing the Blur in Images Using Deep Convolutional Neural Network, Young Scientist."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Nah, S., Kim, T.H., and Lee, K.M. (2017). Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring. Arxiv Comput. Vis. Pattern Recognit., 257\u2013265.","DOI":"10.1109\/CVPR.2017.35"},{"key":"ref_18","first-page":"1","article-title":"Text detection and recognition in the wild: A review","volume":"54","author":"Raisi","year":"2020","journal-title":"ACM Comput. Surv."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1631","DOI":"10.1109\/TPAMI.2003.1251157","article-title":"Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm","volume":"25","author":"Kim","year":"2003","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_20","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201325). Histograms of Oriented Gradients for Human Detection. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905), San Diego, CA, USA."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Hanif, S.M., and Prevost, L. (2009, January 26\u201329). Text Detection and Localization in Complex Scene Images Using Constrained Adaboost Algorithm. Proceedings of the 2009 10th International Conference on Document Analysis and Recognition, Catalunya, Spain.","DOI":"10.1109\/ICDAR.2009.172"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Gupta, A., Vedaldi, A., and Zisserman, A. (2016, January 22). Synthetic Data for Text Localisation in Natural Images. Proceedings of the IEEE Conference Computer Vision and Pattern Recognition, Oxford, UK.","DOI":"10.1109\/CVPR.2016.254"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Jeon, M., and Jeong, Y. (2020). Compact and accurate scene text detector. Appl. Sci., 10.","DOI":"10.3390\/app10062096"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Kobchaisawat, T., Chalidabhongse, T., and Satoh, S. (2020). Scene text detection with polygon offsetting and border augmentation. Electronics, 9.","DOI":"10.3390\/electronics9010117"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"3676","DOI":"10.1109\/TIP.2018.2825107","article-title":"TextBoxes++: A Single-Shot Oriented Scene Text Detector","volume":"27","author":"Liao","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Wang, K., and Belongie, S. (2010, January 24). Word Spotting in the Wild. Proceedings of the European Conference on Computer Vision, Berlin, Germany.","DOI":"10.1007\/978-3-642-15549-9_43"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"8027","DOI":"10.1016\/j.eswa.2014.07.008","article-title":"A robust arbitrary text detection system for natural scene images","volume":"41","author":"Karatzas","year":"2014","journal-title":"Expert Syst. Wit Appl."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Iwamura, M., Morimoto, N., Tainaka, K., Bazazian, D., Gomez, L., and Karatzas, D. (2017, January 9\u201315). ICDAR2017 Robust Reading Challenge on Omnidirectional Video. Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan.","DOI":"10.1109\/ICDAR.2017.236"},{"key":"ref_29","first-page":"7","article-title":"STAR-Net: A spatial attention residue network for scene text recognition","volume":"2","author":"Liu","year":"2016","journal-title":"BMVC"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive image features from scale-invariant keypoints","volume":"60","author":"Lowe","year":"2004","journal-title":"Int. J. Comp. Vis."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1023\/A:1018628609742","article-title":"Least squares support vector machine classifiers","volume":"9","author":"Suykens","year":"1999","journal-title":"Neural Process. Lett."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1080\/00031305.1992.10475879","article-title":"An introduction to kernel and nearest-neighbor noparametric regression","volume":"46","author":"Altman","year":"1992","journal-title":"Amer. Stat."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1480","DOI":"10.1109\/TPAMI.2014.2366765","article-title":"Text detection and recognition in imagery: A survey","volume":"37","author":"Ye","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Neumann, L., and Matas, J. (2012, January 16\u201321). Real-Time Scene Text Localization and Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248097"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"2552","DOI":"10.1109\/TPAMI.2014.2339814","article-title":"Word spotting and recognition with embedded attributes","volume":"36","author":"Gordo","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_36","unstructured":"Simponyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_37","unstructured":"Wang, T., Wu, D., Coates, A., and Ng, A. (2012, January 11\u201315). End-to-End Text Recognition with Convolutional Neural Network. Proceedings of the International Conference on Pattern Recognition (ICPR), Tsukuba, Japan."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Bissacco, A., Cummins, M., Netzer, Y., and Neven, H. (2013, January 1\u20138). PhotoOCR: Reading Text in Uncontrolled Conditions. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia.","DOI":"10.1109\/ICCV.2013.102"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s11263-015-0823-z","article-title":"Reading text in the wild with convolutional neural networks","volume":"116","author":"Jaderberg","year":"2016","journal-title":"Int. J. Comp. Vis."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Borisyuk, F., Albert, G., and Viswanath, S. (2018, January 19). Rosetta: Large Scale System for Text Detection and Recognition in Images. Proceedings of the 4th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, New York, NY, USA.","DOI":"10.1145\/3219819.3219861"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"2298","DOI":"10.1109\/TPAMI.2016.2646371","article-title":"An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition","volume":"39","author":"Shi","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_42","unstructured":"Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"2035","DOI":"10.1109\/TPAMI.2018.2848939","article-title":"Aster: An attentional scene text recognizer with flexible rectification","volume":"41","author":"Shi","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Baek, J., Kim, G., Lee, J., Park, S., Han, D., Yun, S., Oh, S.J., and Lee, H. (2019, January 3). What is Wrong with Scene Text Recognition Model Comparisons? Dataset and Model Analysis. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00481"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"82031","DOI":"10.1109\/ACCESS.2021.3086020","article-title":"U-net and its variants for medical image segmentation: A review of theory and applications","volume":"9","author":"Siddique","year":"2021","journal-title":"IEEE Access"},{"key":"ref_46","unstructured":"Maini, R., and Aggarwal, H. (2010). A Comprehensive Review of Image Enhancement Techniques. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Yang, C., and Hsieh, C. (2019, January 3\u20136). High Accuracy Text Detection Using ResNet as Feature Extractor. Proceedings of the IEEE Eurasia Conference on IOT, Communication and Engineering (ECICE), Yunlin, Taiwan.","DOI":"10.1109\/ECICE47484.2019.8942666"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1016\/j.neucom.2019.01.078","article-title":"Bidirectional LSTM with attention mechanism and convolutional layer for text classification","volume":"337","author":"Liu","year":"2019","journal-title":"Neurocomputing"},{"key":"ref_49","unstructured":"Kingma, D., and Adam, J.B. (2014). A method for stochastic optimization. arXiv."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s11263-015-0823-z","article-title":"Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition","volume":"116","author":"Jaderberg","year":"2014","journal-title":"Int. J. Comput. Vis."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Shi, B., Bai, X., and Belongie, S. (2017, January 21\u201326). Detecting Oriented Text in Natural Images by Linking Segments. Proceedings of the IEEE Conference Computing Visual Pattern Recognition, Honolulu, HL, USA.","DOI":"10.1109\/CVPR.2017.371"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Deng, D., Liu, H., Li, X., and Cai, D. (2018, January 27). Pixellink: Detecting Scene Text via Instance Segmentation. Proceedings of the 32nd AAAI Conference on Artificial Intelligence, Riverside, CA, USA.","DOI":"10.1609\/aaai.v32i1.12269"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Long, S., Ruan, J., Zhang, W., He, X., Wu, W., and Yao, C. (2018, January 9). Textsnake: A Flexible Representation for Detecting Text of Arbitrary Shapes. Proceedings of the European Conference Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01216-8_2"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"106954","DOI":"10.1016\/j.patcog.2019.06.020","article-title":"SegLink++: Detecting dense and arbitrary-shaped scene text by instance-aware component grouping","volume":"96","author":"Tang","year":"2019","journal-title":"Pattern Recognit."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Lyu, P., Yao, C., Wu, W., Yan, S., and Bai, X. (2018, January 25). Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation. Proceedings of the IEEE Conference Computer Vision Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00788"},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"5566","DOI":"10.1109\/TIP.2019.2900589","article-title":"TextField: Learning a deep direction field for irregular scene text detection","volume":"28","author":"Xu","year":"2019","journal-title":"IEEE Trans. Image Process."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Wang, W., Xie, E., Li, X., Hou, W., Lu, T., Yu, G., and Shao, S. (2019, January 1). Shape Robust Text Detection with Progressive Scale Expansion Network. Proceedings of the IEEE Conference Computer Vision Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00956"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"45825","DOI":"10.1109\/ACCESS.2020.2978225","article-title":"Instance Segmentation Network with Self-Distillation for Scene Text Detection","volume":"8","author":"Yang","year":"2020","journal-title":"IEEE Access"},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Shi, B., Wang, X., Lyu, P., Yao, C., and Bai, X. (2016, January 19). Robust Scene Text Recognition with Automatic Rectification. Proceedings of the CVPR, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.452"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/16\/6025\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:07:49Z","timestamp":1760141269000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/16\/6025"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,12]]},"references-count":59,"journal-issue":{"issue":"16","published-online":{"date-parts":[[2022,8]]}},"alternative-id":["s22166025"],"URL":"https:\/\/doi.org\/10.3390\/s22166025","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,12]]}}}