{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T16:11:35Z","timestamp":1778602295252,"version":"3.51.4"},"reference-count":63,"publisher":"MDPI AG","issue":"20","license":[{"start":{"date-parts":[[2021,10,12]],"date-time":"2021-10-12T00:00:00Z","timestamp":1633996800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>This paper\u2019s core objective is to develop and validate a new neurocomputing model to classify document images in particularly demanding hard conditions such as image distortions, image size variance and scale, a huge number of classes, etc. Document classification is a special machine vision task in which document images are categorized according to their likelihood. Document classification is by itself an important topic for the digital office and it has several usages. Additionally, different methods for solving this problem have been presented in various studies; their respectively reached performance is however not yet good enough. This task is very tough and challenging. Thus, a novel, more accurate and precise model is needed. Although the related works do reach acceptable accuracy values for less hard conditions, they generally fully fail in the face of those above-mentioned hard, real-world conditions, including, amongst others, distortions such as noise, blur, low contrast, and shadows. In this paper, a novel deep CNN model is developed, validated and benchmarked with a selection of the most relevant recent document classification models. Additionally, the model\u2019s sensitivity was significantly improved by injecting different artifacts during the training process. In the benchmarking, it does clearly outperform all others by at least 4%, thus reaching more than 96% accuracy.<\/jats:p>","DOI":"10.3390\/s21206763","type":"journal-article","created":{"date-parts":[[2021,10,13]],"date-time":"2021-10-13T06:38:41Z","timestamp":1634107121000},"page":"6763","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["A Deep-Learning Based Visual Sensing Concept for a Robust Classification of Document Images under Real-World Hard Conditions"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6015-1703","authenticated-orcid":false,"given":"Kabeh","family":"Mohsenzadegan","sequence":"first","affiliation":[{"name":"Institute for Smart Systems Technologies, University Klagenfurt, 9020 Klagenfurt, Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vahid","family":"Tavakkoli","sequence":"additional","affiliation":[{"name":"Institute for Smart Systems Technologies, University Klagenfurt, 9020 Klagenfurt, Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0773-9476","authenticated-orcid":false,"given":"Kyandoghere","family":"Kyamakya","sequence":"additional","affiliation":[{"name":"Institute for Smart Systems Technologies, University Klagenfurt, 9020 Klagenfurt, Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2021,10,12]]},"reference":[{"key":"ref_1","unstructured":"Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R.R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"162","DOI":"10.1016\/j.neucom.2019.11.084","article-title":"Local manifold sparse model for image classification","volume":"382","author":"Luo","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"145","DOI":"10.5194\/isprsarchives-XL-3-145-2014","article-title":"A Dynamic Bayes Network for visual Pedestrian Tracking","volume":"XL-3","author":"Klinger","year":"2014","journal-title":"ISPRS-Int. Arch. Photogramm. Remote. Sens. Spat. Inf. Sci."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"16692","DOI":"10.3390\/s140916692","article-title":"The Feature Extraction Based on Texture Image Information for Emotion Sensing in Speech","volume":"14","author":"Wang","year":"2014","journal-title":"Sensors"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"19","DOI":"10.5194\/isprs-annals-IV-1-W1-19-2017","article-title":"SECURITY EVENT RECOGNITION FOR VISUAL SURVEILLANCE","volume":"IV-1\/W1","author":"Liao","year":"2017","journal-title":"ISPRS Ann. Photogramm. Remote. Sens. Spat. Inf. Sci."},{"key":"ref_6","unstructured":"Agricultura, J., and Chun, J.C. (2016). In the Current Network Environment, the Management of Official Documents Requires the Archives, MEICI."},{"key":"ref_7","first-page":"224","article-title":"A Robust System for Noisy Image Classification Combining Denoising Autoencoder and Convolutional Neural Network","volume":"9","author":"Roy","year":"2018","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"515","DOI":"10.1109\/TPAMI.2018.2794470","article-title":"Semi-Supervised Discriminative Classification Robust to Sample-Outliers and Feature-Noises","volume":"41","author":"Adeli","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"4806","DOI":"10.1109\/ACCESS.2019.2962617","article-title":"The Real-World-Weight Cross-Entropy Loss Function: Modeling the Costs of Mislabeling","volume":"8","author":"Ho","year":"2020","journal-title":"IEEE Access"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Cao, J., Su, Z., Yu, L., Chang, D., Li, X., and Ma, Z. (December, January 30). Softmax Cross Entropy Loss with Unbiased Decision Boundary for Image Classification. Proceedings of the 2018 Chinese Automation Congress (CAC), Xi\u2019an, China.","DOI":"10.1109\/CAC.2018.8623242"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"166","DOI":"10.1016\/j.triboint.2019.05.029","article-title":"A hybrid convolutional neural network for intelligent wear particle classification","volume":"138","author":"Peng","year":"2019","journal-title":"Tribol. Int."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"931","DOI":"10.1007\/s11554-016-0569-z","article-title":"Real-time raindrop detection based on cellular neural networks for ADAS","volume":"16","author":"Ali","year":"2019","journal-title":"J. Real-Time Image Process."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Nasir, I.M., Khan, M.A., Yasmin, M., Shah, J.H., Gabryel, M., Scherer, R., and Dama\u0161evi\u010dius, R. (2020). Pearson Correlation-Based Feature Selection for Document Classification Using Balanced Training. Sensors, 20.","DOI":"10.3390\/s20236793"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"013013","DOI":"10.1117\/1.JEI.22.1.013013","article-title":"Developing an efficient technique for satellite image denoising and resolution enhancement for improving classification accuracy","volume":"22","author":"Thangaswamy","year":"2013","journal-title":"J. Electron. Imaging"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1016\/j.neucom.2016.11.100","article-title":"On the application of reservoir computing networks for noisy image recognition","volume":"277","author":"Jalalvand","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_16","first-page":"1","article-title":"Multiclass Noisy Image Classification Based on Optimal Threshold and Neighboring Window Denoising","volume":"4","author":"Singh","year":"2015","journal-title":"Int. J. Comput. Eng. Sci."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1162\/neco.1989.1.4.541","article-title":"Backpropagation Applied to Handwritten Zip Code Recognition","volume":"1","author":"LeCun","year":"1989","journal-title":"Neural Comput."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"2402","DOI":"10.1001\/jama.2016.17216","article-title":"Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs","volume":"316","author":"Gulshan","year":"2016","journal-title":"JAMA"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1038\/nature21056","article-title":"Dermatologist-level classification of skin cancer with deep neural networks","volume":"542","author":"Esteva","year":"2017","journal-title":"Nature"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1016\/j.procs.2018.05.198","article-title":"An Analysis of Convolutional Neural Networks for Image Classification","volume":"132","author":"Sharma","year":"2018","journal-title":"Procedia Comput. Sci."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"12755","DOI":"10.1109\/ACCESS.2018.2796722","article-title":"Image Classification Based on the Boost Convolutional Neural Network","volume":"6","author":"Lee","year":"2018","journal-title":"IEEE Access"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Wang, Q., Zhou, C., and Xu, N. (2017, January 25\u201326). Street view image classification based on convolutional neural network. Proceedings of the 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, China.","DOI":"10.1109\/IAEAC.2017.8054251"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"937","DOI":"10.1109\/TGRS.2017.2756851","article-title":"Multisource Remote Sensing Data Classification Based on Convolutional Neural Network","volume":"56","author":"Xu","year":"2017","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Wenhui, Y., and Fan, Y. (2017, January 23\u201325). Lidar Image Classification Based on Convolutional Neural Networks. Proceedings of the 2017 International Conference on Computer Network, Electronic and Automation (ICCNEA), Xi\u2019an, China.","DOI":"10.1109\/ICCNEA.2017.37"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Ahn, J., Park, J., Park, D., Paek, J., and Ko, J. (2018). Convolutional neural network-based classification system design with compressed wireless sensor network images. PLoS ONE, 13.","DOI":"10.1371\/journal.pone.0196251"},{"key":"ref_26","unstructured":"Ioffe, S., and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"He, K.M., Zhang, X.Y., Ren, S.Q., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"ImageNet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. ACM"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., and Fei-Fei, L. (2014, January 23\u201328). Large-Scale Video Classification with Convolutional Neural Networks. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.223"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Yao, H., Chuyi, L., Dan, H., and Weiyu, Y. (2016, January 8\u201310). Gabor Feature Based Convolutional Neural Network for Object Recognition in Natural Scene. Proceedings of the 2016 3rd International Conference on Information Science and Control Engineering (ICISCE), Beijing, China.","DOI":"10.1109\/ICISCE.2016.91"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Hosseini, S., Lee, S.H., Kwon, H.J., Koo, H.I., and Cho, N.I. (2018, January 7\u201310). Age and gender classification using wide convolutional neural network and Gabor filter. Proceedings of the 2018 International Workshop on Advanced Image Technology (IWAIT), Chiang Mai, Thailand.","DOI":"10.1109\/IWAIT.2018.8369721"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Nguyen, V.D., Lim, K., Le, M.D., and Bui, N.D. (2018, January 21\u201324). Combination of Gabor Filter and Convolutional Neural Network for Suspicious Mass Classification. Proceedings of the 2018 22nd International Computer Science and Engineering Conference (ICSEC), Chiang Mai, Thailand.","DOI":"10.1109\/ICSEC.2018.8712796"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1017\/S0962492900000015","article-title":"Radial basis functions","volume":"9","author":"Buhmann","year":"2000","journal-title":"Acta Numer."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"504","DOI":"10.1126\/science.1127647","article-title":"Reducing the Dimensionality of Data with Neural Networks","volume":"313","author":"Hinton","year":"2006","journal-title":"Science"},{"key":"ref_36","first-page":"1097","article-title":"Imagenet classification with deep convolutional neural networks","volume":"25","author":"Krizhevsky","year":"2012","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_37","unstructured":"Kumar, J., Ye, P., and Doermann, D. (2012, January 11\u201315). Learning document structure for retrieval and classification. Proceedings of the the 21st International Conference on Pattern Recognition (ICPR2012), Tsukuba, Japan."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"2011","DOI":"10.1109\/TPAMI.2019.2913372","article-title":"Squeeze-and-Excitation Networks","volume":"42","author":"Hu","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1016\/j.patrec.2013.10.030","article-title":"Structural similarity for document image classification and retrieval","volume":"43","author":"Kumar","year":"2014","journal-title":"Pattern Recognit. Lett."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Kang, L., Kumar, J., Ye, P., and Doermann, D. (2014, January 24\u201328). Convolutional neural networks for document image classification. Proceedings of the 22nd International Conference of Pattern Recognition Pattern Recognition (ICPR2014), Stockholm, Sweden.","DOI":"10.1109\/ICPR.2014.546"},{"key":"ref_41","unstructured":"Chen, S., He, Y., Sun, J., and Naoi, S. (2012, January 11\u201315). Structured document classification by matching local salient features. Proceedings of the 21st International Conference on Pattern Recognition, Tsukuba, Japan."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Joutel, G., Eglin, V., Bres, S., and Emptoz, H. (2007, January 23\u201326). Curvelets Based Queries for CBIR Application in Handwriting Collections. Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Curitiba, Brazil.","DOI":"10.1109\/ICDAR.2007.4376995"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Kochi, T., and Saitoh, T. (1999, January 2\u201322). User-defined template for identifying document type and extracting information from documents. Proceedings of the Fifth International Conference on Document Analysis and Recognition. ICDAR \u201999 (Cat. No.PR00318), Bangalore, India.","DOI":"10.1109\/ICDAR.1999.791741"},{"key":"ref_44","unstructured":"Bagdanov, A.D., and Worring, M. (2001, January 13). Fine-grained document genre classification using first order random graphs. Proceedings of the Sixth International Conference on Document Analysis and Recognition, Seattle, WA, USA."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Byun, Y., and Lee, Y. (2000). Form Classification Using DP Matching, ACM.","DOI":"10.1145\/335603.335611"},{"key":"ref_46","unstructured":"Shin, S., and Doermann, D. (2006, January 26\u201329). Document Image Retrieval Based on Layout Structural Similarity. Proceedings of the International Conference on Image Processing, Computer Vision, Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_47","unstructured":"Collins-Thompson, K., and Nickolov, R. (2002, January 11\u201315). A Clustering-Based Algorithm for Automatic Document Separation. Proceedings of the SIGIR Workshop on Information Retrieval and OCR, Tampere, Finland."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Sermanet, P., Kavukcuoglu, K., Chintala, S., and Lecun, Y. (2013, January 23\u201328). Pedestrian Detection with Unsupervised Multi-stage Feature Learning. Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.465"},{"key":"ref_49","unstructured":"Torresani, L., Szummer, M., and Fitzgibbon, A. (2010). Computer Vision\u2013ECCV 2010, Springer."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"1239","DOI":"10.1109\/TPAMI.2019.2950923","article-title":"Effects of Image Degradation and Degradation Removal to CNN-Based Image Classification","volume":"43","author":"Pei","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1080\/01431160600746456","article-title":"A survey of image classification methods and techniques for improving classification performance","volume":"28","author":"Lu","year":"2007","journal-title":"Int. J. Remote. Sens."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1016\/j.ijleo.2017.11.116","article-title":"Efficient hybrid image denoising scheme based on SVM classification","volume":"157","author":"Routray","year":"2018","journal-title":"Optik"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Hossain, T., Teng, S.W., Zhang, D., Lim, S., and Lu, G. (2019, January 22\u201325). Distortion Robust Image Classification Using Deep Convolutional Neural Network with Discrete Cosine Transform. Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan.","DOI":"10.1109\/ICIP.2019.8803787"},{"key":"ref_55","first-page":"1","article-title":"Restoration of Partial Blurred Image Based on Blur Detection and Classification","volume":"2016","author":"Yang","year":"2016","journal-title":"J. Electr. Comput. Eng."},{"key":"ref_56","first-page":"1","article-title":"Blurred Image Classification Based on Adaptive Dictionary","volume":"5","author":"Sun","year":"2013","journal-title":"Int. J. Multimed. Its Appl."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W., and Liang, J. (2017, January 21\u201326). EAST: An Efficient and Accurate Scene Text Detector. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.283"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Yang, C.-S., and Hsieh, C.-C. (2019, January 3\u20136). High Accuracy Text Detection using ResNet as Feature Extractor. Proceedings of the 2019 IEEE Eurasia Conference on IOT, Communication and Engineering (ECICE), Yunlin, Taiwan.","DOI":"10.1109\/ECICE47484.2019.8942666"},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Harley, A.W., Ufkes, A., and Derpanis, K.G. (2015, January 13\u201326). Evaluation of deep convolutional nets for document image classification and retrieval. Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia.","DOI":"10.1109\/ICDAR.2015.7333910"},{"key":"ref_60","unstructured":"Zeshan, A.M., Andreas, K., Sheraz, A., and Marcus, L. (2017, January 9\u201315). Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification. Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1109\/THMS.2020.2984181","article-title":"A Smartphone-Based Adaptive Recognition and Real-Time Monitoring System for Human Activities","volume":"50","author":"Qi","year":"2020","journal-title":"IEEE Trans. Hum.-Mach. Syst."},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"2943","DOI":"10.1109\/LRA.2020.2974445","article-title":"Deep Neural Network Approach in Robot Tool Dynamics Identification for Bilateral Teleoperation","volume":"5","author":"Su","year":"2020","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1016\/j.neunet.2020.07.033","article-title":"Improved recurrent neural network-based manipulator control with remote center of motion constraints: Experimental results","volume":"131","author":"Su","year":"2020","journal-title":"Neural Netw."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/20\/6763\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:12:04Z","timestamp":1760166724000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/20\/6763"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,12]]},"references-count":63,"journal-issue":{"issue":"20","published-online":{"date-parts":[[2021,10]]}},"alternative-id":["s21206763"],"URL":"https:\/\/doi.org\/10.3390\/s21206763","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,12]]}}}