{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T15:23:54Z","timestamp":1772119434298,"version":"3.50.1"},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2023,4,25]],"date-time":"2023-04-25T00:00:00Z","timestamp":1682380800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,4,25]],"date-time":"2023-04-25T00:00:00Z","timestamp":1682380800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Deutsches Forschungszentrum f\u00fcr K\u00fcnstliche Intelligenz GmbH (DFKI)"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["IJDAR"],"published-print":{"date-parts":[[2023,9]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Deep learning has been extensively researched in the field of document analysis and has shown excellent performance across a wide range of document-related tasks. As a result, a great deal of emphasis is now being placed on its practical deployment and integration into modern industrial document processing pipelines. It is well known, however, that deep learning models are data-hungry and often require huge volumes of annotated data in order to achieve competitive performances. And since data annotation is a costly and labor-intensive process, it remains one of the major hurdles to their practical deployment. This study investigates the possibility of using active learning to reduce the costs of data annotation in the context of document image classification, which is one of the core components of modern document processing pipelines. The results of this study demonstrate that by utilizing active learning (AL), deep document classification models can achieve competitive performances to the models trained on fully annotated datasets and, in some cases, even surpass them by annotating only 15\u201340% of the total training dataset. Furthermore, this study demonstrates that modern AL strategies significantly outperform random querying, and in many cases achieve comparable performance to the models trained on fully annotated datasets even in the presence of practical deployment issues such as data imbalance, and annotation noise, and thus, offer tremendous benefits in real-world deployment of deep document classification models. The code to reproduce our experiments is publicly available at\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/saifullah3396\/doc_al\">https:\/\/github.com\/saifullah3396\/doc_al<\/jats:ext-link>\n                    .\n                  <\/jats:p>","DOI":"10.1007\/s10032-023-00429-8","type":"journal-article","created":{"date-parts":[[2023,4,25]],"date-time":"2023-04-25T13:07:25Z","timestamp":1682428045000},"page":"187-209","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Analyzing the potential of active learning for document image classification"],"prefix":"10.1007","volume":"26","author":[{"given":"Saifullah","family":"Saifullah","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stefan","family":"Agne","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andreas","family":"Dengel","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sheraz","family":"Ahmed","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,4,25]]},"reference":[{"key":"429_CR1","doi-asserted-by":"crossref","unstructured":"Xu, Y.: et\u00a0al. LayoutLM: Pre-training of Text and Layout for Document Image Understanding. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Vol. 20, pp. 1192\u20131200 (2020). arXiv:1912.13318","DOI":"10.1145\/3394486.3403172"},{"key":"429_CR2","doi-asserted-by":"crossref","unstructured":"Harley, A.W., Ufkes, A., Derpanis, K.G.: Evaluation of deep convolutional nets for document image classification and retrieval. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 991\u2013995 (2015)","DOI":"10.1109\/ICDAR.2015.7333910"},{"key":"429_CR3","doi-asserted-by":"crossref","unstructured":"Siddiqui S.A., Agne, S., Dengel, A., Ahmed, S.: Are deep models robust against real distortions? a case study on document image classification (2022). https:\/\/doi.org\/10.20944\/preprints202202.0058.v1","DOI":"10.20944\/preprints202202.0058.v1"},{"key":"429_CR4","doi-asserted-by":"crossref","unstructured":"Ferrando, J.: et\u00a0al. Improving accuracy and speeding up document image classification through parallel systems. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 12138 LNCS, pp. 387\u2013400 (2020). arXiv:2006.09141","DOI":"10.1007\/978-3-030-50417-5_29"},{"key":"429_CR5","doi-asserted-by":"crossref","unstructured":"Xu, Y. et\u00a0al.: LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding, pp. 2579\u20132591 (Association for Computational Linguistics (ACL), 2021). 2012.14740","DOI":"10.18653\/v1\/2021.acl-long.201"},{"issue":"11","key":"429_CR6","doi-asserted-by":"publisher","first-page":"2298","DOI":"10.1109\/TPAMI.2016.2646371","volume":"39","author":"B Shi","year":"2015","unstructured":"Shi, B., Bai, X., Yao, C.: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Patt. Analy. Mach. Intell. 39(11), 2298\u20132304 (2015)","journal-title":"IEEE Trans. Patt. Analy. Mach. Intell."},{"key":"429_CR7","unstructured":"Settles, B.: Computer Sciences Department Active Learning Literature Survey (2009)"},{"key":"429_CR8","unstructured":"Zhan, X., et al.: A comparative survey of deep active learning (2022). https:\/\/arxiv.org\/abs\/2203.13450"},{"key":"429_CR9","doi-asserted-by":"crossref","unstructured":"Li, X., Guo, Y.: Adaptive Active Learning for Image Classification, pp. 859\u2013866 (2013)","DOI":"10.1109\/CVPR.2013.116"},{"key":"429_CR10","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1007\/s11263-014-0781-x","volume":"113","author":"Y Yang","year":"2015","unstructured":"Yang, Y., et al.: Multi-class active learning by uncertainty sampling with diversity maximization. Int. J. Comput. Vis. 113, 113\u2013127 (2015). https:\/\/doi.org\/10.1007\/s11263-014-0781-x","journal-title":"Int. J. Comput. Vis."},{"key":"429_CR11","unstructured":"Sener, O., Savarese, S.: Active Learning for Convolutional Neural Networks: A Core-Set Approach (2018). https:\/\/openreview.net\/forum?id=H1aIuk-RW"},{"key":"429_CR12","doi-asserted-by":"crossref","unstructured":"Sinha, S., Ebrahimi, S., Darrell, T.: Variational Adversarial Active Learning (2019). https:\/\/arxiv.org\/abs\/1904.00370","DOI":"10.1109\/ICCV.2019.00607"},{"key":"429_CR13","doi-asserted-by":"crossref","unstructured":"Afzal, M.\u00a0Z., Kolsch, A., Ahmed, S., Liwicki, M.: Cutting the Error by half: investigation of very deep cnn and advanced training strategies for document image classification. In Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, Vol. 1, pp. 883\u2013888 (2017). arXiv:1704.03557","DOI":"10.1109\/ICDAR.2017.149"},{"key":"429_CR14","doi-asserted-by":"crossref","unstructured":"Mahapatra, D., Bozorgtabar, B., Thiran, J.-P., Reyes, M.: Efficient active learning for image classification and segmentation using a sample selection and conditional generative adversarial network (2018). https:\/\/arxiv.org\/abs\/1806.05473","DOI":"10.1007\/978-3-030-00934-2_65"},{"key":"429_CR15","unstructured":"Mayer, C., Timofte, R.: Adversarial sampling for active learning (2018). https:\/\/arxiv.org\/abs\/1808.06671"},{"key":"429_CR16","doi-asserted-by":"publisher","unstructured":"Gal, Y., Islam, R., Ghahramani, Z.: Deep Bayesian Active Learning with Image Data. 34th International Conference on Machine Learning, ICML 2017, Vol. 3, pp. 1923\u20131932 (2017). https:\/\/arxiv.org\/abs\/1703.02910v1. https:\/\/doi.org\/10.48550\/arxiv.1703.02910","DOI":"10.48550\/arxiv.1703.02910"},{"key":"429_CR17","doi-asserted-by":"publisher","unstructured":"Cai, W., et al.: Active learning for support vector machines with maximum model change. Undefined 8724 LNAI (PART 1), pp. 211\u2013226 (2014). https:\/\/doi.org\/10.1007\/978-3-662-44848-9_14","DOI":"10.1007\/978-3-662-44848-9_14"},{"issue":"12","key":"429_CR18","doi-asserted-by":"publisher","first-page":"2591","DOI":"10.1109\/TCSVT.2016.2589879","volume":"27","author":"K Wang","year":"2016","unstructured":"Wang, K., Zhang, D., Li, Y., Zhang, R., Lin, L.: Cost-effective active learning for deep image classification. IEEE Trans. Circuits Syst. Video Technol. 27(12), 2591\u20132600 (2016)","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"issue":"1","key":"429_CR19","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1145\/584091.584093","volume":"5","author":"CE Shannon","year":"2001","unstructured":"Shannon, C.E.: A mathematical theory of communication. SIGMOBILE Mob. Comput. Commun. Rev. 5(1), 3\u201355 (2001). https:\/\/doi.org\/10.1145\/584091.584093","journal-title":"SIGMOBILE Mob. Comput. Commun. Rev."},{"key":"429_CR20","unstructured":"Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: representing model uncertainty in deep learning (2015). https:\/\/arxiv.org\/abs\/1506.02142"},{"issue":"56","key":"429_CR21","first-page":"1929","volume":"15","author":"N Srivastava","year":"2014","unstructured":"Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(56), 1929\u20131958 (2014)","journal-title":"J. Mach. Learn. Res."},{"key":"429_CR22","doi-asserted-by":"crossref","unstructured":"Yang, L., Zhang, Y., Chen, J., Zhang, S., Chen, D.Z.: Suggestive annotation: a deep active learning framework for biomedical image segmentation (2017). https:\/\/arxiv.org\/abs\/1706.04737","DOI":"10.1007\/978-3-319-66179-7_46"},{"key":"429_CR23","unstructured":"Ducoffe, M., Precioso, F.: Adversarial active learning for deep networks: a margin based approach (2018). https:\/\/arxiv.org\/abs\/1802.09841"},{"key":"429_CR24","unstructured":"Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial examples in the physical world (2016). https:\/\/arxiv.org\/abs\/1607.02533"},{"key":"429_CR25","unstructured":"Shui, C., Zhou, F., Gagn\u00e9, C., Wang, B.: Deep active learning: unified and principled method for query and training (2019). https:\/\/arxiv.org\/abs\/1911.09162"},{"key":"429_CR26","doi-asserted-by":"crossref","unstructured":"Yoo, D., Kweon, I.S.: Learning loss for active learning (2019). https:\/\/arxiv.org\/abs\/1905.03677","DOI":"10.1109\/CVPR.2019.00018"},{"key":"429_CR27","doi-asserted-by":"publisher","unstructured":"Ash, J.T., Zhang, C., Krishnamurthy, A., Langford, J., Agarwal, A.: deep batch active learning by diverse, uncertain gradient lower bounds (2019). https:\/\/doi.org\/10.48550\/ARXIV.1906.03671","DOI":"10.48550\/ARXIV.1906.03671"},{"key":"429_CR28","first-page":"8927","volume":"11","author":"JT Ash","year":"2021","unstructured":"Ash, J.T., Goel, S., Krishnamurthy, A., Kakade, S.: Gone fishing: neural active learning with fisher embeddings. Adv. Neural Inf. Process. Syst. 11, 8927\u20138939 (2021)","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"429_CR29","unstructured":"Shin, C.K., Doermann, D.S.: Document image retrieval based on layout structural similarity. In: Proceedings of 2006 International Conference on Image Process, Computer Vision and Pattern Recognition, Vol. 2, pp. 606\u2013612 (2016)"},{"issue":"1","key":"429_CR30","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1016\/j.patrec.2013.10.030","volume":"43","author":"J Kumar","year":"2014","unstructured":"Kumar, J., Ye, P., Doermann, D.: Structural similarity for document image classification and retrieval. Patt. Recognit. Lett. 43(1), 119\u2013126 (2014)","journal-title":"Patt. Recognit. Lett."},{"key":"429_CR31","unstructured":"Baldi, S., Marinai, S., Soda, G.: Using tree-grammars for training set expansion in page classification. In: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 2003-Janua\u00a0(Icdar), pp. 829\u2013833. (2003)"},{"key":"429_CR32","doi-asserted-by":"publisher","first-page":"519","DOI":"10.1109\/TPAMI.2003.1190578","volume":"25","author":"M Diligenti","year":"2003","unstructured":"Diligenti, M., Frasconi, P., Gori, M.: Hidden tree Markov models for document image classification. Patt. Anal. Mach. Intell. IEEE Trans. 25, 519\u2013523 (2003). https:\/\/doi.org\/10.1109\/TPAMI.2003.1190578","journal-title":"Patt. Anal. Mach. Intell. IEEE Trans."},{"key":"429_CR33","doi-asserted-by":"crossref","unstructured":"Kang, L., Kumar, J., Ye, P., Li, Y., Doermann, D.: Convolutional neural networks for document image classification. In: Proceedings of the International Conference on Pattern Recognition, pp. 3168\u20133172 (2014)","DOI":"10.1109\/ICPR.2014.546"},{"key":"429_CR34","doi-asserted-by":"crossref","unstructured":"Das, A., Roy, S., Bhattacharya, U., Parui, S.K.: Document image classification with intra-domain transfer learning and stacked generalization of deep convolutional neural networks. In: Proceedings of the International Conference on Pattern Recognition, 2018-Augus, pp. 3180\u20133185 (2018). arXiv:1801.09321","DOI":"10.1109\/ICPR.2018.8545630"},{"key":"429_CR35","doi-asserted-by":"crossref","unstructured":"Saifullah, S., Agne, S., Dengel, A., Ahmed, S.: DocXClassifier: towards an interpretable deep convolutional neural network for document image classification. (2022,9), https:\/\/doi.org\/10.36227\/techrxiv.19310489.v4","DOI":"10.36227\/techrxiv.19310489"},{"key":"429_CR36","doi-asserted-by":"publisher","first-page":"164358","DOI":"10.1109\/ACCESS.2021.3133200","volume":"9","author":"S Siddiqui","year":"2021","unstructured":"Siddiqui, S., Dengel, A., Ahmed, S.: Self-supervised representation learning for document image classification. IEEE Access 9, 164358\u2013164367 (2021)","journal-title":"IEEE Access"},{"key":"429_CR37","doi-asserted-by":"crossref","unstructured":"Cosma, A., Ghidoveanu, M., Panaitescu-Liess, M., Popescu, M.: Self-Supervised Representation Learning on Document Images. 2020, https:\/\/arxiv.org\/abs\/2004.10605","DOI":"10.1007\/978-3-030-57058-3_8"},{"key":"429_CR38","doi-asserted-by":"crossref","unstructured":"Powalski, R., et al.: Going full-tilt boogie on document understanding with text-image-layout transformer (2021). https:\/\/arxiv.org\/abs\/2102.09550","DOI":"10.1007\/978-3-030-86331-9_47"},{"key":"429_CR39","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition (2015). https:\/\/arxiv.org\/abs\/1512.03385","DOI":"10.1109\/CVPR.2016.90"},{"key":"429_CR40","doi-asserted-by":"crossref","unstructured":"Deng, J., et al.: Imagenet: a large-scale hierarchical image database, pp. 248\u2013255 (2009)","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"429_CR41","unstructured":"Zbontar, J., Jing, L., Misra, I., LeCun, Y., Deny, S.: Barlow Twins: Self-Supervised Learning via Redundancy Reduction. 2021, https:\/\/arxiv.org\/abs\/2103.03230"},{"key":"429_CR42","unstructured":"Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A Simple Framework for Contrastive Learning of Visual Representations. 2020, https:\/\/arxiv.org\/abs\/2002.05709"},{"key":"429_CR43","doi-asserted-by":"crossref","unstructured":"Bengar, J.Z., van\u00a0de Weijer, J., Fuentes, L.L., Raducanu, B.: Class-balanced active learning for image classification (2021). https:\/\/arxiv.org\/abs\/2110.04543","DOI":"10.1109\/WACV51458.2022.00376"},{"key":"429_CR44","doi-asserted-by":"publisher","DOI":"10.4135\/9781412983419","volume-title":"ANOVA: Repeated Measures","author":"E Girden","year":"1992","unstructured":"Girden, E.: ANOVA: Repeated Measures. Sage (1992)"},{"key":"429_CR45","unstructured":"Gupta, G., Sahu, A.K., Lin, W.-Y.: Noisy batch active learning with deterministic annealing (2019). https:\/\/arxiv.org\/abs\/1909.12473"},{"key":"429_CR46","doi-asserted-by":"crossref","unstructured":"Liu, Z., Mao, H., Wu, C., Feichtenhofer, C., Darrell, T., Xie, S.: A ConvNet for the 2020s. (2022), https:\/\/arxiv.org\/abs\/2201.03545","DOI":"10.1109\/CVPR52688.2022.01167"}],"container-title":["International Journal on Document Analysis and Recognition (IJDAR)"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10032-023-00429-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10032-023-00429-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10032-023-00429-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,17]],"date-time":"2023-08-17T05:06:10Z","timestamp":1692248770000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10032-023-00429-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,25]]},"references-count":46,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,9]]}},"alternative-id":["429"],"URL":"https:\/\/doi.org\/10.1007\/s10032-023-00429-8","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-2273654\/v1","asserted-by":"object"}]},"ISSN":["1433-2833","1433-2825"],"issn-type":[{"value":"1433-2833","type":"print"},{"value":"1433-2825","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,25]]},"assertion":[{"value":"14 November 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 March 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 April 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 July 2023","order":4,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Update","order":5,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"This article was revised due to update in funding note","order":6,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}