{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T16:57:29Z","timestamp":1769187449918,"version":"3.49.0"},"reference-count":32,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T00:00:00Z","timestamp":1751328000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T00:00:00Z","timestamp":1751328000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100009056","name":"University of West Bohemia","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100009056","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["IJDAR"],"published-print":{"date-parts":[[2025,9]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Historical document analysis plays a crucial role in understanding and preserving our past. However, this task is often hindered by challenges such as limited annotated training data and the diverse nature of historical handwritten documents. In this paper, we explore the potential of self-supervised learning (SSL) in historical document analysis, with a particular focus on historical handwritten document segmentation, to overcome the need for extensive annotated data while enhancing efficiency and robustness. We present an overview of SSL methods suitable for historical document analysis and discuss their potential applications and benefits. Furthermore, we present an approach for SSL in the document domain, considering various setups, augmentations, and resolutions. We also provide experimental results that demonstrate its feasibility and effectiveness. Our findings indicate that most document segmentation tasks can be effectively addressed using SSL features, highlighting the potential of SSL to advance historical document analysis and pave the way for more efficient and robust document processing workflows.<\/jats:p>","DOI":"10.1007\/s10032-025-00538-6","type":"journal-article","created":{"date-parts":[[2025,7,1]],"date-time":"2025-07-01T03:24:46Z","timestamp":1751340286000},"page":"329-344","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["On self-supervision in historical handwritten document segmentation"],"prefix":"10.1007","volume":"28","author":[{"given":"Josef","family":"Baloun","sequence":"first","affiliation":[]},{"given":"Martin","family":"Prantl","sequence":"additional","affiliation":[]},{"given":"Ladislav","family":"Lenc","sequence":"additional","affiliation":[]},{"given":"Ji\u0159\u00ed","family":"Mart\u00ednek","sequence":"additional","affiliation":[]},{"given":"Pavel","family":"Kr\u00e1l","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,7,1]]},"reference":[{"key":"538_CR1","doi-asserted-by":"publisher","unstructured":"Ares\u00a0Oliveira, S., Seguin, B., Kaplan, F.: dhSegment: a generic deep-learning approach for document segmentation. In: 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 7\u201312, (2018). https:\/\/doi.org\/10.1109\/ICFHR-2018.2018.00011","DOI":"10.1109\/ICFHR-2018.2018.00011"},{"key":"538_CR2","doi-asserted-by":"publisher","unstructured":"Baloun, J., Kr\u00e1l, P., Lenc, L.: ChronSeg: Novel dataset for segmentation of handwritten historical chronicles. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence - ICAART,, INSTICC. SciTePress, Vol. 2, pp. 314\u2013322, (2021). https:\/\/doi.org\/10.5220\/0010317203140322","DOI":"10.5220\/0010317203140322"},{"key":"538_CR3","unstructured":"Bao, H., Dong, L., Piao, S., et\u00a0al.: BEiT: Bert pre-training of image transformers. arXiv: 2106.08254 (2022)"},{"key":"538_CR4","unstructured":"Chen, T., Kornblith, S., Norouzi, M., et\u00a0al.: A simple framework for contrastive learning of visual representations. In: Proceedings of the 37th International Conference on Machine Learning. JMLR.org, ICML\u201920 (2020)"},{"key":"538_CR5","doi-asserted-by":"publisher","unstructured":"Christlein, V., Nicolaou, A., Seuret, M., et\u00a0al.: Icdar 2019 competition on image retrieval for historical handwritten documents. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1505\u20131509, (2019). https:\/\/doi.org\/10.1109\/ICDAR.2019.00242","DOI":"10.1109\/ICDAR.2019.00242"},{"key":"538_CR6","doi-asserted-by":"publisher","unstructured":"Cosma, A., Ghidoveanu, M., Panaitescu-Liess, M., et\u00a0al.: Self-supervised representation learning on document images. In: Bai, X., Karatzas, D., Lopresti, D. (eds.) Document Analysis Systems. pp. 103\u2013117, Springer International Publishing, Cham, (2020) https:\/\/doi.org\/10.1007\/978-3-030-57058-3_8","DOI":"10.1007\/978-3-030-57058-3_8"},{"key":"538_CR7","doi-asserted-by":"publisher","unstructured":"Devlin, J., Chang, M.W., Lee, K., et\u00a0al.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., Solorio, T. (eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, pp. 4171\u20134186, (2019).https:\/\/doi.org\/10.18653\/v1\/N19-1423,","DOI":"10.18653\/v1\/N19-1423"},{"key":"538_CR8","doi-asserted-by":"publisher","unstructured":"Diem, M., Kleber, F., Fiel, S., et\u00a0al.: cBAD: Icdar2017 competition on baseline detection. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), pp. 1355\u20131360, (2017). https:\/\/doi.org\/10.1109\/ICDAR.2017.222","DOI":"10.1109\/ICDAR.2017.222"},{"key":"538_CR9","doi-asserted-by":"publisher","unstructured":"Fiel, S., Kleber, F., Diem, M., et\u00a0al.: Icdar2017 competition on historical document writer identification (historical-wi). In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), pp. 1377\u20131382, (2017). https:\/\/doi.org\/10.1109\/ICDAR.2017.225","DOI":"10.1109\/ICDAR.2017.225"},{"key":"538_CR10","doi-asserted-by":"publisher","unstructured":"Gatos, B., Stamatopoulos, N., Louloudis, G., et\u00a0al.: GRPOLY-DB: An old greek polytonic document image database. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 646\u2013650, (2015). https:\/\/doi.org\/10.1109\/ICDAR.2015.7333841","DOI":"10.1109\/ICDAR.2015.7333841"},{"key":"538_CR11","unstructured":"Goodfellow, I., Pouget-Abadie, J., Mirza, M., et\u00a0al.: Generative adversarial nets. In: Ghahramani Z, Welling M, Cortes C, et\u00a0al (eds) Advances in Neural Information Processing Systems, vol\u00a027. Curran Associates, Inc., (2014). https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2014\/file\/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf"},{"key":"538_CR12","first-page":"21271","volume":"33","author":"JB Grill","year":"2020","unstructured":"Grill, J.B., Strub, F., Altch\u00e9, F., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271\u201321284 (2020)","journal-title":"Advances in neural information processing systems"},{"key":"538_CR13","doi-asserted-by":"publisher","unstructured":"Gupta, A., Vedaldi, A., Zisserman, A.: Learning to read by spelling: Towards unsupervised text recognition. In: Proceedings of the 11th Indian Conference on Computer Vision, Graphics and Image Processing. Association for Computing Machinery, New York, NY, USA, ICVGIP \u201918, (2020). https:\/\/doi.org\/10.1145\/3293353.3293386,","DOI":"10.1145\/3293353.3293386"},{"key":"538_CR14","doi-asserted-by":"publisher","unstructured":"Kim, G., Hong, T., Yim, M., et\u00a0al.: Ocr-free document understanding transformer. In: Computer Vision \u2013 ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23\u201327, 2022, Proceedings, Part XXVIII. Springer-Verlag, Berlin, Heidelberg, p. 498\u2013517, (2022). https:\/\/doi.org\/10.1007\/978-3-031-19815-1_29,","DOI":"10.1007\/978-3-031-19815-1_29"},{"key":"538_CR15","doi-asserted-by":"publisher","unstructured":"Kleber, F., Fiel, S., Diem, M., et\u00a0al.: CVL-DataBase: An off-line database for writer retrieval, writer identification and word spotting. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 560\u2013564, (2013).https:\/\/doi.org\/10.1109\/ICDAR.2013.117","DOI":"10.1109\/ICDAR.2013.117"},{"key":"538_CR16","doi-asserted-by":"publisher","unstructured":"Li, J., Xu, Y., Lv, T., et\u00a0al.: DiT: Self-supervised pre-training for document image transformer. In: Proceedings of the 30th ACM International Conference on Multimedia. Association for Computing Machinery, New York, NY, USA, MM \u201922, p. 3530\u20133539, (2022). https:\/\/doi.org\/10.1145\/3503161.3547911,","DOI":"10.1145\/3503161.3547911"},{"key":"538_CR17","doi-asserted-by":"publisher","unstructured":"Li, P., Gu, J., Kuen, J., et\u00a0al.: SelfDoc: Self-supervised document representation learning. In: 2021 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5648\u20135656, (2021). https:\/\/doi.org\/10.1109\/CVPR46437.2021.00560","DOI":"10.1109\/CVPR46437.2021.00560"},{"key":"538_CR18","unstructured":"Loshchilov, I., Hutter, F.: Fixing weight decay regularization in adam. CoRR arXiv:1711.05101 (2017)"},{"key":"538_CR19","doi-asserted-by":"publisher","unstructured":"Maity, S., Biswas, S., Manna, S., et\u00a0al.: SelfDocSeg: A Self-supervised Vision-Based Approach Towards Document Segmentation, Springer Nature Switzerland, p. 342\u2013360. (2023). https:\/\/doi.org\/10.1007\/978-3-031-41676-7_20,","DOI":"10.1007\/978-3-031-41676-7_20"},{"key":"538_CR20","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1007\/s100320200071","volume":"5","author":"UV Marti","year":"2002","unstructured":"Marti, U.V., Bunke, H.: The IAM-database: an english sentence database for offline handwriting recognition. International Journal on Document Analysis and Recognition 5, 39\u201346 (2002)","journal-title":"International Journal on Document Analysis and Recognition"},{"key":"538_CR21","doi-asserted-by":"crossref","unstructured":"Michael, J., Labahn, R., Gr\u00fcning, T., et\u00a0al.: Evaluating sequence-to-sequence models for handwritten text recognition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), IEEE, pp. 1286\u20131293 (2019)","DOI":"10.1109\/ICDAR.2019.00208"},{"key":"538_CR22","unstructured":"Oquab, M., Darcet, T., Moutakanni, T., et\u00a0al.: DINOv2: Learning robust visual features without supervision. Transactions on Machine Learning Research (2024). https:\/\/openreview.net\/forum?id=a68SUt6zFt"},{"key":"538_CR23","doi-asserted-by":"publisher","unstructured":"Pletschacher, S., Antonacopoulos, A.: The PAGE (page analysis and ground-truth elements) format framework. In: 2010 20th International Conference on Pattern Recognition, pp. 257\u2013260, (2010).https:\/\/doi.org\/10.1109\/ICPR.2010.72","DOI":"10.1109\/ICPR.2010.72"},{"key":"538_CR24","unstructured":"Pramanik, S., Mujumdar, S., Patel, H.: Towards a multi-modal, multi-task learning based pre-training framework for document representation learning. arXiv preprint arXiv:2009.14457 (2020)"},{"key":"538_CR25","doi-asserted-by":"publisher","unstructured":"Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., et\u00a0al. (eds.) Medical Image Computing and Computer-Assisted Intervention \u2013 MICCAI 2015. Springer International Publishing, Cham, pp. 234\u2013241, (2015). https:\/\/doi.org\/10.1007\/978-3-319-24574-4_28","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"538_CR26","doi-asserted-by":"publisher","unstructured":"Simistira, F., Seuret, M., Eichenberger, N., et\u00a0al.: DIVA-HisDB: A precisely annotated large dataset of challenging medieval manuscripts. In: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 471\u2013476, (2016).https:\/\/doi.org\/10.1109\/ICFHR.2016.0093","DOI":"10.1109\/ICFHR.2016.0093"},{"key":"538_CR27","doi-asserted-by":"publisher","unstructured":"S\u00e1nchez, J.A., Romero, V., Toselli, A.H., et\u00a0al.: ICFHR2014 competition on handwritten text recognition on transcriptorium datasets (htrts). In: 2014 14th International Conference on Frontiers in Handwriting Recognition, pp. 785\u2013790, (2014). https:\/\/doi.org\/10.1109\/ICFHR.2014.137","DOI":"10.1109\/ICFHR.2014.137"},{"key":"538_CR28","doi-asserted-by":"publisher","unstructured":"S\u00e1nchez, J.A., Romero, V., Toselli, A.H., et\u00a0al.: ICFHR2016 competition on handwritten text recognition on the READ dataset. In: 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 630\u2013635, (2016). https:\/\/doi.org\/10.1109\/ICFHR.2016.0120","DOI":"10.1109\/ICFHR.2016.0120"},{"key":"538_CR29","doi-asserted-by":"publisher","unstructured":"Wallace, E., Wang, Y., Li, S., et\u00a0al.: Do NLP models know numbers? probing numeracy in embeddings. In: Inui, K., Jiang, J., Ng, V., et\u00a0al.: (eds.) Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, pp. 5307\u20135315, (2019). https:\/\/doi.org\/10.18653\/v1\/D19-1534,","DOI":"10.18653\/v1\/D19-1534"},{"key":"538_CR30","doi-asserted-by":"publisher","unstructured":"Xu, Y., Li, M., Cui, L., et\u00a0al.: LayoutLM: Pre-training of text and layout for document image understanding. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Association for Computing Machinery, New York, NY, USA, KDD \u201920, p. 1192\u20131200, (2020). https:\/\/doi.org\/10.1145\/3394486.3403172,","DOI":"10.1145\/3394486.3403172"},{"key":"538_CR31","unstructured":"Yu, Y., Li, Y., Zhang, C., et\u00a0al.: Structextv2: Masked visual-textual prediction for document image pre-training. arXiv: 2303.00289 (2023)"},{"key":"538_CR32","unstructured":"Zbontar, J., Jing, L., Misra, I., et\u00a0al.: Barlow twins: Self-supervised learning via redundancy reduction. In: International conference on machine learning, PMLR, pp. 12310\u201312320 (2021)"}],"container-title":["International Journal on Document Analysis and Recognition (IJDAR)"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10032-025-00538-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10032-025-00538-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10032-025-00538-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,20]],"date-time":"2025-09-20T08:38:38Z","timestamp":1758357518000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10032-025-00538-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,1]]},"references-count":32,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,9]]}},"alternative-id":["538"],"URL":"https:\/\/doi.org\/10.1007\/s10032-025-00538-6","relation":{},"ISSN":["1433-2833","1433-2825"],"issn-type":[{"value":"1433-2833","type":"print"},{"value":"1433-2825","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,1]]},"assertion":[{"value":"23 August 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 February 2025","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 June 2025","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 July 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}