{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,25]],"date-time":"2026-02-25T18:13:28Z","timestamp":1772043208345,"version":"3.50.1"},"reference-count":42,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2024,3,5]],"date-time":"2024-03-05T00:00:00Z","timestamp":1709596800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Imaging"],"abstract":"<jats:p>Text line segmentation is a necessary preliminary step before most text transcription algorithms are applied. The leading deep learning networks used in this context (ARU-Net, dhSegment, and Doc-UFCN) are based on the U-Net architecture. They are efficient, but fall under the same concept, requiring a post-processing step to perform instance (e.g., text line) segmentation. In the present work, we test the advantages of Mask-RCNN, which is designed to perform instance segmentation directly. This work is the first to directly compare Mask-RCNN- and U-Net-based networks on text segmentation of historical documents, showing the superiority of the former over the latter. Three studies were conducted, one comparing these networks on different historical databases, another comparing Mask-RCNN with Doc-UFCN on a private historical database, and a third comparing the handwritten text recognition (HTR) performance of the tested networks. 
The results showed that Mask-RCNN outperformed ARU-Net, dhSegment, and Doc-UFCN using relevant line segmentation metrics, that performance evaluation should not focus on the raw masks generated by the networks, that a light mask processing is an efficient and simple solution to improve evaluation, and that Mask-RCNN leads to better HTR performance.<\/jats:p>","DOI":"10.3390\/jimaging10030065","type":"journal-article","created":{"date-parts":[[2024,3,5]],"date-time":"2024-03-05T08:35:54Z","timestamp":1709627754000},"page":"65","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Historical Text Line Segmentation Using Deep Learning Algorithms: Mask-RCNN against U-Net Networks"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-9772-5832","authenticated-orcid":false,"given":"Florian C\u00f4me","family":"Fizaine","sequence":"first","affiliation":[{"name":"LEAD-CNRS, Universit\u00e9 de Bourgogne, 21000 Dijon, France"},{"name":"Archives D\u00e9partementales de C\u00f4te d\u2019Or, 21000 Dijon, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5464-4766","authenticated-orcid":false,"given":"Patrick","family":"Bard","sequence":"additional","affiliation":[{"name":"LEAD-CNRS, Universit\u00e9 de Bourgogne, 21000 Dijon, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6026-9453","authenticated-orcid":false,"given":"Michel","family":"Paindavoine","sequence":"additional","affiliation":[{"name":"LEAD-CNRS, Universit\u00e9 de Bourgogne, 21000 Dijon, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"C\u00e9cile","family":"Robin","sequence":"additional","affiliation":[{"name":"Archives D\u00e9partementales de C\u00f4te d\u2019Or, 21000 Dijon, France"},{"name":"Institut National du Patrimoine, 75002 Paris, 
France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Edouard","family":"Bouy\u00e9","sequence":"additional","affiliation":[{"name":"Archives D\u00e9partementales de C\u00f4te d\u2019Or, 21000 Dijon, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rapha\u00ebl","family":"Lef\u00e8vre","sequence":"additional","affiliation":[{"name":"Soci\u00e9t\u00e9 Nationale des Chemins de fer Fran\u00e7ais, 93200 Saint Denis, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4108-6106","authenticated-orcid":false,"given":"Annie","family":"Vinter","sequence":"additional","affiliation":[{"name":"LEAD-CNRS, Universit\u00e9 de Bourgogne, 21000 Dijon, France"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,3,5]]},"reference":[{"key":"ref_1","unstructured":"Archives, F.N. (1997). Gallica, The BnF Digital Library."},{"key":"ref_2","unstructured":"Nadeau, C., Haliwell, W., Roberts, K., and Roberts, G. (1980). Psychology of Motor Behavior and Sport, Human Kinetic Publisher."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1007\/s10032-006-0023-z","article-title":"Text line segmentation of historical documents: A survey","volume":"9","author":"Zahour","year":"2007","journal-title":"Int. J. Doc. Anal. Recognit. (IJDAR)"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Diem, M., Kleber, F., Fiel, S., Gruning, T., and Gatos, B. (2017, January 9\u201315). cBAD: ICDAR2017 Competition on Baseline Detection. Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan.","DOI":"10.1109\/ICDAR.2017.222"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Kurar Barakat, B., Cohen, R., Droby, A., Rabaev, I., and El-Sana, J. (2020). Learning-Free Text Line Segmentation for Historical Handwritten Documents. Appl. 
Sci., 10.","DOI":"10.3390\/app10228276"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Nguyen, T.N., Burie, J.C., Le, T.L., and Schweyer, A.V. (2022, January 21\u201325). An effective method for text line segmentation in historical document images. Proceedings of the 2022 26th International Conference on Pattern Recognition (ICPR), Montreal, QC, Canada.","DOI":"10.1109\/ICPR56361.2022.9956617"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-Based Learning Applied to Document Recognition","volume":"86","author":"Lecun","year":"1998","journal-title":"Proc. IEEE"},{"key":"ref_8","first-page":"3523","article-title":"Image Segmentation Using Deep Learning: A Survey","volume":"44","author":"Minaee","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016). You Only Look Once: Unified, Real-Time Object Detection. arXiv.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016). SSD: Single Shot MultiBox Detector. arXiv.","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015). Fast R-CNN. arXiv.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Cl\u00e9rice, T. (2022). You Actually Look Twice At it (YALTAi): Using an object detection approach instead of region segmentation within the Kraken engine. arXiv.","DOI":"10.46298\/jdmdh.9806"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. 
arXiv.","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R. (2018). Mask R-CNN. arXiv.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"499","DOI":"10.1007\/s42979-022-01407-3","article-title":"A Survey on Object Instance Segmentation","volume":"3","author":"Sharma","year":"2022","journal-title":"SN Comput. Sci."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"535","DOI":"10.3390\/signals3030032","article-title":"Text Line Extraction in Historical Documents Using Mask R-CNN","volume":"3","author":"Droby","year":"2022","journal-title":"Signals"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1007\/s10032-022-00395-7","article-title":"Robust text line detection in historical documents: Learning and evaluation methods","volume":"25","author":"Boillet","year":"2022","journal-title":"Int. J. Doc. Anal. Recognit. (IJDAR)"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Simistira, F., Seuret, M., Eichenberger, N., Garz, A., Liwicki, M., and Ingold, R. (2016, January 23\u201326). DIVA-HisDB: A Precisely Annotated Large Dataset of Challenging Medieval Manuscripts. Proceedings of the 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, China.","DOI":"10.1109\/ICFHR.2016.0093"},{"key":"ref_19","unstructured":"Stutzmann, D., Torres Aguilar, S., and Chaffenet, P. (2024, February 26). HOME-Alcar: Aligned and Annotated Cartularies. Available online: https:\/\/doi.org\/10.5281\/zenodo.5600884."},{"key":"ref_20","unstructured":"Oliveira, S.A., Seguin, B., and Kaplan, F. (2018, January 5\u20138). dhSegment: A generic deep-learning approach for document segmentation. 
Proceedings of the 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), Niagara Falls, NY, USA."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1007\/s10032-019-00332-1","article-title":"A Two-Stage Method for Text Line Detection in Historical Documents","volume":"22","author":"Leifert","year":"2019","journal-title":"Int. J. Doc. Anal. Recognit. (IJDAR)"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Boillet, M., Maarand, M., Paquet, T., and Kermorvant, C. (2021, January 13\u201318). Including Keyword Position in Image-based Models for Act Segmentation of Historical Registers. Proceedings of the 6th International Workshop on Historical Document Imaging and Processing, New York, NY, USA. HIP \u201921.","DOI":"10.1145\/3476887.3476905"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Renton, G., Chatelain, C., Adam, S., Kermorvant, C., and Paquet, T. (2017, January 9\u201315). Handwritten Text Line Segmentation Using Fully Convolutional Network. Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan.","DOI":"10.1109\/ICDAR.2017.321"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Ren, S., He, K., Girshick, R., and Sun, J. (2016). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv.","DOI":"10.1109\/TPAMI.2016.2577031"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Vuola, A.O., Akram, S.U., and Kannala, J. (2019). Mask-RCNN and U-net Ensembled for Nuclei Segmentation. arXiv.","DOI":"10.1109\/ISBI.2019.8759574"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"gfac105-002","DOI":"10.1093\/ndt\/gfac105.002","article-title":"FC046: Automated Mest-C Classification in IGA Nephropathy using Deep-Learning based Segmentation","volume":"37","author":"Marechal","year":"2022","journal-title":"Nephrol. Dial. 
Transplant."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Van Wymelbeke-Delannoy, V., Juhel, C., Bole, H., Sow, A.K., Guyot, C., Belbaghdadi, F., Brousse, O., and Paindavoine, M. (2022). A Cross-Sectional Reproducibility Study of a Standard Camera Sensor Using Artificial Intelligence to Assess Food Items: The FoodIntech Project. Nutrients, 14.","DOI":"10.3390\/nu14010221"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"154435","DOI":"10.1109\/ACCESS.2021.3128536","article-title":"Accurate Fine-grained Layout Analysis for the Historical Tibetan Document Based on the Instance Segmentation","volume":"9","author":"Zhao","year":"2021","journal-title":"IEEE Access"},{"key":"ref_29","unstructured":"Wang, X., Zhang, R., Kong, T., Li, L., and Shen, C. (2020). SOLOv2: Dynamic and Fast Instance Segmentation. arXiv."},{"key":"ref_30","unstructured":"Fizaine, F.C., Robin, C., and Paindavoine, M. (2021, January 8\u201310). Transcription Automatique de textes du XVIIIe si\u00e8cle \u00e0 l\u2019aide de l\u2019intelligence artificielle. Proceedings of the Conference of AI4LAM Les Futurs Fantastiques, Paris, France. Available online: https:\/\/www.bnf.fr\/fr\/les-futurs-fantastiques."},{"key":"ref_31","unstructured":"Fizaine, F.C., and Bouy\u00e9, E. (2022, January 23\u201324). Lettres en Lumi\u00e8res. Proceedings of the Conference of CremmaLab Documents Anciens et Reconnaissance Automatique des \u00c9critures Manuscrites, Paris, France."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1109\/TSMC.1979.4310076","article-title":"A Threshold Selection Method from Gray-Level Histograms","volume":"9","author":"Otsu","year":"1979","journal-title":"IEEE Trans. Syst. Man, Cybern."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Mechi, O., Mehri, M., Ingold, R., and Essoukri Ben Amara, N. (2019, January 20\u201325). Text Line Segmentation in Historical Document Images Using an Adaptive U-Net Architecture. 
Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, NSW, Australia.","DOI":"10.1109\/ICDAR.2019.00066"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The Pascal Visual Object Classes (VOC) Challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Wick, C., and Puppe, F. (2018, January 24\u201327). Fully Convolutional Neural Networks for Page Segmentation of Historical Document Images. Proceedings of the 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), Vienna, Austria.","DOI":"10.1109\/DAS.2018.39"},{"key":"ref_36","unstructured":"Li, M., Lv, T., Cui, L., Lu, Y., Florencio, D., Zhang, C., Li, Z., and Wei, F. (2021). TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). ImageNet: A large-scale hierarchical image database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1007\/s100320200071","article-title":"The IAM-database: An English sentence database for offline handwriting recognition","volume":"5","author":"Marti","year":"2002","journal-title":"Int. J. Doc. Anal. Recognit."},{"key":"ref_39","unstructured":"S\u00e1nchez, J.A., Romero, V., Toselli, A.H., and Vidal, E. (2024, February 26). Available online: https:\/\/doi.org\/10.5281\/zenodo.218236."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Boillet, M., Bonhomme, M.L., Stutzmann, D., and Kermorvant, C. (2019, January 20\u201321). HORAE: An annotated dataset of books of hours. 
Proceedings of the 5th International Workshop on Historical Document Imaging and Processing, Sydney, NSW, Australia.","DOI":"10.1145\/3352631.3352633"},{"key":"ref_41","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017). Attention Is All You Need. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., and Xie, S. (2022). A ConvNet for the 2020s. arXiv.","DOI":"10.1109\/CVPR52688.2022.01167"}],"container-title":["Journal of Imaging"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2313-433X\/10\/3\/65\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T14:09:31Z","timestamp":1760105371000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2313-433X\/10\/3\/65"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,5]]},"references-count":42,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2024,3]]}},"alternative-id":["jimaging10030065"],"URL":"https:\/\/doi.org\/10.3390\/jimaging10030065","relation":{},"ISSN":["2313-433X"],"issn-type":[{"value":"2313-433X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,5]]}}}