{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,1]],"date-time":"2025-12-01T15:10:56Z","timestamp":1764601856516,"version":"build-2065373602"},"reference-count":38,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2025,4,28]],"date-time":"2025-04-28T00:00:00Z","timestamp":1745798400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"MAGIC project"},{"name":"Ministry of Enterprise and Made in Italy"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Imaging"],"abstract":"<jats:p>Over the last decades, plenty of ancient manuscripts have been digitized all over the world, and particularly in Europe. The fruition of these huge digital archives is often limited by the bleed-through effect due to the acid nature of the inks used, resulting in very noisy images. Several authors have recently worked on bleed-through removal, using different approaches. With the aim of developing a bleed-through removal tool, capable of batch application on a large number of images, of the order of hundred thousands, we used machine learning and robust statistical methods with four different methods, and applied them to two medieval manuscripts. The methods used are (i) non-local means (NLM); (ii) Gaussian mixture models (GMMs); (iii) biweight estimation; and (iv) Gaussian blur. The application of these methods to the two quoted manuscripts shows that these methods are, in general, quite effective in bleed-through removal, but the selection of the method has to be performed according to the characteristics of the manuscript, e.g., if there is no ink fading and the difference between bleed-through pixels and the foreground text is clear, we can use a stronger model without the risk of losing important information. Conversely, if the distinction between bleed-through and foreground pixels is less pronounced, it is better to use a weaker model to preserve useful details.<\/jats:p>","DOI":"10.3390\/jimaging11050136","type":"journal-article","created":{"date-parts":[[2025,4,28]],"date-time":"2025-04-28T11:48:33Z","timestamp":1745840913000},"page":"136","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Minimizing Bleed-Through Effect in Medieval Manuscripts with Machine Learning and Robust Statistics"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-7445-8926","authenticated-orcid":false,"given":"Adriano","family":"Ettari","sequence":"first","affiliation":[{"name":"Department of Physics E. Pancini, University of Naples Federico II, Via Vicinale Cupa Cinthia, 26, 80126 Napoli, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9506-5680","authenticated-orcid":false,"given":"Massimo","family":"Brescia","sequence":"additional","affiliation":[{"name":"Department of Physics E. Pancini, University of Naples Federico II, Via Vicinale Cupa Cinthia, 26, 80126 Napoli, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-9527-3451","authenticated-orcid":false,"given":"Stefania","family":"Conte","sequence":"additional","affiliation":[{"name":"Department of Physics E. Pancini, University of Naples Federico II, Via Vicinale Cupa Cinthia, 26, 80126 Napoli, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-4437-0255","authenticated-orcid":false,"given":"Yahya","family":"Momtaz","sequence":"additional","affiliation":[{"name":"Department of Physics E. Pancini, University of Naples Federico II, Via Vicinale Cupa Cinthia, 26, 80126 Napoli, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5823-4393","authenticated-orcid":false,"given":"Guido","family":"Russo","sequence":"additional","affiliation":[{"name":"Department of Physics E. Pancini, University of Naples Federico II, Via Vicinale Cupa Cinthia, 26, 80126 Napoli, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2025,4,28]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Hanif, M., Tonazzini, A., Savino, P., Salerno, E., and Tsagkatakis, G. (2018, January 24\u201327). Document Bleed-Through Removal Using Sparse Image Inpainting. Proceedings of the 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), Vienna, Austria.","DOI":"10.1109\/DAS.2018.21"},{"key":"ref_2","unstructured":"Dubois, E., and Pathak, A. (2001, January 22\u201325). Reduction of bleed-through in scanned manuscript documents. Proceedings of the IS&T Conference on Image Processing, Image Quality, Image Capture Systems, Montreal, QC, Canada."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1007\/s10032-019-00323-2","article-title":"Bleed-through cancellation in non-rigidly misaligned recto\u2013verso archival manuscripts based on local registration","volume":"22","author":"Savino","year":"2019","journal-title":"Int. J. Doc. Anal. Recognit. (IJDAR)"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"11743","DOI":"10.1007\/s00521-023-09354-7","article-title":"Training a shallow NN to erase ink seepage in historical manuscripts based on a degradation model","volume":"36","author":"Savino","year":"2024","journal-title":"Neural Comput. Appl."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1007\/s11220-016-0134-7","article-title":"Global and local features based classification for bleed-through removal","volume":"17","author":"Hu","year":"2016","journal-title":"Sens. Imaging"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"489","DOI":"10.1016\/j.neucom.2005.12.126","article-title":"Extreme learning machine: Theory and applications","volume":"70","author":"Huang","year":"2006","journal-title":"Neurocomputing"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"5702","DOI":"10.1109\/TIP.2016.2614133","article-title":"Blind bleed-through removal for scanned historical document image with conditional random fields","volume":"25","author":"Sun","year":"2016","journal-title":"IEEE Trans. Image Process."},{"key":"ref_8","unstructured":"Lafferty, J., McCallum, A., and Pereira, F.C.N. (July, January 28). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proceedings of the Eighteenth International Conference on Machine Learning, Williamstown, MA, USA."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"da Costa Rocha, C., Deborah, H., and Hardeberg, J.Y. (2018, January 2\u20134). Ink bleed-through removal of historical manuscripts based on hyperspectral imaging. Proceedings of the Image and Signal Processing: 8th International Conference, ICISP 2018, Cherbourg, France. Proceedings 8.","DOI":"10.1007\/978-3-319-94211-7_51"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Hanif, M., Tonazzini, A., Hussain, S.F., Khalil, A., and Habib, U. (2023). Restoration and content analysis of ancient manuscripts via color space based segmentation. PLoS ONE, 18.","DOI":"10.1371\/journal.pone.0282142"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Villegas, M., and Toselli, A.H. (2014, January 1\u20134). Bleed-Through Removal by Learning a Discriminative Color Channel. Proceedings of the 2014 14th International Conference on Frontiers in Handwriting Recognition, Hersonissos, Greece.","DOI":"10.1109\/ICFHR.2014.16"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"012081","DOI":"10.1088\/1757-899X\/949\/1\/012081","article-title":"MAGIC: Manuscripts of Girolamini in Cloud","volume":"949","author":"Russo","year":"2020","journal-title":"IOP Conf. Ser. Mater. Sci. Eng."},{"key":"ref_13","unstructured":"Bucciero, A., Fanini, B., Graf, H., Pescarin, S., and Rizvic, S. (2023, January 4\u20136). The Role of Project MA.G.I.C. in the Context of the European Strategies for the Digitization of the Library and Archival Heritage. Proceedings of the Eurographics Workshop on Graphics and Cultural Heritage, Lecce, Italy."},{"key":"ref_14","unstructured":"Conte, S., Russo, G., Salvatore, M., and Tortora, A. (2024, January 18\u201319). Application of the IBiSCo Data Canter for cultural heritage projects. Proceedings of the Final Workshop for the Italian PON IBiSCo Project, Naples, Italy."},{"key":"ref_15","unstructured":"Conte, S., Mazzucchi, A., Russo, G., Tortora, A., and Tortora, G. (2024, January 28\u201330). The organization and management of the MAGIC project for ancient manuscripts digitization: Connections between Mediterranean cultures. Proceedings of the AIUCD 2024. MeTe Digital: Mediterranean Networks Between Texts and Contexts, XIII Convegno Nazionale, Catania, Italy."},{"key":"ref_16","unstructured":"Conte, S., Di Domenico, G.M., Mazzei, A., Mazzucchi, A., Russo, G., Salvi, A., and Tortora, A. (2024, January 22\u201323). The MAGIC project: First research results. Proceedings of the Information and Research Science Connecting to Digital and Library Science, Bressanone-Brixen, Italy."},{"key":"ref_17","unstructured":"Conte, S., Ferrante, G., Laccetti, L., Mazzucchi, A., Momtaz, Y., and Tortora, A. (2024, January 5\u20136). Content representation and analysis: The Magic Project and the Illuminated Dante Project integrated systems for multimedia information retrieval. Proceedings of the 14th Italian Information Retrieval Workshop, Udine, Italy."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"208","DOI":"10.5201\/ipol.2011.bcm_nlm","article-title":"Non-local means denoising","volume":"1","author":"Buades","year":"2011","journal-title":"Image Process. Line"},{"key":"ref_19","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2015, January 5\u20139). U-net: Convolutional networks for biomedical image segmentation. Proceedings of the Medical Image Computing and Computer-Assisted Intervention\u2013MICCAI 2015: 18th International Conference, Munich, Germany. Proceedings, Part III 18."},{"key":"ref_20","unstructured":"Hinton, G.E., and Zemel, R. (December, January 29). Autoencoders, minimum description length and Helmholtz free energy. Proceedings of the 7th International Conference on Neural Information Processing Systems, Denver, CO, USA."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1109\/TSMC.1979.4310076","article-title":"A Threshold Selection Method from Gray-Level Histograms","volume":"9","author":"Otsu","year":"1979","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"105","DOI":"10.6028\/jres.088.006","article-title":"The efficiency of the biweight as a robust estimator of location","volume":"88","author":"Kafadar","year":"1983","journal-title":"J. Res. Natl. Bur. Stand."},{"key":"ref_23","unstructured":"Mosteller, F., and Tukey, J.W. (1977). Data Analysis and Regression. A Second Course in Statistics, Addison-Wesley Publishing Company."},{"key":"ref_24","first-page":"1","article-title":"Kappa sigma clipping","volume":"367","author":"Lehmann","year":"2006","journal-title":"Afr. Insight"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. Ser. B (Methodol.)"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"716","DOI":"10.1109\/TAC.1974.1100705","article-title":"A new look at the statistical model identification","volume":"19","author":"Akaike","year":"1974","journal-title":"IEEE Trans. Autom. Control"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1214\/aos\/1176344136","article-title":"Estimating the Dimension of a Model","volume":"6","author":"Schwarz","year":"1978","journal-title":"Ann. Stat."},{"key":"ref_28","unstructured":"Fix, E. (1985). Discriminatory Analysis: Nonparametric Discrimination, Consistency Properties, USAF School of Aviation Medicine."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: A graphical aid to the interpretation and validation of cluster analysis","volume":"20","author":"Rousseeuw","year":"1987","journal-title":"J. Comput. Appl. Math."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1007\/BF02289263","article-title":"Who belongs in the family?","volume":"18","author":"Thorndike","year":"1953","journal-title":"Psychometrika"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1395","DOI":"10.1109\/ICDAR.2017.228","article-title":"ICDAR2017 competition on document image binarization (DIBCO 2017)","volume":"Volume 1","author":"Pratikakis","year":"2017","journal-title":"Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Pratikakis, I., Zagoris, K., Karagiannis, X., Tsochatzidis, L., Mondal, T., and Marthot-Santaniello, I. (2019, January 20\u201325). ICDAR 2019 Competition on Document Image Binarization (DIBCO 2019). Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, NSW, Australia.","DOI":"10.1109\/ICDAR.2019.00249"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Tensmeyer, C., Davis, B., Wigington, C., Lee, I., and Barrett, B. (2017, January 10\u201311). Pagenet: Page boundary extraction in historical handwritten documents. Proceedings of the 4th International Workshop on Historical Document Imaging and Processing, Kyoto, Japan.","DOI":"10.1145\/3151509.3151522"},{"key":"ref_34","unstructured":"Oliveira, S.A., Seguin, B., and Kaplan, F. (2018, January 5\u20138). dhSegment: A generic deep-learning approach for document segmentation. Proceedings of the 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), Niagara Falls, NY, USA."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"5929","DOI":"10.1007\/s11263-024-02168-7","article-title":"Exploiting diffusion prior for real-world image super-resolution","volume":"132","author":"Wang","year":"2024","journal-title":"Int. J. Comput. Vis."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Gr\u00fcning, T., Labahn, R., Diem, M., Kleber, F., and Fiel, S. (2018, January 24\u201327). READ-BAD: A new dataset and evaluation scheme for baseline detection in archival documents. Proceedings of the 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), Vienna, Austria.","DOI":"10.1109\/DAS.2018.38"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1109\/TIP.2012.2219550","article-title":"Performance Evaluation Methodology for Historical Document Image Binarization","volume":"22","author":"Ntirogiannis","year":"2013","journal-title":"IEEE Trans. Image Process."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1109\/LSP.2003.821748","article-title":"Distance-reciprocal distortion measure for binary document images","volume":"11","author":"Lu","year":"2004","journal-title":"IEEE Signal Process. Lett."}],"container-title":["Journal of Imaging"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2313-433X\/11\/5\/136\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:23:32Z","timestamp":1760030612000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2313-433X\/11\/5\/136"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,28]]},"references-count":38,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2025,5]]}},"alternative-id":["jimaging11050136"],"URL":"https:\/\/doi.org\/10.3390\/jimaging11050136","relation":{},"ISSN":["2313-433X"],"issn-type":[{"type":"electronic","value":"2313-433X"}],"subject":[],"published":{"date-parts":[[2025,4,28]]}}}