{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:45:47Z","timestamp":1760244347293,"version":"build-2065373602"},"reference-count":12,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2022,12,11]],"date-time":"2022-12-11T00:00:00Z","timestamp":1670716800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Nowadays, digital transformation (DX) is the key concept to change and improve the operations in governments, companies, and schools. Therefore, any data should be digitized for processing by computers. Unfortunately, a lot of data and information are printed and handled on paper, although they may originally come from digital sources. Data on paper can be digitized using an optical character recognition (OCR) software. However, if the paper contains a table, it becomes difficult because of the separated characters by rows and columns there. It is necessary to solve the research question of \u201chow to convert a printed table on paper into an Excel table while keeping the relationships between the cells?\u201d In this paper, we propose a printed table digitization algorithm using image processing techniques and OCR software for it. First, the target paper is scanned into an image file. Second, each table is divided into a collection of cells where the topology information is obtained. Third, the characters in each cell are digitized by OCR software. Finally, the digitalized data are arranged in an Excel file using the topology information. We implement the algorithm on Python using OpenCV for the image processing library and Tesseract for the OCR software. For evaluations, we applied the proposal to 19 scanned and 17 screenshotted table images. The results show that for any image, the Excel file is generated with the correct structure, and some characters are misrecognized by OCR software. The improvement will be in future works.<\/jats:p>","DOI":"10.3390\/a15120471","type":"journal-article","created":{"date-parts":[[2022,12,12]],"date-time":"2022-12-12T04:05:22Z","timestamp":1670817922000},"page":"471","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A Proposal of Printed Table Digitization Algorithm with Image Processing"],"prefix":"10.3390","volume":"15","author":[{"given":"Chenrui","family":"Shi","sequence":"first","affiliation":[{"name":"Department of Information and Communication Systems, Graduate School of Natural Science and Technology, Okayama University, Okayama 700-8530, Japan"}]},{"given":"Nobuo","family":"Funabiki","sequence":"additional","affiliation":[{"name":"Department of Information and Communication Systems, Graduate School of Natural Science and Technology, Okayama University, Okayama 700-8530, Japan"}]},{"given":"Yuanzhi","family":"Huo","sequence":"additional","affiliation":[{"name":"Department of Information and Communication Systems, Graduate School of Natural Science and Technology, Okayama University, Okayama 700-8530, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5469-9724","authenticated-orcid":false,"given":"Mustika","family":"Mentari","sequence":"additional","affiliation":[{"name":"Department of Information Technology, State Polytechnic of Malang, Malang 65141, Indonesia"}]},{"given":"Kohei","family":"Suga","sequence":"additional","affiliation":[{"name":"Astrolab, Tokyo 107-0062, Japan"}]},{"given":"Takashi","family":"Toshida","sequence":"additional","affiliation":[{"name":"Astrolab, Tokyo 107-0062, Japan"}]}],"member":"1968","published-online":{"date-parts":[[2022,12,11]]},"reference":[{"key":"ref_1","unstructured":"(2022, July 02). What is Digital Transformation (DX)?. Available online: https:\/\/www.netapp.com\/devops-solutions\/what-is-digital-transformation\/."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Ohta, M., Yamada, R., Kanazawa, T., and Takasu, A. (2019, January 23\u201326). A cell-detection-based table-structure recognition method. Proceedings of the ACM Symposium on Document Engineering 2019, Berlin, Germany.","DOI":"10.1145\/3342558.3345412"},{"key":"ref_3","first-page":"41","article-title":"An Integrated Approach for Table Detection and Structure Recognition","volume":"1","author":"Phan","year":"2021","journal-title":"J. Inf. Technol. Commun."},{"key":"ref_4","first-page":"217","article-title":"Auto-Table-Extract: A System To Identify And Extract Tables From PDF To Excel","volume":"9","author":"Sahoo","year":"2020","journal-title":"Int. J. Sci. Technol. Res."},{"key":"ref_5","first-page":"3743","article-title":"Conversion of Image To Excel Using Ocr Technique","volume":"4","author":"Amitha","year":"2020","journal-title":"Int. Res. J. Mod. Eng. Technol. Sci."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Nagy, G., and Seth, S. (2016, January 4\u20138). Table headers: An entrance to the data mine. Proceedings of the 2016 IEEE 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico.","DOI":"10.1109\/ICPR.2016.7900270"},{"key":"ref_7","unstructured":"Li, M., Cui, L., Huang, S., Wei, F., Zhou, M., and Li, Z. (2019). Tablebank: A benchmark dataset for table detection and recognition. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1145\/2181796.2206309","article-title":"Realtime Computer Vision with OpenCV","volume":"10","author":"Pulli","year":"2012","journal-title":"Queue"},{"key":"ref_9","unstructured":"(2022, July 10). A Table Detection, Cell Recognition and Text Extraction Algorithm to Convert Tables in Images to Excel Files. Available online: https:\/\/towardsdatascience.com\/."},{"key":"ref_10","unstructured":"(2022, September 10). How to Convert a Table of Paper Data to Excel Data? We Can Scan Tables by Use Office Application in Smartphone. Available online: https:\/\/dekiru.net\/article\/21950\/."},{"key":"ref_11","unstructured":"(2022, September 10). Scan and Edit a Document. Available online: https:\/\/support.microsoft.com\/en-us\/office\/scan-and-edit-a-document-7a07a4bd-aca5-4ec5-ba73-4589ac8b9eed."},{"key":"ref_12","unstructured":"(2022, September 10). Table Detection, Table Extraction & Information Extraction Using DL. Available online: https:\/\/nanonets.com\/blog\/table-extraction-deep-learning\/."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/12\/471\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:38:06Z","timestamp":1760146686000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/12\/471"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,11]]},"references-count":12,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["a15120471"],"URL":"https:\/\/doi.org\/10.3390\/a15120471","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2022,12,11]]}}}