{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,8]],"date-time":"2025-02-08T03:40:11Z","timestamp":1738986011974,"version":"3.37.0"},"edition-number":"1","reference-count":38,"publisher":"Wiley","isbn-type":[{"type":"print","value":"9780471383932"},{"type":"electronic","value":"9780470050118"}],"license":[{"start":{"date-parts":[[2009,3,16]],"date-time":"2009-03-16T00:00:00Z","timestamp":1237161600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/doi.wiley.com\/10.1002\/tdm_license_1.1"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Paper archives need to be converted to electronic form to enable the search and organization of documents. This article provides a brief introduction to the basic ideas and the remaining challenges. Essentially, this involves first scanning the image, then discovering the page layout to separate text segments and finally recognizing characters and words. The basics of scanning, image preprocessing steps, page segmentation and recognition are described for printed documents followed by a brief discussion of large vocabulary handwriting recognition. Other issues discussed include the detection of text against image backgrounds, language identification and datasets and evaluation. Modern good quality documents with printed fonts can be well recognized but poorer quality print recognition as well as handwriting recognition still remain major research challenges.<\/jats:p>","DOI":"10.1002\/9780470050118.ecse667","type":"other","created":{"date-parts":[[2009,3,9]],"date-time":"2009-03-09T17:48:02Z","timestamp":1236620882000},"page":"1022-1031","source":"Crossref","is-referenced-by-count":0,"title":["Document Image Analysis and Recognition"],"prefix":"10.1002","author":[{"given":"R.","family":"Manmatha","sequence":"first","affiliation":[]}],"member":"311","published-online":{"date-parts":[[2009,3,16]]},"reference":[{"key":"e_1_2_15_2_1","doi-asserted-by":"crossref","unstructured":"G.Mori andJ.Malik Recognizing objects in adversarial clutter: breaking a visual captcha.Proc. Computer Vision and Pattern Recognition Vol. 1 pp.I\u2010134\u2013I\u2010141 2003.","DOI":"10.1109\/CVPR.2003.1211347"},{"key":"e_1_2_15_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4615-5021-1"},{"key":"e_1_2_15_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.824820"},{"volume-title":"Document image analysis","year":"1995","author":"O'Gorman L.","key":"e_1_2_15_5_1"},{"key":"e_1_2_15_6_1","doi-asserted-by":"crossref","unstructured":"T. M.Breuel An algorithm for finding maximal whitespace rectangles at arbitrary orientations for document layout analysis Proc. of the 7th Int'l Conf. on Document Analysis and Recognition Vol. 1 Edinburgh Scotland 2003 pp.66\u201370.","DOI":"10.1109\/ICDAR.2003.1227629"},{"key":"e_1_2_15_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSMC.1979.4310076"},{"key":"e_1_2_15_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.368197"},{"key":"e_1_2_15_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0031-3203(99)00055-2"},{"key":"e_1_2_15_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01212455"},{"key":"e_1_2_15_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.244677"},{"key":"e_1_2_15_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.144436"},{"key":"e_1_2_15_13_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.266.0647"},{"key":"e_1_2_15_14_1","doi-asserted-by":"publisher","DOI":"10.1006\/cviu.1998.0684"},{"key":"e_1_2_15_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2004.14"},{"key":"e_1_2_15_16_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218001401000848"},{"key":"e_1_2_15_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.908966"},{"key":"e_1_2_15_18_1","doi-asserted-by":"crossref","unstructured":"V.Lavrenko T. M.Rath andR.Manmatha Holistic word recognition for handwritten historical documents Proc. of the Workshop on Document Image Analysis for Libraries DIAL'04 2004 pp.278\u2013287.","DOI":"10.1109\/DIAL.2004.1263256"},{"key":"e_1_2_15_19_1","doi-asserted-by":"crossref","unstructured":"Z.Lu R.Schwartz P.Natarajan I.Bazzi andJ.Makhoul Advances in the bbn byblos ocr system Proc. of the International Conference on Document Analysis and Recognition 1999 pp.337\u2013340.","DOI":"10.1109\/ICDAR.1999.791793"},{"key":"e_1_2_15_20_1","unstructured":"J. T.Favata G.Srikantan andS. N.Srihari Handprinted character\/digit recognition using multiple feature\/resolution philosophy Proc. International Workshop on Frontiers in Handwriting Recognition pp.57\u201366 1994."},{"volume-title":"Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids","year":"2001","author":"Durbin R.","key":"e_1_2_15_21_1"},{"volume-title":"Statistical Methods for Speech Recognition (Language, Speech, and Communication)","year":"1998","author":"Jelinek F.","key":"e_1_2_15_22_1"},{"key":"e_1_2_15_23_1","first-page":"176","article-title":"Large scale simulation studies in image pattern recognition","author":"Ho T. K.","year":"1997","journal-title":"IEEE Trans. Patt. Anal. Mach. Intell."},{"key":"e_1_2_15_24_1","unstructured":"S.Rice Measuring the accuracy of page\u2010reading systems PhD thesis Las Vegas Nevada University of Nevada 1996."},{"key":"e_1_2_15_25_1","doi-asserted-by":"crossref","unstructured":"S.Feng andR.Manmatha A hierarchical hmm\u2010based automatic evaluation of ocr accuracy for a digital library of books.Proc. Joint conf on Digital Libraries (JCDL) pp.109\u2013118 2006.","DOI":"10.1145\/1141753.1141776"},{"key":"e_1_2_15_26_1","doi-asserted-by":"crossref","unstructured":"P.Kantor andE.Voorhes Report on the TREC\u20105 confusion track Online Proceedings of TREC\u20105 (1996) NIST Special Publication 500\u2010238 pp.65\u201374 1997.","DOI":"10.6028\/NIST.SP.500-238.confusion-overview"},{"key":"e_1_2_15_27_1","doi-asserted-by":"crossref","unstructured":"S. M.Harding W. B.Croft andC.Weir Probabilistic retrieval of ocr degraded text using n\u2010grams Proc. European Conference on Digital Libraries pp.1997 345\u2013359.","DOI":"10.1007\/BFb0026737"},{"key":"e_1_2_15_28_1","doi-asserted-by":"crossref","unstructured":"A. L.Spitz Determination of the script and language content of document images IEEE Trans. Patt. Anal. Mach. Intell. 1997 pp.235\u2013245.","DOI":"10.1109\/34.584100"},{"key":"e_1_2_15_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.574802"},{"key":"e_1_2_15_30_1","doi-asserted-by":"crossref","unstructured":"G.Peake andT.Tan Script and language identification from document images Proc. Workshop Document Image Analysis 1997 pp.10\u201317","DOI":"10.1007\/3-540-63931-4_203"},{"key":"e_1_2_15_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.824821"},{"key":"e_1_2_15_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0031-3203(01)00129-7"},{"key":"e_1_2_15_33_1","unstructured":"A.Kornai K. M.Mohiuddin andS. D.Connell Recognition of cursive writing on personal checks Proc. of the 5th Int'l Workshop on Frontiers in Handwriting Recognition 1996."},{"key":"e_1_2_15_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.329003"},{"key":"e_1_2_15_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8655(93)90090-Z"},{"key":"e_1_2_15_36_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218001401000848"},{"key":"e_1_2_15_37_1","doi-asserted-by":"crossref","unstructured":"T. M.Rath R.Manmatha andV.Lavrenko A search engine for historical manuscript images Proc. ACM SIGR 2004 pp.369\u2013376.","DOI":"10.1145\/1008992.1009056"},{"key":"e_1_2_15_38_1","doi-asserted-by":"crossref","unstructured":"V.Wu R.Manmatha andE.Riseman Textfinder: An automatic system to detect and recognize text in images IEEE Trans. Patt. Anal. Mach. Intell. PAMI 1999 pp.1224\u20131229.","DOI":"10.1109\/34.809116"},{"key":"e_1_2_15_39_1","unstructured":"X.Chen andA.Yuille Adaboost learning for detecting and reading text in city scenes Proc. Computer Vision and Pattern Recognition (CVPR) pp.366\u2013373 2004."}],"container-title":["Wiley Encyclopedia of Computer Science and Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/9780470050118.ecse667","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,8]],"date-time":"2025-02-08T03:11:48Z","timestamp":1738984308000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/9780470050118.ecse667"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,3,16]]},"ISBN":["9780471383932","9780470050118"],"references-count":38,"alternative-id":["10.1002\/9780470050118.ecse667","10.1002\/9780470050118"],"URL":"https:\/\/doi.org\/10.1002\/9780470050118.ecse667","archive":["Portico"],"relation":{},"subject":[],"published":{"date-parts":[[2009,3,16]]}}}