{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T03:31:53Z","timestamp":1775187113170,"version":"3.50.1"},"reference-count":120,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2014,10,3]],"date-time":"2014-10-03T00:00:00Z","timestamp":1412294400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Information Science"],"published-print":{"date-parts":[[2015,2]]},"abstract":"<jats:p> Table detection, extraction and annotation have been an important research problem for years. To handle this issue, different approaches have been designed for different types of documents. Among these PDF is a widely used format for preserving and presenting different types of documents. We investigate the state of the art in table detection, extraction and annotation in PDF documents. Because of varying table structural anatomy, the state of the art in table-related research enumerates a number of approaches that are critically and analytically investigated for identifying their strengths and limitations as well as for making recommendations for further improvement. An evaluation framework is contributed that compares different information extraction tools that may be used in table detection, extraction and annotation. We found very limited attention towards these aspects in books, especially books in PDF format. There is no searching solution that can find books having tables that are semantically related to a table in a given book. <\/jats:p>","DOI":"10.1177\/0165551514551903","type":"journal-article","created":{"date-parts":[[2014,10,4]],"date-time":"2014-10-04T04:53:13Z","timestamp":1412398393000},"page":"41-57","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":47,"title":["On methods and tools of table detection, extraction and annotation in PDF documents"],"prefix":"10.1177","volume":"41","author":[{"given":"Shah","family":"Khusro","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Peshawar, Pakistan"}]},{"given":"Asima","family":"Latif","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Peshawar, Pakistan"}]},{"given":"Irfan","family":"Ullah","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Peshawar, Pakistan"}]}],"member":"179","published-online":{"date-parts":[[2014,10,3]]},"reference":[{"key":"bibr1-0165551514551903","unstructured":"Y\u0131ld\u0131z B. Information extraction\u2013utilizing table patterns: Master\u2019s thesis, Vienna University of Technology, 2004."},{"key":"bibr2-0165551514551903","volume-title":"Portable document format reference manual","author":"Bienz T","year":"1993"},{"key":"bibr3-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-72032-1"},{"key":"bibr4-0165551514551903","unstructured":"Pitfalls CFPTXMWAT. White paper, 2003."},{"key":"bibr5-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/s10032-006-0016-y"},{"key":"bibr6-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2005.21"},{"key":"bibr7-0165551514551903","volume-title":"Proceedings of the symposium on document image understanding technology (SDIUT\u201997)","author":"Peterman C","year":"1997"},{"key":"bibr8-0165551514551903","author":"Cameron JP","year":"1989","journal-title":"Ohio State University, Computer & Information Science Research Center"},{"key":"bibr9-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/s10032-005-0001-x"},{"key":"bibr10-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-40953-X_9"},{"key":"bibr11-0165551514551903","doi-asserted-by":"publisher","DOI":"10.3115\/1572364.1572371"},{"key":"bibr12-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1109\/DAS.2012.29"},{"key":"bibr13-0165551514551903","unstructured":"Long V. An agent-based approach to table recognition and interpretation. Macquarie University Sydney, Australia, 2010."},{"key":"bibr14-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2004.01.012"},{"key":"bibr15-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1177\/0049124189018002002"},{"key":"bibr16-0165551514551903","volume-title":"EP92, Proceedings of electronic publishing","author":"Vanoirbeek C","year":"1992"},{"key":"bibr17-0165551514551903","unstructured":"Liu Y. Tableseer: Automatic table extraction, search, and understanding. The Pennsylvania State University, 2009."},{"key":"bibr18-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1145\/2452376.2452457"},{"key":"bibr19-0165551514551903","first-page":"1","volume":"1","author":"Schmoekel I","year":"2010","journal-title":"AchimUesen, Germany"},{"key":"bibr20-0165551514551903","volume-title":"PDF Vol. 2010","author":"Amyuni T","year":"2010"},{"key":"bibr21-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1145\/1255175.1255193"},{"key":"bibr22-0165551514551903","volume-title":"Probabilistic graphical models: Principles and techniques","author":"Friedman DKaN","year":"2009"},{"key":"bibr23-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1145\/290941.291037"},{"key":"bibr24-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/s10032-006-0017-x"},{"key":"bibr25-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2011.304"},{"issue":"1","key":"bibr26-0165551514551903","first-page":"1","volume":"7","author":"Zanibbi R","year":"2004","journal-title":"Document Analysis and Recognition"},{"key":"bibr27-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1145\/263690.263816"},{"key":"bibr28-0165551514551903","volume-title":"Using white space for automated document structuring","author":"Rus D","year":"1994"},{"key":"bibr29-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2001.953842"},{"key":"bibr30-0165551514551903","volume-title":"Medium-independent table detection. Electronic Imaging","author":"Hu J","year":"1999"},{"key":"bibr31-0165551514551903","doi-asserted-by":"publisher","DOI":"10.3115\/1034678.1034746"},{"key":"bibr32-0165551514551903","volume-title":"Photonics west\u201998: Electronic imaging","author":"Kieninger TG","year":"1998"},{"key":"bibr33-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1145\/544220.544228"},{"key":"bibr34-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1145\/860435.860479"},{"key":"bibr35-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-61226-2_8"},{"key":"bibr36-0165551514551903","unstructured":"Tupaj S, Shi Z, Chang CH, Alam H. Extracting tabular information from text files. EECS Department, Tufts University, Medford, USA. 1996."},{"key":"bibr37-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.1997.619833"},{"key":"bibr38-0165551514551903","volume-title":"Proceedings of document analysis systems (DAS)","author":"Kieninger T","year":"1998"},{"key":"bibr39-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2002.1047838"},{"key":"bibr40-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45869-7_29"},{"key":"bibr41-0165551514551903","volume-title":"Proceedings of the fifth international conference on the practical application of Prolog","author":"Ferguson D","year":"1997"},{"key":"bibr42-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2003.1227692"},{"key":"bibr43-0165551514551903","doi-asserted-by":"publisher","DOI":"10.3115\/990820.990845"},{"key":"bibr44-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1145\/511446.511477"},{"key":"bibr45-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1145\/511446.511478"},{"key":"bibr46-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2006.05.029"},{"key":"bibr47-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1117\/12.909077"},{"key":"bibr48-0165551514551903","volume-title":"pdf2table: A method to extract table information from PDF files","author":"Yildiz B","year":"2005"},{"key":"bibr49-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1145\/1655925.1656103"},{"issue":"2","key":"bibr50-0165551514551903","first-page":"404","volume":"1","author":"Mohemad R","year":"2011","journal-title":"International Journal of New Computer Architectures and their Applications (IJNCAA)"},{"key":"bibr51-0165551514551903","volume-title":"Proceedings of the national conference on artificial intelligence","author":"Liu Y","year":"2007"},{"key":"bibr52-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1117\/12.410860"},{"key":"bibr53-0165551514551903","volume-title":"Proceedings of the fourth annual symposium on document analysis and information retrieval","author":"Douglas S","year":"1995"},{"key":"bibr54-0165551514551903","volume-title":"Class of 2005 senior conference on natural language processing","author":"Shin J","year":"2005"},{"key":"bibr55-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/s10032-005-0006-5"},{"key":"bibr56-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1145\/1141753.1141835"},{"key":"bibr57-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/11610113_79"},{"key":"bibr58-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2007.4377094"},{"key":"bibr59-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1145\/1458082.1458255"},{"key":"bibr60-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2009.12"},{"key":"bibr61-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1109\/ICTAI.2008.48"},{"key":"bibr62-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1016\/j.datak.2004.10.004"},{"key":"bibr63-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242583"},{"key":"bibr64-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2009.185"},{"key":"bibr65-0165551514551903","unstructured":"Lafferty J, McCallum A, Pereira FC. Conditional random fields: Probabilistic models for segmenting and labeling sequence data, 2001."},{"key":"bibr66-0165551514551903","volume-title":"Proceedings of the international workshop on Web document analysis","author":"Hurst M","year":"2001"},{"key":"bibr67-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/s10791-006-9005-5"},{"key":"bibr68-0165551514551903","volume-title":"ICDAR","author":"Silva AC","year":"2007"},{"key":"bibr69-0165551514551903","doi-asserted-by":"publisher","DOI":"10.3115\/1072228.1072358"},{"key":"bibr70-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1145\/1401890.1401927"},{"key":"bibr71-0165551514551903","volume-title":"Tabular abstraction, editing, and formatting","author":"Wang X","year":"1996"},{"key":"bibr72-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/s11280-005-0360-8"},{"key":"bibr73-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1145\/1815330.1815345"},{"key":"bibr74-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1145\/2361354.2361365"},{"key":"bibr75-0165551514551903","volume-title":"Proceedings of the international conference on recent advances in natural language processing (RANLP)","author":"Cimiano P","year":"2005"},{"key":"bibr76-0165551514551903","author":"Jha P","year":"2008","journal-title":"Rensselaer Polytechnic Institute"},{"key":"bibr77-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-89704-0_24"},{"key":"bibr78-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-02121-3_47"},{"key":"bibr79-0165551514551903","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1921005"},{"key":"bibr80-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1145\/1774088.1774445"},{"key":"bibr81-0165551514551903","volume-title":"VLDS","author":"Mulwad V","year":"2011"},{"key":"bibr82-0165551514551903","author":"Mulwad V","year":"2010","journal-title":"University of Maryland"},{"key":"bibr83-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-20291-9_45"},{"key":"bibr84-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-34002-4_11"},{"key":"bibr85-0165551514551903","doi-asserted-by":"publisher","DOI":"10.14778\/2002938.2002939"},{"key":"bibr86-0165551514551903","volume-title":"Atelier Ontologies et Jeux de Donn\u00e9es pour \u00e9valuer le Web S\u00e9mantique (OJD) associ\u00e9 \u00e0 IC\u20192012","author":"Quercini G","year":"2012"},{"key":"bibr87-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-88564-1_29"},{"key":"bibr88-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-17746-0_2"},{"key":"bibr89-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2001.953843"},{"key":"bibr90-0165551514551903","unstructured":"Pereira FSaF. Shallow parsing with conditional random \ufb01elds, 2003."},{"key":"bibr91-0165551514551903","doi-asserted-by":"publisher","DOI":"10.3115\/1119176.1119206"},{"key":"bibr92-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1145\/1998076.1998079"},{"key":"bibr93-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/11669487_15"},{"key":"bibr94-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2003.1227793"},{"key":"bibr95-0165551514551903","volume-title":"Proceedings sixth international conference on document analysis and recognition","author":"Wang Y","year":"2001"},{"key":"bibr96-0165551514551903","unstructured":"Hurst MF. The interpretation of tables in texts, 2000."},{"key":"bibr97-0165551514551903","first-page":"88","volume-title":"Advances in multimedia information processing \u2013 PCM 2004","author":"Wang C","year":"2005"},{"key":"bibr98-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-30475-3_13"},{"key":"bibr99-0165551514551903","first-page":"31","author":"Yoshida M","year":"2001","journal-title":"Proceedings of the international workshop on Web document analysis. WDA"},{"key":"bibr100-0165551514551903","doi-asserted-by":"crossref","unstructured":"Tenopir C, Sandusky RJ, Casado MM. Uses of figures and tables from scholarly journal articles in teaching and research, 2007.","DOI":"10.1002\/meet.1450440389"},{"key":"bibr101-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btm301"},{"key":"bibr102-0165551514551903","first-page":"21C9","volume-title":"Proceedings of the 19th annual international ACM SIGIR conference on research and development in information retrieval","author":"Mitra CBASaM"},{"key":"bibr103-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1145\/290941.291008"},{"key":"bibr104-0165551514551903","first-page":"253","volume-title":"7th text retrieval conference, TREC7","author":"Robertson WE"},{"key":"bibr105-0165551514551903","first-page":"107","author":"Brin LPS","year":"1999","journal-title":"Proceedings of the seventh international conference on World Wide Web"},{"key":"bibr106-0165551514551903","volume-title":"Proceedings of tenth international Semantic Web conference, Part II","author":"Varish M","year":"2011"},{"key":"bibr107-0165551514551903","volume-title":"From tessellations to table interpretation","author":"Ramana C","year":"2009"},{"key":"bibr108-0165551514551903","volume-title":"DocLab","author":"Nagy G","year":"2010"},{"key":"bibr109-0165551514551903","author":"Tim Finin ZS","year":"2010","journal-title":"Web science conference"},{"key":"bibr110-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/s100320200074"},{"key":"bibr111-0165551514551903","author":"Phillips I","year":"1996","journal-title":"UW-III English\/Technical Document Image Database Manual"},{"key":"bibr112-0165551514551903","unstructured":"Wang Y, Haralick M, Haralick RM, Phillips IT. Document analysis: Table structure understanding and zone content classification, 2002."},{"key":"bibr113-0165551514551903","unstructured":"Wu W, Li H, Wang H, Zhu K. Towards a probabilistic taxonomy of many concepts. Technical Report MSR-TR-2011-25, Microsoft Research, 2011."},{"key":"bibr114-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242667"},{"key":"bibr115-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1016\/j.websem.2009.07.002"},{"issue":"6","key":"bibr116-0165551514551903","volume":"2","author":"Pitale S","year":"2011","journal-title":"International Journal of Computer Technology and Applications"},{"key":"bibr117-0165551514551903","first-page":"470","author":"Shaker M","year":"2009","journal-title":"Proceedings of the 11th international conference on information integration and Web-based applications & services"},{"key":"bibr118-0165551514551903","first-page":"205","author":"Zwicklbauer S","year":"2013","journal-title":"ISWC (P&D)"},{"key":"bibr119-0165551514551903","doi-asserted-by":"publisher","DOI":"10.1007\/11669487_40"},{"key":"bibr120-0165551514551903","doi-asserted-by":"crossref","unstructured":"Beel J, Gipp B, Shaker A, Friedrich N. SciPlore Xtract: Extracting Titles from Scientific PDF Documents by Analyzing Style Information (Font Size). Research and Advanced Technology for Digital Libraries, LNCS 6273 (Springer, Berlin, 2010), 413\u2013416.","DOI":"10.1007\/978-3-642-15464-5_45"}],"container-title":["Journal of Information Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0165551514551903","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/0165551514551903","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0165551514551903","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,28]],"date-time":"2025-02-28T16:51:24Z","timestamp":1740761484000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/0165551514551903"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,10,3]]},"references-count":120,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2015,2]]}},"alternative-id":["10.1177\/0165551514551903"],"URL":"https:\/\/doi.org\/10.1177\/0165551514551903","relation":{},"ISSN":["0165-5515","1741-6485"],"issn-type":[{"value":"0165-5515","type":"print"},{"value":"1741-6485","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,10,3]]}}}