{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T07:15:02Z","timestamp":1774595702548,"version":"3.50.1"},"reference-count":32,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2020,9,8]],"date-time":"2020-09-08T00:00:00Z","timestamp":1599523200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,9,8]],"date-time":"2020-09-08T00:00:00Z","timestamp":1599523200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["TABSIM"],"award-info":[{"award-number":["TABSIM"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["STA1471\/1-1"],"award-info":[{"award-number":["STA1471\/1-1"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Data Min Knowl Disc"],"published-print":{"date-parts":[[2020,11]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Tables are a common way to present information in an intuitive and concise manner. They are used extensively in media such as scientific articles or web pages. Automatically analyzing the content of tables bears special challenges. One of the most basic tasks is determination of the orientation of a table: In column tables, columns represent one entity with the different attribute values present in the different rows; row tables are vice versa, and matrix tables give information on pairs of entities. 
In this paper, we address the problem of classifying a given table into one of the three layouts horizontal (for row tables), vertical (for column tables), and matrix. We describe DeepTable, a novel method based on deep neural networks designed for learning from sets. Contrary to previous state-of-the-art methods, this basis makes DeepTable invariant to the permutation of rows or columns, which is a highly desirable property as in most tables the order of rows and columns does not carry specific information. We evaluate our method using a silver standard corpus of 5500 tables extracted from biomedical articles where the layout was determined heuristically. DeepTable outperforms previous methods in both precision and recall on our corpus. In a second evaluation, we manually labeled a corpus of 300 tables and were able to confirm DeepTable to reach superior performance in the table layout classification task. The codes and resources introduced here are available at<jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/Marhabibi\/DeepTable\">https:\/\/github.com\/Marhabibi\/DeepTable<\/jats:ext-link>.<\/jats:p>","DOI":"10.1007\/s10618-020-00711-x","type":"journal-article","created":{"date-parts":[[2020,9,10]],"date-time":"2020-09-10T06:32:44Z","timestamp":1599719564000},"page":"1963-1983","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["DeepTable: a permutation invariant neural network for table orientation 
classification"],"prefix":"10.1007","volume":"34","author":[{"given":"Maryam","family":"Habibi","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0832-9604","authenticated-orcid":false,"given":"Johannes","family":"Starlinger","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2166-9582","authenticated-orcid":false,"given":"Ulf","family":"Leser","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2020,9,8]]},"reference":[{"key":"711_CR1","unstructured":"Agassi S, Ziv U, Shulman H (2004) Auto completion of relationships between objects in a data model. US Patent 6775674 B1"},{"key":"711_CR2","doi-asserted-by":"crossref","unstructured":"Bhagavatula CS, Noraset T, Downey D (2013) Methods for exploring and mining tables on Wikipedia. In: Proceedings of the ACM SIGKDD workshop on interactive data exploration and analytics, pp 18\u201326","DOI":"10.1145\/2501511.2501516"},{"key":"711_CR3","unstructured":"Braunschweig K (2015) Recovering the semantics of tabular web data. Ph.D. thesis, Dresden University of Technology"},{"issue":"1","key":"711_CR4","doi-asserted-by":"publisher","first-page":"538","DOI":"10.14778\/1453856.1453916","volume":"1","author":"MJ Cafarella","year":"2008","unstructured":"Cafarella MJ, Halevy A, Wang DZ, Wu E, Zhang Y (2008a) Webtables: exploring the power of tables on the web. Proc VLDB Endow (PVLDB) 1(1):538\u2013549","journal-title":"Proc VLDB Endow (PVLDB)"},{"key":"711_CR5","unstructured":"Cafarella MJ, Halevy AY, Zhang Y, Wang DZ, Wu E (2008b) Uncovering the relational web. 
In: Proceedings of international workshop on the web and databases (WebDB)"},{"issue":"3","key":"711_CR6","first-page":"273","volume":"20","author":"C Cortes","year":"1995","unstructured":"Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273\u2013297","journal-title":"Mach Learn"},{"key":"711_CR7","doi-asserted-by":"crossref","unstructured":"Crestan E, Pantel P (2011) Web-scale table census and classification. In: Proceedings of the fourth ACM international conference on web search and data mining, pp 545\u2013554","DOI":"10.1145\/1935826.1935904"},{"issue":"7","key":"711_CR8","doi-asserted-by":"publisher","first-page":"1895","DOI":"10.1162\/089976698300017197","volume":"10","author":"TG Dietterich","year":"1998","unstructured":"Dietterich TG (1998) Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 10(7):1895\u20131923","journal-title":"Neural Comput"},{"key":"711_CR9","doi-asserted-by":"crossref","unstructured":"Eberius J, Braunschweig K, Hentsch M, Thiele M, Ahmadov A, Lehner W (2015) Building the Dresden web table corpus: a classification approach. In: Proceedings of the second IEEE\/ACM international conference on big data computing (BDC), pp 41\u201350","DOI":"10.1109\/BDC.2015.30"},{"key":"711_CR10","unstructured":"Ghasemi-Gol M, Szekely P (2018) Tabvec: table vectors for classification of web tables. CoRR abs\/1802.06290"},{"key":"711_CR11","unstructured":"Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: Proceedings of the fourteenth international conference on artificial intelligence and statistics, pp 315\u2013323"},{"key":"711_CR12","doi-asserted-by":"crossref","unstructured":"Gonzalez H, Halevy A, Jensen CS, Langen A, Madhavan J, Shapley R, Shen W (2010) Google fusion tables: data management, integration and collaboration in the cloud. 
In: Proceedings of the first ACM symposium on cloud computing, pp 175\u2013180","DOI":"10.1145\/1807128.1807158"},{"key":"711_CR13","unstructured":"Ho TK (1995) Random decision forests. In: Proceedings of 3rd international conference on document analysis and recognition, vol\u00a01. IEEE, pp 278\u2013282"},{"key":"711_CR14","unstructured":"Isberner A (2016) Similarity search on tabular data. Diploma Thesis. Humboldt University of Berlin"},{"issue":"2","key":"711_CR15","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1007\/s12559-009-9009-8","volume":"1","author":"P Kanerva","year":"2009","unstructured":"Kanerva P (2009) Hyperdimensional computing: an introduction to computing in distributed representation with high-dimensional random vectors. Cognit Comput 1(2):139\u2013159","journal-title":"Cognit Comput"},{"key":"711_CR16","unstructured":"Lee J, Lee Y, Kim J, Kosiorek A, Choi S, Teh YW (2019) Set transformer: a framework for attention-based permutation-invariant neural networks. In: International conference on machine learning, pp 3744\u20133753"},{"key":"711_CR17","doi-asserted-by":"crossref","unstructured":"Lehmberg O, Ritze D, Meusel R, Bizer C (2016) A large public corpus of web tables containing time and context metadata. In: Proceedings of the 25th international conference companion on world wide web, pp 75\u201376","DOI":"10.1145\/2872518.2889386"},{"key":"711_CR18","doi-asserted-by":"crossref","unstructured":"Liu Y, Bai K, Mitra P, Giles CL (2007) Tableseer: automatic table metadata extraction and searching in digital libraries. In: Proceedings of the 7th ACM\/IEEE-CS joint conference on digital libraries, pp 91\u2013100","DOI":"10.1145\/1255175.1255193"},{"key":"711_CR19","unstructured":"Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. 
In: Proceedings of advances in neural information processing systems, pp 3111\u20133119"},{"key":"711_CR20","doi-asserted-by":"crossref","unstructured":"Milosevic N, Gregson C, Hernandez R, Nenadic G (2016) Extracting patient data from tables in clinical literature: case study on extraction of BMI, weight and number of patients. In: Proceedings of the 9th international joint conference on biomedical engineering systems and technologies, pp 223\u2013228","DOI":"10.5220\/0005660102230228"},{"key":"711_CR21","doi-asserted-by":"crossref","unstructured":"Nishida K, Sadamitsu K, Higashinaka R, Matsuo Y (2017) Understanding the semantic structures of tables with a hybrid deep neural network architecture. In: Proceedings of the 31st AAAI conference on artificial intelligence, pp 168\u2013174","DOI":"10.1609\/aaai.v31i1.10484"},{"key":"711_CR22","doi-asserted-by":"crossref","unstructured":"Petrovski B, Aguado I, Hossmann A, Baeriswyl M, Musat C (2018) Embedding individual table columns for resilient SQL chatbots. In: Proceedings of the 2018 EMNLP workshop SCAI: the 2nd international workshop on search-oriented conversational AI, pp 67\u201373","DOI":"10.18653\/v1\/W18-5710"},{"key":"711_CR23","doi-asserted-by":"crossref","unstructured":"Pinto D, Branstein M, Coleman R, Croft WB, King M, Li W, Wei X (2002) QuASM: a system for question answering using semi-structured data. In: Proceedings of the 2nd ACM\/IEEE-CS joint conference on digital libraries, pp 46\u201355","DOI":"10.1145\/544220.544228"},{"key":"711_CR24","unstructured":"Pyysalo S, Ginter F, Moen H, Salakoski T, Ananiadou S (2013) Distributional semantics resources for biomedical text processing. In: Proceedings of the 5th international symposium on languages in biology and medicine"},{"key":"711_CR25","unstructured":"Qi CR, Su H, Mo K, Guibas LJ (2017) Pointnet: deep learning on point sets for 3D classification and segmentation. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 652\u2013660"},{"key":"711_CR26","doi-asserted-by":"crossref","unstructured":"Ritze D, Lehmberg O, Bizer C (2015) Matching HTML tables to DBpedia. In: Proceedings of the 5th international conference on web intelligence, mining and semantics, pp 10:1\u201310:6","DOI":"10.1145\/2797115.2797118"},{"key":"711_CR27","unstructured":"Vilain M, Gibson J, Wellner B, Quimby R (2006) Table classification: an application of machine learning to web-hosted financial documents. Technical report, MITRE"},{"key":"711_CR28","doi-asserted-by":"crossref","unstructured":"Wang Y, Hu J (2002) A machine learning based approach for table detection on the web. In: Proceedings of the 11th international conference on world wide web, pp 242\u2013250","DOI":"10.1145\/511446.511478"},{"key":"711_CR29","unstructured":"Xu X, Liu C, Song D (2017) SQLNET: Generating structured queries from natural language without reinforcement learning. CoRR abs\/1711.04436"},{"key":"711_CR30","unstructured":"Yoshida M, Torisawa K, Tsujii J (2001) A method to integrate tables of the world wide web. In: Proceedings of the international workshop on web document analysis, pp 31\u201334"},{"key":"711_CR31","unstructured":"Zaheer M, Kottur S, Ravanbakhsh S, Poczos B, Salakhutdinov RR, Smola AJ (2017) Deep sets. In: Advances in neural information processing systems, pp 3391\u20133401"},{"key":"711_CR32","doi-asserted-by":"crossref","unstructured":"Zhang S, Balog K (2018) Ad hoc table retrieval using semantic similarity. 
In: Proceedings of the 2018 world wide web conference, international world wide web conferences steering committee, pp 1553\u20131562","DOI":"10.1145\/3178876.3186067"}],"container-title":["Data Mining and Knowledge Discovery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-020-00711-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10618-020-00711-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-020-00711-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,11,17]],"date-time":"2022-11-17T16:40:24Z","timestamp":1668703224000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10618-020-00711-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9,8]]},"references-count":32,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2020,11]]}},"alternative-id":["711"],"URL":"https:\/\/doi.org\/10.1007\/s10618-020-00711-x","relation":{},"ISSN":["1384-5810","1573-756X"],"issn-type":[{"value":"1384-5810","type":"print"},{"value":"1573-756X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,9,8]]},"assertion":[{"value":"13 September 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 August 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 September 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Compliance with ethical 
standards"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}