{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,6]],"date-time":"2026-02-06T18:03:13Z","timestamp":1770400993542,"version":"3.49.0"},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T00:00:00Z","timestamp":1761523200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/deed.de"},{"start":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T00:00:00Z","timestamp":1761523200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/deed.de"}],"funder":[{"DOI":"10.13039\/501100002347","name":"Bundesministerium f\u00fcr Bildung und Forschung","doi-asserted-by":"publisher","award":["NHR2021HE"],"award-info":[{"award-number":["NHR2021HE"]}],"id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003495","name":"Hessisches Ministerium f\u00fcr Wissenschaft und Kunst","doi-asserted-by":"publisher","award":["Kapitel 1502, F\u00f6rderprodukt 19 NHR4CES"],"award-info":[{"award-number":["Kapitel 1502, F\u00f6rderprodukt 19 NHR4CES"]}],"id":[{"id":"10.13039\/501100003495","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003495","name":"Hessisches Ministerium f\u00fcr Wissenschaft und Kunst","doi-asserted-by":"publisher","award":["3AI \u2013 The Third Wave of Artificial Intelligence"],"award-info":[{"award-number":["3AI \u2013 The Third Wave of Artificial Intelligence"]}],"id":[{"id":"10.13039\/501100003495","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003495","name":"Hessisches Ministerium f\u00fcr Wissenschaft und Kunst","doi-asserted-by":"publisher","award":["LOEWE Spitzenprofessur"],"award-info":[{"award-number":["LOEWE Spitzenprofessur"]}],"id":[{"id":"10.13039\/501100003495","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100005714","name":"Technische Universit\u00e4t Darmstadt","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005714","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Datenbank Spektrum"],"published-print":{"date-parts":[[2025,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Natural Language Interfaces for Databases (NLIDBs) offer an interesting alternative to SQL since they empower non-experts to query data. However, they require this data to be integrated into a\u00a0database schema, causing high data engineering and integration overheads. As such, Open Table Question Answering (OTQA) is promising since it allows directly querying tables in data lakes without first incorporating them into a\u00a0relational schema. Many recent OTQA approaches combine retrieval-augmented generation with Large Language Models (LLMs), where relevant tables are first retrieved from a\u00a0data lake and then used as input to an LLM to answer the user query. In this paper, we systematically analyze how LLMs paired with table retrievers can answer queries over private tabular data lakes. We find that the answer generation often fails because the retrieval step does not provide the required tabular context. To overcome this issue, we propose a\u00a0novel LLM-based retrieval approach called Zoom retrieval, which effectively boosts retrieval accuracies and thereby improves question answering results. Nevertheless, LLMs often still fail to answer even simple extraction queries, let alone aggregates, thus remaining far from the rich querying capabilities that NLIDBs offer today. Therefore, future work should focus on improving the query execution capabilities of LLMs to enable complex question answering over tabular data lakes.<\/jats:p>","DOI":"10.1007\/s13222-025-00513-9","type":"journal-article","created":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T15:39:38Z","timestamp":1761579578000},"page":"145-152","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Towards Complex Table Question Answering Over Tabular Data Lakes (Extended Version)"],"prefix":"10.1007","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-8220-3051","authenticated-orcid":false,"given":"Daniela","family":"Risis","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4884-0300","authenticated-orcid":false,"given":"Jan-Micha","family":"Bodensohn","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7418-6181","authenticated-orcid":false,"given":"Matthias","family":"Urban","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2744-7836","authenticated-orcid":false,"given":"Carsten","family":"Binnig","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,10,27]]},"reference":[{"key":"513_CR1","volume-title":"Instructeval: systematic evaluation of instruction selection methods. arxiv preprint arxiv:230700259","author":"A Ajith","year":"2023","unstructured":"Ajith\u00a0A, Pan\u00a0C, Xia\u00a0M et\u00a0al (2023) Instructeval: systematic evaluation of instruction selection methods. arxiv preprint arxiv:230700259"},{"issue":"1","key":"513_CR2","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1017\/S135132490000005X","volume":"1","author":"I Androutsopoulos","year":"1995","unstructured":"Androutsopoulos\u00a0I, Ritchie\u00a0G, Thanisch\u00a0P (1995) Natural language interfaces to databases\u2014an introduction. Nat Lang Eng 1(1):29\u201381. https:\/\/doi.org\/10.1017\/S135132490000005X","journal-title":"Nat Lang Eng"},{"key":"513_CR3","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2408.14717","volume-title":"Text2sql is not enough: unifying AI and databases with TAG. CoRR abs\/2408.14717","author":"A Biswal","year":"2024","unstructured":"Biswal\u00a0A, Patel\u00a0L, Jha\u00a0S et\u00a0al (2024) Text2sql is not enough: unifying AI and databases with TAG. CoRR abs\/2408.14717 https:\/\/doi.org\/10.48550\/ARXIV.2408.14717"},{"key":"513_CR4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3663742.3663972","volume-title":"Rethinking table retrieval from data lakes. In: proceedings of the seventh international workshop on exploiting artificial intelligence techniques for data management","author":"JM Bodensohn","year":"2024","unstructured":"Bodensohn\u00a0JM, Binnig\u00a0C (2024) Rethinking table retrieval from data lakes. In: proceedings of the seventh international workshop on exploiting artificial intelligence techniques for data management vol 24. Association for Computing Machinery, New York, NY, USA, pp\u00a01\u20135 https:\/\/doi.org\/10.1145\/3663742.3663972 (https:\/\/dl.acm.org\/doi\/10.1145\/3663742.3663972)"},{"key":"513_CR5","doi-asserted-by":"publisher","first-page":"2687","DOI":"10.18653\/v1\/2024.acl-long.148","volume-title":"Proceedings of the 62nd annual meeting of the association for computational linguistics (volume 1: long papers)","author":"PB Chen","year":"2024","unstructured":"Chen\u00a0PB, Zhang\u00a0Y, Roth\u00a0D (2024) Is table retrieval a\u00a0solved problem? exploring join-aware multi-table retrieval. In: Ku\u00a0LW, Martins\u00a0A, Srikumar\u00a0V (eds) Proceedings of the 62nd annual meeting of the association for computational linguistics (volume 1: long papers). Association for Computational Linguistics, Bangkok, Thailand, pp\u00a02687\u20132699 https:\/\/doi.org\/10.18653\/v1\/2024.acl-long.148 (https:\/\/aclanthology.org\/2024.acl-long.148)"},{"key":"513_CR6","doi-asserted-by":"publisher","first-page":"2997","DOI":"10.1145\/3626772.3661384","volume-title":"Proceedings of the 47th international ACM SIGIR conference on research and development in information retrieval","author":"H Dong","year":"2024","unstructured":"Dong\u00a0H, Wang\u00a0Z (2024) Large language models for tabular data: progresses and future directions. In: Proceedings of the 47th international ACM SIGIR conference on research and development in information retrieval, pp\u00a02997\u20133000"},{"key":"513_CR7","doi-asserted-by":"publisher","first-page":"512","DOI":"10.18653\/v1\/2021.naacl-main.43","volume-title":"Proceedings of the 2021 conference of the north American chapter of the association for computational linguistics: human language technologies","author":"J Herzig","year":"2021","unstructured":"Herzig\u00a0J, M\u00fcller\u00a0T, Krichene\u00a0S et\u00a0al (2021) Open domain question answering over tables via dense retrieval. In: Toutanova\u00a0K, Rumshisky\u00a0A, Zettlemoyer\u00a0L et\u00a0al (eds) Proceedings of the 2021 conference of the north American chapter of the association for computational linguistics: human language technologies. Association for Computational Linguistics, pp\u00a0512\u2013519 https:\/\/doi.org\/10.18653\/v1\/2021.naacl-main.43 (https:\/\/aclanthology.org\/2021.naacl-main.43)"},{"key":"513_CR8","doi-asserted-by":"publisher","first-page":"305","DOI":"10.18653\/v1\/2022.naacl-industry.34","volume-title":"Proceedings of the 2022 conference of the north American chapter of the association for computational linguistics: human language technologies: industry track","author":"Y Katsis","year":"2022","unstructured":"Katsis\u00a0Y, Chemmengath\u00a0S, Kumar\u00a0V et\u00a0al (2022) Ait-qa: question answering dataset over complex tables in the airline industry. In: Proceedings of the 2022 conference of the north American chapter of the association for computational linguistics: human language technologies: industry track, pp\u00a0305\u2013314"},{"key":"513_CR9","doi-asserted-by":"publisher","first-page":"8285","DOI":"10.18653\/v1\/2023.findings-acl.526","volume-title":"Findings of the association for computational linguistics: aCL 2023","author":"S Kweon","year":"2023","unstructured":"Kweon\u00a0S, Kwon\u00a0Y, Cho\u00a0S et\u00a0al (2023) Open-Wikitable : Dataset for open domain question answering with complex reasoning over table. In: Rogers\u00a0A, Boyd-Graber\u00a0J, Okazaki\u00a0N (eds) Findings of the association for computational linguistics: aCL 2023. Association for Computational Linguistics, Toronto, Canada, pp\u00a08285\u20138297 https:\/\/doi.org\/10.18653\/v1\/2023.findings-acl.526 (https:\/\/aclanthology.org\/2023.findings-acl.526)"},{"key":"513_CR10","volume-title":"Piece of table: a\u00a0divide-and-conquer approach for selecting sub-tables in table question answering. arxiv preprint arxiv:241207629","author":"W Lee","year":"2024","unstructured":"Lee\u00a0W, Kim\u00a0K, Lee\u00a0S et\u00a0al (2024) Piece of table: a\u00a0divide-and-conquer approach for selecting sub-tables in table question answering. arxiv preprint arxiv:241207629"},{"key":"513_CR11","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2307.03172","volume-title":"Lost in the middle: how language models use long contexts","author":"NF Liu","year":"2023","unstructured":"Liu\u00a0NF, Lin\u00a0K, Hewitt\u00a0J et\u00a0al (2023) Lost in the middle: how language models use long contexts https:\/\/doi.org\/10.48550\/ARXIV.2307.03172 (https:\/\/arxiv.org\/abs\/2307.03172)"},{"key":"513_CR12","unstructured":"Ma X, Sun K, Pradeep R et\u00a0al (2021) A replication study of dense passage retriever. CoRR abs\/2104.05740. https:\/\/arxiv.org\/abs\/2104.05740"},{"key":"513_CR13","volume-title":"GPT-4o system card","author":"A Open","year":"2024","unstructured":"Open\u00a0A (2024) GPT-4o system card"},{"key":"513_CR14","doi-asserted-by":"publisher","first-page":"6322","DOI":"10.18653\/v1\/2023.acl-long.348","volume-title":"Proceedings of the 61st annual meeting of the association for computational linguistics (volume 1: long papers)","author":"V Pal","year":"2023","unstructured":"Pal\u00a0V, Yates\u00a0A, Kanoulas\u00a0E et\u00a0al (2023) MultitabQA: generating tabular answers for multi-table question answering. In: Proceedings of the 61st annual meeting of the association for computational linguistics (volume 1: long papers). Association for Computational Linguistics, Toronto, Canada, pp\u00a06322\u20136334 https:\/\/doi.org\/10.18653\/v1\/2023.acl-long.348 (https:\/\/aclanthology.org\/2023.acl-long.348)"},{"key":"513_CR15","volume-title":"Lotus: Enabling semantic queries with llms over tables of unstructured and structured data. arXiv preprint arXiv:240711418","author":"L Patel","year":"2024","unstructured":"Patel\u00a0L, Jha\u00a0S, Guestrin\u00a0C et\u00a0al (2024) Lotus: Enabling semantic queries with llms over tables of unstructured and structured data. arXiv preprint arXiv:240711418"},{"key":"513_CR16","volume-title":"Tqa-bench: evaluating llms for multi-table question answering with scalable context and symbolic extension. arxiv preprint arxiv:241119504","author":"Z Qiu","year":"2024","unstructured":"Qiu\u00a0Z, Peng\u00a0Y, He\u00a0G et\u00a0al (2024) Tqa-bench: evaluating llms for multi-table question answering with scalable context and symbolic extension. arxiv preprint arxiv:241119504"},{"key":"513_CR17","first-page":"3982","volume-title":"Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP","author":"N Reimers","year":"2019","unstructured":"Reimers\u00a0N, Gurevych\u00a0I (2019) Sentence-bert: sentence embeddings using siamese bert-networks. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP, pp\u00a03982\u20133992"},{"key":"513_CR18","doi-asserted-by":"publisher","first-page":"267","DOI":"10.18420\/BTW2025-130","volume-title":"Datenbanksysteme f\u00fcr business, Technologie und web\u2014Workshopband (BTW 2025)","author":"D Risis","year":"2025","unstructured":"Risis\u00a0D, Bodensohn\u00a0JM, Urban\u00a0M et\u00a0al (2025) Towards complex table question answering over tabular data lakes. In: Datenbanksysteme f\u00fcr business, Technologie und web\u2014Workshopband (BTW 2025). Gesellschaft f\u00fcr Informatik, Bonn, pp\u00a0267\u2013275 https:\/\/doi.org\/10.18420\/BTW2025-130"},{"key":"513_CR19","doi-asserted-by":"publisher","first-page":"5418","DOI":"10.18653\/v1\/2020.emnlp-main.437","volume-title":"Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP)","author":"A Roberts","year":"2020","unstructured":"Roberts\u00a0A, Raffel\u00a0C, Shazeer\u00a0N (2020) How much knowledge can you pack into the parameters of a\u00a0language model? In: Webber\u00a0B, Cohn\u00a0T, He\u00a0Y et\u00a0al (eds) Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP). Association for Computational Linguistics, pp\u00a05418\u20135426 https:\/\/doi.org\/10.18653\/v1\/2020.emnlp-main.437 (https:\/\/aclanthology.org\/2020.emnlp-main.437)"},{"issue":"4","key":"513_CR20","first-page":"333","volume":"3","author":"S Robertson","year":"2009","unstructured":"Robertson\u00a0S, Zaragoza\u00a0H (2009) The probabilistic relevance framework: BM25 and beyond. Inf Retrieval 3(4):333\u2013389","journal-title":"Inf Retrieval"},{"key":"513_CR21","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2024.3513155","volume-title":"Advancing retrieval-augmented generation with inverted question matching for enhanced qa performance","author":"B Saha","year":"2024","unstructured":"Saha\u00a0B, Saha\u00a0U, Malik\u00a0MZ (2024) Advancing retrieval-augmented generation with inverted question matching for enhanced qa performance. IEEE"},{"key":"513_CR22","volume-title":"Caesura: language models as multi-modal query planners. arxiv preprint arxiv:230803424","author":"M Urban","year":"2023","unstructured":"Urban\u00a0M, Binnig\u00a0C (2023) Caesura: language models as multi-modal query planners. arxiv preprint arxiv:230803424"},{"key":"513_CR23","doi-asserted-by":"publisher","first-page":"36","DOI":"10.18653\/v1\/2022.suki-1.5","volume-title":"Proceedings of the workshop on structured and unstructured knowledge integration (SUKI)","author":"Z Wang","year":"2022","unstructured":"Wang\u00a0Z, Jiang\u00a0Z, Nyberg\u00a0E et\u00a0al (2022) Table retrieval may not necessitate table-specific model design. In: Proceedings of the workshop on structured and unstructured knowledge integration (SUKI), pp\u00a036\u201346"},{"key":"513_CR24","volume-title":"The thirteenth international conference on learning representations","author":"J Wu","year":"2025","unstructured":"Wu\u00a0J, Yang\u00a0L, Li\u00a0D et\u00a0al (2025) MMQA: evaluating LLms with multi-table multi-hop complex questions. In: The thirteenth international conference on learning representations (https:\/\/openreview.net\/forum?id=GGlpykXDCa)"},{"key":"513_CR25","volume-title":"Tablebench: a\u00a0comprehensive and complex benchmark for table question answering. arxiv preprint arxiv:240809174","author":"X Wu","year":"2024","unstructured":"Wu\u00a0X, Yang\u00a0J, Chai\u00a0L et\u00a0al (2024) Tablebench: a\u00a0comprehensive and complex benchmark for table question answering. arxiv preprint arxiv:240809174"},{"key":"513_CR26","volume-title":"Seq2SQL: generating structured queries from natural language using reinforcement learning","author":"V Zhong","year":"2017","unstructured":"Zhong\u00a0V, Xiong\u00a0C, Socher\u00a0R (2017) Seq2SQL: generating structured queries from natural language using reinforcement learning (http:\/\/arxiv.org\/abs\/1709.00103, arXiv:1709.00103 [cs)"}],"container-title":["Datenbank-Spektrum"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13222-025-00513-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s13222-025-00513-9","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13222-025-00513-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,6]],"date-time":"2026-02-06T07:52:42Z","timestamp":1770364362000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s13222-025-00513-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,27]]},"references-count":26,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,11]]}},"alternative-id":["513"],"URL":"https:\/\/doi.org\/10.1007\/s13222-025-00513-9","relation":{},"ISSN":["1618-2162","1610-1995"],"issn-type":[{"value":"1618-2162","type":"print"},{"value":"1610-1995","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,27]]},"assertion":[{"value":"31 May 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 September 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 October 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"D.\u00a0Risis, J.-M.\u00a0Bodensohn, M.\u00a0Urban and C.\u00a0Binnig declare that they have no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}