{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,30]],"date-time":"2025-11-30T09:19:37Z","timestamp":1764494377714,"version":"3.41.0"},"reference-count":28,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2025,6,11]],"date-time":"2025-06-11T00:00:00Z","timestamp":1749600000000},"content-version":"vor","delay-in-days":161,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,6,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Language models, potentially augmented with tool usage such as retrieval, are becoming the go-to means of answering questions. Understanding and answering questions in real-world settings often requires retrieving information from different sources, processing and aggregating data to extract insights, and presenting complex findings in the form of structured artifacts such as novel tables, charts, or infographics. In this paper, we introduce TANQ, the first open-domain question answering dataset where the answers require building tables from information across multiple sources. We release the full source attribution for every cell in the resulting table and benchmark state-of-the-art language models in open, oracle, and closed book setups. Our best-performing baseline, Gemini Flash, reaches an overall F1 score of 60.7, lagging behind human performance by 12.3 points. We analyze baselines\u2019 performance across different dataset attributes such as different skills required for this task, including multi-hop reasoning, math operations, and unit conversions. 
We further discuss common failures in model-generated answers, suggesting that TANQ is a complex task with many challenges ahead.<\/jats:p>","DOI":"10.1162\/tacl_a_00749","type":"journal-article","created":{"date-parts":[[2025,6,11]],"date-time":"2025-06-11T16:25:57Z","timestamp":1749659157000},"page":"461-480","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":1,"title":["TANQ: An Open Domain Dataset of Table Answered Questions"],"prefix":"10.1162","volume":"13","author":[{"given":"Mubashara","family":"Akhtar","sequence":"first","affiliation":[{"name":"King\u2019s College London, UK & ETH Zurich, Switzerland. mubashara.akhtar@ai.ethz.ch"},{"name":"Google DeepMind, Switzerland. chenxipang@google.com"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chenxi","family":"Pang","sequence":"additional","affiliation":[{"name":"Google DeepMind, Switzerland. chenxipang@google.com"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andreea","family":"Marzoca","sequence":"additional","affiliation":[{"name":"Google DeepMind, Switzerland. andreeam@google.com"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yasemin","family":"Altun","sequence":"additional","affiliation":[{"name":"Google DeepMind, Switzerland. altun@google.com"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Julian Martin","family":"Eisenschlos","sequence":"additional","affiliation":[{"name":"Google DeepMind, Switzerland. 
eisenjulian@google.com"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","published-online":{"date-parts":[[2025,6,6]]},"reference":[{"key":"2025061112255414100_bib1","doi-asserted-by":"publisher","first-page":"15391","DOI":"10.18653\/v1\/2023.findings-emnlp.1028","article-title":"Exploring the numerical reasoning capabilities of language models: A comprehensive analysis on tabular data","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2023","author":"Akhtar","year":"2023"},{"key":"2025061112255414100_bib2","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2205.12665","article-title":"QAMPARI: An open-domain question answering benchmark for questions with many answers from multiple paragraphs","volume":"abs\/2205.12665","author":"Amouyal","year":"2022","journal-title":"CoRR"},{"key":"2025061112255414100_bib3","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2312.11805","article-title":"Gemini: A family of highly capable multimodal models","volume":"abs\/2312.11805","author":"Anil","year":"2023","journal-title":"CoRR"},{"key":"2025061112255414100_bib4","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2305.10403","article-title":"Palm 2 technical report","volume":"abs\/2305.10403","author":"Anil","year":"2023","journal-title":"CoRR"},{"key":"2025061112255414100_bib5","doi-asserted-by":"publisher","first-page":"1563","DOI":"10.1109\/ICDAR.2019.00251","article-title":"ICDAR 2019 competition on scene text visual question answering","volume-title":"2019 International Conference on Document Analysis and Recognition, ICDAR 2019, Sydney, Australia, September 20\u201325, 2019","author":"Biten","year":"2019"},{"key":"2025061112255414100_bib6","article-title":"Language models are few-shot learners","volume-title":"Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6\u201312, 2020, 
virtual","author":"Brown","year":"2020"},{"key":"2025061112255414100_bib7","article-title":"Open question answering over tables and text","volume-title":"9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3\u20137, 2021","author":"Chen","year":"2021"},{"key":"2025061112255414100_bib8","article-title":"Tabfact: A large-scale dataset for table-based fact verification","volume-title":"8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26\u201330, 2020","author":"Chen","year":"2020"},{"key":"2025061112255414100_bib9","doi-asserted-by":"publisher","first-page":"1026","DOI":"10.18653\/v1\/2020.findings-emnlp.91","article-title":"HybridQA: A dataset of multi-hop question answering over tabular and textual data","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Chen","year":"2020"},{"key":"2025061112255414100_bib10","doi-asserted-by":"publisher","first-page":"3697","DOI":"10.18653\/v1\/2021.emnlp-main.300","article-title":"FinQA: A dataset of numerical reasoning over financial data","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Chen","year":"2021"},{"key":"2025061112255414100_bib11","doi-asserted-by":"publisher","first-page":"6279","DOI":"10.18653\/v1\/2022.emnlp-main.421","article-title":"ConvFinQA: Exploring the chain of numerical reasoning in conversational finance question answering","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing","author":"Chen","year":"2022"},{"key":"2025061112255414100_bib12","article-title":"The social economy: Unlocking value and productivity through social technologies","author":"Chui","year":"2012"},{"key":"2025061112255414100_bib13","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1007\/978-3-319-11964-9_4","article-title":"Introducing wikidata to the linked data 
Web","volume-title":"The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19\u201323, 2014. Proceedings, Part I","author":"Erxleben","year":"2014"},{"key":"2025061112255414100_bib14","doi-asserted-by":"publisher","first-page":"2309","DOI":"10.18653\/v1\/2020.acl-main.210","article-title":"INFOTABS: Inference on tables as semi-structured data","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Gupta","year":"2020"},{"key":"2025061112255414100_bib15","doi-asserted-by":"publisher","first-page":"4294","DOI":"10.18653\/v1\/2023.findings-acl.263","article-title":"RobustQA: Benchmarking the robustness of domain adaptation for open-domain question answering","volume-title":"Findings of the Association for Computational Linguistics: ACL 2023","author":"Han","year":"2023"},{"key":"2025061112255414100_bib16","doi-asserted-by":"publisher","first-page":"512","DOI":"10.18653\/v1\/2021.naacl-main.43","article-title":"Open domain question answering over tables via dense retrieval","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Herzig","year":"2021"},{"key":"2025061112255414100_bib17","doi-asserted-by":"publisher","first-page":"37","DOI":"10.18653\/v1\/2023.acl-industry.4","article-title":"MathPrompter: Mathematical reasoning using large language models","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)","author":"Imani","year":"2023"},{"key":"2025061112255414100_bib18","doi-asserted-by":"publisher","first-page":"1601","DOI":"10.18653\/v1\/P17-1147","article-title":"TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: 
Long Papers)","author":"Joshi","year":"2017"},{"key":"2025061112255414100_bib19","doi-asserted-by":"publisher","first-page":"1013","DOI":"10.1162\/tacl_a_00503","article-title":"ProoFVer: Natural logic theorem proving for fact verification","volume":"10","author":"Krishna","year":"2022","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2025061112255414100_bib20","doi-asserted-by":"publisher","first-page":"10381","DOI":"10.18653\/v1\/2023.findings-acl.660","article-title":"DePlot: One-shot visual language reasoning by plot-to-table translation","volume-title":"Findings of the Association for Computational Linguistics: ACL 2023","author":"Liu","year":"2023"},{"key":"2025061112255414100_bib21","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1162\/tacl_a_00446","article-title":"FeTaQA: Free-form table question answering","volume":"10","author":"Nan","year":"2022","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2025061112255414100_bib22","doi-asserted-by":"publisher","first-page":"6322","DOI":"10.18653\/v1\/2023.acl-long.348","article-title":"MultiTabQA: Generating tabular answers for multi-table question answering","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Pal","year":"2023"},{"key":"2025061112255414100_bib23","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2408.00118","article-title":"Gemma 2: Improving open language models at a practical size","volume":"abs\/2408.00118","author":"Rivi\u00e8re","year":"2024","journal-title":"CoRR"},{"key":"2025061112255414100_bib24","doi-asserted-by":"publisher","first-page":"2013","DOI":"10.18653\/v1\/D15-1237","article-title":"WikiQA: A challenge dataset for open-domain question answering","volume-title":"Proceedings of the 2015 Conference on Empirical Methods in Natural Language 
Processing","author":"Yi","year":"2015"},{"key":"2025061112255414100_bib25","doi-asserted-by":"publisher","first-page":"2369","DOI":"10.18653\/v1\/D18-1259","article-title":"HotpotQA: A dataset for diverse, explainable multi-hop question answering","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Yang","year":"2018"},{"key":"2025061112255414100_bib26","article-title":"React: Synergizing reasoning and acting in language models","volume-title":"The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1\u20135, 2023","author":"Yao","year":"2023"},{"key":"2025061112255414100_bib27","doi-asserted-by":"publisher","first-page":"6588","DOI":"10.18653\/v1\/2022.acl-long.454","article-title":"MultiHiertt: Numerical reasoning over multi hierarchical tabular and textual data","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Zhao","year":"2022"},{"key":"2025061112255414100_bib28","doi-asserted-by":"publisher","first-page":"3277","DOI":"10.18653\/v1\/2021.acl-long.254","article-title":"TAT-QA: A question answering benchmark on a hybrid of tabular and textual content in finance","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Zhu","year":"2021"}],"container-title":["Transactions of the Association for Computational 
Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00749\/2530175\/tacl_a_00749.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00749\/2530175\/tacl_a_00749.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,11]],"date-time":"2025-06-11T16:26:01Z","timestamp":1749659161000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00749\/131278\/TANQ-An-Open-Domain-Dataset-of-Table-Answered"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025]]},"references-count":28,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00749","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025]]},"published":{"date-parts":[[2025]]}}}