{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,16]],"date-time":"2026-07-16T02:12:02Z","timestamp":1784167922864,"version":"3.55.0"},"reference-count":64,"publisher":"Association for Computing Machinery (ACM)","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2025,6,17]]},"abstract":"<jats:p>Finding relevant tables among databases, lakes, and repositories is the first step in extracting value from data. Such a task remains difficult because assessing whether a table is relevant to a problem does not always depend only on its content but also on the context, which is usually tribal knowledge known to the individual or team. While tools like data catalogs and academic data discovery systems target this problem, they rely on keyword search or more complex interfaces, limiting non-technical users' ability to find relevant data. The advent of large language models (LLMs) offers a unique opportunity for users to ask questions directly in natural language, making dataset discovery more intuitive, accessible, and efficient.<\/jats:p>\n                  <jats:p>\n                    In this paper, we introduce\n                    <jats:sc>Pneuma<\/jats:sc>\n                    , a retrieval-augmented generation (RAG) system designed to efficiently and effectively discover tabular data.\n                    <jats:sc>Pneuma<\/jats:sc>\n                    leverages large language models (LLMs) for both table representation and table retrieval. For table representation,\n                    <jats:sc>Pneuma<\/jats:sc>\n                    preserves schema and row-level information to ensure comprehensive data understanding. 
For table retrieval,\n                    <jats:sc>Pneuma<\/jats:sc>\n                    augments LLMs with traditional information retrieval techniques, such as full-text and vector search, harnessing the strengths of both to improve retrieval performance. To evaluate\n                    <jats:sc>Pneuma<\/jats:sc>\n                    , we generate comprehensive benchmarks that simulate table discovery workload on six real-world datasets including enterprise data, scientific databases, warehousing data, and open data. Our results demonstrate that\n                    <jats:sc>Pneuma<\/jats:sc>\n                    outperforms widely used table search systems (such as full-text search and state-of-the-art RAG systems) in accuracy and resource efficiency.\n                  <\/jats:p>","DOI":"10.1145\/3725337","type":"journal-article","created":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T21:23:29Z","timestamp":1750281809000},"page":"1-28","source":"Crossref","is-referenced-by-count":9,"title":["<scp>Pneuma<\/scp>\n                    : Leveraging LLMs for Tabular Data Representation and Retrieval in an End-to-End System"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-5324-7758","authenticated-orcid":false,"given":"Muhammad Imam Luthfi","family":"Balaka","sequence":"first","affiliation":[{"name":"University of Indonesia, Depok, Indonesia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-6239-9092","authenticated-orcid":false,"given":"David","family":"Alexander","sequence":"additional","affiliation":[{"name":"University of Indonesia, Depok, Indonesia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-6554-4161","authenticated-orcid":false,"given":"Qiming","family":"Wang","sequence":"additional","affiliation":[{"name":"The University of Chicago, Chicago, 
USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-8646-473X","authenticated-orcid":false,"given":"Yue","family":"Gong","sequence":"additional","affiliation":[{"name":"The University of Chicago, Chicago, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0745-6804","authenticated-orcid":false,"given":"Adila","family":"Krisnadhi","sequence":"additional","affiliation":[{"name":"University of Indonesia, Depok, Indonesia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7675-6080","authenticated-orcid":false,"given":"Raul","family":"Castro Fernandez","sequence":"additional","affiliation":[{"name":"The University of Chicago, Chicago, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,6,18]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al.","author":"Achiam Josh","year":"2023","unstructured":"Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al., 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, (2023)."},{"key":"e_1_2_1_2_1","unstructured":"Amey Agrawal Nitin Kedia Ashish Panwar Jayashree Mohan Nipun Kwatra Bhargav S. Gulavani Alexey Tumanov and Ramachandran Ramjee. 2024. Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve. arxiv:2403.02310 [cs.LG] https:\/\/arxiv.org\/abs\/2403.02310"},{"key":"e_1_2_1_3_1","unstructured":"amundsen. [n.d.]. https:\/\/www.amundsen.io\/."},{"key":"e_1_2_1_4_1","volume-title":"Bohou Li, Mark Lindblad, Henry Lindeman, Alex Meyer, Parth Parmar, Tanvi Ranade, Mehul A. 
Shah, Benjamin Sowell, Dan Tecuci, Vinayak Thapliyal, and Matt Welsh.","author":"Anderson Eric","year":"2024","unstructured":"Eric Anderson, Jonathan Fritz, Austin Lee, Bohou Li, Mark Lindblad, Henry Lindeman, Alex Meyer, Parth Parmar, Tanvi Ranade, Mehul A. Shah, Benjamin Sowell, Dan Tecuci, Vinayak Thapliyal, and Matt Welsh. 2024. The Design of an LLM-powered Unstructured Analytics System. arxiv:2409.00847 [cs.DB] https:\/\/arxiv.org\/abs\/2409.00847"},{"key":"e_1_2_1_5_1","unstructured":"Public BI Benchmark. [n.d.]. Public BI Benchmark. https:\/\/github.com\/cwida\/public_bi_benchmark"},{"key":"e_1_2_1_6_1","unstructured":"Asim Biswal Liana Patel Siddarth Jha Amog Kamsetty Shu Liu Joseph E. Gonzalez Carlos Guestrin and Matei Zaharia. 2024. Text2SQL is Not Enough: Unifying AI and Databases with TAG. arxiv:2408.14717 [cs.DB] https:\/\/arxiv.org\/abs\/2408.14717"},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Dan Brickley Matthew Burgess and Natasha Noy. 2019. Google Dataset Search: Building a search engine for datasets in an open Web ecosystem. In The world wide web conference. 1365-1375.","DOI":"10.1145\/3308558.3313685"},{"key":"e_1_2_1_8_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3596512","article-title":"An analysis of fusion functions for hybrid retrieval","volume":"42","author":"Bruch Sebastian","year":"2023","unstructured":"Sebastian Bruch, Siyu Gai, and Amir Ingber. 2023. An analysis of fusion functions for hybrid retrieval. ACM Transactions on Information Systems, Vol. 
42, 1 (2023), 1-35.","journal-title":"ACM Transactions on Information Systems"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476346"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2018.00094"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-acl.137"},{"key":"e_1_2_1_12_1","volume-title":"International Conference on Learning Representations.","author":"Chen Wenhu","year":"2020","unstructured":"Wenhu Chen, Ming-Wei Chang, Eva Schlinger, William Yang Wang, and William W Cohen. 2020. Open Question Answering over Tables and Text. In International Conference on Learning Representations."},{"key":"e_1_2_1_13_1","unstructured":"chromadb. [n.d.]. https:\/\/www.trychroma.com\/."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-021-00516-9"},{"key":"e_1_2_1_15_1","unstructured":"Hugging Face. [n.d.]. https:\/\/huggingface.co\/models."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE60146.2024.00272"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3665601.3669846"},{"key":"e_1_2_1_18_1","volume-title":"2018 IEEE 34th International Conference on Data Engineering (ICDE). IEEE, 1001-1012","author":"Fernandez Raul Castro","year":"2018","unstructured":"Raul Castro Fernandez, Ziawasch Abedjan, Famien Koko, Gina Yuan, Samuel Madden, and Michael Stonebraker. 2018. Aurum: A data discovery system. In 2018 IEEE 34th International Conference on Data Engineering (ICDE). IEEE, 1001-1012."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407800"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE55515.2023.00213"},{"key":"e_1_2_1_21_1","unstructured":"Jingtong Gao Bo Chen Xiangyu Zhao Weiwen Liu Xiangyang Li Yichao Wang Zijian Zhang Wanyu Wang Yuyang Ye Shanru Lin et al. 2024. LLM-enhanced Reranking in Recommender Systems. 
arXiv preprint arXiv:2406.12433 (2024)."},{"key":"e_1_2_1_22_1","unstructured":"Yunfan Gao Yun Xiong Xinyu Gao Kangxiang Jia Jinliu Pan Yuxi Bi Yi Dai Jiawei Sun and Haofen Wang. 2023. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997 (2023)."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkr777"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458723"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE55515.2023.00045"},{"key":"e_1_2_1_26_1","volume-title":"Ground: A Data Context Service. In 8th Biennial Conference on Innovative Data Systems Research, CIDR","author":"Hellerstein Joseph M.","year":"2017","unstructured":"Joseph M. Hellerstein, Vikram Sreekanti, Joseph E. Gonzalez, James Dalton, Akon Dey, Sreyashi Nag, Krishna Ramachandran, Sudhanshu Arora, Arka Bhattacharyya, Shirshanka Das, Mark Donsky, Gabriel Fierro, Chang She, Carl Steinbach, Venkat Subramanian, and Eric Sun. 2017. Ground: A Data Context Service. In 8th Biennial Conference on Innovative Data Systems Research, CIDR 2017, Chaminade, CA, USA, January 8--11, 2017, Online Proceedings. www.cidrdb.org. http:\/\/cidrdb.org\/cidr2017\/papers\/p111-hellerstein-cidr17.pdf"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.43"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.398"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.naacl-main.68"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-19-7596-7_14"},{"key":"e_1_2_1_31_1","unstructured":"jstor. [n.d.]. 
https:\/\/www.jstor.org\/."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.550"},{"key":"e_1_2_1_33_1","volume-title":"Advances in Neural Information Processing Systems","author":"Kusupati Aditya","year":"2022","unstructured":"Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, Sham Kakade, Prateek Jain, and Ali Farhadi. 2022. Matryoshka Representation Learning. In Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, (Eds.), Vol. 35. Curran Associates, Inc., 30233-30249. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2022\/file\/c32319f4868da7613d78af9993100e42-Paper-Conference.pdf"},{"key":"e_1_2_1_34_1","unstructured":"Guillaume Lample Alexis Conneau Ludovic Denoyer and Marc'Aurelio Ranzato. 2017. Unsupervised machine translation using monolingual corpora only. arXiv preprint arXiv:1711.00043 (2017)."},{"key":"e_1_2_1_35_1","unstructured":"Fangyu Lei Jixuan Chen Yuxiao Ye Ruisheng Cao Dongchan Shin Hongjin Su Zhaoqing Suo Hongcheng Gao Wenjing Hu Pengcheng Yin Victor Zhong Caiming Xiong Ruoxi Sun Qian Liu Sida Wang and Tao Yu. 2024. Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows. https:\/\/api.semanticscholar.org\/CorpusID:273970164"},{"key":"e_1_2_1_36_1","first-page":"9459","article-title":"Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks","volume":"33","author":"Lewis Patrick","year":"2020","unstructured":"Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich K\u00fcttler, Mike Lewis, Wen-tau Yih, Tim Rockt\u00e4schel, et al., 2020. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems, Vol. 
33 (2020), 9459-9474.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the 37th International Conference on Neural Information Processing Systems","author":"Li Jinyang","year":"2024","unstructured":"Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li, Bailin Wang, Bowen Qin, Ruiying Geng, Nan Huo, Xuanhe Zhou, Chenhao Ma, Guoliang Li, Kevin C.C. Chang, Fei Huang, Reynold Cheng, and Yongbin Li. 2024. Can LLM already serve as a database interface? a big bench for large-scale database grounded text-to-SQLs. In Proceedings of the 37th International Conference on Neural Information Processing Systems, (New Orleans, LA, USA) (NIPS '23). Curran Associates Inc., Red Hook, NY, USA, Article 1835, 28 pages."},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the Conference, on Innovative Database Research, (CIDR, )","author":"Liu Chunwei","year":"2025","unstructured":"Chunwei Liu, Matthew Russo, Michael Cafarella, Lei Cao, Peter Baile Chen, Zui Chen, Michael Franklin, Tim Kraska, Samuel Madden, Rana Shahout, and Gerardo Vitagliano. 2025. Palimpzest: Optimizing AI-Powered Analytics with Declarative Query Processing. In Proceedings of the Conference, on Innovative Database Research, (CIDR, ), (2025)."},{"key":"e_1_2_1_39_1","unstructured":"Jerry Liu. 2022. LlamaIndex . https:\/\/github.com\/jerryjliu\/llama_index"},{"key":"e_1_2_1_40_1","volume-title":"TAPEX: Table Pre-training via Learning a Neural SQL Executor. arxiv:2107.07653 [cs.CL]","author":"Liu Qian","year":"2021","unstructured":"Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, and Jian guang Lou. 2021. TAPEX: Table Pre-training via Learning a Neural SQL Executor. 
arxiv:2107.07653 [cs.CL]"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2407.03618"},{"key":"e_1_2_1_42_1","volume-title":"Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs","author":"Malkov Yu A","year":"2018","unstructured":"Yu A Malkov and Dmitry A Yashunin. 2018. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE transactions on pattern analysis and machine intelligence, Vol. 42, 4 (2018), 824-836."},{"key":"e_1_2_1_43_1","unstructured":"AirBnB Metis. [n.d.]. https:\/\/medium.com\/airbnb-engineering\/metis-building-airbnbs-next-generation-data-management-platform-d2c5219edf19."},{"key":"e_1_2_1_44_1","unstructured":"Microsoft. 2024. Adventure Works. https:\/\/learn.microsoft.com\/en-us\/sql\/samples\/adventureworks-install-configure?view=sql-server-ver16&tabs=ssms"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.eacl-main.148"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00446"},{"key":"e_1_2_1_47_1","volume-title":"QATCH: Benchmarking SQL-centric tasks with Table Representation Learning Models on Your Data. In Advances in Neural Information Processing Systems","author":"Papicchio Simone","year":"2023","unstructured":"Simone Papicchio, Paolo Papotti, and Luca Cagliero. 2023. QATCH: Benchmarking SQL-centric tasks with Table Representation Learning Models on Your Data. In Advances in Neural Information Processing Systems, A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, (Eds.), Vol. 36. Curran Associates, Inc., 30898-30917. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2023\/file\/62a24b69b820d30e9e5ad4f15ff7bf72-Paper-Datasets_and_Benchmarks.pdf"},{"key":"e_1_2_1_48_1","volume-title":"Semantic Operators: A Declarative Model for Rich, AI-based Analytics Over Text Data. 
arxiv:2407.11418 [cs.DB] https:\/\/arxiv.org\/abs\/2407.11418","author":"Patel Liana","year":"2024","unstructured":"Liana Patel, Siddharth Jha, Parth Asawa, Melissa Pan, Carlos Guestrin, and Matei Zaharia. 2024. Semantic Operators: A Declarative Model for Rich, AI-based Analytics Over Text Data. arxiv:2407.11418 [cs.DB] https:\/\/arxiv.org\/abs\/2407.11418"},{"key":"e_1_2_1_49_1","unstructured":"Chicago Data Portal. [n.d.]. Chicago Data Portal. https:\/\/data.cityofchicago.org\/"},{"key":"e_1_2_1_50_1","unstructured":"pubmed. [n.d.]. https:\/\/pubmed.ncbi.nlm.nih.gov\/."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1561\/1500000019"},{"key":"e_1_2_1_52_1","unstructured":"Yunfan Shao Linyang Li Junqi Dai and Xipeng Qiu. 2023. Character-LLM: A Trainable Agent for Role-Playing. arxiv:2310.10158 [cs.CL] https:\/\/arxiv.org\/abs\/2310.10158"},{"key":"e_1_2_1_53_1","volume-title":"Zephyr: Direct Distillation of LM Alignment. arxiv:2310.16944 [cs.LG] https:\/\/arxiv.org\/abs\/2310.16944","author":"Tunstall Lewis","year":"2023","unstructured":"Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Cl\u00e9mentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, and Thomas Wolf. 2023. Zephyr: Direct Distillation of LM Alignment. arxiv:2310.16944 [cs.LG] https:\/\/arxiv.org\/abs\/2310.16944"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3462909"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626756"},{"key":"e_1_2_1_56_1","volume-title":"Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding. In The Twelfth International Conference on Learning Representations. 
https:\/\/openreview.net\/forum?id=4L0xnS4GQM","author":"Wang Zilong","year":"2024","unstructured":"Zilong Wang, Hao Zhang, Chun-Liang Li, Julian Martin Eisenschlos, Vincent Perot, Zifeng Wang, Lesly Miculicich, Yasuhisa Fujii, Jingbo Shang, Chen-Yu Lee, and Tomas Pfister. 2024. Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding. In The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=4L0xnS4GQM"},{"key":"e_1_2_1_57_1","unstructured":"An Yang Baosong Yang Binyuan Hui Bo Zheng Bowen Yu Chang Zhou Chengpeng Li Chengyuan Li Dayiheng Liu Fei Huang Guanting Dong Haoran Wei Huan Lin Jialong Tang Jialin Wang Jian Yang Jianhong Tu Jianwei Zhang Jianxin Ma Jianxin Yang Jin Xu Jingren Zhou Jinze Bai Jinzheng He Junyang Lin Kai Dang Keming Lu Keqin Chen Kexin Yang Mei Li Mingfeng Xue Na Ni Pei Zhang Peng Wang Ru Peng Rui Men Ruize Gao Runji Lin Shijie Wang Shuai Bai Sinan Tan Tianhang Zhu Tianhao Li Tianyu Liu Wenbin Ge Xiaodong Deng Xiaohuan Zhou Xingzhang Ren Xinyu Zhang Xipin Wei Xuancheng Ren Xuejing Liu Yang Fan Yang Yao Yichang Zhang Yu Wan Yunfei Chu Yuqiong Liu Zeyu Cui Zhenru Zhang Zhifang Guo and Zhihao Fan. 2024. Qwen2 Technical Report. arxiv:2407.10671 [cs.CL] https:\/\/arxiv.org\/abs\/2407.10671"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.745"},{"key":"e_1_2_1_59_1","volume-title":"Radev","author":"Yu Tao","year":"2018","unstructured":"Tao Yu, Rui Zhang, Kai-Chou Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Z Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir R. Radev. 2018. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. ArXiv, Vol. abs\/1809.08887 (2018). 
https:\/\/api.semanticscholar.org\/CorpusID:52815560"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178876.3186067"},{"key":"e_1_2_1_61_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3372117","article-title":"Web table extraction, retrieval, and augmentation: A survey","volume":"11","author":"Zhang Shuo","year":"2020","unstructured":"Shuo Zhang and Krisztian Balog. 2020. Web table extraction, retrieval, and augmentation: A survey. ACM Transactions on Intelligent Systems and Technology (TIST), Vol. 11, 2 (2020), 1-35.","journal-title":"ACM Transactions on Intelligent Systems and Technology (TIST)"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.14778\/3659437.3659452"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517891"},{"key":"e_1_2_1_64_1","volume-title":"Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. ArXiv","author":"Zhong Victor","year":"2017","unstructured":"Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. ArXiv, Vol. abs\/1709.00103 (2017). 
https:\/\/api.semanticscholar.org\/CorpusID:25156106"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3725337","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T18:52:25Z","timestamp":1774983145000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3725337"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,17]]},"references-count":64,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,6,17]]}},"alternative-id":["10.1145\/3725337"],"URL":"https:\/\/doi.org\/10.1145\/3725337","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,17]]}}}