{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,18]],"date-time":"2026-05-18T18:32:55Z","timestamp":1779129175543,"version":"3.51.4"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"4","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2023,12]]},"abstract":"<jats:p>Natural Language to SQL systems (NL-to-SQL) have recently shown improved accuracy (exceeding 80%) for natural language to SQL query translation due to the emergence of transformer-based language models, and the popularity of the Spider benchmark. However, Spider mainly contains simple databases with few tables, columns, and entries, which do not reflect a realistic setting. Moreover, complex real-world databases with domain-specific content have little to no training data available in the form of NL\/SQL-pairs leading to poor performance of existing NL-to-SQL systems.<\/jats:p>\n          <jats:p>\n            In this paper, we introduce\n            <jats:italic>ScienceBenchmark<\/jats:italic>\n            , a new complex NL-to-SQL benchmark for three real-world, highly domain-specific databases. For this new benchmark, SQL experts and domain experts created high-quality NL\/SQL-pairs for each domain. To garner more data, we extended the small amount of human-generated data with synthetic data generated using GPT-3. We show that our benchmark is highly challenging, as the top performing systems on Spider achieve a very low performance on our benchmark. Thus, the challenge is many-fold: creating NL-to-SQL systems for highly complex domains with a small amount of hand-made training data augmented with synthetic data. 
To our knowledge,\n            <jats:italic>ScienceBenchmark<\/jats:italic>\n            is the first NL-to-SQL benchmark designed with complex real-world scientific databases, containing challenging training and test data carefully validated by domain experts.\n          <\/jats:p>","DOI":"10.14778\/3636218.3636225","type":"journal-article","created":{"date-parts":[[2024,3,5]],"date-time":"2024-03-05T17:04:07Z","timestamp":1709658247000},"page":"685-698","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":31,"title":["ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems"],"prefix":"10.14778","volume":"17","author":[{"given":"Yi","family":"Zhang","sequence":"first","affiliation":[{"name":"Zurich University of Applied Sciences, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jan","family":"Deriu","sequence":"additional","affiliation":[{"name":"Zurich University of Applied Sciences, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"George","family":"Katsogiannis-Meimarakis","sequence":"additional","affiliation":[{"name":"Athena Research Center, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Catherine","family":"Kosten","sequence":"additional","affiliation":[{"name":"Zurich University of Applied Sciences, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Georgia","family":"Koutrika","sequence":"additional","affiliation":[{"name":"Athena Research Center, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kurt","family":"Stockinger","sequence":"additional","affiliation":[{"name":"Zurich University of Applied Sciences, 
Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,3,5]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-019-00567-8"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3516431.3516436"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1017\/S135132490000005X"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1017\/S135132490000005X"},{"key":"e_1_2_1_5_1","volume-title":"Soda: Generating sql for business users. arXiv preprint arXiv:1207.0134","author":"Blunschi Lukas","year":"2012","unstructured":"Lukas Blunschi, Claudio Jossen, Donald Kossmann, Magdalini Mori, and Kurt Stockinger. 2012. Soda: Generating sql for business users. arXiv preprint arXiv:1207.0134 (2012)."},{"key":"e_1_2_1_6_1","unstructured":"Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared D Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020) 1877--1901."},{"key":"e_1_2_1_7_1","volume-title":"2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE, 2177--2182","author":"Brunner Ursin","year":"2021","unstructured":"Ursin Brunner and Kurt Stockinger. 2021. Valuenet: A natural language-to-sql system that learns from database information. In 2021 IEEE 37th International Conference on Data Engineering (ICDE). 
IEEE, 2177--2182."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.84"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1033"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.195"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-emnlp.174"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1188"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1444"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","unstructured":"Moshe Hazoom Vibhor Malik and Ben Bogin. 2021. Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data. 10.48550\/ARXIV.2106.05006","DOI":"10.48550\/ARXIV.2106.05006"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.nlp4prog-1.9"},{"key":"e_1_2_1_16_1","volume-title":"Unnatural instructions: Tuning language models with (almost) no human labor. arXiv preprint arXiv:2212.09689","author":"Honovich Or","year":"2022","unstructured":"Or Honovich, Thomas Scialom, Omer Levy, and Timo Schick. 2022. Unnatural instructions: Tuning language models with (almost) no human labor. arXiv preprint arXiv:2212.09689 (2022)."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","unstructured":"Wonseok Hwang Jinyeong Yim Seunghyun Park and Minjoon Seo. 2019. A Comprehensive Exploration on WikiSQL with Table-Aware Word Contextualization. 
10.48550\/ARXIV.1902.01069","DOI":"10.48550\/ARXIV.1902.01069"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.coling-main.34"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1089"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-022-00776-8"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.176"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.176"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735461.2735468"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2594519"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i11.26535"},{"key":"e_1_2_1_26_1","unstructured":"Jinyang Li Binyuan Hui Ge Qu Binhua Li Jiaxi Yang Bowen Li Bailin Wang Bowen Qin Rongyu Cao Ruiying Geng Nan Huo Chenhao Ma Kevin C. C. Chang Fei Huang Reynold Cheng and Yongbin Li. 2023. Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs. arXiv:2305.03111 [cs.CL]"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/3494124.3494139"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073083.1073135"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/604045.604070"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-6319"},{"key":"e_1_2_1_31_1","unstructured":"Alec Radford Jeffrey Wu Rewon Child David Luan Dario Amodei and Ilya Sutskever. 2019. \"Language Models are Unsupervised Multitask Learners\". (2019)."},{"key":"e_1_2_1_32_1","volume-title":"Exploring the limits of transfer learning with a unified text-to-text transformer. 
arXiv preprint arXiv:1910.10683","author":"Raffel Colin","year":"2019","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683 (2019)."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W15-3802"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-1003"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.spnlp-1.2"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994536"},{"key":"e_1_2_1_38_1","volume-title":"PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models. arXiv:2109.05093 [cs.CL]","author":"Scholak Torsten","year":"2021","unstructured":"Torsten Scholak, Nathan Schucher, and Dzmitry Bahdanau. 2021. PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models. arXiv:2109.05093 [cs.CL]"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407858"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-007-0075-9"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/564691.564758"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.3115\/1117794.1117811"},{"key":"e_1_2_1_43_1","volume-title":"Attention is all you need. Advances in neural information processing systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. 
Advances in neural information processing systems 30 (2017)."},{"key":"e_1_2_1_44_1","doi-asserted-by":"crossref","unstructured":"Bailin Wang Richard Shin Xiaodong Liu Oleksandr Polozov and Matthew Richardson. 2020. RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers. arXiv:1911.04942 [cs.CL]","DOI":"10.18653\/v1\/2020.acl-main.677"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380120"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380589"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.707"},{"key":"e_1_2_1_48_1","unstructured":"Xiaojun Xu Chang Liu and Dawn Song. 2017. SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning. arXiv:1711.04436 [cs.CL]"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133887"},{"key":"e_1_2_1_50_1","volume-title":"GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing. In International Conference on Learning Representations. https:\/\/arxiv.org\/abs\/2009","author":"Yu Tao","year":"2021","unstructured":"Tao Yu, Chien-Sheng Wu, Xi Victoria Lin, Bailin Wang, Yi Chern Tan, Xinyi Yang, Dragomir Radev, Richard Socher, and Caiming Xiong. 2021. GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing. In International Conference on Learning Representations. https:\/\/arxiv.org\/abs\/2009.13845"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1425"},{"key":"e_1_2_1_52_1","volume-title":"Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. arXiv:1809.08887 [cs.CL]","author":"Yu Tao","year":"2019","unstructured":"Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. 2019. 
Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. arXiv:1809.08887 [cs.CL]"},{"key":"e_1_2_1_53_1","unstructured":"Victor Zhong Caiming Xiong and Richard Socher. 2017. Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. arXiv:1709.00103 [cs.CL]"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3636218.3636225","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,5]],"date-time":"2024-03-05T17:06:26Z","timestamp":1709658386000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3636218.3636225"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12]]},"references-count":53,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,12]]}},"alternative-id":["10.14778\/3636218.3636225"],"URL":"https:\/\/doi.org\/10.14778\/3636218.3636225","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2023,12]]},"assertion":[{"value":"2024-03-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}