{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,24]],"date-time":"2026-07-24T14:54:22Z","timestamp":1784904862182,"version":"3.55.0"},"reference-count":93,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,7]]},"abstract":"<jats:p>Text-to-SQL, the task of translating natural language questions into SQL queries, plays a crucial role in enabling non-experts to interact with databases. While recent advancements in large language models (LLMs) have significantly enhanced text-to-SQL performance, existing approaches face notable limitations in real-world text-to-SQL applications. Prompting-based methods often depend on closed-source LLMs, which are expensive, raise privacy concerns, and lack customization. Fine-tuning-based methods, on the other hand, suffer from poor generalizability due to the limited coverage of publicly available training data. To overcome these challenges, we propose a novel and scalable text-to-SQL data synthesis framework for automatically synthesizing large-scale, high-quality, and diverse datasets without extensive human intervention. Using this framework, we introduce SynSQL-2.5M, the first million-scale text-to-SQL dataset, containing 2.5 million samples spanning over 16,000 synthetic databases. Each sample includes a database, SQL query, natural language question, and chain-of-thought (CoT) solution. Leveraging SynSQL-2.5M, we develop OmniSQL, a powerful open-source text-to-SQL model available in three sizes: 7B, 14B, and 32B. Extensive evaluations across nine datasets demonstrate that OmniSQL achieves state-of-the-art performance, matching or surpassing leading closed-source and open-source LLMs, including GPT-4o and DeepSeek-V3, despite its smaller size. We release all code, datasets, and models to support further research.<\/jats:p>","DOI":"10.14778\/3749646.3749723","type":"journal-article","created":{"date-parts":[[2025,9,4]],"date-time":"2025-09-04T17:55:06Z","timestamp":1757008506000},"page":"4695-4709","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["OmniSQL: Synthesizing High-Quality Text-to-SQL Data at Scale"],"prefix":"10.14778","volume":"18","author":[{"given":"Haoyang","family":"Li","sequence":"first","affiliation":[{"name":"Renmin University of China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shang","family":"Wu","sequence":"additional","affiliation":[{"name":"Renmin University of China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xiaokang","family":"Zhang","sequence":"additional","affiliation":[{"name":"Renmin University of China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xinmei","family":"Huang","sequence":"additional","affiliation":[{"name":"Renmin University of China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jing","family":"Zhang","sequence":"additional","affiliation":[{"name":"Renmin University of China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Fuxin","family":"Jiang","sequence":"additional","affiliation":[{"name":"ByteDance Inc."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shuai","family":"Wang","sequence":"additional","affiliation":[{"name":"ByteDance Inc."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tieying","family":"Zhang","sequence":"additional","affiliation":[{"name":"ByteDance Inc."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jianjun","family":"Chen","sequence":"additional","affiliation":[{"name":"ByteDance Inc."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Rui","family":"Shi","sequence":"additional","affiliation":[{"name":"ByteDance Inc."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hong","family":"Chen","sequence":"additional","affiliation":[{"name":"Renmin University of China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Cuiping","family":"Li","sequence":"additional","affiliation":[{"name":"Renmin University of China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,9,4]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1017\/S135132490000005X"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2312.11805"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3620665.3640366"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.794"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-25007-6_25"},{"key":"e_1_2_1_6_1","volume-title":"SADGA: Structure-Aware Dual Graph Aggregation Network for Text-to-SQL. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021","author":"Cai Ruichu","year":"2021","unstructured":"Ruichu Cai, Jinjie Yuan, Boyan Xu, and Zhifeng Hao. 2021. SADGA: Structure-Aware Dual Graph Aggregation Network for Text-to-SQL. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6\u201314, 2021, virtual. 7664\u20137676."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2021.ACL-LONG.198"},{"key":"e_1_2_1_8_1","volume-title":"The Eleventh International Conference on Learning Representations, ICLR 2023","author":"Chang Shuaichen","year":"2023","unstructured":"Shuaichen Chang, Jun Wang, Mingwen Dong, Lin Pan, Henghui Zhu, Alexander Hanbo Li, Wuwei Lan, Sheng Zhang, Jiarong Jiang, Joseph Lilien, Steve Ash, and et al. 2023. Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1\u20135, 2023. OpenReview.net."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2406.11931"},{"key":"e_1_2_1_10_1","unstructured":"DeepSeek-AI. 2024. DeepSeek-V3 Technical Report. CoRR abs\/2412.19437 (2024). arXiv:2412.19437 10.48550\/ARXIV.2412.19437"},{"key":"e_1_2_1_11_1","unstructured":"DeepSeek-AI. 2025. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv:2501.12948 [cs.CL]"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.105"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3654975"},{"key":"e_1_2_1_14_1","volume-title":"GitSchemas: A Dataset for Automating Relational Data Preparation Tasks. In 38th IEEE International Conference on Data Engineering Workshops, ICDE Workshops 2022","author":"D\u00f6hmen Till","year":"2022","unstructured":"Till D\u00f6hmen, Madelon Hulsebos, Christian Beecks, and Sebastian Schelter. 2022. GitSchemas: A Dataset for Automating Relational Data Preparation Tasks. In 38th IEEE International Conference on Data Engineering Workshops, ICDE Workshops 2022, Kuala Lumpur, Malaysia, May 9, 2022. IEEE, 74\u201378."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2208.11857"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","unstructured":"Abhimanyu Dubey Abhinav Jauhri Abhinav Pandey Abhishek Kadian Ahmad Al-Dahle Aiesha Letman Akhil Mathur Alan Schelten Amy Yang Angela Fan Anirudh Goyal Anthony Hartshorn and et al. 2024. The Llama 3 Herd of Models. CoRR abs\/2407.21783 (2024). arXiv:2407.21783 10.48550\/ARXIV.2407.21783","DOI":"10.48550\/ARXIV.2407.21783"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2310.07875"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/3583140.3583165"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2021.ACL-LONG.195"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2021.EMNLP-MAIN.702"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/3641204.3641221"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2411.08599"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2411.15594"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589292"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/D18-1188"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","unstructured":"Daya Guo Qihao Zhu Dejian Yang Zhenda Xie Kai Dong Wentao Zhang Guanting Chen Xiao Bi Y. Wu and et al. 2024. DeepSeek-Coder: When the Large Language Model Meets Programming - The Rise of Code Intelligence. CoRR abs\/2401.14196 (2024). arXiv:2401.14196 10.48550\/ARXIV.2401.14196","DOI":"10.48550\/ARXIV.2401.14196"},{"key":"e_1_2_1_27_1","volume-title":"LoRA: Low-Rank Adaptation of Large Language Models. In The Tenth International Conference on Learning Representations, ICLR 2022","author":"Hu Edward J.","year":"2022","unstructured":"Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. LoRA: Low-Rank Adaptation of Large Language Models. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25\u201329, 2022. OpenReview.net."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2023.FINDINGS-ACL.86"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","unstructured":"Siming Huang Tianhao Cheng J. K. Liu Jiaran Hao Liuyihan Song Yang Xu J. Yang J. H. Liu Chenchen Zhang Linzheng Chai Ruifeng Yuan and et al. 2024. OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models. CoRR abs\/2411.04905 (2024). arXiv:2411.04905 10.48550\/ARXIV.2411.04905","DOI":"10.48550\/ARXIV.2411.04905"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","unstructured":"Binyuan Hui Jian Yang Zeyu Cui Jiaxi Yang Dayiheng Liu Lei Zhang Tianyu Liu Jiajun Zhang Bowen Yu Kai Dang An Yang and et al. 2024. Qwen2.5-Coder Technical Report. CoRR abs\/2409.12186 (2024). arXiv:2409.12186 10.48550\/ARXIV.2409.12186","DOI":"10.48550\/ARXIV.2409.12186"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588710"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","unstructured":"Aaron Jaech Adam Kalai Adam Lerer Adam Richardson Ahmed El-Kishky Aiden Low Alec Helyar Aleksander Madry Alex Beutel Alex Carney Alex Iftimie Alex Karpenko and et al. 2024. OpenAI o1 System Card. CoRR abs\/2412.16720 (2024). arXiv:2412.16720 10.48550\/ARXIV.2412.16720","DOI":"10.48550\/ARXIV.2412.16720"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2401.04088"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2025.naacl-long.94"},{"key":"e_1_2_1_35_1","volume-title":"Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022","author":"Kojima Takeshi","year":"2022","unstructured":"Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. 2022. Large Language Models are Zero-Shot Reasoners. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 \u2013 December 9, 2022."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3600006.3613165"},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the 31st International Conference on Computational Linguistics, COLING 2025, Abu Dhabi, UAE, January 19\u201324","author":"Lee Dongjun","year":"2025","unstructured":"Dongjun Lee, Choongwon Park, Jaehyuk Kim, and Heesoo Park. 2025. MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation. In Proceedings of the 31st International Conference on Computational Linguistics, COLING 2025, Abu Dhabi, UAE, January 19\u201324, 2025. Association for Computational Linguistics, 337\u2013353."},{"key":"e_1_2_1_38_1","volume-title":"EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022","author":"Lee Gyubok","year":"2022","unstructured":"Gyubok Lee, Hyeonji Hwang, Seongsu Bae, Yeonsu Kwon, Woncheol Shin, Seongjun Yang, Minjoon Seo, Jong-Yeup Kim, and Edward Choi. 2022. EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 \u2013 December 9, 2022."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2411.07763"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.14778\/3681954.3682003"},{"key":"e_1_2_1_41_1","volume-title":"Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search. CoRR abs\/2502.17248","author":"Li Boyan","year":"2025","unstructured":"Boyan Li, Jiayi Zhang, Ju Fan, Yanwei Xu, Chong Chen, Nan Tang, and Yuyu Luo. 2025. Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search. CoRR abs\/2502.17248 (2025)."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1609\/AAAI.V37I11.26535"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3654930"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1609\/AAAI.V37I11.26536"},{"key":"e_1_2_1_45_1","volume-title":"Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023","author":"Li Jinyang","year":"2023","unstructured":"Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li, Bailin Wang, Bowen Qin, Ruiying Geng, Nan Huo, Xuanhe Zhou, and et al. 2023. Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 \u2013 16, 2023."},{"key":"e_1_2_1_46_1","volume-title":"Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, and et al.","author":"Li Raymond","year":"2023","unstructured":"Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, and et al. 2023. StarCoder: may the source be with you! Trans. Mach. Learn. Res. 2023 (2023)."},{"key":"e_1_2_1_47_1","volume-title":"A Survey of NL2SQL with Large Language Models: Where are we, and where are we going? CoRR abs\/2408.05109","author":"Liu Xinyu","year":"2024","unstructured":"Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuyu Luo, Yuxin Zhang, Ju Fan, Guoliang Li, and Nan Tang. 2024. A Survey of NL2SQL with Large Language Models: Where are we, and where are we going? CoRR abs\/2408.05109 (2024)."},{"key":"e_1_2_1_48_1","volume-title":"Decoupled Weight Decay Regularization. In 7th International Conference on Learning Representations, ICLR 2019","author":"Loshchilov Ilya","year":"2019","unstructured":"Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6\u20139, 2019. OpenReview.net."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2402.19173"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.4236\/jcc.2025.132015"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2408.07702"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2405.04324"},{"key":"e_1_2_1_53_1","unstructured":"OpenAI. 2023. GPT-4 Technical Report. CoRR abs\/2303.08774 (2023). arXiv:2303.08774 10.48550\/ARXIV.2303.08774"},{"key":"e_1_2_1_54_1","unstructured":"OpenAI. 2024. GPT-4 Turbo and GPT-4. (2024). https:\/\/platform.openai.com\/docs\/models\/gpt-4-turbo-and-gpt-4."},{"key":"e_1_2_1_55_1","unstructured":"OpenAI. 2024. GPT-4o mini: advancing cost-efficient intelligence. (2024). https:\/\/openai.com\/index\/gpt-4o-mini-advancing-cost-efficient-intelligence\/."},{"key":"e_1_2_1_56_1","volume-title":"https:\/\/openai.com\/index\/hello-gpt-4o\/","author":"AI.","year":"2024","unstructured":"OpenAI. 2024. Hello GPT-4o. (2024). https:\/\/openai.com\/index\/hello-gpt-4o\/."},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-2024"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2410.01943"},{"key":"e_1_2_1_59_1","volume-title":"DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023","author":"Pourreza Mohammadreza","year":"2023","unstructured":"Mohammadreza Pourreza and Davood Rafiei. 2023. DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 \u2013 16, 2023."},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2022.EMNLP-MAIN.211"},{"key":"e_1_2_1_61_1","unstructured":"Alec Radford Karthik Narasimhan Tim Salimans Ilya Sutskever et al. 2018. Improving language understanding by generative pre-training. (2018)."},{"key":"e_1_2_1_62_1","article-title":"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer","volume":"21","author":"Raffel Colin","year":"2020","unstructured":"Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 21 (2020), 140:1\u2013140:67.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-short.15"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00024"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2204.00498"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","unstructured":"Machel Reid Nikolay Savinov Denis Teplyashin Dmitry Lepikhin Timothy P. Lillicrap Jean-Baptiste Alayrac Radu Soricut Angeliki Lazaridou Orhan Firat Julian Schrittwieser Ioannis Antonoglou Rohan Anil Sebastian Borgeaud Andrew M. Dai Katie Millican Ethan Dyer Mia Glaese Thibault Sottiaux Benjamin Lee Fabio Viola Malcolm Reynolds Yuanzhong Xu and et al. 2024. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. CoRR abs\/2403.05530 (2024). arXiv:2403.05530 10.48550\/ARXIV.2403.05530","DOI":"10.48550\/ARXIV.2403.05530"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1410"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/W17-1003"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2021.EMNLP-MAIN.779"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2405.16755"},{"key":"e_1_2_1_71_1","unstructured":"Transaction Processing Performance Council (TPC). [n.d.]. TPC-DS: Decision Support Benchmark. Online. Available: http:\/\/www.tpc.org\/tpcds\/."},{"key":"e_1_2_1_72_1","article-title":"Visualizing data using t-SNE","volume":"9","author":"der Maaten Laurens Van","year":"2008","unstructured":"Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of machine learning research 9, 11 (2008).","journal-title":"Journal of machine learning research"},{"key":"e_1_2_1_73_1","volume-title":"WikiDBs: A Large-Scale Corpus Of Relational Databases From Wikidata. In Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024","author":"Vogel Liane","year":"2024","unstructured":"Liane Vogel, Jan-Micha Bodensohn, and Carsten Binnig. 2024. WikiDBs: A Large-Scale Corpus Of Relational Databases From Wikidata. In Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 \u2013 15, 2024."},{"key":"e_1_2_1_74_1","volume-title":"Proceedings of the 31st International Conference on Computational Linguistics, COLING 2025, Abu Dhabi, UAE, January 19\u201324","author":"Wang Bing","year":"2025","unstructured":"Bing Wang, Changyu Ren, Jian Yang, Xinnian Liang, Jiaqi Bai, Linzheng Chai, Zhao Yan, Qian-Wen Zhang, Di Yin, Xing Sun, and Zhoujun Li. 2025. MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL. In Proceedings of the 31st International Conference on Computational Linguistics, COLING 2025, Abu Dhabi, UAE, January 19\u201324, 2025. Association for Computational Linguistics, 540\u2013557."},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2020.ACL-MAIN.677"},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2021.NAACL-MAIN.220"},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1807.03100"},{"key":"e_1_2_1_78_1","volume-title":"Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022","author":"Wei Jason","year":"2022","unstructured":"Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 \u2013 December 9, 2022."},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380589"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2021.EMNLP-MAIN.707"},{"key":"e_1_2_1_81_1","doi-asserted-by":"publisher","unstructured":"An Yang Baosong Yang Beichen Zhang Binyuan Hui Bo Zheng Bowen Yu Chengyuan Li Dayiheng Liu Fei Huang Haoran Wei Huan Lin Jian Yang and et al. 2024. Qwen2.5 Technical Report. CoRR abs\/2412.15115 (2024). arXiv:2412.15115 10.48550\/ARXIV.2412.15115","DOI":"10.48550\/ARXIV.2412.15115"},{"key":"e_1_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2024.ACL-LONG.425"},{"key":"e_1_2_1_83_1","volume-title":"Hierarchical Neural Data Synthesis for Semantic Parsing. CoRR abs\/2112.02212","author":"Yang Wei","year":"2021","unstructured":"Wei Yang, Peng Xu, and Yanshuai Cao. 2021. Hierarchical Neural Data Synthesis for Semantic Parsing. CoRR abs\/2112.02212 (2021). arXiv:2112.02212"},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2020.ACL-MAIN.745"},{"key":"e_1_2_1_85_1","volume-title":"GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing. In 9th International Conference on Learning Representations, ICLR 2021","author":"Yu Tao","year":"2021","unstructured":"Tao Yu, Chien-Sheng Wu, Xi Victoria Lin, Bailin Wang, Yi Chern Tan, Xinyi Yang, Dragomir R. Radev, Richard Socher, and Caiming Xiong. 2021. GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3\u20137, 2021. OpenReview.net."},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/D18-1193"},{"key":"e_1_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/D18-1425"},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.14778\/3636218.3636225"},{"key":"e_1_2_1_89_1","volume-title":"Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023","author":"Zheng Lianmin","year":"2023","unstructured":"Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, and Ion Stoica. 2023. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 \u2013 16, 2023."},{"key":"e_1_2_1_90_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2020.EMNLP-MAIN.29"},{"key":"e_1_2_1_91_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2020.EMNLP-MAIN.558"},{"key":"e_1_2_1_92_1","volume-title":"Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. CoRR abs\/1709.00103","author":"Zhong Victor","year":"2017","unstructured":"Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. CoRR abs\/1709.00103 (2017). arXiv:1709.00103"},{"key":"e_1_2_1_93_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2310.17631"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3749646.3749723","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,5]],"date-time":"2025-09-05T03:23:04Z","timestamp":1757042584000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3749646.3749723"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7]]},"references-count":93,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2025,7]]}},"alternative-id":["10.14778\/3749646.3749723"],"URL":"https:\/\/doi.org\/10.14778\/3749646.3749723","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,7]]},"assertion":[{"value":"2025-09-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}