{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T17:42:50Z","timestamp":1757612570955,"version":"3.44.0"},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"10","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:p>Neural text-to-SQL models, which translate natural language questions (NLQs) into SQL queries given a database schema, have achieved remarkable performance. However, database schemas frequently evolve to meet new requirements. Such schema evolution often leads to performance degradation for models trained on static schemas. Existing work either mainly focuses on simply paraphrasing some syntactic or semantic mappings among NLQ, DB and SQL, or lacks a comprehensive and controllable way to investigate the model robustness issue under the schema evolution, which is insufficient when facing the increasingly complex and rich database schema changes in reality, especially in the LLM era.<\/jats:p>\n          <jats:p>To address the challenges posed by schema evolution, we present EvoSchema, a comprehensive benchmark designed to assess and enhance the robustness of text-to-SQL systems under real-world schema changes. EvoSchema introduces a novel schema evolution taxonomy, encompassing ten perturbation types across column-level and table-level modifications, systematically simulating the dynamic nature of database schemas. Through EvoSchema, we conduct an in-depth evaluation spanning different open-source and closed-source LLMs, revealing that table-level perturbations have a significantly greater impact on model performance compared to column-level changes. Furthermore, EvoSchema inspires the development of more resilient text-to-SQL systems, in terms of both model training and database design. 
Training on EvoSchema's diverse schema designs forces a model to distinguish schema differences for the same question rather than learn spurious patterns; on average, models trained this way are markedly more robust than those trained on unperturbed data. This benchmark offers valuable insights into model behavior and a path forward for designing systems capable of thriving in dynamic, real-world environments.<\/jats:p>","DOI":"10.14778\/3748191.3748222","type":"journal-article","created":{"date-parts":[[2025,9,4]],"date-time":"2025-09-04T13:50:16Z","timestamp":1756993816000},"page":"3655-3668","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["EvoSchema: Towards Text-to-SQL Robustness against Schema Evolution"],"prefix":"10.14778","volume":"18","author":[{"given":"Tianshu","family":"Zhang","sequence":"first","affiliation":[{"name":"The Ohio State University, Columbus, OH"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kun","family":"Qian","sequence":"additional","affiliation":[{"name":"Adobe Inc., Seattle, WA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Siddhartha","family":"Sahai","sequence":"additional","affiliation":[{"name":"Adobe Inc., Seattle, WA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuan","family":"Tian","sequence":"additional","affiliation":[{"name":"Purdue University, West Lafayette, IN"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shaddy","family":"Garg","sequence":"additional","affiliation":[{"name":"Adobe Inc., Bangalore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huan","family":"Sun","sequence":"additional","affiliation":[{"name":"The Ohio State University, Columbus, OH"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yunyao","family":"Li","sequence":"additional","affiliation":[{"name":"Adobe Inc., San Jose, 
CA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,9,4]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"FIFA World Cup: All the results from World Cups. Kaggle","author":"Becklas Andre","year":"2018","unstructured":"Andre Becklas. 2018. FIFA World Cup: All the results from World Cups. Kaggle (2018). https:\/\/www.kaggle.com\/datasets\/abecklas\/fifa-world-cup"},{"key":"e_1_2_1_2_1","volume-title":"The Eleventh International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=Wc5bmZZU9cy","author":"Chang Shuaichen","year":"2023","unstructured":"Shuaichen Chang, Jun Wang, Mingwen Dong, Lin Pan, Henghui Zhu, Alexander Hanbo Li, Wuwei Lan, Sheng Zhang, Jiarong Jiang, Joseph Lilien, Steve Ash, William Yang Wang, Zhiguo Wang, Vittorio Castelli, Patrick Ng, and Bing Xiang. 2023. Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness. In The Eleventh International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=Wc5bmZZU9cy"},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1016\/j.scico.2013.11.025","article-title":"Understanding database schema evolution: A case study","volume":"97","author":"Cleve Anthony","year":"2015","unstructured":"Anthony Cleve, Maxime Gobert, Loup Meurice, Jerome Maes, and Jens Weber. 2015. Understanding database schema evolution: A case study. Science of Computer Programming 97 (2015), 113\u2013121.","journal-title":"Science of Computer Programming"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the Future Technologies Conference. Springer, 555\u2013564","author":"Deksne Daiga","year":"2022","unstructured":"Daiga Deksne and Raivis Skadi\u0146\u0161. 2022. Virtual Assistant for Querying Databases in Natural Language. In Proceedings of the Future Technologies Conference. 
Springer, 555\u2013564."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME.2018.00073"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.105"},{"key":"e_1_2_1_7_1","unstructured":"Abhimanyu Dubey and et al. 2024. The Llama 3 Herd of Models. arXiv:2407.21783 [cs.AI] https:\/\/arxiv.org\/abs\/2407.21783"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","unstructured":"Jonathan F\u00fcrst Catherine Kosten Farhad Nooralahzadeh Yi Zhang and Kurt Stockinger. 2025. Evaluating the Data Model Robustness of Text-to-SQL Systems Based on Real User Queries. In EDBT. 158\u2013170. 10.48786\/edbt.2025.13","DOI":"10.48786\/edbt.2025.13"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.findings-emnlp.155"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/3641204.3641221"},{"key":"e_1_2_1_11_1","volume-title":"Self-adaptive Schema Migration Strategies. In International Conference on Model-Driven Engineering and Software Development. Springer, 230\u2013253","author":"Hillenbrand Andrea","year":"2021","unstructured":"Andrea Hillenbrand and Uta St\u00f6rl. 2021. Managing Schema Migration in NoSQL Databases: Advisor Heuristics vs. Self-adaptive Schema Migration Strategies. In International Conference on Model-Driven Engineering and Software Development. Springer, 230\u2013253."},{"key":"e_1_2_1_12_1","unstructured":"Albert Q. Jiang Alexandre Sablayrolles Arthur Mensch Chris Bamford Devendra Singh Chaplot Diego de las Casas Florian Bressand Gianna Lengyel Guillaume Lample Lucile Saulnier L\u00e9lio Renard Lavaud Marie-Anne Lachaux Pierre Stock Teven Le Scao Thibaut Lavril Thomas Wang Timoth\u00e9e Lacroix and William El Sayed. 2023. Mistral 7B. arXiv:2310.06825 [cs.CL] https:\/\/arxiv.org\/abs\/2310.06825"},{"key":"e_1_2_1_13_1","volume-title":"International conference on machine learning. 
PMLR, 5637\u20135664","author":"Koh Pang Wei","year":"2021","unstructured":"Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, et al. 2021. Wilds: A benchmark of in-the-wild distribution shifts. In International conference on machine learning. PMLR, 5637\u20135664."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/UPCON.2017.8251067"},{"key":"e_1_2_1_15_1","volume-title":"The Dawn of Natural Language to SQL: Are We Fully Ready? arXiv preprint arXiv:2406.01265","author":"Li Boyan","year":"2024","unstructured":"Boyan Li, Yuyu Luo, Chengliang Chai, Guoliang Li, and Nan Tang. 2024. The Dawn of Natural Language to SQL: Are We Fully Ready? arXiv preprint arXiv:2406.01265 (2024)."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.14778\/3685800.3685838"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3654930"},{"key":"e_1_2_1_18_1","unstructured":"Jinyang Li Binyuan Hui Ge Qu Jiaxi Yang Binhua Li Bowen Li Bailin Wang Bowen Qin Ruiying Geng Nan Huo et al. 2024. Can llm already serve as a database interface? a big bench for large-scale database grounded text-to-sqls. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_2_1_19_1","volume-title":"A Survey of NL2SQL with Large Language Models: Where are we, and where are we going? arXiv preprint arXiv:2408.05109","author":"Liu Xinyu","year":"2024","unstructured":"Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuxin Zhang, Ju Fan, Guoliang Li, Nan Tang, and Yuyu Luo. 2024. A Survey of NL2SQL with Large Language Models: Where are we, and where are we going? arXiv preprint arXiv:2408.05109 (2024)."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/3494124.3494139"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02295996"},{"key":"e_1_2_1_22_1","unstructured":"OpenAI. 2024. 
GPT-4 Technical Report. arXiv:2303.08774 [cs.CL] https:\/\/arxiv.org\/abs\/2303.08774"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.142"},{"key":"e_1_2_1_24_1","doi-asserted-by":"crossref","unstructured":"Joaquin Quionero-Candela Masashi Sugiyama Anton Schwaighofer and Neil D. Lawrence. 2009. Dataset Shift in Machine Learning. The MIT Press.","DOI":"10.7551\/mitpress\/9780262170055.001.0001"},{"key":"e_1_2_1_25_1","unstructured":"Baptiste Rozi\u00e8re Jonas Gehring Fabian Gloeckle Sten Sootla Itai Gat Xiaoqing Ellen Tan Yossi Adi Jingyu Liu Romain Sauvestre Tal Remez J\u00e9r\u00e9my Rapin Artyom Kozhevnikov Ivan Evtimov Joanna Bitton Manish Bhatt Cristian Canton Ferrer Aaron Grattafiori Wenhan Xiong Alexandre D\u00e9fossez Jade Copet Faisal Azhar Hugo Touvron Louis Martin Nicolas Usunier Thomas Scialom and Gabriel Synnaeve. 2024. Code Llama: Open Foundation Models for Code. arXiv:2308.12950 [cs.CL] https:\/\/arxiv.org\/abs\/2308.12950"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3639477.3639732"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlpmain.327"},{"key":"e_1_2_1_28_1","volume-title":"CHESS: Contextual Harnessing for Efficient SQL Synthesis. arXiv:2405.16755 [cs.LG] https:\/\/arxiv.org\/abs\/2405.16755","author":"Talaei Shayan","year":"2024","unstructured":"Shayan Talaei, Mohammadreza Pourreza, Yu-Chen Chang, Azalia Mirhoseini, and Amin Saberi. 2024. CHESS: Contextual Harnessing for Efficient SQL Synthesis. arXiv:2405.16755 [cs.LG] https:\/\/arxiv.org\/abs\/2405.16755"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.677"},{"key":"e_1_2_1_30_1","unstructured":"Chenglong Wang Kedar Tatwawadi Marc Brockschmidt Po-Sen Huang Yi Mao Oleksandr Polozov and Rishabh Singh. 2018. Robust Text-to-SQL Generation with Execution-Guided Decoding. 
arXiv:1807.03100 [cs.CL] https:\/\/arxiv.org\/abs\/1807.03100"},{"key":"e_1_2_1_31_1","volume-title":"Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush.","author":"Wolf Thomas","year":"2020","unstructured":"Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, R\u00e9mi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2020. HuggingFace's Transformers: State-of-the-art Natural Language Processing. arXiv:1910.03771 [cs.CL] https:\/\/arxiv.org\/abs\/1910.03771"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","unstructured":"Tao Yu Rui Zhang Kai Yang Michihiro Yasunaga Dongxu Wang Zifan Li James Ma Irene Li Qingning Yao Shanelle Roman Zilin Zhang and Dragomir Radev. 2018. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing Ellen Riloff David Chiang Julia Hockenmaier and Jun'ichi Tsujii (Eds.). Association for Computational Linguistics Brussels Belgium 3911\u20133921. 10.18653\/v1\/D18-1425","DOI":"10.18653\/v1\/D18-1425"},{"key":"e_1_2_1_33_1","volume-title":"Rui Zhao, Ziyue Li, and Hangyu Mao.","author":"Zhang Bin","year":"2024","unstructured":"Bin Zhang, Yuxiao Ye, Guoqing Du, Xiaoru Hu, Zhishuai Li, Sun Yang, Chi Harold Liu, Rui Zhao, Ziyue Li, and Hangyu Mao. 2024. Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation. arXiv:2403.02951 [cs.CL] https:\/\/arxiv.org\/abs\/2403.02951"},{"key":"e_1_2_1_34_1","volume-title":"FinSQL: Model-Agnostic LLMs-based Text-to-SQL Framework for Financial Analysis. 
arXiv preprint arXiv:2401.10506","author":"Zhang Chao","year":"2024","unstructured":"Chao Zhang, Yuren Mao, Yijiang Fan, Yu Mi, Yunjun Gao, Lu Chen, Dongfang Lou, and Jinshu Lin. 2024. FinSQL: Model-Agnostic LLMs-based Text-to-SQL Framework for Financial Analysis. arXiv preprint arXiv:2401.10506 (2024)."},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Hanchong Zhang Ruisheng Cao Lu Chen Hongshen Xu and Kai Yu. 2023. ACT-SQL: In-Context Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought. In The 2023 Conference on Empirical Methods in Natural Language Processing. https:\/\/openreview.net\/forum?id=oeZiXoCHgq","DOI":"10.18653\/v1\/2023.findings-emnlp.227"},{"key":"e_1_2_1_36_1","unstructured":"Tianshu Zhang Changchang Liu Wei-Han Lee Yu Su and Huan Sun. 2023. Federated Learning for Semantic Parsing: Task Formulation Evaluation Setup New Algorithms. arXiv:2305.17221 [cs.CL] https:\/\/arxiv.org\/abs\/2305.17221"},{"key":"e_1_2_1_37_1","doi-asserted-by":"crossref","unstructured":"Yanli Zhao Andrew Gu Rohan Varma Liang Luo Chien-Chin Huang Min Xu Less Wright Hamid Shojanazeri Myle Ott Sam Shleifer Alban Desmaison Can Balioglu Pritam Damania Bernard Nguyen Geeta Chauhan Yuchen Hao Ajit Mathews and Shen Li. 2023. PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. arXiv:2304.11277 [cs.DC] https:\/\/arxiv.org\/abs\/2304.11277","DOI":"10.14778\/3611540.3611569"},{"key":"e_1_2_1_38_1","volume-title":"StructLM: Towards Building Generalist Models for Structured Knowledge Grounding. arXiv preprint arXiv:2402.16671","author":"Zhuang Alex","year":"2024","unstructured":"Alex Zhuang, Ge Zhang, Tianyu Zheng, Xinrun Du, Junjie Wang, Weiming Ren, Stephen W Huang, Jie Fu, Xiang Yue, and Wenhu Chen. 2024. StructLM: Towards Building Generalist Models for Structured Knowledge Grounding. 
arXiv preprint arXiv:2402.16671 (2024)."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3748191.3748222","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,4]],"date-time":"2025-09-04T13:54:54Z","timestamp":1756994094000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3748191.3748222"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6]]},"references-count":38,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["10.14778\/3748191.3748222"],"URL":"https:\/\/doi.org\/10.14778\/3748191.3748222","relation":{},"ISSN":["2150-8097"],"issn-type":[{"type":"print","value":"2150-8097"}],"subject":[],"published":{"date-parts":[[2025,6]]},"assertion":[{"value":"2025-09-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}