{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,1]],"date-time":"2026-06-01T20:34:28Z","timestamp":1780346068192,"version":"3.54.1"},"reference-count":85,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:p>Exploratory data analysis (EDA), coupled with SQL, is essential for data analysts involved in data exploration and analysis. However, data analysts often encounter two primary challenges: (1) the need to craft SQL queries skillfully and (2) the requirement to generate suitable visualization types that enhance the interpretation of query results. Given the significance of these challenges, substantial research effort has gone into addressing them, including approaches that leverage large language models (LLMs). However, existing methods fail to meet real-world data exploration requirements, primarily due to (1) complex database schemas, (2) unclear user intent, (3) limited cross-domain generalization capability, and (4) insufficient end-to-end text-to-visualization capability.<\/jats:p>\n          <jats:p>This paper presents TiInsight, an automated SQL-based cross-domain exploratory data analysis system. First, we propose a hierarchical data context (HDC), which leverages LLMs to summarize the context of the database schema; this context is crucial for open-world EDA systems to generalize across data domains. Second, the EDA system is divided into four components (i.e., stages): HDC generation, question clarification and decomposition, text-to-SQL generation (TiSQL), and data visualization (TiChart). Finally, we implemented an end-to-end EDA system with a user-friendly GUI in the production environment at PingCAP. 
We have also open-sourced all APIs of TiInsight to facilitate research within the EDA community. Through an extensive real-world user study, we demonstrate that TiInsight performs remarkably well compared with human experts. Additionally, TiSQL achieves an execution accuracy of 86.3% on the Spider dataset when using GPT-4. It also attains an execution accuracy of 60.98% on the BIRD test dataset.<\/jats:p>","DOI":"10.14778\/3750601.3750629","type":"journal-article","created":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T13:38:05Z","timestamp":1758029885000},"page":"5086-5099","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Towards Automated Cross-Domain Exploratory Data Analysis through Large Language Models"],"prefix":"10.14778","volume":"18","author":[{"given":"Jun-Peng","family":"Zhu","sequence":"first","affiliation":[{"name":"East China Normal University & PingCAP"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Boyan","family":"Niu","sequence":"additional","affiliation":[{"name":"PingCAP, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Peng","family":"Cai","sequence":"additional","affiliation":[{"name":"East China Normal University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zheming","family":"Ni","sequence":"additional","affiliation":[{"name":"PingCAP, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jianwei","family":"Wan","sequence":"additional","affiliation":[{"name":"PingCAP, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kai","family":"Xu","sequence":"additional","affiliation":[{"name":"PingCAP, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jiajun","family":"Huang","sequence":"additional","affiliation":[{"name":"PingCAP, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shengbo","family":"Ma","sequence":"additional","affiliation":[{"name":"PingCAP, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Bing","family":"Wang","sequence":"additional","affiliation":[{"name":"PingCAP, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xuan","family":"Zhou","sequence":"additional","affiliation":[{"name":"East China Normal University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Guanglei","family":"Bao","sequence":"additional","affiliation":[{"name":"PingCAP, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Donghui","family":"Zhang","sequence":"additional","affiliation":[{"name":"PingCAP, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Liu","family":"Tang","sequence":"additional","affiliation":[{"name":"PingCAP, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Qi","family":"Liu","sequence":"additional","affiliation":[{"name":"PingCAP, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,9,16]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Retrieved in May","year":"2024","unstructured":"[n.d.]. Act on AI-powered insights in your flow of work built on the Salesforce Platform with Agentforce. Retrieved in May, 2024 from https:\/\/www.tableau.com\/."},{"key":"e_1_2_1_2_1","volume-title":"Retrieved in May","year":"2024","unstructured":"[n.d.]. BIRD-SQL: A Big Bench for Large-Scale Database Grounded Text-to-SQLs. Retrieved in May, 2024 from https:\/\/bird-bench.github.io\/."},{"key":"e_1_2_1_3_1","volume-title":"Retrieved in May","author":"Chroma","year":"2024","unstructured":"[n.d.]. Chroma - the open-source embedding database. 
Retrieved in May, 2024 from https:\/\/github.com\/chroma-core\/chroma."},{"key":"e_1_2_1_4_1","volume-title":"Retrieved in May","year":"2024","unstructured":"[n.d.]. Crude Oil WTI Futures. Retrieved in May, 2024 from https:\/\/www.investing.com\/commodities\/crude-oil-historical-data."},{"key":"e_1_2_1_5_1","volume-title":"Retrieved in May","year":"2024","unstructured":"[n.d.]. Dow Jones Industrial Average. Retrieved in May, 2024 from https:\/\/fred.stlouisfed.org\/series\/DJIA."},{"key":"e_1_2_1_6_1","volume-title":"Retrieved in May","year":"2024","unstructured":"[n.d.]. Federal Funds Effective Rate. Retrieved in May, 2024 from https:\/\/fred.stlouisfed.org\/series\/FEDFUNDS."},{"key":"e_1_2_1_7_1","volume-title":"Retrieved in May","year":"2024","unstructured":"[n.d.]. Gold futures. Retrieved in May, 2024 from https:\/\/www.investing.com\/commodities\/gold-historical-data."},{"key":"e_1_2_1_8_1","volume-title":"Retrieved in May","year":"2024","unstructured":"[n.d.]. GPT-4o mini: advancing cost-efficient intelligence. Retrieved in May, 2024 from https:\/\/openai.com\/index\/gpt-4o-mini-advancing-cost-efficient-intelligence\/."},{"key":"e_1_2_1_9_1","volume-title":"Retrieved","author":"Hello","year":"2024","unstructured":"[n.d.]. Hello GPT-4o. Retrieved in May, 2024 from https:\/\/openai.com\/index\/hello-gpt-4o\/."},{"key":"e_1_2_1_10_1","volume-title":"Retrieved in May","year":"2024","unstructured":"[n.d.]. The JavaScript library for bespoke data visualization. Retrieved in May, 2024 from https:\/\/d3js.org\/."},{"key":"e_1_2_1_11_1","volume-title":"Retrieved in May","year":"2024","unstructured":"[n.d.]. NASDAQ Composite Index. Retrieved in May, 2024 from https:\/\/fred.stlouisfed.org\/series\/NASDAQCOM."},{"key":"e_1_2_1_12_1","volume-title":"Retrieved in May","year":"2024","unstructured":"[n.d.]. Pinecone serverless lets you deliver remarkable GenAI applications faster. 
Retrieved in May, 2024 from https:\/\/www.pinecone.io\/."},{"key":"e_1_2_1_13_1","volume-title":"Retrieved in May","author":"Power","year":"2024","unstructured":"[n.d.]. Power BI: Uncover powerful insights and turn them into impact. Retrieved in May, 2024 from https:\/\/www.microsoft.com\/en-us\/power-platform\/products\/power-bi."},{"key":"e_1_2_1_14_1","volume-title":"Retrieved in May","year":"2024","unstructured":"[n.d.]. Shanghai Shenzhen CSI 300. Retrieved in May, 2024 from https:\/\/www.investing.com\/indices\/csi300."},{"key":"e_1_2_1_15_1","volume-title":"Retrieved in May","year":"2024","unstructured":"[n.d.]. Spider: Yale Semantic Parsing and Text-to-SQL Challenge. Retrieved in May, 2024 from https:\/\/yale-lily.github.io\/spider."},{"key":"e_1_2_1_16_1","volume-title":"Unemployment Rate. Retrieved","year":"2024","unstructured":"[n.d.]. Unemployment Rate. Retrieved in May, 2024 from https:\/\/fred.stlouisfed.org\/series\/UNRATE."},{"key":"e_1_2_1_17_1","volume-title":"Vector Search (Beta) Overview. Retrieved","year":"2024","unstructured":"[n.d.]. Vector Search (Beta) Overview. Retrieved in May, 2024 from https:\/\/docs.pingcap.com\/tidbcloud\/vector-search-overview."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/3685800.3685913"},{"key":"e_1_2_1_19_1","volume-title":"The Claude 3 model family: Opus, Sonnet, Haiku. Claude-3 Model Card 1","author":"Anthropic AI","year":"2024","unstructured":"AI Anthropic. 2024. The Claude 3 model family: Opus, Sonnet, Haiku. Claude-3 Model Card 1 (2024)."},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 2319\u20132329","author":"Baik Christopher","year":"2020","unstructured":"Christopher Baik, Zhongjun Jin, Michael Cafarella, and HV Jagadish. 2020. Duoquest: A dual-specification system for expressive SQL queries. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 
2319\u20132329."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1448"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 5567\u20135577","author":"Chen Zhi","year":"2021","unstructured":"Zhi Chen, Lu Chen, Yanbin Zhao, Ruisheng Cao, Zihan Xu, Su Zhu, and Kai Yu. 2021. ShadowGNN: Graph Projection Neural Network for Text-to-SQL Parser. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 5567\u20135577."},{"key":"e_1_2_1_23_1","first-page":"309","article-title":"Ryansql: Recursively applying sketch-based slot fillings for complex text-to-sql in cross-domain databases","volume":"47","author":"Choi DongHyun","year":"2021","unstructured":"DongHyun Choi, Myeong Cheol Shin, EungGyun Kim, and Dong Ryeol Shin. 2021. Ryansql: Recursively applying sketch-based slot fillings for complex text-to-sql in cross-domain databases. Computational Linguistics 47, 2 (2021), 309\u2013332.","journal-title":"Computational Linguistics"},{"key":"e_1_2_1_24_1","volume-title":"2016 IEEE 32nd International Conference on Data Engineering (ICDE). IEEE, 1358\u20131361","author":"Deutch Daniel","year":"2016","unstructured":"Daniel Deutch and Amir Gilad. 2016. QPlain: Query by explanation. In 2016 IEEE 32nd International Conference on Data Engineering (ICDE). IEEE, 1358\u20131361."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/3565838.3565841"},{"key":"e_1_2_1_26_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. 
arXiv preprint arXiv:1810.04805 (2018)."},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the 2019 international conference on management of data. 317\u2013332","author":"Ding Rui","year":"2019","unstructured":"Rui Ding, Shi Han, Yong Xu, Haidong Zhang, and Dongmei Zhang. 2019. Quick-insights: Quick and automatic discovery of insights from multi-dimensional data. In Proceedings of the 2019 international conference on management of data. 317\u2013332."},{"key":"e_1_2_1_28_1","unstructured":"Xuemei Dong Chao Zhang Yuhang Ge Yuren Mao Yunjun Gao Jinshu Lin Dongfang Lou et al. 2023. C3: Zero-shot text-to-sql with chatgpt. arXiv preprint arXiv:2307.07306 (2023)."},{"key":"e_1_2_1_29_1","unstructured":"Ori Bar El Tova Milo and Amit Somech. 2020. Towards Autonomous Hands-Free Data Exploration.. In CIDR."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.14778\/3681954.3681960"},{"key":"e_1_2_1_31_1","doi-asserted-by":"crossref","first-page":"1262","DOI":"10.14778\/3342263.3342266","article-title":"Example-driven query intent discovery: abductive reasoning using semantic similarity","volume":"12","author":"Fariha Anna","year":"2019","unstructured":"Anna Fariha and Alexandra Meliou. 2019. Example-driven query intent discovery: abductive reasoning using semantic similarity. Proceedings of the VLDB Endowment 12, 11 (2019), 1262\u20131275.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_2_1_32_1","volume-title":"Brandon Chow, Kai Deng, Katherine Lin, Marcos Campos, K. Venkatesh Emani, Vivek Pandit, Victor Shnayder, Wenjing Wang, and Carlo Curino.","author":"Floratou Avrilia","year":"2024","unstructured":"Avrilia Floratou, Fotis Psallidas, Fuheng Zhao, Shaleen Deep, Gunther Hagleither, Wangda Tan, Joyce Cahoon, Rana Alotaibi, Jordan Henkel, Abhik Singla, Alex Van Grootel, Brandon Chow, Kai Deng, Katherine Lin, Marcos Campos, K. Venkatesh Emani, Vivek Pandit, Victor Shnayder, Wenjing Wang, and Carlo Curino. 2024. NL2SQL is a solved problem... 
Not!. In CIDR. https:\/\/www.cidrdb.org\/cidr2024\/papers\/p74-floratou.pdf"},{"key":"e_1_2_1_33_1","volume-title":"Text-to-sql empowered by large language models: A benchmark evaluation. arXiv preprint arXiv:2308.15363","author":"Gao Dawei","year":"2023","unstructured":"Dawei Gao, Haibin Wang, Yaliang Li, Xiuyu Sun, Yichen Qian, Bolin Ding, and Jingren Zhou. 2023. Text-to-sql empowered by large language models: A benchmark evaluation. arXiv preprint arXiv:2308.15363 (2023)."},{"key":"e_1_2_1_34_1","volume-title":"Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning. SIGMOD","author":"Gu Zihui","year":"2023","unstructured":"Zihui Gu, Ju Fan, Nan Tang, Lei Cao, Bowen Jia, Sam Madden, and Xiaoyong Du. 2023. Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning. SIGMOD (2023), 1\u201328."},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics.","author":"Guo Jiaqi","year":"2019","unstructured":"Jiaqi Guo, Zecheng Zhan, Yan Gao, Yan Xiao, Jian-Guang Lou, Ting Liu, and Dongmei Zhang. 2019. Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics."},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"38","author":"He Xinyi","year":"2024","unstructured":"Xinyi He, Mengyu Zhou, Xinrun Xu, Xiaojun Ma, Rui Ding, Lun Du, Yan Gao, Ran Jia, Xu Chen, Shi Han, et al. 2024. Text2analysis: A benchmark of table question answering with advanced data analysis and unclear queries. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 
18206\u201318215."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2731084"},{"key":"e_1_2_1_38_1","volume-title":"A survey on deep learning approaches for text-to-SQL. The VLDB Journal","author":"Katsogiannis-Meimarakis George","year":"2023","unstructured":"George Katsogiannis-Meimarakis and Georgia Koutrika. 2023. A survey on deep learning approaches for text-to-SQL. The VLDB Journal (2023), 1\u201332."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.14778\/3401960.3401970"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.14778\/3229863.3240493"},{"key":"e_1_2_1_41_1","doi-asserted-by":"crossref","first-page":"727","DOI":"10.14778\/3494124.3494151","article-title":"Lux: always-on visualization recommendations for exploratory dataframe workflows","volume":"15","author":"Jung-Lin Lee Doris","year":"2021","unstructured":"Doris Jung-Lin Lee, Dixin Tang, Kunal Agarwal, Thyne Boonmark, Caitlyn Chen, Jake Kang, Ujjaini Mukhopadhyay, Jerry Song, Micah Yong, Marti A Hearst, et al. 2021. Lux: always-on visualization recommendations for exploratory dataframe workflows. Proceedings of the VLDB Endowment 15, 3 (2021), 727\u2013738.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i11.26535"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3654930"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i11.26536"},{"key":"e_1_2_1_45_1","unstructured":"Jinyang Li Binyuan Hui Ge Qu Jiaxi Yang Binhua Li Bowen Li Bailin Wang Bowen Qin Ruiying Geng Nan Huo et al. 2024. Can llm already serve as a database interface? a big bench for large-scale database grounded text-to-sqls. 
Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2018.2807843"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.438"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2018.00019"},{"key":"e_1_2_1_49_1","volume-title":"Proceedings of the 2021 international conference on management of data. 1262\u20131274","author":"Ma Pingchuan","year":"2021","unstructured":"Pingchuan Ma, Rui Ding, Shi Han, and Dongmei Zhang. 2021. Metainsight: Automatic discovery of structured knowledge for exploratory data analysis. In Proceedings of the 2021 international conference on management of data. 1262\u20131274."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-demo.31"},{"key":"e_1_2_1_51_1","volume-title":"XInsight: eXplainable Data Analysis Through The Lens of Causality. SIGMOD","author":"Ma Pingchuan","year":"2023","unstructured":"Pingchuan Ma, Rui Ding, Shuai Wang, Shi Han, and Dongmei Zhang. 2023. XInsight: eXplainable Data Analysis Through The Lens of Causality. SIGMOD (2023), 1\u201327."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3211954.3211958"},{"key":"e_1_2_1_53_1","doi-asserted-by":"crossref","unstructured":"Tova Milo and Amit Somech. 2020. Automating exploratory data analysis via machine learning: An overview. In SIGMOD. 2617\u20132622.","DOI":"10.1145\/3318464.3383126"},{"key":"e_1_2_1_54_1","volume-title":"Retrieved","author":"AI.","year":"2024","unstructured":"OpenAI. [n.d.]. GPT-4. Retrieved in May, 2024 from https:\/\/openai.com\/research\/gpt-4."},{"key":"e_1_2_1_55_1","volume-title":"Retrieved in May","author":"AI.","year":"2022","unstructured":"OpenAI. 2022. Introducing ChatGPT. 
Retrieved in May, 2024 from https:\/\/openai.com\/blog\/chatgpt."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.14778\/3717755.3717774"},{"key":"e_1_2_1_57_1","volume-title":"Proceedings of the 2021 International Conference on Management of Data. 2271\u20132280","author":"Peng Jinglin","year":"2021","unstructured":"Jinglin Peng, Weiyuan Wu, Brandon Lockhart, Song Bian, Jing Nathan Yan, Linghao Xu, Zhixuan Chi, Jeffrey M Rzeszotarski, and Jiannan Wang. 2021. DataPrep.EDA: Task-centric exploratory data analysis for statistical modeling in python. In Proceedings of the 2021 International Conference on Management of Data. 2271\u20132280."},{"key":"e_1_2_1_58_1","volume-title":"Retrieved in June","author":"CAP.","year":"2025","unstructured":"PingCAP. [n.d.]. Use Knowledge Bases. Retrieved in June, 2025 from https:\/\/docs.pingcap.com\/tidbcloud\/use-chat2query-knowledge."},{"key":"e_1_2_1_59_1","volume-title":"Din-sql: Decomposed in-context learning of text-to-sql with self-correction. Advances in Neural Information Processing Systems 36","author":"Pourreza Mohammadreza","year":"2024","unstructured":"Mohammadreza Pourreza and Davood Rafiei. 2024. Din-sql: Decomposed in-context learning of text-to-sql with self-correction. Advances in Neural Information Processing Systems 36 (2024)."},{"key":"e_1_2_1_60_1","volume-title":"DTS-SQL: Decomposed Text-to-SQL with Small Large Language Models. arXiv preprint arXiv:2402.01117","author":"Pourreza Mohammadreza","year":"2024","unstructured":"Mohammadreza Pourreza and Davood Rafiei. 2024. DTS-SQL: Decomposed Text-to-SQL with Small Large Language Models. arXiv preprint arXiv:2402.01117 (2024)."},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-019-00588-3"},{"key":"e_1_2_1_62_1","volume-title":"Align it! A Novel and Effective Strategy for Mitigating Hallucinations in Text-to-SQL Generation. 
arXiv preprint arXiv:2405.15307","author":"Qu Ge","year":"2024","unstructured":"Ge Qu, Jinyang Li, Bowen Li, Bowen Qin, Nan Huo, Chenhao Ma, and Reynold Cheng. 2024. Before Generation, Align it! A Novel and Effective Strategy for Mitigating Hallucinations in Text-to-SQL Generation. arXiv preprint arXiv:2405.15307 (2024)."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994536"},{"key":"e_1_2_1_64_1","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 9895\u20139901","author":"Scholak Torsten","year":"2021","unstructured":"Torsten Scholak, Nathan Schucher, and Dzmitry Bahdanau. 2021. PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 9895\u20139901."},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407858"},{"key":"e_1_2_1_66_1","volume-title":"Towards natural language interfaces for data visualization: A survey","author":"Shen Leixian","year":"2022","unstructured":"Leixian Shen, Enya Shen, Yuyu Luo, Xiaocong Yang, Xuming Hu, Xiongshuai Zhang, Zhiwei Tai, and Jianmin Wang. 2022. Towards natural language interfaces for data visualization: A survey. IEEE transactions on visualization and computer graphics 29, 6 (2022), 3121\u20133144."},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/2945.981851"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035922"},{"key":"e_1_2_1_69_1","volume-title":"Proceedings of the 2022 International Conference on Management of Data. 2353\u20132356","author":"Tang Jiawei","year":"2022","unstructured":"Jiawei Tang, Yuyu Luo, Mourad Ouzzani, Guoliang Li, and Hongyang Chen. 2022. Sevi: Speech-to-visualization through neural machine translation. In Proceedings of the 2022 International Conference on Management of Data. 
2353\u20132356."},{"key":"e_1_2_1_70_1","volume-title":"Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971","author":"Touvron Hugo","year":"2023","unstructured":"Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth\u00e9e Lacroix, Baptiste Rozi\u00e8re, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023)."},{"key":"e_1_2_1_71_1","volume-title":"Proceedings of the VLDB Endowment International Conference on Very Large Data Bases","volume":"8","author":"Vartak Manasi","year":"2015","unstructured":"Manasi Vartak, Sajjadur Rahman, Samuel Madden, Aditya Parameswaran, and Neoklis Polyzotis. 2015. Seedb: Efficient data-driven visualization recommendations to support visual analytics. In Proceedings of the VLDB Endowment International Conference on Very Large Data Bases, Vol. 8. NIH Public Access, 2182."},{"key":"e_1_2_1_72_1","unstructured":"Bing Wang Changyu Ren Jian Yang Xinnian Liang Jiaqi Bai Linzheng Chai Zhao Yan Qian-Wen Zhang Di Yin Xing Sun and Zhoujun Li. 2024. MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL. arXiv:2312.11242 [cs.CL]"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/3025453.3025768"},{"key":"e_1_2_1_74_1","doi-asserted-by":"crossref","first-page":"5049","DOI":"10.1109\/TVCG.2021.3099002","article-title":"Ai4vis: Survey on artificial intelligence approaches for data visualization","volume":"28","author":"Wu Aoyu","year":"2021","unstructured":"Aoyu Wu, Yun Wang, Xinhuan Shu, Dominik Moritz, Weiwei Cui, Haidong Zhang, Dongmei Zhang, and Huamin Qu. 2021. Ai4vis: Survey on artificial intelligence approaches for data visualization. 
IEEE Transactions on Visualization and Computer Graphics 28, 12 (2021), 5049\u20135070.","journal-title":"IEEE Transactions on Visualization and Computer Graphics"},{"key":"e_1_2_1_75_1","volume-title":"Annual meeting of the Association for Computational Linguistics (ACL). 1341\u20131350","author":"Xiao Chunyang","year":"2016","unstructured":"Chunyang Xiao, Marc Dymetman, and Claire Gardent. 2016. Sequence-based structured prediction for semantic parsing. In Annual meeting of the Association for Computational Linguistics (ACL). 1341\u20131350."},{"key":"e_1_2_1_76_1","volume-title":"HAIChart: Human and AI Paired Visualization System. arXiv preprint arXiv:2406.11033","author":"Xie Yupeng","year":"2024","unstructured":"Yupeng Xie, Yuyu Luo, Guoliang Li, and Nan Tang. 2024. HAIChart: Human and AI Paired Visualization System. arXiv preprint arXiv:2406.11033 (2024)."},{"key":"e_1_2_1_77_1","volume-title":"Sqlnet: Generating structured queries from natural language without reinforcement learning. arXiv preprint arXiv:1711.04436","author":"Xu Xiaojun","year":"2017","unstructured":"Xiaojun Xu, Chang Liu, and Dawn Song. 2017. Sqlnet: Generating structured queries from natural language without reinforcement learning. arXiv preprint arXiv:1711.04436 (2017)."},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1425"},{"key":"e_1_2_1_79_1","volume-title":"FinSQL: Model-Agnostic LLMs-based Text-to-SQL Framework for Financial Analysis. arXiv preprint arXiv:2401.10506","author":"Zhang Chao","year":"2024","unstructured":"Chao Zhang, Yuren Mao, Yijiang Fan, Yu Mi, Yunjun Gao, Lu Chen, Dongfang Lou, and Jinshu Lin. 2024. FinSQL: Model-Agnostic LLMs-based Text-to-SQL Framework for Financial Analysis. arXiv preprint arXiv:2401.10506 (2024)."},{"key":"e_1_2_1_80_1","volume-title":"Seq2sql: Generating structured queries from natural language using reinforcement learning. 
arXiv preprint arXiv:1709.00103","author":"Zhong Victor","year":"2017","unstructured":"Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2sql: Generating structured queries from natural language using reinforcement learning. arXiv preprint arXiv:1709.00103 (2017)."},{"key":"e_1_2_1_81_1","volume-title":"Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2389\u20132399","author":"Zhou Mengyu","year":"2021","unstructured":"Mengyu Zhou, Qingtao Li, Xinyi He, Yuejiang Li, Yibo Liu, Wei Ji, Shi Han, Yining Chen, Daxin Jiang, and Dongmei Zhang. 2021. Table2Charts: recommending charts by learning shared table representations. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2389\u20132399."},{"key":"e_1_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE60146.2024.00420"},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.14778\/3685800.3685816"},{"key":"e_1_2_1_84_1","volume-title":"UNITQA: A Unified Automated Tabular Question Answering System with Multi-Agent Large Language Models. In Companion of the 2025 International Conference on Management of Data. 279\u2013282","author":"Zhu Jun-Peng","year":"2025","unstructured":"Jun-Peng Zhu, Peng Cai, Kai Xu, Li Li, Yishen Sun, Shuai Zhou, Haihuang Su, Liu Tang, and Qi Liu. 2025. UNITQA: A Unified Automated Tabular Question Answering System with Multi-Agent Large Language Models. In Companion of the 2025 International Conference on Management of Data. 279\u2013282."},{"key":"e_1_2_1_85_1","unstructured":"Jun-Peng Zhu Boyan Niu Peng Cai Zheming Ni Jianwei Wan Kai Xu Jiajun Huang Shengbo Ma Bing Wang Xuan Zhou et al. 2024. Towards Automated Cross-domain Exploratory Data Analysis through Large Language Models. 
arXiv preprint arXiv:2412.07214 (2024)."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3750601.3750629","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T13:42:18Z","timestamp":1758030138000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3750601.3750629"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8]]},"references-count":85,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["10.14778\/3750601.3750629"],"URL":"https:\/\/doi.org\/10.14778\/3750601.3750629","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,8]]},"assertion":[{"value":"2025-09-16","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}