{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T06:28:46Z","timestamp":1778048926823,"version":"3.51.4"},"reference-count":76,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,7]]},"abstract":"<jats:p>\n            The semantic capabilities of large language models (LLMs) have the potential to enable rich analytics and reasoning over vast knowledge corpora. Unfortunately, existing systems either empirically optimize expensive LLM-powered operations with\n            <jats:italic toggle=\"yes\">no performance guarantees<\/jats:italic>\n            , or limit their support to simple batched-inference primitives. We introduce\n            <jats:italic toggle=\"yes\">semantic operators<\/jats:italic>\n            , the first formalism with statistical accuracy guarantees for general-purpose AI-based operations with natural language parameters (e.g., filtering, sorting, joining or aggregating records using natural language criteria). Each operator can be implemented by multiple\n            <jats:italic toggle=\"yes\">AI algorithms<\/jats:italic>\n            , which compose individual model invocations to orchestrate the model over the data. Our programming model specifies the expected behavior of each operator with a high-quality\n            <jats:italic toggle=\"yes\">reference algorithm<\/jats:italic>\n            , and we develop an optimization framework that reduces cost, while providing accuracy guarantees for individual operators. Using this approach, we propose several novel optimizations to accelerate semantic filtering, joining, group-by and top-k operations by up to 1, 000\u00d7. 
We implement semantic operators in the LOTUS system and demonstrate LOTUS' effectiveness on real, bulk-semantic processing applications, including fact-checking, biomedical multi-label classification, search, and topic analysis. We show that the semantic operator model is expressive, capturing state-of-the-art AI pipelines in a few operator calls, and making it easy to express new pipelines that match or exceed quality of recent LLM-based analytic systems by up to 170%, while offering accuracy guarantees. Overall, LOTUS programs match or exceed the accuracy of state-of-the-art AI pipelines for each task while running up to 3.6\u00d7 faster than the highest-quality baselines. LOTUS is publicly available at https:\/\/github.com\/lotus-data\/lotus.\n          <\/jats:p>","DOI":"10.14778\/3749646.3749685","type":"journal-article","created":{"date-parts":[[2025,9,4]],"date-time":"2025-09-04T17:55:06Z","timestamp":1757008506000},"page":"4171-4184","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Semantic Operators and Their Optimization: Enabling LLM-Based Data Processing with Accuracy Guarantees in LOTUS"],"prefix":"10.14778","volume":"18","author":[{"given":"Liana","family":"Patel","sequence":"first","affiliation":[{"name":"Stanford University"}]},{"given":"Siddharth","family":"Jha","sequence":"additional","affiliation":[{"name":"UC Berkeley"}]},{"given":"Melissa","family":"Pan","sequence":"additional","affiliation":[{"name":"UC Berkeley"}]},{"given":"Harshit","family":"Gupta","sequence":"additional","affiliation":[{"name":"Stanford University"}]},{"given":"Parth","family":"Asawa","sequence":"additional","affiliation":[{"name":"UC Berkeley"}]},{"given":"Carlos","family":"Guestrin","sequence":"additional","affiliation":[{"name":"Stanford University"}]},{"given":"Matei","family":"Zaharia","sequence":"additional","affiliation":[{"name":"UC 
Berkeley"}]}],"member":"320","published-online":{"date-parts":[[2025,9,4]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"[n.d.]. AI Functions on Databricks. https:\/\/docs.databricks.com"},{"key":"e_1_2_1_2_1","unstructured":"[n.d.]. arXiv.org ePrint archive. https:\/\/arxiv.org"},{"key":"e_1_2_1_3_1","unstructured":"[n.d.]. Custom Search JSON API | Programmable Search Engine. https:\/\/developers.google.com\/custom-search\/v1\/overview"},{"key":"e_1_2_1_4_1","unstructured":"[n.d.]. Discovery Insight Platform. https:\/\/www.findourview.com"},{"key":"e_1_2_1_5_1","unstructured":"[n.d.]. GAIR-NLP\/factool: FacTool: Factuality Detection in Generative AI. https:\/\/github.com\/GAIR-NLP\/factool"},{"key":"e_1_2_1_6_1","unstructured":"[n.d.]. Introducing Meta Llama 3: The most capable openly available LLM to date. https:\/\/ai.meta.com\/blog\/meta-llama-3\/"},{"key":"e_1_2_1_7_1","unstructured":"[n.d.]. LangChain. https:\/\/www.langchain.com\/"},{"key":"e_1_2_1_8_1","unstructured":"[n.d.]. Large Language Model (LLM) Functions (Snowflake Cortex) | Snowflake Documentation. https:\/\/docs.snowflake.com\/user-guide\/snowflake-cortex\/llm-functions"},{"key":"e_1_2_1_9_1","unstructured":"[n.d.]. LLM with Vertex AI only using SQL queries in BigQuery. https:\/\/cloud.google.com\/blog\/products\/ai-machine-learning\/llm-with-vertex-ai-only-using-sql-queries-in-bigquery"},{"key":"e_1_2_1_10_1","unstructured":"[n.d.]. OpenAI Platform. https:\/\/platform.openai.com"},{"key":"e_1_2_1_11_1","unstructured":"[n.d.]. pandas - Python Data Analysis Library. https:\/\/pandas.pydata.org\/"},{"key":"e_1_2_1_12_1","unstructured":"[n.d.]. python-bigquery-dataframes\/notebooks\/experimental\/semantic_operators.ipynb at main \u00b7 googleapis\/python-bigquery-dataframes. https:\/\/github.com\/googleapis\/python-bigquery-dataframes\/blob\/main\/notebooks\/experimental\/semantic_operators.ipynb"},{"key":"e_1_2_1_13_1","unstructured":"[n.d.]. Querying - LlamaIndex 0.9.11.post1. 
https:\/\/docs.llamaindex.ai\/en\/stable\/understanding\/querying\/querying.html"},{"key":"e_1_2_1_14_1","unstructured":"2023. Large Language Models for sentiment analysis with Amazon Redshift ML (Preview) | AWS Big Data Blog. https:\/\/aws.amazon.com\/blogs\/big-data\/large-language-models-for-sentiment-analysis-with-amazon-redshift-ml-preview\/ Section: Amazon Redshift."},{"key":"e_1_2_1_15_1","unstructured":"2024. mixedbread-ai\/mxbai-rerank-large-v1 \u00b7 Hugging Face. https:\/\/huggingface.co\/mixedbread-ai\/mxbai-rerank-large-v1"},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Griffin Adams Alexander Fabbri Faisal Ladhak Eric Lehman and No\u00e9mie Elhadad. 2023. From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting. http:\/\/arxiv.org\/abs\/2309.04269 arXiv:2309.04269 [cs].","DOI":"10.18653\/v1\/2023.newsum-1.7"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2409.00847"},{"key":"e_1_2_1_18_1","doi-asserted-by":"crossref","unstructured":"Simran Arora Brandon Yang Sabri Eyuboglu Avanika Narayan Andrew Hojel Immanuel Trummer and Christopher R\u00e9. 2023. Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes. http:\/\/arxiv.org\/abs\/2304.09433 arXiv:2304.09433 [cs].","DOI":"10.14778\/3626292.3626294"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","unstructured":"Asim Biswal Liana Patel Siddarth Jha Amog Kamsetty Shu Liu Joseph E. Gonzalez Carlos Guestrin and Matei Zaharia. 2024. Text2SQL is Not Enough: Unifying AI and Databases with TAG. arXiv:2408.14717 [cs]. 10.48550\/arXiv.2408.14717","DOI":"10.48550\/arXiv.2408.14717"},{"key":"e_1_2_1_20_1","unstructured":"Mark Braverman and Elchanan Mossel. [n.d.]. Noisy sorting without resampling. ([n. d.])."},{"key":"e_1_2_1_21_1","unstructured":"Yapei Chang Kyle Lo Tanya Goyal and Mohit Iyyer. 2024. BooookScore: A systematic exploration of book-length summarization in the era of LLMs. 
http:\/\/arxiv.org\/abs\/2310.00785 arXiv:2310.00785 [cs]."},{"key":"e_1_2_1_22_1","unstructured":"Lingjiao Chen Matei Zaharia and James Zou. 2023. FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance. http:\/\/arxiv.org\/abs\/2305.05176 arXiv:2305.05176 [cs]."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","unstructured":"I.-Chun Chern Steffi Chern Shiqi Chen Weizhe Yuan Kehua Feng Chunting Zhou Junxian He Graham Neubig and Pengfei Liu. 2023. FacTool: Factuality Detection in Generative AI - A Tool Augmented Framework for MultiTask and Multi-Domain Scenarios. arXiv:2307.13528 [cs]. 10.48550\/arXiv.2307.13528","DOI":"10.48550\/arXiv.2307.13528"},{"key":"e_1_2_1_24_1","volume-title":"A Relational Model of Data for Large Shared Data Banks. 13, 6","author":"Codd E F","year":"1970","unstructured":"E F Codd. 1970. A Relational Model of Data for Large Shared Data Banks. 13, 6 (1970)."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2407.09522"},{"key":"e_1_2_1_26_1","unstructured":"Sanjoy Dasgupta. [n.d.]. The hardness of k-means clustering. ([n. d.])."},{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Shrey Desai and Greg Durrett. 2020. Calibration of Pre-trained Transformers. https:\/\/arxiv.org\/abs\/2003.07892v3","DOI":"10.18653\/v1\/2020.emnlp-main.21"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","unstructured":"Karel D'Oosterlinck Omar Khattab Fran\u00e7ois Remy Thomas Demeester Chris Develder and Christopher Potts. 2024. In-Context Learning for Extreme Multi-Label Classification. arXiv:2401.12178 [cs]. 10.48550\/arXiv.2401.12178","DOI":"10.48550\/arXiv.2401.12178"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","unstructured":"Karel D'Oosterlinck Fran\u00e7ois Remy Johannes Deleu Thomas Demeester Chris Develder Klim Zaporojets Aneiss Ghodsi Simon Ellershaw Jack Collins and Christopher Potts. 2023. 
BioDEX: Large-Scale Biomedical Adverse Drug Event Extraction for Real-World Pharmacovigilance. arXiv:2305.13395 [cs]. 10.48550\/arXiv.2305.13395","DOI":"10.48550\/arXiv.2305.13395"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","unstructured":"Matthijs Douze Alexandr Guzhva Chengqi Deng Jeff Johnson Gergely Szilvasy Pierre-Emmanuel Mazar\u00e9 Maria Lomeli Lucas Hosseini and Herv\u00e9 J\u00e9gou. 2024. The Faiss library. arXiv:2401.08281 [cs]. 10.48550\/arXiv.2401.08281","DOI":"10.48550\/arXiv.2401.08281"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","unstructured":"Andrew Drozdov Honglei Zhuang Zhuyun Dai Zhen Qin Razieh Rahimi Xuanhui Wang Dana Alon Mohit Iyyer Andrew McCallum Donald Metzler and Kai Hui. 2023. PaRaDe: Passage Ranking using Demonstrations with Large Language Models. arXiv:2310.14408 [cs]. 10.48550\/arXiv.2310.14408","DOI":"10.48550\/arXiv.2310.14408"},{"key":"e_1_2_1_32_1","volume-title":"Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, and Arun Kumar.","author":"Hellerstein Joe","year":"2012","unstructured":"Joe Hellerstein, Christopher R\u00e9, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, and Arun Kumar. 2012. The MADlib Analytics Library or MAD Skills, the SQL. http:\/\/arxiv.org\/abs\/1208.4165 arXiv:1208.4165 [cs]."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/366622.366647"},{"key":"e_1_2_1_34_1","unstructured":"Jeff Johnson Matthijs Douze and Herv\u00e9 J\u00e9gou. 2017. Billion-scale similarity search with GPUs. http:\/\/arxiv.org\/abs\/1702.08734 arXiv:1702.08734 [cs]."},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Daniel Kang Peter Bailis and Matei Zaharia. 2019. BlazeIt: Optimizing Declarative Aggregation and Limit Queries for Neural Network-Based Video Analytics. 
http:\/\/arxiv.org\/abs\/1805.01046 arXiv:1805.01046 [cs].","DOI":"10.14778\/3372716.3372725"},{"key":"e_1_2_1_36_1","doi-asserted-by":"crossref","unstructured":"Daniel Kang John Emmons Firas Abuzaid Peter Bailis and Matei Zaharia. 2017. NoScope: Optimizing Neural Network Queries over Video at Scale. http:\/\/arxiv.org\/abs\/1703.02529 arXiv:1703.02529 [cs].","DOI":"10.14778\/3137628.3137664"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407804"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","unstructured":"Daniel Kang Edward Gan Peter Bailis Tatsunori Hashimoto and Matei Zaharia. 2022. Approximate Selection with Guarantees using Proxies. arXiv:2004.00827 [cs]. 10.48550\/arXiv.2004.00827","DOI":"10.48550\/arXiv.2004.00827"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476285"},{"key":"e_1_2_1_40_1","unstructured":"Daniel Kang John Guibas Peter Bailis Tatsunori Hashimoto and Matei Zaharia. [n.d.]. Task-agnostic Indexes for Deep Learning-based Queries over Unstructured Data. ([n. d.])."},{"key":"e_1_2_1_41_1","unstructured":"Omar Khattab Arnav Singhvi Paridhi Maheshwari Zhiyuan Zhang Keshav Santhanam Sri Vardhamanan Saiful Haq Ashutosh Sharma Thomas T. Joshi Hanna Moazam Heather Miller Matei Zaharia and Christopher Potts. 2023. DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines. https:\/\/arxiv.org\/abs\/2310.03714v1"},{"key":"e_1_2_1_42_1","doi-asserted-by":"crossref","unstructured":"Omar Khattab and Matei Zaharia. 2020. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. https:\/\/arxiv.org\/abs\/2004.12832v2","DOI":"10.1145\/3397271.3401075"},{"key":"e_1_2_1_43_1","volume-title":"Joseph E. Gonzalez, Hao Zhang, and Ion Stoica.","author":"Kwon Woosuk","year":"2023","unstructured":"Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica. 2023. 
Efficient Memory Management for Large Language Model Serving with PagedAttention. https:\/\/arxiv.org\/abs\/2309.06180v1"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","unstructured":"Patrick Lewis Ethan Perez Aleksandra Piktus Fabio Petroni Vladimir Karpukhin Naman Goyal Heinrich K\u00fcttler Mike Lewis Wen-tau Yih Tim Rockt\u00e4schel Sebastian Riedel and Douwe Kiela. 2021. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv:2005.11401 [cs]. 10.48550\/arXiv.2005.11401","DOI":"10.48550\/arXiv.2005.11401"},{"key":"e_1_2_1_45_1","volume-title":"Shibani Santurkar, Surya Ganguli, Tatsunori Hashimoto, Thomas Icard, Tianyi Zhang, Vishrav Chaudhary, William Wang, Xuechen Li, Yifan Mai, Yuhui Zhang, and Yuta Koreeda.","author":"Liang Percy","year":"2022","unstructured":"Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher R\u00e9, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, Jue Wang, Keshav Santhanam, Laurel Orr, Lucia Zheng, Mert Yuksekgonul, Mirac Suzgun, Nathan Kim, Neel Guha, Niladri Chatterji, Omar Khattab, Peter Henderson, Qian Huang, Ryan Chi, Sang Michael Xie, Shibani Santurkar, Surya Ganguli, Tatsunori Hashimoto, Thomas Icard, Tianyi Zhang, Vishrav Chaudhary, William Wang, Xuechen Li, Yifan Mai, Yuhui Zhang, and Yuta Koreeda. 2022. Holistic Evaluation of Language Models. https:\/\/arxiv.org\/abs\/2211.09110v2"},{"key":"e_1_2_1_46_1","unstructured":"Yiming Lin Madelon Hulsebos Ruiying Ma Shreya Shankar Sepanta Zeigham Aditya G. Parameswaran and Eugene Wu. 2024. Towards Accurate and Efficient Document Analytics with Large Language Models. 
http:\/\/arxiv.org\/abs\/2405.04674 arXiv:2405.04674 [cs]."},{"key":"e_1_2_1_47_1","volume-title":"Zui Chen, Michael Franklin, Tim Kraska, Samuel Madden, and Gerardo Vitagliano.","author":"Liu Chunwei","year":"2024","unstructured":"Chunwei Liu, Matthew Russo, Michael Cafarella, Lei Cao, Peter Baille Chen, Zui Chen, Michael Franklin, Tim Kraska, Samuel Madden, and Gerardo Vitagliano. 2024. A Declarative System for Optimizing AI Workloads. http:\/\/arxiv.org\/abs\/2405.14696 arXiv:2405.14696 [cs]."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","unstructured":"Nelson F. Liu Kevin Lin John Hewitt Ashwin Paranjape Michele Bevilacqua Fabio Petroni and Percy Liang. 2023. Lost in the Middle: How Language Models Use Long Contexts. arXiv:2307.03172 [cs]. 10.48550\/arXiv.2307.03172","DOI":"10.48550\/arXiv.2307.03172"},{"key":"e_1_2_1_49_1","unstructured":"Shu Liu Asim Biswal Audrey Cheng Xiangxi Mo Shiyi Cao Joseph E. Gonzalez Ion Stoica and Matei Zaharia. 2024. Optimizing LLM Queries in Relational Workloads. http:\/\/arxiv.org\/abs\/2403.05821 arXiv:2403.05821 [cs]."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2311.09818"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3183751"},{"key":"e_1_2_1_52_1","unstructured":"Xueguang Ma Xinyu Zhang Ronak Pradeep and Jimmy Lin. 2023. Zero-Shot Listwise Document Reranking with a Large Language Model. https:\/\/arxiv.org\/abs\/2305.02156v1"},{"key":"e_1_2_1_53_1","unstructured":"MotherDuck. [n.d.]. Introducing the prompt() Function: Use the Power of LLMs with SQL! - MotherDuck Blog. https:\/\/motherduck.com\/blog\/sql-llm-prompt-function-gpt-models\/"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2407.11418"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3654923"},{"key":"e_1_2_1_56_1","unstructured":"Ronak Pradeep Sahel Sharifymoghaddam and Jimmy Lin. 2023. 
RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models. http:\/\/arxiv.org\/abs\/2309.15088 arXiv:2309.15088 [cs]."},{"key":"e_1_2_1_57_1","unstructured":"Ronak Pradeep Sahel Sharifymoghaddam and Jimmy Lin. 2023. RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze! http:\/\/arxiv.org\/abs\/2312.02724 arXiv:2312.02724 [cs]."},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","unstructured":"Zhen Qin Rolf Jagerman Kai Hui Honglei Zhuang Junru Wu Le Yan Jiaming Shen Tianqi Liu Jialu Liu Donald Metzler Xuanhui Wang and Michael Bendersky. 2024. Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting. arXiv:2306.17563 [cs]. 10.48550\/arXiv.2306.17563","DOI":"10.48550\/arXiv.2306.17563"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.249"},{"key":"e_1_2_1_60_1","volume-title":"Wainwright","author":"Shah Nihar B.","year":"2016","unstructured":"Nihar B. Shah and Martin J. Wainwright. 2016. Simple, Robust and Optimal Ranking from Pairwise Comparisons. http:\/\/arxiv.org\/abs\/1512.08949 arXiv:1512.08949 [cs, math, stat]."},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","unstructured":"Shreya Shankar Tristan Chambers Tarak Shah Aditya G. Parameswaran and Eugene Wu. 2024. DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing. arXiv:2410.12189 [cs]. 10.48550\/arXiv.2410.12189","DOI":"10.48550\/arXiv.2410.12189"},{"key":"e_1_2_1_62_1","doi-asserted-by":"crossref","unstructured":"Weiwei Sun Lingyong Yan Xinyu Ma Shuaiqiang Wang Pengjie Ren Zhumin Chen Dawei Yin and Zhaochun Ren. 2023. Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents. 
https:\/\/arxiv.org\/abs\/2304.09542v2","DOI":"10.18653\/v1\/2023.emnlp-main.923"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2104.08663"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1074"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2001.990517"},{"key":"e_1_2_1_66_1","unstructured":"Liang Wang Nan Yang Xiaolong Huang Binxing Jiao Linjun Yang Daxin Jiang Rangan Majumder and Furu Wei. 2024. Text Embeddings by Weakly-Supervised Contrastive Pre-training. http:\/\/arxiv.org\/abs\/2212.03533 arXiv:2212.03533 [cs]."},{"key":"e_1_2_1_67_1","unstructured":"WilliamDAssafMSFT. 2024. Intelligent Applications - Azure SQL Database. https:\/\/learn.microsoft.com\/en-us\/azure\/azure-sql\/database\/ai-artificial-intelligence-intelligent-applications?view=azuresql"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","unstructured":"Jeff Wu Long Ouyang Daniel M. Ziegler Nisan Stiennon Ryan Lowe Jan Leike and Paul Christiano. 2021. Recursively Summarizing Books with Human Feedback. arXiv:2109.10862 [cs]. 10.48550\/arXiv.2109.10862","DOI":"10.48550\/arXiv.2109.10862"},{"key":"e_1_2_1_69_1","unstructured":"Shirley Wu Shiyu Zhao Michihiro Yasunaga Kexin Huang Kaidi Cao Qian Huang Vassilis N. Ioannidis Karthik Subbian James Zou and Jure Leskovec. 2024. STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases. https:\/\/arxiv.org\/abs\/2404.13207v2"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133887"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","unstructured":"Tao Yu Zifan Li Zilin Zhang Rui Zhang and Dragomir Radev. 2018. TypeSQL: Knowledge-based Type-Aware Neural Text-to-SQL Generation. arXiv:1804.09769 [cs]. 10.48550\/arXiv.1804.09769","DOI":"10.48550\/arXiv.1804.09769"},{"key":"e_1_2_1_72_1","unstructured":"Murong Yue Jie Zhao Min Zhang Liang Du and Ziyu Yao. 2024. 
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning. http:\/\/arxiv.org\/abs\/2310.03094 arXiv:2310.03094 [cs]."},{"key":"e_1_2_1_73_1","unstructured":"John M Zelle and Raymond J Mooney. 1996. Learning to Parse Database Queries Using Inductive Logic Programming. (1996)."},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","unstructured":"Rowan Zellers Ari Holtzman Yonatan Bisk Ali Farhadi and Yejin Choi. 2019. HellaSwag: Can a Machine Really Finish Your Sentence? arXiv:1905.07830 [cs]. 10.48550\/arXiv.1905.07830","DOI":"10.48550\/arXiv.1905.07830"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2403.02951"},{"key":"e_1_2_1_76_1","doi-asserted-by":"crossref","unstructured":"Honglei Zhuang Zhen Qin Kai Hui Junru Wu Le Yan Xuanhui Wang and Michael Bendersky. 2024. Beyond Yes and No: Improving Zero-Shot LLM Rankers via Scoring Fine-Grained Relevance Labels. http:\/\/arxiv.org\/abs\/2310.14122 arXiv:2310.14122 [cs].","DOI":"10.18653\/v1\/2024.naacl-short.31"}],"container-title":["Proceedings of the VLDB 
Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3749646.3749685","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,5]],"date-time":"2025-09-05T03:27:39Z","timestamp":1757042859000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3749646.3749685"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7]]},"references-count":76,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2025,7]]}},"alternative-id":["10.14778\/3749646.3749685"],"URL":"https:\/\/doi.org\/10.14778\/3749646.3749685","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,7]]},"assertion":[{"value":"2025-09-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}