{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T15:35:47Z","timestamp":1771515347349,"version":"3.50.1"},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,7]]},"abstract":"<jats:p>Recently, large language models (LLMs) have demonstrated remarkable capabilities in understanding and generating natural language content, attracting widespread attention in both industry and academia. An increasing number of services offer LLMs for various tasks via APIs. Different LLMs demonstrate expertise in different domains of queries (e.g., text classification queries). Meanwhile, LLMs of different scales, complexities, and performance are priced diversely. Driven by this, several researchers are investigating strategies for selecting an ensemble of LLMs, aiming to decrease overall usage costs while enhancing performance. However, to the best of our knowledge, none of the existing works addresses the problem of how to find an LLM ensemble, subject to a cost budget, that maximizes ensemble performance with guarantees.<\/jats:p>\n          <jats:p>In this paper, we formalize the performance of an ensemble of models (LLMs) using the notion of correctness probability, which we formally define. We develop an approach for aggregating responses from multiple LLMs to enhance ensemble performance. Building on this, we formulate the Optimal Ensemble Selection (OES) problem of selecting a set of LLMs subject to a cost budget that maximizes the overall correctness probability. We show that the correctness probability function is non-decreasing and non-submodular and provide evidence that the OES problem is likely to be NP-hard. 
By leveraging a submodular function that upper bounds correctness probability, we develop an algorithm, ThriftLLM, and prove that it achieves an instance-dependent approximation guarantee with high probability. Our framework functions as a data processing system that selects appropriate LLM operators to deliver high-quality results under budget constraints. It achieves state-of-the-art performance for text classification and entity matching queries on multiple real-world datasets against various baselines in our extensive experimental evaluation, while using a relatively lower cost budget, strongly supporting the effectiveness and superiority of our method.<\/jats:p>","DOI":"10.14778\/3749646.3749702","type":"journal-article","created":{"date-parts":[[2025,9,4]],"date-time":"2025-09-04T17:55:06Z","timestamp":1757008506000},"page":"4410-4423","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["ThriftLLM: On Cost-Effective Selection of Large Language Models for Classification Queries"],"prefix":"10.14778","volume":"18","author":[{"given":"Keke","family":"Huang","sequence":"first","affiliation":[{"name":"University of British Columbia"}]},{"given":"Yimin","family":"Shi","sequence":"additional","affiliation":[{"name":"National University of Singapore"}]},{"given":"Dujian","family":"Ding","sequence":"additional","affiliation":[{"name":"University of British Columbia"}]},{"given":"Yifei","family":"Li","sequence":"additional","affiliation":[{"name":"University of British Columbia"}]},{"given":"Yang","family":"Fei","sequence":"additional","affiliation":[{"name":"National University of Singapore"}]},{"given":"Laks","family":"Lakshmanan","sequence":"additional","affiliation":[{"name":"University of British Columbia"}]},{"given":"Xiaokui","family":"Xiao","sequence":"additional","affiliation":[{"name":"National University of Singapore, CNRS@CREATE, 
Singapore"}]}],"member":"320","published-online":{"date-parts":[[2025,9,4]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Introducing the next generation of Claude. https:\/\/www.anthropic.com\/news\/claude-3-family Accessed on","year":"2024","unstructured":"Anthropic. 2024. Introducing the next generation of Claude. https:\/\/www.anthropic.com\/news\/claude-3-family Accessed on April 26, 2024."},{"key":"e_1_2_1_2_1","unstructured":"Rachit Bansal Bidisha Samanta Siddharth Dalmia Nitish Gupta Sriram Ganapathy Abhishek Bapna Prateek Jain and Partha Talukdar. 2024. LLM Augmented LLMs: Expanding Capabilities through Composition. In ICLR."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3442200"},{"key":"e_1_2_1_4_1","first-page":"498","article-title":"Guarantees for Greedy Maximization of Non-submodular Functions with Applications","volume":"70","author":"Bian Andrew An","year":"2017","unstructured":"Andrew An Bian, Joachim M. Buhmann, Andreas Krause, and Sebastian Tschiatschek. 2017. Guarantees for Greedy Maximization of Non-submodular Functions with Applications. In ICML, Vol. 70. 498\u2013507.","journal-title":"ICML"},{"key":"e_1_2_1_5_1","first-page":"208","article-title":"Concentration Inequalities","volume":"3176","author":"Boucheron St\u00e9phane","year":"2003","unstructured":"St\u00e9phane Boucheron, G\u00e1bor Lugosi, and Olivier Bousquet. 2003. Concentration Inequalities. In Advanced Lectures on Machine Learning, Vol. 3176. 208\u2013240.","journal-title":"Advanced Lectures on Machine Learning"},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","unstructured":"Niv Buchbinder Moran Feldman Joseph Naor and Roy Schwartz. 2012. A Tight Linear Time (1\/2)-Approximation for Unconstrained Submodular Maximization. In FOCS. 649\u2013658.","DOI":"10.1109\/FOCS.2012.73"},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Riccardo Cappuzzo Paolo Papotti and Saravanan Thirumuruganathan. 2020. 
Creating embeddings of heterogeneous relational datasets for data integration tasks. In SIGMOD. 1335\u20131349.","DOI":"10.1145\/3318464.3389742"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.nlp4convai-1.5"},{"key":"e_1_2_1_9_1","volume-title":"What is the Role of Small Models in the LLM Era: A Survey. arXiv preprint arXiv:2409.06857","author":"Chen Lihu","year":"2024","unstructured":"Lihu Chen and Ga\u00ebl Varoquaux. 2024. What is the Role of Small Models in the LLM Era: A Survey. arXiv preprint arXiv:2409.06857 (2024)."},{"key":"e_1_2_1_10_1","volume-title":"FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance. CoRR abs\/2305.05176","author":"Chen Lingjiao","year":"2023","unstructured":"Lingjiao Chen, Matei Zaharia, and James Zou. 2023. FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance. CoRR abs\/2305.05176 (2023)."},{"key":"e_1_2_1_11_1","volume-title":"Octopus v2: On-device language model for super agent. CoRR abs\/2404.01744","author":"Chen Wei","year":"2024","unstructured":"Wei Chen and Zhiyuan Li. 2024. Octopus v2: On-device language model for super agent. CoRR abs\/2404.01744 (2024)."},{"key":"e_1_2_1_12_1","volume-title":"Octopus v4: Graph of language models. CoRR abs\/2404.19296","author":"Chen Wei","year":"2024","unstructured":"Wei Chen and Zhiyuan Li. 2024. Octopus v4: Graph of language models. CoRR abs\/2404.19296 (2024)."},{"key":"e_1_2_1_13_1","volume-title":"Lui","author":"Dai Xiangxiang","year":"2024","unstructured":"Xiangxiang Dai, Jin Li, Xutong Liu, Anqi Yu, and John C. S. Lui. 2024. Cost-Effective Online Multi-LLM Selection with Versatile Reward Models. CoRR abs\/2405.16587 (2024)."},{"key":"e_1_2_1_14_1","unstructured":"Dujian Ding Ankur Mallick Chi Wang Robert Sim Subhabrata Mukherjee Victor R\u00fchle Laks V. S. Lakshmanan and Ahmed Hassan Awadallah. 2024. Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing. 
In ICLR."},{"key":"e_1_2_1_15_1","first-page":"1454","article-title":"Distributed Representations of Tuples for Entity Resolution","volume":"11","author":"Ebraheem Muhammad","year":"2018","unstructured":"Muhammad Ebraheem, Saravanan Thirumuruganathan, Shafiq Joty, Mourad Ouzzani, and Nan Tang. 2018. Distributed Representations of Tuples for Entity Resolution. VLDB 11, 11 (2018), 1454\u20131467.","journal-title":"VLDB"},{"key":"e_1_2_1_16_1","first-page":"2750","article-title":"Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL","volume":"17","author":"Fan Ju","year":"2024","unstructured":"Ju Fan, Zihui Gu, Songyue Zhang, Yuxin Zhang, Zui Chen, Lei Cao, Guoliang Li, Samuel Madden, Xiaoyong Du, and Nan Tang. 2024. Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL. VLDB 17, 11 (2024), 2750\u20132763.","journal-title":"VLDB"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626772.3661357"},{"key":"e_1_2_1_18_1","volume-title":"Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?. In NAACL. 387\u2013394","author":"Fu Xue-Yong","unstructured":"Xue-Yong Fu, Md. Tahmid Rahman Laskar, Elena Khasanova, Cheng Chen, and Shashi Bhushan TN. 2024. Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization?. In NAACL. 387\u2013394."},{"key":"e_1_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Junhao Gan and Yufei Tao. 2015. DBSCAN Revisited: Mis-Claim Un-Fixability and Approximation. In SIGMOD. 519\u2013530.","DOI":"10.1145\/2723372.2737792"},{"key":"e_1_2_1_20_1","first-page":"1132","article-title":"Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation","volume":"17","author":"Gao Dawei","year":"2024","unstructured":"Dawei Gao, Haibin Wang, Yaliang Li, Xiuyu Sun, Yichen Qian, Bolin Ding, and Jingren Zhou. 2024. 
Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation. VLDB 17, 5 (2024), 1132\u20131145.","journal-title":"VLDB"},{"key":"e_1_2_1_21_1","volume-title":"Gemini: A Family of Highly Capable Multimodal Models. CoRR abs\/2312.11805","author":"Gemini Team Rohan Anil","year":"2023","unstructured":"Rohan Anil Gemini Team, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, and Anja Hauth et al. 2023. Gemini: A Family of Highly Capable Multimodal Models. CoRR abs\/2312.11805 (2023)."},{"key":"e_1_2_1_22_1","first-page":"2018","article-title":"Entity resolution: theory, practice & open challenges","volume":"5","author":"Getoor Lise","year":"2012","unstructured":"Lise Getoor and Ashwin Machanavajjhala. 2012. Entity resolution: theory, practice & open challenges. VLDB 5, 12 (2012), 2018\u20132019.","journal-title":"VLDB"},{"key":"e_1_2_1_23_1","volume-title":"Retrieved","year":"2024","unstructured":"help.openai.com. 2024. How much does GPT-4 cost? Retrieved Oct 17, 2024 from https:\/\/help.openai.com\/en\/articles\/7127956-how-much-does-gpt-4-cost"},{"key":"e_1_2_1_24_1","volume-title":"Super Tiny Language Models. CoRR abs\/2405.14159","author":"Hillier Dylan","year":"2024","unstructured":"Dylan Hillier, Leon Guertler, Cheston Tan, Palaash Agrawal, Chen Ruirui, and Bobby Cheng. 2024. Super Tiny Language Models. CoRR abs\/2405.14159 (2024)."},{"key":"e_1_2_1_25_1","doi-asserted-by":"crossref","unstructured":"Wassily Hoeffding. 1963. Probability Inequalities for Sums of Bounded Random Variables. J. Amer. Statist. Assoc. (1963) 13\u201330.","DOI":"10.1080\/01621459.1963.10500830"},{"key":"e_1_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Keke Huang Jing Tang Xiaokui Xiao Aixin Sun and Andrew Lim. 2020. Efficient Approximation Algorithms for Adaptive Target Profit Maximization. In ICDE. 
649\u2013660.","DOI":"10.1109\/ICDE48307.2020.00062"},{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Dongfu Jiang Xiang Ren and Bill Yuchen Lin. 2023. LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion. In ACL. 14165\u201314178.","DOI":"10.18653\/v1\/2023.acl-long.792"},{"key":"e_1_2_1_28_1","volume-title":"NAACL","volume":"1","author":"Ming-Wei Chang Jacob Devlin","year":"2019","unstructured":"Jacob Devlin Ming-Wei Chang Kenton and Lee Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In NAACL, Vol. 1."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0020-0190(99)00031-9"},{"key":"e_1_2_1_30_1","first-page":"1197","article-title":"Magellan: toward building entity matching management systems","volume":"9","author":"Konda Pradap","year":"2016","unstructured":"Pradap Konda, Sanjib Das, Paul Suganthan G. C., AnHai Doan, Adel Ardalan, Jeffrey R. Ballard, Han Li, Fatemah Panahi, Haojun Zhang, Jeff Naughton, Shishir Prasad, Ganesh Krishnan, Rohit Deep, and Vijay Raghavendra. 2016. Magellan: toward building entity matching management systems. VLDB 9, 12 (2016), 1197\u20131208.","journal-title":"VLDB"},{"key":"e_1_2_1_31_1","volume-title":"Chain of Natural Language Inference for Reducing Large Language Model Ungrounded Hallucinations. CoRR abs\/2310.03951","author":"Lei Deren","year":"2023","unstructured":"Deren Lei, Yaxi Li, Mengya Hu, Mingyu Wang, Vincent Yun, Emily Ching, and Eslam Kamal. 2023. Chain of Natural Language Inference for Reducing Large Language Model Ungrounded Hallucinations. CoRR abs\/2310.03951 (2023)."},{"key":"e_1_2_1_32_1","first-page":"50","article-title":"Deep entity matching with pre-trained language models","volume":"14","author":"Li Yuliang","year":"2020","unstructured":"Yuliang Li, Jinfeng Li, Yoshihiko Suhara, AnHai Doan, and Wang-Chiew Tan. 2020. Deep entity matching with pre-trained language models. 
VLDB 14, 1 (2020), 50\u201360.","journal-title":"VLDB"},{"key":"e_1_2_1_33_1","volume-title":"RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692 (2019)."},{"key":"e_1_2_1_34_1","volume-title":"Retrieved","year":"2024","unstructured":"livechatai.com. 2024. Gemini Pro API Pricing Calculator. Retrieved Oct 17, 2024 from https:\/\/livechatai.com\/gemini-pro-api-pricing-calculator"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0304-0208(08)73237-7"},{"key":"e_1_2_1_36_1","doi-asserted-by":"crossref","unstructured":"Sidharth Mudgal Han Li Theodoros Rekatsinas AnHai Doan Youngchoon Park Ganesh Krishnan Rohit Deep Esteban Arcaute and Vijay Raghavendra. 2018. Deep learning for entity matching: A design space exploration. In SIGMOD. 19\u201334.","DOI":"10.1145\/3183713.3196926"},{"key":"e_1_2_1_37_1","unstructured":"OpenAI. 2023. GPT-4 Technical Report. CoRR abs\/2303.08774 (2023)."},{"key":"e_1_2_1_38_1","unstructured":"OpenAI. 2024. OpenAI Embeddings API. https:\/\/api.openai.com\/v1\/embeddings."},{"key":"e_1_2_1_39_1","doi-asserted-by":"crossref","unstructured":"Diarmuid O'Reilly-Morgan Elias Tragos Erika Duriakova Honghui Du Neil Hurley and Aonghus Lawlor. 2025. Entity Matching with Large Language Models as Weak and Strong Labellers. In New Trends in Database and Information Systems. 58\u201367.","DOI":"10.1007\/978-3-031-70421-5_6"},{"key":"e_1_2_1_40_1","volume-title":"VLDB 2024 Workshop: Tabular Data Analysis Workshop (TaDA)","author":"Parciak Marcel","year":"2024","unstructured":"Marcel Parciak, Brecht Vandevoort, Frank Neven, Liesbet M. Peeters, and Stijn Vansummeren. 2024. 
Schema Matching with Large Language Models: an Experimental Study. VLDB 2024 Workshop: Tabular Data Analysis Workshop (TaDA) (2024)."},{"key":"e_1_2_1_41_1","unstructured":"Ralph Peeters Aaron Steiner and Christian Bizer. 2025. Entity Matching using Large Language Models. In EDBT. 529\u2013541."},{"key":"e_1_2_1_42_1","volume-title":"Towards Optimizing the Costs of LLM Usage. CoRR abs\/2402.01742","author":"Shekhar Shivanshu","year":"2024","unstructured":"Shivanshu Shekhar, Tanishq Dubey, Koyel Mukherjee, Apoorv Saxena, Atharv Tyagi, and Nishanth Kotla. 2024. Towards Optimizing the Costs of LLM Usage. CoRR abs\/2402.01742 (2024)."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2024.114409"},{"key":"e_1_2_1_44_1","volume-title":"Peng Xu, Hyeondey Kim, Zihan Liu, and Pascale Fung.","author":"Su Dan","year":"2019","unstructured":"Dan Su, Yan Xu, Genta Indra Winata, Peng Xu, Hyeondey Kim, Zihan Liu, and Pascale Fung. 2019. Generalizing Question Answering System with Pre-trained Language Model Fine-tuning. In EMNLP. 203\u2013211."},{"key":"e_1_2_1_45_1","unstructured":"Xiaofei Sun Xiaoya Li Jiwei Li Fei Wu Shangwei Guo Tianwei Zhang and Guoyin Wang. 2023. Text Classification via Large Language Models. In EMNLP. 8990\u20139005."},{"key":"e_1_2_1_46_1","first-page":"2919","article-title":"Are Large Language Models a Good Replacement of Taxonomies","volume":"17","author":"Sun Yushi","year":"2024","unstructured":"Yushi Sun, Xin Hao, Kai Sun, Yifan Xu, Xiao Yang, Xin Luna Dong, Nan Tang, and Lei Chen. 2024. Are Large Language Models a Good Replacement of Taxonomies? VLDB 17, 11 (2024), 2919\u20132932.","journal-title":"VLDB"},{"key":"e_1_2_1_47_1","unstructured":"Yehui Tang Kai Han Fangcheng Liu Yunsheng Ni Yuchuan Tian Zheyuan Bai Yi-Qi Hu Sichao Liu Shangling Jui and Yunhe Wang. 2024. Rethinking Optimization and Architecture for Tiny Language Models. 
In ICML."},{"key":"e_1_2_1_48_1","doi-asserted-by":"crossref","unstructured":"Youze Tang Xiaokui Xiao and Yanchen Shi. 2014. Influence maximization: near-optimal time complexity meets practical efficiency. In SIGMOD. 75\u201386.","DOI":"10.1145\/2588555.2593670"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-emnlp.698"},{"key":"e_1_2_1_50_1","first-page":"3511","article-title":"Generating Succinct Descriptions of Database Schemata for Cost-Efficient Prompting of Large Language Models","volume":"17","author":"Trummer Immanuel","year":"2024","unstructured":"Immanuel Trummer. 2024. Generating Succinct Descriptions of Database Schemata for Cost-Efficient Prompting of Large Language Models. VLDB 17, 11 (2024), 3511\u20133523.","journal-title":"VLDB"},{"key":"e_1_2_1_51_1","first-page":"4333","article-title":"An Interactive Multi-modal Query Answering System with Retrieval-Augmented Large Language Models","volume":"17","author":"Wang Mengzhao","year":"2024","unstructured":"Mengzhao Wang, Haotian Wu, Xiangyu Ke, Yunjun Gao, Xiaoliang Xu, and Lu Chen. 2024. An Interactive Multi-modal Query Answering System with Retrieval-Augmented Large Language Models. VLDB 17, 12 (2024), 4333\u20134336.","journal-title":"VLDB"},{"key":"e_1_2_1_52_1","volume-title":"Crowdsourcing Multiple Choice Science Questions. In Workshop on Noisy User-generated Text, NUT@EMNLP. 94\u2013106","author":"Welbl Johannes","year":"2017","unstructured":"Johannes Welbl, Nelson F. Liu, and Matt Gardner. 2017. Crowdsourcing Multiple Choice Science Questions. In Workshop on Noisy User-generated Text, NUT@EMNLP. 94\u2013106."},{"key":"e_1_2_1_53_1","doi-asserted-by":"crossref","unstructured":"Yu Xia Fang Kong Tong Yu Liya Guo Ryan A. Rossi Sungchul Kim and Shuai Li. 2024. Which LLM to Play? Convergence-Aware Online Model Selection with Time-Increasing Bandits. In WWW. 
4059\u20134070.","DOI":"10.1145\/3589334.3645420"},{"key":"e_1_2_1_54_1","doi-asserted-by":"crossref","unstructured":"Dezhong Yao Yuhong Gu Gao Cong Hai Jin and Xinqiao Lv. 2022. Entity resolution with hierarchical graph attention networks. In SIGMOD. 429\u2013442.","DOI":"10.1145\/3514221.3517872"},{"key":"e_1_2_1_55_1","doi-asserted-by":"crossref","unstructured":"Rowan Zellers Ari Holtzman Yonatan Bisk Ali Farhadi and Yejin Choi. 2019. HellaSwag: Can a Machine Really Finish Your Sentence?. In ACL. 4791\u20134800.","DOI":"10.18653\/v1\/P19-1472"},{"key":"e_1_2_1_56_1","doi-asserted-by":"crossref","unstructured":"Chao Zhang Yuren Mao Yijiang Fan Yu Mi Yunjun Gao Lu Chen Dongfang Lou and Jinshu Lin. 2024. FinSQL: Model-Agnostic LLMs-based Text-to-SQL Framework for Financial Analysis. In SIGMOD. 93\u2013105.","DOI":"10.1145\/3626246.3653375"},{"key":"e_1_2_1_57_1","first-page":"39","article-title":"Benchmarking Large Language Models for News Summarization","volume":"12","author":"Zhang Tianyi","year":"2024","unstructured":"Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen R. McKeown, and Tatsunori B. Hashimoto. 2024. Benchmarking Large Language Models for News Summarization. ACL 12 (2024), 39\u201357.","journal-title":"ACL"},{"key":"e_1_2_1_58_1","volume-title":"Junbo Jake Zhao, and Yann LeCun","author":"Zhang Xiang","year":"2015","unstructured":"Xiang Zhang, Junbo Jake Zhao, and Yann LeCun. 2015. Character-level Convolutional Networks for Text Classification. In NIPS. 649\u2013657."},{"key":"e_1_2_1_59_1","volume-title":"Auto-em: End-to-end fuzzy entity-matching using pre-trained deep models and transfer learning. In WWW. 2413\u20132424.","author":"Zhao Chen","year":"2019","unstructured":"Chen Zhao and Yeye He. 2019. Auto-em: End-to-end fuzzy entity-matching using pre-trained deep models and transfer learning. In WWW. 
2413\u20132424."},{"key":"e_1_2_1_60_1","volume-title":"Ho","author":"Zheng Lucia","year":"2021","unstructured":"Lucia Zheng, Neel Guha, Brandon R. Anderson, Peter Henderson, and Daniel E. Ho. 2021. When does pretraining help?: assessing self-supervised learning for law and the CaseHOLD dataset of 53, 000+ legal holdings. In ICAIL. 159\u2013168."},{"key":"e_1_2_1_61_1","first-page":"3920","article-title":"AutoTQA: Towards Autonomous Tabular Question Answering through Multi-Agent Large Language Models","volume":"17","author":"Zhu Jun-Peng","year":"2024","unstructured":"Jun-Peng Zhu, Peng Cai, Kai Xu, Li Li, Yishen Sun, Shuai Zhou, Haihuang Su, Liu Tang, and Qi Liu. 2024. AutoTQA: Towards Autonomous Tabular Question Answering through Multi-Agent Large Language Models. VLDB 17, 12 (2024), 3920\u20133933.","journal-title":"VLDB"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3749646.3749702","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,5]],"date-time":"2025-09-05T02:55:24Z","timestamp":1757040924000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3749646.3749702"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7]]},"references-count":61,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2025,7]]}},"alternative-id":["10.14778\/3749646.3749702"],"URL":"https:\/\/doi.org\/10.14778\/3749646.3749702","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,7]]},"assertion":[{"value":"2025-09-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}