{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,19]],"date-time":"2026-05-19T07:17:16Z","timestamp":1779175036627,"version":"3.51.4"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"13","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,9]]},"abstract":"<jats:p>In this paper, we present ELEET, a novel execution engine that allows one to seamlessly query and process text as a first-class citizen along with tables. To enable such a seamless integration of text and tables, ELEET leverages learned multi-modal operators (MMOps) such as joins and unions that seamlessly combine structured with unstructured textual data. While large language models (LLM) such as GPT-4 are interesting candidates to enable such learned multimodal operations, we deliberately do not follow this trend to enable MMOps, since it would result in high overhead at query runtime. Instead, to enable MMOps, ELEET comes with a more efficient small language model (SLM) that is targeted to extract structured data from text. Thanks to our novel architecture and pre-training procedure, the ELEET-model enables high-accuracy extraction with low overheads. 
In our evaluation, we compare query execution based on ELEET to baselines leveraging LLMs such as GPT-4 and show that ELEET can speed up multi-modal queries over tables and text by up to 575\u00d7 without sacrificing accuracy.<\/jats:p>","DOI":"10.14778\/3704965.3704989","type":"journal-article","created":{"date-parts":[[2025,2,18]],"date-time":"2025-02-18T17:22:57Z","timestamp":1739899377000},"page":"4867-4880","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["ELEET: Efficient Learned Query Execution over Text and Tables"],"prefix":"10.14778","volume":"17","author":[{"given":"Matthias","family":"Urban","sequence":"first","affiliation":[{"name":"Technical University of Darmstadt"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Carsten","family":"Binnig","sequence":"additional","affiliation":[{"name":"Technical University of Darmstadt & DFKI"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,2,18]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE53745.2022.00084"},{"key":"e_1_2_1_2_1","unstructured":"Rohan Anil Andrew M. Dai Orhan Firat Melvin Johnson Dmitry Lepikhin Alexandre Passos Siamak Shakeri Emanuel Taropa Paige Bailey Zhifeng Chen Eric Chu Jonathan H. Clark Laurent El Shafey Yanping Huang Kathy Meier-Hellstern Gaurav Mishra Erica Moreira Mark Omernick Kevin Robinson Sebastian Ruder Yi Tay Kefan Xiao Yuanzhong Xu Yujing Zhang Gustavo Hernandez Abrego Junwhan Ahn Jacob Austin Paul Barham Jan Botha James Bradbury Siddhartha Brahma Kevin Brooks Michele Catasta Yong Cheng Colin Cherry Christopher A. 
Choquette-Choo Aakanksha Chowdhery Cl\u00e9ment Crepy Shachi Dave Mostafa Dehghani Sunipa Dev Jacob Devlin Mark D\u00edaz Nan Du Ethan Dyer Vlad Feinberg Fangxiaoyu Feng Vlad Fienber Markus Freitag Xavier Garcia Sebastian Gehrmann Lucas Gonzalez Guy Gur-Ari Steven Hand Hadi Hashemi Le Hou Joshua Howland Andrea Hu Jeffrey Hui Jeremy Hurwitz Michael Isard Abe Ittycheriah Matthew Jagielski Wenhao Jia Kathleen Kenealy Maxim Krikun Sneha Kudugunta Chang Lan Katherine Lee Benjamin Lee Eric Li Music Li Wei Li YaGuang Li Jian Li Hyeontaek Lim Hanzhao Lin Zhongtao Liu Frederick Liu Marcello Maggioni Aroma Mahendru Joshua Maynez Vedant Misra Maysam Moussalem Zachary Nado John Nham Eric Ni Andrew Nystrom Alicia Parrish Marie Pellat Martin Polacek Alex Polozov Reiner Pope Siyuan Qiao Emily Reif Bryan Richter Parker Riley Alex Castro Ros Aurko Roy Brennan Saeta Rajkumar Samuel Renee Shelby Ambrose Slone Daniel Smilkov David R. So Daniel Sohn Simon Tokumine Dasha Valter Vijay Vasudevan Kiran Vodrahalli Xuezhi Wang Pidong Wang Zirui Wang Tao Wang John Wieting Yuhuai Wu Kelvin Xu Yunhan Xu Linting Xue Pengcheng Yin Jiahui Yu Qiao Zhang Steven Zheng Ce Zheng Weikang Zhou Denny Zhou Slav Petrov and Yonghui Wu. 2023. PaLM 2 Technical Report. arXiv:2305.10403 [cs]"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/3626292.3626294"},{"key":"e_1_2_1_4_1","volume-title":"Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020","author":"Brown Tom B.","year":"2020","unstructured":"Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. 
Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Eds.). https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html"},{"key":"e_1_2_1_5_1","volume-title":"Third Biennial Conference on Innovative Data Systems Research, CIDR","author":"Cafarella Michael J.","year":"2007","unstructured":"Michael J. Cafarella, Christopher R\u00e9, Dan Suciu, and Oren Etzioni. 2007. Structured Querying of Web Text Data: A Technical Challenge. In Third Biennial Conference on Innovative Data Systems Research, CIDR 2007, Asilomar, CA, USA, January 7-10, 2007, Online Proceedings. www.cidrdb.org, 225--234. http:\/\/cidrdb.org\/cidr2007\/papers\/cidr07p25.pdf"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/568271.223856"},{"key":"e_1_2_1_7_1","volume-title":"13th Conference on Innovative Data Systems Research, CIDR 2023","author":"Chen Zui","year":"2023","unstructured":"Zui Chen, Zihui Gu, Lei Cao, Ju Fan, Samuel Madden, and Nan Tang. 2023. Symphony: Towards Natural Language Query Answering over Multi-modal Data Lakes. In 13th Conference on Innovative Data Systems Research, CIDR 2023, Amsterdam, The Netherlands, January 8-11, 2023. www.cidrdb.org. 
https:\/\/www.cidrdb.org\/cidr2023\/papers\/p51-chen.pdf"},{"key":"e_1_2_1_8_1","article-title":"PaLM: Scaling Language Modeling with Pathways","volume":"24","author":"Chowdhery Aakanksha","year":"2023","unstructured":"Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, and Noah Fiedel. 2023. PaLM: Scaling Language Modeling with Pathways. J. Mach. Learn. Res. 24 (2023), 240:1--240:113. http:\/\/jmlr.org\/papers\/v24\/22-1144.html","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 33rd International Conference on Very Large Data Bases","author":"Chu Eric","year":"2007","unstructured":"Eric Chu, Akanksha Baid, Ting Chen, AnHai Doan, and Jeffrey F. Naughton. 2007. A Relational Approach to Incrementally Extracting and Querying Structure in Unstructured Data. In Proceedings of the 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, September 23-27, 2007, Christoph Koch, Johannes Gehrke, Minos N. 
Garofalakis, Divesh Srivastava, Karl Aberer, Anand Deshpande, Daniela Florescu, Chee Yong Chan, Venkatesh Ganti, Carl-Christian Kanne, Wolfgang Klas, and Erich J. Neuhold (Eds.). ACM, 1045--1056. http:\/\/www.vldb.org\/conf\/2007\/papers\/research\/p1045-chu.pdf"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.105"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2305.14314"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/n19-1423"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the Eleventh International Conference on Language Resources and Evaluation, LREC 2018","author":"ElSahar Hady","year":"2018","unstructured":"Hady ElSahar, Pavlos Vougiouklis, Arslen Remaci, Christophe Gravier, Jonathon S. Hare, Fr\u00e9d\u00e9rique Laforest, and Elena Simperl. 2018. T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, K\u00f4iti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, H\u00e9l\u00e8ne Mazo, Asunci\u00f3n Moreno, Jan Odijk, Stelios Piperidis, and Takenobu Tokunaga (Eds.). European Language Resources Association (ELRA). http:\/\/www.lrec-conf.org\/proceedings\/lrec2018\/summaries\/632.html"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589265"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588722"},{"key":"e_1_2_1_16_1","unstructured":"Michael Gubanov and Philip Bernstein. 2006. 
Structural text search and comparison using automatically extracted schema."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2014.6816755"},{"key":"e_1_2_1_18_1","first-page":"7","article-title":"Microsoft SQL Server Full-Text Search","volume":"24","author":"Hamilton James R.","year":"2001","unstructured":"James R. Hamilton and Tapas K. Nayak. 2001. Microsoft SQL Server Full-Text Search. IEEE Data Eng. Bull. 24, 4 (2001), 7--10. http:\/\/sites.computer.org\/debull\/A01DEC-CD.pdf","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.18420\/BTW2023-08"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588710"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3555041.3589730"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1519103.1519105"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1018"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2020.ACLMAIN.703"},{"key":"e_1_2_1_25_1","volume-title":"Dongmei Zhang, and Surajit Chaudhuri.","author":"Li Peng","year":"2023","unstructured":"Peng Li, Yeye He, Dror Yashar, Weiwei Cui, Song Ge, Haidong Zhang, Danielle Rifinski Fainman, Dongmei Zhang, and Surajit Chaudhuri. 2023. Table-gpt: Table-tuned gpt for diverse table tasks. arXiv preprint arXiv:2310.09263 (2023)."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/3342263.3342271"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/3421424.3421431"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/3574245.3574258"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2206.04045"},{"key":"e_1_2_1_31_1","volume-title":"Nicola De Cao, and Paolo Papotti","author":"Saeed Mohammed","year":"2023","unstructured":"Mohammed Saeed, Nicola De Cao, and Paolo Papotti. 2023. Querying Large Language Models with SQL. 
arXiv preprint arXiv:2304.00472 (2023)."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.5555\/1325851.1325968"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/2809974.2809991"},{"key":"e_1_2_1_34_1","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1109\/TKDE.2010.214","article-title":"Incremental information extraction using relational databases","volume":"24","author":"Tari Luis","year":"2010","unstructured":"Luis Tari, Phan Huy Tu, Jorg Hakenberg, Yi Chen, Tran Cao Son, Graciela Gonzalez, and Chitta Baral. 2010. Incremental information extraction using relational databases. IEEE Transactions on Knowledge and Data Engineering 24, 1 (2010), 86--99.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"e_1_2_1_35_1","unstructured":"Gemini Team Rohan Anil Sebastian Borgeaud Yonghui Wu Jean-Baptiste Alayrac Jiahui Yu Radu Soricut Johan Schalkwyk Andrew M Dai Anja Hauth et al. 2023. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805 (2023)."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.241"},{"key":"e_1_2_1_37_1","unstructured":"Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux Timoth\u00e9e Lacroix Baptiste Rozi\u00e8re Naman Goyal Eric Hambro Faisal Azhar Aurelien Rodriguez Armand Joulin Edouard Grave and Guillaume Lample. 2023. LLaMA: Open and Efficient Foundation Language Models. 
arXiv:2302.13971 [cs]"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","unstructured":"Hugo Touvron Louis Martin Kevin Stone Peter Albert Amjad Almahairi Yasmine Babaei Nikolay Bashlykov Soumya Batra Prajjwal Bhargava Shruti Bhosale Dan Bikel Lukas Blecher Cristian Canton-Ferrer Moya Chen Guillem Cucurull David Esiobu Jude Fernandes Jeremy Fu Wenyin Fu Brian Fuller Cynthia Gao Vedanuj Goswami Naman Goyal Anthony Hartshorn Saghar Hosseini Rui Hou Hakan Inan Marcin Kardas Viktor Kerkez Madian Khabsa Isabel Kloumann Artem Korenev Punit Singh Koura Marie-Anne Lachaux Thibaut Lavril Jenya Lee Diana Liskovich Yinghai Lu Yuning Mao Xavier Martinet Todor Mihaylov Pushkar Mishra Igor Molybog Yixin Nie Andrew Poulton Jeremy Reizenstein Rashi Rungta Kalyan Saladi Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang Angela Fan Melanie Kambadur Sharan Narang Aur\u00e9lien Rodriguez Robert Stojnic Sergey Edunov and Thomas Scialom. 2023. Llama 2: Open Foundation and Fine-Tuned Chat Models. CoRR abs\/2307.09288 (2023). arXiv:2307.09288 10.48550\/ARXIV.2307.09288","DOI":"10.48550\/ARXIV.2307.09288"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517843"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626246.3654732"},{"key":"e_1_2_1_41_1","unstructured":"Matthias Urban and Carsten Binnig. 2024. Efficient Learned Query Execution over Text and Tables [Technical Report]. arXiv:2410.22522 [cs.DB] https:\/\/arxiv.org\/abs\/2410.22522"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3593078.3593933"},{"key":"e_1_2_1_43_1","unstructured":"Liane Vogel Benjamin Hilprecht and Carsten Binnig. 2023. Towards Foundation Models for Relational Databases [Vision Paper]. 
arXiv:2305.15321 [cs]"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3533702.3534910"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2206.07682"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/d17-1239"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.180"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICMLC.2007.4370588"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.745"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3704965.3704989","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,18]],"date-time":"2025-02-18T17:29:12Z","timestamp":1739899752000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3704965.3704989"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9]]},"references-count":48,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2024,9]]}},"alternative-id":["10.14778\/3704965.3704989"],"URL":"https:\/\/doi.org\/10.14778\/3704965.3704989","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,9]]},"assertion":[{"value":"2025-02-18","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}