{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T04:58:48Z","timestamp":1781326728554,"version":"3.54.1"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2025,12,5]],"date-time":"2025-12-05T00:00:00Z","timestamp":1764892800000},"content-version":"vor","delay-in-days":1,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["2112631"],"award-info":[{"award-number":["2112631"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Horizon Europe Research and Innovation programme","award":["101192750"],"award-info":[{"award-number":["101192750"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2025,12,4]]},"abstract":"<jats:p>\n                    Deploying Large Language Models (LLMs) on resource-constrained devices remains challenging due to limited memory, lack of GPUs, and the complexity of existing runtimes. In this paper, we introduce\n                    <jats:bold>\n                      TranSQL\n                      <jats:sup>+<\/jats:sup>\n                    <\/jats:bold>\n                    , a template-based code generator that translates LLM computation graphs into pure SQL queries for execution in relational databases. Without relying on external libraries, TranSQL\n                    <jats:sup>+<\/jats:sup>\n                    , leverages mature database features-such as vectorized execution and out-of-core processing-for efficient inference. We further propose a row-to-column (ROW2COL) optimization that improves join efficiency in matrix operations. Evaluated on Llama3-8B and DeepSeekMoE models, TranSQL\n                    <jats:sup>+<\/jats:sup>\n                    achieves up to 20\u00d7 lower prefill latency and 4\u00d7 higher decoding speed compared to DeepSpeed Inference and\n                    <jats:italic toggle=\"yes\">Llama.cpp<\/jats:italic>\n                    in low-memory and CPU-only configurations. Our results highlight relational databases as a practical environment for LLMs on low-resource hardware.\n                  <\/jats:p>","DOI":"10.1145\/3769836","type":"journal-article","created":{"date-parts":[[2025,12,6]],"date-time":"2025-12-06T04:32:13Z","timestamp":1764995533000},"page":"1-27","source":"Crossref","is-referenced-by-count":0,"title":["TranSQL\n                    <sup>+<\/sup>\n                    : Serving Large Language Models with SQL on Low-Resource Hardware"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-7849-7771","authenticated-orcid":false,"given":"Wenbo","family":"Sun","sequence":"first","affiliation":[{"name":"Delft University of Technology, Delft, Netherlands"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-5545-8361","authenticated-orcid":false,"given":"Qiming","family":"Guo","sequence":"additional","affiliation":[{"name":"Texas A&amp;M University - Corpus Christi, Corpus Christi, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4829-1068","authenticated-orcid":false,"given":"Wenlu","family":"Wang","sequence":"additional","affiliation":[{"name":"Texas A&amp;M University - Corpus Christi, Corpus Christi, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3720-6585","authenticated-orcid":false,"given":"Rihan","family":"Hai","sequence":"additional","affiliation":[{"name":"Delft University of Technology, Delft, Netherlands"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,12,5]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2017. Open Neural Network Exchange: The open standard for machine learning interoperability. https:\/\/github.com\/onnx\/onnx."},{"key":"e_1_2_1_2_1","unstructured":"2023. Most Widely Deployed and Used Database Engine. https:\/\/sqlite.org\/mostdeployed.html."},{"key":"e_1_2_1_3_1","unstructured":"2024. PostgresML. postgresml.org."},{"key":"e_1_2_1_4_1","unstructured":"2024. SQL machine learning documentation. https:\/\/learn.microsoft.com\/en-us\/sql\/machine-learning\/?view=sqlserver-ver16."},{"key":"e_1_2_1_5_1","unstructured":"2025. ARM v8 Manual. https:\/\/developer.arm.com\/documentation\/ddi0553\/latest\/."},{"key":"e_1_2_1_6_1","unstructured":"2025. llama.cpp. https:\/\/github.com\/ggml-org\/llama.cpp."},{"key":"e_1_2_1_7_1","unstructured":"2025. Risc-V Instruction Set. https:\/\/lf-riscv.atlassian.net\/wiki\/spaces\/HOME\/pages\/16154769\/RISC-VTechnicalSpecifications#ISA-Specifications."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2023.EMNLP-MAIN.298"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2024.ACL-LONG.678"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137633"},{"key":"e_1_2_1_11_1","first-page":"579","volume-title":"Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation","author":"Chen Tianqi","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, and et al. 2018. TVM: an automated end-to-end optimizing compiler for deep learning. In Proceedings of the 13th USENIX Conference on Operating Systems Design and Implementation (Carlsbad, CA, USA) (OSDI'18). USENIX Association, USA, 579-594."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.18653\/V1\/2024.ACL-LONG.70"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 38th International Conference on Neural Information Processing Systems","author":"Debenedetti Edoardo","year":"2025","unstructured":"Edoardo Debenedetti, Jie Zhang, Mislav Balunovic, Luca Beurer-Kellner, Marc Fischer, and Florian Tram\u00e8r. 2025. AgentDojo: a dynamic environment to evaluate prompt injection attacks and defenses for LLM agents. In Proceedings of the 38th International Conference on Neural Information Processing Systems (Vancouver, BC, Canada) (NIPS '24). Curran Associates Inc., Red Hook, NY, USA, Article 2636, 26 pages."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2503.03777"},{"key":"e_1_2_1_15_1","unstructured":"Abhimanyu Dubey Abhinav Jauhri Abhinav Pandey and et al. 2024. The Llama 3 Herd of Models. CoRR abs\/2407.21783 (2024). arXiv:2407.21783"},{"key":"e_1_2_1_16_1","volume-title":"Learning factored representations in a deep mixture of experts. arXiv preprint arXiv:1312.4314","author":"Eigen David","year":"2013","unstructured":"David Eigen, Marc'Aurelio Ranzato, and Ilya Sutskever. 2013. Learning factored representations in a deep mixture of experts. arXiv preprint arXiv:1312.4314 (2013)."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3386137"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3662006.3662067"},{"key":"e_1_2_1_19_1","volume-title":"Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor","author":"Harris Charles R","year":"2020","unstructured":"Charles R Harris, K Jarrod Millman, St\u00e9fan J Van Der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J Smith, et al. 2020. Array programming with NumPy. Nature 585, 7825 (2020), 357-362."},{"key":"e_1_2_1_20_1","volume-title":"Third Biennial Conference on Innovative Data Systems Research, CIDR 2007, Asilomar, CA, USA, January 7-10, 2007, Online Proceedings. www.cidrdb.org, 96-101","author":"H\u00e9man S\u00e1ndor","year":"2007","unstructured":"S\u00e1ndor H\u00e9man, Marcin Zukowski, Arjen P. de Vries, and Peter A. Boncz. 2007. Efficient and Flexible Information Retrieval using MonetDB\/X100. In Third Biennial Conference on Innovative Data Systems Research, CIDR 2007, Asilomar, CA, USA, January 7-10, 2007, Online Proceedings. www.cidrdb.org, 96-101. http:\/\/cidrdb.org\/cidr2007\/papers\/cidr07p10.pdf"},{"key":"e_1_2_1_21_1","volume-title":"Kolla Bhanu Prakash, and GR Kanagachidambaresan","author":"Imambi Sagar","year":"2021","unstructured":"Sagar Imambi, Kolla Bhanu Prakash, and GR Kanagachidambaresan. 2021. PyTorch. Programming with TensorFlow: solution for edge computing applications (2021), 87-104."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3422648.3422659"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/3317315.3317323"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3468791.3472262"},{"key":"e_1_2_1_25_1","volume-title":"Diego de las Casas, Emma Bou Hanna, Florian Bressand, et al.","author":"Jiang Albert Q","year":"2024","unstructured":"Albert Q Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, et al. 2024. Mixtral of experts. arXiv preprint arXiv:2401.04088 (2024)."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-emnlp.221"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3600006.3613165"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO51591.2021.9370308"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v39i23.34620"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3319878"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE53745.2022.00180"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3277006.3277013"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3277006.3277013"},{"key":"e_1_2_1_34_1","first-page":"1222","volume-title":"Automatic Optimization of Matrix Implementations for Distributed Machine Learning and Linear Algebra. In SIGMOD '21","author":"Luo Shangyu","year":"2021","unstructured":"Shangyu Luo, Dimitrije Jankov, Binhang Yuan, and Chris Jermaine. 2021. Automatic Optimization of Matrix Implementations for Distributed Machine Learning and Linear Algebra. In SIGMOD '21. ACM, 1222-1234."},{"key":"e_1_2_1_35_1","unstructured":"Reiner Pope Sholto Douglas Aakanksha Chowdhery Jacob Devlin James Bradbury Jonathan Heek Kefan Xiao Shivani Agrawal and Jeff Dean. 2023. Efficiently Scaling Transformer Inference. In MLSys'23."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3320212"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3406703"},{"key":"e_1_2_1_38_1","first-page":"562","article-title":"ML2SQL-Compiling a Declarative Machine Learning Language to SQL and Python","author":"Sch\u00fcle Maximilian E","year":"2019","unstructured":"Maximilian E Sch\u00fcle, Matthias Bungeroth, Dimitri Vorona, Alfons Kemper, Stephan G\u00fcnnemann, and Thomas Neumann. 2019. ML2SQL-Compiling a Declarative Machine Learning Language to SQL and Python.. In EDBT. 562-565.","journal-title":"EDBT."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/s13222-024-00485-2"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.14778\/3685800.3685802"},{"key":"e_1_2_1_41_1","unstructured":"Noam Shazeer. 2019. Fast Transformer Decoding: One Write-Head is All You Need. arXiv:1911.02150 [cs.NE] https:\/\/arxiv.org\/abs\/1911.02150"},{"key":"e_1_2_1_42_1","volume-title":"GLU Variants Improve Transformer. CoRR abs\/2002.05202","author":"Shazeer Noam","year":"2020","unstructured":"Noam Shazeer. 2020. GLU Variants Improve Transformer. CoRR abs\/2002.05202 (2020). arXiv:2002.05202 https:\/\/arxiv.org\/abs\/2002.05202"},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the 40th International Conference on Machine Learning","author":"Sheng Ying","year":"2023","unstructured":"Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Beidi Chen, Percy Liang, Christopher R\u00e9, Ion Stoica, and Ce Zhang. 2023. FlexGen: high-throughput generative inference of large language models with a single GPU. In Proceedings of the 40th International Conference on Machine Learning (Honolulu, Hawaii, USA) (ICML'23). JMLR.org, Article 1288, 23 pages."},{"key":"e_1_2_1_44_1","volume-title":"LLaMA: Open and Efficient Foundation Language Models. CoRR abs\/2302.13971","author":"Touvron Hugo","year":"2023","unstructured":"Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timoth\u00e9e Lacroix, Baptiste Rozi\u00e8re, Naman Goyal, Eric Hambro, Faisal Azhar, Aur\u00e9lien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. 2023. LLaMA: Open and Efficient Foundation Language Models. CoRR abs\/2302.13971 (2023). doi:10.48550\/ ARXIV.2302.13971 arXiv:2302.13971"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295349"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2406.06282"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133887"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","unstructured":"An Yang Baosong Yang Binyuan Hui Bo Zheng Bowen Yu Chang Zhou and et al. 2024. Qwen2 Technical Report. CoRR abs\/2407.10671 (2024). doi:10.48550\/ARXIV.2407.10671 arXiv:2407.10671","DOI":"10.48550\/ARXIV.2407.10671"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.findings-acl.624"},{"key":"e_1_2_1_50_1","unstructured":"Lianmin Zheng Wei-Lin Chiang Ying Sheng Tianle Li Siyuan Zhuang Zhanghao Wu Yonghao Zhuang Zhuohan Li Zi Lin Eric P. Xing Joseph E. Gonzalez Ion Stoica and Hao Zhang. 2024. LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset. arXiv:2309.11998 [cs.CL] https:\/\/arxiv.org\/abs\/2309.11998"},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation","author":"Zhong Yinmin","year":"2024","unstructured":"Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, and Hao Zhang. 2024. DistServe: disaggregating prefill and decoding for goodput-optimized large language model serving. In Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation (Santa Clara, CA, USA) (OSDI'24). USENIX Association, USA, Article 11, 18 pages."},{"key":"e_1_2_1_52_1","volume-title":"Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation","author":"Zhong Yinmin","year":"2024","unstructured":"Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, and Hao Zhang. 2024. DistServe: disaggregating prefill and decoding for goodput-optimized large language model serving. In Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation (Santa Clara, CA, USA) (OSDI'24). USENIX Association, USA, Article 11, 18 pages."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2012.148"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3769836","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3769836","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T04:49:15Z","timestamp":1781326155000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3769836"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,4]]},"references-count":53,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,12,4]]}},"alternative-id":["10.1145\/3769836"],"URL":"https:\/\/doi.org\/10.1145\/3769836","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,4]]}}}