{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T18:08:00Z","timestamp":1757614080710,"version":"3.44.0"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,7]]},"abstract":"<jats:p>Late Materialization (LM) is a critical technique applied in traditional column stores to speed up analytical queries. However, with modern analytical databases evolved to incorporate a vectorized columnar execution engine, LM's benefits in I\/O reduction and fast columnar query processing have diminished. In this paper, we redefine the concept of Late Materialization in the context of modern analytical databases and propose Selective Late Materialization (SLM) to allow each attribute in a query to choose its own materialization point that yields the minimum cost. SLM expands the solution space of the traditional materialization problem from one unified hard-coded binary decision (i.e., early or late) for all attributes to per attribute per query decisions. By integrating SLM into DuckDB, we show that SLM consistently outperforms the baselines of Early Materialization and Late Materialization by 14.7% and 8.9%, respectively, on average using the Join Order Benchmark (JOB), with up to 76.7% latency reduction for individual queries. We observe similar results for the TPC-DS benchmark.<\/jats:p>","DOI":"10.14778\/3749646.3749717","type":"journal-article","created":{"date-parts":[[2025,9,4]],"date-time":"2025-09-04T17:55:06Z","timestamp":1757008506000},"page":"4616-4628","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Selective Late Materialization in Modern Analytical Databases"],"prefix":"10.14778","volume":"18","author":[{"given":"Yihao","family":"Liu","sequence":"first","affiliation":[{"name":"Tsinghua University"}]},{"given":"Shaoxuan","family":"Tang","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]},{"given":"Yulong","family":"Hui","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]},{"given":"Hangrui","family":"Zhou","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]},{"given":"Huanchen","family":"Zhang","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]}],"member":"320","published-online":{"date-parts":[[2025,9,4]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2022. Apache Parquet. https:\/\/parquet.apache.org\/."},{"key":"e_1_2_1_2_1","unstructured":"2024. Duckdb v1.1.0. https:\/\/github.com\/duckdb\/duckdb\/releases\/tag\/v1.1.0."},{"key":"e_1_2_1_3_1","unstructured":"2024. JOB implement. https:\/\/github.com\/gregrahn\/join-order-benchmark?tab=readme-ov-file."},{"key":"e_1_2_1_4_1","unstructured":"2024. Kaggle Movie ID dataset. https:\/\/www.kaggle.com\/datasets\/grouplens\/movielens-20m-dataset?select=rating.csv."},{"key":"e_1_2_1_5_1","unstructured":"2024. Kaggle USA Real Estate Dataset. https:\/\/www.kaggle.com\/datasets\/ahmedshahriarsakib\/usa-real-estate-dataset?select=realtor-dataset-100k.csv."},{"key":"e_1_2_1_6_1","unstructured":"2025. Apache Doris. https:\/\/github.com\/apache\/doris."},{"key":"e_1_2_1_7_1","unstructured":"2025. emhash. https:\/\/github.com\/ktprime\/emhash."},{"key":"e_1_2_1_8_1","unstructured":"2025. StarRocks. https:\/\/github.com\/StarRocks\/starrocks."},{"key":"e_1_2_1_9_1","unstructured":"2025. TPC-DS Benchmark Standard Specification. https:\/\/www.tpc.org\/tpcds\/."},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Daniel Abadi Peter Boncz Stavros Harizopoulos Stratos Idreos Samuel Madden et al. 2013. The design and implementation of modern column-oriented database systems. Foundations and Trends\u00ae in Databases 5 3 (2013) 197\u2013280.","DOI":"10.1561\/1900000024"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1142473.1142548"},{"key":"e_1_2_1_12_1","volume-title":"2007 IEEE 23rd International Conference on Data Engineering. IEEE, 466\u2013475","author":"Abadi Daniel J","year":"2006","unstructured":"Daniel J Abadi, Daniel S Myers, David J DeWitt, and Samuel R Madden. 2006. Materialization strategies in a column-oriented DBMS. In 2007 IEEE 23rd International Conference on Data Engineering. IEEE, 466\u2013475."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/3685800.3685809"},{"key":"e_1_2_1_14_1","first-page":"169","article-title":"Weaving Relations for Cache Performance","volume":"1","author":"Ailamaki Anastassia","year":"2001","unstructured":"Anastassia Ailamaki, David J DeWitt, Mark D Hill, and Marios Skounakis. 2001. Weaving Relations for Cache Performance.. In VLDB, Vol. 1. 169\u2013180.","journal-title":"VLDB"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526045"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452831"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407851"},{"key":"e_1_2_1_18_1","first-page":"225","article-title":"MonetDB\/X100: Hyper-Pipelining Query Execution","volume":"5","author":"Boncz Peter A","year":"2005","unstructured":"Peter A Boncz, Marcin Zukowski, and Niels Nes. 2005. MonetDB\/X100: Hyper-Pipelining Query Execution.. In Cidr, Vol. 5. 225\u2013237.","journal-title":"Cidr"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/645912.671296"},{"key":"e_1_2_1_20_1","unstructured":"George A Chernishev Viacheslav Galaktionov Valentin V Grigorev Evgeniy Klyuchikov and Kirill Smirnov. 2022. A Comprehensive Study of Late Materialization Strategies for a Disk-Based Column-Store.. In DOLAP. 21\u201330."},{"key":"e_1_2_1_21_1","volume-title":"Cidr","volume":"25","author":"Chronis Yannis","year":"2025","unstructured":"Yannis Chronis, Anastasia Ailamaki, Lawrence Benson, Helena Caminal, Jana Gi\u010deva, Dave Patterson, Eric Sedlar, and Lisa Wu Wills. 2025. Databases in the Era of Memory-Centric Computing. In Cidr, Vol. 25."},{"key":"e_1_2_1_22_1","volume-title":"PIMDAL: Mitigating the Memory Bottleneck in Data Analytics using a Real Processing-in-Memory System. arXiv preprint arXiv:2504.01948","author":"Frouzakis Manos","year":"2025","unstructured":"Manos Frouzakis, Juan G\u00f3mez-Luna, Geraldo F Oliveira, Mohammad Sadrosadati, and Onur Mutlu. 2025. PIMDAL: Mitigating the Memory Bottleneck in Data Analytics using a Real Processing-in-Memory System. arXiv preprint arXiv:2504.01948 (2025)."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the 2011 Third International Conference on Advances in Databases, Knowledge, and Data Applications (DBKDA'11)","author":"Grund Martin","year":"2011","unstructured":"Martin Grund, Jens Krueger, Matthias Kleine, Alexander Zeier, and Hasso Plattner. 2011. Optimal query operator materialization strategy for hybrid databases. In Proceedings of the 2011 Third International Conference on Advances in Databases, Knowledge, and Data Applications (DBKDA'11). 169\u2013174."},{"key":"e_1_2_1_24_1","unstructured":"Alan Halverson Jennifer L Beckmann Jeffrey F Naughton and David J Dewitt. 2006. A comparison of c-store and row-store in a common framework. Technical Report. University of Wisconsin-Madison Department of Computer Sciences."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/1182635.1164170"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1559845.1559878"},{"key":"e_1_2_1_27_1","volume-title":"SOSD: A Benchmark for Learned Indexes.","author":"Kipf A.","year":"2019","unstructured":"A. Kipf, R Marcus, A Van Renen, M. Stoian, A. Kemper, T. Kraska, and T. Neumann. 2019. SOSD: A Benchmark for Learned Indexes. (2019)."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3632410.3632422"},{"key":"e_1_2_1_29_1","volume-title":"Bloom filters in distributed query execution","author":"Koutris Paraschos","year":"2011","unstructured":"Paraschos Koutris. 2011. Bloom filters in distributed query execution. University of Washington, CSE 544 (2011)."},{"key":"e_1_2_1_30_1","volume-title":"Database and Expert Systems Applications: 25th International Conference, DEXA 2014, Munich, Germany, September 1\u20134, 2014. Proceedings, Part II 25","author":"Ku Chi","year":"2014","unstructured":"Chi Ku, Yanchen Liu, Masood Mortazavi, Fang Cao, Mengmeng Chen, and Guangyu Shi. 2014. Optimization strategies for column materialization in parallel execution of queries. In Database and Expert Systems Applications: 25th International Conference, DEXA 2014, Munich, Germany, September 1\u20134, 2014. Proceedings, Part II 25. Springer, 191\u2013198."},{"key":"e_1_2_1_31_1","volume-title":"The vertica analytic database: C-store 7 years later. arXiv preprint arXiv:1208.4173","author":"Lamb Andrew","year":"2012","unstructured":"Andrew Lamb, Matt Fuller, Ramakrishna Varadarajan, Nga Tran, Ben Vandier, Lyric Doshi, and Chuck Bear. 2012. The vertica analytic database: C-store 7 years later. arXiv preprint arXiv:1208.4173 (2012)."},{"key":"e_1_2_1_32_1","volume-title":"Modular Analytic Query Engine. In Companion of the 2024 International Conference on Management of Data. 5\u201317","author":"Lamb Andrew","year":"2024","unstructured":"Andrew Lamb, Yijie Shen, Dani\u00ebl Heres, Jayjeet Chakraborty, Mehmet Ozan Kabak, Liang-Chi Hsieh, and Chao Sun. 2024. Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine. In Companion of the 2024 International Conference on Management of Data. 5\u201317."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2882925"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610507"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850594"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/s007780050071"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3639320"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.5555\/1316689.1316749"},{"key":"e_1_2_1_39_1","volume-title":"MonetDB: Two decades of research in column-oriented database architectures. Data Engineering 40","author":"Fabian Groffen Niels Nes Stratos Idreos","year":"2012","unstructured":"Stratos Idreos Fabian Groffen Niels Nes and Stefan Manegold Sjoerd Mullender Martin Kersten. 2012. MonetDB: Two decades of research in column-oriented database architectures. Data Engineering 40 (2012)."},{"key":"e_1_2_1_40_1","first-page":"29","article-title":"Umbra: A Disk-Based System with In-Memory Performance","volume":"20","author":"Neumann Thomas","year":"2020","unstructured":"Thomas Neumann and Michael J Freitag. 2020. Umbra: A Disk-Based System with In-Memory Performance.. In CIDR, Vol. 20. 29.","journal-title":"CIDR"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/3554821.3554829"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3320212"},{"key":"e_1_2_1_43_1","volume-title":"Distributed Computing and Internet Technology: 5th International Conference, ICDCIT 2008 New Delhi, India, December 10\u201312, 2008. Proceedings 5. Springer, 145\u2013156","author":"Ramesh Sukriti","year":"2009","unstructured":"Sukriti Ramesh, Odysseas Papapetrou, and Wolf Siberski. 2009. Optimizing distributed joins with bloom filters. In Distributed Computing and Internet Technology: 5th International Conference, ICDCIT 2008 New Delhi, India, December 10\u201312, 2008. Proceedings 5. Springer, 145\u2013156."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.14778\/3685800.3685802"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/1457150.1457154"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544909"},{"key":"e_1_2_1_47_1","doi-asserted-by":"crossref","unstructured":"Mike Stonebraker Daniel J Abadi Adam Batkin Xuedong Chen Mitch Cherniack Miguel Ferreira Edmond Lau Amerson Lin Sam Madden Elizabeth O'Neil et al. 2018. C-store: a column-oriented DBMS. In Making Databases Work: the Pragmatic Wisdom of Michael Stonebraker. 491\u2013518.","DOI":"10.1145\/3226595.3226638"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/1559845.1559854"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.14778\/3090163.3090167"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2012.148"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3749646.3749717","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,5]],"date-time":"2025-09-05T03:32:29Z","timestamp":1757043149000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3749646.3749717"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7]]},"references-count":50,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2025,7]]}},"alternative-id":["10.14778\/3749646.3749717"],"URL":"https:\/\/doi.org\/10.14778\/3749646.3749717","relation":{},"ISSN":["2150-8097"],"issn-type":[{"type":"print","value":"2150-8097"}],"subject":[],"published":{"date-parts":[[2025,7]]},"assertion":[{"value":"2025-09-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}