{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T00:30:31Z","timestamp":1771461031996,"version":"3.50.1"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"9","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2023,5]]},"abstract":"<jats:p>With enormous volumes of data, quickly retrieving data that is relevant to a query is essential for achieving high performance. Modern cloud-based database systems often partition the data into blocks and employ various techniques to skip irrelevant blocks during query execution. Several algorithms, often based on historical properties of a workload of queries run over the data, have been proposed to tune the physical layout of data to reduce the number of blocks accessed. The effectiveness of these methods at skipping blocks depends on what metadata is stored and how well the physical data layout aligns with the queries. Existing work on automatic physical database design misses significant opportunities in skipping blocks because it ignores logical predicates in the workload that exhibit strongly correlated results. In this paper, we present Pando which enables significantly better block skipping than past methods by informing physical layout decisions with correlation-aware logical partitioning. 
Across a range of benchmark and real-world workloads, Pando attains up to 2.8X reduction in the number of blocks scanned and up to 2.3X speedup in end-to-end query execution time over the state-of-the-art techniques.<\/jats:p>","DOI":"10.14778\/3598581.3598601","type":"journal-article","created":{"date-parts":[[2023,7,10]],"date-time":"2023-07-10T22:19:06Z","timestamp":1689027546000},"page":"2316-2329","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Pando: Enhanced Data Skipping with Logical Data Partitioning"],"prefix":"10.14778","volume":"16","author":[{"given":"Sivaprasad","family":"Sudhir","sequence":"first","affiliation":[{"name":"MIT, Meta"}]},{"given":"Wenbo","family":"Tao","sequence":"additional","affiliation":[{"name":"Meta"}]},{"given":"Nikolay","family":"Laptev","sequence":"additional","affiliation":[{"name":"Meta"}]},{"given":"Cyrille","family":"Habis","sequence":"additional","affiliation":[{"name":"Meta"}]},{"given":"Michael","family":"Cafarella","sequence":"additional","affiliation":[{"name":"MIT"}]},{"given":"Samuel","family":"Madden","sequence":"additional","affiliation":[{"name":"MIT"}]}],"member":"320","published-online":{"date-parts":[[2023,7,10]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"AutoAdmin: Self-Tuning Database Systems Technology","author":"Agrawal Sanjay","year":"2006","unstructured":"Sanjay Agrawal, Nicolas Bruno, Surajit Chaudhuri, and Vivek Narasayya. 2006. AutoAdmin: Self-Tuning Database Systems Technology. IEEE Data Engineering Bulletin (2006), 7--15."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the 26th International Conference on Very Large Data Bases (VLDB '00)","author":"Agrawal Sanjay","unstructured":"Sanjay Agrawal, Surajit Chaudhuri, and Vivek R. 
Narasayya. 2000. Automated Selection of Materialized Views and Indexes in SQL Databases. In Proceedings of the 26th International Conference on Very Large Data Bases (VLDB '00). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 496--505."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007609"},{"key":"e_1_2_1_4_1","unstructured":"Amazon [n.d.]. Managing the volume of merged rows. https:\/\/docs.aws.amazon.com\/redshift\/latest\/dg\/vacuum-managing-volume-of-unmerged-rows.html."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/3358701.3358707"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the 29th International Conference on Very Large Data Bases -","volume":"29","author":"Paul","unstructured":"Paul G. Brown and Peter J. Hass. 2003. BHUNT: Automatic Discovery of Fuzzy Algebraic Constraints in Relational Data. In Proceedings of the 29th International Conference on Very Large Data Bases - Volume 29 (Berlin, Germany) (VLDB '03). VLDB Endowment, 668--679."},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 29th International Conference on Very Large Data Bases -","volume":"29","author":"Paul","unstructured":"Paul G. Brown and Peter J. Hass. 2003. 
BHUNT: Automatic Discovery of Fuzzy Algebraic Constraints in Relational Data. In Proceedings of the 29th International Conference on Very Large Data Bases - Volume 29 (Berlin, Germany) (VLDB '03). VLDB Endowment, 668--679."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1292609.1292618"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 33rd International Conference on Very Large Data Bases","author":"Chaudhuri Surajit","year":"2007","unstructured":"Surajit Chaudhuri and Vivek Narasayya. 2007. Self-Tuning Database Systems: A Decade of Progress. In Proceedings of the 33rd International Conference on Very Large Data Bases (Vienna, Austria) (VLDB '07). VLDB Endowment, 3--14."},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB '97)","author":"Chaudhuri Surajit","unstructured":"Surajit Chaudhuri and Vivek R. Narasayya. 1997. An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server. In Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB '97). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 146--155."},{"key":"e_1_2_1_11_1","unstructured":"Zach Christopherson. 2016. 
Amazon Redshift Engineering's Advanced Table Design Playbook: Compound and Interleaved Sort Keys. (2016). https:\/\/aws.amazon.com\/blogs\/big-data\/amazon-redshift-engineerings-advanced-table-design-playbook-compound-and-interleaved-sort-keys\/"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/32.44388"},{"key":"e_1_2_1_13_1","volume-title":"Yang Zhang, and Samuel R Madden.","author":"Curino Carlo","year":"2010","unstructured":"Carlo Curino, Evan Philip Charles Jones, Yang Zhang, and Samuel R Madden. 2010. Schism: a workload-driven approach to database replication and partitioning. (2010)."},{"key":"e_1_2_1_14_1","unstructured":"Databricks [n.d.]. Data skipping index. https:\/\/docs.databricks.com\/delta\/data-skipping.html\/. Accessed: 2022-12-01."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457270"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.14778\/3425879.3425880"},{"key":"e_1_2_1_17_1","unstructured":"Markus Dreseler Jan Kossmann Martin Boissier Stefan Klauck Matthias Uflacker and Hasso Plattner. [n.d.]. 
Hyrise Re-engineered: An Extensible Database System for Research in Relational In-Memory Data Management."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CAHPC.2005.32"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03730-6_10"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389704"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/3368289.3368292"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687765"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920979"},{"key":"e_1_2_1_24_1","volume-title":"Ani Kristo, Guillaume Leclerc, S. Madden, Hongzi Mao, and V. Nathan.","author":"Kraska Tim","year":"2019","unstructured":"Tim Kraska, M. Alizadeh, Alex Beutel, Ed H. Chi, Ani Kristo, Guillaume Leclerc, S. Madden, Hongzi Mao, and V. Nathan. 2019. SageDB: A Learned Database System. In CIDR."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/3192965.3192968"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2463708"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850594"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/3342263.3342270"},{"key":"e_1_2_1_29_1","unstructured":"Samuel Madden Jialin Ding Tim Kraska Sivaprasad Sudhir David Cohen Timothy Mattson and Nesime Tatbul. 2022. Self-Organizing Data Containers. (2022)."},{"key":"e_1_2_1_30_1","unstructured":"Microsoft [n.d.]. Columnstore Indexes Query performance. 
https:\/\/learn.microsoft.com\/en-us\/sql\/relational-databases\/indexes\/columnstore-indexes-query-performance?view=sql-server-ver16. Accessed: 2022-12-01."},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the 24rd International Conference on Very Large Data Bases (VLDB '98)","author":"Moerkotte Guido","year":"1998","unstructured":"Guido Moerkotte. 1998. Small Materialized Aggregates: A Light Weight Index Structure for Data Warehousing. In Proceedings of the 24rd International Conference on Very Large Data Bases (VLDB '98). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 476--487."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2012.06683"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/3115404.3115415"},{"key":"e_1_2_1_34_1","unstructured":"Oracle [n.d.]. Database Data Warehousing Guide. https:\/\/docs.oracle.com\/database\/121\/DWHSG\/zone_maps.htm#DWHSG-GUID-BEA5ACA1-6718-4948-AB38-1F2C0335FDE4. Accessed: 2022-12-01."},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Stefano Paraboschi Giuseppe Sindoni Elena Baralis and Ernest Teniente. 2003. 
Materialized Views in Multidimensional Databases. IGI Global USA 222--251.","DOI":"10.4018\/978-1-59140-053-0.ch008"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/564691.564757"},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the 31st International Conference on Very Large Data Bases","author":"Stonebraker Mike","year":"2005","unstructured":"Mike Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran, and Stan Zdonik. 2005. C-Store: A Column-Oriented DBMS. In Proceedings of the 31st International Conference on Very Large Data Bases (Trondheim, Norway) (VLDB '05). VLDB Endowment, 553--564."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.14778\/3503585.3503606"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610515"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.14778\/3025111.3025123"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2010.5447738"},{"key":"e_1_2_1_42_1","unstructured":"TPC-DS [n.d.]. TPC-DS. https:\/\/www.tpc.org\/tpcds\/. Accessed: 2022-12-01."},{"key":"e_1_2_1_43_1","unstructured":"TPCH-H [n.d.]. TPC-H. https:\/\/www.tpc.org\/tpch\/. 
Accessed: 2022-12-01."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3319861"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389770"},{"key":"e_1_2_1_46_1","volume-title":"International Conference on Autonomic Computing, 2004. Proceedings.","author":"Zilio D.","year":"2004","unstructured":"D. Zilio, C. Zuzarte, S. Lightstone, Wenbin Ma, G. Lohman, R. Cochrane, H. Pirahesh, L. Colby, Jarek Gryz, E. Alton, Dongming Liang, and G. Valentin. 2004. Recommending materialized views and indexes with the IBM DB2 design advisor. International Conference on Autonomic Computing, 2004. Proceedings. (2004), 180--187."}],"container-title":["Proceedings of the VLDB 
Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3598581.3598601","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,19]],"date-time":"2023-07-19T23:00:05Z","timestamp":1689807605000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3598581.3598601"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5]]},"references-count":46,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2023,5]]}},"alternative-id":["10.14778\/3598581.3598601"],"URL":"https:\/\/doi.org\/10.14778\/3598581.3598601","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2023,5]]},"assertion":[{"value":"2023-07-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}