{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,7]],"date-time":"2026-01-07T23:36:49Z","timestamp":1767829009559,"version":"3.49.0"},"reference-count":89,"publisher":"Association for Computing Machinery (ACM)","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2019,11]]},"abstract":"<jats:p>Using data statistics, we convert predicates on a table into data induced predicates (diPs) that apply on the joining tables. Doing so substantially speeds up multi-relation queries because the benefits of predicate pushdown can now apply beyond just the tables that have predicates. We use diPs to skip data exclusively during query optimization; i.e., diPs lead to better plans and have no overhead during query execution. We study how to apply diPs for complex query expressions and how the usefulness of diPs varies with the data statistics used to construct diPs and the data distributions. Our results show that building diPs using zone-maps which are already maintained in today's clusters leads to sizable data skipping gains. Using a new (slightly larger) statistic, 50% of the queries in the TPC-H, TPC-DS and JoinOrder benchmarks can skip at least 33% of the query input. Consequently, the median query in a production big-data cluster finishes roughly 2x faster.<\/jats:p>","DOI":"10.14778\/3368289.3368292","type":"journal-article","created":{"date-parts":[[2020,9,11]],"date-time":"2020-09-11T03:17:35Z","timestamp":1599794255000},"page":"252-265","source":"Crossref","is-referenced-by-count":30,"title":["Pushing data-induced predicates through joins in big-data clusters"],"prefix":"10.14778","volume":"13","author":[{"given":"Srikanth","family":"Kandula","sequence":"first","affiliation":[{"name":"Microsoft"}]},{"given":"Laurel","family":"Orr","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Surajit","family":"Chaudhuri","sequence":"additional","affiliation":[{"name":"Microsoft"}]}],"member":"320","published-online":{"date-parts":[[2019,11]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2017 big-data and analytics forecast. https:\/\/bit.ly\/2TtKyjB.  2017 big-data and analytics forecast. https:\/\/bit.ly\/2TtKyjB."},{"key":"e_1_2_1_2_1","unstructured":"Apache orc spec. vi. https:\/\/bit.ly\/2J5BIkh.  Apache orc spec. vi. https:\/\/bit.ly\/2J5BIkh."},{"key":"e_1_2_1_3_1","unstructured":"Apache spark join guidelines and performance tuning. https:\/\/bit.ly\/2Jd87We.  Apache spark join guidelines and performance tuning. https:\/\/bit.ly\/2Jd87We."},{"key":"e_1_2_1_4_1","unstructured":"Band join. https:\/\/bit.ly\/2kixJJn.  Band join. https:\/\/bit.ly\/2kixJJn."},{"key":"e_1_2_1_5_1","unstructured":"Bitmap join indexes in oracle. https:\/\/bit.ly\/2TLBBTF.  Bitmap join indexes in oracle. https:\/\/bit.ly\/2TLBBTF."},{"key":"e_1_2_1_6_1","unstructured":"Clustered and nonclustered indexes described. https:\/\/bit.ly\/2Drdb9o.  Clustered and nonclustered indexes described. https:\/\/bit.ly\/2Drdb9o."},{"key":"e_1_2_1_7_1","unstructured":"Columnstore index performance: Rowgroup elimination. https:\/\/bit.ly\/2VFpljV.  Columnstore index performance: Rowgroup elimination. https:\/\/bit.ly\/2VFpljV."},{"key":"e_1_2_1_8_1","unstructured":"Columnstore indexes described. https:\/\/bit.ly\/2F7LZuI.  Columnstore indexes described. https:\/\/bit.ly\/2F7LZuI."},{"key":"e_1_2_1_9_1","unstructured":"Data skipping index in spark. https:\/\/bit.ly\/2qONacb.  Data skipping index in spark. https:\/\/bit.ly\/2qONacb."},{"key":"e_1_2_1_10_1","unstructured":"Date correlation optimzation in sql server 2005 & 2008. https:\/\/bit.ly\/2VodSVN.  Date correlation optimzation in sql server 2005 & 2008. https:\/\/bit.ly\/2VodSVN."},{"key":"e_1_2_1_11_1","unstructured":"Imdb datasets. https:\/\/imdb.to\/2S3BzSF.  Imdb datasets. https:\/\/imdb.to\/2S3BzSF."},{"key":"e_1_2_1_12_1","unstructured":"Join order benchmark. https:\/\/bit.ly\/2tTRyIb.  Join order benchmark. https:\/\/bit.ly\/2tTRyIb."},{"key":"e_1_2_1_13_1","unstructured":"The junction tree algorithm. https:\/\/bit.ly\/2lPHNtA.  The junction tree algorithm. https:\/\/bit.ly\/2lPHNtA."},{"key":"e_1_2_1_14_1","unstructured":"Oracle database guide: Using zone maps. https:\/\/bit.ly\/2qMeO9E.  Oracle database guide: Using zone maps. https:\/\/bit.ly\/2qMeO9E."},{"key":"e_1_2_1_15_1","unstructured":"Oracle: Using zone maps. https:\/\/bit.ly\/2vsUWKK.  Oracle: Using zone maps. https:\/\/bit.ly\/2vsUWKK."},{"key":"e_1_2_1_16_1","unstructured":"Parquet thrift format. https:\/\/bit.ly\/2vm6D5U.  Parquet thrift format. https:\/\/bit.ly\/2vm6D5U."},{"key":"e_1_2_1_17_1","unstructured":"Presto: Repartitioned and replicated joins. https:\/\/bit.ly\/2JauYll.  Presto: Repartitioned and replicated joins. https:\/\/bit.ly\/2JauYll."},{"key":"e_1_2_1_18_1","unstructured":"Processing petabytes of data in seconds with databricks delta. https:\/\/bit.ly\/2Pryf2E.  Processing petabytes of data in seconds with databricks delta. https:\/\/bit.ly\/2Pryf2E."},{"key":"e_1_2_1_19_1","unstructured":"Pushing data-induced predicates through joins in bigdata clusters; extended version. https:\/\/bit.ly\/2WhTwP1.  Pushing data-induced predicates through joins in bigdata clusters; extended version. https:\/\/bit.ly\/2WhTwP1."},{"key":"e_1_2_1_20_1","unstructured":"Query 17 in tpc-h see page #57. https:\/\/bit.ly\/2kJRV72.  Query 17 in tpc-h see page #57. https:\/\/bit.ly\/2kJRV72."},{"key":"e_1_2_1_21_1","unstructured":"Query execution bitmap filters. https:\/\/bit.ly\/2NJzzgF.  Query execution bitmap filters. https:\/\/bit.ly\/2NJzzgF."},{"key":"e_1_2_1_22_1","unstructured":"Redshift: Choosing the best sort key. https:\/\/amzn.to\/2AmYbXh.  Redshift: Choosing the best sort key. https:\/\/amzn.to\/2AmYbXh."},{"key":"e_1_2_1_23_1","unstructured":"Teradata: Join index. https:\/\/bit.ly\/2FbalDT.  Teradata: Join index. https:\/\/bit.ly\/2FbalDT."},{"key":"e_1_2_1_24_1","unstructured":"TPC-DS Benchmark. http:\/\/www.tpc.org\/tpcds\/.  TPC-DS Benchmark. http:\/\/www.tpc.org\/tpcds\/."},{"key":"e_1_2_1_25_1","unstructured":"Tpc-ds query #35. https:\/\/bit.ly\/2U0rIk6.  Tpc-ds query #35. https:\/\/bit.ly\/2U0rIk6."},{"key":"e_1_2_1_26_1","unstructured":"TPC-H Benchmark. http:\/\/www.tpc.org\/tpch.  TPC-H Benchmark. http:\/\/www.tpc.org\/tpch."},{"key":"e_1_2_1_27_1","unstructured":"Vertica: Choosing sort order: Best practices. https:\/\/bit.ly\/2yrvPtG.  Vertica: Choosing sort order: Best practices. https:\/\/bit.ly\/2yrvPtG."},{"key":"e_1_2_1_28_1","unstructured":"Views in sql server. https:\/\/bit.ly\/2CnbmIo.  Views in sql server. https:\/\/bit.ly\/2CnbmIo."},{"key":"e_1_2_1_29_1","unstructured":"Program for tpc-h data generation with skew. https:\/\/bit.ly\/2wvdNVo 2016.  Program for tpc-h data generation with skew. https:\/\/bit.ly\/2wvdNVo 2016."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/304182.304198"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2500128"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2465351.2465355"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/253260.253355"},{"key":"e_1_2_1_34_1","volume-title":"VLDB","author":"Agrawal S.","year":"2000"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-012088469-8.50097-8"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.1997.1545"},{"key":"e_1_2_1_37_1","volume-title":"SIGMOD","author":"Armbrust M.","year":"2015"},{"key":"e_1_2_1_38_1","volume-title":"SIGMOD","author":"Bancilhon F.","year":"1985"},{"key":"e_1_2_1_39_1","volume-title":"CACM","author":"Bloom B.","year":"1970"},{"key":"e_1_2_1_40_1","volume-title":"Hadoop Apache Project","author":"Borthakur D.","year":"2008"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989438"},{"key":"e_1_2_1_42_1","volume-title":"VLDB","author":"Brown P. G.","year":"2003"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3093754.3093761"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732240.2732245"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.14778\/1454159.1454166"},{"key":"e_1_2_1_46_1","volume-title":"Foundations and Trends in Databases","author":"Chirkova R.","year":"2012"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2750545"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1561\/1900000004"},{"key":"e_1_2_1_49_1","author":"Cormode G.","year":"2005","journal-title":"J. Algorithms"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/1247480.1247598"},{"key":"e_1_2_1_51_1","volume-title":"VLDB","author":"Eltabakh M. Y.","year":"2011"},{"key":"e_1_2_1_52_1","volume-title":"VLDB","author":"Gibbons P. B.","year":"1997"},{"key":"e_1_2_1_53_1","volume-title":"IEEE Data Eng. Bull.","author":"Graefe G.","year":"1995"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196911"},{"key":"e_1_2_1_55_1","volume-title":"CIDR","author":"Idreos S.","year":"2007"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007641"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2008.4497486"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687765"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.14778\/2367502.2367518"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850594"},{"key":"e_1_2_1_61_1","volume-title":"VLDB","author":"Levy A. Y.","year":"1994"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.14778\/3055540.3055551"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/191839.191860"},{"key":"e_1_2_1_64_1","unstructured":"A. Nanda. Oracle exadata: Smart scans meet storage indexes. http:\/\/bit.ly\/2ha7C5u 2011.  A. Nanda. Oracle exadata: Smart scans meet storage indexes. http:\/\/bit.ly\/2ha7C5u 2011."},{"key":"e_1_2_1_65_1","volume-title":"VLDB","author":"Nica A.","year":"2017"},{"key":"e_1_2_1_66_1","volume-title":"VLDB","author":"Olma M.","year":"2017"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376726"},{"key":"e_1_2_1_68_1","volume-title":"VLDB","author":"Patel J. M.","year":"2018"},{"key":"e_1_2_1_69_1","volume-title":"Morgan Kaufmann","author":"Pearl J.","year":"1988"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/233269.233361"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732228.2732229"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/233269.233360"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/3127479.3131613"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.14778\/3007263.3007311"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544909"},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536222.2536232"},{"key":"e_1_2_1_77_1","volume-title":"VLDB","author":"D.","year":"2008"},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610515"},{"key":"e_1_2_1_79_1","volume-title":"VLDB","author":"Sun L.","year":"2017"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687609"},{"key":"e_1_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2014.6816727"},{"key":"e_1_2_1_82_1","doi-asserted-by":"crossref","unstructured":"M. J. Wainwright and M. I. Jordan. Graphical models exponential families and variational inference. https:\/\/bit.ly\/2yurPIS 2008.  M. J. Wainwright and M. I. Jordan. Graphical models exponential families and variational inference. https:\/\/bit.ly\/2yurPIS 2008.","DOI":"10.1561\/9781601981851"},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064053"},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.14778\/3025111.3025120"},{"key":"e_1_2_1_85_1","volume-title":"NSDI","author":"Zaharia M.","year":"2012"},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2723718"},{"key":"e_1_2_1_87_1","volume-title":"SIGMOD","author":"Zhang H.","year":"2018"},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-012-0280-z"},{"key":"e_1_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2010.5447802"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3368289.3368292","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:42:10Z","timestamp":1672220530000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3368289.3368292"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11]]},"references-count":89,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2019,11]]}},"alternative-id":["10.14778\/3368289.3368292"],"URL":"https:\/\/doi.org\/10.14778\/3368289.3368292","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2019,11]]}}}