{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,22]],"date-time":"2026-01-22T20:42:51Z","timestamp":1769114571062,"version":"3.49.0"},"reference-count":70,"publisher":"Association for Computing Machinery (ACM)","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2018,12]]},"abstract":"<jats:p>\n            A range of\n            <jats:italic>explanation engines<\/jats:italic>\n            assist data analysts by performing feature selection over increasingly high-volume and high-dimensional data, grouping and highlighting commonalities among data points. While useful in diverse tasks such as user behavior analytics, operational event processing, and root cause analysis, today's explanation engines are designed as standalone data processing tools that do not interoperate with traditional, SQL-based analytics workflows; this limits the applicability and extensibility of these engines. In response, we propose the DIFF operator, a relational aggregation operator that unifies the core functionality of these engines with declarative relational query processing. We implement both single-node and distributed versions of the DIFF operator in MB SQL, an extension of MacroBase, and demonstrate how DIFF can provide the same semantics as existing explanation engines while capturing a broad set of production use cases in industry, including at Microsoft and Facebook. Additionally, we illustrate how this declarative approach to data explanation enables new logical and physical query optimizations. 
We evaluate these optimizations on several real-world production applications, and find that DIFF in MB SQL can outperform state-of-the-art engines by up to an order of magnitude.\n          <\/jats:p>","DOI":"10.14778\/3297753.3297761","type":"journal-article","created":{"date-parts":[[2019,2,27]],"date-time":"2019-02-27T14:57:56Z","timestamp":1551279476000},"page":"419-432","source":"Crossref","is-referenced-by-count":18,"title":["DIFF"],"prefix":"10.14778","volume":"12","author":[{"given":"Firas","family":"Abuzaid","sequence":"first","affiliation":[{"name":"Microsoft"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peter","family":"Kraft","sequence":"additional","affiliation":[{"name":"Microsoft"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sahaana","family":"Suri","sequence":"additional","affiliation":[{"name":"Microsoft"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Edward","family":"Gan","sequence":"additional","affiliation":[{"name":"Microsoft"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Eric","family":"Xu","sequence":"additional","affiliation":[{"name":"Microsoft"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Atul","family":"Shenoy","sequence":"additional","affiliation":[{"name":"Microsoft"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Asvin","family":"Ananthanarayan","sequence":"additional","affiliation":[{"name":"Microsoft"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"John","family":"Sheu","sequence":"additional","affiliation":[{"name":"Microsoft"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Erik","family":"Meijer","sequence":"additional","affiliation":[{"name":"Facebook"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xi","family":"Wu","sequence":"additional","affiliation":[{"name":"Google"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jeff","family":"Naughton","sequence":"addi
tional","affiliation":[{"name":"Google"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peter","family":"Bailis","sequence":"additional","affiliation":[{"name":"Microsoft"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Matei","family":"Zaharia","sequence":"additional","affiliation":[{"name":"Microsoft"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2018,12]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Foundations of databases: the logical level","author":"Abiteboul S.","year":"1995","unstructured":"S. Abiteboul, R. Hull, and V. Vianu. Foundations of databases: the logical level. Addison-Wesley Longman Publishing Co., Inc., 1995."},{"key":"e_1_2_1_2_1","first-page":"487","volume-title":"VLDB","author":"Agarwal R.","year":"1994","unstructured":"R. Agarwal, R. Srikant, et al. Fast algorithms for mining association rules. In VLDB, pages 487--499, 1994."},{"key":"e_1_2_1_3_1","volume-title":"USENIX Security","author":"Antonakakis M.","year":"2017","unstructured":"M. Antonakakis, T. April, M. Bailey, M. Bernhard, E. Bursztein, J. Cochran, Z. Durumeric, J. A. Halderman, L. Invernizzi, M. Kallitsis, D. Kumar, C. Lever, Z. Ma, J. Mason, D. Menscher, C. Seaman, N. Sullivan, K. Thomas, and Y. Zhou. Understanding the mirai botnet. 
In USENIX Security, 2017."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2742797"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/335191.335420"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/775047.775109"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066171"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035928"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035928"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2005.80"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2008.180"},{"key":"e_1_2_1_12_1","volume-title":"Compressive sensing {lecture notes}","author":"Baraniuk R. G.","year":"2007","unstructured":"R. G. Baraniuk. Compressive sensing {lecture notes}. IEEE signal processing magazine, 24(4):118--121, 2007."},{"key":"e_1_2_1_13_1","first-page":"1165","volume-title":"The control of the false discovery rate in multiple testing under dependency. Annals of statistics","author":"Benjamini Y.","year":"2001","unstructured":"Y. Benjamini and D. Yekutieli. The control of the false discovery rate in multiple testing under dependency. Annals of statistics, pages 1165--1188, 2001."},{"key":"e_1_2_1_14_1","volume-title":"CIDR","author":"Bittorf M.","year":"2015","unstructured":"M. Bittorf et al. Impala: A modern, open-source sql engine for hadoop. 
In CIDR, 2015."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/645484.656386"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.2325"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2938503.2938515"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/275487.275492"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137628.3137633"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2408776.2408794"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/1331939.1331940"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2663716.2663755"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2810103.2813703"},{"key":"e_1_2_1_24_1","first-page":"958","volume-title":"VLDB","author":"Fagin R.","year":"2005","unstructured":"R. Fagin et al. Efficient implementation of large-scale multi-structural databases. In VLDB, pages 958--969. VLDB Endowment, 2005."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065167.1065191"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1565694.1565702"},{"key":"e_1_2_1_27_1","first-page":"36","volume-title":"Joint European conference on machine learning and knowledge discovery in databases","author":"P.","year":"2016","unstructured":"P. Fournier-Viger et al. The spmf open-source data mining library version 2. In Joint European conference on machine learning and knowledge discovery in databases, pages 36--40. 
Springer, 2016."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.5555\/645478.757691"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009726021843"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1496091.1496103"},{"key":"e_1_2_1_31_1","volume-title":"An introduction to variable and feature selection. Journal of machine learning research, 3(Mar):1157--1182","author":"Guyon I.","year":"2003","unstructured":"I. Guyon and A. Elisseeff. An introduction to variable and feature selection. Journal of machine learning research, 3(Mar):1157--1182, 2003."},{"key":"e_1_2_1_32_1","volume-title":"Correlation-based feature selection of discrete and numeric class machine learning","author":"Hall M. A.","year":"2000","unstructured":"M. A. Hall. Correlation-based feature selection of discrete and numeric class machine learning. 2000."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1561\/1900000002"},{"key":"e_1_2_1_34_1","volume-title":"Readings in database systems","author":"Hellerstein J. M.","year":"2005","unstructured":"J. M. Hellerstein and M. Stonebraker. Readings in database systems. 2005."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2351316.2351329"},{"key":"e_1_2_1_36_1","volume-title":"Rich data and the increasing value of the internet of things","author":"DC.","year":"2014","unstructured":"IDC. 
The digital universe of opportunities: Rich data and the increasing value of the internet of things, 2014. http:\/\/www.emc.com\/leadership\/digital-universe\/."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007641"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/119995.115835"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.14778\/2180912.2180913"},{"key":"e_1_2_1_40_1","volume-title":"The data warehouse toolkit: the complete guide to dimensional modeling","author":"Kimball R.","year":"2011","unstructured":"R. Kimball and M. Ross. The data warehouse toolkit: the complete guide to dimensional modeling. John Wiley & Sons, 2011."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536274.2536302"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2882952"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2723713"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.14778\/2367502.2367518"},{"key":"e_1_2_1_46_1","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9781139924801","volume-title":"Mining of Massive Datasets","author":"Leskovec J.","year":"2014","unstructured":"J. Leskovec et al. Mining of Massive Datasets. 
Cambridge university press, 2014."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/1454008.1454027"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3136625"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733070"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920886"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.5555\/2946645.2946679"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3183733"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3180143"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/253262.253268"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/1142351.1142384"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1016\/0031-3203(80)90029-1"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2015.7113365"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2588578"},{"key":"e_1_2_1_59_1","volume-title":"Simultaneous statistical inference","author":"Rupert G.","year":"2012","unstructured":"G. Rupert Jr et al. Simultaneous statistical inference. Springer Science & Business Media, 2012."},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btm344"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2882917"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/582095.582099"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1007\/11415763_3"},{"key":"e_1_2_1_64_1","first-page":"553","volume-title":"VLDB","author":"Stonebraker M.","year":"2005","unstructured":"M. Stonebraker et al. C-store: a column-oriented dbms. 
In VLDB, pages 553--564. VLDB Endowment, 2005."},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2750549"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.1996.0012"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536354.2536356"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2595631"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915218"},{"key":"e_1_2_1_70_1","first-page":"2","volume-title":"NSDI","author":"Zaharia M.","year":"2012","unstructured":"M. Zaharia et al. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In NSDI, pages 2--2. USENIX Association, 2012."},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1109\/CLUSTER.2011.61"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3297753.3297761","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:33:20Z","timestamp":1672220000000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3297753.3297761"}},"subtitle":["a relational interface for large-scale data explanation"],"short-title":[],"issued":{"date-parts":[[2018,12]]},"references-count":70,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2018,12]]}},"alternative-id":["10.14778\/3297753.3297761"],"URL":"https:\/\/doi.org\/10.14778\/3297753.3297761","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2018,12]]}}}