{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,11]],"date-time":"2026-01-11T08:26:57Z","timestamp":1768120017116,"version":"3.49.0"},"reference-count":82,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2024,3,12]],"date-time":"2024-03-12T00:00:00Z","timestamp":1710201600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100006374","name":"NSF","doi-asserted-by":"publisher","award":["IIS-1552538, IIS-1703431, IIS-2008107, IIS-2147061"],"award-info":[{"award-number":["IIS-1552538, IIS-1703431, IIS-2008107, IIS-2147061"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]},{"name":"NSF Convergence Accelerator Program award","award":["2132318"],"award-info":[{"award-number":["2132318"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2024,3,12]]},"abstract":"<jats:p>SQL queries with group-by and average are frequently used and plotted as bar charts in several data analysis applications. Understanding the reasons behind the results in such an aggregate view may be a highly nontrivial and time-consuming task, especially for large datasets with multiple attributes. Hence, generating automated explanations for aggregate views can allow users to gain better insights into the results while saving time in data analysis. When providing explanations for such views, it is paramount to ensure that they are succinct yet comprehensive, reveal different types of insights that hold for different aggregate answers in the view, and, most importantly, they reflect reality and arm users to make informed data-driven decisions, i.e., the explanations do not only consider correlations but are causal. 
In this paper, we present CauSumX, a framework for generating summarized causal explanations for the entire aggregate view. Using background knowledge captured in a causal DAG, CauSumX finds the most effective causal treatments for different groups in the view. We formally define the framework and the optimization problem, study its complexity, and devise an efficient algorithm using the Apriori algorithm, LP rounding, and several optimizations. We experimentally show that our system generates useful summarized causal explanations compared to prior work and scales well for large high-dimensional data.<\/jats:p>","DOI":"10.1145\/3639328","type":"journal-article","created":{"date-parts":[[2024,3,26]],"date-time":"2024-03-26T18:51:32Z","timestamp":1711479092000},"page":"1-27","source":"Crossref","is-referenced-by-count":7,"title":["Summarized Causal Explanations For Aggregate Views"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0031-5550","authenticated-orcid":false,"given":"Brit","family":"Youngmann","sequence":"first","affiliation":[{"name":"Technion - Israel Institute of Technology, Haifa, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6122-0590","authenticated-orcid":false,"given":"Michael","family":"Cafarella","sequence":"additional","affiliation":[{"name":"CSAIL MIT, Cambridge, MA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3764-1958","authenticated-orcid":false,"given":"Amir","family":"Gilad","sequence":"additional","affiliation":[{"name":"Hebrew University, Jerusalem, Israel"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-8300-7891","authenticated-orcid":false,"given":"Sudeepa","family":"Roy","sequence":"additional","affiliation":[{"name":"Duke University, Durham, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,3,26]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"2021. 2021 Stackoverflow Developer Survey. 
https:\/\/insights.stackoverflow.com\/survey\/2021."},{"key":"e_1_2_2_2_1","unstructured":"2021. Adult Census Income Dataset. https:\/\/www.kaggle.com\/datasets\/uciml\/adult-census-income."},{"key":"e_1_2_2_3_1","volume-title":"The 19th*. https:\/\/19thnews.org\/2023\/03\/parenthood-stereotypes-gender-pay-gap\/. Accessed: 2023-05-18","unstructured":"2023. The 19th*. https:\/\/19thnews.org\/2023\/03\/parenthood-stereotypes-gender-pay-gap\/. Accessed: 2023-05-18."},{"key":"e_1_2_2_4_1","unstructured":"2023. OpenAI Introducing ChatGPT. https:\/\/openai.com\/blog\/chatgpt."},{"key":"e_1_2_2_5_1","unstructured":"2023. Viza jobs. https:\/\/vizajobs.com\/what-do-technology-jobs-paysalary-insights-and-compensation-factors\/."},{"key":"e_1_2_2_6_1","unstructured":"2019. Tech Talks. https:\/\/bdtechtalks.com\/2019\/03\/29\/ageism-in-tech-age-limit-software-developers-face\/."},{"key":"e_1_2_2_7_1","volume-title":"Proc. 20th int. conf. very large data bases, VLDB","volume":"1215","author":"Agrawal Rakesh","year":"1994","unstructured":"Rakesh Agrawal, Ramakrishnan Srikant, et al. 1994. Fast algorithms for mining association rules. In Proc. 20th int. conf. very large data bases, VLDB, Vol. 1215. Santiago, Chile, 487--499."},{"key":"e_1_2_2_8_1","unstructured":"Mohamed Aljaban. 2021. Analysis of car accidents causes in the usa. (2021)."},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989284.1989302"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2019.00056"},{"key":"e_1_2_2_11_1","unstructured":"Arthur Asuncion and David Newman. 2007. UCI machine learning repository."},{"key":"e_1_2_2_12_1","unstructured":"Abhijit V Banerjee and Esther Duflo. 2011. Poor economics: A radical rethinking of the way to fight global poverty. Public Affairs."},{"key":"e_1_2_2_13_1","volume-title":"Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. 
843--854","author":"Roy Senjuti Basu","year":"2010","unstructured":"Senjuti Basu Roy, Sihem Amer-Yahia, Ashish Chawla, Gautam Das, and Cong Yu. 2010. Constructing and exploring composite items. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. 843--854."},{"key":"e_1_2_2_14_1","volume-title":"Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. American economic review 94, 4","author":"Bertrand Marianne","year":"2004","unstructured":"Marianne Bertrand and Sendhil Mullainathan. 2004. Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. American economic review 94, 4 (2004), 991--1013."},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3385192"},{"key":"e_1_2_2_16_1","unstructured":"Nicole Bidoit Melanie Herschel and Katerina Tzompanaki. 2014. Query-based why-not provenance with nedexplain. In Extending database technology (EDBT)."},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/1083592.1083644"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1559845.1559901"},{"key":"e_1_2_2_19_1","volume-title":"International conference on artificial intelligence and statistics. PMLR, 604--612","author":"Chen Chaofan","year":"2018","unstructured":"Chaofan Chen and Cynthia Rudin. 2018. An optimization approach to learning falling rule lists. In International conference on artificial intelligence and statistics. PMLR, 604--612."},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1093\/qje\/qjz042"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33017801"},{"key":"e_1_2_2_22_1","doi-asserted-by":"crossref","unstructured":"Leonardo Mendon\u00e7a de Moura and Nikolaj Bj\u00f8rner. 2008. Z3: An Efficient SMT Solver. In TACAS. 
337--340.","DOI":"10.1007\/978-3-540-78800-3_24"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-019-00584-7"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.5441\/002\/edbt.2019.25"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/3565838.3565841"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735461.2735467"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735461.2735467"},{"key":"e_1_2_2_28_1","volume-title":"Integrated public use microdata series, current population survey: Version 9.0.[Machine-readable database]","author":"Flood Sarah","year":"2015","unstructured":"Sarah Flood, Miriam King, Steven Ruggles, and J Robert Warren. 2015. Integrated public use microdata series, current population survey: Version 9.0.[Machine-readable database]. Minneapolis: University of Minnesota 1 (2015)."},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526149"},{"key":"e_1_2_2_30_1","first-page":"321","article-title":"Epidemiology, justice, and the probability of causation","volume":"40","author":"Greenland Sander","year":"1999","unstructured":"Sander Greenland and James M Robins. 1999. Epidemiology, justice, and the probability of causation. Jurimetrics 40 (1999), 321.","journal-title":"Jurimetrics"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1986.10478354"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE48307.2020.00081"},{"key":"e_1_2_2_33_1","volume-title":"The bayesian case model: A generative approach for case-based reasoning and prototype classification. Advances in neural information processing systems 27","author":"Kim Been","year":"2014","unstructured":"Been Kim, Cynthia Rudin, and Julie A Shah. 2014. The bayesian case model: A generative approach for case-based reasoning and prototype classification. 
Advances in neural information processing systems 27 (2014)."},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939874"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-155860869-6\/50073-1"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-155860869-6\/50074-3"},{"key":"e_1_2_2_37_1","volume-title":"Approximate summaries for why and why-not provenance (extended version). arXiv preprint arXiv:2002.00084","author":"Lee Seokki","year":"2020","unstructured":"Seokki Lee, Bertram Lud\u00e4scher, and Boris Glavic. 2020. Approximate summaries for why and why-not provenance (extended version). arXiv preprint arXiv:2002.00084 (2020)."},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3459246"},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.14778\/3485450.3485457"},{"key":"e_1_2_2_40_1","first-page":"1","article-title":"The Shapley Value of Tuples in Query Answering","volume":"155","author":"Livshits Ester","year":"2020","unstructured":"Ester Livshits, Leopoldo E. Bertossi, Benny Kimelfeld, and Moshe Sebag. 2020. The Shapley Value of Tuples in Query Answering. In ICDT, Vol. 155. 20:1--20:19.","journal-title":"ICDT"},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2487575.2487579"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589301"},{"key":"e_1_2_2_43_1","volume-title":"Why so? or why no? functional causality for explaining query answers. arXiv preprint arXiv:0912.5340","author":"Meliou Alexandra","year":"2009","unstructured":"Alexandra Meliou, Wolfgang Gatterbauer, Katherine F Moore, and Dan Suciu. 2009. Why so? or why no? functional causality for explaining query answers. arXiv preprint arXiv:0912.5340 (2009)."},{"key":"e_1_2_2_44_1","volume-title":"The complexity of causality and responsibility for query answers and non-answers. 
arXiv preprint arXiv:1009.2021","author":"Meliou Alexandra","year":"2010","unstructured":"Alexandra Meliou, Wolfgang Gatterbauer, Katherine F Moore, and Dan Suciu. 2010. The complexity of causality and responsibility for query answers and non-answers. arXiv preprint arXiv:1009.2021 (2010)."},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3300066"},{"key":"e_1_2_2_46_1","volume-title":"Contribution Maximization in Probabilistic Datalog. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 817--828","author":"Milo Tova","year":"2020","unstructured":"Tova Milo, Yuval Moskovitch, and Brit Youngmann. 2020. Contribution Maximization in Probabilistic Datalog. In 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 817--828."},{"key":"e_1_2_2_47_1","volume-title":"Srinivasan Parthasarathy, and Rajiv Ramnath.","author":"Moosavi Sobhan","year":"2019","unstructured":"Sobhan Moosavi, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, and Rajiv Ramnath. 2019. A countrywide traffic accident dataset. arXiv preprint arXiv:1906.05409 (2019)."},{"key":"e_1_2_2_48_1","unstructured":"G\u00f6ran Nilsson. 1982. Effects of speed limits on traffic accidents in Sweden."},{"key":"e_1_2_2_49_1","first-page":"2305","article-title":"The effectiveness of traffic calming measures in reducing road carnage in masvingo urban","volume":"3","author":"Pardon Ndhlovu","year":"2013","unstructured":"Ndhlovu Pardon and Chigwenya Average. 2013. The effectiveness of traffic calming measures in reducing road carnage in masvingo urban. International Journal 3, 2 (2013), 2305--1493.","journal-title":"International Journal"},{"key":"e_1_2_2_50_1","doi-asserted-by":"crossref","unstructured":"Judea Pearl. 2009. Causal inference in statistics: An overview. 
(2009).","DOI":"10.1214\/09-SS057"},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02579324"},{"key":"e_1_2_2_52_1","doi-asserted-by":"crossref","unstructured":"Alon Reshef Benny Kimelfeld and Ester Livshits. 2020. The Impact of Negation on the Complexity of the Shapley Value in Conjunctive Queries. In PODS Dan Suciu Yufei Tao and Zhewei Wei (Eds.). 285--297.","DOI":"10.1145\/3375395.3387664"},{"key":"e_1_2_2_53_1","volume-title":"Miguel Angel Hernan, and Babette Brumback","author":"Robins James M","year":"2000","unstructured":"James M Robins, Miguel Angel Hernan, and Babette Brumback. 2000. Marginal structural models and causal inference in epidemiology."},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/70.1.41"},{"key":"e_1_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.14778\/2856318.2856329"},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2588578"},{"key":"e_1_2_2_57_1","volume-title":"The use of matched sampling and regression adjustment in observational studies. Ph. D. Dissertation","author":"Rubin Donald Bruce","unstructured":"Donald Bruce Rubin. 1971. The use of matched sampling and regression adjustment in observational studies. Ph. D. Dissertation. Harvard University."},{"key":"e_1_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1198\/016214504000001880"},{"key":"e_1_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2021.05.055"},{"key":"e_1_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196914"},{"key":"e_1_2_2_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389759"},{"key":"e_1_2_2_62_1","unstructured":"Gayatri Sathe and Sunita Sarawagi. 2001. Intelligent rollups in multidimensional OLAP data. In VLDB. 307--316."},{"key":"e_1_2_2_63_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.2041-210X.2010.00012.x"},{"key":"e_1_2_2_64_1","volume-title":"DoWhy: An End-to-End Library for Causal Inference. 
arXiv preprint arXiv:2011.04216","author":"Sharma Amit","year":"2020","unstructured":"Amit Sharma and Emre Kiciman. 2020. DoWhy: An End-to-End Library for Causal Inference. arXiv preprint arXiv:2011.04216 (2020)."},{"key":"e_1_2_2_65_1","article-title":"A linear non-Gaussian acyclic model for causal discovery","volume":"7","author":"Shimizu Shohei","year":"2006","unstructured":"Shohei Shimizu, Patrik O Hoyer, Aapo Hyv\u00e4rinen, Antti Kerminen, and Michael Jordan. 2006. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research 7, 10 (2006).","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_2_66_1","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511605949"},{"key":"e_1_2_2_67_1","doi-asserted-by":"crossref","unstructured":"P. Spirtes et al. 2000. Causation prediction and search. MIT press.","DOI":"10.7551\/mitpress\/1754.001.0001"},{"key":"e_1_2_2_68_1","doi-asserted-by":"publisher","DOI":"10.14778\/3561261.3561271"},{"key":"e_1_2_2_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/2745754.2745765"},{"key":"e_1_2_2_70_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1018912507879"},{"key":"e_1_2_2_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/800070.802186"},{"key":"e_1_2_2_72_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.2017.1319839"},{"key":"e_1_2_2_73_1","volume-title":"Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases","volume":"11","author":"Wen Yuhao","year":"2018","unstructured":"Yuhao Wen, Xiaodan Zhu, Sudeepa Roy, and Jun Yang. 2018. Interactive summarization and exploration of top aggregate query answers. In Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases, Vol. 11. NIH Public Access, 2196."},{"key":"e_1_2_2_74_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536354.2536356"},{"key":"e_1_2_2_75_1","volume-title":"Estimating heterogeneous treatment effects with observational data. 
Sociological methodology 42, 1","author":"Xie Yu","year":"2012","unstructured":"Yu Xie, Jennie E Brand, and Ben Jann. 2012. Estimating heterogeneous treatment effects with observational data. Sociological methodology 42, 1 (2012), 314--347."},{"key":"e_1_2_2_76_1","volume-title":"International conference on machine learning. PMLR, 3921--3930","author":"Yang Hongyu","year":"2017","unstructured":"Hongyu Yang, Cynthia Rudin, and Margo Seltzer. 2017. Scalable Bayesian rule lists. In International conference on machine learning. PMLR, 3921--3930."},{"key":"e_1_2_2_77_1","doi-asserted-by":"publisher","DOI":"10.14778\/3538598.3538603"},{"key":"e_1_2_2_78_1","unstructured":"Brit Youngmann, Michael Cafarella, Amir Gilad, and Sudeepa Roy. 2023. Technical Report. https:\/\/anonymous.4open.science\/r\/Explanation_Summarization-F736"},{"key":"e_1_2_2_79_1","volume-title":"NEXUS: On Explaining Confounding Bias. In Companion of the 2023 International Conference on Management of Data. 171--174","author":"Youngmann Brit","year":"2023","unstructured":"Brit Youngmann, Michael Cafarella, Yuval Moskovitch, and Babak Salimi. 2023. NEXUS: On Explaining Confounding Bias. In Companion of the 2023 International Conference on Management of Data. 171--174."},{"key":"e_1_2_2_80_1","volume-title":"On Explaining Confounding Bias. 2023 IEEE 39th International Conference on Data Engineering (ICDE)","author":"Youngmann Brit","year":"2023","unstructured":"Brit Youngmann, Michael Cafarella, Yuval Moskovitch, and Babak Salimi. 2023. On Explaining Confounding Bias. 
2023 IEEE 39th International Conference on Data Engineering (ICDE) (2023)."},{"key":"e_1_2_2_81_1","doi-asserted-by":"publisher","DOI":"10.14778\/3603581.3603602"},{"key":"e_1_2_2_82_1","doi-asserted-by":"publisher","DOI":"10.1145\/1516360.1516404"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3639328","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3639328","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T15:17:12Z","timestamp":1755789432000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3639328"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,12]]},"references-count":82,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,3,12]]}},"alternative-id":["10.1145\/3639328"],"URL":"https:\/\/doi.org\/10.1145\/3639328","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,12]]}}}