{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:23:15Z","timestamp":1750220595329,"version":"3.41.0"},"reference-count":56,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2020,9,28]],"date-time":"2020-09-28T00:00:00Z","timestamp":1601251200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"EU\/MSCA-IF","award":["745829"],"award-info":[{"award-number":["745829"]}]},{"name":"UK EPSRC","award":["EP\/R018634\/1"],"award-info":[{"award-number":["EP\/R018634\/1"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2020,12,31]]},"abstract":"<jats:p>Analysts wishing to explore multivariate data spaces, typically issue queries involving selection operators, i.e., range or equality predicates, which define data subspaces of potential interest. Then, they use aggregation functions, the results of which determine a subspace\u2019s interestingness for further exploration and deeper analysis. However, Aggregate Query (AQ) results are scalars and convey limited information and explainability about the queried subspaces for enhanced exploratory analysis. Analysts have no way of identifying how these results are derived or how they change w.r.t query (input) parameter values. We address this shortcoming by aiding analysts to explore and understand data subspaces by contributing a novel explanation mechanism based on machine learning. We explain AQ results using functions obtained by a three-fold joint optimization problem which assume the form of explainable piecewise-linear regression functions. A key feature of the proposed solution is that the explanation functions are estimated using past executed queries. These queries provide a coarse grained overview of the underlying aggregate function (generating the AQ results) to be learned. Explanations for future, previously unseen AQs can be computed without accessing the underlying data and can be used to further explore the queried data subspaces, without issuing more queries to the backend analytics engine. We evaluate the explanation accuracy and efficiency through theoretically grounded metrics over real-world and synthetic datasets and query workloads.<\/jats:p>","DOI":"10.1145\/3410448","type":"journal-article","created":{"date-parts":[[2020,9,29]],"date-time":"2020-09-29T04:10:30Z","timestamp":1601352630000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Large-scale Data Exploration Using Explanatory Regression Functions"],"prefix":"10.1145","volume":"14","author":[{"given":"Fotis","family":"Savva","sequence":"first","affiliation":[{"name":"University of Glasgow, United Kingdom"}]},{"given":"Christos","family":"Anagnostopoulos","sequence":"additional","affiliation":[{"name":"University of Glasgow, United Kingdom"}]},{"given":"Peter","family":"Triantafillou","sequence":"additional","affiliation":[{"name":"University of Warwick, United Kingdom"}]},{"given":"Kostas","family":"Kolomvatsos","sequence":"additional","affiliation":[{"name":"University of Thessaly, Greece"}]}],"member":"320","published-online":{"date-parts":[[2020,9,28]]},"reference":[{"volume-title":"Crimes - 2001 to present. Retrieved","year":"2016","key":"e_1_2_1_1_1","unstructured":"2016. Crimes - 2001 to present. Retrieved December 1, 2016 from https:\/\/data.cityofchicago.org\/Public-Safety\/Crimes-2001-to-present\/ijzp-q8t2. 2016. Crimes - 2001 to present. Retrieved December 1, 2016 from https:\/\/data.cityofchicago.org\/Public-Safety\/Crimes-2001-to-present\/ijzp-q8t2."},{"volume-title":"Retrieved","year":"2019","key":"e_1_2_1_2_1","unstructured":"2019. Query Analytics Workloads Dataset Data Set . Retrieved July 29, 2019 from https:\/\/archive.ics.uci.edu\/ml\/datasets\/Query+Analytics+Workloads+Dataset. 2019. Query Analytics Workloads Dataset Data Set. Retrieved July 29, 2019 from https:\/\/archive.ics.uci.edu\/ml\/datasets\/Query+Analytics+Workloads+Dataset."},{"volume-title":"Retrieved","year":"2020","key":"e_1_2_1_3_1","unstructured":"2020. HIGGS Data Set . Retrieved February 19, 2020 from http:\/\/archive.ics.uci.edu\/ml\/datasets\/HIGGS. 2020. HIGGS Data Set. Retrieved February 19, 2020 from http:\/\/archive.ics.uci.edu\/ml\/datasets\/HIGGS."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2465351.2465355"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989284.1989302"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-017-1093-y"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2015.17"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2017.111"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035928"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1038\/ncomms5308"},{"key":"e_1_2_1_11_1","volume-title":"Bokeh: Python Library for Interactive Visualization.","author":"Team Bokeh Development","year":"2018","unstructured":"Bokeh Development Team . 2018 . Bokeh: Python Library for Interactive Visualization. Retrieved from https:\/\/bokeh.pydata.org\/en\/latest\/. Bokeh Development Team. 2018. Bokeh: Python Library for Interactive Visualization. Retrieved from https:\/\/bokeh.pydata.org\/en\/latest\/."},{"volume-title":"Neural Networks: Tricks of the Trade","author":"Bottou L\u00e9on","key":"e_1_2_1_12_1","unstructured":"L\u00e9on Bottou . 2012. Stochastic gradient descent tricks . In Neural Networks: Tricks of the Trade . Springer , 421--436. L\u00e9on Bottou. 2012. Stochastic gradient descent tricks. In Neural Networks: Tricks of the Trade. Springer, 421--436."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610520"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242524.1242526"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007602"},{"key":"e_1_2_1_16_1","volume-title":"Provenance in Databases: Why, How, and Where","author":"Cheney James","year":"2009","unstructured":"James Cheney , Laura Chiticariu , and Wang-Chiew Tan . 2009. Provenance in Databases: Why, How, and Where . Now Publishers Inc , 2009 . James Cheney, Laura Chiticariu, and Wang-Chiew Tan. 2009. Provenance in Databases: Why, How, and Where. Now Publishers Inc, 2009."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jalgor.2003.12.001"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137765.3137813"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735461.2735467"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.18637\/jss.v033.i01"},{"key":"e_1_2_1_21_1","doi-asserted-by":"crossref","unstructured":"Jerome H. Friedman. 1991. Multivariate adaptive regression splines. The Annals of Statistics Mar. (1991) 1:1--67.  Jerome H. Friedman. 1991. Multivariate adaptive regression splines. The Annals of Statistics Mar. (1991) 1:1--67.","DOI":"10.1214\/aos\/1176347963"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the Advances in Neural Information Processing Systems. 281--288","author":"Hamerly Greg","year":"2004","unstructured":"Greg Hamerly and Charles Elkan . 2004 . Learning the k in k-means . In Proceedings of the Advances in Neural Information Processing Systems. 281--288 . Greg Hamerly and Charles Elkan. 2004. Learning the k in k-means. In Proceedings of the Advances in Neural Information Processing Systems. 281--288."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.2307\/2346830"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02985802"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data.","volume":"26","author":"Hellerstein Joseph M.","unstructured":"Joseph M. Hellerstein , Peter J. Haas , and Helen J. Wang . 1997. Online aggregation . In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data. Vol. 26 . ACM, 171--182. Joseph M. Hellerstein, Peter J. Haas, and Helen J. Wang. 1997. Online aggregation. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data. Vol. 26. ACM, 171--182."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465273"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2731084"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3159940"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989411"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.14778\/2180912.2180913"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077257.3077271"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the Annual Meeting of the Society for Academic Emergency Medicine.","volume":"14","author":"Lewis Roger J.","year":"2000","unstructured":"Roger J. Lewis . 2000 . An introduction to classification and regression tree (CART) analysis . In Proceedings of the Annual Meeting of the Society for Academic Emergency Medicine. Vol. 14 . Roger J. Lewis. 2000. An introduction to classification and regression tree (CART) analysis. In Proceedings of the Annual Meeting of the Society for Academic Emergency Medicine. Vol. 14."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2014.2346452"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733070"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3300066"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.5555\/2969735.2969739"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2783258.2788624"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064013"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078195"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939778"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/2856318.2856329"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2018.8621953"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData47090.2019.9006267"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2020.03.063"},{"key":"e_1_2_1_45_1","volume-title":"ML-AQP: Query-driven approximate query processing based on machine learning. Arxiv Preprint Arxiv:2003.06613","author":"Savva Fotis","year":"2020","unstructured":"Fotis Savva , Christos Anagnostopoulos , and Peter Triantafillou . 2020. ML-AQP: Query-driven approximate query processing based on machine learning. Arxiv Preprint Arxiv:2003.06613 ( 2020 ). Fotis Savva, Christos Anagnostopoulos, and Peter Triantafillou. 2020. ML-AQP: Query-driven approximate query processing based on machine learning. Arxiv Preprint Arxiv:2003.06613 (2020)."},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the Conference on Innovative Data Systems Research (CIDR'11)","volume":"11","author":"Sidirourgos Lefteris","unstructured":"Lefteris Sidirourgos , Martin L. Kersten , and Peter A. Boncz . 2011. SciBORQ: Scientific data management with bounds on runtime and quality . In Proceedings of the Conference on Innovative Data Systems Research (CIDR'11) , Vol. 11 . 296--301. Lefteris Sidirourgos, Martin L. Kersten, and Peter A. Boncz. 2011. SciBORQ: Scientific data management with bounds on runtime and quality. In Proceedings of the Conference on Innovative Data Systems Research (CIDR'11), Vol. 11. 296--301."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/564691.564758"},{"key":"e_1_2_1_48_1","unstructured":"Jean Claude Utazirubanda Tom\u00e1s M. Le\u00f3n and Papa Ngom. 2019. Variable selection with group LASSO approach: Application to Cox regression with frailty model. Communications in Statistics - Simulation and Computation Feb. (2019) 16:1--21.  Jean Claude Utazirubanda Tom\u00e1s M. Le\u00f3n and Papa Ngom. 2019. Variable selection with group LASSO approach: Application to Cox regression with frailty model. Communications in Statistics - Simulation and Computation Feb. (2019) 16:1--21."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.14778\/2831360.2831371"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2750549"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035925"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064051"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553516"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536354.2536356"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544881"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807238"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3410448","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3410448","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:31:51Z","timestamp":1750195911000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3410448"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9,28]]},"references-count":56,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2020,12,31]]}},"alternative-id":["10.1145\/3410448"],"URL":"https:\/\/doi.org\/10.1145\/3410448","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2020,9,28]]},"assertion":[{"value":"2019-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-09-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}