{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:08:38Z","timestamp":1750306118220,"version":"3.41.0"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2017,6,29]],"date-time":"2017-06-29T00:00:00Z","timestamp":1498694400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2017,11,30]]},"abstract":"<jats:p>Fundamental to many predictive analytics tasks is the ability to estimate the cardinality (number of data items) of multi-dimensional data subspaces, defined by query selections over datasets. This is crucial for data analysts dealing with, e.g., interactive data subspace explorations, data subspace visualizations, and in query processing optimization. However, in many modern data systems, predictive analytics may be (i) too costly money-wise, e.g., in clouds, (ii) unreliable, e.g., in modern Big Data query engines, where accurate statistics are difficult to obtain\/maintain, or (iii) infeasible, e.g., for privacy issues. We contribute a novel, query-driven, function estimation model of analyst-defined data subspace cardinality. The proposed estimation model is highly accurate in terms of prediction and accommodating the well-known selection queries: multi-dimensional range and distance-nearest neighbors (radius) queries. Our function estimation model: (i) quantizes the vectorial query space, by learning the analysts\u2019 access patterns over a data space, (ii) associates query vectors with their corresponding cardinalities of the analyst-defined data subspaces, (iii) abstracts and employs query vectorial similarity to predict the cardinality of an unseen\/unexplored data subspace, and (iv) identifies and adapts to possible changes of the query subspaces based on the theory of optimal stopping. The proposed model is decentralized, facilitating the scaling-out of such predictive analytics queries. The research significance of the model lies in that (i) it is an attractive solution when data-driven statistical techniques are undesirable or infeasible, (ii) it offers a scale-out, decentralized training solution, (iii) it is applicable to different selection query types, and (iv) it offers a performance that is superior to that of data-driven approaches.<\/jats:p>","DOI":"10.1145\/3059177","type":"journal-article","created":{"date-parts":[[2017,6,30]],"date-time":"2017-06-30T12:36:19Z","timestamp":1498826179000},"page":"1-46","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["Query-Driven Learning for Predictive Analytics of Data Subspace Cardinality"],"prefix":"10.1145","volume":"11","author":[{"given":"Christos","family":"Anagnostopoulos","sequence":"first","affiliation":[{"name":"University of Glasgow, Scotland, United Kingdom"}]},{"given":"Peter","family":"Triantafillou","sequence":"additional","affiliation":[{"name":"University of Glasgow, Scotland, United Kingdom"}]}],"member":"320","published-online":{"date-parts":[[2017,6,29]]},"reference":[{"volume-title":"Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables","author":"Abramowitz Milton","key":"e_1_2_1_1_1","unstructured":"Milton Abramowitz . 1974. Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables . Dover Publications, Inc orporated. Milton Abramowitz. 1974. Handbook of Mathematical Functions: With Formulas, Graphs, and Mathematical Tables. Dover Publications, Incorporated."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-014-0357-y"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733085.2733096"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2015.17"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2015.7363736"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/LDAV.2012.6378972"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2013.6691635"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/263661.263671"},{"volume-title":"Dynamic Programming and Optimal Control","author":"Bertsekas Dimitri P.","key":"e_1_2_1_9_1","unstructured":"Dimitri P. Bertsekas . 2005. Dynamic Programming and Optimal Control . Volume I. Athena Scientific , Belmont, MA . Dimitri P. Bertsekas. 2005. Dynamic Programming and Optimal Control. Volume I. Athena Scientific, Belmont, MA."},{"key":"e_1_2_1_10_1","volume-title":"NIPS Conference","author":"Bottou Leon","year":"1994","unstructured":"Leon Bottou and Yoshua Bengio . 1994 . Convergence properties of the K-means algorithms. In Advances in Neural Information Processing Systems 7 , NIPS Conference , Denver, Colorado, USA. 585--592. Leon Bottou and Yoshua Bengio. 1994. Convergence properties of the K-means algorithms. In Advances in Neural Information Processing Systems 7, NIPS Conference, Denver, Colorado, USA. 585--592."},{"volume-title":"Advances in Neural Information Processing Systems 20","author":"Bousquet Olivier","key":"e_1_2_1_11_1","unstructured":"Olivier Bousquet and Leon Bottou . 2008. The tradeoffs of large scale learning . In Advances in Neural Information Processing Systems 20 , J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis (Eds.). Curran Associates, Inc. , 161--168. Olivier Bousquet and Leon Bottou. 2008. The tradeoffs of large scale learning. In Advances in Neural Information Processing Systems 20, J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis (Eds.). Curran Associates, Inc., 161--168."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.33"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.5555\/645926.671851"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/PacificVis.2014.60"},{"key":"e_1_2_1_15_1","volume-title":"Herbert Ellis Robbins, and David Siegmund","author":"Chow Yuan Shih","year":"1971","unstructured":"Yuan Shih Chow , Herbert Ellis Robbins, and David Siegmund . 1971 . Great Expectations : The Theory of Optimal Stopping. Houghton Mifflin, Boston . Yuan Shih Chow, Herbert Ellis Robbins, and David Siegmund. 1971. Great Expectations: The Theory of Optimal Stopping. Houghton Mifflin, Boston."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1561\/1900000004"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177692491"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2010.80"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2009.145"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-003-0090-4"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177698595"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-010-0201-y"},{"key":"e_1_2_1_24_1","volume-title":"Schnabel","author":"Dennis John E.","year":"1983","unstructured":"John E. Dennis Jr . and Robert B . Schnabel . 1983 . Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice Hall . John E. Dennis Jr. and Robert B. Schnabel. 1983. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice Hall."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.29012\/jpc.v1i2.573"},{"key":"e_1_2_1_26_1","volume-title":"Self-Organization and Associative Memory","author":"Kohonen Tuevo","unstructured":"Tuevo Kohonen . 1989. Self-Organization and Associative Memory : 3 rd Edition. Springer-Verlag New York, Inc. , New York, NY . Tuevo Kohonen. 1989. Self-Organization and Associative Memory: 3rd Edition. Springer-Verlag New York, Inc., New York, NY.","edition":"3"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2012.09.018"},{"key":"e_1_2_1_28_1","volume-title":"Cheol Hoon Park, and Touradj Ebrahimi","author":"Lee Jong-Seok","year":"2013","unstructured":"Jong-Seok Lee , Cheol Hoon Park, and Touradj Ebrahimi . 2013 . Theory and Applications of Hybrid Simulated Annealing. Springer Berlin Heidelberg , 395--422. Jong-Seok Lee, Cheol Hoon Park, and Touradj Ebrahimi. 2013. Theory and Applications of Hybrid Simulated Annealing. Springer Berlin Heidelberg, 395--422."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177698978"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218488505003436"},{"key":"e_1_2_1_31_1","unstructured":"Moshe Lichman. 2013. UCI Machine Learning Repository. Retrieved from http:\/\/archive.ics.uci.edu\/ml.  Moshe Lichman. 2013. UCI Machine Learning Repository. Retrieved from http:\/\/archive.ics.uci.edu\/ml."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2014.7004269"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1162\/089976603322385117"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, L. M. Le Cam and J. Neyman (Eds.)","volume":"1","author":"MacQueen James B.","year":"1967","unstructured":"James B. MacQueen . 1967 . Some methods for classification and analysis of multivariate observations . In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, L. M. Le Cam and J. Neyman (Eds.) , Vol. 1 . University of California Press, 281--297. James B. MacQueen. 1967. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, L. M. Le Cam and J. Neyman (Eds.), Vol. 1. University of California Press, 281--297."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/72.159068"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.5555\/1127110.1127118"},{"volume-title":"Optimal Stopping and Free-Boundary Problems","author":"Peskir Goran","key":"e_1_2_1_37_1","unstructured":"Goran Peskir and Albert Shiryaev . 2006. Optimal Stopping and Free-Boundary Problems . Springer , Dordrecht . Goran Peskir and Albert Shiryaev. 2006. Optimal Stopping and Free-Boundary Problems. Springer, Dordrecht."},{"key":"e_1_2_1_38_1","article-title":"Hubs in space: Popular nearest neighbors in high-dimensional data","author":"Radovanovic Milos","year":"2010","unstructured":"Milos Radovanovic , Alexandros Nanopoulos , and Mirjana Ivanovic . 2010 . Hubs in space: Popular nearest neighbors in high-dimensional data . Journal of Machine Learning Research 11 ( December 2010), 2487--2531. Milos Radovanovic, Alexandros Nanopoulos, and Mirjana Ivanovic. 2010. Hubs in space: Popular nearest neighbors in high-dimensional data. Journal of Machine Learning Research 11 (December 2010), 2487--2531.","journal-title":"Journal of Machine Learning Research 11"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1111\/cogs.12148"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/18.144705"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/18.32133"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2006.84"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2009.2039048"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/956863.956881"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2505756"},{"key":"e_1_2_1_46_1","volume-title":"Vapnik and Alexey Ya. Chervonenkis","author":"Vladimir","year":"1971","unstructured":"Vladimir N. Vapnik and Alexey Ya. Chervonenkis . 1971 . On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications 16, 2 (1971), 264--280. Vladimir N. Vapnik and Alexey Ya. Chervonenkis. 1971. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications 16, 2 (1971), 264--280."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2523616.2523633"},{"key":"e_1_2_1_48_1","volume-title":"A learning framework for self-tuning histograms. CoRR abs\/1111.7295","author":"Viswanathan Raajay","year":"2011","unstructured":"Raajay Viswanathan , Prateek Jain , Srivatsan Laxman , and Arvind Arasu . 2011. A learning framework for self-tuning histograms. CoRR abs\/1111.7295 ( 2011 ). http:\/\/arxiv.org\/abs\/1111.7295. Raajay Viswanathan, Prateek Jain, Srivatsan Laxman, and Arvind Arasu. 2011. A learning framework for self-tuning histograms. CoRR abs\/1111.7295 (2011). http:\/\/arxiv.org\/abs\/1111.7295."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3147.3165"},{"key":"e_1_2_1_50_1","volume-title":"Hadoop: The Definitive Guide","author":"White Tom","year":"2009","unstructured":"Tom White . 2009 . Hadoop: The Definitive Guide ( 1 st ed.). O\u2019Reilly Media, Inc. Tom White. 2009. Hadoop: The Definitive Guide (1st ed.). O\u2019Reilly Media, Inc.","edition":"1"},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (NSDI\u201912)","author":"Zaharia Matei","year":"2012","unstructured":"Matei Zaharia , Mosharaf Chowdhury , Tathagata Das , Ankur Dave , Justin Ma , Murphy McCauley , Michael J. Franklin , Scott Shenker , and Ion Stoica . 2012 . Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing . In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (NSDI\u201912) . USENIX Association, Berkeley, CA, 2. Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (NSDI\u201912). USENIX Association, Berkeley, CA, 2."}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3059177","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3059177","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T03:36:38Z","timestamp":1750217798000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3059177"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,6,29]]},"references-count":51,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2017,11,30]]}},"alternative-id":["10.1145\/3059177"],"URL":"https:\/\/doi.org\/10.1145\/3059177","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2017,6,29]]},"assertion":[{"value":"2016-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-06-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}