{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,23]],"date-time":"2025-08-23T05:10:53Z","timestamp":1755925853031,"version":"3.28.0"},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"14","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2013,9]]},"abstract":"<jats:p>Analytics over the increasing quantity of data stored in the Cloud has become very expensive, particularly due to the pay-as-you-go Cloud computation model. Data scientists typically manually extract samples of increasing data size (progressive samples) using domain-specific sampling strategies for exploratory querying. This provides them with user-control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. We propose a new progressive analytics system based on a progress model called Prism that (1) allows users to communicate progressive samples to the system; (2) allows efficient and deterministic query processing over samples; and (3) provides repeatable semantics and provenance to data scientists. We show that one can realize this model for atemporal relational queries using an unmodified temporal streaming engine, by re-interpreting temporal event fields to denote progress. Based on Prism, we build Now!, a progressive data-parallel computation framework for Windows Azure, where progress is understood as a first-class citizen in the framework. Now! works with \"progress-aware reducers\"- in particular, it works with streaming engines to support progressive SQL over big data. Extensive experiments on Windows Azure with real and synthetic workloads validate the scalability and benefits of Now! and its optimizations, over current solutions for progressive analytics.<\/jats:p>","DOI":"10.14778\/2556549.2556557","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"1726-1737","source":"Crossref","is-referenced-by-count":29,"title":["Scalable progressive analytics on big data in the cloud"],"prefix":"10.14778","volume":"6","author":[{"given":"Badrish","family":"Chandramouli","sequence":"first","affiliation":[{"name":"Microsoft Research, Redmond"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jonathan","family":"Goldstein","sequence":"additional","affiliation":[{"name":"Microsoft Research, Redmond"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Abdul","family":"Quamar","sequence":"additional","affiliation":[{"name":"University of Maryland, College Park"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2013,9]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"The design of the Borealis stream processing engine","author":"Abadi D.","year":"2005","unstructured":"D. Abadi et al. The design of the Borealis stream processing engine. 2005."},{"doi-asserted-by":"publisher","key":"e_1_2_1_2_1","DOI":"10.1145\/2465351.2465355"},{"key":"e_1_2_1_3_1","volume-title":"Microsoft CEP Server and Online Behavioral Targeting","author":"Ali M.","year":"2009","unstructured":"M. Ali et al. Microsoft CEP Server and Online Behavioral Targeting. 2009."},{"key":"e_1_2_1_4_1","volume-title":"Models and issues in data stream systems","author":"Babcock B.","year":"2002","unstructured":"B. Babcock et al. Models and issues in data stream systems. 2002."},{"key":"e_1_2_1_5_1","volume-title":"SC","author":"Barga R.","year":"2011","unstructured":"R. Barga, J. Ekanayake, and W. Lu. Iterative mapreduce research on Azure. In SC, 2011."},{"key":"e_1_2_1_6_1","volume-title":"Consistent streaming through time: A vision for event stream processing","author":"Barga R.","year":"2007","unstructured":"R. Barga et al. Consistent streaming through time: A vision for event stream processing. 2007."},{"volume-title":"MSR","author":"Chandramouli B.","unstructured":"B. Chandramouli et al. Scalable progressive analytics on big data in the cloud. Technical report, MSR. http:\/\/aka.ms\/Jpe5f5.","key":"e_1_2_1_7_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_8_1","DOI":"10.1109\/ICDE.2012.55"},{"doi-asserted-by":"publisher","key":"e_1_2_1_9_1","DOI":"10.1145\/1007568.1007602"},{"doi-asserted-by":"publisher","key":"e_1_2_1_10_1","DOI":"10.1145\/304182.304206"},{"doi-asserted-by":"publisher","key":"e_1_2_1_11_1","DOI":"10.1023\/A:1022673506211"},{"doi-asserted-by":"publisher","key":"e_1_2_1_12_1","DOI":"10.5555\/1855711.1855732"},{"unstructured":"Daytona for Azure. http:\/\/aka.ms\/unkcbq.","key":"e_1_2_1_13_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_14_1","DOI":"10.5555\/1251254.1251264"},{"key":"e_1_2_1_15_1","author":"Doucet A.","year":"2006","unstructured":"A. Doucet, M. Briers, and S. Senecal. Efficient block sampling strategies for sequential monte carlo methods. Journal of Computational and Graphical Statistics, 2006.","journal-title":"Journal of Computational and Graphical Statistics"},{"doi-asserted-by":"publisher","key":"e_1_2_1_16_1","DOI":"10.1145\/304182.304208"},{"key":"e_1_2_1_17_1","first-page":"10126","author":"Haas P. J.","year":"1998","unstructured":"P. J. Haas and J. M. Hellerstein. Join algorithms for online aggregation. In IBM Research Report RJ 10126, 1998.","journal-title":"Join algorithms for online aggregation. In IBM Research Report RJ"},{"doi-asserted-by":"publisher","key":"e_1_2_1_18_1","DOI":"10.14778\/2350229.2350259"},{"key":"e_1_2_1_19_1","volume-title":"Nile: A query processing engine for data streams","author":"Hammad M.","year":"2004","unstructured":"M. Hammad et al. Nile: A query processing engine for data streams. 2004."},{"doi-asserted-by":"publisher","key":"e_1_2_1_20_1","DOI":"10.1023\/A:1009835310546"},{"doi-asserted-by":"publisher","key":"e_1_2_1_21_1","DOI":"10.1145\/253260.253291"},{"key":"e_1_2_1_22_1","volume-title":"Temporal specialization","author":"Jensen C.","year":"1992","unstructured":"C. Jensen and R. Snodgrass. Temporal specialization. 1992."},{"doi-asserted-by":"publisher","key":"e_1_2_1_23_1","DOI":"10.1145\/1247480.1247560"},{"doi-asserted-by":"publisher","key":"e_1_2_1_24_1","DOI":"10.1145\/2213836.2213840"},{"doi-asserted-by":"publisher","key":"e_1_2_1_25_1","DOI":"10.14778\/2336664.2336675"},{"doi-asserted-by":"publisher","key":"e_1_2_1_26_1","DOI":"10.1145\/1989323.1989426"},{"key":"e_1_2_1_27_1","volume-title":"NIPS","author":"Maron O.","year":"1993","unstructured":"O. Maron et al. Hoeffding races: Accelerating model selection search for classification and function approximation. In NIPS, 1993."},{"key":"e_1_2_1_28_1","first-page":"21","author":"McKay M. D.","year":"1979","unstructured":"M. D. McKay et al. Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code. Technometrics, 21, 1979.","journal-title":"Computer Code. Technometrics"},{"doi-asserted-by":"publisher","key":"e_1_2_1_29_1","DOI":"10.14778\/1920841.1920886"},{"doi-asserted-by":"publisher","key":"e_1_2_1_30_1","DOI":"10.14778\/3402707.3402748"},{"doi-asserted-by":"publisher","key":"e_1_2_1_31_1","DOI":"10.5555\/645925.671663"},{"doi-asserted-by":"publisher","key":"e_1_2_1_32_1","DOI":"10.1145\/2169090.2169092"},{"doi-asserted-by":"publisher","key":"e_1_2_1_33_1","DOI":"10.1109\/ICDE.2006.130"},{"unstructured":"The LINQ Project. http:\/\/aka.ms\/rjhi00.","key":"e_1_2_1_34_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_35_1","DOI":"10.1145\/1989323.1989350"},{"doi-asserted-by":"publisher","key":"e_1_2_1_36_1","DOI":"10.5555\/1717298"},{"doi-asserted-by":"publisher","key":"e_1_2_1_37_1","DOI":"10.5555\/2228298.2228301"},{"doi-asserted-by":"publisher","key":"e_1_2_1_38_1","DOI":"10.5555\/2342763.2342773"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2556549.2556557","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,23]],"date-time":"2024-10-23T22:35:56Z","timestamp":1729722956000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2556549.2556557"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,9]]},"references-count":38,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2013,9]]}},"alternative-id":["10.14778\/2556549.2556557"],"URL":"https:\/\/doi.org\/10.14778\/2556549.2556557","relation":{},"ISSN":["2150-8097"],"issn-type":[{"type":"print","value":"2150-8097"}],"subject":[],"published":{"date-parts":[[2013,9]]}}}