{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T00:04:39Z","timestamp":1774915479942,"version":"3.50.1"},"reference-count":58,"publisher":"Association for Computing Machinery (ACM)","issue":"10","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2021,6]]},"abstract":"<jats:p>In pursuit of real-time data analysis, approximate summarization structures, i.e., synopses, have gained importance over the years. However, existing stream processing systems, such as Flink, Spark, and Storm, do not support synopses as first class citizens, i.e., as pipeline operators. Synopses' implementation is upon users. This is mainly because of the diversity of synopses, which makes a unified implementation difficult. We present Condor, a framework that supports synopses as first class citizens. Condor facilitates the specification and processing of synopsis-based streaming jobs while hiding all internal processing details. Condor's key component is its model that represents synopses as a particular case of windowed aggregate functions. An inherent divide and conquer strategy allows Condor to efficiently distribute the computation, allowing for high-performance and linear scalability. 
Our evaluation shows that Condor outperforms existing approaches by up to a factor of 75x and that it scales linearly with the number of cores.<\/jats:p>","DOI":"10.14778\/3467861.3467871","type":"journal-article","created":{"date-parts":[[2021,10,26]],"date-time":"2021-10-26T16:17:12Z","timestamp":1635265032000},"page":"1818-1831","source":"Crossref","is-referenced-by-count":10,"title":["In the land of data streams where synopses are missing, one framework to bring them all"],"prefix":"10.14778","volume":"14","author":[{"given":"Rudi","family":"Poepsel-Lemaitre","sequence":"first","affiliation":[{"name":"TU Berlin"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Martin","family":"Kiefer","sequence":"additional","affiliation":[{"name":"TU Berlin"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Joscha","family":"von Hein","sequence":"additional","affiliation":[{"name":"TU Berlin"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jorge-Arnulfo","family":"Quian\u00e9-Ruiz","sequence":"additional","affiliation":[{"name":"TU Berlin and German Research Center for Artificial Intelligence (DFKI)"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Volker","family":"Markl","sequence":"additional","affiliation":[{"name":"TU Berlin and German Research Center for Artificial Intelligence (DFKI)"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,10,26]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-015-0389-y"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2500128"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2465351.2465355"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/1182635.1164180"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/3236187.3236195"},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","unstructured":"Tyler Akidau Robert 
Bradshaw Craig Chambers Slava Chernyak Rafael J Fern\u00e1ndez-Moctezuma Reuven Lax Sam McVeety Daniel Mills Frances Perry Eric Schmidt et al. 2015. The dataflow model: a practical approach to balancing correctness latency and cost in massive-scale unbounded out-of-order data processing. (2015).","DOI":"10.14778\/2824032.2824076"},{"key":"e_1_2_1_7_1","unstructured":"Apache Software Foundation. 2020. Apache Hive. https:\/\/hive.apache.org\/"},{"key":"e_1_2_1_8_1","unstructured":"Apache Software Foundation. 2020. Apache Pig. https:\/\/pig.apache.org\/"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/543613.543615"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/362686.362692"},{"key":"e_1_2_1_11_1","unstructured":"Andrei Broder and Michael Mitzenmacher. 2002. Network applications of bloom filters: A survey. In Internet mathematics. Citeseer."},{"key":"e_1_2_1_12_1","volume-title":"Apache flink: Stream and batch processing in a single engine. 
Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 36, 4","author":"Carbone Paris","year":"2015"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2983323.2983807"},{"key":"e_1_2_1_14_1","volume-title":"Boyang Jerry Peng, et al","author":"Chintapalli Sanket","year":"2016"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/1855711.1855732"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687687"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/1083592.1083598"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1561\/1900000004"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/1454159.1454225"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jalgor.2003.12.001"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2674005.2674994"},{"key":"e_1_2_1_22_1","doi-asserted-by":"crossref","unstructured":"Philippe Flajolet \u00c9ric Fusy Olivier Gandouet and Fr\u00e9d\u00e9ric Meunier. 2007. Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. In Discrete Mathematics and Theoretical Computer Science. Discrete Mathematics and Theoretical Computer Science 137--156.","DOI":"10.46298\/dmtcs.3545"},{"key":"e_1_2_1_23_1","unstructured":"Apache Flink. 2020. The Broadcast State Pattern. https:\/\/ci.apache.org\/projects\/flink\/flink-docs-stable\/dev\/stream\/state\/broadcast_state.html"},{"key":"e_1_2_1_24_1","unstructured":"Apache Flink. 2020. 
Physical Partitioning. https:\/\/ci.apache.org\/projects\/flink\/flink-docs-stable\/dev\/stream\/operators\/#physical-partitioning"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/645927.672356"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/581751.581753"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2786763.2694351"},{"key":"e_1_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Alfred Haar. 1909. Zur Theorie der orthogonalen Funktionensysteme. Georg-August-Universit\u00e4t G\u00f6ttingen.","DOI":"10.1007\/BF01456326"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/COMST.2014.2354398"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2882940"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.5555\/1083592.1083643"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/3430915.3442428"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-019-0210-7"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1142473.1142543"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1058150.1058158"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066193"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066193"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41019-018-0074-4"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352135"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.5555\/2831090.2831104"},{"key":"e_1_2_1_42_1","article-title":"SnappyData: A Unified Cluster for Streaming","author":"Mozafari 
Barzan","year":"2017","journal-title":"Transactions and Interactive Analytics. In CIDR."},{"key":"e_1_2_1_43_1","volume-title":"Data streams: Algorithms and applications","author":"Muthukrishnan Shanmugavelayutham"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/971697.602294"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/235968.233342"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3135974.3135989"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/217391.217453"},{"key":"e_1_2_1_48_1","unstructured":"Apache Spark. 2020. Scheduling Within an Application. https:\/\/spark.apache.org\/docs\/latest\/job-scheduling.html#scheduling-within-an-application"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.5555\/1315451.1315479"},{"key":"e_1_2_1_50_1","unstructured":"NYC Taxi and Limousine Commission (TLC). 2020. New York City Taxi and Limousine Commission (TLC) Trip Record Data. https:\/\/www1.nyc.gov\/site\/tlc\/about\/tlc-trip-record-data.page"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2018.00135"},{"key":"e_1_2_1_52_1","volume-title":"Efficient Window Aggregation with General Stream Slicing. 
In 22nd International Conference on Extending Database Technology (EDBT).","author":"Traub Jonas","year":"2019"},{"key":"e_1_2_1_53_1","volume-title":"Sebastian Bre\u00df, Asterios Katsifodimos, Tilmann Rabl, and Volker Markl.","author":"Traub Jonas","year":"2020"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3456859.3456861"},{"key":"e_1_2_1_55_1","unstructured":"Jonas Traub Nikolaas Steenbergen Philipp M Grulich Tilmann Rabl and Volker Markl. 2017. I2: Interactive Real-Time Visualization for Streaming Data. In EDBT. 526--529."},{"key":"e_1_2_1_56_1","volume-title":"Statistically nonrepresentative stratified sampling: A sampling technique for qualitative studies. Qualitative sociology 9, 1","author":"Trost Jan E","year":"1986"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3423165"},{"key":"e_1_2_1_58_1","unstructured":"Yahoo! 2020. DataSketches: Sketches Library from Yahoo! https:\/\/datasketches.github.io\/  Yahoo! 2020. DataSketches: Sketches Library from Yahoo! 
https:\/\/datasketches.github.io\/"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/2934664"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3467861.3467871","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T10:35:04Z","timestamp":1672223704000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3467861.3467871"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6]]},"references-count":58,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2021,6]]}},"alternative-id":["10.14778\/3467861.3467871"],"URL":"https:\/\/doi.org\/10.14778\/3467861.3467871","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2021,6]]}}}